
Hello, and welcome to this Free Code Camp
course on algorithmic trading and Python. My name is Nick McCollum, I'm going to be
your instructor for this course. And in this course, we're going to focus on building three
large quantitative finance projects. In the first project, we are going to build an equal
weight version of the popular s&p 500 index fund. In the second project, we are going
to build a quantitative momentum strategy that selects the best stocks based on a variety
of momentum, investing metrics. And in the third project, we're going to build a quantitative
value screener that select stocks that are attractive based on a number of value metrics.
So that's a quick summary of the three projects that we are going to build in this course.
Before we move into learning more about the course and those projects, I have a couple
of housekeeping items that have to go over first, to start this Free Code Camp course
on algorithmic trading.
And Python is made possible through a grant from IE x cloud.
If you've ever read the book, flash boys by Michael Lewis, you may have heard of it. In
this course, we're gonna be using some other tools and API's to populate some of the data
that we will need for these algorithmic trading strategies. The other quick thing that you're
going to notice starting now is a little message at the bottom of your screen that's going
to stay there for the rest of the video. And what that message intends to represent is
that this course is for educational purposes only, I'm going to be providing some high
level finance concepts, but nothing in this course or in this video should be construed
as investment advice.
So just to reiterate, this course, is for educational purposes only.
And none of this should be considered investment advice. So with all of that out of the way,
let's dig into a little bit more about what this course is going to teach you. So let's
start by just going over a quick course overview. The first thing that we're going to discuss
is some basics of the field of algorithmic trading. Then we're going to discuss some
API basics, and kind of just quickly go over how the course is configured and laid out.
The next thing we're going to discuss is the first project of this course, which is an
equal way version of the popular s&p 500 index fund.
And then we're going to talk about our
second project, which, as I mentioned earlier, is a quantitative momentum strategy. And then
we're going to discuss briefly our third project, which is a quantitative value strategy. So
that's kind of a 30,000 foot view of what you're going to be learning in this course.
Let's zoom in a little bit and talk about some algorithmic trading basics. So for anyone
who's completely unfamiliar with algorithmic trading, it basically means using computers
to make investment decisions.
So in the past, where you may have had a team of financial
analysts or investment researchers to kind of consider stocks and decide what to buy.
Many of the popular investing strategies that are implemented today actually use more computers
than they do humans. So algorithmic trading just means using computers to make investment
decisions. Now, there are many different types of algorithmic trading, on the one hand, you
have, you know, your super high frequency trading Big Brain, really, really sophisticated
complex strategies. And then at the other end of the spectrum, you can kind of basically
map any type of fundamental strategy that humans have have led in the past, and just
make it a little more efficient by providing some computer software in there. So as you
know, this slide kind of alludes to the main difference between the many different types
of algorithmic trading is generally the speed at which those trades are executed. Now, before
we actually dig into more about the world of algorithmic trading, it's kind of helpful
to have a high level understanding of who the main players are in this space.
And one
of the reasons I wanted to kind of provide this information at the start of this course
is so that you can see that the field of algorithmic trading is very big. And if you do want to
make a career that there are lots of employers lots of opportunities and lots of jobs in
this space. So right now, I'm just going to briefly discuss a few of the largest players
in the algorithmic trading landscape. The first one, which is kind of the 50,000 pound
gorilla in the room is Renaissance technologies. I believe they're based in Long Island, and
they have $165 billion in assets under management. Now, Renaissance technologies is mostly famous
for their Medallion Fund, which is only open to employees at this point, which I think
has returned something insane, like 50%, a year for 20 years or something like that.
Now, none of us can access that fund because it's for employees only.
But Renaissance technologies
probably has one of the best performing investment funds in the history of finance, and perhaps
most interestingly, is that Renaissance technologies was not founded by a finance major or a investment
analyst to start. The person who actually started Renaissance technologies was a math
PhD from, I believe, a California University. His name's Jim Simons, and many people would
kind of consider him to be one of the forefathers of quantitative finance. So that's kind of
the largest player in this space. Another popular one is AQR Capital Management, which
was founded by Cliff Asness and a few other managing partners and they have $61 billion
in assets under management. Now, unlike Renaissance technologies, AQR Capital Management actually
has strategy and you're not gonna invest it through mutual funds and other types of strategies.
And one last interesting thing about that firm is that their name is kind of the IBM
of investment firms where IBM is International Business Machines a very bland name, AQR Capital
Management, the AQR component of that name actually stands for Applied quantitative research.
So if you've ever curious what that stands for, That's what it is.
The last one is Citadel
securities, which has $32 billion in assets under management. And they're a much more
high frequency trading firm. And the first two that I mentioned, kind of famous for their
role in the market making space. So they were founded by Ken Griffin. And that kind of summarizes
three of the large players, you can tell by just the assets under management of these
three firms alone, more than 200 billion, that there are lots of opportunities and jobs.
And this is a very large space. So definitely something worth learning a bit about. Now,
in this course, as you probably have can tell by the title, we are going to be using Python
for these algorithmic trading strategies. Now, before we begin actually how to use Python
and how to write the code for it, there's a few high level things you should understand.
The first one is that Python is probably the most popular programming language for algorithmic
trading. And the reason why is just because it has a lot of libraries.
But the downside
to using Python is that it's a fairly slow programming language. So if you want to execute
high performance code, Python is typically not your best bet. Now, the solution that
many practitioners have found for this is that Python is often used as a glue language
to trigger code that actually runs in other languages. A common example of this is the
NumPy library for Python. And that's actually a library that we'll be using in this course.
Now, NumPy is the most popular Python library for performing numerical computing. And it's
perhaps most popular for its data structure called the NumPy array, which easily allows
you to store and manipulate one or two dimensional data structures in Python. Now, although NumPy
is actually a Python library, and it is called and manipulated using the Python programming
language, the core underlying functionality is written in C, which is a much faster language
and provide lots of performance enhancements to Python users who use the NumPy library
in their code.
So just to quickly summarize, the main idea of this section is that Python
is the most popular programming language for quantitative finance. But it's also a slow
language. So a lot of times you will use Python to trigger functionality that actually runs
in other programming languages that are faster. So now let's quickly talk about the algorithmic
trading process can kind of be broken down into the following steps, the first step is
to collect data, the second step is to develop a hypothesis for a strategy. The third step
step is to back test that strategy. Now, back testing just means formulating your strategy,
and then seeing how it would have performed historically over time.
And generally, there's
kind of two ways you want to do that you want to take the strategy back as far back in time
as you can, and across as many markets as you can. So to provide a quick example, let's
say that you have a hypothesis that the largest firms outperform, you would probably want
to test that here in the United States, and go back as far back in time as you can. And
then you would also want to test the performance of the largest firms in Canada, Europe, Japan,
China, India, all the other international markets as well. And if that strategy performs
well in all of those markets, then you can be sure that you're probably on to something.
Now, the fourth strategy would be to implement that we're sorry, the fourth step rather,
would be to implement that strategy in production. So you've collected data, you develop a hypothesis,
you've back testing strategy, and now you're going to start actually trading that strategy
with real money. In real accounts moving forward and seeing how it does with real money in
the strategy.
Now you have a high level understanding of the algorithmic trading process, there's
a few ways that this course will be different from a real algorithmic trading strategy.
And I just want to highlight those quickly right now before we dive into some information
about the three projects we'll be building. Now, just to quickly summarize the three major
ways that this course will be different. The first is that we'll be using random data,
the data provider that we're using for this course, which is is cloud is a paid API. So
if you want to use real data, you can follow the steps in this course, but actually pay
for an account and use a real API token. What we'll be using instead is we will be using
a sandbox API token which provides random data, but it's also free. So what that means
is you can you can do all the normal things that you can with the IRS code API, but the
data will be randomly generated. So we may see some interesting results because of that.
The second thing that is different is that we will not actually be executing trades in
this course.
Now, the reason why that is, is because many of the people who take this
course will be using different custodians to execute their trades. And each one of those
custodians will have a different API infrastructure. So I wanted to make this course as widely
applicable as possible. So what we're going to be doing instead is we're going to be imagining
that we work at a firm where our traders expect us to generate Excel documents and send them
to them to execute trades.
So the output of each of these three strategies will be an
Excel document that has the the name of each companies that need to be purchased and the
number of shares of each one of those companies that the trader needs to purchase. So we won't
be executing trades, but instead, we'll we'll be generating order sheets to send to our
imaginary traders further down the line. And just to reiterate that second point, we're
gonna be saving these recommended tricks to Excel files, which could then be sent to the
traders afterwards. Alright, so now that we have a bit of a high level understanding of
the algorithmic trading landscape, it's time to talk about API basics. configuration. So,
to start, let's discuss what is an API. API is an application programming interface. And
you probably aren't familiar with that if you've ever heard the term API before, but
that doesn't actually tell you what it does.
I think the easiest way to understand what
an API is, is, it's a way for your software to interact with and potentially control someone
else's software. So in this case, we're gonna be using the IE x cloud API. And what we're
going to do is use their API to access their database of financial data and import it into
our Python script. So like I say, API's allow you to interact with someone else's software
using your own code. And in this course, we're going to use the IRS code API to gather stock
market data to make investment decisions. And to provide you a quick example of what
that looks like. Here's a code snippet. So this is a cell from a Jupyter Notebook that
has four lines, you can see on the first line that we declare a symbol AAPL, and that is
the stock symbol for Apple on the NASDAQ exchange.
On the second line, we're creating a string
called API URL that has some interpolated values that interpolate in the symbol and
our AI x cloud API token token. And on the third line, we're using the requests library
of Python to provide a GET requests to that API URL. And then, on the last line, we're
just printing the data that gets returned. Now, this is kind of a lot. And if you don't
fully understand this, not to worry, you will see this in a lot more detail when we actually
build the projects for this course. So that's kind of what an API call looks like. And,
broadly speaking, in this course, we're only going to be using GET requests to gather data
from the IRS code API. Now, if you've never used an API before, there's many different
ways you can interact with them. But there's four major ones. And I'm going to show you
briefly some of the other ways to interact with an API right now before we proceed through
this course.
So get requests allow you to get data from the API. And that's what we'll
be using in this course. An alternative is a POST request. And that is a method for adding
data to the database that's exposed by the API. So it gives a request allows you to get
data from the API and a POST request allows you to push data to the API. a pull request
allows you to add and overwrite data in the database that's exposed by the API. Now you
can think of a POST request as a create only request. And you can think of a put request
as a create or replace request. So there's a slight difference there. And but even though
it's a small difference, it's an important difference, because that's kind of a common
source of bugs for software developers. And the fourth kind of important type of API call
is a delete, which, as it name implies, it deletes data from the API's database. So to
provide a quick summary of this section, we're going to be using exclusively GET requests
in our course.
But there's three main types of requests that you should understand those
are post, put, and delete. Now, like most software concepts, API's are best learned
through rigorous practice. Because of that, here is a URL that you can use to get a long
list of public API's that are great practice for you to interact with API's. Now, I would
probably recommend if you've never used an API before to proceed through this course,
that will give you a base fundamental understanding of how to work with API's.
And then once that
course is done, you can go to this public URL and practice with some other API's that
are of interest to you personally. So this is not something for you to look at right
now. But it's a good resource for you to head later on. Alright, so now that you have a
high level understanding of the course configuration and some API basics, let's move on to talking
about our equal way s&p 500 fund. So if you've never heard of it before, the s&p 500 is the
world's most popular stock market index. And many investment funds are benchmarked to the
s&p 500. That means that they seek to replicate the performance of this index by owning all
of the stocks that are held in the index. Now, if you've never really researched how
stocks are added to the s&p 500, there's a bit of a there's a committee process and some
some sophistication there. But the easiest way to understand it is that the s&p 500 broadly
covers the 500 largest companies in the United States.
So if you own the s&p 500 index fund,
you basically own the 500 largest companies in the US. Now aside from the fact that it
owns the 500 largest companies, one of the other most important characteristics of the
s&p 500 is that it is market capitalization weighted. Now, market capitalization is just
a finance term, that means basically size. So coming up with a large market capitalization
just means that it's a large company. And what this means is that larger companies get
a correspondingly larger weight in the s&p 500 index. For the first product of this course,
we're going to build an alternative version of the s&p 500 index fund that provide that,
that assigns the same way to each company in the index. So instead of Apple having a
very big way and best by having a small way, every company will have the same weight. So
that's going to be the first project of this course. And the second project we are going
to build a quantitative momentum screener.
Now, momentum investing means investing in
assets that have increased in price the most. To help you understand let's kind of go through
a quick example. Imagine that you have the choice between investing in two stocks that
have had the following returns over the last year. Apple has stock has gone up 35% and
Microsoft stock has gone up 20%. A momentum investing strategy would suggest investing
in Apple because of its higher recent price return. There are many other nuances to momentum
strategies, including the concept of high quality momentum.
And that basically means
all else being equal, you would want a stock that has steadily increasing price rather
than no price change for a while, and then a large jump at the end. But we'll explore
those when we actually build our quantitative momentum strategy later in this course. The
third project we're going to build is a quantitative value screener. Now, value investing just
means investing in stocks that are trading below their perceived intrinsic value. So
if you kind of want to make a very simple analogy, value investing is kind of the idea
of buying $1 for 75 cents, and hoping that you can sell it again for $1 later.
So value
investing is a very popular investing strategy because many of the world's best historical
investors like Warren Buffett, Seth Klarman, and Benjamin Graham have employed the strategy.
Creating algorithmic trading investing strategies relies on a concept called multiples and multiples
are simply a way that investors use to estimate how valuable a company is. So to be more specific
multiples are calculated by dividing a company's stock price by some measures of the company's
worth, like earnings or assets. So three examples of multiples that are used in value investing
are the price to earnings ratio, the price to book ratio, and the price to free cash
flow ratio.
Now, as the name implies, the price to earnings ratio is calculated by dividing
a company's stock price bytes earnings per share, the price to book value ratio is calculated
by dividing a company's stock price by its book value per share. And the price to free
cash flow ratio is calculated by dividing a company's stock price by its free cash flow
per share. So lots of different ways you can calculate multiples, but those are kind of
three, three of the most important ones Anyway, now, each of the individual multiples to us
by value investors has its pros and cons. One way to minimize the impact of any specific
multiple is by using what's called a composite. Now, a composite is just a average of many
different valuation strategies. So in our strategy that we build in this course, we're
going to actually be using a composite of five different value investing metrics.
Now,
with all that out of the way, you have a solid understanding of what we're going to be doing
in this course. And it's time for us to dig into the first project. So without further
ado, let's do just that. Alright, so it's now time for us to start tackling our first
project. If you've skipped ahead to this section of this course, there's a couple of things
that you should do before proceeding. So the first is to rewind back to the last section
and install all of your dependencies.
If you're a pretty experienced Python developer, the
main dependencies for this course are NumPy, pandas and XLS x writer. So if you already
have those on your machine, that's fine. If you've already been following throughout this
video so far, I'm in our algorithmic trading Python folder that we cloned our local computer,
I have my virtual environment activated. And what I'm going to do now is launch the Jupyter
Notebook for our first project. So to do that the command is Python dash M and then notebook.
So this should open up a browser window. And what we want to do is navigate into the starter
Files folder of the algorithmic trading repository, and then open up the equal wait s&p 500 dot
i py MB notebook, it should look like this. So basically, this is what we're going to
be working through in this video.
It has some instructions written in markdown and some
blank code cells. And that's what we're going to complete as we work through this section.
So let's start with just some background information. What we're going to do in this project is
to build an equal weight version of the s&p 500 index fund. The s&p 500 is an equal way
or not an equal weight, market weight, market cap weighted basket of the 500 largest companies
in the United States. Interestingly enough, it actually has more than 500 stocks in it
because five or so of the largest companies in the US actually have multiple share classes.
So an example of that is Google it has a Class A and A Class C share that trade on the stock
exchange. So since alphabet, the company is included in the s&p 500, both of its stocks
are and because of that, there's usually around 505 stocks in the s&p 500, despite the index
only representing 500 companies.
So anyways, to move on, what we're going to do now is
create a different version of the s&p 500 index fund that doesn't weight the stocks
in the index by market capitalization, but instead equal weights them so what this means
is that the larger companies in the index will have less weight than the traditional
version. And the smaller companies in the index will have a larger way than the traditional
version. So there's some backup information here just says that the largest one that's
benchmarked to this index is the spdr s&p 500 ETF trust that's commonly known as the
ticker spy spy, which is the ticker that it trades on on the stock exchange. And that
ETF has more than $250 billion dollars of assets under management, which is pretty crazy.
Most asset management firms don't have that much. Hmm. So a really big fun to be sure.
And it says here The goal of this section of the course is to create a Python script
that will accept the value of your money.
Portfolio and tell you how many shares of
each s&p 500 constituent that you need to purchase to get an equal wave version of the
index funds. So the first thing we need to do is, you know, like most Python scripts,
this project will rely on a number of open source software libraries. So we're going
to import those first, we install them to our local computer in the dependency section
of this course. And now we just need to import them. And we're going to go through them one
by one here, and I'll briefly explain what each of these libraries does in case you haven't
haven't worked with them in the past. Alright, so the first library we're going to import
is NumPy. And the command to do that is important NumPy. Now NumPy as NP now, NumPy is a numerical
computing library. And it's known for its fast actually execution speed.
And the reason
why NumPy is so fast is because it's actually a C, or maybe c++ module. So when you call
a Python function in NumPy, it actually executes the code in a different programming language.
And that programming language is kind of faster by design. So a NumPy is often used in finance
and other applications to speed up, you know, basic functions like summing or multiplication,
because it has modules running in a different programming language that is naturally faster.
So that's NumPy.
We're also going to import pandas, as PD. And pandas is a portmanteau
for panel data, or panel data is I guess, because it has an S on it. And what pandas
does is it makes it very easy to work with tabular data in Python. So tabular data is
anything that has rows and columns, you can think of an Excel spreadsheet as perhaps one
of the most commonly known versions of tabular data. And pandas is most widely use for its
data structure called the panda's data frame. And a data frame is just a data structure
that holds tabular data. So pandas allows you to store data in a data frame, and then
use many different built in pandas functions and methods to manipulate the data within
that data frame. You'll see lots of examples about this as you work through this or this
course. And if you're interested in learning more about pandas Free Code Camp has an excellent
YouTube video on the topic that I would highly recommend.
So that's pandas. And then there's
a couple others we need as well. So actually, before we move on, one thing that's worth
talking about is these aliases, so we didn't just import NumPy, we imported NumPy as NP.
And we didn't just import pandas, but we imported pandas as PD. So why do we do that? This just
save us a bit of typing because we're going to be often calling functions from these libraries.
So an example would be, let's just run this.
So those imports run and then create a new
code. So if we wanted to create a panda's data frame that, you know, will just create
an empty one. For an example, if we had just imported pandas without importing it as PD,
we would have to write panda's dot data frame. And this call is the class that instantiates
A panda's data frame object. Now, because we imported it under the alias PD, we can
just do PD dot data frame and it's a little bit more readable and faster to type. So it's
not necessary to import the under aliases. I know a few developers who just run pandas
DataFrame. But importing pandas under the alias PD and NumPy under the alias NP is pretty
common and kind of widely considered to be a best practice. So that's what we're going
to follow in this course. Alright, let's move on to importing the other libraries we need.
So the next one is requests. And requests is a very popular Python library that it's
kind of considered the gold standard for making HTTP requests.
Now, an HTTP request is just
basically an internet request that you can send to an API to get back some data. And
in this course, we're going to be using the requests library to execute our API calls
to the IE x cloud API to I guess, receive our stock market data that we need to calculate
the weightings for s&p 500 index funds, so the request libraries for HTTP requests. And
that's basically all you need to know about it for now.
The next thing we need to import
is XLS. s writer, this is a library that makes it very easy to save well formatted documents,
Excel documents from a Python script. So that's XLS x writer. And the last library we need
is math. So math is just a basic Python library that provides many of the basic mathematical
functions that you need to, I guess, execute operations within Python scripts.
Alright,
so once you run that code, all of your libraries will be imported. And the next thing that
we need to do is to import our list of stocks. Now, like I said, this is going to be a list
of about 500 stocks, maybe a little more if there's any stocks that are dual listed in
the s&p 501 thing to note about this project is that since these constituents change over
time, in an ideal world, you'd actually connect right to an Index Provider or an API, financial
data Data API or something like that. So that if a stock is added or removed from the s&p
500, then your equal weight s&p 500 strategy would reflect that now all of the tools that
would provide that information are paid and this is a free course. So I didn't want to
provide any paid resources in the free course that kind of defeats the point of it being
free. So what, what I did instead was I saved a list of the 500 stocks in the s&p 500 into
a CSV file, and you can click this link here to download that file.
So what we need to
do is click that, it'll go into our Downloads folder, and then we need to move that file
into our starter files, full starter files, excuse me, we need to move this file into
our starter Files folder so that it can be accessed by the other files in that directory.
So let's do that now. And then once that's done, we can import into our Jupyter Notebook.
So open up your finder app, or I guess it would be finder app or Windows Explorer or
whatever app you have on your operating system to explore files and navigate to your downloads
folder in one tab. And then in your other tab, you're going to want to navigate into
that algorithmic trading and Python course.
So here it is. And then specifically, you
want to navigate into the starter file folder and then drag this s&p 500 stocks dot CSV
into the starter Files folder like that. Okay, so now that that's done, what we need to do
is save the s&p 500 stocks as a panda's data frame. And the way to do that is we're going
to assign it to a panda's data frame named stocks. So stocks equals k dot read CSV, and
then the name of the file. So if you just type in SP, and then tab, it might autocomplete
there it is. And if it doesn't autocomplete, just type in the file name. So what is this,
it is a this is a method that lives within the pandas library that takes in data in the
form of a CSV, and then stores it into a panda's data frame. So you can verify that by just
doing type of stocks, and this should return panda's data frame.
Yep. And then if you print
it out, it'll actually show the data. So Jupyter Notebooks have this kind of special feature
where if you type a variable on the last line of a code cell, it will print it. So this
is what is contained in our, in our panda's data frame. Like I said, there's actually
505 stocks within this data frame, which means that there's five stocks that have dual shared
share structures in the s&p 500 at this time. So that's what we're going to be working with
throughout the rest of this tutorial. The next thing we need to do is acquire an API
token, like I mentioned in the introduction to this course, we're going to be using the
IE x cloud API to gather all of our financial data. And like most API's, they require some
form of authentification before you can pull data from it. So what we're actually gonna
be using in this course, is the sandbox mode of the IE x cloud API. And sandbox mode is
a kind of like a play mode, where you can make sure that all of your API calls work,
but it doesn't actually return real time financial data.
Instead, it returns randomized financial
data. And the reason why sandbox mode exists is because it allows you to test whether or
not your API calls function properly before you actually start to use the real API, and
incur data usage as a result. So the easiest way to to handle this, in my opinion, when
building this course, was to create a secrets.py file and secrets.py files are files that are
stored in your repository or on the server that's running this code that doesn't actually
get pushed to GitHub or any kind of remote Git store.
So the reason why is because most
of the information stored in a secret.py file is sensitive or confidential in some way,
and you just don't want that type of information stored in a centralized Git repository. So
you'll notice in the Git ignore of this course, the secrets.py file is actually included in
the Git ignore, which means that if you ever fork this repository to work on it, and then
push your changes up to GitHub, the information contained in the secrets.py file doesn't actually
get pushed up as well. So since it's actually a sandbox API key, it doesn't really matter
in this specific instance, whether the secrets.py file is exposed.
But keep in mind for future
Python courses are anything else that you work on that you should never share your secrets.py
file with anyone. But this course is kind of an exception, since we're working with
a sandbox API key. So to start, we're going to click here. And this will download a secrets.py
file to your downloads folder. Now, you may get a warning that says something like this,
in my case, it says this type of file can harm your computer do you want to keep secrets.pi
anyway, and we're going to keep it in and move it into the same starter Files folder
that we moved our CSV file of stocked into. So if you open back up your finder, app or
whatever app that you have on your operating system, you should have your two tabs still
open, you can click and drag the secrets.py file into the starter Files folder. And then
we need to import our IE x cloud API key into our script. So what we're going to do for
that is we're going to say from secrets, import, ie x, and then if you hit tab, it should autocomplete.
It says cannot import I yet have API token.
From secrets. I wonder where that is. usually
done. Okay, so this is equally as. Alright, so sometimes what happens with these Jupyter
Notebooks is that when you add a file to the working directory that the Jupyter Notebook
is opening, it actually doesn't recognize that it's there until you reset the kernel.
So let's try that we're going to say, restart the kernel. And then we're going to go back
to the top and run all of our code cells from the start. Awesome. And this time it imported
correctly. So in a lot of cases, I might run into little hiccups like that when working
through this code. And in some courses, people edit it out for a more clean experience. But
I would say in the majority of cases, I'm going to leave them in so that you can see
the debugging process and what I actually do to fix the problems that I encounter. So
that was a small example of that. You'll see more as you work through this course.
So now
we're going to make our first API call. So making API calls is a bit of a, an art because
every API works a little bit differently. And some API's have lackluster documentation.
In this case, the i x cloud API has excellent documentation. And I'm going to kind of show
you a bit of the documentation as we work through this section that creates our first
API call. So in this specific case, for the s&p 500 index fund, there's two things we
need, we need the market capitalization for each stock and the price of each stock. So
to start, we're just going to do an API call for one single stock. And then, to generalize
that, for our entire universe of stocks, what we're going to do is loop over every stock
that's contained in the panda's data frame that we created earlier, and then run that
same API, call it for each one of those stocks using a Python for loop. So to start, let's
just create symbol equals. And we're going to make this a string that is Apple's topic,
which is AAPL.
And then on the next line, we're going to say, what is the API URL? And
that's a good question. What API endpoint is that, that's what we're gonna find out.
Next. To do that, we're going to navigate over to the IX, cog documentations. And the
easiest way to find this is just go to is cloud. Doc's on a Google search. And it's
this first hit here, that is not an ad. So click through to this. And what this is, is
basically a massive single page resource for all the possible information that you might
need about the cloud API. ie x cloud has remarkably good documentation. There are other API's
out there that maybe aren't as commercialized or are not as mature that won't have documentation
that that's this good. So as you kind of look through this, keep in mind that this is probably
a better than average documentation page for an public API.
Now, the first thing we need
is the base URL for the API. And what the base URL is, is it's basically a URL that
will start every HTTP request. And then after that base URL, you have to add which specific
endpoint that you want to retrieve from the API. The way most API's work is they only
expose certain data through each endpoint, which makes things faster if you only need
to retrieve certain data. So one example might be, if you have the Google Maps API, you might
only want the name of a location. So you would send coordinates and get back its name, you
wouldn't get back other information like population or other sorts of things because you only
want name and sending only that limited information is faster than sending information you wouldn't
meet. So API endpoints are limited in that respect. But to start, the first thing we
did is the base URL.
Now, you can see in this first block of text called API reference that
the base URL for the API is HTTPS for slash cloud.ie. x API comm we're actually using
the sandbox mode of this API in this course, which means that we're not charged for any
usage. But we get randomized that In exchange, so you could actually use the sandbox mode
to create real investment strategies, but it's free. And it's still a good way to learn
about how data API's work in general. But because of that, we can't use this base URL,
what we need to do is a Ctrl F for sandbox. And if we kind of search through the first
little bit, we find this section called testing sandbox. It says this is cloud provides all
accounts a free unlimited use sandbox for testing, every account will be assigned to
test tokens available via the console, all sandbox API, or sorry, all sandbox endpoints
function the same as production, you only need to change the base URL and token.
So
what does that mean, we need to change the base URL, which means we need to grab this
instead of the base URL that was provided earlier. So copy that to your clipboard, move
back to our Jupyter Notebook and paste that right there. Awesome. So we have our base
URL. The next thing we need to do is figure out which endpoint we need. And as you can
see from looking at this, we ideally want to find an endpoint that provides both market
capitalization and stock price. There is one API endpoint provided by AI x cloud that provides
both of those, but I'm going to show you first how you would find just any endpoint that
provides one of those metrics and then I'll show you the input we're actually going to
use so let's start with market capitalization.
Note that is cloud does have this search bar
over here in the top left, but for some reason, I always just find myself using a Ctrl F instead.
So what we're gonna do here is do market cap or let's just do market cap not market capitalization
because it's sometimes abbreviated. So what do we have here for the first its enterprise
value and price to sales. So these are both metrics that are calculated using market capitalization,
they don't actually give us market capitalization. So neither of those is what we need. The next
one is actually a true market cap metric. And this is exactly what we need. But if you
look through the rest of the response attributes from this API endpoint, this doesn't actually
give you stock price. So we like I said, we ideally want to find an endpoint that has
both price and market cap.
And with a bit of searching, you would find that the, quote
endpoint provides both of those. So I just did a Ctrl F for for slash, quote, forward
slash. And this takes us to the information section about the quote endpoint. Now, every
endpoint description on the IE x cloud API starts with a section on how to execute the,
the HTTP request for that endpoint. So in this case, we want to execute a get HTTP request.
And the endpoint is for slash stock, and then symbol and then forward slash quote. So if
we just copy this, and add this to the API endpoint that we have in our Jupyter Notebook,
this actually won't work.
Because this, you know, curly brackets symbol thing here doesn't
mean anything in the context of this string. If we tried to execute this HTTP request as
it is, this will return a 400 error probably. So what we need to do is we need to transform
this API URL into an F string. Now, an F string, if you've never used them before, I'll give
you a very brief introduction. So if you have string equals Free Code, camp is awesome.
And then you said I need to change the values within this string based on some outside variable.
We will say we'll create a variable called adjective. And this just describes a word
that we want to call Free Code Camp, we will say the adjective is superb instead of awesome.
And then we want to pass that adjective variable where this awesome word is within within the
strength.
So the way to do that is you create an F string, now, as strings are called that
strings, because you simply put an F on the start of the string. And then this allows
you to interpolate values in so here, curly brackets, and then we type adjective. Now,
what this string actually stores is Free Code Camp is superb. And you can test that by printing
the string. Free Code Camp is super now we're going to use this string functionality to
create our API endpoint. And you're sorry, yeah, no, to create our API URL. So we're
going to do is we're going to cut this and just get rid of it.
And then here, this already
has the curly brackets that are associated with an F string. So all we need to do. And
actually, in addition to the curly brackets, it has the right variable name that we called
here, I think I did that by design, well, kind of building the code for this course.
So what we're doing is adding an F there. And then if we print this oops, if we print
this API URL, this should print out this string. But instead of symbol, it should say a PL
all caps. Perfect. That's exactly what we need. Okay. The last thing we need to do is
we need to add an appendage to the end of this API URL that that passes in our API token
to authenticate us through the i x cloud API, and basically say, this user does have permission
to access this API endpoint. Now the way to do that is you say, question mark token equals,
and then we're going to use an F string again, and say, is cloud token that we imported earlier,
if you just type in all caps, ie x and then hit tab, this will autocomplete for you.
Now,
if we print this, oh, sorry, I didn't actually have a print statement there. If we print
this. This gives us a fully fledged AI x cloud endpoint that we can pay using the requests
library. So moving on to that, what we need to do now is execute an HTTP request and store
the results of that HTTP request in some outside variable, we're going to call that variable
data. And what we need to assign it to is a requests dot get method. Now, if you've
never used the requests library, that's what we imported at the top of our script. Here.
It's kind of known as the world's best Python library for executing HTTP requests. Their
tagline is, their tagline is HTTP for humans. And I quite like that because it does describe
Well, what they do. So what we need to do here is just type in request dot get and then
pass in our API URL. Now, if we run that, what the heck is this data variable? The easiest
way to check is to just do the pipe function and pass in data.
And what it is is it's a
response object, which is contained within the LA models, module of the requests library.
Now, this response object has many different things inside of it. And one of the more interesting
ones is the status code. So we can do bank data dot status code, and I'm gonna hit tab
and see if that autocompletes okay didn't set out, you know, there it is.
And this gives
us a 404. Now, why the heck does it give us a 404? Clearly I did something wrong here.
Okay, so I just figured it out the error. And what I missed here was, there's an additional
kind of suffix that needs to get added to the i x cloud endpoint, called forward slash
stable. Now, why does that exist? If you look at this section, here, there's two different
naming conventions they can use, you can use stable or latest stable is kind of the latest
stable API version. As you can see here, well, latest, you can access the latest API version,
which may be in beta. So basically, stable is what you want to use if you want to make
sure that your application never breaks. And latest is what you want to use.
If you want
access to the most bleeding edge tech that might not be fully tested yet. So all we need
to do is go back here, and here type for slash stable for slash, and then run this. Now we
get a response 200, air response 200 code, I mean, which means that the HTTP request
was executed properly. And I actually don't like this format. It's not as accessible as
this status code attribute that we can access. And the reason I like this better is you can
say, you can test it using equality operator. So is this equal to 200? True. And that's
kind of an easy way to handle exceptions with HTTP. But if you say, if it's not equal to
200, oops, you could say if it's not equal to 200, then do something. Anyways. What we
need to do now is this object is not in any format that we can really access. So what
we're going to do is we're going to transform the data that this HTTP request returned to
us into a JSON object, using the dot JSON method, it actually accept no parameters.
So that's all we need to do.
Now, if we print the data object or the data variable, it gives
us a long JSON type kind of Python dictionary object that we can use to kind of parse objects
out. Now, if you've never worked much with Python dictionaries, they're very useful.
I'll make a very quick example here. dictionary equals now dictionaries are always kind of
created using curly brackets.
And what you can say is say A equals sorry, a colon one,
and B, colon two. So this is a Python dictionary called dictionary, very creative naming, the
reason why dictionaries are useless because you can pass in. So these entries here are
called key value pairs. This is the key this a and this is the value one. The reason why
dictionaries are useful is because you can pass in the key and it returns the value.
Now, this data variable will behave exactly the same way. So as an example, if you want
to access this, so this is one key value pair.
And if you want to access the value, which
is AAPL, all you have to do is pass in the key, which is symbol in square brackets like
that, and this should return an apple perfect. So the two attributes that we wanted of this
our market capitalization and price. So we're going to parse that in the next section of
this course. Okay, so what we're gonna do here is say price equals data. And then where
is the price contained in this, the easiest way to tell it to do a Ctrl F. And you can
see there's a whole bunch of different price variables here.
The one we want is probably
not calculation price, latest price sounds right. So that's the latest price provided
by the IETF party API. Now, if you ever see something like this, and you say, Hmm, I don't
know what latest price means. That's kind of a bad example. But let's look at something
else in here. That's not as obvious. All right, this variable extended change. Now what the
heck is extended change the is called documentation. So that should tell you so if you just copy
that extended change string to your clipboard, and then go back to the IE x cloud docs, run
a Ctrl F search and type in extended change, it will take you not just to this JSON object,
but there should be a description below.
So as you can see here, extended change refers
to the price change between extended price and latest price. So without knowing what
those two means this is kind of meaningless. But this was just an example to show you how
you can look up the meaning of different data points within the JSON response in the i x
cloud docs. Okay, so that's how to parse price. We can print it out and see what the actual
prices 515 28. Now, I actually am not sure if the is cloud price data is randomized in
our sandbox mode, you can test it pretty easily by just doing NASDAQ Apple on Google search
for 97. Did it ever get to 515? Okay, so I don't think that this price is accurate, but
it's actually not far off from the real world price.
So this price data is randomized, but
it's interestingly, quite close to the real price. So that's how you price out price.
And then for market cap, we'll just say market cap, Eagles data, and then market cap. And
similarly, you can print market cap. Awesome. This gives like a, I think this is I'm gonna
divide this by a trillion. That's sorry, that's only a billion. That's a trillion.
So this
gives us a market cap of 2.1 8 trillion, which again, I think is pretty close to Apple's
real market capitalization. Yeah, so they're at 2.1 3 trillion. So in both cases, pretty
similar, although, still not exactly correct. Okay, so that's how you parse a API call.
What we're going to do now is we're basically going to scale up that process to all of the
stocks within our s&p 500. csv, and then store all of the responses in a panda's data frame
that we will later save into an Excel file. So that's what's up next. Alright, so the
first thing we need to do is actually, we're going to exit out of this Ctrl F box here,
because it was lighting that up and causing me some eggs. The first thing we need to do
is specify the columns of our panda's data frame. And to do that, we're going to need
to know exactly what we're building. So we're going to build a panda's data frame that has
a few columns, it's going to have the ticker for each stock, it's going to have the stock
price for each stock is going to have the market capitalization of each stock, and then
it's going to have the number of shares to buy for each stock.
So let's start by specifying
that in a variable called my columns, and this is just going to be a Python list. And
the entries are going to be thicker. stock price, market capitalization, hopefully, I
can spell everything right. number of shares to buy. Okay, so that's the Python list. And
then we're going to do now is create a blank panda's data frame that has those columns
specified. So to do that, we're going to call this variable final data frame, because we're
going to this will be the final output of this program at the end. And to create a panda's
data frame, you use the PD dot data frame, class instantiation. And inside of this, we're
going to pass in columns equals my columns. Now, you notice that since this is columns,
I didn't want to call this columns, because columns equals columns is a bit confusing,
although I think it would work. Let's try it out.
So that does work. I just think it's
quite a bit easier to read. If you say caught my columns, it kind of removes that repetition.
Okay, so our panda's data frame has been created. What does it look like? pandas, data frames
are just kind of two ways to print them. As with most things in a Jupyter Notebook, you
can either use a formal print statement like that, or you can just list the variable name
as the last line of code. So this is kind of what it looks like. There's no data in
here right now. So it's kind of empty. But what we can do is just do create a list of
a list, that's going to be 0000. And this is kind of how pandas dataframes typically
render. So one of the advantages of not using a formal print statement is that when you
just list the data frame variable name on the last line of a code sale, it will render
in this nice format that kind of allows you to mouse over it, and it changes color.
And
it's just a little bit nicer to look at than the alternative, if you use a print statement
will print like this in just plain text, and it's not quite as nice. So that's what it
looks like when you print it with plain text. Actually, you know, the, I'm not a big fan
of it at all. So we'll stick to printing data frames, specifically like this. So that's
what our data frame looks like. I'm going to get rid of this line of zeros because that
was just to show you how they render. And that's what we're going to be working with
moving forward. So what we're going to do now is I'm going to show you how to append
these data points to this panda's data frame. So what we're going to use to do that is the
append method. So final data frame dot append.
And then what we need to put in here is a
panda's series that lists all of the entries in the data frame. So I'll just quickly explain
what a panda's series before a panda's series is before we move on, so if we add back this
row of zeros to our panda's data frame, like that. So a panda's data frame is a two dimensional
data structure, which means it has rows and columns, whereas a panda's series is a one
dimensional data frame.
If you've ever worked much in Python, it's similar to a Python list.
And if you've ever worked with NumPy, it's similar to a NumPy array just has different
methods and functions associated with it. But within a panda's data frame, every row
and every column of that data frame is a panda's series. So, in order to a panda's data to
the bottom of a panda's data frame, that data needs to be a panda's series. I'll show you
what I mean now. So let's take this row zeros out and rerun this code cell. And then what
we need to do here is create a panda's series with PD dot series. And then the pandas series
accepts a Python list. So lots of different layers here. But what we're eventually going
to work out to is adding a row to this panda's data frame. So the first thing we're going
to add is the name of the stock, the routing, which was stored in the symbol variable earlier.
And then we're going to add our price and market cap variables which we created earlier
price market cap, and then we're going to add another cell called N A.
And what does
this na mean? Well, this last column, the number of shares to buy, we actually can't
calculate it until we have pulled in all the metrics for every stock. So what we're going
to do is just start as na now, and later on, we'll go back in access that data and change
it once we've actually pulled in data for all of our stocks. So what happens when we
run this code cell? Oh, we missed a comma in there. Now we get an error can only append
a series if ignore index equals true. So this is super super common. We're working in pandas,
this ignore index equals true, basically, almost always needs to be added to the append
method whenever you're appending data to a panda's data frame. So this is when we get
Whoa, this isn't what we wanted. And as you can see, all of these columns that we created
have na n values in them, which means not a number. And then over here, we have the
data that we actually want it now why is that? It's because we didn't tell the append method,
which columns to add that data to.
So what we need to do to fix that is we need to add
another argument here that says, index equals my columns. Oh, I missed a comma, I think
No. Oh, so the first sorry, the problem here is that this index equals my column needs
to be within the pandas series and not within the panda's data frame. So small difference
there. So when we run this, we're missing a comma. Again, when we run this, this actually
generates what we want. And so to give a quick recap, we created a final data frame dot append
method. And then within that final data frame dot append method, we created a pandas series
that has all of the data points we wanted. And then we specified index equals my columns,
which tells the append method which columns to add this data to. And then the last thing
we did was we added to ignore index equals true, which is kind of necessary whenever
you're appending data to a panda's data frame. Alright, so what we need to do now is loop
through every ticker in our stocks variable, and execute an API call for that stock, and
then store the results of that API call in our panda's data frame.
So to start, we're
going to overwrite our old panda's data frame variable with an empty panda's data frame
that has the same columns as before, so columns equals my columns, then, we need to create
a for loop that says for stock stock in stocks, ticker. And to start, let's just print all
of these print stock, what happens? Great. So we're successfully looping through all
the stocks in that data frame, we can get rid of that print statement. Now what happens
next, we, we need to actually create an API call for each stock. So to do that, we'll
scroll up to our old API call cell.
And we're going to copy both of these lines, copy and
then scroll down. Alright, so both of those are getting copied in here, this needs to
get indented. And then instead of symbol here, we're going to write stock. R IE x cloud API
token is the same, the data is the same. So that's all good. Now, one thing to note about
looping through our list of stocks in this way, is that it's going to be really, really
slow. And the reason why it's slow is because executing an HTTP request is one of the slowest
things that you can do in Python. So this slide here, where we actually execute the
request is very, very slow. Later on, we'll see how to improve the performance of this
code by doing batch API requests that make it very easy to retrieve information on multiple
stocks with a single API requests.
But that's kind of a more advanced topic. So we're going
to go through with single API requests first. Okay, so now we need to basically recreate
the append statement that we use earlier. So final data frame dot append. Inside the
append statement, we need to create a panda's series of sorry, PD dot series, and inside
that pandas series, we need to create a Python list. So lots going on here. The first thing
that we're going to add is stock.
Now what is stock stock is the incrementer of this
for loop. So this will actually be the ticker of the stock we're working with. The next
thing we need to add is data. parsed to latest price, so this is the same person that we
did before, this will give us our stock price, which is the second column of the data frame.
The last thing is data, and then market cap, which is the market capitalization just like
we did earlier. And then just like before, we will do na for the last row within this
pandas series, function, we need to add index equals my columns. And then outside of the
pandas series as before, we need to add, ignore in index equals true. Now, since this is going
to be so slow, like I mentioned, let's just do this for five stocks and see what happens.
Next, oh, there's a missing comma right there, run. Great. Now if we print final DataFrame,
what's gonna happen? You think it'll have information for all five stocks, and nothing's
there. Why is that? Now, the reason why that is, is because this final data frame dot append
method doesn't actually modify the original data frame unless you tell it to.
So to do
that, the easiest way is to just say, final data frame. I can't spell today final data
frame equals equals, and then the append statement. So if we run this, and then run the following,
so we should see a different output. Awesome. So this is for the first five stocks in our
list of stocks. If we take off this little argument here, this will take a very long
time to run, let's just see that in action. I'm going to run this print statement too.
And then while this runs, I'm going to have a little sip of my coffee and cut out the
rest of this until it's done. Alright, so as you can see, this is a very long panda's
data frame that has all of the information we needed. Since it took so long, we're going
to move on now to using batch API calls to improve the performance of this code.
Now,
I x cloud, like most data providers actually gives you discounts if you use batch API calls,
because it's a lot less load on their infrastructure. So that's one reason to use them. The fact
that it speeds up your code is another reason to use them. Overall, if you can use batch
API calls in your scripts, it's generally a good practice. So pay attention to this
section because it's really important. I x cloud as a data provider limits their batch
API calls to 100 tickers. So what we need to do first is find some way to split our
list of tickers up into sub lists of length 100.
Now, it's not super intuitive on how
to do that. But basically, what I did just to show you is I said, how to split a list
into sub lists. And in Python. Awesome, here's the sorry, that took a bit of time, I had
to skip ahead. But here's the function that I was talking about. So it's called chunks.
And in the finished files of this course, you'll see that actually gave credit to this.
It has 3000 uploads, which is pretty crazy, even for Stack Overflow. So this is what we'll
be using to split our list or our pandas series, in this case into chunks of size n. So we'll
run that code cell to define this function. Okay, what we need to do now is use this chunks
function to create a list of lists where every list is no longer than 100. So to do that,
we'll say, symbol groups that will we'll call this equals chunks, and then the list we want
to split is stocks, ticker.
Now, the reason why we have to pass this as ticker is because
this is a panda's data frame. And ticker is the header of the only column. So if we just
did stocks, this gives us the data frame. If we do stocks at header ticker, that gives
us a series which we can pass into this chunks function. And everyone say 100. So this gives
us a chunk generator.
And to get the actual lists, we just have to pass it into a list
function. So we'll do that. And it will say symbol groups, see what that gives us. Okay,
so this gives us a list of lists, or more specifically gives us a list of pandas series.
So this is from index zero to index 99. This is from index 100, index 199. This is from
index 200 to 299 300 to 399 400 to 49. At the end, we have this one that's much shorter,
it's 500 to 504. And that's because there's 505 stocks in the s&p 500. And this just gives
us groups of 100. At most, and since this one only has five minutes shorter. Okay, so
moving back up, what we need to do now is make a for loop that lists loops through every
list, or sorry, every panda's series within that list, executes a batch API call. And
then for every stock in that list, appends the information from that stock to our final
data frame.
So first thing to do is for i in range, zero to length of symbol groups,
and then print. So what does this give us this gives us 012345. Now, those are all the
indices This symbol group groups list. So if we do this, oh, sorry, that's not what
I meant to do print that. This will just print out all of the lists within our list of lists
called symbol groups. So now what we need to do is we need to transform all of the stocks
that's in each of those lists into a string, and that string will be passed into the URL
of the HTTP request that we're executing.
So first, we'll just create an empty list
called symbol strings. And what this is, is it's going to be a list of strings, where
each string is a comma separated string of all the stocks in this object. So all we need
to do is use the append method tabbies in so we're going to do symbol strings, dot 10.
And then what we're going to use is the join method called on the comma character join,
and then symbol groups. Now, if you've never used the join method before, basically what
this does is it says, Take all of the elements of this and join them together by separating
them with this.
So to see that in action, we'll just do print symbol, strings. Strings
at index i. Alright, so here's one example. Here's another example. And you can tell these
examples apart. Because at the end of the line, there's a gap. Sorry, this is one example,
I had one extra line there, you can see this example ends there because there's a gap.
And it starts on the line after this, because this is the last line of the first example.
So there we have our five lists of symbol strings, I'm going to comment this out, because
we don't want that printing every time we run this code. So the next thing we need to
do is create a blank final data frame again. So we're using the same variable name, which
probably not a good practice, but we're overriding it each time.
So equals P dot data, frame,
and columns equals my columns, you guys know the drill by now. Now, let's just print out
to make sure that it looks okay. Awesome, so we can delete this out. Okay, the next
thing we need to do is loop through every string in our symbol strings object, and then
use that string to execute a batch API request. So we're gonna use a for loop for this. And
to keep the, the nomenclature pretty simple, we're going to just say for symbol string,
in symbol, strings. And then we can actually print these out to make sure that this is
working right. Simple string. Awesome, that looks good. We're going to take away that
print statement now.
And then we need to create a batch API call URL. And as before, this
is going to be an F string where we pass in the the base URL of the API endpoint, and
then pass in all of the strings through for each of these symbol strings into the endpoint
to get 100 to get data for 100 stocks back. Now, if you've never run a batch API call
through is called before you probably are unsure how to do this, let's just go to the
docks here and do a Ctrl F for batch and see what comes up batch requests. Okay. So
So what it says here is that use market to query multiple symbols, okay, so this is what
we want this stock market. And then, as you can see, you pass in one of the parameters
as symbols equals a comma separated value of symbols, and then types equals a comma
separated value of API endpoints that you want to hit. And then if applicable range
tells you how much data to pull for. And then at the very end of this, you would pass on
your token as well.
Because we're going to do is oops, did not mean to click that. What
we're going to do is copy this, go back to our Jupyter Notebook. We're going to pass
that in here for now. And then we're going to scroll up to our last API call and get
this base URL, including the stable that I missed earlier. We're going to copy this here.
We're going to delete the double forward slash. And then we're going to move over to the end
of the API call and change a few things.
So the first thing we're going to do is remove
this last parameter and this range parameter. We don't need those for the API call that
we're doing. The next thing we're going to do is remove the chart and news endpoint from
that example, because we don't need them. Next, we're going to actually remove these
symbols that are hard coded into this, we're going to do a variable interpolation interpolation
for a symbol string. And then what we're going to do is move to the very end. And as before,
we can just copy on the IaaS cloud API token that we had in our original API call. So copy
this scroll. Like down, go to the end of the string and add it. Awesome. Now, let's just
do for symbol string and symbol string up to, but not including the first one. And then
we're going to just print this and see what it gives us. Oh, and I spelt that wrong here,
which I'm going to correct now. And then print API call URL. Now we're getting an error,
that API call URL is not defined.
There's three L's here. That's why I run that again.
Alright, so this gives us a long URL that looks okay to me. One really easy way to test
this is to just click on it. It's a 400. Bad request. I'm not sure why. Stable slash stock
slash bash, let's make sure that that matches, slash stock market slash batch. Symbols equals
symbol string. Types equals one.
Oh, so the problem here is that when you're passing on
multiple parameters to an API request, so this first character is a question mark, and
then every successive character that chains these parameters together needs to be an ampersand.
So we're going to copy this ampersand and replace this question mark with it. And I
think that should fix the problem. Great. So this is actually opened in a browser, the
JSON that is going to be returned from this API request.
So now we can do what we did
before, use the requests library to get this data. So I'm gonna do data equals requests,
dot get, and then that URL. And then as before, we're going to pass the JSON method onto this.
Actually, before we do that, let's just do print data dot status code. So returns 200,
which means it's working properly. Now what we can do is pass the JSON method on to this
data variable. And then we need to parse data from this. So more specifically, we need to
parse data for every specific stock that's in this symbol string. Now the way to do this
is just like before, how we use this join method to join together all of the different
strings that were contained in that list, you can use the split method to basically
do the opposite. So we can say, or symbol in symbol, string dot split, and we're going
to split on the comma character.
Let's just print every symbol and see what happens. Awesome.
So it looks like it's successfully looping through all of the first 100 stocks there.
And what we need to do now is parse the batch API call for data for every specific stock
in the order that we need, and then append that data to the panda's data frame. So just
like before, we're going to use the append method, we're gonna say final data frame,
equals final data frame, dot append.
And then in here, we're going to specify that it's
a panda's series. And inside the pandas series, we're going to pass in a Python list. So all
of that is exactly what we did before. So I kind of skipped over it. And then here,
we're going to pass in the same for variable, so symbol, and then price, which will be data.
And then we have to do actually multiple levels of parsing this time, because not only do
we have to parse the metric for that stock, but we also have to parse the batch API call
to get the information for that specific stock. So here's how you do that. We do symbol, and
then quote, so this is the stock and then the endpoint, and then we need to parse out
the metric latest price.
And we can follow similar logic here to get
the market capitalization. So market cap, and then just like before, we are going to
not specify the last column, we will calculate that later. Let's run this and see what happens.
Actually, I'm going to add one last line to this set print scalar data frame. All right,
let's run this and see what happens. Type error can only append a series. This is that
ignore index specification that I mentioned earlier. So that needs to get added in right
here. And actually now that I think about it, inside this pandas series class instantiation,
we need to pass in index equals my columns.
Alright, let's try this. Invalid index index
and ignore index equals true. I'm missing a comma there. One more time. Awesome. So
as you can see, we've successfully upended the ticker, the stock price, the market capitalization
and the number of shares to buy the only other Change we have to make this code sell is to
remove this little appendage right here, that made it loop only over the first instance.
So now when we run this, it will actually loop over every single string and add every
stock within our list to our panda's data frame. So let's try that. Awesome. As you
can see, that was way, way faster than using individual API calls for each stock, the other
one took probably two or three minutes. And that was only a few seconds.
So huge improvement
there. I mean, in theory, if all the API requests take the same amount of time using batch API
call should make it 100 times faster because it does one call for every 100 stocks, but
anyways, much faster in any case, and it looks like we go all the way from A to Z, all the
data looks okay, so we're ready to proceed to calculating the number of shares to buy.
Alright, so the reason why we left this calculating the number of shares to buy step until later
in the script is because I want this to actually work regardless of how big your portfolio
is. So how we're going to handle that is, we're going to create a Python input that
says how large is your portfolio, and then based on what you tell Python, it will calculate
the number of shares to buy accordingly. So to do that, we're going to use pythons input
function. And more specifically, we're gonna say, portfolio size is equal to input.
And
then inside this input, it accepts a string. And that string is kind of what question you
want, Python asked you. So here's how it's the looks, enter the value of your portfolio.
Now if we run this, you can see there's an input that says enter the value of your portfolio,
we can write, you know, $1,000. And then we can access that value later on in the script.
So now, if we do, you know, portfolio size, it'll print out 1000. Note that this is actually
a string. So that fact that this accepts a string is actually something we need to make
sure we handle because since it's a string, you could say I don't have a portfolio. And
that would actually kind of break the rest of this script, because it's going to try
to do mathematical operations on this string, which is obviously impossible. So what we
need to do is create kind of a try accept statement.
So if you're not familiar with
try accept, basically, the way these work is, try except, you could say, try to do this.
And if this doesn't work, it will do whatever is specified here. So one example would be
like, if you say, two plus n, and is not defined, so then it will print out, you know, and is
not fine. We're going to comment this out, so it doesn't run. Now if we run this, it
should print n is not defined. Awesome. Now, if we say two plus two, it will doesn't print
it. But if we wrapped it in a print statement would four, okay, so we're going to use this
same logic to handle the fact that people might not put number variables into this portfolio
size input.
So the first thing to do is uncomment. This, the next thing we need to do is to try
to kind of handle our input as a float variable. So to do that, we're just gonna create a variable
called Val. And we're going to say that's equal to float, portfolio size. Now, this
float function is basically going to take this portfolio size variable and try to force
it to be a float. So if we say, print vow, actually, no ignore that we're going to do
is put that in here and say, print vow.
And then for this, except we're going to say,
Please enter an integer. Now, if we put in a sentence will say, again, I don't have a
portfolio returns, please enter an integer. Now, this broad try accept statement is actually
not good. And the reason why is because this will react regardless of what type of error
is happening. So what we need to do to make sure that it only handles this specific type
of error that we're trying to address is we need to try to do this and kind of force an
error.
So we'll put a string in there. And then as you can see, the type of error that's
created is a value error. So we need to basically change this script so that it only reacts
to value errors instead of reacting to all types of errors. And the way that you do that
is you put the type of error right here and run this again and see that it does handle
this properly. Okay, so what we want to do now is instead of just handling this, instead
of just handling this and saying please enter an integer, we need to actually make it redo
this. So how does this look? Please enter an integer. All we want to do is say in here,
enter the value of your portfolio. And then above this, we want to tell them exactly why
they're getting this input prompt again. So we're going to print and say, That's, please
double brackets, that's not a number, and then four slash, and so that'll create a new
line, please.
Please try again. And then now equals float. Awesome. So just to recap, we
try to get the value of someone's portfolio using the input function. In the first step
of the try accept block, we try to force that that that input ID value to be a flow variable
and assign it to a variable called Val. If that doesn't work, because of a value error,
then we print down, that's not a number, please try again. And then we prompt them to enter
the value of their portfolio again, and then we, we force it to be a strongly typed float
again. Now, one interesting thing to note about this is that if you do it twice, string,
and then string, it will still trigger a value error. So you need to kind of rely that the
user of this script is going to be like, I guess, astute enough to not enter a string
twice. So it might look something like this, they go and say, enter the value of your portfolio
$1,000.
And they say, That's not a number, please try again. So then you say, Oh, it's
$1,000. And then you can proceed through the rest of the script. Okay, so now that we've
accepted the number from the user of this script, we need to actually use that number
to base our share calculations on now since this is equal weight, every stock in the portfolio
will have the same position size and position size is kind of just a fancy financial term
that says how much money you're going to invest in each stock now, Center's 505.
Stocks, an
easy example would be if your portfolio is $505,000, all you would do is invest $1,000
in each stock. So as you can see, the easiest way to calculate that is just to say, position
size, is equal to the value divided by the length of any column from the final data frame.
So in this case, we'll just do final data frame dot index, and then print position size.
This says, We made a small portfolio here of $1,000. So it's just you know, saying give
us $1.98 in each stock, let's use something more reasonable, like a million dollars. Awesome.
So now saying invest, you know, $19,000 in each stock, that sounds about right to me,
Oh, hang on, is this a million or 123, that's 10 million. So $19,000 in each stock, or almost
20,000 makes sense for a $10 million portfolio. Alright, so we have the position size now.
Now we need to calculate how many shares of each stock need to be purchased to get to
that position size in that stock.
So let's take an easy example. Apple stock price is
about $500. Right now. So if we wanted, you know, number of Apple shares to buy, you would
just say, position, size divided by Apple stock price, which is $500. ish. And then
you can print the number of Apple shares. And in this case, it says, okay, buy 39.6
Apple shares. Now, this introduces an interesting concept in that a lot of places do not support
fractional trading, which means that you can buy a portion of a share, you can only buy
whole shares.
So you can buy 39 shares, or you can buy 40 shares, but you cannot buy
39.6 shares. So how do we handle this? Well, it's tempting to just say, let's round it
to the closest number up or down. But what might happen then is that if you are rounding
up more integers than you're rounding down, you'll actually end up buying more stock than
you have money for. And that is not a good thing, you would get to the end of your allocation
and say, oh, man, I don't actually have enough money left over to get to my position size
target on this stock.
So because of that, we actually have to round each of these down.
And that's where that math module that we imported earlier comes in. So math dot floor
is pythons round down function. So if we print this, that should give us 39. Oh, shoot, math
is not defined. I think it might be lowercase n. Yeah, math floor. So that gives us 39.
Okay, so this is kind of like the logical intuition on how to do this, we now need to
programmatically loop through every row of our panda's data frame, and actually apply
this logic to the number of shares by column. Now to do this, I'm just going to comment
that out because it'll be useful for us later. Alright, so we're going to say for i in range
from zero, and then we needed to go to the last row of the data frame, and the easiest
way to do that is just say the length of any row within the data frame.
What I typically
use is final data frame index. As you can see, this logic is actually basically what
I used here as well. So Let's use this and then just print everything in here. All right,
this goes from goes from zero to 504. Let's just make sure that adds up with the data
frame we clicked earlier or printed earlier, I should say, it goes from zero to 504. Awesome.
Now, we need to access this cell in each row of the data frame. The easiest way to do that
is using the LLC method.
So what we're gonna do is, say final data frame, dot LLC. And
this is a basically like an easy row column way of accessing data in pandas. So for the
row, we're going to say I and for the column, we're going to say, the number of shares to
buy. And let's just print this to see if it prints a bunch of Ma, because that's what
it should print. Awesome. So we're correctly accessing the data. Now we need to actually
assign the data. So to do that, you can just use the equality operator and then say, math
dot floor. And then we need to do basically this calculation here, position size divided
by the stock price. So we'll say, position, size divided by now what are we going to do
for stock price? Do we have to execute another API call and pull that data from IE x again,
we actually don't and the reason why is because that data is already stored in the panda's
data frame.
So we can take similar logic to this LFC method here and put that there, paste
it, and just change this to stock. Price. Oh, this tab, autocomplete is not working.
So if we run this, what happens? It ran correctly. Let's do this again, and print the data frame
to see if it looks okay. Awesome. So as you can see, it correctly kind of shows the number
of shares to buy here. One easy way to fact check this, well, there's a couple things
you knew, the first thing is to say, Okay, this apple stock price is close to the 39
that we calculated earlier. So that's a good sign. Another thing you can do is say, look
at two stocks that have different stock prices, and the one that has the higher stock price
should be recommended to buy less shares than one that has a lower stock price.
So let's
just use the first two rows of this data frame as an example, you can see that for this stock
with ticker a, it costs $100 to buy it and you're buying 197. And then the second stock
has a much lower stock price. And because of that, you have to buy many more shares
to actually reach your target position size. Now, that kind of all looks good. So we can
move on, I'm going to take away this comment code and just tidy this up a little bit. And
we can proceed to the next cell. All right. So this brings us to the last section of this
project, which is to save our panda's data frame into an XLS. x file for non technical
users to access to excel. So the big idea here is that we did all the work in a Python
script, you could schedule the script to run periodically, and then the output would be
sent to your trading team to actually buy and sell stocks in response to the output.
So to do this, we're going to use the XLS x writer library for Python.
Now, oops, the
new mean remnant. If you remember, at the start of this project, many lines of code
ago, we actually imported this here. So it's actually in our script already, we just need
to use it now to save our data frame as an Excel file. Now XLS x writer is an excellent
package and offers an insane amount of customization. But the trade off for this is that the library
can seem a bit complex to new users, I'm going to do my best to explain it well throughout
this section. And because of that, this might be a bit long.
So if this seems easy, or you've
used that asset, if this seems easy, or you've used XLS x writer in the past, you can feel
free to speed up the speed here a bit or skip ahead. Alright, so the first thing we need
to do is actually initialize our XLS x writer object. And the way that we do that is by
saying writer equals PD dot XML writer. And then here, we need to pass in the file name
that we want to save to in this case, we're going to do recommended trades dot XLS x.
And then we need to specify the engine, which is XLS x writer. Now, let's break this down
a little bit to kind of describe what's going on here. As you can see, the class that we're
initializing here is from the pandas library, and not actually from XLS x writer.
And the
reason why that is, is because since pandas deals with tabular data, and so much tabular
data is actually saved to Excel files. They have a very tightly coupled integration. And
it actually kind of is easiest to initialize a new writer object from pandas rather than
XLS x writer and kind of by default, you always save these objects to a variable called writer.
It allows you to reuse your code without having to change the object name later and it's kind
of just a best practice.
So when I say writer equals P dot x, x l writer, and then this
is the filename that we want to save to and then the engine equals XLS x writer. It may
seem a bit redundant, but this library can also be used to Save XML file. So we have
to actually specify that we want to work in Excel. So that's the first thing we need to
do. The next thing we need to do is actually to pass our panda's data frame into this object
in specify which tab of the Excel file we want it to be saved to.
So to do that, we
do final data frame.to Excel. And then the first thing we pass in is that writer object.
The next thing we pass in is the name of the tab we want it to be saved to so we'll say
recommended trade again. And the last thing we pass in is an index equals false argument.
Alright, so our object has been created. The next thing we need to do is to create the
format's we need for our XLS x file. Now, earlier when I said that this library can
be confusing or complicated to new users, this is where it gets really, really complicated.
Formatting Excel files with XLS x writer is a science and there's a lot to it. But I'm
going to try to keep things simple here. And hopefully, you guys won't run into any hiccups.
First things we're going to do is create, I guess, two variables that specify the color
scheme for our Excel sheet. And then we will reference those variables in all the formats
that we create later on.
So that if we want to change one color, it will be reflected
across all the different formats that we create later, instead of hard coding them in and
having to change many different instances have a specific color. So specifically, what
we're going to create is a background color variable and a font color variable. So background
color, and these are just going to be empty strings for now. font color. Alright, so what
goes in here, these actually store the HTML hex codes for the very first for the colors
that we want to select.
So I'm going to use two colors that kind of match the style of
the Free Code Camp website, I really like it, it's command line chic, and very modern.
So we're just going to follow that to make things easy, the background color is going
to be zero, a zero, a two, three, and the font color is going to be at six s.
Alright,
so that's that. Now we need to actually create a few different formats that we will be referencing
when we actually apply the format's to our cells in our Excel sheet later on. So as you
can see up here, we need a string format $1 format with decimals, $1 format without decimals
and an integer format. So let's start with string format. And then what we're going to
do here is create a writer dot buck dot add format method. Inside this add format method,
we have to create a dictionary. And this dictionary specifies the format of the cells that are
going to have this applied to them later.
So what goes in here, there's three main attributes
on color, background, color, and border. Now each of these should actually be strings,
I'm going to fix that indenting issue. Since this is a dictionary, what is going on, I
find the indenting on Jupyter Notebooks leaves a lot to be desired. Since this is a dictionary,
these should be strings. And we have to add colons to each one. All right, so the font
color is going to be font color. And this font color variable is what we defined up
here, this background color is going to be background color. And this border is going
to be one, this just means that a solid border around each one now. This is our first one,
let's run this code to make sure it works properly.
Awesome. Now we need to basically
template off of this and create a few other formats. So we have string format, and we
need to name the other ones that we described before this code cell is a string format,
dollar format, and in integer format, now, all of these are going to share the same attributes
that the string format specified, we just need to add a few things to each one. So for
dollar format, we need to specify a number format attribute like this. And the way that
this works is actually pretty easy. You just pass in a number with zeros in it templated
the way that you want it to be formatted. So in this case, it will look like that. And
then we will do the same thing for the integer format. So num format and then we're just
going to say zero. Awesome. Let's run this code cell to make sure there's no syntax errors.
Excellent.
Alright, so now we need to apply the format that we just created to our XLS
x file. And this is like a big complicated so what I did here was I actually provided
an example and we can kind of use this to template on for all of the columns in our
sheet. So what we need to do is say writer dot sheets, and then parse out the recommended
trade sheet from an object. And then we use the set column method on that item to success.
To find information about the column we're doing now, we're gonna do this one by one
first, and then we're going to kind of put everything into a loop because it'll be a
bit more efficient on code and much more readable. So let's start with column A. And you actually
can't just put a, you have to put a through a like that. So that tells you the column.
And then we need to specify the column width, let's do 18 pixels. And then you need to say
which format you want to apply to it. Column A is our ticker format.
So we need string
format, it's not a number. Now, if you run that, it returns zero. And that basically
tells you that it ran properly, the way to actually see this in action is you need to
actually save your writer object. Now, to do that, you say writer dot save. And then
if you go to your working directory, where you're running through this, you should be
able to open up your recommended trades. Excel file and see that the first column has actually
been formatted properly. So let's do that for all the other columns. And then we can
change a couple things to make it formatted better. So how to do this, we'll just copy
this a few times and change the letter four times to be specific.
So this should be B
for me, this should be C for C, and this should be d3, D. Now let's run this and see that
what happens to the Excel sheet. Only one was formatted. Why is that? These? Oh, I think
we have to go back here and re initialize this. Alright, let's try now. Awesome. So
as you can see, the columns are being formatted properly, but the headers aren't. Just so
you know, we're gonna solve that later. It's not something that I'm ignoring. But we will
solve that in a few cells. So what we need to do now is we're going to simplify this
by making two loops. And each of those loops is going to work with this column format,
a dictionary that we're going to create.
So let's create column format, and make a dictionary.
Now what this dictionary is going to be is it's going to have a key for each column.
So A, B, C, and D. and the value for each column is going to be a list inside that list,
the first item is going to be the title. And the second item is going to be which format
we want to apply. format. It's a ticker string format, and then stock price, dollar format,
comma, their market cap. And the last one is going to be number of shares to buy and
integer format. So let's run that sudden, that cell ran correctly, which tells you that
that dictionary is formatted properly. And now we're ready to start building a couple
of loops to do all of the work that we did up here, but automatically, so the loop that
we're going to be building specifically is going to loop through every key in that dictionary.
So when you say for column in column, format stuff key, if you've never used the, the keys
method to iterate through a dictionary before, it basically just returns all of these.
So
we print the call and it should print ABCD. Yep. Now what we need to do is create a writer
dot sheets, recommended trades dot set column, we're basically doing all the work we did
up here. So set column. Here, we're going to do an F string that interpolates in column,
column. So this would just be like for column A would be a colon a, we're going to Ms string.
And then we're gonna do 18 pixels wide. And then we're going to do column format, add
column.
So this will be this dictionary at key A. So this would return this list. And
then we need the second element of this list. And since Python is zero index, three parts
one. Great, so that runs properly. Awesome. So if you if we redo all of this to Okay,
here's we're going to do, we're going to comment out this. We're going to put a writer dot
save method there. And then we're going to go back to the start reinitialize, our writer
object and run all this code. Awesome. Let's take a look and see what our Excel file looks
like. Great, so everything's being formatted properly. In that loop. Our code is a lot
cleaner, it's easier to read. It's a big improvement. Now what we need to do is handle these and
columns in XLS x right are kind of hard to deal with, what I always do is I just overwrite
them. So writer dot sheets, recommended trades. And then I do dot, right.
And this right method
you pass in, what information you want stored in the cell, and what format you want to apply
to that cell. You also need to specify the location of the cell. So here's what we're
gonna do renew a one, we're gonna say ticker. And we're gonna say string format. Now, if
we go back to the top again, reinitialize this and run through the rest of this, and
go back to our cell file. The first cell is formatted properly. So now we just need to
do the same thing for stock price market capitalization and number of shares to buy. Let's go back,
easiest way to handle that is to just copy and paste. So this will change to a one B
one note sorry, that should be a one or sorry, a two or B one. B one, Okay, perfect. All
right, a one, B one, c one, D one. And this will be stock price.
With dollar format. This
will be market capital realization with dollar format. And this will be number shares to
buy with integer format. Alright, as before, we're going to go back to the top, we're going
to reinitialize, this Excel writer object, run our code to the very end, and then take
another look at our Excel file. Awesome. So everything looks formatted properly, the only
thing that might be worth improving is to widen this a little bit. But I mean, anyone
who has an auto format or for Excel will be able to handle that pretty easily.
Alright,
so this violates the programming principle of do not repeat yourself, this is very, very
loopable. So we're going to create another loop here, that basically does this logic.
So Alright, so we're going to add this in here. And what we need to do is, say, column
here, and convert this to an F string. So as this loopster through this will go from
a one to B one to C one to D one. And then we need to capture this value here. So it's
going to be column, column, format. So this is the dictionary. And then we parse out columns.
So this is the value of a key value pair, and then we want zero, and this will be the
first entry of the list, that is the value in the key value pair. So that's this, and
then this needs to get replaced with the second entry of the list. Easiest way to do that
is to just copy this and replace the zero with a one. So let's go there, one. Alright,
as before, we're going to scroll all the way up to the top reinitialize, our writer object,
move all the way through to the end.
Let's open this up and see if everything works.
Okay, great. So this is the final output of the first project from this algorithmic trading
and Python course. It says here that the last step is to save the writer output that's actually
this line. So we're already done. And just like that, you built your first algorithmic
trading project in Python, I hope you had fun, the next few projects are gonna be even
more often, you're going to learn a lot more. So I hope you enjoy the rest of this course.
And kudos to you for sticking through it so far. All right. So welcome to the second project
for this course, where we're going to build a quantitative momentum strategy, what we
need to do to start just like before is we need to open up our Jupyter Notebook.
So here,
I'm here in my terminal in my home root directory, and what I need to do is navigate into my
dev folder. And within that dev folder, I need to navigate into algorithmic trading
in Python, which you can see here auto completed for me, so we'll move into there. And now
that we're here, we need to activate our virtual environment. So you'll remember from before
that the command to do that is source and then V and V four slash bin forward slash
activate. If you look to the left and your terminal command, here, you can see that that
V and V in parentheses there shows that the virtual environment has been activated correctly,
and none of the virtual environment has been activated.
You can watch it with Python bash,
and a notebook that will launch a Jupyter Notebook. Now, one thing that I'm not sure
I mentioned before is that you can deactivate a virtual environment by simply typing deactivate.
And if you try to launch a Jupyter Notebook with this, it will actually fail because it'll
say no module named notebook. So we need to reactivate our virtual environment and then
launch our Jupyter Notebook. This will start the server in our browser and we can get started.
Just like before, we're going to want to navigate into starter files and then open up the second
project this time, which is quantitative momentum strategy. So let's open that To get started.
Alright, so just like before, this Jupyter Notebook starts with a bit of background information.
So it says momentum investing means investing in the stocks that have increased in price
the most.
For this project, we're going to build an investing strategy that selects the
50 stocks with the highest price momentum. From there, we will calculate recommended
trades for an equal weight portfolio of the stock. So, this project will combine a lot
of the stuff that we used in the last section. So that repetition will help you remember
what we actually just did, because I know that that last project actually contain a
lot of stuff.
And then we're going to build on top of that by not just equal weighting
a already selected universe of stocks being the s&p 500. But we're going to actually select
a subset of those stocks based on their momentum characteristics. So let's get started by first
importing some libraries. Alright, so just like before, we're gonna need NumPy, and we're
gonna import that as NP, we're gonna need pandas, and we're gonna import that as PD.
So as a quick recap, NumPy is the best Python library for numerical computing and pandas
is a Data Science Library that makes it very easy to work with tabular data in Python,
you'll remember that we worked a bunch of the PD dot data frame structure in the last
project.
And we'll do the same thing in this project. We're also going to need the requests
library for making HTTP requests, we're going to need the math library for performing some
basic math functions. And we're going to need the Sai pi library. Actually, we're not going
to need the whole library, we're just going to need the stats library from Sai pi imports
that. And the last thing that we need is the XLS. x writer library, which we saw in the
last section allows us to easily format and save Excel files from a Python script. Now,
we've seen all of these before except for the sci fi stats module. And what this does
is it makes it very easy to calculate percentile scores. So we'll see later on in this project
that what we're actually going to do is we're going to gather and parse momentum momentum
metrics for all of the stocks in our universe. And then we're going to calculate percentile
scores for those momentum metrics, then we're going to rank the stocks on those percentile
scores, and we're going to select the 50 stocks that have the highest average percentile score
across a broad basket of momentum metrics.
So if you haven't seen side by stats, it's
pretty easy. We're just gonna use the percentile score calculation feature of it. And yeah,
so that's it for our libraries, let's move on to importing our list of stocks. So actually,
we should run this code sale first. Alright, so now that our imports have been done for
our open source Python libraries, we need to actually import our list of stocks. Now.
We also need to import our API token, we did both of these steps in the last project as
well. So before I actually code this out, try to go through and without looking at your
last one, figure out how to import your list of stocks into a panda's series and figure
out how to import your API token from your secrets file.
So cut here and try to do that
before proceeding. Okay, I hope you had success doing that if you didn't, the command to import
your list of SOS PT, underscore, sorry, dot read underscore CSV, and then type in your
list of stocks. Now, it should autocomplete if you just type in SP. And then you can print
it out to make sure it looks okay.
Just like before, it has all 505 stocks in the s&p 500.
And then for the secrets file, we went from secrets import. And if you hit tab, you can
see that the is how the API token actually shows up here. So that is great. We can run
this code cell to get both of those imported, and we can move on to make our first API call.
Now, this is where this project will start to be quite a bit different than the last
project that we worked through. Because we're actually going to be pulling different stock
market information from the IE x cloud API. More specifically, instead of trying to pull
price and market capitalization, we're going to be pulling price in one year stock return.
So to do that, we're going to go to the i x cloud docs.
Oops, let's try and keep this
all in one browser window. There we go. So I x docs. And we're gonna go here, it's usually
the first non Add Entry in a Google search. And then we're going to run a Ctrl F and see
what kind of return metrics we can actually find in here. Okay, so return is clearly not
a good thing to search for, because it gives 341 heads and you can see why that is, is
because a lot of these will say most endpoints return.
So the fact that endpoints can return
something in stocks can also return a percentage is not good. What if we just type it in momentum?
These are all more complex momentum metrics than we need. Okay, what if we type in price
return? Zero if we type in price return with no space. Okay, once we type in year return,
this might be REST API endpoint that gives us a one year return or something like that.
Still nothing. What if we do performance? Okay, this gives us 21. sector performance.
As you can see, it's not always super straightforward on how to find these. Okay. So now I'm searching
through return again, it is a lot of hits. But I feel like it's going to be near the
top of this documentation page. Okay, so this looks promising. This is a historical prices
endpoint. And this would actually allow you to get the job and you could pull in historical
prices for let's say, one year, and use the today's price divided by the price one year
ago, and then subtract one and that will give you your one year price return.
However, I
know that I x actually calculates this metric and I separate endpoints. So we're going to
keep moving through this until we find it. Alright, so I had to cut their terms of what
we're actually looking for is the word change, not the word return. And I do a Ctrl F, it's
the first thing we find, of course, so this is the key stats endpoint, which you can get
with the stats keyword in the URL. And if you look down here, there's actually an awesome
group of momentum metrics. So you can do maximum percent change five year percent change to
year percent change one year, year to date, six months, three months, in case you can't
see us all zoom in a little bit. I guess you can't zoom I There we go. Woof. That's a lot
of zoom. There we go. There we go. Alright, so max change five year, two year, one year
here today, six months, three months, one month, three days, sorry, 30 days and five
days. So this is the stats endpoint we're going to do is we're going to copy this.
And
we're just going to put it in our Jupyter Notebook for now to come back to later. And
we also need the sandbox, base API endpoint. zoom back to normal zoom, because it's a little
easier for me to navigate. So here's the testing sandbox, we're going to find that this is
the base URL, we're going to put this in here, right here. And then remember from last time,
we need to actually include stable here. So this is a example of the base URL that we
need to hit with our API call. We're going to call this string API underscore URL. And
then we actually need to specify a symbol. So just like before, we're going to use Apple
as an example symbol. And we're going to use an F string to interpolate that symbol variable
into this, we can take this right off the end, because we're not actually trying to
parse a specific stat. And then we can use the requests library to make an easy API call
to get this data.
So to do that, we're going to say data is equal to request dot get, and
then we're going to do API underscore URL. And then we're going to sprint the status
code of that HTTP request to make sure that went through properly. So this gets 400. We
did something wrong. What is it? sandbox that is, oh, we didn't pass on our i x cloud API
token. So we need to do question mark token equals IE x and then tab, it should autocomplete
and try that.
Alright, so returns 200, which means the API call is working properly. Let's
transform this into a JSON and put this again. Alright, so as you can see this content. As
you can see, this contains all the information that it should contain. The information that
we specifically want is the one year price returns for every stock in the universe. So
to do that, it's actually quite easy. We're going to cover that next. So to parse a JSON
object like this, we saw in the last section that you just pass in square brackets, and
then the key of the dictionary. So the key that we want now is one year change percent.
And all these keys generally don't start with integers. They start with letters so that's
why it says year five instead of five year and etc.
But this is the We want, so we're
going to copy this, we're going to move down here, we're going to add that in. And then
this should give us 1.42. So what does that mean? That means that Apple has increased
in price 142% in the last year, which is pretty impressive, it means that the stock has more
than doubled. But I guess that's what happens when you sell a lot of expensive iPhones.
So just like before, we're going to kind of loop through all of the stocks in our universe
and provide API calls for all of them. Instead of living through them one by one, like we
saw in the last section, which is extremely slow, we're going to move right on to executing
batch API calls, because that's kind of a better practice. And there's no point practicing
executing single API calls when you'll generally never do that in practice.
So the first thing
we need to do is chunk our list of stocks into groups of 100, using the chunks function
that we created last time. So to make this easy, I've actually included a bunch of reusable
code from the last project here. So just run this code cell that will give us the chunks
function. And then we use the chunks function to divide our list of stocks into symbol groups
of length 100, then we create an empty list called symbol strings. And then for every
100 stocks in this symbol groups list, we create a comma separated list of strings and
then add them to this simple strings variable.
So if you go through every symbol string in
that list, we did this last project as well. But we're gonna do it again, just for practice.
And then you print it, you'll see that this is basically just five lists. And I'm sorry,
there should be four in Yeah, so this is just five lists of length 100. So when I say lists,
they're not Python lists, they're actually strings separated by commas, but there's five
of them. So here's one, here's another. Here's where the third one and fourth one, the fifth
one, and I said there's five, there's actually six sorry, that's because we have this short
one of length five at the end.
Okay, so that's all of this, I'm going to delete this out.
And we are going to move on to creating a blank panda's data frame. So just like before,
we're going to call this final data frame. And we're going to instantiate it by saying
PT, dot data, frame, and then columns equals, and then you'll see up in this code, so I've
actually defined the names of the columns we want here. So it's ticker price, when your
price return and number of shares to buy. These are basically the same column names
as we used in the last project, except market capitalization has been replaced by one year
price return here. So we can go down here and specify those columns by saying columns
equals my underscore columns. And then, if we print this, we should see an empty panda's
data frame with the columns we specify. Awesome, so that's good. We're going to delete that.
And then we need to loop through all of our symbol strings in the symbol string, symbol
strings object and create a batch API call.
So for symbol, string and symbol, strings,
and then we want to create a batch API call URL, we're going to leave that as an empty
string for a moment. And then we want to create a data variable. And that's going to be a
requests dot get method that accepts batch API call URL, and then calls the JSON method
on it to transform it from a request subject to a JSON object. So what goes in here? Let's
go back to the iCloud Doc's to look at fact API call syntax again. Alright, so I just
did a Ctrl, F for batch here to find this section, which tells you everything you need
to know about batch requests, so stock, and then symbol, and then batch.
Actually, we
want to query. So this syntax here is for querying multiple endpoints for one stop,
but we want to query one endpoint for multiple socks. So we actually want to use this syntax
here. So I'm going to copy this link address. And I think this just gives us the last bit.
Now it gives us the entire thing, we're going to change this to sandbox, because by default,
the docs give you a a live URL, not a sandbox URL.
And then we're gonna want to move down
here and change a few things here. So what endpoint Do we need first, actually, to start,
move this up here so you can actually see it. To start, let's take off this last parameter
and the range parameter. So those can both be gone. And we're going to have to add back
our token equals, and then is cloud. Oh, sorry, this isn't an F string. So that won't autocomplete
i x cloud. Yep, I x cloud API token, then we're going to want to change the endpoints
next. So what goes there? Let's scroll back up to see which endpoint we actually queried
earlier. Alright, so we use the stats endpoint. So let's put that down into our batch API
call. stats. Then here, we want to change our symbols to be an interpolated variable
that matches whatever the incrementer is of our for loop. So that is symbol string. Alright,
so now what we need to do is, we're going to run over this loop once, just to make sure
that we're getting a 200 HTTP code to make sure that the HTTP request is being executed
properly.
So to do that, we're going to print data dot status code. And for this to actually
work properly, we're gonna have to remove this JSON method temporarily from the HTTP
request. And then here, we're specifying what we loop over, we're gonna want to change this
to a colon one, which tells us to loop only over the first entry of that string. So if
this is working properly, we will get a response of 200. Awesome, so that's good, we can get
rid of this status code attribute from the data object and call this JSON method on it
to see what the structure of the HTTP response actually looks like. So this is basically
a huge dictionary, it has many levels. So at the top level, there is a key for this
ticker, and a value of a dictionary.
And then within that dictionary, there is a key for
the endpoint stats, and a dictionary for the value. And then within that dictionary, there's
a key for every metric with a value that corresponds to that metric. So we have to do multiple
levels of parsing here. But before we do that, we actually have to loop over all of the stocks.
So let's let's build that loop. First off we're going to do is we're going to say for
symbol in symbol, string, dot split, and we're going to split on the colon character, which
separates all of the tickers in the symbol string.
So let's see what that split method
actually returns before we go any further, Copy that, comment that out and then print
that oops, print that print, this is what I want to print. There we go. Awesome. So
let's run that and see what it gives us. As you can see, this is a Python list where each
item in the list is a symbol that corresponds to all the comma separated symbols in the
symbol string. Okay, perfect. So we'll get rid of this did not mean open my dev tools.
We will uncomment this.
And then we will start parsing the data in this loop. So we're going
to say. So what we want to do here is we want to, for every ticker in this list, we want
to append the relevant metrics to the final data frame. So to do that, we're going to
say final data frame is equal to final data frame dot append. And then within this append
method, there's a few things we need, we need a pandas series. And that pandas series will
accept a Python list. And it will also accept the index equals my underscore columns argument
that, as you'll recall, from the first project, this tells the append method were to actually
put the new metrics in the existing panda's data frame. And then outside the panda series,
we're going to actually add ignore index equals true. Okay, perfect. So this is where we want
to do our parsing. The first metric we want is symbol. The second metric we want is price.
So to get price, we're going to need to do data, and then parse the symbol, and then
parse the stats endpoint.
And then we're going to have to actually see what the name of the
price metric is within this. So let's write here. Let's just print out one example of
this. Oh, sorry. All right there, right here. Let's crank data. And then we'll use Apple
as an example ticker. And then we'll use stats. And then let's just see what this gives us.
All right, there's giving us an error, because we didn't actually format the rest of it properly.
But let's look and see here what we have for price. I'm going to do Ctrl, F for price,
and see if anything highlights up here.
Market cap employees. Doesn't look like this actually
has any price data. So we need to add here to our endpoints, we're gonna say price, and
stats, let's go back gives us if we parse price, I think this just gives us 504 hours.
Okay, perfect. So this is what we need to do for price. We're going to replace Apple
with a symbol that we're looping over. And then we're also going to pass in one year
price return. So we can use this print statement up here to see what we need. We're getting
an error here because we're trying to open To values to a data frame that has four values.
So for now, I'm just going to put in NA placeholders for both of them. And actually, for the number
of shares to buy, that na placeholder is actually going to stick. So we only need to figure
out the one year price return. So we'll run this to get rid of that error. And then look
in here to see what we need to parse out for the one year price return.
All right. Year
One is what we need to search for, okay, here's the metric year one change percent. So we'll
copy that, we will paste that there for now. And then what we want to do is copy this,
paste it there, and then add this in as a additional level of parsing to that. Alright,
let's run this, I'm going to take this off. And this means that instead of looping over
just the first symbol string, we're going to loop over all of the symbol strings. And
then inside of this big loop, I'm going to print the final data frame so you can see
what it looks like. With any luck, this should be okay. I'm sorry, I'm going to do this again
and remove this print statement. Alright. Float object is not sub scriptable. Huh, that's
because we need to change this endpoint to stats price. Alright, so this will take a
sec to run. It's a pretty big data frame to build. Awesome, everything looks good. So
just to recap, what we did here was, we created an empty panda's data frame with the columns
that are specified here.
And then we looped over all of the symbol strings in our symbol
strings object, we created a batch API call URL for those symbol strings that hits two
different endpoints, it hits the price endpoint and it hits the stats endpoint. And then we
use the requests library to execute an HTTP request and get that data in the form of a
JSON object. For every string in that comma separated string of string, or sorry, for
every symbol in that comma separated string of symbols.
We split them using the split
method to get a list of them. And then we looped over that list to a pend the data for
each symbol to our empty panda's data frame. And then we printed the panda's data frame.
And it looks like this. Everything seems to be working so far, so we can move on to removing
low momentum stocks from our panda's data frame. Alright, so like the other code cells
in this project, this has a bit of background information, it says the investment strategy
that we're building seeks to identify the 50 highest momentum stocks in the s&p 500.
Because of this, the next thing we need to do is remove all the stocks in our data frame
that fall below this momentum threshold.
To do that, we'll sort the panda's data frame
by one year price return and then drop all stocks outside the top 50. That sounds easy
enough, and pandas actually makes it very easy to sort data within the data frame. So
we're going to use some built in functionality of the pandas library to do this. Alright,
so the first thing we're gonna do is call final data frame. And then on this final data
frame, we're going to call the pandas method that allows us to sort the data frame based
on values contained as columns. And that method is sorted values. Now, this takes a few different
parameters. The first one is the column that you want to use to sort so we're going to
use one year price return, if you just type in one and then tab, it might autocomplete.
Now if it doesn't autocomplete, you can just go up here and copy it, copying is always
a good idea, because there's lots of different ways you could spell this and just kind of
avoid any typos, although typos are pretty easy bug to fix.
Alright, so one year price
return. And then the next parameter it accepts is whether or not you want it to sort ascending
or sort by descending. Now, we want to sort descending values so that the highest momentum
stocks are at the top. So to do that, we'll say ascending equals false. And then the last
thing we want to say is in place equals true. Now, if you've worked with pandas lots in
the past, you're definitely familiar with this in place equals true method. What this
does is it changes the function from just returning a sorted data frame to actually
modifying the original data frame. Now, to see this in action, you can take it away,
run this, and it shows the highest one year price return stocks at the top.
But then,
if you run that line, and then print the original data frame right below it, it will not return
the sort of data frame, it will just return the original one. So this shows you that without
the in place equals true method that doesn't actually modify the original date. Right now,
if you add the in place equals true method, that's not true, it will return the sorted
data frame. So that's how we start the values. Now what we need to do is modify the panda's
data frame so that it only includes the first 50 rows. Now to do that, it's actually quite
easy. You can just pass in the dash 50 Yeah, dash 50 or sorry, not dash colon 50 specification
into square brackets like this. Now, the indices are all different now because it's been sorted.
So I'm actually going to just make sure that this is right by doing length. Yeah, so it's
50. Great. So this has 50 stocks now. And then to make this so that it actually modifies
the original pandas data frame, all you need to do is say file dot train equals finals
that frame equals 50.
Now if you print it, it will just return a data frame with 50 rows.
Awesome. So awesome. So one last thing that we can do is to reset the index. So it doesn't
start and have kind of this random list of numbers. So the easiest way to do that is
just with pandas reset underscore index method doesn't actually accept any parameters. But
as you can see, it does the job. Now, just like before. Without the in place equals true
method, this doesn't actually modify the original data frame, so we'll have to specify that
in here. You'll notice as you work more and more with the pandas library that this in
place equals true parameter works in almost all of the functions that you use regularly
in the library. So there we go. So just to recap what we did here, we took our final
data frame object, which contains price and momentum data on all the stocks in the s&p
500. And we sorted its rows based on their one year price returns such that the highest
momentum stocks are at the top, and then we use the in place equals true parameter to
actually modify the original data frame instead of just returning a temporary copy.
Once that
was done in the next line, we modified the data frames such that it only contains the
50 stocks with the highest price momentum. And then we use pandas reset underscore index
method to change the index of the data frame so that it actually runs from zero to 49,
or zero to 50, I should say, zero to 49. Yeah, zero 49. And then we just printed out the
final data frame on the last line.
So with that out of the way, we can move on to actually
calculating the number of shares to buy for this simple momentum strategy. Alright, so
just like in the last project, this is going to be kind of similar, the only change we're
going to make is that we're going to wrap the functionality of accepting the port portfolio
size inside of a function because we'll actually be using the same functionality later in this
tutorial, when we build a better and more robust strategy, you can actually see that
heading here, we started in the next code sale.
So to do that, we're gonna have to wrap
it in a function and it's probably not necessary, wrap it in a function, you could just copy
paste code, it is a Jupyter Notebook after all, but it is definitely better to create
functions if you're ever going to reuse any code. And since we're not building too many
functions in this Python course, it's kind of a good excuse to practice building Python
functions. So if you're familiar with Python functions, and since we've already done this
same functionality in the last project, try to pause this video here and try to complete
the function yourself without watching me code through it.
So break now and try that
yourself. Alright, so we're going to do is we're going to say def portfolio underscore
size. So if you're not familiar with function syntax, in Python, this is just defining a
new function called portfolio size. And the empty brackets here means that it's accepts
no parameters. And then the first thing we're gonna do is we need to create a global variable
in here. So this function can define a variable, and then it can be accessed outside of that
function. So if you were to do that as global, and we're going to call this variable at portfolio
underscore size, and then what we're gonna do is we're gonna say portfolio. portfolio,
underscore size equal to input. And then just like before, we're gonna say, enter this size
of your portfolio. Now, as we saw in the last project, we need to make some special functionality
here such that if someone tries to enter a string into this Python input, it will tell
them to re enter a float value.
So we're going to do inverse A try float, portfolio, underscore
size. And if this returns an error, we're going to say, print. That is not a number.
And then we're going to print again and say, Please try it again. And then we are going
to say, basically, this exact same functionality. Great. And then the only other change we need
to make is to actually specialize this accept statement so that it only works on the certain
type of error that would happen if someone entered a string into this. So that type of
error is a value error. So we'll do that here. And actually not to look at this, this is
a bit unnecessary with two separate print statements. So I'm going to copy this here
and do in newline character and try that. Okay, so let's run this and see if it defines
properly. Awesome, it does. So we're going to use this portfolio size function to accept
a portfolio size now and then we're going to use it again later to accept a portfolio
size for our better and more realistic momentum strategy.
But we're going to make sure this
works. And actually, before we proceed, one thing that's worth noting is that this should
function should be called portfolio input. And it doesn't actually have to be called
portfolio input. But it definitely should not be called portfolio size, because then
it has the same name as this variable. And that's kind of an easy way to introduce logical
bugs in your code is by having a function and a variable with this same name. So we're
going to change this to portfolio input, and we're going to run this code cell to define
it, then we're going to test it by running portfolio underscore input. And then we are
going to print portfolio underscore size. Now, here, it's especially important to print
the variable that's defined inside the function because we need to make sure that this, this
global keyword is working properly. So let's run this and see what happens. So another
size of your portfolio will enter $1,000, and it should print $1,000.
Awesome. Now let's
try this again, we'll put in a string. So I will say, my portfolio is too small to matter
to you. That's kind of sad, but we're gonna try it and then it says that is not a number,
please try again. Okay, so my portfolio is one. Awesome, so our portfolio input function
is working properly, we can now move on to looping through our panda's data frame and
calculating the number of shares to buy. So the first thing we need to do is actually
calculate our position size. And to do that, I'm going to actually enter a slightly larger
portfolio. So we're gonna say, That's 1000. That's a million. Let's do 10 million. And
what we need to do is calculate position size. So to calculate, to calculate position size,
we're going to do position size is equal to float portfolio size, divided by the length
of our panda's data frame.
So we'll say length of final data frame, and then you can pick
the length of any column, I usually do dot index for this. So this gives us let's print
it out to see what the position size is. This gives us is that 200,000? Yep, yep, 200,000.
And that makes sense, because we're just doing 10 million divided by 50. So that kind of
math checks out. And then the next thing to do is we need to loop through all of the rows
in the panel data frame. So for that, we're gonna say for i in range, zero, and then the
length of the data frame, final data frame. And then let's just print it to see what happens.
So this goes from zero to 49. And if we go back up to a panda's data frame, we can see
that the last index is 49. And the first index is zero. So that's awesome. That looks like
it should work properly.
Let's take away this print statement now. And we're going to loop
through all of the entries within the number of shares to buy column. Now, the way to do
that is with pandas LLC functionality, we're gonna pass in I for the row, and we're going
to pass in number of shares to buy for the column. Now, what we want to do here is just
make sure that we're accessing this properly. So we're going to say zero. And then at the
bottom, we're going to say final underscore data. Now, if this works properly, what should
happen is every entry in the column of number of shares to buy should be replaced with a
zero. So let's try this. Awesome. Now we're getting a warning here, because it's saying,
I think this is actually a bug with penisula, I'm not sure the setting with copy warning,
it's because it says, If value is trying to be set on a copy of a slice from a DataFrame,
try using dot LLC equals value instead.
And that's kind of exactly what we're doing. So
I'm not really sure what's going on here. But you can definitely ignore that for now.
So now what we need to do is calculate actually how many shares to buy. So to do that, we're
going to have to access to the Price column of the data frame, and we're gonna have to
divide the position side by price. So first, we'll take position size, and then we will
divide it by this LLC method, but instead of number of shares to buy, we're just going
to replace it with price. Now, let's run this and see what happens. Awesome. So as you can
see, this is giving us fractional shares for everything. And like we talked about in the
last section of this course, we don't want fractional shares for this. So the last thing
we need to do is wrap all of this in a math dot floor function. Oops. And this will round
every number in this number shares to by column to the lowest, closest interviewer.
So let's
try that. Awesome. It's looking good. I think we are ready to proceed to building our better
and more realistic momentum strategy. So let's do that now. Alright, so it's time for us
to build a better and more realistic quantitative momentum strategy. Now, what does it mean
for a momentum strategy to be better? There's a lot of different ways you could define it.
Obviously, a strategy is better if it performs better, but it's kind of hard to tell that
in without foresight. So what I mean by building a better momentum strategy is that real world
quantitative investment firms typically differentiate between high quality momentum stocks and low
quality momentum stocks. So a high quality momentum stock is a stock that over a long
period of time shows slow and steady price returns. So you know, if a stock returns half
a percent every month for a year, that would be kind of like a high quality, slow and steady
momentum stock, a low quality momentum stock might not show any momentum for a long period
of time and then surge upwards over a short period of time.
So if you haven't done VAT
returns 0% for 11 months and then 10% in one month, that would be considered a low quality
momentum stock, even though it's one year price return would be 10%. And might be better
than a lot of other stocks. So it might be hard to tell why that definition of high quality
versus low quality stocks is important. But the reason why is because you want stocks
that are seeing price returns in reaction to slow and steady business progress, not
one time Business News. So the example that I used here, as I said, the reason why high
quality momentum stocks are preferred is because low quality momentum can often be caused by
short term news that is unlikely to be repeated in the future, like an FDA drug approval for
a biotechnology company. So this kind of begs the question, how do you identify high quality
momentum stocks, and the best way to do it is to take a basket of different time series
momentum metrics, and use them all when calculating which momentum stocks to include in your strategy.
So to identify high quality momentum, we're going to build a quantitative momentum strategy
that select stocks that have the highest percentiles of one month price return, three month price
returns, six months price returns, and one year price returns.
And we're going to call
this our high quality momentum strategy. So you'll see that I use the abbreviation of
HQ m fairly often. And that stands for high quality momentum. Let's start by building
our data frame so so the first thing we need to do is actually specify our columns. This
is going to be a much longer Python list than what we used in the past for columns, because
it's going to include so many more metrics.
So the first thing we're going to use is ticker,
and then price. And then number of shares to buy. We're gonna have to create a bunch
of slots for our, for our return metrics and our percentile metrics. So to give you like
a broad overview, we're gonna fill in one column for each time period that has how much
that stock has returned in the last year. And then right beside that column, there's
going to be another column that has the percentile score of that return relative to all of the
other stocks in the universe. So we're going to do one year, Christ's return, one year,
return percentile, and then six month, price return. And then six month return percentile,
and then helps three, what's the next one, three months, yeah, three month price return.
And then three month, return for same trial, and then one month price return.
And one month,
return percentile. Alright, let's run this just to make sure it's formatted properly.
Awesome. So now we can create our data frame which is going to be HQ n data frame is equal
to PD data frame. And we're going to specify columns equals HQ m underscore columns. Awesome.
Now let's print this HTML data frame to make sure that it is an empty panda's data frame
with the columns that we specified. Awesome. Now, we can start looping over our symbol
string object that we created earlier for symbol, string, and symbol strings. And here's
where we have to create our batch API calls and do kind of all the similar logic that
we use before. So the first thing we're going to do is execute our batch API call.
So we
can actually use very similar logic to that as before. So we're going to scroll up here,
find our old batch API call, and use the same logic because we don't actually need any metrics
that weren't included in this. So we'll copy both of these lines. So what we're grabbing
here is the batch API call URL, and the HTTP request that grabs the JSON object from that
URL and stores it in our Python script. So we'll scroll back down to where we were here.
And we're going to add this to this loop. All right, and now we need to loop through
all of the symbols in that symbol string. So we're gonna use for symbol and symbol underscore
string dot split on the comma, character, just like before.
And then in here, we need
to parse out all of these metrics. So whenever I'm doing high level parsing like this, I
find that it's easiest to just literally copy this in, and then you can replace each metric
with what you need. So this is going to be an append method. First HQ m underscore data
frame is equal to h qm underscore data frame dot append. And then in here, we have to pass
in all the things that we've been using throughout this course. So first, it'll be a panda's
series. And inside that pandas series, there'll be a pipe tongueless we'll fill that in later.
And then we have to say index equals HTML underscore columns. And then here we have
to add the special ignore, index equals true. Alright, let's move back inside this Python
list, we're going to paste those metrics. And then I'm just going to get the indentation
looking a little better here.
So do that. Alright, so one easy thing we can do is all
of the percentile metrics, we're just going to fill in to start with na. And we will loop
back through all of those metrics. And use the Sai pi library to calculate their percentile
scores later. Alright, so that's easy. ticker will be replaced by symbol, which is the incrementer
of the for loop that we're in here, price will be replaced by the price metric, the
price endpoint from the cloud API. So to do that, we're going to say, data and then symbol.
And then price.
Awesome, the number of shares to buy just like before will be filled in
with an NA placeholder. And the one year price return we actually already parsed out in our
last example, so we can just go back up and copy that down to our new high quality momentum
strategy. Alright, so here's the original parsing of the one year price return, we can
copy that, scroll back down to where we were.
Sorry, right here. All right. And we will
replace this with the parsing of the equivalent metric. Awesome. And then, as you can see,
here, we are just going to copy this down to all of the other price returns and then
change them to fit. So this is month six instead of year one. And then we'll copy this and
move it down to here. And this will be month three instead of MONTH six. And this will
be month one instead of MONTH three. All right, so let's run this to see if it works. And
then I'll do a quick recap of everything we did in that cell. So there is invalid syntax
here. There's a missing comment there. Let's try this again. Length of past values is 11.
Index implies 10. What's going on 1-234-567-8910 11. So we're missing a comma here. And if
we run this, now, it should work. Alright, so that is running correctly, it looks like
we're gonna run this again and print out the final date, or sorry, it's not the final data
frame anymore. It's the HTM data frame. And see what happens.
All right, so this is a
big data frame quite a bit bigger than the ones we worked with. In the past. It has ticker
price number of shares to buy when your price return, one year return percentile, six month
price, return six month return percentile, three month price, return three month term,
percentile, one month, press return and one month return percentile. So we have lots left
to do to this data frame. But first, we're going to go through to each of these percentile
columns. And we're going to replace these na values with a percentile score that shows
how high a company's momentum is for that time period, relative to all the other stocks
in the universe.
So this is we're gonna do this, there's going to be a lot of repeated
logic in this section. So I find the easiest way to do this is to create a array called
time periods. And then inside this array, we're going to create a string for each time
period that we're going to be looping over. And then we can use string interpolation to
make this much easier. So we're basically going to say our time periods are one year,
and six months, and three months, and one month. Now, he's not exactly clear yet how
we're going to be using this, what we're going to do is we're going to loop through all of
our columns and replace this value with a value from this list so that we have to write
the logic one, the logic ones, but we can actually use a loop to calculate the percentile
scores for all of the percentile columns.
So that's kind of what we're doing here. Now,
the next thing to do is create the actual loop. So for row in HQ, m dot data frame dot
index, and then we need to use another loop inside of this. This inner loop is going to
loop over the time periods object that we just created. So for time period in time periods,
And then here, we're going to use the pandas LLC method to change the value of each time
period column. So we're gonna say, HQ m, data frame dot LLC, the row is going to be this
row incrementer, that we're looping over here. And then the column is going to be we're gonna
make an F string here. And we're going to say, it's going to look something like this
with an interpolated value. And so let's copy that, we will put that in here. And then this
is the time period that we want to interpolate.
So we will Oh, and just so you know, you've
probably seen me do this quite a few times in this course, all that I'm doing there is
I highlight the string, and then I do a, you know, I enter the curly bracket character,
so I just enter that character, and it will automatically wrap whatever you have highlighted
in those curly brackets, it's quite a nice little feature. Okay, so we're going to replace
this with time period. And then what we want to do is assign that to something. So to start,
let's just try assigning them all to zero. And then oops, didn't mean to edit that. And
then at the end of this code cell, we will print HTM underscore data frame. Let's see
what happens. Alright, so you can see all the percentile scores are being replaced with
zero, which is awesome. But that's not quite exactly what we need to do yet, we need to
actually use the sci fi stats module to calculate the percentile score based on whatever's inside
of this.
So let's do that. Now. The function we're gonna use is stats dot percentile of
score. And then how this function works is it takes two arguments, the first argument
is the entire column. And then the second argument is an entry from that column. So
to start, we're going to need the column which is simply HQ m, issue m underscore data frame,
I'm going to scroll this up to make sure you can see it, and then we need to pass in this
string.
Sorry, not that string, we need to pass in the this string as the column name
and then interpolate in the time period. So it will look like this. Awesome. Now, what
we need to do next is interpolate in the time period value, so we need to change this to
an F string, and then change this to time period. Alright, the last thing we need to
do is put a comma here, and add in the actual value that we want to calculate. So what we're
going to do is use the LFC method, I'm going to copy this and put it in as a placeholder.
And then all I have to do is change the oops, the column name to match this. Now, I know
that this is a really long Python line, we're gonna refactor it in a second.
So just hang
tight. Copy that, and put that in here. All right, looks like I got some duplicate apps
in parentheses. Okay, let's try this. Alright, so these look quite good. You can see that
Apple, which has one of the highest percentile scores for one year has a sorry, has one of
the highest momentum scores for one year has a 99th percentile. So that's good. Everything
here looks pretty logical to me. So what we're going to do now is refactor this code, so
it's much more readable. So first thing we're going to do is we're gonna say, change row
equals in our sorry, change column equals something and percentile call is equal to
something, the percentile column is this. So we can copy that and replace this with
this. And the return column is this. Or the change column mean? Sorry. Alright, so let's
go back through and replace these.
Oops. And we'll go down here and replace this with the
same thing. Okay, this is quite a bit more readable. Let's scroll up. So you can see
what actually did here. So basically, this is saying, we're going to take the HTM data
frame, and at row of row, and in the percentile column, we're going to assign it to this calculation
stats dot percentile of score. And then we're going to call this H chrome underscore data
frame a change call. So this is the column that we're using to base our percentile calculations
on. And this is the entry from that column whose percentile score we want to calculate.
Now, one other thing that we could do to this is we could say, change this import somehow.
So if we go back to the top, and we say, copy this, and move up to here, and from Sai pi,
import stats, and then we could say, okay, percentile of score is what we want.
Score.
Score. Okay, so what this is, we run this import now. So we're saying from the side
PI's stats module, import the percentile of score, function as score, so we can go down
and erase that very long function call and just type score instead. And all those changes
will make this much more readable, it means you won't have to scroll horizontally to actually
see the entire line. Just so you know, in Python, and most software development kind
of considered a bad practice to have really long lines. So if you can break it up like
this, it's much more readable and much better, I think, I believe too much. Take this. Alright,
but almost fits on one line. Now, it's still quite a long call. But if we run this, now,
it should have all the same functionality. Great. So now we can move on to actually calculating
the HTM score. Now, we haven't actually talked about the H COVID score yet.
So let's do that.
Now. The HTM score is just going to be an arithmetic mean of all the percentile scores
for the momentum returns that we calculated in the last section. So to do that, we're
going to import the mean function from pythons built in statistics module. So we're gonna
say from statistics, import mean, let's run that to see if it works properly. Now, the
way the mean function works is you can just say, like, a few numbers in here, what's the
mean of two and six? Well, this should be four. And mean takes one position. Sorry,
this has to be a list in here. Yeah, so that's kind of the mean function works, it's kind
of exactly what you'd expect, we're going to use that to calculate the average of all
of the percentile scores that we calculated in the last section of this project.
And we
actually have to do this row by row. So the first thing to do is to create a loop. We've
been using lots of loops in here, but it's good practice. So for throw in HTN, data frame
dot index, and then the first thing we're going to do is create a empty Python list
called momentum percentiles. And then we're going to loop through all of the appropriate
columns in the data frame to get the momentum score for that row. So let's first do momentum,
percentiles equals empty Python list. And then we need to loop through all of the time
periods and the time period object that we created earlier.
So for time, period and time
periods, and then we need to append the momentum scores to this momentum percentiles list.
So momentum percentiles dot append. And then in here, we want to say, an LLC method on
the HTM data frame HTML underscore data frame, dot LLC, the first entry will be the row,
the second entry will be an F string. And inside that F string, we are going to say,
time period, and then return percentile. Alright, let's try this just once. And see if we actually
get a reasonable momentum percentiles object. Alright, so we get 8585 6235. Let's go up
here and see if that looks right. 8585 62 and 35. Awesome. So now what we can do is
calculate the mean.
And we're going to put that mean in the HTM score column. Now, looking
at this, in hindsight, we don't actually have an H COVID score column doing. What we're
going to do now is go back up to the top and add our h qm score to our columns. So first,
add this to this column. For now, it's going to be an NA placeholder, just like all of
these were h score. So we have to add and add a placeholder here. And now we just need
to go back to here and rerun all of our code cells. If this runs correctly, then we know
that everything we just did worked awesome. So now, we need to say that each gram column
needs to be assigned the mean of all of the values in the momentum percentiles list. So
we can do that again, using an LLC method. We'll say HTM data frame, dot LLC, and then
the first sector This is row The second is HTM store. And this needs to be assigned to
the mean of momentum.
Percentiles and now if we print the HM data frame after running
this All of the entries in our HTML data frame should have a HTM score calculated on the
very right. Now the only other change we need to make is to delete this limiter on the end
of index. And let's see how this looks. Awesome. So it looks like we've successfully calculated
an HTM score for every stock in our panda's data frame. Let's do a quick recap of what
we did here. So from the Python statistics module, we imported the mean function. And
then we loop over all of the rows in our panda's data frame. For each row, we re instantiated
an empty Python list called momentum percentiles. And then, for every time period in our time
periods list, we lived through the time periods and to the momentum percentiles list, we upended
the percentile scores for that stock, and then we used the LSC method to assign the
average of all of those percentile scores to the HTM score column. Then, to conclude,
we printed out that panda's data frame.
Alright, let's move on to selecting the 50 best momentum
stocks. So as before, we can identify the 50 best momentum stocks in our universe by
sorting the panda's data frame on the score that matters to us. So in the last section,
that was one year price return, and this section is actually going to be HTM score. So the
logic here is very similar, we're going to do HTM underscore data frame. And we're going
to use the sort values method, sorry, underscore values. And we're going to pass in here is
the HTM score. And then we also need ascending equals false. So this will put our highest
ranked momentum stocks at the top. And the last thing we need is in place equals true,
this will cause the store value function to actually modify the original data frame. So
let's print it out and see what this looks like. Alright, we can see our top few stocks
all have extremely high momentum here, which is a great sign. Okay. The next thing we need
to do is to drop all stocks except for the top 50.
And we saw how to do this earlier,
you just say HTM data frame equals HTM data frames, and then index it from calling 50.
And if we print this out, now, you can see that it is shorter than it was before, it's
hard to tell how much because we haven't reset the index yet. So let's do reset underscore
index. Sorry, that's a method, not an attribute. And it goes from zero to 49. Awesome. The
last thing we need to do is Pat place in place equals true into this. And then if we print
it out, we should see our finalized hmda data frame that goes from zero to 49 on the index
column. Awesome. Great.
So we are ready to start calculating the number of shares to
buy. And we'll use the portfolio input function that we created earlier. So let's run that.
And named portfolio input is not defined as we as I spelt it wrong. Let's do $1 million
portfolio. And then in the next section, we need to calculate the number of shares to
buy and this is exactly what we did before. So first, we need to calculate position size,
which is going to be the size of our portfolio divided by how many stocks are in it. So the
first thing we need to do is take the float value of portfolio underscore size, and then
divide it by the length of our HTM data frame. And you can take any column here, I usually
take the index, just like that, and then let's print out position size 20,000, that looks
right. Awesome. Now we need to loop through a row or panda's data frame.
So we'll say
for I and H m data frame index. And then we need to assign the number of shares to buy
to the number of shares and I call them so we'll use the LLC method to do that issue
dot frame dot LLC. And then we'll pass in i comma and then the number of shares to buy.
To test this out, let's assign them all a value of zero first, and then print it out
to see what happens. That comma shouldn't be there. Awesome number of shares to buy
has all been assigned zero. Great. So like before, what we're going to do now is we're
going to divide our position size by the company's stock price. And the easiest way to do that
is to copy this over and replace the number of shares to buy column with the Price column.
Price. Okay, this looks good. We're getting decimals again.
So the last thing to do is
wrap this in the math dot floor function. And just like that, we successfully calculated
the number of shares to buy for each stock In our HCM data frame, oh, there's a there's
one thing actually that we did wrong earlier. So you'll notice now that there's a, there's
a new column called index. And the reason why that's there is because when we did our
reset index earlier, I can't believe I didn't notice that at the time. But there's an optional
parameter here. And just to give you ever been more information about how to use Jupyter
notebooks, if you want to know what possible parameters are available for a function or
a method, you can do Shift Tab and this will pop up. And what we want to do is say drop
equals true. What this does is drop equals true. What this does is, the old index just
gets removed from the pandas DataFrame completely.
That's not the case, I'm going to show you
exactly how you debug this. I'm going to do pandas dot, and then reset index to go to
the actual documentation. Oh, sorry, I'll tell you exactly what the bug is here. And
the reason why this happened is because we ran this more than once. So if we just go
back up here, to where we initialize the issue data frame, and then run all the way down,
this problem will be solved. See, just like that, it's gone. It goes index and then ticker.
Okay, so just to recap, we should have originally put this here, drop equals true in our reset
index method we didn't and that caused that to be an extra index column, which we've now
fixed. So now when we run this, we get the number of shares to buy, and all the other
columns we need. Alright, so the next thing we need to do is format our panda's data frame
into a Microsoft Excel document for use by non technical LSI users.
So the first thing
we need to do to do this, we're gonna actually to back up a bit. To do this, we're gonna
be using the same libraries we did in the last project, which is XLS x writer. And to
start, the first thing we need to do is initialize a writer object. So to do that you do writer
equals pd.xl, XL, writer.
And then the first thing you need to pass in is the name of the
Excel file that you want to create. In this case, we're going to call it momentum underscore
strategy dot XLS x. The next thing you want to add is the engine, which is engine equals
XLS. x writer. Alright, so the next thing we need to do is say, Hmm, data frame, got
to underscore Excel. And the first argument of this function is going to be the writer
object we just created. The second argument is going to be the sheet name. In this case,
it's going to be sheet underscore name equals moment momentum strategy. And the last argument
is going to be index equals false. Alright, so run that code cell, and we can proceed.
Now, the next thing we need to do is actually create the formats that we need, we actually
created all of these formats in the last section of this course.
So or in the last project
of this course, I should say, so I'm not going to make you work through this again, instead,
we're just going to run this code cell to proceed. Now we need to actually loop through
the columns format dictionary, again, to apply the formats that we created to the columns
that we have. So the easiest way to do this, in my opinion, is to scroll all the way back
here to where we initially created our columns, which is right here. And then I'm going to
copy all of these and use this as kind of a basis for our dictionary that we're going
to build. Alright, so we're going to paste that there. Now we need to start with column
name. So it's just gonna we're just gonna be doing this to start a, b, c, d, e. Now
note that if I was in like an actual development environment, I typically write big programs
in VS code. I can do this much faster with multi cursor. Ah, Jupyter notebooks are better
for visualizations and other sorts of things, though.
Alright, so our columns are going
to be from a through L. And then we're going to want to wrap each of these inside of a
list. Now depending on your operating system, there's different ways that you can enable
this multi cursor functionality that I'm using right now, I am using Linux, and I'm just
Ctrl clicking so that I can add the start of the list each line. And then Oops, I'm
going to move back to the start of the list, I should try this again. Now what I'm going
to do is try to move to the end of the line and add the closing list brackets not gonna
be super pretty because some of these have columns and don't look back before the comma,
try that. Okay, what do we have any of these broken? Last one is? Great. Okay, so now we
have a columns from a dictionary where the keys are each column letter, and the values
are a list with one item where that one item is the name of that column.
The last thing
we need to do is work through these one by one and apply the template we want. So this
will be string underscore template, price will be dollar underscore template, number
of shares to buy will be string, note energy template, not string template. One year price
return will be percent, one year return percentile, so most of these are going to be percent template
moving down. So I'm just going to copy this and paste it. Alright, this is looking good.
So far, let's run this code. So next to make sure that this dictionary is actually formatted
properly, so it ran correctly, which is good. Alright, so the next thing we need to do is
loop through every key in this dictionary and apply the right format to that column.
So let's try that.
So we're gonna say, for column in column, underscore formats, dot
keys. And let's just print column to make sure this is working, right, A, B, C, D, E,
F, G, H, i, j, k, l, that looks good. Alright, the next thing we need to do is use the method
for XLS writer that actually applies a format to a specific column. As we saw in the last
project, the way to do that is we need to call the writer object and then from that
writer object, we need to access the sheets attribute, which is a list of all the sheets
in the Excel file.
And then we need to access the momentum strategy tab from that list.
And then we need to call the set column method on that, I was going to take three parameters,
let's put two commas in here to reflect that. The first parameter is the format that we
want to apply. So the first parameter is the column that we want to apply the format to.
So it will look something like this A through A, the second parameter is the column width,
which we're going to say 18 pixels. Actually, let's do something like 22. And then the last
is which column we're going to want to apply to that. So we'll say string underscore format.
And then let's just test that this morning, I will say, writer dot save. And then if we
run this, what happens, right, it's actually similar to strategy string format is not Oh,
it's because string template.
All right, now let's go to our folder and see if this is
working right. starter files momentum strategy, so is there which is good. If we open this
up, we can see that the data in here, we can see that the first column has been formatted
properly, we're going to go format the headers in the same way that we did in the last section,
so don't worry about that yet. Alright, so now we need to change this set column method
such that it actually is dynamically applying formats to each column in the data frame.
So first thing to do is change the column like this. And the second thing to do is to
add a two. Here, we're going to access the column format dictionary, we're gonna pass
in the column key. Alright, let's try that what happens now columns is not defined. That's
because that should be column. Now if we go to our Excel file, still not working. And
that is we saw this in the last project, too. We need to go back up and re initialize our
writer object and go down to the bottom again.
Right, what is happening here, we have some
kind of bug on writer dot save. Why is that? I'll save Oh, I see. Okay, so here in the
writer dot sheets, dot set column method. This is actually a list and we need to actually
parse it to get the format that we want. That format is in the second item of the list,
which is at index one. So we press one there, and this should work properly and we can go
see our oops, ignore that. We can go see our newly formatted Excel file. Alright, so the
format's are being applied to each column properly, you can see that they're not actually
wide enough because the column names in this project are longer than the column names in
the last project. So let's change that 22 to 25. And then the last thing we need to
do is actually check format the column headers.
And we saw in the last section that the easiest
way to do that is just overriding them using the right methods. So the right method is
similar to the set color method, except instead of taking in an entire column for the first
one, it takes just a cell from a column. So we're going to do here is we're going to access
the column.
And then once that's the first cell of that column, here, we're going to
specify the column width. And then here, we are going to do the exact same thing and apply
the format from the second entry of that list. Alright, let's go back to the top of our formatting
the Excel format section, run through all of this again, and with any luck, this should
be formatted exactly like we want. There we go. Alright, so you can see that this is not
good, because what is going on in this top row? Well, I see, this doesn't actually take
column with this takes the value that we want to put in it.
And that's just a mistake on
my part, a logical mistake. That should be this list at index zero. So let's copy that
list, put it here and put index zero. We'll move this writer dot save method, back down
to the last column, this Jupyter Notebook or the last cell of distributor notebook,
rather than we'll move back up to where the writer object was initialized and run it to
the bottom. Now, when we open up our momentum strategy Excel file, we should see that the
entire notebook is done, we have ticker price number of shares to buy one year price return,
one year return percentile, six month price, return six months return percentile, three
month price, return three months return percentile, one month, price, return, one month, return
percentile, and HTM score. Now, you can see that all of these aren't actually formatted
like they should be. So all these percentile scores are multiplied by, I guess 100 over
what their actual value should be.
So let's go back into a notebook and fix that. If we
scroll up here. We're gonna go all the way back to where our percentile scores were originally
calculated. And that is right here, I think. Yeah. So note, sorry, this is the HTML score,
we need to go up a little bit further. Right, so this is where they originally calculated,
we're going to move to the end of this calculation, and divide it by 100. And then, we're going
to go back to where we created our h qm DataFrame. And run this entire notebook to the very end.
All right, so this should all run with no problems. We're waiting on our input here,
so I'll do a million dollars again. And this should run to the bottom with no problem.
Alright, let's check out our new Excel file to see if those percentile scores were reduced
in magnitude like they should have been. Alright, so we have one year price return and then
9.2. That looks good. Alright, so you can see that this has been sorted by hmm score
properly, we go from 99.1, down to 81.9. And just like that, we have a list of the 50 highest
momentum stocks in the s&p 500, along with the number of shares that you should buy if
you want to create an equal weight basket on those 50 stocks.
So that concludes our
second project for this algorithmic trading and Python course, I hope you enjoyed this
project. In the next section, we're going to learn how to build a value investing strategy
that buys the 50 cheapest stocks in the s&p 500 based on a composite score of a number
of different value investing metrics. So I will see you there in the next project, which
is going to be a value investing strategy. Now, for anyone who's not familiar with value
investing, it just means trying to buy stocks and other investments that are cheap relative
to our SL. For anyone who's not familiar with value investing, it just means trying to buy
stocks and other investments that are cheap relative to our assessment of their intrinsic
value.
There's many different ways to try to estimate the intrinsic value of a stock
or other investments. So what we're going to do in this project is we're going to build
a simple value strategy that relies on one metric. And then we're going to improve that
value strategy by creating a composite value strategy that takes into account many different
metrics when trying to estimate intrinsic value. So to start, let's navigate into I'm
in my terminal here in my home directory.
Let's navigate into our algorithmic trading
Python folder. And then let's launch our Jupyter Notebook. Sorry, before we do that, we have
to launch our virtual environment. And then we can launch our Jupyter Notebook with Python
dash m notebook. A Jupyter Notebook will open up in your browser Then you're gonna want
to navigate into our starter Files folder and open up the quantitative value strategy
ipnb notebook. Alright, so, like I said before the value investing means investing in the
stocks that are cheapest relative to common measures of business value like earnings or
assets.
For this project, we're going to build an investing strategy that selects the 50
stocks with the best value metrics. From there, we will calculate recommended trades for equal
weight portfolio of these 50 stocks. So like before, the first thing that we need to do
is import the open source Python libraries that we'll be using throughout our Jupyter
Notebook. So it's going to be the same libraries as we use in the last section.
So import NumPy
as NP as a quick recap, NumPy is a numerical computing library that you need to work with
pandas, and then we'll need to import pandas as PD, then we'll need to import XLS x writer,
this is the module that we use to save our pandas data frame as an Excel file at the
end of this Jupyter Notebook, we're going to import the requests library which is used
for making HTTP requests. You'll recall from our first project that the slogan for requests
is HTTP for humans, which I quite like. And then we're also going to need to import the
stats module of Sai pi.
Pi import stats. And the last thing we're gonna need to import
is the math module. So let's run this code cell to to import all of these libraries into
our Python script. Great, that's done. The next thing we need to do is import our list
of stocks and our API token. So just like before, we're going to import our list of
stocks into a panda's data frame called stocks. And we're also going to import our is cloud
token from our secrets file. So let's do that. First, we'll say stocks equals PD dot read
underscore, CSV. And then we're gonna type in SP and hit tab to autocomplete the file
name. Let's just print this to make sure that it is working like we intend. So this is a
list of all the stocks, and this is a panda's data frame, you can get the pandas series
that's associated with the only column that data frame by parsing out the ticker column.
We'll do that later.
And the other thing we need to do here is say from secrets, import,
and then if you just hit tab here, it'll give you some options. What we want is this is
cloud API token. Great. So that is how we import our list of stocks and API tokens.
And in the next session, we're gonna make our first API call using our AI x cloud token.
Great. So like before, what we need to do now is figure out which end point of the is
cloud API that we need to hit. So I said in the introduction to this project, that we're
going to start by building a simple value strategy that screens for stocks based on
a single value metric, and that value metric is going to be the price to earnings ratio.
Now, for anyone who's not familiar with a company's price earnings ratio, it's basically
you take their stock price, and you divide it by how much earnings you expect them to
have over the next year.
So as a basic example, let's say we'll look at Apple again. So Apple's
current stock price is $508.06. Now, if you wanted to calculate a price to earnings ratio
for Apple, we need to find an earnings estimate. So we'll just type Apple earnings estimate.
And it looks like these are all quarterly estimates. And we want an annual estimate.
So whenever you're calculating a price earnings ratio, or any other kind of valuation metric,
you always use annual numbers in the denominator. So looks like this chart from NASDAQ provides
some estimated earnings metrics. So yearly earnings forecast, this is what we want. And
we will use the fiscal year ending in September 2021. It looks like they have consensus earnings
per share of $16 per share. And that means that to calculate the price earnings ratio,
we just calculate the take the company's current stock price and divided by 16.
So what does
this give us 31. So that means Apple's price earnings ratio is 31. That's some high level
background on what the price earnings ratio is. Fortunately, I x cloud makes it very easy
to access price earnings ratios, because if you go to their documentation, they actually
provide the price earnings ratio natively here. So we'll just start by trying to control
f find it price to earnings. And it looks like we have it right here. So P e ratio in
this API endpoint refers to the price earnings ratio. Oh, I don't know what that was a link
in here somewhere. I clicked the epic timestamp there. So anyways, P e ratio refers to the
price to earnings ratio for the company. Do you know what endpoint is this? This is the
quote endpoint. Now this is good because one of the other things that we'll need for this
screener is we're going to need the latest stock price for the company.
So you look in
the JSON response here. You'll notice that they have latest price here. And they also
have down at the bottom p e ratio. So what we're going to do is We're going to take this
API endpoint, we're going to copy it into our Jupyter Notebook. And we are going to
go back to the sandbox mode here and get our base URL for every API call that we're going
to make. So here is the sandbox base URL, we're going to copy that, move it back here,
put it there.
And you'll recall from the first project that we actually have to add stable
here to show that Oh, whoops, was stable and show that we want the latest stable version
of the API. And not any kind of bleeding edge version of the API that might have beta features
that aren't fully tested yet. So this is going to be kind of a sample API endpoint that we'll
use to get our price to earnings ratio, we're going to call this API URL, and we're going
to make it a string.
Now, we need to make two small changes to this. First of all, we're
going to take this field off yet, because we don't actually want to parse a single metric
out of this endpoint, we want to parse the entire JSON response, we're going to leave
symbol there, but we're going to turn this into an F string so that that symbol actually
becomes interpolated. Now, we need to make a symbol variable here, and we're gonna say
the symbol is Apple. And the next thing we need to do is create an actual API call using
the requests library. So to do that, we're gonna say data equals requests dot get. And
then inside of this get method, we're going to pass in this API underscore URL. And then
we are going to print out the status code of this HTTP request.
Now, we've talked about
this before, but status codes tell you whether your HTTP request was successful or not. Successful
HTTP request will return a status code of 200. And most erroneous HTTP requests will
return a status code that's somewhere in the 400 range. So 500 is actually also an indicator.
So we're gonna do a print statement that prints out data dot status underscore code. And if
this gives us 200, then we know that our first API call was executed correctly. Alright,
so this gives us 400. And the reason why I've done this actually before in this course,
but I forgot to add on our authentication token. So to do that, we're just going to
interpolate in this question mark token equals and then, inside these curly brackets, we're
going to add our AI x cloud API token. And this gives us a status code of 200. Awesome.
Now, we'll transform this data variable into a JSON object using the JSON method, which
accepts no parameters. And then we can print out this JSON response to see what it actually
contains.
Okay, so you can see that it has a number of different data points available
in this HTTP response, including symbol company name, primary exchange calculation price,
a whole bunch of other stuff, you can see actually, since we're using sandbox mode,
this primary exchange quite owns, that's not a real Stock Exchange, it looks like they
actually just took the company's real exchange, which is NASDAQ, spelt like this, and SBA
Q, and then they scrambled it or it looks like they scrambled it somehow.
Anyway. So
that's kind of an interesting part of using the the sandbox mode of the if cloud API.
So what we want to do here is this latest price endpoint, and this P ratio endpoint
here. So we'll parse out both of those variables in the next code cell. Okay, so the first
thing we want to do is calculate it with the stock price. So we'll say price equals data.
And then this is just a Python dictionary, you can actually verify that by just printing
out its type like this, this will give us some debt. Yeah, it's a class of debt. So
we'll pick up that print statement. Since it's a Python dictionary, you can you can
access the value stored in the dictionary, bypassing the keys and using square brackets.
We've seen this earlier in this course. But it's always good to have a good little refresher.
So the two things we want our latest price. So we'll we'll assign that to the price variable.
And let's just print it to make sure it was parsed out correctly.
513 89. That's awesome.
The other thing we want is price earnings ratio. So we'll say P e ratio equals data.
And then we're going to parse out this P e ratio string here. And just like we did with
price, we're going to print this out to make sure that it is working properly. The ratio
39.6, you can see that this is a little bit higher than the 31 pe that we calculated manually.
And that's probably because i x cloud is using a different kind of earnings. One of the tricky
things about running any kind of value strategy quantitatively like this is that when you're
using earnings or any kind of other fundamental metric, there are numerous different ways
that it can be calculated.
You can use earnings estimates, which is what I did in my example,
you can also use trailing earnings, which some people believe are more reliable because
they're actually reported earnings instead of relying on someone's estimate. But if you
ever calculate a P e ratio, and you're confident that your math is right, but it doesn't match
someone else's math, the reason why it's probably because they're defining ratio slightly differently.
So when you're doing any kind of work like this in the future, it's always good practice
to strictly define what you're doing. So if you're using trailing earnings, say, I'm using
trailing earnings, and I'm calculating my P e ratio, if you're using forward earnings,
you might want to call it a forward p e ratio instead of just p e ratio so that it removes
that ambiguity.
Alright, so like before, we can now move on to executing a batch API call.
And just like I did in the momentum strategy for this course, I've actually provided some
reusable code that we have used earlier on, so that you don't actually have to kind of
redundantly write the same code over and over again, although if you're up for the practice,
what you could do is just delete this entire code cell and try to write our chunks function
and our symbol strings variable from scratch.
So for now, for the sake of brevity, I'm going
to just run this code cell and proceed with executing our batch API call. So actually,
before I do that, I'll give you a quick quick recap of what's going on here. So first off
this, this comment at the top just tells you where I source this chunks function from.
Since I didn't write this myself, it's always good to attribute a function to wherever you
received if it's someone else's work. So what does this chunks function actually do? This
chunks function takes in a list, which is this LST.
And it also takes in this variable
n. And what it does is it you can see in the doc string here that a yield successive any
size chunks from lists. And to be more specific, if you have a list of 505, like is, which
is what we're going to pass in, it will return five lists of n sorry, you pass in a list
of 505 and a value of n of 100, it will pass you back a list that is five entries of length
101 entry of length five. So that's what that is. Next step, we create a list using this
chunks variable that takes all of our stocks and divides them into five lists of length
101, length of list five, and then we create an empty symbol strings list. And then with
this for loop, we take every group of stocks that's in the symbol groups variable, and
we create a comma separated string that has all of the tickers from each list.
And if
you want to see what those look like, we can actually uncomment this print statement and
run this code down. So you can see, this is one list. This is another list. This is the
next list. This is the end of the fourth, this is the end of the fifth. And this is
the entire fifth list, which is shorter, because it wasn't long enough to actually fill out
the length of 100 that was specified by the function. Okay. And then the last thing that's
done in this code cell is we specify the columns for our panda's data frame. So the columns
are ticker price, price earnings ratio, and number of shares to buy. So I'm going to run
this again, with this print statement commented out to remove that large print.
Okay, we can
now move on to writing code from scratch. So the first thing that we want to do is to
create a blank panda's data frame. And we saw how to do that we're going to call it
final data frame. And we're going to assign it a value of PD dot data frame. And within
this class instantiation, we're going to pass in columns equals my underscore columns. Now,
we saw before that we could have just called this columns, but then you have columns equals
columns, and that can be a bit unreadable. So this is what we're going to do to make
sure that that data frame was created properly, we're going to print it out. And as you can
see, it is an empty panda's data frame with four columns. And the column names are the
four names that we specified in this list in the previous code cell. So now that that's
done, what we need to do is we need to loop over every symbol string in our symbol, strings,
object.
And to do that, we're going to say for symbol string in single strings. And then
we need to actually create another loop inside of here, that loops over every ticker in that
symbol string. So we're gonna say, for symbol and symbol, underscore string dot split. And
you'll recall from the last two projects that if we call this split method, what it does
is it basically does the opposite of this join method that we saw in the last code cell.
So this is where it's going to be. And then we're just going to print out a symbol. And
this will allow us to make sure that we're actually iterating over every symbol in our
universe.
Great. So it goes all the way from a and this should go to z because these are
in order. Yep, all the way down to z. So that looks like it's working. Okay. Alright, so
within the first for loop, but outside of the second for loop, we're going to create
a variable, we're going to call that variable batch API calls URL. And what is this going
to be, it's going to be a URL that is a string, and it's going to be an F string. And it's
going to query this endpoint that we use earlier, this quote endpoint for every stock in our
universe. And it's going to do that in batch API calls for 100 stocks at a time now, to
get the proper syntax. for that. We're going to go back to the AI x cloud docs, and we're
going to search for batch API. Call. Another week off with anything. So I'm just going
to search for batch. And here you can see batch requests. Now, I'm going to scroll down
here to this example, which is what we copied in the past, I'm going to copy the link address,
I'm going to go back to my Jupyter Notebook.
I'm going to paste that in here. And then
we have to make a few modifications. So the first thing we want to change is the symbols.
And all we're going to do here is we're going to actually change them into an interpolated
value. This interpolated value should be symbol underscore string. Now, what is that symbol
string variable, it is the iterator of this first for loop. So it will be if you uncomment
this and run this. This is one example of such a simple string that you can interpolate
into that API called URL. So we'll comment that out and run it again. Great. So there's
a few other changes that we have to make to this batch API call URL. One of the first
things is that if you look at the types of API endpoints that we're hitting here, we
have, quote, we have news and we have chart, we actually only need the quote endpoint.
So let's take away all the other ones.
Awesome. And then there's two other parameters specified
here, range and last. And we don't need either of those. So we can take them both off and
replace them with our token equals is called URL. All right, so let's just run this to
make sure that that's formatted properly. And instead of this print statement, we're
going to comment it out and just replace it with a pass keyword. Oops, ignore that. Alright,
so we're going to replace this with a pass keyword. Great. So that ran properly, which
indicates that the syntax is okay. Now, it's time for us to actually, so we've created
our API call URL, but we haven't actually executed the HTTP request yet. And that's
what we'll do next.
So to do that, we're going to use the requests library to execute a get
request. And we're going to store that response in a variable called data. So data equals
requests. Press dot get, and we're going to pass in batch API call URL. And then we're
going to call the JSON method on that, actually, no, we're just going to leave it as is. And
we're going to print data dot status, underscore code for each one. Now, since there's six
entries in our list, if everything's working properly, then this should print 206 times
we get 403 for each one. Now, this is actually interesting. So I mentioned earlier that 405
100 codes and when I say 400, to 500, codes, I don't mean exactly 400. And exactly 500.
I mean, anything from 400, to 499. And anything from 500 to 599. These are generally indicate
error. So let's go look exactly what type of error This is. Um, so type HTTP, or three,
run a search. And Wikipedia is actually pretty good resource for this. It says the HTTP four
three is an HTTP status code, meaning access to the requested resources forbidden for some
reason.
Now, this is a very, very specific HTTP response, that usually means that you
have you're not entering the proper API token. And I'll show you exactly why that is, I know
what they're looking because we didn't change this cloud keyword to sandbox. If we do that
again, and run it, we should get six 200 codes. Awesome. 123456. Great. So the next thing
we need to do is actually parse the data and using an append method, add it to the final
data frame.
So we've seen this before, what we need to do is add that here, we'll take
away this print statement in the past keyword. And then we'll say, final underscore data
frame, equals file underscore data frame dot append. And then within this append method,
we need to create a panda's series. We've seen this before. And within that kind of
series, we need to create a Python list.
So what goes inside this Python list we need?
Basically, one data point for every column, we need ticker price price earnings ratio
and the number of shares to buy. So we'll say symbol. That's our ticker, we'll say data.
And then we'll parse out the latest price data point and then we'll say data and we'll
parse out the PE ratio data point. And then for the number of shares to buy, we will parse
out nothing and just say and under n slash a. Now for this panda series, we have to specify
what the indexes will say index equals my columns. This tells the append method, which
columns of the panda's data frame need to be considered when you're appending it each
time, so that's very important.
And then we also need to add nor underscore index. Alright,
so if we run this code sell now and then print out our final data frame, we should get a
data frame that has 505 rows, and each row corresponds to one stock with the four metrics
that we selected. So with any luck, it should run six And we're getting response object
is not so scriptable. I'll show you what the problem is here. When we ran our HTTP request,
we didn't do dot JSON and to actually turn it into a Python dictionary. So if we run
this now that should be fixed. Latest price. Oh, I grossly oversimplified the person here.
So since this is a batch API call, we actually need to do multiple levels of parsing.
The
first is symbol. The second is which input we want to hit. So if we add that to both
of these parses, we should be okay. Alright, so here's our data frame, it goes from index
zero to index 504, which indicates that all 505 stocks have been successfully added. And
it has four columns ticker price, price to earnings ratio, and the number of shares to
buy.
So now that that's done, we can move on to removing glamor stocks. Alright, so
now we need to remove the glamour stocks from our panda's data frame. And for anyone who's
not familiar with the term glamour stock, it's basically the term that's used to describe
the opposite of a value stock. So if a stock ranks very poorly on value metrics, then instead
of being called a non value sock or something like that, it's just called the glamour stock.
Now, since the goal of this strategy is to identify the 50 best value stocks from our
universe, our next step is to remove glamor stocks from the DataFrame.
And to do this
will kind of follow similar logic as we did before, we're going to sort the panda's data
frame such that the stocks with the lowest price earnings ratios are at the top, and
then we're going to remove all of the entries from the data frame except for stocks that
have that fall below the top 50 in that ranking scheme. So let's do that. The first thing
we need to do is start the panda's data frame. And to do that, we will say, final data frame,
dot sort underscore values. Now, you'll recall that this is the exact same method applied
to the panda's data frame that we use to store values in the momentum strategy section of
this course. And then what we want to pass in first is the column that we want to start
out and that is price to earnings ratio, I just tried to do a tab autocomplete and it
didn't work.
So what I'm going to do is just go up here and copy this. That means I don't
have to worry about making sure I get the capitalization rate everything. The next thing
we need to do is specify the ascending parameter. Now we want it to be descending, which means
that sending is equal to false. And the last thing we need to say is in place equals true.
This parameter tells Panis that instead of just returning a modified data frame, we want
to modify the original data frame, let's print out the final data frame object to see if
this worked correctly. So as you can see, we have a price earnings ratio of 2400 at
the top and a price earnings ratio of not the bottom, I actually did this correctly,
this should be true. Now we see that our lowest values are at the top and our highest values
are at the bottom.
Now one important thing to note with this is that since ascending
defaults, having a true value, we can actually remove this completely. And it'll give it
the same output. As you see. Another thing that's worth mentioning is that all of the
stocks that have negative price to earnings ratios. If you think about the mathematical
definition of the price earnings ratio, it's price divided by earnings. So in order for
price to earnings ratio to have a negative value than either price has to be negative
or earnings has to be negative.
Now, stocks can't actually have a negative price. So what
this means is that these stocks have negative earnings, there is kind of a wide body of
evidence to say that you might not want to invest in stocks that have negative earnings.
So in a real investment strategy, you may want to remove all stocks that have negative
price earnings ratios. And one easy way to do that would be something like this final
data frame such that final data frame at the price to earnings ratio column is greater
than zero.
Now what is this return should return the final data frame with all of the
negative values removed. So we're going to include this and to actually modify the original
data frame, we have to assign it, you'll notice that if we print this, this gives us the final
data frame with negative values. So what we're going to do is take this out, we're gonna
say final data frame is equal to that. And then we need to return just the top 50 stocks
according to this order schema. We saw how to do that in the momentum project of this
course. But it's pretty easy. You just say final data frame is equal to final underscore
data frame.
And then you want to access out with square brackets colon 50. Now if we print
this, it should be a data frame that ranges from zero to 50. Okay, so this looks good
to me. There's a couple things we can do to make sure that this is working properly. We
can print its length, this should be 50. Awesome. And then the last thing we want to do is reset
its index so that the index actually runs from zero to 50.
So we'll do we underscore
index in this give us. Alright, as you can see, this now goes from zero to 49, which
means there's 50 entries in the panda's data frame, the only other thing we need to do
is get rid of this new index column, we saw one way to do it in the last section of this
course, I'm gonna show you a different way now. First, we're, one thing I've read here
is going to stay in place equals true within this. And that means that the reset index
method will actually modify the original data frame.
Now, if we print this out, again, we
should see the modified data frame. Awesome. And since I ran this twice, we actually have
two separate reset indices. So let's go back up here. And we'll rerun this, we run this
and we run this and we should just have one duplicate index. Okay. What we want to do
now is we're going to call the draw method, we're going to pass in the index column, we're
also going to pass in in place equals true here, what does this give us? It gives us
a key error because index is not found in axis. Okay? The drop method, by definition
automatically acts on rows not on columns.
So to change that, we have to say x equals
one. What is this give us cannot insert level underscore zero, it exists, that's because
we already ran that cell. So if we run this now what happens? Alright, everything looks
to appears to have worked properly. So we'll print this out again to make sure it looks
good. Awesome. So now let's just do a quick recap of what we learned in this section.
I explained briefly what a glamor stock is, we sorted our final data frames such that
the stocks with the lowest price earnings ratios were at the top of the data frame,
and we specified the in place equals true parameters so that the original data frame
was actually modified. And it didn't just return a modified data frame. So that's what
that line does. Next up, we, we sorted the data frames such that all rows that have a
price to earnings ratio value below zero were dropped from the data frame, then we trim
the data frame to only include the stocks with the 50 lowest price earnings ratios,
we reset the index, and then we drop the new index column that was created by way of this
reset index method.
Now, one important thing to note about this is that we could have actually
specified the drop parameter here to do the same thing. That's what we did in the last
section. But the reason why I showed you this method instead, is just to reinforce the idea
that whenever you're doing any kind of software development, there's usually multiple different
ways to solve a given problem. So that's that, Let's now move on to calculating the number
of shares that we need to buy.
Just like we did in the momentum strategy section of this
course, we're going to use the portfolio input function to do this. And to make things easy
for you, I actually included the portfolio input function there, I encourage you to try
to delete that formula and recreate it from scratch, scratch, if you want the extra practice.
For now, I'm just going to run this. And then I'm going to use the portfolio input function
to accept a portfolio size variable. So let's do that. And I'm going to test here that the
try accept value error statement is working by putting in a string, so I'm going to say
my portfolio is $1 million. And then if I enter this, then it should print it. That's
not a number of try again. Awesome. Now I can say $1 million, so that six zeros 123456.
Awesome. The next thing we need to do is calculate the position size. So just like we did before,
position size can be calculated by dividing the portfolio size by the number of shares
in the portfolio.
So we're going to position size divided by the length of our final data
frame. And you can take any column of this data frame to find its length, I'm going to
use the index and there's no kind of right or wrong answer here. I just the index is
what I typically use for this. So that's that, and this position size should be $20,000,
I believe. Now here it says type error unsupported operand type.
That's because this portfolio
size, despite it actually being a number, it's still formatted as a string, so we need
to wrap it in a float function. This gives us $20,000. Awesome. Now we need to loop through
the every row in the panda's data frame, and modify the value that's contained in the number
of shares to buy column. So to do that, we're going to say for row in in final data frame
dot index, and then we are going to say, final data frame dot LSE. This is the exact same
method that we used to modify this before.
We're going to do row and then number of shares
to buy. Then we're going to use the assignment operator to say divide that stocks or sorry
to divide our position, size by that stocks stock price. So We put pricing here. That
should be the trick. Now we'll print out our final data frame to make sure that this is
operating correctly. This looks good. Let's just compare two stocks here. So we have a
stock at $35 and a stock at $18. So this is interesting, because this is roughly double
this. So this should be roughly double this, and it kind of passes the eyeball test. So
that's good. Now, the only other thing we need to add here is the floor function. So
as you'll recall from the last section of or the last project of this course, you can
usually buy fractional shares at some brokers. So if we wrap this in the floor function,
it will give us the closest lower integer to the number of shares that are calculated.
So if we run this, we can see that those have all been transformed to integers.
Awesome.
We are now ready to move on to building a better and more realistic value strategy.
All right, so it is now it's time for us to build a better and more realistic value strategy.
Now, just like in the momentum project that we built earlier in this course, what we're
going to do to make a better value strategy is to build a strategy that takes into account
many different metrics. And the reason why that makes the strategy better is because
every valuation metrics tend to have certain flaws.
For example, as we saw earlier in this
project, the price to earnings ratio doesn't exactly work that well with stocks that have
negative earnings. Similarly, another popular valuation metric is the price to book value
ratio. And that ratio doesn't work well with stocks that perform a bunch of share buybacks.
So there's lots of different things to consider for every valuation metric. And one common
way that investment managers reduce any of the flaws, reduce the impact of any specific
valuation metrics flaws, is by creating value strategies that take into account a compensated
value.
So instead of just saying, we're going to rank stocks on price to earnings ratio,
they rank percentiles for a bunch of different valuation metrics, and then rank stocks according
to the average of those percentile scores. So that's exactly what we're gonna do in this
section. Here are all the ratios we consider. So price to earnings ratio, we already saw
that price to book ratio. Now, this is the company's stock price, the company's stock
price divided by its shareholders equity per share, the price to sales ratio, this is the
company's stock price divided by its revenue per share enterprise value divided by EBITDA.
Now, that is a more complicated valuation metric, even R stands for earnings before
interest, taxes, depreciation, amortization, enterprise value is basically the number that
you would have to pay to buy out the company in full. So that doesn't just mean buying
it all at a stock, you would have to also buy it all of his debt and any minority shareholdings
that exists. So that's what enterprise value is.
And then another column valuation metric
is enterprise value divided by gross profit. So all of these five valuation metrics, we're
going to now pull in from the i x cloud API. Now to do that, we're going to start by just
making a single API call. And then we will loop through that same API call for a bunch
of different stocks that are listed in our stock data structure to build a panda's data
frame that contains all the data. So to start, we'll just do this for one symbol, and that
symbol is going to be apple. And then we're going to do a batch API call here. And unlike
before, we're not going to use a batch API call because we want to query the i x cloud
API for multiple symbols. But instead, we have to do use a batch API call because we're
querying the API for multiple endpoints. So since the different valuation measures listed
here are all so different, they're not all actually going to be stored in the same endpoint.
So we need to find where they are in the iOS prod API, and then add those endpoints to
a batch API call URL.
So let's start by just creating the variable batch API call URL.
And just like before, this is going to be an F string. Now let's to make this easy,
we're just going to base this off of the batch API calls that we executed earlier in this
project. So let's scroll up and find that. Alright, so here is where it is, I am going
to highlight all of this and copy it to my clipboard, I'm going to scroll back down to
where we were just working and paste this in. Okay, now, we have to make a few changes
here. The first is that we need to replace this symbol string with just symbol. And then
let's actually execute an API call against this to make sure that it's working properly.
So we've seen this many times by now. But we're going to do data equals and then to
define the variable we're going to say requests dot get, and then we're going to pass in that
batch API call URL, not the data URL, batch, API call URL. And then let's do print data
dot status underscore code.
And if this returns 200, then we know it's working correctly.
That is all Okay, so let's go through all of these metrics one by one, I'm actually
going to copy these, and paste them down here and comment them all out. Oops, sorry not
to do. There we go. And then underneath each of these comments, we're actually going to
parse that value from the IRS cloud API. So the first thing we need to do is get the price
earnings ratio. And we actually saw how to do this in the last section of this tutorial.
So if you go back up here, we are going to go to where we copied this original batch
API call URL. And we are just going to grab this parse here.
So it's data and then symbol,
and then quote, and then PE ratio. So let's copy that and scroll back down to where we
were just working, which is right here. And if we print this, this should give us what
we need. Okay, now we're getting this error response object is not subscript well, and
that's because I haven't called the JSON method on our API call. So we're going to add that
here. And then we're going to remove this data status code, because that's now going
to return an error. Let's run this code cell again. Great. So this tells us that Apple
has a current price earnings ratio of 38.84, according to the i x cloud API. Let's now
assign this variable to it for this value to a variable called PE ratio. And we're going
to do similar things down here. So PB ratio is going to be pairs to book ratio. And for
now, we're just going to assign that a value of not a number.
So this is a not a number
data structure that's stored within the NumPy library. If you've never seen that before,
in order to do a P, or sorry, p s ratio, same thing, np dot n a n. And then down here, we're
going to do Evie to either that same thing, np.na n. And then we're going to do Evie to
gross profit, Evie to gross profit equals np.mn. Okay, we've now filled these all with
placeholder values of NP dot and n. Let's now go into the IRS cloud docs, and figure
out where these data points are contained within the i x cloud API. So just like before,
all I did for this was I Google is cloud docs. And it's the first non add, hit clicking.
Alright, so now I'm going to do a Ctrl F search for PB.
And with any luck, this will show
us where the price to book ratio is held. There were seven hits, I just searched through
all of them. I didn't, it wasn't any of them. So now I'm going to track the price to book
it only gets two hits. And both of those hits are this value score, which looks like it's
its own composite metric here. So let's try price to book with hyphens. There we go price
to book, what endpoint is this. So this is the advanced stats endpoint. And to add that
to our batch API call URL, I'm just going to copy it. And I'm going to add it here right
after the quote.
So comma advanced stats. So now we are going to print our data object
and see what we have to parse out. So do that. Alright, so we're gonna have to do multiple
levels of parsing here, we're going to print data, Apple. And then within that, we are
going to parse out the advanced stats endpoint. Alright, so now when we run this code sale,
this will show us what's contained inside the advanced tests endpoint, we want PB ratio.
So I'm going to do a Ctrl F for PB here and see if it highlights anything. I don't see
it there. So I'm going to do a Ctrl F for book. And I see it right there. So price to
book. Now we can copy this in and parse it out. Alright, so this tells us that Apple
has a current price to book ratio of 23.8. I'm going to copy this and assign it to our
variable of PB underscore ratio. Then what I'm going to do is I'm going to remove the
final section of parsing from this line, I'm going to print out the JSON object again,
and I'm going to try to find the price to sales ratio.
So just like before, I will start
by saying PS. There is one hit within this data structure, but it's TTM ups. If you've
never seen that before, all that stands for is trailing 12 months earnings per share.
And that's no and that's not what we want. So I'm going to do a Ctrl F search again for
sales. And you can see right here, we have price to sales ratio. Awesome. So I'm going
to do is I'm going to copy this price to book ratio data point and instead of price to book,
I'm going to change this to price to sales. And then I'm going to print the PS ratio so
that we can see what value we get. So this tells us at the bottom here, that Apple has
a current price to sales ratio of $8.20, or sorry, 8.28.
So I can delete that. Now we
have to parse out enterprise value to EBITDA, and enterprise value to gross profit. Now,
both of these are a little more complex, but we can still make it work. The reason they're
more complex is because the V to EBIT ratio and the EB to gross profit ratio, they aren't
actually provided explicitly by the IAS cloud API.
Instead, we're going to have to pull
in their constituent metrics. So enterprise value, and Eva, or enterprise value and gross
profit respectively. And then we're going to have to do the division ourselves. So to
start, let's try and find enterprise value, we'll create a variable called enterprise
value. And to start we'll assign that a value of np.na n. And then we're going to want to
do a Ctrl F search within this endpoint for enterprise. And, interestingly enough, the
ice cloud API does provide one of the enterprise multiples, but its enterprise value to revenue,
which is not one of the ones we're going to use. So instead, we're going to pull in this
enterprise value. And point, we're going to grab this for price to sales, and we're going
to copy it down to the enterprise value variable, and then replace this with enterprise value.
And then let's print it now since Apple is a very large company, this enterprise value
should be huge.
Yeah, it looks like it's more than 2 trillion if I'm just eyeballing it
there. Actually, let's divide that by trying to see if I'm right one, and then we should
be nine zeros 12345678. Sorry, not nine zeros, 12 zeros. So in three more 123. And it looks
like it's around $2.2 trillion of enterprise value for Apple right now, which is pretty
crazy. Alright, so we have our enterprise value. Now we need our EDA, let's do a Ctrl
F search for Eva, and see if it pops up in this endpoint. You can see right there that
it is pretty easy to access, what we'll do is we'll just copy this, paste it there, or
change the variable name to Eva, and then change the parsing to Eva as well.
It's all
capitals, so make sure you capitalize this as you're working through it. And then for
our Evie to EBIT ratio, what we're just going to do is say enterprise underscore value divided
by and that ran correctly, what we need to do now is printed to see what value we're
actually storing Evie to EBIT. So this tells us that Apple has a current Evie to EBIT ratio
of 28.2. Awesome.
So using this information, we can now proceed to calculating our Evie
to gross profit, we already have our enterprise value, and it's sort of a variable up here.
So what we need to do now is create a variable called gross profit and parse out the gross
profit number from this from this JSON object. So to do that, I'm going to do a Ctrl F search
for gross. And you can see the gross profit metric is right there. So what I'm going to
do is copy the parsing that we did for our EBIT variable earlier in this Kotel, I'm going
to paste it here.
And then I'm going to replace the image parsing with a gross profit parsing.
Now if we print out gross profit, what does it give us? This shows you that Apple had
gross profit of I'm going to say that's 10 billion, I'm not really sure, just by looking
at it, there's no commas, which makes it a little harder to read. Anyways, the numbers
being pulled out successfully, that's the main thing. So now what we're going to do
is say Evie, to gross profit is equal to enterprise value, divided by gross profit. And then we
will print this on the next line print, Evie to gross profit.
And I'm actually gonna run
this code set again with that statement. So this shows us that Apple has an end to gross
profit ratio of 22.6. Okay, so we've now done all of the parsing and calculations that we
need to get all of our value metrics for one stock. Now it's time for us to generalize
this logic and build a panda's data frame that stores all of this information. So we'll
do that next. Now, you'll see in the header for this code sale that he says you'll notice
that I use the abbreviation RV often and that's what we're going to use for robust value,
which is kind of the name I'm going to use for this investment strategy. It's kind of
the equivalent to high quality momentum that we use in the last project. So as before,
the first thing we need to do is say RV underscore columns. And then we need to create a Python
list that defines all of the columns that we're going to use in our panda's data frame.
So the first thing will be ticker that's the stock ticker of the company we're looking
at.
The next thing will be price and that is the stock price of the company we're looking
at. The next thing is number of shares to buy Just like in our first two projects that
tells our trader team how many shares of the specific security to buy when they're rebalancing.
And now we're going to start working through all of our valuation metrics. So more specifically,
we're going to say, price to earnings ratio. And then we're going to say a percentile score
for price earnings ratio. So we'll call that P e percentile. And then we're going to we
need a column for our price to book ratio, so price to book ratio.
And then we need a
column for the percentile score of price to book so PB percentile. And then we need a
column for our price to sales ratio. So that's the price to sales ratio. And then we need
price to actually we'll say p s percentiles, and that matches the abbreviation that we
use for our first two percentile scores. And then we need enterprise value divided by Earnings
Before Interest taxes, depreciation amortization.
So So for that, we're just to bring it right
from the get go because it's quite long. And we will say, Evie, to EBIT, da. And then on
the next line, we will do a percentile score for that metric. So Evie to EBIT. percentile.
And then we need our Evie to gross profit. So we'll say Eb to GP and then Eb two, Eb
to gross profit percentile. The last thing we need to do is calculate our robots value
score. So we'll do that right from the get go, this time, our RB score, awesome, I'm
gonna run this code sale just to make sure that the syntax here is all working, that
is awesome, we can now move on to creating an empty panda's data frame. So we'll call
this data frame RV data frame. And that stands for robust value data frame.
And to associate
it we're gonna say, PD data frame. And then inside of this class call, we're going to
say columns equals RV underscore columns. And what does that look like? It doesn't look
like anything because I didn't check it out yet. So let's add RV DataFrame on the last
line. And as you can see, this has all of the columns that we specified, and we are
ready to start creating our loop. Now, we have already created our symbol strings object
earlier in this project. So if you scroll up here, a little further, what am I looking
for? Here is so this code cell was already provided when you guys open this up. And this
is created simple strings that we can loop through when creating our data frame. So we
don't have to do that, again, because those variable values are already stored in this
Jupyter Notebook. So we're ready to create the loop. All we need to do now is say, for
symbol string in symbol strings, and to make sure this is looping correctly, let's just
print out the single string.
Awesome, that looks good. And then within this loop, we
have to create our batch API call. And we're gonna do that in the same way that we did
before, we're first gonna create the URL. And then once the URL has been created, we're
going to use our requests library to actually execute the HTTP request. So we'll call the
URL batch API call URL. And then we already created this URL a few cells earlier. So we
can just go up to here and copy this with a few minor changes. So the minor changes
I was referring to are that we have to replace this symbol with symbol string, and that it
should be good to go. To make sure this is working properly.
I'm just going to print
the batch API call URL. Let's see what happens. Awesome. You can test easily whether any of
these is working by just clicking on and what that will do is in a separate browser window,
it will open up the JSON object that you'll be pulling in. So let's see, awesome in a
browser window like this, this is pretty useless, because you can't really manipulate it.
But
that's a quick easy way to tell that your API calls working right. Now let's execute
our HTTP request. So we're gonna do that in the exact same way that we did before data
is equal to requests dot get. And then inside that get method pass in batch. API, call the
URL. And then we're going to print out data dot status code. And since we already clicked
on the URL to open up the JSON object in a separate tab a few moments ago, we know that
these API calls are working.
But printing the status code is another way you can check
it. So if these API calls are functioning as we want them to, this should print out
HTTP code of 200 for each API call. So let's see. 123456 and it should be six. So that
is awesome. All right. So now we're ready to start parsing our data frame. And to do
that, the first thing you need to do is add the JSON method to the request dot get method
and what this does is it turns this object which is a request response object in To a
JSON for us to manipulate. Now, if you print out the data for each one will be a bunch
of big JSON objects.
And this is what we need to start manipulating. So now that we're in
here, we need to loop through every ticker within our symbol string, and append the data
for that ticker to our empty panda's data frame. To do that, we need an inner loop.
And that inner loop will look like this. We'll say for symbol in symbol, underscore string
dot split. And then we're going to split on the column character just like we did in the
other two projects. To make sure this is working properly, let's print out our symbol. Awesome.
So you can see we're going through all of the symbols in our stocks, panda's data frame
that we created at the very start of this Jupyter Notebook.
And we're printing them
out in alphabetical order. Okay, so now that that's done, we can start using the append
method to add data to our empty panda's data frame. And to do that, what we're gonna say
is RV data frame is equal to RV underscore data frame dot append. Now we're gonna use
same, we're gonna use similar logic inside of this head method as we've used earlier
in this course. So when you attend a series, and then inside of that panda's series, we
need a Python list. And we also need to specify index is equal to RV columns. Now, in case
you're not remembering what this does, this index equals RV columns specifies which columns
of the empty panda's data frame, each item of this pandas series should be appended to,
and then inside of the pandas series, we want to add the ignore index equals true. Now,
I haven't really done a super job explaining this.
But to be very specific, what this does
is it means that every row that you add to the panda's data frame will have its index
column automatically calculated for you. So the first row will have index zero, the second
row will have index one, the third row will have index to the 50th row will have index
49, and so on. So that's what this ignore index does. Now, within this panda series,
we need to parse out all of these data points.
So like I did in the last project, what I'm
going to do here is I'm actually just going to copy these in. And since it's a comma separated
list, we can just replace each element of this list with the data that we want to parse
out. So we'll do that one by one ticker, we want to replace with symbol. And remember
that symbol is the incrementer of the inner for loop that we're building here. For price,
we want to parse it with price. And we saw how to do that up here. That's actually not
true, we didn't see how to do that there. But what we can do is go back up to our earlier
project where we did this for for just a screener that only factors in price earnings ratio,
we can copy this parsing that we use here.
So we'll do that and scroll back down here.
And then we want to replace this price with that awesome. Now number of shares to buy
just like in the in the other projects that we worked on in this course were replaced
with a placeholder value and a price earnings ratio we actually parsed in the last code
cell along with all of the other valuation metrics that we need. So what we can do here
is just say, copy that and add it where the price earnings ratio is.
And then we come
to the section where we need price to earnings ratio percentile and all of the percentile
scores in this pandas DataFrame, just like in the ones that we worked with earlier in
this course, we're going to replace them with a place where the value of na, because in
order to calculate percentiles, you need the panda's data frame to be fully populated first.
So we will calculate these a little later on in this project. For now, we're going to
replace them all with an A.
And one more. And for that matter, this robust value score
in the last column also needs here. So let's do that now to Alright, the next thing we
need is our price to book ratio. And just like the price to earnings ratio, we had parsed
that out in the last Cosell. So I'm going to copy that I'm going to add that here. Oh,
sorry, not there, down here. I knew something was gonna look great. So add that there, we
are going to do the same thing with our price to sales ratio. So that's right here.
All
right now for Evie to even up an Eevee to gross profit. We actually have to do some
special logic here because there are metrics that are using both of those calculations
that aren't available for every stock. So here's how we handle that. The first thing
we need to do is pull in our enterprise value, or even in our gross profit metrics. And we're
going to do that inside of this append method. So there's enterprise value. Here's the eba.
And here is gross profit. Awesome. And then in here, for now, I'm not going to do the
special logic that I just referred to referred to earlier, because I want to show you what
happens if you go into pptp. Oh, and this isn't easy as it should be enter price value.
And then here would be enterprise value divided by gross profit. All right, let's run this
and see what happens. So far, so good 72. And then that we get a key error for a Pl.
And that's because all of those applets it says need to be replaced with symbol.
Let's
do that. Try that. Great. We have another key err, because there's more instances of
Apple up here. That is embarrassing. Like I said earlier in this course, I'm not going
to edit this out, because I want you to see how I debug stuff in real time, it might be
useful to see the debugging process of a different developer. Okay, what happens now, here's
what we're getting. So it says unsupported operand type for division, integer and non
type. And that's referring to this line right here. Okay, what's going on, let's crank out
our enterprise value. So we have a value for enterprise value. Now let's print it or EBITDA.
None, now, that is the source of the error. So to fix this, we need to find some sort
of workaround to fill this value, even when there's a none type for one of the metrics
that it contains.
Now, there's lots of different ways that you can handle this. But what we're
going to do is this, I'll show you so we saw when building our portfolio input function
in earlier projects of this course, that tricep statements can be used to handle exceptions.
So that's what we're going to use, we're gonna use try this. And if that doesn't work, we're
going to make an accept statement that handles specifically the type of error one carrying,
which is a type error. And then we're going to say np.na n now, because this is in an
append statement, we actually can't put it there. Instead, we're going to create an EV
to EBITDA ratio, and put this outside of the append state. And I'll show you more about
what that looks like in a second correctly. Okay, so this is going to be putting aside
the append statement right here after we create all these.
And since the indenting, is in
a loop, I have to tab that in. And then we will replace this with sorry, Evie, to Eva.
Now, okay, what this does is it tries to calculate a vd beta using this. And if that fails because
of a type error, then it will fill it in with NP dot and n. Now, that's not good, because
na N stands for not a number, and how are you supposed to source logs on value metrics
that aren't numbers? Well, we're going to see later on in this section, or later on
in this project, I should say that there's a number of different ways that you can handle
missing data in the pandas library to fill in the values that aren't there with reasonable
values by performing calculations on the other values in the data set.
So I will see more
about that later. But the main idea to remember now is that even though we're filling our
panda's data frame with null values, now, those aren't going to stay there for our final
calculations, because we're going to replace them with something more meaningful later.
Okay, the next thing we need to do is we haven't seen it yet. But this type error happens with
the gross profit metric as well because some companies don't actually report gross profit
depending on how, what their operations look like and those sorts of things. So what we're
gonna do is create a similar Eb to gross profit metric here and use the exact same type error
Exception Handling to to handle this, okay, so Eb to gross profits, enterprise value divided
by gross profit, this looks good. The only other thing we need to do is take this variable
and replace the actual divisor in the income statement. Okay, now I'm going to delete this
other code cell that we were working with and run it again to see if it works.
So far,
so good. Awesome, so that code cell ran with no problem let's print our data frame now
see what it looks like. Awesome. So we have ticker stock price number of shares to buy
price to earnings ratio, p percentile price to book ratio, PB percentile, price to sales
ratio. O p s percentile EV to EBITDA percentile up
to GP percentile and the army score. Everything looks like it was added properly here. So
we can now move on to dealing with missing data in our data frame. Now, our data frame
contains missing values, because we use that try accept logic in the last section to fill
in missing values. Whenever the cloud API provided a nun response for a metric that
we were using. That's basically what this says it says our data frame contains some
missing data because all the metrics we require are not available through the API that we're
using.
So to see where this missing data is, what you can do is use the pandas is no method.
So this is kind of the statement you execute to see all call or sorry, all the rows that
have missing data, and then you say, RV data frame, and then in square brackets, you would
say RV data frame is no. And then on that isn't all you want to say any and then axis
equals one. Okay, so to break this down, we're taking our RV data frame, and then we are
filtering in the square brackets to say any part of RV data frame where the is no method
is true. And then on that we want to apply it to any columns. And this axis equals one
tells us columns, not rows. So when we run this, this is going to return a modified version
of our panda's data frame that only contains rows where there's missing data.
All right,
so as you can see, there is quite a bit of missing data. And you can actually see how
much there is by calling the length of this index. So there's 55 rows that have some sort
of missing data in the panda's data frame, which is about 10% of the entire data set.
So let's look at this again. As you can see, all not all of the indices of the panda's
data frame exists. So it starts at 1718 2026. There's lots of missing rows, because all
the missing rows have no missing data. So let's take a look at this first row, which
covers the stock Aflac, which is an insurance company, there is a non value here for price
to sales ratio, there is a nun value. Actually, that's the only non value for that column.
Now, if you look, in other columns, like this column, there's a non value for price to earnings
ratio.
And overall, these non values aren't super problematic, because in the next section,
we're going to see how to deal with it. So this says dealing with that dealing with missing
data is an important topic in data science. And there's two main approaches, you can either
drop the missing data using the drop na method, or you can replace the missing data with a
fill in a method.
So the following NF, it's pretty interesting, because you can actually
pass information that allows it to fill in missing data with summary data from that same
column. So as one example, if we have missing data in our price to earnings ratio column,
we can fill it in with the average value of all price earnings ratios for all the other
stocks in the data set. So that's exactly what we're going to do, we're going to replace
missing data with the average non null data point from that column.
So here's the code.
To do this, we need to create a for loop that loops over all of the columns in our panda's
data frame. So we'll say for column and RV underscore data frame dot columns. And then
let's just print this out to see what is happening. Great. As you can see, this is successfully
looping over our pandas data frames. And then what we want to say is our V underscore data
frame, dot fill, and a Now if you've never used this before, one easy way to see which
parameters it accepts is most into the the parentheses of the method and then hit Shift
Tab. And this will pop up a little bit of documentation. So what we're going to do is
we're going to say fill in the missing values with the average value from that column.
So
to get the average value, we're going to say, fill in a and then we're going to say, RV
underscore data frame, we're going to select that specific column from the data frame.
And then we're going to call the mean method on dot mean. And one other thing I forgot
to add here is that we don't want to apply this to the entire data frame, we want to
apply to this specific column on that data frame.
Now, the reason why you have to add
this is because if you didn't, for the first iteration of this loop, when it loops through
the ticker column, it would fill the entire data frame with the average value of the ticker
column, which is not what we want. So if we run this, we get a value error and says cannot
convert string to float. Now, the reason why this is is because you can't actually, so
what this is doing is it's trying to calculate a mean for the ticker column, which is obviously
impossible.
So we actually need to create a subset of our columns. And to do that, the
easiest way is probably to just print out our columns, early data columns. We will grab
in And then we need to modify our for loop so that it only loops over certain columns
of the panda's data frame. So we will put that there. And then does this still work
with it being multi line, it does awesome. So we want to take out all of our non numerical
columns. So that would be all the percentiles it would be PB percentile, p s percentile,
EV to EBITDA percentile.
Ebgp percentile, and are the percentile. And I'm actually going
to take out this Price column too, because there's not going to be any missing price
data. Alright, so if we run this, this should look a little better. I'm actually going to
put these all in one line just for readability. There we go. Alright, what happens when I
run this? Awesome. Now, if we run our same command from earlier, to identify missing
data in our data frame? If we have done this properly, this should return an empty data
frame because there shouldn't be any null values anymore. And there still is why is
that? This price to sales ratio still has quite a bit of null values.
Why is that? Oh,
I see. So within this fill in a method, you actually have to pass the in place equals
true. And this says, Don't just return the modified column, modify the original column
instead. So we'll do that. And then when we run this, we get an empty panda's data frame,
which shows us that we have successfully dealt with all the missing data in the original
data frame.
So that's kind of a high level introduction of how to deal with missing data
in pandas. There's lots of more information you could learn about this. And dealing with
missing data is a full topic in and of itself. So this is just a high level introduction,
we can now move on to calculating value percentiles. Alright, so what we need to do now is we need
to calculate value score percentiles for every stock in our universe. More specifically,
we need to calculate percentile scores for the following metrics, price to earnings ratio,
price to book ratio, price to sales ratio, Evie to EBIT and Evie to gross profit. Now,
unlike before, we can't use the start of every string as a way to interpolate in that string.
Because in the last project, we had the same two words, at the start of every momentum
metric, like an example would be, we had one year price momentum and one year, return percentile.
So we can't do that anymore, because we use some abbreviations. So if you look here, you
can't use price to earnings ratio to find the price earnings ratio percentile column
because there's an abbreviation.
So I did that intentionally so that you can see a different
way of approaching this problem. What we're going to do instead is I just copied the column
names to my clipboard like this. And we are going to turn this into a dictionary, we're
going to call this dictionary metrics. And here's what we're going to do. So we need
to wrap this entire thing in curly brackets. And for every metric column, that column name
is going to be a key for the dictionary. And that percentile column is going to be the
value of the dictionary like that. So if we do that same operation on a few different
lines, this will fill it up pretty quickly. All right, let's put these on our own lines
like that. Okay, so basically, what this does, I'm gonna run this code cell to make sure
it's formatted properly. And the way we're going to use this dictionary is we're going
to pass in a, a metric, and it will return back the name of the corresponding percentile
row.
So to calculate our percentile scores, we are going to loop through every metric
in this metrics dictionary. So here's how we'll do that we'll say for metric in metrics,
dot fees. And then within here, we need to calculate percentile scores. And we need to
do that for every row. So we need to create an inner loop that says for row in RV underscore
data frame, index, and then here's what we're gonna do. We're going to print the row now
this should print how many metrics are there five, five times this should print zero to
504. So we have 12345. Awesome. Okay, now here we need to use the pandas LLC method
to access the percentile score.
Columns. Now here's how we'll do that we'll say RV underscore
data frame, dot LLC. And then the first entry of this LSE method will be in a row and the
second entry will be the column name. And that will be metric. No sorry, that won't
be metric that will be metrics have metric. And this funny looking dictionary access statement
is basically pulling in a percentile column. To make sure that that's working properly,
let's just print this. And what this should do is just print a whole bunch of anaise.
Awesome. So this shows that we're accessing all of the different percentile columns. And
we want to do here is use the percentile of score to calculate a percentile. So to do
that, we'll say that is equal to stats dot percentile score.
And then within this, we're
gonna have two items. So this will be the entire column. And this will be the value
of that row at the column. So to get the entire column, we're just going to say, RV and just
look at the thing. And then we're gonna pass in the name of the column, so that will be
metric. And then over here, we need an over here, we want to get the value of the row
at that column. To do that, we're going to say, RV underscore data frame, LLC.
And instead
of saying, the metrics dictionary, at that metric, we're just gonna say the metric. So
let's run this and then print our panda's data frame to see what that looks like. Great.
So here, we have all the percentile scores. Now, you can see this appears to be right,
because here we have a low Evie to GP and has a low percentile. Here's a high Evie to
GP with a higher percentile. And you can see the same logic for all of these now, just
like before, this Python line is very long, and it's actually all fit on one line. So
I'll show you how you can easily make that a little cleaner. we'll import the function
specific resources from sci fi dot stats, import, percentile of score as score. And
then we can replace this with just score. And actually, since we use a different dictionary
method, this now fits on one line is much more readable. If you run it, they'll give
the same output. Great. So that's how we calculated our percentile scores. As a quick recap of
what is done in this code cell, we imported the percentile of score function from the
Sai pi dot stats module, and we named it score.
And then we created a dictionary of metrics
where every key in the dictionary is one of the valuation metrics that we're using an
every value of the dictionary is the corresponding percentile column of our panda's data frame.
And then we created a double nested for loop, where the first for loop loops over every
metric in our metric dictionary, and the second for loop loops over every row in our panda's
data frame. And then, within both of those four loops, we have an operation here that
calculates the percentile score of every evaluation metric and assigns that value to the corresponding
percentile score column. And at the very end, we just printed out our past data frame. Now
we can move on to calculating our lead score. So just like before, we're going to use the
mean function from the built in Python stats module. So we'll say from stats, import name,
and then we're just going to calculate the mean of five and 10. Make sure that this is
working properly. We say no module name stats, that's because I should have said statistics
Great.
So the mean of five and 10 is indeed 7.5. So that seems to be working properly.
Now what we need to do is we need to loop through every row of our panda's data frame,
and calculate the the avar calculate the RV score for that row. So to start, we'll just
say for throw in RV storage data frame index. And let's just print that row. So this should
print a bunch of numbers from zero to 504. So it goes zero all the way to 504. That is
excellent. Now, we need to access this row of the panda's data frame and like before,
we can do that with the LSC method. So what I'll do is I'll just copy this, paste it here,
and I'll change which column we're accessing to RV score.
And to make sure this is working
properly, let's just print this. Alright, that looks good. Now what we need to do is
access all of the different percentile scores from our pandas DataFrame and calculate their
mean. So on this side, we'll say mean and then since this will be a fairly long line,
I'm actually going to variables for all these. It's a P e ratio, PB ratio, PS ratio, Evie
to EBIT ratio, and Eb to gross profit. Now I need to grab the LSC method from up here.
Actually, I just thought a better way to do this, what we're going to do is we're going
to delete this, we're going to create a empty pandas list, and we're going to call that
value percentiles. And we want to grab this same for loop here. So for metric metrics
duckies. And we are going to say value underscore percentiles, dot append. And then we are going
to grab the value from the corresponding percentile.
Awesome. We want netwrix indexing metric.
Alright, we are going to now just do this over one iteration of the loop to make sure
it works properly. And before we do that, we actually have in here to calculate the
mean of this value percentiles. Yes. And inside the loop, we are going to correct the RV DataFrame.
Awesome. Now, one other thing we can do to test this actually is printed out value percentiles.
Alright, so to make sure this is working properly, all we have to do is compare this to this,
you can see the same, we have to compare this to this, you can see that the same, we have
to compare this to this and you can see that they're the same. And we have to compare this
to this, and this to this. Awesome, so that is working properly. And what we can do now
is delete these print statements. Make this loop iterate over the entire panda's data
frame. And then inside the loop, we can print the RV underscore data frame. Awesome, you
can see all of our percentile scores were correctly calculated.
And now we can move
on to calculating. Now we can move on to selecting the 50 best value stocks. Great. So as before,
we can use the panda sword value method to identify the 50 best stocks in our universe
according to the RV score column. So to do that, we access our RV data frame, we call
the store values method on it. And then in here, we have to pass in a few things we have
to pass in the column we want to sell it on. And then we have to specify the ascending
method. And we want this ascending to be true because we want the lowest valuation metrics
at the top. And then we have to specify in place once that's done, we can print our panda's
data frame outside of this or values given like this to see what modifications have been
made. All right, so now you can see the cheapest stocks according to a basket of different
valuation metrics. We have Kohl's we have FTI. I'm not actually sure what company that
is, but we can find it pretty easily FTI stock and it's technic FMP.
It looks like they're
a global oil and gas company. It makes sense that they're cheap right now because oil prices
are so low Asia HFC is hollyfrontier Corp. They are also an oil and gas company. And
Al is an airline. They are priced very cheaply right now because the entire travel industry
has experienced significant headwinds due to the covid 19 pandemic. And HP I believe
is Hewlett Packard. I note Yeah, Hewlett Packard enterprises and they make printers and those
sorts of things. They've kind of always traded a pretty cheap valuation. So that's that.
On the other end of the spectrum, you can see what stocks are trading at extremely high
valuations.
So dx cm, I believe is dex calm. They make continuous glucose monitoring. Yep,
they make continuous glucose monitoring systems for diabetes management. Now I'm not sure
what stock that is, but we can find out now. This is ServiceNow and they're an American
software company. Every software company in the world is trading at pretty high multiples
right now that we have AMD they make CPUs and other sorts of computer hardware just
like Nvidia and then pa yc i think is paychecks which is a it is paychecks they are a payment
processing company. Not pay comm although they're in the same business. It's an American
online payroll and human resource technology provider based in Oklahoma City. Okay, so
that's done. What we need to do now is we need to drop all of the stocks that are after
50th index. So to do that, we will say RV underscore data frame is equal to RV underscore
data frame at index zero to index 50.
And I actually didn't print that. So what we can
do now our data frame. Great. So this looks like it's been truncated to only contain the
50 cheapest stocks in our universe to make sure that there's like 50, what we can say
is we want to calculate the length of the index of the data frame. And that is 50. Awesome.
The only other thing we need to do is we need to filter or sorry, not filter, we need to
reset the index of a panda's data frame. And we saw before that we do reset underscore
index. And then we want to pass drop equals true into it to avoid duplicating the existing
index of the panda's data frame. And then we also want to pass in, in places where it's
true. And like all the other instances, including right here, what this in place equals true
parameter does is it means we're gonna actually modify the original panda's data frame instead
of just returning a modified version temporarily.
So we already have our data frame and see
what this looks like. Awesome. So now our index goes from zero, I should finish up 49.
Yep. And these are the 50 cheapest stocks in our universe. So now that we've selected
the 50 best value stocks, we can move on to calculating the number of shares to buy. Now,
earlier in this Jupyter Notebook, we defined a portfolio input function that will allow
us to easily create our portfolio size. So to do that, we will say portfolio underscore
size, we will run this by itself, and then we will enter a string object does not call
it a poll. Why is that? That's because we need to pass in portfolio input in our portfolio
size. All right, now we want to enter the value of our portfolio. So that's going to
be a million dollars. So one and then six zeros 123123.
Actually, let's spice things
up a bit and do something different. So we'll do two and a half million to two, five, and
then five zeros. 12345. Awesome. Now let's print our portfolio, size object or variable
i should say, can make sure that it's stored correctly. Awesome. Now we need to calculate
our position size. And we've done this twice now first in our equal weight s&p 500 strategy
and then in our momentum strategy. So you should be good to do this on your own, I would
encourage you to pause this video and try to do it by yourself. But if you want to follow
along, you can. So to do this, we will say position, size is equal to and then the numerator
of this division is going to be the portfolio size variable that we just created.
And the
denominator is going to be the length of our panda's data frame. So I've mentioned this
in earlier videos, but I prefer to just do the length of the RVM store data frame index,
and then we can print our position size. Now, what does this give us an unsupported operand
type for string and integer. That's because I actually ran into this problem in the momentum
project as well, we need to force this to be a float.
Now what is this giving us $50,000.
And if you think about it, our portfolio is two and a half million dollars. And there's
505 stocks in our sorry, 50 stocks in our universe. So if we want to make sure that
this is right, we can just test using the double equality operator, whether the position
size is equal to two and a half million divided by 50. And if this returns true, then we know
that the calculation has been done properly. Awesome. Now, we need to figure out how many
shares of each stock we need to buy to reach a position size of $50,000 in each stock.
So to do that, just like before, we're going to create a for loop that loops through every
row of our panda's data frame.
So we'll say for row in our VM, so data frame index, and
make sure that's living properly, we'll just print them that should go from zero 49. And
it does, which is awesome. And then we will say RV underscore DataFrame, LLC. And in here,
the first thing that we need to put in is the row, the second thing we need to put in
is the number of shares the by column. And to make sure that's working properly, let's
just print it out. The reason I always test these LLC methods is because sometimes if
you type the column title wrong, it won't work. And it looks like that's exactly what
happened here. So let's see what happened.
We have number of shares to buy. And to see
what the difference is I'm going to scroll up to where we originally defined that column
title, and I'm going to paste it beside it. So that's way up here. And so we have number
of shares to buy number of shares to buy. They look the same to me. Maybe that's not
actually what the error was. Let's go back down and actually read the error message that
could help during handling of the above session, okay, key error Okay, let's print out our
column names to see what's going on. Number of, Oh, my soul, it is it is just a typo,
I didn't put an S on share.
So that's the problem. And now this prints out all the entries
in our number of shares to buy column. Of course, that's not what we want, we need to
actually change the value of those cells in the panda's data frame. So to do that, we'll
use the assignment operator, the equal sign, and then we need to do a division. So what
goes on the top, it's our position size. And then in the numerator, we need to figure out
our stock price. So to do that, we will use another LLC and paste it there, we will replace
the column with price. And that should be all we need to calculate the number of chairs
we need to buy.
So now that that's done, let's print out our V data frame and run this code.
So great. So we can see here that for calls, it says by 2000 shares for FDI, it says by
6500 shares. And you'll notice that since we use the bigger portfolio size, these numbers
are just broadly larger than when we use the $1 million portfolio size, which makes sense.
You can also quickly sanity check this by looking at this stock, it has a lower stock
price than this stock. And because of that, you need to buy many more shares of that stock
to reach the required position size. So everything looks to be good here. The only other thing
we need to modify is to say math dot floor. And what this does is it rounds every entry
in that number of shares to by column down to the nearest integer, for all the reasons
that we described in the first two projects of this course.
So that is done, we can now
move on to formatting our Excel output. Great. So just like before, the first thing we need
to do is create our writer object. And to do that, we will say writer is equal to PD
dot Excel writer. And then in here, we have to pass in a few things. The first thing we
have to pass in is the name of the Excel file that we want to create, which is value underscore
strategy dot XLS x x x x x s LS. And then the next thing we need to pass in is the engine,
the engine that we use here, just like before, is XLS x writer. Now to make sure this is
working properly, you can just do the render dot save method that we have learned about
earlier. And then if you go into the folder that we're working in starter files, you should
see the file here, and it should just be empty. So here it is. And if you click on it, it's
just an empty XML document, which is awesome, because we haven't done any work on it yet.
So what we need to do now is we can delete this save method from that code.
So now we
need to access our RV data frame. And we use the to excel method to pass in the name of
to pass in a panda's data frame to the Excel file that we're working with. Now this takes
three things, you have to say Reiter, and then you have to say she underscore name is
equal to the value strategy. And then you have to pass in index equals false.
And this
just means that the index here will get passed into the panda's data frame. So we'll do all
that. That creates our fantasy frame. The next section is that we need to create the
format's that we'll need for our XLS x file. And you'll recall from our first project that
we've already have lots of practice with creating these formats.
And since we've already done
this, we're just going to run this code cell and then continue through this tutorial. So
the next thing we need to do is actually apply these formats to each of our columns. And
to do that, we're going to start by creating a dictionary and then we will loop through
that dictionary afterwards. So let's go up here to where we first created the columns
of our panda's data frame. And from there, we will paste them in and create a dictionary.
So here's all of our columns, we will scroll to the bottom to paste those in here, we'll
get the indexing set up properly. And then we will say column formats equals. And there
is I'm going to add this to its own line, and then I'm going to put this ticker on a
new line.
And then I'm going to get the indenting. Awesome, that looks a bit better. What we
need to do now is actually format this like a dictionary. So I'm going to use the multi
cursor functionality of this Jupyter Notebook to add my cursor to every one of these. And
then I'm going to add some boilerplate each of these lines. So the first thing we're gonna
do is specify a column. I'm going to put a on each line and then we'll fix it afterwards.
Then we need to put the semi colon which define basically is the defining characteristic of
a Python dictionary. And then we're going to create a Python list and to do that, we're
going to move everything At the end of each line, there we go, we're back one and then
add the closing list character. The other thing we're going to do is move back another
one, and add in a comma, and then string underscore format. Now, this worked well for every every
line except for a few.
So it's always good to go through and just sanity check these,
that one needed a bit of work and this one here, something wrong with it, that used to
go there, and we can delete that. Okay, I'm going to just run this code to see if this
is formatted properly. It's not string format is not defined. Okay, that's because it's
called string template, that is not good. What I'm going to do is do a Find and Replace.
So I haven't showed you guys this yet. But it's an interesting feature added up real
quick, if you hit Edit here, and then scroll down to find replace, what I'm going to do
is I'm going to search for string underscore format, and replace all of those with string
underscore template. Boom, that fixes that problem. Now, if I run this code, show code
cell, we should be good to go. And we are so that is awesome. What I need to do now
is change the column letters for each of these. So I hope you guys all know your alphabet
will go A, B, C, D, E, F, G, H, J, and R.
All right, now we can actually move on to
formatting our our Excel file. So what we'll do to start is we will say for column in column
underscore format stuff. So what this does is it loops through all of the column letters
that are keys of this dictionary, and make sure that's working properly. Let's just print
out all the column letters, it loops from a through n, and the dictionary goes from
a through n. So that is good. Now what we need to do is we need to call writer dot sheets.
And then we need to access the sheet that we named our panda's data frame, and that
is value strategy. So we will copy that, we will move back down here and we will paste
that in there. And then we would call the set column method. And you'll recall that
this takes a few different parameters, it takes the columns and that will look like
that it takes the column width, so we'll use 25 pixels.
And then it takes the format that
will look something like this string underscore template. Now, I'm going to run this loop.
And you'll notice that the statement that is being run with each loop doesn't actually
change with response to the loop so that this is just going to make sure that this is working
properly for column A. And then once that's done, we can go into about changing the values
for each iteration of the loop. I'm also going to call the writer dot save method to save
it. And we can now look in our working directory to see if this has formatted column A successfully
and it has, so we can now move on to interpolate values into this. So here we want to use column.
And we need to change this to an F strings that actually interpolate properly, we need
to use column there.
And then here, we need to access the second item of this list. So
to do that, we will say I can't quite scroll this up enough to get it away from my webcam
there. That should be okay. So here we want to say column underscore formats at column.
So this gives us the list and then we want the second item of that list. So we'll try
that and now we'll see what our data frame looks like. Actually, to do that, we need
to go back up here and run all this awesome, grow working directory, open this up. And
they've all been formatted with the string format. So that is good. What we need to do
now is we need to go up here and actually change the format that's being applied to
each one. So we have string template, dollar template, integer template, flow template
and percent template. And this float template here actually did wrong this should be 0.0.
We're on that. And then we'll go through here and change its format according to what should
be dollar this should be an integer.
This should be a float. P percentile should be
a percent. price to book ratio should be a float. price to book ratio percentile should
be a percent, price to sales ratio should be a float. So sales ratio percentile should
be a percent indash should be a float. Even a percentile should be a percent. You need
to gt should be a float. Maybe a GP percentile should be a percent and robust value score
should also be a percentage because it is an average of all the different percentiles
in our panda's data frame.
So we'll run all this again and see what this looks like. Awesome.
So we have ticker, which is a string price, which is $1 number of shares to buy, which
is an integer price to earnings ratio, which is a float p percentile. Everything looks
to be good here. Great. So the only other thing that needs change about this is that
we need to format our columns. And we also need to change our percentiles, because just
like in the momentum project, they are two orders of magnitude bigger than what they
should be.
So we'll fix the percentile score calculations first, that's pretty easy, all
I need to do is scroll up to here, find the loop where we calculated our percentile scores
calculated by percentiles. So if we divide this by 100, and that should fix the problem.
So we can do this, run a bunch of code cells, do a two and a half million dollar portfolio
again, and then we're back down to the format error. So I'm actually gonna go through all
of this and save it again, just to make sure that that was done properly.
If we open this
up, and we go here, we can now see that the percentiles all range from zero to 100, as
desired. Awesome. The next thing we need to do is actually format our column headers.
So to do that, we will say, writer dot sheets, and then passing values strategy, select these
sheets that we want to format, and then we call the right method. Now, you'll recall
that the right method takes a few different things and takes the cell we want to write.
So this will be for each row, or sorry, for each column, it will be the the first cell
of that column, so we'll say, column, and then one. So that's the first cell of that
column. The second thing we need is the actual value that we want to write into that cell.
So that will be the first entry of each of these lists just like before, so we paste
that in and use index zero. And then the last thing that it takes is the format that you
want to apply to that cell so that just like the set column method that we use here is
going to be the second entry of that list.
So we will pass it index one. Alright, so
that should be all the formatting. What I'm going to do next is just move this writer
dot save down to here, I'm going to delete all of these empty cells. And then I'm going
to scroll back up to where we created our Excel file. I'm going to run all the code,
and you should be good to go. So this started here, so I'm going to click that and then
do Shift Enter until I reached the end of the Jupyter Notebook.
And here's our final
spreadsheet. So as you can see, all the formats have been applied successfully. And you have
now created a quantitative value investing strategy that identifies the 50 cheapest stocks
in the s&p 500 using a basket of different valuation metrics. And that concludes our
third project. So kudos to you for making it all the way through this algorithmic trading
and Python course..
As an Amazon Associate I earn from qualifying purchases.