Scraping Stock Market Prices With Impressive Ease
by James (Golang Project Structure Admin)
Global stock markets increasingly affect everyone’s lives, whether we like it or not.
Investors seek financial returns from profitable companies, and workers may find it easier to bargain for new jobs or pay rises when the markets are riding high, at least if they possess the sorts of skills that are in demand. Economists and politicians also use prices to judge how much confidence there is in a country or industry’s biggest businesses.
This post will show you how to scrape the web to retrieve information about a specific company’s stock.
Sources of Information
There are commercially available APIs that will provide reliable and regularly updated sources of information — often in the JSON or XML file formats — on the movement of prices on the global stock markets. If you’re building something critically important, then you should probably use one of these. Two companies that are providing some of the best APIs are Alpha Vantage and IEX Cloud. But subscriptions to access those APIs cost money or have restrictive limits for free accounts.
Since we’re just going to download some basic information on the price of a single stock, we don’t need to go to all that trouble. We can just scrape publicly accessible information from the internet. The source that I’m going to use is Yahoo Finance.
Downloading the Page’s HTML
One of the biggest companies listed on the Nasdaq right now is Tesla. So let’s use that in our example code, and we’ll be able to see whether Elon has managed to send the stock price soaring or tumbling with his Twitter trolling today.
Every major stock has its own symbol, which is usually three or four letters long and helps to uniquely identify the company in question. Tesla’s stock symbol is TSLA. We’ll use this in the URL of the webpage that we’re going to scrape our information from.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

const (
	symbol = "TSLA"
)

func main() {
	res, err := http.Get("https://finance.yahoo.com/quote/" + symbol)
	if err != nil {
		log.Fatalln(err)
	}
	defer res.Body.Close()

	if res.StatusCode != 200 {
		log.Fatalf("status code %d: %s", res.StatusCode, res.Status)
	}

	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatalln(err)
	}

	fmt.Println(string(body))
}
In the example above, we simply perform an HTTP GET request using the client in Go’s standard library. We then print out all of the page’s HTML, just to test that it’s working — or we exit the program with a fatal error, if, for any reason, we can’t download the page. Check your internet connection before you run the program, to make sure that you can access the web. You could even try loading the Yahoo page in your web browser, so you know what it looks like.
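One thing worth knowing is that http.Get uses Go’s default client, which has no timeout, and some websites treat requests differently depending on the User-Agent header they receive. Here is a minimal sketch of building the request by hand instead. The helper name buildQuoteRequest, the ten-second timeout and the User-Agent string are all illustrative assumptions of mine, not anything that Yahoo documents or requires.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// buildQuoteRequest is a hypothetical helper: it escapes the symbol for use
// in a URL path and attaches a User-Agent header to the GET request.
func buildQuoteRequest(symbol string) (*http.Request, error) {
	u := "https://finance.yahoo.com/quote/" + url.PathEscape(symbol)
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	// The header value is an arbitrary example chosen for this sketch.
	req.Header.Set("User-Agent", "stock-scraper-example/1.0")
	return req, nil
}

func main() {
	req, err := buildQuoteRequest("TSLA")
	if err != nil {
		fmt.Println("cannot build request:", err)
		return
	}

	// Unlike http.Get, this client gives up after ten seconds.
	client := &http.Client{Timeout: 10 * time.Second}
	res, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		fmt.Println("cannot read body:", err)
		return
	}
	fmt.Printf("status %s, %d bytes\n", res.Status, len(body))
}
```

Escaping the symbol makes no difference for TSLA, but it keeps the URL valid if you later try a symbol that contains unusual characters.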
Getting the Market Price
Now that we’ve been able to download the webpage, we’re going to use it to get the information we need.
We could have used the HTML-parsing package created by the Go team (golang.org/x/net/html), but there’s a much easier option that we’re going to use. Goquery is a Go package that is designed to work similarly to the tried-and-trusted JavaScript library jQuery: in other words, it makes it really easy to find specific HTML elements on a page using selectors and to modify them or get their attributes or content.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

const (
	symbol = "TSLA"
)

func main() {
	res, err := http.Get("https://finance.yahoo.com/quote/" + symbol)
	if err != nil {
		log.Fatalln(err)
	}
	defer res.Body.Close()

	if res.StatusCode != 200 {
		log.Fatalf("status code %d: %s", res.StatusCode, res.Status)
	}

	// Parse the response body into a document that we can query.
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatalln(err)
	}

	// Select the element that holds the price for our symbol.
	marketPrice := doc.Find("[data-field=regularMarketPrice][data-symbol=" + symbol + "]").Text()
	if marketPrice == "" {
		log.Fatalf("cannot access market price")
	}

	fmt.Printf("%s price: %s\n", symbol, marketPrice)
}
You can see in the code above how we just create a document in Goquery using the body of our HTTP response, which provides the HTML of the webpage, and then use the Find method to select a specific element. In this case, we want to check the stock-market price of Tesla, so we select the element on the page that contains the price and then we store it as a string using the Text method (similar to JavaScript’s innerText property or the text method in jQuery).
Remember that you’ll need to run the go mod init and go mod tidy commands in the directory where your Go code is located, if you haven’t already set up a module and downloaded the necessary package.
If everything worked successfully, the code above should have printed out the current price of Tesla, using the latest data from the US stock markets. We stored the marketPrice variable as a string, but you could easily convert it to a float64 or two ints (one for the dollars and the other for cents, using the decimal place as a separator). However, we only want to display the number in our console, not manipulate it, so there’s no need for us to parse the string: we’ll just print it out.
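If you did want to convert the string, here is a minimal sketch of the float64 route. The parsePrice helper is a hypothetical name of my own, and it assumes that the scraped text uses commas as thousands separators and a full stop as the decimal point, as in "1,007.18".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePrice is a hypothetical helper that converts scraped text such as
// "1,007.18" into a float64 by removing the thousands separators first.
func parsePrice(text string) (float64, error) {
	cleaned := strings.ReplaceAll(strings.TrimSpace(text), ",", "")
	return strconv.ParseFloat(cleaned, 64)
}

func main() {
	price, err := parsePrice("1,007.18")
	if err != nil {
		fmt.Println("cannot parse price:", err)
		return
	}
	fmt.Printf("%.2f\n", price) // prints 1007.18
}
```

A float64 is fine for display or rough comparisons, but the two-int approach mentioned above is the safer choice if you ever need exact arithmetic on money.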
We now have a program that connects to the web, downloads a page containing information about a particular company on the stock market, parses that page, gets the price and prints it out to the screen. I hope you’re as impressed as I am at how easy that was to achieve!
Creating a Helper Function
At the moment, we’re just stuffing everything into our main function, but let’s create a helper function to tidy things up. We’ll separate the code that downloads the webpage from the code that parses it.
func downloadStockWebpageBody(symbol string) io.ReadCloser {
	res, err := http.Get("https://finance.yahoo.com/quote/" + symbol)
	if err != nil {
		log.Fatalln(err)
	}

	if res.StatusCode != 200 {
		log.Fatalf("status code %d: %s", res.StatusCode, res.Status)
	}

	return res.Body
}
The downloadStockWebpageBody function only performs a single task: it connects to Yahoo, downloads the webpage containing the necessary stock-market information and returns the HTTP response body (so we can pass it to goquery). Notice that there is one small but important difference between our code here and the code that we had in the main function though: we are no longer using the defer keyword to close the response body, because, if we did, it would not be usable when we returned it to the calling function, which would defeat the purpose of returning it entirely. So we will keep the defer call in the main function.
Scraping More Information
We can now use the helper function that we’ve just defined with our code below, in order to access even more information from the same page that we downloaded earlier. By separating the HTTP-client code into the downloadStockWebpageBody function, we can more clearly see what information we’re scraping here:
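package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

const (
	symbol = "TSLA"
)

// downloadStockWebpageBody is the helper from the previous section,
// repeated here so that the program compiles as a single file.
func downloadStockWebpageBody(symbol string) io.ReadCloser {
	res, err := http.Get("https://finance.yahoo.com/quote/" + symbol)
	if err != nil {
		log.Fatalln(err)
	}

	if res.StatusCode != 200 {
		log.Fatalf("status code %d: %s", res.StatusCode, res.Status)
	}

	return res.Body
}

func main() {
	body := downloadStockWebpageBody(symbol)
	defer body.Close()

	doc, err := goquery.NewDocumentFromReader(body)
	if err != nil {
		log.Fatalln(err)
	}

	// getTextByField returns the text of the element for the given data field.
	getTextByField := func(field string) string {
		s := "[data-field=" + field + "][data-symbol=" + symbol + "]"
		return doc.Find(s).Text()
	}

	marketPrice := getTextByField("regularMarketPrice")
	marketChange := getTextByField("regularMarketChange")
	marketChangePercent := getTextByField("regularMarketChangePercent")

	if marketPrice == "" || marketChange == "" || marketChangePercent == "" {
		log.Fatalln("cannot access market price")
	}

	fmt.Printf(
		"%s\n%s\n\t %s\n\t%s\n",
		symbol,
		marketPrice,
		marketChange,
		marketChangePercent,
	)
}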
You can see here that we have now created an anonymous function (a closure) and stored it in the getTextByField variable within the main function, since doing that reduces the need to type out the same code when we try to access very similar HTML elements: all that differentiates them is the value of the data attribute, which we take as an argument to the function.
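If closures are new to you, the sketch below isolates the idea without any of the scraping code. The newSelectorBuilder helper is a hypothetical name of my own: it returns a function that remembers the symbol it was created with, in the same way that getTextByField remembers doc and symbol. Putting quotation marks around the attribute values is a small defensive choice, on the assumption that you might later use a symbol containing characters that aren’t valid in an unquoted CSS identifier.

```go
package main

import "fmt"

// newSelectorBuilder is a hypothetical helper that returns a closure. The
// returned function captures the symbol it was created with.
func newSelectorBuilder(symbol string) func(field string) string {
	return func(field string) string {
		return `[data-field="` + field + `"][data-symbol="` + symbol + `"]`
	}
}

func main() {
	selectorFor := newSelectorBuilder("TSLA")

	// Both calls reuse the captured symbol; only the field changes.
	fmt.Println(selectorFor("regularMarketPrice"))
	fmt.Println(selectorFor("regularMarketChange"))
}
```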
Here’s the Final Result
Below is the output that the previous program produced for me. Of course, you’ll almost certainly get a different result, depending on how the stock markets in general and Tesla in particular are performing when you run the program.
TSLA
1,007.18
	 -7.79
	(-0.77%)
The first number is the price of the stock in US dollars. That’s a pretty big number: it really does cost a significant amount of money just to buy a single share in Tesla. However, many financial-services companies now offer the ability to buy fractional shares, allowing you to own only a portion of a single share (perhaps buying a hundredth of a TSLA share for just over $10), which makes it possible for retail investors with relatively small amounts of money to get involved.
The first indented number in the output above is how much the share price has risen or, in this case, fallen since the previous trading day’s close. You can see that Tesla’s share price fell on the day that I ran this program, since this is a negative number. If you want to know in relative terms how much it fell by, then it may be more useful to look at the percentage change in the share price, which is the final number shown in parentheses.
Disclosure
While I do own some investments that track US markets, I do not personally own Tesla stock, so I’m not trying to give any explicit or implicit investment advice with this blog post. It’s not my intention to persuade you to buy or sell any financial securities, and you shouldn’t risk your money on the markets unless you can afford to lose it and have a clear idea of what you’re doing.
The purpose of this post has just been to show how easy it is to scrape publicly accessible information in order to process, modify or display it in your Go programs, so I hope you enjoyed my walkthrough of the code. See if you can use my examples as a base to build something bigger and even more impressive!