When exploring the world of quantitative finance or algorithmic trading, you quickly run into a very common question: where do you get historical market data? If you have an account with Interactive Brokers, you can download historical data from them using Python. This article will show you how.
No matter what sort of analysis or trading you plan to do, you’ll need access to quality market data for your research and development. Getting it can be challenging and possibly expensive. If all you want is daily U.S. equity closing prices for large-cap stocks, you’ll probably be able to find them from a number of free or nearly free sources. However, you may want intraday data (prices at hourly, minute, or even sub-minute intervals). Or maybe you need data for other types of securities (futures, bonds, or foreign stocks, for example). In these cases, the data is more expensive and harder to find. For example, I found that historical 1-minute data for the full S&P 500 going back to 1998 will cost over $750 from several vendors.
However, some brokerages give you access to historical data as part of their service offerings. For example, Interactive Brokers (IB) offers APIs for fetching historical data at different resolutions. For many people, this data may be good enough for historical backtesting and research, and it is already included in the price you pay for market data.
OK, there has to be a catch, right?
Yes, there are several issues with downloading data from a broker like IB. IB itself points out that you may want to purchase your data from a vendor that specializes in historical market data. Some of these issues are:
- Being forced to use a clunky API instead of just downloading bulk CSV files. Some vendors, like Polygon, offer bulk file downloads of historical data. IB forces you to use their API, which adds some complexity to the process.
- IB has placed restrictions on their APIs to prevent abuse. You should rate limit your downloads to avoid being flagged, and IB’s servers will also throttle your requests if you send too many, possibly disconnecting you.
- IB doesn’t offer historical data for stocks that are no longer listed, so your dataset will automatically suffer from survivorship bias. Some companies are acquired at high prices; others go bankrupt or are delisted. Your historical backtests will include neither scenario. It also appears that some expired futures data is not available, but I haven’t been able to verify this yet.
Even with these issues, using IB to obtain historical data for research is worth considering as a first option, especially if you’re already paying for the market data. If it doesn’t meet your needs, you can always purchase data from someone else.
Before you can fetch historical data, you need to have:
- Opened and funded an IB account
- Downloaded and configured the TWS software and the Python API
- Subscribed to Level 1 (top of book) market data for any contracts you wish to query
Please see my earlier article on how to do all of the above. There’s an example application that describes the basics of IB’s APIs.
Along with these steps, IB places some limitations on fetching data:
- No more than 50 outstanding requests at a time. IB notes that it is probably more efficient to make fewer requests than to test the upper limit.
- For bars of 30 seconds or smaller, don’t make six or more requests for the same contract within two seconds, more than 60 requests within any ten-minute period, or identical requests within 15 seconds. If you are grabbing consecutive single days for a symbol, you can hit these limits pretty easily.
- In general, if your request will return more than a few thousand bars, you should consider splitting it up.
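If you write your own download loop, one way to stay inside these pacing rules is to remember when recent requests went out and sleep before sending the next one. The sketch below is a hypothetical helper, not part of my script or of the IB API. It encodes the small-bar limits listed above (60 per ten minutes, fewer than six per contract in two seconds, no identical request within 15 seconds) and does not track the 50-outstanding-requests limit. The `key` argument is any hashable description of a request, for example a tuple of symbol, end time, duration, bar size, and data type.

```python
import time
from collections import defaultdict, deque


class HistoricalPacer:
    """Hypothetical helper that reports how long to wait before a request.

    Encodes the small-bar pacing limits: at most 60 requests in any
    10-minute window, fewer than six requests for the same contract within
    2 seconds, and no identical request within 15 seconds. It does not
    track the 50-outstanding-requests limit.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.window = deque()                  # send times of all requests
        self.by_contract = defaultdict(deque)  # contract -> send times
        self.last_sent = {}                    # request key -> last send time

    @staticmethod
    def _prune(times, now, horizon):
        # Drop send times that have aged out of the sliding window.
        while times and now - times[0] >= horizon:
            times.popleft()

    def wait_time(self, contract, key):
        """Seconds to sleep before sending request `key` for `contract`."""
        now = self.clock()
        self._prune(self.window, now, 600)
        recent = self.by_contract[contract]
        self._prune(recent, now, 2)
        waits = [0.0]
        if len(self.window) >= 60:
            # Wait until enough old requests leave the 10-minute window.
            waits.append(600 - (now - self.window[-60]))
        if len(recent) >= 5:
            # A sixth request for this contract within 2 s would violate.
            waits.append(2 - (now - recent[-5]))
        last = self.last_sent.get(key)
        if last is not None and now - last < 15:
            waits.append(15 - (now - last))
        return max(waits)

    def record(self, contract, key):
        """Call immediately after actually sending the request."""
        now = self.clock()
        self.window.append(now)
        self.by_contract[contract].append(now)
        self.last_sent[key] = now
```

In a download loop you would call `time.sleep(pacer.wait_time(symbol, key))`, send the `reqHistoricalData` request, then call `pacer.record(symbol, key)`. The injectable clock is there so the logic can be tested without actually waiting.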
So what sort of data is available? Bars are available in sizes of 1, 5, 10, 15, and 30 seconds, but bars of 30 seconds or smaller are only available for the most recent six months. IB also provides larger bars of 1, 2, 3, 5, 10, 15, 20, and 30 minutes and 1, 2, 3, 4, and 8 hours, along with daily, weekly, and monthly bars. These bars can be built from trades, bids and asks, midpoints, and various other data types described in the documentation. Note that building bars with both last price and bid/ask requires at least two queries (TRADES and BID_ASK), followed by merging the data together. Since each query counts against the pacing limits, factor this into your download plans.
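Merging a TRADES download with a BID_ASK download amounts to joining the two sets of bars on their timestamp. Here is a minimal sketch using only the standard library. It assumes each download was saved as a CSV file with a header row and a `date` column, which is an assumption about the file layout rather than a description of my script’s exact output. IB defines the BID_ASK bar fields differently from TRADES bars, so check the historical data types table in the documentation before interpreting the merged columns. With pandas, `pd.merge(trades, bidask, on="date", suffixes=("_trades", "_bidask"))` would do the same inner join.

```python
import csv
import io


def merge_bars(trades_file, bidask_file):
    """Inner-join TRADES and BID_ASK bar rows on their 'date' column.

    Both arguments are file-like objects holding CSV with a header row.
    BID_ASK columns get a 'ba_' prefix so they don't clash with TRADES
    columns; timestamps missing from either file are dropped.
    """
    bidask = {}
    for row in csv.DictReader(bidask_file):
        stamp = row.pop("date")
        bidask[stamp] = {f"ba_{name}": value for name, value in row.items()}

    merged = []
    for row in csv.DictReader(trades_file):
        extra = bidask.get(row["date"])
        if extra is not None:
            merged.append({**row, **extra})
    return merged


if __name__ == "__main__":
    # Tiny inline example; with real downloads, pass open(path) handles.
    trades = io.StringIO("date,open,close,volume\n"
                         "20200203 09:30:00,1.00,1.10,100\n"
                         "20200203 09:31:00,1.10,1.20,50\n")
    bidask = io.StringIO("date,open,close\n"
                         "20200203 09:30:00,0.99,1.01\n")
    for row in merge_bars(trades, bidask):
        print(row)
```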
In my testing, I found that some queries return more than a few thousand rows of data at once (for example, fetching 40 years of daily data for AAPL returns over 9,000 rows, and 20 years of NVDA returns over 5,000 rows). For 1-minute bars, however, I found that querying multiple days of data will cause rate limiting to take effect.
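A rough way to decide how many days to put in each request is to estimate the bars per day and divide a bar budget by that number. This is a minimal sketch; the 2,000-bar default is my own stand-in for “a few thousand”, and the session length is whatever you pass in (6.5 hours is the regular U.S. equity session, 9:30 to 16:00 Eastern).

```python
import math


def days_per_request(bar_seconds, session_hours, max_bars=2000):
    """Whole days of bars that fit under a per-request bar budget.

    bar_seconds:   bar size in seconds (60 for "1 min")
    session_hours: hours of data per day (6.5 for U.S. equities with useRTH)
    max_bars:      illustrative budget standing in for "a few thousand"

    A result of 0 means even a single day exceeds the budget, so the day
    itself would need to be split into smaller requests.
    """
    bars_per_day = math.ceil(session_hours * 3600 / bar_seconds)
    return max_bars // bars_per_day
```

For 1-minute bars in a 6.5-hour session that works out to 390 bars per day, so five days per request; a session of about 23 hours gives 1,380 bars per day and only one day per request.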
In order to run my code, you need to follow the directions from my earlier post to install the IB API. Once you’ve activated your Python virtualenv, you also need to make sure you’ve installed a few more Python libraries.
pyenv activate ib-example
pip install python-dateutil matplotlib jupyter
I’ve posted a command line application to GitHub that allows for some flexible downloads of data. It supports a few different command line options for querying different ranges of data.
$ ./src/download_bars.py -h
usage: Downloader for Interactive Brokers bar data. Using TWS API, will download historical instrument data and place csv files in a specified directory. Handles basic errors and reports issues with data that it finds.

Examples:
Get the continuous 1 minute bars for the E-mini future from GLOBEX
./download_bars.py --security-type CONTFUT --start-date 20191201 --end-date 20191228 --exchange GLOBEX ES
Get 1 minute bars for US Equity AMGN for a few days
./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN
[-h] [-d] [--logfile LOGFILE] [-p PORT] [--size SIZE] [--duration DURATION] [-t DATA_TYPE] [--base-directory BASE_DIRECTORY] [--currency CURRENCY] [--exchange EXCHANGE] [--localsymbol LOCALSYMBOL] [--security-type SECURITY_TYPE] [--useRTH] [--start-date START_DATE] [--end-date END_DATE] [--max-days] symbol [symbol ...]

positional arguments:
  symbol

options:
  -h, --help            show this help message and exit
  -d, --debug           turn on debug logging
  --logfile LOGFILE     log to file
  -p PORT, --port PORT  local port for TWS connection
  --size SIZE           bar size
  --duration DURATION   bar duration
  -t DATA_TYPE, --data-type DATA_TYPE
                        bar data type
  --base-directory BASE_DIRECTORY
                        base directory to write bar files
  --currency CURRENCY   currency for symbols
  --exchange EXCHANGE   exchange for symbols
  --localsymbol LOCALSYMBOL
                        local symbol (for futures)
  --security-type SECURITY_TYPE
                        security type for symbols
  --useRTH              use Regular Trading Hours
  --start-date START_DATE
                        First day for bars
  --end-date END_DATE   Last day for bars
  --max-days            Set start date to earliest date
For example, to fetch all historical data for AAPL as daily bars and place the csv file in
./download_bars.py --max-days --size '1 day' AAPL
To fetch a week of 1 minute bars for AMGN, with each day saved as a separate csv file in
./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN
You can refer to the code for more details, but at a high level, using the IB historical data API involves several methods. First, I use the reqHeadTimeStamp method to find the timestamp of the earliest data available for the contract. This is useful if we want to access the entire history of data, or to validate that we aren’t requesting data before the earliest date. The result of this query is delivered to the headTimestamp callback. Next, we invoke the reqHistoricalData method, making sure to request a reasonable amount of data. The results of this call are handled in the historicalData method, which is called once for each bar. Once all the data has been delivered, the historicalDataEnd method is invoked. There, we check that we’ve received all our data, save it to disk, and check whether there is more data in our timespan to download. If so, we invoke the reqHistoricalData method again, repeating this process until all the data is downloaded. All the IB methods are well documented in the IB API documentation.
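The request, collect, check, and request-again loop can be modeled without a TWS connection. The sketch below uses a hypothetical `ChunkedDownloader` class; it is not my script’s actual code. `on_bar` stands in for the `historicalData` callback, `on_end` for `historicalDataEnd`, and `request` only records the end time it would pass to `reqHistoricalData`. It downloads one day per request, using the same `"%Y%m%d 00:00:00"` style of end time that the script uses.

```python
from datetime import date, timedelta


class ChunkedDownloader:
    """Hypothetical, IB-free model of the request/callback loop.

    on_bar() stands in for EWrapper.historicalData and on_end() for
    EWrapper.historicalDataEnd. request() only records the end time it would
    pass to reqHistoricalData, so the logic can be exercised offline.
    """

    def __init__(self, start: date, end: date):
        self.current = start   # day currently being downloaded
        self.end = end         # last day to download (inclusive)
        self.bars = []         # bars received for the current day
        self.saved = {}        # day -> bars; a real app would write a CSV
        self.requests = []     # endDateTime strings that would be sent

    def request(self):
        # A real app would call self.reqHistoricalData(reqId, contract,
        # endDateTime, "1 D", barSize, whatToShow, useRTH, formatDate,
        # False, []) here. We ask for one day ending at the next midnight.
        end_time = self.current + timedelta(days=1)
        self.requests.append(end_time.strftime("%Y%m%d 00:00:00"))

    def on_bar(self, bar):
        # Called once per bar, like historicalData().
        self.bars.append(bar)

    def on_end(self):
        # Like historicalDataEnd(): save this day, then request the next one
        # if any remain. Returns True while more data is outstanding.
        self.saved[self.current] = self.bars
        self.bars = []
        self.current += timedelta(days=1)
        if self.current <= self.end:
            self.request()
            return True
        return False
```

To drive it, call `request()` once, feed each bar to `on_bar()`, and call `on_end()` when the day is complete; it issues the next request until the end date has been saved. In a real application these calls would come from the TWS message loop, ideally with some pacing between requests.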
I’ve also created a very simple Jupyter notebook that shows what some of the data looks like.
24 thoughts on “How to get historical market data from Interactive Brokers using Python”
./download_bars.py --start-date '2008-01-02' --size '1 day' IBM -t 'ADJUSTED_LAST'
needs something like:
--- download_bars.py.~1~ 2021-09-12 21:20:11.954378539 -0400
+++ download_bars.py 2021-09-12 21:43:20.546817876 -0400
@@ -67,7 +67,7 @@
 cid, # tickerId, used to identify incoming data
- self.current.strftime("%Y%m%d 00:00:00"), # always go to midnight
+ "", # self.current.strftime("%Y%m%d 00:00:00"), # always go to midnight
 self.duration, # amount of time to go back
 self.args.size, # bar size
 self.args.data_type, # historical data type
Matt, thanks so much for designing such an awesome piece of code. This is a jewel. I am still a learner and it really helps a lot. Just one quick question: which part of the code takes care of the pacing violations mentioned here (https://interactivebrokers.github.io/tws-api/historical_limitations.html)? I am building a backtester and need to go as far back as possible without having to wait days or, worse, being disconnected.
I’m glad the code has been helpful. The code doesn’t currently explicitly take care of the pacing violations you link to, but practically it will not hit those limits if you run the script one symbol at a time. In my experience, the APIs are slow enough that running one symbol for a longer period of time tends to not hit pacing limits. But if you find it does, the script can be enhanced to insert some pauses. Pull requests are welcome!
If you decide to try running multiple concurrent requests, you will likely hit a limit. If you are building a backtester, my recommendation is to first download historical data for the symbols you are interested in (maybe in a script you run overnight) and save the data in an archive. Then, once you have it all saved locally, you can run multiple backtest iterations.
Hey Matt! Great to see your comprehensive guide on this topic! I wonder if historic option data can also be pulled by such requests? Have you tried them before?
Or, if it’s not available, how can we store streaming options data in our database and start building historical options data ourselves?
I don’t currently subscribe to options prices on IB, so haven’t verified that my code works for options, but it should. If you look at the documentation, you’ll see that options are listed in the table. But you’ll need to subscribe to the data in order to get historical prices, as far as I know.
Hey Matt, Great stuff here regarding Interactive Brokers and downloading historic data – quick question, in your example “download_bars.py” you access “b.date, b.open, b.high, b.low, b.close, b.volume, b.barCount, b.average” for the security, do you know if other fields are available to pull in to the CSV file? Specifically, how could I call historical volatility/implied volatility (if available)? Many thanks, Matt K
The bar data is described here. It’s just prices and volume (if you’re getting trades).
As for historical volatility and implied volatility, that doesn’t appear to be supported by IB (see here). As best I can tell, if you wanted to listen to live data (including delayed data) using the real-time market data API, you could get volatility. But historical data is probably going to cost you money from another source.
I have downloaded IB data with API using their R package. I downloaded 1 minute bars for all available contracts (they go back 2 years). For some reason when I backtest algos on older contracts, I get null results, whereas the latest 2-3 contracts provide positive results. Do you know if the older contracts (1 minute bars), have lower quality?
It all depends on the contracts you’re looking at. The documentation says that bar sizes of 30 seconds and lower are limited to 6 months, and that futures are limited to two years. Also, data is not available for expired options, symbols that no longer trade, and some symbols that move between exchanges. This data is definitely not going to be of the same quality as what you’d purchase from a vendor specializing in historical data. Be especially careful about making assumptions without verifying the data, particularly if you’re trading real money with it.
In general, working with US single stocks will have pretty complete historical data, as long as you aren’t ignoring the fact that you’ll be missing data from symbols that no longer trade.
Hey Matt, thank you for the code, but I am a bit confused by “Note that building bars with last price and bid/ask will require at least two queries (TRADES and BID_ASK), then merging the data together.” Can you please explain this a bit more with a short code example?
Aadil, thanks for reading. Happy to elaborate further. Bars can have lots of different data in them, and the most basic info is a trade. But sometimes you might be interested in the bid or ask price (or the Bid/Ask average). If you want Trades, Bids, and Asks, you’ll have to pull the data multiple times since IB only lets you ask for one of them at a time. If you used the sample code, you’d have to run it three times, then use something like pandas to read in each bar and merge the data into one larger dataframe with the Trade, Bid, and Ask. I should probably create a pandas merging article off that idea when I get some time.
Thank you for this tutorial.
From the earlier comments, it seems that only 2 years’ historical data of 1 minute resolution is provided by IB. This is a good start…
Have you come across any affordable data provider (for intraday 1 min bars) for historical data of US futures?
Hi Karthik, I haven’t really done any searches for alternatives. I know there are a few out there, including the CME itself. As usual, pricing will depend entirely on your own situation (pro status, length of time, granularity, etc.).
Hey Matt, it’s a really nice script you did there. Regarding an open issue you already have (TWS > 9.81.1 not returning a full day of data for futures), do you have any plans to solve that?
Hi Sasha, I saw your ticket and have tried a few things, but I am not sure what to do to fix it. If I can get some free time I’ll take a look at it; I have a few thoughts on improving the overall design of the script.
Hi Matt, I tried using your scripts to pull some historical data but the API returns the following error:
“ValueError: time data '19801212' does not match format '%Y%m%d-%H:%M:%S'”
The change in date/time formatting seems to be a recent one, and I am facing this issue with all the other Python tools I’m using. Is this something that is easily fixed? From the API in TWS, I think date times need to be provided with the dash…
Stefan – I recently pushed a new version of the script on GitHub, try that and see if it works for you. If it still doesn’t, you can open an issue there for tracking.
Hi Matt, I cloned the latest repo but am getting the error below when running your AAPL example (same for the AMGN example):
File "C:\Program Files\Python311\Lib\_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '19801212' does not match format '%Y-%m-%d %H:%M:%S%z'
Could you please help/suggest?
Also, I am a bit puzzled that no TWS credentials are required. Am I just supposed to be logged in to TWS (I was when I got those errors)? Thanks!
Alexander – I’d suggest opening an issue in GitHub, it’d be easier to track there. That error looks like you’re getting the head timestamp in the old format. I tested the latest version using the TWS API 10.19.1.
The other option is to check out an older version of the script that was built using the 9.x version.
Hi Alexander, did you manage to resolve this? I cannot see an issue recorded in GitHub and I am experiencing the same.
Milan, you can open an issue, or I can do it but need to know what version of TWS you are using.
Thanks for opening the issue, I pushed a change that should get you going. Please feel free to open other issues as you spot them, and PRs are welcome if anyone has improvements or other fixes.
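Both tracebacks in this thread come from Python’s `strptime` rejecting the date-only string `'19801212'` because it expected a date and time. If you maintain your own copy of a script like this, one defensive option is to try several formats in turn. The helper below is hypothetical and is not the fix pushed to GitHub; its format list is taken from the error messages above plus a plain date, and you may need to extend it for your TWS version.

```python
from datetime import datetime

# Formats seen in the error messages above, plus a plain date. This list is
# an assumption; extend it if your TWS version returns something else.
KNOWN_FORMATS = (
    "%Y%m%d",               # e.g. '19801212'
    "%Y%m%d-%H:%M:%S",      # e.g. '19801212-14:30:00'
    "%Y%m%d %H:%M:%S",      # e.g. '19801212 14:30:00'
    "%Y-%m-%d %H:%M:%S%z",  # e.g. '1980-12-12 14:30:00+0000'
)


def parse_ib_timestamp(text):
    """Try each known format in turn; raise ValueError if none match."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized IB timestamp: {text!r}")
```

Note that the last format returns a timezone-aware `datetime` while the others return naive ones, so normalize before comparing values parsed from different formats.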
Hi Matt, do you think it is possible to store all the tick data in real time that I have access to in IB? I suspect it will be terabytes of data.
Storing all tick data surely will get into terabytes in size pretty soon, but you probably won’t be able to get that much out of IB. First, you will have limits on how many instruments you can listen to at once. Second, the APIs don’t send you the full book, just the top of book, and I’m not even sure if they send you every update (i.e. they may conflate some updates even if you try to use the tick by tick apis). You could surely try it out and see how much you can capture. You’d want to use a more efficient storage system than CSV files.
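As one illustration of something leaner than CSV, ticks can be packed into fixed-width binary records with Python’s `struct` module. The layout here is hypothetical and chosen only for illustration (a 64-bit nanosecond timestamp, a 64-bit float price, and a 64-bit size, 24 bytes per tick); it is not a format IB provides. A columnar file format or a time-series database would be common alternatives.

```python
import struct

# One tick: nanosecond timestamp (int64), price (float64), size (int64).
# Little-endian with no padding, so every record is exactly 24 bytes.
TICK = struct.Struct("<qdq")


def pack_ticks(ticks):
    """Serialize (ts_ns, price, size) tuples into one bytes object."""
    return b"".join(TICK.pack(*t) for t in ticks)


def unpack_ticks(blob):
    """Inverse of pack_ticks: return a list of (ts_ns, price, size) tuples."""
    return list(TICK.iter_unpack(blob))
```

Because every record has the same size, the Nth tick in a file starts at byte `N * TICK.size`, which makes seeking straightforward.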
Trading firms listen to the full data feeds from the exchange and store compressed packet captures, then they replay those captures into their systems. You can learn a bit about this by reading up on Wireshark, tcpdump, etc.
You can also download historical tick data, but it might take you forever since it appears to be limited to 1000 data points per request.
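With at most 1,000 ticks per request (IB’s `reqHistoricalTicks` call), a long span has to be fetched page by page. Below is a minimal sketch of that paging loop. `fetch_page(start, count)` is a hypothetical callable standing in for the request and its callback; it is assumed to return up to `count` ticks at or after `start`, oldest first, in a stable order. Because several ticks can share a timestamp, each new page restarts at the last timestamp collected and skips the ticks at that timestamp that are already stored.

```python
def fetch_all_ticks(fetch_page, start, page_size=1000):
    """Collect ticks page by page until a short page signals the end.

    fetch_page(start, count) is a hypothetical callable returning a list of
    (timestamp, price, size) tuples at or after `start`, oldest first.
    """
    ticks = []
    cursor = start
    while True:
        page = fetch_page(cursor, page_size)
        # Later pages restart at the last collected timestamp, so skip the
        # ticks at that timestamp we already have (assumes stable ordering).
        seen = 0
        while seen < len(ticks) and ticks[-1 - seen][0] == cursor:
            seen += 1
        new = page[seen:]
        ticks.extend(new)
        # A short page means we reached the end. An all-duplicate page means
        # more than page_size ticks share one timestamp; stop rather than loop.
        if len(page) < page_size or not new:
            return ticks
        cursor = ticks[-1][0]
```

At 1,000 ticks per page, a million ticks takes at least 1,000 requests, which is why this approach is slow once pacing is taken into account.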