How to get historical market data from Interactive Brokers using Python

When exploring the world of quantitative finance or algorithmic trading, you quickly end up facing a very common issue. Where do you get historical market data? If you have an account with Interactive Brokers, you can download historical data from them using Python. This article will show you how.

No matter what sort of analysis or trading you plan to do, you’ll need access to quality market data for your research and development. This can be a challenging and possibly expensive process. If all you want is daily U.S. equity closing prices for large-cap stocks, you’ll probably be able to find them from a number of free or nearly free sources. However, you may want intraday data (prices at hourly, minute, or even sub-minute levels). Or maybe you need data for other types of securities (futures, bonds, or foreign stocks, for example). In this case, you will find the data to be more expensive and harder to find. For example, I found that historical 1-minute data for the full S&P 500 going back to 1998 costs over $750 from several vendors.

However, some brokerages will give you access to historical data as part of their service offerings. For example, Interactive Brokers (IB) offers APIs for fetching historical data at different resolutions. For many, this data may be good enough for historical backtesting and research. And the price you are already paying for market data includes this data.

OK, there has to be a catch, right?

Yes, there are several issues with downloading data from a broker like IB. IB itself points out that you may want to purchase your data from a vendor that specializes in historical market data. Some of these issues are:

  • Being forced to use a clunky API instead of just downloading bulk CSV files. Some vendors, like Polygon, offer bulk file downloads of historical data. IB forces you to use their API, which adds a little bit of complexity to the process.
  • IB has placed restrictions on their APIs to prevent users from abusing the system. Downloads should be rate limited to avoid being flagged for abuse of the system. IB’s servers will also rate limit your results if you send too many requests, and you may get disconnected.
  • IB doesn’t offer historical data for stocks that are no longer listed, so your dataset will automatically suffer from survivorship bias. Some companies are acquired at high prices; others go bankrupt or are delisted. Your historical backtests will include neither of these scenarios. It also appears some expired futures data is not available, but I haven’t been able to verify this yet.

Even given these issues, using IB to obtain some historical data for research is worth considering as a first option. This is especially true if you’re already paying for the market data. If it doesn’t meet your needs, you can always purchase data from someone else.

The process

In order to fetch historical data, you need to have met several criteria:

  • Opened an IB account, and funded it
  • Downloaded and configured the TWS software and Python API
  • Subscribed to Level 1 (top of book) market data for any contracts you wish to query

Please see my earlier article on how to do all of the above. There’s an example application that describes the basics of IB’s APIs.

Along with these steps, IB places some limitations on fetching data:

  • No more than 50 outstanding requests at a time. They note that it is probably more efficient to do fewer requests rather than try to test the upper limit.
  • If asking for 30 second bars or lower, avoid making six or more requests for the same contract within 2 seconds, more than 60 requests within 10 minutes, or two identical requests within 15 seconds. If you are grabbing consecutive single days for a symbol you can hit these limits pretty easily.
  • In general, if your request will return more than a few thousand bars you should consider splitting it up.
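
The pacing limits above can be checked on the client side before each request is sent. What follows is a minimal sketch, not part of the IB API: PacingGuard, wait_time, and record are hypothetical names, the thresholds are my reading of the list above (which applies to bars of 30 seconds or lower), and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque


class PacingGuard:
    """Hypothetical helper that tracks request times against the limits above."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.all_requests = deque()  # timestamps of every request (10 minute window)
        self.by_contract = {}        # contract -> deque of timestamps (2 second window)
        self.last_identical = {}     # (contract, request) -> time it was last sent

    def wait_time(self, contract, request):
        """Return how many seconds to wait before sending this request."""
        now = self.clock()
        waits = [0.0]

        # Allow at most 60 requests in any 10 minute window.
        while self.all_requests and now - self.all_requests[0] >= 600:
            self.all_requests.popleft()
        if len(self.all_requests) >= 60:
            waits.append(600 - (now - self.all_requests[0]))

        # Stay below six requests for the same contract within 2 seconds.
        recent = self.by_contract.setdefault(contract, deque())
        while recent and now - recent[0] >= 2:
            recent.popleft()
        if len(recent) >= 5:
            waits.append(2 - (now - recent[0]))

        # Never repeat an identical request within 15 seconds.
        last = self.last_identical.get((contract, request))
        if last is not None and now - last < 15:
            waits.append(15 - (now - last))

        return max(waits)

    def record(self, contract, request):
        """Call this right after a request has actually been sent."""
        now = self.clock()
        self.all_requests.append(now)
        self.by_contract.setdefault(contract, deque()).append(now)
        self.last_identical[(contract, request)] = now
```

A caller would sleep for wait_time(...) seconds, send the request, and then call record(...). The request argument can be any hashable description of the query, such as a tuple of end date, duration, and bar size.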

So what sort of data is available? Bar data is available in sizes of 1, 5, 10, 15, and 30 seconds. Bars of 30 seconds or lower are only available for six months back from the current date. IB will also generate larger bars of 1, 2, 3, 5, 10, 15, 20, and 30 minutes and 1, 2, 3, 4, and 8 hours, along with daily, weekly, and monthly bars. Those bars can consist of trades, bids and asks, midpoint, and various other fields described in the documentation. Note that building bars with both last price and bid/ask requires at least two queries (TRADES and BID_ASK), followed by merging the data together. When considering the pacing of requests, this may factor into any downloading decisions.
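
Merging the results of a TRADES query and a BID_ASK query could look something like the sketch below. It assumes each bar has already been converted to a plain dict with date, open, high, low, and close keys; merge_bars and the prefixed output names are hypothetical and not part of the IB API. Bars are joined on their timestamp, and a trade bar with no matching BID_ASK bar is dropped.

```python
def merge_bars(trades, bid_ask):
    """Join TRADES and BID_ASK bars on their timestamp.

    Both arguments are lists of dicts with date/open/high/low/close keys
    (an assumed layout). Only timestamps present in both lists are kept.
    """
    quotes = {bar["date"]: bar for bar in bid_ask}
    merged = []
    for bar in trades:
        quote = quotes.get(bar["date"])
        if quote is None:
            continue  # no BID_ASK bar for this timestamp
        row = {"date": bar["date"]}
        for field in ("open", "high", "low", "close"):
            row["trades_" + field] = bar[field]
            row["bid_ask_" + field] = quote[field]
        merged.append(row)
    return merged
```

Keeping the two sources under separate prefixes avoids guessing at what each BID_ASK field means; check the IB documentation for that before renaming the columns.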

In my testing, I found that more than a few thousand rows of data can be returned by some queries. For example, fetching daily data for 40 years of AAPL returns over 9000 rows, and 20 years of NVDA returns over 5000 rows, each in a single request. For minute bar data, I found that querying multiple days of data at once will cause rate limiting to take effect.
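
One way to keep intraday requests small is to split the date range into short windows and issue one request per window. The helper below is a minimal sketch using only the standard library; split_range and days_per_request are hypothetical names, and it does not skip weekends or holidays.

```python
from datetime import date, timedelta


def split_range(start, end, days_per_request=1):
    """Yield (first_day, last_day) windows that cover start..end inclusive."""
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        # The last window is clipped so it never runs past the end date.
        last = min(current + timedelta(days=days_per_request) - one_day, end)
        yield current, last
        current = last + one_day


# Example: one window per calendar day for a week of 1 minute bars.
windows = list(split_range(date(2020, 2, 2), date(2020, 2, 7)))
```

A real downloader would probably want to skip non-trading days as well, so it does not spend requests on windows that return no bars.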

Configuration

In order to run my code, you need to follow the directions from my earlier post to install the IB API. Once you’ve activated your Python virtualenv, you also need to make sure you’ve installed a few more Python libraries.

pyenv activate ib-example
pip install python-dateutil matplotlib jupyter

The code

I’ve posted a command line application to GitHub that allows for some flexible downloads of data. It supports a few different command line options for querying different ranges of data.

$ ./src/download_bars.py -h
usage:
    Downloader for Interactive Brokers bar data. Using TWS API, will download
    historical instrument data and place csv files in a specified directory.
    Handles basic errors and reports issues with data that it finds.

    Examples:
    Get the continuous 1 minute bars for the E-mini future from GLOBEX
        ./download_bars.py --security-type CONTFUT --start-date 20191201 --end-date 20191228 --exchange GLOBEX ES

    Get 1 minute bars for US Equity AMGN for a few days
        ./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN

       [-h] [-d] [--logfile LOGFILE] [-p PORT] [--size SIZE] [--duration DURATION] [-t DATA_TYPE]
       [--base-directory BASE_DIRECTORY] [--currency CURRENCY] [--exchange EXCHANGE] [--localsymbol LOCALSYMBOL]
       [--security-type SECURITY_TYPE] [--useRTH] [--start-date START_DATE] [--end-date END_DATE] [--max-days]
       symbol [symbol ...]

positional arguments:
  symbol

options:
  -h, --help            show this help message and exit
  -d, --debug           turn on debug logging
  --logfile LOGFILE     log to file
  -p PORT, --port PORT  local port for TWS connection
  --size SIZE           bar size
  --duration DURATION   bar duration
  -t DATA_TYPE, --data-type DATA_TYPE
                        bar data type
  --base-directory BASE_DIRECTORY
                        base directory to write bar files
  --currency CURRENCY   currency for symbols
  --exchange EXCHANGE   exchange for symbols
  --localsymbol LOCALSYMBOL
                        local symbol (for futures)
  --security-type SECURITY_TYPE
                        security type for symbols
  --useRTH              use Regular Trading Hours
  --start-date START_DATE
                        First day for bars
  --end-date END_DATE   Last day for bars
  --max-days            Set start date to earliest date

For example, to fetch all historical data for AAPL as daily bars and place the csv file in ./data/STK/1_day/AAPL.csv, run:

./download_bars.py --max-days --size '1 day' AAPL

To fetch a week of 1 minute bars for AMGN, with each day saved as a separate csv file in ./data/STK/1_min/AMGN, run:

./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN
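
The output paths in the two examples above follow a simple pattern: base directory, then security type, then the bar size with spaces replaced by underscores, then the symbol. The sketch below reproduces that pattern. It is inferred from the example paths rather than taken from the script, bar_location is a hypothetical name, and the defaults of STK and data are assumptions.

```python
from pathlib import PurePosixPath


def bar_location(symbol, size, security_type="STK", base_directory="data"):
    """Return (per-symbol directory, single CSV path) for a symbol and bar size."""
    size_dir = size.replace(" ", "_")  # "1 min" -> "1_min"
    directory = PurePosixPath(base_directory) / security_type / size_dir
    return directory / symbol, directory / (symbol + ".csv")


# Daily bars go in a single file; intraday bars get one directory per symbol.
_, daily_csv = bar_location("AAPL", "1 day")
minute_dir, _ = bar_location("AMGN", "1 min")
```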

You can refer to the code for more details. However, at a higher level, using the IB historical data API involves several methods. First, I use the reqHeadTimeStamp method to find the timestamp for the earliest data available for the contract. This is useful if we want to access the entire history of data, or to validate that we aren’t requesting data before the earliest date. The result of this query is delivered to the headTimestamp callback. Next, we invoke the reqHistoricalData method, making sure to request a reasonable amount of data. The results of this call are handled in the historicalData method, which is called once for each bar. Once all the data has been delivered, the historicalDataEnd method is invoked. There, we check that we’ve received all our data, save it to disk, and check to see if we have more data in our timespan to download. If so, we invoke the reqHistoricalData method again, repeating this process until all the data is downloaded. All the IB methods are well documented in the IB API documentation.
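
The request and callback loop described above can be sketched without importing the IB library at all. In the sketch below, DownloadLoop is a hypothetical class: request_fn stands in for a call to reqHistoricalData and save_fn for writing a CSV file, while the historicalData and historicalDataEnd methods mirror the callback names. The earliest-timestamp lookup is left out. In a real application these methods would live on a class derived from the IB API's client and wrapper classes.

```python
from datetime import timedelta


class DownloadLoop:
    """Walks a date range one day at a time, driven by the end-of-data callback."""

    def __init__(self, request_fn, save_fn, first_day, last_day):
        self.request_fn = request_fn  # stand-in for reqHistoricalData (assumption)
        self.save_fn = save_fn        # stand-in for writing a CSV file (assumption)
        self.current = first_day
        self.last_day = last_day
        self.bars = []
        self.next_id = 0

    def run(self):
        self._request_current_day()

    def _request_current_day(self):
        self.next_id += 1
        # Ask for one day of bars ending at midnight after the current day.
        end_of_day = self.current + timedelta(days=1)
        self.request_fn(self.next_id, end_of_day.strftime("%Y%m%d 00:00:00"), "1 D")

    def historicalData(self, req_id, bar):
        # Called once for each bar in the response.
        self.bars.append(bar)

    def historicalDataEnd(self, req_id, start, end):
        # Called once after the last bar: save, advance, and request the next day.
        self.save_fn(self.current, self.bars)
        self.bars = []
        self.current += timedelta(days=1)
        if self.current <= self.last_day:
            self._request_current_day()
```

Because the next request is only issued from historicalDataEnd, there is never more than one request outstanding, which helps keep a single-symbol download under the pacing limits.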

I’ve also created a very simple Jupyter notebook that shows what some of the data looks like.

9 thoughts on “How to get historical market data from Interactive Brokers using Python”

  1. ./download_bars.py --start-date '2008-01-02' --size '1 day' IBM -t 'ADJUSTED_LAST'

    needs something like:

    --- download_bars.py.~1~ 2021-09-12 21:20:11.954378539 -0400
    +++ download_bars.py 2021-09-12 21:43:20.546817876 -0400
    @@ -67,7 +67,7 @@
     self.reqHistoricalData(
     cid, # tickerId, used to identify incoming data
     contract,
    -self.current.strftime("%Y%m%d 00:00:00"), # always go to midnight
    +"", # self.current.strftime("%Y%m%d 00:00:00"), # always go to midnight
     self.duration, # amount of time to go back
     self.args.size, # bar size
     self.args.data_type, # historical data type

    1. Olivier,

      I’m glad the code has been helpful. The code doesn’t currently explicitly take care of the pacing violations you link to, but practically it will not hit those limits if you run the script one symbol at a time. In my experience, the APIs are slow enough that running one symbol for a longer period of time tends to not hit pacing limits. But if you find it does, the script can be enhanced to insert some pauses. Pull requests are welcome!

      If you decide to try running multiple concurrent requests, you will likely hit a limit. If you are building a backtester, my recommendation is to first download historical data for the symbols you are interested in (maybe in a script you run overnight) and save the data in an archive. Then once you have it all saved locally, you can run multiple backtest iterations.

  2. Hey Matt! Great to see your comprehensive guide on this topic! I wonder if historic option data can also be pulled by such requests? Have you tried them before?
    Or if it’s not available, how can we store streaming options data in our database and start building historical options data ourselves?

    1. Jason,

      I don’t currently subscribe to options prices on IB, so haven’t verified that my code works for options, but it should. If you look at the documentation, you’ll see that options are listed in the table. But you’ll need to subscribe to the data in order to get historical prices, as far as I know.

      Good luck!

  3. Hey Matt, Great stuff here regarding Interactive Brokers and downloading historic data – quick question, in your example “download_bars.py” you access “b.date, b.open, b.high, b.low, b.close, b.volume, b.barCount, b.average” for the security, do you know if other fields are available to pull in to the CSV file? Specifically, how could I call historical volatility/implied volatility (if available)? Many thanks, Matt K

    1. Matt,

      The bar data is described here. It’s just prices and volume (if you’re getting trades).

      As for historical volatility and implied volatility, that doesn’t appear to be supported by IB (see here). The best I can tell, if you wanted to listen to live data (including delayed data) using the real time market data API, you could get volatility. But historical data is probably going to cost you money from another source.

  4. I have downloaded IB data with API using their R package. I downloaded 1 minute bars for all available contracts (they go back 2 years). For some reason when I backtest algos on older contracts, I get null results, whereas the latest 2-3 contracts provide positive results. Do you know if the older contracts (1 minute bars), have lower quality?

    1. It all depends on the contracts you’re looking at. The documentation says that bar sizes of 30 seconds and lower are limited to 6 months, and that futures are limited to two years. Also, data is not available for expired options, symbols that no longer trade, and some symbols that move between exchanges. This data is definitely not going to be of the same quality as what you’d purchase from a vendor specializing in historical data. You need to be careful about making assumptions about the data without verifying it, especially if you’re trading real money with it.

      In general, working with US single stocks will have pretty complete historical data, as long as you aren’t ignoring the fact that you’ll be missing data from symbols that no longer trade.
