How to get historical market data from Interactive Brokers using Python

When exploring the world of quantitative finance or algorithmic trading, you quickly run into a very common question: where do you get historical market data? If you have an account with Interactive Brokers, you can download historical data from them using Python. This article will show you how.

No matter what sort of analysis or trading you plan to do, you’ll need access to quality market data for your research and development. Obtaining it can be a challenging and possibly expensive process. If all you want is daily U.S. equity closing prices for large-cap stocks, you’ll probably be able to find them from a number of free or nearly free sources. However, you may want intraday data (prices at hourly, minute, or even sub-minute levels), or data for other types of securities (futures, bonds, or foreign stocks, for example). In that case, the data is more expensive and harder to find. For example, I found that historical 1 minute data for the full S&P 500 going back to 1998 would cost over $750 from several vendors.

However, some brokerages will give you access to historical data as part of their service offerings. For example, Interactive Brokers (IB) offers APIs for fetching historical data at different resolutions. For many, this data may be good enough for historical backtesting and research, and it’s included in the price you’re already paying for market data.

OK, there has to be a catch, right?

Yes, there are several issues with downloading data from a broker like IB. IB itself points out that you may want to purchase your data from a vendor that specializes in historical market data. Some of these issues are:

  • Being forced to use a clunky API instead of just downloading bulk CSV files. Some vendors, like Polygon, offer bulk file downloads of historical data, but IB forces you to use their API, which adds a bit of complexity to the process.
  • IB has placed restrictions on their APIs to prevent users from abusing the system. You need to pace your downloads to avoid being flagged for abuse, and IB’s servers will rate limit your results (or even disconnect you) if you send too many requests.
  • IB doesn’t offer historical data for stocks that are no longer listed, so your dataset will automatically suffer from survivorship bias. Some companies are acquired at high prices, others go bankrupt or are delisted, and neither scenario will show up in your historical backtests. It also appears some expired futures data is not available, but I haven’t been able to verify this yet.

Even given these issues, using IB to obtain some historical data for research is worth considering as a first option, especially if you’re already paying for the market data. If it doesn’t meet your needs, you can always purchase data from someone else.

The process

In order to fetch historical data, you need to have completed a few prerequisites:

  • Opened an IB account, and funded it
  • Downloaded and configured the TWS software and python API
  • Subscribed to Level 1 (top of book) market data for any contracts you wish to query

Please see my earlier article on how to do all of the above. There’s an example application that describes the basics of IB’s APIs.

Along with these steps, IB places some limitations on fetching data:

  • No more than 50 outstanding requests at a time. They note that it is probably more efficient to do fewer requests rather than try to test the upper limit.
  • If asking for bars of 30 seconds or smaller: no more than six requests for the same contract within two seconds, no more than 60 requests within 10 minutes, and no identical requests within 15 seconds. If you are grabbing consecutive single days for a symbol, you can hit these limits pretty easily.
  • In general, if your request will return more than a few thousand bars you should consider splitting it up.

So what sort of data is available? Bar data is available in sizes of 1, 5, 10, 15, and 30 seconds. Resolutions below 30 seconds are only available for six months from the current date. They will also generate larger bars of 1, 2, 3, 5, 10, 15, 20, and 30 minutes and 1, 2, 3, 4, and 8 hours, along with daily, weekly, and monthly bars. Those bars can consist of trades, bids and asks, midpoint, and various other fields described in the documentation. Note that building bars with last price and bid/ask will require at least two queries (TRADES and BID_ASK), then merging the data together. When considering the pacing of requests, this may factor into any downloading decisions.

In my testing, I found that some queries return more than a few thousand rows of data; for example, fetching daily data for 40 years of AAPL returns over 9,000 rows at once, and 20 years of NVDA returns over 5,000 rows. For minute bar data, I found that querying multiple consecutive days will cause rate limiting to take effect.

Configuration

In order to run my code, you need to follow the directions from my earlier post to install the IB API. Once you’ve activated your Python virtualenv, you also need to make sure you’ve installed a few more Python libraries.

pyenv activate ib-example
pip install python-dateutil matplotlib jupyter

The code

I’ve posted a command line application to GitHub that allows for some flexible downloads of data. It supports a few different command line options for querying different ranges of data.

$ ./src/download_bars.py -h
usage:
    Downloader for Interactive Brokers bar data. Using TWS API, will download
    historical instrument data and place csv files in a specified directory.
    Handles basic errors and reports issues with data that it finds.

    Examples:
    Get the continuous 1 minute bars for the E-mini future from GLOBEX
        ./download_bars.py --security-type CONTFUT --start-date 20191201 --end-date 20191228 --exchange GLOBEX ES

    Get 1 minute bars for US Equity AMGN for a few days
        ./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN

       [-h] [-d] [--logfile LOGFILE] [-p PORT] [--size SIZE] [--duration DURATION] [-t DATA_TYPE]
       [--base-directory BASE_DIRECTORY] [--currency CURRENCY] [--exchange EXCHANGE] [--localsymbol LOCALSYMBOL]
       [--security-type SECURITY_TYPE] [--useRTH] [--start-date START_DATE] [--end-date END_DATE] [--max-days]
       symbol [symbol ...]

positional arguments:
  symbol

options:
  -h, --help            show this help message and exit
  -d, --debug           turn on debug logging
  --logfile LOGFILE     log to file
  -p PORT, --port PORT  local port for TWS connection
  --size SIZE           bar size
  --duration DURATION   bar duration
  -t DATA_TYPE, --data-type DATA_TYPE
                        bar data type
  --base-directory BASE_DIRECTORY
                        base directory to write bar files
  --currency CURRENCY   currency for symbols
  --exchange EXCHANGE   exchange for symbols
  --localsymbol LOCALSYMBOL
                        local symbol (for futures)
  --security-type SECURITY_TYPE
                        security type for symbols
  --useRTH              use Regular Trading Hours
  --start-date START_DATE
                        First day for bars
  --end-date END_DATE   Last day for bars
  --max-days            Set start date to earliest date

For example, to fetch all historical data for AAPL as daily bars and place the csv file in ./data/STK/1_day/AAPL.csv, run:

./download_bars.py --max-days --size '1 day' AAPL

To fetch a week of 1 minute bars for AMGN, with each day saved as a separate csv file in ./data/STK/1_min/AMGN, run:

./download_bars.py --size "1 min" --start-date 20200202 --end-date 20200207 AMGN

You can refer to the code for more details. At a high level, though, using the IB historical data API involves several methods. First, I use the reqHeadTimeStamp method to find the timestamp of the earliest data available for the contract. This is useful if we want the entire history of data, or to validate that we aren’t requesting data from before the earliest date. The result of this query arrives in the headTimestamp callback. Next, we invoke the reqHistoricalData method, making sure to request a reasonable amount of data. The results of this call are handled in the historicalData callback, which is called once for each bar. Once all the data has been delivered, the historicalDataEnd method is invoked. There, we check that we’ve received all our data, save it to disk, and check whether we have more data in our timespan to download. If so, we invoke reqHistoricalData again, repeating this process until all the data is downloaded. All of these methods are well documented in the IB API documentation.
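
To make that flow concrete, here is a stripped-down sketch of how those methods fit together using the ibapi package. This is not the actual download_bars.py (which handles errors, pacing, chunking, and file output), and the port, client id, and contract details are assumptions you will need to adjust for your setup.

import threading
import time

from ibapi.client import EClient
from ibapi.contract import Contract
from ibapi.wrapper import EWrapper


def stock_contract(symbol):
    # Plain US stock contract; other security types need different fields.
    contract = Contract()
    contract.symbol = symbol
    contract.secType = "STK"
    contract.exchange = "SMART"
    contract.currency = "USD"
    return contract


class HistoricalApp(EWrapper, EClient):
    def __init__(self, symbol):
        EClient.__init__(self, self)
        self.symbol = symbol
        self.bars = []

    def nextValidId(self, orderId):
        # Connection is ready: ask for the earliest timestamp IB has for this contract.
        self.reqHeadTimeStamp(1, stock_contract(self.symbol), "TRADES", 1, 1)

    def headTimestamp(self, reqId, headTimestamp):
        print(f"Earliest data available: {headTimestamp}")
        self.cancelHeadTimeStamp(reqId)
        # Request one chunk of bars; an empty endDateTime means "ending now".
        self.reqHistoricalData(
            2,                            # request id, echoed back in the callbacks
            stock_contract(self.symbol),
            "",                           # endDateTime
            "1 Y",                        # duration
            "1 day",                      # bar size
            "TRADES",                     # data type
            1,                            # useRTH
            1,                            # formatDate
            False,                        # keepUpToDate
            [],                           # chartOptions
        )

    def historicalData(self, reqId, bar):
        # Called once per bar.
        self.bars.append((bar.date, bar.open, bar.high, bar.low, bar.close, bar.volume))

    def historicalDataEnd(self, reqId, start, end):
        # All bars delivered; this is where the real script saves the data
        # and decides whether to request the next chunk of the timespan.
        print(f"Received {len(self.bars)} bars from {start} to {end}")
        self.disconnect()


if __name__ == "__main__":
    app = HistoricalApp("AAPL")
    app.connect("127.0.0.1", 7497, clientId=1)   # port is an assumption; match your TWS settings
    threading.Thread(target=app.run, daemon=True).start()
    while app.isConnected():
        time.sleep(1)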

I’ve also created a very simple Jupyter notebook that shows what some of the data looks like.
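
If you just want a quick look without the notebook, a few lines of pandas will do roughly the same thing. This sketch assumes you have already run the daily AAPL example above and that the CSV has a header row with the bar fields; adjust the path and column names to match your files.

import pandas as pd
import matplotlib.pyplot as plt

# Load the daily bars written by download_bars.py and plot the closing prices.
df = pd.read_csv("data/STK/1_day/AAPL.csv", index_col="date", parse_dates=True)
df["close"].plot(title="AAPL daily close")
plt.show()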


35 thoughts on “How to get historical market data from Interactive Brokers using Python”

  1. ./download_bars.py --start-date '2008-01-02' --size '1 day' IBM -t 'ADJUSTED_LAST'

    needs something like:

    --- download_bars.py.~1~ 2021-09-12 21:20:11.954378539 -0400
    +++ download_bars.py 2021-09-12 21:43:20.546817876 -0400
    @@ -67,7 +67,7 @@
         self.reqHistoricalData(
             cid,  # tickerId, used to identify incoming data
             contract,
    -        self.current.strftime("%Y%m%d 00:00:00"),  # always go to midnight
    +        "",  # self.current.strftime("%Y%m%d 00:00:00"),  # always go to midnight
             self.duration,  # amount of time to go back
             self.args.size,  # bar size
             self.args.data_type,  # historical data type

    1. Olivier,

      I’m glad the code has been helpful. The code doesn’t currently explicitly take care of the pacing violations you link to, but practically it will not hit those limits if you run the script one symbol at a time. In my experience, the APIs are slow enough that running one symbol for a longer period of time tends to not hit pacing limits. But if you find it does, the script can be enhanced to insert some pauses. Pull requests are welcome!

      If you decide to try running multiple concurrent requests, you will likely hit a limit. If you are building a backtester, my recommendation is to first download historical data for the symbols you are interested in (maybe in a script you run overnight) and save the data in an archive. Then, once you have it all saved locally, you can run multiple backtest iterations.
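
      Something along these lines would be one way to space out the calls. This is just a rough sketch of the idea and not code that is in the script today; the 15-second pause is an assumption, roughly matching the identical-request pacing window mentioned in the post.

      import time

      PAUSE_SECONDS = 15  # assumed spacing; tune it to the pacing rules you actually hit

      # Hypothetical helper: call this wherever the script would otherwise call
      # reqHistoricalData directly, passing the same arguments.
      def paced_historical_request(app, *request_args):
          app.reqHistoricalData(*request_args)
          time.sleep(PAUSE_SECONDS)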

  2. Hey Matt! Great to see your comprehensive guide on this topic! I wonder if historical options data can also be pulled with such requests? Have you tried that before?
    Or, if it’s not available, how can we store streaming options data in our database and start building a historical options dataset ourselves?

    1. Jason,

      I don’t currently subscribe to options prices on IB, so haven’t verified that my code works for options, but it should. If you look at the documentation, you’ll see that options are listed in the table. But you’ll need to subscribe to the data in order to get historical prices, as far as I know.

      Good luck!

  3. Hey Matt, Great stuff here regarding Interactive Brokers and downloading historic data – quick question, in your example “download_bars.py” you access “b.date, b.open, b.high, b.low, b.close, b.volume, b.barCount, b.average” for the security, do you know if other fields are available to pull in to the CSV file? Specifically, how could I call historical volatility/implied volatility (if available)? Many thanks, Matt K

    1. Matt,

      The bar data is described here. It’s just prices and volume (if you’re getting trades).

      As for historical volatility and implied volatility, that doesn’t appear to be supported by IB (see here). As best I can tell, if you wanted to listen to live data (including delayed data) using the real-time market data API, you could get volatility. But historical data is probably going to cost you money from another source.

  4. I have downloaded IB data with API using their R package. I downloaded 1 minute bars for all available contracts (they go back 2 years). For some reason when I backtest algos on older contracts, I get null results, whereas the latest 2-3 contracts provide positive results. Do you know if the older contracts (1 minute bars), have lower quality?

    1. It all depends on the contracts you’re looking at. The documentation says that bar sizes of 30 seconds and lower are limited to 6 months, and that futures are limited to two years. Also, data is not available for expired options, symbols that no longer trade, and some symbols that move between exchanges. This data is definitely not going to be of the same quality as what you’d purchase from a vendor specializing in historical data. You need to be careful about making assumptions about the data without verifying it carefully, especially if you’re trading real money with it.

      In general, working with US single stocks will have pretty complete historical data, as long as you aren’t ignoring the fact that you’ll be missing data from symbols that no longer trade.

  5. Hey Matt, thank you for the code, but I am a bit confused about “Note that building bars with last price and bid/ask will require at least two queries (TRADES and BID_ASK), then merging the data together.” Can you please explain this a bit more with a short code example?

    1. Aadil, thanks for reading. Happy to elaborate further. Bars can have lots of different data in them, and the most basic info is a trade. But sometimes you might be interested in the bid or ask price (or the Bid/Ask average). If you want Trades, Bids, and Asks, you’ll have to pull the data multiple times since IB only lets you ask for one of them at a time. If you used the sample code, you’d have to run it three times, then use something like pandas to read in each bar and merge the data into one larger dataframe with the Trade, Bid, and Ask. I should probably create a pandas merging article off that idea when I get some time.
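
      In the meantime, here is a rough sketch of that merging step with pandas. It assumes you have already run the downloader twice for the same symbol and dates, once with -t TRADES and once with -t BID_ASK; the two file paths below are hypothetical, so point them at wherever your CSVs actually land.

      import pandas as pd

      # Each file holds bars keyed by the bar timestamp.
      trades = pd.read_csv("data/trades/AMGN.csv", index_col="date", parse_dates=True)
      bid_ask = pd.read_csv("data/bid_ask/AMGN.csv", index_col="date", parse_dates=True)

      # Join on the timestamp; the suffixes keep the two sets of columns apart.
      merged = trades.join(bid_ask, lsuffix="_trade", rsuffix="_bid_ask", how="inner")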

  6. Dear Matt.

    Thank you for this tutorial.
    From the earlier comments, it seems that only 2 years’ historical data of 1 minute resolution is provided by IB. This is a good start…

    Have you come across any affordable data provider (for intraday 1 min bars) for historical data of US futures?

    1. Hi Karthik, I haven’t really done any searches for alternatives. I know there are a few out there, including the CME itself. As usual, pricing will completely be dependent on your own situation (pro status, length of time, granularity, etc. )

  7. Hey Matt, it’s a really nice script you have there. Following up on an open issue you already have (TWS > 9.81.1 not returning a full day of data for futures): any plans to solve that?

    1. Hi Sasha, I saw your ticket and have tried a few things but am not sure what to do to fix it. If I can get some free time I’ll take a look at it; I have a few thoughts on improving the overall design of the script.

  8. Hi Matt, I tried using your scripts to pull some historical data but the API returns the following error:

    "ValueError: time data '19801212' does not match format '%Y%m%d-%H:%M:%S'"

    The change in date/time formatting seems to be a recent one, and I am facing this issue with all the other Python tools I’m using. Is this something that is easily fixed? From the TWS API, I think the date/time now needs to be provided with the dash…

  9. Hi Matt, I cloned the latest repo but getting this error below when running your AAPL example (same for AMGN example)
    File "C:\Program Files\Python311\Lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
    ValueError: time data '19801212' does not match format '%Y-%m-%d %H:%M:%S%z'

    Could you please help/suggest?

    Also, I am a bit puzzled that there are no TWS credentials required. Am I just supposed to be logged in to TWS? (I was when I got those errors.) Thanks!

    1. Alexander – I’d suggest opening an issue in GitHub, it’d be easier to track there. That error looks like you’re getting the head timestamp in the old format. I tested the latest version using the TWS API 10.19.1.

      The other option is to check out an older version of the script that was built using the 9.x version.

    1. Thanks for opening the issue, I pushed a change that should get you going. Please feel free to open other issues as you spot them, and PRs are welcome if anyone has improvements or other fixes.

  10. Hi Matt, do you think it is possible to store all the tick data in real time that I have access to in IB? I suspect it will be terabytes of data.

    1. Hi Rad,

      Storing all tick data will surely get into the terabytes pretty quickly, but you probably won’t be able to get that much out of IB. First, you will have limits on how many instruments you can listen to at once. Second, the APIs don’t send you the full book, just the top of book, and I’m not even sure they send you every update (i.e., they may conflate some updates even if you use the tick-by-tick APIs). You could certainly try it out and see how much you can capture. You’d want to use a more efficient storage system than CSV files.

      Trading firms listen to the full data feeds from the exchange and store compressed packet captures, then they replay those captures into their systems. You can learn a bit about this by reading up on Wireshark, tcpdump, etc.

      You can also download historical tick data, but it might take you forever since it appears to be limited to 1000 data points per request.
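
      For reference, the call in the ibapi package looks roughly like this. It is a sketch only, not something my script currently does, and the contract and start time are placeholders; the trade ticks should come back through the historicalTicksLast callback.

      # Request up to 1000 historical trade ticks starting at a given time.
      app.reqHistoricalTicks(
          3,                               # request id
          contract,                        # a Contract object, built as usual
          "20230403 09:30:00 US/Eastern",  # startDateTime (format depends on TWS version)
          "",                              # endDateTime (use one or the other)
          1000,                            # numberOfTicks (1000 is the maximum)
          "TRADES",                        # whatToShow
          1,                               # useRth
          False,                           # ignoreSize
          [],                              # miscOptions
      )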

  11. for some reason I gotta logout and log back in after every time I run it. any idea why that might be? I ran it in debug mode (-d flag) and the only interesting line is “22:16:36,15 ibapi.connection DEBUG socket either closed or broken, disconnecting”

    1. Taylor, when you say logout and back in, do you mean your Trader Workstation? And does it work for one run of the app and then stop working, or does it not work at all? Just wondering if it has never worked, or if it just sometimes doesn’t work. If it’s never worked, then maybe you didn’t finish the steps for API access that are covered in the first post? Another possible issue could be a local firewall getting in the way.

  12. Hey Matt,
    Have you managed to retrieve EURUSD data? The data I see on TWS only goes back as far as 2005, whereas I do indeed see AAPL data going back to the 80’s.

    Do you know why?

    1. Noah, if I go into TWS and pull up the EUR.USD contract, I only see historical data in the UI back to about 2005. I suspect that’s all that IB has from their vendor. They may have back-filled data for US equities, for example.

  13. Hi, Matt

    Much needed tool, thank you!
    I am trying to get continuous ES (or NQ) with this command

    python download_bars.py --security-type CONTFUT --start-date 20100101 --end-date 20230401 --exchange CME --size "4 hours" --max-days ES

    but only getting about 1 month of data, followed by errors referring to the latest ESM3 contract.
    Using the latest TWS Build 10.22.1h, Apr 5, 2023 3:17:15 PM

    I can see 'durationStr': '1 D' in the request, is that right?

    22:28:35,83 ibapi.wrapper INFO ANSWER error {'reqId': 22, 'errorCode': 162, 'errorString': 'Historical Market Data Service error message:HMDS query returned no data: ESM3@CME Trades'}
    22:28:35,85 ibapi.wrapper ERROR ERROR 22 162 Historical Market Data Service error message:HMDS query returned no data: ESM3@CME Trades
    22:28:35,85 root ERROR Error. Id: 22 Code 162 Msg: Historical Market Data Service error message:HMDS query returned no data: ESM3@CME Trades

    Thank you,

    -Alexander

      1. You can take a look at the solution that was merged in; I think it should resolve this issue. Obviously the project could benefit from some more comprehensive testing tools.

  14. Hi Matt,
    I am using the following query to download data for Indian stocks.

    python download_bars.py -p 4002 --exchange "NSE" --start-date 20150101 --end-date 20201231 --currency "INR" --size "1 hour" KOTAKBANK

    This works but your script does not exit. Is this by design? I have resorted to running your script in a screen session and killing the screen. I suspect I am missing something but cannot determine what.

    In addition, can the script download data for more than 5 years? I suspect it can but I tried a query like

    python download_bars.py -p 4002 --exchange "NSE" --start-date 20100101 --end-date 20201231 --currency "INR" --size "1 hour" KOTAKBANK

    I got about 4 years of data before the connection errored out. This could be an interactive brokers limitation.

    1. Pranav, I don’t have data permissions on NSE, so I can’t reproduce the issue on my end. A few suggestions for you. First, make sure you’re using the latest version from the GitHub repo. Second, you can try the --max-days option to have it fetch as much data as IB says they have. The data should match what you can see in TWS with graphing, I believe.

      Also, when you say the connection errored out, can you give an error message? I’d suggest opening a ticket in GitHub and putting all that info there. I can’t guarantee I will have much time to look at it, but perhaps someone else with NSE permissions will, or I can spot something in the code based on the error message. It could be a pacing violation, for example.

  15. Hello,

    The script freezes after a traceback: (Cannot compare tz-naive and tz-aware timestamps)

    18:03:34,74 ibapi.client DEBUG 140541439586368 connState: None -> 0
    Exception in thread Thread-2 (run):
    Traceback (most recent call last):
    File "/home/xxx/.local/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1596, in safe_sort
    sorter = values.argsort()
    File "timestamps.pyx", line 388, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
    TypeError: Cannot compare tz-naive and tz-aware timestamps

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "/home/xxx/.local/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 59, in _wrapfunc
    return bound(*args, **kwds)
    File "timestamps.pyx", line 388, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
    TypeError: Cannot compare tz-naive and tz-aware timestamps

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
    File "/usr/lib/python3.10/threading.py", line 953, in run

    ..

    API version 1026.03

    1. Thanks for reporting the issue Nicolas. I hadn’t run the script in a while and I’m seeing the same issue currently, so there must be some changes in how TWS is returning data. There is an issue in GitHub that has some comments about it, you can follow progress there.
