Profiling Python code with py-spy

If you have a Python program that is currently running you may want to understand what the real-world performance profile of the code is. This program could be in a production environment or just on your local machine. You will want to understand where the running program spends its time and if any “hot spots” exist that should be investigated further for improvement. You may be dealing with a production system that is misbehaving and you may want to profile it in an unobtrusive way that doesn’t further impact production performance or require code modifications. What’s a good way to do this? This article will talk about py-spy, a tool that allows you to profile Python programs that are already running.

Deterministic vs. Sampling profilers

In earlier articles, I wrote about two deterministic profilers, cProfile and line_profiler. These profilers are useful when you are developing code and want to profile either sections of code or an entire process. Since they are deterministic, they will tell you exactly how many times a function (or in the case of line_profiler, a line) is executed and how much time it relatively takes to execute compared to the rest of your code. Because these profilers run within the observed process, they slow it down somewhat because they have to do bookkeeping and calculating in the midst of the program execution. For production code, modifying the code or restarting it with a profiler enabled is often not an option.

This is where sampling profilers can be helpful. A sampling profiler looks at an existing process and uses various tricks to determine what the running process is doing. You can manually try this yourself. For example, on linux you can use the pstack <pid> (or gstack <pid>) command to see what your process is doing. On a Mac, you can execute echo "thread backtrace all" | lldb -p <pid> to see something similar. The output will be the stack of all the threads in your process. This works for any process, not just Python programs. For your Python programs, you’ll see the underlying C functions, not your Python functions. In some cases, checking the stack a few times this way may tell you if your process is stuck or where it is slow, provided you can do the translation to your own code. But doing this provides only a single sample in time. Since the process is continually executing, your sample may change each time you run the command (unless it’s blocked or you just happened to be very lucky).

A sampling profiler and surrounding tools take multiple snapshots of the system over time and then provide you with the ability to look over this data and understand where your code is slow.


Py-spy uses system calls (process_vm_readv on Linux, vm_read on OSX, ReadProcessMemory on Windows) to obtain the call stack, then translates that information into the Python function calls that you see in your source code. It samples multiple times per second so it has a good chance of seeing your program in the various states that it will be in over time. It is written in Rust for speed.

Getting py-spy into your project is very simple, it’s installable via pip. To show you how to use it, I’ve created some sample code to profile and observe how py-spy can tell us about a running Python process. If you want to follow along, you can easily reproduce these steps.

First, I setup a new virtual environment using py-env and the pyenv-virtualenv plugin for this project. You can do this or setup a virtual environment using your preferred tool.

# whichever Python version you prefer
pyenv install 3.8.7             
# make our virtualenv (with above version)
pyenv virtualenv 3.8.7 py-spy   
# activate it
pyenv activate py-spy           
# install py-spy
pip install py-spy              
# make sure we pick up the commands in our path
pyenv rehash                    

That’s all there is to it, we now have the tools available. If you run py-spy, you can see the common usage.

$ py-spy
py-spy 0.3.4
Sampling profiler for Python programs

    py-spy <SUBCOMMAND>

    -h, --help       Prints help information
    -V, --version    Prints version information

    record    Records stack trace information to a flamegraph, speedscope or raw file
    top       Displays a top like view of functions consuming CPU
    dump      Dumps stack traces for a target program to stdout
    help      Prints this message or the help of the given subcommand(s)

An example

In order to demonstrate py-spy, I’ve written a simple long-running process what will consume streaming prices from a cryptocurrency exchange and generate a record every minute (this is also known as a bar). The bar contains various information from the past minute. The bar includes the high, low, and last price, the volume, and the Volume Weighted Average Price (VWAP). Right now, the code only logs these values, but could be extended to update a database. While it’s simple, it is a useful example to use since cryptocurrencies trade around the clock and will give us real world data to work with.

I’m using a Coinbase Pro API for Python to access data from the WebSocket feed. Here’s a first cut that has some debugging code left in place (along with two ways to generate the VWAP, one inefficient (the _vwap method) and one more efficient). Let’s see if py-spy reveals how much time this code uses.

This code will end up generating a thread for the WebSocket client. The asyncio loop will set a timer for the next minute boundary to tell the client to log the bar data. It will run until you kill it (with Ctrl-C, for example).

#!/usr/bin/env python

import argparse
import functools
import datetime
import asyncio
import logging

import arrow
import cbpro

class BarClient(cbpro.WebsocketClient):
    def __init__(self, **kwargs):
        self._bar_volume = 0
        self._weighted_price = 0.0
        self._ticks = 0
        self._bar_high = None
        self._bar_low = None
        self.last_msg = {}

        self._pxs = []
        self._volumes = []

    def next_minute_delay(self):
        delay = ('minutes') -
        return (delay.seconds + delay.microseconds/1e6)

    def _vwap(self):
        if len(self._pxs):
            wp = sum([x*y for x,y in zip(self._pxs, self._volumes)])
            v = sum(self._volumes)

            return wp/v

    def on_message(self, msg):
        if 'last_size' in msg and 'price' in msg:
            last_size = float(msg['last_size'])
            price = float(msg['price'])
            self._bar_volume += last_size
            self._weighted_price += last_size * price
            self._ticks += 1
            if self._bar_high is None or price > self._bar_high:
                self._bar_high = price
            if self._bar_low is None or price < self._bar_low:
                self._bar_low = price
            logging.debug("VWAP: %s", self._vwap())
        self.last_msg = msg
        logging.debug("Message: %s", msg)

    def on_bar(self, loop):
        if self.last_msg is not None:
            if self._bar_volume == 0:
                self.last_msg['vwap'] = None
                self.last_msg['vwap'] = self._weighted_price/self._bar_volume
            self.last_msg['bar_bar_volume'] = self._bar_volume
            self.last_msg['bar_ticks'] = self._ticks
            self.last_msg['bar_high'] = self._bar_high
            self.last_msg['bar_low'] = self._bar_low
            last = self.last_msg.get('price')
            if last:
                last = float(last)
            self._bar_high = last
            self._bar_low = last
  "Bar: %s", self.last_msg)
        self._bar_volume = 0
        self._weighted_price = 0.0
        self._ticks = 0
        // reschedule
        loop.call_at(loop.time() + self.next_minute_delay(),
                     functools.partial(self.on_bar, loop))
def main():
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--product", default="BTC-USD",
                           help="coinbase product")
    argparser.add_argument('-d', "--debug", action='store_true',
                           help="debug logging")
    args = argparser.parse_args()

    cfg = {"format": "%(asctime)s - %(levelname)s -  %(message)s"}
    if args.debug:
        cfg["level"] = logging.DEBUG
        cfg["level"] = logging.INFO


    client = BarClient(url="wss://",

    loop = asyncio.get_event_loop()
    loop.call_at(loop.time() + client.next_minute_delay(), functools.partial(client.on_bar, loop))


if __name__ == '__main__':

Running the example

To run this code, you’ll need to install a few extra modules. The cbpro module is a simple Python wrapper of the Coinbase APIs. Arrow is a nice library for doing datetime logic.

pip install arrow cbpro

Now, you can run the code with debug logging and hopefully see some messages, depending on how busy the exchange is.

 ./ -d
2021-03-14 17:20:12,828 - DEBUG -  Using selector: KqueueSelector
-- Subscribed! --

2021-03-14 17:20:13,059 - DEBUG -  Message: {'type': 'subscriptions', 'channels': [{'name': 'ticker', 'product_ids': ['BTC-USD']}]}
2021-03-14 17:20:13,060 - DEBUG -  VWAP: 60132.57

Profiling the example

Now, let’s review the py-spy commands. First, using the dump command will give us a quick view of the stack, translated to Python functions. 

A quick side note here, if you’re using a Mac you will need to run py-spy as sudo. On Linux, it depends on your security settings. Also, since I was using pyenv I needed to pass on my environment to sudo using the -E flag so it picks up the right Python interpreter and the py-spy script in the path. I obtained the process id for my process using the ps command in my shell (in my case it was 97520). 

py-spy dump

 sudo -E py-spy dump -p 97520
Process 97520: /Users/mcw/.pyenv/versions/py-spy/bin/python ./ -d
Python v3.8.7 (/Users/mcw/.pyenv/versions/3.8.7/bin/python3.8)

Thread 0x113206DC0 (idle): "MainThread"
    select (
    _run_once (asyncio/
    run_forever (asyncio/
    main (
    <module> (
Thread 0x700009CAA000 (idle): "Thread-1"
    read (
    recv (
    recv (websocket/
    _recv (websocket/
    recv_strict (websocket/
    recv_header (websocket/
    recv_frame (websocket/
    recv_frame (websocket/
    recv_data_frame (websocket/
    recv_data (websocket/
    recv (websocket/
    _listen (cbpro/
    _go (cbpro/
    run (
    _bootstrap_inner (
    _bootstrap (

You can see there’s two threads running. One is reading data, the other is in select in the run loop. This is only useful for profiling if our program is stuck. One really nice feature though is if you give it the --locals option, it will show you any local variables, which can be really helpful for debugging!

py-spy top

The next command to try is top

sudo -E py-spy top -p 97520

This will bring up an interface that looks very similar to the unix top command. As your program runs and py-spy gathers samples, it will show you where it is spending time. Here is a screenshot of what that looked like for me after about 30 seconds.

py-spy top output

py-spy record

Finally, you can record data using py-spy for later analysis or output. There is a raw format, speedscope format, and a flamegraph output. You can specify the amount of time you want to collect data (in seconds), or just let it collect data until you exit the program using Ctrl-C. For example, this command will generate a useful little SVG file flamegraph that you can interact with in a web browser.

sudo -E py-spy record -p 97520 --output py-spy.svg

You can also export the data in speedscope format and then upload it to the speedscope web tool for further analysis. This is a great tool for interactively seeing how your code executes.

I’d encourage you to run this code on your own and play with both the speedscope tool and the SVG output, but here’s two screen shots that help explain how it works. This first view is the overall SVG output. If you hover over the cells, it will show you the function details. You can see that most of the time is spent in the WebSocket client _listen method. But the on_message method shows up to the right of that (designated by the arrow)

py-spy svg output

If we click on it, we get a detailed view.

py-spy svg detailed output

In my case, I see that my list comprehension and logging in the unneeded _vwap method show up fairly high in the profile. I can easily remove this method (and the extra prices and volumes that I was tracking) since the VWAP can be calculated with just a running product and total volume (as I’m doing already in the code). It’s also interesting to see when the script is run in debug mode how much time logging takes


In summary, I’d encourage you to try out py-spy on some of your code. If you try to predict where your code will spend its time, how correct are you? Are there any findings that surprise you? Maybe compare the output of py-spy to a deterministic profiler like line_profiler.

I hope this overview of py-spy has been helpful and that you can deploy this tool in diagnosing performance issues in your own code.

Don't miss any articles!

If you like this article, give me your email and I'll send you my latest articles along with other helpful links and tips with a focus on Python, pandas, and related tools.

Invalid email address
I promise not to spam you, and you can unsubscribe at any time.

Have anything to say about this topic?