Profiling Python code with line_profiler

Once we have debugged, working, readable (and hopefully testable) code, it may become important to examine it more closely and try to improve its performance. Before we can tell whether any change is an improvement, we need to measure the current performance and see where the code is spending its time. In another article, I wrote about using the built-in Python profiler, cProfile, to obtain a profile of some Python code. In this article, we will look at another tool called line_profiler.

The results from cProfile show the relative performance of every function called from within the code being profiled. Because of this, the granularity is not always fine enough to isolate your slowest lines of code, and sometimes cProfile is not the best tool to use. It turns out a lot of typical code will either:

  • do a lot of work without calling many functions
  • make function calls that in turn make a lot of other function calls

Both of these situations can make the output of cProfile pretty useless or hard to wade through.

Let’s look at an example. Here’s some code that takes a sample DataFrame of names and phone numbers and just splits the name into first and last name. Pretty trivial (and horribly incorrect for many real-world names), but I’ll write the first pass of code with some obvious (or maybe not so obvious) performance problems.

Note this code was executed in Python 3.8 with pandas 1.1.4. I am also using the faker module.

import pandas as pd
import numpy as np

import logging

from faker import Faker

fake = Faker()

logging.basicConfig(level=logging.INFO)

df = pd.DataFrame([(fake.name(), fake.phone_number()) for _ in range(1000)], columns=['name', 'phone'])
df.head(3)

def function_with_issues(df):
    logging.debug(f'called with {df}') 
    def _get_first_name(n):
        return n.split()[0]
    def _get_last_name(n):
        return ' '.join(n.split()[1:])
    
    df['first'] = df['name'].apply(_get_first_name)
    df['last'] = df['name'].apply(_get_last_name)
    
    return df

Profiling with cProfile

I will also assume that this code has been tested and yields the desired results, i.e. the resulting DataFrame has first and last names as desired. Just by inspection, you may very well be able to spot the performance problems here and have plans to fix them, but let’s see what a profiler tells us.

import cProfile
with cProfile.Profile() as pf:
    function_with_issues(df)
    pf.print_stats()
20691 function calls (20137 primitive calls) in 0.020 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(all)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(append)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(atleast_2d)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(bincount)
       12    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(concatenate)
       17    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ravel)
       33    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1017(_handle_fromlist)
        1    0.000    0.000    0.020    0.020 <ipython-input-1-6f2894cfab86>:15(function_with_issues)
     1000    0.000    0.000    0.000    0.000 <ipython-input-1-6f2894cfab86>:17(_get_first_name)
<snip>

I cut off the output, but it goes on for pages. Good luck using that output to figure out the issue! You can take this profile data and try to sort it and figure out what is happening, but essentially by using pandas you have brought in a fairly complex set of modules that generate lots of function calls, and the default output from cProfile makes that hard to understand. Where do you start to try to speed this up?
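
One way to tame the output before reaching for another tool is to sort and truncate it with the standard pstats module. Here’s a minimal sketch, reusing the df and function_with_issues defined above:

import cProfile
import pstats

with cProfile.Profile() as pf:
    function_with_issues(df)

# sort by cumulative time and show only the ten most expensive entries
pstats.Stats(pf).sort_stats('cumtime').print_stats(10)

That helps, but it still only tells you which functions are slow, not which of your own lines are responsible.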

This is where line_profiler comes in. It is another profiling tool, not included with Python by default, that you can use to profile individual lines in a function so you can see which lines are expensive.

You can install it with pip (in your notebook using %pip, or on the command line with pip).

%pip install line_profiler

I found the line_profiler documentation a little confusing for getting started quickly, so I’ll try to boil it down to the basic usage here so you can quickly get it running in your environment. Once you install it with pip, you should be able to use it in one of three ways:

  • import it and use the profiler directly in Python code
  • use the command line tool (kernprof) to run your code in another script
  • use the IPython/Jupyter magic (lprun) to run code in a session or notebook

I’ll go through all three of these now.

Running the line_profiler in code

The first way you could choose to use line_profiler is directly in code. This is probably not how you’ll choose to use it in the end, but I think it’s helpful for understanding the mechanics of the tool. Once you import the module, you can make an instance of LineProfiler, just like you would with cProfile.Profile. You then need to tell the profiler which functions to profile. Finally, you run a wrapper function that causes those functions to execute. You can then print the stats generated by the profiler (I’d encourage you to run this yourself and look at the raw results, since the output is fairly verbose and its formatting can be hard to read here).

import line_profiler

lp = line_profiler.LineProfiler()

lp.add_function(function_with_issues)

# this could be any existing function as well, you don't have to write this from scratch
def wrapper_function():
    function_with_issues(df)

wrapper = lp(wrapper_function)
wrapper()

lp.print_stats()
Timer unit: 1e-06 s

Total time: 0.019832 s
File: <ipython-input-1-6f2894cfab86>
Function: function_with_issues at line 15

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    15                                           def function_with_issues(df):
    16         1      17360.0  17360.0     87.5      logging.debug(f'called with {df}')
    17         1          2.0      2.0      0.0      def _get_first_name(n):
    18                                                   return n.split()[0]
    19         1          1.0      1.0      0.0      def _get_last_name(n):
    20                                                   return ' '.join(n.split()[1:])
    21
    22         1       1323.0   1323.0      6.7      df['first'] = df['name'].apply(_get_first_name)
    23         1       1144.0   1144.0      5.8      df['last'] = df['name'].apply(_get_last_name)
    24
    25         1          2.0      2.0      0.0      return df

Total time: 0.019874 s
File: <ipython-input-3-b55d0a54bd85>
Function: wrapper_function at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def wrapper_function():
     9         1      19874.0  19874.0    100.0      function_with_issues(df)
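
A small variation worth knowing: the LineProfiler instance is itself usable as a decorator, so you can skip the separate add_function and wrapper steps by wrapping the target function directly. A minimal sketch:

import line_profiler

lp = line_profiler.LineProfiler()

# wrapping the function registers it with the profiler and returns
# a profiled version you can call directly
profiled = lp(function_with_issues)
profiled(df)

lp.print_stats()

This is essentially what the @profile decorator does when you run under kernprof, which we’ll see below.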

The results

Now you can see the results of your line profile. Instead of showing you all the underlying code executed by your few lines of pandas code, it just shows how much time was spent on each line. Note the timer unit at the top of the output (microseconds, i.e. 1e-06 seconds, in this case); multiply the values in the Time column by this unit to get seconds. For example, the logging line’s 17360.0 time units work out to about 0.0174 seconds, which is 87.5% of the total.

Before we start discussing improving these times, let’s review the other ways you might choose to run the line_profiler.

Using magic

If you’re running code in a notebook or IPython session, you’ll likely want to use the %lprun magic. Just load the line_profiler extension and then you can run the profile in one line: tell it which functions to profile using the -f switch, and put the driver function call on the same line. There are other options for dumping results and running entire modules; you can check the docs or use %lprun? in your notebook to read the help.

%load_ext line_profiler
%lprun -f function_with_issues function_with_issues(df)
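
For instance, if I’m reading the help correctly, the -T option will also save the text output to a file (double-check %lprun? in your own session, since the switches may differ across versions; the filename here is just illustrative):

%lprun -T lprun_results.txt -f function_with_issues function_with_issues(df)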

Using kernprof

You can also run the kernprof tool from the command line. In that case, you just need to mark your code with the @profile decorator, then run the script from the command line. This generates a binary file with the profile data.

You can then read the profile data and output the results.

For the example above, we could put the following code in a single Python file (performance_line_profiler.py).

import pandas as pd
import numpy as np

import logging

from faker import Faker

fake = Faker()

@profile
def function_with_issues(df):
    logging.debug(f'called with {df}')
    def _get_first_name(n):
        return n.split()[0]
    def _get_last_name(n):
        return ' '.join(n.split()[1:])

    df['first'] = df['name'].apply(_get_first_name)
    df['last'] = df['name'].apply(_get_last_name)

    return df

if __name__ == '__main__':
    df = pd.DataFrame([(fake.name(), fake.phone_number()) for _ in range(1000)], columns=['name', 'phone'])
    function_with_issues(df)

Then, you’d need to make sure that the Python environment where you installed line_profiler is loaded on the command line. I use pyenv and virtualenv to create a specific virtualenv that I activated and used for this example. In a session with the virtualenv activated, the kernprof tool is available.
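
If you use pyenv with the pyenv-virtualenv plugin, the setup looks roughly like this (the Python version and environment name are just illustrative):

$ pyenv virtualenv 3.8.6 line-profiling
$ pyenv activate line-profiling
$ pip install line_profiler pandas faker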

$ kernprof -l performance_line_profiler.py
Wrote profile results to performance_line_profiler.py.lprof

Then, you can view the results, like this:

$ python -m line_profiler performance_line_profiler.py.lprof

You’ll get roughly the same results you did earlier.
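
As a shortcut, kernprof can also display the results immediately after the run if you pass the -v flag:

$ kernprof -l -v performance_line_profiler.py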

The speedup

Now, circling back to our profile results. First, we see the code is spending a large amount of time in a logging debug statement, even though we configured logging at the INFO level, so the debug message is never actually emitted. I thought logging was supposed to be essentially a no-op if you aren’t running at that level? What’s going on?

The lesson here is to not eagerly format your log strings. When you log, you should pass a format string as the first argument and the format arguments after it; the formatting is only applied if the record will actually be emitted. The f-string above, by contrast, is fully rendered before logging.debug is even called, so you pay for the formatting whether or not the message is logged.
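
To make the difference concrete, here’s a minimal sketch (big_value is just a stand-in for an object that is expensive to format, like our DataFrame):

import logging

logging.basicConfig(level=logging.INFO)

big_value = list(range(100_000))  # stand-in for something expensive to format

# eager: the f-string renders big_value immediately, even though
# DEBUG records are discarded at the INFO level
logging.debug(f'called with {big_value}')

# lazy: logging stores the format string and argument, and only
# applies the formatting if the record is actually emitted
logging.debug('called with %s', big_value)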

Second, we can avoid calling apply with a function on our DataFrame and instead use the built-in string methods in pandas to do the name splitting with fewer lines of code. Here’s a first cut, which we can quickly profile.

def updated_function(df):
    logging.debug('called with %s', df)

    splits = df['name'].str.split().str
    df['first'] = splits[0]
    df['last'] = splits[1:].str.join(' ')

    return df
%lprun -f updated_function updated_function(df)
Timer unit: 1e-06 s

Total time: 0.006248 s
File: <ipython-input-5-c801b8557f25>
Function: updated_function at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def updated_function(df):
     2         1         13.0     13.0      0.2      logging.debug('called with %s', df)
     3
     4         1       2348.0   2348.0     37.6      splits = df['name'].str.split().str
     5         1       2001.0   2001.0     32.0      df['first'] = splits[0]
     6         1       1885.0   1885.0     30.2      df['last'] = splits[1:].str.join(' ')
     7
     8         1          1.0      1.0      0.0      return df

Well, we see right away that removing the eager logging format was wise; it’s now basically a no-op. But this new approach turns out not to be faster than our old apply-based method on this test data. Creating a temporary variable to store the result of the split costs some time, and accessing the split data through the str accessor is also slow. We could do the split twice instead, but that is still slower than the old approach. Using apply and splitting the strings in plain Python is simply faster here than the pandas built-in str accessors. This was a bit of a surprise to me, and I’m sure we could dig into it more to understand why; maybe that’s another article for another time. But the lesson is it pays to profile!
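
For reference, here’s roughly what the split-twice variant I mentioned would look like; this is just a sketch of the idea, not code from the profile runs above:

def split_twice(df):
    # avoids storing the temporary .str accessor by splitting twice,
    # but is still slower than the apply-based version on this data
    df['first'] = df['name'].str.split().str[0]
    df['last'] = df['name'].str.split().str[1:].str.join(' ')
    return df

Just to reinforce this with a much simpler example, we can use %timeit to observe these differences for splitting.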

> %timeit df['name'].str.split().str[0]
1.07 ms ± 70.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

> %timeit df['name'].apply(lambda x: x.split()[0])
443 µs ± 39.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If we had made multiple changes to this code at once and only looked at the net results for the whole function call using something like %timeit, we might have actually slowed some code down without realizing it.

Using a tool like line_profiler is one way to spotlight where your code is spending its time so you know where to focus your optimization efforts. It will also help you verify that the speed improvements you try are actually helping you reach your end goal.

Want to know how to profile running code, including code that might be in production? Check out py-spy.
