Profiling Python code with memory_profiler

What do you do when your Python program is using too much memory? How do you find the spots in your code that allocate memory, especially in large chunks? There is usually no easy answer to these questions, but a number of tools exist that can help you figure out where your code is allocating memory. In this article, I’m going to focus on one of them, memory_profiler.

The memory_profiler tool is similar in spirit to (and inspired by) the line_profiler tool, which I’ve written about as well. Whereas line_profiler tells you how much time is spent on each line, memory_profiler tells you how much memory is allocated (or freed) by each line. This allows you to see the real impact of each line of code and get a sense of where memory is being used. While the tool is quite helpful, there are a few things to know about it to use it effectively. I’ll cover some details in this article.

Installation

memory_profiler is written in Python and can be installed using pip. The package includes the library as well as a few command-line utilities.

pip install memory_profiler

It uses the psutil library (or can use tracemalloc or posix) to access process information in a cross platform way, so it works on Windows, Mac, and Linux.

Basic profiling

memory_profiler is a set of tools for profiling a Python program’s memory usage, and the documentation gives a nice overview of those tools. The tool that provides the most detail is the line-by-line memory usage that the module will report when profiling a single function. You can obtain this by running the module from the command line against a Python file. It’s also available via Jupyter/IPython magics, or in your own code. I’ll cover all those options in this article.

I’ve extended the example code from the documentation to show several ways that you might see memory grow and be reclaimed in Python code, and what the line-by-line output looks like on my computer. Using the sample code below, saved in a source file (performance_memory_profiler.py), you can follow along by running the profile yourself.

from functools import lru_cache

from memory_profiler import profile

import pandas as pd
import numpy as np

@profile
def simple_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

@profile
def simple_function2():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 8)
    del b
    return a

@lru_cache
def caching_function(size):
    return np.ones(size)


@profile
def test_caching_function():
    for i in range(10_000):
        caching_function(i)

    for i in range(10_000,0,-1):
        caching_function(i)


if __name__ == '__main__':
    simple_function()
    simple_function()
    simple_function2()
    test_caching_function()

Running memory_profiler

To provide line-by-line results, memory_profiler requires that a function be decorated with the @profile decorator. Just add this to the functions you want to profile; I have done this with three functions above. Then you’ll need a way to actually execute those functions, such as a command line script. Running a unit test can work as well, as long as you can run it from the command line. You do this by running the memory_profiler module and supplying the Python script that drives your code. You can give it a -h to see the help:

$ python -m memory_profiler -h
usage: python -m memory_profiler script_file.py

positional arguments:
  program               python script or module followed by command line arguements to run

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --pdb-mmem MAXMEM     step into the debugger when memory exceeds MAXMEM
  --precision PRECISION
                        precision of memory output in number of significant digits
  -o OUT_FILENAME       path to a file where results will be written
  --timestamp           print timestamp instead of memory measurement for decorated functions
  --include-children    also include memory used by child processes
  --backend {tracemalloc,psutil,posix}
                        backend using for getting memory info (one of the {tracemalloc, psutil, posix})

To view the results from the sample program, just run it with the defaults. Since we marked three of the functions with the @profile decorator and simple_function is called twice, four reports will be printed, one per invocation. Be careful when profiling a function that is invoked many times, because it will print a result for each invocation. Below are the results from my computer, and I’ll explain more about the run afterwards. For each function, we get the source line number on the left, the actual Python source code on the right, and three metrics for each line: the memory usage of the entire process when that line of code was executed, how much of an increment (positive numbers) or decrement (negative numbers) of memory occurred for that line, and how many times that line was executed.

$ python -m memory_profiler performance_memory_profiler.py
Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8     67.2 MiB     67.2 MiB           1   @profile
     9                                         def simple_function():
    10     74.8 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    227.4 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
    12    227.4 MiB      0.0 MiB           1       del b
    13    227.4 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8    227.5 MiB    227.5 MiB           1   @profile
     9                                         def simple_function():
    10    235.1 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    235.1 MiB      0.0 MiB           1       b = [2] * (2 * 10 ** 7)
    12    235.1 MiB      0.0 MiB           1       del b
    13    235.1 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    15    235.1 MiB    235.1 MiB           1   @profile
    16                                         def simple_function2():
    17    235.1 MiB      0.0 MiB           1       a = [1] * (10 ** 6)
    18   1761.0 MiB   1525.9 MiB           1       b = [2] * (2 * 10 ** 8)
    19    235.1 MiB  -1525.9 MiB           1       del b
    20    235.1 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    27    235.1 MiB    235.1 MiB           1   @profile
    28                                         def test_caching_function():
    29    275.6 MiB      0.0 MiB       10001       for i in range(10_000):
    30    275.6 MiB     40.5 MiB       10000           caching_function(i)
    31
    32    280.6 MiB      0.0 MiB       10001       for i in range(10_000,0,-1):
    33    280.6 MiB      5.0 MiB       10000           caching_function(i)

Interpreting the results

If you check the official docs, you’ll see slightly different results in their example output than I got when I executed simple_function. For instance, in my first two invocations of the function, the del seems to have no effect, whereas their example shows memory being freed. This is because del in Python is not the same as freeing memory in a language like C or C++. It only removes a reference, and even once the object itself has been released, the interpreter’s allocator may hold on to that memory for reuse instead of returning it to the operating system right away. You can see that the memory spiked on the first invocation of the function, but then on the second invocation no new memory was needed for creating b a second time. To clarify this point, I added another function, simple_function2, that creates a bigger list, and this time we see that the memory is returned after the del. This is just one example of how profiling code may require multiple runs with varied input data to get realistic results for your code. Also consider the hardware used; production issues may not match a development workstation. Just as much time may be needed to craft a good test program as to interpret the results and decide how to improve things.
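
To separate what Python itself has allocated from what the operating system reports for the whole process, the standard library’s tracemalloc module can help. The following is a minimal sketch, not part of the example script above; the function name and the list size are chosen only for illustration. It records Python-level allocations before, during, and after the lifetime of a large list:

```python
import tracemalloc


def traced_sizes_around_del(n=2 * 10 ** 6):
    """Return Python-level traced memory before, during, and after a big list exists."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        b = [2] * n  # one large list, similar to b in simple_function
        during, _ = tracemalloc.get_traced_memory()
        del b  # drops the only reference to the list
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return before, during, after


before, during, after = traced_sizes_around_del()
print(f"list cost about {(during - before) / 2**20:.1f} MiB")
print(f"left over after del: {(after - before) / 2**20:.3f} MiB")
```

On CPython, the traced size should drop back to roughly where it started right after the del, even in runs where the process-level number reported by memory_profiler stays flat. The difference is memory that is being kept around for reuse, not memory your objects still occupy.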

The second thing to note from my results is the profiling of caching_function. Note that the test driver runs through the function with 10,000 values, but then runs through them again in reverse. The cache will get hit only for roughly the first 128 calls of the second pass (128 is the default size of the functools.lru_cache function decorator). We see that there is much less memory growth the second time around (this is both because of the cache hits and because previously allocated memory is reused rather than handed back to the operating system). In general, look for continual or large memory increments without decrements. Also look for cases where memory grows every time the function is called, even if it’s in smaller amounts.
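
To check how many calls the cache actually serves, functools exposes cache_info() on the decorated function. Here is a small sketch of the same forward-then-reverse access pattern, scaled down and using a plain list instead of NumPy so that it runs with the standard library alone; make_list is a made-up stand-in for caching_function:

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # 128 is also the default when maxsize is omitted
def make_list(size):
    return [0] * size


# Forward pass: every call is a miss, and only the last 128 results stay cached.
for i in range(1_000):
    make_list(i)

# Reverse pass: only the most recently cached sizes can produce hits.
for i in range(1_000, 0, -1):
    make_list(i)

info = make_list.cache_info()
print(info)
```

With this pattern the reverse pass gets 127 hits rather than 128, because its first argument (1_000) was never cached by the forward pass and evicts one entry when it is stored. Everything after those hits is a miss again, which is why a bounded cache does little for a workload like this.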

Profiling in regular code

If the decorator is imported in your code (as above) and the code is run as normal, profiling data is sent to stdout. This can be a handy way to profile single functions quickly. You can annotate any function and just run your code using whichever scripts you normally use. Note that you can also send this output to a file or log it using the logging module. See the docs for details.

Jupyter/IPython magics

The memory_profiler project also includes Jupyter/IPython magics, which can be useful. It’s very important to note that to get line-by-line output (as of v0.58, the most recent version at the time of writing), code has to be saved in local Python source files; it can’t be read directly from notebooks or the IPython interpreter. But the magics can still be useful for debugging memory issues. To use them, load the extension.

%load_ext memory_profiler

mprun

The %mprun magic is similar to running the functions as described above, but you can do some more ad-hoc checking. First, just import the functions, then run them. Note that I found it didn’t seem to play well with autoreload, so your mileage may vary in trying to modify code and test it without doing a full kernel restart.

from performance_memory_profiler import test_caching_function, simple_function
%mprun -f simple_function simple_function()
Filename: /Users/mcw/projects/python_blogposts/performance/performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8     76.4 MiB     76.4 MiB           1   @profile
     9                                         def simple_function():
    10     84.0 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    236.6 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
    12    236.6 MiB      0.0 MiB           1       del b
    13    236.6 MiB      0.0 MiB           1       return a

memit

The %memit and %%memit magics are helpful for checking the peak memory and the incremental memory growth of the code executed. You don’t get line-by-line output, but this can allow for interactive debugging and testing.

%%memit
range(1000)
peak memory: 237.00 MiB, increment: 0.32 MiB

Looking at specific objects, not using memory_profiler

Let’s just look quickly at NumPy and pandas objects and how we can see the memory usage of those objects. These two libraries and their objects are very likely to be large for many use cases. For newer versions of the libraries, you can use sys.getsizeof to see their memory usage. Under the hood, pandas objects will just call their memory_usage method, which you can also use directly. Note that you need to specify deep=True if you also want to see the memory usage of objects in pandas containers.

import sys

import numpy as np
import pandas as pd

def make_big_array():
    x = np.ones(int(1e7))
    return x

def make_big_string_array():
    x = np.array([str(i) for i in range(int(1e7))])
    return x

def make_big_series():
    return pd.Series(np.ones(int(1e7)))

def make_big_string_series():
    return pd.Series([str(i) for i in range(int(1e7))])

arr = make_big_array()
arr2 = make_big_string_array()
ser = make_big_series()
ser2 = make_big_string_series()

print("arr: ", sys.getsizeof(arr), arr.nbytes)
print("arr2: ", sys.getsizeof(arr2), arr2.nbytes)
print("ser: ", sys.getsizeof(ser))
print("ser2: ", sys.getsizeof(ser2))
print("ser: ", ser.memory_usage(), ser.memory_usage(deep=True))
print("ser2: ", ser2.memory_usage(), ser2.memory_usage(deep=True))
arr:  80000096 80000000
arr2:  280000096 280000000
ser:  80000144
ser2:  638889034
ser:  80000128 80000128
ser2:  80000128 638889018
%memit make_big_string_series()
peak memory: 1883.11 MiB, increment: 780.45 MiB
%%memit
x = make_big_string_series()
del x
peak memory: 1883.14 MiB, increment: 696.07 MiB

Two things to point out here. First, you can see the size of a Series of floats (np.ones produces float64 values) is the same whether you use deep=True or not. For string objects, the shallow size of the Series is the same as for the float Series, but the underlying objects are much bigger. Second, you can see that our Series made of string objects is over 600 MiB, and using %memit we can see an increment of several hundred MiB when we invoke the function. This tool will help you narrow down which functions allocate the most memory and should be investigated further with line-by-line profiling.
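
The shallow versus deep distinction is not specific to pandas. As a rough illustration using only the standard library (the variable names and the list size are made up for this example), sys.getsizeof on a plain list counts the list’s own array of references, not the string objects it points to:

```python
import sys

strings = [str(i) for i in range(100_000)]

# Shallow: only the list object and its array of pointers.
shallow = sys.getsizeof(strings)

# Deep (approximate): add the size of every string the list refers to.
deep = shallow + sum(sys.getsizeof(s) for s in strings)

print(f"shallow: {shallow:,} bytes")
print(f"deep:    {deep:,} bytes")
```

This is the same idea as memory_usage(deep=True): a Series of Python strings stores references, so the shallow number only accounts for the references themselves.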

Further investigation

The memory_profiler project also has tools for investigating longer-running programs and seeing how memory grows over time. Check out the mprof command for that functionality. It also supports tracking memory in forked processes in a multiprocessing context.

Conclusion

Debugging memory issues can be a very difficult and laborious process, but having a few tools to help understand where the memory is being allocated can be very helpful in moving the debugging sessions along. When used along with other profiling tools, such as line_profiler or py-spy, you can get a much better idea of where your code needs improvement.
