What do you do when your Python program is using too much memory? How do you find the spots in your code where memory is being allocated, especially in large chunks? It turns out there is not usually an easy answer to these questions, but a number of tools exist that can help you figure out where your code is allocating memory. In this article, I'm going to focus on one of them, memory_profiler.
The memory_profiler tool is similar in spirit to (and inspired by) the line_profiler tool, which I've written about as well. Whereas line_profiler tells you how much time is spent on each line, memory_profiler tells you how much memory is allocated (or freed) by each line. This allows you to see the real impact of each line of code and get a sense of where memory usage is concentrated. While the tool is quite helpful, there are a few things to know about it to use it effectively. I'll cover some of the details in this article.
Installation
memory_profiler is written in Python and can be installed using pip. The package includes the library, as well as a few command line utilities.
pip install memory_profiler
It uses the psutil library (or can use tracemalloc or posix backends) to access process information in a cross-platform way, so it works on Windows, Mac, and Linux.
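If you want to compare measurements from a different backend, the --backend flag (shown in the help output later in this article) lets you switch. A quick sketch, where your_script.py is just a placeholder for your own driver script:

python -m memory_profiler --backend tracemalloc your_script.py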
Basic profiling
memory_profiler is a set of tools for profiling a Python program's memory usage, and the documentation gives a nice overview of those tools. The tool that provides the most detail is the line-by-line memory usage report the module produces when profiling a single function. You can obtain this by running the module from the command line against a Python file. It's also available via Jupyter/IPython magics, or in your own code. I'll cover all of those options in this article.
I've extended the example code from the documentation to show several ways that you might see memory grow and be reclaimed in Python code, and what the line-by-line output looks like on my computer. Using the sample code below, saved in a source file (performance_memory_profiler.py), you can follow along by running the profiler yourself.
from functools import lru_cache

from memory_profiler import profile
import pandas as pd
import numpy as np


@profile
def simple_function():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

@profile
def simple_function2():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 8)
    del b
    return a


@lru_cache
def caching_function(size):
    return np.ones(size)

@profile
def test_caching_function():
    for i in range(10_000):
        caching_function(i)

    for i in range(10_000, 0, -1):
        caching_function(i)


if __name__ == '__main__':
    simple_function()
    simple_function()
    simple_function2()
    test_caching_function()
Running memory_profiler
To provide line-by-line results, memory_profiler requires that a method be decorated with the @profile decorator. Just add this to the methods you want to profile; I have done this with three methods above. Then you'll need a way to actually execute those methods, such as a command line script. Running a unit test can work as well, as long as you can run it from the command line. You do this by running the memory_profiler module and supplying the Python script that drives your code. You can give it -h to see the help:
$ python -m memory_profiler -h
usage: python -m memory_profiler script_file.py

positional arguments:
  program               python script or module followed by command line
                        arguements to run

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --pdb-mmem MAXMEM     step into the debugger when memory exceeds MAXMEM
  --precision PRECISION
                        precision of memory output in number of significant
                        digits
  -o OUT_FILENAME       path to a file where results will be written
  --timestamp           print timestamp instead of memory measurement for
                        decorated functions
  --include-children    also include memory used by child processes
  --backend {tracemalloc,psutil,posix}
                        backend using for getting memory info (one of the
                        {tracemalloc, psutil, posix})
To view the results from the sample program, just run it with the defaults. Since we marked three of the functions with the @profile decorator, all three invocations will be printed. Be careful about profiling a method or function that is invoked many times: it will print a result for each invocation. Below are the results from my computer, and I'll explain more about the run below. For each function, we get the source line number on the left, the actual Python source code on the right, and three metrics for each line: the memory usage of the entire process when that line of code was executed, how much of an increment (positive numbers) or decrement (negative numbers) of memory occurred for that line, and how many times that line was executed.
$ python -m memory_profiler performance_memory_profiler.py
Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8     67.2 MiB     67.2 MiB           1   @profile
     9                                         def simple_function():
    10     74.8 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    227.4 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
    12    227.4 MiB      0.0 MiB           1       del b
    13    227.4 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8    227.5 MiB    227.5 MiB           1   @profile
     9                                         def simple_function():
    10    235.1 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    235.1 MiB      0.0 MiB           1       b = [2] * (2 * 10 ** 7)
    12    235.1 MiB      0.0 MiB           1       del b
    13    235.1 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    15    235.1 MiB    235.1 MiB           1   @profile
    16                                         def simple_function2():
    17    235.1 MiB      0.0 MiB           1       a = [1] * (10 ** 6)
    18   1761.0 MiB   1525.9 MiB           1       b = [2] * (2 * 10 ** 8)
    19    235.1 MiB  -1525.9 MiB           1       del b
    20    235.1 MiB      0.0 MiB           1       return a


Filename: performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    27    235.1 MiB    235.1 MiB           1   @profile
    28                                         def test_caching_function():
    29    275.6 MiB      0.0 MiB       10001       for i in range(10_000):
    30    275.6 MiB     40.5 MiB       10000           caching_function(i)
    31
    32    280.6 MiB      0.0 MiB       10001       for i in range(10_000,0,-1):
    33    280.6 MiB      5.0 MiB       10000           caching_function(i)
Interpreting the results
If you check the official docs, you'll see slightly different results in their example output than in mine when I executed simple_function. For instance, in my first two invocations of the function, the del seems to have no effect, whereas their example shows memory being freed. This is because Python is a garbage collected language, so del is not the same as freeing memory in a language like C or C++. You can see that the memory spiked on the first invocation of the method, but then on the second invocation no new memory was needed to create b a second time. To clarify this point, I added another method, simple_function2, that creates a bigger list, and this time we see that the memory is freed: the garbage collector decided it wanted to reclaim that memory. This is just one example of how profiling code may require multiple runs with varied input data to get realistic results for your code. Also consider the hardware used; production issues may not match what you see on a development workstation. Crafting a good test program can take just as much time as interpreting the results and deciding how to improve things.
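One simple way to vary the input is to parameterize the driver so the same decorated function gets profiled at several sizes. A minimal sketch (the allocate function and the sizes here are just placeholders for your own code; remember each decorated invocation prints its own report):

from memory_profiler import profile

@profile
def allocate(n):
    # build a list whose size depends on the input, so each run
    # stresses memory differently
    data = [0] * n
    return data

if __name__ == "__main__":
    for n in (10 ** 6, 10 ** 7, 10 ** 8):
        allocate(n)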
The second thing to note from my results is the profiling of caching_function. Note that the test driver runs through the function with 10,000 values, then runs through them again in reverse. The cache will only get hit for roughly the first 128 calls of the second loop (128 is the default size of the functools.lru_cache decorator). We see that there is much less memory growth the second time around (both because of the cache hits and because the garbage collector did not reclaim previously allocated memory). In general, look for continual or large memory increments without corresponding decrements. Also look for cases where memory grows every time the function is called, even if only in smaller amounts.
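If you want to confirm how much of that behavior is due to the cache itself, functools exposes a cache_info() method on the wrapped function. A small sketch using the same caching_function as above:

from functools import lru_cache

import numpy as np

@lru_cache
def caching_function(size):
    return np.ones(size)

for i in range(10_000):
    caching_function(i)
for i in range(10_000, 0, -1):
    caching_function(i)

# With the default maxsize of 128, only the most recently used sizes
# stay cached, so the hit count is tiny compared to the ~20,000 calls.
print(caching_function.cache_info())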
Profiling in regular code
If the profile decorator is imported in your code (as above) and the code is run as normal, profiling data is sent to stdout. This can be a handy way to profile single methods quickly. You can annotate any function and just run your code using whichever scripts you normally use. Note that you can also send this output to a file or log it using the logging module; see the docs for details.
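As a quick sketch of the file option, the decorator accepts a stream argument; the filename here is just an example:

from memory_profiler import profile

# send this function's profiling report to a file rather than stdout
report_file = open('memory_profiler_report.log', 'w+')

@profile(stream=report_file)
def build_list():
    data = [1] * (10 ** 6)
    return data

if __name__ == '__main__':
    build_list()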
Jupyter/IPython magics
The memory_profiler project also includes Jupyter/IPython magics, which can be useful. It's very important to note that to get line-by-line output (as of the most recent version as of this writing, v0.58), code has to be saved in local Python source files; it can't be read directly from notebooks or the IPython interpreter. But the magics can still be useful for debugging memory issues. To use them, load the extension.
%load_ext memory_profiler
mprun
The %mprun magic is similar to running the functions as described above, but you can do some more ad-hoc checking. First, import the functions, then run them. Note that I found it didn't seem to play well with autoreload, so your mileage may vary when trying to modify code and test it without doing a full kernel restart.
from performance_memory_profiler import test_caching_function, simple_function
%mprun -f simple_function simple_function()

Filename: /Users/mcw/projects/python_blogposts/performance/performance_memory_profiler.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     8     76.4 MiB     76.4 MiB           1   @profile
     9                                         def simple_function():
    10     84.0 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
    11    236.6 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
    12    236.6 MiB      0.0 MiB           1       del b
    13    236.6 MiB      0.0 MiB           1       return a
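If the code you want to profile only lives in a notebook cell, one workaround is to write it out to a file first with the %%writefile magic and then import it. A sketch using a hypothetical module and function name:

%%writefile mprun_demo.py
def grow_list(n):
    return [0] * n

and then, in a separate cell:

from mprun_demo import grow_list
%mprun -f grow_list grow_list(10 ** 7)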
memit
The %memit and %%memit magics are helpful for checking the peak memory and incremental memory growth of the code executed. You don't get line-by-line output, but this allows for interactive debugging and testing.
%%memit range(1000)

peak memory: 237.00 MiB, increment: 0.32 MiB
Looking at specific objects without memory_profiler
Let's look quickly at NumPy and pandas objects and how we can see the memory usage of those objects. Objects from these two libraries are very likely to be large for many use cases. For newer versions of the libraries, you can use sys.getsizeof to see their memory usage. Under the hood, pandas objects will just call their memory_usage method, which you can also use directly. Note that you need to specify deep=True if you also want to see the memory usage of the objects inside pandas containers.
import sys

import numpy as np
import pandas as pd


def make_big_array():
    x = np.ones(int(1e7))
    return x

def make_big_string_array():
    x = np.array([str(i) for i in range(int(1e7))])
    return x

def make_big_series():
    return pd.Series(np.ones(int(1e7)))

def make_big_string_series():
    return pd.Series([str(i) for i in range(int(1e7))])


arr = make_big_array()
arr2 = make_big_string_array()
ser = make_big_series()
ser2 = make_big_string_series()

print("arr: ", sys.getsizeof(arr), arr.nbytes)
print("arr2: ", sys.getsizeof(arr2), arr2.nbytes)
print("ser: ", sys.getsizeof(ser))
print("ser2: ", sys.getsizeof(ser2))
print("ser: ", ser.memory_usage(), ser.memory_usage(deep=True))
print("ser2: ", ser2.memory_usage(), ser2.memory_usage(deep=True))
arr:  80000096 80000000
arr2:  280000096 280000000
ser:  80000144
ser2:  638889034
ser:  80000128 80000128
ser2:  80000128 638889018
%memit make_big_string_series()
peak memory: 1883.11 MiB, increment: 780.45 MiB
%%memit
x = make_big_string_series()
del x
peak memory: 1883.14 MiB, increment: 696.07 MiB
Two things to point out here. First, you can see that the size of a Series of numeric values is the same whether you use deep=True or not. For string objects, the size of the Series object itself is the same as for the numeric Series, but the underlying objects are much bigger. You can see that our Series made of string objects is over 600 MiB, and using %memit we can see the increment when we invoke the function. This tool will help you narrow down which functions allocate the most memory and should be investigated further with line-by-line profiling.
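The same idea applies to DataFrames: memory_usage(deep=True) and info(memory_usage="deep") report per-column sizes that include the underlying Python string objects. A small sketch with a made-up DataFrame:

import pandas as pd

df = pd.DataFrame({
    "ids": range(1_000_000),
    "labels": [str(i) for i in range(1_000_000)],
})

# per-column sizes; deep=True counts the string objects themselves
print(df.memory_usage(deep=True))

# summary view with the same deep accounting
df.info(memory_usage="deep")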
Further investigation
The memory_profiler project also has tools for investigating longer-running programs and seeing how memory grows over time. Check out the mprof command for that functionality. It also supports tracking memory in forked processes in a multiprocessing context.
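The basic workflow, roughly, is to record a run with mprof run and then chart the result with mprof plot (which needs matplotlib installed):

mprof run performance_memory_profiler.py
mprof plot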
Conclusion
Debugging memory issues can be a very difficult and laborious process, but having a few tools to help understand where the memory is being allocated can be very helpful in moving the debugging sessions along. When used along with other profiling tools, such as line_profiler or py-spy, you can get a much better idea of where your code needs improvement.
Does this only work when executing in a CLI or shell with the command python -m memory_profiler performance_memory_profiler.py or mprof? Can I run this in Django without executing a separate command?

Hi Rakesh,
If you want to run memory_profiler against Django (or any Python process), you need to wrap it in a command that runs the profiler. One issue with using a memory profiler is that it will slow things down considerably. You could use it locally to see what the typical memory usage is for a sample request, then extrapolate to your server in production.