A pandas.DataFrame.apply example

I recently saw a question about pandas.DataFrame.apply and realized that when I first started using pandas, I often reached for apply when a vectorized solution was what I should have been using. Let’s say you have an existing function that takes scalar arguments and calculates the present value of an investment, and you also have a DataFrame of investments, perhaps loaded from a CSV file or a database.

PV = FV / (1 + i) ** n
def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods
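
As a quick sanity check of the formula, here is a minimal sketch that calls the function on a single investment. The sample numbers (1000 due in 12 periods at a 5% rate) match the first row of the DataFrame below; the variable name pv is only for illustration.

```python
def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

# 1000 received 12 periods from now, discounted at 5% per period
pv = present_value(1000, 0.05, 12)
print(round(pv, 2))  # roughly 556.84
```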

If someone has given us this function, we might be tempted to just use it on our data. So here’s what a DataFrame might look like with some values.

import pandas as pd

df = pd.DataFrame([(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
                  columns=['fv', 'i_rate', 'n_periods'])

One way to apply a function to a DataFrame is to iterate over its rows manually and call the function on each one.

for (index, row) in df.iterrows():
    df.loc[index, 'pv'] = present_value(row.fv, row.i_rate, row.n_periods)

Another way to reuse that existing function is to use apply on the DataFrame, using axis=1 to apply it to each row (instead of each column).

df['pv'] = df.apply(lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1)

The problem with this technique is that it isn’t vectorized. We force present_value to be evaluated once for each row of the DataFrame, which is much more expensive than an equivalent vectorized solution. In fact, at the time of writing, apply even evaluated the function twice on the first row, since it could choose an optimized path based on the result (pandas 1.1 and later no longer do this), so the function being applied should not have side effects.
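
To make the per-row evaluation visible, here is a small sketch that records every call made through apply. The calls list and the counted_present_value wrapper are hypothetical helpers added for this illustration, and the exact number of calls may vary with the pandas version, so the sketch only prints the count rather than promising a value.

```python
import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

df = pd.DataFrame([(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
                  columns=['fv', 'i_rate', 'n_periods'])

calls = []  # hypothetical log of the row labels the function was called with

def counted_present_value(row):
    # Side effect for demonstration only: remember which row was processed
    calls.append(row.name)
    return present_value(row['fv'], row['i_rate'], row['n_periods'])

result = df.apply(counted_present_value, axis=1)
print(len(calls), "calls for", len(df), "rows")
```

The function runs at least once for every row, which is the cost the vectorized version avoids.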

So in this case, we should consider a vectorized solution.

df['pv2'] = df['fv']/(1 + df['i_rate']) ** df['n_periods']
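
Because present_value uses only arithmetic operators, it can also be handed whole columns instead of scalars. The sketch below, which assumes the same sample data as above, checks that the row-wise and vectorized results agree; the names row_wise, vectorized, and reused are chosen for illustration.

```python
import numpy as np
import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

df = pd.DataFrame([(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
                  columns=['fv', 'i_rate', 'n_periods'])

# Row-by-row evaluation through apply
row_wise = df.apply(lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1)

# Vectorized expression written out by hand
vectorized = df['fv'] / (1 + df['i_rate']) ** df['n_periods']

# The existing function, called once with whole columns (Series)
reused = present_value(df['fv'], df['i_rate'], df['n_periods'])

print(np.allclose(row_wise, vectorized), np.allclose(vectorized, reused))
```

Passing Series works here only because the function body is plain arithmetic; a function with branches such as if statements on its arguments would need a different approach.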

If we time the apply version against the vectorized one, we can see the vectorized version is more than twice as fast.
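
One way to run such a comparison yourself is with the standard library's timeit module. This is a rough sketch, not the original benchmark: the four sample rows are repeated to build a larger frame (an assumption made so the timings are measurable), and the helper names with_apply and vectorized are illustrative. Actual numbers depend on your machine and pandas version.

```python
import timeit

import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

base = pd.DataFrame([(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
                    columns=['fv', 'i_rate', 'n_periods'])

# Assumption: repeat the sample rows to get 10,000 rows worth timing
df = pd.concat([base] * 2500, ignore_index=True)

def with_apply():
    # Calls present_value once per row
    return df.apply(lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1)

def vectorized():
    # Operates on whole columns at once
    return df['fv'] / (1 + df['i_rate']) ** df['n_periods']

t_apply = timeit.timeit(with_apply, number=3)
t_vec = timeit.timeit(vectorized, number=3)
print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
```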
