I recently saw a question about `pandas.DataFrame.apply` and realized that when I first started using Pandas I would often attempt to solve problems with `apply` when a vectorized solution was what I should have been using instead. Let’s say that you have an existing function to calculate the present value of an investment that takes scalar arguments and you also have a `DataFrame` of investments, perhaps loaded from a CSV file or database.

PV = FV / (1 + i) ** n

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods
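As a quick sanity check of the formula, here is a minimal sketch that calls the function with plain scalars. The input values are illustrative only (the first set happens to match the first row of the example data used later), and the name `pv_example` is my own.

```python
def present_value(fv, i_rate, n_periods):
    """Discount a future value back n_periods at a per-period rate."""
    return fv / (1 + i_rate) ** n_periods

# Illustrative inputs: 1000 received in 12 periods, discounted at 5% per period.
pv_example = present_value(1000, 0.05, 12)
print(round(pv_example, 2))

# With a single period the formula reduces to fv / (1 + i).
print(round(present_value(110, 0.10, 1), 2))
```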

If someone has given us this function, we might be tempted to just use it on our data. So here’s what a

might look like with some values.`DataFrame`

import pandas as pd

df = pd.DataFrame([(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)], columns=['fv', 'i_rate', 'n_periods'])

One way to apply a function to a `DataFrame` is to manually iterate over the items in the frame and apply the function.

for (index, row) in df.iterrows():
    df.loc[index, 'pv'] = present_value(row.fv, row.i_rate, row.n_periods)

Another way to reuse that existing function is to use `apply` on the `DataFrame`, using `axis=1` to apply it to each row (instead of each column).

df['pv'] = df.apply(lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1)
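To make the row-versus-column distinction concrete, here is a small sketch that applies the built-in `len` along each axis of the same example frame. The names `by_column` and `by_row` are my own, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
    columns=['fv', 'i_rate', 'n_periods'],
)

# Default axis=0: the function receives one column at a time,
# so each call sees a Series with one entry per row.
by_column = df.apply(len)

# axis=1: the function receives one row at a time,
# so each call sees a Series with one entry per column.
by_row = df.apply(len, axis=1)

print(by_column)
print(by_row)
```

With `axis=1` the result has one value per row, which is why the lambda above can pick the three arguments out of each row by column name.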

The problem with this technique is it isn’t vectorized. We are going to force the `present_value` function to be evaluated once for each row in the `DataFrame`, and this will be much more expensive than a similar vectorized solution. In fact, `apply` is even evaluated twice on the first row (for the current implementation) since it can choose an optimized path based on the result, so the function being applied should not have side effects.
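One way to see the per-row evaluation for yourself is to count the calls. The sketch below wraps the function in a hypothetical `counting_present_value` helper that deliberately has a side effect (incrementing a counter), purely for demonstration. The exact count may be one higher than the number of rows on pandas versions that evaluate the first row twice, so the sketch only reports the number rather than assuming it.

```python
import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

df = pd.DataFrame(
    [(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
    columns=['fv', 'i_rate', 'n_periods'],
)

# Mutable counter so the wrapper can record every Python-level call.
calls = {'n': 0}

def counting_present_value(row):
    # Side effect for demonstration only; real applied functions should avoid this.
    calls['n'] += 1
    return present_value(row['fv'], row['i_rate'], row['n_periods'])

pv = df.apply(counting_present_value, axis=1)

# At least one Python function call per row, whatever the pandas version.
print(f"rows: {len(df)}, calls to the function: {calls['n']}")
```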

So in this case, we should consider a vectorized solution.

df['pv2'] = df['fv']/(1 + df['i_rate']) ** df['n_periods']
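Before trusting the faster version, it is worth checking that both approaches agree. The following sketch compares them with `numpy.allclose`. It also shows that, because this particular function uses only arithmetic operators, it can be called on whole columns directly; that would not hold for a function that branches on its scalar arguments. The variable names are my own.

```python
import numpy as np
import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

df = pd.DataFrame(
    [(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
    columns=['fv', 'i_rate', 'n_periods'],
)

# Row-by-row result via apply.
pv_apply = df.apply(
    lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1
)

# Vectorized result written out as column arithmetic.
pv_vectorized = df['fv'] / (1 + df['i_rate']) ** df['n_periods']

# The same function, called once with whole columns instead of scalars.
pv_reused = present_value(df['fv'], df['i_rate'], df['n_periods'])

print(np.allclose(pv_apply, pv_vectorized))
print(np.allclose(pv_vectorized, pv_reused))
```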

If we time these two versions, we can see the vectorized version is more than twice as fast. Here’s the full result.
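If you want to run a comparison like this yourself, a rough sketch using the standard library's `timeit` follows. Enlarging the frame to 2,000 rows by repeating the example data is my own assumption, made so the difference is easier to measure. Absolute numbers and the ratio will vary with machine, pandas version, and frame size, so no expected timings are shown.

```python
import timeit

import pandas as pd

def present_value(fv, i_rate, n_periods):
    return fv / (1 + i_rate) ** n_periods

df = pd.DataFrame(
    [(1000, 0.05, 12), (1000, 0.07, 12), (1000, 0.09, 12), (500, 0.02, 24)],
    columns=['fv', 'i_rate', 'n_periods'],
)

# Assumption: repeat the four example rows to get a 2,000-row frame.
big = pd.concat([df] * 500, ignore_index=True)

def with_apply():
    return big.apply(
        lambda r: present_value(r['fv'], r['i_rate'], r['n_periods']), axis=1
    )

def vectorized():
    return big['fv'] / (1 + big['i_rate']) ** big['n_periods']

# Total seconds for three runs of each version.
t_apply = timeit.timeit(with_apply, number=3)
t_vectorized = timeit.timeit(vectorized, number=3)

print(f"apply:      {t_apply:.4f} s")
print(f"vectorized: {t_vectorized:.4f} s")
```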