Basic Pandas: Moving a DataFrame column

Let’s start with a basic DataFrame with a few columns.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.rand(5,5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df['max'] = df.max(axis=1)
>>>
>>> df
          a         b         c         d         e       max
0  0.067423  0.058920  0.999309  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.138881  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.223959  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.913800  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.596375  0.631442  0.907743  0.907743

First, let’s just review the basics. Without moving or dropping columns, we can view any column we want in any order by just selecting them.

>>> df['max']
0    0.999309
1    0.764242
2    0.903363
3    0.988267
4    0.907743
Name: max, dtype: float64

We can also select any set of columns in any order, even repeating a column more than once.

>>> df[['d', 'a', 'max', 'b', 'd']]
          d         a       max         b         d
0  0.440547  0.067423  0.999309  0.058920  0.440547
1  0.764242  0.384196  0.764242  0.732857  0.764242
2  0.903363  0.900311  0.903363  0.662776  0.903363
3  0.106388  0.988267  0.988267  0.852733  0.106388
4  0.631442  0.830644  0.907743  0.647775  0.631442

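Selecting returns a new DataFrame, so assigning the result back to our variable makes the reordering permanent. Note that any column left out of the list is dropped, which is what happens to column c here.

>>> df = df[['d', 'a', 'b', 'max', 'e']]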
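Typing out every label by hand gets error-prone on wider frames. Here is a minimal sketch, using a small hypothetical frame with fixed values rather than the random data above, of building the new order from the existing labels so that nothing is omitted by accident:

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "max": [5, 6]})

# Start from the existing labels so every column is kept.
cols = list(df.columns)
cols.remove("max")
new_order = ["max"] + cols

# Selecting with the full list reorders without dropping anything.
reordered = df[new_order]
print(list(reordered.columns))  # ['max', 'a', 'b', 'c']
```

The variable names cols, new_order, and reordered are just illustrative choices.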

Since the columns are just an Index, they can be converted to a list and manipulated. You can also use the reindex method to change the column order. Like selection, reindex returns a new DataFrame and leaves df unchanged unless you assign the result back. Note that you don’t want to just assign the sorted names to df.columns: this won’t move the columns, it will rename them!

>>> df.reindex(columns=sorted(df.columns))
          a         b         d         e       max
0  0.067423  0.058920  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.631442  0.907743  0.907743
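To make the difference between moving and renaming concrete, here is a small sketch on a hypothetical frame with fixed values. It compares reindex with assigning sorted names directly to the columns attribute:

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only.
df = pd.DataFrame({"d": [1, 2], "a": [3, 4], "b": [5, 6]})

# reindex moves each column together with its data.
moved = df.reindex(columns=sorted(df.columns))

# Assigning to .columns only relabels the columns where they already sit.
renamed = df.copy()
renamed.columns = sorted(renamed.columns)

print(moved["a"].tolist())    # [3, 4] (still the original 'a' data)
print(renamed["a"].tolist())  # [1, 2] (the original 'd' data under a new name)
```

Both results show the labels a, b, d in the same order, but only the reindexed frame still has the right data under each label.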

Also, when you are first creating a column, you can use the insert method to place it at the position where you want it to appear. By default, adding a column using the [] operator will put it at the end. Unlike selection and reindex, insert modifies the DataFrame in place.

>>> df.insert(3, "min", df.min(axis=1))
>>> df
          d         a         b       min       max         e
0  0.440547  0.067423  0.058920  0.058920  0.999309  0.572163
1  0.764242  0.384196  0.732857  0.096347  0.764242  0.096347
2  0.903363  0.900311  0.662776  0.349328  0.903363  0.349328
3  0.106388  0.988267  0.852733  0.106388  0.988267  0.864908
4  0.631442  0.830644  0.647775  0.631442  0.907743  0.907743
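As a side-by-side sketch on a hypothetical frame, the snippet below contrasts [] assignment with insert. It also shows that insert refuses a label that already exists (unless allow_duplicates=True is passed), which is why an existing column cannot be moved with insert alone:

```python
import pandas as pd

# Hypothetical frame with fixed values, for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

df["total"] = df["a"] + df["b"]          # [] assignment appends at the end
df.insert(0, "diff", df["b"] - df["a"])  # insert places the column at position 0

print(list(df.columns))  # ['diff', 'a', 'b', 'total']

# Inserting a label that is already present raises ValueError by default.
duplicate_rejected = False
try:
    df.insert(1, "diff", 0)
except ValueError:
    duplicate_rejected = True
```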

Finally, you can pop the column, then re-insert it. Popping a column removes it and returns it, as you’d expect.

>>> col_e = df.pop("e")
>>> df.insert(3, "e", col_e)
>>>
>>> df
          d         a         b         e       min       max
0  0.440547  0.067423  0.058920  0.572163  0.058920  0.999309
1  0.764242  0.384196  0.732857  0.096347  0.096347  0.764242
2  0.903363  0.900311  0.662776  0.349328  0.349328  0.903363
3  0.106388  0.988267  0.852733  0.864908  0.106388  0.988267
4  0.631442  0.830644  0.647775  0.907743  0.631442  0.907743
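If you do this often, the two steps can be wrapped in a small helper. This is only a sketch: move_column is a hypothetical name, not a pandas function, and it modifies the frame in place just as pop and insert do.

```python
import pandas as pd


def move_column(df, name, position):
    """Move column `name` to integer `position`, modifying df in place.

    Hypothetical helper, not part of pandas. `position` is interpreted
    after the column has been removed.
    """
    col = df.pop(name)              # removes the column and returns it as a Series
    df.insert(position, name, col)  # puts it back at the requested position
    return df


# Hypothetical frame with fixed values, for illustration only.
df = pd.DataFrame({"d": [1], "a": [2], "b": [3], "e": [4]})
move_column(df, "e", 0)
print(list(df.columns))  # ['e', 'd', 'a', 'b']
```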

So as you can see, there are a number of ways to change the column order of a DataFrame: selecting columns in the order you want, reindex, insert, and pop followed by insert.

