Basic Pandas: Moving a DataFrame column

Let’s start with a basic DataFrame with a few columns.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.rand(5,5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df['max'] = df.max(axis=1)
>>>
>>> df
          a         b         c         d         e       max
0  0.067423  0.058920  0.999309  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.138881  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.223959  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.913800  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.596375  0.631442  0.907743  0.907743

First, let’s just review the basics. Without moving or dropping columns, we can view any column we want in any order by just selecting them.

>>> df['max']
0    0.999309
1    0.764242
2    0.903363
3    0.988267
4    0.907743
Name: max, dtype: float64

We can also select any set of columns, in any order, and even include the same column more than once.

>>> df[['d', 'a', 'max', 'b', 'd']]
          d         a       max         b         d
0  0.440547  0.067423  0.999309  0.058920  0.440547
1  0.764242  0.384196  0.764242  0.732857  0.764242
2  0.903363  0.900311  0.903363  0.662776  0.903363
3  0.106388  0.988267  0.988267  0.852733  0.106388
4  0.631442  0.830644  0.907743  0.647775  0.631442

Assigning the selection back to our variable makes the reordering permanent. Note that any column left out of the list, such as c here, is dropped.

>>> df = df[['d', 'a', 'b', 'max', 'e']]
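
As a minimal sketch of the point above, the snippet below uses a small made-up DataFrame (the column names and values are assumptions for illustration only) to show that selecting with a list returns a new DataFrame, that the original keeps its order until you assign back, and that omitted columns are dropped.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the behavior.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Selecting with a list of names returns a new DataFrame in that order.
# "b" is not in the list, so it is absent from the result.
selected = df[["c", "a"]]

# The original is untouched until we assign the result back.
original_order = list(df.columns)

df = selected
new_order = list(df.columns)

print(original_order)
print(new_order)
```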

Since the columns are just an Index, you can convert them to a list, manipulate the list, and then pass it to the reindex method to change the column order. Like selection, reindex returns a new DataFrame rather than modifying df. Note that you don’t want to just assign the sorted names to df.columns: this won’t move the columns, it will rename them!

>>> df.reindex(columns=sorted(df.columns))
          a         b         d         e       max
0  0.067423  0.058920  0.440547  0.572163  0.999309
1  0.384196  0.732857  0.764242  0.096347  0.764242
2  0.900311  0.662776  0.903363  0.349328  0.903363
3  0.988267  0.852733  0.106388  0.864908  0.988267
4  0.830644  0.647775  0.631442  0.907743  0.907743
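
To make the renaming pitfall concrete, here is a small sketch under assumed data (the frame below is hypothetical, not the one from this post). It moves a column by editing a list of names and passing it to reindex, then shows what happens if the sorted names are assigned to the columns attribute instead.

```python
import pandas as pd

# Hypothetical frame; names and values are made up for illustration.
df = pd.DataFrame({"b": [1, 2], "a": [3, 4], "c": [5, 6]})

# Manipulate the names as a plain list: move "c" to the front.
names = list(df.columns)
names.remove("c")
names.insert(0, "c")
moved = df.reindex(columns=names)

# Pitfall: assigning to .columns relabels the columns by position.
# The data stays where it was, so the labels no longer match it.
renamed = df.copy()
renamed.columns = sorted(renamed.columns)

print(moved.columns.tolist())
print(df["a"].tolist())        # the real "a" data
print(renamed["a"].tolist())   # the data that used to be "b"
```

In the renamed frame, the column labeled "a" holds the values that originally belonged to "b", which is why reindex (or selection) is the safer way to reorder.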

Also, when you are first creating a column, you can use the insert method to place it at the position where you want it to appear. By default, adding a column using the [] operator will put it at the end.

>>> df.insert(3, "min", df.min(axis=1))
>>> df
          d         a         b       min       max         e
0  0.440547  0.067423  0.058920  0.058920  0.999309  0.572163
1  0.764242  0.384196  0.732857  0.096347  0.764242  0.096347
2  0.903363  0.900311  0.662776  0.349328  0.903363  0.349328
3  0.106388  0.988267  0.852733  0.106388  0.988267  0.864908
4  0.631442  0.830644  0.647775  0.631442  0.907743  0.907743
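
The following sketch contrasts the two ways of adding a column, again on a small hypothetical frame. It also illustrates two details of insert worth knowing: it modifies the DataFrame in place and returns None, and by default it raises a ValueError if the column name already exists.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The [] operator appends the new column at the end.
df["total"] = df["a"] + df["b"]

# insert places the new column at the given integer position.
# It works in place, so the return value is None.
result = df.insert(0, "diff", df["b"] - df["a"])

# Inserting a name that already exists fails unless duplicates are allowed.
try:
    df.insert(1, "total", 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df.columns.tolist())
print(result, duplicate_rejected)
```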

Finally, you can pop the column, then re-insert it. Popping a column removes it and returns it, as you’d expect.

>>> col_e = df.pop("e")
>>> df.insert(3, "e", col_e)
>>>
>>> df
          d         a         b         e       min       max
0  0.440547  0.067423  0.058920  0.572163  0.058920  0.999309
1  0.764242  0.384196  0.732857  0.096347  0.096347  0.764242
2  0.903363  0.900311  0.662776  0.349328  0.349328  0.903363
3  0.106388  0.988267  0.852733  0.864908  0.106388  0.988267
4  0.631442  0.830644  0.647775  0.907743  0.631442  0.907743
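
If you move columns often, the pop-and-insert pattern can be wrapped in a small helper. The function below is a hypothetical convenience, not part of pandas; move_column and its parameters are names chosen for this sketch. It works on a copy so the caller's DataFrame is left alone.

```python
import pandas as pd

def move_column(df, name, position):
    """Return a copy of df with column `name` moved to integer `position`.

    Hypothetical helper, not a pandas API. `position` is interpreted
    after the column has been removed.
    """
    out = df.copy()
    col = out.pop(name)              # removes the column and returns it
    out.insert(position, name, col)  # re-inserts it in place on the copy
    return out

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
moved = move_column(df, "c", 0)

print(moved.columns.tolist())
print(df.columns.tolist())
```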

As you can see, there are a number of ways to change the column order of a DataFrame.
