Removing one or more columns from a pandas DataFrame is a pretty common task, but it turns out there are a number of ways to do it. I found that this StackOverflow question, along with the solutions and discussion in it, raised a number of interesting topics. It is worth digging in a little bit to the … Continue reading How to remove a column from a DataFrame, with some extra detail
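As a rough illustration of how many ways there are, here is a minimal sketch of three common ones. The DataFrame and its column names are made up for the example, and this is not necessarily the set of approaches the full post covers.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop returns a new DataFrame and leaves df untouched
without_b = df.drop(columns=["b"])

# del removes the column in place
df2 = df.copy()
del df2["c"]

# pop also removes in place, and returns the removed column as a Series
df3 = df.copy()
popped = df3.pop("a")

print(list(without_b.columns))
print(list(df2.columns))
print(list(df3.columns))
```

Whether you want a new object back or an in-place change is usually what decides between these.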
Once we have debugged, working, readable (and hopefully testable) code, it may become important to examine it more closely and try to improve the code's performance. Before we can make any progress in determining if our changes are an improvement, we need to measure the current performance and see where it is spending its time. … Continue reading Profiling Python code with line_profiler
We would love for our Python programs to run as fast as possible, but speeding things up requires gathering information about the current state of our code and knowing techniques to make it faster. First and foremost, we need to know where our program is spending its time, and what is … Continue reading Profiling Python with cProfile, and a speedup tip
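To give a flavor of what finding out where the time goes can look like, here is a minimal sketch using the standard library's cProfile and pstats modules. The function `slow_sum` is a made-up stand-in for real code, not something from the post.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares in a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect profiling data only around the call we care about
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Write the report into a string, sorted by cumulative time
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and time, which is usually enough to spot the first thing worth optimizing.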
If you've done any work in pandas, you've surely seen the SettingWithCopyWarning. This is an explanation of what's happening and how to fix it.
The query method in pandas DataFrames provides some flexibility in code, and potential speedups using numexpr.
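A minimal sketch of what that looks like, with invented column names and data. The `@` prefix refers to a local Python variable; numexpr is only used if it is installed, so no speedup is implied for a tiny example like this.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"price": [5, 15, 25], "qty": [1, 2, 3]})
threshold = 10

# The expression is a string; @threshold pulls in the local variable
expensive = df.query("price > @threshold and qty < 3")

# Equivalent selection with ordinary boolean indexing
same = df[(df["price"] > threshold) & (df["qty"] < 3)]

print(expensive)
```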
This is the fifth post in a series on indexing and selecting in pandas. If you are jumping in the middle and want to get caught up, here's what has been discussed so far: Basic indexing, selecting by label and location; Slicing in pandas; Selecting by boolean indexing; Selecting by callable. Once the basics were covered in the … Continue reading Selecting in Pandas using where and mask
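For a quick taste of the two methods named in the title, here is a small sketch on a made-up Series. Both keep the original shape: where keeps values where the condition is True, and mask hides them.

```python
import pandas as pd

# Hypothetical example data
s = pd.Series([1, 2, 3, 4])

# where: keep values where the condition holds, NaN elsewhere
kept = s.where(s > 2)

# mask: the inverse, NaN where the condition holds
hidden = s.mask(s > 2)

# An optional second argument supplies the replacement value instead of NaN
filled = s.where(s > 2, 0)

print(kept)
print(hidden)
print(filled)
```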
In pandas, you can use callables where indexers are accepted. It turns out that can be handy for a pretty common use case.
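One place this tends to help is method chaining, where the intermediate DataFrame has no name to refer to. The sketch below uses invented data and columns, and may not be the exact use case the post has in mind.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The lambda passed to .loc receives the DataFrame as it exists at that
# point in the chain, so it can filter on the freshly assigned column
result = (
    df.assign(total=lambda d: d["a"] + d["b"])
      .loc[lambda d: d["total"] > 11]
)

print(result)
```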
This is the third post in the series on indexing and selecting data in pandas. If you haven't read the others yet, see the first post, which covers the basics of selecting based on index or relative numerical indexing, and the second post, which talks about slicing. In this post, I'm going to talk about boolean … Continue reading Boolean Indexing in Pandas
Slicing data in pandas. This is the second post in the series on indexing and selecting data in pandas. If you haven't read it yet, see the first post, which covers the basics of selecting based on index or relative numerical indexing. In this post, I'm going to review slicing, which is a core Python topic, but has … Continue reading Indexing and Selecting in Pandas – slicing
The topic of indexing and selecting data in pandas is core to using pandas, but it can be quite confusing. One reason is that over the years pandas has grown organically based on user requests, so there are multiple ways to select data out of a pandas DataFrame or Series. Reading through the documentation can be … Continue reading Indexing and Selecting in Pandas (part 1)