In pandas, you can use callables where indexers are accepted. It turns out that can be handy for a pretty common use case.
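As a minimal sketch of what a callable indexer looks like (the DataFrame and its column names `a`, `b`, and `c` are made up for illustration, and method chaining is only assumed to be the kind of use case meant here):

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# A callable passed to .loc is called with the DataFrame itself
# and must return a valid indexer (here, a boolean Series).
result = df.loc[lambda d: d["a"] > 2]

# In a method chain, the callable sees the intermediate result,
# so it can filter on a column that was only just created.
chained = (
    df.assign(c=lambda d: d["a"] * d["b"])
      .loc[lambda d: d["c"] > 50, ["a", "c"]]
)
```

Without the callable, the second selection would need the intermediate DataFrame stored in a temporary variable, because column `c` does not exist on `df`.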
Category: Python
Boolean Indexing in Pandas
This is the third post in the series on indexing and selecting data in pandas. If you haven't read the others yet, see the first post, which covers the basics of selecting based on index or relative numerical indexing, and the second post, which talks about slicing. In this post, I'm going to talk about boolean … Continue reading Boolean Indexing in Pandas
Indexing and Selecting in Pandas – slicing
Slicing data in pandas. This is the second post in the series on indexing and selecting data in pandas. If you haven't read it yet, see the first post, which covers the basics of selecting based on index or relative numerical indexing. In this post, I'm going to review slicing, which is a core Python topic, but has … Continue reading Indexing and Selecting in Pandas – slicing
Indexing and Selecting in Pandas (part 1)
Indexing and selecting data is core to using pandas, but it can be quite confusing. One reason is that pandas has grown organically over the years based on user requests, so there are multiple ways to select data out of a pandas DataFrame or Series. Reading through the documentation can be … Continue reading Indexing and Selecting in Pandas (part 1)
Connecting to your notebook kernel using Jupyter console
Jupyter notebooks are a great way to explore data using Python (and other languages as well). Having a visual representation of your code and output, along with documentation and formatting in one view can be extremely helpful. However, there are some things that are just much better to do in a console session. In this … Continue reading Connecting to your notebook kernel using Jupyter console
Overview of I/O tools in Pandas
Pandas has a lot of functionality, but before you can explore or use it, you'll most likely want to access some data from an external source. You'll also likely want to store results for use later or be able to export results to other tools or to share with others. Pandas has a lot of … Continue reading Overview of I/O tools in Pandas
Removing duplicate data in Pandas
When dealing with time series data, it is very common to end up with duplicate data. This can happen for a variety of reasons, and I've encountered it more than once and tried different approaches to eliminate the duplicate values. There's a gem of a solution on Stack Overflow and I thought … Continue reading Removing duplicate data in Pandas
Converting types in Pandas
Pandas is great for dealing with both numerical and text data. In most projects you'll need to clean up and verify your data before analysing or using it for anything useful. Data might be delivered in databases, CSV or other data file formats, web scraping results, or even entered manually. Once you have loaded … Continue reading Converting types in Pandas
Use pyenv and virtual environments to manage Python complexity
In my earlier post, I wrote about how pyenv is a great tool for running multiple versions of Python on the same host. It makes it simple to install multiple versions of Python on your workstation or server and control which version executes in a shell. But as a Python developer, the Python version is … Continue reading Use pyenv and virtual environments to manage Python complexity
You can easily and sensibly run multiple versions of Python with pyenv
Python 3.9 just came out recently, and I thought it would make sense to check out some of the new features (dict union operators, string remove prefix and suffix, etc.). Of course, doing this requires a Python 3.9 environment. Since new versions of Python may break existing code, I don't want to update my entire … Continue reading You can easily and sensibly run multiple versions of Python with pyenv