It is very common when dealing with time series data to end up with duplicate data. This can happen for a variety of reasons, and I've encountered it more than once and tried different approaches to eliminate the duplicate values. There's a gem of a solution on Stack Overflow and I thought … Continue reading Removing duplicate data in Pandas
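As a rough illustration of the kind of clean-up described here, the sketch below drops rows whose timestamp repeats in a time-indexed DataFrame. It shows one common pandas idiom and is not necessarily the Stack Overflow solution the post refers to; the sample data and the names `readings` and `deduplicated` are made up for the example.

```python
import pandas as pd

# Hypothetical sample: readings indexed by timestamp, with one timestamp repeated.
index = pd.to_datetime([
    "2020-11-01 00:00",
    "2020-11-01 01:00",
    "2020-11-01 01:00",  # duplicate timestamp
    "2020-11-01 02:00",
])
readings = pd.DataFrame({"value": [1.0, 2.0, 2.5, 3.0]}, index=index)

# Index.duplicated flags every repeat of an index label after its first
# occurrence; inverting the mask keeps only the first row per timestamp.
deduplicated = readings[~readings.index.duplicated(keep="first")]
```

Passing `keep="last"` instead would retain the most recent row for each timestamp, which may suit data where later readings correct earlier ones.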
Month: November 2020
Converting types in Pandas
Pandas is great for dealing with both numerical and text data. In most projects you'll need to clean up and verify your data before analysing or using it for anything useful. Data might be delivered in databases, CSV files or other data file formats, web scraping results, or even manually entered. Once you have loaded … Continue reading Converting types in Pandas
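To give a flavour of what type conversion can look like after loading, here is a minimal sketch that assumes every column arrived as text. The column names and values are invented for the example, and the full post may take a different route.

```python
import pandas as pd

# Hypothetical raw data where every column was loaded as strings.
raw = pd.DataFrame({
    "price": ["10.5", "7", "n/a"],
    "quantity": ["1", "2", "3"],
    "sold_on": ["2020-11-01", "2020-11-02", "2020-11-03"],
})

typed = raw.assign(
    # errors="coerce" turns unparseable entries such as "n/a" into NaN.
    price=pd.to_numeric(raw["price"], errors="coerce"),
    # astype raises if any value cannot be converted, which suits clean columns.
    quantity=raw["quantity"].astype("int64"),
    # Parse ISO-formatted date strings into datetime64 values.
    sold_on=pd.to_datetime(raw["sold_on"]),
)
```

Checking `typed.dtypes` afterwards is a quick way to confirm each column ended up with the intended type.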
Use pyenv and virtual environments to manage Python complexity
In my earlier post, I wrote about how pyenv is a great tool for running multiple versions of Python on the same host. It makes it simple to install multiple versions of Python on your workstation or server and control which version executes in a shell. But as a Python developer, the Python version is … Continue reading Use pyenv and virtual environments to manage Python complexity