Use pandas DateOffsets for easy date manipulation

So much useful data has a date or time component. Often, data has a timestamp to represent when the data was acquired, or when an event will take place, or as an identifying attribute like an expiration date. For this reason, understanding how to work with dates and times effectively can be a very useful skill. One common need is to select dates (and times) using rules based on their offset from known times. This article will focus on some handy ways to use pandas DateOffsets for working with dates specifically.

Since my experience is in the areas of finance and trading, I’ll use some practical examples I’ve encountered over the years. But even if you don’t work in finance, the techniques should work for any data that has dates.

What is a DateOffset?

A DateOffset is just a special object that represents a way to shift a date to a new date. This turns out to be really useful.

The DateOffset class and a number of useful offset aliases are in the pd.offsets package (an alias to pandas.tseries.offsets).

Quick overview

Before we look at some ideas of how to use these DateOffsets, let’s just review how they work. This is all just a high level of what you’ll find in the documentation, so head there for more detail.

First, let’s just look at the DateOffset class itself, you can do quite a bit with it alone!

The DateOffset constructor takes a number of keyword arguments. Plural arguments will shift the date. Singular arguments replace the resulting date’s values. Use normalize to set the time to midnight. Note that DateOffset will respect timezones, unlike Timedelta, so if you cross a daylight savings boundary, it will make sure you aren’t off by an hour.

import pandas as pd

now = pd.Timestamp.now()
print("Add a day:", now + pd.offsets.DateOffset(days=1))
print("Add a week:", now + pd.offsets.DateOffset(weeks=1))
print("Add a month:", now + pd.offsets.DateOffset(months=1))
print("Add an hour:", now + pd.offsets.DateOffset(hours=1))
print("Add a day, replace the hour:", now + pd.offsets.DateOffset(days=1, hour=13))
print("Add a month, normalize:", now + pd.offsets.DateOffset(month=1, normalize=True))
print("Add 2 days across DST change:", pd.Timestamp("2022-11-05 00:00:00", tz="America/Chicago") + pd.offsets.DateOffset(days=2))
print("Add 2 days across DST change (with Timedelta, no adjustment):", pd.Timestamp("2022-11-05 00:00:00", tz="America/Chicago") + pd.Timedelta(days=2))
Add a day: 2022-09-26 14:20:30.243984
Add a week: 2022-10-02 14:20:30.243984
Add a month: 2022-10-25 14:20:30.243984
Add an hour: 2022-09-25 15:20:30.243984
Add a day, replace the hour: 2022-09-26 13:20:30.243984
Add a month, normalize: 2022-01-25 00:00:00
Add 2 days across DST change: 2022-11-07 00:00:00-06:00
Add 2 days across DST change (with Timedelta, no adjustment): 2022-11-06 23:00:00-06:00

Offset aliases

However, you don’t need to use the DateOffset class directly. Pandas has a ton of named offset aliases that do what you want for a number of common scenarios. You’ll find these to be extremely useful.

print("Next business day (or weekday):", now + pd.offsets.BDay(normalize=True))
print("Three business days (or weekday):", now + pd.offsets.BDay(3, normalize=True))
print("Next Easter:", now + pd.offsets.Easter(normalize=True))
Next business day (or weekday): 2022-09-26 00:00:00
Three business days (or weekday): 2022-09-28 00:00:00
Next Easter: 2023-04-09 00:00:00

You can also subtract offsets.

print("Beginning of month:", now - pd.offsets.MonthBegin(normalize=True))
print("Beginning of quarter:", now - pd.offsets.QuarterBegin(normalize=True))
print("Beginning of year:", now - pd.offsets.YearBegin(normalize=True))
Beginning of month: 2022-09-01 00:00:00
Beginning of quarter: 2022-09-01 00:00:00
Beginning of year: 2022-01-01 00:00:00

Full offset alias list

Pandas has a plethora of configured offset aliases. You can create them by constructing them as an object as shown above, or you can pass their code (listed in parentheses below) to other pandas methods that take offsets as a parameter, as you’ll see below. Here’s a list taken right from the documentation.

  • DateOffset Generic offset class, defaults to absolute 24 hours
  • BDay or BusinessDay, (B). business day or weekday
  • CDay or CustomBusinessDay, (C). custom business day
  • Week (W) one week, optionally anchored on a day of the week
  • WeekOfMonth (WOM) the x-th day of the y-th week of each month
  • LastWeekOfMonth (LWOM) the x-th day of the last week of each month
  • MonthEnd (M) calendar month end
  • MonthBegin (MS) calendar month begin
  • BMonthEnd or BusinessMonthEnd (BM) business month end
  • BMonthBegin or BusinessMonthBegin (BMS) business month begin
  • CBMonthEnd or CustomBusinessMonthEnd (CBM) custom business month end
  • CBMonthBegin or CustomBusinessMonthBegin (CBMS) custom business month begin
  • SemiMonthEnd (SM) 15th (or other day_of_month) and calendar month end
  • SemiMonthBegin (SMS) 15th (or other day_of_month) and calendar month begin
  • QuarterEnd (Q) calendar quarter end
  • QuarterBegin (QS) calendar quarter begin
  • BQuarterEnd (BQ) business quarter end
  • BQuarterBegin (BQS) business quarter begin
  • FY5253Quarter (REQ) retail (aka 52-53 week) quarter
  • YearEnd (A) calendar year end
  • YearBegin (AS) or (BYS) calendar year begin
  • BYearEnd (BA) business year end
  • BYearBegin (BAS) business year begin
  • FY5253 (RE) retail (aka 52-53 week) year
  • Easter Easter holiday
  • BusinessHour (BH) business hour
  • CustomBusinessHour (CBH) custom business hour
  • Day (D) one absolute day
  • Hour (H) one hour
  • Minute (T) or (min) one minute
  • Second (S) one second
  • Milli (L) or (ms) one millisecond
  • Micro (U) or (us) one microsecond
  • Nano (N) one nanosecond

A useful place to use the offset aliases is in pd.date_range. The code can be passed in as the freq argument along with numbers. Here’s a few examples.

print("Beginning of the quarter\n", pd.date_range(start='2022-01-01', freq='QS', periods=4))
print("Beginning of the month\n", pd.date_range(start='2022-01-01', freq='MS', periods=4))
print("Beginning of every 3rd month\n", pd.date_range(start='2022-01-01', freq='3MS', periods=4))
Beginning of the quarter
 DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='QS-JAN')
Beginning of the month
 DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'], dtype='datetime64[ns]', freq='MS')
Beginning of every 3rd month
 DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='3MS')

What is the alternative to pandas DateOffsets?

You’ve probably seen a lot of code that tries to do complex date logic using basic Python datetime times. This might make sense for trival cases, but you will quickly run into situations that cause that code to turn ugly. For example, if given a date you want to find the next Monday, you could write something like this:

Indexing in pandas can be so confusing

There are so many ways to do the same thing! What is the difference between .loc, .iloc, .ix, and []?  You can read the official documentation but there's so much of it and it seems so confusing. You can ask a question on Stack Overflow, but you're just as likely to get too many different and confusing answers as no answer at all. And existing answers don't fit your scenario.

You just need to get started with the basics.

What if you could quickly learn the basics of indexing and selecting data in pandas with clear examples and instructions on why and when you should use each one? What if the examples were all consistent, used realistic data, and included extra relevant background information?

Master the basics of pandas indexing with my free ebook. You'll learn what you need to get comfortable with pandas indexing. Covered topics include:

  • what an index is and why it is needed
  • how to select data in both a Series and DataFrame.
  • the difference between .loc, .iloc, .ix, and [] and when (and if) you should use them.
  • slicing, and how pandas slicing compares to regular Python slicing
  • boolean indexing
  • selecting via callable
  • how to use where and mask.
  • how to use query, and how it can help performance
  • time series indexing

Because it's highly focused, you'll learn the basics of indexing and be able to fall back on this knowledge time and again as you use other features in pandas.

Just give me your email and you'll get the free 57 page e-book, along with helpful articles about Python, pandas, and related technologies once or twice a month. Unsubscribe at any time.

Invalid email address
import datetime

today = datetime.date.today()

while today.weekday() != 0: # Monday
    today += datetime.timedelta(days=1)

today
datetime.date(2022, 9, 26)

Compare the above to

(pd.Timestamp.today() + pd.offsets.Week(1, weekday=0)).date()
datetime.date(2022, 9, 26)

Plus, it’s easy to use these offsets on pandas Series and DataFrames.

s = pd.Series(pd.date_range('2022-01-01', periods=5))
s + pd.offsets.Week(1, weekday=0)
0   2022-01-03
1   2022-01-03
2   2022-01-10
3   2022-01-10
4   2022-01-10
dtype: datetime64[ns]

More complicated scenarios

One way I’ve found offsets to be useful is to select data for certain events. For example, a very important report for the US financial markets is made available every month from the U.S. Bureau of Labor Statistics. It’s called the Employment Situation, with the “Non-Farm payrolls” number in that report being one of the most closely watched pieces of data by traders. Their schedule is [listed on their website]. It generally follows the schedule of the first Friday of the month. We can generate this pretty easily using pandas. There are a couple of ways we could do this, but here’s one technique. We can make a date index using date_range, and pass in the MonthBegin as the freq, using the code from the list above.

dates = pd.date_range('2022-01-01', '2022-12-31', freq='MS')
dates
DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01',
               '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01',
               '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'],
              dtype='datetime64[ns]', freq='MS')

Now, given the first day of the month, can we get the first Monday of the month? One way to do this is to back up 1 day (in case the first day is a Monday itself), then move forward one week, but setting the weekday to Friday.

dates - pd.offsets.Day(1) + pd.offsets.Week(1, weekday=4)
DatetimeIndex(['2022-01-07', '2022-02-04', '2022-03-04', '2022-04-01',
               '2022-05-06', '2022-06-03', '2022-07-01', '2022-08-05',
               '2022-09-02', '2022-10-07', '2022-11-04', '2022-12-02'],
              dtype='datetime64[ns]', freq=None)

But, it turns out you can pass in a 0 as the week move, and in that case it will not shift if the start date is the same as the anchor point. (I hadn’t realized that until I was writing this up so used to do it the first way. The pandas docs are full of great information, you should read them!).

dates + pd.offsets.Week(0, weekday=4)
DatetimeIndex(['2022-01-07', '2022-02-04', '2022-03-04', '2022-04-01',
               '2022-05-06', '2022-06-03', '2022-07-01', '2022-08-05',
               '2022-09-02', '2022-10-07', '2022-11-04', '2022-12-02'],
              dtype='datetime64[ns]', freq=None)

Now if I compare the values above with this year’s dates listed at the BLS site, I see that the July data was released on July 8th, not July 1st. This is related to the U.S. Independence Day holiday the following Monday. This is a good reminder to never completely trust your understanding of the data! The BLS can choose to move things around if they want to, so having a reliable reference source for events is probably required if you are depending on this data.

Holidays

What about dealing with holidays? If we look at the 2021 schedule, we can see that the first Friday in January falls on New Year’s Day. It turns out that adding holidays is not that hard with pandas. If all you want to do is select the next business day, you can just use the calendar with a CustomBusinessDay offset, with a value of 0 that means we should only move forward if the date is a holiday.

dates_2021 = pd.date_range('2021-01-01', '2021-12-31', freq='MS')
dates_2021
DatetimeIndex(['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01',
               '2021-05-01', '2021-06-01', '2021-07-01', '2021-08-01',
               '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'],
              dtype='datetime64[ns]', freq='MS')
from pandas.tseries.holiday import USFederalHolidayCalendar

bday_us = pd.offsets.CustomBusinessDay(0, calendar=USFederalHolidayCalendar())
dates_2021 + bday_us
/Users/mcw/.pyenv/versions/pandas/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:760: PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex.
  warnings.warn(
DatetimeIndex(['2021-01-04', '2021-02-01', '2021-03-01', '2021-04-01',
               '2021-05-03', '2021-06-01', '2021-07-01', '2021-08-02',
               '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'],
              dtype='datetime64[ns]', freq=None)

Note that we get a warning that the offset is non-vectorized. This means that if you want to use this technique on an extremely large dataset, this will be quite slow (as of the time of writing with pandas 1.4.3). For this reason, for larger data sets you may want to create this index once and use it multiple times with your data.

Now, note that above we used a holiday calendar from pandas. But the holidays on the web site were slightly different – the BLS listed Inauguration Day as a holiday as well. We can make a custom holiday calendar ourselves.

bls_holidays = [
    "2021-01-01",
    "2021-01-18",
    "2021-01-20",
    "2021-02-15",
    "2021-05-31",
    "2021-07-05",
    "2021-09-06",
    "2021-10-11",
    "2021-11-11",
    "2021-11-25",
    "2021-12-24",
    "2021-12-31",
]
bday_bls = pd.offsets.CustomBusinessDay(0, holidays=bls_holidays)
dates_2021_bls = dates_2021 + bday_bls
dates_2021_bls
/Users/mcw/.pyenv/versions/pandas/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py:760: PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex.
  warnings.warn(
DatetimeIndex(['2021-01-04', '2021-02-01', '2021-03-01', '2021-04-01',
               '2021-05-03', '2021-06-01', '2021-07-01', '2021-08-02',
               '2021-09-01', '2021-10-01', '2021-11-01', '2021-12-01'],
              dtype='datetime64[ns]', freq=None)

Now, if you had a Series or DataFrame of data, say returns for a financial instrument for every day of the year, you could use this index to pick out the ones from the dates in question using pandas indexing. If you want to know more about indexing time series data in pandas, you can check out this article. Here’s an example:

# make some fake data, one value per day of the year
df = pd.DataFrame(np.random.rand(365), index=pd.date_range('2021-01-01', '2021-12-31'))
df.loc[dates_2021_bls]
                0
    2021-01-04  0.151260
    2021-02-01  0.201709
    2021-03-01  0.921957
    2021-04-01  0.072389
    2021-05-03  0.821674
    2021-06-01  0.561620
    2021-07-01  0.926453
    2021-08-02  0.055801
    2021-09-01  0.768521
    2021-10-01  0.294276
    2021-11-01  0.651574
    2021-12-01  0.099297

In summary, you can use pandas DateOffsets to shift dates easily. This can be a huge timesaver when you need to select data using complex (and not so complex) criteria. How will you use them in your next data investigation?

Have anything to say about this topic?