Incorrect dt.weekofyear calculation?
I was expecting weekofyear/week for 20050101 to be 0.
In [1442]: df
Out[1442]:
A B
0 1 20050121
1 2 20050111
2 3 20050205
3 4 20050101
In [1443]: pd.to_datetime(df.B, format='%Y%m%d').dt.week
Out[1443]:
0 3
1 2
2 5
3 53
Name: B, dtype: int64
In [1444]: pd.to_datetime(df.B, format='%Y%m%d').dt.weekofyear
Out[1444]:
0 3
1 2
2 5
3 53
Name: B, dtype: int64
In [1445]: pd.to_datetime(df.B, format='%Y%m%d').dt.strftime('%W')
Out[1445]:
0 03
1 02
2 05
3 00
Name: B, dtype: object
Comment From: jschendel
If my understanding is correct, the current behavior for dt.weekofyear is consistent with ISO-8601. Python's strftime behavior for %W explicitly breaks ISO-8601 by forcing days in the new year before the first Monday to be week zero.
As you've noted, this can lead to inconsistencies between the two, especially around the beginning/end of years. For some years it can even cause a consistent off-by-one numbering:
In [1]: import pandas as pd
...: df = pd.DataFrame({'full_dt': ['20031229', '20040601', '20041231', '20050102']})
...:
In [2]: df['full_dt'] = pd.to_datetime(df['full_dt'])
...: df['dt.weekofyear'] = df['full_dt'].dt.weekofyear
...: df['strftime_%W'] = df['full_dt'].dt.strftime('%W')
...: df
...:
Out[2]:
full_dt dt.weekofyear strftime_%W
0 2003-12-29 1 52
1 2004-06-01 23 22
2 2004-12-31 53 52
3 2005-01-02 53 00
Even though these are inconsistent, I believe they're both technically correct based on their respective specifications.
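The two conventions can also be compared without pandas. The following is a minimal sketch using only the standard library; `week_numbers` is a hypothetical helper name introduced here for illustration, not part of pandas or Python. It pairs the ISO-8601 week from `date.isocalendar()` with the Monday-based `%W` week for the dates discussed in this thread. In pandas 1.1 and later, `Series.dt.isocalendar().week` should give the same ISO number as `dt.weekofyear` does above.

```python
from datetime import date


def week_numbers(d):
    """Return (ISO-8601 week, strftime %W week) for a date.

    Hypothetical helper for illustration only.
    """
    # ISO-8601: week 1 is the week containing the year's first Thursday,
    # so early-January days can belong to week 52/53 of the previous year.
    iso_week = d.isocalendar()[1]
    # %W: weeks start on Monday; days before the first Monday are week 0.
    percent_w = int(d.strftime("%W"))
    return iso_week, percent_w


for d in (
    date(2003, 12, 29),
    date(2004, 6, 1),
    date(2004, 12, 31),
    date(2005, 1, 1),
    date(2005, 1, 2),
):
    iso_week, percent_w = week_numbers(d)
    print(d.isoformat(), iso_week, percent_w)
```

For 2005-01-01 this yields ISO week 53 and `%W` week 0, matching the outputs in the original report. If a zero-based week at the start of the year is what is wanted, the `strftime('%W')` result converted to an integer is one option.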
Comment From: jreback
See https://github.com/pandas-dev/pandas/issues/6936 as well.
Closing as a duplicate.