Incorrect dt.weekofyear calculation? I expected weekofyear/week for 20050101 to be 0, but it is 53

In [1442]: df
Out[1442]:
   A         B
0  1  20050121
1  2  20050111
2  3  20050205
3  4  20050101

In [1443]: pd.to_datetime(df.B, format='%Y%m%d').dt.week
Out[1443]:
0     3
1     2
2     5
3    53
Name: B, dtype: int64

In [1444]: pd.to_datetime(df.B, format='%Y%m%d').dt.weekofyear
Out[1444]:
0     3
1     2
2     5
3    53
Name: B, dtype: int64

In [1445]: pd.to_datetime(df.B, format='%Y%m%d').dt.strftime('%W')
Out[1445]:
0    03
1    02
2    05
3    00
Name: B, dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0.dev+397.g234b042
pytest: 3.2.0
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.24.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: jschendel

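If my understanding is correct, the current behavior of dt.weekofyear is consistent with ISO-8601, where week 1 is the week containing the year's first Thursday, so 2005-01-01 (a Saturday) still belongs to week 53 of 2004. Python's strftime %W does not follow ISO-8601: it treats Monday as the first day of the week and assigns all days in the new year before the first Monday to week zero.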
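
As a rough illustration using only the standard library (not pandas itself), the two conventions can be compared side by side. The helper name `week_numbers` is hypothetical and introduced only for this sketch; `date.isocalendar()` and `strftime('%W')` are the standard calls.

```python
from datetime import date

def week_numbers(d):
    """Return (iso_year, iso_week, strftime_W) for a date (hypothetical helper)."""
    # isocalendar() yields (ISO year, ISO week, ISO weekday).
    iso_year, iso_week, _ = d.isocalendar()
    # %W counts weeks from the first Monday; earlier days fall in week "00".
    return iso_year, iso_week, d.strftime('%W')

# The four dates from the report above.
for d in (date(2005, 1, 21), date(2005, 1, 11), date(2005, 2, 5), date(2005, 1, 1)):
    print(d, week_numbers(d))
```

For 2005-01-01 the ISO year is 2004, which is why the ISO week number is 53 rather than 0 or 1.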

As you've noted, this can lead to inconsistencies between the two, especially around the beginning/end of years. For some years, such as 2004 in the example here, the two numberings differ by one throughout the entire year:

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'full_dt': ['20031229', '20040601', '20041231', '20050102']})
   ...:

In [2]: df['full_dt'] = pd.to_datetime(df['full_dt'])
   ...: df['dt.weekofyear'] = df['full_dt'].dt.weekofyear
   ...: df['strftime_%W'] = df['full_dt'].dt.strftime('%W')
   ...: df
   ...:
Out[2]:
     full_dt  dt.weekofyear strftime_%W
0 2003-12-29              1          52
1 2004-06-01             23          22
2 2004-12-31             53          52
3 2005-01-02             53          00

Even though these are inconsistent, I believe they're both technically correct based on their respective specifications.
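
To make the two specifications concrete, here is a minimal sketch that computes each week number by hand from its rule, assuming the usual definitions (ISO 8601: a week belongs to the year containing its Thursday; %W: weeks start on Monday and days before the first Monday are week 0). The function names `iso_week` and `monday_week` are hypothetical, and this is not how pandas implements it internally.

```python
from datetime import date, timedelta

def iso_week(d):
    """ISO 8601 week number, derived from the Thursday of d's week."""
    # weekday() is 0 for Monday, so 3 - weekday() shifts to that week's Thursday.
    thursday = d + timedelta(days=3 - d.weekday())
    # The Thursday's 1-based day of year determines the week number.
    return (thursday.timetuple().tm_yday - 1) // 7 + 1

def monday_week(d):
    """Week number under the %W rule (Monday-based, week 0 before first Monday)."""
    return (d.timetuple().tm_yday + 6 - d.weekday()) // 7

# The four dates from the table above.
for d in (date(2003, 12, 29), date(2004, 6, 1), date(2004, 12, 31), date(2005, 1, 2)):
    print(d, iso_week(d), monday_week(d))
```

Both functions should agree with `date.isocalendar()` and `strftime('%W')` respectively, which is one way to sanity-check the rules.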

Comment From: jreback

see https://github.com/pandas-dev/pandas/issues/6936 as well.

closing as a duplicate