xref #3087 xref #12537

I have a DataFrame like this (the volume column is shown):

df

2011-01-07 14:51:00     4288
2011-01-07 14:52:00     4262
2011-01-07 14:53:00     2586
2011-01-07 14:54:00    10738
2011-01-07 14:55:00     4448
2011-01-07 14:56:00    10565
2011-01-07 14:57:00     6539
2011-01-07 14:58:00    25077
2011-01-07 14:59:00     6163
2011-01-07 15:00:00     9644
2011-01-10 09:31:00    23382
2011-01-10 09:32:00    10298
2011-01-10 09:33:00    13689
2011-01-10 09:34:00     9793
2011-01-10 09:35:00     9870
2011-01-10 09:36:00     4249
2011-01-10 09:37:00     5298
2011-01-10 09:38:00     3752
2011-01-10 09:39:00     4087
2011-01-10 09:40:00    10728
Name: volume, dtype: object

df.resample('D').sum()

2011-01-07    84310
2011-01-08        0
2011-01-09        0
2011-01-10    95146
Freq: D, Name: volume, dtype: int64

Problem description

When I execute the code above, it produces 0 instead of NaN for the days where there are no samples.

If I try another data type (float), it produces NaN as expected, so I think it is related to the data type.
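For reference, a series matching the output above can be reconstructed like this (values copied from the output shown; the object dtype is the relevant part); a minimal sketch:

import pandas as pd

idx = pd.date_range('2011-01-07 14:51', periods=10, freq='T').append(
    pd.date_range('2011-01-10 09:31', periods=10, freq='T'))
vals = [4288, 4262, 2586, 10738, 4448, 10565, 6539, 25077, 6163, 9644,
        23382, 10298, 13689, 9793, 9870, 4249, 5298, 3752, 4087, 10728]

# object dtype, as in the report above
df = pd.Series(vals, index=idx, name='volume').astype(object)

df.resample('D').sum()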

Expected Output

2011-01-07    84310
2011-01-08      NaN
2011-01-09      NaN
2011-01-10    95146
Freq: D, Name: volume, dtype: float64

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

Please show an actually reproducible example, with code.

Comment From: dsm054

I'm pretty sure the OP is referring to this effect:

In [26]: s = pd.Series(1, index=pd.date_range("2017-01-05", periods=4, freq="B"))

In [27]: s.resample("D").sum()
Out[27]: 
2017-01-05    1.0
2017-01-06    1.0
2017-01-07    NaN
2017-01-08    NaN
2017-01-09    1.0
2017-01-10    1.0
Freq: D, dtype: float64

In [28]: s.astype(object).resample("D").sum()
Out[28]: 
2017-01-05    1
2017-01-06    1
2017-01-07    0
2017-01-08    0
2017-01-09    1
2017-01-10    1
Freq: D, dtype: int64
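For reference, the NaN behaviour can be recovered by converting the object series back to a numeric dtype before resampling; a minimal sketch, assuming the values are convertible with pd.to_numeric (on pandas 0.22+, the min_count argument to sum gives similar control):

import pandas as pd

s = pd.Series(1, index=pd.date_range("2017-01-05", periods=4, freq="B"))
obj = s.astype(object)

# converting back to a numeric dtype makes the empty bins NaN again
pd.to_numeric(obj).resample("D").sum()

# on pandas 0.22+, min_count=1 also yields NaN for empty bins:
# s.resample("D").sum(min_count=1)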

Comment From: zsluedem

@dsm054 I think you are right. I can't reproduce the example now. It happened in the middle of my code, but I have debugged it now.

Comment From: jreback

So we are resampling on the object columns. This is a bug, as we exclude them elsewhere; resample should ignore this column.

In [26]: df = pd.DataFrame({'A': [1, 2, 3], 'B': list('abc'), 'C':pd.date_range('20130101', periods=3, freq='1min')})

In [27]: df
Out[27]: 
   A  B                   C
0  1  a 2013-01-01 00:00:00
1  2  b 2013-01-01 00:01:00
2  3  c 2013-01-01 00:02:00

In [28]: df.resample('1H', on='C').mean()
Out[28]: 
            A
C            
2013-01-01  2
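For comparison, the numeric column can also be selected explicitly on the resampler, rather than relying on nuisance-column exclusion; a sketch on the same frame:

# selecting 'A' on the resampler aggregates only that column
df.resample('1H', on='C')['A'].mean()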

Comment From: yiwiz-sai

import pandas as pd

def test(grp):
    # join the strings in the current window
    print(grp)
    return ''.join(grp)

ser = pd.Series(list('abcdfefeagg'))
print(ser.rolling(window=3).apply(test))

I cannot get what I want: the rolling function does not work for strings. My pandas version is 0.21.
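A manual window is one way around this, since rolling().apply() only operates on numeric data; a minimal sketch (window of 3, as above):

import pandas as pd

ser = pd.Series(list('abcdfefeagg'))
window = 3

# build each 3-character window by hand; the first window-1 slots have no full window
joined = pd.Series(
    [None] * (window - 1)
    + [''.join(ser.iloc[i - window + 1:i + 1]) for i in range(window - 1, len(ser))],
    index=ser.index,
)
print(joined)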

Comment From: ron819

Any chance this will be fixed in the next release?

Comment From: jbrockmendel

We no longer drop nuisance columns, closing.