When resampling a timezone-aware (non-naive) time series with a custom function (using apply or agg), an incorrect result is returned.

import pandas as pd

def weighted_quantile(series, weights, q):
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]

times = pd.date_range('2017-6-23 18:00', periods=8, freq='15T', tz='UTC')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times)

data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[2]: 
2017-06-23 00:00:00+00:00    0.0
Freq: D, dtype: float64

Expected Output

The (single) value of the resampled series should equal the result of calling the function directly:

weighted_quantile(data, weights=weights, q=0.5)
Out[3]: 1.0

One indeed gets this result when passing data and weights with a naive time index:

times_naive = pd.date_range('2017-6-23 18:00', periods=8, freq='15T')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times_naive)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times_naive)

data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[4]: 
2017-06-23    1.0
Freq: D, dtype: float64

But, as shown above, not when passing a non-naive (timezone-aware) time series.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 4.2.1
pip: 19.0.2
setuptools: 39.0.1
Cython: 0.29.4
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: 0.11.3
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.3.0
bs4: 4.6.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: jreback

There was quite a bit of work on this in 0.24; please try with 0.24 and/or master.

Comment From: kdebrab

I get the same result with pandas 0.24.1

Comment From: kdebrab

It seems like the index of data is made naive by Resampler.apply when passing it to the custom function.
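
A minimal sketch of why a naive index would produce 0.0 here. It assumes, as suggested above, that the custom function receives data whose index has lost its timezone; `naive_data` is a hypothetical stand-in built by hand, not something taken from pandas internals. It uses '15min' instead of '15T' because newer pandas releases deprecate the 'T' alias.

```python
import pandas as pd


def weighted_quantile(series, weights, q):
    # Same function as in the report above.
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]


times = pd.date_range('2017-06-23 18:00', periods=8, freq='15min', tz='UTC')
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times)

# Hypothetical stand-in for what the custom function would see if the
# index of the resampled group had been stripped of its timezone.
naive_data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0],
                       index=times.tz_localize(None))

# Reindexing tz-aware weights on a tz-naive index finds no matching labels,
# so every weight becomes NaN (and then 0 after fillna).
aligned = weights.reindex(naive_data.index)
all_missing = bool(aligned.isna().all())

# With all weights at 0 the cutoff is 0, every row passes the filter,
# and the smallest data value is returned.
naive_result = weighted_quantile(naive_data, weights, q=0.5)
print(all_missing, naive_result)
```

If this assumption holds, the 0.0 in the original report is simply the minimum of the data, returned because none of the weights could be aligned.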

Comment From: kdebrab

The output in pandas 1.0.1 is:

Out[2]: 
2017-06-23 00:00:00+00:00    1.0
Freq: D, dtype: float64

So, this issue has been resolved by now!

Comment From: wfvining

Quoting kdebrab: "It seems like the index of data is made naive by Resampler.apply when passing it to the custom function."

I am having this same problem in 1.0.1. If I do the following

def show_tz(xs):
    print(xs.index.tz)
    return False


ser = pd.Series(
    1, 
    index=pd.date_range(
        start='2020-01-01', 
        end='2020-01-05', 
        freq='H',
        tz='Etc/GMT-7'
    )
)
ser.resample('D').apply(show_tz)

I expect to see the following output

Etc/GMT-7
Etc/GMT-7
Etc/GMT-7
Etc/GMT-7

But instead I get

Etc/GMT-7
None
None
None

If you examine the value of xs on each call, you find that the only 'group' that is tz-aware has no data in it.

Comment From: kdebrab

As of pandas version 1.3.5, this issue is no longer present. It seems to have been fixed in version 1.3.3. With version 1.3.2 I get an IndexError when executing the above example.

Comment From: jreback

would certainly take a validation test for this
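
A possible shape for such a validation test, as a rough sketch only. The names `collect_group_timezones` and `test_resample_apply_keeps_timezone` are hypothetical, and the test is written as a plain pytest-style function rather than following the layout of the pandas test suite. It passes the frequency as a `pd.Timedelta` so that it does not depend on a particular spelling of the hourly alias.

```python
import pandas as pd


def collect_group_timezones(ser, rule="D"):
    """Resample `ser` and record the index timezone seen by the custom function."""
    seen = []

    def record_tz(group):
        tz = group.index.tz
        seen.append(None if tz is None else str(tz))
        return len(group)

    ser.resample(rule).apply(record_tz)
    return seen


def test_resample_apply_keeps_timezone():
    # Same setup as the show_tz example above.
    index = pd.date_range(
        start="2020-01-01",
        end="2020-01-05",
        freq=pd.Timedelta(hours=1),
        tz="Etc/GMT-7",
    )
    ser = pd.Series(1, index=index)

    seen = collect_group_timezones(ser)

    # The function must have been called, and every group must be tz-aware
    # with the original timezone.
    assert seen
    assert all(tz == "Etc/GMT-7" for tz in seen)
```

The test deliberately does not assert how many times the function is called, since that is an implementation detail that may differ between versions.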