When resampling a timezone-aware (non-naive) time series with a custom function (using apply or agg), an unexpected result is returned.
import pandas as pd
def weighted_quantile(series, weights, q):
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]
times = pd.date_range('2017-6-23 18:00', periods=8, freq='15T', tz='UTC')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times)
data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[2]:
2017-06-23 00:00:00+00:00 0.0
Freq: D, dtype: float64
Expected Output
The (single) value of the resampled series should equal the result of calling the function directly:
weighted_quantile(data, weights=weights, q=0.5)
Out[3]: 1.0
One indeed gets this result when passing data and weights with a naive time index:
times_naive = pd.date_range('2017-6-23 18:00', periods=8, freq='15T')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times_naive)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times_naive)
data.resample('D').apply(weighted_quantile, weights=weights, q=0.5)
Out[4]:
2017-06-23 1.0
Freq: D, dtype: float64
But, as shown above, not when passing a timezone-aware time series.
Output of pd.show_versions()
Comment From: jreback
There was quite a bit of work on this in 0.24; please try with that and/or with master.
Comment From: kdebrab
I get the same result with pandas 0.24.1
Comment From: kdebrab
It seems like the index of data is made naive by Resampler.apply when passing it to the custom function.
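That would also explain why the result is exactly 0.0. The following is a minimal sketch, assuming the index really is stripped of its timezone before the custom function is called. The tz_localize(None) call only simulates that and is an assumption, not something taken from the pandas source. On recent pandas versions a naive index matches none of the tz-aware labels of weights, so every reindexed weight becomes NaN and then 0, the cutoff is 0, and the smallest value of the series is returned.

```python
import pandas as pd


def weighted_quantile(series, weights, q):
    series = series.sort_values()
    cumsum = weights.reindex(series.index).fillna(0).cumsum()
    cutoff = cumsum.iloc[-1] * q
    return series[cumsum >= cutoff].iloc[0]


# '15min' is used instead of '15T' because newer pandas deprecates 'T'.
times = pd.date_range('2017-6-23 18:00', periods=8, freq='15min', tz='UTC')
data = pd.Series([1., 1, 1, 1, 1, 2, 2, 0], index=times)
weights = pd.Series([160., 91, 65, 43, 24, 10, 1, 0], index=times)

# Assumption: mimic what the custom function seems to receive, i.e. the
# same data but with a naive index.
data_naive = data.tz_localize(None)

# No naive timestamp matches a tz-aware one, so all weights are lost.
matched = weights.reindex(data_naive.index)
all_weights_lost = bool(matched.isna().all())

simulated = weighted_quantile(data_naive, weights=weights, q=0.5)
expected = weighted_quantile(data, weights=weights, q=0.5)
print(all_weights_lost, simulated, expected)
```

If the hypothesis holds, simulated reproduces the 0.0 from the report while expected gives the 1.0 of the direct call.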
Comment From: kdebrab
The output in pandas 1.0.1 is:
Out[2]:
2017-06-23 00:00:00+00:00 1.0
Freq: D, dtype: float64
So, this issue has been resolved by now!
Comment From: wfvining
Quoting kdebrab: "It seems like the index of data is made naive by Resampler.apply when passing it to the custom function."
I am having this same problem in 1.0.1. If I do the following:
def show_tz(xs):
    print(xs.index.tz)
    return False

ser = pd.Series(
    1,
    index=pd.date_range(
        start='2020-01-01',
        end='2020-01-05',
        freq='H',
        tz='Etc/GMT-7'
    )
)
ser.resample('D').apply(show_tz)
I expect to see the following output:
Etc/GMT-7
Etc/GMT-7
Etc/GMT-7
Etc/GMT-7
But instead I get:
Etc/GMT-7
None
None
None
If you examine the value of xs for each application, you find that the one 'group' that is not tz-naive has no data in it.
Comment From: kdebrab
As of pandas version 1.3.5, this issue is no longer present. It seems to have been solved since version 1.3.3. With version 1.3.2 I get an IndexError when executing the above example.
Comment From: jreback
would certainly take a validation test for this
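A rough sketch of what such a validation test could check, not an actual test from the pandas test suite: the helper name check_resample_apply_keeps_tz and the recording function record_tz are made up for illustration. It records the timezone of the index that the custom function sees on every call and checks that none of them is naive. The number of calls is deliberately not asserted, since that may be an implementation detail.

```python
import pandas as pd


def check_resample_apply_keeps_tz(tz="UTC"):
    """Return the timezones seen by a custom function and the resampled result."""
    # Four full days of 6-hourly data, so every daily bin is non-empty.
    index = pd.date_range(
        "2020-01-01", periods=16, freq=pd.Timedelta(hours=6), tz=tz
    )
    ser = pd.Series(1.0, index=index)

    seen = []

    def record_tz(xs):
        # Remember which timezone the group index carries on this call.
        seen.append(xs.index.tz)
        return xs.sum()

    result = ser.resample("D").apply(record_tz)
    return seen, result


seen, result = check_resample_apply_keeps_tz()
assert seen, "the custom function was never called"
assert all(tz is not None for tz in seen), "a group lost its timezone"
assert result.index.tz is not None
```

On versions affected by the bug, the second assertion would be expected to fail, since some groups arrive with a naive index.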