Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np

def to_weekly(dataframe, field=None):
    dataframe.index = pd.to_datetime(dataframe.index)
    if field:
        dataframe = dataframe[field]
    return dataframe.resample('W').mean()

btc_frame = pd.read_csv('btc_nvt.csv')
btc_frame = btc_frame.shift(periods=1, freq=None, axis=1)
btc_frame = btc_frame.drop(['Date'], axis=1)
btc_frame['nvt'] = btc_frame['marketcap(USD)'] / btc_frame['txVolume(USD)']
btc_frame['price(%)'] = btc_frame['price(USD)'].pct_change(1)
btc_frame['% man'] = (btc_frame['price(USD)'] - btc_frame['price(USD)'].shift(1)) / btc_frame['price(USD)'].shift(1)
btc_frame['1 shift'] = btc_frame['price(USD)'].shift(1)
btc_frame = to_weekly(btc_frame)
print(btc_frame[['price(USD)', 'price(%)', '% man', '1 shift']])
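The '% man' column in the sample reproduces the definition of pct_change(1), which is why it always agrees with 'price(%)'. A minimal check on a small series of made-up daily prices (hypothetical values, not taken from btc_nvt.csv):

```python
import pandas as pd

# Hypothetical daily prices (not taken from btc_nvt.csv)
price = pd.Series(
    [100.0, 110.0, 99.0, 108.9],
    index=pd.date_range("2013-04-29", periods=4, freq="D"),
)

# pct_change(1) computes (x[t] - x[t-1]) / x[t-1], the same formula as '% man'
auto = price.pct_change(1)
manual = (price - price.shift(1)) / price.shift(1)

# On daily data, shift(1) is simply the previous day's price
previous_day = price.shift(1)

print(pd.DataFrame({
    "price": price,
    "pct_change": auto,
    "manual": manual,
    "previous_day": previous_day,
}))
```

Both columns are computed row by row on whatever rows the frame has at that moment, here daily rows.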
Problem description
The values in 'price(%)' and '% man' are incorrect. It is not clear how those values are computed, because they do not match what pct_change should produce. Another issue is that the '1 shift' column does not appear to be 'price(USD)' shifted by 1.
I have also tried the latest version of pandas, installed via pip, with the same result. Perhaps this is related to the way the CSV is imported and prepared? I'm attaching the CSV as well (as .txt, since GitHub does not accept .csv attachments).
            price(USD)   price(%)      % man     1 shift
2013-04-28  134.210000        NaN        NaN         NaN
2013-05-05  118.842857  -0.015728  -0.015728  121.457143
2013-05-12  113.925714  -0.000890  -0.000890  114.055714
2013-05-19  118.710000   0.008954   0.008954  117.711429
2013-05-26  127.732857   0.013101   0.013101  126.091429
2013-06-02  128.634286  -0.012134  -0.012134  130.232857
2013-06-09  114.727143  -0.027950  -0.027950  117.911429
2013-06-16  103.840000  -0.000162  -0.000162  103.910000
Expected Output
            price(USD)   price(%)      % man     1 shift
2013-04-28  134.210000        NaN        NaN         NaN
2013-05-05  118.842857    -0.1145    -0.1145  134.210000
2013-05-12  113.925714     -0.041     -0.041  118.842857
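On the question of how the CSV is imported and prepared: one common approach is to let read_csv build the date index directly with index_col and parse_dates, instead of shifting columns and converting the index afterwards. A minimal sketch, assuming btc_nvt.csv has a Date column and a price(USD) column; the inline rows are invented so the snippet runs without the attachment. This only changes how the index is built; it does not change the order in which pct_change and the weekly resample are applied.

```python
import io

import pandas as pd

# Invented stand-in for btc_nvt.csv; the real attachment is not reproduced here
csv_text = """Date,price(USD)
2013-04-29,100.0
2013-04-30,101.0
2013-05-01,102.0
"""

# Parse the Date column and use it as a DatetimeIndex in a single step
btc_frame = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(btc_frame.index).__name__)  # DatetimeIndex

# With a DatetimeIndex in place, resampling works without further conversion
weekly = btc_frame.resample("W").mean()
print(weekly)
```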
Output of pd.show_versions()
Comment From: jreback
Can you show a minimal example?
Comment From: bgits
@jreback Can you clarify what you would expect as a minimal example? You can compare the top three lines of the output with the three lines in the expected output.
For example, on 2013-05-05 price(%) is -0.015728, but it should be about -0.1145.
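The expected figure is the percentage change between consecutive weekly means in the output, for instance (118.842857 - 134.21) / 134.21 for 2013-05-05. Worked out with the weekly means copied from the table:

```python
# Weekly mean prices copied from the output table above
week_1 = 134.210000  # 2013-04-28
week_2 = 118.842857  # 2013-05-05
week_3 = 113.925714  # 2013-05-12

# Percentage change between consecutive weekly means
change_2 = (week_2 - week_1) / week_1
change_3 = (week_3 - week_2) / week_2

print(round(change_2, 4))  # -0.1145
print(round(change_3, 4))  # -0.0414
```

The second value matches the -0.041 in the expected output once rounded to three decimals.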
Comment From: bgits
Upon closer inspection, this line is causing the deviation: btc_frame = to_weekly(btc_frame)
If I move it up to the preprocessing stage, before any calculations are done, the weekly values are as expected. The original output is also correct for what it computes: because the weekly resample runs after pct_change, each weekly value is the mean of that week's daily percentage changes.
This seems to be intended behavior in pandas and just a mistake on my part. If so, we can close this issue.
Comment From: jreback
pct_change knows nothing about freq, so you should resample first.
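To see why the order matters, the sketch below builds an invented two-week series of daily prices and compares the two orders: daily pct_change followed by the weekly mean (what the original code does) versus the weekly mean followed by pct_change (what was expected). The prices are hypothetical and chosen only to make the difference obvious.

```python
import pandas as pd

# Hypothetical daily prices: 2013-04-29 is a Monday, so the 14 days fill
# exactly two Sunday-ending 'W' bins (labelled 2013-05-05 and 2013-05-12)
idx = pd.date_range("2013-04-29", periods=14, freq="D")
price = pd.Series([100.0] * 7 + [80.0] * 7, index=idx, name="price(USD)")

# Original order: daily percentage changes, then the weekly average of those changes
daily_then_weekly = price.pct_change().resample("W").mean()

# Expected order: weekly average price first, then the week-over-week change
weekly_mean = price.resample("W").mean()
weekly_then_change = weekly_mean.pct_change()

# shift(1) after resampling gives the previous week's mean
previous_week = weekly_mean.shift(1)

print(pd.DataFrame({
    "daily_then_weekly": daily_then_weekly,
    "weekly_then_change": weekly_then_change,
    "previous_week": previous_week,
}))
```

In this toy series the single -20% day is averaged over seven days in the first approach (about -0.0286), while the second approach reports the full -20% week-over-week drop. The same effect explains -0.015728 in the original output versus the expected -0.1145, and previous_week corresponds to the expected '1 shift' column.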
Comment From: babrik
For the data below:
Date       | Open         | High         | Low          | Close        | Adj Close
2013-01-02 | 19693.300781 | 19756.679688 | 19686.500000 | 19714.240234 | 19714.240234
2013-01-03 | 19771.029297 | 19786.300781 | 19693.289063 | 19764.779297 | 19764.779297
2013-01-04 | 19782.589844 | 19797.439453 | 19679.990234 | 19784.080078 | 19784.080078
2013-01-07 | 19820.560547 | 19856.429688 | 19654.460938 | 19691.419922 | 19691.419922
2013-01-08 | 19681.380859 | 19761.779297 | 19632.589844 | 19742.519531 | 19742.519531
When I call dframe.pct_change(), I get the following error:

TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
   1008 try:
-> 1009 result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
   1010 except TypeError:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
    204 if use_numexpr:
--> 205 return _evaluate(op, op_str, a, b, **eval_kwargs)
    206 return _evaluate_standard(op, op_str, a, b)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b, truediv, reversed, **eval_kwargs)
    119 if result is None:
--> 120 result = _evaluate_standard(op, op_str, a, b)
    121

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b, **eval_kwargs)
     64 with np.errstate(all='ignore'):
---> 65 return op(a, b)
     66

TypeError: unsupported operand type(s) for /: 'str' and 'float'
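The error indicates that pct_change is dividing at least one column of strings. With the table above, the likely candidate is Date, if it was loaded as an ordinary column rather than as the index; that is an assumption, since the loading code is not shown. A minimal sketch under that assumption, rebuilding the first three rows (Close column only, for brevity) and moving Date into a DatetimeIndex before calling pct_change:

```python
import pandas as pd

# First rows of the posted table, with Date as a plain string column
# (assumed; the original loading code is not shown). Other columns omitted.
dframe = pd.DataFrame({
    "Date": ["2013-01-02", "2013-01-03", "2013-01-04"],
    "Close": [19714.240234, 19764.779297, 19784.080078],
})

# Convert Date to datetimes and make it the index, so only numeric columns remain
dframe["Date"] = pd.to_datetime(dframe["Date"])
dframe = dframe.set_index("Date")

changes = dframe.pct_change()
print(changes)
```

If other non-numeric columns are present, dframe.select_dtypes(include="number").pct_change() restricts the calculation to numeric columns.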