Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
id symbol strike info
0 ETH-21DEC22-600-C ETH/USD:ETH-221221-600-C 600.0 {'option_type': 'call', 'expiration_timestamp'...
1 ETH-21DEC22-600-P ETH/USD:ETH-221221-600-P 600.0 {'option_type': 'put', 'expiration_timestamp':...
2 ETH-21DEC22-700-C ETH/USD:ETH-221221-700-C 700.0 {'option_type': 'call', 'expiration_timestamp'...
3 ETH-21DEC22-700-P ETH/USD:ETH-221221-700-P 700.0 {'option_type': 'put', 'expiration_timestamp':...
4 ETH-21DEC22-800-C ETH/USD:ETH-221221-800-C 800.0 {'option_type': 'call', 'expiration_timestamp'...
.. ... ... ... ...
551 ETH-29SEP23-4500-P ETH/USD:ETH-230929-4500-P 4500.0 {'option_type': 'put', 'expiration_timestamp':...
552 ETH-29SEP23-5000-C ETH/USD:ETH-230929-5000-C 5000.0 {'option_type': 'call', 'expiration_timestamp'...
553 ETH-29SEP23-5000-P ETH/USD:ETH-230929-5000-P 5000.0 {'option_type': 'put', 'expiration_timestamp':...
554 ETH-29SEP23-5500-C ETH/USD:ETH-230929-5500-C 5500.0 {'option_type': 'call', 'expiration_timestamp'...
555 ETH-29SEP23-5500-P ETH/USD:ETH-230929-5500-P 5500.0 {'option_type': 'put', 'expiration_timestamp':...
[556 rows x 4 columns]
Issue Description
Traceback (most recent call last):
File "dw_test.py", line 73, in
Expected Behavior
Extracting every other key from the info column works as expected, and extracting "option_type" also worked for 2 years until recently. I have upgraded pandas to the latest version, but it still throws this error.
Installed Versions
Comment From: MarcoGorelli
thanks for the report
gonna need a reproducible example I'm afraid - closing for now, will reopen if you provide one
Comment From: MilicaMedic
I have a MongoDB database populated with Deribit API data. The data in MongoDB is written by the following code (simplified):
import ccxt
deribit = ccxt.deribit()
exchange_id = 'deribit'
exchange_class = getattr(ccxt, exchange_id)
drbt = exchange_class({
'urls': {
'api': 'https://www.deribit.com'
},
'apiKey': 'key',
'secret': 'secret',
'verbose': False,
'enableRateLimit': True,
'testnet': False,
})
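# myclient is assumed to be an already-connected pymongo.MongoClient
mydb = myclient["tickers"]
mycol = mydb["deribit"]
# Clear the collection before inserting fresh market data
der = mycol.delete_many({})
# print(der.deleted_count, "documents deleted.")
json = drbt.fetch_markets(params={})
der = mycol.insert_many(json)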
I retrieve the data the following way:
df_options = pd.DataFrame(list(mydb["deribit"].find({"type": "option", "base": "BTC"}, {"_id": 0, "id": 1, "symbol": 1, "strike": 1, "info.expiration_timestamp": 1, "info.option_type": 1})))
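As an aside, nested documents like these can also be flattened in one step rather than key by key. The following is only a sketch, assuming the records look like the projected MongoDB documents; the `records` list and its timestamp values are hypothetical placeholders, not real Deribit data. With `pd.json_normalize`, a key that is missing from one record's `info` dict shows up as a missing value instead of raising an error:

```python
import pandas as pd

# Hypothetical records shaped like the projected MongoDB documents;
# the timestamp values are placeholders, not real Deribit data.
records = [
    {"id": "ETH-21DEC22-600-C", "strike": 600.0,
     "info": {"option_type": "call", "expiration_timestamp": 1671609600000}},
    {"id": "ETH-21DEC22-600-P", "strike": 600.0,
     "info": {"expiration_timestamp": 1671609600000}},  # no option_type key
]

# Nested keys become dotted column names such as "info.option_type"
flat = pd.json_normalize(records)

print(flat.columns.tolist())
# True marks records whose info dict had no "option_type" key
print(flat["info.option_type"].isna().tolist())
```

Checking the `isna()` mask on real data would show whether some documents have stopped carrying a given key.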
The output of print(df_options) is the same as shown above.
I then do the following:
df_options['underlying_price'] = 0
df_options['underlying_price'] = df_greeks['info'].apply(lambda x: x['underlying_price']) #this is from another df and works
df_options['underlying_price'] = df_options['underlying_price'].astype(float)
#df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
df_options['option_type'] = 0
df_options['option_type'] = df_options['info'].apply(lambda x: x['option_type'])
df_options['info.expiration_timestamp'] = df_options['info'].apply(lambda cell: cell['expiration_timestamp'])
df_options['expiration'] = 0
df_options['expiration']= pd.to_datetime(df_options['info.expiration_timestamp'], unit='ms')
.......
Until recently everything worked flawlessly; now it throws an error:
File "/Users/mekintos/Desktop/multiflask/multiflask/bin/flask", line 10, in <module>
sys.exit(main())
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 967, in main
cli.main(args=sys.argv[1:], prog_name="python -m flask" if as_module else None)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 586, in main
return super(FlaskGroup, self).main(*args, **kwargs)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/decorators.py", line 73, in new_func
return ctx.invoke(f, obj, *args, **kwargs)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 848, in run_command
app = DispatchingApp(info.load_app, use_eager_loading=eager_loading)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 305, in __init__
self._load_unlocked()
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 330, in _load_unlocked
self._app = rv = self.loader()
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 392, in load_app
app = locate_app(self, import_name, None, raise_if_not_found=False)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 257, in locate_app
return find_best_app(script_info, module)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 83, in find_best_app
app = call_factory(script_info, app_factory)
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 119, in call_factory
return app_factory()
File "/Users/mekintos/Desktop/multiflask/app/__init__.py", line 50, in create_app
from app.analytics_deribit import layout as analytics_deribit_layout
File "/Users/mekintos/Desktop/multiflask/app/analytics_deribit.py", line 14, in <module>
from app.dw_data import df_puts, df_calls, exchange_id, expiration, df_options
File "/Users/mekintos/Desktop/multiflask/app/dw_data.py", line 74, in <module>
df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/pandas/core/series.py", line 4200, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2388, in pandas._libs.lib.map_infer
File "/Users/mekintos/Desktop/multiflask/app/dw_data.py", line 74, in <lambda>
df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
KeyError: 'option_type'
The traceback in the original report was from the test file; this one is from the actual app.
Comment From: MarcoGorelli
sorry, gonna need a reproducible example
https://stackoverflow.com/a/20159305/4451315
Comment From: MilicaMedic
here it is:
import pandas as pd
df = pd.DataFrame([[{'test1':'test2'}], [{'option_type': 'call'}]], columns=['A'])
print(df.to_string(index=False))
df['test1'] = df['A'].apply(lambda x: x['test1'])
df['option_type'] = df['A'].apply(lambda x: x['option_type'])
This throws a similar error:
Traceback (most recent call last):
File "pandas_version.py", line 8, in <module>
df['test1'] = df['A'].apply(lambda x: x['test1'])
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "pandas_version.py", line 8, in <lambda>
df['test1'] = df['A'].apply(lambda x: x['test1'])
KeyError: 'test1'
I have used this in my app for 2 years, and now it has started throwing this error.
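For reference, the two rows in this example hold dicts with different keys, so indexing every row with the same key raises KeyError on the row that lacks it. A minimal sketch for locating such rows (the names `has_test1` and `has_option_type` are illustrative and not part of the original report):

```python
import pandas as pd

df = pd.DataFrame([[{'test1': 'test2'}], [{'option_type': 'call'}]], columns=['A'])

# True where the dict stored in column A contains the key
has_test1 = df['A'].apply(lambda d: 'test1' in d)
has_option_type = df['A'].apply(lambda d: 'option_type' in d)

# Rows that would raise KeyError when indexed with the given key
print(df[~has_test1])
print(df[~has_option_type])
```

Running the same membership check against the real `info` column would show whether some documents no longer contain `option_type`.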
Comment From: MarcoGorelli
you haven't shown your expected output, but I'm guessing you want
df['test1'] = df['A'].str.get('test1')
and
df['option_type'] = df['A'].str.get('option_type')
Comment From: MilicaMedic
No, this gives me:
print(df['option_type'])
0 None
1 call
It just needs to be: call
Also, I have multiple df columns with different types, so this approach does not work. I have used a lambda for all of them, but it suddenly doesn't work.
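If the intent is to keep only the rows that actually carry the key (an assumption, since the exact expected output is not spelled out), one possible sketch uses `dict.get` inside `apply` and then drops the missing entries. The names `option_type` and `calls_only` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[{'test1': 'test2'}], [{'option_type': 'call'}]], columns=['A'])

# dict.get returns None instead of raising KeyError when the key is missing
option_type = df['A'].apply(lambda d: d.get('option_type'))

# Keep only the rows where the key was present
calls_only = option_type.dropna()
print(calls_only.tolist())
```

Because the lambda only relies on `dict.get`, the same pattern works regardless of the value type stored under the key.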
Comment From: MarcoGorelli
ok, please ask on stackoverflow, including:
- a reproducible example
- your exact expected output
if there's a bug, then please open a new issue here on GitHub
pandas 0.24.2 is already 3-4 years old
Comment From: MilicaMedic
I have installed and tried with the latest version of pandas