Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

                     id                     symbol  strike                                               info
0     ETH-21DEC22-600-C   ETH/USD:ETH-221221-600-C   600.0  {'option_type': 'call', 'expiration_timestamp'...
1     ETH-21DEC22-600-P   ETH/USD:ETH-221221-600-P   600.0  {'option_type': 'put', 'expiration_timestamp':...
2     ETH-21DEC22-700-C   ETH/USD:ETH-221221-700-C   700.0  {'option_type': 'call', 'expiration_timestamp'...
3     ETH-21DEC22-700-P   ETH/USD:ETH-221221-700-P   700.0  {'option_type': 'put', 'expiration_timestamp':...
4     ETH-21DEC22-800-C   ETH/USD:ETH-221221-800-C   800.0  {'option_type': 'call', 'expiration_timestamp'...
..                  ...                        ...     ...                                                ...
551  ETH-29SEP23-4500-P  ETH/USD:ETH-230929-4500-P  4500.0  {'option_type': 'put', 'expiration_timestamp':...
552  ETH-29SEP23-5000-C  ETH/USD:ETH-230929-5000-C  5000.0  {'option_type': 'call', 'expiration_timestamp'...
553  ETH-29SEP23-5000-P  ETH/USD:ETH-230929-5000-P  5000.0  {'option_type': 'put', 'expiration_timestamp':...
554  ETH-29SEP23-5500-C  ETH/USD:ETH-230929-5500-C  5500.0  {'option_type': 'call', 'expiration_timestamp'...
555  ETH-29SEP23-5500-P  ETH/USD:ETH-230929-5500-P  5500.0  {'option_type': 'put', 'expiration_timestamp':...

[556 rows x 4 columns]

Issue Description

Traceback (most recent call last):
  File "dw_test.py", line 73, in <module>
    df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
  File "/Users/mekintos/Desktop/ccxtbot/ccxtbot/lib/python3.7/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/Users/mekintos/Desktop/ccxtbot/ccxtbot/lib/python3.7/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/Users/mekintos/Desktop/ccxtbot/ccxtbot/lib/python3.7/site-packages/pandas/core/apply.py", line 1101, in apply_standard
    convert=self.convert_dtype,
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "dw_test.py", line 73, in <lambda>
    df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
KeyError: 'option_type'

Expected Behavior

The same lambda works as expected for every other key in the info column, and it also worked for "option_type" for 2 years, until recently. I have upgraded pandas to the latest version, but it still throws this error.
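A `KeyError` from `apply(lambda cell: cell['option_type'])` means at least one dict in `info` lacks that key. One quick check is to list those rows before extracting. A minimal sketch; the two-row frame below is a hypothetical stand-in for `df_options`, not the reporter's data.

```python
import pandas as pd

# Hypothetical stand-in for df_options: the second row's info dict has no 'option_type'
df_options = pd.DataFrame({
    "id": ["ETH-21DEC22-600-C", "ETH-21DEC22-600-P"],
    "info": [
        {"option_type": "call", "expiration_timestamp": 1671609600000},
        {"expiration_timestamp": 1671609600000},
    ],
})

# True for rows whose info value is a dict that contains 'option_type'
has_key = df_options["info"].map(lambda d: isinstance(d, dict) and "option_type" in d)

# ids of the rows that would make the lambda raise KeyError
missing = df_options.loc[~has_key, "id"]
print(f"{len(missing)} row(s) without 'option_type':")
print(missing.to_string())
```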

Installed Versions

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 45.2.0
Cython: None
numpy: 1.17.4
scipy: 1.4.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.5.0
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: 0.9.3
psycopg2: 2.8.2 (dt dec pq3 ext lo64)
jinja2: 2.11.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.8.1
gcsfs: None
None

Comment From: MarcoGorelli

thanks for the report

gonna need a reproducible example I'm afraid - closing for now, will reopen if you provide one

Comment From: MilicaMedic

I store Deribit API data in MongoDB. The collection is filled by the following code (simplified):

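import ccxt
import pymongo

# Assumption: a local MongoDB instance; replace the URL with your own connection string
myclient = pymongo.MongoClient("mongodb://localhost:27017/")

deribit = ccxt.deribit()
exchange_id = 'deribit'
exchange_class = getattr(ccxt, exchange_id)
drbt = exchange_class({
    'urls': {
        'api': 'https://www.deribit.com',
    },
    'apiKey': 'key',
    'secret': 'secret',
    'verbose': False,
    'enableRateLimit': True,
    'testnet': False,
})
mydb = myclient["tickers"]
mycol = mydb["deribit"]
# Clear the collection before reloading all markets
deleted = mycol.delete_many({})
# print(deleted.deleted_count, "documents deleted.")
# List of market dicts; each keeps the raw exchange response under 'info'
markets = drbt.fetch_markets(params={})
inserted = mycol.insert_many(markets)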

I retrieve the data as follows:

df_options = pd.DataFrame(list(mydb["deribit"].find(
    {"type": "option", "base": "BTC"},
    {"_id": 0, "id": 1, "symbol": 1, "strike": 1, "info.expiration_timestamp": 1, "info.option_type": 1},
)))

The output of print(df_options) is given above, but here it is again:

                     id                     symbol  strike                                               info
0     ETH-21DEC22-600-C   ETH/USD:ETH-221221-600-C   600.0  {'option_type': 'call', 'expiration_timestamp'...
1     ETH-21DEC22-600-P   ETH/USD:ETH-221221-600-P   600.0  {'option_type': 'put', 'expiration_timestamp':...
2     ETH-21DEC22-700-C   ETH/USD:ETH-221221-700-C   700.0  {'option_type': 'call', 'expiration_timestamp'...
3     ETH-21DEC22-700-P   ETH/USD:ETH-221221-700-P   700.0  {'option_type': 'put', 'expiration_timestamp':...
4     ETH-21DEC22-800-C   ETH/USD:ETH-221221-800-C   800.0  {'option_type': 'call', 'expiration_timestamp'...
..                  ...                        ...     ...                                                ...
551  ETH-29SEP23-4500-P  ETH/USD:ETH-230929-4500-P  4500.0  {'option_type': 'put', 'expiration_timestamp':...
552  ETH-29SEP23-5000-C  ETH/USD:ETH-230929-5000-C  5000.0  {'option_type': 'call', 'expiration_timestamp'...
553  ETH-29SEP23-5000-P  ETH/USD:ETH-230929-5000-P  5000.0  {'option_type': 'put', 'expiration_timestamp':...
554  ETH-29SEP23-5500-C  ETH/USD:ETH-230929-5500-C  5500.0  {'option_type': 'call', 'expiration_timestamp'...
555  ETH-29SEP23-5500-P  ETH/USD:ETH-230929-5500-P  5500.0  {'option_type': 'put', 'expiration_timestamp':...

[556 rows x 4 columns]
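Because the query already projects only `info.option_type` and `info.expiration_timestamp`, one alternative to per-row lambdas is to flatten the nested dicts with `pd.json_normalize` (top-level since pandas 1.0; older versions expose it as `pandas.io.json.json_normalize`). A key missing from a document then becomes NaN instead of raising. The records below are hypothetical stand-ins for what `find()` returns.

```python
import pandas as pd

# Hypothetical documents shaped like the projected MongoDB results
records = [
    {"id": "ETH-21DEC22-600-C", "strike": 600.0,
     "info": {"option_type": "call", "expiration_timestamp": 1671609600000}},
    {"id": "ETH-21DEC22-600-P", "strike": 600.0,
     "info": {"expiration_timestamp": 1671609600000}},  # no option_type here
]

# Nested keys become dotted columns such as 'info.option_type'
flat = pd.json_normalize(records)
print(flat[["id", "info.option_type", "info.expiration_timestamp"]])
```

Rows whose `info.option_type` is NaN are exactly the ones that would make the lambda raise.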

I then do the following:

df_options['underlying_price'] = 0
df_options['underlying_price'] = df_greeks['info'].apply(lambda x: x['underlying_price']) #this is from another df and works
df_options['underlying_price'] = df_options['underlying_price'].astype(float)
#df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
df_options['option_type'] = 0
df_options['option_type'] = df_options['info'].apply(lambda x: x['option_type'])
df_options['info.expiration_timestamp'] = df_options['info'].apply(lambda cell: cell['expiration_timestamp'])
df_options['expiration'] = 0
df_options['expiration']= pd.to_datetime(df_options['info.expiration_timestamp'], unit='ms')
.......
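The steps above can be written so that a dict without a given key produces a missing value instead of aborting the whole column. A minimal sketch using `dict.get` (which returns `None` for absent keys) on hypothetical data; whether such rows should be kept, dropped, or treated as an error is left open here.

```python
import pandas as pd

# Hypothetical stand-in for df_options; row 1 lacks 'option_type'
df_options = pd.DataFrame({
    "info": [
        {"option_type": "call", "expiration_timestamp": 1671609600000},
        {"expiration_timestamp": 1671609600000},
    ],
})

# dict.get returns None instead of raising KeyError when the key is absent
df_options["option_type"] = df_options["info"].apply(lambda d: d.get("option_type"))
df_options["info.expiration_timestamp"] = df_options["info"].apply(
    lambda d: d.get("expiration_timestamp")
)

# Millisecond epoch timestamps -> naive datetime64 values (UTC wall time)
df_options["expiration"] = pd.to_datetime(df_options["info.expiration_timestamp"], unit="ms")
print(df_options[["option_type", "expiration"]])
```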

Until recently everything worked flawlessly; now it throws an error:

  File "/Users/mekintos/Desktop/multiflask/multiflask/bin/flask", line 10, in <module>
    sys.exit(main())
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 967, in main
    cli.main(args=sys.argv[1:], prog_name="python -m flask" if as_module else None)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/decorators.py", line 73, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 848, in run_command
    app = DispatchingApp(info.load_app, use_eager_loading=eager_loading)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 305, in __init__
    self._load_unlocked()
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 330, in _load_unlocked
    self._app = rv = self.loader()
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 392, in load_app
    app = locate_app(self, import_name, None, raise_if_not_found=False)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 257, in locate_app
    return find_best_app(script_info, module)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 83, in find_best_app
    app = call_factory(script_info, app_factory)
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/flask/cli.py", line 119, in call_factory
    return app_factory()
  File "/Users/mekintos/Desktop/multiflask/app/__init__.py", line 50, in create_app
    from app.analytics_deribit import layout as analytics_deribit_layout
  File "/Users/mekintos/Desktop/multiflask/app/analytics_deribit.py", line 14, in <module>
    from app.dw_data import df_puts, df_calls, exchange_id, expiration, df_options
  File "/Users/mekintos/Desktop/multiflask/app/dw_data.py", line 74, in <module>
    df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
  File "/Users/mekintos/Desktop/multiflask/multiflask/lib/python3.7/site-packages/pandas/core/series.py", line 4200, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2388, in pandas._libs.lib.map_infer
  File "/Users/mekintos/Desktop/multiflask/app/dw_data.py", line 74, in <lambda>
    df_options['option_type'] = df_options['info'].apply(lambda cell: cell['option_type'])
KeyError: 'option_type'

The original report above came from a test file; this traceback is from the actual app.

Comment From: MarcoGorelli

sorry, gonna need a reproducible example

https://stackoverflow.com/a/20159305/4451315

Comment From: MilicaMedic

here it is:

import pandas as pd

df = pd.DataFrame([[{'test1':'test2'}], [{'option_type': 'call'}]], columns=['A'])

print(df.to_string(index=False))

df['test1'] = df['A'].apply(lambda x: x['test1'])
df['option_type'] = df['A'].apply(lambda x: x['option_type'])

throws a similar error:

Traceback (most recent call last):
  File "pandas_version.py", line 8, in <module>
    df['test1'] = df['A'].apply(lambda x: x['test1'])
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/series.py", line 3591, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
  File "pandas_version.py", line 8, in <lambda>
    df['test1'] = df['A'].apply(lambda x: x['test1'])
KeyError: 'test1'

I have used this in my app for 2 years, and it only recently started throwing this error.
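The reproducible example raises regardless of the pandas version: `Series.apply` calls the lambda on each element, and row 0's dict has no `'option_type'` key while row 1's has no `'test1'`. Plain Python dict indexing fails the same way, as this sketch (no pandas involved) shows:

```python
rows = [{'test1': 'test2'}, {'option_type': 'call'}]

# Same lookup the lambda performs, applied to each element in turn
try:
    values = [d['test1'] for d in rows]
except KeyError as exc:
    print('KeyError:', exc)  # raised on the second dict, which has no 'test1'

# Returning a default for absent keys avoids the error
values = [d.get('test1') for d in rows]
print(values)  # ['test2', None]
```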

Comment From: MarcoGorelli

you haven't shown your expected output, but I'm guessing you want

df['test1'] = df['A'].str.get('test1')

and

df['option_type'] = df['A'].str.get('option_type')

Comment From: MilicaMedic

No, this gives me:

print(df['option_type'])

0    None
1    call

It just needs to be: call

Also, I have multiple DataFrame columns with different types, so this approach does not work. I have used a lambda for all of them, and it suddenly stopped working.
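If the intended result is only the rows whose dict actually contains the key (here just `call`), one option is to extract with `.str.get`, which returns `None` for dicts lacking the key, and then drop the missing entries. Several keys can be handled in one pass. A sketch based on the two-row reproducible example; the key list is illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[{'test1': 'test2'}], [{'option_type': 'call'}]], columns=['A'])

# .str.get on dicts returns None where the key is absent; dropna() removes those rows
option_type = df['A'].str.get('option_type').dropna()
print(option_type.tolist())  # ['call']

# The same pattern for several keys at once (this key list is illustrative)
extracted = {key: df['A'].str.get(key) for key in ['test1', 'option_type']}
print(pd.DataFrame(extracted))
```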

Comment From: MarcoGorelli

ok, please ask on stackoverflow, including:

  • a reproducible example

  • your exact expected output

if there's a bug, then please open a new issue here on GitHub

pandas 0.24.2 is already 3-4 years old

Comment From: MilicaMedic

I have installed and tried with the latest version of pandas