Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
incomplete_df: pd.DataFrame = pd.DataFrame({'event1': [pd.Timedelta("1 days"), pd.Timedelta("2 days"), pd.Timedelta("nat"), pd.Timedelta("5 days") , pd.Timedelta("6 days"), pd.Timedelta("nat"), pd.Timedelta("nat"), pd.Timedelta("11 days"), pd.Timedelta("nat"), pd.Timedelta("15 days")],
'event2': [pd.Timedelta("nat"), pd.Timedelta("1 days") , pd.Timedelta("nat"), pd.Timedelta("3 days") , pd.Timedelta("4 days"), pd.Timedelta("7 days") , pd.Timedelta("nat"), pd.Timedelta("12 days"), pd.Timedelta("nat"), pd.Timedelta("17 days")],
'event3': [pd.Timedelta("nat"), pd.Timedelta("nat"), pd.Timedelta("nat"), pd.Timedelta("nat"), pd.Timedelta("6 days"), pd.Timedelta("4 days") , pd.Timedelta("9 days") , pd.Timedelta("nat"), pd.Timedelta("3 days") , pd.Timedelta("nat")]})
incomplete_df['reason'] = "Reason is " + incomplete_df.idxmin(axis="columns")
Issue Description
The above code will throw:
ValueError: attempt to get argmin of an empty sequence
However, using this code:
import pandas as pd
import numpy as np
incomplete_df = pd.DataFrame({'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan]})
incomplete_df['reason'] = "Reason is " + incomplete_df.idxmin(axis="columns")
returns a DataFrame (incomplete_df) with a fourth column that holds None in the rows where all three columns are NaN.
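The error text in the Timedelta case is the same message NumPy produces when argmin is called on an empty array. The short sketch below only demonstrates that NumPy behaviour. Whether pandas actually reaches this call after dropping the NaT values of an all-NaT row is an assumption and has not been traced through the pandas source here.

```python
import numpy as np

# An empty timedelta64 array stands in for a row whose NaT values
# have all been dropped (assumption: this mirrors what happens internally).
empty_row = np.array([], dtype="timedelta64[ns]")

message = None
try:
    np.argmin(empty_row)
except ValueError as exc:
    # NumPy refuses to pick a minimum position from zero elements.
    message = str(exc)

print(message)
```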
Expected Behavior
The documentation states: "Raises ValueError if the row/column is empty."
This is currently not what happens for numeric data, as the second example above shows: no error is raised there.
Expected behaviour: some consistency between the two datatypes.
With these two Stack Overflow posts in mind (https://stackoverflow.com/questions/21188151/pandas-getting-the-name-of-the-minimum-column/73346957#73346957 and https://stackoverflow.com/questions/73354436/pandas-getting-the-name-of-the-minimum-column-with-timedelta-datatype), it seems clear that the expected behaviour should not be "Raises ValueError if the row/column is empty" but rather to set None in the created row/column when the row/column is empty. Alternatively, an optional argument could be added so the user can choose which behaviour they want.
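Until the behaviour is made consistent, the requested result can be approximated in user code. The following is a minimal sketch of one possible workaround, not a proposed patch: it calls idxmin only on rows that contain at least one non-missing value and leaves None everywhere else. The helper name idxmin_or_none and the small example frame are hypothetical and exist only for illustration.

```python
import pandas as pd


def idxmin_or_none(df: pd.DataFrame) -> pd.Series:
    """Row-wise idxmin that yields None for rows with no valid value.

    Hypothetical helper for illustration; not part of pandas.
    """
    # True for rows that have at least one non-missing entry.
    has_value = df.notna().any(axis="columns")
    # Start with None everywhere, stored as Python objects.
    result = pd.Series([None] * len(df), index=df.index, dtype=object)
    # Only the rows with data are passed to idxmin, so it never sees an all-NaT row.
    result.loc[has_value] = df[has_value].idxmin(axis="columns")
    return result


# Small Timedelta frame: the middle row is entirely NaT.
example_df = pd.DataFrame({
    "event1": pd.to_timedelta(["1 days", None, "5 days"]),
    "event2": pd.to_timedelta([None, None, "3 days"]),
})

result = idxmin_or_none(example_df)
print(result)
```

The same helper can be applied to the numeric frame, which gives both data types the same "None when empty" result regardless of what idxmin itself does with all-missing rows.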
Installed Versions
INSTALLED VERSIONS
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.10.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.4.3
numpy : 1.22.0
pytz : 2021.3
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.2.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.5.0
gcsfs : None
markupsafe : 2.0.1
matplotlib : 3.5.2
numba : 0.55.2
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.36
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
Comment From: josh-medorion
This method relies on numpy's core/fromnumeric.py, which contains a warning that if the object has a signature different from that of NumPy it will raise a TypeError. Since Timedelta is pandas-specific, I'm guessing that's the source of the problem.
Comment From: jreback
@JasonMendoza2008 bugs are fixed when the community does pull requests
Comment From: JasonMendoza2008
Update: The bug is still present in the new pandas version (pandas==1.5.2).
Comment From: jreback
@JasonMendoza2008 see my comment above
you are welcome to contribute a patch
bugs are fixed when someone does - pandas is all volunteer
Comment From: josh-medorion
"Update: The bug is still present in the new pandas version (pandas==1.5.2)."
@JasonMendoza2008 Please stop adding these comments; they only add noise and waste the maintainers' time. As @jreback has said twice, bugs are not magically fixed. A bug can be assumed to persist in future versions until a fix is contributed.