Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.Series(["a", "b", "a", "b", "c", "a", "a"], dtype="category"),
    }
)
dummies = pd.get_dummies(df, drop_first=True)
pd.from_dummies(dummies, sep="_", default_category="a")
raises
TypeError: Passed DataFrame contains non-dummy data
Issue Description
from_dummies can't undo get_dummies when the dummies are sparse.
Expected Behavior
from_dummies should work with both sparse and non-sparse dummies.
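The expected behavior can be phrased as a round-trip check. The following is a minimal sketch, not part of the original report: `try_roundtrip` is a hypothetical helper name, and passing `sparse=True` to `get_dummies` is an assumption about what the description means by "sparse". It reports what happens on the installed pandas version rather than assuming a particular outcome.

```python
import pandas as pd


def try_roundtrip(sparse: bool):
    """Encode with get_dummies, decode with from_dummies, report the outcome.

    Hypothetical helper for illustration: returns True/False for a completed
    round trip, or a string describing the exception that was raised.
    """
    df = pd.DataFrame(
        {"A": pd.Series(["a", "b", "a", "b", "c", "a", "a"], dtype="category")}
    )
    dummies = pd.get_dummies(df, drop_first=True, sparse=sparse)
    try:
        decoded = pd.from_dummies(dummies, sep="_", default_category="a")
    except Exception as exc:  # report the failure instead of aborting
        return f"{type(exc).__name__}: {exc}"
    # Compare plain values so the check does not depend on the decoded dtype.
    return decoded["A"].tolist() == df["A"].tolist()


for sparse in (False, True):
    print(f"sparse={sparse}: {try_roundtrip(sparse)}")
```

Running this on several pandas versions would show whether the failure depends on the sparse flag, on the version, or on both.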
Installed Versions
INSTALLED VERSIONS
------------------
commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python : 3.9.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.3.0
pip : 22.2.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.1.9
pyxlsb : 1.0.9
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.41
tables : None
tabulate : 0.8.10
xarray : 2022.6.0
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None
Comment From: Pvlb1998
Take
Comment From: kostyafarber
I can't reproduce this with the latest version of pandas on main. Can we confirm? Otherwise we can close this.
Comment From: ghost
It looks like you are using pandas 1.5.0, which seems to be causing the issue. This version is quite old, and from_dummies may have been updated or fixed in later versions.
I recommend updating to the latest version of pandas, currently 1.5.2, and then trying to reproduce the issue again. If it no longer reproduces, it has likely been fixed in a later version or was a temporary bug that has since been resolved, in which case you can close this issue. It is also worth checking the official pandas documentation for from_dummies to see whether it mentions any changes or this bug.
Comment From: bashtage
Was it fixed? Possibly by the switch to bool in dummies from int8.
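One way to probe this hypothesis is to inspect the dtype of the dummy columns on the installed version and compare it with an explicit `dtype=bool` request. This is a sketch only; the thread does not establish that the dtype change is what made the error disappear.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": pd.Series(["a", "b", "a", "b", "c", "a", "a"], dtype="category")}
)

# The default dtype of the dummy columns depends on the installed pandas version.
default_dummies = pd.get_dummies(df, drop_first=True)
print(pd.__version__, default_dummies.dtypes.to_dict())

# Requesting bool explicitly removes that version dependence.
bool_dummies = pd.get_dummies(df, drop_first=True, dtype=bool)
decoded = pd.from_dummies(bool_dummies, sep="_", default_category="a")
print(decoded["A"].tolist())
```

If the default dtypes differ between two versions while the `dtype=bool` path decodes on both, that would be consistent with the dtype explanation, though it would not prove it.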
Comment From: kostyafarber
> Was it fixed? Possibly by the switch to bool in dummies from int8.
I've pulled the latest version of pandas from main, tested this, and did not get any errors.
One of the maintainers can confirm and if so we can close this one.
Comment From: bashtage
Does appear to be fixed in 1.5.3.
Comment From: MarcoGorelli
tbh I can't even reproduce this on 1.5.0:
In [1]: pd.__version__
Out[1]: '1.5.0'
In [2]: import pandas as pd
   ...:
   ...: df = pd.DataFrame(
   ...:     {
   ...:         "A": pd.Series(["a", "b", "a", "b", "c", "a", "a"], dtype="category"),
   ...:     }
   ...: )
   ...: dummies = pd.get_dummies(df, drop_first=True)
   ...: pd.from_dummies(dummies, sep="_", default_category="a")
Out[2]:
   A
0  a
1  b
2  a
3  b
4  c
5  a
6  a