Research
- [X] I have searched the [pandas] tag on StackOverflow for similar questions.
- [X] I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/74830329/pandas-crosstab-dos-not-support-float-with-capital-f-number-formats
Question about pandas
Hello guys, I've noticed that pandas crosstab does not support the Float dtypes (capital F, as opposed to float). I would like to understand why, and whether the error message could be made clearer. Best regards.
I am working on a sample transaction DataFrame. It contains the client ID, transaction gross value (GMV) and revenue. Take this example as df:
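from datetime import datetime
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'date': [datetime(2022, 6, 1), datetime(2022, 12, 31), datetime(2022, 12, 31), datetime(2022, 12, 31), datetime(2022, 12, 31)],
    'revenue': [10.1, 10.1, 10.1, 10.1, 10.1]})
# Cast revenue to the pandas nullable dtype (capital F); the error appears only after this cast
df = df.astype({'revenue': 'Float64'})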
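For context, `Float64` (capital F) is the pandas nullable extension dtype, while `float64` is the NumPy dtype a float column gets by default. The following is a minimal sketch of how to tell the two apart; the variable names are illustrative and not part of the original report.

```python
import pandas as pd

# Default float column: backed by a NumPy float64 array.
s_numpy = pd.Series([10.1, 10.1])

# Same values stored in the pandas nullable extension dtype (capital F).
s_nullable = s_numpy.astype("Float64")

print(s_numpy.dtype, s_nullable.dtype)

# Only the nullable version is a pandas ExtensionDtype.
is_ext_numpy = isinstance(s_numpy.dtype, pd.api.extensions.ExtensionDtype)
is_ext_nullable = isinstance(s_nullable.dtype, pd.api.extensions.ExtensionDtype)
print(is_ext_numpy, is_ext_nullable)
```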
Now I create a crosstab:
CrossTab = pd.crosstab(df['id'], df['date'], values=df['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
The output:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 CrossTab = pd.crosstab(df['id'], df['date'], values=df['revenue'], rownames=None, colnames=None, aggfunc='sum', margins=True, margins_name='All', dropna=False, normalize=False)
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\reshape\pivot.py:691, in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
688 df["__dummy__"] = values
689 kwargs = {"aggfunc": aggfunc}
--> 691 table = df.pivot_table(
692 "__dummy__",
693 index=unique_rownames,
694 columns=unique_colnames,
695 margins=margins,
696 margins_name=margins_name,
697 dropna=dropna,
698 **kwargs,
699 )
701 # Post-process
702 if normalize is not False:
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\frame.py:8728, in DataFrame.pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name, observed, sort)
8711 @Substitution("")
8712 @Appender(_shared_docs["pivot_table"])
8713 def pivot_table(
(...)
...
--> 292 raise TypeError(dtype) # pragma: no cover
294 converted = maybe_downcast_numeric(result, dtype, do_round)
295 if converted is not result:
TypeError: Float64
Comment From: phofl
Hi, thanks for your report.
Can you please provide a minimal reproducible example? see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
The groupby ops etc don't seem to be relevant
Comment From: frbelotto
Thanks. I've edited the post to make it shorter and more straightforward.
Comment From: phofl
This doesn't work, clients is not defined
Comment From: frbelotto
Sorry, I've simplified the code and forgot to change the source of the crosstab data. I've updated the post.
Just substitute df for "clients".
Comment From: phofl
This works on main, so it may need tests
Comment From: frbelotto
Sorry for my ignorance. What do you mean when you say it "works on main"?
Comment From: phofl
That it works on the pandas main branch
Comment From: frbelotto
I am running it in VS Code with the latest available versions. These are my installed versions:
pandas 1.5.2, Python 3.11.1
The crosstab itself runs normally if I just remove the line df = df.astype({'revenue':'Float64'}) (so the DataFrame is created with float64 instead of Float64).
Other nullable float dtypes generate the same error (Float32, for example).
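Based on that observation, one possible workaround on affected releases is to cast the values back to NumPy `float64` just for the crosstab call. This is only a sketch under the assumption that the column holds no missing values (`pd.NA` would become `NaN` after the cast); the `revenue` variable is illustrative, and only the arguments needed for the sum table are passed.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": [datetime(2022, 6, 1)] + [datetime(2022, 12, 31)] * 4,
    "revenue": [10.1] * 5,
}).astype({"revenue": "Float64"})

# Workaround sketch: convert the nullable Float64 column to NumPy float64
# only for the aggregation, leaving df itself unchanged.
revenue = df["revenue"].astype("float64")

table = pd.crosstab(
    df["id"],
    df["date"],
    values=revenue,
    aggfunc="sum",
    margins=True,
    margins_name="All",
)
print(table)
```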
Comment From: phofl
The pandas main branch is our development branch. I am not talking about anything that is released yet
Comment From: frbelotto
OOOwww, sorry. Lol. Finally got it.
Comment From: phofl
No worries
Comment From: natmokval
take
Comment From: frbelotto
Thanks @natmokval. Just another question.
As I've seen, the issue also occurs with Float32, for example, and you said Float64.
Does the fix cover all of them in one shot?
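One way to answer this empirically on a given installation is to run the same crosstab once per dtype and record which ones raise. The helper `crosstab_works` below is hypothetical (it is not a pandas function), and it assumes the failure surfaces as a `TypeError`, as in the traceback above.

```python
from datetime import datetime

import pandas as pd


def crosstab_works(dtype: str) -> bool:
    """Hypothetical helper: True if crosstab with margins accepts `revenue` as `dtype`."""
    df = pd.DataFrame({
        "id": [1, 2, 3, 4, 5],
        "date": [datetime(2022, 6, 1)] + [datetime(2022, 12, 31)] * 4,
        "revenue": [10.1] * 5,
    }).astype({"revenue": dtype})
    try:
        pd.crosstab(df["id"], df["date"], values=df["revenue"],
                    aggfunc="sum", margins=True)
    except TypeError:
        # Same exception type as reported in this issue.
        return False
    return True


# Compare the NumPy dtype with the two nullable float dtypes mentioned here.
results = {dtype: crosstab_works(dtype) for dtype in ("float64", "Float32", "Float64")}
print(pd.__version__, results)
```

The result depends on the installed pandas version, so no expected output is shown.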