Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

series1 = pd.DataFrame(
    [(0, "s2", pd.Period(2022)), (0, "s1", pd.Period(2021))],
    columns=["A", "B", "C"],
).set_index(["A", "B"])["C"]
series2 = series1.astype(str)
print(series1.unstack("B").reindex(["s2"], axis=1))
print(series2.unstack("B").reindex(["s2"], axis=1))
Issue Description
The example code prints
B s2
A
0 2021
B s2
A
0 2022
The result with pd.Period data is the incorrect 2021, but with str data it's the correct 2022.
This only occurs with certain index values. When replacing "s1" with "s3", the effect disappears and the result is 2022 in both cases.
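The dependence on the second label can be checked with a small helper that rebuilds the series for any label. This is only a sketch: the function name `reindexed_s2` and its parameters are made up for illustration and are not part of the original report.

```python
import pandas as pd


def reindexed_s2(other_label, as_str=False):
    """Rebuild the example series with `other_label` in place of "s1" and
    return the single value left after unstack("B") + reindex(["s2"])."""
    series = pd.DataFrame(
        [(0, "s2", pd.Period(2022)), (0, other_label, pd.Period(2021))],
        columns=["A", "B", "C"],
    ).set_index(["A", "B"])["C"]
    if as_str:
        series = series.astype(str)
    result = series.unstack("B").reindex(["s2"], axis=1)
    return result.iloc[0, 0]


# Compare the Period and str variants for both labels mentioned in the report.
for label in ("s1", "s3"):
    print(label, reindexed_s2(label), reindexed_s2(label, as_str=True))
```

Per the report, an affected version shows 2021 only for the Period variant with "s1", and 2022 in every other combination.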
Expected Behavior
Expect the result for both pd.Period and str data to be 2022:
B s2
A
0 2022
B s2
A
0 2022
(This output was actually observed with the older pandas 2.0.3.)
Installed Versions
Comment From: pedromfdiogo
take
Comment From: rhshadrach
Thanks for the report! Confirmed on main. There has to be some internal state that gets set wrong. Constructing the unstacked result directly:
import pandas as pd
import pandas._testing as tm

series1 = pd.DataFrame(
    [(0, "s2", pd.Period(2022)), (0, "s1", pd.Period(2021))],
    columns=["A", "B", "C"],
).set_index(["A", "B"])["C"]
unstacked = series1.unstack("B")
df = pd.DataFrame({"s1": pd.Period(2021), "s2": pd.Period(2022)}, index=pd.Index([0], name="A"))
df.columns.name = "B"
tm.assert_frame_equal(unstacked, df)
does not exhibit the buggy behavior when using reindex. Further investigation and PRs with a fix are welcome!
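One way to look for differing internal state is to compare the blocks behind the two frames. The sketch below relies on private pandas internals (`_mgr`, `.blocks`, and the block attributes), which are not a stable API and may differ between versions; `describe_blocks` is a hypothetical helper name, not something from pandas.

```python
import pandas as pd


def describe_blocks(frame):
    """Summarize each internal block of `frame`.

    Uses private attributes (frame._mgr.blocks); treat as a debugging aid only.
    """
    return [
        {
            "dtype": str(blk.dtype),
            "block_ndim": blk.ndim,
            "values_ndim": blk.values.ndim,
            "values_shape": blk.values.shape,
        }
        for blk in frame._mgr.blocks
    ]


series1 = pd.DataFrame(
    [(0, "s2", pd.Period(2022)), (0, "s1", pd.Period(2021))],
    columns=["A", "B", "C"],
).set_index(["A", "B"])["C"]
unstacked = series1.unstack("B")

direct = pd.DataFrame(
    {"s1": pd.Period(2021), "s2": pd.Period(2022)},
    index=pd.Index([0], name="A"),
)
direct.columns.name = "B"

# Any difference in ndim or shape between the two summaries is a lead.
print(describe_blocks(unstacked))
print(describe_blocks(direct))
```

Two frames can compare equal while being backed by differently shaped blocks, so a mismatch here would be consistent with only the unstacked frame misbehaving under reindex.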
Comment From: snitish
@rhshadrach I noticed that PeriodDtype (along with DatetimeTZDtype) is set to support 2D values in the block:
https://github.com/pandas-dev/pandas/blob/56847c56c59bc3edf8607aaacb4f2d1b544204b2/pandas/core/dtypes/dtypes.py#L1034
but in the method Block._validate_ndim we only check for DatetimeTZDtype and not PeriodDtype:
https://github.com/pandas-dev/pandas/blob/56847c56c59bc3edf8607aaacb4f2d1b544204b2/pandas/core/internals/blocks.py#L162
Adding PeriodDtype to the isinstance check seems to fix this issue, but I'm wondering if we should use dtype._supports_2d instead of individually comparing with each known 2D dtype.
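The flag mentioned above can be inspected directly on dtype instances. This is a rough sketch: `_supports_2d` is a private attribute, so `getattr` with a default is used here on the assumption that some pandas versions may not define it.

```python
import pandas as pd

dtypes = [
    pd.PeriodDtype("D"),
    pd.DatetimeTZDtype(tz="UTC"),
    pd.CategoricalDtype(),
]

# Map each dtype class name to its private _supports_2d flag
# (None means the attribute is missing in this pandas version).
flags = {type(dt).__name__: getattr(dt, "_supports_2d", None) for dt in dtypes}

for name, flag in flags.items():
    print(f"{name}: _supports_2d = {flag}")
```

Checking a single flag instead of a list of classes means any dtype that later gains 2D support is covered without another change to the isinstance check.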
Comment From: rhshadrach
@snitish - using dtype._supports_2d looks right to me. cc @jbrockmendel
Comment From: snitish
Thanks. I believe we can go one step further and use the is_1d_only_ea_dtype utility:
def _validate_ndim(self) -> bool:
    ...
-   return not isinstance(dtype, ExtensionDtype) or isinstance(
-       dtype, DatetimeTZDtype
-   )
+   return not is_1d_only_ea_dtype(self.dtype)
@pedromfdiogo are you able to take this? Note that this fix will also resolve #60273
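To see what the proposed change does per dtype, the two predicates from the diff can be compared as standalone functions. This is a sketch outside of the Block class: `old_check` and `new_check` are hypothetical names, and `is_1d_only_ea_dtype` is imported from the internal module `pandas.core.dtypes.common`, which is not public API.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.core.dtypes.common import is_1d_only_ea_dtype  # internal helper


def old_check(dtype):
    """Predicate from the removed lines of the diff."""
    return not isinstance(dtype, ExtensionDtype) or isinstance(
        dtype, pd.DatetimeTZDtype
    )


def new_check(dtype):
    """Predicate from the added line of the diff."""
    return not is_1d_only_ea_dtype(dtype)


candidates = {
    "int64": np.dtype("int64"),
    "DatetimeTZDtype": pd.DatetimeTZDtype(tz="UTC"),
    "PeriodDtype": pd.PeriodDtype("D"),
    "CategoricalDtype": pd.CategoricalDtype(),
}

# Show where the two predicates disagree.
for name, dtype in candidates.items():
    print(f"{name}: old={old_check(dtype)} new={new_check(dtype)}")
```

Among these four dtypes, the two predicates should differ only for PeriodDtype, which is the case this issue is about.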