Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from datetime import datetime

import pandas as pd


def main():
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    print("Before:", df.info())
    print(df.head(2))
    print()

    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']
    print("After:", df.info())
    print(df.head(2))


if __name__ == "__main__":
    main()
Issue Description
When doing a partial assignment through .loc[predicate, [column]] on a column with dtype datetime64[ns] (naive datetime), the column dtype changes to object and the assigned datetimes are represented as numbers instead of datetimes (the output below shows the epoch-nanosecond value 1642636800000000000).
With timezone-aware datetimes it works as expected, keeping the dtype datetime64[ns, timezone].
The reproducible example above prints the following:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns](1)
memory usage: 146.0 bytes
Before: None
       field  update
0 2022-01-20    True
1 2022-01-22   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      object
 1   update  2 non-null      bool
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes
After: None
                 field  update
0  1642636800000000000    True
1  2022-01-22 00:00:00   False
Expected Behavior
The assignment should change neither the values nor the dtype of the column.
For comparison, this is what the reproducible example prints when tzinfo=timezone.utc is added to the datetime values:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
Before: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
After: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False
As you can see, with a timezone the column keeps its dtype and the values are still interpreted as datetimes, not numbers.
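For reference, the timezone-aware variant described above can be sketched as follows. The helper names `build_frame` and `assign_back` are made up for this illustration and are not part of pandas; the assignment itself is the same one as in the reproducible example. The dtype is compared against its value before the assignment rather than a literal string, since the exact resolution shown may differ between pandas versions.

```python
from datetime import datetime, timezone

import pandas as pd


def build_frame(tz=None):
    """Build the two-row frame from the report (hypothetical helper).

    tz=None gives naive datetimes; tz=timezone.utc gives aware ones.
    """
    return pd.DataFrame({
        "field": pd.to_datetime([
            datetime(2022, 1, 20, tzinfo=tz),
            datetime(2022, 1, 22, tzinfo=tz),
        ]),
        "update": [True, False],
    })


def assign_back(df):
    """Repeat the masked .loc assignment from the report (hypothetical helper)."""
    df_to_update = df[df["update"]]
    df.loc[df["update"], ["field"]] = df_to_update["field"]
    return df


aware = build_frame(tz=timezone.utc)
dtype_before = aware["field"].dtype
aware = assign_back(aware)

# Per the report, the timezone-aware column keeps its dtype.
print(dtype_before, "->", aware["field"].dtype)
```

Calling `build_frame()` without a timezone and passing the result to `assign_back` gives the naive case from the report, which is the one that changed dtype on the affected version.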
Installed Versions
Comment From: phofl
Hi, thanks for your report. This works on main; it could use a test.
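A regression test along these lines might look like the sketch below. The test name is hypothetical and the layout does not follow any particular pandas test module; it assumes a pandas build that already contains the fix, since on the affected release the final comparison would fail.

```python
from datetime import datetime

import pandas as pd
import pandas.testing as tm


def test_loc_setitem_naive_datetime_keeps_dtype():
    # Hypothetical test name; mirrors the reproducible example from the report.
    df = pd.DataFrame({
        "field": pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        "update": [True, False],
    })
    expected = df.copy()

    # Assign the masked rows back onto themselves; nothing should change.
    df_to_update = df[df["update"]]
    df.loc[df["update"], ["field"]] = df_to_update["field"]

    # Checks values and dtypes, so an object-dtype column would fail here.
    tm.assert_frame_equal(df, expected)


test_loc_setitem_naive_datetime_keeps_dtype()
```

Comparing against a copy taken before the assignment covers both symptoms from the report at once: the dtype change and the altered value in the first row.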
Comment From: MarcoGorelli
Just for reference, looks like this was fixed by https://github.com/pandas-dev/pandas/pull/49161 (good one @phofl !)
https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=112905556
Comment From: alonme
take