Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import datetime

import pandas as pd


def main():
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    print("Before:", df.info())
    print(df.head(2))
    print()
    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']
    print("After:", df.info())
    print(df.head(2))


if __name__ == "__main__":
    main()

Issue Description

When doing a partial assignment to a selection made with .loc[predicate, [column]] on a column with dtype datetime64[ns] (naive datetime), the column dtype changes to object and the assigned datetimes are stored as raw integers (nanoseconds since the epoch) instead of datetimes.

With timezone-aware datetimes it works as expected: the dtype stays datetime64[ns, timezone].

The reproducible example above prints the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns](1)
memory usage: 146.0 bytes
Before: None
       field  update
0 2022-01-20    True
1 2022-01-22   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      object
 1   update  2 non-null      bool
dtypes: bool(1), object(1)
memory usage: 146.0+ bytes
After: None
                 field  update
0  1642636800000000000    True
1  2022-01-22 00:00:00   False

Expected Behavior

The assignment should change neither the values nor the dtype of the column.

For comparison, this is what the reproducible example prints when tzinfo=timezone.utc is added to the values:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
Before: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   field   2 non-null      datetime64[ns, UTC]
 1   update  2 non-null      bool
dtypes: bool(1), datetime64[ns, UTC](1)
memory usage: 146.0 bytes
After: None
                      field  update
0 2022-01-20 00:00:00+00:00    True
1 2022-01-22 00:00:00+00:00   False

As you can see, with a timezone the column keeps its dtype and the values are still interpreted as datetimes, not raw integers.
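For reference, the timezone-aware variant described above could look roughly like the sketch below. The helper names build_frame and partial_assign are only illustrative and are not part of the original example; the assignment itself is the one from the report.

```python
from datetime import datetime, timezone

import pandas as pd


def build_frame(tz=None):
    """Build the two-row frame from the report; tz=None gives naive datetimes."""
    return pd.DataFrame({
        'field': pd.to_datetime([
            datetime(2022, 1, 20, tzinfo=tz),
            datetime(2022, 1, 22, tzinfo=tz),
        ]),
        'update': [True, False],
    })


def partial_assign(df):
    """Repeat the .loc assignment from the report and return the frame."""
    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']
    return df


# Timezone-aware case: the report says this keeps the dtype and the values.
aware = partial_assign(build_frame(tz=timezone.utc))
print(aware.dtypes)
print(aware)
```

Calling partial_assign(build_frame()) instead runs the naive case, which is the one that shows the dtype change on 1.5.1.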

Installed Versions

❯ python
Python 3.8.14 (default, Oct 10 2022, 16:44:50)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 91111fd99898d9dcaa6bf6bedb662db4108da6e6
python           : 3.8.14.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.19.13-arch1-1
Version          : #1 SMP PREEMPT_DYNAMIC Tue, 04 Oct 2022 14:36:58 +0000
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : es_ES.UTF-8
LOCALE           : es_ES.UTF-8

pandas           : 1.5.1
numpy            : 1.23.5
pytz             : 2022.6
dateutil         : 2.8.2
setuptools       : 56.0.0
pip              : 22.0.4
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None

Comment From: phofl

Hi, thanks for your report. This works on main, could use a test
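A regression test for this could be sketched along the following lines. This is only a rough outline: the test name is made up, and it uses the public pandas.testing module, whereas the pandas test suite has its own helpers and layout. It is expected to pass only on versions where the bug is fixed.

```python
from datetime import datetime

import pandas as pd
import pandas.testing as tm


def test_loc_setitem_naive_datetime_keeps_dtype():
    # Hypothetical test name; the frame mirrors the reproducible example.
    df = pd.DataFrame({
        'field': pd.to_datetime([datetime(2022, 1, 20), datetime(2022, 1, 22)]),
        'update': [True, False],
    })
    expected = df.copy()

    # Assign the selected rows back onto themselves, as in the report.
    df_to_update = df[df['update']]
    df.loc[df['update'], ['field']] = df_to_update['field']

    # Neither the values nor the dtype of the column should change.
    tm.assert_frame_equal(df, expected)
    return df


result = test_loc_setitem_naive_datetime_keeps_dtype()
```

Comparing against a copy taken before the assignment avoids hard-coding the dtype string in the test.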

Comment From: MarcoGorelli

Just for reference, looks like this was fixed by https://github.com/pandas-dev/pandas/pull/49161 (good one @phofl !)

https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=112905556

Comment From: alonme

take