Code Sample

from pandas import DataFrame, to_datetime

df1 = DataFrame({'A': [1, None], 'B': [to_datetime('abc', errors='coerce'), to_datetime('2016-01-01')]})
df2 = DataFrame({'A': [2, 3]})
df1.update(df2, overwrite=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a766b5317aac> in <module>()
      1 df1 = DataFrame({'A': [1,None], 'B':[to_datetime('abc', errors='coerce'),to_datetime('2016-01-01')]})
      2 df2 = DataFrame({'A': [2,3]})
----> 3 df1.update(df2, overwrite=False)

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/frame.py in update(self, other, join, overwrite, filter_func, raise_conflict)
   3897
   3898             self[col] = expressions.where(mask, this, that,
-> 3899                                           raise_on_error=True)
   3900
   3901     # ----------------------------------------------------------------------

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in where(cond, a, b, raise_on_error, use_numexpr)
    229
    230     if use_numexpr:
--> 231         return _where(cond, a, b, raise_on_error=raise_on_error)
    232     return _where_standard(cond, a, b, raise_on_error=raise_on_error)
    233

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_numexpr(cond, a, b, raise_on_error)
    152
    153     if result is None:
--> 154         result = _where_standard(cond, a, b, raise_on_error)
    155
    156     return result

~/Envs/pandas-dev/lib/python3.6/site-packages/pandas/pandas/core/computation/expressions.py in _where_standard(cond, a, b, raise_on_error)
    127 def _where_standard(cond, a, b, raise_on_error=True):
    128     return np.where(_values_from_object(cond), _values_from_object(a),
--> 129                     _values_from_object(b))
    130
    131

TypeError: invalid type promotion

Problem description

This is similar to issue #15593, which was fixed in pandas 0.20.2; however, NaT values anywhere in the DataFrame still raise the exception shown above: TypeError: invalid type promotion.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: 2.9.2
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.24
numpy: 1.13.0
scipy: 0.17.1
xarray: None
IPython: 6.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.4.2
numexpr: 2.6.2
feather: 0.3.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.5.1
html5lib: 0.999999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: olizhu

I've just tested some more and it seems that the error occurs whenever there is a null object in a column containing datetimes. So replacing NaT with NaN still has the same error.
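
A quick check consistent with this: even when the missing value is written as NaN, pandas typically infers a datetime64[ns] column and stores the missing entry as NaT again, so swapping NaT for NaN doesn't change the dtype that update ends up having to handle.

import numpy as np
import pandas as pd

s = pd.Series([np.nan, pd.Timestamp('2016-01-01')])
print(s.dtype)   # expected: datetime64[ns] -- the NaN is stored as NaT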

Comment From: TomAugspurger

So when we reindex df2 like df1, we end up with different dtypes:

In [22]: df2.reindex_like(df1).dtypes
Out[22]:
A      int64
B    float64
dtype: object
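
For reference, a minimal sketch (numpy only, not taken from the report above) of why that combination fails: np.where cannot promote datetime64 and float64 to a common dtype, which is exactly the TypeError in the traceback.

import numpy as np

mask = np.array([True, False])
dates = np.array(['NaT', '2016-01-01'], dtype='datetime64[ns]')
floats = np.array([np.nan, np.nan])   # what column B looks like after reindex_like

np.where(mask, dates, floats)   # TypeError: invalid type promotion (message wording varies by numpy version)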

I wonder if we could add a parameter to reindex_like for controlling the dtypes of columns that are created. I could see that being broadly useful.

Comment From: sboltz

I just encountered this issue and was wondering if there were any updates or workarounds for it? Thanks.

Comment From: IanFLee

I'm also having this issue in sklearn.preprocessing with StandardScaler(). It definitely seems to be a datetime issue, so I've just dropped that column for the time being, but eventually I'll need it back, so fingers crossed.

Comment From: wxing11

take

Comment From: wxing11

Hi, new contributor here so please correct me if I'm wrong!

This seems to be caused by situations where the DataFrame to be updated has a datetime column with NaT values and the input DataFrame has either

  1. A matching column by index but of a type that isn't datetime/object. I assume an error here is expected.
  2. No matching column by index, so the call to reindex_like in the update function creates a column that isn't of type datetime/object. (The example case above.)

Since in the second case the created column contains only NA values, would it be reasonable to solve this by adding a check to the function that skips updating any column consisting entirely of NA values? A rough sketch of what I mean is below.
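
Roughly what I have in mind, as a standalone sketch (the actual change would live inside DataFrame.update, and the helper name here is just for illustration):

import pandas as pd

def update_skipping_all_na(target: pd.DataFrame, other: pd.DataFrame) -> None:
    # Align `other` to `target` as update does today; reindex_like may create
    # placeholder columns that are entirely NA.
    other = other.reindex_like(target)
    for col in target.columns:
        that = other[col]
        if that.isna().all():
            continue                          # all-NA column: skip instead of hitting np.where
        mask = target[col].isna()             # overwrite=False semantics: only fill NA slots
        target.loc[mask & that.notna(), col] = that

df1 = pd.DataFrame({'A': [1, None],
                    'B': [pd.NaT, pd.Timestamp('2016-01-01')]})
df2 = pd.DataFrame({'A': [2, 3]})
update_skipping_all_na(df1, df2)              # A -> [1.0, 3.0]; B untouched; no TypeError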

I've created a PR with an implementation of this, as well as a couple of new test cases, including the one introduced above.

Comment From: MarcoGorelli

I wonder if we could add a parameter to reindex_like for controlling the dtypes of columns that are created

How would this work? Would the dtype be taken from the other DataFrame in reindex? Because if so, one issue would be with null columns getting converted to bool:

In [1]: pd.Series([np.nan, np.nan]).astype(bool)
Out[1]: 
0    True
1    True
dtype: bool

Alternatively, there could be an option to exclude null columns from the result of reindex_like, but that would still require an update to

https://github.com/pandas-dev/pandas/blob/2d126dd0c5fd9768a772ffefede956dfff827667/pandas/core/frame.py#L8196-L8198

to skip over columns which aren't in both this and that

At the moment, I'm struggling to see a simpler solution than the one proposed in https://github.com/pandas-dev/pandas/pull/49395. cc @mroeschke (as you'd commented on the PR)

Comment From: mroeschke

Maybe a full reindex_like is not needed here, as only the shared columns should be updated?
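
For illustration, a minimal user-side sketch of that idea (a workaround rather than the library fix itself): restricting the update to the shared columns keeps reindex_like from manufacturing an all-NaN float column next to the datetime one.

import pandas as pd

df1 = pd.DataFrame({'A': [1, None],
                    'B': [pd.NaT, pd.Timestamp('2016-01-01')]})
df2 = pd.DataFrame({'A': [2, 3]})

common = df1.columns.intersection(df2.columns)
sub = df1[common].copy()
sub.update(df2[common], overwrite=False)
df1[common] = sub                              # A -> [1.0, 3.0]; B is left alone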

Comment From: wxing11

I pushed a new commit to my PR that only reindexes rows and then skips non-matching columns. Does that seem right for what you were saying?