• [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> from datetime import datetime
>>> from functools import partial
>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan
>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64
>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64
>>>
>>> pd.to_numeric(pd.NaT, errors="coerce")
nan
>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64
>>> pd.Series([pd.NaT]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Problem description

When using pd.to_numeric to convert a pd.Series with dtype datetime64[ns], the result differs from converting the series value by value.

Expected Output

Converting a pd.Series as a whole should be the same as converting it value by value. I am not sure what the correct output should be, but IMO the output should be consistent in these two scenarios.

What I suggest:

• For non-null values, return the same value. Maybe the integer?

• For pd.NaT, always return np.NaN

Output of pd.show_versions()

I am using the latest version of master as of today.

INSTALLED VERSIONS
------------------
commit           : e39ea3024cebb4e7a7fd35972a44637de6c41650
python           : 3.8.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8

pandas           : 1.4.0.dev0+517.gc3761e24d8
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.2.0.post20200714
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : None
sphinx           : 3.1.2
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.0
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.9
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1

Comment From: DAKSHA2001

Assign me this issue; I will solve it. I have to make an open source contribution as part of a Microsoft Research intern role, so please assign this to me.

Comment From: ShreyasPatel031

take

Comment From: Navaneethan2503

This PR will close this issue: #43289. Thanks for this issue, @hec10r. Closes #43280.

Comment From: hec10r

take

Comment From: hec10r

After some investigation, it looks like this behavior is explained by https://github.com/numpy/numpy/issues/19782, so I am not sure if this is a pandas issue or needs to be fixed in numpy. Before creating a PR here I would like to have input from the maintainers.

Comment From: mroeschke

The key point to note is that datetime objects in pandas get converted to datetime64[ns] (this has been the convention for a while) rather than kept as object, unless dtype=object is specified.
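
A quick illustration of that dtype inference (stock pandas behavior, nothing assumed beyond the thread's imports):

>>> import pandas as pd
>>> from datetime import datetime
>>> pd.Series([datetime(2021, 8, 22)]).dtype
dtype('<M8[ns]')
>>> pd.Series([datetime(2021, 8, 22)], dtype=object).dtype
dtype('O')

So that being said: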

Incorrect, should be 1629590400000000000

>>> pd.to_numeric(datetime(2021, 8, 22), errors="coerce")
nan

Correct

>>> pd.to_numeric(pd.Series(datetime(2021, 8, 22)), errors="coerce")
0    1629590400000000000
dtype: int64

Incorrect, should be 1629590400000000000

>>> pd.Series([datetime(2021, 8, 22)]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Correct

>>> pd.to_numeric(pd.NaT, errors="coerce")
nan

Incorrect, probably related to https://github.com/pandas-dev/pandas/issues/16674

>>> pd.to_numeric(pd.Series(pd.NaT), errors="coerce")
0   -9223372036854775808
dtype: int64
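
That sentinel is the int64 minimum, which pandas uses internally to represent NaT; a quick numpy check:

>>> import numpy as np
>>> np.iinfo(np.int64).min
-9223372036854775808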

Correct

>>> pd.Series([pd.NaT]).apply(partial(pd.to_numeric), errors="coerce")
0   NaN
dtype: float64

Comment From: hec10r

Hi Matthew, thanks for answering.

That's the other approach that I considered, but since there was no documentation of the pd.to_numeric behavior for date-like objects and I didn't find it very intuitive, I thought that changing the whole behavior to something more intuitive would be good.

The use case I am struggling with is this one:

>>> import pandas as pd
>>> from datetime import datetime
>>> pd.to_numeric(pd.Series([datetime(2021,8,22)]), errors="raise") # Same for errors="ignore"/"coerce"
0    1629590400000000000
dtype: int64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="coerce")
0    NaN
1    1.0
2    NaN
dtype: float64
>>> pd.to_numeric(pd.Series(["apple", 1, datetime(2021,8,22)]), errors="ignore")
0                  apple
1                      1
2    2021-08-22 00:00:00
dtype: object

Is this desired/expected? Should we return 1629590400000000000 in the three cases?

Comment From: mroeschke

Sorry, yes, the documentation could use improvement to address how datetime-like objects (e.g. np.datetime64, datetime.datetime, etc.) are treated.

I would expect 1629590400000000000 in all 3 cases, though this case is a bit tricky since the entire Series has dtype=object but certain elements in that Series can be converted to numbers. This aligns with what you mentioned in your OP:

Converting a pd.Series as a whole should be the same as converting it value by value.

Comment From: hec10r

IMO, the best approach would be to have a function pd._scalar_to_numeric and then call it inside pd.to_numeric, something like:

from functools import partial

def to_numeric(arg, errors="raise", downcast=None):
    # Pseudocode: helper names like is_series_or_index,
    # is_list_tuple_or_np_array and pd._scalar_to_numeric are
    # illustrative, not existing pandas API.
    if is_scalar(arg):
        return pd._scalar_to_numeric(arg, errors=errors, downcast=downcast)
    elif is_series_or_index(arg):
        # Series.apply forwards extra keyword arguments to the mapped function
        return arg.apply(pd._scalar_to_numeric, errors=errors, downcast=downcast)
    elif is_list_tuple_or_np_array(arg):
        # Maybe keeping the dtype as well for `np.array`?
        convert = partial(pd._scalar_to_numeric, errors=errors, downcast=downcast)
        return np.array(list(map(convert, arg)))

What do you think?

Comment From: mroeschke

Probably the best approach would be to modify this line here to handle datetime-like scalars (use one of the existing functions in pandas.core.dtypes.common):

https://github.com/pandas-dev/pandas/blob/5f648bf1706dd75a9ca0d29f26eadfbb595fe52b/pandas/core/tools/numeric.py#L155
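
For illustration, a minimal sketch of such a scalar branch (_coerce_datetime_scalar is a hypothetical helper; pd.Timestamp.value, nanoseconds since the epoch, is existing pandas API):

>>> from datetime import datetime
>>> import pandas as pd
>>> def _coerce_datetime_scalar(value):  # hypothetical helper
...     if isinstance(value, (datetime, pd.Timestamp)):
...         return pd.Timestamp(value).value  # nanoseconds since epoch
...     return value
...
>>> _coerce_datetime_scalar(datetime(2021, 8, 22))
1629590400000000000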

Comment From: hec10r

If we only do that, the problem will persist for lists, tuples and np.arrays.

Comment From: mroeschke

Best to address scalars vs array-likes in separate PRs. I imagine addressing array-likes may need to use one of the datetime inference functions for arrays.
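
For instance, pd.api.types.infer_dtype (existing pandas API; whether it fits this code path is an assumption) already distinguishes datetime-like arrays:

>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd
>>> pd.api.types.infer_dtype([datetime(2021, 8, 22)])
'datetime'
>>> pd.api.types.infer_dtype([np.datetime64("2021-08-22")])
'datetime64'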

Comment From: hec10r

I don't see a way of fixing array-likes without using the same logic as for scalars. Inferring the type of the array isn't enough given all the possible cases. Having a function that handles scalars and then mapping it element by element over iterables seems to be the easiest solution.

Comment From: mroeschke

mapping this function element by element for iterables seems to be the easiest solution

This will kill performance for the existing cases, so that implementation is probably a non-starter.

Comment From: hec10r

Understood. Then let's split the problem:

1. PR to fix behavior for scalar date-like objects, including pd.NaT (will do this before the weekend).
2. PR to fix behavior for iterables: infer types for lists/tuples and use the current approach for np.array when the type is number-like. Approach for mixed types TBD (open to discuss and implement).

Comment From: jack5github

This is still an ongoing issue as of the 13th of February, 2025. It could theoretically be fixed by using the nullable Int64 dtype, which supports NA values (see the sketch after the example below).

from datetime import datetime
import pandas as pd

print(pd.to_numeric(pd.Series([datetime(2025, 2, 13), pd.NaT])))

"""
0    1739404800000000000
1   -9223372036854775808
dtype: int64
"""