Code and results:
import pandas as pd
data = [{'dt': '2020-01-11T11:00:00'},{'dt': ''}, {'dt': '2021-04-19T10:00:00+08'}]
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data)
assert df1['dt'].dtype.name == 'object'
assert df2['dt'].dtype.name == 'object'
df1['dt'] = pd.to_datetime(df1['dt'])
df2['dt'] = df2['dt'].apply(pd.to_datetime)
print(df1['dt'][1], type(df1['dt'][1])) # Result: str: 'NaT'
print(df2['dt'][1], type(df2['dt'][1])) # Result: pandas._libs.tslibs.nattype.NaTType: NaT
Comment From: ishansmishra
I think you're trying to print df1['dt'][0]
print(df1['dt'][0], type(df1['dt'][0])) # Result: 2020-01-11 11:00:00 <class 'datetime.datetime'>
print(df2['dt'][0], type(df2['dt'][0])) # Result: 2020-01-11 11:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
Close this issue if you think this is right please :)
Comment From: phofl
Please describe your expected result or address @legitishan's comment.
Comment From: MarcoGorelli
I presume the expectation is that the output from these should be the same (TBH I don't know what's correct):
>>> import pandas as pd
>>> data = [{'dt': '2020-01-11T11:00:00'},{'dt': ''}, {'dt': '2021-04-19T10:00:00+08'}]
>>> pd.DataFrame(data)['dt'].apply(pd.to_datetime)[1]
NaT
>>> pd.to_datetime(pd.DataFrame(data)['dt'])[1]
'NaT'
Comment From: srotondo
take
Comment From: WangWenQiang
I have received your email, thank you!
Comment From: srotondo
This is actually not a bug. As to_datetime()'s documentation describes, when you pass in a list of timezone-aware inputs with mixed timezones, you get a list of datetime.datetime objects. This is a class from base Python, and therefore has no special class for NaT, so to_datetime() returns the missing value as a str instead. The reason this doesn't happen in the second example in the original post is that the apply() function invokes to_datetime() on each object in the DataFrame individually, and with a single value, to_datetime() will convert to pandas' Timestamp class.
Given this, I believe there is nothing else that has to be done, so this issue report should be closed.
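For anyone who wants to see the element-wise path in isolation, here is a minimal sketch. It assumes the same data shape as the original post, except that the offset is written out as "+08:00" (my choice, not from the report), and it uses a plain list comprehension rather than apply() so the result does not depend on how a Series infers its dtype. The names raw, parsed and missing are only illustrative.

```python
import pandas as pd

# Same shape as the data in the original post; the "+08:00" offset is an
# assumption made for this sketch (the report used "+08").
data = [
    {"dt": "2020-01-11T11:00:00"},
    {"dt": ""},
    {"dt": "2021-04-19T10:00:00+08:00"},
]
raw = pd.DataFrame(data)["dt"]

# Parse each value on its own, so to_datetime() always receives a scalar.
parsed = [pd.to_datetime(value) for value in raw]

# pd.isna() recognises the NaT singleton, but not the string 'NaT'.
missing = [pd.isna(value) for value in parsed]
print(missing)
print(pd.isna("NaT"))
```

The practical point is that a check like pd.isna() only catches the real NaT object, so a str 'NaT' coming out of the vectorised call would slip through such a check.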
Comment From: MarcoGorelli
As of PDEP4, this would error anyway:
ValueError: time data "2021-04-19T10:00:00+08" at position 2 doesn't match format "%Y-%m-%dT%H:%M:%S"
And if you pass an element that's already parsed, then you'll consistently get NaT in the result:
In [26]: data = [{'dt': '2020-01-11T11:00:00'},{'dt': ''}, {'dt': pd.Timestamp('2021-04-19T10:00:00+08')}]
...: df1 = pd.DataFrame(data)
...: df2 = pd.DataFrame(data)
...: assert df1['dt'].dtype.name == 'object'
...: assert df2['dt'].dtype.name == 'object'
...: df1['dt'] = pd.to_datetime(df1['dt'])
...: df2['dt'] = df2['dt'].apply(pd.to_datetime)
...: print(df1['dt'][1], type(df1['dt'][1]))
...: print(df2['dt'][1], type(df2['dt'][1]))
NaT <class 'pandas._libs.tslibs.nattype.NaTType'>
NaT <class 'pandas._libs.tslibs.nattype.NaTType'>
Closing then, but thanks for the report!
Comment From: WangWenQiang
I have received your email, thank you!