>>> import numpy, pandas
>>> d = '2013-02-02T02:12:34+0600'
>>> ts_from_str = pandas.Timestamp(d)
>>> ts_from_dt64 = pandas.Timestamp(numpy.datetime64(d))
>>> ts_from_dt64.tzinfo
>>> ts_from_str.tzinfo
pytz.FixedOffset(360)
NumPy documentation is clear about datetime64 representing 'a single moment in time' and being 'always stored based on POSIX time'. It seems to me that a Timestamp object constructed from datetime64 should always be tz-aware, so that the equality comparison between ts_from_dt64 and ts_from_str would return True, instead of raising an error.
This part seems easy and I can create a PR if what I wrote makes sense. Since datetime64 doesn't seem to store offsets, we probably can't get ts_from_dt64 and ts_from_str to print the same string.
Comment From: jreback
This is not correct. numpy datetime64 values DO NOT have a timezone associated with them. However, they DISPLAY in the local timezone. They are actually completely naive.
In [6]: Timestamp('20130101 09:00:01')
Out[6]: Timestamp('2013-01-01 09:00:01')
In [7]: Timestamp('20130101 09:00:01').value
Out[7]: 1357030801000000000
In [8]: np.datetime64(Timestamp('20130101 09:00:01').value,'ns')
Out[8]: numpy.datetime64('2013-01-01T04:00:01.000000000-0500')
And from your example above:
In [33]: ts_from_str.value
Out[33]: 1359749554000000000
In [34]: ts_from_dt64.value
Out[34]: 1359749554000000000
These ARE the same time. The displayed tz WILL change depending on YOUR timezone (or wherever your code is run). In point of fact, this is a horrible property to have.
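To make the "same time" point reproducible without relying on how numpy parses or displays offsets, here is a minimal sketch using pandas only. The names aware, naive and localized are made up for illustration, and naive is written by hand as the UTC wall-clock time that corresponds to the +0600 string from the example above.

```python
import pandas as pd

# tz-aware: parsed from the string with a +0600 offset
aware = pd.Timestamp('2013-02-02T02:12:34+0600')

# tz-naive: the same instant written as UTC wall-clock time (an assumption
# made for this sketch; it mirrors what the datetime64 round trip stores)
naive = pd.Timestamp('2013-02-01T20:12:34')

# Both carry the same number of nanoseconds since the epoch
print(aware.value, naive.value)

# Only one of them knows about a timezone
print(aware.tzinfo, naive.tzinfo)

# Declaring the naive value to be UTC makes the two directly comparable
localized = naive.tz_localize('UTC')
print(localized == aware)
```

The underlying integers agree; the only difference is whether a tz is attached, which is why the naive value has to be localized explicitly before comparing.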
Comment From: kawochen
I am not concerned about the display. I was thinking that pandas.Timestamp(d)==pandas.Timestamp(numpy.datetime64(d)) should always evaluate to True, but then I guess it's not possible to make it work for both cases (d = '2013-02-02T02:00' and d = '2013-02-02T02:00+0000').
Comment From: jorisvandenbossche
@kawochen it is indeed not possible to get this working with the current numpy, as with the conversion in numpy.datetime64(d) the timezone info that is contained in d is lost.
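A rough sketch of what "lost" means in practice, and of one possible workaround: keep the offset yourself and reattach it after the round trip. This is only an illustration, not something pandas does for you; the names offset, back, restored and shown are hypothetical.

```python
from datetime import timezone

import pandas as pd

aware = pd.Timestamp('2013-02-02T02:12:34+0600')

# Remember the offset separately, since datetime64 has nowhere to store it
offset = aware.utcoffset()

# The datetime64 value holds only the UTC instant
dt64 = aware.to_datetime64()

# Coming back from datetime64 gives a tz-naive Timestamp
back = pd.Timestamp(dt64)
print(back.tzinfo)

# Declare it UTC to compare against the original
restored = back.tz_localize('UTC')
print(restored == aware)

# Reattach the saved offset to get the original wall-clock display back
shown = restored.tz_convert(timezone(offset))
print(shown)
```

Without the separately saved offset, the best you can recover is the UTC instant, which is why the comparison cannot work for both the offset and no-offset forms of d.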
Comment From: jreback
@kawochen it's not about display at all, rather, as @jorisvandenbossche points out, a lack of timezone support on the numpy side (with the confusion being that it actually DISPLAYs a timezone). pandas tz handling is quite good, even handling weird things like ambiguous time transitions over DST changes.
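For the ambiguous DST case, a small sketch of what that handling looks like. The date and zone are picked for illustration (clocks in America/New_York went back on 2013-11-03, so 01:30 occurred twice that night), and the names wall, first and second are hypothetical.

```python
import pandas as pd

# A wall-clock time that happened twice when DST ended
wall = pd.Timestamp('2013-11-03 01:30:00')

# ambiguous=True picks the DST (earlier) occurrence,
# ambiguous=False picks the standard-time (later) occurrence
first = wall.tz_localize('America/New_York', ambiguous=True)
second = wall.tz_localize('America/New_York', ambiguous=False)

print(first)
print(second)

# Same wall-clock reading, but the instants are one hour apart
print(second - first)
```

Leaving ambiguous at its default makes tz_localize raise instead of guessing, so the caller has to choose which occurrence is meant.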