pandas.to_datetime called with an int is too slow for my use case. Basically, I have a loop that sequentially gets an integer from a generator of about 1 000 000 numbers, converts it to a pandas.Timestamp, and passes it to a function. A profiler shows that the pandas.to_datetime call accounts for about 40% of my program's total run time.

Compared to datetime.datetime.fromtimestamp, it's more than 60 times slower:

$ python -m timeit -n 1000000 -s 'import datetime' 'datetime.datetime.fromtimestamp(30, tz=datetime.timezone.utc)'
1000000 loops, best of 3: 0.889 usec per loop
$ python -m timeit -n 1000000 -s 'import pandas' 'pandas.to_datetime(30, utc=True, unit="s")'
1000000 loops, best of 3: 62.8 usec per loop
$ python -c 'import pandas;pandas.show_versions()'

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Can you please provide/document a faster way to instantiate a pandas.Timestamp from an epoch time?

Comment From: jreback

why would you do this in a loop? simply pass the entire list

Comment From: jreback

In [10]: r = list(range(100000))

In [11]: %timeit [ datetime.datetime.fromtimestamp(30+v, tz=datetime.timezone.utc) for v in r ]
1 loop, best of 3: 251 ms per loop

In [12]: %timeit pd.to_datetime(r, utc=True, unit='s')
10 loops, best of 3: 84.5 ms per loop

Comment From: radekholy24

Because all of the data from the generator does not fit into memory? But yeah, I can do that in my case.
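For a stream that does not fit into memory, a middle ground between a per-value loop and one giant list is to convert in fixed-size chunks, so each pd.to_datetime call is still vectorized. This is a minimal sketch, not from the thread; the function name and the chunk size are illustrative choices.

```python
# Convert a large stream of epoch seconds in fixed-size chunks so the
# whole stream never has to fit into memory at once, while still using
# the vectorized pd.to_datetime path.
from itertools import islice

import pandas as pd


def convert_in_chunks(epochs, chunk_size=100_000):
    """Yield pd.Timestamp objects from an iterable of epoch seconds."""
    it = iter(epochs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # One vectorized call per chunk instead of one call per value.
        for ts in pd.to_datetime(chunk, utc=True, unit="s"):
            yield ts


# Example: a lazy generator, consumed chunk by chunk.
stream = (i for i in range(5))
for ts in convert_in_chunks(stream, chunk_size=2):
    print(ts)
```

Larger chunks amortize the per-call overhead better at the cost of more memory per chunk.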

Comment From: TomAugspurger

FYI @PyDeq

In [652]: %timeit pd.Timestamp.utcfromtimestamp(30)
The slowest run took 11.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.76 µs per loop

vs

In [653]: %timeit datetime.datetime.fromtimestamp(30, tz=datetime.timezone.utc)
The slowest run took 14.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.62 µs per loop

But agreed with @jreback, you're much better off using vectorized methods in pandas.

Comment From: radekholy24

@TomAugspurger thanks. Unfortunately, pd.Timestamp.utcfromtimestamp is not documented.

Comment From: TomAugspurger

Mind opening a PR to fix that?

Comment From: radekholy24

No promises, but sure, I can consider opening a PR if I find some spare time.

Comment From: TomAugspurger

https://github.com/pandas-dev/pandas/issues/5218 seems to be the reason it's not in the API docs at the moment.

Comment From: jorisvandenbossche

Using the plain Timestamp constructor is actually also fast:

In [51]: %timeit pd.Timestamp.utcfromtimestamp(30)
The slowest run took 39.10 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.77 µs per loop

In [52]: %timeit pd.Timestamp(30, unit='s', tz='UTC')
The slowest run took 14.56 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.97 µs per loop

And this one is documented, so I would prefer it over Timestamp.utcfromtimestamp.

(and it also gives me the impression that the performance of to_datetime can certainly be improved for this case)
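As a sanity check (not from the thread itself), the documented constructor, to_datetime, and the standard library all agree on the resulting point in time for an integer epoch:

```python
# Verify that the fast documented constructor produces the same instant
# as pd.to_datetime and datetime.datetime.fromtimestamp for epoch seconds.
import datetime

import pandas as pd

a = pd.Timestamp(30, unit="s", tz="UTC")
c = pd.to_datetime(30, utc=True, unit="s")
d = datetime.datetime.fromtimestamp(30, tz=datetime.timezone.utc)

assert a == c == d
print(a)  # 1970-01-01 00:00:30+00:00
```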

Comment From: radekholy24

@jorisvandenbossche, what document do you mean? So far, I've found only examples with strings as the first argument and no unit or tz arguments. Anyway, thanks. That is actually what I expected to be the resolution of this request.