Performance of read_csv with parse_dates: cache parsed dates

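Caching (memoizing) the results of the date_parser function in read_csv might be an easy performance improvement when a column contains many repeated date strings. As far as I can tell the parsed values are not cached, unless I am missing something? In the session below, reading 1,000,000 identical timestamps takes 774 ms without date parsing and 1.58 s with parse_dates, while mapping the same string column through a one-entry dict takes only about 99 ms.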

In [53]: import datetime

In [54]: import pandas as pd

In [55]: df = pd.DataFrame([datetime.datetime.today()] * 1000000)

In [56]: df.to_csv('j', index=False)

In [57]: !gzip j

In [58]: %time df = pd.read_csv('j.gz')
CPU times: user 703 ms, sys: 68.7 ms, total: 772 ms
Wall time: 774 ms

In [59]: d = {df['0'][0]: datetime.datetime.today()}

In [60]: %time s = df['0'].map(d)
CPU times: user 84.8 ms, sys: 14.8 ms, total: 99.6 ms
Wall time: 99.2 ms

In [61]: %time df = pd.read_csv('j.gz', parse_dates=['0'])
CPU times: user 1.49 s, sys: 88.7 ms, total: 1.58 s
Wall time: 1.58 s
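
A minimal sketch of the kind of caching meant here, applied after the fact rather than inside read_csv. It assumes the column was read as plain strings (no parse_dates) and that all values share one format. `parse_dates_cached` is a hypothetical helper name for illustration, not a pandas API, and this is not necessarily how pandas would implement it internally.

```python
import pandas as pd


def parse_dates_cached(values: pd.Series) -> pd.Series:
    """Parse a column of date strings, converting each distinct string once.

    Hypothetical helper: it only pays off when the column has many repeats.
    """
    # Collect the distinct non-missing strings in the column.
    uniques = values.dropna().unique()
    # Parse each distinct string exactly once and remember the result.
    lookup = dict(zip(uniques, pd.to_datetime(uniques)))
    # Map every row through the lookup table; missing values become NaT.
    return values.map(lookup)
```

Usage would look roughly like `df = pd.read_csv('j.gz')` followed by `df['0'] = parse_dates_cached(df['0'])`. With mostly unique values the lookup table gives no benefit and only adds overhead, so any real implementation would probably want to check for repetition first.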

Comment From: chris-b1

Yep, caching certainly could help. This is a duplicate of #11665. PR welcome!