Code Sample, a copy-pastable example if possible
import dateutil.parser
import pandas as pd

x = [u'2016-07-19T00:00:00.000Z',
u'2016-07-19T01:00:00.000Z',
u'2016-07-19T02:00:00.000Z',
u'2016-07-19T03:00:00.000Z',
u'2016-07-19T04:00:00.000Z',
u'2016-07-19T05:00:00.000Z',
u'2016-07-19T06:00:00.000Z',
u'2016-07-19T07:00:00.000Z',
u'2016-07-19T08:00:00.000Z',
u'2016-07-19T09:00:00.000Z',
u'2016-07-19T10:00:00.000Z',
u'2016-07-19T11:00:00.000Z',
u'2016-07-19T12:00:00.000Z',
u'2016-07-19T13:00:00.000Z',
u'2016-07-19T14:00:00.000Z',
u'2016-07-19T15:00:00.000Z',
u'2016-07-19T16:00:00.000Z',
u'2016-07-19T17:00:00.000Z',
u'2016-07-19T18:00:00.000Z',
u'2016-07-19T19:00:00.000Z',
u'2016-07-19T20:00:00.000Z',
u'2016-07-19T21:00:00.000Z',
u'2016-07-19T22:00:00.000Z',
u'2016-07-19T23:00:00.000Z']
print pd.Series([dateutil.parser.parse(y) for y in x])
print pd.Series([dateutil.parser.parse(y) for y in (x + x + x)])
Expected Output
0 2016-07-19 00:00:00+00:00
1 2016-07-19 01:00:00+00:00
2 2016-07-19 02:00:00+00:00
3 2016-07-19 03:00:00+00:00
4 2016-07-19 04:00:00+00:00
5 2016-07-19 05:00:00+00:00
6 2016-07-19 06:00:00+00:00
7 2016-07-19 07:00:00+00:00
8 2016-07-19 08:00:00+00:00
9 2016-07-19 09:00:00+00:00
10 2016-07-19 10:00:00+00:00
11 2016-07-19 11:00:00+00:00
12 2016-07-19 12:00:00+00:00
13 2016-07-19 13:00:00+00:00
14 2016-07-19 14:00:00+00:00
15 2016-07-19 15:00:00+00:00
16 2016-07-19 16:00:00+00:00
17 2016-07-19 17:00:00+00:00
18 2016-07-19 18:00:00+00:00
19 2016-07-19 19:00:00+00:00
20 2016-07-19 20:00:00+00:00
21 2016-07-19 21:00:00+00:00
22 2016-07-19 22:00:00+00:00
23 2016-07-19 23:00:00+00:00
dtype: datetime64[ns, tzlocal()]
0 2016-07-19 00:00:00
1 2016-07-19 01:00:00
2 2016-07-19 02:00:00
3 2016-07-19 03:00:00
4 2016-07-19 04:00:00
5 2016-07-19 05:00:00
6 2016-07-19 06:00:00
7 2016-07-19 07:00:00
8 2016-07-19 08:00:00
9 2016-07-19 09:00:00
10 2016-07-19 10:00:00
11 2016-07-19 11:00:00
12 2016-07-19 12:00:00
13 2016-07-19 13:00:00
14 2016-07-19 14:00:00
15 2016-07-19 15:00:00
16 2016-07-19 16:00:00
17 2016-07-19 17:00:00
18 2016-07-19 18:00:00
19 2016-07-19 19:00:00
20 2016-07-19 20:00:00
21 2016-07-19 21:00:00
22 2016-07-19 22:00:00
23 2016-07-19 23:00:00
24 2016-07-19 00:00:00
25 2016-07-19 01:00:00
26 2016-07-19 02:00:00
27 2016-07-19 03:00:00
28 2016-07-19 04:00:00
29 2016-07-19 05:00:00
...
42 2016-07-19 18:00:00
43 2016-07-19 19:00:00
44 2016-07-19 20:00:00
45 2016-07-19 21:00:00
46 2016-07-19 22:00:00
47 2016-07-19 23:00:00
48 2016-07-19 00:00:00
49 2016-07-19 01:00:00
50 2016-07-19 02:00:00
51 2016-07-19 03:00:00
52 2016-07-19 04:00:00
53 2016-07-19 05:00:00
54 2016-07-19 06:00:00
55 2016-07-19 07:00:00
56 2016-07-19 08:00:00
57 2016-07-19 09:00:00
58 2016-07-19 10:00:00
59 2016-07-19 11:00:00
60 2016-07-19 12:00:00
61 2016-07-19 13:00:00
62 2016-07-19 14:00:00
63 2016-07-19 15:00:00
64 2016-07-19 16:00:00
65 2016-07-19 17:00:00
66 2016-07-19 18:00:00
67 2016-07-19 19:00:00
68 2016-07-19 20:00:00
69 2016-07-19 21:00:00
70 2016-07-19 22:00:00
71 2016-07-19 23:00:00
dtype: datetime64[ns, tzlocal()]
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.1
setuptools: 21.2.1
Cython: None
numpy: 1.11.0
scipy: None
statsmodels: None
xarray: None
IPython: 3.1.0
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.7.3
boto: None
pandas_datareader: None
Comment From: se7entyse7en
The actual output of print pd.Series([dateutil.parser.parse(y) for y in (x + x + x)]) is:
0 2016-07-19 00:00:00+00:00
1 1970-01-01 00:00:00+00:00
2 1970-01-01 00:00:00+00:00
3 1970-01-01 00:00:00+00:00
4 1970-01-01 00:00:00+00:00
5 1970-01-01 00:00:00+00:00
6 1970-01-01 00:00:00+00:00
7 1970-01-01 00:00:00+00:00
8 1970-01-01 00:00:00+00:00
9 1970-01-01 00:00:00+00:00
10 1970-01-01 00:00:00+00:00
11 1970-01-01 00:00:00+00:00
12 1970-01-01 00:00:00+00:00
13 1970-01-01 00:00:00+00:00
14 1970-01-01 00:00:00+00:00
15 1970-01-01 00:00:00+00:00
16 1970-01-01 00:00:00+00:00
17 1970-01-01 00:00:00+00:00
18 1970-01-01 00:00:00+00:00
19 1970-01-01 00:00:00+00:00
20 1970-01-01 00:00:00+00:00
21 1970-01-01 00:00:00+00:00
22 1970-01-01 00:00:00+00:00
23 1970-01-01 00:00:00+00:00
24 1970-01-01 00:00:00+00:00
25 1970-01-01 00:00:00+00:00
26 1970-01-01 00:00:00+00:00
27 1970-01-01 00:00:00+00:00
28 1970-01-01 00:00:00+00:00
29 1970-01-01 00:00:00+00:00
...
42 2016-07-19 18:00:00+00:00
43 1970-01-01 00:00:00+00:00
44 1970-01-01 00:00:00+00:00
45 1970-01-01 00:00:00+00:00
46 1970-01-01 00:00:00+00:00
47 1970-01-01 00:00:00+00:00
48 1970-01-01 00:00:00+00:00
49 1970-01-01 00:00:00+00:00
50 1970-01-01 00:00:00+00:00
51 1970-01-01 00:00:00+00:00
52 1970-01-01 00:00:00+00:00
53 1970-01-01 00:00:00+00:00
54 1970-01-01 00:00:00+00:00
55 1970-01-01 00:00:00+00:00
56 1970-01-01 00:00:00+00:00
57 1970-01-01 00:00:00+00:00
58 1970-01-01 00:00:00+00:00
59 1970-01-01 00:00:00+00:00
60 1970-01-01 00:00:00+00:00
61 1970-01-01 00:00:00+00:00
62 1970-01-01 00:00:00+00:00
63 1970-01-01 00:00:00+00:00
64 1970-01-01 00:00:00+00:00
65 1970-01-01 00:00:00+00:00
66 1970-01-01 00:00:00+00:00
67 1970-01-01 00:00:00+00:00
68 1970-01-01 00:00:00+00:00
69 1970-01-01 00:00:00+00:00
70 1970-01-01 00:00:00+00:00
71 1970-01-01 00:00:00+00:00
dtype: datetime64[ns, tzlocal()]
Comment From: se7entyse7en
Another interesting thing is that even though the representation of the Series contains those incorrect dates, converting the Series to a dict gives the correct values:
In [89]: pd.Series([dateutil.parser.parse(y) for y in (x + x + x)])
Out[89]:
0 2016-07-19 00:00:00+00:00
1 1970-01-01 00:00:00+00:00
2 1970-01-01 00:00:00+00:00
3 1970-01-01 00:00:00+00:00
4 1970-01-01 00:00:00+00:00
5 1970-01-01 00:00:00+00:00
6 1970-01-01 00:00:00+00:00
7 1970-01-01 00:00:00+00:00
8 1970-01-01 00:00:00+00:00
9 1970-01-01 00:00:00+00:00
10 1970-01-01 00:00:00+00:00
11 1970-01-01 00:00:00+00:00
12 1970-01-01 00:00:00+00:00
13 1970-01-01 00:00:00+00:00
14 1970-01-01 00:00:00+00:00
15 1970-01-01 00:00:00+00:00
16 1970-01-01 00:00:00+00:00
17 1970-01-01 00:00:00+00:00
18 1970-01-01 00:00:00+00:00
19 1970-01-01 00:00:00+00:00
20 1970-01-01 00:00:00+00:00
21 1970-01-01 00:00:00+00:00
22 1970-01-01 00:00:00+00:00
23 1970-01-01 00:00:00+00:00
24 1970-01-01 00:00:00+00:00
25 1970-01-01 00:00:00+00:00
26 1970-01-01 00:00:00+00:00
27 1970-01-01 00:00:00+00:00
28 1970-01-01 00:00:00+00:00
29 1970-01-01 00:00:00+00:00
...
42 2016-07-19 18:00:00+00:00
43 1970-01-01 00:00:00+00:00
44 1970-01-01 00:00:00+00:00
45 1970-01-01 00:00:00+00:00
46 1970-01-01 00:00:00+00:00
47 1970-01-01 00:00:00+00:00
48 1970-01-01 00:00:00+00:00
49 1970-01-01 00:00:00+00:00
50 1970-01-01 00:00:00+00:00
51 1970-01-01 00:00:00+00:00
52 1970-01-01 00:00:00+00:00
53 1970-01-01 00:00:00+00:00
54 1970-01-01 00:00:00+00:00
55 1970-01-01 00:00:00+00:00
56 1970-01-01 00:00:00+00:00
57 1970-01-01 00:00:00+00:00
58 1970-01-01 00:00:00+00:00
59 1970-01-01 00:00:00+00:00
60 1970-01-01 00:00:00+00:00
61 1970-01-01 00:00:00+00:00
62 1970-01-01 00:00:00+00:00
63 1970-01-01 00:00:00+00:00
64 1970-01-01 00:00:00+00:00
65 1970-01-01 00:00:00+00:00
66 1970-01-01 00:00:00+00:00
67 1970-01-01 00:00:00+00:00
68 1970-01-01 00:00:00+00:00
69 1970-01-01 00:00:00+00:00
70 1970-01-01 00:00:00+00:00
71 1970-01-01 00:00:00+00:00
dtype: datetime64[ns, tzlocal()]
In [90]: pd.Series([dateutil.parser.parse(y) for y in (x + x + x)]).to_dict()
Out[90]:
{0: Timestamp('2016-07-19 00:00:00+0000', tz='tzlocal()'),
1: Timestamp('2016-07-19 01:00:00+0000', tz='tzlocal()'),
2: Timestamp('2016-07-19 02:00:00+0000', tz='tzlocal()'),
3: Timestamp('2016-07-19 03:00:00+0000', tz='tzlocal()'),
4: Timestamp('2016-07-19 04:00:00+0000', tz='tzlocal()'),
5: Timestamp('2016-07-19 05:00:00+0000', tz='tzlocal()'),
6: Timestamp('2016-07-19 06:00:00+0000', tz='tzlocal()'),
7: Timestamp('2016-07-19 07:00:00+0000', tz='tzlocal()'),
8: Timestamp('2016-07-19 08:00:00+0000', tz='tzlocal()'),
9: Timestamp('2016-07-19 09:00:00+0000', tz='tzlocal()'),
10: Timestamp('2016-07-19 10:00:00+0000', tz='tzlocal()'),
11: Timestamp('2016-07-19 11:00:00+0000', tz='tzlocal()'),
12: Timestamp('2016-07-19 12:00:00+0000', tz='tzlocal()'),
13: Timestamp('2016-07-19 13:00:00+0000', tz='tzlocal()'),
14: Timestamp('2016-07-19 14:00:00+0000', tz='tzlocal()'),
15: Timestamp('2016-07-19 15:00:00+0000', tz='tzlocal()'),
16: Timestamp('2016-07-19 16:00:00+0000', tz='tzlocal()'),
17: Timestamp('2016-07-19 17:00:00+0000', tz='tzlocal()'),
18: Timestamp('2016-07-19 18:00:00+0000', tz='tzlocal()'),
19: Timestamp('2016-07-19 19:00:00+0000', tz='tzlocal()'),
20: Timestamp('2016-07-19 20:00:00+0000', tz='tzlocal()'),
21: Timestamp('2016-07-19 21:00:00+0000', tz='tzlocal()'),
22: Timestamp('2016-07-19 22:00:00+0000', tz='tzlocal()'),
23: Timestamp('2016-07-19 23:00:00+0000', tz='tzlocal()'),
24: Timestamp('2016-07-19 00:00:00+0000', tz='tzlocal()'),
25: Timestamp('2016-07-19 01:00:00+0000', tz='tzlocal()'),
26: Timestamp('2016-07-19 02:00:00+0000', tz='tzlocal()'),
27: Timestamp('2016-07-19 03:00:00+0000', tz='tzlocal()'),
28: Timestamp('2016-07-19 04:00:00+0000', tz='tzlocal()'),
29: Timestamp('2016-07-19 05:00:00+0000', tz='tzlocal()'),
30: Timestamp('2016-07-19 06:00:00+0000', tz='tzlocal()'),
31: Timestamp('2016-07-19 07:00:00+0000', tz='tzlocal()'),
32: Timestamp('2016-07-19 08:00:00+0000', tz='tzlocal()'),
33: Timestamp('2016-07-19 09:00:00+0000', tz='tzlocal()'),
34: Timestamp('2016-07-19 10:00:00+0000', tz='tzlocal()'),
35: Timestamp('2016-07-19 11:00:00+0000', tz='tzlocal()'),
36: Timestamp('2016-07-19 12:00:00+0000', tz='tzlocal()'),
37: Timestamp('2016-07-19 13:00:00+0000', tz='tzlocal()'),
38: Timestamp('2016-07-19 14:00:00+0000', tz='tzlocal()'),
39: Timestamp('2016-07-19 15:00:00+0000', tz='tzlocal()'),
40: Timestamp('2016-07-19 16:00:00+0000', tz='tzlocal()'),
41: Timestamp('2016-07-19 17:00:00+0000', tz='tzlocal()'),
42: Timestamp('2016-07-19 18:00:00+0000', tz='tzlocal()'),
43: Timestamp('2016-07-19 19:00:00+0000', tz='tzlocal()'),
44: Timestamp('2016-07-19 20:00:00+0000', tz='tzlocal()'),
45: Timestamp('2016-07-19 21:00:00+0000', tz='tzlocal()'),
46: Timestamp('2016-07-19 22:00:00+0000', tz='tzlocal()'),
47: Timestamp('2016-07-19 23:00:00+0000', tz='tzlocal()'),
48: Timestamp('2016-07-19 00:00:00+0000', tz='tzlocal()'),
49: Timestamp('2016-07-19 01:00:00+0000', tz='tzlocal()'),
50: Timestamp('2016-07-19 02:00:00+0000', tz='tzlocal()'),
51: Timestamp('2016-07-19 03:00:00+0000', tz='tzlocal()'),
52: Timestamp('2016-07-19 04:00:00+0000', tz='tzlocal()'),
53: Timestamp('2016-07-19 05:00:00+0000', tz='tzlocal()'),
54: Timestamp('2016-07-19 06:00:00+0000', tz='tzlocal()'),
55: Timestamp('2016-07-19 07:00:00+0000', tz='tzlocal()'),
56: Timestamp('2016-07-19 08:00:00+0000', tz='tzlocal()'),
57: Timestamp('2016-07-19 09:00:00+0000', tz='tzlocal()'),
58: Timestamp('2016-07-19 10:00:00+0000', tz='tzlocal()'),
59: Timestamp('2016-07-19 11:00:00+0000', tz='tzlocal()'),
60: Timestamp('2016-07-19 12:00:00+0000', tz='tzlocal()'),
61: Timestamp('2016-07-19 13:00:00+0000', tz='tzlocal()'),
62: Timestamp('2016-07-19 14:00:00+0000', tz='tzlocal()'),
63: Timestamp('2016-07-19 15:00:00+0000', tz='tzlocal()'),
64: Timestamp('2016-07-19 16:00:00+0000', tz='tzlocal()'),
65: Timestamp('2016-07-19 17:00:00+0000', tz='tzlocal()'),
66: Timestamp('2016-07-19 18:00:00+0000', tz='tzlocal()'),
67: Timestamp('2016-07-19 19:00:00+0000', tz='tzlocal()'),
68: Timestamp('2016-07-19 20:00:00+0000', tz='tzlocal()'),
69: Timestamp('2016-07-19 21:00:00+0000', tz='tzlocal()'),
70: Timestamp('2016-07-19 22:00:00+0000', tz='tzlocal()'),
71: Timestamp('2016-07-19 23:00:00+0000', tz='tzlocal()')}
Comment From: se7entyse7en
This problem becomes more serious when dealing with DataFrames and aggregation, as in the following example:
In [91]: df = pd.DataFrame({'timestamp': x + x + x, 'count': 1})
In [92]: df
Out[92]:
count timestamp
0 1 2016-07-19T00:00:00.000Z
1 1 2016-07-19T01:00:00.000Z
2 1 2016-07-19T02:00:00.000Z
3 1 2016-07-19T03:00:00.000Z
4 1 2016-07-19T04:00:00.000Z
5 1 2016-07-19T05:00:00.000Z
6 1 2016-07-19T06:00:00.000Z
7 1 2016-07-19T07:00:00.000Z
8 1 2016-07-19T08:00:00.000Z
9 1 2016-07-19T09:00:00.000Z
10 1 2016-07-19T10:00:00.000Z
11 1 2016-07-19T11:00:00.000Z
12 1 2016-07-19T12:00:00.000Z
13 1 2016-07-19T13:00:00.000Z
14 1 2016-07-19T14:00:00.000Z
15 1 2016-07-19T15:00:00.000Z
16 1 2016-07-19T16:00:00.000Z
17 1 2016-07-19T17:00:00.000Z
18 1 2016-07-19T18:00:00.000Z
19 1 2016-07-19T19:00:00.000Z
20 1 2016-07-19T20:00:00.000Z
21 1 2016-07-19T21:00:00.000Z
22 1 2016-07-19T22:00:00.000Z
23 1 2016-07-19T23:00:00.000Z
24 1 2016-07-19T00:00:00.000Z
25 1 2016-07-19T01:00:00.000Z
26 1 2016-07-19T02:00:00.000Z
27 1 2016-07-19T03:00:00.000Z
28 1 2016-07-19T04:00:00.000Z
29 1 2016-07-19T05:00:00.000Z
.. ... ...
42 1 2016-07-19T18:00:00.000Z
43 1 2016-07-19T19:00:00.000Z
44 1 2016-07-19T20:00:00.000Z
45 1 2016-07-19T21:00:00.000Z
46 1 2016-07-19T22:00:00.000Z
47 1 2016-07-19T23:00:00.000Z
48 1 2016-07-19T00:00:00.000Z
49 1 2016-07-19T01:00:00.000Z
50 1 2016-07-19T02:00:00.000Z
51 1 2016-07-19T03:00:00.000Z
52 1 2016-07-19T04:00:00.000Z
53 1 2016-07-19T05:00:00.000Z
54 1 2016-07-19T06:00:00.000Z
55 1 2016-07-19T07:00:00.000Z
56 1 2016-07-19T08:00:00.000Z
57 1 2016-07-19T09:00:00.000Z
58 1 2016-07-19T10:00:00.000Z
59 1 2016-07-19T11:00:00.000Z
60 1 2016-07-19T12:00:00.000Z
61 1 2016-07-19T13:00:00.000Z
62 1 2016-07-19T14:00:00.000Z
63 1 2016-07-19T15:00:00.000Z
64 1 2016-07-19T16:00:00.000Z
65 1 2016-07-19T17:00:00.000Z
66 1 2016-07-19T18:00:00.000Z
67 1 2016-07-19T19:00:00.000Z
68 1 2016-07-19T20:00:00.000Z
69 1 2016-07-19T21:00:00.000Z
70 1 2016-07-19T22:00:00.000Z
71 1 2016-07-19T23:00:00.000Z
[72 rows x 2 columns]
In [93]: df['timestamp'] = [dateutil.parser.parse(y) for y in df['timestamp']]
In [94]: df.groupby(['timestamp'])['count'].sum()
Out[94]:
timestamp
1970-01-01 00:00:00+00:00 71
2016-07-19 00:00:00+00:00 1
Name: count, dtype: int64
But for some reason, when the timezone is removed or changed (I tried pytz.utc), everything works as expected:
In [95]: df = pd.DataFrame({'timestamp': x + x + x, 'count': 1})
In [96]: df['timestamp'] = [dateutil.parser.parse(y).replace(tzinfo=None) for y in df['timestamp']]
In [97]: df.groupby(['timestamp'])['count'].sum()
Out[97]:
timestamp
2016-07-19 00:00:00 3
2016-07-19 01:00:00 3
2016-07-19 02:00:00 3
2016-07-19 03:00:00 3
2016-07-19 04:00:00 3
2016-07-19 05:00:00 3
2016-07-19 06:00:00 3
2016-07-19 07:00:00 3
2016-07-19 08:00:00 3
2016-07-19 09:00:00 3
2016-07-19 10:00:00 3
2016-07-19 11:00:00 3
2016-07-19 12:00:00 3
2016-07-19 13:00:00 3
2016-07-19 14:00:00 3
2016-07-19 15:00:00 3
2016-07-19 16:00:00 3
2016-07-19 17:00:00 3
2016-07-19 18:00:00 3
2016-07-19 19:00:00 3
2016-07-19 20:00:00 3
2016-07-19 21:00:00 3
2016-07-19 22:00:00 3
2016-07-19 23:00:00 3
Name: count, dtype: int64
In [101]: df = pd.DataFrame({'timestamp': x + x + x, 'count': 1})
In [102]: df['timestamp'] = [dateutil.parser.parse(y).replace(tzinfo=pytz.utc) for y in df['timestamp']]
In [103]: df.groupby(['timestamp'])['count'].sum()
Out[103]:
timestamp
2016-07-19 00:00:00+00:00 3
2016-07-19 01:00:00+00:00 3
2016-07-19 02:00:00+00:00 3
2016-07-19 03:00:00+00:00 3
2016-07-19 04:00:00+00:00 3
2016-07-19 05:00:00+00:00 3
2016-07-19 06:00:00+00:00 3
2016-07-19 07:00:00+00:00 3
2016-07-19 08:00:00+00:00 3
2016-07-19 09:00:00+00:00 3
2016-07-19 10:00:00+00:00 3
2016-07-19 11:00:00+00:00 3
2016-07-19 12:00:00+00:00 3
2016-07-19 13:00:00+00:00 3
2016-07-19 14:00:00+00:00 3
2016-07-19 15:00:00+00:00 3
2016-07-19 16:00:00+00:00 3
2016-07-19 17:00:00+00:00 3
2016-07-19 18:00:00+00:00 3
2016-07-19 19:00:00+00:00 3
2016-07-19 20:00:00+00:00 3
2016-07-19 21:00:00+00:00 3
2016-07-19 22:00:00+00:00 3
2016-07-19 23:00:00+00:00 3
Name: count, dtype: int64
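In both workarounds above the inputs are already in UTC, so swapping the tzinfo does not shift any values. For inputs that carry a non-zero offset, replace(tzinfo=...) only relabels the wall-clock time, whereas astimezone(...) converts it. A minimal sketch of the difference, assuming Python 3 and using datetime.timezone.utc from the standard library in place of pytz.utc; the +02:00 input string is made up for illustration and does not appear in the data above:

```python
from datetime import timezone

import dateutil.parser

# Hypothetical input with a +02:00 offset (not part of the reported data).
parsed = dateutil.parser.parse('2016-07-19T01:00:00.000+02:00')

# replace() keeps the wall-clock time and only swaps the tzinfo label.
relabeled = parsed.replace(tzinfo=timezone.utc)

# astimezone() converts the same instant into UTC.
converted = parsed.astimezone(timezone.utc)

print(relabeled.isoformat())
print(converted.isoformat())
```

With the 'Z'-suffixed strings from this report the two calls should give the same wall-clock values, which is why replace() is harmless here.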
Comment From: jreback
Timezones are quite tricky; pandas handles them very well (see the docs).
You are doing some very odd conversions.
It is simpler (and much faster) to use the pandas native parsers:
In [4]: pd.Series(pd.to_datetime(x, utc=True))
Out[4]:
0 2016-07-19 00:00:00+00:00
1 2016-07-19 01:00:00+00:00
2 2016-07-19 02:00:00+00:00
3 2016-07-19 03:00:00+00:00
4 2016-07-19 04:00:00+00:00
5 2016-07-19 05:00:00+00:00
...
18 2016-07-19 18:00:00+00:00
19 2016-07-19 19:00:00+00:00
20 2016-07-19 20:00:00+00:00
21 2016-07-19 21:00:00+00:00
22 2016-07-19 22:00:00+00:00
23 2016-07-19 23:00:00+00:00
dtype: datetime64[ns, UTC]
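Applied to the DataFrame aggregation from the earlier comment, the same approach might look like the sketch below. It assumes a reasonably recent pandas and Python 3, and it rebuilds the 24 hourly strings with a format expression rather than the literal list; that the grouping then comes out as three rows per hour is what one would expect, not something verified on the reporter's machine.

```python
import pandas as pd

# The same 24 hourly ISO 8601 strings as in the report, generated instead of typed out.
x = ['2016-07-19T%02d:00:00.000Z' % hour for hour in range(24)]

df = pd.DataFrame({'timestamp': x + x + x, 'count': 1})

# Parse with the pandas parser; utc=True yields a UTC-aware datetime column.
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)

# Group on the parsed timestamps and sum the counts per hour.
counts = df.groupby('timestamp')['count'].sum()
print(counts.head())
```

This avoids building a list of dateutil datetime objects altogether, so the column never carries a tzlocal() timezone.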
Comment From: jorisvandenbossche
@se7entyse7en Not sure what is going on on your computer, but your code runs fine for me:
In [14]: pd.Series([dateutil.parser.parse(y) for y in (x + x + x)])
Out[14]:
0 2016-07-19 00:00:00+00:00
1 2016-07-19 01:00:00+00:00
2 2016-07-19 02:00:00+00:00
3 2016-07-19 03:00:00+00:00
4 2016-07-19 04:00:00+00:00
...
67 2016-07-19 19:00:00+00:00
68 2016-07-19 20:00:00+00:00
69 2016-07-19 21:00:00+00:00
70 2016-07-19 22:00:00+00:00
71 2016-07-19 23:00:00+00:00
dtype: datetime64[ns, UTC]
But indeed, as @jreback says, it is better to just use to_datetime for string parsing.
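The dtypes shown in this thread differ between the two machines (tzlocal() in the report, UTC here), so it may help to check which tzinfo object dateutil attaches in a given environment. A minimal diagnostic sketch, assuming only that dateutil is installed; that the tzinfo class is what distinguishes the two setups is a guess based on those dtypes, not a confirmed cause:

```python
from datetime import timedelta

import dateutil.parser

# One of the strings from the report.
parsed = dateutil.parser.parse('2016-07-19T00:00:00.000Z')

# The tzinfo class may vary with the dateutil version and the local timezone setup.
tz_class_name = type(parsed.tzinfo).__name__
print(tz_class_name)

# Whatever the class is, a 'Z' suffix should correspond to a zero UTC offset.
offset_is_zero = parsed.utcoffset() == timedelta(0)
print(offset_is_zero)
```

Comparing the printed class name across machines, together with pd.show_versions(), would make it easier to tell whether the difference comes from dateutil or from pandas.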