Code Sample, a copy-pastable example if possible
# Setup
import pandas as pd
dates = pd.to_datetime(range(1,300), unit='D', origin=pd.Timestamp('2000-01-01'))
se = pd.Series(range(1,300),dates)
df = pd.DataFrame(se).reset_index()
# The problem:
>>> df
index 0
0 2000-01-02 1
1 2000-01-03 2
2 2000-01-04 3
3 2000-01-05 4
4 2000-01-06 5
5 2000-01-07 6
.. ... ...
294 2000-10-22 295
295 2000-10-23 296
296 2000-10-24 297
297 2000-10-25 298
298 2000-10-26 299
[299 rows x 2 columns]
>>> pd.Series(df['index'], index = df[0])
0
1 2000-01-03
2 2000-01-04
3 2000-01-05
4 2000-01-06
5 2000-01-07
...
295 2000-10-23
296 2000-10-24
297 2000-10-25
298 2000-10-26
299 NaT
Name: index, Length: 299, dtype: datetime64[ns]
Problem description
The commands above should not generate any NaT.
In a real-life case, I tried the same thing with 56355 rows and got 53913 NaT values at the end.
Expected Output
No NaT
My current workaround is: df.set_index(0)
Output of pd.show_versions()
Comment From: jreback
pd.Series(df['index'], index = df[0]) is effectively a .reindex(). You are asking this to align the passed values to the passed index.
In [15]: df['index'].reindex(df[0])
Out[15]:
0
1 2000-01-03
2 2000-01-04
3 2000-01-05
4 2000-01-06
5 2000-01-07
6 2000-01-08
...
294 2000-10-22
295 2000-10-23
296 2000-10-24
297 2000-10-25
298 2000-10-26
299 NaT
Name: index, Length: 299, dtype: datetime64[ns]
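The equivalence with .reindex() can be checked on a much smaller input. This is only a sketch: the three-row Series and the names s, new_index, constructed and reindexed are made up for illustration and are not part of the original report.

```python
import pandas as pd

# Tiny stand-in for df['index']: default integer labels 0, 1, 2.
s = pd.Series(pd.to_datetime(["2000-01-02", "2000-01-03", "2000-01-04"]))

# Stand-in for the values of df[0]: labels 1, 2, 3.
new_index = pd.Index([1, 2, 3])

# Passing a Series as data together with an explicit index aligns on labels ...
constructed = pd.Series(s, index=new_index)

# ... which gives the same result as an explicit reindex.
reindexed = s.reindex(new_index)

print(constructed.equals(reindexed))  # True
print(constructed.isna().tolist())    # [False, False, True]
```

Label 3 does not exist in s, so that slot is filled with NaT, just like label 299 in the output above.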
You could in theory do this:
In [16]: pd.Series(df['index'].values, df[0].index)
Out[16]:
0 2000-01-02
1 2000-01-03
2 2000-01-04
3 2000-01-05
4 2000-01-06
5 2000-01-07
...
293 2000-10-21
294 2000-10-22
295 2000-10-23
296 2000-10-24
297 2000-10-25
298 2000-10-26
Length: 299, dtype: datetime64[ns]
This is the correct idiom
In [17]: df.set_index(0)
Out[17]:
index
0
1 2000-01-02
2 2000-01-03
3 2000-01-04
4 2000-01-05
5 2000-01-06
6 2000-01-07
.. ...
294 2000-10-21
295 2000-10-22
296 2000-10-23
297 2000-10-24
298 2000-10-25
299 2000-10-26
[299 rows x 1 columns]
Comment From: catkfr
Thanks for the quick reply and for explaining the proper way to do things.
As often, there are multiple ways to do this, and I had found the df.set_index(0) approach.
But don't you think there is an underlying bug here? As mentioned, I stumbled upon this while working on a dataframe with 56355 rows and 50+ columns. If what I did was wrong, maybe an error would have helped. Here, I had no errors, just almost 54000 NaT in a series that I wanted to use for other things.
Comment From: jreback
no, there is 1 way to do this particular operation
you are fighting alignment by trying to manually construct a Series. it is operating correctly by aligning the passed Series to the index. it is unexpected only because you are passing a Series as the index and expecting the data to keep ITS own index order
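To make the alignment point concrete, here is a minimal sketch on a three-row frame shaped like the one in the report. The data and the names by_position and via_set_index are illustrative assumptions. It shows that stripping the labels with .to_numpy() leaves nothing to align on, and that the result then matches what set_index produces.

```python
import pandas as pd

# Small frame with the same column labels as in the report: 'index' and 0.
df = pd.DataFrame({
    "index": pd.to_datetime(["2000-01-02", "2000-01-03", "2000-01-04"]),
    0: [1, 2, 3],
})

# Plain arrays carry no labels, so the constructor pairs them up by position.
by_position = pd.Series(
    df["index"].to_numpy(), index=df[0].to_numpy(), name="index"
)

# The set_index idiom yields the same labels and values.
via_set_index = df.set_index(0)["index"]

print(by_position.isna().any())                         # False
print(by_position.index.equals(via_set_index.index))    # True
print((by_position.to_numpy() == via_set_index.to_numpy()).all())  # True
```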
Comment From: catkfr
Ah ok: good to know
Actually the context was that I was building a dictionary from a database. One column was a reference and another was a date. My objective was to build a dictionary from ref to date.
The data had multiple rows per reference and date. I extracted these two columns, checked for duplicates then wanted to use to_dict() so needed to create a Series one way or the other.
Is there also 1 proper way to do this?
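One possible way to build such a mapping is sketched below. It is not an answer given in this thread. The column names ref and date and the sample rows are hypothetical, since the real table is not shown.

```python
import pandas as pd

# Hypothetical data: several rows per reference, each with the same date.
df = pd.DataFrame({
    "ref": ["a", "a", "b", "c"],
    "date": pd.to_datetime(
        ["2000-01-02", "2000-01-02", "2000-01-03", "2000-01-04"]
    ),
})

# Keep one row per distinct (ref, date) pair.
pairs = df[["ref", "date"]].drop_duplicates()

# Guard: a reference that still appears twice has conflicting dates.
if pairs["ref"].duplicated().any():
    raise ValueError("some references map to more than one date")

# set_index avoids the alignment pitfall of building the Series by hand.
ref_to_date = pairs.set_index("ref")["date"].to_dict()
print(ref_to_date["b"])
```

The duplicate check is a design choice: without it, to_dict() would silently keep only the last date for a repeated reference.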
Comment From: jreback
you are best off asking these kinds of questions on SO with a reproducible example
Comment From: catkfr
ok. Thanks for your help