Pandas df.assign with boolean dummy series results in NaN Series.

Code Sample, a copy-pastable example if possible

import pandas
import io

csv_file = io.StringIO("""datetime,temperature_husconet,lat_husconet,lon_husconet,temperature_pws,dewpoint_pws,humidity_pws,lat_pws,lon_pws,temperature_eddh,dewpoint_eddh,windspeed_eddh,windgust_eddh,winddirection_eddh,pressure_eddh,humidity_eddh,precipitation_eddh,cloudcover_eddh_BKN,cloudcover_eddh_CAVOC,cloudcover_eddh_FEW,cloudcover_eddh_NSC,cloudcover_eddh_OVC,cloudcover_eddh_SCT,cloudcover_eddh_VV
2016-01-01 00:00:00,5.688,53.551105624479334,9.984235400036294,2.9,1.9,93.0,53.57898,9.79428,3.0,3.0,7.2,0.0,160.0,1023.0,94.0,,0,1,0,0,0,0,0
2016-01-01 00:00:00,4.508,53.540879976582616,9.995115621037971,2.9,1.9,93.0,53.57898,9.79428,3.0,3.0,7.2,0.0,160.0,1023.0,94.0,,0,1,0,0,0,0,0
2016-01-01 00:02:00,5.694,53.551105624479334,9.984235400036294,2.6,2.4,99.0,53.570890000000006,9.88888,3.0,3.0,7.2,0.0,160.0,1023.0,94.0,,0,1,0,0,0,0,0
2016-01-01 00:02:00,4.367,53.540879976582616,9.995115621037971,2.6,2.4,99.0,53.570890000000006,9.88888,3.0,3.0,7.2,0.0,160.0,1023.0,94.0,,0,1,0,0,0,0,0
""")

data_df = pandas.read_csv(
    csv_file,
    index_col="datetime",
    parse_dates=["datetime"]
)

df_hour = pandas.get_dummies(data_df.index.hour, prefix="hour")

df_hour_dict_version = {column: df_hour[column] for column in df_hour.columns}
data_df = data_df.assign(**df_hour_dict_version)

Problem description

I want to add the dummy variables back to the data frame because I will remove the index later in the process. While the columns are still in the dictionary (df_hour_dict_version), everything is fine: each dummy column is a Series. They hold uint8 values, even though boolean would be more appropriate for dummy variables, but that is how get_dummies is implemented. For some reason, though, the series turn into NaN columns after the assign. All of a sudden, all 0's and 1's are replaced by NaN!

Expected Output

The dummy columns should keep their 0/1 values after the assign.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.26
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

That's because your indexes don't match.

In [21]: df_hour.index
Out[21]: RangeIndex(start=0, stop=4, step=1)

In [22]: data_df.index
    ...:
    ...:
Out[22]:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
               '2016-01-01 00:02:00', '2016-01-01 00:02:00'],
              dtype='datetime64[ns]', name='datetime', freq=None)

In [23]: data_df['foo'] = df_hour

In [24]: data_df['foo']
Out[24]:
datetime
2016-01-01 00:00:00   NaN
2016-01-01 00:00:00   NaN
2016-01-01 00:02:00   NaN
2016-01-01 00:02:00   NaN
Name: foo, dtype: float64

.assign does the same thing (you can check the source, it's quite short), so all the usual alignment rules apply. Each column of df_hour is aligned to the index of data_df; none of its RangeIndex labels (0 to 3) match the DatetimeIndex, so NaNs are inserted. Your best bet is to set df_hour.index = data_df.index before assigning.
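
A minimal sketch of that fix, using a cut-down version of the CSV from the report (one data column instead of all of them, which is a simplification for illustration). The names broken_df and fixed_df are made up for this example. The dtype of the dummy columns depends on the pandas version (uint8 on 0.20.1, as reported), so the sketch only checks for missing values.

```python
import io

import pandas as pd

# Cut-down version of the CSV from the report: same timestamps, one data column.
csv_file = io.StringIO(
    "datetime,temperature_husconet\n"
    "2016-01-01 00:00:00,5.688\n"
    "2016-01-01 00:00:00,4.508\n"
    "2016-01-01 00:02:00,5.694\n"
    "2016-01-01 00:02:00,4.367\n"
)
data_df = pd.read_csv(csv_file, index_col="datetime", parse_dates=["datetime"])

# get_dummies on the hour values returns a frame with a fresh RangeIndex (0..3).
df_hour = pd.get_dummies(data_df.index.hour, prefix="hour")

# Reproduces the report: the RangeIndex labels do not exist in data_df's
# DatetimeIndex, so alignment fills every row with NaN.
broken_df = data_df.assign(**{col: df_hour[col] for col in df_hour.columns})

# Suggested fix: give df_hour the same index before assigning.
df_hour.index = data_df.index
fixed_df = data_df.assign(**{col: df_hour[col] for col in df_hour.columns})

print(broken_df["hour_0"].isna().all())  # every value is missing
print(fixed_df["hour_0"].notna().all())  # dummy values survive the assign
```

Passing df_hour[col].to_numpy() (or .values on older versions) instead of the Series would also skip alignment, since a plain array carries no index. That only makes sense when the rows are known to be in the same order.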
