Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Create DataFrames to play with
dates = pd.date_range("20170101", periods=10, freq="S")
df_a = pd.DataFrame(np.random.randn(10,2), index=dates, columns=list("AB"))
df_a = df_a.abs() * np.random.randint(0, 1000)
dates = pd.date_range("20170101", periods=5, freq="S")
df_b = pd.DataFrame(np.random.randn(5,2), index=dates, columns=list("AB"))
df_b = df_b.abs() * np.random.randint(0, 1000)
df_c = pd.DataFrame(np.random.randn(5,1), index=dates, columns=list("A"))
df_c = df_c.abs() * np.random.randint(0, 1000)
df_c["B"] = np.nan

# Have a look-see
print(df_a)
print()
print(df_b)

# Shorter index wins, everything else is NaN'd.
print(df_a.add(df_b))

# NaN wins.
print(df_b.add(df_c))

# Concat works.
print(pd.concat([df_a, df_b, df_c], axis=1)["B"].sum(axis=1))

df = pd.concat([df_a, df_b, df_c], axis=1)
series_dict = {}
for col in df.columns.unique():
    series_dict[col] = df[col].sum(axis=1)
df = pd.DataFrame(series_dict)

print(df)

Problem description

Adding DataFrames of varying lengths and with missing data results in missing data:

  1. The shorter index is used and all values beyond it are simply NaN'd.
  2. If one of the two series values is NaN, NaN wins.

Expected Output

  1. The longer index DF should be used, filling the shorter DF with NaN.
  2. If a value is present, it should be used instead of NaN.
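For what it's worth, the fill_value argument on the arithmetic methods seems to get close to both points above: a missing label or value is treated as 0 when the other side has a value, and the result is only NaN where both sides are missing. A minimal sketch (the left/right frames here are made-up illustration data, not the ones from the sample above):

import numpy as np
import pandas as pd

idx = pd.date_range("20170101", periods=4, freq="S")
left = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=idx)
right = pd.DataFrame({"A": [10.0, np.nan]}, index=idx[:2])

# plain add(): non-overlapping labels and NaN values both give NaN
print(left.add(right))

# fill_value=0: missing labels/values are treated as 0 before adding,
# so the longer index survives and values that are present are kept
print(left.add(right, fill_value=0))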

I've found pd.concat to be a useful workaround. It results in a DataFrame with multiple columns sharing the same label. The unique column labels can then be iterated over to build a dictionary of summed Series, which can be used to create a new DataFrame. There are probably more elegant solutions, but it works.
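The same workaround as a self-contained snippet, in case it's useful (the toy frames are made up; the duplicate-column sum is just the pattern from the code sample restated):

import numpy as np
import pandas as pd

idx = pd.date_range("20170101", periods=3, freq="S")
f1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [1.0, 1.0, 1.0]}, index=idx)
f2 = pd.DataFrame({"A": [10.0, 20.0], "B": [np.nan, 2.0]}, index=idx[:2])

# concat side by side: every label now appears once per input frame
wide = pd.concat([f1, f2], axis=1)

# sum each repeated column label; DataFrame.sum skips NaN by default,
# so a present value wins over a missing one
summed = pd.DataFrame({col: wide[col].sum(axis=1) for col in wide.columns.unique()})
print(summed)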

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 32.2.0
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Comment From: TomAugspurger

It's not really the lengths that matter; rather, the labels are aligned to match. I'd recommend you read through the docs on this. Since df_b's index is a subset of df_a's, it only looks like it's the lengths.

I don't understand what your second point means. An operation between two valid values will be valid. An operation between any value and NaN will be NaN.
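To illustrate the alignment point with toy frames (made up here, not from the report): two frames of the same length but with disjoint labels still produce all NaN, while overlapping labels line up regardless of length or position.

import pandas as pd

x = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
y = pd.DataFrame({"A": [1, 2, 3]}, index=[3, 4, 5])
print(x.add(y))   # same length, no shared labels -> all NaN

z = pd.DataFrame({"A": [10, 20]}, index=[2, 3])
print(x.add(z))   # only the shared label 2 gets a value (3 + 10); the rest are NaN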

Comment From: TomAugspurger

I'll close this, as auto-alignment is fundamental to pandas, but feel free to point out places in the docs that aren't clear.