Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
fill_value = 0
#fill_value = np.nan
df0 = pd.DataFrame({'A': [1, fill_value, fill_value]})
df1 = pd.DataFrame({'B': ['a','b','c']})
c = pd.concat([df0, df1], axis=1)
print(c) # >>> OK
print(c['A']) # >>> OK
print(c['B']) # >>> OK
#========================
print("=====================")
df0_s = df0.to_sparse(fill_value = fill_value)
c = pd.concat([df0_s, df1], axis=1)
print(c) # >>> OK
print(c['A']) # >>> OK
print(c['B']) # >>> ERROR: 'IntBlock' object has no attribute 'sp_index'
Problem description
The above code fails when a column is selected from a concatenated frame that holds mixed-type data. A possible workaround is to recast the resulting sparse DataFrame to a dense DataFrame via c_new = pd.DataFrame(c)
At the bottom of this, it seems that pandas.concat always uses the highest-class object in the list to concatenate, here the SparseDataFrame, to store the resulting values. This works at the lower levels, but the surface interfaces break if the contained data is not as expected.
This might be very similar to #18551 (https://github.com/pandas-dev/pandas/issues/18551)
Expected Output
Well, it is hard to say whether SparseDataFrames should handle mixed-type data (possibly keeping some of the columns sparse and others dense), or whether pd.concat should fall back to a dense DataFrame here. Either way, the issue should be fixed one way or another.
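For comparison, here is a minimal sketch of the "some columns sparse, others dense" outcome. It assumes pandas >= 1.0, where to_sparse and SparseDataFrame no longer exist and sparse data is stored per column as a SparseArray. It does not reproduce the error above; it only illustrates one of the two behaviours discussed, and the name c_dense is just for illustration.

```python
import pandas as pd

# Assumption: pandas >= 1.0, where sparse data lives in individual columns
# (SparseArray) rather than in a SparseDataFrame.
fill_value = 0
df0_s = pd.DataFrame(
    {'A': pd.arrays.SparseArray([1, fill_value, fill_value], fill_value=fill_value)}
)
df1 = pd.DataFrame({'B': ['a', 'b', 'c']})

c = pd.concat([df0_s, df1], axis=1)
print(c.dtypes)  # 'A' should keep a sparse dtype, 'B' stays a regular dense column
print(c['A'])
print(c['B'])    # selecting the dense column works in this setup

# Densify only the sparse column if a fully dense frame is needed.
c_dense = c.copy()
c_dense['A'] = c['A'].sparse.to_dense()
```

In this setup the result is an ordinary DataFrame with per-column dtypes, so the question of which container class concat should pick does not arise.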
Output of pd.show_versions()
Comment From: jreback
a duplicate of #16874
yep, sparse could use some love!