Pandas ENH: Infer dtype when using df.explode()

Is your feature request related to a problem?

Yes. Currently, df.explode always returns object dtype for the column being exploded, so the dtype of the underlying elements is lost.

E.g.

import pandas as pd

s = pd.Series([1, 2, 3])  # <- dtype('int64')
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A").dtypes

	0
A	object
B	int64

It would be great if pandas could return the underlying dtype if it was consistent across all rows. (Or return the best dtype (int -> float -> object).)

Describe the solution you'd like

solution 1: The best-case scenario would be for pandas to infer the dtype directly if it is consistent (ignoring NaNs) across the rows.

s = pd.Series([1, None, 3])  # <- dtype('float64'); None is converted to NaN
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A").dtypes

	0
A	float64
B	int64

solution 2: Providing an argument to force inferring the dtype:

s = pd.Series([1, None, 3])  # <- dtype('float64'); None is converted to NaN
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})
df.explode("A", infer_type=True).dtypes

	0
A	float64
B	int64
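
The opt-in behaviour of solution 2 can be approximated today with a small wrapper. This is only a sketch: explode_infer and its infer_type flag are hypothetical names for illustration and not part of pandas, and plain lists are used instead of Series to keep the example short.

```python
import pandas as pd


def explode_infer(df, column, infer_type=False):
    """Hypothetical helper (not a pandas API): explode, then optionally infer dtypes."""
    result = df.explode(column)
    if infer_type:
        # infer_objects() attempts a soft conversion of object columns
        # to a more specific dtype; other columns are left unchanged.
        result = result.infer_objects()
    return result


df = pd.DataFrame({"A": [[1, 2, 3], [4, 5]], "B": 1})
default = explode_infer(df, "A")                    # column A stays object
inferred = explode_infer(df, "A", infer_type=True)  # column A is inferred
print(default["A"].dtype, inferred["A"].dtype)
```

Keeping the flag off by default leaves the existing behaviour untouched for callers who do not ask for inference.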

Describe alternatives you've considered

Currently, I use the following workaround:

s = pd.Series([1, None, 3])  # <- dtype('float64'); None is converted to NaN
df = pd.DataFrame({'A': [s, s, s, s], 'B': 1})

d = df.A[0].dtype
df2 = df.explode("A")
df2.A = df2.A.astype(d)

API breaking implications

Not sure.

Comment From: WillAyd

I think this is reasonable. Can probably add a maybe_infer_objects call on the result of the exploded column (you'll find this in pandas._libs.lib)

Comment From: erfannariman

Can you elaborate, @WillAyd? I looked in lib, but there was nothing even starting with maybe_infer. In the whole project I could only find:

maybe_infer_freq
maybe_infer_tz
maybe_infer_dtype_type

Comment From: WillAyd

My mistake - I meant to say maybe_convert_objects which is in pandas._libs.lib

Comment From: jreback

OK, this is not going to work as-is, as it's a breaking change. I would suggest that we add a downcast=None|'infer' option which folks can opt in to (and at some point we could deprecate it and change the default).

Comment From: jbrockmendel

if the implementation is just going to be if infer: return result.infer_objects(copy=False), then we should just tell users to do that directly
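
For reference, calling infer_objects directly on the exploded frame might look like the sketch below. The sample data is made up for illustration, and the copy argument is left out so the call does not depend on a particular pandas version.

```python
import pandas as pd

# Illustrative data: the empty list in the second row explodes to NaN.
df = pd.DataFrame({"A": [[1, 2], [], [3]], "B": 1})

exploded = df.explode("A")            # column A has object dtype
converted = exploded.infer_objects()  # soft-converts object columns

print(exploded["A"].dtype)
print(converted["A"].dtype)
```

Because of the NaN produced by the empty list, the inferred dtype here is float rather than int.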

Comment From: daskol

"then we should just tell users to do that directly"

I think the issue is that pandas does not preserve the element type at the moment. From my perspective, explode should by default create a column with the common dtype of all list elements, because there is no meaningful reason to change the element type from np.int64 to object. So, explode should act like [[a]] -> [a], but this requires that the column elements actually share a common list type [a].
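
A rough sketch of the [[a]] -> [a] idea, done outside pandas: compute the common dtype of the cells first, then cast the exploded column to it. explode_common_dtype is a hypothetical helper, and it assumes every cell is a non-empty Series or ndarray with a NumPy dtype. It makes no attempt to handle empty cells, scalars, or extension dtypes.

```python
import numpy as np
import pandas as pd


def explode_common_dtype(df, column):
    """Hypothetical helper: explode, then cast to the common dtype of the cells."""
    # Assumption: each cell exposes a NumPy dtype (Series or ndarray).
    common = np.result_type(*[cell.dtype for cell in df[column]])
    # astype with a mapping converts only the exploded column.
    return df.explode(column).astype({column: common})


ints = pd.Series([1, 2, 3])    # int64
floats = pd.Series([0.5, 1.5])  # float64
df = pd.DataFrame({"A": [ints, floats], "B": 1})

out = explode_common_dtype(df, "A")
print(out.dtypes)
```

Here np.result_type promotes int64 and float64 to float64, which matches the int -> float -> object ordering mentioned in the original request.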