This is the origin of #23558 and #23677, but IMO worthy to fix in its own right:
Trying to create a MultiIndex from a list of tuples containing NaNs (like Index.str.partition
does if there are NaNs present) yields:
>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()
However, it works easily when passing a tuple of NaNs
>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])
In fact, the length of the tuple is irrelevant:
>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,), ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])
Aside from the inconvenience, it is also a real problem to set elements of an array (say if there are several NaNs) to tuples, because they almost always get interpreted as another axis/list-like and then give creative errors.
I'm thinking this extra fail-safe should not be controversial, and I've got a fix prepared already, which is blocked by #23582 (at least in terms of adding tests).
Comment From: gfyoung
cc @toobaz
Comment From: toobaz
I guess there should be a link between the two following proposals I see in this issue, but I'm failing to grasp it.
1) pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
fails: I think that's perfectly fine. Or at least, I would not call it a bug. Sure, there is scope for an enhancement, but one that I do not feel very important, and that at the same time doesn't seem easy to implement/maintain.
Incidentally, I think that #23558 should be fixed by just passing a tuple of NaN
s, and that pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,), ('d', '', '')])
should have raised - but never mind because it is not harmful and changing it now might break code.
2)
Aside from the inconvenience, it is also a real problem to set elements of an array (say if there are several NaNs) to tuples, because they almost always get interpreted as another axis/list-like and then give creative errors.
What do you mean by "an array"? Creating a MultiIndex
with tuple labels is easy:
In [2]: pd.MultiIndex.from_tuples([('a', 'b', 'c'), ('d', (2, 3), '')])
Out[2]:
MultiIndex(levels=[['a', 'd'], [(2, 3), 'b'], ['', 'c']],
labels=[[0, 1], [1, 0], [1, 0]])
In any case, I fail to see the link with point 1).
Comment From: mroeschke
This failing example seem to work now. Could use a test
Comment From: srkds
This failing example seem to work now. Could use a test
Hi @mroeschke
I went through the issue and related discussions and I found commit , mentioned here
It contains the same taste case that I think we want to add. but I think this is only committed to Fork and I didn't find it on the main.
I would like to contribute, could you please guide me on how should I proceed next?
Thank you!