• [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
pd.show_versions()

broken = pd.DataFrame([], columns=["a"], dtype="string")
broken.assign(b=[1]) # ValueError: StringArray requires a sequence of strings or pandas.NA

working1 = pd.DataFrame([pd.NA], columns=["a"], dtype="string")
working1.assign(b=[1]) # 0  <NA>  1
working2 = pd.DataFrame([], columns=["a"], dtype="object")
working2.assign(b=[1]) # 0  NaN  1

Full traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../site-packages/pandas/core/frame.py", line 4481, in assign
    data[k] = com.apply_if_callable(v, data)
  File ".../site-packages/pandas/core/frame.py", line 3607, in __setitem__
    self._set_item(key, value)
  File ".../site-packages/pandas/core/frame.py", line 3779, in _set_item
    value = self._sanitize_column(value)
  File ".../site-packages/pandas/core/frame.py", line 4497, in _sanitize_column
    self._ensure_valid_index(value)
  File ".../site-packages/pandas/core/frame.py", line 3853, in _ensure_valid_index
    self._mgr = self._mgr.reindex_axis(index_copy, axis=1, fill_value=np.nan)
  File ".../site-packages/pandas/core/internals/base.py", line 89, in reindex_axis
    return self.reindex_indexer(
  File ".../site-packages/pandas/core/internals/managers.py", line 680, in reindex_indexer
    new_blocks = [
  File ".../site-packages/pandas/core/internals/managers.py", line 681, in <listcomp>
    blk.take_nd(
  File ".../site-packages/pandas/core/internals/blocks.py", line 1512, in take_nd
    new_values = self.values.take(indexer, fill_value=fill_value, allow_fill=True)
  File ".../site-packages/pandas/core/arrays/_mixins.py", line 108, in take
    return self._from_backing_data(new_data)
  File ".../site-packages/pandas/core/arrays/numpy_.py", line 118, in _from_backing_data
    return type(self)(arr)
  File ".../site-packages/pandas/core/arrays/string_.py", line 318, in __init__
    self._validate()
  File ".../site-packages/pandas/core/arrays/string_.py", line 323, in _validate
    raise ValueError("StringArray requires a sequence of strings or pandas.NA")
ValueError: StringArray requires a sequence of strings or pandas.NA

Problem description

The broken code doesn't seem meaningfully different from the working versions, but it throws an exception. From a very quick look (and as hinted by the working2 output), it looks like the StringArray is being constructed with an array containing NaN (float('nan')), and that value is treated differently from pd.NA.
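
To make that hypothesis concrete, here is a toy model of the failure path the traceback suggests (reindexing with fill_value=np.nan, then re-validating the result as a string array). This is not pandas code: `NA`, `validate_strings` and `take_with_fill` are hypothetical stand-ins written only for illustration, and the real internals may differ.

```python
import math


class _NAType:
    """Hypothetical stand-in for pandas.NA (not the real object)."""

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def validate_strings(values):
    """Toy validator: accept only str items or the NA sentinel."""
    for item in values:
        if not (isinstance(item, str) or item is NA):
            raise ValueError("requires a sequence of strings or NA")
    return list(values)


def take_with_fill(values, indexer, fill_value):
    """Toy 'take': -1 in the indexer means 'insert fill_value here'."""
    taken = [fill_value if i == -1 else values[i] for i in indexer]
    # The result is re-validated, as the traceback suggests happens
    # when the string array is rebuilt from the taken data.
    return validate_strings(taken)


empty = []  # models the empty string-typed column

# Filling the new row with the NA sentinel passes validation.
ok = take_with_fill(empty, [-1], fill_value=NA)

# Filling with a float NaN is rejected, mirroring the reported ValueError.
try:
    take_with_fill(empty, [-1], fill_value=math.nan)
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

In this model the only difference between the passing and failing calls is the fill value, which matches the observation that the non-empty string frame and the object-dtype frame both work.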

There are a few other issues mentioning this error, but it wasn't obvious to me whether they're the same:

  • #40800

  • #40823

  • #40824

Expected Output

The broken code should behave like the very similar working1 and working2. Its output should be the same as that of working1:

      a  b
0  <NA>  1

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 5f648bf1706dd75a9ca0d29f26eadfbb595fe52b
python           : 3.8.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.5.0
Version          : Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101
machine          : arm64
processor        : arm
byteorder        : little
LC_ALL           : None
LANG             : en_AU.UTF-8
LOCALE           : en_AU.UTF-8

pandas           : 1.3.2
numpy            : 1.20.2
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.2.4
setuptools       : 49.2.1
Cython           : None
pytest           : 6.1.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : 2.9.1 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 7.22.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : 1.3.20
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : 2.0.1
xlwt             : 1.3.0
numba            : None

Comment From: eemilhaa

Was able to reproduce this with the code snippets.

Here is my output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit           : 5f648bf1706dd75a9ca0d29f26eadfbb595fe52b
python           : 3.9.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-7620-generic
Version          : #21~1626191760~21.04~55de9c3-Ubuntu SMP Tue Jul 20 22:18:55 UTC 
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.2
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.2.4
setuptools       : 57.4.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.26.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

Comment From: sourabh112

Can I try solving this issue (if it is not assigned to someone else)? I'm quite new to this, so I will need some guidance about where to start.

Comment From: mzeitlin11

Looks like #40839 would fix this, so it might not be worth patching here if that would require a workaround, since there's already been progress towards that fix (#41412).

Comment From: phofl

This works now. I guess we can close this, since the initial bug was located somewhere else.