Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas
from io import StringIO

data1="""
1,Alice
2,Bob
3,Chris
"""
df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
print(df1)
#    id   name  country
# 0   1  Alice      NaN
# 1   2    Bob      NaN
# 2   3  Chris      NaN


data2="""
1,Alice
"""
df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
# ParserError: Too many columns specified: expected 3 and found 2

Issue Description

I want to read a CSV that has no header line.

With three lines of data, pandas reads the CSV and fills the missing column with NaN. With only one line of data, pandas cannot read the CSV and raises the following error:

ParserError                               Traceback (most recent call last)
Cell In[33], line 4
      1 data2="""
      2 1,Alice
      3 """
----> 4 df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:611, in _read(filepath_or_buffer, kwds)
    608     return parser
    610 with parser:
--> 611     return parser.read(nrows)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1778, in TextFileReader.read(self, nrows)
   1771 nrows = validate_integer("nrows", nrows)
   1772 try:
   1773     # error: "ParserBase" has no attribute "read"
   1774     (
   1775         index,
   1776         columns,
   1777         col_dict,
-> 1778     ) = self._engine.read(  # type: ignore[attr-defined]
   1779         nrows
   1780     )
   1781 except Exception:
   1782     self.close()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:230, in CParserWrapper.read(self, nrows)
    228 try:
    229     if self.low_memory:
--> 230         chunks = self._reader.read_low_memory(nrows)
    231         # destructive to chunks
    232         data = _concatenate_chunks(chunks)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:808, in pandas._libs.parsers.TextReader.read_low_memory()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:890, in pandas._libs.parsers.TextReader._read_rows()

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:952, in pandas._libs.parsers.TextReader._convert_column_data()

ParserError: Too many columns specified: expected 3 and found 2

Expected Behavior

pandas should be able to read the CSV even when it contains only one line.
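On an affected version, one engine-independent way to get the expected result might be to pad short rows before parsing, so that every row already has as many fields as there are names. The sketch below uses only the standard-library csv module; `pad_rows` is a hypothetical helper written for illustration, not part of pandas.

```python
import csv
from io import StringIO


def pad_rows(text, n_columns):
    """Return CSV text in which every non-empty row has at least n_columns fields.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines, such as the one after the opening quotes
        # Append empty fields until the row is n_columns wide (no-op if already wide enough).
        writer.writerow(row + [""] * (n_columns - len(row)))
    return out.getvalue()


data2 = """
1,Alice
"""
padded = pad_rows(data2, 3)
print(repr(padded))
```

The padded text can then be passed to pandas.read_csv(StringIO(padded), header=None, names=["id","name","country"]). By default pandas treats an empty field as a missing value, so the padded column should come back as NaN.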

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.11.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-60-generic
Version : #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Comment From: asishm

Can't reproduce on main

Comment From: yuji38kwmt

Sorry, I had not checked against the main branch.

Which pull request fixed this bug?

Comment From: PKNaveen

For reference, see the Stack Overflow question "Pandas Parse-errors" and the pandas.read_csv documentation.

I was able to reproduce this error. Changing the engine to 'python' seems to fix it. If no engine is specified, the error occurs, and an error also occurs if you specify the C engine or pyarrow. I am not sure why this is so. Is the C engine used by default for this read?

Comment From: yuji38kwmt

Thanks!

When I specified engine="python", the error did not occur.

In [18]: data2="""
    ...: 1,Alice
    ...: """

In [25]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="python")
Out[25]: 
   id   name  country
0   1  Alice      NaN

In [26]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="c")
---------------------------------------------------------------------------
...
ParserError: Too many columns specified: expected 3 and found 2


In [30]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="pyarrow")
---------------------------------------------------------------------------
...
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
(Tested with pyarrow v11.0.0.)

"I am not sure why this is so. Is the C engine used by default for this read?"

I'm sorry, but I don't know what the default engine is. How can I confirm the default engine?

Comment From: PKNaveen

The documentation does not seem to say which engine is the default, but the Stack Overflow link posted above says the C engine is used by default. So my assumption is that the default is C.
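The traceback in the original report passes through c_parser_wrapper.py and CParserWrapper.read, which suggests the C engine was in use when no engine was given. A possible way to check this directly is sketched below. It relies on the `engine` attribute of the reader object returned with iterator=True, which is an implementation detail rather than documented API, so treat it as a diagnostic under that assumption.

```python
from io import StringIO

import pandas

# iterator=True makes read_csv return a TextFileReader instead of a DataFrame.
# Assumption: the installed pandas exposes the selected engine as `reader.engine`
# (an internal attribute, not documented API).
with pandas.read_csv(StringIO("1,Alice\n"), header=None, iterator=True) as reader:
    default_engine = reader.engine

print(default_engine)
```

If the assumption holds, this prints the name of the engine pandas selected when none was requested.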

Comment From: lithomas1

Going to close, because I can't reproduce on main (currently 2.1 dev for me). You might want to try 2.0/2.0.1 to see if that fixes it.

Comment From: yuji38kwmt

I have confirmed that the problem does not occur in pandas v2.0.1.

In [82]: import pandas
    ...: from io import StringIO
    ...: 
    ...: data1="""
    ...: 1,Alice
    ...: 2,Bob
    ...: 3,Chris
    ...: """
    ...: df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
    ...: print(df1)
    ...: #    id   name  country
    ...: # 0   1  Alice      NaN
    ...: # 1   2    Bob      NaN
    ...: # 2   3  Chris      NaN
    ...: 
    ...: 
    ...: data2="""
    ...: 1,Alice
    ...: """
    ...: df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
    ...: # ParserError: Too many columns specified: expected 3 and found 2
   id   name  country
0   1  Alice      NaN
1   2    Bob      NaN
2   3  Chris      NaN

In [83]: pandas.__version__
Out[83]: '2.0.1'
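Code that has to run on both affected and fixed versions could select the engine from the installed version, using the findings above: 1.5.3 fails with the default engine, and 2.0.1 works. The sketch below is one way to do that. `parse_version` and `choose_engine` are hypothetical helpers, and the 2.0.1 cutoff is an assumption, since 2.0.1 is only the earliest release confirmed working in this thread and the fixing pull request was not identified.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from a version string such as '1.5.3'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def choose_engine(pandas_version):
    """Return 'python' for versions assumed affected, else None (pandas default).

    Assumption: everything before 2.0.1 is treated as affected, because 2.0.1
    is the earliest release confirmed working in this thread.
    """
    return "python" if parse_version(pandas_version) < (2, 0, 1) else None


print(choose_engine("1.5.3"))
print(choose_engine("2.0.1"))
```

The result can be passed straight through, for example pandas.read_csv(..., engine=choose_engine(pandas.__version__)), since engine=None leaves the choice to pandas.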