Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
from io import StringIO
data1="""
1,Alice
2,Bob
3,Chris
"""
df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
print(df1)
# id name country
# 0 1 Alice NaN
# 1 2 Bob NaN
# 2 3 Chris NaN
data2="""
1,Alice
"""
df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
# ParserError: Too many columns specified: expected 3 and found 2
Issue Description
I want to read a CSV that does not contain a header line.
When the CSV has 3 lines, pandas can read it. But when the CSV has only 1 line, pandas cannot read it, and the following error occurs:
ParserError Traceback (most recent call last)
Cell In[33], line 4
1 data2="""
2 1,Alice
3 """
----> 4 df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
209 else:
210 kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
325 if len(args) > num_allow_args:
326 warnings.warn(
327 msg.format(arguments=_format_argument_list(allow_args)),
328 FutureWarning,
329 stacklevel=find_stack_level(),
330 )
--> 331 return func(*args, **kwargs)
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
935 kwds_defaults = _refine_defaults_read(
936 dialect,
937 delimiter,
(...)
946 defaults={"delimiter": ","},
947 )
948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:611, in _read(filepath_or_buffer, kwds)
608 return parser
610 with parser:
--> 611 return parser.read(nrows)
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1778, in TextFileReader.read(self, nrows)
1771 nrows = validate_integer("nrows", nrows)
1772 try:
1773 # error: "ParserBase" has no attribute "read"
1774 (
1775 index,
1776 columns,
1777 col_dict,
-> 1778 ) = self._engine.read( # type: ignore[attr-defined]
1779 nrows
1780 )
1781 except Exception:
1782 self.close()
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:230, in CParserWrapper.read(self, nrows)
228 try:
229 if self.low_memory:
--> 230 chunks = self._reader.read_low_memory(nrows)
231 # destructive to chunks
232 data = _concatenate_chunks(chunks)
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:808, in pandas._libs.parsers.TextReader.read_low_memory()
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:890, in pandas._libs.parsers.TextReader._read_rows()
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pandas/_libs/parsers.pyx:952, in pandas._libs.parsers.TextReader._convert_column_data()
ParserError: Too many columns specified: expected 3 and found 2
Expected Behavior
pandas should be able to read the CSV even if it has only 1 line.
Installed Versions
Comment From: asishm
Can't reproduce on main
Comment From: yuji38kwmt
Sorry, I had not confirmed on the main branch.
Which pull request solved this bug?
Comment From: PKNaveen
For reference, see the Stack Overflow question on pandas parse errors and the pandas.read_csv documentation.
I was able to replicate this error. Changing the engine to 'python' seems to correct the issue. If no engine is specified, the error occurs, and if you specify the C engine or pyarrow, an error still occurs. I am not sure why this is so; is the C engine the default for this read?
Comment From: yuji38kwmt
Thanks!
When I specified engine="python", the error did not occur.
In [18]: data2="""
...: 1,Alice
...: """
In [25]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="python")
Out[25]:
id name country
0 1 Alice NaN
In [26]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="c")
---------------------------------------------------------------------------
...
ParserError: Too many columns specified: expected 3 and found 2
In [30]: pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"], engine="pyarrow")
---------------------------------------------------------------------------
...
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
* pyarrow v11.0.0
> I am not sure why this is so, is the C engine running on default for this read?
I'm sorry, but I don't know what the default engine is. How can I confirm the default engine?
Comment From: PKNaveen
I could not find anything in the documentation stating which engine is the default, but the Stack Overflow link posted above says the C engine is used by default. So my assumption is that the default is C.
Comment From: lithomas1
Going to close, because I can't reproduce on main (currently 2.1 dev for me). You might want to try 2.0/2.0.1 to see if that fixes it.
Comment From: yuji38kwmt
I have confirmed that there is no problem in pandas v2.0.1.
In [82]: import pandas
...: from io import StringIO
...:
...: data1="""
...: 1,Alice
...: 2,Bob
...: 3,Chris
...: """
...: df1 = pandas.read_csv(StringIO(data1), header=None, names=["id","name","country"])
...: print(df1)
...: # id name country
...: # 0 1 Alice NaN
...: # 1 2 Bob NaN
...: # 2 3 Chris NaN
...:
...:
...: data2="""
...: 1,Alice
...: """
...: df2 = pandas.read_csv(StringIO(data2), header=None, names=["id","name","country"])
...: # ParserError: Too many columns specified: expected 3 and found 2
id name country
0 1 Alice NaN
1 2 Bob NaN
2 3 Chris NaN
In [83]: pandas.__version__
Out[83]: '2.0.1'
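For anyone who cannot upgrade yet, the workaround discussed in this thread can be sketched roughly as below. This is only a minimal sketch: `read_headerless_csv` is a hypothetical helper name (not a pandas function), and it assumes that the failure surfaces as `pandas.errors.ParserError`, as in the traceback above, and that `engine="python"` reads the same input, as reported earlier in the thread. It does not cover the `ValueError` seen with the pyarrow engine.

```python
from io import StringIO

import pandas
from pandas.errors import ParserError

COLUMNS = ["id", "name", "country"]


def read_headerless_csv(text, names):
    """Read header-less CSV text, retrying with the Python engine.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    try:
        # Try the default engine first. On the affected version this is
        # the call that raised ParserError for a one-line CSV.
        return pandas.read_csv(StringIO(text), header=None, names=names)
    except ParserError:
        # Fall back to the Python engine, which was reported in this
        # thread to read the same input without an error.
        return pandas.read_csv(
            StringIO(text), header=None, names=names, engine="python"
        )


# Same one-line input as data2 in the report.
df = read_headerless_csv("\n1,Alice\n", COLUMNS)
print(df)
```

On a version where the default engine already handles this input (2.0.1 according to the comment above), the fallback branch is simply never taken, so the helper should be safe to leave in place after upgrading.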