Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Two tables, each header row + 3 data rows
file1 = "test1.xlsx" # blank row between the tables
file2 = "test2.xlsx" # no blank row between the tables
df1 = pd.read_excel(file1, header=0, nrows=4)
df2 = pd.read_excel(file2, header=0, nrows=4)
print(df1)
print(df2)
assert df1.shape == df2.shape
# The assertion fails: df2 also includes the header row of the following table
Issue Description
Consider two Excel files with nearly identical data: two tables, each with a header row and 3 data rows. The only difference is that the first has a blank row between the tables and the second does not.
The blank row seems to make a difference, even when `nrows` is specified. I expect `nrows=4` to always parse 4 rows, yielding a data frame with a header and 3 data rows. Yet without a blank row, `read_excel` also includes the next row, which is the header of the next table.
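For anyone without the two workbooks at hand, the layout described above could be recreated roughly as follows. This is only a sketch: the column names and cell values are placeholders rather than the contents of the original files, `build_rows` and `write_xlsx` are hypothetical helper names, and writing the files assumes `openpyxl` is installed.

```python
def build_rows(blank_between: bool) -> list:
    """Return the sheet layout: two tables, each a header row plus 3 data rows."""
    # Placeholder headers and values; the real files may differ.
    table1 = [["a", "b"], [1, 2], [3, 4], [5, 6]]
    table2 = [["c", "d"], [7, 8], [9, 10], [11, 12]]
    # An empty list stands for a completely blank spreadsheet row.
    separator = [[]] if blank_between else []
    return table1 + separator + table2


def write_xlsx(path: str, rows: list) -> None:
    """Write the rows to an .xlsx file, leaving empty rows blank."""
    # Third-party dependency (assumption: openpyxl is available).
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)
    wb.save(path)
```

Calling `write_xlsx("test1.xlsx", build_rows(blank_between=True))` and `write_xlsx("test2.xlsx", build_rows(blank_between=False))` should produce files matching the names used in the reproducible example.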
Expected Behavior
I expect `nrows=4` to always parse 4 rows regardless of context: a header and 3 data rows.