Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Each file holds two tables; each table has a header row + 3 data rows
file1 = "test1.xlsx"  # blank row between the tables
file2 = "test2.xlsx"  # no blank row between the tables
df1 = pd.read_excel(file1, header=0, nrows=4)
df2 = pd.read_excel(file2, header=0, nrows=4)

print(df1)
print(df2)
# Fails: df2 also includes the header row of the second table
assert df1.shape == df2.shape

Issue Description

Consider two Excel files with nearly identical data. Each file contains two tables, one below the other, and each table has a header row and 3 data rows. The only difference is that the first file has a blank row between the tables and the second does not.

The blank row changes the result even when nrows is specified. I expect nrows=4 to always parse 4 rows, yielding a DataFrame with a header and 3 data rows. Without the blank row, however, read_excel also includes the next row, which is the header of the second table.
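In case the attached files are unavailable, the following is a minimal sketch that should build comparable workbooks with openpyxl and read them the same way as above. The helper names (write_two_tables, read_first_table), the column names, and the cell values are made up for illustration and are not the contents of the attachments.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook


def write_two_tables(path, blank_row_between):
    """Write two stacked tables (header + 3 data rows each) to one sheet."""
    wb = Workbook()
    ws = wb.active
    # Illustrative data only; the attached files may contain other values.
    tables = [
        [("a", "b"), (1, 10), (2, 20), (3, 30)],  # first table
        [("c", "d"), (4, 40), (5, 50), (6, 60)],  # second table
    ]
    row = 1
    for index, table in enumerate(tables):
        for values in table:
            for col, value in enumerate(values, start=1):
                ws.cell(row=row, column=col, value=value)
            row += 1
        if blank_row_between and index == 0:
            row += 1  # leave one empty row after the first table
    wb.save(path)


def read_first_table(blank_row_between):
    """Build a workbook in a temporary directory and read it as in the report."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "tables.xlsx"
        write_two_tables(path, blank_row_between)
        return pd.read_excel(path, header=0, nrows=4)


df1 = read_first_table(blank_row_between=True)
df2 = read_first_table(blank_row_between=False)

# Compare the shapes; in my environment they differ as described above.
print(df1.shape, df2.shape)
print(df2)
```

If the behavior is present, the last row of df2 should hold the strings "c" and "d", that is, the header of the second table.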

Attached files: test1.xlsx, test2.xlsx

Expected Behavior

I expect nrows=4 to always parse 4 rows regardless of context: a header and 3 data rows.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.11.0rc1
python-bits : 64
OS : Linux
OS-release : 5.15.167.4-microsoft-standard-WSL2
Version : #1 SMP Tue Nov 5 00:21:55 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : 8.12.3
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.3
lxml.etree : None
matplotlib : 3.8.4
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : 8.2.0
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None