Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# filename is the path to one of the attached files
df = pd.read_html(filename)
Issue Description
At one point, while importing HTML table data acquired from the same source, pandas suddenly rejected a file and failed with this error:
/usr/lib64/python3.9/site-packages/bs4/__init__.py:435: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
warnings.warn(
Traceback (most recent call last):
File "/home/janis/Data/Elektreiba/NOMX-04.py", line 146, in
Two consecutive files are attached (originally misnamed as .xls; they represent the data before and after the problem appeared). With the first file, and all data before it, everything worked fine with no additional library needed. With the second, html5lib was requested with the message:
Traceback (most recent call last):
File "/home/janis/Data/Elektreiba/NOMX-04.py", line 143, in
Both files look pretty similar and both open the same way in Firefox and Excel.
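The warning above suggests opening the file and passing the file handle instead of the name. A minimal sketch of that approach follows; it also tries each parser flavor in turn so it is visible which one accepts which file. The helper name `read_tables` and its `reader` parameter are hypothetical and only serve this illustration. The sketch assumes `pd.read_html` accepts an open binary handle and a `flavor` argument, and that it raises ImportError for a missing parser library and ValueError when no table is found.

```python
from pathlib import Path


def read_tables(path, flavors=("lxml", "bs4"), reader=None):
    """Try each parser flavor in turn and return (flavor, tables).

    `reader` is a hypothetical hook for this sketch; by default it is
    pandas.read_html, imported lazily.
    """
    if reader is None:
        import pandas as pd  # only needed for the default reader
        reader = pd.read_html

    path = Path(path)
    # Fail early if the path is wrong, instead of handing the name
    # itself to the parser as if it were markup.
    if not path.is_file():
        raise FileNotFoundError(path)

    errors = {}
    for flavor in flavors:
        try:
            # Pass an open handle rather than the file name.
            with path.open("rb") as handle:
                return flavor, reader(handle, flavor=flavor)
        except (ImportError, ValueError) as exc:
            # Missing parser library or no table found: remember why
            # and move on to the next flavor.
            errors[flavor] = exc
    raise ValueError(f"no parser could read {path}: {errors}")
```

Calling `flavor, tables = read_tables(filename)` on each of the two attached files would show which parser handles which file, and the collected errors would show why the others were skipped.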
Expected Behavior
The HTML table should be imported successfully in both cases.
example.zip
Installed Versions
INSTALLED VERSIONS
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.9.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.80
Version : #1 SMP PREEMPT Sun Nov 27 13:28:05 CST 2022
machine : x86_64
processor : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
byteorder : little
LC_ALL : None
LANG : lv_LV.UTF-8
LOCALE : lv_LV.UTF-8

pandas : 1.5.3
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.0.1
Cython : 0.29.32
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
Comment From: Jancs-E
I am withdrawing the bug report: suddenly everything started to work without html5lib or any other intervention.