https://en.wikipedia.org/wiki/List_of_countries_by_road_network_size

Copy and paste the table HTML into a file, then run:

df = pd.read_html(table)
df = pd.DataFrame(df[0])

The result is shown in the image. The Density column values are missing.

[Screenshot: Screen Shot 2566-02-25 at 16 49 16]

Comment From: phofl

Hi, thanks for your report.

Can you please provide a minimal and reproducible example? You can define your html table as

from io import StringIO
import pandas as pd

data = """Put your html table here
"""
pd.read_html(StringIO(data))

Also, please provide your pandas version and dependencies, e.g. by filling out the issue template.

Comment From: manzikki

Here are the first few lines of the table in question. I'm using pandas 1.3.5 on Google Colab.

import pandas as pd
from io import StringIO

roadtable = """

Country Total
(km)
Density
(km/100 km2)
Paved
(km)
Unpaved
(km)
Controlled-access
(km)
Source
& Year
World 64,285,009 47 2021
United States * 6,803,479 69 4,304,715 63% 2,581,895 38% 95,932 1.4% [3] 2019
"""
df = pd.read_html(StringIO(roadtable))[0]

Comment From: phofl

Can you try on the newest pandas version? 2.0.0rc0

Comment From: manzikki

Unfortunately, the issue is still there with 2.0.0rc0:

pd.__version__
'2.0.0rc0'

print(df.columns)
Index(['Country', 'Total .mw-parser-output .nobold{font-weight:normal}(km)',
       'Density (km/100 km2)', 'Paved (km)', 'Paved (km).1', 'Unpaved (km)',
       'Unpaved (km).1', 'Controlled-access (km)', 'Controlled-access (km).1',
       'Source & Year'],
      dtype='object')

print(df['Density (km/100 km2)'])
0   NaN
1   NaN
2   NaN
Name: Density (km/100 km2), dtype: float64

Comment From: phofl

Good. Now please remove everything from your HTML table that is not needed to reproduce the issue. The example should be minimal.

Comment From: m-ganko

Hi, it seems the problem is related to style="display:none". When one of the elements in a table cell has this style attribute, pandas returns NaN for the whole cell, whereas it should skip only that element. A reproducible example is below:

import pandas as pd

html_table = """
    <table>
  <tr>
    <th>Col 1</th>
    <th>Col 2</th>
    <th>Col 3</th>
  </tr>
  <tr>
    <td>1</td>
    <td>2</td>
    <td>3</td>
  </tr>
  <tr>
    <td><span style="display:none"></span>4</td>
    <td>5</td>
    <td>6</td>
  </tr>
</table>
"""

pd.read_html(html_table)[0]

Out[1]:
   Col 1  Col 2  Col 3
0    1.0      2      3
1    NaN      5      6

I can try to fix this problem.
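To make the expected behaviour concrete, here is a minimal sketch that uses only the standard library, not pandas internals. It collects the text of each cell and drops only the content nested inside a display:none element, instead of discarding the whole cell. The names CellTextParser and visible_cells are made up for this illustration, and the style check is a simple substring match, so it is an assumption that inline styles are the only way elements are hidden.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CellTextParser(HTMLParser):
    """Collect the visible text of each <td>/<th>, skipping display:none subtrees."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._cell = None       # text fragments of the current cell, or None
        self._hidden_depth = 0  # > 0 while inside a display:none element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:   # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self._cell = []
        elif self._cell is not None and tag not in VOID_TAGS:
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            if self._hidden_depth or "display:none" in style:
                self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None
            self._hidden_depth = 0  # reset in case a hidden tag was left unclosed
        elif self._hidden_depth and tag not in VOID_TAGS:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside a cell and outside any hidden element.
        if self._cell is not None and not self._hidden_depth:
            self._cell.append(data)


def visible_cells(html):
    """Hypothetical helper: return the table rows as lists of visible cell text."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


row = '<table><tr><td><span style="display:none"></span>4</td><td>5</td><td>6</td></tr></table>'
print(visible_cells(row))  # [['4', '5', '6']]
```

With this approach the first cell of the second data row comes out as 4 rather than NaN, which is the result I would expect from read_html as well.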

Comment From: m-ganko

take

Comment From: m-ganko

I have created a PR which should resolve the main issue.

But the Wikipedia example shows a second problem: read_html also reads the text of <style> elements, which is why the CSS rule ends up in the column name. I would suggest adding a new argument to the read_html function, e.g. skip_style_elements. I could also work on this; just let me know if it makes sense to you and whether I should create a new issue.
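Until something like that exists, one possible workaround is to remove the <style> elements from the HTML string before parsing. The sketch below is only an illustration: strip_style_elements is a made-up helper, skip_style_elements is not an existing read_html argument, and the header fragment is invented to resemble the column name shown earlier in this thread. A regular expression is good enough for simple cases but is not a full HTML parser.

```python
import re

# Matches a whole <style>...</style> element, including its attributes and body.
STYLE_ELEMENT = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)


def strip_style_elements(html: str) -> str:
    """Hypothetical helper: return html with every <style> element removed."""
    return STYLE_ELEMENT.sub("", html)


# Invented header cell that mimics the polluted column name from the report.
header = "<th>Total <style>.mw-parser-output .nobold{font-weight:normal}</style>(km)</th>"
print(strip_style_elements(header))  # <th>Total (km)</th>
```

The cleaned string can then be wrapped in StringIO and passed to pd.read_html as in the earlier examples.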