pandas version: '0.19.2'


import pandas
import requests
from bs4 import BeautifulSoup


def fetch_board_data():
    url = "http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "lxml")
        # Locate the stock-list table and let pandas parse it.
        table = soup.find("table", class_="table_grey_border")
        board_data = pandas.read_html(table.prettify(), header=0, flavor="bs4")

        return board_data[0]


Problem description

       股份代號             股份名稱   買賣單位   附註 Unnamed: 4 Unnamed: 5 Unnamed: 6
0         1               長和    500    #          H          O          F
1         2             中電控股    500    #          H          O          F
2         3           香港中華煤氣   1000    #          H          O          F
3         4            九龍倉集團   1000    #          H          O          F
4         5             匯豐控股    400    #          H          O          F
5         6             電能實業    500    #          H          O          F
6         7             凱富能源   2000    #        NaN        NaN        NaN

The 股份代號 (stock code) column should keep its leading zeros: 1 should be 00001, 2 should be 00002.

datatype:

股份代號          int64
股份名稱         object
買賣單位          int64
附註           object
Unnamed: 4    object
Unnamed: 5    object
Unnamed: 6    object
dtype: object

Why are the leading zeros missing from this column?

The dtype of 股份代號 should actually be object, not int64.

Comment From: jreback

@sinhrks can you have a look

Comment From: jorisvandenbossche

@nooperpudd Can you try passing dtype={'股份代號': str} to read_html? Values such as 00001 are just interpreted as numbers, hence the 1.
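
If the frame has already been parsed with integer codes, the zeros can also be restored afterwards. This is a minimal sketch rather than anything proposed in the thread: it assumes every stock code is five digits wide (as in 00001), and the small frame is a hypothetical stand-in for the read_html result.

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: codes were read as int64.
board = pd.DataFrame({"股份代號": [1, 2, 3], "買賣單位": [500, 500, 1000]})

# Assumption: codes are five digits wide, so pad with zeros on the left.
board["股份代號"] = board["股份代號"].astype(str).str.zfill(5)

print(board["股份代號"].tolist())
```

Padding after the fact only works when the intended width is known; parsing the column as text in the first place avoids that guess.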

Comment From: jorisvandenbossche

Small correction to the above: it is the converters keyword, not dtype (related PR: https://github.com/pandas-dev/pandas/pull/13575).
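
A rough sketch of what that could look like. The inline HTML is a hypothetical two-row stand-in for the HKEX page, and the call assumes an HTML parser that read_html can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row table standing in for the HKEX stock list.
html = """
<table>
  <tr><th>股份代號</th><th>股份名稱</th><th>買賣單位</th></tr>
  <tr><td>00001</td><td>長和</td><td>500</td></tr>
  <tr><td>00002</td><td>中電控股</td><td>500</td></tr>
</table>
"""

# converters maps a column label to a function applied to each raw cell;
# str keeps the cell text unchanged, so the leading zeros survive.
tables = pd.read_html(StringIO(html), header=0, converters={"股份代號": str})
codes = tables[0]["股份代號"].tolist()
print(codes)
```

In the original snippet the same keyword would go into the existing pandas.read_html(table.prettify(), ...) call.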

Comment From: sinhrks

Yeah, the problem should be solved by @jorisvandenbossche's answer :)