pandas version: '0.19.2'
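import pandas
import requests
from bs4 import BeautifulSoup

url = "http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"

# The enclosing function was not included in the report; this name is a placeholder.
def fetch_board_data():
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "lxml")
        table = soup.find("table", class_="table_grey_border")
        # read_html returns a list of DataFrames; the first one is the stock table
        board_data = pandas.read_html(table.prettify(), header=0, flavor="bs4")
        return board_data[0]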
Problem description
   股份代號  股份名稱        買賣單位  附註  Unnamed: 4  Unnamed: 5  Unnamed: 6
0  1         長和            500       #     H           O           F
1  2         中電控股        500       #     H           O           F
2  3         香港中華煤氣    1000      #     H           O           F
3  4         九龍倉集團      1000      #     H           O           F
4  5         匯豐控股        400       #     H           O           F
5  6         電能實業        500       #     H           O           F
6  7         凱富能源        2000      #     NaN         NaN         NaN
The values in the 股份代號 (stock code) column should keep their leading zeros: 1 should be 00001, 2 should be 00002.
datatype:
股份代號 int64
股份名稱 object
買賣單位 int64
附註 object
Unnamed: 4 object
Unnamed: 5 object
Unnamed: 6 object
dtype: object
Why are the leading zeros missing from this column?
The dtype of 股份代號 should actually be object, not int64.
Comment From: jreback
@sinhrks can you have a look
Comment From: jorisvandenbossche
@nooperpudd Can you try passing dtype={'股份代號': str} to read_html? Values such as 00001 are just interpreted as numbers, hence the 1.
Comment From: jorisvandenbossche
Small correction to the above: it is the converters keyword, not dtype (related PR: https://github.com/pandas-dev/pandas/pull/13575).
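A minimal sketch of the converters approach, assuming the stock codes appear as zero-padded text (for example 00001) in the page's HTML. The inline table is a hypothetical two-row stand-in for the HKEX page, so the example runs without network access; it needs lxml, or bs4 with html5lib, installed for read_html to parse it.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the HKEX table: two rows, three columns.
html = """
<table>
  <tr><th>股份代號</th><th>股份名稱</th><th>買賣單位</th></tr>
  <tr><td>00001</td><td>長和</td><td>500</td></tr>
  <tr><td>00002</td><td>中電控股</td><td>500</td></tr>
</table>
"""

# converters maps a column label to a function applied to each raw cell.
# Using str keeps the cell text as-is instead of parsing it as a number.
tables = pd.read_html(StringIO(html), header=0, converters={"股份代號": str})
df = tables[0]

print(df["股份代號"].tolist())
```

In the original snippet the same keyword would go into the existing call, i.e. pandas.read_html(table.prettify(), header=0, flavor="bs4", converters={"股份代號": str}). Columns without a converter, such as 買賣單位, are still inferred as numeric.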
Comment From: sinhrks
Yeah, the problem should be solved by @jorisvandenbossche's answer :)