Code Sample, a copy-pastable example if possible
>>> dfs = pd.read_html('<table><tr><td>1</td></tr></table>')
ImportError: lxml not found, please install it
>>> dfs = pd.read_html('<table><tr><td>1</td></tr></table>', flavor='bs4')
[   0
0  1]
Problem description
The documentation explicitly states, both on the HTML parsing gotchas page and in the argument's docstring, that parsing falls back to 'bs4+html5lib' if 'lxml' fails. For me there is no fallback, just an ImportError, since I do not have lxml installed.
Expected Output
Either the docs should be changed to require an explicit flavor input, or the ImportError should be caught.
Output of pd.show_versions()
Comment From: TomAugspurger
Looks like in https://github.com/pandas-dev/pandas/blob/37dfcc1acf3b37a1ff5251fee3380a179da1f2ed/pandas/io/html.py#L885-L886 we unconditionally raise an exception, even if the flavor was implicit from None. Will need some logic to handle that case.
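A minimal sketch of the kind of logic this comment asks for. It assumes a hypothetical `resolve_parser` helper and stand-in loaders (`_LOADERS`, `_DEFAULT_FLAVORS` and the `_load_*` functions are illustrative names, not pandas internals). An explicitly requested flavor still raises ImportError straight away. When the flavor was implicit from None, a missing backend is recorded and the next one is tried, and the helper raises only once every default backend has failed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in loaders, not pandas internals. This sketch pretends
# lxml is not installed, so its loader raises ImportError as in the report.
def _load_lxml() -> str:
    raise ImportError("lxml not found, please install it")


def _load_bs4() -> str:
    return "bs4"


_LOADERS: Dict[str, Callable[[], str]] = {"lxml": _load_lxml, "bs4": _load_bs4}
_DEFAULT_FLAVORS = ("lxml", "bs4")  # preference order used when flavor is None


def resolve_parser(flavor: Optional[str] = None) -> str:
    """Return the name of the parser backend to use (hypothetical helper)."""
    if flavor is not None:
        # Explicit request: no fallback, so any ImportError propagates.
        return _LOADERS[flavor]()

    # Implicit default: try each backend in order and remember the failures.
    errors: List[ImportError] = []
    for flav in _DEFAULT_FLAVORS:
        try:
            return _LOADERS[flav]()
        except ImportError as exc:
            errors.append(exc)
    # Give up only after every default backend has failed.
    raise ImportError("no HTML parser available: " + "; ".join(map(str, errors)))


print(resolve_parser())  # falls back past the missing lxml to bs4
try:
    resolve_parser("lxml")
except ImportError as exc:
    print(exc)  # an explicit request still fails loudly
```

Keeping the explicit case strict means a user who deliberately asks for lxml still learns it is missing, rather than silently getting a different parser.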
Comment From: bsipocz
I've run into this very same problem: flavor was set to ('lxml', 'bs4') from None, and the loop never actually got to bs4.
Would changing the order of the tuple, or rewriting the loop, be controversial? If no large or complicated consequences are foreseen, I could open a PR for this.
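A small simulation comparing the two options, assuming backend availability can be modelled as a set of installed names. `_load`, `first_flavor_strict` and `first_flavor_lenient` are hypothetical helpers, not pandas code, and the simulation ignores the fact that the real bs4 flavor also needs html5lib. In the strict loop, a missing backend raises before the next flavor is tried, so swapping the tuple order only moves the failure to users who lack bs4 instead of lxml. Catching the ImportError inside the loop works with either ordering.

```python
from typing import AbstractSet, Callable, List, Sequence

Picker = Callable[[Sequence[str], AbstractSet[str]], str]


def _load(flav: str, installed: AbstractSet[str]) -> str:
    """Stand-in for importing a parser backend (hypothetical)."""
    if flav not in installed:
        raise ImportError(f"{flav} not found, please install it")
    return flav


def first_flavor_strict(flavors: Sequence[str], installed: AbstractSet[str]) -> str:
    """Models the reported behaviour: the ImportError escapes the loop."""
    for flav in flavors:
        # Nothing catches the ImportError from _load, so later flavors
        # are never reached when the first one is missing.
        return _load(flav, installed)
    raise ValueError("no flavors given")


def first_flavor_lenient(flavors: Sequence[str], installed: AbstractSet[str]) -> str:
    """Rewritten loop: a missing backend just moves on to the next flavor."""
    errors: List[str] = []
    for flav in flavors:
        try:
            return _load(flav, installed)
        except ImportError as exc:
            errors.append(str(exc))  # record the failure, try the next flavor
    raise ImportError("; ".join(errors))


def outcome(pick: Picker, flavors: Sequence[str], installed: AbstractSet[str]) -> str:
    """Return the chosen flavor, or the ImportError message as a string."""
    try:
        return pick(flavors, installed)
    except ImportError as exc:
        return f"ImportError: {exc}"


for order in [("lxml", "bs4"), ("bs4", "lxml")]:
    for installed in [{"bs4"}, {"lxml"}]:
        print(order, sorted(installed),
              "strict:", outcome(first_flavor_strict, order, installed),
              "| lenient:", outcome(first_flavor_lenient, order, installed))
```

Reordering the tuple would also change which parser is preferred when both are installed, which may be one reason to favour rewriting the loop instead.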