Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
# You can use any simple csv for demonstration, but this FAILS
df = pandas.read_csv('file://Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv', nrows=10)
# This seems to work with the extra /
df = pandas.read_csv('file:///Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv', nrows=10)
Issue Description
I'm not sure whether this is intended behavior (please close if it is), but when we use file:// as is, the path isn't parsed correctly and urllib throws an error. My gut says there is a small bug in the order of the if-statement checks in pandas/io/common.py.
Expected Behavior
I would expect relative paths to work with the file:// prefix, but I could be wrong here.
Installed Versions
Comment From: pyrito
It looks like this might be expected behavior?
Comment From: mroeschke
I think this follows the file URI scheme behavior: https://en.wikipedia.org/wiki/File_URI_scheme#How_many_slashes?
// should be followed by the hostname, while /// implies an empty hostname.
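To make the slash-counting concrete, here is a minimal sketch using only the standard library's urllib.parse.urlparse (not pandas itself). It shows how the two URIs from the report split into a host part and a path part; the variable names are just for illustration.

```python
from urllib.parse import urlparse

# Two slashes: the first path segment is read as the hostname.
two_slashes = urlparse(
    "file://Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv"
)
# Three slashes: empty hostname, full absolute path preserved.
three_slashes = urlparse(
    "file:///Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv"
)

print(repr(two_slashes.netloc))    # 'Users'
print(two_slashes.path)            # /kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv
print(repr(three_slashes.netloc))  # ''
print(three_slashes.path)          # /Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv
```

With two slashes, "Users" is consumed as the host, so whatever opens the URI ends up looking for a different path than the one intended.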
Comment From: twoertwein
This might be because we try to infer whether we should be using urllib or fsspec to open the file.
It might be nice to remove this urllib code https://github.com/pandas-dev/pandas/blob/9757d1f93faaa517161fd719e884be7344c18b62/pandas/io/common.py#L352 and always use fsspec. But there are several problems with that: 1) fsspec is an optional dependency, and 2) behavior would change (urllib infers some compression, and the two might take different storage_options).
Comment From: pyrito
> I think this follows the file URI scheme behavior: https://en.wikipedia.org/wiki/File_URI_scheme#How_many_slashes?
> // should be followed by the hostname, while /// implies an empty hostname.
That makes sense, but yeah, I guess like @twoertwein alluded to, maybe the error message should reflect that instead of the urllib error. But this may not even be a big issue, so please feel free to close the issue.
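For what a clearer error could look like, here is a rough sketch. The function check_file_uri is hypothetical and not part of pandas or urllib; it only assumes that a non-empty, non-localhost host in a file URI is most likely a missing slash. It also does not percent-decode the path, which a real implementation would need to handle.

```python
from urllib.parse import urlparse


def check_file_uri(uri: str) -> str:
    """Hypothetical helper: return the path of a local file URI or raise a clear error."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # A non-empty host other than localhost usually means a slash is missing.
    if parsed.netloc not in ("", "localhost"):
        raise ValueError(
            f"{uri!r} names the host {parsed.netloc!r}; for a local file "
            f"use three slashes, e.g. 'file:///{parsed.netloc}{parsed.path}'"
        )
    return parsed.path


# Usage: the three-slash form passes, the two-slash form gets a targeted message.
print(check_file_uri("file:///Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv"))
try:
    check_file_uri("file://Users/kvelayutham/Documents/testing/yellow_tripdata_2015-01.csv")
except ValueError as exc:
    print(exc)
```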