- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
If a DataFrame has an index whose timezone is set to dateutil.tz.tzlocal(), it cannot be saved to parquet.
It might be related to #24310
Code Sample, a copy-pastable example
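import pandas as pd
from dateutil.tz import tzlocal
# Build a daily index localized to the machine's local timezone (a dateutil tz object)
ind = pd.date_range('2020-02-01','2020-04-14').tz_localize(tzlocal())
x = pd.DataFrame([[1,2]]*len(ind), index=ind, columns=['A','B'])
x.to_parquet('tmp.parquet')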
Problem description
The code raises ValueError: Unable to convert timezone "tzlocal()" to string.
However, saving to e.g. CSV works fine:
x.to_csv('tmp.csv')
Output of pd.show_versions()
Comment From: jorisvandenbossche
@vfilimonov dateutil timezones are currently not supported by pyarrow, see https://issues.apache.org/jira/browse/ARROW-5248
So the best option, for now, is to convert the timezone to a datetime.timezone fixed offset or a pytz timezone.
Comment From: vfilimonov
Thank you for the quick response @jorisvandenbossche!
What would be the easiest way to convert the index to a fixed offset?
I can think of a workaround that iterates over the index and parses the string representation of each timestamp:
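import pandas as pd
from dateutil.tz import tzlocal
ind = pd.date_range('2020-02-01','2020-04-14').tz_localize(tzlocal())
# Re-parse each timestamp from its string form, which carries a fixed UTC offset
ind = [pd.Timestamp(str(_)) for _ in ind]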
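A vectorized alternative to the loop above, as a rough sketch: convert the whole index to a zone that is not a dateutil object before writing. The helper name to_parquet_safe_index and the default target zone "UTC" are assumptions made for illustration and are not part of pandas or pyarrow. Note that this converts to a single zone rather than to per-row fixed offsets, since the date range may cross a DST change and a DatetimeIndex holds only one timezone.

```python
import pandas as pd
from dateutil.tz import tzlocal


def to_parquet_safe_index(df: pd.DataFrame, tz: str = "UTC") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with its tz-aware index converted to `tz`."""
    out = df.copy()
    # tz_convert keeps the same instants in time and only swaps the timezone object
    out.index = out.index.tz_convert(tz)
    return out


ind = pd.date_range("2020-02-01", "2020-04-14").tz_localize(tzlocal())
x = pd.DataFrame([[1, 2]] * len(ind), index=ind, columns=["A", "B"])

y = to_parquet_safe_index(x)
# y.to_parquet("tmp.parquet")  # assumes pyarrow is installed; left commented out here
```

The original local wall-clock times can be recovered after reading the file back by calling tz_convert on the index again, for example with a named zone string for the local zone.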
Comment From: jbrockmendel
@jorisvandenbossche is there any prospect of this being supported in pyarrow? If not, it might make sense to add a helpful error message for this case.