- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
If a DataFrame has an index whose timezone is set to dateutil.tz.tzlocal(), it cannot be saved to Parquet.
It might be related to #24310.
Code Sample, a copy-pastable example
import pandas as pd
from dateutil.tz import tzlocal
ind = pd.date_range('2020-02-01','2020-04-14').tz_localize(tzlocal())
x = pd.DataFrame([[1,2]]*len(ind), index=ind, columns=['A','B'])
x.to_parquet('tmp.parquet')
Problem description
The code raises ValueError: Unable to convert timezone "tzlocal()" to string.
However, saving to other formats, e.g. CSV, works fine:
x.to_csv('tmp.csv')
Comment From: jorisvandenbossche
@vfilimonov dateutil timezones are currently not supported by pyarrow, see https://issues.apache.org/jira/browse/ARROW-5248
So the best option, for now, is to convert the timezone to a datetime.timezone fixed offset or a pytz timezone.
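A minimal sketch of the fixed-offset route, assuming the offset of the first timestamp is acceptable for the whole range. `tz_convert` keeps every instant unchanged; only the timezone attached to the index changes. A pytz timezone would also work, but it requires knowing the IANA name of the local zone, which `tzlocal()` does not expose directly.

```python
import datetime

import pandas as pd
from dateutil.tz import tzlocal

ind = pd.date_range("2020-02-01", "2020-04-14").tz_localize(tzlocal())

# Assumption: use the UTC offset of the first timestamp for the whole index.
# If the local zone observes DST inside this range, instants are preserved but
# the wall-clock times after the DST switch are shown shifted by the DST delta.
fixed_tz = datetime.timezone(ind[0].utcoffset())
ind_fixed = ind.tz_convert(fixed_tz)

x = pd.DataFrame([[1, 2]] * len(ind_fixed), index=ind_fixed, columns=["A", "B"])
# x.to_parquet("tmp.parquet")  # needs a Parquet engine such as pyarrow
```

The trade-off: a single fixed offset is simple and serializable, but it loses the DST rules of the original zone, so converting back to `tzlocal()` after reading is needed to recover local wall-clock times.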
Comment From: vfilimonov
Thank you for quick response @jorisvandenbossche!
What would be the easiest way to convert the index to a fixed offset?
The only workaround I could think of is iterating over the index and re-parsing the string representation of each timestamp:
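import pandas as pd
from dateutil.tz import tzlocal
ind = pd.date_range('2020-02-01','2020-04-14').tz_localize(tzlocal())
# Re-parse each timestamp's string form (e.g. '2020-02-01 00:00:00+01:00') into a fixed-offset Timestamp
ind = [pd.Timestamp(str(ts)) for ts in ind]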
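The loop above returns a plain Python list rather than an index, and each element carries its own offset, so a range that crosses a DST switch ends up with mixed offsets. A vectorized alternative is sketched below, assuming UTC is an acceptable storage timezone: convert before writing and convert back after reading. The helper name `to_utc_index` is hypothetical, not a pandas API.

```python
import pandas as pd
from dateutil.tz import tzlocal


def to_utc_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose tz-aware index is converted to UTC."""
    out = df.copy()
    out.index = out.index.tz_convert("UTC")  # same instants, UTC has no DST
    return out


ind = pd.date_range("2020-02-01", "2020-04-14").tz_localize(tzlocal())
x = pd.DataFrame([[1, 2]] * len(ind), index=ind, columns=["A", "B"])

x_utc = to_utc_index(x)
# x_utc.to_parquet("tmp.parquet")          # needs a Parquet engine such as pyarrow
# y = pd.read_parquet("tmp.parquet")
# y.index = y.index.tz_convert(tzlocal())  # restore local time after reading
```

Storing UTC avoids the DST ambiguity of a single fixed offset; the local zone is reapplied on read.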
Comment From: jbrockmendel
@jorisvandenbossche is there any prospect of this being supported in pyarrow? If not, it might make sense to add a more helpful error message for this case.
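One way such a message could look is sketched below as a user-side pre-flight check. This is a hypothetical function, not existing pandas or pyarrow code; it assumes that identifying dateutil timezones by their module name (`dateutil.tz.tz` for `tzlocal`) is good enough for a diagnostic.

```python
import pandas as pd
from dateutil.tz import tzlocal


def check_index_tz_for_parquet(index: pd.Index) -> None:
    """Hypothetical check: raise a descriptive error for dateutil timezones."""
    tz = getattr(index, "tz", None)  # non-datetime indexes have no .tz
    if tz is not None and type(tz).__module__.startswith("dateutil"):
        raise ValueError(
            f"Index timezone {tz!r} is a dateutil timezone, which pyarrow cannot "
            "serialize (ARROW-5248). Convert it first, e.g. index.tz_convert('UTC')."
        )


ind = pd.date_range("2020-02-01", "2020-02-03").tz_localize(tzlocal())
check_index_tz_for_parquet(ind.tz_convert("UTC"))  # passes silently
```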