import pendulum
import pandas
pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())
[1] 56829 segmentation fault ipython
Problem description
As pendulum claims to inherit from datetime.datetime, I wouldn't expect a segfault to happen, as date_range has always worked fine with datetime objects.
From the documentation:
The Pendulum class is a drop-in replacement for the native datetime class (it is inherited from it).
I could file an issue on the pendulum bug tracker as well, but it's pandas that is segfaulting, so I think you guys would have better tools to understand what's going on.
Expected Output
DatetimeIndex(['2017-04-12 20:54:52.305104', '2017-04-13 20:54:52.305104'], dtype='datetime64[ns]', freq='D')
Output of pd.show_versions()
Comment From: jreback
This is a pendulum issue. They don't use standard timezones.
In [4]: type(pendulum.now().tz)
Out[4]: pendulum.tz.timezone.Timezone
We accept dateutil and pytz timezones. I suppose we could add (optional) support for this, but all we would do is convert to Timestamp anyhow, and it would likely be a major effort for not much gain.
closing this as won't fix / out-of-scope.
Comment From: sdispater
@jreback Author of pendulum here :-) What do you mean by "it does not use standard timezones"?
>>> import datetime
>>> import pendulum
>>> isinstance(pendulum.now().tzinfo, datetime.tzinfo)
True
So, as you can see, pendulum uses a subclass of the standard tzinfo.
The underlying problem comes from pandas, since it calls:
int(total_seconds(_get_utcoffset(tz, None)))
in the _get_dst_info() function.
However, total_seconds() assumes a timedelta, while the utcoffset() method of a tzinfo can also return None.
Here is what the official Python documentation says (https://docs.python.org/3.6/library/datetime.html#datetime.tzinfo.utcoffset):
If the UTC offset isn’t known, return None.
And in this case, None is being passed to utcoffset(), so there is no way to determine the offset; that's why pendulum returns None.
So None is a valid return value; however, pandas will segfault when it occurs.
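The mismatch described above can be reproduced with the standard library alone. In this minimal sketch, `NoOffsetWithoutDatetime` is a hypothetical tzinfo (not pendulum's class) that, like the one in this report, only knows its offset when given a concrete datetime, and `offset_seconds_unsafe` is a hypothetical helper that only mirrors the shape of the pandas call quoted above; it is not the actual pandas code.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class NoOffsetWithoutDatetime(tzinfo):
    """Hypothetical tzinfo: knows its offset only when given a concrete datetime."""

    def utcoffset(self, dt):
        # Allowed by the datetime docs: "If the UTC offset isn't known, return None."
        if dt is None:
            return None
        return timedelta(hours=2)

    def dst(self, dt):
        return None if dt is None else timedelta(0)

    def tzname(self, dt):
        return "Hypothetical/Zone"


def offset_seconds_unsafe(tz):
    # Same shape as int(total_seconds(_get_utcoffset(tz, None))):
    # it assumes utcoffset(None) always returns a timedelta.
    return int(tz.utcoffset(None).total_seconds())


tz = NoOffsetWithoutDatetime()
print(tz.utcoffset(datetime(2017, 4, 12, 20, 54)))  # 2:00:00
print(offset_seconds_unsafe(timezone.utc))  # 0, fixed-offset zones ignore dt
try:
    offset_seconds_unsafe(tz)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'total_seconds'
```

The stdlib `timezone` class returns its fixed offset even for `dt=None`, which is why code written against fixed-offset zones never hits this path.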
Comment From: TomAugspurger
Thanks for the explanation @sdispater.
Comment From: jreback
I am not averse to changing this, but some concerns.
pendulum doesn't have the notion of naive datetimes without actually using a timezone object; this is contrary to both pytz and dateutil, which always return a valid UTC offset (sure, the actual spec says otherwise, but it has been this way a long time). Rather, naive is represented by having NO tz object. This is fairly ingrained in the implementation in pandas.
This is quite awkward in the context of having distinct notions of naive and tz-aware datetimes, which are both first class in pandas. So several paths exist:
- ban objects that purport to be naive but are actually represented as tz-aware with no UTC offset (raising NotImplementedError is best)
- convert the objects from option 1 to tz=None when encountered
- handle this case in pandas.
The 2nd and 3rd options would require passing the entire test suite with these objects fully tested. We test pytz and dateutil objects in many, many ways.
So if someone wants to do this, go for it.
I would take the first option (ban) immediately (pretty trivial); the others could come over time.
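The first option (ban) could look roughly like this: check the offset before using it and raise NotImplementedError instead of letting a None reach total_seconds(). This is only a sketch under assumptions; `fixed_offset_seconds` and `NoOffsetWithoutDatetime` are hypothetical names, not pandas internals, and the sketch does not show where such a check would live inside pandas.

```python
from datetime import timedelta, timezone, tzinfo


class NoOffsetWithoutDatetime(tzinfo):
    """Hypothetical tzinfo that returns None from utcoffset(None)."""

    def utcoffset(self, dt):
        return None if dt is None else timedelta(hours=2)

    def dst(self, dt):
        return None if dt is None else timedelta(0)


def fixed_offset_seconds(tz):
    """Option 1 (ban): fail loudly with a clear error instead of crashing
    when the tzinfo cannot report an offset without a concrete datetime."""
    offset = tz.utcoffset(None)
    if offset is None:
        raise NotImplementedError(
            f"tzinfo {tz!r} does not report a UTC offset for dt=None; "
            "this timezone type is not supported"
        )
    return int(offset.total_seconds())


print(fixed_offset_seconds(timezone(timedelta(hours=-5))))  # -18000
try:
    fixed_offset_seconds(NoOffsetWithoutDatetime())
except NotImplementedError as exc:
    print("rejected:", exc)
```

The point of the design is the error type: a NotImplementedError with a readable message tells the user which object is unsupported, rather than an AttributeError deep in a traceback (or a crash).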
Comment From: pletnes
I ran into this and was thoroughly confused. My jupyter notebook kernel kept dying with no indication of what the error was; it even took a good long time to discover that it was a segfault. Presumably many pandas users don't even know much about segfaults.
Pandas doesn't have to accept pendulum objects, I think, but segfaulting really is not very polite.
Comment From: jorisvandenbossche
A PR to, for now, at least raise an error instead of segfaulting is very welcome! (We can later still decide whether to support it more fully or not.)
Comment From: daviskirk
We'd also really appreciate an error being raised. Another point, after talking to less experienced devs: a bare segfault gives so little information about what even happened that it's hard to pinpoint pendulum as the problem, so there might be many users who aren't even aware that this is the issue they are looking for and just assume pandas is a buggy library.
Comment From: jreback
In 0.21.x this does not segfault; rather:
In [1]: import pendulum
   ...: import pandas
   ...: pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())
   ...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-c9557dad79f3> in <module>()
1 import pendulum
2 import pandas
----> 3 pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())
~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
2055 return DatetimeIndex(start=start, end=end, periods=periods,
2056 freq=freq, tz=tz, normalize=normalize, name=name,
-> 2057 closed=closed, **kwargs)
2058
2059
~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
116 else:
117 kwargs[new_arg_name] = new_arg_value
--> 118 return func(*args, **kwargs)
119 return wrapper
120 return _deprecate_kwarg
~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
322 return cls._generate(start, end, periods, name, freq,
323 tz=tz, normalize=normalize, closed=closed,
--> 324 ambiguous=ambiguous)
325
326 if not isinstance(data, (np.ndarray, Index, ABCSeries)):
~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in _generate(cls, start, end, periods, name, offset, tz, normalize, ambiguous, closed)
533 if tz is not None and getattr(index, 'tz', None) is None:
534 index = libts.tz_localize_to_utc(_ensure_int64(index), tz,
--> 535 ambiguous=ambiguous)
536 index = index.view(_NS_DTYPE)
537
pandas/_libs/tslib.pyx in pandas._libs.tslib.tz_localize_to_utc()
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()
AttributeError: 'NoneType' object has no attribute 'total_seconds'
https://github.com/jreback/pandas/commit/97dddbcc26f67c853ec2af3833015940e2c660ff is a working branch which allows pandas to accept pendulum objects.
BUT
1) it needs a lot more testing
2) the interface exposed by pendulum is somewhat non-standard compared to both pytz and dateutil, mainly because the repr of a tz is a custom one, so, for example, the changes in #18596 should be done first; I suspect a number of things won't work because of this.
3) these would certainly need to be coerced for any actual use within pandas (IOW in a Series).
In [12]: str(pendulum.now().astimezone('UTC').tz)
Out[12]: '<Timezone [UTC]>'
In [13]: str(pytz.utc)
Out[13]: 'UTC'
So if someone wants to take this and run with it, that would be great.
Comment From: jreback
closing as stale. would take a PR from the community though.
Comment From: philsheard
I see that the Pendulum issue has been closed without a fix, so if anyone wants a workaround to this:
I did a roundtrip from Pendulum > ISO format > Pandas to avoid the internal lookup that causes the warning.
converted_date = pandas.to_datetime(pendulum.today().isoformat())
Not pretty, but it will clean up your error logs without hurting comprehension.
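The same round trip can be sketched with the standard library alone (Python 3.7+): re-parsing the ISO string yields a plain datetime whose tzinfo is a fixed-offset `datetime.timezone`, which does return an offset from `utcoffset(None)`. `to_stdlib_datetime` and `CustomTz` are hypothetical names, `CustomTz` merely stands in for a third-party tzinfo subclass, and this has not been checked against pendulum itself; the resulting object could then be handed to `pandas.to_datetime` as above.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_stdlib_datetime(dt):
    """Round-trip an aware datetime through ISO 8601 so its tzinfo becomes
    a stdlib fixed-offset datetime.timezone (needs Python 3.7+)."""
    return datetime.fromisoformat(dt.isoformat())


class CustomTz(tzinfo):
    """Hypothetical third-party tzinfo that returns None from utcoffset(None)."""

    def utcoffset(self, dt):
        return None if dt is None else timedelta(hours=2)

    def dst(self, dt):
        return None if dt is None else timedelta(0)

    def tzname(self, dt):
        return "Custom/Zone"


original = datetime(2017, 4, 12, 20, 54, 52, 305104, tzinfo=CustomTz())
converted = to_stdlib_datetime(original)
print(converted.isoformat())  # 2017-04-12T20:54:52.305104+02:00
print(type(converted.tzinfo).__name__)  # timezone
print(converted == original)  # True, same instant
```

The trade-off is that only the fixed offset survives the round trip, not the named zone or its DST rules, so arithmetic across a DST transition will not adjust the offset.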