import pendulum
import pandas
pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())

[1]    56829 segmentation fault  ipython

Problem description

As pendulum claims to inherit from datetime.datetime, I wouldn't expect a segfaut to happen, as date_range always worked find with datetime objects

From the documentation:

The Pendulum class is a drop-in replacement for the native datetime class (it is inherited from it).

I could file an issue on the pendulum bug tracker as well, but that's pandas who's segfaulting, so I think you guys would have better tools to understand what's going on.

Expected Output

DatetimeIndex(['2017-04-12 20:54:52.305104', '2017-04-13 20:54:52.305104'], dtype='datetime64[ns]', freq='D')

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.5.2.final.0 python-bits: 64 OS: Darwin OS-release: 15.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: fr_BE.UTF-8 LOCALE: fr_BE.UTF-8 pandas: 0.19.2 nose: None pip: 9.0.1 setuptools: 34.4.1 Cython: None numpy: 1.12.1 scipy: None statsmodels: None xarray: None IPython: 5.3.0 sphinx: None patsy: None dateutil: 2.6.0 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None matplotlib: None openpyxl: None xlrd: 1.0.0 xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: None httplib2: None apiclient: None sqlalchemy: 1.1.9 pymysql: None psycopg2: 2.6.2 (dt dec pq3 ext lo64) jinja2: 2.9.6 boto: None pandas_datareader: None

Comment From: jreback

this is a pendulum issue. They don't use standard timezones.

In [4]: type(pendulum.now().tz)
Out[4]: pendulum.tz.timezone.Timezone

we accept dateutil and pytz timezones. I suppose we could add (optional) support for this. but all we would do would be to convert to Timestamp anyhow, but it would likely be a major effort for not much gain.

closing this as won't fix / out-of-scope.

Comment From: sdispater

@jreback Author of pendulum here :-) What do you mean by it does not use standard timezone?

>>> isinstance(pendulum.now().tzinfo, datetime.tzinfo)
True

So, as you can see, pendulum uses a subclass of the standard tzinfo.

The underlying problem comes from pandas since it calls:

int(total_seconds(_get_utcoffset(tz, None)))

in the _get_dst_info() function.

However total_seconds() assumes a timedelta while the utcoffset() method of a tzinfo can also return None.

Here is what the offical Python documentation says (https://docs.python.org/3.6/library/datetime.html#datetime.tzinfo.utcoffset):

If the UTC offset isn’t known, return None.

And in this case None is being passed to utcoffset() so there is no way to determine the offset that's why pendulum returns None.

So None is a valid return value, however pandas will segfault when it occurs.

Comment From: TomAugspurger

Thanks for the explanation @sdispater.

Comment From: jreback

I am not averse to changing this, but some concerns.

pendulum doesn't have the notion of naive datetimes w/o actually using a timezone object; this is contrary to both pytz and dateutil which always return a valid utc offset (sure the actual spec says different, but has been this way a long time). rather naive is represented by having NO tz object. This is fairly ingrained in the implementation in pandas.

This is quite awkward in the context of having distinct notions of naive and tz-aware datetimes, which are both first class in pandas. So several paths exist:

  • ban objects that purport to be naive but are actually representing as tz-aware but with no utc offset (raise NotImplementedError is best)
  • convert object from 1) to a tz=None when encountered
  • handle this case in pandas.

2nd and 3rd would require fully passing the entire test suite with these objects fully tested. We test pytz and dateutil objects in many many ways.

So if someone wants to do this go for it.

Would take the first option (ban) immediately (pretty trivial), then others could be over time.

Comment From: pletnes

I ran into this and was thoroughly confused. My jupyter notebook kernel kept dying with no indication of what the error was; it even took a good long time to discover that it was a segfault. Presumably many pandas users don't even know much about segfaults.

Pandas doesn't have to accept pendulum objects, I think, but segfaulting really is not very polite.

Comment From: jorisvandenbossche

PR to for now at least raise an error instead of segfaulting is very welcome! (we can later still see whether to more fully support it or not)

Comment From: daviskirk

We'd also really appreciate an error being raised. Maybe another point here after talking to less experienced devs is that just segfaulting gives so little info on what even happened that it's hard to pinpoint that pendulum is even the problem, so there might be many users that aren't even aware that this is the issue they are looking for and just assume pandas is a buggy library.

Comment From: jreback

In 0.21.x this does not segfault, rather

import pendulum
   ...: import pandas
   ...: pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())
   ...: 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'
Exception ignored in: 'pandas._libs.tslib.tz_convert_single'
Traceback (most recent call last):
  File "pandas/_libs/tslibs/timezones.pyx", line 227, in pandas._libs.tslibs.timezones.get_dst_info
AttributeError: 'NoneType' object has no attribute 'total_seconds'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-c9557dad79f3> in <module>()
      1 import pendulum
      2 import pandas
----> 3 pandas.date_range(start=pendulum.now(), end=pendulum.tomorrow())

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
   2055     return DatetimeIndex(start=start, end=end, periods=periods,
   2056                          freq=freq, tz=tz, normalize=normalize, name=name,
-> 2057                          closed=closed, **kwargs)
   2058 
   2059 

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    322             return cls._generate(start, end, periods, name, freq,
    323                                  tz=tz, normalize=normalize, closed=closed,
--> 324                                  ambiguous=ambiguous)
    325 
    326         if not isinstance(data, (np.ndarray, Index, ABCSeries)):

~/miniconda3/envs/pandas/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in _generate(cls, start, end, periods, name, offset, tz, normalize, ambiguous, closed)
    533             if tz is not None and getattr(index, 'tz', None) is None:
    534                 index = libts.tz_localize_to_utc(_ensure_int64(index), tz,
--> 535                                                  ambiguous=ambiguous)
    536                 index = index.view(_NS_DTYPE)
    537 

pandas/_libs/tslib.pyx in pandas._libs.tslib.tz_localize_to_utc()

pandas/_libs/tslibs/timezones.pyx in pandas._libs.tslibs.timezones.get_dst_info()

AttributeError: 'NoneType' object has no attribute 'total_seconds'

https://github.com/jreback/pandas/commit/97dddbcc26f67c853ec2af3833015940e2c660ff is a working branch which allows pandas to accept pendulum objects.

BUT

1) it needs a lot more testing 2) the interface exposed by pendulum is a bit not-standard compared to both pytz and dateutil, mainly the repr of a tz is custom one, so for example the changes in #18596 should be done first, I suspect a number of things won't work because of this. 3) these would certainly need to be coerced for any actual use within pandas (IOW in a Series).

In [12]: str(pendulum.now().astimezone('UTC').tz)
Out[12]: '<Timezone [UTC]>'

In [13]: str(pytz.utc)
Out[13]: 'UTC'

so if someone wants to take this and run with it would be great.

Comment From: jreback

closing as stale. would take a PR from the community though.

Comment From: philsheard

I see that the Pendulum issue has been closed without a fix, so if anyone wants a workaround to this:

I did a roundtrip from Pendulum > ISO format > Pandas to avoid the internal lookup that causes the warning.

converted_date = pandas.to_datetime(pendulum.today().isoformat())

Not pretty but it will clean your error logs without hurting comprehension.