Hi,

While buildling pandas 0.20.3 on Debian I had 3 timezone handling related test failures.

I also get the 3 errors when running the tests in a virtualenv with 0.20.3 installed via pip

The error tracebacks:

================================================= FAILURES ==================================================
_____________________________ TestTimeZoneSupportDateutil.test_ambiguous_flags ______________________________

self = <pandas.tests.tseries.test_timezones.TestTimeZoneSupportDateutil object at 0x7f742d3702e8>

    def test_ambiguous_flags(self):
        # November 6, 2011, fall back, repeat 2 AM hour
        tz = self.tz('US/Eastern')

        # Pass in flags to determine right dst transition
        dr = date_range(datetime(2011, 11, 6, 0), periods=5,
                        freq=offsets.Hour(), tz=tz)
        times = ['11/06/2011 00:00', '11/06/2011 01:00', '11/06/2011 01:00',
                 '11/06/2011 02:00', '11/06/2011 03:00']

        # Test tz_localize
        di = DatetimeIndex(times)
        is_dst = [1, 1, 0, 0, 0]
        localized = di.tz_localize(tz, ambiguous=is_dst)
        tm.assert_index_equal(dr, localized)
        tm.assert_index_equal(dr, DatetimeIndex(times, tz=tz,
                                                ambiguous=is_dst))

        localized = di.tz_localize(tz, ambiguous=np.array(is_dst))
        tm.assert_index_equal(dr, localized)

        localized = di.tz_localize(tz,
                                   ambiguous=np.array(is_dst).astype('bool'))
        tm.assert_index_equal(dr, localized)

        # Test constructor
        localized = DatetimeIndex(times, tz=tz, ambiguous=is_dst)
        tm.assert_index_equal(dr, localized)

        # Test duplicate times where infer_dst fails
        times += times
        di = DatetimeIndex(times)

        # When the sizes are incompatible, make sure error is raised
        pytest.raises(Exception, di.tz_localize, tz, ambiguous=is_dst)

        # When sizes are compatible and there are repeats ('infer' won't work)
        is_dst = np.hstack((is_dst, is_dst))
        localized = di.tz_localize(tz, ambiguous=is_dst)
        dr = dr.append(dr)
        tm.assert_index_equal(dr, localized)

        # When there is no dst transition, nothing special happens
        dr = date_range(datetime(2011, 6, 1, 0), periods=10,
                        freq=offsets.Hour())
        is_dst = np.array([1] * 10)
        localized = dr.tz_localize(tz)
        localized_is_dst = dr.tz_localize(tz, ambiguous=is_dst)
        tm.assert_index_equal(localized, localized_is_dst)

        # construction with an ambiguous end-point
        # GH 11626
        tz = self.tzstr("Europe/London")

        def f():
            date_range("2013-10-26 23:00", "2013-10-27 01:00",
                       tz="Europe/London", freq="H")
            pytest.raises(pytz.AmbiguousTimeError, f)

        times = date_range("2013-10-26 23:00", "2013-10-27 01:00", freq="H",
                           tz=tz, ambiguous='infer')
        assert times[0] == Timestamp('2013-10-26 23:00', tz=tz, freq="H")

        if dateutil.__version__ != LooseVersion('2.6.0'):
            # see gh-14621
>           assert times[-1] == Timestamp('2013-10-27 01:00:00+0000',
                                          tz=tz, freq="H")
E           AssertionError: assert Timestamp('2013-10-27 01:00:00+0100', tz='dateutil//usr/share/zoneinfo/Europe/London', freq='H') == Timestamp('2013-10-27 02:00:00+0100', tz='dateutil//usr/share/zoneinfo/Europe/London', freq='H')
E            +  where Timestamp('2013-10-27 02:00:00+0100', tz='dateutil//usr/share/zoneinfo/Europe/London', freq='H') = Timestamp('2013-10-27 01:00:00+0000', tz='dateutil/Europe/London', freq='H')

pandas/tests/tseries/test_timezones.py:567: AssertionError
____________________________________ TestTimeZones.test_ambiguous_compat ____________________________________

self = <pandas.tests.tseries.test_timezones.TestTimeZones object at 0x7f742d2ef4e0>

    def test_ambiguous_compat(self):
        # validate that pytz and dateutil are compat for dst
        # when the transition happens
        tm._skip_if_no_dateutil()
        tm._skip_if_no_pytz()

        pytz_zone = 'Europe/London'
        dateutil_zone = 'dateutil/Europe/London'
        result_pytz = (Timestamp('2013-10-27 01:00:00')
                       .tz_localize(pytz_zone, ambiguous=0))
        result_dateutil = (Timestamp('2013-10-27 01:00:00')
                           .tz_localize(dateutil_zone, ambiguous=0))
        assert result_pytz.value == result_dateutil.value
        assert result_pytz.value == 1382835600000000000

        # dateutil 2.6 buggy w.r.t. ambiguous=0
        if dateutil.__version__ != LooseVersion('2.6.0'):
            # see gh-14621
            # see https://github.com/dateutil/dateutil/issues/321
>           assert (result_pytz.to_pydatetime().tzname() ==
                    result_dateutil.to_pydatetime().tzname())
E           AssertionError: assert 'GMT' == 'BST'
E             - GMT
E             + BST

pandas/tests/tseries/test_timezones.py:1263: AssertionError
_______________________________________ TestDST.test_fallback_plural ________________________________________

self = <pandas.tests.tseries.test_offsets.TestDST object at 0x7f742d199fd0>

    def test_fallback_plural(self):
        # test moving from daylight savings to standard time
        import dateutil
        for tz, utc_offsets in self.timezone_utc_offsets.items():
            hrs_pre = utc_offsets['utc_offset_daylight']
            hrs_post = utc_offsets['utc_offset_standard']

            if dateutil.__version__ != LooseVersion('2.6.0'):
                # buggy ambiguous behavior in 2.6.0
                # GH 14621
                # https://github.com/dateutil/dateutil/issues/321
                self._test_all_offsets(
                    n=3, tstart=self._make_timestamp(self.ts_pre_fallback,
                                                     hrs_pre, tz),
>                   expected_utc_offset=hrs_post)

pandas/tests/tseries/test_offsets.py:4860: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/tests/tseries/test_offsets.py:4803: in _test_all_offsets
    self._test_offset(offset_name=name, offset_n=n, **kwds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pandas.tests.tseries.test_offsets.TestDST object at 0x7f742d199fd0>, offset_name = 'minutes'
offset_n = 3
tstart = Timestamp('2013-11-03 01:59:59.999999-0700', tz='dateutil//usr/share/zoneinfo/US/Pacific')
expected_utc_offset = -8

    def _test_offset(self, offset_name, offset_n, tstart, expected_utc_offset):
        offset = DateOffset(**{offset_name: offset_n})

        t = tstart + offset
        if expected_utc_offset is not None:
>           assert get_utc_offset_hours(t) == expected_utc_offset
E           AssertionError: assert -7.0 == -8
E            +  where -7.0 = get_utc_offset_hours(Timestamp('2013-11-03 01:02:59.999999-0700', tz='dateutil//usr/share/zoneinfo/US/Pacific'))

pandas/tests/tseries/test_offsets.py:4810: AssertionError
============================== 3 failed, 454 passed, 5 skipped in 7.12 seconds ==============================

output of show_versions:

INSTALLED VERSIONS

commit: None python: 3.5.4.final.0 python-bits: 64 OS: Linux OS-release: 4.12.0-1-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.20.3 pytest: 3.2.2 pip: 9.0.1 setuptools: 36.4.0 Cython: None numpy: 1.13.1 scipy: None xarray: None IPython: None sphinx: None patsy: None dateutil: 2.6.1 pytz: 2017.2 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml: None bs4: None html5lib: 0.999999999 sqlalchemy: None pymysql: None psycopg2: None jinja2: None s3fs: None pandas_gbq: None pandas_datareader: None

Comment From: jreback

not really sure why that would be. these changes w.r.t. dateutil 2.6.1 were in 0.20.3. pls try with master. These are not reproducible by any ci / locally, so no idea.

Comment From: detrout

Hi. on a different Debian machine than my first test I tried with both pip and git master (commit 328c7e179b72e257e27adf92a06718fd5a40473f) I had the same 3 failures.

Looking at one of the assertion errors, the tz parameter suggests dateutil is using the system zoneinfo database in /usr/share/zoneinfo

E           AssertionError: assert Timestamp('2013-10-27 01:00:00+0100', tz='dateutil
//usr/share/zoneinfo/Europe/London', freq='H') == Timestamp('2013-10-27 02:00:00+0100
', tz='dateutil//usr/share/zoneinfo/Europe/London', freq='H') 

I went and looked at the CircleCI configuration at https://circleci.com/gh/pandas-dev/pandas/4568#config/containers/0

From the CI config It looks like you're using miniconda and I'm guessing its using an Ubuntu base image but I can't tell what version of Ubuntu. Do you know which version it is? Is the zoneinfo database one of the dependencies managed by conda?

On both systems the version of the tzdata package (which provides /usr/share/zoneinfo) is 2017b (with Debian version 2017b-2)

Comment From: detrout

I have a configuration where the tests pass with commit f5cfdbb1f4b715819aceac4b00cf18ba5f467f85, but fail with builds from the tags for v0.20.0 or v0.20.3.

Unfortunately I'm having trouble getting git bisect to work right (an issue with master not descending from v0.20.3), so I haven't found what changed yet.

Comment From: detrout

I found that the 3 errors go away if I apply commit 55af1ab626baf62dbbc00c2521c20be29b819a06 to the 0.20.3 source tree. Debian is shipping dateutil 2.6.1 now. and I see this code block in the diff, I'm guessing this patch should get shipped?

-            if dateutil.__version__ != LooseVersion('2.6.0'):
+            if dateutil.__version__ < LooseVersion('2.6.0'):

Comment From: jreback

oh I thought you were testing master. yes datetutil updated and our tests needed to be slightly changed.

Comment From: jreback

thanks for the debugging run. always happy to have debian (and other packagers) test things out.

Comment From: detrout

You're welcome, and thanks for listening

Comment From: detrout

I took some time and dug...

This piece of code (which is similar to whats called in https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals.py#L1993 ):

numpy.asarray(numpy.nan).astype('i8', copy=False).view('i8')

On amd64 it returns array(-9223372036854775808) On arm64 it returns array(0)

The pandas timestamp class uses NPY_MIN_INT64 to indicate not a time

I accidentally traced through a different version of pandas that didn't call these functions in internals.py::Block::where

other = maybe_convert_string_to_object(other)
other = maybe_convert_scalar(other)

maybe_conver_scalar(other) changes others type from 'float' to 'float64'

is_null_datelike_scalar returns True for a float numpy.nan, but False for a float64 numpy.nan