Pandas DEPS: drop 32-bit support - Nineya|java/go/python

we barely test this. I don't seem much clamor for it in the real world.

Comment From: chris-b1

I'm generally in favor, although a little messier on Windows. The default on https://www.python.org/downloads/ for Windows is still 32 bit (not sure why)

Comment From: gfyoung

I agree with @chris-b1 on that sentiment regarding Windows. I would make sure that this is announced somewhat well in advance to make sure we don't surprise people.

Comment From: jreback

@chris-b1 want to ping python and see why the downloads default like that?

Comment From: OmerJog

There is no more default for windows. When you choose a version you get all options (32, 64) and you choose from it.

Comment From: jreback

Going to remove support for 32-bit starting in 0.25.0. Posting a doc warning in 0.24.0. If anyone has a really good reason why we should not do this, speak now or forever hold your peace.

32-bit support has a very small number of downloads now (@TomAugspurger will post some stats). We have very little testing (only on a daily build not accessible to regular contributors), and no testing whatsoever for windows. This is a burden.

Comment From: jorisvandenbossche

Disclaimer: I am a complete noob with regard to this. But, just wondering: does this have implications for more "exotic" systems and platforms?
For example for ARM, Raspberry Pi and the like (cc @jjhelmus from berryconda) or WebAssembly (cc @rth in pyodide).

Further question: what are other packages doing? (numpy, scipy, matplotlib, sklearn, ...) Anybody aware of it already being dropped in one of those packages? (can't directly find a reference about it)

Comment From: jorisvandenbossche

Quickly spoke with some colleagues: their feeling is that it is important to support ARM. Not necessarily 32 bit, but that might be the consequence (I don't know what version of ARM is most widely used, I understand the latest ARM also supports 64bit).

Also, it is maybe good to discuss where exactly the burden is: is it difficult to support in the actual code, or are it more the tests that are failing due to some reasons (int32 vs int64, precision, ..) without necessarily giving incorrect results? Are there ways to improve the situation around testing? Or is it the packaging? (but if that is the case, we can also drop binaries for it without dropping full support (like conda-forge did))

If it is mainly the testing that is the problem (tests easily break if we don't run tests on 32 bit regularly), we could also say that "we don't guarantee that the tests pass on 32 bit, but happily receive patches to get them passing" instead of "we don't support 32 bit". In practice this might be more or less the same for us (we don't test / package for 32-bit), but it's a very different message to outside users / developers.

Comment From: jorisvandenbossche

cc @pandas-dev/pandas-core

Comment From: jreback

e.g. just today: https://github.com/pandas-dev/pandas/issues/24613

Comment From: toobaz

Quickly spoke with some colleagues: their feeling is that it is important to support ARM. Not necessarily 32 bit, but that might be the consequence

This was precisely what I was going to object - although I don't have any authoritative source.

Moreover, my very general and superficial analysis of issues like #24613 (or - now speaking out of experience - #19439 ) is that they actually help us fix mistakes in the code, regardless of the use of 32 bits.

Finally, even assuming that 32 bits is (or will be) the obsolete past, maybe by not breaking compatibility we keep a codebase which is ready for e.g. 128 bits?

Comment From: rth

For example for ARM, Raspberry Pi and the like or WebAssembly

Yes, if pandas explicitly drop supports (as in stops accepting patches for non x86-64 arch, or makes refactoring breaking existing support) it might become difficult to run pandas there cc @mdboom

32-bit support has a very small number of downloads now

I guess you were talking about Windows, but note that for most Linux based exotic arch, people would likely use the packages built for debian / some other distribution (or rebuild from sources). In scikit-learn the debian release CI is how we mostly discover issues on these platforms.

On the other side of the spectrum, there is e.g. PowerPC64 arch used in IBM Blue Gene Q supercomputers. It is 64 bit, but big endian, which is also not tested in CI. It might not be officially supported, but I guess pandas probably mostly works there. Packaging would be done as part of RHEL.

Further question: what are other packages doing? (numpy, scipy, matplotlib, sklearn, ...)

Numpy runs CI for ARM (64 bit I think) with Shippable, and from what I understood there might be some interest in also doing that for scikit-learn.

Moreover, my very general and superficial analysis of issues like #24613 [..] is that they actually help us fix mistakes in the code, regardless of the use of 32 bits.

I had the same experience in other projects.

Overall CPython is very portable and it would be nice if the the main data analysis library in Python could also be run on most arch as is currently the case. I understand that this adds some maintance effort.

Maybe a situation where motivated people in the community report issues and provide patches for those architectures, so that such arch are not officially supported but in practice mostly work (i.e. the test suite is not guaranteed to fully pass) could be good enough?

Comment From: jreback

@rth yeah we would certainly continue accepting patches, as we now do for non-supported arch (Big Endian). The difference would be that we wouldn't build wheels (and make them pass).

Comment From: TomAugspurger

For conda downloads, Linux and Mac 32-bit downloads are negligible (Anaconda Distribution dropped Mac 32 and will drop 32-bit Linux this year).

For windows, the split is 94% to 6% for 64-bit to 32-bit.

I'm not sure how to think about this 6%. That's not negligible.

I also don't think the current system is great. A PR that introduces a 32-bit bug should ideally fail the CI on the PR. If we're OK with supporting 32-bit windows still (which I think we probably should, at 6%) then I'll set up a 32-bit azure build.

Comment From: jorisvandenbossche

@TomAugspurger azure supports to direct test on 32 bit? Then I would personally be in favor of that, as a big part of the problem is the lack of direct testing feedback. (alternatively, here is an idea of sklearn on how to do it on travis with docker: https://github.com/scikit-learn/scikit-learn/issues/9352)

Comment From: TomAugspurger

I thought they did, but I may be mis-remembering.

On Mon, Jan 7, 2019 at 8:51 AM Joris Van den Bossche notifications@github.com wrote:

@TomAugspurger https://github.com/TomAugspurger azure supports to direct test on 32 bit? Then I would personally be in favor of that, as a big part of the problem is the lack of direct testing feedback. (alternatively, here is an idea of sklearn on how to do it on travis with docker: scikit-learn/scikit-learn#9352 https://github.com/scikit-learn/scikit-learn/issues/9352)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/15889#issuecomment-451959286, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQHIrMXxjdXJrWMob1TfAWMwqJjoV1Aks5vA17tgaJpZM4My55m .

Comment From: mdboom

For Pyodide, the plan is to soon have the Pandas regression test suite running inside of the WebAssembly browser environment (which is currently 32-bit). So we should at least catch these things independently of upstream at some point and its nice to hear you aren't opposed to accepting patches to fix such things. But obviously my selfish preference would be to have some 32-bit testing and official support remain in upstream so any regressions in that regard are caught sooner and closer to the source.

Comment From: jjhelmus

32-bit ARM is still an activate platform. The major Linux distributions maintain a 32-bit ARM port (armhf or armhfp). In addition, Raspbian, the default distribution for all models of the Raspberry Pi, is a 32-bit platform.

Comment From: wesm

Is supporting 32-bit really a hardship? I can understand that it requires additional testing, so that is not nothing, but having discipline around explicitly-sized types (i.e. using int64_t etc. instead of int) is a good thing anyway

Comment From: jreback

we don’t have on demand CI testing, only once a day so very hard to test things

if we could have a 32-bit azure build then would solve these things

Comment From: wesm

You can run Docker in Travis CI, so I think you could have a 32-bit Dockerfile to test things

Comment From: jorisvandenbossche

Thanks all for the feedback! I think is is clear that we want to keep (some level of) support for int32?

Personally I would keep an explicit support (and therefore look for ways to improve the testing situation (test it on CI for PRs to have an easier feedback loop)), compared to an "patches welcome" kind of support.

You can run Docker in Travis CI, so I think you could have a 32-bit Dockerfile to test things

Yes, that's basically what I linked to in https://github.com/scikit-learn/scikit-learn/issues/9352, if I understood that correctly.

Shall we open a new issue for adding 32bit to CI?

Comment From: jreback

we already run a 32bit CI in our wheel builder but that setup is completely different than our current setup

might be easier to add an azure build i think as they start with docker images

Comment From: gfyoung

Shall we open a new issue for adding 32bit to CI?

@jorisvandenbossche : There has been one open for a couple of years now by @jreback himself:

https://github.com/pandas-dev/pandas/issues/13715

might be easier to add an azure build i think as they start with docker images

@jreback : Ideally, if they just had a pool for 32-bit, that would be ideal. That being said, Docker looks to be the way to go on Azure or Travis.

Comment From: jreback

@pandas-dev/pandas-core since consensus appears to be to keep 32-bit, then let's make a concerted effort to enable this on per-commit CI to enable actual testing, rather than post-facto fixing, which is really a PITA.

Comment From: jorisvandenbossche

There has been one open for a couple of years now

OK, let's close this then in favor of #13715. I will move some of the ideas mentioned here about testing there.