Code Sample, a copy-pastable example if possible
python setup.py build_ext --inplace -j 8
Problem description
See some discussion in #30862. I have 8 cores on Windows. When I use -j 8, in the middle of the build, I get this error message:
c:\Code\pandas_dev\pandas\pandas\_libs\tslibs\src\datetime\np_datetime.c : fatal error C1083: Cannot open compiler generated file: 'c:\Code\pandas_dev\pandas\build\temp.win-amd64-3.7\Release\pandas\_libs\tslibs\src\datetime\np_datetime.obj': Permission denied
What I think is happening is that in setup.py, _libs.tslibs.conversion, _libs.tslibs.np_datetime, and _libs.tslibs.period all have to compile pandas/_libs/tslibs/src/datetime/np_datetime.c. If the parallel build is timed in such a way that two of those extensions are being built at the same time, then they can conflict in writing np_datetime.obj. A similar thing can happen with _libs.lib, _libs.parsers, and _libs.tslibs.parsing all having to compile pandas/_libs/src/parser/tokenizer.c.
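To illustrate the collision described above, here is a minimal sketch. It assumes the object file path is derived only from the source path and the build's temp directory, not from the extension name. The helper names (`object_path`, `find_collisions`), the `build/temp` directory, and the per-extension `.c` files other than `np_datetime.c` are hypothetical and are not taken from setup.py.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def object_path(source, build_temp="build/temp"):
    # Assumption: the output path depends only on the source path,
    # so two extensions listing the same source share one object file.
    return str(PurePosixPath(build_temp) / PurePosixPath(source).with_suffix(".obj"))


def find_collisions(extensions, build_temp="build/temp"):
    """Map each object path to the extensions that would write it,
    keeping only paths claimed by more than one extension."""
    owners = defaultdict(list)
    for name, sources in extensions.items():
        for src in sources:
            owners[object_path(src, build_temp)].append(name)
    return {obj: names for obj, names in owners.items() if len(names) > 1}


SHARED = "pandas/_libs/tslibs/src/datetime/np_datetime.c"

# Hypothetical source lists; only the shared file comes from the report.
EXTENSIONS = {
    "_libs.tslibs.conversion": ["pandas/_libs/tslibs/conversion.c", SHARED],
    "_libs.tslibs.np_datetime": ["pandas/_libs/tslibs/np_datetime.c", SHARED],
    "_libs.tslibs.period": ["pandas/_libs/tslibs/period.c", SHARED],
}

collisions = find_collisions(EXTENSIONS)
for obj, names in collisions.items():
    print(obj, "<-", ", ".join(names))
```

Any object path reported here is a file that two parallel extension builds could try to write at the same time.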
If I do two python setup.py build_ext --inplace -j 8 steps in a row, then everything is fine. Also things are fine with -j 4. But I can imagine that even with -j 4, you could end up with the same kind of issue, even in CI.
I don't know enough about how extensions are built with distutils and setuptools to know how to avoid this potential collision during the build process.
Output of pd.show_versions()
Comment From: WillAyd
You are using the MSVC compiler right? Not sure if it matters just asking
Comment From: Dr-Irv
> You are using the MSVC compiler right? Not sure if it matters just asking
Yes.
Comment From: alimcmaster1
I sometimes do pandas dev on Windows and recall running into this exact problem with -j 4.
Comment From: alimcmaster1
Maybe @scoder or someone from Cython can help?
Comment From: adamjstewart
Was this fixed by #30862? Can this be closed now?
Comment From: Dr-Irv
@adamjstewart Don't think so. I opened this one while #30862 was being developed.
Comment From: mgorny
This is actually a pretty awful bug in distutils. Basically, if you use the same source file in two extensions, distutils can hit pretty nasty race conditions compiling the file twice simultaneously. This is especially bad on Linux because there's no immediate error, but the installed extensions are simply broken (e.g. missing symbols). I'm going to submit a PR shortly.
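One way to picture a guard against this race is a small sketch that lets only one caller compile a given object file and makes every later caller skip it. This is not the approach of any distutils or setuptools patch; `CompileOnce` and `compile_fn` are hypothetical names, and the sketch assumes the parallel builds run as threads inside one process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CompileOnce:
    """Run at most one compile per object path, even under concurrency."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two containers below
        self._locks = {}                # object path -> per-path lock
        self._done = set()              # object paths already compiled

    def build(self, obj_path, compile_fn):
        # Fetch (or create) the lock dedicated to this object path.
        with self._guard:
            lock = self._locks.setdefault(obj_path, threading.Lock())
        # Callers for the same path queue here; other paths are unaffected.
        with lock:
            with self._guard:
                already_built = obj_path in self._done
            if already_built:
                return False
            compile_fn(obj_path)
            with self._guard:
                self._done.add(obj_path)
            return True


calls = []
cache = CompileOnce()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(
        pool.map(lambda _: cache.build("np_datetime.obj", calls.append), range(8))
    )
print(len(calls), sum(results))
```

Eight workers request the same object file, but the stand-in compile function runs once; the remaining callers wait for it to finish and then skip the work.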
Comment From: leycec
I'm equally astonished, appalled, and disappointed that Pandas developers like @jreback pushed back against #40285 with nonsensical commentary like "why is j more than like 4 actually useful?" I mean, come on. Scalability is an unconditional good. This is not up for meaningful debate. :facepalm:
The trivial resolution here is to do what everyone else in Python's standard scientific stack does: disable parallel building entirely by forcing -j 1 in setup.py. Yes, that's demonstrably awful – but there are no working alternatives on the table. Forcing -j 1 at C extension build time is exactly what SciPy and everyone else does. That's the MostlyRightThing™ to do, because setuptools upstream has no interest or intention of resolving this within the expected lifetime of our Universe.
Fortunately, this is trivial. Just override the finalize_options() method of your custom build_ext class in setup.py to resemble:
class build_ext(_build_ext):
    def finalize_options(self):
        super().finalize_options()
        # Disable distutils parallel build due to upstream race conditions. See
        # pandas-dev/pandas#30873
        if self.parallel:
            print("NOTE: -j build option not supported.")
            self.parallel = None
Boom! Issue trivially solved. Pandas then happily munch on bamboo shoots as a show of appreciation. :bamboo: :panda_face:
Comment From: jonashaag
I can reliably reproduce this on Windows, see e.g. https://github.com/pandas-dev/pandas/pull/46611
Comment From: WillAyd
Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows
Comment From: WillAyd
Also does stdout show the MSVC as actually using the -j flag during compilation or is it mapped to something else? MSVC should be using /MP as the flag, so curious if that gets translated by setuptools along the way
https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=msvc-170
Comment From: Dr-Irv
> Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows
Given the discussion above, I don't think it's a compiler issue. It has to do with how setuptools has no intention of fixing this.
Comment From: WillAyd
Might be overlooking but where is the setuptools discussion?
Comment From: Dr-Irv
> Might be overlooking but where is the setuptools discussion?
See above, particularly https://github.com/pandas-dev/pandas/issues/30873#issuecomment-792274356 and https://github.com/pandas-dev/pandas/issues/30873#issuecomment-849309336
Comment From: WillAyd
I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?
The closest I could find is this. It is similar but not exactly the same:
https://github.com/pypa/setuptools/issues/2442
From personal experience I have found MSVC to be pretty lacking compared to other compilers, from standards compliance, feature, and error / warning reporting perspectives. So it would be good to at least try some other compilers out and figure out if this is an OS issue or a compiler issue.
Depending on what we can figure out there, we can try to report upstream in the appropriate channel. I think we all agree this isn't necessarily a pandas issue, but if it's something we want addressed we can likely push this forward by process of elimination.
Comment From: Dr-Irv
> I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?
See my original description at the top of this issue here: https://github.com/pandas-dev/pandas/issues/30873#issue-547779187
The parallelization of the process is done by setuptools, which takes each extension that we want to build (see setup.py starting at line 441) and then just fires off parallel builds, one for each extension, up to the limit specified. Since some of the extensions depend on the same C file, two parallel compiles can end up trying to write the same OBJ file at the same time. I'm guessing that MSVC is putting a lock on the OBJ file (probably smart to do), so it then reports an error.
Based on comments above, it could happen in Linux too.
Probably the right solution for us is to split the building of the extensions in two parts: one part where there are no conflicts, and you can use the -j 8 flag, and the other where the extensions must be built sequentially, so you force -j 1 on that part.
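The two-part split proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `split_by_shared_sources` is a hypothetical helper, the source lists are made up apart from the shared tokenizer.c, and `_libs.example` is an invented extension standing in for one with no shared sources.

```python
from collections import Counter


def split_by_shared_sources(extensions):
    """Return (parallel_safe, sequential) lists of extension names.

    An extension is 'sequential' if any of its sources is also listed
    by another extension; otherwise it is safe to build in parallel.
    """
    # Count how many extensions list each source file.
    counts = Counter(
        src for sources in extensions.values() for src in set(sources)
    )
    parallel_safe, sequential = [], []
    for name, sources in extensions.items():
        if any(counts[src] > 1 for src in sources):
            sequential.append(name)
        else:
            parallel_safe.append(name)
    return parallel_safe, sequential


TOKENIZER = "pandas/_libs/src/parser/tokenizer.c"

# Hypothetical source lists; only the shared tokenizer.c comes from the thread.
EXTENSIONS = {
    "_libs.lib": ["pandas/_libs/lib.c", TOKENIZER],
    "_libs.parsers": ["pandas/_libs/parsers.c", TOKENIZER],
    "_libs.tslibs.parsing": ["pandas/_libs/tslibs/parsing.c", TOKENIZER],
    "_libs.example": ["pandas/_libs/example.c"],  # invented, no shared source
}

parallel_safe, sequential = split_by_shared_sources(EXTENSIONS)
print("build with -j N:", parallel_safe)
print("build with -j 1:", sequential)
```

The first list could be handed to a parallel build and the second built one at a time. A finer-grained split is possible, since only extensions sharing the same file conflict with each other, but the simple two-group version matches the proposal as stated.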
Comment From: lithomas1
closed by #51525.