Code Sample, a copy-pastable example if possible

python setup.py build_ext --inplace -j 8

Problem description

See some discussion in #30862 . I have 8 cores on Windows. When I use -j 8, in the middle of the build, I will get this error message:

c:\Code\pandas_dev\pandas\pandas\_libs\tslibs\src\datetime\np_datetime.c : fatal error C1083: Cannot open compiler generated file: 'c:\Code\pandas_dev\pandas\build\temp.win-amd64-3.7\Release\pandas\_libs\tslibs\src\datetime\np_datetime.obj': Permission denied

What I think is happening is that in setup.py, _libs.tslibs.conversion, libs.tslibs.np_datetime and _libs.tslibs.period all have to compile pandas/_libs/tslibs/src/datetime/np_datetime.c . If the parallel build is timed in such a way that two of those extensions are being built at the same time, then they can conflict in writing np_datetime.obj. A similar thing can happen with _libs.lib, _libs.parsers, and _libs.tslibs.parsing all having to compile pandas/_libs/src/parser/tokenizer.c .

If I do two python setup.py build_ext --inplace -j 8 steps in a row, then everything is fine. Also things are fine with -j 4. But I can imagine that even with -j 4, you could end up with the same kind of issue, even in CI.

I don't know enough about how extensions are built, distutils and setuptools in order to avoid this potential collision during the build process.

Output of pd.show_versions()

c:\Code\pandas_dev\pandas\pandas\core\index.py:29: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace. FutureWarning, INSTALLED VERSIONS ------------------ commit : 13858f6e11790bb0f52092d4f8ac65c4057d9e74 python : 3.7.6.final.0 python-bits : 64 OS : Windows OS-release : 10 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : None.None pandas : 0.26.0.dev0+1734.g13858f6e1 numpy : 1.17.3 pytz : 2019.3 dateutil : 2.8.1 pip : 19.3.1 setuptools : 42.0.2.post20191201 Cython : 0.29.14 pytest : 5.3.2 hypothesis : 4.57.1 sphinx : 2.3.1 blosc : None feather : None xlsxwriter : 1.2.7 lxml.etree : 4.4.2 html5lib : 1.0.1 pymysql : None psycopg2 : None jinja2 : 2.10.3 IPython : 7.10.1 pandas_datareader: None bs4 : 4.8.2 bottleneck : 1.3.1 fastparquet : 0.3.2 gcsfs : None lxml.etree : 4.4.2 matplotlib : 3.1.2 numexpr : 2.7.0 odfpy : None openpyxl : 3.0.1 pandas_gbq : None pyarrow : 0.15.1 pytables : None pytest : 5.3.2 s3fs : 0.4.0 scipy : 1.3.1 sqlalchemy : 1.3.12 tables : 3.6.1 tabulate : 0.8.6 xarray : 0.14.1 xlrd : 1.2.0 xlwt : 1.3.0 xlsxwriter : 1.2.7 numba : 0.46.0

Comment From: WillAyd

You are using the MSVC compiler right? Not sure if it matters just asking

Comment From: Dr-Irv

You are using the MSVC compiler right? Not sure if it matters just asking

Yes.

Comment From: alimcmaster1

I sometimes do pandas dev on windows and recall running into this exact problem with -j 4

Comment From: alimcmaster1

Maybe @scoder or someone from Cython can help?

Comment From: adamjstewart

Was this fixed by #30862? Can this be closed now?

Comment From: Dr-Irv

@adamjstewart Don't think so. I opened this one while #30862 was being developed.

Comment From: mgorny

This is actually a pretty awful bug in distutils. Basically, if you use the same source file in two extensions, distutils can hit pretty nasty race conditions compiling the file twice simultaneously. This is especially bad on Linux because there's no immediate error, but the installed extensions are simply broken (e.g. missing symbols). I'm going to submit a PR shortly.

Comment From: leycec

I'm equally astonished, appalled, and disappointed that Pandas developers like @jreback pushed back against #40285 with nonsensical commentary like "why is j more than like 4 actually useful?" I mean, come on. Scalability is an unconditional good. This is not up for meaningful debate. :facepalm:

The trivial resolution here is to do what everyone else in Python's standard scientific stack does: disable parallel building entirely by forcing -j 1 in setup.py. Yes, that's demonstrably awful – but there are no working alternatives on the table. Forcing -j 1 at C extension build time is exactly what SciPy and everyone else does. That's the MostlyRightThing™ to do, because setuptools upstream has no interest or intention of resolving this within the expected lifetime of our Universe.

Fortunately, this is trivial. Just override the finalize_options() method of your custom build_ext class in setup.py to resemble:

class build_ext(_build_ext):
    def finalize_options(self):
        super().finalize_options()

        # Disable distutils parallel build due to upstream race conditions. See
        # pandas-dev/pandas#30873
        if self.parallel:
            print("NOTE: -j build option not supported.")
        self.parallel = None

Boom! Issue trivially solved. Pandas then happily munch on bamboo shoots as a show of appreciation. :bamboo: :panda_face:

Comment From: jonashaag

I can reliably reproduce this on Windows, see e.g. https://github.com/pandas-dev/pandas/pull/46611

Comment From: WillAyd

Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows

Comment From: WillAyd

Also does stdout show the MSVC as actually using the -j flag during compilation or is it mapped to something else? MSVC should be using /MP as the flag, so curious if that gets translated by setuptools along the way

https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=msvc-170

Comment From: Dr-Irv

Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows

Given the discussion above, I don't think it's a compiler issue. It has to do with how setuptools has no intention of fixing this.

Comment From: WillAyd

Might be overlooking but where is the setuptools discussion?

Sent from my iPhone

On Apr 6, 2022, at 8:49 AM, Irv Lustig @.***> wrote:

 Can anyone confirm if choosing a different compiler helps? I assume by default this issue is affecting the visual studio compiler. I'm wondering if it is reproducible using another compiler on Windows

Given the discussion above, I don't think it's a compiler issue. It has to do with how setuptools has no intention of fixing this.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.

Comment From: Dr-Irv

Might be overlooking but where is the setuptools discussion?

See above, particularly https://github.com/pandas-dev/pandas/issues/30873#issuecomment-792274356 and https://github.com/pandas-dev/pandas/issues/30873#issuecomment-849309336

Comment From: WillAyd

I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?

The closest I could find is this. It is similar but not exactly the same:

https://github.com/pypa/setuptools/issues/2442

From personal experience I have found the MSVC to be pretty lacking compared to other compilers, from standards complaince, feature, and error / warning reporting perspectives. So it would be good to at least try some other compilers out and figure out if this is an OS issue or a compiler issue

Depending on what we can figure out there we can try to report upstream in the appropriate channel. I think we all agree this isn't necessarily a pandas issue but if its something we want addressed we can likely push this along forward process of elimination

Comment From: Dr-Irv

I checked setuptools github but didn't find any bug report. Can someone link what is alluded to in those comments?

See my original description at the top of this issue here; https://github.com/pandas-dev/pandas/issues/30873#issue-547779187

The parallelization of the process is done by setuptools where it takes each extension that we want to build (see setup.py starting at line 441), and then just fires of parallel builds, one for each extension up to the limit specified. Since some of the extensions are dependent on the same C file, it could be the case that two parallel compiles are trying to write the same OBJ file at the same time. I'm guessing that MSVC is putting a lock on the OBJ file (probably smart to do), so it then reports an error.

Based on comments above, it could happen in Linux too.

Probably the right solution for us is to split the building of the extensions in two parts. One part where there are no conflicts, and you can use the -j 8 flag, and the other where the extensions must be built sequentially, so you force "-j 1" on that part.

Comment From: lithomas1

closed by #51525.