A common complaint in contributor sprints is that setting up a development environment takes too long. Since the docs have moved from recommending `conda` to `mamba`, this has improved, but I think it could still be better.
For most contributors (especially casual ones at sprints), most dependencies are irrelevant. We could have minimal environment and requirements files which contain the bare minimum to build pandas locally, so people can get started quickly.
I think just `cython numpy python-dateutil pytz pytest pytest-asyncio` should be enough. We could have a script which creates this from `environment.yml` and takes the version numbers from there.
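As a rough sketch of what such a script could look like (assuming `pyyaml` is available and that the output file is named `minimal_environment.yml`; the exact package list and filename are still up for discussion):

```python
# Sketch: derive a minimal environment file from environment.yml,
# keeping the version pins from environment.yml for just the essentials.
import re

import yaml  # pyyaml

MINIMAL = {"python", "cython", "numpy", "python-dateutil", "pytz", "pytest", "pytest-asyncio"}

with open("environment.yml") as f:
    env = yaml.safe_load(f)

minimal_deps = []
for dep in env["dependencies"]:
    # Conda deps are plain strings like "numpy>=1.21"; skip the nested "pip:" section.
    if isinstance(dep, str):
        name = re.split(r"[<>=!]", dep, maxsplit=1)[0].strip()
        if name in MINIMAL:
            minimal_deps.append(dep)

minimal_env = {
    "name": "pandas-dev-minimal",
    "channels": env.get("channels", ["conda-forge"]),
    "dependencies": minimal_deps,
}

with open("minimal_environment.yml", "w") as f:
    yaml.safe_dump(minimal_env, f, sort_keys=False)
```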
Comment From: YvanCywan
Would you mind if I took a look?
Comment From: phofl
This has the disadvantage that you can't run most of the tests.
Comment From: YvanCywan
We could always add more test dependencies to the `minimal_environment.yml`, to at least get more of the tests working properly.
I am assuming that the packages under `# test dependencies` in the `environment.yml` are all that is needed for the majority of them, but I could be wrong.
Comment From: MarcoGorelli
I'll try this out at the next contributor sprint. If it's enough for people to be productive, maybe we can consider adding it to the docs, or it could remain something that's only ever part of the instructions for sprints.
Comment From: phofl
If we want to add this to the docs, we have to add a couple of clarifications: that this is not sufficient to pass all tests, and that some things might fail unexpectedly.
Comment From: mroeschke
This somewhat assumes "minimal" contributions will be bug fixes/enhancements, but doc changes are also common contributions, so should we ensure the doc dependencies are available too?
An alternative idea would be to provide conda lock files for a variety of platforms, so that users skip the slow solve step but still get all the dependencies needed to make any type of contribution: https://github.com/conda-incubator/conda-lock (these can also be used in CI).
Comment From: asishm
This would be a great change to have. The current environment installs a lot of things. For example, pytorch is listed as a dependency (of a downstream package): https://github.com/pandas-dev/pandas/blob/0dadc71dd4653e5b858d7b4153df1d7aded4ba46/environment.yml#L73. As a pandas user who might want to do some minor bugfixes, it's a bit confusing why I also need to have pytorch installed.
I also ran into https://github.com/pandas-dev/pandas/issues/47305 when using the `-j 2` option to speed things up. It happened twice consistently, after which I gave up and switched back to `-j 1`.
Comment From: Dr-Irv
Probably have to add things used in the pre-commit, like `black`, `flake8`, `isort`, and others.
Comment From: YvanCywan
@Dr-Irv Well, that depends on whether someone installs pre-commit for the project or not. Otherwise, the pre-commit CI should still function as normal when the pull request is made.
But to have some pre-PR checks, it might be worth adding it regardless.
Comment From: WillAyd
This could also be a use case for publishing a pandas-dev image on DockerHub.
Comment From: MarcoGorelli
With regards to this particular issue, I've realised that the 311-dev job actually has exactly what I was looking for.
If we just move those requirements into their own file, then that gives a minimal installation with which you can build pandas and run the vast majority of tests.
This would be really useful when running tasks on Colab/Kaggle (for example, bisecting regressions).
https://github.com/pandas-dev/pandas/pull/50339 would do this.
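To make that concrete, a rough sketch of what the Colab/Kaggle workflow could look like once such a file exists (the `requirements-dev-minimal.txt` filename is a placeholder; the actual name is up to the PR):

```python
# Sketch: build pandas from a checkout on Colab/Kaggle using a minimal
# requirements file. "requirements-dev-minimal.txt" is a hypothetical name.
import subprocess
import sys


def run(cmd, cwd=None):
    """Echo and run a command, raising if it fails."""
    print("+", " ".join(cmd))
    subprocess.check_call(cmd, cwd=cwd)


run(["git", "clone", "https://github.com/pandas-dev/pandas.git"])
run([sys.executable, "-m", "pip", "install", "-r", "requirements-dev-minimal.txt"], cwd="pandas")
# Compile the C extensions in place so the checkout is importable
# (when working from inside the pandas directory), e.g. while bisecting.
run([sys.executable, "setup.py", "build_ext", "--inplace", "-j", "4"], cwd="pandas")
```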