A common complaint in contributor sprints is that setting up a development environment takes too long.

Since the docs have moved from recommending conda to mamba, this has improved, but I think it could still be better.

For most contributors (especially casual ones at sprints), most dependencies are irrelevant. We could have minimal environment and requirements files containing the bare minimum needed to build pandas locally, so people can get started quickly.

I think just `cython`, `numpy`, `python-dateutil`, `pytz`, `pytest` and `pytest-asyncio` should be enough; we could have a script which creates this from `environment.yml` and takes the version numbers from there.

Comment From: YvanCywan

Would you mind if I took a look?

Comment From: phofl

This has the disadvantage that you can't run most of the tests.

Comment From: YvanCywan

We could always add more test dependencies to `minimal_environment.yml`, so that at least more of the tests work properly.

I am assuming that the packages under `# test dependencies` in `environment.yml` are all that is needed for the majority of them, though I could be wrong.
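To check which packages that section actually lists, the entries under a given comment header can be pulled out with something like the sketch below. It assumes, as a simplification, that each section of `environment.yml` is a comment line followed by its `- spec` entries and ends at the next blank line, comment, or top-level key; `section_specs` is a hypothetical helper name.

```python
def section_specs(env_text, header="# test dependencies"):
    """Return the '- spec' entries that follow `header` in environment.yml text.

    Hypothetical helper: assumes a section ends at the next blank line,
    comment line, or unindented (top-level) line.
    """
    specs = []
    in_section = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped == header:
            in_section = True
            continue
        if not in_section:
            continue
        if not stripped or stripped.startswith("#") or not line.startswith((" ", "\t")):
            break  # end of this section
        if stripped.startswith("- "):
            # Keep the version pin, drop any trailing comment.
            specs.append(stripped[2:].split("#", 1)[0].strip())
    return specs
```

The packages needed to build pandas itself (for example `numpy` and `python-dateutil`) usually sit in a different section, so a minimal file would combine this list with those.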

Comment From: MarcoGorelli

I'll try this out at the next contributor sprint. If it's enough for people to be productive, maybe we can consider adding it to the docs, or it could remain something that's only ever part of the instructions for sprints.

Comment From: phofl

If we want to add this to the docs, we would have to add a couple of clarifications: that this is not sufficient to pass all tests, and that some things might fail unexpectedly.

Comment From: mroeschke

This somewhat assumes "minimal" contributions will be bug fixes/enhancements, but doc changes might also be common contributions. Shouldn't we ensure the doc dependencies are available too?

An alternative idea would be to provide conda-lock files for a variety of platforms, so that users don't have to run the slow solve step but still get all the dependencies needed to make any type of contribution: https://github.com/conda-incubator/conda-lock (these could also be used in CI).

Comment From: asishm

This would be a great change to have. The current environment installs a lot of things. For example, pytorch is listed as a dependency (of a downstream package): https://github.com/pandas-dev/pandas/blob/0dadc71dd4653e5b858d7b4153df1d7aded4ba46/environment.yml#L73. As a pandas user who might want to make some minor bug fixes, it's a bit confusing why I also need to have pytorch installed.

I also ran into https://github.com/pandas-dev/pandas/issues/47305 when using the `-j 2` option to speed things up. It happened consistently on two attempts, after which I gave up and switched back to `-j 1`.

Comment From: Dr-Irv

We would probably also have to add the tools used by pre-commit, such as black, flake8, isort and others.

Comment From: YvanCywan

@Dr-Irv Well, that depends on whether someone installs pre-commit for the project. If they don't, the pre-commit CI should still run as normal when the pull request is opened.

But to have some pre-PR checks, it might be worth adding those tools regardless.

Comment From: WillAyd

This could also be a use case for publishing a pandas-dev image on Docker Hub.

Comment From: MarcoGorelli

With regard to this particular issue, I've realised that the `311-dev` job actually has exactly what I was looking for.

If we just move those requirements into their own file, that gives a minimal installation with which you can build pandas and run the vast majority of tests.

This would be really useful when running tasks on Colab/Kaggle (for example, bisecting regressions).

https://github.com/pandas-dev/pandas/pull/50339 would do this.