xref https://github.com/pandas-dev/pandas/pull/50205#discussion_r1046097646
If any dependency has a minimum version specified in
https://github.com/pandas-dev/pandas/blob/16b9c98bfedcfae031df5d570ef68a2d126826b7/pyproject.toml#L57-L119
then if that dependency appears in environment.yml
or any of the *.yaml
files, then it should either be pinned to that minimum version, or be marked as >=
that minimum version
Let's add a script to automate checking this!
https://github.com/pandas-dev/pandas/blob/main/scripts/validate_min_versions_in_sync.py is kind of related
Feel free to tag me for review if you take this on
Comment From: mroeschke
Notes:
ci/deps/actions-*-minimum_versions.yaml
should be pinned as=
instead of>=
- IMO
validate_min_versions_in_sync.py
should be updated to ensure the dependency files are also in sync
Personally not sure if this is a good first issue as there are a lot of moving parts but happy to be wrong
Comment From: seanjedi
Hmm, I believe I understand the ask.
So essentially get all the dependencies from pyproject.toml optional-dependencies, and compare them against the dependencies within environment.yml, and if there are any dependencies that are newer within pyproject.toml tan environment.yaml, then update environment.yaml to either =
or >=
correct?
So if that is the case, then:
1) how do we determine when we want to have it =
or >=
? Would that be depdendent on how pyproject.toml has it?
2) what if environment.yml somehow has a newer depdency, then should be ignore it, or should we downgrade it to the version that pyproject has it at?
Also can I take this?
Comment From: MarcoGorelli
I'd say:
1. if it already has =
then keep as-is, else, add >=
2. not sure I understand the question, can you show an example?
And yes, sure!
Comment From: seanjedi
For 2 let's say hypothetically we have a situation where In environment.yml we have:
- numba=0.53.1
and in pyproject we have:
numba>=0.52
Should I check if the dependencies in pyproject are indeed greater than the ones in environment, or if they differ then set the dependencies to pyproject
This might be all moot, but I wanted to doublecheck since in my head the way I was thinking of comparing the two is if pyproject has a dependency greater than environment, then update environment, otherwise stay the same, or do we want the two be equal?
I think the case we have, is that we want it to be equal, so above even though environment has a great version, we should have it pinned to be numba>=0.52
Comment From: seanjedi
take
Comment From: MarcoGorelli
If there's already a pin, then as long as the version is greater than the minimum version, you can leave it as it is
If there's no pin, then just put >=
and the minimum version
Comment From: seanjedi
So should I create a new file for this, or extend validate_min_versions_in_sync.py to be able to update these dependencies as well?
Comment From: MarcoGorelli
it's probably fine to extend that file if possible
Comment From: seanjedi
Created a branch here to work on the issue: seanjedi-50207
Comment From: seanjedi
@MarcoGorelli I have a question regarding which YAML files to check
Right now I am checking this list of YAML file (these are all the .yml files I can find in both the root
and ci/deps
, not all of these have dependencies, that is correct right? )
Also looking at how we were checking the dependencies earlier, it seems that we were only concerned with the optional dependencies, should I only be checking optional dependencies, or if I find any dependency that is both within pyproject.toml and any yaml file, then I should pin it?
Comment From: MarcoGorelli
you only need to look at environment.yml
and ci/deps/*.yaml
, the other yaml files are irrelevant
I find any dependency that is both within pyproject.toml and any yaml file, then I should pin it
yup (unless it's pinned already)
Comment From: seanjedi
Update (Since I have not updated in a long while): I am still working on this issue, but progress has slowed down a lot since the last commit I made in my branch. This began with the Holiday's taking my time away so that I can spend it with family. Now it is due to graduate school, and a few requirements for graduation that appeared that I have to appeal against now due to administration. I bring this up so that if someone sees this, I want to ensure that this issue is not dead and that if anyone else wants to work on this issue, I would be open to that (especially collaboratively), and please let me know so that I can let you know what I have worked on so that we don't end up doing the same work.
Otherwise, I will continue working on this issue when I have the chance.
Comment From: MarcoGorelli
cool, thanks - I've removed the assignment so it's clear this is open
Comment From: kathleenhang
take
Comment From: kathleenhang
@seanjedi Hi there, I saw you mentioned about working collaboratively, and I'm open to that. I did see the branch you created for this issue. Feel free to let me know if you worked on anything extra or details on how you'd like to collaborate together. I'm going to start working on this issue now.
Edit: Here is the WIP scripts/validate_min_versions_in_sync.py
Also, I have a question. What types of characters are allowed for the version portion of the YAML dependency files? I ask this because the only dependency file in ci/deps/ which breaks the script is:
ci/deps/actions-pypy-38.yaml - line 8 - contains:
- python=3.8[build=*_pypy] # TODO: use this once pypy3.8 is available
Because it contains 2 "=" signs. I am wondering if I should write code to accommodate this or if this is out of place and only digits and decimals should be accommodated for the version portion?
Comment From: kathleenhang
@MarcoGorelli Could you please review my script? I linked it above. I'm wondering if it will run too slow or be too risky since it would create a new file and delete the old one. I'm still working on it, but wanted to know if I'm headed in the right direction.
Comment From: MarcoGorelli
nice! taking a look 👀
Comment From: MarcoGorelli
hey @kathleenhang, looks like you're on the right path - the script doesn't run at the moment though: if you have a list, then .append
will return None
, you probably wanted something like
all_yaml_files = list(YAML_PATH.iterdir())
all_yaml_files.append(ENV_PATH)
And
data = yaml_f.read()
yaml_file = yaml.safe_load(yaml_f)
should probably have been
data = yaml_f.read()
yaml_file = yaml.safe_load(data)
?
If you can get it to run without errors, then I'll take a closer look
Comment From: kathleenhang
@MarcoGorelli Got it working! I commented out line 8 from ci/deps/actions-pypy-38.yaml which shows- python=3.8[build=*_pypy] # TODO: use this once pypy3.8 is available
. If you run the script again, make sure to comment that line out, or the script breaks.
What should be done for these edge cases?
- pyqt5==5.15.1
- numpy<1.24.0
- sqlalchemy<1.4.46
- python=3.8[build=*_pypy] # TODO: use this once pypy3.8 is available
Also the script does mess up on a line in environment.yml: - jinja2>=3.0.0>=3.0.0
. I'm planning to fix that. Other than that, it works for me.
I'm not sure if it will work for you since I am using the imported os python module. I'm not sure if that may be different on Windows due to differing paths, etc.
Let me know what you think, thanks.
Comment From: MarcoGorelli
thanks for updating!
so:
- pyqt5==5.15.1
- you should check that the pin (here, 5.15.1
) is greater or equal to the minimum version
- numpy<1.24.0
- this should be greater than the minimum version. You can set both greater than and less than in pip, see https://stackoverflow.com/questions/50842144/requirements-txt-greater-than-equal-to-and-then-less-than
- python=3.8[build=*_pypy]
- I think we can exclude this one, like in an exclusions list in the script