We would like to introduce Gitpod integration as a quick-start development environment.
Gitpod can provide new contributors with quick, automated, ready-to-code development environments. Instead of sending them to read your setup documentation, how about telling them to click/tap a button and pick an issue, so they can start working on their first PR right away?
It may also be useful for experienced contributors who work on many projects. They might notice something they could quickly make a PR for, but not have the time to open your contributor guide just then.
A Gitpod workspace saves you setup time and gets you contributing your changes faster.
pandas already has a working Docker image, so making the custom Gitpod Docker image was relatively easy. I have prepared a Docker image and a yml file to get things going. There are still a few more steps to complete the integration setup.
Next steps
- [x] Open a pandas DockerHub/Quay.io organization account, or use the GitHub container registry, to add the gitpod docker image there
- [x] Add the Gitpod yml file to the repository root
- [ ] Create a GitHub Action for prebuilding the Gitpod Docker image and uploading it to DockerHub. For reference:
  - nebari's Docker actions build to both Quay.io and the GitHub container registry
  - SciPy and NumPy's Gitpod actions build to DockerHub
- [ ] Test that the workflow is working correctly and adjust as needed.
- [x] Write documentation for using Gitpod, as well as guidance on which account to use:
  - SciPy and NumPy Gitpod documentation
  - Any maintainer and active OSS contributor can apply for the open source account.
  - New contributors may use the free account, which provides 50 hours/month, up to 4 parallel workspaces, and a 30-minute timeout on inactivity.
Attachments
- DockerfileGitpod
  - The original pandas Dockerfile, but extended to become a Gitpod! We opted for a Gitpod with prebuilds for fast loading. This requires adding the GitHub Action to generate the Docker image each time the repository is updated.
- gitpod.yml
  - This file still needs tweaking based on where we decide to place the generated Docker images. It also has a few configurations for VS Code extensions, which can be pre-configured for the Gitpod. We can make a few more tweaks as we finalize the setup.
The Docker image was tested locally as follows: after replacing `$gh_username` in the Dockerfile with your GitHub username, you should be able to build the DockerfileGitpod with the command `docker build . -f DockerfileGitpod` (run from the directory the file is located in). It can be fiddly on M1 Macs 🚨.
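As a rough illustration of the local test described above, here is a minimal Python sketch of the substitution step. The helper names `fill_username` and `build_command` and the sample line are hypothetical and only for illustration; the only things taken from this issue are the `$gh_username` placeholder and the `docker build . -f DockerfileGitpod` command.

```python
PLACEHOLDER = "$gh_username"  # placeholder named in the description above


def fill_username(dockerfile_text: str, gh_username: str) -> str:
    """Return the Dockerfile text with the placeholder replaced by a username."""
    if not gh_username:
        raise ValueError("gh_username must not be empty")
    # Plain string replacement, so other $VARIABLES in the file are left alone.
    return dockerfile_text.replace(PLACEHOLDER, gh_username)


def build_command(dockerfile: str = "DockerfileGitpod") -> list[str]:
    """Return the docker build invocation quoted in the description."""
    return ["docker", "build", ".", "-f", dockerfile]


if __name__ == "__main__":
    # Hypothetical sample line, NOT the real contents of DockerfileGitpod.
    sample = "ARG user=$gh_username\n"
    print(fill_username(sample, "your-github-username"), end="")
    print(" ".join(build_command()))
```

In practice you would read the real DockerfileGitpod, write the substituted text back, and then run the printed command from the directory that contains the file.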
Comment From: mroeschke
> Open a pandas DockerHub/Quay.io organization account, or use the GitHub container registry, to add the gitpod docker image there
Do you know how Gitpod manages image pulls and whether there are quotas?
Comment From: noatamir
I asked on their Discord. Will get back to you ☺️
Comment From: noatamir
They haven't replied yet ⏳. I also sent an e-mail now.
But based on their pricing page, I suspect that there is no quota since all of their plans include the following:
prebuilds: Enable prebuilds to continuously build your Git branches, so you and your team can always start coding right away.
Comment From: noatamir
And we got a reply!
> Thank you for contacting Gitpod. As Gitpod does not host any publicly available Docker images ourselves there wouldn't be any limits you'd be subjected to there. You would need to check with whatever registry you're using to see if they have any limits.
Comment From: mroeschke
Okay cool!
A free DockerHub account has some limits (100/200 pulls per 6 hours, which should be okay): https://docs.docker.com/docker-hub/download-rate-limit/
For the GitHub Container Registry, it isn't as clear to me what quotas exist, but it appears we would have to pay for storing images? https://docs.github.com/en/billing/managing-billing-for-github-packages/about-billing-for-github-packages
Comment From: datapythonista
Sorry, a bit late to the party, didn't see this earlier.
I'd personally have this in a third-party project. I think a similar discussion happened for the VS Code stuff, and that was the conclusion. The pandas project is already huge, and the CI is huge and very complex. I think it's great that things like this exist if contributors find them useful, but I don't think the pandas core team should be the one maintaining them, or that the pandas CI and codebase should become bigger, slower, and more complex, with new things that can break.
I don't think there is any drawback to using another repo, and we can use one in the pandas-dev org, although my preference would be to start in a personal repo first and move it to the pandas-dev org once the project starts to mature.
Comment From: jorisvandenbossche
> I think a similar discussion happened for VS code stuff, and that was the conclusion.
I am not directly aware of such a discussion (we actually already have some VS Code-specific configuration in `.devcontainer.json`, so this was added at some point; there is https://github.com/pandas-dev/pandas/pull/41721, where you indeed objected to further customizing the existing `.devcontainer.json` setup).
But the one discussion that I found on the Gitpod topic is a previous PR (https://github.com/pandas-dev/pandas/pull/34829), where people were actually OK with adding this; the PR just never got merged because the contributor stopped working on it.
As a small anecdote: I helped at two conference sprints in the last two weeks. In the first, I had someone contributing using GitHub Codespaces, and she repeatedly said how amazing it was to be able to work on something directly without first having to set up the whole development environment. In the second, there was someone who struggled with the typical "needs Visual Studio Build Tools on Windows -> cannot install this on company laptop without devops involvement", and a setup like Gitpod could have helped a lot. To be clear, I know that this only supports that it would be nice to have such a Gitpod integration set up, not that it necessarily has to live in the main repo. I do think that it will be more accessible (since that is the standard approach) and better integrated if it lives in the main repo, though.
Comment From: jorisvandenbossche
Some issues we have been running into related to not having write permissions outside of the pandas repo / mamba env:
- Running `pre-commit install` doesn't work: "PermissionError: [Errno 13] Permission denied: '/home/gitpod/.cache/pre-commit'"
- Installing an extra package with mamba doesn't work: "Non-writable cache error"
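For anyone trying to reproduce the permission problems above inside a workspace, a small check like the following might help narrow down which directories are not writable. This is only a sketch: the helper names `nearest_existing` and `is_writable` are made up for illustration, and the only path taken from this thread is the pre-commit cache path from the error message.

```python
import os
from pathlib import Path


def nearest_existing(path: Path) -> Path:
    """Walk up from path until something that exists is found."""
    current = path
    # os.path.exists returns False (instead of raising) on permission errors.
    while not os.path.exists(current) and current.parent != current:
        current = current.parent
    return current


def is_writable(path: Path) -> bool:
    """True if path, or its closest existing ancestor, is writable by this user."""
    return os.access(nearest_existing(path), os.W_OK)


if __name__ == "__main__":
    # First path is the one from the pre-commit error message above.
    candidates = [Path("/home/gitpod/.cache/pre-commit"), Path.cwd()]
    for candidate in candidates:
        status = "writable" if is_writable(candidate) else "NOT writable"
        print(f"{candidate}: {status}")
```

Note that `os.access` only reports what the current user is allowed to do, so it should be run as the same user that runs `pre-commit install` or mamba.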
Comment From: noatamir
The install issue in the last comment is addressed by https://github.com/pandas-dev/pandas/pull/52700 and is already fixed in the Gitpod image we deployed to DockerHub today.