We've been stuck on how to handle PEP8 cleanups for a while. On the one hand PEP8 storms break PRs and git blame, OTOH the back and forth on code review about whitespace fixes drives some maintainers (me) crazy.
But: - Autopep8 accepts a line-range argument which tells it to only fix problems in a given area. - git diff tells you which lines on which file were altered between two points in history.
So:
- write a script "pep8-radius.py" that takes a commit hash h
as input, find altered lines
between HEAD and h
and then invokes autopep8 on a given radius around those
altered lines to locally cleanup where things changed.
- Do one last pep8 storm.
- Get contributors to run the script and include/squash the cleanups in the PR.
This is a fun little hack which might become quite handy outside pandas. would appreciate someone picking this idea up and making it happen as I'm a little burned out on this sort of work.
Comment From: hayd
This is a great idea. I've put something together which runs over the changed lines (I just grab the output of git diff manually) will post as another project as it's pretty generic.
I think may be some bug(s) in autopep8 when using the range argument over already indented code, it doesn't fix it. Seems to get most other infractions though.
Comment From: hayd
@y-p A first go is here https://github.com/hayd/btyfi/blob/master/btyfi.py
pip installable as either pep8radius or Better-Than-You-Found-It... but expect bugs! (Edit: renamed)
Comment From: schodge
I had wanted to attempt this, but didn't have time - at any rate, this is a far nicer implementation than I would have come up with (never used argparse before).
Would adding the check GitHub commit hash functionality just be a matter of pulling the URL of this form:
https://stackoverflow.com/questions/22180869/get-git-commit-from-short-hash-using-github-api
using something like httplib, and then feeding into the code already present? If so, I'll try to write it up tomorrow as a break in hunting for a mysterious parasitic capacitance plaguing my impedance meter...
Shayne
Andy Hayden mailto:notifications@github.com Tuesday, March 25, 2014 10:28 PM
@y-p https://github.com/y-p A first go is here https://github.com/hayd/pep8-radius/blob/master/pep8radius.py
pip installable as pep8radius... but expect bugs!
— Reply to this email directly or view it on GitHub https://github.com/pydata/pandas/issues/6248#issuecomment-38651189.
Comment From: hayd
@schodge I'm just naively passing rev to git diff and parse the results, I think git diff should just accept either branch name (e.g. master) or hash... But I confess to not having to fully tested (I'm not sure a good way to test these in general)!
I'm also not 100% that I'm checking the correct lines (I think what I'm actually doing is autopep8-ing those which are displayed in the git-diff, which is ok, but maybe it should be the subset of just those edited lines? Would love to have more eyes/any fixes!
Comment From: hayd
I tagged this for 0.15, perhaps we can pep8 storm after 0.14 (as @y-p suggests) and include this script / add to dev page / wiki etc.
cc @jreback @jorisvandenbossche (not sure else may be interested)
Comment From: hayd
I've found a couple of bugs in the line range argument of autopep8, and fixed one. Specifically indentation levels aren't fixed (e.g. tabs vs spaces - which is tricky to review in GH so would be good to fix).
(It already ensures new lines at the end of file.) Any other critical ones?
Comment From: jorisvandenbossche
@hayd Tried it out and seems to work, very nice! Although, I am not really sure about the name ... :-)
I think we should just more stress that new contributions have to follow pep8, and then we can point them to this to help with that. So in that case it doesn't matter if it's tagged 0.14 or 0.15, as also new contributions at this moment should ideally be pep8 compliant.
Comment From: hayd
I think the original idea is that we'd do a big pep8 storm, then (new) contributors could be directed to this to run this before commiting / rebasing etc. hence ensuring they are PEP8-y. It's v. easy just to run it!
From above: "the back and forth on code review about whitespace fixes drives some maintainers (me) crazy." :)
Comment From: jreback
I think we should do a pep fix, but want to work down some of the issues (your @hayd) first .... then should do it in several commits....e.g. do a few files at a time to avoid creating massive confusion....
Comment From: hayd
The good thing about this script, is that people can start using it before any mass pep8 fix (though I agree we should) and it won't lead to noise...
Trying to wrap my head around fix of the indentation issue... hope to have a fix soonish. I think we already have a lot (after next autopep8 release), but imo best way to test is to start using!
Comment From: jorisvandenbossche
yes, that is what I meant that using this does not depend on whether we already did a PEP8 storm or not. For new code we can always think to ask for PEP8 compliance, and this tool can help to diminish the back and forth code review
Comment From: hayd
@jorisvandenbossche cool. Ok will see if maybe they'll roll a point release once I fix indentation... and ping back here.
Comment From: hayd
Fixed indentation stuff, just waiting on point release (or you can install autopep8 from github master).
Related is docstring formatting (autopep8 doesn't touch strings I don't think), not sure how well this tool would fair on pandas (would be interesting to have a play): https://github.com/myint/docformatter
...trivial to include call to this on the line ranges in pep8radius script if it worked for us.
Comment From: hayd
Please do have a play with the new version on pypi: "0.8". :)
Comment From: jorisvandenbossche
The link: https://github.com/hayd/pep8radius/
Comment From: hayd
Should I add something about this to contributing.md ? Assuming it's not too broken... :)
Although I did have this question https://github.com/hayd/pep8radius/issues/54 - should it compare against last shared commit rather than tip - which maybe would be worth addressing now (although usually we rebase on master then can compare against master so it's usually the same)...
Comment From: jorisvandenbossche
Yes, I think it is a good idea to add this to the contributors guide (and we should also start asking for this on PRs).
About your question, it is indeed true that if someone updated his master, but not yet rebased his branch against this updated master (which I think certainly can happen, certainly when working on multiple branches), the pep8radius will also correct changes in master that are not yet in the branch, which is not really intended I think. So maybe last shared commit is safer than tip (and if you have rebased, this will be the same anyway)
Comment From: hayd
Ok, I agree, will tweak the script/push a release before editing contributing.
The issue is that I don't quite understand what this means in the context of a merge workflow (I guess you'd want the diff ignoring merge commits from the last shared commit??)... so perhaps I'll hide this behind a flag for now. :s
Also, if using this please check the diff for sanity/bugs before doing the change inplace (ie pep8radius --diff
before pep8radius -i
)!
Update: you do git diff master...branch
to ignore merges.
Comment From: hayd
updated with https://github.com/pydata/pandas/commit/3849d5de2ab4ab543579f54852dd84aca3e0181a (maybe it should be somewhere else?) and pushed out a release 0.8.1 which does the last shared commit thing (which is what we want - even if you merge).
pep8radius master --diff # show diff
pep8radius master --diff --inplace # show diff and do inplace
pep8radius master -di # same as above
Comment From: hayd
I spent some time cleaning this up over the last few days. I've been using it as git rad
for a while now (to compare against merge-base with master) which works pretty well.
https://github.com/hayd/pep8radius
Comment From: jreback
ohh, looks pretty sexy, and has video!
Comment From: hayd
and has video!
Spared no expense!
Comment From: jorisvandenbossche
PEP8 is in the meantime checked for during CI testing for all lines, so closing this