The Comparison with R docs have a great section on how to go from R's `aggregate` function to pandas' `groupby`, but as this StackOverflow question shows, there are similar but not identical problems going from R's `dcast` to pandas' `groupby` and `aggregate`.
I'm not sure exactly what the documentation should say here, just that maybe it should say something. (By the way, I'm also by no means sure that my answer to that question was the best one, so feel free to chime in there too…)
Comment From: jreback
oh, I answered you on SO.
this is just a reference/tutorial for someone coming from R who wants to know how to do `dcast` in pandas. You can put up your example almost exactly (though make sure it's self-contained, so it can be copy-pasted).
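A minimal sketch of what such a self-contained example might look like, assuming `DataFrame.pivot_table` is used as the pandas counterpart of `dcast`. The data and column names (`month`, `week`, `x`) are made up for illustration and are not taken from the SO question.

```python
import pandas as pd

# Hypothetical long-format data, invented for this sketch
df = pd.DataFrame({
    "month": [1, 1, 2, 2, 1, 2],
    "week": [1, 2, 1, 2, 1, 1],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Roughly: reshape2::dcast(df, month ~ week, value.var = "x", fun.aggregate = mean)
# One row per month, one column per week, cells hold the mean of x
wide = df.pivot_table(index="month", columns="week", values="x", aggfunc="mean")
print(wide)
```

Duplicate (month, week) pairs are aggregated with `aggfunc`, which plays the role of `fun.aggregate` here.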
Comment From: onesandzeroes
Yeah, I'd say just add your SO answer as a separate section (with copy-paste reproducible data for both R and Python). You might want to point out the similarities to `aggregate` and `tapply`, but I'd say it's worth putting `dcast` in its own section. I liked the `dcast`/`melt` functions enough that I rarely used base R functions like `aggregate`, and I suspect others reading this section of the docs might be the same, i.e. more likely to find what they're looking for through a `dcast` header.
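For the `dcast`/`melt` pairing, a rough sketch of a round trip in pandas, assuming `DataFrame.melt` and `DataFrame.pivot` as the closest counterparts. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data, invented for this sketch
wide = pd.DataFrame({
    "id": ["a", "b"],
    "height": [1.5, 1.8],
    "weight": [60.0, 80.0],
})

# Roughly: reshape2::melt(wide, id.vars = "id")
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# Roughly: reshape2::dcast(long, id ~ variable)
# pivot() needs unique (id, variable) pairs; use pivot_table() to aggregate duplicates
back = long.pivot(index="id", columns="variable", values="value")
print(back)
```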
Comment From: 11rchitwood
Shouldn't the docs' comparison with R quick reference clarify where each R function comes from? As is, it mixes base and dplyr functions. I suggest prefixing package names or changing to just base R code. Example from the second line in the table:
dplyr::slice(df, 1:10) # specifying that this function comes from dplyr
df[1:10, , drop = FALSE] # or replace with base R equivalent
If keeping dplyr, it might be better to do the full tidyverse comparison, since functionality from readr, tidyr, and other tidyverse packages is part of pandas. This would make using the magrittr pipe `%>%` a good idea.
df %>% slice(1:10)
This pays off more when chaining functions together, à la pandas method chaining.
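To illustrate the pandas side of that comparison, here is a small sketch of method chaining. The data frame and the specific steps are assumptions chosen for illustration, and the dplyr verbs in the comments are loose analogues rather than exact equivalents.

```python
import pandas as pd

# Hypothetical data, invented for this sketch
df = pd.DataFrame({"g": ["a", "b"] * 6, "x": range(12)})

result = (
    df.iloc[:10]                       # like df %>% slice(1:10)
      .query("x > 2")                  # like filter(x > 2)
      .groupby("g", as_index=False)    # like group_by(g)
      .agg(total=("x", "sum"))         # like summarise(total = sum(x))
)
print(result)
```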
Final thought: many useRs use the data.table package. Perhaps a pandas/data.table comparison would also be useful?
Compliments to the authors as this is one of the best comparisons I have seen. For more base/dplyr comparisons see here.
Comment From: jreback
would certainly take updates for more idiomatic R as well as links to R api docs (meaning this could be multiple PRs)
Comment From: MarcoGorelli
I'm not sure exactly what the documentation should say here.
this has been open for nearly 10 years with no activity, so without a concrete suggestion, let's close