Pandas Comparison with R should discuss dcast as well as aggregate

The Comparison with R docs have a great section on how to go from R's aggregate function to pandas' groupby, but as this StackOverflow question shows, there are similar but not identical problems going from R's dcast to pandas' groupby and aggregate.
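A minimal sketch of the kind of dcast-to-pandas mapping such a section could show, using small hypothetical long-format data (not the data from the StackOverflow question). The R call in the comment assumes reshape2's `dcast` with `fun.aggregate = sum`; the pandas side shows both `pivot_table` and the equivalent groupby/aggregate/unstack spelling.

```python
import pandas as pd

# Hypothetical long-format data: several rows per (id, variable) pair.
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2],
        "variable": ["a", "b", "a", "a", "b", "b"],
        "value": [10, 20, 30, 40, 50, 60],
    }
)

# Roughly R's: dcast(df, id ~ variable, value.var = "value", fun.aggregate = sum)
# One row per id, one column per level of `variable`, duplicate cells summed.
wide = df.pivot_table(index="id", columns="variable", values="value", aggfunc="sum")

# The same reshape written as groupby + aggregate, then unstack to go wide.
wide_gb = df.groupby(["id", "variable"])["value"].sum().unstack("variable")

print(wide)
```

The `pivot_table` form reads closest to the dcast formula, while the groupby form makes the link to the existing aggregate section explicit.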

I'm not sure exactly what the documentation should say here, just that maybe it should say something. (By the way, I'm also by no means sure that my answer to that question was the best one, so feel free to chime in there too.)

Comment From: jreback

oh, I answered you on SO.

this is just a reference/tutorial for someone coming from R who wants to know how to do dcast in pandas. You can put up your example almost exactly (though make sure it's self-contained, so it can be copy-pasted).

Comment From: onesandzeroes

Yeah, I'd say just add your SO answer as a separate section (with copy-paste reproducible data for both R and Python). You might want to point out the similarities to aggregate and tapply, but I'd say it's worth putting dcast in its own section. I liked the dcast/melt functions enough that I rarely used base R functions like aggregate, and I suspect others reading this section of the docs might be the same, i.e. more likely to find what they're looking for through a dcast header.
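As a rough illustration of how a melt/dcast section might pair the two functions, here is a sketch with hypothetical wide data. It assumes the R side is reshape2's `melt(wide, id.vars = "id")` and `dcast(long, id ~ variable)`; when each (id, variable) pair is unique no aggregation is needed, so `DataFrame.pivot` is enough on the pandas side.

```python
import pandas as pd

# Hypothetical wide-format data.
wide = pd.DataFrame({"id": [1, 2], "x": [1.5, 2.5], "y": [3.0, 4.0]})

# Analogue of melt(wide, id.vars = "id"): one row per (id, variable) pair.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# Analogue of dcast(long, id ~ variable) with unique cells: no aggregation.
back = long.pivot(index="id", columns="variable", values="value").reset_index()
back.columns.name = None  # drop the leftover "variable" label on the column axis

print(long)
print(back)
```

If the long data had duplicate (id, variable) pairs, `pivot` would raise an error and `pivot_table` with an `aggfunc` would be the closer match to dcast with `fun.aggregate`.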

Comment From: 11rchitwood

Shouldn't the docs' Comparison with R quick reference clarify where each R function comes from? As is, it mixes base and dplyr functions. I suggest prefixing functions with their package names or changing to just base R code. Example from the second line in the table:

dplyr::slice(df, 1:10) # specifying that this function comes from dplyr
df[1:10, , drop = FALSE] # or replace with the base R equivalent
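For reference, a minimal sketch of the pandas side of that table row, with hypothetical data. Both R lines select rows 1 through 10 (1-based, inclusive end); pandas positional slicing is 0-based with an exclusive stop, so the same rows are `0:10`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(100)})  # hypothetical data

# Rows 1..10 in R are positions 0..9 in pandas; the stop value 10 is excluded.
first_ten = df.iloc[0:10]

# df.head(10) returns the same rows.
same_rows = df.head(10)
```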

If keeping with dplyr, it might be better to do the full tidyverse comparison, since functionality from readr, tidyr, and other tidyverse packages is part of pandas. This would make using the magrittr pipe %>% a good idea.

df %>% slice(1:10)

This is more worthwhile when chaining functions together, à la pandas method chaining.
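To make the pipe/method-chaining parallel concrete, here is a sketch with hypothetical data. The dplyr pipeline in the comment is an assumed example, not one taken from the docs; `DataFrame.pipe` is shown for the case where a step is a plain function rather than a DataFrame method, and `add_share` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b", "b"], "x": [1, 6, 7, 8, 2]})

# Roughly: df %>% filter(x > 5) %>% group_by(g) %>% summarise(total = sum(x))
result = (
    df
    .query("x > 5")
    .groupby("g", as_index=False)
    .agg(total=("x", "sum"))
)


def add_share(frame):
    """Hypothetical helper: each group's share of the grand total."""
    return frame.assign(share=frame["total"] / frame["total"].sum())


# DataFrame.pipe plays the role of %>% for functions that are not DataFrame methods.
result = result.pipe(add_share)
print(result)
```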

Final thought: many useRs use the data.table package. Perhaps a pandas/data.table comparison would also be useful?

Compliments to the authors as this is one of the best comparisons I have seen. For more base/dplyr comparisons see here.

Comment From: jreback

would certainly take updates for more idiomatic R as well as links to R API docs (meaning this could be multiple PRs)

Comment From: MarcoGorelli

I'm not sure exactly what the documentation should say here.

this has been open for nearly 10 years with no activity, so without a concrete suggestion, let's close