This is a follow-up to #26242, for the following points:
- [ ] Add test in test_merge_asof.py
that checks that merge_asof()
raises an exception if called on unordered categories for a merge key.
- [ ] Remove docstrings in tests in test_merge_asof.py
- [ ] Add test to check that merge_asof
works when one of the merge keys is an ordered categorical
- [ ] Raise MergeError sooner (rather than the ValueError that happens later) when asked to do a merge and one of the keys is an unordered categorical.
- [ ] use repr(x.dtype)
instead of x.dtype
as input to .format()
in exception messages everywhere in merge.py
Comment From: chrish42
There's a fifth potential cleanup. Currently, most of the tests use the "ticker data" example with a merge_asof(trades, quotes, on='time', by='ticker')
, but with the "ticker" column left as object dtype. This leaves on the table a large number of tests that could exercise the by=[category dtype]
code paths.
I see potentially two options here:
1. Just convert the "ticker" column to be a categorical. Pro: one-line change. Con: many fewer tests for the by=[object dtype]
case.
2. Refactor the ticker
data to be a pytest parametrized fixture, and then parametrize that fixture to generate both "object" and "category" variants for the "trades" and "quotes" dataframes. Pro: both cases now tested. Con: bigger change.
What do you think? If option 2 sounds good, now would be a good time for me to do it at least, given that I've been looking at that code lately (because I've been doing asof merges on dataframes that contain a lot of categorical data).
Comment From: chrish42
Maybe this would also be a good first step to then work on #16454?
Comment From: gfyoung
I like option 2, as it's more comprehensive.
Comment From: chrish42
@gfyoung Me too, especially since I'm interested in having merge_asof() working better with categorical data. Just looking for some approval that this direction is good with the Pandas developers before reworking a bunch of the tests there., and also that it's okay to use the more advanced pytest features in tests for Pandas.
Comment From: WillAyd
@chrish42 I'd say go for it! We use purest parametrized fixtures in other places as well so nothing out of the ordinary
Comment From: mroeschke
I think this has been cleaned up over the years so closing. Can open individual tickets for remaining issues is something isnt addressed