CC @jbrockmendel who I think may have an interest in cleaning up the JSON code.
One thing I've had in the back of my mind for months is to untangle the web that is objToJSON.c. The main issue in my mind is the object introspection loop, which does a lot of heavy lifting just to traverse a DataFrame, and does so in different ways depending on the orient chosen.
A very pie-in-the-sky approach would have us move all DataFrame iteration out of objToJSON.c into a separate file. We could even mimic the NumPy C API approach for iterating over NumPy arrays. Outside of JSON, this would also expose a C interface through which you can read Python objects.
I started on this a while back in my own branch but haven't had the time / motivation to see it through. Here's what the header looks like for the implementation I have in mind:
https://github.com/WillAyd/pandas/blob/json-array-iter/pandas/_libs/src/ujson/python/ndframe_iter.h
What I've noticed is that there is probably still a layer missing. Right now the iteration always provides a PandasScalar object every time you call next on it. From trying to intertwine that with the JSON code, I think we need some intermediary Row / Column representation so that you can traverse NDFrame -> (Row|Column) -> PandasObject
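To make the NDFrame -> (Row|Column) -> scalar layering concrete, here is a minimal sketch of a two-level iterator. Everything in it is hypothetical: `Frame`, `Slice`, `FrameIter`, and the function names are invented for illustration and are not the declarations in ndframe_iter.h, and it assumes a plain row-major buffer of doubles rather than real pandas blocks or Python objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an NDFrame: a row-major 2D buffer of doubles. */
typedef struct {
    const double *values; /* nrows * ncols elements, row-major */
    size_t nrows;
    size_t ncols;
} Frame;

typedef enum { AXIS_ROWS, AXIS_COLUMNS } Axis;

/* The intermediate layer: a strided view over one row or one column. */
typedef struct {
    const double *start;
    size_t length;
    size_t stride;
    size_t pos;
} Slice;

/* Outer iterator: walks the frame along the chosen axis. */
typedef struct {
    const Frame *frame;
    Axis axis;
    size_t pos;
} FrameIter;

FrameIter frame_iter_new(const Frame *frame, Axis axis) {
    assert(frame != NULL);
    FrameIter it = {frame, axis, 0};
    return it;
}

/* Fills *out with the next row or column. Returns 1, or 0 when exhausted. */
int frame_iter_next(FrameIter *it, Slice *out) {
    const Frame *f = it->frame;
    size_t count = (it->axis == AXIS_ROWS) ? f->nrows : f->ncols;
    if (it->pos >= count) {
        return 0;
    }
    if (it->axis == AXIS_ROWS) {
        out->start = f->values + it->pos * f->ncols;
        out->length = f->ncols;
        out->stride = 1;
    } else {
        out->start = f->values + it->pos;
        out->length = f->nrows;
        out->stride = f->ncols;
    }
    out->pos = 0;
    it->pos++;
    return 1;
}

/* Inner iterator: yields the next scalar of a slice. Returns 1, or 0 when exhausted. */
int slice_next(Slice *s, double *out) {
    if (s->pos >= s->length) {
        return 0;
    }
    *out = s->start[s->pos * s->stride];
    s->pos++;
    return 1;
}

/* Example consumer: copies every scalar into out in traversal order and
   returns how many were written. A serializer would emit tokens here instead. */
size_t frame_flatten(const Frame *frame, Axis axis, double *out) {
    FrameIter it = frame_iter_new(frame, axis);
    Slice slice;
    double value;
    size_t n = 0;
    while (frame_iter_next(&it, &slice)) {
        while (slice_next(&slice, &value)) {
            out[n++] = value;
        }
    }
    return n;
}
```

The point of the sketch is that the consumer loop is identical for both axes; only the `Axis` argument changes. That is roughly the property a serializer would want when switching between row-wise output (e.g. orient="records") and column-wise output (e.g. orient="columns"), instead of carrying a separate traversal per orient.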
My branch is a mess right now and I don't expect to have a solution soon (if ever), but I'm offering this up as a conversation starter for others.
Comment From: jbrockmendel
cc @deponovo