@jorisvandenbossche and I have been looking at Cythonizing some of GeoPandas. This has resulted in an object that holds onto a numpy array of pointers to C-level Geometry objects. We might want to include this numpy-like object as a column in a Pandas dataframe. However, we would still like to hold onto the object itself, rather than have these pointers just join an integer block in the block manager. Keeping the object is useful for things like garbage collection on the C side, odd indexing rules, etc.
My intuition says that putting a numpy-like object into a Pandas dataframe without it being coerced into part of a numpy array is probably not feasible with present-day Pandas, but I thought I'd check first just in case. We have backup plans if this isn't feasible, so it's not a big deal either way.
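To make the kind of object concrete, here is a minimal sketch of such a container. It is an illustration only, not the actual GeoPandas code: the name `GeometryArray`, the `free` callback (standing in for a C-side destructor), and the `close` method are all hypothetical.

```python
import numpy as np


class GeometryArray:
    """Hypothetical container that owns a numpy array of C-level pointers."""

    def __init__(self, pointers, free=None):
        # `free` is an assumed hook standing in for the C-side destructor.
        self._free = free
        self._owner = True
        # Store addresses as pointer-sized unsigned integers.
        self._ptrs = np.asarray(pointers, dtype=np.uintp)

    def __len__(self):
        return len(self._ptrs)

    def __getitem__(self, key):
        # Scalar access returns the raw address as a Python int.
        if isinstance(key, (int, np.integer)):
            return int(self._ptrs[key])
        # Slices and masks return a non-owning container that keeps the
        # parent alive, so the C objects are never freed twice.
        out = GeometryArray(self._ptrs[key])
        out._owner = False
        out._base = self
        return out

    def close(self):
        # Release the C objects exactly once, and only from the owner.
        if self._owner and self._free is not None:
            for ptr in self._ptrs:
                self._free(int(ptr))
        self._owner = False

    def __del__(self):
        self.close()


freed = []
geoms = GeometryArray([101, 102, 103], free=freed.append)
tail = geoms[1:]   # non-owning view of the last two pointers
tail.close()       # does nothing: the view does not own the geometries
geoms.close()      # the owner releases all three
```

If only the underlying integers ended up in a regular integer block, the ownership flag and the custom `__getitem__` would be lost, which is the behavior we want to avoid.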
Comment From: jbrockmendel
AFAICT this would require implementing a new `Block` type and patching `BlockManager` to recognize it. I've been toying with something similar but have shied away from it as being "too internal".
@gfyoung where does this sort of thing lie on a scale of "What's The Worst That Could Happen" to "You're Gonna Have A Bad Time"?
Comment From: gfyoung
@jbrockmendel : Working with internals is not going to be a cakewalk. That being said, you're not developing in production, so "what's the worst that could happen?" :smile:
I wouldn't worry about it being "too internal" - as long as you can surface it in the end with the desired behavior, that's all that counts.
Comment From: jorisvandenbossche
Let's close this in favor of https://github.com/pandas-dev/pandas/issues/17144 (allow external Blocks), as I think that is the only option for including non-numpy arrays in a DataFrame (and the one we are using now in geopandas).