Pandas BUG(?): ArrowExtensionArray.median is an "approximate median"

ArrowExtensionArray.median is implemented using pyarrow.compute.approximate_median, which can produce surprising results for users expecting a "true" median:

In [1]: import pandas as pd

In [2]: pd.Series([1, 2], dtype="float64").median()
Out[2]: 1.5

In [3]: pd.Series([1, 2], dtype="float64[pyarrow]").median()
Out[3]: 1.0

It does not look like pyarrow provides a compute function for a "true" median. Is this intentional? Should it be documented somehow?

cc @mroeschke

Comment From: lukemanley

It seems like this might work to get a true median:

In [4]: pd.Series([1, 2], dtype="float64[pyarrow]").quantile(0.5)
Out[4]: 1.5

Comment From: mroeschke

Is approximate median more performant than quantile when there is an odd number of elements in the Series?

Comment From: lukemanley

Even with an odd number of elements, it is still just an estimate, not a true median. It also looks like quantile is more performant:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: ser = pd.Series(np.random.randn(999_999), dtype="float64[pyarrow]")

In [4]: ser.median()
Out[4]: -0.00010149162825897479

In [5]: ser.quantile(0.5)
Out[5]: -0.00011543661536270946

In [6]: np.median(ser.to_numpy())
Out[6]: -0.00011543661536270946

In [7]: %timeit ser.median()
59.4 ms ± 91.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit ser.quantile(0.5)
13.2 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Comment From: mroeschke

Okay, let's definitely use quantile then.