Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
Documentation problem
pandas.concat says (of the sort argument):
Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
My reading of this is that, when using join="inner", I should get the same result whether I pass sort=True or sort=False. But:
import pandas as pd
df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4], "b": [5]})
dfFalse = pd.concat([df1, df2], sort=False, join="inner")
dfTrue = pd.concat([df1, df2], sort=True, join="inner")
dfFalse == dfTrue # => ValueError("Can only compare identically-labeled DataFrames")
I was (perhaps incorrectly) expecting that sort= would have no effect.
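To make the difference visible without relying on the ValueError, the column order of each result can be inspected directly. This is a minimal sketch reusing the frames above; the variable names are only illustrative, and the comments describe the behavior reported in this issue, which may differ on other pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4], "b": [5]})

df_false = pd.concat([df1, df2], sort=False, join="inner")
df_true = pd.concat([df1, df2], sort=True, join="inner")

# Only the shared columns survive an inner join; their order depends on sort.
cols_false = list(df_false.columns)  # order taken from df1, as reported here
cols_true = list(df_true.columns)    # lexicographically sorted, as reported here
print(cols_false)
print(cols_true)

# After reordering the columns to match, the two results hold the same data.
same_values = df_false[cols_true].equals(df_true)
print(same_values)
```

So the two results appear to differ only in column order, which is what makes the == comparison raise.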
Suggested fix for documentation
If this is not a code bug, I would suggest changing the docstring to say:
Sort non-concatenation axis if it is not already aligned.
This would indicate that the sort is always applied, regardless of join.
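As a rough check of that wording, the column order can be compared for every combination of join and sort. This sketch assumes the same two frames as in the example above and only prints what pandas returns; it does not assert what the documented behavior should be.

```python
import pandas as pd

df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4], "b": [5]})

# Column order of the concatenated frame for each (join, sort) combination.
column_orders = {
    (join, sort): list(pd.concat([df1, df2], join=join, sort=sort).columns)
    for join in ("inner", "outer")
    for sort in (False, True)
}

for (join, sort), cols in column_orders.items():
    print(f"join={join!r}, sort={sort}: {cols}")
```

If sort=True yields sorted columns for both join values, the shorter docstring describes the behavior more accurately than the current one.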
Comment From: mroeschke
Thanks for the report. Agreed that "This has no effect when join='inner', which already preserves the order of the non-concatenation axis." is misleading and should be removed, as your example exhibits the correct behavior. We have a test, test_concat_inner_sort, that confirms the behavior you're seeing.