  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> df = pd.DataFrame(
...             {
...                 0: [315.3324, 3243.32432, 3232.332, -100.32],
...                 1: [0.3223, 0.32, 0.0000232, 0.32224],
...             },
...             index=[7, 20, 11, 9],
...         )
>>> df
             0         1
7    315.33240  0.322300
20  3243.32432  0.320000
11  3232.33200  0.000023
9   -100.32000  0.322240
>>> series = pd.Series([10, 11, 23, 234, 13], index=[11, 12, 13, 44, 33])
>>> series
11     10
12     11
13     23
44    234
33     13
dtype: int64

>>> df.append(series, ignore_index=True)
           0         1     11    12    13    33     44
0   315.33240  0.322300   NaN   NaN   NaN   NaN    NaN
1  3243.32432  0.320000   NaN   NaN   NaN   NaN    NaN
2  3232.33200  0.000023   NaN   NaN   NaN   NaN    NaN
3  -100.32000  0.322240   NaN   NaN   NaN   NaN    NaN
4         NaN       NaN  10.0  11.0  23.0  13.0  234.0

>>> df.append(series, ignore_index=True, sort=True)
           0         1     11    12    13    33     44
0   315.33240  0.322300   NaN   NaN   NaN   NaN    NaN
1  3243.32432  0.320000   NaN   NaN   NaN   NaN    NaN
2  3232.33200  0.000023   NaN   NaN   NaN   NaN    NaN
3  -100.32000  0.322240   NaN   NaN   NaN   NaN    NaN
4         NaN       NaN  10.0  11.0  23.0  13.0  234.0

>>> df.append(series, ignore_index=True, sort=False)
           0         1     11    12    13    33     44
0   315.33240  0.322300   NaN   NaN   NaN   NaN    NaN
1  3243.32432  0.320000   NaN   NaN   NaN   NaN    NaN
2  3232.33200  0.000023   NaN   NaN   NaN   NaN    NaN
3  -100.32000  0.322240   NaN   NaN   NaN   NaN    NaN
4         NaN       NaN  10.0  11.0  23.0  13.0  234.0

Problem description

Even with sort=False, the columns of the output DataFrame come back sorted: (0, 1, 11, 12, 13, 33, 44). According to the docs, the order should instead be (0, 1, 11, 12, 13, 44, 33), that is, the labels from the series should keep their original order.

Expected Output

>>> df.append(series, ignore_index=True, sort=True)
           0         1     11    12    13    33     44
0   315.33240  0.322300   NaN   NaN   NaN   NaN    NaN
1  3243.32432  0.320000   NaN   NaN   NaN   NaN    NaN
2  3232.33200  0.000023   NaN   NaN   NaN   NaN    NaN
3  -100.32000  0.322240   NaN   NaN   NaN   NaN    NaN
4         NaN       NaN  10.0  11.0  23.0  13.0  234.0


>>> df.append(series, ignore_index=True, sort=False)
           0         1     11    12    13    44     33
0   315.33240  0.322300   NaN   NaN   NaN   NaN    NaN
1  3243.32432  0.320000   NaN   NaN   NaN   NaN    NaN
2  3232.33200  0.000023   NaN   NaN   NaN   NaN    NaN
3  -100.32000  0.322240   NaN   NaN   NaN   NaN    NaN
4         NaN       NaN  10.0  11.0  23.0  234.0  13.0

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-76-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.5
numpy            : 1.17.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.3.1.post20200616
Cython           : 0.29.20
pytest           : 5.4.3
hypothesis       : 5.18.2
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.15.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.17.1
pytables         : None
pytest           : 5.4.3
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.49.1

Comment From: AlexKirko

Looking into this one. Also reproducible on the master branch with:

IN:
import pandas as pd

df1 = pd.DataFrame(data={0: [1, 2], 1: [3, 4]})
df2 = pd.DataFrame(data={3: [1, 2], 2: [3, 4]})

df1.append(df2, sort=False)

OUT:
     0    1    2    3
0  1.0  3.0  NaN  NaN
1  2.0  4.0  NaN  NaN
0  NaN  NaN  3.0  1.0
1  NaN  NaN  4.0  2.0

Comment From: AlexKirko

take

Comment From: AlexKirko

Found the culprit. At some point down the call stack, DataFrame.append calls union_indexes in core.indexes.api. In it we detect the type of our indexes:

indexes, kind = _sanitize_and_check(indexes)

union_indexes does not support the sort argument when kind == 'special', and I believe kind = 'special' gets assigned by mistake in this case. Here is the relevant code from _sanitize_and_check:

    if len(kinds) > 1 or Index not in kinds:
        return indexes, "special"
    else:
        return indexes, "array"

What was probably meant was to check if there is an Index subclass in kinds:

    if len(kinds) > 1 or not any([issubclass(kind, Index) for kind in kinds]):
        return indexes, "special"
    else:
        return indexes, "array"

This seems to fix the problem. I'll submit a PR later today.

Please ping if the forced sort is intended behavior and doesn't need a fix.
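
To make the difference between the two checks concrete, here is a minimal, self-contained sketch. It does not import pandas: Index and Int64Index below are hypothetical stand-in classes, and classify_exact / classify_subclass are illustrative names rather than pandas functions. The assumption is that the integer-labelled columns are held in an Index subclass (Int64Index in pandas 1.0.x), so the exact-type membership test misses them.

```python
# Hypothetical stand-ins for pandas' Index hierarchy (not the real classes).
class Index:
    pass


class Int64Index(Index):
    pass


def classify_exact(kinds):
    """Mirror of the original check: only the exact Index type counts."""
    if len(kinds) > 1 or Index not in kinds:
        return "special"
    return "array"


def classify_subclass(kinds):
    """Mirror of the proposed check: any Index subclass counts."""
    if len(kinds) > 1 or not any(issubclass(kind, Index) for kind in kinds):
        return "special"
    return "array"


# Both frames in the reproduction have integer column labels, so the set of
# index types collapses to a single subclass of Index.
kinds = [Int64Index]

exact_result = classify_exact(kinds)
subclass_result = classify_subclass(kinds)
print(exact_result)
print(subclass_result)
```

With the stand-in classes, the exact-type check falls through to "special" while the subclass check returns "array", which matches the behavior described above.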

Comment From: TomAugspurger

This fix is being reverted in https://github.com/pandas-dev/pandas/pull/35277.

Comment From: phofl

append was removed for 2.0, so closing
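
For readers on pandas 2.0 or later, where DataFrame.append no longer exists, the following is a rough sketch of one way to get the column order requested in this issue using pd.concat. It is not an official replacement recipe: the explicit reindex step is an assumption added here so that the result does not depend on how concat orders the unioned columns, and the variable names (row, desired_columns, result) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        0: [315.3324, 3243.32432, 3232.332, -100.32],
        1: [0.3223, 0.32, 0.0000232, 0.32224],
    },
    index=[7, 20, 11, 9],
)
series = pd.Series([10, 11, 23, 234, 13], index=[11, 12, 13, 44, 33])

# Turn the series into a one-row frame whose columns are the series labels.
row = series.to_frame().T

# Concatenate row-wise, discarding the original row labels.
result = pd.concat([df, row], ignore_index=True, sort=False)

# Pin the column order explicitly: the frame's columns first, then any new
# series labels in the order they appear in the series.
desired_columns = list(df.columns) + [
    label for label in series.index if label not in df.columns
]
result = result.reindex(columns=desired_columns)

print(list(result.columns))
```

Reindexing after the concat keeps the example independent of the library's internal index-union logic, at the cost of one extra step.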