Code Sample

import pandas as pd
import numpy as np

left = pd.DataFrame(
    columns=['A'], 
    index=pd.Index([], name='id', dtype=np.int64)
)
right = pd.DataFrame(
    [[5]], 
    columns=['B'],
    index=pd.Index([42], name='id', dtype=np.int64)
)
res_idx_dtype = left.join(right).index.dtype
assert res_idx_dtype == np.int64

Problem description

When executing a left join on two dataframes, I would expect the resulting index to have the same dtype as the index of the left dataframe.

I ran into this issue while using dask: a second join on the resulting dataframe raised an error because its index was of dtype object instead of np.int64.
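
Until the behavior is fixed, one possible workaround is to cast the joined index back to the left index dtype before chaining further joins. This is only a sketch under the assumption that the join returns an empty index with the wrong dtype; it is not an official recommendation, and the names `joined` and `restored_dtype` are illustrative.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(
    columns=["A"],
    index=pd.Index([], name="id", dtype=np.int64),
)
right = pd.DataFrame(
    [[5]],
    columns=["B"],
    index=pd.Index([42], name="id", dtype=np.int64),
)

joined = left.join(right)

# Workaround (an assumption, not an official fix): restore the left index
# dtype if the join changed it. On versions that already preserve the dtype,
# the condition is False and nothing happens.
if joined.index.dtype != left.index.dtype:
    joined.index = joined.index.astype(left.index.dtype)

restored_dtype = joined.index.dtype
```

Casting an empty object index to int64 is safe because there are no values to convert; for a non-empty index the cast could fail if the values are not integers.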

Expected Output

>>> left.join(right).index.dtype
dtype('int64')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: None
pip: 9.0.3
setuptools: 40.4.3
Cython: 0.29.10
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.0.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.9
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.9
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None

Comment From: TomAugspurger

Looks like a bug. Is it just for empty dataframes?

Comment From: jreback

Try this on master / 0.24.2; IIRC this is fixed.

Comment From: michcio1234

It still happens for me with 0.24.2.

Comment From: michcio1234

@TomAugspurger Yes, it only happens when the left dataframe is empty. If the left dataframe is not empty, the result is as expected.
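
To compare the two cases side by side, something like the following could be used. It is a minimal sketch: `joined_index_dtype` is a hypothetical helper written for this comparison, and the printed dtype for the empty case depends on the installed pandas version, so no output is shown.

```python
import numpy as np
import pandas as pd

right = pd.DataFrame(
    [[5]],
    columns=["B"],
    index=pd.Index([42], name="id", dtype=np.int64),
)


def joined_index_dtype(left_ids):
    """Left-join a frame indexed by `left_ids` (int64) with `right`
    and return the index dtype of the result."""
    left = pd.DataFrame(
        columns=["A"],
        index=pd.Index(left_ids, name="id", dtype=np.int64),
    )
    return left.join(right).index.dtype


non_empty_dtype = joined_index_dtype([42])  # left frame has one row
empty_dtype = joined_index_dtype([])        # left frame has no rows

print("non-empty left:", non_empty_dtype)
print("empty left:    ", empty_dtype)
```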

Comment From: kayibal

Unfortunately not fixed in master either:

>>> import pandas as pd
>>> pd.__version__
'0.25.0.dev0+754.gbd09a5904'
>>> import numpy as np
>>>
>>> left = pd.DataFrame(
...     columns=['A'],
...     index=pd.Index([], name='id', dtype=np.int64)
... )
>>> right = pd.DataFrame(
...     [[5]],
...     columns=['B'],
...     index=pd.Index([42], name='id', dtype=np.int64)
... )
>>> res_idx_dtype = left.join(right).index.dtype
>>> assert res_idx_dtype == np.int64
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

Comment From: TomAugspurger

Are you interested in investigating where things go wrong?

Comment From: kayibal

Sure I can have a look!

Comment From: kayibal

From a quick look I think this line could be the culprit:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/reshape/merge.py#L787

introduced by this commit: https://github.com/pandas-dev/pandas/commit/d16cce89521c84fcb9c7b7bb2e95629a6fe7acb7

I can try to remove it and run all the tests, although it will take me a while to set up my env for pandas.

Comment From: TomAugspurger

Ah I think you've found it. It does seem pretty deliberate, but I'm hopeful that it can just be removed and things will be OK thanks to a recent refactor.

The commit you linked to is just a reorganization. The old filename was pandas/tools/merge.py.

Comment From: kayibal

I solved it like this, and so far all tests in reshape/merge/test_merge.py are passing:

# When the join result is empty, take the index dtype from the side
# that drives the join instead of always falling back to object.
if len(join_index) == 0:
    if self.how == 'right':
        join_index = join_index.astype(self.right.index.dtype)
    elif self.how == 'left':
        join_index = join_index.astype(self.left.index.dtype)
    else:
        join_index = join_index.astype(object)

Any idea on how to handle inner and outer?
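
One possible answer, purely as an assumption and not something decided in this thread, would be to keep the dtype for inner and outer joins only when both sides agree, and to fall back to object otherwise. The sketch below pulls the dtype choice from the snippet above into a standalone function; `empty_join_index_dtype` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd


def empty_join_index_dtype(left_index, right_index, how):
    """Pick a dtype for an empty join index (hypothetical helper, not pandas API)."""
    if how == "left":
        return left_index.dtype
    if how == "right":
        return right_index.dtype
    # inner / outer: assumption, keep the dtype only if both sides share it
    if left_index.dtype == right_index.dtype:
        return left_index.dtype
    return np.dtype(object)


int_idx = pd.Index([], dtype=np.int64)
obj_idx = pd.Index([], dtype=object)

for how in ("left", "right", "inner", "outer"):
    print(how, empty_join_index_dtype(int_idx, obj_idx, how))
```

Whether mismatched dtypes should fall back to object or to some common numeric type is a design choice that would need to match what non-empty joins already do.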

Comment From: TomAugspurger

Not off the top of my head. It may be best to make a PR at this point if you're interested.

Comment From: lukemanley

Closing, as this is fixed on main and seems to be tested in a few places (e.g. test_join_empty).