When comparing a dataframe to a column/series (I know this is not useful in the following case, because the series is aligned with the columns of the dataframe and not the rows, but it is something typical users will try), I get the correct results if the dataframe and series contain strings, but a TypeError when the dataframe contains datetime values:
In [1]: import pandas as pd
...: from io import StringIO
In [2]: s = """id date birth_date_1 birth_date_2
...: 1 2000-01-01 2000-01-03 2000-01-05
...: 1 2000-01-07 2000-01-03 2000-01-05
...: 2 2000-01-02 2000-01-10 2000-01-01
...: 2 2000-01-05 2000-01-10 2000-01-01"""
In [3]: df = pd.read_csv(StringIO(s), sep=r'\s+')
In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
0 1 2 3 birth_date_1 birth_date_2
0 False False False False True True
1 False False False False True True
2 False False False False True True
3 False False False False True True
In [7]: df = pd.read_csv(StringIO(s), sep=r'\s+', parse_dates=[1,2,3])
In [8]: df[['birth_date_1','birth_date_2']] > df['date']
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in handle_error()
954 if raise_on_error:
955 raise TypeError('Could not operate %s with block values %s'
--> 956 % (repr(other), str(detail)))
957 else:
958 # return the values
TypeError: Could not operate array(['2000-01-01T01:00:00.000000000+0100',
'2000-01-07T01:00:00.000000000+0100',
'2000-01-02T01:00:00.000000000+0100',
'2000-01-05T01:00:00.000000000+0100'], dtype='datetime64[ns]') with block values invalid type promotion
Comment From: jorisvandenbossche
Although I am not sure this is the correct result:
In [5]: df[['birth_date_1','birth_date_2']] > df['date']
Out[5]:
0 1 2 3 birth_date_1 birth_date_2
0 False False False False True True
1 False False False False True True
2 False False False False True True
3 False False False False True True
There are no overlapping labels between the dataframe and the series, so why is the result sometimes True and sometimes False?
Comment From: jreback
This is quite tricky; datetimes are not handled correctly in a multi-column vectorized way.
xref #8554. I think I can fix this, but it's a bit tricky.
Comment From: jbrockmendel
@jorisvandenbossche I'm not entirely clear on what the issue is here. Is it about broadcasting? Maybe it has been resolved in the interim?
Comment From: mroeschke
I think the first case (dates not parsed) now raises a sensible error:
TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'
The second case (dates parsed) doesn't seem to raise a sensible error, as no float column is being compared:
TypeError: '<' not supported between instances of 'Timestamp' and 'float'
In [60]: pd.__version__
Out[60]: '1.1.0.dev0+1027.g767335719'
Comment From: jbrockmendel
IIUC reindexing is introducing float (all-NaN) columns, which then raise on comparison. That automatic reindexing was deprecated in #36795. We could try to get something in for 1.4 to give a better exception message, but I don't think it's worth the trouble.
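To illustrate the point about reindexing, here is a minimal sketch rather than the actual internal code path. It assumes that alignment ends up reindexing the frame's columns to the union of its column names and the series' row labels; that union is written out by hand here (`union_labels` is a hypothetical name), and the explicit `reindex` call only mimics the effect:

```python
from io import StringIO

import pandas as pd

s = """id date birth_date_1 birth_date_2
1 2000-01-01 2000-01-03 2000-01-05
1 2000-01-07 2000-01-03 2000-01-05
2 2000-01-02 2000-01-10 2000-01-01
2 2000-01-05 2000-01-10 2000-01-01"""
date_cols = ["date", "birth_date_1", "birth_date_2"]
df = pd.read_csv(StringIO(s), sep=r"\s+", parse_dates=date_cols)

left = df[["birth_date_1", "birth_date_2"]]

# Assumption: hand-written stand-in for the label union that alignment
# would build (the series' row labels 0..3 plus the frame's column names).
union_labels = [0, 1, 2, 3, "birth_date_1", "birth_date_2"]
reindexed = left.reindex(columns=union_labels)

# Inspect the dtypes: the columns that did not exist before are filled
# with missing values, next to the original datetime columns.
print(reindexed.dtypes)

new_cols = reindexed[[0, 1, 2, 3]]
all_missing = bool(new_cols.isna().all().all())
print(all_missing)
```

The newly introduced columns contain only missing values, so a later comparison against `Timestamp` values has nothing meaningful to compare in them.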
Comment From: jbrockmendel
This now correctly raises because the automatic alignment deprecation has been enforced. Is there another bug that surfaces afterwards if we manually align before the comparison?
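For reference, one way to align manually is to compare along the rows with the flexible comparison method `DataFrame.gt` and `axis=0`. This is only a sketch that assumes a recent pandas release and reuses the data from the original report; it shows the row-wise comparison users most likely intended, and it does not cover every alignment path:

```python
from io import StringIO

import pandas as pd

s = """id date birth_date_1 birth_date_2
1 2000-01-01 2000-01-03 2000-01-05
1 2000-01-07 2000-01-03 2000-01-05
2 2000-01-02 2000-01-10 2000-01-01
2 2000-01-05 2000-01-10 2000-01-01"""
date_cols = ["date", "birth_date_1", "birth_date_2"]
df = pd.read_csv(StringIO(s), sep=r"\s+", parse_dates=date_cols)

left = df[["birth_date_1", "birth_date_2"]]

# axis=0 matches the series against the frame's row index, so each
# birth date is compared with the value of 'date' in the same row.
result = left.gt(df["date"], axis=0)
print(result)
```

With `axis=0` the series' index is matched against the frame's row index, so no extra columns need to be introduced before the comparison.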