Pandas Value error on groupby bfill()

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')

I have a dataframe with a column containing some strings and NaNs. I can do do df.col.bfill() but when I try to do df.groupby('col2').col1.bfill() I get a ValueError. I can do first and apply(lambda x: x is str) without error.

Here is the full traceback:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 558, in wrapper
    return self.apply(curried_with_axis)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 561, in wrapper
    return self.apply(curried)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
    not_indexed_same=mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
    result = result.reindex(ax)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
    method, fill_value, copy).__finalize__(self)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
    allow_dups=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
    copy=copy)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 570, in wrapper
    return self._aggregate_item_by_item(name, *args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 511, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1733, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1226, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\_pydev_execfile.py", line 38, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
  File "G:/Users/ari/Documents/Git/Misc/testbross.py", line 5, in <module>
    brwbt = KevinsDB.KevinToWBT(brdb.both[brdb.both.SYMBOL=='VCLT'])
  File "G:\Users\ari\Documents\Git\KevinsDB.py", line 95, in KevinToWBT
    s = grp.OrderTime.ffill().bfill()
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 572, in wrapper
    raise ValueError
ValueError

Comment From: jreback

pls supply an example which reproduces this or show df.info() and df.head() as well as pd.show_versions()

Comment From: jreback

can you show a reproducible example and pd.show_versions()

Comment From: arijun

To get a reproducible example I tried manually grouping:

for g in df.groupingColumn.unique():
    group = df[df.groupingColumn == g]
    group.groupby('groupingColumn').otherColumn.bfill()

but it didn't cause an exception when I did it like that.

I also tried to split it into smaller chunks to cut down the size but couldn't cut it much more than half the original size.

If you can think of another method to get a minimal reproducing DataFrame please let me know.

The output of pd.show_versions() was

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.8.0
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: 0.6.2.None
psycopg2: None

Comment From: jreback

@arijun you need to provide a constructed dataframe and your code that repros e.g.

df = DataFrame(.......)
df.groupby(...)......

Comment From: jorisvandenbossche

@arijun How big is your original dataframe? And may you share it?

Comment From: Marigold

This produces @arijun error (using 0.16.2)

s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')

It fails on calling reindex on series with duplicate index. I'm not that familiar with pandas codebase to submit a PR for this, but let me know if I can help in any other way.

Comment From: TomAugspurger

Duplicate of https://github.com/pandas-dev/pandas/issues/19437