s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')
I have a dataframe with a column containing some strings and NaNs. I can do df.col.bfill(), but when I try to do df.groupby('col2').col1.bfill() I get a ValueError. I can do first and apply(lambda x: x is str) without error.
Here is the full traceback:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 558, in wrapper
return self.apply(curried_with_axis)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
return self._python_apply_general(f)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
not_indexed_same=mutated)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
not_indexed_same=not_indexed_same)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
result = result.reindex(ax)
File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
return super(Series, self).reindex(index=index, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
method, fill_value, copy).__finalize__(self)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
allow_dups=False)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
copy=copy)
File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 561, in wrapper
return self.apply(curried)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 662, in apply
return self._python_apply_general(f)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 669, in _python_apply_general
not_indexed_same=mutated)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 2380, in _wrap_applied_output
not_indexed_same=not_indexed_same)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 1142, in _concat_objects
result = result.reindex(ax)
File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 2149, in reindex
return super(Series, self).reindex(index=index, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1731, in reindex
method, fill_value, copy).__finalize__(self)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1749, in _reindex_axes
allow_dups=False)
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1834, in _reindex_with_indexers
copy=copy)
File "C:\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3150, in reindex_indexer
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 570, in wrapper
return self._aggregate_item_by_item(name, *args, **kwargs)
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 511, in __getattr__
(type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute '_aggregate_item_by_item'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\helpers\pydev\_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "G:/Users/ari/Documents/Git/Misc/testbross.py", line 5, in <module>
brwbt = KevinsDB.KevinToWBT(brdb.both[brdb.both.SYMBOL=='VCLT'])
File "G:\Users\ari\Documents\Git\KevinsDB.py", line 95, in KevinToWBT
s = grp.OrderTime.ffill().bfill()
File "C:\Anaconda3\lib\site-packages\pandas\core\groupby.py", line 572, in wrapper
raise ValueError
ValueError
Comment From: jreback
pls supply an example which reproduces this or show df.info() and df.head() as well as pd.show_versions()
Comment From: jreback
can you show a reproducible example and pd.show_versions()
Comment From: arijun
To get a reproducible example I tried manually grouping:
for g in df.groupingColumn.unique():
    group = df[df.groupingColumn == g]
    group.groupby('groupingColumn').otherColumn.bfill()
but it didn't cause an exception when I did it like that.
I also tried to split it into smaller chunks to cut down the size but couldn't cut it much more than half the original size.
If you can think of another method to get a minimal reproducing DataFrame please let me know.
The output of pd.show_versions() was:
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.16.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.8.0
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: 0.6.2.None
psycopg2: None
Comment From: jreback
@arijun you need to provide a constructed dataframe and your code that repros e.g.
df = DataFrame(.......)
df.groupby(...)......
Comment From: jorisvandenbossche
@arijun How big is your original dataframe? And can you share it?
Comment From: Marigold
This reproduces @arijun's error (using 0.16.2):
s = pd.Series([1,1,0], index=[1,2,2])
s.groupby(s).fillna(method='bfill')
It fails when calling reindex on a series with a duplicate index. I'm not familiar enough with the pandas codebase to submit a PR for this, but let me know if I can help in any other way.
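To illustrate the diagnosis, here is a minimal sketch. It first shows that reindex raises a ValueError on a Series whose index has duplicate labels, then tries one possible workaround: group on a temporary unique index and restore the original labels afterwards. The frame df, its values, and the names tmp and filled are hypothetical; only the col1/col2 layout follows the original report. The workaround assumes that restoring the labels by position is acceptable, and newer pandas releases may not need it at all.

```python
import numpy as np
import pandas as pd

# Series from the example above: label 2 appears twice in the index.
s = pd.Series([1, 1, 0], index=[1, 2, 2])
print(s.index.is_unique)  # False

# Reindexing an object whose index has duplicate labels raises ValueError.
try:
    s.reindex([1, 2])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)

# Hypothetical frame mirroring the col1/col2 layout from the report,
# with duplicate index labels (2 and 3 each appear twice).
df = pd.DataFrame(
    {
        "col1": ["a", np.nan, "b", np.nan, "c"],
        "col2": ["x", "x", "x", "y", "y"],
    },
    index=[1, 2, 2, 3, 3],
)

# Workaround sketch: group on a unique positional index, then put the
# original (duplicated) labels back by position.
tmp = df.reset_index(drop=True)
filled = tmp.groupby("col2")["col1"].bfill()
filled.index = df.index
print(filled)
```

Checking df.index.is_unique on the real data is a quick way to confirm whether duplicate labels are the trigger before applying a workaround like this.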
Comment From: TomAugspurger
Duplicate of https://github.com/pandas-dev/pandas/issues/19437