As already described in issue #3690, Panel.to_frame discards all minor entries with NaN in the data, which can be very confusing. I believe there should be a warning the first time data is dropped, or the opposite should be the default behavior.
The warning could be treated like numpy's divide-by-zero RuntimeWarning and shown only on the first occurrence.
See below for an example:
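import numpy as np
import pandas as pd

# Two frames with identical labels; df2 gets a single NaN at ('foo', 'B').
df1 = pd.DataFrame(np.random.randn(2, 3), columns=['A', 'B', 'C'],
                   index=['foo', 'bar'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['A', 'B', 'C'],
                   index=['foo', 'bar'])
df2.loc['foo', 'B'] = np.nan
mydict = {'df1': df1, 'df2': df2}
# The default filter_observations=True silently drops the ('foo', 'B') row.
pd.Panel(mydict).to_frame()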
Output:
major | minor | df1 | df2 |
---|---|---|---|
foo | A | 1.9097545931480682 | -0.6710202447941566 |
foo | C | 1.3335254610685865 | 1.53372538551507 |
bar | A | 0.3145550744497975 | -1.7221352144306152 |
bar | B | -0.15681197178861878 | -1.2308510354641322 |
bar | C | -0.09598971674309852 | -0.1268630728124487 |
With filter_observations=False, rows containing NaN are not dropped:
pd.Panel(mydict).to_frame(filter_observations=False)
Output:
major | minor | df1 | df2 |
---|---|---|---|
foo | A | 1.9097545931480682 | -0.6710202447941566 |
foo | B | 2.092552358833253 | NaN |
foo | C | 1.333525461068586 | 1.53372538551507 |
bar | A | 0.3145550744497975 | -1.7221352144306152 |
bar | B | -0.15681197178861878 | -1.2308510354641322 |
bar | C | -0.09598971674309852 | -0.1268630728124487 |
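To make the proposal concrete, here is a minimal sketch of what a warning on dropped rows could look like. Since Panel is not available in current pandas, the sketch uses a hypothetical helper, frames_to_long, that builds the same long layout from a dict of DataFrames. The helper name, the warning text, and the choice of UserWarning are all assumptions for illustration, not pandas API. Under Python's default warning filter, repeated warnings from the same call site are shown only once.

```python
import warnings

import numpy as np
import pandas as pd


def frames_to_long(frames, filter_observations=True):
    """Hypothetical stand-in for Panel.to_frame, built from a dict of DataFrames."""
    first = next(iter(frames.values()))
    index = pd.MultiIndex.from_product(
        [first.index, first.columns], names=["major", "minor"]
    )
    # One column per item, one row per (major, minor) pair.
    # Each frame is aligned to the labels of the first one before flattening.
    long = pd.DataFrame(
        {
            name: df.reindex(index=first.index, columns=first.columns)
            .to_numpy()
            .ravel()
            for name, df in frames.items()
        },
        index=index,
    )
    if not filter_observations:
        return long
    # Keep only rows that are complete across all items.
    complete = long.notna().all(axis=1)
    n_dropped = int((~complete).sum())
    if n_dropped:
        # Assumed message and category; stacklevel=2 attributes the warning to the caller.
        warnings.warn(
            f"dropping {n_dropped} (major, minor) row(s) containing NaN; "
            "pass filter_observations=False to keep them",
            UserWarning,
            stacklevel=2,
        )
    return long[complete]


df1 = pd.DataFrame(np.random.randn(2, 3), columns=["A", "B", "C"], index=["foo", "bar"])
df2 = df1.copy()
df2.loc["foo", "B"] = np.nan
result = frames_to_long({"df1": df1, "df2": df2})  # emits the warning
```

Calling frames_to_long with filter_observations=False returns all six rows and emits no warning.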
Comment From: TomAugspurger
+1 on a warning. This surprises me every time.
Comment From: m-novikov
@jreback I can fix the broken tests; for the most part they fail because the result of to_frame() changes length with the new API. But I'm not sure what to do with the SparsePanel.to_frame() method, which will raise an exception if filter_observations is changed to False. If I instead leave SparsePanel.to_frame alone, the signatures Panel.to_frame(filter_observations=False) and SparsePanel.to_frame(filter_observations=True) will be inconsistent.
Comment From: jreback
can you show the error with SparsePanel. That's sort of a problem step-child. Prob no one uses it (and I never fixed it to inherit from NDFrame).
Comment From: springcoil
Sorry, @jreback how exactly do I do that? Do you want me to change some errors?
Comment From: jreback
just show a complete example and the error it outputs (when you change the default)
Comment From: eyurtsev
This just got me. Took a long time to figure out why data was missing. :+1: for any solution. :)
Comment From: eyurtsev
This got me yet one more time! 1 hour of troubleshooting. :)
Comment From: jreback
pull-requests welcome to change this for 0.17.0
Comment From: hensing
Is PR https://github.com/pydata/pandas/pull/8063 a solution, or is it worth writing a new one or changing the default behavior?
Comment From: jreback
@hensing so the options are:
- just change the default and document in the API section. This doesn't lose information, but could be unexpected for users. Though in this case ok with it.
- deprecate filter_observations and replace with dropna (which is more consistent with the kw schema). Then you can change to the new default and show a deprecation warning for the originals.
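The second option can be sketched in plain Python as below. This is only an illustration of the keyword-deprecation pattern: resolve_dropna is a hypothetical helper, and the warning text, the FutureWarning category, and the new default of False are assumptions drawn from this discussion, not merged pandas code.

```python
import warnings


def resolve_dropna(dropna=None, filter_observations=None):
    """Hypothetical helper: map the old keyword onto the proposed new one."""
    if filter_observations is not None:
        if dropna is not None:
            raise TypeError("pass either dropna or filter_observations, not both")
        # The old keyword still works but tells the caller to migrate.
        warnings.warn(
            "filter_observations is deprecated, use dropna instead",
            FutureWarning,
            stacklevel=2,
        )
        return bool(filter_observations)
    # Assumed new default from this thread: keep rows that contain NaN.
    return False if dropna is None else bool(dropna)
```

A to_frame implementation would call this helper first and then drop rows only when it returns True, so existing callers that pass filter_observations keep their behavior while seeing the warning.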
Comment From: hensing
@jreback solution № 2 (deprecate and change the kw) seems to me to be the most elegant solution. I'll give it a try next weekend.
Comment From: jreback
gr8!
Comment From: jreback
@hensing status?
Comment From: hensing
@jreback I couldn't work on that due to a bike accident, sorry. Hope I can make it this weekend.
Comment From: saddy001
I am a bit confused: I'm using pandas 0.19.2 and still get the default drop behaviour with the filter_observations keyword.
$ pip3 show pandas
Version: 0.19.2
$ python3
>>> import pandas
>>> pandas.__version__
'0.19.2'
When I look into the source I see no changes regarding dropna etc: panel.py line 879
What am I missing?
Comment From: jreback
this is an open issue
iirc there are a couple of pull requests that were almost there but did not get merged
welcome to have a complete one
Comment From: jreback
closing as Panel deprecated