When writing unit tests for functions that return a DataFrame or a TimeSeries, there is a clear need to compare the output to the expected result. The comparison can be strict, or relaxed with respect to the order of columns and/or rows, and could also be tolerant of small floating-point differences.
I'm thinking of an approach implemented in https://pythonhosted.org/toytable/toytable.table.html#table-literals, i.e. defining a table as
>>> pokedex = table_literal("""
... | Pokemon (str) | Level (int) | Owner (str) | Type (str) |
... | Pikachu | 12 | Ash Ketchum | Electric |
... | Bulbasaur | 16 | Ash Ketchum | Grass |
... | Charmander | 19 | Ash Ketchum | Fire |
... | Raichu | 23 | Lt. Surge | Electric |
... """)
and subsequently invoking self.assertTableEqual or self.assertTableContentsEqual.
Does anything similar exist in pandas (I could not find anything), and can the above approach be upvoted, please?
Comment From: jreback
well we have an entire module for this: https://github.com/pydata/pandas/blob/master/pandas/util/testing.py
not to mention that the entire test suite uses things like this.
try
from pandas.util import testing as tm
tm.assert_frame_equal
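To illustrate the strict versus relaxed comparison asked about above, here is a minimal sketch rather than a definitive recipe. It assumes a pandas version where assert_frame_equal accepts check_like; the Weight column and the index labels are made up for illustration. Recent pandas exposes the helpers as pandas.testing, so the import falls back to the pandas.util.testing path mentioned in this thread.

```python
import pandas as pd

try:
    # Public location in recent pandas releases.
    from pandas import testing as tm
except ImportError:
    # Location referenced in this thread (older pandas).
    from pandas.util import testing as tm

# Expected result; the Weight column is hypothetical illustration data.
expected = pd.DataFrame(
    {"Pokemon": ["Bulbasaur", "Raichu"], "Level": [16, 23], "Weight": [0.3, 1.5]},
    index=["a", "b"],
)

# Same data with columns and rows in a different order, plus float noise
# (0.1 + 0.2 is not exactly 0.3).
actual = pd.DataFrame(
    {"Weight": [1.5, 0.1 + 0.2], "Level": [23, 16], "Pokemon": ["Raichu", "Bulbasaur"]},
    index=["b", "a"],
)

# Relaxed: check_like=True ignores the order of index and columns, and the
# default (non-exact) float comparison tolerates the rounding noise.
tm.assert_frame_equal(actual, expected, check_like=True)

# Strict: the same call without check_like raises AssertionError,
# because the column and row order differ.
try:
    tm.assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False
```

Note that check_like matches rows by index label, so it only helps with row order when the index is meaningful; with a default integer index, sorting both frames by a key column first is one alternative.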
Comment From: rs2
How would you build & visualize the expected DataFrame? The table_literal approach seems to be appropriate.
Comment From: jorisvandenbossche
@rs2 Just the same way you would otherwise build a small DataFrame, e.g. pd.DataFrame(..) with a dictionary.
If you want to have something visual like the above, you can always pass such a string to read_csv or read_fwf
See also https://github.com/pydata/pandas/issues/9895 for proposal to make the testing functions more 'officially' public
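The read_csv suggestion above could look roughly like the following sketch. The helper name frame_from_literal is hypothetical (it is not part of pandas), and the table layout is a simplified variant of the toytable literal without type annotations or outer pipes. It relies on read_csv accepting a regular-expression separator with the python engine.

```python
import io

import pandas as pd

try:
    from pandas import testing as tm
except ImportError:
    from pandas.util import testing as tm

TABLE = """
Pokemon    | Level | Owner       | Type
Bulbasaur  | 16    | Ash Ketchum | Grass
Charmander | 19    | Ash Ketchum | Fire
Raichu     | 23    | Lt. Surge   | Electric
"""


def frame_from_literal(text):
    """Hypothetical helper: parse a pipe-delimited table string into a DataFrame."""
    # The regex separator swallows the padding around each pipe, so the
    # values come out without surrounding whitespace.
    return pd.read_csv(io.StringIO(text.strip()), sep=r"\s*\|\s*", engine="python")


expected = frame_from_literal(TABLE)

# The same frame built from a dictionary, as one would in the code under test.
actual = pd.DataFrame(
    {
        "Pokemon": ["Bulbasaur", "Charmander", "Raichu"],
        "Level": [16, 19, 23],
        "Owner": ["Ash Ketchum", "Ash Ketchum", "Lt. Surge"],
        "Type": ["Grass", "Fire", "Electric"],
    }
)

tm.assert_frame_equal(actual, expected)
```

Column types are inferred by read_csv here, so a literal that needs explicit types would have to pass dtype (or convert afterwards) instead of annotating the header as toytable does.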