Pandas hash_pandas_object fails on dataframe with dict column

#### Code Sample, a copy-pastable example if possible

```python
import pandas as pd
from pandas.util import hash_pandas_object

data = [{'a': 10, 'b': {'c': 20, 'd': 30}}, {'a': 100, 'b': {'c': 200, 'd': 300}}]
df = pd.DataFrame(data)
hash_pandas_object(df)  # raises TypeError: unhashable type: 'dict'
```

#### Problem description

The `hash_pandas_object` call fails with:

```
Traceback (most recent call last):
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    hash_pandas_object(df)
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/pandas/core/util/hashing.py", line 115, in hash_pandas_object
    h = _combine_hash_arrays(hashes, num_items)
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/pandas/core/util/hashing.py", line 41, in _combine_hash_arrays
    for i, a in enumerate(arrays):
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/pandas/core/util/hashing.py", line 104, in <genexpr>
    hashes = (hash_array(series.values) for _, series in obj.iteritems())
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/pandas/core/util/hashing.py", line 286, in hash_array
    codes, categories = factorize(vals, sort=False)
  File "/Users/jonathanstray/anaconda/lib/python3.5/site-packages/pandas/core/algorithms.py", line 471, in factorize
    labels = table.get_labels(values, uniques, 0, na_sentinel, check_nulls)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1367, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'
```


#### Expected Output
Series of length 2 containing hashes for each row.
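One possible workaround, sketched here and not part of pandas itself, is to replace each dict cell with a canonical JSON string before hashing. `json.dumps(..., sort_keys=True)` makes dicts with equal contents but different key insertion order serialize identically. `canonicalize_dicts` is a hypothetical helper, and the sketch assumes every dict is JSON-serializable.

```python
import json

import pandas as pd
from pandas.util import hash_pandas_object


def canonicalize_dicts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with every dict cell replaced by
    a key-sorted JSON string. Assumes the dicts are JSON-serializable."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns that actually contain at least one dict
        if out[col].map(lambda v: isinstance(v, dict)).any():
            out[col] = out[col].map(
                lambda v: json.dumps(v, sort_keys=True) if isinstance(v, dict) else v
            )
    return out


data = [{'a': 10, 'b': {'c': 20, 'd': 30}}, {'a': 100, 'b': {'c': 200, 'd': 300}}]
df = pd.DataFrame(data)

# One uint64 hash per row, computed on the canonicalized copy
row_hashes = hash_pandas_object(canonicalize_dicts(df))
print(row_hashes)
```

Other unhashable cell types (lists, sets, nested objects that JSON cannot encode) would still need their own canonical form.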

#### Output of ``pd.show_versions()``

```
INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.21.1
pytest: 3.0.6
pip: 9.0.1
setuptools: 34.3.3
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.16.0
pyarrow: 0.7.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: None
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```




**Comment From: jreback**

Why do you think you could hash a mutable object? Generically hashing arbitrary objects is hard, and not likely to be supported.

**Comment From: jstray**

Because I want to cache tables to disk. The hash value allows me to check if the current table version is different from the one on disk, thus avoiding large unnecessary writes.
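A minimal sketch of that caching check, assuming the frame holds only hashable cell values (no dict columns) and that the fingerprint is stored in a sidecar file next to the data. `table_fingerprint` and `write_if_changed` are hypothetical helpers, not pandas API; the sketch collapses the per-row hashes from `hash_pandas_object` into one SHA-256 digest.

```python
import hashlib
import os
import tempfile

import pandas as pd
from pandas.util import hash_pandas_object


def table_fingerprint(df: pd.DataFrame) -> str:
    """Hypothetical helper: one hex digest for the whole table."""
    # Per-row uint64 hashes (index included), concatenated in row order
    row_hashes = hash_pandas_object(df, index=True).to_numpy()
    h = hashlib.sha256(row_hashes.tobytes())
    # hash_pandas_object ignores column names, so mix them in explicitly
    h.update(repr(list(df.columns)).encode("utf-8"))
    return h.hexdigest()


def write_if_changed(df: pd.DataFrame, path: str) -> bool:
    """Hypothetical helper: write df only if its fingerprint differs from
    the one saved in the sidecar file. Returns True if it wrote."""
    fp = table_fingerprint(df)
    sidecar = path + ".hash"  # sidecar location is an assumption of this sketch
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            if f.read() == fp:
                return False  # unchanged: skip the large write
    df.to_csv(path)
    with open(sidecar, "w") as f:
        f.write(fp)
    return True


df = pd.DataFrame({"a": [10, 100], "b": ["x", "y"]})
path = os.path.join(tempfile.mkdtemp(), "table.csv")
first = write_if_changed(df, path)   # nothing cached yet, so it writes
second = write_if_changed(df, path)  # same contents, so it skips
```

Because the row hashes are concatenated in order, reordering rows changes the fingerprint; sorting the row hashes first would make it order-insensitive if that is preferred.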

While I have you: how do I tell whether a column holds dicts or other complex types? Loading from, e.g., JSON just gives me "object" dtype for every column.

**Comment From: jstray**

I have to find another way for the moment (probably a string hash of the `to_csv` output), but I'm wondering whether I could modify `hash_pandas_object` to canonicalize values using the same strategy as `.equals()`.
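A minimal sketch of the `to_csv` string-hash workaround, assuming SHA-256 as the digest (`csv_fingerprint` is a hypothetical name). It does not raise on dict cells because `to_csv` writes each dict's string form, but that text follows key insertion order. This is the canonicalization gap mentioned above: `.equals()` compares the dict objects themselves and treats them as equal.

```python
import hashlib

import pandas as pd


def csv_fingerprint(df: pd.DataFrame) -> str:
    """Hypothetical helper: hash the CSV text of the frame (index and header included)."""
    return hashlib.sha256(df.to_csv().encode("utf-8")).hexdigest()


# Same dict contents, different key insertion order
df1 = pd.DataFrame({"a": [10], "b": [{"c": 20, "d": 30}]})
df2 = pd.DataFrame({"a": [10], "b": [{"d": 30, "c": 20}]})

print(csv_fingerprint(df1) == csv_fingerprint(df2))  # False: the CSV text differs
print(df1.equals(df2))  # True: the dicts compare equal
```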

**Comment From: jreback**

Strings & Python objects are both stored as ``object`` dtype. You can do this to differentiate strings from other Python objects. There is no good way to deeply introspect these, though. Working with non-strings is generally not supported.

```
In [27]: data = [{'a': 10, 'b': {'c': 20, 'd': 30}}, {'a': 100, 'b': {'c': 200, 'd': 300}}]
    ...: df = pd.DataFrame(data)
    ...:

In [28]: df.apply(pd.api.types.infer_dtype)
Out[28]:
a    integer
b      mixed
dtype: object
```
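`infer_dtype` reports a dict column only as `mixed`, the same label it gives other heterogeneous object columns. To single out dict-valued columns specifically, one option (a sketch; `dict_columns` is a hypothetical helper) is a per-cell `isinstance` check, which is a shallow test and does not introspect nested contents.

```python
import pandas as pd
from pandas.api.types import infer_dtype

data = [{'a': 10, 'b': {'c': 20, 'd': 30}}, {'a': 100, 'b': {'c': 200, 'd': 300}}]
df = pd.DataFrame(data)


def dict_columns(frame: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns where at least one cell is a dict."""
    return [
        col
        for col in frame.columns
        if frame[col].map(lambda v: isinstance(v, dict)).any()
    ]


print(df.apply(infer_dtype).to_dict())  # {'a': 'integer', 'b': 'mixed'}
print(dict_columns(df))  # ['b']
```

The per-cell check scans every value, so on large frames it costs a full pass per object column.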