Hi, I get a warning that confused me and that, in my opinion, is wrong. See the code below, which should reproduce the warning; further comments are inline.
INSTALLED VERSIONS
commit: None python: 2.7.9.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel byteorder: little LC_ALL: None LANG: None
pandas: 0.16.1 nose: 1.3.4 Cython: 0.22 numpy: 1.9.2 scipy: 0.15.1 statsmodels: 0.6.1 IPython: 3.0.0 sphinx: None patsy: 0.3.0 dateutil: 2.4.0 pytz: 2015.2 bottleneck: None tables: 3.1.1 numexpr: 2.4 matplotlib: 1.4.3 openpyxl: None xlrd: 0.9.3 xlwt: None xlsxwriter: 0.7.1 lxml: None bs4: 4.3.2 html5lib: None httplib2: None apiclient: None sqlalchemy: 0.9.9 pymysql: None psycopg2: None
import pandas as pd
import sqlite3 as sql
con = sql.connect("Temp_db.db")
df = pd.DataFrame(columns=[["FIRST", "FIRST", "SECOND", "SECOND"], ["ONE", "TWO", "ONE", "TWO"]]) # no space in any column
# following 'to_sql' produces warning:
# UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
# However, there are no spaces (see above)
df.to_sql("tmp_table", con=con, if_exists="replace")
# referring to page: http://pydoc.net/Python/pandas/0.14.0/pandas.io.sql/
import re
columns = list(map(str, df.columns))
pat = re.compile(r'\s+')
print columns
# OUTPUT: ["('FIRST', 'ONE')", "('FIRST', 'TWO')", "('SECOND', 'ONE')", "('SECOND', 'TWO')"]
# ('FIRST',X'ONE'): at position X of each stringified tuple there is a whitespace, which falsely triggers the warning:
print any(map(pat.search, columns)) #true
# quick and dirty fix
columns = [i.replace("', '", "','") for i in columns]
print columns
print any(map(pat.search, columns)) #false
Comment From: jorisvandenbossche
Thanks for the report!
However, I don't think it is a false warning in this case, as the MultiIndex tuples are converted to strings (e.g. "('FIRST', 'ONE')") when used as column names for the SQL table. So those names do contain spaces.
In general, MultiIndex columns are not really supported by to_sql (it works, but only because the tuples are converted to somewhat awkward strings).
Example (reading the table back into pandas, from your example above):
In [77]: df = pd.read_sql("SELECT * FROM tmp_table", con)
In [79]: df
Out[79]:
Empty DataFrame
Columns: [index, ('FIRST', 'ONE'), ('FIRST', 'TWO'), ('SECOND', 'ONE'), ('SECOND', 'TWO')]
Index: []
In [80]: df.columns
Out[80]:
Index([u'index', u'('FIRST', 'ONE')', u'('FIRST', 'TWO')',
u'('SECOND', 'ONE')', u'('SECOND', 'TWO')'],
dtype='object')
So when looking at the .columns attribute, you can see they are now actually strings and no longer tuples.
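To make explicit where the space comes from: str() on a tuple puts ", " between the elements, and that comma-space is what the \s+ check in pandas.io.sql matches. A minimal illustration (not the pandas internals verbatim):
import re

col = ('FIRST', 'ONE')
as_string = str(col)
print(as_string)                           # ('FIRST', 'ONE')  <- note the space after the comma
print(bool(re.search(r'\s+', as_string)))  # True, hence the warning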
Comment From: elDan101
Ok, I realised this now too, sorry for that. I still use a MultiIndex with SQL, so when I read from the database I needed to implement a "MultiIndex recovery" to obtain the same table.
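For reference, a rough sketch of what such a recovery can look like (an assumed approach, not necessarily the exact code I used), given that the stored column names are the stringified tuples shown above:
import ast
import sqlite3 as sql
import pandas as pd

con = sql.connect("Temp_db.db")
df = pd.read_sql("SELECT * FROM tmp_table", con, index_col="index")
# parse "('FIRST', 'ONE')" back into the tuple ('FIRST', 'ONE') and rebuild the MultiIndex
df.columns = pd.MultiIndex.from_tuples([ast.literal_eval(c) for c in df.columns])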
Comment From: jorisvandenbossche
@elDan101 Personally, I think the easiest approach is to join the MultiIndex levels with e.g. _ before writing to SQL; you can then easily split them back into a MultiIndex afterwards.
Something like
# before writing to SQL: flatten the MultiIndex into plain strings
df.columns = [a + '_' + b for a, b in df.columns]
# after reading back: split the names and rebuild the MultiIndex
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
(this only works, of course, if you don't have _ in the names themselves)
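Putting that together, a self-contained sketch of the round trip (reusing the table and connection names from the example above):
import sqlite3 as sql
import pandas as pd

con = sql.connect("Temp_db.db")
df = pd.DataFrame(columns=[["FIRST", "FIRST", "SECOND", "SECOND"],
                           ["ONE", "TWO", "ONE", "TWO"]])

# flatten the MultiIndex to plain strings before writing
df.columns = ['_'.join(col) for col in df.columns]
df.to_sql("tmp_table", con=con, if_exists="replace", index=False)

# read back and rebuild the MultiIndex by splitting on the separator
out = pd.read_sql("SELECT * FROM tmp_table", con)
out.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in out.columns])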