Hello,

I recently upgraded from pandas 0.17.1 to 0.18.1 and ran into a performance regression with DataFrame.to_csv:

Code Sample, a copy-pastable example if possible

import pandas import numpy import tempfile fd, filename = tempfile.mkstemp(prefix='to_csv_', suffix='.csv') test = pandas.DataFrame(numpy.random.rand(2000, 5000)) %timeit test.to_csv(filename, sep=';', index=False, header=False)

In pandas 0.17.1: 1 loop, best of 3: 6.86 s per loop

In pandas 0.18.1: 1 loop, best of 3: 21.9 s per loop

You will find attached the results given by the Spyder profiler running this code in both version.

Most of the time (in pandas 0.18.1) is spent in the pandas.isnull method.

output of pd.show_versions()

INSTALLED VERSIONS

commit: None python: 2.7.11.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel byteorder: little LC_ALL: None LANG: fr_FR

pandas: 0.18.1 nose: 1.3.7 pip: 8.1.1 setuptools: 20.3 Cython: 0.23.4 numpy: 1.10.4 scipy: 0.17.0 statsmodels: None xarray: None IPython: 4.1.2 sphinx: 1.3.5 patsy: 0.4.0 dateutil: 2.5.1 pytz: 2016.2 blosc: None bottleneck: 1.0.0 tables: 3.2.2 numexpr: 2.5 matplotlib: 1.5.1 openpyxl: 2.3.2 xlrd: 0.9.4 xlwt: 1.0.0 xlsxwriter: 0.8.4 lxml: 3.6.0 bs4: 4.4.1 html5lib: None httplib2: None apiclient: None sqlalchemy: 1.0.12 pymysql: None psycopg2: None jinja2: 2.8 boto: 2.39.0 pandas_datareader: None None to_csv_0_17_1 to_csv_0_18_1

Comment From: jreback

In [1]: pd.__version__
Out[1]: u'0.17.1'

In [7]: %timeit test.to_csv(filename, sep=';', index=False, header=False)
1 loop, best of 3: 6 s per loop
%timeit test.to_csv(filename, sep=';', index=False, header=False)
In [1]: pd.__version__
Out[1]: u'0.18.1'

1 loop, best of 3: 16 s per loop
In [28]: pd.__version__
Out[28]: '0.18.1+403.ga0151a7'
1 loop, best of 3: 5.91 s per loop

In [28]: pd.__version__
Out[28]: '0.18.1+403.ga0151a7'

I think this was caused by this: https://github.com/pydata/pandas/issues/12922

though looks ok in master/0.19.0 (soon)