Code Sample, a copy-pastable example if possible

Given a directory with a large collection (>1900) of files such as the attached sample (sample_file.zip), consider the following pseudo-code, "junk.py" (it needs work to become copy-pastable: dtypes, colnames, and ec_files must be defined, and thousands of similar files generated):

import os.path as osp
import pandas as pd

for f in ec_files:
    print(osp.basename(f))
    fdf = pd.read_csv(f, dtype=dtypes, header=None,
                      parse_dates=[0, 1], index_col=1, names=colnames,
                      na_values=["NAN"], true_values=["t"],
                      false_values=["f"], low_memory=False)
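Since the snippet above is not self-contained, a minimal sketch along these lines could stand in for the missing pieces: it generates a batch of similar small files and loops over them with a comparable read_csv call. The two-column layout (timestamp plus one float reading) and all names here are hypothetical, standing in for the attached sample and for the undefined dtypes, colnames, and ec_files.

```python
import os.path as osp
import tempfile

import pandas as pd

# Hypothetical stand-ins for the undefined names in the snippet above.
colnames = ["time", "reading"]   # timestamp column + one float column
dtypes = {"reading": float}

# Generate a batch of similar small CSV files in a temporary directory.
tmpdir = tempfile.mkdtemp()
ec_files = []
for i in range(200):  # scale up toward ~2000 to approximate the report
    path = osp.join(tmpdir, "sample_%04d.csv" % i)
    with open(path, "w") as fh:
        fh.write("2016-07-13 18:54:03.9,1.5\n")
        fh.write("2016-07-13 18:54:04.9,NAN\n")
    ec_files.append(path)

# Read every file back, as in the reported loop.
for f in ec_files:
    fdf = pd.read_csv(f, dtype=dtypes, header=None,
                      parse_dates=[0], index_col=0, names=colnames,
                      na_values=["NAN"])
```

With enough generated files, this loop approximates the failing workload, while each file still reads cleanly on its own.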

Problem description

Running this script stops at different files each time it is run with the following traceback:

Traceback (most recent call last):
  File "junk.py", line 9, in <module>
    false_values=["f"], low_memory=False)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 400, in _read
    data = parser.read()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 938, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1505, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 849, in pandas.parser.TextReader.read (pandas/parser.c:9907)
  File "pandas/parser.pyx", line 945, in pandas.parser.TextReader._read_rows (pandas/parser.c:11161)
  File "pandas/parser.pyx", line 1047, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:12536)
  File "pandas/parser.pyx", line 1126, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:13783)
ValueError: invalid literal for float(): 2016-07-13 18:54:03.9

The file where the error is thrown, as well as the particular ValueError, vary from run to run; it is never consistent. Furthermore, every file on which the error occurs reads without any problem when the very same read_csv call is run on its own (not in a loop). Therefore, the read_csv call is correct for these files.

If the number of files to loop through is relatively small (<100), the process sometimes finishes successfully, but not always.

I initially reported this on Stack Overflow.

Expected Output

read_csv output and behaviour should be the same whether it is run in a loop or independently.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-2-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: None.None

pandas: 0.19.0+git14-ga40e185
nose: 1.3.7
pip: 9.0.1
setuptools: 32.0.0
Cython: None
numpy: 1.12.0rc2
scipy: 0.18.1
statsmodels: 0.8.0rc1
xarray: None
IPython: 5.1.0
sphinx: 1.4.9
patsy: 0.4.1+dev
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0rc2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.5
sqlalchemy: None
pymysql: None
psycopg2: 2.6.2 (dt dec mx pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

Comment From: jreback

You are oddly using a version from master, which is from Oct 2016. There have been lots of bug fixes in 0.19.1 and 0.19.2 for read_csv; please give them a try. If you can show a specific example file where this doesn't work, please update the post.

Further, you should simply try with fewer options passed to read_csv; e.g., na_values doesn't matter, and low_memory is a deprecated option.
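A pared-down call along those lines, keeping only the options the data actually needs, might look like this (the two-column layout and the names are hypothetical, for illustration only):

```python
import io

import pandas as pd

# Hypothetical two-column sample: a timestamp and a float value,
# with "NAN" marking a missing reading.
csv = io.StringIO("2016-07-13 18:54:03.9,1.5\n"
                  "2016-07-13 18:54:04.9,NAN\n")

# Minimal option set: no low_memory, no true_values/false_values.
df = pd.read_csv(csv, header=None, names=["time", "val"],
                 parse_dates=[0], index_col=0, na_values=["NAN"])
```

Dropping options one at a time like this helps narrow down which argument, if any, interacts with the bug.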

Comment From: spluque

I'll update pandas and remove low_memory as you suggest and see if that improves anything. Why does na_values not matter? Some of my files do have "NAN" for missing values. So far, every single file where the error is thrown in the loop (and several surrounding it) reads absolutely fine on its own with the exact same read_csv... I can certainly show this. I'm hoping the update solves this.

Comment From: spluque

Updating to 0.19.2 solved this problem.