Pandas BUG: read_table error with tabs, dtype and index_col

This gist demonstrates the problem: https://gist.github.com/brentp/6066942

It's discussed in this thread: https://groups.google.com/forum/#!topic/pydata/hIIZpZqbY5M

As discussed in the thread, there is some interaction when specifying dtype, index_col, and sep in read_csv.

As the gist shows, things work fine with sep="\s+", but fail for the same set with sep="\t", even though the file is tab-delimited.

Comment From: jtratner

Could you add a csv/tsv file to that gist that demonstrates it? I know copy/paste is easiest for ipython, but it'd be great to be able to just download the raw data file and work with it :)

Comment From: brentp

I added the data file to the gist.

https://gist.github.com/brentp/6066942/raw/95742c7811f89e032194e1f32a849272e0268c15/t.txt

Comment From: jtratner

I was playing around with this. Weirdly, if you pass the text directly to the reader in StringIO, it works, but doesn't if you read from the file:

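from io import StringIO
import numpy as np
import pandas as pd
text = StringIO(txt.replace("   ", "\t"))  # txt: the table text from the gist
df = pd.read_csv(text, dtype=np.int_, sep="\t", index_col=0)
print(r"OK \t")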

Additionally, if you remove the dtype, it also works:

df = pd.read_csv('t.txt', sep="\t")
print(r"OK \t+", df.shape)

Comment From: jtratner

Okay, the issue is the index column: everything works if you change the index values to integers.
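
If the index column is what trips up a single blanket dtype, one possible workaround is to pass dtype as a per-column dict so that it only names the data columns. The sketch below is only an illustration under assumptions: the inline data is a made-up stand-in for the gist's t.txt (string row labels in the first column, integers elsewhere), and the column names are hypothetical.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical stand-in for the gist's t.txt: string row labels, integer data.
data = "name\ta\tb\nrow1\t1\t2\nrow2\t3\t4\n"

# Name a dtype for each data column only, so the string index column
# is never asked to convert to an integer type.
df = pd.read_csv(
    StringIO(data),
    sep="\t",
    index_col=0,
    dtype={"a": np.int64, "b": np.int64},
)
print(df.dtypes)
```

This sidesteps the question of how a scalar dtype interacts with index_col rather than answering it, so it says nothing about why the StringIO and file cases behaved differently.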

Comment From: jreback

dupe of #9435