Pandas to_csv execution unreliable in combination with for loop

I am outputting a single column from a dataframe to a csv. The data, however, is too long for some older downstream applications, so I have the csv add a line break every `size` items so that no individual row in the output is too long.

Example

```python
from tkinter import Tk
from tkinter import filedialog
import pandas as pd
import numpy as np

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
infilename = filedialog.askopenfilename()# show an "Open" dialog box and return the path to the selected file
data = pd.read_csv(infilename, header=None) #usecols=[0], only get 1st column, specify no header
outfilename = filedialog.asksaveasfile() #get save location

size=50 #number of items per line
col=0
indexes = np.arange(0,len(data),size)#have to use numpy since range is now an immutable type in python 3
indexes = np.append(indexes,[len(data)]) #add the uneven final index
for i in range(len(indexes)-1):
    holder = pd.DataFrame(data.iloc[indexes[i]:indexes[i+1],col]).T
    holder.to_csv(outfilename, index=False, header=False)
```
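To check the chunk boundaries on their own, here is a small sketch that evaluates the same `np.arange`/`np.append` logic for the 11-item example below (`n` and `bounds` are illustrative names, not from the original). It also shows that plain Python 3 `range` yields the same boundaries, since a list built from it can be extended.

```python
import numpy as np

n, size = 11, 3  # 11 items (A..K) split into lines of 3, as in the expected output
indexes = np.arange(0, n, size)    # chunk start positions: 0, 3, 6, 9
indexes = np.append(indexes, [n])  # add the uneven final end position: 11

# Consecutive pairs of boundaries give each chunk's [start, stop) slice.
bounds = [(int(indexes[i]), int(indexes[i + 1])) for i in range(len(indexes) - 1)]
print(bounds)  # [(0, 3), (3, 6), (6, 9), (9, 11)]

# A list built from a Python 3 range gives the same boundaries.
assert list(range(0, n, size)) + [n] == [int(x) for x in indexes]
```

The final pair `(9, 11)` covers the short last chunk, so the boundary logic itself does include the uneven final index.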

Expected Output

For input (this is actually a column in the input, rendered horizontally to save space):

```
A,B,C,D,E,F,G,H,I,J,K
```

the file should contain (with `size=3`):

```
A,B,C
D,E,F
G,H,I
J,K
```
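As a reference for the target format only, here is a minimal sketch that produces the same output in memory with the standard-library `csv` module. This is a swapped-in technique, not the pandas approach used in the issue, and `to_wrapped_csv` is a hypothetical helper name.

```python
import csv
import io


def to_wrapped_csv(values, size):
    """Return CSV text with at most `size` values per line (hypothetical helper)."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the output identical on every platform.
    writer = csv.writer(buf, lineterminator="\n")
    # Slicing past the end of the list just returns the shorter final chunk.
    for start in range(0, len(values), size):
        writer.writerow(values[start:start + size])
    return buf.getvalue()


print(to_wrapped_csv(list("ABCDEFGHIJK"), 3), end="")
```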

Although no errors are raised, the final iteration (the one with the uneven final index) does not write to the file, even though the data is assigned to `holder` without issue. Also, even though I do not set the `mode` parameter, it acts like `'w'` and overwrites an existing file on the first iteration, then acts like `'a'` and appends to the file on subsequent iterations. This suggests some statefulness that may be relevant. Since no errors are raised, I cannot figure out why the final chunk is not being written.
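One possible explanation for both symptoms, offered as a guess rather than a confirmed diagnosis: `filedialog.asksaveasfile()` returns a file object that is already open for writing (`'w'`), not a path. `filedialog.asksaveasfilename()` is the call that returns a path. When `to_csv` receives an open file object, it writes into it without closing it. Every iteration therefore continues in the same handle, and the only truncation happened when the dialog opened the file. Data still sitting in the write buffer does not reach disk until the handle is flushed or closed. The sketch below imitates this without tkinter, using a temporary file as a stand-in for the dialog's choice (an assumption so it runs headless). It opens one handle, passes it to `to_csv` on every iteration, and lets the `with` block close it so the last chunk is flushed.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the single input column; the real data came from read_csv.
data = pd.DataFrame({0: list("ABCDEFGHIJK")})
size, col = 3, 0

# A temporary path stands in for the file chosen in the save dialog (assumption).
fd, outpath = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# One open handle for the whole loop, like the object asksaveasfile() returns.
# newline="" lets pandas control line endings when writing to a file object.
with open(outpath, "w", newline="") as handle:
    for start in range(0, len(data), size):
        holder = pd.DataFrame(data.iloc[start:start + size, col]).T
        holder.to_csv(handle, index=False, header=False)
# Leaving the with block closes the handle, which flushes the final chunk to disk.

with open(outpath) as f:
    written = f.read()
os.remove(outpath)
print(written, end="")
```

In the tkinter version, the equivalent change would be either to call `asksaveasfilename()` and pass the resulting path, or to keep `asksaveasfile()` and close the returned object after the loop (for example by wrapping it in `with`).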

Output of `pd.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
```

Comment From: TomAugspurger

Is tkinter necessary to reproduce the problem? Could you maybe simplify the example?

Comment From: Void2258

OK, I just tried hardcoding the paths and removing all tkinter. Now it writes the data perfectly, BUT it no longer shows the variable mode behavior: it repeatedly overwrites the single line of data, though it does write the last chunk successfully. The example below includes a modified for loop to account for this.

```python
import pandas as pd
import numpy as np

infilename = 'C:\\Users\\...infile.csv'
data = pd.read_csv(infilename, header=None)
outfilename = 'C:\\Users\\...test.txt'

size=50 #number of items per line
col=0
indexes = np.arange(0,len(data),size)#have to use numpy since range is now an immutable type in python 3
indexes = np.append(indexes,[len(data)]) #add the uneven final index
for i in range(len(indexes)-1):
    holder = pd.DataFrame(data.iloc[indexes[i]:indexes[i+1],col]).T
    if i ==0:
        holder.to_csv(outfilename, index=False, header=False)
    else:
        holder.to_csv(outfilename, index=False, header=False, mode='a')
```
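When `to_csv` receives a path string instead of a file object, each call opens and closes the file itself using `mode` (default `'w'`). That is why the hardcoded-path version truncated the file on every iteration until `mode='a'` was added. Here is a compact sketch of the same "write first, then append" pattern. It assumes a temporary file in place of the elided Windows path and a small in-code DataFrame in place of the input file.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({0: list("ABCDEFGHIJK")})  # stand-in for the input column
size, col = 3, 0

fd, outfilename = tempfile.mkstemp(suffix=".txt")  # stand-in for the real output path
os.close(fd)

for i, start in enumerate(range(0, len(data), size)):
    holder = pd.DataFrame(data.iloc[start:start + size, col]).T
    # Truncate on the first chunk, append afterwards; each call opens and closes the file.
    holder.to_csv(outfilename, index=False, header=False, mode="w" if i == 0 else "a")

with open(outfilename) as f:
    written = f.read()
os.remove(outfilename)
print(written, end="")
```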

Comment From: TomAugspurger

@Void2258 we still can't run that example, since we don't have a `'C:\\Users\\...infile.csv'` file on our computers. It doesn't look like `read_csv` is needed at all here; you could make the dataframe using `pd.DataFrame()`.
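Following that suggestion, a self-contained reproduction might look like the sketch below. It builds the input with `pd.DataFrame()` and writes to an in-memory `io.StringIO` buffer, so no local input or output files are needed. The 11-letter column and `size=3` come from the expected-output example in the issue.

```python
import io

import pandas as pd

# Build the input in code instead of reading a local CSV.
data = pd.DataFrame({0: list("ABCDEFGHIJK")})
size, col = 3, 0

buf = io.StringIO()  # in-memory stand-in for the output file
for start in range(0, len(data), size):
    holder = pd.DataFrame(data.iloc[start:start + size, col]).T
    # Writing to an open buffer continues where the previous call stopped.
    holder.to_csv(buf, index=False, header=False)

print(buf.getvalue(), end="")
```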

Either way, I suspect the problem is with your looping logic and not with `to_csv`. I've tested it, and appending works as expected. Maybe try Stack Overflow if you have a question about getting the loop logic worked out.
