I am interested in seeing if there is a method, or a method could be built to only read in the header column of a text or excel file. The reason I am proposing this is that I generally have to read in files from sources that use different header names for the same underlying data. Internally, I keep a list of column names, and its respective data type. This might look something like:

DATA_TYPES = { 'name':str, 'ISIN':str, 'SEDOL':str, 'CUSIP':str, 'price': np.float64, 'volume': np.int64, }

I then create a column mapper to map internal column names to header column names, for instance:

col_map = { 'name':'ISSUER_NAME', 'CUSIP':'ISSUER_CUSIP' }

in order to create a dictionary to set as the dtype arguement in read_csv to be:

dtype = { 'ISSUER_NAME':str, 'ISSUER_CUSIP':str }

This avoids pd.read_csv to accidently convert cusip's to integers for instance. However, sometimes, the column name(s) will be consistent between the file and the internal name, so I would want to read in the file column names, and based on the given names, remap any necessary columns, and then get each columns dtype. For example, say the file had a column name='ticker', which is the same as the internal column name='ticker', so I want my dtype dictionary to be:

dtype = { 'ISSUER_NAME':str, 'ISSUER_CUSIP':str 'ticker':str }

headers = pd.read_csv(...,header_only=True) headers_intern = remap_list_function(headers,col_map) dtype_intern = filter_dict_key_function(headers_intern,DATA_TYPES) dtype = remap_dict_function(col_map,dtypes_intern)

df = pd.read_csv(....,dtype=dtype,header_only=False) df.rename(columns=flip_dict_function(col_map),inplace=True)

Comment From: jreback

you can specify

pd.read_csv(....., header=None,nrows=1) to (or whatever number you need) and it will return the headers