Pandas Need way to get original column order (read_csv, read_json)

Code Sample, a copy-pastable example if possible

# Your code here

Problem description

Applications sometimes need to be able to present and manipulate data in the column ordering found in the source data file (CSV or JSON). The column order in these files is significant to users in many cases. The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order. After the Dataframe has been created, there seems to be no way to get the original column order. Please consider adding a parameter to read_csv() and read_json() to preserve the column ordering found in the first line of the file (for CSV) or in the first record (for JSON). Thanks!

Note: We receive a lot of issues on our GitHub tracker, so it is very possible that your issue has been posted before. Please check first before submitting so that we do not have to handle and close duplicates!

Note: Many problems can be resolved by simply upgrading pandas to the latest version. Before submitting, please check if that solution works for you. If possible, you may want to check if master addresses this issue, but that is not necessary.

For documentation-related issues, you can check the latest versions of the docs on master here:

https://pandas-docs.github.io/pandas-docs-travis/

If the issue has not been resolved there, go ahead and file it in the issue tracker.

Expected Output

Output of `pd.show_versions()`

[paste the output of ``pd.show_versions()`` here below this line]

Comment From: TomAugspurger

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

Comment From: rfernand2

I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!

From: Tom Augspurger notifications@github.com Sent: Sunday, January 27, 2019 12:55 PM To: pandas-dev/pandas pandas@noreply.github.com Cc: Roland Fernandez rfernand@microsoft.com; Author author@noreply.github.com Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-457953371&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210153461&sdata=24K5o9dUafSUxF4j2G0Rf%2F8LwMgfbkzyR3mMEIVBs8s%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3sayc5mr6VQiW93qrgzs1cY0IHjUks5vHhI2gaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210163470&sdata=jxLwZp3eDSw8KCYt7lmOSBEfHFVtwx0C3E7idkuUW3s%3D&reserved=0.

Comment From: TomAugspurger

How about an example with read_json? Keep in mind that dictionaries aren’t necessarily ordered.

From: Roland Fernandez notifications@github.com Sent: Sunday, January 27, 2019 18:10 To: pandas-dev/pandas Cc: Tom Augspurger; Comment Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/pandas-dev/pandas/issues/24962#issuecomment-457967158, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABQHIpwv68DkxwTzomTf94Q3r-rp6-Klks5vHj_8gaJpZM4aU630.

Comment From: rfernand2

Sorry for the delay. Here is a failing test case:

pandasJsonLoader: test out ability of pandas JSON loader to preserve column order

import pandas

json_str = '[{"Name": "chevrolet chevelle malibu", "Miles_per_Gallon": 18, "Cylinders": 8,}]' data = pandas.read_json(json_str)

print(list(data)) # out: ['Cylinders', 'Miles_per_Gallon', 'Name'] `

Comment From: rfernand2

Not sure how I closed this issue - Reopening it again.

Comment From: WillAyd

@rfernand2 what you are asking for is achievable with orient='table' for JSON as that by design produces a schema with an ordered list of field descriptors (columns)

Comment From: WillAyd

Closing as I think this is already supported across formats requested

Comment From: rfernand2

That doesn't seem to be a valid option for this. When I add "orient='table'" to the above test case, I get the following error:

C:\multiNLI\multinli_1.0>python test.py Traceback (most recent call last): File "test.py", line 4, in data = pandas.read_json(json_str, orient="table") File "c:\anaconda3\lib\site-packages\pandas\io\json\json.py", line 422, in read_json result = json_reader.read() File "c:\anaconda3\lib\site-packages\pandas\io\json\json.py", line 529, in read obj = self._get_object_parser(self.data) File "c:\anaconda3\lib\site-packages\pandas\io\json\json.py", line 546, in _get_object_parser obj = FrameParser(json, **kwargs).parse() File "c:\anaconda3\lib\site-packages\pandas\io\json\json.py", line 638, in parse self._parse_no_numpy() File "c:\anaconda3\lib\site-packages\pandas\io\json\json.py", line 864, in _parse_no_numpy precise_float=self.precise_float) File "c:\anaconda3\lib\site-packages\pandas\io\json\table_schema.py", line 298, in parse_table_schema col_order = [field['name'] for field in table['schema']['fields']] TypeError: list indices must be integers or slices, not str

Comment From: WillAyd

Right your source data would need to be in the table format first.

pd.read_json(data[['Miles_per_Gallon', 'Cylinders', 'Name']].to_json(orient='table'), orient='table')

Comment From: rfernand2

Hi William - thanks for the quick response. So, if my data is in the format shown in my test case (which is was when I ran into this problem), there is no way to load it and preserve the column order using Pandas. Can you change to code to just preserve the column order for this common case? Maybe with a option for the behavior, if needed?
Thanks, Roland.

Comment From: WillAyd

JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data

https://json.org

Comment From: rfernand2

OK, I see the problem now. Thanks for your help!

From: William Ayd notifications@github.com Sent: Tuesday, March 26, 2019 9:02 AM To: pandas-dev/pandas pandas@noreply.github.com Cc: Roland Fernandez rfernand@microsoft.com; Mention mention@noreply.github.com Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data

https://json.orghttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fjson.org&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505225052&sdata=qENLj0TeimCKpl7zZxJQNou7okZjYKh5pLn%2FiUpdhd4%3D&reserved=0

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-476719735&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=ru%2BhAVEn5K%2BTkrN5chKLvUpOPEQGAh5I7UgCgs9xRiA%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3pwm1n_JVfweE88OTc_MQmlcujQlks5vakSUgaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=dp9Xyux4IBPiBQ%2F5YQdxTUSesD9O6rqgs4ZynOcb2LA%3D&reserved=0.

Comment From: Biprodeep

Hi,

Please provide a feature where we can turn off lexicographical sorting of columns in read_json() Currently it sorts lexicographically regardless of the original order which leads to loosing of format. By this I mean - the dataframe should be reading columns in order as they are placed in the original JSON structure. In case of an unformated JSON file it should throw an error that it cannot replicate the default column order for every other field.

Thanks.

Comment From: apalii

# expected order
In [28]: qwe  = [{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"},]

# expected order
In [29]: json.dumps(qwe)
Out[29]: '[{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"}]'

# wrong order
In [30]: pandas.read_json(json.dumps(qwe))
Out[30]:
           answer                    question
0        I am bot               Who are you ?
1  Definately not  Are you production ready ?

In [31]: pandas.read_json(json.dumps(qwe)).to_csv()
Out[31]: ',answer,question\n0,I am bot,Who are you ?\n1,Definately not,Are you production ready ?\n'

I am going to use this function for converting json to csv, but mentioned function returns absolutely another object. @WillAyd

JSON objects are by definition unordered

but it doesn't mean that read_json should change the object. So again, please reopen the issue and provide a key where we can turn off lexicographical sorting of columns:

for example

pandas.read_json(json.dumps(qwe), keep_order=True)

Comment From: jcwchen

Same question here. It seems that right now pandas.read_json would sort the JSON (as a dictionary in Python?) and then print it. Still looking for an alternative to control the order because I would like to make certain column being the first order. Providing the function for keeping order would be really helpful.

Thank you so much.

def json_to_csv(self, json_name):
    with open(json_name) as json_file:  
        json_data = json.load(json_file, object_pairs_hook=OrderedDict) 
    csv_name = 'temp.csv'
    with open(csv_name, 'w') as f:
        writer = csv.writer(f)
        keys = []
        for key in json_data[0]:
            keys.append(key)
        writer.writerow(keys)
        for j in json_data:
            values = []
            for key in j:
                values.append(j[key])
            writer.writerow(values)
        return pd.read_csv(csv_name)

Just use the codes above for an alternative. Load JSON as OrderedDict to maintain the original order and then convert it into a csv. Use read_csv instead of read_json.

Comment From: discover59

@jacky82226 's approach worked for me with minor adjustment.

Comment From: brakende

Shorter version of @jcwchen answer:

with open(JSON_DB) as handle:
    j = json.load(handle, object_pairs_hook=OrderedDict)
    rowdicts = [{"idx": k} | vs for k, vs in j.items()]
print(pd.DataFrame.from_records(rowdicts).set_index("idx"))

Here I assume that the json usesorient="index". My problem was that pd.read_json fudged up the row order.

Pandas Need way to get original column order (read_csv, read_json)

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

pandasJsonLoader: test out ability of pandas JSON loader to preserve column order

for example

Output of `pd.show_versions()`