Code Sample, a copy-pastable example if possible
# Your code here
Problem description
Applications sometimes need to be able to present and manipulate data in the column ordering found in the source data file (CSV or JSON). The column order in these files is significant to users in many cases. The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order. After the Dataframe has been created, there seems to be no way to get the original column order. Please consider adding a parameter to read_csv() and read_json() to preserve the column ordering found in the first line of the file (for CSV) or in the first record (for JSON). Thanks!
Note: We receive a lot of issues on our GitHub tracker, so it is very possible that your issue has been posted before. Please check first before submitting so that we do not have to handle and close duplicates!
Note: Many problems can be resolved by simply upgrading pandas
to the latest version. Before submitting, please check if that solution works for you. If possible, you may want to check if master
addresses this issue, but that is not necessary.
For documentation-related issues, you can check the latest versions of the docs on master
here:
https://pandas-docs.github.io/pandas-docs-travis/
If the issue has not been resolved there, go ahead and file it in the issue tracker.
Expected Output
Output of pd.show_versions()
Comment From: TomAugspurger
The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.
Can you show an example where column order is lost with read_csv
?
Comment From: rfernand2
I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!
From: Tom Augspurger notifications@github.com Sent: Sunday, January 27, 2019 12:55 PM To: pandas-dev/pandas pandas@noreply.github.com Cc: Roland Fernandez rfernand@microsoft.com; Author author@noreply.github.com Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)
The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.
Can you show an example where column order is lost with read_csv?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-457953371&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210153461&sdata=24K5o9dUafSUxF4j2G0Rf%2F8LwMgfbkzyR3mMEIVBs8s%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3sayc5mr6VQiW93qrgzs1cY0IHjUks5vHhI2gaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210163470&sdata=jxLwZp3eDSw8KCYt7lmOSBEfHFVtwx0C3E7idkuUW3s%3D&reserved=0.
Comment From: TomAugspurger
How about an example with read_json? Keep in mind that dictionaries aren’t necessarily ordered.
From: Roland Fernandez notifications@github.com Sent: Sunday, January 27, 2019 18:10 To: pandas-dev/pandas Cc: Tom Augspurger; Comment Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)
I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!
From: Tom Augspurger notifications@github.com Sent: Sunday, January 27, 2019 12:55 PM To: pandas-dev/pandas pandas@noreply.github.com Cc: Roland Fernandez rfernand@microsoft.com; Author author@noreply.github.com Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)
The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.
Can you show an example where column order is lost with read_csv?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-457953371&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210153461&sdata=24K5o9dUafSUxF4j2G0Rf%2F8LwMgfbkzyR3mMEIVBs8s%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3sayc5mr6VQiW93qrgzs1cY0IHjUks5vHhI2gaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210163470&sdata=jxLwZp3eDSw8KCYt7lmOSBEfHFVtwx0C3E7idkuUW3s%3D&reserved=0.
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/pandas-dev/pandas/issues/24962#issuecomment-457967158, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABQHIpwv68DkxwTzomTf94Q3r-rp6-Klks5vHj_8gaJpZM4aU630.
Comment From: rfernand2
Sorry for the delay. Here is a failing test case:
`
pandasJsonLoader: test out ability of pandas JSON loader to preserve column order
import pandas
json_str = '[{"Name": "chevrolet chevelle malibu", "Miles_per_Gallon": 18, "Cylinders": 8,}]' data = pandas.read_json(json_str)
print(list(data)) # out: ['Cylinders', 'Miles_per_Gallon', 'Name'] `
Comment From: rfernand2
Not sure how I closed this issue - Reopening it again.
Comment From: WillAyd
@rfernand2 what you are asking for is achievable with orient='table'
for JSON as that by design produces a schema with an ordered list of field descriptors (columns)
Comment From: WillAyd
Closing as I think this is already supported across formats requested
Comment From: rfernand2
That doesn't seem to be a valid option for this. When I add "orient='table'" to the above test case, I get the following error:
C:\multiNLI\multinli_1.0>python test.py
Traceback (most recent call last):
File "test.py", line 4, in
Comment From: WillAyd
Right your source data would need to be in the table format first.
pd.read_json(data[['Miles_per_Gallon', 'Cylinders', 'Name']].to_json(orient='table'), orient='table')
Comment From: rfernand2
Hi William - thanks for the quick response.
So, if my data is in the format shown in my test case (which is was when I ran into this problem), there is no way to load it and preserve the column order using Pandas. Can you change to code to just preserve the column order for this common case? Maybe with a option for the behavior, if needed?
Thanks, Roland.
Comment From: WillAyd
JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data
https://json.org
Comment From: rfernand2
OK, I see the problem now. Thanks for your help!
From: William Ayd notifications@github.com Sent: Tuesday, March 26, 2019 9:02 AM To: pandas-dev/pandas pandas@noreply.github.com Cc: Roland Fernandez rfernand@microsoft.com; Mention mention@noreply.github.com Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)
JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-476719735&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=ru%2BhAVEn5K%2BTkrN5chKLvUpOPEQGAh5I7UgCgs9xRiA%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3pwm1n_JVfweE88OTc_MQmlcujQlks5vakSUgaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=dp9Xyux4IBPiBQ%2F5YQdxTUSesD9O6rqgs4ZynOcb2LA%3D&reserved=0.
Comment From: Biprodeep
Hi,
Please provide a feature where we can turn off lexicographical sorting of columns in read_json() Currently it sorts lexicographically regardless of the original order which leads to loosing of format. By this I mean - the dataframe should be reading columns in order as they are placed in the original JSON structure. In case of an unformated JSON file it should throw an error that it cannot replicate the default column order for every other field.
Thanks.
Comment From: apalii
# expected order
In [28]: qwe = [{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"},]
# expected order
In [29]: json.dumps(qwe)
Out[29]: '[{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"}]'
# wrong order
In [30]: pandas.read_json(json.dumps(qwe))
Out[30]:
answer question
0 I am bot Who are you ?
1 Definately not Are you production ready ?
In [31]: pandas.read_json(json.dumps(qwe)).to_csv()
Out[31]: ',answer,question\n0,I am bot,Who are you ?\n1,Definately not,Are you production ready ?\n'
I am going to use this function for converting json to csv, but mentioned function returns absolutely another object. @WillAyd
JSON objects are by definition unordered
but it doesn't mean that read_json
should change the object. So again, please reopen the issue and provide a key where we can turn off lexicographical sorting of columns:
for example
pandas.read_json(json.dumps(qwe), keep_order=True)
Comment From: jcwchen
Same question here. It seems that right now pandas.read_json would sort the JSON (as a dictionary in Python?) and then print it. Still looking for an alternative to control the order because I would like to make certain column being the first order. Providing the function for keeping order would be really helpful.
Thank you so much.
def json_to_csv(self, json_name):
with open(json_name) as json_file:
json_data = json.load(json_file, object_pairs_hook=OrderedDict)
csv_name = 'temp.csv'
with open(csv_name, 'w') as f:
writer = csv.writer(f)
keys = []
for key in json_data[0]:
keys.append(key)
writer.writerow(keys)
for j in json_data:
values = []
for key in j:
values.append(j[key])
writer.writerow(values)
return pd.read_csv(csv_name)
Just use the codes above for an alternative. Load JSON as OrderedDict to maintain the original order and then convert it into a csv. Use read_csv instead of read_json.
Comment From: discover59
@jacky82226 's approach worked for me with minor adjustment.
Comment From: brakende
Shorter version of @jcwchen answer:
with open(JSON_DB) as handle:
j = json.load(handle, object_pairs_hook=OrderedDict)
rowdicts = [{"idx": k} | vs for k, vs in j.items()]
print(pd.DataFrame.from_records(rowdicts).set_index("idx"))
Here I assume that the json
usesorient="index"
. My problem was that pd.read_json
fudged up the row order.