`import pandas as pd
import datetime
from pandas_datareader import data, wb
import csv
import json
# declaring libraries for the analysis
out = open("testfile.csv", "rt")  # read the .csv file
data = csv.reader(out)
# this part is mainly to concatenate the city and the country
data = [[row[0], row[1] + "_" + row[2], row[3] + "_" + row[4], row[5], row[6]] for row in data]
out.close()

out = open("data.csv", "wt")
output = csv.writer(out)
for row in data:
    output.writerow(row)
out.close()
<------------ End of concatenation ------------->
The result is written out and saved in data.csv (I don't want to overwrite the original source)
# using pandas and a DataFrame, I can read in the data and loop through it between the arrival date and the departure date
df = pd.read_csv('data.csv')

df.DateDpt = pd.to_datetime(df.DateDpt)
df.DateAr = pd.to_datetime(df.DateAr)
df = df.set_index('DateAr')
new_df = pd.DataFrame()
for i, data in df.iterrows():
    data = data.to_frame().transpose()
    data = data.reindex(pd.date_range(start=data.index[0], end=data.DateDpt[0])).fillna(method='ffill').reset_index().rename(columns={'index': 'DateAr'})
    new_df = pd.concat([new_df, data])
new_df = new_df[['AuthorID', 'ArCity_ArCountry', 'DptCity_DptCountry', 'DateAr', 'DateDpt']]
print (new_df)
<---- End of iteration ---->
Json file : <------ JSON ----------------->
json_dict = {}
for arrival_date, data in new_df.groupby('DateAr'):
    matching_dates = data[data.DateDpt==arrival_date]
    not_matching_dates = data[data.DateDpt!=arrival_date]
    json_dict[arrival_date.strftime('%Y-%m-%d')] = {}
    if not matching_dates.empty:
        for city, flights in matching_dates.groupby('ArCity_ArCountry'):
            json_dict[arrival_date.strftime('%Y-%m-%d')][city] = [str(v) for v in flights.AuthorID.to_dict().values()]
    if not not_matching_dates.empty:
        for city, flights in not_matching_dates.groupby('DptCity_DptCountry'):
            json_dict[arrival_date.strftime('%Y-%m-%d')][city] = [str(v) for v in flights.AuthorID.to_dict().values()]
with open('json_dict.json', 'w') as f:
    json.dump(json_dict, f, indent=4, sort_keys=True)
<------- End of JSON ---------------->`
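As a side note on the concatenation step at the top of the script: the city and country columns could probably be joined in pandas directly, without writing an intermediate data.csv. This is only a sketch. It assumes the file is comma-separated and has the header shown in the sample; the inline `csv_text` (two rows taken from the sample) stands in for testfile.csv.

```python
import io

import pandas as pd

# Assumption: comma-separated input with the header from the sample file.
# csv_text is a stand-in for testfile.csv so the sketch runs on its own.
csv_text = """AuthorID,ArCity,ArCountry,DptCity,DptCountry,DateDpt,DateAr
AAA,Paris,France,NewYork,UnitedState,2008-03-10,2001-02-02
CCC,Paris,France,Beijing,Japan,2008-03-10,2001-02-02
"""

df = pd.read_csv(io.StringIO(csv_text))

# Join city and country with an underscore, element-wise on the string columns.
df["ArCity_ArCountry"] = df["ArCity"] + "_" + df["ArCountry"]
df["DptCity_DptCountry"] = df["DptCity"] + "_" + df["DptCountry"]

# Keep only the columns the rest of the script uses.
df = df[["AuthorID", "ArCity_ArCountry", "DptCity_DptCountry", "DateDpt", "DateAr"]]
print(df)
```

The original file is left untouched because nothing is written back to disk.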
CSV File (testfile.csv)
AuthorID ArCity ArCountry DptCity DptCountry DateDpt DateAr
AAA Paris France NewYork UnitedState 2008-03-10 2001-02-02
BBB Paris France NewYork UnitedState 2008-03-10 2001-02-02
CCC Paris France Beijing Japan 2008-03-10 2001-02-02
DDD Paris France NewYork UnitedState 2008-03-10 2001-02-02
EEE Paris France London UK 2008-03-10 2001-02-02
FFF Paris France NewYork UnitedState 2008-03-10 2001-02-02
GGG Paris France Beijing Japan 2008-03-10 2001-02-02
Expected output:
{
"2001-02-02": {
"Beijing _Japan": [
"CCC",
"GGG"
],
"London_UK": [
"EEE"
],
"NewYork_UnitedState": [
"AAA",
"BBB",
"DDD",
"FFF"
],
"Paris_France": [
"AAA",
"BBB",
"CCC",
"DDD",
"EEE",
"FFF"
]
},
.
.
.
}
But the current output is
`{
"2001-02-02": {
"Beijing _Japan": [
"GGG"
],
"London_UK": [
"EEE"
],
"NewYork_UnitedState": [
"FFF"
]
},
"2001-02-03": {
"Beijing _Japan": [
"GGG"
],
"London_UK": [
"EEE"
],
"NewYork_UnitedState": [
"FFF"
]
},
.
.
.
}`
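One possible explanation for why each list holds a single ID, offered as a sketch rather than a confirmed diagnosis: each per-row frame goes through `reset_index()` and is then concatenated without `ignore_index=True`, so rows for the same date from different authors end up with the same index label. `Series.to_dict()` keys the result by index label, and duplicate keys collapse to the last value. The small Series below is made up to mimic that situation.

```python
import pandas as pd

# Four IDs that all carry the same index label, which is what happens when
# frames that each start at index 0 are concatenated without ignore_index.
ids = pd.Series(["AAA", "BBB", "DDD", "FFF"], index=[0, 0, 0, 0])

# Going through a dict keeps only one value per index label.
via_dict = [str(v) for v in ids.to_dict().values()]

# tolist() ignores the index and keeps every value.
via_list = ids.tolist()

print(via_dict)  # ['FFF']
print(via_list)  # ['AAA', 'BBB', 'DDD', 'FFF']
```

If this is indeed the cause, replacing `.to_dict().values()` with `.tolist()` in the two list comprehensions, or passing `ignore_index=True` to `pd.concat`, should keep all IDs.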
Comment From: jreback
You will get more help on the mailing list and/or Stack Overflow.
Comment From: jreback
This is not a library issue, but rather a question of how to structure your code.
Comment From: jreback
You will get better responses with a much shorter, copy-pastable example.
You are reading from a CSV file that only you have.
Comment From: abdojulari
I have attached the sample csv file
AuthorID ArCity ArCountry DptCity DptCountry DateDpt DateAr
AAA Paris France NewYork UnitedState 2008-03-10 2001-02-02
BBB Paris France NewYork UnitedState 2008-03-10 2001-02-02
CCC Paris France Beijing Japan 2008-03-10 2001-02-02
DDD Paris France NewYork UnitedState 2008-03-10 2001-02-02
EEE Paris France London UK 2008-03-10 2001-02-02
FFF Paris France NewYork UnitedState 2008-03-10 2001-02-02
GGG Paris France Beijing Japan 2008-03-10 2001-02-02
I have posted it on Stack Overflow but got no response, and I am now behind schedule. Please, sir! This is the sample CSV file named testfile.cs
Comment From: abdojulari
Please assist, sir.
Thank you so much for your time, sir. I am only seeking assistance to fix the code.
I have a CSV file with several IDs. Each ID has a place of arrival, a place of departure, and a date range (arrival date to departure date). I need to group by date and then find, for each date, which IDs share the same place of arrival and/or departure with any other IDs.
import pandas as pd
import datetime
import numpy as np
from pandas_datareader import data, wb
import csv
import json
df = pd.read_csv('testfile.csv')
df.DepartureDate = pd.to_datetime(df.DepartureDate)
df.ArrivalDate = pd.to_datetime(df.ArrivalDate)
df = df.set_index('ArrivalDate')
new_df = pd.DataFrame()
for i, data in df.iterrows():
    data = data.to_frame().transpose()
    data = data.reindex(pd.date_range(start=data.index[0], end=data.DepartureDate[0])).fillna(method='ffill').reset_index().rename(columns={'index': 'ArrivalDate'})
    new_df = pd.concat([new_df, data])
new_df = new_df[['ID', 'Arrival', 'Departure', 'ArrivalDate', 'DepartureDate']]
print (new_df)
json_dict = {}
for arrival_date, data in new_df.groupby('ArrivalDate'):
    matching_dates = data[data.DepartureDate==arrival_date]
    not_matching_dates = data[data.DepartureDate!=arrival_date]
    json_dict[arrival_date.strftime('%Y-%m-%d')] = {}
    if not matching_dates.empty:
        for city, flights in matching_dates.groupby('Arrival'):
            json_dict[arrival_date.strftime('%Y-%m-%d')][city] = [str(v) for v in flights.ID.to_dict().values()]
    if not not_matching_dates.empty:
        for city, flights in not_matching_dates.groupby('Departure'):
            json_dict[arrival_date.strftime('%Y-%m-%d')][city] = [str(v) for v in flights.ID.to_dict().values()]
with open('json_dict.json', 'w') as f:
    json.dump(json_dict, f, indent=4, sort_keys=True)
The desired output:
{
"2001-02-02": {
"Japan": [
"CCC",
"GGG"
],
"UK": [
"EEE"
],
"UnitedState": [
"AAA",
"BBB",
"DDD",
"FFF"
],
"France": [
"AAA",
"BBB",
"CCC",
"DDD",
"EEE",
"FFF"
]
},
.
.
.
}
The current output:
{
"2001-02-02": {
"Japan": [
"GGG"
],
"UK": [
"EEE"
],
"UnitedState": [
"FFF"
]
},
"2001-02-03": {
"Japan": [
"GGG"
],
"UK": [
"EEE"
],
"UnitedState": [
"FFF"
]
},
.
.
.
}
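For reference, here is a rough sketch of one way to build the structure described above. It rests on an assumption about the intent: on every day between the arrival date and the departure date (inclusive), an ID is listed under both its arrival place and its departure place. The inline `csv_text` reuses the sample rows with city and country already joined, and the column names `Arrival` and `Departure` are chosen for illustration only.

```python
import io
import json

import pandas as pd

# Assumption: sample rows from the thread, comma-separated, city_country joined.
csv_text = """AuthorID,Arrival,Departure,DateDpt,DateAr
AAA,Paris_France,NewYork_UnitedState,2008-03-10,2001-02-02
BBB,Paris_France,NewYork_UnitedState,2008-03-10,2001-02-02
CCC,Paris_France,Beijing_Japan,2008-03-10,2001-02-02
DDD,Paris_France,NewYork_UnitedState,2008-03-10,2001-02-02
EEE,Paris_France,London_UK,2008-03-10,2001-02-02
FFF,Paris_France,NewYork_UnitedState,2008-03-10,2001-02-02
GGG,Paris_France,Beijing_Japan,2008-03-10,2001-02-02
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateDpt", "DateAr"])

# Expand every row into one record per day and per place (arrival, departure).
frames = []
for row in df.itertuples(index=False):
    days = pd.date_range(row.DateAr, row.DateDpt)
    for place in (row.Arrival, row.Departure):
        frames.append(pd.DataFrame({"Date": days, "Place": place, "AuthorID": row.AuthorID}))
# ignore_index=True gives every record its own index label.
long_df = pd.concat(frames, ignore_index=True)

# Collect all IDs per (date, place) pair as plain lists.
grouped = long_df.groupby(["Date", "Place"])["AuthorID"].agg(list)

json_dict = {}
for (date, place), ids in grouped.items():
    json_dict.setdefault(date.strftime("%Y-%m-%d"), {})[place] = ids

print(json.dumps(json_dict["2001-02-02"], indent=4, sort_keys=True))
```

With the sample data, every ID arrives in Paris_France, so this sketch lists all seven IDs there, including GGG, which the desired output above leaves out. Whether that matches the intended result depends on the assumption stated in the lead-in.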
Date: Fri, 29 Jan 2016 18:30:01 -0800 From: notifications@github.com To: pandas@noreply.github.com CC: tollycoast@hotmail.com Subject: Re: [pandas] Please I have serious issue with the CSV, Pandas, Dataframe and JSON, I need someone to assist. (#12183)
Comment From: jorisvandenbossche
@tollycoast As Jeff already said, could you please ask this question elsewhere, for example on Stack Overflow (http://stackoverflow.com/) or the mailing list (https://groups.google.com/forum/?fromgroups#!forum/pydata)? I personally recommend Stack Overflow.
It is not that we don't want to help you (we can answer on those other platforms), but we like to keep this issue tracker for bugs/enhancement requests for pandas.