Parse JSON to Excel - Pandas + xlwt
I'm kind of half way through this functionality. However, I need some help with formatting the data in the sheet that contains the output.
My current code...
import json
import pandas as pd

response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
# Create a Pandas dataframe from the data (json.loads needs a string,
# so response is quoted here).
df = pd.DataFrame.from_dict(json.loads(response), orient='index')
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file
# (on pandas 2.x use writer.close(); save() was removed).
writer.save()
The output is as follows...
What I want is something like this...
I suppose that first I would need to extract and organise the headers.
This would also include manually assigning a header for a column that cannot have one by default, as in the case of the SIC column.
After that, I can feed data to the columns with their respective headers.
You can loop over the keys of your json object and create a dataframe from each, then use pd.concat to combine them all:
import json
import pandas as pd
response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
json_data = json.loads(response)
all_frames = []
for k, v in json_data.items():
    df = pd.DataFrame(v)
    df['SIC Category'] = k
    all_frames.append(df)
final_data = pd.concat(all_frames).set_index('SIC Category')
print(final_data)
This prints:
confidence label
SIC Category
sic2 1.00 73
sic4 0.50 7310
sic8 0.50 73101000
sic8 0.25 73102000
sic8 0.25 73109999
You can export this to Excel as before, via final_data.to_excel(writer, sheet_name='Sheet1').
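For completeness, a minimal sketch of the export step (the filename is arbitrary), using ExcelWriter as a context manager so the file is closed and saved automatically:

import pandas as pd

# final_data is the concatenated frame built above
with pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') as writer:
    final_data.to_excel(writer, sheet_name='Sheet1')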
Related
Read json files in pandas dataframe
I have a large pandas dataframe (17,000 rows) with a filepath in each row associated with a specific json file. For each row I want to read the json file content and extract the content into a new dataframe. The dataframe looks something like this:

0        /home/user/processed/config1.json
1        /home/user/processed/config2.json
2        /home/user/processed/config3.json
3        /home/user/processed/config4.json
4        /home/user/processed/config5.json
...      ...
16995    /home/user/processed/config16995.json
16996    /home/user/processed/config16996.json
16997    /home/user/processed/config16997.json
16998    /home/user/processed/config16998.json
16999    /home/user/processed/config16999.json

What is the most efficient way to do this? I believe a simple for-loop might be best suited here?

import json
json_content = []
for row in df:
    with open(row) as file:
        json_content.append(json.load(file))
result = pd.DataFrame(json_content)
Generally, I'd try iterrows() first as a straightforward approach. Note that iterrows() yields (index, row) pairs, so the path has to be pulled out of the row. An implementation could look like this:

import json
import pandas as pd

json_content = []
for _, row in df.iterrows():
    # the filepath sits in the first column of the row
    with open(row.iloc[0]) as file:
        json_content.append(json.load(file))
result = pd.DataFrame(json_content)
A possible solution is the following:

# pip install pandas
import pandas as pd

# convert the column with paths to a list (: selects all rows, 0 the first column)
paths = df.iloc[:, 0].tolist()

all_dfs = []
for path in paths:
    all_dfs.append(pd.read_json(path, encoding='utf-8'))

Each frame in all_dfs can be accessed individually or in a loop by index, like all_dfs[0], all_dfs[1], etc. If you wish, you can merge all_dfs into a single dataframe:

dfs = pd.concat(all_dfs, axis=1)
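If each json file holds a single flat object, a compact alternative (a sketch, not from either answer above) is to collect the parsed records and let pd.json_normalize build one combined frame:

import json
import pandas as pd

# read every file in one pass; assumes each file contains one JSON object
records = []
for path in df.iloc[:, 0]:
    with open(path) as fh:
        records.append(json.load(fh))
result = pd.json_normalize(records)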
How can I sort and replace each item's value with its rank in a merged final csv?
I have 30 different csv files; each row begins with a date, followed by the same set of features measured daily for each of the 30 items. The value of each feature is not important, but the rank it gains after sorting within each day is. How can I produce one merged csv from the 30 separate csvs with the rank of each feature?
If your files are all the same format, you can do a loop and consolidate into a single data frame. From there you can manipulate as needed:

import pandas as pd
import glob

path = r'C:\path_to_files\files'  # use your path
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

Other examples of the same thing: Import multiple csv files into pandas and concatenate into one DataFrame.
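The consolidation above doesn't yet compute the ranks the question asks for. A minimal sketch of that step, assuming the merged frame has a date column named 'date' and that every other column is a feature (both are assumptions, since the question doesn't show the csv layout):

import pandas as pd

# rank each feature across items within each day; the highest value gets rank 1
feature_cols = [c for c in frame.columns if c != 'date']
ranks = frame.copy()
ranks[feature_cols] = frame.groupby('date')[feature_cols].rank(ascending=False)
ranks.to_csv('merged_ranks.csv', index=False)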
How to subset a dataframe, groupby and export the dataframes as multiple sheets of one excel file in Python
Python newbie here. In the dataset below:

import pandas as pd
import numpy as np

data = {'Gender':['M','M','M','M','F','F','F','F','M','M','M','M','F','F','F','F'],
        'Location':['NE','NE','NE','NE','SW','SW','SW','SW','SE','SE','SE','SE','NC','NC','NC','NC'],
        'Type':['L','L','L','L','L','L','L','L','R','R','R','R','R','R','R','R'],
        'PDP':['<10','<10','<10','<10',10,10,10,10,20,20,20,20,'>20','>20','>20','>20'],
        'PDP_code':[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
        'diff':[-1,-1,-1,-1,0,0,0,0,1,1,1,1,3,3,3,3],
        'series':[1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8],
        'Revenue_YR1':[1150.78,1162.34,1188.53,1197.69,2108.07,2117.76,2129.48,1319.51,1416.87,1812.54,1819.57,1991.97,2219.28,2414.73,2169.91,2149.19],
        'Revenue_YR2':[250.78,262.34,288.53,297.69,308.07,317.7,329.81,339.15,346.87,382.54,369.59,399.97,329.28,347.73,369.91,349.12],
        'Revenue_YR3':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
        'Revenue_YR4':[270.84,282.14,298.53,306.69,318.73,327.47,369.63,389.59,398.75,432.18,449.78,473.55,494.85,509.39,515.52,539.23],
        'Revenue_YR5':[251.78,221.34,282.53,272.69,310.07,317.7,329.81,333.15,334.87,332.54,336.59,339.97,329.28,334.73,336.91,334.12],
        'Revenue_YR6':[240.18,232.14,258.53,276.69,338.07,307.74,359.16,339.25,365.87,392.48,399.97,410.75,429.08,448.39,465.15,469.33],
        'Revenue_YR7':[27.84,28.14,29.53,30.69,18.73,27.47,36.63,38.59,38.75,24.18,24.78,21.55,13.85,9.39,15.52,39.23],
        'Revenue_YR8':[279.84,289.14,299.53,309.69,318.73,327.47,336.63,398.59,398.75,324.18,324.78,321.55,333.85,339.39,315.52,319.23],
       }
df = pd.DataFrame(data, columns=['Gender','Location','Type','PDP','PDP_code','diff','series',
                                 'Revenue_YR1','Revenue_YR2','Revenue_YR3','Revenue_YR4','Revenue_YR5',
                                 'Revenue_YR6','Revenue_YR7','Revenue_YR8'])
df.head(5)

I want a pythonic way of doing the following:

subset df into 4 dataframes / lists based on unique Location, resulting in NE, SW, SE & NC dataframes
aggregate all the Revenue_YR columns while grouping by the series and PDP_code columns
export all the aggregated dataframes (NE, SW, SE & NC) as multiple sheets of one xlsx file

My attempt:

# this code returns output of 1 df instead of 4 dfs; I need help aggregating each of the
# 4 dataframes and exporting them to 4 sheets of 12312021_output.xlsx
for i, part_df in df.groupby('Location'):
    part_df.groupby(['series','PDP_code'])[['Revenue_YR1','Revenue_YR2','Revenue_YR3','Revenue_YR4',
                                            'Revenue_YR5','Revenue_YR6','Revenue_YR7']] \
           .mean().unstack().style.background_gradient(cmap='Blues').to_excel('12312021_output.xlsx')

Please share your code.
You can use pandas.ExcelWriter and your loop (which I improved slightly for readability):

import pandas as pd

with pd.ExcelWriter("output.xlsx") as writer:
    cols = df.filter(like='Revenue_YR').columns
    for g, d in df.groupby('Location'):
        (d.groupby(['series','PDP_code'])[cols].mean().unstack()
          .style.background_gradient(cmap='Blues')
        ).to_excel(writer, sheet_name=g)
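The key design point is that a single ExcelWriter context is opened once and each Location group writes to its own sheet_name inside it, so all four sheets land in the one output file.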
Extracting column from Array in python
I am a beginner in Python and I am stuck with data which is an array of 32763 numbers, separated by commas. Please find the data here. I want to convert this into two columns: column 1 from (0:16382) and column 2 from (2:32763). In the end I want to plot column 1 as the x axis and column 2 as the y axis. I tried the following code but I am not able to extract the columns:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = np.genfromtxt('oscilloscope.txt', delimiter=',')
df = pd.DataFrame(data.flatten())
print(df)

Then I want to write the data to some file, say data1, in the format shown in the attached pic.
It is hard to answer without seeing the format of your data, but you can try:

import numpy as np
import pandas as pd

data = np.genfromtxt('oscilloscope.txt', delimiter=',')
print(data.shape)  # here we check we got something useful

# this should split data into x, y at position 16381
x = data[:16381]
y = data[16381:]

# now you can create a dataframe and print to file
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('data1.csv', index=False)
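The question also asks to plot column 1 against column 2; a minimal sketch of that step, reusing the x and y arrays from above:

import matplotlib.pyplot as plt

# column 1 on the x axis, column 2 on the y axis
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.show()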
Try this. The function takes a dataframe df and a chunk_size as input and returns the chunks as a list; set chunk_size to whatever you want:

def split_dataframe(df, chunk_size=16382):
    chunks = list()
    num_chunks = len(df) // chunk_size + 1
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks

Alternatively, use np.array_split.
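For reference, a short sketch of the np.array_split alternative, splitting at the row index the question mentions:

import numpy as np

# split at row 16382: the first chunk holds rows 0..16381, the second the rest
first, second = np.array_split(df, [16382])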
Append values to pandas dataframe incrementally inside for loop
I am trying to add rows to a pandas dataframe incrementally inside a for loop. My for loop is like below:

def print_values(cc):
    data = []
    for x in values[cc]:
        data.append(labels[x])
    # cc is a constant and data is a list. I need these values appended as a
    # row of a pandas dataframe with this structure:
    # df = pd.DataFrame(columns=['Index','Names'])
    print(cc)
    print(data)
    # This does not work - not sure about the problem!!
    #df_clustercontents.loc['Cluster_Index'] = cc
    #df_clustercontents.loc['DatabaseNames'] = data

for x in range(0, 10):
    print_values(x)

I need the values "cc" and "data" to be appended to the dataframe incrementally. Any help would be really appreciated!!
You can use .loc with the current length of the dataframe as the new row's label:

...
print(cc)
print(data)
df_clustercontents.loc[len(df_clustercontents)] = [cc, data]
...
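Put together, a minimal runnable sketch of the corrected function; values and labels here are stand-ins, since the question doesn't show the real data:

import pandas as pd

# stand-in data; the question's actual values/labels are not shown
values = {cc: [0, 1] for cc in range(10)}
labels = ['name_a', 'name_b']

df_clustercontents = pd.DataFrame(columns=['Index', 'Names'])

def print_values(cc):
    data = [labels[x] for x in values[cc]]
    # the label len(df) is one past the last row, so each call appends a new row
    df_clustercontents.loc[len(df_clustercontents)] = [cc, data]

for x in range(10):
    print_values(x)
print(df_clustercontents)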