Append values to pandas dataframe incrementally inside for loop - pandas

I am trying to add rows to a pandas dataframe incrementally inside a for loop.
My for loop looks like this:
def print_values(cc):
    data = []
    for x in values[cc]:
        data.append(labels[x])
    # cc is a constant and data is a list. I need these values to be appended as a row of a pandas dataframe.
    # The dataframe structure is as follows: df = pd.DataFrame(columns=['Index', 'Names'])
    print(cc)
    print(data)
    # This does not work - not sure what the problem is!
    #df_clustercontents.loc['Cluster_Index'] = cc
    #df_clustercontents.loc['DatabaseNames'] = data

for x in range(0, 10):
    print_values(x)
I need the values "cc" and "data" to be appended to the dataframe incrementally.
Any help would be really appreciated !!

You can use:
...
print(cc)
print(data)
df_clustercontents.loc[len(df_clustercontents)] = [cc, data]
...
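For reference, here is a minimal runnable sketch of that row-append pattern; the labels/values dictionaries are made-up stand-ins for the question's data:

```python
import pandas as pd

# Hypothetical stand-ins for the question's labels and values
labels = {0: 'alpha', 1: 'beta', 2: 'gamma'}
values = {0: [0, 1], 1: [2]}

df_clustercontents = pd.DataFrame(columns=['Cluster_Index', 'DatabaseNames'])

for cc in values:
    data = [labels[x] for x in values[cc]]
    # len(df) is the next free integer label, so this appends one new row
    df_clustercontents.loc[len(df_clustercontents)] = [cc, data]

print(df_clustercontents)
```

Using `len(df)` as the new row label only works cleanly while the index is the default 0..n-1 range; after dropping rows you would want `df.index.max() + 1` instead.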

Related

python - if-else in a for loop processing one column

I want to loop through a column and convert it into a processed series.
Below is an example of a two-row, four-column data frame:
import pandas as pd
from rapidfuzz import process as process_rapid
from rapidfuzz import utils as rapid_utils
data = [['r/o ac. nephritis. /. nephrotic syndrome', ' ac. nephritis. /. nephrotic syndrome',1,'ac nephritis nephrotic syndrome'], [ 'sternocleidomastoid contracture','sternocleidomastoid contracture',0,"NA"]]
# Create the pandas DataFrame
df_diagnosis = pd.DataFrame(data, columns = ['diagnosis_name', 'diagnosis_name_edited','is_spell_corrected','spell_corrected_value'])
I want to use the spell_corrected_value column if the is_spell_corrected column is more than 1; else, use diagnosis_name_edited.
At the moment, I have the following code, which directly uses the diagnosis_name_edited column. How do I make this an if-else/lambda check on the is_spell_corrected column?
unmapped_diag_series = (rapid_utils.default_process(d) for d in df_diagnosis['diagnosis_name_edited'].astype(str)) # characters (generator)
unmapped_processed_diagnosis = pd.Series(unmapped_diag_series) #
Thank you.
If I understand you correctly, try out this fast solution using numpy.where:
import numpy as np
df_diagnosis['new_column'] = np.where(df_diagnosis['is_spell_corrected'] > 1, df_diagnosis['spell_corrected_value'], df_diagnosis['diagnosis_name_edited'])
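A self-contained run on the question's sample frame (the rapidfuzz processing step is left out so the example has no extra dependencies). Note that with the sample data is_spell_corrected only takes values 0 and 1, so the condition may need to be `== 1` rather than `> 1` depending on what the asker intends:

```python
import numpy as np
import pandas as pd

data = [['r/o ac. nephritis. /. nephrotic syndrome', ' ac. nephritis. /. nephrotic syndrome', 1, 'ac nephritis nephrotic syndrome'],
        ['sternocleidomastoid contracture', 'sternocleidomastoid contracture', 0, 'NA']]
df_diagnosis = pd.DataFrame(data, columns=['diagnosis_name', 'diagnosis_name_edited', 'is_spell_corrected', 'spell_corrected_value'])

# np.where(condition, value_if_true, value_if_false) evaluates element-wise over the column
df_diagnosis['new_column'] = np.where(df_diagnosis['is_spell_corrected'] > 1,
                                      df_diagnosis['spell_corrected_value'],
                                      df_diagnosis['diagnosis_name_edited'])
print(df_diagnosis['new_column'].tolist())
```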

Extracting column from Array in python

I am a beginner in Python and I am stuck with data which is an array of 32763 numbers, separated by commas. Please find the data here: data
I want to convert this into two columns: column 1 from (0:16382) and column 2 from (2:32763). In the end I want to plot column 1 as the x axis and column 2 as the y axis. I tried the following code but I am not able to extract the columns:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.genfromtxt('oscilloscope.txt', delimiter=',')
df = pd.DataFrame(data.flatten())
print(df)
and then I want to write the data to a file, say data1, in the format shown in the attached pic
It is hard to answer without seeing the format of your data, but you can try:
data = np.genfromtxt('oscilloscope.txt', delimiter=',')
print(data.shape)  # here we check that we got something useful
# this should split data into x, y at position 16381
x = data[:16381]
y = data[16381:]
# now you can create a dataframe and print to file
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('data1.csv', index=False)
Try this.
# input is a dataframe df and its chunk_size; output is a list of chunks. You can set whatever chunk size you want.
def split_dataframe(df, chunk_size=16382):
    chunks = list()
    num_chunks = len(df) // chunk_size + 1
    for i in range(num_chunks):
        chunks.append(df[i*chunk_size:(i+1)*chunk_size])
    return chunks
Or use np.array_split.

how to put first value in one column and remaining into other column?

ROCO2_CLEF_00001.jpg,C3277934,C0002978
ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569
I want to make a pandas data frame from a csv file.
I want to put the first entry of each row (the filename) into a column named "filenames", and the remaining entries into another column named "class". How do I do so?
In case your file doesn't have a fixed number of commas per row, you could do the following:
import pandas as pd
csv_path = 'test_csv.csv'
raw_data = open(csv_path).readlines()
# clean rows
raw_data = [x.strip().replace("'", "") for x in raw_data]
print(raw_data)
# make split between data
raw_data = [ [x.split(",")[0], ','.join(x.split(",")[1:])] for x in raw_data]
print(raw_data)
# build the pandas Dataframe
column_names = ["filenames", "class"]
temp_df = pd.DataFrame(data=raw_data, columns=column_names)
print(temp_df)
filenames class
0 ROCO2_CLEF_00001.jpg C3277934,C0002978
1 ROCO2_CLEF_00002.jpg C3265939,C0002942,C2357569
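The same split can also be written with str.split's maxsplit argument, which cuts each line only at the first comma; the file contents are inlined here so the sketch runs on its own:

```python
import pandas as pd

# inline stand-in for open(csv_path).readlines()
raw = ("ROCO2_CLEF_00001.jpg,C3277934,C0002978\n"
       "ROCO2_CLEF_00002.jpg,C3265939,C0002942,C2357569\n")

# split each line once at the first comma: [filename, everything else]
rows = [line.split(',', 1) for line in raw.strip().splitlines()]
temp_df = pd.DataFrame(rows, columns=['filenames', 'class'])
print(temp_df)
```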

Parse JSON to Excel - Pandas + xlwt

I'm about halfway through this functionality. However, I need some help with formatting the data in the sheet that contains the output.
My current code...
import json
import pandas as pd
response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
# Create a Pandas dataframe from the data (json.loads needs the JSON as a string, not a dict).
df = pd.DataFrame.from_dict(json.loads(response), orient='index')
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()  # in recent pandas versions, use writer.close() instead
The output is as follows...
What I want is something like this...
I suppose that first I would need to extract and organise the headers.
This would also include manually assigning a header for a column that cannot have one by default, as in the case of the SIC column.
After that, I can feed the data into the columns under their respective headers.
You can loop over the keys of your json object and create a dataframe from each, then use pd.concat to combine them all:
import json
import pandas as pd
response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
json_data = json.loads(response)
all_frames = []
for k, v in json_data.items():
df = pd.DataFrame(v)
df['SIC Category'] = k
all_frames.append(df)
final_data = pd.concat(all_frames).set_index('SIC Category')
print(final_data)
This prints:
confidence label
SIC Category
sic2 1.00 73
sic4 0.50 7310
sic8 0.50 73101000
sic8 0.25 73102000
sic8 0.25 73109999
Which you can export to Excel as before, through final_data.to_excel(writer, sheet_name='Sheet1')
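The loop above can also be collapsed into one step, since pd.concat accepts a dict of DataFrames and turns the keys into an outer index level:

```python
import json
import pandas as pd

response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
json_data = json.loads(response)

# dict keys become level 0 of a MultiIndex; drop the per-frame integer level
final_data = pd.concat({k: pd.DataFrame(v) for k, v in json_data.items()})
final_data.index = final_data.index.droplevel(1)
final_data.index.name = 'SIC Category'
print(final_data)
```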

Loop to get the dataframe, and pass it to a function

I am trying to:
read files and store them in datasets
store the names of the dataframes in a dataframe
loop to recover each dataframe and pass it to a function as a dataframe
It doesn't work because when I retrieve the name of the dataframe, it is a str object, not a dataframe, so the calculation fails.
df_files:
dataframe name
0 df_bureau bureau
1 df_previous_application previous_application
Code:
def missing_values_table_for(df_for, name):
    mis_val_for = df_for.isnull().sum()  # count null values -> error here

for index, row in df_files.iterrows():
    missing_values_for = missing_values_table_for(dataframe, name)
Thanks in advance.
I believe the best approach here is to work with a dictionary of DataFrames, created by looping over the file names returned by glob:
import glob
import pandas as pd

files = glob.glob('files/*.csv')
dfs = {f: pd.read_csv(f) for f in files}
for k, v in dfs.items():
    df = v.isnull().sum()
    print(df)
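A file-free sketch of the same dict-of-DataFrames pattern, with made-up frame names and contents in place of the csv reads, showing that the function then receives a real DataFrame rather than its name as a string:

```python
import pandas as pd

# stand-ins for {f: pd.read_csv(f) for f in files}
dfs = {
    'bureau.csv': pd.DataFrame({'a': [1, None, 3], 'b': [None, None, 6]}),
    'previous_application.csv': pd.DataFrame({'c': [1, 2]}),
}

def missing_values_table_for(df_for):
    # count null values per column
    return df_for.isnull().sum()

for name, frame in dfs.items():
    print(name)
    print(missing_values_table_for(frame))
```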