I have two CSV files and need to find the differences between them, then write the results to one output file with two sheets: sheet1 holding the rows that differ for txt1.csv and sheet2 holding the rows that differ for txt2.csv. Please advise.
Sample Input :
with open('txt1.csv', 'r') as t1, open('txt2.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()
# Lines in txt2.csv that are not in txt1.csv
with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)
# Lines in txt1.csv that are not in txt2.csv
with open('update1.csv', 'w') as outFile:
    for line in fileone:
        if line not in filetwo:
            outFile.write(line)
Expected output:
In sheet1
In sheet2
Note :
When the input files are very large, the code above runs very slowly.
You could try the following.
df2=pd.DataFrame({"A":[1, 2],"B":[2,9]})
0 3
1 4
0 8
1 9
In your case something like:
df1 = pd.read_csv("txt1.csv")
df2 = pd.read_csv("txt2.csv")
delta1 = df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))].reset_index(drop=True)
delta2 = df2[~df2.apply(tuple,1).isin(df1.apply(tuple,1))].reset_index(drop=True)
delta1.to_csv("txt1_delta.csv", index=False)
delta2.to_csv("txt2_delta.csv", index=False)
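An alternative sketch of the same row comparison, using an outer merge with indicator=True instead of apply(tuple, 1). The function name row_differences and the sample frames are made up for illustration, and it assumes both files have the same columns. Unlike the code above, it drops duplicate rows before comparing.

```python
import pandas as pd

def row_differences(df1, df2):
    """Return (rows only in df1, rows only in df2), comparing on all shared columns."""
    # With no `on` argument, merge joins on every column the frames share.
    # indicator=True adds a "_merge" column recording where each row came from.
    merged = df1.drop_duplicates().merge(
        df2.drop_duplicates(), how="outer", indicator=True
    )
    only_1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    only_2 = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
    return only_1.reset_index(drop=True), only_2.reset_index(drop=True)

# Hypothetical sample data, not taken from the question.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 9, 4]})
df2 = pd.DataFrame({"A": [1, 2, 8], "B": [2, 9, 9]})
delta1, delta2 = row_differences(df1, df2)
```

With real files, df1 and df2 would come from pd.read_csv as shown above, and delta1 and delta2 can be written out in the same way.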
Edit: or, if you want to have it in Excel with multiple sheets:
pip install xlsxwriter # if required
import xlsxwriter
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("your_output_excel.xlsx", engine="xlsxwriter")
# Write each dataframe to a different worksheet.
delta1.to_excel(writer, sheet_name="Delta1")
delta2.to_excel(writer, sheet_name="Delta2")
# Close the Pandas Excel writer and output the Excel file.
writer.close()
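As for the slowness noted in the question: each "line not in fileone" test scans a whole list, so the total work grows with the product of the two file sizes. A minimal sketch that keeps the plain line-by-line comparison but uses sets for the membership tests follows. The function name line_differences is hypothetical, and it assumes both files fit in memory.

```python
def line_differences(path_a, path_b):
    """Return (lines only in file A, lines only in file B), keeping file order."""
    with open(path_a, "r") as fa, open(path_b, "r") as fb:
        lines_a = fa.readlines()
        lines_b = fb.readlines()
    # Set membership is a hash lookup, so each test no longer scans a list.
    set_a, set_b = set(lines_a), set(lines_b)
    only_a = [line for line in lines_a if line not in set_b]
    only_b = [line for line in lines_b if line not in set_a]
    return only_a, only_b
```

Calling line_differences('txt1.csv', 'txt2.csv') returns two lists of lines that can be written out with writelines. Lines are compared as exact strings, so a difference in whitespace or a missing final newline counts as a difference.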
I have a .csv file and I use pandas to read it.
import pandas as pd
from pandas import read_csv
0 1 2 3 4 5
0 -3.288733e-08 2.905263e-08 2.297046e-08 2.052534e-08 3.767194e-08 4.822049e-08
1 2.345769e-07 9.462636e-08 4.331173e-08 3.137627e-08 4.680112e-08 6.067109e-08
2 -1.386798e-07 1.637338e-08 4.077676e-08 3.339685e-08 5.020153e-08 5.871679e-08
3 -4.234607e-08 3.555008e-08 2.563824e-08 2.320405e-08 4.008257e-08 3.901410e-08
4 3.899913e-08 5.368551e-08 3.713510e-08 2.367323e-08 3.172775e-08 4.799337e-08
My aim is to assign the file's contents to a column named after the file, so that I can access the data later. For example, by doing something like
new_data= df['filename']
0 -3.288733e-08,2.905263e-08,2.297046e-08,2.052534e-08,3.767194e-08,4.822049e-08
1 2.345769e-07,9.462636e-08,4.331173e-08,3.137627e-08,4.680112e-08, 6.067109e-08
2 -1.386798e-07,1.637338e-08,4.077676e-08,3.339685e-08,5.020153e-08,5.871679e-08
3 -4.234607e-08,3.555008e-08,2.563824e-08,2.320405e-08,4.008257e-08,3.901410e-08
4 3.899913e-08,5.368551e-08,3.713510e-08,2.367323e-08,3.172775e-08,4.799337e-08
I don't really like it (and I still don't completely get the point), but you could read your data in as a single column (by using a 'wrong' separator) and then rename the column.
import pandas as pd
filename = 'input.csv'
df = pd.read_csv(filename, sep=';')
df.columns = [filename]
If you then wish, you could add other files by doing the same thing (with a different name for df at first) and then concatenate that with df.
A more useful approach, IMHO, would be to add the dataframe to a dictionary (a list would also work).
import pandas as pd
filename = 'input.csv'
df = pd.read_csv(filename)
data_dict = {filename: df}
# ... Add multiple files to data_dict by repeating steps above in a loop
You can then access your data later on by calling data_dict[filename] or data_dict['input.csv']
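The loop mentioned in the comment could look roughly like this. The helper name load_csvs is made up for illustration, and keying each dataframe by its base file name (rather than the full path) is an assumption about what is convenient.

```python
import os
import pandas as pd

def load_csvs(paths):
    """Read each CSV file in `paths` into a DataFrame, keyed by its file name."""
    data_dict = {}
    for path in paths:
        # os.path.basename strips the directory, so 'data/input.csv' -> 'input.csv'
        data_dict[os.path.basename(path)] = pd.read_csv(path)
    return data_dict
```

For example, data_dict = load_csvs(['input.csv', 'other.csv']) followed by data_dict['input.csv'] returns the dataframe for that file. If the files have no header row, pd.read_csv(path, header=None) would be needed instead.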
I'm kind of half way through this functionality. However, I need some help with formatting the data in the sheet that contains the output.
My current code...
response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
# Create a Pandas dataframe from the data.
df = pd.DataFrame.from_dict(json.loads(response), orient='index')
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.close()
The output is as follows...
What I want is something like this...
I suppose that first I would need to extract and organise the headers.
This would also include manually assigning a header for a column that cannot have a header by default as in case of SIC column.
After that, I can feed data to the columns with their respective headers.
You can loop over the keys of your json object and create a dataframe from each, then use pd.concat to combine them all:
import json
import pandas as pd
response = '{"sic2":[{"confidence":1.0,"label":"73"}],"sic4":[{"confidence":0.5,"label":"7310"}],"sic8":[{"confidence":0.5,"label":"73101000"},{"confidence":0.25,"label":"73102000"},{"confidence":0.25,"label":"73109999"}]}'
json_data = json.loads(response)
all_frames = []
for k, v in json_data.items():
    # One dataframe per key, tagged with the key it came from
    df = pd.DataFrame(v)
    df['SIC Category'] = k
    all_frames.append(df)
final_data = pd.concat(all_frames).set_index('SIC Category')
print(final_data)
This prints:
confidence label
SIC Category
sic2 1.00 73
sic4 0.50 7310
sic8 0.50 73101000
sic8 0.25 73102000
sic8 0.25 73109999
Which you can export to Excel as before, through final_data.to_excel(writer, sheet_name='Sheet1')
I have quite a large CSV file. I also have a pandas dataframe that has exactly the same columns as the CSV file.
I checked on Stack Overflow and saw several answers suggesting to read_csv the file, concatenate the resulting dataframe with the current one, and then write everything back to a CSV file.
But for a large file I think it is not the best way.
Can I append a pandas dataframe to an existing CSV file without reading the whole file?
Update: Example
import pandas as pd
df1 = pd.DataFrame ({'a':1,'b':2}, index = [0])
df2 = pd.DataFrame ({'a':3, 'b':4}, index = [1])
# what to do here? I would like to concatenate df2 to my.csv
The expected my.csv
a b
0 1 2
1 3 4
Look at using mode='a' in to_csv:
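import pandas as pd
df1 = pd.DataFrame({'a': 1, 'b': 2}, index=[0])
df1.to_csv('my.csv')  # create the file, header included
df2 = pd.DataFrame({'a': 3, 'b': 4}, index=[1])
df2.to_csv('my.csv', mode='a', header=False)  # append without repeating the header
!type my.csv  # in IPython on Windows use 'type'; on Unix use 'cat'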
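If the same code has to work whether or not the file already exists, the header can be written only on the first call. This is a small sketch built on the mode='a' idea above. The helper name append_to_csv is hypothetical, and it assumes the dataframe's columns are in the same order as the file's.

```python
import os
import pandas as pd

def append_to_csv(df, path):
    """Append df to the CSV at `path`, writing the header only for a new or empty file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # mode='a' appends to the end of the file without reading what is already there.
    df.to_csv(path, mode="a", header=write_header)
```

Calling append_to_csv(df1, 'my.csv') and then append_to_csv(df2, 'my.csv') on a fresh file produces one header row followed by both sets of rows.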
Using pandas 0.19.2.
My goal is to read an excel file and keep everything as string, no conversions.
My Excel file contains the following:
Row1 Row 2
52.60 52.80
68.7k 67.5k
0.80% 0.80%
I tried reading the excel file using the following commands
df = pd.read_excel(r'C:\Dash\static\Calendar-01-01-2017.xls')
df = pd.read_excel(r'C:\Dash\static\Calendar-01-01-2017.xls', converters={'Row1':str,'Row2':str})
df = pd.read_excel(r'C:\Dash\static\Calendar-01-01-2017.xls', converters={0:str,1:str})
Unfortunately I end up with this:
Row1 Row 2
52.6 52.8
68.7k 67.5k
0.008 0.008
In the end, I would like to pass it to a list:
df = df.values.tolist()
but I end up with long values such as
My code plan is as follows:
1) find csv files in folder using glob and create a list of files
2) convert each csv file into dataframe
3) extract data from a column location and convert into a separate dataframe
4) append the new data into a separate summary csv file
code is as follows:
Result = []
def result(filepath):
    files = glob.glob(filepath)
    print files
    dataframes = [pd.DataFrame.from_csv(f, index_col=None) for f in files]
    new_dfb = pd.DataFrame()
    for i, df in enumerate(dataframes):
        colname = 'Run {}'.format(i+1)
        selected_data = df['3'].ix[0:4]
        new_dfb[colname] = selected_data
folder = r"C:/Users/Joey/Desktop/tcd/summary.csv"
print Result
The code error is shown below. The issue seems to be with line 36 .. which corresponds to the selected_data = df['3'].ix[0:4].
I show one of my csv files below:
I'm not sure what the problem is with the dataframe constructor?
Your CSV snippet is a bit unclear. But, as suggested in the comments, read_csv (from_csv in this case) automatically takes the first row as the list of headers. The behaviour you appear to want is for the columns to be labelled by number instead. With header=None the labels are the integers 0, 1, 2, etc., so the column is then selected with df[3] rather than df['3']. To achieve this you need to have
[pd.DataFrame.from_csv(f, index_col=None,header=None) for f in files]
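On pandas 1.0 and later, DataFrame.from_csv and .ix have both been removed, so the same idea has to be written with pd.read_csv and .iloc. A minimal sketch follows. The inline CSV text is made-up sample data standing in for one of the files.

```python
import io
import pandas as pd

# Made-up stand-in for one headerless CSV file.
csv_text = "1.0,2.0,3.0,4.0\n5.0,6.0,7.0,8.0\n"

# header=None stops pandas from using the first row as column names;
# the columns are then labelled with the integers 0, 1, 2, 3.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Select the column by its integer label (not the string '3'),
# then take the first five rows by position (fewer if the file is shorter).
selected_data = df[3].iloc[0:5]
```

Note that .ix[0:4] included the row labelled 4, whereas .iloc[0:5] stops before position 5, so both pick the first five rows of a default-indexed dataframe. In the list comprehension above, this corresponds to pd.read_csv(f, header=None) for each file.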