I'm converting the float columns to currency data type with the following:
df = pd.DataFrame({'col0': [71513.0, 200000.0, None],
                   'col1': [True, False, False],
                   'col2': [100.0, 200.0, 0.0]})
df[['col0', 'col2']] = df[['col0', 'col2']].astype(float).astype("Int32").applymap(
    lambda x: "${:,.0f}".format(x) if isinstance(x, int) else x)
I am outputting the table with the following:
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False)
workbook = writer.book
ws = writer.sheets['Sheet1']
writer.close()  # close() also saves the file; save() must not be called after it
However, when I write the table this way, the currency values are stored as text.
How would I format the Excel sheet itself (instead of the pandas column) based on the column name, so that the value is a number but the formatting is currency?
This is how it worked for me.
I removed the column formatting within df, so the values stay numeric:
df = pd.DataFrame({'col0': [71513.0, 200000.0, None],
                   'col1': [True, False, False],
                   'col2': [100.0, 200.0, 0.0]})
I removed the index parameter from to_excel, so the index is written as worksheet column 0 and col0 and col2 end up in worksheet columns 1 and 3.
I then defined a number format and assigned it to columns 1 and 3:
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
df.to_excel(writer)  # index is written, so the data columns start at worksheet column 1
workbook = writer.book
ws = writer.sheets['Sheet1']
format1 = workbook.add_format({'num_format': '$#,##0.00'})
ws.set_column(1, 1, 18, format1)  # col0
ws.set_column(3, 3, 18, format1)  # col2
writer.close()  # close() saves the file; a separate save() call is not needed
Reference documentation: https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html
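Since the question asks for formatting based on the column name, here is a minimal sketch of how the worksheet positions could be looked up instead of hard-coded. The helper names `currency_column_positions` and `write_with_currency` are hypothetical (they are not part of pandas or XlsxWriter), and the sketch assumes unique column names and that the xlsxwriter package is installed for the writing step.

```python
import pandas as pd


def currency_column_positions(df, names, index=True):
    """Map each named DataFrame column to its zero-based worksheet column.

    When the index is written, it occupies the first column(s), so the
    data columns are shifted by the number of index levels.
    """
    offset = df.index.nlevels if index else 0
    return {name: df.columns.get_loc(name) + offset for name in names}


def write_with_currency(df, path, names, index=False):
    """Write df to path and apply a currency number format to the named columns.

    Hypothetical helper; requires the xlsxwriter package.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=index)
        money = writer.book.add_format({"num_format": "$#,##0.00"})
        ws = writer.sheets["Sheet1"]
        for col in currency_column_positions(df, names, index).values():
            ws.set_column(col, col, 18, money)


df = pd.DataFrame({"col0": [71513.0, 200000.0, None],
                   "col1": [True, False, False],
                   "col2": [100.0, 200.0, 0.0]})

# Positions with and without the index column written to the sheet.
without_index = currency_column_positions(df, ["col0", "col2"], index=False)
with_index = currency_column_positions(df, ["col0", "col2"], index=True)
```

Calling `write_with_currency(df, "output.xlsx", ["col0", "col2"])` would then keep the cells numeric and only change how Excel displays them.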
I am reading search values from a txt file.
I use each value to look for matching rows in a dataframe.
for lines in lines_list:
    sn = lines
    if sn in df[df['SERIAL'].str.contains(sn)]:
        condition = df[df['SERIAL'].str.contains(sn)]
        df_new = pd.DataFrame(condition)
        df_new.to_csv('try.csv',mode='a', sep=',', index=False)
When I check the try.csv file, it has many more lines than the txt file has.
The df has a lot of lines, more than the txt file.
I want to save the whole row of each search result into a dataframe or file.
I tried to append the search results to a new dataframe or csv.
First, create the list of search values:
with open("text.txt", "r") as f:
    l = [line.strip() for line in f]
Then write a function for apply that compares each value against the list and filters out the rest:
import numpy as np
def apply_func(x):
    if str(x) in l:
        return x
    return np.nan
Then apply it and write the output:
df["SERIAL"] = df["SERIAL"].apply(apply_func)
df.dropna(subset=["SERIAL"], inplace=True)  # drop only the rows that did not match
df.to_csv("new_df.csv", mode="a", index=False)
Or try the filter method:
with open("text.txt", "r") as f:
    l = [line.strip() for line in f]
df = df.set_index("SERIAL").filter(items=l, axis=0).reset_index()
df.to_csv("new_df.csv", mode="a", index=False)
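Both approaches above match whole values, while the question used `str.contains`, which matches substrings. As a rough sketch of the difference, the hypothetical helper `filter_by_serials` below builds one boolean mask for all search values, either with `isin` (exact match) or with a single escaped regular expression (substring match). The sample data is made up for illustration.

```python
import re

import pandas as pd


def filter_by_serials(df, serials, column="SERIAL", exact=True):
    """Return the rows whose `column` matches any of the search values."""
    wanted = [s.strip() for s in serials if s.strip()]  # drop newlines and blank lines
    col = df[column].astype(str)
    if not wanted:
        return df.iloc[0:0]  # nothing to search for: empty result
    if exact:
        mask = col.isin(wanted)
    else:
        # Escape each value so characters like "." are matched literally.
        pattern = "|".join(re.escape(s) for s in wanted)
        mask = col.str.contains(pattern, regex=True)
    return df[mask]


# Made-up example data; the lines mimic what readlines() returns.
df = pd.DataFrame({"SERIAL": ["A1", "B2", "C3", "A10"], "qty": [1, 2, 3, 4]})
lines = ["A1\n", "C3\n"]

exact_rows = filter_by_serials(df, lines, exact=True)
substring_rows = filter_by_serials(df, lines, exact=False)
```

Because each row is selected at most once, writing the result with a single `to_csv` call avoids the duplicated rows and repeated headers that appending inside a loop can produce.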
I am able to save multiple dataframes in multiple Excel sheets.
writer = pd.ExcelWriter('Cloud.xlsx', engine='xlsxwriter')
frames = {'Image': df1, 'Objects': df2, 'Text': df3 , 'Labels': df4}
for sheet, frame in frames.items():
    frame.to_excel(writer, sheet_name=sheet)
writer.close()  # close() saves the file
Now I want to create multiple files based on a column of the dataframes. For example, I want to create 4 Excel files:
df1
Category  URL           Obj
A         example.com   Chair
A         example2.com  table
B         example3.com  glass
B         example4.com  tv
All my dataframes have 7 categories, and I want to create 7 files, one per category.
I think you need:
import os
frames = {'Image': df1, 'Objects': df2, 'Text': df3 , 'Labels': df4}
for sheet, frame in frames.items():
    for cat, g in frame.groupby('Category'):
        if not os.path.isfile(f'{cat}.xlsx'):
            # the file does not exist yet: create it
            writer = pd.ExcelWriter(f'{cat}.xlsx')
        else:
            # the file exists: append a new sheet (append mode requires openpyxl)
            writer = pd.ExcelWriter(f'{cat}.xlsx', mode='a', engine='openpyxl')
        g.to_excel(writer, sheet_name=sheet)
        writer.close()  # close() saves the file
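An alternative sketch that avoids reopening files in append mode: group everything by category first, then open each file once and write all of its sheets. The helper names `split_by_category` and `write_category_files` are hypothetical, and only one small frame (taken from the example table in the question) is used here.

```python
import pandas as pd


def split_by_category(frames, column="Category"):
    """Return {category: {sheet_name: sub-frame}} for all input frames."""
    per_file = {}
    for sheet, frame in frames.items():
        for cat, group in frame.groupby(column):
            per_file.setdefault(cat, {})[sheet] = group
    return per_file


def write_category_files(per_file):
    """Write one workbook per category, one sheet per source frame.

    Needs an Excel engine such as openpyxl or xlsxwriter to be installed.
    """
    for cat, sheets in per_file.items():
        with pd.ExcelWriter(f"{cat}.xlsx") as writer:
            for sheet, group in sheets.items():
                group.to_excel(writer, sheet_name=sheet, index=False)


df1 = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "URL": ["example.com", "example2.com", "example3.com", "example4.com"],
    "Obj": ["Chair", "table", "glass", "tv"],
})

per_file = split_by_category({"Image": df1})
```

Calling `write_category_files(per_file)` would then create `A.xlsx` and `B.xlsx`, each with an `Image` sheet. Since every file is written in one pass, rerunning the script overwrites the files instead of failing on an already existing sheet name.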
I'm iterating through PDFs to obtain the text entered in the form fields. When I send the rows to a csv file, only the last row is exported. When I print the DataFrame, every row index is 0: what should be 0, 1, 2, 3, etc. comes out as 0, 0, 0, 0, etc. I have tried various solutions from Stack Overflow, but I can't get anything to work.
Here is what I get when printing the results; only the last row is exported to the csv file:
            0
0  1938282828
            0
0  1938282828
          0
0  22222222
infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i,'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2])
        print(df)
Thank you for any help!
You are replacing the same dataframe each time:
infile = glob.glob('./*.pdf')
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i,'rb'))
        diction = pdreader.getFormTextFields()
        myfieldvalue2 = str(diction['ID'])
        df = pd.DataFrame([myfieldvalue2]) # this creates new df each time
        print(df)
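Corrected code:
infile = glob.glob('./*.pdf')
values = []  # collect one value per PDF
for i in infile:
    if i.endswith('.pdf'):
        pdreader = PdfFileReader(open(i,'rb'))
        diction = pdreader.getFormTextFields()
        values.append(str(diction['ID']))
# Build the dataframe once, after the loop, so the index runs 0, 1, 2, ...
# (DataFrame.append was removed in pandas 2.0, so it is not used here.)
df = pd.DataFrame(values)
print(df)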
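If each PDF yields a small dataframe rather than a single value, the pieces can also be combined with `pd.concat`. A minimal sketch, assuming the three ID values from the printed output in the question and a column named `ID` (the column name is my assumption): `ignore_index=True` is what renumbers the rows instead of repeating index 0.

```python
import pandas as pd

# One single-row frame per PDF; the values are the ones printed in the question.
pieces = [pd.DataFrame({"ID": [value]})
          for value in ["1938282828", "1938282828", "22222222"]]

# Without ignore_index the result keeps each piece's own index (0, 0, 0).
repeated = pd.concat(pieces)

# With ignore_index=True the rows are renumbered 0, 1, 2.
combined = pd.concat(pieces, ignore_index=True)
```

Writing `combined` with a single `to_csv` call after the loop then exports every row, not just the last one.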
I have dataframe "df" as below:
x = [1,3,5,7]
y1 = [3,2,2,2]
y2 = [2,5,2,2]
y3 = [7,2,2,1]
df = pd.DataFrame({'x': x, 'y1': y1, 'y2': y2, 'y3': y3})
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()  # close() saves the file
I want the Excel output to use the same color for values in the other columns that match a value in column x.
You can use styles if the colors are specified by a dictionary:
def color(a):
    d = {1:'yellow', 3:'green', 5:'blue', 7:'red'}
    d1 = {k: 'background-color:' + v for k, v in d.items()}
    # map each value to its CSS string; values without a color get ''
    df1 = a.apply(lambda col: col.map(d1)).fillna('')
    return df1
df.style.apply(color, axis=None).to_excel('styled.xlsx', engine='openpyxl', index=False)
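If the color dictionary should follow whatever values column x contains, it can be derived from the data instead of typed by hand. This is only a sketch: the `colors` list and the function name `color_css` are my own assumptions, and it assumes there are at least as many colors as values in x.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 3, 5, 7],
                   "y1": [3, 2, 2, 2],
                   "y2": [2, 5, 2, 2],
                   "y3": [7, 2, 2, 1]})

colors = ["yellow", "green", "blue", "red"]  # assumed palette, one color per x value
palette = dict(zip(df["x"], colors))         # {1: 'yellow', 3: 'green', ...}


def color_css(frame):
    """Return a frame of CSS strings with the same shape as `frame`."""
    css = {value: f"background-color: {name}" for value, name in palette.items()}
    # Values that do not occur in column x map to NaN and become ''.
    return frame.apply(lambda col: col.map(css)).fillna("")


css_frame = color_css(df)
```

The function can be passed to the Styler in the same way as above, for example `df.style.apply(color_css, axis=None).to_excel('styled.xlsx', engine='openpyxl', index=False)`, which needs jinja2 and openpyxl installed.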
I have tried the following, but none of it works:
chart.auto_axis = False
chart.x_axis.unit = 365
chart.set_y_axis({'minor_unit': 100, 'major_unit':365})
Changing the max and min scale for both axes is straightforward:
chart.x_axis.scaling.min = 0
chart.x_axis.scaling.max = 2190
chart.y_axis.scaling.min = 0
chart.y_axis.scaling.max = 2
so I'm hoping there is a straightforward solution to this. Here is an MCVE:
from openpyxl import load_workbook, Workbook
import datetime
from openpyxl.chart import ScatterChart, Reference, Series
wb = Workbook()
ws = wb.active
rows = [
    ['data point 1', 'data point2'],
    [25, 1],
    [100, 2],
    [500, 3],
    [800, 4],
    [1200, 5],
    [2100, 6],
]
for row in rows:
    ws.append(row)
chart = ScatterChart()
chart.title = "Example Chart"
chart.style = 18
chart.y_axis.title = 'y'
chart.x_axis.title = 'x'
chart.x_axis.scaling.min = 0
chart.y_axis.scaling.min = 0
chart.x_axis.scaling.max = 2190
chart.y_axis.scaling.max = 6
xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=7)
series = Series(values=yvalues, xvalues=xvalues, title="DP 1")
chart.series.append(series)
ws.add_chart(chart, "D2")
wb.save("chart.xlsx")
I need to automate changing the axis to units of 365, or whatever value is needed.
Very late answer, but I figured out how to do this just after finding this question.
You need to set the major unit of the axis to 365.25, and the number format to show just the year:
chart.x_axis.number_format = 'yyyy'
chart.x_axis.majorUnit = 365.25
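To see which tick positions a given major unit would produce on the 0 to 2190 axis from the question, here is a small arithmetic sketch. It assumes major ticks start at the axis minimum and repeat every major unit; `major_ticks` is a hypothetical helper, not an openpyxl function.

```python
def major_ticks(axis_min, axis_max, major_unit):
    """List the tick positions from axis_min up to axis_max in steps of major_unit."""
    ticks = []
    i = 0
    # Multiply instead of adding repeatedly to avoid accumulating float error.
    while axis_min + i * major_unit <= axis_max:
        ticks.append(axis_min + i * major_unit)
        i += 1
    return ticks


ticks_365 = major_ticks(0, 2190, 365)        # 2190 is exactly 6 * 365
ticks_365_25 = major_ticks(0, 2190, 365.25)  # 6 * 365.25 = 2191.5 falls past the maximum
```

With a unit of 365 the last tick lands exactly on the axis maximum, while 365.25 stops one tick short of it, so the axis maximum may need adjusting when the major unit changes.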