Pandas.to_excel: can cell formats be predefined for openpyxl (like xlsxwriter's add_format)? Precision example - pandas

I'm trying to save dataframes into tabs/sheets in an existing xlsm file. I was able to save multiple sheets into an xlsx file with engine=xlsxwriter, but couldn't find a way to modify that code to write to an xlsm file. I found numerous examples using openpyxl as the engine, but haven't found a way to predefine the cell formats. Here's the xlsxwriter code:
with pd.ExcelWriter(filename + '.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name=tabname)
    workbook = writer.book
    worksheet = writer.sheets[tabname]
    fmt0 = workbook.add_format({'num_format': '0'})
    fmt1 = workbook.add_format({'num_format': '0.0'})
    fmt2 = workbook.add_format({'num_format': '0.00'})
    for r in range(len(df.index)):
        s = r + 1
        cols = {3: 'min', 4: 'max'}
        for c in cols:
            v = df.at[df.index[r], cols[c]]
            if is_number(v):
                if '.' not in v:
                    worksheet.write_number(s, c, float(v), fmt0)
                else:
                    pl = len(v) - v.index('.') - 1 # number of digits after '.'
                    if pl == 1:
                        worksheet.write_number(s, c, float(v), fmt1)
                    elif pl == 2:
                        worksheet.write_number(s, c, float(v), fmt2)
Does openpyxl have a way to do the same thing?
I think I can define a function that will do this but if there's a way to modify the existing code, I'd rather do that.
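One possible openpyxl equivalent, as a minimal sketch rather than a definitive answer: openpyxl has no add_format; each cell carries its own number_format string, so the format can be chosen per value and assigned directly. The helper names number_format_for and apply_precision_formats are hypothetical. The sketch assumes the values are numeric strings as in the code above, that to_excel wrote a single-level index, and that there is one header row.

```python
def number_format_for(text):
    """Return an Excel number format matching the decimals in a numeric string.

    Returns None when the text is not a number, so the caller can skip the cell.
    """
    text = str(text).strip()
    try:
        float(text)
    except ValueError:
        return None
    if "." not in text:
        return "0"
    decimals = len(text) - text.index(".") - 1  # digits after '.'
    return "0." + "0" * decimals if decimals else "0"


def apply_precision_formats(ws, df, columns, first_row=2, index_cols=1):
    """Rewrite the given DataFrame columns on worksheet ws as formatted numbers.

    ws is an openpyxl worksheet; rows and columns are 1-based there, so
    first_row=2 skips the header and index_cols=1 skips the index column.
    """
    for col_name in columns:
        col_idx = df.columns.get_loc(col_name) + index_cols + 1
        for offset, raw in enumerate(df[col_name]):
            fmt = number_format_for(raw)
            if fmt is None:
                continue  # leave non-numeric cells as pandas wrote them
            cell = ws.cell(row=first_row + offset, column=col_idx)
            cell.value = float(raw)
            cell.number_format = fmt
```

Inside a pd.ExcelWriter(path, engine='openpyxl') block, after df.to_excel(writer, sheet_name=tabname), this would be called as apply_precision_formats(writer.sheets[tabname], df, ['min', 'max']). For an existing xlsm file, opening it with openpyxl.load_workbook(path, keep_vba=True) should keep the macros when the workbook is saved again.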

Related

Creating new dataframe by search result in df

I am reading a txt file for search variable.
I am using this variable to find it in a dataframe.
for lines in lines_list:
    sn = lines
    if sn in df[df['SERIAL'].str.contains(sn)]:
        condition = df[df['SERIAL'].str.contains(sn)]
        df_new = pd.DataFrame(condition)
        df_new.to_csv('try.csv',mode='a', sep=',', index=False)
When I check the try.csv file, it has many more lines than the txt file.
The df has a lot of lines, more than the txt file.
I want to save the whole row from each search result into a dataframe or file.
I tried to append the search results to a new dataframe or csv.
First, create the list of lines:
with open("text.txt", "r") as f:
    l = [line.strip() for line in f]
Then write a function for apply that compares each value against the list and blanks out the rest:
import numpy as np

def apply_func(x):
    if str(x) in l:
        return x
    return np.nan
Apply it, drop the blanked rows, and write the output:
df["Serial"] = df["Serial"].apply(apply_func)
df.dropna(inplace=True)
df.to_csv("new_df.csv", mode="a", index=False)
Or try the filter method:
with open("text.txt", "r") as f:
    l = [line.strip() for line in f]
df = df.set_index("Serial").filter(items=l, axis=0).reset_index()
df.to_csv("new_df.csv", mode="a", index=False)
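A likely reason the question's try.csv grows larger than expected is that str.contains matches substrings (a serial such as A1 would also match A10), and appending inside the loop can write the same rows and header more than once. As a hedged sketch of an exact-match alternative, the following uses Series.isin and returns the matching rows in one pass. The function name rows_matching_serials is hypothetical, and the column name SERIAL is taken from the question.

```python
import pandas as pd


def rows_matching_serials(df, serials, column="SERIAL"):
    """Return the rows of df whose `column` exactly equals one of `serials`."""
    # Strip newlines left over from readlines() and ignore blank lines.
    wanted = {s.strip() for s in serials if s.strip()}
    mask = df[column].astype(str).isin(wanted)
    return df[mask]
```

The result can then be written once, for example with rows_matching_serials(df, lines_list).to_csv('try.csv', index=False), instead of appending on every iteration.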

How can I modify the existing sheetobj.data_validations in openpyxl?

I am writing an application to create a new blank day's worth of cells in a financial transaction spreadsheet. The spreadsheet was generated by WPS.
Everything is working as I'd like, except that data validation for one column only does not copy successfully from the old cells to the new cells in the same column. When I examine the sheetobj.data_validations, I find that the working columns end like this:
M3217:M65536, E3217:E65536
The column that doesn't work correctly ends like this:
N3196:N3215
Here is the code that actually copies the cells:
for r in range(frod, eod):
    for c in range(1, maxc + 1):
        nrow = r + rinc
        cell_tmp = sheet.cell(row = r, column = c)
        val_tmp = cell_tmp.value
        new_cell = sheet.cell(row = nrow, column = c)
        new_cell.font = copy(cell_tmp.font)
        new_cell.fill = copy(cell_tmp.fill)
        new_cell.number_format = copy(cell_tmp.number_format)
        new_cell.alignment = copy(cell_tmp.alignment)
        new_cell.border = copy(cell_tmp.border)
        headstr = col_string(sheet.title, c)
        method = get_method(headstr)
        #print('header string for col', c, 'is', headstr, 'form method is', method)
        if cell_tmp.data_type != 'f':  # compare by value; "is not" tests identity
            if method == 'cp_eod' and r == (eod - 2):
                new_cell.value = val_tmp
            else:
                new_cell.value = None
                new_cell.data_type = cell_tmp.data_type
        elif method == 'tc_eod':
            new_cell.value = form_self(val_tmp, r, nrow)
        elif method == 'prev_inc':
            if r == frod:
                new_cell.value = form_inc(val_tmp, val_tmp[val_tmp.rindex('+')+1:])
            else:
                new_cell.value = date_prev(val_tmp, r, nrow)
        elif method == 'self':
            if r == eod:
                #print('*** EOD ***')
                new_cell.value = form_self_dec(val_tmp, r, nrow)
            else:
                new_cell.value = form_self(val_tmp, r, nrow)
        elif method == 'self_fod':
            if r == frod:
                new_cell.value = form_self(val_tmp, r, nrow)
            elif r == eod:
                new_cell.value = form_self_dec(val_tmp, r, nrow)
            else:
                new_cell.value = form_frod(val_tmp, frod, frod + rinc, r, nrow)
        elif method == 'self_eod':
            new_cell.value = form_eod(val_tmp, r, nrow, eod, eod + rinc)
        elif method == 'tcsbs':
            new_cell.value = form_tcsbs(val_tmp, val_tmp[val_tmp.rindex('$')+1:])
        elif method == 'self_prev':
            if r == eod:
                new_cell.value = form_prev(val_tmp, nrow)
            else:
                new_cell.value = form_self_dec(val_tmp, r, nrow)
The column in question falls into this block of code:
else:
    new_cell.value = None
    new_cell.data_type = cell_tmp.data_type
Is it possible to edit the sheetobj.data_validations for column N to end the same way the working columns do?
Other things I have tried:
Directly setting a new data validation for the cells in the column. Python showed the cells as having data validation, but when I saved the workbook and opened it in Google Sheets, the validation was not there.
Copying a cell from a good column into the bad column on the original sheet. This also did not solve my problem: when I copied a new block of cells, the data validation did not copy.
Copying a cell from a good column in another sheet into the bad column on the original sheet. This resulted in losing all data validation when I copied a new block of cells.
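One approach that may work, sketched here under assumptions rather than verified against a WPS-generated file: the worksheet's data_validations object holds a list of DataValidation objects in its dataValidation attribute, and each one has an sqref with the ranges it covers. Adding a range to the existing validation for column N would make it end like the working columns. The function name extend_validation is hypothetical, and the range N3216:N65536 is only an example based on the ranges quoted above.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.worksheet.datavalidation import DataValidation


def extend_validation(ws, column_letter, new_range):
    """Add new_range to every validation that already covers column_letter.

    Returns the number of validations that were extended.
    """
    touched = 0
    for dv in ws.data_validations.dataValidation:
        # Columns where this validation's ranges start.
        cols = {get_column_letter(rng.min_col) for rng in dv.sqref.ranges}
        if column_letter in cols:
            dv.add(new_range)  # appends the range to dv.sqref
            touched += 1
    return touched
```

Called as extend_validation(sheet, 'N', 'N3216:N65536') before saving, this leaves the rule itself untouched and only widens the cells it applies to. Printing str(dv.sqref) for each validation before and after is a quick way to check the result.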

Pandas - Append dataframe to new excel sheet for multiple files

I want the new columns that my code produces to be written to a new sheet named "Analyzed Data", for multiple files. Each file has a different number of columns with varying names.
import os
import pandas as pd
path = r'C:\Users\Me\1Test'
filelist = []
for root, dirs, files in os.walk(path):
    for f in files:
        if not f.endswith('.txt'):
            continue
        filelist.append(os.path.join(root, f))
for f in filelist:
    df = pd.read_table(f)
    col = df.iloc[ : , : -3]
    df['Average'] = col.mean(axis = 1)
    out = (df.join(df.drop(df.columns[[-3,-1]], axis=1)
                   .sub(df[df.columns[-3]], axis=0)
                   .add_suffix(' - Background')))
    out.to_excel(f.replace('txt', 'xlsx'), sheet_name='Sheet1')
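A minimal sketch of one way to get a second sheet: open a single pd.ExcelWriter per output file and call to_excel once per sheet. The function names xlsx_path_for and write_raw_and_analyzed are hypothetical, and which columns count as "analyzed" is left to the caller. os.path.splitext is used instead of str.replace so that a 'txt' appearing elsewhere in the path is not rewritten.

```python
import os

import pandas as pd


def xlsx_path_for(txt_path):
    """Swap only the file extension for .xlsx."""
    root, _ext = os.path.splitext(txt_path)
    return root + ".xlsx"


def write_raw_and_analyzed(raw, analyzed, txt_path):
    """Write raw to 'Sheet1' and analyzed to 'Analyzed Data' in one workbook."""
    out_path = xlsx_path_for(txt_path)
    with pd.ExcelWriter(out_path) as writer:
        raw.to_excel(writer, sheet_name="Sheet1", index=False)
        analyzed.to_excel(writer, sheet_name="Analyzed Data", index=False)
    return out_path
```

In the loop above, raw could be the frame as read from the text file and analyzed the frame holding the new columns, with f passed as txt_path.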

xlrd copying cell style from xls to xlsx

I'm trying to convert xls files to xlsx, including cell styles. At the moment, I'm working on cell fonts.
Is there another way for xlrd to determine whether the cell font is bold or not other than font._font_flag == 1?
def xlstoxlsx():
    input_file_loc = [x for x in os.listdir(path1) if x.endswith(".xls")]
    count = 0
    for file in input_file_loc:
        xlsxwb = xlsxwriter.Workbook(f"{path1}/" + input_file_loc[count] + '_Coverted {:2d}.xlsx'.format(count))
        xlswb = xlrd.open_workbook(path1 + "/" + file, formatting_info=True)
        bold = xlsxwb.add_format({'bold': True})
        italic = xlsxwb.add_format({'italic': True})
        uline = xlsxwb.add_format({'underline': True})
        for sht in xlswb.sheet_names():
            xlsws = xlswb[sht]
            rows, cols = xlsws.nrows, xlsws.ncols
            all_rows = []
            for row in range(rows):
                curr_row = []
                for col in range(cols):
                    curr_row.append(xlsws.cell_value(row, col))
                all_rows.append(curr_row)
            xlsxws = xlsxwb.add_worksheet()
            for row in range(len(all_rows)):
                for col in range(len(all_rows[0])):
                    cell = xlsws.cell(row, col)
                    font = xlswb.xf_list[cell.xf_index]
                    if font._font_flag == 1:
                        xlsxws.write(row, col, all_rows[row][col], bold)
                    elif font._font_flag == 0:
                        xlsxws.write(row, col, all_rows[row][col])
        count += 1
        xlsxwb.close()
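As far as I know, xlrd also exposes the font itself: when the workbook is opened with formatting_info=True, each XF record in xf_list has a font_index into the workbook's font_list, and the font record carries bold, italic and underlined attributes. The sketch below relies on that instead of the private _font_flag. The helper names cell_font and format_props are hypothetical.

```python
def cell_font(book, cell):
    """Look up the xlrd font record used by a cell.

    book is an xlrd workbook opened with formatting_info=True.
    """
    xf = book.xf_list[cell.xf_index]
    return book.font_list[xf.font_index]


def format_props(font):
    """Translate an xlrd font record into an xlsxwriter format dictionary."""
    props = {}
    if font.bold:
        props["bold"] = True
    if font.italic:
        props["italic"] = True
    if font.underlined:
        props["underline"] = 1  # 1 is xlsxwriter's single underline
    return props
```

In the inner loop this could replace the _font_flag test with fmt = xlsxwb.add_format(format_props(cell_font(xlswb, cell))) followed by xlsxws.write(row, col, all_rows[row][col], fmt). Caching one format per distinct property set would avoid creating a new format for every cell.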

Create Dataframe name from 2 strings or variables pandas

I am extracting selected pages from a PDF file and want to name each dataframe after the page it was extracted from:
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
for i in selected_pages():
    df{str(i)} = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True,area = [100,10,740,950],pages= (i), index = False)
    print (df{str(i)} )
The idea, ultimately, as in the above example, is to have dataframes named df10 and df11. I have tried "df" + str(i), "df" & str(i), and df{str(i)}; however, all of them give the error: SyntaxError: invalid syntax
Any better way of doing it is most welcome. Thanks.
This is where a dictionary would be a much better option.
Also note the error you have at the start of the loop. selected_pages is a list, so you can't do selected_pages().
file = "abc"
selected_pages = ['10','11'] #can be any combination eg ['6','14','20']
df = {}
for i in selected_pages:
    df[i] = read_pdf(path + file + ".pdf",encoding = 'ISO-8859-1', stream = True, area = [100,10,740,950], pages= (i), index = False)
i = int(i) - 1 # this will bring it to 10
dfB = df[str(i)]
#select row number to drop: 0:4
dfB.drop(dfB.index[0:4],axis =0, inplace = True)
dfB.columns = ['col1','col2','col3','col4','col5']