Hello, I am trying to build an automation that iterates through the rows of a DataFrame column and copies and pastes them into Excel one at a time. I would like a loop in which pressing Enter copies the next cell. I have written the code below for reference, but it is not working.
import pandas as pd
import openpyxl
import pyperclip as pc
import pyautogui as pg
Excel_File = r'/Users/martinflores/Desktop/Control.xlsx'
df = pd.read_excel(Excel_File)
x= df['Age']
y = df['Name']
z = df['Count']
def main():
    for index, row in df.iterrows():
        string = row['Age']
        cp = pc.copy(string)
        return cp
pg.sleep(3)
pc.paste(main())
pg.press('down')
I thought my main function would save the string to the clipboard and that I could then paste it with either pg.hotkey('ctrl','v') or pc.paste(main()), but it doesn't do anything. Also, I am not sure if it matters, but I am developing this code on macOS at the moment.
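For reference, pyperclip's copy() puts text on the clipboard and returns None, and paste() takes no arguments and returns the clipboard text, so pc.paste(main()) cannot send anything to Excel. The return inside the loop also ends main() after the first row. Below is a minimal sketch of the intended copy, paste, wait loop. The function name copy_values_one_by_one and its callable parameters are hypothetical, and the stand-ins at the bottom replace the real clipboard and keyboard calls so the sketch runs without a GUI.

```python
from typing import Callable, Iterable, List


def copy_values_one_by_one(
    values: Iterable[object],
    copy: Callable[[str], None],
    paste: Callable[[], None],
    wait_for_user: Callable[[], None],
) -> List[str]:
    """Copy each value, paste it, then wait before moving to the next one."""
    pasted = []
    for value in values:
        text = str(value)   # clipboard functions expect text
        copy(text)          # for example pyperclip.copy
        paste()             # for example send the paste hotkey, then press 'down'
        pasted.append(text)
        wait_for_user()     # for example input("Press Enter for the next cell")
    return pasted


# Stand-ins (assumptions for illustration): a list plays the role of the clipboard.
clipboard = []
result = copy_values_one_by_one([31, 42, 27], clipboard.append, lambda: None, lambda: None)
print(result)
```

With the real libraries, copy could be pc.copy, paste could be a small function that calls pg.hotkey('command', 'v') (or 'ctrl' on Windows and Linux) followed by pg.press('down'), and values could be df['Age'].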
I have loaded an Excel file into a pandas dataframe with:
df = pandas.read_excel("file.xlsx")
The file has multiple sheets, but only the first is displayed when I invoke the dataframe name.
How do I view the other sheets?
You can try using pandas ExcelFile:
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')
import pandas as pd
df = pd.read_excel("file.xlsx", sheet_name = 'sheet1')
I have been having trouble reading large Excel files into Databricks using PySpark and pandas. Spark is really fast with CSV and TXT files, but not with Excel.
For example:
df2=pd.read_excel(excel_file, sheetname=sheets,skiprows = skip_rows).astype(str)
df = spark.read.format("com.crealytics.spark.excel").option("dataAddress", "\'" + sheet + "\'" + "!A1").option("useHeader","false").option("maxRowsInMemory",1000).option("inferSchema","false").load(filePath)
We have found the fastest way to read in an excel file to be one which was written by a contractor:
import csv
import sys
import time

from openpyxl import load_workbook

# `path`, `tempDir` and `spark` are defined elsewhere in the notebook.
excel_file = "/dbfs/{}".format(path)
sheets = []
workbook = load_workbook(excel_file, read_only=True, data_only=True)
all_worksheets = workbook.sheetnames
for worksheet_name in all_worksheets:
    print("Export " + worksheet_name + " ...")
    try:
        worksheet = workbook[worksheet_name]
    except KeyError:
        print("Could not find " + worksheet_name)
        sys.exit(1)
    with open("/dbfs/{}/{}.csv".format(tempDir, worksheet_name), 'w', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        headerDone = False
        for row in worksheet.iter_rows():
            lrow = []
            if headerDone:
                lrow.append(worksheet_name)
            else:
                lrow.append("worksheet_name")
                headerDone = True
            for cell in row:
                lrow.append(cell.value)
            wr.writerow(lrow)
# Sometimes Python gets a bit ahead of itself and tries to do this
# before it has finished writing the CSV, and fails.
retryCount = 0
retryMax = 20
while retryCount < retryMax:
    try:
        df2 = spark.read.format("csv").option("header", "true").load(tempDir)
        if df2.count() == 0:
            print("Retrying load from CSV")
            retryCount = retryCount + 1
            time.sleep(10)
        else:
            retryCount = retryMax
    except Exception:
        print("Threw an error trying to read the file")
        # Count failed attempts too, so the loop cannot spin forever.
        retryCount = retryCount + 1
The reason it is fast is that it only holds one row of the Excel sheet in memory at a time as it loops. I tried appending the rows to a single list instead, but this made it very slow.
The issue with the above method is that writing to CSV and re-reading it doesn't seem very robust. It's possible that the CSV could be read while it is still being written, so the load would succeed but data would be lost.
Is there any other way of making this fast, such as using Cython, so that I can append the rows to a list without incurring a memory penalty and load them into Spark directly via createDataFrame?
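On the concern about a half-written CSV being picked up, one common mitigation is to write each file under a temporary name and rename it into place only once it is complete. The following is a minimal sketch of that idea using only the standard library. The helper name write_csv_atomically and the demo rows are made up for illustration, and whether a rename is atomic on a DBFS mount is an assumption that would need checking.

```python
import csv
import os
import tempfile


def write_csv_atomically(rows, final_path):
    """Write rows to a temporary file, then rename it into place in one step."""
    directory = os.path.dirname(final_path) or "."
    # Same directory keeps the rename on one filesystem. The "_" prefix is used
    # on the assumption that Spark's file listing skips names starting with "_".
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="_tmp_", suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
            for row in rows:  # rows may be a generator, so one row is held at a time
                writer.writerow(row)
        os.replace(tmp_path, final_path)  # the final name appears only when complete
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial file behind
        raise


# Small demonstration with made-up rows in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "Sheet1.csv")
write_csv_atomically([["worksheet_name", "a"], ["Sheet1", 1]], demo_path)
with open(demo_path, newline="") as handle:
    demo_rows = list(csv.reader(handle))
print(demo_rows)
```

This does not remove the CSV round trip, but it would mean a reader sees either no file or a complete one, which is the failure mode described above.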
I have an error in my Python code. I am trying to split a workbook into different sheets based on a column value. The code is below.
import pandas as pd
import os
from xlwings import Book, Range, Sheet
path = r'C:\Dell'
worksheet = ('FILE.xlsx')
sheet =('Temporary_Table')
column = ('SERIAL_NUMBER')
workbook = os.path.join(path, worksheet)
wb = Book(workbook)
data = pd.DataFrame(pd.read_excel(workbook, sheet, index_col=None, na_values=[0]))
data.sort_values(column, axis = 0, inplace = True)
split = data.groupby(column)
for i in split.groups:
    Sheet.add()
    Range('A1', index = False).value = split.get_group(i)
It keeps giving me:
type object 'Sheet' has no attribute 'add'
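The message suggests that the installed xlwings no longer exposes add() on the Sheet class. In newer xlwings versions, sheets are added through a book's sheets collection, and DataFrame options are passed via options() on a range. Below is a hedged sketch of the loop written that way. The function name write_groups_to_sheets is hypothetical, wb is assumed to be an xlwings Book, and grouped is assumed to yield (key, DataFrame) pairs such as data.groupby(column) does.

```python
def write_groups_to_sheets(wb, grouped):
    """Create one sheet per group and write that group's rows starting at A1.

    Assumptions: `wb` behaves like an xlwings Book and `grouped` yields
    (key, DataFrame) pairs, for example the result of data.groupby(column).
    """
    created = []
    for key, frame in grouped:
        name = str(key)[:31]         # Excel limits sheet names to 31 characters
        sheet = wb.sheets.add(name)  # add through the book's sheets collection
        # index=False leaves the DataFrame index out of the written cells
        sheet.range('A1').options(index=False).value = frame
        created.append(name)
    return created
```

With the variables from the question this would be called as write_groups_to_sheets(wb, data.groupby(column)). Keys that contain characters Excel does not allow in sheet names, or that collide after truncation, would still need handling.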
I am desperate about this mystery. I just upgraded pandas to 0.22 (from 0.18) and, when using xlwings, dropna and isnull no longer work. myTemp still gives me the correct True and False values, yet unwindDF gives me all of the df_raw rows, just with every value turned into NaN or NaT. noPx has a similar issue.
This is the case even if I manually assign np.nan to a cell. Yet surprisingly, when I create a simple df towards the end of the same file, myTest1 works fine. Why? Is there something special about xlwings with pandas 0.22?
My code is below and my xlsx file is in the image.
import pythoncom
import pandas as pd
import xlwings as xw
import numpy as np
folder_path = 'S:/Order/all PNL files/'
excel_name='pnlTest.xlsx'
pnl_excel_path = folder_path + excel_name
sheetName = 'Sheet1'
pythoncom.CoInitialize()
app = None
bk = None
app_count = xw.apps.count
for i in range(app_count):
    try:
        app = xw.apps[i]
        temp = app.books[excel_name]
        bk = temp
        print()
        print("Using Opened File")
    except:
        print()
if bk == None:
    print("Open New Excel App")
    app = xw.App()
    bk = xw.Book(pnl_excel_path)
bk.app.calculation = 'manual'
bk.app.screen_updating = False
sht = bk.sheets[sheetName]
last_row_index = sht.range('A1').end('down').row
df_raw = sht.range('A1:M' + str(last_row_index)).options(pd.DataFrame, header=1,
index=0).value
myTemp = df_raw['UNWD_DT'].isnull()
unwindDF = df_raw[df_raw['UNWD_DT'].isnull()]
df_raw.loc[10,'Curr_Px']=np.nan
df_raw.iloc[10,11]=np.nan
noPx=df_raw[df_raw['Curr_Px'].isnull()]
df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1],'c':[np.nan,1,0,np.nan]})
myTemp1=df['c'].isnull()
myTest1=df[df['c'].isnull()]
df_raw.dropna(thresh=2,inplace=True)
df_raw2=df_raw.dropna(thresh=2)
I am working on an application that allows the user to dynamically add to and remove items from an excel file. The quantity of items shall be unlimited.
I am looking for a way to grab the items from the excel file and transfer them to the ComboBox.
To make myself clearer: The problem is not iterating through cells, but getting cell values into the ComboBox. I need a method that captures the content of all cells with values in a given column, where the end of range is unknown and then transfer the values to a ComboBox.
The Combobox only accepts values, not any empty cells. I also don't want fields in the ComboBox that say "No Value".
I have tried iterating through cells and using range methods, but this doesn't get the values into the ComboBox.
What I have so far is:
wb = load_workbook (source_file)
ws = wb.active
self.value_1 = ws['B2'].value
self.value_2 = ws['B3'].value
self.value_3 = ws['B4'].value
self.value_4 = ws['B5'].value
self.value_5 = ws['B6'].value
self.value_6 = ws['B7'].value
self.value_7 = ws['B8'].value
self.value_8 = ws['B9'].value
self.value_9 = ws['B10'].value
self.value_10 = ws['B11'].value
stock_items = [ self.value_1 , self.value_2 , self.value_3 , self.value_4 , self.value_5 ,
self.value_6 , self.value_7 , self.value_8 , self.value_9 , self.value_10 ]
self.combo_items_list = [ ]
for stock_item in stock_items:
    if stock_item is not None:
        self.combo_items_list.append(stock_item)
self.combo.addItems(self.combo_items_list)
This works as expected, but what troubles me is that I have to add a line of code for each item I grab from the Excel file, plus an extra entry in the stock_items list. With 5,000 items in the file, that would mean 5,000 lines of code and 5,000 entries in the list.
Is there a more efficient and elegant way to handle the issue with "counter" or pandas?
Thanks in advance.
I found a way to do this nicely using pandas instead of openpyxl:
import pandas as pd
import numpy as np
# get sheet and whole column
sales = pd.read_excel("Inventory.xlsx")
# filter out any empty (NaN) values
sales_article = sales["Artigo"].dropna()
# transform into list
sales_list = sales_article.values.tolist()
# add list to ComboBox
self.combo.addItems(sales_list)
Since openpyxl 2.4, worksheets have an iter_cols method that lets you select a range of cells and have them returned as columns, just as iter_rows returns them as rows. This is the simplest and most efficient way to do what you want.
See https://openpyxl.readthedocs.io/en/default/tutorial.html#accessing-many-cells for details.
An example for your use case:
# iter_cols yields one tuple of cells per column, so flatten it before filtering
cells = [cell.value for col in ws.iter_cols(min_col=2, max_col=2, min_row=2) for cell in col if cell.value is not None]
wb = load_workbook(source_file)
ws = wb.active
self.combo_items_list = []
# ws.max_row is the last row openpyxl considers used in the sheet
for row in range(2, ws.max_row + 1):
    value = ws['B' + str(row)].value
    if value is not None:
        self.combo_items_list.append(value)
self.combo.addItems(self.combo_items_list)
See the worksheet docs for other ways to get a range of Excel rows.
wb = load_workbook(source_file)
ws = wb.active
self.combo_items_list = []
# Loop over rows 2 (start) to 11 (end).
for counter in range(2, 12):
    value = ws['B' + str(counter)].value
    # Add the value to the combo list only if the cell is not empty.
    if value is not None:
        self.combo_items_list.append(value)
self.combo.addItems(self.combo_items_list)
Sorry if I am getting this wrong. I don't know the syntax of this language, but I saw a logical answer and decided to post it.