Save Selenium results in a file - selenium

I am trying to save my web scraping results to a CSV file using openpyxl.
If I print my results with the following code, I can see all the necessary data:
i = 1
g = len(titles)
while i < g:
    print(titles[i].text)
    print('\n')
    print(sections[i].text)
    print('\n')
    # print(dates[i].text)
    # print('\n')
    i += 1
But if I try to save the results to a CSV file with the following code:
for r in range(1,5):
    for c in range(1,4):
        sheet.cell(row=r,column=c).titles[i].text
workbook.save(path1)
I get the following error message:
AttributeError Traceback (most recent call last)
<ipython-input-487-201b34800c46> in <module>
1 for r in range(1,5):
2 for c in range(1,4):
----> 3 sheet.cell(row=r,column=c).titles[i].text
4
5 workbook.save(path1)
AttributeError: 'Cell' object has no attribute 'titles'
As I am a total beginner, it would be great to get an explanation what is wrong... Thanks!

If you look at the error, you'll see it says 'Cell' object has no attribute 'titles'. That means you're looking up titles on the cell returned by sheet.cell(...), which has no such attribute; in your printing example, titles is a separate list. Writing cells one by one also seems like a makeshift solution, so I'd recommend a DataFrame from something like pandas, as it's easy and intuitive to work with in Python and has simple exporting. Also, this is a little hard to troubleshoot because it isn't a minimal reproducible example, so I can't see the types of the variables or the data in them.

Is it simply that this line:
sheet.cell(row=r,column=c).titles[i].text
should be an assignment to the cell's value?
sheet.cell(row=r,column=c).value = titles[i].text
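Since the goal is a CSV file, the standard-library csv module may be enough on its own (openpyxl writes .xlsx workbooks rather than CSV). Here is a minimal sketch of that alternative, assuming the text has already been pulled out of the Selenium elements into plain lists. The names rows_to_csv, scraped_titles and scraped_sections are hypothetical and not from the question:

```python
import csv
import io

def rows_to_csv(titles, sections, out):
    """Write one CSV row per (title, section) pair to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["title", "section"])  # header row
    for title, section in zip(titles, sections):
        writer.writerow([title, section])

# Hypothetical stand-ins for [t.text for t in titles] and [s.text for s in sections]
scraped_titles = ["First post", "Second post"]
scraped_sections = ["News", "Sports"]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("results.csv", "w", newline="") instead
buffer = io.StringIO()
rows_to_csv(scraped_titles, scraped_sections, buffer)
csv_text = buffer.getvalue()
```

Using zip also avoids the manual index variable, so the loop cannot skip the first element or run past the shorter list.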

Related

Question about insert from TK form to csv file

I am trying to write some info from a Tk form that I built to a new CSV file, but unfortunately I am getting some errors.
The code:
def sub_func():
    with open('Players.csv','w') as df:
        df = pd.DataFrame
        i=len(df.index)
        data=[]
        data.append(entry_box1.get())
        data.append(entry_box2.get())
        data.append(entry_box3.get())
        data.append(entry_box4.get())
        if i==4:
            df.loc[i,:]=data
The Errors:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
return self.func(*args)
File "C:/Users/user/PycharmProjects/test11/Final Project/registration form.py", line 23, in sub_func
i=len(df.index)
TypeError: object of type 'pandas._libs.properties.AxisProperty' has no len()
You have two problems in your code:
The file is opened as df, but df is overwritten on the next line. Opening it with 'w' also truncates Players.csv, so any existing rows are lost.
df = pd.DataFrame does not create a DataFrame; without parentheses () it just binds df to the DataFrame class itself. df.index is then a class-level property object rather than a real index, which is why len() fails.
Solution:
df = pd.read_csv('./Players.csv') # Make sure that the file is in the same directory
i = len(df.index)
Not sure what you are trying to achieve, but you might need to work on the rest of the code, too.
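The class-versus-instance difference can be reproduced without pandas. This is only an illustrative sketch: Table is a hypothetical stand-in class, and it uses a plain Python property where pandas uses its own AxisProperty, but the failure mode is the same kind of TypeError:

```python
class Table:
    """Hypothetical stand-in for a class such as pandas.DataFrame."""

    def __init__(self):
        self._rows = []

    @property
    def index(self):
        # Row labels 0..n-1; only meaningful on an instance
        return range(len(self._rows))

not_an_instance = Table   # no parentheses: this is the class itself
instance = Table()        # parentheses: an actual (empty) table

try:
    len(not_an_instance.index)  # Table.index is the property object here
    class_error = None
except TypeError as exc:
    class_error = str(exc)

instance_length = len(instance.index)  # works, because the table exists
```

Here class_error holds a "has no len()" message, while instance_length is simply the number of rows in the empty table.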

Getting wildcard from input files when not used in output files

I have a snakemake rule aggregating several result files into a single file per study. To make it more concrete: I have two roles ['big','small'] that each produce data for 5 studies ['a','b','c','d','e'], and each study produces 3 output files, one per phenotype ['xxx','yyy','zzz']. What I want is a rule that aggregates the phenotype results from each study into a single summary file per study (merging the phenotypes into a single table). In the merge_results rule I give the rule a list of files (per study and role), aggregate these using a pandas data frame, and then write the result out as a single file.
While merging the results I need the 'pheno' variable for the input file being iterated over. Since pheno is not needed in the aggregated output file, it is not provided in output, and as a consequence it is also not available in the wildcards object. To get hold of pheno I currently parse the filename, but this feels very hacky and I suspect there is something I have not understood properly. Is there a better way to grab wildcards from input files when they are not used in the output files?
runstudy = ['a','b','c','d','e']
runpheno = ['xxx','yyy','zzz']
runrole = ['big','small']
rule all:
    input:
        expand(os.path.join(output, '{role}-additive', '{study}', '{study}-summary-merge.txt'), role=runrole, study=runstudy)
rule merge_results:
    input:
        expand(os.path.join(output, '{{role}}', '{{study}}', '{pheno}', '{pheno}.summary'), pheno=runpheno)
    output:
        os.path.join(output, '{role}', '{study}', '{study}-summary-merge.txt')
    run:
        import pandas as pd
        import os
        # Iterate over input files, read into pandas df
        tmplist = []
        for f in input:
            data = pd.read_csv(f, sep='\t')
            # getting the pheno from the input file and adding it to the data frame
            pheno = os.path.split(f)[1].split('.')[0]
            data['pheno'] = pheno
            tmplist.append(data)
        resmerged = pd.concat(tmplist)
        resmerged.to_csv(output, sep='\t')
You are doing it the right way!
In your line:
expand(os.path.join(output, '{{role}}', '{{study}}', '{pheno}', '{pheno}.summary'), pheno=runpheno)
you have to understand that role and study are wildcards (the doubled braces keep them as wildcards through expand). pheno is not a wildcard; it is filled in by the second argument of the expand function.
To get the phenotype in your for loop, you can either parse the file name as you are doing, or reconstruct the file name directly, since you know the values that pheno takes and you can access the wildcards:
run:
    import pandas as pd
    import os
    # Iterate over phenotypes, read each file into a pandas df
    tmplist = []
    for pheno in runpheno:
        # The global variable 'output' conflicts with the rule's 'output' here,
        # so the global is renamed to outputDir in this example
        file = os.path.join(outputDir, wildcards.role, wildcards.study, pheno, pheno+'.summary')
        data = pd.read_csv(file, sep='\t')
        data['pheno'] = pheno
        tmplist.append(data)
    resmerged = pd.concat(tmplist)
    # output is a list-like of files, so pass its single entry to to_csv
    resmerged.to_csv(output[0], sep='\t')
I don't know if this is better than parsing the file name like you were doing though. I wanted to show that you can access wildcards in the code. Either way, you are defining the input and output correctly.
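To see why role and study survive as wildcards while pheno gets filled in, here is a small sketch of the brace-escaping idea. It uses plain str.format rather than Snakemake's expand, which follows the same doubled-brace convention. The 'results' directory and the role and study values are assumptions for illustration only:

```python
import posixpath

# Hypothetical pattern mirroring the rule's input; 'results' stands in for the output directory
pattern = posixpath.join("results", "{{role}}", "{{study}}", "{pheno}", "{pheno}.summary")
runpheno = ["xxx", "yyy", "zzz"]

# Step 1: fill in pheno only; doubled braces collapse to single braces
# and remain as placeholders
templates = [pattern.format(pheno=p) for p in runpheno]

# Step 2: fill in the remaining placeholders, as the wildcards would be for one job
paths = [t.format(role="big-additive", study="a") for t in templates]

# Recover pheno from each path with the same file-name parsing the question uses
recovered = [posixpath.basename(p).split(".")[0] for p in paths]
```

After step 1 each template still contains {role} and {study}, which is the state in which Snakemake matches them against the output file. Parsing the file name and looping over runpheno both recover the same pheno values.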

EOF error using input Python 3

I keep getting an EOF error but am unsure why. I have tried with and without int(), but it makes no difference. I'm using PyCharm 3.4 and Python 3.
Thanks,
Chris
while True:
    try:
        number = int(input("what's your favourite number?"))
        print (number)
        break
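A try block must be followed by an except (or finally) clause. Without one, the parser reaches the end of the file while still expecting it, hence the EOF error. You use try to declare that an error might occur, and except to handle it: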
while True:
    try:
        number = int(input("what's your favourite number?"))
        print(number)
        break
    except ValueError as e:
        print("Woah, there is an error: {0}".format(e))
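The point that this is a parse-time problem rather than something input() does at run time can be checked with the built-in compile(). This is a rough sketch: the helper name parses is hypothetical, and the exact error message differs between Python versions, so only the exception type is checked:

```python
# A try block with no except/finally, and the same block completed
incomplete = "try:\n    number = int('42')\n"
complete = incomplete + "except ValueError:\n    number = None\n"

def parses(source):
    """Return True if `source` compiles, False if the parser rejects it."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

incomplete_ok = parses(incomplete)  # rejected before any code runs
complete_ok = parses(complete)      # accepted once the except clause is added
```

Because the incomplete source is rejected before anything executes, adding or removing int() around input() cannot change the outcome.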

Minimisation problem in Python, fmin_bfgs won't work but fmin will, 'Matrices not aligned'

I have a function in python which takes a vector and returns a real number. I am using the scipy.optimize fmin and fmin_bfgs functions to find the argument which gives the function its approx minimum value. However, when I use fmin I get an alright answer (quite slowly) but when I switch to fmin_bfgs, I get an error saying "Matrices are not aligned". Here's my function:
def norm(b_):
    b_ = b_.reshape(int(M),1) #M already given elsewhere
    Yb = np.dot(Y,b_) #Y already given elsewhere
    B = np.zeros((int(M),int(M)))
    for j in xrange(int(M)):
        B[j][j] = -t[j+1]*np.exp(-t[j+1]*Yb[j]) #The t[j] are already known
    P = np.zeros((int(M),1))
    for j in xrange(int(M)):
        P[j][0] = np.exp(-t[j+1]*Yb[j])
    diff = np.zeros((int(M),1)) #Functions d(i,b) are known
    for i in xrange(1,int(M)-1):
        diff[i][0] = d(i+1,b_) - d(i,b_)
    diff[0][0] = d(1,b_)
    diff[int(M)-1][0] = -d(int(M)-1,b_)
    term1_ = (1.0/N)*(np.dot((V - np.dot(c,P)).transpose(),W))
    term2_ = np.dot(W,V - np.dot(c,P)) #V,c,P,W already known
    term1_ = np.dot(term1_,term2_)
    term2_ = lambd*np.dot(Yb.transpose(),diff)
    return term1_ + term2_
Here's how I call fmin_bfgs:
fmin_bfgs(norm, b_guess,fprime=None,
          args=(),gtol=0.0001,norm=0.00000000001,
          epsilon=1.4901161193847656e-08,maxiter=None,
          full_output=0, disp=1, retall=0, callback=None)
When I call fmin it works fine, just too slowly to be useful (I need to optimise several times). But when I try fmin_bfgs I get this error:
Traceback (most recent call last):
  File "C:\Program Files\Wing IDE 101 4.0\src\debug\tserver_sandbox.py", line 287, in module
  File "C:\Python27\Lib\site-packages\scipy\optimize\optimize.py", line 491, in fmin_bfgs
    old_fval,old_old_fval)
  File "C:\Python27\Lib\site-packages\scipy\optimize\linesearch.py", line 239, in line_search_wolfe2
    derphi0, c1, c2, amax)
  File "C:\Python27\Lib\site-packages\scipy\optimize\linesearch.py", line 339, in scalar_search_wolfe2
    phi0, derphi0, c1, c2)
  File "C:\Python27\Lib\site-packages\scipy\optimize\linesearch.py", line 471, in _zoom
    derphi_aj = derphi(a_j)
  File "C:\Python27\Lib\site-packages\scipy\optimize\linesearch.py", line 233, in derphi
    return np.dot(gval[0], pk)
ValueError: matrices are not aligned
Any ideas why this might happen? All the matrices I have supplied the function are aligned correctly (and the function works since fmin worked). Help much appreciated!
It seems that one of the programs just ended up dealing with numbers that were too large for it to handle. Shame it couldn't tell me it was doing that properly. I worked around it though, so no more problem. Sorry if this wasted your time.

error when trying to import ps file by grImport in R

I need to create a PDF file with several charts created by ggplot2 arranged on an A4 page, and repeat this 20-30 times.
I export the ggplot2 chart to a PS file and try to PostScriptTrace it as instructed in grImport, but it just keeps giving me the error: Unrecoverable error, exit code 1.
If I ignore the error and try to import the generated XML file into an R object, I get another error:
attributes construct error
Couldn't find end of Start Tag text line 21
Premature end of data in tag picture line 3
Error: 1: attributes construct error
2: Couldn't find end of Start Tag text line 21
3: Premature end of data in tag picture line 3
What's wrong here?
Thanks!
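If you have no time to deal with Sweave, you could also write a simple TeX document from R after generating the plots, which you could later compile to PDF.
E.g.:
plot_file <- paste0('filename', id, '.pdf') # paste0 avoids the spaces paste() would insert
ggsave(filename=plot_file, plot=p)
# Append the \includegraphics line to the TeX source (not to a PDF)
cat(paste0('\\includegraphics{', plot_file, '}\n'),
    file='report.tex', append=TRUE)
Later, once report.tex also has the usual preamble (\documentclass, \usepackage{graphicx}, \begin{document} ... \end{document}), you could easily compile it to PDF with, for example, pdflatex.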