Storing information from td tags with a specific width, in python - selenium

I am trying to store all the information from the td tags that have width="82"; perhaps there is a more efficient method.
<a name="AAKER"> </a>
<table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b>
<small>(<a href="http://google.com">Soundex
A260</a>)
— <i>See also</i>
ACKER,
KEAR,
TAAKE.
</small>
</td></tr></tbody></table><br clear="all">
<table align="left" cellpadding="5">
<tbody><tr><td width="82" align="right" valign="top"> </td><td valign="top">
<img src="rd.gif" width="13" height="13">
<b><a name="954.35.65">Aaker, Casper Drengman</a> (b.1883)</b>
— also known as
<b>Casper D. Aaker</b> — of Minot,
WardCounty , N.Dak. Born in Ridgeway,
Winneshiek County , Iowa, August,
1883. Republican.
Lawyer; organizer, Trinity
Hospital,
1922; delegate to Republican National Convention from North Dakota.
<table width="100%" align="left">
<tbody>
<tr><td width="20"> </td>
<td width="26" valign="top"><img src="hand.gif" width="26" height="17"></td>
<td valign="top">
<span style="font-size:8pt;"><i>Relatives:</i>
Son of Drengman Aaker and Christine (Ellefson) Aaker; married,
December 15, 1914,
to Leda Mansfield.</span>
</td>
</tr>
</tbody>
</table>
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="949.93.45">Aaker, H. H.</a></b> — of
Norman County
, Minn. Prohibition candidate for
secretary of state of Minnesota
, 1892.
Burial location unknown.
</td></tr>
</tbody>
</table><br clear="all"><br>
<a name="AALL"> </a>
<table border="" width="100%" cellpadding="5">
<tbody><tr><td bgcolor="#FFFFFF"><b>AALL</b> <small>(
SoundexA400
)— <i>See also</i>
AHL,
AL,
ALL,
</small>
</td></tr>
</tbody></table><br clear="all">
<tbody><tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="961.32.34">Aamodt, Gary</a></b> — of Madison,
Dane County, Wis.
Democrat. Delegate to Democratic National Convention from Wisconsin,
1976. Still living as of 1976.
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="030.75.75">Aamodt, Marjorie M.</a></b> —
Democrat. Candidate for
<a href="http://google.com">Pennsylvania
state house of representatives</a> 13th District, 1980.
Female.
Still living as of 1980.
</td>
</tr>
</tbody></table><br clear="all"><br>
So far I have tried defining an object:
ta = driver.find_element_by_tag_name('tbody').get_attribute('innerHTML')
pd.read_html(ta)
But I want every pd.read_html(ta)[i] stored in a single dataframe, ignoring the tables with width="100%".

You can .extract() the tables with width="100%" from the soup and then get all rows.
For example (txt contains your HTML snippet from the question):
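import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

# Remove every table with width="100%" (the surname headers and the nested "Relatives" tables)
for t in soup.select('table[width="100%"]'):
    t.extract()

all_data = []
for row in soup.select('tr'):
    # Split each row's text at the first dash into name and description
    name, desc = row.get_text(strip=True, separator=' ').split('—', maxsplit=1)
    all_data.append([name.strip(), desc.strip()])

df = pd.DataFrame(all_data, columns=['name', 'description'])
print(df)
df.to_csv('data.csv')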
Prints:
name description
0 Aaker, Casper Drengman (b.1883) also known as Casper D. Aaker — of Minot, Ward...
1 Aaker, H. H. of Norman County , Minn. Prohibition candidate...
2 Aamodt, Gary of Madison, Dane County , Wis.\n Democr...
3 Aamodt, Marjorie M. Democrat. Candidate for Pennsylvania\n ...
And saves data.csv.
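The split call in the loop above assumes that every row contains the dash separator; a row without one would raise ValueError when unpacking. A minimal sketch of a more tolerant split using str.partition follows. The helper name split_entry and the sample strings are hypothetical stand-ins for the row.get_text(...) results, not part of the original answer:

```python
def split_entry(text, sep='—'):
    """Split a row's text into (name, description) at the first separator.

    Hypothetical helper: rows without the separator yield an empty
    description instead of raising ValueError.
    """
    name, _, desc = text.partition(sep)
    return name.strip(), desc.strip()


# Sample strings standing in for row.get_text(strip=True, separator=' ')
rows = [
    'Aaker, H. H. — of Norman County , Minn.',
    'Aaker, Casper Drengman (b.1883) — also known as Casper D. Aaker — of Minot',
    'Entry without a separator',
]

all_data = [split_entry(r) for r in rows]
for name, desc in all_data:
    print(repr(name), '|', repr(desc))
```

Only the first separator is used, so any later dashes stay inside the description, matching the maxsplit=1 behaviour of the original loop.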

XPATH to use preceding and following sibling in a single statement

I would like to scrape the name and address information located between the tag that contains the text "Defendant" and the next h5 tag.
My HTML structure is:
<hr>
<H5>Defendant/Respondent Information</H5>
<span class="InfoChargeStatement">(Each Defendant/Respondent is displayed below)</span>
<table>
<tr>
<td><span class="FirstColumnPrompt">Party Type:</span></td><td><span class="Value">Defendant</span><span class="Prompt">Party No.:</span><span class="Value">1</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 1</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">Addr 1</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">city1</span><span class="Prompt">State:</span><span class="Value">aa</span><span class="Prompt">Zip Code:</span><span class="Value">Zip1</span></td>
</tr>
</table>
<hr>
<table>
<tr>
<td><span class="FirstColumnPrompt">Party Type:</span></td><td><span class="Value">Defendant</span><span class="Prompt">Party No.:</span><span class="Value">2</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 2</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">Addr2</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">City2</span><span class="Prompt">State:</span><span class="Value">st2</span><span class="Prompt">Zip Code:</span><span class="Value">zip2</span></td>
</tr>
</table>
<hr>
<H5>Related Persons Information</H5>
<span class="InfoChargeStatement">(Each Related person is displayed below)</span>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Unwanted Name</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">un addr</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">Unwanted City</span><span class="Prompt">State:</span><span class="Value">Unwanted city</span><span class="Prompt">Zip Code:</span><span class="Value">12345</span></td>
</tr>
</table>
<table></table>
<hr>
My current XPath captures the first occurrence of the name and address properly, but when I need to extract multiple occurrences, it also scrapes the information under the unwanted h5 tags.
My current XPath is:
"//*[contains(text(),'Defendant')]//following-sibling::table//span[text()='Name:' or text()='Business or Organization Name:']/ancestor-or-self::td/following-sibling::td//text()"
I tried including preceding-sibling and following-sibling, but nothing gives my expected output.
My current output is:
names - [
Name1,
Name2
Unwanted Name,
]
Expected output is,
[
Name1
Name2
]
Kindly help.
Try this:
"//H5[contains(text(),'Defendant')]/following-sibling::table[not(preceding-sibling::H5[not(contains(text(),'Defendant'))])]/tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
It first selects the tables that have no preceding-sibling::h5 whose text() does not contain 'Defendant', then selects from those tables the tr whose first td meets your requirements, and finally selects the second td.
There is no need for double slashes (//) inside the path; they are bad for performance.
EDIT 1
Since there are more preceding-sibling::h5 than the example shows, this XPath will deal with that:
"//H5[contains(text(),'Defendant')]/following-sibling::table[preceding-sibling::H5[1][contains(text(),'Defendant')]]//tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
This will only select those tables whose first preceding-sibling::h5 is the same h5 we are interested in.
EDIT 2
Actually now the first h5 select is redundant. This XPath will do:
"//table[preceding-sibling::H5[1][contains(text(),'Defendant')]]//tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
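The rule behind the final XPath is that each table belongs to the nearest h5 that precedes it. As a rough illustration of that rule without an XPath engine, here is a minimal sketch using only Python's standard html.parser. The class name DefendantNames and the trimmed sample markup are assumptions made for this example; the span classes are taken from the question:

```python
from html.parser import HTMLParser


class DefendantNames(HTMLParser):
    """Collect 'Name:' values whose nearest preceding h5 contains 'Defendant'."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_h5 = False        # currently inside an h5 element
        self._heading = ''         # text of the most recent h5
        self._wanted = False       # most recent h5 contains 'Defendant'
        self._span_class = None    # class of the span being read, if any
        self._expect_name = False  # the next Value span holds a wanted name

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H5> arrives as 'h5'
        if tag == 'h5':
            self._in_h5 = True
            self._heading = ''
        elif tag == 'span':
            self._span_class = dict(attrs).get('class')

    def handle_endtag(self, tag):
        if tag == 'h5':
            self._in_h5 = False
            self._wanted = 'Defendant' in self._heading
        elif tag == 'span':
            self._span_class = None

    def handle_data(self, data):
        if self._in_h5:
            self._heading += data
        elif self._span_class == 'FirstColumnPrompt':
            self._expect_name = self._wanted and data.strip() == 'Name:'
        elif self._span_class == 'Value' and self._expect_name:
            self.names.append(data.strip())
            self._expect_name = False


# Trimmed sample based on the structure in the question
html = '''
<H5>Defendant/Respondent Information</H5>
<table><tr><td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 1</span></td></tr></table>
<hr>
<table><tr><td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 2</span></td></tr></table>
<hr>
<H5>Related Persons Information</H5>
<table><tr><td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Unwanted Name</span></td></tr></table>
'''

parser = DefendantNames()
parser.feed(html)
print(parser.names)  # ['Name 1', 'Name 2']
```

The _wanted flag plays the role of the preceding-sibling::H5[1] predicate: it is reset every time a new h5 closes, so tables under "Related Persons Information" are skipped.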

v-for duplicating th of tr same code is working fine for other objects

I have seen a few solutions online, but they did not solve my problem. I am getting a JSON object in the response.
<!-- Show Negativita Check Azienda -->
<table class="divide-y divide-gray-200 table-fixed w-full mt-4" v-if="showTableAzienda" v-for="item in impreses">
<thead class="bg-gray-900">
<tr>
<th>Codice Fiscale</th>
<th>Flag Domande</th>
<th>Flag Pregiudizievoli</th>
<th>Flag Procedure</th>
<th>Flag Protesti</th>
<th>Data Evasione</th>
</tr>
</thead>
<tbody class="text-center py-6" >
<tr>
<td>{{item.codice_fiscale}}</td>
<td>{{item.flagDomande}}</td>
<td>{{item.flagPregiudizievoli}}</td>
<td>{{item.flagProcedure}}</td>
<td>{{item.flagProtesti}}</td>
<td>{{item.dataEvasione}}</td>
</tr>
</tbody>
</table>
Here is JSON response
{
"codice_fiscale":"CLLLCA82R69D960T",
"flagDomande":"N",
"flagPregiudizievoli":"N",
"flagProcedure":"N",
"flagProtesti":"N",
"dataEvasione":"2021-11-04"
}
Because the object has six elements, it generates the th row six times with no output. If I print {{impreses.codice_fiscale}}, it shows the output. I am not able to understand this behavior.
EDIT
Second Question
{"EventiNegativiPersona":
{"InfoPersona":
{"Nominativo":
{"#attributes":{"cognome":"","nome":""}},
"CodiceFiscale":"CLLLCA82R69D960T"},
"ElencoProtesti":{"#attributes":
{"flagPresenza":"N"}},"ElencoPregiudizievoli":
{"#attributes":{"flagPresenza":"N"}}}}
I would like to show these, but {{item.EventiNegativiPersona.#parameters.so-on}} does not work because of #parameters. How can I show this?
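Based on the response object shown in your question, impreses is a single object rather than an array, so v-for iterates over its six properties and renders the table once per property. You could move the v-for to the td tag: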
<table class="..." v-if="showTableAzienda" >
<thead class="bg-gray-900">
<tr>
<th>Codice Fiscale</th>
<th>Flag Domande</th>
<th>Flag Pregiudizievoli</th>
<th>Flag Procedure</th>
<th>Flag Protesti</th>
<th>Data Evasione</th>
</tr>
</thead>
<tbody class="text-center py-6" >
<tr>
<td v-for="item in impreses">{{item}}</td>
</tr>
</tbody>
</table>

Find the text from a row and column in a table

I have a table in an HTML page like the one below. I need to extract, for example, the End Snap value under the Snap Time column, which is 03-Sep-20 02:00:01.
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
The required value is requested in the format columnname_rowname:
Snap Id_Begin Snap
Snap Id_End Snap
Snap Time_Begin Snap
Snap Time_End Snap
which goes into a variable called namesplit (column name first, then row name).
I am trying to first find the column number and row number, and then print the required value:
dbii = soup.find_all("table", attrs={"summary": "for snapshot information"})
ti = 0
for tables in dbii:
    vcols = tables.findChildren('th')
    #print(type(rows)) #bs4.element.ResultSet
    #print(rows)
    #print(ti)
    ii = 0
    for value in vcols:
        #print(value.strip)
        #print(value.string)
        #print(type(value)) # bs4.element.Tag
        if value.text != None and value.text.lower() == namesplit[0].lower():  # this matches the column name string
            print("match")
            col_no = ii
            table_no = ti
        else:
            ii += 1
    ti += 1
print(table_no, col_no, namesplit[1])  # correctly gives table 0, column as 1 or 2
print("abc")
#print(dbii[table_no])
#print(type(dbii[table_no]))
# Find Row number.
drow = dbii[table_no].find_all(scope='row')
j = 0
row_no = None
for value in drow:
    #print("row",j,"asdasdsad:",value,value.text)
    if value.text != None and namesplit[1].lower() in value.text.lower():
        row_no = j
    j += 1
print(row_no)  # correctly picks the td row as 0(for begin) or 1 (for end)
# We have Table no , column number, Row_no .. get the corresponding value.
fvalue = dbii[table_no].find_all('tr')[row_no]  ## this doesnt work. as its a tag.
print(type(fvalue))  ## tag ??
print(fvalue)
To print the value from End Snap row and third column, you can do:
from bs4 import BeautifulSoup
html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
'''
soup = BeautifulSoup(html_text, 'html.parser')
print(soup.select_one('tr:has(td:contains("End Snap:")) td:nth-child(3)').text)
Prints:
03-Sep-20 02:00:01
To get all values, you can do:
all_data = []
for row in soup.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in row.select('td')]
    all_data.append(tds)

for row in all_data:
    print('{:<20} {:<20} {:<20} {:<20} {:<20} {:<20}'.format(*row))
Prints:
Begin Snap:          121525               03-Sep-20 01:30:07   167                  10.4                 6
End Snap:            121526               03-Sep-20 02:00:01   174                  11.2                 6
Elapsed:                                  29.90 (mins)
DB Time:                                  67.15 (mins)
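The question asks for values addressed as columnname_rowname. Once the rows are collected as in all_data above, the lookup can be done on plain lists. The sketch below assumes the header cells have been collected into a list in the same way (for example from the th elements); the function name lookup is hypothetical, and the data is copied from the table in the question:

```python
# Header cells and data rows as plain lists (values copied from the table above)
header = ['', 'Snap Id', 'Snap Time', 'Sessions', 'Cursors/Session', 'Instances']
all_data = [
    ['Begin Snap:', '121525', '03-Sep-20 01:30:07', '167', '10.4', '6'],
    ['End Snap:', '121526', '03-Sep-20 02:00:01', '174', '11.2', '6'],
    ['Elapsed:', '', '29.90 (mins)', '', '', ''],
    ['DB Time:', '', '67.15 (mins)', '', '', ''],
]


def lookup(key, header, rows):
    """Return the cell addressed by 'columnname_rowname' (case-insensitive)."""
    column_name, row_name = key.split('_', maxsplit=1)
    # Column position comes from the header row; raises ValueError if absent
    col_no = [h.lower() for h in header].index(column_name.lower())
    for row in rows:
        # Row labels end with a colon in the table, e.g. 'End Snap:'
        if row[0].rstrip(':').lower() == row_name.lower():
            return row[col_no]
    raise KeyError(key)


for key in ['Snap Id_Begin Snap', 'Snap Id_End Snap',
            'Snap Time_Begin Snap', 'Snap Time_End Snap']:
    print(key, '->', lookup(key, header, all_data))
```

Because the first header cell is empty, the header index lines up directly with the td index in each row, so no offset is needed.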

PHP selenium find parent sibling

I have the code below and I want to find the value of the second td. How can I select text with <br/> in it?
<tr>
<td valign="top" style="width: 85px">
<span class="fieldtext">Address:</span>
</td>
<td valign="top">
Shaftesbury House, 1st floor
<br/>20 Tylney Road
<br/>Bromley
<br/>Greater London
<br/>BR1 2RL
<br/>United Kingdom
<br/>
</td>
<td style="width: 200px; vertical-align: top; text-align: right;" rowspan="2" />
</tr>
Assuming you want to find the td element by the Address: label defined previously, you can use the following-sibling axis:
//td[span = 'Address:']/following-sibling::td
Then, after locating the element, call the getText() method:
$driver->findElement(WebDriverBy::xpath("//td[span = 'Address:']/following-sibling::td"))->getText();
Alternatively, select the second td by position:
$driver->findElement(WebDriverBy::xpath('//tr/td[2]'))->getText();

Testing kendoGrid data using cucumber capybara

I'm attempting to write some cucumber/capybara tests to validate data in a KendoGrid UI component and am having real trouble determining how to select and validate the data on the page.
I've found the basic tutorials and examples on using cucumber/capybara with table data, but KendoGrid appears to use a slightly different configuration for its tables and data: 1) there is no "id" to easily select the grid on the page, and 2) there are multiple tables (one for the header and another for the actual data itself).
Here is an excerpt of my current kendoGrid data I want to check:
<div id="item_grid" data-role="grid" class="k-grid k-widget k-secondary" style="">
<div class="k-grid-header" style="padding-right: 17px;">
<div class="k-grid-header-wrap">
<table role="grid">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead>
<tr>
<th role="columnheader" data-field="ItemA" data-title="Item A" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item A</a>
</th>
<th role="columnheader" data-field="ItemB" data-title="Item B" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item B</a>
</th>
<th role="columnheader" data-field="ItemC" data-title="Item C" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item C</a>
</th>
</tr>
</thead>
</table>
</div>
</div>
<div class="k-grid-content">
<table role="grid">
<colgroup>
<col>
<col>
<col>
</colgroup>
<tbody>
<tr data-uid="2c77ea57-50ea-474d-950a-8379b3690936" role="row">
<td role="gridcell">A</td>
<td role="gridcell">223.63</td>
<td role="gridcell">0</td>
</tr>
<tr class="k-alt" data-uid="979534bc-7dea-47e9-9471-088c5bffe5b5" role="row">
<td role="gridcell">B</td>
<td role="gridcell">223.63</td>
<td role="gridcell">180</td>
</tr>
<tr data-uid="4d4c31e7-4daf-44ad-b6c1-20ffdfde57c4" role="row">
<td role="gridcell">C</td>
<td role="gridcell">143.58</td>
<td role="gridcell">0</td>
</tr>
<tr class="k-alt" data-uid="8d315558-b014-4219-b21b-dbe52cc6dd18" role="row">
<td role="gridcell">D</td>
<td role="gridcell">143.58</td>
<td role="gridcell">180</td>
</tr>
</tbody>
</table>
</div>
</div>
Where is the best place to start for writing tests to cover this scenario?
I have done some additional playing with the Telerik Test Studio and testing this specific scenario in that application is extremely easy!
One approach would be to collect the table of data into a 2D array using:
data_rows = page.all(:css, 'div#item_grid div.k-grid-content tr')
data = data_rows.collect do |tr|
  tr.all(:css, 'td').collect(&:text)
end
p data
#=> [["A", "223.63", "0"], ["B", "223.63", "180"], ["C", "143.58", "0"], ["D", "143.58", "180"]]
Then with the data (and assuming you know what data should be in the table), you can validate the data array:
# If you want to validate the entire table and row order matters:
expect(data).to eql([["A", "223.63", "0"], ["B", "223.63", "180"], ["C", "143.58", "0"], ["D", "143.58", "180"]])
# If you want to validate the entire table and row order does not matter:
expect(data).to match_array([["B", "223.63", "180"], ["A", "223.63", "0"], ["D", "143.58", "180"], ["C", "143.58", "0"]])
# If you want to validate a specific row exists:
expect(data).to include(["B", "223.63", "180"])