Find the text at a given row and column in a table - BeautifulSoup

I have a table in an HTML page like the one below. I need to extract a single cell, for example the End Snap value under the Snap Time column, which is 03-Sep-20 02:00:01:
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
The required value is requested in the format columnname_rowname, for example:
Snap Id_Begin Snap
Snap Id_End Snap
Snap Time_Begin Snap
Snap Time_End Snap
The requested key is stored, split into column name (namesplit[0]) and row name (namesplit[1]), in a variable called namesplit.
I am trying to first find the column number and the row number, and then print the required value:
dbii = soup.find_all("table", attrs={"summary": "for snapshot information"})
ti = 0
for tables in dbii:
    vcols = tables.findChildren('th')  # bs4.element.ResultSet
    ii = 0
    for value in vcols:
        # value is a bs4.element.Tag
        if value.text is not None and value.text.lower() == namesplit[0].lower():  # this matches the column name string
            print("match")
            col_no = ii
            table_no = ti
        else:
            ii += 1
    ti += 1
print(table_no, col_no, namesplit[1])  # correctly gives table 0, column as 1 or 2
# Find the row number.
drow = dbii[table_no].find_all(scope='row')
j = 0
for value in drow:
    if value.text is not None and namesplit[1].lower() in value.text.lower():
        row_no = j
    j += 1
print(row_no)  # correctly picks the td row as 0 (for begin) or 1 (for end)
# We have the table number, column number and row number; get the corresponding value.
fvalue = dbii[table_no].find_all('tr')[row_no]  # this doesn't work, as it's a tag
print(type(fvalue))  # tag ??
print(fvalue)

To print the value from End Snap row and third column, you can do:
from bs4 import BeautifulSoup
html_text = '''
<table border="0" width="600" class="tdiff" summary="for snapshot information">
<tr><th class="awrnobg" scope="col"></th><th class="awrbg" scope="col">Snap Id</th><th class="awrbg" scope="col">Snap Time</th><th class="awrbg" scope="col">Sessions</th><th class="awrbg" scope="col">Cursors/Session</th><th class="awrbg" scope="col">Instances</th></tr>
<tr><td scope="row" class='awrnc'>Begin Snap:</td><td align="right" class='awrnc'>121525</td><td align="center" class='awrnc'>03-Sep-20 01:30:07</td><td align="right" class='awrnc'>167</td><td align="right" class='awrnc'> 10.4</td><td align="right" class='awrnc'>6</td></tr>
<tr><td scope="row" class='awrc'>End Snap:</td><td align="right" class='awrc'>121526</td><td align="center" class='awrc'>03-Sep-20 02:00:01</td><td align="right" class='awrc'>174</td><td align="right" class='awrc'> 11.2</td><td align="right" class='awrc'>6</td></tr>
<tr><td scope="row" class='awrnc'>Elapsed:</td><td class='awrnc'> </td><td align="center" class='awrnc'> 29.90 (mins)</td><td class='awrnc'> </td><td class='awrnc'> </td><td class='awrnc'> </td></tr>
<tr><td scope="row" class='awrc'>DB Time:</td><td class='awrc'> </td><td align="center" class='awrc'> 67.15 (mins)</td><td class='awrc'> </td><td class='awrc'> </td><td class='awrc'> </td></tr>
</table>
'''
soup = BeautifulSoup(html_text, 'html.parser')
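# ':-soup-contains' is the current spelling; plain ':contains' is deprecated in soupsieve 2.1+
print(soup.select_one('tr:has(td:-soup-contains("End Snap:")) td:nth-child(3)').text)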
Prints:
03-Sep-20 02:00:01
To get all values, you can do:
all_data = []
for row in soup.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in row.select('td')]
    all_data.append(tds)
for row in all_data:
    print('{:<20} {:<20} {:<20} {:<20} {:<20} {:<20}'.format(*row))
Prints:
Begin Snap:          121525               03-Sep-20 01:30:07   167                  10.4                 6
End Snap:            121526               03-Sep-20 02:00:01   174                  11.2                 6
Elapsed:                                  29.90 (mins)
DB Time:                                  67.15 (mins)
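The question asks for lookups keyed as "columnname_rowname". The sketch below is one possible way to do that, not the answerer's method: it assumes the first tr holds the th headers and that the first td of each later row holds the row label. The helper name lookup and the shortened HTML sample are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (fewer columns and attributes).
HTML = """
<table summary="for snapshot information">
<tr><th></th><th>Snap Id</th><th>Snap Time</th><th>Sessions</th></tr>
<tr><td scope="row">Begin Snap:</td><td>121525</td><td>03-Sep-20 01:30:07</td><td>167</td></tr>
<tr><td scope="row">End Snap:</td><td>121526</td><td>03-Sep-20 02:00:01</td><td>174</td></tr>
</table>
"""


def lookup(soup, key):
    """Return the cell text for a 'Column Name_Row Name' key, or None."""
    col_name, row_name = key.split("_", 1)
    table = soup.find("table", attrs={"summary": "for snapshot information"})
    if table is None:
        return None
    rows = table.find_all("tr")
    # Header texts give the column index; the first header cell is empty.
    headers = [th.get_text(strip=True).lower() for th in rows[0].find_all("th")]
    if col_name.lower() not in headers:
        return None
    col_no = headers.index(col_name.lower())
    for row in rows[1:]:
        cells = row.find_all("td")
        # The first cell holds the row label, e.g. "End Snap:".
        if cells and row_name.lower() in cells[0].get_text(strip=True).lower():
            return cells[col_no].get_text(strip=True)
    return None


soup = BeautifulSoup(HTML, "html.parser")
for key in ("Snap Id_Begin Snap", "Snap Id_End Snap",
            "Snap Time_Begin Snap", "Snap Time_End Snap"):
    print(key, "->", lookup(soup, key))
```

Because the header row and the data rows have the same number of cells, the index found in the header can be reused directly on the matching data row.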

v-for duplicates the th row; the same code works fine for other objects

I have seen a few solutions online, but they did not solve my problem. I am getting a JSON object in the response.
<!-- Show Negativita Check Azienda -->
<table class="divide-y divide-gray-200 table-fixed w-full mt-4" v-if="showTableAzienda" v-for="item in impreses">
<thead class="bg-gray-900">
<tr>
<th>Codice Fiscale</th>
<th>Flag Domande</th>
<th>Flag Pregiudizievoli</th>
<th>Flag Procedure</th>
<th>Flag Protesti</th>
<th>Data Evasione</th>
</tr>
</thead>
<tbody class="text-center py-6" >
<tr>
<td>{{item.codice_fiscale}}</td>
<td>{{item.flagDomande}}</td>
<td>{{item.flagPregiudizievoli}}</td>
<td>{{item.flagProcedure}}</td>
<td>{{item.flagProtesti}}</td>
<td>{{item.dataEvasione}}</td>
</tr>
</tbody>
</table>
Here is JSON response
{
"codice_fiscale":"CLLLCA82R69D960T",
"flagDomande":"N",
"flagPregiudizievoli":"N",
"flagProcedure":"N",
"flagProtesti":"N",
"dataEvasione":"2021-11-04"
}
Because the object has six properties, the table header (the th row) is rendered six times, with no values in the cells. If I print {{impreses.codice_fiscale}}, the value is shown. I do not understand this behavior.
EDIT
Second Question
{"EventiNegativiPersona":
{"InfoPersona":
{"Nominativo":
{"#attributes":{"cognome":"","nome":""}},
"CodiceFiscale":"CLLLCA82R69D960T"},
"ElencoProtesti":{"#attributes":
{"flagPresenza":"N"}},"ElencoPregiudizievoli":
{"#attributes":{"flagPresenza":"N"}}}}
I would like to display these values, but {{item.EventiNegativiPersona.#parameters.so-on}} does not work because of the # in the key name. How can I display them?
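Based on the response object shown in your question, impreses is a single object rather than an array, so v-for iterates over its six property values and renders the whole table once per property. Inside each iteration item is a plain string such as "N", so item.codice_fiscale is undefined and the cells stay empty. You could move the v-for to the td tag: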
<table class="..." v-if="showTableAzienda" >
<thead class="bg-gray-900">
<tr>
<th>Codice Fiscale</th>
<th>Flag Domande</th>
<th>Flag Pregiudizievoli</th>
<th>Flag Procedure</th>
<th>Flag Protesti</th>
<th>Data Evasione</th>
</tr>
</thead>
<tbody class="text-center py-6" >
<tr>
<td v-for="item in impreses">{{item}}</td>
</tr>
</tbody>
</table>
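The answer above does not cover the second question. The underlying issue is that a key such as #attributes is not a valid identifier, so dot notation cannot reach it. In a JavaScript expression (including a Vue template) the usual workaround is bracket notation, for example item.EventiNegativiPersona.ElencoProtesti['#attributes'].flagPresenza. As a minimal sketch, here is the same lookup in Python against the JSON from the question, where subscript access plays that role:

```python
import json

# JSON payload copied from the second question.
response = json.loads("""
{"EventiNegativiPersona":
  {"InfoPersona":
    {"Nominativo": {"#attributes": {"cognome": "", "nome": ""}},
     "CodiceFiscale": "CLLLCA82R69D960T"},
   "ElencoProtesti": {"#attributes": {"flagPresenza": "N"}},
   "ElencoPregiudizievoli": {"#attributes": {"flagPresenza": "N"}}}}
""")

eventi = response["EventiNegativiPersona"]
# Keys such as "#attributes" are ordinary strings, so subscript access works.
protesti_flag = eventi["ElencoProtesti"]["#attributes"]["flagPresenza"]
pregiudizievoli_flag = eventi["ElencoPregiudizievoli"]["#attributes"]["flagPresenza"]
codice_fiscale = eventi["InfoPersona"]["CodiceFiscale"]
print(codice_fiscale, protesti_flag, pregiudizievoli_flag)
```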

Storing information from td tags with a specific width, in python

I am trying to store all the information from the td tags that have width="82". Perhaps there is a more efficient method?
<a name="AAKER"> </a>
<table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b>
<small>(<a href="http://google.com">Soundex
A260</a>)
— <i>See also</i>
ACKER,
KEAR,
TAAKE.
</small>
</td></tr></tbody></table><br clear="all">
<table align="left" cellpadding="5">
<tbody><tr><td width="82" align="right" valign="top"> </td><td valign="top">
<img src="rd.gif" width="13" height="13">
<b><a name="954.35.65">Aaker, Casper Drengman</a> (b.1883)</b>
— also known as
<b>Casper D. Aaker</b> — of Minot,
WardCounty , N.Dak. Born in Ridgeway,
Winneshiek County , Iowa, August,
1883. Republican.
Lawyer; organizer, Trinity
Hospital,
1922; delegate to Republican National Convention from North Dakota.
<table width="100%" align="left">
<tbody>
<tr><td width="20"> </td>
<td width="26" valign="top"><img src="hand.gif" width="26" height="17"></td>
<td valign="top">
<span style="font-size:8pt;"><i>Relatives:</i>
Son of Drengman Aaker and Christine (Ellefson) Aaker; married,
December 15, 1914,
to Leda Mansfield.</span>
</td>
</tr>
</tbody>
</table>
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="949.93.45">Aaker, H. H.</a></b> — of
Norman County
, Minn. Prohibition candidate for
secretary of state of Minnesota
, 1892.
Burial location unknown.
</td></tr>
</tbody>
</table><br clear="all"><br>
<a name="AALL"> </a>
<table border="" width="100%" cellpadding="5">
<tbody><tr><td bgcolor="#FFFFFF"><b>AALL</b> <small>(
SoundexA400
)— <i>See also</i>
AHL,
AL,
ALL,
</small>
</td></tr>
</tbody></table><br clear="all">
<tbody><tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="961.32.34">Aamodt, Gary</a></b> — of Madison,
Dane County, Wis.
Democrat. Delegate to Democratic National Convention from Wisconsin,
1976. Still living as of 1976.
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="030.75.75">Aamodt, Marjorie M.</a></b> —
Democrat. Candidate for
<a href="http://google.com">Pennsylvania
state house of representatives</a> 13th District, 1980.
Female.
Still living as of 1980.
</td>
</tr>
</tbody></table><br clear="all"><br>
So far I have tried defining an object:
ta = driver.find_element_by_tag_name('tbody').get_attribute('innerHTML')
pd.read_html(ta)
But I would like every pd.read_html(ta)[i] stored in a single dataframe, ignoring the tables with width="100%".
You can .extract() the tables with width="100%" from the soup and then process the remaining rows.
For example (txt contains your HTML snippet from the question):
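import pandas as pd
from bs4 import BeautifulSoup
soup = BeautifulSoup(txt, 'html.parser')
# Remove every table with width="100%" (the surname headings and the nested "Relatives" table).
for t in soup.select('table[width="100%"]'):
    t.extract()
all_data = []
for row in soup.select('tr'):
    # Split "name — description" at the first dash only.
    name, desc = row.get_text(strip=True, separator=' ').split('—', maxsplit=1)
    all_data.append([name, desc.strip()])
df = pd.DataFrame(all_data, columns=['name', 'description'])
print(df)
df.to_csv('data.csv')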
Prints:
                              name                                        description
0  Aaker, Casper Drengman (b.1883)  also known as Casper D. Aaker — of Minot, Ward...
1                     Aaker, H. H.  of Norman County , Minn. Prohibition candidate...
2                     Aamodt, Gary         of Madison, Dane County , Wis.\n Democr...
3              Aamodt, Marjorie M.         Democrat. Candidate for Pennsylvania\n ...
It also saves the result to data.csv.
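One caveat about the tuple unpacking used above: if a row contains no dash, split returns a single element and the assignment raises ValueError. A minimal sketch of a more forgiving split follows. It uses str.partition, and the helper name split_entry and the sample strings are made up for illustration.

```python
def split_entry(text, sep="—"):
    """Split 'name — description' at the first separator.

    Returns (name, description); description is '' when the separator is absent.
    """
    name, _, desc = text.partition(sep)
    return name.strip(), desc.strip()


rows = [
    "Aaker, H. H. — of Norman County , Minn. Prohibition candidate",
    "Aaker, Casper Drengman (b.1883) — also known as Casper D. Aaker — of Minot",
    "A row without a separator",
]
for text in rows:
    print(split_entry(text))
```

Like split with maxsplit=1, partition cuts only at the first separator, so later dashes stay inside the description.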

DOMPDF 0.8.3: how to split a long vertical table

I have a table with many rows.
The table starts on a new page, and the rows that do not fit are cut off (they are not shown on the next page).
I want the table to start on the previous page and the overflowing rows to continue on the next page. I use DOMPDF 0.8.3.
I tried:
<table style"page-break-inside: auto;">
<tr style="page-break: auto">
<td>abc</td>
<td>cde</td>
</tr>
<tr style="page-break: auto">
<td>abc</td>
<td>cde</td>
</tr>
<tr style="page-break: auto">
<td>abc</td>
<td>cde</td>
</tr>
</table>
I do not know how to solve this. Can anyone help?

Can Selenium be used when a table does not have proper HTML, as shown below?

Here is the table I am working with. I want to get the table row that contains a specific element, such as the href that has 'Harvest' in its text, and also check whether the text 'running' exists in the same table row.
<table id="execTable" class="tableHistory jobtable translucent">
<colgroup>
<col class="execid">
<col class="titlecol">
</colgroup>
<tbody>
<tr>
<th>Id</th>
<th>Name</th>
</tr>
</tbody>
<tr id="8571">
<td>8571</td>
<td class="titlecol">
<div id="hitdiv-8571" class="arrow"></div>
Harvest
</td>
<td>09-03-2015 09:45:04</td>
<td>-</td>
<td>2m 6s</td>
<td>running</td>
<td>view/restart</td>
</tr>
<tr id="8571-child" class="childRow" style="display: none;"></tr>
<tr id="8566">
<td>8566</td>
<td class="titlecol">
<div id="hitdiv-8566" class="arrow"></div>
mk
</td>
<td>09-03-2015 03:30:00</td>
<td>09-03-2015 04:16:50</td>
<td>46m 50s</td>
<td>succeeded</td>
<td>view/restart</td>
</tr>
<tr id="8555-child" class="childRow" style="display: none;"></tr>
</table>
I am not able to get the TRs.
WebElement table = driver.findElement(By.id("execTable"));
List<WebElement> trows = table.findElements(By.tagName("tr"));
// Select the direct children of the table; an XPath attribute test is written @id.
List<WebElement> all = driver.findElements(By.xpath(".//*[@id='execTable']/*"));
for (WebElement a : all) {
    if (a.getTagName().equalsIgnoreCase("tr")) { .... }
}
I was able to get the above code working. Thank you!

Testing kendoGrid data using cucumber capybara

I'm attempting to write some cucumber/capybara tests to validate data in a KendoGrid UI component and am having real trouble determining how to select and validate the data on the page.
I've found the basic tutorials and examples on utilizing cucumber/capybara with table data, but KendoGrid appears to use a slightly different configuration for its tables and data: 1.) there is no "id" to easily select the grid on the page and 2.) there are multiple tables (one for the header and another for the actual data itself).
Here is an excerpt of my current kendoGrid data I want to check:
<div id="item_grid" data-role="grid" class="k-grid k-widget k-secondary" style="">
<div class="k-grid-header" style="padding-right: 17px;">
<div class="k-grid-header-wrap">
<table role="grid">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead>
<tr>
<th role="columnheader" data-field="ItemA" data-title="Item A" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item A</a>
</th>
<th role="columnheader" data-field="ItemB" data-title="Item B" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item B</a>
</th>
<th role="columnheader" data-field="ItemC" data-title="Item C" class="k-header" data-role="sortable">
<a class="k-link" href="#">Item C</a>
</th>
</tr>
</thead>
</table>
</div>
</div>
<div class="k-grid-content">
<table role="grid">
<colgroup>
<col>
<col>
<col>
</colgroup>
<tbody>
<tr data-uid="2c77ea57-50ea-474d-950a-8379b3690936" role="row">
<td role="gridcell">A</td>
<td role="gridcell">223.63</td>
<td role="gridcell">0</td>
</tr>
<tr class="k-alt" data-uid="979534bc-7dea-47e9-9471-088c5bffe5b5" role="row">
<td role="gridcell">B</td>
<td role="gridcell">223.63</td>
<td role="gridcell">180</td>
</tr>
<tr data-uid="4d4c31e7-4daf-44ad-b6c1-20ffdfde57c4" role="row">
<td role="gridcell">C</td>
<td role="gridcell">143.58</td>
<td role="gridcell">0</td>
</tr>
<tr class="k-alt" data-uid="8d315558-b014-4219-b21b-dbe52cc6dd18" role="row">
<td role="gridcell">D</td>
<td role="gridcell">143.58</td>
<td role="gridcell">180</td>
</tr>
</tbody>
</table>
</div>
</div>
Where is the best place to start for writing tests to cover this scenario?
I have done some additional playing with the Telerik Test Studio and testing this specific scenario in that application is extremely easy!
One approach would be to collect the table of data into a 2D array using:
data_rows = page.all(:css, 'div#item_grid div.k-grid-content tr')
data = data_rows.collect do |tr|
  tr.all(:css, 'td').collect(&:text)
end
p data
#=> [["A", "223.63", "0"], ["B", "223.63", "180"], ["C", "143.58", "0"], ["D", "143.58", "180"]]
Then with the data (and assuming you know what data should be in the table), you can validate the data array:
# If you want to validate the entire table and row order matters:
expect(data).to eql([["A", "223.63", "0"], ["B", "223.63", "180"], ["C", "143.58", "0"], ["D", "143.58", "180"]])
# If you want to validate the entire table and row order does not matter:
expect(data).to match_array([["B", "223.63", "180"], ["A", "223.63", "0"], ["D", "143.58", "180"], ["C", "143.58", "0"]])
# If you want to validate a specific row exists:
expect(data).to include(["B", "223.63", "180"])
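For readers more at home in Python, the three RSpec checks above correspond roughly to plain list comparisons. This is only an illustrative sketch: it assumes the grid has already been collected into a list of lists, and the variable names are made up.

```python
# Grid contents as a 2D list, matching the data array collected above.
data = [["A", "223.63", "0"], ["B", "223.63", "180"],
        ["C", "143.58", "0"], ["D", "143.58", "180"]]

expected_in_order = [["A", "223.63", "0"], ["B", "223.63", "180"],
                     ["C", "143.58", "0"], ["D", "143.58", "180"]]
expected_any_order = [["B", "223.63", "180"], ["A", "223.63", "0"],
                      ["D", "143.58", "180"], ["C", "143.58", "0"]]

# Entire table, row order matters: plain equality.
same_order = data == expected_in_order
# Entire table, row order ignored: compare sorted copies.
same_rows = sorted(data) == sorted(expected_any_order)
# A specific row exists: membership test.
has_row = ["B", "223.63", "180"] in data
print(same_order, same_rows, has_row)
```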