Using Playwright, how do I select the next td after a cell with a given inner text?

This one has me baffled. Basically, using Playwright, I'm trying to verify values in a table. Given "Cat", I should check that "Dog" exists in the next cell; or, given "Space", I should check that "Rocket" exists.
I tried
const planet = (await page.locator('tr:has(td.col_d:has-text("Saturn")) >> a')).innerText();
but that didn't work. I thought of grabbing the innerText of all the <td> elements, sticking it into an array, then finding the index of the initial text (Cat) and checking whether the text at the next index is correct (i.e. Dog). Isn't there an easier way that I don't know of yet?
<tbody>
<tr>
<td class="labelCol">Title A</td>
<td class="dataCol col02"><span>
<a href="/00578000000VqXe" title="POS"></a></span>
Data A
</td>
<td class="labelCol">Title X</td>
<td class="dataCol">Data X</td>
</tr>
<tr>
<td class="labelCol">Cat</td>
<td class="dataCol col02">Dog</td>
<td class="labelCol">Saturn</td>
<td class="dataCol">Jupiter</td>
</tr>
<tr>
<td class="labelCol">Blue</td>
<td class="dataCol col02">Red</td>
<td class="labelCol">Reason</td>
<td class="dataCol">Space</td>
<td>Rocket</td>
</tr>
</tbody>

I don't particularly like this, but if you don't care whether they are in the immediately next cell, you could just assert that 'Dog' is to the right of 'Cat' and 'Rocket' is to the right of 'Space', like this:
await expect(page.locator(`td:right-of(:text-is("Cat"))`).first()).toHaveText('Dog');
await expect(page.locator(`td:right-of(:text-is("Space"))`).first()).toHaveText('Rocket');
Or, if Dog needs to immediately follow Cat, you could do something like this:
// Locate the "Cat" cell, then assert on the first td that follows it in the same row
const cat = page.locator(`td:text-is("Cat")`);
await expect(cat.locator(`//following-sibling::td`).first()).toHaveText('Dog');
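To see why the sibling axis gives "the next cell", here is a small JDK-only sketch (no browser, no Playwright) that runs the equivalent XPath, //td[normalize-space(.)='Cat']/following-sibling::td[1], against a cleaned-up XML copy of the question's table. The class NextCell, the method nextCellText and the trimmed markup are illustrative assumptions; normalize-space(.)= only roughly corresponds to :text-is(), and the [1] predicate plays the role of .first().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Checks the "cell immediately to the right of a label" logic with plain XPath (JDK only).
final class NextCell {
    // Assumption: a trimmed, well-formed XML copy of the Cat/Dog table from the question.
    static final String TABLE = "<table><tbody>"
            + "<tr><td class='labelCol'>Cat</td><td class='dataCol col02'>Dog</td>"
            + "<td class='labelCol'>Saturn</td><td class='dataCol'>Jupiter</td></tr>"
            + "<tr><td class='labelCol'>Reason</td><td class='dataCol'>Space</td><td>Rocket</td></tr>"
            + "</tbody></table>";

    // Returns the trimmed text of the td right after the td whose normalized text equals label,
    // or null when there is no such cell. Assumption: label contains no single quote.
    static String nextCellText(String xml, String label) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // following-sibling::td[1] is the nearest td to the right within the same tr
            String expr = "//td[normalize-space(.)='" + label + "']/following-sibling::td[1]";
            Node cell = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
            return cell == null ? null : cell.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextCellText(TABLE, "Cat"));   // Dog
        System.out.println(nextCellText(TABLE, "Space")); // Rocket
    }
}
```

A label in the last column (Jupiter here) has no following td, so the lookup returns null instead of silently picking a cell from another row.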

Related

XPath to use preceding and following sibling in a single statement

I would like to scrape the name and address information between the tag that contains the "Defendant" text and the next heading tag.
My HTML structure is:
<hr>
<H5>Defendant/Respondent Information</H5>
<span class="InfoChargeStatement">(Each Defendant/Respondent is displayed below)</span>
<table>
<tr>
<td><span class="FirstColumnPrompt">Party Type:</span></td><td><span class="Value">Defendant</span><span class="Prompt">Party No.:</span><span class="Value">1</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 1</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">Addr 1</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">city1</span><span class="Prompt">State:</span><span class="Value">aa</span><span class="Prompt">Zip Code:</span><span class="Value">Zip1</span></td>
</tr>
</table>
<hr>
<table>
<tr>
<td><span class="FirstColumnPrompt">Party Type:</span></td><td><span class="Value">Defendant</span><span class="Prompt">Party No.:</span><span class="Value">2</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Name 2</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">Addr2</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">City2</span><span class="Prompt">State:</span><span class="Value">st2</span><span class="Prompt">Zip Code:</span><span class="Value">zip2</span></td>
</tr>
</table>
<hr>
<H5>Related Persons Information</H5>
<span class="InfoChargeStatement">(Each Related person is displayed below)</span>
<table>
<tr>
<td><span class="FirstColumnPrompt">Name:</span></td><td><span class="Value">Unwanted Name</span></td>
</tr>
</table>
<table>
<tr>
<td><span class="FirstColumnPrompt">Address:</span></td><td><span class="Value">un addr</span></td>
</tr>
<tr>
<td><span class="FirstColumnPrompt">City:</span></td><td><span class="Value">Unwanted City</span><span class="Prompt">State:</span><span class="Value">Unwanted city</span><span class="Prompt">Zip Code:</span><span class="Value">12345</span></td>
</tr>
</table>
<table></table>
<hr>
My current XPath captures the first occurrence of Name and Address properly, but when I need to extract multiple occurrences, it also scrapes the information under the unwanted h5 heading.
My current XPath is:
"//*[contains(text(),'Defendant')]//following-sibling::table//span[text()='Name:' or text()='Business or Organization Name:']/ancestor-or-self::td/following-sibling::td//text()"
I tried including preceding-sibling and following-sibling, but nothing gives my expected output.
My current output is:
names - [
Name1,
Name2,
Unwanted Name
]
Expected output is:
[
Name1,
Name2
]
Kindly help.
Try this:
"//H5[contains(text(),'Defendant')]/following-sibling::table[not(preceding-sibling::H5[not(contains(text(),'Defendant'))])]/tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
It first selects the tables that do not have a preceding-sibling::h5 whose text() does not contain 'Defendant', then
selects from those tables the tr whose first td meets your requirement, and finally selects the second td of that tr.
There is no need for the extra double slashes (//), which are bad for performance.
EDIT 1
Since there are more preceding-sibling::h5 elements than the example shows, this XPath handles that case:
"//H5[contains(text(),'Defendant')]/following-sibling::table[preceding-sibling::H5[1][contains(text(),'Defendant')]]//tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
This selects only those tables whose first preceding-sibling::h5 is the same h5 we are interested in.
EDIT 2
Actually, the initial h5 selection is now redundant. This XPath will do:
"//table[preceding-sibling::H5[1][contains(text(),'Defendant')]]//tr[td[1][span[text()[.='Name:' ]]]]/td[2]/span/text()"
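A quick way to compare the expressions is to run them with the JDK's built-in XPath 1.0 engine. The sketch below assumes a shortened, well-formed XML copy of the question's page (only the Name rows are kept and <hr> is written as <hr/>); DefendantNames and texts are names made up for the example. Under these assumptions, the question's XPath returns all three names, while the EDIT 2 XPath returns only the two defendants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Runs the question's XPath and the answer's EDIT 2 XPath against the same trimmed page.
final class DefendantNames {
    // Assumption: shortened, well-formed XML copy of the question's HTML (only the Name rows kept).
    static final String PAGE = "<body><hr/>"
            + "<H5>Defendant/Respondent Information</H5>"
            + "<table><tr><td><span class='FirstColumnPrompt'>Name:</span></td>"
            + "<td><span class='Value'>Name 1</span></td></tr></table>"
            + "<hr/>"
            + "<table><tr><td><span class='FirstColumnPrompt'>Name:</span></td>"
            + "<td><span class='Value'>Name 2</span></td></tr></table>"
            + "<hr/>"
            + "<H5>Related Persons Information</H5>"
            + "<table><tr><td><span class='FirstColumnPrompt'>Name:</span></td>"
            + "<td><span class='Value'>Unwanted Name</span></td></tr></table>"
            + "</body>";

    // The XPath from the question: it also reaches the tables after the second H5.
    static final String QUESTION = "//*[contains(text(),'Defendant')]//following-sibling::table"
            + "//span[text()='Name:' or text()='Business or Organization Name:']"
            + "/ancestor-or-self::td/following-sibling::td//text()";

    // EDIT 2 from the answer: keep only tables whose nearest preceding H5 mentions 'Defendant'.
    static final String EDIT2 = "//table[preceding-sibling::H5[1][contains(text(),'Defendant')]]"
            + "//tr[td[1][span[text()[.='Name:']]]]/td[2]/span/text()";

    // Evaluates an XPath that selects text nodes and returns their values in document order.
    static List<String> texts(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(texts(PAGE, QUESTION)); // [Name 1, Name 2, Unwanted Name]
        System.out.println(texts(PAGE, EDIT2));    // [Name 1, Name 2]
    }
}
```

The key step is preceding-sibling::H5[1]: on a reverse axis, [1] is the nearest preceding H5, so each table is attributed to the heading directly above it.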

How can I get the XPath of elements under all rows of the same rowspan?

Test data:
<table>
<tbody>
<tr>
<td id="mainfield:1" rowspan="3">A1</td>
<td ><span class="searching_for_this"> AA1</span></td>
<td ><span class="not_searching_for_this">AA2</span></td>
</tr>
<tr>
<td ><span class="searching_for_this"> AA3 </span></td>
<td ><span class="not_searching_for_this">AA3 </span></td>
</tr>
<tr>
<td ><span class="searching_for_this"> AA1 </span></td>
<td ><span class="not_searching_for_this">AA4 </span></td>
</tr>
<tr>
<td id="main_field:2" rowspan="3">B1</td>
<td ><span class="searching_for_this"> BB1</span></td>
<td ><span class="not_searching_for_this">BB2</span></td>
</tr>
<tr>
<td ><span class="searching_for_this"> AA1 </span></td>
<td ><span class="not_searching_for_this">BB3 </span></td>
</tr>
<tr>
<td ><span class="searching_for_this"> BB2 </span></td>
<td ><span class="not_searching_for_this">BB3 </span></td>
</tr>
</tbody>
</table>
Premises
I know the content of the cell that has rowspan="3" (in this example, A1).
I know the content of one element of the class I want to look for; in this scenario, AA1 and searching_for_this.
I want to get the rows (tr) containing AA1 under the rowspan of A1, so the result would be the first and third rows.
First try
So, in a single-row scenario, this would be something like:
Main row: //tr[td[contains(text(), 'A1')]]
Search in the children from the row (relative search .//):
.//tr[td/span[@class='searching_for_this' and contains(text(), 'AA1')]]
Problem
With this rowspan scenario, I don't know how to get all the elements in the "next rows" covered by the rowspan without including the rows outside it (B1).
Update
After the last answer I tried to build on it, but I'm still not able to get the rows under the main rowspan row and combine that query with the main row. This was my attempt:
$x("//tr[ (preceding-sibling::tr[ .//td[ contains(@id, 'main_field')]])[1][.//td[contains(text(),'A1')]] ]")
I tried to get all tr elements that have a preceding-sibling tr with the given known partial id, take the first one of that list with [1] (the direct sibling with the given id), and then filter by the content A1. But I do not get anything.
If you want to do this in a single expression in XPath 1.0, it gets a bit complex. You could approach it like this, building it piece by piece.
As a starting point, here's how you select your "main row":
//tr[td[contains(text(),'A1')]]
Building on this, you can select the following rows within the same rowspan:
//tr[td[contains(text(),'A1')]]
/following-sibling::tr[
position() < number(preceding-sibling::tr/td[contains(text(),'A1')]/@rowspan)
]
However, this does not include the "main row" itself. To get it as well, you can take the union of both of the above with the union operator (|), so you get both the main row and the following rows that fall within the same rowspan:
(//tr[td[contains(text(),'A1')]]
|//tr[td[contains(text(),'A1')]]
/following-sibling::tr[
position() < number(preceding-sibling::tr/td[contains(text(),'A1')]/@rowspan)
]
)
Now that you have the set of rows of interest at hand, you can further narrow down to the rows that you want within that set, e.g.:
(//tr[td[contains(text(),'A1')]]
|//tr[td[contains(text(),'A1')]]
/following-sibling::tr[
position() < number(preceding-sibling::tr/td[contains(text(),'A1')]/@rowspan)
]
)[td/span[@class='searching_for_this' and contains(text(), 'AA1')]]
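The final expression can be checked the same way with the JDK's XPath 1.0 engine. This is a sketch on a copy of the test data; RowspanRows and rowNumbers are hypothetical helpers, and each matched row's position is computed with count(preceding-sibling::tr) + 1. The AA1 in the fifth row sits under B1, so it should not appear in the result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Checks the answer's final XPath 1.0 expression against the question's test data.
final class RowspanRows {
    // Copy of the test data (ids and classes unchanged, whitespace inside tags dropped).
    static final String TABLE = "<table><tbody>"
            + "<tr><td id='mainfield:1' rowspan='3'>A1</td>"
            + "<td><span class='searching_for_this'> AA1</span></td>"
            + "<td><span class='not_searching_for_this'>AA2</span></td></tr>"
            + "<tr><td><span class='searching_for_this'> AA3 </span></td>"
            + "<td><span class='not_searching_for_this'>AA3 </span></td></tr>"
            + "<tr><td><span class='searching_for_this'> AA1 </span></td>"
            + "<td><span class='not_searching_for_this'>AA4 </span></td></tr>"
            + "<tr><td id='main_field:2' rowspan='3'>B1</td>"
            + "<td><span class='searching_for_this'> BB1</span></td>"
            + "<td><span class='not_searching_for_this'>BB2</span></td></tr>"
            + "<tr><td><span class='searching_for_this'> AA1 </span></td>"
            + "<td><span class='not_searching_for_this'>BB3 </span></td></tr>"
            + "<tr><td><span class='searching_for_this'> BB2 </span></td>"
            + "<td><span class='not_searching_for_this'>BB3 </span></td></tr>"
            + "</tbody></table>";

    // Main row | following rows still inside its rowspan, then filtered by the wanted span.
    static final String ROWS = "(//tr[td[contains(text(),'A1')]]"
            + "|//tr[td[contains(text(),'A1')]]/following-sibling::tr["
            + "position() < number(preceding-sibling::tr/td[contains(text(),'A1')]/@rowspan)])"
            + "[td/span[@class='searching_for_this' and contains(text(), 'AA1')]]";

    // Returns the 1-based position of every matched tr among its sibling rows.
    static List<Integer> rowNumbers(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Double n = (Double) xp.evaluate("count(preceding-sibling::tr) + 1",
                        rows.item(i), XPathConstants.NUMBER);
                out.add(n.intValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rowNumbers(TABLE, ROWS)); // [1, 3]
    }
}
```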

Iterate with v-for and a data attribute

I have a vuejs-datatable, and now I want to have an options column with edit/delete links.
This is the table body, which is iterated over the result of the function get_rows():
<tbody>
<tr v-for="(row, idr) in get_rows()" v-bind:key="idr">
<td>{{row.id}}</td>
<td>{{row.email}}</td>
<td>
<b-icon-pencil-square></b-icon-pencil-square>
<b-icon-trash></b-icon-trash>
</td>
</tr>
</tbody>
The tds with {{row.id}} and {{row.email}} are fine. However, :data-id="row.id" shows only the id of the first entry: the links in every row of my table have the same data-id. I do not understand why this is happening or what I am doing wrong.
Use the code below (note that it does not use data-id):
<tbody>
<tr v-for="(row, idr) in get_rows()" v-bind:key="idr">
<td>{{row.id}}</td>
<td>{{row.email}}</td>
<td>
<b-icon-pencil-square></b-icon-pencil-square>
<b-icon-trash></b-icon-trash>
</td>
</tr>
</tbody>

Extracting data from table with Scrapy

I have this table:
<table class="specs-table">
<tbody>
<tr>
<td colspan="2" class="group">Sumary</td>
</tr>
<tr>
<td class="specs-left">Name</td>
<td class="specs-right">ROG GL552JX </td>
</tr>
<tr class="noborder-bottom">
<td class="specs-left">Category</td>
<td class="specs-right">Gaming </td>
</tr>
<tr>
<td colspan="2" class="group">Technical Details</td>
</tr>
<tr>
<td class="specs-left">Name</td>
<td class="specs-right">Asus 555 </td>
</tr>
<tr>
<td class="specs-left">Resolution </td>
<td class="specs-right">1920 x 1080 pixels </td>
</tr>
<tr class="noborder-bottom">
<td class="specs-left"> Processor </td>
<td class="specs-right"> 2.1 GHz </td>
</tr>
</tbody>
</table>
From this table, I want my Scrapy spider to find the first occurrence of the text "Name" and copy the value from the next cell (in this case "ROG GL552JX"), then find the next occurrence of the text "Name" and copy the value "Asus 555".
The result I need:
'Name': [u'ROG GL552JX'],
'Name': [u'Asus 555'],
The problem is that this table has two occurrences of the text "Name", and Scrapy copies the values of both occurrences.
My result is:
'Name': [u'ROG GL552JX', u'Asus 555'],
My bot:
def parse(self, response):
    next_selector = response.xpath('//*[@aria-label="Pagina urmatoare"]//@href')
    for url in next_selector.extract():
        yield Request(urlparse.urljoin(response.url, url))
    item_selector = response.xpath('//*[contains(@class, "pb-name")]//@href')
    for url in item_selector.extract():
        yield Request(urlparse.urljoin(response.url, url), callback=self.parse_item)

def parse_item(self, response):
    l = ItemLoader(item=PcgItem(), response=response)
    l.add_xpath('Name', '//tr/td[contains(text(), "Name")]/following-sibling::td/text()', MapCompose(unicode.strip, unicode.title))
    return l.load_item()
How can I solve this problem?
Thank you
If you need one item per Name, then you should do something like:
for sel in response.xpath('//tr/td[contains(text(), "Name")]/following-sibling::td/text()'):
    l = ItemLoader(...)
    # sel is a single Selector here, so extract() returns its text
    l.add_value('Name', sel.extract())
    ...
    yield l.load_item()
If instead you want everything inside one item, I would recommend leaving it as it is (a list), because a scrapy.Item behaves like a dictionary, so it cannot have two Name keys.
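The "one item per Name" idea does not depend on Scrapy: select each "Name" label cell, then take its nearest right-hand neighbor relative to that cell. Below is a rough equivalent using the JDK's XPath API on a trimmed copy of the specs table; a Map stands in for the scrapy Item, and NamePerItem and items are illustrative names only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Builds one record per "Name" label instead of one list holding every Name value.
final class NamePerItem {
    // Assumption: trimmed, well-formed copy of the specs table from the question.
    static final String SPECS = "<table class='specs-table'><tbody>"
            + "<tr><td colspan='2' class='group'>Sumary</td></tr>"
            + "<tr><td class='specs-left'>Name</td><td class='specs-right'>ROG GL552JX </td></tr>"
            + "<tr><td class='specs-left'>Category</td><td class='specs-right'>Gaming </td></tr>"
            + "<tr><td colspan='2' class='group'>Technical Details</td></tr>"
            + "<tr><td class='specs-left'>Name</td><td class='specs-right'>Asus 555 </td></tr>"
            + "</tbody></table>";

    static List<Map<String, String>> items(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Every label cell whose text contains "Name"
            NodeList labels = (NodeList) xp.evaluate("//tr/td[contains(text(), 'Name')]",
                    doc, XPathConstants.NODESET);
            List<Map<String, String>> items = new ArrayList<>();
            for (int i = 0; i < labels.getLength(); i++) {
                // Evaluated relative to this label only, so each record gets exactly one value
                String value = xp.evaluate("following-sibling::td[1]", labels.item(i)).trim();
                Map<String, String> item = new LinkedHashMap<>();
                item.put("Name", value);
                items.add(item);
            }
            return items;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(items(SPECS)); // [{Name=ROG GL552JX}, {Name=Asus 555}]
    }
}
```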

Can we use Selenium when a table does not have proper HTML, like the one shown below?

Here is the table I am using. I want to get the table row that contains a specific element, such as the href with 'Harvest' in its text, and also check whether the text 'running' exists in the same table row.
<table id="execTable" class="tableHistory jobtable translucent">
<colgroup>
<col class="execid">
<col class="titlecol">
</colgroup>
<tbody>
<tr>
<th>Id</th>
<th>Name</th>
</tr>
</tbody>
<tr id="8571">
<td>8571</td>
<td class="titlecol">
<div id="hitdiv-8571" class="arrow"></div>
Harvest
</td>
<td>09-03-2015 09:45:04</td>
<td>-</td>
<td>2m 6s</td>
<td>running</td>
<td>view/restart</td>
</tr>
<tr id="8571-child" class="childRow" style="display: none;"></tr>
<tr id="8566">
<td>8566</td>
<td class="titlecol">
<div id="hitdiv-8566" class="arrow"></div>
mk
</td>
<td>09-03-2015 03:30:00</td>
<td>09-03-2015 04:16:50</td>
<td>46m 50s</td>
<td>succeeded</td>
<td>view/restart</td>
</tr>
<tr id="8555-child" class="childRow" style="display: none;"></tr>
</table>
I am not able to get the TRs.
WebElement table = driver.findElement(By.id("execTable"));
List<WebElement> trows = table.findElements(By.tagName("tr"));
List<WebElement> all = driver.findElements(By.xpath(".//*[@id='execTable']/*"));
for (WebElement a : all) {
    if (a.getTagName().equalsIgnoreCase("tr")) { ....}
}
I was able to get the above code working. Thank you!
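For the original goal (the row that mentions 'Harvest' and also has 'running'), a single XPath predicate can test both cells of the same tr. The sketch below checks such an expression with the JDK's XPath engine on a well-formed copy of the table; with Selenium the same string could be passed to driver.findElements(By.xpath(...)), but that part is not shown here. Worth knowing: an HTML parser wraps a <tr> that sits directly inside <table> in an implied <tbody>, so a descendant step (//tr) is safer than a child step (/tr) in a browser. RunningRow, rowXPath and rowIds are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Finds the id of each row whose cells include the given title text and the given status.
final class RunningRow {
    // Assumption: well-formed XML copy of the question's table (col elements made self-closing).
    static final String TABLE = "<table id='execTable' class='tableHistory jobtable translucent'>"
            + "<colgroup><col class='execid'/><col class='titlecol'/></colgroup>"
            + "<tbody><tr><th>Id</th><th>Name</th></tr></tbody>"
            + "<tr id='8571'><td>8571</td>"
            + "<td class='titlecol'><div id='hitdiv-8571' class='arrow'></div> Harvest </td>"
            + "<td>09-03-2015 09:45:04</td><td>-</td><td>2m 6s</td><td>running</td><td>view/restart</td></tr>"
            + "<tr id='8571-child' class='childRow' style='display: none;'></tr>"
            + "<tr id='8566'><td>8566</td>"
            + "<td class='titlecol'><div id='hitdiv-8566' class='arrow'></div> mk </td>"
            + "<td>09-03-2015 03:30:00</td><td>09-03-2015 04:16:50</td><td>46m 50s</td>"
            + "<td>succeeded</td><td>view/restart</td></tr>"
            + "<tr id='8555-child' class='childRow' style='display: none;'></tr>"
            + "</table>";

    // Both conditions sit in one predicate, so they must hold for cells of the same tr.
    // Assumption: title and status contain no single quotes.
    static String rowXPath(String title, String status) {
        return "//table[@id='execTable']//tr[td[contains(normalize-space(.), '" + title + "')]"
                + " and td[normalize-space(.)='" + status + "']]";
    }

    static List<String> rowIds(String xml, String title, String status) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xp.evaluate(rowXPath(title, status), doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                out.add(xp.evaluate("@id", rows.item(i))); // string value of the row's id attribute
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rowIds(TABLE, "Harvest", "running")); // [8571]
    }
}
```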