Find the first element matching conditions with preceding-sibling - selenium

My goal is to get the text contained in the first matching preceding sibling.
My XPath is the following: //*[@id='myid']/parent::td(/preceding-sibling::td/label/a)[1]
Here I want the text of the td/label/a closest to the parent of //*[@id='myid'].
Can you tell me if you see any mistake, please? Thank you very much.
Update 1
This one works: //*[@id='myid']/parent::td/preceding-sibling::td[2]/label/a, but the index is not always 2; sometimes it can be 6 or something else.
Update 2
Example 1
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>
<label>
<a>text</a>
</label>
</td>
<td></td>
<td>
<input id='myid'>
</td>
</tr>
</tbody>
Example 2
<tbody>
<tr>
<td></td>
<td>
<label>
<a>text</a>
</label>
</td>
<td></td>
<td></td>
<td></td>
<td>
<input id='myid'>
</td>
</tr>
</tbody>
Example 3
<tbody>
<tr>
<td></td>
<td>
<label>
<a>text</a>
</label>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<input id='myid'>
</td>
</tr>
</tbody>

I don't think you'd need to go up to any ancestors; just go to the first preceding td (that contains a label/a)...
//input[@id='myid']/preceding::td[label/a][1]/label/a
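To see what that expression selects, here is a minimal sketch in Python that emulates it with the standard library only. xml.etree.ElementTree does not support the preceding axis, so the loop walks the tree in document order instead; the function name nearest_preceding_link_text and the self-closed <input/> tags (needed for well-formed XML) are assumptions made for illustration, not part of the question.

```python
import xml.etree.ElementTree as ET

# Examples 1 and 3 from the question, with <input> self-closed so they parse as XML.
EXAMPLE_1 = (
    "<tbody><tr><td/><td/><td/>"
    "<td><label><a>text</a></label></td>"
    "<td/><td><input id='myid'/></td></tr></tbody>"
)
EXAMPLE_3 = (
    "<tbody>"
    "<tr><td/><td><label><a>text</a></label></td><td/></tr>"
    "<tr><td/><td/><td><input id='myid'/></td></tr>"
    "</tbody>"
)


def nearest_preceding_link_text(xml_text, target_id="myid"):
    """Emulate //input[@id=...]/preceding::td[label/a][1]/label/a (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    target = next((el for el in root.iter() if el.get("id") == target_id), None)
    if target is None:
        return None

    # The preceding axis excludes ancestors, so collect them first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ancestors = set()
    node = target
    while node in parent_of:
        node = parent_of[node]
        ancestors.add(node)

    nearest = None
    for el in root.iter():  # iter() yields elements in document order
        if el is target:
            break
        if el.tag == "td" and el not in ancestors:
            link = el.find("label/a")
            if link is not None:
                nearest = link.text  # keep overwriting: the last hit is the closest one
    return nearest


print(nearest_preceding_link_text(EXAMPLE_1))
print(nearest_preceding_link_text(EXAMPLE_3))
```

Because the walk is over the whole document rather than over siblings, the same call also covers Example 3, where the label sits in an earlier row.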

You can use the XPath below. It gets the first ancestor tr and its descendant label/a:
//input[@id='myid']/ancestor::tr[1]//label/a
One more way is to use the union operator in XPath, with locators for all the cases, in the order you need:
//input[@id='myid']/ancestor::td/preceding-sibling::td/label/a | //input[@id='myid']/ancestor::tr[1]/preceding-sibling::tr//label/a

It seems the table content is dynamic, as I can see in your examples, but there must be a single <tbody> for that table.
(//input[@id='myid']/preceding::td/label/a)[last()]

Which could be the best Xpath to Get text when there is a text on both <td>

I have the following XML, which I need to scrape:
<div class="tab_product_details">
<table>
<tbody>
<tr>...</tr>
<tr>...</tr>
<tr>...</tr>
<tr>
<td> text 1 </td>
<td> text 2 </td>
</tr>
<tr>
<td colspan = "2">
<table>
<tbody>
<tr>
<td> Adjustment</td> <!-- this text I do not need! -->
<td></td>
</tr>
<tr class="feature">
<td> text3</td>
<td> text4 </td>
</tr>
My actual XPath is the following:
text1 = response.xpath('//div[contains(@class,"tab_product_details")]//td[following-sibling::td[not(table)]]')
text2 = response.xpath('//div[contains(@class,"tab_product_details")]//td[2]')
But I keep getting the texts that have no pair.
Any help is very welcome, thanks!
If you want to get the text only when both cells (td) are non-empty, try:
//div[@class = "tab_product_details"]//tr[count(./td[normalize-space()]) = 2]/td[1]/text()
//div[@class = "tab_product_details"]//tr[count(./td[normalize-space()]) = 2]/td[2]/text()
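As a rough check of that row filter, here is a minimal sketch in Python using only the standard library. ElementTree cannot evaluate count() or normalize-space(), so the same test is written out by hand; the SAMPLE markup is a trimmed, well-formed version of the question's table and paired_cells is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the markup in the question (an assumption for this sketch).
SAMPLE = """
<div class="tab_product_details">
  <table><tbody>
    <tr><td> text 1 </td><td> text 2 </td></tr>
    <tr><td colspan="2">
      <table><tbody>
        <tr><td> Adjustment</td><td></td></tr>
        <tr class="feature"><td> text3</td><td> text4 </td></tr>
      </tbody></table>
    </td></tr>
  </tbody></table>
</div>
""".strip()


def paired_cells(xml_text):
    """Return (first, second) cell text for every tr with exactly two non-empty td children."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tr in root.iter("tr"):
        # Like normalize-space() on an element: take the full string value,
        # then trim it and collapse runs of whitespace.
        texts = [" ".join("".join(td.itertext()).split()) for td in tr.findall("td")]
        non_empty = [t for t in texts if t]
        if len(non_empty) == 2:
            pairs.append(tuple(non_empty))
    return pairs


for first, second in paired_cells(SAMPLE):
    print(first, "|", second)
```

The "Adjustment" row is skipped because its second cell is empty, and the outer row holding the nested table is skipped because it has only one td child.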

What are the currently supported CSS selectors available to VBA?

Back on May 19th 2021, I wrote this Q&A regarding recent (Apr-May-21) suspected changes to an interface in relation to mshtml.dll and late bound referencing. This is a part 2, if you will.
Previously, in questions such as this and this, I have remarked upon the lack of support for various CSS selectors with mshtml.dll, in particular regarding pseudo-classes. In the aforementioned questions, I highlighted that nth-child() and nth-of-type() were not implemented with respect to MSHTML.
Typically, as demonstrated here, unsupported selector syntax can result in:
Run-time error '-2140143604 (8070000c)': Could not complete the
operation due to error 8070000c.
I expect some things to break as various versions/platforms are no longer supported in relation to Internet Explorer (IE), to which MSHTML is related. What I did not expect to find was a recent improvement in supported CSS selectors. Take the following example:
Option Explicit
'' Required references:
''   Microsoft HTML Object Library
Public Sub CssTest()
    Const URL = "https://books.toscrape.com/"
    Dim html As MSHTML.HTMLDocument
    Set html = New MSHTML.HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With
    ' Print the second <meta> element of its type
    Debug.Print html.querySelector("meta:nth-of-type(2)").outerHTML
End Sub
Prior to Apr-May'21, this would have errored out due to the use of non-implemented syntax.
Now, on my set-up, where I saw an update to mshtml.dll in early May at the latest, I get the same result as if I had run this via an automated Internet Explorer instance, where it was already supported:
<meta name="created" content="24th Jun 2016 09:29">
So, what are the currently supported CSS selectors available to VBA?
I have covered the 'why do we care?' in the previous Q&A so won't repeat here. I will however, re-state my set-up:
My set-up:
OS Name Microsoft Windows 10 Pro
Version 10.0.19042 Build 19042
System Type x64-based PC
Microsoft® Excel® 2019 MSO (16.0.13929.20206) 32-bit (Microsoft Office Professional Plus)
Version 2104 Build 13929.20373
mshtml.dll file 11.00.19041.985
ieframe.dll file 11.0.19041.964
Feedback:
As with the prior Q&A, any feedback on set-ups which do/do not see these changes I would appreciate. I will add feedback to this for others to be able to reference.
tl;dr;
There is much greater support for CSS selectors and for Element.querySelector (allowing for greater flexibility in chaining querySelector(All) calls). This enormously enhances the expressivity of the MSHTML class, in terms of CSS selectors, and brings it on par with Selenium Basic.
Motivation:
I have been wanting to write a list of supported selectors for some time, due to the lack of documentation on this in relation to VBA, and the trial and error nature of learning what does and doesn't work. This latest change has spurred me to do so, and include those libraries which currently support use of CSS selectors within them.
CAVEATS:
This is not exhaustive; it is pretty comprehensive.
Should you find any errors, particularly with respect to Selenium Basic, which I had to write from memory, please notify me and I will edit accordingly.
The recent changes, represented by shaded cells within the summary table (JSFiddle) and marked with ✔* within the simplified table below, are as they pertain to my set-up at this point in time. Your mileage may vary, e.g. CSS selectors were not supported at all before IE8.
Before and After:
Traditionally, the expressivity of CSS selectors within VBA varied with the library supporting them, with Selenium implementing, by far, the most CSS selectors.
Current state:
I believe the current state of implemented selectors to be as follows (please see the JSFiddle for the clearest table view).
I include this as a simplified HTML table so you can follow the hyperlinks. Apologies, the table is large and I haven't even covered all conceivable selectors, only the main ones I consider likely to be frequently of use. Inserting a fancy table threw me over the body character limit, so here we are. For a fancy table please see the JSFiddle, where the newly supported selectors are shaded.
<!DOCTYPE html>
<html>
<head>
<title>VBA: Valid CSS Selectors 2021-05-30</title>
</head>
<body>
<h1>VBA: Valid CSS Selectors 2021-05-30</h1>
<table>
<tr>
<td colspan="2">
Selectors Level 3 Specification
</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Pattern</td>
<td>Represents</td>
<td>Description</td>
<td>Level</td>
<td>Microsoft HTML Object Library (MSHTML)</td>
<td>Microsoft Internet Explorer Controls (SHDocVw)</td>
<td>Selenium Type Library (Selenium)</td>
<td>Remarks</td>
</tr>
<tr>
<td>*</td>
<td>any element</td>
<td>
Universal selector
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E</td>
<td>an element of type E</td>
<td>
Type selector
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo]</td>
<td>an E element with a "foo" attribute</td>
<td>
Attribute selectors
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo="bar"]</td>
<td>an E element whose "foo" attribute value is exactly equal to "bar"</td>
<td>
Attribute selectors
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo~="bar"]</td>
<td>an E element whose "foo" attribute value is a list of whitespace-separated values, one of which is exactly equal to "bar"</td>
<td>
Attribute selectors
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo^="bar"]</td>
<td>an E element whose "foo" attribute value begins exactly with the string "bar"</td>
<td>
Attribute selectors
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo$="bar"]</td>
<td>an E element whose "foo" attribute value ends exactly with the string "bar"</td>
<td>
Attribute selectors
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo*="bar"]</td>
<td>an E element whose "foo" attribute value contains the substring "bar"</td>
<td>
Attribute selectors
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E[foo|="en"]</td>
<td>an E element whose "foo" attribute has a hyphen-separated list of values beginning (from the left) with "en"</td>
<td>
Attribute selectors
</td>
<td>2</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td> </td>
</tr>
<tr>
<td>E[attr operator value i]</td>
<td>value compared case-insensitively (ASCII range).</td>
<td>
Attribute selectors
</td>
<td>4</td>
<td>x</td>
<td>x</td>
<td>?</td>
<td>
i identifier
</td>
</tr>
<tr>
<td>E[attr operator value s]</td>
<td>value compared case-sensitively (ASCII range).</td>
<td>
Attribute selectors
</td>
<td>4</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>
s identifier
</td>
</tr>
<tr>
<td>E:root</td>
<td>an E element, root of the document</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td>HTML node only</td>
</tr>
<tr>
<td>E:nth-child(n)</td>
<td>an E element, the n-th child of its parent</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td>nth-child(odd) and (even) as well as nth-child(range) also supported</td>
</tr>
<tr>
<td>E:nth-last-child(n)</td>
<td>an E element, the n-th child of its parent, counting from the last one</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:nth-of-type(n)</td>
<td>an E element, the n-th sibling of its type</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:nth-last-of-type(n)</td>
<td>an E element, the n-th sibling of its type, counting from the last one</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:first-child</td>
<td>an E element, first child of its parent</td>
<td>
Structural pseudo-classes
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:last-child</td>
<td>an E element, last child of its parent</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:first-of-type</td>
<td>an E element, first sibling of its type</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:last-of-type</td>
<td>an E element, last sibling of its type</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:only-child</td>
<td>an E element, only child of its parent</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:only-of-type</td>
<td>an E element, only sibling of its type</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:empty</td>
<td>an E element that has no children (including text nodes)</td>
<td>
Structural pseudo-classes
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:link</td>
<td rowspan="2">an E element being the source anchor of a hyperlink of which the target is not yet visited (:link) or already visited (:visited)</td>
<td rowspan="2">
The link pseudo-classes
</td>
<td>1</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:visited</td>
<td>1</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E:not(s)</td>
<td>an E element that does not match simple selector s</td>
<td>
Negation pseudo-class
</td>
<td>3</td>
<td>✔*</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E F</td>
<td>an F element descendant of an E element</td>
<td>
Descendant combinator
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E > F</td>
<td>an F element child of an E element</td>
<td>
Child combinator
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E + F</td>
<td>an F element immediately preceded by an E element</td>
<td>
Next-sibling combinator
</td>
<td>2</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>E ~ F</td>
<td>an F element preceded by an E element</td>
<td>
Subsequent-sibling combinator
</td>
<td>3</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td>foo, bar</td>
<td>foo, bar will match both &lt;foo&gt; and &lt;bar&gt; elements.</td>
<td>
Selector list
</td>
<td>1</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td> </td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>element.querySelector</td>
<td>Expanded element.querySelector</td>
<td>
Element.querySelector
</td>
<td>API</td>
<td>✔</td>
<td>✔</td>
<td>✔</td>
<td>Can now chain querySelector(All) calls on wider base node range</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Lib info:</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>Microsoft HTML Object Library (MSHTML)</td>
<td>MS Internet Explorer Controls (SHDocVw)</td>
<td>Selenium Type Library (Chromedriver)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Lib</td>
<td>mshtml.dll</td>
<td>ieframe.dll</td>
<td>selenium.dll</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>File Version</td>
<td>11.00.19041.985</td>
<td>11.0.19041.964</td>
<td>2.0.9.0</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Date</td>
<td>2021-05-12</td>
<td>2021-05-12</td>
<td>2016-03-02</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
</body>
</html>
12 newly supported pseudo-classes and an expanded Element.querySelector:
In the table above, you will see there are now at least 12 newly supported pseudo-classes, as well as mention of the expanded Element.querySelector. Bam, kapow, ker-sploosh, shut the proverbial front door ... welcome to VBA CSS Canaan, Scraper's Shangri-la, Nerd Nirvana!
I think there may also have been interesting updates to ieframe.dll; the focus here is on recent mshtml.dll changes. You may wish to review the IE support under the Lifecycle announcements here and here, or search for Lifecycle FAQ - Internet Explorer and Microsoft Edge.
As the benefit of the expanded Element.querySelector() was not covered in the last Q&A, I will briefly mention it here. By expanded, I mean an increased number of elements on which you can call querySelector, such that you can chain .querySelector() calls, i.e. .querySelector(..).querySelector(..) and .querySelector(..).querySelectorAll(..).
Previously, this was largely not possible, as exemplified by this question. Typically, the workaround was to chain traditional methods onto the returned node, e.g. html.querySelector("body").getElementsByTagName("li").
This led to unsightly, hard-to-follow chaining and limited paths to target elements. Better, IMHO, was the idea of a surrogate MSHTML.HTMLDocument variable, which would carry the innerHTML of the current node returned by querySelector and thus allow you to call querySelector(All) again, thereby gaining access to much faster matching, clearer syntax and greater versatility. Numerous examples of that approach here.
End Notes:
This is a document under revision. All feedback on improvements welcomed.
Thanks:
Finally, a big thanks to @SIM for running a test script of mine to examine this on a different set-up.

Finding second table using HTMLAgilityPack

I am trying to identify the second table using HTMLAgilityPack
<center>
<table>
<tr>
<td>0-A</td> <td>&nbsp| </td>
<td>B</td> <td>&nbsp| </td>
<td>C</td> <td>&nbsp| </td>
</tr>
</table>
</center>
<br><br>
<TABLE DIR=LTR BORDER>
<TR>
<TD DIR=LTR ALIGN=RIGHT><b>A</b></TD>
<TD DIR=LTR ALIGN=LEFT><b>B</b></TD>
<TD DIR=LTR ALIGN=LEFT><b>C</b></TD>
</TR>
</TABLE>
I have tried
Dim table = doc.DocumentNode.SelectSingleNode("//table[2]")
and that does not work. I have also tried it in uppercase, and that does not work either. If I put "//table[1]" I do find the first table. Is there a different way that I should do this? I'm doing this in VB.NET.
As additional information, when I do
For Each x_table As HtmlNode In doc.DocumentNode.SelectNodes("//table")
it finds two tables and I can skip the first and work on the second, but is that the way it was designed to work?
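For what it is worth, in XPath 1.0 the expression //table[2] is shorthand for /descendant-or-self::node()/child::table[2], so the [2] counts tables among the children of one parent rather than across the whole document. Here the first table sits inside <center> and the second is a sibling of <center>, so each is the first table under its own parent. The standard XPath way to index across the document is (//table)[2]; that HtmlAgilityPack evaluates it the same way is an assumption here. The minimal Python sketch below shows the same per-parent counting with the standard library's ElementTree, on a simplified, well-formed copy of the markup (lowercase tags, attributes and entities dropped).

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page in the question (an assumption).
PAGE = """
<body>
  <center>
    <table><tr><td>0-A</td><td>B</td><td>C</td></tr></table>
  </center>
  <table><tr><td>A</td><td>B</td><td>C</td></tr></table>
</body>
""".strip()

root = ET.fromstring(PAGE)

# Position predicate: "the second table child of its own parent". Nothing qualifies.
second_among_siblings = root.findall(".//table[2]")

# All tables in document order, then pick the second one.
all_tables = root.findall(".//table")
second_in_document = all_tables[1]
first_cell = second_in_document.find("tr/td").text

print(len(second_among_siblings), len(all_tables), first_cell)
```

This is the same idea as looping over SelectNodes("//table") and skipping the first result: collect the tables in document order, then index into that list.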

how to check if a tag is the last one

I am using Selenium and Java to write a test. I have the DOM below:
<tbody>
<tr>
<th>Copy</th>
<th>Subfield</th>
<th>Subfield Border</th>
<th>Field</th>
<th>Field Border</th>
</tr>
<tr id="333877">
<td>
<input type="checkbox" checked="" class="copySubfieldBorderCheck"/>
</td>
<td>a</td>
<td class="s">No</td>
<td>c</td>
<td>Yes</td>
</tr>
<tr>
<th>as</th>
<th>er</th>
<th>df</th>
<th>xc</th>
<th>xc</th>
</tr>
<tr id="333877">
<td>
<input type="checkbox" checked="" class="copySubfieldBorderCheck"/>
</td>
<td>rt</td>
<td class="noBorderBoldRed">Yes</td>
<td>ff</td>
<td>sdf</td>
</tr>
I want to get the tr that has a td whose text is No and whose LAST td has the text Yes.
I am looking for something like this:
//tr[./td[text()='No'] and ./td[text()='Yes' and isLast()]]
An easy way would be to concatenate the 3rd and the last cell and then filter on the text NoYes:
//tr[concat(td[3], td[last()])='NoYes']
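To compare the two conditions side by side, here is a minimal sketch in Python using only the standard library. It is not Selenium code; it only mirrors the row logic on a well-formed copy of the DOM above. The helper names are hypothetical, and strip() is slightly more lenient than the exact text()='No' comparison.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the DOM from the question, closed with </tbody>.
ROWS = """
<tbody>
  <tr><th>Copy</th><th>Subfield</th><th>Subfield Border</th><th>Field</th><th>Field Border</th></tr>
  <tr id="333877">
    <td><input type="checkbox" checked="" class="copySubfieldBorderCheck"/></td>
    <td>a</td><td class="s">No</td><td>c</td><td>Yes</td>
  </tr>
  <tr><th>as</th><th>er</th><th>df</th><th>xc</th><th>xc</th></tr>
  <tr id="333877">
    <td><input type="checkbox" checked="" class="copySubfieldBorderCheck"/></td>
    <td>rt</td><td class="noBorderBoldRed">Yes</td><td>ff</td><td>sdf</td>
  </tr>
</tbody>
""".strip()


def cell_texts(tr):
    """Trimmed text of each td child of a row (header rows give an empty list)."""
    return [(td.text or "").strip() for td in tr.findall("td")]


def rows_any_no_last_yes(xml_text):
    """Rows where some td is 'No' and the last td is 'Yes' (what the question asks for)."""
    root = ET.fromstring(xml_text)
    rows = [cell_texts(tr) for tr in root.iter("tr")]
    return [cells for cells in rows if "No" in cells and cells[-1] == "Yes"]


def rows_concat_noyes(xml_text):
    """Rows where the 3rd cell plus the last cell spell 'NoYes' (the concat() answer)."""
    root = ET.fromstring(xml_text)
    rows = [cell_texts(tr) for tr in root.iter("tr")]
    return [cells for cells in rows if len(cells) >= 3 and cells[2] + cells[-1] == "NoYes"]


print(rows_any_no_last_yes(ROWS))
print(rows_concat_noyes(ROWS))
```

On this DOM both functions pick the same row. They differ only if the No can appear in a column other than the third, which the concat() form does not cover.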

ng-repeat with tables not working

I am a fairly new web developer, just needed quick help with some view code.
I was looping through an object in my controller called "products". I was displaying all the data of each item fine before I wanted to organize it in a table.
Could anyone see the problem with my code? I'm a very weak front end designer, back end is my niche, so it could be a very simple error.
<tr ng-repeat="product in Ctrl.products">
<td><img ng-src= "{{product.image}}"></td>
<td>
<tr>
<td>Name:</td><td>{{product.name}}</td>
</tr>
<tr>
<td>Price:</td><td>{{product.price}}</td>
</tr>
<tr>
{{product.description}}
</tr>
</td>
</tr>
You can't insert a TR inside a TD. You need to insert a full table inside the TD to achieve what you want.
<table>
<tr>
<td>
<table><tr><td>...</td></tr></table>
</td>
</tr>
</table>
Alternatively, you can use the rowspan attribute:
<table>
<tbody ng-repeat="product in Ctrl.products">
<tr>
<td rowspan="2"><img ng-src= "{{product.image}}"></td>
<td>Name:</td><td>{{product.name}}</td>
<td rowspan="2">{{product.description}}</td>
</tr>
<tr>
<td>Price:</td><td>{{product.price}}</td>
</tr>
</tbody>
</table>