XPath - not able to find an element with children text - selenium

<a>
This is <var>Me</var> and That is <var> You</var>
</a>
I can find an element "a" which contains "This is" by following code:
//a[contains(text(),'This is')]
But I am not able to find element "a" which contains "This is Me and That is You".
//a[contains(text(),'This is Me and That is You')]
Is there a way to find an element with children text as well?

I am not sure if this what you need but you can use string() to get the result as required,
//a[string()='This is Me and That is You']
The caveat however will be that you need to have precised information about the String being used.
See working example here.

This also can be find using normalize-space() function of xpath which strips leading and trailing white-space from a string, replaces sequences of whitespace characters by a single space, and returns the resulting string as below :-
//a[normalize-space()='This is Me and That is You']

Related

How to write xpath for following example?

For example, I have div tag that has two attributes.
class='hello#123' text='321#he#321llo#321'
<div> class='hello#123' text='321#he#321llo#321'></div>
Here, I want to write xpath for both class and text attributes but numbers may change dynamically. ie., "hello#123" may become "345" when we reload. "321#he#321llo#321" may become "567#he#456llo#321".
Note: Need to write xpath in single line not separately.
Assuming that you have the (corrected) two-attribute-HTML
<div class='hello#123' text='321#he#321llo#321'>...</div>
you can select it using the following, for example:
Using the contains() function
//div[contains(#class,'hello') and contains(#text,'#he#')]
This is quite specific and only applicable if the "hello" is always split in the same way
Using the translate() function to mask everything except the chars for "hello"
//div[translate(#class,'#0123456789','')='hello' and translate(#text,'#0123456789','')='hello']
This removes all # chars and digits and checks if the remaining string is "hello"
I guess combining these two approaches you will be able to create your own XPath expression fitting your needs. The patterns you provided were not fully clear, so this may only approach a good enough solution.

How to find the xpath for an element by text containing three dots

I'm trying to find the xpath for an element that contains three dots instead of text. Once I expand the element by clicking on arrow in DOM model html it returns the text.
The following xpath returns nothing unfortunately:
//button[contains(text(),'Pending')]
What's the right solution?
The text content of that element is not a three dots but Pending (). You see the ... only because there is no room to present the text there until you expand the element. So, generally this //button[contains(text(),'Pending')] should work.
You can also try this XPath expression:
//button[contains(.,'Pending')]
It means "a button element with any attribute containing Pending value"
It is more general and should work.
To understand why //button[contains(text(),'Pending')] did not work see this post or this. There are more explanations about this.

Is there a way to find any elements by text in Capybara?

I want to find any element with a given text on my page, but when I pass it to find without an element it gives me back an error
find(material.attachment_filename) #material.attachment_filename is "01. pain killer.mp3"
But if i do:
find('a',text: material.attachment_filename)
It works fine, and the given error is:
Selenium::WebDriver::Error::InvalidSelectorError:
Given css selector expression "01. pain killer.mp3" is invalid: SyntaxError: An invalid or illegal string was specified
Capybara's find takes 3 arguments (a Capybara selector type, a locator string, and an options hash). If the selector type isn't specified it defaults to :css which means the locator string needs to be a CSS selector.
This means that find(material.attachment_filename) in your case is equivalent to
find(:css, "01. pain killer.mp3")
which will raise an error as you've seen because "01. pain killer.mp3" isn't valid CSS. If you want to find any element containing the text you could do something like
find('*', text: "01. pain killer.mp3")
which will find any element containing the text, however that's also going to find all the ancestor elements too since they also contain the text, So what you'd probably want is to use a regex to make sure the element contains only that content
find('*', /\A#{Regexp.escape(material.attachment_filename)}\z/)
which should be interpreted as
find('*', /\A01\. pain killer\.mp3\z/)
Note: That is going to be pretty slow if your page has anything more than simple content on it because it means transferring all the elements from selenium to capybara to check the text content.
A more performant solution would be to use XPath which has support for finding elements by text content (CSS does not)
find(:xpath, XPath.descendant[XPath.string.n.is(material.attachment_filename)]) #using the XPath library - contains (assuming Capybara.exact == false)
find(:xpath, XPath.descendant[XPath.string.n.is(material.attachment_filename)], exact: true) #using the XPath library - equals (you could also pass exact:false to force contains)
If the text won't contain XPath special characters (ex. apostrophes) that need escaping, you can use a string to define the XPath:
find(:xpath, ".//*[contains(., '#{material.attachment_filename}')]") #contains the text
find(:xpath, ".//*[text()='#{material.attachment_filename}']") #equals the text
If the element is actually a link you're looking for though then you would probably want to use
find_link("01. pain killer.mp3")
or
find(:link, "01. pain killer.mp3")

XPath: difference between dot and text()

My question is about specifics of using dot and text() in XPath. For example, following find_element lines returns same element:
driver.get('http://stackoverflow.com/')
driver.find_element_by_xpath('//a[text()="Ask Question"]')
driver.find_element_by_xpath('//a[.="Ask Question"]')
So what is the difference? What are the benefits and drawbacks of using . and text()?
There is a difference between . and text(), but this difference might not surface because of your input document.
If your input document looked like (the simplest document one can imagine given your XPath expressions)
Example 1
<html>
<a>Ask Question</a>
</html>
Then //a[text()="Ask Question"] and //a[.="Ask Question"] indeed return exactly the same result. But consider a different input document that looks like
Example 2
<html>
<a>Ask Question<other/>
</a>
</html>
where the a element also has a child element other that follows immediately after "Ask Question". Given this second input document, //a[text()="Ask Question"] still returns the a element, while //a[.="Ask Question"] does not return anything!
This is because the meaning of the two predicates (everything between [ and ]) is different. [text()="Ask Question"] actually means: return true if any of the text nodes of an element contains exactly the text "Ask Question". On the other hand, [.="Ask Question"] means: return true if the string value of an element is identical to "Ask Question".
In the XPath model, text inside XML elements can be partitioned into a number of text nodes if other elements interfere with the text, as in Example 2 above. There, the other element is between "Ask Question" and a newline character that also counts as text content.
To make an even clearer example, consider as an input document:
Example 3
<a>Ask Question<other/>more text</a>
Here, the a element actually contains two text nodes, "Ask Question" and "more text", since both are direct children of a. You can test this by running //a/text() on this document, which will return (individual results separated by ----):
Ask Question
-----------------------
more text
So, in such a scenario, text() returns a set of individual nodes, while . in a predicate evaluates to the string concatenation of all text nodes. Again, you can test this claim with the path expression //a[.='Ask Questionmore text'] which will successfully return the a element.
Finally, keep in mind that some XPath functions can only take one single string as an input. As LarsH has pointed out in the comments, if such an XPath function (e.g. contains()) is given a sequence of nodes, it will only process the first node and silently ignore the rest.
There is big difference between dot (".") and text() :-
The dot (".") in XPath is called the "context item expression" because it refers to the context item. This could be match with a node (such as an element, attribute, or text node) or an atomic value (such as a string, number, or boolean). While text() refers to match only element text which is in string form.
The dot (".") notation is the current node in the DOM. This is going to be an object of type Node while Using the XPath function text() to get the text for an element only gets the text up to the first inner element. If the text you are looking for is after the inner element you must use the current node to search for the string and not the XPath text() function.
For an example :-
<a href="something.html">
<img src="filename.gif">
link
</a>
Here if you want to find anchor a element by using text link, you need to use dot ("."). Because if you use //a[contains(.,'link')] it finds the anchor a element but if you use //a[contains(text(),'link')] the text() function does not seem to find it.
Hope it will help you..:)
enter image description here
The XPath text() function locates elements within a text node while dot (.) locate elements inside or outside a text node. In the image description screenshot, the XPath text() function will only locate Success in DOM Example 2. It will not find success in DOM Example 1 because it's located between the tags.
In addition, the text() function will not find success in DOM Example 3 because success does not have a direct relationship to the element . Here's a video demo explaining the difference between text() and dot (.) https://youtu.be/oi2Q7-0ZIBg

Removing hyperlink html tag from oracle sql result

I am writing as SQL query which might return HTML text also. HTML Tags are fine for me, because I want to show it formatted with the HTML tags in the front end. But I do not need links. I mean is there anyway I can strip off the hyper links only from the column. just the anchor tag. I am so bad in Regular Expressions, though i think that might be the solution for this. Any help!
This should work fine for links:
<a[^>]*>(.*?)<\/a>
Since you say you don't understand regular expressions, I might as well explain. The <a part is straightforward, the [^>]* will match anything up to the closing bracket, the bracket is just the bracket. (.*?) matches anything, regardless of length, empty links as well. The ? is required so that it becomes non-greedy, so it stops at the first closing tag. <\/a> matches the closing tag.
Edit: if you have spaces in between your tags, you can use <a[^>]*>((?:.|\s)*?)<\/a>. Notice I added the (?:.|\s)*? in place of .*?. The .|\s means match any character or space, the ?: indicate a non capturing group, since we don't care which particular character was matched.