Selenium Python, extract text from node and ALL child nodes - selenium

I have the opposite problem described here. I can't get the text more than one layer deep.
HTML is structured in the following manner:
<span class="data">
<p>This text is extracted just fine.</p>
<p>And so is this.</p>
<p>
And this.
<div>
<p>But this text is not extracted.</p>
</div>
</p>
<div>
<p>And neither is this.</p>
</div>
</span>
My Python code looks something like this:
el.find_element_by_xpath(".//span[contains(@class, 'data')]").text

Try the same with child elements:
print(el.find_element_by_xpath(".//span[contains(@class, 'data')]").text)
print(el.find_element_by_xpath(".//span[contains(@class, 'data')]/div").text)
print(el.find_element_by_xpath(".//span[contains(@class, 'data')]/p").text)

I'm not sure what el refers to in your original post, but I was able to get all the text using the following:
driver.find_element_by_xpath("//span[@class='data']").text
Output:
'This text is extracted just fine.\nAnd so is this.\nAnd this.\nBut this text is not extracted.\nAnd neither is this.'

Instead of relying on the WebElement.text property, consider querying the innerText property.
Consider using an Explicit Wait as well; it makes your test more robust and reliable when the element you're looking for is loaded by, for example, an AJAX call.
Putting it all together (with By, WebDriverWait and expected_conditions as EC imported from Selenium):
print(WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//span[@class='data']"))).get_attribute("innerText"))
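To see what "text of a node and all of its child nodes" means without a browser, here is a minimal offline sketch. It uses Python's standard-library xml.etree.ElementTree rather than Selenium, so it only approximates what innerText returns. The markup is the question's snippet; the <body> wrapper, the all_text_lines helper and the whitespace stripping are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <body> root for this sketch.
MARKUP = """
<body>
<span class="data">
<p>This text is extracted just fine.</p>
<p>And so is this.</p>
<p>
And this.
<div>
<p>But this text is not extracted.</p>
</div>
</p>
<div>
<p>And neither is this.</p>
</div>
</span>
</body>
""".strip()


def all_text_lines(element):
    """Return the non-empty text fragments of element and every descendant."""
    # itertext() walks the element and all nested elements in document order.
    fragments = (fragment.strip() for fragment in element.itertext())
    return [fragment for fragment in fragments if fragment]


root = ET.fromstring(MARKUP)
span = root.find(".//span[@class='data']")
lines = all_text_lines(span)
print("\n".join(lines))
```

The point is only that the nested <div><p> text is part of the span's descendant text at any depth; a real page may still differ because of hidden elements or late-loading content.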

Related

Click on parent element based on two conditions in child elements Selenium Driver

Using Selenium 4.8 in .NET 6, I have the following html structure to parse.
<ul class="search-results">
<li>
<a href=//to somewhere>
<span class="book-desc">
<div class="book-title">some title</div>
<span class="book-author">some author</span>
</span>
</a>
</li>
</ul>
I need to find and click on the right li where the book-title matches my variable input (ideally ignore sentence case too) AND the book author also matches my variable input. So far I'm not getting that xpath syntax correct. I've tried different variations of something along these lines:
var matchingBooks = driver.FindElements(By.XPath($"//li[.//span[@class='book-author' and text()='{b.Authors}' and @class='book-title' and text()='{b.Title}']]"));
Then I check whether matchingBooks has any elements before clicking on the first one. But matchingBooks always comes back with a count of 0.
class="book-author" belongs to the span, while class="book-title" belongs to the div child element, so a single span can never satisfy both conditions.
There may also be extra whitespace around the text, so it's better to use contains() than an exact equality check.
So, instead of "//li[.//span[@class='book-author' and text()='{b.Authors}' and @class='book-title' and text()='{b.Title}']]" please try this:
"//li[.//span[@class='book-author' and(contains(text(),'{b.Authors}'))] and .//div[@class='book-title' and(contains(text(),'{b.Title}'))]]"
UPD
The following XPath should work. This is a specific example I tried, and it worked for the "blood of the fold" search input: "//li[.//span[@class='book-author' and(contains(text(),'anima'))] and .//div[@class='book-title' and(contains(text(),'Coloring'))]]"
Also, I think you should click on the a element inside the li, not on the li itself. So try clicking the following element:
"//li[.//span[@class='book-author' and(contains(text(),'{b.Authors}'))] and .//div[@class='book-title' and(contains(text(),'{b.Title}'))]]//a"
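The question also asked about ignoring case, which XPath 1.0 has no built-in function for. A common workaround is translate(). Below is a rough sketch in Python that only builds the XPath string; the same string could be passed to By.XPath in .NET. The helper names (xpath_literal, lower_case, book_link_xpath) are hypothetical, and the case folding covers ASCII letters only.

```python
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


def xpath_literal(value):
    """Quote a string as an XPath 1.0 literal, even if it contains quotes."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: split on ' and glue the pieces with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def lower_case(expr):
    """Wrap an XPath expression in translate() to lower-case ASCII letters."""
    return f"translate({expr}, '{UPPER}', '{LOWER}')"


def book_link_xpath(title, author):
    """Build an XPath for the <a> inside the <li> matching title and author."""
    # normalize-space(.) trims and collapses whitespace in the element's text.
    haystack = lower_case("normalize-space(.)")
    author_test = f"contains({haystack}, {xpath_literal(author.lower())})"
    title_test = f"contains({haystack}, {xpath_literal(title.lower())})"
    return (
        "//li[.//span[@class='book-author' and " + author_test + "]"
        " and .//div[@class='book-title' and " + title_test + "]]//a"
    )


xpath = book_link_xpath("Some Title", "Some Author")
print(xpath)
```

Quoting the inputs through xpath_literal also avoids a broken expression when a title contains an apostrophe, which plain string interpolation would not handle.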

Selenium XPATH selecting next sibling

<div class="block wbc">
<span></span>
<span> text_value </span>
</div>
I want to get the text in the second span. Where does the code below go wrong?
driver.find_element(X_PATH,"*//div[@class='block']/span[1]")
In case I've written something wrong here, this is the link so you can try it yourself:
https://soundcloud.com/daydoseofhouse/snt-whats-wrong/s-jmbaiBDyQ0d?si=233b2f843a2c4a7c8afd6b9161369717&utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing
And my code, which still gives an error:
playbackTimeline__duration =driver.find_element(By.XPATH,"*//div[@class='playbackTimeline__duration']/span[2]")
The full XPath that locates the element unambiguously is:
//*[@id="app"]/div[4]/section/div/div[3]/div[3]/div/div[3]/span[2]
But I don't want to use that; I need to locate it by class name, or at least with a CSS selector.
If you are sure that you always need the second span, use this XPath:
*//div[@class='playbackTimeline__duration']/span[2]
If you need the first span that has actual text, use this:
*//div[@class='playbackTimeline__duration']/span[normalize-space()][1]
If the @class contains more than just playbackTimeline__duration, you can use:
*//div[contains(@class,'playbackTimeline__duration')]/span[2]
If there is more than one div like that, use:
*//div[contains(@class,'playbackTimeline__duration')][1]/span[2]
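To illustrate the two issues in the question's first snippet (the exact @class comparison and the 1-based span index), here is a small offline sketch. It uses the standard-library xml.etree.ElementTree instead of Selenium, and the <body> wrapper is my own assumption so that the div is a descendant of the root.

```python
import xml.etree.ElementTree as ET

# The question's markup inside a hypothetical <body> root.
MARKUP = """
<body>
<div class="block wbc">
<span></span>
<span> text_value </span>
</div>
</body>
""".strip()

root = ET.fromstring(MARKUP)

# @class='block' is an exact string comparison, so "block wbc" does not match.
exact_short = root.findall(".//div[@class='block']/span")

# Positions are 1-based: span[1] is the empty span, span[2] holds the text.
first_span = root.find(".//div[@class='block wbc']/span[1]")
second_span = root.find(".//div[@class='block wbc']/span[2]")

print(exact_short)               # empty list: no div has exactly class="block"
print(second_span.text.strip())  # text of the second span
```

ElementTree only supports a subset of XPath, so contains() and normalize-space() from the answer above cannot be tried this way.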

Could you help me with xpath of a similar html structure?

<div>
<div>
<div>
<h1 >text1<h1>
<div>
<div>
<div>
<div>
<p> some text <p>
I would like to have the XPath for <p>some text<p> which follows <h1>text1<h1>
Depending upon how much you know about the structure, you could make the XPath a bit more specific, and if you need to worry about whitespace decide whether you could use = or should use contains(), or normalize the whitespace with normalize-space() before comparing.
However, first identify the h1 element, and then use the following:: axis to target the p:
//h1[. = "text1"]//following::p[contains(text(), "some text")]
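For anyone unsure what the following:: axis selects, here is a rough pure-Python sketch of its semantics: every element after the context node in document order, excluding the context node's own descendants. It uses the standard-library xml.etree.ElementTree, which has no following axis itself, so the following() helper is hypothetical. The markup is my well-formed guess at the question's structure, which is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the question's structure.
MARKUP = """
<body>
<div><div><div><h1>text1</h1></div></div></div>
<div><div><div><div><p> some text </p></div></div></div></div>
</body>
""".strip()


def following(root, context, tag):
    """Elements named tag that come after context in document order."""
    in_order = list(root.iter())  # iter() yields elements in document order
    start = next(i for i, node in enumerate(in_order) if node is context)
    # Descendants of the context node are not part of the following axis.
    inside_context = {id(node) for node in context.iter()}
    return [
        node
        for node in in_order[start + 1:]
        if node.tag == tag and id(node) not in inside_context
    ]


root = ET.fromstring(MARKUP)
h1 = next(node for node in root.iter("h1") if node.text == "text1")
matches = [p for p in following(root, h1, "p") if "some text" in (p.text or "")]
print(matches[0].text.strip())
```

Because the axis ignores nesting depth, it finds the p no matter how many divs sit between it and the h1.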
It's not totally clear whether the divs the OP is showing are on the same level, but if they are and you don't want to depend on the number of nested divs, you could use:
//div[.//h1[contains(text(),'text1')]]/following-sibling::div//p[contains(.,'some text')]
If you can depend on the number of nested divs, then the following XPath will likely perform better:
//div[div/div/h1[contains(text(),'text1')]]/following-sibling::div/div/div/div/p[contains(.,'some text')]
In case the "some text" is unique you can simply use the following XPath
//p[contains(text(),'some text')]
or
//*[contains(text(),'some text')]
or
//p[text()='some text']
or
//*[text()='some text']
If you want to write the XPath for "some text" based on text1, use this XPath:
//h1[contains(text(),'text1')]/following-sibling::h1/descendant::p[last()]/preceding-sibling::p
Read more about XPath axes here.
This XPath will match the p tag above that contains "some text":
//div//div//div//div//p

Xpath Selenium trouble

Can anyone help me? I tried using FirePath to get a correct XPath, but the code it gives me looks incorrect to me. The first line in each example is the one FirePath provided.
.//*[@id='content']/div/div/div[1]/h2/span
<div id="content" class="article">
<h1>News</h1>
<div>
<div>
<div class="summary">
<h2>
<span>9</span>
// this should be the correct xpath i think
_driver.findElement(By.xpath("//*div[@id='content']/div/span.getText()"));
Here I want to check whether the number inside the span is greater than or equal to 1.
The other one is:
.//*[@id='content']/div/div/div[3]
<div id="content" class="article">
<h1>News</h1>
<div>
<div>
<div class="summary">
<div class="form fancy">
<div class="common results">
Here I want to check whether the div with class "common results" has been created; 1 item equals 1 "common results" div.
To retrieve the span text, you can use this:
String spanText=driver.findElement(By.xpath("//div[@id='content']/div/div/div/h2/span")).getText();
System.out.println(spanText);
The second question isn't entirely clear to me. You can get the class name like this; please explain further if this isn't what you need:
String className=driver.findElement(By.xpath("//*[@id='content']/div/div/div/div/div")).getAttribute("class");
System.out.println(className);
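The "greater than or equal to 1" check from the question comes down to converting the span text to a number. Here is a minimal offline sketch in Python using the standard-library xml.etree.ElementTree instead of Selenium. The closing tags and the <body> wrapper are my assumptions, since the question's markup was cut off, and has_at_least_one is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The first snippet from the question, closed and wrapped so it parses.
MARKUP = """
<body>
<div id="content" class="article">
<h1>News</h1>
<div>
<div>
<div class="summary">
<h2>
<span>9</span>
</h2>
</div>
</div>
</div>
</div>
</body>
""".strip()


def has_at_least_one(text):
    """True when the text holds an integer greater than or equal to 1."""
    return int(text.strip()) >= 1


root = ET.fromstring(MARKUP)
# Same path as in the answer: three nested divs below #content, then h2/span.
span = root.find(".//div[@id='content']/div/div/div/h2/span")
span_text = span.text
print(has_at_least_one(span_text))
```

With Selenium the same comparison would be applied to the string returned by getText().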
I would suggest using:
//div[@id='content']/div/div/div/h2/span/text()
Note: the HTML you shared was not well formed. I would suggest testing the code and the XPath in advance with http://www.xpathtester.com/xpath (to fix the code) and http://codebeautify.org/Xpath-Tester (to test your XPath).

how to get text from text node without getting content of siblings

I have following code
<div>
<p>some paragraph</p>
some nasty text that I need
<span>something else</span>
</div>
Now I need to get some nasty text that I need only. How to do it using only XPath 1.0? Is it possible?
How to do it using only XPath 1.0? Is it possible?
Yes - and it's rather trivial:
/div/text()
I wonder why you did not try that? All other text nodes are either in a p or span element and should not cause you any trouble.
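To make the distinction concrete, here is a small offline sketch of what /div/text() selects: only the text nodes that are direct children of the div. It uses Python's standard-library xml.etree.ElementTree, which stores such text in .text and in each child's .tail rather than supporting text() in its path syntax. The own_text_nodes helper is hypothetical, and unlike the XPath it drops whitespace-only nodes.

```python
import xml.etree.ElementTree as ET

MARKUP = """
<div>
<p>some paragraph</p>
some nasty text that I need
<span>something else</span>
</div>
""".strip()


def own_text_nodes(element):
    """Non-blank text nodes that are direct children of element."""
    # Text before the first child is element.text; text after each child
    # is stored on that child as .tail.
    nodes = [element.text] + [child.tail for child in element]
    return [node.strip() for node in nodes if node and node.strip()]


div = ET.fromstring(MARKUP)
own_text = own_text_nodes(div)
print(own_text)
```

The text inside p and span is not included, because those text nodes are children of p and span, not of the div.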