I have below html code
<a class = sidetoolsdivider>
<div class = sideone > Test 1 </div>
<div class = sidetwo> </div>
</a>
<a class = sidetoolsdivider>
<div class = sideone > Test 2 </div>
<div class = sidetwo> </div>
</a>
...............
Here I need to find xpath locator of class sidetwo which has text Test1. There are many such similar classes hence you can differentiate between different only based on element text
The xpath would be something like below:
Since the element depends on the text, can make use of text attribute for the same.
//div[text()='Text1']/following-sibling::div
Or
//div[contains(text(),'Text1')]/following-sibling::div
Or
//div[contains(text(),'Text1')]/following-sibling::div[#class='sidetwo']
Link to refer - Link
This gets you the correct 'a'. Find an 'a' which contains the right div of sideone (note the .//, find a Child which is)
"//a[.//div[ #class='sideone" and text()='Test 1']"
Then just get the side two, complete xPath
"//a[.//div[ #class='sideone" and text()='Test 1']//div[#class='sidetwo']"
Works even if there is more text inside the entire 'a' and stuff gets complex with more elements inside.
Related
I am creating a parser using Jsoup in Kotlin
I need to get a inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis"
When I am trying to getElementsByClass in a element objects that created by a former getElementsByClass, I getting 0 elements
Code:
class NetlifxHtmlParser {
val html = """
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.</div>
</p>
</div>
""".trimIndent()
fun parseEpisode() {
val doc = Jsoup.parseBodyFragment(html)
val titleCards = doc.getElementsByClass("titleCard-synopsis")
println("Episode: count titleCard = > ${titleCards.count()}") // 2
titleCards.forEachIndexed { index, element ->
val ptrack = element.getElementsByClass("ptrack-content")
println("Episode: count ptrack = > ${ptrack.count()}") // 0 !!
println("inner html = > ${ptrack.html()}") // null string !!
}
}
}
In the above code,
First, I am extracting tags with class name titleCard-synopsis.
For that , I using doc.getElementsByClass("titleCard-synopsis") which returns 2 element items.
Then, In the List of titleCard elements, I am extracting the elements that have ptrack-content as Class, by using the same getElementsByClass in each element,
which returns empty list.
Why this is happening ?
My goal is, I need to extract the description text for each title, the stored in the interior tags of p tag with class titleCard-synopsis.
If I try to get directly from "ptrack-content", it's working fine, but this a general class used in many places in the main HTML source. (this is snippet)
I need to get a inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis"
But in the above method in the code, I am only getting emtpy list.
Why ?
Also note that, if I invoke the HTML() method in a element object of titleCards(ptrack.html()),
I am not getting the inner DIV tag, an empty string!!!
Please guide my to resolve the issue !
TL;DR
I need to get a inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis"
I'm not really familiar with Kotlin, but this should produce the desired output:
val doc = Jsoup.parseBodyFragment(html)
val result = doc.select(".titleCard-synopsis + .ptrack-content")
result.forEachIndexed {index, element ->
println("${element.html()}")
}
Live example
This is an interesting problem!
You basically have an invalid HTML and jsoup is smart enough to auto-correct it for your. Your HTML structure gets altered and suddenly your query does not work.
This is the error:
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
You can't nest a <div> element inside a <p> element like that.
Paragraphs are block-level elements, and notably will automatically close if another block-level element is parsed before the closing </p> tag. [Source: <p>: The Paragraph element]
Also, look at Nesting block level elements inside the <p> tag... right or wrong?
This is how jsoup parses your tree:
<html>
<head></head>
<body>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.
</div>
<p></p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.
</div>
<p></p>
</div>
</body>
</html>
As you can see, elements with class titleCard-synopsis have no children with class ptrack-content.
I am trying to learn how to print by tag. Cannot use find element by xpath or class. If there are 4 "div" tags, how do I print the contents of a specific one?
Desired Output:
vjs-poster
Attempt 1:
divs = driver.find_elements(By.TAG_NAME, "div")
print(divs[0])
Attempt 2:
divs = driver.find_elements(By.TAG_NAME, "div")
print(divs[0].get_attribute('class'))
HTML: (The third line says "vjs-poster" this is what I want to print.)
<video id="video_html5_api" class="vjs-tech" onclick="streaming();" src="/video/stream?cntId=21671&quality=sd"></video>
<div></div>
<div class="vjs-poster" tabindex="-1" style="background-image: url("https://[REDACTED].com/images/V15064/720X480/720x480/nt/4.jpg");"></div>
<div class="vjs-text-track-display vjs-hidden" aria-live="assertive" aria-atomic="true"></div>
<div class="vjs-loading-spinner" dir="ltr"></div>
To print the value of the class attribute vjs-poster of the second <div> you can use:
print(driver.find_elements(By.TAG_NAME, "div")[1].get_attribute('class'))
You can also use a css_selector as:
print(driver.find_element(By.CSS_SELECTOR, "video.vjs-tech#video_html5_api +div +div").get_attribute('class'))
You can try locating that element based on it class name and style or any one of them if the locator will still be unique.
You try this:
class_val = driver.find_elements(By.XPATH, "//div[contains(#style,'https://[REDACTED].com/images')").get_attribute('class')
print(class_val)
OK, so I mentioned Selenium Basic as that is the use of the XPath and I believe Selenium Basic uses Selenium version 2 so maybe it won't be able to understand some/all answers that might require the latest Selenium. But someone might take that into account if necessary.
There are dynamic classes at play here.
Criteria for selection.
1. Class starting with 'NextToJump__eventWrapper' (the outer one) must be used.
2. Class starting with 'NextToJump__venue' must contain text = 'Ballarat'
3. Class starting with 'NextToJump__race' (and/or span) must contain text = 'Race 2'
I need to be able to click on the <a> tag that contains Points 2 and 3.
The best that I've been able to do (and checked) using ChroPath in Chrome Devtools is...
//div[starts-with(#class,'NextToJump__eventWrapper')]//descendant::*[contains(text(),'Ballarat')]
But note that there are 2 cases of Point 2 in the HTML but only 1 case that satisfies Points 2 and 3.
Thanks
<div class="NextToJump__eventWrapper--13zZJ">
<div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/crayford-am/20200708/race-1-1801951-58544404">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 1</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY active" href="/racing-betting/greyhound-racing/rockhampton/20200708/race-4-1799474-58466521" aria-current="page">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Rockhampton</div>
<div class="NextToJump__race--3JydR"><span>Race 4</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">2m 52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/ballarat/20200708/race-4-1799454-58465201">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 2</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">5m 52s</span></div>
</a>
</div>
</div>
</div>
The xpath expression you need to use to select your target <a> tag is long and convoluted, but that's life....
[formatted for ease of reading, but you can use that in one line]
//a
[ancestor::div[starts-with(#class,'NextToJump__eventWrapper')]]
[.//div[.="Ballarat"]
[starts-with(#class,'NextToJump__venue-')]
[./following-sibling::div[.="Race 2"]
[starts-with(#class,'NextToJump__race-')]
]
]
Edit:
In "plain English":
Find an <a> node which meets ALL these conditions (i) has an ancestor (not a parent) node which is a <div>, which <div> has a class attribute with an attribute name which starts with NextToJump__eventWrapper; and (ii) it has <div>descendant (not just a child) node, which has Ballarat as a text node AND which has a class attribute with an attribute name which starts with NextToJump__venue-, where that <div>descendant itself has a following sibling which is a <div> which itself has a Race 2 text node AND which has a class attribute with an attribute name which starts with NextToJump__race-...
Yes, the word "plain" doesn't really fit here, but that's the closest I could get. I like xpath, and it's very powerful, but sometimes it's very hard to follow... As an aside, it would have been somewhat less cryptic if xquery was used instead of straight xpath.
Sample code:
<div class="loginbox">some code</div>
<div class="loginbox">other code</div>
<div class="loginbox">
<p> style="color: Red;">Test Extract</p>
</div>
Using Selenium Web Driver, I would like to extract the text Test Extract within the paragraph element which is nested within a div, whose class name is shared with other div classes. c# preferred.
You can try below method:
driver.findElement(By.xpath("//div[#class='loginbox']/p")).getText();
EDITED
You should use = inside the square braces like:
driver.findElement(By.xpath("//div[#class='loginbox']/p");
C# code to get the text from the locator specified,
IWebElement element = Browser.GetElementByCssSelector("div.loginbox p");
string text = element.Text;
I'll try to keep it simple.
I have a list of similar rows like this:
HTML code:
<li ...>
<div ... >
<some elements here>
</div>
<input id="121099" class="containerMultiSelect" type="checkbox" value="121099" name="NodeIds">
<a ...>
<div ... />
<div ... >
<h2>Identified Text</h2>
<h3>...</h3>
</div>
</a>
</li>
I want to click the checkbox with a certain text, but I can't use any of its elements, because they are the same for all the list, and id is generated automatically. The only thing can be differentiated is the h2 text. I tried :
browser.h2(:text => /Identified/).checkbox(:name => "NodeIds").set
and I got UnknownException which is obvious because checkbox is not nested with a tag.
What can I do in this case?
Thanks
The h2 and checkbox are related by the li, which is a common ancestor. Therefore, to find the checkbox, you can look for the li that contains the h2 element. I find the most readable approach to doing this is by using the find method of the element collection. The find method basically allows you to make custom locators.
The code would be:
parent_li = browser.lis.find do |li|
li.h2(:text => 'Identified Text').present?
end
parent_li.checkbox.set
Notes:
browser.lis creates a collection of all li elements.
find iterates through the lis and returns the first element that has the block evaluate as true - ie the first li where an h2 with the specified text is present.
First have a look at this explanation
http://jkotests.wordpress.com/2012/12/20/finding-a-parent-element-that-matches-a-specific-criteria/
Now following a similar approach,
First locate the element that has a unique identifier
parent=#browser.h2(:text=>"Identified Text")
Now we have to iterate over to the parent element which contains both the checkbox and text against it.
parent=parent.parent until parent.tag_name=="li"
Once the control is on the li element, simple click on the checkbox using.
parent.checkbox.click