Use GetElementsByClass to find all <div> elements by class name, nested inside a <p> element - kotlin

I am creating a parser using Jsoup in Kotlin.
I need to get the inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis".
When I call getElementsByClass on an element returned by an earlier getElementsByClass call, I get 0 elements.
Code:
class NetlifxHtmlParser {
val html = """
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.</div>
</p>
</div>
""".trimIndent()
fun parseEpisode() {
val doc = Jsoup.parseBodyFragment(html)
val titleCards = doc.getElementsByClass("titleCard-synopsis")
println("Episode: count titleCard = > ${titleCards.count()}") // 2
titleCards.forEachIndexed { index, element ->
val ptrack = element.getElementsByClass("ptrack-content")
println("Episode: count ptrack = > ${ptrack.count()}") // 0 !!
println("inner html = > ${ptrack.html()}") // null string !!
}
}
}
In the above code:
First, I extract the tags with class name titleCard-synopsis.
For that, I use doc.getElementsByClass("titleCard-synopsis"), which returns 2 elements.
Then, for each titleCard element, I extract the elements that have the class ptrack-content by calling getElementsByClass on that element,
which returns an empty list.
Why is this happening?
My goal is to extract the description text for each title, which is stored in the tags inside the p tag with class titleCard-synopsis.
If I query "ptrack-content" directly on the document, it works fine, but this is a general class used in many places in the main HTML source (the above is only a snippet).
With the approach in the code above, I only get an empty list.
Also note that if I invoke the html() method on an element of titleCards (or ptrack.html()),
I do not get the inner div tag, only an empty string.
Please guide me to resolve the issue!

TL;DR
I need to get the inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis".
I'm not really familiar with Kotlin, but this should produce the desired output:
val doc = Jsoup.parseBodyFragment(html)
val result = doc.select(".titleCard-synopsis + .ptrack-content")
result.forEachIndexed {index, element ->
println("${element.html()}")
}
This is an interesting problem!
You basically have invalid HTML, and jsoup is smart enough to auto-correct it for you. Your HTML structure gets altered, so your query no longer matches.
This is the error:
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
You can't nest a <div> element inside a <p> element like that.
Paragraphs are block-level elements, and notably will automatically close if another block-level element is parsed before the closing </p> tag. [Source: <p>: The Paragraph element]
Also, look at Nesting block level elements inside the <p> tag... right or wrong?
This is how jsoup parses your tree:
<html>
<head></head>
<body>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.
</div>
<p></p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.
</div>
<p></p>
</div>
</body>
</html>
As you can see, elements with class titleCard-synopsis have no children with class ptrack-content.
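The repaired structure can also be checked outside jsoup. The following is a minimal Python sketch, not jsoup itself: it assumes an abbreviated copy of the repaired tree above is fed to the standard-library ElementTree parser, and the synopsis strings are placeholders. The helper names has_class, nested_matches and synopsis_texts are made up for this illustration. It shows that the synopsis p has no matching descendants and that the text lives in the next sibling, which is why a sibling selector works where a nested lookup does not.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the tree as repaired by the parser.
# The synopsis strings are placeholders, not the real episode text.
PARSED = """
<body>
  <div class="titleCardList--metadataWrapper">
    <p class="titleCard-synopsis previewModal--small-text"></p>
    <div class="ptrack-content">Synopsis one</div>
    <p></p>
  </div>
  <div class="titleCardList--metadataWrapper">
    <p class="titleCard-synopsis previewModal--small-text"></p>
    <div class="ptrack-content">Synopsis two</div>
    <p></p>
  </div>
</body>
""".strip()


def has_class(element, name):
    """True if `name` is one of the space-separated classes on the element."""
    return name in element.get("class", "").split()


def nested_matches(root):
    """Count ptrack-content divs that are descendants of a synopsis <p>."""
    count = 0
    for paragraph in root.iter("p"):
        if has_class(paragraph, "titleCard-synopsis"):
            count += sum(
                1 for div in paragraph.iter("div") if has_class(div, "ptrack-content")
            )
    return count


def synopsis_texts(root):
    """Text of the ptrack-content div that directly follows each synopsis <p>."""
    texts = []
    for parent in root.iter():
        children = list(parent)
        for position, child in enumerate(children):
            if child.tag != "p" or not has_class(child, "titleCard-synopsis"):
                continue
            # Look only at the immediately following sibling, like the CSS "+" combinator.
            following = children[position + 1 : position + 2]
            if following and has_class(following[0], "ptrack-content"):
                texts.append((following[0].text or "").strip())
    return texts


root = ET.fromstring(PARSED)
print(nested_matches(root))
print(synopsis_texts(root))
```
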

Related

How to find xpath of an element which depends upon sibling class

I have below html code
<a class = sidetoolsdivider>
<div class = sideone > Test 1 </div>
<div class = sidetwo> </div>
</a>
<a class = sidetoolsdivider>
<div class = sideone > Test 2 </div>
<div class = sidetwo> </div>
</a>
...............
Here I need an XPath locator for the sidetwo element that belongs with the text Test 1. There are many similar blocks, so they can only be told apart by the element text.
Since the element depends on the text, you can use text() to locate it. The XPath would be something like one of the following:
//div[normalize-space(text())='Test 1']/following-sibling::div
Or
//div[contains(text(),'Test 1')]/following-sibling::div
Or
//div[contains(text(),'Test 1')]/following-sibling::div[@class='sidetwo']
This gets you the correct 'a'. Find an 'a' that contains the right sideone div (note the .//, which searches the descendants of the 'a'):
"//a[.//div[@class='sideone' and normalize-space(text())='Test 1']]"
Then just get the sidetwo div. The complete XPath:
"//a[.//div[@class='sideone' and normalize-space(text())='Test 1']]//div[@class='sidetwo']"
This works even if there is more text inside the 'a' and the markup gets more complex with more elements inside.
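The same "pick the parent by one child's text, then take the other child" idea can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard-library ElementTree parser rather than Selenium, the question's markup is rewritten with quoted attributes and a wrapper element so that it parses as XML, the sidetwo contents ("first", "second") are placeholders, and sidetwo_for is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The question's markup with quoted attributes and a wrapper element.
# The sidetwo contents are placeholders so the two blocks can be told apart.
MARKUP = """
<root>
  <a class="sidetoolsdivider">
    <div class="sideone"> Test 1 </div>
    <div class="sidetwo">first</div>
  </a>
  <a class="sidetoolsdivider">
    <div class="sideone"> Test 2 </div>
    <div class="sidetwo">second</div>
  </a>
</root>
""".strip()


def sidetwo_for(root, label):
    """Return the sidetwo div that shares an <a> with the sideone div showing `label`."""
    for anchor in root.iter("a"):
        side_one = anchor.find("div[@class='sideone']")
        # Strip the text first: the markup pads "Test 1" with spaces.
        if side_one is not None and (side_one.text or "").strip() == label:
            return anchor.find("div[@class='sidetwo']")
    return None


root = ET.fromstring(MARKUP)
print(sidetwo_for(root, "Test 1").text)
```

Stripping the text before comparing plays the same role as normalize-space() in the XPath: an exact comparison against the padded text would not match.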

How to create an Xpath in a tricky section of document (for me) for the purpose of using with Selenium Basic in VBA

OK, so I mentioned Selenium Basic as that is the use of the XPath and I believe Selenium Basic uses Selenium version 2 so maybe it won't be able to understand some/all answers that might require the latest Selenium. But someone might take that into account if necessary.
There are dynamic classes at play here.
Criteria for selection.
1. Class starting with 'NextToJump__eventWrapper' (the outer one) must be used.
2. Class starting with 'NextToJump__venue' must contain text = 'Ballarat'
3. Class starting with 'NextToJump__race' (and/or span) must contain text = 'Race 2'
I need to be able to click on the <a> tag that contains Points 2 and 3.
The best that I've been able to do (and checked) using ChroPath in Chrome Devtools is...
//div[starts-with(@class,'NextToJump__eventWrapper')]//descendant::*[contains(text(),'Ballarat')]
But note that there are 2 cases of Point 2 in the HTML but only 1 case that satisfies Points 2 and 3.
Thanks
<div class="NextToJump__eventWrapper--13zZJ">
<div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/crayford-am/20200708/race-1-1801951-58544404">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 1</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY active" href="/racing-betting/greyhound-racing/rockhampton/20200708/race-4-1799474-58466521" aria-current="page">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Rockhampton</div>
<div class="NextToJump__race--3JydR"><span>Race 4</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">2m 52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/ballarat/20200708/race-4-1799454-58465201">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 2</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">5m 52s</span></div>
</a>
</div>
</div>
</div>
The xpath expression you need to use to select your target <a> tag is long and convoluted, but that's life....
[formatted for ease of reading, but you can use that in one line]
//a
  [ancestor::div[starts-with(@class,'NextToJump__eventWrapper')]]
  [.//div[.="Ballarat"]
    [starts-with(@class,'NextToJump__venue-')]
    [./following-sibling::div[.="Race 2"]
      [starts-with(@class,'NextToJump__race-')]
    ]
  ]
Edit:
In "plain English":
Find an <a> node which meets ALL these conditions: (i) it has an ancestor (not necessarily a parent) node which is a <div> whose class attribute value starts with NextToJump__eventWrapper; and (ii) it has a <div> descendant (not just a child) node which has Ballarat as its text AND whose class attribute value starts with NextToJump__venue-, where that <div> descendant itself has a following sibling <div> which has Race 2 as its text AND whose class attribute value starts with NextToJump__race-.
Yes, the word "plain" doesn't really fit here, but that's the closest I could get. I like xpath, and it's very powerful, but sometimes it's very hard to follow... As an aside, it would have been somewhat less cryptic if xquery was used instead of straight xpath.
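For readers who find the nested predicates hard to follow, the same conditions can be spelled out step by step. The sketch below is an illustration only: it assumes a trimmed copy of the question's HTML parsed with Python's standard-library ElementTree (which does not support starts-with() or following-sibling in its XPath subset, so the checks are written by hand), and the helper names text_of, class_starts_with and find_link are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (icon and countdown divs removed).
MARKUP = """
<div class="NextToJump__eventWrapper--13zZJ">
  <div>
    <div class="NextToJump__raceEvent--bfMON">
      <a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/crayford-am/20200708/race-1-1801951-58544404">
        <div class="NextToJump__eventDetail--CUzdX">
          <div class="NextToJump__venue--1jwWA">Ballarat</div>
          <div class="NextToJump__race--3JydR"><span>Race 1</span></div>
        </div>
      </a>
    </div>
    <div class="NextToJump__raceEvent--bfMON">
      <a class="Link__link--9x4YY active" href="/racing-betting/greyhound-racing/rockhampton/20200708/race-4-1799474-58466521">
        <div class="NextToJump__eventDetail--CUzdX">
          <div class="NextToJump__venue--1jwWA">Rockhampton</div>
          <div class="NextToJump__race--3JydR"><span>Race 4</span></div>
        </div>
      </a>
    </div>
    <div class="NextToJump__raceEvent--bfMON">
      <a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/ballarat/20200708/race-4-1799454-58465201">
        <div class="NextToJump__eventDetail--CUzdX">
          <div class="NextToJump__venue--1jwWA">Ballarat</div>
          <div class="NextToJump__race--3JydR"><span>Race 2</span></div>
        </div>
      </a>
    </div>
  </div>
</div>
""".strip()


def text_of(element):
    """All text inside the element, like the XPath string value '.'."""
    return "".join(element.itertext()).strip()


def class_starts_with(element, prefix):
    return element.get("class", "").startswith(prefix)


def find_link(root, venue, race):
    """First <a> under an eventWrapper div holding a venue div followed by a matching race div."""
    for wrapper in root.iter("div"):
        if not class_starts_with(wrapper, "NextToJump__eventWrapper"):
            continue
        for link in wrapper.iter("a"):
            # Every element at or below the link may be the parent of the venue div.
            for container in link.iter():
                children = list(container)
                for position, child in enumerate(children):
                    if not (class_starts_with(child, "NextToJump__venue-")
                            and text_of(child) == venue):
                        continue
                    # following-sibling: only siblings after the venue div count.
                    for sibling in children[position + 1:]:
                        if (class_starts_with(sibling, "NextToJump__race-")
                                and text_of(sibling) == race):
                            return link
    return None


root = ET.fromstring(MARKUP)
print(find_link(root, "Ballarat", "Race 2").get("href"))
```
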

How to write xpath for a field and validate the fields

I have a requirement to verify field name and values. My code looks like
<div class="line info">
<div class="unit labelInfo TextMdB">
Reference #:
</div>
<div class="unit lastUnit">
701
</div>
</div>
</div>
<div class="line info">
<div class="unit labelInfo TextMdB">
Registered Date:
</div>
<div class="unit lastUnit">
05/05/2020
</div>
</div>
I gave my xpath as
"//div[@class='unit lastUnit']//preceding-sibling::div[@class='unit labelInfo TextMdB' and contains(text(),'Reference #:')]".
With this xpath I am able to reach the "Reference #" field. But how do I verify that the Reference # field is displaying the value (in this case 701)?
Appreciate your response.
Thanks
You can first reach the Reference # text by using its text in the xpath and then you can use following-sibling to fetch the div tag and then use getText()(java) / text (python) method to get 701.
(Edited answer after OP's comment)
If you want to check if the element is displayed on the page or not then you can fetch its list and check if the size of that list is greater than 0 or not.
You can do it like:
In Java:
List<WebElement> elementList = driver.findElements(By.xpath("//div[@class='line info']//div[contains(text(),'Reference #')]//following-sibling::div"));
if (elementList.size() > 0) {
    // Element is present on the UI
    // Finding its text
    String text = elementList.get(0).getText();
}
In Python:
elementList = driver.find_elements_by_xpath("//div[@class='line info']//div[contains(text(),'Reference #')]//following-sibling::div")
if len(elementList) > 0:
    # Element is present
    # Printing its text
    print(elementList[0].text)
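To make the label-then-sibling lookup concrete without a browser, here is a minimal sketch. It assumes the question's markup (with the stray closing tag removed and a wrapper div added so it is well-formed) parsed with Python's standard-library ElementTree instead of Selenium; value_for is a hypothetical helper name. Note that the text has to be stripped before comparing, because the markup surrounds the value with newlines.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in one div so that it is well-formed.
MARKUP = """
<div>
  <div class="line info">
    <div class="unit labelInfo TextMdB">
      Reference #:
    </div>
    <div class="unit lastUnit">
      701
    </div>
  </div>
  <div class="line info">
    <div class="unit labelInfo TextMdB">
      Registered Date:
    </div>
    <div class="unit lastUnit">
      05/05/2020
    </div>
  </div>
</div>
""".strip()


def value_for(root, label):
    """Return the stripped text of the div that follows the div containing `label`."""
    for row in root.iter("div"):
        cells = list(row)
        # The last cell has no following sibling, so it cannot be a label.
        for position, cell in enumerate(cells[:-1]):
            if label in (cell.text or ""):
                return (cells[position + 1].text or "").strip()
    return None


root = ET.fromstring(MARKUP)
print(value_for(root, "Reference #"))
```

A None result corresponds to the empty list in the Selenium versions above: the label, and therefore its value, is not present.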

Not finding the Correct xpath

I'm trying write a Python script to get some information from Google's products listed on the top right of the screen. (Usual 6 pictures with price and seller)
I am using Python, PhantomJS and Selenium
Doing a google search for "red shoe" I want my script to return the prices. I get stuck in the step where I try to even find the element containing the products. Am I missing something with my xpath?
import time
from selenium import webdriver

def getTopSongs(object):
    print("Working YETI")
    browser = webdriver.PhantomJS('c:/projects/phantomjs/phantomjs.exe')
    browser.get('http://google.com/search?q=red+shoe')
    time.sleep(5)
    title = browser.find_element_by_xpath('//div[contains(@class, "pla-unit")]/text()[contains(., "red")]/following::b').text
On Google's page, the element I want sits under a few nested divs:
<div id="rhs">
...
<div class="_Pwb">
<div class="_Ohb">
<div style="width:109px" class="pla-unit">
<div class="_PD">
<div class="pla-unit-img-container">
<div class="_Z5">
<div class="_vT"><a href="http://www.somewebsite.com">
<span class="rhsl4">Nina 'Forbes' Peep Toe Pump <b>Red</b> R...</span>
<span class="rhsg3 rhsl5">Nina 'Forbes' Peep Toe Pum...</span>
<span class="rhsg4">Nina 'Forbes' Peep Toe Pu...</span></a>
</div>
<div class="_QD"><b>$78.95</b></div>
<div class="_mC">
<span class="rhsl4 a">Nordstrom</span>
<span class="rhsg3 rhsl5 a">Nordstrom</span>
<span class="rhsg4 a">Nordstrom</span>
</div>
</div>
*Update:
I added more HTML. In this example I am looking to get the text from ($78.95) and (Nordstrom).
*Update
To clarify,
<div id="rhs">
is an unique element
There are however multiple (6) elements of:
<div style="width:109px" class="pla-unit">
The elements under each category have the same name and follow the same structure and substructures
ie, there are 6
<div class="_PD">
<div class="pla-unit-img-container">
<div class="_Z5">
<div class="_vD">
<div class="_QD">
<div class="_mC">
and so on.
The main objective is to get all of the elements but for purposes of debugging I was asking help to get the first one.
The xpath for a price unit using XPathChecker on Firefox is:
id('rhs_block')/x:div[1]/x:div/x:div/x:div/x:div[1]/x:div[1]/x:div[2]/x:div[2]/x:b
You can use ancestor:: to go back up then following-sibling:: to get elements at the same level that follow it.
I haven't tried this but give it a shot:
title = browser.find_element_by_xpath('//div[contains(@class, "pla-unit")]/text()[contains(., "red")]/ancestor::div/following-sibling::div[1]').text
Then to get to your div class='_mC' you just change:
following-sibling::div[1]
to
following-sibling::div[2]
and get the text from the spans under that.

How to extend dijit.form.button

I am trying to extend dijit.form.Button with an extra attribute, but this is not working. The code is given below.
In file1.js
dojo.require('dijit.form.Button');
dojo.extend(dijit.form.Button,{xyz: ''});
In file2.jsp
<script type="text/javascript" src="file1.js"></script>
<div dojoType="dijit.form.Button" xyz="abc"></div>
However when I look at the HTML of the created button (In chrome seen by right click and then selecting 'inspect element' option), it doesn't show xyz attribute.
You need to keep in mind that there's a distinction between the widget object and its HTML representation. When you extend dijit.form.Button, the xyz attribute is added to the widget class, but not automatically to the HTML that the widget will render. So in your case, if you do
console.debug(dijit.byId("yourWidgetId").get("xyz"));
.. you'll see that the button object does have the xyz member, but the HTML (like you point out) does not.
If you also want it to be visible in the HTML, you have to manually add it to the HTML rendering of the button. One way to do that is to subclass dijit.form.Button and override the buildRendering method.
dojo.declare("my.Button", dijit.form.Button, {
xyz: '',
buildRendering: function() {
this.inherited(arguments);
this.domNode.setAttribute("xyz", this.xyz);
}
});
If you add an instance of your new Button class in the HTML, like so:
<div dojoType="my.Button" xyz="foobar" id="mybtn"></div>
.. then the HTML representation (after Dojo has parsed it and made it into a nice looking widget) will contain the xyz attribute. Probably something like this:
<span class="..." xyz="foobar" dir="ltr" widgetid="mybtn">
<span class="..." dojoattachevent="ondijitclick:_onButtonClick">
<input class="dijitOffScreen" type="button" dojoattachpoint="valueNode" ...>
</span>