How to extract H1, H2, H3 tag values from HTML using Selenium WebDriver


1. HTML
<div id="page-filter1">
<h3>
Browse Category
</h3>
<ul class="mt-accordion multiple">
<li id="phdesktopbody_0_phdesktopfilterbycategory_0_liAllProduct" class="cls-8dcbcbac-2fef-4231-9641-d61818abe0e0 item-1 odd first odd">
<a id="phdesktopbody_0_phdesktopfilterbycategory_0_hypAllProducts" href="/en-us/products">All Products</a>
</li>
2.
<div class="span12">
<h3 class="onelayout-heading">
<strong><em>Callout <sub>©</sub>itle<sup>x</sup></em></strong>
</h3>
</div>
<div id="phdesktopbody_0_phdesktopflexiblepromo_0_phdesktoppromocontentarea6b299d9421684ceaaed7c23ebee57f57_0_panelSubheadlineandCTASection" class="span7 pull-left">
I have more than one H2 and H3 tag on each page.
The class name and description change every time. Please help me identify the tags and extract their values.

Hi Arjun Vc, please do it as shown below. Even if the class name and description change every time, the same piece of code will work, because it locates the elements by tag name only.
driver.get("http://www.seleniumhq.com"); // link to your web page
// working with H1 H2 .... tags
String TagToWorkWith = "h1"; // here simply change the tag name on which you want to work
List<WebElement> myTags = driver.findElements(By.tagName(TagToWorkWith));
// now extracting the values
// this for loop prints the text of every element with the chosen tag (here 'h1')
for(int i=0;i<myTags.size();i++){
// extracting tags text
System.out.println(TagToWorkWith + " value is : " + myTags.get(i).getText());
}
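
If you want all three heading levels in one pass instead of changing TagToWorkWith each time, an XPath union such as //h1 | //h2 | //h3 is one option (in Selenium that string would be passed to By.xpath). Below is a minimal sketch that only checks the expression offline with the JDK's built-in XPath engine. The class name HeadingText and the sample markup are made up for illustration, and a real page would need Selenium rather than an XML parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadingText {
    // Hypothetical, well-formed stand-in for the page in the question.
    static final String SAMPLE =
          "<div id='page-filter1'>"
        + "<h3>Browse Category</h3>"
        + "<div class='span12'><h3 class='onelayout-heading'>"
        + "<strong><em>Callout Title</em></strong></h3></div>"
        + "<h2>Another heading</h2>"
        + "</div>";

    // Returns "tag: text" for every h1, h2 and h3 element in the markup.
    static List<String> headings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // The union operator selects nodes matched by any of the three paths.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//h1 | //h2 | //h3", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() also includes text of nested tags such as <strong>.
                out.add(nodes.item(i).getNodeName() + ": "
                    + nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        headings(SAMPLE).forEach(System.out::println);
    }
}
```

With Selenium the equivalent call would be driver.findElements(By.xpath("//h1 | //h2 | //h3")), followed by getText() on each element, as in the loop above.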

Related

Use GetElementsByClass to find all <div> elements by class name, nested inside a <p> element

I am creating a parser using Jsoup in Kotlin.
I need to get the inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis".
When I call getElementsByClass on an element returned by an earlier getElementsByClass call, I get 0 elements.
Code:
class NetlifxHtmlParser {
val html = """
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title"><span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span></div>
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.</div>
</p>
</div>
""".trimIndent()
fun parseEpisode() {
val doc = Jsoup.parseBodyFragment(html)
val titleCards = doc.getElementsByClass("titleCard-synopsis")
println("Episode: count titleCard = > ${titleCards.count()}") // 2
titleCards.forEachIndexed { index, element ->
val ptrack = element.getElementsByClass("ptrack-content")
println("Episode: count ptrack = > ${ptrack.count()}") // 0 !!
println("inner html = > ${ptrack.html()}") // null string !!
}
}
}
In the above code,
First, I extract the tags with class name titleCard-synopsis.
For that, I use doc.getElementsByClass("titleCard-synopsis"), which returns 2 elements.
Then, for each titleCard element, I extract the elements that have the class ptrack-content by calling the same getElementsByClass on it,
which returns an empty list.
Why is this happening?
My goal is to extract the description text for each title, which is stored in the tags nested inside the p tag with class titleCard-synopsis.
If I query "ptrack-content" directly, it works fine, but this is a general class used in many places in the main HTML source (this is only a snippet).
I need to get the inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis".
But with the method in the code above, I only get an empty list.
Why?
Also note that if I call the html() method on an element of titleCards,
I do not get the inner div tag, only an empty string!
Please guide me to resolve the issue!

TL;DR
I need to get the inner text of a tag with class "ptrack-content" inside the tag with class "titleCard-synopsis".

I'm not really familiar with Kotlin, but this should produce the desired output:
val doc = Jsoup.parseBodyFragment(html)
val result = doc.select(".titleCard-synopsis + .ptrack-content")
result.forEachIndexed {index, element ->
println("${element.html()}")
}

This is an interesting problem!
You basically have invalid HTML, and jsoup is smart enough to auto-correct it for you. As a result, your HTML structure gets altered and suddenly your query does not work.
This is the error:
<p class="titleCard-synopsis previewModal--small-text">
<div class="ptrack-content">A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.</div>
</p>
You can't nest a <div> element inside a <p> element like that.
Paragraphs are block-level elements, and notably will automatically close if another block-level element is parsed before the closing </p> tag. [Source: <p>: The Paragraph element]
Also, look at Nesting block level elements inside the <p> tag... right or wrong?
This is how jsoup parses your tree:
<html>
<head></head>
<body>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Map Her</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
A hidden map rocks Hartley High as the students' sexcapades are publicly exposed. Caught as the culprit, Amerie becomes an instant social pariah.
</div>
<p></p>
</div>
<div class="titleCardList--metadataWrapper">
<div class="titleCardList-title">
<span class="titleCard-title_text">Renaissance Titties</span><span><span class="duration ellipsized">50m</span></span>
</div>
<p class="titleCard-synopsis previewModal--small-text"></p>
<div class="ptrack-content">
Amerie, the new outcast, receives a party invitation that gives her butterflies. But when she manages to show up, a bitter surprise awaits.
</div>
<p></p>
</div>
</body>
</html>
As you can see, elements with class titleCard-synopsis have no children with class ptrack-content.

How to write xpath for a field and validate the fields

I have a requirement to verify field names and values. My HTML looks like this:
<div class="line info">
<div class="unit labelInfo TextMdB">
Reference #:
</div>
<div class="unit lastUnit">
701
</div>
</div>
</div>
<div class="line info">
<div class="unit labelInfo TextMdB">
Registered Date:
</div>
<div class="unit lastUnit">
05/05/2020
</div>
</div>
I wrote my XPath as
"//div[@class='unit lastUnit']//preceding-sibling::div[@class='unit labelInfo TextMdB' and contains(text(),'Reference #:')]".
With this XPath I am able to reach the "Reference #" field. But how do I verify that the Reference # field is displaying the value (in this case 701)?
I'd appreciate your response.
Thanks

You can first reach the Reference # label by using its text in the XPath, then use following-sibling to fetch the div tag next to it, and then use the getText() (Java) / text (Python) method to get 701.
(Edited answer after OP's comment)
If you want to check if the element is displayed on the page or not then you can fetch its list and check if the size of that list is greater than 0 or not.
You can do it like:
In Java:
List<WebElement> elementList = driver.findElements(By.xpath("//div[@class='line info']//div[contains(text(),'Reference #')]//following-sibling::div"));
if (elementList.size() > 0) {
    // Element is present on the UI
    // Finding its text
    String text = elementList.get(0).getText();
}
In Python:
from selenium.webdriver.common.by import By

elementList = driver.find_elements(By.XPATH, "//div[@class='line info']//div[contains(text(),'Reference #')]//following-sibling::div")
if len(elementList) > 0:
    # Element is present
    # Printing its text
    print(elementList[0].text)
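
To check the following-sibling expression without a browser, here is a minimal sketch that runs it through the JDK's built-in XPath engine against a well-formed copy of the markup from the question. The class name LabelValueXPath, the helper valueFor and the wrapping root element are assumptions made for this example. It also assumes the label text contains no single quote, since the label is spliced into the expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LabelValueXPath {
    // Well-formed stand-in for the two "line info" rows in the question.
    static final String SAMPLE =
          "<root>"
        + "<div class='line info'>"
        + "<div class='unit labelInfo TextMdB'> Reference #: </div>"
        + "<div class='unit lastUnit'> 701 </div></div>"
        + "<div class='line info'>"
        + "<div class='unit labelInfo TextMdB'> Registered Date: </div>"
        + "<div class='unit lastUnit'> 05/05/2020 </div></div>"
        + "</root>";

    // Returns the trimmed value shown next to the label, or null if the label is absent.
    static String valueFor(String xml, String label) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Find the label div by its text, then step to the first div after it.
            String expr = "//div[@class='line info']/div[contains(text(),'" + label
                + "')]/following-sibling::div[1]";
            Node value = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
            return value == null ? null : value.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueFor(SAMPLE, "Reference #"));
        System.out.println(valueFor(SAMPLE, "Registered Date"));
    }
}
```

In a Selenium test the same expression would go into By.xpath, and the returned text could be compared with the expected value (701 here) in an assertion.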

Selector to extract element value of type different tagName

There is a string on my web page that looks like "#Account9 Hey Dude". #Account9 is a link and 'Hey Dude' is a span. Please help me create a Selector to extract "#Account9 Hey Dude".
note: Can find something like selenium "normalize-space" method.
#Account9
Hey Dude

To create a selector that extracts the inner text of the provided markup:
<div class="XYZ123">
<span class="r-18u37iz">
<a href="/Account9" dir="ltr" role="link" data-focusable="true" class="ABC123">#Account9 </a>
</span>
</div>
<span class="ABCxyz123"> :Hey Dude</span>
you need to find the first common parent of these elements (div class="XYZ123" and span class="ABCxyz123"), specify a CSS selector for it, and read its innerText property.
const targetText = await Selector('<parent_of_these_elements>').innerText;

trouble making in xpath for "ul" html code in selenium webdriver

HTML code:
<div id="routingPanel" class="">
<div id="routingPanelRight">
<ul id="routingList" class="ui-sortable">
<li class="ui-menu-item ui-draggable" style="display: list-item;" role="presentation" data-type="srl" data-id="15">
<a class="ui-corner-all" tabindex="-1">AS-HTTS-US-LAN-SW</a>
<span class="fa fa-trash"/>
<span class="type">[srl]</span>
</li>
<li class="ui-menu-item ui-draggable" style="display: list-item;" role="presentation" data-type="queue" data-id="119">
<a class="ui-corner-all" tabindex="-1">AS-EMEA-NORTH</a>
<span class="fa fa-trash"/>
<span class="type">[queue]</span>
</li></ul></div></div>
I need to click a button whose span has the class "fa fa-trash", but it is inside an li element, and the page has a list of such buttons with the li changing.
I am supplying the test data from an Excel file, so I can't hard-code the value.
I tried to use these XPaths:
.//*[@id='routingList']/li[5]/span[1] //testdata1
.//*[@id='routingList']/li[2]/span[1] //testdata2
where the li index changes every time, depending on the Excel file.
WebDriverWait wait = new WebDriverWait(driver, 15);
wait.until(ExpectedConditions.visibilityOfElementLocated((By.xpath("//ul[@id='routingList']/li/span[1]")))).click();
List<WebElement> options = driver.findElements(By.xpath("//ul[@id='routingList']/li/span[1]"));
for (WebElement option : options) {
    if (testData.equals(option.getText()))
        option.click();
}
I tried the above code, but it deletes only one item from the list, although I passed two more test data values that need to be deleted.
Any suggestions, please?

According to the information you gave me in comments, I think the problem is that you are trying to get a text from an element that doesn't contain text.
Let's say your testData is AS-HTTS-US-LAN-SW. With the HTML you provided and the XPath you mentioned, you are selecting a self-closing tag, <span class="fa fa-trash"/>. Once this tag is selected, you try to get the text inside it, and there is none.
<ul id="routingList" class="ui-sortable">
<li class="ui-menu-item ui-draggable" style="display: list-item;" role="presentation" data-type="srl" data-id="15">
===========================
<a class="ui-corner-all" tabindex="-1">AS-HTTS-US-LAN-SW</a> ----> The text is contained here
<span class="fa fa-trash"/> ---> No text in that tag
===========================
<span class="type">[srl]</span>
</li>
</ul>
So, basically, you have to change your XPath from //ul[@id='routingList']/li/span[1] to //ul[@id='routingList']/li/a to get the text, and then go back to the parent node to find your button with ../span[contains(@class, 'fa fa-trash')]
WebDriverWait wait = new WebDriverWait(driver, 15);
wait.until(ExpectedConditions.visibilityOfElementLocated((By.xpath("//ul[@id='routingList']/li/span[1]")))); // removed the click here because you were clicking on the first element of the list
List<WebElement> options = driver.findElements(By.xpath("//ul[@id='routingList']/li/a"));
for (WebElement option : options) {
    if (testData.equals(option.getText())) {
        option.findElement(By.xpath("../span[contains(@class, 'fa fa-trash')]")).click();
    }
}
Tell me if it helped

I know you already accepted an answer but there's a more efficient way to do this. You can specify the text you are looking for as part of the XPath. So, you do a single search instead of looping through all the options which can be a performance hit if there are many options. Also, with something like this you are likely to use it more than once so put it in a function.
In this case, the function would take in the string you are looking for and then click the appropriate element.
public void selectRegion(String regionName)
{
driver.findElement(By.xpath("//a[.='" + regionName + "']/following-sibling::span[@class='fa fa-trash']")).click();
}
and you would call it like
selectRegion(testData);
The function looks for an A tag that contains the desired text and then clicks the sibling SPAN with class fa fa-trash.
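
As a rough offline check of that expression, the sketch below evaluates it with the JDK's built-in XPath engine against a trimmed, well-formed copy of the list from the question and reports the data-id of the li whose trash icon was selected. The class name TrashButtonXPath and the helpers trashXPath and dataIdOfMatch are hypothetical, and the code assumes the region name contains no single quote.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TrashButtonXPath {
    // Trimmed copy of the routing list from the question.
    static final String SAMPLE =
          "<ul id='routingList' class='ui-sortable'>"
        + "<li data-type='srl' data-id='15'><a class='ui-corner-all'>AS-HTTS-US-LAN-SW</a>"
        + "<span class='fa fa-trash'/><span class='type'>[srl]</span></li>"
        + "<li data-type='queue' data-id='119'><a class='ui-corner-all'>AS-EMEA-NORTH</a>"
        + "<span class='fa fa-trash'/><span class='type'>[queue]</span></li>"
        + "</ul>";

    // Builds the locator: the link with exactly this text, then its trash-icon sibling.
    static String trashXPath(String regionName) {
        return "//a[.='" + regionName + "']/following-sibling::span[@class='fa fa-trash']";
    }

    // Returns the data-id of the li that owns the matched icon, or null if nothing matched.
    static String dataIdOfMatch(String xml, String regionName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node span = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(trashXPath(regionName), doc, XPathConstants.NODE);
            return span == null ? null : ((Element) span.getParentNode()).getAttribute("data-id");
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataIdOfMatch(SAMPLE, "AS-EMEA-NORTH"));
    }
}
```

Since the question mentions several test data values from Excel, the same function can be called once per value in a loop. Each call performs a fresh lookup, which may also help if the list is re-rendered after a deletion.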

Identifying the Web element with same class name in Selenium

I have tried to get the number of tweets (tweet count) through Selenium.
Here is the page source:
<li class="DashboardProfileCard-stat Arrange-sizeFit">
<a class="DashboardProfileCard-statLink u-textUserColor u-linkClean u-block"
title="1 Tweet"
href="/saisiva14"
data-element-term="tweet_stats">
<span class="DashboardProfileCard-statLabel u-block">Tweets</span>
<span class="DashboardProfileCard-statValue" data-is-compact="false">1</span>
</a>
</li>
<li class="DashboardProfileCard-stat Arrange-sizeFit">
<a class="DashboardProfileCard-statLink u-textUserColor u-linkClean u-block"
title="38 Following"
href="/following"
data-element-term="follower_stats">
<span class="DashboardProfileCard-statLabel u-block">Following</span>
<span class="DashboardProfileCard-statValue" data-is-compact="false">38</span>
</a>
</li>
<li class="DashboardProfileCard-stat Arrange-sizeFit">
<a class="DashboardProfileCard-statLink u-textUserColor u-linkClean u-block"
title="4 Followers"
href="/followers"
data-element-term="following_stats">
<span class="DashboardProfileCard-statLabel u-block">Followers</span>
<span class="DashboardProfileCard-statValue" data-is-compact="false">4</span>
</a>
</li>
I am not able to locate the web elements for getting Tweets, Followers and Following. The reason is that the span class names are the same for all of these elements. Please help me.

To get the number of Tweets / Following / Followers, you can try the statements below:
System.out.println(driver.findElement(By.xpath("//a[contains(@title, 'Tweet')]/span[2]")).getText());
System.out.println(driver.findElement(By.xpath("//a[contains(@title, 'Following')]/span[2]")).getText());
System.out.println(driver.findElement(By.xpath("//a[contains(@title, 'Followers')]/span[2]")).getText());
To click on the Tweets / Following / Followers links, you can try the statements below:
driver.findElement(By.xpath("//a[contains(@title, 'Tweet')]")).click();
driver.findElement(By.xpath("//a[contains(@title, 'Following')]")).click();
driver.findElement(By.xpath("//a[contains(@title, 'Followers')]")).click();
or
driver.findElement(By.xpath("//a[./span[text()='Tweets']]")).click();
driver.findElement(By.xpath("//a[./span[text()='Following']]")).click();
driver.findElement(By.xpath("//a[./span[text()='Followers']]")).click();
The above statements are working fine for me.

Try this:
IList<IWebElement> elements = driver.FindElements(By.ClassName("DashboardProfileCard-stat"));
foreach (IWebElement element in elements)
{
IWebElement ele = element.FindElement(By.ClassName("DashboardProfileCard-statLabel"));
if (ele.Text == "Tweets")
{
return element.FindElement(By.ClassName("DashboardProfileCard-statValue")).Text;
}
}
This uses C#; you can adapt it accordingly if you are using another language.

The selector .DashboardProfileCard-stat span:nth-child(2) should give you the collection of web elements pointing to the counts. For example in Java:
List<WebElement> elements = driver.findElements(By.cssSelector(".DashboardProfileCard-stat span:nth-child(2)"));
Then you can use elements.get(0).getText() for tweets, elements.get(1).getText() for following, and elements.get(2).getText() for followers. Note that getText() returns a String, so it has to be parsed to get a number:
List<WebElement> elements = driver.findElements(By.cssSelector(".DashboardProfileCard-stat span:nth-child(2)"));
int tweets = Integer.parseInt(elements.get(0).getText());
int following = Integer.parseInt(elements.get(1).getText());
int followers = Integer.parseInt(elements.get(2).getText());
Of course, do your appropriate safety checks, etc. Check the size of the list before accessing it.

This code is in Java :)
Capture all the parents of the span elements in a collection.
Iterate over the collection to find the span elements (which are children of the <a> tag) and capture their text based on the class name variation statLabel / statValue. Note the leading dot in the inner XPaths: it makes the search relative to the current <a> element instead of the whole page.
List<WebElement> webElement = driver.findElements(By.xpath("//li[@class='DashboardProfileCard-stat Arrange-sizeFit']//a"));
for (WebElement element : webElement) {
    System.out.println(element.findElement(By.xpath(".//span[contains(@class,'statLabel')]")).getText());
    System.out.println(element.findElement(By.xpath(".//span[contains(@class,'statValue')]")).getText());
}
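
One detail worth checking when nesting lookups like this: in XPath, an expression that starts with // is evaluated from the document root even when a context element is supplied, while .// stays inside that element. The sketch below illustrates the difference with the JDK's built-in XPath engine on a trimmed copy of the profile card markup. The class name RelativeXPathDemo and its helpers are made up for this example, and no Selenium is involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Builds one trimmed <li> entry shaped like the page source in the question.
    static String stat(String title, String label, String value) {
        return "<li class='DashboardProfileCard-stat Arrange-sizeFit'><a title='" + title + "'>"
            + "<span class='DashboardProfileCard-statLabel u-block'>" + label + "</span>"
            + "<span class='DashboardProfileCard-statValue'>" + value + "</span></a></li>";
    }

    static final String SAMPLE = "<ul>"
        + stat("1 Tweet", "Tweets", "1")
        + stat("38 Following", "Following", "38")
        + stat("4 Followers", "Followers", "4")
        + "</ul>";

    // Evaluates spanExpr once per <a> element, using that element as the context node.
    static List<String> values(String xml, String spanExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList links = (NodeList) xpath.evaluate(
                "//li[@class='DashboardProfileCard-stat Arrange-sizeFit']//a",
                doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                // The two-argument evaluate returns the string value of the first match.
                out.add(xpath.evaluate(spanExpr, links.item(i)));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Relative: one value per link.
        System.out.println(values(SAMPLE, ".//span[contains(@class,'statValue')]"));
        // Absolute: the first statValue span of the document, repeated for every link.
        System.out.println(values(SAMPLE, "//span[contains(@class,'statValue')]"));
    }
}
```

Here the relative form should yield 1, 38 and 4, while the absolute form keeps returning the first value. Selenium's element.findElement(By.xpath(...)) follows the same rule, so the relative form is usually what you want inside a loop over parent elements.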
