Retrieving a substring in Selenium

I have the below HTML code:
<div class="card-platform">
<span class="sr-only">Hosted at the </span>
<!-- react-text: 365 -->
ANDROID
<!-- /react-text -->
<span class="sr-only"> app store</span>
</div>
I want to retrieve the word "ANDROID". There are many elements like this; some have the value "ANDROID" and others have "IOS". For each element, I want to print whichever of ANDROID/IOS is defined in the HTML.
How to get this value using selenium?

The required text is not located inside the "react-text" tag, since that element is just a comment; the text is located directly inside the div. You can use the code below to get the required value:
WebElement MyText = driver.findElement(By.xpath("//div[@class='card-platform']"));
JavascriptExecutor jse = (JavascriptExecutor)driver;
String platformName = (String) jse.executeScript("return arguments[0].childNodes[4].nodeValue", MyText);
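If you are using the Python bindings instead, a minimal equivalent sketch (the URL and driver choice are placeholders; it assumes the same markup as above):
from selenium import webdriver

driver = webdriver.Chrome()  # any driver works
driver.get("https://example.com/page-with-cards")  # hypothetical URL

card = driver.find_element_by_xpath("//div[@class='card-platform']")
# childNodes[4] is the bare text node holding ANDROID/IOS in the snippet above
platform = driver.execute_script(
    "return arguments[0].childNodes[4].nodeValue;", card).strip()
print(platform)  # ANDROID or IOS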

This is a simple exercise in string manipulation:
String fullText = driver.findElement(By.className("card-platform")).getText();
assert fullText.equals("Hosted at the \nANDROID\n app store");
String prefix = driver.findElements(By.className("sr-only")).get(0).getText();
String suffix = driver.findElements(By.className("sr-only")).get(1).getText();
assert prefix.equals("Hosted at the ");
assert suffix.equals(" app store");
String yourText = fullText.replace(prefix, "").replace(suffix, "");
assert yourText.equals("ANDROID");
Of course you will have to adjust the locators for your different cases; this is based on the tiny code snippet you provided.
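Since the question asks for the value of every such element, here is a hedged sketch in Python that loops over all the cards. It uses textContent rather than .text because "sr-only" spans are typically hidden visually, in which case .text would return an empty string for them:
cards = driver.find_elements_by_class_name("card-platform")
for card in cards:
    full_text = card.get_attribute("textContent")
    # strip the screen-reader-only prefix and suffix, leaving ANDROID or IOS
    for span in card.find_elements_by_class_name("sr-only"):
        full_text = full_text.replace(span.get_attribute("textContent"), "")
    print(full_text.strip())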

In a similar situation, I got a number using the code below.
from requests import get
from bs4 import BeautifulSoup
pagina = "https://example"
response = get(pagina)
soup = BeautifulSoup(response.text, "html.parser")
cont = soup.find(class_="Trsdu(0.3s)")
print(cont.text)

How to use 'find_elements_by_xpath' inside a for loop

I'm somewhat (or very) confused about the following:
from selenium.webdriver import Chrome
driver = Chrome()
html_content = """
<html>
<head></head>
<body>
<div class='first'>
Text 1
</div>
<div class="second">
Text 2
<span class='third'> Text 3
</span>
</div>
<div class='first'>
Text 4
</div>
<my_tag class="second">
Text 5
<span class='third'> Text 6
</span>
</my_tag>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
What I'm trying to do is find each span element using XPath, print out its text, and then print out the text of the parent of that element. The final output should be something like:
Text 3
Text 2
Text 6
Text 5
I can get the text of span like this:
el = driver.find_elements_by_xpath("*//span")
for i in el:
    print(i.text)
With the output being:
Text 3
Text 6
But when I try to get the parent's (and only the parent's) text by using:
elp = driver.find_elements_by_xpath("*//span/..")
for i in elp:
    print(i.text)
The output is:
Text 2 Text 3
Text 5 Text 6
The XPath expressions *//span/.. and //span/../text() usually (but not always, depending on which XPath test site is being used) evaluate to:
Text 2
Text 5
which is what I need for my for loop.
Hence the confusion. So I guess what I'm looking for is a for loop which, in pseudo code, looks like:
el = driver.find_elements_by_xpath("*//span")
for i in el:
    print(i.text)
    print(i.parent.text)  # trying this in real life raises an error....
There are probably a few ways to do this. Here's one way:
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
    print(s.split('<')[0].strip())
I used a simple CSS selector to find the child elements ("Text 3" and "Text 6"), loop through them, print their .text, then navigate up one level to the parent and print its text as well. As the OP noted, printing the parent's text also prints the child's, so to get around this we grab the parent's innerHTML, split it, and strip out the spaces.
To explain the XPath ./.. in more detail:
.  start at an existing node, the 'i' in 'i.find_element_*'. If you skip/remove this '.', you will start at the top of the DOM instead of at the child element you've already located.
.. go up one level, to find the parent.
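A quick sketch against the test page loaded in the question shows the difference the leading '.' makes (the second span sits inside <my_tag>):
spans = driver.find_elements_by_xpath("*//span")
second_span = spans[1]  # the span inside <my_tag>
# relative to the already-located span: its direct parent
print(second_span.find_element_by_xpath("./..").tag_name)       # my_tag
# without the leading '.', '//' restarts the search from the top of the DOM
print(second_span.find_element_by_xpath("//span/..").tag_name)  # div (first match in the document)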
I know I already accepted @JeffC's answer, but in the course of working on this question something occurred to me. It's very likely overkill, but it's an interesting approach and, for the sake of future generations, I figured I might as well post it here as well.
The idea is to use BeautifulSoup, because BS has a couple of methods for erasing nodes from the tree. One that is useful here (and for which, to my knowledge, Selenium doesn't have an equivalent) is decompose(). We can use decompose() to suppress the printing of the second part of the parent's text, which is contained inside a span tag, by eliminating that tag and its content. So we import BS and start with @JeffC's answer:
from bs4 import BeautifulSoup
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
    print(i.text)
    s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
    # and here switch to bs4
    content = BeautifulSoup(s, 'html.parser')
    content.find('span').decompose()
    print(content.text)
And the output, without string manipulation, regex, or whatnot is...:
Text 3
Text 2
Text 6
Text 5
i.parent.text will not work. In Java I used to write something like this (the XPath to the parent may be parent::div):
ele.get(i).findElement(By.xpath("parent::div")).getText();
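For the Python bindings, the same parent-axis lookup is a short sketch (note that, as discussed above, the parent's .text will still include the child's text):
spans = driver.find_elements_by_xpath("*//span")
for span in spans:
    parent = span.find_element_by_xpath("parent::*")  # or "./.."
    print(parent.text)  # still prints the child's text too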
Here is a Python method that will retrieve the text from only the parent node.
def get_text_exclude_children(element):
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                textValue += child.textContent;
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()
This is how to use the method in your case:
elements = driver.find_elements_by_css_selector("span.third")
for eleNum in range(len(elements)):
    print(driver.find_element_by_xpath("(//span[@class='third'])[" + str(eleNum + 1) + "]").text)
    print(get_text_exclude_children(driver.find_element_by_xpath("(//span[@class='third'])[" + str(eleNum + 1) + "]/parent::*")))
Here is the output:
Text 3
Text 2
Text 6
Text 5

Need Help in programming using selenium

My problem is that I cannot store the value of the href on the page:
<a target="_blank" href="http://xxx.xx/RLS?mid=-1050286007&guid=53v90152oyA8bDg&lid=26527875" clinkid="26527875"></a>
How can I get the value of the href using findElement?
You should try using getAttribute after finding the element, as below:
String href = driver.findElement(By.cssSelector("a[clinkid = '26527875']")).getAttribute("href");
Edit 1: if clinkid is dynamically generated, try locating by visible link text as below:
String href = driver.findElement(By.linkText("your link text")).getAttribute("href");
or
String href = driver.findElement(By.partialLinkText("your link text")).getAttribute("href");
or
String href = driver.findElement(By.cssSelector("a[target = '_blank']")).getAttribute("href");
Edit 2: if you want the query string from the URL, you can use java.net.URL to parse it and then call getQuery() to get the query string, as below:
URL url = new URL(href);
String queryStr = url.getQuery();
Hope it helps..:)

Locating Element with same class in Selenium using c#

I am trying to access ABC. I know that a simple By.ClassName("bb") will not work here. How else can I access this content?
<body>
<div id="Frame">
<div class="bb"></div>
<div class="bb">ABC</div>
</div>
</body>
You can use the CSS selector below to get the value of "ABC":
.bb:nth-child(2)
You can use "XPath" Expression to find or locating your element.
Example : element = findElement(By.xpath("Your xpath expression");
For your XML use following line.
element = findElement(By.xpath("/body/div/div[#class='bb'][node()]");
There is a way to do this directly in XPath, but I am not an XPath expert, so I can give you a solution using CSS selectors. Basically you grab all the DIVs with class bb and then search their text for the desired text.
String searchText = "ABC";
IReadOnlyCollection<IWebElement> divs = driver.FindElements(By.CssSelector("div.bb"));
foreach (IWebElement div in divs)
{
if (div.Text == searchText)
{
break; // exit the for and use the variable 'div' which contains the desired DIV
}
}

How to get innerHTML of whole page in selenium driver?

I'm using selenium to click to the web page I want, and then parse the web page using Beautiful Soup.
Somebody has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.
The sample code in Python
(Based on the post above, the language seems to not matter too much):
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)
the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
To get the HTML for the whole page:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")
html = driver.page_source
To get the outer HTML (tag included):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('outerHTML')
To get the inner HTML (tag excluded):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('innerHTML')
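To tie this back to the question's goal of parsing with Beautiful Soup, a minimal sketch (the URL is just an example):
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get("http://www.google.com")
# feed the full page source straight into Beautiful Soup
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.title.string)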
driver.page_source is probably outdated. The following worked for me (JavaScript bindings):
let html = await driver.getPageSource();
Reference: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/ie_exports_Driver.html#getPageSource
Using a page object in Java:
@FindBy(xpath = "xpath")
private WebElement element;
public String getInnerHtml() {
    System.out.println(waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML"));
    return waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
}
A C# snippet for those of us who might want to copy/paste a bit of working code some day:
var element = yourWebDriver.FindElement(By.TagName("html"));
string outerHTML = element.GetAttribute(nameof(outerHTML));
Thanks to those who answered before me. Anyone in the future who benefits from this snippet of C# that gets the HTML for any page element in a Selenium test, please consider upvoting this answer or leaving a comment.

Testing relative positions of elements

On the page under test I have a "Support" link, which is represented with the following HTML:
<div class="ap-version-panel ap-version-support">
<i class="fa fa-external-link"></i>Support
</div>
What I'm trying to do is test that the icon is located before the "Support" text. How can I do that?
I've tried the following - locate the icon element with an XPath that additionally checks that there is "Support" text after it, and check if the element is present:
expect(element(by.xpath("//text()[. = 'Support']/preceding-sibling::i[contains(@class, 'fa-external-link')]")).isPresent()).toBe(true);
This works but it is quite ugly and I don't like that the actual position check is hidden inside the XPath expression which is not really readable and reliable.
I recommend changing the product under test by adding a <span> and moving the text "Support" into the <span>.
<div class="ap-version-panel ap-version-support">
<i class="fa fa-external-link"></i><span>Support<span>
</div>
But if you cannot change the product, you can use a JavascriptExecutor to get the childNodes and then check the order of the nodes:
var aFunctionToCheckOrder = function () {
    var nodes = arguments[0].childNodes;
    // nodes[0] should be the <i> icon, nodes[1] the "Support" text node
    return nodes[0].tagName === 'I' &&
        nodes[1].nodeType === Node.TEXT_NODE &&
        nodes[1].nodeValue.trim() === 'Support';
};
var supportLink = element(by.linkText("Support"));
browser.executeScript(aFunctionToCheckOrder, supportLink).then(...)
As you can see, it is even uglier than your solution. You'd better change the product under test.