How to use 'find_elements_by_xpath' inside a for loop - selenium

I'm somewhat (or very) confused about the following:
from selenium.webdriver import Chrome
driver = Chrome()
html_content = """
<html>
<head></head>
<body>
<div class='first'>
Text 1
</div>
<div class="second">
Text 2
<span class='third'> Text 3
</span>
</div>
<div class='first'>
Text 4
</div>
<my_tag class="second">
Text 5
<span class='third'> Text 6
</span>
</my_tag>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
What I'm trying to do, is find each span element using xpath, print out its text and then print out the text of the parent of that element. The final output should be something like:
Text 3
Text 2
Text 6
Text 5
I can get the text of span like this:
el = driver.find_elements_by_xpath("*//span")
for i in el:
print(i.text)
With the output being:
Text 3
Text 6
But when I try to get the parent's (and only the parent's) text by using:
elp = driver.find_elements_by_xpath("*//span/..")
for i in elp:
print(i.text)
The output is:
Text 2 Text 3
Text 5 Text 6
The xpath expressions *//span/..and //span/../text() usually (but not always, depending on which xpath test site is being used) evaluate to:
Text 2
Text 5
which is what I need for my for loop.
Hence the confusion. So I guess what I'm looking for is a for loop which, in pseudo code, looks like:
el = driver.find_elements_by_xpath("*//span")
for i in el:
print(i.text)
print(i.parent.text) #trying this in real life raises an error....

There's probably a few ways to do this. Here's one way
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
print(i.text)
s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
print(s.split('<')[0].strip())
I used a simple CSS selector to find the child elements ("text 3" and "text 6"). I loop through those elements and print their .text as well as navigate up one level to find the parent and print its text also. As OP noted, printing the parent text also prints the child. To get around this, we need to get the innerHTML, split it and strip out the spaces.
To explain the XPath in more detail
./..
^ start at an existing node, the 'i' in 'i.find_element_*'. If you skip/remove this '.', you will start at the top of the DOM instead of at the child element you've already located.
^ go up one level, to find the parent

I know I already accepted #JeffC's answer, but in the course of working on this question something occurred to me. It's very likely an overkill, but it's an interesting approach and, for the sake of future generations, I figured I might as well post it here as well.
The idea involves using BeautifulSoup. The reason is that BS has a couple of methods for erasing nodes from the tree. One of them which can be useful here (and for which, to my knowledge, Selenium doesn't have an equivalent method) is decompose() (see more here). We can use decompose() to suppress the printing of the second part of the text of the parent, which is contained inside a span tag by eliminating the tag and its content. So we import BS and start with #JeffC's answer:
from bs4 import BeautifulSoup
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
print(i.text)
s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
and here switch to bs4
content = BeautifulSoup(s, 'html.parser')
content.find('span').decompose()
print(content.text)
And the output, without string manipulation, regex, or whatnot is...:
Text 3
Text 2
Text 6
Text 5

i.parent.text will not work, in java i used to write some thing like
ele.get(i).findElement("here path to parent may be parent::div ").getText();

Here is the python method that will retrieve the text from only parent node.
def get_text_exclude_children(element):
return driver.execute_script(
"""
var parent = arguments[0];
var child = parent.firstChild;
var textValue = "";
while(child) {
if (child.nodeType === Node.TEXT_NODE)
textValue += child.textContent;
child = child.nextSibling;
}
return textValue;""",
element).strip()
This is how to use the method in your case:
elements = driver.find_elements_by_css_selector("span.third")
for eleNum in range(len(elements)):
print(driver.find_element_by_xpath("(//span[#class='third'])[" + str(eleNum+1) +"]").text)
print(get_text_exclude_children(driver.find_element_by_xpath("(//span[#class='third'])[" + str(eleNum+1) +"]/parent::*")))
Here is the output:

Related

How to get text that has no node and attribute using xpath in selenium

i'm trying to get a certain text in HTML using xpath.
The HTML is as below and as you see,
the "target text" which i want to get is in node p.
But "target text" doesn't have its node or attribute,
it is just presented alone in node p.
How can i get this?
Supplement : I'm using xpath in selenium. So, i couldn't use "text()" in xpath query
<p class="mean" lang="ko">
<span class="word_class ">non-target text1 </span>
<span class="mark">non-target text2 </span>
target text
</p>
target text belongs to parent p node.
What you need to do here is:
Get the parent element text (it will include parent element text content and child element text contents).
Then remove child element text contents.
In case this is done with Selenium the code can be as following:
parent_text = ""
all_text = driver.find_element(By.XPATH, ("//p[#class='mean']")).text
child_elements = driver.find_elements(By.XPATH, ("//*[#class='parent']//*"))
for child_element in child_elements:
parent_text = all_text.replace(child_element.text, '')
print(parent_text)
Use //p[#class = 'mean' and #lang = 'ko']/text()[normalize-space()] to select any text node children of that p element that contain more than white space. Note that the text node contents begins after the closing </span> and ends before the closing </p> so its content with be e.g.
target text
If you want to remove leading and trailing white space you can use e.g. normalize-space(//p[#class = 'mean' and #lang = 'ko']/text()[normalize-space()]).

Aurelia repeat.for generating an extra element

Problem
I have a repeat.for in my html markup bound to an array on my viewmodel. The array only has one element. However, the generation of the HTML always produces one extra undefined element.
This behavior only appears in one section of my code which makes it very hard to duplicate -- repeat.for is working everywhere else in my code.
Proof
So before you label me crazy and try to insist that my array must have more items than I think, have a look at my code and at the output from the Aurelia Inspector.
client.js
export class Client {
months = []
activate(id) {
...
this.loadData()
}
loadData(){
this.months.push("Item1")
}
client.html
<section class="scrollable">
${months.length}
<div repeat.for="month of months">
Index = ${$index}
</div>
</section>
My output looks like this:
1
Index = 0
Index = 1
If I use Aurelia Inspector, here is what I see:
Inspecting '1' = ${months.length}
Inspecting 'Index = 0' = ${$index}
Inspecting 'Index = 1' = ${$index}
Has anyone seen similar behavior with repeat.for? Any ideas as to what I might be doing wrong?

Using Selenium to select text

Want to select the text "This is for testing selector" from below HTML code.
<div class="breadcrumb">
<a title=" Home" href="http://www.google.com/"> Home</a>
<span class="arrow">»</span>
<a title="abc" href="http://www.google.com/">test1</a>
<span class="arrow">»</span><a title="xyz" href="http://www.google.com/">test2</a>
<span class="arrow">»</span>
This is for testing selector
</div>
I'm not sure if there an easy way out for this or not. It turned out to be more difficult than I thought. Below mentioned code is tested locally and giving correct output for me ;)
String MyString= driver.findElement(By.xpath("//div[#class='breadcrumb']")).getText();
//get all child nodes of div parent class
List<WebElement> ele= driver.findElements(By.xpath("//div[#class='breadcrumb']/child::*"));
for(WebElement i:ele) {
//substracing a text of child node from parent node text
MyString= MyString.substring(i.getText().length(), MyString.length());
//removing white spaces
MyString=MyString.trim();
}
System.out.println(MyString);
Let me know if it works for you or not!
Try with this example :
driver.get("http://www.google.com/");
WebElement text =
findElement(By.className("breadcrumb")).find("span").get(1);
Actions select = new Actions(driver);
select.doubleClick(text).build().perform();
I suggest also that you copy the xpath for the text you need and put it here to have the exact xpath
You cannot select text inside an element using xpath.
Xpath can only help you select XML elements, or in this case, HTML elements.
Typically, text should be encased in a span tag, however, in your case, it isn't.
What you could do, however, is select the div element encasing the text. Try this xpath :
(//div[#class='breadcrumb']/span)[3]/following-sibling::text()
You could try Abhijeet's Answer if you just want to get the text inside. As an added check, check if the string obtained from using getText() on root element contains the string obtained from using getText() on the child elements.

Getting inner tag text whilst using a filter in BeautifulSoup

I have:
... html
<div id="price">$199.00</div>
... html
How do I get the $199.00 text. Using
soup.findAll("div",id="price",text=True)
does not work as I get all the innet text from the whole document.
Find div tag, and use text attribute to get text inside the tag.
>>> from bs4 import BeautifulSoup
>>>
>>> html = '''
... <html>
... <body>
... <div id="price">$199.00</div>
... </body>
... </html>
... '''
>>> soup = BeautifulSoup(html)
>>> soup.find('div', id='price').text
u'$199.00'
You are SO close to make it work.
(1) How to search and locate the tag that you are interested:
Let's take a look at how to use find_all function:
find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs):...
name="div":The name attribute will contains the tag name
attrs={"id":"price"}: The attrs is a dictionary who contains the attribute
recursive: a flag whether dive into its children or not.
text: could be used along with regular expressions to search for tags which contains certain text
limit: is a flag to choose how many you want to return limit=1 make find_all the same as find
In your case, here are a list of commands to locate the tags playing with different flags:
>> # in case you have multiple interesting DIVs I am using find_all here
>> html = '''<html><body><div id="price">$199.00</div><div id="price">$205.00</div></body></html>'''
>> soup = BeautifulSoup(html)
>> print soup.find_all(attrs={"id":"price"})
[<div id="price">$199.00</div>, <div id="price">$205.00</div>]
>> # This is a bit funky but sometime using text is extremely helpful
>> # because text is actually what human will see so it is more reliable
>> import re
>> tags = [text.parent for text in soup.find_all(text=re.compile('\$'))]
>> print tags
[<div id="price">$199.00</div>, <div id="price">$205.00</div>]
There are many different ways to locate your elements and you just need to ask yourself, what will be the most reliable way to locate a element.
More Information about BS4 Find, click here.
(2) How to get the text of a tag:
tag.text will return unicode and you can convert to string type by using tag.text.encode('utf-8')
tag.string will also work.

Alternative to innerHTML for IE

I create HTML documents that include proprietary tags for links that get transformed into standard HTML when they go through our publishing system. For example:
<LinkTag contents="Link Text" answer_id="ID" title="Tooltip"></LinkTag>
When I'm authoring and reviewing these documents, I need to be able to test these links in a browser, before they get published. I wrote the following JavaScript to read the attributes and write them into an <a> tag:
var LinkCount = document.getElementsByTagName('LinkTag').length;
for (i=0; i<LinkCount; i++) {
var LinkText = document.getElementsByTagName('LinkTag')[i].getAttribute('contents');
var articleID = document.getElementsByTagName('LinkTag')[i].getAttribute('answer_id');
var articleTitle = document.getElementsByTagName('LinkTag')[i].getAttribute('title');
document.getElementsByTagName('LinkTag')[i].innerHTML = '' + LinkText + '';
}
This works great in Firefox, but not in IE. I've read about the innerHTML issue with IE and imagine that's the problem here, but I haven't been able to figure a way around it. I thought perhaps jQuery might be the way to go, but I'm not that well versed in it.
This would be a huge productivity boost if I could get this working in IE. Any ideas would be greatly appreciated.
innerHTML only works for things INSIDE of the open/close tags. So for instance if your LinkTag[i] is an <a> element, then putting innerHTML="<a .... > </a> would put that literally between the <a tag=LinkTag> and </a>.
You would need to put that in a DIV. Perhaps use your code to draw from links, then place the corresponding HTML code into a div.
var LinkCount = document.getElementsByTagName('LinkTag').length;
for (i=0; i<LinkCount; i++) {
var LinkText = document.getElementsByTagName('LinkTag')[i].getAttribute('contents');
var articleID = document.getElementsByTagName('LinkTag')[i].getAttribute('answer_id');
var articleTitle = document.getElementsByTagName('LinkTag')[i].getAttribute('title');
document.getElementsById('MyDisplayDiv')[i].innerHTML = '' + LinkText + '';
This should produce your HTML results within a div. You could also simply append the other LinkTag elements to a single DIV to produce a sort of "Preview of Website" within the div.
document.getElementsById('MyDisplayDiv').innerHTML += '' + LinkText + '';
Note the +=. Should append your HTML fabrication to the DIV with ID "MyDisplayDiv". Your list of LinkTag elements would be converted into a list of HTML elements within the div.
DOM functions might be considered more "clean" (and faster) than simply replacing the innerHTML in cases like this. Something like this may work better for you:
// where el is the element you want to wrap with <a link.
newel = document.createElement('a')
el.parentNode.insertBefore(newel,prop);
el = prop.parentNode.removeChild(prop);
newel.appendChild(prop);
newel.setAttribute('href','urlhere');
Similar code worked fine for me in Firebug, it should wrap the element with <a> ... </a>.
I wrote a script I have on my blog which you can find here: http://blog.funelr.com/?p=61 anyways take it a look it automatically fixes ie innerHTML errors in ie8 and ie9 by hijacking ie's innerHTML property which is "read-only" for table elements and replacing it with my own.
I have this xmlObject:
<c:value i:type="b:AccessRights">ReadAccess WriteAccess AppendAccess AppendToAccess CreateAccess DeleteAccess ShareAccess AssignAccess</c:value>
I find value use this: responseXML.getElementsByTagNameNS("http://schemas.datacontract.org/2004/07/System.Collections.Generic",'value')[0].firstChild.data
This one is the best : elm.insertAdjacentHTML( 'beforeend', str );
In case your element size is very small :elm.innerHTML += str;