Selenium find all the elements which have two divs - selenium

I am trying to collect text and images from a website to help gather tweets about missing people. Here is the problem:
Some tweets don't have images, so the corresponding <div class='c' ....> contains only one <div>...</div>.
Some tweets have images, so the corresponding <div class='c' ....> contains two <div>...</div> elements, as shown in the following snippets:
<div class='c' id="M_D*****">
<div>...</div>
and
<div class='c' id="M_D*****">
<div>...</div>
<div>...</div>
I want to check whether a tweet has an image, i.e. find out whether the corresponding <div class='c' ....> has two <div>...</div> children.
PS: The following code collects all the texts and image URLs, but not all tweets have images, so I want to match them up by solving the problem above.
tweets = browser.find_elements_by_xpath("//span[@class='ctt']")
graph_links = browser.find_elements_by_xpath("//img[@alt='img' and @class='ib']")
This is a public welfare program, which aims to help the missing people go back home.

By collecting the text and the images separately, I think that it's going to be impossible to match the text with the related image after the fact. I would suggest a different approach. I would search for the <div class='c'...> that contains both the text and the optional image. Once you have the "container" DIV, you can then get the text and see if an image exists and put them all together. Without all the relevant HTML, you may have to tweak the code below but it should give you an idea on how to approach this.
containers = browser.find_elements_by_css_selector("div.c")
for container in containers:
    print(container.find_element_by_css_selector("span.ctt").text)  # the tweet text
    images = container.find_elements_by_css_selector("img.ib")
    if images:  # see if the image exists
        print(images[0].get_attribute("src"))  # the URL of the image
    print("-------------")  # separator between tweets

The HTML you provided is probably not enough, but based on it I suggest the XPath //div[@id='M_D*****' and ./div//img], which finds the div with the specified id that contains a div with an image.
But to answer your question directly:
//div[./div[2] and not(./div[3])] will find all divs with exactly 2 div children
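The container-first idea from both answers can be tried without a browser. The following is a minimal sketch using only Python's standard library; the sample markup, the ids M_1 and M_2, the example.com URL, and the function name are assumptions made up for illustration, not taken from the real page. ElementTree supports only a subset of XPath (no not()), so the "exactly two child divs" test is done by counting direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question: one tweet without an image, one with.
SAMPLE = """
<body>
  <div class="c" id="M_1"><div><span class="ctt">text only</span></div></div>
  <div class="c" id="M_2">
    <div><span class="ctt">text and image</span></div>
    <div><img alt="img" class="ib" src="http://example.com/a.jpg"/></div>
  </div>
</body>
"""

def tweets_with_optional_image(markup):
    """Return (text, image_url_or_None) pairs, one per div.c container."""
    root = ET.fromstring(markup)
    results = []
    for container in root.findall(".//div[@class='c']"):
        span = container.find(".//span[@class='ctt']")
        text = span.text if span is not None else ""
        child_divs = container.findall("div")  # direct <div> children only
        img = container.find(".//img[@class='ib']")
        has_image = len(child_divs) == 2 and img is not None
        results.append((text, img.get("src") if has_image else None))
    return results

pairs = tweets_with_optional_image(SAMPLE)
```

Because each text is read from the same container as its image, the pairing problem from the question does not arise.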

Related

Selenium XPath find element where second text child element contains certain text (use contains on array item)

The page contains a multi-select dropdown (similar to the one below)
The html code looks like the below:
<div class="button-and-dropdown-div">
  <button class="Multi-Select-Button">multi-select button</button>
  <div class="dropdown-containing-options">
    <label class="dropdown-item">
      <input class="checkbox">
      "
      Name
      "
    </label>
    <label class="dropdown-item">
      <input class="checkbox">
      "
      Address
      "
    </label>
  </div>
</div>
After testing in the Firefox developer tools, I was finally able to work out the XPath needed to get the text of a particular label ...
The XPath statement below returns the text "Phone":
$x("(//label[@class='dropdown-item'])[4]/text()[2]")
Each label looks like it holds a single piece of text in the UI, but the label element actually contains two text nodes. The first is always empty; the second contains the actual text, as the Firefox developer tools console shows when you inspect the element.
Question:
How do I modify the XPath shown above in order to use in Selenium's FindElement?
Driver.FindElement(By.XPath("?"));
I know how to use contains(), but apparently not with more complex XPath statements. I was fairly sure one of the following would work, but neither did (the developer tools complain of a syntax error):
$x("(//label[@class='dropdown-item' and text()[2][contains(., 'Name')]]")
$x("(//label[@class='dropdown-item' and contains(text()[2], 'Name')]")
I am using the 'contains' in order to avoid white-space conflicts.
Additional note for learning purposes (useful for XPath debugging):
In case anyone new to XPath comes across this, I wanted to show what the data structure of these label objects looks like. You can explore the structure of objects in your web page using the Console in the Firefox developer tools (F12). The label element contains three child nodes: an empty text node, then the input checkbox, then a second text node that holds the actual text (not ideal).
If you are looking to find the element that contains "Name" given the HTML above, you can use
//label[@class='dropdown-item'][contains(.,'Name')]
So I finally got it to work. The Firefox developer tools were right that there was a syntax problem with the XPath strings: the earlier attempts opened a parenthesis that was never closed.
The following XPath string finally returned the desired result:
$x("//label[@class='dropdown-item' and contains(text()[2], 'Name')]")
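To see why text()[2] is the node that matters, here is a minimal sketch of the same node structure in Python's standard library. It is an analogy rather than Selenium code: ElementTree has no text() step, so the text before the checkbox is exposed as label.text and the text after it as the checkbox's tail. The sample markup and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the dropdown markup from the question.
SAMPLE = """
<div class="dropdown-containing-options">
  <label class="dropdown-item">
    <input class="checkbox"/>
    Name
  </label>
  <label class="dropdown-item">
    <input class="checkbox"/>
    Address
  </label>
</div>
"""

def labels_whose_second_text_contains(markup, needle):
    """Return the stripped second text node of every matching label."""
    root = ET.fromstring(markup)
    matches = []
    for label in root.findall(".//label[@class='dropdown-item']"):
        # label.text is the whitespace-only text before <input> (text()[1]).
        checkbox = label.find("input")
        # The text after <input> is stored as its tail (text()[2]).
        second_text = (checkbox.tail or "") if checkbox is not None else ""
        if needle in second_text:
            matches.append(second_text.strip())
    return matches
```

The substring test plays the role of contains(), so surrounding whitespace in the second text node does not prevent a match.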

Scrapy - Cleaning up text[/p] from nested links[/a] etc

I am new to Python and to Scrapy as well. Nevertheless, I spent a few days trying to scrape news articles from an archive, successfully.
The problem is that when I scrape the content of an article's <p> elements, that content is filled with additional tags such as strong, a, etc. Scrapy does not pull out the text inside them, so I am left with a news article containing about 2/3 of the text. Here is an example of the HTML:
<p> According to <a> Japan's newspapers </a> it happened ... </p>
I tried googling and searching this forum. There were some suggestions, but the ones I tried either did not work or broke my spider:
I have read about normalize-space and removing tags, but that didn't work. Thank you in advance for any insights.
Please provide your selector for more detailed help.
Given what you're describing, I'd guess you're selecting p/text() (XPath) or p::text (CSS), which will not get the text inside the children of <p> elements.
You should try selecting response.xpath('//p/descendant-or-self::*/text()') to get the text in the <p> and all its children.
You could also just select the <p>, not its text, and you'll get its children as well. From there you can start cleaning up the tags. There are answered questions regarding how to do that.
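The difference between a paragraph's own text and the text of all its descendants can be shown with a small standard-library sketch. This is not Scrapy code; itertext() is used here as a stand-in for the descendant-or-self text selection, and the function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = "<p> According to <a> Japan's newspapers </a> it happened ... </p>"

def own_and_full_text(markup):
    """Return (text before the first child tag, whitespace-normalized text of the whole subtree)."""
    p = ET.fromstring(markup)
    own_text = (p.text or "").strip()       # stops at the first child element
    all_text = "".join(p.itertext())        # walks <p> and every descendant
    return own_text, " ".join(all_text.split())

own, full = own_and_full_text(SAMPLE)
```

Joining on split() collapses the doubled spaces left behind where the tags used to be.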
You could use str.replace():
new_string = old_string.replace("<a>", "")
You could integrate this into a loop which iterates over a list that contains all of the substrings that you want to discard.
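A minimal sketch of that loop is below. The function name and the tag list are assumptions for illustration, and the approach only removes tags that match the listed strings exactly, so an <a> tag with attributes would be left in place.

```python
def strip_tags(text, tags):
    """Remove each literal tag string from text, then collapse whitespace."""
    for tag in tags:
        text = text.replace(tag, "")
    return " ".join(text.split())

old_string = "<p> According to <a> Japan's newspapers </a> it happened ... </p>"
new_string = strip_tags(old_string, ["<p>", "</p>", "<a>", "</a>"])
```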

Selenium - Search for an element within element

Hi, I want to store element references in files somewhere, and then at run time search for elements within those referenced elements in Selenium. How can I do that?
For example, a frame contains multiple text boxes, and multiple frames with similar properties exist in which the text boxes are duplicated. I want to reference the text box under a particular frame, but predefine the frame and then specify that the search happens under that frame (something like Aliases in TestComplete).
A similar concept exists in Cheezy's Page-Objects, but not quite.
If you have a structure like this:
<div class='some class'>
<input class='input-button' value='submit'>Submit</input>
</div>
<div class='some class2'>
<input class='input-button' value='submit'>Submit</input>
</div>
and you want to find the first 'Submit' which is within the 'some class' div, you can do this:
parent_element = driver.find_element(:xpath, "//div[@class='some class']")
child_element = parent_element.find_element(:xpath, ".//input")
P.S. This is Ruby code.
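The "predefined parent, then search under it" idea from the question could be sketched as an alias table that maps a name to a parent locator and a child locator. The version below uses Python's standard library in place of a real driver so that it runs without a browser; the ALIASES table, the alias names, the resolve function, and the distinct value attributes are all assumptions for illustration. With Selenium, the two find calls would be the driver's and the parent element's find_element calls, as in the Ruby code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the value attributes differ only so the two inputs can be told apart.
SAMPLE = """
<body>
  <div class="some class"><input class="input-button" value="submit one"/></div>
  <div class="some class2"><input class="input-button" value="submit two"/></div>
</body>
"""

# Hypothetical alias table: name -> (parent locator, child locator relative to the parent).
# In practice this could be loaded from a JSON or YAML file.
ALIASES = {
    "first_submit": (".//div[@class='some class']", ".//input"),
    "second_submit": (".//div[@class='some class2']", ".//input"),
}

def resolve(root, alias):
    """Find the parent first, then search for the child only inside it."""
    parent_path, child_path = ALIASES[alias]
    parent = root.find(parent_path)
    if parent is None:
        return None
    return parent.find(child_path)

root = ET.fromstring(SAMPLE)
```

The leading dot in the child locator matters: it keeps the second search relative to the parent instead of restarting from the document root.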

How do I select a particular dynamic div, using Selenium when I don't have a unique id or name?

Only the content of the div is unique. So, in the following dynamically generated html, only "My Article-1245" is unique:
<div class="col-md-4 article">
<h2>
My Article-1245
Delete
Edit
</h2>
<p>O ephemeral text! Here today, gone tomorrow. Not terribly important, but necessary</p>
</div>
How do I select the edit/delete link of this specific div, using Selenium? assertText/verifyText requires an element locator, but I do not have any unique id/name (out of my control). There will be many such div blocks, with other content text, all dynamically generated.
Any help would be appreciated.
If the text 'My Article' appears each time, you may use the following:
//For Delete
driver.findElement(By.xpath("//h2[contains(text(),'My Article-')]/a[text()='Delete']"));
//For Edit
driver.findElement(By.xpath("//h2[contains(text(),'My Article-')]/a[text()='Edit']"));
Hope it meets your requirement :)
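To target one specific article rather than every heading that starts with 'My Article-', the full unique title can be matched first and the link looked up inside that heading. The sketch below shows the logic with Python's standard library; the sample markup, the href values, and the find_link function are assumptions for illustration (the question's HTML does not show the link elements, so they are assumed to be <a> tags as in the answer above).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two article blocks with assumed <a> links inside each <h2>.
SAMPLE = """
<body>
  <div class="col-md-4 article">
    <h2>My Article-1245 <a href="delete/1245">Delete</a> <a href="edit/1245">Edit</a></h2>
    <p>O ephemeral text!</p>
  </div>
  <div class="col-md-4 article">
    <h2>My Article-7 <a href="delete/7">Delete</a> <a href="edit/7">Edit</a></h2>
    <p>Other text</p>
  </div>
</body>
"""

def find_link(root, title, link_text):
    """Return the <a> with the given text inside the <h2> whose own text equals title."""
    for h2 in root.iter("h2"):
        heading = (h2.text or "").strip()  # text before the first <a>
        if heading == title:
            for a in h2.findall("a"):
                if a.text == link_text:
                    return a
    return None

root = ET.fromstring(SAMPLE)
```

Comparing the whole title, rather than testing for a shared prefix, avoids picking a different article whose title merely begins the same way.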
Matching by text is always a bad automated testing concept. If you want to keep clean and reliable test scripts, then either:
Contact your web dev to add unique identifiers to the elements, or
Suck it up, and create selectors based on what's there.
You are able to create a CSS selector based on what you want.
What you should do is create the selector using parent-child relationships:
driver.findElement(By.cssSelector("div.article:nth-child(X) a[href^='delete']"));
As I am ignorant of your app, this also assumes that all of your article divs are under the same parent. You would substitute X with the number of the div you want to refer to, e.g.:
<div id="someparent">
<div class="...article" />
<div class="...article" />
...
</div>

dijit.InlineEditBox with highlighted html

I have some dijit.InlineEditBox widgets and now I need to add some search highlighting over them, so I return the results with a span with class="highlight" over the matched words. The resulting code looks like this :
<div id="title_514141" data-dojo-type="dijit.InlineEditBox"
data-dojo-props="editor:\'dijit.form.TextBox\', onFocus:titles.save_old_value,
onChange:titles.save_inline, renderAsHtml:true">Twenty Thousand Leagues <span
class="highlight">Under</span> the Sea</div>
This looks as expected. However, when I start editing the title, the added span shows up in the editor. How can I make the editor remove the added span so that only the text remains?
In this particular case the titles of the books have no HTML in them, so some kind of full tag stripping should work, but it would be nice to find a solution (perhaps for a short description field with a dijit.Editor widget) where the existing HTML is left in place and only the highlighting span is removed.
Also, if you can suggest a better way to do this (inline editing and word highlighting), please let me know.
Thank you!
How will this affect your displayed content in the editor? It rather depends on the contents you allow into the field - you will need a rich-text editor (huge footprint) to handle html correctly.
A regular expression like this will strip the tags (the g flag makes it replace every tag, not just the first one):
this.value = this.displayNode.innerHTML.replace(/<[^>]*>/g, '');
Here's a running example of the below code: fiddle
<div id="title_514141" data-dojo-type="dijit.InlineEditBox"
data-dojo-props="editor:\'dijit.form.TextBox\', onFocus:titles.save_old_value,
onChange:titles.save_inline, renderAsHtml:true">Twenty Thousand Leagues <span
class="highlight">Under</span> the Sea
<script type="dojo/method" event="onFocus">
// Strip every tag so the editor only sees plain text
this.value = this.displayNode.innerHTML.replace(/<[^>]*>/g, '');
this.inherited(arguments);
</script>
</div>
The renderAsHtml attribute only trims 'off one layer', so embedded HTML will still be HTML, as far as I know. With the above you should be able to 1) override the onFocus handling, 2) set the editable value yourself and 3) call the 'old' onFocus method.
Alternatively, seeing that you have already set 'titles.save_*' in the props, use dojo/connect instead of dojo/method, but you need to get there first, so to speak.
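For the case raised in the question where existing HTML should stay and only the highlighting span should go, the pattern can be narrowed to that one span. The sketch below shows the idea in Python rather than in the widget's JavaScript; it assumes the highlighter always emits exactly <span class="highlight">...</span> with no other spans nested inside it, and the names HIGHLIGHT and remove_highlight are made up for illustration.

```python
import re

# Assumption: highlights are always written as <span class="highlight">...</span>
# and never contain another <span> inside them.
HIGHLIGHT = re.compile(r'<span class="highlight">(.*?)</span>', re.DOTALL)

def remove_highlight(html):
    """Unwrap every highlight span, keeping its inner text and all other markup."""
    return HIGHLIGHT.sub(r"\1", html)

title = 'Twenty Thousand Leagues <span class="highlight">Under</span> the Sea'
plain_title = remove_highlight(title)
```

The same pattern with a global flag and a "$1" replacement would be the JavaScript equivalent inside the onFocus handler.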