How can I get the number of div inside a div in scrapy with css? - scrapy

I would like to count the number of div (so 2) inside the div "thumb-container".
So far I have used css selectors so I would like to keep using css and not xpath:
yield {
'date':datetime.date.today(),
'title': response.css('h1::text').extract()[-1],
'rating':response.css('bl-rating::attr(rating)').get(),
}

getall() returns a list so you can get it's length.
With the code you provided (next time post it as text and not as image):
In [1]: len(response.css('div.thumb-container div').getall())
Out[1]: 2

Related

How to use 'find_elements_by_xpath' inside a for loop

I'm somewhat (or very) confused about the following:
from selenium.webdriver import Chrome
driver = Chrome()
html_content = """
<html>
<head></head>
<body>
<div class='first'>
Text 1
</div>
<div class="second">
Text 2
<span class='third'> Text 3
</span>
</div>
<div class='first'>
Text 4
</div>
<my_tag class="second">
Text 5
<span class='third'> Text 6
</span>
</my_tag>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
What I'm trying to do, is find each span element using xpath, print out its text and then print out the text of the parent of that element. The final output should be something like:
Text 3
Text 2
Text 6
Text 5
I can get the text of span like this:
el = driver.find_elements_by_xpath("*//span")
for i in el:
print(i.text)
With the output being:
Text 3
Text 6
But when I try to get the parent's (and only the parent's) text by using:
elp = driver.find_elements_by_xpath("*//span/..")
for i in elp:
print(i.text)
The output is:
Text 2 Text 3
Text 5 Text 6
The xpath expressions *//span/..and //span/../text() usually (but not always, depending on which xpath test site is being used) evaluate to:
Text 2
Text 5
which is what I need for my for loop.
Hence the confusion. So I guess what I'm looking for is a for loop which, in pseudo code, looks like:
el = driver.find_elements_by_xpath("*//span")
for i in el:
print(i.text)
print(i.parent.text) #trying this in real life raises an error....
There's probably a few ways to do this. Here's one way
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
print(i.text)
s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
print(s.split('<')[0].strip())
I used a simple CSS selector to find the child elements ("text 3" and "text 6"). I loop through those elements and print their .text as well as navigate up one level to find the parent and print its text also. As OP noted, printing the parent text also prints the child. To get around this, we need to get the innerHTML, split it and strip out the spaces.
To explain the XPath in more detail
./..
^ start at an existing node, the 'i' in 'i.find_element_*'. If you skip/remove this '.', you will start at the top of the DOM instead of at the child element you've already located.
^ go up one level, to find the parent
I know I already accepted #JeffC's answer, but in the course of working on this question something occurred to me. It's very likely an overkill, but it's an interesting approach and, for the sake of future generations, I figured I might as well post it here as well.
The idea involves using BeautifulSoup. The reason is that BS has a couple of methods for erasing nodes from the tree. One of them which can be useful here (and for which, to my knowledge, Selenium doesn't have an equivalent method) is decompose() (see more here). We can use decompose() to suppress the printing of the second part of the text of the parent, which is contained inside a span tag by eliminating the tag and its content. So we import BS and start with #JeffC's answer:
from bs4 import BeautifulSoup
elp = driver.find_elements_by_css_selector("span.third")
for i in elp:
print(i.text)
s = i.find_element_by_xpath("./..").get_attribute("innerHTML")
and here switch to bs4
content = BeautifulSoup(s, 'html.parser')
content.find('span').decompose()
print(content.text)
And the output, without string manipulation, regex, or whatnot is...:
Text 3
Text 2
Text 6
Text 5
i.parent.text will not work, in java i used to write some thing like
ele.get(i).findElement("here path to parent may be parent::div ").getText();
Here is the python method that will retrieve the text from only parent node.
def get_text_exclude_children(element):
return driver.execute_script(
"""
var parent = arguments[0];
var child = parent.firstChild;
var textValue = "";
while(child) {
if (child.nodeType === Node.TEXT_NODE)
textValue += child.textContent;
child = child.nextSibling;
}
return textValue;""",
element).strip()
This is how to use the method in your case:
elements = driver.find_elements_by_css_selector("span.third")
for eleNum in range(len(elements)):
print(driver.find_element_by_xpath("(//span[#class='third'])[" + str(eleNum+1) +"]").text)
print(get_text_exclude_children(driver.find_element_by_xpath("(//span[#class='third'])[" + str(eleNum+1) +"]/parent::*")))
Here is the output:

Selenium webdriver:select a "div" from many "div"s that dynamically change the absolute path

I need some help in selecting a div form many div's that for every session number of div's are changing.
For example:
one time a have the div in that position(absolute path): /html/body/div[97]
other time in that position (absolute path):/html/body/div[160]
and so on...
At a moment only one div is active and the other div's are hidden.
I attached a picture to show the code.
I try the xpath below but doesn't work,I get the error "no such element: Unable to locate element ...
driver.findElement(By.xpath(".//*[#class=\'ui-selectmenu-menu ui-selectmenu-open\']/ul/li[1]")).click();
Picture with html code is here:
XPath is great if you expect the target element to be in the exact location every time the page is displayed. However, CSS is the way to go if the content is constantly changing.
In the example below I have found the DIV and the Element that you highlighted in your screen capture.
WebElement targetElementDiv = null;
WebElement targetElement = null;
targetElementDiv = driver.findElement(By.cssSelector("[class='ui-selectmenu-menu ui-selectmenu-open']"));
if (targetElementDiv != null) {
targetElement = targetElementDiv.findElement(By.cssSelector("[class='ui-menu-item ui-state-focus'])");
}

how to check that all of the images on a page have a src attribute that is not empty?

How can I check that all of the images on a page have a src attribute that is not empty using nightwatch.js? I have a page of images and sometimes the src attribute is empty. I want to check for that
By using an XPath Selenium locator like this:
.//img[#src[not(string())]]
Or , for non-empty:
.//img[#src[string()]]
I am guessing here, but to get the list of elements so you can assert on the size of the list, get them like this:
return this.client.useXpath().elements('/xpath/expression');
Use the getAttribute command to fetch the src attribute of your image and then assert in the callback. For example:
client.getAttribute(".container img", "src", function (result) {
this.assert.equal(typeof result, "object");
this.assert.equal(result.status, 0);
this.assert.ok(result.value);
});
More about getAttribute: http://nightwatchjs.org/api#getAttribute
you can compare the length of the results returned by
img
and
img[src]
The first CSS selector will get all img tags, the 2nd will get all img tags with the src property defined.

How to get the value of an attribute using XPath

I have been testing using Selenium WebDriver and I have been looking for an XPath code to get the value of the attribute of an HTML element as part of my regression testing. But I couldn't find a good answer.
Here is my sample html element:
<div class="firstdiv" alt="testdiv"></div>
I want to get the value of the "alt" attribute using the XPath. I have an XPath to get to the div element using the class attribute which is:
//div[#class="firstdiv"]
Now, I am looking for an XPath code to get the value of the "alt" attribute. The assumption is that I don't know what is the value of the "alt" attribute.
You can use the getAttribute() method.
driver.findElement(By.xpath("//div[#class='firstdiv']")).getAttribute("alt");
Using C#, .Net 4.5, and Selenium 2.45
Use findElements to capture firstdiv elements into a collection.
var firstDivCollection = driver.findElements(By.XPath("//div[#class='firstdiv']"));
Then iterate over the collection.
foreach (var div in firstDivCollection) {
div.GetAttribute("alt");
}
Just use executeScript and do XPath or querySelector/getAttribute in browser. Other solutions are wrong, because it takes forever to call getAttribute for each element from Selenium if you have more than a few.
var hrefsPromise = driver.executeScript(`
var elements = document.querySelectorAll('div.firstdiv');
elements = Array.prototype.slice.call(elements);
return elements.map(function (element) {
return element.getAttribute('alt');
});
`);
Selenium Xpath can only return elements.
You should pass javascript function that executes xpaths and returns strings to selenium.
I'm not sure why they made it this way. Xpath should support returning strings.

Dojo disable all input fields in div container

Is there any way to disable all input fields in an div container with dojo?
Something like:
dijit.byId('main').disable -> Input
That's how I do it:
dojo.query("input, button, textarea, select", container).attr("disabled", true);
This one-liner disables all form elements in the given container.
Sure there is. Open up this form test page for example, launch FireBug and execute in the console:
var container = dojo.query('div')[13];
console.log(container);
dojo.query('input', container).forEach(
function(inputElem){
console.log(inputElem);
inputElem.disabled = 'disabled';
}
)
Notes:
On that test page form elements are actually dijit form widgets, but in this sample I'm treating them as if they were normal input tags
The second dojo.query selects all input elements within the container element. If the container had some unique id, you could simplify the sample by having only one dojo.query: dojo.query('#containerId input').forEach( ...
forEach loops through all found input elements and applies the given function on them.
Update: There's also a shortcut for setting an attribute value using NodeList's attr function instead of forEach. attr takes first the attribute name and then the value or an object with name/value pairs:
var container = dojo.query('div')[13];
dojo.query('input', container).attr('disabled', 'disabled');
Something else to keep in mind is the difference between A Dijit and a regular DomNode. If you want all Dijit's within a DomNode, you can convert them from Nodes -> Dijit refs with query no problem:
// find all widget-dom-nodes in a div, convert them to dijit reference:
var widgets = dojo.query("[widgetId]", someDiv).map(dijit.byNode);
// now iterate over that array making each disabled in dijit-land:
dojo.forEach(widgets, function(w){ w.attr("disabled", "disabled"); }
It really just depends on if your inputs are regular Dom input tags or have been converted into the rich Dijit templates (which all do have a regular input within them, just controlled by the widget reference instead)
I would do it like this:
var widgets;
require(["dijit/registry", "dojo/dom"], function(registry, dom){
widgets = registry.findWidgets(dom.byId(domId));
});
require(["dojo/_base/array"], function(array){
array.forEach(widgets, function(widget, index) {
widget.set("disabled", true);
});
});
Method findWidgets is essential to get all widgets underneath a specific DOM.