Scrapy XPath with text() equal to

import scrapy
example = '''<div class="ParamText"><span>OWNER APP</span></div>
<div class="ParamText"><span>OWNER</span></div>
<div class="ParamText"><span>OWNER NAME</span></div>'''
scrapy.Selector(text=example).xpath('//*[@class="ParamText"]/span[contains(text(),"OWNER")]').extract_first()
Here I need to scrape only the span whose text is exactly OWNER, but sometimes there are 3 spans that contain OWNER.
output:
I am getting: OWNER APP
I want: OWNER

You can use the regular expression ^OWNER$ to match spans containing only OWNER.
Replace contains(text(),"OWNER") with re:test(text(),"^OWNER$").
The advantage of regular expressions is that you could also allow for spaces (^\s*OWNER\s*$) or support different letter cases ((?i)^OWNER$).
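To see what each of the three patterns accepts, here is a minimal sketch that uses Python's own re module as a stand-in, assuming the patterns behave the same way inside re:test. The helper name matching and the two extra sample strings are made up for illustration.

```python
import re

# The three span texts from the question.
texts = ["OWNER APP", "OWNER", "OWNER NAME"]

def matching(pattern, candidates):
    """Return the candidates that the pattern matches."""
    return [t for t in candidates if re.search(pattern, t)]

# Anchored pattern: only the exact text survives.
exact = matching(r"^OWNER$", texts)

# Optional surrounding whitespace is tolerated.
padded = matching(r"^\s*OWNER\s*$", texts + ["  OWNER \n"])

# Case-insensitive variant.
any_case = matching(r"(?i)^OWNER$", texts + ["owner"])

print(exact, padded, any_case)
```

In all three cases "OWNER APP" and "OWNER NAME" are rejected because the pattern is anchored at both ends.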

scrapy.Selector(text=example).xpath('//*[@class="ParamText"]/span/text()').extract()[1]

You can select by text equality, like scrapy.Selector(text=example).xpath('//*[@class="ParamText"]/span[text()="OWNER"]').get(), or, without the span details, this will give you the first one: scrapy.Selector(text=example).css('div.ParamText span').get()

Related

How to identify elements classname containing special characters using Selenium and VBA

I'm trying to learn more about how Selenium works with VBA, and I'm experimenting with some current e-commerce trends.
In this case, I don't know how FindElementByClass works when the class name contains special characters like _ or -, because it always gives me an empty result. I need to identify the element because I want to go through every element that has this class.
<span class="minificha__sku ng-binding">Cód TG: AS0-322</span>
A space in the class attribute means the element has two classes.
class="minificha__sku ng-binding"
means it has "minificha__sku" and "ng-binding", so use XPath or CSS instead of the by-class lookup, or pass only one of the two classes, not both.
CSS:
span[class="minificha__sku ng-binding"]
XPath:
//span[@class="minificha__sku ng-binding"]
To identify the element you can use either of the following Locator Strategies:
Using FindElementByClassName I:
bot.FindElementByClassName("minificha__sku")
Using FindElementByClassName II:
bot.FindElementByClassName("ng-binding")
Using FindElementByCss:
bot.FindElementByCss("span.minificha__sku.ng-binding")
Using FindElementByXPath:
bot.FindElementByXPath("//span[@class='minificha__sku ng-binding']")
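The reason a single class name works while the full attribute value does not can be sketched in plain Python. This only illustrates the token-based idea behind class matching; it is not Selenium's implementation, and the helper name has_class is hypothetical.

```python
def has_class(class_attr, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in class_attr.split()

class_attr = "minificha__sku ng-binding"

# Each individual class is a token of the attribute value.
first = has_class(class_attr, "minificha__sku")
second = has_class(class_attr, "ng-binding")

# The whole attribute value is not a single class token.
compound = has_class(class_attr, "minificha__sku ng-binding")

print(first, second, compound)
```

Underscores and hyphens are ordinary characters in a class token; it is the space that splits the attribute into two classes.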

How to Find an xpath with some string contained and some not contained in the div?(Xpath not contains)

**Problem :** How to Find an xpath with some string contained and some not contained in the div?(Xpath not contains)?
**Below are three examples**
1)<div>You are now connected to Customer Care Virtual Assistant.</div>
2)<div>You are now connected to sumit.</div>
3)<div>You are now connected to dev.</div>
I have tried //*[contains(text(),'You are now')]
but it gives me 3 results, and I want to fetch only the values that do not contain Virtual.
I am currently using this XPath: //*[text()[contains(.,'You are now ') and not[contains(.,'Virtual')]]
It's not working for me. Please let me know what mistake I am making here.
Any help will be appreciated.
Thanks
You were close. not() is a function, so it takes parentheses rather than square brackets. Try the following XPath.
//*[contains(.,'You are now') and not(contains(.,'Virtual'))]
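The logic of that predicate can be checked in plain Python. This is a minimal sketch using the standard library's ElementTree rather than a full XPath engine; the <root> wrapper and the helper name wanted are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The three divs from the question, wrapped in a hypothetical root element.
html = """<root>
<div>You are now connected to Customer Care Virtual Assistant.</div>
<div>You are now connected to sumit.</div>
<div>You are now connected to dev.</div>
</root>"""

root = ET.fromstring(html)

def wanted(text):
    # Same logic as: contains(., 'You are now') and not(contains(., 'Virtual'))
    return "You are now" in text and "Virtual" not in text

# "".join(itertext()) plays the role of "." (the element's string value).
matches = [div.text for div in root.iter("div") if wanted("".join(div.itertext()))]

print(matches)
```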
Try another solution (the elements in the question are div elements):
//div[not(contains(text(), 'Virtual')) and contains(.,'You are now')]
or
//div[not(contains(text(), 'Virtual')) and contains(text(),'You are now')]
Maybe you can use one of these:
1- Use the "starts-with" function.
div[starts-with(@id,'something')]
2- Use the "contains" function.
div[contains(@id,'something')]
or
3- Use the full XPath, which does not need any attributes.
/html/body/div[3]/div[2]/div[2]/div[2]/div[2]/div[2]/div[2]/div/div[4]/div[1]/div/div/div/div[1]/div/div/div/div[1]/div[2]/div[1]/form[1]/input[1]

Identifying an element from a group, span[i] is the differentiating factor

I have added the screenshot. I have a group of elements that have exactly the same XPath except for the span tag. I want to identify the individual input fields, but I am unable to.
I have tried using contains with the class, but I was unable to attach the span to the XPath.
Here is what the XPaths look like:
/html/body/div[@id='app']/div/div[@class='LayoutModify_LayoutModify_1Akxb']/main[@class='LayoutModify_main_5aBy3']/section[@class='sub-detail inner ProductDetail_productdetail_bJWN2']/div[@class='ProductDetail_productsphere_kgNGm']/div[@class='ProductDetail_threecol_2zA1n ProductDetail_productsphereleft_2pLZT']/span[4]/div[@class='el-input el-input--medium ProductDetail_productsphereinput_3eVZg']/input[@class='el-input__inner']
/html/body/div[@id='app']/div/div[@class='LayoutModify_LayoutModify_1Akxb']/main[@class='LayoutModify_main_5aBy3']/section[@class='sub-detail inner ProductDetail_productdetail_bJWN2']/div[@class='ProductDetail_productsphere_kgNGm']/div[@class='ProductDetail_threecol_2zA1n ProductDetail_productsphereright_3BrqC']/span[4]/div[@class='el-input el-input--medium ProductDetail_productsphereinput_3eVZg']/input[@class='el-input__inner']
Notice the span[4] and span[15] are the only differences
Quick question:
Do either of these locators:
locator A
//span[4]/div/input[@class='el-input__inner']
locator B
//span[15]/div/input[@class='el-input__inner']
find any input on your page?
If not, could you please post the whole HTML page code here?
Here is the xpath that worked:
//div[contains(@class,'ProductDetail_productsphereright')]/span[4]/div/input

How to validate a text in selenium where multiple spaces in between two words?

I would like to know how to write the XPath to validate the text 'Confirm Password*' where there is a gap of more than 10 spaces between the two words.
When I tried to get it using the ChroPath tool, it gave an XPath like
//label[contains(text(),'Confirm Password*')]
But even that is also not working.
You can use normalize-space in XPath.
//label[@ng-if='add_user' and normalize-space(text())='Confirm Password*']
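What normalize-space does to the label text can be sketched in Python. This is an illustrative re-implementation, not the XPath engine itself, and the 12-space gap in label_text is an assumed stand-in for the real page content.

```python
import re

def normalize_space(s):
    """Mimic XPath normalize-space(): collapse runs of space, tab, CR and LF
    into a single space and strip them from both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Hypothetical label text with a wide gap between the two words.
label_text = "Confirm" + " " * 12 + "Password*"

# A plain substring test fails because of the extra spaces...
raw_match = "Confirm Password*" in label_text

# ...but the normalized text compares equal.
normalized_match = normalize_space(label_text) == "Confirm Password*"

print(raw_match, normalized_match)
```

This is why contains(text(),'Confirm Password*') finds nothing while the normalize-space comparison succeeds.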
You can also get the text with:
driver.findElement(By.xpath("//label[@ng-if='add_user']")).getAttribute("innerText");
OR
driver.findElement(By.xpath("//label[@ng-if='add_user']")).getAttribute("value");
driver.findElement(By.xpath("//label")).getAttribute("value");

Selenium RC Having problems with XPath for a table

I'm trying to select an element given by:
/html/body[@id='someid']/form[@id='formid']/div[@id='someid2']/div[@id='']/div[@id='']/div[@id='']/table/tbody[@id='tableid']/tr[7]/td[2]
Now the html of that row I'm trying to select looks like this:
<tr>
<td class="someClass">some text</td>
<td class="someClass2">my required text for verifying</td>
</tr>
I need to check whether my required text for verifying exists in the page.
I used selenium.isTextPresent("my required text for verifying"); and it doesn't work.
So now I tried with selenium.isElementPresent("//td[contains(text(),'my required text for verifying')]")
This works sometimes but occasionally gives random failures.
I also tried selenium.isElementPresent("//*[contains(text(),'my required text for verifying')]").
How do I verify this text on the page using selenium?
The problem is not with the page taking time to load. I took screenshots before the failure occurs and found that the page was fully loaded, so that shouldn't be the problem.
Could someone please suggest any way to select this element or any way to validate this text on the screen?
Try locating it by CSS:
assertText(selenium.getText("css=.someClass2"), "my required text for verifying");
The above should give a better failure message than isElementPresent, but you can still use that with CSS locators:
assertTrue(selenium.isElementPresent("css=.someClass2"));
If there is an issue with the load times you could try waiting for the element to be present:
selenium.waitForCondition("var value = selenium.isElementPresent('css=.someClass2'); value == true", "60000");
Some other XPath locators that might work for you, if you prefer not to use CSS locators:
//td[contains(@class, 'someClass2')]
xpath=id('tableid')/tr[7]/td[2]
xpath=id('tableid')/descendant::td[contains(@class, 'someClass2')][7]
I've never heard of selenium; but your initial XPath is unnecessarily fragile and verbose.
If an element has an id, it's unique; using such a long XPath just to select a particular element is unnecessary; just select the last element with the id. Further, I see that you're occasionally selecting xyz[@id=''] - if you're trying to select elements without id attributes, you can do xyz[not(@id)] instead.
Assuming your initial XPath is basically correct, it would suffice to do something like this:
//tbody[@id='tableid']/tr[7]/td[2]
However, using a specific row and column number like that is asking for trouble if anyone ever changes details of the HTML. Also, it's atypical to have ids on tbody elements; perhaps the table element has the id?
Finally, you may be running into space-normalization issues. In xml, multiple consecutive spaces are often considered equivalent to a single space, and you're not accounting for that. In particular, if the xhtml is pretty-printed and contains a line-break in the middle of your sought-after text, it won't work.
//td[contains(normalize-space(text()),'my required text for verifying')]
Finally, text() explicitly selects child text nodes - so the above XPath won't match elements where the text isn't an immediate child of td (e.g. <td><b>my required text for verifying</b></td>). Perhaps you mean to look up the concatenated text value of all descendants:
//td[contains(normalize-space(string(.)),'my required text for verifying')]
Finally, type conversion can be implicit in XPath, so string(.) can be replaced by . in the above, leading to the version:
//td[contains(normalize-space(.),'my required text for verifying')]
This may be slow on large documents since it needs to normalize the spaces and perform a string search for each td element. If you run into perf problems, try to be more specific about which td elements need to be inspected, or, if you don't care where the text occurs, try to reduce the number of "calls" to normalize-space by normalizing the entire doc in one go (e.g. via /*[contains(normalize-space(.),'my required text for verifying')]).
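The difference between matching on child text nodes and matching on the whole string value can be illustrated with a rough Python sketch. It uses the standard library's ElementTree as a stand-in for an XPath engine, and the sample row (with a nested <b> and a line break inside the text) is an assumption built from the cases described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical pretty-printed row: part of the text is nested in <b>
# and a line break sits in the middle of the phrase.
row = ET.fromstring(
    "<tr><td>some text</td>"
    "<td><b>my required\n   text</b> for verifying</td></tr>"
)
cell = row.findall("td")[1]
phrase = "my required text for verifying"

# Rough analogue of text(): only the text nodes that are direct children of td.
direct_nodes = [t for t in [cell.text] + [child.tail for child in cell] if t]
found_in_direct = any(phrase in t for t in direct_nodes)

# Rough analogue of normalize-space(.): all descendant text, whitespace collapsed.
full_text = " ".join("".join(cell.itertext()).split())
found_in_full = phrase in full_text

print(direct_nodes, found_in_direct, found_in_full)
```

Only the second check finds the phrase, which matches the behaviour of the normalize-space(.) version of the XPath.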