How to exclude certain paths of xpath without getting scraped? - scrapy

I managed to scrape the data I need, but I was unable to exclude the part that is not needed. How can I scrape only the necessary text?
Case - 1:
<div class="abc xyz">
<div class="aaaaaa bbbbbb">
"I dont want to include this"
</div>
***"I just want to scrape this"***
</div>
Case - 2:
<div class="abc xyz">
<div class="aaaaaa bbbbbb">
</div>
***"I just want to scrape this"***
</div>
In both cases, the output I want is "I just want to scrape this".
I already tried './/div[contains(@class,"abc")]//text()'. In the first case it returns "I dont want to include thisI just want to scrape this"; in the second case it returns the expected output.

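This one will leave some garbage (such as surrounding whitespace) in the result, but it will do the job, because /text() selects only the text nodes directly under the outer div:
result = response.xpath('//div[@class="abc xyz"]/text()').extract()
result = "".join(result)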
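As a rough sketch of why /text() helps where //text() does not, the snippet below models the two selections with Python's standard xml.etree.ElementTree instead of a Scrapy response, so it runs without Scrapy installed. The helper names direct_text and all_text are hypothetical, and the quotes and asterisks from the question's markup are left out.

```python
import xml.etree.ElementTree as ET

CASE_1 = """<div class="abc xyz">
<div class="aaaaaa bbbbbb">
I dont want to include this
</div>
I just want to scrape this
</div>"""

CASE_2 = """<div class="abc xyz">
<div class="aaaaaa bbbbbb">
</div>
I just want to scrape this
</div>"""


def direct_text(element):
    """Text nodes that are direct children of element (what /text() selects)."""
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    # Collapse the whitespace left over from the line breaks.
    return " ".join("".join(parts).split())


def all_text(element):
    """Text of element and all its descendants (what //text() selects)."""
    return " ".join("".join(element.itertext()).split())


for markup in (CASE_1, CASE_2):
    outer = ET.fromstring(markup)
    print(direct_text(outer), "|", all_text(outer))
```

With Scrapy, the same whitespace cleanup could be applied to the joined string from .extract(), for example with " ".join(result.split()).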

Related

Selenium cannot find element: directly under div

I am trying to use xpath to find a specific text from a page. Selenium can not find this exact text.
Here is the code I am trying with:
searchresult = self.driver.find_element_by_xpath("//*[contains(text(), 'Sorry, we could not find the book you were looking for.')]")
Here is the HTML I am working on:
<div style="width:100%;padding: 20px;">
<div class="searchbox">
</div>
<div class="vert-spacing-15"></div>
Sorry, we could not find the book you were looking for. Please try
searching for something else.
</div>
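Try it like this:
//*[contains(., 'Sorry, we could not find the book you were looking for.')]
Why? In XPath 1.0, contains(text(), ...) only tests the element's first text node, which here is the whitespace before the first child element. The dot tests the element's full string value instead.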
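The difference can be modelled without a browser. This is only a sketch using Python's standard xml.etree.ElementTree, not Selenium's XPath engine; text_nodes is a hypothetical helper that lists an element's direct text nodes in document order.

```python
import xml.etree.ElementTree as ET

PAGE = """<div style="width:100%;padding: 20px;">
<div class="searchbox">
</div>
<div class="vert-spacing-15"></div>
Sorry, we could not find the book you were looking for. Please try
searching for something else.
</div>"""

NEEDLE = "Sorry, we could not find the book you were looking for."


def text_nodes(element):
    """Direct text-node children of element, in document order."""
    candidates = [element.text] + [child.tail for child in element]
    return [text for text in candidates if text]


outer = ET.fromstring(PAGE)
first_node = text_nodes(outer)[0]         # what contains(text(), ...) tests
string_value = "".join(outer.itertext())  # what contains(., ...) tests

print(NEEDLE in first_node)    # the first text node is only a line break
print(NEEDLE in string_value)  # the message sits in a later text node
```

Note that contains(., ...) also matches every ancestor of the div, since their string values include the same message, so the first match may be an outer element.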

Unable to retrieve the text inside the div tag even when the tag is identified

I am trying to retrieve the text inside a div tag. Partial HTML is given below. I consulted the existing answers; the tag is located successfully, but the text comes back as an empty string. My goal is to retrieve the string inside the div, "You entered an invalid username or password, please try again."
I used the XPath
//div[@class='login-card js-login-card']/div[@role='alert']/div[2]
I used the CSS selector
.alert__heading.js-alert--error-text
Both only give back the tag name as div, with the text as an empty string.
Any ideas or corrections?
<div class="login-card js-login-card">
<div class="login-page__alert alert alert--error tt js-alert--error" role="alert">
<div class="alert__icon">
<div class="alert__heading js-alert--error-text">You entered an invalid username or password, please try again. </div>
</div>
<div id="cmePageWrapper" class="page-wrapper page-wrapper--card"> </div>
Try the following XPath, as the required div is a child node of the div with class 'alert__icon':
//div[@class='login-card js-login-card']/div[@role='alert']/div[1]/div
Let me know, if it works for you.
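To check the suggested path against the posted fragment, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports the attribute and position predicates used here. The closing tags missing from the question are filled in as an assumption, and the path is written relative to the login-card div because that div is the root of the sample.

```python
import xml.etree.ElementTree as ET

# The question's fragment, with the missing closing tags filled in (assumed).
PAGE = """<div class="login-card js-login-card">
  <div class="login-page__alert alert alert--error tt js-alert--error" role="alert">
    <div class="alert__icon">
      <div class="alert__heading js-alert--error-text">You entered an invalid username or password, please try again. </div>
    </div>
  </div>
  <div id="cmePageWrapper" class="page-wrapper page-wrapper--card"> </div>
</div>"""

card = ET.fromstring(PAGE)

# Original attempt: in this fragment the alert div has one child div,
# so div[2] finds nothing.
missing = card.find("./div[@role='alert']/div[2]")

# Suggested path: step into div[1] (alert__icon), then its child div.
heading = card.find("./div[@role='alert']/div[1]/div")

print(missing)
print(heading.text.strip())
```

On the real page the structure may differ from this reconstruction, so the result of div[2] there is not guaranteed to match.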
Maybe you want to try this CSS selector:
div[class*="error-text"]
If that does not work, try getting the text by executing JavaScript code like this:
$$('div[class*="error-text"]').text() OR .val()/.html()
Good luck!
You could use contains in the XPath, something like //div[contains(@class, 'error-text')]. findElement retrieves the first element that matches the criteria. If it still returns an empty string, the page might have more than one element that matches the criteria.

Finding XPATH in single nested div statement, when the classname is shared among multiple classnames

I am currently trying to write a test and learn how to use Selenium. One of the issues I am running into is that I need to select specifically the number 262 in this nested div.
If I make the XPath //div[@class='np_amount inline'], I get multiple results going down the entire page, and if I make it //div[@class='np_field_amount_etc'], I get all three items in the row, not just the number 262.
However, the outer div class (np_field_amount_etc) is unique. What XPath would I write in order to select only the 262 in this series of divs?
<div class="np_field_amount_etc">
<div class="np_label inline">Total Calories</div>
<div class="np_amount inline">262 </div>
<div class="np_dv inline">14</div>
</div>
I think something like this:
//div[@class='np_field_amount_etc']/div[@class='np_amount inline']
is what you want.
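The effect of chaining the unique parent class with the shared child class can be sketched with Python's standard xml.etree.ElementTree. The second row in the sample page is invented to reproduce the duplicate matches; only the first row comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the second row is invented to reproduce the duplicate match.
PAGE = """<body>
  <div class="np_field_amount_etc">
    <div class="np_label inline">Total Calories</div>
    <div class="np_amount inline">262 </div>
    <div class="np_dv inline">14</div>
  </div>
  <div class="np_other_row">
    <div class="np_label inline">Total Fat</div>
    <div class="np_amount inline">9 </div>
  </div>
</body>"""

page = ET.fromstring(PAGE)

# The shared class alone matches one div per row.
every_amount = page.findall(".//div[@class='np_amount inline']")

# Anchoring on the unique parent narrows the match to a single div.
calories = page.findall(
    ".//div[@class='np_field_amount_etc']/div[@class='np_amount inline']"
)

print(len(every_amount))
print([node.text.strip() for node in calories])
```

The text carries a trailing space in the original markup, so stripping it before comparing or converting to a number is advisable.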

Xpath Selenium trouble

Can anyone help me? I tried using FirePath to get a correct XPath, but the expression it gives me looks incorrect to me. The first line in each example is the one it provided.
.//*[@id='content']/div/div/div[1]/h2/span
<div id="content" class="article">
<h1>News</h1>
<div>
<div>
<div class="summary">
<h2>
<span>9</span>
// this should be the correct xpath i think
_driver.findElement(By.xpath("//*div[@id='content']/div/span.getText()"));
Here I want to check whether the text in the span is greater than or equal to 1,
and the other is:
.//*[@id='content']/div/div/div[3]
<div id="content" class="article">
<h1>News</h1>
<div>
<div>
<div class="summary">
<div class="form fancy">
<div class="common results">
Here I want to check whether the div with class "common results" has been created; 1 item equals 1 "common results" div.
For retrieving the span text you can use this:
String spanText=driver.findElement(By.xpath("//div[@id='content']/div/div/div/h2/span")).getText();
System.out.println(spanText);
The second question is not so clear to me. You can get the class name like this; please explain further if this is not the solution you need:
String className=driver.findElement(By.xpath("//*[@id='content']/div/div/div/div/div")).getAttribute("class");
System.out.println(className);
I would suggest using:
//div[@id='content']/div/div/div/h2/span/text()
Note: the HTML you shared is not well formed. I would suggest testing the markup and the XPath in advance with http://www.xpathtester.com/xpath (to fix the markup) and http://codebeautify.org/Xpath-Tester (to test your XPath).
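Both checks from the question can be sketched offline with Python's standard xml.etree.ElementTree. The markup below is an assumed well-formed version of the posted fragment: the exact nesting of the "form fancy" and "common results" divs is a guess, and the two result items are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the question's markup; the nesting is a guess.
PAGE = """<body>
  <div id="content" class="article">
    <h1>News</h1>
    <div>
      <div>
        <div class="summary">
          <h2><span>9</span></h2>
        </div>
        <div class="form fancy">
          <div class="common results">first item</div>
          <div class="common results">second item</div>
        </div>
      </div>
    </div>
  </div>
</body>"""

page = ET.fromstring(PAGE)

# First check: the number in the span must be greater than or equal to 1.
span = page.find(".//div[@id='content']/div/div/div/h2/span")
count_shown = int(span.text)
has_results = count_shown >= 1

# Second check: one "common results" div per item.
result_blocks = page.findall(".//div[@class='common results']")

print(count_shown, has_results, len(result_blocks))
```

In Selenium the same idea applies: read the span text with getText(), convert it to an integer, and compare the size of the list returned by findElements for the "common results" divs.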

Selenium WebDriver : Not able find xpath for Paytm.com , Proceed button

I was trying to automate the paytm.com site.
The Proceed button has a name attribute, but when I checked an XPath on that name attribute in an XPath checker, it showed 13 matches. In the UI, however, I can see only one Proceed button, not 13.
I also tried other attributes to build the XPath, but those showed multiple matches as well.
Below is the HTML code for Proceed
<div class="msg-container">
<div class="btn-spinner" alt="Proceed to Recharge">
<div class="spinner hidden"></div>
<input class="btn proceed active" type="submit" data-express-text="Recharge Now" data-soft-block-text="Proceed anyway" data-default-text="Proceed" name="Proceed" value="Proceed" alt="Proceed to Recharge">
Can you please let me know where I am going wrong?
This XPath returns 1 match for me:
//form[@id='prepaidMobile']//input[@name='Proceed']
Also, if you want to use only //input[@name='Proceed'], you can take the first element from the List of WebElements:
WebElement firstInput = driver.findElements(By.xpath("//input[@name='Proceed']")).get(0);
This will work for you, I think:
driver.findElement(By.xpath("(//input[@name='Proceed'])[1]"));
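One plausible explanation for the extra matches is that the page holds several forms, each with its own Proceed input, most of them not visible. The sketch below illustrates scoping by form with Python's standard xml.etree.ElementTree. The page is hypothetical: only the form id prepaidMobile and the input name come from the thread, and the other forms are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: only the form id and the input name come from the thread.
PAGE = """<body>
  <form id="prepaidMobile">
    <div class="msg-container">
      <div class="btn-spinner">
        <input class="btn proceed active" type="submit" name="Proceed" value="Proceed"/>
      </div>
    </div>
  </form>
  <form id="otherFormA"><input type="submit" name="Proceed" value="Proceed"/></form>
  <form id="otherFormB"><input type="submit" name="Proceed" value="Proceed"/></form>
</body>"""

page = ET.fromstring(PAGE)

# The name alone matches the input in every form.
all_proceed = page.findall(".//input[@name='Proceed']")

# Scoping by the enclosing form leaves a single match.
scoped = page.findall(".//form[@id='prepaidMobile']//input[@name='Proceed']")

# Taking the first match works only if the wanted form comes first on the page.
first_only = all_proceed[0]

print(len(all_proceed), len(scoped), first_only is scoped[0])
```

Scoping by a unique ancestor is usually more robust than taking index 0 or [1], because the index depends on the order of the forms in the page.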