How do I scrape text from <a> links? - scrapy

How do I scrape text from <a> elements? For example:
<div id="job_14" class="job">
<a target="_blank" href="https://www.indeed.com/viewjob?t=Associate+Network+System&c=Las+Vegas+Valley+Water+District&l=Las+Vegas%2C+NV&jk=a22e9d1fa81cae52&indpubnum=4385896808151888&atk=&chnl=JobrollSearch" class="jobtitle" rel="nofollow">Associate Network

You can try a CSS selector:
response.css('.job a::text').get()
This reads as: select the node with class job, then any <a> descendant of it, then that node's text value. Use .getall() instead of .get() to collect the text of every match.

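You can also use XPath directly. A CSS selector is translated to XPath before it is executed anyway. With an lxml tree, for example:
my_xpath = "//div[@id='job_14']/a"
my_links = tree.xpath(my_xpath)  # xpath() returns a list of matching elements
my_text = my_links[0].text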
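
As a rough illustration of the XPath answer, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree instead of Scrapy or lxml. The markup is a shortened, hypothetical stand-in for the page (the example.com href is made up), and ElementTree supports only a subset of XPath, so treat this as a demonstration of the @id predicate rather than a drop-in scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the job listing markup.
html = (
    '<html><body>'
    '<div id="job_14" class="job">'
    '<a href="https://example.com/viewjob" class="jobtitle">Associate Network</a>'
    '</div>'
    '</body></html>'
)

root = ET.fromstring(html)

# Attributes are tested with @ in XPath: [@id='job_14'].
links = root.findall(".//div[@id='job_14']/a")

# findall() returns a list, so read .text from each element.
texts = [a.text for a in links]
print(texts)
```

Real pages are rarely well-formed XML, which is why Scrapy selectors or lxml.html are the usual choice for actual scraping.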

Related

Selenium getting href link returning null

Getting the href from the following HTML always gives me null. I have tried multiple ways to get it. I cannot use an XPath, since the XPath changes on every page.
<div class="form-group" id="idfb">
<label class="control-label">PDF </label> <a type="button" value="Download" href="./decisiondecisionForm-pdfContainer-filePdfDownload&id=6303"><i class="fa fa-download fa-2x xh-highlight" aria-hidden="true"></i></a><br>
</div>
I am trying to get the href in the following way:
val element = driver.findElement(By.cssSelector("*[id^='id']"))
val link = element.getAttribute("href")
Is the aria-hidden attribute the issue?
You want the href attribute of the <a> tag, but your selector does not refer to it; it matches the div tag.
The div tag has no href attribute, which is why your code returns null.
Instead, try the following locator: By.cssSelector("div[id^='id'] > a")
You can also try this, assuming <div class="form-group" id="idfb"> is the parent tag:
val element = driver.findElement(By.cssSelector("*[id^='id'] a"))
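
To see why the original selector yields null, here is a small sketch in Python using only the standard library (not Selenium). It parses a lightly cleaned copy of the markup (the & is escaped, <br> is self-closed, and the extra classes on the icon are dropped so that it is well-formed XML) and compares the attribute lookup on the div with the lookup on its <a> child.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the markup from the question, made well-formed.
snippet = (
    '<div class="form-group" id="idfb">'
    '<label class="control-label">PDF </label> '
    '<a type="button" value="Download" '
    'href="./decisiondecisionForm-pdfContainer-filePdfDownload&amp;id=6303">'
    '<i class="fa fa-download" aria-hidden="true"></i></a><br/>'
    '</div>'
)

div = ET.fromstring(snippet)

# The div itself carries no href attribute, so the lookup yields None.
div_href = div.get("href")

# The direct <a> child is the element that actually holds the href.
link = div.find("./a")
link_href = link.get("href")

print(div_href)
print(link_href)
```

The aria-hidden attribute on the icon plays no part here; only the element the lookup runs against matters.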

Using Selenium, how to select the text in a paragraph which is nested in a div element?

Sample code:
<div class="loginbox">some code</div>
<div class="loginbox">other code</div>
<div class="loginbox">
<p style="color: Red;">Test Extract</p>
</div>
Using Selenium WebDriver, I would like to extract the text Test Extract from the paragraph element, which is nested within a div whose class name is shared with other divs. C# preferred.
You can try the method below:
driver.findElement(By.xpath("//div[@class='loginbox']/p")).getText();
EDITED
You should use = inside the square brackets, like:
driver.findElement(By.xpath("//div[@class='loginbox']/p"));
C# code to get the text from the specified locator:
IWebElement element = driver.FindElement(By.CssSelector("div.loginbox p"));
string text = element.Text;

Using Selenium to select text

I want to select the text "This is for testing selector" from the HTML below.
<div class="breadcrumb">
<a title=" Home" href="http://www.google.com/"> Home</a>
<span class="arrow">»</span>
<a title="abc" href="http://www.google.com/">test1</a>
<span class="arrow">»</span><a title="xyz" href="http://www.google.com/">test2</a>
<span class="arrow">»</span>
This is for testing selector
</div>
I'm not sure whether there is an easy way to do this; it turned out to be more difficult than I thought. The code below is tested locally and gives the correct output for me ;)
String myString = driver.findElement(By.xpath("//div[@class='breadcrumb']")).getText();
// get all child elements of the parent div
List<WebElement> children = driver.findElements(By.xpath("//div[@class='breadcrumb']/child::*"));
for (WebElement child : children) {
    // strip the child's text from the front of the parent's text
    myString = myString.substring(child.getText().length());
    // remove surrounding whitespace
    myString = myString.trim();
}
System.out.println(myString);
Let me know whether it works for you!
Try this example:
driver.get("http://www.google.com/");
WebElement text = driver.findElement(By.className("breadcrumb"))
        .findElements(By.tagName("span")).get(1);
Actions select = new Actions(driver);
select.doubleClick(text).build().perform();
I also suggest copying the XPath of the text you need and posting it here, so that we have the exact XPath.
You cannot select a bare text node with a Selenium XPath locator.
Selenium locators can only return elements (in this case, HTML elements), not text nodes.
Typically, text should be wrapped in a span tag; in your case, however, it isn't.
What you could do, however, is select the div element that wraps the text. In plain XPath, the text node itself is addressed by:
(//div[@class='breadcrumb']/span)[3]/following-sibling::text()
You could try Abhijeet's answer if you just want the text inside. As an added check, verify that the string returned by getText() on the root element contains the strings returned by getText() on the child elements.
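
To make the "text node, not element" point concrete, here is a minimal sketch in Python with the standard-library xml.etree.ElementTree (not Selenium). In that library, text that follows an element's closing tag is exposed as the element's tail, so the wanted string is simply the tail of the last span. The markup is copied from the question.

```python
import xml.etree.ElementTree as ET

# Breadcrumb markup from the question.
markup = (
    '<div class="breadcrumb">\n'
    '<a title=" Home" href="http://www.google.com/"> Home</a>\n'
    '<span class="arrow">»</span>\n'
    '<a title="abc" href="http://www.google.com/">test1</a>\n'
    '<span class="arrow">»</span><a title="xyz" href="http://www.google.com/">test2</a>\n'
    '<span class="arrow">»</span>\n'
    'This is for testing selector\n'
    '</div>'
)

div = ET.fromstring(markup)

# The last child element is the third span; the wanted text is not inside
# any element but follows that span as a bare text node.
last_child = list(div)[-1]
trailing_text = (last_child.tail or "").strip()

print(trailing_text)
```

Selenium has no direct equivalent of tail, which is why the answers above fall back to subtracting the child texts from the parent's text.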

Selenium. How to navigate element who's href contains a certain string? [duplicate]

The following is a bunch of link (<a>) elements. ONLY one of them has the substring "long" in the value of its href attribute.
<a class="c1" href= "very_lpng string" > name1 </a>
<a class="g2" href= "verylong string" > name2 </a> // The one that I need
<a class="g4" href= "very ling string" > name3 </a>
<a class="g5g" href= "very ng string" > name4 </a>
...................
I need to click the link whose href has substring "long" in it. How can I do this?
PS: driver.findElement(By.partialLinkText("long")).click(); does not help, because it selects by the link's name.
I need to click the link whose href has substring "long" in it. How can I do this?
With the beauty of CSS selectors,
your statement would be:
driver.findElement(By.cssSelector("a[href*='long']")).click();
In English, this means:
Find me any 'a' elements that have an href attribute whose value contains 'long'.
There is a useful article about formulating your own selectors for automation effectively, along with a list of the other attribute operators (contains, starts with, etc.), at: http://ddavison.io/css/2014/02/18/effective-css-selectors.html
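
For readers who want to see the substring test spelled out, here is a small Python sketch using only the standard library (not Selenium). It applies the same "href contains 'long'" check that a[href*='long'] expresses to the four links from the question; the wrapping <div> is added only so that the snippet parses as one document.

```python
import xml.etree.ElementTree as ET

# The links from the question, wrapped in a div so they form one tree.
markup = """<div>
<a class="c1" href="very_lpng string"> name1 </a>
<a class="g2" href="verylong string"> name2 </a>
<a class="g4" href="very ling string"> name3 </a>
<a class="g5g" href="very ng string"> name4 </a>
</div>"""

root = ET.fromstring(markup)

# Keep only the anchors whose href value contains the substring "long".
matches = [a for a in root.iter("a") if "long" in a.get("href", "")]
names = [a.text.strip() for a in matches]

print(names)
```

Note that the test runs against the href value, not the visible link text, which is the difference from partialLinkText.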
use driver.findElement(By.partialLinkText("long")).click();
You can do this:
// first get all the <a> elements
List<WebElement> linkList = driver.findElements(By.tagName("a"));
// now traverse the list and check each href
for (int i = 0; i < linkList.size(); i++) {
    String href = linkList.get(i).getAttribute("href");
    // getAttribute returns null when the anchor has no href
    if (href != null && href.contains("long")) {
        linkList.get(i).click();
        break;
    }
}
Here we first find all the <a> tags and store them in a list. We then iterate over the list one by one to find the <a> tag whose href attribute contains the string "long". Finally we click that <a> tag and break out of the loop.
You can achieve the same with an XPath locator.
Your statement would be:
driver.findElement(By.xpath(".//a[contains(@href,'long')]")).click();
And to click all the links that contain "long" in the URL, you can use:
List<WebElement> linksList = driver.findElements(By.xpath(".//a[contains(@href,'long')]"));
for (WebElement webElement : linksList) {
    webElement.click();
}

Finding all "A" tags with specific strings in the attribute href?

driver.FindElement(By.Name("zipcode")).Clear();
driver.FindElement(By.Name("zipcode")).SendKeys(zipcode);
driver.FindElement(By.Name("Go")).Click();
driver.FindElements(By.TagName("A"). //<---- ?????????
I have some Selenium API code that I started. I aim to get all the "A" tags with the string "alertsepy" and the string "sevendwarves" in the href attribute and return all those elements in an array so I can do some further processing. I started the code, but I am not quite sure how to get all the way there yet. Does anyone know how to do this type of query with Selenium?
Kind Regards!
You should use a CSS selector:
IList<IWebElement> elements = driver.FindElements(By.CssSelector("a[href*=alertsepy],a[href*=sevendwarves]"));
This query returns <a> nodes whose href attribute contains alertsepy, sevendwarves, or both strings:
<a href="alertsepy.html" > </a>
<a href="sevendwarves.html" > </a>
<a href="http://sevendwarves.org/alertsepy.html" > </a>
Or you can use:
IList<IWebElement> elements = driver.FindElements(By.CssSelector("a[href*=alertsepy][href*=sevendwarves]"));
This query returns <a> nodes whose href attribute contains both the alertsepy and sevendwarves strings:
<a href="http://sevendwarves.org/alertsepy.html" > </a>
For a list of generally available CSS selectors, refer to the W3C CSS selectors specification. For the list of query types available in Selenium, refer to Locating UI Elements.
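
The difference between the comma form (either string) and the chained form (both strings) can be sketched in plain Python, without Selenium. The first three hrefs come from the answer above; "other.html" is a hypothetical extra entry added to show a non-match.

```python
# hrefs from the answer, plus one hypothetical non-matching entry.
hrefs = [
    "alertsepy.html",
    "sevendwarves.html",
    "http://sevendwarves.org/alertsepy.html",
    "other.html",  # hypothetical, matches neither string
]
needles = ("alertsepy", "sevendwarves")

# a[href*=alertsepy],a[href*=sevendwarves]: at least one substring present.
either = [h for h in hrefs if any(n in h for n in needles)]

# a[href*=alertsepy][href*=sevendwarves]: both substrings present.
both = [h for h in hrefs if all(n in h for n in needles)]

print(either)
print(both)
```

The chained form is the one that matches the "alertsepy and sevendwarves" wording of the question.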
List<WebElement> anchortaglist = driver.findElements(By.tagName("a"));