I am scraping LinkedIn as part of my college project. This is the code I use to locate the skills & endorsements, recommendations, and accomplishments sections:
skills = driver.find_elements_by_css_selector('#ember661')
recom = driver.find_elements_by_css_selector('#ember679')
acc = driver.find_elements_by_css_selector('#ember695')
But I get an empty list in all three variables. Please help!
There are a couple of reasons.
The IDs are generated dynamically and are not the same for every profile.
You should not expect a list of elements. A profile page has a single section of each type, so each lookup should return a single element.
Those sections might be loaded asynchronously, so the page can finish loading before the sections exist, in which case the locators find nothing. In that case you need to use an explicit wait, like this:
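from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

waiter = WebDriverWait(driver, 10)  # wait up to 10 seconds for each section
skills = waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.pv-skill-categories-section')))
recom = waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.pv-recommendations-section')))
acc = waiter.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.pv-accomplishments-section')))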
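To make the waiting behaviour concrete, here is a minimal standard-library sketch of the polling idea behind an explicit wait. It is not Selenium's implementation; `wait_until`, its parameters, and the fake `locate_section` condition are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for a section that only appears on the third lookup.
attempts = {"count": 0}


def locate_section():
    attempts["count"] += 1
    return "skills section" if attempts["count"] >= 3 else None


section = wait_until(locate_section, timeout=2.0, poll_interval=0.01)
print(section, "found after", attempts["count"], "attempts")
```

The point is that a single lookup right after the page loads can miss a section that arrives a moment later, while a bounded retry loop either finds it or fails with a clear timeout.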
Hi, I am looking to scrape a page such as "https://www.tennis24.com/match/ABiALWlt/#match-statistics;0" every time the score changes. Currently I can scrape it using Selenium and BS with the code below:
from selenium import webdriver
Chrom_path = r"C:\Users\Dan1\Desktop\chromedriver.exe"
driver = webdriver.Chrome(Chrom_path)
driver.get("https://www.tennis24.com/match/zVrM3ySQ/#match-statistics;0")
data = driver.find_elements_by_class_name("statTextGroup")
for d in data:
    sub_data = d.find_elements_by_xpath(".//*")
    assert len(sub_data)==3
    for s_d in sub_data:
        print(s_d.get_attribute('class')[19:], s_d.get_attribute('innerText'))
But I have no idea how to automate it so that once the score at the top of the page (located here: "Medical timeout6 : 6 ( 0 : 0 )") changes, the scraper scrapes the new data. The element to monitor is only visible while the match is in play, so it is not always there.
If you need any more info, please let me know and I'll be happy to add it.
You can poll the element with the "scoreboard" class in a while loop. When its value differs from the previously seen value, the score has changed and you can scrape the other things you wanted.
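A minimal sketch of that loop, using only the standard library so it runs without a browser. The names `watch_for_changes`, `read_value`, `on_change`, `poll_seconds`, and `max_polls` are hypothetical. With Selenium, `read_value` would be a function that returns the text of the "scoreboard" element, or None when the element is not on the page.

```python
import time


def watch_for_changes(read_value, on_change, poll_seconds=1.0, max_polls=None):
    """Call on_change(value) whenever read_value() returns a new non-None value."""
    last_seen = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        current = read_value()
        # None models the scoreboard being absent (match not in play).
        if current is not None and current != last_seen:
            last_seen = current
            on_change(current)  # the first reading also counts as a change
        time.sleep(poll_seconds)


# Simulated scoreboard readings instead of a live page.
readings = iter(["6 : 6", "6 : 6", None, "7 : 6"])
seen = []
watch_for_changes(lambda: next(readings), seen.append, poll_seconds=0, max_polls=4)
print(seen)
```

In a real scraper, `on_change` would run the statistics scraping code, and `poll_seconds` should be chosen so that the site is not hammered with requests.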
Hope it helped
I have a test where I click through a list of links on a page. If I open the links in new pages, I can iterate through the list using browser.switchTo().window and the original window's handle, clicking on each one.
But if I open a link in the same page (_self) and I navigate back to the original list of links using browser.navigate().back(), I get the following error when it iterates to click the next link:
StaleElementReferenceError: stale element reference: element is not attached to the page document
What's the proper way to access elements on a prior page once you've navigated away?
When you request an object from Selenium, Selenium identifies that object with an internal unique ID. See the command below:
>>> driver.find_element_by_tag_name("a")
<selenium.webdriver.firefox.webelement.FirefoxWebElement
(session="93fc2bec-c9f8-0c46-aec3-1939af00c917",
element="5173f7fb-63ca-e447-b176-4a226d956834")>
As you can see, the element has a unique UUID. Selenium maintains this list internally, so when you take an action like click, it fetches the element from its cache and acts on it.
Once the page is refreshed or a new page is loaded, this cache is no longer valid, but the object you created in your language binding still exists. If I try to execute some action on it, I get:
>>> elem.is_displayed()
selenium.common.exceptions.StaleElementReferenceException:
Message: The element reference of [object Null] null
stale: either the element is no longer attached to the DOM or the page has been refreshed
So, in short, there is no way to reuse the same object, which means you need to alter your approach. Consider the code below:
for elem in driver.find_elements_by_tag_name("a"):
    elem.click()
    driver.back()
The code above will fail on the second call to elem.click(). The fix is to avoid reusing a collection object across page loads; use an index-based loop instead. The code can be written in several ways. Consider a few approaches below.
Approach 1
elems = driver.find_elements_by_tag_name("a")
count = len(elems)
for i in range(0, count):
    elems[i].click()
    driver.back()
    elems = driver.find_elements_by_tag_name("a")
This is not a great approach, because I fetch a whole collection of objects on every iteration and use only one of them. On a page with 500+ links this code will be quite slow.
Approach 2
elems = driver.find_elements_by_tag_name("a")
count = len(elems)
for i in range(1, count + 1):
    # XPath positions are 1-based; re-locate a single link on each pass
    elem = driver.find_element_by_xpath("(//a)[{}]".format(i))
    elem.click()
    driver.back()
This is better than approach 1 because I fetch the full collection only once; after that I fetch one element and use one.
Approach 3
elems = driver.find_elements_by_tag_name("a")
links = []
for elem in elems:
    links.append(elem.get_attribute("href"))
for link in links:
    driver.get(link)
    # do some action
This approach will only work when the links are href-based. So I would choose or alter my approach depending on the situation.
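The re-lookup pattern can be tried without a browser. The sketch below is a toy model, not Selenium: `FakePage`, `FakeLink`, `StaleElementError`, and `click_all` are hypothetical names, and the only behaviour modelled is that every navigation invalidates handles obtained before it.

```python
class StaleElementError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException."""


class FakePage:
    """Toy page: each navigation bumps a generation counter."""

    def __init__(self, hrefs):
        self.hrefs = hrefs
        self.generation = 0
        self.visited = []

    def find_links(self):
        # Handles remember the page generation they were created in.
        return [FakeLink(self, i, self.generation) for i in range(len(self.hrefs))]

    def back(self):
        self.generation += 1  # returning to the list re-renders the page


class FakeLink:
    def __init__(self, page, index, generation):
        self.page, self.index, self.generation = page, index, generation

    def click(self):
        if self.generation != self.page.generation:
            raise StaleElementError("handle belongs to an earlier page load")
        self.page.visited.append(self.page.hrefs[self.index])
        self.page.generation += 1  # navigating away invalidates handles too


def click_all(page):
    count = len(page.find_links())
    for i in range(count):
        link = page.find_links()[i]  # fresh lookup after every navigation
        link.click()
        page.back()


page = FakePage(["/a", "/b", "/c"])
click_all(page)
print(page.visited)
```

Keeping the list from the first `find_links()` call and clicking its second item after `back()` raises `StaleElementError` in this model, which mirrors the failure described above.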
I am trying to get the job description from the job search page on indeed.com. This is how it looks:
Provide technical leadership around QA automation to IT teams. Work with various team to promote QA processes, practices and standardization....
Any idea how can I get that description? I tried the following:
//span[contains(@class,'summary')]
That does not give me the text of the description. Should I use XPath, or is there another solution? Thanks in advance for your time.
These XPaths are correct:
//span[contains(@class,'summary')]
//span[@class='summary']
I'm a Python guy, but I translated it to Java. You can do:
element = driver.findElement(By.name("summary")); // only matches if the element has name="summary"
element = driver.findElement(By.className("summary"));
element = driver.findElement(By.cssSelector("span[class=\"summary\"]"));
And remember that if you want the element's text, every element has the .getText() method; the find* functions only retrieve the element(s).
Double-check that you are not using driver.findElements(By.xpath()), in the plural. In that case you must first pick an individual element from the returned list and then call its .getText() method.
String description = driver.findElement(By.className("summary")).getText();
System.out.print(description);
Alternatively you could do:
WebElement description = driver.findElement(By.className("summary"));
String description_text = description.getAttribute("innerHTML");
System.out.print(description_text);
If your problem is that the element is not visible or reachable (stale), you can use JavaScript:
WebElement element = (WebElement) ((JavascriptExecutor) driver).executeScript("return document.querySelector('span[class=\"summary\"]');");
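Since I am more at home in Python, here is a small standard-library sketch of why selecting the span is not the same as getting its text. It uses xml.etree.ElementTree rather than Selenium, and the markup is a hypothetical reconstruction of the description in the question (with "QA" wrapped in its own tag), so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the description in the question.
html = (
    "<div>"
    "<span class='summary'>Provide technical leadership around "
    "<b>QA</b> automation to IT teams.</span>"
    "</div>"
)
root = ET.fromstring(html)

# '@class' selects the class attribute; ElementTree paths start with '.'.
span = root.find(".//span[@class='summary']")

# span.text stops at the first child tag; itertext() walks every text node.
first_chunk = span.text
full_text = "".join(span.itertext())
print(repr(first_chunk))
print(full_text)
```

The XPath only locates the element. Collecting the text is a separate step, which is what .getText() does for you in Selenium.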
For more reference:
https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/WebElement.html
http://i.stack.imgur.com/L4WUv.jpg
Link to Grid
I'm trying to detect the different drop-downs on this page (shown as the filters next to the text boxes). The problem I'm having is that the filters all seem to have the same ids. I can get the webdriver to find the initial filter button, but I cannot target the options in the drop-down.
Note: the filters I'm talking about are the ones opened by the funnel buttons, for example contains, isEqual, between, etc.
This is wrong, but it shows what I am attempting:
it('Should filter grid to -contain Civic', function() {
    browser.element(by.id('ctl00_ContentPlaceHolder1_RadGrid1_ctl00_ctl02_ctl03_FilterTextBox_Model')).sendKeys("civic");
    browser.element(by.id('ctl00$ContentPlaceHolder1$RadGrid1$ctl00$ctl02$ctl03$FilterTextBox_Model')).click();
    browser.element(by.xpath("//*[contains(text(), 'Contains')]")).click();
})
NOTE The answer that was being looked for is at the bottom of this answer after the word "EDIT". The rest of this answer is retained because it is still useful.
It's a challenge to test webpages that dynamically generate ids and other attributes. Sometimes you just have to figure out how to navigate the stable attributes with an xpath. Here's an xpath that finds all four dropdowns:
//tr[@class='rgFilterRow']//input
To differentiate between each one, you can do this:
(//tr[@class='rgFilterRow']//input)[1] // Brand Name
(//tr[@class='rgFilterRow']//input)[2] // Classification
(//tr[@class='rgFilterRow']//input)[3] // Transmission
(//tr[@class='rgFilterRow']//input)[4] // Fuel
Using numbers to specify elements in an xpath isn't really desirable (it will behave incorrectly if the order of columns in the table changes), but it's probably the best you can do in this case because of all the dynamic ids and general lack of reliable identifying attributes.
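As a rough illustration of positional selection, here is a standard-library Python sketch. It uses xml.etree.ElementTree, which does not support the parenthesised `(...)[n]` form, so the position is applied to the returned list instead. The markup is a hypothetical, heavily simplified filter row (the real grid markup differs), and `nth_filter` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified filter row; input names are invented for the demo.
markup = """
<table>
  <tr class="rgHeader"><th>Brand Name</th><th>Classification</th></tr>
  <tr class="rgFilterRow">
    <td><input name="brand"/></td>
    <td><input name="classification"/></td>
    <td><input name="transmission"/></td>
    <td><input name="fuel"/></td>
  </tr>
</table>
"""
root = ET.fromstring(markup)

# Same idea as //tr[@class='rgFilterRow']//input, in document order.
inputs = root.findall(".//tr[@class='rgFilterRow']//input")


def nth_filter(position):
    """XPath positions are 1-based, Python list indexes are 0-based."""
    return inputs[position - 1]


print(len(inputs), nth_filter(3).get("name"))
```

If a column is inserted or reordered, position 3 silently points at a different input, which is the fragility mentioned above.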
EDIT
I misunderstood what you were trying to get because I didn't look at the image that you linked to. Once you've opened up that menu, you should be able to use an xpath to get whichever option you want by the text. For example, if you want the "Contains" option:
//a[@class='rmLink']//span[text()='Contains']
This page is highly dynamic. You had better brush up on your XPath, as nothing else will be able to help you. You can use this: http://www.zvon.org/xxl/XPathTutorial/General/examples.html .
Here is a simple example of how to access the Brand Name "pulldown". This is written in Groovy, which looks a lot like Java. If you know Java you should be able to get the idea from this:
WebElement brandName = driver.findElement(By.id("ctl00_ContentPlaceHolder1_RadGrid1_ctl00_ctl02_ctl03_BrandNameCombo_Arrow"))
brandName.click() // to open the "pulldown"
List<WebElement> brandItems = driver.findElements(By.xpath("//ul[@class='rcbList']/li"))
brandItems.each {
    if (it.text == 'BMW')
        it.click()
}
Unfortunately, the above id is not very reliable. A much better strategy would be something like:
WebElement classification = driver.findElement(By.xpath("//table[@summary='combobox']//a[contains(@id, 'ClassificationCombo_Arrow')]"))
Selecting its items is done similarly.
classification.click() // to open the "pulldown"
List<WebElement> classificationItems = driver.findElements(By.xpath("//ul[@class='rcbList']/li"))
classificationItems.each {
    if (it.text == 'Sedan')
        it.click()
}
If you are not up to the task, you should be able to get help from your development colleagues on how to locate all the elements in this page.
I am creating a framework for data validation using Selenium. The issue I am struggling with is that I want to locate the "td" elements (HTML tag) within a "tr" element (HTML tag). This is the code I have written:
Iterator<WebElement> i = rows.iterator();
while (i.hasNext()) {
    WebElement row = i.next(); // advance the iterator, otherwise the loop never ends
    List<WebElement> columns = row.findElements(By.tagName("td"));
    for (WebElement s : columns) {
        System.out.println("columnDetails : " + s.getText());
    }
    if (columns.isEmpty()) {
        ElementNotFoundException e = new ElementNotFoundException("No data in table");
        throw e;
    }
    Iterator<WebElement> j = columns.iterator(); // does some other work
    ClusterData c = new ClusterData(); // does some other work
    ClusterDataInitializer.initUI(c, j, lheaders); // does some other work
    CUIData.put(c.getCN(), c); // does some other work
}
Now the issue with this is:
I am trying to fetch the data from the rows (see table data) into an ArrayList and use that ArrayList further. What currently happens is that the column-header data is fetched first, and I have no use for it. I only want the rows' data. I cannot determine the proper way to collect the data of the table rows only.
If the XPath of the table helps you understand it properly, here are the details:
Table header xPath of cluster name column:
/html/body/table/tbody/tr[2]/td[2]/div[2]/div/div/div[2]/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div[2]/div[2]/div/table/tbody/tr/td[2]/div/div[2]
Table row (Table Data) xPath of test cluster 01:
/html/body/table/tbody/tr[2]/td[2]/div[2]/div/div/div[2]/div/div/div[2]/div/div/div[2]/div/div/div/div/div/div[2]/div/div/div[2]/div/div/div[3]/div[2]/div/table/tbody/tr/td[2]/div/div/a
Please let me know if you need anything else.
I am using the following code to extract row data from table.
List<WebElement> rows = getElement(driver,sBy,"table_div_id").findElements(By.tagName("tr"));
where sBy = By.id and table_div_id = the id of the div that contains the table. This extracts all the rows into an ArrayList, and then I use code to extract the row data into another ArrayList. That is where I am stuck.
Each row of the table is in its own "table" tag, so the following does not work:
List<WebElement> rows = driver.findElements(By.xpath("//div[@id = 'table_div_id']//tr"));
List<WebElement> columns = row.findElements(By.xpath("./td"));
or the approach I used for the previous release of product i.e.
List<WebElement> columns = row.findElements(By.tagName("td"));
So I used the following approach, which enabled me to capture all of the visible rows from the table.
List<WebElement> columns = row.findElements(By.xpath(".//table[@class='gridxRowTable']/tbody/tr"));
But after that I faced another issue: since this table was implemented using Dojo, scrolling was impossible and Selenium could capture only the visible rows. To overcome this, I zoomed out in the browser using Selenium. This is how I achieved my goal of getting the data. I believe others might have provided an answer if I had shared more details. Sorry about that, and I hope my answer helps you all.
instead of
List<WebElement> columns = row.findElements(By.tagName("td"));
try using
List<WebElement> columns = row.findElements(By.xpath("./td"));
Check if this helps. This should give you the td elements. If I have not understood your issue, let me know.
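To show what the difference between the two lookups amounts to, here is a standard-library Python sketch using xml.etree.ElementTree instead of Selenium. The markup is a hypothetical row whose second cell contains a nested table. A tag-name search from the row returns every descendant td, while "./td" returns only the row's own cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: the second cell wraps a nested table.
markup = (
    "<table><tr>"
    "<td>cluster-01</td>"
    "<td><table><tr><td>nested</td></tr></table></td>"
    "</tr></table>"
)
row = ET.fromstring(markup).find("./tr")

all_cells = list(row.iter("td"))    # every descendant td, like By.tagName("td")
direct_cells = row.findall("./td")  # only direct children, like By.xpath("./td")

print(len(all_cells), len(direct_cells))
```

When rows are built from nested tables, the tag-name search mixes inner cells into the result, so the relative "./td" path gives a cleaner list of columns.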
You can use this way-
driver.findElement(By.xpath("//table[@id=\"table1\"]/tbody/tr[2]/td[1]"));
Regards,
Anuja
Do you have Selenium IDE installed? Perform a storeText operation on the row you want to retrieve, and the XPath will be populated in the IDE. There will be multiple XPaths; the most reliable is xpath:position, so use that to capture your rows.
And use Firebug for better visibility of your AUT.
Firebug and Selenium IDE are the most basic components of Selenium framework development.
You can manipulate xpath as you want.