Right now my code is set up to scrape all the last names that start with 'Z':
query = ['Z']
for letter in query:
    url = "https://hsba.org/HSBA_2020/For_the_Public/Find_a_Lawyer/HSBA_2020/Public/Find_a_Lawyer.aspx"
    driver.get(url)
    # input from query
    element = driver.find_element(By.CSS_SELECTOR, '#txtDirectorySearchLastName')
    element.send_keys(letter)
I need it to loop through the whole alphabet and get all the text info on each page.
I tried searching for help and am not sure where to start...
(Disclaimer: very new to coding)
import string

alphabets = list(string.ascii_lowercase)
for alph in alphabets:
    url = "https://hsba.org/HSBA_2020/For_the_Public/Find_a_Lawyer/HSBA_2020/Public/Find_a_Lawyer.aspx"
    driver.get(url)
    # input from query
    element = driver.find_element(By.CSS_SELECTOR, '#txtDirectorySearchLastName')
    element.send_keys(alph)
If you need to loop over all the letters, just get them from string.ascii_lowercase and iterate.
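One wrinkle worth noting: the original query used an uppercase 'Z', so if the directory search turns out to be case-sensitive, string.ascii_uppercase may be the safer constant. Both cover the 26 letters:

```python
import string

# Both constants hold all 26 letters; ascii_uppercase matches the
# original query = ['Z'] in case the search is case-sensitive.
letters = list(string.ascii_uppercase)
print(letters[0], letters[-1], len(letters))  # A Z 26
```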
I can't make line breaks between the URLs I get.
The URLs are displayed all in a row, when I would like to have one URL per line.
Could you help me with this problem?
from selenium import webdriver
from time import sleep

driver = webdriver.Chrome('chromedriver.exe')
driver.get("https://www.twitch.tv/directory/game/League%20of%20Legends/clips?range=7d")
sleep(3)

i = 1
while i <= 20:
    links = driver.find_elements_by_xpath("//a[@data-a-target='preview-card-image-link']")
    driver.execute_script('arguments[0].scrollIntoView(true);', links[len(links)-1])
    print("=> i :", i)
    i += 20
    sleep(1)

links = driver.find_elements_by_xpath("//a[@data-a-target='preview-card-image-link']")
for link in links:
    print(link.get_attribute('href'))
    f = link.get_attribute('href')
    c = open('proxy_list.txt', 'a')
    c.write(f)
A few things...
Your first while loop only runs once; I'm not sure if that's intentional. You set i to 1 and loop while i <= 20, but after the first pass you increment i by 20, so the condition fails and the loop exits. If that's what you intend, you can get rid of most of that code and keep just two lines. The first block then becomes:
links = driver.find_elements_by_xpath("//a[@data-a-target='preview-card-image-link']")
driver.execute_script('arguments[0].scrollIntoView(true);', links[len(links)-1])
The second block of code is looping through the returned links, getting the href in each and then writing to file. Writing to disk is slow so you want to minimize that as much as possible. Since you aren't trying to write millions of lines at once, you can just build your final string containing the hrefs in the loop and then once the loop is done, write the string to file. That way you only write to disk once.
Adding the line break is as simple as appending "\n" (newline character) to the end of each href.
from selenium import webdriver
from time import sleep

driver = webdriver.Chrome('chromedriver.exe')
driver.get("https://www.twitch.tv/directory/game/League%20of%20Legends/clips?range=7d")

links = driver.find_elements_by_xpath("//a[@data-a-target='preview-card-image-link']")
driver.execute_script('arguments[0].scrollIntoView(true);', links[len(links)-1])

# build the output string in memory, one href per line
output = ""
links = driver.find_elements_by_xpath("//a[@data-a-target='preview-card-image-link']")
for link in links:
    print(link.get_attribute('href'))
    output += link.get_attribute('href') + "\n"

# write to disk once; "with" also closes the file when done
with open('proxy_list.txt', 'a') as c:
    c.write(output)
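For the file-writing part on its own, here is a minimal, Selenium-free sketch of the build-then-write-once pattern (the hrefs are placeholders):

```python
import os
import tempfile

# Placeholder hrefs standing in for the scraped clip links.
hrefs = ["https://clips.twitch.tv/first", "https://clips.twitch.tv/second"]

# Build the whole string in memory, one newline-terminated URL per line...
output = "".join(href + "\n" for href in hrefs)

# ...then write it to disk in a single call; "with" closes the file for us.
path = os.path.join(tempfile.gettempdir(), "proxy_list.txt")
with open(path, "w") as f:
    f.write(output)

with open(path) as f:
    print(f.read().splitlines())  # one URL per line
```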
I'm trying to print search results of DuckDuckgo using a headless WebDriver and Selenium. However, I cannot locate the DOM elements referring to the search results no matter what ID or class name I search for and no matter how long I wait for it to load.
Here's the code:
import time
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

opts = Options()
opts.headless = False
browser = Firefox(options=opts)
browser.get('https://duckduckgo.com')

search = browser.find_element_by_id('search_form_input_homepage')
search.send_keys("testing")
search.submit()

# wait for URL to change with 15 seconds timeout
WebDriverWait(browser, 15).until(EC.url_changes(browser.current_url))
print(browser.current_url)

results = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "links")))
time.sleep(10)
results = browser.find_elements_by_class_name('result results_links_deep highlight_d result--url-above-snippet')  # I tried many other IDs and class names
print(results)  # prints []
I'm starting to suspect there is some trickery to avoid web scraping on DuckDuckGo. Does anyone have a clue?
I changed to a CSS selector and then it works. I use Java, not Python.
List<WebElement> elements = driver.findElements(
        By.cssSelector(".result.results_links_deep.highlight_d.result--url-above-snippet"));
System.out.println(elements.size());
// 10
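A likely reason the Python attempt returned an empty list is that find_elements_by_class_name expects a single class name, while these result elements carry four. The equivalent CSS selector chains them with dots, and the conversion can be sketched in plain Python:

```python
# The class attribute from the question holds four separate classes;
# joining them with dots yields the CSS selector used in the Java answer.
class_attr = "result results_links_deep highlight_d result--url-above-snippet"
css_selector = "." + ".".join(class_attr.split())
print(css_selector)
```

In Selenium Python, the resulting string could then be passed to find_elements_by_css_selector.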
So, very new here to Selenium, but I'm having trouble selecting the element I want from this website. In this case, I got the XPath using Chrome's 'Copy XPath' tool. Basically, I'm looking to extract the CID text (in this case 4004) from the website, but my code seems unable to do this. Any help would be appreciated!
I have also tried using the CSS selector method, but it returns the same error.
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
driver = webdriver.Chrome(options=chrome_options)

chem_name = "D008294"
url = "https://pubchem.ncbi.nlm.nih.gov/#query=" + chem_name
driver.get(url)

elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')
driver.close()
print(elements.text)
As of now, this is the error I receive: 'list' object has no attribute 'text'
Here is the xpath that you can use.
//span[.='Compound CID']//following-sibling::a/descendant::span[2]
Why your script did not work: I see 2 issues in your code.
elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')
driver.close()  # <== don't close the browser until you are done with all your steps on the browser or its elements
print(elements.text)  # <== you cannot get text from a list (Python will throw an error here)
How to fix it:
CID = driver.find_element_by_xpath("//span[.='Compound CID']//following-sibling::a/descendant::span[2]").text  # <== return the text using find_element (not find_elements)
driver.close()
print(CID)  # <== now you can print CID even though the browser is closed, as the value is already stored in the variable
The function driver.find_elements_by_xpath returns a list of elements. You should loop to get the text of each element,
like this:
for ele in elements:
    print(ele.text)
Or, if you only want the first matching element, use the driver.find_element_by_xpath function instead.
The XPath that Chrome provides does not always work as expected. First you have to know how to write an XPath and verify it in the Chrome console.
See these links, which help explain XPaths:
https://www.guru99.com/xpath-selenium.html
https://www.w3schools.com/xml/xpath_syntax.asp
In this case, first find the span that contains the text 'Compound CID', move to the parent span, then down to the child a/span/span: something like //span[contains(text(),'Compound CID')]/parent::span/a/span/span.
You also need to use find_element, which returns a single element, and get the text from it. If you use find_elements, it returns a list of elements, so you need to loop and get the text from each one.
xpath: //a[contains(@href, 'compound')]/span[@class='breakword']/span
You can use the href as your attribute reference, since I noticed that it has a unique value for each compound.
Example:
href="https://pubchem.ncbi.nlm.nih.gov/substance/53790330"
href="https://pubchem.ncbi.nlm.nih.gov/compound/4004"
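Building on that observation, once such an href has been read with get_attribute('href'), the trailing ID could be split off in plain Python (a hypothetical helper, not from any answer above):

```python
# Hypothetical helper: take the trailing ID from a PubChem URL,
# using the example hrefs shown above.
def trailing_id(href):
    return href.rstrip("/").rsplit("/", 1)[-1]

print(trailing_id("https://pubchem.ncbi.nlm.nih.gov/compound/4004"))       # 4004
print(trailing_id("https://pubchem.ncbi.nlm.nih.gov/substance/53790330"))  # 53790330
```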
I have some elements on my page which have XPaths like //*[@id='protect']/a, //*[@id='deezer']/a, //*[@id='international']/a. I want to get all the XPaths in an array list. All the XPaths end with '/a', but the id is different for each element. Please help.
I want to get all the elements in an array list so that I can click them using a loop and conditions inside the loop.
There's a straightforward way to do it in Selenium: using findElements. Ex:
List<WebElement> list = driver.findElements(By.xpath("//a"));
You'll get every single hyperlink on the page stored in a List. Then you'll be able to handle every WebElement in it by using loops.
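If only the id-anchored links are wanted, rather than every hyperlink on the page, the single pattern //*[@id]/a covers all three XPaths from the question. The idea can be tried outside Selenium with the standard library's ElementTree, on made-up markup mirroring the ids:

```python
import xml.etree.ElementTree as ET

# Made-up markup mirroring the ids from the question.
page = ET.fromstring(
    "<body>"
    "<div id='protect'><a>one</a></div>"
    "<div id='deezer'><a>two</a></div>"
    "<div id='international'><a>three</a></div>"
    "</body>"
)

# One relative XPath matches every <a> whose parent has an id.
links = page.findall(".//*[@id]/a")
print(len(links))  # 3
```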
I am trying to print the values of the City column for different rows from the below table shown in the image.
I used the code below. It prints "Dubai" 4 times instead of printing different cities. Can someone help me fix this?
System.setProperty("webdriver.chrome.driver", "C:\\Selenium\\Drivers\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.manage().window().maximize();
driver.get("http://toolsqa.com/automation-practice-table/");

WebElement body = driver.findElement(By.xpath("//*[@id='content']/table/tbody"));
List<WebElement> rows = body.findElements(By.tagName("tr"));

System.out.println(rows.get(0).findElement(By.xpath("//td[2]")).getText());
System.out.println(rows.get(1).findElement(By.xpath("//td[2]")).getText());
System.out.println(rows.get(2).findElement(By.xpath("//td[2]")).getText());
System.out.println(rows.get(3).findElement(By.xpath("//td[2]")).getText());
Table Image
Actually your code is correct; the only problem is the XPath you use to get the city element.
When you look for the city name with the XPath //td[2], you are actually searching for the second column every time, but across the whole page, which is why you get the same city name each time.
You need to use the XPath .//td[2] instead: when an XPath starts with ., it searches only within the element's context; otherwise it searches the document, i.e. the whole page, and relative to the whole page the output is absolutely correct.
Now, if you simply want to print all the cities, try the following:
List<WebElement> cities = driver.findElements(By.xpath("//*[@id='content']/table/tbody//td[2]"));
for (WebElement city : cities)
{
    System.out.println(city.getText());
}
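The context-node distinction can be illustrated without a browser using the standard library's ElementTree on a miniature, made-up version of the table:

```python
import xml.etree.ElementTree as ET

# A miniature stand-in for the practice table (made-up rows).
tbody = ET.fromstring(
    "<tbody>"
    "<tr><td>Harry</td><td>Dubai</td></tr>"
    "<tr><td>John</td><td>Doha</td></tr>"
    "</tbody>"
)

# "td[2]" evaluated relative to each row returns that row's second cell,
# which is what ".//td[2]" achieves in Selenium; an absolute "//td[2]"
# would always search from the document root instead.
for row in tbody.findall("tr"):
    print(row.find("td[2]").text)
```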
Try the code below to print the values of the City column (note that XPath indices are 1-based, and the row number has to be concatenated into the expression):
List<WebElement> rows = driver.findElements(By.xpath("//*[@id='content']/table/tbody/tr"));
for (int row = 1; row <= rows.size(); row++) {
    System.out.println(driver.findElement(By.xpath("//*[@id='content']/table/tbody/tr[" + row + "]/td[2]")).getText());
}
Hope it helps!