I'm using Selenium to scrape a webpage. It finds the elements on the main page, but after I call click(), the driver never finds the elements on the new page. I used BeautifulSoup to check whether it is getting the new HTML, but the HTML is always from the main page. (When I look at the driver window, it shows that the new page has opened.)
html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify())
I've used WebDriverWait() to check whether the page is just slow to load, but even after 60 seconds the element never appears:
element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "ddlProducto")))
I also tried execute_script() to check whether going through JavaScript makes a difference, but the variable that should hold the element from the new page prints None:
selectProducto = driver.execute_script("return document.getElementById('ddlProducto');")
print(selectProducto)
I also tried chwd = driver.window_handles and driver.switch_to.window(chwd[1]), but it says that the index is out of range:
chwd = driver.window_handles
driver.switch_to.window(chwd[1])
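An "index out of range" on chwd[1] means the list held fewer than two handles at the moment it was read. A minimal sketch of a polling check, assuming only that the driver exposes window_handles as Selenium does; the helper name wait_for_new_window and its parameters are hypothetical, not part of Selenium:

```python
import time


def wait_for_new_window(driver, known_handles, timeout=10.0, poll=0.25):
    """Return the first window handle not in known_handles, or None on timeout."""
    deadline = time.monotonic() + timeout
    known = set(known_handles)
    while True:
        # Re-read the handle list on every pass; a new tab may appear late.
        for handle in driver.window_handles:
            if handle not in known:
                return handle
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

If this returns None, no second window was opened within the timeout, so the new content is probably in the same tab (possibly inside a frame) and switching windows would not help.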
I'm trying to scrape data from https://in.puma.com/in/en/mens/mens-new-arrivals . The complete data is loaded when the show all button is clicked.
I used Selenium to generate the click and load the rest of the page, but I'm getting this error:
"TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: "
See my code below.
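from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
from lxml import etree as et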
chrome_driver_path = "driver/chromedriver"
url = 'https://in.puma.com/in/en/mens/mens-new-arrivals'
browser = webdriver.Chrome(ChromeDriverManager().install())
browser.get(url)
x_path_to_load_more = '//*[@data-component-id="a_tspn9cqoeth"]'
browser.execute_script("window.scrollTo(0,document.body.scrollHeight)")
button_locate = wait(browser,10).until(EC.presence_of_element_located((By.XPATH,x_path_to_load_more)))
button_locate.click()
The XPath is not correct. Try:
x_path_to_load_more = "//button[contains(., 'Show All')]"
To verify that the XPath works, inspect the page, open the find bar with Command + F or Control + F, and paste your XPath there.
@data-component-id seems to be dynamic, which means the value will be different (not "a_tspn9cqoeth") each time you open the page. Try searching by another attribute value:
x_path_to_load_more = '//div[@class="text-center"]/button[contains(@class, "show-more-button")]'
or
x_path_to_load_all = '//div[@class="text-center"]/button[contains(@class, "show-all-button")]'
Also, it's better to use EC.element_to_be_clickable instead of EC.presence_of_element_located.
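Note that contains(@class, "show-all-button") also matches longer class names that merely include that substring. A small sketch of a stricter, commonly used XPath 1.0 idiom that matches a whole class token; the helper name xpath_for_class is hypothetical and only builds the string:

```python
def xpath_for_class(tag, css_class):
    """Build an XPath matching `tag` elements whose class list contains css_class."""
    # Padding the normalized class attribute with spaces lets us match
    # the token as a whole word instead of as a substring.
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {css_class} ')]"
    )
```

The result can be passed wherever an XPath string is expected, for example EC.element_to_be_clickable((By.XPATH, xpath_for_class("button", "show-all-button"))).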
UPDATE
Since the click on the button might be intercepted by the cookies footer, try scrolling the page down before clicking:
from selenium.webdriver.common.keys import Keys
driver.find_element('xpath', '//body').send_keys(Keys.END)
I am currently working on a college project on LinkedIn web scraping using Selenium. Here is the code:
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys
from parsel import Selector
driver = webdriver.Chrome('location of web driver')
driver.get('https://www.linkedin.com')
# username
username = driver.find_element_by_id('session_key')
username.send_keys('Linkedin Username')
sleep(0.5)
# password
password = driver.find_element_by_id('session_password')
password.send_keys('Linkedin Password')
sleep(0.5)
#submit value
sign_in_button = driver.find_element_by_xpath('//*[@type="submit"]')
sign_in_button.click()
sleep(0.5)
driver.get('https://www.google.com/') #Navigate to google to search the profile
# locate search form by_name
search_query = driver.find_element_by_name('q')
# send_keys() to simulate the search text key strokes
search_query.send_keys('https://www.linkedin.com/in/khushi-thakkar-906b56188/')
sleep(0.5)
search_query.send_keys(Keys.RETURN)
sleep(3)
# locate the first link
search_person = driver.find_element_by_class_name('yuRUbf')
search_person.click()
#Experience
experience = driver.find_elements_by_css_selector('#experience-section .pv-profile-section')
for item in experience:
    print(item.text)
    print("")
#Education
education = driver.find_elements_by_css_selector('#education-section .pv-profile-section')
for item in education:
    print(item.text)
    print("")
#Certification
certification = driver.find_elements_by_css_selector('#certifications-section .pv-profile-section')
for item in certification:
    print(item.text)
    print("")
When I scrape the Experience section, it extracts the information perfectly. But when I do the same for the Education and Certifications sections, I get an empty list. Please help!
I think the problem is your CSS selector. I tried it myself, and it does not locate any element in the page body.
Fix your CSS selectors and you will be fine:
#Education
education = driver.find_elements_by_css_selector('#education-section li')
#Certification
certification = driver.find_elements_by_css_selector('#certifications-section li')
I would like to automatically get images saved as browser's data after the page renders, using their corresponding data URLs.
For example:
You can go to the webpage: https://en.wikipedia.org/wiki/Truck
Using the WebInspector from Firefox pick the first thumbnail image on the right.
Now on the Inspector tab, right click over the img tag, go to Copy and press "Image Data-URL"
Open a new tab, paste and enter to see the image from the data URL.
Notice that the data URL is not available in the page source. On the website I want to scrape, the images are rendered after passing through a PHP script. The server returns a 404 response if the images are requested directly using the src attribute.
I believe it should be possible to list the data URLs of the images rendered by the website and download them, however I was unable to find a way to do it.
I normally scrape using selenium webdriver with Firefox coded in python, but any solution would be welcome.
I managed to work out a solution using the Chrome webdriver with CORS disabled, since with Firefox I could not find a CLI argument to disable it.
The solution executes some JavaScript that redraws the image on a new canvas element and then uses the toDataURL method to get the data URL. To save the image, I convert the base64 data to binary data and save it as a PNG.
This apparently solved the issue in my use case.
Code to get first truck image
from binascii import a2b_base64
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--disable-web-security")
chrome_options.add_argument("--disable-site-isolation-trials")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://en.wikipedia.org/wiki/Truck")
img = driver.find_element_by_xpath("/html/body/div[3]/div[3]"
"/div[5]/div[1]/div[4]/div"
"/a/img")
img_base64 = driver.execute_script(
"""
const img = arguments[0];
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
canvas.width = img.width;
canvas.height = img.height;
ctx.drawImage(img, 0, 0);
const data_url = canvas.toDataURL('image/png');
return data_url;
""",
img)
binary_data = a2b_base64(img_base64.split(',')[1])
with open('image.png', 'wb') as save_img:
    save_img.write(binary_data)
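The split(',')[1] step above assumes the string really is a base64 data URL. A slightly more defensive sketch using only the standard library; the helper name decode_data_url is hypothetical:

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    # A data URL looks like "data:<mime>;base64,<payload>".
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)
```

The returned MIME type can be used to choose the file extension instead of always writing image.png.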
Also, I found that the data URL you get with the procedure described in my question is generated by the Firefox web inspector on request, so it is not possible to get a list of data URLs that are not in the page source, as I had first thought.
BeautifulSoup is a good fit for this kind of problem. When the data you want is already in the HTML that the server returns, you can use requests with BeautifulSoup, which is faster than Selenium because no browser has to be started. In my test, BeautifulSoup took around 10 seconds to complete this task, whereas Selenium would take approximately 15-20 seconds for the same task. Here is how to do it with BeautifulSoup:
from bs4 import BeautifulSoup
import requests
import time
st = time.time()
src = requests.get('https://en.wikipedia.org/wiki/Truck').text
soup = BeautifulSoup(src,'html.parser')
divs = soup.find_all('div',class_ = "thumbinner")
count = 1
for x in divs:
    url = x.a.img['srcset']
    url = url.split('1.5x,')[-1]
    url = url.split('2x')[0]
    url = "https:" + url
    url = url.replace(" ","")
    path = f"D:\\Truck_Img_{count}.png"
    response = requests.get(url)
    file = open(path, "wb")
    file.write(response.content)
    file.close()
    count+=1
print(f"Execution Time = {time.time()-st} seconds")
Output:
Execution Time = 9.65831208229065 seconds
29 images.
Hope that this helps!
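The chained split() calls above depend on the srcset listing exactly a 1.5x and a 2x candidate. A minimal sketch of a more general parser, assuming the URLs contain no commas and the descriptors are pixel densities such as 2x (width descriptors like 480w are treated as 1x here); the helper name best_srcset_url is hypothetical:

```python
def best_srcset_url(srcset, scheme="https:"):
    """Return the URL with the largest pixel-density descriptor in a srcset value."""
    best_url, best_density = None, 0.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        density = 1.0  # default when no density descriptor is given
        if len(parts) > 1 and parts[1].endswith("x"):
            density = float(parts[1][:-1])
        if density > best_density:
            best_url, best_density = url, density
    # Protocol-relative URLs ("//host/path") need a scheme before requests can fetch them.
    if best_url and best_url.startswith("//"):
        best_url = scheme + best_url
    return best_url
```

In the loop above, url = best_srcset_url(x.a.img['srcset']) would replace the four string-manipulation lines.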
Is there any way to perform a copy and paste using Selenium 2 and the Python bindings?
I've highlighted the element I want to copy, and then I perform the following actions:
copyActionChain.key_down(Keys.COMMAND).send_keys('C').key_up(Keys.COMMAND)
However, the highlighted text isn't copied.
To do this on a Mac and on PC, you can use these alternate keyboard shortcuts for cut, copy and paste. Note that some of them aren't available on a physical Mac keyboard, but work because of legacy keyboard shortcuts.
Alternate keyboard shortcuts for cut, copy and paste on a Mac
Cut => control+delete, or control+K
Copy => control+insert
Paste => shift+insert, or control+Y
If this doesn't work, use Keys.META instead, which is the official key that replaces the Command ⌘ key.
source: https://w3c.github.io/uievents/#keyboardevent
Here is a fully functional example:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
browser = webdriver.Safari(executable_path = '/usr/bin/safaridriver')
browser.get("http://www.python.org")
elem = browser.find_element_by_name("q")
elem.clear()
actions = ActionChains(browser)
actions.move_to_element(elem)
actions.click(elem) #select the element where to paste text
actions.key_down(Keys.META)
actions.send_keys('v')
actions.key_up(Keys.META)
actions.perform()
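The example above hard-codes Keys.META, which suits macOS. A small sketch of choosing the modifier by platform instead; the helper name copy_paste_modifier_name is hypothetical, and it assumes the browser runs on the same machine as the script (not true for a remote grid):

```python
import platform


def copy_paste_modifier_name(system=None):
    """Return the name of the Keys attribute to hold down for copy/paste shortcuts."""
    # platform.system() reports "Darwin" on macOS.
    system = system or platform.system()
    return "COMMAND" if system == "Darwin" else "CONTROL"
```

With Selenium imported, the key itself can be looked up with getattr(Keys, copy_paste_modifier_name()) and passed to key_down() and key_up().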
So in Selenium (Ruby), this would be roughly something like this to select the text in an element, and then copy it to the clipboard.
# double click the element to select all its text
element.double_click
# copy the selected text to the clipboard using CTRL+INSERT
element.send_keys(:control, :insert)
Pretty simple actually:
from selenium.webdriver.common.keys import Keys
elem = driver.find_element_by_name("our_element")  # driver is your WebDriver instance
elem.send_keys("bar")
elem.send_keys(Keys.CONTROL, 'a') # highlight all in box
elem.send_keys(Keys.CONTROL, 'c') # copy
elem.send_keys(Keys.CONTROL, 'v') # paste
I imagine this could probably be extended to other commands as well.
Rather than using the actual keyboard shortcut, I would make the webdriver get the text. You can do this by reading the inner text of the element.
WebElement element1 = wd.findElement(By.locatorType(locator));
String text = element1.getText();
This way your test project can actually access the text. This is useful for logging, or just for making sure the text says what you want it to say.
From here you can manipulate the element's text as a string, so you have full control over what you enter into the element that you're pasting into. Now just do:
element2.clear();
element2.sendKeys(text);
where element2 is the element to paste the text into
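The same idea can be sketched with the Python bindings. This assumes elements that behave like Selenium WebElements (get_attribute, text, clear, send_keys); the helper name transfer_text is hypothetical:

```python
def transfer_text(source, target):
    """Copy the input value or visible text of `source` into `target` without the clipboard."""
    # Inputs and textareas keep their content in the "value" attribute;
    # other elements expose it through .text.
    text = source.get_attribute("value") or source.text
    target.clear()
    target.send_keys(text)
    return text
```

Because nothing goes through the operating system clipboard, this approach does not depend on the platform's keyboard shortcuts.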
elem.send_keys(Keys.SHIFT, Keys.INSERT)
It works FINE on macOS Catalina when you try to paste something.
I cannot try this on OSX at the moment, but it definitely works on FF and Ubuntu:
import os
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
with open('test.html', 'w') as fp:
    fp.write("""\
<html>
<body>
<form>
<input type="text" name="intext" value="ABC">
<br>
<input type="text" name="outtext">
</form>
</body>
</html>
""")
driver = webdriver.Firefox()
driver.get('file:///{}/test.html'.format(os.getcwd()))
element1 = driver.find_element_by_name('intext')
element2 = driver.find_element_by_name('outtext')
time.sleep(1)
element1.send_keys(Keys.CONTROL, 'a')
time.sleep(1)
element1.send_keys(Keys.CONTROL, 'c')
time.sleep(1)
element2.send_keys(Keys.CONTROL, 'v')
The sleep() statements are only there so you can see the steps; they are not necessary for the program to function.
The ActionChain send_key just switches to the selected element and does a send_keys on it.
The solutions involving sending keys do not work in headless mode. This is because the clipboard is a feature of the host OS and that is not available when running headless.
However all is not lost because you can simulate a paste event in JavaScript and run it in the page with execute_script.
const text = 'pasted text';
const dataTransfer = new DataTransfer();
dataTransfer.setData('text', text);
const event = new ClipboardEvent('paste', {
clipboardData: dataTransfer,
bubbles: true
});
const element = document.querySelector('input');
element.dispatchEvent(event)
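From Python, the snippet above could be run through execute_script roughly as follows. This is a sketch: the names PASTE_SCRIPT and simulate_paste are hypothetical, and the target element and text are passed in as script arguments instead of being hard-coded. A synthetic paste event only notifies the page's own paste listeners; it is not expected to insert text into a plain input by itself.

```python
PASTE_SCRIPT = """
const element = arguments[0];
const text = arguments[1];
const dataTransfer = new DataTransfer();
dataTransfer.setData('text/plain', text);
element.dispatchEvent(new ClipboardEvent('paste', {
    clipboardData: dataTransfer,
    bubbles: true,
    cancelable: true
}));
"""


def simulate_paste(driver, element, text):
    """Dispatch a synthetic paste event carrying `text` at `element`."""
    # Selenium exposes the extra arguments to the script as arguments[0], arguments[1].
    driver.execute_script(PASTE_SCRIPT, element, text)
```

A call would look like simulate_paste(driver, driver.find_element("css selector", "input"), "pasted text").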
Solution for both Linux and macOS (for the Chrome driver, not tested on Firefox)
The answer from @BradParks almost worked for me on macOS, except for the copy/cut part. So, after some research, I came up with a solution that works on both Linux and macOS (the code is in Ruby).
It's a bit dirty, as it uses the same input to pre-paste the text, which can have some side effects. If that were a problem for me, I'd try using a different input, possibly creating one with execute_script.
def paste_into_input(input_selector, value)
  input = find(input_selector)
  # set input value skipping onChange callbacks
  execute_script('arguments[0].focus(); arguments[0].value = arguments[1]', input, value)
  value.size.times do
    # select the text using shift + left arrow
    input.send_keys [:shift, :left]
  end
  execute_script('document.execCommand("copy")') # copy on mac
  input.send_keys [:control, 'c'] # copy on linux
  input.send_keys [:shift, :insert] # paste on mac and linux
end
If you want to copy a cell's text from a table and paste it into a search box, you can use the Actions class, which Selenium provides for handling keyboard and mouse events:
/// <summary>
/// Double-clicks a cell to select its text, copies it with Ctrl+C,
/// then clicks the search box, pastes with Ctrl+V, and verifies the result.
/// </summary>
/// <param name="text"></param>
public void SelectAndCopyPasteCellText(string text)
{
    var cellText = driver.FindElement(By.Id("CellTextID"));
    if (cellText != null)
    {
        Actions action = new Actions(driver);
        action.MoveToElement(cellText).DoubleClick().Perform(); // used for Double click and select the text
        action = new Actions(driver);
        action.KeyDown(Keys.Control);
        action.SendKeys("c");
        action.KeyUp(Keys.Control);
        action.Build().Perform(); // copy is performed
        var searchBox = driver.FindElement(By.Id("SearchBoxID"));
        searchBox.Click(); // clicked on search box
        action = new Actions(driver);
        action.KeyDown(Keys.Control);
        action.SendKeys("v");
        action.KeyUp(Keys.Control);
        action.Build().Perform(); // paste is performed
        var value = searchBox.GetAttribute("value"); // fetch the value search box
        Assert.AreEqual(text, value, "Selection and copy paste is not working");
    }
}
KeyDown(): This method simulates pressing a specific keyboard key and holding it down.
KeyUp(): A key pressed with KeyDown() is not released automatically, so KeyUp() is used to release it explicitly.
SendKeys(): This method sends a series of keystrokes to a given web element.
If you are using the Serenity framework, use the following snippet:
// Each withAction() call starts a new action chain, so every shortcut is built and performed in one chain.
withAction().moveToElement(yourWebElement).doubleClick().perform();
withAction().keyDown(Keys.CONTROL).sendKeys("a").keyUp(Keys.CONTROL).build().perform();
withAction().keyDown(Keys.CONTROL).sendKeys("c").keyUp(Keys.CONTROL).build().perform();
withAction().keyDown(Keys.CONTROL).sendKeys("v").keyUp(Keys.CONTROL).build().perform();
String value = yourWebElement.getAttribute("value");
System.out.println("Value copied: "+value);
Then send this value wherever you want to send:
destinationWebElement.sendKeys(value);