How to use selenium in pandas to read a webpage? - pandas

I want to collect the information of a webpage using chromedriver. How do I install it and use it?

You have to install selenium first if you don't have it already. Then to use selenium:
from selenium.webdriver import Chrome
url="URL of the webpage you want to read"
setting up the driver
webdriver = "path of the chromedriver.exe file saved in your pc"
driver.get(url)
using css selector
y = driver.find_element_by_css_selector('css selector of the data you want to read from the webpage').text
print(y)

You don't install the chromedriver - you download the .exe (from here) and use the path to it in webdriver.Chrome(). This getting started page has a comprehensive guide:
from selenium import webdriver
driver = webdriver.Chrome('/path/to/chromedriver') # refers to the path where you saved the exe
driver.get('http://www.google.com/');
time.sleep(5) # Let the user actually see something!
search_box = driver.find_element_by_name('q')
search_box.send_keys('ChromeDriver')
search_box.submit()
time.sleep(5) # Let the user actually see something!
driver.quit()
Note: download the .exe that matches with your version of chrome!
(In Help > About Google Chrome)

As mentioned by #Patha_Mondal, you need to download the driver and select the elements you want to read. However, as your original question asks "How to use selenium in pandas to read a webpage?", I would say instead consider using Scrapy along with Selenium to create a ".csv" file from the Webpage Data.
Read the ".csv" data into pandas using pandas.read_csv() .
The data from the Webpage might not be clean or properly formatted. Using Scrapy to create a dataset out of it would be beneficial for reading it into pandas. Avoid using pandas directly in the same script as Selenium and Scrapy.
Hope it Helped.

Related

Can I install chromedriver on python anywhere and not use it headless?

I am trying to use this code that works on my local machine on python anywhere and i want to understand if it is even possible:
from selenium import webdriver
from bs4 import BeautifulSoup
import time
# Initialize webdriver
driver = webdriver.Chrome(executable_path="/Users/matteo/Downloads/chromedriver")
# Navigate to website
driver.get("https://apnews.com/article/prince-harry-book-meghan-royals-4141be64bcd1521d1d5cf0f9b65e20b5")
time.sleep(5)
# Parse page source
soup = BeautifulSoup(driver.page_source, "html.parser")
# Find desired elements using Beautiful Soup
elements = soup.find_all("p")
# Print element text
for element in elements:
print(element.text)
# Close webdriver
driver.quit()
Do i need to have installed chrome to make that work or is chromium enough? Because when i run that code on my local machine a chrome page opens up. How does that work on python anywhere? Would it crush?
I am wondering if the code i am using only works if someone is on a GUI with Chrome installed or if it can work on python anywhere too.
The short answer is no. ChromeDriver must have chrome installed. You can run your tests headless for time save, but chrome still must be installed.

Selenium and Chromedriver on Lambda - Inconsistency at scale

I have a basic selenium script running/deployed on aws lambda.
Chrome and chromedriver and installed as layers (and available on /opt) via serverless.
The script works ... but only some of the time and rarely at scale (invoking more than 5 instances asynchronously).
I invoke the function in a simple for loop (up to about 200 iterations)
response = client.invoke(
FunctionName='arn:aws:lambda:us-east-1:12345667:function:selenium-lambda-dev-hello',
InvocationType='Event', #|'RequestResponse'|'Event' (async)| DryRun'
LogType='Tail',
#ClientContext='string',
Payload=event_payload,
#Qualifier='24'
)
Other runs, the process hangs while initiating the selenium driver on this line
driver = webdriver.Chrome('/opt/chromedriver_89', chrome_options=options)
other iterations the drivers fail/throw a 'timeout waiting for renderer exception'
This I believe is often due to a mismatch of chromedriver/chrome. I have checked and verified my versions are matched up and compatible (and like i said they do work sometimes).
I guess i'm looking for some ideas/direction to even begin to troubleshoot this. I was under the impression that each invokation of a lambda function is in a separate environment, so why would increasing the volume of invokations have any adverse effect on how well my script runs?
Any ideas or thoughts would be greatly appreciated all!
Discussed in the comments.
There's no complete solution but what has helped improve the situation was increasing the memory of the Lambda service.
The alternatives to try/consider:
Don't use Chrome. Use requests and lxml to query the pages via the network level and remove the need for Chrome.
I did something similar to support another stack question recently. You can see it's similar but not quite the same.
Go to a url and get some text from an xpath:
from lxml import html
import requests
import json
url = "https://nonfungible.com/market/history"
response = requests.get(url)
page = html.fromstring(response.content)
datastring = page.xpath('//script[#id="__NEXT_DATA__"]/text()')
Don't use Chrome. Chrome is ideal for functional testing - but if that's not you objective consider using HtmlUnitDriver or phantomjs. Both of these are significantly lighter than chrome and won't require a browser to be installed (you can run it straight from libraries)
You only need to change the driver initialisation and the rest of the script (in theory) should work.
PhantomJS:
$ pip install selenium
$ brew install phantomjs
from selenium import webdriver
driver = webdriver.PhantomJS()
Unit Driver:
from selenium import webdriver
driver = webdriver.Remote(
desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)

Is there a way that you can use the browser profiles on your desktop as the Selenium webdriver? [duplicate]

So whenever I try to use my Chrome settings (the settings I use in the default browser) by adding
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\Users\... (my webdriver path)")
driver = webdriver.Chrome(executable_path="myPath", options=options)
it shows me the error code
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes n 16-17: truncated \UXXXXXXXX escape
in my bash. I don't know what that means and I'd be happy for any kind of help I can get. Thanks in advance!
The accepted answer is wrong. This is the official and correct way to do it:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument(r"--user-data-dir=C:\path\to\chrome\user\data") #e.g. C:\Users\You\AppData\Local\Google\Chrome\User Data
options.add_argument(r'--profile-directory=YourProfileDir') #e.g. Profile 3
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe', chrome_options=options)
driver.get("https://www.google.co.in")
To find the profile folder on Windows right-click the desktop shortcut of the Chrome profile you want to use and go to properties -> shortcut and you will find it in the "target" text box.
To get the path, follow the steps below.
In the search bar type the following and press enter
This will then show all the metadata. There find the path to the profile
As per your question and your code trials if you want to open a Chrome Browsing Session here are the following options:
To use the default Chrome Profile:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\AtechM_03\\AppData\\Local\\Google\\Chrome\\User Data\\Default")
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe', chrome_options=options)
driver.get("https://www.google.co.in")
Note: Your default chrome profile would contain a lot of bookmarks, extensions, theme, cookies etc. Selenium may fail to load it. So as per the best practices create a new chrome profile for your #Test and store/save/configure within the profile the required data.
To use the customized Chrome Profile:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("user-data-dir=C:\\Users\\AtechM_03\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 2")
driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe', chrome_options=options)
driver.get("https://www.google.co.in")
Here you will find a detailed discussion on How to open a Chrome Profile through Python
This is how I managed to use EXISTING CHROME PROFILE in php selenium webdriver.
Profile 6 is NOT my default profile. I dont know how to run default profile. It is IMPORTANT not to add -- before chrome option arguments! All other variants of options didnt work!
<?php
//...
$chromeOptions = new ChromeOptions();
$chromeOptions->addArguments([
'user-data-dir=C:/Users/MyUser/AppData/Local/Google/Chrome/User Data',
'profile-directory=Profile 6'
]);
$host = 'http://localhost:4444/wd/hub'; // this is the default
$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $chromeOptions);
$driver = RemoteWebDriver::create($host, $capabilities, 100000, 100000);
To get name of your chrome profile, go to chrome://settings/manageProfile, click on profile icon, click "Show profile shortcut on my desktop". After that right click on desktop profile icon and go to properties, here you will see something like "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --profile-directory="Profile 6".
Also I recommend you to close all chrome instances before running this code. Also maybe you need to TURN OFF chrome settings > advanced > system > "Continue running background apps when Google Chrome is closed".
None of the given answers were working for me so I researched a bit and now the working code is for is this one. I copied the user dir folder from Profile Path from chrome://version/ and made another argument for the profile as shown below:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument('user-data-dir=C:\\Users\\gupta\\AppData\\Local\\Google\\Chrome\\User Data')
options.add_argument('profile-directory=Profile 1')
driver = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\chromedriver.exe', options=options)
driver.get('https://google.com')
Are you sure you are meant to be putting in the webdriver path in the user-data-dir argument? That's usually where you put your chrome profile e.g. "C:\Users\yourusername\AppData\Local\Google\Chrome\User Data\Profile 1\". Also you will need to use either double backslashes or forward slashes in your directory path (both work). You can test if your path works by using the os library
e.g.
import os
os.list("C:\\Users\\yourusername\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 1")
will give you the directory listing.
I might also add that occasionally if you manage to crash chrome while running webdriver with a nominated user profile, that it seems to record the crash in the profile and the next time you open chrome, you get the Chrome prompt to restore pages after it exited abnormally. For me personally this had been a bit of headache to deal with and I no longer use a user profile with chromedriver because of it. I could not find a way around it. Other people have reported it here, but none of their solutions seemed to work for me, or were not suitable for my test cases. https://superuser.com/questions/237608/how-to-hide-chrome-warning-after-crash
If you don't nominate a user profile it seems to create a new (blank) temporary one each time it runs
Make sure you've got the path to the profile right, and that you double escape backslashes in said path.
For example, typically the default profile on windows is located at:
"C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
I managed to launch my chrome profile using these arguments:
ChromeOptions options = new ChromeOptions();
options.addArguments("--user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data");
options.addArguments("--profile-directory=Profile 2");
WebDriver driver = new ChromeDriver(options);
You can find out more about the web driver here
You simply have to replace the '\' to '/' in your paths and that'll resolve it.
Get profile name by navigating to chrome://version from your chrome browser (You'll see Profile Path, but you only want the profile name from it (e.g. Profile 1)
Close out all Chrome sessions using the profile you want to use. (or else you will get the following error: InvalidArgumentException)
Now make sure you have the code below (Make sure you replace UserFolder with the name of your userfolder.
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Users\\EnterYourUserFolder\\AppData\\Local\\Google\\Chrome\\User Data") #leave out the profile
options.add_argument("profile-directory=Profile 1") #enter profile here
driver = webdriver.Chrome(executable_path="C:\\chromedriver.exe", chrome_options=options)
this worked for me 100% and it showed up my selected profile.
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# path of your chrome webdriver
dir_path = os.getcwd()
user_profile_path = os.environ[ 'USERPROFILE' ]
#if "frtkpr" which is ll be your custom profile does not exist it will be created.
option.add_argument( "user-data-dir=" + user_profile_path + "/AppData/Local/Google/Chrome/User Data/frtkpr" )
driver = webdriver.Chrome( dir_path + "/chromedriver.exe",chrome_options=option )
baseUrl = "https://www.facebook.com/"
driver.maximize_window()
driver.get( baseUrl )

Click on elements in Chrome Extension with selenium

I have been searching on the internet using Selenium (Java) interacting with Google Chrome Extension but have not been able to find an answer.
First Question
Is there a way to launch the chrome extension since Selenium only interact with WebView but not on the chrome extensions button in the browser ?
I try this method
"chrome-extension://id/index.html" but the extension did not launch as expected. I like find if there is another way to launch a chrome extension through selenium
Second Question
I am trying to click on the elements in a chrome extension with Selenium webdriver. How do I do it ? I tried the driver.CurrentWindowHandle , but it does not detect the chrome extension.
Thanks
Below is the solution with pyautogui (similar to autoit in java - so you can extend the same solution for java also).
Pre-Condition:
save the extension image in the project folder (I saved it under "autogui_ref_snaps" folder in my example with "capture_full_screenshot.png" name
Python:
Imports needed
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from Common_Methods.GenericMethods import *
import pyautogui #<== need this to click on extension
Script:
options = ChromeOptions()
options.add_argument("--load-extension=" + r"C:\Users\supputuri\AppData\Local\Google\Chrome\User Data\Default\Extensions\fdpohaocaechififmbbbbbknoalclacl\5.1_0") #<== loading unpacked extension
driver = webdriver.Chrome(
executable_path=os.path.join(chrome_options=options)
url = "https://google.com/"
driver.get(url)
# get the extension box
extn = pyautogui.locateOnScreen(os.path.join(GenericMethods.get_full_path_to_folder('autogui_ref_snaps') + "/capture_full_screenshot.png"))
# click on extension
pyautogui.click(x=extn[0],y=extn[1],clicks=1,interval=0.0,button="left")
If you are loading an extension and it's not available in incognito mode then follow my answer in here to enable it.
Try to click on extension with this JsExecutor method:
driver.execute_script("window.postMessage('clicked_browser_action', '*')")

Unable to load webextension in firefox using python selenium

Hi I am seeing this issue when I try to load a Firefox webextension using Python Selenium:
selenium.webdriver.firefox.firefox_profile.AddonFormatError: ("[Errno 2] No such
file or directory: 'c:\\users\\admini~1\\appdata\\local\\temp\\tmpr
wj4ed.xxx.xpi\\install.rdf'", )
Code is as below
from selenium import webdriver
extn_path = "C:\Program Files (x86)\xxxxx\xxx.xpi"
profile = webdriver.FirefoxProfile()
profile.add_extension(extn_path)
self.browser=webdriver.Firefox(profile,executable_path='xxx\geckodriver.exe')
Versions used
Selenium 3.12
Gecko v.0.20
Firefox- 60
Can anyone let me know why I am facing this issue. I did see that many people are facing this issue and it's mentioned that it's a known issue but with latest Selenium and Gecko driver issue it is expected to be resolved.
However, I don't see it working. Any tips or inputs.
can you try double front slash instead one single front slash inside your path