How to enable "reader-mode" in chrome using selenium and scrapy? - selenium

You can enable the "reader-mode" by opening chrome://flags/#enable-reader-mode in Google Chrome.
Then you can toggle "reader mode" while browsing a webpage.
How to get the "reader mode" version of a web page using Selenium and chromedriver?

This is probably not the answer you want. You can enable the "reader-mode" feature by using ChromeOptions:
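from selenium import webdriver
options = webdriver.ChromeOptions()
# Turn on the ReaderMode feature (the same feature the chrome://flags entry controls)
options.add_argument("--enable-features=ReaderMode")
driver = webdriver.Chrome(options=options)
url = "https://example.com"  # placeholder; substitute the page you want to load
driver.get(url)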
Then you have to toggle it somehow. The Selenium documentation is quite clear that it is not for testing browser functionality (see https://stackoverflow.com/a/49801432/839338), so you are left with using another automation tool, such as AutoIt (https://www.autoitscript.com/site/) or Sikuli (http://doc.sikuli.org/), to find and toggle the "reader-mode" menu item. I'm not sure how you would go about that using Scrapy.

Related

I disabled loading images in Chrome while using WebDriver with Selenium, now can't enable it

I disabled image loading in Chrome while using WebDriver with Selenium, and now I can't re-enable it.
I was using Python to scrape Instagram, so I thought it would be a good idea to disable images.
The commands I used:
options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu=true")
options.add_argument("--blink-settings=imagesEnabled=false")
And now I cannot change it from chrome settings.
Please Help.
Edit: This happens only in my default Chrome Profile. Other Profiles work fine even though the profile I use for selenium is a different one.
After a long day on Google, I finally found the solution: set the image content-setting preference back to 1 (allow).
options.add_experimental_option("prefs", {'profile.managed_default_content_settings.images': 1})
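For context, here is a minimal sketch of how that preference might be wired into a script. The helper names `image_prefs` and `build_options` are hypothetical and not part of Selenium. The preference key is the one from the answer above, and the values 1 (allow) and 2 (block) are Chrome's content-setting codes. The Selenium import is deferred so the prefs helper can be inspected without a browser installed.

```python
IMAGES_PREF = "profile.managed_default_content_settings.images"


def image_prefs(enabled):
    """Return the Chrome prefs dict that allows (1) or blocks (2) images."""
    return {IMAGES_PREF: 1 if enabled else 2}


def build_options(images_enabled=True):
    """Build ChromeOptions carrying the image preference (needs selenium)."""
    from selenium import webdriver  # imported lazily; requires selenium installed

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", image_prefs(images_enabled))
    return options
```

Passing `build_options(True)` to `webdriver.Chrome(options=...)` should start a session with images allowed. Passing `False` is one way to block them through prefs, as an alternative to the `--blink-settings` switch.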

Let selenium display browser while in --headless

Is there a way to run Selenium headless but have it display the window, for example at the start of the application? Like:
show browser
login
do captcha
go --headless
execute tasks
Actually, this is possible.
You can peek into a headless browser by using the "--remote-debugging-port" argument in ChromeOptions. For example,
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--remote-debugging-port=9222') # Recommended is 9222
driver = webdriver.Chrome(options=chrome_options)
Then while Selenium is running, you can go to http://localhost:9222 in your regular browser to view the headless session. Your browser will display links to each tab Chrome has open, and you can view/interact with each page by clicking on the links. You'll also be able to monitor Selenium/Chrome as Selenium runs its test.
Edit: As of Chrome 100 or so, you now need to go to the link chrome://inspect to view the headless session in your browser while Selenium is running.
You'll need to do some extra configuration as well. Basically:
Check the box for "Discover network Targets"
Hit "Configure..."
Add the link "localhost:9222" to the list of servers
CREDITS: https://stackoverflow.com/a/58045991/1136132
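As a rough complement to opening the debugging page by hand, the same port also serves a small HTTP API: a GET request to `/json` returns the list of open targets as JSON. The sketch below assumes the headless session was started with `--remote-debugging-port=9222` as in the answer. The function names `fetch_targets` and `summarize_targets` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_targets(port=9222, host="localhost"):
    """Ask Chrome's DevTools HTTP endpoint for its list of open targets."""
    with urlopen(f"http://{host}:{port}/json", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_targets(targets):
    """Return one 'title -> url' line per page-type target."""
    return [
        f"{t.get('title', '')} -> {t.get('url', '')}"
        for t in targets
        if t.get("type") == "page"
    ]
```

While the headless session is running, `print("\n".join(summarize_targets(fetch_targets())))` should list the tabs Selenium currently has open.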

Selenium: Configure Firefox webdriver to not run in test mode

I'm using Selenium to automate navigation on the Google Meet website for a user who can't use the keyboard. However, Google Meet won't let me enter a meeting when using Chrome in test mode. If I configure the Chrome webdriver to run as a regular browser, I can navigate the website a little, but eventually I can't enter a meeting at all. Here is the Python code I use to initialize Chrome as a non-test browser:
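import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
browser = webdriver.Chrome(options=options)  # pass the options so they actually take effect
browser.get("https://meet.google.com/some_meeting_id")
time.sleep(3)
# Raw string so the backslash escaping the dot in the id reaches the CSS selector
txtName = browser.find_element(By.CSS_SELECTOR, r"#jd\.anon_name")
txtName.send_keys("user_name" + Keys.RETURN)  # redirects to an error page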
My next hope is to use Firefox, but when I load it from webdriver, it opens with an orange address bar, indicating that it is being run by a test tool. Is there a way to run Firefox in normal mode from Selenium (just as I did with Chrome) or, even better, is there any additional configuration I can do in Chrome webdriver to make this work?
Maybe try this to switch Firefox from headless mode to normal mode:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = False
driver = webdriver.Firefox(options=options)
driver.get("http://stackoverflow.com")

Selenium interpret javascript on mac?

I'm trying to make a web crawler that clicks on ads (yes, I know). It's very sophisticated, but I realised that Google Ads aren't shown when JavaScript is disabled. Today I use Mechanize, and it doesn't "accept" JavaScript.
I heard Selenium uses another system to crawl the net.
The only thing I want to do is access my page and click on the ad (generated by JavaScript).
Can Selenium do it?
Selenium is a browser automation tool. You can basically automate everything you can do in your browser. Start with going through the Getting Started section of the documentation.
Example:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.python.org")
print(driver.title)
driver.close()
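Besides automating common browsers like Chrome, Firefox, Safari, or Internet Explorer, you can also use the PhantomJS headless browser (note that PhantomJS is no longer maintained, and headless Chrome or Firefox is the usual substitute today).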

HTMLUNIT with Headless Selenium

I am trying to scrape a website that contains images using a headless Selenium.
Initially, the website populates 50 images. If you scroll down more and more images are loaded.
Windows 7 x64
python 2.7
recent install of selenium
[1] Non-Headless
Navigating to the website with selenium as follows:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get(url)
browser.execute_script('window.scrollBy(0, 10000)')
browser.page_source
This works (if anyone has a better suggestion please let me know).
I can continue to scrollBy() until I reach the end and then pull the source page.
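The scroll-until-the-end idea can be sketched as a loop that stops once the page height no longer grows. This is only an illustration under assumptions: `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names, and the fixed sleep is a crude stand-in for waiting on the site's own loading behaviour. It relies only on `driver.execute_script`, so it works with any WebDriver whose browser runs JavaScript.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # crude wait for lazily loaded images to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            return rounds  # nothing new loaded; assume the end was reached
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls
```

After the function returns, `browser.page_source` can be read as before. The `max_rounds` cap guards against pages that keep loading content indefinitely.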
[2] Headless with HTMLUNIT
from selenium import webdriver
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
driver.get(url)
I cannot use scrollBy() in this headless environment.
Any suggestions on how to scrape this kind of page?
Thanks
One option is to study the JavaScript to see how it calculates what to load next. Then implement that logic in your scraping client instead. Once you have done that, you can use faster scraping tools like Perl's WWW::Mechanize.
You need to enable JavaScript explicitly when using the HtmlUnit Driver:
driver.setJavascriptEnabled(true);
According to the docs (http://code.google.com/p/selenium/wiki/HtmlUnitDriver), it should emulate IE's JavaScript handling by default.
When I tried the same method, I got error messages saying that Selenium crashed while connecting to Java to simulate JavaScript.
Once I put the script into the execute_script method, the code worked well.
I guess the communication between Selenium and the Java server part was not configured properly.
Enabling JavaScript with the HTMLUNITWITHJS capability (instead of HTMLUNIT) is possible and quick ;)