Bypassing YouTube Login + reCAPTCHA with Selenium

I'm trying to get the e-mail address of a YouTube channel. This is an example link. As you can see, to see the e-mail you have to LOGIN and then bypass RECAPTCHA. Honestly, I'm stuck at the first stage. Google thinks the Selenium browser isn't safe to log in with and blocks it. The reason (I'm guessing) is that it thinks I'm an automated browser. So basically I have to make it think I'm not a bot. I tried randomizing the user agent and turning off the automation extension in code, but it still blocks my scraper. I have also checked whether JavaScript is turned off in the Selenium browser, since Google told me to check that.
So, I'm guessing there is a way to scrape using Scrapy or requests? Or haven't I tried enough with Selenium? I've seen a useful video by TechLead that scrapes subtitles from YouTube without logging in, so it seems it's not impossible. I'm still working on it, so if anyone has an answer or advice please let me know. Thank you for reading.
References
Google login block: Selenium Google Login Block
Randomizing user agent: Way to change Google Chrome user agent in Selenium?
Websites detecting Selenium automation: How to make Selenium script undetectable using GeckoDriver and Firefox through Python?

Try this code to get past the Gmail login.
It uses Selenium Wire with undetected-chromedriver v2.
Note: put chromedriver on your system PATH.
from selenium.webdriver.common.by import By
from seleniumwire.undetected_chromedriver.v2 import Chrome, ChromeOptions
import time

options = {}  # Selenium Wire options (none needed here)
chrome_options = ChromeOptions()
chrome_options.add_argument('--user-data-dir=hash')
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--disable-dev-shm-usage")
# chrome_options.add_argument("--headless")
browser = Chrome(seleniumwire_options=options, options=chrome_options)
browser.get('https://gmail.com')
# Enter the e-mail address and move on to the password step
browser.find_element(By.XPATH, '//*[@id="identifierId"]').send_keys('your-email')
browser.find_element(By.XPATH, '//*[@id="identifierNext"]/div/button').click()
time.sleep(5)  # give the password form time to load
# Enter the password and submit
browser.find_element(By.XPATH, '//*[@id="password"]/div[1]/div/div[1]/input').send_keys('your-password')
browser.find_element(By.XPATH, '//*[@id="passwordNext"]/div/button').click()
In addition to this, Selenium Wire has many other useful features; check out its GitHub repository.

Related

reCAPTCHA bot detected only when selenium webdriver is headless

As the title says, my program goes to a website, gets the reCAPTCHA audio file, recognizes it, sends the answer, then crawls something.
It works well when my webdriver is visible (not headless).
However, when I change the webdriver option to headless, reCAPTCHA detects that the webdriver is a bot when it tries to get the audio file after a few recognitions.
Is there any other way or option to hide my webdriver and avoid being detected as a bot?
By the way, the webdriver I am using is EdgeDriver.

Login through Google SSO using automated browser

I am trying to automate login to my app, which uses, among others, Google SSO authentication.
However, the login form returns the error "This browser or app may not be secure." I set my Google account options to allow less secure apps, but still nothing.
I browsed a few topics:
GMail is blocking login via Automation (Selenium)
Selenium Google Login Block
Automation Google login with python and selenium shows "This browser or app may be not secure"
And it seems that Google is blocking this approach altogether in favor of OAuth.
People write in these topics that the solutions stopped working recently.
So is it currently possible to configure ChromeDriver, using capabilities, so that it can log in through SSO? I need a simple solution that will run headless with other scripts in the cloud (not something that would require me to manually log in first on another instance, as one answer suggests).
If it's not possible or extremely complicated, please tell me so I will not waste time on it.
If you want to use Chrome capabilities, what you can do is set user-data-dir to a Chrome profile that has already been signed in using SSO.
You should look up how to reuse Chrome profiles with Selenium.
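Reusing a signed-in profile comes down to passing two Chrome switches, user-data-dir and profile-directory. Below is a minimal Python sketch, assuming Selenium 4 is installed; the helper names build_profile_args and make_driver and the example path are hypothetical. Chrome generally will not start a second session on a user data directory that is already open in another Chrome instance, so the regular browser may need to be closed first.

```python
from typing import List


def build_profile_args(user_data_dir: str, profile_directory: str = "Default") -> List[str]:
    """Return the Chrome switches that select an existing profile."""
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_directory}",
    ]


def make_driver(user_data_dir: str, profile_directory: str = "Default"):
    """Start Chrome on an existing profile (requires the selenium package)."""
    # Imported lazily so build_profile_args can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_profile_args(user_data_dir, profile_directory):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Hypothetical path; point it at the profile that is already signed in.
    print(build_profile_args("/path/to/chrome-user-data", "Profile 1"))
```

Whether Google accepts the session afterwards is not guaranteed; this only shows how the profile is selected.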
If your account has 2-step verification, Google considers it safer and allows you to log in. Then the issue will be how to handle the 2-step verification. Working on that :/

Get past Google login using Selenium chrome driver

I need to get past a Google login form using Selenium but am unable to do so. Following advice I've seen elsewhere, I've tried making the chrome driver launch with an existing chrome profile which is already logged in to Google:
options.addArguments("user-data-dir=/Users/myuser/Library/Application Support/Google/Chrome/");
options.addArguments("profile-directory=Profile 3");
This works insofar as the correct Chrome profile is opened, but as soon as Selenium navigates to the desired website the Google login form pops up. If I close Selenium and open the same website with the same Chrome profile in a regular browser, the browser remembers me and I am able to go straight through without being prompted to log in. Am I doing something wrong or has Google fixed this workaround? (I am using Mac).

How does reCAPTCHA 3 know I'm using Selenium/chromedriver?

I'm curious how reCAPTCHA v3 works. Specifically the browser fingerprinting.
When I launch an instance of Chrome through Selenium/chromedriver and test against reCAPTCHA 3 (https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php), I always get a score of 0.1.
When using incognito with a normal instance, I get 0.3.
I've beaten other detection systems by injecting JavaScript, modifying the web driver object, recompiling webdriver from source, and modifying the $cdc_ variables.
I can see what looks like some obfuscated POST back to the server, so I'm going to start digging there.
What might it be looking for to determine if I'm running Selenium/chromedriver?
reCaptcha
Websites can easily detect the network traffic and identify your program as a bot. Google has released five reCAPTCHA types to choose from when creating a new site; four of them are active, and reCAPTCHA v1 has been shut down.
reCAPTCHA versions and types
reCAPTCHA v3 (verify requests with a score): reCAPTCHA v3 allows you to verify if an interaction is legitimate without any user interaction. It is a pure JavaScript API returning a score, giving you the ability to take action in the context of your site: for instance requiring additional factors of authentication, sending a post to moderation, or throttling bots that may be scraping content.
reCAPTCHA v2 - "I'm not a robot" Checkbox: The "I'm not a robot" Checkbox requires the user to click a checkbox indicating the user is not a robot. This will either pass the user immediately (with No CAPTCHA) or challenge them to validate whether or not they are human. This is the simplest option to integrate with and only requires two lines of HTML to render the checkbox.
reCAPTCHA v2 - Invisible reCAPTCHA badge: The invisible reCAPTCHA badge does not require the user to click on a checkbox, instead it is invoked directly when the user clicks on an existing button on your site or can be invoked via a JavaScript API call. The integration requires a JavaScript callback when reCAPTCHA verification is complete. By default only the most suspicious traffic will be prompted to solve a captcha. To alter this behavior edit your site security preference under advanced settings.
reCAPTCHA v2 - Android: The reCAPTCHA Android library is part of the Google Play services SafetyNet APIs. This library provides native Android APIs that you can integrate directly into an app. You should set up Google Play services in your app and connect to the GoogleApiClient before invoking the reCAPTCHA API. This will either pass the user through immediately (without a CAPTCHA prompt) or challenge them to validate whether they are human.
reCAPTCHA v1: reCAPTCHA v1 has been shut down since March 2018.
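On the site owner's side, the v3 score is just a number to act on. Below is a minimal Python sketch of that decision step, assuming the score has already been read from the verification response. reCAPTCHA v3 scores run from 0.0 (very likely a bot) to 1.0 (very likely a good interaction), but the thresholds 0.3 and 0.7, the action names, and the function name action_for_score are hypothetical and would need tuning per site.

```python
def action_for_score(score: float, block_below: float = 0.3, allow_from: float = 0.7) -> str:
    """Map a reCAPTCHA v3 score to a site-specific action.

    The thresholds are illustrative assumptions, not values recommended by Google.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < block_below:
        return "throttle"   # treat as a likely bot
    if score < allow_from:
        return "challenge"  # e.g. require an additional authentication factor
    return "allow"


if __name__ == "__main__":
    for s in (0.1, 0.3, 0.9):
        print(s, action_for_score(s))
```

With these assumed thresholds, the 0.1 score reported for Selenium in the question would be throttled, and the 0.3 score from an incognito window would be challenged.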
Solution
However, there are some generic approaches to avoid getting detected while web scraping:
The first and foremost attribute through which a website can identify your script/program is your monitor size. So it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, keep changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
To simulate human-like behavior you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds
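The three suggestions above can be sketched as small helpers. This is a minimal Python sketch using only the standard library; the function names, the viewport sizes, and the placeholder user-agent strings are assumptions for illustration, and nothing here is guaranteed to defeat any particular detection system.

```python
import random
import time
from typing import Callable, Sequence, Tuple

# Illustrative values only; replace with sizes and agents that suit your setup.
VIEWPORTS = [(1366, 768), (1440, 900), (1536, 864), (1920, 1080)]
USER_AGENTS = ["example-agent-a", "example-agent-b"]  # placeholders, not real strings


def pick_viewport(rng: random.Random) -> Tuple[int, int]:
    """Choose a (width, height) pair at random."""
    return rng.choice(VIEWPORTS)


def pick_user_agent(rng: random.Random, agents: Sequence[str] = USER_AGENTS) -> str:
    """Choose a user-agent string at random."""
    return rng.choice(list(agents))


def human_pause(min_s: float, max_s: float, rng: random.Random,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random duration between min_s and max_s seconds."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


if __name__ == "__main__":
    rng = random.Random()
    width, height = pick_viewport(rng)
    print(width, height, pick_user_agent(rng))
    # With a Selenium driver this would typically be applied as:
    #   options.add_argument(f"--user-agent={pick_user_agent(rng)}")
    #   driver.set_window_size(width, height)
    human_pause(0.5, 1.5, rng)
```

Passing the random generator in explicitly keeps the helpers reproducible with a fixed seed.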
Outro
Some food for thought:
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Unable to use Selenium to automate Chase site login
Confidence Score of the request using reCAPTCHA v3 API
Selenium and Puppeteer have some browser configurations that are different from those of a non-automated browser. Also, since some JavaScript functions are injected into the browser to manipulate elements, you need to create some overrides to avoid detection.
There are some good articles explaining how Selenium and Puppeteer are detected when they run on a site with detection mechanisms:
Detecting Chrome headless, new techniques - You can use it to write defensive code for your bot.
It is not possible to detect and block Google Chrome headless - it explains in a clear and sound way the differences that JavaScript code can detect between a browser launched by automated software and a real one, and also how to fake it.
GitHub - headless-cat-n-mouse - Example using Puppeteer + Python to avoid detection

Basic Authentication with Firefox in Selenium and Nightwatch.js

I use Nightwatch-Cucumber based on Nightwatch.js. Nightwatch.js is a NodeJS framework based on Selenium.
Currently I'm looking for a smart and simple solution to handle Basic Authentication, especially in Firefox. In Chrome I can do the Basic Auth via URL, like:
https://user:password@mydomain.com
But in Firefox, I get an anti-phishing dialogue box. I tried it manually on about:config in Firefox with the new entry:
network.http.phishy-userpass-length=255
But without success.
At How to Perform Basic Authentication for FirefoxDriver, ChromeDriver & IEdriver in Selenium WebDriver?, a lot of different solutions were named. But none of them is a smart one.
Maybe it is possible to send a Basic Auth header via Selenium? Or is there any other way to do Basic Auth, especially in Firefox? A generic solution for every browser/driver would be the smartest.
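For reference, the URL form and the header form of Basic Authentication carry the same credentials. Below is a minimal Python sketch using only the standard library; the function names and the example host are hypothetical, and whether a given driver lets you attach the header is a separate question that this sketch does not answer.

```python
import base64
from urllib.parse import quote


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of the Authorization header for Basic Auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def basic_auth_url(user: str, password: str, host: str,
                   path: str = "/", scheme: str = "https") -> str:
    """Embed credentials in a URL, percent-encoding reserved characters."""
    # Characters such as "@" or ":" inside the credentials would otherwise break the URL.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}{path}"


if __name__ == "__main__":
    print(basic_auth_header("user", "password"))
    print(basic_auth_url("user", "p@ss", "mydomain.com"))
```

Percent-encoding matters in the URL form because a password containing "@" would otherwise be read as the start of the host name.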