I've been trying to scrape some websites and decided to use proxies. For a while it went fine, I guess, but now I'm getting ERR_EMPTY_RESPONSE quite often; only once every 10 or so tries does the page load correctly, without any changes to the code, of course.
When I'm not running headless and I get an empty response, I can click the "reload" button in the Chrome WebDriver window and the page will sometimes load.
It's got something to do with the fact that I use a proxy, but I can't work out why it works sometimes and why it gives me an empty response so often.
Has anyone else run into a similar problem?
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support.ui import WebDriverWait
PROXY = "35.230.111.164" + ":" + "8888"
webdriver.DesiredCapabilities.CHROME["proxy"] = {
"httpProxy": PROXY,
"ftpProxy": PROXY,
"sslProxy": PROXY,
"noProxy": None,
"proxyType": "MANUAL",
"autodetect": False,
}
options = webdriver.ChromeOptions()
# options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
options.add_argument(f"user-agent={user_agent}")
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 10)
driver.get(
"https://whatismyipaddress.com/"
)
driver.implicitly_wait(10)
# driver.find_element_by_xpath('//*[@id="footer_tc_privacy_button_2"]').click()
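For reference, the same proxy can also be configured with a single Chrome flag rather than by mutating the global DesiredCapabilities dict (that route is deprecated in newer Selenium versions). A minimal sketch; the helper name is made up, and the host and port are the ones from the snippet above:

```python
def proxy_flag(host, port):
    """Build Chrome's --proxy-server command-line flag from a host and port."""
    return f"--proxy-server={host}:{port}"

# usage with the options object above:
# options.add_argument(proxy_flag("35.230.111.164", 8888))
```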
I have experienced problems with proxy + Chrome/chromedriver for ages, and the only solution I came up with was to detect the error and reload programmatically, something like this:
from selenium.common.exceptions import NoSuchElementException

try:
    # Chrome renders its network-error pages inside #main-frame-error
    driver.find_element_by_css_selector('#main-frame-error .error-code')
    # found a Chrome error page, let's reload...
    driver.get(driver.current_url)
except NoSuchElementException:
    pass  # no error page, the load succeeded
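A slightly more defensive version of the same idea, as a sketch: keep reloading until Chrome's error page is gone or a retry budget runs out. The `is_error_page` predicate and the attempt limit are my own additions, not part of the original answer:

```python
def load_with_retry(driver, url, is_error_page, attempts=5):
    """Call driver.get(url) until is_error_page(driver) returns False
    or the attempt budget is exhausted. Returns True on success."""
    for _ in range(attempts):
        driver.get(url)
        if not is_error_page(driver):
            return True   # no Chrome error page, the load succeeded
    return False          # the proxy kept failing on every attempt
```

With Selenium, `is_error_page` would try to locate `#main-frame-error .error-code` and report whether it exists.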
Related
I am trying to open a website using undetected_chromedriver in headless mode. The website uses Cloudflare. Without headless mode the script works, but when I set headless = True it shows a captcha.
import ssl
import time
ssl._create_default_https_context = ssl._create_unverified_context
import undetected_chromedriver as uc
from selenium import webdriver
import os
options = webdriver.ChromeOptions()
options.add_argument('--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"')
driver = uc.Chrome(options=options)
# driver = webdriver.Chrome()
driver.get('website_url')
# import time
time.sleep(10)
driver.save_screenshot('sample.png')
Now if I set headless = True, it shows a captcha:
import ssl
import time
ssl._create_default_https_context = ssl._create_unverified_context
import undetected_chromedriver as uc
from selenium import webdriver
import os
options = webdriver.ChromeOptions()
options.add_argument('--user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"')
options.headless = True
driver = uc.Chrome(options=options)
driver.get('website_url')
# import time
time.sleep(10)
driver.save_screenshot('sample.png')
How can I make it undetectable to Cloudflare?
Add this Chrome option as well.
options.add_argument('--disable-blink-features=AutomationControlled')
This should prevent detection.
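To keep that flag together with the other tweaks suggested in this thread (a pinned, realistic user agent and a real-looking window size), a small sketch; the helper function itself is hypothetical:

```python
def stealth_args(user_agent):
    """Chrome flags commonly added alongside undetected_chromedriver
    to reduce headless detection; the caller supplies a realistic UA."""
    return [
        "--disable-blink-features=AutomationControlled",
        f"--user-agent={user_agent}",
        "--window-size=1920,1080",  # a desktop-sized viewport, not the headless default
    ]
```

Usage: `for arg in stealth_args(ua): options.add_argument(arg)` before creating the driver.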
We are trying to access the URL https://www.nseindia.com/option-chain using Selenium.
However, it loads only once; if we reload it, we get an access denied error.
CODE-
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
from selenium.webdriver.chrome.options import Options
opts = Options()
user_agent = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
'AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/39.0.2171.95 Safari/537.36')
opts.add_argument(f'user-agent={user_agent}')
opts.add_argument('--disable-infobars')
browser = webdriver.Chrome(ChromeDriverManager().install(), options=opts)
browser.get('https://www.nseindia.com/option-chain')
time.sleep(1000)
Some websites use anti-bot protection that can detect your bot thanks to some differences between an automated browser and a standard browser.
You should try adding these settings:
opts.add_argument('--disable-blink-features=AutomationControlled')
opts.add_experimental_option('useAutomationExtension', False)
opts.add_experimental_option("excludeSwitches", ["enable-automation"])
If this doesn't work, try Undetected Chromedriver; it works like the standard chromedriver, but patches it with more settings to increase stealthiness.
By the way, your user-agent looks a little outdated; you should use a newer one matching your chromedriver version, like this one: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
options.add_argument("--disable-infobars")
options.add_argument("--disable-notifications")
options.add_argument("--disable-default-apps")
options.add_argument("--disable-web-security")
options.add_argument("--disable-site-isolation-trials")
options.add_argument("--disable-logging")
options.add_argument("--disable-bundled-ppapi-flash")
options.add_argument("--disable-gpu-compositing")
options.add_argument("--disable-gpu-shader-disk-cache")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_argument("--log-level=3")
# options.add_argument("--window-size=600,600")
# options.page_load_strategy = 'none'
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
I am trying to use BrowserMob Proxy to sniff network logs from my UI automation tests. I have to implement it for both headed and headless RemoteWebDriver. For the headed driver it works like a charm.
Code for my headed RemoteWebDriver:
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(browserMobProxy);
seleniumProxy.setNoProxy("<-loopback>");
seleniumProxy.setHttpProxy("localhost" + ":" + browserMobProxy.getPort());
seleniumProxy.setSslProxy("localhost" + ":" + browserMobProxy.getPort());
desiredCapabilities = DesiredCapabilities.chrome();
desiredCapabilities.setCapability(CapabilityType.UNEXPECTED_ALERT_BEHAVIOUR, UnexpectedAlertBehaviour.IGNORE);
desiredCapabilities.setCapability(CapabilityType.ACCEPT_SSL_CERTS, Boolean.TRUE);
desiredCapabilities.setAcceptInsecureCerts(Boolean.TRUE);
desiredCapabilities.setJavascriptEnabled(Boolean.TRUE);
ChromeOptions options = getChromeOptions(driverConfig.getUserAgent(), Boolean.TRUE);
options.setCapability(CapabilityType.PROXY, seleniumProxy);
desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
return new RemoteWebDriver(remoteAddress, desiredCapabilities);
Code for my headless RemoteWebDriver:
ChromeOptions options = new ChromeOptions();
options.addArguments("--user-agent=" + "HeadlessChrome");
options.setAcceptInsecureCerts(Boolean.TRUE);
options.setHeadless(true);
options.addArguments("--allow-insecure-localhost", "--no-sandbox", "--disable-extensions", "--window-size=1920,1080");
options.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
List<String> optionsArg = ImmutableList.of("--ignore-certificate-errors", "--proxy-bypass-list=<-loopback>",
"--proxy-server=http://localhost:" + mobProxy.getPort(), "--remote-debugging-port=9222");
options.addArguments("--ssl-protocol=any");
options.addArguments("--allow-running-insecure-content");
options.addArguments(optionsArg);
capabilities.setCapability(ChromeOptions.CAPABILITY, options);
driver = new RemoteWebDriver(driverUrl, capabilities);
In the case of my headless driver, the configuration is as follows:
Driver URL -> http://localhost:<port_number> [eg: http://localhost:29515]
BrowserMob Proxy Port -> <dynamic_port> [eg:33173]
Proxy Server Address -> http://localhost:<dynamic_port> [eg: http://localhost:33173]
The full capabilities list before driver creation for headless is as follows:
{acceptInsecureCerts=true, acceptSslCerts=true, browserName=chrome,
goog:chromeOptions={args=[--user-agent=HeadlessChrome, --headless, --disable-gpu,
--allow-insecure-localhost, --no-sandbox, --disable-extensions,
--window-size=1920,1080, --ssl-protocol=any, --allow-running-insecure-content,
--ignore-certificate-errors, --proxy-bypass-list=<-loopback>,
--proxy-server=http://localhost:33173, --remote-debugging-port=9222], extensions=[]},
platform=ANY, version=}
Result
When taking a screenshot on test failure I see only a white page. Implicit waits are already applied for page load, and on top of that I have tried a static 5-second sleep. No fix. I get this HTML dump:
<html><head></head><body></body></html>
NOTES
I have tried the argument --proxy-bypass-list=*, but in that case the network is not routed through the BrowserMob proxy, as I am unable to record any logs. The pages load fine in that case.
The arguments --ssl-protocol=any, --remote-debugging-port=9222 and --allow-running-insecure-content are extras; I have tried without them too, to no avail.
I am using browser-mob-proxy = 2.1.5
ChromeDriver version = 92.0.4515.159
I am running this on a Remote Linux CLI System
When starting the browsermob proxy I am doing the following:
System.setProperty("bmp.allowNativeDnsFallback", "true");
BrowserMobProxy browserMobProxy = new BrowserMobProxyServer();
browserMobProxy.setTrustAllServers(Boolean.TRUE);
Kindly, someone help resolve this issue; I have been stuck for at least a week now and my deadline is nearing fast.
options.add_argument(
"user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
For headless Chrome, the user agent is set to something like HeadlessChrome; this lets websites detect that you are using headless Chrome and behave differently.
You can prevent this by specifying a hardcoded user agent:
open a normal Chrome, go to the Network tab, open a request's headers, copy the user-agent part and replace it in your code.
Otherwise: anyone have any good pyppeteer guides they can link me to?
Hi guys, I'm trying to run Chrome headless. This is my code:
from os import getcwd

from selenium import webdriver


def init_chrome(profile, headless, proxy: str = "159.197.220.31:5594"):
    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
        print('headless set')
    options.binary_location = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    options.add_argument("--remote-debugging-port=8000")
    options.add_argument(f'--proxy-server={proxy}')
    options.add_argument(f"user-data-dir=browser-profiles/chrome/{profile}")
    options.add_argument("--disable-blink-features")
    options.add_argument("--disable-notifications")
    # note: a second add_experimental_option call with the same key overwrites
    # the first, so both switches must go in a single list
    options.add_experimental_option("excludeSwitches", ["enable-automation", "disable-popup-blocking"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36")
    driver = webdriver.Chrome(options=options,
                              executable_path=fr'{getcwd()}\chromedriver.exe')
    print('driver launched')
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        """
    })
    # driver.set_window_size(1200, 900)
    driver.implicitly_wait(5)
    return driver
This is the output:
headless set
driver launched
As you can see, the driver is launched, but it never goes to the page. It just sits there and eats resources. What am I doing wrong? For context, this works fine when headless=False.
I suppose you need --no-sandbox to bypass OS security, and if you are using Windows, add --disable-gpu to make headless work.
It should look like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = True
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('start-maximized')
driver = webdriver.Chrome(executable_path='path to your chromedriver', options=options)
driver.get("http://google.com/")
driver.quit()
This code was working fine, but when I switched from Chrome to Firefox it started giving me an error.
Please help me find the solution.
FirefoxOptions options = new FirefoxOptions();
options.addArguments("--incognito");
options.addArguments("start-maximized");
// options.addArguments("--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac\n" + "OS X; en-us) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53");
options.addArguments("--user-agent=Mozilla/5.0 (Linux; Android 6.0.1; Nexus 6P Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36");
capabilities.setCapability(FirefoxOptions(capabilities), options);
System.setProperty("webdriver.gecko.driver","/Users/abcd/Downloads/geckodriver");
WebDriver driver = null;
Please help ....
I can sense several issues:
to maximize the window use
driver.manage().window().maximize();
to browse incognito with Firefox use
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("-private")
or see: Python/Selenium incognito/private mode
adjusting the user agent is more difficult: https://stackoverflow.com/a/42171906/8291949
Basically, there is an about:config option, general.useragent.override, which you can specify in the user.js file in the profile folder with a line like:
user_pref("general.useragent.override", "Mozilla/5.0 AppleWebKit/537.36…")
And then use capabilities to load that profile. Regarding capabilities see: Can't open browser with Selenium after Firefox update
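A sketch of the user.js approach in Python; the helper name is made up, and writing the pref by hand is only needed if you manage the profile directory yourself (newer Selenium FirefoxOptions can set preferences directly via set_preference):

```python
import os

def write_user_agent_override(profile_dir, user_agent):
    """Append a general.useragent.override pref to the profile's user.js,
    so Firefox started with that profile reports the given user agent."""
    line = f'user_pref("general.useragent.override", "{user_agent}");\n'
    with open(os.path.join(profile_dir, "user.js"), "a", encoding="utf-8") as f:
        f.write(line)
    return line
```

Point Selenium at that profile directory when creating the Firefox driver and the override takes effect.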