I am using Selenium Wire in Python, since, as I understand it, Selenium itself doesn't allow modification of headers.
How do I add headers to the Selenium driver? I googled and found I could set the Referer header using the code below:
from seleniumwire import webdriver

driver = webdriver.Chrome()

def intercept(request):
    # Delete any existing Referer header, then set the spoofed value
    del request.headers['Referer']
    request.headers['Referer'] = 'https://www.google.com/'

driver.request_interceptor = intercept
driver.get('URL')
But how do I add the other headers, like the following?
Host: 127.0.0.1:65432
Connection: keep-alive
Cache-Control: max-age=0
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
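You can set any of these the same way inside the interceptor. Below is a minimal sketch (the header values are illustrative, not required ones); note that browser- and protocol-managed headers such as Host, Connection, Accept-Encoding, and the sec-ch-ua family may be rewritten or ignored by the underlying proxy, so overriding them is not guaranteed to stick:

def intercept(request):
    # Headers to force on every outgoing request
    overrides = {
        'Referer': 'https://www.google.com/',
        'Cache-Control': 'max-age=0',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Site': 'none',
        'Accept-Language': 'en-US,en;q=0.9',
    }
    for name, value in overrides.items():
        # Delete first so duplicates don't accumulate, then set the new value
        if name in request.headers:
            del request.headers[name]
        request.headers[name] = value

driver.request_interceptor = intercept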
Related
We are trying to access the website https://www.nseindia.com/option-chain using Selenium.
However, it loads only once; if we reload it, we get an access denied error.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import time

opts = Options()
user_agent = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/39.0.2171.95 Safari/537.36')
opts.add_argument(f'user-agent={user_agent}')
opts.add_argument('--disable-infobars')

browser = webdriver.Chrome(ChromeDriverManager().install(), options=opts)
browser.get('https://www.nseindia.com/option-chain')
time.sleep(1000)
Some websites use anti-bot protection that can detect your bot thanks to differences between an automated browser and a standard one.
You should try adding these settings:
opts.add_argument('--disable-blink-features=AutomationControlled')
opts.add_experimental_option('useAutomationExtension', False)
opts.add_experimental_option("excludeSwitches", ["enable-automation"])
If this doesn't work, try Undetected Chromedriver: it works like the standard ChromeDriver, but patches it with more settings to increase stealthiness.
By the way, your user-agent looks a little outdated; you should use a newer one matching your ChromeDriver version, like this one: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36
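For reference, a minimal undetected-chromedriver sketch (install with pip install undetected-chromedriver):

import undetected_chromedriver as uc

# Drop-in replacement for webdriver.Chrome with anti-detection patches applied
driver = uc.Chrome()
driver.get('https://www.nseindia.com/option-chain')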
options.add_argument("--disable-infobars")
options.add_argument("--disable-notifications")
options.add_argument("--disable-default-apps")
options.add_argument("--disable-web-security")
options.add_argument("--disable-site-isolation-trials")
options.add_argument("--disable-logging")
options.add_argument("--disable-bundled-ppapi-flash")
options.add_argument("--disable-gpu-compositing")
options.add_argument("--disable-gpu-shader-disk-cache")
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--disable-extensions")
options.add_argument("--log-level=3")
# options.add_argument("--window-size=600,600")
# options.page_load_strategy = 'none'
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
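These flags all attach to a single ChromeOptions instance before the driver is constructed; a minimal sketch of the assembly:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
# ...plus whichever of the other arguments listed above you need...
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)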
I have been trying to disable DEBUG messages to the console, but no matter what I do, they still display. I need a way to disable the constant logging of HTTP requests and responses to the console. Code used:
System.setProperty(ChromeDriverService.CHROME_DRIVER_SILENT_OUTPUT_PROPERTY, "true");
java.util.logging.Logger.getLogger("org.openqa.selenium").setLevel(Level.SEVERE);
HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
chromePrefs.put("profile.default_content_settings.popups", 0);
chromePrefs.put("download.default_directory", downloadPath);
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", chromePrefs);
options.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
options.setCapability(ChromeOptions.CAPABILITY, options);
// options.setCapability(ChromeOptions.CAPABILITY, options);
LoggingPreferences logPrefs = new LoggingPreferences();
logPrefs.enable(LogType.PERFORMANCE, Level.ALL);
options.addArguments("--disable-logging");
options.addArguments("--log-level=3");
options.addArguments("--silent");
options.setCapability( "goog:loggingPrefs", logPrefs );
options.setCapability(CapabilityType.LOGGING_PREFS, logPrefs);
options.setAcceptInsecureCerts(true);
System.out.println("Launching Google Chrome Browser");
//ChromeDriverManager.getInstance(CHROME).setup();
WebDriverManager.chromedriver().setup();
// options.merge(cap);
driver = new ChromeDriver(options);
driver.manage().timeouts().implicitlyWait(TimeOut,TimeUnit.SECONDS);
driver.manage().window().maximize();
On the console:
Request DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
POST /session HTTP/1.1
User-Agent: selenium/4.0.0 (java windows)
Content-Length: 1259
Content-Type: application/json; charset=utf-8
host: localhost:61670
accept: */*
Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
cache-control: no-cache
content-length: 788
[AsyncHttpClient-1-2] DEBUG org.asynchttpclient.netty.channel.ChannelManager - Adding key: http://localhost:61670 for channel [id: 0xaa9e94af, L:/127.0.0.1:63133 - R:localhost/127.0.0.1:61670]
[AsyncHttpClient-1-3] DEBUG org.asynchttpclient.netty.channel.NettyConnectListener - Using new Channel '[id: 0x60845110, L:/127.0.0.1:63145 - R:localhost/127.0.0.1:63134]' for 'GET' to '/json/version'
[AsyncHttpClient-1-3] DEBUG org.asynchttpclient.netty.handler.HttpHandler -
Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE)
GET /json/version HTTP/1.1
User-Agent: selenium/4.0.0 (java windows)
host: localhost:63134
accept: */*
Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
content-length: 424
I need to find a way to remove these debug messages during execution; this occurs on Selenium 4.0.
Solution – Empty Configuration
To fix it, create an empty configuration file named logback-test.xml and save it under $project/src/test/resources:
$project/src/test/resources/logback-test.xml
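The file can contain nothing more than a bare root element; with no appenders configured, logback writes nothing to the console:

<configuration/>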
Visit the website below for more reference:
https://mkyong.com/logging/logback-disable-logging-in-unit-test/
Simply change the logging level:
java.util.logging.Logger.getLogger("org.openqa.selenium").setLevel(Level.SEVERE);
to:
java.util.logging.Logger.getLogger("org.openqa.selenium").setLevel(Level.INFO);
or some lower level than SEVERE.
See https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html
I cannot locate elements using headless mode because of this restriction: "All users will have to use Google Chrome when accessing our sites."
This restriction was added by our admins so users could use only Google Chrome.
My code is:
@Test(priority = 1)
public void setupApplication() throws IOException {
    /*
     * open browser (Google Chrome) and enter user credentials
     */
    ChromeOptions options = new ChromeOptions();
    options.addArguments("--window-size=1920,1080");
    options.addArguments("--disable-gpu");
    options.addArguments("--disable-extensions");
    options.setExperimentalOption("useAutomationExtension", false);
    options.addArguments("--proxy-server='direct://'");
    options.addArguments("--proxy-bypass-list=*");
    options.addArguments("--start-maximized");
    options.addArguments("--headless");
    driver = new ChromeDriver(options);
    driver.get("link");
    log.info("Launching chrome browser");
    File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
    FileUtils.copyFile(scrFile, new File("C:/Automation Testing/scr3.png"));
}
Unfortunately I cannot show our link.
My question is how to bypass this and find elements?
Thanks in advance!
Update
If you wish to bypass the headless agent footprint, attach the following argument:
--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.50 Safari/537.36
Note: whatever version you apply to the user-agent argument will be displayed in the request header information.
...or speak with the 'admins' of your project so they can add the headless Chrome agent to the white-list.
Here is normal agent information from Chrome:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100
Safari/537.36
And here is headless Chrome:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3497.100
Safari/537.36
As you can see, the headless Chrome agent is called HeadlessChrome.
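In the Java setup from the question above, the override is a single extra argument (the UA string here is illustrative; match it to your actual Chrome version):

// Make headless Chrome report a regular Chrome user agent
options.addArguments("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");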
When I use an iPad with Chrome, the user agent is
Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1 (KHTML, like Gecko) CriOS/57.0.2987.137 Mobile/13G36 ...
but on an iPad with Safari it is
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15
And the macOS user agent is
Safari: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
The problem: on iPadOS, the Safari user agent is the same as on a macOS notebook; see https://forums.developer.apple.com/thread/119186 and this issue in the Mobile_Detect PHP library: https://github.com/serbanghita/Mobile-Detect/issues/795
To detect iPads, try this (iPadOS Safari reports a Macintosh user agent, but unlike a real Mac it exposes navigator.maxTouchPoints greater than 1):
let isIpad = /Macintosh/i.test(navigator.userAgent) && navigator.maxTouchPoints && navigator.maxTouchPoints > 1;
I am new to Scrapy but have put massive effort into it over the last few days. However, I still fail at the basics.
I am trying to crawl the following website: https://blogabet.com/tipsters
My objective is to download all links to the user profiles, for instance https://sabobic.blogabet.com/
When I use the Scrapy shell, I can extract the specific XPath. But when I run it from a script with "scrapy crawl ....", I always get no results: INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
What is wrong in my code?
import scrapy
from scrapy import Request

class BlogmeSpider(scrapy.Spider):
    name = 'blogme'

    def start_request(self):
        url = "https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=0"
        headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,pl;q=0.8,de;q=0.7',
            'Connection': 'keep-alive',
            'Host': 'blogabet.com',
            'Referer': 'https://blogabet.com/tipsters',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }
        yield scrapy.http.Request(url, headers=headers)

    def parse(self, response):
        username = response.xpath('//*[@class="e-mail u-db u-mb1 text-ellipsis"]/a/@href').extract()
        yield {'username': username}
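For what it's worth, the likely culprit is the method name: Scrapy's entry point is start_requests (plural), so a method named start_request is never called and no requests get scheduled. A minimal corrected skeleton, keeping the original URL and XPath:

import scrapy

class BlogmeSpider(scrapy.Spider):
    name = 'blogme'

    def start_requests(self):  # Scrapy calls this exact name to seed the crawl
        url = 'https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=0'
        yield scrapy.Request(url, headers={'X-Requested-With': 'XMLHttpRequest'})

    def parse(self, response):
        # One profile link per listed tipster
        for href in response.xpath('//*[@class="e-mail u-db u-mb1 text-ellipsis"]/a/@href').extract():
            yield {'username': href}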