Sellenium/BrowserMob won't connect to HTTPS port - selenium

Im writing an automated flow using Selenium and Java and I need to
connect via an authenticated HTTPS proxy using a "< username> " and "< password >".
Since Selenium doesn't support proxy authentication I'm using the standard technique of
running BrowserMobProxyServer and "chaining" the external proxy to it. While the code below works great with regular HTTP
For some reason it doesn't work with HTTPS and Im getting ERR_PROXY_CONNECTION_FAILED in my browser.
note that
"curl -v -x https://<username>:<password>#<proxy hostname>:<proxy HTTPS port> https://ipinfo.io"
works perfectly fine one my Ubuntu 22.04 LTS,
So I suspect that it is code error.
implementation 'org.seleniumhq.selenium:selenium-java:4.5.0'
implementation 'net.lightbody.bmp:browsermob-core:2.1.5'
public static BrowserMobProxyServer createLocalProxy(String hostname, String port,
String username, String password) {
BrowserMobProxyServer proxy = new BrowserMobProxyServer();
// Handling http and https URLs
proxy.setTrustAllServers(true);
// // remote proxy as added to the chain of locally running proxy server
proxy.setChainedProxy(new InetSocketAddress(hostname, Integer.parseInt(port)));
proxy.chainedProxyAuthorization(username, password, AuthType.BASIC);
proxy.setMitmManager(ImpersonatingMitmManager.builder().trustAllServers(true).build());
// This is a local proxy in JVM. Port is assigned automatically.
// It must be stopped using the stop() method before exiting.
proxy.start(0);
return proxy;
}
// proxy setup
BrowserMobProxy proxy =
createLocalProxy("<proxy hostname>", "<proxy HTTPS port>", "<user name>",
"<password>");
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
seleniumProxy.setHttpProxy("localhost:" + proxy.getPort());
seleniumProxy.setSslProxy("localhost:" + proxy.getPort());
<some additional options here>
options.setProxy(seleniumProxy);
WebDriver driver = new ChromeDriver(options);

Related

Varnish 6.0lts won't handle secure websockets on a remote proxy?

I'm having a hard time with this setup. I have a node.js box serving HTTP on 3000, websockets on 3001, and secure websockets on 3002. Out in front of that I have a remote Hitch/Varnish caching proxy on its own server that's listening on 443/80 and connecting the first server as its default backend via 3000. A user who visits the site URL https://foo.tld hits the varnish proxy and sees the site, where some javascript on the site tells their browser to connect to wss://foo.tld:3002 for secure websockets.
My problem is getting websockets to pass transparently through to the backend. In the VCL I have the standard
if (req.http.upgrade ~ "(?i)websocket") {
return (pipe);
}
and
sub vcl_pipe {
#Declare pipe handler for websockets
if (req.http.upgrade) {
set bereq.http.upgrade = req.http.upgrade;
set bereq.http.connection = req.http.connection;
}
}
Which doesn't work in this case. To list what I have tried so far with no success:
1: Creating a second backend in VCL named "websockets" that is the same backend IP but on either port 3001 or 3002 and adding "set req.backend_hint = websockets;" before the pipe summon in the first snippet above.
2: Turning off HTTPS and trying to connect it over pure HTTP.
3: Modifying varnish.service to try and make varnish listen on ports other than, or in addition to, -a :80 and -a :8443,proxy, in which cases Varnish simply refuses to start. One attempt was to simply use HTTP only and attempt to run varnish on 3001 to get ws:// working without SSL but varnish refuses to start.
4: Most recently I attempted the following in VCL to try and pick up client connections coming in on 3001:
if (std.port(server.ip) == 3001) {
set req.backend_hint = websockets;
}
My goal is for the Varnish box to pick up secure websocket traffic (wss://) on 3002 (via hitch at 443 using the normal secure websocket connection protocol) and have that passed transparently to the backend websocket server, whether SSL encrypted across that leg of the connection or not. I have set up other, smaller servers like this before and getting websockets working is trivial if Varnish and the backend service are either on the same machine or behind a regulating CDN like Cloudflare, so it has been extra frustrating trying to figure out just what this remote proxy setup needs. I feel like part of the solution is having Varnish or Hitch (not sure) listening on 3002 to accept the connections at which point the normal req.http.upgrade and pipe functions would come into play, but the software refuses to cooperate.
--------Update--
I have broken down the problem into the simplest form I can. The main server (backend) is now serving plain HTTP on 8080 and WS:// on 6081. I have removed hitch and TLS from the equation entirely, but even in this simplified form it is impossible to connect to websockets through Varnish. I can verify that the Websocket server is working correctly on the backend. Connecting to the backend IP address with a browser shows websockets functioning perfectly there. It's Varnish that's the problem.
My current hitch.conf (not relevant here but provided per request):
frontend = "[*]:443"
frontend = "[*]:3001"
backend = "[127.0.0.1]:8443" # 6086 is the default Varnish PROXY port.
workers = 4 # number of CPU cores
daemon = on
# We strongly recommend you create a separate non-privileged hitch
# user and group
user = "redacted"
group = "redacted"
# Enable to let clients negotiate HTTP/2 with ALPN. (default off)
# alpn-protos = "h2, http/1.1"
# run Varnish as backend over PROXY; varnishd -a :80 -a localhost:6086,PROXY ..
write-proxy-v2 = on # Write PROXY header
syslog = on
log-level = 1
# Add pem files to this directory
# pem-dir = "/etc/pki/tls/private"
pem-file = "/redacted/hitch-bundle.pem"
Current default.vcl (stripped down to almost nothing just for testing this. The backend is NOT running on the same machine, it is remote):
# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;
# Default backend definition. Set this to point to your content server.
backend default {
.host = "remote.server.ip";
.port = "8080";
}
backend websockets {
.host = "remote.server.ip";
.port = "6081";
}
sub vcl_recv {
# Happens before we check if we have this in cache already.
#
# Typically you clean up the request here, removing cookies you don't need,
# rewriting the request, etc.
#Allow websockets to pass through the cache (summons pipe handler below)
if (req.http.Upgrade ~ "(?i)websocket") {
set req.backend_hint = websockets;
return (pipe);
} else {
set req.backend_hint = default;
}
}
sub vcl_pipe {
if (req.http.upgrade) {
set bereq.http.upgrade = req.http.upgrade;
set bereq.http.connection = req.http.connection;
}
return (pipe);
}
Varnish's systemd exec parameters:
ExecStart=/usr/sbin/varnishd \
-a http=:80 \
-a proxy=localhost:8443,PROXY \
-a ws=:6081 \
-p feature=+http2 \
-f /etc/varnish/default.vcl \
-s malloc,256m \
-p pipe_timeout=1800
Working in plain HTTP and insecure websockets like this, it should be very simple to get a working model. I don't understand what could possibly be going wrong.
Varnish Cache, the open source version of Varnish, doesn't support backend connections over TLS.
While you can offload TLS using Hitch, the connection to your websocket server will not be encrypted.
Basic VCL example
Here's a very basic VCL example where web & websocket requests are split and sent to separate backends:
vcl 4.1;
backend web {
.port = "3000";
}
backend ws {
.port = "3001";
}
sub vcl_recv {
if (req.http.Upgrade ~ "(?i)websocket") {
set req.backend_hint = ws;
return (pipe);
} else {
set req.backend_hint = web;
}
}
sub vcl_pipe {
if (req.http.upgrade) {
set bereq.http.upgrade = req.http.upgrade;
}
return (pipe);
}
Need more input
However, I'm probably missing a lot of context. I also didn't specify a .host parameter in the backends, so the assumption is that all services are hosted locally.
Please add your full VCL, your Hitch config and the varnishd runtime parameters to your question. This will add context and allows me to come up with a better solution.
What about Hitch?
If you terminate TLS in Hitch, both HTTPS & secure websockets will be handled by Hitch where the plain-text HTTP & websockets will still be directly handeled by Varnish.
See https://www.varnish-software.com/developers/tutorials/terminate-tls-varnish-hitch for a Hitch tutorial that also explains how Varnish should be configured.
I'm a big advocate of using the PROXY protocol in Varnish. The hitch tutorial has a specific section about this: https://www.varnish-software.com/developers/tutorials/terminate-tls-varnish-hitch/#enable-the-proxy-protocol-in-varnish
Custom ports
The standard ports to access the service are 80 for HTTP and insecure websockets and 443 for HTTPS and secure websockets.
If you want to use custom ports for the websockets, it is possible to configure them in Hitch and Varnish.
Let's say you want to main ports 3001 and 3002 for your websockets. This means you need 2 frontends in Hitch:
One for HTTPs on 443
One for secure WS on 3002
See https://www.varnish-software.com/developers/tutorials/terminate-tls-varnish-hitch/#listening-address for more information about the frontend config.
Varnish on the other hand needs to have 3 listening addresses:
One for HTTP on port 80 (-a http=:80)
One for offloaded HTTPS & secure WS with PROXY support on port 8443 (-a proxy=:8443,PROXY)
One for insecure WS on port 3001 (-a ws=:3001)
Next steps
Please use the information and see if this helps to find a solution. If not, please share your VCL file, your Hitch config and varnishd runtime.
Update
Now that you provided more input, the picture starts to become more clear. The fact that you eliminated the TLS part for now will make it a lot easier to debug.
Assuming the names of your listening interfaces for varnishd are http and ws (as mentioned in your systemd unit file), we can use the following varnishlog commands to debug:
varnishlog -g request -q "ReqStart[3] eq 'http'"
This command will show logs for all log transactions where the http listening interface is used.
If you want to make it more granular, you can also add the request URL as a filtering criterium. This will narrow down the number of transactions:
varnishlog -g request -q "ReqStart[3] eq 'http' and ReqUrl eq '/'"
Please add a complete log transaction for one of the failed requests. This will help us understand why requests are failing.
You can do the same for requests on the ws listening interface by using the commands below:
varnishlog -g request -q "ReqStart[3] eq 'ws'"
varnishlog -g request -q "ReqStart[3] eq 'ws' and ReqUrl eq '/'"
I'm assuming you're successful at starting the varnishd program but unsuccessful at getting decent output out of Varnish. The varnishlog program will provide the insight we need. Please add the logging output to your question so I can look into it.

how to apply Proxy to scrapy+selenium

I got proxy from https://www.zyte.com/ with APIkey and followed their instruction(https://support.zyte.com/support/solutions/articles/22000203564-using-zyte-smart-proxy-manager-with-selenium), but the response time is too long due to it goes through Docker to connect the proxy.
Isn't there any other solution to connect the such proxy to my code? (host url, APIkey and port are given)
I am using Python, Scrapy, Selenium on Window 10

No internet connection while using selenium proxy

I am a newbie in using selenium. When I use proxy chrome shows there is no internet connection. I have checked my internet connection and looked for possible solutions on the internet but failed. I also restarted my computer but it did not work.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep
PROXY = '43.225.164.59:38829'
options = Options()
options.add_argument('--proxy-server=%s' % PROXY)
driver = webdriver.Chrome(options = options,
executable_path='C:\webscraping\chromedriver')
driver.get('https://whatismyipaddress.com/')
sleep(6)
driver.close()
The code is looking good, it should works.
Do you use proxy with authentification? If do, you should use another way to enable proxy - issue (Selenium doesn't support authentification of proxy).
You should also be sure, that proxy is working. Check it with curl:
If you're using public proxy:
curl https://2ip.ru/ -x "http://proxy_host:proxy_port"
If you're using private proxy with authentification:
curl https://2ip.ru/ -x "http://proxy_user:proxy_pwd#proxy_host:proxy_port"
Both of these commands will return your ip address (it should be equal to proxy address you use.

Certificate rejected when using proxy

I have to connect to a REST webservice, and use a certificate to authenticate myself.
Everything works perfectly when I connect from OUTSIDE our corporate network and DONT use a proxy server.
The problem is that i have to consume the service from WITHIN our network, where i HAVE to use the proxy server.
When i try from within the network i just get a "401 - Unauthorized"
private static void RestWithCert(string certificatePath,string baseUrl,string method)
{
HttpClientHandler handler = new HttpClientHandler();
//when trying the code OUTside our company network I remove the proxy below
handler.Proxy = new WebProxy("http://proxy.company.dk");
handler.ClientCertificates.Add(new X509Certificate2(certificatePath, "password"));
HttpClient client = new HttpClient(handler);
client.BaseAddress = new Uri(baseUrl);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
HttpResponseMessage response = client.GetAsync(method).Result;
var status = response.StatusCode;
}
Outside the network, without proxy: Status = OK
Inside the network, with proxy: Status = Unauthorized
Inside the network, without proxy: Connection timed out
I have NO control over either Proxy or the webservice i am trying to access
What is happening? Is the proxy intercepting the certificate, and replacing it?
A proxy can capture SSL traffic and connect to the target server by acting as a ManInTheMiddle or allow the SSL connection to be made directly to the target server.
When client certificates are required, the proxy must allow the direct connection, because the proxy can not present a valid certificate when it is needed by target server. During SSL handshake some interchanged data is signed with the private key of the client certificate to authenticate the message and the proxy does not have that private ke
Probably your proxy is capturing the traffic. I suggest checking this. You can verify if the certificate received during SSL connection corresponds to the target server or proxy

node.js, socket.io and SSL

I have an Apache server running with SSL enabled. Now I made a small chat which is using node.js and socket.io to transmit data. Using port 8080 on a none secured connection is working just fine, but when I try it on a SSL secured domain it is not working. I do not get how the whole setup should work since SSL is only working through port 443. Apache is already listining on port 443. On which port should socket.io listen?
I had to set the SSL certificates like
var fs = require('fs');
var options = {
key: fs.readFileSync('/etc/ssl/ebscerts/wildcard.my_example.com.no_pass.key'),
cert: fs.readFileSync('/etc/ssl/ebscerts/wildcard.my_example.com.crt'),
ca: fs.readFileSync('/etc/ssl/ebscerts/bundle.crt')
};
var app = require('https').createServer(options),
io = require('socket.io').listen(app);
app.listen(8080);
I found the solution on github