how to apply Proxy to scrapy+selenium - selenium

I got proxy from https://www.zyte.com/ with APIkey and followed their instruction(https://support.zyte.com/support/solutions/articles/22000203564-using-zyte-smart-proxy-manager-with-selenium), but the response time is too long due to it goes through Docker to connect the proxy.
Isn't there any other solution to connect the such proxy to my code? (host url, APIkey and port are given)
I am using Python, Scrapy, Selenium on Window 10

Related

Sellenium/BrowserMob won't connect to HTTPS port

Im writing an automated flow using Selenium and Java and I need to
connect via an authenticated HTTPS proxy using a "< username> " and "< password >".
Since Selenium doesn't support proxy authentication I'm using the standard technique of
running BrowserMobProxyServer and "chaining" the external proxy to it. While the code below works great with regular HTTP
For some reason it doesn't work with HTTPS and Im getting ERR_PROXY_CONNECTION_FAILED in my browser.
note that
"curl -v -x https://<username>:<password>#<proxy hostname>:<proxy HTTPS port> https://ipinfo.io"
works perfectly fine one my Ubuntu 22.04 LTS,
So I suspect that it is code error.
implementation 'org.seleniumhq.selenium:selenium-java:4.5.0'
implementation 'net.lightbody.bmp:browsermob-core:2.1.5'
public static BrowserMobProxyServer createLocalProxy(String hostname, String port,
String username, String password) {
BrowserMobProxyServer proxy = new BrowserMobProxyServer();
// Handling http and https URLs
proxy.setTrustAllServers(true);
// // remote proxy as added to the chain of locally running proxy server
proxy.setChainedProxy(new InetSocketAddress(hostname, Integer.parseInt(port)));
proxy.chainedProxyAuthorization(username, password, AuthType.BASIC);
proxy.setMitmManager(ImpersonatingMitmManager.builder().trustAllServers(true).build());
// This is a local proxy in JVM. Port is assigned automatically.
// It must be stopped using the stop() method before exiting.
proxy.start(0);
return proxy;
}
// proxy setup
BrowserMobProxy proxy =
createLocalProxy("<proxy hostname>", "<proxy HTTPS port>", "<user name>",
"<password>");
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
seleniumProxy.setHttpProxy("localhost:" + proxy.getPort());
seleniumProxy.setSslProxy("localhost:" + proxy.getPort());
<some additional options here>
options.setProxy(seleniumProxy);
WebDriver driver = new ChromeDriver(options);

No internet connection while using selenium proxy

I am a newbie in using selenium. When I use proxy chrome shows there is no internet connection. I have checked my internet connection and looked for possible solutions on the internet but failed. I also restarted my computer but it did not work.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep
PROXY = '43.225.164.59:38829'
options = Options()
options.add_argument('--proxy-server=%s' % PROXY)
driver = webdriver.Chrome(options = options,
executable_path='C:\webscraping\chromedriver')
driver.get('https://whatismyipaddress.com/')
sleep(6)
driver.close()
The code is looking good, it should works.
Do you use proxy with authentification? If do, you should use another way to enable proxy - issue (Selenium doesn't support authentification of proxy).
You should also be sure, that proxy is working. Check it with curl:
If you're using public proxy:
curl https://2ip.ru/ -x "http://proxy_host:proxy_port"
If you're using private proxy with authentification:
curl https://2ip.ru/ -x "http://proxy_user:proxy_pwd#proxy_host:proxy_port"
Both of these commands will return your ip address (it should be equal to proxy address you use.

Accessing localhost outside of server

I am new to node.js and am trying to get into the hang of actually using it. I am very familiar with JavaScript so the language itself is self-explanatory but the use of Node.js is quite different from the browser implementation.
I have my own remote virtual server and have installed Node and the Package Manager and everything works as expected. I am not exactly a server extraordinaire and have limited experience with the Terminal and Apache Configurations.
I can run my server using:
nodejs index.js
Which gives me: listening on *:3300 as expected.
I can then access my localhost from the terminal using: curl http://localhost:3300/ which gives me the response I expect.
Given that the website that links to my server is https://example.com, how do I allow this link to access: http://localhost:3300/ so that I can actually use my node server in production? For example, http://localhost:3300/ runs a Socket Server that I would like to use using Socket.io on https://example.com/chat.html with the JavaScript:
var socket = io.connect('http://localhost:3300/', {transports: ['websocket'], upgrade: false});
Ok, this question has nothing to do with nodeJS.
localhost is a hostname that means this computer. it's equivalent to 127.0.0.1 or whatever IP address you can refer to your computer.
After the double colon (:) you enter the port number.
So if you want to make an HTTP call to a web-server running on your server, you have to know what is the IP address of your server, or the domain name, and then you call it with the port number where the server is running.
For Instance, you would call https://example.com:3300/chat.html to make an HTTP call to a server running on example.com with port 3300.
Keep in mind, that you have to make sure with your firewall configuration, that the specific port is open for incoming HTTP requests.

Running Fiddler as a Reverse Proxy for HTTPS server

I have the following situation: 2 hosts, one is a client and the other an HTTPS server.
Client (:<brwsr-port>) <=============> Web server (:443)
I installed Fiddler on the server so that I now have Fiddler running on my server on port 8888.
The situation i would like to reach is the following:
|Client (:<brwsr-port>)| <===> |Fiddler (:8888) <===> Web server (:443)|
|-Me-------------------| |-Server--------------------------------|
From my computer I want to contact Fiddler which will redirect traffic to the web server. The web server however uses HTTPS.
On The server I set up Fiddler to handle HTTPS sessions and decrypt them. I was asked to install on the server Fiddler's fake CA's certificate and I did it! I also inserted the script suggested by the Fiddler wiki page to redirect HTTPS traffic
// HTTPS redirect -----------------------
FiddlerObject.log("Connect received...");
if (oSession.HTTPMethodIs("CONNECT") && (oSession.PathAndQuery == "<server-addr>:8888")) {
oSession.PathAndQuery = "<server-addr>:443";
}
// --------------------------------------
However when I try https://myserver:8888/index.html I fail!
Failure details
When using Fiddler on the client, I can see that the CONNECT request starts but the session fails because response is HTTP error 502. Looks like no one is listening on port 8888. In fact, If I stop Fiddler on the server I get the same situation: 502 bad gateway.
Please note that when I try https://myserver/index.html and https://myserver:443/index.html everything works!
Question
What am I doing wrong?
Is it possible that...?
I thought that since maybe TLS/SSL works on port 443, I should have Fiddler listen there and move my web server to another port, like 444 (I should probably set on IIS an https binding on port 444 then). Is it correct?
If Fiddler isn't configured as the client's proxy and is instead running as a reverse proxy on the Server, then things get a bit more complicated.
Running Fiddler as a Reverse Proxy for HTTPS
Move your existing HTTPS server to a new port (e.g. 444)
Inside Tools > Fiddler Options > Connections, tick Allow Remote Clients to Connect. Restart Fiddler.
Inside Fiddler's QuickExec box, type !listen 443 ServerName where ServerName is whatever the server's hostname is; for instance, for https://Fuzzle/ you would use fuzzle for the server name.
Inside your OnBeforeRequest method, add:
if ((oSession.HostnameIs("fuzzle")) &&
(oSession.oRequest.pipeClient.LocalPort == 443) )
{
oSession.host = "fuzzle:444";
}
Why do you need to do it this way?
The !listen command instructs Fiddler to create a new endpoint that will perform a HTTPS handshake with the client upon connection; the default proxy endpoint doesn't do that because when a proxy receives a connection for HTTPS traffic it gets a HTTP CONNECT request instead of a handshake.
I just ran into a similar situation where I have VS2013 (IISExpress) running a web application on HTTPS (port 44300) and I wanted to browse the application from a mobile device.
I configured Fiddler to "act as a reverse proxy" and "allow remote clients to connect" but it would only work on port 80 (HTTP).
Following on from EricLaw's suggestion, I changed the listening port from 8888 to 8889 and ran the command "!listen 8889 [host_machine_name]" and bingo I was able to browse my application on HTTPS on port 8889.
Note: I had previously entered the forwarding port number into the registry (as described here) so Fiddler already knew what port to forward the requests on to.

HTTPS Web(only)Proxy

I just read over node-tls-proxy (http://code.google.com/p/node-tls-proxy/), a https proxy. I like the idea of it, but I'm not getting why this proxy needs a local http server (see the local-proxy.js script).
So I was wondering if this is necessary?
My idea of the proxy was actually like this: Client -> HTTPS Connection to trusted Server/Proxy -> Internets
In this case network sniffing between the Client and the Server wouldn't (hardly) be possible because it would be ssl encrypted.
Thanks,
Seb
If I get the idea correctly, the goal is to set up a "remote" proxy in a location that one trusts to be secure. Your client shall only communicate with this remote proxy using TLS, the remote proxy is then allowed to do the actual (no longer encrypted) HTTP requests.
What you do on the client side now is this: you configure the "local" proxy in your browser. Since you type "http://..." in your browser even when using the proxy, your browser will initiate an unencrypted HTTP connection to the local proxy first. Then the local proxy will open an encrypted TLS connection to the remote proxy and forward your request over a secured channel.
This means you need the local proxy for the purpose of "transforming" HTTP into HTTPS requests because your browser will dutifully only use HTTP when asked to make an actual HTTP request.