PhantomJS and Seleno - Selenium

I'm using PhantomJS and Seleno to implement some UI tests, but whenever I try to find an element on my page it throws an "Unknown Command" error.
The code for initializing the servers looks like this:
var projectPath =
new DirectoryInfo(Environment.CurrentDirectory).Parent.Parent.Parent.GetDirectories("Foo")
.First()
.FullName;
var loc = ProjectLocation.FromPath(projectPath);
var service = PhantomJSDriverService.CreateDefaultService();
service.Port = 123;
var phantomJs = new PhantomJSDriver(service);
Func<PhantomJSDriver> newFunc = () => phantomJs;
var app = new WebApplication(loc, 123);
Instance.Run(app, c => c
.UsingLoggerFactory(new ConsoleFactory())
.WithRemoteWebDriver(newFunc)
);
It opens IIS Express on port 123, and PhantomJS (GhostDriver) is pointing to the same port.
It shows this error:
Unknown Command - Request => {"headers":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8","Accept-Encoding":"gzip, deflate","Accept-Language":"en-US,en;q=0.5","Cache-Control":"max-age=0","Connection":"keep-alive","Cookie":"ASP.NET_SessionId=a2umglrwcaquccg2rar0vzqa; .ASPXAUTH=7CBEDA8FC6170B15E116E77016D2136D4F58C8B73B0B2D54149B96847FE8A26E8D8FA24E41E5F0F0AFFE336D896B53C4628AB5B67B1960CB34727C85B6EF9720F7FF2A792BF1B5ECEECE5429DE212D8B7BA948978F302EF9B3A1040F05902AE92280FF8047D380583465D6CE6C6B103E5286F6FE37E75CFE22910E271BE2BEB4B552124B","Host":"localhost:12346","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"},"httpVersion":"1.1","method":"GET","url":"/","urlParsed":{"anchor":"","query":"","file":"","directory":"/","path":"/","relative":"/","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/","queryKey":{},"chunks":[""]}}
I can browse to it using Firefox as well, and it shows the same error; obviously Selenium can't find the elements and reports an error.

I found out that upgrading PhantomJS to the newest version solves the problem.
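If it helps, here is a minimal sketch of how the driver initialization can pick up the upgraded binary; the install folder path is an assumption, so adjust it to wherever the new phantomjs.exe lives:
// Sketch only: the path below is an assumption; point it at the folder containing
// the upgraded phantomjs.exe so the driver service launches the new binary.
var service = PhantomJSDriverService.CreateDefaultService(@"C:\tools\phantomjs\bin");
var phantomJs = new PhantomJSDriver(service);
Func<PhantomJSDriver> newFunc = () => phantomJs;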

Related

Trying to recreate Perl LWP code in VB using ActiveX component

Is anyone familiar with the Chilkat ActiveX HTTP component or Perl LWP? I'm trying to reproduce some code that we currently have working in Perl. What I am trying to do is log into an internet appliance and read the log file.
The login requires cookies. In a regular web browser I can just use http://test.com:3800/login.tgi?Username=BOB&Password=12345. Then once the login cookie is stored in the browser I can navigate to the log file page.
The working Perl code is:
my $Authenticate = "http://test.com:3800/login.tgi?Username=BOB&Password=12345";
my $Action = "http://test.com:3800/log";
use strict qw/refs/;
use HTTP::Cookies;
use LWP::UserAgent;
use HTTP::Request::Common qw/POST GET/;
my $Browser = LWP::UserAgent->new(
cookie_jar => HTTP::Cookies->new,
requests_redirectable => [],
timeout => 10,
pragma => "no-cache",
"max-age" => 0,
agent => "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
);
my $Page = $Browser->request(GET $Authenticate);
if ($Page->is_success) { my $Page = $Browser->request(GET $Action); }
else { print $ErrorPage; die; }
I put this together quickly in VB using the ActiveX component, but it doesn't even successfully log in.
Authenticate = "http://test.com:3800/login.tgi?Username=BOB&Password=12345"
Action = "http://test.com:3800/log”
Set HTTP = New ChilkatHttp
HTTP.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
HTTP.HeartbeatMs = 500
HTTP.ConnectTimeout = 45
HTTP.ReadTimeout = 100
HTTP.FetchFromCache = False
HTTP.FreshnessAlgorithm = 0
HTTP.DefaultFreshPeriod = 1
HTTP.MaxFreshPeriod = 1
HTTP.MaxFreshPeriod = 1
HTTP.SaveCookies = 1
HTTP.SendCookies = 1
HTTP.CookieDir = "memory"
Auth = HTTP.QuickGetStr (Authenticate)
If Auth <> "" Then Act = HTTP.QuickGetStr(Action)
Auth is returning
<HTML><HEAD>
<META HTTP-EQUIV="refresh" content="0; URL=/index.htm">
</HEAD><BODY></BODY>
</HTML>
If I substitute another URL for the login URL, or leave off the login credentials (so it is just http://test.com:3800), Auth gives me the correct HTML for that web page.
Can anyone see anything that is different between the 2 code snippets, or think of a reason that I may be having this issue?
I found this and it works perfectly

Converting HTML to PDF from https requiring authentication

I've been trying to convert HTML to PDF from my company's HTTPS-secured site, which requires authentication.
I tried directly converting it with pdfkit first.
pdfkit.from_url("https://companywebsite.com", 'output.pdf')
However, I'm receiving these errors:
Error: Authentication Required
Error: Failed to load https://companywebsite.com,
with network status code 204 and http status code 401 - Host requires authentication
So I added an options argument:
options = {'username': username,
           'password': password}
pdfkit.from_url("https://companywebsite.com", 'output.pdf', options=options)
It loads forever without producing any output.
My second method was to try creating a session with requests:
import pdfkit
import requests
from requests.auth import HTTPBasicAuth

def download(session, username, password):
    session.get('https://companywebsite.com', auth=HTTPBasicAuth(username, password), verify=False)
    ua = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
    session.headers = {'User-Agent': ua}
    payload = {'UserName': username,
               'Password': password,
               'AuthMethod': 'FormsAuthentication'}
    session.post('https://companywebsite.com', data=payload, headers=session.headers)
    my_html = session.get('https://companywebsite.com/thepageiwant')
    my_pdf = open('myfile.html', 'wb+')
    my_pdf.write(my_html.content)
    my_pdf.close()
    path_wkthmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkthmltopdf)
    pdfkit.from_file('myfile.html', 'out.pdf', configuration=config)

session = requests.Session()  # username and password are defined elsewhere
download(session, username, password)
Could someone help me? I am getting 200 from session.get, so it's definitely getting the session.
Maybe try using Selenium to access that site and snap a screenshot.
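For example, here is a minimal sketch of that idea; the URLs, login form locators, and credentials are placeholders (not taken from the question), and it assumes ChromeDriver is available:
# Sketch only: URLs, form field names and credentials below are placeholders.
import pdfkit
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://companywebsite.com')

# Hypothetical login form; adjust the locators to the real page.
driver.find_element(By.NAME, 'UserName').send_keys('myuser')
driver.find_element(By.NAME, 'Password').send_keys('mypassword')
driver.find_element(By.NAME, 'Password').submit()

driver.get('https://companywebsite.com/thepageiwant')
driver.save_screenshot('page.png')  # the screenshot suggested above

# Or save the rendered HTML and convert it locally with pdfkit
# (assumes wkhtmltopdf is on PATH; otherwise pass configuration= as above).
with open('myfile.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)
pdfkit.from_file('myfile.html', 'out.pdf')

driver.quit()
Since the browser already holds the authenticated session, converting the saved page_source locally should avoid wkhtmltopdf having to authenticate for the main document (linked assets may still require it).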

How to fix a 403 response when using HttpURLConnection in Selenium, when the links open without any issue manually

I was checking the active links in a website with Selenium WebDriver and Java. I have passed the links to an array, and while verifying them I am getting a 403 Forbidden response for every link on the site. It is just a public website anyone can access, and the links work properly when clicked manually. I want to know why it is not showing 200 and what can be done in this situation.
This is for Selenium WebDriver with Java:
for (int j = 0; j < activelinks.size(); j++) {
    System.out.println("Active Link address and status >>> " + activelinks.get(j).getAttribute("href"));
    HttpURLConnection connection = (HttpURLConnection) new URL(activelinks.get(j).getAttribute("href")).openConnection();
    connection.connect();
    String response = connection.getResponseMessage();
    int responsecode = connection.getResponseCode();
    connection.disconnect();
    System.out.println(activelinks.get(j).getAttribute("href") + ">>" + response + " " + responsecode);
}
I expect the response code to be 200, but the actual output is 403.
I believe you need to add the relevant cookies to the HttpURLConnection, or even better, consider switching to the OkHttp library, which is used under the hood of the Selenium Java Client.
So you basically need to fetch the cookies from the browser using the driver.manage().getCookies() function and generate a proper Cookie request header for the subsequent calls.
Example code:
StringBuilder cookieBuilder = new StringBuilder();
driver.manage().getCookies()
        .forEach(cookie -> cookieBuilder
                .append(cookie.getName())
                .append("=")
                .append(cookie.getValue())
                .append(";"));

OkHttpClient client = new OkHttpClient().newBuilder().build();
for (WebElement activelink : activelinks) {
    Request request = new Request.Builder()
            .url(activelink.getAttribute("href"))
            .addHeader("Cookie", cookieBuilder.toString())
            .build();
    Response urlResponse = client.newCall(request).execute();
    String response = urlResponse.message();
    int responsecode = urlResponse.code();
    System.out.println(activelink.getAttribute("href") + ">>" + response + " " + responsecode);
}
If you need nothing but the response code, you can consider using the HEAD method to avoid fetching the full URLs; this will save traffic and your test will be much faster.
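For instance, a minimal sketch of the HEAD variant, reusing the client and cookieBuilder from the snippet above:
// Same OkHttp client and Cookie header as above, but issuing a HEAD request
// so only the status line and headers are transferred, not the response body.
Request headRequest = new Request.Builder()
        .url(activelink.getAttribute("href"))
        .head()
        .addHeader("Cookie", cookieBuilder.toString())
        .build();
try (Response headResponse = client.newCall(headRequest).execute()) {
    System.out.println(activelink.getAttribute("href") + ">>" + headResponse.code());
}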
403 Forbidden
The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it.
This status is similar to 401, but in this case, re-authenticating will make no difference. The access is permanently forbidden and tied to the application logic, such as insufficient rights to a resource.
Reason
I don't see any such issue in your code block. However, there is a possibility that the WebDriver-controlled browser client is getting detected, and hence the subsequent requests are getting blocked. There can be numerous factors, such as:
User agent
Plugins
Languages
WebGL
Browser features
Missing image
You can find a couple of detailed discussions in:
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
Solution
A generic solution is to use a proxy or rotating proxies from the Free Proxy List.
You can find a detailed discussion in Change proxy in chromedriver for scraping purposes
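As a rough illustration, a proxy can be wired into ChromeDriver like this; the proxy host and port are placeholders, not a working endpoint:
// Sketch only: substitute a real host:port from your proxy list.
Proxy proxy = new Proxy();               // org.openqa.selenium.Proxy
proxy.setHttpProxy("203.0.113.10:3128");
proxy.setSslProxy("203.0.113.10:3128");

ChromeOptions options = new ChromeOptions();
options.setCapability(CapabilityType.PROXY, proxy);

WebDriver driver = new ChromeDriver(options);
driver.get("https://example.com");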
Outro
You can find a couple of relevant discussions in:
Can a website detect when you are using selenium with chromedriver?
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Failed to load resource: the server responded with a status of 429 (Too Many Requests) and 404 (Not Found) with ChromeDriver Chrome through Selenium
I had the same problem; the user agent was the issue in my case (read more here: https://www.javacodegeeks.com/2018/05/how-to-handle-http-403-forbidden-error-in-java.html).
Also check which request methods are allowed on your website; you can do that by looking at any endpoint in the "Network" tab in Chrome DevTools. It should list the allowed request methods. In my case I couldn't use "HEAD", but "GET" did the trick.
Code:
List<WebElement> links = driver.findElements(By.tagName("a"));
boolean brokenLink = false;
for (WebElement link : links) {
    String url = link.getAttribute("href");
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36");
    conn.connect();
    int httpCode = conn.getResponseCode();
    if (httpCode >= 400) {
        System.out.println("BROKEN LINK: " + url + " " + httpCode);
        brokenLink = true;
        Assert.assertFalse(brokenLink);
    } else {
        System.out.println("Working link: " + url + " " + httpCode);
    }
}

Using Selenium with PhantomJS in Node not returning results

I have the following Node route using Selenium and ChromeDriver, which is working correctly and returning the expected HTML in the console:
app.get('/google', function (req, res) {
  var driver = new webdriver
    .Builder()
    .forBrowser('chrome')
    .build();
  driver.get('https://www.google.com');
  driver
    .manage()
    .window()
    .setSize(1200, 1024);
  driver.wait(webdriver.until.elementLocated({xpath: '//*[@id="lst-ib"]'}));
  return driver
    .findElement({xpath: '//*[@id="lst-ib"]'})
    .sendKeys('stackoverflow' + webdriver.Key.RETURN)
    .then((html) => {
      return driver
        .findElement({xpath: '//*[@id="rso"]/div[1]/div/div/div/div'})
        .getAttribute("innerHTML");
    })
    .then((result) => {
      console.log(result);
    })
    .then(() => {
      res
        .status(200)
        .send('ok');
    });
});
I have also installed the PhantomJS driver and tested that it's working by returning the URL title. When I use the exact route above and replace chrome with phantomjs, I get no results returned. There are no errors, just no printout in my console. The status and result are never sent to the browser, so it doesn't appear to be stepping through the promise chain.
Any suggestions?
The issue was that different HTML was being rendered depending on the user agent. By forcing a user agent, I was able to retrieve the results I needed.
Here is the code snippet that replaces the builder above to get this working:
.Builder()
// .forBrowser('phantomjs')
.withCapabilities(webdriver.Capabilities.phantomjs()
.set("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"))
.build();

WebClient always returns empty source code

I would like to get the source code of this page, for example:
My page URL
I used WebClient (DownloadString and DownloadFile) and HttpWebRequest, but I always get an empty string back (no source code).
With Firefox, Edge or other browsers, I get the source code without a problem.
How can I get the source code of the given example?
This is one of the many pieces of code that I tried:
Using client = New WebClient()
client.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; rv:40.0) Gecko/20100101 Firefox/40.0")
Dim MyURL As String = "https://www.virustotal.com/fr/file/c65ce5ab02b69358d07b56434527d3292ea2cb12357047e6a396a5b27d9ef680/analysis/"
Dim Source_Code As String = client.DownloadString(MyURL)
MsgBox(Source_Code)
textbox1.text = Source_Code
End Using
NB 1: I'd rather not use the WebBrowser control or anything similar.
NB 2: WebClient works fine with all other sites.
It seems the target server is picky and requires the Accept-Language header to return any content. The following code returns the page's content:
var url = "https://www.virustotal.com/fr/file/c65ce5ab02b69358d07b56434527d3292ea2cb12357047e6a396a5b27d9ef680/analysis/";
var client = new System.Net.WebClient();
client.Headers.Add("Accept-Language", "en");
var content = client.DownloadString(url);
If the Accept-Language header is missing, no data is returned.
To find this, you can use a tool like Fiddler to capture the HTTP requests and responses from your browser and your application. By removing the headers sent by the browser one by one, you can find which header the server actually requires.