Anyone familiar with the Chilkat ActiveX HTTP component or Perl LWP? I'm trying to reproduce some code that we currently have working in Perl. What I am trying to do is log into an internet appliance and read its log file.
The login requires cookies. In a regular web browser I can just use http://test.com:3800/login.tgi?Username=BOB&Password=12345. Then, once the login cookie is stored in the browser, I can navigate to the log file page.
The working Perl code is:
use strict;
use HTTP::Cookies;
use LWP::UserAgent;
use HTTP::Request::Common qw/POST GET/;

my $Authenticate = "http://test.com:3800/login.tgi?Username=BOB&Password=12345";
my $Action = "http://test.com:3800/log";

my $Browser = LWP::UserAgent->new(
    cookie_jar            => HTTP::Cookies->new,
    requests_redirectable => [],
    timeout               => 10,
    agent                 => "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0",
);
# the cache-control headers are not constructor options, so set them per-request
$Browser->default_header( 'Pragma'        => 'no-cache' );
$Browser->default_header( 'Cache-Control' => 'max-age=0' );

my $Page = $Browser->request(GET $Authenticate);
if ($Page->is_success) {
    $Page = $Browser->request(GET $Action);
}
else {
    print $Page->status_line;   # report why the login request failed
    die;
}
I put this together quickly in VB using the ActiveX component, but it doesn't even successfully log in.
Authenticate = "http://test.com:3800/login.tgi?Username=BOB&Password=12345"
Action = "http://test.com:3800/log"
Set HTTP = New ChilkatHttp
HTTP.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
HTTP.HeartbeatMs = 500
HTTP.ConnectTimeout = 45
HTTP.ReadTimeout = 100
HTTP.FetchFromCache = False
HTTP.FreshnessAlgorithm = 0
HTTP.DefaultFreshPeriod = 1
HTTP.MaxFreshPeriod = 1
HTTP.SaveCookies = 1
HTTP.SendCookies = 1
HTTP.CookieDir = "memory"
Auth = HTTP.QuickGetStr (Authenticate)
If Auth <> "" Then Act = HTTP.QuickGetStr(Action)
Auth is returning
<HTML><HEAD>
<META HTTP-EQUIV="refresh" content="0; URL=/index.htm">
</HEAD><BODY></BODY>
</HTML>
If I substitute another URL for the login URL, or leave off the login credentials (so it is just http://test.com:3800), Auth gives me the correct HTML for that web page.
Can anyone see anything that is different between the two code snippets, or think of a reason why I may be having this issue?
I found this and it works perfectly.
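For comparison, the same two-step login-then-fetch flow written with Python's requests library would look roughly like this (an untested sketch using the URLs from the question, not the fix referred to above):

# Minimal sketch: a Session keeps the login cookie automatically,
# mirroring the Perl cookie_jar; redirects are disabled to match
# requests_redirectable => [].
import requests

session = requests.Session()
session.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 5.1; rv:11.0) '
                                 'Gecko/20100101 Firefox/11.0')

login = session.get('http://test.com:3800/login.tgi',
                    params={'Username': 'BOB', 'Password': '12345'},
                    allow_redirects=False)
if login.ok:
    log_page = session.get('http://test.com:3800/log')
    print(log_page.text)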
Related
I've been trying to convert HTML to PDF from my company's HTTPS-secured, authentication-required website.
I tried directly converting it with pdfkit first.
pdfkit.from_url("https://companywebsite.com", 'output.pdf')
However, I'm receiving these errors:
Error: Authentication Required
Error: Failed to load https://companywebsite.com,
with network status code 204 and http status code 401 - Host requires authentication
So I added an options argument:
options = {'username': username,
           'password': password}
pdfkit.from_url("https://companywebsite.com", 'output.pdf', options=options)
It loads forever without producing any output.
My second method was to try creating a session with requests:
import requests
from requests.auth import HTTPBasicAuth
import pdfkit

def download(session, username, password):
    # try HTTP Basic auth first, then the forms-authentication POST
    session.get('https://companywebsite.com', auth=HTTPBasicAuth(username, password), verify=False)
    ua = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
    session.headers = {'User-Agent': ua}
    payload = {'UserName': username,
               'Password': password,
               'AuthMethod': 'FormsAuthentication'}
    session.post('https://companywebsite.com', data=payload, headers=session.headers)
    # fetch the authenticated page and save it for wkhtmltopdf
    my_html = session.get('https://companywebsite.com/thepageiwant')
    with open('myfile.html', 'wb') as my_pdf:
        my_pdf.write(my_html.content)
    # raw string so \b in \bin is not read as an escape character
    path_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
    config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)
    pdfkit.from_file('myfile.html', 'out.pdf', configuration=config)  # pass the config

session = requests.Session()
download(session, username, password)  # username / password defined elsewhere
Could someone help me? I am getting 200 from session.get, so it's definitely getting the session.
Maybe try using Selenium to access that site and snap a screenshot.
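A rough, untested sketch of that route follows; the element names ('UserName', 'Password') are guesses based on the form fields in the question, and the URLs are the placeholders from above:

# Log in through the real form with a browser, then capture the page.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://companywebsite.com')
# username and password assumed to be defined as in the question
driver.find_element(By.NAME, 'UserName').send_keys(username)
driver.find_element(By.NAME, 'Password').send_keys(password + '\n')  # Enter submits the form
driver.get('https://companywebsite.com/thepageiwant')
driver.save_screenshot('thepageiwant.png')
driver.quit()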
I am working on a script to scrape some information off Amazon's Prime Now grocery website. However, I am stumbling on the first step, in which I am attempting to start a session and log in to the page.
I am fairly positive that the issue is in building the 'data' object. There are 10 inputs in the HTML, but the data object I have constructed only has 9; the missing one is the submit button. I am not entirely sure whether that is relevant, as this is my first time working with BeautifulSoup.
Any help would be greatly appreciated! All of my code is below, with the last if/else statement confirming that it has not worked when I run the code.
import requests
from bs4 import BeautifulSoup
# define URL where login form is located
site = 'https://primenow.amazon.com/ap/signin?clientContext=133-1292951-7489930&openid.return_to=https%3A%2F%2Fprimenow.amazon.com%2Fap-post-redirect%3FsiteState%3DclientContext%253D131-7694496-4754740%252CsourceUrl%253Dhttps%25253A%25252F%25252Fprimenow.amazon.com%25252Fhome%252Csignature%253DIFISh0byLJrJApqlChzLdkc2FCEj3D&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=amzn_houdini_desktop_us&openid.mode=checkid_setup&marketPlaceId=A1IXFGJ6ITL7J4&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&pageId=amzn_pn_us&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.pape.max_auth_age=3600'
# initiate session
session = requests.Session()
# define session headers
session.headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.61 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Referer': site
}
# get login page
resp = session.get(site)
html = resp.text
# get BeautifulSoup object of the html of the login page
soup = BeautifulSoup(html , 'lxml')
# scrape login page to get all the needed inputs required for login
data = {}
form = soup.find('form')
for field in form.find_all('input'):
    try:
        data[field['name']] = field['value']
    except KeyError:
        # some inputs (such as the submit button) have no value attribute
        pass
# add username and password to the data for post request
data['email'] = 'my email'
data['password'] = 'my password'
# submit post request with username / password and other needed info
post_resp = session.post(site, data = data)
post_soup = BeautifulSoup(post_resp.content , 'lxml')
if post_soup.find_all('title')[0].text == 'Your Account':
    print('Login Successful')
else:
    print('Login Failed')
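If the missing submit-button input turns out to matter, one hedged tweak is to keep every named input and default a missing value attribute to an empty string (a variant of the loop above, not a verified fix):

# Keep all named inputs, using '' when the value attribute is absent
# (e.g. the submit button).
for field in form.find_all('input'):
    if field.has_attr('name'):
        data[field['name']] = field.get('value', '')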
I've been working on this problem for the last few days and finally made some progress. Today I managed to force the cookie through with the request, and the server finally authenticated it; however, I am unable to update the cookies and carry the authenticated cookies over to the next few pages.
'post form data to page
strUrl = "https://e926.net/user/authenticate"
webRequest2 = HttpWebRequest.Create(strUrl)
webRequest2.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 Firefox/4.0 (.NET CLR 3.5.30729)"
webRequest2.AllowAutoRedirect = True
webRequest2.Method = WebRequestMethods.Http.Post
webRequest2.ContentType = "application/x-www-form-urlencoded"
webRequest2.CookieContainer = cookies
webRequest2.ContentLength = postData.Length
requestWriter = New StreamWriter(webRequest2.GetRequestStream)
requestWriter.Write(postData)
requestWriter.Close()
Dim response2 As HttpWebResponse = CType(webRequest2.GetResponse(), HttpWebResponse)
Dim strCookies2 As String = response2.Headers("Set-Cookie")
MsgBox(strCookies2)
strCookies2 = System.Text.RegularExpressions.Regex.Split(strCookies2, "((e926=.*))")(1)
strCookies2 = strCookies2.Split(";")(0)
strCookies2 = strCookies2.Replace("e926=", "")
Dim cookie As New Cookie() ' declared here in case it is not created earlier
cookie.Name = "e926"
cookie.Value = strCookies2
cookie.Domain = ".e926.net"
cookie.HttpOnly = True
cookie.Path = "/"
cookies.Add(cookie)
'receive authenticated cookie
webRequest2.GetResponse().Close()
This is the page code that actually submits the login details and handles the login request. I can see in Fiddler that the 'user' cookie is sent and the 'e926/auth' cookie is updated, but I have been unable to get the updated cookies from the headers or by any other method I have tried.
The page is PHP and doesn't allow GET requests, which of course wouldn't help anyway, since the cookies never seem to transfer properly and have to be updated from the request.
So my question is: how do I get the updated cookies from the page in VB.NET?
All I had to do was change AllowAutoRedirect from True to False; that forced the cookies to be gathered from the 'auth' page as opposed to the 'home' page.
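The same idea can be illustrated in Python with requests, for anyone hitting this from that side (a hedged sketch; the form field names are placeholders, not e926's real ones):

# With redirects disabled, the Set-Cookie headers of the authentication
# response itself are available before the redirect to the home page
# replaces them.
import requests

session = requests.Session()
resp = session.post('https://e926.net/user/authenticate',
                    data={'name': 'me', 'password': 'secret'},  # placeholder field names
                    allow_redirects=False)
print(resp.headers.get('Set-Cookie'))    # raw header from the auth page
print(session.cookies.get_dict())        # parsed cookies, reused on later requests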
I am trying to grab betting lines with Python from Pinnacle Sports using their API (http://www.pinnaclesports.com/api-xml/manual),
which requires Basic authentication (http://www.pinnaclesports.com/api-xml/manual#authentication):
Authentication
The API uses HTTP Basic access authentication. Always use HTTPS to access
the API. You need to send an HTTP request header like this:
Authorization: Basic <Base64-encoded username:password>
For example:
Authorization: Basic U03MyOT23YbzMDc6d3c3O1DQ1
import urllib.request, urllib.parse, urllib.error
import socket
import base64
url = 'https://api.pinnaclesports.com/v1//feed?sportid=12&leagueid=6164'
username = "abc"
password = "xyz"
base64 = "Basic: " + base64.b64encode('{}:{}'.format(username,password).encode('utf-8')).decode('ascii')
print (base64)
details = urllib.parse.urlencode({ 'Authorization' : base64 })
details = details.encode('UTF-8')
url = urllib.request.Request(url, details)
url.add_header("User-Agent","Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13")
responseData = urllib.request.urlopen(url).read().decode('utf8', 'ignore')
print (responseData)
Unfortunately, I get an HTTP 500 error, which from my point of view means either my authentication isn't working properly or their API is not working.
Thanks in advance
As it happens, I don't use the same Python version as you, so this has not been tested with your code, but there is an extraneous colon after "Basic" in your base64 string. In my own code, adding a colon after "Basic" indeed yields an HTTP 500 error.
Edit: Code example using Python 2.7 and urllib2:
import urllib2
import base64

def get_leagues():
    url = 'https://api.pinnaclesports.com/v1/leagues?sportid=33'
    username = "myusername"
    password = "mypassword"
    b64str = "Basic " + base64.b64encode('{}:{}'.format(username, password).encode('utf-8')).decode('ascii')
    headers = {'Content-length' : '0',
               'Content-type' : 'application/xml',
               'Authorization' : b64str}
    req = urllib2.Request(url, headers=headers)
    responseData = urllib2.urlopen(req).read()
    ofn = 'api_leagues.txt'
    with open(ofn, 'w') as ofile:
        ofile.write(responseData)
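Since the question uses Python 3, a rough translation of the above to urllib.request would be (an untested sketch with the same placeholder credentials; note the response is bytes, hence the binary file mode):

import urllib.request
import base64

def get_leagues():
    url = 'https://api.pinnaclesports.com/v1/leagues?sportid=33'
    username = "myusername"
    password = "mypassword"
    # "Basic " with a space, and no colon, before the Base64 credentials
    b64str = "Basic " + base64.b64encode(
        '{}:{}'.format(username, password).encode('utf-8')).decode('ascii')
    headers = {'Content-length': '0',
               'Content-type': 'application/xml',
               'Authorization': b64str}
    req = urllib.request.Request(url, headers=headers)
    responseData = urllib.request.urlopen(req).read()
    with open('api_leagues.txt', 'wb') as ofile:  # bytes, so binary mode
        ofile.write(responseData)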
I'm using PhantomJS and Seleno to implement some UI tests, but whenever I try to find an element on my page it throws an "Unknown Command" error.
The code for initializing the servers is like this:
var projectPath =
    new DirectoryInfo(Environment.CurrentDirectory).Parent.Parent.Parent.GetDirectories("Foo")
        .First()
        .FullName;
var loc = ProjectLocation.FromPath(projectPath);
var service = PhantomJSDriverService.CreateDefaultService();
service.Port = 123;
var phantomJs = new PhantomJSDriver(service);
Func<PhantomJSDriver> newFunc = () => phantomJs;
var app = new WebApplication(loc, 123);
Instance.Run(app, c => c
.UsingLoggerFactory(new ConsoleFactory())
.WithRemoteWebDriver(newFunc)
);
It opens IIS Express on port 123, and PhantomJS points at the same port. It shows this error:
Unknown Command - Request => {"headers":{"Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8","Accept-Encoding":"gzip, deflate","Accept-Language":"en-US,en;q=0.5","Cache-Control":"max-age=0","Connection":"keep-alive","Cookie":"ASP.NET_SessionId=a2umglrwcaquccg2rar0vzqa; .ASPXAUTH=7CBEDA8FC6170B15E116E77016D2136D4F58C8B73B0B2D54149B96847FE8A26E8D8FA24E41E5F0F0AFFE336D896B53C4628AB5B67B1960CB34727C85B6EF9720F7FF2A792BF1B5ECEECE5429DE212D8B7BA948978F302EF9B3A1040F05902AE92280FF8047D380583465D6CE6C6B103E5286F6FE37E75CFE22910E271BE2BEB4B552124B","Host":"localhost:12346","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"},"httpVersion":"1.1","method":"GET","url":"/","urlParsed":{"anchor":"","query":"","file":"","directory":"/","path":"/","relative":"/","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/","queryKey":{},"chunks":[""]}}
I can browse to it using Firefox as well, and it shows the same error; obviously Selenium can't find the elements, so it reports an error.
I found out that upgrading PhantomJS to the newest version solves the problem.