I'm trying to scrape a page, but after I log in the site displays a pop-up before the page that I need (Welcome to blah-blah-blah... don't hit refresh as it will slow the process... etc.).
Naturally, HttpWebRequest scrapes this pop-up and not the page that follows.
The pop-up cancels itself, so if I could get the HttpWebRequest to wait a second or two and then scrape, it would work. Alternatively, if I could do two scrapes in the same session (and simply discard the first one), that would work too.
Here's the code:
Dim CookieJar As New CookieContainer
Dim Request As HttpWebRequest = WebRequest.CreateHttp(TextBox1.Text)
Request.CookieContainer = CookieJar 'keep the jar so a follow-up request can reuse the session
CookieJar.Add(New Uri(TextBox1.Text), New Cookie("id", "1234"))
Request.PreAuthenticate = True
Request.Credentials = CredentialCache.DefaultCredentials
Request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64)" 'value only; the header name is added automatically
Request.AllowAutoRedirect = True
Request.MaximumAutomaticRedirections = 4
Request.MaximumResponseHeadersLength = 4
Dim WebResult As String
'Using blocks close the response and the reader once the page has been read
Using Response As HttpWebResponse = DirectCast(Request.GetResponse(), HttpWebResponse)
Using Reader As New StreamReader(Response.GetResponseStream())
WebResult = Reader.ReadToEnd()
End Using
End Using
TextBox2.Text = WebResult
Thanks in advance for any suggestions.
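Nothing in the question shows how the welcome page hands over to the real one, so the following is only a sketch under one assumption: that the pop-up page forwards the browser with a `<meta http-equiv="refresh">` tag whose `http-equiv` attribute comes before `content` (a JavaScript redirect would need a different extraction step). `ExtractMetaRefreshUrl`, `FetchPage` and `FetchPastInterstitial` are hypothetical helper names. Both requests share one CookieContainer, so the second one runs in the same session and the first result is simply discarded.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module InterstitialSketch
    ' Returns the target of a <meta http-equiv="refresh" content="N; url=..."> tag,
    ' or Nothing when the page contains no such tag.
    Function ExtractMetaRefreshUrl(html As String) As String
        Dim m As Match = Regex.Match(html,
            "<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""'][^""']*?url\s*=\s*[""']?([^""'>\s]+)",
            RegexOptions.IgnoreCase)
        ' Attribute values are HTML-encoded, so &amp; has to become & again.
        Return If(m.Success, WebUtility.HtmlDecode(m.Groups(1).Value), Nothing)
    End Function

    ' Performs a GET that shares cookies with earlier requests through the jar.
    Function FetchPage(url As String, jar As CookieContainer) As String
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        req.CookieContainer = jar
        Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(resp.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' The first request returns the welcome page. The second request, in the same
    ' session, asks for the refresh target, or for the original address again
    ' when no refresh tag was found.
    Function FetchPastInterstitial(url As String) As String
        Dim jar As New CookieContainer()
        Dim first As String = FetchPage(url, jar)
        Dim target As String = ExtractMetaRefreshUrl(first)
        Dim nextUrl As String = If(target Is Nothing, url, New Uri(New Uri(url), target).AbsoluteUri)
        Return FetchPage(nextUrl, jar)
    End Function
End Module
```

With the code from the question, this would be called as `TextBox2.Text = FetchPastInterstitial(TextBox1.Text)`.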
I have a VB.NET console application that logs in to a website (POST form) using WebClient:
Dim responsebytes = myWebClient.UploadValues("https://xxx.com/mysession/create", "POST", myNameValueCollection)
Last Friday this suddenly stopped working, after working without a problem for about 2-3 years. With Fiddler I got an HTTP 504 error, but without Fiddler I got the error message:
The underlying connection was closed: The connection was closed unexpectedly.
I assume that something on the server-side has changed, but I have no influence on that. It's a commercial website, where I want to login automatically on my account to fetch some data.
As Fiddler couldn't help me much further, I decided to build a basic HttpWebRequest example to rule out that it was caused by the WebClient.
The example does the following:
navigates to the homepage of the company and reads out a securityToken (this goes OK!)
posts the securityToken + username + password to log in.
Imports System.IO
Imports System.Net
Imports System.Text

Public Class Form1
Const ConnectURL = "https://member.company.com/homepage/index"
Const LoginURL = "https://member.company.com/account/logn"
Private Function RegularPage(ByVal URL As String, ByVal CookieJar As CookieContainer) As String
Dim Request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
Request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
Request.AllowAutoRedirect = False
Request.CookieContainer = CookieJar
'Using blocks close the reader and the response; code placed after Return would never run
Using Response As HttpWebResponse = CType(Request.GetResponse(), HttpWebResponse)
Using reader As New StreamReader(Response.GetResponseStream())
Return reader.ReadToEnd()
End Using
End Using
End Function
Private Function LogonPage(ByVal URL As String, ByRef CookieJar As CookieContainer, ByVal PostData As String) As String
Dim Request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
Request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
Request.CookieContainer = CookieJar
Request.AllowAutoRedirect = False
Request.ContentType = "application/x-www-form-urlencoded"
Request.Method = "POST"
Dim postBytes As Byte() = Encoding.ASCII.GetBytes(PostData)
Request.ContentLength = postBytes.Length 'length in bytes, not in characters
Dim requestStream As Stream = Request.GetRequestStream()
requestStream.Write(postBytes, 0, postBytes.Length)
requestStream.Close()
'The CookieContainer assigned above collects the response cookies automatically
'Using blocks close the reader and the response; code placed after Return would never run
Using Response As HttpWebResponse = CType(Request.GetResponse(), HttpWebResponse)
Using reader As New StreamReader(Response.GetResponseStream())
Return reader.ReadToEnd()
End Using
End Using
End Function
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim CookieJar As New CookieContainer
Dim PostData As String
Try
Dim homePage As String = (RegularPage(ConnectURL, CookieJar))
Dim securityToken = homePage.Substring(homePage.IndexOf("securityToken") + 22, 36) 'currently 36 characters long
PostData = "securityToken=" + securityToken + "&accountId=123456789&password=mypassword"
MsgBox(PostData)
Dim accountPage As String = (LogonPage(LoginURL, CookieJar, PostData))
Catch ex As Exception
MsgBox(ex.Message.ToString)
End Try
End Sub
End Class
This line causes the connection to be closed:
Dim requestStream As Stream = Request.GetRequestStream()
Is it possible that this company doesn't like automated logins and somehow notices that an application is being used to log in? How can I debug this? Fiddler doesn't seem to work. Is my only option Wireshark? That seems rather difficult to me.
Also, isn't it weird that the connection is already closed before I do the POST?
Are there other languages in which I can program this "easily" to rule out that it's a VB.NET / .NET problem?
Have you attempted to capture the request using something like your browser's networking tools?
The auth process may have changed. Could even be some name or post data changes.
I got this fixed by:
double-checking all the headers that are sent when using a browser
making sure all those headers were sent by the VB.NET application.
Not sure which one did the trick, but always make sure you replicate all the headers that the browser would send!
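For anyone replicating the browser's headers as described above, here is a minimal sketch of copying a captured set of headers onto an HttpWebRequest. The helper name `ApplyBrowserHeaders` and the dictionary input are assumptions for illustration. The point it shows is that some headers (User-Agent, Accept, Referer, Content-Type and a few others) are restricted and have to be set through properties, while the rest can go through the Headers collection.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net

Module BrowserHeadersSketch
    ' Copies header name/value pairs (as captured from the browser) onto a request.
    Sub ApplyBrowserHeaders(req As HttpWebRequest, headers As IDictionary(Of String, String))
        For Each pair As KeyValuePair(Of String, String) In headers
            Select Case pair.Key.ToLowerInvariant()
                Case "user-agent"
                    req.UserAgent = pair.Value
                Case "accept"
                    req.Accept = pair.Value
                Case "referer"
                    req.Referer = pair.Value
                Case "content-type"
                    req.ContentType = pair.Value
                Case "accept-encoding"
                    ' Let .NET send this header and decompress the response itself.
                    req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
                Case "host", "connection", "content-length", "cookie"
                    ' Left to HttpWebRequest and its CookieContainer; skipped in this sketch.
                Case Else
                    ' Any other restricted header is skipped rather than causing an exception.
                    If Not WebHeaderCollection.IsRestricted(pair.Key) Then
                        req.Headers(pair.Key) = pair.Value
                    End If
            End Select
        Next
    End Sub
End Module
```

Creating the request does not open a connection, so the headers can be applied and inspected before GetRequestStream or GetResponse is called.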
VB2012: I'm trying to login to my company website to parse out some info. It has a typical page
portal.mycompany.com
which redirects eventually to
security.mycompany.com/login.jsp?TYPE=xxx&METHOD=GET&{more parameters}
and there we are presented with textboxes for user name and password. I have Fiddler running and am a little lost as to what to look for when setting up my POST. My example looks much the same as those on various coding sites. I am mainly looking for help on what to look for in Fiddler to use as the basis for the POST request that logs in to my site programmatically.
I took a look at some of the Fiddler entries and added what I thought was the entry with the credentials. But when I tried adding this to the POST request, it just responded with the original page.
Dim cookieJar As New Net.CookieContainer()
Dim request As Net.HttpWebRequest
Dim response As Net.HttpWebResponse
Dim strURL As String
Try
'Get Cookies
strURL = "http://portal.mycompany.com"
request = Net.HttpWebRequest.Create(strURL)
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
request.Method = "GET"
request.CookieContainer = cookieJar
response = request.GetResponse()
For Each tempCookie As Net.Cookie In response.Cookies
cookieJar.Add(tempCookie)
Next
'Send the post data now
request = Net.HttpWebRequest.Create(strURL)
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
request.Method = "POST"
request.ContentType = "application/x-www-form-urlencoded" 'tells the server the body is form data
request.AllowAutoRedirect = True
request.CookieContainer = cookieJar
Dim writer As StreamWriter = New StreamWriter(request.GetRequestStream())
writer.Write("email=username&pass=password") 'where do I get this in Fiddler?
writer.Close()
response = request.GetResponse()
'Get the data from the page
Dim stream As StreamReader = New StreamReader(response.GetResponseStream())
Dim data As String = stream.ReadToEnd()
response.Close()
If data.Contains("<title>MyCompany") = True Then
'LOGGED IN SUCCESSFULLY
End If
Catch ex As Exception
MsgBox(ex.Message)
End Try
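On the question of what to take from Fiddler: the body of the captured login POST is a list of name/value pairs (Fiddler's WebForms request inspector shows it that way), and the address that POST was sent to is the URL to target. Below is a minimal sketch of turning such pairs into a correctly encoded body. `BuildFormBody` is a hypothetical helper, and the field names in the usage note are placeholders, since the real names only appear in the captured request.

```vbnet
Imports System
Imports System.Collections.Generic

Module FormBodySketch
    ' Builds an application/x-www-form-urlencoded body from name/value pairs,
    ' percent-encoding each name and value so that characters such as & and @
    ' inside a password do not break the body apart.
    Function BuildFormBody(fields As IEnumerable(Of KeyValuePair(Of String, String))) As String
        Dim parts As New List(Of String)
        For Each field As KeyValuePair(Of String, String) In fields
            parts.Add(Uri.EscapeDataString(field.Key) & "=" & Uri.EscapeDataString(field.Value))
        Next
        Return String.Join("&", parts)
    End Function
End Module
```

For example, pairs named `USER` and `PASSWORD` (placeholder names) would be passed in a list and the result written to the request stream in place of the hard-coded `"email=username&pass=password"` string.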
I was inspired by Awesomium and I'm trying to use it as the client-side app for my web application, using VB.NET. I'm building an uploader module where our clients can upload any file to our server. I used HttpWebRequest for uploading files, which works fine. The only problem is how to pass the session that Awesomium created when I logged in to my web application on to the HttpWebRequest. Or is there another way, in Awesomium itself, to upload files to the server (i.e. a PHP server)?
Apologies, my English is not good.
Below is the code I'm using for the upload:
Dim filepath As String = Path 'Path to file on local machine
Dim url As String = "http://xxxxx.com/uploadscanfile.php"
Dim boundary As String = IO.Path.GetRandomFileName
ImageRandomName.Add(IO.Path.GetFileName(filepath))
Dim header As New System.Text.StringBuilder()
header.AppendLine("--" & boundary)
header.Append("Content-Disposition: form-data; name=""file"";")
header.AppendFormat("filename=""{0}""", IO.Path.GetFileName(filepath))
header.AppendLine()
header.AppendLine("Content-Type: application/octet-stream")
header.AppendLine()
Dim headerbytes() As Byte = System.Text.Encoding.UTF8.GetBytes(header.ToString)
Dim endboundarybytes() As Byte = System.Text.Encoding.ASCII.GetBytes(vbNewLine & "--" & boundary & "--" & vbNewLine)
Dim req As Net.HttpWebRequest = Net.HttpWebRequest.Create(url)
req.ContentType = "multipart/form-data; boundary=" & boundary
req.ContentLength = headerbytes.Length + New IO.FileInfo(filepath).Length + endboundarybytes.Length
req.AllowAutoRedirect = True
req.Timeout = -1
req.KeepAlive = True
req.AllowWriteStreamBuffering = False
req.Method = "POST"
Dim s As IO.Stream = req.GetRequestStream
s.Write(headerbytes, 0, headerbytes.Length)
Dim filebytes() As Byte = My.Computer.FileSystem.ReadAllBytes(filepath)
s.Write(filebytes, 0, filebytes.Length)
s.Write(endboundarybytes, 0, endboundarybytes.Length)
s.Close()
Typically, session info is stored in cookies. So you will first need to send the login data (and receive a reply). Before sending that request, assign a CookieContainer to the request's CookieContainer property so that it collects the cookies from the reply; you will reuse it later. Only then can you send the form data and upload the file, making sure the upload request's CookieContainer is the same one that holds the login cookies.
To see all the requests and responses that go to and from the server when logging in, I suggest you use Fiddler. It is a very useful tool that monitors all the requests you make. Note the headers, the data that gets sent and anything else that you may find useful.
More code:
Here is the part that logs you in. The post string that you send needs to contain the username and password (and any other info that may be required).
Dim CookieJar As New CookieContainer() 'The CookieContainer that will keep all the cookies.
'DO NOT CLEAR THIS BETWEEN REQUESTS! ONLY CLEAR TO "Log Out".
Dim req As HttpWebRequest = HttpWebRequest.Create("<login URL goes here>")
req.Method = "POST"
req.Accept = "text/html, application/xhtml+xml, */*" 'This may be a bit different in your case. Refer to what Fiddler will say.
req.CookieContainer = CookieJar
req.ContentLength = System.Text.Encoding.UTF8.GetByteCount(post_str) 'Length in bytes; StreamWriter writes UTF-8 by default.
req.ContentType = "application/x-www-form-urlencoded" 'Also, any useragent will do.
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
req.Headers.Add("Accept-Language", "en-US,en;q=0.7,ru;q=0.3")
req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate 'Sends Accept-Encoding and decompresses the reply.
req.Headers.Add("DNT", "1") 'Make sure to add all headers that you found using Fiddler!
req.Headers.Add("Pragma", "no-cache")
Dim RequestWriter As New IO.StreamWriter(req.GetRequestStream())
RequestWriter.Write(post_str) 'Write the post string that contains the log in, password, etc.
RequestWriter.Close() 'Close also disposes the writer.
Dim ResponceReader As New IO.StreamReader(req.GetResponse().GetResponseStream())
Dim ResponceData As String = ResponceReader.ReadToEnd()
ResponceReader.Close()
req.GetResponse.Close()
'In the long run, you can check the ResponceData to verify that the log in was successful.
Here is where the CookieJar gets used to send a request:
post_str = "" 'What needs to be sent goes in this variable.
req = HttpWebRequest.Create("<page to send request to goes here>")
req.Method = "POST"
req.Accept = "text/html, application/xhtml+xml, */*" 'May be different in your case
req.CookieContainer = CookieJar 'Please note: this HAS to be the same CookieJar as you used to login.
req.ContentLength = System.Text.Encoding.UTF8.GetByteCount(post_str) 'Length in bytes; StreamWriter writes UTF-8 by default.
req.ContentType = "application/x-www-form-urlencoded"
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
req.Headers.Add("Accept-Language", "en-US,en;q=0.7,ru;q=0.3")
req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate 'Sends Accept-Encoding and decompresses the reply.
req.Headers.Add("DNT", "1") 'Add all headers here.
req.Headers.Add("Pragma", "no-cache")
RequestWriter = New IO.StreamWriter(req.GetRequestStream())
RequestWriter.Write(post_str)
RequestWriter.Close() 'Close also disposes the writer.
ResponceReader = New IO.StreamReader(req.GetResponse().GetResponseStream())
ResponceData = ResponceReader.ReadToEnd()
ResponceReader.Close()
req.GetResponse.Close()
'You may want to read the ResponceData.
Hope this helps.
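To tie this back to the file upload in the question, here is a rough sketch of an upload that reuses the login cookies. It assumes the login request has already been made with the same CookieContainer, and that the PHP script expects the file in a form field named `file`, as in the question's code. `BuildMultipartBody` and `UploadWithSession` are hypothetical helper names. The whole body is built in memory, which is fine for small files but not for very large ones.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module UploadSketch
    ' Assembles a single-file multipart/form-data body in memory.
    Function BuildMultipartBody(boundary As String, fieldName As String,
                                fileName As String, content As Byte()) As Byte()
        Dim head As String =
            "--" & boundary & vbCrLf &
            "Content-Disposition: form-data; name=""" & fieldName & """; filename=""" & fileName & """" & vbCrLf &
            "Content-Type: application/octet-stream" & vbCrLf & vbCrLf
        Dim tail As String = vbCrLf & "--" & boundary & "--" & vbCrLf
        Using buffer As New MemoryStream()
            Dim headBytes As Byte() = Encoding.UTF8.GetBytes(head)
            Dim tailBytes As Byte() = Encoding.ASCII.GetBytes(tail)
            buffer.Write(headBytes, 0, headBytes.Length)
            buffer.Write(content, 0, content.Length)
            buffer.Write(tailBytes, 0, tailBytes.Length)
            Return buffer.ToArray()
        End Using
    End Function

    ' Sends the file with the cookies collected at login and returns the server's reply.
    Function UploadWithSession(url As String, jar As CookieContainer, filePath As String) As String
        Dim boundary As String = "----sketch" & Guid.NewGuid().ToString("N")
        Dim body As Byte() = BuildMultipartBody(boundary, "file",
                                                Path.GetFileName(filePath), File.ReadAllBytes(filePath))
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        req.Method = "POST"
        req.CookieContainer = jar ' the same jar that was used for the login request
        req.ContentType = "multipart/form-data; boundary=" & boundary
        req.ContentLength = body.Length
        Using s As Stream = req.GetRequestStream()
            s.Write(body, 0, body.Length)
        End Using
        Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(resp.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

Note that this logs in through HttpWebRequest itself, as suggested above; it does not read the session out of Awesomium.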
My code is 100% working but when I try filling up the forms under yelp.com it gives me 403 error. Here is my code:
Dim cweb As String = "http://www.yelp.com/biz_share?bizid=T6XCD1_eLEk3LaSp8C7E1g&return_url=%2Fbiz%2Fmr-c-los-angeles-2"
Dim POST As String = "csrftok=6cc5dea3ff8bf8f404f1e7a4951342cb2f132a17cb625a3988c50027d358285d&context=pyZQEaHS1YbhP3EEsTKGww&action_submit=1&emails=samplemail#gmail.com&emails=&emails=&unauth_name=Test+Name&unauth_email=testemail%40email.com&note=How%27s+it+going%3F"
Dim request As HttpWebRequest
Dim response As HttpWebResponse
Dim tempCookies As New CookieContainer
request = CType(WebRequest.Create(cweb), HttpWebRequest)
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"
request.AllowAutoRedirect = True
request.ContentType = "application/x-www-form-urlencoded"
request.ContentLength = POST.Length
request.Method = "POST"
request.KeepAlive = True
request.CookieContainer = tempCookies
Dim requestStream As Stream = request.GetRequestStream()
Dim postBytes As Byte() = Encoding.ASCII.GetBytes(POST)
requestStream.Write(postBytes, 0, postBytes.Length)
requestStream.Close()
response = CType(request.GetResponse(), HttpWebResponse)
tempCookies.Add(response.Cookies)
Dim postreader As New StreamReader(response.GetResponseStream())
Dim thepage As String = postreader.ReadToEnd
response.Close()
The web form I am basing this on is:
http://www.yelp.com/biz_share?bizid=T6XCD1_eLEk3LaSp8C7E1g&return_url=%2Fbiz%2Fmr-c-los-angeles-2
I am able to fill in and submit other web forms. Does this mean that yelp.com won't accept any web request? I am really confused right now. Any help will be gladly accepted. Thanks in advance.
The csrftok changes for each user/session. The 403 occurs because your authentication data is bad or the csrftok is not valid for the session. You need to request the page before this one (or a login page or similar) to get the correct token.
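A minimal sketch of the token step, assuming the form page carries the token as a hidden `<input>` whose `name` attribute comes before `value` (the actual markup may differ, in which case the pattern needs adjusting). `ExtractHiddenField` is a hypothetical helper. The idea is to GET the form page first with the same CookieContainer, pull the fresh `csrftok` out of the returned HTML, and only then build the POST string.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module CsrfTokenSketch
    ' Returns the value of <input ... name="fieldName" ... value="..."> from the
    ' page HTML, or Nothing when no such field is found.
    Function ExtractHiddenField(html As String, fieldName As String) As String
        Dim pattern As String =
            "<input[^>]*name\s*=\s*[""']" & Regex.Escape(fieldName) &
            "[""'][^>]*value\s*=\s*[""']([^""']*)[""']"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        ' Attribute values are HTML-encoded, so decode before putting them in a POST.
        Return If(m.Success, WebUtility.HtmlDecode(m.Groups(1).Value), Nothing)
    End Function
End Module
```

The extracted value would then replace the hard-coded `csrftok=...` part of the POST string, with the GET and the POST both using the same `tempCookies` container so the token matches the session.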
I need to use HTTPWebRequest to login to an external website and redirect me to the default page. My code below is behind a button - when clicked it currently tries to do some processing but stays on the same page. I need it to redirect me to the default page of the external website without seeing the login page. Any help on what I'm doing wrong?
Dim loginURL As String = "https://www.example.com/login.aspx"
Dim cookies As CookieContainer = New CookieContainer
Dim myRequest As HttpWebRequest = CType(WebRequest.Create(loginURL), HttpWebRequest)
myRequest.CookieContainer = cookies
myRequest.AllowAutoRedirect = True
myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1"
Dim myResponse As HttpWebResponse = CType(myRequest.GetResponse(), HttpWebResponse)
Dim responseReader As StreamReader
responseReader = New StreamReader(myResponse.GetResponseStream())
Dim responseData As String = responseReader.ReadToEnd()
responseReader.Close()
'call a function to extract the viewstate needed to login
Dim ViewState As String = ExtractViewState(responseData)
Dim postData As String = String.Format("__VIEWSTATE={0}&txtUsername={1}&txtPassword={2}&btnLogin.x=27&btnLogin.y=9", ViewState, "username", "password")
Dim encoding As UTF8Encoding = New UTF8Encoding()
Dim data As Byte() = encoding.GetBytes(postData)
'POST to login page
Dim postRequest As HttpWebRequest = CType(WebRequest.Create(loginURL), HttpWebRequest)
postRequest.Method = "POST"
postRequest.AllowAutoRedirect = True
postRequest.ContentLength = data.Length
postRequest.CookieContainer = cookies
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0.1"
Dim newStream = postRequest.GetRequestStream()
newStream.Write(data, 0, data.Length)
newStream.Close()
Dim postResponse As HttpWebResponse = CType(postRequest.GetResponse(), HttpWebResponse)
'using GET request on default page
Dim getRequest As HttpWebRequest = CType(WebRequest.Create("https://www.example.com/default.aspx"), HttpWebRequest)
getRequest.CookieContainer = cookies
getRequest.AllowAutoRedirect = True
Dim getResponse As HttpWebResponse = CType(getRequest.GetResponse(), HttpWebResponse)
'returns statuscode = 200
FYI - when I add this code at the end, I get the HTML of the default page I'm trying to redirect to:
Dim responseReader1 As StreamReader
responseReader1 = New StreamReader(getRequest.GetResponse().GetResponseStream())
responseData = responseReader1.ReadToEnd()
responseReader1.Close()
Response.Write(responseData)
Any help on what's missing to get the redirect working?
Cheers
The HttpWebRequest only automatically redirects you if the server sends an HTTP 3xx redirection status with a Location field in the response. Otherwise you are supposed to manually navigate to the page by using Response.Redirect, for example. Also keep in mind that the automatic redirection IGNORES ANY COOKIES sent by the server. That may be the problem in your case if the server is actually sending a redirection status.
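If cookies during redirection are a concern, one option is to follow the redirects by hand. The following is a minimal sketch with `AllowAutoRedirect = False`, so that each hop is a separate request that goes out with the same CookieContainer. The helper names (`IsRedirect`, `ResolveLocation`, `GetFollowingRedirects`) and the hop limit are illustrative and not part of the code in the question.

```vbnet
Imports System
Imports System.Net

Module RedirectSketch
    ' True for the status codes that carry a Location header to follow.
    Function IsRedirect(status As HttpStatusCode) As Boolean
        Dim code As Integer = CInt(status)
        Return code = 301 OrElse code = 302 OrElse code = 303 OrElse code = 307 OrElse code = 308
    End Function

    ' Resolves a Location value, which may be relative, against the requested address.
    Function ResolveLocation(requested As Uri, location As String) As Uri
        Return New Uri(requested, location)
    End Function

    ' Follows redirects one hop at a time so every hop uses the same CookieContainer.
    ' The caller is responsible for closing the returned response.
    Function GetFollowingRedirects(url As String, jar As CookieContainer, maxHops As Integer) As HttpWebResponse
        Dim current As New Uri(url)
        For hop As Integer = 1 To maxHops
            Dim req As HttpWebRequest = CType(WebRequest.Create(current), HttpWebRequest)
            req.CookieContainer = jar
            req.AllowAutoRedirect = False
            Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            If Not IsRedirect(resp.StatusCode) Then Return resp
            Dim location As String = resp.Headers("Location")
            resp.Close()
            If String.IsNullOrEmpty(location) Then
                Throw New WebException("Redirect response without a Location header")
            End If
            current = ResolveLocation(current, location)
        Next
        Throw New WebException("Too many redirects")
    End Function
End Module
```

This only covers GET requests. A redirect that follows the login POST would start from the Location header of that POST response and continue with `GetFollowingRedirects` from there.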