Extracting data from a facebook page using HttpRequest and HttpResponse - vb.net

I am trying to extract some business data from Facebook pages using VB.NET. However, I am not getting the response I would expect.
Dim request As HttpWebRequest
Dim response As HttpWebResponse
Dim responseText As String
request = CType(WebRequest.Create("http://www.facebook.com/Microsoft"), HttpWebRequest)
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"
request.AllowAutoRedirect = True
response = CType(request.GetResponse(), HttpWebResponse)
If I look at the text for the response I get this:
<html><head><title>Redirecting...</title><script>__DEV__=0;_script_path = "XVanityURLController";var uri_re=/^(?:(?:[^:\/?#]+):)?(?:\/\/(?:[^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?/,target_domain='';window.location.href.replace(uri_re,function(a,b,c,d){var e,f,g;e=f=b+(c?'?'+c:'');if(d){d=d.replace(/^(!|%21)/,'');g=d.charAt(0);if(g=='/'||g=='\\')e=d.replace(/^[\\\/]+/,'/');}if(e!=f)window.location.replace(target_domain+e);});</script><script type="text/javascript">/*<![CDATA[*/(function(){function si_cj(m){setTimeout(function(){new Image().src="https:\/\/error.facebook.com\/common\/scribe_endpoint.php?c=si_clickjacking&t=956"+"&m="+m;},5000);}if(top!=self && !false){try{if(parent!=top){throw 1;}var si_cj_d=["apps.facebook.com","apps.beta.facebook.com"];var href=top.location.href.toLowerCase();for(var i=0;i<si_cj_d.length;i++){if (href.indexOf(si_cj_d[i])>=0){throw 1;}}si_cj("3 ");}catch(e){si_cj("1 \t");window.document.write("\u003Cstyle>body * {display:none !important;}\u003C\/style>\u003Ca href=\"#\" onclick=\"top.location.href=window.location.href\" style=\"display:block !important;padding:10px\">Go to Facebook.com\u003C\/a>");/*kSxhSBR_*/}}}())/*]]>*/</script><script>window.location.replace("https:\/\/m.facebook.com\/AMD");</script><meta http-equiv="refresh" content="0;url=https://m.facebook.com/AMD" /></head><body></body></html>
However, when I use a WebBrowser control it actually redirects me to the Microsoft page. I don't want to use a form to accomplish this, though.
So, I'm not sure how to get past this redirect with HttpWebRequest. Do I need to log in to Facebook somehow in order to get the response I'm looking for? If so, how do I do this? Please help, I've been banging my head on this for days.
##

The page is using JavaScript to perform the redirect.
Your HttpWebResponse gets the HTML back as a string, but nothing executes the JavaScript inside it.
Try a browser automation tool such as Selenium, which can drive a real browser in headless mode.
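A minimal sketch of the headless approach, assuming the Selenium.WebDriver NuGet package and a matching ChromeDriver are installed (neither is mentioned in the question, and the module name is made up for illustration). It loads the page in headless Chrome, lets the scripts run, and then reads the resulting HTML:

```vbnet
Imports System
Imports OpenQA.Selenium.Chrome

Module HeadlessExample
    Sub Main()
        ' Assumption: Chrome and ChromeDriver are available on this machine.
        Dim options As New ChromeOptions()
        options.AddArgument("--headless")

        Using driver As New ChromeDriver(options)
            ' The browser follows the JavaScript redirect on its own.
            driver.Navigate().GoToUrl("http://www.facebook.com/Microsoft")

            ' PageSource holds the HTML of the page the browser ended up on.
            Dim html As String = driver.PageSource
            Console.WriteLine(html.Length)
        End Using
    End Sub
End Module
```

Whether the page you land on contains the business data without being logged in is a separate question; this only addresses the script-driven redirect.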

Related

404 error while getting server response vb.net

I'm a total beginner with WebRequest, so I have no idea what causes the error I get.
I am trying to log in through a form, following the Microsoft tutorial for WebRequest, but when I try to get the server response I get the following error:
"the remote server returned an error (404) not found"
I know that the URL I use actually exists, so I wonder which part of the code is bad. Maybe it's because I'm making an HTTPS request, unlike the tutorial, and that changes something?
Also, I'm a little confused about getting the answer from the server directly: shouldn't there be some kind of trigger to know when the server has answered?
Dim request = WebRequest.Create("https://ssl.vocabell.com/mytica2/login")
request.Credentials = CredentialCache.DefaultCredentials
request.Method = "POST"
Dim byteArray = Encoding.UTF8.GetBytes("_username=x&_password=x")
request.ContentType = "application/x-www-form-urlencoded"
request.ContentLength = byteArray.Length
Dim dataStream = request.GetRequestStream()
dataStream.Write(byteArray, 0, byteArray.Length)
dataStream.Close()
Dim reponse = request.GetResponse() 'ERROR
MsgBox(CType(reponse, HttpWebResponse).StatusDescription)
Using ds = reponse.GetResponseStream
Dim reader = New StreamReader(ds)
MsgBox(reader.ReadToEnd)
End Using
reponse.Close()
Thank you for your time, and if you have any relevant tutorial on the topic I would be glad to read it !
The page you've mentioned does exist and uses HTTPS, but if you look at the form tag within it, it's like this:
<form class="login-form form-horizontal" action="/mytica2/login_check" method="POST">
This means the form is not posted back to the same URL as the page. Instead, it is sent to the URL in that "action" attribute. If you're trying to use your code to simulate submitting the login form, then it looks like you need to send your POST request to https://ssl.vocabell.com/mytica2/login_check instead.
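To illustrate how the relative "action" value turns into the full URL, here is a small sketch using the Uri(Uri, String) constructor. The module and variable names are only illustrative:

```vbnet
Imports System

Module FormActionExample
    Sub Main()
        ' The page that contains the login form.
        Dim pageUri As New Uri("https://ssl.vocabell.com/mytica2/login")

        ' The value of the form's "action" attribute, relative to the page.
        Dim action As String = "/mytica2/login_check"

        ' Resolving the action against the page URL gives the POST target.
        Dim postUri As New Uri(pageUri, action)
        Console.WriteLine(postUri.AbsoluteUri)
    End Sub
End Module
```

The resolved address is what you would pass to WebRequest.Create in place of the login page URL.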

Cannot load page with either WebClient or HttpWebRequest

Regardless of whether I use WebClient or HttpWebRequest, loading this page times out. What am I doing wrong? It can't be https, since other https sites load just fine.
Below is my latest attempt, which adds all headers that I see in Firefox's inspector.
One interesting behavior is that I cannot monitor this with Fiddler, because everything works properly when Fiddler is running.
Using client As WebClient = New WebClient()
client.Headers(HttpRequestHeader.Accept) = "text/html, image/png, image/jpeg, image/gif, */*;q=0.1"
client.Headers(HttpRequestHeader.UserAgent) = "Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12"
client.Headers(HttpRequestHeader.AcceptLanguage) = "en-US;en;q=0.5"
client.Headers(HttpRequestHeader.AcceptEncoding) = "gzip, deflate, br"
client.Headers(HttpRequestHeader.Referer) = "http://www.torontohydro.com/sites/electricsystem/Pages/foryourhome.aspx"
client.Headers("DNT") = "1"
client.Headers(HttpRequestHeader.KeepAlive) = "keep-alive"
client.Headers(HttpRequestHeader.Upgrade) = "1"
client.Headers(HttpRequestHeader.CacheControl) = "max-age=0"
Dim x = New Uri("https://css.torontohydro.com/")
Dim data as string = client.DownloadString(x)
End Using
All of this is excess code. Boiling it down to just a couple of lines causes the same hang.
Using client as WebClient = New WebClient()
Dim data as string = client.DownloadString("https://css.torontohydro.com")
End Using
And this is the HttpWebRequest code, in a nutshell, which also hangs getting the response.
Dim getRequest As HttpWebRequest = CreateWebRequest("https://css.torontohydro.com/")
getRequest.CachePolicy = New Cache.RequestCachePolicy(Cache.RequestCacheLevel.BypassCache)
Using webResponse As HttpWebResponse = CType(getRequest.GetResponse(), HttpWebResponse)
'no need for any more code, since the above line is where things hang
So this ended up being due to the project still targeting .NET 3.5. By default, .NET 3.5 negotiates HTTPS connections with SSL 3.0 or TLS 1.0, which this server apparently does not accept. Adding this line fixed the problem:
ServicePointManager.SecurityProtocol = CType(3072, SecurityProtocolType)
I had to use the numeric value 3072 since .NET 3.5 does not contain a definition for SecurityProtocolType.Tls12.
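For context, a sketch of where that line would go, assuming a plain console project (the module name is hypothetical). The setting is process-wide, so it should run once before the first request is created:

```vbnet
Imports System
Imports System.Net

Module Tls12Example
    Sub Main()
        ' 3072 is the numeric value of SecurityProtocolType.Tls12.
        ' The cast keeps this compiling with Option Strict On.
        ServicePointManager.SecurityProtocol = CType(3072, SecurityProtocolType)

        Using client As New WebClient()
            ' Same URL as in the question.
            Dim data As String = client.DownloadString("https://css.torontohydro.com/")
            Console.WriteLine(data.Length)
        End Using
    End Sub
End Module
```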

How to use VB web client to sign into a website, using POST or other. i.e. Facebook

I have a function that pulls and formats the source code of pages using the VB WebClient. I need a way to pull the source code of a page as though I were signed in on a browser.
I understand that I could normally use HttpWebRequest, but that doesn't even yield the normal page, only one saying the browser is out of date, even when I set a newer user agent. I believe it is related to the browser the request uses in VB.
I have been trying to do this using POST requests with the WebClient, but this doesn't work either. Below is the closest I have got.
Dim url As String = "URL HERE"
Dim xDoc As New XmlDocument
Dim s As String
Using client As New Net.WebClient
Dim reqparm As New Specialized.NameValueCollection
reqparm.Add("email", "EMAIL HERE")
reqparm.Add("pass", "PASSWORD HERE")
client.Headers("user-agent") = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
Dim responseBytes = client.UploadValues(url, "POST", reqparm)
Dim responsebody = (New Text.UTF8Encoding).GetString(responseBytes)
s = responsebody
End Using
I then proceed to output it to the next section of the program.
The attempt above just returns the normal source code. I'm guessing I'm completely missing how this works and how to implement it.
TL;DR:
I need to use the VB WebClient to pull the source code of a page while acting as if it's signed in, and HttpWebRequest is not an option.
Any help would be greatly appreciated.

Webclient always returns an empty Source Code

I would like to get the source code of this page, for example:
My page URL
I have tried WebClient (DownloadString and DownloadFile) and HttpWebRequest, but I always get an empty string back as the source code.
With Firefox, Edge, or any other browser, I get the source code without a problem.
How can I get the source code of the given example?
This is one of the many versions of the code that I have tried:
Using client = New WebClient()
client.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; rv:40.0) Gecko/20100101 Firefox/40.0")
Dim MyURL As String = "https://www.virustotal.com/fr/file/c65ce5ab02b69358d07b56434527d3292ea2cb12357047e6a396a5b27d9ef680/analysis/"
Dim Source_Code As String = client.DownloadString(MyURL)
MsgBox(Source_Code)
textbox1.text = Source_Code
End Using
NB 1: I would rather not use a WebBrowser or similar control.
NB 2: WebClient works fine with all other sites.
It seems the target server is picky and requires the Accept-Language header to return any content. The following code returns the page's content:
var url="https://www.virustotal.com/fr/file/c65ce5ab02b69358d07b56434527d3292ea2cb12357047e6a396a5b27d9ef680/analysis/";
var client=new System.Net.WebClient();
client.Headers.Add("Accept-Language","en");
var content=client.DownloadString(url);
If the Accept-Language header is missing, no data is returned.
To find this, you can use a tool like Fiddler to capture the HTTP request and responses from your browser and application. By removing one by one the headers sent by the browser, you can find which header the server actually requires.
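Since the question is in VB.NET and the snippet above is C#, here is a rough VB.NET equivalent. It makes the same calls; only the module wrapper and the Using block are additions:

```vbnet
Imports System
Imports System.Net

Module AcceptLanguageExample
    Sub Main()
        Dim url As String = "https://www.virustotal.com/fr/file/c65ce5ab02b69358d07b56434527d3292ea2cb12357047e6a396a5b27d9ef680/analysis/"

        Using client As New WebClient()
            ' The header the server appears to require.
            client.Headers.Add("Accept-Language", "en")

            Dim content As String = client.DownloadString(url)
            Console.WriteLine(content.Length)
        End Using
    End Sub
End Module
```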

Basic Authentication Webpage Login VB.NET

Hey all, I am having an issue trying to automate our UPS installations. The web page uses basic authentication and prompts for a login when it loads. We do not have access to the registry to enable this feature in IE, since it was disabled. I have tried using an HttpWebRequest and its response to pull the cookie, but the server never appears to send one back. My plan was to pass that cookie to the WebBrowser control so that it wouldn't ask for the login. Here is the code I have for that:
Dim request As HttpWebRequest = DirectCast(WebRequest.Create("http://10.106.206.249"), HttpWebRequest)
Dim mycache = New CredentialCache()
mycache.Add(New Uri("http://10.106.206.249"), "Basic", New NetworkCredential("User", "Pass"))
request.Credentials = mycache
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14"
request.CookieContainer = New CookieContainer()
Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
Dim cook As Cookie
For Each cook In response.Cookies
Console.WriteLine("Cookie:")
Console.WriteLine("{0} = {1}", cook.Name, cook.Value)
Console.WriteLine("Domain: {0}", cook.Domain)
Console.WriteLine("Path: {0}", cook.Path)
Console.WriteLine("Port: {0}", cook.Port)
Console.WriteLine("Secure: {0}", cook.Secure)
Console.WriteLine("When issued: {0}", cook.TimeStamp)
Console.WriteLine("Expires: {0} (expired? {1})", cook.Expires, cook.Expired)
Console.WriteLine("Don't save: {0}", cook.Discard)
Console.WriteLine("Comment: {0}", cook.Comment)
Console.WriteLine("Uri for comments: {0}", cook.CommentUri)
Console.WriteLine("Version: RFC {0}", IIf(cook.Version = 1, "2109", "2965"))
' Show the string representation of the cookie.
Console.WriteLine("String: {0}", cook.ToString())
Next cook
I know this works to some extent, because if I use incorrect credentials an unauthorized error is thrown. So it appears that either I am not catching the cookie or none is being sent.
Another way I have tried is sending a header with a regular Web.Navigate, but that just acts as if it is loading the page and then prompts for the login:
Dim authData
Dim authHeader As String
authData = System.Text.UnicodeEncoding.UTF8.GetBytes("User:Pass")
authHeader = "Authorization: Basic: " & System.Convert.ToBase64String(System.Text.Encoding.ASCII.GetBytes("User:Pass")) & Chr(13) & Chr(10)
Web.Navigate("http://10.106.206.249", False, Nothing, authHeader)
Anyone have any insight to see if maybe I am just doing something wrong here?
A simpler solution would be to put the credentials in the URL:
Web.Navigate("http://Administrator:retail@10.106.206.249")
Note that if your password contains an @ sign, you'll have to URL-encode it (as %40). (I'm not 100% sure whether the password will still work then.)
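On the header approach from the question: in a Basic Authorization header the scheme name is followed by a space, not a colon, so "Authorization: Basic: " is probably malformed. A small sketch of building the header, where BuildBasicAuthHeader is a hypothetical helper name and "User"/"Pass" are the placeholders from the question:

```vbnet
Imports System
Imports System.Text

Module BasicAuthHeaderExample
    ' Builds one header line of the form "Authorization: Basic <base64(user:password)>".
    Function BuildBasicAuthHeader(user As String, password As String) As String
        Dim raw As Byte() = Encoding.UTF8.GetBytes(user & ":" & password)
        ' Header lines end with CR LF.
        Return "Authorization: Basic " & Convert.ToBase64String(raw) & vbCrLf
    End Function

    Sub Main()
        Console.WriteLine(BuildBasicAuthHeader("User", "Pass"))
    End Sub
End Module
```

The returned string could be passed as the headers argument of the Web.Navigate call in the question. I have not verified that the control then skips the login prompt.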