Having problems when trying to log in to a website using C++/CLI - authentication

I've been searching the internet for answers to my problem, but I still can't figure out how to log in to a website properly.
First, I'll explain what I've done so far.
» I opened this website, which I want to log in to: http://side.utad.pt/cursos/einformatica/
» After opening its source code, I found that the URL the login form posts to is:
https://side.utad.pt/side-secure3/login.pl
I tried to open it, but I got an Internal Error, so I tried to access that URL without /login.pl instead, but I don't have permission to. As I can't get this URL working, I thought about using the first link itself.
» Using the Tamper Data extension (Firefox), I found that there are 3 POST arguments: sessionid, username and password. The username and password are entered by the user.
To get sessionid, I simply searched for it in the page source and took it from there:
String^ formUrl = "http://side.utad.pt/cursos/einformatica/";
String^ pageSource;
WebClient^ client = gcnew WebClient();
pageSource = client->DownloadString(formUrl);
delete client;
client = nullptr;
int index = pageSource->IndexOf("sessionid");
int startIndex = index + 34;
String^ _sessionid = pageSource->Substring(startIndex, 32);
Up to here, everything was fine apart from the URL problem.
» I started formatting all the data gathered (in what I believe is the correct way):
String^ formParams;
// format data
formParams = "sessionid="+ _sessionid+"&username="+username+"&password="+password;
» After that, I started working with the "body" of the code:
WebRequest^ req = WebRequest::Create(formUrl);
// encode our data
array<Byte>^ bytes = System::Text::Encoding::ASCII->GetBytes(formParams);
req->ContentType = "application/x-www-form-urlencoded";
req->Method = "POST";
req->ContentLength = bytes->Length;
Stream^ os = req->GetRequestStream();
os->Write(bytes,0,bytes->Length);
os->Close();
Am I doing it correctly up to here?
» I wanted to check whether I'm logged in or not, so I thought about fetching another page's source, this time from the post-login page (it can be accessed without logging in, but we are always taken to that page after logging in):
// this code is added below os->Close();
WebResponse^ resp = req->GetResponse();
String^ cookieHeader;
cookieHeader = resp->Headers["Set-cookie"];
WebRequest^ getRequest = WebRequest::Create("http://side.utad.pt/cursos/einformatica/principal"); // Exception 1
getRequest->Headers->Add("Cookie", cookieHeader);
WebResponse^ getResponse = getRequest->GetResponse();
StreamReader^ sr = gcnew StreamReader(getRequest->GetRequestStream()); // Exception 2
pageSource = ""; // reset
pageSource = sr->ReadToEnd();
First, the line marked Exception 1 raises an exception most of the time, but I don't know the cause: 'The server committed a protocol violation. Section=ResponseStatusLine'
Second, when that line doesn't raise an exception, this one does (cannot send a content-body with this verb-type):
StreamReader^ sr = gcnew StreamReader(getRequest->GetRequestStream());
Any ideas on how to get this working?
I think the problem here is related to the cookies. I might not have saved them properly.
Thanks
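Two details in the steps above can be illustrated without touching the network. This is only a minimal Java sketch, not a fix confirmed by anyone in this thread: the field names sessionid, username and password come from the question, while the class name, method names and sample values are hypothetical. It shows (1) percent-encoding each form value before joining them, so a password containing & or = cannot break the body, and (2) reducing a Set-Cookie value to the name=value pair that a Cookie request header is expected to carry (attributes such as Path are not sent back). Separately, on a GET request the page content is read from the response stream (getResponse->GetResponseStream() in the code above), not from GetRequestStream(), which is what the "content-body with this verb-type" message is complaining about.

```java
import java.net.HttpCookie;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; names are illustrative only.
class LoginRequestSketch {

    // Builds an application/x-www-form-urlencoded body,
    // percent-encoding every name and value.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Converts one Set-Cookie header value into the "name=value" form
    // used in a Cookie request header, dropping attributes like Path.
    static String toCookieHeader(String setCookieValue) {
        StringJoiner cookies = new StringJoiner("; ");
        for (HttpCookie c : HttpCookie.parse(setCookieValue)) {
            cookies.add(c.getName() + "=" + c.getValue());
        }
        return cookies.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("sessionid", "0123456789abcdef0123456789abcdef"); // made-up value
        fields.put("username", "student");                            // made-up value
        fields.put("password", "p&ss=word");                          // made-up value
        System.out.println(encodeForm(fields));
        System.out.println(toCookieHeader("sessionid=abc123; Path=/; HttpOnly"));
    }
}
```

In .NET itself, assigning a CookieContainer to the HttpWebRequest before sending is the usual way to let the framework carry cookies between requests instead of copying headers by hand.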

Related

Read a file from the cache in CEFSharp

I need to navigate to a web site that ultimately contains a .pdf file and I want to save that file locally. I am using CEFSharp to do this. The nature of this site is such that once the .pdf appears in the browser, it cannot be accessed again. For this reason, I was wondering if once you have a .pdf displayed in the browser, is there a way to access the source for that file in the cache?
I have tried implementing IDownloadHandler and that works, but you have to click the save button on the embedded .pdf. I am trying to get around that.
OK, here is how I got it to work. There is a function in CEFSharp that allows you to filter an incoming web response, which gives you complete access to the incoming stream. My solution is a little on the dirty side and not particularly efficient, but it works for my situation. If anyone sees a better way, I am open to suggestions. There are two things I have to assume in order for my code to work:
GetResourceResponseFilter is called every time a new page is downloaded.
The PDF is the last thing to be downloaded during the navigation process.
Start with the CEF Minimal Example found here: https://github.com/cefsharp/CefSharp.MinimalExample
I used the WinForms version. Implement IRequestHandler and IResponseFilter in the form definition as follows:
public partial class BrowserForm : Form, IRequestHandler, IResponseFilter
{
    public readonly ChromiumWebBrowser browser;
    public BrowserForm(string url)
    {
        InitializeComponent();
        browser = new ChromiumWebBrowser(url)
        {
            Dock = DockStyle.Fill,
        };
        toolStripContainer.ContentPanel.Controls.Add(browser);
        browser.BrowserSettings.FileAccessFromFileUrls = CefState.Enabled;
        browser.BrowserSettings.UniversalAccessFromFileUrls = CefState.Enabled;
        browser.BrowserSettings.WebSecurity = CefState.Disabled;
        browser.BrowserSettings.Javascript = CefState.Enabled;
        browser.LoadingStateChanged += OnLoadingStateChanged;
        browser.ConsoleMessage += OnBrowserConsoleMessage;
        browser.StatusMessage += OnBrowserStatusMessage;
        browser.TitleChanged += OnBrowserTitleChanged;
        browser.AddressChanged += OnBrowserAddressChanged;
        browser.FrameLoadEnd += browser_FrameLoadEnd;
        browser.LifeSpanHandler = this;
        browser.RequestHandler = this;
The declaration and the last two lines are the most important for this explanation. I implemented the IRequestHandler using the template found here:
https://github.com/cefsharp/CefSharp/blob/master/CefSharp.Example/RequestHandler.cs
I changed everything to what it recommends as default except for GetResourceResponseFilter which I implemented as follows:
IResponseFilter IRequestHandler.GetResourceResponseFilter(IWebBrowser browserControl, IBrowser browser, IFrame frame, IRequest request, IResponse response)
{
    if (request.Url.EndsWith(".pdf"))
        return this;
    return null;
}
I then implemented IResponseFilter as follows:
FilterStatus IResponseFilter.Filter(Stream dataIn, out long dataInRead, Stream dataOut, out long dataOutWritten)
{
    BinaryWriter sw;
    if (dataIn == null)
    {
        dataInRead = 0;
        dataOutWritten = 0;
        return FilterStatus.Done;
    }
    dataInRead = dataIn.Length;
    dataOutWritten = Math.Min(dataInRead, dataOut.Length);
    byte[] buffer = new byte[dataOutWritten];
    int bytesRead = dataIn.Read(buffer, 0, (int)dataOutWritten);
    string s = System.Text.Encoding.UTF8.GetString(buffer);
    if (s.StartsWith("%PDF"))
        File.Delete(pdfFileName);
    sw = new BinaryWriter(File.Open(pdfFileName, FileMode.Append));
    sw.Write(buffer);
    sw.Close();
    dataOut.Write(buffer, 0, bytesRead);
    return FilterStatus.Done;
}
bool IResponseFilter.InitFilter()
{
    return true;
}
What I found is that the PDF is actually downloaded twice when it is loaded. In any case, there might be header information and what not at the beginning of the page. When I get a stream segment that begins with %PDF, I know it is the beginning of a PDF so I delete the file to discard any previous contents that might be there. Otherwise, I just keep appending each segment to the end of the file. Theoretically, the PDF file will be safe until you navigate to another PDF, but my recommendation is to do something with the file as soon as the page is loaded just to be safe.
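The "restart on %PDF, otherwise append" rule described above can be isolated from CefSharp entirely. Below is a minimal Java sketch of just that rule; the class and method names are hypothetical, it keeps the data in memory instead of a file, and it compares raw bytes against the %PDF signature rather than decoding each segment to a string first. It is an illustration of the idea, not a replacement for the filter above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical accumulator: keeps only the most recent PDF seen in a chunked stream.
class PdfCapture {
    private static final byte[] MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    // True when the chunk begins with the four-byte PDF signature.
    static boolean startsWithMagic(byte[] chunk) {
        if (chunk.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (chunk[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // A chunk starting with %PDF discards whatever was captured before;
    // every chunk is then appended to the capture buffer.
    void onChunk(byte[] chunk) {
        if (startsWithMagic(chunk)) {
            captured.reset();
        }
        captured.write(chunk, 0, chunk.length);
    }

    byte[] bytes() {
        return captured.toByteArray();
    }
}
```

Like the original, this assumes the signature always lands at the start of a segment; a segment boundary that splits the four signature bytes would not be detected.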

Google OAuth2 token request does not work

This walkthrough and its instructions for requesting a token do not work. When I run my code exactly as you have it, I receive a 400 error every time, with this JSON response:
{
error: "invalid_grant"
}
https://developers.google.com/accounts/docs/OAuth2ServiceAccount#makingrequest
I have been trying to get this to work for almost a week now. I am not getting any useful help here, and I see a lot of similar questions here left unanswered.
Thanks, any help would be amazing!
Karl..
Here is the code I am using (I've wrapped it up a bit, so I may need to reveal internal code). Note: I left the strange \/ slashes in the scope and aud props of the claim, as I am trying another guy's fix from Stack Overflow, http://goo.gl/bt9lPj (which doesn't seem to be working either; I'm getting the exact same error).
var claimbuilder = new Stub.Jwt.ClaimsBuilder();
claimbuilder.Add("iss", "...@developer.gserviceaccount.com");
claimbuilder.Add("scope", "https:\\/\\/picasaweb.google.com\\/data\\/");
claimbuilder.Add("aud", "https:\\/\\/accounts.google.com\\/o\\/oauth2\\/token");
claimbuilder.Add("exp", (Stub.Jwt.Utility.UnixTime + (60 * 5)).ToString());
claimbuilder.Add("iat", Stub.Jwt.Utility.UnixTime.ToString());
string head = "{\"alg\":\"RS256\",\"typ\":\"JWT\"}";
var jwt = String.Format("{0}.{1}", head, claimbuilder.ClaimSet);
Console.WriteLine(jwt);
var certificate = new X509Certificate2(@"....-privatekey.p12", "notasecret", X509KeyStorageFlags.Exportable);
var token = new Stub.Jwt.JsonWebToken();
var jwtresult = token.Generate(head, claimbuilder.ClaimSet, certificate);
Console.WriteLine("jwt: {0}", jwtresult);
OAuth.Response resp = new OAuth.Response();
OAuth.Request auth = new OAuth.Request("https://accounts.google.com/o/oauth2/token");
auth.AddPostVar("grant_type", HttpUtility.UrlEncode("urn:ietf:params:oauth:grant-type:jwt-bearer")); // "authorization_code");
auth.AddPostVar("assertion", jwt);
auth.Go(resp);
Console.WriteLine(resp.OAuthTokenValue);
The code is only valid for a few minutes; once it has expired, you will receive invalid_grant as the response.
Could you paste here the JSON payload your code constructs? That'd make spotting issues easier.
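For comparison, here is a minimal Java sketch of how an RS256 JWT assertion is put together: base64url(header) + "." + base64url(claims), then an RSA-SHA256 signature over that string, base64url-encoded and appended as a third segment. Assumptions are labeled: the class and method names are hypothetical, the key pair is a throwaway generated on the spot (not the .p12 key from the question), the iss value is a placeholder, and exp/iat are written as JSON numbers as RFC 7519 describes. Note that the code in the question posts the variable jwt (the raw, unsigned head.claims string) as the assertion rather than jwtresult; the thread does not confirm whether that is the cause of the error.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical sketch of RS256 JWT assembly; not the Stub.Jwt library from the question.
class JwtAssertionSketch {

    // base64url without '=' padding, as used for JWT segments.
    static String b64url(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // Returns header.claims.signature, each segment base64url-encoded.
    static String sign(String headerJson, String claimsJson, PrivateKey key) {
        String signingInput = b64url(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + b64url(claimsJson.getBytes(StandardCharsets.UTF_8));
        try {
            Signature rsa = Signature.getInstance("SHA256withRSA");
            rsa.initSign(key);
            rsa.update(signingInput.getBytes(StandardCharsets.US_ASCII));
            return signingInput + "." + b64url(rsa.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks the third segment against the first two.
    static boolean verify(String jwt, PublicKey key) {
        int lastDot = jwt.lastIndexOf('.');
        try {
            Signature rsa = Signature.getInstance("SHA256withRSA");
            rsa.initVerify(key);
            rsa.update(jwt.substring(0, lastDot).getBytes(StandardCharsets.US_ASCII));
            return rsa.verify(Base64.getUrlDecoder().decode(jwt.substring(lastDot + 1)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo-only key pair; a real service account would load its own private key.
    static KeyPair throwawayKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long iat = System.currentTimeMillis() / 1000L;
        String header = "{\"alg\":\"RS256\",\"typ\":\"JWT\"}";
        String claims = "{\"iss\":\"example@developer.gserviceaccount.com\"," // placeholder
                + "\"scope\":\"https://picasaweb.google.com/data/\","
                + "\"aud\":\"https://accounts.google.com/o/oauth2/token\","
                + "\"exp\":" + (iat + 300) + ",\"iat\":" + iat + "}";
        System.out.println(sign(header, claims, throwawayKeyPair().getPrivate()));
    }
}
```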

Google Spellcheck

I'm unable to access the Google spell check service located at this address:
https://www.google.com/tbproxy/spell
Is anyone else having this problem? I keep getting "bad gateway" when I try to connect. I'm pretty sure the service is offline.
Is there any news on what's going on? I know Google Drive went down a few weeks ago with the same set of error messages.
You can try the Java code below. It doesn't require an API key. Please note, though, that if you run it frequently it will stop working, because Google blocks the IP address from making further calls. You can use it on a small data set. It's not an ideal solution, but if it is part of a batch job that only runs once in a while, this approach may be acceptable to you.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.net.URL;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Place this method inside a class of your choice.
public static String getSpellCheckedText(String text) throws Exception {
    String google = "http://www.google.com/complete/search?output=toolbar&q=";
    String charset = "UTF-8";
    String spellCheckedText = text;
    URL url = new URL(google + URLEncoder.encode(text, charset));
    // Read the whole XML response; try-with-resources closes the stream.
    StringBuilder sBuffer = new StringBuilder();
    try (BufferedReader bufReader = new BufferedReader(new InputStreamReader(url.openStream(), charset))) {
        String line;
        while ((line = bufReader.readLine()) != null) {
            sBuffer.append(line).append("\n");
        }
    }
    String content = sBuffer.toString();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource is = new InputSource(new StringReader(content));
    Document document = builder.parse(is);
    // Use the "data" attribute of the first <suggestion> element, if there is one.
    NodeList nodeList = document.getElementsByTagName("suggestion");
    if (nodeList != null && nodeList.getLength() > 0) {
        Node elm = nodeList.item(0);
        if (elm.getNodeType() == Node.ELEMENT_NODE) {
            Element suggestionElement = (Element) elm;
            String suggestedString = suggestionElement.getAttribute("data");
            if (suggestedString != null && suggestedString.trim().length() != 0) {
                spellCheckedText = suggestedString.trim();
                System.out.println(text + " => " + spellCheckedText);
            }
        }
    }
    return spellCheckedText;
}
I am also having this problem. I am getting a 503 Server Error. The problem is definitely on Google's end. (N.B. I am on Safari 6.0.3)
Specifically:
503. That's an error.
The service you requested is not available at this time.
Service error -27. That’s all we know.
It seems as though Google is having some problems with their services. Hopefully they fix it soon!
Ditto, here. I really depend on it to check spelling in text boxes. It says "Unable to connect to Google spelling servers. Please check your internet connection and try again"

Crawl Wikipedia using ASP.NET HttpWebRequest

I am new to web crawling, and I am using HttpWebRequest to crawl data from sites.
So far I have been able to crawl and get data from my WordPress site. This was simple user profile data (name, email, AIM ID, etc.).
Now, as an exercise, I want to crawl Wikipedia: I will take the value entered into a textbox at my end, search Wikipedia with it, and get the appropriate title(s) from the search results.
I have the following doubts/difficulties.
First, is this even possible? I have heard that Wikipedia has a robots.txt set up to block this, though I only heard it from a friend and am not sure.
I am using the same procedure I used earlier, but I am not getting the required results.
Thanks!
Update:
After some explanation and help from @svick, I tried the code below, but I am still not able to get any value (see the last line of the code, where I expect the HTML markup of the search results page):
string searchUrl = "http://en.wikipedia.org/w/index.php?search=Wikipedia&title=Special%3ASearch";
var postData = new StringBuilder();
postData.Append("search=" + model.Query);
postData.Append("&");
postData.Append("title" + "Special:Search");
byte[] data2 = Crawler.GetEncodedData(postData.ToString());
var webRequest = (HttpWebRequest)WebRequest.Create(searchUrl);
webRequest.Method = "POST";
webRequest.UserAgent = "Crawling HW (http://yassershaikh.com/contact-me/)";
webRequest.AllowAutoRedirect = false;
ServicePointManager.Expect100Continue = false;
Stream requestStream = webRequest.GetRequestStream();
requestStream.Write(data2, 0, data2.Length);
requestStream.Close();
var responseCsv = (HttpWebResponse)webRequest.GetResponse();
Stream response = responseCsv.GetResponseStream();
// Todo Parsing
var streamReader = new StreamReader(response);
string val = streamReader.ReadToEnd();
// val is empty !! <-- this is my problem !
And here is my GetEncodedData method definition:
public static byte[] GetEncodedData(string postData)
{
    var encoding = new ASCIIEncoding();
    byte[] data = encoding.GetBytes(postData);
    return data;
}
Please help me with this.
You probably don't need to use HttpWebRequest. Using WebClient (or HttpClient if you're on .NET 4.5) will be much easier for you.
robots.txt doesn't actually block anything. If a client doesn't support it (and .NET doesn't), it can access anything.
Wikipedia does block requests that don't have their User-Agent header set, and you should use an informative User-Agent string that includes your contact information.
A better way to access Wikipedia is to use its API rather than scraping. This way, you will get an answer that's specifically meant to be read by custom applications, formatted as XML or JSON. There are also dumps containing all information from Wikipedia available for download.
EDIT: The problem with your newly posted code is that your query returns a 302 Moved Temporarily response pointing to the searched article, if it exists. Either remove the line that sets AllowAutoRedirect to false, or add &fulltext=Search to your query, which means you won't get redirected.
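To make the two suggestions concrete, here is a small Java sketch that only builds the URLs (it makes no request). The class and method names are hypothetical. The first method adds fulltext=Search to the Special:Search URL from the question; the second targets the MediaWiki API's search list (action=query, list=search, srsearch), which returns JSON. Both use https rather than the http in the question, which is my assumption about what you would want today.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical URL builders for the two approaches mentioned above.
class WikiSearchUrls {

    // Search results page; fulltext=Search asks for the result list
    // instead of a redirect to an exactly matching article.
    static String fullTextSearchPage(String query) {
        return "https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&fulltext=Search";
    }

    // MediaWiki API search, returning JSON meant for programs.
    static String apiSearch(String query) {
        return "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srsearch="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fullTextSearchPage("web crawler"));
        System.out.println(apiSearch("web crawler"));
    }
}
```

Either URL can be fetched with a plain GET, as long as the request carries a descriptive User-Agent as noted above.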

Why am I getting System.FormatException: String was not recognized as a valid Boolean on a fraction of our customers' machines?

Our C# .NET software connects to an online app to deal with accounts and a shop. It does this using HttpWebRequest and HttpWebResponse.
An example of this interaction, and one area where the exception in the title has come from is:
var request = HttpWebRequest.Create(onlineApp + string.Format("isvalid.ashx?username={0}&password={1}", HttpUtility.UrlEncode(username), HttpUtility.UrlEncode(password))) as HttpWebRequest;
request.Method = "GET";
using (var response = request.GetResponse() as HttpWebResponse)
using (var ms = new MemoryStream())
{
    var responseStream = response.GetResponseStream();
    byte[] buffer = new byte[4096];
    int read;
    do
    {
        read = responseStream.Read(buffer, 0, buffer.Length);
        ms.Write(buffer, 0, read);
    } while (read > 0);
    ms.Position = 0;
    return Convert.ToBoolean(Encoding.ASCII.GetString(ms.ToArray()));
}
The online app will respond with either 'true' or 'false'. In all our testing it returns one of these values, but for a couple of customers (out of hundreds) we get this exception: System.FormatException: String was not recognized as a valid Boolean. That sounds like the response is being garbled by something.
If we ask them to go to the online app in their web browser, they see the correct response. The clients are usually on school networks, which can be fairly restrictive and are often behind proxy servers, but most cope fine once they've put the proxy details in or added a firewall exception.
Is there something that could be messing up the response from the server, or is something wrong with our code?
Indeed, it's possible that the return result is somehow different.
Is there any particular reason you are using that reasonably elaborate method of reading the response? Why not:
string data;
using(HttpWebResponse response = request.GetResponse() as HttpWebResponse){
    StreamReader str = new StreamReader(response.GetResponseStream());
    data = str.ReadToEnd();
    str.Close();
}
string cleanResult = data.Trim().ToLower();
// log this
return Convert.ToBoolean(cleanResult);
First thing to note is I would definitely use something like:
bool myBool = false;
// TryParse takes an out parameter; myBool stays false if the text is not a valid Boolean.
Boolean.TryParse(Encoding.ASCII.GetString(ms.ToArray()), out myBool);
return myBool;
It's not some localisation issue is it? It's expecting the Swahili version of 'true', and getting confused. Are all the sites in one country, with the same language, etc?
I'd add logging, as suggested by others, and see what results you're seeing.
I'd also lean towards changing the code as silky suggested, though with a few further changes from me (code 'smell' issues, IMO): use using around the stream reader, as well as the response.
Also, I don't think the use of as is appropriate in this instance. If the response can't be cast to HttpWebResponse (which, admittedly, is unlikely, but still), you'll get a NullRef exception on the response.GetResponseStream() bit, which is a vague error, and you've also lost the original line number. Using (HttpWebResponse)request.GetResponse() will give you a more accurate error, at the line where it actually occurred.
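Pulling the suggestions in this thread together (trim, normalise case, and surface what was actually received), here is a minimal Java sketch of a tolerant parser. The class and method names are hypothetical. The idea that a proxy might return an HTML page, or that a byte-order mark might precede the text, is an assumption used for illustration; nothing in the thread confirms what those customers are actually receiving, which is exactly why the unexpected body is included in the error.

```java
// Hypothetical tolerant parser for a body that should be exactly "true" or "false".
class BooleanResponseSketch {

    // Strips a leading byte-order mark and surrounding whitespace, then
    // compares case-insensitively. Anything else is reported with a preview
    // of the body so it can be logged and inspected.
    static boolean parse(String body) {
        String cleaned = body.replace("\uFEFF", "").trim();
        if (cleaned.equalsIgnoreCase("true")) {
            return true;
        }
        if (cleaned.equalsIgnoreCase("false")) {
            return false;
        }
        throw new IllegalArgumentException("Unexpected response body: " + preview(cleaned));
    }

    // Limits how much of an unexpected body ends up in the message.
    static String preview(String s) {
        return s.length() <= 80 ? s : s.substring(0, 80) + "...";
    }

    public static void main(String[] args) {
        System.out.println(parse(" True\r\n"));
        try {
            parse("<html><body>Proxy authentication required</body></html>");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing with the received text, rather than silently returning false, keeps a garbled response distinguishable from a genuine "false" from the server.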