Why am I getting System.FormatException: String was not recognized as a valid Boolean on a fraction of our customers machines? - httpwebrequest

Our c#.net software connects to an online app to deal with accounts and a shop. It does this using HttpWebRequest and HttpWebResponse.
An example of this interaction, and one area where the exception in the title has come from is:
var request = HttpWebRequest.Create(onlineApp + string.Format("isvalid.ashx?username={0}&password={1}", HttpUtility.UrlEncode(username), HttpUtility.UrlEncode(password))) as HttpWebRequest;
request.Method = "GET";
using (var response = request.GetResponse() as HttpWebResponse)
using (var ms = new MemoryStream())
{
var responseStream = response.GetResponseStream();
byte[] buffer = new byte[4096];
int read;
do
{
read = responseStream.Read(buffer, 0, buffer.Length);
ms.Write(buffer, 0, read);
} while (read > 0);
ms.Position = 0;
return Convert.ToBoolean(Encoding.ASCII.GetString(ms.ToArray()));
}
The online app will respond either 'true' or 'false'. In all our testing it gets one of these values, but for a couple of customers (out of hundreds) we get this exception System.FormatException: String was not recognized as a valid Boolean Which sounds like the response is being garbled by something. If we ask them to go to the online app in their web browser, they see the correct response. The clients are usually on school networks which can be fairly restrictive and often under proxy servers, but most cope fine once they've put the proxy details in or added a firewall exception. Is there something that could be messing up the response from the server, or is something wrong with our code?

Indeed, it's possible that the return result is somehow different.
Is there any particular reason you are doing the reasonably elaborate method of reading the repsonse there? Why not:
string data;
using(HttpWebResponse response = request.GetResponse() as HttpWebResponse){
StreamReader str = new StreamReader(response.GetResponseStream());
data = str.ReadToEnd();
str.Close();
}
string cleanResult = data.Trim().ToLower();
// log this
return Convert.ToBoolean(cleanResult);

First thing to note is I would definitely use something like:
bool myBool = false;
Boolean.TryParse(Encoding.ASCII.GetString(ms.ToArray()), myBool);
return myBool;

It's not some localisation issue is it? It's expecting the Swahili version of 'true', and getting confused. Are all the sites in one country, with the same language, etc?
I'd add logging, as suggested by others, and see what results you're seeing.
I'd also lean towards changing the code as silky suggested, though with a few further changes from me (code 'smell' issues, IMO); Use using around the stream reader, as well as the response.
Also, I don't think the use of as is appropriate in this instance. If the Response can't be cast to HttpWebResponse (which, admittedly is unlikely, but still) you'll get a NullRef exception on the response.GetResponseStream() bit which is both a vague error, and you've lost the original line number. Using (HttpWebResponse)request.GetResponse() will give you a more correct error, and the correct line number of the actual error.

Related

Appication goto Freeze when downloading a webpage

i wrote a function for download a webpage : function like:
public string GetWebPage(string sURL)
{
System.Net.WebResponse objResponse = null;
System.Net.WebRequest objRequest = null;
System.IO.StreamReader objStreamReader = null;
string sResultPage = null;
try
{
objRequest = System.Net.HttpWebRequest.Create(sURL);
objResponse = objRequest.GetResponse();
objStreamReader = new System.IO.StreamReader(objResponse.GetResponseStream());
sResultPage = objStreamReader.ReadToEnd();
return sResultPage;
}
catch (Exception ex)
{
return "";
}
}
But my problem is that. when this function working at that time application goto freeze (not response) and that time my can't not do any thing. How can i solve this problem. when downloading at time user can do other thing in my application.
Welcome to the world of blocking IO.
Consider the following:
You want your program to download a web page and then return the first 10 letters it finds in the source html. Your code might look like this:
...
string page = GetWebPage("http://example.com"); // download web page
page = page.Substring(0, 10);
Console.WriteLine(page);
....
When your program calls GetWebPage(), it must WAIT for the web page to be fully downloaded before it can possibly try to call Substring() - else it may try to get the substring before it actually downloads the letters.
Now consider your program. You've got lots of code - maybe a GUI interface running - and it's all executing line by line one instruction at a time. When your code calls GetWebPage(), it can't possibly continue executing additional code until that request is fully finished. Your entire program is waiting on that request to finish.
The problem can be solved in a few different ways, and the best solution depends on exactly what you're doing with your code. Ideally, your code needs to execute asynchronously. c# has methods that can handle a lot of this for you, but one way or another, you're going to want to start some work - downloading the web page in your case - and then continue executing code until your main thread is notified that the webpage is fully downloaded. Then your main thread can begin parsing the return value.
I'm assuming that since you've asked this question, you are very new to threads and concurrency in general. You have a lot of work to do. Here are some resources for you to read up about threading and implementing concurrency in c#:
C# Thread Introduction
.NET Asynchronous IO Design
the best was is to use thread
new Thread(download).Start(url);
and if your download page size is large use chunk logic.
HttpWebRequest ObjHttpWebRequest = (HttpWebRequest)HttpWebRequest.Create(Convert.ToString(url));
ObjHttpWebRequest.AddRange(99204);
ObjHttpWebRequest.Timeout = Timeout.Infinite;
ObjHttpWebRequest.Method = "get";
HttpWebResponse ObjHttpWebResponse = (HttpWebResponse)ObjHttpWebRequest.GetResponse();
Stream ObjStream = ObjHttpWebResponse.GetResponseStream();
StreamReader ObjStreamReader = new StreamReader(ObjStream);
byte[] buffer = new byte[1224];
int length = 0;
while ((length = ObjStream.Read(buffer, 0, buffer.Length)) > 0)
{
downloaddata += Encoding.GetEncoding(936).GetString(buffer);

Crawl Wikipedia using ASP.NET HttpWebRequest

I am new to Web Crawling, and I am using HttpWebRequest to crawl data from sites.
As of now I was successfully able to crawl and get data from my wordpress site. This data was a simple user profile data. (like name, email, AIM id etc...)
Now as an exercise I want to crawl wikipedia, where I will search using the value entered into textbox at my end and then crawl wikipedia with the search value and get the appropriate title(s) from the search.
Now I have the following doubts/difficulties.
Firstly, is this even possible ? I have heard that wiki has robot.txt setup to block this. Though I have heard this only from a friend and hence not sure.
I am using the same procedure I used earlier, but I am not getting the required results.
Thanks !
Update :
After some explanation and help from #svick, I tried the below code, but still not able to get any value (see last line of code, there I am expecting an html markup of the search result page)
string searchUrl = "http://en.wikipedia.org/w/index.php?search=Wikipedia&title=Special%3ASearch";
var postData = new StringBuilder();
postData.Append("search=" + model.Query);
postData.Append("&");
postData.Append("title" + "Special:Search");
byte[] data2 = Crawler.GetEncodedData(postData.ToString());
var webRequest = (HttpWebRequest)WebRequest.Create(searchUrl);
webRequest.Method = "POST";
webRequest.UserAgent = "Crawling HW (http://yassershaikh.com/contact-me/)";
webRequest.AllowAutoRedirect = false;
ServicePointManager.Expect100Continue = false;
Stream requestStream = webRequest.GetRequestStream();
requestStream.Write(data2, 0, data2.Length);
requestStream.Close();
var responseCsv = (HttpWebResponse)webRequest.GetResponse();
Stream response = responseCsv.GetResponseStream();
// Todo Parsing
var streamReader = new StreamReader(response);
string val = streamReader.ReadToEnd();
// val is empty !! <-- this is my problem !
and here is my GetEncodedData method defination.
public static byte[] GetEncodedData(string postData)
{
var encoding = new ASCIIEncoding();
byte[] data = encoding.GetBytes(postData);
return data;
}
Pls help me on this.
You probably don't need to use HttpWebRequest. Using WebClient (or HttpClient if you're on .Net 4.5) will be much easier for you.
robots.txt doesn't actually block anything. If something doesn't support it (and .Net doesn't support it), it can access anything.
Wikipedia does block requests that don't have their User-Agent header set. And you should use an informative User-Agent string with your contact information.
A better way to access Wikipedia is to use its API, rather than scraping. This way, you will get an answer that's specifically meant to be read by a custom applications, formatted as XML or JSON. There are also dumps containing all information from Wikipedia available for download.
EDIT: The problem with your newly posted code is that your query returns a 302 Moved Temporarily response to the searched article, if it exists. Either remove the line that forbids AllowAutoRedirect, or add &fulltext=Search to your query, which will mean you won't get redirected.

Stream text to client via handler ASP.NET

To get around twitters streaming API not having a crossdomain file to access it from client side( in this case Silverlight) I have made a Generic Handler file in a web project which basically downloads the stream from twitter and as it reads it, writes it to the client.
Here is the handler code:
context.Response.Buffer = false;
context.Response.ContentType = "text/plain";
WebRequest request = WebRequest.Create("http://stream.twitter.com/1/statuses/filter.json?locations=-180,-90,180,90");
request.Credentials = new NetworkCredential("username", "password");
StreamReader responseStream = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.GetEncoding("utf-8"));
while (!responseStream.EndOfStream)
{
string line = "(~!-/" + responseStream.ReadLine() + "~!-/)";
context.Response.BinaryWrite((Encoding.UTF8.GetBytes(line)));}
And this does work, but the problem is that once the client disconnects the handler just carry's on downloading. So how do I tell if the client is still busy receiving the request and if not, end the while loop?
Also, my second problem is that on the client side doing a "ReadLine()" does nothing, presumably because it is counting the entire stream as one line so never gets the full response. To work around that I read it byte by byte and when it sees "(~!-/" around something it know that is one line. VERY hacky, I know.
Thanks!
Found the answer!
while (context.Response.IsClientConnected)
:)

Optimizing HttpWebResponse - GetResponse

I'm using the following lines of code to read the response of an asynchronous HttpWebRequest. This seems to be the largest amount of time spent in a particular operation. Is there anything I can optimize here?
System.Net.HttpWebResponse oResp =(System.Net.HttpWebResponse)oReq.EndGetResponse(oResult);
oResp = (HttpWebResponse)oReq.GetResponse();
StreamReader oStreamReader = new StreamReader(oResp.GetResponseStream());
string sResponse = oStreamReader.ReadToEnd();
...goes on to make an XmlDocument, append some more XML to it, then perform an XSL transform.
Creating the Connections:
HttpWebRequest oReq;
oReq = (HttpWebRequest)WebRequest.Create(sUrl + sQueryString);
oReq.ContentType = sContentType;
oReq.Method = "POST";
oReq.ContentLength = aBytes.Length;
Stream oStream = oReq.GetRequestStream();
oStream.Write(aBytes, 0, aBytes.Length);
oStream.Close();
AsyncState oState = new AsyncState(oReq);
return oReq.BeginGetResponse(fCallBack, oState);
I found one major improvement to the scheme I was using. Rather than using the StreamReader and ReadToEnd to get the Stream into a string, only then to convert it into an XmlDocument. I skipped the middle man and converted the Stream directly into a XmlDocument.
This left me with another problem though, I had to change the parent of the XmlDocument to fit my Xslt (there are a great many and they all expect the structure I had). See How can I add new root element to a C# XmlDocument? for that fix.
This has given me a roughly 2/3 decrease in the time taken to process the results of the webservice call, and a great decrease in the amount of memory used. In the previous version, the response xml was in memory two different times (maybe three if the stream counts)!
In addition, removing the extra GetResponse seemed to help.
using (HttpWebResponse oResp = (HttpWebResponse)oReq.EndGetResponse(oResult))
{
oXml.Load(oResp.GetResponseStream());
XmlNode oApiResult = oXml.RemoveChild(oXml.DocumentElement);
oXml.LoadXml(sOtherXml);
oXml.DocumentElement.AppendChild(oApiResult);
}

Why am I getting a "double response" from HttpWebResponse?

The follow code (running in ASP.Net 2.0) displays the contents of the requested URL twice. I only want it to display the contents of the requested URL once. I can't figure out what I'm doing wrong. The URL requested is returning XML and if I visit the URL directly, it works fine.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
byte[] postDataBytes = Encoding.UTF8.GetBytes(postData);
request.Method = "POST";
request.ContentType = "application/xml";
request.ContentLength = postDataBytes.Length;
Stream requestStream = request.GetRequestStream();
requestStream.Write(postDataBytes, 0, postDataBytes.Length);
requestStream.Close();
// get response and write to console
response = (HttpWebResponse) request.GetResponse();
StreamReader responseReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
try {
Response.Write(responseReader.ReadToEnd());
}
finally {
responseReader.Close();
}
response.Close();
Your code looks good, so I don't think the problem is there... but what I would suggest is the following:
1) Maybe the error is on the URL's other end... so try hitting Google and see if the returned content is good or not.
2) Put a breakpoint at the "responseReader.ReadToEnd()" spot, and see if what's coming out of there is good.
3) If this code above is in an ASPX page... are you making sure to call "Response.End();" after you're last line of code? (not "resposne.close()", but "Response.End()").
I found the problem. It's not with the above code at all, but with the page being called. The page I was calling was inherited from a class whose Page_OnInit method contained the following line: "MyBase.OnLoad(e)", which caused the Page_OnLoad method to be executed twice. Obviously, it should have been MyBase.OnInit(e) instead. I didn't catch it because when I tested the page directly I had to temporarily remove the inheritance from the class because of some other code that would've have prevented me from testing the page directly.
I will now put on my "Dunce" hat and retreat to the corner for a time out. Thanks anyway for the help.