If I use Nightingale REST Client to download a CSV file, it works, but if I duplicate the configuration and try to get the file using VB.NET, it fails.
Here is a screencap of my Nightingale config and a successful response:
This is my code to try and do the same thing:
Dim values As NameValueCollection
Dim result As Byte()
Using client = New WebClient()
client.Headers.Add("Content-Type", "application/x-www-form-urlencoded")
values = New NameValueCollection From {{"_EVENTARGUMENT", "CSV,Export,,M"}, {"_EVENTTARGET", "dnn$ctr520$dnn"}, {"txtFlt_AI", ""}, {"txtFlt_CoName", ""}, {"cboFlt_ParishName", ""}, {"cboFlt_Fyear", ""}, {"ScrollTop", ""}}
result = client.UploadValues("https://internet.deq.louisiana.gov/portal/DIVISIONS/UNDER-GROUND-STORAGE-TANK/CURRENT-UST-TANK-CERTIFICATES", "POST", values)
End Using
Instead of the CSV file, I get the web page that this URL normally retrieves. What am I missing here? Thanks.
It's hard to tell from just this info. I'd make sure you send all the headers the working client sends; you might have to set the User-Agent and, along with it, the Referer.
Also look into your authentication: if you receive any sort of token or cookie, you'll need to store it and pass it along too.
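Here is a minimal Java sketch of that suggestion. (In VB.NET the same values would go into client.Headers.) It builds a form-urlencoded POST that carries User-Agent, Referer and, if you have one, a Cookie header. The class and method names (FormPost, buildFormBody, buildPost) are hypothetical. The header values are placeholders: copy the real ones from the request that works in the REST client.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class FormPost {
    // Encode fields as application/x-www-form-urlencoded, in the map's iteration order.
    static String buildFormBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build a POST carrying browser-like headers (placeholder values supplied by the caller).
    static HttpRequest buildPost(URI uri, Map<String, String> fields,
                                 String userAgent, String referer, String cookieHeader) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("User-Agent", userAgent)
                .header("Referer", referer)
                .POST(HttpRequest.BodyPublishers.ofString(buildFormBody(fields)));
        if (cookieHeader != null && !cookieHeader.isEmpty()) {
            // Session cookie captured from an earlier GET of the same page.
            builder.header("Cookie", cookieHeader);
        }
        return builder.build();
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray()). The response's Content-Type or Content-Disposition header may then tell you whether you got the file or the page again.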
I have a bot running in an online game which the staff use to draw out the map's levels. It uploads these pictures to Imgur using its API, but Imgur isn't always up and running, so I'd like to add a second host as a backup.
I discovered this site which looks like a fine choice: https://boring.host/api
To get an API key, sign up (fast, no confirmation needed) then go to "My Account" by clicking your name on the top right.
Looking at the API info, it looks like I just need to send an HTTP POST to "https://api.boring.host/api/v1/upload" and include the "api_token" and "file" parameters.
When I upload to Imgur, I write the image to a MemoryStream and then convert the bytes to a Base64 string:
Convert.ToBase64String(bytImage)
So I tried the same thing for this:
Public Function UploadBoringHost(ByVal bm As Bitmap) As String
'Write image to memorystream
Dim bytImage As Byte()
Using stream As New System.IO.MemoryStream
bm.Save(stream, System.Drawing.Imaging.ImageFormat.Png)
bytImage = stream.ToArray
End Using
'Upload
Using client As New WebClient
Dim reqparm As New Specialized.NameValueCollection
reqparm.Add("api_token", strApiKey)
reqparm.Add("file", Convert.ToBase64String(bytImage))
Dim responsebytes = client.UploadValues("https://api.boring.host/api/v1/upload", "POST", reqparm)
Dim responsebody = (New Text.UTF8Encoding).GetString(responsebytes)
Return responsebody
End Using
End Function
Trying it this way, I get the following web exception: "The remote server returned an error: (422) Unprocessable Entity"
My only guess is that it's expecting the image data to be passed through in some other format. Any ideas?
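The thread has no answer here. One plausible cause, stated as an assumption rather than a confirmed fix: upload endpoints like this often expect the file as a multipart/form-data file part containing the raw bytes and a filename. UploadValues instead sends a form-urlencoded text field. The Java sketch below (hypothetical MultipartBody class) shows what such a body looks like. In .NET, MultipartFormDataContent from System.Net.Http builds the same structure. Whether boring.host accepts the upload this way is unverified.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartBody {
    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartBody(String boundary) {
        this.boundary = boundary;
    }

    // A plain text field, e.g. api_token.
    MultipartBody addField(String name, String value) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"\r\n\r\n");
        write(value + "\r\n");
        return this;
    }

    // A file field: raw bytes plus a filename and content type (no Base64).
    MultipartBody addFile(String name, String filename, String contentType, byte[] data) {
        write("--" + boundary + "\r\n");
        write("Content-Disposition: form-data; name=\"" + name + "\"; filename=\"" + filename + "\"\r\n");
        write("Content-Type: " + contentType + "\r\n\r\n");
        out.writeBytes(data);
        write("\r\n");
        return this;
    }

    // Append the closing boundary and return the finished body (call once).
    byte[] build() {
        write("--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    // Value for the request's Content-Type header.
    String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }

    private void write(String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The body would be sent with the request's Content-Type set to contentType(). The boundary must be a string that never occurs inside the image bytes, so real code usually generates a long random one.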
I am struggling to get a cookie from a website. When I alert the cookie, it just returns "System.Net.CookieContainer". Here's how I am trying to get the cookie:
'get the cookie for the post request !important
Dim req As HttpWebRequest = DirectCast(WebRequest.Create("http://www.dailymail.co.uk/home/index.html"), HttpWebRequest)
req.Method = "GET"
'initiate the cookie container for the post request; it must be attached to the
'request, otherwise the response's Cookies collection is not populated
Dim tmpcookie As New CookieContainer
req.CookieContainer = tmpcookie
'get the cookie.
Dim postcookie = DirectCast(req.GetResponse(), HttpWebResponse)
tmpcookie.Add(postcookie.Cookies)
'assign the cookie to use outside the scope (background worker)
textcookie = tmpcookie.ToString()
But when I alert textcookie, I get what I described above. :(
tmpcookie is a CookieContainer. You're calling ToString on a CookieContainer, it does what it's specified to do: output the fully qualified type name, "System.Net.CookieContainer".
It's like doing (New List(Of Object)).ToString(): it outputs the type name, "System.Collections.Generic.List`1[System.Object]", not a string representing every item in that list.
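You'll want to iterate over the cookies in that container and build a string from each individual cookie. CookieContainer.GetCookies(uri) returns a CookieCollection for a given URI that you can loop over. If you just need the Cookie header form, CookieContainer.GetCookieHeader(uri) returns it directly as "name=value; name2=value2".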
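The same pitfall exists in Java, where java.net.CookieManager plays the role of CookieContainer. Here is a minimal sketch of the suggested fix: it builds the string from each cookie's name and value instead of calling toString() on the container. The CookieText class is hypothetical.

```java
import java.net.HttpCookie;
import java.util.List;
import java.util.stream.Collectors;

class CookieText {
    // Build "name=value; name2=value2" (Cookie request-header style) from the
    // individual cookies rather than printing the container object.
    static String cookieHeader(List<HttpCookie> cookies) {
        return cookies.stream()
                .map(c -> c.getName() + "=" + c.getValue())
                .collect(Collectors.joining("; "));
    }
}
```

With a CookieManager, you would pass manager.getCookieStore().getCookies() to cookieHeader.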
Using Wicket 6.17 and Servlet 2.5, I have a form that allows file upload and also has ReCaptcha (using Recaptcha4j). When the form has ReCaptcha without file upload, it works properly using the code:
final HttpServletRequest servletRequest = (HttpServletRequest ) ((WebRequest) getRequest()).getContainerRequest();
final String remoteAddress = servletRequest.getRemoteAddr();
final String challengeField = servletRequest.getParameter("recaptcha_challenge_field");
final String responseField = servletRequest.getParameter("recaptcha_response_field");
to get the challenge and response fields so that they can be validated.
This doesn't work when the form has the file upload because the form must be multipart for the upload to work, and so when I try to get the parameters in that fashion, it fails.
I have tried getting the parameters differently using ServletFileUpload:
ServletFileUpload fileUpload = new ServletFileUpload(new DiskFileItemFactory(new FileCleaner()) );
String response = IOUtils.toString(servletRequest.getInputStream());
and
ServletFileUpload fileUpload = new ServletFileUpload(new DiskFileItemFactory(new FileCleaner()) );
List<FileItem> requests = fileUpload.parseRequest(servletRequest);
Both of these always come back empty.
Using Chrome's network console, I see the values that I'm looking for in the Request Payload, so I know that they are there somewhere.
Any advice on why the requests are coming back empty and how to find them would be greatly appreciated.
Update: I have also tried making the ReCaptcha component multipart while leaving out the file upload. The result is the same (the response is empty), which supports my original conclusion that the multipart form submission is the problem.
Thanks to the Wicket in Action book, I have found the solution:
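// webRequest is the current request, e.g. obtained inside the Form
ServletWebRequest webRequest = (ServletWebRequest) getRequest();
MultipartServletWebRequest multiPartRequest = webRequest.newMultipartWebRequest(getMaxSize(), "ignored");
// multiPartRequest.parseFileParts(); // this is needed since Wicket 6.19.0+
IRequestParameters params = multiPartRequest.getRequestParameters();
String challengeField = params.getParameterValue("recaptcha_challenge_field").toString();
String responseField = params.getParameterValue("recaptcha_response_field").toString();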
This lets me read the values using the getParameterValue() method.
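Why does getParameter() come back empty in the first place? A Servlet 2.5 container only turns application/x-www-form-urlencoded POST bodies into request parameters. In a multipart/form-data body, each field is a separate part instead. The toy parser below (hypothetical MultipartFields class, for illustration only) shows where recaptcha_response_field actually lives. It assumes a small text body and skips file parts; in a Wicket form you should still use MultipartServletWebRequest as above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultipartFields {
    private static final Pattern NAME = Pattern.compile("name=\"([^\"]*)\"");

    // Extract simple (non-file) fields from a multipart/form-data body held as a String.
    // A real parser also handles streaming, charsets and binary file parts.
    static Map<String, String> parse(String body, String boundary) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : body.split("--" + Pattern.quote(boundary))) {
            int split = part.indexOf("\r\n\r\n"); // blank line between part headers and value
            if (split < 0) {
                continue; // preamble or the closing "--" marker
            }
            String headers = part.substring(0, split);
            if (headers.contains("filename=")) {
                continue; // skip file uploads
            }
            Matcher m = NAME.matcher(headers);
            if (!m.find()) {
                continue;
            }
            String value = part.substring(split + 4);
            if (value.endsWith("\r\n")) {
                value = value.substring(0, value.length() - 2);
            }
            fields.put(m.group(1), value);
        }
        return fields;
    }
}
```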
I am very new to VB.NET, and I'm trying to do something that I can do easily in classic VB: get the source HTML of a web page from its URL.
I'm using VB.NET in Visual Studio Express for Windows 8.
I've read a lot about HttpWebRequest, but I can't get it to work properly.
At one point I did have it returning the HTML header, but I want the content of the page. Now I can't even get it back to giving me the header. Ultimately, I want to process the returned HTML, which I'll do (to begin with) the old-fashioned way by treating it as a string, but for now I'd just like to get the page.
The code I've got is:
Dim URL As String = "http://www.crayola.com/"
Dim request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create(New Uri(URL))
txtHTML.Text = request.GetRequestStreamAsync().ToString()
Can anyone help me with an example to get me going please?
You're trying to use an Async method synchronously, which won't work: the call returns a Task, and calling ToString() on it just gives you the Task's type name. If you're using .NET 4.5, you can mark the calling method with Async and then use the Await keyword when calling GetRequestStreamAsync.
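Public Async Function MyDownloaderMethod() As Task
Dim URL As String = "http://www.crayola.com/"
Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(New Uri(URL)), System.Net.HttpWebRequest)
' Use the Await keyword to wait for the async task to complete.
Using response = Await request.GetResponseAsync()
' Read the response stream; calling ToString() on the stream would only return its type name.
Using reader As New System.IO.StreamReader(response.GetResponseStream())
txtHTML.Text = Await reader.ReadToEndAsync()
End Using
End Using
End Function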
See the following MSDN article for more information on async programming with the Await keyword: http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx
Edit
You are receiving your error because you're trying to get the Request stream (what you send the server), and what you really want is the Response stream (what the server sends back to you). I've updated my code to get the WebResponse from your WebRequest and then retrieve the stream from that.
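The request/response distinction above also explains why ToString() on a stream never yields HTML: it describes the stream object rather than reading it. Here is a minimal Java illustration (hypothetical ResponseText helper). With java.net.http, HttpResponse.BodyHandlers.ofString() does this reading for you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResponseText {
    // Read the whole response stream and decode it as UTF-8 text.
    // Calling toString() on the stream itself only describes the stream object.
    static String readBody(InputStream in) {
        try (in) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```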
Public Shared Function GetWebPageString(ByVal address As Uri) As String
Using client As New Net.WebClient()
Return client.DownloadString(address)
End Using
End Function
There is also DownloadStringAsync if you don't want to block.
request.GetRequestStreamAsync() does exist in .NET 4.5, but it returns the stream you write the request body to, not the page. Try request.GetResponse() to get a response object; from the response you can get the stream and convert it to text if you need to.
This worked for me in VB.NET with .NET 4.5 (it needs Imports System.Net.Http):
Public Async Sub GetHTML(pageURL As String)
Using client As New HttpClient()
Dim pageHTML As String = Await client.GetStringAsync(pageURL)
MsgBox(pageHTML)
End Using
End Sub
I am new to Web Crawling, and I am using HttpWebRequest to crawl data from sites.
So far I have been able to crawl and get data from my WordPress site: simple user profile data (name, email, AIM ID, etc.).
Now, as an exercise, I want to crawl Wikipedia: I will take the value entered into a textbox in my app, search Wikipedia for it, and get the matching title(s) from the search results.
I have the following questions:
Firstly, is this even possible? I have heard that Wikipedia has a robots.txt set up to block this, though I heard this only from a friend and am not sure.
I am using the same procedure I used earlier, but I am not getting the required results.
Thanks!
Update :
After some explanation and help from @svick, I tried the code below, but I'm still not able to get any value (see the last line of code, where I expect the HTML markup of the search results page):
string searchUrl = "http://en.wikipedia.org/w/index.php?search=Wikipedia&title=Special%3ASearch";
var postData = new StringBuilder();
postData.Append("search=" + Uri.EscapeDataString(model.Query));
postData.Append("&");
postData.Append("title=Special:Search");
byte[] data2 = Crawler.GetEncodedData(postData.ToString());
var webRequest = (HttpWebRequest)WebRequest.Create(searchUrl);
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.UserAgent = "Crawling HW (http://yassershaikh.com/contact-me/)";
webRequest.AllowAutoRedirect = false;
ServicePointManager.Expect100Continue = false;
Stream requestStream = webRequest.GetRequestStream();
requestStream.Write(data2, 0, data2.Length);
requestStream.Close();
var responseCsv = (HttpWebResponse)webRequest.GetResponse();
Stream response = responseCsv.GetResponseStream();
// Todo Parsing
var streamReader = new StreamReader(response);
string val = streamReader.ReadToEnd();
// val is empty !! <-- this is my problem !
And here is my GetEncodedData method definition:
public static byte[] GetEncodedData(string postData)
{
var encoding = new ASCIIEncoding();
byte[] data = encoding.GetBytes(postData);
return data;
}
Please help me with this.
You probably don't need to use HttpWebRequest. Using WebClient (or HttpClient if you're on .NET 4.5) will be much easier for you.
robots.txt doesn't actually block anything; it's a convention that crawlers are expected to honor voluntarily. A client that doesn't check it (and .NET's HTTP classes don't) can still access everything.
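Because honoring robots.txt is up to the client, a polite crawler has to fetch and check the file itself. Here is a deliberately simplified Java sketch (hypothetical RobotsTxt class). It only reads Disallow lines in "User-agent: *" groups and treats each User-agent line as starting a new group. It also ignores Allow lines, wildcards and agent-specific rules. The sample rules in the test are made up, not Wikipedia's actual file.

```java
import java.util.ArrayList;
import java.util.List;

class RobotsTxt {
    // Collect the Disallow path prefixes that apply to every crawler ("User-agent: *").
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean inStarGroup = false;
        for (String raw : robotsTxt.split("\r?\n")) {
            String line = raw.replaceAll("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (line.isEmpty() || colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                inStarGroup = value.equals("*");
            } else if (key.equals("disallow") && inStarGroup && !value.isEmpty()) {
                rules.add(value);
            }
        }
        return rules;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedForAll(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```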
Wikipedia does block requests that don't set a User-Agent header, and you should use an informative User-Agent string that includes your contact information.
A better way to access Wikipedia is to use its API rather than scraping. That way you get an answer that's specifically meant to be read by custom applications, formatted as XML or JSON. There are also dumps containing all the information from Wikipedia available for download.
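Here is a minimal Java sketch of an API search request. It assumes the standard MediaWiki endpoint https://en.wikipedia.org/w/api.php with action=query&list=search, whose JSON response lists the matching titles under query.search. The WikiApi class name is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class WikiApi {
    // Full-text search through the MediaWiki API; results come back as JSON
    // with one entry (including "title") per hit under query.search.
    static URI searchUri(String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create("https://en.wikipedia.org/w/api.php"
                + "?action=query&list=search&format=json&srsearch=" + q);
    }
}
```

The request should still carry an informative User-Agent header, as noted above.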
EDIT: The problem with your newly posted code is that, if an article matching the query exists, the search returns a 302 Moved Temporarily redirect to that article. Because you disabled automatic redirects, you read that redirect response instead of a results page. Either remove the line that sets AllowAutoRedirect = false, or add &fulltext=Search to your query so that you won't get redirected.
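Both fixes are sketched below in Java with java.net.http (hypothetical WikiSearch class). Note that this swaps the question's POST for a GET with the search terms in the query string. HttpClient.Redirect.NORMAL is Java's counterpart to leaving AllowAutoRedirect on.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class WikiSearch {
    // Option 1: follow redirects (NORMAL never downgrades from HTTPS to HTTP).
    static HttpClient redirectingClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Option 2: request the results page itself; fulltext=Search keeps
    // Special:Search from redirecting to an exactly matching article.
    static HttpRequest searchRequest(String query, String userAgent) {
        String url = "https://en.wikipedia.org/w/index.php?title=Special:Search&fulltext=Search&search="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent) // informative UA with contact info
                .GET()
                .build();
    }
}
```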