I sucessfully display a web site on WebView2 in my VB.net (Visual Studio 2017) project but can not get html souce code. Please advise me how to get html code.
My code:
Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub
Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
Dim html As String = ?????
End Sub
Thank you indeed for your advise in advance.
I've only just started messing with the WebView2 earlier today as well, and was just looking for this same thing. I did manage to scrape together this solution:
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' The Html comes back with unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)
Converted my code from C# to VB on the fly, so hopefully didn't miss any syntax errors.
Adding to #Xaviorq8 answer, you can use Span to get rid of generating new strings with Remove:
html = Regex.Unescape(html)
html = html.AsSpan()[1..^1].ToString();
I must credit #Xaviorq8; his answer was needed to solve my problem.
I was successfully using .NET WebBrowser and Html Agility Pack but I wanted to replace WebBrowser with .NET WebView2.
Snippet (working code with WebBrowser):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
Snippet (failing code with WebView2):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
Success withWebView2 and Html Agility Pack
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to #Xaviorq8 answer (next 3 lines)
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[#id=\"apptAndReportsTbl\"]");
The accepted answer is on the right track. However, it's missing on important thing:
The returned string is NOT HTMLEncoded, it's JSON!
So to do it right, you need to deserialize the JSON, which is just as simple:
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
html = Await JsonSerializer.DeserializeAsync(Of String)(html);
I am wondering if it is possible to obtain captcha image from site using web request. The thing is, that this captcha image is changing every time I refresh the page, for example:
http://zapytaj.onet.pl/captcha/obrazek.jpg?6165303
As you can notice, if you refresh the page url stays the same, but the code is changed.
The implementation of captcha code on page is:
<img id="captcha-image" src="/captcha/obrazek.jpg?5842454"/>
<div>
Complete captcha (Refresh)
<input type="text" name="captcha_code" class="text">
Url link to every captcha is changing every time I refresh to. So even if I get the captcha url it will not work, because the image is randomly generated.
I managed something like this:
Dim dt1970 As DateTime = New DateTime(1970, 1, 1)
Dim current As Date = DateTime.Now
Dim span As TimeSpan = current - dt1970
Dim requestUriString As String = "http://zapytaj.onet.pl/captcha/obrazek.jpg?" + (Convert.ToInt64(DateTime.Now) / span.TotalMilliseconds).ToString() + ""
Dim httpWebRequest As HttpWebRequest = CType(WebRequest.Create(requestUriString), HttpWebRequest)
httpWebRequest.CookieContainer = Me.cookieJar(4)
Dim httpWebResponse As HttpWebResponse = CType(httpWebRequest.GetResponse(), HttpWebResponse)
Dim image As Image = Image.FromStream(httpWebResponse.GetResponseStream())
httpWebResponse.Close()
Me.PictureBox1.SizeMode = PictureBoxSizeMode.StretchImage
Me.PictureBox1.Image = image
I should also add the bitwise AND operation, but I think that would not work anyway. I think because of differences in DateTime values between my app and website.
If I take only the http://zapytaj.onet.pl/captcha/obrazek.jpg as requestUriString I got the captcha image, but as I said its not that one.
The url with captcha image:
http://zapytaj.onet.pl/register.html#register-email-form
We have the requirement to take a form submission and save some data, then redirect the user to a page offsite, but in redirecting, we need to "submit" a form with POST, not GET.
I was hoping there was an easy way to accomplish this, but I'm starting to think there isn't. I think I must now create a simple other page, with just the form that I want, redirect to it, populate the form variables, then do a body.onload call to a script that merely calls document.forms[0].submit();
Can anyone tell me if there is an alternative? We might need to tweak this later in the project, and it might get sort of complicated, so if there was an easy we could do this all non-other page dependent that would be fantastic.
Anyway, thanks for any and all responses.
Doing this requires understanding how HTTP redirects work. When you use Response.Redirect(), you send a response (to the browser that made the request) with HTTP Status Code 302, which tells the browser where to go next. By definition, the browser will make that via a GET request, even if the original request was a POST.
Another option is to use HTTP Status Code 307, which specifies that the browser should make the redirect request in the same way as the original request, but to prompt the user with a security warning. To do that, you would write something like this:
public void PageLoad(object sender, EventArgs e)
{
// Process the post on your side
Response.Status = "307 Temporary Redirect";
Response.AddHeader("Location", "http://example.com/page/to/post.to");
}
Unfortunately, this won't always work. Different browsers implement this differently, since it is not a common status code.
Alas, unlike the Opera and FireFox developers, the IE developers have never read the spec, and even the latest, most secure IE7 will redirect the POST request from domain A to domain B without any warnings or confirmation dialogs! Safari also acts in an interesting manner, while it does not raise a confirmation dialog and performs the redirect, it throws away the POST data, effectively changing 307 redirect into the more common 302.
So, as far as I know, the only way to implement something like this would be to use Javascript. There are two options I can think of off the top of my head:
Create the form and have its action attribute point to the third-party server. Then, add a click event to the submit button that first executes an AJAX request to your server with the data, and then allows the form to be submitted to the third-party server.
Create the form to post to your server. When the form is submitted, show the user a page that has a form in it with all of the data you want to pass on, all in hidden inputs. Just show a message like "Redirecting...". Then, add a javascript event to the page that submits the form to the third-party server.
Of the two, I would choose the second, for two reasons. First, it is more reliable than the first because Javascript is not required for it to work; for those who don't have it enabled, you can always make the submit button for the hidden form visible, and instruct them to press it if it takes more than 5 seconds. Second, you can decide what data gets transmitted to the third-party server; if you use just process the form as it goes by, you will be passing along all of the post data, which is not always what you want. Same for the 307 solution, assuming it worked for all of your users.
You can use this aproach:
Response.Clear();
StringBuilder sb = new StringBuilder();
sb.Append("<html>");
sb.AppendFormat(#"<body onload='document.forms[""form""].submit()'>");
sb.AppendFormat("<form name='form' action='{0}' method='post'>",postbackUrl);
sb.AppendFormat("<input type='hidden' name='id' value='{0}'>", id);
// Other params go here
sb.Append("</form>");
sb.Append("</body>");
sb.Append("</html>");
Response.Write(sb.ToString());
Response.End();
As result right after client will get all html from server the event onload take place that triggers form submit and post all data to defined postbackUrl.
HttpWebRequest is used for this.
On postback, create a HttpWebRequest to your third party and post the form data, then once that is done, you can Response.Redirect wherever you want.
You get the added advantage that you don't have to name all of your server controls to make the 3rd parties form, you can do this translation when building the POST string.
string url = "3rd Party Url";
StringBuilder postData = new StringBuilder();
postData.Append("first_name=" + HttpUtility.UrlEncode(txtFirstName.Text) + "&");
postData.Append("last_name=" + HttpUtility.UrlEncode(txtLastName.Text));
//ETC for all Form Elements
// Now to Send Data.
StreamWriter writer = null;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postData.ToString().Length;
try
{
writer = new StreamWriter(request.GetRequestStream());
writer.Write(postData.ToString());
}
finally
{
if (writer != null)
writer.Close();
}
Response.Redirect("NewPage");
However, if you need the user to see the response page from this form, your only option is to utilize Server.Transfer, and that may or may not work.
Something new in ASP.Net 3.5 is this "PostBackUrl" property of ASP buttons. You can set it to the address of the page you want to post directly to, and when that button is clicked, instead of posting back to the same page like normal, it instead posts to the page you've indicated. Handy. Be sure UseSubmitBehavior is also set to TRUE.
This should make life much easier.
You can simply use Response.RedirectWithData(...) method in your web application easily.
Imports System.Web
Imports System.Runtime.CompilerServices
Module WebExtensions
<Extension()> _
Public Sub RedirectWithData(ByRef aThis As HttpResponse, ByVal aDestination As String, _
ByVal aData As NameValueCollection)
aThis.Clear()
Dim sb As StringBuilder = New StringBuilder()
sb.Append("<html>")
sb.AppendFormat("<body onload='document.forms[""form""].submit()'>")
sb.AppendFormat("<form name='form' action='{0}' method='post'>", aDestination)
For Each key As String In aData
sb.AppendFormat("<input type='hidden' name='{0}' value='{1}' />", key, aData(key))
Next
sb.Append("</form>")
sb.Append("</body>")
sb.Append("</html>")
aThis.Write(sb.ToString())
aThis.End()
End Sub
End Module
Thought it might interesting to share that heroku does this with it's SSO to Add-on providers
An example of how it works can be seen in the source to the "kensa" tool:
https://github.com/heroku/kensa/blob/d4a56d50dcbebc2d26a4950081acda988937ee10/lib/heroku/kensa/post_proxy.rb
And can be seen in practice if you turn of javascript. Example page source:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Heroku Add-ons SSO</title>
</head>
<body>
<form method="POST" action="https://XXXXXXXX/sso/login">
<input type="hidden" name="email" value="XXXXXXXX" />
<input type="hidden" name="app" value="XXXXXXXXXX" />
<input type="hidden" name="id" value="XXXXXXXX" />
<input type="hidden" name="timestamp" value="1382728968" />
<input type="hidden" name="token" value="XXXXXXX" />
<input type="hidden" name="nav-data" value="XXXXXXXXX" />
</form>
<script type="text/javascript">
document.forms[0].submit();
</script>
</body>
</html>
PostbackUrl can be set on your asp button to post to a different page.
if you need to do it in codebehind, try Server.Transfer.
#Matt,
You can still use the HttpWebRequest, then direct the response you receive to the actual outputstream response, this would serve the response back to the user. The only issue is that any relative urls would be broken.
Still, that may work.
I suggest building an HttpWebRequest to programmatically execute your POST and then redirect after reading the Response if applicable.
Here's what I'd do :
Put the data in a standard form (with no runat="server" attribute) and set the action of the form to post to the target off-site page.
Before submitting I would submit the data to my server using an XmlHttpRequest and analyze the response. If the response means you should go ahead with the offsite POSTing then I (the JavaScript) would proceed with the post otherwise I would redirect to a page on my site
In PHP, you can send POST data with cURL. Is there something comparable for .NET?
Yes, HttpWebRequest, see my post below.
The GET (and HEAD) method should never be used to do anything that has side-effects. A side-effect might be updating the state of a web application, or it might be charging your credit card. If an action has side-effects another method (POST) should be used instead.
So, a user (or their browser) shouldn't be held accountable for something done by a GET. If some harmful or expensive side-effect occurred as the result of a GET, that would be the fault of the web application, not the user. According to the spec, a user agent must not automatically follow a redirect unless it is a response to a GET or HEAD request.
Of course, a lot of GET requests do have some side-effects, even if it's just appending to a log file. The important thing is that the application, not the user, should be held responsible for those effects.
The relevant sections of the HTTP spec are 9.1.1 and 9.1.2, and 10.3.
Typically, all you'll ever need is to carry some state between these two requests. There's actually a really funky way to do this which doesn't rely on JavaScript (think <noscript/>).
Set-Cookie: name=value; Max-Age=120; Path=/redirect.html
With that cookie there, you can in the following request to /redirect.html retrieve the name=value info, you can store any kind of information in this name/value pair string, up to say 4K of data (typical cookie limit). Of course you should avoid this and store status codes and flag bits instead.
Upon receiving this request you in return respond with a delete request for that status code.
Set-Cookie: name=value; Max-Age=0; Path=/redirect.html
My HTTP is a bit rusty I've been going trough RFC2109 and RFC2965 to figure how reliable this really is, preferably I would want the cookie to round trip exactly once but that doesn't seem to be possible, also, third-party cookies might be a problem for you if you are relocating to another domain. This is still possible but not as painless as when you're doing stuff within your own domain.
The problem here is concurrency, if a power user is using multiple tabs and manages to interleave a couple of requests belonging to the same session (this is very unlikely, but not impossible) this may lead to inconsistencies in your application.
It's the <noscript/> way of doing HTTP round trips without meaningless URLs and JavaScript
I provide this code as a prof of concept: If this code is run in a context that you are not familiar with I think you can work out what part is what.
The idea is that you call Relocate with some state when you redirect, and the URL which you relocated calls GetState to get the data (if any).
const string StateCookieName = "state";
static int StateCookieID;
protected void Relocate(string url, object state)
{
var key = "__" + StateCookieName + Interlocked
.Add(ref StateCookieID, 1).ToInvariantString();
var absoluteExpiration = DateTime.Now
.Add(new TimeSpan(120 * TimeSpan.TicksPerSecond));
Context.Cache.Insert(key, state, null, absoluteExpiration,
Cache.NoSlidingExpiration);
var path = Context.Response.ApplyAppPathModifier(url);
Context.Response.Cookies
.Add(new HttpCookie(StateCookieName, key)
{
Path = path,
Expires = absoluteExpiration
});
Context.Response.Redirect(path, false);
}
protected TData GetState<TData>()
where TData : class
{
var cookie = Context.Request.Cookies[StateCookieName];
if (cookie != null)
{
var key = cookie.Value;
if (key.IsNonEmpty())
{
var obj = Context.Cache.Remove(key);
Context.Response.Cookies
.Add(new HttpCookie(StateCookieName)
{
Path = cookie.Path,
Expires = new DateTime(1970, 1, 1)
});
return obj as TData;
}
}
return null;
}
Copy-pasteable code based on Pavlo Neyman's method
RedirectPost(string url, T bodyPayload) and GetPostData() are for those who just want to dump some strongly typed data in the source page and fetch it back in the target one.
The data must be serializeable by NewtonSoft Json.NET and you need to reference the library of course.
Just copy-paste into your page(s) or better yet base class for your pages and use it anywhere in you application.
My heart goes out to all of you who still have to use Web Forms in 2019 for whatever reason.
protected void RedirectPost(string url, IEnumerable<KeyValuePair<string,string>> fields)
{
Response.Clear();
const string template =
#"<html>
<body onload='document.forms[""form""].submit()'>
<form name='form' action='{0}' method='post'>
{1}
</form>
</body>
</html>";
var fieldsSection = string.Join(
Environment.NewLine,
fields.Select(x => $"<input type='hidden' name='{HttpUtility.UrlEncode(x.Key)}' value='{HttpUtility.UrlEncode(x.Value)}'>")
);
var html = string.Format(template, HttpUtility.UrlEncode(url), fieldsSection);
Response.Write(html);
Response.End();
}
private const string JsonDataFieldName = "_jsonData";
protected void RedirectPost<T>(string url, T bodyPayload)
{
var json = JsonConvert.SerializeObject(bodyPayload, Formatting.Indented);
//explicit type declaration to prevent recursion
IEnumerable<KeyValuePair<string, string>> postFields = new List<KeyValuePair<string, string>>()
{new KeyValuePair<string, string>(JsonDataFieldName, json)};
RedirectPost(url, postFields);
}
protected T GetPostData<T>() where T: class
{
var urlEncodedFieldData = Request.Params[JsonDataFieldName];
if (string.IsNullOrEmpty(urlEncodedFieldData))
{
return null;// default(T);
}
var fieldData = HttpUtility.UrlDecode(urlEncodedFieldData);
var result = JsonConvert.DeserializeObject<T>(fieldData);
return result;
}
I have the following code that is using HtmlAgilityPack to pull back html code for a number of websites. All seems to be working well, apart from asos.com. When running a url through, it returns random characters (‹\b\0\0\0\0\0\0UÍ „ï&¾CãÁ¢ø›\bãhìÁ3-«Ziý}z‘š/»ómf³Ü`]In#iÉÑbr[œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°?[†.ã*3Ž¢G×ù6OƒäwPŒõH\rÙ¸\vzìmèÎ;M›4q_K¨Ð)
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = false;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
request.Timeout = 10000;
request.ReadWriteTimeout = 32000;
request.UserAgent = "TEST";
request.Method = "GET";
request.Accept = "text/html";
request.AllowAutoRedirect = false;
request.CookieContainer = new CookieContainer();
StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc.Load(reader);
string html = doc.DocumentNode.OuterHtml;
I have ran the url through Fiddler, however cant seem to see anything to suggest there should be a problem. Any ideas where i'm going wrong?
See header image from fiddler here: http://i.stack.imgur.com/2LRFY.png
This has nothing to do with Html Agility Pack, it's because you have set AllowAutoRedirect to false. Remove it and it will work. The site apparently does a redirect, you need to follow it if you want the final HTML text.
Note the Html Agility Pack has a utility HtmlWeb class that can download file directly as an HmlDocument:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(#"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
Yes I am using Classic ASP, not by choice I am supporting a Legacy application. Objective: I need to have a form page that submits to another .asp page that will upload the file and store it on the server in a certain directory such as "/uploads". I'm not real familiar with asp or asp.net so I am very new to this. I've created a test prototype:
Form page:
<!DOCTYPE html>
<head>
<title>Test upload</title>
</head>
<body>
<form action="process.asp" method="post" enctype="multipart/form-data">
<p>Filename: <input type="text" name="filename" size="50" /></p>
<p><input type="file" name="file" /><input type="submit" value="Upload file" /></p>
</form>
</body>
</html>
Processing page:
<%
Set fs = Server.CreateObject("Scripting.FileSystemObject")
Set tfolder = fs.GetSpecialFolder(2)
tname = fs.GetTempName
'Declare variables
Dim fileSize
Dim filename
Dim file
Dim fileType
Dim p
Dim newPath
'Assign variables
fileSize = Request.TotalBytes
fileName = Request.form("filename")
file = request.form("file")
fileType = fs.GetExtensionName(file)
fileOldPath = tfolder
newPath = Server.MapPath("/uploads/")
fs.MoveFile fileOrigPath, newPath
set fs = nothing
%>
The problem is that everytime I try to upload or run the script I get this error:
Microsoft VBScript runtime error '800a0035'
File not found
/tbird/fileUpload/process.asp, line 25
Obviously I'm not mapping correctly to the file and I think the major reason I am getting stuck is that in the first parameter of the MoveFile method I am not mapping to the file correctly. Can anyone tell me how I should be referencing the file or if I am doing it wrong?
Thanks in advance I would really appreciate the help I've searched all over and everything I find related to classic asp and uploading files are classes that you can purchase and I don't want to do that.
Have a look at a solution like Pure ASP Upload, it should help you. In classic ASP, you cannot directly access Request.Form when data is sent in multipart/form-data, so you have the choice of using a third party component like ASPUpload or a ASP class that does the work of parsing the request for you and exposing methods to save the file.
When moving files you must also specify the file names.
Change:
fs.MoveFile fileOrigPath, newPath
To:
fs.MoveFile fileOrigPath & fileName, newPath & fileName
Assuming "fileName" is the correct variable for the file name and not "file" variable above.