VB.Net Webview2 How can I get html source code?

I successfully display a website in WebView2 in my VB.NET (Visual Studio 2017) project, but I cannot get the HTML source code. Please advise me how to get the HTML.
My code:
Private Sub testbtn_Click(sender As Object, e As EventArgs) Handles testbtn.Click
WebView2.CoreWebView2.Navigate("https://www.microsoft.com/")
End Sub
Private Sub WebView2_NavigationCompleted(sender As Object, e As CoreWebView2NavigationCompletedEventArgs) Handles WebView2.NavigationCompleted
Dim html As String = ?????
End Sub
Thank you in advance for your advice.

I've only just started messing with the WebView2 earlier today as well, and was just looking for this same thing. I did manage to scrape together this solution:
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
' The Html comes back with unicode character codes, other escaped characters, and
' wrapped in double quotes, so I'm using this code to clean it up for what I'm doing.
html = Regex.Unescape(html)
html = html.Remove(0, 1)
html = html.Remove(html.Length - 1, 1)
Converted my code from C# to VB on the fly, so hopefully didn't miss any syntax errors.

Adding to @Xaviorq8's answer: in C# 8 or later you can slice a Span instead of calling Remove twice, which avoids creating an intermediate string:
html = Regex.Unescape(html);
html = html.AsSpan()[1..^1].ToString();

I must credit @Xaviorq8; his answer was needed to solve my problem.
I was successfully using .NET WebBrowser and Html Agility Pack but I wanted to replace WebBrowser with .NET WebView2.
Snippet (working code with WebBrowser):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.Load(webBrowser1.DocumentStream);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Snippet (failing code with WebView2):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");
Snippet (working code with WebView2 and Html Agility Pack):
using HAP = HtmlAgilityPack;
HAP.HtmlDocument hapHtmlDocument = null;
string html = await webView21.ExecuteScriptAsync("document.documentElement.outerHTML");
// thanks to @Xaviorq8's answer (next 3 lines)
html = Regex.Unescape(html);
html = html.Remove(0, 1);
html = html.Remove(html.Length - 1, 1);
hapHtmlDocument = new HAP.HtmlDocument();
hapHtmlDocument.LoadHtml(html);
HAP.HtmlNodeCollection nodes = hapHtmlDocument.DocumentNode.SelectNodes("//*[@id=\"apptAndReportsTbl\"]");

The accepted answer is on the right track. However, it is missing one important thing:
The returned string is NOT HTML-encoded; it is JSON-encoded!
So to do it right, you need to deserialize the JSON, which is just as simple (JsonSerializer lives in System.Text.Json):
Dim html As String
html = Await WebView2.ExecuteScriptAsync("document.documentElement.outerHTML;")
html = JsonSerializer.Deserialize(Of String)(html)
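Putting the question and the JSON answer together, a minimal sketch of the whole flow might look like the following. The form name BrowserForm and the field name webView are hypothetical, and the code assumes the Microsoft.Web.WebView2 and System.Text.Json NuGet packages are referenced. Note that the handler must be declared Async for Await to compile.

```vb
Imports System
Imports System.Text.Json
Imports System.Windows.Forms
Imports Microsoft.Web.WebView2.Core
Imports Microsoft.Web.WebView2.WinForms

' Hypothetical form used only for illustration.
Public Class BrowserForm
    Inherits Form

    Private WithEvents webView As New WebView2()

    Public Sub New()
        webView.Dock = DockStyle.Fill
        Controls.Add(webView)
    End Sub

    Protected Overrides Async Sub OnLoad(e As EventArgs)
        MyBase.OnLoad(e)
        ' CoreWebView2 is Nothing until initialization has completed.
        Await webView.EnsureCoreWebView2Async()
        webView.CoreWebView2.Navigate("https://www.microsoft.com/")
    End Sub

    Private Async Sub webView_NavigationCompleted(sender As Object,
            e As CoreWebView2NavigationCompletedEventArgs) Handles webView.NavigationCompleted
        If Not e.IsSuccess Then Return

        ' ExecuteScriptAsync returns the script result serialized as JSON,
        ' so a string result arrives quoted and escaped.
        Dim json As String = Await webView.ExecuteScriptAsync("document.documentElement.outerHTML")

        ' Decode the JSON string literal back into plain HTML.
        Dim html As String = JsonSerializer.Deserialize(Of String)(json)

        ' Show something observable; replace with your own processing.
        Text = $"Loaded {html.Length} characters"
    End Sub
End Class
```

The form can be shown with Application.Run(New BrowserForm()) from a WinForms entry point.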

Related

Why HtmlAgilityPack adds some characters to my html

Here is my code:
Dim input = "<div><textarea>something</div></textarea>"
Dim doc As New HtmlAgilityPack.HtmlDocument
doc.OptionOutputAsXml = True
doc.LoadHtml(input)
Using writer As New StringWriter
doc.Save(writer)
Dim res = writer.ToString
End Using
and the value of 'res' is:
"<?xml version="1.0" encoding="windows-1255"?>
<div>
<textarea>
//<![CDATA[
something
//]]>//
</textarea>
</div>"
the result as html is: My textarea
How can I prevent it ?

From my understanding of it, the reason is implied by this answer to Set textarea value with HtmlAgilityPack:
A <textarea> element doesn't have a value attribute. Its content is its own text node:
<textarea>
Some content
</textarea>
To simulate the same thing safely, HAP has to enclose the content in a //<![CDATA[ section.
The source code for HAP has this comment for the relevant line(s):
// tags whose content may be anything
ElementsFlags.Add("textarea", HtmlElementFlag.CData);
So, you can't prevent it.

Get info between two custom tags from a html page in vb.net's webbrowser

I need VB.NET code for the WebBrowser control that extracts the text between two custom HTML tags.
For example, given <arandomword>MyWord</arandomword> in page.html, I want to get MyWord.
Something easy please, because I am only 15 years old.
I am from Romania and I don't speak English very well.

You can try using the HtmlDocument.GetElementsByTagName() method:
if (webBrowser1.Document != null)
{
HtmlElementCollection elems = webBrowser1.Document
.GetElementsByTagName("arandomword");
string word = elems[0].InnerText;
......
}
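Since the question asked for VB.NET, here is a rough alternative sketch in that language. It does not use the DOM method from the answer; it swaps in a regular expression over the page text, which is only reasonable for a simple, non-nested custom tag like the one in the question. ExtractBetweenTags is a hypothetical helper name.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TagExtractor
    ' Returns the text between <tagName> and </tagName>,
    ' or Nothing when the tag is not present.
    Function ExtractBetweenTags(html As String, tagName As String) As String
        Dim name As String = Regex.Escape(tagName)
        ' Lazy match so only the first element's content is captured.
        Dim pattern As String = "<" & name & "\b[^>]*>(.*?)</" & name & "\s*>"
        Dim m As Match = Regex.Match(html, pattern,
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value.Trim(), Nothing)
    End Function

    Sub Main()
        Dim page As String = "<html><body><arandomword>MyWord</arandomword></body></html>"
        Console.WriteLine(ExtractBetweenTags(page, "arandomword")) ' prints MyWord
    End Sub
End Module
```

With the WebBrowser control, the page text could be passed in as WebBrowser1.DocumentText once the document has finished loading.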

HtmlAgilityPack returning random characters

I have the following code that is using HtmlAgilityPack to pull back html code for a number of websites. All seems to be working well, apart from asos.com. When running a url through, it returns random characters (‹\b\0\0\0\0\0\0UÍÂ „ï&¾CãÁ¢ø›\bãhìÁ3-«Ziý}z‘š/»ómf³Ü`]In#iÉÑbr[œ¡Ä¬v7Ðœ¶7N[GáôSv;Ü°?[†.ã*3Ž¢G×ù6OƒäwPŒõH\rÙ¸\vzìmèÎ;M›4q_K¨Ð)
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = false;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
request.Timeout = 10000;
request.ReadWriteTimeout = 32000;
request.UserAgent = "TEST";
request.Method = "GET";
request.Accept = "text/html";
request.AllowAutoRedirect = false;
request.CookieContainer = new CookieContainer();
StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream(), Encoding.Default); //put your encoding
doc.Load(reader);
string html = doc.DocumentNode.OuterHtml;
I have run the URL through Fiddler, but I can't see anything to suggest there should be a problem. Any ideas where I'm going wrong?
See header image from fiddler here: http://i.stack.imgur.com/2LRFY.png

This has nothing to do with Html Agility Pack. It happens because you have set AllowAutoRedirect to false. Remove that line and it will work. The site apparently redirects, and you need to follow the redirect to get the final HTML text.
Note that Html Agility Pack has a utility HtmlWeb class that can download a page directly as an HtmlDocument:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(@"http://www.asos.com/ASOS/ASOS-Sweatshirt-With-Contrast-Ribs/Prod/pgeproduct.aspx?iid=2765751&cid=14368&sh=0&pge=0&pgesize=20&sort=-1&clr=Red");
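Separately from the redirect setting, the characters quoted in the question (‹, then \b, then a run of \0) line up with the gzip header bytes 1F 8B 08 00 00 00 00. That would mean the body arrived compressed and was read as text. This is an assumption about the cause, not a confirmed diagnosis. If it applies, a sketch like the one below lets HttpWebRequest decompress the response automatically. DownloadHtml is a hypothetical helper, and UTF-8 is assumed for the page encoding.

```vb
Imports System.IO
Imports System.Net
Imports System.Text

Module DownloadSketch
    ' Downloads a page as text, letting the framework undo gzip/deflate compression.
    Function DownloadHtml(url As String) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "GET"
        request.Timeout = 10000
        ' Ask for compressed content and decompress it transparently on read.
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using response As WebResponse = request.GetResponse()
            ' Assumption: the page is UTF-8; adjust if the server says otherwise.
            Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

The returned string can then be passed to HtmlDocument.LoadHtml as in the other snippets.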

How do I execute JavaScript that was loaded from the DOM

I am trying to execute a function in a script that was loaded via the DOMDocument.
For instance:
'on page load
Webbrowser1.navigate("a htmldocument with a div called mainDiv")
Then later:
mDoc = WebBrowser1.Document
Dim mainDiv As IHTMLDOMNode = mDoc.DomDocument.getElementById("mainDiv")
mainDiv.innerHTML = (IO.File.ReadAllText("a file with just a div and script"))
'File has no html, head and body tags
So now I need to execute the script that was loaded into mainDiv after the fact.
I've tried:
Webbrowser1.Document.InvokeScript("onLoadScript")
...but as far as I can gather, this method sees only the DOM loaded by the navigate event.
I'm hoping there is a way to execute a script by accessing the DOMDocument.
Any help appreciated.
Thanks

You could try injecting a script element that calls your dynamically loaded function. This bypasses InvokeScript:
HtmlElement headtag = WebBrowser1.Document.GetElementsByTagName("head")[0];
HtmlElement scripttag = WebBrowser1.Document.CreateElement("script");
IHTMLScriptElement scriptelm = (IHTMLScriptElement)scripttag.DomElement;
scriptelm.text = "onLoadScript();";
headtag.AppendChild(scripttag);

Visual Basic 2010 - Retrieving numbers from an html div

I'm currently working on a program that will average the prices from a searched item on Amazon.
I have a button in the program that, when pressed, prints the HTML source code into a RichTextBox and then finds the specific div within the source code.
My only problem right now is having it print out the money amount after each div.
Is there any way to do this?

You can use Html Agility Pack, one of the best HTML parsing libraries, to make this task easier.
Example, assuming the div's id is divPrice:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(yourHTML);
HtmlNode priceNode = doc.GetElementbyId("divPrice");
string price = priceNode.InnerText;
Let's say the div has no id but has a CSS class cssPrice; then you can query it with XPath:
HtmlNode priceNode = doc.DocumentNode.SelectSingleNode("//div[@class='cssPrice']");
OR
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='cssPrice']");
foreach (HtmlNode node in nodes) {
string nodeText = node.InnerText;
}
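The question also asked about averaging the prices. Once the InnerText values have been collected, a sketch of that last step in VB.NET could look like this. AveragePrice is a hypothetical helper, and the code assumes US-style price strings such as "$12.99". Entries that do not parse are skipped.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq

Module PriceAverager
    ' Parses currency strings and returns their average, or 0 when none parse.
    Function AveragePrice(priceTexts As IEnumerable(Of String)) As Decimal
        ' Assumption: prices are formatted the en-US way.
        Dim usCulture As CultureInfo = CultureInfo.GetCultureInfo("en-US")
        Dim prices As New List(Of Decimal)

        For Each priceText As String In priceTexts
            Dim value As Decimal
            If Decimal.TryParse(priceText.Trim(), NumberStyles.Currency, usCulture, value) Then
                prices.Add(value)
            End If
        Next

        Return If(prices.Count = 0, 0D, prices.Average())
    End Function

    Sub Main()
        Dim sample As String() = {"$12.99", " $7.01 ", "n/a"}
        Dim usCulture As CultureInfo = CultureInfo.GetCultureInfo("en-US")
        Console.WriteLine(AveragePrice(sample).ToString("F2", usCulture)) ' prints 10.00
    End Sub
End Module
```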
