VB.NET Get page Source WebBrowser Control - vb.net

i want to get the html Source code of website,
it something like:
wb.Document...

1:
webbrowser1.document.body.innerhtml
2:
Dim webClient As New System.Net.WebClient
Dim result As String = webClient.DownloadString("http://www.google.co.il")
;)

Related

Google searching URL and add Listbox C# vb.net

i want to using webrequest codes then adding google search result URL's listbox1. But i can't codes gives error.
Try
Dim adres As String = "https://www.google.com/search?q=" + TextBox1.Text
Dim istek As WebRequest = HttpWebRequest.Create(adres)
Dim cevap As WebResponse
cevap = istek.GetResponse()
Dim donenBilgiler As StreamReader = New StreamReader(cevap.GetResponseStream())
Dim gelen As String = donenBilgiler.ReadToEnd()
Dim titleIndexBaslangici As Integer = gelen.IndexOf("<link href") + 2
Dim titleIndexBitisi As Integer = gelen.Substring(titleIndexBaslangici).IndexOf(">")
ListBox1.Items.Add(gelen.Substring(titleIndexBaslangici, titleIndexBitisi))
Catch ex As Exception
MsgBox(ex.Message)
End Try
First, are you familliar with API, because that's what you need ! You might work this out with the way you want to make it but its really bad and no one will recommend you to continue like this!
What you need to look for is API, (google search API)! API "only" purpose is to access to some data (in database) with easy Http route (well documented), try it out!
If you keep trying to do it in your own way, the best result that you will get is a really bad html page that you will need to parse and you don't want that !

How would I retrieve certain information on a webpage using their ID value?

In vb.net, I can download a webpage as a string like this:
Using ee As New System.Net.WebClient()
Dim reply As String = ee.DownloadString("https://pastebin.com/eHcQRiff")
MessageBox.Show(reply)
End Using
Would it be possible to specify an ID tag of an item on the webpage so that the reply will only output the information inside of the code box/id tag?
Example:
The ID tag of RAW Paste Data on https://pastebin.com/eHcQRiff is id="paste_code" which includes the following text:
Test=1
Test=2
Is there anyway to get the WebClient to only output that exact same message using the ID tag (or any other method)?
You can use HtmlAgilityPack library
Dim document as HtmlAgilityPack.HtmlDocument = new HtmlAgilityPack.HtmlDocument()
document.Load(#"C:\YourDownloadedHtml.html")
Dim text as string = document.GetElementbyId("paste_code").InnerText
Some more sample code:
(Tested with HtmlAgilityPack 1.6.10.0)
Dim html As string = "<TD width=""""50%""""><DIV align=right>Name :<B> </B></DIV></TD><TD width=""""50%""""><div id='i1'>SomeText</div></TD><TR vAlign=center>"
Dim htmlDoc As HtmlDocument = New HtmlDocument
htmlDoc.LoadHtml(html) 'To load from html string directly
Dim name As String = htmlDoc.DocumentNode.SelectSingleNode("//td/div[#id='i1']").InnerText
Console.WriteLine(name)
Output:
SomeText

System.UnauthorizedAccessException only using multithreading

I wrote a code to parse some Web tables.
I get some web tables into an IHTMLElementCollection using Internet Explorer with this code:
TabWeb = IE.document.getelementsbytagname("table")
Then I use a sub who gets an object containing the IHTMLElementCollection and some other data:
Private Sub TblParsing(ByVal ArrVal() As Object)
Dim WTab As mshtml.IHTMLElementCollection = ArrVal(0)
'some code
End sub
My issue is: if I simply "call" this code, it works correctly:
Call TblParsing({WTab, LiRow})
but, if I try to run it into a threadpool:
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing), {WTab, LiRow})
the code fails and give me multiple
System.UnauthorizedAccessException
This happens on (each of) these code rows:
Rws = WTab(RifWT("Disc")).Rows.Length
If Not IsError(WTab(6).Cells(1).innertext) Then
Ogg_W = WTab(6).Cells(1).innertext
My goal is to navigate to another web page while my sub perform parsing.
I want to clarify that:
1) I've tryed to send the entire HTML to the sub and get it into a webbrowser but it didn't work because it isn't possible to cast from System.Windows.Forms.HtmlElementCollection to mshtml.IHTMLElementCollection (or I wasn't able to do it);
2) I can't use WebRequest and similar: I'm forced to use InternetExplorer;
3) I can't use System.Windows.Forms.HtmlElementCollection because my parsing code uses Cells, Rows and so on that are unavailable (and I don't want to rewrite all my parsing code)
EDIT:
Ok, I modified my code using answer hints as below:
'This in the caller sub
Dim IE As Object = CreateObject("internetexplorer.application")
'...some code
Dim IE_Body As String = IE.document.body.innerhtml
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing_2), {IE_Body, LiRow})
'...some code
'This is the called sub
Private Sub TblParsing_2(ByVal ArrVal() As Object)
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(ArrVal(0))
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim TabWeb As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
'...some code
I get no errors but I'm not sure that it's all right because I tryed to use IE_Body string into webbrowser and it throws errors in the webpage (it shows a popup and I can ignore errors).
Am I using the right way to get Html from Internet Explorer into a string?
EDIT2:
I changed my code to:
Dim IE As New SHDocVw.InternetExplorer
'... some code
Dim sourceIDoc3 As mshtml.IHTMLDocument3 = CType(IE.Document, mshtml.IHTMLDocument3)
Dim html As String = sourceIDoc3.documentElement.outerHTML
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing_2), {html, LiRow})
'... some code
Private Sub TblParsing_2(ByVal ArrVal() As Object)
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(ArrVal(0))
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim TabWeb As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
But I get an error PopUp like (I tryed to translate it):
Title:
Web page error
Text:
Debug this page?
This page contains errors that might prevent the proper display or function properly.
If you are not testing the web page, click No.
two checkboxes
do not show this message again
Use script debugger built-in Internet Explorer
It's the same error I got trying to get Html text into a WebBrowser.
But, If I could ignore this error, I think the code could work!
While the pop is showing I get error on
Dim domDoc As New mshtml.HTMLDocument
Error text translated is:
Retrieving the COM class factory for component with CLSID {25336920-03F9-11CF-8FD0-00AA00686F13} failed due to the following error: The 8,001,010th message filter indicated that the application is busy. (Exception from HRESULT: 0x8001010A (RPC_E_SERVERCALL_RETRYLATER)).
Note that I've alredy set IE.silent = True
Edit: There was confusion as to what the OP meant by "Internet Explorer". I originally assumed that it meant the WinForm Webbrowser control; however the OP is creating the COM browser directly instead of using the .Net wrapper.
To get the browser document's defining HTML, you can cast the document against the mshtml.IHTMLDocument3 interface to expose the documentElement property.
Dim ie As New SHDocVw.InternetExplorer ' Proj COM Ref: Microsoft Internet Controls
ie.Navigate("some url")
' ... other stuff
Dim sourceIDoc3 As mshtml.IHTMLDocument3 = CType(ie.Document, mshtml.IHTMLDocument3)
Dim html As String = sourceIDoc3.documentElement.outerHTML
End Edit.
The following is based on my comment above. You use the WebBrowser.DocumentText property to create a mshtml.HTMLDocument.
Use this property when you want to manipulate the contents of an HTML page displayed in the WebBrowser control using string processing tools.
Once you extract this property as a String, there is no connection to the WebBrowser control and you can process the data in any thread you want.
Dim html As String = WebBrowser1.DocumentText
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(html)
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim tables As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
' ... do something
' cleanup COM objects
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(body)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(tables)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(domDoc)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(domDoc2)

Text from webpage

I need to get some text from this web page. I want to use the trade feed for my program to analyse the sentiment of the markets.
I used the browser control and the get element command but its not working. The problem is that whenever my browser starts to open the page I get Java scripts errors.
I tried with DOM but seems that i dont quite understand what i need to do :)
Here is the code:
Dim code As String
Using client As New WebClient
code = client.DownloadString("http://openbook.etoro.com/ahanit/#/profile/Trades/")
End Using
Dim htmlDocument As IHTMLDocument2 = New HTMLDocument(code)
htmlDocument.write(htmlDocument)
Dim allElements As IHTMLElementCollection = htmlDocument.body.all
Dim allid As IHTMLElementCollection = allElements.tags("id")
Dim element As IHTMLElement
For Each element In allid
element.title = element.innerText
MsgBox(element.innerText)
Next
Update: So I tried the HTML Agility pack, as suggested in the comments, and I am stuck again on this code
Dim plain As String = String.Empty
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml("http://openbook.etoro.com/ahanit/#/profile/Trades/")
Dim goodnods As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("THE PROBLEM")
For Each node In goodnods
TextBox1.Text = htmldoc.DocumentNode.InnerText
Next
Any advice what to now?
Ok I think I know what the problem is somehow the div that I need is hidden and its not loaded when I load the web page just the source code. Does someone knows how to load all the hidden divs ??
Here is my new code
Dim doc As New HtmlAgilityPack.HtmlDocument
Dim web As New HtmlWeb
doc = web.Load("http://openbook.etoro.com/ahanit/#/profile/Trades/")
Dim nodes As HtmlNode = doc.GetElementbyId("feed-items")
Dim id As String = nodes.WriteTo()
TextBox1.Text = TextBox1.Text & vbCrLf & id
user1336635,
Welcome to SO! Something you might try is to check out his source code, figure out what javascript function is populating the field you want (using firebug - I assume it's the one that "trades result in profit" next to it), and then embedding that script into a web page that your webbrowser control loads. That's where I'd try to start. I checked his source code and searched for "trades result in profit" and didn't find anything which leads me to believe hunting for the element 'might' not be possible. Just a starting place until someone with more experience with this chimes in!! Best!
-sf

call pylons controller from vb.net

i have a application thats writen with the pylons framework. Now i want to call some controllers from a vb.net application. How should i do this?
I've tried it like this:
Dim webclient As New WebClient
Dim dataStream As IO.Stream = webclient.OpenRead("http://192.168.0.20:5000/controller/default")
Dim reader As New StreamReader(dataStream)
Dim responseFromServer As String = reader.ReadToEnd()
Dim erg As String = responseFromServer.ToString
reader.Close()
dataStream.Close()
But instead of an json object that is generate by the pylons controller, i'll get the html code for the page which is reachable under "http://192.168.0.20:5000"
Any help would be great!
Cheers,
Nico
You're probably requesting either the wrong content type or the wrong URL.
Make sure the URL is correct, or try this code:
Dim webclient As New WebClient
webclient.Headers.Add(HttpRequestHeader.ContentType, "test/json")
Dim erg As String = webclient.DownloadString("http://192.168.0.20:5000/controller/default")
As I demonstrated, you should use the DownloadString method instead of manually using a StreamReader.
Thank you for the answer, but this brings me the same result.
I figured out that you have to first login to the page. That means i have to call another controller, which is responsible for the login. For that i have to add params in the post.
How i do this in vb.net?
Cheers Nico