Grab specific part of text from a local html file and use it as variable - vb.net

I am making a small "home" application using VB. As the title says, I want to grab a part of text from a local html file and use it as variable, or put it in a textbox.
I have tried something like this...
Private Sub Open_Button_Click(sender As Object, e As EventArgs) Handles Open_Button.Click
Dim openFileDialog As New OpenFileDialog()
openFileDialog.CheckFileExists = True
openFileDialog.CheckPathExists = True
openFileDialog.FileName = ""
openFileDialog.Filter = "All|*.*"
openFileDialog.Multiselect = False
openFileDialog.Title = "Open"
If openFileDialog.ShowDialog = Windows.Forms.DialogResult.OK Then
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog1.FileName)
TextBox.Text = fileReader
End If
End Sub
The result is to load the whole html code inside this textbox. What should I do so to grab a specific part of html files's code? Let's say I want to grab only the word text from this span...<span id="something">This is a text!!!</a>

I make the following assumptions on this answer.
Your html is valid - i.e. the id is completely unique in the document.
You will always have an id on your html tag
You'll always be using the same tag (e.g. span)
I'd do something like this:
' get the html document
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog1.FileName)
' split the html text based on the span element
Dim fileSplit as string() = fileReader.Split(New String () {"<span id=""something"">"}, StringSplitOptions.None)
' get the last part of the text
fileReader = fileSplit.last
' we now need to trim everything after the close tag
fileSplit = fileReader.Split(New String () {"</span>"}, StringSplitOptions.None)
' get the first part of the text
fileReader = fileSplit.first
' the fileReader variable should now contain the contents of the span tag with id "something"
Note: this code is untested and I've typed it on the stack exchange mobile app, so there might be some auto correct typos in it.
You might want to add in some error validation such as making sure that the span element only occurs once, etc.

Using an HTML parser is highly recommended due to the HTML language's many nested tags (see this question for example).
However, finding the contents of a single tag using Regex is possible with no bigger problems if the HTML is formatted correctly.
This would be what you need (the function is case-insensitive):
Public Function FindTextInSpan(ByVal HTML As String, ByVal SpanId As String, ByVal LookFor As String) As String
Dim m As Match = Regex.Match(HTML, "(?<=<span.+id=""" & SpanId & """.*>.*)" & LookFor & "(?=.*<\/span>)", RegexOptions.IgnoreCase)
Return If(m IsNot Nothing, m.Value, "")
End Function
The parameters of the function are:
HTML: The HTML code as string.
SpanId: The id of the span (ex. <span id="hello"> - hello is the id)
LookFor: What text to look for inside the span.
Online test: http://ideone.com/luGw1V

Related

Is there a way to retrieve some text from a webpage to a textbox in VB?

I'm trying to make it so I can have several text boxes in my form show pieces of information from a specific webpage. For example, would there be a way I would be able to retrieve the title of this question to a variable with the click of a button in Visual Basic?
It not hard but you have to look at the source page, and identify the elements.
In nicely formed pages, usually the div elements have a tag ID, but often they don't, so you have to grab say by a attribute name - often you can use the class name of the div in question.
So, to grab the Title, and you question text of this post?
This works:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim xDoc As New Xml.XmlDocument
Dim strURL As String = "https://stackoverflow.com/questions/55753982"
Dim xWeb As New WebBrowser
xWeb.ScriptErrorsSuppressed = True
xWeb.Navigate(strURL)
Do Until xWeb.ReadyState = WebBrowserReadyState.Complete
Application.DoEvents()
Loop
Dim HDoc As HtmlDocument = xWeb.Document
Debug.Print(HDoc.GetElementById("question-header").FirstChild.InnerText)
Debug.Print(FindClass(HDoc, "post-text"))
End Sub
Function FindClass(Hdoc As HtmlDocument, strClass As String) As String
' get all Divs, and search by class name
Dim OneElement As HtmlElement
For Each OneElement In Hdoc.GetElementsByTagName("div")
If OneElement.GetAttribute("classname") = strClass Then
Return OneElement.InnerText
End If
Next
' we get here, not found, so return a empty stirng
Return "not found"
End Function
OutPut:
(first part is title question)
Is there a way to retrieve some text from a webpage to a textbox in VB?
(second part is question text)
I'm trying to make it so I can have several text boxes in my form show pieces of
information from a specific webpage. For example, would there be a way I would be
able to retrieve the title of this question to a variable with the click of a button
in Visual Basic?

Reading Numbers from Webbrowser Vb.net

In my WebBrowser I got text like this: Remaining balance: 10$
I would like to convert it to another currency, I want just to read number from my browser, then after that I will send it to a Label or TextBox with the new converted currency. I am stuck here.
A screenshot of it
Private Sub LinkLabel2_LinkClicked(sender As Object, e As LinkLabelLinkClickedEventArgs) Handles LinkLabel2.LinkClicked
WebBrowser2.Navigate(TextBox9.Text + TextBox2.Text)
End Sub
Private Sub WebBrowser2_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser2.DocumentCompleted
Label1.Text = (WebBrowser2.Document.Body.InnerText)
End Sub
I have this suggestion (I know this is probably not the perfect solution):
First, try loading that same webpage in a normal web browser such as Google Chrome or Firefox or any browser which has the feature to "inspect elements", meaning to view their HTML code.
Find out the element which displays the price you want.
Note down the ID of the element (usually written something like id="someID")
Back to your program's code, include the following Function, which will get the text displayed and convert it to another currency:
Public Function ConvertDisplayedCurrency() As Decimal
Dim MyCurrencyElement As HtmlElement = WebBrowser2.Document.GetElementById("theIdYouGotInStep3") 'This will refer to the element you want.
Dim TheTextDisplayed As String = MyCurrencyElement.InnerText 'This will refer to the text that is displayed.
'Assuming that the text begins like "Remaining balance: ###", you need to strip off that first part, which is "Remaining balance: ".
Dim TheNumberDisplayed As String = TheTextDisplayed.Substring(19)
'The final variable TheNumberDisplayed will be resulting into a String like only the number.
Dim ParsedNumber As Decimal = 0 'A variable which will be used below.
Dim ParseSucceeded As Boolean = Decimal.TryParse(TheNumberDisplayed, ParsedNumber)
'The statement above will TRY converting the String TheNumberDisplayed to a Decimal.
'If it succeeds, the number will be set to the variable ParsedNumber and the variable
'ParseSucceeded will be True. If the conversion fails, the ParseSucceeded will be set
'to False.
If Not ParseSucceeded = True Then Return 0 : Exit Function 'This will return 0 and quit the Function if the parse was a failure.
'Now here comes your turn. Write your own statements to convert the number "ParsedNumber"
'to your new currency and finally write "Return MyFinalVariableName" in the end.
End Function
Call that Function probably when the document of the WebBrowser2 loads, that is, in the WebBrowser2_DocumentCompleted sub.
I hope it helps!

WP use string as name of control

Please, can anyone help me with this problem:
I have a name(s) of control(s) in string format (str) and I want to set property (in code) of that controls using that string-name.
I try something like this but it doesn't work. Actually, I have a problem with expression. When I put exactly the name it works but when i use variable in string format it doesn't.
Dim str as String
str="k3"
Dim g As Image = CType(str, Image)
g.Source = New BitmapImage(New Uri("/APP;component/Icons/hero.png", UriKind.Relative))
This works:
Dim g As Image = CType(k3, Image)
While this does not:
Dim g As Image = CType(str, Image)
I think I understand what you are trying to do, to declare an object by a string...
Essentially for this to work you will need a custom function that returns the Object Type that you are seeking...
You will need to loop through each control and check the name of the control as a comparison, e.g. If oControl.Name.ToString = sString then Return oControl
Example
' A function to return a Control by the Control's name...
Public Function GetControlByName(ByVal oForm As Form, ByVal sName As String) As Control
Dim cReturn As New Control
Dim ctrl As Control
For Each ctrl In oForm.Controls
cReturn = ctrl
If ctrl.Name.ToString = sName Then
Return ctrl ' this is what we want!
End If
Next
Return cReturn
End Function
' Example Usage
Dim oButton As Button = GetControlByName(Me, "Button44")
If oButton.Name.ToString = "Button44" Then
MessageBox.Show("I have found your Button!")
Else
MessageBox.Show("Your button was NOT Found!")
End If
Obviously there is room for error with this function, because if sName is NOT found, then it will return the last ctrl found, therefore, you will need to ensure that the control you seek is indeed found, via the If statement as provided in the example above...
Furthermore, it may not loop through controls inside of containers, menus, etc, but I'm not sure on that, so you will need to check to ensure it's not having that problem...
(The Me in the statement will most likely be used more often than not, though Me could be the name of the form you are searching if you are running the code outside of the form you are searching the form with the function.)
FINALLY, to answer your question, you will need to change Control to Image, and Set CReturn as a New Image, and then use Return ctrl.BackgroundImage (etc) to return the image..

How to Post & Retrieve Data from Website

I am working with a Windows form application. I have a textbox called "tbPhoneNumber" which contains a phone number.
I want to go on the website http://canada411.com and enter in the number that was in my textbox, into the website textbox ID: "c411PeopleReverseWhat" and then somehow send a click on "Find" (which is an input belonging to class "c411ButtonImg").
After that, I want to retrieve what is in between the asterixs of the following HTML section:
<div id="contact" class="vcard">
<span><h1 class="fn c411ListedName">**Full Name**</h1></span>
<span class="c411Phone">**(###)-###-####**</span>
<span class="c411Address">**Address**</span>
<span class="adr">
<span class="locality">**City**</span>
<span class="region">**Province**</span>
<span class="postal-code">**L#L#L#**</span>
</span>
So basically I am trying to send data into an input box, click the input button and store the values retrieved into variables. I want to do this seemlessly so I would need to do something like an HTTPWebRequest? Or do I use a WebBrowser object? I just don't want the user to see that the application is going on a website.
I do a good amount of website scraping and I will show you how I do it. Feel free to skip ahead if I am being too specific, but this is a commonly requested theme and should be made specific.
URL Simplification
The library I use for this is htmlagilitypack (It is a dll, make a new project and add a reference to it). The first thing to check is if we have to go to take any special steps to get to a page by using a phone number. I searched for John Smith and found quite a few. I entered 2 of these results and noticed that the url formatting is very simple. Those results were..
http://www.canada411.ca/res/7056736767/John-Smith/138223109.html
http://www.canada411.ca/res/7052355273/John-Smith/172439951.html
I tested to see if I can remove some of the values from the url that I don't know and just leave the phone number. The result was that I can...
http://www.canada411.ca/search/re/1/7056736767/-
http://www.canada411.ca/search/re/1/7052355273/-
You can see by the url that there are some static areas in the url and our phone number. From this lets construct a string for the url.
Dim phoneNumber as string = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL as string = "http://www.canada411.ca/search/re/1/" + phoneNumber +"/-"
Value Extraction with XPath
Now that we have the page dialed in, lets examine the html you provided above. You need 6 values from the page so we will create them now...
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
As mentioned above, we will be using htmlagilitypack which uses Xpath. The cool thing about this is that once we can find some unique identifier in the html, we can use Xpath to find our values. I know it may be confusing, but it will become clearer.
All of the values you need are within tags that have a class name. Lets use the class name in our Xpath to find them.
Dim FullNameXPath As String = "//*[#class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[#class='c411Phone']"
Dim AddressXPath As String = "//*[#class='c411Address']"
Dim LocalityXPath As String = "//*[#class='locality']"
Dim RegionXPath As String = "//*[#class='region']"
Dim PostalCodeXPath As String = "//*[#class='postal-code']"
Essentially what we are looking at is a string that will inform htmlagilitypack what to look for. In our case, text contained within the classes we named. There is a lot to XPath and it could take a while to explain all of it. On a side note though...If you use Google Chrome and highlight a value on a page, you can right click inspect element. In the code that appears below, you can right click the value and copy to XPath!!! Very useful.
Basic HTMLAgilityPack Template
Now, all that is left is to connect to the page and get those variables populated.
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
For Each nameResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(FullNameXPath)
Msgbox(nameResult.InnerText)
Next
In the above example we create an HtmlWeb object named Web. This is the actual crawler of our project. We then define a HtmlDocument which will consist of our converted and searchable page source. All of this is done behind the scenes. We then send Web to get the page source and assign it to the Doc object we created. Doc is reusable, which thankfully requires us to connect to the page only once.
The for loop looks for any nodes in our Doc that match FullNameXPath which was defined previously as the XPath value for finding the name. When a Node is found, it is assigned to the nameResult variable and from within the loop we call a message box to display the inner text of our node.
So when we put it all together
Complete Working Code (As of 2/17/2013)
Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" + phoneNumber + "/-"
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
Dim FullNameXPath As String = "//*[#class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[#class='c411Phone']"
Dim AddressXPath As String = "//*[#class='c411Address']"
Dim LocalityXPath As String = "//*[#class='locality']"
Dim RegionXPath As String = "//*[#class='region']"
Dim PostalCodeXPath As String = "//*[#class='postal-code']"
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
For Each nameResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(FullNameXPath)
FullName = nameResult.InnerText
MsgBox(FullName)
Next
For Each PhoneResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(PhoneXPath)
Phone = PhoneResult.InnerText
MsgBox(Phone)
Next
For Each ADDRResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(AddressXPath)
Address = ADDRResult.InnerText
MsgBox(Address)
Next
For Each LocalResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(LocalityXPath)
Locality = LocalResult.InnerText
MsgBox(Locality)
Next
For Each RegionResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(RegionXPath)
Region = RegionResult.InnerText
MsgBox(Region)
Next
For Each postalCodeResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(PostalCodeXPath)
PostalCode = postalCodeResult.InnerText
MsgBox(PostalCode)
Next
Yes it is possible, I've done this using the selenium framework, which is aimed for testing automation. However, it provides you with the tools to do exactly that.
Download for .net here:
http://docs.seleniumhq.org/download/

How do I search through a string for a particular hyperlink in Visualbasic.net?

I have a written a program which downloads a webpage's source but now I want to search the source for a particular link I know the link is written like this:
<b>Geographical Survey Work</b>
Is there anyway of using "Geographical Survey Work" as criteria to retrieve the link? The code I am using to download the source to a string is this:
Dim sourcecode As String = ((New Net.WebClient).DownloadString("http://examplesite.com"))
So just to clarify I want to type into an input box "Geographical Survey Work" for instance and "/internet/A2" to popup in a messagebox? I think it can be done using a regex, but that's a bit beyond me. Any help would be great.
With HTMLAgilityPack:
Dim vsPageHTML As String = "<html>... your webpage HTML code ...</html>"
Dim voHTMLDoc.LoadHtml(vsPageHTML) : vsPageHTML = ""
Dim vsURI As String = ""
Dim voNodes As HtmlAgilityPack.HtmlNodeCollection = voHTMLDoc.SelectNodes("//a[#href]")
If Not IsNothing(voNodes) Then
For Each voNode As HtmlAgilityPack.HtmlNode In voNodes
If voNode.innerHTML.toLower() = "<b>geographical survey work</b>" Then
vsURI = voNode.GetAttributeValue("href", "")
Exit For
End If
Next
End If
voNodes = Nothing : voHTMLDoc = Nothing
Do whatever you want with vsURI.
You might need to tweak the code a bit as I'm writing free-hand.