Get information from an HTML page - vb.net

I'm a novice trying to understand how to get information from a web page. I've already read about HtmlAgilityPack and I'm using it, but after two days of trying to work out how to do this, here I am asking for help.
OK, the thing is: I want to read some information from a page and write it to a label's text.
The page I'll use as an example is: http://www.tibia.com/community/?subtopic=characters&name=Huur
I want to show the level, the vocation and the guild information in different labels.
But all I've got is this:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim myWeb As HtmlWeb = New HtmlWeb
    Dim myDoc = myWeb.Load("http://www.tibia.com/community/?subtopic=characters&name=" & TextBox1.Text.Trim())
    Dim myRoot As HtmlNode = myDoc.DocumentNode
    Dim myElements As List(Of HtmlElement) = New List(Of HtmlElement)
    Dim MainContentArea As HtmlNode
    myWeb.Load("http://www.tibia.com/community/?subtopic=characters&name=" & TextBox1.Text.Trim())
    MainContentArea = myDoc.GetElementbyId("characters")
    TextBox2.Text = MainContentArea.InnerHtml
End Sub
As you can see, I found a way to read all of the character information, but I don't know how to pick out the parts I want (the level, vocation and guild information) and show each one in a different label.
Can you guys help me please? :}
(In the code I'm using TextBox2.Text to show the page content because it shows a lot of things, and I got errors when trying to show the content in a label.)
Sorry for the bad English, guys.

First, I would suggest looking at XPath if you aren't familiar with it. Second, you will need to figure out your HTML structure. In Firefox, right-click the thing you are looking for and choose Inspect Element. It will lay out the structure of the document and give you information you can use for the XPath.
For instance, if you want to get the level you can use "/html/body//div[@class='BoxContent']/table/tbody/tr[td='Level:']/td" to get the element that contains the level label, and then move to HtmlNode.NextSibling to get the next element, whose text contains the value of the level you are looking for. (Drop the tbody step if the page source has no <tbody> tag; browsers add one when they display the tree, but HtmlAgilityPack works from the raw source.)
I hope that is enough to get you started.
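As a rough illustration of the XPath approach described above, here is a minimal console sketch. The HTML string is a made-up stand-in for the real character page (the actual markup may differ), and it selects the second cell of the row directly with td[2] instead of walking NextSibling, which is a substitution on my part and not what the answer literally describes.

```vbnet
Imports HtmlAgilityPack

Module LevelLookupSketch
    Sub Main()
        ' Hypothetical markup standing in for the character page;
        ' the real page structure is an assumption here.
        Dim html As String =
            "<div class='BoxContent'><table>" &
            "<tr><td>Vocation:</td><td>Knight</td></tr>" &
            "<tr><td>Level:</td><td>120</td></tr>" &
            "</table></div>"

        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' Find the row whose label cell reads "Level:" and take its second cell.
        Dim levelCell As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//div[@class='BoxContent']//tr[td='Level:']/td[2]")

        ' SelectSingleNode returns Nothing when nothing matches.
        If levelCell IsNot Nothing Then
            Console.WriteLine(HtmlEntity.DeEntitize(levelCell.InnerText).Trim())
        Else
            Console.WriteLine("Level row not found")
        End If
    End Sub
End Module
```

In a WinForms handler the same lookup would presumably run against the document returned by HtmlWeb.Load, with the result assigned to a label's Text instead of written to the console.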

Related

Needing Assistance with vb.NET Application

I don't normally post on forums because I try to find information for myself, and ask as an absolute last resort. I've tried scouring the net for answers, but I'm only receiving about half of the answer I'm looking for.
I'm currently building an application that deals with state law. There's one combo box and one text box. One for the offense title, and one for the numerical code for that particular code section. So say if I select "Kidnapping", it prepopulates the text box below it with "11-5-77", for example.
The method I've been using for, oh, about the last hour now, is:
If AWOffenseTitle.Text = "Kidnapping" Then
    AWCN.Text = "11-5-77"
ElseIf AWOffenseTitle.Text = "False Imprisonment" Then
    AWCN.Text = "11-5-78"
End If
With AWOffenseTitle being the combo box name, and AWCN being the text box name. While this has proved to work perfectly well so far, I'm sure you can imagine that with hundreds of offense titles, this is going to take a ridiculously long time. Well, I finally found a spreadsheet with offense titles and their respective title codes. What I'm looking to do is create two text files within a folder in the local directory "Offenses": one with a vertical list of offenses, and one with a vertical list of offense code numbers, with matching entries on the same line in each file. I want to populate the combo box with the contents of text file one (which I can do already), and then have selecting an offense title read the second text file and display its proper title code. That's what has me at a loss. I'm relatively well-versed with vb.NET, but I'm not an expert by any means.
I'm hoping someone here will be able to provide a code example and explain it to me line-by-line so I can gain a better understanding. I want to get more proficient with VB although it's not so popular anymore. I've been using VB since 6.0, but not on a regular basis. More on a sporadic project kind of basis.
I really appreciate any assistance anyone might be able to provide, and if you need more information, I'd be glad to answer any questions. I tried to be as thorough as I could.
Thank you in advance!
First, you need to retrieve your data. I demonstrated using a SQL Server database containing a table named Offenses with columns named OffenseTitle and OffenseCode. You will have to change this code to match your situation.
Private Function GetOffenseData() As DataTable
    Dim dt As New DataTable
    ' Requires Imports System.Data.SqlClient at the top of the file.
    ' The command must be given the connection it runs on.
    Using cn As New SqlConnection("Your connection string"),
          cmd As New SqlCommand("Select OffenseTitle, OffenseCode From Offenses;", cn)
        cn.Open()
        dt.Load(cmd.ExecuteReader)
    End Using
    Return dt
End Function
As the Form loads, set the properties of the ComboBox. DisplayMember matches the name of the title column and ValueMember is the name of the code column.
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim dt = GetOffenseData()
    ComboBox1.DisplayMember = "OffenseTitle"
    ComboBox1.ValueMember = "OffenseCode"
    ComboBox1.DataSource = dt
End Sub
Then when the selected item in the combo changes, just set the .Text property of TextBox to the SelectedValue in the combo and your code appears.
Private Sub ComboBox1_SelectionChangeCommitted(sender As Object, e As EventArgs) Handles ComboBox1.SelectionChangeCommitted
    TextBox1.Text = ComboBox1.SelectedValue.ToString
End Sub
There are other ways to do this if your data source is other than a database. Please advise if you need additional help.
In addition to HardCode's comment and Mary's detailed answer, I can only add an answer that's somewhere in between them.
It might be the case that the information is not taken from a database, but from another source, like a text/data file or a web service. So it might be useful to create an abstraction for the data source you actually use.
First, I create a class or struct that will hold the data for each combo box item.
Class Offense
    Public ReadOnly Property Title As String
    Public ReadOnly Property Code As String
    Public Sub New(title As String, code As String)
        Me.Title = title
        Me.Code = code
    End Sub
End Class
Next, you need a method that retrieves a list of offenses that you can bind to your combo box. It's entirely up to you how you fill/fetch the offenses list. I have simply hard coded your two values here.
Private Function GetOffenseData() As List(Of Offense)
    Dim offenses As New List(Of Offense)
    offenses.Add(New Offense("Kidnapping", "11-5-77"))
    offenses.Add(New Offense("False Imprisonment", "11-5-78"))
    Return offenses
End Function
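Since the question describes two parallel text files (one offense title per line, and the matching code on the same line of the other file), a file-based loader might look roughly like the sketch below. The module name, the function name LoadOffenses and the two path parameters are hypothetical, and the Offense class is repeated only so the snippet stands on its own.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Public Class Offense
    Public ReadOnly Property Title As String
    Public ReadOnly Property Code As String
    Public Sub New(title As String, code As String)
        Me.Title = title
        Me.Code = code
    End Sub
End Class

Public Module OffenseFileLoader
    ' titlesPath and codesPath are hypothetical locations of the two files;
    ' line N of one file is assumed to belong with line N of the other.
    Public Function LoadOffenses(titlesPath As String, codesPath As String) As List(Of Offense)
        Dim titles As String() = File.ReadAllLines(titlesPath)
        Dim codes As String() = File.ReadAllLines(codesPath)

        ' Fail early if the files have drifted out of step.
        If titles.Length <> codes.Length Then
            Throw New InvalidDataException("The two files must have the same number of lines.")
        End If

        Dim offenses As New List(Of Offense)
        For i As Integer = 0 To titles.Length - 1
            offenses.Add(New Offense(titles(i).Trim(), codes(i).Trim()))
        Next
        Return offenses
    End Function
End Module
```

The returned list can be bound to the combo box in the same way as the hard-coded list. The line-count check is one possible design choice: it surfaces a mismatched pair of files immediately instead of silently pairing a title with the wrong code.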
At a certain moment (probably in your form's Load event handler), you need to initialize your combo box. Just like Mary did, I use data binding.
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    AWOffenseTitle.DropDownStyle = ComboBoxStyle.DropDownList
    AWCN.ReadOnly = True
    AWOffenseTitle.DisplayMember = NameOf(Offense.Title)
    AWOffenseTitle.ValueMember = NameOf(Offense.Code)
    AWOffenseTitle.DataSource = GetOffenseData()
End Sub
Note that I use the NameOf operator to get the desired property names of the Offense class. If you ever decide to rename the properties of your Offense class, you will be able to easily detect where they are used, since the compiler will complain if your code still uses the wrong property names somewhere.
Finally, the app needs to react to combo box value changes, so that the text box will show the corresponding offense code. Mary used an event handler for the SelectionChangeCommitted event, but I use a handler for the SelectedIndexChanged event instead:
Private Sub AWOffenseTitle_SelectedIndexChanged(sender As Object, e As EventArgs) Handles AWOffenseTitle.SelectedIndexChanged
    ' SelectedValue is typed as Object, so convert it explicitly (needed with Option Strict On).
    AWCN.Text = CStr(AWOffenseTitle.SelectedValue)
End Sub
(Up to now, I was not aware of the SelectionChangeCommitted event of the ComboBox control. I will need to look into this event to see if it is actually a better choice for this scenario, but I found that the SelectedIndexChanged event does the job just fine, so for now I stuck with that event, since I am more familiar with it.)

task 1 for project

I need to produce code in Visual Basic that identifies a word's position. For example, my sentence could be 'This is my Visual Basic Project'. If the user entered the word 'my', the program would open another form displaying 'Your word is in the 3rd position'. It's required to use a string, split it into an array, and then use the match function to give each word an individual position.
I am fairly new to programming and would love any help. I would appreciate it if you could include some code for my design, e.g. buttons and list boxes. I have tried incredibly hard to get this program fully functioning, but I'm finding it very challenging.
Really please. Many thanks!!
First of all, I am not a Visual Basic or .NET person, but I really liked the problem, so my code could probably be optimized. I am a little confused about what you mean by the match function. Are you looking for regex or some other kind of string matching here?
Anyway, based on your description, I tried to code something that I think is what you are looking for.
CODE:
The whole logic is in the click handler of the button "FIND POSITION OF WORD": split the sentence, then compare the entered word with each word in the sentence.
Public Class FindTheWord
    Private Sub buttonFindTheWord_Click(sender As Object, e As EventArgs) Handles buttonFindTheWord.Click
        Dim inputSentence As String = TextBox1.Text
        Dim inputWord As String = TextBox2.Text
        ' " "c is a Char literal, which is what Split expects.
        Dim SplittedSentence As String() = inputSentence.Split(" "c)
        Dim Position As Integer = 0
        For Each word In SplittedSentence
            ' Positions are counted from 1.
            Position = Position + 1
            If (word = inputWord) Then
                MessageBox.Show("Your word is at position : " & Position.ToString)
            End If
        Next
    End Sub
End Class
Hope this helps.
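To produce the exact wording from the question ('Your word is in the 3rd position'), the position needs an ordinal suffix. The sketch below is one possible way to do that in a console program; the helper names FindWordPosition and ToOrdinal are made up for illustration, and the case-insensitive comparison is an assumption about the desired behaviour.

```vbnet
Imports System

Module WordPositionSketch
    ' Returns the 1-based position of word in sentence, or 0 if it is absent.
    Function FindWordPosition(sentence As String, word As String) As Integer
        Dim words As String() = sentence.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        ' FindIndex returns -1 when no element matches, so adding 1 yields 0.
        Dim index As Integer = Array.FindIndex(words,
            Function(w) String.Equals(w, word, StringComparison.OrdinalIgnoreCase))
        Return index + 1
    End Function

    ' Turns 1 into "1st", 2 into "2nd", 3 into "3rd", 11 into "11th", and so on.
    Function ToOrdinal(n As Integer) As String
        Dim lastTwo As Integer = n Mod 100
        If lastTwo >= 11 AndAlso lastTwo <= 13 Then
            Return n.ToString() & "th"
        End If
        Select Case n Mod 10
            Case 1
                Return n.ToString() & "st"
            Case 2
                Return n.ToString() & "nd"
            Case 3
                Return n.ToString() & "rd"
            Case Else
                Return n.ToString() & "th"
        End Select
    End Function

    Sub Main()
        Dim position As Integer = FindWordPosition("This is my Visual Basic Project", "my")
        If position > 0 Then
            Console.WriteLine("Your word is in the " & ToOrdinal(position) & " position")
        Else
            Console.WriteLine("Your word was not found")
        End If
    End Sub
End Module
```

In the form-based version, the same two helpers could be called from the button's click handler, with the message shown on the second form instead of the console.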

How to get the last child of an HTMLElement

I have written a macro in Excel that opens and parses a website and pulls the data from it. The trouble I'm having is once I'm done with all of the data on the current page I want to go to the next page. To do this I want to get the last child of the "result-stats" node. I found the lastChild function, and so came up with the following code:
'Checks to see if there is a next page
If html.getElementById("result-stats").LastChild.innerText = "Next" Then
    html.getElementById("result-stats").LastChild.Click
End If
And here is the HTML that it is accessing:
<p id="result-stats">
949 results
<span class="optional"> (1.06 seconds)</span>
Modify search
Show more columns
Next
</p>
When I try to run this, I get an error. After a lot of searching I think I found the reason. According to what I read, getElementById returns an element and not a node. lastChild only works on nodes, which is why the function doesn't work here.
My question is this. Is there a clean and simple way to grab the last child of an element? Or is there a way to typecast an element to that of a node? I feel like I'm missing something obvious, but I've been at this way longer than I should have been. Any help anyone could provide would be greatly appreciated.
Thanks.
Here's a shell of how to do it. If my comments are not clear, ask away. I assumed knowledge of how to navigate to the page, wait for the browser, etc.
Sub ClickLink()
    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    'load up page and all that stuff
    'process data ...
    'click link
    Dim doc As Object
    Set doc = IE.document
    Dim sLink As Object
    For Each sLink In doc.getElementsByTagName("a")
        If sLink.innerText = "Next" Then 'may need to play with this, if `innerText` doesn't work
            sLink.Click
            Exit For
        End If
    Next
End Sub

HtmlAgilityPack - Get only the first match

I'm new to HtmlAgilityPack and I can't figure out how to stop the search after the first match has been found.
Dim site As HtmlAgilityPack.HtmlWeb = New HtmlWeb()
Dim document As HtmlAgilityPack.HtmlDocument = site.Load("website")
For Each table As HtmlNode In document.DocumentNode.SelectNodes("//td[@class='forum_thread_post']//a[@href]")
    ListBox1.Items.Add(table.InnerText)
Next
The problem is that the website contains many td[@class='forum_thread_post'] nodes and I only need the first one. I experimented with SelectSingleNode too, but I couldn't even get that to work. I'm thinking that is the way to do it, though? If getting the single match into a text box is better or easier, I would want that.
Here is a picture: http://oi42.tinypic.com/25fmwr5.jpg
I want to get the title or the alt of the a element shown in the picture.
What was wrong with SelectSingleNode?
Dim table = document.DocumentNode.SelectSingleNode("//td[@class='forum_thread_post']//a[@href]")
' SelectSingleNode returns Nothing when there is no match.
If table IsNot Nothing Then
    ListBox1.Items.Add(table.InnerText)
End If
If the InnerText could be empty:
Dim tables = document.DocumentNode.SelectNodes("//td[@class='forum_thread_post']//a[@href]")
Dim firstNotEmpty = tables.FirstOrDefault(Function(t) Not String.IsNullOrWhiteSpace(t.InnerText))
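Since the question asks for the title or alt rather than the link text, here is a minimal sketch of reading attributes from the first match. The HTML string is invented for illustration (the real page is only known from the screenshot), and it assumes the alt text lives on an img inside the link.

```vbnet
Imports HtmlAgilityPack

Module FirstLinkTitleSketch
    Sub Main()
        ' Hypothetical markup; the structure of the real forum page is an assumption.
        Dim html As String =
            "<table><tr><td class='forum_thread_post'>" &
            "<a href='/thread/1' title='First thread'><img src='a.png' alt='First icon'></a>" &
            "<a href='/thread/2' title='Second thread'>Second</a>" &
            "</td></tr></table>"

        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectSingleNode stops at the first node that matches the expression.
        Dim link As HtmlNode =
            doc.DocumentNode.SelectSingleNode("//td[@class='forum_thread_post']//a[@href]")

        If link IsNot Nothing Then
            ' GetAttributeValue returns the supplied default when the attribute is missing.
            Dim title As String = link.GetAttributeValue("title", String.Empty)

            ' Look for an image nested inside the link and read its alt text, if any.
            Dim image As HtmlNode = link.SelectSingleNode(".//img")
            Dim alt As String = If(image Is Nothing, String.Empty, image.GetAttributeValue("alt", String.Empty))

            Console.WriteLine(title)
            Console.WriteLine(alt)
        End If
    End Sub
End Module
```

The fully qualified HtmlAgilityPack.HtmlDocument is used on purpose, because a WinForms project also has System.Windows.Forms.HtmlDocument in scope and the short name can be ambiguous there.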

VB.NET Webbrowser.Document - what you see is not what you can get

My attempts at writing a simple crawler seem to be confounded by the fact that my target webpage (as would appear in the UI browser control, or through a typical browser application) is not completely accessible as an HTMLDocument (due to frames, javascript, etc.)
The code below executes, and the correct webpage (e.g. the one displaying items 50-59) can even be seen in the control, but where I would expect the “next page” hyperlink retrieved to be “...&start=60”, I see something else – the one corresponding to opening the first catalog page “...&start=10”.
What is odd, is that if I press the button a second time, I DO get what I’m looking for. Even odder to me, if I inserted a MsgBox, say right after I’ve looped to wait until WebBrowserReadyState.Complete, then I get what I’m looking for.
Private Sub ButtonGo_Click(sender As System.Object, e As System.EventArgs) Handles ButtonGo.Click
    'start at this URL
    'e.g. http://www.somewebsite.com/properties?l=Dallas+TX&co=US&start=50
    catalogPageURL = TextBoxInitialURL.Text
    WebBrowser1.Navigate(catalogPageURL)
    While WebBrowser1.ReadyState <> WebBrowserReadyState.Complete
        Application.DoEvents()
    End While
    'Locate the URL associated with the NEXT>> hyperlink
    Dim allLinksInDocument As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
    Dim strNextPgLink As String = ""
    For Each link As HtmlElement In allLinksInDocument
        If link.GetAttribute("className") = "next" Then
            strNextPgLink = link.GetAttribute("href")
        End If
    Next
End Sub
I’ve googled around enough to try things like using a WebBrowser1.DocumentCompleted event, but that still didn’t work. I’ve tried inserting sleep commands.
I’ve avoided using WebClient and regular expressions, the way I would have ordinarily done this, because I’m convinced using the DOM will be easier for other things I have planned down the road, and I’m aware of HTML Agility Pack but not ambitious enough to learn it. Because it seems there has to be a simple way to have this dang webbrowser.document object synchronized with the stuff you can actually see.
If this is because of javascript, is there a way I can tell the webbrowser to just execute them all?
First question on the forum, looking forward to more (smarter ones hopefully)
Be warned when using WebBrowser1.Document or something similar: you will not get the 'raw' HTML.
Example (assume wbMain is a WebBrowser control):
RTB_RawHTML.Text = wbMain.DocumentText
Try
    RTB_BodyHTML.Text = wbMain.Document.Body.OuterHtml
Catch
    debugMessage("Body tag not found.")
End Try
In this example, the code in the body tag as displayed in the body tag portion of RTB_RawHTML will NOT perfectly match the HTML as displayed in RTB_BodyHTML. Accessing it through (yourwebbrowserhere).Document.Body.OuterHtml appears to 'clean' it somewhat, as opposed to the 'raw' HTML as retrieved by (yourwebbrowserhere).DocumentText.
This was a problem for me when I was making a web scraper, as it would continually throw me off: sometimes I would try to match a tag and it would find it, and other times it wouldn't, even though I was sure it was there. The reason was that I was trying to match the raw HTML, but I needed to match the 'cleaned' HTML.
I'm not sure if this will help you isolate the problem or not, but for me it did.