HtmlAgilityPack - SelectNodes - vb.net

I'm trying to retrieve a <p class> element.
<div class="thread-plate__details">
<h3 class="thread-plate__title">(S) HexHunter BOW</h3>
<p class="thread-plate__summary">created by Aazoth</p> <!-- (THIS ONE) -->
</div>
But with no luck.
The code I am using is below:
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[#class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("created by ")
Select attribute.Value
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
It runs but I can't get the 'created by XXX' text.
I've tried many ways but got no luck, an hand would be appreciated.
Thanks in advance everyone.

Looks like you are looking wrong string in the attribute.Value. What I see is that attribute.Value.StartsWith("created by ") must be changed to this one attribute.Value.StartsWith("thread-plate__summary").
And to grab inner content of node you have to do this: Select node.InnerText;
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[#class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("thread-plate__summary")
Select node.InnerText
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
I hope this will work for you.

Related

Xpath syntax for HtmlAgilityPack row

I'm using the following code:
Dim cl As WebClient = New WebClient()
Dim html As String = cl.DownloadString(url)
Dim doc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
Dim table As HtmlNode = doc.DocumentNode.SelectSingleNode("//table[#class='table']")
For Each row As HtmlNode In table.SelectNodes(".//tr")
Dim inner_text As String = row.InnerHtml.Trim()
Next
My inner_text for each row looks like this, with different years and data:
"<th scope="row">2015<!-- --> RG Journal Impact</th><td>6.33</td>"
Each row has a th element and a td element and I have tried different ways to pull the value but I can't seem to pull them one after the other by looping the column collection. How can I pull just the th element and the td element using the correct Xpath syntax ?
Until I can use better code I'll use standard parsing functions:
Dim hname As String = row.InnerHtml.Trim()
Dim items() As String = hname.Split("</td>")
Dim year As String = items(1).Substring(items(1).IndexOf(">") + 1)
Dim value As String = items(4).Substring(items(4).IndexOf(">") + 1)
If value.ToLower.Contains("available") Then
value = ""
End If
You can carry on with querying the row:
Option Infer On
Option Strict On
Imports HtmlAgilityPack
Module Module1
Sub Main()
Dim h = "<html><head><title></title></head><body>
<table class=""table"">
<tr><th scope=""row"">2015<!-- --> RG Journal Impact</th><td>6.33</td></tr>
<tr><th scope=""row"">2018 JIR</th><td>9.99</td></tr>
</table>
</body></html>"
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(h)
Dim table = doc.DocumentNode.SelectSingleNode("//table[#class='table']")
For Each row In table.SelectNodes(".//tr")
Dim yearData = row.SelectSingleNode(".//th").InnerText.Split(" "c)(0)
Dim value = row.SelectSingleNode(".//td").InnerText
Console.WriteLine($"Year: {yearData} Value: {value}")
Next
Console.ReadLine()
End Sub
End Module
Outputs:
Year: 2015 Value: 6.33
Year: 2018 Value: 9.99

VB.NET SetAttribute in WebBrowser doens't work

I did some research previously, but all the answers do not work.
The "value" attribute exists in the element but does not appear in the webBrowser, nor in the input.
This is my code until then, I need the webBrowser to read an html file, then load your answers or values ​​from your inputs from a database.
PS:
My application is built in real time, there is no webbrowser control on the screen, it is created shortly after reading the html file and only then it is placed inside a panel.
Dim webBrowser As WebBrowser = New WebBrowser
Dim _doc As HtmlDocument
Dim htmlPath As String = "C:\ePrimeCare\Platis\Debug\Protocolos\" +
nomeProtocolo + "_" + idSistema.ToString() + ".html"
webBrowser.ScriptErrorsSuppressed = True
webBrowser.Navigate(htmlPath)
_doc = webBrowser.Document.OpenNew(False)
'webBrowser.DocumentText = IO.File.ReadAllText(htmlPath).ToString()
'webBrowser.Document.OpenNew(False)
'RetornaRespostasAnteriores(idSistema, idFicha, nomeProtocolo, _doc, Convert.ToDateTime(dtVisita))
_doc.Title = nomeProtocolo
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
Dim carregaRespostas As CarregarRespostaProtocoloHTML = New CarregarRespostaProtocoloHTML
Dim respostas As DataTable = carregaRespostas.BuscarRespostasProtocoloAnterior(idFicha, idSistema, dtVisita)
Dim idopcaoitem As String = 0
Dim idsetitem As String = 0
Dim value As DataRow()
Dim strArr As String()
For Each element As HtmlElement In _doc.GetElementsByTagName("input")
Dim type As String = element.GetAttribute("type")
Select Case type
Case "text"
strArr = element.GetAttribute("id").Split("_") 'For get the two ids
idopcaoitem = strArr(0)
value = respostas.Select(("IDOPCAOITEM = " + idopcaoitem.ToString()))
If value.Length > 0 Then
element.SetAttribute("value", value(0)(2).ToString())'Here i try to set the value, but does not work
End If
Case "radio"
Debug.WriteLine("Input de radio")
Case "checkbox"
Debug.WriteLine("Input de checkbox")
Case "hidden"
Debug.WriteLine("Input de hidden")
Case Else
Debug.WriteLine("Outro input")
End Select
Next
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
webBrowser.Refresh(WebBrowserRefreshOption.Completely)
webBrowser.Dock = Dock.Fill
pnlMain.Controls.Add(webBrowser)
All your document rewriting and the refresh you do at the end will overwrite any changes you made to it.
'Either of these will revert the document back to its original state.
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
webBrowser.Refresh(WebBrowserRefreshOption.Completely)
You don't even need to call _doc.Write() as WebBrowser1.Navigate(htmlPath) will work just as fine.
New code:
Dim webBrowser As New WebBrowser 'Shorthand statement.
Dim _doc As HtmlDocument
Dim htmlPath As String = "C:\ePrimeCare\Platis\Debug\Protocolos\" +
nomeProtocolo + "_" + idSistema.ToString() + ".html"
webBrowser.ScriptErrorsSuppressed = True
webBrowser.Navigate(htmlPath)
_doc = webBrowser.Document 'Removed OpenNew().
_doc.Title = nomeProtocolo
Dim carregaRespostas As New CarregarRespostaProtocoloHTML 'Another shorthand statement.
Dim respostas As DataTable = carregaRespostas.BuscarRespostasProtocoloAnterior(idFicha, idSistema, dtVisita)
(...your variables...)
For Each element As HtmlElement In _doc.GetElementsByTagName("input")
(...your code...)
Next
'(Removed _doc.Write() and Refresh() since they will undo all changes)
webBrowser.Dock = Dock.Fill
pnlMain.Controls.Add(webBrowser)

I want to align Dynamic pictureboxs with Dynamic Labels

Here is my function that allows me to get from a webpage the image link for my PictureBoxs and the title for my labels
Public Shared Function getPics(website As String, pattern As String)
Dim tempTitles As New List(Of String)()
Dim tempTitles2 As New List(Of String)()
Dim lestitres As New List(Of titlesclass)
Dim webClient As New WebClient()
webClient.Headers.Add("user-agent", "null")
Dim counter As Integer = 0
Dim counter2 As Integer = 0
Dim counter3 As Integer = 0
Dim counter4 As Integer = 1
Dim counter5 As Integer = 0
Dim counter6 As Integer = 0
'If the website happens to go offline, at least your application wont crash.
Dim content As String = webClient.DownloadString(website)
Dim query = From title In Regex.Matches(content, pattern).Cast(Of Match)
Select New With {Key .Link = String.Concat("http://www.gamestop.com", title.Groups("Data").Value),
Key .Title = title.Groups("Dataa").Value}
For Each letitre In query.Distinct
'MsgBox(letitre.Link & " ======= " & letitre.Title)
Next
'For Each title As Match In (New Regex(pattern).Matches(content)) 'Since you are only pulling a few strings, I thought a regex would be better.
' Dim letitre As New titlesclass
' letitre.Link = title.Groups("Data").Value
' letitre.Title = title.Groups("Dataa").Value
' lestitres.Add(letitre)
' 'tempTitles2.Add(title.Groups("Dataa").Value)
'Next
Dim titles = tempTitles.Distinct().ToArray() 'remove duplicate titles
'Dim titles2 = tempTitles2.Distinct().ToArray()
Dim titles2 = lestitres.Distinct().ToArray()
lestitres.Clear()
'For Each title As titlesclass In titles2
For Each letitre In query.Distinct
'ListBox.Items.Add(title) 'what you do with the values from here is up to you.
Dim ImageInBytes() As Byte = webClient.DownloadData(letitre.Link)
Dim ImageStream As New IO.MemoryStream(ImageInBytes)
Dim MyPic As New PictureBox
Dim MyLab As New Label
If (counter2 > 0 AndAlso ((counter2 Mod 4 = 0) OrElse counter3 = 1)) Then
counter3 = 1
counter4 += 1
If (counter2 Mod 4 = 0) Then
counter5 = 0
counter6 += 170
End If
MyPic.Location = New Point(counter5, MyPic.Location.Y + counter6)
MyLab.Location = New Point(counter5, MyPic.Location.Y + counter6)
If counter4 = 4 Then
counter3 = 0
End If
counter5 += 200
Else
MyPic.Location = New Point(counter, MyPic.Location.Y)
MyLab.Location = New Point(counter, MyPic.Location.Y)
End If
counter += 200
counter2 += 1
MyPic.SizeMode = PictureBoxSizeMode.AutoSize
MyLab.Text = letitre.Title
MyPic.Image = New System.Drawing.Bitmap(ImageStream)
Form2.Controls.Add(MyPic)
Form2.Controls.Add(MyLab)
Next
End Function
The class named titlesclass contain two elements which i will store my Link and Title in :
Public Class titlesclass
Public Property Link As String
Public Property Title As String
End Class
And My little button does all the work
Dim websiteURL1 As String = "http://www.gamestop.com/collection/upcoming-video-games"
Class1.getPics(websiteURL1, "<img src=""(?<Data>[^>]*)""><p>(?<Dataa>[^>]*)<br>")
What i'm trying to do is to show 4 pictureboxs per row with the lables right bellow each picture , for now some labels doesn't show , some shows just in the right place and some shows very far bellow ! I tested the values i'm getting with a Message Box and it shows me the informations in the order i need , i'm not sure if i screw up in the x,y values or if it's something else ...
Edit : I can already show the 4 pictureboxes per row , the labels also , but the Y of some labels isn't well adjusted it can go far far bellow !
Here is some pictures that will help you to understand my situation Don't mind the buttons and listboxs , it's just for the test :
My list containt a lot of pictures , so i just showed you some when the thing work kind of well , when it shows couple of rows far from the designed picture
http://img110.xooimage.com/files/f/a/d/picture1-5239f7c.png
http://img110.xooimage.com/files/8/f/8/picture-2-5239f7e.png
http://img110.xooimage.com/files/4/7/b/picture-3-5239f80.png
http://img110.xooimage.com/files/f/0/f/picture4-5239f82.png
So I cleaned the way you where generating the position of the PictureBox by using a row principle and increment:
Note:
If you need space at the top to add information start the row count at 1 instead of 0
Note 2:
Here the image dimensions are harcoded but you could use dynamic ones which would be more fluid. I just cleaned the positioning code not the rest.
Replace your function by this one:
Public Shared Sub getPics(website As String, pattern As String)
Dim tempTitles As New List(Of String)()
Dim tempTitles2 As New List(Of String)()
Dim lestitres As New List(Of titlesclass)
Dim webClient As New WebClient()
webClient.Headers.Add("user-agent", "null")
Dim counter As Integer = 0
Dim counter2 As Integer = 0
Dim counter3 As Integer = 0
Dim counter4 As Integer = 1
Dim counter5 As Integer = 0
Dim counter6 As Integer = 0
'If the website happens to go offline, at least your application wont crash.
'Handle default proxy
Dim proxy As IWebProxy = WebRequest.GetSystemWebProxy()
proxy.Credentials = CredentialCache.DefaultCredentials
webClient.Proxy = proxy
Dim content As String = webClient.DownloadString(website)
Dim query = From title In Regex.Matches(content, pattern).Cast(Of Match)
Select New With {Key .Link = String.Concat("http://www.gamestop.com", title.Groups("Data").Value),
Key .Title = title.Groups("Dataa").Value}
Dim titles = tempTitles.Distinct().ToArray() 'remove duplicate titles
Dim titles2 = lestitres.Distinct().ToArray()
lestitres.Clear()
'Count Items
Dim item As Integer = 0
'Count Row
Dim row As Integer = 0
'image: 222*122
For Each letitre In query.Distinct
Dim ImageInBytes() As Byte = webClient.DownloadData(letitre.Link)
Dim ImageStream As New IO.MemoryStream(ImageInBytes)
Dim MyPic As New PictureBox
Dim MyLab As New Label
'x = numéro item fois largeur image
'y = numéro de ligne fois hauteur image
MyPic.Location = New Point(item * 222, row * 122)
MyLab.Location = New Point(item * 222, row * 122)
MyPic.SizeMode = PictureBoxSizeMode.AutoSize
MyLab.Text = letitre.Title
MyPic.Image = New System.Drawing.Bitmap(ImageStream)
Form1.Controls.Add(MyPic)
Form1.Controls.Add(MyLab)
'Bring Labels to front
For Each ctrl As Control In Form1.Controls
'If the control is of type label
If TypeOf ctrl Is Label Then
'Then bring to front
ctrl.BringToFront()
End If
Next
'Increment the item count
item = item + 1
'If item is multiple of 4 or could check 4 then
If item Mod 4 = 0 Then
'Reset counter
item = 0
'Increment Row
row = row + 1
End If
Next
End Sub
Example return:

How to create a New List and ADD records to it in Visual Basic?

I have a LIST which contains many record's, I want to extract records based on a condition and if the condition satisfies then I need to add the condition satisfied record into a new list.
Below is the code which I have written till now:
Module Module2
Sub Main()
Dim td
td = CreateObject("TDApiOle80.TDConnection")
td.InitConnectionEx("http://qc10dev/qcbin")
'Note: For Quality Center, connect by using the URL below:
'td.InitConnectionEx "http://<servername>:port/qcbin"
td.ConnectProjectEx("DEFAULT", "GPS_PROGRAM", "PQRST", "XYX#123")
Dim tsfact 'As TDAPIOLELib.TestSetFactory
Dim tslist 'As TDAPIOLELib.List
'Getting Random Test Set ID
'************************ACCESS ALL THE TEST SETS ******************************************************************** '
tsfact = td.TestSetFactory
tslist = tsfact.NewList("")
'************************GET THE COUNT OF TEST SETS ******************************************************************
Dim Count_Of_TestSets
Count_Of_TestSets = tslist.Count
Console.WriteLine("Count of Test Sets" + Count_Of_TestSets.ToString)
'************************GET A RANDOM TEST SET INDEX ***************************************************************
Dim TestSetID As Integer
Dim TestSetName = Nothing
Dim SerialNumber As Integer = 0
Dim AttachmentPresent
Dim tslist_Having_Attachments = Nothing
For Each TestSet In tslist
TestSetID = TestSet.ID
TestSetName = TestSet.Name
'Console.WriteLine("TestSet ID::" + TestSetID.ToString() + "Test Set Name" + TestSetName)
AttachmentPresent = TestSet.HasAttachment()
If StrComp(AttachmentPresent, "True") = 0 Then
Console.WriteLine("TestSet ID::" + TestSetID.ToString() + "Test Set Name" + TestSetName)
End If
Next
Console.WriteLine("Logic Completed, Press enter")
Console.ReadLine()
tslist = Nothing
tsfact = Nothing
td = Nothing
End Sub
End Module
If you go through the above code the base List is tslist.
From this tslist which ever records has satisfied condition StrComp(AttachmentPresent, "True") = 0
has to be added to New list say tslist_attachment.
How can create a new list and add the values?
Please let me know the steps,.
Regards,
Srihari
From your description, it seems that you want to have a list of all attachments. You want to do that by iterating the TestSet, see if it contains an attachment and if so, add it to a list.
This post on SQA Forums explains how you can directly retrieve the list of attachments from a given testset by using the TestSetTreeManager and retrieving a TestSet from it by node id. Then all attachment from this node can be gathered at once:
Snippet:
Set TestSetTreeManager = TDConnection.TestSetTreeManager
Set TestSetFolder = TestSetTreeManager.NodeById(provideId)
If TestSetFolder.HasAttachments Then
Set Attachment = TestSetFolder.Attachments
Set AttachmentList = Attachment.NewList(" ")
End if
Below is the Logic I have written to handle this :)
Module Module2
Sub Main()
Dim td
td = CreateObject("TDApiOle80.TDConnection")
td.InitConnectionEx("http://qc10dev/qcbin")
'Note: For Quality Center, connect by using the URL below:
'td.InitConnectionEx "http://<servername>:port/qcbin"
td.ConnectProjectEx("DEFAULT", "GPS_PROGRAM", "svanumu", "ABCD")
Dim tsfact As TDAPIOLELib.TestSetFactory
Dim tslist As TDAPIOLELib.List
Dim Temporary_List As TDAPIOLELib.List = Nothing
Temporary_List = New TDAPIOLELib.List()
'Getting Random Test Set ID
'************************ACCESS ALL THE TEST SETS **************************************************​****************** '
tsfact = td.TestSetFactory
tslist = tsfact.NewList("")
'************************GET THE COUNT OF TEST SETS **************************************************​****************
Dim Count_Of_TestSets
Count_Of_TestSets = tslist.Count
Console.WriteLine("Count of Test Sets" + Count_Of_TestSets.ToString)
'************************GET A RANDOM TEST SET INDEX **************************************************​*************
Dim TestSetID As Integer
Dim TestSetName = Nothing
Dim SerialNumber As Integer = 0
Dim AttachmentPresent
'Dim tslist_Having_Attachments As TDAPIOLELib.TestSetFactory
Dim TestSetID1 = Nothing
Dim TestSetName1 = Nothing
For Each TestSet In tslist
TestSetID = TestSet.ID
TestSetName = TestSet.Name
AttachmentPresent = TestSet.HasAttachment()
If StrComp(AttachmentPresent, "True") = 0 Then
Temporary_List.Add(TestSet)
'Console.WriteLine("TestSetID::" + TestSetID.ToString + "TestSetName::" + TestSetName + "is Added from temporary list")
End If
Next
'************************GET THE COUNT OF TEST SETS IN THE TEMPORARY LIST **************************************************​****************
Dim Count_Of_TestSets_In_Temporary_List
Count_Of_TestSets_In_Temporary_List = Temporary_List.Count
Console.WriteLine("Count_Of_TestSets_In_Temporary_​List" + Count_Of_TestSets_In_Temporary_List.ToString)
Console.WriteLine("Logic Completed, Press enter")
Console.ReadLine()
tslist = Nothing
tsfact = Nothing
td = Nothing
End Sub
End Module
Regards,
Srihari

Extract all form elements name htmlagilitypack

i have this code to extract all form input element in html document. currently, i cant get select, textarea and other elements except input element.
Dim htmldoc As HtmlDocument = New HtmlDocument()
htmldoc.LoadHtml(txtHtml.Text)
Dim root As HtmlNode = htmldoc.DocumentNode
If root Is Nothing Then
tsslStatus.Text = "Error parsing html"
End If
' parse the page content
For Each InputTag As HtmlNode In root.SelectNodes("//input")
'get title
Dim attName As String = Nothing
Dim attType As String = Nothing
For Each att As HtmlAttribute In InputTag.Attributes
Select Case att.Name.ToLower
Case "name"
attName = att.Value
Case "type"
attType = att.Value
End Select
If attName Is Nothing OrElse attType Is Nothing Then
Continue For
End If
Dim sResult As String = String.Format("Type={0},Name={1}", attType, attName).ToLower
If txtResult.Text.Contains(sResult) = False Then
'Debug.Print(sResult)
txtResult.Text &= sResult & vbCrLf
End If
Next
Next
Can anyone help me on how to get all elements in all forms in the html document?
i found the solution, what i did was use this
Dim Tags As HtmlNodeCollection = docNode.SelectNodes("//input | //select | //textarea")
thanks for looking