Extract all form elements name htmlagilitypack - vb.net

i have this code to extract all form input element in html document. currently, i cant get select, textarea and other elements except input element.
Dim htmldoc As HtmlDocument = New HtmlDocument()
htmldoc.LoadHtml(txtHtml.Text)
Dim root As HtmlNode = htmldoc.DocumentNode
If root Is Nothing Then
tsslStatus.Text = "Error parsing html"
End If
' parse the page content
For Each InputTag As HtmlNode In root.SelectNodes("//input")
'get title
Dim attName As String = Nothing
Dim attType As String = Nothing
For Each att As HtmlAttribute In InputTag.Attributes
Select Case att.Name.ToLower
Case "name"
attName = att.Value
Case "type"
attType = att.Value
End Select
If attName Is Nothing OrElse attType Is Nothing Then
Continue For
End If
Dim sResult As String = String.Format("Type={0},Name={1}", attType, attName).ToLower
If txtResult.Text.Contains(sResult) = False Then
'Debug.Print(sResult)
txtResult.Text &= sResult & vbCrLf
End If
Next
Next
Can anyone help me on how to get all elements in all forms in the html document?

i found the solution, what i did was use this
Dim Tags As HtmlNodeCollection = docNode.SelectNodes("//input | //select | //textarea")
thanks for looking

Related

Extract specific data points from pdf. Getting "Class not registered" error

I am trying to search strings in pdf and extract data points after those strings. But I get "Class not registered" error in this line of code. The reference I use is Adobe Acrobat 8.0 Type Library
Please help. Thanks
If (objAvDoc.Open(strFilename, "")) Then
Syntax error in above line
Sub callFunc()
getTextFromPDF ("C:\XXXXXX\XXXXX\6. CDS vs FA\052835022.pdf")
End Sub
Function getTextFromPDF(ByVal strFilename As String) As String
Dim objAVDoc As New AcroAVDoc
Dim objPDDoc As New AcroPDDoc
Dim objPage As AcroPDPage
Dim objSelection As AcroPDTextSelect
Dim objHighlight As AcroHiliteList
Dim pageNum As Long
Dim strText As String
strText = ""
If (objAvDoc.Open(strFilename, "")) Then
Set objPDDoc = objAVDoc.GetPDDoc
For pageNum = 0 To objPDDoc.GetNumPages() - 1
Set objPage = objPDDoc.AcquirePage(pageNum)
Set objHighlight = New AcroHiliteList
objHighlight.Add 0, 10000 ' Adjust this up if it's not getting all the text on the page
Set objSelection = objPage.CreatePageHilite(objHighlight)
If Not objSelection Is Nothing Then
For tCount = 0 To objSelection.GetNumText - 1
strText = strText & objSelection.GetText(tCount)
Next tCount
End If
Next pageNum
objAVDoc.Close 1
End If
getTextFromPDF = strText
End Function

VB.NET SetAttribute in WebBrowser doens't work

I did some research previously, but all the answers do not work.
The "value" attribute exists in the element but does not appear in the webBrowser, nor in the input.
This is my code until then, I need the webBrowser to read an html file, then load your answers or values ​​from your inputs from a database.
PS:
My application is built in real time, there is no webbrowser control on the screen, it is created shortly after reading the html file and only then it is placed inside a panel.
Dim webBrowser As WebBrowser = New WebBrowser
Dim _doc As HtmlDocument
Dim htmlPath As String = "C:\ePrimeCare\Platis\Debug\Protocolos\" +
nomeProtocolo + "_" + idSistema.ToString() + ".html"
webBrowser.ScriptErrorsSuppressed = True
webBrowser.Navigate(htmlPath)
_doc = webBrowser.Document.OpenNew(False)
'webBrowser.DocumentText = IO.File.ReadAllText(htmlPath).ToString()
'webBrowser.Document.OpenNew(False)
'RetornaRespostasAnteriores(idSistema, idFicha, nomeProtocolo, _doc, Convert.ToDateTime(dtVisita))
_doc.Title = nomeProtocolo
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
Dim carregaRespostas As CarregarRespostaProtocoloHTML = New CarregarRespostaProtocoloHTML
Dim respostas As DataTable = carregaRespostas.BuscarRespostasProtocoloAnterior(idFicha, idSistema, dtVisita)
Dim idopcaoitem As String = 0
Dim idsetitem As String = 0
Dim value As DataRow()
Dim strArr As String()
For Each element As HtmlElement In _doc.GetElementsByTagName("input")
Dim type As String = element.GetAttribute("type")
Select Case type
Case "text"
strArr = element.GetAttribute("id").Split("_") 'For get the two ids
idopcaoitem = strArr(0)
value = respostas.Select(("IDOPCAOITEM = " + idopcaoitem.ToString()))
If value.Length > 0 Then
element.SetAttribute("value", value(0)(2).ToString())'Here i try to set the value, but does not work
End If
Case "radio"
Debug.WriteLine("Input de radio")
Case "checkbox"
Debug.WriteLine("Input de checkbox")
Case "hidden"
Debug.WriteLine("Input de hidden")
Case Else
Debug.WriteLine("Outro input")
End Select
Next
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
webBrowser.Refresh(WebBrowserRefreshOption.Completely)
webBrowser.Dock = Dock.Fill
pnlMain.Controls.Add(webBrowser)
All your document rewriting and the refresh you do at the end will overwrite any changes you made to it.
'Either of these will revert the document back to its original state.
_doc.Write(IO.File.ReadAllText(htmlPath).ToString())
webBrowser.Refresh(WebBrowserRefreshOption.Completely)
You don't even need to call _doc.Write() as WebBrowser1.Navigate(htmlPath) will work just as fine.
New code:
Dim webBrowser As New WebBrowser 'Shorthand statement.
Dim _doc As HtmlDocument
Dim htmlPath As String = "C:\ePrimeCare\Platis\Debug\Protocolos\" +
nomeProtocolo + "_" + idSistema.ToString() + ".html"
webBrowser.ScriptErrorsSuppressed = True
webBrowser.Navigate(htmlPath)
_doc = webBrowser.Document 'Removed OpenNew().
_doc.Title = nomeProtocolo
Dim carregaRespostas As New CarregarRespostaProtocoloHTML 'Another shorthand statement.
Dim respostas As DataTable = carregaRespostas.BuscarRespostasProtocoloAnterior(idFicha, idSistema, dtVisita)
(...your variables...)
For Each element As HtmlElement In _doc.GetElementsByTagName("input")
(...your code...)
Next
'(Removed _doc.Write() and Refresh() since they will undo all changes)
webBrowser.Dock = Dock.Fill
pnlMain.Controls.Add(webBrowser)

HtmlAgilityPack - SelectNodes

I'm trying to retrieve a <p class> element.
<div class="thread-plate__details">
<h3 class="thread-plate__title">(S) HexHunter BOW</h3>
<p class="thread-plate__summary">created by Aazoth</p> <!-- (THIS ONE) -->
</div>
But with no luck.
The code I am using is below:
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[#class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("created by ")
Select attribute.Value
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
It runs but I can't get the 'created by XXX' text.
I've tried many ways but got no luck, an hand would be appreciated.
Thanks in advance everyone.
Looks like you are looking wrong string in the attribute.Value. What I see is that attribute.Value.StartsWith("created by ") must be changed to this one attribute.Value.StartsWith("thread-plate__summary").
And to grab inner content of node you have to do this: Select node.InnerText;
' the example url to scrape
Dim url As String = "http://services.runescape.com/m=forum/forums.ws?39,40,goto," & Label6.Text
Dim source As String = GetSource(url)
If source IsNot Nothing Then
' create a new html document and load the pages source
Dim htmlDocument As New HtmlDocument
htmlDocument.LoadHtml(source)
' Create a new collection of all href tags
Dim nodes As HtmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//p[#class]")
' Using LINQ get all href values that start with http://
' of course there are others such as www.
Dim links =
(
From node
In nodes
Let attribute = node.Attributes("class")
Where attribute.Value.StartsWith("thread-plate__summary")
Select node.InnerText
)
Me.ListBox1a.Items.AddRange(links.ToArray)
Dim o, j As Long
For o = 0 To ListBox1a.Items.Count - 1
For j = ListBox1a.Items.Count - 1 To (o + 1) Step -1
If ListBox1a.Items(o) = ListBox1a.Items(j) Then
ListBox1a.Items.Remove(ListBox1a.Items((j)))
End If
Next
Next
For i As Integer = 0 To Me.ListBox1a.Items.Count - 1
Me.ListBox1a.Items(i) = Me.ListBox1a.Items(i).ToString.Replace("created by ", "")
Next
For Each s As String In ListBox1a.Items
Dim lvi As New NetSeal.NSListView
lvi.Text = s
NsListView1.Items.Add(lvi.Text)
Next
I hope this will work for you.

VBA is it possible to get location of keyword on IE webpage

I want to use VBA to open up a webpage for me (this webpage is made up of HTML with cells of data), find some keywords, and email out the keywords and a certain number of rows of data above and below the keywords. To do this though, I need to be able to find the location of the keywords (eg. row 3, column 2, or line 4 characters 4-10, etc.). Are there any commands in the Internet Explorer Library that will allow me to do this? So far I have code for one keyword only, that will go to the keyword and select/highlight it. Now I need to find out how to grab a certain amount of rows above/below it and send it out.
Also a side question: If you know a good way to modify my current code to create a nested loop that scans through the whole webpage, and for multiple keywords that would be very helpful!
Sub subFindScrollIE()
Dim boolFound As Boolean
Dim ie As InternetExplorer
Set ie = New InternetExplorer
ie.Navigate "my URL HERE"
strTemp = "KEYWORD1"
Do Until ie.ReadyState = READYSTATE_COMPLETE
'DoEvents
Loop
ie.Visible = True
Set txt = ie.Document.body.createTextRange()
boolFound = txt.findText(strTemp)
txt.moveStart "character", -1
txt.findText strTemp
txt.Select
txt.ScrollIntoView
Set ie = Nothing
End Sub
You could continue with the approach you have started using if you use Regular Expressions to locate the text you are after and the surrounding text.
Personally, I would favor using html objects to find what you're after. Here is some example code to iterate through generic tables:
Sub subFindScrollIE()
Dim strTemp() As Variant, output() As String, txt As String
Dim tr As HTMLTableRow, r As Integer, i As Integer
Dim tRows As IHTMLElementCollection
Dim xlR As Byte, c As Byte
Dim ie As InternetExplorerMedium
Set ie = New InternetExplorerMedium
ie.Visible = True
ie.Navigate "E:\Dev\table.htm"
strTemp = Array("abc", "mno", "vwx", "efg")
Do Until (ie.ReadyState = 4 Or ie.ReadyState = 3): Loop
Set tRows = ie.Document.body.getElementsByTagName("tr")
xlR = 2
' loop through rows
For r = 0 To tRows.Length - 1
Set tr = tRows(r)
' loop through search text
For i = 0 To UBound(strTemp)
' search row for string
txt = LCase(tr.innerHTML)
If (InStr(txt, LCase(strTemp(i))) > 0) Then
' search string found. split table data into array
txt = tr.innerHTML
txt = Replace(txt, "</td><td>", "~")
txt = Replace(txt, "<td>", "")
txt = Replace(txt, "</td>", "")
output = Split(txt, "~")
' populate cells from array
For c = 0 To UBound(output)
Sheet1.Cells(xlR, c + 2) = output(c)
Next c
xlR = xlR + 2
End If
Next i
Next r
ie.Quit
Set ie = Nothing
End Sub

Search line in text file and return value from a set starting point vb.net

I'm currently using the following to read the contents of all text files in a directory into an array
Dim allLines() As String = File.ReadAllLines(txtfi.FullName)
Within the text files are only 6 lines that all follow the same format and will read something like
forecolour=black
I'm trying to then search for the word "forecolour" and retrieve the information after the "=" sign (black) so i can then populate the below code
AllDetail(numfiles).uPath = ' this needs to be the above result
I've only posted parts of the code but if it helps i can post the rest. I just need a little guidance if possible
Thanks
This is the full code
Dim numfiles As Integer
ReDim AllDetail(0 To 0)
numfiles = 0
lb1.Items.Clear()
Dim lynxin As New IO.DirectoryInfo(zMailbox)
lb1.Items.Clear()
For Each txtfi In lynxin.GetFiles("*.txt")
Dim allLines() As String = File.ReadAllLines(txtfi.FullName)
ReDim Preserve AllDetail(0 To numfiles)
AllDetail(numfiles).uPath = 'Needs to be populated
AllDetail(numfiles).uName = 'Needs to be populated
AllDetail(numfiles).uCode = 'Needs to be populated
AllDetail(numfiles).uOps = 'Needs to be populated
lb1.Items.Add(IO.Path.GetFileNameWithoutExtension(txtfi.Name))
numfiles = numfiles + 1
Next
End Sub
AllDetail(numfiles).uPath = Would be the actual file path
AllDetail(numfiles).uName = Would be the detail after “unitname=”
AllDetail(numfiles).uCode = Would be the detail after “unitcode=”
AllDetail(numfiles).uOps = Would be the detail after “operation=”
Within the text files that are being read there will be the following lines
Unitname=
Unitcode=
Operation=
Requirements=
Dateplanned=
For the purpose of this array I just need the unitname, unitcode & operation. Going forward I will need the dateplanned as when this is working I want to try and work out how to only display the information if the dateplanned matches the date from a datepicker. Hope that helps and any guidance or tips are gratefully received
If your file is not very big you could simply
Dim allLines() As String = File.ReadAllLines(txtfi.FullName)
For each line in allLines
Dim parts = line.Split("="c)
if parts.Length = 2 andalso parts(0) = "unitname" Then
AllDetails(numFiles).uName = parts(1)
Exit For
End If
Next
If you are absolutely sure of the format of your input file, you could also use Linq to remove the explict for each
Dim line = allLines.Where(Function(x) (x.StartsWith("unitname"))).SingleOrDefault()
if line IsNot Nothing then
AllDetails(numFiles).uName = line.Split("="c)(1)
End If
EDIT
Looking at the last details added to your question I think you could rewrite your code in this way, but still a critical piece of info is missing.
What kind of object is supposed to be stored in the array AllDetails?
I suppose you have a class named FileDetail as this
Public class FileDetail
Public Dim uName As String
Public Dim uCode As String
Public Dim uCode As String
End Class
....
numfiles = 0
lb1.Items.Clear()
Dim lynxin As New IO.DirectoryInfo(zMailbox)
' Get the FileInfo array here and dimension the array for the size required
Dim allfiles = lynxin.GetFiles("*.txt")
' The array should contains elements of a class that have the appropriate properties
Dim AllDetails(allfiles.Count) as FileDetail
lb1.Items.Clear()
For Each txtfi In allfiles)
Dim allLines() As String = File.ReadAllLines(txtfi.FullName)
AllDetails(numFiles) = new FileDetail()
AllDetails(numFiles).uPath = txtfi.FullName
Dim line = allLines.Where(Function(x) (x.StartsWith("unitname="))).SingleOrDefault()
if line IsNot Nothing then
AllDetails(numFiles).uName = line.Split("="c)(1)
End If
line = allLines.Where(Function(x) (x.StartsWith("unitcode="))).SingleOrDefault()
if line IsNot Nothing then
AllDetails(numFiles).uName = line.Split("="c)(1)
End If
line = allLines.Where(Function(x) (x.StartsWith("operation="))).SingleOrDefault()
if line IsNot Nothing then
AllDetails(numFiles).uOps = line.Split("="c)(1)
End If
lb1.Items.Add(IO.Path.GetFileNameWithoutExtension(txtfi.Name))
numfiles = numfiles + 1
Next
Keep in mind that this code could be really simplified if you start using a List(Of FileDetails)