How do I search through a string for a particular hyperlink in Visualbasic.net? - vb.net

I have a written a program which downloads a webpage's source but now I want to search the source for a particular link I know the link is written like this:
<b>Geographical Survey Work</b>
Is there anyway of using "Geographical Survey Work" as criteria to retrieve the link? The code I am using to download the source to a string is this:
Dim sourcecode As String = ((New Net.WebClient).DownloadString("http://examplesite.com"))
So just to clarify I want to type into an input box "Geographical Survey Work" for instance and "/internet/A2" to popup in a messagebox? I think it can be done using a regex, but that's a bit beyond me. Any help would be great.

With HTMLAgilityPack:
Dim vsPageHTML As String = "<html>... your webpage HTML code ...</html>"
Dim voHTMLDoc.LoadHtml(vsPageHTML) : vsPageHTML = ""
Dim vsURI As String = ""
Dim voNodes As HtmlAgilityPack.HtmlNodeCollection = voHTMLDoc.SelectNodes("//a[#href]")
If Not IsNothing(voNodes) Then
For Each voNode As HtmlAgilityPack.HtmlNode In voNodes
If voNode.innerHTML.toLower() = "<b>geographical survey work</b>" Then
vsURI = voNode.GetAttributeValue("href", "")
Exit For
End If
Next
End If
voNodes = Nothing : voHTMLDoc = Nothing
Do whatever you want with vsURI.
You might need to tweak the code a bit as I'm writing free-hand.

Related

OpenXML: Losing Custom Document Property After Editing Word Document

Using DocumentFormat.OpenXML, I am trying to add a custom property to a Word document and then later read the property. The following code "appears" to do just that:
Dim os As OpenSettings = New OpenSettings() With {
.AutoSave = False
}
Dim propVal As String = "Test Value"
Using doc As WordprocessingDocument = WordprocessingDocument.Open(filename, True, os)
Dim cPart As CustomFilePropertiesPart = doc.CustomFilePropertiesPart
If cPart Is Nothing Then
cPart = doc.AddCustomFilePropertiesPart
cPart.Properties = New DocumentFormat.OpenXml.CustomProperties.Properties()
End If
Dim cPart As CustomFilePropertiesPart = doc.CustomFilePropertiesPart
Dim cProps As Properties = cPart.Properties
For Each prop As CustomDocumentProperty In cProps
If prop.Name = "TranscriptID" Then
prop.Remove()
Exit For
End If
Next
Dim newProp As CustomDocumentProperty = New CustomDocumentProperty() With {
.Name = "TranscriptID"
}
newProp.VTBString = New VTBString(propVal)
newProp.FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
cProps.AppendChild(newProp)
Dim pid As Integer = 2
For Each item As CustomDocumentProperty In cProps
item.PropertyId = pid
pid += 1
Next
cProps.Save()
End Using
This code is modeled after code found here:
https://learn.microsoft.com/en-us/office/open-xml/how-to-set-a-custom-property-in-a-word-processing-document
It appears to work in this scenario:
Execute code from above.
Execute code from above again.
At #2 I expect to find the CustomFilePropertiesPart and the property value and my expectation is met.
The problem appears in this scenario:
Execute code from above.
Open document using Microsoft Word, save and close.
Execute code from above again.
What happens in this scenario is that the CustomFilePropertiesPart is missing, whereas it should be found. It is as if Microsoft Word does not successfully read this object, so when the document is save, the object is lost. This suggests to me that there is something that there is something wrong with my code. If you can see what it is, or if you have a comparable working example that I could compare it with, I would appreciate hearing from you. I feel like I correctly followed the Microsoft example, but obviously I did not and I am having trouble seeing where I departed. Thanks.
OK, I found this wonderful tool called the Office Productivity Tool. It has a code generation feature, so I was able to compare what I was doing with what Word does. Basically the problem was with setting the property value. This snippet does the trick:
Dim cProps As Properties = cPart.Properties
Dim val As DocumentFormat.OpenXml.VariantTypes.VTLPWSTR = New DocumentFormat.OpenXml.VariantTypes.VTLPWSTR
val.Text = tr.ID.ToString
Dim newProp As CustomDocumentProperty = New CustomDocumentProperty() With {
.Name = "TranscriptID",
.FormatId = "{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
}
newProp.Append(val)
cProps.AppendChild(newProp)

Grab specific part of text from a local html file and use it as variable

I am making a small "home" application using VB. As the title says, I want to grab a part of text from a local html file and use it as variable, or put it in a textbox.
I have tried something like this...
Private Sub Open_Button_Click(sender As Object, e As EventArgs) Handles Open_Button.Click
Dim openFileDialog As New OpenFileDialog()
openFileDialog.CheckFileExists = True
openFileDialog.CheckPathExists = True
openFileDialog.FileName = ""
openFileDialog.Filter = "All|*.*"
openFileDialog.Multiselect = False
openFileDialog.Title = "Open"
If openFileDialog.ShowDialog = Windows.Forms.DialogResult.OK Then
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog1.FileName)
TextBox.Text = fileReader
End If
End Sub
The result is to load the whole html code inside this textbox. What should I do so to grab a specific part of html files's code? Let's say I want to grab only the word text from this span...<span id="something">This is a text!!!</a>
I make the following assumptions on this answer.
Your html is valid - i.e. the id is completely unique in the document.
You will always have an id on your html tag
You'll always be using the same tag (e.g. span)
I'd do something like this:
' get the html document
Dim fileReader As String = My.Computer.FileSystem.ReadAllText(openFileDialog1.FileName)
' split the html text based on the span element
Dim fileSplit as string() = fileReader.Split(New String () {"<span id=""something"">"}, StringSplitOptions.None)
' get the last part of the text
fileReader = fileSplit.last
' we now need to trim everything after the close tag
fileSplit = fileReader.Split(New String () {"</span>"}, StringSplitOptions.None)
' get the first part of the text
fileReader = fileSplit.first
' the fileReader variable should now contain the contents of the span tag with id "something"
Note: this code is untested and I've typed it on the stack exchange mobile app, so there might be some auto correct typos in it.
You might want to add in some error validation such as making sure that the span element only occurs once, etc.
Using an HTML parser is highly recommended due to the HTML language's many nested tags (see this question for example).
However, finding the contents of a single tag using Regex is possible with no bigger problems if the HTML is formatted correctly.
This would be what you need (the function is case-insensitive):
Public Function FindTextInSpan(ByVal HTML As String, ByVal SpanId As String, ByVal LookFor As String) As String
Dim m As Match = Regex.Match(HTML, "(?<=<span.+id=""" & SpanId & """.*>.*)" & LookFor & "(?=.*<\/span>)", RegexOptions.IgnoreCase)
Return If(m IsNot Nothing, m.Value, "")
End Function
The parameters of the function are:
HTML: The HTML code as string.
SpanId: The id of the span (ex. <span id="hello"> - hello is the id)
LookFor: What text to look for inside the span.
Online test: http://ideone.com/luGw1V

vb.net scan installation folder for rtf documents

Does someone know a code so that my program looks in the map where it is installed for rtf documents and show them in a combobox. I am making an agenda.
It can search for "VPA event -" in the title too (that is what all the event names start with). If i have the code to d that then i can use this one to read the events
Dim objreader2 As New System.IO.StreamReader(ComboBox1.Text & ".rtf")
RichTextBox2.Text = objreader2.ReadToEnd
objreader2.Close()
thanks
Here's the code you can use to load the comboBox. After that, you can use the SelectedIndexChanged event (or other events) to read the file into the rich text box. I didn't test this, but it should be pretty close.
dim ss() as string
fPath = System.Windows.Forms.Application.UserAppDataPath
' or fPath = My.Computer.FileSystem.SpecialDirectories.MyDocuments, or other directory
ss = Directory.GetFiles(fPath, "*.rtf")
ComboBox1.items.clear()
for each string s in ss
ComboBox1.Items.add(s)
next s

How to Post & Retrieve Data from Website

I am working with a Windows form application. I have a textbox called "tbPhoneNumber" which contains a phone number.
I want to go on the website http://canada411.com and enter in the number that was in my textbox, into the website textbox ID: "c411PeopleReverseWhat" and then somehow send a click on "Find" (which is an input belonging to class "c411ButtonImg").
After that, I want to retrieve what is in between the asterixs of the following HTML section:
<div id="contact" class="vcard">
<span><h1 class="fn c411ListedName">**Full Name**</h1></span>
<span class="c411Phone">**(###)-###-####**</span>
<span class="c411Address">**Address**</span>
<span class="adr">
<span class="locality">**City**</span>
<span class="region">**Province**</span>
<span class="postal-code">**L#L#L#**</span>
</span>
So basically I am trying to send data into an input box, click the input button and store the values retrieved into variables. I want to do this seemlessly so I would need to do something like an HTTPWebRequest? Or do I use a WebBrowser object? I just don't want the user to see that the application is going on a website.
I do a good amount of website scraping and I will show you how I do it. Feel free to skip ahead if I am being too specific, but this is a commonly requested theme and should be made specific.
URL Simplification
The library I use for this is htmlagilitypack (It is a dll, make a new project and add a reference to it). The first thing to check is if we have to go to take any special steps to get to a page by using a phone number. I searched for John Smith and found quite a few. I entered 2 of these results and noticed that the url formatting is very simple. Those results were..
http://www.canada411.ca/res/7056736767/John-Smith/138223109.html
http://www.canada411.ca/res/7052355273/John-Smith/172439951.html
I tested to see if I can remove some of the values from the url that I don't know and just leave the phone number. The result was that I can...
http://www.canada411.ca/search/re/1/7056736767/-
http://www.canada411.ca/search/re/1/7052355273/-
You can see by the url that there are some static areas in the url and our phone number. From this lets construct a string for the url.
Dim phoneNumber as string = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL as string = "http://www.canada411.ca/search/re/1/" + phoneNumber +"/-"
Value Extraction with XPath
Now that we have the page dialed in, lets examine the html you provided above. You need 6 values from the page so we will create them now...
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
As mentioned above, we will be using htmlagilitypack which uses Xpath. The cool thing about this is that once we can find some unique identifier in the html, we can use Xpath to find our values. I know it may be confusing, but it will become clearer.
All of the values you need are within tags that have a class name. Lets use the class name in our Xpath to find them.
Dim FullNameXPath As String = "//*[#class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[#class='c411Phone']"
Dim AddressXPath As String = "//*[#class='c411Address']"
Dim LocalityXPath As String = "//*[#class='locality']"
Dim RegionXPath As String = "//*[#class='region']"
Dim PostalCodeXPath As String = "//*[#class='postal-code']"
Essentially what we are looking at is a string that will inform htmlagilitypack what to look for. In our case, text contained within the classes we named. There is a lot to XPath and it could take a while to explain all of it. On a side note though...If you use Google Chrome and highlight a value on a page, you can right click inspect element. In the code that appears below, you can right click the value and copy to XPath!!! Very useful.
Basic HTMLAgilityPack Template
Now, all that is left is to connect to the page and get those variables populated.
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
For Each nameResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(FullNameXPath)
Msgbox(nameResult.InnerText)
Next
In the above example we create an HtmlWeb object named Web. This is the actual crawler of our project. We then define a HtmlDocument which will consist of our converted and searchable page source. All of this is done behind the scenes. We then send Web to get the page source and assign it to the Doc object we created. Doc is reusable, which thankfully requires us to connect to the page only once.
The for loop looks for any nodes in our Doc that match FullNameXPath which was defined previously as the XPath value for finding the name. When a Node is found, it is assigned to the nameResult variable and from within the loop we call a message box to display the inner text of our node.
So when we put it all together
Complete Working Code (As of 2/17/2013)
Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" + phoneNumber + "/-"
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
Dim FullNameXPath As String = "//*[#class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[#class='c411Phone']"
Dim AddressXPath As String = "//*[#class='c411Address']"
Dim LocalityXPath As String = "//*[#class='locality']"
Dim RegionXPath As String = "//*[#class='region']"
Dim PostalCodeXPath As String = "//*[#class='postal-code']"
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
For Each nameResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(FullNameXPath)
FullName = nameResult.InnerText
MsgBox(FullName)
Next
For Each PhoneResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(PhoneXPath)
Phone = PhoneResult.InnerText
MsgBox(Phone)
Next
For Each ADDRResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(AddressXPath)
Address = ADDRResult.InnerText
MsgBox(Address)
Next
For Each LocalResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(LocalityXPath)
Locality = LocalResult.InnerText
MsgBox(Locality)
Next
For Each RegionResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(RegionXPath)
Region = RegionResult.InnerText
MsgBox(Region)
Next
For Each postalCodeResult As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(PostalCodeXPath)
PostalCode = postalCodeResult.InnerText
MsgBox(PostalCode)
Next
Yes it is possible, I've done this using the selenium framework, which is aimed for testing automation. However, it provides you with the tools to do exactly that.
Download for .net here:
http://docs.seleniumhq.org/download/

Text from webpage

I need to get some text from this web page. I want to use the trade feed for my program to analyse the sentiment of the markets.
I used the browser control and the get element command but its not working. The problem is that whenever my browser starts to open the page I get Java scripts errors.
I tried with DOM but seems that i dont quite understand what i need to do :)
Here is the code:
Dim code As String
Using client As New WebClient
code = client.DownloadString("http://openbook.etoro.com/ahanit/#/profile/Trades/")
End Using
Dim htmlDocument As IHTMLDocument2 = New HTMLDocument(code)
htmlDocument.write(htmlDocument)
Dim allElements As IHTMLElementCollection = htmlDocument.body.all
Dim allid As IHTMLElementCollection = allElements.tags("id")
Dim element As IHTMLElement
For Each element In allid
element.title = element.innerText
MsgBox(element.innerText)
Next
Update: So I tried the HTML Agility pack, as suggested in the comments, and I am stuck again on this code
Dim plain As String = String.Empty
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml("http://openbook.etoro.com/ahanit/#/profile/Trades/")
Dim goodnods As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("THE PROBLEM")
For Each node In goodnods
TextBox1.Text = htmldoc.DocumentNode.InnerText
Next
Any advice what to now?
Ok I think I know what the problem is somehow the div that I need is hidden and its not loaded when I load the web page just the source code. Does someone knows how to load all the hidden divs ??
Here is my new code
Dim doc As New HtmlAgilityPack.HtmlDocument
Dim web As New HtmlWeb
doc = web.Load("http://openbook.etoro.com/ahanit/#/profile/Trades/")
Dim nodes As HtmlNode = doc.GetElementbyId("feed-items")
Dim id As String = nodes.WriteTo()
TextBox1.Text = TextBox1.Text & vbCrLf & id
user1336635,
Welcome to SO! Something you might try is to check out his source code, figure out what javascript function is populating the field you want (using firebug - I assume it's the one that "trades result in profit" next to it), and then embedding that script into a web page that your webbrowser control loads. That's where I'd try to start. I checked his source code and searched for "trades result in profit" and didn't find anything which leads me to believe hunting for the element 'might' not be possible. Just a starting place until someone with more experience with this chimes in!! Best!
-sf