How to extract URLs from HTML source in vb.net - vb.net

My question is: I have a program that fetches the whole source code of a specified URL. The source code will be saved in a variable.
Part of the source code looks like this:
"thumbnail_src":"https:\/\/scontent-fra3-1.blablabla.com\/t51.2885-15\/s640x640\/sh0.08\/e35\/1234567_984778981596410_1107218704_n.jpg","is_video":false,
The source contains quite a few of those URLs. I want my code to treat "thumbnail_src":" as the marker where extraction begins and to stop the extraction at ","is_video":
Obviously this should be done in a loop until all URLs have been extracted and saved into a list variable.
How can I achieve that?

I am trying to get that regex into my source code. The one that codexer wrote is correct, but I am getting errors in Visual Basic .NET.
Dim regex As Regex = New Regex("thumbnail_src""": """(.*)""","""is_video")
Dim match As Match = regex.Match(sourceString)
If match.Success Then
Console.WriteLine(match.Value)
End If
I tried it this way..and also that way:
Dim regex As Regex = New Regex("thumbnail_src":"(.*)","is_video")
Something is wrong with the way I am entering the regex in code.
Here is the correct one I need to implement:
https://regex101.com/r/hK0xH8/4
thumbnail_src":"(.*)","is_video

In light of your recent edit I am going to redo this answer.
Since it looks like everything is coming in on one line of text here is how I would handle it.
Dim LargetxtLine As String = TheVeryLargylineofText
Dim CommaSplit As String() = LargetxtLine.Split(","c)
Dim URLList As New List(Of String)
Dim RG As New Regex("\"":\""(.*)\""")
For Each str As String In CommaSplit
    If str.Contains("thumbnail_src") Then
        URLList.Add(RG.Match(str).Groups(1).Value) 'Groups(1) is the URL without the surrounding quotes
    End If
Next
This breaks the long line of text into manageable chunks and then uses the regex to add each URL to a list (URLList).
From there you can do just about anything with a List(Of String).
There is another way of doing it without splitting on the commas, if you use this regex:
"thumbnail_src\"":\""(.*?)\"",\""is_video"
Adding the "?" makes the quantifier lazy (non-greedy), meaning the match stops at the first occurrence of what follows instead of the last.
After that you could create a URLList like this:
Dim RG As New Regex("thumbnail_src\"":\""(.*?)\"",\""is_video")
Dim URLList As MatchCollection = RG.Matches(reallybigString)
Each Match in the collection holds the URL in Groups(1).Value.
Which approach you use is really personal preference.
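To make the second approach concrete, here is a minimal sketch that collects the captured group from every match into a List(Of String). The module name, the function name ExtractThumbnailUrls and the sample string are made up for illustration. Turning the JSON-escaped "\/" back into "/" is an extra step that the answer above does not mention.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ThumbnailUrlExtractor
    ' Hypothetical helper: returns every URL found between
    ' "thumbnail_src":" and ","is_video in the given source text.
    Function ExtractThumbnailUrls(source As String) As List(Of String)
        ' In a VB string literal a double quote is written as "".
        ' The backslash is not an escape character in VB strings.
        Dim pattern As String = """thumbnail_src"":""(.*?)"",""is_video"""
        Dim urls As New List(Of String)
        For Each m As Match In Regex.Matches(source, pattern)
            ' Groups(1) is the lazily captured URL; "\/" is unescaped to "/".
            urls.Add(m.Groups(1).Value.Replace("\/", "/"))
        Next
        Return urls
    End Function

    Sub Main()
        ' Made-up sample in the same shape as the snippet from the question.
        Dim sample As String =
            """thumbnail_src"":""https:\/\/example.com\/a.jpg"",""is_video"":false," &
            """thumbnail_src"":""https:\/\/example.com\/b.jpg"",""is_video"":true"
        For Each url As String In ExtractThumbnailUrls(sample)
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

Writing the pattern with doubled quotes instead of backslash-escaped quotes matches the same text; the regex engine treats a bare double quote as a literal character either way.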

Related

VB.Net Read multi column text file and load into ListBox

First, I am not a programmer. I mainly just write simple scripts, but some things are just easier to do in VB. I am pretty much self-taught, so forgive me if this sounds basic or if I can't explain it too well.
I have run into an issue trying to load a multi-column text file into a list box. There are two separate issues.
The first issue is reading the text file and grabbing only the first column to use in the listbox. I am currently using ReadAllLines to copy the text file into a string array first.
Dim RDPItems() As String = IO.File.ReadAllLines(MyDocsDir & "\RDPservers.txt")
However I am having a difficult time finding the correct code to only grab the first Column of this string to put in the listbox, if I use the split option I get an error that "Value of type '1-dimensional array of String' cannot be converted to 'String'"
The code looked like
frmRDP.lstRDP.Items.Add() = Split(RDPItems, ";", CompareMethod.Text)
That is the first hurdle. The second issue is that when an item is selected in the listbox, I want the value of its second column pulled into a variable that I can use.
For this part I'm not even sure where to begin.
Example data of the text file
Server1 ; 10.1.1.1:3389
Server2 ; 192.168.1.1:8080
Server3 ; 172.16.0.1:9833
.....
When it's working, the application will read a text file with a list of servers and their IPs and put the servers in a listbox. When you select a server from the listbox and click a connect button, it will then launch
c:\windows\system32\mstsc.exe /v:serverip
Any help would be appreciated. I could hard-code a large list of servers into the VB application, but it would be easier to load a text file with the list of servers and IPs instead.
The best practise for this would probably be to store your "columns" in a Dictionary. Declare this at class level (that is, outside any Sub or Function):
Dim Servers As New Dictionary(Of String, String)
When you load your items you read the file line-by-line, adding the items to the Dictionary and the ListBox at the same time:
Using Reader As New IO.StreamReader(IO.Path.Combine(MyDocsDir, "RDPservers.txt")) 'Open the file.
    While Reader.EndOfStream = False 'Loop until the StreamReader has read the whole file.
        Dim Line As String = Reader.ReadLine() 'Read a line.
        Dim LineParts() As String = Line.Split(New String() {" ; "}, StringSplitOptions.None) 'Split the line into two parts.
        Servers.Add(LineParts(0), LineParts(1)) 'Add them to the Dictionary. LineParts(0) is the name, LineParts(1) is the IP-address.
        lstRDP.Items.Add(LineParts(0)) 'Add the name to the ListBox.
    End While
End Using 'Dispose the StreamReader.
(Note that I used IO.Path.Combine() instead of simply concatenating the strings. I recommend using that instead for joining paths together)
Now, whenever you want to get the IP-address from the selected item you can just do for example:
Dim IP As String = Servers(lstRDP.SelectedItem.ToString())
Hope this helps!
EDIT:
Missed that you wanted to start a process with it... But it's like charliefox2 wrote:
Process.Start("c:\windows\system32\mstsc.exe", "/v:" & Servers(lstRDP.SelectedItem.ToString()))
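If the file might contain blank lines, the "....." filler from the example, or inconsistent spacing around the semicolon, the loading step could be made a little more forgiving. The following is only a sketch under those assumptions. ParseServers and the module name are hypothetical, and it works on an in-memory array so it can be run without the file or the form.

```vbnet
Imports System
Imports System.Collections.Generic

Module ServerListParser
    ' Hypothetical helper: turns "name ; address" lines into a name -> address lookup.
    Function ParseServers(lines As IEnumerable(Of String)) As Dictionary(Of String, String)
        Dim servers As New Dictionary(Of String, String)
        For Each line As String In lines
            ' Split on the first ";" only, so the address part is kept whole.
            Dim parts() As String = line.Split(New Char() {";"c}, 2)
            ' Skip blank or malformed lines instead of throwing on parts(1).
            If parts.Length < 2 Then Continue For
            ' Trim() removes the spaces around the separator.
            ' The indexer overwrites duplicates, where Add() would throw.
            servers(parts(0).Trim()) = parts(1).Trim()
        Next
        Return servers
    End Function

    Sub Main()
        Dim sample() As String = {"Server1 ; 10.1.1.1:3389", "Server2 ;192.168.1.1:8080", ".....", ""}
        Dim servers As Dictionary(Of String, String) = ParseServers(sample)
        Console.WriteLine(servers("Server2"))
    End Sub
End Module
```

In the form, the keys of the returned Dictionary would be added to lstRDP.Items, and the lookup by selected item stays the same as shown above.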
Edit: @Visual Vincent's answer is way cleaner. I'll leave mine, but I recommend using his solution instead. That said, scroll down a little for how to open the server. He's got that too! Upvote his answer, and mark it as correct!
It looks like you're trying to split an array. Also, ListBox.Items.Add() works a bit differently than the way you've written your code. Let's take a look.
ListBox.Items.Add() requires that you provide it with a string inside the parameters. So you would do it like this:
frmRDP.lstRDP.Items.Add(Split(RDPItems, ";", CompareMethod.Text))
But don't do that!
When you call Split(), you must supply it with a string, not an array. In this case, RDPItems is an array, so we can't split the entire thing at once. This is the source of the error you were getting. Instead, we'll have to do it one item at a time. For this, we can use a For Each loop. See here for more info if you're not familiar with the concept.
A For Each loop will execute a block of code for each item in a collection. Using this, we get:
For Each item In RDPItems
    Dim splitline() As String = Split(item, ";") 'splits the item by semicolon, and puts each portion into the array
    frmRDP.lstRDP.Items.Add(splitline(0)) 'adds the first item in the array
Next
OK, so that gets us our server list put in our ListBox. But now, we want to open the server that our user has selected. To do that, we'll need an event handler (to know when the user has double clicked something), we'll have to find out which server they selected, and then we'll have to open that server.
We'll start by handling the double click by creating a sub to deal with it:
Private Sub lstRDP_MouseDoubleClick(sender As Object, e As MouseEventArgs) Handles lstRDP.MouseDoubleClick
Next, we'll get what the user has selected. Here, we're setting selection equal to the index that the user has selected (in this case, the first item is 0, the second is 1, and so on).
Dim selection As Integer = lstRDP.SelectedIndex
Lastly, we need to open the server. I'm assuming you want to do that in windows explorer, but if I'm mistaken please let me know.
Dim splitline() As String = Split(RDPItems(selection), ";")
Dim location As String = Trim(splitline(1))
We'll need to split the string again, but you'll notice this time I'm choosing the item whose location in the array is the same as the index of the list box the user has selected. Since we added our items to our listbox in the order they were added to our array, the first item in our listbox will be the first in the array, and so on. The location of the server will be the second part of the split function, or splitline(1). I've also included the Trim() function, which will remove any leading or trailing spaces.
Finally, we need to connect to our server. We'll use Process.Start() to launch the process.
Process.Start("c:\windows\system32\mstsc.exe", "/v:" & location)
For future reference, the first argument for Process.Start() is the location of the process, and the second argument is any argument the process might take (in this case, what to connect to).
Our final double click event handler looks something like this:
Private Sub lstRDP_MouseDoubleClick(sender As Object, e As MouseEventArgs) Handles lstRDP.MouseDoubleClick
    Dim selection As Integer = lstRDP.SelectedIndex
    Dim splitline() As String = Split(RDPItems(selection), ";")
    Dim location As String = Trim(splitline(1))
    Process.Start("c:\windows\system32\mstsc.exe", "/v:" & location)
End Sub
A final note: You may need to put
Dim RDPItems() As String = IO.File.ReadAllLines(MyDocsDir & "\RDPservers.txt")
outside of a sub, and instead just inside your class. This will ensure that both the click handler and your other sub where you populate the list box can both read from it.

In vb.net, I can't get my DirectoryInfo function to work properly

I'm trying to perform file deletion but only on files that don't exist in a list.
Example:
Dim FilesToKeep As List(Of String) = MyFunctionThatPopulatesTheList
The FilesToKeep list consists of the filenames.
Here is where I'm having trouble, as these clause functions throw me off big-time.
Dim filesToDelete
filesToDelete = New DirectoryInfo(FilePath) _
.GetFiles("*", SearchOption.AllDirectories) _
.Where(Function(f) Not f.Attributes.HasFlag(FileAttributes.Hidden)) _
.Where(Function(f) Not FilesToKeep.ToString.Contains(f.Name)) _
.[Select](Function(f) New FileCollectionForDelete(f)).ToArray()
Two things I'm trying to do if you look at the bottom two lines of the DirectoryInfo function. I only want the files that do not exist in the FilesToKeep list. The second, is just a helper where I'm storing the information about the file.
But as it stands, filesToDelete returns every single file.
Thank you for your help.
=========== EDIT =============
After the comments, I gave it another shot, but I'm curious whether anyone can offer an opinion on the stability of this function.
First, I created another variable called FilesToKeep2
Dim FilesToKeep2 As String = String.Join(",", FilesTOKeep.ToArray())
I left my function as it was, since it isn't comparing the entire path; note the (f.Name).
Right now this seems to be working properly, but I'm worried about gotchas later on.
Would this function be as solid as iterating through each one individually?
The problem is this expression:
Function(f) Not FilesToKeep.ToString.Contains(f.Name)
The type of the FilesToKeep object is a List(Of String). Calling ToString on a List(Of String) returns the name of the type. Just remove that part of the expression and you'll be fine:
Function(f) Not FilesToKeep.Contains(f.Name)
Also, I think you're overthinking things with the final .Select(). Skip that (and the .ToArray() call) entirely.
Final code:
Dim FilesToKeep As List(Of String) = MyFunctionThatPopulatesTheList()
Dim filesToDelete = (New DirectoryInfo(FilePath)).GetFiles("*", SearchOption.AllDirectories).
    Where(Function(f) Not f.Attributes.HasFlag(FileAttributes.Hidden)).
    Where(Function(f) Not FilesToKeep.Contains(f.Name))
For Each f As FileInfo In filesToDelete 'GetFiles returns FileInfo objects, not strings
    f.Delete()
Next
Regarding your edit: you're probably fine with that edit. However, you should know that commas are legal in file names, and therefore it's possible to create files that should be deleted, but will still match your string. For best results here, at least use a delimiter character that's not legal in file names.
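A further gotcha with the joined-string approach is that String.Contains matches substrings, so a file named "log.txt" would be kept whenever "catalog.txt" is in the list, regardless of the delimiter. One way to avoid both problems is to compare whole names through a HashSet. The sketch below is an illustration, not the asker's code: SelectFilesToDelete and the temporary test folder are made up, and the case-insensitive comparer is an assumption that suits Windows file names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module FileCleanup
    ' Hypothetical helper: returns the non-hidden files under root whose
    ' names are not in filesToKeep (whole-name, case-insensitive comparison).
    Function SelectFilesToDelete(root As String, filesToKeep As IEnumerable(Of String)) As List(Of FileInfo)
        Dim keep As New HashSet(Of String)(filesToKeep, StringComparer.OrdinalIgnoreCase)
        Dim allFiles As FileInfo() = New DirectoryInfo(root).GetFiles("*", SearchOption.AllDirectories)
        Return allFiles.Where(Function(f) Not f.Attributes.HasFlag(FileAttributes.Hidden) AndAlso Not keep.Contains(f.Name)).ToList()
    End Function

    Sub Main()
        ' Build a throwaway folder so the sketch does not touch real data.
        Dim root As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
        Directory.CreateDirectory(root)
        File.WriteAllText(Path.Combine(root, "catalog.txt"), "")
        File.WriteAllText(Path.Combine(root, "log.txt"), "")

        ' Only "catalog.txt" is kept; "log.txt" is selected even though
        ' its name is a substring of a kept name.
        For Each f As FileInfo In SelectFilesToDelete(root, New String() {"catalog.txt"})
            Console.WriteLine(f.Name)
            f.Delete()
        Next
        Directory.Delete(root, True)
    End Sub
End Module
```

Like the original, this compares f.Name only, so two files with the same name in different subfolders are treated alike.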

how to extract a quote from a quote generator in Vb.net

I'm trying to extract a quote from this quote generator URL: 'http://www.quotedb.com/quote/quote.php?action=random_quote'. I need it to extract JUST the quote and, optionally, the person who made the quote. This is an example reply from the generator.
document.write('When nothing seems to help, I go and look at a stonecutter hammering away at his rock perhaps a hundred times without as much as a crack showing in it. Yet at the hundred and first blow it will split in two, and I know it was not that blow that did it, but all that had gone before.');
document.write('More quotes from Jacob August Riis');
I know I need to parse it to extract the quote itself, but I'm not too sure how to do this.
I know how to download the string containing the quote but not how to extract it. So this is all I have currently:
Dim Cient As New System.Net.WebClient
Dim grab = Cient.DownloadString("http://www.quotedb.com/quote/quote.php?action=random_quote")
Any help is greatly appreciated!
Someone else could probably come up with more elegant regular expressions, but this should work. Just a couple of regular expressions to extract the parts of the returned data that you are interested in.
Dim quote = RegEx.Matches(grab, "document\.write\('(.*?)<br>'\);")(0).Groups(1).Value
Dim author = RegEx.Matches(grab, "document\.write\('<i>.*?>(.*?)</a></i>'\);")(0).Groups(1).Value
I'm not a fan of parsing HTML with Regex, but since all of these come back with the same grammar so to speak, we can consider it regular for this case.
Dim pattern As String = <![CDATA[document\.write\('(?<quote>.*)<br\>'\);\ndocument\.write\('.*href=\"(?<url>[^\"]*)\">(?<author>[^<]*)</a>.*'\).*]]>.Value
Dim quoteRegex As New Regex(pattern, RegexOptions.Compiled Or RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim Cient As New System.Net.WebClient
Dim grab = Cient.DownloadString("http://www.quotedb.com/quote/quote.php?action=random_quote")
Dim matches As MatchCollection = quoteRegex.Matches(grab)
For Each m As Match In matches
Console.WriteLine("Quote: {0}", m.Groups("quote"))
Console.WriteLine("Author: {0}", m.Groups("author"))
Console.WriteLine("URL: {0}", m.Groups("url"))
Next
This finds the quote (text within the first document.write() ignoring the quotes and the <br> tag), the Author of the quote (the Textual display of the anchor tag) and then the URL for more quotes (the href attribute of the anchor)
I declared the pattern by using XML literals so that I didn't have to escape out all the quote characters.
Requires Imports System.Text.RegularExpressions
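Both answers index straight into the match results, which throws if the service ever returns something in a different shape. A more defensive variant might look like the sketch below. It runs against a hard-coded sample instead of the live URL, and the exact markup of that sample (the <br> after the quote and the <a> element around the author name) is an assumption inferred from the patterns in the answers, not something shown in the question. ExtractQuote and ExtractAuthor are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuoteParser
    ' Hypothetical helper: returns the text of the first document.write(...)
    ' call up to the <br>, or Nothing when the input has a different shape.
    Function ExtractQuote(script As String) As String
        Dim m As Match = Regex.Match(script, "document\.write\('(.*?)<br>'\);", RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    ' Hypothetical helper: returns the text of the first <a ...>...</a> element.
    Function ExtractAuthor(script As String) As String
        Dim m As Match = Regex.Match(script, "<a [^>]*>(.*?)</a>", RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Assumed response format; the real service may differ.
        Dim sample As String =
            "document.write('A short sample quote.<br>');" & Environment.NewLine &
            "document.write('<i>More quotes from <a href=""http://www.quotedb.com/"">Some Author</a></i>');"
        Console.WriteLine("Quote: {0}", ExtractQuote(sample))
        Console.WriteLine("Author: {0}", ExtractAuthor(sample))
    End Sub
End Module
```

Checking Match.Success first lets the caller decide what to do with an unexpected response instead of getting an ArgumentOutOfRangeException.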

Parse String to Array or DataTable

I'm trying to parse a long, boring text document and format it.
"7/29/2012 1:25:20 PM","Summary Plan/Second Floor /Master_VAV_2-24","Source :OEnd"
"7/29/2012 11:25:23 AM","Summary Plan/Second Floor /Master_VAV_2-24","Source :OStart"
I'd like to parse each value between the quotes, but I can't find anything online to help me. I believe it's a matter of knowing what to call it and what to search for.
"date", "location", "type" would be the 3 values I want to parse it into, then I could run a loop for each item in datatable and format it as required.
ANY HELP would be great, thank you!
I'm thinking of using RegEx to get the rows and add them manually to an array, something like this.
Dim rx As New Regex(",", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
Dim matches As MatchCollection = rx.Matches(strSource)
For Each match As Match In matches
Dim matchValue As String = match.Value
If Not list.Contains(matchValue) Then
list.Add(matchValue)
End If
Next
This format appears to be CSV. Use one of the many free and open source CSV parsers (googling ".NET CSV parser" will return many results).
There is one that comes built in, in the Microsoft.VisualBasic.FileIO namespace: the TextFieldParser.
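As a rough sketch of the TextFieldParser route, the code below reads the two sample lines from the question into a DataTable with the three column names the asker mentioned. ParseLog and the module name are hypothetical, all columns are kept as strings for simplicity, and the parser reads from a StringReader so the example runs without a file.

```vbnet
Imports System
Imports System.Data
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module LogParser
    ' Hypothetical helper: parses quoted, comma-separated lines into a DataTable.
    Function ParseLog(text As String) As DataTable
        Dim table As New DataTable()
        table.Columns.Add("date", GetType(String))
        table.Columns.Add("location", GetType(String))
        table.Columns.Add("type", GetType(String))

        ' TextFieldParser accepts any TextReader, not only a file path.
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True ' strips the surrounding quotes
            While Not parser.EndOfData
                Dim fields() As String = parser.ReadFields()
                ' Ignore rows that do not have exactly the three expected values.
                If fields.Length = 3 Then table.Rows.Add(fields(0), fields(1), fields(2))
            End While
        End Using
        Return table
    End Function

    Sub Main()
        Dim sample As String =
            """7/29/2012 1:25:20 PM"",""Summary Plan/Second Floor /Master_VAV_2-24"",""Source :OEnd""" & Environment.NewLine &
            """7/29/2012 11:25:23 AM"",""Summary Plan/Second Floor /Master_VAV_2-24"",""Source :OStart"""
        For Each row As DataRow In ParseLog(sample).Rows
            Console.WriteLine("{0} | {1} | {2}", row("date"), row("location"), row("type"))
        Next
    End Sub
End Module
```

Unlike splitting on "," by hand, the parser also copes with commas that appear inside a quoted value.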

How can I read individual lines of a CSV file into a string array, to then be selectively displayed via combobox input?

I need your help, guys! :|
I've got myself a CSV file with the following contents:
1,The Compact,1.8GHz,1024MB,160GB,440
2,The Medium,2.4GHz,1024MB,180GB,500
3,The Workhorse,2.4GHz,2048MB,220GB,650
It's a list of computer systems, basically, that the user can purchase.
I need to read this file, line-by-line, into an array. Let's call this array csvline().
The first line of the text file would be stored in csvline(0). Line two would be stored in csvline(1). And so on. (I've started with zero because that's where VB starts its arrays.) A drop-down list would then enable the user to select 1, 2 or 3 (or however many lines/systems are stored in the file). Upon selecting a number, say 1, csvline(0) would be displayed inside a textbox (textbox1, let's say). If 2 was selected, csvline(1) would be displayed, and so on.
It's not the formatting I need help with, though; that's the easy part. I just need someone to help teach me how to read a CSV file line-by-line, putting each line into a string array - csvlines(count) - then increment count by one so that the next line is read into another slot.
So far, I've been able to paste the numbers of each system into a combobox:
Using csvfileparser As New Microsoft.VisualBasic.FileIO.TextFieldParser _
("F:\folder\programname\programname\bin\Debug\systems.csv")
Dim csvalue As String()
csvfileparser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
csvfileparser.Delimiters = New String() {","}
While Not csvfileparser.EndOfData
csvalue = csvfileparser.ReadFields()
combobox1.Items.Add(String.Format("{1}{0}", _
Environment.NewLine, _
csvalue(0)))
End While
End Using
But this only selects individual values. I need to figure out how selecting one of these numbers in the combobox can trigger textbox1 to be appended with just that line (I can handle the formatting, using the string.format stuff). If I try to do this using csvalue = csvtranslator.ReadLine , I get the following error message:
"Error 1 Value of type 'String' cannot be converted to '1-dimensional array of String'."
If I then put it as an array, ie: csvalue() = csvtranslator.ReadLine , I then get a different error message:
"Error 1 Number of indices is less than the number of dimensions of the indexed array."
What's the knack, guys? I've spent hours trying to figure this out.
Please go easy on me - and keep any responses ultra-simple for my newbie brain - I'm very new to all this programming malarkey and just starting out! :)
Structure systemstructure
Dim number As Byte
Dim name As String
Dim procspeed As String
Dim ram As String
Dim harddrive As String
Dim price As Integer
End Structure
Private Sub csvmanagement()
Dim systemspecs As New systemstructure
Using csvparser As New FileIO.TextFieldParser _
("F:\folder\programname\programname\bin\Debug\systems.csv")
Dim csvalue As String()
csvparser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
csvparser.Delimiters = New String() {","}
csvalue = csvparser.ReadFields()
systemspecs.number = csvalue(0)
systemspecs.name = csvalue(1)
systemspecs.procspeed = csvalue(2)
systemspecs.ram = csvalue(3)
systemspecs.harddrive = csvalue(4)
systemspecs.optical = csvalue(5)
systemspecs.graphics = csvalue(6)
systemspecs.audio = csvalue(7)
systemspecs.monitor = csvalue(8)
systemspecs.software = csvalue(9)
systemspecs.price = csvalue(10)
While Not csvparser.EndOfData
csvalue = csvparser.ReadFields()
systemlist.Items.Add(systemspecs)
End While
End Using
End Sub
Edit:
Thanks for your help guys, I've managed to solve the problem now.
It was merely a matter of calling loops at the right point in time.
I would recommend using FileHelpers to do the reading.
The binding shouldn't be an issue after that.
Here is the Quickstart for Delimited Records:
Dim engine As New FileHelperEngine(GetType(Customer))
' To read, use:
Dim res As Customer() = DirectCast(engine.ReadFile("FileIn.txt"), Customer())
' To write, use:
engine.WriteFile("FileOut.txt", res)
When you get the file read, put it into a normal class and just bind to the class or use the list of items you have to do custom stuff with the combobox. Basically, get it out of the file and into a real class asap, then things will be easier.
At least take a look at the library. After using it, we use a lot more simple flat files since it is so easy, and we haven't written a file access routine since (for that kinda stuff).
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
I think your main problem is understanding how arrays work (hence the error message).
You can use split and join functions to convert strings into and out of arrays
Dim s() As String = Split("1,2,3", ",") gives an array of strings with 3 elements
Dim ss As String = Join(s, ",") gives you the string back
Firstly, it's actually really good that you are using the TextFieldParser for reading CSV files - most don't but you won't have to worry about extra commas and quoted text etc...
The ReadLine method only gives you the raw line as a single string, whereas ReadFields returns an array of strings, hence the "Error 1 Value of type 'String' cannot be converted to '1-dimensional array of String'."
What you may find easier with combo boxes etc is to use an object (e.g. 'systemspecs') rather than strings. Assign the CSV data to the objects and override the "ToString" method of the 'systemspecs' class to display in the combo box how you want with formatting etc. That way when you handle the SelectedIndexChanged event (or similar) you get the "SelectedItem" from the combo box (which can be Nothing so check) and cast it as the 'systemspecs' to use it. The advantage is that you are not restricted to display the exact data in the combo etc.
' in "systemspecs"...
Public Overrides Function ToString() As String
Return Name ' or whatever...
End Function ' ToString
e.g.
Dim item As New systemspecs
item.ID = csvalue(0) ' first field of the line
item.Name = csvalue(1) ' second field of the line
' etc...
combobox1.Items.Add(item)
Let me know if that makes sense!
PK :-)
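To show the object-based idea end to end, here is a small console sketch. SystemSpec, FromFields and the property names are made up for illustration, and it uses String.Split on in-memory lines for brevity. With a real file, the fields would come from TextFieldParser.ReadFields() as in the question.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical class holding one line of the CSV file.
Public Class SystemSpec
    Public Property Number As Integer
    Public Property Name As String
    Public Property ProcSpeed As String
    Public Property Ram As String
    Public Property HardDrive As String
    Public Property Price As Integer

    ' Builds an object from the six fields of one CSV line.
    Public Shared Function FromFields(fields() As String) As SystemSpec
        Return New SystemSpec With {
            .Number = Integer.Parse(fields(0)),
            .Name = fields(1),
            .ProcSpeed = fields(2),
            .Ram = fields(3),
            .HardDrive = fields(4),
            .Price = Integer.Parse(fields(5))
        }
    End Function

    ' A ComboBox shows whatever ToString() returns for each item.
    Public Overrides Function ToString() As String
        Return Number.ToString() & " - " & Name
    End Function
End Class

Module SystemListDemo
    Sub Main()
        Dim lines() As String = {
            "1,The Compact,1.8GHz,1024MB,160GB,440",
            "2,The Medium,2.4GHz,1024MB,180GB,500",
            "3,The Workhorse,2.4GHz,2048MB,220GB,650"
        }
        Dim systems As New List(Of SystemSpec)
        For Each line As String In lines
            systems.Add(SystemSpec.FromFields(line.Split(","c)))
        Next
        For Each spec As SystemSpec In systems
            Console.WriteLine("{0}: {1}, {2}, {3}, price {4}", spec, spec.ProcSpeed, spec.Ram, spec.HardDrive, spec.Price)
        Next
    End Sub
End Module
```

In the form, each object would be added with combobox1.Items.Add(spec), and the SelectedIndexChanged handler could cast combobox1.SelectedItem back to SystemSpec (after checking for Nothing) to fill textbox1.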