I use a wordcloud based on a list of words and their frequencies. I load the list from a text file and display them in a Listview and image. When the textfile is not indexed (the highest frequencies first) the word cloud doesn't make the words with the highest counts the biggest.
Is there a way to load the words, highest frequencies first, without having to change the list?
Imports WordCloudGen = WordCloud.WordCloud
Imports System.IO
Public Class WordCloud
Private Sub WordCloud_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim lines = File.ReadLines("C:\Users\Gebruiker\Downloads\Words.txt")
Dim Words As New List(Of String) '({100})
Dim Frequencies As New List(Of Integer) '({100})
Dim textValue As String()
Dim items As New List(Of ListViewItem)
For Each line In lines
textValue = line.Split(New Char() {","})
Words.Add(textValue(0))
Frequencies.Add(Integer.Parse(textValue(1)))
items.Add(New ListViewItem(New String() {textValue(0).ToString, textValue(1).ToString}, 0))
Next
ListView1.Items.AddRange(items.ToArray)
Dim wc As WordCloudGen = New WordCloudGen(600, 400)
Dim i As Image = wc.Draw(Words, Frequencies)
ResultPictureBox.Image = i
End Sub
When the textfile is not indexed (the highest frequencies first) the word cloud doesn't make the words with the highest counts the biggest. Is there a way to load the words, highest frequencies first, without having to change the list?
I would recommend a new class to hold your data then you can sort anything you need much easier.
Create a new class: WordsFrequencies
Public Class WordsFrequencies
Public Property Word As String
Public Property Frequency As Integer
End Class
Change your WordCloud_Load routine as below:
Dim WordsFreqList As New List(Of WordsFrequencies)
For Each line As String In File.ReadLines("C:\Users\Gebruiker\Downloads\Words.txt")
Dim splitText As String() = line.Split(","c)
If splitText IsNot Nothing AndAlso splitText.Length = 2 Then
Dim wordFrq As New WordsFrequencies
Dim freq As Integer
wordFrq.Word = splitText(0)
wordFrq.Frequency = If(Integer.TryParse(splitText(1), freq), freq, 0)
WordsFreqList.Add(wordFrq)
End If
Next
If WordsFreqList.Count > 0 Then
' Order the list based on the Frequency
WordsFreqList = WordsFreqList.OrderByDescending(Function(w) w.Frequency).ToList
' Add the sorted items to the listview
WordsFreqList.ForEach(Sub(wf)
ListView1.Items.Add(New ListViewItem(New String() {wf.Word, wf.Frequency.ToString}, 0))
End Sub)
End If
In the above, I would recommend doing a simple For loop with File.ReadLines, this is so you don't have to load the whole file in memory if you're just getting data and parsing it. I'm using the OrderByDescending Method which is part of System.Linq namespace.
As far as this: Dim i As Image = wc.Draw(Words, Frequencies) you then could do something like:
Dim i As Image = wc.Draw(WordsFreqList.Select(Function(wf) wf.Word), WordsFreqList.Select(Function(wf) wf.Frequency))
This will project the Word's into an IEnumerable(String) and then Frequency into an IEnumerable(Integer).
Related
I am needing to create a code that is versatile enough where I can add more columns in the future with minimum reconstruction of my code. My current code does not allow me to travel through my file with my 2-D array. If I was to change MsgBox("map = "+ map(0,1) I can retrieve the value easily. Currently all I get in the code listed is 'System.IndexOutOfRangeException' and that Index was outside the bounds of the array. My current text file is 15 rows (down) and 2 columns (across) which puts it at a 14x1. they are also comma separated values.
Dim map(14,1) as string
Dim reader As IO.StreamReader
reader = IO.File.OpenText("C:\LocationOfTextFile")
Dim Linie As String, x,y As Integer
For x = 0 To 14
Linie = reader.ReadLine.Trim
For y = 0 To 1
map(x,y) = Split(Linie, ",")(y)
Next 'y
Next 'x
reader.Close()
MsgBox("map = " + map(y,x))``
Here's a generic way to look at reading the file:
Dim data As New List(Of List(Of String))
For Each line As String In IO.File.ReadAllLines("C:\LocationOfTextFile")
data.Add(New List(Of String)(line.Split(",")))
Next
Dim row As Integer = 1
Dim col As Integer = 10
Dim value As String = data(row)(col)
This is the method suggested by Microsoft. It is generic and will work on any properly formatted comma delimited file. It will also catch and display any errors found in the file.
Using MyReader As New Microsoft.VisualBasic.
FileIO.TextFieldParser(
"C:\LocationOfTextFile")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.
FileIO.MalformedLineException
MsgBox("Line " & ex.Message &
"is not valid and will be skipped.")
End Try
End While
End Using
Essentially what you are asking is how can I take the contents of comma-separated values and convert this to a 2D array.
The easiest way, which is not necessarily the best way, is to return an IEnuemrable(Of IEnumerable(Of String)). The number of items will grow both vertically based on the number of lines and the number of items will grow horizontally based on the values split on a respective line by a comma.
Something along these lines:
Private Function GetMap(path As String) As IEnumerable(Of IEnumerable(Of String)
Dim map = New List(Of IEnumerable(Of String))()
Dim lines = IO.File.ReadAllLines(path)
For Each line In lines
Dim row = New List(Of String)()
Dim values = line.Split(","c)
row.AddRange(values)
map.Add(row)
Next
Return map
End Function
Now when you want to grab a specific cell using the (row, column) syntax, you could use:
Private _map As IEnumerable(Of IEnumerable(Of String))
Private Sub LoadMap()
_map = GetMap("C:/path-to-map")
End Sub
Private Function GetCell(row As Integer, column As Integer) As String
If (_map Is Nothing) Then
LoadMap()
End If
Return _map.ElementAt(row).ElementAt(column)
End Function
Here is an example: https://dotnetfiddle.net/ZmY5Ki
Keep in mind that there are some issues with this, for example:
What if you have commas in your cells?
What if you try to access a cell that doesn't exist?
These are considerations you need to make when implementing this in more detail.
You can consider the DataTable class for this. It uses much more memory than an array, but gives you a lot of versatility in adding columns, filtering, etc. You can also access columns by name rather than index.
You can bind to a DataGridView for visualizing the data.
It is something like an in-memory database.
This is much like #Idle_Mind's suggestion, but saves an array copy operation and at least one allocation per row by using an array, rather than a list, for the individual rows:
Dim data = File.ReadLines("C:\LocationOfTextFile").
Select(Function(ln) ln.Split(","c)).
ToList()
' Show last row and column:
Dim lastRow As Integer = data.Count - 1
Dim lastCol As Integer = data(row).Length - 1
MsgBox($"map = {data(lastRow)(lastCol)}")
Here, assuming Option Infer, the data variable will be a List(Of String())
As a step up from this, you could also define a class with fields corresponding to the expected CSV columns, and map the array elements to the class properties as another call to .Select() before calling .ToList().
But what I really recommend is getting a dedicated CSV parser from NuGet. While a given CSV source is usually consistent, more broadly the format is known for having a number of edge cases that can easily confound the Split() function. Therefore you tend to get better performance and consistency from a dedicated parser, and NuGet has several good options.
Right now I have many locations with this structure. At the moment I have: name as string and x,y,z positions as single. So it's a mix of data types and I might want to add both more data in the future and also other data types. I must be able to easily extract any part of this data.
Example of how I'll work with this data is: When I choose South Wales from a combobox then I want to get its properties, x,y,z populated in a textbox. So they need to be "linked". If I choose London then it'll have its x,y,z properties etc.
My initial idea is just to dim every single data such as in the first example below. This should be the easiest way with 100% control of what's what, and I could easily extract any single data but at the same time might get tedious I assume, or am I wrong? Is it a good way to approach this?
Dim SW_FP As String = "South Wales"
Dim SW_FP_X As Single = "489,1154"
Dim SW_FP_Y As Single = "-8836,795"
Dim SW_FP_Z As Single = "109,6124"
The next example below is something i just googled up. Is this a good method?
Dim dt As DataTable = New DataTable
dt.Columns.Add("South Wales", GetType(String))
dt.Columns.Add("489,1154", GetType(Single))
dt.Columns.Add("-8836,795", GetType(Single))
dt.Columns.Add("109,6124", GetType(Single))
OR should I use something else? Arrays, Objects with properties... and this is where my ideas end. Are there other methods? XML?
I want to do it in a smart way from start instead of risking to rewrite/recreate everything in the future. So my main question is: Which method would you suggest to be the smartest to choose? and also if you could provide a super tiny code example.
You mentioned that when you choose an item you want to get it's properties. This shows that you are looking for objects. If not using a database one example could be to make Location objects and have a List of these to be added or removed from. Then you have a lot of different ways to get the data back from the List. For example:
Class:
Public Class Location
Public Property Name As String
Public Property X As Single
Public Property Y As Single
Public Property Z As Single
End Class
List:
Dim locations As New List(Of Location)
Dim location As New Location With {
.Name = "South Wales",
.X = 1.1,
.Y = 1.2,
.Z = 1.3
}
locations.Add(location)
LINQ to get result:
Dim result = locations.SingleOrDefault(Function(i) i.Name = "South Wales")
This is just an example for use within your program, hope it helps.
Disclaimer: Untested code. It's more to guide you than copy-paste into your project.
First, create a Class that will represent the structured data:
Public Class Location
Public Property Name As String
Public Property PositionX As Single
Public Property PositionY As Single
Public Property PositionZ As Single
Public Sub New()
Me.New (String.Empty, 0, 0, 0)
End Sub
Public Sub New(name As String, x As Single, y As Single, z As Single)
Me.Name = name
Me.PositionX = x
Me.PositionY = y
Me.PositionZ = z
End Sub
Now, you can create a List(Of Location) and use that List to bind to a ComboBox, like this:
Dim list As New List(Of Location) = someOtherClass.ReadLocations ' Returns a List(Of Location) from your database, or file, or whatever.
cboLocations.DataSource = list
cboLocations.DisplayMember = "Name" ' The name of the Location class' Property to display.
cboLocations.ValueMember = "Name" ' Use the same Name Property since you have no ID.
You can also forego the list variable declaration like the following, but I wanted to show the declaration of list above:
cboLocations.DataSource = someOtherClass.ReadLocations
Function someOtherClass.ReadLocations() may populate the List(Of Locations) in a way similar to this. Note I'm not including data access code; this is just an example to show how to add Location objects to the List(Of Location):
Dim list As List(Of Location)
' Some loop construct
For each foo in Bar
Dim item As New Location(foo.Name, foo.X, foo.Y, foo.Z)
list.Add(item)
' End loop
Return list
The "magic" happens when you select an option from the ComboBox. I forget the ComboBox event offhand, so that's homework for you :-) You take the selected Object of the ComboBox and cast it back to the native type, in this case Location:
Dim item As Location = DirectCast(cboLocations.SelectedItem, Location)
txtName.Text = item.Name
txtPosX.Text = item.PositionX.ToString
txtPosY.Text = item.PositionY.ToString
txtPosZ.Text = item.PositionZ.ToString
Here is one way, using a DataTable as you mentioned. This is a stand alone example project just to show code used.
This example loads data from file is found and saves data on exit.
Form1 Image
' Stand alone example
' needs DataGridView1, Label1 and
' ComboBox1 on the Designer
' copy/replace this code with default
Option Strict On
Option Explicit On
Public Class Form1
Dim dt As New DataTable("Freddy")
Dim bs As New BindingSource
'edit path/filename to use as test data path
Dim filepath As String = "C:\Users\lesha\Desktop\TestData.xml"
Private Sub Form1_FormClosing(sender As Object, e As FormClosingEventArgs) Handles Me.FormClosing
dt.WriteXml(filepath)
End Sub
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
With dt
dt.Columns.Add("Country", GetType(String))
dt.Columns.Add("x", GetType(String))
dt.Columns.Add("y", GetType(String))
dt.Columns.Add("z", GetType(String))
' add extra column to hold concatenated
' location (could be a hidden column)
dt.Columns.Add("CombLoc", GetType(String), "'x = ' + x + ' y = ' + y + ' z = ' + z")
If IO.File.Exists(filepath) Then
' saved file found so load it
dt.ReadXml(filepath)
Else
' no saved file so make one test row
.Rows.Add("South Wales", 489.1154, -8836.795, 109.6124)
End If
End With
bs.DataSource = dt
DataGridView1.DataSource = bs
' set any properties for DataGridView1
With DataGridView1
' to hide Combined Location column
.Columns("CombLoc").Visible = False
' dontwant row headers
.RowHeadersVisible = False
End With
set up ComboBox
With ComboBox1
.DataSource = bs
' displayed item
.DisplayMember = "Country"
' returned item
.ValueMember = "CombLoc"
If .Items.Count > 0 Then .SelectedIndex = 0
End With
' default Label text
Label1.Text = "No items found"
End Sub
Private Sub ComboBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ComboBox1.SelectedIndexChanged
no items in list so exit sub
If ComboBox1.SelectedIndex < 0 Then Exit Sub
send returneditem to Label
Label1.Text = ComboBox1.SelectedValue.ToString
End Sub
End Class
I'm trying to create a program in vb.net (forms) to process data from a UVvis spectrometer.
The txt file output looks as following.
"180809_QuartzRefTrans.spc - RawData"
"Wavelength nm.","T%"
400.00,90.822
401.00,90.800
402.00,90.823
403.00,90.811
404.00,90.803
405.00,90.804
406.00,90.816
407.00,90.811
408.00,90.833
409.00,90.837
410.00,90.847
411.00,90.827
412.00,90.839
413.00,90.851
414.00,90.828
415.00,90.879
416.00,90.846
and so on.
What I am trying to do is read the data it into an array so I can manipulate the columns. I need to be able to skip the first two lines so that all I have is numerical data. I also need it to sort the array from lowest to highest (wavelengths). Sometimes we run from 800->200 nm then accidentally put in 200->800 nm
Imports System.IO
Imports System.Windows.Forms.DataVisualization.Charting
Public Class Form1
Public Class RefTrans
Public Property Wavelength As Double
Public Property Transpercent As Double
End Class
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim strtext As String
OpenFileDialog1.Title = "Open Text Files"
OpenFileDialog1.ShowDialog()
strtext = OpenFileDialog1.FileName
TextBox1.Text = My.Computer.FileSystem.GetName(strtext)
Label1.Text = My.Computer.FileSystem.GetName(strtext)
Dim line1 As String
Dim output1 As New ArrayList
Using sr As New IO.StreamReader(strtext)
sr.ReadLine()
sr.ReadLine()
Do While sr.Peek() >= 0
line1 = sr.ReadLine()
output1.Add(line1)
Loop
End Using
If strtext <> "" Then
Dim SR As New StreamReader(strtext)
SR.ReadLine()
Do Until SR.EndOfStream
TextBox3.Text = TextBox3.Text & SR.ReadLine & vbCrLf
Loop
SR.Close()
End If
Dim data1 = IO.File.ReadLines(strtext).
Skip(2).
Select(Function(line)
Dim parts = line.Split(","c)
Return New RefTrans With {.Wavelength = CDbl(parts(0)),
.Transpercent = CDbl(parts(1))}
End Function).
ToArray() = line.Split(","c)
End Sub
End Class
Here's an example that will get you an array of Tuple(Of Double, Double):
Dim data = IO.File.ReadLines(filePath).
Skip(2).
Select(Function(line)
Dim parts = line.Split(","c)
Return Tuple.Create(CDbl(parts(0)), CDbl(parts(1)))
End Function).
ToArray()
The IO.File.ReadLines method will read the lines of the file. Where the ReadAllLines method reads the entire file and then returns the lines as a String array, the ReadLines method exposes the lines of the file as an enumerable list and the lines only get read from the file as you use them. Particularly for big files, ReadLines is more efficient than ReadAllLines because it does create an intermediate array first that you don't actually want.
The Skip method allows you to access the items in a list starting at a particular index. In the case of reading a file, you need to read every line regardless but call Skip(2) on the result of File.ReadLines means that the first two lines of the file will be discarded after reading and any subsequent processing will only but done on lines from the third.
The Select method is basically a transformation. It says create an output list containing an item for every item in the input list where the output item is the result of a transformation of the input item. The transformation is defined by the function you provide. In this case, a line from the file is the input and the transformation is to split that line on the comma, convert the two substrings to Double values and then create a Tuple containing those two Double values.
The ToArray method takes any enumerable list, i.e. any object that implements IEnumerable(Of T), and returns an array containing the items from that list. In this case, Select returns an IEnumerable(Of Tuple(Of Double, Double)) so ToArray returns a Tuple(Of Double, Double) array.
If you wanted to write that all out long-hand then it would look like this:
'Get all the lines of the file.
Dim step1 As IEnumerable(Of String) = IO.File.ReadLines(filePath)
'Skip the first two lines.
Dim step2 As IEnumerable(Of String) = step1.Skip(2)
'Split each line into two substrings.
Dim step3 As IEnumerable(Of String()) = step2.Select(Function(line) line.Split(","c))
'Convert substrings to numbers and combine.
Dim step4 As IEnumerable(Of Tuple(Of Double, Double)) = step3.Select(Function(parts) Tuple.Create(CDbl(parts(0)), CDbl(parts(1))))
'Create an array of Tuples.
Dim data As Tuple(Of Double, Double)() = step4.ToArray()
A Tuple is basically a general-purpose object for grouping values together. You can use a Tuple for one-off cases rather than defining your own class. You may prefer to define your own class in all cases though, and you should define your own class where you will use the data extensively. In this case, your own class might look like this:
Public Class RefTrans
Public Property Wavelength As Double
Public Property TransPercent As Double
End Class
and the code would then become:
Dim data = IO.File.ReadLines(filePath).
Skip(2).
Select(Function(line)
Dim parts = line.Split(","c)
Return New RefTrans With {.Wavelength = CDbl(parts(0)),
.TransPercent = CDbl(parts(1))}
End Function).
ToArray()
In that case, you have properties Wavelength and TransPercent on each element of your array. If you use Tuples then the properties have the generic names Item1 and Item2.
Once you have the array, you can use any appropriate overload of Array.Sort to do the sorting, e.g.
Array.Sort(data, Function(d1, d2) d1.Item1.CompareTo(d2.Item1))
That will sort the Tuples by comparing their Item1 properties, which contain the wavelength values. If you're using your own class then obviously you specify your own property, e.g. the Wavelength property in the RefTrans class I showed.
The text file contains lines with the year followed by population like:
2016, 322690000
2015, 320220000
etc.
I separated the lines substrings to get all the years in a list box, and all the population amounts in a separate listbox, using the following code:
Dim strYearPop As String
Dim intYear As Integer
Dim intPop As Integer
strYearPop = popFile.ReadLine()
intYear = CInt(strYearPop.Substring(0, 4))
intPop = CInt(strYearPop.Substring(5))
lstYear.Items.Add(intYear)
lstPop.Items.Add(intPop)
Now I want to add the population amounts together, using the .Items to act as an array.
Dim intPop1 As Integer
intPop1 = lstPop.Items(0) + lstPop.Items(1)
But I get an error on lstPop.Items(1) and any item other than lstPop.Items(0), due to out of range. I understand the concept of out of range, but I thought that I create an index of several items (about 117 lines in the file, so the items indices should go up to 116) when I populated the list box.
How do i populate the list box in a way that creates an index of list box items (similar to an array)?
[I will treat this as an XY problem - please consider reading that after reading this answer.]
What you are missing is the separation of the data from the presentation of the data.
It is not a good idea to use controls to store data: they are meant to show the underlying data.
You could use two arrays for the data, one for the year and one for the population count, or you could use a Class which has properties of the year and the count. The latter is more sensible, as it ties the year and count together in one entity. You can then have a List of that Class to make a collection of the data, like this:
Option Infer On
Option Strict On
Imports System.IO
Public Class Form1
Public Class PopulationDatum
Property Year As Integer
Property Count As Integer
End Class
Function GetData(srcFile As String) As List(Of PopulationDatum)
Dim data As New List(Of PopulationDatum)
Using sr As New StreamReader(srcFile)
While Not sr.EndOfStream
Dim thisLine = sr.ReadLine
Dim parts = thisLine.Split(","c)
If parts.Count = 2 Then
data.Add(New PopulationDatum With {.Year = CInt(parts(0).Trim()), .Count = CInt(parts(1).Trim)})
End If
End While
End Using
Return data
End Function
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim srcFile = "C:\temp\PopulationData.txt"
Dim popData = GetData(srcFile)
Dim popTotal = 0
For Each p In popData
lstYear.Items.Add(p.Year)
lstPop.Items.Add(p.Count)
popTotal = popTotal + p.Count
Next
' popTotal now has the value of the sum of the populations
End Sub
End Class
If using a List(Of T) is too much, then just use the idea of separating the data from the user interface. It makes processing the data much simpler.
So I have two ListBoxes. Listbox1 gathers files from a directory and I have an add button to add selected files from Listbox1 over to Listbox2.
Is there a way to manually sort these items? Maybe with an up down buttons?
The reason I'd like to sort/reorder is I'll have a process that will run those selected files and each file will have to produce another file unique to the filename.
A quick example would be process FILE1.txt and produce FILE1.pdf, etc.
Is there an easier way to accomplish the sort/reorder?
UPDATE
Here is currently how I'm populating my listbox1, before adding anything to listbox2, which is the ListBox I'd like to have sorted so way or another.
Dim directoryInfo As _
New System.IO.DirectoryInfo(FolderBrowserDialog1.SelectedPath)
Dim fileInfos() As System.IO.FileInfo
fileInfos = directoryInfo.GetFiles()
For Each fileInfo As System.IO.FileInfo In fileInfos
ListBox1.DataSource = _list
_list.Add(fileInfo.Name)
_list.Sort()
Next
'Refresh Listbox1
ListBox1.DataSource = Nothing
ListBox1.DataSource = _list
You can use data-binding instead of adding the items one by one to the ListBox. I suggest you to add the files to a list first then sort the list and assign it to the ListBox's DataSource.
Define the list a class member
Private _list As New List(Of String)()
Assign it to the ListBox
listBox1.DataSource = _list
Then add new list entry with
_list.Add("new file")
_list.Sort()
' Refresh the ListBox
listBox1.DataSource = Nothing
listBox1.DataSource = _list
UPDATE
If you want to implement your own sorting order then implement a IComparer(Of String)
Class MyFileComparer
Implements IComparer(Of String)
Public Function Compare(x As String, y As String) As Integer _
Implements IComparer(Of String).Compare
Const AlwaysFirst As String = "FILE1"
Dim x = If(x.Contains(AlwaysFirst), "1_", "2_") & x
Dim y = If(y.Contains(AlwaysFirst), "1_", "2_") & y
' Note: If "FILE1" appears always at the end then this would be better
'Dim x = If(x.EndsWith(AlwaysFirst & ".txt"), "1_", "2_") & x
'Dim y = If(y.EndsWith(AlwaysFirst & ".txt"), "1_", "2_") & y
' Normalize strings (e.g. if "File_123.txt" = ""File 123.txt")
x = x.Replace("_"C, " "C)
y = y.Replace("_"C, " "C)
Return x.CompareTo(y)
End Function
End Class
Then you can sort like this
Static comparer = New MyFileComparer()
_list.Sort(comparer)
UPDATE #2
I do not know how your files are named exactly, however if they always end with "FILE<number>.<ext>" you could also change the file name for the string comparison like this:
Original file names
abc_FILE1.txt
abc_123_FILE2.txt
sssd_FILE23.txt
xxx_24_FILE073.txt
Prepared file names
FILE001_abc.txt
FILE002_abc_123.txt
FILE023_sssd.txt
FILE073_xxx_24.txt
Now the Compare method can determine the result simply with
Return x_prepared.CompareTo(y_prepared)
Piggybacking on Olivier's answer, perhaps a Sorted List may work for you?
You also could use OrderBy to sort the items if you need more than an alpha-name sort.
var sorted = from m in myCollection select m orderBy m.FileName;
(syntax may be off)