What is the best way to calculate word frequency in VB.NET? - vb.net

There are some good examples on how to calculate word frequencies in C#, but none of them are comprehensive and I really need one in VB.NET.
My current approach is limited to one word per frequency count. What is the best way to change this so that I can get a completely accurate word frequency listing?
wordFreq = New Hashtable()
Dim words As String() = Regex.Split(inputText, "(\W)")
For i As Integer = 0 To words.Length - 1
If words(i) <> "" Then
Dim realWord As Boolean = True
For j As Integer = 0 To words(i).Length - 1
If Char.IsLetter(words(i).Chars(j)) = False Then
realWord = False
End If
Next j
If realWord = True Then
If wordFreq.Contains(words(i).ToLower()) Then
wordFreq(words(i).ToLower()) += 1
Else
wordFreq.Add(words(i).ToLower, 1)
End If
End If
End If
Next
Me.wordCount = New SortedList
For Each de As DictionaryEntry In wordFreq
If wordCount.ContainsKey(de.Value) = False Then
wordCount.Add(de.Value, de.Key)
End If
Next
I'd prefer an actual code snippet, but generic 'oh yeah...use this and run that' would work as well.

This might be what your looking for:
Dim Words = "Hello World ))))) This is a test Hello World"
Dim CountTheWords = From str In Words.Split(" ") _
Where Char.IsLetter(str) _
Group By str Into Count()
I have just tested it and it does work
EDIT! I have added code to make sure that it counts only letters and not symbols.
FYI: I found an article on how to use LINQ and target 2.0, its a feels bit dirty but it might help someone http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx

Public Class CountWords
Public Function WordCount(ByVal str As String) As Dictionary(Of String, Integer)
Dim ret As Dictionary(Of String, Integer) = New Dictionary(Of String, Integer)
Dim word As String = ""
Dim add As Boolean = True
Dim ch As Char
str = str.ToLower
For index As Integer = 1 To str.Length - 1 Step index + 1
ch = str(index)
If Char.IsLetter(ch) Then
add = True
word += ch
ElseIf add And word.Length Then
If Not ret.ContainsKey(word) Then
ret(word) = 1
Else
ret(word) += 1
End If
word = ""
End If
Next
Return ret
End Function
End Class
Then for a quick demo application, create a winforms app with one multiline textbox called InputBox, one listview called OutputList and one button called CountBtn. In the list view create two columns - "Word" and "Freq." Select the "details" list type. Add an event handler for CountBtn. Then use this code:
Imports System.Windows.Forms.ListViewItem
Public Class MainForm
Private WordCounts As CountWords = New CountWords
Private Sub CountBtn_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CountBtn.Click
OutputList.Items.Clear()
Dim ret As Dictionary(Of String, Integer) = Me.WordCounts.WordCount(InputBox.Text)
For Each item As String In ret.Keys
Dim litem As ListViewItem = New ListViewItem
litem.Text = item
Dim csitem As ListViewSubItem = New ListViewSubItem(litem, ret.Item(item).ToString())
litem.SubItems.Add(csitem)
OutputList.Items.Add(litem)
Word.Width = -1
Freq.Width = -1
Next
End Sub
End Class
You did a terrible terrible thing to make me write this in VB and I will never forgive you.
:p
Good luck!
EDIT
Fixed blank string bug and case bug

This might be helpful:
Word frequency algorithm for natural language processing

Pretty close, but \w+ is a good regex to match with (matches word characters only).
Public Function CountWords(ByVal inputText as String) As Dictionary(Of String, Integer)
Dim frequency As New Dictionary(Of String, Integer)
For Each wordMatch as Match in Regex.Match(inputText, "\w+")
If frequency.ContainsKey(wordMatch.Value.ToLower()) Then
frequency(wordMatch.Value.ToLower()) += 1
Else
frequency.Add(wordMatch.Value.ToLower(), 1)
End If
Next
Return frequency
End Function

Related

Find Value as Integer in a Textbox

I would need a little help. If in the Textbox - TxtStringNum1.Text - we have the Digit 3. and TxtIntDraws.Lines (1) - contains the following set: 13,20,21,23,47,49,50,51,63,64,66,70
It shows me that there is Exists a Digit of 3, but it is actually the Digits of 13. It fails to look for it as a whole.
Private Sub ScanareLinia1()
Dim textsrtring As String = TxtStringNum1.Text
Dim words As String() = textsrtring.Split(New Char() {" "c})
' Split string based on space
Dim found As Boolean = False
' Use For Each loop over words
Dim word As Integer
For Each word In words
For i As Integer = 0 To TxtIntDraws.Lines.Count - 1
If TxtIntDraws.Lines(1).Contains(word) Then
TxtResultStr1.Text = word
End If
Next
Next
ScanareLinia2()
End Sub
I have a good comparison code, but I do not know how to use it for the code above.
Private Sub CompareNumbers()
'Pentru funcția de Check-In (For Match Exactly Value Number in List)
'First Textbox that is to be used for compare
Dim textBox1Numbers As List(Of Integer) = GetNumbersFromTextLine(TxtStringNum1.Text)
'Second Textbox that is to be used for compare
Dim textBox2Numbers As List(Of Integer) = GetNumbersFromTextLine(TxtbValBeforeCompar.Text)
'Union List of Common Numbers (this uses a lambda expression, it can be done using two For Each loops instead.)
Dim commonNumbers As List(Of Integer) = textBox1Numbers.Where(Function(num) textBox2Numbers.Contains(num)).ToList()
'This is purely for testing to see if it worked you can.
Dim sb As StringBuilder = New StringBuilder()
For Each foundNum As Integer In commonNumbers
sb.Append(foundNum.ToString()).Append(TextBox25.Text)
TxtbValAfterCompar.Text = (sb.ToString())
Next
End Sub
Private Function GetNumbersFromTextLine(ByVal sTextLine As String) As List(Of Integer)
'Pentru funcția de Check-In (For Match Exactly Value Number in List)
Dim numberList As List(Of Integer) = New List(Of Integer)()
Dim sSplitNumbers As String() = sTextLine.Split(TextBox8.Text)
For Each sNumber As String In sSplitNumbers
If IsNumeric(sNumber) Then
Dim iNum As Integer = CInt(sNumber)
TxtbValAfterCompar.Text = iNum
If Not numberList.Contains(iNum) Then
TxtbValAfterCompar.Text = ("")
numberList.Add(iNum)
End If
Else
End If
Next
Return numberList
End Function
You can use the following condition using String.Split and Array.IndexOf:
If Array.IndexOf(TxtIntDraws.Lines(1).Split(","c), CStr(word)) > -1 Then
TxtResultStr1.Text = word
End If
So the following Array.IndexOf(CStr("13,14,15,16,17").Split(","c), "13") > -1 is True and Array.IndexOf(CStr("13,14,15,16,17").Split(","c), "3") > -1 is False.

VB Check a Only Exactly Number in Textbox

Take a look at this picture. - http://www.imagebam.com/image/f544011007926944
I want to check in Textbox1, in the textbox2 enter the number and if there is that number I will appear in another text box. the problem occurs when in textbox1 there are numbers like 10,11, if I enter textbox2 number 1 then it will be taken as such, it will appear as if there is the number 1. I will only use for numbers from 1 to 80 .
where am I wrong?
' Split string based on space
Dim textsrtring As String = TextBox1.Text
Dim words As String() = textsrtring.Split(New Char() {" "c})
Dim found As Boolean = False
' Use For Each loop over words
Dim word As String
For Each word In words
If TextBox2.Lines(0).Contains(word) Then
found = True
If CheckVar1.Text.Contains(word) Then
Else
CheckVar1.Text = CheckVar1.Text + " " + TextBox1.Text()
End If
End If
So I think I know what you want. Your issue is the Compare Function on a String is comparing string literals and not numbers '11' contains '1' and '1' for Compare will return true that it contains a '1' in it. You need to convert the strings to numbers and then compare the numbers to each other.
Private Sub CompareNumbers()
'First Textbox that is to be used for compare
Dim textBox1Numbers As List(Of Integer) = GetNumbersFromTextLine(TextBox1.Text)
'Second Textbox that is to be used for compare
Dim textBox2Numbers As List(Of Integer) = GetNumbersFromTextLine(TextBox2.Text)
'Union List of Common Numbers (this uses a lambda expression, it can be done using two For Each loops instead.)
Dim commonNumbers As List(Of Integer) = textBox1Numbers.Where(Function(num) textBox2Numbers.Contains(num)).ToList()
'This is purely for testing to see if it worked you can.
Dim sb As StringBuilder = New StringBuilder()
For Each foundNum As Integer In commonNumbers
sb.Append(foundNum.ToString()).Append(" ")
Next
MessageBox.Show(sb.ToString())
End Sub
Private Function GetNumbersFromTextLine(sTextLine As String) As List(Of Integer)
Dim numberList As List(Of Integer) = New List(Of Integer)()
Dim sSplitNumbers As String() = sTextLine.Split(" ")
For Each sNumber As String In sSplitNumbers
If IsNumeric(sNumber) Then
Dim iNum As Integer = CInt(sNumber)
If Not numberList.Contains(iNum) Then
numberList.Add(iNum)
End If
Else
MessageBox.Show("Non Numeric Found in String :" + sTextLine)
End If
Next
Return numberList
End Function

permutation not accepting large words

the vb.net code below permutates a given word...the problem i have is that it does not accept larger words like "photosynthesis", "Calendar", etc but accepts smaller words like "book", "land", etc ...what is missing...Pls help
Module Module1
Sub Main()
Dim strInputString As String = String.Empty
Dim lstPermutations As List(Of String)
'Loop until exit character is read
While strInputString <> "x"
Console.Write("Please enter a string or x to exit: ")
strInputString = Console.ReadLine()
If strInputString = "x" Then
Continue While
End If
'Create a new list and append all possible permutations to it.
lstPermutations = New List(Of String)
Append(strInputString, lstPermutations)
'Sort and display list+stats
lstPermutations.Sort()
For Each strPermutation As String In lstPermutations
Console.WriteLine("Permutation: " + strPermutation)
Next
Console.WriteLine("Total: " + lstPermutations.Count.ToString)
Console.WriteLine("")
End While
End Sub
Public Sub Append(ByVal pString As String, ByRef pList As List(Of String))
Dim strInsertValue As String
Dim strBase As String
Dim strComposed As String
'Add the base string to the list if it doesn't exist
If pList.Contains(pString) = False Then
pList.Add(pString)
End If
'Iterate through every possible set of characters
For intLoop As Integer = 1 To pString.Length - 1
'we need to slide and call an interative function.
For intInnerLoop As Integer = 0 To pString.Length - intLoop
'Get a base insert value, example (a,ab,abc)
strInsertValue = pString.Substring(intInnerLoop, intLoop)
'Remove the base insert value from the string eg (bcd,cd,d)
strBase = pString.Remove(intInnerLoop, intLoop)
'insert the value from the string into spot and check
For intCharLoop As Integer = 0 To strBase.Length - 1
strComposed = strBase.Insert(intCharLoop, strInsertValue)
If pList.Contains(strComposed) = False Then
pList.Add(strComposed)
'Call the same function to review any sub-permutations.
Append(strComposed, pList)
End If
Next
Next
Next
End Sub
End Module
Without actually creating a project to run this code, nor knowing how it 'doesn't accept' long words, my answer would be that there are a lot of permutations for long words and your program is just taking much longer than you're expecting to run. So you probably think it has crashed.
UPDATE:
The problem is the recursion, it's blowing up the stack. You'll have to rewrite your code to use an iteration instead of recursion. Generally explained here
http://www.refactoring.com/catalog/replaceRecursionWithIteration.html
Psuedo code here uses iteration instead of recursion
Generate list of all possible permutations of a string

Using Functions in Visual Basic

The program I'm working on has two different functions, one that calculates the number of syllables in a text file, and another that calculates the readability of the text file based on the formula
206.835-85.6*(Number of Syllables/Number of Words)-1.015*(Number of Words/Number of Sentences)
Here are the problems I'm having:
I'm supposed to display the contents of the text file in a multi-line text box.
I'm supposed to display the answer I get from the function indexCalculation in a label below the text box.
I'm having trouble calling the function to actually have the program calculate the answer to be displayed in the label.
Here is the code I have so far.
Option Strict On
Imports System.IO
Public Class Form1
Private Sub ExitToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles ExitToolStripMenuItem.Click
Me.Close()
End Sub
Private Sub OpenToolStripMenuItem_Click(sender As Object, e As EventArgs) Handles OpenToolStripMenuItem.Click
Dim open As New OpenFileDialog
open.Filter = "text files |project7.txt|All file |*.*"
open.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory)
If open.ShowDialog() = Windows.Forms.DialogResult.OK Then
Dim selectedFileName As String = System.IO.Path.GetFileName(open.FileName)
If selectedFileName.ToLower = "project7.txt" Then
Dim text As String = File.ReadAllText("Project7.txt")
Dim words = text.Split(" "c)
Dim wordCount As Integer = words.Length
Dim separators As Char() = {"."c, "!"c, "?"c, ":"c}
Dim sentences = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
Dim sentenceCount As Integer = sentences.Length
Dim vowelCount As Integer = 0
For Each word As String In words
vowelCount += CountSyllables(word)
Next
vowelCount = CountSyllables(text)
Label1.Show(indexCalculation(wordCount, sentenceCount, vowelCount))
Else
MessageBox.Show("You cannot use that file!")
End If
End If
End Sub
Function CountSyllables(word As String) As Integer
word = word.ToLower()
Dim dipthongs = {"oo", "ou", "ie", "oi", "ea", "ee", _
"eu", "ai", "ua", "ue", "au", "io"}
For Each dipthong In dipthongs
word = word.Replace(dipthong, dipthong(0))
Next
Dim vowels = "aeiou"
Dim vowelCount = 0
For Each c In word
If vowels.IndexOf(c) >= 0 Then vowelCount += 1
Next
If vowelCount = 0 Then
vowelCount = 1
End If
Return vowelCount
End Function
Function indexCalculation(ByRef wordCount As Integer, ByRef sentenceCount As Integer, ByRef vowelCount As Integer) As Integer
Dim answer As Integer = CInt(206.835 - 85.6 * (vowelCount / wordCount) - 1.015 * (wordCount / sentenceCount))
Return answer
End Function
End Class
Any suggestions would be greatly appreciated.
Here are my suggestions:
update your indexCalculation function to take in Integers, not strings. that way you don't have to convert them to numbers.
remove all of your extra variables you are not using. this will clean things up a bit.
remove your streamreader. it appears you are reading the text via File.ReadAllText
Label1.Show(answer) should be changed to Label1.Show(indexCalculation(wordCount,sentenceCount,vowelCount)) -- unless Label1 is something other than a regular label, use Label1.Text = indexCalculation(wordCount,sentenceCount,vowelCount))
Then for the vowelCount, you need to do the following:
Dim vowelCount as Integer = 0
For Each word as String in words
vowelCount += CountSyllables(word)
Next
Also, add the logic to the CountSyllables function to make it 1 if 0. If you don't want to include the last character in your vowel counting, then use a for loop instead of a for each loop and stop 1 character short.

How do I create a loop statement that finds the specific amount of characters in a string?

Beginner here, bear with me, I apologize in advance for any mistakes.
It's some homework i'm having a bit of trouble going about.
Overall goal: outputting the specific amount of characters in a string using a loop statement. Example being, user wants to find how many "I" is in "Why did the chicken cross the road?", the answer should be 2.
1) The form/gui has 1 MultiLine textbox and 1 button titled "Search"
2) User enters/copys/pastes text into the Textbox clicks "Search" button
3) Search button opens an InputBox where the user will type in what character(s) they want to search for in the Textbox then presses "Ok"
4) (where I really need help) Using a Loop Statement, The program searches and counts the amount of times the text entered into the Inputbox, appears in the text inside the MultiLine Textbox, then, displays the amount of times the character showed up in a "messagebox.show"
All I have so far
Private Sub Search_btn_Click(sender As System.Object, e As System.EventArgs) Handles Search_btn.Click
Dim counterInt As Integer = 0
Dim charInputStr As String
charInputStr = CStr(InputBox("Enter Search Characters", "Search"))
I would use String.IndexOf(string, int) method to get that. Simple example of concept:
Dim input As String = "Test input string for Test and Testing the Test"
Dim search As String = "Test"
Dim count As Integer = -1
Dim index As Integer = 0
Do
index = input.IndexOf(search, index) + 1
count += 1
Loop While index > 0
count is initialized with -1 because of do-while loop usage - it will be set to 0 even if there is no pattern occurrence in input string.
Try this Code
Dim input As String = "Test input string for Test and Testing the Test"
Dim search() As String = {"Te"}
MsgBox(input.Split(input.Split(search, StringSplitOptions.None), StringSplitOptions.RemoveEmptyEntries).Count)
Concept: Increment the count until the input containing the particular search string. If it contains the search string then replace the first occurance with string.empty (In String.contains() , the search starts from its first index, that is 0)
Dim input As String = "Test input string for Test and Testing the Test"
Dim search As String = "T"
Dim count As Integer = 0
While input.Contains(search) : input = New Regex(search).Replace(input, String.Empty, 1) : count += 1 : End While
MsgBox(count)
Edit:
Another solution:
Dim Input As String = "Test input string for Test and Testing the Test"
Dim Search As String = "Test"
MsgBox((Input.Length - Input.Replace(Search, String.Empty).Length) / Search.Length)
try this code.... untested but i know my vb :)
Function lol(ByVal YourLine As String, ByVal YourSearch As String)
Dim num As Integer = 0
Dim y = YourLine.ToCharArray
Dim z = y.Count
Dim x = 0
Do
Dim l = y(x)
If l = YourSearch Then
num = num + 1
End If
x = x + 1
Loop While x < z
Return num
End Function
Its a function that uses its own counter... for every character in the string it will check if that character is one that you have set (YourSearch) and then it will return the number of items that it found. so in your case it would return 2 because there are two i's in your line.
Hope this helps!
EDIT:
This only works if you are searching for individual Characters not words
You can try with something like this:
Dim sText As String = TextBox1.Text
Dim searchChars() As Char = New Char() {"i"c, "a"c, "x"c}
Dim index, iCont As Integer
index = sText.IndexOfAny(searchChars)
While index >= 0 Then
iCont += 1
index = sText.IndexOfAny(searchChars, index + 1)
End While
Messagebox.Show("Characters found " & iCont & " times in text")
If you want to search for words and the times each one is appearing try this:
Dim text As String = TextBox1.Text
Dim wordsToSearch() As String = New String() {"Hello", "World", "foo"}
Dim words As New List(Of String)()
Dim findings As Dictionary(Of String, List(Of Integer))
'Dividing into words
words.AddRange(text.Split(New String() {" ", Environment.NewLine()}, StringSplitOptions.RemoveEmptyEntries))
findings = SearchWords(words, wordsToSearch)
Console.WriteLine("Number of 'foo': " & findings("foo").Count)
With this function used:
Private Function SearchWords(ByVal allWords As List(Of String), ByVal wordsToSearch() As String) As Dictionary(Of String, List(Of Integer))
Dim dResult As New Dictionary(Of String, List(Of Integer))()
Dim i As Integer = 0
For Each s As String In wordsToSearch
dResult.Add(s, New List(Of Integer))
While i >= 0 AndAlso i < allWords.Count
i = allWords.IndexOf(s, i)
If i >= 0 Then dResult(s).Add(i)
i += 1
End While
Next
Return dResult
End Function
You will have not only the number of occurances, but the index positions in the file, grouped easily in a Dictionary.