permutation not accepting large words - vb.net

the vb.net code below permutates a given word...the problem i have is that it does not accept larger words like "photosynthesis", "Calendar", etc but accepts smaller words like "book", "land", etc ...what is missing...Pls help
Module Module1
Sub Main()
Dim strInputString As String = String.Empty
Dim lstPermutations As List(Of String)
'Loop until exit character is read
While strInputString <> "x"
Console.Write("Please enter a string or x to exit: ")
strInputString = Console.ReadLine()
If strInputString = "x" Then
Continue While
End If
'Create a new list and append all possible permutations to it.
lstPermutations = New List(Of String)
Append(strInputString, lstPermutations)
'Sort and display list+stats
lstPermutations.Sort()
For Each strPermutation As String In lstPermutations
Console.WriteLine("Permutation: " + strPermutation)
Next
Console.WriteLine("Total: " + lstPermutations.Count.ToString)
Console.WriteLine("")
End While
End Sub
Public Sub Append(ByVal pString As String, ByRef pList As List(Of String))
Dim strInsertValue As String
Dim strBase As String
Dim strComposed As String
'Add the base string to the list if it doesn't exist
If pList.Contains(pString) = False Then
pList.Add(pString)
End If
'Iterate through every possible set of characters
For intLoop As Integer = 1 To pString.Length - 1
'we need to slide and call an interative function.
For intInnerLoop As Integer = 0 To pString.Length - intLoop
'Get a base insert value, example (a,ab,abc)
strInsertValue = pString.Substring(intInnerLoop, intLoop)
'Remove the base insert value from the string eg (bcd,cd,d)
strBase = pString.Remove(intInnerLoop, intLoop)
'insert the value from the string into spot and check
For intCharLoop As Integer = 0 To strBase.Length - 1
strComposed = strBase.Insert(intCharLoop, strInsertValue)
If pList.Contains(strComposed) = False Then
pList.Add(strComposed)
'Call the same function to review any sub-permutations.
Append(strComposed, pList)
End If
Next
Next
Next
End Sub
End Module

Without actually creating a project to run this code, nor knowing how it 'doesn't accept' long words, my answer would be that there are a lot of permutations for long words and your program is just taking much longer than you're expecting to run. So you probably think it has crashed.
UPDATE:
The problem is the recursion, it's blowing up the stack. You'll have to rewrite your code to use an iteration instead of recursion. Generally explained here
http://www.refactoring.com/catalog/replaceRecursionWithIteration.html
Psuedo code here uses iteration instead of recursion
Generate list of all possible permutations of a string

Related

I want to save a file in VB.net from the amount of tries a user has

I'm a student trying to learn Console App .Net Framework and I want to code a random number between 0000 and 9999 (as a pin that you need to guess). Thus far I've had to set it as a random number from 1000 to 9999 as the system wont let me do 0000. Furthermore, I want to save the amount of tries the user has as a text file e.g. if the user tries 50 times, I'd like it to say
Username Tries
I've tried Randomise() Rnd(*9999) and X = EasyNum.Next(1000, 9999) but then I can't compare unless I convert that to an integer.
Module Module1
Dim Tries As String = 0
Dim EasyNum As New Random
Dim HardNum As New Random
Dim Attempt As String
Sub Main()
Dim Difficulty As String
Console.WriteLine("Welcome to MasterMind")
Console.WriteLine("Choose between Easy and Hard difficulty")
Difficulty = Strings.LCase(Console.ReadLine)
While Difficulty <> "easy" And Difficulty <> "hard"
Console.WriteLine("That's not a correct mode")
Difficulty = Strings.LCase(Console.ReadLine)
End While
If Difficulty = "easy" Then
Easy()
ElseIf Difficulty = "hard" Then
Hard()
End If
End Sub
Dim EasyGuess1 As Integer
Dim EasyGuess2 As Integer
Dim X As String
Dim Y As Integer
Sub Easy()
Console.WriteLine("You have chosen the easy difficulty")
Console.WriteLine("You have to guess a 4 Digit number between 1000 and 9999")
Console.WriteLine("Enter your guess")
X = EasyNum.Next(1000, 9999)
Console.WriteLine(X)
EasyGuess1 = Console.ReadLine
Tries = +1
If Mid(CStr(EasyGuess1), 1, 1) = Mid(CStr(X), 1, 1) Then
Console.WriteLine("You have 1 number correct, try again?")
Attempt = Strings.LCase(Console.ReadLine)
While Attempt <> "yes" And Attempt <> "no"
Console.WriteLine("Enter either yes or no")
Attempt = Strings.LCase(Console.ReadLine)
End While
ElseIf Mid(CStr(EasyGuess1), 2, 1) = Mid(CStr(X), 2, 1) Then
Console.WriteLine("You have 1 number correct, try again?")
Attempt = Strings.LCase(Console.ReadLine)
While Attempt <> "yes" And Attempt <> "no"
Console.WriteLine("Enter either yes or no")
Attempt = Strings.LCase(Console.ReadLine)
End While
End If
If Attempt = "yes" Then
EasyYes()
ElseIf Attempt = "no" Then
EasyNo()
End If
Console.WriteLine("incorrect")
Console.ReadKey()
End Sub
Sub EasyYes()
Console.WriteLine("Enter a new guess")
EasyGuess1 = Console.ReadLine
End Sub
Sub EasyNo()
Dim Save As String
Dim File As System.IO.File
Console.WriteLine("Do you want to save your tries? Enter Yes or no")
Save = Strings.LCase(Console.ReadLine)
If Save = "yes" Then
System.IO.File.Create(Tries, "C:\X\Code\VB\Challenges\Challenge 1\MasterMind Test")
End If
End Sub
Sub Hard()
Console.WriteLine("You have chosen the hard difficulty")
End Sub
Sub HardYes()
End Sub
Sub HardNo()
End Sub
End Module
When I try to save the tries, I get this:
System.InvalidCastException: 'Conversion from string "C:\X\Code\VB\Challenges\Cha" to type 'Integer' is not valid.'
InnerException
FormatException: Input string was not in a correct format.
Which I don't understand myself.
Comments in line. Please read all comments. This program is still not working too well but I will leave it to you to tidy up.
Module Module1
Dim Tries As Integer = 0
'Use only a single instance of Random
Dim EasyNum As New Random
'Dim HardNum As New Random
Dim Attempt As String
Sub Main()
Dim Difficulty As String
Console.WriteLine("Welcome to MasterMind")
Console.WriteLine("Choose between Easy and Hard difficulty")
Difficulty = Strings.LCase(Console.ReadLine)
'AndAlso prevents the second condition from executing if the first condition is true
While Difficulty <> "easy" AndAlso Difficulty <> "hard"
Console.WriteLine("That's not a correct mode")
Difficulty = Strings.LCase(Console.ReadLine)
End While
If Difficulty = "easy" Then
Easy()
'ElseIf Difficulty = "hard" Then
' Hard() 'Not implemented
End If
End Sub
'Dim EasyGuess2 As Integer
'Dim X As String
'Dim Y As Integer
Sub Easy()
Dim EasyGuess1 As Integer
Console.WriteLine("You have chosen the easy difficulty")
Console.WriteLine("You have to guess a 4 Digit number between 1000 and 9999")
Console.WriteLine("Enter your guess")
'X = EasyNum.Next(1000, 9999)
'The next method returns a non-negative integer not a string
'To get 0 to 9999 with leading zeros do the following
'Returns a non-negative random integer that is less than the specified maximum.
Dim X = EasyNum.Next(10000).ToString("0000")
Console.WriteLine(X)
'Console.ReadLine returns a String. You should Not assign a String to an Integer
'EasyGuess1 = Console.ReadLine
'Never trust a user :-) Use .TryParse to set the variable
Integer.TryParse(Console.ReadLine, EasyGuess1)
'This just assigns the value of 1 to Tries
'Tries = +1
'If you want to increment Tries
Tries += 1
'Let's compare apples and apples
'If you just convert EasyGuess1 To a string and EasyGuess1 is 54
'the string is "54" You want a 4 character string
'If Mid(EasyGuess1.ToString("0000"), 1, 1) = Mid(CStr(X), 1, 1) Then
' 'but you only compared the first character, what about the rest?
' 'Mid is aroung for backward compatibility. It should not be used for new code.
' 'The position Integer is one based which is not at all sympatico with .net
' Console.WriteLine("You have 1 number correct, try again?")
' Attempt = Strings.LCase(Console.ReadLine)
' While Attempt <> "yes" And Attempt <> "no"
' Console.WriteLine("Enter either yes or no")
' Attempt = Strings.LCase(Console.ReadLine)
' End While
'ElseIf Mid(CStr(EasyGuess1), 2, 1) = Mid(CStr(X), 2, 1) Then
' Console.WriteLine("You have 1 number correct, try again?")
' Attempt = Strings.LCase(Console.ReadLine)
' While Attempt <> "yes" And Attempt <> "no"
' Console.WriteLine("Enter either yes or no")
' Attempt = Strings.LCase(Console.ReadLine)
' End While
'End If
Dim CorrectCharacters = TestInput(EasyGuess1, X)
Console.WriteLine($"You have guessed {CorrectCharacters} correctly. Try again?")
'Use the .net way. The framework is available to all languages in .net
'so it will be easier to learn other languages.
Attempt = (Console.ReadLine).ToLower
If Attempt = "yes" Then
EasyYes()
ElseIf Attempt = "no" Then
EasyNo()
End If
Console.WriteLine("incorrect")
Console.ReadKey()
End Sub
Sub EasyYes()
Dim guess As Integer
Console.WriteLine("Enter a new guess")
'You are doing this twice ???
'EasyGuess1 = Console.ReadLine
Integer.TryParse(Console.ReadLine, guess)
'EasyGuess1 will be 0 if the entry is other than an integer
'Very nice bu this Sub doesn't do anything
End Sub
Sub EasyNo()
Dim Save As String
'Unused local variable and if you needed this it is a poor choice for a variable name
'Dim File As System.IO.File
Console.WriteLine("Do you want to save your tries? Enter Yes or no")
Save = Console.ReadLine.ToLower
If Save = "yes" Then
'Give the file name an extension.
'When you pass (String, Integer) to the Create method, the Integer is the number of bytes
'buffered for reads And writes to the file.
'Not exactly what you were expecting.
'System.IO.File.Create("C:\X\Code\VB\Challenges\Challenge 1\MasterMind Test.txt", Tries)
'Imports System.IO at top of file
File.WriteAllText("C:\X\Code\VB\Challenges\Challenge 1\MasterMind Test.txt", Tries.ToString)
End If
End Sub
Sub Hard()
Console.WriteLine("You have chosen the hard difficulty")
End Sub
Private Function TestInput(Guessed As Integer, RandString As String) As Integer
Dim Correct As Integer
Dim Guess As String = Guessed.ToString("0000")
'A String can be a Char array
'Here we check each character in the 2 strings for equality
'and if true then increment Correct
For i = 0 To 3
If Guess(i) = RandString(i) Then
Correct += 1
End If
Next
Return Correct
End Function
End Module
There are a lot of issues, but I don't want to spoil it for you.
I definitely recommend to put "Option Strict On" in the project settings, so you immediately see where there are conversion errors (a string assigned to a integer etc).
To save the file, it should be something like
If Save = "yes" Then
File.WriteAllText("C:\X\Code\VB\Challenges\Challenge 1\MasterMind Test", Tries.ToString())
End If
(there are also File.Append... functions).
The Random class is a bit tricky, this is an object that provides random values and is not yet the random value itself. Always use the same random object for all different numbers, otherwise you might get the same number:
Private Randomizer As New Random(Environment.TickCount)
Private EasyNum As Int32 = Randomizer.Next(0, 10000) '0 to 9999
Private HardNum As Int32 = Randomizer.Next(0, 100000) '0 to 99999
The "from" value of the randomizer's Next method is always inclusive, the "to" value exclusive, for whatever reasons.
To increment a variable, the syntax is
Tries += 1
instead of "Tries = +1"
To write a number with leading digits, use might use
Console.Out.WriteLine($"The correct solution would have been: {EasyNum:0000}")
or
EasyNum.ToString("0000")

VB Check a Only Exactly Number in Textbox

Take a look at this picture. - http://www.imagebam.com/image/f544011007926944
I want to check in Textbox1, in the textbox2 enter the number and if there is that number I will appear in another text box. the problem occurs when in textbox1 there are numbers like 10,11, if I enter textbox2 number 1 then it will be taken as such, it will appear as if there is the number 1. I will only use for numbers from 1 to 80 .
where am I wrong?
' Split string based on space
Dim textsrtring As String = TextBox1.Text
Dim words As String() = textsrtring.Split(New Char() {" "c})
Dim found As Boolean = False
' Use For Each loop over words
Dim word As String
For Each word In words
If TextBox2.Lines(0).Contains(word) Then
found = True
If CheckVar1.Text.Contains(word) Then
Else
CheckVar1.Text = CheckVar1.Text + " " + TextBox1.Text()
End If
End If
So I think I know what you want. Your issue is the Compare Function on a String is comparing string literals and not numbers '11' contains '1' and '1' for Compare will return true that it contains a '1' in it. You need to convert the strings to numbers and then compare the numbers to each other.
Private Sub CompareNumbers()
'First Textbox that is to be used for compare
Dim textBox1Numbers As List(Of Integer) = GetNumbersFromTextLine(TextBox1.Text)
'Second Textbox that is to be used for compare
Dim textBox2Numbers As List(Of Integer) = GetNumbersFromTextLine(TextBox2.Text)
'Union List of Common Numbers (this uses a lambda expression, it can be done using two For Each loops instead.)
Dim commonNumbers As List(Of Integer) = textBox1Numbers.Where(Function(num) textBox2Numbers.Contains(num)).ToList()
'This is purely for testing to see if it worked you can.
Dim sb As StringBuilder = New StringBuilder()
For Each foundNum As Integer In commonNumbers
sb.Append(foundNum.ToString()).Append(" ")
Next
MessageBox.Show(sb.ToString())
End Sub
Private Function GetNumbersFromTextLine(sTextLine As String) As List(Of Integer)
Dim numberList As List(Of Integer) = New List(Of Integer)()
Dim sSplitNumbers As String() = sTextLine.Split(" ")
For Each sNumber As String In sSplitNumbers
If IsNumeric(sNumber) Then
Dim iNum As Integer = CInt(sNumber)
If Not numberList.Contains(iNum) Then
numberList.Add(iNum)
End If
Else
MessageBox.Show("Non Numeric Found in String :" + sTextLine)
End If
Next
Return numberList
End Function

Creating multiple .txt files while restricting size of each

In my program, I collect bits of information on a massive scale, hundreds of thousands to millions of lines each. I am trying to limit each file I create to a certain size in order to be able to quickly open it and read the data. I am using a HashSet to collect all the data without duplicates.
Here's my code so far:
Dim Founds As HashSet(Of String)
Dim filename As String = (Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\Sorted_byKING\sorted" + Label4.Text + ".txt")
Using writer As New System.IO.StreamWriter(filename)
For Each line As String In Founds
writer.WriteLine(line)
Next
Label4.Text = Label4.Text + 1 'Increments sorted1.txt, sorted2.txt etc
End Using
So, my question is:
How do I go about saving, let's say 250,000 lines in a text file before moving to another one and adding the next 250,000?
First of all, do not use Labels to simply store values. You should use variables instead, that's what variables are for.
Another advice, always use Path.Combine to concatenate paths, that way you don't have to worry about if each part of the path ends with a separator character or not.
Now, to answer your question:
If you'd like to insert the text line by line, you can use something like:
Sub SplitAndWriteLineByLine()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim counter As Integer = 0
Dim filename As String = String.Empty
Dim writer As IO.StreamWriter = Nothing
For Each line As String In Founds
If counter Mod maxLinesPerFile = 0 Then
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
If writer IsNot Nothing Then writer.Close()
writer = New IO.StreamWriter(filename)
End If
writer.WriteLine(line)
counter += 1
Next
writer.Dispose()
End Sub
However, if you will be inserting the text from the HashSet as is, you probably don't need to write line by line, instead you can write each "bunch" of lines at once. You could use something like the following:
Sub SplitAndWriteAll()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim filename As String = String.Empty
For i = 0 To Founds.Count - 1 Step maxLinesPerFile
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
IO.File.WriteAllLines(filename, Founds.Skip(i).Take(maxLinesPerFile))
Next
End Sub

VB.NET - List.Contains Returns False But Should be True

I had a look at List.Contains returns false, even though it seems it should return true but his code structure is a bit different to mine so im unsure if I have the same issue.
Before I continue, Let me explain what my result should be.
We have 2 input files, File 1 with email:hash's the other with a mix of email:hash and email:plain.
End output: If the Second file has plaintext after :, Output it (Making sure not to make duplicates when outputting file 1's email:hash's if no file 2 line for that email/hash is made), Otherwise output with the Hash.
tl;dr - Basically make the Second File overwrite prioritized over the First File.
(First File Randomized)
ABC_123#gmail.com:f6deea50e7eeb2d930fab83ccc32cdfe
123abc#domain.ext:82e6eeea4060c90cc3dc6ddd25885806
123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd
abc123#email.com:2d366131008f89781b8379bed3451656
(Second File Randomized)
123abc#domain.ext:aaaaaaaa
ABC_123#gmail.com:cccccccc
abc123#email.com:bbbbbbbb
newemail#hotmail.com:ddddddddd
Output should be:
123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd
123abc#domain.ext:aaaaaaaa
ABC_123#gmail.com:cccccccc
abc123#email.com:bbbbbbbb
newemail#hotmail.com:ddddddddd
(Output from Tests - "->" lines shouldn't be outputted.)
123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd
->123abc#domain.ext:82e6eeea4060c90cc3dc6ddd25885806
123abc#domain.ext:aaaaaaaa
ABC_123#gmail.com:cccccccc
->ABC_123#gmail.com:f6deea50e7eeb2d930fab83ccc32cdfe
abc123#email.com:bbbbbbbb
newemail#hotmail.com:ddddddddd
In the Second OpenFileDialog Block it always returns false until the LAST line in the For Each combo as Match in matches.
Weirdly, If I change the second regex from (.*)#(.*):(.*) to (.*)#(.*):([a-f0-9]{32}) it for some reason works, The issue with that is it will only match Email#domain.ext:{md5} and won't match for example Email#domain.ext:abc123 which is a requirement.
(Latest code update where I was messing around to try fix it broke it even more so this doesn't even work now).
I slept for once and came back on to try fix it, So far im almost there, It's overwriting correctly except for on one email:hash for some reason.
Image showing error
As you can see it changed the 123abc#domain.ext from the hash to aaaaaaa but for the ABC_123#gmail.com it didn't for some strange reason. Also yes the abc123#email.com hash does change so it's odd that a random email:hash didn't change.
I have been at this for about 9 12+ hours straight. (No Exaggeration) and i'd really love an enlightenment on what's going on.
I have tried so many alternatives and such that I can't even remember at this point.
Code: (Updated x3)
Reminder: Read above on what im trying to achieve :)
#Region "Merge Combo's"
Private Sub List_Merge_Click(sender As Object, e As EventArgs) Handles List_Merge.Click
Dim ofd = New OpenFileDialog()
ofd.Title = "Import..."
ofd.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
ofd.Filter = "Text files|*.txt"
ofd.Multiselect = True
'If the user selects 2 Files;
If (ofd.ShowDialog() = DialogResult.OK And ofd.CheckFileExists = True) Then
'Make sure there are no previously stored Conversions;
ActionList2.Items.Clear()
'Variables;
Dim MergedCombos As Integer = 0
Dim TotalCombos As Integer = 0
Try
For Each filename As String In ofd.FileNames
Using sr As New StreamReader(filename)
Dim result = filename.Union(filename, New MailEqualityComparer)
'Get all Matches found from the Regex Condition;
Dim combos As MatchCollection = New Regex("^([^#]+)#(.*):(.*)$", RegexOptions.Multiline).Matches(sr.ReadToEnd)
'Add each Match to the ActionList except for Duplicates;
For Each combo As Match In combos
'Increment the Total Combo's count;
TotalCombos += 1
'If the ActionList doesn't contain the same Combo;
If Not ActionList2.Items.Contains(combo.Value) Then
'If the email is already in the ActionList;
If IsInListbox(ActionList2, combo.Groups(1).Value + "#" + combo.Groups(2).Value) = True Then
'This combo is presumed to be a Hash Decrypted Combo - Overwrite it with the Encrypted Hash;
ActionList2.Items.Add(combo.Value)
'Remove the Hash Item from ActionList;
ActionList2.Items.RemoveAt(FindListboxIndex(ActionList2, combo.Groups(1).Value + "#" + combo.Groups(2).Value))
Else
'Add the Combo;
ActionList2.Items.Add(combo.Value)
End If
End If
Next
End Using
Next
Catch ex As Exception
Console.WriteLine("Error: " + ex.ToString)
Finally
'If atleast 1 Item is in the ActionList, Enable the Export Button;
If ActionList2.Items.Count > 0 Then
ExportButton.Enabled = True
ExportButton.BackColor = Color.FromArgb(149, 255, 141)
End If
'Update the Merged Combo's count;
StatusBar_LeftText.Text = MergedCombos.ToString
'If MergedCombos are less than TotalCombos, Add a "x Duplicates Removed" message;
If MergedCombos < TotalCombos Then
StatusBar_LeftText.Text += " - " + (TotalCombos - MergedCombos).ToString + " Duplicates Removed"
End If
'Autoscroll;
ActionList2.TopIndex = ActionList2.Items.Count - 1
End Try
End If
End Sub
Private Function FindListboxIndex(lb As ListBox, searchString As String) As Integer
For i As Integer = 0 To lb.Items.Count - 1
If lb.Items(i).ToString().Contains(searchString) Then
Return i
Exit Function
End If
Next
Return -1
End Function
Private Function IsInListbox(lb As ListBox, searchString As String) As Boolean
For i As Integer = 0 To lb.Items.Count - 1
If lb.Items(i).ToString().Contains(searchString) Then
Return True
Exit Function
End If
Next
Return False
End Function
#End Region
Hi, I think this should fit to your Needs.
Dim ofd = New OpenFileDialog()
ofd.Title = "Import..."
ofd.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
ofd.Filter = "Text files|*.txt"
ofd.Multiselect = True
Dim output As New Dictionary(Of String, String) 'key = mail, value = plain/hash
If (ofd.ShowDialog() = DialogResult.OK And ofd.CheckFileExists = True) Then
For Each filename As String In ofd.FileNames
Using sr As New StreamReader(filename)
Dim combos As MatchCollection = New Regex("(.*)#(.*):(.*)").Matches(sr.ReadToEnd)
For Each match As Match In combos
Dim tmp() As String = Split(match.ToString, ":")
tmp(1) = tmp(1).Replace(vbCr, "").Replace(vbLf, "") 'Delete carriage return
If Not output.ContainsKey(tmp(0)) Then
output.Add(tmp(0), tmp(1))
Else
If output(tmp(0)).Length = 32 Then 'condition whether to change the value or not. You need to design it to your needs.
output(tmp(0)) = tmp(1)
End If
End If
Next match
End Using
Next filename
End If
I don't know if there is a carriage return in your data. I tried with simple txt files and there was.
You need to change the condition to your needs. I saw the hashes have 32 signs and the plain text does not..
What you want is the union of the two set of lines you have and that's exactly what Enumerable.Union do provided we have a way to distinguish equal elements.
Starting from this we first define that equality comparer :
Class MailEqualityComparer
Implements IEqualityComparer(Of String)
Overloads Function Equals(mail1 As String, mail2 As String) As Boolean Implements IEqualityComparer(Of String).Equals
' assume input validated
Return mail1.Split(":"c)(0).Equals(mail2.Split(":"c)(0))
End Function
Overloads Function GetHashCode(mail As String) As Integer Implements IEqualityComparer(Of String).GetHashCode
' assume input validated
Return mail.Split(":"c)(0).GetHashCode
End Function
End Class
That is for this class two string are equal if their part before the : are equal.
Then you just have to do the Union using an instance of that class to ensure proper equality :
' file1 and file2 are String arrays
' they could be the result of File.ReadAllLines for example
Dim file1 = {
"ABC_123#gmail.com: f6deea50e7eeb2d930fab83ccc32cdfe",
"123abc#domain.ext:82e6eeea4060c90cc3dc6ddd25885806",
"123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd",
"abc123#email.com: 2d366131008f89781b8379bed3451656"
}
Dim file2 = {
"123abc#domain.ext: aaaaaaaa",
"ABC_123#gmail.com: cccccccc",
"abc123#email.com: bbbbbbbb",
"newemail#hotmail.com: ddddddddd"
}
' result is an IEnumerable(Of String)
' add ToArray (for example) if you want to materialize the result
Dim result = file2.Union(file1, New MailEqualityComparer) ' note the order ; it matters
' result content :
' ----------------
' 123abc#domain.ext: aaaaaaaa
' ABC_123#gmail.com: cccccccc
' abc123#email.com: bbbbbbbb
' newemail#hotmail.com: ddddddddd
' 123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd
It just leaves us with "how do we determine the order for Union parameters ?"
For that we have to have a way to know which file contains the "plain text" stuff, information you didn't provided at time of writing.
(Note: the same could have been achieved using Hashset(Of String) or SortedSet(Of String) and their UnionWith method)
[SortedSet requires an IComparer(Of String) instead of an IEqualityComparer(Of String)]
Edit after comment
If I understood correctly your comment (which I'm not sure) ; here is what could be done using a SortedSet :
Class MailComparer
Implements IComparer(Of String)
Function Compare(mail1 As String, mail2 As String) As Integer Implements IComparer(Of String).Compare
' assume input validated
Return mail1.Split(":"c)(0).CompareTo(mail2.Split(":"c)(0))
End Function
End Class
' file1 and file2 are String arrays
' they could be the result of File.ReadAllLines for example
Dim file1 = {
"ABC_123#gmail.com: f6deea50e7eeb2d930fab83ccc32cdfe",
"123abc#domain.ext:82e6eeea4060c90cc3dc6ddd25885806",
"123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd",
"abc123#email.com: 2d366131008f89781b8379bed3451656"
}
Dim file2 = {
"123abc#domain.ext: aaaaaaaa",
"ABC_123#gmail.com: cccccccc",
"abc123#email.com: bbbbbbbb",
"newemail#hotmail.com: ddddddddd"
}
' allLines is an IEnumerable(Of String)
Dim allLines = file2.Concat(file1) ' note the order ; it matters
Dim result As New SortedSet(Of String)(allLines, New MailComparer)
' result content :
' ----------------
' 123_ABC#gmail.com:8fa5104d4d995dc153e5509ab988bcfd
' 123abc#domain.ext: aaaaaaaa
' ABC_123#gmail.com: cccccccc
' abc123#email.com: bbbbbbbb
' newemail#hotmail.com: ddddddddd
I don't see where it's not simple using an 8 lines class ; but maybe I missed the point...

What is the best way to calculate word frequency in VB.NET?

There are some good examples on how to calculate word frequencies in C#, but none of them are comprehensive and I really need one in VB.NET.
My current approach is limited to one word per frequency count. What is the best way to change this so that I can get a completely accurate word frequency listing?
wordFreq = New Hashtable()
Dim words As String() = Regex.Split(inputText, "(\W)")
For i As Integer = 0 To words.Length - 1
If words(i) <> "" Then
Dim realWord As Boolean = True
For j As Integer = 0 To words(i).Length - 1
If Char.IsLetter(words(i).Chars(j)) = False Then
realWord = False
End If
Next j
If realWord = True Then
If wordFreq.Contains(words(i).ToLower()) Then
wordFreq(words(i).ToLower()) += 1
Else
wordFreq.Add(words(i).ToLower, 1)
End If
End If
End If
Next
Me.wordCount = New SortedList
For Each de As DictionaryEntry In wordFreq
If wordCount.ContainsKey(de.Value) = False Then
wordCount.Add(de.Value, de.Key)
End If
Next
I'd prefer an actual code snippet, but generic 'oh yeah...use this and run that' would work as well.
This might be what your looking for:
Dim Words = "Hello World ))))) This is a test Hello World"
Dim CountTheWords = From str In Words.Split(" ") _
Where Char.IsLetter(str) _
Group By str Into Count()
I have just tested it and it does work
EDIT! I have added code to make sure that it counts only letters and not symbols.
FYI: I found an article on how to use LINQ and target 2.0, its a feels bit dirty but it might help someone http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx
Public Class CountWords
Public Function WordCount(ByVal str As String) As Dictionary(Of String, Integer)
Dim ret As Dictionary(Of String, Integer) = New Dictionary(Of String, Integer)
Dim word As String = ""
Dim add As Boolean = True
Dim ch As Char
str = str.ToLower
For index As Integer = 1 To str.Length - 1 Step index + 1
ch = str(index)
If Char.IsLetter(ch) Then
add = True
word += ch
ElseIf add And word.Length Then
If Not ret.ContainsKey(word) Then
ret(word) = 1
Else
ret(word) += 1
End If
word = ""
End If
Next
Return ret
End Function
End Class
Then for a quick demo application, create a winforms app with one multiline textbox called InputBox, one listview called OutputList and one button called CountBtn. In the list view create two columns - "Word" and "Freq." Select the "details" list type. Add an event handler for CountBtn. Then use this code:
Imports System.Windows.Forms.ListViewItem
Public Class MainForm
Private WordCounts As CountWords = New CountWords
Private Sub CountBtn_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CountBtn.Click
OutputList.Items.Clear()
Dim ret As Dictionary(Of String, Integer) = Me.WordCounts.WordCount(InputBox.Text)
For Each item As String In ret.Keys
Dim litem As ListViewItem = New ListViewItem
litem.Text = item
Dim csitem As ListViewSubItem = New ListViewSubItem(litem, ret.Item(item).ToString())
litem.SubItems.Add(csitem)
OutputList.Items.Add(litem)
Word.Width = -1
Freq.Width = -1
Next
End Sub
End Class
You did a terrible terrible thing to make me write this in VB and I will never forgive you.
:p
Good luck!
EDIT
Fixed blank string bug and case bug
This might be helpful:
Word frequency algorithm for natural language processing
Pretty close, but \w+ is a good regex to match with (matches word characters only).
Public Function CountWords(ByVal inputText as String) As Dictionary(Of String, Integer)
Dim frequency As New Dictionary(Of String, Integer)
For Each wordMatch as Match in Regex.Match(inputText, "\w+")
If frequency.ContainsKey(wordMatch.Value.ToLower()) Then
frequency(wordMatch.Value.ToLower()) += 1
Else
frequency.Add(wordMatch.Value.ToLower(), 1)
End If
Next
Return frequency
End Function