VB.NET LINQ statement to find duplicates using fuzzy searching

I have a datatable containing:
ID as Integer = Primary Key.
Name as String
OtherFields ....
I have a function that can find duplicate names in the table
Private Function DuplicateComputerNames() As String
Dim ret As String = $"Duplicate Computer Names found in Table Computer.{vbCrLf}"
Dim computerTable As DataTable
computerTable = ComputerTableAdapter.GetData
Dim duplicates = computerTable.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("Name")) _
.Where(Function(g) g.Count() > 1).Select(Function(g) g.Key)
For Each d In duplicates
ret = $"{ret}{vbTab}{d}{vbCrLf}"
Next
Return ret
End Function
This works perfectly.
Now, sometimes the name has a suffix of "(POOL)", e.g. "Laptop-55 (POOL)".
I need a function that finds duplicates based on just the Laptop-55 part,
e.g. when the table contains both a Laptop-55 and a Laptop-55 (POOL).
I thought this code would do the job, but it does not.
Private Function PoolComputerNames() As String
Dim ret As String = $"Possible Duplicate Computer Names found in Table Computer.{vbCrLf}"
Dim computerTable As DataTable
computerTable = ComputerTableAdapter.GetData
Dim duplicates = computerTable.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("Name")) _
.Where(Function(g) g.Count() > 1).Select(Function(g) g.Key.Split("("c)(0).Trim())
For Each d In duplicates
ret = $"{ret}{vbTab}{d}{vbCrLf}"
Next
Return ret
End Function
Hoping that someone can point me in the right direction.
Thanks.

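Thanks to jmcilhinney in the comments, I managed to solve my issue. The suffix has to be stripped inside the GroupBy key selector, so that Laptop-55 and Laptop-55 (POOL) land in the same group. Stripping it in the Select happens after the grouping, which is too late.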
Private Function PoolComputerNames() As String
Dim count As Integer = 0
Dim ret As String = $"[count] possible duplicate Computer Names found in Table Computer.{vbCrLf}"
Dim computerTable As DataTable
computerTable = ComputerTableAdapter.GetData
Dim duplicates = computerTable.AsEnumerable().GroupBy(Function(i) i.Field(Of String)("Name").Split("("c)(0).Trim()) _
.Where(Function(g) g.Count() > 1).Select(Function(g) g.Key)
For Each d In duplicates
count += 1
ret = $"{ret}{vbTab}{d}{vbCrLf}"
Next
ret = ret.Replace("[count]", $"{count}")
Return ret
End Function
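The grouping idea can be tried without a database. The following is a minimal sketch, assuming an in-memory DataTable with made-up rows in place of ComputerTableAdapter.GetData; the BaseName helper and the case-insensitive comparer are illustrative additions, not part of the original solution.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module PoolDuplicatesSketch
    ' Hypothetical helper: keeps the text before the first "(" and trims it
    Function BaseName(name As String) As String
        Return name.Split("("c)(0).Trim()
    End Function

    Sub Main()
        ' Made-up sample data standing in for the Computer table
        Dim table As New DataTable()
        table.Columns.Add("ID", GetType(Integer))
        table.Columns.Add("Name", GetType(String))
        table.Rows.Add(1, "Laptop-55")
        table.Rows.Add(2, "Laptop-55 (POOL)")
        table.Rows.Add(3, "Laptop-56")

        ' Group on the normalized name so both Laptop-55 rows share one key.
        ' The comparer (an assumption) also treats names differing only in case as equal.
        Dim duplicates = table.AsEnumerable().
            GroupBy(Function(r) BaseName(r.Field(Of String)("Name")),
                    StringComparer.OrdinalIgnoreCase).
            Where(Function(g) g.Count() > 1).
            Select(Function(g) g.Key)

        For Each d In duplicates
            Console.WriteLine(d)
        Next
    End Sub
End Module
```

With this sample data only Laptop-55 is reported. Rows whose Name is null would need an extra check before calling Split.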

Related

Search and replace inside string column in DataTable is slow?

I am fetching distinct words in a string column of a DataTable (.dt) and then replacing the unique values with another value, so essentially changing words to other words. Both approaches listed below work, however, for 90k records, the process is not very fast. Is there a way to speed up either approach?
The first approach is as follows:
'fldNo is column number in dt
For Each Word As String In DistinctWordList
Dim myRow() As DataRow
myRow = dt.Select(MyColumnName & "='" & Word & "'")
For Each row In myRow
row(fldNo) = dicNewWords(Word)
Next
Next
A second LINQ-based approach is as follows, and is actually not very fast either:
Dim flds as new List(of String)
flds.Add(myColumnName)
For Each Word As String In DistinctWordsList
Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
ReDim foundrecs(rowData.Count)
Cnt = 0
For Each row As DataRow In rowData
Dim Index As Integer = dt.Rows.IndexOf(row)
foundrecs(Cnt) = Index + 1 'row.RowId
Cnt += 1
Next
For i = 0 To Cnt
dt(foundrecs(i))(fldNo) = dicNewWords(Word)
Next
Next
So you have your dictionary of replacements:
Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"
You can apply them to your table's ReplaceMe column:
Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep
Next r
On my machine it takes 340 ms for 1 million replacements. I can cut that down to 260 ms by using the column number rather than the column name: If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
Timing:
'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
Dim d As New Dictionary(Of String, String)
For i = 0 To 9
d(i.ToString()) = (i + 10).ToString()
Next
'put a million rows in a datatable, randomly assign dictionary keys as row values
Dim dt As New DataTable
dt.Columns.Add("ReplaceMe")
Dim r As New Random()
Dim k = d.Keys.ToArray()
For i = 1 To 1000000
dt.Rows.Add(k(r.Next(k.Length)))
Next
'what range of values do we have in our dt?
Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
'it's a crappy way to time, but it'll prove the point
Dim start = DateTime.Now
Dim rep As String = Nothing
For Each ro As DataRow In dt.Rows
If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
Next
Dim ennd = DateTime.Now
'what range of values do we have now
Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
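Since the answer itself calls DateTime.Now a crude way to time, here is a hedged variant of the same loop timed with System.Diagnostics.Stopwatch. The sample values, the row count and the null guard are assumptions added for illustration; the guard is there because Dictionary.TryGetValue throws when the key is Nothing, which a DBNull cell would produce.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Diagnostics

Module ReplaceTimingSketch
    Sub Main()
        ' Same replacement dictionary as in the answer above
        Dim d As New Dictionary(Of String, String) From {
            {"foo", "bar"},
            {"baz", "buf"}
        }

        ' Hypothetical test data: 100,000 rows cycling through three values
        Dim sample() As String = {"foo", "baz", "unchanged"}
        Dim dt As New DataTable()
        dt.Columns.Add("ReplaceMe", GetType(String))
        For i As Integer = 0 To 99999
            dt.Rows.Add(sample(i Mod sample.Length))
        Next

        Dim replaced As Integer = 0
        Dim rep As String = Nothing
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For Each row As DataRow In dt.Rows
            ' TryCast yields Nothing for DBNull, so null cells are skipped
            Dim current As String = TryCast(row(0), String)
            If current IsNot Nothing AndAlso d.TryGetValue(current, rep) Then
                row(0) = rep
                replaced += 1
            End If
        Next
        watch.Stop()

        Console.WriteLine($"{replaced} replacements in {watch.ElapsedMilliseconds} ms")
    End Sub
End Module
```

The point of the design is unchanged: one pass over the rows with a dictionary lookup per row, instead of one dt.Select call per distinct word.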

String Manipulation from "[1_5],[1_3],[1_5]" to "5,3,5"

I have a string in the format below:
Dim str as String = "[1_5],[1_3],[1_5]"
The part before the _ can vary; it is not a fixed number.
I need to convert it into the format
"5,3,5"
I have used the code below to capture all the numbers I need for the new string in the match groups:
Dim pattern As String = "_(.*?)\]"
Dim matches As MatchCollection =
Regex.Matches(rowpanel.getRequestedArea_selectionArea(), pattern, RegexOptions.Singleline)
My question is: how can I join all the groups to obtain the final string?
A Regex solution could be
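'on .NET Framework, MatchCollection only implements the non-generic IEnumerable, so cast before using LINQ
Dim x = matches.Cast(Of Match)().Select(Function(m) m.Groups(1).Value)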
Dim final = String.Join(",", x)
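Put together as a complete program, the regex variant might look like the sketch below. The input string uses prefixes of different lengths (an assumption, to reflect the "prefix can vary" requirement), and the pattern is a slightly stricter alternative to the one in the question.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module JoinGroupsSketch
    Sub Main()
        ' Hypothetical input with prefixes of varying length
        Dim input As String = "[12_5],[1_3],[7_5]"

        ' Capture everything between "_" and the closing "]"
        Dim matches As MatchCollection = Regex.Matches(input, "_([^\]]*)\]")

        ' Cast(Of Match) keeps the LINQ call valid on .NET Framework as well
        Dim values = matches.Cast(Of Match)().Select(Function(m) m.Groups(1).Value)

        Console.WriteLine(String.Join(",", values)) ' prints 5,3,5
    End Sub
End Module
```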
A Split and Join one is
Dim blocks As String() = str.Split(",")
Dim result = New List(Of String)()
For Each s In blocks
result.Add(s.Split("_")(1).Trim("]"))
Next
Dim final = String.Join(",", result)
You can also do this with a single Regex.Replace call. The pattern below removes every "[prefix_" and every "]", whatever the length of the prefix:
Dim str As String = "[1_5],[1_3],[1_5]"
'strip each opening "[" together with everything up to and including the "_", and each closing "]"
str = Regex.Replace(str, "\[[^_]*_|\]", "")
An alternative is a simple, readable loop wrapped in a method, at the cost of more lines of code:
Public Function ExtractNumbers(text As String) As String
Dim current As New StringBuilder()
Dim write As Boolean = False
For Each character As Char In text
If character = "_"c Then
write = True
Continue For
End If
If character = "]"c Then
write = False
current.Append(",")
Continue For
End If
If write Then
current.Append(character)
Continue For
End If
Next
'drop the trailing comma; skip this when nothing was extracted
If current.Length > 0 Then current.Length -= 1
Return current.ToString()
End Function
Usage
Dim extracted = ExtractNumbers("[1_5],[1_3],[1_5]")
Console.WriteLine(extracted)
' 5,3,5

LINQ query on data from a CSV source returns no records in VB.NET

I extract data from a CSV file using VB.NET with the code below:
Dim district As New List(Of tbDistrict)
Dim path As String = AppDomain.CurrentDomain.BaseDirectory & "tb_district.csv"
Dim results = (From line In File.ReadAllLines(path)
Let value = line.Split(";").
Select(Function(x) x)
Select New With {.district_id = value(0),
.district_code = value(1),
.amphur_id = value(2),
.district_name = value(3),
.province_id = value(4)}).ToList()
'Console.WriteLine(results.Count)
For Each item In results
district.Add(New tbDistrict With {
.district_id = item.district_id,
.district_code = item.district_code,
.district_name = item.district_name,
.amphur_id = item.amphur_id,
.province_id = item.province_id
})
Next
This adds 2000 records to the list. I then try to use LINQ to get some of the records with the code below:
Dim provinceid As String = "1"
Dim amphurid As String = "1"
Dim mystring As String = "test"
Dim searchtabbol As List(Of tbDistrict) = district.Where(Function(x) _
(x.province_id = provinceid) AndAlso
(x.amphur_id = amphurid) AndAlso
(x.district_name.Contains(mystring.Trim()))).ToList()
Console.WriteLine(searchtabbol.Count)
But I get 0 records returned although a matching row exists. I'm not sure what's wrong with the code. I'd appreciate any advice.

How can I get String values rather than integer

How can I loop from a start string to an end string?
Dim startNumber As Integer
Dim endNumber As Integer
Dim i As Integer
startNumber = 1
endNumber = 4
For i = startNumber To endNumber
MsgBox(i)
Next i
Output: 1,2,3,4
I want to make it work like this sample: startString AAA, endString AAD,
with the output AAA, AAB, AAC, AAD
This is a simple function that should be easy to understand and use. Every time you call it, it just increments the string by one value. Just be careful to check the values in the text boxes or you can have an endless loop on your hands.
Function AddOneChar(Str As String) As String
AddOneChar = ""
Str = StrReverse(Str)
Dim CharSet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Dim Done As Boolean = False
For Each Ltr In Str
If Not Done Then
If InStr(CharSet, Ltr) = CharSet.Length Then
Ltr = CharSet(0)
Else
Ltr = CharSet(InStr(CharSet, Ltr))
Done = True
End If
End If
AddOneChar = Ltr & AddOneChar
Next
If Not Done Then
AddOneChar = CharSet(0) & AddOneChar
End If
End Function
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim S = TextBox1.Text
Do Until S = TextBox2.Text
S = AddOneChar(S)
MsgBox(S)
Loop
End Sub
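The warning about an endless loop can be addressed with a hard upper bound on the number of generated values. The following is a sketch only: NextCode is a re-implementation of the same increment idea using IndexOf, and CodesBetween and its maxCount parameter are hypothetical names introduced here.

```vbnet
Imports System
Imports System.Collections.Generic

Module RangeGuardSketch
    Const CharSet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    ' Increments the code by one, carrying to the left like an odometer
    Function NextCode(code As String) As String
        Dim chars = code.ToCharArray()
        For i = chars.Length - 1 To 0 Step -1
            Dim pos = CharSet.IndexOf(chars(i))
            If pos < 0 Then Throw New ArgumentException("Character not in CharSet: " & chars(i))
            If pos < CharSet.Length - 1 Then
                chars(i) = CharSet(pos + 1)
                Return New String(chars)
            End If
            ' Last character of the set: wrap around and carry on to the next position
            chars(i) = CharSet(0)
        Next
        ' Every position wrapped, so the code grows by one character
        Return CharSet(0) & New String(chars)
    End Function

    ' Hypothetical wrapper: stops with an error instead of looping forever
    Function CodesBetween(first As String, last As String, maxCount As Integer) As List(Of String)
        Dim result As New List(Of String)
        Dim current = first
        While result.Count < maxCount
            result.Add(current)
            If current = last Then Return result
            current = NextCode(current)
        End While
        Throw New InvalidOperationException("End value not reached within " & maxCount & " steps.")
    End Function

    Sub Main()
        ' prints AAA, AAB, AAC, AAD
        Console.WriteLine(String.Join(", ", CodesBetween("AAA", "AAD", 1000)))
    End Sub
End Module
```

If the end value comes before the start value, or contains a character outside the set, the function throws instead of hanging the UI.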
This works as a way to generate all the codes given an arbitrary alphabet:
Public Function Generate(starting As String, ending As String, alphabet As String) As IEnumerable(Of String)
Dim increment As Func(Of String, String) = _
Function(x)
Dim f As Func(Of IEnumerable(Of Char), IEnumerable(Of Char)) = Nothing
f = _
Function(cs)
If cs.Any() Then
Dim first = cs.First()
Dim rest = cs.Skip(1)
If first = alphabet.Last() Then
rest = f(rest)
first = alphabet(0)
Else
first = alphabet(alphabet.IndexOf(first) + 1)
End If
Return Enumerable.Repeat(first, 1).Concat(rest)
Else
Return Enumerable.Empty(Of Char)()
End If
End Function
Return New String(f(x.ToCharArray().Reverse()).Reverse().ToArray())
End Function
Dim results = New List(Of String)
Dim text = starting
While True
results.Add(text)
If text = ending Then
Exit While
End If
text = increment(text)
End While
Return results
End Function
I used it like this to produce the required result:
Dim alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Dim results = Generate("S30AB", "S30B1", alphabet)
This gave me 63 values:
S30AB
S30AC
...
S30BY
S30BZ
S30B0
S30B1
It should now be very easy to modify the alphabet as needed and to use the results.
One option would be to put those String values into an array and then use i as an index into that array to get one element each iteration. If you do that though, keep in mind that array indexes start at 0.
You can also use a For Each loop to access each element of the array without the need for an index.
If the first two characters of your output are always AA,
you can use a Select Case or If/Else statement
to map 1 to A, 2 to B, and so on,
then concatenate the fixed two-character prefix with the letter returned by that mapping.
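As a rough illustration of that suggestion, the sketch below maps the loop counter to a letter arithmetically instead of with a long Select Case. The fixed prefix AA and the range 1 to 4 are taken from the question's example; the use of ChrW and AscW is a choice made here, not something the answer specified.

```vbnet
Imports System

Module PrefixLetterSketch
    Sub Main()
        ' Assumed fixed part of every code
        Const prefix As String = "AA"

        For i As Integer = 1 To 4
            ' 1 maps to A, 2 to B, and so on
            Dim letter As Char = ChrW(AscW("A"c) + i - 1)
            Console.WriteLine(prefix & letter)
        Next
    End Sub
End Module
```

This prints AAA, AAB, AAC and AAD, one per line. It only covers a single changing letter, so it does not handle a carry such as AAZ to ABA.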
As I understand it, you are looking for a series over a range given by two text boxes. Here is code that takes that range and produces the required output.
Dim startingStr As String = Mid(TextBox1.Text, TextBox1.Text.Length, 1)
Dim endStr As String = Mid(TextBox2.Text, TextBox2.Text.Length, 1)
Dim outputstr As String = String.Empty
Dim startNumber As Integer
Dim endNumber As Integer
startNumber = Asc(startingStr)
endNumber = Asc(endStr)
Dim TempStr As String = Mid(TextBox1.Text, 1, TextBox1.Text.Length - 1)
Dim i As Integer
For i = startNumber To endNumber
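'only add the separator between items, so the output does not start with ", "
outputstr = outputstr + If(outputstr.Length = 0, "", ", ") + TempStr + Chr(i)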
Next i
MsgBox(outputstr)
The first two lines take the last character of the string in each text box, so in your case they give A and D respectively.
outputstr holds the series that is built up in the loop.
startNumber and endNumber hold the ASCII values of those two characters.
TempStr stores what is left of the string once the last character is removed; for AAA to AAD, TempStr is AA.
The loop then appends each generated item to the output, which is shown at the end.
In your case, to achieve the goal you could do something like this:
Dim S() As String = {"AAA", "AAB", "AAC", "AAD"}
For Each el In S
MsgBox(el.ToString)
Next
FIX FOR PREVIOUS ISSUE
Dim s1 As String = "AAA"
Dim s2 As String = "AAZ"
Dim Last As String = s1.Last
Dim LastS2 As String = s2.Last
Dim StartBase As String = s1.Substring(0, 2)
Dim result As String = String.Empty
For I As Integer = Asc(s1.Last) To Asc(s2.Last)
result &= StartBase & Chr(I) & vbCrLf
Next
'show the whole series once, after the loop has finished
MsgBox(result)
UPDATED CODE VERSION
Dim BARCODEBASE As String = "SBA0021"
Dim BarCode1 As String = "SBA0021AA1"
Dim BarCode2 As String = "SBA0021CD9"
'return AA1
Dim FirstBarCodeSuffix As String = Replace(BarCode1, BARCODEBASE, "")
'return CD9
Dim SecondBarCodeSuffix As String = Replace(BarCode2, BARCODEBASE, "")
Dim InternalSecondBarCodeSuffix = SecondBarCodeSuffix.Substring(1, 1)
Dim IsTaskCompleted As Boolean = False
For First As Integer = Asc(FirstBarCodeSuffix.First) To Asc(SecondBarCodeSuffix)
If IsTaskCompleted = True Then Exit For
For Second As Integer = Asc(FirstBarCodeSuffix.First) To Asc(InternalSecondBarCodeSuffix)
For Third As Integer = 1 To 9
Dim tmp = Chr(First) & Chr(Second) & Third
Console.WriteLine(BARCODEBASE & tmp)
If tmp = SecondBarCodeSuffix Then
IsTaskCompleted = True
End If
Next
Next
Next
Console.WriteLine("Completed")
Console.Read()
Take a look at this, check it, and let me know if it helps.

Group DataSet Column Values into Comma Separated String using LINQ

How do I combine column values in a dataset into a single string of comma-separated values using LINQ in VB.NET?
I have one table with the following structure:
ID Name
728 Jim
728 Katie
728 Rich
How do I combine these into a single row like the following?
ID Name
728 Jim,Katie,Rich
Please note I am using a LINQ to Dataset so please respond in the applicable syntax.
Here is an example (using LINQ to objects, but should be easy to adjust for LINQ to DataSet):
Class Record
Public Property ID As Integer
Public Property Name As String
Sub New(id As Integer, name As String)
Me.ID = id
Me.Name = name
End Sub
End Class
Sub Main()
Dim recordList As New List(Of Record)
recordList.Add(New Record(728, "Jim"))
recordList.Add(New Record(728, "Katie"))
recordList.Add(New Record(728, "Rich"))
recordList.Add(New Record(729, "John"))
recordList.Add(New Record(729, "Michael"))
Dim v = From r As Record In recordList
Group By ID = r.ID Into Records = Group
Select ID, Name = String.Join(","c, Records.Select(Function(x) x.Name))
End Sub
This should do what you want:
Dim result = list.GroupBy(Function(a) a.ID) _
.Select(Function(g) New With {.ID = g.Key, .csvList = g.Select(Function(n) n.Name) _
.Aggregate(Function(accumulation, current) accumulation + "," + current)}) _
.ToList()
This is an example using LINQ to Dataset:
Dim grouped =
From row In dt.AsEnumerable()
Group row By id = row.Field(Of Integer)("ID") Into Group
Select ID, Name = String.Join(",", From i In Group Select i.Field(Of String)("Name"))
Pretty late, but I also ran into the same problem and this is my solution. Hope this helps someone.
'the group key is read from g.Key; a second lambda parameter in Select would be the element index, not the ID
Dim grouped =
a.AsEnumerable().
GroupBy(Function(row) row.Field(Of Integer)("ID")).
Select(Function(g) New With {
.ID = g.Key,
.Name = String.Join(",", g.Select(Function(row) row.Field(Of String)("Name")))
})
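For anyone who wants to run the LINQ to DataSet version end to end, here is a minimal self-contained sketch. The table is built in memory with the sample rows from the question plus the two extra rows used in the first answer; the module and variable names are illustrative.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module GroupNamesSketch
    Sub Main()
        ' In-memory stand-in for the table in the DataSet
        Dim dt As New DataTable()
        dt.Columns.Add("ID", GetType(Integer))
        dt.Columns.Add("Name", GetType(String))
        dt.Rows.Add(728, "Jim")
        dt.Rows.Add(728, "Katie")
        dt.Rows.Add(728, "Rich")
        dt.Rows.Add(729, "John")
        dt.Rows.Add(729, "Michael")

        ' One result per ID, with the names of that group joined by commas
        Dim grouped = dt.AsEnumerable().
            GroupBy(Function(row) row.Field(Of Integer)("ID")).
            Select(Function(g) New With {
                .ID = g.Key,
                .Names = String.Join(",", g.Select(Function(row) row.Field(Of String)("Name")))
            })

        For Each item In grouped
            Console.WriteLine($"{item.ID} {item.Names}")
        Next
    End Sub
End Module
```

This prints "728 Jim,Katie,Rich" followed by "729 John,Michael". On .NET Framework the project needs a reference to System.Data.DataSetExtensions for AsEnumerable and Field.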