How to remove all duplicates in a data table in vb.net? - vb.net

Consider my data table:
ID Name
1 AAA
2 BBB
3 CCC
1 AAA
4 DDD
The final output should be (every copy of a duplicated row is removed):
2 BBB
3 CCC
4 DDD
How can I remove these rows from the data table using VB.NET?
Any help is appreciated.

The following works if you only want the rows that occur exactly once (it skips every row whose ID and Name appear more than once):
Dim distinctRows = From r In tbl
                   Group By Distinct = New With {Key .ID = CInt(r("ID")), Key .Name = CStr(r("Name"))} Into Group
                   Where Group.Count = 1
                   Select Distinct
' Create a new DataTable containing only the unique rows
Dim tblDistinct = (From r In tbl
                   Join distinctRow In distinctRows
                   On distinctRow.ID Equals CInt(r("ID")) _
                   And distinctRow.Name Equals CStr(r("Name"))
                   Select r).CopyToDataTable
If you want to remove the duplicates from the original table instead:
Dim tblDups = From r In tbl
Group By Dups = New With {Key .ID = CInt(r("ID")), Key .Name = CStr(r("Name"))} Into Group
Where Group.Count > 1
Select Dups
Dim dupRowList = (From r In tbl
Join dupRow In tblDups
On dupRow.ID Equals CInt(r("ID")) _
And dupRow.Name Equals CStr(r("Name"))
Select r).ToList()
For Each dup In dupRowList
tbl.Rows.Remove(dup)
Next
Here is your sample data:
Dim tbl As New DataTable
tbl.Columns.Add(New DataColumn("ID", GetType(Int32)))
tbl.Columns.Add(New DataColumn("Name", GetType(String)))
Dim row = tbl.NewRow
row("ID") = 1
row("Name") = "AAA"
tbl.Rows.Add(row)
row = tbl.NewRow
row("ID") = 2
row("Name") = "BBB"
tbl.Rows.Add(row)
row = tbl.NewRow
row("ID") = 3
row("Name") = "CCC"
tbl.Rows.Add(row)
row = tbl.NewRow
row("ID") = 1
row("Name") = "AAA"
tbl.Rows.Add(row)
row = tbl.NewRow
row("ID") = 4
row("Name") = "DDD"
tbl.Rows.Add(row)

You can use the DefaultView.ToTable method of a DataTable to do the filtering. Note that this keeps one copy of each duplicated row rather than removing every copy:
Public Sub RemoveDuplicateRows(ByRef rDataTable As DataTable)
Dim pNewDataTable As DataTable
Dim pCurrentRowCopy As DataRow
Dim pColumnList As New List(Of String)
Dim pColumn As DataColumn
'Build column list
For Each pColumn In rDataTable.Columns
pColumnList.Add(pColumn.ColumnName)
Next
'Filter by all columns
pNewDataTable = rDataTable.DefaultView.ToTable(True, pColumnList.ToArray)
rDataTable = rDataTable.Clone
'Import rows into original table structure
For Each pCurrentRowCopy In pNewDataTable.Rows
rDataTable.ImportRow(pCurrentRowCopy)
Next
End Sub

Assuming you want to check all the columns, this should remove the duplicates from the DataTable (DT):
DT = DT.DefaultView.ToTable(True, DT.Columns.Cast(Of DataColumn)().Select(Function(c) c.ColumnName).ToArray())
Unless I overlooked it, the documentation for the DataView.ToTable method doesn't describe this, but calling the same overload with no column names appears to do the same thing:
DT = DT.DefaultView.ToTable(True)
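The ToTable answers and the grouping answer do not produce the same result, which is easy to miss. A minimal sketch of the difference, using the sample data from the question; the module and function names are made up for illustration, and on .NET Framework it assumes a reference to System.Data.DataSetExtensions:

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module DistinctVersusRemoveAll
    ' Builds the sample table from the question (1/AAA appears twice).
    Function BuildSample() As DataTable
        Dim tbl As New DataTable()
        tbl.Columns.Add("ID", GetType(Integer))
        tbl.Columns.Add("Name", GetType(String))
        tbl.Rows.Add(1, "AAA")
        tbl.Rows.Add(2, "BBB")
        tbl.Rows.Add(3, "CCC")
        tbl.Rows.Add(1, "AAA")
        tbl.Rows.Add(4, "DDD")
        Return tbl
    End Function

    ' Distinct rows: duplicates collapse to a single copy.
    Function KeepOneCopy(tbl As DataTable) As DataTable
        Return tbl.DefaultView.ToTable(True, "ID", "Name")
    End Function

    ' Only rows whose (ID, Name) pair occurs exactly once.
    ' CopyToDataTable throws if no row qualifies.
    Function KeepUniqueOnly(tbl As DataTable) As DataTable
        Return tbl.AsEnumerable().
            GroupBy(Function(r) Tuple.Create(r.Field(Of Integer)("ID"), r.Field(Of String)("Name"))).
            Where(Function(g) g.Count() = 1).
            Select(Function(g) g.First()).
            CopyToDataTable()
    End Function

    Sub Main()
        Dim tbl = BuildSample()
        Console.WriteLine(KeepOneCopy(tbl).Rows.Count)    ' prints 4 (1/AAA is kept once)
        Console.WriteLine(KeepUniqueOnly(tbl).Rows.Count) ' prints 3 (1/AAA is gone)
    End Sub
End Module
```

Only the second function matches the output the question asks for.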

Related

Display number of rows and columns

My MS Access table, Table1, has two fields: ID1 and Team1.
With NumericUpDown1 I select the number of rows to display in DataGridView2 after randomizing, and with NumericUpDown2 I select the number of columns. If I choose only one column (the value 1) with NumericUpDown2, it works very well with this query:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Con_randomize()
Dim rows As Integer
If Not Integer.TryParse(NumericUpDown1.Value, rows) Then
MsgBox("NUMBER NOT AVAILABLE", MsgBoxStyle.Critical + MsgBoxStyle.OkOnly, "Error")
NumericUpDown1.Value = ""
NumericUpDown1.Focus()
Exit Sub
End If
If NumericUpDown2.Value = 1 Then
Dim sql As String = String.Format("SELECT Top {0} ID1,Team1 From Table1 ORDER BY RND(-(100000*ID1)*Time())", rows)
InfoCommand = New OleDbCommand(sql, Con_randomize)
InfoAdapter = New OleDbDataAdapter()
InfoAdapter.SelectCommand = InfoCommand
InfoTable = New DataTable()
InfoAdapter.Fill(InfoTable)
DataGridView2.DataSource = InfoTable
DataGridView2.Columns(0).HeaderText = "NUMERO"
DataGridView2.Columns(1).HeaderText = "CATEGORY1"
End If
End Sub
How can I make this work when I choose 2 or 3 columns with NumericUpDown2?
The extra columns should be named CATEGORY2 and CATEGORY3, for example (1 Victor - David - Vincent), (2 wiliam - George - Joseph). My only name field, Team1, holds about a hundred names.
I'm not 100% sure why you need a query to initialize the data, but give this a go. It will automatically create the columns the way you need them.
' If NumericUpDown2.Value = 1 Then ' Comment This If Block Out
' Create the Teams String
Dim teamsString as New System.Text.StringBuilder("")
For i as Integer = 1 to Convert.ToInt32(NumericUpDown2.Value)
teamsString.Append(", (SELECT Top 1 Team1 From Table1 ORDER BY RND(-(100000*ID1)*Time())) as Category" + i.ToString())
Next
Dim sql As String = String.Format("SELECT Top {0} ID1" + teamsString.ToString() + " From Table1 ORDER BY RND(-(100000*ID1)*Time())", rows)
InfoCommand = New OleDbCommand(sql, Con_randomize)
InfoAdapter = New OleDbDataAdapter()
InfoAdapter.SelectCommand = InfoCommand
InfoTable = New DataTable()
InfoAdapter.Fill(InfoTable)
DataGridView2.DataSource = InfoTable
DataGridView2.Columns(0).HeaderText = "NUMERO"
' Don't Need This, We Made It the Binding Name
' DataGridView2.Columns(1).HeaderText = "CATEGORY1"

datatable sum column and concatenate rows using LINQ and group by on multiple columns

I have a DataTable with the following records:
ID NAME VALUE CONTENT
1 AAA 10 SYS, LKE
2 BBB 20 NOM
1 AAA 15 BST
3 CCC 30 DSR
2 BBB 05 EFG
I want to write a VB.NET/LINQ query that produces the output in the table below:
ID NAME SUM CONTENT (as CSV)
1 AAA 25 SYS, LKE, BST
2 BBB 25 NOM, EFG
3 CCC 30 DSR
Please provide a LINQ query that gets the desired result. Thanks.
I have tried concatenation using the query below:
Dim grouped = From row In dtTgt.AsEnumerable() _
Group row By New With {row.Field(Of Int16)("ID"), row.Field(Of String)("Name")} _
Into grp() _
Select ID, Name, CONTENT= String.Join(",", From i In grp Select i.Field(Of String)("CONTENT"))
This query will give you the expected output:
Dim result = From row In dt.AsEnumerable()
Group row By _group = New With {Key .Id = row.Field(Of Integer)("Id"),
Key .Name = row.Field(Of String)("Name")} Into g = Group
Select New With {Key .Id = _group.Id, Key .Name = _group.Name,
Key .Sum = g.Sum(Function(x) x.Field(Of Integer)("Value")),
Key .Content = String.Join(",", g.Select(Function(x) x.Field(Of String)("Content")))}
Thanks for your answers.
However, I have managed to get the desired result using simple code (without LINQ):
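' Start from an empty table with the same schema as dt
Dim dt2 As DataTable = dt.Clone()
For Each dRow As DataRow In dt.Rows
    Dim iID As Integer = CInt(dRow("ID"))
    Dim sName As String = CStr(dRow("Name"))
    Dim sContt As String = CStr(dRow("Content"))
    Dim iValue As Integer = CInt(dRow("Value"))
    ' Look for a row with this ID that was already copied to dt2
    Dim rwTgt() As DataRow = dt2.Select("ID=" & iID)
    If rwTgt.Length > 0 Then
        ' Existing group: add to the sum and append the content
        rwTgt(0)("Value") = CInt(rwTgt(0)("Value")) + iValue
        rwTgt(0)("Content") = CStr(rwTgt(0)("Content")) & ", " & sContt
    Else
        ' First row of a new group (rw was never declared in the original)
        Dim rw As DataRow = dt2.NewRow()
        rw("ID") = iID
        rw("Name") = sName
        rw("Value") = iValue
        rw("Content") = sContt
        dt2.Rows.Add(rw)
    End If
Next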

Using Linq for select row from DataTableA where id not in DataTableB

I have two DataTables, and I want to select all rows from DataTable1 whose id is not in DataTable2. Below is what I have tried:
Sql = "select *,N°Reçu as NumRecu from V_Sit_J_Vente,V_Bien where V_Sit_J_Vente.Code_bien=V_Bien.Code_bien and date_situation <= '" + dt2 + "' and date_situation >= '" + dt1 + "'"
Dim GlobalDataVente As DataTable = utilitaire.getDataSet(Sql).Tables(0)
Sql = "select * from V_Reserv_Annule"
Dim GlobalDataAnnule As DataTable = utilitaire.getDataSet(Sql).Tables(0)
Dim query = (From order In GlobalDataVente.AsEnumerable() _
Where order!code_projet = tab.Rows(i).Item("code_projet")).ToList
Dim bannedCCList = From c In GlobalDataAnnule.AsEnumerable() _
Where c!type.Equals("Transfert acompte") = False And c!date_annule <= dt2
Dim exceptBanned = From c In query Group Join b In bannedCCList On c.Field(Of String)("N°Reçu") Equals b.Field(Of String)("num_reserv_remplace")
Into j() From x In j.DefaultIfEmpty() Where x Is Nothing Select c
I want exceptBanned to contain all rows of query except the rows that exist in bannedCCList.
Thanks in advance.
You can use Contains for this:
Dim query = (From order In GlobalDataVente.AsEnumerable() _
Where order!code_projet = tab.Rows(i).Item("code_projet")).ToList
Dim bannedCCList = From c In GlobalDataAnnule.AsEnumerable() _
Where c!type.Equals("Transfert acompte") = False And c!date_annule <= dt2 _
Select c.Field(Of String)("num_reserv_remplace")
Dim exceptBanned = From c In query
Where Not bannedCCList.Contains(c.Field(Of String)("N°Reçu"))
Select c
bannedCCList defines a query that produces the Id values you want to exclude; exceptBanned then filters query against those values. Because bannedCCList is a deferred IEnumerable, it isn't executed when it's defined, only when it's actually used. With in-memory DataTables that means each Contains call re-enumerates bannedCCList, so for large tables it is worth materializing the values first (for example with ToList or a HashSet).
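Since a deferred LINQ to Objects query is re-enumerated on every Contains call, one option is to collect the excluded keys into a HashSet once and filter against that. The following is a minimal sketch, not the poster's code: the module, the RowsNotIn helper, and the sample values are hypothetical, and both key columns are assumed to be String.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module ExcludeByKey
    ' Returns the rows of source whose keyColumn value does not occur
    ' in excluded's excludedColumn. Both columns are assumed to be String.
    Function RowsNotIn(source As DataTable, keyColumn As String,
                       excluded As DataTable, excludedColumn As String) As List(Of DataRow)
        ' Materialize the excluded keys once; HashSet lookups are O(1) on average.
        Dim banned As New HashSet(Of String)(
            excluded.AsEnumerable().Select(Function(r) r.Field(Of String)(excludedColumn)))
        Return source.AsEnumerable().
            Where(Function(r) Not banned.Contains(r.Field(Of String)(keyColumn))).
            ToList()
    End Function

    Sub Main()
        ' Hypothetical sample data using the column names from the question.
        Dim ventes As New DataTable()
        ventes.Columns.Add("N°Reçu", GetType(String))
        ventes.Rows.Add("R1")
        ventes.Rows.Add("R2")
        ventes.Rows.Add("R3")

        Dim annules As New DataTable()
        annules.Columns.Add("num_reserv_remplace", GetType(String))
        annules.Rows.Add("R2")

        Dim kept = RowsNotIn(ventes, "N°Reçu", annules, "num_reserv_remplace")
        Console.WriteLine(String.Join(",", kept.Select(Function(r) r.Field(Of String)("N°Reçu"))))
        ' prints R1,R3
    End Sub
End Module
```

Any extra conditions, such as the type and date_annule filters from the question, would go into the query that feeds the HashSet.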

Cross table VB.NET & SQL Server & Linq

I have a table like this:
MAName feldtext
------------------
karl fieldtext1
karl fieldtext2
karl fieldtext1
karl fieldtext3
karl fieldtext4
karl fieldtext2
karl fieldtext5
karl fieldtext3
karl fieldtext3
susi fieldtext1
susi fieldtext4
john fieldtext2
john fieldtext5
john fieldtext5
and I need:
MAName fieldtext1 fieldtext2 fieldtext3 fieldtext4 fieldtext5 FehlerJeMA
karl 2 2 3 1 1 9
susi 1 0 0 1 0 2
john 0 1 0 0 2 3
The columns fieldtext can go from fieldtext1 to fieldtextn, it's dynamic, depending on query.
I looked here for solutions and came up with this approach:
Dim dt2 As New DataTable
Dim nn As Integer = 0
Dim Zeile As DataRow
dt2.Columns.Add("MAName")
' fieldtext distinct
Dim query2 = (From dr In (From d In newTable2.AsEnumerable Select New With {.feldtext1 = d("feldtext")}) Select dr.feldtext1 Distinct)
For Each Feldtext In query2
dt2.Columns.Add(Feldtext)
Next
column = New DataColumn()
column.DataType = System.Type.GetType("System.Int32")
column.ColumnName = "FehlerJeMA"
dt2.Columns.Add(column)
' MAName distinct
Dim query3 = (From dr In (From d In newTable2.AsEnumerable Select New With {.MAName2 = d("MAName")}) Select dr.MAName2.ToString.ToLower Distinct)
For Each Mitarbeiter In query3
Zeile = dt2.NewRow()
Zeile(0) = Mitarbeiter.ToString.ToLower
MA2 = Mitarbeiter.ToString.ToLower
nn = 1
For Each colName2 In query2
Fehler2 = colName2
Dim AnzahlFehler As String = (From row In newTable2.Rows Select row Where row("MAName").ToString.ToLower = MA2 And row("feldtext") = Fehler2).Count
If AnzahlFehler = 0 Then
AnzahlFehler = ""
End If
Zeile(nn) = AnzahlFehler
nn += 1
If AnzahlFehler <> "" Then
FehlerJeMA += CInt(AnzahlFehler)
End If
Next
Zeile(nn) = FehlerJeMA
dt2.Rows.Add(Zeile)
Next
This works, but it is very slow...
My table could have more than 10,000 rows...
So my question is: what is the fastest approach to get the result?
Is it some kind of cross table with LINQ? Are there other approaches?
In C# you could use the code below; try to translate it for your problem. It gives you one count per (MAName, feldtext) pair, which you still need to spread into columns:
var pivotData = data.GroupBy(x => new { x.MAName, x.feldtext }, (key, group) => new { MAName = key.MAName, feldtext = key.feldtext, count = group.Count() });
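For a VB.NET version that also produces the pivoted layout, one possibility is a single pass over the source rows with a dictionary from MAName to its output row, which avoids rescanning the table for every cell. This is a sketch under assumptions: the CrossTab module and Pivot function are illustrative names, and both source columns are assumed to be String.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module CrossTab
    ' Pivots (MAName, feldtext) rows into one row per MAName, with one count
    ' column per distinct feldtext value plus a FehlerJeMA total.
    Function Pivot(source As DataTable) As DataTable
        Dim result As New DataTable()
        result.Columns.Add("MAName", GetType(String))

        ' One Integer column per distinct feldtext value, defaulting to 0.
        Dim fieldNames = source.AsEnumerable().
            Select(Function(r) r.Field(Of String)("feldtext")).
            Distinct().
            ToList()
        For Each fieldName In fieldNames
            result.Columns.Add(fieldName, GetType(Integer)).DefaultValue = 0
        Next
        result.Columns.Add("FehlerJeMA", GetType(Integer)).DefaultValue = 0

        ' Single pass: look up (or create) the output row, then bump two counters.
        Dim rowsByName As New Dictionary(Of String, DataRow)()
        For Each r As DataRow In source.Rows
            Dim name = r.Field(Of String)("MAName").ToLower()
            Dim target As DataRow = Nothing
            If Not rowsByName.TryGetValue(name, target) Then
                target = result.NewRow()
                target("MAName") = name
                result.Rows.Add(target)
                rowsByName(name) = target
            End If
            Dim fieldName = r.Field(Of String)("feldtext")
            target(fieldName) = CInt(target(fieldName)) + 1
            target("FehlerJeMA") = CInt(target("FehlerJeMA")) + 1
        Next
        Return result
    End Function

    Sub Main()
        Dim src As New DataTable()
        src.Columns.Add("MAName", GetType(String))
        src.Columns.Add("feldtext", GetType(String))
        src.Rows.Add("karl", "fieldtext1")
        src.Rows.Add("karl", "fieldtext2")
        src.Rows.Add("karl", "fieldtext1")
        src.Rows.Add("susi", "fieldtext1")

        For Each row As DataRow In Pivot(src).Rows
            Console.WriteLine("{0}: {1} {2} total {3}",
                              row("MAName"), row("fieldtext1"), row("fieldtext2"), row("FehlerJeMA"))
        Next
        ' prints:
        ' karl: 2 1 total 3
        ' susi: 1 0 total 1
    End Sub
End Module
```

Unlike the original loop, this writes 0 rather than an empty string for missing combinations, which matches the desired output table in the question.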

How to remove only one row among duplicate rows in a datatable

I have a DataTable, dtPackageTest, with the following rows in it:
testid testname
------ -----------
1 abc
2 xyz
1 abc
2 xyz
I followed this answer, but it removes all the duplicate rows, and my expected output is:
testid testname
------ -----------
1 abc
2 xyz
My code:
Dim tblDups = From r In dtPackageTest _
Group By Dups = New With {Key .testid = CInt(r("testid")), Key .test = CStr(r("test"))} Into Group _
Where (Group.Count > 1) _
Select Dups
Dim dupRowList = (From r In dtPackageTest _
Join dupRow In tblDups _
On dupRow.testid Equals CInt(r("testid")) _
And dupRow.test Equals CStr(r("test")) _
Select r).ToList()
For Each dup In dupRowList
dtPackageTest.Rows.Remove(dup)
Next
Make the following changes to your existing code and it will work as you expected
(I guess this is old-school logic, but it works):
'Add order by - Order By Dups.testid
Dim tblDups = From r In dtPackageTest _
Group By Dups = New With {Key .testid = CInt(r("testid")), Key .test = CStr(r("test"))} Into Group _
Where (Group.Count > 1) Order By Dups.testid _
Select Dups
'Add order by - Order By r("testid")
Dim dupRowList = (From r In dtPackageTest _
Join dupRow In tblDups _
On dupRow.testid Equals CInt(r("testid")) _
And dupRow.test Equals CStr(r("test")) Order By r("testid") _
Select r).ToList()
Dim id As Integer = 0
For Each dup In dupRowList
'Checking for testid is already removed or not
If id <> dup("testid") Then
id = dup("testid")
dtPackageTest.Rows.Remove(dup)
End If
Next
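The loop above removes one row per testid, so a group with three or more copies would still keep duplicates. An alternative sketch that keeps exactly one copy however many there are: remember each (testid, testname) pair in a HashSet and remove any row whose pair was already seen. The module and function names are made up for illustration, and the columns are assumed to be Integer and String.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module KeepFirstOccurrence
    ' Removes every row whose (testid, testname) pair was already seen,
    ' so the first copy of each pair remains. Returns the number of rows removed.
    Function RemoveRepeats(table As DataTable) As Integer
        Dim seen As New HashSet(Of Tuple(Of Integer, String))()
        Dim repeats As New List(Of DataRow)()
        For Each r As DataRow In table.Rows
            Dim key = Tuple.Create(r.Field(Of Integer)("testid"), r.Field(Of String)("testname"))
            ' HashSet.Add returns False when the key is already present.
            If Not seen.Add(key) Then repeats.Add(r)
        Next
        ' Remove after the loop: changing Rows while enumerating it throws.
        For Each r In repeats
            table.Rows.Remove(r)
        Next
        Return repeats.Count
    End Function

    ' Sample data from the question.
    Function BuildSample() As DataTable
        Dim tbl As New DataTable()
        tbl.Columns.Add("testid", GetType(Integer))
        tbl.Columns.Add("testname", GetType(String))
        tbl.Rows.Add(1, "abc")
        tbl.Rows.Add(2, "xyz")
        tbl.Rows.Add(1, "abc")
        tbl.Rows.Add(2, "xyz")
        Return tbl
    End Function

    Sub Main()
        Dim tbl = BuildSample()
        Console.WriteLine(RemoveRepeats(tbl)) ' prints 2
        Console.WriteLine(tbl.Rows.Count)     ' prints 2
    End Sub
End Module
```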