Count duplicate rows in a datatable using vb.net

I have a datatable called dtstore with 4 columns called section, department, palletnumber and batchnumber. I am trying to make a new datatable called dtmulti which has an extra column called multi that shows how many times each duplicated row occurs:
dtstore
section | department | palletnumber | batchnumber
---------------------------------------------------
pipes   | 2012       | 1234         | 21
taps    | 2011       | 5678         | 345
pipes   | 2012       | 1234         | 21
taps    | 2011       | 5678         | 345
taps    | 2011       | 5678         | 345
plugs   | 2009       | 7643         | 63
dtmulti
section | department | palletnumber | batchnumber | multi
----------------------------------------------------------
pipes   | 2012       | 1234         | 21          | 2
taps    | 2011       | 5678         | 345         | 3
I have tried lots of approaches, but my code always feels clumsy and bloated. Is there an efficient way to do this?
Here is the code I am using:
Dim dupmulti = dataTable.AsEnumerable().GroupBy(Function(i) i).Where(Function(g) g.Count() = 2).Select(Function(g) g.Key)
For Each row In dupmulti
    multirow("Section") = dup("Section")
    multirow("Department") = dup("Department")
    multirow("PalletNumber") = dup("PalletNumber")
    multirow("BatchNumber") = dup("BatchNumber")
    multirow("Multi") = 2
Next

Assumptions for the code below: the DataTable holding the original data is called dup, it may contain any number of duplicates, and duplicates can be identified by the first column alone. Note that values that appear only once also get a row, with multi = 1.
'Create the result table with the same columns as the original table
Dim multirow As DataTable = New DataTable
For Each col As DataColumn In dup.Columns
    multirow.Columns.Add(col.ColumnName, col.DataType)
Next
multirow.Columns.Add("multi", GetType(Integer))
'Loop through the rows grouped by the first column (one group per distinct value)
For Each groups In dup.AsEnumerable().GroupBy(Function(x) x(0))
    Dim newRow As DataRow = multirow.Rows.Add()
    'Copy every cell from the first row of the group
    For c As Integer = 0 To dup.Columns.Count - 1
        newRow(c) = groups.First()(c)
    Next
    'Fill the last cell with the number of rows in the group
    newRow(multirow.Columns.Count - 1) = groups.Count()
Next
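The question's expected output compares whole rows and lists only rows that occur more than once. As a language-neutral illustration of that counting logic, here is a minimal Python sketch (not VB.NET). The function name count_duplicates and the tuple representation of rows are assumptions made for the example; the sample data is the dtstore table from the question.

```python
from collections import Counter

# Sample rows taken from the dtstore table in the question.
rows = [
    ("pipes", 2012, 1234, 21),
    ("taps", 2011, 5678, 345),
    ("pipes", 2012, 1234, 21),
    ("taps", 2011, 5678, 345),
    ("taps", 2011, 5678, 345),
    ("plugs", 2009, 7643, 63),
]


def count_duplicates(rows):
    """Return each row that occurs more than once, with its count appended.

    Hypothetical helper: rows are compared on all columns, and the result
    keeps the order in which each distinct row was first seen.
    """
    counts = Counter(rows)
    return [row + (n,) for row, n in counts.items() if n > 1]


dtmulti = count_duplicates(rows)
for row in dtmulti:
    print(row)
```

The same idea carries over to the DataTable: group on all four columns instead of only the first, and keep only the groups whose count is greater than 1.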

Related

Creating an Excel Lookup Table Sheet from a Comma Delimited and ID column

We exported a table from a customer who was using AirTable to keep track of their clients' information and locations, with the aim of importing it into a SQL database. Because of the way AirTable exports, the references to other tables in their "AirTable Base" are not exported as IDs but as text in a single column, basically as power labels for lack of a better explanation.
There's about 4,000 client rows in this table. Clients can have one or more locations. Excluding many of the other columns it looks like:
| Client_ID | Client_Name | ... | Locations
| 3456 | Acme Grocery | ... | "Memphis, TN","Orlando, FL","Philadelphia, PA"
| 3457 | Addition Financial | ... | "Miami, FL","Plano, TX","New York, NY"
| 3458 | Barros Pizza | ... | "Queen Creek, AZ"
We are trying to get the data ready for import into SQL, so we are attempting to find a formula/method which could take the Client_ID and then insert that into rows in a new data sheet made from the comma-delimited column. Using the above example the new data should look like the following:
| ClientInLocation_ID | Client_ID | Location |
| 10000 | 3456 | Memphis, TN |
| 10001 | 3456 | Orlando, FL |
| 10002 | 3456 | Philadelphia, PA |
| 10003 | 3457 | Miami, FL |
| 10004 | 3457 | Plano, TX |
| 10005 | 3457 | New York, NY |
| 10006 | 3458 | Queen Creek, AZ |
Doing so will allow us to then grab the unique locations, assign IDs to them and replace the Location text with a Location_ID field.
I was thinking of pivot tables, text to rows, etc., but perhaps I'm not experienced enough with them to pull this off. Also, any solution can exclude the ClientInLocation_ID auto-increment, as we could always have that autofilled once the other two fields are populated. Any help is greatly appreciated.

There are many ways to tackle this problem. You can use Power Query (PQ) to do some of the lifting if you have an appropriate version of Excel. PQ is built into recently released Excel versions and is a free add-on for Excel 2013 and 2010, but it is not available for anything older than Excel 2010. If you see a Power Query tab on the ribbon then you're good to go.
Use your data as the source for a new query and split the location column by the delimiter ",". To clarify, you are using three characters as the delimiter: the closing quote of one location, the comma separating two locations, and the opening quote of the next location. This puts one location in a cell, with subsequent locations in columns to the right.
Every cell in the first location column will have a quote in front of the text, and the cell holding the final location for that row will have a quote at the end of the text. These are easily cleared in PQ, but we're done here, so it's probably faster to click Close & Load to close the editor and use Ctrl+H in Excel to clear them.
Your data will automatically be converted into a table that is connected to your source data. That means that refreshing the table does two things: it wipes any edits you've made, and it updates the table with any changes in your source data. So either delete the query (if this is a one-and-done project) or copy the table to a new sheet (if you want to rapidly rebuild with new source data).
From there, I'd turn to VBA and use three nested For loops. The outer loop iterates over every row in your data from the bottom up (Step -1). The middle loop iterates over the columns to add new rows. The inner loop populates the rows.
This is quick and dirty, makes several assumptions, and is in no way tested because it was written on my phone:
Option Explicit
Sub TransformTable()
    Dim ws As Worksheet
    Dim myTable As ListObject
    Dim rng As Range
    Dim j As Long
    Dim k As Long
    Dim l As Long
    Set ws = ActiveSheet
    Set myTable = ws.ListObjects(1)
    Application.ScreenUpdating = False
    'Outer loop: walk the data rows from the bottom up
    For j = myTable.ListRows.Count To 2 Step -1
        'Middle loop: one pass per extra filled cell in the row (the - 3 offset is an untested assumption)
        For k = 1 To Application.WorksheetFunction.CountA(ws.Range(ws.Cells(j, 1), ws.Cells(j, myTable.ListColumns.Count))) - 3
            Set rng = ws.Cells(j, 1)
            myTable.ListRows.Add j + k
            'Inner loop: copy the leading columns into the new row
            For l = 0 To 1
                rng.Offset(k, l) = rng.Offset(0, l)
            Next l
            'Move the extra location into the new row and clear its old cell
            rng.Offset(k, 3) = rng.Offset(0, 3 + k)
            rng.Offset(0, 3 + k).Cells.Clear
        Next k
    Next j
    Application.ScreenUpdating = True
End Sub
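To make the target transformation concrete outside Excel, here is a small Python sketch of the same reshaping: one output row per (Client_ID, Location) pair. It is only an illustration under stated assumptions: the function name explode_locations and the starting ID of 10000 are hypothetical, and the Locations cell is assumed to contain the quoted, comma-separated text exactly as shown in the question.

```python
import csv

# Sample data taken from the question: (Client_ID, Locations cell text).
clients = [
    (3456, '"Memphis, TN","Orlando, FL","Philadelphia, PA"'),
    (3457, '"Miami, FL","Plano, TX","New York, NY"'),
    (3458, '"Queen Creek, AZ"'),
]


def explode_locations(clients, start_id=10000):
    """Return (ClientInLocation_ID, Client_ID, Location) rows.

    Hypothetical helper: the csv module is used so that commas inside
    the quoted location names are not treated as separators.
    """
    out = []
    next_id = start_id
    for client_id, locations in clients:
        for location in next(csv.reader([locations])):
            out.append((next_id, client_id, location))
            next_id += 1
    return out


client_in_location = explode_locations(clients)
for row in client_in_location:
    print(row)
```

Parsing the cell as a one-line CSV record avoids the stray leading and trailing quotes that a plain split on "," leaves behind.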

Adding columns from one datagridview and results into another datagridview

The main task is to use values from one datagridview and have the sum be displayed on another datagridview on the same form.
There will be two datagridviews on the same form
datagridview1
apples | oranges
2 | 3
10 | 20
1 | 1
datagridview2
total apples | total oranges
13 | 24
Is this at all possible?
If not, I was thinking of adding a new row to the original datagridview instead.

Here's code to find the total of the column at index 1 in DataGridView1:
Dim total As Double 'Double or Integer, depending on your data
For i As Integer = 0 To DataGridView1.RowCount - 1
    'Skip the empty placeholder row shown when users are allowed to add rows
    If DataGridView1.Rows(i).IsNewRow Then Continue For
    total += Convert.ToDouble(DataGridView1.Rows(i).Cells(1).Value)
Next
Now add the total to the other DataGridView:
DataGridView2.Rows.Add("Oranges", total)

Excel/VBA/Conditional Formatting: Dictionary of Dictionaries

I've got an Excel workbook that obtains data from an MS SQL database. One of the sheets is used to check the data against requirements and to highlight faults. In order to do that, I've got a requirements sheet where the requirement is in a named range; after updating the data I copy the conditional formatting of the table header to all data rows. That works pretty nicely so far. The problem comes when I have more than one set of requirements:
An (admittedly silly) example could be car racing, where requirements may exist for the driver's license and min/max horsepower. When looking at the example, please imagine there are a few thousand rows and currently 71 columns...
+-----+--------+----------------+------------+---------+
| Car | Race | RequirementSet | Horsepower | License |
+-----+--------+----------------+------------+---------+
| 1 | Monaco | 2 | 200 | A |
+-----+--------+----------------+------------+---------+
| 2 | Monaco | 2 | 400 | B |
+-----+--------+----------------+------------+---------+
| 3 | Japan | 3 | 200 | C |
+-----+--------+----------------+------------+---------+
| 4 | Japan | 3 | 300 | A |
+-----+--------+----------------+------------+---------+
| 5 | Japan | 3 | 350 | B |
+-----+--------+----------------+------------+---------+
| 6 | Mexico | 1 | 200 | A |
+-----+--------+----------------+------------+---------+
The individual data now needs to be checked against the requirements set in another sheet:
+-------------+---------------+---------------+---------+
| Requirement | MinHorsepower | MaxHorsepower | License |
+-------------+---------------+---------------+---------+
| 1 | 200 | 250 | A |
+-------------+---------------+---------------+---------+
| 2 | 250 | 500 | B |
+-------------+---------------+---------------+---------+
| 3 | 250 | 400 | A |
+-------------+---------------+---------------+---------+
In order to relate back to my present situation, I am only looking at either the Monaco, Japan or Mexico Race, and there is only 1 record in the requirements sheet, where the value in e.g. Cell B2 is always the MinHorsepower and the value in C2 is always the MaxHorsepower. So these cells are a named range that I can access in my data sheet.
Now however I would like to obtain all races at once, and refer conditional formatting formulas to the particular requirement.
Focussing on "Horsepower" in Monaco (requirement set 2), I can now find out that the min Horsepower is 250 and the max is 500 - so I will colour that column for car 1 as red and for car 2 as green.
The formula is programmatically copied from the header row (the first conditional format rule is: if ROW(D1) = 1, do nothing).
I can't decide what the best approach to the problem is. Ideally, the formula is readable, something like `AND(D2 >= MinHorsepower; D2 <= MaxHorsepower)`. I cannot imagine it being maintainable if I had to use VLOOKUP combined with INDIRECT and MATCH to find the column header in the requirements sheet for that particular requirement, especially when it comes to combining criteria as in the horsepower example with min and max above.
I am wondering if I should read the requirements table into a dictionary or something similar in VBA, and then use a function like
Public Function check(requirementId As Long, requirement$)
which then in Excel I could use like =D2 >= check(c2, "MinHorsepower")
Playing around with this a little bit it appears to be pretty slow as opposed to the previous system where I could only have one requirement. It would be fantastic if you could help me out with a fresh approach to this problem. I'll update this question as I go along; I'm not sure if I managed to illustrate the example really well but the actual data wouldn't mean anything to you.
In any case, thanks for hanging in until here!
Edit 29 October 2016
I have found a solution as basis for mine. Using the following code I can add my whole requirements table to a dictionary, and access the requirement.
Using a class clsRangeToDictionary (based on Tim Williams clsMatrix)
Option Explicit
Private m_array As Variant
Private dictRows As Object
Private dictColumns As Object
Public Sub Init(vArray As Variant)
    Dim i As Long
    Set dictRows = CreateObject("Scripting.Dictionary")
    Set dictColumns = CreateObject("Scripting.Dictionary")
    'add the row keys and positions. Skip the first row as it contains the column key
    For i = LBound(vArray, 1) + 1 To UBound(vArray, 1)
        dictRows.Add vArray(i, 1), i
    Next i
    'add the column keys and positions, skipping the first column
    For i = LBound(vArray, 2) + 1 To UBound(vArray, 2)
        dictColumns.Add vArray(1, i), i
    Next i
    ' store the array for future use
    m_array = vArray
End Sub
Public Function GetValue(rowKey, colKey) As Variant
    If dictRows.Exists(rowKey) And dictColumns.Exists(colKey) Then
        GetValue = m_array(dictRows(rowKey), dictColumns(colKey))
    Else
        Err.Raise 1000, "clsRangeToDictionary:GetValue", "The requested row key " & CStr(rowKey) & " or column Key " & CStr(colKey) & " does not exist"
    End If
End Function
' return a zero-based array of RowKeys
Public Function RowKeys() As Variant
    RowKeys = dictRows.Keys
End Function
' return a zero-based array of ColumnKeys
Public Function ColumnKeys() As Variant
    ColumnKeys = dictColumns.Keys
End Function
I can now read the whole RequirementSet table into a dictionary and write a helper to obtain the particular requirement roughly so:
myDictionaryObject.GetValue(table1's RequirementSet, "MinHorsePower")
If someone could help me figure out how to put this into an answer giving the credit due to Tim Williams that'd be great.
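The lookup idea from the title, a dictionary of dictionaries keyed first by requirement set and then by column name, can be sketched in a few lines of Python. This is an illustration only, not the VBA class above: the names build_lookup and horsepower_ok are hypothetical, and the data is the requirements table from the question.

```python
# Requirements table from the question, header row first.
requirements_table = [
    ["Requirement", "MinHorsepower", "MaxHorsepower", "License"],
    [1, 200, 250, "A"],
    [2, 250, 500, "B"],
    [3, 250, 400, "A"],
]


def build_lookup(table):
    """Map each requirement ID to a {column name: value} dictionary."""
    header = table[0]
    return {row[0]: dict(zip(header[1:], row[1:])) for row in table[1:]}


def horsepower_ok(lookup, requirement_set, horsepower):
    """Check a horsepower value against the min/max of one requirement set."""
    req = lookup[requirement_set]
    return req["MinHorsepower"] <= horsepower <= req["MaxHorsepower"]


lookup = build_lookup(requirements_table)
print(lookup[2]["MinHorsepower"])      # requirement set 2, as used for Monaco
print(horsepower_ok(lookup, 2, 200))   # car 1
print(horsepower_ok(lookup, 2, 400))   # car 2
```

Building the lookup once and reusing it for every cell is the point of the design: each check is then two dictionary lookups rather than a fresh scan of the requirements sheet.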

FormulaArray not averaging out all the specified entries

Table 1:
G H I J K
| Lane | Bowler | Score | Score | Score | 1
|:-----------|------------:|:------------:|:------------:|:------------:|
| Lane 1 | Thomas| 100 | 100 | 100 | 2
| Lane 2 | column | 200 | 200 | 100 | 3
| Lane 3 | Mary | 300 | 300 | 100 | 4
| Lane 1 | Cool | 150 | 400 | 100 | 5
| Lane 2 | right | 160 | 500 | 100 | 6
| Lane 9 | Susan | 170 | 600 | 100 | 7
say I want to find the average for each Lane that appeared in table 2 and put them in column O:
Table 2:
N O
| Lane | Average | 1
|:-----------|------------:|
| Lane 1 | | 2
| Lane 2 | | 3
| Lane 3 | | 4
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7 )) for lane 1 (put this formula on cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7 )) for Lane 3 ("O4")
My first question is:
What if I want to find the average of ALL the lanes that appear in table 2 together? That is, the average of Lane 1, Lane 2 and Lane 3 combined (but not other lanes, such as Lane 9).
My attempt:
= Average(IF(G2:G7 = N2:N4, I2:K7)) why doesn't this work?
My second question is
I have done the "average of each individual Lane" using vba:
.
Dim i As Integer
For i = 2 To 4
Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C11))"
Next i
.
What if I did it in VBA without the .Formula method?
For Lane 1 only:
pseudo code:
Loop x from 2 to 7
    If cell N2 = cell Gx Then
        Sum = Sum + Ix + Jx + Kx
        totalEntries = totalEntries + 3
    End If
Average = Sum / totalEntries
Would this be slower than using the built-in .Formula? Is there an advantage to doing it this way instead?

The answer to the first question, why this FormulaArray
= Average(IF(G2:G7 = N2:N4, I2:K7)) doesn't work,
follows from how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array marking the records in range $G$7:$G$12 that meet the condition = $N7. Fig. 1 shows the first part of the FormulaArray as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part, G2:G7 = N2:N4, because it tries to compare two arrays of different dimensions and therefore results in #N/A (see Fig. 3).
However, even if the arrays had the same dimensions, the result would not pick up the duplicated values, as the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records meeting any of the conditions = $N7, = $N8 or = $N9 (adding the comparisons with + acts as the OR operator).
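The OR logic can be checked outside Excel with a short Python sketch. It is an illustration under stated assumptions: the function name average_for_lanes is hypothetical, and the data is Table 1 from the question rather than the ranges shown in the figures.

```python
# Table 1 from the question: (lane, [three scores]).
scores = [
    ("Lane 1", [100, 100, 100]),
    ("Lane 2", [200, 200, 100]),
    ("Lane 3", [300, 300, 100]),
    ("Lane 1", [150, 400, 100]),
    ("Lane 2", [160, 500, 100]),
    ("Lane 9", [170, 600, 100]),
]


def average_for_lanes(scores, lanes):
    """Average every score whose lane is in `lanes` (an OR over the lanes)."""
    wanted = set(lanes)
    values = [v for lane, row in scores if lane in wanted for v in row]
    return sum(values) / len(values)


combined = average_for_lanes(scores, ["Lane 1", "Lane 2", "Lane 3"])
print(combined)
```

Membership in the set plays the role of the summed comparisons in the formula: a row is included if its lane matches any of the listed lanes, so Lane 9 is left out.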
As regards the second question:
Performance goes hand in hand with maintenance and efficiency.
The sample procedure just enters a hard-coded formula that only works for this particular case. For example:
If the ranges need to be expanded, the macro has to be updated. With worksheet formulas you would still have to edit the formula, but without opening the VBA editor.
If any column before column G is deleted because it has become obsolete, the macro needs to be updated, whereas worksheet formulas adjust automatically and require no maintenance.
Regarding the macro without the .Formula method:
I find it redundant: it amounts to writing an algorithm for something an existing function already does efficiently and accurately, so such a macro adds nothing that is not already available.
I would see an advantage in writing such a procedure only where the workbook is very large and so resource-heavy that performance suffers. Even then, the benefit would not come from having the procedure write the formulas; it would have to calculate the results and enter the resulting values instead of formulas, making the workbook lighter and faster for the end user.

To get the average of every score in the table (all lanes, including Lane 9), just use
=AVERAGE(I2:K7)
As to the VBA, as it is all done on the same lines, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i

VB.NET - Combine rows in DataTable based on shared value

I'm trying to combine rows in a DataTable based on their shared ID. The table data looks something like this:
Member | ID | Assistant | Content
---------------------------------------------
16 | 1234 | jkaufman | 1/1/2015 - stuff1
16 | 1234 | jkaufman | 1/2/2015 - stuff2
16 | 4321 | mhatfield | 1/3/2015 - stuff3
16 | 4321 | mhatfield | 1/4/2015 - stuff4
16 | 4321 | mhatfield | 1/5/2015 - stuff5
16 | 5678 | psmith | 1/6/2015 - stuff6
I want to combine rows based on matching IDs. There are two steps I could use some clarification on. The first is merging the rows. The second is combining the Content columns so that the contents aren't lost. For the example above, here's what I want:
Member | ID | Assistant | Content
-------------------------------------------------------------------------------------------
16 | 1234 | jkaufman | 1/1/2015 - stuff1 \r\n 1/2/2015 - stuff2
16 | 4321 | mhatfield | 1/3/2015 - stuff3 \r\n 1/4/2015 - stuff4 \r\n 1/5/2015 - stuff5
16 | 5678 | psmith | 1/6/2015 - stuff6
My eventual goal is to copy the DataTable to an Excel spreadsheet, so I'm not sure if \r\n is the correct newline character, but that's the least of my concerns at this point.
Here's my code right now (EDIT: updated to current code):
Dim tmpRow As DataRow
dtFinal = dt.Clone()
Dim i As Integer = 0
While i < dt.Rows.Count
tmpRow = dtFinal.NewRow()
tmpRow.ItemArray = dt.Rows(i).ItemArray.Clone()
Dim j As Integer = i + 1
While j <= dt.Rows.Count
If j = dt.Rows.Count Then 'if we've iterated off the end of the dataset
i = j
Exit While
End If
If dt.Rows(i).Item("ID") = dt.Rows(j).Item("ID") Then 'if we've found another entry for this id
'append change to tmpRow
tmpRow.Item("Content") = tmpRow.Item("Content").ToString & Environment.NewLine & dt.Rows(j).Item("Content").ToString
Else 'if we've run out of entries to combine
i = j
Exit While
End If
j += 1
End While
'add our combined row to the final result
dtFinal.ImportRow(tmpRow)
End While
When I export the final table to Excel, the spreadsheet is blank so I'm definitely doing something wrong.
Any help would be fantastic. Thanks!

I see various problems with your approach (in both versions, although the second one seems better). That is why I preferred to write complete working code to convey my ideas clearly.
'Create the result table with the same columns as the source table
Dim dtFinal As DataTable = New DataTable
For Each col As DataColumn In dt.Columns
    dtFinal.Columns.Add(col.ColumnName, col.DataType)
Next
Dim oldRow As Integer = -1 'index of the current row in dt
Dim row As Integer = -1 'index of the current row in dtFinal
While oldRow < dt.Rows.Count - 1
    dtFinal.Rows.Add()
    row = row + 1
    oldRow = oldRow + 1
    Dim curID As String = dt.Rows(oldRow)(1).ToString()
    Dim lastCol As String = ""
    'Collect the Content of every consecutive row that shares the current ID
    While (oldRow < dt.Rows.Count AndAlso dt.Rows(oldRow)(1).ToString() = curID)
        lastCol = lastCol & dt.Rows(oldRow)(3).ToString() & Environment.NewLine
        oldRow = oldRow + 1
    End While
    'Step back to the last row of the group
    oldRow = oldRow - 1
    For i As Integer = 0 To 2
        dtFinal.Rows(row)(i) = dt.Rows(oldRow)(i)
    Next
    dtFinal.Rows(row)(3) = lastCol
End While
Note that aiming for the most "elegant" solution, or for maximum use of built-in functionality, is not always the best way to approach a problem. Here I think it is better to go step by step, and to reduce code size or improve elegance only once a properly working version is in place. That is the kind of code I have tried to write: simple, and delivering what is expected. I think this is the exact functionality you want; in any case, treat it as a simplistic example meant to illustrate the point.
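For readers who want to check the expected result independently of the DataTable API, the merge can be sketched in Python. This is an illustration only: the function name combine_rows and its newline parameter are hypothetical, and the data is the sample table from the question. Unlike the loop above, it does not require rows with the same ID to be adjacent.

```python
# Sample rows from the question: (Member, ID, Assistant, Content).
rows = [
    (16, 1234, "jkaufman", "1/1/2015 - stuff1"),
    (16, 1234, "jkaufman", "1/2/2015 - stuff2"),
    (16, 4321, "mhatfield", "1/3/2015 - stuff3"),
    (16, 4321, "mhatfield", "1/4/2015 - stuff4"),
    (16, 4321, "mhatfield", "1/5/2015 - stuff5"),
    (16, 5678, "psmith", "1/6/2015 - stuff6"),
]


def combine_rows(rows, newline="\r\n"):
    """Merge rows that share (Member, ID, Assistant), joining their Content."""
    merged = {}
    for member, id_, assistant, content in rows:
        merged.setdefault((member, id_, assistant), []).append(content)
    # Dictionaries keep insertion order, so groups appear as first seen.
    return [key + (newline.join(parts),) for key, parts in merged.items()]


combined = combine_rows(rows)
for row in combined:
    print(row)
```

Joining the collected parts at the end, rather than appending a newline after each one, avoids the trailing line break that the string concatenation above leaves in the last column.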

I find the VB syntax clunky compared to how it would be in C#, but you may prefer this LINQ solution with grouping:
Dim merge = (From rw In dt.Rows.OfType(Of DataRow)()
             Group rw By
                 New With {.fld1 = rw(0)}.fld1,
                 New With {.fld2 = rw(1)}.fld2,
                 New With {.fld3 = rw(2)}.fld3 Into Group).
             Select(Function(x)
                        'Join only the Content column (index 3) of each row in the group
                        Return New With {.Member = x.fld1,
                                         .ID = x.fld2,
                                         .Assistant = x.fld3,
                                         .Content = String.Join(Environment.NewLine, x.Group.Select(Function(y) y(3).ToString()))}
                    End Function)
