Comparing All Columns That Share A Common Identifier in MS Access - sql

Good day all.
I'm working on a query that compares records and am having trouble finding an easy way to do it. Here's the rundown:
Two people I work with have been entering data, so each participant in our study has two records in our database. I have to compare these records and return all discrepancies so they can be examined and fixed on further review. I cannot think of an easy way to do this: there are a hefty number of fields, and I'd rather avoid comparing each column manually.
Is there a method to compare from the following:
I'd want to compare all columns where Participant ID is the same. The end goal is to make one column of all discrepancies per participant.
Any ideas? My first attempt pulled all records where Count(Participant ID)>1 and joined this to all the tables we're examining. As such, each participant had 2 rows, one for each extraction that was entered into the database. I then put these records side by side in one row, so there was aType_1, bType_1, aType_2, bType_2, etc. I then began writing comparisons for each individual column pair, e.g. aType_1<>bType_1, and returning the ID. This would require a lot of coding and a massive set of union queries if I want to condense the results into a list of discrepancies. I cannot think of a simple way to do this... Thank you in advance for any ideas! :)

Hope I understand what you want.
SELECT a.Type_1, b.Type_1, a.Type_2, b.Type_2 etc.
FROM table a
INNER JOIN table b ON b.ParticipantID = a.ParticipantID
WHERE a.Type_1<>b.Type_1 OR a.Type_2<>b.Type_2 etc.

This is how far I managed to get, which is not much further than you did, I guess. But here goes nothing. (Oh, and this is in MySQL >_<)
Assuming your participant-data table has some kind of unique-valued column (a primary key?), you could do something like the following.
If this is your table structure:
mysql> describe foo;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| pid   | int(11) | YES  |     | NULL    |       |
| type1 | int(11) | YES  |     | NULL    |       |
| type2 | int(11) | YES  |     | NULL    |       |
| type3 | int(11) | YES  |     | NULL    |       |
| pk    | int(11) | YES  | UNI | NULL    |       |
+-------+---------+------+-----+---------+-------+
5 rows in set (0.00 sec)
And you have data as follows:
mysql> select * from foo;
+------+-------+-------+-------+------+
| pid  | type1 | type2 | type3 | pk   |
+------+-------+-------+-------+------+
|    1 |    14 |    24 |    34 |    1 |
|    1 |    15 |    24 |    34 |    2 |
|    2 |    15 |    24 |    34 |    3 |
|    2 |    15 |    25 |    34 |    4 |
|    3 |    15 |    25 |    34 |    5 |
+------+-------+-------+-------+------+
5 rows in set (0.00 sec)
The following query returns a comparison of rows with matching pids:
mysql> select a.pid, a.type1 <> b.type1 as t1, a.type2 <> b.type2 as t2, a.type3 <> b.type3 as t3 from foo a join foo b on a.pid = b.pid where a.pk != b.pk group by a.pid having count(a.pid) > 1;
+------+------+------+------+
| pid  | t1   | t2   | t3   |
+------+------+------+------+
|    1 |    1 |    0 |    0 |
|    2 |    0 |    1 |    0 |
+------+------+------+------+
2 rows in set (0.00 sec)
Same query, including the original column values for reference:
mysql> select *, a.pid, a.type1 <> b.type1 as t1, a.type2 <> b.type2 as t2, a.type3 <> b.type3 as t3 from foo a join foo b on a.pid = b.pid where a.pk != b.pk group by a.pid having count(a.pid) > 1;
+------+-------+-------+-------+------+------+-------+-------+-------+------+------+------+------+------+
| pid  | type1 | type2 | type3 | pk   | pid  | type1 | type2 | type3 | pk   | pid  | t1   | t2   | t3   |
+------+-------+-------+-------+------+------+-------+-------+-------+------+------+------+------+------+
|    1 |    15 |    24 |    34 |    2 |    1 |    14 |    24 |    34 |    1 |    1 |    1 |    0 |    0 |
|    2 |    15 |    25 |    34 |    4 |    2 |    15 |    24 |    34 |    3 |    2 |    0 |    1 |    0 |
+------+-------+-------+-------+------+------+-------+-------+-------+------+------+------+------+------+
2 rows in set (0.00 sec)
If you want to avoid writing down the name of each column (because you have hundreds), consider reading the column names from the INFORMATION_SCHEMA.COLUMNS table, perhaps via a stored procedure. Again, this is MySQL; MS Access has no INFORMATION_SCHEMA, but the column names are available there through the Fields collection of a table or recordset in VBA.
Hope this helps at least a bit. :P
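For what it's worth, the "don't type every column name" idea can be sketched roughly as below. This is Python with the standard-library sqlite3 module, not Access or MySQL, so treat it as an illustration only. PRAGMA table_info stands in for INFORMATION_SCHEMA, the function name find_discrepancies is made up for this example, and the table is the foo sample from above. Table and column names are interpolated into the SQL text, so this assumes they come from a trusted source.

```python
import sqlite3

def find_discrepancies(conn, table, id_col, key_col):
    """Return (id, column, first_value, second_value) for every differing column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    compare_cols = [c for c in columns if c not in (id_col, key_col)]
    results = []
    for col in compare_cols:
        # "IS NOT" is SQLite's NULL-safe inequality. The a.key < b.key condition
        # pairs each participant's two rows once and never pairs a row with itself.
        sql = (
            f"SELECT a.{id_col}, a.{col}, b.{col} "
            f"FROM {table} a JOIN {table} b "
            f"ON a.{id_col} = b.{id_col} AND a.{key_col} < b.{key_col} "
            f"WHERE a.{col} IS NOT b.{col} "
            f"ORDER BY a.{id_col}"
        )
        for pid, first, second in conn.execute(sql):
            results.append((pid, col, first, second))
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pid INT, type1 INT, type2 INT, type3 INT, pk INT UNIQUE)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    [(1, 14, 24, 34, 1), (1, 15, 24, 34, 2), (2, 15, 24, 34, 3),
     (2, 15, 25, 34, 4), (3, 15, 25, 34, 5)],
)
discrepancies = find_discrepancies(conn, "foo", "pid", "pk")
for row in discrepancies:
    print(row)
```

With the sample data this yields one row per discrepancy, (1, 'type1', 14, 15) and (2, 'type2', 24, 25), which is close to the "one list of discrepancies per participant" the question asks for.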

Are you familiar with VBA? When I had to implement a solution similar to yours, I iterated through the Fields collection of one entry's recordset and compared each value with the value of the field in the same ordinal position of the second entry's recordset.
Dim db As Database
Dim rs1 As Recordset
Dim rs2 As Recordset
Dim i As Integer
Dim bSame As Boolean
bSame = True ' init
Set db = CurrentDb
Set rs1 = db.OpenRecordset("qryEntry1")
Set rs2 = db.OpenRecordset("qryEntry2")
' Fields is zero-based, so the last valid index is Count - 1
For i = 0 To rs1.Fields.Count - 1
    bSame = bSame And (rs1(i) = rs2(i))
Next
Here's the actual code I used:
Public Function verify(sPatientID As String) As Boolean
    ' returns true if all fields for entries 1 and 2 match'
    Dim db As Database
    Dim qdef As QueryDef
    Dim rsDemo1 As Recordset
    Dim rsDemo2 As Recordset
    Dim rsDemoDiffs As Recordset
    Dim rs As Recordset
    Dim iDifferCountTestValues As Integer
    Dim iDifferCountDemo As Integer
    Dim iField As Integer
    Dim bDiffers As Boolean
    Dim bDemoEntered As Boolean
    Dim bValuesEntered As Boolean
    Dim bActive As Boolean
    Dim bWithdrawn As Boolean
    Dim bGroupSet As Boolean
    Set db = CurrentDb
    Set rs = db.OpenRecordset("select * from tblPatient where patientID = '" & sPatientID & "'")
    bActive = rs("isActive")
    bWithdrawn = rs("isWithdrawn")
    bGroupSet = (rs("groupID") > 0)
    Set qdef = db.QueryDefs("qryClearEntriesDifferPerPatient")
    qdef.Parameters("patientIDCrit") = sPatientID
    qdef.Execute
    Set qdef = db.QueryDefs("qrySetEntry1DiffersForPatient")
    qdef.Parameters("patientIDCrit") = sPatientID
    qdef.Execute
    Set qdef = db.QueryDefs("qrySetEntry2DiffersForPatient")
    qdef.Parameters("patientIDCrit") = sPatientID
    qdef.Execute
    Set qdef = db.QueryDefs("qryTestValueEntriesDifferForPatient")
    qdef.Parameters("patientIDCrit") = sPatientID
    Set rs = qdef.OpenRecordset
    bValuesEntered = Not rs.EOF
    If rs.EOF Then
        iDifferCountTestValues = 0
    Else
        iDifferCountTestValues = Abs(rs(1)) ' sum of true values is negative'
    End If
    db.Execute "Delete from tblDemographicEntriesDiffer where patientID = '" & sPatientID & "'"
    Set rsDemo1 = db.OpenRecordset("select * from tblPatientDemographics where patientID = '" & sPatientID & "' and entryID = 1")
    Set rsDemo2 = db.OpenRecordset("select * from tblPatientDemographics where patientID = '" & sPatientID & "' and entryID = 2")
    bDemoEntered = (Not rsDemo1.EOF) And (Not rsDemo2.EOF)
    Set rsDemoDiffs = db.OpenRecordset("Select * from tblDemographicEntriesDiffer")
    For iField = 2 To rsDemo1.Fields.Count - 1 ' skip comparison of entryID'
        If rsDemo1.Fields(iField).Value <> rsDemo2.Fields(iField).Value Then
            bDiffers = True
        ElseIf IsNull(rsDemo1.Fields(iField)) And Not IsNull(rsDemo2.Fields(iField)) Then
            bDiffers = True
        ElseIf Not IsNull(rsDemo1.Fields(iField)) And IsNull(rsDemo2.Fields(iField)) Then
            bDiffers = True
        Else
            bDiffers = False
        End If
        If bDiffers Then
            rsDemoDiffs.AddNew
            rsDemoDiffs("patientID") = sPatientID
            rsDemoDiffs("fieldName") = rsDemo1.Fields(iField).Name
            rsDemoDiffs("entry1") = rsDemo1.Fields(iField).Value
            rsDemoDiffs("entry2") = rsDemo2.Fields(iField).Value
            rsDemoDiffs.Update
        End If
    Next
    Set qdef = db.QueryDefs("qryDemoDiffersCountForPatient")
    qdef.Parameters("patientIDCrit") = sPatientID
    Set rs = qdef.OpenRecordset
    If rs.BOF And rs.EOF Then
        iDifferCountDemo = 0
    Else
        iDifferCountDemo = rs(1)
    End If
    verify = (iDifferCountTestValues + iDifferCountDemo = 0) And bDemoEntered And bActive
    If bWithdrawn Then
        verify = verify And bGroupSet
    Else
        verify = verify And bValuesEntered
    End If
    db.Execute "Update tblPatient set isVerified = " & verify & " where patientID = '" & sPatientID & "'"
    rsDemo1.Close
    rsDemo2.Close
    rsDemoDiffs.Close
    db.Close
End Function
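Stripped of the database plumbing, the core of the function above is a field-by-field comparison that records each mismatch. Here is a minimal sketch of just that step in Python, assuming each entry has already been loaded into a dict keyed by field name. The function name diff_records and the sample values are invented for illustration; None plays the role of a database Null.

```python
def diff_records(entry1, entry2, skip=()):
    """Return a list of (field, value1, value2) for fields that differ."""
    diffs = []
    for field, value1 in entry1.items():
        if field in skip:
            continue
        # A field missing from entry2 is treated like a Null (None).
        value2 = entry2.get(field)
        # Two Nones compare equal; None against a real value is a discrepancy.
        if value1 != value2:
            diffs.append((field, value1, value2))
    return diffs

# Invented sample data: two entries for the same patient
entry1 = {"patientID": "P01", "entryID": 1, "age": 42, "sex": "F", "weight": None}
entry2 = {"patientID": "P01", "entryID": 2, "age": 24, "sex": "F", "weight": 70}

diffs = diff_records(entry1, entry2, skip=("patientID", "entryID"))
for field, v1, v2 in diffs:
    print(f"{field}: entry1={v1!r} entry2={v2!r}")
```

Skipping patientID and entryID mirrors the loop above starting at field index 2. Unlike the VBA `<>` operator, Python's `!=` does not propagate Null, so no separate IsNull branches are needed here.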

Related

How can I put SQL result into variable in VBA

I have an example table like the one below:
| ID | Qty |
| -- | ----|
| 1 | 5 |
| 2 | 7 |
| 3 | 8 |
| 4 | 9 |
| 5 | 12 |
How can I pass the result of the query below into a VBA variable to use in subsequent queries (update and insert)?
SELECT example_tab.Qty
FROM example_tab
WHERE ID = 4
In this example, test_variable = 9.
To get a single value into a variable, DLookup can be used:
Dim test_variable As Variant
test_variable = DLookup("[Qty]", "example_tab", "[ID] = 4")
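Is this what you're looking for?
sSQL = "SELECT example_tab.Qty FROM example_tab WHERE ID = 4"
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset(sSQL)
' Read Qty from the first row, if the query returned one
If Not rs.EOF Then test_variable = rs!Qty
rs.Close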
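The same pattern (run the query, read the first column of the first row, reuse the value in a later statement) looks roughly like this outside of Access. This is a sketch in Python using the standard-library sqlite3 module with the example table from the question; the follow-up UPDATE is invented purely to show the variable being reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_tab (ID INTEGER PRIMARY KEY, Qty INTEGER)")
conn.executemany(
    "INSERT INTO example_tab VALUES (?, ?)",
    [(1, 5), (2, 7), (3, 8), (4, 9), (5, 12)],
)

# fetchone() returns a tuple for the first row, or None if nothing matched
row = conn.execute("SELECT Qty FROM example_tab WHERE ID = ?", (4,)).fetchone()
test_variable = row[0] if row is not None else None

# Reuse the value in a follow-up statement (hypothetical update)
conn.execute("UPDATE example_tab SET Qty = Qty + ? WHERE ID = ?", (test_variable, 5))
new_qty = conn.execute("SELECT Qty FROM example_tab WHERE ID = 5").fetchone()[0]
print(test_variable, new_qty)
```

Checking for a missing row before indexing corresponds to the EOF check on the recordset.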

Validate multiple cell data using VBA

I'm stuck in a bit of a pickle; could someone please help?
I am trying to check whether each property has a particular set of property IDs.
E.g. verify that each property has the property IDs "ABC, XYZ, LMN, IJK". Also verify that each date is > 10-12-2018.
| Property | Property_ID | Date |
|----------|-------------|------------|
| A | ABC | 10/12/2018 |
| A | XYZ | 08/11/2018 |
| A | LMN | 12/05/2018 |
| A | IJK | 15/05/2018 |
| B | ABC | 13/12/2018 |
| B | XYZ | 14/10/2018 |
| B | IJK | 15/12/2018 |
| C | LMN | 01/12/2018 |
| C | XYZ | 17/05/2018 |
Expected Result
| Property | Property_ID | Date       | Result                             |
|----------|-------------|------------|------------------------------------|
| A        | ABC         | 10/12/2018 |                                    |
| A        | XYZ         | 08/11/2018 |                                    |
| A        | LMN         | 12/05/2018 |                                    |
| A        | IJK         | 15/05/2018 | All PID's are found                |
| B        | ABC         | 13/12/2018 |                                    |
| B        | XYZ         | 14/10/2018 |                                    |
| B        | IJK         | 15/12/2018 | LMN is missing for Property B      |
| C        | LMN         | 01/12/2018 |                                    |
| C        | XYZ         | 17/05/2018 | ABC, IJK is missing for property C |
My logic:
'CREATING VARIABLE TO ACCESS SHEET RANGE
sheetName1 = "test" 'sheetName SHOULD BE EQUAL TO WORKSHEET NAME (REPLACE THE NAME ACCORDINGLY)
Set sht1 = Sheets(sheetName1)
'FINDING TOTAL NUMBER OF ROWS PRESENT IN THE ACTIVE WORKSHEET
totalRowCount = ActiveWorkbook.Worksheets(sheetName1).Range("A1", Worksheets(sheetName1).Range("A1").End(xlDown)).Rows.Count
previous_Value = sht1.Range("A2")
current_Value = Null
'Creating Flags to verify value
ABC = False
XYZ = False
IJK = False
LMN = False
OPQ = False
Date_Validation = Null
For i = 2 To totalRowCount
    current_Value = Trim(sht1.Range("A" & i))
    If current_Value = previous_Value Then
        promotion_ID = Trim(sht1.Range("B" & i))
        'Validate date
        Date = "10-12-2018"
        If promotion_ID = "ABC" Then
            ABC = True
        ElseIf promotion_ID = "IJK" Then
            IJK = True
        ElseIf promotion_ID = "XYZ" Then
            XYZ = True
        End If
        'FULL SERVICE
        If promotion_ID = "LMN" Then
            LMN = True
        ElseIf promotion_ID = "OPQ" Then
            OPQ = True
        ElseIf promotion_ID = "QWE" Then
            QWE = True
        End If
    Else
        sht1.Range("D" & i) = "Here i need to display msg of flag which is not found"
        previous_Value = sht1.Range("A" & i)
    End If
Next i
I'm seeing a couple of problems in the code:
You're setting your "valid date" value inside your loop:
a. It will never change.
b. It's not being compared to anything to trigger an exception.
c. The current "Date" should be initialized from the first record, outside the loop, and Date_validation should be set based on this first value, such as:
Date_validation = Date = Trim(sht1.Range("c2"))
Therefore Date_validation is only set to true when the date is valid; any false value is an exception.
d. A side note: a variable named "Date" is liable to run into keyword issues. May I suggest "PropDate" or "PromoDate"?
e. Each new line read (i incremented) should test not only is `date_validation
If your If current_Value = previous_Value Then is false (i.e. a new property), you haven't checked whether all of the conditions have been met, only that you're now sitting on a new property. The corresponding Else should likely be an ElseIf such as:
ElseIf Not (ABC And IJK And XYZ And ... And Date_validation) Then 'true if any flag is False
'.... display error message
' also, all of the ABC (etc.) ...
' triggers need to be reset to False
' along with Date_validation
At the point of the message in the code, i has already been incremented and is sitting on a new record, so the message should probably be written to (something) i - 1.
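One way to sidestep the flag bookkeeping altogether is to collect the IDs seen for each property first and only then compare them with the required list. A rough sketch of that idea in Python (not VBA), using the sample rows from the question. The names missing_ids_by_property and result_message are made up for this example, and the date check is left out to keep it short.

```python
REQUIRED_IDS = ["ABC", "XYZ", "LMN", "IJK"]

# (Property, Property_ID) pairs from the question's sample sheet
rows = [
    ("A", "ABC"), ("A", "XYZ"), ("A", "LMN"), ("A", "IJK"),
    ("B", "ABC"), ("B", "XYZ"), ("B", "IJK"),
    ("C", "LMN"), ("C", "XYZ"),
]

def missing_ids_by_property(rows, required):
    """Map each property to the required IDs it lacks, in first-seen order."""
    seen = {}
    for prop, prop_id in rows:
        seen.setdefault(prop, set()).add(prop_id.strip())
    return {
        prop: [r for r in required if r not in ids]
        for prop, ids in seen.items()
    }

def result_message(prop, missing):
    """Build the text for the Result column of a property's last row."""
    if not missing:
        return "All PID's are found"
    return f"{', '.join(missing)} is missing for property {prop}"

missing = missing_ids_by_property(rows, REQUIRED_IDS)
messages = [result_message(prop, ids) for prop, ids in missing.items()]
for message in messages:
    print(message)
```

Because the check runs after all rows of a property have been read, there is nothing to reset between properties and no off-by-one on the row that receives the message.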

Sum Values in dataset By Grouping and Loading result to its header row for Trial Balance Sheet

I have created the DataSet by filling in the values of the asset child accounts, bank balances, and stock values, gathered from different tables.
Now the problem: I want to sum the values of all rows that have the same parent account number and write the result to the parent account's row in the same DataSet table.
There are different types of rows. The following possibilities can occur:
A row can be a parent group account. Each group can have different transaction accounts and also group accounts as children. A group account does not have transactions; its value is only the sum of its child accounts.
Table is as follows.
AccountNo | AccountDepth | ParentAccountNo | OpeningDebit |
1         | 0            |                 |              | <Main Header
11        | 1            | 1               |              | <Header Child
111       | 2            | 11              | 10000        | <Trx
112       | 2            | 11              |              | <Header Child
1121      | 3            | 112             | 5000         | <Trx
1122      | 3            | 112             | 15000        | <Trx
113       | 2            | 11              |              | <Header Child
1131      | 3            | 113             |              | <Header Child
11311     | 4            | 1131            | 20000        | <Trx
1132      | 3            | 113             | 35000        | <Trx
12        | 1            | 1               |              | <Header Child
121       | 2            | 12              |              | <Header Child
1211      | 3            | 121             | 10000        | <Trx
I want to sum each transaction account into its header, then sum all headers into their own header (including the sums of any child headers), and so on up to the top-level parent.
Remember that all the values are in the DataSet table.
I never got an answer to any of my questions here. Anyhow, I did it myself.
Let me share it in case anyone else needs this too.
Dim currentValue As Integer = 0, Depth As Integer = 0
For c As Integer = 0 To ds.Tables(0).Rows.Count - 1
    currentValue = ds.Tables(0).Rows(c).Item("AccountDepth")
    If currentValue > Depth Then Depth = currentValue
Next
Dim depthchang As Integer = Depth
For i = 0 To Depth
    Dim dt As DataTable = ds.Tables(0).Clone
    Dim rows() As DataRow = ds.Tables(0).Select("AccountDepth = " & depthchang & "")
    Dim row1 As DataRow
    For Each row1 In rows
        dt.ImportRow(row1)
    Next
    For j = 0 To ds.Tables(0).Rows.Count - 1
        For k = 0 To dt.Rows.Count - 1
            If ds.Tables(0).Rows(j).Item("AccountNo") = dt.Rows(k).Item("ParentAccountNo") Then
                ds.Tables(0).Rows(j).Item("OpeningDebit") = ds.Tables(0).Rows(j).Item("OpeningDebit") + dt.Rows(k).Item("OpeningDebit")
                ds.Tables(0).Rows(j).Item("OpeningCredit") = ds.Tables(0).Rows(j).Item("OpeningCredit") + dt.Rows(k).Item("OpeningCredit")
            End If
        Next
    Next
    dt.Clear()
    depthchang = depthchang - 1
Next
This code gives me my required output.
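The roll-up idea (process the deepest accounts first so that every child total is final before it is added to its parent) can also be sketched with a single sort and a dictionary lookup instead of nested loops. The following is a Python illustration only, using the OpeningDebit figures from the table in the question, with blank cells taken as 0. The function name roll_up is made up for the example.

```python
# (AccountNo, AccountDepth, ParentAccountNo, OpeningDebit); blanks taken as 0 / None
accounts = [
    ("1", 0, None, 0),
    ("11", 1, "1", 0),
    ("111", 2, "11", 10000),
    ("112", 2, "11", 0),
    ("1121", 3, "112", 5000),
    ("1122", 3, "112", 15000),
    ("113", 2, "11", 0),
    ("1131", 3, "113", 0),
    ("11311", 4, "1131", 20000),
    ("1132", 3, "113", 35000),
    ("12", 1, "1", 0),
    ("121", 2, "12", 0),
    ("1211", 3, "121", 10000),
]

def roll_up(accounts):
    """Return {AccountNo: total}, where a header's total includes all descendants."""
    totals = {no: debit for no, _depth, _parent, debit in accounts}
    # Deepest accounts first, so a child's total is complete before it is
    # added to its parent.
    for no, _depth, parent, _debit in sorted(accounts, key=lambda a: a[1], reverse=True):
        if parent is not None:
            totals[parent] += totals[no]
    return totals

totals = roll_up(accounts)
for no, _depth, _parent, _debit in accounts:
    print(no, totals[no])
```

For the sample data, account 112 comes to 20000, 113 to 55000, 11 to 85000, and the main header 1 to 95000.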

Merge 2 rows from 2 separate datagridviewrows into a new one vb.net

I'm trying to merge two rows and write it to a new row on a third table:
Example:
TableA: (data is fixed)
customer | name | last name
1        | bob  | jansens
2        | jan  | peeters
...      | ...  | ...
TableB: (data is fixed)
age | length | weight
23  | 178    | 76
75  | 165    | 86
... | ...    | ...
Now, those two tables need to be merged like so:
TableC:
customer | name | last name | age | length | weight
1        | bob  | jansens   | 23  | 178    | 76
2        | jan  | peeters   | 75  | 165    | 86
...      | ...  | ...       | ... | ...    | ...
My code so far, which is not working:
Public Sub merge_BAK(adminis As DataGridView, kluwer As DataGridView, merged As DataGridView)
    Dim adminis_header_count As Integer = adminis.Columns.Count
    Dim kluwer_header_count As Integer = kluwer.Columns.Count
    Dim diff_header_count As Integer = kluwer.Columns.Count - adminis.Columns.Count
    Dim total_header_count As Integer = adminis_header_count + kluwer_header_count
    For Each adminis_row As DataGridViewRow In adminis.Rows
        If adminis_row.IsNewRow = False Then
            Dim btw As String = adminis_row.Cells(4).Value()
            If btw IsNot String.Empty Then
                btw = btw.Remove(0, 3)
                For Each kluwer_row As DataGridViewRow In kluwer.Rows
                    Dim venn_onderneming As String = kluwer_row.Cells(44).Value()
                    If btw = venn_onderneming Then
                        merged.ColumnCount = total_header_count
                        Dim merge_row As DataGridViewRow = CType(adminis_row.Clone(), DataGridViewRow)
                        For i As Integer = 0 To adminis_row.Cells.Count - 1
                            merge_row.Cells(i).Value = adminis_row.Cells(i).Value
                        Next
                        merged.Rows.Add(merge_row) 'somewhere here the current row (kluwer_row) needs to be placed behind the current row of the previous table (adminis_row)
                    End If
                Next kluwer_row
            End If
        End If
    Next adminis_row
End Sub
Does anybody have an idea how to achieve this?
You can use SQL Server views.
To achieve that, just drag and drop both tables into your pane and join them.
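If the merge has to happen in code rather than in a database view, the matching step from the question (strip a three-character prefix from one table's key and append the matching row from the other table) could look something like this. It is a Python sketch, not VB.NET. Rows are plain lists, the key positions are parameters, and the key values in the sample data are invented, since the question does not show them.

```python
def merge_rows(adminis_rows, kluwer_rows, adminis_key, kluwer_key, prefix_len=3):
    """Append each matching kluwer row to the adminis row it belongs to."""
    # Index the second table by its key so each lookup is a dictionary hit
    kluwer_by_key = {}
    for row in kluwer_rows:
        kluwer_by_key.setdefault(row[kluwer_key], []).append(row)

    merged = []
    for row in adminis_rows:
        btw = row[adminis_key]
        if not btw:
            continue  # skip rows without a key, like the empty-string check
        key = btw[prefix_len:]  # mirrors btw.Remove(0, 3) in the question
        for match in kluwer_by_key.get(key, []):
            merged.append(list(row) + list(match))
    return merged

# Invented sample data: the last adminis column and first kluwer column are the keys
adminis = [
    [1, "bob", "jansens", "BE 0123"],
    [2, "jan", "peeters", "BE 0456"],
    [3, "an", "claes", ""],
]
kluwer = [
    ["0123", 23, 178, 76],
    ["0456", 75, 165, 86],
]

merged = merge_rows(adminis, kluwer, adminis_key=3, kluwer_key=0)
for row in merged:
    print(row)
```

Building each merged row as one list of both rows' cells corresponds to the step the question marks with "somewhere here": the second row's cells go after the first row's cells before the row is added to the third grid.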

Linq join on parameterized distinct key

I'm trying to join two tables with LINQ based on a dynamic key. The user can change the key via a combo box. The key may be money, string, double, int, etc. Currently I'm getting the data just fine, but without filtering out the duplicates. I can filter the duplicates in VB, but it's slooooow. I'd like to do it in the LINQ query right out of the gate.
Here's the data:
First Table:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
Second Table:
------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
------------------------------------------------------------
| 1 | 1 | Orange | This is an Orange |
| 2 | 3 | Orange | |
| 3 | 2 | Orange | This is an Orange |
| 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------
Currently, I'm using the following code to get too much data:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows Distinct
Outcome:
-------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | Orange | This is an Orange |
| 1 | 3 | Red | This is an duplicate | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 3 | Orange | |
| 2 | 5 | Green | This is an duplicate | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 2 | Orange | This is an Orange |
| 3 | 4 | Pink | This is an duplicate | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 3 | Orange | |
| 4 | 2 | Yellow | This is an duplicate | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 2 | Orange | This is an Orange |
| 5 | 2 | Orange | This is an duplicate | 2 | Orange | This is an Orange |
-------------------------------------------------------------------------
Desired Outcome:
------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 2 | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------------------
I have tried the following:
'Get the original Column Names into an Array List
'MasterTableColumns = GetColumns(qMasterDS, TheMasterTable) '(external code)
'Plug the Existing DataSet into a DataView:
Dim View As DataView = New DataView(qMasterTable)
'Sort by the Primary Key:
View.Sort = ThePrimaryKey
'Build a new table listing only one column:
Dim newListTable As DataTable = _
View.ToTable("UniqueData", True, ThePrimaryKey)
This returns a unique list, but no associated data:
-------------
| AppleIndex |
-------------
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
-------------
So I tried this instead:
'Build a new table with ALL the columns:
Dim newFullTable As DataTable = _
View.ToTable("UniqueData", True, _
MasterTableColumns(0), _
MasterTableColumns(1), _
MasterTableColumns(2), _
MasterTableColumns(3))
Unfortunately, it yields the following... with duplicates:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
Any ideas?
~~~~~~~~~~~~ Update: ~~~~~~~~~~~~
Jeff M suggested the following code. (Thanks, Jeff.) However, it gives me an error. Does anyone know the syntax for making this work in VB? I've monkeyed with it a bit and can't seem to get it right.
Dim matches = _
From mRows In (From row In LinqMasterTable _
Group row By row(ThePrimaryKey) Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
Error in Third row at "row(ThePrimaryKey)":
"Range variable name can be inferred only from a simple or qualified name with no arguments."
Well, the basic problem isn't the LINQ. It's the fact that your first table contains "duplicates" which aren't really duplicates, since in your example every row is distinct.
So, our question to you is "How do we identify the duplicates in the original table?". Once that is answered, the rest should be trivial.
For example (in C#, since I'm not sure of the VB syntax):
var Matches = from mRows in LinqMasterTable
                  .Where(r => r.Field<string>("AppleDescription") == "This is an apple")
              join sRows in LinqSecondTable
                  on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };
Edit:
Here's how I would write the C# LINQ query. Rather than using Distinct(), this alternate version uses a nested query with grouping, which should have similar semantics. It should be easily convertible to VB.
var matches = from mRows in (from row in LinqMasterTable
                             group row by row[ThePrimaryKey] into g
                             select g.First())
              join sRows in LinqSecondTable
                  on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };
and my attempt at a VB version of the above:
Edit:
As for the most recent error, I know exactly how to deal with it. When I was playing with VB LINQ, I found that the compiler doesn't like complex grouping expressions. To get around that, assign row(ThePrimaryKey) to a temporary variable and group by that variable. It should work then.
Dim matches = From mRows In (From row In LinqMasterTable _
Let grouping = row(ThePrimaryKey)
Group row By grouping Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
Actually, upon second inspection, it turns out that what is being grouped by needs a name, and the group itself has to be captured with Into g = Group. The following should work.
Dim matches = From mRows In (From row In LinqMasterTable _
                             Group row By Grouping = row(ThePrimaryKey) Into g = Group _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows
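To make the semantics of the nested query concrete: the inner query keeps only the first row for each key value, and the outer query joins that reduced set to the second table. Below is a plain-Python sketch of those two steps, assuming rows are dicts and using the apple and orange sample data from the question. The helper names first_per_key and join_unique are made up for the example.

```python
def first_per_key(rows, key):
    """Keep only the first row seen for each key value (like g.First())."""
    firsts = {}
    for row in rows:
        firsts.setdefault(row[key], row)  # later rows with the same key are ignored
    return list(firsts.values())

def join_unique(master, second, primary_key, foreign_key):
    """Inner-join the de-duplicated master rows to the second table, ordered by key."""
    second_by_key = {}
    for row in second:
        second_by_key.setdefault(row[foreign_key], []).append(row)
    matches = []
    for m in sorted(first_per_key(master, primary_key), key=lambda r: r[primary_key]):
        for s in second_by_key.get(m[primary_key], []):
            matches.append((m, s))
    return matches

colors = ["Red", "Green", "Pink", "Yellow", "Orange"]
costs = [3, 5, 4, 2, 2]
apples = [
    {"AppleIndex": i + 1, "AppleCost": costs[i], "AppleColor": colors[i],
     "AppleDescription": text}
    for text in ("This is an apple", "This is a duplicate")
    for i in range(5)
]
oranges = [
    {"OrangeIndex": i + 1, "OrangeCost": cost, "OrangeColor": "Orange"}
    for i, cost in enumerate([1, 3, 2, 3, 2])
]

matches = join_unique(apples, oranges, "AppleIndex", "OrangeIndex")
for m, s in matches:
    print(m["AppleIndex"], m["AppleDescription"], s["OrangeIndex"], s["OrangeCost"])
```

The key names are passed in as strings, which matches the "user picks the key from a combo box" requirement. Which row counts as "first" depends on the order of the master table, just as with g.First().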
Declarations and Such:
Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
ByRef qMasterTable As DataTable, _
ByRef qSecondDS As DataSet, _
ByRef qSecondTable As DataTable, _
ByRef qPrimaryKey As String, _
ByRef qForignKey As String, _
ByVal qResultsName As String)
Dim TheMasterTable As String = qMasterTable.TableName
Dim TheSecondTable As String = qSecondTable.TableName
Dim ThePrimaryKey As String = qPrimaryKey
Dim TheForignKey As String = qForignKey
Dim TheNewForignKey As String = ""
MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)
Dim mColumnCount As Integer = MasterTableColumns.Count
Dim sColumnCount As Integer = SecondTableColumns.Count
Dim ColumnCount As Integer = mColumnCount + sColumnCount
Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable
Get the Data and order it by the Selected Key:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
Put the Results into a Dataset Table:
' Make sure the dataset is available and/or cleared:
If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()
'Adds Master Table Column Names
For x = 0 To MasterTableColumns.Count - 1
    dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
Next
'Rename Second Table Names if Needed:
For x = 0 To SecondTableColumns.Count - 1
    With dsResults.Tables(qResultsName)
        For y = 0 To .Columns.Count - 1
            If SecondTableColumns(x) = .Columns(y).ColumnName Then
                SecondTableColumns(x) = SecondTableColumns(x) & "_2"
            End If
        Next
    End With
Next
'Make sure that the Forign Key is a Unique Value
If ForignKey1 = PrimaryKey Then
    TheNewForignKey = ForignKey1 & "_2"
Else
    TheNewForignKey = ForignKey1
End If
'Adds Second Table Column Names
For x = 0 To SecondTableColumns.Count - 1
    dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
Next
'Copy Results into the Dataset:
For Each Match In Matches
    'Build an array for each row:
    Dim NewRow(ColumnCount - 1) As Object
    'Add the mRow Items:
    For x = 0 To MasterTableColumns.Count - 1
        NewRow(x) = Match.mRows.Item(x)
    Next
    'Add the srow Items:
    For x = 0 To SecondTableColumns.Count - 1
        Dim y As Integer = x + (MasterTableColumns.Count)
        NewRow(y) = Match.sRows.Item(x)
    Next
    'Add the array to dsResults as a Row:
    dsResults.Tables(qResultsName).Rows.Add(NewRow)
Next
Give the user an option to clean doubles or not:
If chkUnique.Checked = True Then
ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
End If
Remove the Duplicates if they so desire:
Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
                             ByRef TableKey As String)
    'Make sure that there's data to work with:
    If SkipTable Is Nothing Then Exit Sub
    If TableKey Is Nothing Then Exit Sub
    'Create an ArrayList of rows to delete:
    Dim DeleteRows As New ArrayList()
    'Fill the Array with Row Number of the items equal
    'to the item above them:
    For x = 1 To SkipTable.Rows.Count - 1
        Dim RowOne As DataRow = SkipTable.Rows(x - 1)
        Dim RowTwo As DataRow = SkipTable.Rows(x)
        If RowTwo.Item(TableKey) = RowOne.Item(TableKey) Then
            DeleteRows.Add(x)
        End If
    Next
    'If there are no hits, exit this sub
    '(DeleteRows was created above, so it is never Nothing):
    If DeleteRows.Count < 1 Then
        Exit Sub
    End If
    'Otherwise, remove the rows based on the row count value:
    For x = 0 To DeleteRows.Count - 1
        'Start at the END and count backwards so the duplicate
        'item's row count value doesn't change with each deleted row
        Dim KillRow As Integer = DeleteRows((DeleteRows.Count - 1) - x)
        'Delete the row:
        SkipTable.Rows(KillRow).Delete()
    Next
End Sub
Then clean up any leftovers:
If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
End If
'Clear Arrays
MasterTableColumns.Clear()
SecondTableColumns.Clear()
Final Analysis:
Ran this against 2 files with 4 columns, 65,535 rows, and some duplicates. Process time: roughly 1 second. In fact, it took longer to load the fields into memory than it did to parse the data.