Validate multiple cell data using VBA - vba

I'm in a bit of a pickle; could someone please help?
I am trying to check whether each property has a particular set of property IDs.
E.g. verify that each property has the property IDs "ABC, XYZ, LMN, IJK". Also verify that each date is later than 10-12-2018.
| Property | Property_ID | Date |
|----------|-------------|------------|
| A | ABC | 10/12/2018 |
| A | XYZ | 08/11/2018 |
| A | LMN | 12/05/2018 |
| A | IJK | 15/05/2018 |
| B | ABC | 13/12/2018 |
| B | XYZ | 14/10/2018 |
| B | IJK | 15/12/2018 |
| C | LMN | 01/12/2018 |
| C | XYZ | 17/05/2018 |
Expected Result
| Property | Property_ID | Date | Result |
|----------|-------------|------------|------------|
| A | ABC | 10/12/2018 | |
| A | XYZ | 08/11/2018 | |
| A | LMN | 12/05/2018 | |
| A | IJK | 15/05/2018 | All PID's are found |
| B | ABC | 13/12/2018 | |
| B | XYZ | 14/10/2018 | |
| B | IJK | 15/12/2018 | LMN is missing for Property B |
| C | LMN | 01/12/2018 | |
| C | XYZ | 17/05/2018 | ABC, IJK is missing for property C |
MY Logic:
'CREATING VARIABLE TO ACCESS SHEET RANGE
sheetName1 = "test" 'sheetName SHOULD BE EQUAL TO WORKSHEET NAME (REPLACE THE NAME ACCORDINGLY)
Set sht1 = Sheets(sheetName1)
'FINDING TOTAL NUMBER OF ROWS PRESENT IN THE ACTIVE WORKSHEET
totalRowCount = ActiveWorkbook.Worksheets(sheetName1).Range("A1", Worksheets(sheetName1).Range("A1").End(xlDown)).Rows.Count
previous_Value = sht1.Range("A2")
current_Value = Null
'Creating Flags to verify value
ABC = False
XYZ = False
IJK = False
LMN = False
OPQ = False
Date_Validation = Null
For i = 2 To totalRowCount
current_Value = Trim(sht1.Range("A" & i))
If current_Value = previous_Value Then
promotion_ID = Trim(sht1.Range("B" & i))
'Validate date
Date = "10-12-2018"
If promotion_ID = "ABC" Then
ABC = True
ElseIf promotion_ID = "IJK" Then
IJK = True
ElseIf promotion_ID = "XYZ" Then
XYZ = True
End If
'FULL SERVICE
If promotion_ID = "LMN" Then
LMN = True
ElseIf promotion_ID = "OPQ" Then
OPQ = True
ElseIf promotion_ID = "QWE" Then
QWE = True
End If
Else
sht1.Range("D" & i) = "Here i need to display msg of flag which is not found"
previous_Value = sht1.Range("A" & i)
End If

I see a couple of problems in the code:
You're setting your "valid date" value inside your loop:
a. it will never change;
b. it's not being compared to anything to trigger an exception;
c. the current "Date" should be initialized from the first record, outside the loop, and Date_validation should be set based on this first value, such as:
Date_validation = Date = Trim(sht1.Range("c2"))
Therefore "Date_validation" is only True when the date is valid; any False value is an exception.
d. A side note: a variable named "Date" is liable to run into keyword issues, because Date is already a built-in VBA function. May I suggest "PropDate" or "PromoDate"?
e. Each new line read (i incremented) should test not only is `date_validation
If your If current_Value = previous_Value Then is false (i.e. a new property), you haven't checked whether all of the conditions have been met, only that you're now sitting on a new property. The corresponding Else should likely be an ElseIf such as:
ElseIf Not (ABC And IJK And XYZ And ... And Date_validation) Then 'true when at least one flag is False
'.... display error message
' also, all of the ABC (etc.) ...
' triggers need to be reset to False
' along with Date_validation
At the point of the message in the code, i has already been incremented and is sitting on the first record of the new property, so the message should probably be written to row i-1.
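The per-property check discussed above can also be sketched outside VBA. The following is a minimal Python illustration, not the asker's macro: the names REQUIRED_IDS, ROWS, missing_ids_by_property and describe are made up for this example, the rows are copied from the sample table, and the date comparison is left out because the question does not say which date format applies.

```python
REQUIRED_IDS = ["ABC", "XYZ", "LMN", "IJK"]

# (Property, Property_ID) pairs copied from the sample table in the question.
ROWS = [
    ("A", "ABC"), ("A", "XYZ"), ("A", "LMN"), ("A", "IJK"),
    ("B", "ABC"), ("B", "XYZ"), ("B", "IJK"),
    ("C", "LMN"), ("C", "XYZ"),
]


def missing_ids_by_property(rows, required=REQUIRED_IDS):
    """Return {property: [required IDs that never appear for it]}."""
    seen = {}
    for prop, prop_id in rows:
        # Collect every ID seen for a property before judging it.
        seen.setdefault(prop, set()).add(prop_id.strip())
    return {
        prop: [pid for pid in required if pid not in ids]
        for prop, ids in seen.items()
    }


def describe(prop, missing):
    """Build the message for the Result column (wording follows the question)."""
    if not missing:
        return "All PID's are found"
    return f"{', '.join(missing)} is missing for property {prop}"


result = missing_ids_by_property(ROWS)
for prop, missing in result.items():
    print(prop, "->", describe(prop, missing))
```

Collecting all IDs per property first avoids the off-by-one problem of deciding what is missing only at the moment the property changes.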

Pandas share of value with condition and adding new column

I'm new to pandas and got stuck a bit. Can you help me?
I have a dataframe storing orders:
| item | store_status | customer_status |
|------|--------------|-----------------|
| A | 'dispatched' | 'received' |
| A | 'dispatched' | 'pending' |
| B | 'pending' | 'pending' |
| B | 'dispatched' | 'received' |
| B | 'dispatched' | 'pending' |
I want to create a new dataframe that shows what portion of each item is 'dispatched' and 'received'. So the result would be:
| item | dispatched_and_received |
|------|-------------------------|
| A | 0.5 |
| B | 0.33 |
I'm also interested in the portion of each item that is 'dispatched', regardless of the customer status and want to add it as a new column to this dataframe:
| item | dispatched_and_received | dispatched |
|------|-------------------------|------------|
| A | 0.5 | 1.00 |
| B | 0.33 | 0.66 |
Thank you!
Create Boolean Series that check the conditions, then take the mean of those Series within each group.
(df.assign(dispatched=df.store_status.eq('dispatched'),
           dispatched_and_received=(df.store_status.eq('dispatched')
                                    & df.customer_status.eq('received')))
   .groupby('item')[['dispatched', 'dispatched_and_received']]
   .mean()
   .reset_index())
# item dispatched dispatched_and_received
#0 A 1.000000 0.500000
#1 B 0.666667 0.333333
The assign just creates the columns; you can split that out into separate statements if all of that chaining seems a bit cluttered. It's equivalent to:
df['dispatched'] = df.store_status.eq('dispatched')
df['dispatched_and_received'] = df['dispatched'] & df.customer_status.eq('received')
This is the DataFrame after the assign
item store_status customer_status dispatched dispatched_and_received
0 A dispatched received True True
1 A dispatched pending True False
2 B pending pending False False
3 B dispatched received True True
4 B dispatched pending True False
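Why the mean of a Boolean column gives the share: True counts as 1 and False as 0, so the mean is the number of True values divided by the number of rows in the group. A rough standard-library sketch of that arithmetic follows (no pandas; the names ORDERS and shares_by_item are invented for this illustration, and the rows are the sample data from the question).

```python
from collections import defaultdict

# (item, store_status, customer_status) rows from the question.
ORDERS = [
    ("A", "dispatched", "received"),
    ("A", "dispatched", "pending"),
    ("B", "pending", "pending"),
    ("B", "dispatched", "received"),
    ("B", "dispatched", "pending"),
]


def shares_by_item(orders):
    """Per item, average the Boolean flags: True counts as 1, False as 0."""
    flags = defaultdict(lambda: {"dispatched": [], "dispatched_and_received": []})
    for item, store_status, customer_status in orders:
        dispatched = store_status == "dispatched"
        flags[item]["dispatched"].append(dispatched)
        flags[item]["dispatched_and_received"].append(
            dispatched and customer_status == "received"
        )
    return {
        item: {name: sum(values) / len(values) for name, values in columns.items()}
        for item, columns in flags.items()
    }


shares = shares_by_item(ORDERS)
for item, columns in shares.items():
    print(item, columns)
```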

SQL Query (Display All 'x' Where 'x' Is Not In Table '2' for field 'y' and has 'z' flag)

I need to return all 'contacts' that do not appear in the 'delegates' table for 'event name' but do have flags in the 'contacts' table that can be selected by the user for the search.
I know the query can be broken into 2 parts.
Are they already attending this event? (Does their email appear in the 'delegates' table with the delegates.event field matching 'event' on the user form?)
WHERE (
d.Event <> [Forms]![usf_FindCampaignContacts]![FCC_EventName]
Do they match the criteria? (Have they got the HR flag in the 'contacts' table?)
AND (c.[HR-DEL] = [Forms]![usf_FindCampaignContacts]![FCC_HRD] OR IsNull([Forms]![usf_FindCampaignContacts]![FCC_HRD]));
Based on the 2 things that the query is required to do I have written the following code...
SELECT
c.[First Name], c.[Last Name], c.Email, d.Event, c.Suppress, c.[HR-DEL]
FROM tbl_Contacts AS c LEFT JOIN tbl_Delegates AS d ON c.Email = d.Email
WHERE (
d.Event <> [Forms]![usf_FindCampaignContacts]![FCC_EventName]
And
c.Suppress = False
)
AND (c.[HR-DEL] = [Forms]![usf_FindCampaignContacts]![FCC_HRD] OR IsNull([Forms]![usf_FindCampaignContacts]![FCC_HRD]));
[FCC_HRD] refers to the user-selected input on the form. I tried to use <> to remove matching records, but I feel this is where the compile error is, so I changed these to And/Or statements, and this part now returns results with the matching flags (success).
The other issue with attempting to do it this way is that, even if it worked, it would remove anyone who was listed in the delegates/sponsor table. That is why I added the <> condition for the Event, as it only needs to remove them from the list for the named event. Again, this works perfectly well (success).
The final issue is that the results are clearly being pulled from the 'delegates' table, not the 'contacts' table: both parts above work, but they only display the results that match the criteria in the delegates table, not those from contacts.
Here is the query/table relationships
Here is the user form (This is not the final design)
Below are the 3 tables that are used in the query (2 direct, 1 linked)
Contacts (c)
+----+------------+---------------+-------------------------+--------+----------+
| ID | First Name | Last Name | Email | HR-DEL | Suppress |
+----+------------+---------------+-------------------------+--------+----------+
| 1 | A | Platt | a.platt#fake.com | TRUE | TRUE |
| 2 | D | Farr | d.farr#fake.com | TRUE | FALSE |
| 3 | Y | Helle | y.helle#fake.com | TRUE | FALSE |
| 4 | S | Oliphant | soliphant#fake.com | TRUE | FALSE |
| 5 | J | Bedell-Pearce | jbedell-pearce#fake.com | TRUE | FALSE |
| 6 | J | Walker | j.walker#fake.com | FALSE | FALSE |
| 7 | S | Rug | s.rug#fake.com | FALSE | FALSE |
| 8 | D | Brown | d.brown#fake.com | FALSE | FALSE |
| 9 | R | Cooper | r.cooper#fake.com | TRUE | FALSE |
| 10 | M | Morrall | m.morrall#fake.com | TRUE | FALSE |
+----+------------+---------------+-------------------------+--------+----------+
Delegates (d)
+----+-------------------------+-------+
| ID | Email | Event |
+----+-------------------------+-------+
| 1 | a.platt#fake.com | 2 |
| 2 | d.farr#fake.com | 1 |
| 3 | y.helle#fake.com | 4 |
| 4 | soliphant#fake.com | 3 |
| 6 | jbedell-pearce#fake.com | 2 |
+----+-------------------------+-------+
Events (not direct but used to check event name drop-down on user form vs event number in delegates)
+----+------------+
| ID | Event Name |
+----+------------+
| 1 | Test 1 |
| 2 | Test 2 |
| 3 | Test 3 |
| 4 | Test 4 |
+----+------------+
Based on form selection and this sample data I need to return the following:
All contacts who are flagged 'HR' TRUE and are neither suppressed nor going to the event named 'Test 2' (this should be 5 contacts; I always get only the 'delegates' not going to the event, which is 3).
Final results should be:
+----+------------+-----------+--------------------+--------+----------+
| ID | First Name | Last Name | Email | HR-DEL | Suppress |
+----+------------+-----------+--------------------+--------+----------+
| 2 | D | Farr | d.farr#fake.com | TRUE | FALSE |
| 3 | Y | Helle | y.helle#fake.com | TRUE | FALSE |
| 4 | S | Oliphant | soliphant#fake.com | TRUE | FALSE |
| 9 | R | Cooper | r.cooper#fake.com | TRUE | FALSE |
| 10 | M | Morrall | m.morrall#fake.com | TRUE | FALSE |
+----+------------+-----------+--------------------+--------+----------+
At the moment it appears to be pulling results from the wrong table (d not c). I attempted to change to OUTER join type but that returned with a FROM syntax error.
If I understand it correctly, basically you want to do this:
SELECT A.foo
FROM A
LEFT JOIN B
ON A.bar = B.bar
WHERE
<complex condition, partly involving B>
This cannot work. By including B in the global WHERE condition, you turn the LEFT JOIN into an INNER JOIN, and so you will only ever get records that match between A and B.
You can either move the filter on B into the JOIN condition:
SELECT A.foo
FROM A
LEFT JOIN B
ON (A.bar = B.bar)
AND (B.bamboozle = 42)
WHERE
A.columns = things
or LEFT JOIN a filtered subquery:
SELECT A.foo
FROM A
LEFT JOIN
(SELECT bar, columns FROM B
WHERE B.bamboozle = 42) AS B1
ON A.bar = B1.bar
WHERE
A.columns = things
So in your query, this is the bamboozle part you will need to move:
d.Event <> [Forms]![usf_FindCampaignContacts]![FCC_EventName]
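To see the effect on the sample data, here is a small sketch using Python's built-in sqlite3 module. It is SQLite rather than Access, the table and column names are simplified, the form reference is replaced by a plain parameter (event 2, i.e. 'Test 2'), and the second query is only one possible reading of "move the filter into the JOIN": it joins on the named event and keeps the contacts that found no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER, email TEXT, hr_del INTEGER, suppress INTEGER);
    CREATE TABLE delegates (id INTEGER, email TEXT, event INTEGER);
    INSERT INTO contacts VALUES
        (1, 'a.platt#fake.com', 1, 1), (2, 'd.farr#fake.com', 1, 0),
        (3, 'y.helle#fake.com', 1, 0), (4, 'soliphant#fake.com', 1, 0),
        (5, 'jbedell-pearce#fake.com', 1, 0), (6, 'j.walker#fake.com', 0, 0),
        (7, 's.rug#fake.com', 0, 0), (8, 'd.brown#fake.com', 0, 0),
        (9, 'r.cooper#fake.com', 1, 0), (10, 'm.morrall#fake.com', 1, 0);
    INSERT INTO delegates VALUES
        (1, 'a.platt#fake.com', 2), (2, 'd.farr#fake.com', 1),
        (3, 'y.helle#fake.com', 4), (4, 'soliphant#fake.com', 3),
        (6, 'jbedell-pearce#fake.com', 2);
""")

EVENT_ID = 2  # stands in for the event chosen on the form


def contact_ids(sql):
    return [row[0] for row in conn.execute(sql, (EVENT_ID,))]


# Filter on d in the WHERE clause: contacts with no delegate row have
# d.event = NULL, the comparison is not true, and they drop out.
filter_in_where = contact_ids("""
    SELECT c.id FROM contacts AS c
    LEFT JOIN delegates AS d ON c.email = d.email
    WHERE d.event <> ? AND c.suppress = 0 AND c.hr_del = 1
    ORDER BY c.id
""")

# Filter on d inside the JOIN condition, then keep the unmatched contacts.
filter_in_join = contact_ids("""
    SELECT c.id FROM contacts AS c
    LEFT JOIN delegates AS d ON c.email = d.email AND d.event = ?
    WHERE d.email IS NULL AND c.suppress = 0 AND c.hr_del = 1
    ORDER BY c.id
""")

print(filter_in_where)  # -> [2, 3, 4]
print(filter_in_join)   # -> [2, 3, 4, 9, 10]
```

The first result reproduces the three rows the question complains about; the second matches the five contacts listed as the expected result.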

macro to check if value is in another list, and if so add today's date

I have two excel sheets, A which contains products and B, which is the products we will discontinue when stock runs out.
I would like a macro so that we can make a list in B, hit the run function, and it will go and find where it is in sheet A, go to column E of that row and enter in today's date.
The hitch I have so far is making sure it does not overwrite previous entries in the column when an item isn't found.
The basic macro I have right now is this:
Sub Deletions()
Dim LastRow As Long
With Sheets("A") '<-set this worksheet reference properly
LastRow = .Range("A" & Cells.Rows.Count).End(xlUp).Row
With .Range("E2:E" & LastRow)
.Formula = "=IF(A1='B'!A1,TODAY(),)"
.Cells = .Value2
End With
End With
End Sub
The reason I need to use VBA, is that we have over 100k items, and not everyone using this will know excel very well. So we want to be able to make a list, put it in excel, and click the macro button and voila.
Also, the list of removed items gets deleted afterwards, as the information is kept in sheet A. We also need to keep the dates of when products got discontinued, so it is very crucial that this macro not erase previous entries.
Here's my answer:
Please follow the comments inside the code.
Sub discontinue_Prods()
    'The button needs to be on sheet B.
    'Sheet B needs to have a header row.
    Dim r
    Dim c
    Dim disRange As Range
    Dim i
    Dim shtA As Worksheet
    Dim shtB As Worksheet
    Dim dLine
    Dim E 'to store the column number of column E
    Dim A 'to store the column number of column A
    Set shtA = Sheets("A") 'storing the sheets...
    Set shtB = Sheets("B")
    shtB.Activate 'wherever you are in the workbook, the code always starts from sheet B
    r = Range("A2").End(xlDown).Row 'the last row of the list of discontinued products
                                    '(if you do not want headers, use A1 here)
    c = 1 'column A... change it if you need to
    Set disRange = Range(Cells(2, c), Cells(r, c)) 'change the 2 to 1 if you do not want headers
    E = 5 'column E and A, just the numbers
    A = 1
    shtA.Activate 'go to sheet A
    For Each i In disRange 'for each item in the list of products to discontinue
        dLine = Empty
        On Error Resume Next
        dLine = Application.WorksheetFunction.Match(i.Value, shtA.Columns(A), False)
        'here we find the row where the product is,
        'searching for the item from the list (sheet B).
        If Not dLine = Empty Then
            shtA.Cells(dLine, E).Value = Date 'here we add today's date (system date)
                                              'to column E, as a static value
            'IMPORTANT!
            'if you want the formula instead, uncomment and use this:
            'Cells(dLine, E).FormulaR1C1 = "=TODAY()"
        End If
        On Error GoTo 0
    Next i
End Sub
The code goes over the cells in the list on Sheet B, looks up each product on Sheet A and, wherever it finds a match, sets column E of that row to today's date, taken from the system date. Note: if you want to use formulas instead, see the comments.
With a list like this:
Sheet A
+----------+-----+
| Products | Qty |
+----------+-----+
| Prod001 | 44 |
| Prod002 | 27 |
| Prod003 | 65 |
| Prod004 | 135 |
| Prod005 | 95 |
| Prod006 | 36 |
| Prod007 | 114 |
| Prod008 | 20 |
| Prod009 | 107 |
| Prod010 | 7 |
| Prod011 | 22 |
| Prod012 | 142 |
| Prod013 | 99 |
| Prod014 | 144 |
| Prod015 | 150 |
| Prod016 | 44 |
| Prod017 | 57 |
| Prod018 | 64 |
| Prod019 | 17 |
| Prod020 | 88 |
+----------+-----+
Sheet B
+----------+
| Products |
+----------+
| Prod017 |
| Prod011 |
| Prod005 |
| Prod018 |
| Prod006 |
| Prod009 |
| Prod006 |
| Prod001 |
| Prod017 |
+----------+
Result in Sheet A
+----------+-----+--+--+-----------+
| Products | Qty | | | |
+----------+-----+--+--+-----------+
| Prod001 | 44 | | | 2/23/2016 |
| Prod002 | 27 | | | |
| Prod003 | 65 | | | |
| Prod004 | 135 | | | |
| Prod005 | 95 | | | 2/23/2016 |
| Prod006 | 36 | | | 2/23/2016 |
| Prod007 | 114 | | | |
| Prod008 | 20 | | | |
| Prod009 | 107 | | | 2/23/2016 |
| Prod010 | 7 | | | |
| Prod011 | 22 | | | 2/23/2016 |
| Prod012 | 142 | | | |
| Prod013 | 99 | | | |
| Prod014 | 144 | | | |
| Prod015 | 150 | | | |
| Prod016 | 44 | | | |
| Prod017 | 57 | | | 2/23/2016 |
| Prod018 | 64 | | | 2/23/2016 |
| Prod019 | 17 | | | |
| Prod020 | 88 | | | |
+----------+-----+--+--+-----------+
I think you are overcomplicating this by using VBA.
Instead, you can do this with a simple Excel formula:
Assume 'Sheet B', column A holds the list of discontinued items. 'Sheet A' column A holds the name of each item, and you want today's date in column E, wherever there is a match of an item in Sheet B. Put this in 'Sheet A' E1 and copy it down to the end of the sheet.
=IF(ISERROR(MATCH(A1,'Sheet B'!A:A, 0)), "", TODAY())
This will put today's date, as long as the row in sheet A matches any of the rows in sheet B. It tries to find a match anywhere on Sheet B, and if it doesn't, it will produce an error, meaning ISERROR will be TRUE, and the IF statement will produce "". If it does match, there will be no error, and it will produce TODAY().
This is what I would do:
Dim b As Variant
For j = 1 To Range("A1").End(xlDown).Row 'Assuming the button is on the "B" sheet
    b = Cells(j, 1).Value 'This is your product in sheet "B", assuming it is in the first column
    For i = 1 To Sheets("A").Range("A1").End(xlDown).Row
        If Sheets("A").Cells(i, 1).Value = b Then 'The product was found in row i
            Sheets("A").Cells(i, 5) = Format(Now(), "MMM-DD-YYYY") 'Write today's date
            Exit For 'No need to keep looping
        End If
    Next i
Next j
It's very basic, but I'm sure it works.
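For comparison, the matching step that these answers implement can be sketched in Python (not VBA; the function name stamp_discontinued and the dictionary keys are invented for this illustration). It assumes the list from sheet B fits in memory as a set, so each product on sheet A needs a single lookup instead of an inner loop, and rows that are not on the list are left untouched, so earlier dates survive.

```python
from datetime import date


def stamp_discontinued(products, discontinued, today):
    """Write `today` into matched rows only; other rows keep their old value."""
    wanted = set(discontinued)  # one hash lookup per product
    for row in products:
        if row["product"] in wanted:
            row["discontinued_on"] = today
    return products


sheet_a = [
    {"product": "Prod001", "discontinued_on": None},
    {"product": "Prod002", "discontinued_on": date(2015, 6, 1)},  # earlier entry
    {"product": "Prod005", "discontinued_on": None},
]
sheet_b = ["Prod005", "Prod001", "Prod005"]  # duplicates are harmless

stamp_discontinued(sheet_a, sheet_b, today=date(2016, 2, 23))
for row in sheet_a:
    print(row["product"], row["discontinued_on"])
```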

How to add up row totals in report viewer

I'm new to reporting and the jargon that goes with it, so I will try to draw it instead of writing it.
| | A | B | C | D | E |
-------------------------------------------------
| Apples | 1 | 3 | 6 | 2 | 12 |
-------------------------------------------------
| Oranges | 3 | 2 | 4 | 1 | 10 |
-------------------------------------------------
| Bananas | 5 | 3 | | 1 | 9 |
-------------------------------------------------
| | | | | | 31 |
I need to sum up the last column E where I indicated 31. The cells with values 12,10,9 are obtained by =Sum(Fields!A.Value + Fields!B.Value + Fields!C.Value + Fields!D.Value).
I can't change the SQL query and/or the dataset that is used. Does anyone have a suggestion? Thanks!
EDIT:
I've added a function to the code
Public Total_lookup_Sum As Integer = 0
Public Function Lookup_Sum(ByVal value As Integer) As Integer
Total_lookup_Sum = Total_lookup_Sum + value
Return Total_lookup_Sum
End Function
and I am calling it like this: Code.Equals(ReportItems!txtFruitTotal.Value), but I get FALSE.
With the help of a colleague, the answer was reached:
Sum(Fields!A.Value) + Sum(Fields!B.Value) + Sum(Fields!C.Value) + Sum(Fields!D.Value)
In the cell where the 31 is, simply use this expression:
=Sum(Fields!A.Value) + Sum(Fields!B.Value) + Sum(Fields!C.Value) + Sum(Fields!D.Value)
OR
=Sum(Fields!E.Value)
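The first expression works because adding up the column sums gives the same grand total as adding up the row totals. A quick Python check on the numbers from the question (the blank cell for Bananas is treated as 0 here, which is an assumption of this sketch; the names ROWS and total are invented):

```python
# Values of columns A-D per row, from the table in the question.
ROWS = {
    "Apples": [1, 3, 6, 2],
    "Oranges": [3, 2, 4, 1],
    "Bananas": [5, 3, None, 1],  # blank cell
}


def total(values):
    """Sum the values, skipping blanks."""
    return sum(v for v in values if v is not None)


row_totals = {name: total(values) for name, values in ROWS.items()}
column_totals = [total(column) for column in zip(*ROWS.values())]

grand_from_rows = sum(row_totals.values())
grand_from_columns = sum(column_totals)
print(row_totals, column_totals, grand_from_rows, grand_from_columns)
```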

Linq join on parameterized distinct key

I'm trying to join two tables with LINQ based on a dynamic key. The user can change the key via a combo box. The key may be money, string, double, int, etc. Currently I'm getting the data just fine, but without filtering out the duplicates. I can filter the duplicates in VB, but it's slooooow. I'd like to do it in the LINQ query right out of the gate.
Here's the data:
First Table:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
Second Table:
------------------------------------------------------------
| OrangeIndex | OrangeCost | OrangeColor | OrangeDescription |
------------------------------------------------------------
| 1 | 1 | Orange | This is an Orange |
| 2 | 3 | Orange | |
| 3 | 2 | Orange | This is an Orange |
| 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------
Currently, I'm using the following code to get too much data:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows Distinct
Outcome:
-------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | Orange | This is an Orange |
| 1 | 3 | Red | This is an duplicate | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 3 | Orange | |
| 2 | 5 | Green | This is an duplicate | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 2 | Orange | This is an Orange |
| 3 | 4 | Pink | This is an duplicate | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 3 | Orange | |
| 4 | 2 | Yellow | This is an duplicate | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 2 | Orange | This is an Orange |
| 5 | 2 | Orange | This is an duplicate | 2 | Orange | This is an Orange |
-------------------------------------------------------------------------
Desired Outcome:
------------------------------------------------------------------------
| 1 | 3 | Red | This is an apple | 1 | 1 | Orange | This is an Orange |
| 2 | 5 | Green | This is an apple | 2 | 3 | Orange | |
| 3 | 4 | Pink | This is an apple | 3 | 2 | Orange | This is an Orange |
| 4 | 2 | Yellow | This is an apple | 4 | 3 | Orange | |
| 5 | 2 | Orange | This is an apple | 5 | 2 | Orange | This is an Orange |
------------------------------------------------------------------------
I have tried the following:
'Get the original Column Names into an Array List
'MasterTableColumns = GetColumns(qMasterDS, TheMasterTable) '(external code)
'Plug the Existing DataSet into a DataView:
Dim View As DataView = New DataView(qMasterTable)
'Sort by the Primary Key:
View.Sort = ThePrimaryKey
'Build a new table listing only one column:
Dim newListTable As DataTable = _
View.ToTable("UniqueData", True, ThePrimaryKey)
This returns a unique list, but no associated data:
-------------
| AppleIndex |
-------------
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
-------------
So I tried this instead:
'Build a new table with ALL the columns:
Dim newFullTable As DataTable = _
View.ToTable("UniqueData", True, _
MasterTableColumns(0), _
MasterTableColumns(1), _
MasterTableColumns(2), _
MasterTableColumns(3))
Unfortunately, it yields the following... with duplicates:
-------------------------------------------------------------
| AppleIndex | AppleCost | AppleColor | AppleDescription |
------------------------------------------------------------
| 1 | 3 | Red | This is an apple |
| 2 | 5 | Green | This is an apple |
| 3 | 4 | Pink | This is an apple |
| 4 | 2 | Yellow | This is an apple |
| 5 | 2 | Orange | This is an apple |
| 1 | 3 | Red | This is a duplicate|
| 2 | 5 | Green | This is a duplicate|
| 3 | 4 | Pink | This is a duplicate|
| 4 | 2 | Yellow | This is a duplicate|
| 5 | 2 | Orange | This is a duplicate|
-------------------------------------------------------------
Any ideas?
~~~~~~~~~~~~ Update: ~~~~~~~~~~~~
Jeff M suggested the following code. (Thanks, Jeff.) However, it gives me an error. Does anyone know the syntax for making this work in VB? I've monkeyed with it a bit and can't seem to get it right.
Dim matches = _
From mRows In (From row In LinqMasterTable _
Group row By row(ThePrimaryKey) Into g() _
Select g.First()) _
Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
Error in Third row at "row(ThePrimaryKey)":
"Range variable name can be inferred only from a simple or qualified name with no arguments."
Well, the basic problem isn't the LINQ. It's the fact that your first table contains "duplicates" which aren't really duplicates, since in your example every row is distinct.
So, our question to you is: "How do we identify the duplicates in the original table?" Once that is answered, the rest should be trivial.
For example (in C#, since I'm not sure of the VB syntax):
var Matches = from mRows in LinqMasterTable
                  .Where(r => r.Field<string>("AppleDescription") == "This is an apple")
              join sRows in LinqSecondTable
              on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };
Edit:
Here's how I would write the C# LINQ query. This alternate version, rather than using Distinct(), uses a nested query with grouping, which should have similar semantics. It should be easily convertible to VB.
var matches = from mRows in (from row in LinqMasterTable
                             group row by row[ThePrimaryKey] into g
                             select g.First())
              join sRows in LinqSecondTable
              on mRows[ThePrimaryKey] equals sRows[TheForignKey]
              orderby mRows[ThePrimaryKey]
              select new { mRows, sRows };
and my attempt at a VB version of the above:
Edit:
As for the most recent error, I know exactly how to deal with it. When I was playing with VB LINQ, I found that the compiler doesn't like complex grouping expressions. To get around that, assign row(ThePrimaryKey) to a temporary variable and group by that variable. It should work then.
Dim matches = From mRows In (From row In LinqMasterTable _
                             Let grouping = row(ThePrimaryKey) _
                             Group row By grouping Into g = Group _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows
Actually upon second inspection, it turns out that what is being grouped by needs a name. The following will work.
Dim matches = From mRows In (From row In LinqMasterTable _
                             Group row By Grouping = row(ThePrimaryKey) Into g = Group _
                             Select g.First()) _
              Join sRows In LinqSecondTable _
              On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
              Order By mRows(ThePrimaryKey) _
              Select mRows, sRows
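The idea behind the nested query (keep the first row per key, then join) can be illustrated independently of LINQ. Below is a small Python sketch on the sample tables; the helper names first_row_per_key and inner_join are made up for this example, and it assumes that "first" means first in table order and that the key is the index column of both tables.

```python
APPLES = [
    (1, 3, "Red", "This is an apple"), (2, 5, "Green", "This is an apple"),
    (3, 4, "Pink", "This is an apple"), (4, 2, "Yellow", "This is an apple"),
    (5, 2, "Orange", "This is an apple"),
    (1, 3, "Red", "This is a duplicate"), (2, 5, "Green", "This is a duplicate"),
    (3, 4, "Pink", "This is a duplicate"), (4, 2, "Yellow", "This is a duplicate"),
    (5, 2, "Orange", "This is a duplicate"),
]
ORANGES = [
    (1, 1, "Orange", "This is an Orange"), (2, 3, "Orange", ""),
    (3, 2, "Orange", "This is an Orange"), (4, 3, "Orange", ""),
    (5, 2, "Orange", "This is an Orange"),
]


def first_row_per_key(rows, key_index):
    """Group by the key column and keep only the first row of each group."""
    first = {}
    for row in rows:
        first.setdefault(row[key_index], row)  # later duplicates are ignored
    return list(first.values())


def inner_join(left, right, left_key, right_key):
    """Join two row lists on equal key values, ordered by the left key."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[right_key], []).append(row)
    return [
        l + r
        for l in sorted(left, key=lambda row: row[left_key])
        for r in by_key.get(l[left_key], [])
    ]


matches = inner_join(first_row_per_key(APPLES, 0), ORANGES, 0, 0)
for match in matches:
    print(match)
```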
Declarations and Such:
Private Sub LinqTwoTableInnerJoin(ByRef qMasterDS As DataSet, _
ByRef qMasterTable As DataTable, _
ByRef qSecondDS As DataSet, _
ByRef qSecondTable As DataTable, _
ByRef qPrimaryKey As String, _
ByRef qForignKey As String, _
ByVal qResultsName As String)
Dim TheMasterTable As String = qMasterTable.TableName
Dim TheSecondTable As String = qSecondTable.TableName
Dim ThePrimaryKey As String = qPrimaryKey
Dim TheForignKey As String = qForignKey
Dim TheNewForignKey As String = ""
MasterTableColumns = GetColumns(qMasterDS, TheMasterTable)
SecondTableColumns = GetColumns(qSecondDS, TheSecondTable)
Dim mColumnCount As Integer = MasterTableColumns.Count
Dim sColumnCount As Integer = SecondTableColumns.Count
Dim ColumnCount As Integer = mColumnCount + sColumnCount
Dim LinqMasterTable = qMasterDS.Tables(TheMasterTable).AsEnumerable
Dim LinqSecondTable = qSecondDS.Tables(TheSecondTable).AsEnumerable
Get the Data and order it by the Selected Key:
Dim Matches = From mRows In LinqMasterTable Join sRows In LinqSecondTable _
On mRows(ThePrimaryKey) Equals sRows(TheForignKey) _
Order By mRows(ThePrimaryKey) _
Select mRows, sRows
Put the Results into a Dataset Table:
' Make sure the dataset is available and/or cleared:
If dsResults.Tables(qResultsName) Is Nothing Then dsResults.Tables.Add(qResultsName)
dsResults.Tables(qResultsName).Clear() : dsResults.Tables(qResultsName).Columns.Clear()
'Adds Master Table Column Names
For x = 0 To MasterTableColumns.Count - 1
dsResults.Tables(qResultsName).Columns.Add(MasterTableColumns(x))
Next
'Rename Second Table Names if Needed:
For x = 0 To SecondTableColumns.Count - 1
With dsResults.Tables(qResultsName)
For y = 0 To .Columns.Count - 1
If SecondTableColumns(x) = .Columns(y).ColumnName Then
SecondTableColumns(x) = SecondTableColumns(x) & "_2"
End If
Next
End With
Next
'Make sure that the Forign Key is a Unique Value
If ForignKey1 = PrimaryKey Then
TheNewForignKey = ForignKey1 & "_2"
Else
TheNewForignKey = ForignKey1
End If
'Adds Second Table Column Names
For x = 0 To SecondTableColumns.Count - 1
dsResults.Tables(qResultsName).Columns.Add(SecondTableColumns(x))
Next
'Copy Results into the Dataset:
For Each Match In Matches
'Build an array for each row:
Dim NewRow(ColumnCount - 1) As Object
'Add the mRow Items:
For x = 0 To MasterTableColumns.Count - 1
NewRow(x) = Match.mRows.Item(x)
Next
'Add the srow Items:
For x = 0 To SecondTableColumns.Count - 1
Dim y As Integer = x + (MasterTableColumns.Count)
NewRow(y) = Match.sRows.Item(x)
Next
'Add the array to dsResults as a Row:
dsResults.Tables(qResultsName).Rows.Add(NewRow)
Next
Give the user the option of removing duplicates or not:
If chkUnique.Checked = True Then
ReMoveDuplicates(dsResults.Tables(qResultsName), ThePrimaryKey)
End If
Remove the Duplicates if they so desire:
Private Sub ReMoveDuplicates(ByRef SkipTable As DataTable, _
ByRef TableKey As String)
'Make sure that there's data to work with:
If SkipTable Is Nothing Then Exit Sub
If TableKey Is Nothing Then Exit Sub
'Create an ArrayList of rows to delete:
Dim DeleteRows As New ArrayList()
'Fill the Array with Row Number of the items equal
'to the item above them:
For x = 1 To SkipTable.Rows.Count - 1
Dim RowOne As DataRow = SkipTable.Rows(x - 1)
Dim RowTwo As DataRow = SkipTable.Rows(x)
If RowTwo.Item(TableKey) = RowOne.Item(TableKey) Then
DeleteRows.Add(x)
End If
Next
'If there are no hits, exit this sub:
If DeleteRows.Count < 1 Or DeleteRows Is Nothing Then
Exit Sub
End If
'Otherwise, remove the rows based on the row count value:
For x = 0 To DeleteRows.Count - 1
'Start at the END and count backwards so the duplicate
'item's row count value doesn't change with each deleted row
Dim KillRow As Integer = DeleteRows((DeleteRows.Count - 1) - x)
'Delete the row:
SkipTable.Rows(KillRow).Delete()
Next
End Sub
Then clean up any leftovers:
If Not chkRetainKeys.Checked = True Then 'Removes Forign Key
dsResults.Tables(qResultsName).Columns.Remove(TheNewForignKey)
End If
'Clear Arrays
MasterTableColumns.Clear()
SecondTableColumns.Clear()
Final Analysis:
Ran this against 2 files with 4 columns and 65,535 rows, with some duplicates. Process time: roughly 1 second. In fact, it took longer to load the fields into memory than it did to parse the data.