SQL - identifying exact matches across multiple records

SQL - identifying exact matches across multiple records - sql

Table Parent
Column1
S1
S2
S3
Table Child
Column1 Column2
S1 P1
S1 P2
S2 P1
S2 P2
S3 P1
Where parent.column1 = child.column1
Given the above tables, I need to identify the parents whose children have the same records in column2 as parent S1 does.
For example, S1 and S2 both have P1 and P2, so that would meet the condition. S3, however, does not have P2, and should therefore be excluded.
New to SQL, so I'm having some trouble. Tried it by using a not in statement, but since S3 has P1, it's not being excluded.

You need a join. This will vary by SQL dialect. Something like:
select child.column1, child.column2 from (
select column2 as parentsColumn2Value from child where column1='S1'
) as parentsColumn2Table
left join child on parentsColumn2Table.column2=child.column2

Solving this for the general case of finding all the matching parents is more fun :) See note at the bottom for how this can be used for the simpler case (one of the parent keys is fixed).
I'll give what I think is a "proper" solution, but I don't have a copy of Access (or SQL Server) to hand to see if it works in there. (Yes, I have tested this against a DB here though...)
SELECT p1.column1, p2.column1
FROM parent p1 JOIN parent p2 ON p1.column1 < p2.column1
WHERE NOT EXISTS (SELECT 1
FROM (SELECT c1.column1, c1.column2 FROM child c1 WHERE c1.column1 = p1.column1) c1f
FULL OUTER JOIN
(SELECT c2.column1, c2.column2 FROM child c2 WHERE c2.column1 = p2.column1) c2f
ON c1f.column2 = c2f.column2
WHERE c1f.column1 IS NULL OR c2f.column1 IS NULL
);
So hopefully you can see how what I said above is tied together in this :) I'll try to explain...
The "outer" (first) select generates combinations of column1 values (p1.column1 and p2.column1). For each of the combinations, we list the rows in "child" for those values (these are c1f and c2f: c1f means "child 1 filtered") and do a FULL OUTER JOIN. Which is a comparatively rare construct, in my experience. We want to match up all the entries in c1f and c2f (using their column1 values), and find any on either side that doesn't have a corresponding entry on the other side. If there are any such non-matchers then they will manifest as rows from the join with a null for their column1 value. So the parent query selects only combinations of column1 values where no such rows in the subquery exist, i.e. every child row for p1's column1 value has a corresponding child row for p2's column1 value and vice versa.
So for instance, for the iteration where p1.column1 is 'S1' and p2.column2 is 'S3', that subquery (without aggregation) would produce:
c1f__column1 | c1f__column2 | c2f__column1 | c2f__column2
--------------+--------------+--------------+--------------
S1 | P1 | S3 | P1
S1 | P2 | |
and it's those nulls in the second row that flag this combination as not matching. Some twisty thinking involved, it's tempting to get fixated on finding matching combinations, when finding non-matching ones is easier.
As a final bonus, when I created some test tables for this, I made (column1,column2) the primary key of child, which just so happened to be exactly what you need to drive the full outer join of the filtered tables efficiently. Win! (So do note I haven't tried to cope with duplicate combinations in child... but you could just slap "distinct" in the c1f and c2f derivations)
NB based on Matt's comment, if one of your parent values was known (i.e. you just wanted to list all the parent values with the same children as S1) then you can just slap "and p1.column1 = 'S1'" on the end of this. But replace "parent p1 JOIN parent p2 ON p1.column1 < p2.column1" with just "parent p1, parent p2" in that case... remember that otherwise the query as written will only output half of all the possible pairs...

This is an Access, rather than an SQL solution in that it makes use of a User Defined Function (UDF).
SELECT p.Column1,
ConcatList("SELECT Column2 FROM c WHERE Column1='S1'","|") AS S1,
ConcatList("SELECT Column2 FROM c WHERE Column1='" & [p].[Column1] & "'","|") AS Child,
Format([S1]=[Child],"Yes/No") AS [Match]
FROM p;
The UDF
Function ConcatList(strSQL As String, strDelim, ParamArray NameList() As Variant)
''Reference: Microsoft DAO x.x Object Library
Dim db As Database
Dim rs As DAO.Recordset
Dim strList As String
Set db = CurrentDb
If strSQL <> "" Then
Set rs = db.OpenRecordset(strSQL)
Do While Not rs.EOF
strList = strList & strDelim & rs.Fields(0)
rs.MoveNext
Loop
strList = Mid(strList, Len(strDelim) + 1)
Else
strList = Join(NameList, strDelim)
End If
ConcatList = strList
End Function
Unfortunately, I do not believe Jet supports Full Outer Join, so a solution using only SQL is likely to be a little tedious, and require more information, such as whether S1 has a fixed number of entries in column2.

ACE/Jet may not support FULL OUTER JOIN directly but the workaround is simple enough i.e. just UNION ALL the LEFT JOIN, INNER JOIN and RIGHT JOIN respectively of the tables.

Incidentally, i was searching for the same solution and have figured it out myself
select * from TableParent where id in
(select temp_parent.id from
(select TableParent.*, count(exact_match.match_count) as exact_count
from TableParent
inner join
(select Column1, count(*) as match_count from TableChild
group by Column1
having match_count = 2
) as exact_match on exact_match.Column1 = TableParent.Column1
inner join
TableChild on TableChild.event_id = exact_match.Column1 where TableChild.Column2 in (P1,P2)
group by TableParent.Column1
having exact_count = 2) as temp_parent)

Related

Sub-query works but would a join or other alternative be better?

I am trying to select rows from one table where the id referenced in those rows matches the unique id from another table that relates to it like so:
SELECT *
FROM booklet_tickets
WHERE bookletId = (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
With the bookletNum/seasonId/bookletTypeId being filled in by a user form and inserted into the query.
This works and returns what I want but seems messy. Is a join better to use in this type of scenario?

If there is even a possibility for your subquery to return multiple value you should use in instead:
SELECT *
FROM booklet_tickets
WHERE bookletId in (SELECT id
FROM booklets
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3)
But I would prefer exists over in :
SELECT *
FROM booklet_tickets bt
WHERE EXISTS (SELECT 1
FROM booklets b
WHERE bookletNum = 2000
AND seasonId = 9
AND bookletTypeId = 3
AND b.id = bt.bookletId)

It is not possible to give a "Yes it's better" or "no it's not" answer for this type of scenario.
My personal rule of thumb if number of rows in a table is less than 1 million, I do not care optimising "SELECT WHERE IN" types of queries as SQL Server Query Optimizer is smart enough to pick an appropriate plan for the query.
In reality however you often need more values from a joined table in the final resultset so a JOIN with a filter WHERE clause might make more sense, such as:
SELECT BT.*, B.SeasonId
FROM booklet_tickes BT
INNER JOIN booklets B ON BT.bookletId = B.id
WHERE B.bookletNum = 2000
AND B.seasonId = 9
AND B.bookletTypeId = 3
To me it comes down to a question of style rather than anything else, write your code so that it'll be easier for you to understand it months later. So pick a certain style and then stick to it :)
The question however is old as the time itself :)
SQL JOIN vs IN performance?

Should I use an SQL full outer join for this?

Consider the following tables:
Table A:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
Table B:
DOC_NUM
DOC_TYPE
RELATED_DOC_NUM
NEXT_STATUS
...
The DOC_TYPE and NEXT_STATUS columns have different meanings between the two tables, although a NEXT_STATUS = 999 means "closed" in both. Also, under certain conditions, there will be a record in each table, with a reference to a corresponding entry in the other table (i.e. the RELATED_DOC_NUM columns).
I am trying to create a query that will get data from both tables that meet the following conditions:
A.RELATED_DOC_NUM = B.DOC_NUM
A.DOC_TYPE = "ST"
B.DOC_TYPE = "OT"
A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999
A.DOC_TYPE = "ST" represents a transfer order to transfer inventory from one plant to another. B.DOC_TYPE = "OT" represents a corresponding receipt of the transferred inventory at the receiving plant.
We want to get records from either table where there is an ST/OT pair where either or both entries are not closed (i.e. NEXT_STATUS < 999).
I am assuming that I need to use a FULL OUTER join to accomplish this. If this is the wrong assumption, please let me know what I should be doing instead.
UPDATE (11/30/2021):
I believe that #Caius Jard is correct in that this does not need to be an outer join. There should always be an ST/OT pair.
With that I have written my query as follows:
SELECT <columns>
FROM A LEFT JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
Does this make sense?
UPDATE 2 (11/30/2021):
The reality is that these are DB2 database tables being used by the JD Edwards ERP application. The only way I know of to see the table definitions is by using the web site http://www.jdetables.com/, entering the table ID and hitting return to run the search. It comes back with a ton of information about the table and its columns.
Table A is really F4211 and table B is really F4311.
Right now, I've simplified the query to keep it simple and keep variables to a minimum. This is what I have currently:
SELECT CAST(F4211.SDDOCO AS VARCHAR(8)) AS SO_NUM,
F4211.SDRORN AS RELATED_PO,
F4211.SDDCTO AS SO_DOC_TYPE,
F4211.SDNXTR AS SO_NEXT_STATUS,
CAST(F4311.PDDOCO AS VARCHAR(8)) AS PO_NUM,
F4311.PDRORN AS RELATED_SO,
F4311.PDDCTO AS PO_DOC_TYPE,
F4311.PDNXTR AS PO_NEXT_STATUS
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = CAST(F4311.PDDOCO AS VARCHAR(8))
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
The other part of the story is that I'm using a reporting package that allows you to define "virtual" views of the data. Virtual views allow the report developer to specify the SQL to use. This is the application where I am using the SQL. When I set up the SQL, there is a validation step that must be performed. It will return a limited set of results if the SQL is validated.
When I enter the query above and validate it, it says that there are no results, which makes no sense. I'm guessing the data casting is causing the issue, but not sure.
UPDATE 3 (11/30/2021):
One more twist to the story. The related doc number is not only defined as a string value, but it contains leading zeros. This is true in both tables. The main doc number (in both tables) is defined as a numeric value and therefore has no leading zeros. I have no idea why those who developed JDE would have done this, but that is what is there.
So, there are matching records between the two tables that meet the criteria, but I think I'm getting no results because when I convert the numeric to a string, it does not match, because one value is, say "12345", while the other is "00012345".
Can I pad the numeric -> string value with zeros before doing the equals check?
UPDATE 4 (12/2/2021):
Was able to finally get the query to work by converting the numeric doc num to a left zero padded string.
SELECT <columns>
FROM PROD2DTA.F4211 AS F4211
INNER JOIN PROD2DTA.F4311 AS F4311
ON F4211.SDRORN = RIGHT(CONCAT('00000000', CAST(F4311.PDDOCO AS VARCHAR(8))), 8)
WHERE F4211.SDDCTO IN ( 'ST' )
AND F4311.PDDCTO IN ( 'OT' )
AND ( F4211.SDNXTR < 999
OR F4311.PDNXTR < 999 )

You should write your query as follows:
SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)
LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause

See Update 4 for details of how I resolved the "data conversion or mapping" error.

Adding a WHERE clause to a Group Join

I am working in VB.NET. I have two datatables (table & table2), which are identical. The two columns in question are:
id | voided_id
1 | null
2 | null
3 | 2
table1 is the list of items, and table2 is the list of voided items. so, if an item has voided an earlier item, I want to exclude the voided item. In this example id 2 would be excluded because it was voided by id 3.
Here is what I have so far:
Dim compareResults = From table In resultOds
Group Join table2 In voidOds
On table.Field(Of Int64)("id") Equals table2.Field(Of Int64?)("voided_loan_id")
Into tablesJoin = Group From tableJoin In tablesJoin.Where(Function(x) x.Field(Of Int64?)("voided_loan_id") Is Nothing).DefaultIfEmpty()
Select table
right now, I get everything. The WHERE clause inside the group join isn't working. Any suggestions?
Many articles I found said .DefaultIfEmpty() should provide the functionality of the WHERE but this returns everything as well:
Dim compareResults2 = From table In resultOds
Group Join table2 In voidOds
On table.Field(Of Int64)("id") Equals table2.Field(Of Int64?)("voided_loan_id") Into tablesJoin = Group
From table2 In tablesJoin.DefaultIfEmpty()
Select table
based on some off-line input I got, I rewrote this as a subquery. Still returns everything.
Dim compareResults2 = From r In resultOds
Where Not (From v In voidOds Where v.Field(Of Int64?)("voided_loan_id") IsNot Nothing Select v.Field(Of Int64?)("voided_loan_id")).Contains(r.Field(Of Int64?)("id"))
Select r

Try below
C#
var s =ItemTable.Where(i=>!VoidedItem.Any(v=>v.id==i.id))?.ToList();
VB
Dim s = ItemTable.Where(Function(i) Not VoidedItem.Any(Function(v) v.id = i.id))?.ToList()

I didn't get it working so I defaulted to executing a SQL string. thanks for the feedback.

MS Access SQL: Enumerate results from a many-to-one relationship

Very simply, I have a many-to-one relationship table set in MS Access where I've managed to pull out the distinct values as separate rows. I now need to enumerate these rows.
The query looks like the following (generated by the MS Access Designer - apologies for the formatting):
SELECT DISTINCT ValidationRule.ValidationCode AS Rule, Table.Template AS Template
FROM ValidationRule RIGHT JOIN (([Table] INNER JOIN TableVersion ON Table.TableID = TableVersion.TableID) INNER JOIN ValidationScope ON TableVersion.TableVID = ValidationScope.TableVID) ON ValidationRule.ValidationId = ValidationScope.ValidationID
GROUP BY ValidationRule.ValidationCode, Table.Template
ORDER BY ValidationRule.ValidationCode;
So my data looks like:
Rule Template
v0007_m C 00.01
v0189_h C 01.00
v0189_h C 05.01
v3000_i C 08.00
I need to add sequential values to the results as follows:
Rule Template Sequence
v0007_m C 00.01 1
v0189_h C 01.00 1
v0189_h C 05.01 2
v3000_i C 08.00 1
What function should I be looking at in MS Access SQL to do this?

If you save the query you have as a separate query called qryValdationRule, this query which builds off that should give you what you need:
SELECT qryValidationRule.Rule, qryValidationRule.Template, DCount("*", 'qryValidationRule', "[Rule] = '" & qryValidationRule.Rule & "' AND [Template] <= '" & qryValidationRule.Template & "'") AS Sequence
FROM qryValidationRule
ORDER BY qryValidationRule.Rule, qryValidationRule.Template;
We are looking up and getting a count of all records with the same Rule value with an equal or less Template value within the dataset. This, essentially, gives us a Sequence grouped by Rule. This only works properly if Template values are distinct across Rule groups, which should be the case because you are pulling a DISTINCT across the CROSS JOIN of tables. It is not as convenient or flexible as window functions, but will get you what you need.
You may also want to try this method, which may be more efficient:
SELECT t1.Rule, t1.Template, COUNT(t2.Template) AS Sequence
FROM qryValidationRule AS t1 INNER JOIN qryValidationRule AS t2 ON t1.Rule = t2.Rule AND t1.Template >= t2.Template
GROUP BY t1.Rule, t1.Template
ORDER BY t1.Rule, t1.Template;
EDIT: Added an alternative way to find the same data; may be more performant because of JOINing vs. subqueries.

Use: Count(*) AS Sequence
SELECT DISTINCT ValidationRule.ValidationCode AS Rule, Table.Template AS Template, Count(*) AS Sequence
FROM ValidationRule RIGHT JOIN (([Table] INNER JOIN TableVersion ON Table.TableID = TableVersion.TableID) INNER JOIN ValidationScope ON TableVersion.TableVID = ValidationScope.TableVID) ON ValidationRule.ValidationId = ValidationScope.ValidationID
GROUP BY ValidationRule.ValidationCode, Table.Template
ORDER BY ValidationRule.ValidationCode;

cannot create a single result set oracle joins or subqueries

I have three basic queries that are related to each other and I need a single result set to return.
Simplified:
Query1 (possible return not a show stopper return null):
SELECT * FROM monkey WHERE monkey.ENDDATE IS NULL AND monkey.TEMPLATEID = 1
Query2 (possible return not a show stopper return null):
SELECT * FROM banana WHERE banana.ENDDATE IS NULL AND banana.TEMPLATEID = 1
Query3 (must return something):
SELECT * FROM tree WHERE tree.TEMPLATEID = 1
Query 1 and 2 may or may not return a value (come back null).
The third one will need to return a result (or not) IF the third query returns something I and query 1 or 2 fail I still want to return something.
I can’t do an outer join with 2 queries, because oracle won’t let me the error said… a.b (+) = b.b and a.c(+) = c.c is not allowed instead turn b+c into a view.
I think I understand the logical reason, never-the-less I need to return query 3 and maybe query 1, 2 or 1 and 2 along with 3 as a single result set.
I hope this makes sense.

You seem to be hitting ORA-01417. Using made up tables and data as you haven't provided any, or the join conditions, I can get the same effect by trying to outer join monkey to both tree and banana - in a completely contrived way, of course:
with banana as (select 'yellow' as colour, 1 as template_id, null as enddate
from dual),
monkey as (select 'capuchin' as monkeytype, 1 as template_id, null as enddate
from dual),
tree as (select 'tropical' as treetype, 1 as template_id from dual)
select t.treetype, b.colour, m.monkeytype
from tree t, banana b, monkey m
where t.template_id = 1
and b.template_id (+) = t.template_id
and b.enddate (+) is null
and m.template_id (+) = t.template_id
and m.enddate (+) is null
and m.template_id (+) = b.template_id;
Error at Command Line:10 Column:22
Error report:
SQL Error: ORA-01417: a table may be outer joined to at most one other table
01417. 00000 - "a table may be outer joined to at most one other table"
*Cause: a.b (+) = b.b and a.c (+) = c.c is not allowed
*Action: Check that this is really what you want, then join b and c first
in a view.
if you use the 'new' (since 9i, I think) ANSI join syntax, rather the Oracle-specific (+) notation, you can do more:
with banana as (select 'yellow' as colour, 1 as template_id, null as enddate
from dual),
monkey as (select 'capuchin' as monkeytype, 1 as template_id, null as enddate
from dual),
tree as (select 'tropical' as treetype, 1 as template_id from dual)
select t.treetype, b.colour, m.monkeytype
from tree t
left join banana b on b.template_id = t.template_id
and b.enddate is null
left join monkey m on m.template_id = t.template_id
and m.enddate is null
and m.template_id = b.template_id
where t.template_id = 1;
TREETYPE COLOUR MONKEYTYPE
-------- ------ ----------
tropical yellow capuchin
See the documentation for some of the restrictions on (+); Oracle recommend using the ANSI version, though they seem to use their own most of the time in examples in the documentation.

I suspect, that the tables, "monkey", "banana", and "tree" all have different column types, otherwise you can easily obtain what you want using a "UNION" or "UNION ALL" operator.
Also since you haven't shared the table defn. of these tables, I don't know if you have any foreign keys to join these tables or not. SO I am assuming they are completely unrelated.
So with these 2 assumptions....
Here's another way, of returning combined resultsets from different queries.
SELECT CURSOR(QRY1), CURSOR(QRY2), CURSOR(QRY3) FROM DUAL;
This will give you all three result-sets in one query. But each result-set is a ref-cursor which you'll have to navigate in your application.
If using java, the java type of each of the 3 columns, is a ResultSet, which you'll have to navigate, to get individual query results.
e.g.
String qry1= "SELECT * FROM monkey WHERE monkey.ENDDATE IS NULL";
String qry2 = " SELECT * FROM banana WHERE banana.ENDDATE IS NULL AND banana.TEMPLATEID = 1";
String qry3= "SELECT * FROM tree WHERE tree.ENDDATE IS NULL AND tree.TEMPLATEID = 1";
String qry = "SELECT CURSOR("+qry1+"), CURSOR("+qry2+"), CURSOR("+qry3+") FROM DUAL";
ResultSet rs = conn.createStatement().executeQuery(qry);
if(rs.next()) { //no while loop necessary, as we are expecting only one row
ResultSet rs1 = (ResultSet) rs.getObject(1);
ResultSet rs2 = (ResultSet) rs.getObject(2);
ResultSet rs3 = (ResultSet) rs.getObject(3);
while(rs1.next()) {
// retrive results of qry1
}
// same for rs2 and rs3
}
Hope that helped.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL - identifying exact matches across multiple records - sql

You need a join. This will vary by SQL dialect. Something like: select child.column1, child.column2 from ( select column2 as parentsColumn2Value from child where column1='S1' ) as parentsColumn2Table left join child on parentsColumn2Table.column2=child.column2

ACE/Jet may not support FULL OUTER JOIN directly but the workaround is simple enough i.e. just UNION ALL the LEFT JOIN, INNER JOIN and RIGHT JOIN respectively of the tables.

Related

Sub-query works but would a join or other alternative be better?

Should I use an SQL full outer join for this?

Adding a WHERE clause to a Group Join

MS Access SQL: Enumerate results from a many-to-one relationship

cannot create a single result set oracle joins or subqueries

Categories

Resources