How do I invert my join critera in TSQL? - sql

I think this is a relatively basic operation in SQL, but I'm having a hard time figuring it out.
I have two tables, a source table I'm trying to SELECT my data from, and a reference table containing serial #'s and transaction #'s (the source table also has these columns, and more). I want to do two different SELECTs, one where the serial/trans number pair exist in the source table and one where the serial/trans number do not exist. (the combination of serial and trans number are the primary keys for both these tables)
Intially I'm doing this with a join like this:
SELECT * FROM source s
INNER JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
I would think this should give me everything from the source table that has a serial/trans pair matching with one in the reference table. I'm not positive this is correct, but it is returning a reasonable number of results and I think it looks good.
When I go to do the opposite, get everything from source where the serial/trans pair do not match up with one in reference, I encounter a problem. I tried the following query:
SELECT * FROM source s
INNER JOIN reference r ON s.serial <> r.serial AND s.trans <> r.trans
When I run this the query goes on forever, it starts returning way more results than it should, more than are actually in the entire source table. It eventually ends with an OOM exception, I let it run for 20 min+. For some persepctive the source table I'm dealing with has about 13 million records, and reference table has about 105,000.
So how do I get the results I'm looking for? If it is not already clear, the number of results from the first query + results from second query should equal the total number of records in my source table.
Thanks for any help!

I think you need something like NOT EXISTS:
SELECT *
FROM source s
WHERE NOT EXISTS (SELECT 1
FROM reference r
WHERE s.serial = r.serial AND s.trans = r.trans)
The above query will get everything from source where the serial/trans pair do not match up with one in reference.
As cited in comment from #Dan below NOT EXISTS generally has a performance advantage over LEFT JOIN in a situation like this (see this for example).

"LEFT JOIN" match records that match together, and return NULL for those that doesn't fit.
So, non-matching records all have NULL in column coming from reference table. This way, you can than add a where clause to return only records with null value in a non-nullable column of table "reference" (like the primarykey).
SELECT * FROM source s
LEFT JOIN reference r ON s.serial = r.serial AND s.trans = r.trans
WHERE R.serial IS NULL

Related

Abap subquery Where Cond [duplicate]

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.
In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = XXXX-foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3
tables.
Additional requirements: also exclude based on alternative
payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance.
than any smarter looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both not in/not exits work. One might be better than the other, depending on filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null
if it is possible to use results of a sub-query to check more than
than 1 field or use another form of excluding join logic on multiple
fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar
subquery.
Scalar is keyword here, i.e. it should return exactly one column.
Your subquery can have multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < #wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = #wa-carrid AND
connid = #wa-connid )
however you say that these two fields should be checked against different tables
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

Databricks, comparing two tables to see which records are missing

I'm looking into two tables that are supposed to be equal. I run this query to see which records are missing in table B against table A (we have a 3-columned key):
select *
from tableA A
left join TableB B
on A.joinField1 = B.joinField1
and A.joinField2 = B.joinField2
and A.joinField3 = B.joinField3
where B.joinField1 is null
or B.joinField2 is null
or B.joinField3 is null
This way, should a record in A be missing in B, it gets filtered in this query (based on the key).
For some reason, when I randomly pick one of these missing records and look it up directly in table B (with a simple select, filtered on the key), it shows up.
Why does my query include them when actually there is a match? there are no null values and fields formats match.
We can use EXCEPT command for this . EXCEPT and EXCEPT ALL return the rows that are found in one relation but not the other. EXCEPT (alternatively, EXCEPT DISTINCT) takes only distinct rows while EXCEPT ALL does not remove duplicates from the result rows. Note that MINUS is an alias for EXCEPT. You can ref link

Duplicate Rows when self joining tables in SQL

I am trying to self join a table together based on the column "Warehouse Number". The goal is to list part numbers, descriptions, and item class of any pairs of parts that are in the same item class and same warehouse. Below is an example of the desired output and starting data.
STARTING DATA
EXAMPLE OF SOME DESIRED DATA
However, when that self join happens, there aren't "exact" duplicates but the pairs appear twice in the table.
EXAMPLE OF OUTPUT WITH PROBLEMS (HIGHLIGHTED)
I have tried most iterations of UNION, INNER JOIN, and other join methods. Is it possible to remove the pairs since it isn't technically an exact duplicate of another row?
Current SQL Code
You may alter your join condition to check that the first part number is strictly less than the second one:
SELECT
t1.PARTNUMB, t1.PARTDESC, t1.ITEMCLSS, t2.PARTNUMB, t2.PARTDESC, t2.ITEMCLSS
FROM PARTFIRST t1
INNER JOIN PARTSECOND t2
ON t1.WRHSNUMB = t2.WRHSNUMB AND
t1.ITEMCLSS = t2.ITEMCLSS AND
t1.PARTNUMB < t2.PARTNUMB;
The problem with using FIRST.PARTNUMB <> SECOND.PARTNUMB is that it would report two different part numbers twice, once on the left/right side and vice-versa. By using a strictly less than inequality, we exclude "duplicates," as you view them.

Newbie to SQL I have run the the inner join query but result comes up with columns only

I have run this query in adventureworks but the result is run successfully but i only get the columns instead of the data with columns how so?
select
a.BusinessEntityID,b.bonus,b.SalesLastYear
from
[Sales].[SalesPersonQuotaHistory] a
inner join
[Sales].[SalesPerson] b
on
a.SalesQuota = b.SalesQuota
My best guess is that instead of joining the tables on SalesQuota, you should be joining them on something else - An ID field, typically.
I don't have Adventureworks here, but judging from the names of the tables and the columns that you've provided, I would assume that there's a SalesPersonID field of some sort that actually connects a Salesperson's quota history to the Salesperson him/herself.
I would expect that you're looking for something closer to this:
SELECT
a.BusinessEntityID
,b.bonus
,b.SalesLastYear
FROM [Sales].[SalesPersonQuotaHistory] a
INNER JOIN [Sales].[SalesPerson] b
ON a.SalesPersonID = b.SalesPersonID
General Knowledge:
INNER JOIN means "Show me only entries (rows) that have a matching value on both sides of the condition." (i.e. The value in Table A matches the value in Table B).
So ON a.SalesQuota = b.SalesQuota means "Only where the value of SalesQuota in Table A matches the value of SalesQuota in Table B."
I'm not sure what the purpose of this query could be, since it is entirely possible that two salespeople have the same values in both tables, and then you would get duplicate rows (because the values of SalesQuota would match in both cases), or that the values wouldn't match at all, and then you wouldn't get any rows - I suspect that is what's happening to you.
Consider the conditions of what you're trying to join. Are you really trying to join quota amounts, or are you trying to retrieve quota information for specific salespeople? The answer should help guide your JOIN conditions.

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return less rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected
4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.