I'm trying to write a bit of SQL for SQLITE that will take a subset from two tables (TableA and TableB) and then perform a LEFT JOIN.
This is what I've tried, but this produces the wrong result:
Select * from TableA
Left Join TableB using(key)
where TableA.key2 = "xxxx"
AND TableB.key3 = "yyyy"
This ignore cases where key2="xxxx" but key3 != "yyyy".
I want all the rows from TableA that match my criteria whether or not their corresponding value in TableB matches, but only those rows from TableB that match both conditions.
I did manage to solve this by using a VIEW, but I'm sure there must be a better way of doing this. It's just beginning to drive me insane tryng to solve it now.
(Thanks for any help, hope I've explained this well enough).
YOu have made the classic left join mistake. In most databases if you want a condition on the table on the right side of the left join you must put this condition in the join itself and NOT the where clause. IN SQL Server this would turn the left join into an inner join. I've not used SQl lite so I don't know if it does the same but all records must meet the where clause generally.
Select *
from TableA
Left Join TableB on TableA.key = TableB.key
and TableB.key3 = "yyyy"
where TableA.key2 = "xxxx"
Related
I have two tables, Table A and Table B. Each table have 4 fields, the name of the fields are the same for both. Both tables are extracted from other tables, and each record acts as a primary key.
I want to write a query in MS Access 2010 that gets the data unique to Table B and not shared with Table A. I am using the following image as a reference, and it looks like I need to do a Right Join.
Hello. There is something not right with my SQL, I've tested it and I am getting the incorrect result. Below is the closest I've gotten:
SELECT DISTINCT TableB.*
FROM TableB RIGHT JOIN TableA ON (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
WHERE (((TableA.Field1) Is Null));
I think it would be clearer for you to use not exists:
select tableb.*
from tableb
where not exists (select 1
from tablea
where (TableB.Field1 = TableA.Field1) AND (TableB.Field2 = TableA.Field2) AND (TableB.Field3 = TableA.Field3) AND (TableB.Field4 = TableA.Field4)
);
Your use of RIGHT JOIN is incorrect. As phrased, you want a LEFT JOIN. That is, you want to keep all rows in the first table (the "left" table in the JOIN) regardless of whether or not a match exists in the second table. However, the NOT EXISTS does the same thing and the logic is a bit clearer.
You want to have right join if tablea is in your select statement, but as you have
SELECT DISTINCT TableB.*
you may want to have a left join instead. My suggestion would be changing your code from right to left join.
TableB acts like table A from venn diagrams above.
I'm wondering why the following two queries produce different results (the first query has more rows than second).
SELECT * FROM A
JOIN ...
JOIN ...
JOIN C ON ...
LEFT JOIN B ON B.id = A.id AND B.otherId = C.otherId
As opposed to:
SELECT * FROM A
JOIN ...
JOIN ...
JOIN C ON ...
LEFT JOIN B ON B.id = A.id
WHERE B.otherId = C.otherId
Please help me understand. In the second query, the left join has only 1 condition so shouldn't it include all the results from the first query and more (where the extra rows have unmatched otherId). Then the WHERE clause should ensure that the otherId matches, like in the first query. Why are they different?
The WHERE is performed first by the Query engine before performing the JOIN.
The reasoning being why do the expensive JOIN, if we are going to filter some rows later.
The query engines are pretty good at optimizing the query you write.
Also you will see this effect only in OUTER JOINs. In inner joins both WHERE and JOIN conditions behave the same.
The second query returns less rows because your where clause was filtering the records out, and this is essentially changing the query from a left outer join to an inner join. So, you need to be careful where you place your filters in, but this will not matter if you were to do an inner join.
You've received correct answers, but allow me to delve a little deeper into the difference between join criteria and filtering criteria. Take a simple query with a left join:
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key;
This lists out all of NonKey1 values from table a and any NonKey2 fields from table b with matching key values or NULL where there is no match. A common variant is to look at only those rows in a that do not have a match in b:
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key
where b.Key is null;
Careful! If you accidentally write where b.Key is not null you've just changed your outer join into a regular inner join. Do that sometime and see if QA can catch it. On second thought, don't. (Also, having b.NonKey2 in the selection list is meaningless as it can only ever be NULL, but let's leave it there for the moment.) The join is based on the key fields of both tables matching. After the joining is complete, all rows with a successful join are discarded and only the results without a match remain. That means b.Key in the join criteria cannot be NULL and in the filtering criteria must be NULL for a row to be added to the result set. Fine, that's what we wanted. But consider what would happen if we moved the check to become part of the join criteria.
select a.Key, a.NonKey1, b.NonKey2
from a
left join b on b.Key = a.Key and b.Key is null;
The result is everything from a with nothing at all from b. Probably not what we wanted. If you think about it, you will see we could just as well have written on 0 = 1 and gotten the same result. What we've done is move a value from one context where NULL means one thing (success) to a context where NULL means something entirely different (failure).
So, in computer languages as in human languages, be careful of context. It can completely change the meaning of what you're trying to say.
This question already has answers here:
SQL left join vs multiple tables on FROM line?
(12 answers)
Closed 8 years ago.
I'm curious as to why we need to use LEFT JOIN since we can use commas to select multiple tables.
What are the differences between LEFT JOIN and using commas to select multiple tables.
Which one is faster?
Here is my code:
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT no as nonvs,
owner,
owner_no,
vocab_no,
correct
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
...and:
SELECT *
FROM vocab_stats vs,
mst_words mw
WHERE mw.no = vs.vocab_no
AND vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
First of all, to be completely equivalent, the first query should have been written
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT *
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
So that mw.* and nvs.* together produce the same set as the 2nd query's singular *. The query as you have written can use an INNER JOIN, since it includes a filter on nvs.correct.
The general form
TABLEA LEFT JOIN TABLEB ON <CONDITION>
attempts to find TableB records based on the condition. If the fails, the results from TABLEA are kept, with all the columns from TableB set to NULL. In contrast
TABLEA INNER JOIN TABLEB ON <CONDITION>
also attempts to find TableB records based on the condition. However, when fails, the particular record from TableA is removed from the output result set.
The ANSI standard for CROSS JOIN produces a Cartesian product between the two tables.
TABLEA CROSS JOIN TABLEB
-- # or in older syntax, simply using commas
TABLEA, TABLEB
The intention of the syntax is that EACH row in TABLEA is joined to EACH row in TABLEB. So 4 rows in A and 3 rows in B produces 12 rows of output. When paired with conditions in the WHERE clause, it sometimes produces the same behaviour of the INNER JOIN, since they express the same thing (condition between A and B => keep or not). However, it is a lot clearer when reading as to the intention when you use INNER JOIN instead of commas.
Performance-wise, most DBMS will process a LEFT join faster than an INNER JOIN. The comma notation can cause database systems to misinterpret the intention and produce a bad query plan - so another plus for SQL92 notation.
Why do we need LEFT JOIN? If the explanation of LEFT JOIN above is still not enough (keep records in A without matches in B), then consider that to achieve the same, you would need a complex UNION between two sets using the old comma-notation to achieve the same effect. But as previously stated, this doesn't apply to your example, which is really an INNER JOIN hiding behind a LEFT JOIN.
Notes:
The RIGHT JOIN is the same as LEFT, except that it starts with TABLEB (right side) instead of A.
RIGHT and LEFT JOINS are both OUTER joins. The word OUTER is optional, i.e. it can be written as LEFT OUTER JOIN.
The third type of OUTER join is FULL OUTER join, but that is not discussed here.
Separating the JOIN from the WHERE makes it easy to read, as the join logic cannot be confused with the WHERE conditions. It will also generally be faster as the server will not need to conduct two separate queries and combine the results.
The two examples you've given are not really equivalent, as you have included a sub-query in the first example. This is a better example:
SELECT vs.*, mw.*
FROM vocab_stats vs, mst_words mw
LEFT JOIN vocab_stats vs ON mw.no = vs.vocab_no
WHERE vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
I've saw a join just like this:
Select <blablabla>
from
TableA TA
Inner join TableB TB on Ta.Id = Tb.Id
Inner join TableC TC on Tc.Id = Tb.Id and Ta.OtheriD = Tc.OtherColumn
But what's the point (end effect) of that second join clause?
What the implications when an outer join clause is used?
And, more important, what is the best to rewrite it in a way that is easy
to understand what it's trying to join?
And, more important, what is the best way to rewrite it to get rid of the construction
and mantain the correctness of the query.
I don't specify the RDBMS, because it's a more generic question, but for those
curious (since people always ask): it's SQL Server 2005.
EDIT: It's just a made up example (since I would have to dig the original source - which I don't have access anymore). I found the original join clause on a 10 join SELECT command.
It simply means you have an extra restriction on the intersection between tablea and tablec.
Because we know Ta.Id = Tb.Id, Tc.Id = Tb.Id is the same as Tc.Id = Ta.Id. Inner joins are associative. So it makes more sense like this so each join is between 2 tables only
Select <blablabla>
from
TableB TB
Inner join
TableA TA on Tb.Id = Ta.Id --a and b intersection
Inner join
TableC TC on Ta.Id = Tc.Id and Ta.OtheriD = Tc.Column --a and c intersection
Your Q : But what's the point (end effect) of that second join clause?
Effectively filters rows...you could move the second half of the on statement into the where clause if you really want, only really effects readability. gbn's answer looks good for this 3 table example,but to expand on it...sometimes a rewrite like this isn't possible. I have seen an occasion where 2 different systems (one oracle 8i and one SQL server 2000) had their databases joined together. A 3 part key was identified as being required to make the records unique in both systems, but each component of the 3 part key was held in different tables...the final result had a few joins like that.
Functionally...I'm not sure if there's a difference really. Unless I'm completely off, readability seems to be the biggest difference.
Your Second Q: What the implications when an outer join clause is used?
You'll potentially get a bunch of nulls (pending how you setup the outer join) while the inner join would have dropped them. Be careful though...inner joins is associative...as gbn put it: An OUTER JOIN is different and order does matter
The user may want to furthur filter the set of rows which are included in the Join set...
The point of the second join is to further limit your result set based on the contents of TableC. The first join gives you ONLY records that exist in TA and TB. The second join gives you ONLY results from the first join that also exist in TC.
I have a query that shows me a listing of ALL opportunities in one query
I have a query that shows me a listing of EXCLUSION opportunities, ones we want to eliminate from the results
I need to produce a query that will take everything from the first query minus the second query...
SELECT DISTINCT qryMissedOpportunity_ALL_Clients.*
FROM qryMissedOpportunity_ALL_Clients INNER JOIN qryMissedOpportunity_Exclusions ON
([qryMissedOpportunity_ALL_Clients].[ClientID] <> [qryMissedOpportunity_Exclusions].[ClientID])
AND
([qryMissedOpportunity_Exclusions].[ClientID] <> [qryMissedOpportunity_Exclusions].[BillingCode])
The initial query works as intended and exclusions successfully lists all the hits, but I get the full listing when I query with the above which is obviously wrong. Any tips would be appreciated.
EDIT - Two originating queries
qryMissedOpportunity_ALL_Clients (1)
SELECT MissedOpportunities.MOID, PriceList.BillingCode, Client.ClientID, Client.ClientName, PriceList.WorkDescription, PriceList.UnitOfWork, MissedOpportunities.Qty, PriceList.CostPerUnit AS Our_PriceList_Cost, ([MissedOpportunities].[Qty]*[PriceList].[CostPerUnit]) AS At_Cost, MissedOpportunities.fBegin
FROM PriceList INNER JOIN (Client INNER JOIN MissedOpportunities ON Client.ClientID = MissedOpportunities.ClientID) ON PriceList.BillingCode = MissedOpportunities.BillingCode
WHERE (((MissedOpportunities.fBegin)=#10/1/2009#));
qryMissedOpportunity_Exclusions
SELECT qryMissedOpportunity_ALL_Clients.*, MissedOpportunity_Exclusions.Exclusion, MissedOpportunity_Exclusions.Comments
FROM qryMissedOpportunity_ALL_Clients INNER JOIN MissedOpportunity_Exclusions ON (qryMissedOpportunity_ALL_Clients.BillingCode = MissedOpportunity_Exclusions.BillingCode) AND (qryMissedOpportunity_ALL_Clients.ClientID = MissedOpportunity_Exclusions.ClientID)
WHERE (((MissedOpportunity_Exclusions.Exclusion)=True));
One group needs to see everything, the other needs to see things they havn't deamed as "valid" missed opportunity as in, we've seen it, verified why its there and don't need to bother critiquing it every single month.
Generally you can exclude a table by doing a left join and comparing against null:
SELECT t1.* FROM t1 LEFT JOIN t2 on t1.id = t2.id where t2.id is null;
Should be pretty easy to adopt this to your situation.
Looking at your query rewritten to use table aliases so I can read it...
SELECT DISTINCT c.*
FROM qryMissedOpportunity_ALL_Clients c
JOIN qryMissedOpportunity_Exclusions e
ON c.ClientID <> e.ClientID
AND e.ClientID <> e.BillingCode
This query will produce a cartesian product of sorts... each and every row in qryMissedOpportunity_ALL_Clients will match and join with every row in qryMissedOpportunity_Exclusions where ClientIDs do not match... Is this what you want?? Generally join conditions are based on a column in one table being equal to the value of a column in the other table... Joining where they are not equal is unusual ...
Second, the second iniquality in the join conditions is between columns in the same table (qryMissedOpportunity_Exclusions table) Are you sure this is what you want? If it is, it is not a join condition, it is a Where clause condition...
Second, your question mentions two queries, but there is only the one query (above) in yr question. Where is the second one?