Multi Table NOT EQUAL in Access Query - sql

I have two tables. VEHICLES and OWNERSHIP. I am trying to make a query that will give me a list of all VEHICLES NOT in the OWNERSHIP table. I basically need a report of my available VEHICLE inventory. I tried this query:
SELECT VEHICLE.*
FROM VEHICLE, OWNERSHIP
WHERE (VEHICLE.VEH_ID <> OWNERSHIP.VEH_ID);
Im getting:
When I do an equal I get all vehicles which are listed in the ownership so that works. But the NOT Equal does not. Any ideas?

Try
SELECT VEHICLE.*
FROM VEHICLE
WHERE NOT EXISTS
(SELECT NULL FROM OWNERSHIP WHERE VEHICLE.VEH_ID= OWNERSHIP.VEH_ID);

The NOT EXISTS approach can be slow if your tables contain many rows. An alternative approach which can be much faster is to use a LEFT JOIN with a WHERE clause to return only the rows where the right-hand join field is Null.
SELECT VEHICLE.*
FROM
VEHICLE AS v
LEFT JOIN OWNERSHIP AS o
ON v.VEH_ID = o.VEH_ID
WHERE o.VEH_ID Is Null;
You could use Access' "Find Unmatched Query Wizard" to create a similar query.
If both tables are small you probably won't notice a difference. But it should be easy to check whether the difference is noticeable. And this approach will serve you better if your tables grow substantially over time.

Related

Abap subquery Where Cond [duplicate]

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.
In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = XXXX-foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3
tables.
Additional requirements: also exclude based on alternative
payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance.
than any smarter looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both not in/not exits work. One might be better than the other, depending on filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null
if it is possible to use results of a sub-query to check more than
than 1 field or use another form of excluding join logic on multiple
fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar
subquery.
Scalar is keyword here, i.e. it should return exactly one column.
Your subquery can have multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < #wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = #wa-carrid AND
connid = #wa-connid )
however you say that these two fields should be checked against different tables
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

DB2 Performance CASE vs COALESCE

I'm modifying an existing statement that joins user information in one table so that the user info can come from another table. One is a permanent table and the other is temporary (records get moved from one to the other). I changed my join to a left join and then left joined the second contact info table. I need to select the permanent field if it exists and the temporary if the permanent isn't there. 154306 is the user id of all incoming records on the main table I'm selecting from. Here are my 2 options for selecting fields:
SELECT
CASE WHEN U.USRID = 154306
THEN T.TMPFNAME
ELSE U.FNAME
END AS FNAME,
COALESCE (U.LNAME, T.TMPLNAME) AS LNAME
FROM FILES.ORDERS O
LEFT JOIN FILES.USERS U ON U.USRID <> 154306 AND U.USRID = O.ORDUSR
LEFT JOIN FILES.TMPUSERS T ON O.ORDNUM = T.TMPORD
I'm thinking the case seems more "correct" as it's actually controlling the flow, but since the coalesce has less logic to follow it might perform faster. Either should accomplish the same result because the 2 left joins ensures we'll get the info for the user no matter what, but don't get the permanent user info for orders which are still assigned to the temp user. It looks like we have 10 fields to case/coalesce so I'm thinking the method with better performance is the way to go, which I think is coalesce but I'm not even sure on that. Is either way better for any reason?
The performance of case versus coalesce() just will not make a difference to a query that is joining three large tables. Such queries are dominated by the time for reading and matching the rows in the table.
By the way, the two are not exactly the same. If you have NULL values in users.Fname, then the case logic would keep them but the coalesce() logic would fill in the values from the other table.
Your criterion should be clarity of expression. Because you think the case makes more sense, I would suggest you go with that.

INNER JOIN vs multiple table names in "FROM" [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes for inner joins it will return the same results. However, it is subject to inadvertent cross joins especially in complex queries and it is harder for maintenance because the left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explict joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separates the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... thats describing the task at hand, not the database structure.
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and whats in the WHERE clause. But, that typically wont happen if the on clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
there is no difference as far as I know is the second one with the inner join the new way to write such statements and the first one the old method.
The first one does a Cartesian product on all record within those two tables then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempt on a Cartesian product and will result in the same query more or less.
A bit same. Can help you out.
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual exection plan to see what the server's doing internally.

Understanding how JOIN works when 3 or more tables are involved. [SQL]

I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]
If I compose a query e.g
SELECT *
FROM A JOIN B
ON A.AId = B.AId
All good - I'm sweet with how this works.
What happens when Table C (Or some other D,E, .... gets added)
In the situation
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values therein)?
Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
If I was trying to find a list of all the opening and closing balances for an account
I might do something like
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
I know that the example tables do not conform to good DB design
Apologies for the essay, thanks for any elightenment :)
Edit in response to question by Marc
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
Conceptually here is what happens when you join three tables together.
The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
The query execution engine applies any predicates (WHERE clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in the JOIN conditions or the SELECT list or the ORDER BY list. Call this result A
It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
Then it joins in the final table and applies the ORDER BY
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
We know that the data from B is going to be filtered by the (inner) join to A (the data in A is also filtered). So if we (inner) join from B to C, thus the set C is also filtered by the relationship to A. And note also that any duplicates from the join will be included.
However; what order this happens in is up to the optimizer; it could decide to do the B/C join first then introduce A, or any other sequence (probably based on the estimated number of rows from each join and the appropriate indexes).
HOWEVER; in your later example you use a LEFT OUTER join; so Account is not filtered at all, and may well my duplicated if any of the other tables have multiple matches.
Are there duplicates (per account) in BalanceToken?
I often find it helps to view the actual execution plan. In query analyser/management studio, you can turn this on for queries from the Query menu, or use Ctrl+M. After running the query, the plan that was executed is shown in another result tab. From this you'll see that C and B are joined first, and then the result is joined with A. The plan might vary depending on information the DBMS has because both joins are inner, making it A-and-B-and-C. What I mean is that the result will be the same regardless of which is joined first, but the time it takes might differ greatly, and this is where the optimiser and hints come into play.
Joins can be tricky, and much of the behavior is of course dictated by how the data is stored in the actual tables.
Without seeing the tables it's hard to give a clear answer in your particular case but I think the basic issue is that you are summing over multiple result sets that are being combined into one.
Perhaps instead of multiple joins you should make two separate temporary tables in your query, one with the accountID, date and sum of openingbalances, a second one with the accountID, date and sum of closing balances, then joining those two on AccountID and date.
In order to find out exactly what is happening with joins, also in your specific case, I would do the following:
Change the initial part
SELECT accountID Accountbalancedate, sum(...) as openingbalance,
sum(...) as closingbalance FROM
to simply
"SELECT * FROM"
Study the resulting table, and you will see exactly what data is being duplicated. Remove the joins one by one and see what happens. This should give you a clue to what it is about your particular data that is causing the dupes.
If you open the query in SQL server management studio (Free version exists) you can edit the query in the designer. The visual view of how the tables are being joined might also help you realize what's going on.

Object Relational Mapping Issues: Suggestions needed

I've been trying to come up with a good design pattern for mapping data contained in relational databases to the business objects I've created but I keep hitting a wall.
Consider the following tables:
TYPE: typeid, description
USER: userid, username, usertypeid->TYPE.typeid, imageid->IMAGE.imageid
IMAGE: imageid, location, imagetypeid->TYPE.typeid
I would like to gather all the information regarding a specific user. Creating a query for this isn't too difficult.
SELECT u.*, ut.*, i.*, it.* FROM user u
INNER JOIN type ut ON ut.typeid = u.usertypeid
INNER JOIN image i ON i.imageid = u.imageid
INNER JOIN type it ON it.typeid = i.imagetypeid
WHERE u.userid = #userid
The problem is that the field names collide and then I'm forced to alias every single field which gets out of hand very quickly.
Does anyone have a decent design pattern for this kind of thing?
I've thought about retrieving multiple results from a single stored procedure and then using a dataset to iterate through each one but I'm worried that some performance issues might bite me in the butt later. For example instead of the above query something like:
SELECT u.*, t.* FROM user u
INNER JOIN type t ON t.typeid = u.usertypeid
WHERE u.userid = #userid;
SELECT i.*, t.* FROM image i
INNER JOIN type t ON t.typeid = i.imagetypeid
INNER JOIN user u ON u.imageid = i.imageid
WHERE u.userid = #userid;
Does that seem like a decent solution? Can anyone foresee any issues with this approach?
Never use the SQL * wildcard in production code. Always spell out all the columns you want to retrieve.
Then aliasing some of them doesn't seem like such a huge amount of extra work.
Re your comment asking for background and reasoning:
Sometimes you don't really need every column from all tables, and fetching them can be needlessly costly (especially for large strings and blobs). There is no SQL syntax for "all columns except the following exceptions."
You can't alias columns that you fetch using the wildcard. Once you need to alias any of the columns, you need to expand the wildcard to list all the columns explicitly.
If the table structure changes, e.g. columns are renamed, reordered, dropped, or added, then the wildcard fetches them all, by position as defined in the tables. This may seem like a convenience, but not when your application depends on columns being in the result set by a given name or in a given position. You can get mysterious bugs where your application displays columns in the wrong order (if referencing columns by position), or shows them as blank (if referencing columns by name).
However, if the SQL query names columns explicitly, you can employ the "Fail Early" principle. This helps debugging, because it leads you directly to the SQL query that needs to be edited to account for the schema change.