SQL Server 2016 AdventureWorks NULL checks - sql

I am working on 2 different assignments where I have to do null checks, but I'm not sure if I have written the syntax correctly for that. My instructor has not really discussed this but will be marking for it.
Below are the 2 questions and what I have written. Any help is appreciated.
Assignment 1 question: Create a list of the sales order numbers for orders not ordered online and not with a credit card. Note: 0 is false and 1 is true for bit fields. Below is the syntax I used; am I doing a null check here?
SELECT SalesOrderNumber
FROM Sales.SalesOrder_json
WHERE OnlineOrderFlag = 0 AND CreditCardID IS NULL
Assignment 2 question: list the vendors that have no products. Below is the syntax I used, am I doing a null check here?
SELECT
pv.Name AS Vendors,
COUNT(PP.ProductID) AS 'Products'
FROM
Purchasing.Vendor AS PV
LEFT JOIN
Purchasing.ProductVendor AS PPV ON PV.BusinessEntityID = PPV.BusinessEntityID
LEFT JOIN
Production.Product AS PP ON PP.ProductID = PPV.ProductID
GROUP BY
PV.Name
HAVING
COUNT(PP.ProductID) = 0;

Welcome to Stack Overflow!
In the future, please post a summary or CREATE TABLE statements that represent the schema of the tables used in your queries, so that we have enough information to provide more than speculative responses. Even though this is the AdventureWorks DB, you should start your SO journey with good habits!
Please try not to post assignment questions directly online, as you can easily get flagged for plagiarism by academic assignment checkers: other students may see your post and the help you get from the community, which could result in all of you handing in the same answer.
Have you run your queries? Do you think the results are correct?
If the results from your queries are correct, then the only remaining issue is "have you done any null checks?" One could say that if your queries return the correct results then you must have satisfied the criteria; otherwise the question wasn't formulated very well.
Null checks can be summarised into 3 patterns (a combined sketch follows this list):
You directly compare against null using IS NULL or IS NOT NULL in your query
Use of JOIN syntax to deal with data that may have nulls.
INNER JOIN will limit the results to only records that match in both tables. Use this if you need to omit records that have a null in the foreign key field.
Non-INNER joins, like LEFT JOIN. These will return rows from the left table even if there are no matching records in the joined (right) table.
This is a good discussion on all supported joins: LEFT JOIN vs. LEFT OUTER JOIN in SQL Server
Use of aggregate functions. Aggregates will generally ignore NULL values; COUNT will return 0 if all values are NULL, whereas other aggregates such as SUM, MIN, MAX and AVG will return NULL if all values are NULL.
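As a rough combined illustration of all three patterns (a sketch only, using two hypothetical tables parent and child, where child.parent_id is nullable):
-- Pattern 1: compare directly against NULL
SELECT * FROM child WHERE parent_id IS NULL;
-- Pattern 2: a LEFT JOIN keeps parents with no children; the child columns come back as NULL
SELECT p.id, c.id AS child_id
FROM parent AS p
LEFT JOIN child AS c ON c.parent_id = p.id;
-- Pattern 3: aggregates ignore NULLs, so COUNT(c.id) is 0 for parents with no children
SELECT p.id, COUNT(c.id) AS child_count
FROM parent AS p
LEFT JOIN child AS c ON c.parent_id = p.id
GROUP BY p.id;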
Question 1
Clearly you have implemented a NULL check because you have evaluated criteria directly on the nullable column.
It looks like your answer to Question 1 is pretty good.
Question 2
While your query looks like it would return the vendors with no products, it also returns a count column that will always be zero.
You do not need to output a column in order to use it in the filter criteria, so remove COUNT(PP.ProductID) AS 'Products' unless you have been instructed otherwise.
Is this a NULL check? That is up to interpretation; I think in this case the answer is yes. By using LEFT JOIN (or other OUTER joins) you have created a result set in which the field PP.ProductID has a value of NULL if there are no products.
Using COUNT in the filter criteria over that nullable column, and recognising that a count of zero means the ProductID column was in fact NULL, means you have effectively evaluated a null check.
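For comparison, a more direct null check over the same join would filter on the nullable column itself. A sketch, assuming the standard AdventureWorks schema and keeping your aliases:
SELECT PV.Name AS Vendors
FROM Purchasing.Vendor AS PV
LEFT JOIN Purchasing.ProductVendor AS PPV
    ON PV.BusinessEntityID = PPV.BusinessEntityID
WHERE PPV.ProductID IS NULL;  -- NULL here means there was no matching ProductVendor row, i.e. no products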
There are other ways to query for the same results, such as using NOT EXISTS. NOT EXISTS would not be a direct null check, because nullability is not evaluated directly.
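For reference, a NOT EXISTS version might look like this (again only a sketch against the same AdventureWorks tables):
SELECT PV.Name AS Vendors
FROM Purchasing.Vendor AS PV
WHERE NOT EXISTS (
    SELECT 1
    FROM Purchasing.ProductVendor AS PPV
    WHERE PPV.BusinessEntityID = PV.BusinessEntityID
);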

Related

Abap subquery Where Cond [duplicate]

I have a requirement to pull records that do not have history in an archive table. Two fields of one record need to be checked for in the archive.
In a technical sense my requirement is a left join where the right side is 'null' (a.k.a. an excluding join), which in ABAP OpenSQL is commonly implemented like this (for my scenario anyway):
Select * from xxxx  "xxxx is the result of a multiple-table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding join logic on multiple fields using SQL (without involving the application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx  "xxxx is the result of multiple inner joins and 1 left join (1-* relation)
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
The many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produced a lot of unnecessary work. For instance: a single customer with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving the left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, NOT EXISTS ( select .. from table where id = a.id or id = b.id or ... ) appears to be the best (a generic sketch follows below).
The moment your exclusion criteria involve a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with one well-written sub-query).
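A generic SQL sketch of that multi-field NOT EXISTS exclusion (main_table, archive_table and the column names are hypothetical placeholders):
SELECT m.*
FROM main_table AS m
WHERE NOT EXISTS (
    SELECT 1
    FROM archive_table AS a
    WHERE a.archive_key = m.main_key
       OR a.archive_key = m.foreign_key  -- one subquery checks both fields
);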
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select headers~key
from headers left join items on headers~key = items~key
where items~key is null
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in a subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. it should return exactly one column.
Your subquery can have a multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < @wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = @wa-carrid AND
connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic; you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

PostgreSQL: SQL JOIN with "Lookahead" Condition in ON or WHERE Clause

I am not sure how best to describe the problem I have, but it feels very much like an SQL query need for a lookahead condition such as those in a regular expression :). Pardon my verbosity as I try to find a way to express this problem.
I have a relatively complex schema of medical services, patient info, patient payment types (including insurances, workers comp info, etc), and medical sales commission earnings. Naturally there are sales reporting tools. What follows is a very oversimplified version of a query involving varying JOIN conditional cases. Most of this query is just context for the clause my question focuses on, the 2nd-to-last WHERE condition (which defines JOIN conditions):
SELECT vendor_services.*, patients.*, payments.*, vendor_products.*, products.*, vendor_product_commissions.*, vendor_commissions.*, commission_earners.*, users.*
FROM vendor_services
JOIN patients ON patients.vendor_id = vendor_services.vendor_id
JOIN payments ON payments.patient_id = patients.id
JOIN payment_types ON payment_types.id = payments.payment_type_id
JOIN vendor_products ON vendor_products.id = vendor_services.vendor_product_id
JOIN products ON products.id = vendor_products.product_id
JOIN vendor_product_commissions ON vendor_product_commissions.vendor_product_id = vendor_products.id
JOIN vendor_commissions ON vendor_commissions.id = vendor_product_commissions.vendor_commissions_id
LEFT JOIN commission_earners ON commission_earners.id = vendor_commissions.commission_earners_id
JOIN users ON commission_earners.user_id = users.id
WHERE
vendor_services.state != 'In Progress'
AND
vendor_services.date BETWEEN :datetime_1 AND :datetime_2
AND
vendor_commissions.start_date > :datetime_1
AND
vendor_commissions.end_date < :datetime_2
AND
vendor_product_commissions.payment_type = payment_types.type
AND
payments.transaction_type = 'Paid'
GROUP BY
....
Again, this is very oversimplified: the SELECT clause is far more complex, as are the GROUP BY and ORDER clauses, performing CASE statements and aggregate calculations, etc. I have left out many other tables which represent other systems within the overall application, and focused just on the data and clauses that are relevant. My question is in regards to a needed change to this particular WHERE condition regarding the following JOIN:
WHERE ... AND vendor_product_commissions.payment_type = payment_types.type
There has been an introduction of a new possible vendor_product_commissions.payment_type value that is not a member of the payment_types.type values. With the SQL query exactly as is, it no longer selects rows in most cases, as much of the LEFT table will be using this new value. When an OR clause is added, duplicate rows are selected when only one row should be selected:
WHERE ... AND vendor_product_commissions.payment_type = payment_types.type OR vendor_product_commissions.payment_type = 'DEFAULTVALUE'
What I need is to JOIN only on the row where vendor_product_commissions.payment_type = payment_types.type, unless that produces NULL, in which case I need to perform the JOIN on vendor_product_commissions.payment_type = 'DEFAULTVALUE'.
I can do this programmatically with the ORM code surrounding this query, but that is very inefficient for a very large reporting system (essentially, query first for the specific type, then if none is returned, query again for the "default" type).
I don't believe this feature exists in PostgreSQL, but that's why I am describing it as a "lookahead JOIN" problem - I need a sort of CASE statement where, if the first JOIN condition produces a NULL relation, the subsequent JOIN (OR) condition is used to match against this newly introduced value ('DEFAULTVALUE'). Can this be done in raw SQL? Or do I need to break this whole query apart and perform the selection of services and related data, and then programmatically / iteratively relate it (via ORM/application language code) to the sales commission data? I have a strong hunch that the query can be modified to do this, but without knowing a particular label or term for this problem, I am having a hard time searching for a possible SQL-based solution.
This is for a Ruby on Rails 4 application, using ActiveRecord, though the SQL JOIN statements are all in plaintext / strings since AR doesn't provide methods for LEFT JOIN (again, there are more types of JOIN statements involved than those listed above). I am not sure if Rails is relevant to my question, but I figured I would mention it.
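For what it's worth, one pattern sometimes used in PostgreSQL for this kind of "exact match, otherwise default" requirement is a lateral join that prefers the exact match. The sketch below uses simplified, partly hypothetical table shapes rather than the full query above:
-- Sketch only: pick the commission row matching the payment type, falling back to 'DEFAULTVALUE'
SELECT p.id AS payment_id, vpc.*
FROM payments AS p
JOIN payment_types AS pt ON pt.id = p.payment_type_id
LEFT JOIN LATERAL (
    SELECT c.*
    FROM vendor_product_commissions AS c
    WHERE c.payment_type IN (pt.type, 'DEFAULTVALUE')
    ORDER BY (c.payment_type = pt.type) DESC  -- true sorts first, so the exact match wins over the default
    LIMIT 1
) AS vpc ON true;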

Setting the ID in a fact table from a dimension table

In my dimension table for abandoned calls I have ID 1 = Code NO and ID 2 = Code YES.
I am wanting to load these ID's into the fact table based on whether or not the call was abandoned using a join.
However, the problem I'm having is that the Abandoned value in my database is NULL for NO and 1 for YES.
So when I join
INNER JOIN datamartend.dbo.Abandoned_Call_Dim
ON incoming_measure.Abandoned = Abandoned_Call_Dim.abandoned_code
It's pulling no results?
Any ideas around this?
Basically what is needed is:
I want the abandoned ID from the abandoned dimension to be 1 if the abandoned value in the measure is null, and abandoned ID 2 if it is not null.
Thanks
You can use a CASE WHEN expression to get around this (or ISNULL, but CASE WHEN is more portable across different DB engines):
INNER JOIN datamartend.dbo.Abandoned_Call_Dim
ON case when incoming_measure.Abandoned is null then '0'
else incoming_measure.Abandoned end
= case when Abandoned_Call_Dim.abandoned_code is null then '0'
else Abandoned_Call_Dim.abandoned_code end
This will replace nulls with 0. As long as you don't have a 0 code, you should be fine. If you do, try -1, or some other value you know is not in the possible set of codes.
Another thing to do if you have an unknown set of codes would be to do the join and add:
OR (incoming_measure.Abandoned is null and Abandoned_Call_Dim.abandoned_code is null)
This doesn't technically join - it cross joins the null records (and as long as there's only one NULL that matters on the abandoned call dim, you're fine).
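Put together, that second suggestion might look something like this (a sketch; the dimension's ID column name is assumed):
SELECT incoming_measure.*, Abandoned_Call_Dim.ID AS abandoned_id  -- ID column name is a guess
FROM incoming_measure
INNER JOIN datamartend.dbo.Abandoned_Call_Dim
    ON incoming_measure.Abandoned = Abandoned_Call_Dim.abandoned_code
    OR (incoming_measure.Abandoned IS NULL AND Abandoned_Call_Dim.abandoned_code IS NULL);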
Can you check whether it is possible for you to use the DECODE function for the ID before doing the join:
Decode(value) = joining column
or try using
COALESCE(REPLACE(COL, VAL_TO_BE_REPLACED_IF_NOT_NULL), VALUE_TO_REPLACE_WHEN_NULL)
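For example, a COALESCE-based version of the same join might look like this (a sketch; 0 is used as a sentinel on the assumption that 0 is never a real code):
FROM incoming_measure
INNER JOIN datamartend.dbo.Abandoned_Call_Dim
    ON COALESCE(incoming_measure.Abandoned, 0) = COALESCE(Abandoned_Call_Dim.abandoned_code, 0)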

Access 2010 Query Confusion

Just need some quick clarification
I have 2 Queries in my Access Database that should return Inverse results:
SELECT Equipment.title
FROM Equipment
WHERE (((Equipment.[EquipmentID]) Not In (
select EquipmentID
from DownPeriod
where UpDate is null
)));
The 2nd just excludes the Not before the In.
My Confusion comes from the fact that the query posted above does not return any results if an EquipmentID field has at least 1 null value in the DownPeriod table.
It works fine if the fields are filled, and the inverse query list always works. This makes me think there's an issue with the null value.
Now this field should never be null but I wanted to know if I could still get this to work in the unlikely event a null did occur.
Thank you in advance!
Try joins:
SELECT Equipment.title FROM Equipment INNER JOIN DownPeriod
ON Equipment.EquipmentID = DownPeriod.EquipmentID
WHERE DownPeriod.UpDate is null
and
SELECT Equipment.title FROM Equipment INNER JOIN DownPeriod
ON Equipment.EquipmentID = DownPeriod.EquipmentID
WHERE DownPeriod.UpDate is not null
See if a change in syntax fixes your issue.
Not only should this work, but I believe it is faster practice than using the IN() / NOT IN() methods (might be wrong on that, but it looks nicer to read). It also adds the ability to quickly change the "is not null" criteria, just the same as IN -> NOT IN.
I agree with StuckAtWork's approach. However, if you still want to understand why your original approach didn't produce the results you want, I think I can help you.
There may be an issue with empty strings which could complicate the situation. But regardless of whether or not empty strings are involved you have something more fundamental to consider.
Here is my version of the Equipment table.
EquipmentID  title
1            one
2            two
3            three
And here is my version of the DownPeriod table.
ID  EquipmentID  text_field
1   1            one
2   2            two
3   Null
4   3            three
I didn't include your UpDate field in my DownPeriod table. It's irrelevant to your problem.
I pasted your SQL into a new Access query, discarded the WHERE clause from the subquery, and got exactly the same result as this query --- no rows returned:
SELECT e.title
FROM Equipment AS e
WHERE
e.EquipmentID Not In (
SELECT EquipmentID
FROM DownPeriod
);
So consider this situation from the db engine's perspective. Using my version of the DownPeriod table, it has a set of values (1, 2, Null, and 3) from the subquery. You're asking it to show you the rows from Equipment where EquipmentID is NOT IN that list of values. The db engine will only give you the rows for which that condition is True.
Null is the problem. For each EquipmentID, when the db engine considers whether that value is not present in the subquery set, it doesn't know. That Null is an unknown value ... and the unknown value might be the same as the current EquipmentID it's considering ... or it might be something else. But since the db engine doesn't know the real value, it can't evaluate the condition as True, so it will not include that row in the result set. The same thing happens for every row in the Equipment table ... therefore your query's result set is empty (no rows).
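A quick way to see this behaviour (this snippet is SQL Server syntax, since Access insists on a FROM clause, but the three-valued logic is the same):
SELECT 'found' AS result
WHERE 3 NOT IN (1, 2, NULL);
-- Returns no rows: 3 <> NULL evaluates to Unknown, so the NOT IN predicate is never True.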
You could get your desired results by excluding Null values from the subquery result set with a WHERE clause like the one below. But I think StuckAtWork's suggestion is a better way to go.
SELECT e.title
FROM Equipment AS e
WHERE
e.EquipmentID Not In (
SELECT EquipmentID
FROM DownPeriod
WHERE EquipmentID Is Not Null
);

INNER JOIN vs multiple table names in "FROM" [duplicate]

Possible Duplicate:
INNER JOIN versus WHERE clause — any difference?
What is the difference between an INNER JOIN query and an implicit join query (i.e. listing multiple tables after the FROM keyword)?
For example, given the following two tables:
CREATE TABLE Statuses(
id INT PRIMARY KEY,
description VARCHAR(50)
);
INSERT INTO Statuses VALUES (1, 'status');
CREATE TABLE Documents(
id INT PRIMARY KEY,
statusId INT REFERENCES Statuses(id)
);
INSERT INTO Documents VALUES (9, 1);
What is the difference between the below two SQL queries?
From the testing I've done, they return the same result. Do they do the same thing? Are there situations where they will return different result sets?
-- Using implicit join (listing multiple tables)
SELECT s.description
FROM Documents d, Statuses s
WHERE d.statusId = s.id
AND d.id = 9;
-- Using INNER JOIN
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;
There is no reason to ever use an implicit join (the one with the commas). Yes, for inner joins it will return the same results. However, it is subject to inadvertent cross joins, especially in complex queries, and it is harder to maintain because the legacy left/right outer join syntax (deprecated in SQL Server, where it doesn't work correctly right now anyway) differs from vendor to vendor. Since you shouldn't mix implicit and explicit joins in the same query (you can get wrong results), needing to change something to a left join means rewriting the entire query.
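To make the "inadvertent cross join" point concrete, here is a sketch using the Documents/Statuses tables from the question:
-- Forgetting the join predicate in the implicit form still runs, silently pairing rows:
SELECT s.description
FROM Documents d, Statuses s
WHERE d.id = 9;  -- oops: no d.statusId = s.id, so document 9 is paired with every status
-- With explicit join syntax the ON clause is mandatory, so the mistake is harder to make:
SELECT s.description
FROM Documents d
INNER JOIN Statuses s ON d.statusId = s.id
WHERE d.id = 9;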
If you do it the first way, people under the age of 30 will probably chuckle at you, but as long as you're doing an inner join, they produce the same result and the optimizer will generate the same execution plan (at least as far as I've ever been able to tell).
This does of course presume that the where clause in the first query is how you would be joining in the second query.
This will probably get closed as a duplicate, btw.
The nice part of the second method is that it helps separate the join condition (on ...) from the filter condition (where ...). This can help make the intent of the query more readable.
The join condition will typically be more descriptive of the structure of the database and the relation between the tables. e.g., the salary table is related to the employee table by the EmployeeID column, and queries involving those two tables will probably always join on that column.
The filter condition is more descriptive of the specific task being performed by the query. If the query is FindRichPeople, the where clause might be "where salaries.Salary > 1000000"... that's describing the task at hand, not the database structure.
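A small sketch of that example, with hypothetical employees and salaries tables:
SELECT e.Name, s.Salary
FROM employees AS e
INNER JOIN salaries AS s
    ON s.EmployeeID = e.EmployeeID  -- structural: how the tables relate
WHERE s.Salary > 1000000;           -- task-specific: "find rich people"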
Note that the SQL compiler doesn't see it that way... if it decides that it will be faster to cross join and then filter the results, it will cross join and filter the results. It doesn't care what is in the ON clause and what's in the WHERE clause. But that typically won't happen if the ON clause matches a foreign key or joins to a primary key or indexed column. As far as operating correctly, they are identical; as far as writing readable, maintainable code, the second way is probably a little better.
There is no difference as far as I know; the second one with the INNER JOIN is the newer way to write such statements, and the first one is the old method.
The first one does a Cartesian product on all records within those two tables and then filters by the where clause.
The second only joins on records that meet the requirements of your ON clause.
EDIT: As others have indicated, the optimization engine will take care of an attempted Cartesian product and will produce more or less the same query plan.
These are a bit similar and may help you out:
Left join vs multiple tables in SQL (a)
Left join vs multiple tables in SQL (b)
In the example you've given, the queries are equivalent; if you're using SQL Server, run the query and display the actual execution plan to see what the server's doing internally.