Invalid results from SQL Server "NOT IN" clause

I ran a query on our SQL Server 2012 instance that returned no results. I discovered that this was incorrect and that I SHOULD have gotten 16 records. I changed the query and got the expected answer, but I am at a loss to understand why my original query did not work.
So my ORIGINAL query which returned no results was:
SELECT
WPB.[ID number]
FROM
[Fact].[REPORT].[WPB_LIST_OF_IDS] WPB
WHERE
[ID number] NOT IN (SELECT DISTINCT IdNumber
FROM MasterData.Dimension.Customer DC)
The reworked query is this:
SELECT
WPB.[ID number]
FROM
[Fact].[REPORT].[WPB_LIST_OF_IDS] WPB
LEFT JOIN
MasterData.Dimension.Customer DC ON WPB.[ID number] = DC.IdNumber
WHERE
DC.IdNumber IS NULL
Can anyone tell me WHY the first query (which incidentally runs in fractions of a second vs the 2nd which takes a minute) does not work? I don't want to repeat this mistake in the future!

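Don't use NOT IN with a subquery. It doesn't work the way you expect with NULL values. x NOT IN (a, b, NULL) is evaluated as x <> a AND x <> b AND x <> NULL, and x <> NULL is never true (it is UNKNOWN). So if any value returned by the subquery is NULL, no rows are returned at all. Filtering the NULLs out of the subquery (WHERE IdNumber IS NOT NULL) avoids the problem.
The safer habit is to use NOT EXISTS instead. This has the semantics that you expect: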
select wpb.[ID number]
from [Fact].[REPORT].[WPB_LIST_OF_IDS] wpb
where not exists (select 1
from MasterData.Dimension.Customer dc
where wpb.[ID number] = dc.IdNumber
);
Of course, the left join method also works.
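To see the NULL effect concretely, here is a small sketch that runs the four variants against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, and the table and column names are simplified, hypothetical versions of the ones in the question; the NOT IN behaviour comes from standard SQL three-valued logic, which SQL Server also follows by default.

```python
import sqlite3


def demo_not_in_vs_not_exists():
    """Compare NOT IN, NOT EXISTS and LEFT JOIN ... IS NULL when the
    subquery column contains a NULL (hypothetical sample data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE wpb_list_of_ids (id_number TEXT);
        CREATE TABLE customer (id_number TEXT);
        INSERT INTO wpb_list_of_ids VALUES ('A'), ('B'), ('C');
        -- 'C' is missing from customer, and one customer has no ID at all
        INSERT INTO customer VALUES ('A'), ('B'), (NULL);
    """)
    queries = {
        # 'C' NOT IN ('A', 'B', NULL) is UNKNOWN, so even 'C' is dropped
        "not_in": """SELECT id_number FROM wpb_list_of_ids
                     WHERE id_number NOT IN (SELECT id_number FROM customer)""",
        # Excluding NULLs from the subquery restores the expected result
        "not_in_filtered": """SELECT id_number FROM wpb_list_of_ids
                     WHERE id_number NOT IN (SELECT id_number FROM customer
                                             WHERE id_number IS NOT NULL)""",
        "not_exists": """SELECT w.id_number FROM wpb_list_of_ids w
                     WHERE NOT EXISTS (SELECT 1 FROM customer c
                                       WHERE c.id_number = w.id_number)""",
        "left_join": """SELECT w.id_number FROM wpb_list_of_ids w
                     LEFT JOIN customer c ON c.id_number = w.id_number
                     WHERE c.id_number IS NULL""",
    }
    results = {name: [row[0] for row in con.execute(sql)]
               for name, sql in queries.items()}
    con.close()
    return results


if __name__ == "__main__":
    for name, rows in demo_not_in_vs_not_exists().items():
        print(name, rows)
```

Only the plain NOT IN comes back empty; the other three all return the missing ID 'C'.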

Related

How to remove duplicates and unwanted rows

So I have a "Sample", "Test" and "Result" table linked to each other from a database and I am trying to pull information using MS Query. Each sample has one test and each test could have roughly 20 results entered by different people attached to it.
What I want is for the sample to only display if the person's name I enter is NOT involved with entering ANY of the results.
SELECT SAMPLE.SAMPLE_NUMBER, SAMPLE.TEXT_ID, SAMPLE.STATUS, SAMPLE.DATE_COMPLETED, SAMPLE.LOCATION, TEST.ANALYSIS, RESULT.ENTERED_BY
FROM DATABASE.RESULT RESULT, DATABASE.SAMPLE SAMPLE, DATABASE.TEST TEST
WHERE TEST.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER AND RESULT.TEST_NUMBER = TEST.TEST_NUMBER
AND ((TEST.ANALYSIS='ID_META' Or TEST.ANALYSIS='ID_RIBO' Or TEST.ANALYSIS='ID_BACTERIA' Or TEST.ANALYSIS='ID_MOULD')
AND (SAMPLE.STATUS='C') AND (SAMPLE.DATE_COMPLETED Is Not Null)
AND (RESULT.ENTERED_ON Between [Start Date] And [End Date])
AND (RESULT.ENTERED_BY<>[Enter Name]))
ORDER BY SAMPLE.DATE_COMPLETED
This is the code that I have so far, but the problem is that if the person has entered one of 10 results, the same sample will display 9 times and just not display for the one result he did enter. Is there a way to say that if he entered ANY result at all, the sample won't appear at all?
When you find yourself wanting to limit the rows by a condition that involves multiple rows (like "I want every test where none of the multiple results were entered by this person"), you can't do it with simple conditions like RESULT.ENTERED_BY<>[Enter Name]. That only looks at the value of each single row you're currently working with. You either need a correlated subquery or an analytical function. I think subqueries are easier to start out with, and in your case a NOT EXISTS clause makes intuitive sense.
(I'm also going to rewrite this with standard, modern JOIN syntax.)
select SAMPLE.SAMPLE_NUMBER, SAMPLE.TEXT_ID, SAMPLE.STATUS, SAMPLE.DATE_COMPLETED, SAMPLE.LOCATION, TEST.ANALYSIS, RESULT.ENTERED_BY
from DATABASE.SAMPLE SAMPLE
join DATABASE.TEST TEST
on TEST.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER
join DATABASE.RESULT RESULT
on RESULT.TEST_NUMBER = TEST.TEST_NUMBER
where TEST.ANALYSIS in ('ID_META','ID_RIBO','ID_BACTERIA','ID_MOULD')
and (SAMPLE.STATUS='C') and (SAMPLE.DATE_COMPLETED Is Not Null)
and (RESULT.ENTERED_ON Between [Start Date] And [End Date])
-- up until here, it's the same as your query
and NOT EXISTS (select 1
from DATABASE.TEST T2
join DATABASE.RESULT R2
on R2.TEST_NUMBER = T2.TEST_NUMBER
where T2.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER
and T2.ANALYSIS in ('ID_META','ID_RIBO','ID_BACTERIA','ID_MOULD')
and R2.ENTERED_ON Between [Start Date] And [End Date]
and R2.ENTERED_BY = [Enter Name])
ORDER BY SAMPLE.DATE_COMPLETED;
So here we're saying to return all the samples where there "doesn't exist" any test with any result which was entered by the specific person. (I'm not sure whether you'll want the date filter on both the main query and the subquery - both RESULT and R2 - you'll have to figure that out based on your data.)
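Edit: if you want one row per sample, just remove the TEST/RESULT joins from the main query. (Without those joins, this version also returns completed samples that have no tests in those four analyses at all.)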
select SAMPLE.SAMPLE_NUMBER, SAMPLE.TEXT_ID, SAMPLE.STATUS, SAMPLE.DATE_COMPLETED, SAMPLE.LOCATION
from DATABASE.SAMPLE SAMPLE
where (SAMPLE.STATUS='C') and (SAMPLE.DATE_COMPLETED Is Not Null)
and NOT EXISTS (select 1
from DATABASE.TEST T2
join DATABASE.RESULT R2
on R2.TEST_NUMBER = T2.TEST_NUMBER
where T2.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER
and T2.ANALYSIS in ('ID_META','ID_RIBO','ID_BACTERIA','ID_MOULD')
and R2.ENTERED_ON Between [Start Date] And [End Date]
and R2.ENTERED_BY = [Enter Name])
ORDER BY SAMPLE.DATE_COMPLETED;
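A reduced sketch of the difference between the row-by-row filter and the correlated NOT EXISTS, again using SQLite from Python as a stand-in for the real database. The tables keep only the columns needed for the comparison, the analysis and date filters are omitted, and the data is hypothetical.

```python
import sqlite3


def samples_without_person(person):
    """Return (row-by-row result, NOT EXISTS result) as lists of sample numbers."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sample (sample_number INTEGER, status TEXT);
        CREATE TABLE test (test_number INTEGER, sample_number INTEGER, analysis TEXT);
        CREATE TABLE result (test_number INTEGER, entered_by TEXT);
        INSERT INTO sample VALUES (1, 'C'), (2, 'C');
        INSERT INTO test VALUES (10, 1, 'ID_META'), (20, 2, 'ID_RIBO');
        -- ALAN entered one of three results on sample 1 and none on sample 2
        INSERT INTO result VALUES (10, 'ALAN'), (10, 'BETH'), (10, 'CARL'),
                                  (20, 'BETH'), (20, 'CARL');
    """)
    # Row-by-row filter: only hides ALAN's own result rows
    per_row = con.execute("""
        SELECT s.sample_number
        FROM sample s
        JOIN test t ON t.sample_number = s.sample_number
        JOIN result r ON r.test_number = t.test_number
        WHERE s.status = 'C' AND r.entered_by <> ?
        ORDER BY s.sample_number""", (person,)).fetchall()
    # Correlated NOT EXISTS: drops the whole sample if any result is ALAN's
    not_exists = con.execute("""
        SELECT s.sample_number
        FROM sample s
        WHERE s.status = 'C'
          AND NOT EXISTS (SELECT 1
                          FROM test t2
                          JOIN result r2 ON r2.test_number = t2.test_number
                          WHERE t2.sample_number = s.sample_number
                            AND r2.entered_by = ?)
        ORDER BY s.sample_number""", (person,)).fetchall()
    con.close()
    return [row[0] for row in per_row], [row[0] for row in not_exists]


if __name__ == "__main__":
    print(samples_without_person("ALAN"))  # ([1, 1, 2, 2], [2])
```

The first list shows the duplicates described in the question (sample 1 still appears, once per result someone else entered); the NOT EXISTS version returns only sample 2.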

How to only pull a row once from MS Query

So I have a "Sample", "Test" and "Result" table linked to each other from a database and I am trying to pull information using MS Query. Each sample has one test and each test could have roughly 20 results entered by different people attached to it.
What I want is for the sample to only display if the person's name I enter is NOT involved with entering ANY of the results.
SELECT SAMPLE.SAMPLE_NUMBER, SAMPLE.TEXT_ID, SAMPLE.STATUS, SAMPLE.DATE_COMPLETED, SAMPLE.LOCATION, TEST.ANALYSIS, RESULT.ENTERED_BY
FROM DATABASE.RESULT RESULT, DATABASE.SAMPLE SAMPLE, DATABASE.TEST TEST
WHERE TEST.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER AND RESULT.TEST_NUMBER = TEST.TEST_NUMBER
AND ((TEST.ANALYSIS='ID_META' Or TEST.ANALYSIS='ID_RIBO' Or TEST.ANALYSIS='ID_BACTERIA' Or TEST.ANALYSIS='ID_MOULD')
AND (SAMPLE.STATUS='C') AND (SAMPLE.DATE_COMPLETED Is Not Null)
AND (RESULT.ENTERED_ON Between [Start Date] And [End Date])
AND (RESULT.ENTERED_BY<>[Enter Name]))
ORDER BY SAMPLE.DATE_COMPLETED
This is the code that I have so far, but the problem is that if Alan has entered one of 10 results, the same sample will display 9 times and just not display for the one result he did enter. Is there a way to say that if he entered ANY result at all, the sample won't appear at all?
Edit: added the additional clauses from the full query. The query was copied directly from the Excel connection window (MS Query).
This answers the original version of the question.
You seem to be describing NOT EXISTS:
SELECT s.SAMPLE_NUMBER
FROM DATABASE.SAMPLE s
WHERE NOT EXISTS (SELECT 1
FROM DATABASE.RESULT r JOIN
DATABASE.TEST t
ON r.TEST_NUMBER = t.TEST_NUMBER
WHERE t.SAMPLE_NUMBER = s.SAMPLE_NUMBER AND
r.ENTERED_ON >= DATE '2020-02-01' AND
r.ENTERED_ON < DATE '2020-02-03' AND
r.ENTERED_BY = 'ALAN'
) AND
s.DATE_COMPLETED IS NOT NULL;
I have left in your additional conditions, even though they are not mentioned in the question.
Notes:
NEVER use commas in the FROM clause.
Always use proper, explicit, standard, readable JOIN syntax.
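Use proper DATE literals (DATE 'YYYY-MM-DD') in Oracle.
Don't use BETWEEN with dates, particularly in Oracle. The DATE datatype has a time component, which might not be visible when you look at the data, so BETWEEN DATE '2020-02-01' AND DATE '2020-02-02' misses anything entered after midnight on 2 February. Use >= the first day and < the day after the last day instead.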
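The BETWEEN pitfall in the notes above can be modelled with Python datetimes: a date literal such as DATE '2020-02-02' means midnight at the start of that day, so a value later the same day falls outside an inclusive upper bound. This is a sketch of the comparison logic only, not Oracle itself; the timestamp is hypothetical.

```python
from datetime import datetime


def between_inclusive(value, start, end):
    # Mirrors "col BETWEEN DATE 'start' AND DATE 'end'": end is midnight
    return start <= value <= end


def half_open(value, start, end_exclusive):
    # Mirrors "col >= DATE 'start' AND col < DATE 'end + 1 day'"
    return start <= value < end_exclusive


entered_on = datetime(2020, 2, 2, 14, 30)  # 2 Feb 2020, 2:30 pm
print(between_inclusive(entered_on, datetime(2020, 2, 1), datetime(2020, 2, 2)))  # False
print(half_open(entered_on, datetime(2020, 2, 1), datetime(2020, 2, 3)))  # True
```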
Please try the query below:
SELECT DISTINCT(SAMPLE.SAMPLE_NUMBER) as SAMPLE_NUMBER
FROM DATABASE.SAMPLE SAMPLE
LEFT OUTER JOIN DATABASE.TEST TEST ON TEST.SAMPLE_NUMBER = SAMPLE.SAMPLE_NUMBER
LEFT OUTER JOIN DATABASE.RESULT RESULT ON RESULT.TEST_NUMBER = TEST.TEST_NUMBER
WHERE ((SAMPLE.DATE_COMPLETED Is Not Null)
AND (RESULT.ENTERED_ON Between CAST('01-FEB-2020' as DATE) And CAST('02-FEB-2020' as DATE))
AND (RESULT.ENTERED_BY <> 'ALAN'))

Trouble with WHERE EXISTS subquery of a LEFT JOIN

I am trying to run a LEFT JOIN in MS Access SQL. I am trying to left join my "OldPE" table to the new "1 PE" table and update my column labeled "Line Num". There is no primary key in these tables, so I am linking them through a series of conditions. Here is my code so far (excuse the poor formatting; I am new and still learning SQL).
UPDATE [1 PE]
LEFT JOIN OldPE ON ([1 PE].SumRes = OldPE.SumRes)
AND ([1 PE].[Project Code] = OldPE.[Project Code])
AND ([1 PE].[DeptID] = OldPE.[DeptID])
AND ([1 PE].[Res Code] = OldPe.[Res Code])
AND ([1 PE].[Explain The Cost] LIKE OldPE.[Explain The Cost])
AND ([1 PE].Notes LIKE OldPE.Notes)
SET [1 PE].[Line Num] = [OldPE].[Line Num];
There are a lot of rows that have null or blank values in their "Explain The Cost" and "Notes" columns. I used LIKE because some of the notes that I want to match vary slightly due to spelling mistakes and such. However, now that I use LIKE, it won't return the rows with a null value in these columns. The SQL code won't accept a WHERE EXISTS (I may also just be writing it wrong).
How do I get these null values to still be returned while using LIKE?
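Use Nz to fall back to the original value if it is null. (In Access SQL, IsNull takes a single argument and returns True/False, so the two-argument ISNULL from SQL Server won't work here.) This lets a null in OldPE match any value in [1 PE], so those rows are included in the results:
AND ([1 PE].[Explain The Cost] LIKE Nz(OldPE.[Explain The Cost],[1 PE].[Explain The Cost]))
If [1 PE].[Explain The Cost] can also be null, both sides are still null and the comparison still fails; in that case wrap both sides, e.g. Nz([1 PE].[Explain The Cost],"") LIKE Nz(OldPE.[Explain The Cost],"").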
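A small sketch of why null rows drop out of a LIKE join condition and how replacing nulls on both sides brings them back. It uses SQLite from Python, with COALESCE standing in for Access's Nz; the table names new_pe and old_pe are hypothetical simplifications of [1 PE] and OldPE. One side effect to be aware of: after this change a null and a blank string also match each other.

```python
import sqlite3


def match_line_numbers():
    """Return (plain LIKE join, COALESCE-on-both-sides join) results."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE new_pe (res_code TEXT, notes TEXT);
        CREATE TABLE old_pe (res_code TEXT, notes TEXT, line_num INTEGER);
        INSERT INTO new_pe VALUES ('R1', 'travel'), ('R2', NULL);
        INSERT INTO old_pe VALUES ('R1', 'travel', 1), ('R2', NULL, 2);
    """)
    # NULL LIKE NULL is NULL (not true), so R2 finds no partner
    plain = con.execute("""
        SELECT n.res_code, o.line_num
        FROM new_pe n
        LEFT JOIN old_pe o
          ON n.res_code = o.res_code AND n.notes LIKE o.notes
        ORDER BY n.res_code""").fetchall()
    # '' LIKE '' is true, so R2 now picks up its old line number
    coalesced = con.execute("""
        SELECT n.res_code, o.line_num
        FROM new_pe n
        LEFT JOIN old_pe o
          ON n.res_code = o.res_code
         AND COALESCE(n.notes, '') LIKE COALESCE(o.notes, '')
        ORDER BY n.res_code""").fetchall()
    con.close()
    return plain, coalesced


if __name__ == "__main__":
    print(match_line_numbers())
```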

What's wrong with this nested query?

I am trying to write a query to return the id of the latest version of a market index stored in a database.
SELECT miv.market_index_id market_index_id from ref_market_index_version miv
INNER JOIN ref_market_index mi ON miv.market_index_id = mi.id
WHERE mi.short_name='dow30'
AND miv.version_num = (SELECT MAX(m1.version_num) FROM ref_market_index_version m1 INNER JOIN ref_market_index m2 ON m1.market_index_id = m2.id )
The above SQL statement can be (roughly) translated into the form:
SELECT some columns FROM SOME CRITERIA MATCHED TABLES
WHERE mi.short_name='some name'
AND miv.version_num = SOME NUMBER
What I don't understand is this: when I supply an actual number instead of a subquery, the SQL statement works, and when I test the subquery used to determine the latest version number on its own, that also works. However, when I use the result returned by the subquery in the outer (parent?) query, it returns 0 rows. What am I doing wrong here?
Incidentally, I also tried an IN clause instead of the strict equality match, i.e.
... AND miv.version_num IN (SUB QUERY)
That also resulted in 0 rows, although as before, when running the parent query with a hard coded version number, I get 1 row returned (as expected).
BTW, I am using PostgreSQL, but I would prefer the solution to be database-agnostic.
The problem is probably that your subquery returns the maximum version_num across all market indexes, and that version number doesn't exist for 'dow30'.
Try restricting the subquery to the same index:
SELECT miv.market_index_id market_index_id
from ref_market_index_version miv INNER JOIN
ref_market_index mi
ON miv.market_index_id = mi.id
WHERE mi.short_name='dow30' AND
miv.version_num = (SELECT MAX(m1.version_num)
FROM ref_market_index_version m1 INNER JOIN
ref_market_index m2
ON m1.market_index_id = m2.id
where m2.short_name = 'dow30'
)
I added the WHERE clause to the subquery.
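A sketch reproducing the symptom with hypothetical data in SQLite (via Python): the unfiltered subquery picks the highest version of any index, which dow30 does not have, so the outer query returns nothing. Adding the short_name filter on m2 returns dow30's latest version. The table and column names follow the question; the rows are made up.

```python
import sqlite3

OUTER = """
    SELECT miv.market_index_id, miv.version_num
    FROM ref_market_index_version miv
    JOIN ref_market_index mi ON miv.market_index_id = mi.id
    WHERE mi.short_name = 'dow30'
      AND miv.version_num = ({subquery})"""

# The original subquery: joins the two tables but never filters by index
GLOBAL_MAX = """SELECT MAX(m1.version_num)
                FROM ref_market_index_version m1
                JOIN ref_market_index m2 ON m1.market_index_id = m2.id"""

# The fix: the same subquery restricted to 'dow30'
DOW30_MAX = GLOBAL_MAX + " WHERE m2.short_name = 'dow30'"


def run_both():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE ref_market_index (id INTEGER, short_name TEXT);
        CREATE TABLE ref_market_index_version (market_index_id INTEGER,
                                               version_num INTEGER);
        INSERT INTO ref_market_index VALUES (1, 'dow30'), (2, 'sp500');
        -- dow30 is on version 3, sp500 on version 7
        INSERT INTO ref_market_index_version VALUES (1, 1), (1, 3), (2, 5), (2, 7);
    """)
    # str.format only splices fixed query text here, never user input
    unfiltered = con.execute(OUTER.format(subquery=GLOBAL_MAX)).fetchall()
    filtered = con.execute(OUTER.format(subquery=DOW30_MAX)).fetchall()
    con.close()
    return unfiltered, filtered


if __name__ == "__main__":
    print(run_both())  # ([], [(1, 3)])
```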

MS Access SQL: Troubles combining UNION ALL with a LEFT JOIN

I have created a query in MS Access that simulates a FULL OUTER JOIN and combines the results. It looks something like the following:
SELECT NZ(estimates.employee_id, actuals.employee_id) AS employee_id
, NZ(estimates.a_date, actuals.a_date) AS a_date
, estimates.estimated_hours
, actuals.actual_hours
FROM (SELECT *
FROM estimates
LEFT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
UNION ALL
SELECT *
FROM estimates
RIGHT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
WHERE estimates.employee_id IS NULL
OR estimates.a_date IS NULL) AS qFullJoinEstimatesActuals
I have saved this query as an object (let's call it qEstimatesAndActuals). My objective is to LEFT JOIN qEstimatesAndActuals with another table. Something like the following:
SELECT *
FROM qEstimatesAndActuals
LEFT JOIN (SELECT *
FROM labor_rates) AS rates
ON qEstimatesAndActuals.employee_id = rates.employee_id
AND qEstimatesAndActuals.a_date BETWEEN rates.begin_date AND rates.end_date
MS Access accepts the syntax and runs the query, but it omits results that are clearly within the result set. Wondering if the date format was somehow lost, I placed a FORMAT around begin_date and end_date to force them to be interpreted as Short Dates. Oddly, this produced a different result set, but it still omitted results that it shouldn't have.
I am wondering if the queries are performed in such a way that you can't LEFT JOIN the result set of a UNION ALL. Does anyone have any thoughts/ideas on this? Is there a better way of accomplishing the end goal?
I would try breaking each part of the query into its own Access query object, e.g.
SELECT *
FROM estimates
LEFT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
Would be qryOne
SELECT *
FROM estimates
RIGHT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
WHERE estimates.employee_id IS NULL
OR estimates.a_date IS NULL
Would be qryTwo
SELECT * FROM qryOne
UNION ALL
SELECT * FROM qryTwo
Would be qryFullJoinEstimatesActuals, and finally
SELECT NZ(estimates.employee_id, actuals.employee_id) AS employee_id
, NZ(estimates.a_date, actuals.a_date) AS a_date
, estimates.estimated_hours
, actuals.actual_hours
FROM qryFullJoinEstimatesActuals
I've found that constructs that don't work in complex Access SQL statements often do work properly if they are broken down into individual query objects and reassembled step-by-step. Additionally, you can test each part of the query individually. This will help you find a workaround if one proves to be necessary.
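The step-by-step approach can be sketched with SQLite views standing in for saved Access query objects (driven from Python, not Access). Two assumptions to flag: the "right" half is written as actuals LEFT JOIN estimates, because SQLite only gained RIGHT JOIN in version 3.39.0, and COALESCE stands in for Access's Nz. The columns are aliased so the UNION ALL lines up cleanly; the data is hypothetical.

```python
import sqlite3


def full_outer_join_rows():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE estimates (employee_id INTEGER, a_date TEXT, estimated_hours REAL);
        CREATE TABLE actuals (employee_id INTEGER, a_date TEXT, actual_hours REAL);
        INSERT INTO estimates VALUES (1, '2020-01-06', 8), (2, '2020-01-06', 6);
        INSERT INTO actuals VALUES (1, '2020-01-06', 7.5), (3, '2020-01-07', 4);
        -- qryOne: every estimate, with its matching actual if any
        CREATE VIEW qry_one AS
            SELECT e.employee_id AS est_emp, e.a_date AS est_date, e.estimated_hours,
                   a.employee_id AS act_emp, a.a_date AS act_date, a.actual_hours
            FROM estimates e
            LEFT JOIN actuals a
              ON e.employee_id = a.employee_id AND e.a_date = a.a_date;
        -- qryTwo: actuals that have no estimate
        CREATE VIEW qry_two AS
            SELECT e.employee_id AS est_emp, e.a_date AS est_date, e.estimated_hours,
                   a.employee_id AS act_emp, a.a_date AS act_date, a.actual_hours
            FROM actuals a
            LEFT JOIN estimates e
              ON e.employee_id = a.employee_id AND e.a_date = a.a_date
            WHERE e.employee_id IS NULL OR e.a_date IS NULL;
        -- qryFullJoinEstimatesActuals
        CREATE VIEW qry_full AS
            SELECT * FROM qry_one
            UNION ALL
            SELECT * FROM qry_two;
    """)
    # Final step: merge the key columns from whichever side is present
    rows = con.execute("""
        SELECT COALESCE(est_emp, act_emp) AS employee_id,
               COALESCE(est_date, act_date) AS a_date,
               estimated_hours, actual_hours
        FROM qry_full
        ORDER BY employee_id""").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in full_outer_join_rows():
        print(row)
```

Each view can be queried on its own while debugging, which is the main benefit of splitting the query up.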
You're missing an INNER JOIN ... UNION ALL step.
Consistent with the odd behavior surrounding the dates, this issue turned out to be related to the use of NZ to select a date from qFullJoinEstimatesActuals. The use of NZ appears to make the data type ambiguous. As such, the following line from the example in my post caused the error:
, NZ(estimates.a_date, actuals.a_date) AS a_date
The ambiguous data type of a_date caused the BETWEEN operator to produce erroneous results when comparing a_date to rates.begin_date and rates.end_date in the LEFT JOIN. The issue was resolved by type casting the result of the NZ function, as follows:
, CDate(NZ(estimates.a_date, actuals.a_date)) AS a_date
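One plausible mechanism for the wrong BETWEEN results, offered as an assumption rather than a confirmed diagnosis: if the un-cast Nz result ends up being compared as text, short-date strings compare character by character instead of chronologically. The Python sketch below only models that comparison; the dates are hypothetical, and CDate corresponds to comparing real date values.

```python
from datetime import date


def in_rate_period(a_date, begin, end):
    # Same shape as "a_date BETWEEN begin_date AND end_date"
    return begin <= a_date <= end


# As real dates: 15 Feb 2020 lies inside 1 Jan to 31 Dec 2020
as_dates = in_rate_period(date(2020, 2, 15), date(2020, 1, 1), date(2020, 12, 31))
# As US short-date strings: "2/15/2020" sorts after "12/31/2020" because '2' > '1'
as_text = in_rate_period("2/15/2020", "1/1/2020", "12/31/2020")
print(as_dates, as_text)  # True False
```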