What is wrong with using 'Not In' in this SQL query? - sql

I have a table called BST as shown below:
Here N is the value of a node in the binary tree and P is its parent node. I have to write a query that determines whether a node is a root node, leaf node or inner node. I wrote the SQL query below for this:
select N,
case
when P is null then 'Root'
when N in (select distinct P from BST) then 'Inner'
when N not in (select distinct P from BST) then 'Leaf'
end as type
from BST
However, this does not give me the desired result: the last condition, for 'Leaf', in the CASE expression is never satisfied for leaf nodes. I am getting the output below in this case:
As a workaround, I am using the query below, which gives me the expected output:
select N,
case
when P is null then 'Root'
when N in (select distinct P from BST) then 'Inner'
else 'Leaf'
end as type
from BST
Expected Output:
But I can't figure out what's wrong with the first one. Could someone explain this to me?

The problem is that one of your P values is null. Exclude it by writing select distinct P from BST where P is not null, at least in the subquery used by NOT IN:
http://sqlfiddle.com/#!6/77fb8/3
hence:
select N,
case
when P is null then 'Root'
when N in (select distinct P from BST) then 'Inner'
when N not in (select distinct P from BST where p is not null) then 'Leaf'
end as type
from BST
The null P value is included in the list of distinct values selected, and NOT IN cannot determine whether a given value of N is equal or not equal to the null that comes from the root node's P.
It's somewhat counterintuitive, but nothing is ever equal to or not equal to a null, not even another null. Using = with null on either side results in null, which is neither true nor false.
If the list includes a null, IN can confirm that a value is in the list, but it can never confirm that a value is not in it:
1 IN (1,2,null) --true
3 IN (1,2,null) --null, not false, null which isn't true
3 NOT IN (1,2,null) --null, not false, null which isn't true
The ELSE form is the way to go here. Alternatively, put the distinct query in as a subquery in the FROM clause and do a left join to it.
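The left join alternative mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module rather than on the asker's engine, the five sample rows are hypothetical (the original table was not preserved here), and the function name classify_nodes is made up for the example.

```python
import sqlite3

def classify_nodes():
    """Classify each node via a LEFT JOIN to the distinct parent list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE BST (N INTEGER, P INTEGER)")
    # Hypothetical tree: 5 is the root, 2 and 8 are its children,
    # 1 and 3 are children of 2.
    conn.executemany(
        "INSERT INTO BST (N, P) VALUES (?, ?)",
        [(1, 2), (3, 2), (2, 5), (8, 5), (5, None)],
    )
    rows = conn.execute(
        """
        SELECT b.N,
               CASE
                   WHEN b.P IS NULL THEN 'Root'
                   WHEN parents.P IS NOT NULL THEN 'Inner'
                   ELSE 'Leaf'
               END AS type
        FROM BST AS b
        -- The NULL in the distinct list never matches in the join condition,
        -- so it cannot poison the result the way it does with NOT IN.
        LEFT JOIN (SELECT DISTINCT P FROM BST) AS parents
               ON parents.P = b.N
        ORDER BY b.N
        """
    ).fetchall()
    conn.close()
    return rows

for n, node_type in classify_nodes():
    print(n, node_type)
```

Because a join condition that evaluates to null simply produces no match, the leaf nodes fall through to the ELSE branch without any explicit null filter.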

in is shorthand for a series of = checks. null is not a value; it is the lack of one. Whenever it is given to an operator that expects a value (like = or in), the result is null, which is not "true".
You can think of null as an "unknown" value. For example: is an unknown value in a list of values selected from a table? We can't know.
Thus, you have to handle nulls explicitly, as you did in your second query.
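To see the "unknown" results directly, a few scalar expressions can be evaluated one at a time. This is a small sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in engine; the helper name evaluate is made up for the example. SQL NULL comes back to Python as None, and SQLite reports true as 1.

```python
import sqlite3

def evaluate(expression):
    """Evaluate one scalar SQL expression; SQL NULL is returned as None."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT {expression}").fetchone()[0]
    finally:
        conn.close()

expressions = (
    "NULL = NULL",
    "1 IN (1, 2, NULL)",
    "3 IN (1, 2, NULL)",
    "3 NOT IN (1, 2, NULL)",
    "3 NOT IN (1, 2)",
)
for expr in expressions:
    print(f"{expr:<24} -> {evaluate(expr)!r}")
```

Only the expressions that can be decided without looking at the null produce a definite answer; the rest stay null, and a WHEN or WHERE treats null as "not true".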

Try this:
DECLARE @DataSource TABLE
(
[N] TINYINT
,[P] TINYINT
);
INSERT INTO @DataSource ([N], [P])
VALUES (1, 2)
,(3, 2)
,(5, 6)
,(7, 6)
,(2, 4)
,(6, 4)
,(4, 15)
,(8, 9)
,(10, 9)
,(12, 13)
,(14, 13)
,(9, 11)
,(13, 11)
,(11, 15)
,(15, NULL);
SELECT DISTINCT
DS1.[N]
,CASE WHEN DS2.[N] IS NULL THEN 'IsLeaf' ELSE CASE WHEN DS3.[N] IS NOT NULL THEN 'Inner' ELSE ' Root' END END AS [Type]
FROM @DataSource DS1
LEFT JOIN @DataSource DS2
ON DS1.[N] = DS2.[P]
LEFT JOIN @DataSource DS3
ON DS1.[P] = DS3.[N]
ORDER BY [Type];
The idea is to use two LEFT JOINs to check whether the current node is a parent (it has a child) and whether it is a child (it has a parent).

Because P has a null value.
You can't compare NULL with the regular (arithmetic) comparison operators. Any arithmetic comparison to NULL will return NULL, even NULL = NULL or NULL <> NULL will yield NULL.
Use IS or IS NOT instead.

Write not exists instead of not in, so that nulls in the subquery do not affect the result:
select N,
case
when P is null then 'Root'
when N in (select distinct P from BST) then 'Inner'
when not exists (select * from BST as t2 where t2.P = t1.N)
then 'Leaf'
end as type
from BST as t1

Related

Recursive select that selects rows based own plus childrens values

I need to select rows in a table like this:
Select all rows in the table where both conditions are met:
Condition 1: the value column should not match with any value in table v
Condition 2: no descendant (on any level, i.e. child, sub-child, sub-sub-child, etc.) has a value that matches any value in table v
Table v looks like this:
Expected result from example table. Should [row] be selected/returned?
a1: No - condition 2
a2: No - condition 2
a3: No - condition 1
a4: No - condition 1
a5: Yes - (value does not match in v and no descendants that match in v)
a6: Yes - (value does not match in v and no descendants that match in v)
a7: Yes - (value does not match in v and no descendants that match in v)
a8: Yes - (value does not match in v and no descendants that match in v)
Here is an sqlfiddle where the tables are set up, together with a recursive function that shows all rows and their level in the tree, but I don't know how to proceed from there:
http://sqlfiddle.com/#!18/736a28/15/0
Check this solution:
--------------------------- DDL+DML
drop table if exists a
drop table if exists v
GO
CREATE TABLE a
([id] varchar(13), [parentId] varchar(57), [value] varchar(57))
;
CREATE TABLE v
([id] varchar(13), [value] varchar(57))
;
INSERT INTO a
([id], [parentId], [value])
VALUES
('a1', NULL, NULL),
('a2', 'a1', NULL),
('a3', 'a2', '1'),
('a4', NULL, '5'),
('a5', 'a1', '8'),
('a6', 'a2', NULL),
('a7', NULL, NULL),
('a8', NULL, '3'),
('a9', 'a8', '7')
;
INSERT INTO v
([id], [value])
VALUES
('v1', '1'),
('v2', '5'),
('v3', '10'),
('v4', '15'),
('v5', '20'),
('v6', '25'),
('v7', '30'),
('v8', '35'),
('v9', '40')
;
SELECT * FROM a
SELECT * FROM v
GO
-------------------- Solution
WITH MyRecCTE AS(
SELECT a.id, a.parentId, a.[value], Res = 'NO'
FROM a
INNER JOIN v ON a.[value] = v.[value]
UNION ALL
SELECT
a.id, a.parentId, a.[value], Res = 'NO'
FROM a
INNER JOIN MyRecCTE c ON c.parentId = a.id
)
SELECT DISTINCT a.id, a.parentId,a.[value], ISNULL(CONVERT(VARCHAR(3),c.Res),'YES')
FROM a
LEFT JOIN MyRecCTE c ON a.id = c.id
ORDER BY id
GO
Result set (fits the request):
For the sake of discussion, let's add another row that turns the rows with id a8 and a9 into "NO", since it is a child of a9 and has a value from the second table:
INSERT INTO a
([id], [parentId], [value])
VALUES
('a10', 'a9', 35)
GO
Test 2 result set (fits expected):
This got somewhat complicated, but I created a CTE where there is a record that contains a Path for every combination of ancestor and descendant (transitive closure). Then, I create a second CTE where I extract the parent id from the beginning of Path and the descendant id from the end of Path and look up the descendant's value. Then, finally, I query the second CTE and use NOT EXISTS to filter the rows.
WITH tree
AS
(
SELECT a.id, a.parentId, a.value,
CAST('/' + a.id as varchar(1000)) as Path
FROM a
UNION ALL
SELECT a.id, a.parentId, a.value,
CAST(t.Path + '/' + a.id as varchar(1000)) as Path
FROM a
INNER JOIN tree t
ON Path LIKE '%/' + a.parentId
),
DT
AS
(
SELECT t.Path,
RIGHT(LEFT(t.Path,3),2) as parent_id,
RIGHT(t.Path,2) as descendant_id,
(SELECT q.[value]
FROM a q
WHERE q.id = RIGHT(t.Path,2)
) as [descendant_value]
FROM tree t
)
SELECT *
FROM DT dt_outer
WHERE NOT EXISTS (SELECT 1 FROM DT dt_inner WHERE dt_inner.parent_id = dt_outer.parent_id AND
dt_inner.descendant_value IN (SELECT [value] FROM v))
ORDER BY 2,3
I left the result set with duplicates to get a clearer picture of what's going on. You can finish up with a DISTINCT parent_id to get the unique ids.

How to compare two adjacent rows in SQL?

In SQL, is there a way to compare two adjacent rows? In other words, if C2 = BEM and C3 = Compliance, or if C4 = Compliance and C5 = BEM, then return true. But if consecutive rows are identical, as in C6 = BEM and C7 = BEM, then return fail.
Check out the lead() and lag() functions.
They work best (most reliably) with a sorting field. Your sample does not appear to contain such a field, so I added a sorting field in my second solution.
The coalesce() function handles the first row, which has no preceding row.
Solution 1 without sort field
create table data
(
A nvarchar(10)
);
insert into data (A) values
('BEM'),
('Compliance'),
('BEM'),
('Compliance'),
('BEM'),
('Compliance'),
('Compliance'),
('Compliance');
select d.A,
coalesce(lag(d.A) over(order by (select null)), '') as lag_A,
case
when d.A <> coalesce(lag(d.A) over(order by (select null)), '')
then 'Ok'
else 'Fail'
end as B
from data d;
Solution 2 with sort field
create table data2
(
Sort int,
A nvarchar(10)
);
insert into data2 (Sort, A) values
(1, 'BEM'),
(2, 'Compliance'),
(3, 'BEM'),
(4, 'Compliance'),
(5, 'BEM'),
(6, 'Compliance'),
(7, 'Compliance'),
(8, 'Compliance');
select d.A,
case
when d.A <> coalesce(lag(d.A) over(order by d.Sort), '')
then 'Ok'
else 'Fail'
end as B
from data2 d
order by d.Sort;
Fiddle with results.
As a starter: a SQL table represents an unordered set of rows; there is no inherent ordering. Assuming that you have a column that defines the ordering of the rows, say id, and that your values are stored in column col, you can use lead() and a case expression as follows:
select col,
case when col = lead(col, 1, col) over(order by id)
then 'Fail' else 'OK'
end as status
from mytable t
Assuming you have some sort of column that you can use to determine the row order then you can use the LEAD window function to get the next value.
SELECT
[A],
CASE
WHEN [A] = LEAD([A], 1, [A]) OVER (ORDER BY SomeSortIndex) THEN 'Fail'
ELSE 'Ok'
END [B]
FROM src
The additional parameters in the LEAD function specify the row offset and default value in case there is no additional row. By using the current value as the default it will cause the condition to be true and display Fail like the last result in your example.
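The effect of the three-argument LEAD can be checked with a short sketch. It assumes SQLite 3.25 or later (the first version with window functions) driven from Python's sqlite3 module, reuses the data2 sample rows from the earlier answer, and wraps everything in a made-up function named adjacent_status.

```python
import sqlite3

def adjacent_status():
    """Flag each row 'Fail' when the next row (by Sort) has the same value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data2 (Sort INTEGER, A TEXT)")
    conn.executemany(
        "INSERT INTO data2 (Sort, A) VALUES (?, ?)",
        [
            (1, "BEM"), (2, "Compliance"), (3, "BEM"), (4, "Compliance"),
            (5, "BEM"), (6, "Compliance"), (7, "Compliance"), (8, "Compliance"),
        ],
    )
    rows = conn.execute(
        """
        SELECT A,
               CASE
                   -- offset 1 = next row; default A = compare the last row
                   -- with itself, which makes the last row 'Fail'
                   WHEN A = LEAD(A, 1, A) OVER (ORDER BY Sort) THEN 'Fail'
                   ELSE 'Ok'
               END AS B
        FROM data2
        ORDER BY Sort
        """
    ).fetchall()
    conn.close()
    return rows

for value, status in adjacent_status():
    print(value, status)
```

With this sample, rows 1 to 5 come out as 'Ok' and rows 6 to 8 as 'Fail'. Note that this compares each row with the one after it, whereas the lag() solutions above compare with the one before, so the two approaches flag different rows of a repeated run.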

Updating multiple rows with a conditional where clause in Postgres?

I'm trying to update multiple rows in a single query as I have many rows to update at once. In my query, there is a where clause that applies only to certain rows.
For example, I've the following query:
update mytable as m set
column_a = c.column_a,
column_b = c.column_b
from (values
(1, 12, 6, TRUE),
(2, 1, 45, FALSE),
(3, 56, 3, TRUE)
) as c(id, column_a, column_b, additional_condition)
where c.id = m.id
and CASE c.additional_condition when TRUE m.status != ALL(array['active', 'inactive']) end;
The last line in the where clause (m.status != ALL(array['active', 'inactive'])) should only be applied to rows that have TRUE as the value of c.additional_condition. Otherwise, the condition should not be applied.
Is it possible to achieve this in Postgres?
I think that this is what you want:
and CASE
when c.additional_condition THEN m.status != ALL(array['active', 'inactive'])
else TRUE
end
I think the logic you want is:
where c.id = m.id and
( (not c.additional_condition) or m.status <> 'active' )
You can use in or arrays for multiple values:
where c.id = m.id and
( (not c.additional_condition) or m.status not in ('active', 'inactive') )
I don't see a particular value to use arrays, unless you are passing a value in as an array.
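A conditional predicate of this kind can be written either as a CASE expression that falls back to true, or as (NOT flag) OR predicate. The sketch below checks that both select the same rows. It is an illustration only: it uses SQLite through Python's sqlite3 module instead of Postgres, a plain SELECT instead of the UPDATE, 1/0 instead of TRUE/FALSE, and hypothetical sample rows and function names.

```python
import sqlite3

QUERY = """
    WITH c(id, additional_condition) AS (VALUES (1, 1), (2, 0), (3, 1))
    SELECT m.id
    FROM mytable AS m
    JOIN c ON c.id = m.id
    WHERE {predicate}
    ORDER BY m.id
"""

# Apply the status check only when the per-row flag is set.
CASE_FORM = (
    "CASE WHEN c.additional_condition "
    "THEN m.status NOT IN ('active', 'inactive') ELSE 1 END"
)
OR_FORM = "(NOT c.additional_condition) OR m.status NOT IN ('active', 'inactive')"

def matching_ids(predicate):
    """Return the ids that the UPDATE would touch under the given predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO mytable (id, status) VALUES (?, ?)",
        [(1, "active"), (2, "active"), (3, "pending")],
    )
    ids = [row[0] for row in conn.execute(QUERY.format(predicate=predicate))]
    conn.close()
    return ids

print(matching_ids(CASE_FORM))
print(matching_ids(OR_FORM))
```

Row 1 is skipped because its flag is set and its status is 'active'; row 2 passes because its flag is off, so the status is never checked; row 3 passes the status check.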

How can I test whether all of the rows in a table are duplicated (except for one column)

I am working with a data warehouse table that can be split into claimed rows and computed rows.
I suspect that the computed rows are perfect duplicates of the claimed row (with the exception of the claimed/computed column).
I tried to test this using the except clause:
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'clm'
EXCEPT
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'cmp'
But all of the records were returned. I don't believe that this is possible, and I suspect it's due to null values.
Is there a way to compare the records which will compare nulls to nulls?
edit: the solution should work with an arbitrary number of fields, with varying types. In this case, I have ~100 fields, 2/3 of which may have null values. This is a data warehouse, and some degree of denormalization must be expected.
edit: I tested the query while limiting myself to non-null columns, and I got the result I expected (nothing).
But, I would still like to compare fields which potentially contain null values.
Your supposition would appear to be false. You might try this:
select a, b, c,
sum(case when clm_cmp_cd = 'clm' then 1 else 0 end) as num_clm,
sum(case when clm_cmp_cd = 'cmp' then 1 else 0 end) as num_cmp
from t
group by a, b, c;
This will show you the values of the three columns and the number of matches of each type.
Your problem is probably that values that look alike are not exactly the same. This could be due to slight differences in floating point numbers, or to unmatched characters in strings, such as leading spaces.
Let's look how Db2 works with NULL values in GROUP BY and INTERSECT:
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd='clm'
intersect
select a, b
from t
where clm_cmp_cd='cmp';
with t(a, b, clm_cmp_cd) as (values
( 1, 1, 'clm')
, ( 1, 1, 'cmp')
, (null, 1, 'clm')
, (null, 1, 'cmp')
, ( 2, 1, 'cmp')
)
select a, b
from t
where clm_cmp_cd in ('clm', 'cmp')
group by a, b
having count(1)>1;
Both queries return the same result:
A B
-- --
1 1
<null> 1
NULL values are treated as the same by these operators.
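The same behaviour can be contrasted with an ordinary equality join, where nulls do not match each other. The sketch below assumes SQLite through Python's sqlite3 module rather than Db2, reuses the sample rows from the queries above, and uses made-up names (run, EXCEPT_SQL, JOIN_SQL).

```python
import sqlite3

SAMPLE = """
    WITH t(a, b, clm_cmp_cd) AS (VALUES
        (1,    1, 'clm'),
        (1,    1, 'cmp'),
        (NULL, 1, 'clm'),
        (NULL, 1, 'cmp'),
        (2,    1, 'cmp')
    )
"""

# Set operators compare whole rows and treat NULLs as "not distinct".
EXCEPT_SQL = SAMPLE + """
    SELECT a, b FROM t WHERE clm_cmp_cd = 'clm'
    EXCEPT
    SELECT a, b FROM t WHERE clm_cmp_cd = 'cmp'
"""

# An equality join uses =, so the (NULL, 1) pair never matches itself.
JOIN_SQL = SAMPLE + """
    SELECT x.a, x.b
    FROM t AS x
    JOIN t AS y ON x.a = y.a AND x.b = y.b
    WHERE x.clm_cmp_cd = 'clm' AND y.clm_cmp_cd = 'cmp'
"""

def run(sql):
    """Run a query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

print("clm rows without a cmp twin:", run(EXCEPT_SQL))
print("pairs matched by = join:    ", run(JOIN_SQL))
```

Here EXCEPT returns no rows, because every 'clm' row, including the one with a null, has a 'cmp' twin, while the join finds only the (1, 1) pair. So if EXCEPT returns every row, nulls alone are unlikely to be the cause, at least on an engine that behaves like this one.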
If you have too many columns in your table to specify them manually in your query, you may produce the column list with the following query:
select listagg(colname, ', ')
from syscat.columns
where tabschema='MYSCHEMA' and tabname='TABLE' and colname<>'CLM_CMP_CD';

No records found when running not in operator

I am trying to get records from one table excluding some records (Order No.'s in the Union). Can anybody tell me what could be wrong with this query. I am getting no records after running it.
SELECT *
FROM [dbo].[FMD15_18]
WHERE [OrderNo] NOT IN ((SELECT OrderNo
FROM [dbo].[FMD15_18]
WHERE [Item Description] Like '%AP%')
UNION ALL
SELECT [OrderNo] FROM [dbo].[AP&C]
)
I would use NOT EXISTS instead :
SELECT t.*
FROM [dbo].[FMD15_18] t
WHERE NOT EXISTS (SELECT 1
FROM [dbo].[FMD15_18] t1
WHERE t1.OrderNo = t.OrderNo AND
t1.[Item Description] Like '%AP%') AND
NOT EXISTS (SELECT 1
FROM [dbo].[AP&C] a
WHERE a.OrderNo = t.OrderNo);
However, I suspect the current query has a problem with null values. If so, you need to filter them out with IS NOT NULL in the subquery.
NOT IN is tricky. I guess that OrderNo is nullable, which is why you don't get any rows.
SELECT *
FROM [dbo].[FMD15_18]
WHERE [OrderNo] NOT IN (SELECT COALESCE(OrderNo, '^')
FROM [dbo].[FMD15_18]
WHERE [Item Description] Like '%AP%'
UNION ALL
SELECT COALESCE([OrderNo], '^') FROM [dbo].[AP&C]
);
Explanation:
1 IN (1, NULL)
<=>
1 = 1 OR 1 = NULL
-- true, so 1 row is returned
And NOT IN:
1 NOT IN (1, NULL)
<=>
1 != 1 AND 1 != NULL
-- never true
-- always 0 rows returned
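The table-level effect can be reproduced with a tiny example. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of SQL Server, and the tables orders and excluded, their rows, and the function name query_orders are all hypothetical stand-ins for the asker's tables.

```python
import sqlite3

NOT_IN = """
    SELECT OrderNo FROM orders
    WHERE OrderNo NOT IN (SELECT OrderNo FROM excluded)
    ORDER BY OrderNo
"""
NOT_IN_FILTERED = """
    SELECT OrderNo FROM orders
    WHERE OrderNo NOT IN (SELECT OrderNo FROM excluded
                          WHERE OrderNo IS NOT NULL)
    ORDER BY OrderNo
"""
NOT_EXISTS = """
    SELECT o.OrderNo FROM orders AS o
    WHERE NOT EXISTS (SELECT 1 FROM excluded AS e
                      WHERE e.OrderNo = o.OrderNo)
    ORDER BY o.OrderNo
"""

def query_orders(sql):
    """Run one of the queries above against hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (OrderNo TEXT)")
    conn.execute("CREATE TABLE excluded (OrderNo TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?)", [("A1",), ("A2",), ("A3",)])
    # One real exclusion plus one NULL, which is enough to empty the NOT IN result.
    conn.executemany("INSERT INTO excluded VALUES (?)", [("A2",), (None,)])
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result

print("NOT IN:            ", query_orders(NOT_IN))
print("NOT IN, no NULLs:  ", query_orders(NOT_IN_FILTERED))
print("NOT EXISTS:        ", query_orders(NOT_EXISTS))
```

A single null in the subquery is enough to make plain NOT IN return nothing, while filtering the nulls out or switching to NOT EXISTS both return A1 and A3.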
You should be able to avoid using sub-queries entirely. It sounds like you want orders (from FMD15_18) where the description does not contain "AP", and the order number is not in the AP&C table. If that's the case, you could do something like the following:
select FMD15_18.*
from FMD15_18
left join [AP&C] on
[AP&C].OrderNo = FMD15_18.OrderNo
where
FMD15_18.[Item Description] NOT like '%AP%'
and [AP&C].OrderNo is null
I don't know what kind of data is in the [FMD15_18].[Item Description] field, but it seems heavy-handed to exclude items where the description contains 2 letters. How long does the description column tend to be? Might there be records that contain "AP" that you're excluding inadvertently? Items with descriptions as varied as "APPLE", "MAPLE SYRUP", and "BURLAP" would be excluded based on this condition.