"NOT IN" not working as expected

I was writing a query to find the node type of each row in the table BST, ordered by the value of the node.
The table BST has two columns, N and P, where N is the value of a node in the BST and P is the parent of N.
Say BST has the following records:
BST Table
I successfully executed the query as follows:
SELECT n,CASE
WHEN p IS NULL THEN 'Root'
WHEN n IN (SELECT DISTINCT p FROM BST) THEN 'Inner'
ELSE 'Leaf'
END
FROM BST
ORDER BY n;
Result: Result as expected
But instead of using "IN", when I tried the same query using "NOT IN" as given below:
SELECT n,CASE
WHEN p IS NULL THEN 'Root'
WHEN n NOT IN (SELECT DISTINCT p FROM BST) THEN 'Leaf'
ELSE 'Inner'
END
FROM BST
ORDER BY n;
it didn't work as expected. Why so?

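As @jarlh suggested, use NOT EXISTS or, when using NOT IN, be sure to exclude NULLs from the subquery. The root row has p = NULL, so the subquery returns a NULL, and n NOT IN (..., NULL) evaluates to UNKNOWN rather than TRUE whenever n matches none of the other values. The 'Leaf' branch is therefore never taken. Filtering the NULLs out fixes it: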
SELECT n,CASE
WHEN p IS NULL THEN 'Root'
WHEN n NOT IN (SELECT DISTINCT p FROM BST WHERE p IS NOT NULL) THEN 'Leaf'
ELSE 'Inner'
END
FROM BST
ORDER BY n;
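The effect of the NULL in the subquery can be reproduced with a small sketch. It uses Python's built-in sqlite3 module; the sample rows are an assumption (they are not shown in the question), chosen so that the root's parent is NULL.

```python
import sqlite3

# Assumed sample rows (n, p); the root (5) has a NULL parent.
ROWS = [(1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (n INTEGER, p INTEGER)")
conn.executemany("INSERT INTO BST VALUES (?, ?)", ROWS)

QUERY = """
SELECT n, CASE
         WHEN p IS NULL THEN 'Root'
         WHEN n NOT IN (SELECT p FROM BST {null_filter}) THEN 'Leaf'
         ELSE 'Inner'
       END
FROM BST
ORDER BY n
"""

# Without a filter the subquery contains NULL, so NOT IN is never TRUE
# and every non-root row falls through to ELSE.
broken = conn.execute(QUERY.format(null_filter="")).fetchall()

# With the NULLs removed, NOT IN behaves like ordinary set membership.
fixed = conn.execute(QUERY.format(null_filter="WHERE p IS NOT NULL")).fetchall()

print(broken)
print(fixed)
```

In the unfiltered run every non-root node is reported as 'Inner'; with the filter, nodes 1, 3, 6 and 9 come out as 'Leaf'.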

If I were you, I'd just use NOT EXISTS. I used to always use NOT IN as a beginner. Later on I realized that it comes with quite a few pitfalls (NULLs in the subquery being the main one) that you don't think of at first. Use WHERE NOT EXISTS and you'll be a happy fellow.
Cheers!
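A minimal sketch of the NOT EXISTS variant, run through Python's sqlite3 module. The sample rows and the aliases b and c are assumptions made for illustration.

```python
import sqlite3

# Assumed sample rows (n, p); the root (5) has a NULL parent.
ROWS = [(1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (n INTEGER, p INTEGER)")
conn.executemany("INSERT INTO BST VALUES (?, ?)", ROWS)

# NOT EXISTS only asks whether a matching row exists, so a NULL parent
# in some other row cannot turn the test into UNKNOWN.
node_types = conn.execute("""
SELECT b.n, CASE
         WHEN b.p IS NULL THEN 'Root'
         WHEN NOT EXISTS (SELECT 1 FROM BST c WHERE c.p = b.n) THEN 'Leaf'
         ELSE 'Inner'
       END
FROM BST b
ORDER BY b.n
""").fetchall()

print(node_types)
```

No NULL filter is needed here, because the comparison c.p = b.n simply fails to match the row whose p is NULL.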

Related

Solution of hackerrank Binary Tree Nodes question

You are given a table, BST, containing two columns: N and P, where N represents the value of a node in Binary Tree, and P is the parent of N.
Write a query to find the node type of Binary Tree ordered by the value of the node. Output one of the following for each node:
Root: If node is root node.
Leaf: If node is leaf node.
Inner: If node is neither root nor leaf node.
Sample Input
Sample Output
1 Leaf
2 Inner
3 Leaf
5 Root
6 Leaf
8 Inner
9 Leaf
Explanation
The Binary Tree below illustrates the sample:
Why is the solution below not working:
select n,
CASE when P is null then 'Root'
when (select count(*) from BST where n = p)>0 then 'Inner'
else 'Leaf'
end as nodetype from BST
order by n
while the solution below works:
select n,
CASE when P is null then 'Root'
when (select count(*) from BST where b.n = p)>0 then 'Inner'
else 'Leaf'
end as nodetype from BST b
order by n

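In your first query, both n and p inside the subquery resolve to the subquery's own BST, so you are comparing n with p within the same row. That is never true here (no node is its own parent), so the count is always 0 and every non-root node is reported as 'Leaf'.
In the second query, the alias b makes b.n refer to the outer query's row, so you are comparing the outer n with the p column of the subquery. The count is greater than 0 if there is at least one child under the b.n node; otherwise it is 0.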
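The difference between the two queries can be checked with a short sketch using Python's sqlite3 module. The sample rows are an assumption, chosen to match the sample output listed in the question.

```python
import sqlite3

# Assumed sample rows (n, p); the root (5) has a NULL parent.
ROWS = [(1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BST (n INTEGER, p INTEGER)")
conn.executemany("INSERT INTO BST VALUES (?, ?)", ROWS)

# Unqualified n binds to the inner BST: the subquery is not correlated.
unqualified = conn.execute("""
SELECT n, CASE WHEN p IS NULL THEN 'Root'
               WHEN (SELECT COUNT(*) FROM BST WHERE n = p) > 0 THEN 'Inner'
               ELSE 'Leaf' END AS nodetype
FROM BST
ORDER BY n
""").fetchall()

# b.n binds to the outer row: the subquery counts the children of b.n.
qualified = conn.execute("""
SELECT n, CASE WHEN p IS NULL THEN 'Root'
               WHEN (SELECT COUNT(*) FROM BST WHERE b.n = p) > 0 THEN 'Inner'
               ELSE 'Leaf' END AS nodetype
FROM BST b
ORDER BY n
""").fetchall()

print(unqualified)
print(qualified)
```

The unqualified version never finds an 'Inner' node, because its subquery returns the same count (0) for every outer row.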

Using a CASE expression is the way to go. However, to determine whether a node is 'Inner', one needs to check whether it is the parent of another node, i.e. whether its value N is in the set of all P values.
SELECT N,
CASE
WHEN P IS NULL THEN 'Root'
WHEN N IN (SELECT P FROM BST) THEN 'Inner'
ELSE 'Leaf'
END AS node_type
FROM BST
ORDER BY N

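The code below does not classify tree nodes. It reads an integer, converts it to binary, and prints the length of the longest run of consecutive 1s; the foreach loop tracks the maximum, which is then written to the console.
int n = Convert.ToInt32(Console.ReadLine().Trim());
// Binary representation of n, split on '0' so that each piece is a run of 1s.
string[] groupings = Convert.ToString(n, 2).Split('0');
int max = 0;
foreach (string s in groupings) {
if (max < s.Length) {
max = s.Length; // longest run seen so far
}
}
Console.WriteLine(max);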

Write an additional column to query result with different values everytime

I've been searching for quite a while now and I haven't been able to find an answer for what I was looking. I have the following query:
SELECT DISTINCT o.titulo, o.fecha_estreno
FROM Obra o
WHERE (o.titulo LIKE '%Barcelona%'
       AND EXISTS (SELECT p.id_obra FROM Pelicula p WHERE p.id_obra = o.id_obra))
   OR EXISTS (SELECT DISTINCT pa.id_obra
              FROM Participa pa
              WHERE pa.id_obra = o.id_obra
                AND EXISTS (SELECT DISTINCT l.nombre
                            FROM Lugar l
                            WHERE l.nombre LIKE '%Barcelona%'
                              AND EXISTS (SELECT DISTINCT tl.id_lugar
                                          FROM TieneLugar tl
                                          WHERE tl.id_lugar = l.id_lugar
                                            AND tl.id_profesional = pa.id_profesional)))
   OR EXISTS (SELECT DISTINCT er.id_obra
              FROM EstaRelacionado er
              WHERE er.id_obra = o.id_obra
                AND EXISTS (SELECT DISTINCT k.keyword
                            FROM Keywords k
                            WHERE k.id_keyword = er.id_keyword
                              AND k.keyword LIKE '%Barcelona%'));
What it basically does is it searches for every movie in my database which is related in some way to the city it gets. I wanted to have a third column showing for every result, with the reason the row is showing as a result (for example: TITLE CONTAINS IT, or ACTOR FROM THE MOVIE BORN THERE, etc.)
Thank you for your patience and help!
EDIT: As suggested, here are some examples of output. The column should show just the first cause related to the movie:
TITULO FECHA_ESTRENO CAUSE
---------- ---------------- ----------
Barcelona mia 1967 TITLE

https://www.postgresql.org/docs/7.4/static/functions-conditional.html
The SQL CASE expression is a generic conditional expression, similar
to if/else statements in other languages:
CASE WHEN condition THEN result
[WHEN ...]
[ELSE result]
END
CASE clauses can be used wherever an expression is valid. condition is an expression that returns a boolean result. If
the result is true then the value of the CASE expression is the result
that follows the condition. If the result is false any subsequent WHEN
clauses are searched in the same manner. If no WHEN condition is true
then the value of the case expression is the result in the ELSE
clause. If the ELSE clause is omitted and no condition matches, the
result is null.
Example for your case:
SELECT (CASE WHEN EXISTS(... l.nombre LIKE '%Barcelona%') THEN 'TITLE CONTAINS IT' WHEN <condition for actor> THEN 'ACTOR WAS BORN THERE' WHEN ... END) AS reason
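As a rough illustration of the CASE approach, the sketch below runs a reduced version of the query through Python's sqlite3 module. Only two of the conditions are modelled (title match and keyword match). The table columns follow the question, but all rows except 'Barcelona mia' are invented for the example, and the cause labels are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Obra (id_obra INTEGER, titulo TEXT, fecha_estreno INTEGER);
CREATE TABLE Keywords (id_keyword INTEGER, keyword TEXT);
CREATE TABLE EstaRelacionado (id_obra INTEGER, id_keyword INTEGER);

-- Hypothetical rows: obra 1 matches by title and keyword, obra 2 by keyword only.
INSERT INTO Obra VALUES (1, 'Barcelona mia', 1967),
                        (2, 'Otra historia', 2001),
                        (3, 'Sin relacion', 1999);
INSERT INTO Keywords VALUES (10, 'Barcelona');
INSERT INTO EstaRelacionado VALUES (1, 10), (2, 10);
""")

# CASE returns the result of the first WHEN that is true, so the order of
# the branches decides which cause is shown when several apply.
results = conn.execute("""
SELECT titulo, fecha_estreno, cause
FROM (
    SELECT o.titulo, o.fecha_estreno,
           CASE
             WHEN o.titulo LIKE '%Barcelona%' THEN 'TITLE'
             WHEN EXISTS (SELECT 1
                          FROM EstaRelacionado er
                          JOIN Keywords k ON k.id_keyword = er.id_keyword
                          WHERE er.id_obra = o.id_obra
                            AND k.keyword LIKE '%Barcelona%') THEN 'KEYWORD'
           END AS cause
    FROM Obra o
)
WHERE cause IS NOT NULL   -- no ELSE branch: unmatched rows get NULL
ORDER BY titulo
""").fetchall()

print(results)
```

Because the CASE has no ELSE, rows that match none of the conditions get NULL and are filtered out by the outer WHERE, which replaces the original OR chain.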

Here is one solution.
Create a subquery for each search condition.
Include the reason in each subquery's projection.
Outer join the subqueries so it doesn't matter which one hits.
Filter to make sure that at least one of your subqueries has a positive result.
Use coalesce() to get one reason.
I haven't done all your conditions, and I've probably mangled your logic but this is the general idea:
SELECT o.titulo
     , o.fecha_estreno
     , coalesce(t1.reason, t2.reason) as reason
FROM Obra o
left outer join ( select id_obra, 'title contains it' as reason
                  from Obra
                  where titulo LIKE '%Barcelona%' ) t1
  on t1.id_obra = o.id_obra
left outer join ( select distinct pa.id_obra, 'takes place there' as reason
                  from Participa pa
                  join TieneLugar tl
                    on tl.id_profesional = pa.id_profesional
                  join Lugar l
                    on tl.id_lugar = l.id_lugar
                  where l.nombre LIKE '%Barcelona%' ) t2
  on t2.id_obra = o.id_obra
WHERE t1.id_obra is not null
   or t2.id_obra is not null
/
coalesce() just returns the first non-null value, which means you won't see multiple reasons if you get more than one hit. So order the arguments to put the most important reasons first.
Also, you should consider using Oracle Text. It's the smartest way to wrangle this sort of keyword searching.
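The outer-join-plus-coalesce() idea can be tried out with a small sketch in Python's sqlite3 module. It is a reduced version under assumptions: only a title condition and a keyword condition are modelled, the rows other than 'Barcelona mia' are invented, and the reason strings are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Obra (id_obra INTEGER, titulo TEXT, fecha_estreno INTEGER);
CREATE TABLE Keywords (id_keyword INTEGER, keyword TEXT);
CREATE TABLE EstaRelacionado (id_obra INTEGER, id_keyword INTEGER);

-- Hypothetical rows: obra 1 hits both conditions, obra 2 only the keyword one.
INSERT INTO Obra VALUES (1, 'Barcelona mia', 1967),
                        (2, 'Otra historia', 2001),
                        (3, 'Sin relacion', 1999);
INSERT INTO Keywords VALUES (10, 'Barcelona');
INSERT INTO EstaRelacionado VALUES (1, 10), (2, 10);
""")

# One derived table per condition; COALESCE picks the first non-NULL reason.
reasons = conn.execute("""
SELECT o.titulo, o.fecha_estreno, COALESCE(t1.reason, t2.reason) AS reason
FROM Obra o
LEFT JOIN (SELECT id_obra, 'title contains it' AS reason
           FROM Obra
           WHERE titulo LIKE '%Barcelona%') t1
       ON t1.id_obra = o.id_obra
LEFT JOIN (SELECT DISTINCT er.id_obra, 'keyword matches' AS reason
           FROM EstaRelacionado er
           JOIN Keywords k ON k.id_keyword = er.id_keyword
           WHERE k.keyword LIKE '%Barcelona%') t2
       ON t2.id_obra = o.id_obra
WHERE t1.id_obra IS NOT NULL
   OR t2.id_obra IS NOT NULL
ORDER BY o.titulo
""").fetchall()

print(reasons)
```

Obra 1 matches both derived tables but shows only the title reason, because t1.reason comes first in the COALESCE argument list.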

Recursive CTE...dealing with nested parent/children records

I have the following records:
My goal is to check the SUM of the children for each parent and make sure it is 1 (or 100%).
In the example above, you have a first parent:
12043
It has 2 children:
12484 & 12485
Child (now parent) 12484 has child 12486. The child here (12486) has a percentage of 0.6 (which is NOT 100%). This is NOT OK.
Child (now parent) 12485 has child 12487. The child here (12487) has a percentage of 1 (or 100%). This is OK.
I need to sum the percentages of the children of each nested parent and, if a sum does not add up to 100%, display a message. I'm having a hard time coming up with a query for this. Can someone give me a hand?
This is what I tried and I'm getting the "The statement terminated. The maximum recursion 100 has been exhausted before statement completion." error message.
with cte
as (select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
0 as level
from dbo.tp_contract_payee cp
where cp.participant_id = 12067
and cp.payee_id = cp.parent_payee_id
union all
select cp.parent_payee_id,
cp.payee_id,
cp.payee_pct,
c.level + 1 as level
from dbo.tp_contract_payee cp
inner join cte c
on cp.parent_payee_id = c.payee_id
where cp.participant_id = 12067
)
select *
from cte

I believe something like the following should work:
WITH recCTE AS -- SQL Server syntax; PostgreSQL and MySQL would need WITH RECURSIVE
(
SELECT
cp.parent_payee_id as parent,
cp.payee_id as child,
cp.payee_pct,
1 as depth,
CAST(cp.parent_payee_id AS varchar(max)) + '>' + CAST(cp.payee_id AS varchar(max)) as path
FROM
dbo.tp_contract_payee cp
WHERE
--top most node
cp.parent_payee_id = 12043
AND cp.payee_id <> cp.parent_payee_id --prevent endless recursion
UNION ALL
SELECT
cp.parent_payee_id as parent,
cp.payee_id as child,
cp.payee_pct,
recCTE.depth + 1 as depth,
recCTE.path + '>' + CAST(cp.payee_id AS varchar(max)) as path
FROM
recCTE
INNER JOIN dbo.tp_contract_payee cp ON
recCTE.child = cp.parent_payee_id AND
recCTE.child <> cp.payee_id --again prevent records where parent is child
WHERE recCTE.depth < 15 --prevent endless cycles
)
SELECT parent
FROM recCTE
GROUP BY parent
HAVING sum(payee_pct) <> 1;
This differs from yours mostly in the WHERE clauses of both the recursive seed (the query before UNION ALL) and the recursive term (the query after UNION ALL). Your seed selects rows where payee_id = parent_payee_id, and the recursive join then matches such a row to itself on every iteration, which is what exhausts the recursion limit. I also believe yours is too restrictive, especially in the recursive term, since you want to let records that are children of 12067 through, but then you only allow 12067 as the parent id to pull in.
Here, though, we pull every descendant of 12043 (from your example table) and its payee_pct. Then we analyze each parent in the final SELECT together with the sum of the payee_pct values of its direct children. If any of these sums is not 1, we display the parent in the output.
At any rate, between your query and mine, I would imagine this is pretty close to the requirements, so it should only take a few tweaks to get you exactly where you need to be if this doesn't do the trick.
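The recursive walk and the per-parent check can be sketched with Python's sqlite3 module. This is an illustration under assumptions: the table is called payee here, the 0.5 shares of 12484 and 12485 and the self-referencing root row are invented (the question does not show them), and SQLite requires the RECURSIVE keyword that SQL Server does not use.

```python
import sqlite3

# Hypothetical rows (parent_payee_id, payee_id, payee_pct). The 0.5 shares
# and the self-referencing root row are assumptions for illustration.
ROWS = [
    (12043, 12043, 1.0),
    (12043, 12484, 0.5),
    (12043, 12485, 0.5),
    (12484, 12486, 0.6),
    (12485, 12487, 1.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payee (parent_payee_id INTEGER, payee_id INTEGER, payee_pct REAL)"
)
conn.executemany("INSERT INTO payee VALUES (?, ?, ?)", ROWS)

bad_parents = conn.execute("""
WITH RECURSIVE rec(parent, child, payee_pct, depth) AS (
    -- Seed: direct children of the top node, skipping the self-reference.
    SELECT parent_payee_id, payee_id, payee_pct, 1
    FROM payee
    WHERE parent_payee_id = 12043
      AND payee_id <> parent_payee_id
    UNION ALL
    -- Recursive term: children of every child found so far.
    SELECT p.parent_payee_id, p.payee_id, p.payee_pct, rec.depth + 1
    FROM rec
    JOIN payee p ON p.parent_payee_id = rec.child
                AND p.payee_id <> rec.child
    WHERE rec.depth < 15
)
SELECT parent, SUM(payee_pct)
FROM rec
GROUP BY parent
HAVING ABS(SUM(payee_pct) - 1) > 1e-9   -- tolerate floating-point noise
ORDER BY parent
""").fetchall()

print(bad_parents)
```

With these rows only parent 12484 is reported, since its single child carries 0.6. The tolerance in the HAVING clause is a design choice for REAL columns; with an exact decimal type a plain <> 1 comparison would do.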

How to group by more than one row value?

I am working with PostgreSQL and I can't figure out how to solve a problem. I have a model called Foobar. Some of its attributes are:
FOOBAR
check_in:datetime
qr_code:string
city_id:integer
In this table there is a lot of redundancy (qr_code is not unique) but that is not my problem right now. What I am trying to get are the foobars that have same qr_code and have been in a well known group of cities, that have checked in at different moments.
I got this by querying:
SELECT * FROM foobar AS a
WHERE a.city_id = 1
AND EXISTS (
SELECT * FROM foobar AS b
WHERE a.check_in < b.check_in
AND a.qr_code = b.qr_code
AND b.city_id = 2
AND EXISTS (
SELECT * FROM foobar as c
WHERE b.check_in < c.check_in
AND c.qr_code = b.qr_code
AND c.city_id = 3
AND EXISTS(...)
)
)
where '...' represents more queries to get more persons with the same qr_code, different check_in date and those well known cities.
My problem is that I want to group this by qr_code, and I want to show the check_in fields of each qr_code like this:
2015-11-11 14:14:14 => [2015-11-11 14:14:14, 2015-11-11 16:16:16, 2015-11-11 17:18:20] (this for each different qr_code)
where the data at the left is the 'smaller' date for that qr_code, and the right part are all the other dates for that qr_code, including the first one.
Is this possible to do with a sql query only? I am asking this because I am actually doing this app with rails, and I know that I can make a different approach with array methods of ruby (a solution with this would be well received too)

You could solve that with a recursive CTE - if I interpret your question correctly:
Assuming you have a given list of cities that must be visited in order by the same qr_code. Your text doesn't say so, but your query indicates as much.
WITH RECURSIVE
c AS (SELECT '{1,2,3}'::int[] AS cities) -- your list of city_id's here
, route AS (
SELECT f.check_in, f.qr_code, 2 AS idx
FROM foobar f
JOIN c ON f.city_id = c.cities[1]
UNION ALL
SELECT f.check_in, f.qr_code, r.idx + 1
FROM route r
JOIN foobar f USING (qr_code)
JOIN c ON f.city_id = c.cities[r.idx]
WHERE r.check_in < f.check_in
)
SELECT qr_code, array_agg(check_in) AS check_in_list
FROM (
SELECT *
FROM route
ORDER BY qr_code, idx -- or check_in
) sub
GROUP BY 1
HAVING count(*) = (SELECT array_length(cities, 1) FROM c);
Provide the list as an array in the first (non-recursive) CTE c.
In the recursive part, start with any rows in the first city and travel along your array until the last element.
In the final SELECT, aggregate your check_in column in order. Only return qr_codes that have visited all cities of the array.
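The same route-following idea can be sketched in Python with sqlite3. SQLite has no array type, so this version swaps in a helper table cities(idx, city_id) for the int[] array and does the final grouping in Python instead of array_agg; both substitutions, and all sample rows except the three timestamps from the question, are assumptions.

```python
import sqlite3

CITIES = [1, 2, 3]  # required visiting order (stands in for the int[] array)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foobar (check_in TEXT, qr_code TEXT, city_id INTEGER);
CREATE TABLE cities (idx INTEGER, city_id INTEGER);

-- Hypothetical rows: A visits 1, 2, 3 in order; B visits city 2 before city 1.
INSERT INTO foobar VALUES
    ('2015-11-11 14:14:14', 'A', 1),
    ('2015-11-11 16:16:16', 'A', 2),
    ('2015-11-11 17:18:20', 'A', 3),
    ('2015-11-11 10:00:00', 'B', 1),
    ('2015-11-11 09:00:00', 'B', 2);
""")
conn.executemany("INSERT INTO cities VALUES (?, ?)", list(enumerate(CITIES, start=1)))

rows = conn.execute("""
WITH RECURSIVE route(check_in, qr_code, idx) AS (
    -- Start with every check-in in the first city of the list.
    SELECT f.check_in, f.qr_code, 1
    FROM foobar f
    JOIN cities c ON c.idx = 1 AND f.city_id = c.city_id
    UNION ALL
    -- Step to a later check-in of the same qr_code in the next city.
    SELECT f.check_in, f.qr_code, r.idx + 1
    FROM route r
    JOIN foobar f ON f.qr_code = r.qr_code
    JOIN cities c ON c.idx = r.idx + 1 AND f.city_id = c.city_id
    WHERE r.check_in < f.check_in
)
SELECT qr_code, check_in
FROM route
ORDER BY qr_code, idx
""").fetchall()

# Group the ordered check-ins per qr_code and keep only complete routes.
routes = {}
for qr_code, check_in in rows:
    routes.setdefault(qr_code, []).append(check_in)
complete = {qr: stamps for qr, stamps in routes.items() if len(stamps) == len(CITIES)}

print(complete)
```

The timestamps are stored as ISO-8601 text, so the string comparison r.check_in < f.check_in orders them chronologically. Like the count(*) test in the SQL version, the length check assumes at most one check-in per city for each qr_code.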
Similar:
Recursive query used for transitive closure

subquery returning more than one value

SELECT CG.SITEID,
CR.COLLECTIONID,
CG.COLLECTIONNAME,
CASE
WHEN CR.ARCHITECTUREKEY = 5
THEN
N'vSMS_R_System'
WHEN CR.ARCHITECTUREKEY = 0
THEN
(SELECT BASETABLENAME
FROM DISCOVERYARCHITECTURES
JOIN
COLLECTION_RULES
ON DISCOVERYARCHITECTURES.DISCARCHKEY =
COLLECTION_RULES.ARCHITECTUREKEY
JOIN
COLLECTIONS_G
ON COLLECTION_RULES.COLLECTIONID =
COLLECTIONS_G.COLLECTIONID
WHERE COLLECTIONS_G.SITEID = (SELECT TOP 1 SOURCECOLLECTIONID FROM VCOLLECTIONDEPENDENCYCHAIN WHERE DEPENDENTCOLLECTIONID = CG.SITEID ORDER BY LEVEL DESC))
ELSE (SELECT DA.BASETABLENAME FROM DISCOVERYARCHITECTURES DA WHERE DA.DISCARCHKEY=CR.ARCHITECTUREKEY) END AS TABLENAME
FROM COLLECTIONS_G CG
JOIN COLLECTIONS_L CL ON CG.COLLECTIONID=CL.COLLECTIONID
JOIN COLLECTION_RULES CR ON CG.COLLECTIONID=CR.COLLECTIONID
WHERE (CG.FLAGS&4)=4 AND CL.CURRENTSTATUS!=5
I am having a problem with the code above, around the line:
when cr.ArchitectureKey=0 then...
The problem is that the sub-query returns more than one value, and I'm not too sure how to invert the query so that I get rid of the error.
To make matters worse, cr.ArchitectureKey would normally join with da.DiscArchKey, but while cr.ArchitectureKey can have a value of 0, that does not exist in da.DiscArchKey, meaning if I join the two directly I lose data.
EDIT
More information regarding the problem itself:
This is a stored procedure for a Microsoft product that has a 'bug' (probably considered a feature though) which I'm trying to fix. Don't worry, this is only in my own little test server.
Anyway, there's the concept of a Collection. All Collections must have a parent (determined through VCOLLECTIONDEPENDENCYCHAIN), with the exception of the very top level Collection that is a system collection and cannot be modified.
Each collection can have 0 or more rules, and each rule has a rule type, where the ID of the rule type is saved onto COLLECTION_RULES and the matching string for that ID is saved onto DISCOVERYARCHITECTURES.
In most cases, a rule is a WQL query, and the rule type is determined by what tables are queried on the WQL query.
However, and this is where the problem lies, collections can also have a query of type 'include' or 'exclude', which basically forces it to borrow the query of another Collection. So effectively you include the results of another Collection's query onto your own Collection, and that's the query.
As far as COLLECTION_RULES is concerned, when that happens, the ID of the rule type is 0, which is a value that doesn't exist in DISCOVERYARCHITECTURES.
What I was trying to modify was so that when the rule type is 0, get and use the rule type(s) of the highest up parent (not the direct parent since the parent Collection could also have a single include rule, in which case the rule type would still be 0).
The problem is that because each rule can have multiple rule types, it returns multiple rows in some instances.
I tried to invert the query to remove the SELECT and use joins only, but failed because I found I always needed to join it to DISCOVERYARCHITECTURES and I have nothing to join it on when the rule type = 0.
EDIT2
Sample data:
Collections_G
Collections_L
Collection_Rules
DiscoveryArchitectures
vCollectionDependencyChain
Original Query and Original Results
SELECT cg.SiteID,
CASE
WHEN da.DiscArchKey=5
THEN N'vSMS_R_System'
ELSE da.BaseTableName END AS TableName
FROM Collections_G cg
JOIN Collections_L cl ON cg.CollectionID=cl.CollectionID
JOIN Collection_Rules cr ON cg.CollectionID=cr.CollectionID
JOIN DiscoveryArchitectures da ON cr.ArchitectureKey=da.DiscArchKey
WHERE (cg.Flags&4)=4 AND cl.CurrentStatus!=5
As you can see from the results picture above, some collections appear multiple times but with different TableNames. This is because each collection can have several rules, and each rule has one cr.ArchitectureKey.
Also, and more importantly, collections PS10000B and PS10000C do not show up because their cr.ArchitectureKey = 0, which is a value that doesn't exist in da.DiscArchKey.
My goal is to have collections whose cr.ArchitectureKey is 0 appear as well, but for that I need to assign them a usable cr.ArchitectureKey.
My thought (which is slightly flawed, but I don't know enough SQL to make it better, so if someone could help with that it would be appreciated too) was to use the da.DiscArchKey from the top level parent. But the top level parent can have multiple DiscArchKeys, which is what is causing the problem.
As mentioned above getting the top level parent is slightly flawed, and ideally I would get the top level cr.ReferencedCollectionID. In other words, if PS10000B has a cr.ReferencedCollectionID of PS10000C and PS10000C has a cr.ReferencedCollectionID of SMS00002 but because SMS00002 has no cr.ReferencedCollectionID then SMS00002 is the top level cr.ReferencedCollectionID and both PS10000B and PS10000C should have da.DiscArchKey(s) equal to those of SMS00002.

Please have a look at a weird solution that comes to mind. You may face some syntax errors (most probably in the 2nd and 3rd CTEs), but it is just an idea.
Get each case values in separate CTEs and then combine them at the end.
;WITH CTE
AS
(
SELECT CG.SITEID,
CR.COLLECTIONID,
CG.COLLECTIONNAME,
CR.ARCHITECTUREKEY -- needed by the CTEs below, which filter on C.ARCHITECTUREKEY
FROM COLLECTIONS_G CG
JOIN COLLECTIONS_L CL ON CG.COLLECTIONID=CL.COLLECTIONID
JOIN COLLECTION_RULES CR ON CG.COLLECTIONID=CR.COLLECTIONID
WHERE (CG.FLAGS&4)=4 AND CL.CURRENTSTATUS!=5
),
ARCHITECTUREKEY5
AS
(
SELECT C.SITEID,
C.COLLECTIONID,
C.COLLECTIONNAME,
N'vSMS_R_System' as TABLENAME
FROM CTE C WHERE C.ARCHITECTUREKEY = 5
),
ARCHITECTUREKEY0
AS
(
SELECT C.SITEID,
C.COLLECTIONID,
C.COLLECTIONNAME,
BASETABLENAME as TABLENAME
FROM CTE C,
DISCOVERYARCHITECTURES
JOIN
COLLECTION_RULES
ON DISCOVERYARCHITECTURES.DISCARCHKEY =
COLLECTION_RULES.ARCHITECTUREKEY
JOIN
COLLECTIONS_G
ON COLLECTION_RULES.COLLECTIONID =
COLLECTIONS_G.COLLECTIONID
WHERE COLLECTIONS_G.SITEID = (SELECT TOP 1 SOURCECOLLECTIONID FROM VCOLLECTIONDEPENDENCYCHAIN WHERE DEPENDENTCOLLECTIONID = C.SITEID ORDER BY LEVEL DESC)
and C.ARCHITECTUREKEY = 0
),
ARCHITECTUREKEYOTHER
AS
(
SELECT C.SITEID,
C.COLLECTIONID,
C.COLLECTIONNAME,
DA.BASETABLENAME as TABLENAME
FROM DISCOVERYARCHITECTURES DA, CTE C WHERE DA.DISCARCHKEY=C.ARCHITECTUREKEY AND C.ARCHITECTUREKEY not in (0,5)
)
Select * from ARCHITECTUREKEY5
UNION
Select * from ARCHITECTUREKEY0
UNION
Select * from ARCHITECTUREKEYOTHER
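The idea of splitting the CASE branches into separate queries and combining them with UNION can be sketched on a toy schema with Python's sqlite3 module. Everything here is hypothetical: the tables rules, arch and parent, their columns, and the rows stand in for COLLECTION_RULES, DISCOVERYARCHITECTURES and VCOLLECTIONDEPENDENCYCHAIN and do not reproduce the real product schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the real tables.
CREATE TABLE rules (collection_id TEXT, arch_key INTEGER);
CREATE TABLE arch (arch_key INTEGER, base_table TEXT);
CREATE TABLE parent (child_id TEXT, top_id TEXT);

INSERT INTO rules VALUES ('TOP', 5), ('TOP', 3), ('A', 3), ('B', 0);
INSERT INTO arch VALUES (3, 't_user'), (5, 't_system');
INSERT INTO parent VALUES ('B', 'TOP');
""")

# Each former CASE branch becomes its own SELECT. A branch that yields
# several table names per collection now yields several rows instead of
# failing as a scalar subquery would.
table_names = conn.execute("""
SELECT r.collection_id, 'vSMS_R_System' AS table_name
FROM rules r
WHERE r.arch_key = 5
UNION
SELECT r.collection_id, a.base_table
FROM rules r
JOIN parent p ON p.child_id = r.collection_id
JOIN rules pr ON pr.collection_id = p.top_id
JOIN arch a ON a.arch_key = pr.arch_key
WHERE r.arch_key = 0
UNION
SELECT r.collection_id, a.base_table
FROM rules r
JOIN arch a ON a.arch_key = r.arch_key
WHERE r.arch_key NOT IN (0, 5)
ORDER BY 1, 2
""").fetchall()

print(table_names)
```

Collection B, whose only rule has key 0, inherits both table names of its top-level parent and therefore appears twice, which is the multi-row result a scalar subquery inside CASE cannot express.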
