SQL Join tables - detecting presence of some tuples but not others

I've got two primary tables: codes and categories.
I've also got a join table code_mappings which associates codes with categories.
I need to determine which codes are mapped to one group of categories but not to another. I've been banging my head against this for a while and am completely stuck.
Here's the schema:
create table codes(
id int,
name varchar(256));
create table code_mappings(
id int,
code_id int,
category_id int);
create table categories(
id int,
name varchar(256));
And some seed data:
INSERT INTO categories VALUES(1, 'Dental');
INSERT INTO categories VALUES(2, 'Weight');
INSERT INTO categories VALUES(3, 'Other');
INSERT INTO categories VALUES(4, 'Acme Co');
INSERT INTO categories VALUES(5, 'No Name');
INSERT INTO codes VALUES(100, 'big bag of cat food');
INSERT INTO codes VALUES(200, 'healthy doggie treatz');
INSERT INTO code_mappings VALUES(50, 200, 1);
INSERT INTO code_mappings VALUES(51, 100, 4);
INSERT INTO code_mappings VALUES(52, 100, 3);
How would I write a query that will give me the codes that are mapped to one of categories (1,2,3) but not to one of categories (4,5)?

This is an example of a set-within-sets query. I like to approach these using group by and having, because I find that the most flexible approach:
select cm.code_id
from code_mappings cm
group by cm.code_id
having sum(case when cm.category_id in (1, 2, 3) then 1 else 0 end) = 1 and
sum(case when cm.category_id in (4, 5) then 1 else 0 end) = 0;
Each condition in the having clause implements one of your two requirements. You asked for one of categories 1, 2, or 3, hence the = 1 (if you wanted at least one of these three, it would be > 0). You asked for none of 4 or 5, hence the = 0.
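To try the group by / having approach against the seed data, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The helper name codes_in_but_not_in is hypothetical, and it uses the "> 0" (at least one) variant mentioned above rather than "= 1".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code_mappings (id INT, code_id INT, category_id INT);
    INSERT INTO code_mappings VALUES (50, 200, 1);
    INSERT INTO code_mappings VALUES (51, 100, 4);
    INSERT INTO code_mappings VALUES (52, 100, 3);
""")

def codes_in_but_not_in(conn, wanted, excluded):
    """Codes mapped to at least one wanted category and to no excluded one."""
    w = ",".join("?" * len(wanted))      # one placeholder per wanted id
    e = ",".join("?" * len(excluded))    # one placeholder per excluded id
    sql = f"""
        SELECT code_id
        FROM code_mappings
        GROUP BY code_id
        HAVING SUM(CASE WHEN category_id IN ({w}) THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN category_id IN ({e}) THEN 1 ELSE 0 END) = 0
        ORDER BY code_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, *excluded])]

result = codes_in_but_not_in(conn, [1, 2, 3], [4, 5])
print(result)  # [200]
```

Code 100 is dropped because it is mapped to category 4, even though it is also mapped to category 3.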

SELECT *
FROM codes co
WHERE EXISTS (
SELECT *
FROM code_mappings ex
WHERE ex.code_id = co.id
AND ex.category_id IN (1,2,3)
)
AND NOT EXISTS (
SELECT *
FROM code_mappings nx
WHERE nx.code_id = co.id
AND nx.category_id IN (4,5)
)
;

Related

Find data by multiple Lookup table clauses

declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
The Name table has more than just 1, 2, 3, and the list to parse on is dynamic.
NameTable
id | name
----------
1  | foo
2  | bar
3  | steak
CharacterTable
id | name
---------
1  | tom
2  | jerry
3  | dog
NameToCharacterTable
id | nameId | characterId
-------------------------
1  | 1      | 1
2  | 1      | 3
3  | 1      | 2
4  | 2      | 1
I am looking for a query that will return a character that has two names. For example, with the above data and names 1 and 2, only "tom" will be returned.
SELECT *
FROM nameToCharacterTable
WHERE nameId in (1,2)
The IN clause will return every row that has a 1 or a 2. I want to return only the characters that have both a 1 and a 2.
I am stumped. I have tried everything I know and do not want to resort to dynamic SQL. Any help would be great.
The 1,2 in this example will be a dynamic list of integers; for example, it could be 1,3,4,5,.....
Count how many rows the character has in the NameToCharacter table that match the list you are providing (which I have assumed you can convert into a table variable or temp table), and keep the characters whose count equals the length of the list, e.g.
declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
(1, 'tom'),
(2, 'jerry'),
(3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
(1, 1, 1),
(2, 1, 3),
(3, 1, 2),
(4, 2, 1);
declare @RequiredNames table (nameId int);
insert into @RequiredNames (nameId)
values
(1),
(2);
select *
from @Character C
where (
select count(*)
from @NameToCharacter NC
where NC.characterId = C.id
and NC.nameId in (select nameId from @RequiredNames)
) = 2;
Returns:
id | name
---------
1  | tom
Note: Providing DDL+DML as shown here makes it much easier for people to assist you.
This is classic Relational Division With Remainder.
There are a number of different solutions. @DaleK has given you an excellent one: inner-join everything, then check that each set has the right number of matches. This is normally the fastest solution.
If you want to ensure it works with a dynamic number of rows, just change the last line to
) = (SELECT COUNT(*) FROM @RequiredNames);
Two other common solutions exist.
Left-join and check that all rows were joined
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
FROM @RequiredNames rn
LEFT JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
HAVING COUNT(*) = COUNT(nc.nameId) -- all rows are joined
);
Double anti-join, in other words: there are no "required" that are "not in the set"
SELECT *
FROM @Character c
WHERE NOT EXISTS (SELECT 1
FROM @RequiredNames rn
WHERE NOT EXISTS (SELECT 1
FROM @NameToCharacter nc
WHERE nc.nameId = rn.nameId AND nc.characterId = c.id
)
);
A variation on the one from the other answer uses a windowed aggregate instead of a subquery. I don't think this is performant, but it may have uses in certain cases.
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
FROM (
SELECT *, COUNT(*) OVER () AS cnt
FROM @RequiredNames
) rn
JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
HAVING COUNT(*) = MIN(rn.cnt)
);
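As a rough way to compare two of these relational-division variants on the sample data, the sketch below runs the count-matching query and the double anti-join through Python's built-in sqlite3 module. SQLite has no table variables, so ordinary tables stand in for them; the table name Characters and the helper characters_with_all are assumptions made for this illustration. The count-matching form also assumes each (nameId, characterId) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Characters (id INT, name TEXT);
    INSERT INTO Characters VALUES (1, 'tom'), (2, 'jerry'), (3, 'dog');
    CREATE TABLE NameToCharacter (id INT, nameId INT, characterId INT);
    INSERT INTO NameToCharacter VALUES (1, 1, 1), (2, 1, 3), (3, 1, 2), (4, 2, 1);
    CREATE TABLE RequiredNames (nameId INT PRIMARY KEY);
""")

# Count the matches and compare with the size of the required list.
COUNT_MATCH = """
    SELECT c.name FROM Characters c
    WHERE (SELECT COUNT(*) FROM NameToCharacter nc
           WHERE nc.characterId = c.id
             AND nc.nameId IN (SELECT nameId FROM RequiredNames))
        = (SELECT COUNT(*) FROM RequiredNames)
    ORDER BY c.id
"""

# Double anti-join: no required name is missing for this character.
DOUBLE_ANTI_JOIN = """
    SELECT c.name FROM Characters c
    WHERE NOT EXISTS (SELECT 1 FROM RequiredNames rn
                      WHERE NOT EXISTS (SELECT 1 FROM NameToCharacter nc
                                        WHERE nc.nameId = rn.nameId
                                          AND nc.characterId = c.id))
    ORDER BY c.id
"""

def characters_with_all(conn, query, name_ids):
    """Reload RequiredNames with name_ids, then run the given query."""
    conn.execute("DELETE FROM RequiredNames")
    conn.executemany("INSERT INTO RequiredNames VALUES (?)",
                     [(n,) for n in set(name_ids)])
    return [row[0] for row in conn.execute(query)]

print(characters_with_all(conn, COUNT_MATCH, [1, 2]))       # ['tom']
print(characters_with_all(conn, DOUBLE_ANTI_JOIN, [1, 2]))  # ['tom']
```

Because the required list lives in a table, changing its length needs no dynamic SQL; only the rows in RequiredNames change.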

Oracle SQL invalid identifier error in nested WITH subquery

Below you will find three sample tables and data along with a query. This example might seem contrived, but it is part of much larger (nearly 1500 lines) SQL query. The original query works great, but I've run into a problem while adding some new functionality.
CREATE TABLE rule_table (
id_rule_table NUMBER (10),
name VARCHAR2 (24),
goal NUMBER (10),
amount NUMBER (10)
);
INSERT INTO rule_table (id_rule_table, name, goal, amount) VALUES(1, 'lorem', 2, 3);
INSERT INTO rule_table (id_rule_table, name, goal, amount) VALUES(2, 'ipsum', 3, 3);
INSERT INTO rule_table (id_rule_table, name, goal, amount) VALUES(3, 'dolor', 4, 3);
CREATE TABLE content_table (
id_content_table NUMBER (10),
name VARCHAR2 (24),
show_flag NUMBER (10)
);
INSERT INTO content_table (id_content_table, name, show_flag) VALUES(1, 'lorem', 0);
INSERT INTO content_table (id_content_table, name, show_flag) VALUES(2, 'ipsum', 1);
INSERT INTO content_table (id_content_table, name, show_flag) VALUES(3, 'dolor', 1);
CREATE TABLE module_table (
id_module_table NUMBER (10),
id_content_table NUMBER (10),
name VARCHAR2 (24),
amount NUMBER (10)
);
INSERT INTO module_table (id_module_table, id_content_table, name, amount) VALUES(1, 2, 'lorem', 10);
INSERT INTO module_table (id_module_table, id_content_table, name, amount) VALUES(2, 2, 'ipsum', 11);
INSERT INTO module_table (id_module_table, id_content_table, name, amount) VALUES(3, 2, 'dolor', 12);
SELECT RULE.id_rule_table
FROM rule_table RULE
WHERE (
CASE
WHEN RULE.goal <= (
WITH contentTbl (id_content_table)
AS (
SELECT id_content_table
FROM content_table
WHERE show_flag = 1
),
modulesTbl (id_content_table, id_module_table)
AS (
SELECT C.id_content_table, M.id_module_table
FROM contentTbl C
JOIN module_table M ON M.id_content_table = C.id_content_table
WHERE 4 < M.amount - RULE.amount
)
SELECT SUM(M.id_module_table)
FROM contentTbl C
JOIN modulesTbl M ON C.id_content_table = M.id_content_table
)
THEN 1
ELSE 0
END
) = 1;
DROP TABLE rule_table;
DROP TABLE content_table;
DROP TABLE module_table;
If you try this you will receive the error ORA-00904: "RULE"."AMOUNT": invalid identifier. The problem lies with the line "WHERE 4 < M.amount - RULE.amount".
If you replace RULE.amount, in that line, with some number (e.g., WHERE 4 < M.amount - 3) then the query will run just fine.
As mentioned above, this is a snippet test case from a much larger query, so the structure of the query can't be (or hopefully doesn't need to be) changed too much. That is, ideally I'm looking for a solution that will allow me to use RULE.amount in the sub-query without changing anything other than the SQL inside of the "WHEN RULE.goal <= ()" block.
I'm trying to run this on Oracle 11g.
One last thing: I tried searching Google and Stack Overflow for solutions, but I couldn't figure out the correct terminology to describe my issue. The closest thing seemed to be nested correlated subquery, but that doesn't seem to be exactly right.
Taking into account that this is only part of a much larger query, here are the surgical changes required to make this work:
Move the WHERE 4 < M.amount - RULE.amount condition out of the CTE and into the main query so that RULE is in scope.
Modify the modulesTbl CTE to return an additional column amount so that M.amount is now available to the main query.
With these 2 changes, the query would look like this:
SELECT RULE.id_rule_table
FROM rule_table RULE
WHERE (
CASE
WHEN RULE.goal <= (
WITH contentTbl (id_content_table)
AS (
SELECT id_content_table
FROM content_table
WHERE show_flag = 1
),
modulesTbl (id_content_table, id_module_table, amount) -- add amount
AS (
SELECT C.id_content_table, M.id_module_table, M.amount -- add amount
FROM contentTbl C
JOIN module_table M ON M.id_content_table = C.id_content_table
)
SELECT SUM(M.id_module_table)
FROM contentTbl C
JOIN modulesTbl M ON C.id_content_table = M.id_content_table
AND 4 < M.amount - RULE.amount -- moved from CTE to here
)
THEN 1
ELSE 0
END
) = 1;

SQL 'arrays' for repeated use with IN comparisons

It's entirely possible to do a SELECT statement like so:
SELECT *
FROM orders
WHERE order_id in (10000, 10001, 10003, 10005);
However, is it possible to create a variable which stores that 'array' (10000, ...) for repeated use in multiple statements like so?
SELECT *
FROM orders
WHERE order_id in @IDarray;
Apologies if this is a painfully simple question - we've all gotta ask them once!
Edit: Hmm, perhaps I should clarify. In my exact situation, I have a load of IDs (let's use the array above as an example) that are hard coded but might change.
These should be re-usable for multiple INSERT statements, so that we can insert things into multiple tables for each of the IDs. Two such end results might be:
INSERT INTO table1 VALUES (10000, 1, 2, 3);
INSERT INTO table1 VALUES (10001, 1, 2, 3);
INSERT INTO table1 VALUES (10003, 1, 2, 3);
INSERT INTO table1 VALUES (10005, 1, 2, 3);
INSERT INTO table2 VALUES (10000, a, b, c);
INSERT INTO table2 VALUES (10001, a, b, c);
INSERT INTO table2 VALUES (10003, a, b, c);
INSERT INTO table2 VALUES (10005, a, b, c);
Obviously here being able to specify the array saves room and also allows it to be changed in one location instead of the INSERTs having to be modified.
With Microsoft SQL Server you could use a temporary table (a table variable would work the same way):
CREATE TABLE #IDTable(id INT PRIMARY KEY);
-- insert IDs into table
SELECT *
FROM orders o
INNER JOIN #IDTable i ON i.id = o.order_id;
INSERT INTO table1
SELECT id, 1, 2, 3
FROM #IDTable;
INSERT INTO table2
SELECT id, 'a', 'b', 'c'
FROM #IDTable;
You can achieve this by declaring a table variable.
For example:
DECLARE @tbl TABLE (orderID int, Orders varchar(max));
-- SELECT * only works here if orders has exactly these two columns
INSERT INTO @tbl
SELECT * FROM orders WHERE order_id IN (10000, 10001, 10003, 10005);
SELECT orderID, Orders FROM @tbl;
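The same idea, one table that holds the ID list and is reused by every statement, can be sketched with Python's built-in sqlite3 module. SQLite has no table variables, so a temporary table named id_list (a hypothetical name) plays that role; the columns of orders, table1 and table2 are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT PRIMARY KEY, note TEXT);
    INSERT INTO orders VALUES (10000, 'a'), (10001, 'b'), (10002, 'c'), (10003, 'd');
    CREATE TABLE table1 (id INT, x INT, y INT, z INT);
    CREATE TABLE table2 (id INT, x TEXT, y TEXT, z TEXT);
    CREATE TEMP TABLE id_list (id INT PRIMARY KEY);
""")

# The list is defined once; every statement below reads it from id_list.
ids = [10000, 10001, 10003, 10005]
conn.executemany("INSERT INTO id_list VALUES (?)", [(i,) for i in ids])

# Reuse 1: filter a SELECT.
matched = [row[0] for row in conn.execute(
    "SELECT order_id FROM orders "
    "WHERE order_id IN (SELECT id FROM id_list) ORDER BY order_id")]

# Reuse 2 and 3: one INSERT ... SELECT per target table, one row per ID.
conn.execute("INSERT INTO table1 SELECT id, 1, 2, 3 FROM id_list")
conn.execute("INSERT INTO table2 SELECT id, 'a', 'b', 'c' FROM id_list")

table1_rows = conn.execute("SELECT * FROM table1 ORDER BY id").fetchall()
table2_count = conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0]
print(matched)       # [10000, 10001, 10003]
print(table2_count)  # 4
```

Changing the list then means editing only the rows of id_list, not each INSERT.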

SQL Query problems exist

I'm having a lot of trouble with the last query I need, and I think it's out of my league, so any help is appreciated.
The tables:
CREATE TABLE Recipe
(
nrecipe integer,
name varchar(255),
primary key (nrecipe)
);
CREATE TABLE Food
(
designation varchar(255) unique,
quantity integer,
primary key (designation)
);
CREATE TABLE Contains
(
nrecipe integer,
designation varchar(255),
quantity integer,
primary key (nrecipe, designation),
foreign key (nrecipe) references Recipe (nrecipe),
foreign key (designation) references Food (designation)
);
Quantity in Food table is the quantity stored in warehouse.
Quantity in Contains is the amount needed of a food element to use in recipe.
Quantity in Food table and Contains differ from each other.
The query:
I want to know the names of ALL recipes that are possible to be done with the food stored in warehouse.
It requires that the quantity of every element of food in warehouse is bigger than the quantity needed for the recipe.
EDIT: also, it shouldn't show a recipe's name if there is nothing referring to it on Contains table.
To make it easier to understand, I'll give some data:
INSERT INTO Recipe VALUES ('01', 'Steak with potatos and water');
INSERT INTO Recipe VALUES ('02', 'Rice and ice tea');
INSERT INTO Recipe VALUES ('03', 'Potatos and shrimp');
INSERT INTO Recipe VALUES ('04', 'Water');
INSERT INTO Recipe VALUES ('05', 'Steak with rice');
INSERT INTO Recipe VALUES ('06', 'Steak with spaguetti');
INSERT INTO Recipe VALUES ('07', 'Potatos with rice');
INSERT INTO Food VALUES ('Water', 5);
INSERT INTO Food VALUES ('Ice tea', 10);
INSERT INTO Food VALUES ('Steak', 30);
INSERT INTO Food VALUES ('Potatos', 20);
INSERT INTO Food VALUES ('Rice', 50);
INSERT INTO Food VALUES ('Shrimp', 5);
INSERT INTO Food VALUES ('Spaguetti', 5);
INSERT INTO Contains VALUES ('01', 'Steak', 1);
INSERT INTO Contains VALUES ('01', 'Potatos', 15);
INSERT INTO Contains VALUES ('01', 'Water', 10);
INSERT INTO Contains VALUES ('02', 'Rice', 5);
INSERT INTO Contains VALUES ('02', 'Ice tea', 8);
INSERT INTO Contains VALUES ('03', 'Potatos', 1);
INSERT INTO Contains VALUES ('03', 'Shrimp', 10);
INSERT INTO Contains VALUES ('04', 'Water', 20);
INSERT INTO Contains VALUES ('05', 'Steak', 1);
INSERT INTO Contains VALUES ('05', 'Rice', 20);
INSERT INTO Contains VALUES ('06', 'Steak', 1);
INSERT INTO Contains VALUES ('06', 'Spaguetti', 10);
The outcome expected from the query is:
Rice and ice tea
Steak with rice
Since they are the only two recipes with enough quantity in the warehouse.
EDIT: "Potatos with rice" shouldn't appear, because it is a recipe but has no rows in the Contains table.
Thanks for input and time. Any help is welcome :)
I'd use the >= ALL operator:
SELECT name
FROM Recipe R
WHERE 0 >= ALL (SELECT C.quantity - F.quantity
FROM Food F
INNER JOIN Contains C
USING (designation)
WHERE C.nrecipe = R.nrecipe);
The correct spelling is recipe, and you used different names for some columns (recepie, nrecipe, nrecepie) so I changed it. Note that instead of using a varchar primary key, you should use a numeric one.
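Edit: >= ALL is also true for a recipe with no rows in Contains (the subquery is empty), so add an EXISTS check to exclude those: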
SELECT name
FROM Recipe R
WHERE 0 >= ALL (SELECT C.quantity - F.quantity
FROM Food F
INNER JOIN Contains C
USING (designation)
WHERE C.nrecipe = R.nrecipe)
AND EXISTS(SELECT NULL
FROM Contains C
WHERE C.nrecipe = R.nrecipe);
This is in SQL Server because that is what I have:
select
r.name
from
Recipe r
where
not exists
(
select 1
from
[Contains] c
where
c.nrecipe = r.nrecipe and
not exists
(
select 1
from
Food f
where
f.designation = c.designation and
f.quantity >= c.quantity
)
)
Which in plain language is "Get me all recipes where there are no ingredients of insufficient quantity"
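To check the "no ingredients of insufficient quantity" reading against the question's data, here is a small sketch using Python's built-in sqlite3 module. The recipe numbers are stored as integers, and the names NO_SHORTAGE, HAS_INGREDIENTS and recipe_names are hypothetical. It also shows that the double NOT EXISTS on its own still returns a recipe with no rows in Contains, so an extra EXISTS is assumed here to meet the question's EDIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Recipe (nrecipe INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Food (designation TEXT PRIMARY KEY, quantity INTEGER);
    CREATE TABLE Contains (nrecipe INTEGER, designation TEXT, quantity INTEGER,
                           PRIMARY KEY (nrecipe, designation));
    INSERT INTO Recipe VALUES
        (1, 'Steak with potatos and water'), (2, 'Rice and ice tea'),
        (3, 'Potatos and shrimp'), (4, 'Water'), (5, 'Steak with rice'),
        (6, 'Steak with spaguetti'), (7, 'Potatos with rice');
    INSERT INTO Food VALUES
        ('Water', 5), ('Ice tea', 10), ('Steak', 30), ('Potatos', 20),
        ('Rice', 50), ('Shrimp', 5), ('Spaguetti', 5);
    INSERT INTO Contains VALUES
        (1, 'Steak', 1), (1, 'Potatos', 15), (1, 'Water', 10),
        (2, 'Rice', 5), (2, 'Ice tea', 8),
        (3, 'Potatos', 1), (3, 'Shrimp', 10),
        (4, 'Water', 20),
        (5, 'Steak', 1), (5, 'Rice', 20),
        (6, 'Steak', 1), (6, 'Spaguetti', 10);
""")

# No ingredient of the recipe lacks sufficient stock.
NO_SHORTAGE = """
    NOT EXISTS (SELECT 1 FROM Contains c
                WHERE c.nrecipe = r.nrecipe
                  AND NOT EXISTS (SELECT 1 FROM Food f
                                  WHERE f.designation = c.designation
                                    AND f.quantity >= c.quantity))
"""
# The recipe has at least one ingredient listed.
HAS_INGREDIENTS = "EXISTS (SELECT 1 FROM Contains c WHERE c.nrecipe = r.nrecipe)"

def recipe_names(where_clause):
    sql = f"SELECT r.name FROM Recipe r WHERE {where_clause} ORDER BY r.nrecipe"
    return [row[0] for row in conn.execute(sql)]

without_check = recipe_names(NO_SHORTAGE)
with_check = recipe_names(f"{NO_SHORTAGE} AND {HAS_INGREDIENTS}")
print(without_check)  # ['Rice and ice tea', 'Steak with rice', 'Potatos with rice']
print(with_check)     # ['Rice and ice tea', 'Steak with rice']
```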

SQL one-to-many match the one side by ALL in many side

In the following one to many
CREATE TABLE source(id int, name varchar(10), PRIMARY KEY(id));
CREATE TABLE params(id int, source int, value int);
where params.source is a foreign key to source.id
INSERT INTO source values(1, 'yes');
INSERT INTO source values(2, 'no');
INSERT INTO params VALUES(1,1,1);
INSERT INTO params VALUES(2,1,2);
INSERT INTO params VALUES(3,1,3);
INSERT INTO params VALUES(4,2,1);
INSERT INTO params VALUES(5,2,3);
INSERT INTO params VALUES(6,2,4);
If I have a list of param values (say [1,2,3]), how do I find all the sources that have ALL of the values in the list (source 1, "yes") in SQL?
Thanks
SELECT s.*
FROM source AS s
JOIN params AS p ON (p.source = s.id)
WHERE p.value IN (1,2,3)
GROUP BY s.id
HAVING COUNT(DISTINCT p.value) = 3;
You need the DISTINCT because nothing prevents params.value from containing duplicates for a source.
Edit: Modified to handle the case where a value can occur multiple times for a given source.
Try this:
SELECT
*
FROM
source
WHERE
(
SELECT COUNT(DISTINCT value)
FROM params
WHERE params.source = source.id
AND params.value IN (1, 2, 3)
) = 3
You can rewrite it to a GROUP BY as well:
SELECT
source.*
FROM
source
INNER JOIN params ON params.source = source.id
WHERE
params.value IN (1, 2, 3)
GROUP BY
source.id,
source.name
HAVING
COUNT(DISTINCT params.value) = 3
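When the value list varies in length, the literal 3 can be derived from the list itself. The following sketch uses Python's built-in sqlite3 module; the helper name sources_with_all is hypothetical, and the row (7, 2, 1) is an extra duplicate that is not in the question, added only to show why DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INT PRIMARY KEY, name VARCHAR(10));
    CREATE TABLE params (id INT, source INT, value INT);
    INSERT INTO source VALUES (1, 'yes'), (2, 'no');
    INSERT INTO params VALUES (1,1,1), (2,1,2), (3,1,3), (4,2,1), (5,2,3), (6,2,4);
    -- Extra duplicate row (not in the question): source 2 has value 1 twice.
    INSERT INTO params VALUES (7,2,1);
""")

def sources_with_all(conn, values):
    """Sources that have every value in the list at least once."""
    wanted = sorted(set(values))               # drop duplicates in the input
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT s.id, s.name
        FROM source AS s
        JOIN params AS p ON p.source = s.id
        WHERE p.value IN ({placeholders})
        GROUP BY s.id, s.name
        HAVING COUNT(DISTINCT p.value) = ?
        ORDER BY s.id
    """
    # The last parameter is the number of distinct wanted values.
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()

print(sources_with_all(conn, [1, 2, 3]))  # [(1, 'yes')]
print(sources_with_all(conn, [1, 3]))     # [(1, 'yes'), (2, 'no')]
```

With COUNT(*) instead of COUNT(DISTINCT p.value), source 2 would match [1, 2, 3] here, because its duplicate value 1 would be counted twice.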