Querying a SQL Table using conditions from another table

Stuck on SQL college question!
I want to search the table Em_Sum and find any Em_num that went from an Em_before value of 4, 5, or 6 to an Em_after value of 6, but I only want to include employees whose Type_id is 1, 2, or 3, which is stored in the table Em_Type.
This is what I have so far:
SELECT Em_Sum.Em_num
FROM Em_Sum
FULL JOIN Em_Type ON Em_Type.Em_num = Em_Sum.Em_num
WHERE Em_Type.Type_id IN (1, 2, 3)
AND Em_Sum.Em_before IN (4, 5, 6)
AND Em_Sum.Em_after IN (6) ;
I'm just confused as to how to query the Em_Type table using Type_id

Good that you've tried to find the answer yourself: what you did is essentially correct.
First you join the two tables. Then you can filter on columns from either table.
e.g.:
SELECT sum.Em_num
FROM Em_Sum sum --we are giving this table an alias 'sum'
JOIN Em_Type type --this table gets alias 'type'
--now join both tables on the primary/foreign key 'employee number (=Em_num)':
ON type.Em_num = sum.Em_num
WHERE type.Type_id IN (1, 2, 3)
AND sum.Em_before IN (1, 2, 3, 4, 5, 6)
AND sum.Em_after IN (1, 2, 3) ;
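Applied to the exact filters from your question, the same pattern would look something like this (a sketch using shorter aliases es and et; an inner join is enough here, because the WHERE conditions on both tables already discard the unmatched rows a FULL JOIN would add):
SELECT es.Em_num
FROM Em_Sum es
JOIN Em_Type et
ON et.Em_num = es.Em_num
WHERE et.Type_id IN (1, 2, 3)
AND es.Em_before IN (4, 5, 6)
AND es.Em_after = 6;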

Related

Selecting X amount of rows from one table depending on value of column from another joined table

I am trying to join several tables. To simplify the situation, there is a table called Boxes which has a foreign key column for another table, Requests. This means that with a simple join I can get all the boxes that can be used to fulfill a request. But the Requests table also has a column called BoxCount which limits the number of boxes that is needed.
Is there a way to structure the query in such a way that when I join the two tables, I will only get the number of rows from Boxes that is specified in the BoxCount column of the given Request, rather than all of the rows from Boxes that have a matching foreign key?
Script to initialize sample data:
CREATE TABLE Requests (
Id int NOT NULL PRIMARY KEY,
BoxCount Int NOT NULL);
CREATE TABLE Boxes (
Id int NOT NULL PRIMARY KEY,
Label varchar,
RequestId INT FOREIGN KEY REFERENCES Requests(Id));
INSERT INTO Requests (Id, BoxCount)
VALUES
(1, 2),
(2, 3);
INSERT INTO Boxes (Id, Label, RequestId)
VALUES
(1, 'A', 1),
(2, 'B', 1),
(3, 'C', 1),
(4, 'D', 2),
(5, 'E', 2),
(6, 'F', 2),
(7, 'G', 2);
So, for example, when the hypothetical query is run, it should return boxes A and B (because the first request only needs 2 boxes), but not C. Similarly, it should also include boxes D, E and F, but not box G, because the second request only requires 3 boxes.
Here is another approach using ROW_NUMBER() - a common and useful technique that every SQL writer should master. The idea here is that you create a sequential number for all boxes within a request and use that to compare to the box count for filtering.
with boxord as (select *,
ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY Id) as rno
from dbo.Boxes
)
select req.*, boxord.Label, boxord.rno
from dbo.Requests as req inner join boxord on req.Id = boxord.RequestId
where req.BoxCount >= boxord.rno
order by req.Id, boxord.rno
;
fiddle to demonstrate
The INNER JOIN keyword selects records that have matching values in both tables:
SELECT Boxes.Id, Boxes.Label
FROM Boxes
INNER JOIN Requests ON Boxes.RequestId = Requests.Id
WHERE Requests.BoxCount = 2 -- filter on a value from the joined Requests table
-- CROSS APPLY evaluates the TOP (r.BoxCount) subquery once per request,
-- so each request returns only as many boxes as its BoxCount allows.
select r.id,
       r.boxcount,
       b.id,
       b.label
from requests r
cross apply (
    select top (r.BoxCount)
           id, label
    from boxes
    where requestid = r.id
    order by id
) b;

How to specify a linear programming-like constraint (i.e. max number of rows for a dimension's attributes) in SQL server?

I'm looking to assign unique person IDs to a marketing program, but I need to optimize based on each person's Probability Score (some people can be sent to multiple programs, some only one) and respect constraints such as the budgeted mail quantity for each program.
I'm using SQL Server and am able to put IDs into their highest-scoring program using row_number() over (partition by person_ID order by Prob_Score), but I need to return a table where each ID is assigned to a program, and I'm not sure how to add the maximum mail quantity constraint specific to each individual program. I've looked into the CHECK() constraint functionality, but I'm not sure whether that's applicable.
create table test_marketing_table(
PersonID int,
MarketingProgram varchar(255),
ProbabilityScore real
);
insert into test_marketing_table (PersonID, MarketingProgram, ProbabilityScore)
values (1, 'A', 0.07)
,(1, 'B', 0.06)
,(1, 'C', 0.02)
,(2, 'A', 0.02)
,(3, 'B', 0.08)
,(3, 'C', 0.13)
,(4, 'C', 0.02)
,(5, 'A', 0.04)
,(6, 'B', 0.045)
,(6, 'C', 0.09);
--this section assigns everyone to their highest scoring program,
--but this isn't necessarily what I need
with x as
(
select *, row_number() over (partition by PersonID order by ProbabilityScore desc) as PersonScoreRank
from test_marketing_table
)
select *
from x
where PersonScoreRank = 1;
I also need to specify some constraints: at most two C packages, one A package, and one B package can be sent. How can I reassign the IDs to a program while also using the highest probability score left available?
The final result should look like:
PersonID MarketingProgram ProbabilityScore PersonScoreRank
3 C 0.13 1
6 C 0.09 1
1 A 0.07 1
6 B 0.045 2
You need to rethink your ROW_NUMBER() formula based on your actual need, and you should also have a table of Marketing Programs to make this work efficiently. The steps below cover the basic ideas you need to incorporate to perform the filtering.
MarketingPrograms Table
CREATE TABLE MarketingPrograms (
ProgramID varchar(10),
PeopleDesired int
)
Populate the MarketingPrograms Table
INSERT INTO MarketingPrograms (ProgramID, PeopleDesired) Values
('A', 1),
('B', 1),
('C', 2)
Use the MarketingPrograms Table
with x as (
select *,
row_number() over (partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
from test_marketing_table
)
select *
from x
INNER JOIN MarketingPrograms m
ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write a SQL query to get the output attached to this question, along with sample data.
There are two tables: one with distinct IDs (PK) and their current flag,
and another with an active ID and an inactive ID (each a FK to the PK of the first table).
The final output should return two columns: the first containing all distinct IDs from the first table, and the second containing the active ID from the second table.
Below is the SQL:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
[current]
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an ID was once active and then became inactive.
Please note:
the active ID should be the most recent active ID
an ID which doesn't have any active ID should be either null or the ID itself
for IDs where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, with two IDs 6 and 7, when 6 is active 7 is inactive and vice versa; the only way to know the most recent active state is by the update date
The attached sample might make this easier to understand.
It looks like I might have to use a recursive CTE to achieve the results. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
declare #tb_id table (id bigint, [current] bit);
declare #tb_merges table (active_id bigint, inactive_id bigint, update_dt datetime2);
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6

SQL Server 2008, how to check if multi records exist in the DB?

I have 3 tables:
recipe:
id, name
ingredient:
id, name
recipeingredient:
id, recipeId, ingredientId, quantity
Every time a customer creates a new recipe, I need to check the recipeingredient table to verify whether this recipe already exists. If the ingredientId and quantity values are exactly the same, I will tell the customer the recipe already exists. Since I need to check multiple rows, I need help writing this query.
Knowing your ingredients and quantities, you can do something like this:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 --must match # of ingredients in WHERE clause
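One thing to be aware of: a recipe that contains these three ingredients plus additional ones would also pass that HAVING clause. If you need an exact match, one possible tweak (a sketch, reusing the same literal values) is to also compare each recipe's total ingredient count:
select recipeId as ExistingRecipeID
from recipeingredient
where (ingredientId = 1 and quantity = 1)
or (ingredientId = 8 and quantity = 1)
or (ingredientId = 13 and quantity = 1)
group by recipeId
having count(*) = 3 --all three ingredients matched
and count(*) = (select count(*)
                from recipeingredient ri2
                where ri2.recipeId = recipeingredient.recipeId) --and the recipe has no extra ingredients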
I originally thought that the following query would find pairs of recipes that have exactly the same ingredients:
select ri1.recipeId, ri2.recipeId
from RecipeIngredient ri1 full outer join
RecipeIngredient ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having count(ri1.id) = count(ri2.id) and -- same number of ingredients
count(ri1.id) = count(*) and -- all r1 ingredients are present
count(*) = count(ri2.id) -- all r2 ingredents are present
However, this query doesn't count things correctly, because the mismatches don't have the right pairs of ids. Alas.
The following does do the correct comparison. It counts the ingredients in each recipe before the join, so this value can just be compared on all matching rows.
select ri1.recipeId, ri2.recipeId
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri1 full outer join
(select ri.*, COUNT(*) over (partition by recipeid) as numingredients
from #RecipeIngredient ri
) ri2
on ri1.ingredientId = ri2.ingredientId and
ri1.quantity = ri2.quantity and
ri1.recipeId < ri2.recipeId
group by ri1.recipeId, ri2.recipeId
having max(ri1.numingredients) = max(ri2.numingredients) and
max(ri1.numingredients) = count(*)
The having clause guarantees that each recipe has the same number of ingredients, and that the number of matching ingredients is that total. This time, I've tested it on the following data:
insert into #recipeingredient select 1, 1, 1
insert into #recipeingredient select 1, 2, 10
insert into #recipeingredient select 2, 1, 1
insert into #recipeingredient select 2, 2, 10
insert into #recipeingredient select 2, 3, 10
insert into #recipeingredient select 3, 1, 1
insert into #recipeingredient select 4, 1, 1
insert into #recipeingredient select 4, 3, 10
insert into #recipeingredient select 5, 1, 1
insert into #recipeingredient select 5, 2, 10
If you have a new recipe, you can modify this query to just look for the recipe in one of the tables (say ri1) using an additional condition on the on clause.
If you place the ingredients in a temporary table, you can substitute one of these tables, say ri1, with the new table.
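Here is a minimal sketch of that substitution, assuming the new recipe's ingredients have been loaded into a temp table called #NewRecipe (a name made up for this example); it returns the IDs of existing recipes whose ingredients and quantities match the new recipe exactly:
-- #NewRecipe is a hypothetical staging table for the not-yet-saved recipe.
create table #NewRecipe (ingredientId int, quantity int);
insert into #NewRecipe values (1, 1), (2, 10); -- same ingredients as recipe 1
select ri1.recipeId as matching_recipe
from (select ri.*, COUNT(*) over (partition by recipeid) as numingredients
      from #RecipeIngredient ri
     ) ri1 inner join
     #NewRecipe nr
     on ri1.ingredientId = nr.ingredientId and
        ri1.quantity = nr.quantity
group by ri1.recipeId
having max(ri1.numingredients) = (select count(*) from #NewRecipe) and
       count(*) = (select count(*) from #NewRecipe)
With the sample data above, this returns recipes 1 and 5.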
You might try something like this to find if you have a duplicate:
-- Setup test data
declare #recipeingredient table (
id int not null primary key identity
, recipeId int not null
, ingredientId int not null
, quantity int not null
)
insert into @recipeingredient select 1, 1, 1
insert into @recipeingredient select 1, 2, 10
insert into @recipeingredient select 2, 1, 1
insert into @recipeingredient select 2, 2, 10
-- Actual Query
if exists (
select *
from @recipeingredient old
full outer join @recipeingredient new
on old.recipeId != new.recipeId -- Different recipes
and old.ingredientId = new.ingredientId -- but same ingredients
and old.quantity = new.quantity -- and same quantities
where old.id is null -- Match not found
or new.id is null -- Match not found
)
begin
select cast(0 as bit) as IsDuplicateRecipe
end
else begin
select cast(1 as bit) as IsDuplicateRecipe
end
Since this is really only searching for a duplicate, you might want to substitute a temp table or pass a table variable for the "new" table. This way you wouldn't have to insert the new records before doing your search. You could also insert into the base tables, wrap the whole thing in a transaction and rollback based upon the results.

Is there any way to get postgresql to report results from a join?

In other statistical software (STATA), when you perform a join between two separate tables, there are options to report the results of the join.
For instance, if you join a table with another table on a column and the second table has non-unique values, it reports that.
Likewise, if you perform an inner join it reports the number of rows dropped from both tables and if you perform a left or right outer join it lets you know how many rows were unmatched.
It will need a nasty outer join. Here is the CTE version:
-- Some data
CREATE TABLE bob
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO bob(id, zname) VALUES
(2, 'Alice') ,(3, 'Charly')
,(4,'David') ,(5, 'Edsger') ,(6, 'Fanny')
;
CREATE TABLE john
( ID INTEGER NOT NULL
, zname varchar
);
INSERT INTO john(id, zname) VALUES
(4,'David') ,(5, 'Edsger') ,(6, 'Fanny')
,(7,'Gerard') ,(8, 'Hendrik') ,(9, 'Irene'), (10, 'Joop')
;
--
-- Encode presence in bob as 1, presence in John AS 2, both=3
--
WITH flags AS (
WITH b AS (
SELECT 1::integer AS flag, id
FROM bob
)
, j AS (
SELECT 2::integer AS flag, id
FROM john
)
SELECT COALESCE(b.flag, 0) + COALESCE(j.flag, 0) AS flag
FROM b
FULL OUTER JOIN j ON b.id = j.id
)
SELECT flag, COUNT(*)
FROM flags
GROUP BY flag;
The result:
CREATE TABLE
INSERT 0 5
CREATE TABLE
INSERT 0 7
flag | count
------+-------
1 | 2
3 | 3
2 | 4
(3 rows)
As far as I know there is no option to do that within Postgres, although you could get a rough guess by looking at the planner's row estimates.
Calculating the missing rows requires you to count all rows so databases generally try to avoid things like that.
The options I can think of:
writing multiple queries
doing a full outer join and filtering the results (maybe with a subquery... can't think of a good way which will always easily work); see the sketch after this list
using writable common table expressions (data-modifying CTEs) to log the intermediate results
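For the full-outer-join option, here is a minimal sketch against the bob and john tables from the first answer (it assumes PostgreSQL 9.4 or later for the FILTER clause) that counts matched and unmatched rows in a single pass:
-- Count matched and unmatched rows of the full outer join in one pass.
-- Reuses the bob/john sample tables created in the first answer.
SELECT count(*) FILTER (WHERE b.id IS NOT NULL AND j.id IS NOT NULL) AS matched,
       count(*) FILTER (WHERE j.id IS NULL) AS only_in_bob,
       count(*) FILTER (WHERE b.id IS NULL) AS only_in_john
FROM bob b
FULL OUTER JOIN john j ON b.id = j.id;
With the sample data this gives 3 matched, 2 only in bob and 4 only in john, which lines up with the flag counts from the first answer.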