I have the following two tables, E and G.
create table E(K1 int, K2 int, primary key (K1, K2))
insert E
values (1, 11), (1, 20), (2, 10), (2, 30), (3, 10), (3, 30),
(4, 100), (5, 200), (6, 200),
(7, 300), (8, 300), (9, 310), (10, 310), (10, 320), (11, 320), (12, 330)
create table G(GroupID varchar(10), K1 int primary key)
insert G
values ('Group 1', 1), ('Group 1', 2), ('Group 2', 4), ('Group 2', 5),
('Group 3', 8), ('Group 3', 9), ('Group 3', 12)
I need a view - given a K2 number, find all related K1s. "Related K1" is defined (and applied transitively) as:
All K1s that have the same K2 in table E. For example, 2 and 3 in E are related because both records have a K2 of 10: (2, 10), (3, 10).
All K1s that have the same GroupID in table G. For example, K1 values 1 and 2 are both in group 'Group 1'.
So querying the following view
select K1 from GroupByK2 where K2 = 200 -- or 100
should return
4
5
6
because (5, 200) and (6, 200) share the same K2, and the K1 values 4 and 5 of (4, 100) and (5, 200) are both in 'Group 2'.
And select K1 from GroupByK2 where K2 = 300 (or 310, 320, 330) should return 7, 8, 9, 10, 11, 12.
View:
create view GroupByK2
as
with cte as (
    select E.*, K2 as K2x
    from E
    union all
    select E.K1, E.K2, cte.K2x
    from cte
    join G on cte.K1 = G.K1
    join G h on h.GroupID = G.GroupID
    join E on E.K1 = h.K1 and E.K1 <> cte.K1
    where not exists (select * from cte x where x.k1 = G.k1 and x.K2 = G.K2) -- error
)
select *
from cte;
However, the SQL fails with the error:
Recursive member of a common table expression 'cte' has multiple recursive references.
Scratched my head over this one a bit, but here is a working, although highly inefficient solution...
You correctly tried to eliminate joining the original rows back to avoid the cyclic recursion, but it won't work, for two reasons:
1. As the error states, you can't reference the recursive member more than once.
2. Even if you could, at each recursion the recursive set consists only of the output of the previous recursion, so you wouldn't be able to eliminate the cycles from earlier recursions anyway.
My solution avoids that in a "less than optimal" way: it simply includes all the rows with the cycles, but limits the recursion level to a hard number (5 in the example, but you can parameterize it as well) to avoid endless recursion, and only in the final query eliminates the duplicates with a group by.
This may or may not work for you depending on the depth of the hierarchy. It creates tons of redundant work, and I doubt it will scale, but YMMV. I addressed it as a logical puzzle :-)
This is one of the (rare) cases where I would definitely consider an iterative solution instead of a set-based one. You would need to create a table-valued function so you can parameterize it, which you can't do properly with a view. Within the function, create a temporary table or table variable, populate it with the output sets one by one, and loop until you are done. This way you can eliminate the cycles at the root by checking the content of the temporary table and only inserting new rows.
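To illustrate that idea, here is a rough sketch of such a multi-statement table-valued function. The name dbo.RelatedK1 is made up, and this is only a sketch of the approach, not a tested implementation:

CREATE FUNCTION dbo.RelatedK1 (@K2 int)
RETURNS @result TABLE (K1 int PRIMARY KEY)
AS
BEGIN
    DECLARE @before int = -1;

    -- Seed: the K1s that carry the requested K2 directly.
    INSERT INTO @result (K1)
    SELECT DISTINCT K1 FROM E WHERE K2 = @K2;

    -- Expand until a pass adds nothing new; K1s already in @result are
    -- skipped, which is what cuts off the cycles.
    WHILE @before <> (SELECT COUNT(*) FROM @result)
    BEGIN
        SET @before = (SELECT COUNT(*) FROM @result);

        INSERT INTO @result (K1)
        SELECT c.K1
        FROM (
            SELECT g2.K1                          -- related via a shared GroupID in G
            FROM @result r
            JOIN G g1 ON g1.K1 = r.K1
            JOIN G g2 ON g2.GroupID = g1.GroupID
            UNION
            SELECT e2.K1                          -- related via a shared K2 in E
            FROM @result r
            JOIN E e1 ON e1.K1 = r.K1
            JOIN E e2 ON e2.K2 = e1.K2
        ) AS c
        WHERE NOT EXISTS (SELECT 1 FROM @result x WHERE x.K1 = c.K1);
    END;

    RETURN;
END;

You would then call it as select K1 from dbo.RelatedK1(200), which for the sample data should return 4, 5, 6.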
Anyway, here is the recursive, level-limited version:
;WITH KeyGroups AS
(
SELECT E.*, G.GroupID
FROM E
LEFT OUTER JOIN
G
ON E.K1 = G.K1
),
Recursive AS
(
SELECT K.K1, K.K2, K.GroupID, 0 AS lvl
FROM KeyGroups AS K
WHERE K.K2 = 300
UNION ALL
SELECT K.K1, K.K2, K.GroupID, lvl + 1
FROM Recursive AS R
INNER JOIN
KeyGroups AS K
ON R.GroupID = K.GroupID
OR
R.K2 = K.K2
OR
R.K1 = K.K1
WHERE lvl < 5
)
SELECT MIN(lvl) AS lvl, K1, K2, GroupID
FROM Recursive
GROUP BY GroupID, K1, K2
ORDER BY lvl, K1, K2, GroupID;
Also see DBFiddle.
I'll give this some more thought tomorrow if I have time, and update here if I find a better solution.
Thanks for the interesting challenge and well formulated post.
HTH
Related
I have a postgres schema like this:
CREATE TABLE rows
(
id bigint NOT NULL,
start_year integer
);
CREATE TABLE calculations
(
id bigint NOT NULL,
row_id bigint NOT NULL,
year integer,
calculation numeric(23,7)
);
INSERT INTO rows (id, start_year)
VALUES
(1, 2020),
(2, 2021);
INSERT INTO calculations (id, row_id, year, calculation)
VALUES
(1, 1, 2019, 0),
(2, 1, 2020, 100),
(3, 1, 2021, 900),
(4, 1, 2022, 300),
(5, 1, 2023, 500),
(6, 2, 2019, 220),
(7, 2, 2020, 111),
(8, 2, 2021, 222),
(9, 2, 2024, 333),
(10, 2, 2025, 444);
And an SQL view with a select like this:
SELECT
row.id,
calc1.calculation as calc1,
calc2.calculation as calc2,
calc3.calculation as calc3
FROM
rows row
LEFT JOIN calculations calc1 on calc1.row_id = row.id and calc1.year = row.start_year
LEFT JOIN calculations calc2 on calc2.row_id = row.id and calc2.year = row.start_year + 1
LEFT JOIN calculations calc3 on calc3.row_id = row.id and calc3.year = row.start_year + 2;
In reality both tables are much larger. The SQL query takes about 10 seconds to execute, and most of that time is spent on calculations. The only optimization I've managed so far is:
SELECT
row.id,
calc.calculation->(row.start_year)::text as calc1,
calc.calculation->(row.start_year+1)::text as calc2,
calc.calculation->(row.start_year+2)::text as calc3
FROM
rows row
LEFT JOIN (select row_id, json_object_agg(year, calculation) as calculation
from calculations
group by row_id) calc on calc.row_id = row.id
Now it has a 2x performance boost, but that's not enough. It still queries unneeded year values. When I replaced this query with one that takes only the first, second, and third year, it ran much faster, so I wonder if there is another way to merge these JOINs into one with a performance boost.
http://sqlfiddle.com/#!17/8ff004/4
You may try adding the following index to the calculations table:
CREATE INDEX idx_calc ON calculations (row_id, year, calculation);
This index, if used, can speed up the multiple joins to the calculations table.
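Separately, the "take only the first, second and third year" idea from the question can be expressed with a lateral join and filtered aggregates, so that only the three needed calculations per row are touched instead of aggregating every year. This is only a sketch, assuming PostgreSQL 9.4+ and the schema above, and is not benchmarked:

SELECT
    r.id,
    calc.calc1,
    calc.calc2,
    calc.calc3
FROM rows r
LEFT JOIN LATERAL (
    SELECT
        max(c.calculation) FILTER (WHERE c.year = r.start_year)     AS calc1,
        max(c.calculation) FILTER (WHERE c.year = r.start_year + 1) AS calc2,
        max(c.calculation) FILTER (WHERE c.year = r.start_year + 2) AS calc3
    FROM calculations c
    WHERE c.row_id = r.id
      AND c.year BETWEEN r.start_year AND r.start_year + 2
) calc ON true;

Combined with the (row_id, year, calculation) index above, each lateral lookup should be satisfied by a small index range scan of at most three rows per row_id.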
Using this self-referencing table:
CREATE TABLE ENTRY (
ID integer NOT NULL,
PARENT_ID integer,
... other columns ...
)
There are many top-level rows (with PARENT_ID = NULL) that can have 0 to several levels of child rows, forming a graph like this:
(1, NULL, 'A'),
(2, 1, 'B'),
(3, 2, 'C'),
(4, 3, 'D'),
(5, 4, 'E'),
(6, NULL, 'one'),
(7, 6, 'two'),
(8, 7, 'three'),
(9, 6, 'four'),
(10, 9, 'five'),
(11, 10, 'six');
I want to write a query that would give me the subgraph (all related rows in both directions) for a given row, for instance (just showing the ID values):
ID = 3: (1, 2, 3, 4, 5)
ID = 6: (6, 7, 8, 9, 10, 11)
ID = 7: (6, 7, 8)
ID = 10: (6, 9, 10, 11)
It's similar to the query in §3.3 Queries against a Graph of the SQLite documentation, for returning a graph from any of its nodes:
WITH RECURSIVE subtree(x) AS (
SELECT 3
UNION
SELECT e1.ID x FROM ENTRY e1 JOIN subtree ON e1.PARENT_ID = subtree.x
UNION
SELECT e2.PARENT_ID x FROM ENTRY e2 JOIN subtree ON e2.ID = subtree.x
)
SELECT x FROM subtree
LIMIT 100;
... with 3 as the anchor / initial-select value.
This particular query works fine in DBeaver. The sqlite version available in db-fiddle gives a circular reference error, but this nested CTE gives the same result in db-fiddle.
However, I can only get this to work when the initial value is hard-coded in the query. I can't find any mention of how to supply that initial-select value as a parameter.
I'd think it should be straightforward. Maybe the case of having more than one top-level row is very unusual, or I'm overlooking something blindingly obvious?
Any suggestions?
As forpas points out above, SQLite doesn't support passing parameters to stored/user defined functions.
Using a placeholder in the prepared statement from the calling code is a good alternative.
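For example, the anchor value can simply be left as a placeholder and bound from the calling code's prepared statement. A sketch (the ? is bound to the starting ID by whatever host language issues the query):

WITH RECURSIVE subtree(x) AS (
  SELECT ?
  UNION
  SELECT e1.ID x FROM ENTRY e1 JOIN subtree ON e1.PARENT_ID = subtree.x
  UNION
  SELECT e2.PARENT_ID x FROM ENTRY e2 JOIN subtree ON e2.ID = subtree.x
)
SELECT x FROM subtree
LIMIT 100;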
I have the following schema:
CREATE TABLE tbl_employee_team
(
employee_id int,
teams_id int
);
INSERT INTO tbl_employee_team
VALUES
(1, 2),
(1, 3),
(1, 4);
CREATE TABLE tbl_team_list_serv
(
service_id int,
team_id int
);
INSERT INTO tbl_team_list_serv
VALUES
(7, 2),
(9, 3),
(10, 4);
CREATE TABLE tbl_service
(
id int,
parent int
);
INSERT INTO tbl_service
VALUES
(5, null),
(6, 5),
(7, 6),
(8, null),
(9, 8),
(10, null);
For the sake of simplicity I declared:
1 as employee_id
2, 3, 4 as team_id
5 -> 6 -> 7 as service (5 is the main service)
8 -> 9 (8 is the main service)
10 (10 is the main service)
To retrieve the services the employee belongs to I query
SELECT ls.service_id FROM tbl_team_list_serv ls
JOIN tbl_employee_team t ON ls.team_id=t.teams_id WHERE t.employee_id = 1
To get the main service from the services I use
WITH RECURSIVE r AS
(
SELECT id, parent, 1 AS level
FROM tbl_service
WHERE id = 7 /* here I need to use every id from the JOIN instead of the hard-coded 7 */
UNION
SELECT tbl_service.id, tbl_service.parent, r.level + 1 AS level
FROM tbl_service
JOIN r
ON r.parent = tbl_service.id
)
SELECT id FROM r WHERE r.level = (SELECT max(level) FROM r)
My question is how do I merge the two queries?
Based on the data above I want to finally get a list of ids which is in this case:
5, 8, 10
Also, I want my recursive query to return the last row (I don't think the solution with level is elegant).
SQLFiddle can be found here
Thanks in advance
I feel like you already did most of the work for this question. This is just a matter of the following tweaks:
Putting the logic for the first query in the anchor part of the CTE.
Adding the original service id as a column to remember the hierarchy.
Tweaking the final logic to get one row per original service.
As a query:
WITH RECURSIVE r AS (
SELECT ls.service_id as id, s.parent, 1 as level, ls.service_id as orig_service_id
FROM tbl_team_list_serv ls JOIN
tbl_employee_team t
ON ls.team_id = t.teams_id JOIN
tbl_service s
ON ls.service_id = s.id
WHERE t.employee_id = 1
UNION ALL
SELECT s.id, s.parent, r.level + 1 AS level, r.orig_service_id
FROM tbl_service s JOIN
r
ON r.parent = s.id
)
SELECT r.id
FROM (SELECT r.*,
MAX(level) OVER (PARTITION BY orig_service_id) as max_level
FROM r
) r
WHERE r.level = max_level;
Here is a db<>fiddle.
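On the side note about the level column feeling inelegant: since the main services are exactly the rows with parent IS NULL, the final select can filter on that and the level column can be dropped entirely. A sketch, assuming the same tables and that your database accepts the WITH RECURSIVE syntax as written:

WITH RECURSIVE r AS (
    SELECT ls.service_id AS id, s.parent, ls.service_id AS orig_service_id
    FROM tbl_team_list_serv ls
    JOIN tbl_employee_team t ON ls.team_id = t.teams_id
    JOIN tbl_service s ON ls.service_id = s.id
    WHERE t.employee_id = 1
    UNION ALL
    SELECT s.id, s.parent, r.orig_service_id
    FROM tbl_service s
    JOIN r ON r.parent = s.id
)
-- the main services are the rows with no parent
SELECT DISTINCT id
FROM r
WHERE parent IS NULL;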
I'm looking to assign unique person IDs to a marketing program, but I need to optimize based on each person's probability score (some people can be sent to multiple programs, some only one) and I have constraints such as a budgeted mail quantity for each program.
I'm using SQL Server and am able to put IDs into their highest-scoring program using row_number() over (partition by PersonID order by ProbabilityScore desc), but I need to return a table where each ID is assigned to a program, and I'm not sure how to add the maximum mail quantity constraint specific to each individual program. I've looked into the CHECK() constraint functionality, but I'm not sure it's applicable.
create table test_marketing_table(
PersonID int,
MarketingProgram varchar(255),
ProbabilityScore real
);
insert into test_marketing_table (PersonID, MarketingProgram, ProbabilityScore)
values (1, 'A', 0.07)
,(1, 'B', 0.06)
,(1, 'C', 0.02)
,(2, 'A', 0.02)
,(3, 'B', 0.08)
,(3, 'C', 0.13)
,(4, 'C', 0.02)
,(5, 'A', 0.04)
,(6, 'B', 0.045)
,(6, 'C', 0.09);
--this section assigns everyone to their highest scoring program,
--but this isn't necessarily what I need
with x
as
(
select *, row_number()over(partition by PersonID order by ProbabilityScore desc) as PersonScoreRank
from test_marketing_table
)
select *
from x
where PersonScoreRank='1';
I also need to specify some constraints: at most two C packages, one A package, and one B package can be sent. How can I reassign the IDs to a program while also using the highest probability score left available?
The final result should look like:
PersonID  MarketingProgram  ProbabilityScore  PersonScoreRank
3         C                 0.13              1
6         C                 0.09              1
1         A                 0.07              1
6         B                 0.045             2
You need to rethink your ROW_NUMBER() formula based on your actual need, and you should also have a table of marketing programs to make this work efficiently. The following covers the basic ideas you need to incorporate to perform the filtering efficiently.
MarketingPrograms Table
CREATE TABLE MarketingPrograms (
ProgramID varchar(10),
PeopleDesired int
)
Populate the MarketingPrograms Table
INSERT INTO MarketingPrograms (ProgramID, PeopleDesired) Values
('A', 1),
('B', 1),
('C', 2)
Use the MarketingPrograms Table
with x as (
select *,
row_number() over(partition by MarketingProgram order by ProbabilityScore desc) as ProgramScoreRank
from test_marketing_table
)
select *
from x
INNER JOIN MarketingPrograms m
ON x.MarketingProgram = m.ProgramID
WHERE x.ProgramScoreRank <= m.PeopleDesired
I have a lookup table with values I want to check for in the data.
The problem is somewhat like this:
-- Data with an ID, a group (which is a number) and some letters which belong to that group.
select *
into #data
from (values
(1, 45, 'A'),
(1, 45, 'B'),
(1, 45, 'C'),
(2, 45, 'D'))
as data(id, number, letter)
-- The various letters that I expect for each ID in a specific group
select *
into #expected_letters
from (values
(45, 'A'),
(45, 'D'),
(45, 'E'),
(123, 'A'),
(123, 'Q'))
as expected_letters(number, letter)
The results that I expect from a query are, for every id in #data, all letters expected for that id's group that are not actually there. So these results:
(1, 45, D)
(1, 45, E)
(2, 45, A)
(2, 45, E)
In my problem the list is a lot longer, with more groups and more ids. I've tried a lot of different joins and set operators, but I can't seem to get my head around this problem.
Some help would be much appreciated.
This is my version, which is very similar but uses an outer apply instead of multiple joins:
select distinct d.id, aa.number, aa.letter
from #data d
outer apply (select *
             from #expected_letters el
             where el.number = d.number
               and el.letter not in (select letter
                                     from #data dt
                                     where dt.number = d.number
                                       and dt.id = d.id)
            ) aa
Here's what I tried, and it seems to work. The last inner join aliased "nums" is to remove number 123 from your results, since it doesn't exist for any ID in #data.
select e.*, ids.id from #expected_letters e
cross join (select distinct id from #data) ids
full join #data d on e.number = d.number and e.letter = d.letter and d.id = ids.id
inner join (select distinct number from #data) nums on e.number = nums.number
where
d.id is null
--result:
number  letter  id
45      A       2
45      D       1
45      E       1
45      E       2
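For comparison, the same gap check can also be phrased as a plain anti-join: build every (id, expected letter) combination for the groups each id actually has, then keep the combinations that have no matching row in #data. A sketch against the same temp tables:

select i.id, el.number, el.letter
from (select distinct id, number from #data) i
join #expected_letters el
  on el.number = i.number
where not exists (select 1
                  from #data d
                  where d.id = i.id
                    and d.number = el.number
                    and d.letter = el.letter);

For the sample data this returns the four expected rows: (1, 45, D), (1, 45, E), (2, 45, A), (2, 45, E).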