remove duplicate in double left join

I know there are several questions about this already, but I can't find an answer in them that covers my case.
I need to produce a report that combines data from three tables.
The tables are related, but some rows may have no match in the related table, so I can't use an inner join.
So I tried a left join. This kind of worked, but now I get back too many rows because of the left joins.
I have set up a test case here -> http://sqlfiddle.com/#!18/637b1/1
CREATE TABLE [xOrder](
[ID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[SomeText] [nvarchar](255) NOT NULL DEFAULT (''))
GO
CREATE TABLE [Order_Time](
[ID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[OrderId] [uniqueidentifier] NOT NULL,
[SomeText] [nvarchar](255) NOT NULL DEFAULT (''),
[TimeTypeId] [int] NOT NULL DEFAULT ((-1)),
[TimeType2Id] [int] NOT NULL DEFAULT ((-1)))
GO
CREATE TABLE [Terms](
[ID] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
[SomeText] [nvarchar](255) NOT NULL DEFAULT (''))
GO
Insert into [xOrder] (ID, SomeText) VALUES ('db63ddb9-40d9-4d41-9dfc-5335c400dbd8','aaa')
Insert into [xOrder] (ID, SomeText) VALUES ('ef19af2d-66e9-4de1-a9a2-178b61dfe958','bbb')
Insert into [xOrder] (ID, SomeText) VALUES (newid(),'ccc')
GO
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'db63ddb9-40d9-4d41-9dfc-5335c400dbd8', 'time1.1', 1, 1)
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'db63ddb9-40d9-4d41-9dfc-5335c400dbd8', 'time1.2', 1, -1)
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'db63ddb9-40d9-4d41-9dfc-5335c400dbd8', 'time1.3', -1, -1)
GO
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'ef19af2d-66e9-4de1-a9a2-178b61dfe958', 'time2.1', 2, 2)
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'ef19af2d-66e9-4de1-a9a2-178b61dfe958', 'time2.2', 2, -1)
Insert into Order_Time (ID, OrderId, SomeText, TimeTypeId, TimeType2Id) VALUES (newid(),'ef19af2d-66e9-4de1-a9a2-178b61dfe958', 'time2.3', -1, -1)
GO
Insert into Terms (ID, CategoryId, SomeText) VALUES (1, 1, 'Term1')
Insert into Terms (ID, CategoryId, SomeText) VALUES (2, 1, 'Term2')
Insert into Terms (ID, CategoryId, SomeText) VALUES (3, 1, 'Term3')
Insert into Terms (ID, CategoryId, SomeText) VALUES (1, 2, 'Category1')
Insert into Terms (ID, CategoryId, SomeText) VALUES (2, 2, 'Category2')
Insert into Terms (ID, CategoryId, SomeText) VALUES (3, 2, 'Category3')
GO
And this is the query I have tried.
select o.SomeText as OrderText, ot.SomeText as TimeText1, coalesce(t.SomeText, 'NotFound') as TermText, coalesce(tt.SomeText, 'NotFound') as CategoryText from xOrder o
inner join order_time ot on o.id = ot.OrderId
left join terms t on ot.TimeTypeId = t.Id
left join terms tt on (ot.TimeType2Id = t.Id and t.ID = 2)
The result I expect is 6 rows containing this:
----------------------------------------
| aaa | Time1.1 | Term1 | Category1 |
| aaa | Time1.2 | Term1 | NotFound |
| aaa | Time1.3 | NotFound | NotFound |
| bbb | Time2.1 | Term2 | Category2 |
| bbb | Time2.2 | Term2 | NotFound |
| bbb | Time2.3 | NotFound | NotFound |
----------------------------------------
But that isn't happening. So how do I remove the extra rows produced by the left joins?

I think this is what you want:
select o.SomeText as OrderText, ot.SomeText as TimeText1,
coalesce(t.SomeText, 'NotFound') as TermText,
coalesce(tt.SomeText, 'NotFound') as CategoryText
from xOrder o inner join
order_time ot
on o.id = ot.OrderId left join
terms t
on ot.TimeTypeId = t.Id and t.CategoryId = 1 left join
terms tt
on ot.TimeType2Id = tt.Id and tt.CategoryId = 2;
Here is a db<>fiddle.
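Your query produces duplicates for two reasons: neither join filters Terms on CategoryId, so each ID matches one row per category, and the condition of the second join refers to the alias t instead of tt, so it never restricts tt.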
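To see the effect of the CategoryId filter, here is a minimal sketch that replays the test case in SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here, and the short keys 'o1', 'o2', 'o3' replace the GUIDs; both are assumptions made only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xOrder (ID TEXT PRIMARY KEY, SomeText TEXT NOT NULL);
CREATE TABLE Order_Time (OrderId TEXT NOT NULL, SomeText TEXT NOT NULL,
                         TimeTypeId INT NOT NULL, TimeType2Id INT NOT NULL);
CREATE TABLE Terms (ID INT NOT NULL, CategoryId INT NOT NULL, SomeText TEXT NOT NULL);
INSERT INTO xOrder VALUES ('o1', 'aaa'), ('o2', 'bbb'), ('o3', 'ccc');
INSERT INTO Order_Time VALUES
  ('o1', 'time1.1', 1, 1), ('o1', 'time1.2', 1, -1), ('o1', 'time1.3', -1, -1),
  ('o2', 'time2.1', 2, 2), ('o2', 'time2.2', 2, -1), ('o2', 'time2.3', -1, -1);
INSERT INTO Terms VALUES
  (1, 1, 'Term1'), (2, 1, 'Term2'), (3, 1, 'Term3'),
  (1, 2, 'Category1'), (2, 2, 'Category2'), (3, 2, 'Category3');
""")

# Without a CategoryId filter every Terms ID matches two rows, so rows multiply.
unfiltered_count = conn.execute("""
SELECT COUNT(*)
FROM xOrder o
JOIN Order_Time ot ON o.ID = ot.OrderId
LEFT JOIN Terms t  ON ot.TimeTypeId  = t.ID
LEFT JOIN Terms tt ON ot.TimeType2Id = tt.ID
""").fetchone()[0]

# With one category per join, each Order_Time row matches at most one Terms row.
fixed_rows = conn.execute("""
SELECT o.SomeText, ot.SomeText,
       COALESCE(t.SomeText, 'NotFound'),
       COALESCE(tt.SomeText, 'NotFound')
FROM xOrder o
JOIN Order_Time ot ON o.ID = ot.OrderId
LEFT JOIN Terms t  ON ot.TimeTypeId  = t.ID  AND t.CategoryId  = 1
LEFT JOIN Terms tt ON ot.TimeType2Id = tt.ID AND tt.CategoryId = 2
ORDER BY o.SomeText, ot.SomeText
""").fetchall()

print(unfiltered_count)  # 14 rows without the filter
for row in fixed_rows:   # 6 rows with it
    print(row)
```

Keeping the category test in the ON clause rather than in WHERE matters: in WHERE it would discard the unmatched rows and turn the left join back into an inner join.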

You need to change your join conditions so that each join is restricted to one category and the second join uses TimeType2Id:
DEMO
select distinct o.SomeText as OrderText, ot.SomeText as TimeText1,
coalesce(t.SomeText, 'NotFound') as TermText,
coalesce(tt.SomeText, 'NotFound') as CategoryText
from xOrder o
inner join order_time ot on o.id = ot.OrderId
left join terms t on ot.TimeTypeId = t.id and t.CategoryId=1
left join terms tt on ot.TimeType2Id = tt.id and tt.CategoryId=2
OUTPUT:
OrderText TimeText1 TermText CategoryText
aaa time1.1 Term1 Category1
aaa time1.2 Term1 NotFound
aaa time1.3 NotFound NotFound
bbb time2.1 Term2 Category2
bbb time2.2 Term2 NotFound
bbb time2.3 NotFound NotFound

Try the query below:
select o.SomeText, ot.sometext,
coalesce((select someText from terms where id = ot.TimetypeId and categoryid = 1), 'NotFound') TermText,
coalesce((select someText from terms where id = ot.Timetype2Id and categoryid = 2), 'NotFound') CategoryText
from xOrder o
join order_time ot on o.id = ot.OrderId

Related

SQL select parent-child recursively based on a reference table

I saw many questions related to a recursive query but couldn't find any that shows how to use it based on a reference table.
I have a MasterTable where Id, ParentId columns are establishing the parent/child relation.
I have a SubTable where I have a bunch of Ids which could be a parent Id or child Id.
I would like to retrieve all related records (parent or child, recursively) from the MasterTable based on the given SubTable.
Current output:
id parentId
----------- -----------
1 NULL
2 1
3 1
4 NULL
5 4
6 5
7 6
Expected output
id parentId
----------- -----------
1 NULL
2 1
3 1
4 NULL
5 4
6 5
7 6
8 9
9 NULL
10 NULL
11 10
13 11
14 10
15 16
16 NULL
Code:
CREATE TABLE #MasterTable
(
id INT NOT NULL,
parentId INT NULL
);
CREATE TABLE #SubTable
(
id INT NOT NULL
);
INSERT INTO #MasterTable (id, parentId)
VALUES (1, NULL), (2, 1), (3, 1), (4, NULL), (5, 4), (6, 5),
(7, 6), (8, 9), (9, NULL), (10, NULL), (11, 10), (12, NULL),
(13, 11), (13, 11), (14, 10), (15, 16), (16, NULL);
INSERT INTO #SubTable (id)
VALUES (1), (2), (3), (4), (6), (5), (7),
(8), -- it does not show
(13), -- it does not show
(15); -- it does not show
/* beside 8,13,15 it should add 9,11,14 and 10,16 */
;WITH cte AS
(
SELECT
mt1.id,
mt1.parentId
FROM
#MasterTable AS mt1
WHERE
mt1.parentId IS NULL
AND EXISTS (SELECT NULL AS empty
FROM #SubTable AS st
WHERE st.Id = mt1.id)
UNION ALL
SELECT
mt2.id,
mt2.parentId
FROM
#MasterTable AS mt2
INNER JOIN
cte AS c1 ON c1.id = mt2.parentId
)
SELECT DISTINCT
c2.id,
c2.parentId
FROM
cte AS c2
ORDER BY
id;

Is the following query suitable for the issue in question?
with
r as(
select
m.*, iif(m.parentid is null, 1, 0) p_flag
from #MasterTable m
join #SubTable s
on s.id = m.id
union all
select
m.*, iif(m.parentid is null, 1, r.p_flag)
from r
join #MasterTable m
on (r.p_flag = 1 and m.parentid = r.id) or
(r.p_flag = 0 and r.parentid = m.id)
)
select distinct
id, parentid
from r
order by id;
Output:
| id | parentid |
+----+----------+
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | NULL |
| 5 | 4 |
| 6 | 5 |
| 7 | 6 |
| 8 | 9 |
| 9 | NULL |
| 10 | NULL |
| 11 | 10 |
| 13 | 11 |
| 14 | 10 |
| 15 | 16 |
| 16 | NULL |
Test it online with rextester.com.
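The two-direction walk in the query above can be tried out with the following sketch, which runs the same idea in SQLite through Python's sqlite3 module. SQLite is an assumption here: it needs WITH RECURSIVE, plain table names instead of #temp tables, and CASE is used in place of iif for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MasterTable (id INT NOT NULL, parentId INT NULL);
CREATE TABLE SubTable (id INT NOT NULL);
INSERT INTO MasterTable (id, parentId) VALUES
  (1, NULL), (2, 1), (3, 1), (4, NULL), (5, 4), (6, 5),
  (7, 6), (8, 9), (9, NULL), (10, NULL), (11, 10), (12, NULL),
  (13, 11), (13, 11), (14, 10), (15, 16), (16, NULL);
INSERT INTO SubTable (id) VALUES (1), (2), (3), (4), (6), (5), (7), (8), (13), (15);
""")

query = """
WITH RECURSIVE r AS (
  -- anchor: every master row listed in SubTable; p_flag = 1 marks a root
  SELECT m.id, m.parentId,
         CASE WHEN m.parentId IS NULL THEN 1 ELSE 0 END AS p_flag
  FROM MasterTable m
  JOIN SubTable s ON s.id = m.id
  UNION ALL
  -- p_flag = 0: climb to the parent; p_flag = 1: descend to the children
  SELECT m.id, m.parentId,
         CASE WHEN m.parentId IS NULL THEN 1 ELSE r.p_flag END
  FROM r
  JOIN MasterTable m
    ON (r.p_flag = 1 AND m.parentId = r.id)
    OR (r.p_flag = 0 AND r.parentId = m.id)
)
SELECT DISTINCT id, parentId FROM r ORDER BY id
"""

rows = conn.execute(query).fetchall()
ids = [row[0] for row in rows]
print(ids)
```

Each listed row climbs until it reaches a root, and from the root the walk descends through the whole tree, which is why 9, 10, 11, 14 and 16 appear although they are not in SubTable, while the unrelated root 12 does not.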

;WITH cte
AS (
SELECT mt1.id,
mt1.parentId
FROM #MasterTable AS mt1
WHERE mt1.parentId IS NULL
UNION ALL
SELECT mt2.id,
mt2.parentId
FROM #MasterTable AS mt2
INNER JOIN cte AS c1
ON c1.id = mt2.parentId
)
SELECT DISTINCT c2.id,
c2.parentId
FROM cte AS c2
where
EXISTS (
SELECT 1 AS empty FROM #SubTable AS st
WHERE ( st.Id = c2.id or st.Id = c2.parentId)
)
or
EXISTS (
SELECT 1 AS empty FROM #MasterTable AS mt
WHERE ( c2.Id = mt.parentId or c2.parentId = mt.parentId)
)
ORDER BY id;

You may try this:
; with cte as(
select distinct mas.id, mas.parentId, iif(mas.parentid is null, 1, 0) PId
from #MasterTable mas inner join #SubTable sub
on sub.id in(mas.id, mas.parentid) ----- anchor: rows whose id or parentid is listed in #SubTable; PId = 1 marks a top node (parentid is null)
union all
select mas.id, mas.parentId, ct.PId
from cte ct inner join #MasterTable mas
on (ct.PId = 1 and mas.parentid = ct.id) or
(ct.PId = 0 and ct.parentid = mas.id) ----- PId = 1: descend to the child nodes; PId = 0: climb to the corresponding parent
)
select distinct id, parentid from cte order by id
option (MAXRECURSION 100); ---- cap the recursion depth to guard against an infinite loop
For more information on recursive queries in SQL, follow this link and see Example E and the examples above it.

Custom ordering before regular ordering?

I have 3 tables:
CREATE TABLE items (
id integer PRIMARY KEY,
title varchar (256) NOT NULL
);
INSERT INTO items (id, title) VALUES (1, 'qux');
INSERT INTO items (id, title) VALUES (2, 'quux');
INSERT INTO items (id, title) VALUES (3, 'quuz');
INSERT INTO items (id, title) VALUES (4, 'corge');
INSERT INTO items (id, title) VALUES (5, 'grault');
CREATE TABLE last_used (
item_id integer NOT NULL REFERENCES items (id),
date integer NOT NULL
);
INSERT INTO last_used (item_id, date) VALUES (2, 1000);
INSERT INTO last_used (item_id, date) VALUES (3, 2000);
INSERT INTO last_used (item_id, date) VALUES (2, 3000);
CREATE TABLE rating (
item_id integer NOT NULL REFERENCES items (id),
rating integer NOT NULL
);
INSERT INTO rating (item_id, rating) VALUES (1, 400);
INSERT INTO rating (item_id, rating) VALUES (2, 100);
INSERT INTO rating (item_id, rating) VALUES (3, 200);
INSERT INTO rating (item_id, rating) VALUES (4, 300);
INSERT INTO rating (item_id, rating) VALUES (5, 500);
I want to select rows in the following order:
Last used items matching the search string;
Most rated items matching the search string;
All other items matching the search string.
For the search i.title ~* '(?=.*u)', I get:
id | title | max(last_used.date) | rating.rating
3 | quuz | 2000 | 200
2 | quux | 3000 | 100
5 | grault | null | 500
1 | qux | null | 400
…with the following code:
WITH used AS (
SELECT lu.item_id
FROM last_used lu
JOIN (
SELECT item_id, max(date) AS date
FROM last_used
GROUP BY 1
) sub USING (date)
-- WHERE lu.user_id = 1
ORDER BY lu.date DESC
)
SELECT i.id, i.title, r.rating
FROM items i
LEFT JOIN rating r
ON r.item_id = i.id
WHERE
i.title ~* '(?=.*u)'
ORDER BY
i.id NOT IN (SELECT item_id FROM used),
r.rating DESC NULLS LAST
LIMIT 5 OFFSET 0
Is it possible to get the following results (latest used items first)?
id | title | max(last_used.date) | rating.rating
2 | quux | 3000 | 100
3 | quuz | 2000 | 200
5 | grault | null | 500
1 | qux | null | 400

You can use the following query to get the desired order
through the ORDER BY clause with l.date DESC NULLS LAST, r.rating DESC NULLS LAST:
SELECT i.id, i.title, l.date, r.rating
FROM items i
LEFT JOIN rating r
ON r.item_id = i.id
LEFT JOIN ( SELECT item_id, max(date) AS date FROM last_used GROUP BY 1 ) l
ON l.item_id = i.id
WHERE
i.title ~* '(?=.*u)'
ORDER BY l.date DESC NULLS LAST, r.rating DESC NULLS LAST
LIMIT 5 OFFSET 0;
Demo
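The ordering can be checked with a small sketch in SQLite via Python's sqlite3 module. Two substitutions are assumptions made for portability: LIKE '%u%' replaces the PostgreSQL regex match, and `expr IS NULL` sort keys replace NULLS LAST, which older SQLite builds do not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE last_used (item_id INTEGER NOT NULL, date INTEGER NOT NULL);
CREATE TABLE rating (item_id INTEGER NOT NULL, rating INTEGER NOT NULL);
INSERT INTO items VALUES (1, 'qux'), (2, 'quux'), (3, 'quuz'), (4, 'corge'), (5, 'grault');
INSERT INTO last_used VALUES (2, 1000), (3, 2000), (2, 3000);
INSERT INTO rating VALUES (1, 400), (2, 100), (3, 200), (4, 300), (5, 500);
""")

rows = conn.execute("""
SELECT i.id, i.title, l.date, r.rating
FROM items i
LEFT JOIN rating r ON r.item_id = i.id
LEFT JOIN (SELECT item_id, MAX(date) AS date
           FROM last_used
           GROUP BY item_id) l ON l.item_id = i.id
WHERE i.title LIKE '%u%'
-- "x IS NULL" is 0 for real values and 1 for NULL, so NULLs sort last
ORDER BY l.date IS NULL, l.date DESC, r.rating IS NULL, r.rating DESC
LIMIT 5 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

Used items come first by latest use, and the never-used items follow by rating because their date is NULL in the left-joined subquery.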

Avoiding GROUP BY on specific columns

I am looking at avoiding grouping by specific columns in a large query to reduce performance problems.
Example: http://sqlfiddle.com/#!18/cb98e/2
CREATE TABLE [OrderTable]
(
[id] INT,
[OrderGroupID] INT,
[Total] INT,
[fkPerson] INT,
[fkitem] INT,
PRIMARY KEY (id)
)
INSERT INTO [OrderTable] (id, OrderGroupID, Total, [fkPerson], [fkItem])
VALUES ('1', '1', '20', '1', '1'),
('2', '1', '45', '2', '2'),
('3', '2', '32', '1', '1'),
('4', '2', '30', '2', '2');
CREATE TABLE [Person]
(
[id] INT,
[Name] VARCHAR(32),
PRIMARY KEY (id)
)
INSERT INTO [Person] (id, Name)
VALUES ('1', 'Fred'),
('2', 'Sam');
CREATE TABLE [Item]
(
[id] INT,
[ItemNo] VARCHAR(32),
[Price] INT,
PRIMARY KEY (id)
)
INSERT INTO [Item] (id, ItemNo, Price)
VALUES ('1', '453', '23'),
('2', '657', '34');
Query:
WITH TABLE1 AS
(
SELECT
P.ID AS [PersonID],
P.Name,
SUM(OT.[Total]) AS [Total],
i.[id] AS [ItemID],
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rownum,
ot.fkperson
FROM
OrderTable OT
INNER JOIN
Person P ON P.ID = OT.fkperson
INNER JOIN
Item I ON I.[id] = OT.[fkItem]
GROUP BY
P.ID, P.Name, i.id, ot.fkperson
)
SELECT
*,
Totalrows = (SELECT MAX(rownum) FROM TABLE1)
FROM
TABLE1
Result:
| PersonID | Name | Total | ItemID | rownum | fkperson | Totalrows |
+----------+------+-------+--------+--------+----------+-----------+
| 1 | Fred | 52 | 1 | 1 | 1 | 2 |
| 2 | Sam | 75 | 2 | 2 | 2 | 2 |
Now, for example, if I didn't want to GROUP BY a varchar column (the person's Name), I could remove the Person table from the CTE and join it back in later, e.g.:
WITH TABLE1 AS
(
SELECT
-- P.ID AS [PersonID],
-- P.Name,
SUM(OT.[Total]) AS [Total],
i.[id] AS [ItemID],
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rownum,
ot.fkperson
FROM
OrderTable OT
-- INNER JOIN Person P ON P.ID = OT.fkperson
INNER JOIN
Item I ON I.[id] = OT.[fkItem]
GROUP BY
-- P.ID, P.Name,
i.id, ot.fkperson
)
SELECT
p.id as [PersonID],
p.Name,
t1.[total],
t1.[itemid],
t1.[rownum],
t1.fkperson
-- Totalrows = (SELECT MAX(rownum) FROM TABLE1 GROUP BY
-- i.id
-- ,ot.fkperson
-- )
FROM
TABLE1 T1
INNER JOIN
Person P ON P.ID = T1.fkperson
Result:
| PersonID | Name | total | itemid | rownum | fkperson |
+----------+------+-------+--------+--------+----------+
| 1 | Fred | 52 | 1 | 1 | 1 |
| 2 | Sam | 75 | 2 | 2 | 2 |
My issue is I also want to include the MAX(rownum) column but how can I do this in my last query without having to group by everything again? What's the best approach to this? Have I missed something really obvious? :)

You can also use CROSS APPLY, and for the total row count you can use COUNT(*) OVER ():
SELECT
P.id as [PersonID],
P.Name,
T1.[total],
I.id as [itemid],
ROW_NUMBER() OVER (
ORDER BY (SELECT NULL)
) AS [rownum],
T1.fkperson,
COUNT(*) OVER () Totalrows
FROM
Item I
CROSS APPLY (SELECT ot.fkperson, SUM(OT.[Total]) AS [total]
FROM OrderTable OT
WHERE I.[id] = OT.[fkItem]
GROUP BY ot.fkperson ) AS T1
INNER JOIN Person P ON P.id = t1.fkperson
Result:
PersonID Name total itemid rownum fkperson Totalrows
----------- ----------- ----------- ----------- --------- ----------- -----------
1 Fred 52 1 1 1 2
2 Sam 75 2 2 2 2
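The same shape, aggregating on the narrow key columns first, joining the name afterwards, and letting a window function count the rows, can be sketched in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so this assumed variant uses a derived table instead, and it orders ROW_NUMBER by the person id only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderTable (id INT PRIMARY KEY, OrderGroupID INT, Total INT,
                         fkPerson INT, fkItem INT);
CREATE TABLE Person (id INT PRIMARY KEY, Name TEXT);
INSERT INTO OrderTable VALUES (1, 1, 20, 1, 1), (2, 1, 45, 2, 2),
                              (3, 2, 32, 1, 1), (4, 2, 30, 2, 2);
INSERT INTO Person VALUES (1, 'Fred'), (2, 'Sam');
""")

rows = conn.execute("""
SELECT p.id AS PersonID,
       p.Name,
       t1.Total,
       t1.ItemID,
       ROW_NUMBER() OVER (ORDER BY p.id) AS rownum,
       COUNT(*) OVER () AS Totalrows      -- size of the whole result set
FROM (SELECT fkPerson, fkItem AS ItemID, SUM(Total) AS Total
      FROM OrderTable
      GROUP BY fkPerson, fkItem) AS t1    -- grouping touches integer keys only
JOIN Person p ON p.id = t1.fkPerson
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Because COUNT(*) OVER () is evaluated over the final joined rows, no second GROUP BY and no MAX(rownum) subquery are needed.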

Grouping values in order to sort by minimum of a value, then other fields then by that value itself

I would like to sort my data by an aggregate value, then by other fields, then by the unaggregated value.
I have the following schema:
CREATE TABLE priority_t (
id NUMERIC(10),
priority NUMERIC(10),
CONSTRAINT priority_t_pk PRIMARY KEY (id)
);
CREATE TABLE value_t (
id NUMERIC(10),
value NUMERIC(10),
CONSTRAINT value_t_pk PRIMARY KEY (id)
);
CREATE TABLE file_t (
id NUMERIC(10),
CONSTRAINT file_t_pk PRIMARY KEY (id)
);
CREATE TABLE main_t (
id NUMERIC(10),
priority_id NUMERIC(10),
value_id NUMERIC(10),
file_id NUMERIC(10),
CONSTRAINT main_t_pk PRIMARY KEY (id),
CONSTRAINT priority_t_fk FOREIGN KEY (priority_id) REFERENCES priority_t(id),
CONSTRAINT value_t_fk FOREIGN KEY (value_id) REFERENCES value_t(id),
CONSTRAINT file_t_fk FOREIGN KEY (file_id) REFERENCES file_t(id)
);
Then I insert the following data:
INSERT INTO priority_t (id, priority) VALUES (1, 10);
INSERT INTO priority_t (id, priority) VALUES (2, 20);
INSERT INTO value_t (id, value) VALUES (1, 987);
INSERT INTO value_t (id, value) VALUES (2, 876);
INSERT INTO value_t (id, value) VALUES (3, 765);
INSERT INTO value_t (id, value) VALUES (4, 654);
INSERT INTO file_t (id) VALUES (111);
INSERT INTO file_t (id) VALUES (222);
INSERT INTO file_t (id) VALUES (333);
INSERT INTO file_t (id) VALUES (444);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (1, 1, 1, 111);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (2, 2, 1, 111);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (3, 2, 2, 222);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (4, 1, 2, 333);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (5, 2, 3, 111);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (6, 1, 4, 444);
INSERT INTO main_t (id, priority_id, value_id, file_id) VALUES (7, 2, 4, 444);
COMMIT;
And I want to get the following result:
min_priority | priority | value | value_id | file_id
(hidden) | | | (hidden) |
--------------+----------+-------+----------+---------
10 | 10 | 654 | 4 | 444
10 | 20 | 654 | 4 | 444
10 | 10 | 876 | 2 | 333
10 | 10 | 987 | 1 | 111
10 | 20 | 987 | 1 | 111
20 | 20 | 765 | 3 | 111
20 | 20 | 876 | 2 | 222
I know how to sort them:
ORDER BY min_priority ASC, value ASC, value_id ASC, priority ASC
But my problem is that I don't know how to group the values themselves: I keep getting duplicates in my rows, and/or incorrect values.
My closest attempt is the following:
WITH listing AS (
SELECT m.id AS main_id,
p.id AS priority_id,
p.priority AS priority,
v.id AS value_id,
v.value AS value,
f.id AS file_id
FROM main_t m
INNER JOIN priority_t p ON m.priority_id = p.id
INNER JOIN value_t v ON m.value_id = v.id
INNER JOIN file_t f ON m.file_id = f.id
)
SELECT min_p.min_priority AS min_priority,
listing.priority AS priority,
listing.value AS value,
listing.file_id AS file_id
FROM listing,
(
SELECT min(min_p_value.min_priority) AS min_priority,
min_p_value.value_id AS min_value_id,
listing.file_id AS file_id
FROM listing,
(
SELECT min(listing.priority) AS min_priority,
listing.value AS value,
listing.value_id AS value_id
FROM listing
GROUP BY listing.value, listing.value_id
) min_p_value
WHERE listing.value = min_p_value.value
AND listing.value_id = min_p_value.value_id
AND min_p_value.min_priority = min_priority
GROUP BY min_p_value.value_id, listing.file_id
) min_p
WHERE min_p.min_value_id = listing.value_id
AND min_p.file_id = listing.file_id
ORDER BY min_p.min_priority ASC,
listing.value ASC,
listing.value_id ASC,
listing.priority;
And this returns the following incorrect result:
MIN_PRIORITY PRIORITY VALUE FILE_ID
------------- ---------- ---------- ----------
10 10 654 444
10 20 654 444
10 10 876 333
10 20 876 222 <-- incorrect, should have a min_priority of 20, and therefore be the last
10 10 987 111
10 20 987 111
20 20 765 111
How can I achieve what I expect?

This should work:
select (select min(tt.priority)
from main_t mm
join priority_t tt
on tt.id = mm.priority_id
where mm.value_id = m.value_id
and mm.file_id = m.file_id) as min_priority,
p.priority as priority,
v.value as value,
m.value_id,
m.file_id
from main_t m
join priority_t p
on p.id = m.priority_id
join value_t v
on v.id = m.value_id
order by 1, 3, 4, 2;
It determines the minimum priority per value_id and file_id, which is what puts the 876 / 222 row last in your expected result.
You can order the result by column number as shown.

I wrote this in MySQL (not Oracle) and was not sure about the CTE syntax, so I replaced the WITH clause with a temporary table called min_priority; you can of course replace the temporary table with a CTE. On your sample data this returns the result you wanted.
I was also not sure whether you need a left join or an inner join; both work in this example.
-- find min priority for each value and file
create temporary table if not exists min_priority as (
select m.value_id, m.file_id, min(p.priority) as min_pri
from main_t m
inner join priority_t p on m.priority_id = p.id
group by m.value_id, m.file_id
);
-- just join all the tables, including min_priority, and order the result
select mp.min_pri, p.priority, v.value, v.id, f.id
from main_t m
left join priority_t p on m.priority_id = p.id
left join value_t v on m.value_id = v.id
left join file_t f on m.file_id = f.id
left join min_priority mp on mp.value_id = m.value_id and mp.file_id = m.file_id
order by mp.min_pri asc, v.value asc, v.id asc, p.priority asc;
As a side note, I wouldn't use multiple levels of nested queries as in the example in your question, because they can hurt performance.

When you need both aggregated and non-aggregated values, use analytic functions. I think that would be something like this:
select *
from (select min(p.priority) over (partition by v.id, v.value, f.id) min_priority,
p.priority, v.value, v.id value_id, f.id file_id
from main_t m
join priority_t p on m.priority_id = p.id
join value_t v on m.value_id = v.id
join file_t f on m.file_id = f.id)
order by min_priority, value, value_id, priority
Result:
MIN_PRIORITY PRIORITY VALUE VALUE_ID FILE_ID
------------ ----------- ----------- ----------- -----------
10 10 654 4 444
10 20 654 4 444
10 10 876 2 333
10 10 987 1 111
10 20 987 1 111
20 20 765 3 111
20 20 876 2 222
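The analytic approach can be reproduced with the sketch below, which loads the question's data into SQLite through Python's sqlite3 module. SQLite stands in for Oracle here (an assumption), and the join to file_t is left out because main_t already carries file_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE priority_t (id INT PRIMARY KEY, priority INT);
CREATE TABLE value_t (id INT PRIMARY KEY, value INT);
CREATE TABLE main_t (id INT PRIMARY KEY, priority_id INT, value_id INT, file_id INT);
INSERT INTO priority_t VALUES (1, 10), (2, 20);
INSERT INTO value_t VALUES (1, 987), (2, 876), (3, 765), (4, 654);
INSERT INTO main_t VALUES
  (1, 1, 1, 111), (2, 2, 1, 111), (3, 2, 2, 222), (4, 1, 2, 333),
  (5, 2, 3, 111), (6, 1, 4, 444), (7, 2, 4, 444);
""")

rows = conn.execute("""
SELECT MIN(p.priority) OVER (PARTITION BY m.value_id, m.file_id) AS min_priority,
       p.priority,
       v.value,
       m.value_id,
       m.file_id
FROM main_t m
JOIN priority_t p ON p.id = m.priority_id
JOIN value_t v ON v.id = m.value_id
-- the window keeps every detail row, so no GROUP BY or self-join is needed
ORDER BY min_priority, v.value, m.value_id, p.priority
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by both value_id and file_id is what gives the 876 / 222 row its own minimum of 20 and moves it to the end.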

How to update X% of rows to A, Y% of rows to B, Z% of rows to C

I have a table like this:
Products
(
ID int not null primary key,
Type int not null,
Route varchar(20) null
)
I have a list on the client in this format:
Type=1, Percent=0.4, Route=A
Type=1, Percent=0.4, Route=B
Type=1, Percent=0.2, Route=C
Type=2, Percent=0.5, Route=A
Type=2, Percent=0.5, Route=B
Type=3, Percent=1.0, Route=C
...etc
When done, I'd like to assign 40% of type 1 products to Route A, 40% to Route B and 20% to Route C. Then 50% of type 2 products to Route A and 50% of type 2 products to Route B, etc.
Is there some way to do this in a single update statement?
If not in one giant statement, can it be done in one statement per type or one statement per route? We currently run one statement per type+route combination, so any of the above would be an improvement.

Here's an Oracle statement that I prepared before you posted that you were using SQL-Server, but it might give you some ideas, though you will have to roll your own ratio_to_report analytic function using CTE and self-joins. We calculate the cumulative proportion of each type in the products and client route tables and do a non equi-join on the matching proportion bands. The sample data I have used has some round-offs but these will reduce for larger data sets.
Here's the setup:
create table products (id int not null primary key, "type" int not null, route varchar (20) null);
create table clienttable ( "type" int not null, percent number (10, 2) not null, route varchar (20) not null);
insert into clienttable ("type", percent, route) values (1, 0.4, 'A');
insert into clienttable ("type", percent, route) values (1, 0.4, 'B');
insert into clienttable ("type", percent, route) values (1, 0.2, 'C');
insert into clienttable ("type", percent, route) values (2, 0.5, 'A');
insert into clienttable ("type", percent, route) values (2, 0.5, 'B');
insert into clienttable ("type", percent, route) values (3, 1.0, 'C');
insert into products (id, "type", route) values (1, 1, null);
insert into products (id, "type", route) values (2, 1, null);
insert into products (id, "type", route) values (3, 1, null);
insert into products (id, "type", route) values (4, 1, null);
insert into products (id, "type", route) values (5, 1, null);
insert into products (id, "type", route) values (6, 1, null);
insert into products (id, "type", route) values (7, 1, null);
-- 7 rows for product type 1 so we will expect 3 of route A, 3 of route B, 1 of route C (rounded)
insert into products (id, "type", route) values (8, 2, null);
insert into products (id, "type", route) values (9, 2, null);
insert into products (id, "type", route) values (10, 2, null);
insert into products (id, "type", route) values (11, 2, null);
insert into products (id, "type", route) values (12, 2, null);
-- 5 rows for product type 2 so we will expect 3 of route A and 2 of route B (rounded)
insert into products (id, "type", route) values (13, 3, null);
insert into products (id, "type", route) values (14, 3, null);
-- 2 rows for product type 3 so we will expect 2 of route C
And here's the statement. The row position is made zero-based (row_number - 1) so that the last row of each type still falls inside the final band instead of landing exactly on its upper bound:
select prods.id, prods."type", client.route cr from
(
select
p.id,
p."type",
(row_number () over (partition by p."type" order by p.id) - 1) / count (*) over (partition by p."type") cum_ratio
from
products p
) prods
inner join
(
select "type", route, nvl (lag (cum_ratio, 1) over (partition by "type" order by route), 0) ratio_start, cum_ratio ratio_end from
(select "type", route, sum (rr) over (partition by "type" order by route) cum_ratio
from (select c."type", c.route, ratio_to_report (c.percent) over (partition by "type") rr from clienttable c))) client
on prods."type" = client."type"
and prods.cum_ratio >= client.ratio_start and prods.cum_ratio < client.ratio_end
This gives the following result:
+----+------+----+
| ID | type | CR |
+----+------+----+
| 1 | 1 | A |
| 2 | 1 | A |
| 3 | 1 | A |
| 4 | 1 | B |
| 5 | 1 | B |
| 6 | 1 | B |
| 7 | 1 | C |
| 8 | 2 | A |
| 9 | 2 | A |
| 10 | 2 | A |
| 11 | 2 | B |
| 12 | 2 | B |
| 13 | 3 | C |
| 14 | 3 | C |
+----+------+----+
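The cumulative-band matching described in this answer can also be written as a single UPDATE. The sketch below does so in SQLite through Python's sqlite3 module, under several assumptions that are not in the original: the client list lives in a hypothetical `routes` table, percentages are stored as whole numbers (40 rather than 0.4) so that every comparison is exact integer arithmetic, and the row position is zero-based (`rn - 1`) so that the last row of each type falls inside the final band.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, type INT NOT NULL, route TEXT);
-- hypothetical table holding the client list, percentages as integers
CREATE TABLE routes (type INT NOT NULL, pct INT NOT NULL, route TEXT NOT NULL);
INSERT INTO routes VALUES (1, 40, 'A'), (1, 40, 'B'), (1, 20, 'C'),
                          (2, 50, 'A'), (2, 50, 'B'), (3, 100, 'C');
""")
conn.executemany(
    "INSERT INTO products (id, type) VALUES (?, ?)",
    [(i, 1) for i in range(1, 8)] + [(i, 2) for i in range(8, 13)] + [(13, 3), (14, 3)],
)

conn.execute("""
WITH po AS (   -- position of each product within its type
  SELECT id, type,
         ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) AS rn,
         COUNT(*) OVER (PARTITION BY type) AS cnt
  FROM products
),
ro AS (        -- cumulative percentage up to and including each route
  SELECT r.type, r.route,
         (SELECT SUM(rr.pct) FROM routes rr
          WHERE rr.type = r.type AND rr.route <= r.route) AS cum_pct
  FROM routes r
)
UPDATE products
SET route = (
  -- first route whose band end lies beyond this row's position:
  -- (rn - 1) / cnt < cum_pct / 100, rewritten without division
  SELECT MIN(ro.route)
  FROM po
  JOIN ro ON ro.type = po.type
  WHERE po.id = products.id
    AND (po.rn - 1) * 100 < ro.cum_pct * po.cnt
)
""")

assigned = dict(conn.execute("SELECT id, route FROM products ORDER BY id").fetchall())
print(assigned)
```

MIN(ro.route) picks the right band only because the routes are accumulated in the same alphabetical order; with a different ordering column, both the cumulative sum and the final pick would have to use that column.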

How about something like this:
--For updating type 1, set every route for type 1 to null.
UPDATE MyTable
SET [Route] = null
WHERE [Type] = 1;
--Update Route A (40%)
DECLARE @myVal int;
SET @myVal = CAST(0.4*(SELECT COUNT(*) FROM MyTable WHERE [Type] = 1) AS INT);
WITH tab AS
(
SELECT TOP (@myVal) *
FROM MyTable
WHERE [Type] = 1 AND [Route] IS NULL
ORDER BY ID
)
UPDATE tab
SET [Route] = 'A';
--Update Route B (40%): same row count, taken from the rows that are still unassigned
WITH tab AS
(
SELECT TOP (@myVal) *
FROM MyTable
WHERE [Type] = 1 AND [Route] IS NULL
ORDER BY ID
)
UPDATE tab
SET [Route] = 'B';
--Update Route C (20%): every type 1 row that is still unassigned
UPDATE MyTable
SET [Route] = 'C'
WHERE [Type] = 1 AND [Route] IS NULL;

I do not know whether similar functionality exists in SQL Server. In Oracle there is the SAMPLE clause.
The query below selects roughly 10% of the rows from a table:
SELECT empno
FROM scott.emp
SAMPLE (10)
/
Then your update would be easy. Maybe something similar exists in SQL Server. You could also count the rows, calculate the number that corresponds to each percentage, and then update.

-- Routes is assumed to hold the client list (Type, Percent, Route)
WITH po AS
( SELECT
ID,
Type,
ROW_NUMBER() OVER ( PARTITION BY Type
ORDER BY ID
) AS Rn,
COUNT(*) OVER (PARTITION BY Type) AS CntType
FROM
Products
)
, ro AS
( SELECT
Type,
Route,
( SELECT SUM(rr.[Percent])
FROM Routes AS rr
WHERE rr.Type = r.Type
AND rr.Route <= r.Route
) AS SumPercent
FROM
Routes AS r
)
UPDATE p
SET p.Route =
( SELECT MIN(ro.Route)
FROM ro
WHERE ro.Type = po.Type
-- 1.0 * forces decimal division; Rn / CntType alone is integer division
AND ro.SumPercent >= 1.0 * po.Rn / po.CntType
)
FROM Products AS p
JOIN
po ON po.ID = p.ID ;
