Assign ID based on commonality/groups

Assign ID based on commonality/groups - sql

I've put together what I view to be overly complicated SQL to get to what I'm after. I'm hoping for insight into a quicker and less complicated method.
What I'm after is the ability to assign an ID to groups of data where there is common groups of data across two columns.
For example I have the following subset of data:
CustID PartID RplcID
28 4 4
28 4 16
28 4 17
28 16 4
28 16 16
28 16 17
28 17 4
28 17 16
28 17 17
I want to create an ID for CustID=28 where there is overlap in the RplcID and PartID. So in this example, PartID 4, 16, 17 all have RplcIDs in common (4, 16, 17). As such, all of these pairs should have the same ID.
The method I'm using works (and is faster with temp tables instead of solely using CTEs) except for large datasets this thing is S-L-O-W. I'm sure there's a more efficient method out there and hoping someone can lend their expertise.
I'm outlining my current approach for as much clarity into my muddled thinking as possible.
STEP 1
Generate temporary ID using DENSE_RANK() partitioned by CustID, ordered by PartID.
RowID CustID PartID RplcID
1 28 16 16
1 28 17 16
1 28 4 16
2 28 16 17
2 28 17 17
2 28 4 17
3 28 16 4
3 28 17 4
3 28 4 4
STEP 2:
Then use these results and aggregate the PartIDs by using XML to create a comma separated string with which to group by.
RowID CustID RplcID PartIDS
4 28 16 16,17,4
4 28 17 16,17,4
4 28 4 16,17,4
STEP 3:
And finally split out these groups using the assigned ID by parsing the XML.
RowID CustID PartID RplcID
4 28 16 16
4 28 16 17
4 28 16 4
4 28 17 16
4 28 17 17
4 28 17 4
4 28 4 16
4 28 4 17
4 28 4 4
And the entirety of the SQL:
DECLARE #Parts TABLE
(
CustID VARCHAR(10),
PartID VARCHAR(10),
RplcID VARCHAR(10)
)
Insert Into #Parts VALUES
('26','19','93'),('26','19','63'),
('26','31','93'),('26','31','63'),('26','32','93'),('26','32','63'),('26','33','93'),('26','33','63'),('26','34','93'),
('26','34','63'),('26','35','93'),('26','35','63'),('26','36','93'),('26','36','63'),('26','37','93'),('26','37','63'),
('26','38','93'),('26','38','63'),('26','39','93'),('26','39','63'),('27','40','95'),('27','41','94'),
('27','41','95'),('27','42','94'),('27','42','95'),('27','43','94'),('27','43','95'),('27','44','94'),('27','44','95'),
('27','45','94'),('27','45','95'),('27','46','94'),('27','46','95'),('27','47','94'),('27','47','95'),('27','48','94'),
('27','48','95'),('27','49','94'),('27','49','95'),('27','50','94'),('27','50','95'),('27','17','94'),('27','17','95'),
('27','51','94'),('27','51','95'),('27','52','94'),('27','52','95'),('27','53','94'),('27','53','95'),('27','54','94'),
('27','54','95'),('27','33','94'),('27','33','95'),('27','55','94'),('27','55','95'),('27','34','94'),('27','34','95'),
('27','56','94'),('27','56','95'),('27','35','94'),('27','35','95'),('27','57','94'),('27','57','95'),('27','58','94'),
('27','58','95'),('27','59','94'),('27','59','95'),('27','37','94'),('27','37','95'),('27','60','94'),('27','60','95'),
('27','61','94'),('27','61','95'),('27','62','94'),('27','62','95'),('27','63','94'),('27','63','95'),('27','64','94'),
('27','64','95'),('27','3','96'),('27','3','97'),('27','3','98'),('27','3','99'),('27','3','100'),('28','4','4'),
('28','4','16'),('28','4','17'),('28','16','4'),('28','16','16'),('28','16','17'),('28','17','4'),('28','17','16'),
('28','17','17')
;
--Step 1: Create the initial ID
SELECT DISTINCT DENSE_RANK()
OVER(
partition BY r.CustID
ORDER BY r2.RplcID) AS RowID,
r.CustID,
r.BuyID,
r2.RplcID
INTO #tmp
FROM #Parts r
JOIN #Parts r1
ON r.CustID = r1.CustID
AND r.RplcID = r1.RplcID
JOIN #Parts r2
ON r.CustID = r2.CustID
AND r1.BuyID = r2.BuyID
--Step 2: Group the BuyIDs
SELECT DENSE_RANK()
OVER(
ORDER BY CustID, BuyIDs) AS RowID,
*
INTO #tmp2
FROM (SELECT CustID,
Rtrim(RplcID) RplcID,
Stuff((SELECT ',' + Rtrim(BuyID)
FROM #tmp RSLT2
WHERE RSLT2.ROWID = RSLT.ROWID
AND RSLT2.CustID = RSLT.CustID
FOR xml path('')), 1, 1, '') [BuyIDs]
FROM #tmp RSLT
GROUP BY RSLT.CustID,
RSLT.ROWID,
RSLT.RplcID)A
--Step 3: Using the grouped BuyIDs, split the strings using XML and assign RowID
SELECT RowID,
CustID,
BuyID,
RplcID
INTO #tmp3
FROM (SELECT RowID,
CustID,
n.r.value('.','varchar(10)') AS BuyID,
RplcID
FROM #tmp2
CROSS APPLY(SELECT Cast('<r>' + Replace(BuyIDs, ',', '</r><r>')
+ '</r>' AS XML)) AS S(xmlcol)
CROSS APPLY s.xmlcol.nodes('r') AS n(r))A
Order by RowID
Select * from #tmp3 where CustID='28'
Select distinct BuyID
from #tmp3
where CustID='28'
Select distinct RplcID
from #tmp3
where CustID='28'

Related

Would anyone know why I am getting invalid object name for CTE_ApplicationStatus?

1 DECLARE #Totalapp int,
2 #10days int;
3 SELECT #Totalapp = Count(0) from dbo.company;
4 WITH
5 CTE_ApplicationStatus AS (
6 SELECT
7 ROW_NUMBER() OVER (PARTITION BY a.Id ORDER BY a.Id, asl.DateCreated ASC) Row#,
8 asl.ApplicationStatus,
9 a.Id,
10 asl.DateCreated
11 FROM
12 mcf.[Application] a
13 INNER JOIN
14 mcf.ApplicationStatusLog asl ON asl.ApplicationId = a.Id
15 WHERE
16 asl.Id IS NOT NULL
17 )
18 SELECT #10days = COUNT(0) FROM CTE_ApplicationStatus
19
20 SELECT
21 #TotalApp/#10days * 100 as conversion,
22 DATEDIFF(DAY,sc.DateCreated ,sc2.DateCreated) DaysElapsed
23
24
25 FROM
26 CTE_ApplicationStatus sc
27 LEFT OUTER JOIN
28 CTE_ApplicationStatus sc2
29 ON
30 sc2.Id = sc.Id AND sc2.Row# = sc.Row# + 1
invalid object name for 'CTE_ApplicationStatus' on line 20?
I am trying to get the count of the first CTE and then use it in the select statement on line 20 but i seem to be getting an error message

Break up running sum into maximum group size / length

I am trying to break up a running (ordered) sum into groups of a max value. When I implement the following example logic...
IF OBJECT_ID(N'tempdb..#t') IS NOT NULL DROP TABLE #t
SELECT TOP (ABS(CHECKSUM(NewId())) % 1000) ROW_NUMBER() OVER (ORDER BY name) AS ID,
LEFT(CAST(NEWID() AS NVARCHAR(100)),ABS(CHECKSUM(NewId())) % 30) AS Description
INTO #t
FROM sys.objects
DECLARE #maxGroupSize INT
SET #maxGroupSize = 100
;WITH t AS (
SELECT
*,
LEN(Description) AS DescriptionLength,
SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID) AS [RunningLength],
SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID)/#maxGroupSize AS GroupID
FROM #t
)
SELECT *, SUM(DescriptionLength) OVER (PARTITION BY GroupID) AS SumOfGroup
FROM t
ORDER BY GroupID, ID
I am getting groups that are larger than the maximum group size (length) of 100.

A recusive common table expression (rcte) would be one way to resolve this.
Sample data
Limited set of fixed sample data.
create table data
(
id int,
description nvarchar(20)
);
insert into data (id, description) values
( 1, 'qmlsdkjfqmsldk'),
( 2, 'mldskjf'),
( 3, 'qmsdlfkqjsdm'),
( 4, 'fmqlsdkfq'),
( 5, 'qdsfqsdfqq'),
( 6, 'mds'),
( 7, 'qmsldfkqsjdmfqlkj'),
( 8, 'qdmsl'),
( 9, 'mqlskfjqmlkd'),
(10, 'qsdqfdddffd');
Solution
For every recursion step evaluate (r.group_running_length + len(d.description) <= #group_max_length) if the previous group must be extended or a new group must be started in a case expression.
Set group target size to 40 to better fit the sample data.
declare #group_max_length int = 40;
with rcte as
(
select d.id,
d.description,
len(d.description) as description_length,
len(d.description) as running_length,
1 as group_id,
len(d.description) as group_running_length
from data d
where d.id = 1
union all
select d.id,
d.description,
len(d.description),
r.running_length + len(d.description),
case
when r.group_running_length + len(d.description) <= #group_max_length
then r.group_id
else r.group_id + 1
end,
case
when r.group_running_length + len(d.description) <= #group_max_length
then r.group_running_length + len(d.description)
else len(d.description)
end
from rcte r
join data d
on d.id = r.id + 1
)
select r.id,
r.description,
r.description_length,
r.running_length,
r.group_id,
r.group_running_length,
gs.group_sum
from rcte r
cross apply ( select max(r2.group_running_length) as group_sum
from rcte r2
where r2.group_id = r.group_id ) gs -- group sum
order by r.id;
Result
Contains both the running group length as well as the group sum for every row.
id description description_length running_length group_id group_running_length group_sum
-- ---------------- ------------------ -------------- -------- -------------------- ---------
1 qmlsdkjfqmsldk 14 14 1 14 33
2 mldskjf 7 21 1 21 33
3 qmsdlfkqjsdm 12 33 1 33 33
4 fmqlsdkfq 9 42 2 9 39
5 qdsfqsdfqq 10 52 2 19 39
6 mds 3 55 2 22 39
7 qmsldfkqsjdmfqlkj 17 72 2 39 39
8 qdmsl 5 77 3 5 28
9 mqlskfjqmlkd 12 89 3 17 28
10 qsdqfdddffd 11 100 3 28 28
Fiddle to see things in action (includes random data version).

SQL to allocate rows from one table to another

I would like to allocate rows from one table to another using a sort on TopLvlOrd field. The inputs are the [Orders] table and the [Defects] table. I would like to create an SQL that produces [Output]. Even after a bunch of online research I'm not sure how to do this. I'd prefer not to do a cursor, but will go there if necessary. Any ideas? Using SQL Server 2012.
Rules:
(1) Allocate by TopLvlOrd asc,
(2) Allocate one TopLvlOrd row per PegQty
[Orders]
TopLvlOrd IntOrd PegQty
========= ====== ======
67 25 3
120 25 1
111 25 1
16 25 1
127 25 1
127 65 1
127 85 1
[Defects]
DefectID IntOrd TotQty
======== ====== ======
1 25 10
2 25 10
3 25 10
4 25 10
5 25 10
6 25 10
7 25 10
8 25 10
9 25 10
10 25 10
11 65 1
12 85 2
13 85 2
[Output]
DefectID IntOrd TotQty TopLvlOrd
======== ====== ====== =========
1 25 10 16
2 25 10 67
3 25 10 67
4 25 10 67
5 25 10 111
6 25 10 120
7 25 10 127
8 25 10 NULL
9 25 10 NULL
10 25 10 NULL
11 65 1 127
12 85 2 127
13 85 2 NULL

This answers the original version of the question.
I think you want to join on an implicit sequence number, which you can add using row_number():
select d.*, o.*
from (select d.*,
row_number() over (partition by intord order by defectid) as seqnum
from defects d
) d left join
(select o.*,
row_number() over (partition by IntOrd order by TopLvlOrd) as seqnum
from orders o
) o
on d.intord = o.intord and d.seqnum = o.seqnum

Please another time make another question. Please check this query:
SELECT DefectID, IntOrd,
TotQty, TopLvlOrd, PegQty
FROM
(
SELECT B. DefectID, COALESCE (A.IntOrd, B. IntOrd) IntOrd,
B.TotQty, A. TopLvlOrd, A.PegQty FROM
(
SELECT TopLvlOrd ,IntOrd, ROW_NUMBER () OVER (PARTITION By IntOrd ORDER by
TopLvlOrd) Num, PegQty FROM Orders
) A
FULL JOIN
(
SELECT DefectID , IntOrd ,TotQty, ROW_NUMBER () OVER (PARTITION By IntOrd ORDER by
TotQty) Num FROM Orders
) B
ON A. IntOrd=B.IntOrd AND A.Num=B.Num
)C
JOIN
master.dbo.spt_values Tab
ON Tab.type='P' AND Tab.number<C.PegQty

SELECT B. DefectID, COALESCE (A.IntOrd, B. IntOrd) IntOrd,
B.TotQty, A. TopLvlOrd FROM
(
SELECT TopLvlOrd ,IntOrd , ROW_NUMBER () OVER (PARTITION By IntOrd ORDER by TopLvlOrd) Num FROM Orders
) A
FULL JOIN
(
SELECT DefectID , IntOrd ,TotQty, ROW_NUMBER () OVER (PARTITION By IntOrd ORDER by TotQty) Num FROM Orders
) B
ON A. IntOrd=B.IntOrd AND A. Num=B.Num

SSRS not showing all rows by level (hierarchy)

Ok folks, this is driving me crazy...
I have a report that pulls back details for a number of features. THese features can hang off others, exist in their own right or both.
I have the following data as the result of the query:
Feature_ID Parent_ID
24
24 25
20
26 12
12
21 23
26 20
22
24 23
23 26
24 27
27 28
24 22
29 20
23
25
27 29
22 26
28 12
As you can see, some of the features fit in multiple places in the hierarchy. However, all I get back in the report is:
I am grouping on Feature_ID, recursive parent is Parent_ID. What am I missing?

By your wording and unexpected output it feels like you are looking for a list of distinct levels and features. Hopefully the following is helpful. If not, perhaps you can provide some additional context to understand what you are looking for.
declare #table table (Feature_ID int, Parent_ID int);
insert #table values
(24,null),
(24,25),
(20,null),
(26,12),
(12,null),
(21,23),
(26,20),
(22,null),
(24,23),
(23,26),
(24,27),
(27,28),
(24,22),
(29,20),
(23,null),
(25,null),
(27,29),
(22,26),
(28,12);
select * from #table order by 1,2;
select * from #table order by 2,1;
with cte as (
select Feature_ID, Parent_ID, 0 [Level], CAST(Feature_ID as varchar(200)) [Path]
from #table
where Parent_ID is null
union all
select t.Feature_ID, t.Parent_ID, c.[Level] + 1, cast(c.[Path] + '|' + CAST(t.Feature_ID as varchar(200)) as varchar(200))
from #table t
join cte c
on c.Feature_ID = t.Parent_ID
)
select distinct [Level], Feature_ID
from cte
order by [Level], Feature_ID;
This gives the following result:
Level Feature_ID
0 12
0 20
0 22
0 23
0 24
0 25
1 21
1 24
1 26
1 28
1 29
2 22
2 23
2 27
3 21
3 24

ORACLE SQL Query Table using criteria from other table

TABLEA contains the data, while TABLEB contains the search criteria
Here is a SQL Fiddle with the data
Tables
TABLEA
visited_states_time
AL= Alabama,2, AK=Alaska,5
AR=Arkansas,6
AZ=Arizona,10
CA=California, 10,CT=Connecticut,20
TABLEB
CRITERIA
AL
HI
CA
CT
AK
Desired Result
visited_states ................................... total_time_spent
AL= Alabama, AK=Alaska ............................ 7
CA=California, CT=Connecticut................... 30

That's a terrible data model. also you didn't say the condition for tableb. if any state matches, or if all?
as we need to split the rows up (to sum()) and then recombine them you can use:
SQL> with v as (select rownum r,
2 ','||visited_states_time||',' visited_states_time,
3 length(
4 regexp_replace(visited_states_time, '[^,]', '')
5 )+1 fields
6 from tablea)
7 select trim(both ',' from visited_states_time) visited_states_time,
8 sum(total_time_spent) total_time_spent
9 from (select *
10 from v
11 model
12 partition by (r)
13 dimension by (0 as f)
14 measures (visited_states_time, cast('' as varchar2(2)) state,
15 0 as total_time_spent, fields)
16 rules (
17 state[for f from 0 to fields[0]-1 increment 2]
18 = trim(
19 substr(visited_states_time[0],
20 instr(visited_states_time[0], ',', 1, cv(f)+1)+1,
21 instr(visited_states_time[0], '=', 1, (cv(f)/2)+1)
22 - instr(visited_states_time[0], ',', 1, cv(f)+1)-1
23 )),
24 visited_states_time[any]= visited_states_time[0],
25 total_time_spent[any]
26 = substr(visited_states_time[0],
27 instr(visited_states_time[0], ',', 1, (cv(f)+2))+1,
28 instr(visited_states_time[0], ',', 1, (cv(f)+3))
29 - instr(visited_states_time[0], ',', 1, (cv(f)+2))-1
30 )
31 ))
32 where state in (select criteria from tableb)
33 group by visited_states_time;
VISITED_STATES_TIME TOTAL_TIME_SPENT
------------------------------------- ----------------
CA=California, 10,CT=Connecticut,20 30
AL=Alabama,2, AK=Alaska,5 7
but seriously, rewrite that data model to store them separately to start with.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Assign ID based on commonality/groups - sql

Related

Would anyone know why I am getting invalid object name for CTE_ApplicationStatus?

Break up running sum into maximum group size / length

SQL to allocate rows from one table to another

SSRS not showing all rows by level (hierarchy)

ORACLE SQL Query Table using criteria from other table

Categories

Resources