Create groups for INSERT based on IDs of source table - sql

I'm trying to regroup some IDs and create a unique ID in a grouping table (acting as a bridge table in my data warehouse).
The source table has TransactionId, UserId and OrganisationId.
What I want to do is create "groups" of OrganisationId, depending on user association, in order to use this group in a join.
See below:
Original data
TransaID UserId OrgaId
---------- -------- --------
24011035 1 180
24011035 1 19
24011040 2 89
24011064 3 89
24011070 4 19
24011082 4 180
24011106 5 89
24011106 5 180
24011107 6 180
Desired output
OrgaGroupId OrgaId
------------- --------
1 180
1 19
2 89
3 180
3 89
4 180
I've created one group for the combination of OrgaId 180 and 19, as two users have it.
Two single-OrgaId groups for 89 and 180, as each appears on its own for a user.
And finally, another group for 180 and 89, as it's a new combination.
What would be the T-SQL statement to achieve the output shown above?

declare @t table(TransaID int, UserId int, OrgaId int);
insert into @t(TransaID, UserId, OrgaId)
values
(24011035, 1, 180),
(24011035, 1, 19),
(24011040, 2, 89),
(24011064, 3, 89),
(24011070, 4, 19),
(24011082, 4, 180),
(24011106, 5, 89),
(24011106, 5, 180),
(24011107, 6, 180);
select /*straggr,*/ OrgaId, dense_rank() over(order by min(UserId)) as grpid
from
(
select a.*,
(select distinct concat(b.OrgaId, ',') from @t as b where b.UserId = a.UserId order by concat(b.OrgaId, ',') for xml path('')) as straggr
from @t as a
) as t
group by straggr, OrgaId;
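As a side note, on SQL Server 2017 or later the FOR XML PATH('') trick can be replaced by STRING_AGG; a minimal sketch of the same idea against the @t table above (assuming SQL Server 2017+):
select OrgaId, dense_rank() over(order by min(UserId)) as grpid
from
(
    select a.UserId, a.OrgaId, s.straggr
    from @t as a
    cross apply
    (
        -- one comma-separated list of the distinct OrgaIds seen for this user
        select string_agg(cast(d.OrgaId as varchar(10)), ',') within group (order by d.OrgaId) as straggr
        from (select distinct OrgaId from @t where UserId = a.UserId) as d
    ) as s
) as t
group by straggr, OrgaId;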


Remove duplicate values when all values are the same

I am using SQL Workbench/J connecting to Amazon Redshift.
I have the following data in a table (there are more columns that need to be kept, but they hold exactly the same values for each unique claim_id regardless of line number):
Member_ID  Claim_ID  Line_Number
---------  --------  -----------
1          100       1
1          100       2
1          100       1
1          100       2
2          101       13
2          101       13
2          101       13
2          101       13
3          102       12
3          102       12
1          103       2
1          103       2
I want it to become the following, which removes any duplicates based on claim_id (it does not matter which line number is kept):
Member_ID  Claim_ID  Line_Number
---------  --------  -----------
1          100       1
2          101       13
3          102       12
1          103       2
I have tried the following:
select er_main.member_id, er_main.claim_id, er_main.line_number,
temp.claim_id, temp.line_number
from OK_ER_30 er_main
inner join (
select row_number() over (partition by claim_id order by line_number desc) as seqnum
from
OK_ER_30 temp) temp
ON er_main.claim_id = temp.claim_id and seqnum = 1
Order by er_main.claim_id, temp.line_number
and this:
select * from ok_er_30
where claim_id in
(select distinct claim_id
from ok_er_30
group by claim_id
)
order by claim_id desc
I have checked many other ways of pulling only one row per distinct claim_id but nothing has worked.
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number
from OK_ER_30
group by Member_ID, Claim_ID;
Check out the following code.
declare @OK_ER_30 table(Member_ID int, Claim_ID int, Line_Number int);
insert @OK_ER_30 values
(1, 100, 1),
(1, 100, 2),
(1, 100, 1),
(1, 100, 2),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(3, 102, 12),
(3, 102, 12),
(1, 103, 2),
(1, 103, 2);
with
t as(
select *, row_number() over(
partition by Member_ID, Claim_ID order by (select 0)
) rn
from @OK_ER_30
)
delete from t where rn > 1;
select * from @OK_ER_30;
Try this:
select Member_ID, Claim_ID, max(Line_Number) as Line_Number
from OK_ER_30
group by Member_ID, Claim_ID;
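The answers above use SQL Server table variables for demonstration; since the question targets Amazon Redshift, a minimal sketch of the keep-one-row-per-claim idea using only a windowed SELECT (which Redshift supports) might look like this:
select member_id, claim_id, line_number
from (
    select member_id, claim_id, line_number,
           -- number the duplicates within each claim; which row survives is arbitrary
           row_number() over (partition by claim_id order by line_number) as rn
    from ok_er_30
) t
where rn = 1
order by claim_id;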

Distribute values to several rows in SQL Server

I need help with SQL Server on how to distribute a row value to several rows with the same id. To illustrate (Id = ProductInventoryCode, Qty = QuantityInStock), here is the ForDistribution table:
Id | Qty | TotalNoOfBranchesWithId
---+--------+-------------------------
1 | 40 | 2
2 | 33 | 3
3 | 21 | 2
A table that will receive the distributed values
Id | BranchCode | Qty | QtyFromForDistributionTable
-------------------------------------------------------
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 10
2 103 3 12
3 101 1 11
3 102 12 10
As much as possible, the distribution should be near-equal across the branches of each id.
I got something like the query below, but got confused and lost my way.
with rs as
(
select
r.*, cume.cumequantity,
coalesce(s.shipped, 0) AS shipped
from
tmpForDistribution r
cross apply
(SELECT SUM([QuantityInStock]) AS cumequantity
FROM tmpForDistribution r2
WHERE r2.ProductInventoryCode = r.ProductInventoryCode) cume
left join
(SELECT ProductInventoryCode, COUNT(ProductInventoryCode) AS shipped
FROM tmpDistributed s
GROUP BY s.ProductInventoryCode) s ON r.ProductInventoryCode = s.ProductInventoryCode
)
select
rs.ProductInventoryCode, rs.cumequantity, rs.QuantityInStock,
***"how to distribute"***
from rs
I'm currently using SQL Server 2008
Here's a sample screen output.
The upper result set is 145 branches; below it we distribute the ForDistributionQty field, which is 3130. I am ending up with a fraction (DistVal = 21.586), which is not correct for this problem; it should be a whole number such as 21. However, if it's just 21, then 21 x 145 is only 3045, which is 85 units short.
Here we distribute the values, and then make a final "adjustment" to the record which has the largest quantity (arbitrary). But at the end of the day the math works out and the distributed values are square.
Note: I'm not sure why ID 2 did not get an even distribution in your sample.
Declare @Table table (Id int,BranchCode int,Qty int)
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int,Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,49)
;with cte0 as (
Select A.*
,ToDist = cast(D.Qty as int)
,DistVal = cast(D.Qty as int)/C.Cnt
,RN = Row_Number() over (Partition By A.ID Order By cast(D.Qty as int)/C.Cnt Desc,A.Qty Desc)
From @Table A
Join (Select ID,Cnt=count(*) from @Table Group By ID) C on A.ID=C.ID
Join @Dist D on A.ID=D.ID )
, cte1 as (
Select ID,AdjVal=Sum(DistVal)-max(ToDist) From cte0 Group By ID
)
Select A.ID
,A.BranchCode
,A.Qty
,DistVal = DistVal - case when A.RN<=abs(AdjVal) then 1*sign(AdjVal) else 0 end
From cte0 A
Join cte1 B on (A.ID=B.Id)
Order By 1,2
Returns
ID BranchCode Qty DistVal
1 101 13 20
1 102 8 20
2 101 10 11
2 102 2 11
2 103 3 11
3 101 1 24
3 102 12 25
If you can tolerate decimal values, a subquery seems to give a better query plan (tested on SQL 2014; with some sensible keys in place, this avoids a table spool and some additional index scans):
Declare @Table table (Id int,BranchCode int,Qty int, primary key(Id, BranchCode))
Insert Into @Table values
(1, 101, 13),
(1, 102, 8),
(2, 101, 10),
(2, 102, 2),
(2, 103, 3),
(3, 101, 1),
(3, 102, 12)
Declare @Dist table (ID int primary key,Qty int)
Insert Into @Dist values
(1,40),
(2,33),
(3,21)
SELECT
t.id
,t.BranchCode
,t.Qty
,(d.Qty / CAST((SELECT COUNT(*) as cnt FROM @Table t2 where t.id = t2.id) AS decimal(10,2))) as DistributedQty
FROM @Table t
INNER JOIN @Dist d
ON d.id = t.Id
outputs:
Id BranchCode Qty DistributedQty
1 101 13 20.00000000000
1 102 8 20.00000000000
2 101 10 11.00000000000
2 102 2 11.00000000000
2 103 3 11.00000000000
3 101 1 10.50000000000
3 102 12 10.50000000000
If you need DistributedQty to be an int and retain remainders then I can't think of a better solution than @John Cappelletti's, noting that uneven quantities may not be as exactly even as you might hope (e.g. 32 distributed by three would result in a 12/10/10 distribution instead of an 11/11/10 distribution).
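If whole units are required, another option (a sketch against the same table variables, not part of either answer above) is to hand out the integer quotient to every branch and give one extra unit to the first Qty % Cnt branches; this keeps any two shares within one unit of each other, so 32 over three branches comes out 11/11/10. The order by t.Qty desc tie-break is an arbitrary choice:
select t.Id, t.BranchCode, t.Qty,
       -- integer base share, plus one extra unit for the first (Qty % Cnt) branches
       d.Qty / c.Cnt
       + case when row_number() over (partition by t.Id order by t.Qty desc)
                   <= d.Qty % c.Cnt
              then 1 else 0 end as DistributedQty
from @Table t
join @Dist d on d.ID = t.Id
join (select Id, Cnt = count(*) from @Table group by Id) c on c.Id = t.Id
order by t.Id, t.BranchCode;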

How would I select the max for each row of data based on timestamp and unique id using SQL? [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 6 years ago.
I have a table in my database that I am using a SQL query to retrieve data from. In my query, I am replacing some text and using integers. The query returns the data below:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 128926
0 8 35 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 130099
6 8 30 2 1 132000
6 8 30 2 1 147778
The query I am using is:
SELECT
user_id,
event_code,
CAST(REPLACE(data1, 'total bookmarks', '') AS INTEGER) as total_bookmarks,
CAST(REPLACE(data2, 'folders', '') AS INTEGER) as total_folders,
CAST(REPLACE(data3, 'folder depth ', '') AS INTEGER) as folder_depth,
timestamp AS ts
FROM events
WHERE event_code = 8
What do I need to add to my query in order to select only the row with the max ts (timestamp) for each unique user_id? I tried MAX(timestamp), but I get two rows returned for the same id if total_bookmarks differs (example: user_id 0 having 34 in one row and 35 in another). I want the table to look like this:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 147778
Declare @table table (user_id int, event_code int, total_bookmarks int, total_folders int, folder_depth int, ts decimal(18,0))
Insert into @table (user_id , event_code , total_bookmarks , total_folders , folder_depth , ts)
Values (0,8,34,6,1,128926),
(0,8,34,6,1,129001),
(4, 8, 18 , 2, 1, 123870),
(6, 8, 30, 2, 1, 130099),
(6, 8, 30, 2, 1, 132000),
(6, 8, 30, 2, 1, 147778)
Select * from @table
Select user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From (
Select RANK() over (Partition by user_id
Order by ts desc
) as Rank,
user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From @table
) D1
Where D1.Rank = 1
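One caveat, assuming ties on ts are possible for a user: RANK() returns every row that shares the top timestamp, so swapping in ROW_NUMBER() guarantees exactly one row per user_id. A minimal variant of the same query:
Select user_id, event_code, total_bookmarks, total_folders, folder_depth, ts
From (
    Select ROW_NUMBER() over (Partition by user_id Order by ts desc) as rn,
           user_id, event_code, total_bookmarks, total_folders, folder_depth, ts
    From @table
) D1
Where D1.rn = 1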

SQL Server 2008: find number of contiguous rows with equal values

I have a table with multiple Ids. Each Id has values arranged by a sequential index.
create table myValues
(
id int,
ind int,
val int
)
insert into myValues
values
(21, 5, 300),
(21, 4, 310),
(21, 3, 300),
(21, 2, 300),
(21, 1, 345),
(21, 0, 300),
(22, 5, 300),
(22, 4, 300),
(22, 3, 300),
(22, 2, 300),
(22, 1, 395),
(22, 0, 300)
I am trying to find the number of consecutive rows that share the same value.
The val field holds data that should change on each entry (but need not be unique overall).
The problem is to find out when there are more than two consecutive rows with the same value (for the same id).
Thus I'm looking for an output like this:
id ind val count
21 5 300 1
21 4 310 1
21 3 300 2
21 2 300 2
21 1 345 1
21 0 300 1
22 5 300 4
22 4 300 4
22 3 300 4
22 2 300 4
22 1 395 1
22 0 300 1
I'm aware this is similar to the island and gaps problem discussed here.
However, those solutions all hinge on the ability to use a partition statement with values that are supposed to be consecutively increasing.
A solution that generates the ranges of "islands" as an intermediary would work as well, e.g.
id startind endind
21 3 2
22 5 2
Note that there can be many islands for each id.
I'm sure there is a simple adaptation of the island solution, but for the life of me I can't think of it.
Find the contiguous group, then do a count() partitioned by that:
select id, ind, val, count(*) over (partition by id, val, grp)
from
(
select *, grp = dense_rank() over (partition by id, val order by ind) - ind
from myValues
) d
order by id, ind desc
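To see why this works, it can help to run the inner query on its own: within one (id, val) partition, dense_rank() increases by 1 per row while ind also increases by 1 inside an island, so the difference grp is constant within an island and changes at every gap.
-- inspect the intermediate grouping key on its own
select id, ind, val,
       dense_rank() over (partition by id, val order by ind) - ind as grp
from myValues
order by id, val, ind;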
The other solution is obviously more elegant. I'll have to study it a little closer myself.
with agg(id, min_ind, max_ind, cnt) as (
select id, min(ind), max(ind), count(*)
from
(
select id, ind, val, sum(brk) over (partition by id order by ind desc) as grp
from
(
select
id, ind, val,
coalesce(sign(lag(ind) over (partition by id, val order by ind desc) - ind - 1), 1) as brk
from myValues
) as d
) as d
group by id, grp
)
select v.id, v.ind, v.val, a.cnt
from myValues v inner join agg a on a.id = v.id and v.ind between min_ind and max_ind
order by v.id, v.ind desc;

SQL Group Range Values

I have taken a look at several other questions/answers on here but I cannot apply those to my problem. I am trying to identify multiple sequential breaks based on a key column. Most examples I have found do not deal with multiple breaks in a sequence for the same key column.
Sample data:
Location Number
------------------------
300 15
300 16
300 17
300 18
300 21
300 22
300 23
400 10
400 11
400 14
400 16
Here is the result I am looking for:
Location StartNumber EndNumber
------------------------------------------
300 15 18
300 21 23
400 10 11
400 14 14
400 16 16
Here's a relatively portable SQL solution, since you didn't specify the DB:
Create Table SampleData (Location int, Number Int)
INSERT INTO SampleData VALUES (300, 15)
INSERT INTO SampleData VALUES (300, 16)
INSERT INTO SampleData VALUES (300, 17)
INSERT INTO SampleData VALUES (300, 18)
INSERT INTO SampleData VALUES (300, 21)
INSERT INTO SampleData VALUES (300, 22)
INSERT INTO SampleData VALUES (300, 23)
INSERT INTO SampleData VALUES (400, 10)
INSERT INTO SampleData VALUES (400, 11)
INSERT INTO SampleData VALUES (400, 14)
INSERT INTO SampleData VALUES (400, 16)
SELECT
t1.Location,
t1.Number AS startofgroup,
MIN(t2.Number) AS endofgroup
FROM (SELECT Number , Location
FROM SampleData tbl1
WHERE NOT EXISTS(SELECT *
FROM SampleData tbl2
WHERE tbl1.Number - tbl2.Number = 1
and tbl1.Location = tbl2.Location)) t1
INNER JOIN (SELECT Number , Location
FROM SampleData tbl1
WHERE NOT EXISTS(SELECT *
FROM SampleData tbl2
WHERE tbl2.Number - tbl1.Number = 1
and tbl1.Location = tbl2.Location)) t2
ON t1.Number <= t2.Number
and t1.Location = t2.Location
GROUP BY
t1.Location,
t1.Number
ORDER BY
Location,
startofgroup
Output
Location startofgroup endofgroup
----------- ------------ -----------
300 15 18
300 21 23
400 10 11
400 14 14
400 16 16
It's a modified version of Listing 2 ("A set-based solution for identifying islands") from Islands and Gaps in Sequential Numbers by Alexander Kozak.
If you're looking for more options with SQL Server 2005 and later you should search for the phrase "Itzik Ben-Gan gaps and islands"
Well, if you're using an RDBMS that supports the lag() function, then this should tell you where the breaks are. You should then be able to use this, along with some case statements and careful use of the min() and max() functions, to get the query that you want.
select location, lag_number as startnumber, number as endnumber
from (
    select location, number, lag_number
    from (
        select location, number,
               lag(number) over (partition by location order by number) as lag_number
        from table
    ) a
    where number is not null and lag_number is not null
) b
where number - lag_number > 1
order by 1, 2, 3;
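If window functions are available, the standard islands trick (the Itzik Ben-Gan style approach mentioned above) yields the requested ranges directly; a minimal sketch against the SampleData table:
select Location,
       min(Number) as StartNumber,
       max(Number) as EndNumber
from (
    -- Number minus its row number is constant within a consecutive run
    select Location, Number,
           Number - row_number() over (partition by Location order by Number) as grp
    from SampleData
) d
group by Location, grp
order by Location, StartNumber;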