SQL Data Grouping - sql

I was hoping someone would be able to help me (or point me in the right direction) on the following problem. I'm looking at grouping a large number of codes to only 3 digits, ensuring that if a participant had the code 122.2 and 122.3, it would count as one occurrence and not two.
Data Example:
Participant | group_code
1 | 1223
1 | 1224
1 | 1123
2 | 1012
2 | 0123
Current Code:
SELECT (left(group_code, 3)) as Group, count(left(group_code, 3)) as occurrence
from testDB
group by left(group_code, 3)
I suspect I need to use a unique element on the participant ID when grouping, however I'm not too sure.
Current Outcome:
Using the current data example, the result is as follows.
122 has 2 occurrences
112 has 1 occurrence
101 has 1 occurrence
012 has 1 occurrence
Expected Outcome:
122 has 1 occurrence
112 has 1 occurrence
101 has 1 occurrence
012 has 1 occurrence
Question: Is it possible to change the current code so that codes sharing the same first 3 digits count only once per participant? For example, if a single participant has 111.1, 111.2, 111.3, and 111.4, the code above reports that 111 has occurred 4 times. However, I only want it to report one occurrence, as I'm only interested in the 3-digit level (and not the 4th digit).
Many thanks

Try this.
declare @t table ( group_code varchar(15))
insert into @t values ('122.2') ,('122.3' ) ,( '122.4' ) ,( '112.6'),( '112.0') , ( '119.1')
SELECT (left(group_code, 3)) as Grop,
count(left(group_code, 3)) as occurrence
from @t
group by left(group_code, 3)
select * from
(
SELECT (left(group_code, 3)) as Grop,
count(left(group_code, 3)) as occurrence
from @t
group by left(group_code, 3)
) a
join @t t on a.Grop = left(t.group_code, 3)

Create Table #T(Id int, Value decimal(16,2))
Insert into #T
Values(1,122.2),(1,122.3),(2,122.2)
Select Id,ROUND(Value,0)
from #T
Group By Id,ROUND(Value,0)

Try this
SELECT (left(group_code, 3)) as Group, count(*) as occurrence
from testDB
group by left(group_code, 3)

EDIT:
According to your last edit in your question it should be this:
DECLARE @tbl TABLE(Participant INT, group_code INT);
INSERT INTO @tbl VALUES
(1,1223)
,(1,1224)
,(1,1123)
,(2,1012)
,(3,0123);
WITH WithGroupingNew AS
(
SELECT tbl.*
,LEFT(CAST(tbl.group_code AS varchar(100)),3) AS NewGroupingCode
FROM @tbl AS tbl
)
,Counted AS
(
SELECT *
,ROW_NUMBER() OVER(PARTITION BY NewGroupingCode ORDER BY group_code) AS Counter
FROM WithGroupingNew
)
SELECT *
FROM Counted
WHERE Counter=1
This was my answer before the edit; it might still be useful.
With this CTE you get a column with the information you need:
DECLARE @tbl TABLE(Participant INT, group_code INT);
INSERT INTO @tbl VALUES
(1,1223)
,(1,1224)
,(1,1123)
,(2,1012)
,(3,0123);
WITH WithGroupingNew AS
(
SELECT tbl.*
,LEFT(CAST(tbl.group_code AS varchar(100)),3) AS NewGroupingCode
FROM @tbl AS tbl
)
SELECT * FROM WithGroupingNew
Result
Participant group_code NewGroupingCode
1 1223 122
1 1224 122
1 1123 112
2 1012 101
3 0123 123
It's up to you to decide what to do with this:
SELECT DISTINCT NewGroupingCode FROM WithGroupingNew
or maybe
SELECT DISTINCT Participant,NewGroupingCode FROM WithGroupingNew
or maybe
SELECT *,ROW_NUMBER() OVER(PARTITION BY NewGroupingCode ORDER BY group_code) AS Counter FROM WithGroupingNew
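One way to get the expected outcome from the question is to count distinct participants per 3-character prefix instead of counting rows. Below is a minimal sketch using Python's built-in sqlite3 module so it can be run anywhere; the column types, the helper name count_participants_per_prefix and the use of substr are assumptions for the illustration. In SQL Server the same idea would be COUNT(DISTINCT Participant) grouped by LEFT(group_code, 3).

```python
import sqlite3

def count_participants_per_prefix(rows):
    """Count each participant at most once per 3-character code prefix."""
    conn = sqlite3.connect(":memory:")
    # group_code is stored as text so that a leading zero (0123) survives
    conn.execute("CREATE TABLE testDB (participant INTEGER, group_code TEXT)")
    conn.executemany("INSERT INTO testDB VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT substr(group_code, 1, 3) AS code_group,
               COUNT(DISTINCT participant) AS occurrence
        FROM testDB
        GROUP BY substr(group_code, 1, 3)
        ORDER BY code_group
        """
    ).fetchall()
    conn.close()
    return result

# Sample data from the question
sample = [(1, "1223"), (1, "1224"), (1, "1123"), (2, "1012"), (2, "0123")]
```

With the sample data, participant 1 contributes only one occurrence to group 122 even though two of their codes start with 122. A second participant with a 122x code would raise that count to 2.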

Related

Rolling Average in SQL with Partition [duplicate]

declare @t table
(
id int,
SomeNumt int
)
insert into @t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23
select * from @t
The above select returns the following.
id SomeNumt
1 10
2 12
3 3
4 15
5 23
How do I get the following:
id SomeNumt CumSrome
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from @t t1
inner join @t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
SQL Fiddle example
Output
| ID | SOMENUMT | SUM |
-----------------------
| 1 | 10 | 10 |
| 2 | 12 | 22 |
| 3 | 3 | 25 |
| 4 | 15 | 40 |
| 5 | 23 | 63 |
Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!
SQL Server 2012 and later permit the following.
SELECT
RowID,
Col1,
SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
or
SELECT
GroupID,
RowID,
Col1,
SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.
Thanks to Peso, who commented on the SQL Team thread referred to in another answer.
For SQL Server 2012 onwards it can be this simple:
SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM @t
This works because, when ORDER BY is specified for SUM without an explicit frame, the window frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (see "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx).
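The default RANGE frame and an explicit ROWS frame give the same running total only when the ORDER BY values are unique. A small sketch of the difference, run through Python's sqlite3 module (SQLite documents the same default frame; the table t, its val column and the function name frame_comparison are made up for the illustration, and window functions need SQLite 3.25 or newer):

```python
import sqlite3

def frame_comparison():
    """Compare the default (RANGE) frame with an explicit ROWS frame on tied ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    # id 2 appears twice, so the two rows are peers in ORDER BY id
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 5), (2, 5), (3, 1)])
    rows = conn.execute(
        """
        SELECT id, val,
               SUM(val) OVER (ORDER BY id) AS default_frame,
               SUM(val) OVER (ORDER BY id
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_frame
        FROM t
        ORDER BY id, rows_frame
        """
    ).fetchall()
    conn.close()
    return rows
```

With RANGE, both id = 2 rows already include each other (20 and 20); with ROWS they accumulate one row at a time (15, then 20). If id is unique, as in the question, the two forms agree.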
Let's first create a table with dummy data:
Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)
Now let's insert some data into the table;
Insert Into CUMULATIVESUM
Select 1, 10 union
Select 2, 2 union
Select 3, 6 union
Select 4, 10
Here I am joining same table (self joining)
Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc
Result:
ID SomeValue SomeValue
-------------------------
1 10 10
2 2 10
2 2 2
3 6 10
3 6 2
3 6 6
4 10 10
4 10 2
4 10 6
4 10 10
Now just sum the SomeValue of c2 and we'll get the answer:
Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc
For SQL Server 2012 and above (much better performance):
Select
c1.ID, c1.SomeValue,
Sum (SomeValue) Over (Order By c1.ID )
From CumulativeSum c1
Order By c1.id Asc
Desired result:
ID SomeValue CumulativeSumValue
---------------------------------
1 10 10
2 2 12
3 6 18
4 10 28
Drop Table CumulativeSum
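To check that the self-join and the windowed SUM really agree, here is a small sketch that runs both against the same four rows through Python's sqlite3 module. SQLite stands in for SQL Server here, and the function name running_totals is only for the illustration.

```python
import sqlite3

def running_totals(rows):
    """Return (self_join_result, window_result) for the same cumulative sum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CumulativeSum (id INTEGER PRIMARY KEY, SomeValue INTEGER)")
    conn.executemany("INSERT INTO CumulativeSum VALUES (?, ?)", rows)
    # Triangular self-join: each row is paired with itself and every earlier row
    self_join = conn.execute(
        """
        SELECT c1.id, c1.SomeValue, SUM(c2.SomeValue)
        FROM CumulativeSum c1
        JOIN CumulativeSum c2 ON c1.id >= c2.id
        GROUP BY c1.id, c1.SomeValue
        ORDER BY c1.id
        """
    ).fetchall()
    # Window function: one pass over the rows in id order
    windowed = conn.execute(
        """
        SELECT id, SomeValue, SUM(SomeValue) OVER (ORDER BY id)
        FROM CumulativeSum
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return self_join, windowed

data = [(1, 10), (2, 2), (3, 6), (4, 10)]
```

The self-join touches n(n+1)/2 row pairs for n rows, which is why the window version tends to scale better on large tables.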
A CTE version, just for fun:
;
WITH abcd
AS ( SELECT id
,SomeNumt
,SomeNumt AS MySum
FROM @t
WHERE id = 1
UNION ALL
SELECT t.id
,t.SomeNumt
,t.SomeNumt + a.MySum AS MySum
FROM @t AS t
JOIN abcd AS a ON a.id = t.id - 1
)
SELECT * FROM abcd
OPTION ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.
Returns:
id SomeNumt MySum
----------- ----------- -----------
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
Late answer but showing one more possibility...
Cumulative sum generation can be optimized further with CROSS APPLY.
In the actual query plans I compared, it did better than the INNER JOIN and OVER clause versions:
/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP
SELECT * INTO #TMP
FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
UNION
SELECT 4 AS id
UNION
SELECT 5 AS id
) Tab
/* Using CROSS APPLY
Query cost relative to the batch 17%
*/
SELECT T1.id,
T2.CumSum
FROM #TMP T1
CROSS APPLY (
SELECT SUM(T2.id) AS CumSum
FROM #TMP T2
WHERE T1.id >= T2.id
) T2
/* Using INNER JOIN
Query cost relative to the batch 46%
*/
SELECT T1.id,
SUM(T2.id) CumSum
FROM #TMP T1
INNER JOIN #TMP T2
ON T1.id >= T2.id
GROUP BY T1.id
/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT T1.id,
SUM(T1.id) OVER (ORDER BY T1.id) AS CumSum -- ORDER BY (rather than PARTITION BY id) yields the running total
FROM #TMP T1
Output:-
id CumSum
------- -------
1 1
2 3
3 6
4 10
5 15
Select
*,
(Select Sum(SomeNumt)
From @t S
Where S.id <= M.id) As CumSum
From @t M
You can use this simple query for a running total:
select
id
,SomeNumt
,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from @t
There is a much faster CTE implementation available in this excellent post:
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
The problem in this thread can be expressed like this:
DECLARE @RT INT
SELECT @RT = 0
;
WITH abcd
AS ( SELECT TOP 100 percent
id
,SomeNumt
,MySum
FROM @t -- assumption: the thread's table, extended with a MySum column to hold the total
order by id
)
update abcd
set @RT = MySum = @RT + SomeNumt
output inserted.*
For example, if you have a table with two columns, ID and Number, and want to find the cumulative sum:
SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Once the table is created:
select
A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
from @t A, @t B where A.id >= B.id
group by A.id, A.SomeNumt
order by A.id
The SQL solution which combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what I wanted to achieve.
Thank you so much!
In case it helps anyone, here was my case. I wanted to add 1 to a running count whenever the maker is "Some Maker" (an example value). Otherwise there is no increment, and the row shows the previous running count.
So this piece of SQL:
SUM( CASE [rmaker] WHEN 'Some Maker' THEN 1 ELSE 0 END)
OVER
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
Allowed me to get something like this:
User 1 Rank1 MakerA 0
User 1 Rank2 MakerB 0
User 1 Rank3 Some Maker 1
User 1 Rank4 Some Maker 2
User 1 Rank5 MakerC 2
User 1 Rank6 Some Maker 3
User 2 Rank1 MakerA 0
User 2 Rank2 Some Maker 1
Explanation of the above: the count of "Some Maker" starts at 0, and each time Some Maker is found we add 1. For User 1, MakerC is found at Rank5, so we don't add 1 and the running count stays at 2 until the next Some Maker row.
Partitioning is by user, so when the user changes, the cumulative count goes back to zero.
I am at work and don't want any credit for this answer; I just want to say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION, and the syntax "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" completed the task.
Thanks!
Groaker
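The running conditional count described above can be reproduced end to end with a tiny table. This is a sketch using Python's sqlite3 module in place of SQL Server; the table name makers and the function name cumulative_maker_count are invented for the example, while the column names follow the snippet above.

```python
import sqlite3

def cumulative_maker_count(rows, maker="Some Maker"):
    """Running count of rows matching `maker`, restarted for each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE makers (UserID INTEGER, rrank INTEGER, rmaker TEXT)")
    conn.executemany("INSERT INTO makers VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT UserID, rrank, rmaker,
               SUM(CASE rmaker WHEN ? THEN 1 ELSE 0 END)
                 OVER (PARTITION BY UserID
                       ORDER BY rrank
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
        FROM makers
        ORDER BY UserID, rrank
        """,
        (maker,),
    ).fetchall()
    conn.close()
    return result

maker_rows = [
    (1, 1, "MakerA"), (1, 2, "MakerB"), (1, 3, "Some Maker"),
    (1, 4, "Some Maker"), (1, 5, "MakerC"), (1, 6, "Some Maker"),
    (2, 1, "MakerA"), (2, 2, "Some Maker"),
]
```

The last column comes out as 0, 0, 1, 2, 2, 3 for user 1 and 0, 1 for user 2, matching the listing above.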
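Above (pre-SQL Server 2012) we see examples like this:
SELECT
T1.id, SUM(T2.id) AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id <= T1.id
GROUP BY
T1.id
More efficient, since the join no longer pairs each row with itself:
SELECT
T1.id, COALESCE(SUM(T2.id), 0) + T1.id AS CumSum
FROM
#TMP T1
LEFT JOIN #TMP T2 ON T2.id < T1.id -- LEFT JOIN keeps the first row, which has no predecessors
GROUP BY
T1.id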
Try this
select
t.id,
t.SomeNumt,
sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from
@t t
group by
t.id,
t.SomeNumt
order by
t.id asc;
Try this:
CREATE TABLE #t(
[name] varchar NULL,
[val] [int] NULL,
[ID] [int] NULL
) ON [PRIMARY]
insert into #t (id,name,val) values
(1,'A',10), (2,'B',20), (3,'C',30)
select t1.id, t1.val, SUM(t2.val) as cumSum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.val order by t1.id
Without using any type of JOIN, the cumulative salary per person can be fetched with the following query:
SELECT * , (
SELECT SUM( salary )
FROM `abc` AS table1
WHERE table1.ID <= `abc`.ID
AND table1.name = `abc`.Name
) AS cum
FROM `abc`
ORDER BY Name

SQL remove duplicates at ID and Month level

I have a table that is something like this:
ID Date Name Age
1 10/04/2015 Theja 24
1 28/04/2015 Theja1 26
1 14/07/2015 Theja2 45
1 30/07/2015 Theja2 45
1 30/08/2015 Theja3 54
2 10/04/2016 Jaya 23
2 28/04/2016 Jaya 23
2 14/05/2016 Jaya1 65
2 30/05/2016 Jaya1 65
But I want output like:
ID Date Name Age
1 28/04/2015 Theja1 26
1 01/05/2015 Theja1 26
1 01/06/2015 Theja1 26
1 30/07/2015 Theja2 45
1 30/08/2015 Theja3 54
2 28/04/2016 Jaya 23
2 30/05/2016 Jaya1 65
Keep one record per ID per month, the one with the latest date. If a month is missing for an ID, fill it with the previous month's record.
Different databases have different methods for handling dates. The following is an ANSI-standard way of getting one row per ID and month, keeping the latest date:
select id, max(date)
from t
group by id,
extract(year from date), extract(month from date);
I tried for a solution and came up with the following, but you need a calendar table to insert the missing rows into the output.
This is a SQL Server based solution.
Data Setup:
create table temptable (
id int,
[date] date,
name varchar (50),
age int
);
insert into temptable values
(1,'04-10-2015','Theja',24)
insert into temptable values
(1,'04-28-2015','Theja1',26)
insert into temptable values
(1,'07-14-2015','Theja2',45)
insert into temptable values
(1,'07-30-2015','Theja2',45)
insert into temptable values
(1,'08-30-2015','Theja3',54)
insert into temptable values
(2,'04-10-2016','Jaya',23)
insert into temptable values
(2,'04-28-2016','Jaya',23)
insert into temptable values
(2,'05-14-2016','Jaya1',65)
insert into temptable values
(2,'05-30-2016','Jaya1',65)
The following solution handles the duplicate issue only. To get the missing rows you need to implement a calendar table:
you can join with the calendar table and then use the output of cte3 (its NextDate column) to fill in the missing months.
with cte1 as (
select *,
row_number() over ( partition by month([date]) order by [date]) as rownm,
concat(id,format([date],'MMyyyy')) as unqcol
from temptable
) , cte2 as
(
select unqcol, max(rownm) as maxdt
from cte1
group by unqcol
), cte3 as
( select a.*, lead(a.[date]) over (partition by a.id order by a.id,a.[date]) as NextDate from
cte1 a inner join cte2 b
on a.unqcol=b.unqcol and a.rownm=b.maxdt
)
select c.id,c.[date],c.name,c.age,c.NextDate from cte3 c
order by c.[date]
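The de-duplication part (latest row per ID and month) can also be written with a single ROW_NUMBER. A minimal sketch via Python's sqlite3 module, assuming the dates are stored as ISO yyyy-mm-dd text; the column name event_date and the function name latest_per_month are made up for the example, and the missing-month fill is deliberately left out because it needs the calendar table mentioned above.

```python
import sqlite3

def latest_per_month(rows):
    """Keep only the latest row for each (id, year-month)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temptable (id INTEGER, event_date TEXT, name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO temptable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, event_date, name, age
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY id, strftime('%Y-%m', event_date)
                       ORDER BY event_date DESC
                   ) AS rn
            FROM temptable AS t
        ) AS ranked
        WHERE rn = 1
        ORDER BY id, event_date
        """
    ).fetchall()
    conn.close()
    return result

month_rows = [
    (1, "2015-04-10", "Theja", 24), (1, "2015-04-28", "Theja1", 26),
    (1, "2015-07-14", "Theja2", 45), (1, "2015-07-30", "Theja2", 45),
    (1, "2015-08-30", "Theja3", 54), (2, "2016-04-10", "Jaya", 23),
    (2, "2016-04-28", "Jaya", 23), (2, "2016-05-14", "Jaya1", 65),
    (2, "2016-05-30", "Jaya1", 65),
]
```

In SQL Server the partition could be written with YEAR([date]) and MONTH([date]) instead of strftime.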

SQL get consecutive days ignoring weekend

I have a table with following format:
ID ID1 ID2 DATE
1 1 1 2018-03-01
2 1 1 2018-03-02
3 1 1 2018-03-05
4 1 1 2018-03-06
5 1 1 2018-03-07
6 2 2 2018-03-05
7 2 2 2018-03-05
8 2 2 2018-03-06
9 2 2 2018-03-07
10 2 2 2018-03-08
From this table I have to get all records that share the same ID1 and ID2 and whose DATE values cover 5 consecutive work days (5 dates in a row, ignoring the gap for Saturday/Sunday; holidays can be ignored).
I have really no idea how to achieve this. I did search around, but couldn't find anything that helped me. So my question is, how can I achieve following output?
ID ID1 ID2 DATE
1 1 1 2018-03-01
2 1 1 2018-03-02
3 1 1 2018-03-05
4 1 1 2018-03-06
5 1 1 2018-03-07
SQLFiddle to mess around
Assuming you have no duplicates and work is only on weekdays, then there is a simplish solution for this particular case. We can identify the date 4 rows ahead within the same ID1/ID2 group. For 5 consecutive work days, it is either 4 days ahead (Monday to Friday) or 6 days ahead (the run spans a weekend):
select t.*
from (select t.*, lead(dat, 4) over (partition by id1, id2 order by dat) as dat_4
from t
) t
where datediff(day, dat, dat_4) in (4, 6);
This happens to work because you are looking for a complete week. Note that it returns only the first row of each 5-day run.
Here is the SQL Fiddle.
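Here is a runnable sketch of the same LEAD idea that also removes duplicate dates first and joins back to return every row of a qualifying run. It uses Python's sqlite3 module, so julianday replaces datediff; the function name five_day_runs and the CTE names days and starts are made up for the example, and it still assumes all dates fall on weekdays.

```python
import sqlite3

def five_day_runs(rows):
    """Return all rows that belong to a run of 5 consecutive work days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, id1 INTEGER, id2 INTEGER, dat TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH days AS (          -- one row per group and date
            SELECT DISTINCT id1, id2, dat FROM t
        ),
        starts AS (             -- for each date, the date 4 rows ahead in its group
            SELECT id1, id2, dat AS run_start,
                   LEAD(dat, 4) OVER (PARTITION BY id1, id2 ORDER BY dat) AS run_end
            FROM days
        )
        SELECT DISTINCT t.id, t.id1, t.id2, t.dat
        FROM t
        JOIN starts AS s
          ON s.id1 = t.id1 AND s.id2 = t.id2
         AND t.dat BETWEEN s.run_start AND s.run_end
        WHERE CAST(julianday(s.run_end) - julianday(s.run_start) AS INTEGER) IN (4, 6)
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return result

question_rows = [
    (1, 1, 1, "2018-03-01"), (2, 1, 1, "2018-03-02"), (3, 1, 1, "2018-03-05"),
    (4, 1, 1, "2018-03-06"), (5, 1, 1, "2018-03-07"), (6, 2, 2, "2018-03-05"),
    (7, 2, 2, "2018-03-05"), (8, 2, 2, "2018-03-06"), (9, 2, 2, "2018-03-07"),
    (10, 2, 2, "2018-03-08"),
]
```

Group (2, 2) has only four distinct dates, so it has no date 4 rows ahead and drops out, leaving rows 1 to 5 as in the expected output.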
select t.* from
(select id1,id2,count(distinct dat) count from t
group by id1,id2
having count(distinct dat)=5) t1 right join
t
on t.id1=t1.id1 and t.id2=t1.id2
where count=5
Check this-
Dates of Two weeks with 10 valid dates
http://sqlfiddle.com/#!18/76556/1
Dates of Two weeks with 10 non-unique dates
http://sqlfiddle.com/#!18/b4299/1
and
Dates of Two weeks with less than 10 but unique
http://sqlfiddle.com/#!18/f16cb/1
This query is very verbose without LEAD or LAG and it is the best I could do on my lunch break. You can probably improve on it given the time.
DECLARE @T TABLE
(
ID INT,
ID1 INT,
ID2 INT,
TheDate DATETIME
)
INSERT @T SELECT 1,1,1,'03/01/2018'
INSERT @T SELECT 2,1,1,'03/02/2018'
INSERT @T SELECT 3,1,1,'03/05/2018'
INSERT @T SELECT 4,1,1,'03/06/2018'
INSERT @T SELECT 5,1,1,'03/07/2018'
--INSERT @T SELECT 5,1,1,'03/09/2018'
INSERT @T SELECT 6,2,2,'03/02/2018'
INSERT @T SELECT 7,2,2,'03/05/2018'
INSERT @T SELECT 8,2,2,'03/05/2018'
--INSERT @T SELECT 9,2,2,'03/06/2018'
INSERT @T SELECT 10,2,2,'03/07/2018'
INSERT @T SELECT 11,2,2,'03/08/2018'
INSERT @T SELECT 12,2,2,'03/15/2018'
INSERT @T SELECT 13,1,1,'04/01/2018'
INSERT @T SELECT 14,1,1,'04/02/2018'
INSERT @T SELECT 15,1,1,'04/05/2018'
--SELECT * FROM @T
DECLARE @LowDate DATETIME = DATEADD(DAY,-1,(SELECT MIN(TheDate) FROM @T))
DECLARE @HighDate DATETIME = DATEADD(DAY,1,(SELECT MAX(TheDate) FROM @T))
DECLARE @DaysThreshold INT = 5
;
WITH Dates AS
(
SELECT DateValue=@LowDate
UNION ALL
SELECT DateValue + 1 FROM Dates
WHERE DateValue + 1 < @HighDate
),
Joined AS
(
SELECT * FROM Dates LEFT OUTER JOIN @T T ON T.TheDate=Dates.DateValue
),
Calculations AS
(
SELECT
ID=MAX(J1.ID),
J1.ID1,J1.ID2,
J1.TheDate,
LastDate=MAX(J2.TheDate),
LastDateWasWeekend = CASE WHEN ((DATEPART(DW,DATEADD(DAY,-1,J1.TheDate) ) + @@DATEFIRST) % 7) NOT IN (0, 1) THEN 0 ELSE 1 END,
Offset = DATEDIFF(DAY,MAX(J2.TheDate),J1.TheDate)
FROM
Joined J1
LEFT OUTER JOIN Joined J2 ON J2.ID1=J1.ID1 AND J2.ID2=J1.ID2 AND J2.TheDate<J1.TheDate
WHERE
NOT J1.ID IS NULL
GROUP BY J1.ID1,J1.ID2,J1.TheDate
)
,FindValid AS
(
SELECT
ID,ID1,ID2,TheDate,
IsValid=CASE
WHEN LastDate=TheDate THEN 0
WHEN LastDate IS NULL THEN 1
WHEN Offset=1 THEN 1
WHEN Offset>3 THEN 0
WHEN Offset<=3 THEN
LastDateWasWeekend
END
FROM
Calculations
UNION
SELECT DISTINCT ID=NULL,ID1,ID2, TheDate=@HighDate,IsValid=0 FROM @T
),
FindMax As
(
SELECT
This.ID,This.ID1,This.ID2,This.TheDate,MaxRange=MIN(Next.TheDate)
FROM
FindValid This
LEFT OUTER JOIN FindValid Next ON Next.ID2=This.ID2 AND Next.ID1=This.ID1 AND This.TheDate<Next.TheDate AND Next.IsValid=0
GROUP BY
This.ID,This.ID1,This.ID2,This.TheDate
),
FindMin AS
(
SELECT
This.ID,This.ID1,This.ID2,This.TheDate,This.MaxRange,MinRange=MIN(Next.TheDate)
FROM
FindMax This
LEFT OUTER JOIN FindMax Next ON Next.ID2=This.ID2 AND Next.ID1=This.ID1 AND This.TheDate<Next.MaxRange-- AND Next.IsValid=0 OR Next.TheDate IS NULL
GROUP BY
This.ID,This.ID1,This.ID2,This.TheDate,This.MaxRange
)
,Final AS
(
SELECT
ID1,ID2,MinRange,MaxRange,SequentialCount=COUNT(*)
FROM
FindMin
GROUP BY
ID1,ID2,MinRange,MaxRange
)
SELECT
T.ID,
T.ID1,
T.ID2,
T.TheDate
FROM @T T
INNER JOIN Final ON T.TheDate>= Final.MinRange AND T.TheDate < Final.MaxRange AND T.ID1=Final.ID1 AND T.ID2=Final.ID2
WHERE
SequentialCount>=@DaysThreshold
OPTION (MAXRECURSION 0)

SQL Server - Remove duplicates with different ordering

I have a table containing pairs of items bought together and the # of times the pairing occurred.
item_1 item_2 count
123 234 5
345 567 22
567 345 22
890 345 6
Some of the pairings are dupes that differ just by order (ie rows 2&3).
Is there an easy way to de-dupe this table?
If the "dups" can appear only once in either direction, then a convenient way is:
select t.*
from t
where t.item_1 <= t.item_2
union all
select t.*
from t
where t.item_1 > t.item_2 and
not exists (select 1
from t t2
where t2.item_1 = t.item_2 and t2.item_2 = t.item_1 and t2.count = t.count
);
You can use this script.
DECLARE @T TABLE (item_1 INT, item_2 INT , [count] INT)
INSERT INTO @T
VALUES
(123 ,234, 5),
(345 ,567, 22),
(567 ,345, 22),
(890 ,345, 6)
;WITH BASE AS
(
SELECT RN = ROW_NUMBER() OVER(ORDER BY item_1), * FROM @T
)
SELECT T1.item_1, T1.item_2, T1.count FROM BASE T1
OUTER APPLY (SELECT TOP 1 *
FROM BASE T2
WHERE T2.RN > T1.RN AND T1.item_1 = T2.item_2 AND T1.item_2 = T2.item_1) X
WHERE X.RN IS NULL
Result
item_1 item_2 count
----------- ----------- -----------
123 234 5
567 345 22
890 345 6
You can treat two rows as the same pair by partitioning on the lesser and the greater of the two items (computed with CASE expressions), and then keep one row from each partition.
select item_1,item_2,count
from (select t.*
,row_number() over(partition by case when item_1<item_2 then item_1 else item_2 end,
case when item_1>item_2 then item_1 else item_2 end
order by item_1) as rnum
from tbl t
) t
where rnum=1
Edit: Per Gordon's comment, if the duplicates have to be eliminated only when the count is the same, use
select item_1,item_2,count
from (select t.*
,row_number() over(partition by case when item_1<item_2 then item_1 else item_2 end,
case when item_1>item_2 then item_1 else item_2 end,
count
order by item_1) as rnum
from tbl t
) t
where rnum=1
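A runnable sketch of the lesser/greater partitioning on the question's sample rows, using Python's sqlite3 module as a stand-in for SQL Server. The table name pairs, the column name pair_count (chosen to avoid the name count) and the function name dedupe_pairs are assumptions for the example.

```python
import sqlite3

def dedupe_pairs(rows):
    """Keep one row per unordered (item_1, item_2) pair with the same count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pairs (item_1 INTEGER, item_2 INTEGER, pair_count INTEGER)")
    conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT item_1, item_2, pair_count
        FROM (
            SELECT p.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY
                           CASE WHEN item_1 < item_2 THEN item_1 ELSE item_2 END,
                           CASE WHEN item_1 > item_2 THEN item_1 ELSE item_2 END,
                           pair_count
                       ORDER BY item_1
                   ) AS rnum
            FROM pairs AS p
        ) AS ranked
        WHERE rnum = 1
        ORDER BY item_1
        """
    ).fetchall()
    conn.close()
    return result

pair_rows = [(123, 234, 5), (345, 567, 22), (567, 345, 22), (890, 345, 6)]
```

Because of ORDER BY item_1 inside the window, the row kept for a mirrored pair is the one whose item_1 is smaller, here (345, 567, 22).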

TSQL Sweepstakes Script

I need to run a sweepstakes script to get X amount of winners from a customers table. Each customer has N participations. The table looks like this
CUSTOMER-A 5
CUSTOMER-B 8
CUSTOMER-C 1
I can always script to have CUSTOMER-A,B and C inserted 5, 8 and 1 times respectively in a temp table and then select randomly using order by newid() but would like to know if there's a more elegant way to address this.
(Update: Added final query.)
(Update2: Added single query to avoid temp table.)
Here's the hard part using a recursive CTE plus the final query that shows "place".
Code
DECLARE @cust TABLE (
CustomerID int IDENTITY,
ParticipationCt int
)
DECLARE @list TABLE (
CustomerID int,
RowNumber int
)
INSERT INTO @cust (ParticipationCt) VALUES (5)
INSERT INTO @cust (ParticipationCt) VALUES (8)
INSERT INTO @cust (ParticipationCt) VALUES (1)
INSERT INTO @cust (ParticipationCt) VALUES (3)
INSERT INTO @cust (ParticipationCt) VALUES (4)
SELECT * FROM @cust
;WITH t AS (
SELECT
lvl = 1,
CustomerID,
ParticipationCt
FROM @Cust
UNION ALL
SELECT
lvl = lvl + 1,
CustomerID,
ParticipationCt
FROM t
WHERE lvl < ParticipationCt
)
INSERT INTO @list (CustomerID, RowNumber)
SELECT
CustomerID,
ROW_NUMBER() OVER (ORDER BY NEWID())
FROM t
--<< All rows
SELECT * FROM @list ORDER BY RowNumber
--<< All customers by "place"
SELECT
CustomerID,
ROW_NUMBER() OVER (ORDER BY MIN(RowNumber)) AS Place
FROM @list
GROUP BY CustomerID
Results
CustomerID ParticipationCt
----------- ---------------
1 5
2 8
3 1
4 3
5 4
CustomerID RowNumber
----------- -----------
4 1
1 2
1 3
2 4
1 5
5 6
2 7
2 8
4 9
2 10
2 11
2 12
1 13
5 14
5 15
3 16
5 17
1 18
2 19
2 20
4 21
CustomerID Place
----------- -----
4 1
1 2
2 3
5 4
3 5
Single Query with No Temp Table
It is possible to get the answer with a single query that does not use a temp table. This works fine, but I personally like the temp table version better so you can validate the interim results.
Code (Single Query)
;WITH List AS (
SELECT
lvl = 1,
CustomerID,
ParticipationCt
FROM @Cust
UNION ALL
SELECT
lvl = lvl + 1,
CustomerID,
ParticipationCt
FROM List
WHERE lvl < ParticipationCt
),
RandomOrder AS (
SELECT
CustomerID,
ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNumber
FROM List
)
SELECT
CustomerID,
ROW_NUMBER() OVER (ORDER BY MIN(RowNumber)) AS Place
FROM RandomOrder
GROUP BY CustomerID
try this:
Select Top X CustomerId
From (Select CustomerId,
Rand(CustomerId) *
Count(*) /
(Select Count(*)
From Table) Sort
From Table
Group By CustomerId) Z
Order By Sort Desc
EDIT: The above assumed multiple rows per customer, one row per participation. Sorry, the following assumes one row per customer, with a column Participations holding the number of participations for that customer.
Select Top 23 CustomerId
From ( Select CustomerId,
Participations - RAND(CustomerId) *
(Select SUM(Participations ) From customers) sort
from customers) Z
Order By sort desc
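For comparison, here is a sketch of the ticket-expansion approach (one virtual row per participation, shuffled, first appearance wins) wrapped in Python's sqlite3 module so it can be run without SQL Server. SQLite's random() stands in for NEWID(), and the function name draw_winners and the CTE names tickets and shuffled are made up for the example.

```python
import sqlite3

def draw_winners(customers, n_winners):
    """Draw n_winners distinct customers, weighted by their participations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (CustomerID TEXT PRIMARY KEY, Participations INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    rows = conn.execute(
        """
        WITH RECURSIVE tickets(CustomerID, Participations, lvl) AS (
            -- one ticket per participation, generated recursively
            SELECT CustomerID, Participations, 1
            FROM customers
            WHERE Participations >= 1
            UNION ALL
            SELECT CustomerID, Participations, lvl + 1
            FROM tickets
            WHERE lvl < Participations
        ),
        shuffled AS (
            SELECT CustomerID,
                   ROW_NUMBER() OVER (ORDER BY random()) AS draw_position
            FROM tickets
        )
        SELECT CustomerID, MIN(draw_position) AS first_draw
        FROM shuffled
        GROUP BY CustomerID
        ORDER BY first_draw
        LIMIT ?
        """,
        (n_winners,),
    ).fetchall()
    conn.close()
    return [customer_id for customer_id, _ in rows]

entries = [("CUSTOMER-A", 5), ("CUSTOMER-B", 8), ("CUSTOMER-C", 1)]
```

Each customer's chance of being drawn first is proportional to their ticket count, and grouping by customer guarantees that nobody wins twice.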