Complex grouping - design / performance problem - sql

WARNING : This is one BIG Question
I have a design problem that started simple, but in one step of growth has stumped me completely.
The simple version of reality has a nice flat fact table...
All names have been changed to protect the innocent
CREATE TABLE raw_data (
tier0_id INT, tier1_id INT, tier2_id INT, tier3_id INT,
metric0 INT, metric1 INT, metric2 INT, metric3 INT
)
The tierIDs relate to entities in a fixed depth tree. Such as a business hierarchy.
The metrics are just performance figures, such as number of frogs captured, or pigeons released.
In the reporting the kindly user would make selections to mean something like the following:
tier0_id's 34 and 55 - shown separately
all of tier1_id's - grouped together
all of tier2_id's - grouped together
all of tier3_id's - shown separately
metrics 2 and 3
This gives me the following type of query:
SELECT
CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END AS tier0_id,
CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END AS tier1_id,
CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END AS tier2_id,
CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END AS tier3_id,
SUM(metric2) AS metric2, SUM(metric3) AS metric3
FROM
raw_data
INNER JOIN tier0_values ON tier0_values.id = raw_data.tier0_id OR tier0_values.id IS NULL
INNER JOIN tier1_values ON tier1_values.id = raw_data.tier1_id OR tier1_values.id IS NULL
INNER JOIN tier2_values ON tier2_values.id = raw_data.tier2_id OR tier2_values.id IS NULL
INNER JOIN tier3_values ON tier3_values.id = raw_data.tier3_id OR tier3_values.id IS NULL
GROUP BY
CASE WHEN @t0_grouping = 1 THEN NULL ELSE tier0_id END,
CASE WHEN @t1_grouping = 1 THEN NULL ELSE tier1_id END,
CASE WHEN @t2_grouping = 1 THEN NULL ELSE tier2_id END,
CASE WHEN @t3_grouping = 1 THEN NULL ELSE tier3_id END
It's a nice hybrid of Dynamic SQL, and parametrised queries. And yes, I know, but SQL-CE makes people do strange things. Besides, that can be tidied up as and when the following change gets incorporated...
From now on, we need to be able to include NULLs in the different tiers. This will mean "applies to ALL entities in that tier".
For example, with the following very simplified data:
Activity WorkingTime ActiveTime BusyTime
1 0m 10m 0m
2 0m 15m 0m
3 0m 20m 0m
NULL 60m 0m 45m
WorkingTime never applies to an activity, so all the values go in with a NULL ID. But ActiveTime is about a specific activity, so it goes in with a legitimate ID. BusyTime is also against a NULL activity because it's the cumulation of all the ActiveTime.
If one were to report on this data, the NULL values -always- get included in every row, because the NULL -means- "applies to everything". The data would look like...
Activity WorkingTime ActiveTime BusyTime (BusyOnOtherActivities)
1 60m 10m 45m (45-10 = 35m)
2 60m 15m 45m (45-15 = 30m)
3 60m 20m 45m (45-20 = 25m)
1&2 60m 25m 45m (45-25 = 20m)
1&3 60m 30m 45m (45-30 = 15m)
2&3 60m 35m 45m (45-35 = 10m)
ALL 60m 45m 45m (45-45 = 0m)
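To make the rule concrete, here is a minimal Python sketch of the single-tier case, assuming the four simplified rows above. The `rows` dictionary and the `report` function are illustrative names, not part of the real schema: the selected activities are summed and the NULL row is added exactly once.

```python
# Simplified data from the example: activity id -> (WorkingTime, ActiveTime, BusyTime).
# The None key holds the row that "applies to ALL activities".
rows = {
    1: (0, 10, 0),
    2: (0, 15, 0),
    3: (0, 20, 0),
    None: (60, 0, 45),
}


def report(selected):
    """Sum the selected activities, then add the NULL row exactly once."""
    picked = [rows[a] for a in selected] + [rows[None]]
    working, active, busy = (sum(col) for col in zip(*picked))
    return {
        "WorkingTime": working,
        "ActiveTime": active,
        "BusyTime": busy,
        "BusyOnOtherActivities": busy - active,
    }
```

Calling `report([1, 2])` reproduces the "1&2" row of the table, and `report([1, 2, 3])` reproduces the "ALL" row.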
Hopefully this example makes sense, because it's actually a multi-tiered hierarchy (as per the original example), and in every tier NULLs are allowed. So I'll try an example with 3 tiers...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 0 10 0 0 0
1 4 10 | 0 15 0 0 0
1 5 10 | 0 20 0 0 0
1 NULL 10 | 60 0 45 0 0
2 3 10 | 0 5 0 0 0
2 5 10 | 0 10 0 0 0
2 6 10 | 0 15 0 0 0
2 NULL 10 | 50 0 30 0 0
1 3 11 | 0 7 0 0 0
1 4 11 | 0 8 0 0 0
1 5 11 | 0 9 0 0 0
1 NULL 11 | 30 0 24 0 0
2 3 11 | 0 8 0 0 0
2 5 11 | 0 10 0 0 0
2 6 11 | 0 12 0 0 0
2 NULL 11 | 40 0 30 0 0
NULL NULL 10 | 0 0 0 60 0
NULL NULL 11 | 0 0 0 60 0
NULL NULL NULL | 0 0 0 0 2
This would give many, many possible different output records in the reporting, but here are a few examples...
t0_id | t1_id | t2_id | m1 | m2 | m3 | m4 | m5
1 3 10 | 60 10 45 60 2
1 4 10 | 60 15 45 60 2
1 5 10 | 60 20 45 60 2
2 3 10 | 50 5 30 60 2
2 5 10 | 50 10 30 60 2
2 6 10 | 50 15 30 60 2
1 ALL 10 | 60 45 45 60 2
2 ALL 10 | 50 30 30 60 2
ALL 3 10 | 110 15 75 60 2
ALL 4 10 | 60 15 45 60 2
ALL 5 10 | 110 30 75 60 2
ALL 6 10 | 50 15 30 60 2
ALL 3 ALL | 180 30 129 120 2
ALL 4 ALL | 90 23 69 120 2
ALL 5 ALL | 180 49 129 120 2
ALL 6 ALL | 90 27 60 120 2
ALL ALL 10 | 110 129 129 60 2
ALL ALL 11 | 70 129 129 60 2
ALL ALL ALL | 180 129 129 120 2
1 3&4 ALL | 90 40 69 120 2
ALL 3&4 ALL | 180 53 129 120 2
As messy as this is to explain, it makes complete and logical sense in my head. I understand what is being asked, but for the life of me I can not seem to write a query for this that doesn't take excruciating amounts of time to execute.
So, how would you write such a query, and/or refactor the schema?
I appreciate that people will ask for examples of what I've done so far, but I'm eager to hear other people's uncorrupted ideas and advice first ;)

The problem looks more like a normalization activity. I would start by normalizing the table to something like this (you may need some more identity fields depending on your usage):
CREATE TABLE raw_data (
rawData_ID INT,
Activity_id INT,
metric0 INT)
I'd create a tiering table that looks something like this. (tierplan allows for multiple groupings. If a tier_id has no parent to roll up under, then tierparent_id is NULL. This allows for recursion in the query.)
CREATE TABLE tiers (
tierplan_id INT,
tier_id INT,
tierparent_id INT)
Finally, I'd create a table that relates tiers and Activities something like:
CREATE TABLE ActivTiers (
Activplan_id INT, --id on the table
tierplan_id INT, --tells what tierplan the raw_data falls under
rawdata_id INT) --this allows the ActivityId to be payload instead of identifier.
Queries off of this ought to be "not too difficult."

Related

Properly 'Joining' two Cross Applies

I've got a query with three Cross-Applies that gather data from three different tables. The first Cr-Ap assists the 2nd and 3rd Cr-Ap's. It finds the most recent ID of a certain refill for a 'cartridge', the higher the ID the more recent the refill.
The second and third Cr-Ap's gather the SUMS of items that have been refilled and items that have been dispensed under the most recent Refill.
If I run the query for Cr-Ap 2 or 3 separately the output would look something like:
ID Amount
1 100
2 1000
3 100
4 0
5 0
etc
Amount would be either the amount of dispensed or refilled items.
Only I don't want to run these queries separately, I want them next to each other.
So what I want is a table that looks like this:
ID Refill Dispense
1 100 1
2 1000 5
3 100 7
4 0 99
5 0 3
etc
My gut tells me to do
INNER JOIN crossapply2 ON crossapply3.ID = crossapply2.ID
But this doesn't work. I'm still new to SQL so I don't exactly know what I can and can't join, what I do know is that you can use crossapply as a join (sorta?). I think that might be what I need to do here, I just don't know how.
But that's not all. There's another complication: there are certain refills where nothing gets dispensed. In these scenarios the CROSS APPLY I wrote for dispenses won't return anything for that refill ID. By nothing I don't mean NULL; I mean it just skips the refill ID. But I'd like to see a 0 in those cases. Because it skips over those IDs, I can't get COALESCE or ISNULL to work. This might also complicate joining the two applies, because an INNER JOIN would skip any line where there is no dispensed amount, even though there is a refilled amount I'd like to see.
Here is my code:
-- Dispensed SUM and Refilled SUM combined
SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id]
,Refills.Refilled
,Dispenses.Dispensed
FROM [CartridgeRefill]
CROSS APPLY(
SELECT MAX([CartridgeRefill].[Id]) AS RecentRefillID
FROM [CartridgeRefill]
GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS RecentRefill
CROSS APPLY(
SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id] AS RefilledID
,SUM([CartridgeRefillMedication].[Amount]) AS Refilled
FROM [CartridgeRefillMedication]
INNER JOIN [CartridgeRefill] ON [CartridgeRefillMedication].[FK_CartridgeRefill_Id] = [CartridgeRefill].[Id]
WHERE [CartridgeRefillMedication].[FK_CartridgeRefill_Id] = RecentRefill.RecentRefillID
GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS Refills
CROSS APPLY(
SELECT [CartridgeRefill].[FK_CartridgeRegistration_Id] AS DispensedID
,SUM([CartridgeDispenseAttempt].[Amount]) AS Dispensed
FROM [CartridgeDispenseAttempt]
INNER JOIN [CartridgeRefill] ON [CartridgeDispenseAttempt].[FK_CartridgeRefill_Id] = [CartridgeRefill].[Id]
WHERE [CartridgeDispenseAttempt].[FK_CartridgeRefill_Id] = RecentRefill.RecentRefillID
GROUP BY [CartridgeRefill].[FK_CartridgeRegistration_Id]
) AS Dispenses
GO
The output of this code is as follows:
1 300 1
1 300 1
1 200 194
1 200 194
1 200 8
1 200 8
1 0 39
1 0 39
1 100 14
1 100 14
1 200 1
1 200 1
1 0 28
1 0 28
1 1000 102
1 1000 102
1 1000 557
1 1000 557
1 2000 92
1 2000 92
1 100 75
1 100 75
1 100 100
1 100 100
1 100 51
1 100 51
1 600 28
1 600 28
1 200 47
1 200 47
1 200 152
1 200 152
1 234 26
1 234 26
1 0 227
1 0 227
1 10 6
1 10 6
1 300 86
1 300 86
1 0 194
1 0 194
1 500 18
1 500 18
1 1000 51
1 1000 51
1 1000 56
1 1000 56
1 500 48
1 500 48
1 0 10
1 0 10
1 1500 111
1 1500 111
1 56 79
1 56 79
1 100 6
1 100 6
1 44 134
1 44 134
1 1000 488
1 1000 488
1 100 32
1 100 32
1 100 178
1 100 178
1 500 672
1 500 672
1 200 26
1 200 26
1 500 373
1 500 373
1 100 10
1 100 10
1 900 28
1 900 28
2 900 28
2 900 28
2 900 28
etc
It is total nonsense that I can't do much with, it goes on for about 20k lines and goes through all the ID's, eventually.
Any help is more than appreciated :)
This looks a bit overcomplicated.
Try:
WITH cr AS (
SELECT [FK_CartridgeRegistration_Id]
,MAX([CartridgeRefill].[Id]) RecentRefillID
FROM [CartridgeRefill]
GROUP BY [FK_CartridgeRegistration_Id]
)
SELECT cr.[FK_CartridgeRegistration_Id], Refills.Refilled, Dispenses.Dispensed
FROM cr
CROSS APPLY(
SELECT SUM(crm.[Amount]) AS Refilled
FROM [CartridgeRefillMedication] crm
WHERE crm.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Refills
CROSS APPLY(
SELECT SUM(cda.[Amount]) AS Dispensed
FROM [CartridgeDispenseAttempt] cda
WHERE cda.[FK_CartridgeRefill_Id] = cr.RecentRefillID
) AS Dispenses;
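One detail worth spelling out: an aggregate such as SUM with no GROUP BY always returns exactly one row, with NULL when nothing matches, so a refill with no dispenses is no longer skipped and the NULL can be turned into 0 with COALESCE or ISNULL. The sketch below illustrates that idea under loud assumptions: it runs on SQLite from Python, SQLite has no CROSS APPLY so correlated scalar subqueries stand in for it, and the table rows are made up for the demonstration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CartridgeRefill (Id INTEGER PRIMARY KEY, FK_CartridgeRegistration_Id INT);
CREATE TABLE CartridgeRefillMedication (FK_CartridgeRefill_Id INT, Amount INT);
CREATE TABLE CartridgeDispenseAttempt (FK_CartridgeRefill_Id INT, Amount INT);
-- Made-up rows: cartridge 1 has refills 1 and 3, cartridge 2 has refill 2.
INSERT INTO CartridgeRefill VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO CartridgeRefillMedication VALUES (1, 50), (2, 1000), (3, 60), (3, 40);
-- Nothing was dispensed under refill 2.
INSERT INTO CartridgeDispenseAttempt VALUES (1, 9), (3, 1);
""")

query = """
WITH cr AS (
    SELECT FK_CartridgeRegistration_Id AS CartridgeId, MAX(Id) AS RecentRefillID
    FROM CartridgeRefill
    GROUP BY FK_CartridgeRegistration_Id
)
SELECT cr.CartridgeId,
       -- A scalar SUM yields NULL when no rows match; COALESCE maps that to 0.
       COALESCE((SELECT SUM(crm.Amount) FROM CartridgeRefillMedication crm
                 WHERE crm.FK_CartridgeRefill_Id = cr.RecentRefillID), 0) AS Refilled,
       COALESCE((SELECT SUM(cda.Amount) FROM CartridgeDispenseAttempt cda
                 WHERE cda.FK_CartridgeRefill_Id = cr.RecentRefillID), 0) AS Dispensed
FROM cr
ORDER BY cr.CartridgeId
"""
result = con.execute(query).fetchall()
print(result)
```

With the made-up rows, cartridge 2 still appears in the result with a dispensed amount of 0 instead of disappearing.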

How to write the query to make report by month in sql

I have the receiving and sending data for a whole year, and I want to build a monthly report based on that data using a first-in, first-out rule. That means the first quantity received is the first to be sent out ...
DECLARE @ReceivingTbl AS TABLE(Id INT,ProId int, RecQty INT,ReceivingDate DateTime)
INSERT INTO @ReceivingTbl
VALUES (1,1001,210,'2019-03-12'),
(2,1001,315,'2019-06-15'),
(3,2001,500,'2019-04-01'),
(4,2001,10,'2019-06-15'),
(5,1001,105,'2019-07-10')
DECLARE @SendTbl AS TABLE(Id INT,ProId int, SentQty INT,SendMonth int)
INSERT INTO @SendTbl
VALUES (1,1001,50,3),
(2,1001,100,4),
(3,1001,80,5),
(4,1001,80,6),
(5,2001,200,6)
SELECT * FROM @ReceivingTbl ORDER BY ProId,ReceivingDate
SELECT * FROM @SendTbl ORDER BY ProId,SendMonth
Id ProId RecQty ReceivingDate
1 1001 210 2019-03-12
2 1001 315 2019-06-15
5 1001 105 2019-07-10
3 2001 500 2019-04-01
4 2001 10 2019-06-15
Id ProId SentQty SendMonth
1 1001 50 3
2 1001 100 4
3 1001 80 5
4 1001 80 6
5 2001 200 6
--- And the below is what i want:
Id ProId RecQty ReceivingDate ... Mar Apr May Jun
1 1001 210 2019-03-12 ... 50 100 60 0
2 1001 315 2019-06-15 ... 0 0 20 80
5 1001 105 2019-07-10 ... 0 0 0 0
3 2001 500 2019-04-01 ... 0 0 0 200
4 2001 10 2019-06-15 ... 0 0 0 0
Thanks!
Your question is not clear to me.
If you want a purely FIFO approach that ignores any other data the table contains, you necessarily need to order by Id, which your example provides and which looks like it follows the order of insertion.
The first line inserted should also be the first line appearing in the select (FIFO). To do so, use:
ORDER BY Id ASC
This places the lowest Id values first (1, 2, 3, ...).
To me, though, this doesn't make much sense, so pay attention to the meaning of the data you actually have: leverage dates like ReceivingDate and order by that, perhaps also filtering by the month of the date. Below is an example for January data:
WHERE MONTH(ReceivingDate) = 1
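For the first-in, first-out allocation the question actually asks for, a procedural sketch may make the rule clearer than ordering alone. The Python below is only an illustration under stated assumptions: `fifo_allocate` is a hypothetical helper, receipts are consumed oldest first per product, and, like the expected output in the question, it does not check whether a receipt had already arrived in the month it is charged to. Sent quantities that exceed the stock are silently dropped.

```python
from collections import defaultdict

# Sample data from the question:
# (Id, ProId, RecQty, ReceivingDate) and (Id, ProId, SentQty, SendMonth).
receipts = [
    (1, 1001, 210, "2019-03-12"),
    (2, 1001, 315, "2019-06-15"),
    (3, 2001, 500, "2019-04-01"),
    (4, 2001, 10, "2019-06-15"),
    (5, 1001, 105, "2019-07-10"),
]
sends = [(1, 1001, 50, 3), (2, 1001, 100, 4), (3, 1001, 80, 5),
         (4, 1001, 80, 6), (5, 2001, 200, 6)]


def fifo_allocate(receipts, sends):
    """Return {receipt Id: {month: qty}} by consuming the oldest receipt first."""
    allocation = {r[0]: defaultdict(int) for r in receipts}
    for product in sorted({r[1] for r in receipts}):
        # Queue of [receipt Id, remaining qty], oldest receiving date first.
        queue = [[r[0], r[2]]
                 for r in sorted(receipts, key=lambda r: r[3]) if r[1] == product]
        for _, pid, sent, month in sorted(sends, key=lambda s: s[3]):
            if pid != product:
                continue
            for slot in queue:
                take = min(slot[1], sent)
                if take:
                    allocation[slot[0]][month] += take
                    slot[1] -= take
                    sent -= take
                if sent == 0:
                    break
    return {rid: dict(months) for rid, months in allocation.items()}


report = fifo_allocate(receipts, sends)
```

For the sample data, `report[1]` holds 50, 100 and 60 for months 3, 4 and 5, and `report[2]` holds 20 and 80 for months 5 and 6, which matches the desired table.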

return the count of row even if null sql server

I'm trying to write a SQL query to get the shift count for each user.
I used this query :
SELECT
COUNT(s.id) AS count, s.user_id
FROM
sarcuser AS u
INNER JOIN
sarcshiftpointuser AS s ON s.user_id = u.id
INNER JOIN
sarcalllevel AS l ON l.id = u.levelid
INNER JOIN
sarcshiftpointtable AS t ON t.shift_id = s.shift_id AND s.table_id = t.table_id
WHERE
(s.shift_id + '' LIKE '2')
AND (CAST(s.xdate AS DATE) BETWEEN CAST(N'2014-01-01' AS DATE) AND CAST(N'2015-01-01' AS DATE))
AND (u.gender + '' LIKE N'%')
AND (u.levelid + '' LIKE N'%')
AND (s.point_id + '' LIKE '2')
GROUP BY
s.user_id
ORDER BY
count
It works very well, but there is a logic problem:
when a user doesn't appear in a shift, no count is returned for that user, and I need it to return 0.
For example :
user1 user2
shift1 2 2
shift2 5 0
shift3 6 10
but actually the code returns :
user1 user2
shift1 2 2
shift2 5 10
shift3 6
and that's wrong. How can I return the count even if it is zero, with this condition and this inner join?
Sample for data in table :
sarcuser :
id firstname lastname gender levelid
52 samy sammour male 1
62 ibrahim jackob male 1
71 rebeca janson female 3
sarcalllevel :
id name
1 field leader
2 leader
3 paramdic
sarcshiftpointtable :
id shift_id table_id name_of_shift point_id
1 1 1 shift1 2
2 2 1 shift2 2
3 3 1 shift3 2
4 1 2 shift1 7
5 2 2 shift2 7
6 3 2 shift3 7
sarcshiftpointuser :
id point_id shift_id table_id user_id xdate
1 2 1 1 62 2014-01-05
2 2 1 1 0 2014-01-05
3 2 1 1 71 2014-01-05
4 2 2 1 0 2014-01-05
5 2 2 1 0 2014-01-05
6 2 2 1 52 2014-01-05
7 2 3 1 52 2014-01-05
8 2 3 1 62 2014-01-05
9 2 3 1 71 2014-01-05
10 2 1 1 71 2014-01-06
11 2 1 1 52 2014-01-06
12 2 1 1 0 2014-01-06
13 2 2 1 62 2014-01-06
14 2 2 1 0 2014-01-06
15 2 2 1 52 2014-01-06
16 2 3 1 62 2014-01-06
17 2 3 1 52 2014-01-06
18 2 3 1 71 2014-01-06
If I apply this query 3 times, changing the shift each time, it should return:
52 62 71
shift1 1 2 2
shift2 2 1 0
shift3 2 2 2
In shift2, user 71 does not appear in sarcshiftpointuser,
so when I run the code it returns just two fields, not three; the count 0 is not returned:
52 62 71
shift2 2 1
To be more specific:
I need to export this table to Excel, so when the 0 is not returned I get the wrong order and (logically) wrong values.
You will need to use a nested query with ISNULL (IFNULL is the MySQL name; SQL Server uses ISNULL or COALESCE).
Take a look at this:
http://www.w3schools.com/sql/sql_isnull.asp
Something like:
ISNULL(user, 0)
I think you are referring to a crosstab query. You can use PIVOT to return your result set. Please refer to the link below.
Sql Server 2008 Cross Tab Query.
If you give a few rows of sample data for the sarcuser, sarcshiftpointuser, sarcalllevel and sarcshiftpointtable tables, we can give you a better answer.
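Another way to get the zero rows, sketched here as an assumption rather than as what either answer proposes, is to start from the user table and LEFT JOIN the shift rows, moving the shift filters into the ON clause so that users without a match survive with a count of 0. The sketch runs on SQLite from Python for convenience and uses only the shift 2 rows of the sample data with trimmed-down columns; the same LEFT JOIN shape should carry over to SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sarcuser (id INT PRIMARY KEY, firstname TEXT);
CREATE TABLE sarcshiftpointuser (
    id INT PRIMARY KEY, point_id INT, shift_id INT, user_id INT, xdate TEXT);
INSERT INTO sarcuser VALUES (52, 'samy'), (62, 'ibrahim'), (71, 'rebeca');
-- Only the shift 2 rows of the sample data; user_id 0 matches no user.
INSERT INTO sarcshiftpointuser VALUES
    (4, 2, 2, 0, '2014-01-05'), (5, 2, 2, 0, '2014-01-05'), (6, 2, 2, 52, '2014-01-05'),
    (13, 2, 2, 62, '2014-01-06'), (14, 2, 2, 0, '2014-01-06'), (15, 2, 2, 52, '2014-01-06');
""")

counts = con.execute("""
    SELECT u.id AS user_id, COUNT(s.id) AS shift_count
    FROM sarcuser AS u
    LEFT JOIN sarcshiftpointuser AS s
           ON s.user_id = u.id
          AND s.shift_id = 2          -- filters live in ON, not WHERE,
          AND s.point_id = 2          -- so unmatched users are kept
          AND s.xdate BETWEEN '2014-01-01' AND '2015-01-01'
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
print(counts)
```

COUNT(s.id) counts only non-NULL values, so user 71 comes back with 0 rather than being dropped.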

SQL terminology to combine a NOT EXIST query with latest value

I am a beginner with basic knowledge.
I have a single table from which I am trying to pull all UIDs that have not had a particular code within the past year.
My table looks like this: (but much larger of course)
FACID DPID EID DID UID DT Code Units Charge ET Ord
1 1 6 2 1002 15-Mar-07 99204 1 180 09:36.7 1
1 1 7 5 10004 15-Mar-07 99213 1 68 02:36.9 1
1 1 24 55 25887 15-Mar-07 99213 1 68 43:55.3 1
1 1 25 2 355688 15-Mar-07 99213 1 68 53:20.2 1
1 1 26 5 555654 15-Mar-07 99213 1 68 42:22.6 1
1 1 27 44 135514 15-Mar-07 99213 1 68 00:36.8 1
1 1 28 2 3244522 15-Mar-07 99214 1 98 34:59.4 1
1 1 29 5 235445 15-Mar-07 99213 1 68 56:42.1 1
1 1 30 3 3214444 15-Mar-07 99213 1 68 54:56.5 1
1 1 33 1 221444 15-Mar-07 99204 1 180 37:44.5 1
I am attempting to use the following, but this is not working for my time frame limits.
select distinct UID from PtProcTbl
where DT<'20120101'
and NOT EXISTS (Select Distinct UID
where Code in ('99203','99204','99205','99213',
'99214','99215','99244','99245'))
I need to know how to make sure the UIDs I am pulling are the ones that don't have a DT after the 1/1/2012 cutoff date with one of the NOT EXISTS codes.
The above query returned UIDs that actually have dates after 1/1/2012 with one of the above codes...
Not sure what I am doing wrong or if I am totally off base on this..
Thanks in advance.
Are you sure you need the NOT EXISTS? How about instead:
AND Code NOT IN ('99203','99204','99205','99213','99214','99215','99244','99245')
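Note that a NOT IN filter on Code is evaluated row by row, so a UID with one qualifying row and one excluded row can still be returned. If the goal is to exclude a UID entirely once it has any listed code after the cutoff, a correlated NOT EXISTS is one way to express it. The sketch below is an illustration only: it runs on SQLite from Python, assumes DT is stored as ISO-formatted text, and uses made-up rows (code '12345' is invented for the demonstration).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PtProcTbl (UID INT, DT TEXT, Code TEXT);
INSERT INTO PtProcTbl VALUES
    (1002, '2007-03-15', '99204'),   -- old visit only: should be returned
    (10004, '2007-03-15', '99213'),
    (10004, '2012-05-01', '99213'),  -- listed code after the cutoff: excluded
    (25887, '2007-03-15', '99213'),
    (25887, '2012-06-01', '12345');  -- recent row, but not a listed code: returned
""")

uids = [row[0] for row in con.execute("""
    SELECT DISTINCT p.UID
    FROM PtProcTbl AS p
    WHERE NOT EXISTS (
        SELECT 1
        FROM PtProcTbl AS q
        WHERE q.UID = p.UID              -- correlate on the same UID
          AND q.DT >= '2012-01-01'
          AND q.Code IN ('99203','99204','99205','99213',
                         '99214','99215','99244','99245')
    )
    ORDER BY p.UID
""")]
print(uids)
```

The subquery in the question has no FROM clause and no link back to the outer row, which is why it did not restrict the result by UID.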

Generate all combinations in SQL

I need to generate all combinations of size @k in a given set of size @n. Can someone please review the following SQL and determine first whether the logic is returning the expected results, and second whether there is a better way?
/*CREATE FUNCTION dbo.Factorial ( @x int )
RETURNS int
AS
BEGIN
DECLARE @value int
IF @x <= 1
SET @value = 1
ELSE
SET @value = @x * dbo.Factorial( @x - 1 )
RETURN @value
END
GO*/
SET NOCOUNT ON;
DECLARE @k int = 5, @n int;
DECLARE @set table ( [value] varchar(24) );
DECLARE @com table ( [index] int );
INSERT @set VALUES ('1'),('2'),('3'),('4'),('5'),('6');
SELECT @n = COUNT(*) FROM @set;
DECLARE @combinations int = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));
PRINT CAST(@combinations as varchar(max)) + ' combinations';
DECLARE @index int = 1;
WHILE @index <= @combinations
BEGIN
INSERT @com VALUES (@index)
SET @index = @index + 1
END;
WITH [set] as (
SELECT
[value],
ROW_NUMBER() OVER ( ORDER BY [value] ) as [index]
FROM @set
)
SELECT
[values].[value],
[index].[index] as [combination]
FROM [set] [values]
CROSS JOIN @com [index]
WHERE ([index].[index] + [values].[index] - 1) % (@n) BETWEEN 1 AND @k
ORDER BY
[index].[index];
Returning Combinations
Using a numbers table or number-generating CTE, select 0 through 2^n - 1. Using the bit positions containing 1s in these numbers to indicate the presence or absence of the relative members in the combination, and eliminating those that don't have the correct number of values, you should be able to return a result set with all the combinations you desire.
WITH Nums (Num) AS (
SELECT Num
FROM Numbers
WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), BaseSet AS (
SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
FROM @set
), Combos AS (
SELECT
ComboID = N.Num,
S.Value,
Cnt = Count(*) OVER (PARTITION BY N.Num)
FROM
Nums N
INNER JOIN BaseSet S ON N.Num & S.ind <> 0
)
SELECT
ComboID,
Value
FROM Combos
WHERE Cnt = @k
ORDER BY ComboID, Value;
This query performs pretty well, but I thought of a way to optimize it, cribbing from the Nifty Parallel Bit Count to first get the right number of items taken at a time. This performs 3 to 3.5 times faster (both CPU and time):
WITH Nums AS (
SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
FROM dbo.Numbers
WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), Nums2 AS (
SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
FROM Nums
), Nums3 AS (
SELECT Num, P3 = (P2 & 0x0f0f0f0f) + ((P2 / 16) & 0x0f0f0f0f)
FROM Nums2
), BaseSet AS (
SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
FROM @set
)
SELECT
ComboID = N.Num,
S.Value
FROM
Nums3 N
INNER JOIN BaseSet S ON N.Num & S.ind <> 0
WHERE P3 % 255 = @k
ORDER BY ComboID, Value;
I went and read the bit-counting page and think that this could perform better if I don't do the % 255 but go all the way with bit arithmetic. When I get a chance I'll try that and see how it stacks up.
My performance claims are based on the queries run without the ORDER BY clause. For clarity, what this code is doing is counting the number of set 1-bits in Num from the Numbers table. That's because the number is being used as a sort of indexer to choose which elements of the set are in the current combination, so the number of 1-bits will be the same.
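For readers who want to check the bit-counting arithmetic outside SQL, here is a small Python sketch of the same P1/P2/P3 steps. The function name is illustrative, and it assumes a non-negative integer below 2^32.

```python
def bit_count_parallel(num):
    """Count the set bits of a non-negative 32-bit integer with the P1/P2/P3 steps."""
    p1 = (num & 0x55555555) + ((num // 2) & 0x55555555)   # counts per 2-bit field
    p2 = (p1 & 0x33333333) + ((p1 // 4) & 0x33333333)     # counts per 4-bit field
    p3 = (p2 & 0x0F0F0F0F) + ((p2 // 16) & 0x0F0F0F0F)    # counts per byte
    # 256 leaves remainder 1 when divided by 255, so the remainder of p3
    # is the sum of its four byte-sized counts (at most 32).
    return p3 % 255
```

The final `% 255` is what adds the four per-byte counts together, which is why the query can compare it directly with the number of items wanted.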
I hope you like it!
For the record, this technique of using the bit pattern of integers to select members of a set is what I've coined the "Vertical Cross Join." It effectively results in the cross join of multiple sets of data, where the number of sets & cross joins is arbitrary. Here, the number of sets is the number of items taken at a time.
Actually cross joining in the usual horizontal sense (of adding more columns to the existing list of columns with each join) would look something like this:
SELECT
A.Value,
B.Value,
C.Value
FROM
@Set A
CROSS JOIN @Set B
CROSS JOIN @Set C
WHERE
A.Value = 'A'
AND B.Value = 'B'
AND C.Value = 'C'
My queries above effectively "cross join" as many times as necessary with only one join. The results are unpivoted compared to actual cross joins, sure, but that's a minor matter.
Critique of Your Code
First, may I suggest this change to your Factorial UDF:
ALTER FUNCTION dbo.Factorial (
@x bigint
)
RETURNS bigint
AS
BEGIN
IF @x <= 1 RETURN 1
RETURN @x * dbo.Factorial(@x - 1)
END
Now you can calculate much larger sets of combinations, plus it's more efficient. You might even consider using decimal(38, 0) to allow larger intermediate calculations in your combination calculations.
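Even with bigint, factorial quotients run out of room quickly: 20! still fits, but 21! is larger than the bigint maximum. A common alternative, shown here only as a sketch and not as part of the original answer, is to build the binomial coefficient step by step so the running value never gets much larger than the final result. `n_choose_k` is an illustrative name.

```python
def n_choose_k(n, k):
    """Binomial coefficient computed multiplicatively, without full factorials."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)          # C(n, k) == C(n, n - k); use the smaller side
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) // i
    return result
```

For example, `n_choose_k(17, 5)` gives 6188 and `n_choose_k(21, 11)` gives 352716, neither of which needs a 21! intermediate.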
Second, your given query does not return the correct results. For example, using my test data from the performance testing below, set 1 is the same as set 18. It looks like your query takes a sliding stripe that wraps around: each set is always 5 adjacent members, looking something like this (I pivoted to make it easier to see):
1 ABCDE
2 ABCD Q
3 ABC PQ
4 AB OPQ
5 A NOPQ
6 MNOPQ
7 LMNOP
8 KLMNO
9 JKLMN
10 IJKLM
11 HIJKL
12 GHIJK
13 FGHIJ
14 EFGHI
15 DEFGH
16 CDEFG
17 BCDEF
18 ABCDE
19 ABCD Q
Compare the pattern from my queries:
31 ABCDE
47 ABCD F
55 ABC EF
59 AB DEF
61 A CDEF
62 BCDEF
79 ABCD G
87 ABC E G
91 AB DE G
93 A CDE G
94 BCDE G
103 ABC FG
107 AB D FG
109 A CD FG
110 BCD FG
115 AB EFG
117 A C EFG
118 BC EFG
121 A DEFG
...
Just to drive the bit-pattern -> index of combination thing home for anyone interested, notice that 31 in binary = 11111 and the pattern is ABCDE. 121 in binary is 1111001 and the pattern is A__DEFG (backwards mapped).
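The mapping from bit pattern to combination can also be sketched in a few lines of Python. This is an illustration of the idea, not a translation of the tuned queries; `combinations_by_bitmask` is a made-up name, and the lowest bit is taken to mean the first item in sort order.

```python
def combinations_by_bitmask(items, k):
    """Yield (combo_id, members) for every k-subset; set bits of combo_id flag membership."""
    ordered = sorted(items)            # plays the role of Row_Number() OVER (ORDER BY Value)
    n = len(ordered)
    for combo_id in range(2 ** n):     # 0 .. 2^n - 1, like the numbers table range
        members = [v for pos, v in enumerate(ordered) if combo_id & (1 << pos)]
        if len(members) == k:          # keep only ids with exactly k set bits
            yield combo_id, members


combos = dict(combinations_by_bitmask("ABCDEFG", 5))
```

Here `combos[31]` is A, B, C, D, E and `combos[121]` is A, D, E, F, G, matching the two rows called out above.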
Performance Results With A Real Numbers Table
I ran some performance testing with big sets on my second query above. I do not have a record at this time of the server version used. Here's my test data:
DECLARE
@k int,
@n int;
DECLARE @set TABLE (value varchar(24));
INSERT @set VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q');
SET @n = @@RowCount;
SET @k = 5;
DECLARE @combinations bigint = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));
SELECT CAST(@combinations as varchar(max)) + ' combinations', MaxNumUsedFromNumbersTable = POWER(2, @n);
Peter showed that this "vertical cross join" doesn't perform as well as simply writing dynamic SQL to actually do the CROSS JOINs it avoids. At the trivial cost of a few more reads, his solution has metrics between 10 and 17 times better. The performance of his query decreases faster than mine as the amount of work increases, but not fast enough to stop anyone from using it.
The second set of numbers below is the factor as divided by the first row in the table, just to show how it scales.
Erik
Items  CPU     Writes  Reads    Duration | CPU    Writes  Reads  Duration
-----  ------  ------  -------  -------- | -----  ------  -----  --------
17•5   7344    0       3861     8531     |
18•9   17141   0       7748     18536    | 2.3            2.0    2.2
20•10  76657   0       34078    84614    | 10.4           8.8    9.9
21•11  163859  0       73426    176969   | 22.3           19.0   20.7
21•20  142172  0       71198    154441   | 19.4           18.4   18.1
Peter
Items  CPU     Writes  Reads    Duration | CPU    Writes  Reads  Duration
-----  ------  ------  -------  -------- | -----  ------  -----  --------
17•5   422     70      10263    794      |
18•9   6046    980     219180   11148    | 14.3   14.0    21.4   14.0
20•10  24422   4126    901172   46106    | 57.9   58.9    87.8   58.1
21•11  58266   8560    2295116  104210   | 138.1  122.3   223.6  131.3
21•20  51391   5       6291273  55169    | 121.8  0.1     613.0  69.5
Extrapolating, eventually my query will be cheaper (though it is from the start in reads), but not for a long time. To use 21 items in the set already requires a numbers table going up to 2097152...
Here is a comment I originally made before realizing that my solution would perform drastically better with an on-the-fly numbers table:
I love single-query solutions to problems like this, but if you're looking for the best performance, an actual cross join is best, unless you start dealing with seriously huge numbers of combinations. But what does anyone want with hundreds of thousands or even millions of rows? Even the growing number of reads doesn't seem too much of a problem, though 6 million is a lot and it's getting bigger fast...
Anyway. Dynamic SQL wins. I still had a beautiful query. :)
Performance Results with an On-The-Fly Numbers Table
When I originally wrote this answer, I said:
Note that you could use an on-the-fly numbers table, but I haven't tried it.
Well, I tried it, and the results were that it performed much better! Here is the query I used:
DECLARE @N int = 16, @K int = 12;
CREATE TABLE #Set (Value char(1) PRIMARY KEY CLUSTERED);
CREATE TABLE #Items (Num int);
INSERT #Items VALUES (@K);
INSERT #Set
SELECT TOP (@N) V
FROM
(VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')) X (V);
GO
DECLARE
@N int = (SELECT Count(*) FROM #Set),
@K int = (SELECT TOP 1 Num FROM #Items);
DECLARE @combination int, @value char(1);
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (
SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
FROM Nums
WHERE Num BETWEEN 0 AND Power(2, @N) - 1
), Nums2 AS (
SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
FROM Nums1
), Nums3 AS (
SELECT Num, P3 = (P2 & 0x0F0F0F0F) + ((P2 / 16) & 0x0F0F0F0F)
FROM Nums2
), BaseSet AS (
SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
FROM #Set
)
SELECT
@Combination = N.Num,
@Value = S.Value
FROM
Nums3 N
INNER JOIN BaseSet S
ON N.Num & S.Ind <> 0
WHERE P3 % 255 = @K;
Note that I selected the values into variables to reduce the time and memory needed to test everything. The server still does all the same work. I modified Peter's version to be similar, and removed unnecessary extras so they were both as lean as possible. The server version used for these tests is Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86) Standard Edition on Windows NT 5.2 <X86> (Build 3790: Service Pack 2) (VM) running on a VM.
Below are charts showing the performance curves for values of N and K up to 21. The base data for them is in another answer on this page. The values are the result of 5 runs of each query at each K and N value, followed by throwing out the best and worst values for each metric and averaging the remaining 3.
Basically, my version has a "shoulder" (in the leftmost corner of the chart) at high values of N and low values of K that make it perform worse there than the dynamic SQL version. However, this stays fairly low and constant, and the central peak around N = 21 and K = 11 is much lower for Duration, CPU, and Reads than the dynamic SQL version.
I included a chart of the number of rows each item is expected to return so you can see how the query performs stacked up against how big a job it has to do.
Please see my additional answer on this page for the complete performance results. I hit the post character limit and could not include it here. (Any ideas where else to put it?) To put things in perspective against my first version's performance results, here's the same format as before:
Erik
Items CPU Duration Reads Writes | CPU Duration Reads
----- ----- -------- ------- ------ | ----- -------- -------
17•5 354 378 12382 0 |
18•9 1849 1893 97246 0 | 5.2 5.0 7.9
20•10 7119 7357 369518 0 | 20.1 19.5 29.8
21•11 13531 13807 705438 0 | 38.2 36.5 57.0
21•20 3234 3295 48 0 | 9.1 8.7 0.0
Peter
Items CPU Duration Reads Writes | CPU Duration Reads
----- ----- -------- ------- ------ | ----- -------- -------
17•5 41 45 6433 0 |
18•9 2051 1522 214021 0 | 50.0 33.8 33.3
20•10 8271 6685 864455 0 | 201.7 148.6 134.4
21•11 18823 15502 2097909 0 | 459.1 344.5 326.1
21•20 25688 17653 4195863 0 | 626.5 392.3 652.2
Conclusions
On-the-fly numbers tables are better than a real table containing rows, since reading one at huge rowcounts requires a lot of I/O. It is better to use a little CPU.
My initial tests weren't broad enough to really show the performance characteristics of the two versions.
Peter's version could be improved by making each JOIN not only be greater than the prior item, but also restrict the maximum value based on how many more items have to be fit into the set. For example, at 21 items taken 21 at a time, there is only one answer of 21 rows (all 21 items, one time), but the intermediate rowsets in the dynamic SQL version, early in the execution plan, contain combinations such as "AU" at step 2 even though this will be discarded at the next join since there is no value higher than "U" available. Similarly, an intermediate rowset at step 5 will contain "ARSTU" but the only valid combo at this point is "ABCDE". This improved version would not have a lower peak at the center, so possibly not improving it enough to become the clear winner, but it would at least become symmetrical so that the charts would not stay maxed past the middle of the region but would fall back to near 0 as my version does (see the top corner of the peaks for each query).
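The upper-limit idea can be sketched procedurally. The Python below is not Peter's dynamic SQL; it is a hypothetical illustration in which each position may only pick an item that still leaves enough later items to finish the combination, so dead-end prefixes such as "AU" are never generated. `bounded_combinations` is an illustrative name.

```python
def bounded_combinations(items, k):
    """Yield k-subsets in sorted order, never starting a prefix that cannot be completed."""
    ordered = sorted(items)
    n = len(ordered)

    def extend(prefix, start):
        depth = len(prefix)
        if depth == k:
            yield tuple(prefix)
            return
        # Highest usable index: k - depth - 1 items must still fit after it.
        last = n - (k - depth)
        for i in range(start, last + 1):
            yield from extend(prefix + [ordered[i]], i + 1)

    yield from extend([], 0)
```

With 21 items taken 21 at a time, the first position can only be the first item, the second only the second, and so on, so exactly one combination is produced without building any partial results that are later thrown away.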
Duration Analysis
There is no really significant difference between the versions in duration (>100ms) until 14 items taken 12 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 43 times.
Starting at 14•12, my version was faster 65 times (59 >100ms), the dynamic SQL version 64 times (60 >100ms). However, all the times my version was faster, it saved a total averaged duration of 256.5 seconds, and when the dynamic SQL version was faster, it saved 80.2 seconds.
The total averaged duration for all trials was Erik 270.3 seconds, Peter 446.2 seconds.
If a lookup table were created to determine which version to use (picking the faster one for the inputs), all the results could be performed in 188.7 seconds. Using the slowest one each time would take 527.7 seconds.
Reads Analysis
The duration analysis showed my query winning by a significant but not overly large amount. When the metric is switched to reads, a very different picture emerges--my query uses on average 1/10th the reads.
There is no really significant difference between the versions in reads (>1000) until 9 items taken 9 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 17 times.
Starting at 9•9, my version used fewer reads 118 times (113 >1000), the dynamic SQL version 69 times (31 >1000). However, all the times my version used fewer reads, it saved a total averaged 75.9M reads, and when the dynamic SQL version was faster, it saved 380K reads.
The total averaged reads for all trials was Erik 8.4M, Peter 84M.
If a lookup table were created to determine which version to use (picking the best one for the inputs), all the results could be performed in 8M reads. Using the worst one each time would take 84.3M reads.
I would be very interested to see the results of an updated dynamic SQL version that puts the extra upper limit on the items chosen at each step as I described above.
Addendum
The following version of my query achieves an improvement of about 2.25% over the performance results listed above. I used MIT's HAKMEM bit-counting method, and added a Convert(int) on the result of row_number() since it returns a bigint. Of course I wish this were the version I had used for all the performance testing, charts, and data above, but it is unlikely I will ever redo it, as it was labor-intensive.
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
-- Nums1: every bitmask from 1 to 2^N - 1
Nums1 AS (
SELECT Convert(int, Num) Num
FROM Nums
WHERE Num BETWEEN 1 AND Power(2, @N) - 1
), Nums2 AS (
-- Nums2 and Nums3: HAKMEM bit count of each mask
SELECT
Num,
P1 = Num - ((Num / 2) & 0xDB6DB6DB) - ((Num / 4) & 0x49249249)
FROM Nums1
),
Nums3 AS (SELECT Num, Bits = ((P1 + P1 / 8) & 0xC71C71C7) % 63 FROM Nums2),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM #Set)
SELECT
N.Num,
S.Value
FROM
Nums3 N
INNER JOIN BaseSet S
ON N.Num & S.Ind <> 0
WHERE
Bits = @K -- keep only masks with exactly K bits set
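To see what the Nums2 and Nums3 steps compute, here is a minimal Python sketch of the same HAKMEM-style bit count, followed by the mask-and-join idea of the whole query. The names `hakmem_bitcount` and `combinations_by_mask` are assumptions for illustration; the masks are the ones used in the query, and the sketch assumes non-negative inputs that fit in 32 bits.

```python
def hakmem_bitcount(num: int) -> int:
    """Count the set bits of a non-negative 32-bit value, HAKMEM style."""
    # Step 1 (Nums2): each 3-bit group ends up holding its own bit count.
    p1 = num - ((num // 2) & 0xDB6DB6DB) - ((num // 4) & 0x49249249)
    # Step 2 (Nums3): add neighbouring groups into 6-bit fields, then sum
    # all the fields at once with a single modulo 63.
    return ((p1 + p1 // 8) & 0xC71C71C7) % 63


def combinations_by_mask(values, k):
    """Yield (mask, value) rows like the query: one row per chosen item."""
    ordered = sorted(values)
    for mask in range(1, 2 ** len(ordered)):
        if hakmem_bitcount(mask) == k:
            for position, value in enumerate(ordered):
                if mask & (1 << position):
                    yield mask, value


for row in combinations_by_mask(["a", "b", "c"], 2):
    print(row)
```

Each distinct mask value identifies one combination, which is why the query returns the mask number alongside every chosen value.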
And I could not resist showing one more version that does a lookup to get the count of bits. It may even be faster than other versions:
DECLARE @BitCounts binary(255) =
0x01010201020203010202030203030401020203020303040203030403040405
+ 0x0102020302030304020303040304040502030304030404050304040504050506
+ 0x0102020302030304020303040304040502030304030404050304040504050506
+ 0x0203030403040405030404050405050603040405040505060405050605060607
+ 0x0102020302030304020303040304040502030304030404050304040504050506
+ 0x0203030403040405030404050405050603040405040505060405050605060607
+ 0x0203030403040405030404050405050603040405040505060405050605060607
+ 0x0304040504050506040505060506060704050506050606070506060706070708;
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (SELECT Convert(int, Num) Num FROM Nums WHERE Num BETWEEN 1 AND Power(2, @N) - 1),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM ComboSet)
SELECT
@Combination = N.Num,
@Value = S.Value
FROM
Nums1 N
INNER JOIN BaseSet S
ON N.Num & S.Ind <> 0
WHERE
-- sum the bit counts of the four bytes of Num, one table lookup per byte
@K =
Convert(int, Substring(@BitCounts, N.Num & 0xFF, 1))
+ Convert(int, Substring(@BitCounts, N.Num / 256 & 0xFF, 1))
+ Convert(int, Substring(@BitCounts, N.Num / 65536 & 0xFF, 1))
+ Convert(int, Substring(@BitCounts, N.Num / 16777216, 1))
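The lookup variant can be sketched in Python as follows. The table is rebuilt here rather than pasted, and `BIT_COUNTS` and `lookup_bitcount` are names assumed for illustration. As in the query, the table holds the counts for byte values 1 through 255. A byte value of 0 has no entry and contributes nothing, which in the T-SQL version appears to rely on Substring returning an empty value at position 0.

```python
# Bit counts for byte values 1..255, the same data as the binary(255) literal.
BIT_COUNTS = bytes(bin(b).count("1") for b in range(1, 256))


def lookup_bitcount(num: int) -> int:
    """Sum the per-byte bit counts of a non-negative 32-bit value."""
    total = 0
    for shift in (0, 8, 16, 24):
        byte = (num >> shift) & 0xFF
        if byte:  # byte 0 has no table entry and adds nothing
            total += BIT_COUNTS[byte - 1]  # the table is 1-based, like Substring
    return total


for n in (1, 255, 256, 0x01010101):
    print(n, lookup_bitcount(n))
```

The trade-off is four table reads per number in place of a handful of arithmetic operations; which one wins is something to measure rather than assume.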
Please forgive this extra answer. I ran into the post character limit in my original answer.
Here are the complete averaged numeric performance results for the charts in my answer.
| Erik | Peter
N K | CPU Duration Reads Writes | CPU Duration Reads Writes
-- -- - ----- -------- ------ ------ - ----- -------- ------- ------
1 1 | 0 0 7 0 | 0 0 7 0
2 1 | 0 0 10 0 | 0 0 7 0
2 2 | 0 0 7 0 | 0 0 11 0
3 1 | 0 0 12 0 | 0 0 7 0
3 2 | 0 0 12 0 | 0 0 13 0
3 3 | 5 0 7 0 | 0 0 19 0
4 1 | 0 0 14 0 | 0 0 7 0
4 2 | 0 0 18 0 | 0 0 15 0
4 3 | 0 0 14 0 | 5 0 27 0
4 4 | 0 0 7 0 | 0 0 35 0
5 1 | 5 0 16 0 | 5 0 7 0
5 2 | 0 0 26 0 | 0 0 17 0
5 3 | 0 0 26 0 | 0 0 37 0
5 4 | 0 0 16 0 | 0 0 57 0
5 5 | 0 0 7 0 | 0 0 67 0
6 1 | 0 0 18 0 | 0 0 7 0
6 2 | 5 0 36 0 | 0 0 19 0
6 3 | 0 0 46 0 | 0 0 49 0
6 4 | 0 0 36 0 | 0 0 89 0
6 5 | 5 0 18 0 | 5 0 119 0
6 6 | 0 0 7 0 | 0 0 131 0
7 1 | 5 0 20 0 | 0 0 7 0
7 2 | 0 0 48 0 | 0 0 21 0
7 3 | 0 0 76 0 | 0 0 63 0
7 4 | 0 0 76 0 | 0 0 133 0
7 5 | 0 1 48 0 | 0 1 203 0
7 6 | 5 0 20 0 | 0 1 245 0
7 7 | 5 0 7 0 | 0 3 259 0
8 1 | 5 2 22 0 | 0 4 7 0
8 2 | 0 1 62 0 | 0 0 23 0
8 3 | 0 1 118 0 | 0 0 79 0
8 4 | 0 1 146 0 | 0 1 191 0
8 5 | 5 3 118 0 | 0 1 331 0
8 6 | 5 1 62 0 | 5 2 443 0
8 7 | 0 0 22 0 | 5 3 499 0
8 8 | 0 0 7 0 | 5 3 515 0
9 1 | 0 2 24 0 | 0 0 7 0
9 2 | 5 3 78 0 | 0 0 25 0
9 3 | 5 3 174 0 | 0 1 97 0
9 4 | 5 5 258 0 | 0 2 265 0
9 5 | 5 7 258 0 | 10 4 517 0
9 6 | 5 5 174 0 | 5 5 769 0
9 7 | 0 3 78 0 | 10 4 937 0
9 8 | 0 0 24 0 | 0 3 1009 0
9 9 | 0 1 7 0 | 0 4 1027 0
10 1 | 10 4 26 0 | 0 0 7 0
10 2 | 5 5 96 0 | 0 0 27 0
10 3 | 5 2 246 0 | 0 0 117 0
10 4 | 10 10 426 0 | 10 4 357 0
10 5 | 15 12 510 0 | 5 8 777 0
10 6 | 15 16 426 0 | 10 9 1281 0
10 7 | 10 4 246 0 | 10 9 1701 0
10 8 | 10 5 96 0 | 10 5 1941 0
10 9 | 5 4 26 0 | 10 7 2031 0
10 10 | 5 0 7 0 | 10 7 2051 0
11 1 | 10 8 28 0 | 0 0 7 0
11 2 | 15 11 116 0 | 0 0 29 0
11 3 | 21 24 336 0 | 10 3 139 0
11 4 | 21 18 666 0 | 5 2 469 0
11 5 | 21 20 930 0 | 5 3 1129 0
11 6 | 26 35 930 0 | 15 12 2053 0
11 7 | 20 14 666 0 | 5 25 2977 0
11 8 | 15 9 336 0 | 20 14 3637 0
11 9 | 10 7 116 0 | 21 27 3967 0
11 10 | 10 8 28 0 | 36 34 4086 0
11 11 | 5 8 7 0 | 15 15 4109 0
12 1 | 16 18 30 0 | 5 0 7 0
12 2 | 31 32 138 0 | 0 0 31 0
12 3 | 31 26 446 0 | 10 2 163 0
12 4 | 47 40 996 0 | 10 7 603 0
12 5 | 47 46 1590 0 | 21 17 1593 0
12 6 | 57 53 1854 0 | 31 30 3177 0
12 7 | 41 39 1590 0 | 31 30 5025 0
12 8 | 41 42 996 0 | 42 43 6609 0
12 9 | 31 26 446 0 | 52 52 7607 0
12 10 | 20 19 138 0 | 57 62 8048 0
12 11 | 15 17 30 0 | 72 64 8181 0
12 12 | 15 10 7 0 | 67 38 8217 0
13 1 | 31 32 32 0 | 0 0 7 0
13 2 | 21 25 162 0 | 0 0 33 0
13 3 | 36 34 578 0 | 5 2 189 0
13 4 | 57 65 1436 0 | 10 5 761 0
13 5 | 41 40 2580 0 | 10 10 2191 0
13 6 | 62 56 3438 0 | 31 32 4765 0
13 7 | 62 62 3438 0 | 57 53 8251 0
13 8 | 52 64 2580 0 | 52 47 11710 0
13 9 | 26 28 1436 0 | 93 96 14311 0
13 10 | 31 29 578 0 | 161 104 15891 0
13 11 | 36 35 162 0 | 129 99 16525 0
13 12 | 21 22 32 0 | 156 96 16383 0
13 13 | 26 30 7 0 | 166 98 16411 0
14 1 | 57 53 34 0 | 0 0 7 0
14 2 | 52 50 188 0 | 0 0 35 0
14 3 | 57 60 734 0 | 10 4 217 0
14 4 | 78 76 2008 0 | 15 8 945 0
14 5 | 99 97 4010 0 | 36 34 2947 0
14 6 | 120 125 6012 0 | 41 47 6951 0
14 7 | 125 119 6870 0 | 93 94 12957 0
14 8 | 135 138 6012 0 | 88 98 19821 0
14 9 | 78 153 4010 0 | 234 156 26099 0
14 10 | 94 92 2008 0 | 229 133 30169 0
14 11 | 83 90 734 0 | 239 136 32237 0
14 12 | 47 46 188 0 | 281 176 33031 0
14 13 | 52 53 34 0 | 260 167 32767 0
14 14 | 46 47 7 0 | 203 149 32797 0
15 1 | 83 83 36 0 | 0 0 7 0
15 2 | 145 139 216 0 | 0 2 37 0
15 3 | 104 98 916 0 | 0 2 247 0
15 4 | 135 135 2736 0 | 15 17 1157 0
15 5 | 94 97 6012 0 | 26 27 3887 0
15 6 | 192 188 10016 0 | 57 53 9893 0
15 7 | 187 192 12876 0 | 73 73 19903 0
15 8 | 286 296 12876 0 | 338 230 33123 0
15 9 | 208 207 10016 0 | 354 223 46063 0
15 10 | 140 143 6012 0 | 443 334 56143 0
15 11 | 88 86 2736 0 | 391 273 62219 0
15 12 | 73 72 916 0 | 432 269 65019 0
15 13 | 109 117 216 0 | 317 210 65999 0
15 14 | 156 187 36 0 | 411 277 66279 0
15 15 | 140 142 7 0 | 354 209 65567 0
16 1 | 281 281 38 0 | 0 0 7 0
16 2 | 141 146 246 0 | 0 0 39 0
16 3 | 208 206 1126 0 | 10 4 279 0
16 4 | 187 189 3646 0 | 15 13 1399 0
16 5 | 234 234 8742 0 | 42 42 5039 0
16 6 | 333 337 16022 0 | 83 85 13775 0
16 7 | 672 742 22886 0 | 395 235 30087 0
16 8 | 510 510 25746 0 | 479 305 53041 0
16 9 | 672 675 22886 0 | 671 489 78855 0
16 10 | 489 492 16022 0 | 859 578 101809 0
16 11 | 250 258 8742 0 | 719 487 117899 0
16 12 | 198 202 3646 0 | 745 483 126709 0
16 13 | 119 119 1126 0 | 770 506 130423 0
16 14 | 291 327 246 0 | 770 531 131617 0
16 15 | 156 156 38 0 | 713 451 131931 0
16 16 | 125 139 7 0 | 895 631 132037 0
17 1 | 406 437 40 0 | 0 0 7 0
17 2 | 307 320 278 0 | 0 0 41 0
17 3 | 281 290 1366 0 | 0 3 313 0
17 4 | 307 317 4766 0 | 31 28 1673 0
17 5 | 354 378 12382 0 | 41 45 6433 0
17 6 | 583 582 24758 0 | 130 127 18809 0
17 7 | 839 859 38902 0 | 693 449 43873 0
17 8 | 1177 1183 48626 0 | 916 679 82847 0
17 9 | 1031 1054 48626 0 | 1270 944 131545 0
17 10 | 828 832 38902 0 | 1469 1105 180243 0
17 11 | 672 668 24758 0 | 1535 1114 219217 0
17 12 | 422 422 12382 0 | 1494 991 244047 0
17 13 | 474 482 4766 0 | 1615 1165 256501 0
17 14 | 599 607 1366 0 | 1500 1042 261339 0
17 15 | 223 218 278 0 | 1401 1065 262777 0
17 16 | 229 228 40 0 | 1390 918 263127 0
17 17 | 541 554 7 0 | 1562 1045 263239 0
18 1 | 401 405 42 0 | 0 0 7 0
18 2 | 401 397 312 0 | 0 0 43 0
18 3 | 458 493 1638 0 | 5 6 349 0
18 4 | 583 581 6126 0 | 16 13 1981 0
18 5 | 697 700 17142 0 | 83 130 8101 0
18 6 | 792 799 37134 0 | 156 162 25237 0
18 7 | 1672 1727 63654 0 | 1098 751 62693 0
18 8 | 1598 1601 87522 0 | 1416 1007 126423 0
18 9 | 1849 1893 97246 0 | 2051 1522 214021 0
18 10 | 1963 2083 87522 0 | 2734 2103 311343 0
18 11 | 1411 1428 63654 0 | 2849 2352 398941 0
18 12 | 1042 1048 37134 0 | 3021 2332 462671 0
18 13 | 942 985 17142 0 | 3036 2314 499881 0
18 14 | 656 666 6126 0 | 3052 2177 517099 0
18 15 | 526 532 1638 0 | 2910 2021 523301 0
18 16 | 614 621 312 0 | 3083 2108 525015 0
18 17 | 536 551 42 0 | 2921 2031 525403 0
18 18 | 682 680 7 0 | 3141 2098 525521 0
19 1 | 885 909 44 0 | 0 0 7 0
19 2 | 1411 1498 348 0 | 0 0 45 0
19 3 | 880 887 1944 0 | 5 4 387 0
19 4 | 1119 1139 7758 0 | 26 25 2325 0
19 5 | 1120 1127 23262 0 | 73 72 10077 0
19 6 | 1395 1462 54270 0 | 453 387 33591 0
19 7 | 1875 1929 100782 0 | 1197 838 87941 0
19 8 | 2656 2723 151170 0 | 2255 1616 188803 0
19 9 | 3046 3092 184762 0 | 3317 2568 340053 0
19 10 | 3635 3803 184762 0 | 5171 4041 524895 0
19 11 | 2739 2774 151170 0 | 5577 4574 709737 0
19 12 | 3203 3348 100782 0 | 6182 5194 860987 0
19 13 | 1672 1750 54270 0 | 6458 5561 961849 0
19 14 | 1760 1835 23262 0 | 6177 4964 1016199 0
19 15 | 968 1006 7758 0 | 6266 4331 1039541 0
19 16 | 1099 1134 1944 0 | 6208 4254 1047379 0
19 17 | 995 1037 348 0 | 6385 4366 1049403 0
19 18 | 916 964 44 0 | 6036 4268 1049831 0
19 19 | 1135 1138 7 0 | 6234 4320 1049955 0
20 1 | 1797 1821 46 0 | 0 0 7 0
20 2 | 2000 2029 386 0 | 0 0 47 0
20 3 | 2031 2071 2286 0 | 10 6 427 0
20 4 | 1942 2036 9696 0 | 31 34 2707 0
20 5 | 2104 2161 31014 0 | 88 85 12397 0
20 6 | 2880 2958 77526 0 | 860 554 43675 0
20 7 | 3791 3940 155046 0 | 2026 1405 121285 0
20 8 | 5130 5307 251946 0 | 3823 2731 276415 0
20 9 | 6547 6845 335926 0 | 5380 4148 528445 0
20 10 | 7119 7357 369518 0 | 8271 6685 864455 0
20 11 | 5692 5803 335926 0 | 9557 8029 1234057 0
20 12 | 4734 4850 251946 0 | 11114 9504 1570067 0
20 13 | 3604 3641 155046 0 | 11551 10434 1822097 0
20 14 | 2911 2999 77526 0 | 12317 10822 1977227 0
20 15 | 2115 2134 31014 0 | 12806 10679 2054837 0
20 16 | 2041 2095 9696 0 | 13062 9115 2085935 0
20 17 | 2390 2465 2286 0 | 12807 9002 2095715 0
20 18 | 1765 1788 386 0 | 12598 8601 2098085 0
20 19 | 2067 2143 46 0 | 12578 8626 2098555 0
20 20 | 1640 1663 7 0 | 12932 9064 2098685 0
21 1 | 3374 3425 48 0 | 0 0 7 0
21 2 | 4031 4157 426 0 | 0 1 49 0
21 3 | 3218 3250 2666 0 | 10 5 469 0
21 4 | 3687 3734 11976 0 | 21 25 3129 0
21 5 | 3692 3735 40704 0 | 115 114 15099 0
21 6 | 4859 4943 108534 0 | 963 661 56079 0
21 7 | 6114 6218 232566 0 | 2620 1880 164701 0
21 8 | 8573 8745 406986 0 | 4999 3693 397355 0
21 9 | 11880 12186 587866 0 | 9047 6863 804429 0
21 10 | 13255 13582 705438 0 | 14358 11436 1392383 0
21 11 | 13531 13807 705438 0 | 18823 15502 2097909 0
21 12 | 12244 12400 587866 0 | 21834 18760 2803435 0
21 13 | 9406 9528 406986 0 | 23771 21274 3391389 0
21 14 | 7114 7180 232566 0 | 26677 24296 3798463 0
21 15 | 4869 4961 108534 0 | 26479 23998 4031117 0
21 16 | 4416 4521 40704 0 | 26536 22976 4139739 0
21 17 | 4380 4443 11976 0 | 26490 19107 4180531 0
21 18 | 3265 3334 2666 0 | 25979 17995 4192595 0
21 19 | 3640 3768 426 0 | 26186 17891 4195349 0
21 20 | 3234 3295 48 0 | 25688 17653 4195863 0
21 21 | 3156 3219 7 0 | 26140 17838 4195999 0
The CUBE extension to a GROUP BY clause produces a grouping set for every combination (subset) of the given column list. E.g., the following will give all 3-combinations of a 4-element set.
select concat(a,b,c,d)
from (select 'a','b','c','d') as t(a,b,c,d)
group by cube(a,b,c,d)
having len(concat(a,b,c,d)) = 3
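For readers unfamiliar with CUBE, the following Python sketch mimics what the query does: it forms one group per subset of the columns (a column left out of a grouping set comes back as NULL, which concat skips) and keeps the groups whose concatenation has length 3. The function name `cube_combinations` is an assumption for illustration, and the sketch models the result, not how SQL Server executes it. Like the query, it relies on every item being a single character.

```python
from itertools import product


def cube_combinations(items, size):
    """Return the concatenations CUBE would yield, filtered by length."""
    results = []
    # One grouping set per include/exclude choice for every column.
    for keep in product((True, False), repeat=len(items)):
        # Excluded columns are NULL in that grouping set; concat ignores them.
        text = "".join(item for item, kept in zip(items, keep) if kept)
        if len(text) == size:
            results.append(text)
    return sorted(results)


print(cube_combinations(["a", "b", "c", "d"], 3))
```

Note that the number of grouping sets doubles with every added column, so this approach suits small sets only.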
How about some dynamic SQL?
DECLARE @k int = 5, @n INT
IF OBJECT_ID('tempdb..#set') IS NOT NULL DROP TABLE #set
CREATE TABLE #set ( [value] varchar(24) )
INSERT #set VALUES ('1'),('2'),('3'),('4'),('5'),('6')
SET @n = @@ROWCOUNT
-- dbo.Factorial is a user-defined helper that is not shown here
SELECT dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k)) AS [expected combinations]
-- let's generate some sql.
DECLARE
@crlf NCHAR(2) = NCHAR(13)+NCHAR(10)
, @sql NVARCHAR(MAX)
, @select NVARCHAR(MAX)
, @from NVARCHAR(MAX)
, @order NVARCHAR(MAX)
, @in NVARCHAR(MAX)
DECLARE @j INT = 0
WHILE @j < @k BEGIN
SET @j += 1
IF @j = 1 BEGIN
SET @select = 'SELECT'+@crlf+' _1.value AS [1]'
SET @from = @crlf+'FROM #set AS _1'
SET @order = 'ORDER BY _1.value'
SET @in = '[1]'
END
ELSE BEGIN
SET @select += @crlf+', _'+CONVERT(VARCHAR,@j)+'.value AS ['+CONVERT(VARCHAR,@j)+']'
SET @from += @crlf+'INNER JOIN #set AS _'+CONVERT(VARCHAR,@j)+' ON _'+CONVERT(VARCHAR,@j)+'.value > _'+CONVERT(VARCHAR,@j-1)+'.value'
SET @order += ', _'+CONVERT(VARCHAR,@j)+'.value'
SET @in += ', ['+CONVERT(VARCHAR,@j)+']'
END
END
SET @select += @crlf+', ROW_NUMBER() OVER ('+@order+') AS combination'
SET @sql = @select + @from
-- let's see how it looks
PRINT @sql
EXEC (@sql)
-- ok, now unpivot and dump into a table for later use
IF OBJECT_ID('tempdb..#combinations') IS NOT NULL DROP TABLE #combinations
CREATE TABLE #combinations (
combination INT
, value VARCHAR(24)
, PRIMARY KEY (combination, value)
)
SET @sql
= 'WITH CTE AS ('+@crlf+@sql+@crlf+')'+@crlf
+ 'INSERT #combinations (combination, value)'+@crlf
+ 'SELECT combination, value FROM CTE a'+@crlf
+ 'UNPIVOT (value FOR position IN ('+@in+')) AS b'
PRINT @sql
EXEC (@sql)
SELECT COUNT(DISTINCT combination) AS [returned combinations] FROM #combinations
SELECT * FROM #combinations
Generates the following query for @k = 5:
SELECT
_1.value AS [1]
, _2.value AS [2]
, _3.value AS [3]
, _4.value AS [4]
, _5.value AS [5]
, ROW_NUMBER() OVER (ORDER BY _1.value, _2.value, _3.value, _4.value, _5.value) AS combination
FROM #set AS _1
INNER JOIN #set AS _2 ON _2.value > _1.value
INNER JOIN #set AS _3 ON _3.value > _2.value
INNER JOIN #set AS _4 ON _4.value > _3.value
INNER JOIN #set AS _5 ON _5.value > _4.value
Which it then unpivots and dumps into a table.
The dynamic SQL is ugly, and you can't wrap it in a UDF, but the query produced is very efficient.
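The query-building loop can be sketched outside T-SQL as well. The Python below builds the same chain of self-joins for a given k and runs it against an in-memory SQLite database to check the row count. SQLite, the table name `set_values`, the column aliases `c1`, `c2`, ... and the function name `build_combination_sql` are all assumptions made for this illustration (SQLite has no `#temp` tables), and the ROW_NUMBER column is omitted to keep the sketch short.

```python
import sqlite3
from math import comb


def build_combination_sql(k: int, table: str = "set_values") -> str:
    """Build a k-way self-join where each alias must exceed the previous one."""
    select = ["_1.value AS c1"]
    joins = [f"FROM {table} AS _1"]
    for j in range(2, k + 1):
        select.append(f"_{j}.value AS c{j}")
        joins.append(
            f"INNER JOIN {table} AS _{j} ON _{j}.value > _{j - 1}.value"
        )
    return "SELECT\n  " + "\n, ".join(select) + "\n" + "\n".join(joins)


# Same six sample values as the script above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE set_values (value TEXT)")
conn.executemany(
    "INSERT INTO set_values VALUES (?)", [(str(i),) for i in range(1, 7)]
)

sql = build_combination_sql(5)
rows = conn.execute(sql).fetchall()
print(sql)
print(len(rows), "rows; expected", comb(6, 5))
```

Because every join condition forces a strictly increasing value, each combination appears exactly once and no DISTINCT is needed.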
First create this UDF...
CREATE FUNCTION [dbo].[_ex_fn_SplitToTable] (@str varchar(5000), @sep char(1) = null)
RETURNS @ReturnVal table (n int, s varchar(5000))
AS
/*
Alpha Test
-----------
select * from [dbo].[_ex_fn_SplitToTable_test01]('abcde','')
*/
BEGIN
declare @str2 varchar(5000)
declare @sep2 char(1)
-- no separator given: split into single characters by inserting commas
if LEN(ISNULL(@sep,'')) = 0
begin
declare @i int
set @i = 0
set @str2 = ''
declare @char varchar(1)
startloop:
set @i += 1
--print @i
set @char = substring(@str,@i,1)
set @str2 = @str2 + @char + ','
if LEN(@str) <= @i
goto exitloop
goto startloop
exitloop:
set @str2 = left(@str2,LEN(@str2) - 1)
set @sep2 = ','
--print @str2
end
else
begin
set @str2 = @str
set @sep2 = @sep
end
-- recursive CTE: walk the string from one separator to the next
;WITH Pieces(n, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep2, @str2)
UNION ALL
SELECT n + 1, stop + 1, CHARINDEX(@sep2, @str2, stop + 1)
FROM Pieces
WHERE stop > 0
)
insert into @ReturnVal(n,s)
SELECT n,
SUBSTRING(@str2, start, CASE WHEN stop > 0 THEN stop-start ELSE 5000 END) AS s
FROM Pieces option (maxrecursion 32767)
RETURN
END
GO
Then create this stored proc...
CREATE proc [CombinationsOfString]
(
@mystring varchar(max) = '0,5,10,15,20,25'
)
as
/*
ALPHA TEST
---------
exec CombinationsOfString '-20,-10,0,10,20'
*/
if object_id('tempdb..#_201606070947_myorig') is not null drop table #_201606070947_myorig
CREATE TABLE #_201606070947_myorig
(
SourceId int not null identity(1,1)
,Element varchar(100) not null
)
insert into #_201606070947_myorig
select s from dbo._ex_fn_SplitToTable(@mystring,',')
--select SourceId, Element from #_201606070947_myorig
declare @mynumerics varchar(max)
set @mynumerics = (
select STUFF(REPLACE((SELECT '#!' + LTRIM(RTRIM(SourceId)) AS 'data()'
FROM #_201606070947_myorig
FOR XML PATH('')),' #!',', '), 1, 2, '') as Brands
)
set @mynumerics = REPLACE(@mynumerics,' ','')
print @mynumerics
if object_id('tempdb..#_201606070947_source') is not null drop table #_201606070947_source
if object_id('tempdb..#_201606070947_numbers') is not null drop table #_201606070947_numbers
if object_id('tempdb..#_201606070947_results') is not null drop table #_201606070947_results
if object_id('tempdb..#_201606070947_processed') is not null drop table #_201606070947_processed
CREATE TABLE #_201606070947_source
(
SourceId int not null identity(1,1)
,Element char(1) not null
)
--declare @mynumerics varchar(max)
--set @mynumerics = '1,2,3,4,5'
insert into #_201606070947_source
select s from dbo._ex_fn_SplitToTable(@mynumerics,',')
-- select * from #_201606070947_source
declare @Length int
set @Length = (select max(SourceId) from #_201606070947_source)
declare @columnstring varchar(max) = (SELECT REPLICATE('c.',@Length))
print @columnstring
declare @subs varchar(max) = (SELECT REPLICATE('substring.',@Length))
print @subs
if object_id('tempdb..#_201606070947_columns') is not null drop table #_201606070947_columns
select s+CONVERT(varchar,dbo.PadLeft(convert(varchar,n),'0',3)) cols
into #_201606070947_columns
from [dbo].[_ex_fn_SplitToTable](@columnstring,'.') where LEN(s) > 0
if object_id('tempdb..#_201606070947_subcolumns') is not null drop table #_201606070947_subcolumns
select s+'(Combo,'+CONVERT(varchar,n)+',1) ' + 'c'+CONVERT(varchar,dbo.PadLeft(convert(varchar,n),'0',3)) cols
into #_201606070947_subcolumns
from [dbo].[_ex_fn_SplitToTable](@subs,'.') where LEN(s) > 0
-- select * from #_201606070947_subcolumns
-- select * from #_201606070947_columns
declare @columns_sql varchar(max)
set @columns_sql =
(
select distinct
stuff((SELECT distinct + cast(cols as varchar(50)) + ' VARCHAR(1), '
FROM (
select cols
from #_201606070947_columns
) t2
--where t2.n = t1.n
FOR XML PATH('')),3,0,'')
from (
select cols
from #_201606070947_columns
) t1
)
declare @substring_sql varchar(max)
set @substring_sql =
(
select distinct
stuff((SELECT distinct + cast(cols as varchar(100)) + ', '
FROM (
select cols
from #_201606070947_subcolumns
) t2
--where t2.n = t1.n
FOR XML PATH('')),3,0,'')
from (
select cols
from #_201606070947_subcolumns
) t1
)
set @substring_sql = left(@substring_sql,LEN(@substring_sql) - 1)
print @substring_sql
set @columns_sql = LEFT(@columns_sql,LEN(@columns_sql) - 1)
--SELECT @columns_sql
declare @sql varchar(max)
set @sql = 'if object_id(''tempdb..##_201606070947_01'') is not null drop table ##_201606070947_01 create table ##_201606070947_01 (rowid int,' + @columns_sql + ')'
print @sql
execute(@sql)
CREATE TABLE #_201606070947_numbers (Number int not null)
insert into #_201606070947_numbers
select SourceId from #_201606070947_source
CREATE TABLE #_201606070947_results
(
Combo varchar(10) not null
,Length int not null
)
SET NOCOUNT on
DECLARE
@Loop int
,@MaxLoop int
-- How many elements there are to process
SELECT @MaxLoop = max(SourceId)
from #_201606070947_source
-- Initialize first value
TRUNCATE TABLE #_201606070947_results
INSERT #_201606070947_results (Combo, Length)
select Element, 1
from #_201606070947_source
where SourceId = 1
SET @Loop = 2
-- Iterate to add each Element after the first
WHILE @Loop <= @MaxLoop
BEGIN
INSERT #_201606070947_results (Combo, Length)
select distinct
left(re.Combo, @Loop - nm.Number)
+ so.Element
+ RIGHT(re.Combo, nm.Number - 1)
,@Loop
from #_201606070947_results re
inner join #_201606070947_numbers nm
on nm.Number <= @Loop
inner join #_201606070947_source so
on so.SourceId = @Loop
where re.Length = @Loop - 1
SET @Loop = @Loop + 1
END
-- select * from #_201606070947_results
-- Show #_201606070947_results
SELECT *
into #_201606070947_processed
from #_201606070947_results
where Length = @MaxLoop
order by Combo
-- select * from #_201606070947_processed
set @sql = 'if object_id(''tempdb..##_201606070947_02'') is not null drop table ##_201606070947_02 '
print @sql
execute(@sql)
set @sql = ' ' +
' SELECT ROW_NUMBER() OVER(ORDER BY Combo Asc) AS RowID,' + @substring_sql +
' into ##_201606070947_02 ' +
' FROM #_201606070947_processed ' +
' '
PRINT @sql
execute(@sql)
declare @columns_sql_new varchar(max)
set @columns_sql_new = REPLACE(@columns_sql,'(1)','(100)')
set @sql = 'if object_id(''tempdb..##_201606070947_03'') is not null drop table ##_201606070947_03 create table ##_201606070947_03 (RowId int,' + @columns_sql_new + ')'
PRINT @sql
execute(@sql)
insert into ##_201606070947_03 (RowId)
select RowId from ##_201606070947_02
--select * from ##_201606070947_03
DECLARE @ColumnId varchar(10)
DECLARE @getColumnId CURSOR
SET @getColumnId = CURSOR FOR
select cols ColumnId from #_201606070947_columns
OPEN @getColumnId
FETCH NEXT
FROM @getColumnId INTO @ColumnId
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @ColumnId
set @sql = ' ' +
' update ##_201606070947_03
set ' + @ColumnId + ' = B.Element
from ##_201606070947_03 A
, (
select A.RowID, B.*
from
(
select * from ##_201606070947_02
) A
,
(
select * from #_201606070947_myorig
) B
where A.' + @ColumnId + ' = B.SourceId
) B
where A.RowId = B.RowId
'
execute(@sql)
print @sql
FETCH NEXT
FROM @getColumnId INTO @ColumnId
END
CLOSE @getColumnId
DEALLOCATE @getColumnId
select * from ##_201606070947_03
I ran across another technique for calculating combinations, and it is so simple. In a public SQL challenge, it also soundly beat my own attempt, which was based on the same bit-pattern technique as my (currently) accepted answer. I admit I didn't spend long on that attempt, and rather than adapting my best solution from here, I wrote it over again.
Anyway, I was sure I had taken myself down a peg: here I was thinking I'd hit on something very cool, only to find out later that my solution was easily surpassed. But to my surprise, when I tried the technique out, it was worse than the best method above. It is here for reference because it is easy to implement and performs reasonably for small jobs, which makes it the better choice unless the job demands the added complexity of something like the bit-pattern technique.
Given a table #Set with one row for each item being selected from, and a variable @K for the number of items taken at a time, here is that method:
WITH Chains AS (
-- anchor: start a chain only at values that leave room for K - 1 larger ones
SELECT
Generation = 1,
Chain = Convert(varchar(1000),'|' + Value + '|'),
Value
FROM
#Set
WHERE
@K > 0
AND value <= ALL (
SELECT TOP (@K) Value
FROM #Set
ORDER BY Value DESC
)
UNION ALL
-- recursive step: extend each chain with every larger value
SELECT
C.Generation + 1,
Convert(varchar(1000), C.Chain + S.Value + '|'),
S.Value
FROM
Chains C
INNER JOIN #Set S
ON C.Value < S.Value
WHERE
C.Generation <= @K
)
SELECT
C.Chain,
S.Value
FROM
Chains C
INNER JOIN #Set S
ON C.Chain LIKE '%|' + S.Value + '|%'
WHERE
Generation = @K;
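The recursive CTE grows chains one value at a time, always appending a value larger than the last one. A minimal Python sketch of the same idea follows. The function name `chains` is an assumption for illustration, distinct input values are assumed, and the anchor pruning mirrors the `value <= ALL (...)` condition, which only lets a chain start at a value that still has enough larger values after it.

```python
def chains(values, k):
    """Return all k-combinations by extending chains with larger values."""
    ordered = sorted(values)
    if k <= 0 or k > len(ordered):
        return []
    # Anchor: a chain may only start at a value no larger than the k-th largest.
    current = [[v] for v in ordered if v <= ordered[-k]]
    # Recursive step: append any value greater than the chain's last value.
    for _ in range(k - 1):
        current = [
            chain + [v] for chain in current for v in ordered if v > chain[-1]
        ]
    return current


for combo in chains(["a", "b", "c", "d"], 2):
    print("|" + "|".join(combo) + "|")
```

Unlike the query, the sketch stops extending once chains reach length k, so no longer chains are built and then filtered out.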