Extracting lines using two criteria - awk

I'm hoping somebody can show me how to do this task.
I think awk might be a good fit for this, but I am a real beginner.
I have a file like the one below (tab-separated; the actual file is much bigger).
The important columns are the second and the ninth (235 and 15 in the first line of the file).
S 235 1365 * 0 * * * 15 1 c81 592
H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1
H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10
H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4
H 235 792 100 + 0 0 180I792M393I 14 1 c407 107
S 236 1365 * 0 * * * 15 1 c474 152
H 236 279 95 + 0 0 765I279M321I 10-1 1 s7689 1
H 236 301 99.7 - 0 0 908I301M156I 15 1 s8443 1
H 236 563 95.2 - 0 0 728I563M74I 17 1 c1725 12
H 236 97 97.9 - 0 0 732I97M536I 17 1 s11472 1
I would like to extract lines by specifying values for the ninth column. The second column acts as a pivot column: lines that share the same value in the second column are treated as a single set of data. A set should be extracted only if every line in it has one of the specified values in the ninth column.
So, for example, if I specify "14" and "15" for the ninth column, the output will be:
S 235 1365 * 0 * * * 15 1 c81 592
H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1
H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10
H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4
H 235 792 100 + 0 0 180I792M393I 14 1 c407 107
The 6th and 8th lines have "15" in their ninth column, but other lines in that set (second column: 236) have values other than "14" or "15", so I do not want to extract any lines from that set.

$ cat tst.awk
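$2 != prevPivot { prtCurrSet() }              # pivot changed: flush the previous set
$9 !~ /^1[45]$/ { isBadSet = 1 }              # any value other than 14 or 15 disqualifies the set
{ currSet = currSet $0 ORS; prevPivot = $2 }  # buffer the current line
END { prtCurrSet() }                          # flush the final set
function prtCurrSet() {
    if ( !isBadSet ) {
        printf "%s", currSet
    }
    currSet = ""
    isBadSet = 0
}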
$ awk -f tst.awk file
S 235 1365 * 0 * * * 15 1 c81 592
H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1
H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10
H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4
H 235 792 100 + 0 0 180I792M393I 14 1 c407 107
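
For anyone who wants to check the grouping logic outside awk, here is a minimal Python sketch of the same idea: collect consecutive lines that share the pivot value, and keep a set only when every line's ninth field is one of the allowed values. The function name `filter_sets` and its parameters are illustrative choices of mine, not part of the answer above. Like the awk script, it assumes that lines with the same pivot value are adjacent in the file.

```python
from itertools import groupby


def filter_sets(lines, allowed, pivot_col=1, value_col=8):
    """Return the lines of every set whose value column is always in `allowed`.

    A "set" is a run of consecutive lines sharing the same pivot column.
    Column indexes are zero-based, so 1 and 8 mean the 2nd and 9th fields.
    """
    # Pair each non-blank line with its whitespace-split fields.
    rows = [(line.split(), line.rstrip("\n")) for line in lines if line.strip()]
    kept = []
    for _, group in groupby(rows, key=lambda row: row[0][pivot_col]):
        group = list(group)
        # Keep the whole set only if no line falls outside the allowed values.
        if all(fields[value_col] in allowed for fields, _ in group):
            kept.extend(text for _, text in group)
    return kept
```

With the sample data and `allowed={"14", "15"}`, this keeps the five lines of set 235 and drops set 236, because set 236 contains the values 10-1 and 17.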

I'm not completely sure about the full requirement, but judging from your expected output, could you please try the following? Note that it hardcodes the pivot value 235 instead of checking each set.
awk '$2 == 235 && ($9 == 14 || $9 == 15)' Input_file
Output will be as follows.
S 235 1365 * 0 * * * 15 1 c81 592
H 235 296 99.7 + 0 0 3I296M1066I 14 1 s15018 1
H 235 719 95.4 + 0 0 174D545M820I 15 1 c2664 10
H 235 764 99.1 + 0 0 55I764M546I 15 1 c6519 4
H 235 792 100 + 0 0 180I792M393I 14 1 c407 107

Short awk expression:
awk '$2==235 && $9~/^1[45]$/' file
$9~/^1[45]$/ - ensures that the 9th field matches either 14 or 15


Parsing Json Data from select query in SQL Server

I have a situation where I have a table that has a single varchar(max) column called dbo.JsonData. It has x number of rows with x number of properties.
How can I create a query that will allow me to turn the result set from a select * query into a row/column result set?
Here is what I have tried:
SELECT *
FROM JSONDATA
FOR JSON Path
But it returns a single row of the json data all in a single column:
JSON_F52E2B61-18A1-11d1-B105-00805F49916B
[{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":17,\"Defrost_Cycles\":0,\"Freeze_Cycles\":2304,\"Float_Switch_Raw_ADC\":1328,\"Bin_status\":2304,\"Line_Voltage\":0,\"ADC_Evaporator_Temperature\":0,\"Mem_Sw\":1280,\"Freeze_Timer\":2560,\"Defrost_Timer\":593,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":1259,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":17.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":11264.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-63.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Bin Full\",\"Compressor_Runtime\":134215680}"},{"Json_Data":"{\"Serial_Number\":\"12345\",\"Gateway_Uptime\":200,\"Defrost_Cycles\":559,\"Freeze_Cycles\":510,\"Float_Switch_Raw_ADC\":106,\"Bin_status\":0,\"Line_Voltage\":119,\"ADC_Evaporator_Temperature\":123,\"Mem_Sw\":113,\"Freeze_Timer\":0,\"Defrost_Timer\":66,\"Water_Flow_Switch\":3328,\"ADC_Mid_Temperature\":2560,\"ADC_Water_Temperature\":0,\"Ambient_Temperature\":71,\"Mid_Temperature\":1259,\"Water_Temperature\":1259,\"Evaporator_Temperature\":54,\"Ambient_Temperature_Off_Board\":0,\"Ambient_Temperature_On_Board\":0,\"Gateway_Info\":\"{\\\"temp_sensor\\\":0.00,\\\"temp_pcb\\\":82.00,\\\"gw_uptime\\\":199.00,\\\"winc_fw\\\":\\\"19.5.4\\\",\\\"gw_fw_version\\\":\\\"0.0.0\\\",\\\"gw_fw_version_git\\\":\\\"2a75f20-dirty\\\",\\\"gw_sn\\\":\\\"328\\\",\\\"heap_free\\\":10984.00,\\\"gw_sig_csq\\\":0.00,\\\"gw_sig_quality\\\":0.00,\\\"wifi_sig_strength\\\":-60.00,\\\"wifi_resets\\\":0.00}\",\"ADC_Ambient_Temperature\":1120,\"Control_State\":\"Defrost\",\"Compressor_Runtime
\":11304}"},{"Json_Data":"{\"Seri...
What am I missing?
I can't specify the columns explicitly because the json strings aren't always the same.
This is what I expect:
Serial_Number Gateway_Uptime Defrost_Cycles Freeze_Cycles Float_Switch_Raw_ADC Bin_status Line_Voltage ADC_Evaporator_Temperature Mem_Sw Freeze_Timer Defrost_Timer Water_Flow_Switch ADC_Mid_Temperature ADC_Water_Temperature Ambient_Temperature Mid_Temperature Water_Temperature Evaporator_Temperature Ambient_Temperature_Off_Board Ambient_Temperature_On_Board ADC_Ambient_Temperature Control_State Compressor_Runtime temp_sensor temp_pcb gw_uptime winc_fw gw_fw_version gw_fw_version_git gw_sn heap_free gw_sig_csq gw_sig_quality wifi_sig_strength wifi_resets LastModifiedDateUTC Defrost_Cycle_time Freeze_Cycle_time
12345 251402 540 494 106 0 98 158 113 221 184 0 0 0 1259 1259 1259 33 0 0 0 Freeze 10833 0 78 251402 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -61 0 2018-03-20 11:15:28.000 0 0
12345 251702 540 494 106 0 98 178 113 517 184 0 0 0 1259 1259 1259 22 0 0 0 Freeze 10838 0 78 251702 19.5.4 0.0.0 2a75f20-dirty 328.00000000 10976 0 0 -62 0 2018-03-20 11:15:42.000 0 0
...
Thank you,
Ron
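
No SQL answer is shown for this question here, so the following is only a hedged illustration of the reshaping being asked for, written in Python rather than T-SQL. It parses each stored string, parses the nested `Gateway_Info` string a second time (in the sample it is JSON encoded inside JSON), and builds the column list from whatever keys appear, since the properties are not always the same. The names `flatten_row` and `to_table` are hypothetical.

```python
import json


def flatten_row(json_text, nested_key="Gateway_Info"):
    """Parse one stored JSON string and merge its nested JSON string into it."""
    record = json.loads(json_text)
    nested = record.pop(nested_key, None)
    if isinstance(nested, str):
        nested = json.loads(nested)  # the nested value is itself JSON text
    if isinstance(nested, dict):
        record.update(nested)
    return record


def to_table(json_texts):
    """Return (columns, rows); columns are discovered in first-seen order."""
    records = [flatten_row(text) for text in json_texts]
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Missing properties become None so every row has the same width.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows
```

Rows that lack a property simply get `None` in that column, which is one way to cope with strings whose properties differ.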

Teradata: how to select the first occurrence

I have a table similar to the sample data below. In this table, some SESS_KEY values are duplicated. I want the rows whose SESS_KEY has no duplicates and, for a SESS_KEY that does have duplicates, only the rows with CALL_TRNSF_FLG set to 1. I have manually created an INCLUDE field to mark the rows that I want (1). How can I achieve this?
Thank you for your help!
Here is the sample data:
INCLUDE SESS_KEY SESS_CALL_ST_DT_TS CONN_ID TLK_DUR HLD_DUR AFT_CALL_WRK_DUR TRNSF_TLK_TM TRNSF_HLD_TM TRNSF_ACW_TM CALL_TRNSF_FLG
0 24067A16-A24A-45BE-E3AA-7E0BFE7ECDA5 7/25/2016 9:07 0141028541267da5 918 57 26 ? ? ? 0
1 24067A16-A24A-45BE-E3AA-7E0BFE7ECDA5 7/25/2016 9:07 0521028304ed75f8 236 0 3 918 57 26 1
0 49FFAB03-C19C-4291-6BAB-267CC95E27CF 7/6/2016 17:25 014102854125f060 278 0 130 ? ? ? 0
1 49FFAB03-C19C-4291-6BAB-267CC95E27CF 7/6/2016 17:25 0521028304e98111 391 0 8 278 0 130 1
0 7CCBBF2F-6FBC-4812-BAB1-4E258B88C20A 7/12/2016 11:34 05200282b0814531 269 0 190 406 0 124 1
1 7CCBBF2F-6FBC-4812-BAB1-4E258B88C20A 7/12/2016 11:34 013b028225ed6484 406 0 124 ? ? ? 0
0 CA32F05E-5C8A-4849-63A4-15B2342081B8 7/6/2016 11:38 02420282b06776f9 256 0 114 297 0 67 1
1 CA32F05E-5C8A-4849-63A4-15B2342081B8 7/6/2016 11:38 014102854125ea06 297 0 67 ? ? ? 0
0 E75EF405-1C0D-45E4-EC97-88D3CD7B5E55 7/5/2016 15:03 1.41E+214 2,691 0 255 ? ? ? 0
1 E75EF405-1C0D-45E4-EC97-88D3CD7B5E55 7/5/2016 15:03 0243028304ee14a5 314 0 9 2,691 0 255 1
1 04F8CC43-710B-4E4D-D8A1-DAC45FB3FF24 7/19/2016 16:49 1.41E+14 123 100 43 ? ? ? 0
1 0AFB6070-9D95-47B0-B0AF-D34ED70FCE8E 7/22/2016 14:20 0243028304f1ffca 335 239 79 ? ? ? 0
1 13581E6A-A568-4993-098C-05233CF293AE 7/15/2016 11:22 014102854126375a 196 150 258 ? ? ? 0
1 1A6AE4BE-1858-4CB3-83B1-CFF7A9E88EF9 7/8/2016 19:09 02420282b068325e 120 0 0 ? ? ? 0
1 24CE6C11-AF85-4770-53B4-FE20200339DF 7/28/2016 12:47 0243028304f3401b 181 0 0 107 0 48 1
1 293F85F4-34BC-44B1-43B5-A6B3B8886FC8 7/1/2016 8:33 0521028304e8778e 70 0 21 149 0 1 1
1 2BD0216A-B3F3-4597-1CBD-095F8D291736 7/7/2016 8:41 0243028304ee83b2 1,037 0 187 ? ? ? 0
1 2C774BE2-5B26-47C0-B69F-69B04A63F879 7/25/2016 18:26 013b028225edd637 1,481 0 110 ? ? ? 0
1 3F43720B-B6AE-4335-4FB5-9275A952989F 7/11/2016 11:08 013b028225ed5830 155 0 0 ? ? ? 0
1 41B056DC-8D3F-425D-BD9E-10A3EB0E944D 7/27/2016 11:13 05200282b084c5d5 34 0 0 ? ? ? 0
1 420483AD-8586-45C7-68AB-675E50EF2B92 7/5/2016 11:03 013b028225ed2765 1,320 0 283 ? ? ? 0
1 43A14051-6EAA-4251-3FA1-F2FBAE6DB643 7/23/2016 12:16 05200282b083f410 359 0 143 ? ? ? 0
1 494F3EA9-EA47-4F7B-C795-61B8B23DA0FA 7/21/2016 9:27 02420282b06ac6c3 0 0 0 ? ? ? 0
1 4D743557-DE09-4007-D58C-EFB09EF6713C 7/29/2016 17:19 05200282b085844a 951 361 240 ? ? ? 0
1 546C0FD0-5445-44F8-0789-1FA62BB57CDB 7/15/2016 18:14 1.41E+59 686 0 60 ? ? ? 0
1 5487C587-D37C-4E5C-9A88-87A3978996CD 7/28/2016 18:51 014102854126a96d 833 0 534 ? ? ? 0
1 5AB8D65A-28C7-4CAD-5796-3A7B720A47F7 7/20/2016 8:56 0141028541265a9f 274 111 381 ? ? ? 0
1 6866B3F8-F953-43BF-9089-B1FE699DEE07 7/19/2016 16:25 05200282b0830349 35 0 180 ? ? ? 0
1 6A4566B3-71B9-47BB-75BC-37B6E644D704 7/19/2016 10:14 02420282b06a3d7b 0 0 0 ? ? ? 0
1 72D17A78-FA5D-42DA-E39A-F7B950C15E22 7/5/2016 18:05 02420282b0675679 606 0 167 ? ? ? 0
1 73657A2A-34B7-4921-E691-49827E46128D 7/20/2016 11:02 02420282b06a8ae8 31 0 264 ? ? ? 0
1 7520F825-DA7B-4D5F-7AA9-3ADD9AAC5BE7 7/5/2016 18:53 05200282b07fd5df 354 0 20 ? ? ? 0
1 76DA5FB6-3EDD-45E1-B8BB-C70EA1CB4E53 7/1/2016 10:07 0243028304ed7c74 132 0 20 105 0 66 1
1 810B9E66-AA32-4BB0-128D-8E3FFC86EB0E 7/22/2016 13:37 013b028225edc13d 1,621 109 34 ? ? ? 0
1 81402352-DE71-45E4-4EAD-C1FFE20F8288 7/11/2016 9:28 0521028304ea456a 38 0 0 71 0 0 1
1 81EA3AD7-B721-4718-9AB0-6FB005252F64 7/12/2016 17:15 013b028225ed6ad5 812 129 60 ? ? ? 0
1 870632C0-4D80-41DC-AD84-12972DBC5AF2 7/23/2016 14:20 0243028304f229ee 1,084 0 5 ? ? ? 0
1 886919E7-80DB-4E2C-D5B2-8B83420F4D27 7/26/2016 19:22 0243028304f2da42 533 465 155 ? ? ? 0
1 8A18B8A2-1405-446B-71BA-A3FBAC816C12 7/8/2016 16:13 013b028225ed4f72 318 237 0 ? ? ? 0
1 8A54DAD7-2745-4BFB-22BE-BF479C1A8710 7/7/2016 15:25 02420282b067da6c 42 0 94 104 0 38 1
1 8D5EB433-2D50-4A67-00AC-E768A549B56E 7/26/2016 14:35 0521028304edf692 55 0 0 ? ? ? 0
1 8F222904-EC4E-4395-D496-A25FB408AD95 7/29/2016 17:09 0243028304f3a5f0 88 0 137 ? ? ? 0
1 9310922F-D545-4E78-42B2-E1B508F5A436 7/7/2016 12:23 02420282b067c625 155 2 15 ? ? ? 1
1 A605BF7A-50E6-4114-1981-7B3988079B7E 7/6/2016 16:56 02420282b0679dfa 89 0 293 ? ? ? 0
1 AA23384F-C4DA-4357-3DAF-7CD8337831DD 7/9/2016 11:20 014102854126082a 138 0 210 ? ? ? 0
1 AF5AD7E2-7584-4ACD-B28E-1AB2DB87BDAA 7/21/2016 17:36 0243028304f1cda6 0 0 0 ? ? ? 0
1 B66D3851-83BE-4E0E-7D9B-1719E378905D 7/19/2016 12:41 0243028304f122cd 81 0 0 ? ? ? 0
1 BB2FA3CD-AB6D-42BD-3CB7-EEC27E3403BF 7/15/2016 14:38 0243028304f0753e 65 0 195 92 3 29 1
1 BBA4031A-7876-4614-F9BC-718A6D8A16A7 7/13/2016 17:42 0521028304eb1a47 163 0 85 ? ? ? 0
1 BCF2B7D8-CBD0-497F-EEA7-FDEC46EFEEBE 7/7/2016 12:09 0521028304e9acaa 44 0 8 ? ? ? 0
1 BE9386B6-424E-40F9-67A1-A56EF6C18B77 7/27/2016 20:03 013b028225edecc5 1 0 0 ? ? ? 0
1 C0F0EF71-F52B-4D10-E9B4-DA1AF4343CC7 7/11/2016 15:21 05200282b08111ee 49 0 61 368 0 597 1
1 C84FCA28-2372-4F8B-52B1-4BC5E9AD128B 7/19/2016 13:06 013b028225eda06c 59 0 32 162 0 0 1
1 C8B3CC50-DEC3-4F24-D0A2-E32A03AFA786 7/13/2016 13:22 05200282b0819c4d 126 0 0 ? ? ? 0
1 CC119F61-A70F-4DB3-7C8C-DCE9A1C3BCB5 7/27/2016 9:48 02420282b06c1330 0 0 0 ? ? ? 0
1 D57D43C7-F9F0-42B9-C6B6-23B1414D9F12 7/14/2016 15:04 05200282b081ee2a 36 0 17 ? ? ? 0
1 E438B480-8F98-469C-3899-E6F10DD1F755 7/5/2016 20:12 02420282b0675cea 3,874 163 7 ? ? ? 0
1 F223F1F4-F50D-41F6-DA9F-46EA2972F394 7/27/2016 20:13 05200282b084f966 417 0 6 ? ? ? 0
1 FB3B0CB1-89D8-4B47-E4BA-465E57D52B0D 7/14/2016 14:21 02420282b0695b07 138 0 2 ? ? ? 0
SELECT *
FROM tab
QUALIFY
    -- rows that do not have duplicates
    COUNT(*)
    OVER (PARTITION BY SESS_KEY) = 1
    -- the ones with CALL_TRNSF_FLG set to 1
    OR CALL_TRNSF_FLG = 1
If there might be multiple rows with CALL_TRNSF_FLG = 1 and you only want one row per session:
QUALIFY
    ROW_NUMBER()
    OVER (PARTITION BY SESS_KEY
          ORDER BY CALL_TRNSF_FLG DESC) = 1
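
To make the two QUALIFY conditions concrete, here is a small Python sketch of the same filtering over a list of dictionaries. It is an illustration only: the function names are hypothetical, and where the ROW_NUMBER version may pick any one of several tied rows, this sketch keeps the first one it encounters.

```python
from collections import Counter


def qualify_rows(rows, key="SESS_KEY", flag="CALL_TRNSF_FLG"):
    """Keep rows whose key is unique, plus any row whose flag is 1."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] == 1 or row[flag] == 1]


def one_row_per_key(rows, key="SESS_KEY", flag="CALL_TRNSF_FLG"):
    """Keep exactly one row per key, preferring the highest flag value."""
    best = {}
    for row in rows:
        current = best.get(row[key])
        # Replace only on a strictly higher flag, so ties keep the first row.
        if current is None or row[flag] > current[flag]:
            best[row[key]] = row
    return list(best.values())
```

The first function mirrors `COUNT(*) OVER (PARTITION BY SESS_KEY) = 1 OR CALL_TRNSF_FLG = 1`; the second mirrors ranking each session by CALL_TRNSF_FLG in descending order and keeping rank 1.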
You can filter the rows you want like this:
WHERE
CONN_ID NOT IN (
SELECT
MIN(CONN_ID)
FROM
<<your_table>>
GROUP BY
SESS_KEY
HAVING
count(*) > 1
)
This assumes that the pair (SESS_KEY, CONN_ID) is unique. Otherwise, you should find a unique set of columns and filter by it.
Hi, you can write the query as follows.
Using JOINS
select t1.* from table_name t1
join
(select count(*), SESS_KEY from table_name
group by SESS_KEY having count(*) > 1 ) t2
on t1.SESS_KEY=t2.SESS_KEY;
Using WHERE clause
select t1.* from table_name t1,
(select count(*), SESS_KEY from table_name
group by SESS_KEY having count(*) > 1 ) t2
where t1.SESS_KEY=t2.SESS_KEY;
This gives you all the rows whose SESS_KEY value is duplicated.
To Update
merge into table_name t1
using
(select count(*), SESS_KEY from table_name
group by SESS_KEY having count(*) > 1 ) t2
on t1.SESS_KEY=t2.SESS_KEY
when matched then
update set
CALL_TRNSF_FLG=1;
Use a MERGE statement to update the table.

Getting negative values after applying PNG UP predictor to xref stream

PDF stream decode parameters are as follows.
<>/Filter/FlateDecode/ID[]/Index[2573 1 2962 1 2967 2 3004 9]/Info 2573 0 R/Length 58/Prev 1365436/Root 2575 0 R/Size 3013/Type/XRef/W[1 3 0]>>stream
.....
decoded bytes (unsigned int values):
1 20 4294967267 4294967190
1 20 4294967269 74
1 20 4294967284 4294967204
2 0 11 4294967235
1 20 4294967262 24
1 20 4294967262 116
1 20 4294967265 4294967222
1 20 4294967267 60
1 20 4294967267 89
1 20 4294967268 127
1 20 4294967268 4294967196
1 20 4294967270 39
1 20 4294967270 4294967220
I am not able to interpret these values; please help.
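
A possible explanation, offered as a sketch rather than a diagnosis: values such as 4294967267 equal 2^32 - 29, which is what a negative byte looks like after being widened to an unsigned 32-bit integer. The PNG Up predictor adds each byte to the byte directly above it modulo 256, so every reconstructed value should stay in the range 0 to 255; keeping only the low 8 bits of 4294967267 gives 227. The Python below shows that byte-wise reconstruction. The `columns` value of 4 is an assumption inferred from /W[1 3 0] (the /DecodeParms entry is not visible in the dictionary as pasted), and both function names are hypothetical.

```python
def undo_png_up(data, columns):
    """Undo the PNG 'Up' filter (type 2) on inflated predictor data.

    Each encoded row is one filter-type byte followed by `columns` data bytes.
    """
    row_size = columns + 1
    if len(data) % row_size:
        raise ValueError("data length is not a multiple of the row size")
    prev = bytes(columns)  # the row above the first row counts as all zeros
    rows = []
    for start in range(0, len(data), row_size):
        if data[start] != 2:
            raise ValueError("this sketch only handles the Up filter (type 2)")
        raw = data[start + 1:start + row_size]
        # Add the byte above and wrap to 8 bits, which keeps values in 0..255.
        cur = bytes((r + p) & 0xFF for r, p in zip(raw, prev))
        rows.append(cur)
        prev = cur
    return rows


def split_fields(row, widths):
    """Split one decoded row into big-endian integers using the /W widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(int.from_bytes(row[pos:pos + width], "big"))
        pos += width
    return fields
```

Read this way, the first listed row becomes the bytes 1, 20, 227, 150, and `split_fields(row, [1, 3, 0])` combines the three middle bytes into a single offset field.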

Transpose a column to a line after detecting a pattern

I have this text file format:
01 contig00041 1 878 + YP_003990830.1 metalloendopeptidase, glycoprotease family Geobacillus sp. Y4.1MC1 100.00 291 1 291 47 337 0.0 592 #line 1
01 contig00041 1241 3117 - YP_002948419.1 ABC transporter Geobacillus sp. WCH70 84.94 #line 2
37.31 624 #line 3
260 1 #line 4
321 624 #line 5
532 23 #line 6
12 644 #line 7
270 0.0 #line 8
3e-37 1046 #line 9
154 #line 10
I have to detect a line containing 8 columns (line 2) and transpose the second column of the following seven lines (lines 3 - 9) to the end of that 8-column line, and finally exclude line 10. This pattern repeats throughout a large text file, but it is not frequent (30 times in a file of 2000 lines). Is it possible to do this using awk?
The edited text file must look like the following text:
01 contig00041 1 878 + YP_003990830.1 metalloendopeptidase, glycoprotease family Geobacillus sp. Y4.1MC1 100.00 291 1 291 47 337 0.0 592 #line 1
01 contig00041 1241 3117 - YP_002948419.1 ABC transporter Geobacillus sp. WCH70 84.94 624 1 624 23 644 0.0 1046 #line 2
Thank you very much in advance.
awk 'NF == 12 { t = $0; for (i = 1; i <= 7; ++i) { r = getline; if (r < 1) break; t = t "\t" $2; } print t; next; } NF > 12' temp.txt
Output:
01 contig00041 1 878 + YP_003990830.1 metalloendopeptidase, glycoprotease family Geobacillus sp. Y4.1MC1 100.00 291 1 291 47 337 0.0 592
01 contig00041 1241 3117 - YP_002948419.1 ABC transporter Geobacillus sp. WCH70 84.94 624 1 624 23 644 0.0 1046
It automatically prints lines having more than 12 fields.
When it detects a line having 12 fields, it appends the second column of the next 7 lines to it and prints the result.
Any other line is ignored.
Edited to only add the second column of the lines with two columns.
I think this does what you want:
awk 'NF >= 8 { a[++i] = $0 } NF == 2 { a[i] = a[i] " " $2 } END { for (j = 1; j <= i; ++j) print a[j] }' file
For lines with 8 or more columns, add a new element to the array a. If the line has 2 columns, append its second column to the current array element. Once the whole file has been processed, go through the array and print all of the lines.
Output:
01 contig00041 1 878 + YP_003990830.1 metalloendopeptidase, glycoprotease family Geobacillus sp. Y4.1MC1 100.00 291 1 291 47 337 0.0 592
01 contig00041 1241 3117 - YP_002948419.1 ABC transporter Geobacillus sp. WCH70 84.94 624 1 624 23 644 0.0 1046
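
The array-based approach above can also be sketched in Python, which may make the rule easier to follow: a line with at least 8 whitespace-separated fields starts a new record, a line with exactly 2 fields contributes its second field to the current record, and everything else (such as the lone value on line 10) is dropped. The function name `merge_continuations` is hypothetical, and the sketch assumes the "#line n" markers in the question are annotations that are not present in the real file.

```python
def merge_continuations(lines, min_fields=8):
    """Append the 2nd field of each two-field line to the preceding long line."""
    merged = []
    for line in lines:
        fields = line.split()
        if len(fields) >= min_fields:
            # A full record: start a new output line.
            merged.append(line.rstrip("\n"))
        elif len(fields) == 2 and merged:
            # A continuation line: keep only its second column.
            merged[-1] += " " + fields[1]
        # Lines with any other field count are ignored.
    return merged
```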

Generate all combinations in SQL

I need to generate all combinations of size @k in a given set of size @n. Can someone please review the following SQL and determine first whether the logic returns the expected results, and second whether there is a better way?
/*CREATE FUNCTION dbo.Factorial ( @x int )
RETURNS int
AS
BEGIN
    DECLARE @value int
    IF @x <= 1
        SET @value = 1
    ELSE
        SET @value = @x * dbo.Factorial( @x - 1 )
    RETURN @value
END
GO*/
SET NOCOUNT ON;
DECLARE @k int = 5, @n int;
DECLARE @set table ( [value] varchar(24) );
DECLARE @com table ( [index] int );
INSERT @set VALUES ('1'),('2'),('3'),('4'),('5'),('6');
SELECT @n = COUNT(*) FROM @set;
DECLARE @combinations int = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));
PRINT CAST(@combinations as varchar(max)) + ' combinations';
DECLARE @index int = 1;
WHILE @index <= @combinations
BEGIN
    INSERT @com VALUES (@index)
    SET @index = @index + 1
END;
WITH [set] as (
    SELECT
        [value],
        ROW_NUMBER() OVER ( ORDER BY [value] ) as [index]
    FROM @set
)
SELECT
    [values].[value],
    [index].[index] as [combination]
FROM [set] [values]
CROSS JOIN @com [index]
WHERE ([index].[index] + [values].[index] - 1) % (@n) BETWEEN 1 AND @k
ORDER BY
    [index].[index];
Returning Combinations
Using a numbers table or number-generating CTE, select 0 through 2^n - 1. Using the bit positions containing 1s in these numbers to indicate the presence or absence of the relative members in the combination, and eliminating those that don't have the correct number of values, you should be able to return a result set with all the combinations you desire.
WITH Nums (Num) AS (
    SELECT Num
    FROM Numbers
    WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), BaseSet AS (
    SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
    FROM @set
), Combos AS (
    SELECT
        ComboID = N.Num,
        S.Value,
        Cnt = Count(*) OVER (PARTITION BY N.Num)
    FROM
        Nums N
        INNER JOIN BaseSet S ON N.Num & S.ind <> 0
)
SELECT
    ComboID,
    Value
FROM Combos
WHERE Cnt = @k
ORDER BY ComboID, Value;
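
The same bit-pattern idea can be sketched in a few lines of Python, which may help when checking the SQL by hand. Every integer from 0 to 2^n - 1 is treated as a mask over the sorted set, and only masks that select exactly k members are kept. The function name `combinations_by_bitmask` is hypothetical, and the sketch makes no attempt at performance.

```python
def combinations_by_bitmask(items, k):
    """Map each qualifying bitmask to the k members it selects."""
    ordered = sorted(items)
    n = len(ordered)
    result = {}
    for num in range(1 << n):  # 0 .. 2^n - 1
        # Bit i of num decides whether the i-th sorted member is included.
        members = [value for i, value in enumerate(ordered) if num & (1 << i)]
        if len(members) == k:
            result[num] = members
    return result
```

For the seven letters A to G taken 5 at a time this yields 21 masks, the smallest being 31 (binary 11111), which selects A, B, C, D and E.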
This query performs pretty well, but I thought of a way to optimize it, cribbing from the Nifty Parallel Bit Count to first get the right number of items taken at a time. This performs 3 to 3.5 times faster (both CPU and time):
WITH Nums AS (
    SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
    FROM dbo.Numbers
    WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), Nums2 AS (
    SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
    FROM Nums
), Nums3 AS (
    SELECT Num, P3 = (P2 & 0x0f0f0f0f) + ((P2 / 16) & 0x0f0f0f0f)
    FROM Nums2
), BaseSet AS (
    SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
    FROM @set
)
SELECT
    ComboID = N.Num,
    S.Value
FROM
    Nums3 N
    INNER JOIN BaseSet S ON N.Num & S.ind <> 0
WHERE P3 % 255 = @k
ORDER BY ComboID, Value;
I went and read the bit-counting page and think that this could perform better if I don't do the % 255 but go all the way with bit arithmetic. When I get a chance I'll try that and see how it stacks up.
My performance claims are based on the queries run without the ORDER BY clause. For clarity, what this code is doing is counting the number of set 1-bits in Num from the Numbers table. That's because the number is being used as a sort of indexer to choose which elements of the set are in the current combination, so the number of 1-bits will be the same.
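
As a cross-check of the arithmetic, here is the same three-step parallel bit count written in Python. It is a sketch for non-negative values below 2^32; the integer divisions by 2, 4 and 16 in the SQL are written as right shifts, which give the same result for non-negative numbers. The function name is hypothetical.

```python
def parallel_bit_count(num):
    """Count the 1-bits of a non-negative 32-bit value, as the query does."""
    # Step 1: each 2-bit field holds the number of 1-bits in that pair.
    p1 = (num & 0x55555555) + ((num >> 1) & 0x55555555)
    # Step 2: each 4-bit field holds the count for that nibble.
    p2 = (p1 & 0x33333333) + ((p1 >> 2) & 0x33333333)
    # Step 3: each byte holds the count for that byte (at most 8).
    p3 = (p2 & 0x0F0F0F0F) + ((p2 >> 4) & 0x0F0F0F0F)
    # Because 256 leaves remainder 1 when divided by 255, this adds up the bytes.
    return p3 % 255
```

The final `% 255` works because the four byte counts sum to at most 32, well below 255, so the remainder is exactly their sum.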
I hope you like it!
For the record, this technique of using the bit pattern of integers to select members of a set is what I've coined the "Vertical Cross Join." It effectively results in the cross join of multiple sets of data, where the number of sets & cross joins is arbitrary. Here, the number of sets is the number of items taken at a time.
Actually cross joining in the usual horizontal sense (of adding more columns to the existing list of columns with each join) would look something like this:
SELECT
A.Value,
B.Value,
C.Value
FROM
#Set A
CROSS JOIN #Set B
CROSS JOIN #Set C
WHERE
A.Value = 'A'
AND B.Value = 'B'
AND C.Value = 'C'
My queries above effectively "cross join" as many times as necessary with only one join. The results are unpivoted compared to actual cross joins, sure, but that's a minor matter.
Critique of Your Code
First, may I suggest this change to your Factorial UDF:
ALTER FUNCTION dbo.Factorial (
    @x bigint
)
RETURNS bigint
AS
BEGIN
    IF @x <= 1 RETURN 1
    RETURN @x * dbo.Factorial(@x - 1)
END
Now you can calculate much larger sets of combinations, plus it's more efficient. You might even consider using decimal(38, 0) to allow larger intermediate calculations in your combination calculations.
Second, your given query does not return the correct results. For example, using my test data from the performance testing below, set 1 is the same as set 18. It looks like your query takes a sliding stripe that wraps around: each set is always 5 adjacent members, looking something like this (I pivoted to make it easier to see):
1 ABCDE
2 ABCD Q
3 ABC PQ
4 AB OPQ
5 A NOPQ
6 MNOPQ
7 LMNOP
8 KLMNO
9 JKLMN
10 IJKLM
11 HIJKL
12 GHIJK
13 FGHIJ
14 EFGHI
15 DEFGH
16 CDEFG
17 BCDEF
18 ABCDE
19 ABCD Q
Compare the pattern from my queries:
31 ABCDE
47 ABCD F
55 ABC EF
59 AB DEF
61 A CDEF
62 BCDEF
79 ABCD G
87 ABC E G
91 AB DE G
93 A CDE G
94 BCDE G
103 ABC FG
107 AB D FG
109 A CD FG
110 BCD FG
115 AB EFG
117 A C EFG
118 BC EFG
121 A DEFG
...
Just to drive the bit-pattern -> index of combination thing home for anyone interested, notice that 31 in binary = 11111 and the pattern is ABCDE. 121 in binary is 1111001 and the pattern is A__DEFG (backwards mapped).
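
The sliding-stripe behaviour described in the critique can be reproduced with a short Python sketch that applies the same WHERE condition as the question's query, namely (combination + position - 1) % n BETWEEN 1 AND k with 1-based positions. The function name `sliding_stripe` is hypothetical; the sketch only evaluates that one predicate and is not a translation of the whole query.

```python
def sliding_stripe(values, k, combination):
    """Return the members the question's WHERE clause selects for one index."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        value
        for position, value in enumerate(ordered, start=1)
        # Same test as the SQL: the shifted position must land in 1..k.
        if 1 <= (combination + position - 1) % n <= k
    ]
```

With the 17 letters A to Q and k = 5, index 1 selects A to E, index 2 selects A to D plus Q, and index 18 selects A to E again, which is the repetition noted above.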
Performance Results With A Real Numbers Table
I ran some performance testing with big sets on my second query above. I do not have a record at this time of the server version used. Here's my test data:
DECLARE
    @k int,
    @n int;
DECLARE @set TABLE (value varchar(24));
INSERT @set VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q');
SET @n = @@RowCount;
SET @k = 5;
DECLARE @combinations bigint = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));
SELECT CAST(@combinations as varchar(max)) + ' combinations', MaxNumUsedFromNumbersTable = POWER(2, @n);
Peter showed that this "vertical cross join" doesn't perform as well as simply writing dynamic SQL to actually do the CROSS JOINs it avoids. At the trivial cost of a few more reads, his solution has metrics between 10 and 17 times better. The performance of his query decreases faster than mine as the amount of work increases, but not fast enough to stop anyone from using it.
The second set of numbers below is the factor as divided by the first row in the table, just to show how it scales.
Erik
Items     CPU  Writes    Reads  Duration |   CPU  Writes   Reads  Duration
-----  ------  ------  -------  -------- | -----  ------  ------  --------
17•5     7344       0     3861      8531 |
18•9    17141       0     7748     18536 |   2.3             2.0       2.2
20•10   76657       0    34078     84614 |  10.4             8.8       9.9
21•11  163859       0    73426    176969 |  22.3            19.0      20.7
21•20  142172       0    71198    154441 |  19.4            18.4      18.1
Peter
Items     CPU  Writes    Reads  Duration |   CPU  Writes   Reads  Duration
-----  ------  ------  -------  -------- | -----  ------  ------  --------
17•5      422      70    10263       794 |
18•9     6046     980   219180     11148 |  14.3    14.0    21.4      14.0
20•10   24422    4126   901172     46106 |  57.9    58.9    87.8      58.1
21•11   58266    8560  2295116    104210 | 138.1   122.3   223.6     131.3
21•20   51391       5  6291273     55169 | 121.8     0.1   613.0      69.5
Extrapolating, eventually my query will be cheaper (though it is from the start in reads), but not for a long time. To use 21 items in the set already requires a numbers table going up to 2097152...
Here is a comment I originally made before realizing that my solution would perform drastically better with an on-the-fly numbers table:
I love single-query solutions to problems like this, but if you're looking for the best performance, an actual cross-join is best, unless you start dealing with seriously huge numbers of combinations. But what does anyone want with hundreds of thousands or even millions of rows? Even the growing number of reads doesn't seem like too much of a problem, though 6 million is a lot and it's getting bigger fast...
Anyway. Dynamic SQL wins. I still had a beautiful query. :)
Performance Results with an On-The-Fly Numbers Table
When I originally wrote this answer, I said:
Note that you could use an on-the-fly numbers table, but I haven't tried it.
Well, I tried it, and the results were that it performed much better! Here is the query I used:
DECLARE @N int = 16, @K int = 12;
CREATE TABLE #Set (Value char(1) PRIMARY KEY CLUSTERED);
CREATE TABLE #Items (Num int);
INSERT #Items VALUES (@K);
INSERT #Set
SELECT TOP (@N) V
FROM
    (VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')) X (V);
GO
DECLARE
    @N int = (SELECT Count(*) FROM #Set),
    @K int = (SELECT TOP 1 Num FROM #Items);
DECLARE @combination int, @value char(1);
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (
    SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
    FROM Nums
    WHERE Num BETWEEN 0 AND Power(2, @N) - 1
), Nums2 AS (
    SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
    FROM Nums1
), Nums3 AS (
    SELECT Num, P3 = (P2 & 0x0F0F0F0F) + ((P2 / 16) & 0x0F0F0F0F)
    FROM Nums2
), BaseSet AS (
    SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
    FROM #Set
)
SELECT
    @Combination = N.Num,
    @Value = S.Value
FROM
    Nums3 N
    INNER JOIN BaseSet S
        ON N.Num & S.Ind <> 0
WHERE P3 % 255 = @K;
Note that I selected the values into variables to reduce the time and memory needed to test everything. The server still does all the same work. I modified Peter's version to be similar, and removed unnecessary extras so they were both as lean as possible. The server version used for these tests is Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86) Standard Edition on Windows NT 5.2 <X86> (Build 3790: Service Pack 2) (VM) running on a VM.
Below are charts showing the performance curves for values of N and K up to 21. The base data for them is in another answer on this page. The values are the result of 5 runs of each query at each K and N value, followed by throwing out the best and worst values for each metric and averaging the remaining 3.
Basically, my version has a "shoulder" (in the leftmost corner of the chart) at high values of N and low values of K that make it perform worse there than the dynamic SQL version. However, this stays fairly low and constant, and the central peak around N = 21 and K = 11 is much lower for Duration, CPU, and Reads than the dynamic SQL version.
I included a chart of the number of rows each item is expected to return so you can see how the query performs stacked up against how big a job it has to do.
Please see my additional answer on this page for the complete performance results. I hit the post character limit and could not include it here. (Any ideas where else to put it?) To put things in perspective against my first version's performance results, here's the same format as before:
Erik
Items    CPU  Duration    Reads  Writes |   CPU  Duration    Reads
-----  -----  --------  -------  ------ | -----  --------  -------
17•5     354       378    12382       0 |
18•9    1849      1893    97246       0 |   5.2       5.0      7.9
20•10   7119      7357   369518       0 |  20.1      19.5     29.8
21•11  13531     13807   705438       0 |  38.2      36.5     57.0
21•20   3234      3295       48       0 |   9.1       8.7      0.0
Peter
Items    CPU  Duration    Reads  Writes |   CPU  Duration    Reads
-----  -----  --------  -------  ------ | -----  --------  -------
17•5      41        45     6433       0 |
18•9    2051      1522   214021       0 |  50.0      33.8     33.3
20•10   8271      6685   864455       0 | 201.7     148.6    134.4
21•11  18823     15502  2097909       0 | 459.1     344.5    326.1
21•20  25688     17653  4195863       0 | 626.5     392.3    652.2
Conclusions
On-the-fly numbers tables are better than a real table containing rows, since reading one at huge rowcounts requires a lot of I/O. It is better to use a little CPU.
My initial tests weren't broad enough to really show the performance characteristics of the two versions.
Peter's version could be improved by making each JOIN not only be greater than the prior item, but also restrict the maximum value based on how many more items have to be fit into the set. For example, at 21 items taken 21 at a time, there is only one answer of 21 rows (all 21 items, one time), but the intermediate rowsets in the dynamic SQL version, early in the execution plan, contain combinations such as "AU" at step 2 even though this will be discarded at the next join since there is no value higher than "U" available. Similarly, an intermediate rowset at step 5 will contain "ARSTU" but the only valid combo at this point is "ABCDE". This improved version would not have a lower peak at the center, so possibly not improving it enough to become the clear winner, but it would at least become symmetrical so that the charts would not stay maxed past the middle of the region but would fall back to near 0 as my version does (see the top corner of the peaks for each query).
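
The upper-limit idea in the last conclusion can be illustrated with a short Python sketch. Peter's dynamic SQL is not shown in this thread, so this is not his query; it only demonstrates the bound itself: when picking item number `depth` (counting from 0) out of k, the index may be at most n - (k - depth), so that enough higher items remain to finish the combination. The function name `bounded_combinations` is hypothetical.

```python
def bounded_combinations(items, k):
    """Yield k-combinations without ever starting a prefix that cannot finish."""
    ordered = sorted(items)
    n = len(ordered)

    def extend(prefix, start):
        depth = len(prefix)
        if depth == k:
            yield tuple(prefix)
            return
        # Highest usable index at this depth: leave room for the remaining picks.
        last = n - (k - depth)
        for i in range(start, last + 1):
            yield from extend(prefix + [ordered[i]], i + 1)

    yield from extend([], 0)
```

With 21 items taken 21 at a time, the only index allowed at each depth is the depth itself, so a prefix such as "AU" is never built.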
Duration Analysis
There is no really significant difference between the versions in duration (>100ms) until 14 items taken 12 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 43 times.
Starting at 14•12, my version was faster 65 times (59 >100ms), the dynamic SQL version 64 times (60 >100ms). However, all the times my version was faster, it saved a total averaged duration of 256.5 seconds, and when the dynamic SQL version was faster, it saved 80.2 seconds.
The total averaged duration for all trials was Erik 270.3 seconds, Peter 446.2 seconds.
If a lookup table were created to determine which version to use (picking the faster one for the inputs), all the results could be performed in 188.7 seconds. Using the slowest one each time would take 527.7 seconds.
Reads Analysis
The duration analysis showed my query winning by a significant but not overly large amount. When the metric is switched to reads, a very different picture emerges--my query uses on average 1/10th the reads.
There is no really significant difference between the versions in reads (>1000) until 9 items taken 9 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 17 times.
Starting at 9•9, my version used fewer reads 118 times (113 >1000), the dynamic SQL version 69 times (31 >1000). However, all the times my version used fewer reads, it saved a total averaged 75.9M reads, and when the dynamic SQL version was faster, it saved 380K reads.
The total averaged reads for all trials was Erik 8.4M, Peter 84M.
If a lookup table were created to determine which version to use (picking the best one for the inputs), all the results could be performed in 8M reads. Using the worst one each time would take 84.3M reads.
I would be very interested to see the results of an updated dynamic SQL version that puts the extra upper limit on the items chosen at each step as I described above.
Addendum
The following version of my query achieves an improvement of about 2.25% over the performance results listed above. I used MIT's HAKMEM bit-counting method, and added a Convert(int) on the result of row_number() since it returns a bigint. Of course I wish this were the version I had used for all the performance testing, charts, and data above, but it is unlikely I will ever redo it as it was labor-intensive.
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (
   SELECT Convert(int, Num) Num
   FROM Nums
   WHERE Num BETWEEN 1 AND Power(2, @N) - 1
), Nums2 AS (
   SELECT
      Num,
      P1 = Num - ((Num / 2) & 0xDB6DB6DB) - ((Num / 4) & 0x49249249)
   FROM Nums1
),
Nums3 AS (SELECT Num, Bits = ((P1 + P1 / 8) & 0xC71C71C7) % 63 FROM Nums2),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM #Set)
SELECT
   N.Num,
   S.Value
FROM
   Nums3 N
   INNER JOIN BaseSet S
      ON N.Num & S.Ind <> 0
WHERE
   Bits = @K
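To make the bit-counting step easier to check outside SQL Server, here is a minimal Python sketch of the same arithmetic. It is not part of the benchmarked query: the function names are made up for illustration, `>> 1`, `>> 2` and `>> 3` stand in for the integer divisions by 2, 4 and 8, and the input is assumed to be a non-negative 32-bit integer.

```python
def hakmem_popcount(num: int) -> int:
    """Count the set bits of a non-negative 32-bit integer (HAKMEM-style)."""
    # After this step every 3-bit group holds the number of bits set in it.
    p1 = num - ((num >> 1) & 0xDB6DB6DB) - ((num >> 2) & 0x49249249)
    # Add neighbouring groups into 6-bit fields, then sum the fields via % 63.
    return ((p1 + (p1 >> 3)) & 0xC71C71C7) % 63


def combinations_by_bitmask(items, k):
    """Return every k-subset of items by scanning the masks 1 .. 2^n - 1."""
    items = sorted(items)
    result = []
    for mask in range(1, 2 ** len(items)):
        if hakmem_popcount(mask) == k:
            # Bit i of the mask selects the i-th item, like N.Num & S.Ind <> 0.
            result.append([v for i, v in enumerate(items) if mask & (1 << i)])
    return result
```

Calling `combinations_by_bitmask("abcde", 3)` should return the ten 3-element subsets, which mirrors what the query does with `Bits = @K`.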
And I could not resist showing one more version that does a lookup to get the count of bits. It may even be faster than other versions:
DECLARE @BitCounts binary(255) =
   0x01010201020203010202030203030401020203020303040203030403040405
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0304040504050506040505060506060704050506050606070506060706070708;
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (SELECT Convert(int, Num) Num FROM Nums WHERE Num BETWEEN 1 AND Power(2, @N) - 1),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM ComboSet)
SELECT
   @Combination = N.Num,
   @Value = S.Value
FROM
   Nums1 N
   INNER JOIN BaseSet S
      ON N.Num & S.Ind <> 0
WHERE
   @K =
      Convert(int, Substring(@BitCounts, N.Num & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 256 & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 65536 & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 16777216, 1))
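The lookup idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than a port of the query: `BIT_COUNTS` and `lookup_popcount` are hypothetical names, and the table here is 0-based with 256 entries, whereas the binary(255) value above appears to start at byte value 1 because Substring positions are 1-based. `FIRST_ROW` is the first literal line of that value, copied only to confirm that it matches the bit counts of 1 through 31.

```python
# BIT_COUNTS[b] is the number of set bits in the byte value b (0..255).
BIT_COUNTS = bytes(bin(b).count("1") for b in range(256))

# First literal line of the binary(255) value in the query (byte values 1..31).
FIRST_ROW = bytes.fromhex(
    "01010201020203010202030203030401020203020303040203030403040405"
)


def lookup_popcount(num: int) -> int:
    """Count set bits of a non-negative 32-bit integer, one byte at a time."""
    return (
        BIT_COUNTS[num & 0xFF]
        + BIT_COUNTS[(num >> 8) & 0xFF]
        + BIT_COUNTS[(num >> 16) & 0xFF]
        + BIT_COUNTS[(num >> 24) & 0xFF]
    )
```

Each of the four terms handles one byte of the number, just as the four Substring calls do with `N.Num`, `N.Num / 256`, `N.Num / 65536` and `N.Num / 16777216`.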
Please forgive this extra answer. I ran into the post character limit in my original answer.
Here are the complete averaged numeric performance results for the charts in my answer.
| Erik | Peter
N K | CPU Duration Reads Writes | CPU Duration Reads Writes
-- -- - ----- -------- ------ ------ - ----- -------- ------- ------
1 1 | 0 0 7 0 | 0 0 7 0
2 1 | 0 0 10 0 | 0 0 7 0
2 2 | 0 0 7 0 | 0 0 11 0
3 1 | 0 0 12 0 | 0 0 7 0
3 2 | 0 0 12 0 | 0 0 13 0
3 3 | 5 0 7 0 | 0 0 19 0
4 1 | 0 0 14 0 | 0 0 7 0
4 2 | 0 0 18 0 | 0 0 15 0
4 3 | 0 0 14 0 | 5 0 27 0
4 4 | 0 0 7 0 | 0 0 35 0
5 1 | 5 0 16 0 | 5 0 7 0
5 2 | 0 0 26 0 | 0 0 17 0
5 3 | 0 0 26 0 | 0 0 37 0
5 4 | 0 0 16 0 | 0 0 57 0
5 5 | 0 0 7 0 | 0 0 67 0
6 1 | 0 0 18 0 | 0 0 7 0
6 2 | 5 0 36 0 | 0 0 19 0
6 3 | 0 0 46 0 | 0 0 49 0
6 4 | 0 0 36 0 | 0 0 89 0
6 5 | 5 0 18 0 | 5 0 119 0
6 6 | 0 0 7 0 | 0 0 131 0
7 1 | 5 0 20 0 | 0 0 7 0
7 2 | 0 0 48 0 | 0 0 21 0
7 3 | 0 0 76 0 | 0 0 63 0
7 4 | 0 0 76 0 | 0 0 133 0
7 5 | 0 1 48 0 | 0 1 203 0
7 6 | 5 0 20 0 | 0 1 245 0
7 7 | 5 0 7 0 | 0 3 259 0
8 1 | 5 2 22 0 | 0 4 7 0
8 2 | 0 1 62 0 | 0 0 23 0
8 3 | 0 1 118 0 | 0 0 79 0
8 4 | 0 1 146 0 | 0 1 191 0
8 5 | 5 3 118 0 | 0 1 331 0
8 6 | 5 1 62 0 | 5 2 443 0
8 7 | 0 0 22 0 | 5 3 499 0
8 8 | 0 0 7 0 | 5 3 515 0
9 1 | 0 2 24 0 | 0 0 7 0
9 2 | 5 3 78 0 | 0 0 25 0
9 3 | 5 3 174 0 | 0 1 97 0
9 4 | 5 5 258 0 | 0 2 265 0
9 5 | 5 7 258 0 | 10 4 517 0
9 6 | 5 5 174 0 | 5 5 769 0
9 7 | 0 3 78 0 | 10 4 937 0
9 8 | 0 0 24 0 | 0 3 1009 0
9 9 | 0 1 7 0 | 0 4 1027 0
10 1 | 10 4 26 0 | 0 0 7 0
10 2 | 5 5 96 0 | 0 0 27 0
10 3 | 5 2 246 0 | 0 0 117 0
10 4 | 10 10 426 0 | 10 4 357 0
10 5 | 15 12 510 0 | 5 8 777 0
10 6 | 15 16 426 0 | 10 9 1281 0
10 7 | 10 4 246 0 | 10 9 1701 0
10 8 | 10 5 96 0 | 10 5 1941 0
10 9 | 5 4 26 0 | 10 7 2031 0
10 10 | 5 0 7 0 | 10 7 2051 0
11 1 | 10 8 28 0 | 0 0 7 0
11 2 | 15 11 116 0 | 0 0 29 0
11 3 | 21 24 336 0 | 10 3 139 0
11 4 | 21 18 666 0 | 5 2 469 0
11 5 | 21 20 930 0 | 5 3 1129 0
11 6 | 26 35 930 0 | 15 12 2053 0
11 7 | 20 14 666 0 | 5 25 2977 0
11 8 | 15 9 336 0 | 20 14 3637 0
11 9 | 10 7 116 0 | 21 27 3967 0
11 10 | 10 8 28 0 | 36 34 4086 0
11 11 | 5 8 7 0 | 15 15 4109 0
12 1 | 16 18 30 0 | 5 0 7 0
12 2 | 31 32 138 0 | 0 0 31 0
12 3 | 31 26 446 0 | 10 2 163 0
12 4 | 47 40 996 0 | 10 7 603 0
12 5 | 47 46 1590 0 | 21 17 1593 0
12 6 | 57 53 1854 0 | 31 30 3177 0
12 7 | 41 39 1590 0 | 31 30 5025 0
12 8 | 41 42 996 0 | 42 43 6609 0
12 9 | 31 26 446 0 | 52 52 7607 0
12 10 | 20 19 138 0 | 57 62 8048 0
12 11 | 15 17 30 0 | 72 64 8181 0
12 12 | 15 10 7 0 | 67 38 8217 0
13 1 | 31 32 32 0 | 0 0 7 0
13 2 | 21 25 162 0 | 0 0 33 0
13 3 | 36 34 578 0 | 5 2 189 0
13 4 | 57 65 1436 0 | 10 5 761 0
13 5 | 41 40 2580 0 | 10 10 2191 0
13 6 | 62 56 3438 0 | 31 32 4765 0
13 7 | 62 62 3438 0 | 57 53 8251 0
13 8 | 52 64 2580 0 | 52 47 11710 0
13 9 | 26 28 1436 0 | 93 96 14311 0
13 10 | 31 29 578 0 | 161 104 15891 0
13 11 | 36 35 162 0 | 129 99 16525 0
13 12 | 21 22 32 0 | 156 96 16383 0
13 13 | 26 30 7 0 | 166 98 16411 0
14 1 | 57 53 34 0 | 0 0 7 0
14 2 | 52 50 188 0 | 0 0 35 0
14 3 | 57 60 734 0 | 10 4 217 0
14 4 | 78 76 2008 0 | 15 8 945 0
14 5 | 99 97 4010 0 | 36 34 2947 0
14 6 | 120 125 6012 0 | 41 47 6951 0
14 7 | 125 119 6870 0 | 93 94 12957 0
14 8 | 135 138 6012 0 | 88 98 19821 0
14 9 | 78 153 4010 0 | 234 156 26099 0
14 10 | 94 92 2008 0 | 229 133 30169 0
14 11 | 83 90 734 0 | 239 136 32237 0
14 12 | 47 46 188 0 | 281 176 33031 0
14 13 | 52 53 34 0 | 260 167 32767 0
14 14 | 46 47 7 0 | 203 149 32797 0
15 1 | 83 83 36 0 | 0 0 7 0
15 2 | 145 139 216 0 | 0 2 37 0
15 3 | 104 98 916 0 | 0 2 247 0
15 4 | 135 135 2736 0 | 15 17 1157 0
15 5 | 94 97 6012 0 | 26 27 3887 0
15 6 | 192 188 10016 0 | 57 53 9893 0
15 7 | 187 192 12876 0 | 73 73 19903 0
15 8 | 286 296 12876 0 | 338 230 33123 0
15 9 | 208 207 10016 0 | 354 223 46063 0
15 10 | 140 143 6012 0 | 443 334 56143 0
15 11 | 88 86 2736 0 | 391 273 62219 0
15 12 | 73 72 916 0 | 432 269 65019 0
15 13 | 109 117 216 0 | 317 210 65999 0
15 14 | 156 187 36 0 | 411 277 66279 0
15 15 | 140 142 7 0 | 354 209 65567 0
16 1 | 281 281 38 0 | 0 0 7 0
16 2 | 141 146 246 0 | 0 0 39 0
16 3 | 208 206 1126 0 | 10 4 279 0
16 4 | 187 189 3646 0 | 15 13 1399 0
16 5 | 234 234 8742 0 | 42 42 5039 0
16 6 | 333 337 16022 0 | 83 85 13775 0
16 7 | 672 742 22886 0 | 395 235 30087 0
16 8 | 510 510 25746 0 | 479 305 53041 0
16 9 | 672 675 22886 0 | 671 489 78855 0
16 10 | 489 492 16022 0 | 859 578 101809 0
16 11 | 250 258 8742 0 | 719 487 117899 0
16 12 | 198 202 3646 0 | 745 483 126709 0
16 13 | 119 119 1126 0 | 770 506 130423 0
16 14 | 291 327 246 0 | 770 531 131617 0
16 15 | 156 156 38 0 | 713 451 131931 0
16 16 | 125 139 7 0 | 895 631 132037 0
17 1 | 406 437 40 0 | 0 0 7 0
17 2 | 307 320 278 0 | 0 0 41 0
17 3 | 281 290 1366 0 | 0 3 313 0
17 4 | 307 317 4766 0 | 31 28 1673 0
17 5 | 354 378 12382 0 | 41 45 6433 0
17 6 | 583 582 24758 0 | 130 127 18809 0
17 7 | 839 859 38902 0 | 693 449 43873 0
17 8 | 1177 1183 48626 0 | 916 679 82847 0
17 9 | 1031 1054 48626 0 | 1270 944 131545 0
17 10 | 828 832 38902 0 | 1469 1105 180243 0
17 11 | 672 668 24758 0 | 1535 1114 219217 0
17 12 | 422 422 12382 0 | 1494 991 244047 0
17 13 | 474 482 4766 0 | 1615 1165 256501 0
17 14 | 599 607 1366 0 | 1500 1042 261339 0
17 15 | 223 218 278 0 | 1401 1065 262777 0
17 16 | 229 228 40 0 | 1390 918 263127 0
17 17 | 541 554 7 0 | 1562 1045 263239 0
18 1 | 401 405 42 0 | 0 0 7 0
18 2 | 401 397 312 0 | 0 0 43 0
18 3 | 458 493 1638 0 | 5 6 349 0
18 4 | 583 581 6126 0 | 16 13 1981 0
18 5 | 697 700 17142 0 | 83 130 8101 0
18 6 | 792 799 37134 0 | 156 162 25237 0
18 7 | 1672 1727 63654 0 | 1098 751 62693 0
18 8 | 1598 1601 87522 0 | 1416 1007 126423 0
18 9 | 1849 1893 97246 0 | 2051 1522 214021 0
18 10 | 1963 2083 87522 0 | 2734 2103 311343 0
18 11 | 1411 1428 63654 0 | 2849 2352 398941 0
18 12 | 1042 1048 37134 0 | 3021 2332 462671 0
18 13 | 942 985 17142 0 | 3036 2314 499881 0
18 14 | 656 666 6126 0 | 3052 2177 517099 0
18 15 | 526 532 1638 0 | 2910 2021 523301 0
18 16 | 614 621 312 0 | 3083 2108 525015 0
18 17 | 536 551 42 0 | 2921 2031 525403 0
18 18 | 682 680 7 0 | 3141 2098 525521 0
19 1 | 885 909 44 0 | 0 0 7 0
19 2 | 1411 1498 348 0 | 0 0 45 0
19 3 | 880 887 1944 0 | 5 4 387 0
19 4 | 1119 1139 7758 0 | 26 25 2325 0
19 5 | 1120 1127 23262 0 | 73 72 10077 0
19 6 | 1395 1462 54270 0 | 453 387 33591 0
19 7 | 1875 1929 100782 0 | 1197 838 87941 0
19 8 | 2656 2723 151170 0 | 2255 1616 188803 0
19 9 | 3046 3092 184762 0 | 3317 2568 340053 0
19 10 | 3635 3803 184762 0 | 5171 4041 524895 0
19 11 | 2739 2774 151170 0 | 5577 4574 709737 0
19 12 | 3203 3348 100782 0 | 6182 5194 860987 0
19 13 | 1672 1750 54270 0 | 6458 5561 961849 0
19 14 | 1760 1835 23262 0 | 6177 4964 1016199 0
19 15 | 968 1006 7758 0 | 6266 4331 1039541 0
19 16 | 1099 1134 1944 0 | 6208 4254 1047379 0
19 17 | 995 1037 348 0 | 6385 4366 1049403 0
19 18 | 916 964 44 0 | 6036 4268 1049831 0
19 19 | 1135 1138 7 0 | 6234 4320 1049955 0
20 1 | 1797 1821 46 0 | 0 0 7 0
20 2 | 2000 2029 386 0 | 0 0 47 0
20 3 | 2031 2071 2286 0 | 10 6 427 0
20 4 | 1942 2036 9696 0 | 31 34 2707 0
20 5 | 2104 2161 31014 0 | 88 85 12397 0
20 6 | 2880 2958 77526 0 | 860 554 43675 0
20 7 | 3791 3940 155046 0 | 2026 1405 121285 0
20 8 | 5130 5307 251946 0 | 3823 2731 276415 0
20 9 | 6547 6845 335926 0 | 5380 4148 528445 0
20 10 | 7119 7357 369518 0 | 8271 6685 864455 0
20 11 | 5692 5803 335926 0 | 9557 8029 1234057 0
20 12 | 4734 4850 251946 0 | 11114 9504 1570067 0
20 13 | 3604 3641 155046 0 | 11551 10434 1822097 0
20 14 | 2911 2999 77526 0 | 12317 10822 1977227 0
20 15 | 2115 2134 31014 0 | 12806 10679 2054837 0
20 16 | 2041 2095 9696 0 | 13062 9115 2085935 0
20 17 | 2390 2465 2286 0 | 12807 9002 2095715 0
20 18 | 1765 1788 386 0 | 12598 8601 2098085 0
20 19 | 2067 2143 46 0 | 12578 8626 2098555 0
20 20 | 1640 1663 7 0 | 12932 9064 2098685 0
21 1 | 3374 3425 48 0 | 0 0 7 0
21 2 | 4031 4157 426 0 | 0 1 49 0
21 3 | 3218 3250 2666 0 | 10 5 469 0
21 4 | 3687 3734 11976 0 | 21 25 3129 0
21 5 | 3692 3735 40704 0 | 115 114 15099 0
21 6 | 4859 4943 108534 0 | 963 661 56079 0
21 7 | 6114 6218 232566 0 | 2620 1880 164701 0
21 8 | 8573 8745 406986 0 | 4999 3693 397355 0
21 9 | 11880 12186 587866 0 | 9047 6863 804429 0
21 10 | 13255 13582 705438 0 | 14358 11436 1392383 0
21 11 | 13531 13807 705438 0 | 18823 15502 2097909 0
21 12 | 12244 12400 587866 0 | 21834 18760 2803435 0
21 13 | 9406 9528 406986 0 | 23771 21274 3391389 0
21 14 | 7114 7180 232566 0 | 26677 24296 3798463 0
21 15 | 4869 4961 108534 0 | 26479 23998 4031117 0
21 16 | 4416 4521 40704 0 | 26536 22976 4139739 0
21 17 | 4380 4443 11976 0 | 26490 19107 4180531 0
21 18 | 3265 3334 2666 0 | 25979 17995 4192595 0
21 19 | 3640 3768 426 0 | 26186 17891 4195349 0
21 20 | 3234 3295 48 0 | 25688 17653 4195863 0
21 21 | 3156 3219 7 0 | 26140 17838 4195999 0
The CUBE extension to a GROUP BY clause generates grouping sets for all combinations of the given column list. For example, the following query returns all 3-combinations of a 4-element set.
select concat(a,b,c,d)
from (select 'a','b','c','d') as t(a,b,c,d)
group by cube(a,b,c,d)
having len(concat(a,b,c,d)) = 3
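As a rough illustration of why this works, the Python sketch below enumerates every subset of the column list, which is what CUBE does with its grouping sets, and then keeps the subsets of size 3. The function name `cube_groupings` is invented for this example. Note that the SQL version counts elements with len(concat(...)), so it relies on every value being exactly one character long.

```python
def cube_groupings(columns):
    """Yield every subset of columns, one per grouping set that CUBE would produce."""
    n = len(columns)
    for mask in range(2 ** n):
        # Columns left out of a grouping set come back as NULL and vanish in concat.
        yield "".join(c for i, c in enumerate(columns) if mask & (1 << i))


# Equivalent of: having len(concat(a,b,c,d)) = 3
three_combinations = sorted(g for g in cube_groupings("abcd") if len(g) == 3)
```

For four columns there are 16 grouping sets in total, of which four have exactly three members.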
How about some dynamic SQL?
DECLARE @k int = 5, @n INT
IF OBJECT_ID('tempdb..#set') IS NOT NULL DROP TABLE #set
CREATE TABLE #set ( [value] varchar(24) )
INSERT #set VALUES ('1'),('2'),('3'),('4'),('5'),('6')
SET @n = @@ROWCOUNT
-- dbo.Factorial is a user-defined function that is not shown here
SELECT dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k)) AS [expected combinations]
-- let's generate some sql.
DECLARE
    @crlf NCHAR(2) = NCHAR(13)+NCHAR(10)
  , @sql NVARCHAR(MAX)
  , @select NVARCHAR(MAX)
  , @from NVARCHAR(MAX)
  , @order NVARCHAR(MAX)
  , @in NVARCHAR(MAX)
DECLARE @j INT = 0
WHILE @j < @k BEGIN
    SET @j += 1
    IF @j = 1 BEGIN
        SET @select = 'SELECT'+@crlf+' _1.value AS [1]'
        SET @from = @crlf+'FROM #set AS _1'
        SET @order = 'ORDER BY _1.value'
        SET @in = '[1]'
    END
    ELSE BEGIN
        SET @select += @crlf+', _'+CONVERT(VARCHAR,@j)+'.value AS ['+CONVERT(VARCHAR,@j)+']'
        SET @from += @crlf+'INNER JOIN #set AS _'+CONVERT(VARCHAR,@j)+' ON _'+CONVERT(VARCHAR,@j)+'.value > _'+CONVERT(VARCHAR,@j-1)+'.value'
        SET @order += ', _'+CONVERT(VARCHAR,@j)+'.value'
        SET @in += ', ['+CONVERT(VARCHAR,@j)+']'
    END
END
SET @select += @crlf+', ROW_NUMBER() OVER ('+@order+') AS combination'
SET @sql = @select + @from
-- let's see how it looks
PRINT @sql
EXEC (@sql)
-- ok, now unpivot and dump into a table for later use
IF OBJECT_ID('tempdb..#combinations') IS NOT NULL DROP TABLE #combinations
CREATE TABLE #combinations (
    combination INT
  , value VARCHAR(24)
  , PRIMARY KEY (combination, value)
)
SET @sql
    = 'WITH CTE AS ('+@crlf+@sql+@crlf+')'+@crlf
    + 'INSERT #combinations (combination, value)'+@crlf
    + 'SELECT combination, value FROM CTE a'+@crlf
    + 'UNPIVOT (value FOR position IN ('+@in+')) AS b'
PRINT @sql
EXEC (@sql)
SELECT COUNT(DISTINCT combination) AS [returned combinations] FROM #combinations
SELECT * FROM #combinations
This generates the following query for @k = 5:
SELECT
_1.value AS [1]
, _2.value AS [2]
, _3.value AS [3]
, _4.value AS [4]
, _5.value AS [5]
, ROW_NUMBER() OVER (ORDER BY _1.value, _2.value, _3.value, _4.value, _5.value) AS combination
FROM #set AS _1
INNER JOIN #set AS _2 ON _2.value > _1.value
INNER JOIN #set AS _3 ON _3.value > _2.value
INNER JOIN #set AS _4 ON _4.value > _3.value
INNER JOIN #set AS _5 ON _5.value > _4.value
The script then unpivots that result and dumps it into a table.
The dynamic SQL is ugly, and you can't wrap it in a UDF, but the query produced is very efficient.
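The same query-building idea can be tried without SQL Server. The sketch below is only an illustration: it generates the chain of self-joins in Python and runs it against an in-memory SQLite database, so the table name `items`, the column aliases `c1`, `c2`, ... and both function names are assumptions made for this example. ROW_NUMBER and the UNPIVOT step are left out to keep it short.

```python
import math
import sqlite3


def build_combination_sql(k: int, table: str = "items") -> str:
    """Build a SELECT that self-joins `table` k times on strictly increasing values."""
    select = ["_1.value AS c1"]
    joins = [f"FROM {table} AS _1"]
    for j in range(2, k + 1):
        select.append(f"_{j}.value AS c{j}")
        joins.append(f"INNER JOIN {table} AS _{j} ON _{j}.value > _{j - 1}.value")
    order = ", ".join(f"c{j}" for j in range(1, k + 1))
    return "SELECT " + ", ".join(select) + "\n" + "\n".join(joins) + "\nORDER BY " + order


def combinations_via_sql(values, k):
    """Load the values into SQLite and run the generated query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (value TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [(v,) for v in values])
    rows = conn.execute(build_combination_sql(k)).fetchall()
    conn.close()
    return rows


rows = combinations_via_sql("123456", 5)
```

Because every join requires a strictly larger value than the previous alias, each combination appears exactly once, so the row count should equal `math.comb(6, 5)`.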
First create this UDF...
CREATE FUNCTION [dbo].[_ex_fn_SplitToTable] (@str varchar(5000), @sep char(1) = null)
RETURNS @ReturnVal table (n int, s varchar(5000))
AS
/*
Alpha Test
-----------
select * from [dbo].[_ex_fn_SplitToTable]('abcde','')
*/
BEGIN
    declare @str2 varchar(5000)
    declare @sep2 char(1)
    if LEN(ISNULL(@sep,'')) = 0
    begin
        -- no separator given: put a comma after every character, then split on commas
        declare @i int
        set @i = 0
        set @str2 = ''
        declare @char varchar(1)
        startloop:
            set @i += 1
            --print @i
            set @char = substring(@str,@i,1)
            set @str2 = @str2 + @char + ','
            if LEN(@str) <= @i
                goto exitloop
            goto startloop
        exitloop:
        set @str2 = left(@str2,LEN(@str2) - 1)
        set @sep2 = ','
        --print @str2
    end
    else
    begin
        set @str2 = @str
        set @sep2 = @sep
    end
    ;WITH Pieces(n, start, stop) AS (
        SELECT 1, 1, CHARINDEX(@sep2, @str2)
        UNION ALL
        SELECT n + 1, stop + 1, CHARINDEX(@sep2, @str2, stop + 1)
        FROM Pieces
        WHERE stop > 0
    )
    insert into @ReturnVal(n,s)
    SELECT n,
        SUBSTRING(@str2, start, CASE WHEN stop > 0 THEN stop-start ELSE 5000 END) AS s
    FROM Pieces option (maxrecursion 32767)
    RETURN
END
GO
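For readers who want to see what this function returns without creating it, here is a small Python sketch of the intended behaviour as I read it. The name `split_to_table` is made up, and edge cases are deliberately ignored (for example an empty input string, or input that already contains commas when no separator is given).

```python
def split_to_table(text: str, sep: str = ""):
    """Return (n, s) rows: one per character if sep is empty, else one per piece."""
    pieces = list(text) if not sep else text.split(sep)
    # n is a 1-based position, like the n column of the table-valued function.
    return list(enumerate(pieces, start=1))
```

With an empty separator, `split_to_table("abcde")` yields one row per character; with `","` it yields one row per comma-separated piece.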
Then create this stored proc...
CREATE proc [CombinationsOfString]
(
    @mystring varchar(max) = '0,5,10,15,20,25'
)
as
/*
ALPHA TEST
---------
exec CombinationsOfString '-20,-10,0,10,20'
*/
if object_id('tempdb..#_201606070947_myorig') is not null drop table #_201606070947_myorig
CREATE TABLE #_201606070947_myorig
(
    SourceId int not null identity(1,1)
    ,Element varchar(100) not null
)
insert into #_201606070947_myorig
select s from dbo._ex_fn_SplitToTable(@mystring,',')
--select SourceId, Element from #_201606070947_myorig
declare @mynumerics varchar(max)
set @mynumerics = (
    select STUFF(REPLACE((SELECT '#!' + LTRIM(RTRIM(SourceId)) AS 'data()'
    FROM #_201606070947_myorig
    FOR XML PATH('')),' #!',', '), 1, 2, '') as Brands
)
set @mynumerics = REPLACE(@mynumerics,' ','')
print @mynumerics
if object_id('tempdb..#_201606070947_source') is not null drop table #_201606070947_source
if object_id('tempdb..#_201606070947_numbers') is not null drop table #_201606070947_numbers
if object_id('tempdb..#_201606070947_results') is not null drop table #_201606070947_results
if object_id('tempdb..#_201606070947_processed') is not null drop table #_201606070947_processed
CREATE TABLE #_201606070947_source
(
    SourceId int not null identity(1,1)
    ,Element char(1) not null
)
--declare @mynumerics varchar(max)
--set @mynumerics = '1,2,3,4,5'
insert into #_201606070947_source
select s from dbo._ex_fn_SplitToTable(@mynumerics,',')
-- select * from #_201606070947_source
declare @Length int
set @Length = (select max(SourceId) from #_201606070947_source)
declare @columnstring varchar(max) = (SELECT REPLICATE('c.',@Length))
print @columnstring
declare @subs varchar(max) = (SELECT REPLICATE('substring.',@Length))
print @subs
if object_id('tempdb..#_201606070947_columns') is not null drop table #_201606070947_columns
select s+CONVERT(varchar,dbo.PadLeft(convert(varchar,n),'0',3)) cols
into #_201606070947_columns
from [dbo].[_ex_fn_SplitToTable](@columnstring,'.') where LEN(s) > 0
if object_id('tempdb..#_201606070947_subcolumns') is not null drop table #_201606070947_subcolumns
select s+'(Combo,'+CONVERT(varchar,n)+',1) ' + 'c'+CONVERT(varchar,dbo.PadLeft(convert(varchar,n),'0',3)) cols
into #_201606070947_subcolumns
from [dbo].[_ex_fn_SplitToTable](@subs,'.') where LEN(s) > 0
-- select * from #_201606070947_subcolumns
-- select * from #_201606070947_columns
declare @columns_sql varchar(max)
set @columns_sql =
(
    select distinct
    stuff((SELECT distinct + cast(cols as varchar(50)) + ' VARCHAR(1), '
    FROM (
        select cols
        from #_201606070947_columns
    ) t2
    --where t2.n = t1.n
    FOR XML PATH('')),3,0,'')
    from (
        select cols
        from #_201606070947_columns
    ) t1
)
declare @substring_sql varchar(max)
set @substring_sql =
(
    select distinct
    stuff((SELECT distinct + cast(cols as varchar(100)) + ', '
    FROM (
        select cols
        from #_201606070947_subcolumns
    ) t2
    --where t2.n = t1.n
    FOR XML PATH('')),3,0,'')
    from (
        select cols
        from #_201606070947_subcolumns
    ) t1
)
set @substring_sql = left(@substring_sql,LEN(@substring_sql) - 1)
print @substring_sql
set @columns_sql = LEFT(@columns_sql,LEN(@columns_sql) - 1)
--SELECT @columns_sql
declare @sql varchar(max)
set @sql = 'if object_id(''tempdb..##_201606070947_01'') is not null drop table ##_201606070947_01 create table ##_201606070947_01 (rowid int,' + @columns_sql + ')'
print @sql
execute(@sql)
CREATE TABLE #_201606070947_numbers (Number int not null)
insert into #_201606070947_numbers
select SourceId from #_201606070947_source
CREATE TABLE #_201606070947_results
(
    Combo varchar(10) not null
    ,Length int not null
)
SET NOCOUNT on
DECLARE
    @Loop int
    ,@MaxLoop int
-- How many elements there are to process
SELECT @MaxLoop = max(SourceId)
from #_201606070947_source
-- Initialize first value
TRUNCATE TABLE #_201606070947_results
INSERT #_201606070947_results (Combo, Length)
select Element, 1
from #_201606070947_source
where SourceId = 1
SET @Loop = 2
-- Iterate to add each Element after the first
WHILE @Loop <= @MaxLoop
BEGIN
    INSERT #_201606070947_results (Combo, Length)
    select distinct
        left(re.Combo, @Loop - nm.Number)
        + so.Element
        + RIGHT(re.Combo, nm.Number - 1)
        ,@Loop
    from #_201606070947_results re
    inner join #_201606070947_numbers nm
        on nm.Number <= @Loop
    inner join #_201606070947_source so
        on so.SourceId = @Loop
    where re.Length = @Loop - 1
    SET @Loop = @Loop + 1
END
-- select * from #_201606070947_results
-- Show #_201606070947_results
SELECT *
into #_201606070947_processed
from #_201606070947_results
where Length = @MaxLoop
order by Combo
-- select * from #_201606070947_processed
set @sql = 'if object_id(''tempdb..##_201606070947_02'') is not null drop table ##_201606070947_02 '
print @sql
execute(@sql)
set @sql = ' ' +
    ' SELECT ROW_NUMBER() OVER(ORDER BY Combo Asc) AS RowID,' + @substring_sql +
    ' into ##_201606070947_02 ' +
    ' FROM #_201606070947_processed ' +
    ' '
PRINT @sql
execute(@sql)
declare @columns_sql_new varchar(max)
set @columns_sql_new = REPLACE(@columns_sql,'(1)','(100)')
set @sql = 'if object_id(''tempdb..##_201606070947_03'') is not null drop table ##_201606070947_03 create table ##_201606070947_03 (RowId int,' + @columns_sql_new + ')'
PRINT @sql
execute(@sql)
insert into ##_201606070947_03 (RowId)
select RowId from ##_201606070947_02
--select * from ##_201606070947_03
DECLARE @ColumnId varchar(10)
DECLARE @getColumnId CURSOR
SET @getColumnId = CURSOR FOR
select cols ColumnId from #_201606070947_columns
OPEN @getColumnId
FETCH NEXT
FROM @getColumnId INTO @ColumnId
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @ColumnId
set @sql = ' ' +
' update ##_201606070947_03
set ' + @ColumnId + ' = B.Element
from ##_201606070947_03 A
, (
select A.RowID, B.*
from
(
select * from ##_201606070947_02
) A
,
(
select * from #_201606070947_myorig
) B
where A.' + @ColumnId + ' = B.SourceId
) B
where A.RowId = B.RowId
'
execute(@sql)
print @sql
FETCH NEXT
FROM @getColumnId INTO @ColumnId
END
CLOSE @getColumnId
DEALLOCATE @getColumnId
select * from ##_201606070947_03
I ran across another technique for calculating combinations, and it is remarkably simple. In a public SQL challenge, it also soundly beat my own attempt, which was based on the same bit-pattern technique as my (currently) accepted answer. I admit, though, that I didn't spend long on that attempt, and rather than adapting my best solution from here, I wrote it over again.
Anyway, I was sure I had been taken down a peg: here I was thinking I'd hit on something very cool, only to find out later that my solution was easily surpassed. But to my surprise, when I tried the technique out, it was worse than the best method above. It is here for reference because it is easy to implement and performs reasonably for small jobs, which makes it the better choice unless the job demands the added complexity of something like the bit-pattern technique.
Given a table #Set with the same number of rows as the items being selected from, and a variable @K for the number of items taken at a time, here is that method:
WITH Chains AS (
   SELECT
      Generation = 1,
      Chain = Convert(varchar(1000),'|' + Value + '|'),
      Value
   FROM
      #Set
   WHERE
      @K > 0
      AND value <= ALL (
         SELECT TOP (@K) Value
         FROM #Set
         ORDER BY Value DESC
      )
   UNION ALL
   SELECT
      C.Generation + 1,
      Convert(varchar(1000), C.Chain + S.Value + '|'),
      S.Value
   FROM
      Chains C
      INNER JOIN #Set S
         ON C.Value < S.Value
   WHERE
      C.Generation <= @K
)
SELECT
   C.Chain,
   S.Value
FROM
   Chains C
   INNER JOIN #Set S
      ON C.Chain LIKE '%|' + S.Value + '|%'
WHERE
   Generation = @K;
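The chain-building idea behind the recursive CTE can be sketched in a few lines of Python. This is an illustration only, not a translation of the query: the function name is hypothetical, the values are assumed to be distinct, and the loop stops at exactly k generations instead of filtering on Generation afterwards.

```python
def combinations_by_chains(values, k):
    """Build k-combinations by repeatedly extending chains with a larger value."""
    values = sorted(values)
    if k <= 0 or k > len(values):
        return []
    # Anchor: a chain may only start at a value that still has k - 1 larger
    # values after it (the "value <= ALL (TOP (@K) ... DESC)" condition).
    chains = [[v] for v in values[: len(values) - k + 1]]
    for _ in range(k - 1):
        # Recursive step: append every value greater than the chain's last value.
        chains = [chain + [v] for chain in chains for v in values if v > chain[-1]]
    return chains
```

Each chain is strictly increasing, so no combination is produced twice, which is the same property the `C.Value < S.Value` join gives the SQL version.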