SQL Coalesce with missing values - sql

I have two tables, master and child. The master's primary key MM is an INT. The child table has a compound key of two columns plus a value column:
MM (INT)
POS (INT, values 1-32)
VV (INT, values 1-9)
Sample master table data:
(1, other data)
(2, other data)
(3, other data)
Sample child table data
(1, 1,2)
(1, 2,2)
(1, 4,1)
(1,15,1)
(2, 4,5)
(2, 5,3)
(2,31,7)
(3,3,1)
(4,18,2)
(4,19,5)
For a report, I would like to denormalize the data into output like this:
(1,'22010000000000100000000000000000')
(2,'00053000000000000000000000000070')
(3,'00100000000000000000000000000000')
(4,'00000000000000000250000000000000')
I tried a SELECT query with COALESCE, but the output is not exactly what I want:
(1,'22110')
(2,'537')
(3,'1')
(4,'25')
How do I fill in the missing data with zeros?

One way I can think of uses a DECIMAL with a precision of 32 and SUM(), then converts the result back to a zero-padded string:
select mm,
       right(replicate('0', 32) + cast(sum(val) as varchar(32)), 32) as pattern
from (select c.*,
             -- VV followed by (32 - POS) zeros puts the digit at position POS
             cast(cast(vv as varchar(32)) + replicate('0', 32 - pos) as decimal(32, 0)) as val
      from child c
     ) c
group by mm;
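The arithmetic behind this trick: each child row contributes VV * 10^(32 - POS), so summing per MM drops every digit into its position, and left-padding restores the leading zeros. A minimal Python sketch of the same arithmetic on the question's sample rows (Python integers are unbounded, so SQL Server's 38-digit DECIMAL limit does not apply here; the function name is illustrative):

```python
from collections import defaultdict

# Sample child rows from the question: (MM, POS, VV)
child = [
    (1, 1, 2), (1, 2, 2), (1, 4, 1), (1, 15, 1),
    (2, 4, 5), (2, 5, 3), (2, 31, 7),
    (3, 3, 1),
    (4, 18, 2), (4, 19, 5),
]

WIDTH = 32

def patterns_by_sum(rows, width=WIDTH):
    """Mirror of the SUM()/DECIMAL query: add VV * 10**(width - POS) per MM."""
    totals = defaultdict(int)
    for mm, pos, vv in rows:
        totals[mm] += vv * 10 ** (width - pos)
    # zfill plays the role of right(replicate('0', 32) + ..., 32)
    return {mm: str(total).zfill(width) for mm, total in sorted(totals.items())}

for mm, pattern in patterns_by_sum(child).items():
    print(mm, pattern)
```

This only works because each VV is a single digit (1-9) and each (MM, POS) pair is unique; a two-digit value would carry into the neighbouring position.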
EDIT:
The above isn't generalizable (for example, to more than 38 characters, the maximum DECIMAL precision, or to letters as well as digits). Here is a more general, but longer, version:
select c.mm,
       (max(case when pos = 1 then valc else '0' end) +
        max(case when pos = 2 then valc else '0' end) +
        max(case when pos = 3 then valc else '0' end) +
        -- . . . one line per position, continuing through pos = 31 . . .
        max(case when pos = 32 then valc else '0' end)
       ) as pattern
from (select c.*, cast(vv as varchar(255)) as valc
      from child c
     ) c
group by c.mm;
I should note that if you want to handle a master with no children, then use a left join. That aspect of the problem seems less interesting than combining the values in the appropriate positions.

Try it like this
CREATE TABLE #master(MM INT, OtherData VARCHAR(100));
INSERT INTO #master VALUES
(1, 'Other Data 1')
,(2, 'Other Data 2')
,(3, 'Other Data 3');
CREATE TABLE #child(MM INT, POS INT, VV INT);
INSERT INTO #child VALUES
(1, 1,2)
,(1, 2,2)
,(1, 4,1)
,(1,15,1)
,(2, 4,5)
,(2, 5,3)
,(2,31,7)
,(3,3,1)
,(4,18,2)
,(4,19,5);
--One CTE to get 32 numbers
WITH Numbers(Nr) AS
(SELECT TOP 32 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM sys.objects) --get 32 numbers
--another CTE to get distinct MMs
,MMs AS
(
SELECT c.MM
,m.OtherData
FROM #child AS c
LEFT JOIN #master AS m ON c.MM=m.MM
GROUP BY c.MM,m.OtherData
)
--In this CTE, the CROSS JOIN with Numbers creates 32 rows per MM; the LEFT JOIN picks up the child's VV at every position where a child exists, and COALESCE sets a zero in place of all remaining NULLs
,Masked AS
(
SELECT MMs.MM
,MMs.OtherData
,Nr
,COALESCE(VV,0) AS Val
FROM MMs
CROSS JOIN Numbers
LEFT JOIN #child AS c1 ON c1.MM=MMs.MM AND c1.POS=Nr
)
--The final SELECT uses FOR XML PATH to turn the 32 numbers in rows back into a string (ORDER BY Nr keeps the positions in order)
SELECT *
,(
SELECT Masked.Val AS [*]
FROM Masked
WHERE Masked.MM=MMs.MM
ORDER BY Masked.Nr
FOR XML PATH('')
) AS Pattern
FROM MMs;
The result
1 22010000000000100000000000000000
2 00053000000000000000000000000070
3 00100000000000000000000000000000
4 00000000000000000250000000000000
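The same numbers-table, LEFT JOIN and COALESCE idea can be tried without SQL Server. This sketch uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite has no sys.objects or FOR XML PATH, so a recursive CTE generates the numbers and the final concatenation is done in Python after an explicit ORDER BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE child (MM INTEGER, POS INTEGER, VV INTEGER);
INSERT INTO child VALUES
  (1,1,2),(1,2,2),(1,4,1),(1,15,1),
  (2,4,5),(2,5,3),(2,31,7),
  (3,3,1),
  (4,18,2),(4,19,5);
""")

# Numbers 1..32 from a recursive CTE, cross joined with the distinct MMs.
# The LEFT JOIN leaves VV NULL where no child exists; COALESCE turns it into 0.
rows = conn.execute("""
WITH RECURSIVE numbers(nr) AS (
    SELECT 1
    UNION ALL
    SELECT nr + 1 FROM numbers WHERE nr < 32
),
mms AS (SELECT DISTINCT MM FROM child)
SELECT mms.MM, numbers.nr, COALESCE(c.VV, 0) AS val
FROM mms
CROSS JOIN numbers
LEFT JOIN child AS c ON c.MM = mms.MM AND c.POS = numbers.nr
ORDER BY mms.MM, numbers.nr
""").fetchall()

# Concatenate per MM, relying on the ORDER BY above for position order.
patterns = {}
for mm, nr, val in rows:
    patterns[mm] = patterns.get(mm, "") + str(val)

for mm, pattern in patterns.items():
    print(mm, pattern)
```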


Find and classify sequential patterns for a distinct group T-SQL

I need help finding and classifying sequential patterns for each distinct key.
From the data I have, I need to create a new table that contains the key and a pattern identifier that belongs to that key.
From the example below, the patterns are as follows: keys 1 and 3 have the values 1, 2 and 3, while key 2 has the values 8, 9 and 10. When a distinct pattern exists for a key, e.g. (1, 2, 3), I need to create an entry in the table for that key and that specific pattern.
Data:
key value
1 1
1 2
1 3
2 8
2 9
2 10
3 1
3 2
3 3
Expected Output:
key pattern
1 1
2 2
3 1
Fiddle:
http://sqlfiddle.com/#!6/4fe39
Example table:
CREATE TABLE yourtable
([key] int, [value] int)
;
INSERT INTO yourtable
([key], [value])
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 8),
(2, 9),
(2, 10),
(3, 1),
(3, 2),
(3, 3)
;
You can concatenate the values together in several ways. The traditional method in SQL Server uses FOR XML PATH:
select k.[key],
stuff( (select ',' + cast(t.[value] as varchar(255))
from yourtable t
where t.[key] = k.[key]
order by t.[value]
for xml path ('')
), 1, 1, ''
) as ids
from (select distinct [key] from yourtable) k;
You can convert this to a unique number using a CTE/subquery and DENSE_RANK():
with cte as (
select k.[key],
stuff( (select ',' + cast(t.[value] as varchar(255))
from yourtable t
where t.[key] = k.[key]
order by t.[value]
for xml path ('')
), 1, 1, ''
) as ids
from (select distinct [key] from yourtable) k
)
select cte.[key], dense_rank() over (order by ids) as pattern
from cte;
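The two steps, building one signature string per key and then numbering the distinct signatures with DENSE_RANK(), can be sketched in plain Python on the question's sample rows (variable names are illustrative):

```python
from collections import defaultdict

# Sample rows from yourtable: (key, value)
rows = [(1, 1), (1, 2), (1, 3), (2, 8), (2, 9), (2, 10), (3, 1), (3, 2), (3, 3)]

# Step 1: one signature per key, values sorted numerically
# (the ORDER BY inside the FOR XML PATH subquery).
values_by_key = defaultdict(list)
for key, value in rows:
    values_by_key[key].append(value)
signatures = {k: ",".join(str(v) for v in sorted(vs)) for k, vs in values_by_key.items()}

# Step 2: DENSE_RANK() OVER (ORDER BY ids): equal signatures share a number,
# and numbers follow the sort order of the signature strings.
ranks = {sig: i for i, sig in enumerate(sorted(set(signatures.values())), start=1)}
patterns = {k: ranks[sig] for k, sig in sorted(signatures.items())}
print(patterns)  # {1: 1, 2: 2, 3: 1}
```

Because the ranking sorts strings, a signature like '10,11' would rank before '8,9'; the pattern numbers are identifiers, not a meaningful order.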

SQL - string combine based on id

I need suggestions for splitting the string in Table 1, matching its Ids with Table 2, and concatenating the values.
Table - 1
Id Tbl1Col
1 2
2 2,4
3
4 6
5 3
Table - 2
Id Tbl2Col
1 E
2 F
3 M
4 U
5 P
6 C
7 N
8 G
Query -
SELECT T2.Tbl2Col
FROM Table1 AS T1
LEFT JOIN Table2 AS T2 ON T1.Tbl1Col = T2.Id
WHERE T1.Id = @Id
If @Id = 1, the output is F, which works fine.
If @Id = 2, the output should be FU, not F,U.
Yuck! But you can use LIKE:
SELECT T2.Tbl2Col
FROM Table1 T1 LEFT JOIN
Table2 T2
ON ',' + T1.Tbl1Col + ',' LIKE '%,' + CAST(T2.Id as VARCHAR(255)) + ',%'
WHERE T1.Id = @Id;
You have a lousy data format, so this cannot make use of indexes. You should really have a separate table, with one row per Table1.id and Table2.id. Such a table is called a junction table or an association table.
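A minimal sketch of the junction-table design, using Python's sqlite3 module as a stand-in for SQL Server. The table name Table1Table2 and its columns are hypothetical; each association row encodes one entry of the old comma-separated list (so 2,4 becomes two rows), and the concatenation happens in Python after an ordered join:

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER PRIMARY KEY);
CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Tbl2Col TEXT);
-- Hypothetical junction table: one row per (Table1.Id, Table2.Id) pair
CREATE TABLE Table1Table2 (
    Table1Id INTEGER REFERENCES Table1(Id),
    Table2Id INTEGER REFERENCES Table2(Id),
    PRIMARY KEY (Table1Id, Table2Id)
);
INSERT INTO Table1 VALUES (1),(2),(3),(4),(5);
INSERT INTO Table2 VALUES (1,'E'),(2,'F'),(3,'M'),(4,'U'),(5,'P'),(6,'C'),(7,'N'),(8,'G');
INSERT INTO Table1Table2 VALUES (1,2),(2,2),(2,4),(4,6),(5,3);
""")

# Plain key joins: no string splitting or LIKE, so indexes can be used.
rows = conn.execute("""
SELECT t1.Id, t2.Tbl2Col
FROM Table1 AS t1
LEFT JOIN Table1Table2 AS j ON j.Table1Id = t1.Id
LEFT JOIN Table2 AS t2 ON t2.Id = j.Table2Id
ORDER BY t1.Id, t2.Id
""").fetchall()

# Concatenate per Table1.Id; an Id with no associations yields ''.
combined = {
    t1_id: "".join(col for _, col in grp if col is not None)
    for t1_id, grp in groupby(rows, key=lambda r: r[0])
}
print(combined)
```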
create table dbo.Table01 (
Id int
, Col varchar(100)
);
create table dbo.Table02 (
Id int
, Col varchar(100)
);
insert into dbo.Table01 (Id, Col)
values (1, '2'), (2, '2, 4');
insert into dbo.Table02 (Id, Col)
values (1, 'E'), (2, 'F'), (4, 'U');
select
t.Id
, STRING_AGG(t02.Col, '') as StringAgg
from dbo.Table01 t
cross apply string_split (t.Col, ',') as ss
inner join dbo.Table02 t02 on ss.value = t02.Id
group by t.id
Follow this approach:-
1) Turn the comma-separated string into individual rows using CROSS APPLY with XML.
2) Join the two tables with a LEFT JOIN.
3) Concatenate the rows that share the same id using STUFF and FOR XML.
4) Use the REPLACE function to remove the commas.
Demo:-
create table #MyTable (id int , Tbl1Col varchar(10))
insert into #MyTable values (1,'2'),(2,'2,4'),(3,''),(4,'6'),(5,'3')
create table #MyTable2 (id int , Tbl2Col varchar(10))
insert into #MyTable2 values (1,'E'),(2,'F'),(3,'M'),(4,'U'),(5,'P'),(6,'C'),(7,'N'),(8,'G')
select a.id , Tbl2Col
into #TestTable
from
(
SELECT A.id,
Split.a.value('.', 'VARCHAR(100)') AS Tbl1Col
FROM
(
SELECT id,
CAST ('<M>' + REPLACE(Tbl1Col, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM #MyTable
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a) ) a
left join #MyTable2 b
on a.Tbl1Col = b.id
order by a.id
SELECT id, Tbl2Col =
Replace(STUFF((SELECT DISTINCT ', ' + Tbl2Col
FROM #TestTable b
WHERE b.id = a.id
FOR XML PATH('')), 1, 2, ''),',','')
FROM #TestTable a
GROUP BY id
Output:-
1 F
2 F U
3 NULL
4 C
5 M
References:-
Turning a Comma Separated string into individual rows
How to concatenate many rows with same id in sql?
Finally:-
Don't use this approach in production; normalize your database instead. Treat this code as an exercise for fun or training.

T-SQL Summation

I'm trying to create a result set with 3 columns. Each column comes from summing one column of Table A, grouped by a different ID. Here's an overview of what I want to do:
Table A
ID Val.1
1 4
1 5
1 6
2 7
2 8
2 9
3 10
3 11
3 12
I wanted to create something like..
ROW SUM.VAL.1 SUM.VAL.2 SUM.VAL.3
1 15 24 33
I understand that I can't get this using UNION. I was thinking of using a CTE, but I'm not quite sure of the logic.
You need conditional aggregation:
select 1 as Row,
sum(case when ID = 1 then [Val.1] end) as [val.1],
sum(case when ID = 2 then [Val.1] end) as [val.2],
sum(case when ID = 3 then [Val.1] end) as [val.3]
From yourtable
You may need a dynamic cross tab or pivot if the number of IDs is not static:
DECLARE @col_list VARCHAR(8000) = Stuff((SELECT ',sum(case when ID = ' + Cast(ID AS VARCHAR(20)) + ' then [Val.1] end) as [val.' + Cast(ID AS VARCHAR(20)) + ']'
FROM Yourtable
GROUP BY ID
ORDER BY ID
FOR xml path('')), 1, 1, '')
exec('select 1 as Row, ' + @col_list + ' from Yourtable')
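The dynamic column list can also be generated in application code. This Python sketch uses sqlite3 as a stand-in for SQL Server and builds one SUM(CASE ...) column per distinct ID, mirroring the col_list variable in the T-SQL above. The IDs are integers read from the table, which is what makes formatting them into the SQL text safe in this sketch; arbitrary strings would need proper quoting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Yourtable (ID INTEGER, "Val.1" INTEGER)')
conn.executemany(
    "INSERT INTO Yourtable VALUES (?, ?)",
    [(1, 4), (1, 5), (1, 6), (2, 7), (2, 8), (2, 9), (3, 10), (3, 11), (3, 12)],
)

# One conditional-aggregation column per distinct ID, in ID order.
ids = [row[0] for row in conn.execute("SELECT DISTINCT ID FROM Yourtable ORDER BY ID")]
col_list = ", ".join(
    f'SUM(CASE WHEN ID = {int(i)} THEN "Val.1" END) AS "val.{int(i)}"' for i in ids
)
sql = f'SELECT 1 AS "Row", {col_list} FROM Yourtable'
result = conn.execute(sql).fetchone()
print(result)  # (1, 15, 24, 33)
```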
I think pivoting the data table will yield the desired result.
IF OBJECT_ID('tempdb..#TableA') IS NOT NULL
DROP TABLE #TableA
CREATE TABLE #TableA
(
RowNumber INT,
ID INT,
Value INT
)
INSERT #TableA VALUES (1, 1, 4)
INSERT #TableA VALUES (1, 1, 5)
INSERT #TableA VALUES (1, 1, 6)
INSERT #TableA VALUES (1, 2, 7)
INSERT #TableA VALUES (1, 2, 8)
INSERT #TableA VALUES (1, 2, 9)
INSERT #TableA VALUES (1, 3, 10)
INSERT #TableA VALUES (1, 3, 11)
INSERT #TableA VALUES (1, 3, 12)
-- https://msdn.microsoft.com/en-us/library/ms177410.aspx
SELECT RowNumber, [1] AS Sum1, [2] AS Sum2, [3] AS Sum3
FROM
(
SELECT RowNumber, ID, Value
FROM #TableA
) a
PIVOT
(
SUM(Value)
FOR ID IN ([1], [2], [3])
) AS p
This technique works if the IDs you are seeking are fixed; if they change, dynamic SQL should work as well.
https://msdn.microsoft.com/en-us/library/ms177410.aspx

Splitting of string by fixed keyword

I currently have a table with a column that I would like to split.
ID Serial
1 AAA"-A01-AU-234-U_xyz(CY)(REV-002)
2 AAA"-A01-AU-234-U(CY)(REV-1)
3 AAA"-A01-AU-234-U(CY)(REV-101)
4 VVV"-01-AU-234-Z_ww(REV-001)
5 VVV"-01-AU-234-Z(REV-001)_xyz(CY)
6 V-VV"-01-AU-234-Z(REV-03)_xyz(CY)
7 V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)
I would like to split this column into 2 fields via a SELECT statement.
The first field should consist of the text from the start of the string up to the point defined by these rules:
After the first "-
take all text up to the next 3 hyphens (-)
then take the first letter after the last hyphen (-)
The second field should store the integer value inside the (REV-...) brackets. REV is always enclosed in brackets, as in (REV-xxx); the number can range from 0 to 999 and can be written in different forms (for example 1, 03 or 002).
Example of output
Field 1 Field 2
AAA"-A01-AU-234-U 2
AAA"-A01-AU-234-U 1
AAA"-A01-AU-234-U 101
VVV"-01-AU-234-Z 1
VVV"-01-AU-234-Z 1
V-VV"-01-AU-234-Z 3
V-VV"-01-AU-234-Z 4
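One reading of the two rules, expressed as regular expressions in a Python sketch (not taken from the answers). It assumes the first field ends with the single character after the third hyphen that follows the first "-, and that the revision is always written as (REV-digits):

```python
import re

# Field 1: lazily up to the first '"-', then three hyphen-terminated chunks,
# then the single character after the last of those hyphens.
FIELD1 = re.compile(r'^(.*?"-[^-]*-[^-]*-[^-]*-.)')
# Field 2: the 1-3 digits inside "(REV-...)"; int() drops leading zeros.
FIELD2 = re.compile(r'\(REV-(\d{1,3})\)', re.IGNORECASE)

def split_serial(serial):
    f1 = FIELD1.match(serial)
    f2 = FIELD2.search(serial)
    return (f1.group(1) if f1 else None, int(f2.group(1)) if f2 else None)

samples = [
    'AAA"-A01-AU-234-U_xyz(CY)(REV-002)',
    'AAA"-A01-AU-234-U(CY)(REV-1)',
    'AAA"-A01-AU-234-U(CY)(REV-101)',
    'VVV"-01-AU-234-Z_ww(REV-001)',
    'VVV"-01-AU-234-Z(REV-001)_xyz(CY)',
    'V-VV"-01-AU-234-Z(REV-03)_xyz(CY)',
    'V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)',
]
for s in samples:
    print(split_serial(s))
```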
Maybe it is possible to make it better and faster, but at least it works. If I have more time, I will look at this again to find a better solution, but it does the job.
create table #t
(
id int,
serial nvarchar(255)
)
go
insert into #t values (1, 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
insert into #t values (2, 'AAA"-A01-AU-234-U(CY)(REV-1)')
insert into #t values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
insert into #t values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
insert into #t values (5, 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
insert into #t values (6, 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
insert into #t values (7, 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')
go
select id, serial,
left(serial,charindex('-', serial, charindex('-', serial, charindex('-', serial, charindex('"',serial) + 2) +1) + 1) + 1) as 'Field1'
,cast( replace(left(right(serial, len(serial) - charindex('REV',serial) +1 ), CHARINDEX(')',right(serial, len(serial) - charindex('REV',serial) +1 )) - 1), 'REV-', '')as int) as 'Field2'
from #t
go
gives me:
id serial Field1 Field2
1 AAA"-A01-AU-234-U_xyz(CY)(REV-002) AAA"-A01-AU-234-U 2
2 AAA"-A01-AU-234-U(CY)(REV-1) AAA"-A01-AU-234-U 1
3 AAA"-A01-AU-234-U(CY)(REV-101) AAA"-A01-AU-234-U 101
4 VVV"-01-AU-234-Z_ww(REV-001) VVV"-01-AU-234-Z 1
5 VVV"-01-AU-234-Z(REV-001)_xyz(CY) VVV"-01-AU-234-Z 1
6 VVV"-01-AU-234-Z(REV-03)_xyz(CY) VVV"-01-AU-234-Z 3
7 VVV"-01-AU-234-Z(REV-004)_xyz(CY) VVV"-01-AU-234-Z 4
I came up with a solution in PHP using regular expressions. I am trying to convert it to the POSIX regular expressions supported by MySQL. In the meantime, you can have a look at this; it works perfectly.
/* The first script selects the values for field 1, namely AAA"-A01-AU-234-U */
<?php
$txt='VVV"-01-AU-234-Z(REV-001)_xyz(CY)';
$re1='((?:[a-z][a-z0-9_]*))';
$re2='.*?';
$re3='(\\d+)';
$re4='.*?';
$re5='((?:[a-z][a-z0-9_]*))';
$re6='.*?';
$re7='(\\d+)';
$re8='.*?';
$re9='([a-z])';
echo $re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9;
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9."/is", $txt, $matches))
{
$var1=$matches[1][0];
$int1=$matches[2][0];
$var2=$matches[3][0];
$int2=$matches[4][0];
$w1=$matches[5][0];
print "($var1) ($int1) ($var2) ($int2) ($w1) \n";
}
?>
/*The second script selects values for field 2 namely the last integer*/
<?php
$txt='VVV"-01-AU-234-Z_ww(REV-001)';
$re1='.*?';
$re2='\\d';
$re3='.*?';
$re4='\\d';
$re5='.*?';
$re6='\\d';
$re7='.*?';
$re8='\\d';
$re9='.*?';
$re10='\\d';
$re11='.*?';
$re12='\\d';
$re13='.*?';
$re14='\\d';
$re15='(\\d)';
if ($c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7.$re8.$re9.$re10.$re11.$re12.$re13.$re14.$re15."/is", $txt, $matches))
{
$d1=$matches[1][0];
print "($d1) \n";
}
?>
OUTPUT:
(VVV) (01) (AU) (234) (Z) //script 1
(1) //script 2
You can add a database connection to the script and store the results in a new table. You can also iterate over each row as input to the script and store the corresponding results in the table.
Note:
The regular expression used for selecting field 1:
((?:[a-z][a-z0-9_]*)).*?(\d+).*?((?:[a-z][a-z0-9_]*)).*?(\d+).*?([a-z])
The regular expression used for selecting field 2:
.*?\d.*?\d.*?\d.*?\d.*?\d.*?\d.*?\d(\d)
If anybody can convert the above expressions to POSIX syntax, you could then write a simple query like:
select t.serial as field1, q.serial as field2
from yourtable t
join (select t1.id, t1.serial
from yourtable t1
where t1.serial regexp 'converted exp') q
on q.id = t.id
where t.serial regexp 'converted exp';
I tried to convert it, but the matching constraints were lost. You would need to change ?: to ^, ? to [^>], and \d to [0-9] or [[:digit:]]. Hope it helps.
Try this solution. It uses a combination of charindex and the substring function.
CREATE TABLE #TempTable
(
id int,
serial nvarchar(255)
)
insert into #TempTable values (1, 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)')
insert into #TempTable values (2, 'AAA"-A01-AU-234-U(CY)(REV-1)')
insert into #TempTable values (3, 'AAA"-A01-AU-234-U(CY)(REV-101)')
insert into #TempTable values (4, 'VVV"-01-AU-234-Z_ww(REV-001)')
insert into #TempTable values (5, 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)')
insert into #TempTable values (6, 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)')
insert into #TempTable values (7, 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)')
select
id,
serial,
substring(serial, 1, P4.Pos+1) as field1,
convert(int, substring(Serial, P6.Pos , P7.Pos - P6.Pos)) as field2
from #TempTable
cross apply (select (charindex('-', Serial))) as P1(Pos)
cross apply (select (charindex('-', Serial, P1.Pos+1))) as P2(Pos)
cross apply (select (charindex('-', Serial, P2.Pos+1))) as P3(Pos)
cross apply (select (charindex('-', Serial, P3.Pos+1))) as P4(Pos)
cross apply (select (charindex('REV-', Serial,P1.Pos+1)+4)) as P6(Pos)
--+4 because 'REV-' is 4 chars long
cross apply (select (charindex(')', Serial,P6.Pos+1))) as P7(Pos);
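The chain of CROSS APPLY positions is easier to follow as plain code. This Python sketch emulates CHARINDEX (1-based, 0 when not found) with str.find; the helper names are illustrative. It also shows a limitation: because P1 is simply the first hyphen, a value such as V-VV"-01-... from the question is cut in the wrong place:

```python
def charindex(sub, s, start=1):
    """1-based like T-SQL CHARINDEX; returns 0 when sub is not found."""
    return s.find(sub, max(start, 1) - 1) + 1

def split_serial_charindex(serial):
    p1 = charindex('-', serial)
    p2 = charindex('-', serial, p1 + 1)
    p3 = charindex('-', serial, p2 + 1)
    p4 = charindex('-', serial, p3 + 1)
    p6 = charindex('REV-', serial, p1 + 1) + 4    # +4 skips 'REV-'
    p7 = charindex(')', serial, p6 + 1)
    field1 = serial[:p4 + 1]                       # substring(serial, 1, P4.Pos + 1)
    field2 = int(serial[p6 - 1:p7 - 1])            # substring(Serial, P6.Pos, P7.Pos - P6.Pos)
    return field1, field2

print(split_serial_charindex('AAA"-A01-AU-234-U_xyz(CY)(REV-002)'))
print(split_serial_charindex('VVV"-01-AU-234-Z(REV-03)_xyz(CY)'))
print(split_serial_charindex('V-VV"-01-AU-234-Z(REV-03)_xyz(CY)'))  # field 1 is cut too early
```

Anchoring the first position on '"-' instead of the first hyphen avoids that problem.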
I have updated my answer. Is this better now?
DECLARE @Table table(ID int, SERIAL nvarchar(100));
INSERT INTO @Table(ID, SERIAL)
VALUES ('1', 'AAA"-A01-AU-234-U_xyz(CY)(REV-002)'),
('2', 'AAA"-A01-AU-234-U(CY)(REV-1)'),
('3', 'AAA"-A01-AU-234-U(CY)(REV-101)'),
('4', 'VVV"-01-AU-234-Z_ww(REV-001)'),
('5', 'VVV"-01-AU-234-Z(REV-001)_xyz(CY)'),
('6', 'VVV"-01-AU-234-Z(REV-03)_xyz(CY)'),
('7', 'VVV"-01-AU-234-Z(REV-004)_xyz(CY)'),
('8', 'AAA"-A01-AU-234-U-1111-(REV-111)'),
('9', 'AAA"-A01-AU-234-U-111111-5555(CY)(REV-101)'),
('10', 'V-VV"-01-AU-234-Z-ZZZ(REV-004)_xyz(CY)')
SELECT
ID,
SERIAL,
LEFT(SERIAL, P5.Pos + 1) AS Field1,
CONVERT(int, SUBSTRING(SERIAL, P6.Pos, CHARINDEX(')', RIGHT(SERIAL, LEN(SERIAL) - P6.Pos)))) AS Field2
FROM @Table
CROSS APPLY (SELECT CHARINDEX('"-', SERIAL)) AS P1(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P1.Pos + 1)) AS P2(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P2.Pos + 1)) AS P3(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P3.Pos + 1)) AS P4(Pos)
CROSS APPLY (SELECT CHARINDEX('-', SERIAL, P4.Pos + 1)) AS P5(Pos)
CROSS APPLY (SELECT CHARINDEX('REV-', SERIAL, P5.Pos + 1) + 4) AS P6(Pos)

Trying to accomplish without dynamic SQL (sql server)

All,
I'm trying to pull off an insert from one table to another without using dynamic sql. However, the only solutions I'm coming up with at the moment use dynamic sql. It's been tricky to search for any similar scenarios.
Here are the details:
My starting point is the following legacy table:
CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Red')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Orange')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Yellow')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Green')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Blue')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Indigo')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'Violet')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'A')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'B')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'C')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'D')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'E')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'F')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'G')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'H')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'I')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'J')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'K')
SELECT * FROM _Combinations
The _Combinations table contains a key for different types of attributes (AttributeID) and the possible values for each attribute (Value).
In this case, there are 3 different attributes with multiple possible values, however there can be many more (up to 10).
The requirement is then to create every possible combination of each value and store it normalized, as there will be other data stored with each possible combination. I need to store both the attribute keys and values that make up each combination, so it's not just a simple cross join to display each combination. The target table for storing each combination of attributes is this:
CREATE TABLE [dbo].[_CombinedAttributes](
[GroupKey] [int] NULL,
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
So attribute combination records using the above data would look like this in the target table:
GroupKey AttributeID Value
1 8 A
1 16 1
1 28 Red
2 8 B
2 16 1
2 28 Red
This gives me what I need. Each group has an identifier and I can track the attributeIDs and values that make up each group. I'm using two scripts to get from the _Combinations table to the format of the _CombinedAttributes table:
-- SCRIPT #1
SELECT Identity(int) AS RowNumber, * INTO #Test
FROM (
SELECT AttributeID AS Attribute1, Value AS Value1 FROM _Combinations WHERE AttributeID = 8) C1
CROSS JOIN
(
SELECT AttributeID AS Attribute2, Value AS Value2 FROM _Combinations WHERE AttributeID = 16) C2
CROSS JOIN
(
SELECT AttributeID AS Attribute3, Value AS Value3 FROM _Combinations WHERE AttributeID = 28) C3
-- SCRIPT #2
INSERT INTO _CombinedAttributes
SELECT RowNumber AS GroupKey, Attribute1, Value1
FROM #Test
UNION ALL
SELECT RowNumber, Attribute2, Value2
FROM #Test
UNION ALL
SELECT RowNumber, Attribute3, Value3
FROM #Test
ORDER BY GroupKey, Attribute1
The above two scripts work, but obviously there are some drawbacks. Namely, I need to know how many attributes I'm dealing with, and the IDs are hard-coded, so I can't generate this on the fly. My current solution builds the strings for Script 1 and Script 2 by looping through the attributes in the _Combinations table and generating execution strings, which is long and messy, but I can post it if needed. Can anyone see a way to produce the format for the final insert without dynamic SQL?
This routine wouldn't be run very much, but it's going to be run enough that I'd like to not be doing any execute string building and use straight SQL.
Thanks in advance.
UPDATE:
When I use a second dataset, Gordon's code no longer returns correct results: it creates groups with only 1 attribute near the end. On this second dataset, however, I get the correct row count with Nathan's routine (the final result should have 396 rows). But as I stated in the comments, with the first dataset I get the opposite result: Gordon's returns correctly, but Nathan's code has duplicates. I'm at a loss. Here is the second dataset:
DROP TABLE [dbo].[_Combinations]
GO
CREATE TABLE [dbo].[_Combinations](
[AttributeID] [int] NULL,
[Value] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'1')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (16, N'2')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'<=39')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'40-44')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'45-49')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'50-54')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'55-64')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (28, N'65+')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'AA')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'JJ')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'CC')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'DD')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'EE')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'KK')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'BB')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'FF')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'GG')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'HH')
INSERT [dbo].[_Combinations] ([AttributeID], [Value]) VALUES (8, N'II')
I think this solves your problem.
Here is the approach. First, observe that the final data has the product of the number of values of each attribute: 2*7*11 = 154 combinations. Then observe that each value occurs a fixed number of times. For AttributeId = 16, each value occurs 154 / 2 = 77 times, because there are two values.
So, the idea is to calculate the number of times that each value appears, then generate the list of all the values. The final challenge is to assign the group numbers to these. For this, I use row_number() partitioned by the attribute id. To be honest, I'm not 100% sure that the grouping assignment is correct (it makes sense and it passed the eyeball test), but I'm worried that I'm missing a subtlety.
Here is the query:
with attributecount1 as (
select c.AttributeId, count(*) as cnt
from _Combinations c
group by c.AttributeId
),
const as (
select exp(sum(log(cnt))) as tot, count(*) as numattr
from attributecount1
),
attributecount as (
select a.*,
(tot / a.cnt) as numtimes
from attributecount1 a cross join const
),
thevalues as (
select c.AttributeId, c.Value, ac.numtimes, 1 as seqnum
from AttributeCount ac join
_Combinations c
on ac.AttributeId = c.AttributeId
union all
select v.AttributeId, v.Value, v.numtimes, v.seqnum + 1
from thevalues v
where v.seqnum + 1 <= v.numtimes
)
select row_number() over (partition by AttributeId order by seqnum, Value) as groupnum,
*
from thevalues
order by 1, 2
EDIT:
Unfortunately, I don't have access to SQL Server today and SQL Fiddle is acting up.
The problem is solvable. The above solution works but, as stated in my comment, only when the attribute counts are pairwise coprime (no two of them share a common factor). The problem is the assignment of the group number to the values. It turns out that this is a problem in number theory.
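A small Python sketch (not part of the original answer) of why coprimality matters. In the first query, row_number() ordered by (seqnum, Value) cycles through each attribute's n values, so group g effectively gets value index g mod n for every attribute. By the Chinese remainder theorem those tuples are all distinct exactly when the counts are pairwise coprime: 2, 7 and 11 from the first dataset qualify, while 2, 6 and 11 from the second do not:

```python
from itertools import product
from math import prod

def row_number_groups(counts):
    """Group g (0-based) gets value index g % n for each attribute."""
    return [tuple(g % n for n in counts) for g in range(prod(counts))]

def covers_all(counts):
    every_combo = set(product(*(range(n) for n in counts)))
    return set(row_number_groups(counts)) == every_combo

print(covers_all([2, 7, 11]))                   # True: pairwise coprime
print(covers_all([2, 6, 11]))                   # False: 2 and 6 share the factor 2
print(len(set(row_number_groups([2, 6, 11]))))  # 66 distinct groups out of 132
```

With 2, 6 and 11 the tuples repeat every lcm(2, 6, 11) = 66 groups, so half of the combinations never appear and the others appear twice.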
Essentially, we want to enumerate the combinations. If there were two attributes with two values each, then it would be:
group 0: 1 1
group 1: 1 2
group 2: 2 1
group 3: 2 2
You can see a relationship between the group number and which values are assigned -- based on the binary representation of the group number. If this were 2x3, then it would look like:
group 0: 1 1
group 1: 1 2
group 2: 1 3
group 3: 2 1
group 4: 2 2
group 5: 2 3
Same idea, but now the representation is not binary: each position in the number has a different base (a mixed-radix number). No problem.
So, the challenge is mapping a number (such as the group number) to each digit. This requires appropriate division and modulo arithmetic.
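The division-and-modulo mapping can be sketched in Python. This mirrors the (n / cum) % cnt join condition in the Postgres and SQL Server versions of this answer, where cum is the product of the counts of the earlier attributes; the function names are illustrative:

```python
from math import prod

def decode(group, counts):
    """Split a 0-based group number into one 0-based value index per attribute.

    cum is the product of the counts of the earlier attributes, as in the
    (n / cum) % cnt join condition, so the first attribute varies fastest.
    """
    digits = []
    cum = 1
    for cnt in counts:
        digits.append((group // cum) % cnt)
        cum *= cnt
    return tuple(digits)

def all_groups(counts):
    return [decode(g, counts) for g in range(prod(counts))]

# The 2x3 listing above has the last position varying fastest:
# reverse the attribute order and shift to 1-based values.
for g in range(6):
    print("group", g, tuple(d + 1 for d in reversed(decode(g, [3, 2]))))

# Unlike the row_number() assignment, this works without coprime counts:
print(len(set(all_groups([2, 6, 11]))))  # 132
```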
The following implements this in Postgres:
with c as (
select 1 as attrid, '1' as val union all
select 1 as attrid, '2' as val union all
select 2 as attrid, 'A' as val union all
select 2 as attrid, 'B' as val union all
select 3 as attrid, '10' as val union all
select 3 as attrid, '20' as val
),
c1 as (
select c.*, dense_rank() over (order by attrid) as attrnum,
dense_rank() over (partition by attrid order by val) as valnum,
count(*) over (partition by attrid) as cnt
from c
),
a1 as (
select attrid, count(*) as cnt,
cast(round(exp(sum(ln(count(*))) over (order by attrid rows between unbounded preceding and current row))) as int)/count(*) as cum
from c
group by attrid
),
a2 as (
select a.*,
(select cast(round(exp(sum(ln(cnt)))) as int)
from a1
where a1.attrid <= a.attrid
) / cnt as cum
from a1 a
),
const as (
select cast(round(exp(sum(ln(cnt)))) as int) as numrows
from a1
),
nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all
select 5 union all select 6 union all select 7 union all select 8
from const
),
ac as (
select c1.*, a1.cum, const.numrows
from c1 join
a1 on c1.attrid = a1.attrid cross join
const
)
select *
from nums join
ac
on (nums.n/cum) % cnt = valnum - 1
order by 1, 2;
(Note: generate_series() was not working correctly for some reason with certain joins, which is why it manually generates the sequence of numbers.)
When SQL Fiddle gets working again, I should be able to translate this back to SQL Server.
EDIT II:
Here is the version that works in SQL Server:
with attributecount1 as (
select c.AttributeId, count(*) as cnt
from _Combinations c
group by c.AttributeId
),
const as (
select cast(round(exp(sum(log(cnt))), 1) as int) as tot, count(*) as numattr
from attributecount1
),
attributecount as (
select a.*,
(tot / a.cnt) as numtimes,
(select cast(round(exp(sum(log(ac1.cnt))), 1) as int)
from attributecount1 ac1
where ac1.AttributeId <= a.AttributeId
) / a.cnt as cum
from attributecount1 a cross join const
),
c as (
select c.*, ac.numtimes, ac.cum, ac.cnt,
dense_rank() over (order by c.AttributeId) as attrnum,
dense_rank() over (partition by c.AttributeId order by Value) as valnum
from _Combinations c join
AttributeCount ac
on ac.AttributeId = c.AttributeId
),
nums as (
select 1 as n union all
select 1 + n
from nums cross join const
where 1 + n <= const.tot
)
select *
from nums join
c
on (nums.n / c.cum)%c.cnt = c.valnum - 1
option (MAXRECURSION 1000)
Years ago I faced a similar problem with a fixed EAV schema not unlike yours. Peter Larsson came up with the below solution to address my "dynamic combinations" query.
I've adapted it to fit your schema. Hope this helps!
;with cteSource (Iteration, AttributeID, recID, Items, Unq, Perm) as
(
select v.Number + 1,
s.AttributeId,
row_number() over (order by v.Number, s.AttributeID) - 1,
s.Items,
u.Unq,
f.Perm
from (select AttributeID, count(*) from _Combinations group by AttributeID) s(AttributeId, Items)
cross
join (select count(distinct AttributeID) from _Combinations) u (Unq)
join master..spt_values as v on v.Type = 'P'
outer
apply (
select top(1) cast(exp(sum(log(count(*))) over ()) as bigint)
from _Combinations as w
where w.AttributeID >= s.AttributeID
group
by w.AttributeID
having count(*) > 1
) as f(Perm)
where v.Number < (select top(1) exp(sum(log(count(*))) over()) from _Combinations as x group by x.AttributeID)
)
select s.Iteration,
s.AttributeID,
w.Value
from cteSource as s
cross
apply (
select Value,
row_number() over (order by Value) - 1
from _Combinations
where AttributeID = s.AttributeID
) w(Value, recID)
where coalesce(s.recID / (s.Perm * s.Unq / s.Items), 0) % s.Items = w.recID
order
by s.Iteration, s.AttributeId;
I've decided to post this, just for the sake of a procedural solution appearing in parallel with the CTE-based ones.
The following produces a zero-based GroupKey column. If you want it to start from 1, simply change @i to @i+1 in the last insert ... select.
-- Add a zero-based row number, partitioned by AttributeId
declare @Attrs table (AttributeId int,Value varchar(50),RowNum int)
insert into @Attrs
select
AttributeId,Value,
ROW_NUMBER()over(partition by AttributeId order by AttributeId,Value)-1
from _Combinations
-- AttributeId value counts
declare @AttCount table (AttributeId int,n int)
insert into @AttCount
select AttributeId,COUNT(*) n from @Attrs
group by AttributeID
-- Total number of combos -- Multiply all AttributeId counts
-- EXP(SUM(LOG(n))) didn't work as expected
-- so fall back to good old cursors...
declare @ncombos int,@num int
declare mulc cursor for select n from @AttCount
open mulc
set @ncombos=1
fetch next from mulc into @num
while @@FETCH_STATUS=0
begin
set @ncombos=@ncombos*@num
fetch next from mulc into @num
end
close mulc
deallocate mulc
-- Now let's get our hands dirty...
-- @m is the product of the counts of the attributes already processed,
-- so (@i/@m)%@n picks this attribute's value for combination @i
declare @i int,@m int,@atid int,@n int,@r int
declare c cursor for select AttributeId,n from @AttCount
open c
fetch next from c into @atid,@n
set @m=1
while @@FETCH_STATUS=0
begin
set @i=0
while @i<@ncombos
begin
set @r=(@i/@m)%@n
insert into _CombinedAttributes (GroupKey,AttributeId,Value)
select @i,@atid,value from @Attrs where AttributeId=@atid and RowNum=@r
set @i=@i+1
end
set @m=@m*@n
fetch next from c into @atid,@n
end
close c
deallocate c
Hint: I didn't use exp(sum(log())) to emulate a product aggregate because its floating-point result is prone to rounding errors.
Recursive Solution
The following is a recursive solution:
with a as ( -- unique AttributeIDs
select AttributeID
,Row_Number() over(order by AttributeID) as rowNo
,count(*) as cnt
from [dbo].[_Combinations]
group by AttributeID
),
r as (
-- start recursion: list all values of the first attribute
select Dense_Rank() over(order by c.[Value]) - 1 as GroupKey
,c.AttributeID
,c.[Value]
,a.cnt as factor
,1 as level
from a
join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
where a.rowNo = 1
union all
-- recursion step: add the combinations with the values of the next attribute
select GroupKey
,case when AttributeID = 'prev' then prevAttribID else currAttribID end as AttributeID
,[Value]
,factor
,level
from (select r.Value as prev
,c.Value as curr
,(Dense_Rank() over(order by c.[Value]) - 1) * r.factor + r.GroupKey as GroupKey
,r.level + 1 as level
,r.factor * a.cnt as factor
,r.AttributeID as prevAttribID
,a.AttributeID as currAttribID
from r
join a on r.level + 1 = a.rowNo
join [dbo].[_Combinations] as c on a.AttributeID = c.AttributeID
) as p
unpivot ( Value for AttributeID in (prev, curr)) as up
)
-- get result: this is the data from the deepest level
select distinct
GroupKey + 1 as GroupKey -- start with one instead of zero
,AttributeID
,[Value]
from r
where level = (select count(*) from a)
order by GroupKey, AttributeID, [Value]
Dynamic Solution
And this is a slightly shorter version using a dynamic statement:
declare @stmt varchar(max);
with a as ( -- unique attribute keys, cast here to avoid casting when building the dynamic statement
select distinct cast(AttributeID as varchar(10)) as ID
from [dbo].[_Combinations]
)
select @stmt = 'select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
from
(
select '
+ (select ' C' + ID + '.Value as V' + ID + ', ' from a for xml path(''))
+ ' Row_Number() over(order by '
+ stuff((select ', C' + ID + '.Value' from a for xml path('')), 1, 2, '')
+ ') AS GroupKey from '
+ stuff((select ' cross join [dbo].[_Combinations] as C' + ID from a for xml path('')), 1, 11, '')
+ ' where '
+ stuff((select ' and C' + ID + '.AttributeID = ' + ID from a for xml path('')), 1, 4, '')
+ ') as p unpivot (Value for AttributeIDStr in ('
+ stuff((select ', V' + ID from a for xml path('')), 1, 2, '')
+ ')) as up'
;
exec (@stmt)
Before SQL Server 2017 added STRING_AGG, SQL Server had no list aggregate function like those in other databases, so one had to use the ugly stuff((select ... for xml path(''))) expression.
The statement produced for the sample data is - apart from whitespace differences - the following:
select GroupKey, Cast(SubString(AttributeIDStr, 2, 100) as int) as AttributeID, Value
from
(
select C16.Value as V16
,C28.Value as V28
,C8.Value as V8
,Row_Number() over(order by C16.Value, C28.Value, C8.Value) AS GroupKey
from [dbo].[_Combinations] as C16
cross join
[dbo].[_Combinations] as C28
cross join
[dbo].[_Combinations] as C8
where C16.AttributeID = 16
and C28.AttributeID = 28
and C8.AttributeID = 8
) as p
unpivot ( Value for AttributeIDStr in (V16, V28, V8)) as up
Both solutions avoid the multiplication aggregation workaround using exp(log()) that is used in some other answers, which is very sensitive to rounding errors.
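For comparison, the cross join plus the unpivot step fits in a few lines of Python with itertools.product. This is only a sketch on the question's second dataset: GroupKey numbering here follows Python's sort order of AttributeID and Value, which may differ from the generated statement above (it orders by C16, C28, C8) and from SQL Server collation order, but the set of combinations is the same:

```python
from collections import defaultdict
from itertools import product

# Second dataset from the question: (AttributeID, Value)
second_dataset = [
    (16, '1'), (16, '2'),
    (28, '<=39'), (28, '40-44'), (28, '45-49'), (28, '50-54'), (28, '55-64'), (28, '65+'),
    (8, 'AA'), (8, 'JJ'), (8, 'CC'), (8, 'DD'), (8, 'EE'), (8, 'KK'),
    (8, 'BB'), (8, 'FF'), (8, 'GG'), (8, 'HH'), (8, 'II'),
]

def combined_attributes(rows):
    """Return (GroupKey, AttributeID, Value) rows, one group per combination."""
    values = defaultdict(list)
    for attr, value in rows:
        values[attr].append(value)
    attrs = sorted(values)                       # attribute "columns" in a fixed order
    value_lists = [sorted(values[a]) for a in attrs]
    out = []
    for group_key, combo in enumerate(product(*value_lists), start=1):
        for attr, value in zip(attrs, combo):    # the UNPIVOT step: one row per attribute
            out.append((group_key, attr, value))
    return out

result = combined_attributes(second_dataset)
print(len(result))  # 396
print(result[:3])
```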
Regarding the issue with exp(sum(log(count(*))) over ()), the answer for me seemed to be to introduce the ROUND function to the mix. Thus, the following snippet seems to produce a reliable answer (so far at least):
ROUND(exp(sum(log(count(*))) over ()), 0)
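Why ROUND helps: EXP(SUM(LOG(n))) returns a float that can land slightly below or above the exact product, and converting a float to an integer truncates, so a value just under 154 would become 153. A Python sketch of the same emulation; the count lists are illustrative, and whether a given list lands above or below the integer depends on floating-point details, so no outputs are claimed:

```python
import math

def product_via_logs(counts):
    """Float emulation of a product aggregate, like EXP(SUM(LOG(n))) in T-SQL."""
    return math.exp(sum(math.log(n) for n in counts))

for counts in ([2, 7, 11], [2, 6, 11], [3, 5, 7, 9, 10]):
    approx = product_via_logs(counts)
    exact = math.prod(counts)  # exact integer product (Python 3.8+)
    # int() truncates like CAST(... AS INT); rounding first recovers the integer.
    print(counts, repr(approx), int(approx), round(approx), exact)
```

Rounding is reliable only while the accumulated error stays well below 0.5; exact integer multiplication, as in the cursor-based answer, avoids the issue entirely.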