I have a simple table with one column. When I select everything from it, the rows look like this:
| a, b, c | - 1st row
| b, d, d | - 2nd row
| d, e, f | - 3rd row
Now I'm trying to split those values by comma so that each value ends up in a separate row, something like this:
|a| - 1st row
|b| - 2nd row
|c| - 3rd row
|d| - 4th row
|e| - 5th row
|f| - 6th row
I tried something like this:
select id,
case when CHARINDEX(', ', [value])>0
then SUBSTRING([value] , 1, CHARINDEX(', ',[value])-1) else [value] end firstname,
CASE WHEN CHARINDEX(', ', [value])>0
THEN SUBSTRING([value],CHARINDEX(', ',[value])+1,len([value])) ELSE NULL END as lastname from table
But this is not the right approach.
Without a UDF Parse/Split function
You didn't specify a Table or Column name so replace YourTable and YourList with your actual table and column names.
Select Distinct RetVal
,RowNr = Dense_Rank() over (Order by RetVal)
From YourTable A
Cross Apply (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ replace((Select A.YourList as [*] For XML Path('')),',','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) B
Returns
RetVal RowNr
a 1
b 2
c 3
d 4
e 5
f 6
Using a Split/Parse function (everyone should have a good one)
Select Distinct RetVal
,RowNr = Dense_Rank() over (Order by RetVal)
From YourTable A
Cross Apply (Select * from [dbo].[udf-Str-Parse-8K](A.YourList,',') ) B
The UDF, if interested:
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = LTrim(RTrim(Substring(@String, A.N, A.L)))
From cte4 A
);
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
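To make the tally-table logic easier to follow, here is a rough Python model of what the UDF computes. It is an illustration only, and `parse_8k` is a hypothetical name: `starts` plays the role of cte3 (position 1 plus the position right after each delimiter), and the `find` call plays the role of the CHARINDEX in cte4, where "no more delimiters" means "run to the end of the string". `strip(' ')` mirrors LTrim(RTrim()).

```python
def parse_8k(string: str, delimiter: str) -> list[tuple[int, str]]:
    """Model of udf-Str-Parse-8K: returns (RetSeq, RetVal) pairs."""
    width = len(delimiter)
    # cte3: element starts are 1 plus every (1-based) position just after a delimiter
    starts = [1]
    for n in range(1, len(string) + 1):
        if string[n - 1:n - 1 + width] == delimiter:
            starts.append(n + width)
    rows = []
    for seq, start in enumerate(starts, 1):
        # cte4: the element runs to the next delimiter, or to the end of the string
        nxt = string.find(delimiter, start - 1)
        end = nxt if nxt != -1 else len(string)
        # Substring + LTrim(RTrim()): trim spaces only, as the SQL does
        rows.append((seq, string[start - 1:end].strip(' ')))
    return rows


print(parse_8k('Dog,Cat,House,Car', ','))
```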
A recursive CTE solution that finds all the commas in each string and takes the substring between each pair of consecutive commas. (This assumes SQL Server 2012 or later, which is required for LEAD.)
with cte as (
select val,charindex(',',','+val+',') as location from t
union all
select val,charindex(',',','+val+',',location+1) from cte
where charindex(',',','+val+',',location+1) > 0
)
,substrings as (select *,
-- trim the space that follows each comma in the sample data
ltrim(rtrim(substring(val,location,
lead(location,1) over(partition by val order by location)-location-1))) as sub
from cte)
select distinct sub
from substrings
where sub is not null and sub <> ''
order by 1;
1) The first CTE recursively finds the position of every comma in the string. A comma is added at the beginning and end of the string so that the substring before the first comma and the substring after the last comma are not missed.
2) For each string, LEAD, ordered by comma position, finds the position of the next comma.
3) Finally, keep only the substrings that are neither NULL nor empty.
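The three steps above can be sketched in Python as follows. This is only a model of the logic (the function name is hypothetical), and it also trims each piece because the sample data uses ', ' as the separator: pad with commas, collect the 1-based comma positions as CHARINDEX would, pair each position with the next one as LEAD does, and keep the non-empty substrings. A set stands in for DISTINCT.

```python
def split_with_lead(values):
    """Model of the recursive-CTE + LEAD answer: distinct, trimmed, non-empty pieces."""
    pieces = set()
    for val in values:
        padded = ',' + val + ','
        # Step 1: every comma position in the padded string (1-based, like CHARINDEX)
        locations = [i + 1 for i, ch in enumerate(padded) if ch == ',']
        # Step 2: pair each location with the next one (what LEAD provides);
        # the last location has no successor, i.e. LEAD returns NULL and it is dropped
        for loc, nxt in zip(locations, locations[1:]):
            # padded position p is val position p - 1, so the text after the comma
            # at padded position loc starts at val position loc
            sub = val[loc - 1:loc - 1 + (nxt - loc - 1)].strip(' ')
            # Step 3: keep only non-empty substrings
            if sub:
                pieces.add(sub)
    return sorted(pieces)


print(split_with_lead(['a, b, c', 'b, d, d', 'd, e, f']))
```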
You can also do this with CROSS APPLY and XML:
select distinct
p.a.value('.','varchar(10)') col
from (
select
cast('<x>' + replace(col,', ','</x><x>') + '</x>' as XML) as x
from your_table) t
cross apply x.nodes ('/x') as p(a)
I am analysing data in a 'RawDataDescriptions' table with a 'description' field that was open-ended for users to fill in.
I'm looking for ways to broadly categorise the descriptions by phrases or strings of characters that frequently show up (including a count of how many times they occur).
I don't have specific words or phrases to look for, so I can't simply use a LIKE statement; instead, I'm looking for commonalities between the fields.
While looking through other questions, I found a query that returns the most common words, which I adjusted for my own table (pasted below). Of course, one word alone provides little, if any, insight into the descriptions.
Is it possible to write a query that counts phrases and not just single words? If so, what would its main components be?
WITH E1(N) AS
(
SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
) t(N)
),
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),
E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)
SELECT
x.Item,
COUNT(*)
FROM RawDataDescriptions p
CROSS APPLY (
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = LTRIM(RTRIM(SUBSTRING(p.[Description], l.N1, l.L1)))
FROM (
SELECT s.N1,
L1 = ISNULL(NULLIF(CHARINDEX(' ',p.[Description],s.N1),0)-
s.N1,4000)
FROM(
SELECT 1 UNION ALL
SELECT t.N+1
FROM(
SELECT TOP (ISNULL(DATALENGTH(p.[Description])/2,0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4
) t(N)
WHERE SUBSTRING(p.[Description] ,t.N,1) = ' '
) s(N1)
) l(N1, L1)
) x
WHERE x.item <> ''
GROUP BY x.Item
ORDER BY COUNT(*) DESC
Edit: not doable. Alternative desired outcome:
Sample table
Id | Description
---+--------------------------
01 | Customer didn't like it
02 | Person liked it
03 | Person didn't like it
04 | Client didn't like it
05 | person liked it
@Parameter = 3 (number of words per phrase)
Desired result:
string | count
-----------------+-------
didn't like it | 3
Person liked it | 2
Edit 2: the original question was doable; see the answer.
Here is one option. I have several concerns, such as punctuation, control characters, and especially performance on large tables.
Example
Declare @RawDataDescriptions Table ([Id] varchar(50),[Description] varchar(50))
Insert Into @RawDataDescriptions Values
('01','Customer didn''t like it')
,('02','Person liked it')
,('03','Person didn''t like it')
,('04','Client didn''t like it')
,('05','person liked it')
;with cte as (
Select Id
,B.*
From @RawDataDescriptions A
Cross Apply (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(A.[Description],' ','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
) B
)
Select Phrase
,Cnt = count(*)
From cte A
Cross Apply (
Select Phrase = stuff((Select ' '+RetVal
From cte
Where ID = A.ID
and RetSeq between A.RetSeq and A.RetSeq+2
Order By RetSeq
For XML Path('')),1,1,'')
) B
Where Phrase like '% % %'
Group By Phrase
Having count(*)>1
Order By 2 Desc
Returns
Phrase Cnt
didn't like it 3
Person liked it 2
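The core of this answer is: split each description into words, take every window of three consecutive words, then group and count. A small Python sketch of that idea follows. `common_phrases` is a hypothetical name, splitting on any whitespace is a simplification, and `casefold()` is an assumption standing in for a case-insensitive collation (which is why "Person liked it" and "person liked it" are counted together above).

```python
from collections import Counter


def common_phrases(descriptions, word_count=3, min_count=2):
    """Count word_count-word phrases across descriptions, case-insensitively."""
    counts = Counter()
    display = {}  # casefolded phrase -> first spelling seen
    for text in descriptions:
        words = text.split()
        # every window of word_count consecutive words; shorter tails are skipped,
        # like the LIKE '% % %' filter in the SQL
        for i in range(len(words) - word_count + 1):
            phrase = ' '.join(words[i:i + word_count])
            key = phrase.casefold()
            display.setdefault(key, phrase)
            counts[key] += 1
    # HAVING count(*) > 1, ORDER BY count DESC
    return [(display[k], n) for k, n in counts.most_common() if n >= min_count]


data = ["Customer didn't like it", "Person liked it", "Person didn't like it",
        "Client didn't like it", "person liked it"]
print(common_phrases(data))
```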
UPDATE - TVF - Better Performance
I decided that I may want to turn this into a Table-Valued Function, and was shocked by the performance gains. For example, I have 130,000 descriptions from FRED (Federal Reserve Economic Data), and I was able to generate a list of common phrases (n words) in 9 seconds.
Usage
Select Phrase = B.RetVal
,Cnt = count(*)
From YourTable A
Cross Apply [dbo].[tvf-Str-Parse-Phrase](A.YourColumn,' ',4) B
Group By B.RetVal
Having count(*)>1
Order By 2 Desc
The TVF, if interested:
CREATE FUNCTION [dbo].[tvf-Str-Parse-Phrase] (@String varchar(max),@Delimeter varchar(25),@WordCnt int)
Returns Table
As
Return (
with cte as (
Select RetSeq = Row_Number() over (Order By (Select null))
,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>' + replace((Select replace(@String,@Delimeter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.')) as A
Cross Apply x.nodes('x') AS B(i)
)
Select RetSeq = Row_Number() over (Order By (Select Null))
,B.RetVal
From cte A
Cross Apply (Select RetVal = stuff((Select ' '+RetVal From cte Where RetSeq between A.RetSeq and A.RetSeq+@WordCnt-1 Order By RetSeq For XML Path('')),1,1,'') ) B
Where B.RetVal like Replicate('% ',@WordCnt-1)+'%'
);
--Select * from [dbo].[tvf-Str-Parse-Phrase]('This is some text that I want parsed',' ',4)
You can enable a Microsoft Full-Text Index on the table and use the full-text dynamic management functions for this kind of frequent-word analysis:
sys.dm_fts_index_keywords_by_document
sys.dm_fts_index_keywords_by_property
The scenario is that we have two lists:
A: 23,45,g5,33
B: 11,12,45,g9
We want the fastest mechanism in SQL Server to check whether any value in B is also in A. In this example, 45 is in A, so it must return true.
The solution should describe the way to store the lists (CSV, tables etc.) and the comparison mechanism.
Each list is relatively small (10 values on average), but the comparison is made many, many times (very few writes, many reads).
If you are stuck with the delimited string, consider the following:
Example:
Declare @YourTable Table ([ColA] varchar(50),[ColB] varchar(50))
Insert Into @YourTable Values
('23,45,g5,33' ,'11,12,45,g9')
,('no,match' ,'found,here')
Select *
from @YourTable A
Cross Apply (
Select Match=IsNull(sum(1),0)
From [dbo].[udf-Str-Parse-8K](ColA,',') B1
Join [dbo].[udf-Str-Parse-8K](ColB,',') B2 on B1.RetVal=B2.RetVal
) B
Returns
ColA ColB Match
23,45,g5,33 11,12,45,g9 1
no,match found,here 0
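The join produces one row per matching pair of elements, and sum(1) counts those rows. As a rough Python model (hypothetical function name; note that it compares case-sensitively, unlike a typical case-insensitive SQL Server collation), the same count can be computed from how many times each value occurs in each list:

```python
from collections import Counter


def match_count(col_a, col_b, delimiter=','):
    """Model of joining the two parsed lists on RetVal and counting the rows."""
    a = Counter(v.strip(' ') for v in col_a.split(delimiter))
    b = Counter(v.strip(' ') for v in col_b.split(delimiter))
    # an inner join yields count_in_a * count_in_b rows for each shared value
    return sum(n * b[v] for v, n in a.items() if v in b)


print(match_count('23,45,g5,33', '11,12,45,g9'))
```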
The UDF, if interested:
CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(25))
Returns Table
As
Return (
with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(N) As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
cte3(N) As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)
Select RetSeq = Row_Number() over (Order By A.N)
,RetVal = LTrim(RTrim(Substring(@String, A.N, A.L)))
From cte4 A
);
--Original source: http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
I'm still confused about the core idea, but here is a simple solution that's better than a comma-separated list. Creating indexes would make it faster, of course, and it's far quicker than looping.
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23'),
('A','45'),
('A','g5'),
('A','33'),
('B','11'),
('B','12'),
('B','45'),
('B','g9')
select distinct
base.v
--,base.*
--,compare.*
from
@table base
inner join
@table compare
on compare.v = base.v
and compare.id <> base.id
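In the normalized layout, the self-join asks "which values appear under more than one list id?". A hypothetical Python sketch of the same question follows, where a dictionary from value to ids plays roughly the role that an index on (v, id) plays in the database:

```python
from collections import defaultdict


def common_values(rows):
    """rows are (list_id, value) pairs, one row per list element."""
    ids_by_value = defaultdict(set)  # value -> set of list ids containing it
    for list_id, value in rows:
        ids_by_value[value].add(list_id)
    # same as joining base.v to compare.v where compare.id <> base.id, then DISTINCT
    return sorted(v for v, ids in ids_by_value.items() if len(ids) > 1)


rows = [('A', '23'), ('A', '45'), ('A', 'g5'), ('A', '33'),
        ('B', '11'), ('B', '12'), ('B', '45'), ('B', 'g9')]
print(common_values(rows))
```

If the only question is whether any value is shared, `bool(common_values(rows))` is the analogue of a true/false answer.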
Split Way
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23,45,g5,33'),
('B','11,12,45,g9')
;with cte as(
select
t.ID
,base.Item
from
@table t
cross apply dbo.DelimitedSplit8K(t.v,',') base)
select
t.Item
from
cte t
inner join
cte x on
x.Item = t.Item
and x.id <> t.id
where
t.id = 'A'
Using this function:
CREATE FUNCTION [dbo].[DelimitedSplit8K] (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
/* "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
enough to cover VARCHAR(8000)*/
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
Based on the previous answer, I think it should look like this:
declare @table table (id char(4), v varchar(256))
insert into @table
values
('A','23'),
('A','45'),
('A','g5'),
('A','33'),
('B','11'),
('B','12'),
('B','45'),
('B','g9')
-- select 1, not count(1): an aggregate always returns one row, so EXISTS would always be true
if exists( select 1
from
@table base
inner join
@table compare
on compare.v = base.v
and base.id='A' and compare.id='B')
print 'true'
else
print 'false'
Create an index on (id, v) or (v, id), depending on how your data grows.
I have the following function:
CREATE FUNCTION dbo.SplitStrings_XML
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
and the following code:
declare @string nvarchar(max) = 'aaa,1.3,1,bbb,1.5,ccc,2.0,1'
;WITH AllItems as
(
SELECT Item, ROW_NUMBER() OVER(ORDER BY (select null)) as rn
FROM dbo.SplitStrings_XML(@string, ',')
)
, Strings as
(
SELECT Item as Name, ROW_NUMBER() OVER(ORDER BY (select null)) as rn
FROM dbo.SplitStrings_XML(@string, ',')
WHERE ISNUMERIC(Item) = 0
), Doubles as
(
SELECT Item as Measure, ROW_NUMBER() OVER(ORDER BY (select null)) as rn
FROM dbo.SplitStrings_XML(@string, ',')
WHERE ISNUMERIC(Item) = 1 AND CHARINDEX('.', Item) > 0
), Integers as
(
SELECT Item as Value, ROW_NUMBER() OVER(ORDER BY (select null)) as rn
FROM dbo.SplitStrings_XML(@string, ',')
WHERE ISNUMERIC(Item) = 1 AND CHARINDEX('.', Item) = 0
)
SELECT Name, Measure, Value
FROM AllItems A
LEFT JOIN Strings S ON A.rn = S.rn
LEFT JOIN Doubles D ON A.rn = D.rn
LEFT JOIN Integers I ON A.rn = I.rn
WHERE COALESCE(Name, Measure, Value) IS NOT NULL
In this code, @string = 'aaa,1.3,1,bbb,1.5,ccc,2.0,1' is split so that the non-numeric values are returned in a column named Name, the decimal values in a column named Measure, and the integer values in a column named Value. The problem is that my string always has a Name and a Measure, but sometimes the Value is missing, and I would like to place a NULL in that position.
So in my example I should get something like:
Name Measure Value
---------+--------+-------
aaa 1.3 1
bbb 1.5 NULL
ccc 2.0 1
Instead I get:
Name Measure Value
---------+--------+-------
aaa 1.3 1
bbb 1.5 1
ccc 2.0 NULL
First, I would suggest that you modify the function to return the item number. However, that is not necessary because your row_number() does that.
Then, I assume that a "missing value" means ",,".
If so, I would suggest defining the CTEs as:
WITH AllItems as (
SELECT Item, ROW_NUMBER() OVER (ORDER BY (select null)) as rn
FROM dbo.SplitStrings_XML(@string, ',')
),
Strings as (
SELECT Item as Name, ROW_NUMBER() OVER (ORDER BY (select null)) as rn
FROM AllItems ai
WHERE ai.rn % 3 = 1
),
Doubles as (
SELECT Item as Measure, ROW_NUMBER() OVER (ORDER BY (select null)) as rn
FROM AllItems ai
WHERE ai.rn % 3 = 2
),
Integers as (
SELECT Item as Value, ROW_NUMBER() OVER(ORDER BY (select null)) as rn
FROM AllItems ai
WHERE ai.rn % 3 = 0
)
. . .
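Assuming, as the answer does, that a missing value is written as ",,", the rn % 3 filters simply cut the item sequence into consecutive groups of three. The XML splitter returns NULL for an empty element, which keeps the positions aligned. A minimal Python model of that positional bucketing (hypothetical names), using None for NULL:

```python
def split_triples(text, delimiter=','):
    """Model of the rn % 3 CTEs: returns (Name, Measure, Value) tuples."""
    # an empty element (from ",,") becomes None, like the NULL the XML splitter returns
    items = [x if x != '' else None for x in text.split(delimiter)]
    rows = []
    for i in range(0, len(items), 3):
        # rn % 3 = 1 -> Name, 2 -> Measure, 0 -> Value; pad a short final group with None
        name, measure, value = (items[i:i + 3] + [None, None, None])[:3]
        rows.append((name, measure, value))
    return rows


print(split_triples('aaa,1.3,1,bbb,1.5,,ccc,2.0,1'))
```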
I need to generate an alphanumeric string like the one in the answer to this question, which I'm currently using, except that it has to follow this format:
Vowel + consonant + vowel + consonant + 4-digit number
For example ABAB1111 or IJUZ9236.
Thanks for any suggestion.
You can follow these steps:
Create a vowels table (A, E, ...), a consonants table (B, C, ...) and a numbers table (0, 1, ...).
Then use this query:
SELECT (SELECT TOP 1 * FROM vowels ORDER BY newid()) +
(SELECT TOP 1 * FROM consonants ORDER BY newid()) +
(SELECT TOP 1 * FROM vowels ORDER BY newid()) +
(SELECT TOP 1 * FROM consonants ORDER BY newid()) +
(SELECT TOP 1 * FROM numbers ORDER BY newid()) +
(SELECT TOP 1 * FROM numbers ORDER BY newid()) +
(SELECT TOP 1 * FROM numbers ORDER BY newid()) +
(SELECT TOP 1 * FROM numbers ORDER BY newid())
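Each TOP 1 ... ORDER BY NEWID() subquery is an independent random draw from one character class. For comparison, here is a small Python sketch of the same vowel + consonant + vowel + consonant + 4 digits pattern. `random_code` is a hypothetical name, and for anything security-sensitive the `secrets` module would be a better source of randomness than `random`.

```python
import random
import string

VOWELS = 'AEIOU'
# the 21 remaining uppercase letters, with Y treated as a consonant as in the answers here
CONSONANTS = ''.join(c for c in string.ascii_uppercase if c not in VOWELS)


def random_code(rng=random):
    """Return a string shaped like vowel, consonant, vowel, consonant, 4 digits."""
    # one independent draw per position, like each TOP 1 ... ORDER BY NEWID() subquery
    letters = [rng.choice(VOWELS), rng.choice(CONSONANTS),
               rng.choice(VOWELS), rng.choice(CONSONANTS)]
    digits = [rng.choice(string.digits) for _ in range(4)]
    return ''.join(letters + digits)


print(random_code())
```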
I assume you want a random string. Something like this should work:
with v as (
select 'A' as c union all select 'E' union all . . .
),
c as (
select 'B' as c union all select 'C' union all . . .
),
d as (
select '0' as c union all select '1' union all . . .
)
select ((select top 1 c from v order by newid()) +
(select top 1 c from c order by newid()) +
(select top 1 c from v order by newid()) +
(select top 1 c from c order by newid()) +
(select top 1 c from d order by newid()) +
(select top 1 c from d order by newid()) +
(select top 1 c from d order by newid()) +
(select top 1 c from d order by newid())
);
Using temp tables as example data, I'd do it like this:
CREATE TABLE #Vowels (Vowel varchar(1))
INSERT INTO #Vowels VALUES ('A'),('E'),('I'),('O'),('U')
CREATE TABLE #Consonants (Consonant varchar(1))
INSERT INTO #Consonants VALUES ('B'),('C'),('D'),('F'),('G'),('H'),('J'),('K'),('L'),('M'),('N'),('P'),('Q'),('R'),('S'),('T'),('V'),('W'),('X'),('Y'),('Z')
CREATE TABLE #Numbers (Numbers varchar(1))
INSERT INTO #Numbers VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
SELECT
v1.Vowel + c1.Consonant + v2.Vowel + c2.Consonant + n1.Numbers + n2.Numbers + n3.Numbers + n4.Numbers AS Result
FROM (SELECT TOP 1 Vowel FROM #Vowels ORDER BY NEWID()) v1
CROSS JOIN (SELECT TOP 1 Consonant FROM #Consonants ORDER BY NEWID()) c1
CROSS JOIN (SELECT TOP 1 Vowel FROM #Vowels ORDER BY NEWID()) v2
CROSS JOIN (SELECT TOP 1 Consonant FROM #Consonants ORDER BY NEWID()) c2
CROSS JOIN (SELECT TOP 1 Numbers FROM #Numbers ORDER BY NEWID()) n1
CROSS JOIN (SELECT TOP 1 Numbers FROM #Numbers ORDER BY NEWID()) n2
CROSS JOIN (SELECT TOP 1 Numbers FROM #Numbers ORDER BY NEWID()) n3
CROSS JOIN (SELECT TOP 1 Numbers FROM #Numbers ORDER BY NEWID()) n4
DROP TABLE #Consonants
DROP TABLE #Numbers
DROP TABLE #Vowels
The result comes out like this but with different values each time you run it.
Result
AQOF7641
If you are running this many times, it would make sense to create permanent tables containing your vowels, consonants and numbers. That would reduce the (admittedly small) cost of this query.
This should do the trick:
WITH letters as
(
SELECT 'bcdfghjklmnpqrstvwxyz' c, 'aeiou' v
)
,CTE as
(
SELECT
SUBSTRING(v, CAST(rand()*5 as int)+1, 1)+
SUBSTRING(c, CAST(rand()*21 as int)+1, 1)+
SUBSTRING(v, CAST(rand()*5 as int)+1, 1)+
SUBSTRING(c, CAST(rand()*21 as int)+1, 1)+
right(10000+ CAST(rand()*10000 as int),4) x
FROM letters
)
SELECT x
FROM CTE
DECLARE @AlphaString VARCHAR(200) = NULL;
WITH
CTE_Digits AS (
SELECT TOP 255
ROW_NUMBER() OVER (ORDER BY a.object_id) AS RowNum
FROM
sys.all_columns a
CROSS JOIN
sys.all_columns b),
CTE_Types AS (
SELECT
CHAR(RowNum) AS Digit,
CASE
WHEN RowNum < 58 THEN 'D'
WHEN CHAR(RowNum) IN ('A','E','I','O','U') THEN 'V'
ELSE 'C'
END AS CharType
FROM
CTE_Digits
WHERE
RowNum BETWEEN 48 AND 57
OR RowNum BETWEEN 65 AND 90),
CTE_List AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY CharType ORDER BY NEWID()) AS NewRow
FROM
CTE_Types),
CTE_Ordered AS (
SELECT
*,
CASE CharType
WHEN 'V' THEN 2
WHEN 'C' THEN 3
WHEN 'D' THEN 7
END * NewRow AS DigitOrder
FROM
CTE_List
WHERE
(NewRow < 5
AND CharType = 'D')
OR NewRow < 3)
SELECT @AlphaString =
(SELECT
CAST(Digit AS VARCHAR(MAX))
FROM
CTE_Ordered
ORDER BY
DigitOrder
FOR XML PATH(''));
SELECT @AlphaString;
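The trick in this answer is the DigitOrder column: each character class is shuffled with ROW_NUMBER() ... ORDER BY NEWID(), and multiplying the row number by 2 (vowels), 3 (consonants) or 7 (digits) yields the sort keys 2, 4 / 3, 6 / 7, 14, 21, 28, which interleave into V C V C D D D D. A rough Python model follows (`alpha_string` is a hypothetical name); like the SQL, it samples without replacement, so the two vowels, the two consonants and the four digits are each distinct.

```python
import random
import string


def alpha_string(rng=random):
    """Model of the DigitOrder answer: V C V C D D D D with no repeats per class."""
    vowels = [c for c in string.ascii_uppercase if c in 'AEIOU']
    consonants = [c for c in string.ascii_uppercase if c not in 'AEIOU']
    # ROW_NUMBER() OVER (PARTITION BY CharType ORDER BY NEWID()), keeping the first
    # 2 vowels, 2 consonants and 4 digits, amounts to sampling without replacement
    picks = {'V': rng.sample(vowels, 2),
             'C': rng.sample(consonants, 2),
             'D': rng.sample(string.digits, 4)}
    weight = {'V': 2, 'C': 3, 'D': 7}
    # DigitOrder = weight * NewRow: V -> 2, 4; C -> 3, 6; D -> 7, 14, 21, 28
    keyed = [(weight[kind] * new_row, ch)
             for kind, chars in picks.items()
             for new_row, ch in enumerate(chars, 1)]
    return ''.join(ch for _, ch in sorted(keyed))


print(alpha_string())
```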
I need to concatenate the names as a recursive cross join. I don't know how to do this; I have tried a CTE using WITH RECURSIVE, without success.
I have a table like this:
group_id | name
---------------
13 | A
13 | B
19 | C
19 | D
31 | E
31 | F
31 | G
Desired output:
combinations
------------
ACE
ACF
ACG
ADE
ADF
ADG
BCE
BCF
BCG
BDE
BDF
BDG
Of course, the results should multiply if I add a 4th (or more) group.
Native PostgreSQL syntax:
WITH RECURSIVE cte1 AS
(
SELECT *, DENSE_RANK() OVER (ORDER BY group_id) AS rn
FROM mytable
),cte2 AS
(
SELECT
CAST(name AS VARCHAR(4000)) AS name,
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
CAST(CONCAT(c2.name,c1.name) AS VARCHAR(4000)) AS name
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT name as combinations
FROM cte2
WHERE LENGTH(name) = (SELECT MAX(rn) FROM cte1)
ORDER BY name;
Before:
I hope you don't mind that I use SQL Server syntax:
Sample:
CREATE TABLE #mytable(
ID INTEGER NOT NULL
,TYPE VARCHAR(MAX) NOT NULL
);
INSERT INTO #mytable(ID,TYPE) VALUES (13,'A');
INSERT INTO #mytable(ID,TYPE) VALUES (13,'B');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'C');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'D');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'E');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'F');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'G');
Main query:
WITH cte1 AS
(
SELECT *, rn = DENSE_RANK() OVER (ORDER BY ID)
FROM #mytable
),cte2 AS
(
SELECT
TYPE = CAST(TYPE AS VARCHAR(MAX)),
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
[Type] = CAST(CONCAT(c2.TYPE,c1.TYPE) AS VARCHAR(MAX))
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT *
FROM cte2
WHERE LEN(Type) = (SELECT MAX(rn) FROM cte1)
ORDER BY Type;
I've assumed that the order of the "cross join" is determined by ascending ID.
cte1 uses DENSE_RANK() because your IDs contain gaps
cte2 is the recursive part, built with CONCAT
the main query keeps only strings of the required length and sorts them
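The same idea, modeled in Python for illustration (`group_combinations` is a hypothetical name): dense-rank the group ids, start from the names of group 1, and repeatedly extend each partial string with every name of the next group. Keeping only the last level corresponds to the length filter in the main query.

```python
def group_combinations(rows):
    """rows are (group_id, name) pairs; returns every concatenation, one name per group."""
    # cte1: DENSE_RANK() OVER (ORDER BY group_id) gives gapless group numbers 1..k
    group_ids = sorted({g for g, _ in rows})
    rank = {g: i for i, g in enumerate(group_ids, 1)}
    names_by_rank = {}
    for g, name in rows:
        names_by_rank.setdefault(rank[g], []).append(name)
    # cte2 anchor: the names of group 1
    partial = list(names_by_rank.get(1, []))
    # cte2 recursive step: join group rn + 1 and concatenate
    for rn in range(2, len(group_ids) + 1):
        partial = [p + name for p in partial for name in names_by_rank[rn]]
    # only full-length strings remain at the last level; ORDER BY
    return sorted(partial)


rows = [(13, 'A'), (13, 'B'), (19, 'C'), (19, 'D'), (31, 'E'), (31, 'F'), (31, 'G')]
print(group_combinations(rows))
```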
The recursive query is a bit simpler in Postgres:
WITH RECURSIVE t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name
FROM tbl
)
, cte AS (
SELECT grp, name
FROM t
WHERE grp = 1
UNION ALL
SELECT t.grp, c.name || t.name
FROM cte c
JOIN t ON t.grp = c.grp + 1
)
SELECT name AS combi
FROM cte
WHERE grp = (SELECT max(grp) FROM t)
ORDER BY 1;
The basic logic is the same as in the SQL Server version provided by @lad2025; I added a couple of minor improvements.
Or you can use a simpler version if your maximum number of groups is not too big (it can't be very big anyway, since the result set grows exponentially). For a maximum of 5 groups:
WITH t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name AS n
FROM tbl
)
SELECT concat(t1.n, t2.n, t3.n, t4.n, t5.n) AS combi
FROM (SELECT n FROM t WHERE grp = 1) t1
LEFT JOIN (SELECT n FROM t WHERE grp = 2) t2 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 3) t3 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 4) t4 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 5) t5 ON true
ORDER BY 1;
Probably faster for a few groups. LEFT JOIN .. ON true makes this work even if higher levels are missing, and concat() ignores NULL values. Test with EXPLAIN ANALYZE to be sure.
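To see why the LEFT JOIN .. ON true version tolerates missing levels, here is a rough Python model (`combinations_fixed` is a hypothetical name): each empty level after the first is replaced by a single NULL (None), mirroring the one NULL-extended row an outer join produces, and concat() skipping NULLs becomes joining only the non-None parts.

```python
from itertools import product


def combinations_fixed(rows, max_groups=5):
    """Model of the fixed-depth LEFT JOIN .. ON true query (up to max_groups levels)."""
    group_ids = sorted({g for g, _ in rows})
    rank = {g: i for i, g in enumerate(group_ids, 1)}  # the dense_rank() CTE
    levels = []
    for level in range(1, max_groups + 1):
        names = [n for g, n in rows if rank[g] == level]
        # level 1 is the FROM clause; an empty later level still yields one
        # NULL-extended row through LEFT JOIN ... ON true
        levels.append(names if names or level == 1 else [None])
    # concat() ignores NULLs, so missing levels add nothing to the string
    return sorted(''.join(n for n in combo if n is not None)
                  for combo in product(*levels))


rows = [(13, 'A'), (13, 'B'), (19, 'C'), (19, 'D'), (31, 'E'), (31, 'F'), (31, 'G')]
print(combinations_fixed(rows))
```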