Incremental concatenation of columns per row in SQL

I am trying to write an Oracle or MS SQL query where the first row contains the value of column A, the second row contains the values of columns A and B concatenated and separated by a comma, the third row the values of columns A, B and C, and so on.
Suppose the following SQL table:
|columnA |columnB|columnC |columnD |columnF |columnG |
|--------|-------|--------|--------|--------|--------|
| matty | lucy | james | mike | tala | mark |
| jana | steph | alex | mohd | hani | elie |
The output would be:
matty
matty,lucy
matty,lucy,james
matty,lucy,james,mike
matty,lucy,james,mike,tala
matty,lucy,james,mike,tala,mark
jana
jana,steph
jana,steph,alex
jana,steph,alex,mohd
jana,steph,alex,mohd,hani
jana,steph,alex,mohd,hani,elie
How should I write the SQL select statement?

You can use cross apply:
select tt.*
from table t cross apply
( values (columnA, null, null, null, null, null),
(columnA, columnB, null, null, null, null),
. . .
(columnA, columnB, columnC, columnD, columnF, columnG)
) tt(col1, col2, col3, col4, col5, col6);
If you want to combine all the data into a single column, use concat():
select tt.*
from table t cross apply
( values (columnA),
(concat(columnA, ',', columnB)),
(concat(columnA, ',', columnB, ',', columnC)),
(concat(columnA, ',', columnB, ',', columnC, ',', columnD)),
(concat(columnA, ',', columnB, ',', columnC, ',', columnD, ',', columnF)),
(concat(columnA, ',', columnB, ',', columnC, ',', columnD, ',', columnF, ',', columnG))
) tt(cols);

One way is to unpivot the data and build the concatenation recursively with a hierarchical query (Oracle solution):
--data
with t(a, b, c, d, e, f) as (
select 'matty', 'lucy', 'james', 'mike', 'tala', 'mark' from dual union all
select 'jana ', 'steph', 'alex', 'mohd', 'hani', 'elie' from dual )
-- end of data
select ltrim(sys_connect_by_path(name, ','), ',') path
from (select rownum r1, a, b, c, d, e, f from t)
unpivot (name for r2 in (a as 1, b as 2, c as 3, d as 4, e as 5, f as 6))
connect by prior r1 = r1 and r2 = prior r2 + 1
start with r2 = 1

If you want a version that works in both databases (Oracle's concat() takes only two arguments, hence the nesting):
select columnA
from t
union all
select concat(columnA, concat(',', columnB))
from t
union all
select concat(concat(columnA, concat(',', columnB)), concat(',', columnC))
from t
union all
. . .
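Since the portable version above elides the later steps, here is a runnable sketch of the same UNION ALL shape, using SQLite via Python (SQLite uses || rather than concat(); three columns shown instead of six for brevity):

```python
import sqlite3

# Runnable sketch of the UNION ALL approach: one SELECT per prefix length,
# stacked with UNION ALL. Illustrative only -- the question targets
# Oracle / SQL Server, where concat() would be used instead of ||.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (columnA TEXT, columnB TEXT, columnC TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [("matty", "lucy", "james"), ("jana", "steph", "alex")])
paths = [p for (p,) in con.execute("""
    SELECT columnA FROM t
    UNION ALL
    SELECT columnA || ',' || columnB FROM t
    UNION ALL
    SELECT columnA || ',' || columnB || ',' || columnC FROM t
    ORDER BY 1
""")]
print(paths)
```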

Related

'Unpivoting' a SQL table

I'm looking to 'unpivot' a table, though I'm not sure of the best way to go about it. Additionally, the values are separated by ';'. I've listed a sample of what I'm looking at:
|Column_A|Column_B|Column_C|Column_D|
|:-|:-|:-|:-|
|000|A;B;C;D|01;02;03;04|X;Y;D;E|
|001|A;B|05;06|S;T|
|002|C|07|S|
From that, I'm looking for a way to unpivot it, but also to keep the relations as they currently are. That is, the first values in Column_B, C, and D are tied together:
|Column_A|Column_B|Column_C|Column_D|
|:-|:-|:-|:-|
|000|A|01|X|
|000|B|02|Y|
|000|C|03|D|
|000|D|04|E|
|001|A|05|S|
And so on.
My initial thought is to use a CTE, which I've set up as:
WITH TEST AS(
SELECT DISTINCT Column_A, Column_B, Column_C, VALUE AS Column_D
from [TABLE]
CROSS APPLY STRING_SPLIT(Column_D, ';'))
SELECT * FROM TEST
;
Though that doesn't seem to produce the correct results, especially after stacking the CTEs and string splits.
As an update, there were really helpful solutions below, and they all ran as expected. However, I have one last addition: is it possible/reasonable to ignore a row/column if it's blank? For example, skipping over Column_C where Column_A is '001'.
|Column_A|Column_B|Column_C|Column_D|
|:-|:-|:-|:-|
|000|A;B;C;D|01;02;03;04|X;Y;D;E|
|001|A;B||S;T|
|002|C|07|S|
Here is a JSON based method. SQL Server 2016 onwards.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ColA varchar(3), ColB varchar(8000), ColC varchar(8000), ColD varchar(8000));
INSERT INTO @tbl VALUES
('000','A;B;C;D','01;02;03;04','X;Y;D;E'),
('001','A;B','05;06','S;T'),
('002','C','07','S');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, ar1 = '["' + REPLACE(ColB, ';', '","') + '"]'
, ar2 = '["' + REPLACE(ColC, ';', '","') + '"]'
, ar3 = '["' + REPLACE(ColD, ';', '","') + '"]'
FROM @tbl
)
SELECT ColA, ColB.[value] AS [ColB], ColC.[value] AS ColC, ColD.[value] AS ColD
FROM rs
CROSS APPLY OPENJSON (ar1, N'$') AS ColB
CROSS APPLY OPENJSON (ar2, N'$') AS ColC
CROSS APPLY OPENJSON (ar3, N'$') AS ColD
WHERE ColB.[key] = ColC.[key]
AND ColB.[key] = ColD.[key];
Output
+------+------+------+------+
| ColA | ColB | ColC | ColD |
+------+------+------+------+
| 000 | A | 01 | X |
| 000 | B | 02 | Y |
| 000 | C | 03 | D |
| 000 | D | 04 | E |
| 001 | A | 05 | S |
| 001 | B | 06 | T |
| 002 | C | 07 | S |
+------+------+------+------+
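The heart of this answer is that OPENJSON exposes each array element's index as [key], which lets the three arrays be re-aligned positionally. A small Python sketch of the same idea (illustrative only; the helper name is invented here):

```python
import json

# The answer's core trick: turn each ';' list into a JSON array, then pair
# elements positionally (OPENJSON's [key] is the element's index; zip()
# plays the same role here).
def to_json_array(s):
    return '["' + s.replace(";", '","') + '"]'

row = ("000", "A;B;C;D", "01;02;03;04", "X;Y;D;E")
arrays = [json.loads(to_json_array(col)) for col in row[1:]]
result = [(row[0], b, c, d) for b, c, d in zip(*arrays)]
print(result)
```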
You can use a recursive CTE to walk through the strings. Assuming they are all the same length (i.e. same number of semicolons):
with cte as (
select a, convert(varchar(max), null) as b, convert(varchar(max), null) as c, convert(varchar(max), null) as d,
convert(varchar(max), b + ';') as rest_b, convert(varchar(max), c + ';') as rest_c, convert(varchar(max), d + ';') as rest_d,
0 as lev
from t
union all
select a,
left(rest_b, charindex(';', rest_b) - 1),
left(rest_c, charindex(';', rest_c) - 1),
left(rest_d, charindex(';', rest_d) - 1),
stuff(rest_b, 1, charindex(';', rest_b), ''),
stuff(rest_c, 1, charindex(';', rest_c), ''),
stuff(rest_d, 1, charindex(';', rest_d), ''),
lev + 1
from cte
where rest_b <> ''
)
select a, b, c, d
from cte
where lev > 0
order by a, lev;
Here is a db<>fiddle.
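To see what the recursion is doing, here is the same "peel the head, keep the rest" logic written as a plain Python loop (an illustration of the technique, not the T-SQL itself):

```python
# Each pass peels the value before the first ';' off every column (the
# left()/charindex() pair in the CTE) and keeps the remainder (the stuff()
# call). The trailing ';' added up front guarantees the last field splits.
def unpivot_row(a, b, c, d):
    rest_b, rest_c, rest_d = b + ";", c + ";", d + ";"
    out = []
    while rest_b:
        vb, rest_b = rest_b.split(";", 1)
        vc, rest_c = rest_c.split(";", 1)
        vd, rest_d = rest_d.split(";", 1)
        out.append((a, vb, vc, vd))
    return out

rows = [("000", "A;B;C;D", "01;02;03;04", "X;Y;D;E"),
        ("001", "A;B", "05;06", "S;T")]
result = [r for row in rows for r in unpivot_row(*row)]
print(result)
```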
EDIT:
You can filter to rows that have the same number of semicolons by using:
where (len(b) - len(replace(b, ';', ''))) = (len(c) - len(replace(c, ';', ''))) and
(len(b) - len(replace(b, ';', ''))) = (len(d) - len(replace(d, ';', '')))
You could also extend c and d with a bunch of semicolons so no error occurs and the resulting values are empty strings. Extra semicolons in those columns don't matter, so you could use:
select a, convert(varchar(max), null) as b, convert(varchar(max), null) as c, convert(varchar(max), null) as d,
convert(varchar(max), b + ';') as rest_b, convert(varchar(max), c + replicate(';', len(b))) as rest_c, convert(varchar(max), d + replicate(';', len(b))) as rest_d,
0 as lev
The real problem here is your design. Hopefully the reason you're doing this is to fix your design.
Unfortunately you can't use SQL Server's inbuilt STRING_SPLIT for this, as it doesn't provide an ordinal position. As such I use DelimitedSplit8K_LEAD to separate the values into rows, and then "join" them back up. This does assume that all the columns have the same number of delimited values.
CREATE TABLE dbo.YourTable (ColA varchar(3),
ColB varchar(8000),
ColC varchar(8000),
ColD varchar(8000));
INSERT INTO dbo.YourTable
VALUES('000','A;B;C;D','01;02;03;04','X;Y;D;E'),
('001','A;B','05;06','S;T'),
('002','C','07','S');
GO
SELECT YT.ColA,
DSLB.Item AS ColB,
DSLC.Item AS ColC,
DSLD.Item AS ColD
FROM dbo.YourTable YT
CROSS APPLY dbo.DelimitedSplit8K_LEAD(YT.ColB,';') DSLB
CROSS APPLY dbo.DelimitedSplit8K_LEAD(YT.ColC,';') DSLC
CROSS APPLY dbo.DelimitedSplit8K_LEAD(YT.ColD,';') DSLD
WHERE DSLB.ItemNumber = DSLC.ItemNumber
AND DSLC.ItemNumber = DSLD.ItemNumber;
GO
DROP TABLE dbo.YourTable;

Concatenate or merge many columns values with a separator between and ignoring nulls - SQL Server 2016 or older

I want to simulate the SQL Server 2017+ CONCAT_WS function in SQL Server 2016 or older, in order to concatenate many columns whose values are strings:
Input:
| COLUMN1 | COLUMN2 | COLUMN3 | COLUMN4 |
|---------|---------|---------|---------|
| 'A'     | 'B'     | NULL    | 'D'     |
| NULL    | 'E'     | 'F'     | 'G'     |
| NULL    | NULL    | NULL    | NULL    |
Output:
| MERGE |
|-------|
| 'A\|B\|D' |
| 'E\|F\|G' |
| NULL |
Notice that the output result is a new column that concatenate all values separated by '|'. The default value should be NULL if there are no values in the columns.
I tried CONCAT with a CASE expression and many WHEN conditions, but it is really messy and I am not allowed to use that solution. Thanks in advance.
One convenient way is:
select stuff( coalesce('|' + column1, '') +
              coalesce('|' + column2, '') +
              coalesce('|' + column3, '') +
              coalesce('|' + column4, ''), 1, 1, ''
            )
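In case it helps to see the mechanics, the prefix-then-strip trick can be expressed in Python (the helper name concat_ws is ours, mirroring the function being simulated):

```python
# The stuff()/coalesce() trick: prefix every non-NULL value with the
# separator, concatenate, then strip the leading separator. An all-NULL
# row yields an empty string, which maps to None (SQL's NULL).
def concat_ws(sep, *vals):
    s = "".join(sep + v for v in vals if v is not None)
    return s[len(sep):] if s else None

print(concat_ws("|", "A", "B", None, "D"))     # A|B|D
print(concat_ws("|", None, None, None, None))  # None
```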
Here is another method by using XML and XQuery.
The number of columns is not hard-coded, it could be dynamic.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, col1 CHAR(1), col2 CHAR(1), col3 CHAR(1), col4 CHAR(1));
INSERT INTO @tbl (col1, col2, col3, col4) VALUES
( 'A', 'B', NULL, 'D'),
(NULL, 'E' , 'F' , 'G'),
(NULL, NULL, NULL , NULL);
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '|';
SELECT id, REPLACE((
SELECT *
FROM @tbl AS c
WHERE c.id = p.id
FOR XML PATH('r'), TYPE, ROOT('root')
).query('data(/root/r/*[local-name() ne "id"])').value('.', 'VARCHAR(100)') , SPACE(1), @separator) AS concatColumns
FROM @tbl AS p;
Output
+----+---------------+
| id | concatColumns |
+----+---------------+
| 1 | A|B|D |
| 2 | E|F|G |
| 3 | |
+----+---------------+

SQL Subquery with delimiter

I need to be able to split one string on the delimiter * into separate columns, without including *.
The column y from table x looks like this:
column y
*1HS*AB*GXX*123*02*PA45*2013-08-10*
*1R1*B*GX*123*02*PA45*2013-08-10*
*1HS*B*GX*13*01*PA45*2013-08-01*
*1P*C*GXX*123*02*PA45*2013-08-10*
STRING_SPLIT is not available.
The outcome should be this:
Column1 Column2 Column3 Column4 Column5 Column6 Column7
1HS AB GXX 123 2 PA45 10-08-2013
1R1 B GX 123 2 PA45 10-08-2013
1HS B GX 13 1 PA45 01-08-2013
1P C GXX 123 2 PA45 10-08-2013
You can try a query like the one below (Oracle syntax; note the delimiter is * here, not a comma):
select REGEXP_SUBSTR(y, '[^*]+', 1, 1) AS column1
, REGEXP_SUBSTR(y, '[^*]+', 1, 2) AS column2
, REGEXP_SUBSTR(y, '[^*]+', 1, 3) AS column3
, REGEXP_SUBSTR(y, '[^*]+', 1, 4) AS column4
from YOUR_TABLE
Unfortunately, string_split() does not guarantee that it preserves the ordering of the values, and SQL Server offers few other useful string functions.
So, I recommend using recursive CTEs for this purpose:
with t as (
select *
from (values ('*1HS*AB*GXX*123*02*PA45*2013-08-10*'), ('*1HS*B*GX*13*01*PA45*2013-08-01*')) v(str)
),
cte as (
select convert(varchar(max), null) as val, 0 as lev,
convert(varchar(max), stuff(str, 1, 1, '')) as rest, -- drop the leading *
row_number() over (order by (select null)) as id
from t
union all
select left(rest, charindex('*', rest) - 1), lev + 1, stuff(rest, 1, charindex('*', rest), ''), id
from cte
where rest <> '' and lev < 10
)
select max(case when lev = 1 then val end) as col1,
max(case when lev = 2 then val end) as col2,
max(case when lev = 3 then val end) as col3,
max(case when lev = 4 then val end) as col4,
max(case when lev = 5 then val end) as col5,
max(case when lev = 6 then val end) as col6,
max(case when lev = 7 then val end) as col7
from cte
where lev > 0
group by cte.id;
Here is a db<>fiddle.
Assuming you can add a table-valued function to your database, Jeff Moden's string split function is the best approach I've encountered. It will allow you to maintain order as well.

Split one column data into multiple columns in oracle

In my Oracle query I am using the following to retrieve the records:
SELECT columnC
, LISTAGG(r.columnA,',') WITHIN GROUP (ORDER BY r.columnB) AS Test_sensor
FROM tableA
GROUP BY columnC
Currently the output looks like this:
ColumnC | Test_Sensor
=============================
Z12345 | 20,30,40,50,60,70
But I want this data to be displayed as below -
ColumnC | Test_Sensor1 | Test_Sensor2 | Test_Sensor3 | Test_Sensor4
==========================================================================
Z12345 | 20 | 30 | 40 | 50
Please help me with this.
You can use a PIVOT (and do not need to use LISTAGG):
Oracle 11g R2 Schema Setup:
CREATE TABLE TableA ( ColumnA, ColumnB, ColumnC ) AS
SELECT 20, 'A', 'Z12345' FROM DUAL UNION ALL
SELECT 30, 'B', 'Z12345' FROM DUAL UNION ALL
SELECT 40, 'C', 'Z12345' FROM DUAL UNION ALL
SELECT 50, 'D', 'Z12345' FROM DUAL UNION ALL
SELECT 60, 'E', 'Z12345' FROM DUAL UNION ALL
SELECT 70, 'F', 'Z12345' FROM DUAL;
Query 1:
SELECT *
from (
SELECT columnA,
columnC,
ROW_NUMBER() OVER ( PARTITION BY columnC ORDER BY columnB ) AS rn
FROM tableA
) a
PIVOT ( MAX( columnA ) FOR rn IN (
1 AS test_sensor1,
2 AS test_sensor2,
3 AS test_sensor3,
4 AS test_sensor4
) )
Results:
| COLUMNC | TEST_SENSOR1 | TEST_SENSOR2 | TEST_SENSOR3 | TEST_SENSOR4 |
|---------|--------------|--------------|--------------|--------------|
| Z12345 | 20 | 30 | 40 | 50 |
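For databases without PIVOT, the same result can be had with conditional aggregation over the row number. A runnable sketch in SQLite via Python (illustrative only; on Oracle the PIVOT above is preferable, and this needs SQLite 3.25+ for window functions):

```python
import sqlite3

# PIVOT rewritten as MAX(CASE ...) over ROW_NUMBER() -- the portable
# equivalent of Query 1 above, with the same table and column names.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (columnA INT, columnB TEXT, columnC TEXT)")
con.executemany("INSERT INTO tableA VALUES (?, ?, ?)",
                [(20, 'A', 'Z12345'), (30, 'B', 'Z12345'),
                 (40, 'C', 'Z12345'), (50, 'D', 'Z12345')])
row = con.execute("""
    SELECT columnC,
           MAX(CASE WHEN rn = 1 THEN columnA END) AS test_sensor1,
           MAX(CASE WHEN rn = 2 THEN columnA END) AS test_sensor2,
           MAX(CASE WHEN rn = 3 THEN columnA END) AS test_sensor3,
           MAX(CASE WHEN rn = 4 THEN columnA END) AS test_sensor4
    FROM (SELECT columnA, columnC,
                 ROW_NUMBER() OVER (PARTITION BY columnC ORDER BY columnB) AS rn
          FROM tableA)
    GROUP BY columnC
""").fetchone()
print(row)   # ('Z12345', 20, 30, 40, 50)
```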
Query 2:
You can do it using LISTAGG but it is much, much less efficient than using PIVOT:
SELECT ColumnC,
REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 1 ) AS test_sensor1,
REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 2 ) AS test_sensor2,
REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 3 ) AS test_sensor3,
REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 4 ) AS test_sensor4
FROM (
SELECT ColumnC,
LISTAGG( ColumnA, ',' ) WITHIN GROUP ( ORDER BY ColumnB )
AS test_sensor
FROM TableA
GROUP BY ColumnC
)
Results:
| COLUMNC | TEST_SENSOR1 | TEST_SENSOR2 | TEST_SENSOR3 | TEST_SENSOR4 |
|---------|--------------|--------------|--------------|--------------|
| Z12345 | 20 | 30 | 40 | 50 |
If you go with LISTAGG, there is a caveat: LISTAGG ignores NULL values. Are you sure your columnA will ALWAYS have data? If, say, test_sensor2 is NULL, the output of the LISTAGG operation will be 20,40,50,60,70, so after the NULL value every sensor's data will be reported in the wrong column! Correct that by using this:
replace(LISTAGG( nvl(to_char(ColumnA), ','), ',' )
WITHIN GROUP ( ORDER BY ColumnB ), ',,,',',,')
Now your output is 20,,40,50,60,70, keeping test_sensor2's NULL value and the rest in proper position.
HOWEVER, now there is another problem: a regex of the form '[^,]+' causes the same issue! So even though the LISTAGG output is fixed, the parsed output is again off after the column with the NULL value. The different form shown below fixes this problem. Here is an example set up so you can comment/uncomment lines to see the differences as the data is processed. Always expect the unexpected!
with TableA ( ColumnA, ColumnB, ColumnC ) AS (
SELECT 20, 'A', 'Z12345' FROM DUAL UNION ALL
SELECT NULL, 'B', 'Z12345' FROM DUAL UNION ALL -- make NULL
SELECT 40, 'C', 'Z12345' FROM DUAL UNION ALL
SELECT 50, 'D', 'Z12345' FROM DUAL UNION ALL
SELECT 60, 'E', 'Z12345' FROM DUAL UNION ALL
SELECT 70, 'F', 'Z12345' FROM DUAL
),
tbl_tmp as (
SELECT ColumnC,
-- Preserve the NULL in position 2
replace(LISTAGG( nvl(to_char(ColumnA), ','), ',' )
WITHIN GROUP ( ORDER BY ColumnB ), ',,,',',,')
AS test_sensor
FROM TableA
GROUP BY ColumnC
)
--select * from tbl_tmp;
-- regex of format [^,]+ does not handle NULLs
SELECT ColumnC,
-- REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 1) AS test_sensor1,
-- REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 2 ) AS test_sensor2,
-- REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 3 ) AS test_sensor3,
-- REGEXP_SUBSTR( test_sensor, '[^,]+', 1, 4 ) AS test_sensor4
REGEXP_SUBSTR( test_sensor, '(.*?)(,|$)', 1, 1, NULL, 1) AS test_sensor1,
REGEXP_SUBSTR( test_sensor, '(.*?)(,|$)', 1, 2, NULL, 1 ) AS test_sensor2,
REGEXP_SUBSTR( test_sensor, '(.*?)(,|$)', 1, 3, NULL, 1 ) AS test_sensor3,
REGEXP_SUBSTR( test_sensor, '(.*?)(,|$)', 1, 4, NULL, 1 ) AS test_sensor4
FROM tbl_tmp;
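The difference between the two patterns is easy to reproduce outside the database. Here is the same comparison with Python's re module (illustrative only; Oracle's REGEXP_SUBSTR counts occurrences the same way):

```python
import re

# '[^,]+' skips the empty field entirely, so every later field shifts left;
# '(.*?)(,|$)' matches fields by position and keeps the empty one.
s = "20,,40,50,60,70"   # the LISTAGG output with a NULL in position 2
naive = re.findall(r"[^,]+", s)
positional = [m.group(1) for m in re.finditer(r"(.*?)(,|$)", s)][:6]
print(naive)       # ['20', '40', '50', '60', '70']
print(positional)  # ['20', '', '40', '50', '60', '70']
```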

How to get multiple comma-separated values as individual columns?

I have a table with data like this:
select * from data
id | col1 | col2 | col3
---+-------+-------+-------
1 | 1,2,3 | 4,5,6 | 7,8,9
I want to get the data like this:
id | name | dd | fn | suf
---+------+----+----+-----
1 | col1 | 1 | 2 | 3
1 | col2 | 4 | 5 | 6
1 | col3 | 7 | 8 | 9
Currently, I use split_part() in a query like this:
SELECT * from(
select id,
'col1' as name,
NULLIF(split_part(col1, ',', 1), '') AS dd,
NULLIF(split_part(col1, ',', 2), '') AS fn,
NULLIF(split_part(col1, ',', 3), '') AS suf
from data
UNION
select id,
'col2' as name,
NULLIF(split_part(col2, ',', 1), '') AS dd,
NULLIF(split_part(col2, ',', 2), '') AS fn,
NULLIF(split_part(col2, ',', 3), '') AS suf
from data
UNION
select id,
'col3' as name,
NULLIF(split_part(col3, ',', 1), '') AS dd,
NULLIF(split_part(col3, ',', 2), '') AS fn,
NULLIF(split_part(col3, ',', 3), '') AS suf
from data
);
Is there a more elegant way? I have 20 columns.
Assuming this table:
CREATE TABLE tbl (id int, col1 text, col2 text, col3 text);
INSERT INTO tbl VALUES (1 ,'1,2,3', '4,5,6', '7,8,9');
A VALUES expression in a LATERAL subquery should be an elegant solution.
Then just use split_part(). Add NULLIF() only if there can be actual empty strings in the source ...
SELECT id, x.name
, split_part(x.col, ',', 1) AS dd
, split_part(x.col, ',', 2) AS fn
, split_part(x.col, ',', 3) AS suf
FROM tbl t, LATERAL (
VALUES (text 'col1', t.col1)
, ( 'col2', t.col2)
, ( 'col3', t.col3)
-- ... many more?
) x(name, col);
Works in PostgreSQL 9.3 or later.
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
SELECT DISTINCT on multiple columns
Split comma separated column data into additional columns
I would do the union all first and the split_part() second:
select id, name,
coalesce(split_part(col, ',', 1), '') as dd,
coalesce(split_part(col, ',', 2), '') as fn,
coalesce(split_part(col, ',', 3), '') as suf
from ((select id, 'col1' as name, col1 as col from data
) union all
(select id, 'col2' as name, col2 as col from data
) union all
(select id, 'col3' as name, col3 as col from data
)
) t;
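Both answers share the same shape: stack the columns into rows first, then split. A compact Python rendering of that shape (split_part here is our stand-in for the Postgres function, which is 1-indexed and returns '' for a missing field):

```python
# A stand-in for Postgres split_part(): 1-indexed, '' when the field
# does not exist.
def split_part(s, sep, n):
    parts = s.split(sep)
    return parts[n - 1] if n <= len(parts) else ""

# Stack the columns into (id, name, value) rows -- the UNION ALL / VALUES
# step -- then apply split_part() to each stacked value.
data = [(1, "1,2,3", "4,5,6", "7,8,9")]
stacked = [(id_, name, col)
           for id_, c1, c2, c3 in data
           for name, col in (("col1", c1), ("col2", c2), ("col3", c3))]
result = [(id_, name,
           split_part(col, ",", 1), split_part(col, ",", 2), split_part(col, ",", 3))
          for id_, name, col in stacked]
print(result)
```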