Optimal way to concatenate/aggregate strings - sql

I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.
String aggregation would do something like this:
id | Name Result: id | Names
-- - ---- -- - -----
1 | Matt 1 | Matt, Rocks
1 | Rocks 2 | Stylus
2 | Stylus
I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.
Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but hey I'll take what I can get) that I can use to aggregate my stuff?

SOLUTION
The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM dbo.SourceTable
),
Concatenated AS
(
SELECT
ID,
CAST(Name AS nvarchar) AS FullName,
Name,
NameNumber,
NameCount
FROM Partitioned
WHERE NameNumber = 1
UNION ALL
SELECT
P.ID,
CAST(C.FullName + ', ' + P.Name AS nvarchar),
P.Name,
P.NameNumber,
P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C
ON P.ID = C.ID
AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
EXPLANATION
The approach boils down to three steps:
Number the rows using OVER and PARTITION grouping and ordering them as needed for the concatenation. The result is Partitioned CTE. We keep counts of rows in each partition to filter the results later.
Using recursive CTE (Concatenated) iterate through the row numbers (NameNumber column) adding Name values to FullName column.
Filter out all results but the ones with the highest NameNumber.
Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).
I've quickly tested the solution on SQL Server 2012 with the following data:
INSERT dbo.SourceTable (ID, Name)
VALUES
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')
The query result:
ID FullName
----------- ------------------------------
2 Stylus
3 Bar, Baz, Foo
1 Matt, Rocks

Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).
create table #t (id int, name varchar(20))
insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')
select id
,Names = stuff((select ', ' + name as [text()]
from #t xt
where xt.id = t.id
for xml path('')), 1, 2, '')
from #t t
group by id

STRING_AGG() in SQL Server 2017, Azure SQL, and PostgreSQL:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql
GROUP_CONCAT() in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
(Thanks to #Brianjorden and #milanio for Azure update)
Example Code:
select Id
, STRING_AGG(Name, ', ') Names
from Demo
group by Id
SQL Fiddle: http://sqlfiddle.com/#!18/89251/1

Although #serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself.
This is #serge way:
DECLARE #startTime datetime2;
DECLARE #endTime datetime2;
DECLARE #counter INT;
SET #counter = 1;
set nocount on;
declare #YourTable table (ID int, Name nvarchar(50))
WHILE #counter < 1000
BEGIN
insert into #YourTable VALUES (ROUND(#counter/10,0), CONVERT(NVARCHAR(50), #counter) + 'CC')
SET #counter = #counter + 1;
END
SET #startTime = GETDATE()
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM #YourTable
),
Concatenated AS
(
SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1
UNION ALL
SELECT
P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
SET #endTime = GETDATE();
SELECT DATEDIFF(millisecond,#startTime, #endTime)
--Take about 54 milliseconds
And this is xmlpath way:
DECLARE #startTime datetime2;
DECLARE #endTime datetime2;
DECLARE #counter INT;
SET #counter = 1;
set nocount on;
declare #YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))
WHILE #counter < 1000
BEGIN
insert into #YourTable VALUES (#counter, ROUND(#counter/10,0), CONVERT(NVARCHAR(50), #counter) + 'CC')
SET #counter = #counter + 1;
END
SET #startTime = GETDATE();
set nocount off
SELECT
t1.HeaderValue
,STUFF(
(SELECT
', ' + t2.ChildValue
FROM #YourTable t2
WHERE t1.HeaderValue=t2.HeaderValue
ORDER BY t2.ChildValue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM #YourTable t1
GROUP BY t1.HeaderValue
SET #endTime = GETDATE();
SELECT DATEDIFF(millisecond,#startTime, #endTime)
--Take about 4 milliseconds

Update: Ms SQL Server 2017+, Azure SQL Database
You can use: STRING_AGG.
Usage is pretty simple for OP's request:
SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id
Read More
Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. #hrobky mentioned this previously as a SQL Server 2016 feature at the time.
--- Old Post:
Not enough reputation here to reply to #hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..

You can use += to concatenate strings, for example:
declare #test nvarchar(max)
set #test = ''
select #test += name from names
if you select #test, it will give you all names concatenated

I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:
DECLARE #STRUCTURED_VALUES TABLE (
ID INT
,VALUE VARCHAR(MAX) NULL
,VALUENUMBER BIGINT
,VALUECOUNT INT
);
INSERT INTO #STRUCTURED_VALUES
SELECT ID
,VALUE
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
,COUNT(*) OVER (PARTITION BY ID) AS VALUECOUNT
FROM RAW_VALUES_TABLE;
WITH CTE AS (
SELECT SV.ID
,SV.VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM #STRUCTURED_VALUES SV
WHERE VALUENUMBER = 1
UNION ALL
SELECT SV.ID
,CTE.VALUE + ' ' + SV.VALUE AS VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM #STRUCTURED_VALUES SV
JOIN CTE
ON SV.ID = CTE.ID
AND SV.VALUENUMBER = CTE.VALUENUMBER + 1
)
SELECT ID
,VALUE
FROM CTE
WHERE VALUENUMBER = VALUECOUNT
ORDER BY ID
;

Try this, i use it in my projects
DECLARE #MetricsList NVARCHAR(MAX);
SELECT #MetricsList = COALESCE(#MetricsList + '|', '') + QMetricName
FROM #Questions;

Related

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (#str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE #len INT,
#cnt INT =1,
#str1 VARCHAR(50)='',
#output VARCHAR(50)=''
SELECT #len = Len(#str)
WHILE #cnt <= #len
BEGIN
SELECT #str1 += Substring(#str, #cnt, 1) + ','
SET #cnt+=1
END
SELECT #str1 = LEFT(#str1, Len(#str1) - 1)
SELECT #output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(#str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN #output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however when i try to apply this to top(10) i receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4million records, is this still the best solution?
Unfortunately i am not able to normalize the data into a separate table with records for each Code.
This doesn't rely on a id column to join with itself, performance is almost as fast
as the answer by #Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO #tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM #tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT calls from the orginal table, which means one-row-per-ID and use a correalted sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM #tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Considering your table having good amount of rows (~4 Million), I would suggest you to create a persisted calculated field in the table, to store these values. As calculating these values at run time in a view, will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(#str) = 0
BEGIN
SET #output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
I can suggest to split string into its characters using referred SQL function.
Then you can concatenate string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use SQL String_Agg string aggregation function to concatenate characters splitted in an ordered way as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Efficient way to merge alternating values from two columns into one column in SQL Server

I have two columns in a table. I want to merge them into a single column, but the merge should be done taking alternate characters from each columns.
For example:
Column A --> value (1,2,3)
Column B --> value (A,B,C)
Required result - (1,A,2,B,3,C)
It should be done without loops.
You need to make use of the UNION and get a little creative with how you choose to alternate. My solution ended up looking like this.
SELECT ColumnA
FROM Table
WHERE ColumnA%2=1
UNION
SELECT ColumnB
FROM TABLE
WHERE ColumnA%2=0
If you have an ID/PK column that could just as easily be used, I just didn't want to assume anything about your table.
EDIT:
If your table contains duplicates that you wish to keep, use UNION ALL instead of UNION
Try This;
SELECT [value]
FROM [Table]
UNPIVOT
(
[value] FOR [Column] IN ([Column_A], [Column_B])
) UNPVT
If you have SQL 2016 or higher you can use:
SELECT QUOTENAME(STRING_AGG (cast(a as varchar(1)) + ',' + b, ','), '()')
FROM test;
In older versions, depending on how much data you have in your tables you can also try:
SELECT QUOTENAME(STUFF(
(SELECT ',' + cast(a as varchar(1)) + ',' + b
FROM test
FOR XML PATH('')), 1, 1,''), '()')
Here you can try a sample
http://sqlfiddle.com/#!18/6c9af/5
with data as (
select *, row_number() over order by colA) as rn
from t
)
select rn,
case rn % 2 when 1 then colA else colB end as alternating
from data;
The following SQL uses undocumented aggregate concatenation technique. This is described in Inside Microsoft SQL Server 2008 T-SQL Programming on page 33.
declare #x varchar(max) = '';
declare #t table (a varchar(10), b varchar(10));
insert into #t values (1,'A'), (2,'B'),(3,'C');
select #x = #x + a + ',' + b + ','
from #t;
select '(' + LEFT(#x, LEN(#x) - 1) + ')';

Dynamic SELECT statement, generate columns based on present and future values

Currently building a SELECT statement in SQL Server 2008 but would like to make this SELECT statement dynamic, so the columns can be defined based on values in a table. I heard about pivot table and cursors, but seems kind of hard to understand at my current level, here is the code;
DECLARE #date DATE = null
IF #date is null
set # date = GETDATE() as DATE
SELECT
Name,
value1,
value2,
value3,
value4
FROM ref_Table a
FULL OUTER JOIN (
SELECT
PK_ID ID,
sum(case when FK_ContainerType_ID = 1 then 1 else null) Box,
sum(case when FK_ContainerType_ID = 2 then 1 else null) Pallet,
sum(case when FK_ContainerType_ID = 3 then 1 else null) Bag,
sum(case when FK_ContainerType_ID = 4 then 1 else null) Drum
from
Packages
WHERE
#date between PackageStart AND PackageEnd
group by PK_ID ) b on a.Name = b.ID
where
Group = 0
The following works great for me , but PK_Type_ID and the name of the column(PackageNameX,..) are hard coded, I need to be dynamic and it can build itself based on present or futures values in the Package table.
Any help or guidance on the right direction would be greatly appreciated...,
As requested
ref_Table (PK_ID, Name)
1, John
2, Mary
3, Albert
4, Jane
Packages (PK_ID, FK_ref_Table_ID, FK_ContainerType_ID, PackageStartDate, PackageEndDate)
1 , 1, 4, 1JAN2014, 30JAN2014
2 , 2, 3, 1JAN2014, 30JAN2014
3 , 3, 2, 1JAN2014, 30JAN2014
4 , 4, 1, 1JAN2014, 30JAN2014
ContainerType (PK_ID, Type)
1, Box
2, Pallet
3, Bag
4, Drum
and the result should look like this;
Name Box Pallet Bag Drum
---------------------------------------
John 1
Mary 1
Albert 1
Jane 1
The following code like I said works great, the issue is the Container table is going to grow and I need to replicated the same report without hard coding the columns.
What you need to build is called a dynamic pivot. There are plenty of good references on Stack if you search out that term.
Here is a solution to your scenario:
IF OBJECT_ID('tempdb..##ref_Table') IS NOT NULL
DROP TABLE ##ref_Table
IF OBJECT_ID('tempdb..##Packages') IS NOT NULL
DROP TABLE ##Packages
IF OBJECT_ID('tempdb..##ContainerType') IS NOT NULL
DROP TABLE ##ContainerType
SET NOCOUNT ON
CREATE TABLE ##ref_Table (PK_ID INT, NAME NVARCHAR(50))
CREATE TABLE ##Packages (PK_ID INT, FK_ref_Table_ID INT, FK_ContainerType_ID INT, PackageStartDate DATE, PackageEndDate DATE)
CREATE TABLE ##ContainerType (PK_ID INT, [Type] NVARCHAR(50))
INSERT INTO ##ref_Table (PK_ID,NAME)
SELECT 1,'John' UNION
SELECT 2,'Mary' UNION
SELECT 3,'Albert' UNION
SELECT 4,'Jane'
INSERT INTO ##Packages (PK_ID, FK_ref_Table_ID, FK_ContainerType_ID, PackageStartDate, PackageEndDate)
SELECT 1,1,4,'2014-01-01','2014-01-30' UNION
SELECT 2,2,3,'2014-01-01','2014-01-30' UNION
SELECT 3,3,2,'2014-01-01','2014-01-30' UNION
SELECT 4,4,1,'2014-01-01','2014-01-30'
INSERT INTO ##ContainerType (PK_ID, [Type])
SELECT 1,'Box' UNION
SELECT 2,'Pallet' UNION
SELECT 3,'Bag' UNION
SELECT 4,'Drum'
DECLARE #DATE DATE, #PARAMDEF NVARCHAR(MAX), #COLS NVARCHAR(MAX), #SQL NVARCHAR(MAX)
SET #DATE = '2014-01-15'
SET #COLS = STUFF((SELECT DISTINCT ',' + QUOTENAME(T.[Type])
FROM ##ContainerType T
FOR XML PATH, TYPE).value('.', 'NVARCHAR(MAX)'),1,1,'')
SET #SQL = 'SELECT [Name], ' + #COLS + '
FROM (SELECT [Name], [Type], 1 AS Value
FROM ##ref_Table R
JOIN ##Packages P ON R.PK_ID = P.FK_ref_Table_ID
JOIN ##ContainerType T ON P.FK_ContainerType_ID = T.PK_ID
WHERE #DATE BETWEEN P.PackageStartDate AND P.PackageEndDate) X
PIVOT (COUNT(Value) FOR [Type] IN (' + #COLS + ')) P
'
PRINT #COLS
PRINT #SQL
SET #PARAMDEF = '#DATE DATE'
EXEC SP_EXECUTESQL #SQL, #PARAMDEF, #DATE=#DATE
Output:
Name Bag Box Drum Pallet
Albert 0 0 0 1
Jane 0 1 0 0
John 0 0 1 0
Mary 1 0 0 0
Static Query:
SELECT [Name],[Box],[Pallet],[Bag],[Drum] FROM
(
SELECT *
FROM
(
SELECT rf.Name,cnt.[Type], pk.PK_ID AS PKID, rf.PK_ID AS RFID
FROM ref_Table rf INNER JOIN Packages pk ON rf.PK_ID = pk.FK_ref_Table_ID
INNER JOIN ContanerType cnt ON cnt.PK_ID = pk.FK_ContainerType_ID
) AS SourceTable
PIVOT
(
COUNT(PKID )
FOR [Type]
IN ( [Box],[Pallet],[Bag],[Drum])
) AS PivotTable
) AS Main
ORDER BY RFID
Dynamic Query:
DECLARE #columnList nvarchar (MAX)
DECLARE #pivotsql nvarchar (MAX)
SELECT #columnList = STUFF(
(
SELECT ',' + '[' + [Type] + ']'
FROM ContanerType
FOR XML PATH( '')
)
,1, 1,'' )
SET #pivotsql =
N'SELECT [Name],' + #columnList + ' FROM
(
SELECT *
FROM
(
SELECT rf.Name,cnt.[Type], pk.PK_ID AS PKID, rf.PK_ID AS RFID
FROM ref_Table rf INNER JOIN Packages pk ON rf.PK_ID = pk.FK_ref_Table_ID
INNER JOIN ContanerType cnt ON cnt.PK_ID = pk.FK_ContainerType_ID
) AS SourceTable
PIVOT
(
COUNT(PKID )
FOR [Type]
IN ( ' + #columnList + ')
) AS PivotTable
) AS Main
ORDER BY RFID;'
EXEC sp_executesql #pivotsql
Following my tutorial below will help you to understand the PIVOT functionality:
We write sql queries in order to get different result sets like full, partial, calculated, grouped, sorted etc from the database tables. However sometimes we have requirements that we have to rotate our tables. Sounds confusing?
Let's keep it simple and consider the following two screen grabs.
SQL Table:
Expected Results:
Wow, that's look like a lot of work! That is a combination of tricky sql, temporary tables, loops, aggregation......, blah blah blah
Don't worry let's keep it simple, stupid(KISS).
MS SQL Server 2005 and above has a function called PIVOT. It s very simple to use and powerful. With the help of this function we will be able to rotate sql tables and result sets.
Simple steps to make it happen:
Identify all the columns those will be part of the desired result set.
Find the column on which we will apply aggregation(sum,ave,max,min etc)
Identify the column which values will be the column header.
Specify the column values mentioned in step3 with comma separated and surrounded by square brackets.
So, if we now follow above four steps and extract information from the above sales table, it will be as below:
Year, Month, SalesAmount
SalesAmount
Month
[Jan],[Feb] ,[Mar] .... etc
We are nearly there if all the above steps made sense to you so far.
Now we have all the information we need. All we have to do now is to fill the below template with required information.
Template:
Our SQL query should look like below:
SELECT *
FROM
(
SELECT SalesYear, SalesMonth,Amount
FROM Sales
) AS SourceTable
PIVOT
(
SUM(Amount )
FOR SalesMonth
IN ( [Jan],[Feb] ,[Mar],
[Apr],[May],[Jun] ,[Jul],
[Aug],[Sep] ,[Oct],[Nov] ,[Dec])
) AS PivotTable;
In the above query we have hard coded the column names. Well it's not fun when you have to specify a number of columns.
However, there is a work arround as follows:
DECLARE #columnList nvarchar (MAX)
DECLARE #pivotsql nvarchar (MAX)
SELECT #columnList = STUFF(
(
SELECT ',' + '[' + SalesMonth + ']'
FROM Sales
GROUP BY SalesMonth
FOR XML PATH( '')
)
,1, 1,'' )
SET #pivotsql =
N'SELECT *
FROM
(
SELECT SalesYear, SalesMonth,Amount
FROM Sales
) AS SourceTable
PIVOT
(
SUM(Amount )
FOR SalesMonth
IN ( ' + #columnList +' )
) AS PivotTable;'
EXEC sp_executesql #pivotsql
Hopefully this tutorial will be a help to someone somewhere.
Enjoy coding.

SQL group records with same value, and place that into a table

I want to group same record with SQL, the following is my result
Name Code Qty
data1 AG 12
data1 AS 15
data2 MS 10
data2 IS 11
I want it to be like this instead.
Name Code Qty Code Qty
data1 AG 12 AS 15
data2 MS 10 IS 11
Can this be done in SQL only?
Can this be done in SQL only?
With a variable number of columns you have to build a the query dynamically.
I would hardly call this "SQL only" but it can be done in T-SQL and here is one way.
-- Sample table
declare #T table
(
Name varchar(5),
Code varchar(2),
Qty int
)
-- Sample data
insert into #T values
('data1', 'AG', 12),
('data1', 'AS', 15),
('data1', 'AQ', 17),
('data2', 'MS', 10),
('data2', 'IS', 11)
declare #XML xml
declare #SQL nvarchar(max)
declare #Max int
-- Max number of codes per name
select #Max = max(C)
from (select count(*) as C
from #T
group by Name) as T
-- Convert table to XML
set #XML = (select Name,
(select Code,
Qty
from #T as T2
where T1.Name = T2.Name
for xml path('c'), type)
from #T as T1
group by Name
for xml path('r'))
-- Build a dynamic query
;with Numbers(Number) as
(
select 1
union all
select Number + 1
from Numbers
where Number < #Max
)
select #SQL = 'select T.N.value(''Name[1]'', ''varchar(5)'') as Name ' +
(select ',T.N.value(''c['+cast(Number as nvarchar(10))+']/Code[1]'', ''char(2)'') as Code
,T.N.value(''c['+cast(Number as nvarchar(10))+']/Qty[1]'', ''int'') as Qty'
from Numbers
for xml path(''), type).value('.', 'nvarchar(max)') +
' from #xml.nodes(''/r'') as T(N)'
-- Execute query
exec sp_executesql #SQL, N'#xml xml', #XML
Result:
Name Code Qty Code Qty Code Qty
----- ---- ----------- ---- ----------- ---- -----------
data1 AG 12 AS 15 AQ 17
data2 MS 10 IS 11 NULL NULL
Try here: https://data.stackexchange.com/stackoverflow/q/122860/
Assuming you are using SQLServer, you could use rank to assign a sequential number to each code with the name group, like so:
select Name, Code, Qty, Rank() OVER (PARTITION BY Name ORDER BY Code) AS CodeRank
from MyTable
Then you can use the pivot functionality within either SQLServer or SSRS to format this as required.
Not exactly. You can hack around with the idea of concatenating lots of Code,Qty pairs into a single value for each record, though. I'm not sure what you're planning to do with multiple columns with the same name, even if such a thing were possible.

How can I pivot these key+values rows into a table of complete entries?

Maybe I demand too much from SQL but I feel like this should be possible. I start with a list of key-value pairs, like this:
'0:First, 1:Second, 2:Third, 3:Fourth'
etc. I can split this up pretty easily with a two-step parse that gets me a table like:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 1 1
3 1 Second
etc.
Now, in the simple case of splitting the pairs into a pair of columns, it's fairly easy. I'm interested in the more advanced case where I might have multiple values per entry, like:
'0:First:Fishing, 1:Second:Camping, 2:Third:Hiking'
and such.
In that generic case, I'd like to find a way to take my 3-column result table and somehow pivot it to have one row per entry and one column per value-part.
So I want to turn this:
EntryNumber PairNumber Item
0 0 0
1 0 First
2 0 Fishing
3 1 1
4 1 Second
5 1 Camping
Into this:
Entry [1] [2] [3]
0 0 First Fishing
1 1 Second Camping
Is that just too much for SQL to handle, or is there a way? Pivots (even tricky dynamic pivots) seem like an answer, but I can't figure how to get that to work.
No, in SQL you can't infer columns dynamically based on the data found during the same query.
Even using the PIVOT feature in Microsoft SQL Server, you must know the columns when you write the query, and you have to hard-code them.
You have to do a lot of work to avoid storing the data in a relational normal form.
Alright, I found a way to accomplish what I was after. Strap in, this is going to get bumpy.
So the basic problem is to take a string with two kinds of delimiters: entries and values. Each entry represents a set of values, and I wanted to turn the string into a table with one column for each value per entry. I tried to make this a UDF, but the necessity for a temporary table and dynamic SQL meant it had to be a stored procedure.
CREATE PROCEDURE [dbo].[ParseValueList]
(
#parseString varchar(8000),
#itemDelimiter CHAR(1),
#valueDelimiter CHAR(1)
)
AS
BEGIN
SET NOCOUNT ON;
IF object_id('tempdb..#ParsedValues') IS NOT NULL
BEGIN
DROP TABLE #ParsedValues
END
CREATE TABLE #ParsedValues
(
EntryID int,
[Rank] int,
Pair varchar(200)
)
So that's just basic set up, establishing the temp table to hold my intermediate results.
;WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),--Brute forces 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --Uses a cross join to generate 100 rows (10 * 10)
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --Uses a cross join to generate 10,000 rows (100 * 100)
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E4)
That beautiful piece of SQL comes from SQL Server Central's Forums and is credited to "a guru." It's a great little 10,000 line tally table perfect for string splitting.
INSERT INTO #ParsedValues
SELECT ItemNumber AS EntryID, ROW_NUMBER() OVER (PARTITION BY ItemNumber ORDER BY ItemNumber) AS [Rank],
SUBSTRING(Items.Item, T1.N, CHARINDEX(#valueDelimiter, Items.Item + #valueDelimiter, T1.N) - T1.N) AS [Value]
FROM(
SELECT ROW_NUMBER() OVER (ORDER BY T2.N) AS ItemNumber,
SUBSTRING(#parseString, T2.N, CHARINDEX(#itemDelimiter, #parseString + #itemDelimiter, T2.N) - T2.N) AS Item
FROM cteTally T2
WHERE T2.N < LEN(#parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(#itemDelimiter + #parseString, T2.N, 1) = #itemDelimiter
) AS Items, cteTally T1
WHERE T1.N < LEN(#parseString) + 2 --Ensures we cut out once the entire string is done
AND SUBSTRING(#valueDelimiter + Items.Item, T1.N, 1) = #valueDelimiter
Ok, this is the first really dense meaty part. The inner select is breaking up my string along the item delimiter (the comma), using the guru's string splitting method. Then that table is passed up to the outer select which does the same thing, but this time using the value delimiter (the colon) to each row. The inner RowNumber (EntryID) and the outer RowNumber over Partition (Rank) are key to the pivot. EntryID show which Item the values belong to, and Rank shows the ordinal of the values.
DECLARE #columns varchar(200)
DECLARE #columnNames varchar(2000)
DECLARE #query varchar(8000)
SELECT #columns = COALESCE(#columns + ',[' + CAST([Rank] AS varchar) + ']', '[' + CAST([Rank] AS varchar)+ ']'),
#columnNames = COALESCE(#columnNames + ',[' + CAST([Rank] AS varchar) + '] AS Value' + CAST([Rank] AS varchar)
, '[' + CAST([Rank] AS varchar)+ '] AS Value' + CAST([Rank] AS varchar))
FROM (SELECT DISTINCT [Rank] FROM #ParsedValues) AS Ranks
SET #query = '
SELECT '+ #columnNames +'
FROM #ParsedValues
PIVOT
(
MAX([Value]) FOR [Rank]
IN (' + #columns + ')
) AS pvt'
EXECUTE(#query)
DROP TABLE #ParsedValues
END
And at last, the dynamic sql that makes it possible. By getting a list of Distinct Ranks, we set up our column list. This is then written into the dynamic pivot which tilts the values over and slots each value into the proper column, each with a generic "Value#" heading.
Thus by calling EXEC ParseValueList with a properly formatted string of values, we can break it up into a table to feed into our purposes! It works (but is probably overkill) for simple key:value pairs, and scales up to a fair number of columns (About 50 at most, I think, but that'd be really silly.)
Anyway, hope that helps anyone having a similar issue.
(Yeah, it probably could have been done in something like SQLCLR as well, but I find a great joy in solving problems with pure SQL.)
Though probably not optimal, here's a more condensed solution.
DECLARE #DATA varchar(max);
SET #DATA = '0:First:Fishing, 1:Second:Camping, 2:Third:Hiking';
SELECT
DENSE_RANK() OVER (ORDER BY [Data].[row]) AS [Entry]
, [Data].[row].value('(./B/text())[1]', 'int') as "[1]"
, [Data].[row].value('(./B/text())[2]', 'varchar(64)') as "[2]"
, [Data].[row].value('(./B/text())[3]', 'varchar(64)') as "[3]"
FROM
(
SELECT
CONVERT(XML, '<A><B>' + REPLACE(REPLACE(#DATA , ',', '</B></A><A><B>'), ':', '</B><B>') + '</B></A>').query('.')
) AS [T]([c])
CROSS APPLY [T].[c].nodes('/A') AS [Data]([row]);
Hope is not too late.
You can use the function RANK to know the position of each Item per PairNumber. And then use Pivot
SELECT PairNumber, [1] ,[2] ,[3]
FROM
(
SELECT PairNumber, Item, RANK() OVER (PARTITION BY PairNumber order by EntryNumber) as RANKing
from tabla) T
PIVOT
(MAX(Item)
FOR RANKing in ([1],[2],[3])
)as PVT