How to create a SQL Server function to "join" multiple rows from a subquery into a single delimited field? [duplicate] - sql

This question already has answers here:
How to concatenate text from multiple rows into a single text string in SQL Server
(47 answers)
Closed 5 years ago.
To illustrate, assume that I have two tables as follows:
VehicleID Name
1 Chuck
2 Larry
LocationID VehicleID City
1 1 New York
2 1 Seattle
3 1 Vancouver
4 2 Los Angeles
5 2 Houston
I want to write a query to return the following results:
VehicleID Name Locations
1 Chuck New York, Seattle, Vancouver
2 Larry Los Angeles, Houston
I know that this can be done using server side cursors, ie:
DECLARE @VehicleID int
DECLARE @VehicleName varchar(100)
DECLARE @LocationCity varchar(100)
DECLARE @Locations varchar(4000)

DECLARE @Results TABLE
(
    VehicleID int,
    Name varchar(100),
    Locations varchar(4000)
)

DECLARE VehiclesCursor CURSOR FOR
SELECT
    [VehicleID]
    , [Name]
FROM [Vehicles]

OPEN VehiclesCursor

FETCH NEXT FROM VehiclesCursor INTO
    @VehicleID
    , @VehicleName

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Locations = ''

    DECLARE LocationsCursor CURSOR FOR
    SELECT
        [City]
    FROM [Locations]
    WHERE [VehicleID] = @VehicleID

    OPEN LocationsCursor

    FETCH NEXT FROM LocationsCursor INTO
        @LocationCity

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Locations = @Locations + @LocationCity

        FETCH NEXT FROM LocationsCursor INTO
            @LocationCity
    END

    CLOSE LocationsCursor
    DEALLOCATE LocationsCursor

    INSERT INTO @Results (VehicleID, Name, Locations)
    SELECT @VehicleID, @VehicleName, @Locations

    FETCH NEXT FROM VehiclesCursor INTO
        @VehicleID
        , @VehicleName
END

CLOSE VehiclesCursor
DEALLOCATE VehiclesCursor

SELECT * FROM @Results
However, as you can see, this requires a great deal of code. What I would like is a generic function that would allow me to do something like this:
SELECT VehicleID
, Name
, JOIN(SELECT City FROM Locations WHERE VehicleID = Vehicles.VehicleID, ', ') AS Locations
FROM Vehicles
Is this possible? Or something similar?

If you're using SQL Server 2005, you could use the FOR XML PATH command.
SELECT [VehicleID]
, [Name]
, (STUFF((SELECT CAST(', ' + [City] AS VARCHAR(MAX))
FROM [Location]
WHERE (VehicleID = Vehicle.VehicleID)
FOR XML PATH ('')), 1, 2, '')) AS Locations
FROM [Vehicle]
It's a lot easier than using a cursor, and seems to work fairly well.
Update
For anyone still using this method with newer versions of SQL Server, there is another way of doing it which is a bit easier and more performant using the
STRING_AGG method that has been available since SQL Server 2017.
SELECT [VehicleID]
,[Name]
,(SELECT STRING_AGG([City], ', ')
FROM [Location]
WHERE VehicleID = V.VehicleID) AS Locations
FROM [Vehicle] V
This also allows a different separator to be specified as the second parameter, providing a little more flexibility over the former method.
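If the order of the cities matters, STRING_AGG also accepts a WITHIN GROUP clause; a minimal sketch against the same [Vehicle]/[Location] tables used above:
SELECT [VehicleID]
    ,[Name]
    ,(SELECT STRING_AGG([City], ', ') WITHIN GROUP (ORDER BY [City])
      FROM [Location]
      WHERE VehicleID = V.VehicleID) AS Locations
FROM [Vehicle] V
Without the WITHIN GROUP clause the order of the concatenated values is not guaranteed.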

Note that Matt's code will result in an extra comma at the end of the string; using COALESCE (or ISNULL for that matter) as shown in the link in Lance's post uses a similar method but doesn't leave you with an extra comma to remove. For the sake of completeness, here's the relevant code from Lance's link on sqlteam.com:
DECLARE @EmployeeList varchar(100)

SELECT @EmployeeList = COALESCE(@EmployeeList + ', ', '') +
    CAST(EmpUniqueID AS varchar(5))
FROM SalesCallsEmployees
WHERE SalCal_UniqueID = 1

I don't believe there's a way to do it within one query, but you can play tricks like this with a temporary variable:
declare @s varchar(max)
set @s = ''
select @s = @s + City + ',' from Locations
select @s
It's definitely less code than walking over a cursor, and probably more efficient.
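To get the per-vehicle output the question asks for, the same trick can be scoped with a WHERE clause; a minimal sketch against the question's Locations table (note that the order of concatenation with this pattern is not guaranteed):
declare @s varchar(max)
set @s = ''
select @s = @s + City + ', ' from Locations where VehicleID = 1
select @s -- e.g. 'New York, Seattle, Vancouver, ' (trailing separator still to trim)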

In a single SQL query, without using the FOR XML clause.
A Common Table Expression is used to recursively concatenate the results.
-- rank locations by incrementing lexicographical order
WITH RankedLocations AS (
SELECT
VehicleID,
City,
ROW_NUMBER() OVER (
PARTITION BY VehicleID
ORDER BY City
) Rank
FROM
Locations
),
-- concatenate locations using a recursive query
-- (Common Table Expression)
Concatenations AS (
-- for each vehicle, select the first location
SELECT
VehicleID,
CONVERT(nvarchar(MAX), City) Cities,
Rank
FROM
RankedLocations
WHERE
Rank = 1
-- then incrementally concatenate with the next location
-- this will return intermediate concatenations that will be
-- filtered out later on
UNION ALL
SELECT
c.VehicleID,
(c.Cities + ', ' + l.City) Cities,
l.Rank
FROM
Concatenations c -- this is a recursion!
INNER JOIN RankedLocations l ON
l.VehicleID = c.VehicleID
AND l.Rank = c.Rank + 1
),
-- rank concatenation results by decrementing length
-- (rank 1 will always be for the longest concatenation)
RankedConcatenations AS (
SELECT
VehicleID,
Cities,
ROW_NUMBER() OVER (
PARTITION BY VehicleID
ORDER BY Rank DESC
) Rank
FROM
Concatenations
)
-- main query
SELECT
v.VehicleID,
v.Name,
c.Cities
FROM
Vehicles v
INNER JOIN RankedConcatenations c ON
c.VehicleID = v.VehicleID
AND c.Rank = 1

From what I can see FOR XML (as posted earlier) is the only way to do it if you want to also select other columns (which I'd guess most would) as the OP does.
Using COALESCE(@var... does not allow inclusion of other columns.
Update:
Thanks to programmingsolutions.net there is a way to remove the "trailing" comma too.
By making it a leading comma and using the STUFF function of MSSQL, you can replace the first character (the leading comma) with an empty string, as below:
stuff(
    (select ',' + [Column]
     from [Table] [inner]
     where [inner].Id = [outer].Id
     for xml path('')
    ), 1, 1, '') as [Values]

In SQL Server 2005
SELECT Stuff(
(SELECT N', ' + Name FROM Names FOR XML PATH(''),TYPE)
.value('text()[1]','nvarchar(max)'),1,2,N'')
In SQL Server 2016
you can use the FOR JSON syntax, i.e.
SELECT per.ID,
Emails = JSON_VALUE(
REPLACE(
(SELECT _ = em.Email FROM Email em WHERE em.Person = per.ID FOR JSON PATH)
,'"},{"_":"',', '),'$[0]._'
)
FROM Person per
And the result will become
Id Emails
1 abc@gmail.com
2 NULL
3 def@gmail.com, xyz@gmail.com
This will work even if your data contains invalid XML characters.
The '"},{"_":"' is safe because if your data contains '"},{"_":"', it will be escaped to "},{\"_\":\"
You can replace ', ' with any string separator.
And in SQL Server 2017 and Azure SQL Database
you can use the new STRING_AGG function.
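For example, a minimal sketch against the same Person/Email tables used above:
SELECT per.ID,
       (SELECT STRING_AGG(em.Email, ', ')
        FROM Email em
        WHERE em.Person = per.ID) AS Emails
FROM Person per
As with the FOR JSON version, a person with no e-mail rows comes back as NULL.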

The code below will work for SQL Server 2000/2005/2008.
CREATE FUNCTION fnConcatVehicleCities(@VehicleId SMALLINT)
RETURNS VARCHAR(1000) AS
BEGIN
    DECLARE @csvCities VARCHAR(1000)

    SELECT @csvCities = COALESCE(@csvCities + ', ', '') + COALESCE(City, '')
    FROM Locations
    WHERE VehicleId = @VehicleId

    RETURN @csvCities
END
-- Once the user-defined function is created, run the SQL below
SELECT VehicleID
, dbo.fnConcatVehicleCities(VehicleId) AS Locations
FROM Vehicles
GROUP BY VehicleID

I've found a solution by creating the following function:
CREATE FUNCTION [dbo].[JoinTexts]
(
    @delimiter VARCHAR(20),
    @whereClause VARCHAR(1)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Texts VARCHAR(MAX)

    SELECT @Texts = COALESCE(@Texts + @delimiter, '') + T.Texto
    FROM SomeTable AS T
    WHERE T.SomeOtherColumn = @whereClause

    RETURN @Texts
END
GO
Usage:
SELECT dbo.JoinTexts(' , ', 'Y')

Mun's answer didn't work for me so I made some changes to that answer to get it to work. Hope this helps someone.
Using SQL Server 2012:
SELECT [VehicleID]
, [Name]
, STUFF((SELECT DISTINCT ', ' + CONVERT(VARCHAR(100), City)
         FROM [Location]
         WHERE (VehicleID = Vehicle.VehicleID)
         FOR XML PATH ('')), 1, 2, '') AS Locations
FROM [Vehicle]

VERSION NOTE: You must be using SQL Server 2005 or greater with Compatibility Level set to 90 or greater for this solution.
See this MSDN article for the first example of creating a user-defined aggregate function that concatenates a set of string values taken from a column in a table.
My humble recommendation would be to leave out the appended comma so you can use your own ad-hoc delimiter, if any.
Referring to the C# version of Example 1:
change: this.intermediateResult.Append(value.Value).Append(',');
to: this.intermediateResult.Append(value.Value);
And
change: output = this.intermediateResult.ToString(0, this.intermediateResult.Length - 1);
to: output = this.intermediateResult.ToString();
That way when you use your custom aggregate, you can opt to use your own delimiter, or none at all, such as:
SELECT dbo.CONCATENATE(column1 + '|') from table1
NOTE: Be careful about the amount of the data you attempt to process in your aggregate. If you try to concatenate thousands of rows or many very large datatypes you may get a .NET Framework error stating "[t]he buffer is insufficient."

With the other answers, the reader must know the vehicle tables and create those tables and data in order to test a solution.
Below is an example that uses the SQL Server INFORMATION_SCHEMA.COLUMNS view. Because it relies on metadata every database already has, no tables need to be created and no data added. This example creates a comma-separated list of column names for all tables in the database.
SELECT
Table_Name
,STUFF((
SELECT ',' + Column_Name
FROM INFORMATION_SCHEMA.Columns Columns
WHERE Tables.Table_Name = Columns.Table_Name
ORDER BY Column_Name
FOR XML PATH ('')), 1, 1, ''
)Columns
FROM INFORMATION_SCHEMA.Columns Tables
GROUP BY TABLE_NAME
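On SQL Server 2017 or later the same list can be produced with STRING_AGG, which also makes the ordering explicit; a sketch:
SELECT Table_Name
    ,STRING_AGG(Column_Name, ',') WITHIN GROUP (ORDER BY Column_Name) AS Columns
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY Table_Name
For tables with very many or very long column names you may need to CAST(Column_Name AS nvarchar(max)) first to avoid the default 8000-byte result limit of STRING_AGG.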

Try this query
SELECT v.VehicleId, v.Name, ll.LocationList
FROM Vehicles v
LEFT JOIN
(SELECT
DISTINCT
VehicleId,
REPLACE(
REPLACE(
REPLACE(
(
SELECT City as c
FROM Locations x
WHERE x.VehicleID = l.VehicleID FOR XML PATH('')
),
'</c><c>',', '
),
'<c>',''
),
'</c>', ''
) AS LocationList
FROM Locations l
) ll ON ll.VehicleId = v.VehicleId

If you're running SQL Server 2005, you can write a custom CLR aggregate function to handle this.
C# version:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)]
public class CSV:IBinarySerialize
{
private StringBuilder Result;
public void Init() {
this.Result = new StringBuilder();
}
public void Accumulate(SqlString Value) {
if (Value.IsNull) return;
this.Result.Append(Value.Value).Append(",");
}
public void Merge(CSV Group) {
this.Result.Append(Group.Result);
}
public SqlString Terminate() {
return new SqlString(this.Result.ToString());
}
public void Read(System.IO.BinaryReader r) {
this.Result = new StringBuilder(r.ReadString());
}
public void Write(System.IO.BinaryWriter w) {
w.Write(this.Result.ToString());
}
}
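To call it from T-SQL you still need to register the compiled assembly and create the aggregate; a rough sketch (the assembly name and DLL path here are assumptions):
-- requires CLR integration: sp_configure 'clr enabled', 1; RECONFIGURE;
CREATE ASSEMBLY CsvAggregate
FROM 'C:\CLR\CsvAggregate.dll'   -- assumed path to the compiled DLL
WITH PERMISSION_SET = SAFE;
GO
CREATE AGGREGATE dbo.CSV (@Value nvarchar(4000))
RETURNS nvarchar(4000)
EXTERNAL NAME CsvAggregate.CSV;
GO
-- usage against the question's tables
SELECT VehicleID, dbo.CSV(City) AS Locations
FROM Locations
GROUP BY VehicleID;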

Related

SQL query to obtain counts from two tables based on values and field names

I want to count the alerts of the candidates based on district.
Below is the district-wise alert lookup table
Table_LKP_AlertMastInfo
DistrictID FieldName AlertOptionValue
71 AreYouMarried Yes
71 Gender Female
72 AreYouMarried Yes
The FieldName values in Table_LKP_AlertMastInfo should be compared against the corresponding columns in Table_RegistrationInfo, checking the AlertOptionValue, to get the counts.
Below is the candidate details table:
Table_RegistrationInfo
CandidateId DistrictID AreYouMarried Gender
Can001 71 Yes Female
Can002 71 No Female
Can003 72 Yes Man
Can004 72 No Man
I want output like below:
Can001 2
Can002 1
Can003 1
Explanation of the above output counts:
Can001 has selected AreYouMarried: Yes and Gender: Female, so the count is 2
Can002 has selected Gender: Female, so the count is 1
Can003 has selected AreYouMarried: Yes, so the count is 1
Can004 has no alerts
This won't be possible without dynamic SQL if your data is modeled like it is, i.e. key-value pairs in Table_LKP_AlertMastInfo and columns in Table_RegistrationInfo. So with that out of the way, let's do it. Full code for the stored procedure providing the exact results you need is at the end; an explanation of what it does follows.
Because the alerts are specified as key-value pairs (field name - field value), we'll first need to get the candidate data into the same format. UNPIVOT can fix this right up, if we can give it the list of the fields. Had we only the two fields you mention in the question, it would be rather easy, something like:
SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN (AreYouMarried, Gender)) upvt
Of course that's not the case, so we'll need to dynamically select the list of the fields we're interested in and provide that. Since you're on 2008 R2, STRING_AGG is not yet available, so we'll use the XML trick to aggregate all the fields into a single string and provide it to the query above.
DECLARE @sql NVARCHAR(MAX)
SELECT @sql = CONCAT('SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN (',
STUFF((
    SELECT DISTINCT ',' + ami.FieldName
    FROM Table_LKP_AlertMastInfo ami
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, ''), ')) upvt')
PRINT @sql
This produces almost the exact output as the query I wrote. Next, we need to store this data somewhere. Temporary tables to the rescue. Let's create one and insert into it using this dynamic SQL.
CREATE TABLE #candidateFields
(
CandidateID VARCHAR(50),
DistrictID INT,
FieldName NVARCHAR(200),
FieldValue NVARCHAR(1000)
);
INSERT INTO #candidateFields
EXEC sp_executesql @sql
-- (8 rows affected)
-- We could index this for good measure
CREATE UNIQUE CLUSTERED INDEX uxc#candidateFields on #candidateFields
(
CandidateId, DistrictId, FieldName, FieldValue
);
Great, with that out of the way, we now have both data sets - alerts and candidate data - in the same format. It's a matter of joining to find matches between both:
SELECT cf.CandidateID, COUNT(*) AS matches
FROM #candidateFields cf
INNER
JOIN Table_LKP_AlertMastInfo alerts
ON alerts.DistrictID = cf.DistrictID
AND alerts.FieldName = cf.FieldName
AND alerts.AlertOptionValue = cf.FieldValue
GROUP BY cf.CandidateID
Provides the desired output for the sample data:
CandidateID matches
-------------------------------------------------- -----------
Can001 2
Can002 1
Can003 1
(3 rows affected)
So we can stitch all that together now to form a reusable stored procedure:
CREATE PROCEDURE dbo.findMatches
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql NVARCHAR(MAX)
SELECT @sql = CONCAT('SELECT CandidateId, DistrictID
, FieldName
, FieldValue
FROM Table_RegistrationInfo t
UNPIVOT (FieldValue FOR FieldName IN (',
STUFF((
    SELECT DISTINCT ',' + ami.FieldName
    FROM Table_LKP_AlertMastInfo ami
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, ''), ')) upvt')
CREATE TABLE #candidateFields
(
CandidateID VARCHAR(50),
DistrictID INT,
FieldName NVARCHAR(200),
FieldValue NVARCHAR(1000)
);
INSERT INTO #candidateFields
EXEC sp_executesql @sql
CREATE UNIQUE CLUSTERED INDEX uxc#candidateFields on #candidateFields
(
CandidateId, DistrictId, FieldName
);
SELECT cf.CandidateID, COUNT(*) AS matches
FROM #candidateFields cf
JOIN Table_LKP_AlertMastInfo alerts
ON alerts.DistrictID = cf.DistrictID
AND alerts.FieldName = cf.FieldName
AND alerts.AlertOptionValue = cf.FieldValue
GROUP BY cf.CandidateID
END;
Execute with
EXEC dbo.findMatches
You'd of course need to adjust types and probably add a bunch of other things here, like error handling, but this should get you started on the right path. You'll want a covering index on that alert table and it should be pretty fast even with a lot of records.
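For example, something along these lines (the index name and exact key order are just a suggestion):
CREATE NONCLUSTERED INDEX IX_AlertMastInfo_District_Field_Value
ON Table_LKP_AlertMastInfo (DistrictID, FieldName, AlertOptionValue);
This covers the join the stored procedure performs against the alert table.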
I managed to get the expected result without using dynamic queries.
Not sure if this is what you are looking for:
SELECT DISTINCT
c.CandidateId, SUM(a.AreYouMarriedAlert + a.GenderAlter) AS AlterCount
FROM
Table_RegistrationInfo c
OUTER APPLY
(
SELECT
CASE
WHEN a.FieldName = 'AreYouMarried' AND c.AreYouMarried = a.AlertOptionValue THEN 1
ELSE 0
END AS AreYouMarriedAlert,
CASE
WHEN a.FieldName = 'Gender' AND c.Gender = a.AlertOptionValue THEN 1
ELSE 0
END AS GenderAlter
FROM
Table_LKP_AlertMastInfo a
WHERE
a.DistrictID = c.DistrictID
) a
GROUP BY c.CandidateId
HAVING SUM(a.AreYouMarriedAlert + a.GenderAlter) > 0
I assume that with 100 fields you have a set of alerts which are a combination of values. Further, I assume that you can have a select list in a proper order all the time. So
select candidateid,
AreYouMarried + '|' + Gender as all_responses_in_one_string
from ....
is possible. So the above will return
candidateid all_responses_in_one_string
can001 Yes|Female
can002 No|Female
So now your alert can be a regular expression for the concatenated string. And your alert is based on how much you matched.
Here is one simple way of doing this:
SELECT subq.*
FROM
(SELECT CandidateId,
(SELECT COUNT(*)
FROM Table_LKP_AlertMastInfo ami
WHERE ami.DistrictID = ri.DistrictID
AND ami.FieldName ='AreYouMarried'
AND ami.AlertOptionValue = ri.AreYouMarried) +
(SELECT COUNT(*)
FROM Table_LKP_AlertMastInfo ami
WHERE ami.DistrictID = ri.DistrictID
AND ami.FieldName ='Gender'
AND ami.AlertOptionValue = ri.Gender) AS [count]
FROM Table_RegistrationInfo ri) subq
WHERE subq.[count] > 0;
See SQL Fiddle demo.
I am not sure if this can be completely done using SQL. If you are using some backend technology such as ADO.NET, then you can store the results in Datatables. Loop through the column names and do the comparison.
Dynamic SQL can be used to make Table_LKP_AlertMastInfo look like Table_RegistrationInfo.
This script can be used in a stored procedure and results can be retrieved in a Datatable.
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PivotFieldNameList nvarchar(MAX)
SET @SQL = ''
SET @PivotFieldNameList = ''

SELECT @PivotFieldNameList = @PivotFieldNameList + FieldName + ', '
FROM (SELECT DISTINCT FieldName FROM Table_LKP_AlertMastInfo) S

SET @PivotFieldNameList = SUBSTRING(@PivotFieldNameList, 1, LEN(@PivotFieldNameList) - 1)
--SELECT @PivotFieldNameList

SET @SQL = ' SELECT DistrictId, ' + @PivotFieldNameList + ' FROM
Table_LKP_AlertMastInfo
PIVOT
( MAX(AlertOptionValue)
FOR FieldName IN (' + @PivotFieldNameList + '
) ) AS p '

PRINT @SQL
EXEC(@SQL)
The above query returns results like below:
DistrictId AreYouMarried Gender
71 Yes Female
72 Yes NULL
If you get results from Table_RegistrationInfo into another Datatable, then both can be used for comparison.
Not tested but this should do the trick:
SELECT CandidateId,
( CASE
WHEN AreYouMarried = 'Yes' AND Gender = 'Female' THEN 2
WHEN Gender = 'Female' THEN 1
WHEN AreYouMarried = 'Yes' THEN 1
ELSE 0 END
) as CandidateValue
FROM
(SELECT * FROM Table_LKP_AlertMastInfo) as Alert
LEFT JOIN
(SELECT * FROM Table_RegistrationInfo) as Registration
ON (Alert.DistrictID = Registration.DistrictID);
This should give you a list with candidateId matching the condition count

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (@str VARCHAR(50))
RETURNS VARCHAR(50)
BEGIN
    DECLARE @len INT,
            @cnt INT = 1,
            @str1 VARCHAR(50) = '',
            @output VARCHAR(50) = ''

    SELECT @len = Len(@str)

    WHILE @cnt <= @len
    BEGIN
        SELECT @str1 += Substring(@str, @cnt, 1) + ','
        SET @cnt += 1
    END

    SELECT @str1 = LEFT(@str1, Len(@str1) - 1)

    SELECT @output += Sp_data
    FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
          FROM (SELECT Cast ('<M>' + Replace(@str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
          CROSS APPLY Data.nodes ('/M') AS Split(a)) A
    ORDER BY Sp_data

    RETURN @output
END
This works when calling one field
ie.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
however, when I try to apply this to the TOP(10) rows I receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind my source has ~4 million records, is this still the best solution?
Unfortunately I am not able to normalize the data into a separate table with records for each Code.
This doesn't rely on an id column to join with itself, and performance is almost as fast
as the answer by @Shnugo:
SELECT
    CustID,
    (
        SELECT chr
        FROM (SELECT TOP(LEN(Code))
                     SUBSTRING(Code, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)), 1)
              FROM sys.messages) A(Chr)
        ORDER BY chr
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)') AS CODE
FROM
    source t
First of all: Avoid loops...
You can try this:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC')
                       ,('JSKEzXO')
                       ,('QKEvYUJMKRC');

--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
    SELECT *
    FROM @tbl
    CROSS APPLY
        (SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
    CROSS APPLY
        (SELECT SUBSTRING(YourString,Nmbr,1)) B(Chr)
)
SELECT ID, YourString
    ,(
        SELECT Chr AS [*]
        FROM SeparatedCharacters sc1
        WHERE sc1.ID = t.ID
        ORDER BY sc1.Chr
        FOR XML PATH(''), TYPE
     ).value('.','nvarchar(max)') AS Sorted
FROM @tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT reads from the original table, which means one row per ID, and uses a correlated sub-query to fetch all related characters. They will be sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
    SELECT *
    FROM @tbl
    CROSS APPLY
        (SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
    CROSS APPLY
        (SELECT SUBSTRING(YourString,Nmbr,1)) B(Chr)
)
SELECT ID, YourString
    ,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID, YourString;
Considering that your table has a good number of rows (~4 million), I would suggest creating a persisted computed column in the table to store these values, because calculating them at run time in a view will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
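A sketch of what that could look like, assuming dbo.Alphaorder is made deterministic and schema-bound so the column can be PERSISTED (the CodeSorted column name is just an example):
ALTER TABLE dbo.source
ADD CodeSorted AS dbo.Alphaorder(Code) PERSISTED;
The view can then select or order by CodeSorted directly instead of calling the UDF for every row at query time.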
I think the error you are getting could be due to empty codes.
If LEN(@str) = 0
BEGIN
    SET @output = ''
END
ELSE
BEGIN
    ... EXISTING CODE BLOCK ...
END
I can suggest splitting the string into its characters using the referenced SQL split function.
Then you can concatenate the string back, this time ordered alphabetically.
Are you using SQL Server 2017? Because with SQL Server 2017, you can use the STRING_AGG string aggregation function to concatenate the split characters in an ordered way as follows:
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not working on SQL2017, then you can follow below structure using SQL XML PATH for concatenation in SQL
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

replace value in varchar(max) field with join

I have a table that contains a text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from the strings to the lookup table and replace them with the values. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple rows, one per placeholder, then join to the lookup table and concatenate the records back into the original row with the new values, but it takes around 10-30 seconds on average.
You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
SQL Fiddle
I borrowed the string splitting from this blog post by Aaron Bertrand
SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the notes as they arrive.
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Example at SQL Fiddle.
Another option is a while loop to keep replacing lookups until no more are found:
declare @notes table (notes varchar(max))

insert @notes
select Notes
from Notes

while 1=1
begin
    update n
    set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
    from @notes n
    outer apply
    (
        select top 1 Name
            , Value
        from Lookup l
        where n.Notes like '%##' + l.name + '##%'
    ) l
    where l.name is not null

    if @@rowcount = 0
        break
end

select *
from @notes
Example at SQL Fiddle.
I second the comment that T-SQL is just not suited for this operation, but if you must do it in the DB, here is an example using a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of lookup tokens (10k-100k), my function first extracts the potential tokens from the input and uses that set to join to your lookup (dbo.Lookup below). It was far too much work to look for an occurrence of every one of your tokens in each note.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
    drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
    drop function dbo.fn_ReplaceLookups;
go

create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go

create function [dbo].[fn_ReplaceLookups](@input varchar(max))
returns varchar(max)
as
begin
    declare @xml xml;
    select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);

    --extract the potential tokens
    declare @LookupsInString table (LookupName varchar(100) primary key);
    insert into @LookupsInString
    select distinct '##'+v+'##'
    from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
                  [r] = row_number() over (order by n)
           from @xml.nodes('r/i') r(n)
         )d(v,r)
    where r%2=0;

    --tokenize the input
    select @input = replace(@input, l.LookupName, l.LookupValue)
    from dbo.[Lookup] l
    join @LookupsInString lis on
        l.LookupName = lis.LookupName;

    return @input;
end
go

--usage
declare @Notes table ([Id] int primary key, notes varchar(100));
insert into @Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';

select *,
    dbo.fn_ReplaceLookups(notes)
from @Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.
Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
SQL FIDDLE DEMO
Check the devioblog post for reference.
To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MSQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
DECLARE @Note VARCHAR(8000)
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
    ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
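For what it's worth, here is a rough sketch of such a preprocessing step for a single template, using a plain CHARINDEX walk (the table and variable names are made up for the example, and error handling for unbalanced tokens is left out):
DECLARE @tpl  varchar(max) = 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.';
DECLARE @frag TABLE (FragSeq int, [Text] varchar(max), SubsName varchar(100) NULL);
DECLARE @pos int = 1, @seq int = 1, @open int, @close int;

WHILE 1 = 1
BEGIN
    SET @open = CHARINDEX('##', @tpl, @pos);
    IF @open = 0
    BEGIN
        -- trailing fragment with no substitution
        INSERT @frag VALUES (@seq, SUBSTRING(@tpl, @pos, LEN(@tpl) - @pos + 1), NULL);
        BREAK;
    END;
    SET @close = CHARINDEX('##', @tpl, @open + 2);
    IF @close = 0 BREAK; -- unbalanced '##', stop here

    INSERT @frag VALUES (@seq,
                         SUBSTRING(@tpl, @pos, @open - @pos),              -- text before the token
                         SUBSTRING(@tpl, @open + 2, @close - @open - 2));  -- token name
    SET @pos = @close + 2;
    SET @seq += 1;
END;

SELECT FragSeq, [Text], SubsName FROM @frag ORDER BY FragSeq;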
I really don't know how it will perform with 10k+ lookups.
How does the old dynamic SQL perform?
DECLARE @sqlCommand NVARCHAR(MAX)
SELECT @sqlCommand = N'PlaceholderTable.[Notes]'

SELECT @sqlCommand = 'REPLACE( ' + @sqlCommand +
    ', ''##' + LookupTable.[Name] + '##'', ''' +
    LookupTable.[Value] + ''')'
FROM LookupTable

SELECT @sqlCommand = 'SELECT *, ' + @sqlCommand + ' FROM PlaceholderTable'

EXECUTE sp_executesql @sqlCommand
Fiddle demo
And now for some recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- do this while {new starting index} + {length of txt we work on} exceeds total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)

Optimal way to concatenate/aggregate strings

I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.
String aggregation would do something like this:
id | Name        Result:  id | Names
-- | ----                 -- | -----
1  | Matt                 1  | Matt, Rocks
1  | Rocks                2  | Stylus
2  | Stylus
I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.
Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but hey I'll take what I can get) that I can use to aggregate my stuff?
SOLUTION
The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM dbo.SourceTable
),
Concatenated AS
(
SELECT
ID,
CAST(Name AS nvarchar(MAX)) AS FullName,
Name,
NameNumber,
NameCount
FROM Partitioned
WHERE NameNumber = 1
UNION ALL
SELECT
P.ID,
CAST(C.FullName + ', ' + P.Name AS nvarchar(MAX)),
P.Name,
P.NameNumber,
P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C
ON P.ID = C.ID
AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
EXPLANATION
The approach boils down to three steps:
1. Number the rows using ROW_NUMBER() OVER with PARTITION BY, grouping and ordering them as needed for the concatenation. The result is the Partitioned CTE. We keep counts of rows in each partition to filter the results later.
2. Using a recursive CTE (Concatenated), iterate through the row numbers (the NameNumber column), adding Name values to the FullName column.
3. Filter out all results but the ones with the highest NameNumber.
Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).
I've quickly tested the solution on SQL Server 2012 with the following data:
INSERT dbo.SourceTable (ID, Name)
VALUES
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')
The query result:
ID FullName
----------- ------------------------------
2 Stylus
3 Bar, Baz, Foo
1 Matt, Rocks
Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).
create table #t (id int, name varchar(20))
insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')
select id
,Names = stuff((select ', ' + name as [text()]
from #t xt
where xt.id = t.id
for xml path('')), 1, 2, '')
from #t t
group by id
STRING_AGG() in SQL Server 2017, Azure SQL, and PostgreSQL:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql
GROUP_CONCAT() in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
(Thanks to @Brianjorden and @milanio for Azure update)
Example Code:
select Id
, STRING_AGG(Name, ', ') Names
from Demo
group by Id
SQL Fiddle: http://sqlfiddle.com/#!18/89251/1
Although @Serge's answer is correct, I compared its time consumption against the XML PATH approach and found XML PATH to be considerably faster. I'll include the comparison code below so you can check it yourself.
This is @Serge's way:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (ID int, Name nvarchar(50))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE()

;WITH Partitioned AS
(
    SELECT
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM @YourTable
),
Concatenated AS
(
    SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1
    UNION ALL
    SELECT
        P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
    FROM Partitioned AS P
    INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond, @startTime, @endTime)
--Take about 54 milliseconds
And this is the XML PATH way:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE();

set nocount off

SELECT
    t1.HeaderValue
    ,STUFF(
        (SELECT
            ', ' + t2.ChildValue
         FROM @YourTable t2
         WHERE t1.HeaderValue = t2.HeaderValue
         ORDER BY t2.ChildValue
         FOR XML PATH(''), TYPE
        ).value('.','varchar(max)')
    ,1,2, ''
    ) AS ChildValues
FROM @YourTable t1
GROUP BY t1.HeaderValue

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond, @startTime, @endTime)
--Take about 4 milliseconds
Update: Ms SQL Server 2017+, Azure SQL Database
You can use: STRING_AGG.
Usage is pretty simple for OP's request:
SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id
Read More
Well my old non-answer got rightfully deleted (left intact below), but if anyone happens to land here in the future, there is good news. They have implemented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built-in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.
--- Old Post:
Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great; however, it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Database soon as well.
You can use += to concatenate strings, for example:
declare @test nvarchar(max)
set @test = ''
select @test += name from names
If you select @test, it will give you all names concatenated.
I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:
DECLARE @STRUCTURED_VALUES TABLE (
    ID INT
    ,VALUE VARCHAR(MAX) NULL
    ,VALUENUMBER BIGINT
    ,VALUECOUNT INT
);

INSERT INTO @STRUCTURED_VALUES
SELECT ID
    ,VALUE
    ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
    ,COUNT(*) OVER (PARTITION BY ID) AS VALUECOUNT
FROM RAW_VALUES_TABLE;

WITH CTE AS (
    SELECT SV.ID
        ,SV.VALUE
        ,SV.VALUENUMBER
        ,SV.VALUECOUNT
    FROM @STRUCTURED_VALUES SV
    WHERE VALUENUMBER = 1

    UNION ALL

    SELECT SV.ID
        ,CTE.VALUE + ' ' + SV.VALUE AS VALUE
        ,SV.VALUENUMBER
        ,SV.VALUECOUNT
    FROM @STRUCTURED_VALUES SV
    JOIN CTE
        ON SV.ID = CTE.ID
        AND SV.VALUENUMBER = CTE.VALUENUMBER + 1
)
SELECT ID
    ,VALUE
FROM CTE
WHERE VALUENUMBER = VALUECOUNT
ORDER BY ID
;
Try this, I use it in my projects:
DECLARE @MetricsList NVARCHAR(MAX);
SELECT @MetricsList = COALESCE(@MetricsList + '|', '') + QMetricName
FROM #Questions;

SQL Server 2008 - Comma seperation using coalesce as part of a larger query

I'm using coalesce to return a string of names and when I run it as a single query, I've got no problem, but when I want to put it in a larger query, I get a little stumped.
Below is the bit that works:
DECLARE @return varchar(200)

SELECT @return = COALESCE(@return + ', ', '') +
    CAST((select FileAs) AS varchar(30))
from Objects
where ObjectId in (select objectid from dbo.Get_XML_Links(2335))

select @return
But I want to add the list of names into a larger query as follows:
select det.Description,
StartTime,
EndTime,
THE STRING OF NAMES
from DiaryEvents de
inner join DiaryEventTypes det on det.DiaryEventTypeID = de.EventTypeID
where EventTypeID in (29,40)
I can't quite see what I need to do (or if it is possible in this format!)
Thanks to a comment above and a very handy article on SQL Authority, I've worked it out as follows:
select det.Description,
StartTime,
EndTime,
(
SELECT SUBSTRING(
(select ',' + FileAs
from Objects
where ObjectId in (select objectid from dbo.Get_XML_Links(de.diaryeventid) where ObjectTypeSystemCode = 'CCT')
FOR XML PATH('')),2,200000)
)AS Contacts
from DiaryEvents de
inner join DiaryEventTypes det on det.DiaryEventTypeID = de.EventTypeID
where EventTypeID in (29,40)