T-SQL / Find all dots in a column - sql

I have a query like this:
DECLARE @TABLE TABLE (
ID int primary key identity(1,1),
CODE nvarchar(max)
);
INSERT INTO @TABLE VALUES ('320.01.001'),('320.01.002'),('320.001.002'),('320.01.002.0003.0002')
SELECT * FROM @TABLE
I want to count the dots in each value of the CODE column.

A pretty simple method is:
select t.*, s.num_dots
from @TABLE t cross apply
(select count(*) - 1 as num_dots
from string_split(t.code, '.') s
) s;
A more traditional method uses the difference between the lengths of two strings:
select t.*,
len(t.code) - len(replace(t.code, '.', '')) as num_dots
from @TABLE t;
I actually do not have a sense of which of these is faster. If I had to guess, I would guess the second, but if performance is an issue, you should test the two versions.
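One caveat with the length-difference method: LEN() ignores trailing spaces, so if removing the dots leaves a value that ends in spaces, the count comes out too high. A minimal sketch of a variant based on DATALENGTH() follows; the table variable @t and its sample rows are made up for illustration, and the division by 2 assumes the column is nvarchar (2 bytes per dot).

```sql
-- @t is a hypothetical stand-in for the table in the question.
DECLARE @t TABLE (code nvarchar(max));
INSERT INTO @t (code) VALUES (N'320.01.001'), (N'320 .'), (N'');

SELECT t.code,
       -- LEN-based count: over-counts for N'320 .' because the space becomes trailing
       LEN(t.code) - LEN(REPLACE(t.code, N'.', N'')) AS len_based,
       -- Byte-based count: each removed '.' accounts for 2 bytes in nvarchar
       (DATALENGTH(t.code) - DATALENGTH(REPLACE(t.code, N'.', N''))) / 2 AS num_dots
FROM @t AS t;
```

For a varchar column the division by 2 would have to be dropped.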

Related

Reversing string segments within query

I have a text column that contains values like CN-CHGO-BNSF.
I need to reverse the order of the segments between hyphens. So the example above would be converted to BNSF-CHGO-CN. I can easily do this in C# code, but here I could optimize my task if I could do it within a query.
Is there any way to do this in a SQL query? I'm using Entity Framework and SQL Server. Is this possible?
The number of segments will be one or more. (The number of hyphens will be zero or more.)
Examples
Input | Output
BNSF | BNSF
CHGO-BNSF | BNSF-CHGO
CN-CHGO-BNSF | BNSF-CHGO-CN
FXE-EAGPA-BNSF-ROBSP-(RVPR) | (RVPR)-ROBSP-BNSF-EAGPA-FXE
With SQL Server 2022, it looks like STRING_SPLIT is being added (which I'm not currently using). I don't know if that could be used for this.
If you have a known maximum number of segments, you can use a bit of JSON in concert with concat().
Example
Declare @YourTable Table ([SomeCol] varchar(50))
Insert Into @YourTable Values
('BNSF')
,('CHGO-BNSF')
,('CN-CHGO-BNSF')
,('FXE-EAGPA-BNSF-ROBSP-(RVPR)')
Select A.*
,NewVal = stuff( concat('-'+JSON_VALUE(JS,'$[6]')
,'-'+JSON_VALUE(JS,'$[5]')
,'-'+JSON_VALUE(JS,'$[4]')
,'-'+JSON_VALUE(JS,'$[3]')
,'-'+JSON_VALUE(JS,'$[2]')
,'-'+JSON_VALUE(JS,'$[1]')
,'-'+JSON_VALUE(JS,'$[0]')
),1,1,'')
From @YourTable A
Cross Apply (values ( '["'+replace(string_escape(SomeCol,'json'),'-','","')+'"]' ) ) B(JS)
EDIT Using String_AGG
Select A.*
,B.NewValue
From @YourTable A
Cross Apply ( Select NewValue = string_agg(value,'-') within group (ORDER BY convert(int,[key]) desc)
from openjson( '["'+replace(string_escape(SomeCol,'json'),'-','","')+'"]' )
) B
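Regarding the STRING_SPLIT question: on SQL Server 2022 (and Azure SQL Database), STRING_SPLIT accepts an optional third argument that adds an ordinal column, which can drive the ordering directly. The following is a sketch under that version assumption, reusing the sample values from above in its own table variable so it can be run independently.

```sql
-- Requires SQL Server 2022+ or Azure SQL Database (enable_ordinal argument).
DECLARE @YourTable TABLE (SomeCol varchar(50));
INSERT INTO @YourTable (SomeCol) VALUES
 ('BNSF'), ('CHGO-BNSF'), ('CN-CHGO-BNSF'), ('FXE-EAGPA-BNSF-ROBSP-(RVPR)');

SELECT A.SomeCol, B.NewValue
FROM @YourTable AS A
CROSS APPLY (
    -- ordinal is the 1-based position of each segment; sort it descending to reverse
    SELECT STRING_AGG(s.value, '-') WITHIN GROUP (ORDER BY s.ordinal DESC) AS NewValue
    FROM STRING_SPLIT(A.SomeCol, '-', 1) AS s
) AS B;
```

On older versions STRING_SPLIT gives no ordering guarantee, which is why the OPENJSON [key] is used above.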
Please try the following solution.
It is generic and works for any number of tokens in a column.
It uses XML and XQuery to split the column value into tokens.
The XQuery FLWOR expression then traverses the tokens in reverse order via order by $pos descending.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(MAX));
INSERT INTO @tbl (tokens) VALUES
('BNSF'),
('CHGO-BNSF'),
('CN-CHGO-BNSF'),
('FXE-EAGPA-BNSF-ROBSP-(RVPR)');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '-';
SELECT t.*
, REPLACE(c.query('
for $x in /root/r/text()
let $pos := count(/root/r[. << $x[1]])
order by $pos descending
return data($x)
').value('.','VARCHAR(MAX)'),SPACE(1), @separator) AS Result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
ID | tokens | Result
1 | BNSF | BNSF
2 | CHGO-BNSF | BNSF-CHGO
3 | CN-CHGO-BNSF | BNSF-CHGO-CN
4 | FXE-EAGPA-BNSF-ROBSP-(RVPR) | (RVPR)-ROBSP-BNSF-EAGPA-FXE

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (@str VARCHAR(50))
returns VARCHAR(50)
BEGIN
DECLARE @len INT,
@cnt INT =1,
@str1 VARCHAR(50)='',
@output VARCHAR(50)=''
SELECT @len = Len(@str)
WHILE @cnt <= @len
BEGIN
SELECT @str1 += Substring(@str, @cnt, 1) + ','
SET @cnt+=1
END
SELECT @str1 = LEFT(@str1, Len(@str1) - 1)
SELECT @output += Sp_data
FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
FROM (SELECT Cast ('<M>' + Replace(@str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)) A
ORDER BY Sp_data
RETURN @output
END
This works when called for a single record, i.e.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
However, when I try to apply this to TOP (10), I receive the error:
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Given that my source has ~4 million records, is this still the best solution?
Unfortunately, I am not able to normalize the data into a separate table with one record per Code.
This doesn't rely on an ID column to join the table with itself, and performance is almost as fast as the answer by @Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t
First of all: Avoid loops...
You can try this:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC')
,('JSKEzXO')
,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,(
SELECT Chr As [*]
FROM SeparatedCharacters sc1
WHERE sc1.ID=t.ID
ORDER BY sc1.Chr
FOR XML PATH(''),TYPE
).value('.','nvarchar(max)') AS Sorted
FROM @tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This will create a tally on-the-fly. You will get a resultset with numbers from 1 to n where n is the length of the current string.
The second apply uses this number to get each character one-by-one using SUBSTRING().
The outer SELECT reads from the original table, which means one row per ID, and uses a correlated sub-query to fetch all related characters. They are sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeating characters.
That's it :-)
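To see the on-the-fly tally in isolation, here is a small sketch for a single string. The variable @s is only an illustration and is not part of the query above; it assumes master..spt_values has at least as many rows as the string has characters.

```sql
DECLARE @s varchar(100) = 'JSKEzXO';

-- TOP (LEN(@s)) limits the generated numbers to 1..length of the string;
-- SUBSTRING then picks the character at each position.
SELECT A.Nmbr,
       SUBSTRING(@s, A.Nmbr, 1) AS Chr
FROM (SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM master..spt_values) AS A(Nmbr)
ORDER BY A.Nmbr;
```

This returns one row per character position, which is exactly what the CTE produces per ID before the characters are sorted and glued back together.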
Hint: SQL Server 2017+
Starting with SQL Server 2017, there is the new function STRING_AGG(). This makes the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;
Since your table has a large number of rows (~4 million), I would suggest creating a persisted computed column in the table to store these values. Calculating them at run time in a view will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes. With an empty string, Len(@str1) - 1 evaluates to -1, which LEFT rejects.
If LEN(@str) = 0
BEGIN
SET @output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END
I suggest splitting the string into its characters using the referenced SQL split function.
Then you can concatenate the characters back together, this time ordered alphabetically.
Are you using SQL Server 2017? If so, you can use the STRING_AGG string aggregation function to concatenate the split characters in order, as follows:
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not on SQL Server 2017, you can use the following structure, which relies on FOR XML PATH for the concatenation:
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Efficient way to merge alternating values from two columns into one column in SQL Server

I have two columns in a table. I want to merge them into a single column, but the merge should be done taking alternate characters from each columns.
For example:
Column A --> value (1,2,3)
Column B --> value (A,B,C)
Required result - (1,A,2,B,3,C)
It should be done without loops.
You need to make use of the UNION and get a little creative with how you choose to alternate. My solution ended up looking like this.
SELECT ColumnA
FROM Table
WHERE ColumnA%2=1
UNION
SELECT ColumnB
FROM TABLE
WHERE ColumnA%2=0
If you have an ID/PK column that could just as easily be used, I just didn't want to assume anything about your table.
EDIT:
If your table contains duplicates that you wish to keep, use UNION ALL instead of UNION
Try this:
SELECT [value]
FROM [Table]
UNPIVOT
(
[value] FOR [Column] IN ([Column_A], [Column_B])
) UNPVT
If you have SQL Server 2017 or higher, you can use:
SELECT QUOTENAME(STRING_AGG (cast(a as varchar(1)) + ',' + b, ','), '()')
FROM test;
In older versions, depending on how much data you have in your tables you can also try:
SELECT QUOTENAME(STUFF(
(SELECT ',' + cast(a as varchar(1)) + ',' + b
FROM test
FOR XML PATH('')), 1, 1,''), '()')
Here you can try a sample
http://sqlfiddle.com/#!18/6c9af/5
with data as (
select *, row_number() over (order by colA) as rn
from t
)
select rn,
case rn % 2 when 1 then colA else colB end as alternating
from data;
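To get the fully interleaved order from the question (1, A, 2, B, 3, C), one option is to number the rows and then unpivot both columns with a position marker. This is only a sketch: the table variable @t, the column names colA/colB, and ordering by colA are assumptions for illustration.

```sql
DECLARE @t TABLE (colA varchar(10), colB varchar(10));
INSERT INTO @t (colA, colB) VALUES ('1', 'A'), ('2', 'B'), ('3', 'C');

WITH data AS (
    SELECT colA, colB,
           ROW_NUMBER() OVER (ORDER BY colA) AS rn   -- defines the row order
    FROM @t
)
SELECT v.val AS alternating
FROM data AS d
-- Turn each row into two rows: position 1 = colA value, position 2 = colB value
CROSS APPLY (VALUES (1, d.colA), (2, d.colB)) AS v(pos, val)
ORDER BY d.rn, v.pos;
```

If a single delimited string is needed, the same ordering could be fed into STRING_AGG or FOR XML PATH as shown in the other answers.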
The following SQL uses undocumented aggregate concatenation technique. This is described in Inside Microsoft SQL Server 2008 T-SQL Programming on page 33.
declare @x varchar(max) = '';
declare @t table (a varchar(10), b varchar(10));
insert into @t values (1,'A'), (2,'B'),(3,'C');
select @x = @x + a + ',' + b + ','
from @t;
select '(' + LEFT(@x, LEN(@x) - 1) + ')';

Splitting a variable length column in SQL server safely

I have a column (varchar(400)) in the following form in an SQL table:
Info
UserID=1123456,ItemID=6685642
The column is created via our point of sale application, and so I cannot do the normal thing of simply splitting it into two columns as this would cause an obscene amount of work. My problem is that this column is used to store attributes of products in our database, and so while I am only concerned with UserID and ItemID, there may be superfluous information stored here, for example :
Info
IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.
What I want to retrieve is simply two columns, with no error raised if either of these attributes is missing from the Info column:
UserID ItemID
123124 1213124
Would it be possible to do this effectively, with error checking, given that the lengths of the IDs are all variable, but all of the attributes are comma-separated and follow a uniform style (i.e. "UserID=number")?
Can anyone tell me the best way of dealing with my problem?
Thanks a lot.
Try this
declare @infotable table (info varchar(4000))
insert into @infotable
select 'IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.'
union all
select 'UserID=1123456,ItemID=6685642'
-- convert info column to xml type
; with cte as
(
select cast('<info ' + REPLACE(REPLACE(REPLACE(info,',', '" '),'=','="'),'.','') + '" />' as XML) info,
ROW_NUMBER() over (order by info) id
from @infotable
)
select userId, ItemId from
(
select T.N.value('local-name(.)', 'varchar(max)') as Name,
T.N.value('.', 'varchar(max)') as Value, id
from cte cross apply info.nodes('//@*') as T(N)
) v
pivot (max(value) for Name in ([UserID], [ItemId])) p
You can try this split function: http://www.sommarskog.se/arrays-in-sql-2005.html
Assuming ItemID=1213124. is terminated with a dot.
Declare @t Table (a varchar(400))
insert into @t values ('IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.')
insert into @t values ('IrrelevantID=333,UserID=222222,AnotherIrrelevantID=0,ItemID=111.')
Select
STUFF(
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
,CharIndex
(',',
Stuff(a,1,CHARINDEX(',UserID=',a) + Len(',UserID=')-1 ,'' )
)
,400,'') as UserID
,
STUFF(
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1 ,'' )
,CharIndex
('.',
Stuff(a,1,CHARINDEX(',ItemID=',a) + Len(',ItemID=')-1,'' )
)
,400,'') as ItemID
from @t
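If SQL Server 2016 or later is available, another option is to split on the commas and pick out the two attributes by prefix, which returns NULL instead of an error when an attribute is missing or not numeric. This is a sketch under several assumptions: the id column is added here only to group rows, the third sample row is invented to show the missing-attribute case, and the trailing dot is simply stripped.

```sql
-- Requires STRING_SPLIT (SQL Server 2016+, compatibility level 130 or higher).
DECLARE @infotable TABLE (id int IDENTITY(1,1) PRIMARY KEY, info varchar(400));
INSERT INTO @infotable (info) VALUES
 ('IrrelevantID=666,UserID=123124,AnotherIrrelevantID=1232342,ItemID=1213124.'),
 ('UserID=1123456,ItemID=6685642'),
 ('IrrelevantID=1');   -- neither attribute present

SELECT t.id,
       -- 'UserID=' and 'ItemID=' are 7 characters long, so the value starts at position 8
       MAX(CASE WHEN s.value LIKE 'UserID=%'
                THEN TRY_CAST(SUBSTRING(s.value, 8, 400) AS bigint) END) AS UserID,
       MAX(CASE WHEN s.value LIKE 'ItemID=%'
                THEN TRY_CAST(SUBSTRING(s.value, 8, 400) AS bigint) END) AS ItemID
FROM @infotable AS t
OUTER APPLY STRING_SPLIT(REPLACE(t.info, '.', ''), ',') AS s
GROUP BY t.id
ORDER BY t.id;
```

Because the pattern is anchored at the start of each piece, an attribute such as AnotherUserID=... would not be mistaken for UserID.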

Optimal way to concatenate/aggregate strings

I'm looking for a way to aggregate strings from different rows into a single row. I want to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.
String aggregation would do something like this:
id | Name Result: id | Names
-- - ---- -- - -----
1 | Matt 1 | Matt, Rocks
1 | Rocks 2 | Stylus
2 | Stylus
I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.
Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but hey I'll take what I can get) that I can use to aggregate my stuff?
SOLUTION
The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM dbo.SourceTable
),
Concatenated AS
(
SELECT
ID,
CAST(Name AS nvarchar) AS FullName,
Name,
NameNumber,
NameCount
FROM Partitioned
WHERE NameNumber = 1
UNION ALL
SELECT
P.ID,
CAST(C.FullName + ', ' + P.Name AS nvarchar),
P.Name,
P.NameNumber,
P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C
ON P.ID = C.ID
AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
EXPLANATION
The approach boils down to three steps:
Number the rows using OVER and PARTITION grouping and ordering them as needed for the concatenation. The result is Partitioned CTE. We keep counts of rows in each partition to filter the results later.
Using recursive CTE (Concatenated) iterate through the row numbers (NameNumber column) adding Name values to FullName column.
Filter out all results but the ones with the highest NameNumber.
Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).
I've quickly tested the solution on SQL Server 2012 with the following data:
INSERT dbo.SourceTable (ID, Name)
VALUES
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')
The query result:
ID FullName
----------- ------------------------------
2 Stylus
3 Bar, Baz, Foo
1 Matt, Rocks
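One thing to watch in the query as written: CAST(... AS nvarchar) without a length defaults to 30 characters, so longer concatenated results would be cut off. A quick way to see the default, as a small sketch:

```sql
-- CAST to nvarchar without a length uses a default length of 30
SELECT LEN(CAST(REPLICATE(N'x', 50) AS nvarchar))      AS default_length,   -- 30
       LEN(CAST(REPLICATE(N'x', 50) AS nvarchar(max))) AS explicit_length;  -- 50
```

If longer results are expected, an explicit length such as nvarchar(max) could be used instead; in a recursive CTE the anchor and recursive members need the same type, so both CASTs would have to change together.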
Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).
create table #t (id int, name varchar(20))
insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')
select id
,Names = stuff((select ', ' + name as [text()]
from #t xt
where xt.id = t.id
for xml path('')), 1, 2, '')
from #t t
group by id
STRING_AGG() in SQL Server 2017, Azure SQL, and PostgreSQL:
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql
GROUP_CONCAT() in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
(Thanks to @Brianjorden and @milanio for the Azure update)
Example Code:
select Id
, STRING_AGG(Name, ', ') Names
from Demo
group by Id
SQL Fiddle: http://sqlfiddle.com/#!18/89251/1
Although @serge's answer is correct, I compared the time consumption of his approach against the FOR XML PATH approach and found FOR XML PATH to be much faster. Here is the comparison code so you can check it yourself.
This is @serge's approach:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;
set nocount on;
declare @YourTable table (ID int, Name nvarchar(50))
WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END
SET @startTime = GETDATE()
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM @YourTable
),
Concatenated AS
(
SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1
UNION ALL
SELECT
P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
SET @endTime = GETDATE();
SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Takes about 54 milliseconds
And this is the FOR XML PATH approach:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;
set nocount on;
declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))
WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END
SET @startTime = GETDATE();
set nocount off
SELECT
t1.HeaderValue
,STUFF(
(SELECT
', ' + t2.ChildValue
FROM @YourTable t2
WHERE t1.HeaderValue=t2.HeaderValue
ORDER BY t2.ChildValue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM @YourTable t1
GROUP BY t1.HeaderValue
SET @endTime = GETDATE();
SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Takes about 4 milliseconds
Update: SQL Server 2017+, Azure SQL Database
You can use: STRING_AGG.
Usage is pretty simple for OP's request:
SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id
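If the order of the names inside each group matters, STRING_AGG accepts a WITHIN GROUP clause. Here is a self-contained sketch using the sample rows from the earlier answer; the table variable @SourceTable is just a stand-in for the real table.

```sql
-- Requires SQL Server 2017+ or Azure SQL Database.
DECLARE @SourceTable TABLE (ID int, Name nvarchar(50));
INSERT INTO @SourceTable (ID, Name) VALUES
 (1, N'Matt'), (1, N'Rocks'), (2, N'Stylus'),
 (3, N'Foo'), (3, N'Bar'), (3, N'Baz');

SELECT ID,
       -- WITHIN GROUP makes the concatenation order deterministic
       STRING_AGG(Name, N', ') WITHIN GROUP (ORDER BY Name) AS Names
FROM @SourceTable
GROUP BY ID
ORDER BY ID;
```

Without WITHIN GROUP the order of the concatenated values is not guaranteed.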
Well, my old non-answer got rightfully deleted (left intact below), but if anyone happens to land here in the future, there is good news. They have implemented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native, built-in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.
--- Old Post:
Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great; however, it is only available in SQL Server 2016 vNext currently. Hopefully it will come to Azure SQL Database soon as well.
You can use += to concatenate strings, for example:
declare @test nvarchar(max)
set @test = ''
select @test += name from names
If you then select @test, it will give you all the names concatenated.
I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:
DECLARE @STRUCTURED_VALUES TABLE (
ID INT
,VALUE VARCHAR(MAX) NULL
,VALUENUMBER BIGINT
,VALUECOUNT INT
);
INSERT INTO @STRUCTURED_VALUES
SELECT ID
,VALUE
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
,COUNT(*) OVER (PARTITION BY ID) AS VALUECOUNT
FROM RAW_VALUES_TABLE;
WITH CTE AS (
SELECT SV.ID
,SV.VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM @STRUCTURED_VALUES SV
WHERE VALUENUMBER = 1
UNION ALL
SELECT SV.ID
,CTE.VALUE + ' ' + SV.VALUE AS VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM @STRUCTURED_VALUES SV
JOIN CTE
ON SV.ID = CTE.ID
AND SV.VALUENUMBER = CTE.VALUENUMBER + 1
)
SELECT ID
,VALUE
FROM CTE
WHERE VALUENUMBER = VALUECOUNT
ORDER BY ID
;
Try this; I use it in my projects:
DECLARE @MetricsList NVARCHAR(MAX);
SELECT @MetricsList = COALESCE(@MetricsList + '|', '') + QMetricName
FROM #Questions;