String_agg for SQL Server before 2017 - sql

Can anyone help me make this query work for SQL Server 2014?
This is working on Postgresql and probably on SQL Server 2017. On Oracle it is listagg instead of string_agg.
Here is the SQL:
select
string_agg(t.id,',') AS id
from
Table t
I checked on the site some xml option should be used but I could not understand it.

In SQL Server pre-2017, you can do:
select stuff( (select ',' + cast(t.id as varchar(max))
from tabel t
for xml path ('')
), 1, 1, ''
);
The only purpose of stuff() is to remove the initial comma. The work is being done by for xml path.

Note that for some characters, the values will be escaped when using FOR XML PATH, for example:
SELECT STUFF((SELECT ',' + V.String
FROM (VALUES('7 > 5'),('Salt & pepper'),('2
lines'))V(String)
FOR XML PATH('')),1,1,'');
This returns the string below:
7 > 5,Salt & pepper,2
lines'
This is unlikely desired. You can get around this using TYPE and then getting the value of the XML:
SELECT STUFF((SELECT ',' + V.String
FROM (VALUES('7 > 5'),('Salt & pepper'),('2
lines'))V(String)
FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(MAX)'),1,1,'');
This returns the string below:
7 > 5,Salt & pepper,2
lines
This would replicate the behaviour of the following:
SELECT STRING_AGG(V.String,',')
FROM VALUES('7 > 5'),('Salt & pepper'),('2
lines'))V(String);
Of course, there might be times where you want to group the data, which the above doesn't demonstrate. To achieve this you would need to use a correlated subquery. Take the following sample data:
CREATE TABLE dbo.MyTable (ID int IDENTITY(1,1),
GroupID int,
SomeCharacter char(1));
INSERT INTO dbo.MyTable (GroupID, SomeCharacter)
VALUES (1,'A'), (1,'B'), (1,'D'),
(2,'C'), (2,NULL), (2,'Z');
From this wanted the below results:
GroupID
Characters
1
A,B,D
2
C,Z
To achieve this you would need to do something like this:
SELECT MT.GroupID,
STUFF((SELECT ',' + sq.SomeCharacter
FROM dbo.MyTable sq
WHERE sq.GroupID = MT.GroupID --This is your correlated join and should be on the same columns as your GROUP BY
--You "JOIN" on the columns that would have been in the PARTITION BY
FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(MAX)'),1,1,'')
FROM dbo.MyTable MT
GROUP BY MT.GroupID; --I use GROUP BY rather than DISTINCT as we are technically aggregating here
So, if you were grouping on 2 columns, then you would have 2 clauses your sub query's WHERE: WHERE MT.SomeColumn = sq.SomeColumn AND MT.AnotherColumn = sq.AnotherColumn, and your outer GROUP BY would be GROUP BY MT.SomeColumn, MT.AnotherColumn.
Finally, let's add an ORDER BY into this, which you also define in the subquery. Let's, for example, assume you wanted to sort the data by the value of the ID descending in the string aggregation:
SELECT MT.GroupID,
STUFF((SELECT ',' + sq.SomeCharacter
FROM dbo.MyTable sq
WHERE sq.GroupID = MT.GroupID
ORDER BY sq.ID DESC --This is identical to the ORDER BY you would have in your OVER clause
FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(MAX)'),1,1,'')
FROM dbo.MyTable MT
GROUP BY MT.GroupID;
For would produce the following results:
GroupID
Characters
1
D,B,A
2
Z,C
Unsurprisingly, this will never be as efficient as a STRING_AGG, due to having the reference the table multiple times (if you need to perform multiple aggregations, then you need multiple sub queries), but a well indexed table will greatly help the RDBMS. If performance really is a problem, because you're doing multiple string aggregations in a single query, then I would either suggest you need to reconsider if you need the aggregation, or it's about time you conisidered upgrading.

Related

Join using a LIKE clause is taking too long

Please see the TSQL below:
create table #IDs (id varchar(100))
insert into #IDs values ('123')
insert into #IDs values ('456')
insert into #IDs values ('789')
insert into #IDs values ('1010')
create table #Notes (Note varchar(500))
insert into #Notes values ('Here is a note for 123')
insert into #Notes values ('A note for 789 here')
insert into #Notes values ('456 has a note here')
I want to find all the IDs that are referenced in the #Notes table. This works:
select #IDs.id from #IDs inner join #Notes on #Notes.note like '%' + #IDs.id + '%'
However, there are hundreds of thousands of records in both tables and the query does not complete. I was thinking about FreeText searching, but I don't think it can be applied here. A cursor takes too long to run as well (I think it will take over one month). Is there anything else I can try? I am using SQL Server 2019.
The size of the input is only one aspect of the solution.
By splitting the text to tokens you indeed increase the number of records, but in the same time you enable equality join, which can be implemented using Hash Join.
You should get the query results in a few minutes top, basically the time it takes to your system to do a full scan on both tables, plus some processing time.
No need for temp tables.
No need for indexes.
Select id
from #IDS
where id in (select w.value
from #Notes as n
cross apply string_split(n.Note, ' ') as w
)
Fiddle
Per the OP request -
Here is a code that handles more complicated scenario, where an id could contain various characters (as defined by #token_char) and the separators are potentially all other characters
declare #token_char varchar(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
;
with cte_notes as
(
select Note
,replace(translate(Note,#token_char,space(len(#token_char))),' ','') as non_token_char
from #Notes
)
select id
from #IDS
where id in
(
select w.value
from cte_notes as n
cross apply string_split(translate(n.Note,n.non_token_char,space(len(n.non_token_char))),' ') as w
where w.value != ''
)
The Fiddle data sample was altered accordingly, to reflect the change
If you are going to do this search often you may want to explore using a wonderful (if underused) feature of SQL Server called 'Full Text Search.' To quote Microsoft:
A LIKE query against millions of rows of text data can take minutes to
return; whereas a full-text query can take only seconds or less
against the same data, depending on the number of rows that are
returned.'
I have seen searches go from minutes to seconds using this feature.
You would need to create a Full Text Search Catalog and then create indexs on the tables you want to search. It's not hard and will take you a few minutes to learn how to do this.
This is a good starting point:
https://learn.microsoft.com/en-us/sql/relational-databases/search/get-started-with-full-text-search?view=sql-server-ver15
I would apply CTE with string_split to filter out all alphabetic components and then join #ID table with the result of the CTE by id column. The query was tested on a sample of 1 mm rows.
With CTE As (
Select T.value As id
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null
)
Select I.id
From #IDs As I Inner Join CTE On (I.id=CTE.id)
If you just want to extract a numeric value from a string, in this case join is excessive.
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null And T.value Like '%[0-9]%'
id
Note
123
Here is a note for 123
789
A note for 789 here
456
456 has a note here
No matter what, under the given circumstances, I would use join to filter out those numbers that are not represented in #IDs table.
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
If the string contains brackets or parenthesis instead of spaces like this:
"456(this is an id number) has a note here" or "456[01/01/2022]"
as last resorts (since it degrades performance) you can use TRANSLATE to replace those brackets with spaces as follows:
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(TRANSLATE(Note,'[]()',' '),' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
db<>fiddle

Append values from 2 different columns in SQL

I have the following table
I need to get the following output as "SVGFRAMXPOSLSVG" from the 2 columns.
Is it possible to get this appended values from 2 columns
Please try this.
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM #tblName
FOR XML PATH('')
), 1, 0, '')
For Example:-
Declare #tbl Table(
id INT ,
DEPART_AIRPORT_CODE Varchar(50),
ARRIVE_AIRPORT_CODE Varchar(50),
value varchar(50)
)
INSERT INTO #tbl VALUES(1,'g1','g2',NULL)
INSERT INTO #tbl VALUES(2,'g2','g3',NULL)
INSERT INTO #tbl VALUES(3,'g3','g1',NULL)
SELECT STUFF((
SELECT '' + DEPART_AIRPORT_CODE + ARRIVE_AIRPORT_CODE
FROM #tbl
FOR XML PATH('')
), 1, 0, '')
Summary
Use Analytic functions and listagg to get the job done.
Detail
Create two lists of code_id and code values. Match the code_id values for the same airport codes (passengers depart from the same airport they just arrived at). Using lag and lead to grab values from other rows. NULLs will exist for code_id at the start and end of the itinerary. Default the first NULL to 0, and the last NULL to be the previous code_id plus 1. A list of codes will be produced, with a matching index. Merge the lists together and remove duplicates by using a union. Finally use listagg with no delimiter to aggregate the rows onto a string value.
with codes as
(
select
nvl(lag(t1.id) over (order by t1.id),0) as code_id,
t1.depart_airport_code as code
from table1 t1
union
select
nvl(lead(t1.id) over (order by t1.id)-1,lag(t1.id) over (order by t1.id)+1) as code_id,
t1.arrive_airport_code as code
from table1 t1
)
select
listagg(c.code,'') WITHIN GROUP (ORDER BY c.code_id) as result
from codes c;
Note: This solution does rely on an integer id field being available. Otherwise the analytic functions wouldn't have a column to sort by. If id doesn't exist, then you would need to manufacture one based on another column, such as a timestamp or another identifier that ensures the rows are in the correct order.
Use row_number() over (order by myorderedidentifier) as id in a subquery or view to achieve this. Don't use rownum. It could give you unpredictable results. Without an ORDER BY clause, there is no guarantee that the same query will return the same results each time.
Output
| RESULT |
|-----------------|
| SVGFRAMXPOSLSVG |

Return a XML field when using GROUP BY clause in MS SQL Management Studio?

I have the following table structure (partially excluded for clarity of question):
The table sometimes receives two lowFareRQ and lowFareRS that is considered to be only one booking under BookingNumber. The booking is then processed into a ticket where each booking number always have the same TicketRQ and TicketRS if the user proceeded with the booking. TicketRS contains 3rd party reference number.
I now want to display all the active bookings to the user in order to allow the user to cancel a booking if he wanted to.
So I would naturally want to retrieve the each booking number with active status as well as the TicketRS xml data in order to get the 3rd party reference number.
Here is the SQL query I started with:
SELECT TOP 100
[BookingNumber]
,[Status]
,[TicketRS]
FROM [VTResDB].[dbo].[LowFareRS]
GROUP BY [BookingNumber],[Status],[TicketRS]
ORDER BY [recID] desc
Now with MS SQL Management Studio you have to add the field [TicketRS] to 'GROUP BY' if you want to have it in the 'SELECT' field list... but you cannot have a XML field in the 'GROUP BY' list.
The XML data type cannot be compared or sorted, except when using the IS NULL operator.
I know that if I change the table structure this problem can be solved without any issue but I want to avoid changing the table structure because I am just completing the software and do not want to rewrite existing code.
Is there a way to return a XML field when using GROUP BY clause in MS SQL Management Studio?
Uhm, this seems dirty... If your XMLs are identically within the group, you might try something like this:
DECLARE #tbl TABLE(ID INT IDENTITY,Col1 VARCHAR(100),SomeValue INT,SomeXML XML);
INSERT INTO #tbl(col1,SomeValue,SomeXML) VALUES
('testA',1,'<root><a>testA</a></root>')
,('testA',2,'<root><a>testA</a></root>')
,('testB',3,'<root><a>testB</a></root>')
,('testB',4,'<root><a>testB</a></root>');
WITH GroupedSource AS
(
SELECT SUM(SomeValue) AS SumColumn
,CAST(SomeXml AS NVARCHAR(MAX)) AS XmlColumn
FROM #tbl AS tbl
GROUP BY Col1,CAST(SomeXml AS NVARCHAR(MAX))
)
SELECT SumColumn
,CAST(XmlColumn AS XML) AS ReCasted
FROM GroupedSource
Another approach was this
WITH GroupedSource AS
(
SELECT SUM(SomeValue) AS SumColumn
,MIN(ID) AS FirstID
FROM #tbl AS tbl
GROUP BY Col1
)
SELECT SumColumn
,(SELECT SomeXML FROM #tbl WHERE ID=FirstID) AS ReTaken
FROM GroupedSource
Cast it to nvarchar(max) and back
with t(xc,val) as (
select xc=cast(N'<x><y>txt</y></x>' as xml), val = 5
union all
select xc=cast(N'<x><y>txt</y></x>' as xml), val = 6
)
select xc = cast(xc as XML), val
from (
select xc = cast(xc as nvarchar(max)), val = sum(val)
from t
group by cast(xc as nvarchar(max))
) tt
;

String Concatenation In Sql Server With Distinct Values Sorted On Diff Column

I have following scenario,to make it brief i have created fiddle for this Sql Demo
I have case id(CaseId) for which i can have multiple subject id(CaseSubjId) & those case subject can be added any time so their insertion order is maintained by rowid."Office" column displays office location for that particular case subject.
I have to query concat all location in csv manner for each case id so that distinct values can be selected for office & concatenation order is maintained in rowid asc order.
CREATE TABLE TmpTest(CaseId INT,CaseSubjId INT,Office VARCHAR(10),RowId INT)
INSERT INTO TmpTest(CaseId,CaseSubjId,Office,RowId)VALUES
(1,1,'Kol',1),(1,2,'Del',2),(1,3,'Kol',4),(1,4,'Noi',3),(1,5,'Kol',6),
(1,6,'Bhu',7),(2,11,'Kol',5),(2,12,'Bhu',3),(2,13,'Kol',4),(2,14,'Met',7),
(2,15,'Bhu',1),(2,16,'Met',2)
--OutPut Required:
CaseId | Office
1 | Kol,Del,Noi,Bhu
2 | Bhu,Met,Kol
--Order By Row Id Asc Group By Case Id Concat String In CSV For Office Value
But Output i am getting all values in office getting concatenated.
Try this:
select distinct caseid,
STUFF(
(SELECT DISTINCT ',' + tmp.Office
FROM TmpTest AS tmp
WHERE tmp.CaseId = TmpTest.CaseId
FOR XML PATH('')), 1, 1, '')
AS Locations
from TmpTest group by caseid

Select columnValue if the column exists otherwise null

I'm wondering if I can select the value of a column if the column exists and just select null otherwise. In other words I'd like to "lift" the select statement to handle the case when the column doesn't exist.
SELECT uniqueId
, columnTwo
, /*WHEN columnThree exists THEN columnThree ELSE NULL END*/ AS columnThree
FROM (subQuery) s
Note, I'm in the middle to solidifying my data model and design. I hope to exclude this logic in the coming weeks, but I'd really like to move beyond this problem right because the data model fix is a more time consuming endeavor than I'd like to tackle now.
Also note, I'd like to be able to do this in one query. So I'm not looking for an answer like
check what columns are on your sub query first. Then modify your
query to appropriately handle the columns on your sub query.
You cannot do this with a simple SQL statement. A SQL query will not compile unless all table and column references in the table exist.
You can do this with dynamic SQL if the "subquery" is a table reference or a view.
In dynamic SQL, you would do something like:
declare #sql nvarchar(max) = '
SELECT uniqueId, columnTwo, '+
(case when exists (select *
from INFORMATION_SCHEMA.COLUMNS
where tablename = #TableName and
columnname = 'ColumnThree' -- and schema name too, if you like
)
then 'ColumnThree'
else 'NULL as ColumnThree'
end) + '
FROM (select * from '+#SourceName+' s
';
exec sp_executesql #sql;
For an actual subquery, you could approximate the same thing by checking to see if the subquery returned something with that column name. One method for this is to run the query: select top 0 * into #temp from (<subquery>) s and then check the columns in #temp.
EDIT:
I don't usually update such old questions, but based on the comment below. If you have a unique identifier for each row in the "subquery", you can run the following:
select t.. . ., -- everything but columnthree
(select column3 -- not qualified!
from t t2
where t2.pk = t.pk
) as column3
from t cross join
(values (NULL)) v(columnthree);
The subquery will pick up column3 from the outer query if it doesn't exist. However, this depends critically on having a unique identifier for each row. The question is explicitly about a subquery, and there is no reason to expect that the rows are easily uniquely identified.
As others already suggested, the sane approach is to have queries that meet your table design.
There is a rather exotic approach to achieve what you want in (pure, not dynamic) SQL though. A similar problem was posted at DBA.SE: How to select specific rows if a column exists or all rows if a column doesn't but it was simpler as only one row and one column was wanted as result. Your problem is more complex so the query is more convoluted, to say the least. Here is, the insane approach:
; WITH s AS
(subquery) -- subquery
SELECT uniqueId
, columnTwo
, columnThree =
( SELECT ( SELECT columnThree
FROM s AS s2
WHERE s2.uniqueId = s.uniqueId
) AS columnThree
FROM (SELECT NULL AS columnThree) AS dummy
)
FROM s ;
It also assumes that the uniqueId is unique in the result set of the subquery.
Tested at SQL-Fiddle
And a simpler method which has the additional advantage that allows more than one column with a single subquery:
SELECT s.*
FROM
( SELECT NULL AS columnTwo,
NULL AS columnThree,
NULL AS columnFour
) AS dummy
CROSS APPLY
( SELECT
uniqueId,
columnTwo,
columnThree,
columnFour
FROM tableX
) AS s ;
The question has also been asked at DBA.SE and has been answered by #Andriy M (using CROSS APPLY too!) and Michael Ericsson (using XML):
Why can't I use a CASE statement to see if a column exists and not SELECT from it?
you can use dynamic SQL.
first you need to check exist column and then create dynamic query.
DECLARE #query NVARCHAR(MAX) = '
SELECT FirstColumn, SecondColumn, '+
(CASE WHEN exists (SELECT 1 FROM syscolumns
WHERE name = 'ColumnName' AND id = OBJECT_ID('TableName'))
THEN 'ColumnName'
ELSE 'NULL as ThreeColumn'
END) + '
FROM TableName'
EXEC sp_executesql #query;