Best way to iterate through a temp table to build a string in SQL Server 2014

I have created a temp table, the idea being that I want to loop through it, match all records with the same email address, populate a string which will go into an email, and then drop the table. This will be run as a stored procedure.
I've used a cursor that first grabbed all the unique email addresses and then coalesced the records, but with potentially 100k-500k records performance won't be acceptable, and I know there must be a far more efficient way of doing it.
Example data (apologies, don't know how to format it properly)
#temptable
temp_email, temp_string
test@test.com string1
test@test.com string2
test2@test.com string3
test2@test.com string4
test3@test.com string5
I then want to populate another table with this data
emailto... emailbody
test@test.com 'string1<br /> string2'
test2@test.com 'string3<br /> string4'
test3@test.com 'string5'
Thank you.

The STUFF and FOR XML PATH method achieves this nicely in SQL Server 2014 and prior. Because the separator contains the characters < and >, however, FOR XML PATH escapes them to &lt; and &gt;, so they need to be "un-escaped" afterwards:
WITH VTE AS(
    SELECT *
    FROM (VALUES('test@test.com','string1'),
                ('test@test.com','string2'),
                ('test2@test.com','string3'),
                ('test2@test.com','string4'),
                ('test3@test.com','string5')) V(Email, String))
SELECT Email,
       STUFF(REPLACE(REPLACE((SELECT '<br/>' + sq.String
                              FROM VTE sq
                              WHERE sq.Email = V.Email
                              FOR XML PATH('')),'&lt;','<'),'&gt;','>'),1,5,'')
FROM VTE V
GROUP BY Email;

You don't need to use a cursor; use the STRING_AGG function (available from SQL Server 2017 onwards):
CREATE TABLE #temptable
(temp_email varchar(50), temp_string varchar(50))

INSERT INTO #temptable
VALUES ('test@test.com', 'string1'),
       ('test@test.com', 'string2'),
       ('test2@test.com', 'string3'),
       ('test2@test.com', 'string4'),
       ('test3@test.com', 'string5')

SELECT temp_email, STRING_AGG(temp_string, ' <br/>')
FROM #temptable
GROUP BY temp_email

In SQL Server 2014 there are ways of doing this without a cursor, but they are basically quite convoluted hacks and lead to pretty unreadable SQL in my opinion. See here for details:
How to concatenate text from multiple rows into a single text string in SQL Server?
A cursor is arguably the best way in SQL Server 2014, because at least it's readable.
In SQL Server 2017 there is an official aggregation function for this:
STRING_AGG
... but that's no use to you at the moment. Sorry.
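For reference, a readable cursor version of the kind described above might look roughly like this (a sketch only; the target table name dbo.EmailOutbox is assumed, while emailto/emailbody are the column names from the question):
DECLARE @email varchar(50), @body varchar(max);
DECLARE email_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT temp_email FROM #temptable;
OPEN email_cursor;
FETCH NEXT FROM email_cursor INTO @email;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- concatenate every temp_string for this address, separated by <br />
    SET @body = NULL;
    SELECT @body = COALESCE(@body + '<br />', '') + temp_string
    FROM #temptable
    WHERE temp_email = @email;
    -- dbo.EmailOutbox is an assumed name; substitute your real target table
    INSERT INTO dbo.EmailOutbox (emailto, emailbody)
    VALUES (@email, @body);
    FETCH NEXT FROM email_cursor INTO @email;
END
CLOSE email_cursor;
DEALLOCATE email_cursor;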

You can do something like this:
-- Create temp table
CREATE TABLE #temptable (temp_email varchar(50), temp_string varchar(50))
-- populate table with data
INSERT INTO #temptable
VALUES ('test@test.com', 'string1'),
       ('test@test.com', 'string2'),
       ('test2@test.com', 'string3'),
       ('test2@test.com', 'string4'),
       ('test3@test.com', 'string5')
-- actual query
;WITH CTE_table AS(
    SELECT C.temp_email,
           REPLACE(REPLACE(STUFF(
               (SELECT '<br/>' + CAST(temp_string AS VARCHAR(20)) AS [text()]
                FROM #temptable AS O
                WHERE C.temp_email = O.temp_email
                FOR XML PATH('')), 1, 5, '')
               ,'&lt;','<') -- un-escape &lt; back to <
               ,'&gt;','>') -- un-escape &gt; back to >
           AS temp_string
           ,ROW_NUMBER() OVER (PARTITION BY temp_email ORDER BY temp_email) rownumber
    FROM #temptable AS C
)
-- Get only unique records
SELECT temp_email, temp_string FROM CTE_table
WHERE rownumber = 1

Related

Checking if a field contains multiple strings in SQL Server

I am working on a SQL database that will provide data for a grid. The grid will enable filtering, sorting and paging, but there is also a strict requirement that users can enter free text into a text input above the grid, for example
'Engine 1001 Requi', and that the result will contain only rows which, across some columns, contain all the pieces of the text. So one column may contain Engine, another column may contain 1001 and some other column will contain Requi.
I created a technical column (let's call it myTechnicalColumn) in the table (let's call it myTable) which will be updated each time someone inserts or updates a row, and it will contain all the values of all the columns combined and separated with spaces.
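Conceptually it behaves like a persisted computed column over the searchable columns, something along these lines (col1 to col3 are just placeholders for the real columns):
-- illustration only: combine the searchable columns into one space-separated string
ALTER TABLE myTable
    ADD myTechnicalColumn AS CONCAT(col1, ' ', col2, ' ', col3) PERSISTED;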
Now, to use it with Entity Framework, I decided to use a table-valued function which accepts one parameter @searchText and handles it like this:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS @Result TABLE
( ... here come columns )
AS
BEGIN
    DECLARE @searchToken TokenType
    INSERT INTO @searchToken(token) SELECT value FROM STRING_SPLIT(@searchText, ' ')

    DECLARE @searchTextLength INT
    SET @searchTextLength = (SELECT COUNT(*) FROM @searchToken)

    INSERT INTO @Result
    SELECT
        ... here come columns
    FROM myTable
    WHERE (SELECT COUNT(*) FROM @searchToken WHERE CHARINDEX(token, myTechnicalColumn) > 0) = @searchTextLength

    RETURN;
END
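(TokenType is a user-defined table type with a single token column; it would have been created beforehand with something like this, give or take the column size:)
CREATE TYPE TokenType AS TABLE (token NVARCHAR(4000));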
Of course the solution works fine but it's kinda slow. Any hints how to improve its efficiency?
You can use an inline Table Valued Function, which should be quite a lot faster.
This would be a direct translation of your current code:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
    WITH searchText AS (
        SELECT value AS token
        FROM STRING_SPLIT(@searchText, ' ')
    )
    SELECT
        ... here come columns
    FROM myTable t
    WHERE (
        SELECT COUNT(*)
        FROM searchText s
        WHERE CHARINDEX(s.token, t.myTechnicalColumn) > 0
    ) = (SELECT COUNT(*) FROM searchText)
);
GO
You are using a form of query called Relational Division Without Remainder and there are other ways to cut this cake:
CREATE FUNCTION myFunctionName(@searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
    WITH searchText AS (
        SELECT value AS token
        FROM STRING_SPLIT(@searchText, ' ')
    )
    SELECT
        ... here come columns
    FROM myTable t
    WHERE NOT EXISTS (
        SELECT 1
        FROM searchText s
        WHERE CHARINDEX(s.token, t.myTechnicalColumn) = 0
    )
);
GO
This may be faster or slower depending on a number of factors; you need to test.
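Either version is then queried like a parameterised view, for example (using the sample search text from the question):
SELECT *
FROM dbo.myFunctionName(N'Engine 1001 Requi');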
Since there is no data to test, I am not sure if the following will solve your issue:
-- Replace the last INSERT portion
INSERT INTO @Result
SELECT
    ... here come columns
FROM myTable T
JOIN @searchToken S ON CHARINDEX(S.token, T.myTechnicalColumn) > 0

SQL: Improving the string split

I have a set of code that takes a string value, splits it, and passes the pieces to a table. The code works, but it runs slow. Any suggestion to modify the code and make it run faster would be greatly appreciated.
DECLARE @StrPropertyIDs VARCHAR(1000)
SET @StrPropertyIDs = '419,429,459'

DECLARE @TblPropertyID TABLE
(
    property_id varchar(100)
)

INSERT INTO @TblPropertyID(property_id)
SELECT x.Item
FROM dbo.SplitString(@StrPropertyIDs, ',') x

SELECT *
FROM vw_nfpa_firstArv_RPT
WHERE property_use IN
(
    SELECT property_id
    FROM @TblPropertyID
)
The best long term strategy here would be to move away from CSV data in your SQL tables if at all possible. As a quick fix here, we could try creating the table variable with an index on property_id:
DECLARE @TblPropertyID TABLE (
    property_id varchar(100) INDEX idx CLUSTERED
);
This would make the WHERE IN clause of your query faster, though we could try rewriting it using EXISTS:
SELECT *
FROM vw_nfpa_firstArv_RPT t1
WHERE EXISTS (SELECT 1 FROM @TblPropertyID t2
              WHERE t2.property_id = t1.property_use);
Note that this would only work on SQL Server 2014 or later.

How to manipulate comma-separated list in SQL Server

I have a list of values such as
1,2,3,4...
that will be passed into my SQL query.
I need to have these values stored in a table variable. So essentially I need something like this:
declare @t table (num int)
insert into @t values (1),(2),(3),(4)...
Is it possible to do that formatting in SQL Server? (turning 1,2,3,4... into (1),(2),(3),(4)...)
Note: I cannot change what those values look like before they get to my SQL script; I'm stuck with that list. Also, it may not always be 4 values; it could be 1 or more.
Edit to show what values look like: under normal circumstances, this is how it would work:
select t.pk
from a_table t
where t.pk in (#place_holder#)
#placeholder# is just a literal placeholder. When someone runs the report, #placeholder# is replaced with the literal values from the filter of that report:
select t.pk
from a_table t
where t.pk in (1,2,3,4) -- or whatever the user selects
t.pk is an int
Note: doing
declare @t as table (
num int
)
insert into @t values (#Placeholder#)
does not work.
Your description is a bit ridiculous, but you might give this a try:
Whatever you mean with this:
"I see what you're trying to say; but if I type out '#placeholder#' in the script, I'll end up with '1','2','3','4' and not '1,2,3,4'"
I assume this is a string of numbers, each number between single quotes, separated with commas:
DECLARE @passedIn VARCHAR(100) = '''1'',''2'',''3'',''4'',''5'',''6'',''7''';
SELECT @passedIn; -->: '1','2','3','4','5','6','7'
Now the variable @passedIn holds exactly what you are talking about.
I'll use a dynamic SQL statement to insert this into a temp table (a declared table variable would not work here, since it would not be visible inside the EXEC scope...):
CREATE TABLE #tmpTable(ID INT);
DECLARE @cmd VARCHAR(MAX) =
    'INSERT INTO #tmpTable(ID) VALUES (' + REPLACE(SUBSTRING(@passedIn, 2, LEN(@passedIn)-2), ''',''', '),(') + ');';
EXEC (@cmd);
SELECT * FROM #tmpTable;
GO
DROP TABLE #tmpTable;
UPDATE 1: no dynamic SQL necessary, all ad hoc...
You can get the list of numbers as a derived table in a CTE easily.
This can be used in a following statement like WHERE SomeID IN (SELECT ID FROM MyIDs) (similar to this: dynamic IN section).
WITH MyIDs(ID) AS
(
    SELECT A.B.value('.','int') AS ID
    FROM
    (
        SELECT CAST('<x>' + REPLACE(SUBSTRING(@passedIn, 2, LEN(@passedIn)-2), ''',''', '</x><x>') + '</x>' AS XML) AS AsXml
    ) AS tbl
    CROSS APPLY tbl.AsXml.nodes('/x') AS A(B)
)
SELECT * FROM MyIDs
UPDATE 2:
And to answer your question exactly:
With this following the CTE (in place of the final SELECT * FROM MyIDs)
INSERT INTO @t(num)
SELECT ID FROM MyIDs
... you would actually get your declared table variable filled - if you need it later...
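Plugged back into your original query, that would give, for example:
SELECT t.pk
FROM a_table t
WHERE t.pk IN (SELECT num FROM @t);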

Splitting delimited values in a SQL column into multiple rows

I would really like some advice here. To give some background info, I am working with inserting Message Tracking logs from Exchange 2007 into SQL. As we have millions upon millions of rows per day I am using a Bulk Insert statement to insert the data into a SQL table.
In fact I actually bulk insert into a temp table and then from there I MERGE the data into the live table; this is to work around parsing issues, as certain fields otherwise have quotes and such around the values.
This works well, with the exception of the fact that the recipient-address column is a delimited field separated by a ; character, and it can be incredibly long sometimes, as there can be many email recipients.
I would like to take this column, and split the values into multiple rows which would then be inserted into another table. Problem is anything I am trying is either taking too long or not working the way I want.
Take this example data:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user3@domain3.com;user4@domain4.com;user5@domain5.com
I would like this to be formatted as follows in my Recipients table:
message-id recipient-address
2D5E558D4B5A3D4F962DA5051EE364BE06CF37A3A5@Server.com user1@domain1.com
E52F650C53A275488552FFD49F98E9A6BEA1262E@Server.com user2@domain2.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user3@domain3.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user4@domain4.com
4fd70c47.4d600e0a.0a7b.ffff87e1@Server.com user5@domain5.com
Does anyone have any ideas about how I can go about doing this?
I know PowerShell pretty well, so I tried that first, but a foreach loop even on 28K records took forever to process. I need something that will run as quickly/efficiently as possible.
Thanks!
If you are on SQL Server 2016+
You can use the new STRING_SPLIT function, which I've blogged about here, and Brent Ozar has blogged about here.
SELECT s.[message-id], f.value
FROM dbo.SourceData AS s
CROSS APPLY STRING_SPLIT(s.[recipient-address], ';') as f;
If you are still on a version prior to SQL Server 2016
Create a split function. This is just one of many examples out there:
CREATE FUNCTION dbo.SplitStrings
(
    @List NVARCHAR(MAX),
    @Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
    RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
                   Item
            FROM (SELECT Number,
                         Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
                                CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
                  FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
                        FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
                  WHERE Number <= CONVERT(INT, LEN(@List))
                    AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
            ) AS y);
GO
I've discussed a few others here, here, and a better approach than splitting in the first place here.
Now you can extrapolate simply by:
SELECT s.[message-id], f.Item
FROM dbo.SourceData AS s
CROSS APPLY dbo.SplitStrings(s.[recipient-address], ';') as f;
Also I suggest not putting dashes in column names. It means you always have to put them in [square brackets].
SQL Server 2016 includes a new table-valued function, STRING_SPLIT(), similar to the previous solution.
The only requirement is to set the database compatibility level to 130 (SQL Server 2016).
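Checking the current level and raising it can be done roughly like this (shown here for the current database; adjust to your own):
-- check the current compatibility level
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- raise it to 130 (SQL Server 2016) if it is lower
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;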
You may use CROSS APPLY (available in SQL Server 2005 and above) and the STRING_SPLIT function (available in SQL Server 2016 and above):
DECLARE @delimiter nvarchar(1) = ';';

-- create tables
CREATE TABLE MessageRecipients (MessageId int, Recipients nvarchar(max));
CREATE TABLE MessageRecipient (MessageId int, Recipient nvarchar(max));

-- insert data
INSERT INTO MessageRecipients VALUES (1, 'user1@domain.com; user2@domain.com; user3@domain.com');
INSERT INTO MessageRecipients VALUES (2, 'user@domain1.com; user@domain2.com');

-- insert into MessageRecipient
INSERT INTO MessageRecipient
SELECT MessageId, LTRIM(RTRIM(value))
FROM MessageRecipients
CROSS APPLY STRING_SPLIT(Recipients, @delimiter)

-- output results
SELECT * FROM MessageRecipients;
SELECT * FROM MessageRecipient;

-- delete tables
DROP TABLE MessageRecipients;
DROP TABLE MessageRecipient;
Results:
MessageId   Recipients
----------- ----------------------------------------------------
1           user1@domain.com; user2@domain.com; user3@domain.com
2           user@domain1.com; user@domain2.com
and
MessageId   Recipient
----------- ----------------
1           user1@domain.com
1           user2@domain.com
1           user3@domain.com
2           user@domain1.com
2           user@domain2.com
for table = "yelp_business", split the column categories values separated by ; into rows and display as category column.
SELECT unnest(string_to_array(categories, ';')) AS category
FROM yelp_business;

String manipulation SQL

I have rows of strings that are in the following format:
'Order was assigned to lastname,firsname'
I need to cut this string down into just the last and first name but it is always a different name for each record.
The 'Order was assigned to' part is always the same.......
Thanks
I am using SQL Server. It is multiple records with different names in each record.
In your specific case you can use something like:
SELECT SUBSTRING(str, 23) FROM table
However, this is not very scalable, should the format of your strings ever change.
If you are using an Oracle database, you would want to use SUBSTR instead.
Edit:
For databases like SQL Server, where the third parameter is not optional, you could use SUBSTRING(str, 23, LEN(str)).
Somebody would have to test whether this is better or worse than the subtraction in Martin Smith's solution, but it gives you the same result in the end.
In addition to the SUBSTRING methods, you could also use a REPLACE function. I don't know which would have better performance over millions of rows, although I suspect that it would be the SUBSTRING - especially if you were working with CHAR instead of VARCHAR.
SELECT REPLACE(my_column, 'Order was assigned to ', '')
For SQL Server
WITH testData AS
(
SELECT 'Order was assigned to lastname,firsname' as Col1 UNION ALL
SELECT 'Order was assigned to Bloggs, Jo' as Col1
)
SELECT SUBSTRING(Col1,23,LEN(Col1)-22) AS Name
from testData
Returns
Name
---------------------------------------
lastname,firsname
Bloggs, Jo
on MS SQL Server:
DECLARE @str varchar(100) = 'Order was assigned to lastname,firsname'
DECLARE @strLen1 int = DATALENGTH('Order was assigned to ')
DECLARE @strLen2 int = LEN(@str)

SELECT @strLen1, @strLen2, SUBSTRING(@str, @strLen1 + 1, @strLen2),
       RIGHT(@str, @strLen2 - @strLen1)
I would require that a colon or some other delimiter be between the message and the name.
Then you could just search for the index of that character and know that anything after it was the data you need...
Example with format changing over time:
CREATE TABLE #Temp (OrderInfo NVARCHAR(MAX))
INSERT INTO #Temp VALUES ('Order was assigned to :Smith,Mary')
INSERT INTO #Temp VALUES ('Order was assigned to :Holmes,Larry')
INSERT INTO #Temp VALUES ('New Format over time :LootAt,Me')
SELECT SUBSTRING(OrderInfo, CHARINDEX(':',OrderInfo)+1, LEN(OrderInfo))
FROM #Temp
DROP TABLE #Temp