I have a stored procedure that's taking a very long time because I have 2 function calls that are being called before a PIVOT, which means it's calling the functions 5 times for each record rather than once for each record. How can I rewrite my query so that the 2 function calls right at the end of the query are run after the PIVOT rather than before?
Here's the query
CREATE TABLE #Temp
(
ServiceRecordID INT,
LocationStd VARCHAR(1000),
AreaServedStd VARCHAR(1000),
RegionalLimited BIT,
Region VARCHAR(255),
Visible BIT
)
DECLARE @RegionCount INT
SELECT @RegionCount = COUNT(RegionID) FROM Regions WHERE SiteID = @SiteID AND RegionID % 100 != 0
INSERT INTO #Temp
SELECT TOP (@RegionCount * 100) SR.ServiceRecordID, SR.LocationStd, SR.AreaServedStd, SR.RegionalLimited, R.Region,
CASE WHEN (ISNULL(R_SR.RegionID,0) = 0 AND ISNULL(R_SR_Serv.RegionID,0) = 0) THEN 0 ELSE 1 END AS Visible
FROM ServiceRecord SR
INNER JOIN Sites S ON SR.SiteID = S.SiteID
INNER JOIN Regions R ON R.SiteID = S.SiteID
LEFT OUTER JOIN lkup_Region_ServiceRecord R_SR ON R_SR.RegionID = R.RegionID AND R_SR.ServiceRecordID = SR.ServiceRecordID
LEFT OUTER JOIN lkup_Region_ServiceRecord_Serv R_SR_Serv ON R_SR_Serv.RegionID = R.RegionID AND R_SR_Serv.ServiceRecordID = SR.ServiceRecordID AND SR.RegionalLimited = 0
WHERE SR.SiteID = @SiteID
AND R.RegionID % 100 != 0
ORDER BY SR.ServiceRecordID
DECLARE @RegionList varchar(2000),@SQL varchar(max)
SELECT @RegionList = STUFF((SELECT DISTINCT ',[' + Region + ']' FROM #Temp ORDER BY ',[' + Region + ']' FOR XML PATH('')),1,1,'')
SET @SQL='SELECT * FROM
(SELECT ServiceRecordID,
dbo.fn_ServiceRecordGetServiceName(ServiceRecordID,'''') AS ServiceName,
LocationStd,
AreaServedStd,
RegionalLimited,
Region As Region,
dbo.fn_GetOtherRegionalSitesForServiceRecord(ServiceRecordID) AS OtherSites,
CAST(Visible AS INT) AS Visible FROM #Temp) B PIVOT(MAX(Visible) FOR Region IN (' + @RegionList + ')) A'
EXEC(@SQL)
Move the function calls after the PIVOT:
SET @SQL='
SELECT
A.*,
N.ServiceName,
S.OtherSites
FROM
(
SELECT
ServiceRecordID,
LocationStd,
AreaServedStd,
RegionalLimited,
Region,
CAST(Visible AS INT) AS Visible
FROM #Temp
) B
PIVOT(MAX(Visible) FOR Region IN (' + @RegionList + ')) A
OUTER APPLY (
SELECT dbo.fn_ServiceRecordGetServiceName(A.ServiceRecordID,'''')
) N (ServiceName)
OUTER APPLY (
SELECT dbo.fn_GetOtherRegionalSitesForServiceRecord(A.ServiceRecordID)
) S (OtherSites);
';
Or just put them in the outer SELECT:
SET @SQL='
SELECT
A.*,
ServiceName = dbo.fn_ServiceRecordGetServiceName(A.ServiceRecordID,''''),
OtherSites = dbo.fn_GetOtherRegionalSitesForServiceRecord(A.ServiceRecordID)
FROM
(
SELECT
ServiceRecordID,
LocationStd,
AreaServedStd,
RegionalLimited,
Region,
CAST(Visible AS INT) AS Visible
FROM #Temp
) B
PIVOT(MAX(Visible) FOR Region IN (' + @RegionList + ')) A
';
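To make the effect of the rewrite concrete, here is a small Python model (not SQL Server itself) that counts how often a stand-in for dbo.fn_ServiceRecordGetServiceName runs when it is evaluated for every #Temp row before the pivot, versus once per pivoted row after it. The sample rows, the five region names and the get_service_name function are hypothetical; whether a real plan evaluates a scalar UDF exactly once per row can vary, but this is the behaviour the question describes.

```python
from collections import defaultdict

calls = 0


def get_service_name(record_id: int) -> str:
    """Hypothetical stand-in for dbo.fn_ServiceRecordGetServiceName; counts its calls."""
    global calls
    calls += 1
    return f"Service {record_id}"


def pivot_visible(rows):
    """Mimic PIVOT(MAX(Visible) FOR Region IN (...)): one dict of regions per record."""
    out = defaultdict(dict)
    for record_id, region, visible in rows:
        out[record_id][region] = max(visible, out[record_id].get(region, visible))
    return out


# Hypothetical #Temp contents: 2 service records x 5 regions = 10 rows.
regions = ["North", "South", "East", "West", "Central"]
temp = [(rid, region, int(rid % 2 == 0)) for rid in (1, 2) for region in regions]

# Original shape: the function sits in the derived table, so it runs per #Temp row.
calls = 0
annotated = [(rid, get_service_name(rid), region, vis) for rid, region, vis in temp]
calls_before = calls

# Rewritten shape: pivot first, then call the function once per pivoted row.
calls = 0
result = {
    rid: {"ServiceName": get_service_name(rid), **cols}
    for rid, cols in pivot_visible(temp).items()
}
calls_after = calls

print(calls_before, calls_after)  # 10 2
```

With 5 regions per record, the call count drops by the same factor of 5 the question mentions.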
If you can possibly convert those functions to be table-valued rowset-returning consisting of a single SELECT statement, you may get a huge performance improvement as well.
CREATE FUNCTION dbo.fn_ServiceRecordGetServiceName2(
@ServiceRecordID int
)
RETURNS TABLE
AS
RETURN ( -- single select statement
SELECT ServiceName = Blah
FROM dbo.Gorp
WHERE Gunk = 'Ralph'
);
Then
OUTER APPLY dbo.fn_ServiceRecordGetServiceName2(A.ServiceRecordID) N
And N.ServiceName will return the value(s).
Also, it is not correct to tack on square brackets to convert data values to valid sysnames. You should use the QuoteName function. This will ensure your system doesn't break no matter WHAT crazy value is entered 13 years from now (think 'Taiwan [North]'):
STUFF((SELECT DISTINCT ',' + QuoteName(Region) FROM #Temp ...
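A minimal Python sketch of the bracket-quoting rule QUOTENAME applies with its default delimiter, using a 'Taiwan [North]' value like the one mentioned: the closing bracket is doubled, and input longer than 128 characters yields NULL. The quotename function here is an illustration, not the SQL Server implementation. Since Region is declared VARCHAR(255) in #Temp, a region name over 128 characters would make QUOTENAME return NULL.

```python
from typing import Optional


def quotename(name: Optional[str]) -> Optional[str]:
    """Sketch of QUOTENAME(name) with its default '[' delimiter.

    The closing bracket is escaped by doubling it. Like QUOTENAME, it returns
    None (NULL) for NULL input or input longer than 128 characters.
    """
    if name is None or len(name) > 128:
        return None
    return "[" + name.replace("]", "]]") + "]"


regions = ["Taiwan [North]", "East", "Taiwan [North]"]

naive = ",".join("[" + r + "]" for r in sorted(set(regions)))
safe = ",".join(quotename(r) for r in sorted(set(regions)))
print(naive)  # [East],[Taiwan [North]]
print(safe)   # [East],[Taiwan [North]]]
```

In the naive list, `]]` is read as an escaped bracket, so the identifier `[Taiwan [North]]` is never closed; the QUOTENAME form is parsed back to the original value.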
Note:
Since you said that this is for display in a web page, you don't even need to do the pivoting on the server. Instead, return 2 rowsets to the client, one with the Site data and one with the column data for the Regions. You would need an additional pass through every row in the Region rowset to find out all the regions needed, but this can be done very quickly. Finally, adjust your program code to step through the Region rows as needed for each matching Site, and create your output.
One reason this is worth the investment is that if your application grows in size, you can always throw another web server at the problem, but it's a lot harder to throw another database at it. A new web server will cost less than continually beefing up your SQL Server.
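A rough sketch of the client-side pivot described above, in Python as a stand-in for whatever language the web application uses. The two rowsets, their column names and values are hypothetical; the point is the single pass over the region rows to find the column set, followed by one output row per record.

```python
from collections import defaultdict

# Hypothetical rowset 1: one row per service record (the non-pivoted columns).
records = [
    {"ServiceRecordID": 1, "LocationStd": "Downtown"},
    {"ServiceRecordID": 2, "LocationStd": "Harbour"},
]
# Hypothetical rowset 2: one row per (record, region) with its Visible flag.
region_rows = [
    (1, "North", 1), (1, "South", 0),
    (2, "North", 0), (2, "South", 1), (2, "West", 1),
]

# One pass over the region rowset: collect the column set and index visibility by record.
all_regions = sorted({region for _, region, _ in region_rows})
visible_by_record = defaultdict(dict)
for record_id, region, visible in region_rows:
    visible_by_record[record_id][region] = visible

# Build the output grid; a region with no row for a record is shown as 0.
header = ["ServiceRecordID", "LocationStd"] + all_regions
table = [
    [rec["ServiceRecordID"], rec["LocationStd"]]
    + [visible_by_record[rec["ServiceRecordID"]].get(r, 0) for r in all_regions]
    for rec in records
]
for row in [header] + table:
    print(row)
```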
P.S. Even dynamic SQL is easier to deal with when you format it well. :)
Related
I currently have a query which I am using to unpivot an existing table.
Some background information on the table - each year a new column is added to the table to indicate the $ values for a project ID for that year. With every column added one will be dropped. All these columns are prefixed with 'YR_' followed by the new year. There are constantly 20 'YR_' columns.
I am required to unpivot the 'YR_' columns so that they appear as per below, allowing me to utilize the information easier for several reports -
Before unpivot -
ProjectID YR_16 YR_17 YR_18 YR_19 YR_20
10 0 100 20 25 100
After unpivot -
ProjectID YR Value
10 YR_16 0
10 YR_17 100
10 YR_18 20
10 YR_19 25
10 YR_20 100
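For reference, the before/after transformation above is an ordinary wide-to-long reshape. This Python sketch, with hypothetical rows, picks the columns by name pattern much as the dynamic query picks them from INFORMATION_SCHEMA.COLUMNS, and, like T-SQL UNPIVOT, drops NULL values.

```python
import re

# Hypothetical rows shaped like F1BWK_PLM_CAPEX; project 11 has a NULL for YR_16.
wide_rows = [
    {"ProjectID": 10, "YR_16": 0, "YR_17": 100, "YR_18": 20, "YR_19": 25, "YR_20": 100},
    {"ProjectID": 11, "YR_16": None, "YR_17": 5},
]


def unpivot(rows, id_col, column_pattern=r"YR_\d+"):
    """Turn every column whose name matches column_pattern into (id, column, value) rows.

    The columns are discovered from each row, not hard-coded, so a new YR_ column
    is picked up automatically. Like T-SQL UNPIVOT, NULL (None) values are dropped.
    """
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if re.fullmatch(column_pattern, col) and value is not None:
                long_rows.append((row[id_col], col, value))
    return long_rows


long_rows = unpivot(wide_rows, "ProjectID")
for r in long_rows:
    print(r)
```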
Below is the query I am using to create the unpivot table, which will dynamically pick up columns as they are added/dropped
declare @query as NVARCHAR(max);
Declare @cols as NVARCHAR(250) = STUFF((
Select distinct ',' + QUOTENAME(Column_Name)
From INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'F1BWK_PLM_CAPEX'
And COLUMN_NAME like 'YR[_]%' -- [_] matches a literal underscore; a bare _ matches any character
For XML Path(''), type)
.value('.','NVARCHAR(MAX)')
,1,1,'');
Select @query = 'With Unpivoted as
(
Select * from F1BWK_PLM_CAPEX U
Unpivot (
Val
For Yr in (' + @cols + ')
)
As UnpivotTable
)
Select U.*
From Unpivoted U
inner join [dbo].[F1_SYPAR_CTL] CTL
on CTL.value = U.WS_VERS
and CTL.PARAM_NAME like ''MBRC_CURR_PLM_BUDVER''
inner join [dbo].[F1_SYPAR_CTL] CTL2
on CTL2.value = U.WS_NAME
and CTL2.PARAM_NAME like ''MBRC_CURR_PLM_BUDWSH''';
Exec(@query);
I am having issues in turning this query into a function so that I can call upon it and save it in SQL Server so that other members of my team can use it when they require it.
This is my first time using unpivot tables and creating functions. I'm open to suggestions on changing my dynamic unpivot query to best suit my needs.
Thanks in advance
In SQL Server you are not allowed to have dynamic SQL in functions.
You can create an SP that will return a record set generated using your code above.
UPDATE:
For the sake of completeness:
You can also call the above SP through OPENQUERY (against a loopback linked server), which would allow you to join to other tables without having to save to a temp table first, but IMO it is an ugly way to do this.
END UPDATE:
Another way is to create a VIEW that would un-pivot this table and create an SP that would regenerate this view after the process that adds a new column is completed e.g.
CREATE TABLE F1BWK_PLM_CAPEX( Val INT, Yr_17 INT, Yr_18 INT )
INSERT INTO F1BWK_PLM_CAPEX
SELECT 1, 10, 12
GO
CREATE VIEW F1BWK_PLM_CAPEX_UNPIVOTED
AS
SELECT *
FROM F1BWK_PLM_CAPEX AS U
UNPIVOT (
Val2 FOR Yr IN (Yr_17, Yr_18)
)
AS UnpivotTable;
GO
CREATE PROCEDURE dbo.Create_F1BWK_PLM_CAPEX_UNPIVOTED
AS
SET NOCOUNT ON
DECLARE @query as NVARCHAR(max);
DECLARE @cols as NVARCHAR(MAX) =
STUFF((
SELECT distinct ',' + QUOTENAME( Column_Name )
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'F1BWK_PLM_CAPEX'
AND COLUMN_NAME like 'YR[_]%' -- [_] matches a literal underscore; a bare _ matches any character
For XML Path(''), type)
.value('.','NVARCHAR(MAX)' )
,1,1,'');
SELECT @query = '
ALTER VIEW F1BWK_PLM_CAPEX_UNPIVOTED
AS
SELECT *
FROM F1BWK_PLM_CAPEX AS U
UNPIVOT (
Val2 FOR Yr IN ( ' + @cols + ' )
)
AS UnpivotTable
';
EXEC(@query);
RETURN;
GO
Now your query would become this:
SELECT *
FROM F1BWK_PLM_CAPEX_UNPIVOTED AS U
INNER JOIN [dbo].[F1_SYPAR_CTL] AS CTL
ON CTL.value = U.WS_VERS and CTL.PARAM_NAME = 'MBRC_CURR_PLM_BUDVER'
INNER JOIN [dbo].[F1_SYPAR_CTL] AS CTL2
ON CTL2.value = U.WS_NAME AND CTL2.PARAM_NAME = 'MBRC_CURR_PLM_BUDWSH'
At the end of the year a process runs that adds a new column:
ALTER TABLE F1BWK_PLM_CAPEX
ADD Yr_19 INT NOT NULL DEFAULT( 10 )
This process needs to run this SP to regenerate the view:
EXEC Create_F1BWK_PLM_CAPEX_UNPIVOTED
Notes:
I have replaced LIKE with = as your query matches on a full value. Only use LIKE when you need to specify wildcards.
I have also fixed a syntax error by changing Val to Val2 in the UNPIVOT query.
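A related detail about LIKE wildcards: `_` is itself a wildcard matching any single character, so `'YR_%'` also matches a column such as `YRS_TOTAL`, and `'MBRC_CURR_PLM_BUDVER'` under LIKE matches more than the literal name. Wrapping it in brackets (`'YR[_]%'`) matches a literal underscore. The Python translator here is a simplified model of LIKE (only `%`, `_` and literal `[...]` sets, case-insensitive as a stand-in for a case-insensitive collation); YRS_TOTAL is a hypothetical column.

```python
import re


def like_to_regex(pattern):
    """Translate a simple T-SQL LIKE pattern into a compiled regex.

    Handles only %, _ and [...] sets of literal characters; ESCAPE clauses,
    ranges, [^...] and collation rules are ignored, and matching is
    case-insensitive as a stand-in for a case-insensitive collation.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")  # exactly one character, of any kind
        elif ch == "[":
            end = pattern.index("]", i)
            parts.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


# YRS_TOTAL is a hypothetical extra column, not one from the question.
columns = ["ProjectID", "YR_16", "YR_17", "YRS_TOTAL"]
loose = [c for c in columns if like_to_regex("YR_%").fullmatch(c)]
strict = [c for c in columns if like_to_regex("YR[_]%").fullmatch(c)]
print(loose)   # ['YR_16', 'YR_17', 'YRS_TOTAL']
print(strict)  # ['YR_16', 'YR_17']
```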
I tried to replace the (null) values with 0 (zeros) in the PIVOT output but had no success.
Below is the table and the syntax I've tried:
SELECT
CLASS,
[AZ],
[CA],
[TX]
FROM #TEMP
PIVOT (SUM(DATA)
FOR STATE IN ([AZ], [CA], [TX])) AS PVT
ORDER BY CLASS
CLASS AZ CA TX
RICE 10 4 (null)
COIN 30 3 2
VEGIE (null) (null) 9
I tried to use ISNULL, but it did not work:
PIVOT SUM(ISNULL(DATA,0)) AS QTY
What syntax do I need to use?
SELECT CLASS,
isnull([AZ],0),
isnull([CA],0),
isnull([TX],0)
FROM #TEMP
PIVOT (SUM(DATA)
FOR STATE IN ([AZ], [CA], [TX])) AS PVT
ORDER BY CLASS
If you have a situation where you are using dynamic columns in your pivot statement you could use the following:
DECLARE @cols NVARCHAR(MAX)
DECLARE @colsWithNoNulls NVARCHAR(MAX)
DECLARE @query NVARCHAR(MAX)
SET @cols = STUFF((SELECT distinct ',' + QUOTENAME(Name)
FROM Hospital
WHERE Active = 1 AND StateId IS NOT NULL
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
SET @colsWithNoNulls = STUFF(
(
SELECT distinct ',ISNULL(' + QUOTENAME(Name) + ', ''No'') ' + QUOTENAME(Name)
FROM Hospital
WHERE Active = 1 AND StateId IS NOT NULL
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
EXEC ('
SELECT Clinician, ' + @colsWithNoNulls + '
FROM
(
SELECT DISTINCT p.FullName AS Clinician, h.Name, CASE WHEN phl.personhospitalloginid IS NOT NULL THEN ''Yes'' ELSE ''No'' END AS HasLogin
FROM Person p
INNER JOIN personlicense pl ON pl.personid = p.personid
INNER JOIN LicenseType lt on lt.licensetypeid = pl.licensetypeid
INNER JOIN licensetypegroup ltg ON ltg.licensetypegroupid = lt.licensetypegroupid
INNER JOIN Hospital h ON h.StateId = pl.StateId
LEFT JOIN PersonHospitalLogin phl ON phl.personid = p.personid AND phl.HospitalId = h.hospitalid
WHERE ltg.Name = ''RN'' AND
pl.licenseactivestatusid = 2 AND
h.Active = 1 AND
h.StateId IS NOT NULL
) AS Results
PIVOT
(
MAX(HasLogin)
FOR Name IN (' + @cols + ')
) p
')
You cannot apply IsNull() inside the PIVOT; apply it after the data has been pivoted, around the final values in the SELECT:
SELECT CLASS,
IsNull([AZ], 0) as [AZ],
IsNull([CA], 0) as [CA],
IsNull([TX], 0) as [TX]
FROM #TEMP
PIVOT
(
SUM(DATA)
FOR STATE IN ([AZ], [CA], [TX])
) AS PVT
ORDER BY CLASS
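The reason ISNULL has to sit outside the PIVOT is that the missing cells are not NULL values in #TEMP: there is simply no row for RICE/TX or VEGIE/AZ/CA, so the aggregate produces NULL for a group with no input. This Python model, with hypothetical #TEMP rows that reproduce the question's output, shows the empty cells appearing at pivot time and the default being applied afterwards.

```python
from collections import defaultdict

# Hypothetical #TEMP rows that reproduce the question's output: (CLASS, STATE, DATA).
temp = [
    ("RICE", "AZ", 10), ("RICE", "CA", 4),
    ("COIN", "AZ", 30), ("COIN", "CA", 3), ("COIN", "TX", 2),
    ("VEGIE", "TX", 9),
]
states = ["AZ", "CA", "TX"]


def pivot_sum(rows, columns):
    """Mimic PIVOT(SUM(DATA) FOR STATE IN (...)): a cell with no source rows stays None."""
    sums = defaultdict(dict)
    for cls, state, data in rows:
        if state in columns:
            sums[cls][state] = sums[cls].get(state, 0) + data
    return {cls: [cells.get(c) for c in columns] for cls, cells in sums.items()}


pivoted = pivot_sum(temp, states)
# IsNull([AZ], 0) and friends act on the pivoted values, after the aggregation.
filled = {cls: [0 if v is None else v for v in vals] for cls, vals in pivoted.items()}
for cls in sorted(filled):
    print(cls, pivoted[cls], "->", filled[cls])
```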
Sometimes it helps to think like a parser, in this case the T-SQL parser. The PIVOT clause only takes an aggregate over a column, so you cannot put an expression such as ISNULL there; the NULLs only exist once the pivoted values have been produced. You can simply use this:
SELECT CLASS
, IsNull([AZ], 0)
, IsNull([CA], 0)
, IsNull([TX], 0)
FROM #TEMP
PIVOT (
SUM(DATA)
FOR STATE IN (
[AZ]
, [CA]
, [TX]
)
) AS PVT
ORDER BY CLASS
You have to account for all values in the pivot set. You can accomplish this using a Cartesian product:
select pivoted.*
from (
select cartesian.key1, cartesian.key2, isnull(relationship.[value],'nullvalue') as [value]
from (
select k1.key1, k2.key2
from ( select distinct key1 from relationship) k1
,( select distinct key2 from relationship) k2
) cartesian
left outer join relationship on relationship.key1 = cartesian.key1 and relationship.key2 = cartesian.key2
) data
pivot (
max([value]) for key2 in ([key2_v1], [key2_v2], [key2_v3], ...)
) pivoted
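A Python sketch of the Cartesian-product idea, with hypothetical keys and values: build every (key1, key2) pair from the distinct keys, left-join the real rows with a default, and only then pivot, so no cell can be missing.

```python
from itertools import product

# Hypothetical relationship rows: (key1, key2, value); the pair ("b", "y") has no row.
relationship = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)]

present = {(k1, k2): v for k1, k2, v in relationship}
keys1 = sorted({k1 for k1, _, _ in relationship})
keys2 = sorted({k2 for _, k2, _ in relationship})

# Cross join of the distinct keys, left-joined back to the data with a default,
# like isnull(relationship.[value], 'nullvalue') in the query.
dense = [(k1, k2, present.get((k1, k2), "nullvalue")) for k1, k2 in product(keys1, keys2)]

# Pivot: one row per key1, one column per key2 value.
pivoted = {k1: {k2: v for kk1, k2, v in dense if kk1 == k1} for k1 in keys1}
print(pivoted)
```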
To modify the pivoted results, put the pivoted columns in the select list and modify them there. For example, you can wrap each column built by the pivot in a CASE expression (DECODE is the Oracle equivalent; SQL Server does not have DECODE).
Kranti A
I have encountered a similar problem. The root cause (using your scenario for my case) is that the #temp table has no record for:
a. CLASS=RICE and STATE=TX
b. CLASS=VEGIE and (STATE=AZ or STATE=CA)
So when there is no source record for a cell, MSSQL always shows NULL for it, whatever the aggregate function (MAX, SUM, ...).
None of the above solutions (IsNull([AZ], 0)) worked for me, but I did get ideas from them.
Sorry, it really depends on the #TEMP table, so I can only offer some suggestions:
1. Make sure the #TEMP table has records for the conditions below, even if DATA is null:
   a. CLASS=RICE and STATE=TX
   b. CLASS=VEGIE and (STATE=AZ or STATE=CA)
2. You may need to use a Cartesian product: select A.*, B.* from A, B
3. In the select query for #temp, if you need to join any table with a WHERE condition, it is better to put the WHERE inside another subquery. (The goal is step 1.)
4. Use isnull(DATA, 0) in the #TEMP table.
5. Before pivoting, make sure you have achieved goal 1.
I can't give an answer to the original question, since there is not enough information about the #temp table. I have pasted my code here as an example.
SELECT * FROM (
SELECT eeee.id as enterprise_id
, eeee.name AS enterprise_name
, eeee.indicator_name
, CONVERT(varchar(12) , isnull(eid.[date],'2019-12-01') , 23) AS data_date
, isnull(eid.value,0) AS indicator_value
FROM (select ei.id as indicator_id, ei.name as indicator_name, e.* FROM tbl_enterprise_indicator ei, tbl_enterprise e) eeee
LEFT JOIN (select * from tbl_enterprise_indicator_data WHERE [date]='2020-01-01') eid
ON eeee.id = eid.enterprise_id and eeee.indicator_id = enterprise_indicator_id
) AS P
PIVOT
(
SUM(P.indicator_value) FOR P.indicator_name IN(TX,CA)
) AS T
I've been working on this query for an hour and a half, but I can't get it done.
First, this is my query:
SELECT
Questions, PossibleAnswer,
((COUNT(PossibleAnswer) + 0.0) / 10 ) * 100 AS Percentage
FROM
(SELECT
A.AnswerID, B.Questions, B.QuestionID, C.PossibleAnswer
FROM
TblSurveyCustomerAnswers A
INNER JOIN
TblSurveyQuestion B ON A.QuestionID = B.QuestionID
INNER JOIN
TblSurveyAnswer C ON A.AnswerID = C.AnswerID
WHERE
A.CustomerID = 1) AS SOURCE
GROUP BY
Questions, PossibleAnswer
The result is below:
Now I want the values in the PossibleAnswer column to be converted into columns, so I did some research and found the PIVOT command (it needs to be dynamic since the columns come from the possible answers). This is my code:
DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)
DECLARE @ColumnName AS NVARCHAR(MAX)
--Get distinct values of the PIVOT Column
SELECT @ColumnName= ISNULL(@ColumnName + ',','')
+ QUOTENAME(PossibleAnswer)
FROM
(
SELECT DISTINCT X.*
FROM
(
SELECT Questions,PossibleAnswer, ((COUNT(PossibleAnswer) + 0.0) / 10 ) * 100 AS Percentage
FROM
(
SELECT A.AnswerID,B.Questions, B.QuestionID, C.PossibleAnswer
FROM TblSurveyCustomerAnswers A
INNER JOIN TblSurveyQuestion B
ON A.QuestionID = B.QuestionID
INNER JOIN TblSurveyAnswer C
ON A.AnswerID = C.AnswerID
WHERE A.CustomerID = 1
) AS SOURCE
GROUP BY Questions, PossibleAnswer
) X
) AS B
--Prepare the PIVOT query using the dynamic
SET @DynamicPivotQuery =
'SELECT Questions, ' + @ColumnName + '
FROM TblSurveyCustomerAnswers A
INNER JOIN TblSurveyQuestion B
ON A.QuestionID = B.QuestionID
PIVOT(Max(Questions)
FOR PossibleAnswer IN (' + @ColumnName + ')) AS PVTTable'
--Execute the Dynamic Pivot Query
EXEC sp_executesql @DynamicPivotQuery
And I can't get the pivot to work. I need help; I'm stuck. See this error:
In general for questions like these you should provide sample data, table definitions and expected output so people can take your script, fiddle with it and produce something that works. See How to post a T-SQL question on a public forum for one way to do this.
Since it is hard to look at a dynamic script, not having the table structures, and point at what your problem is, let me give you the following advice:
Instead of taking your big query that produces the output and form queries around that big query, first insert the output of that query into a temporary table. You can do this by placing an INTO #temp_table clause after the SELECT clause. This creates a new temporary table #temp_table containing the output of the query.
SELECT --your select columns
INTO #p_in -- creates a temporary table #p_in that contains the output
FROM --the rest of your query
Determine the pivot columns based on the newly created temporary table. It'll be a lot more concise, and errors will be easier to spot.
Write your dynamic SQL using the temporary table (again, it'll be a lot more concise and errors will be easier to spot).
Don't forget to DROP the temporary table after executing the dynamic SQL.
I tried to solve the problem without a temporary table. You may edit the query to suit your requirements.
--For PIVOT column
DECLARE @ColumnName AS NVARCHAR(MAX)
DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)
-- Build [Answer1],[Answer2],...: QUOTENAME brackets each value, ISNULL skips the comma before the first one
SELECT @ColumnName = ISNULL(@ColumnName + ',', '') + QUOTENAME(PossibleAnswer)
FROM
(
SELECT
DISTINCT PossibleAnswer
FROM
(
SELECT
A.AnswerID, B.Questions, B.QuestionID, C.PossibleAnswer
FROM
TblSurveyCustomerAnswers A
INNER JOIN
TblSurveyQuestion B ON A.QuestionID = B.QuestionID
INNER JOIN
TblSurveyAnswer C ON A.AnswerID = C.AnswerID
WHERE
A.CustomerID = 1
) AS SOURCE
)B
-- Make result: PIVOT ... IN () cannot take a variable, so the statement is built as a string and executed.
-- One row per question, one Percentage column per possible answer.
SET @DynamicPivotQuery = N'
SELECT *
FROM
(
SELECT Questions,PossibleAnswer, ((COUNT(PossibleAnswer) + 0.0) / 10 ) * 100 AS Percentage
FROM
(
SELECT A.AnswerID,B.Questions, B.QuestionID, C.PossibleAnswer
FROM TblSurveyCustomerAnswers A
INNER JOIN TblSurveyQuestion B
ON A.QuestionID = B.QuestionID
INNER JOIN TblSurveyAnswer C
ON A.AnswerID = C.AnswerID
WHERE A.CustomerID = 1
) AS SOURCE
GROUP BY Questions, PossibleAnswer
)C
PIVOT
( Max(Percentage)
FOR PossibleAnswer IN (' + @ColumnName + N')
) AS PVTTable'
EXEC sp_executesql @DynamicPivotQuery
I created a User-Defined Table Type:
CREATE TYPE dbo.ListTableType AS TABLE(
ITEM varchar(500) NULL
)
I leverage it in a function:
CREATE FUNCTION dbo.fn_list_to_string
(
@LIST dbo.ListTableType READONLY
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @RESULT varchar(max)
SET @RESULT = ''
DECLARE @NL AS CHAR(2) = CHAR(13) + CHAR(10)
SELECT @RESULT = @RESULT + ITEM + @NL FROM @LIST
SET @RESULT = SUBSTRING(@RESULT, 1, LEN(@RESULT) - 1)
RETURN @RESULT
END
Finally, I try to use this function in a simple select:
SELECT
P.PROGRAM_ID,
PROGRAM_NAME,
PROGRAM_DESC,
P.STATUS_ID,
STATUS_DESC,
P.CONTACT_SID,
I.FIRST_NAME + ' ' + I.LAST_NAME as CONTACT_NAME,
P.CLARITY_ID,
dbo.fn_list_to_string(
( SELECT CONVERT(varchar,CLARITY_ID) as ITEM
FROM dbo.MUSEUM_PROGRAM_PROJECTS as A
JOIN dbo.MUSEUM_PROJECTS as B on B.PROJECT_ID = A.PROJECT_ID
WHERE PROGRAM_ID = P.PROGRAM_ID )
) as PROJECT_CLARITY_IDS
FROM dbo.MUSEUM_PROGRAMS as P
LEFT JOIN dbo.MUSEUM_PROGRAM_STATUS_TYPES as S on S.STATUS_ID = P.STATUS_ID
LEFT JOIN dbo.v_IDVAULT_ENRICHED_CURRENT_EMPLOYEES as I on I.[SID] = P.CONTACT_SID
But I get this error:
Operand type clash: varchar is incompatible with ListTableType
Any idea why? Also if there's another [more elegant] way to achieve what I'm trying to do I'm open to suggestions as well! Thanks in advance!
Here is a simple demonstration of the FOR XML PATH technique which does all of this with a very simple subquery and no table types or extremely inefficient multi-statement table-valued functions etc.
USE tempdb;
GO
CREATE TABLE dbo.P(Program_ID INT);
CREATE TABLE dbo.M(Clarity_ID INT, Program_ID INT);
INSERT dbo.P VALUES(1),(2),(3),(4);
INSERT dbo.M VALUES(1,1),(1,2),(2,3),(3,2),(1,4),(4,1);
SELECT
P.PROGRAM_ID,
PROJECT_CLARITY_IDS = STUFF((
SELECT CHAR(13)+CHAR(10)+CONVERT(VARCHAR(12),Clarity_ID)
FROM dbo.M WHERE Program_ID = p.Program_ID
FOR XML PATH(''), TYPE).value('.[1]','nvarchar(max)'),1,2,'')
FROM dbo.P AS p;
The output doesn't look right in SQLfiddle or in results to grid in Management Studio, because they strip out carriage returns/line feeds for display purposes, but you can replace CHAR(13)+CHAR(10) with two commas or semi-colons or something to verify that there are two characters there.
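Why the third argument of STUFF is 2: every Clarity_ID is prefixed with the two-character CHAR(13)+CHAR(10) separator, and STUFF deletes the first two characters, i.e. the leading separator. This Python function is a model of that pattern (not of FOR XML itself), run on the dbo.M demo rows; like the T-SQL expression, it yields NULL (None) when a program has no rows. Without an ORDER BY in the subquery, SQL Server does not guarantee the order of the concatenated IDs.

```python
def stuff_concat(items, sep="\r\n"):
    """Model STUFF((SELECT sep + item ... FOR XML PATH(''), TYPE).value(...), 1, len(sep), '').

    Each item is prefixed with the separator, the pieces are concatenated, and
    the leading separator is cut off. With no rows the SQL expression is NULL,
    so this returns None.
    """
    if not items:
        return None
    joined = "".join(sep + str(item) for item in items)
    return joined[len(sep):]


# (Clarity_ID, Program_ID) rows from the dbo.M demo table.
m_rows = [(1, 1), (1, 2), (2, 3), (3, 2), (1, 4), (4, 1)]
for program_id in (1, 2, 3, 4):
    ids = [clarity for clarity, program in m_rows if program == program_id]
    # ";;" stands in for CHAR(13)+CHAR(10) so the two-character separator is visible
    print(program_id, stuff_concat(ids, ";;"))
```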
Using the STUFF..FOR XML PATH construct for concatenation in combination with a CTE will get the results you'd like. Something like this:
WITH CTE_PROJECT_CLARITIES AS
(
SELECT DISTINCT PROGRAM_ID
, STUFF((
SELECT CHAR(13) + CHAR(10) + CONVERT(varchar(11),CLARITY_ID)
FROM dbo.MUSEUM_PROGRAM_PROJECTS as A
JOIN dbo.MUSEUM_PROJECTS as B on B.PROJECT_ID = A.PROJECT_ID
WHERE A.PROGRAM_ID = X.PROGRAM_ID
FOR XML PATH (''), TYPE).value('.[1]', 'varchar(max)'),1,2,'') AS PROJECT_CLARITY_IDS -- TYPE + .value() returns CHAR(13) as-is instead of &#x0D;
FROM MUSEUM_PROGRAM_PROJECTS X
)
SELECT
P.PROGRAM_ID,
PROGRAM_NAME,
PROGRAM_DESC,
P.STATUS_ID,
STATUS_DESC,
P.CONTACT_SID,
I.FIRST_NAME + ' ' + I.LAST_NAME as CONTACT_NAME,
P.CLARITY_ID,
X.PROJECT_CLARITY_IDS
FROM dbo.MUSEUM_PROGRAMS as P
LEFT JOIN dbo.MUSEUM_PROGRAM_STATUS_TYPES as S on S.STATUS_ID = P.STATUS_ID
LEFT JOIN dbo.v_IDVAULT_ENRICHED_CURRENT_EMPLOYEES as I on I.[SID] = P.CONTACT_SID
LEFT JOIN CTE_PROJECT_CLARITIES X ON X.PROGRAM_ID = p.PROGRAM_ID
(I'm not sure I got the columns right, but you'll get the idea.)
I have a table that contains text field with placeholders. Something like this:
Row Notes
1. This is some notes ##placeholder130## this ##myPlaceholder##, #oneMore#. End.
2. Second row...just a ##test#.
(This table contains about 1-5k rows on average. Average number of placeholders in one row is 5-15).
Now, I have a lookup table that looks like this:
Name Value
placeholder130 Dog
myPlaceholder Cat
oneMore Cow
test Horse
(Lookup table will contain anywhere from 10k to 100k records)
I need to find the fastest way to join those placeholders from strings to a lookup table and replace with value. So, my result should look like this (1st row):
This is some notes Dog this Cat, Cow. End.
What I came up with was to split each row into multiple for each placeholder and then join it to lookup table and then concat records back to original row with new values, but it takes around 10-30 seconds on average.
You could try to split the string using a numbers table and rebuild it with for xml path.
select (
select coalesce(L.Value, T.Value)
from Numbers as N
cross apply (select substring(Notes.notes, N.Number, charindex('##', Notes.notes + '##', N.Number) - N.Number)) as T(Value)
left outer join Lookup as L
on L.Name = T.Value
where N.Number <= len(notes) and
substring('##' + notes, Number, 2) = '##'
order by N.Number
for xml path(''), type
).value('text()[1]', 'varchar(max)')
from Notes
I borrowed the string splitting from this blog post by Aaron Bertrand
SQL Server is not very fast with string manipulation, so this is probably best done client-side. Have the client load the entire lookup table, and replace the notes as they arrive.
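As a sketch of the client-side approach, assuming the client has loaded the lookup table into a dictionary and that every placeholder has the form ##Name## with word characters only, a single regular-expression pass per note is enough. Python and the fill function are illustrative choices.

```python
import re

# Hypothetical lookup table, loaded once by the client (Name -> Value).
lookup = {"placeholder130": "Dog", "myPlaceholder": "Cat", "oneMore": "Cow", "test": "Horse"}

# Assumes every placeholder has the form ##Name## with a name made of word characters.
TOKEN = re.compile(r"##(\w+)##")


def fill(note: str) -> str:
    """Replace every ##Name## in one pass; names missing from the lookup are left untouched."""
    return TOKEN.sub(lambda m: lookup.get(m.group(1), m.group(0)), note)


print(fill("This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End."))
```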
Having said that, it can of course be done in SQL. Here's a solution with a recursive CTE. It performs one lookup per recursion step:
; with Repl as
(
select row_number() over (order by l.name) rn
, Name
, Value
from Lookup l
)
, Recurse as
(
select Notes
, 0 as rn
from Notes
union all
select replace(Notes, '##' + l.name + '##', l.value)
, r.rn + 1
from Recurse r
join Repl l
on l.rn = r.rn + 1
)
select *
from Recurse
where rn =
(
select count(*)
from Lookup
)
option (maxrecursion 0)
Another option is a while loop to keep replacing lookups until no more are found:
declare @notes table (notes varchar(max))
insert @notes
select Notes
from Notes
while 1=1
begin
update n
set Notes = replace(n.Notes, '##' + l.name + '##', l.value)
from @notes n
outer apply
(
select top 1 Name
, Value
from Lookup l
where n.Notes like '%##' + l.name + '##%'
) l
where l.name is not null
if @@rowcount = 0
break
end
select *
from @notes
I second the comment that tsql is just not suited for this operation, but if you must do it in the db here is an example using a function to manage the multiple replace statements.
Since you have a relatively small number of tokens in each note (5-15) and a very large number of lookup entries (10k-100k), my function first extracts the potential tokens from the input and uses that set to join to your lookup table (dbo.[Lookup] below). Looking for an occurrence of every one of your lookup entries in each note was far too much work.
I did a bit of perf testing using 50k tokens and 5k notes and this function runs really well, completing in <2 seconds (on my laptop). Please report back how this strategy performs for you.
note: In your example data the token format was not consistent (##_#, ##_##, #_#), I am guessing this was simply a typo and assume all tokens take the form of ##TokenName##.
--setup
if object_id('dbo.[Lookup]') is not null
drop table dbo.[Lookup];
go
if object_id('dbo.fn_ReplaceLookups') is not null
drop function dbo.fn_ReplaceLookups;
go
create table dbo.[Lookup] (LookupName varchar(100) primary key, LookupValue varchar(100));
insert into dbo.[Lookup]
select '##placeholder130##','Dog' union all
select '##myPlaceholder##','Cat' union all
select '##oneMore##','Cow' union all
select '##test##','Horse';
go
create function [dbo].[fn_ReplaceLookups](@input varchar(max))
returns varchar(max)
as
begin
declare @xml xml;
select @xml = cast(('<r><i>'+replace(@input,'##' ,'</i><i>')+'</i></r>') as xml);
--extract the potential tokens
declare @LookupsInString table (LookupName varchar(100) primary key);
insert into @LookupsInString
select distinct '##'+v+'##'
from ( select [v] = r.n.value('(./text())[1]', 'varchar(100)'),
[r] = row_number() over (order by n)
from @xml.nodes('r/i') r(n)
)d(v,r)
where r%2=0;
--tokenize the input
select @input = replace(@input, l.LookupName, l.LookupValue)
from dbo.[Lookup] l
join @LookupsInString lis on
l.LookupName = lis.LookupName;
return @input;
end
go
return
--usage
declare @Notes table ([Id] int primary key, notes varchar(100));
insert into @Notes
select 1, 'This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.' union all
select 2, 'Second row...just a ##test##.';
select [Tokenized] = dbo.fn_ReplaceLookups(notes)
from @Notes;
Returns:
Tokenized
--------------------------------------------------------
This is some notes Dog this Cat, Cow. End.
Second row...just a Horse.
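The core of fn_ReplaceLookups is to extract the few candidate tokens from each note and look up only those, rather than testing every lookup row against every note. A Python sketch of the same strategy, assuming well-formed ##Name## tokens and using the question's sample data; replace_lookups is an illustrative name.

```python
def replace_lookups(text: str, lookup: dict) -> str:
    """Sketch of the fn_ReplaceLookups strategy: look up only the tokens in this note.

    Splitting on '##' puts the token names at odd (0-based) positions, the same
    positions the T-SQL version keeps with r % 2 = 0 on its 1-based row numbers.
    Assumes every ##Name## token is well formed.
    """
    parts = text.split("##")
    candidates = set(parts[1::2])  # distinct potential tokens, usually 5-15 per note
    for name in candidates:
        if name in lookup:  # a hash lookup instead of scanning 10k-100k entries
            text = text.replace("##" + name + "##", lookup[name])
    return text


lookup = {"placeholder130": "Dog", "myPlaceholder": "Cat", "oneMore": "Cow", "test": "Horse"}
notes = [
    "This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.",
    "Second row...just a ##test##.",
]
for note in notes:
    print(replace_lookups(note, lookup))
```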
Try this
;WITH CTE (org, calc, [Notes], [level]) AS
(
SELECT [Notes], [Notes], CONVERT(varchar(MAX),[Notes]), 0 FROM PlaceholderTable
UNION ALL
SELECT CTE.org, CTE.[Notes],
CONVERT(varchar(MAX), REPLACE(CTE.[Notes],'##' + T.[Name] + '##', T.[Value])), CTE.[level] + 1
FROM CTE
INNER JOIN LookupTable T ON CTE.[Notes] LIKE '%##' + T.[Name] + '##%'
)
SELECT DISTINCT org, [Notes], level FROM CTE
WHERE [level] = (SELECT MAX(level) FROM CTE c WHERE CTE.org = c.org)
Check the below devioblog post for reference
devioblog post
To get speed, you can preprocess the note templates into a more efficient form. This will be a sequence of fragments, with each ending in a substitution. The substitution might be NULL for the last fragment.
Notes
Id FragSeq Text SubsId
1 1 'This is some notes ' 1
1 2 ' this ' 2
1 3 ', ' 3
1 4 '. End.' null
2 1 'Second row...just a ' 4
2 2 '.' null
Subs
Id Name Value
1 'placeholder130' 'Dog'
2 'myPlaceholder' 'Cat'
3 'oneMore' 'Cow'
4 'test' 'Horse'
Now we can do the substitutions with a simple join.
SELECT Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq
This produces a list of fragments with substitutions complete. I am not an MSSQL user, but in most dialects of SQL you can concatenate these fragments in a variable quite easily:
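DECLARE @Note VARCHAR(8000)
-- SQL Server does not guarantee that ORDER BY is honoured in this variable-concatenation pattern;
-- on SQL Server 2017+ STRING_AGG(...) WITHIN GROUP (ORDER BY FragSeq) gives a guaranteed order
SELECT @Note = COALESCE(@Note, '') + Notes.Text + COALESCE(Subs.Value, '')
FROM Notes LEFT JOIN Subs
ON SubsId = Subs.Id WHERE Notes.Id = ?
ORDER BY FragSeq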
Pre-processing a note template into fragments will be straightforward using the string splitting techniques of other posts.
Unfortunately I'm not at a location where I can test this, but it ought to work fine.
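The pre-processing step could look like this Python sketch, which turns a note into (FragSeq, Text, SubsName) rows matching the Notes table layout and then performs the join-and-concatenate step. It keys substitutions by name rather than by the numeric SubsId, and assumes placeholders of the form ##Name##; both are simplifications for illustration.

```python
import re

# Assumes placeholders look like ##Name##; names are kept instead of the numeric SubsId.
TOKEN = re.compile(r"##(\w+)##")


def to_fragments(note: str):
    """Split a note into (FragSeq, Text, SubsName) rows; SubsName is None for the tail."""
    fragments = []
    pos = 0
    for seq, m in enumerate(TOKEN.finditer(note), start=1):
        fragments.append((seq, note[pos:m.start()], m.group(1)))
        pos = m.end()
    fragments.append((len(fragments) + 1, note[pos:], None))
    return fragments


def render(fragments, subs: dict) -> str:
    """The join step: each fragment's text plus its substitution, '' when missing (COALESCE)."""
    return "".join(text + (subs.get(name, "") if name else "") for _, text, name in sorted(fragments))


subs = {"placeholder130": "Dog", "myPlaceholder": "Cat", "oneMore": "Cow", "test": "Horse"}
frags = to_fragments("This is some notes ##placeholder130## this ##myPlaceholder##, ##oneMore##. End.")
for f in frags:
    print(f)
print(render(frags, subs))
```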
I really don't know how it will perform with 10k+ lookups.
How does good old dynamic SQL perform?
DECLARE @sqlCommand NVARCHAR(MAX)
SELECT @sqlCommand = N'PlaceholderTable.[Notes]'
-- wrap one REPLACE per lookup row; single quotes in names/values are doubled so the literals stay valid
SELECT @sqlCommand = 'REPLACE( ' + @sqlCommand +
', ''##' + REPLACE(LookupTable.[Name], '''', '''''') + '##'', ''' +
REPLACE(LookupTable.[Value], '''', '''''') + ''')'
FROM LookupTable
SELECT @sqlCommand = 'SELECT *, ' + @sqlCommand + ' FROM PlaceholderTable'
EXECUTE sp_executesql @sqlCommand
And now for a recursive CTE.
If your indexes are correctly set up, this one should be very fast or very slow. SQL Server always surprises me with performance extremes when it comes to the r-CTE...
;WITH T AS (
SELECT
Row,
StartIdx = 1, -- 1 as first starting index
EndIdx = CAST(patindex('%##%', Notes) as int), -- first ending index
Result = substring(Notes, 1, patindex('%##%', Notes) - 1)
-- (first) temp result bounded by indexes
FROM PlaceholderTable -- **this is your source table**
UNION ALL
SELECT
pt.Row,
StartIdx = newstartidx, -- starting index (calculated in calc1)
EndIdx = EndIdx + CAST(newendidx as int) + 1, -- ending index (calculated in calc4 + total offset)
Result = Result + CAST(ISNULL(newtokensub, newtoken) as nvarchar(max))
-- temp result taken from subquery or original
FROM
T
JOIN PlaceholderTable pt -- **this is your source table**
ON pt.Row = T.Row
CROSS APPLY(
SELECT newstartidx = EndIdx + 2 -- new starting index moved by 2 from last end ('##')
) calc1
CROSS APPLY(
SELECT newtxt = substring(pt.Notes, newstartidx, len(pt.Notes))
-- current piece of txt we work on
) calc2
CROSS APPLY(
SELECT patidx = patindex('%##%', newtxt) -- current index of '##'
) calc3
CROSS APPLY(
SELECT newendidx = CASE
WHEN patidx = 0 THEN len(newtxt) + 1
ELSE patidx END -- if last piece of txt, end with its length
) calc4
CROSS APPLY(
SELECT newtoken = substring(pt.Notes, newstartidx, newendidx - 1)
-- get the new token
) calc5
OUTER APPLY(
SELECT newtokensub = Value
FROM LookupTable
WHERE Name = newtoken -- substitute the token if you can find it in **your lookup table**
) calc6
WHERE newstartidx + len(newtxt) - 1 <= len(pt.Notes)
-- keep going while {new starting index} + {length of txt we work on} - 1 does not exceed the total length
)
,lastProcessed AS (
SELECT
Row,
Result,
rn = row_number() over(partition by Row order by StartIdx desc)
FROM T
) -- enumerate all (including intermediate) results
SELECT *
FROM lastProcessed
WHERE rn = 1 -- filter out intermediate results (display only last ones)