Is there any way of improving the performance of this SQL function?

I have a table which looks something like
EventID  Date      Instructor
1        1/1/2000  Person 1
1        1/1/2000  Person 2
Now what I want to do is return this data so that each event is on one row, with the instructors all in one column separated by a <br> tag, like 'Person 1 <br> Person 2'.
Currently the way I have done this is to use a function
CREATE FUNCTION fnReturnInstructorNamesAsHTML
(
@EventID INT
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Result VARCHAR(MAX)
SELECT
@Result = COALESCE(@Result + '<br>', '') + inst.InstructorName
FROM
[OpsInstructorEventsView] inst
WHERE
inst.EventID = @EventID
RETURN @Result
END
Then my main stored procedure calls it like
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
dbo.fnReturnInstructorNamesAsHTML(ev.EventID) as Names
FROM
[OpsSimEventsView] ev
JOIN
[OpsInstructorEventsView] inst
ON
ev.EventID = inst.EventID
This is very slow; I'm looking at 4 seconds per call to the DB. Is there a way for me to improve the performance of the function? It's a fairly small function, so I'm not sure what I can do here, and I couldn't see a way to work the COALESCE into the SELECT of the main procedure.
Any help would be really appreciated, thanks.

You could try something like this.
SELECT
ev.[BGcolour],
ev.[Event] AS name,
ev.[eventid] AS ID,
ev.[eventstart],
ev.[CourseType],
ev.[Type],
ev.[OtherType],
ev.[OtherTypeDesc],
ev.[eventend],
ev.[CourseNo],
ev.[Confirmed],
ev.[Cancelled],
ev.[DeviceID] AS resource_id,
ev.Crew,
ev.CompanyName ,
ev.Notes,
STUFF((SELECT '<br>'+inst.InstructorName
FROM [OpsInstructorEventsView] inst
WHERE ev.EventID = inst.EventID
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 4, '') as Names
FROM
[OpsSimEventsView] ev
I'm not sure why you have joined OpsInstructorEventsView in the main query; I removed it here, but if you need it you can just add it back. The STUFF(..., 1, 4, '') removes the leading '<br>' (four characters) that the FOR XML concatenation leaves at the front.

A few things to look at:
1) The overhead of scalar functions makes them expensive to call, especially in the select list of a query that could potentially return thousands of rows; the function has to be executed for every one of them. Consider merging the behavior of the function into your main stored procedure, where SQL Server can make better use of its optimizer.
2) Since you are joining on event ID in both tables, make sure you have an index on those two columns. I would expect that you do, given that they both appear to be primary key columns, but make sure; an index can make a huge difference (see the sketch after this list).
3) Convert your COALESCE call into its equivalent CASE expression to remove the overhead of calling that function.
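For point 2, a minimal sketch of the kind of index that can help, assuming the view reads from a base table; the table name dbo.InstructorEvents here is hypothetical:
CREATE NONCLUSTERED INDEX IX_InstructorEvents_EventID
ON dbo.InstructorEvents (EventID) -- hypothetical base table behind OpsInstructorEventsView
INCLUDE (InstructorName); -- covers the lookup so the concatenation needs no extra key lookups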

Yes, make it an INLINE table-valued SQL function. Since you need one concatenated string per event, the function can do the FOR XML concatenation itself and still be inlined:
CREATE FUNCTION fnReturnInstructorNamesAsHTML
( @EventID INT )
RETURNS TABLE
AS
RETURN
(
SELECT STUFF((SELECT '<br>' + InstructorName
FROM OpsInstructorEventsView
WHERE EventID = @EventID
FOR XML PATH('')), 1, 4, '') AS result
)
GO
Then, in your SQL statement, use it like this:
SELECT [Other stuff],
(Select result from dbo.fnReturnInstructorNamesAsHTML(ev.EventID)) as Names
FROM OpsSimEventsView ev
JOIN OpsInstructorEventsView inst
ON ev.EventID = inst.EventID
I'm not exactly clear how the query you show in your question is concatenating data from multiple rows into one row of the result, but the problem is that an ordinary scalar UDF has to be executed separately for every row in your output result, and its cost is hidden from the optimizer. This is NOT true for an "inline table-valued" UDF, as its SQL is folded into the outer SQL before it is passed to the SQL optimizer (the subsystem that generates the statement's cached plan), so the UDF is planned once as part of the whole query.
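If you would rather avoid the scalar subquery, the same inline function can be called with CROSS APPLY; a sketch using the object names above ([Other stuff] again stands in for your real column list):
SELECT ev.[Other stuff],
n.result AS Names
FROM OpsSimEventsView ev
CROSS APPLY dbo.fnReturnInstructorNamesAsHTML(ev.EventID) AS n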

Related

How to use Join with like operator and then casting columns

I have 2 tables with these columns:
CREATE TABLE #temp
(
Phone_number varchar(100) -- example data: "2022033456"
)
CREATE TABLE orders
(
Addons ntext -- example data: "Enter phone:2022033456<br>Thephoneisvalid"
)
I have to join these two tables using LIKE, as the phone numbers are not in the same format. A little background: I am joining the #temp table on its phone number to the orders table on its Addons value, then in the WHERE condition I am trying to match them again to get some results. Here is my code, but the results I am getting are not accurate, as it's not returning any data. I don't know what I am doing wrong. I am using SQL Server.
select
*
from
order_no as n
join
orders as o on n.order_no = o.order_no
join
#temp as t on t.phone_number like '%'+ cast(o.Addons as varchar(max))+'%'
where
t.phone_number = '%' + cast(o.Addons as varchar(max)) + '%'
A LIKE comparison written this way will not work in the JOIN condition. Please provide more information on your tables. You have to convert the format of one of the phone fields to comply with the other phone field's format in order to join.
I think your join condition is in the wrong order. Because your question explicitly mentions two tables, let's stick with those:
select *
from orders o JOIN
#temp t
on cast(o.Addons as varchar(max)) like '%' + t.phone_number + '%';
It has been so long since I dealt with the text data type (in SQL Server), that I don't remember if the cast() is necessary or not.
Instead of trying to do everything in a single top-level query, you should apply a transformation projection to your orders table and use that as a subquery, which will make the query easier to understand.
Using the CHARINDEX function will make this a lot easier; however, it does not support ntext, so you will need to change your schema to use nvarchar(max) instead (which you should be doing anyway, as ntext is deprecated). Until then you can use CONVERT( nvarchar(max), someNTextValue ), though this will reduce performance as you won't be able to use any indexes on your ntext values. This query will run slowly anyway, though.
SELECT
orders2.*,
CASE WHEN orders2.PhoneStart > 0 AND orders2.PhoneEnd > orders2.PhoneStart THEN
SUBSTRING( CONVERT( nvarchar(max), orders2.Addons ), orders2.PhoneStart + 12, orders2.PhoneEnd - orders2.PhoneStart - 12 ) -- the + 12 skips over the 12-character 'Enter phone:' label itself
ELSE
NULL
END AS ExtractedPhoneNumber
FROM
(
SELECT
orders.*, -- never use `*` in production, so replace this with the actual columns in your orders table
CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) AS PhoneStart,
CHARINDEX('<br>Thephoneisvalid', CONVERT( nvarchar(max), Addons ), CHARINDEX('Enter phone:', CONVERT( nvarchar(max), Addons )) + 1) AS PhoneEnd
FROM
orders
) AS orders2
I suggest converting the above into a VIEW or CTE so you can directly query it in your JOIN expression:
CREATE VIEW ordersWithPhoneNumbers AS
-- copy and paste the above query here, then execute the batch to create the view, you only need to do this once.
Then you can use it like so:
SELECT
* -- again, avoid the use of the star selector in production use
FROM
ordersWithPhoneNumbers AS o2 -- this is the above query as a VIEW
INNER JOIN order_no ON o2.order_no = order_no.order_no
INNER JOIN #temp AS t ON o2.ExtractedPhoneNumber = t.phone_number
Actually, I take back my previous remark about performance: if you materialize the ExtractedPhoneNumber (for example as a persisted computed column on orders, since a view built on a derived table can't be indexed directly) and index it, then you'll get good performance.
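A sketch of that materialization, reusing the CHARINDEX/SUBSTRING expressions above; it assumes Addons has already been migrated to nvarchar(max), and the column and index names are made up:
ALTER TABLE orders ADD ExtractedPhoneNumber AS
CAST(CASE WHEN CHARINDEX('Enter phone:', Addons) > 0
AND CHARINDEX('<br>Thephoneisvalid', Addons) > CHARINDEX('Enter phone:', Addons)
THEN SUBSTRING( Addons,
CHARINDEX('Enter phone:', Addons) + 12, -- skip the 12-character 'Enter phone:' label
CHARINDEX('<br>Thephoneisvalid', Addons) - CHARINDEX('Enter phone:', Addons) - 12 )
END AS varchar(20)) PERSISTED;

CREATE INDEX IX_orders_ExtractedPhoneNumber ON orders (ExtractedPhoneNumber);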

With as in Oracle SQL

I would like to know whether it is possible to use the WITH ... AS clause with a variable and/or in a BEGIN/END block.
My code is
WITH EDGE_TMP
AS
(select edge.node_beg_id,edge.node_end_id,prg_massif.longueur,prg_massif.lgvideoupartage,prg_massif.lgsanscable from prg_massif
INNER JOIN edge on prg_massif.asset_id=edge.asset_id
where prg_massif.lgvideoupartage LIKE '1' OR prg_massif.lgsanscable LIKE '1')
,
journey (TO_TOWN, STEPS,DISTANCE,WAY)
AS
(SELECT DISTINCT node_beg_id, 0, 0, CAST(&&node_begin AS VARCHAR2(2000))
FROM EDGE_TMP
WHERE node_beg_id = &&node_begin
UNION ALL
SELECT node_end_id, journey.STEPS + 1
, journey.DISTANCE + EDGE_TMP.longueur,
CONCAT(CONCAT(journey.WAY,';'), EDGE_TMP.node_end_id
)
It creates a string output separated by ';', but I need to get it back as a variable or a table. Do you know how? I used CONCAT to collect the data into one big string. Can I insert the data into a table instead? I need to use the result for further processing.
Thank you,
mat
No, WITH is part of an SQL statement only. But if you describe why you need it in PL/SQL, we can advise you something.
Edit: if you have an SQL statement which produces the result you need, you can assign its value to a PL/SQL variable. There are several methods to do this; the simplest is to use a SELECT INTO statement (add an INTO variable clause to your select).
You can use a WITH clause as part of a SELECT INTO statement (at least in not-too-old Oracle versions).
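A minimal sketch of that combination in PL/SQL (the CTE body is a stand-in for the real query, and l_way is a hypothetical variable name):
DECLARE
l_way VARCHAR2(2000);
BEGIN
WITH journey AS (
SELECT 'A;B;C' AS way FROM dual -- stand-in for the real recursive query
)
SELECT way INTO l_way FROM journey; -- the WITH clause is part of the SELECT INTO
DBMS_OUTPUT.PUT_LINE(l_way);
END;
/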

large group of variable IDs

I have a working query that contains a large number of variable IDs. Rather than copying and pasting in each ID whenever I need to run a new query, I was wondering if there was a way to create a stored procedure out of the query below and pass in a group of IDs?
Here is the query. The IDs change all the time, so I'm trying to figure out an easier way of doing this, but I'm not having much luck.
I thought about using a cursor in a stored procedure and just passing each ID, but that seems cumbersome and inefficient.
SELECT gm.geoId, T.number As surveyID, 0 as SpeciesCount
FROM (
VALUES (1994328036),(1994328037),(1994328038),(1994328039),(1994328040),(1994328041),(1994328042),(1994328043),
(1994328044),(1994328045),(1994328046),(1994328047),(1994328048),(1994328049),(1994328050),(1994328051),
(1994328052),(1994328053),(1994328054),(1994328055)
) AS T(number)
CROSS JOIN dbo.groupBiology gm
You can create a table-valued function (TVF) like this:
CREATE FUNCTION tvf_GetIDs ()
RETURNS
@output TABLE ( data int )
AS
BEGIN
INSERT INTO @output (data) VALUES
(1994328036),(1994328037),(1994328038),(1994328039),
(1994328040),(1994328041),(1994328042),(1994328043),
(1994328044),(1994328045),(1994328046),(1994328047),
(1994328048),(1994328049),(1994328050),(1994328051),
(1994328052),(1994328053),(1994328054),(1994328055)
RETURN
END
GO
then use this function wherever the IDs are required, e.g.
SELECT *
FROM Customers AS c
INNER JOIN (SELECT * FROM tvf_GetIDs()) t ON c.CustID = t.data
You only need to update the TVF whenever the IDs change.
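If the IDs need to change per call rather than living inside the function, a table-valued parameter is another option; a sketch, where the type and procedure names are made up:
CREATE TYPE IdList AS TABLE ( id int PRIMARY KEY );
GO
CREATE PROCEDURE GetSpeciesCounts
@ids IdList READONLY
AS
BEGIN
SELECT gm.geoId, t.id AS surveyID, 0 AS SpeciesCount
FROM @ids AS t
CROSS JOIN dbo.groupBiology AS gm;
END
GO
-- usage: load the IDs for this run, then pass the whole set in one call
DECLARE @ids IdList;
INSERT INTO @ids (id) VALUES (1994328036), (1994328037);
EXEC GetSpeciesCounts @ids = @ids;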

Combine Unique Column Values Into One to Avoid Duplicates

For simplicity, assume I have two tables joined by account#. The second table has two columns, id and comment. Each account could have one or more comments and each unique comment has a unique id.
I need to write a T-SQL query to generate one row for each account, which I assume means I need to combine as many comments as might exist for each account. This assumes the result set will only show the account# once. Simple?
SQL Server is an RDBMS tuned for storing and retrieving data; you can retrieve the desired data with one very simple query, but the desired format should be handled with one of the reporting tools available, like SSRS or Crystal Reports.
Your query will be a simple inner join, something like this:
SELECT A.Account , B.Comment
FROM TableA AS A INNER JOIN TableB AS B
ON A.Account = B.Account
Now you can use your reporting tool to group all the comments by account when displaying the data.
I do agree with M. Ali, but if you don't have that option, the following will work.
SELECT [accountID]
, [name]
, (SELECT CAST(Comment + ', ' AS VARCHAR(MAX))
FROM [comments]
WHERE (accountID = accounts.accountID)
FOR XML PATH ('')
) AS Comments
FROM accounts
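Note that the subquery above leaves a trailing ', ' after the last comment. A variant that puts the separator first and strips it with STUFF (same table and column names as above):
SELECT [accountID]
, [name]
, STUFF((SELECT ', ' + CAST(Comment AS VARCHAR(MAX))
FROM [comments]
WHERE (accountID = accounts.accountID)
FOR XML PATH ('')
), 1, 2, '') AS Comments
FROM accounts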
In my actual project I have this exact situation.
What you need is a solution to aggregate the comments in order to show only one line per account#.
I solved it by creating a function to concatenate the comments, like this:
create function dbo.aggregateComments( @accountId integer, @separator varchar( 5 ) )
returns varchar( max )
as
begin
declare @comments varchar( max ); set @comments = '';
select @comments = @comments
+ case when @comments = '' then '' else @separator end -- avoids a leading separator before the first comment
+ YourCommentsTableName.CommentColumn
from dbo.YourCommentsTableName
where YourCommentsTableName.AccountId = @accountId;
return @comments;
end;
You can use it in your query this way:
select account#, dbo.aggregateComments( account#, ',' )
from dbo.YourAccountTableName
Creating a function will give you a common place to retrieve your comments. It's a good programming practice.

SQL Server - query optimization with many columns

we have "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
users search other peopel on website. query is like :
WITH TempResult as (
select ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum, profile.id from Profile
where
(@a is null or a = @a) and
(@b is null or b = @b) and
...(over 60 columns)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
(RowNum >= @FirstRow)
AND
(RowNum <= @LastRow)
SQL Server by default uses the clustered index to execute the query, but total execution time is over 300. We tested another solution, a multi-column index over all the columns in the WHERE clause, but total execution time was then over 400.
Do you have any solution to bring total execution time under 100?
We are using SQL Server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple of alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided. Assuming the average search actually only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search, giving up the per-column filters.
Lucene (or something else) - search outside of SQL; this is a fairly significant change, though.
One other option that I just remembered implementing in a system once: create a vertical table that includes all of the data you are searching on and build up a query for it. This is easiest to do with dynamic SQL, but could be done using table-valued parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name): unique to make the search work properly, and the index will make it perform well.
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
OR (AttributeName = 'state' AND AttributeValue = 'MI') -- OR, not AND: a single row matches one attribute; the HAVING below enforces that both matched
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long-winded) code might prove faster, assuming each individual column has its own separate index...
WITH
filter AS (
SELECT
[a].*
FROM
(SELECT * FROM Profile WHERE @a IS NULL OR a = @a) AS [a]
INNER JOIN
(SELECT id FROM Profile WHERE b = @b UNION ALL SELECT NULL WHERE @b IS NULL) AS [b]
ON ([a].id = [b].id) OR ([b].id IS NULL)
INNER JOIN
(SELECT id FROM Profile WHERE c = @c UNION ALL SELECT NULL WHERE @c IS NULL) AS [c]
ON ([a].id = [c].id) OR ([c].id IS NULL)
.
.
.
INNER JOIN
(SELECT id FROM Profile WHERE zz = @zz UNION ALL SELECT NULL WHERE @zz IS NULL) AS [zz]
ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
SELECT
ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum,
[filter].*
FROM
[filter]
)
SELECT
*
FROM
TempResult
WHERE
(RowNum >= @FirstRow)
AND (RowNum <= @LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
You've several issues, imho. One is that you're going to end up with a seq scan no matter what you do.
But I think your more crucial issue here is that you have an unnecessary join. If TempResult selects profile.* rather than just the id, you don't need to join back to Profile at all:
SELECT * FROM TempResult
WHERE
(RowNum >= @FirstRow)
AND
(RowNum <= @LastRow)
This is a classic "SQL filter" query problem. I've found that the typical approaches of "(@b is null or b = @b)" and its common derivatives all yield mediocre performance. The OR clause tends to be the cause.
Over the years I've done a lot of perf tuning and query optimisation. The approach I've found best is to generate dynamic SQL inside a stored proc; most times you also need to add WITH RECOMPILE on the statement. The stored proc helps reduce the potential for SQL injection attacks, and the recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
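A minimal sketch of that pattern, using just two of the sixty columns for brevity; the procedure and parameter names are made up, and sp_executesql keeps the parameter values out of the SQL string:
CREATE PROCEDURE SearchProfiles
@a INT = NULL,
@b VARCHAR(50) = NULL
WITH RECOMPILE
AS
BEGIN
DECLARE @sql NVARCHAR(MAX) = N'SELECT id FROM Profile WHERE 1 = 1';

-- append predicates only for the parameters actually supplied,
-- so the optimizer sees a simple, index-friendly WHERE clause
IF @a IS NOT NULL SET @sql = @sql + N' AND a = @a';
IF @b IS NOT NULL SET @sql = @sql + N' AND b = @b';

EXEC sp_executesql @sql, N'@a INT, @b VARCHAR(50)', @a = @a, @b = @b;
END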
I agree you should also look at points mentioned above, like:
If you commonly refer to only a small subset of the columns, you could create non-clustered "covering" indexes.
Highly selective columns (i.e. those with many unique values) will work best as the lead column in the index.
If many columns have a very small number of values, consider using the BIT datatype, or create your own bitmasked BIGINT to represent many columns, i.e. a form of "enumerated datatype". But be careful, as any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each and can combine them to use an equality or range query.
While it is often good to find row IDs with a small query and then join to get all the other columns you want to retrieve (as you are doing above), this approach can sometimes backfire: if the first part of the query does a clustered index scan, then it is often faster to get the other columns you need in the select list and save the second table scan.
So it is always good to try it both ways and see what works best.
Remember to run SET STATISTICS IO ON and SET STATISTICS TIME ON before running your tests. Then you can see where the IO is, and it may help you with index selection for the most frequent combinations of parameters.
I hope this makes sense without long code samples. (it is on my other machine)