Adding a nonclustered index - SQL

The current query I have takes a long time to run. When I run the execution plan, it shows: Table Insert (#tempdata), Cost: 99%. I figured I need to add a nonclustered index to make the query run fast. So I added this line of code, but there is still no difference:
create nonclustered index #temp_index
on [dbo].#tempData ([class_room]) include ([class_name],[class_floor],[class_building])
This is the current query I have:
IF OBJECT_ID('tempdb..#tempdata') IS NOT NULL DROP TABLE #tempdata
SELECT [class_room],[class_name],[class_floor],[class_building]
INTO #tempdata
FROM class_info x
create nonclustered index #temp_index
on [dbo].#tempData ([class_room]) include ([class_name],[class_floor],[class_building])
;with cte1 as(
SELECT [class_room],[class_name],[class_floor],[class_building]
FROM #tempdata
WHERE class_room <> '')
select * from cte1

This is a bit long for a comment.
Can you explain why you are not just running this code?
SELECT [class_room], [class_name], [class_floor], [class_building]
FROM class_info x
WHERE class_room <> '';
If performance is an issue, I would first recommend getting rid of unnecessary reads and writes -- such as creating a temporary table.
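If repeated queries against class_info on class_room really do need index support, the index belongs on the base table rather than the temp table. A minimal sketch, assuming these are the real column names and no such index exists yet:

```sql
-- Hypothetical covering index on the base table, not on #tempdata
CREATE NONCLUSTERED INDEX IX_class_info_class_room
ON dbo.class_info (class_room)
INCLUDE (class_name, class_floor, class_building);
```

With this in place, the direct SELECT above can be satisfied entirely from the index.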

Related

Does Adding Indexes speed up String Wildcard % searches?

We are conducting a wildcard search on a database table with a string column. Does creating a non-clustered index on the column help with wildcard searches? Will this improve performance?
CREATE TABLE [dbo].[Product](
[ProductId] [int] NOT NULL,
[ProductName] [varchar](250) NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
...
CONSTRAINT [PK_ProductId] PRIMARY KEY CLUSTERED
(
[ProductId] ASC
)
)
Proposed Index:
CREATE NONCLUSTERED INDEX [IX_Product_ProductName] ON [dbo].[Product] ([ProductName])
for this query
select * from dbo.Product where ProductName like '%furniture%'
Currently using Microsoft SQL Server 2019.
Creating a normal index will not help (*), but a full-text index will, though you would have to change your query to something like this:
select * from dbo.Product where CONTAINS(ProductName, 'furniture')
(* -- well, it can be slightly helpful, in that it can reduce a scan over every row and column in your table into a scan over merely every row and only the relevant columns. However, it will not achieve the orders of magnitude performance boost that we normally expect from indexes that turn scans into single seeks.)
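For reference, a full-text index needs a catalog plus an index keyed on a unique index of the table. A sketch, assuming the Full-Text Search feature is installed, using the PK_ProductId key from the table definition above:

```sql
-- One-time setup (sketch): full-text catalog and index
CREATE FULLTEXT CATALOG ProductCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.Product (ProductName)
    KEY INDEX PK_ProductId;

-- Query via CONTAINS instead of LIKE
SELECT * FROM dbo.Product WHERE CONTAINS(ProductName, 'furniture');
```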
For a double ended wildcard search as shown, an index cannot help you by restricting the rows SQL Server has to look at - a full table scan will be carried out. But it can help with the amount of data that has to be retrieved from disk.
Because in ProductName like '%furniture%', ProductName could start or end with any string, so no index can reduce the rows that have to be inspected.
However if a row in your Product table is 1,000 characters and you have 10,000 rows, you have to load that much data. But if you have an index on ProductName, and ProductName is only 50 characters, then you only have to load 10,000 * 50 rather than 10,000 * 1000.
Note: If the query was a single ended wildcard search with % at end of 'furniture%', then the proposed index would certainly help.
First, you can use full-text search (FTS) to find words within sentences, including prefix matches (words beginning with a pattern).
For suffix matches (words ending with a pattern) or substring matches (words containing a pattern), you can use a rotating-index technique:
CREATE TABLE T_WRD
(WRD_ID BIGINT IDENTITY PRIMARY KEY,
WRD_WORD VARCHAR(64) COLLATE Latin1_General_100_BIN NOT NULL UNIQUE,
WRD_DROW AS REVERSE(WRD_WORD) PERSISTED NOT NULL UNIQUE,
WRD_WORD2 VARCHAR(64) COLLATE Latin1_General_100_CI_AI NOT NULL) ;
GO
CREATE TABLE T_WORD_ROTATE_STRING_WRS
(WRD_ID BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
WRS_ROTATE SMALLINT NOT NULL,
WRD_ID_PART BIGINT NOT NULL REFERENCES T_WRD (WRD_ID),
PRIMARY KEY (WRD_ID, WRS_ROTATE));
GO
CREATE OR ALTER TRIGGER E_I_WRD
ON T_WRD
FOR INSERT
AS
SET NOCOUNT ON;
-- splitting words
WITH R AS
(
SELECT WRD_ID, TRIM(WRD_WORD) AS WRD_WORD, 0 AS ROTATE
FROM INSERTED
UNION ALL
SELECT WRD_ID, RIGHT(WRD_WORD, LEN(WRD_WORD) -1), ROTATE + 1
FROM R
WHERE LEN(WRD_WORD) > 1
)
SELECT *
INTO #WRD
FROM R;
-- inserting missing words
INSERT INTO T_WRD (WRD_WORD, WRD_WORD2)
SELECT WRD_WORD, LOWER(WRD_WORD) COLLATE SQL_Latin1_General_CP1251_CI_AS
FROM #WRD
WHERE WRD_WORD NOT IN (SELECT WRD_WORD
FROM T_WRD);
-- inserting cross reference words
INSERT INTO T_WORD_ROTATE_STRING_WRS
SELECT M.WRD_ID, ROTATE, D.WRD_ID
FROM #WRD AS M
JOIN T_WRD AS D
ON M.WRD_WORD = D.WRD_WORD
WHERE NOT EXISTS(SELECT 1/0
FROM T_WORD_ROTATE_STRING_WRS AS S
WHERE S.WRD_ID = M.WRD_ID
AND S.WRS_ROTATE = ROTATE);
GO
Now you can insert into the first table all the words you want from your sentences, and find them by suffix or partial match by querying these two tables.
As an example, inserting a word:
WITH
T AS (SELECT 'électricité' AS W)
INSERT INTO T_WRD
SELECT W, LOWER(CAST(W AS VARCHAR(64)) COLLATE SQL_Latin1_General_CP1251_CI_AS) AS W2
FROM T;
You can now run:
SELECT * FROM T_WRD;
SELECT * FROM T_WORD_ROTATE_STRING_WRS;
to see the stored words and their rotated partial forms.
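As a sketch of how a substring lookup against these two tables might work (the fragment 'lect' is just an illustrative search term): any word containing 'lect' has some rotation (suffix) starting with 'lect', so the seek happens on the suffix words and a leading % is never needed:

```sql
-- Hypothetical lookup: words containing 'lect'
SELECT DISTINCT W.WRD_WORD
FROM T_WRD AS P                               -- suffix (rotated) words
JOIN T_WORD_ROTATE_STRING_WRS AS S
     ON S.WRD_ID_PART = P.WRD_ID              -- rotations pointing at them
JOIN T_WRD AS W
     ON W.WRD_ID = S.WRD_ID                   -- the original words
WHERE P.WRD_WORD LIKE 'lect%';                -- sargable: index seek possible
```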
It depends on the optimizer. LIKE usually requires a full table scan. If the optimizer can scan an index for matches, it will do an index scan, which is faster than a full table scan.
If the optimizer does not select an index scan, you can force it to use one. You must measure execution times to determine whether the index scan actually decreases search time.
Use WITH (INDEX(index_name)) to force an index scan, e.g.
select * from t1 with (index(t1i1)) where v1 like '456%'
SQL Server Index - Any improvement for LIKE queries?
If you use a %search% pattern, the optimizer will always perform a full table scan.
Another technique for speeding up searches is to use substrings and exact match searches.
Yes, the part before the first % is matched against the index. Of course however, if your pattern starts with %, then a full scan will be performed instead.

SQL Server Query intermittent performance Issue

Recently we have run into performance issues with a particular query on SQL Server (2016). The problem I'm seeing is that the performance issues are incredibly inconsistent and I'm not sure how to improve this.
The table details:
CREATE TABLE ContactRecord
(
ContactSeq BIGINT NOT NULL
, ApplicationCd VARCHAR(2) NOT NULL
, StartDt DATETIME2 NOT NULL
, EndDt DATETIME2
, EndStateCd VARCHAR(3)
, UserId VARCHAR(10)
, UserTypeCd VARCHAR(2)
, LineId VARCHAR(3)
, CallingLineId VARCHAR(20)
, DialledLineId VARCHAR(20)
, ChannelCd VARCHAR(2)
, SubChannelCd VARCHAR(2)
, ServicingAgentCd VARCHAR(7)
, EucCopyTimestamp VARCHAR(30)
, PRIMARY KEY (ContactSeq)
, FOREIGN KEY (ApplicationCd) REFERENCES ApplicationType(ApplicationCd)
, FOREIGN KEY (EndStateCd) REFERENCES EndStateType(EndStateCd)
, FOREIGN KEY (UserTypeCd) REFERENCES UserType(UserTypeCd)
)
CREATE TABLE TransactionRecord
(
TransactionSeq BIGINT NOT NULL
, ContactSeq BIGINT NOT NULL
, TransactionTypeCd VARCHAR(3) NOT NULL
, TransactionDt DATETIME2 NOT NULL
, PolicyId VARCHAR(10)
, ProductId VARCHAR(7)
, EucCopyTimestamp VARCHAR(30)
, Detail VARCHAR(1000)
, PRIMARY KEY (TransactionSeq)
, FOREIGN KEY (ContactSeq) REFERENCES ContactRecord(ContactSeq)
, FOREIGN KEY (TransactionTypeCd) REFERENCES TransactionType(TransactionTypeCd)
)
Current record counts:
ContactRecord 20million
TransactionRecord 90million
My query is:
select
UserId,
max(StartDt) as LastLoginDate
from
ContactRecord
where
ContactSeq in
(
select
ContactSeq
from
TransactionRecord
where
ContactSeq in
(
select
ContactSeq
from
ContactRecord
where
UserId in
(
'1234567890',
'1234567891' -- Etc.
)
)
and TransactionRecord.TransactionTypeCd not in
(
'122'
)
)
and ApplicationCd not in
(
'1',
'4',
'5'
)
group by
UserId;
Now the query isn't great and could be improved using joins; however, it does fundamentally work.
The problem I'm having is that our data job takes an input of roughly 7100 userIds. These are then broken up into groups of 500. For each 500 these are used in the IN clause in this query. The first 14 executions of this query with 500 items in the IN clause execute fine. Results are returned in roughly 15-20 seconds for each.
The issue is with the last execution of this query, which has the remaining 100 ids, give or take. It never seems to complete; it just hangs. In our data job it times out after 10 minutes. I have no idea why. I'm not an expert with SQL Server, so I'm not really sure how to debug this. I have executed each subquery independently and then replaced the contents of the subquery with the returned data. Doing this for each subquery works fine.
Any help is really appreciated here as I'm at a loss to how this works so consistently with larger amounts of parameters but just doesn't work with only a fraction.
EDIT
I've got three example of execution plans here. Please note that each of these were executed on a test server and all executed almost instantly as there is very little data on this test equivalent.
This is the execution plan for 500 arguments which executes fine in production, returning in roughly 15-20 seconds:
This is the execution plan for 119 arguments which is timing out in our data job after 10 minutes:
This is the execution plan for 5 arguments which executes fine. This query is not explicitly being executed in the data job but just for comparison:
In all instances SSMS has given the following warning:
/*
Missing Index Details from SQLQuery2.sql
The Query Processor estimates that implementing the following index could improve the query cost by 26.3459%.
*/
/*
USE [CloasIvr]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[TransactionRecord] ([TransactionTypeCd])
INCLUDE ([ContactSeq])
GO
*/
Is this the root cause to this problem?
Without seeing the plans for the failing runs, it is hard to know exactly what's going on. Execution plans for 'good' runs help a bit, but we're just guessing at what goes wrong in the bad runs.
My initial guess (similar to my comment) is that the row estimates are very wrong, and SQL Server creates a plan that is very bad.
Your TransactionRecord table in particular, with its Detail column of 1,000 characters, could have big issues with an unexpectedly large number of nested loops.
Indexes
The first thing I would suggest is indexing - particularly to a) include only the subset of the data you need for these queries, and b) have it ordered in a useful manner.
The following two indexes would appear to help:
CREATE INDEX IX_ContactRecord_User ON ContactRecord
(UserId, ContactSeq)
INCLUDE (ApplicationCD, Startdt);
CREATE INDEX IX_TransactionRecord_ContactSeq ON TransactionRecord
(ContactSeq, TransactionTypeCd);
These are both 'covering indexes', as well as being sorted in ways that can help.
Alternatively, you could replace the first one with a slightly modified version (sorting first on ContactSeq) but I think the above version would be more useful.
CREATE INDEX IX_ContactRecord_User2 ON ContactRecord
(ContactSeq)
INCLUDE (ApplicationCD, Startdt, UserId);
Also, regarding the index on TransactionRecord - if this is the only query that would be using that index, you could improve it by creating the following index instead
CREATE INDEX IX_TransactionRecord_ContactSeq_Filtered ON TransactionRecord
(ContactSeq, TransactionTypeCd)
WHERE (TransactionTypeCd <> '122');
The above is a filtered index that matches what's specified in the WHERE clause of your statement. The big thing about this is that it has already a) removed the records where the type <> '122', and b) has sorted the records already on ContactSeq so it's then easy to look them up.
By the way - given you asked about adding indexes on Foreign Keys on principle - the use of these really depends on how you read the data. If you are only ever referring to the referenced table (e.g., you have an FK to a status table, and only ever use it to report, in English, the statuses) then an index on the original table's Status_ID wouldn't help. On the other hand, if you want to find all the rows with Status_ID = 4, then it would help.
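As a sketch of that last point (table and column names hypothetical), the index that helps is simply one on the referencing column of the original table:

```sql
-- Helps "find all the rows with Status_ID = 4"; useless if you only
-- ever join outward to look up the status description
CREATE NONCLUSTERED INDEX IX_Orders_StatusID ON dbo.Orders (Status_ID);
```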
To help with understanding indexes, I strongly recommend Brent Ozar's How to Think Like the SQL Server Engine - it really helped me understand how indexes work in practice.
Use a sorted temp table
This may help but is unlikely to be the primary fix. If you pre-load the relevant UserIDs into a temporary table (with a primary key on UserID) then it may help with the relevant JOIN. It may also be easier for you to modify each run rather than have to modify the middle of the query.
CREATE TABLE #Users (UserId VARCHAR(10) PRIMARY KEY);
INSERT INTO #Users (UserID) VALUES
('1234567890'),
('1234567891');
Then replace the middle section of your query with
where
ContactSeq in
(
select
ContactSeq
from
ContactRecord CR
INNER JOIN #Users U ON CR.UserID = U.UserID
)
and TransactionRecord.TransactionTypeCd not in
(
'122'
)
Simplify the query
I had a go at simplifying the query, and got it to this:
select CR.UserId,
max(CR.StartDt) as LastLoginDate
from ContactRecord CR
INNER JOIN TransactionRecord TR ON CR.ContactSeq = TR.ContactSeq
where TR.TransactionTypeCd not in ('122')
AND CR.ApplicationCd not in ('1', '4', '5')
AND CR.UserId in ('1234567890', '1234567891') -- etc
group by UserId;
or alternatively (with the temp table)
select CR.UserId,
max(CR.StartDt) as LastLoginDate
from ContactRecord CR
INNER JOIN #Users U ON CR.UserID = U.UserID
INNER JOIN TransactionRecord TR ON CR.ContactSeq = TR.ContactSeq
where TR.TransactionTypeCd not in ('122')
AND CR.ApplicationCd not in ('1', '4', '5')
group by UserId;
One advantage of simplifying the query, is that it also helps SQL Server get good estimates; which in turn help it get good execution plans.
Of course, you would need to test that the above returns exactly the same records in your circumstances - I don't have a data set to test on, so I cannot be 100% sure these simplified versions match the original.

Nested Loop in Where Statement killing performance

I am having serious performance issues when using a nested loop in a WHERE clause.
When I run the code below as is, it takes several minutes. The trick is that I'm using the WHERE clause to pull ALL data if the report_id is NULL, but only certain report_ids if I set them in the parameter string.
The function [fn_Parse_List] turns a VARCHAR string such as '123,456,789' into a table where each row is each number in integer form, which is then used in the IN clause.
When I run the code below with report_id = '456' (the commented-out portion), it takes seconds, but passing the temporary table and using the SELECT statement in the WHERE clause kills it.
alter procedure dbo.p_revenue
(@report_id varchar(max) = NULL)
as
select cast(value as int) Report_ID
into #report_ID_Temp
from [fn_Parse_List] (@report_id)
SELECT *
FROM BIGTABLE a
where @report_id is null
or a.report_id in (select Report_ID from #report_ID_Temp)
--Where @report_id is null or a.report_id in (456)
exec p_revenue @report_id = '456'
Is there a way to optimize this? I tried a JOIN with the table #report_ID_Temp, but it still takes just as long and doesn't work when the report_id is NULL.
You're breaking three different rules.
If you want two query plans, you need two queries: OR does not give you two query plans. IF does.
If you have a temporary table, make sure it has a primary key and any appropriate indexes. In your case, you need an ALTER TABLE statement to add the primary key clustered index. Or you can CREATE TABLE to declare the structure in the first place.
If you think fn_Parse_List is a good idea, you haven't read enough Sommarskog
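A sketch of the first two rules applied to this procedure (query bodies abbreviated from the question):

```sql
-- Rule 2: give the temp table a clustered primary key
ALTER TABLE #report_ID_Temp
    ADD CONSTRAINT PK_report_ID_Temp PRIMARY KEY CLUSTERED (Report_ID);

-- Rule 1: two queries via IF, so each gets its own plan
IF @report_id IS NULL
    SELECT * FROM BIGTABLE;
ELSE
    SELECT b.*
    FROM BIGTABLE b
    WHERE b.report_id IN (SELECT Report_ID FROM #report_ID_Temp);
```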
If I were to write the stored procedure for your case, I would use a Table-Valued Parameter (TVP) instead of passing multiple values as a comma-separated string.
Something like the following:
-- Create a type for the TVP
CREATE TYPE REPORT_IDS_PAR AS TABLE(
report_id INT
);
GO
-- Use the TVP type instead of VARCHAR
CREATE PROCEDURE dbo.revenue
@report_ids REPORT_IDS_PAR READONLY
AS
BEGIN
SET NOCOUNT ON;
IF NOT EXISTS(SELECT 1 FROM @report_ids)
SELECT
*
FROM
BIGTABLE;
ELSE
SELECT
*
FROM
@report_ids AS ids
INNER JOIN BIGTABLE AS bt ON
bt.report_id=ids.report_id;
-- OPTION(RECOMPILE) -- see remark below
END
GO
-- Execute the Stored Procedure
DECLARE @ids REPORT_IDS_PAR;
-- Empty table for all rows:
EXEC dbo.revenue @ids;
-- Specific report_id's for specific rows:
INSERT INTO @ids(report_id)VALUES(123),(456),(789);
EXEC dbo.revenue @ids;
GO
If you run this procedure with a TVP with a lot of rows or a wildly varying number of rows, I suggest you add the option OPTION(RECOMPILE) to the query.
I see two possible things that could help improve performance, depending on which part is taking the longest. First off, SELECT ... INTO is a single-threaded operation until SQL Server 2014. If this is taking a long time, create an explicitly defined temp table with CREATE TABLE. Secondly, depending on the number of records inserted into the temp table, you probably need an index on the Report_ID column. That can all be done in the body of the stored procedure. If you do end up using an explicitly defined temp table, I would create the index after the data is loaded.
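A sketch of that approach inside the procedure body (same names as the question):

```sql
-- Explicitly defined temp table instead of SELECT ... INTO
CREATE TABLE #report_ID_Temp (Report_ID INT NOT NULL);

INSERT INTO #report_ID_Temp (Report_ID)
SELECT CAST(value AS INT)
FROM fn_Parse_List(@report_id);

-- Create the index only after the data is loaded
CREATE CLUSTERED INDEX CX_report_ID_Temp ON #report_ID_Temp (Report_ID);
```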
If that doesn't help, first check that the report_id column on the BIGTABLE is indexed. Then try splitting the select into 2 and combining with a UNION ALL like this:
ALTER PROCEDURE dbo.p_revenue
(
@report_id VARCHAR(MAX) = NULL
)
AS
SELECT CAST(value AS INT) Report_ID
INTO #report_ID_Temp
FROM fn_Parse_List(@report_id);
SELECT *
FROM BIGTABLE a
WHERE @report_id IS NULL
UNION ALL
SELECT *
FROM BIGTABLE a
WHERE a.report_id IN ( SELECT Report_ID
FROM #report_ID_Temp );
GO
EXEC p_revenue @report_id = '456';
Are you saying I should have two queries, one that pulls everything when no report_id is given, and one for when there is a list of report_ids?
Yes, yes, yes. The fact that it somehow works when you enter the numbers directly distracts you from the core problem. You need a table scan when @report_id is null and an index seek when it is not, and you cannot have both in one execution plan. The performance would inevitably suffer, one way or another.
I would prefer not to, as the table I'm pulling from is actually a
view with 800 lines and an additional parameter not shown above.
I do not see where the problem is; SELECT * FROM BIGTABLE and SELECT * FROM BIGVIEW seem the same. If you need parameters, you can use an inline table-valued function. If you have more parameters with variable selectivity like @report_id, I guess you would end up with dynamic SQL anyway, sooner or later.
UNION ALL as proposed by @db_brad would help, but one of those subqueries is executed even when there is no need for it.
As a quick patch you can append OPTION(RECOMPILE) to the SELECT and get a table scan one time and an index seek the other time, but recompiling every time induces nontrivial overhead.
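The quick patch would look something like this (sketch; BIGTABLE aliased as in the question):

```sql
SELECT *
FROM BIGTABLE a
WHERE @report_id IS NULL
   OR a.report_id IN (SELECT Report_ID FROM #report_ID_Temp)
OPTION (RECOMPILE);  -- fresh plan per call: scan when NULL, seek otherwise
```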

Execution Plan shows a sort - but can't figure a way around because of query

Here is my query -- the last query is what is causing me pain:
The address.postcode field is a varchar(14) and you can see the input format the user sends in.
DECLARE @ZipCode NVARCHAR(MAX) = ('06409;06471;11763;06443;06371;11949;11946;11742')
IF OBJECT_ID('tempdb..#ZipCodes') IS NOT NULL DROP TABLE #ZipCodes;
CREATE TABLE #ZipCodes (
Zipcode NVARCHAR(6)
)
INSERT INTO #ZipCodes ( Zipcode )
SELECT zip.Token + '%'
FROM DMS.fn_SplitList(@ZipCode, ';') zip
CREATE NONCLUSTERED INDEX [idx_Zip] ON #ZipCodes (Zipcode)
IF OBJECT_ID('tempdb..#ZipCodesConstituents') IS NOT NULL DROP TABLE #ZipCodesConstituents;
CREATE TABLE #ZipCodesConstituents (
ConstituentID UNIQUEIDENTIFIER
, PostCode NVARCHAR(12)
)
CREATE NONCLUSTERED INDEX [idx_ZipCodesConstituents] ON #ZipCodesConstituents (ConstituentID, PostCode)
INSERT INTO #ZipCodesConstituents ( ConstituentID, PostCode )
SELECT a.CONSTITUENTID
, a.POSTCODE
FROM #ZipCodes zip
JOIN DMS.address a
ON a.POSTCODE LIKE zip.Zipcode
where a.ISPRIMARY = 1
I am trying to attach the execution plan -- but not having any luck...
Basically the section of code has an Est Cost of 61.9%
and the Sort is 61.5%
I tried to reproduce the behavior, but in my tests I couldn't force a sort operator in the last INSERT. However, I see the following two issues.
You create the index and only afterwards insert into the table. This may be harmful. Not in all cases, but it may force the sort in your query, as SQL Server tries to ensure the right order during the insert to maintain the index.
You're using UNIQUEIDENTIFIER as your primary key. This may be useful in a way, but I think in your case a simple IDENTITY(1,1) column would be enough, wouldn't it? The UNIQUEIDENTIFIER in your index will heavily increase fragmentation. It might not be the best way to solve this.
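A sketch of the suggested variant, combining both points (IDENTITY key, and the nonclustered index created only after the INSERT):

```sql
CREATE TABLE #ZipCodesConstituents (
      RowId INT IDENTITY(1,1) PRIMARY KEY  -- instead of a UNIQUEIDENTIFIER key
    , ConstituentID UNIQUEIDENTIFIER
    , PostCode NVARCHAR(12)
);

INSERT INTO #ZipCodesConstituents (ConstituentID, PostCode)
SELECT a.CONSTITUENTID, a.POSTCODE
FROM #ZipCodes zip
JOIN DMS.address a ON a.POSTCODE LIKE zip.Zipcode
WHERE a.ISPRIMARY = 1;

-- Index created after the load, so no sort is forced during the INSERT
CREATE NONCLUSTERED INDEX idx_ZipCodesConstituents
ON #ZipCodesConstituents (ConstituentID, PostCode);
```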
I tested both variants with a test set of 100,000 rows.
These are my results, measured in execution costs:
In all cases, the costs are significantly lower if the index is created after the INSERT into #ZipCodesConstituents.
Using an IDENTITY instead of a UNIQUEIDENTIFIER additionally boosts performance.
It would be wise to add an index on your Address table if you're running this sort of query more often.
Here are the measurements in cost-points (cp) - the lower the better:
UniqueIdentifier + Index before: 20 cp for the Insert.
UniqueIdentifier + Index after Insert: 8 cp for the Insert + 7 for the Index (where the SORT occurs).
Identity + Index before: 18 cp for the Insert
Identity + Index after: 7 cp for the Insert + 6 for the Index
Identity + Index after + Index on Address: 3 for the Insert + 6 for the Index
And the winner is: Identity Column + Index and maybe an Index on your Address too.
The index I used to boost both of the statements (UniqueIdentifier and Identity) is this one:
CREATE NONCLUSTERED INDEX [NCI_Address_IsPrimary_Postcode]
ON Address ([IsPrimary],[PostCode])
INCLUDE ([Constituentid])
In my test case it took 13 cp to build it. If you use this just once, it won't be helpful! If you run the statement often, or even several times a day/week, it may be useful for you.
Hopefully this will solve your problems.

Using User Defined Functions and performance?

I'm using a stored procedure to fetch data, and I need to filter it dynamically. For example, if I don't want to fetch rows whose id is 5, 10 or 12, I send the ids as a string to the procedure and convert them to a table via a user-defined function. But I must consider performance, so here is an example:
Solution 1:
SELECT *
FROM Customers
WHERE CustomerID NOT IN (SELECT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',','));
Solution 2:
CREATE TABLE #tempTable (Value NVARCHAR(4000));
INSERT INTO #tempTable
SELECT Value FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
SELECT *
FROM BusinessAds
WHERE AdID NOT IN (SELECT Value FROM #tempTable)
DROP TABLE #tempTable
Which solution is better for performance?
You would probably be better off creating the #temp table with a clustered index and an appropriate data type:
CREATE TABLE #tempTable (Value int primary key);
INSERT INTO #tempTable
SELECT DISTINCT Value
FROM dbo.func_ConvertListToTable('4,6,5,1,2,3,9,222',',')
You can also put a clustered index on the table returned by the TVF.
As for which is better: SQL Server will always assume that the TVF returns 1 row, whereas the query against the #temp table is compiled after the table is populated, so an accurate row count is available. You would need to consider whether the 1-row assumption might cause suboptimal query plans when the list is large.
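Since the body of func_ConvertListToTable isn't shown, here is a hypothetical version illustrating the clustered-index point: a multi-statement TVF can declare a primary key on its return table (STRING_SPLIT assumed available, i.e. SQL Server 2016+):

```sql
CREATE FUNCTION dbo.func_ConvertListToTable
    (@list NVARCHAR(4000), @sep NCHAR(1))
RETURNS @t TABLE (Value INT NOT NULL PRIMARY KEY)  -- clustered by default
AS
BEGIN
    INSERT INTO @t (Value)
    SELECT DISTINCT CAST(value AS INT)   -- DISTINCT: PK forbids duplicates
    FROM STRING_SPLIT(@list, @sep);
    RETURN;
END
```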