How to optimize a SELECT TOP N query - SQL

I have a very large table with 40 million rows in a SQL Server 2008 database.
CREATE TABLE [dbo].[myTable](
[ID] [bigint] NOT NULL,
[CONTRACT_NUMBER] [varchar](50) NULL,
[CUSTOMER_NAME] [varchar](200) NULL,
[INVOICE_NUMBER] [varchar](50) NULL,
[AGENCY] [varchar](50) NULL,
[AMOUNT] [varchar](50) NULL,
[INVOICE_MONTH] [int] NULL,
[INVOICE_YEAR] [int] NULL,
[Unique_ID] [bigint] NULL,
[bar_code] [varchar](50) NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[ID] ASC,
[bar_code] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I am trying to optimize performance for the following query:
SELECT top 35 ID,
CONTRACT_NR,
CUSTOMER_NAME,
INVOICE_NUMBER,
AMOUNT,
AGENCY,
CONTRACT_NUMBER,
ISNULL([INVOICE_MONTH], 1) as [INVOICE_MONTH],
ISNULL([INVOICE_YEAR], 1) as [INVOICE_YEAR],
bar_code,
Unique_ID
from MyTable
WHERE
CONTRACT_NUMBER like @CONTRACT_NUMBER and
INVOICE_NUMBER like @INVOICE_NUMBER and
CUSTOMER_NAME like @CUSTOMER_NAME
ORDER BY Unique_ID desc
To speed this up, I built an index on the columns CONTRACT_NUMBER, CUSTOMER_NAME and INVOICE_NUMBER, with the remaining columns included.
CREATE NONCLUSTERED INDEX [ix_search_columns_without_uniqueid] ON [dbo].[MyTable]
(
[CONTRACT_NUMBER] ASC,
[CUSTOMER_NAME] ASC,
[INVOICE_NUMBER] ASC
)
INCLUDE ( [ID],
[AGENCY],
[AMOUNT],
[INVOICE_MONTH],
[INVOICE_YEAR],
[Unique_ID],
[Contract_nr],
[bar_code]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Still, the query takes from 3 to 10 seconds to execute. The execution plan shows an index seek consuming about 30% of the total cost, followed by a Sort (Top N) operation consuming the other 70%. Any idea how I can optimize this query? A response time of less than 1 second is preferred.
Note: I also tried adding the column [Unique_ID] to the index key columns. In that case the execution plan performs an index scan instead, but with many users querying the database I am having the same problem.

Check the page linked at the end of this answer for more detail.
Update the statistics with a full scan to make the optimizer's job easier:
UPDATE STATISTICS tablename WITH fullscan
GO
Then set statistics time on and execute the following queries:
SET STATISTICS time ON
GO
SELECT num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)
GO
SELECT TOP 100 c1, c2,c3
FROM yourtablename
WHERE c1<30000
ORDER BY c2
GO
SELECT num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)
GO
Result
CPU time = 124 ms, elapsed time = 91 ms
Before Query execution
num_of_reads num_of_bytes_read num_of_writes num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864 46824931328 793589 51814416384
After Query execution
num_of_reads num_of_bytes_read num_of_writes num_of_bytes_written
-------------------- -------------------- -------------------- --------------------
725864 46824931328 793589 51814416384
Source: https://www.mssqltips.com/sqlservertip/2053/trick-to-optimize-top-clause-in-sql-server/

Try replacing your clustered index (currently on two columns) with one solely on Unique_ID, assuming it really is unique; a sketch follows below. This will aid your sorting. Then add a second covering index - as you have tried - on the three columns used in the WHERE. Check that your statistics are up to date. I have a feeling that the column bar_code in your PK is preventing your sort from running as quickly as it could.
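A minimal sketch of that restructuring, assuming Unique_ID is truly unique and can be made NOT NULL (the script does not verify this, and the index name is hypothetical):
-- Not a drop-in script: verify Unique_ID is unique and NOT NULL first.
ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;
GO
-- Cluster on the sort column so ORDER BY Unique_ID DESC needs no Sort operator:
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable_UniqueID ON dbo.MyTable (Unique_ID DESC);
GO
-- Re-add the old key, now enforced as a nonclustered primary key:
ALTER TABLE dbo.MyTable ADD CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (ID, bar_code);
GO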
Do your variables contain wildcards? If they do, and they are leading wildcards, the index on the WHERE columns cannot be used. If they are not wildcarded, try a direct "=", assuming case-sensitivity is not an issue.
UPDATE: since you have leading wildcards, you will not be able to take advantage of an index on CONTRACT_NUMBER, INVOICE_NUMBER or CUSTOMER_NAME. As GriGrim suggested, the only alternative here is to use full-text searches (the CONTAINS keyword etc.).
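A minimal full-text sketch, assuming the Full-Text Search feature is installed. Two caveats: CONTAINS matches on word or prefix boundaries, so it is not a drop-in replacement for LIKE '%...%', and full-text requires a single-column, non-nullable unique key index, which the composite primary key above does not provide, so a hypothetical one is created first:
-- Hypothetical setup; assumes ID alone is unique:
CREATE UNIQUE NONCLUSTERED INDEX UX_MyTable_ID ON dbo.MyTable (ID);
CREATE FULLTEXT CATALOG ftMyTableCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.MyTable (CONTRACT_NUMBER, CUSTOMER_NAME, INVOICE_NUMBER)
KEY INDEX UX_MyTable_ID;
-- Word-based search instead of a leading-wildcard LIKE:
SELECT TOP 35 ID, CUSTOMER_NAME, Unique_ID
FROM dbo.MyTable
WHERE CONTAINS(CUSTOMER_NAME, @CUSTOMER_NAME)
ORDER BY Unique_ID DESC;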

Related

SQL Server - Is creating an index with a unique constraint as one of the columns necessary?

I have the query below:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = @input
AND FK_SLA_Process = @input2
AND IsActive = 1
And this is my index for this SLA table.
CREATE INDEX IX_SLA_SLAName_FK_SLA_Process_IsActive ON dbo.SLA (SLAName, FK_SLA_Process, IsActive) INCLUDE (SLATimeInSeconds)
However, the SLAName column is unique so it has a unique constraint/index.
Is my created index overkill? Do I still need it, or will SQL Server use the index created on the unique column SLAName?
It would be "overkill" if your index were only on SLAName, but you are also filtering by FK_SLA_Process and IsActive, so queries that need those columns will benefit more from your index and less from the unique one alone.
So for a query like this:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Both indexes will yield the same results and there would be no point in yours. But for queries like:
SELECT PrimaryKey
FROM dbo.SLA
WHERE SLAName = 'SomeName'
AND FK_SLA_Process = 'Some Value'
Or
SELECT SLATimeInSeconds
FROM dbo.SLA
WHERE SLAName = 'SomeName'
Your index will be better than the unique one (in the 2nd example it acts as a covering index).
You should inspect which kinds of SELECTs you run against this table and decide whether you need this index. Keep in mind that having many indexes might speed up selects but slow down inserts, updates and deletes.
Assuming you have a table declaration like this:
CREATE TABLE SLA
(
ID INT PRIMARY KEY,
SLAName VARCHAR(50) NOT NULL UNIQUE,
fk_SLA INT,
IsActive TINYINT
)
Under the hood we have two indexes:
CREATE TABLE [dbo].[SLA](
[ID] [int] NOT NULL,
[SLAName] [varchar](50) NOT NULL,
[fk_SLA] [int] NULL,
[IsActive] [tinyint] NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
So this query will have an index seek and has an optimal plan:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
Its query plan indicates an index seek because we search via the UNIQUE NONCLUSTERED index ([SLAName] ASC) and use no other columns in the WHERE clause.
But if you add extra parameters into WHERE:
SELECT s.ID
FROM dbo.SLA s
WHERE s.SLAName = 'test'
AND s.fk_SLA = 1
AND s.IsActive = 1
The execution plan will contain an extra lookup.
A lookup happens when an index does not contain all the necessary information. The query engine has to step out of the UNIQUE NONCLUSTERED index structure to fetch the values of the columns fk_SLA and IsActive from the base table SLA.
So your index is overkill, as you already have the UNIQUE NONCLUSTERED index:
UNIQUE NONCLUSTERED
(
[SLAName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
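If profiling ever showed those lookups to matter, a middle ground (a sketch, not part of the answer above) would be to widen the unique index with included columns rather than keeping a separate three-column index:
-- Hypothetical covering variant of the unique index:
CREATE UNIQUE NONCLUSTERED INDEX UX_SLA_SLAName
ON dbo.SLA (SLAName)
INCLUDE (fk_SLA, IsActive);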
If the SLAName column is unique and has a unique constraint, any query that returns only one or zero rows (every point search that includes the SLAName = 'SomeName' condition) will use the unique index and perform at most one lookup into the base table.
Unless your queries do a range search like SLAName LIKE 'SomeName%', there is no need for a covering index: an index seek plus one lookup costs almost the same as an index seek alone, and there is no reason to waste space and maintain another index for such a marginal performance gain.

How to improve performance

I have the following table structure:
CREATE TABLE [dbo].[TableABC](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[FieldA] [nvarchar](36) NULL,
[FieldB] [int] NULL,
[FieldC] [datetime] NULL,
[FieldD] [nvarchar](255) NULL,
[FieldE] [decimal](19, 5) NULL,
PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I do two type of CRUD operations with this table.
SELECT * FROM [dbo].[TableABC] WHERE FieldA = @FieldA
INSERT INTO [dbo].[TableABC](FieldA,FieldB,FieldC,FieldD,FieldE) VALUES (@FieldA,@FieldB,@FieldC,@FieldD,@FieldE)
FieldA holds unique values, but there is no constraint on the table.
Currently there are 6,070,755 rows in the table. As the data grows, performance is getting slower.
Any suggestions on how to improve performance? How can I make the CREATE and READ operations faster?
I am now facing the problem that SELECT and INSERT take too long, sometimes more than 60 seconds.
Read up on SQL basics - and indices DEFINITELY are one. If you have a unique value and no index on the field (the constraint is irrelevant, a unique index is good enough) - yes, that will get slower: SQL Server has to check the whole table.
So:
Add a unique index to FieldA.
Given your two statements and the remark "FieldA has a unique value, but there is no constraint in the table", I assume you are trying to enforce uniqueness by selecting first. This will slow you down.
Instead create the index, and then try/catch the duplicate-key SQL errors - WAY faster. The index will make the insert a LITTLE slower, but you save the very slow select you currently do; a sketch follows below.
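A minimal sketch of that pattern, assuming FieldA really is unique (the index name is hypothetical):
-- Enforce uniqueness and give SELECT ... WHERE FieldA = @FieldA a seekable index:
CREATE UNIQUE NONCLUSTERED INDEX UX_TableABC_FieldA ON dbo.TableABC (FieldA);

-- Insert and let the index reject duplicates, instead of SELECTing first:
BEGIN TRY
    INSERT INTO dbo.TableABC (FieldA, FieldB, FieldC, FieldD, FieldE)
    VALUES (@FieldA, @FieldB, @FieldC, @FieldD, @FieldE);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627) -- duplicate key / unique constraint violations
        PRINT 'Duplicate FieldA - row skipped';
    ELSE
        THROW; -- THROW needs SQL Server 2012+; use RAISERROR on older versions
END CATCH;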

Changing a CTE for Microsoft Access

I've tried searching for an answer to this but can't find one.
I have a CTE I use for SQL queries relating to two tables in a database. The primary key of one table is a foreign key in the other, where it can appear numerous times. I want to count the number of times each foreign key appears in the second table and list this as a total field in my search results, along with details from the first table. As CTEs don't work in Access, I adjusted the query to use a sub-select in the join, but Access still doesn't like it.
Here are the basic parts of the tables
CREATE TABLE [dbo].[Clients](
[ClientRef] [int] NOT NULL,
[Surname] [varchar](40) NULL,
[Forenames] [varchar](50) NULL,
[Title] [varchar](40) NULL,
CONSTRAINT [CLIE_ClientRef_PK] PRIMARY KEY CLUSTERED
(
[ClientRef] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Policies](
[PolicyRef] [int] NOT NULL,
[ClientRef] [int] NULL,
CONSTRAINT [POLI_PolicyRef_PK] PRIMARY KEY CLUSTERED
(
[PolicyRef] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Here's my CTE
WITH CliPol (ClientRef, Plans) AS (SELECT ClientRef, COUNT(ClientRef) AS Plans FROM Policies GROUP BY ClientRef)
SELECT Clients.Surname, Clients.Forenames, Clients.Title, CliPol.Plans AS [No. of plans]
FROM Clients LEFT JOIN CliPol ON Clients.ClientRef = CliPol.ClientRef
ORDER BY Surname, Forenames;
And here's my adjusted query.
SELECT Clients.ClientRef, Clients.Surname, Clients.Forenames, Clients.Title , Plans.NoPlans
FROM Clients
LEFT JOIN
(SELECT ClientRef, COUNT(ClientRef) AS NoPlans FROM Policies GROUP BY ClientRef)
AS Plans ON Plans.ClientRef = Clients.ClientRef
ORDER BY Clients.Surname, Clients.Forenames
Unfortunately Access throws error #3131, "Syntax error in FROM clause", when I try to run that query.
Does anybody know how I make this work in Access?
One alternative approach would be to use the DCount() domain aggregate function:
SELECT
Clients.ClientRef,
Clients.Surname,
Clients.Forenames,
Clients.Title ,
DCount("ClientRef", "Policies", "ClientRef=" & Clients.ClientRef) AS NoPlans
FROM Clients
ORDER BY Clients.Surname, Clients.Forenames
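Another workaround that often sidesteps error 3131 while keeping a set-based join: save the grouped subquery as a named Access query and reference it like a table. The query name qryPlanCounts below is hypothetical.
First, save this as a separate Access query called qryPlanCounts:
SELECT ClientRef, COUNT(ClientRef) AS NoPlans
FROM Policies
GROUP BY ClientRef;
Then join to it from the main query:
SELECT Clients.ClientRef, Clients.Surname, Clients.Forenames, Clients.Title, qryPlanCounts.NoPlans
FROM Clients
LEFT JOIN qryPlanCounts ON Clients.ClientRef = qryPlanCounts.ClientRef
ORDER BY Clients.Surname, Clients.Forenames;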

How can LIKE '%...' seek on an index?

I would expect these two SELECTs to have the same execution plan and performance. Since there is a leading wildcard on the LIKE, I expect an index scan. When I run this and look at the plans, the first SELECT behaves as expected (with a scan). But the second SELECT plan shows an index seek, and runs 20 times faster.
Code:
-- Uses index scan, as expected:
SELECT 1
FROM AccountAction
WHERE AccountNumber LIKE '%441025586401'
-- Uses index seek somehow, and runs much faster:
DECLARE @empty VARCHAR(30) = ''
SELECT 1
FROM AccountAction
WHERE AccountNumber LIKE '%441025586401' + @empty
Question:
How does SQL Server use an index seek when the pattern starts with a wildcard?
Bonus question:
Why does concatenating an empty string change/improve the execution plan?
Details:
There is a non-clustered index on Accounts.AccountNumber
There are other indexes, but both the seek and the scan are on this index.
The Accounts.AccountNumber column is a nullable varchar(30)
The server is SQL Server 2012
Table and index definitions:
CREATE TABLE [updatable].[AccountAction](
[ID] [int] IDENTITY(1,1) NOT NULL,
[AccountNumber] [varchar](30) NULL,
[Utility] [varchar](9) NOT NULL,
[SomeData1] [varchar](10) NOT NULL,
[SomeData2] [varchar](200) NULL,
[SomeData3] [money] NULL,
--...
[Created] [datetime] NULL,
CONSTRAINT [PK_Account] PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_updatable_AccountAction_AccountNumber_UtilityCode_ActionTypeCd] ON [updatable].[AccountAction]
(
[AccountNumber] ASC,
[Utility] ASC
)
INCLUDE ([SomeData1], [SomeData2], [SomeData3]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE CLUSTERED INDEX [CIX_Account] ON [updatable].[AccountAction]
(
[Created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
NOTE:
Here is the actual execution plan for the two queries. The names of the objects differ slightly from the code above because I was trying to keep the question simple.
These tests (against the AdventureWorks2008R2 database) show what happens:
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE '%be%';
PRINT 'Test #2';
DECLARE @Pattern NVARCHAR(50);
SET @Pattern = N'%be%';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE @Pattern;
SET STATISTICS IO OFF;
SET NOCOUNT OFF;
Results:
Test #1
Table 'Person'. Scan count 1, logical reads 106
Test #2
Table 'Person'. Scan count 1, logical reads 106
The results from SET STATISTICS IO show that the logical reads are the same.
But the execution plans are quite different:
In the first test SQL Server uses an explicit Index Scan, but in the second test it uses an Index Seek - specifically an Index Seek (range scan). In that case SQL Server adds a Compute Scalar operator to generate these values:
[Expr1005] = Scalar Operator(LikeRangeStart([@Pattern])),
[Expr1006] = Scalar Operator(LikeRangeEnd([@Pattern])),
[Expr1007] = Scalar Operator(LikeRangeInfo([@Pattern]))
The Index Seek operator then uses an optimized Seek Predicate for a range scan (LastName > LikeRangeStart AND LastName < LikeRangeEnd), plus another, unoptimized residual Predicate (LastName LIKE @Pattern).
How can LIKE '%...' seek on an index?
My answer: it isn't a "real" Index Seek. It's an Index Seek - range scan which, in this case, has the same performance as an Index Scan.
Please see also the difference between Index Seek and Index Scan (a similar debate): So…is it a Seek or a Scan?
Edit 1: Following Aaron's recommendation, the execution plan with OPTION (RECOMPILE) also shows an Index Scan (instead of an Index Seek).
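For reference, a sketch of that variant: with a recompile, the optimizer sees the actual parameter value at plan time and treats the leading wildcard the same way it treats the literal, producing a scan.
DECLARE @Pattern NVARCHAR(50) = N'%be%';
SELECT p.BusinessEntityID, p.LastName
FROM Person.Person p
WHERE p.LastName LIKE @Pattern
OPTION (RECOMPILE);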

SQL efficiency: rows or tables

I'm creating a database that holds yield values of electric engines. The yield values are stored in an Excel file which I have to transfer to the database. Each test for an engine has 42 rows (torque) and 42 columns (power in kW), with the values stored in these cells.
           (kW)   1,0    1,2   ... (42 columns)
(rpm) 2000        76,2   77,0
      2100        76,7   77,6
      ... (42 rows)
Well, I thought of creating a column for engine_id, a column for test_id (each engine can have more than one test), and 42 columns for the corresponding yield values. For each test I would have to add 42 rows for a single engine. This seems neither efficient nor easy to implement.
If there are 42 rows per test for a single engine, in time the database will hold several thousand rows, and searching for a specific engine with the corresponding values will be an exhausting task.
If instead I create a separate table for each test of a specific engine, after some time I would probably have thousands of tables. So what should I go for: a table with thousands of records, or a table with 42 columns and 42 rows? Either way, I still have redundant records.
A database is definitely the answer: searching through many millions, or hundreds of millions, of rows is pretty easy once you get the hang of SQL (the language for interacting with databases). I would recommend a table structure of:
EngineId, TestId, TourqueId, PowerId, YieldValue
Which would have values...
Engine1, Test1, 2000, 1.0, 73.2
So only 5 columns. This will give you the flexibility to add more yield results in the future should it be required (and even if it's not, it's just an easier schema anyway). You will need to learn SQL, however, to realise the power of the database over a spreadsheet. Also, there are many techniques for importing Excel data to SQL, so you should investigate those; a sketch of one appears after the schema below. If you find you are transferring all that data by hand then you are doing something wrong (well, not wrong, but inefficient!).
Further to your comments, here is the exact schema with index (in MS SQL Server)
CREATE TABLE [dbo].[EngineTestResults](
[EngineId] [varchar](50) NOT NULL,
[TestId] [varchar](50) NOT NULL,
[Torque] [int] NOT NULL,
[Power] [decimal](18, 4) NOT NULL,
[Yield] [decimal](18, 4) NOT NULL,
CONSTRAINT [PK_EngineTestResults] PRIMARY KEY CLUSTERED
(
[EngineId] ASC,
[TestId] ASC,
[Torque] ASC,
[Power] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Index [IX_EngineTestResults] Script Date: 01/14/2012 14:26:21 ******/
CREATE NONCLUSTERED INDEX [IX_EngineTestResults] ON [dbo].[EngineTestResults]
(
[EngineId] ASC,
[TestId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
So note that there is no incrementing primary key...the key is (EngineId, TestId, Torque, Power). To get the results for a particular engine you would run a query like the following:
Select * from EngineTestResults where engineId = 'EngineABC' and TestId = 'TestA'
Note that I have added an index for that set of criteria.
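On the import point mentioned earlier, one common route is to save the spreadsheet as CSV and bulk-load it. A sketch, assuming a hypothetical file path and a CSV already unpivoted into the five-column shape above:
BULK INSERT dbo.EngineTestResults
FROM 'C:\data\engine_tests.csv' -- hypothetical path
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);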
The strength of a relational database is the ability to normalize data across multiple tables, so you could have one table for engines, one for tests and one for results. Something like the following:
CREATE TABLE tbl__engines (
`engine_id` SMALLINT UNSIGNED NOT NULL,
`name` VARCHAR(255) NOT NULL,
PRIMARY KEY(engine_id)
);
CREATE TABLE tbl__tests (
`test_id` INT UNSIGNED NOT NULL,
`engine_id` SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(test_id),
FOREIGN KEY(engine_id) REFERENCES tbl__engines(engine_id)
);
CREATE TABLE tbl__test_result (
`result_id` INT UNSIGNED NOT NULL,
`test_id` INT UNSIGNED NOT NULL,
`torque` INT NOT NULL,
`power` DECIMAL(6,2) NOT NULL,
`yield` DECIMAL(6,2) NOT NULL,
FOREIGN KEY(test_id) REFERENCES tbl__tests(test_id)
);
Then you can simply perform a join across these three tables to return the required results. Something like:
SELECT
*
FROM `tbl__engines` e
INNER JOIN `tbl__tests` t ON e.engine_id = t.engine_id
INNER JOIN `tbl__test_result` r ON r.test_id = t.test_id;
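To make the shapes concrete, a small example of how two cells of the 42×42 grid from the question would land in these tables (IDs and values are illustrative):
INSERT INTO tbl__engines (engine_id, name) VALUES (1, 'EngineABC');
INSERT INTO tbl__tests (test_id, engine_id) VALUES (1, 1);
-- Two of the 42 x 42 = 1764 cells for this test:
INSERT INTO tbl__test_result (result_id, test_id, torque, power, yield)
VALUES (1, 1, 2000, 1.00, 76.20),
       (2, 1, 2000, 1.20, 77.00);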