Quick SELECT sometimes times out - SQL

I have a stored procedure which executes a simple SELECT. Any time I run it manually, it runs in under a second. But in production (a SQL Azure S2 database) it runs inside a scheduled task every 12 hours - so I think it is reasonable to expect it to run "cold" every time, with no cached data. And the performance is very unpredictable - sometimes it takes 5 seconds, sometimes 30 and sometimes even 100.
The SELECT is optimized to the maximum (of my knowledge, anyway) - I created a filtered index including all the columns returned by the SELECT, so the only operation in the execution plan is an index scan. There is a huge difference between estimated and actual rows:
But overall the query seems pretty lightweight. I do not blame the environment (SQL Azure), because there are A LOT of queries executing all the time, and this one is the only one with this performance problem.
Here is the XML execution plan for SQL ninjas willing to help: http://pastebin.com/u5GCz0vW
EDIT:
Table structure:
CREATE TABLE [myproject].[Purchase](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductId] [nvarchar](50) NOT NULL,
[DeviceId] [nvarchar](255) NOT NULL,
[UserId] [nvarchar](255) NOT NULL,
[Receipt] [nvarchar](max) NULL,
[AppVersion] [nvarchar](50) NOT NULL,
[OSType] [tinyint] NOT NULL,
[IP] [nchar](15) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ValidationState] [smallint] NOT NULL,
[ValidationInfo] [nvarchar](max) NULL,
[ValidationError] [nvarchar](max) NULL,
[ValidatedOn] [datetime] NULL,
[PurchaseId] [nvarchar](255) NULL,
[PurchaseDate] [datetime] NULL,
[ExpirationDate] [datetime] NULL,
CONSTRAINT [PK_Purchase] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
Index definition:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
INCLUDE ( [ProductId],
[DeviceId],
[UserId],
[Receipt],
[AppVersion],
[OSType],
[IP],
[CreatedOn],
[ValidationState],
[ValidationInfo],
[ValidationError],
[PurchaseId],
[PurchaseDate])
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
Data can be considered sensitive, so I can't provide a sample.

Since your query returns only 1 match, I think you should trim down your index to a bare minimum. You can get the remaining columns via a Key Lookup from the clustered index:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
This doesn't eliminate the scan, but it makes the index much leaner for a fast read.
Edit: the OP stated that the slimmed-down index was ignored by SQL Server. You can force SQL Server to use the filtered index:
SELECT *
FROM [myproject].[Purchase] WITH (INDEX(IX_AndroidRevalidationTargets3))
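Note that the hint can only be honored if the query's own WHERE clause stays inside the index's filter. A minimal sketch, assuming the procedure's WHERE simply repeats the filter predicates (the real predicates are not shown in the question):
-- Sketch only: the procedure's actual WHERE clause isn't shown, so the predicates
-- below just repeat the index filter. If the query's WHERE is less restrictive than
-- the filter, SQL Server cannot use (or honor a hint on) the filtered index.
SELECT *
FROM [myproject].[Purchase] WITH (INDEX(IX_AndroidRevalidationTargets3))
WHERE [OSType] = 1
AND [ProductId] IS NOT NULL
AND [ProductId] <> 'trial'
AND [ValidationState] IN (1, 0, -2)
-- plus whatever date predicates the procedure applies to [ExpirationDate] / [ValidatedOn]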

Related

Index seek and Table scan takes the same time to execute

I have a table with the structure like this:
CREATE TABLE [dbo].[User]
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[CountryCode] [NVARCHAR](2) NOT NULL DEFAULT (N'GB'),
[CreationDate] [DATETIME2](7) NOT NULL,
[Email] [NVARCHAR](256) NULL,
[EmailConfirmed] [BIT] NOT NULL,
[FirstName] [NVARCHAR](MAX) NOT NULL,
[LastName] [NVARCHAR](MAX) NOT NULL,
[LastSignIn] [DATETIME2](7) NOT NULL,
[LockoutEnabled] [BIT] NOT NULL,
[LockoutEnd] [DATETIMEOFFSET](7) NULL,
[NormalizedEmail] [NVARCHAR](256) NULL,
[NormalizedUserName] [NVARCHAR](256) NULL,
[PasswordHash] [NVARCHAR](MAX) NULL,
[SecurityStamp] [NVARCHAR](MAX) NULL,
[TimeZone] [NVARCHAR](64) NOT NULL DEFAULT (N'Europe/London'),
[TwoFactorEnabled] [BIT] NOT NULL,
[UserName] [NVARCHAR](256) NULL,
[LastInfoUpdate] [DATETIME] NOT NULL
)
I have around a million rows in that table, and I want to apply a nonclustered index to the [LastInfoUpdate] column.
So I've created a non-clustered index using this command:
CREATE NONCLUSTERED INDEX IX_ProductVendor_VendorID1
ON [dbo].[TestUsers] (LastInfoUpdate)
INCLUDE(Email)
And when I try to run a simple query like this:
SELECT [LastInfoUpdate]
FROM [dbo].[TestUsers]
WHERE [LastInfoUpdate] >= GETUTCDATE()
I just get the same timing as without the index. According to SQL Server Profiler, the database does an index seek when the index is in place and uses less CPU than without it, but what matters to me is time. Why is the time the same? What am I doing wrong?
Execution Plan of table scan
Execution plan of Index Scan
Index seek Execution Plan file
Just create the following index:
CREATE INDEX IX_Users_EventDate ON Users(EventDate)
INCLUDE (EventId)
And the following query will be fast:
SELECT EventId, EventDate
FROM Users
WHERE EventDate <= GETUTCDATE()
Because the index is a covering index.
The key of a covering index must include the columns referenced in the WHERE and ORDER BY clauses, and the index must also include (in the key or the INCLUDE list) all columns referenced in the SELECT list.
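A hypothetical illustration of that rule, using a made-up index name and column choice against the question's TestUsers table:
-- Hypothetical covering index: the key serves the WHERE and ORDER BY,
-- the INCLUDE list covers the remaining columns in the SELECT list.
CREATE NONCLUSTERED INDEX IX_TestUsers_LastInfoUpdate_Covering
ON dbo.TestUsers (LastInfoUpdate)
INCLUDE (Email, UserName);

SELECT LastInfoUpdate, Email, UserName
FROM dbo.TestUsers
WHERE LastInfoUpdate >= GETUTCDATE()
ORDER BY LastInfoUpdate;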
The query you posted doesn't match the query plans you linked. The query plans are for the above query.
Another thing to take into account is the number of records returned by the query. If there are many, the query cannot be fast, because it needs to read all that data and send it over the network.
Try using a columnstore index. It is faster when you read a large range of rows from just a few columns:
CREATE NONCLUSTERED COLUMNSTORE INDEX [csi_User_LastInfoUpdate_Email]
ON [dbo].[User] ( [LastInfoUpdate], [Email] )
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
An article about column store index.
"WHERE [LastInfoUpdate] >= GETUTCDATE()" might return a lot of results. Tablescan can in this case be faster than index seek and subsequent adding information from the tabledata.
By adding the queried information to the index you can avoid the costly subsequent looks into tabledata.

Optimize performance on SQL table - indexing

I have a table in my DB which contains 5 million records:
CREATE TABLE [dbo].[PurchaseFact](
[Branch] [int] NOT NULL,
[ProdAnal] [varchar](30) NULL,
[Account] [varchar](12) NULL,
[Partno] [varchar](24) NULL,
[DteGRN] [date] NULL,
[DteAct] [date] NULL,
[DteExpect] [date] NULL,
[OrderNo] [bigint] NULL,
[GRNNO] [varchar](75) NULL,
[SuppAdv] [varchar](75) NULL,
[Supplier] [varchar](12) NULL,
[OrdType] [varchar](4) NULL,
[UnitStock] [varchar](4) NULL,
[OrderQty] [float] NULL,
[RecdQty] [float] NULL,
[Batch] [varchar](100) NULL,
[CostPr] [float] NULL,
[Reason] [varchar](2) NULL,
[TotalCost] [float] NULL,
[Magic] [bigint] IDENTITY(1,1) NOT NULL,
PRIMARY KEY CLUSTERED
(
[Magic] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
As you can see from the above - a CLUSTERED INDEX is being used on the MAGIC column which is a UNIQUE column.
Data retrieval time for the following SELECT statement is well over 8 minutes, which causes reporting issues:
SELECT Branch,
Supplier,
ProdAnal,
DteGRN AS Date,
PartNo AS Partno,
OrderNo,
OrderQty,
TotalCost,
CostPr
FROM dbo.PurchaseFact src
WHERE YEAR(DteGRN) = 2016
Excluding the WHERE clause doesn't make the query run any faster, either.
I have tried, together with the CLUSTERED index, adding a UNIQUE index in the hope that it would run faster, but to no avail:
CREATE UNIQUE INDEX Unique_Index ON dbo.PurchaseFact ([Branch], [Supplier], [Magic])
INCLUDE ([ProdAnal], [Account], [Partno], [DteAct], [DteExpect], [OrderNo], [GRNNO],
[SuppAdv], [OrdType], [UnitStock])
Is there any way I can optimize performance time on this table or should I resort to archiving old data?
Any advice would be greatly appreciated.
This is your where clause:
WHERE YEAR(DteGRN) = 2016
If the table has 5 million rows, then this is going to return a lot of data, assuming any reasonable distribution of dates. The volume of data is probably responsible for the length of time for the query.
One thing you can do is to rephrase the WHERE and then put an index on the appropriate column:
WHERE DteGRN >= '2016-01-01' and DteGRN < '2017-01-01'
This can then take advantage of an index on PurchaseFact(DteGRN). However, given the likely number of rows being returned, the index probably will not help very much.
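For reference, that index could be as simple as this (the index name is arbitrary):
-- Sketch: a plain index on the date column to support the rewritten range predicate.
CREATE NONCLUSTERED INDEX IX_PurchaseFact_DteGRN
ON dbo.PurchaseFact (DteGRN);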
The bigger question is why your reporting application is bringing back all the rows from 2016, rather than summarizing them inside the database. I suspect you have an architecture issue with the reporting application.
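As a rough illustration of summarizing inside the database (the actual report requirements aren't stated, so the grouping below is hypothetical):
-- Hypothetical aggregation: return totals per branch/supplier instead of
-- shipping every 2016 row to the reporting tool.
SELECT Branch,
Supplier,
SUM(OrderQty) AS TotalOrderQty,
SUM(TotalCost) AS TotalCost
FROM dbo.PurchaseFact
WHERE DteGRN >= '2016-01-01' AND DteGRN < '2017-01-01'
GROUP BY Branch, Supplier;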
Sorry, can't add comments.
To help improve performance further (if you can live with the UPDATE overhead), create a COVERING INDEX that INCLUDES only the columns in the SELECT part of the query.
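A sketch of such a covering index for the query above (the index name is arbitrary, and the extra INCLUDE columns add write and storage overhead):
-- Sketch: key on the filtered date column, INCLUDE the rest of the SELECT list
-- so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_PurchaseFact_DteGRN_Covering
ON dbo.PurchaseFact (DteGRN)
INCLUDE (Branch, Supplier, ProdAnal, Partno, OrderNo, OrderQty, TotalCost, CostPr);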

Implementing custom fields in a database for large numbers of records

I'm developing an app which requires user-defined custom fields on a contacts table. This contacts table can contain many millions of contacts.
We're looking at using a secondary metadata table which stores information about the fields, along with a tertiary value table which stores the actual data.
Here's the rough schema:
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[MiddleName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[CustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FieldName] [nvarchar](50) NULL,
[Type] [varchar](50) NULL
)
CREATE TABLE [dbo].[ContactAndCustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ContactID] [int] NULL,
[FieldID] [int] NULL,
[FieldValue] [nvarchar](max) NULL
)
However, this approach introduces a lot of complexity, particularly with regard to importing CSV files with multiple custom fields. At the moment this requires an update/join statement and a separate insert statement for every individual custom field. Joins would also be required to return custom field data for multiple rows at once (see the sketch below).
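For context, a read in the EAV design looks roughly like this sketch (the contact IDs are hypothetical):
-- Sketch: returning custom field values for a batch of contacts in the EAV design;
-- an extra join (or a PIVOT) is needed to get the fields back as columns.
SELECT c.ID,
c.FirstName,
cf.FieldName,
ccf.FieldValue
FROM dbo.Contact c
JOIN dbo.ContactAndCustomField ccf ON ccf.ContactID = c.ID
JOIN dbo.CustomField cf ON cf.ID = ccf.FieldID
WHERE c.ID IN (1, 2, 3); -- hypothetical batch of contact IDs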
I've argued for this structure instead:
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[MiddleName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL,
[CustomField1] [nvarchar](max) NULL,
[CustomField2] [nvarchar](max) NULL,
[CustomField3] [nvarchar](max) NULL /* etc, adding lots of empty fields */
)
CREATE TABLE [dbo].[ContactCustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FieldIndex] [int] NULL,
[FieldName] [nvarchar](50) NULL,
[Type] [varchar](50) NULL
)
The downside of this second approach is that there is a finite number of custom fields that must be specified when the contacts table is created. I don't think that's a major hurdle given the performance benefits it will surely have when importing large CSV files, and returning result sets.
What approach is the most efficient for large numbers of rows? Are there any downsides to the second technique that I'm not seeing?
Microsoft introduced sparse columns exactly for this type of problem. The point is that in a "classic" design you end up with a large number of columns, most of them NULL for any particular row. The same is true with sparse columns, but the NULLs don't require any storage. Moreover, you can create column sets and modify the sets via XML.
Performance- and storage-wise, sparse columns are the winner.
http://technet.microsoft.com/en-us/library/cc280604.aspx
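A minimal sketch of the sparse-column variant of the second design (the custom column names and sizes are just examples):
-- Sketch: SPARSE columns consume no storage for NULLs; the optional column set
-- exposes all sparse columns as a single XML value for convenient get/set.
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL,
[CustomField1] [nvarchar](200) SPARSE NULL,
[CustomField2] [nvarchar](200) SPARSE NULL,
[CustomField3] [nvarchar](200) SPARSE NULL,
[CustomFields] XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)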
Query performance for any "property bag table" approach is comically slow - but if you need flexibility you can either have a dynamic table that is changed via an editor, OR you have a property bag table. So when you need it, you need it.
But expect the performance to be slow.
The best approach would likely be a ContactCustomFields table whose fields are determined by an editor.
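If the dynamic-table route is chosen, the editor would issue schema changes along these lines (the column name is hypothetical):
-- Sketch: the field editor adds a real, typed column when a user defines a new field.
ALTER TABLE dbo.Contact
ADD [Custom_Birthday] date NULL; -- hypothetical user-defined field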

Table size strange in SQL Server

I have 2 tables in SQL Server
TbUrl
INDEX SPACE 12,531 MB
ROW COUNT 247505
DATA SPACE 1.965,891 MB
Table structure:
CREATE TABLE [TbUrl](
[IdUrl] [Int] IDENTITY(1,1) NOT NULL,
[IdSupply] [Int] NOT NULL,
[Uri] [varchar](512) NOT NULL,
[UrlCod] [varchar](256) NOT NULL,
[Status] [Int] NOT NULL,
[InsertionDate] [datetime] NOT NULL,
[UpdatedDate] [datetime] NULL,
[UpdatedIp] [varchar](15) NULL
)
TbUrlDetail
INDEX SPACE 29,406 MB
ROW COUNT 234209
DATA SPACE 386,047 MB
Structure:
CREATE TABLE [TbUrlDetail](
[IdUrlDetail] [Int] IDENTITY(1,1) NOT NULL,
[IdUri] [Int] NOT NULL,
[Title] [varchar](512) NOT NULL,
[Sku] [varchar](32) NOT NULL,
[MetaKeywords] [varchar](512) NOT NULL,
[MetaDescription] [varchar](512) NOT NULL,
[Price] [money] NOT NULL,
[Description] [text] NOT NULL,
[Stock] [Bit] NOT NULL,
[StarNumber] [Int] NOT NULL,
[ReviewNumber] [Int] NOT NULL,
[Category] [varchar](256) NOT NULL,
[UrlShort] [varchar](32) NULL,
[ReleaseDate] [datetime] NOT NULL,
[InsertionDate] [datetime] NOT NULL
)
The size of TbUrl is very large compared with TbUrlDetail.
TbUrl has fewer and smaller columns than TbUrlDetail, yet its data space is much larger.
I've done a SHRINK on the database, but the space used by TbUrl doesn't go down.
What might be happening? How do I decrease the space used by this table?
Is there a clustered index on the table? (If not you could be suffering from a lot of forward pointers - ref.) Have you made drastic changes to the data or the data types or added / dropped columns? (If you have then a lot of the space previously occupied may not be able to be re-used. One ref where changing a fixed-length col to variable does not reclaim space.)
In both cases you should be able to recover the wasted space by rebuilding the table (which will also rebuild all of the clustered indexes):
ALTER TABLE dbo.TbUrl REBUILD;
If you are on Enterprise Edition you can do this online:
ALTER TABLE dbo.TbUrl REBUILD WITH (ONLINE = ON);
Shrinking the entire database is not the magic answer here. And if there is no clustered index on this table, I strongly suggest you consider one before performing the rebuild.
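If the table really is a heap, a minimal sketch of adding a clustered key first (assuming IdUrl is unique and intended as the primary key) could be:
-- Sketch: give the heap a clustered key; building the clustered index rewrites the
-- data pages, which also reclaims space wasted by forwarded records.
ALTER TABLE dbo.TbUrl
ADD CONSTRAINT PK_TbUrl PRIMARY KEY CLUSTERED (IdUrl);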
With VARCHAR() fields, the amount of space actually taken does vary according to the amount of text put in those fields.
Could you perhaps have (on average) much shorter entries in one table than in the other?
Try
SELECT
SUM(CAST(LEN(Uri) + LEN(UrlCod) AS BIGINT)) AS character_count
FROM
TbUrl

SELECT
SUM(CAST(LEN(Title) + LEN(MetaKeywords) + LEN(MetaDescription) + LEN(Category) AS BIGINT)) AS character_count
FROM
TbUrlDetail

Data Access from single table in sql server 2005 is too slow

Following is the script of the table. Accessing data from this table is too slow.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Emails](
[id] [int] IDENTITY(1,1) NOT NULL,
[datecreated] [datetime] NULL CONSTRAINT [DF_Emails_datecreated]
DEFAULT (getdate()),
[UID] [nvarchar](250) COLLATE Latin1_General_CI_AS NULL,
[From] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL,
[To] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL,
[Subject] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,
[Body] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,
[HTML] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL,
[AttachmentCount] [int] NULL,
[Dated] [datetime] NULL
) ON [PRIMARY]
The following query takes 50 seconds to fetch the data.
select id, datecreated, UID, [From], [To], Subject, AttachmentCount,
Dated from emails
If I include Body and HTML in the select then the time is even worse.
Indexes are on:
id - unique clustered
From - non-unique, non-clustered
To - non-unique, non-clustered
The table currently has 180,000+ records.
There might be 100,000 new records each month, so this will only get slower as time passes.
Will splitting the data into two tables solve the problem?
What other indexes should there be?
It's almost certainly the volume of the data that's causing the problem. Because of this, you should not fetch the Subject column until you need it. Even fetching SUBSTRING(Subject, 1, 100) may be noticeably faster.
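For example, a sketch of fetching only a preview of the subject (the column list is taken from the query above):
-- Sketch: fetch a short preview of Subject instead of the full value; the full
-- Subject/Body can be loaded later for a single selected email.
SELECT id, datecreated, UID, [From], [To],
SUBSTRING(Subject, 1, 100) AS SubjectPreview,
AttachmentCount, Dated
FROM dbo.Emails;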
This may be irrelevant, but older versions of SQL Server suffered if the BLOB columns weren't the last in the row, so just as an experiment I'd move [AttachmentCount] and [Dated] above the three nvarchar(max) columns.