Extremely slow SELECT statement with WHERE on a FK field - sql

I have this query below, and it's extremely slow. It takes almost 2 minutes for run to return 3,008 records out of a table with 99 million records. The first query where it gets "Article" data is super fast, less than 1 second and always returns 1 record. It's the second query that's the problem. I don't really want to JOIN these queries. The first one is so quick, and (in my real query) I'm setting more than just #ArticleID for further use.
The query execution plan says it has 75% for it on a clustered key lookup on IX_Name, which didn't make sense to me because I'm not even doing anything with name fields here. Furthermore, Id and ArticleID are both indexes on ArticleAuthor, so I'm not sure what I'm doing wrong. I can't do much with IX_Name being the clustered index...my boss created this table and said to do that.
DECLARE #DOI VARCHAR(72) = '10.1140/EPJC/S10052-012-1993-2'
DECLARE #ArticleID VARCHAR(12)
SELECT
#ArticleID = A.Id
FROM
Article A
LEFT JOIN
JournalName JN WITH (NOLOCK) ON JN.Id = A.JournalId
WHERE
A.DOI = #DOI
PRINT 'GOT ARTICLE DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')
SELECT
AA.Id
FROM
[ArticleWarehouseTemp]..ArticleAuthor AA WITH (NOLOCK)
WHERE
AA.ArticleID = #ArticleID
PRINT 'GOT ARTICLEAUTHOR DATA ' + format(getdate(), 'yyyy-MM-dd HH:mm:ss.fff')
Please help! This is driving me insane. I've attached the table structure and indexes here too.
CREATE TABLE [dbo].[ArticleAuthor]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ArticleId] [int] NOT NULL,
[FullName] [nvarchar](128) NULL,
[LastName] [nvarchar](64) NULL,
[FirstName] [nvarchar](64) NULL,
[FirstInitial] [nvarchar](1) NULL,
[OrcId] [varchar](36) NULL,
[IsSequenceFirst] [bit] NULL,
[SequenceIndex] [smallint] NULL,
[CreatedDate] [smalldatetime] NULL CONSTRAINT [DF_ArticleAuthor_CreatedDate] DEFAULT (getdate()),
[UpdatedDate] [smalldatetime] NULL,
[Affiliations] [varbinary](max) NULL
) ON [ArticleAuthorFileGroup] TEXTIMAGE_ON [ArticleAuthorFileGroup]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[ArticleAuthor] WITH CHECK
ADD CONSTRAINT [FK_ArticleId]
FOREIGN KEY([ArticleId]) REFERENCES [dbo].[Article] ([Id])
GO
ALTER TABLE [dbo].[ArticleAuthor] CHECK CONSTRAINT [FK_ArticleId]
GO
CREATE NONCLUSTERED INDEX [IX_ID]
ON [dbo].[ArticleAuthor] ([Id] ASC)
CREATE NONCLUSTERED INDEX [IX_ArticleID]
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
CREATE CLUSTERED INDEX [IX_Name]
ON [dbo].[ArticleAuthor] ([LastName] ASC, [FirstName] ASC, [FirstInitial] ASC)

If you have to keep the current clustered index as is, you can do the following:
1.
Make sure that you are using correct types:
DECLARE #ArticleID VARCHAR(12)
should be
DECLARE #ArticleID int;
to match the type of the column ArticleId in the ArticleAuthor table.
2.
To make sure that index IX_ArticleID is used efficiently, to make it a covering index, INCLUDE the column Id to it:
CREATE NONCLUSTERED INDEX [IX_ArticleID]
ON [dbo].[ArticleAuthor] ([ArticleId] ASC)
INCLUDE(Id);
3.
If you have a very skewed distribution of data, i.e. a number of rows per ArticleId varies greatly for different articles. Say, if one article has 2 rows and another article has million rows, then you'd better add OPTION(RECOMPILE) to the query and make sure that statistics and/or index(es) are kept up to date.

You are declaring DECLARE #ArticleID VARCHAR(12) while its int in your table [dbo].[ArticleAuthor][ArticleId] [int] NOT NULL,
Try to make them same datatype to ensure faster response.

Related

Index seek and Table scan takes the same time to execute

I have a table with the structure like this:
CREATE TABLE [dbo].[User]
(
[Id] [INT] IDENTITY(1,1) NOT NULL,
[CountryCode] [NVARCHAR](2) NOT NULL DEFAULT (N'GB'),
[CreationDate] [DATETIME2](7) NOT NULL,
[Email] [NVARCHAR](256) NULL,
[EmailConfirmed] [BIT] NOT NULL,
[FirstName] [NVARCHAR](MAX) NOT NULL,
[LastName] [NVARCHAR](MAX) NOT NULL,
[LastSignIn] [DATETIME2](7) NOT NULL,
[LockoutEnabled] [BIT] NOT NULL,
[LockoutEnd] [DATETIMEOFFSET](7) NULL,
[NormalizedEmail] [NVARCHAR](256) NULL,
[NormalizedUserName] [NVARCHAR](256) NULL,
[PasswordHash] [NVARCHAR](MAX) NULL,
[SecurityStamp] [NVARCHAR](MAX) NULL,
[TimeZone] [NVARCHAR](64) NOT NULL DEFAULT (N'Europe/London'),
[TwoFactorEnabled] [BIT] NOT NULL,
[UserName] [NVARCHAR](256) NULL,
[LastInfoUpdate] [DATETIME] NOT NULL
)
I have around a million rows in that table, and I want to apply a nonclustered index to the [LastInfoUpdate] column.
So I've created a non-clustered index using this command:
CREATE NONCLUSTERED INDEX IX_ProductVendor_VendorID1
ON [dbo].[TestUsers] (LastInfoUpdate)
INCLUDE(Email)
And once I'm trying to run simple query like that:
SELECT [LastInfoUpdate]
FROM [dbo].[TestUsers]
WHERE [LastInfoUpdate] >= GETUTCDATE()
I just get the same result in timing as without index. According to SQL Server Profiler with db does index seek while using index and just use less cpu resources in comparison with case without index but what is important for me it's time. What time the same? What am I doing wrong?
Execution Plan of table scan
Execution plan of Index Scan
Index seek Execution Plan file
Just create the following index:
CREATE INDEX IX_Users_EventDate ON Users(EventDate)
INCLUDE (EventId)
And the following query will be fast:
SELECT EventId, EventDate
FROM Users
WHERE EventDate <= GETUTCDATE()
Because the index is a covering index.
The key of a covering index must include columns referenced in WHERE and ORDER BY clauses. And the covering index must include all columns referenced on the SELECT list.
The query you posted doesn't match to the query plans you linked. The query plans are for the above query.
Another thing to take into account is the number of records returned by the query. If they are many, the query cannot be fast enough, because it needs to read all the data and send it to the network.
Try to use ColumnStore index. It is faster when you want to get some range of columns:
CREATE NONCLUSTERED COLUMNSTORE INDEX
[csi_User_LastInfoUpdate_Email] ON
[dbo].[User] ( [LastInfoUpdate], [Email] )WITH
(DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
An article about column store index.
"WHERE [LastInfoUpdate] >= GETUTCDATE()" might return a lot of results. Tablescan can in this case be faster than index seek and subsequent adding information from the tabledata.
By adding the queried information to the index you can avoid the costly subsequent looks into tabledata.

Why is my query slower when I use the index I made for it?

I have the following table, made with EntityFramework 6.1:
CREATE TABLE [dbo].[MachineryReading] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[Location] [sys].[geometry] NULL,
[Latitude] FLOAT (53) NOT NULL,
[Longitude] FLOAT (53) NOT NULL,
[Altitude] FLOAT (53) NULL,
[Odometer] INT NULL,
[Speed] FLOAT (53) NULL,
[BatteryLevel] INT NULL,
[PinFlags] BIGINT NOT NULL,
[DateRecorded] DATETIME NOT NULL,
[DateReceived] DATETIME NOT NULL,
[Satellites] INT NOT NULL,
[HDOP] FLOAT (53) NOT NULL,
[MachineryId] INT NOT NULL,
[TrackerId] INT NOT NULL,
[ReportType] NVARCHAR (1) NULL,
[FixStatus] INT DEFAULT ((0)) NOT NULL,
[AlarmStatus] INT DEFAULT ((0)) NOT NULL,
[OperationalSeconds] INT DEFAULT ((0)) NOT NULL,
CONSTRAINT [PK_dbo.MachineryReading] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId] FOREIGN KEY ([MachineryId]) REFERENCES [dbo].[Machinery] ([Id]) ON DELETE CASCADE,
CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId] FOREIGN KEY ([TrackerId]) REFERENCES [dbo].[Tracker] ([Id]) ON DELETE CASCADE
);
GO
CREATE NONCLUSTERED INDEX [IX_MachineryId]
ON [dbo].[MachineryReading]([MachineryId] ASC);
GO
CREATE NONCLUSTERED INDEX [IX_TrackerId]
ON [dbo].[MachineryReading]([TrackerId] ASC);
Thats a lot of information, and our most common (and slowest) query only uses a subset of it:
SELECT TOP 1 OperationalSeconds
FROM MachineryReading
WHERE MachineryId = #id
AND DateRecorded > #start
AND DateRecorded < #end
AND OperationalSeconds <> 0
The table stores a few million rows, recorded from about 2012 onwards, although our code is set to begin some searches from 2000. It was running pretty slowly, so one of the guys I work with partitioned the table based on DateRecorded:
ALTER PARTITION SCHEME PartitionSchemeMonthRange NEXT USED [Primary]
ALTER PARTITION FUNCTION [PartitionFunctionMonthRange]() SPLIT RANGE(N'2016-01-01T00:00:00.000')
ALTER PARTITION SCHEME PartitionSchemeMonthRange NEXT USED [Primary]
ALTER PARTITION FUNCTION [PartitionFunctionMonthRange]() SPLIT RANGE(N'2016-02-01T00:00:00.000')
...
CREATE UNIQUE CLUSTERED INDEX [PK_dbo.MachineryReadingPs] ON MachineryReading(DateRecorded, Id) ON PartitionSchemeMonthRange(DateRecorded)
However, the query above is still running pretty slowly. So on top of that, I made another index:
CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded]
ON [dbo].[MachineryReading]([DateRecorded] ASC, [MachineryId] ASC)
INCLUDE([OperationalSeconds], [FixStatus]);
Executing that query again, the execution plan shows it completely ignoring the index I just made, instead opting for Constant Scan, and Index Seek on IX_MachineryId. This works pretty quickly for a small date range, but is terrible for getting the total operational hours.
Ok, I can deal with that: WITH(INDEX(IX_MachineryId_DateRecorded)).
Nope. It actually runs significantly slower, when using the index I made specifically for that query! What gives? What can I do better?
You have DateRecorded before MachineryId in your indexes. Reverse these for a more efficient index.

Update PK columns in table having clustered keys

I have a database with tables that have clustered primary keys. I believe the term I picked up on is use of a Natural Key. The frontend to the SQL database is programmed to affect changes to all related foreign key tables for the 3,000 selected values which I want to modify. It takes about 13 seconds per change. I have a need to do this in a much shorter timeframe if possible.
The reason for doing this is prep work to migrate to a new CMMS program.
I found reference for use of ON UPDATE CASCADE, but I am not certain this applies.
Among many references, I used the following:
https://social.technet.microsoft.com/Forums/sqlserver/it-IT/23294919-3e6a-4146-a70d-66fa155ed1b3/update-primary-key-column-in-sql-server?forum=transactsql
An example of 2 of the 15 tables having the same named [EQNUM] column follows. Table A is the table that is first modified using the frontend. I left out many columns for each table:
CREATE TABLE A
(
[EQNUM] [varchar](30) NOT NULL,
CONSTRAINT [PK_A] PRIMARY KEY CLUSTERED ([EQNUM] ASC)
)
GO
SET ANSI_PADDING OFF
GO
CREATE TABLE B
(
[EQNUM] [varchar](30) NOT NULL,
[ColA] [varchar](10) NOT NULL,[ColB] [datetime] NOT NULL,
[ColC] [varchar](30) NOT NULL, [ColD] [varchar](30) NOT NULL,
[ColE] [varchar](30) NOT NULL, [ColF] [varchar](30) NOT NULL,
[ColG] [varchar](11) NOT NULL,[ColH] [varchar](10) NOT NULL,
[ColI] [datetime] NOT NULL,[ColJ] [varchar](15) NOT NULL,
[ColK] [int] NULL,
CONSTRAINT [PK_B]
PRIMARY KEY CLUSTERED ([EQNUM] ASC,
[ColA] ASC,[ColB] ASC,[ColC] ASC,[ColD] ASC,[ColE] ASC,
[ColF] ASC,[ColG] ASC,[ColH] ASC,[ColI] ASC,[ColJ] ASC)
)
An example of 1 of 4 sets of UPDATE queries, for which I believe the added first and last ALTER TABLE lines would allow me to affect the update:
ALTER TABLE A NOCHECK CONSTRAINT [PK_EQUIP]
UPDATE A
SET [EQNUM] = REPLACE([EQNUM],'-B','-B0')
WHERE [EQNUM] LIKE '%-A[1-9][0-5][0-9]-%' OR
[EQNUM] LIKE '%-A[1-9][A-F][0-5][0-9]-%' OR
ALTER TABLE A CHECK CONSTRAINT [PK_EQUIP]
ALTER TABLE A NOCHECK CONSTRAINT [PK_B]
UPDATE B
SET [EQNUM] = REPLACE([EQNUM],'-B','-B0')
WHERE [EQNUM] LIKE '%-A[1-9][0-5][0-9]-%' OR
[EQNUM] LIKE '%-A[1-9][A-F][0-5][0-9]-%' OR
ALTER TABLE A CHECK CONSTRAINT [PK_B]
Is it this simple, or am I missing something? Is there a better way?

SQL query not returning record

I am trying to retrieve a record from a table with a given field value. The query is:
declare #imei varchar(50)
set #imei = 'ee262b57-ccb4-4a2b-8410-6d8621fd9328'
select *
from tblDevices
where imei = #imei
which returns nothing.
If I comment out the where clause all records are returned, including the one I am looking for. The value is clearly in the table field and matches exactly, but I cannot get the where clause to work.
I literally copied the value out of the table to ensure it was correct.
I would appreciate any guidance on my mistake.
Table def:
CREATE TABLE [dbo].[tblDevices](
[id] [int] IDENTITY(1,1) NOT NULL,
[create_date] [datetime] NOT NULL,
[update_date] [datetime] NOT NULL,
[other_id] [int] NULL,
[description] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[authorized] [int] NOT NULL,
[imei] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_tblDevices] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Edit
Using user2864740 suggestion, I queried the following:
select hashbytes('SHA1', imei) as h1 from tblDevices where id =8
returns:
0x43F9067C174B2F2F2C0FFD17B9AC7F54B3C630A2
select hashbytes('SHA1', #imei) as h2
returns:
0xB9B82BB440B04729B2829B335E6D6B450572D2AB
So, I am not sure what this means. My poor little brain is having a hard time understanding that A <> A?! What is going on here if it's not a collation issue? How can two identical values not be considered equal?
Edit 2
this is the table record I want:
8 2013-10-22 12:43:10.223 2013-10-22 12:43:10.223 -1 1 ee262b57-ccb4-4a2b-8410-6d8621fd9328
Kinda of taking a wild stab but with the two hashes showing they are in fact different, wondering if you just have an extra space somewhere Maybe try:
select *
from tblDevices
where Trim(imei) = (#imei)

When is a SQL Server foreign key table too much?

When I have the below two tables, would the StatusTypes table be considered as overkill? i.e. is there more benefit to using it than not?
In this situation I don't expect to have to load these statuses up in an admin backend in order to add or change/ delete them, but on the other hand I don't often like not using foreign keys.
I'm looking for reasons for and against separating out the status type or keeping it in the Audit table.
Any help would be appreciated.
-- i.e. NEW, SUBMITTED, UPDATED
CREATE TABLE [dbo].[StatusTypes](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_StatusTypes] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Audits](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](500) NULL,
[Country_Fkey] [int] NOT NULL,
[User_Fkey] [int] NOT NULL,
[CreatedDate] [date] NOT NULL,
[LastAmendedDate] [date] NULL,
[Status_Fkey] [int] NOT NULL,
CONSTRAINT [PK_Audits] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
In this situation I like to keep the lookup table to enforce the status being one of a set of types. Some databases have an enum type, or can use check constraints, but this is the most portable method IMO.
However, I make the lookup table containing only a single string column containing the type's name. That way you don't have to actually join to the lookup table and your ORM (assuming you use one) can be completely unaware of it.
In this case the schema would look like:
CREATE TABLE [dbo].[StatusTypes](
[ID] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_StatusTypes] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Audits](
[ID] [int] IDENTITY(1,1) NOT NULL,
...
[Status] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_Audits] PRIMARY KEY CLUSTERED ([ID] ASC),
CONSTRAINT [FK_Audit_Status] FOREIGN KEY (Status) REFERENCES StatusTypes(ID)
) ON [PRIMARY]
GO
And a query for audit items of a particular type would be:
SELECT ...
FROM Audits
WHERE Status = 'ACTIVE'
So referential integrity is still enforced but queries don't need an extra join.
I'll offer a counter-argument: Use your development time where it is most useful. Maybe you don't need this runtime-check that much. Maybe you can use your development time for some other check that is more useful.
Is it even likely that an invalid status value will be set? You application surely uses a set of constants or an enum so it is unlikely that some rogue value slips in.
That said, there is a lot of value in ensuring integrity. I like to cover all my "enum" columns with a BETWEEN check constraint which is quickly done and even faster at runtime.