Will creating index help in this case - sql-server-2005

I'm still learning SQL Server 2005.
Here is my table structure:
CREATE TABLE [dbo].[Trn_PostingGroups](
[ControlGroup] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PracticeCode] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ScanDate] [smalldatetime] NULL,
[DepositDate] [smalldatetime] NULL,
[NameOfFile] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DepositValue] [decimal](11, 2) NULL,
[RecordStatus] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_Trn_PostingGroups_1] PRIMARY KEY CLUSTERED
(
[ControlGroup] ASC,
[PracticeCode] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Scenario 1: Suppose I have a query like this:
Select * from Trn_PostingGroups where PracticeCode = 'ABC'
Will a separate index on PracticeCode make this query faster?
Scenario 2:
Select * from Trn_PostingGroups
where
ControlGroup = '12701'
and PracticeCode = 'ABC'
and NameOfFile = 'FileName1'
Will a separate index on NameOfFile make this query faster?

If you were only filtering on the first field (ControlGroup), the clustered primary key is already sorted on it, so you wouldn't need another index.
If you filter on the other primary key field by itself, a separate index on that field should help with such selects.
In general, you should index fields that are commonly used in WHERE and ORDER BY clauses. This is, of course, oversimplified.
See this article for more information about optimizing (statistics and query analyser).

SQL Server will usually use only one index per table reference in a query (self joins and CTEs reference the table more than once, and index intersection exists but is rarely chosen). If several indexes could serve the same table in the same query, SQL Server uses statistics to decide which one is likely to be cheaper.
In Scenario 1, if you create an index on PracticeCode alone, it will usually be used, provided the table has enough rows that a scan costs more and the column holds a diverse range of values. An index will not be used if the table has only a few rows (it is faster to just look at them all), nor if most of the values in that column are the same. The query cannot seek on the PK: that would be like looking up a first name in the phone book, which is sorted by last name and then first name. You might consider reversing your PK to PracticeCode+ControlGroup if you never search on ControlGroup by itself.
In Scenario 2, if you have an index on NameOfFile, the optimizer will probably use the PK and ignore the NameOfFile index, because ControlGroup+PracticeCode already identifies a single row. If you make the NameOfFile index unique, it is a toss-up. You might try creating an index (in addition to your PK) on ControlGroup+PracticeCode+NameOfFile. If you have many files per ControlGroup+PracticeCode, then it may select that index over the PK index.
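The phone-book point from Scenario 1 can be tried on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in, only because it runs without a server; SQLite's planner is not SQL Server's, so treat it as an illustration of the leftmost-column rule rather than a prediction of SQL Server's plans. The index name IX_PracticeCode and the trimmed-down column list are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Trn_PostingGroups (
        ControlGroup TEXT NOT NULL,
        PracticeCode TEXT NOT NULL,
        NameOfFile   TEXT,
        PRIMARY KEY (ControlGroup, PracticeCode)
    )
    """
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


by_practice = "SELECT * FROM Trn_PostingGroups WHERE PracticeCode = 'ABC'"
by_group = "SELECT * FROM Trn_PostingGroups WHERE ControlGroup = '12701'"

# Leading PK column: the composite key can be searched directly.
plan_leading = plan(by_group)

# Second PK column alone: no usable index yet, so the table is scanned.
plan_before = plan(by_practice)

# Hypothetical separate index on the second column.
conn.execute("CREATE INDEX IX_PracticeCode ON Trn_PostingGroups (PracticeCode)")
plan_after = plan(by_practice)

print("leading column:", plan_leading)
print("before index:  ", plan_before)
print("after index:   ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first and third plans should be searches and the second a scan.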

Related

SQLite speed up select with collate nocase

I have SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
GlobalVer INTEGER PRIMARY KEY,
Data blob NOT NULL
) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to make 1 select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
After that I want to make another select:
SELECT Commits.Data,Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that the 2nd select is extremely slow. I think this is because of COLLATE NOCASE. I want to speed it up. I tried adding an index, but it doesn't help (maybe I did something wrong?). How can I make the 2nd query run at approximately the same speed as the 1st?
An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation. So use the default collation for the implicit (primary key) index and NOCASE for the explicit index:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS Streams_nocase_idx ON Streams(Name COLLATE NOCASE, GlobalVer);
(Adding the second column to the index speeds up the ORDER BY in this query.)
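One way to check the two-index setup is to ask SQLite for its query plan before and after creating the NOCASE index. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows and the simplified queries (no join to Commits) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Streams (
        Name      char(40) NOT NULL,
        GlobalVer INTEGER  NOT NULL,
        PRIMARY KEY (Name, GlobalVer)
    ) WITHOUT ROWID;
    INSERT INTO Streams VALUES ('Main', 1), ('main', 2), ('MAIN', 3), ('other', 4);
    """
)

binary_sql = "SELECT GlobalVer FROM Streams WHERE Name = ? ORDER BY GlobalVer"
nocase_sql = (
    "SELECT GlobalVer FROM Streams WHERE Name = ? COLLATE NOCASE ORDER BY GlobalVer"
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("main",)).fetchall()
    return " | ".join(row[3] for row in rows)


# Only the BINARY primary key exists: the NOCASE lookup cannot search it.
nocase_before = plan(nocase_sql)

conn.execute(
    "CREATE INDEX Streams_nocase_idx ON Streams (Name COLLATE NOCASE, GlobalVer)"
)

# Each query can now search the index whose collation matches it.
nocase_after = plan(nocase_sql)
binary_after = plan(binary_sql)

binary_rows = [r[0] for r in conn.execute(binary_sql, ("main",))]
nocase_rows = [r[0] for r in conn.execute(nocase_sql, ("main",))]

print("NOCASE before:", nocase_before)
print("NOCASE after: ", nocase_after)
print("BINARY after: ", binary_after)
print(binary_rows, nocase_rows)
```

The case-sensitive query matches only 'main', while the NOCASE query matches all three spellings, so the two queries need differently ordered indexes.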

SQL Index Update with Covering Columns

I am creating an index on a table and I want to include a covering column: messageText nvarchar(1024)
After insertion, the messageText is never updated, so it's an ideal candidate to include in a covering index to speed up lookups.
But what happens if I update other columns in same index?
Will the entire row in the index need reallocating or will just that data from the updated column be updated in the index?
Simple Example
Imagine the following table:
CREATE TABLE [Messages](
[messageID] [int] IDENTITY(1,1) NOT NULL,
[mbrIDTo] [int] NOT NULL,
[isRead] [bit] NOT NULL,
[messageText] [nvarchar](1024) NOT NULL
)
And the following Index:
CREATE NONCLUSTERED INDEX [IX_messages] ON [Messages] ( [mbrIDTo] ASC, [messageID] ASC )
INCLUDE ( [isRead], [messageText])
When we update the table:
UPDATE Messages
SET isRead = 1
WHERE (mbrIDTo = 6546)
The query plan shows that the index IX_messages is utilized and will also be updated because the column isRead is part of the index.
Therefore, does including large text fields (such as messageText above) as covering columns in an index impact performance when other values in that same index are updated?
SQL Server often handles an update as a delete of the old row followed by an insert of the new one, although some updates can be applied in place. When the row is rewritten, the messageText field has to be written to disk again even though its value is not changing.
Here is a blog post from Paul Randal with a good example: http://www.sqlskills.com/blogs/paul/do-changes-to-index-keys-really-do-in-place-updates/

If I place a composite index on three columns and use them in the same query but in different places, will it still be effective?

With the following table and index:
CREATE TABLE [Ticket]
(
[Id] BIGINT IDENTITY NOT NULL,
[Title] CHARACTER VARYING(255) NOT NULL,
[Description] CHARACTER VARYING(MAX) NOT NULL,
[Severity] INTEGER NOT NULL,
[Priority] INTEGER NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL,
PRIMARY KEY([Id])
);
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON [Ticket]([Priority], [Severity], [CreatedOn]);
Will the following query:
SELECT [Id]
FROM [Ticket]
WHERE [Priority] = 1
ORDER BY [Severity] DESC, [CreatedOn] ASC
make use of the entire composite index or only utilize the [Priority] part of the index?
I know that for a query that had all of the columns in the WHERE clause, the whole index would be used. I am unsure about the above case though!
Given the actual execution plan below, on a table with no statistics, I am not sure how to interpret it.
It does look like it used the index, but which parts? There is clearly a sort cost, but is that sorting by [Severity] and then [CreatedOn] after doing a seek on [Priority]?
It may use the index, but it will only use the Priority part efficiently, because the index is sorted in a way that is not optimal for the query:
ORDER BY [Severity] DESC, [CreatedOn] ASC
vs
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity], [CreatedOn]);
As you can see in this fiddle if you click the execution plan, the query is split into an index seek and a sort.
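Since the index stores Severity ascending while the query wants Severity descending and CreatedOn ascending, the index won't be (optimally) used for the sort; reading it backwards would reverse CreatedOn as well. If you really want an optimal sort, index Severity descending, as your query uses it: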
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity] DESC, [CreatedOn]);
A SQL Fiddle with the fixed index. Note that the whole query is now an index seek.
Note that the plan may look different for you depending on your data, but in general an index sorted the same way the query reads it will be used more effectively.
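The effect of matching the index sort direction to the ORDER BY can be reproduced without SQL Server. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite reports an extra sort step as a "TEMP B-TREE" in its plan, which plays the role of the Sort operator in the SQL Server plan. The index names IX_asc and IX_desc, the reduced column list, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Ticket (
        Id        INTEGER PRIMARY KEY,
        Priority  INTEGER NOT NULL,
        Severity  INTEGER NOT NULL,
        CreatedOn TEXT    NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO Ticket (Priority, Severity, CreatedOn) VALUES (?, ?, ?)",
    [
        (1, 3, "2020-01-02"),  # Id 1
        (1, 5, "2020-01-03"),  # Id 2
        (1, 5, "2020-01-01"),  # Id 3
        (2, 9, "2020-01-04"),  # Id 4, filtered out by Priority = 1
    ],
)

query = (
    "SELECT Id FROM Ticket WHERE Priority = 1 "
    "ORDER BY Severity DESC, CreatedOn ASC"
)


def plan():
    """Return the EXPLAIN QUERY PLAN detail text for the query above."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(row[3] for row in rows)


# Index with every column ascending: usable for the filter, not for the sort.
conn.execute("CREATE INDEX IX_asc ON Ticket (Priority, Severity, CreatedOn)")
plan_asc = plan()

# Index whose column directions match the ORDER BY.
conn.execute("DROP INDEX IX_asc")
conn.execute("CREATE INDEX IX_desc ON Ticket (Priority, Severity DESC, CreatedOn)")
plan_desc = plan()

ids = [row[0] for row in conn.execute(query)]

print("all ascending:", plan_asc)
print("Severity DESC:", plan_desc)
print(ids)
```

With the all-ascending index the plan should include a temporary b-tree for the ORDER BY; with the matching index that step should disappear while the result order stays the same.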

SQL Server why index is not used

I have the following table in a SQL Server 2008 database:
CREATE TABLE [dbo].[Actions](
[ActionId] [int] IDENTITY(1,1) NOT NULL,
[ActionTypeId] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Description] [nvarchar](1000) NOT NULL,
[Comment] [nvarchar](500) NOT NULL,
[Created] [datetime] NOT NULL,
[Executed] [datetime] NULL,
[DisplayText] [nvarchar](1000) NULL,
[ExecutedBy] [int] NULL,
[Result] [int] NULL,
CONSTRAINT [PK_Actions] PRIMARY KEY CLUSTERED
(
[ActionId] ASC
)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Actions_Executed] ON [dbo].[Actions]
(
[Executed] ASC,
[ExecutedBy] ASC
)
There are 20,000 rows whose Executed date equals '2500-01-01' and 420,000 rows whose Executed date is < '2500-01-01'.
When I execute the query
select ActionId, Executed, ExecutedBy, DisplayText from Actions
where Executed='2500-01-01'
the query plan shows that a clustered index scan on PK_Actions is performed and the index IX_Actions_Executed is not used at all.
Oddly, I also get a missing index hint that says:
/* The Query Processor estimates that implementing the following index could improve the query cost by 99.9901%.
*/
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Actions] ([Executed])
But the index is already there.
Why is the index not used if it would select only 5% of the data?
Most likely, the query optimizer sees that you're selecting DisplayText as well, so for each of the 20,000 rows found in the NC index there would need to be a key lookup into the clustered index to get that data, and key lookups are expensive operations! So in the end, it may simply be more efficient to scan the clustered index right away.
I bet if you run this query here:
select ActionId, Executed, ExecutedBy
from Actions
where Executed='2500-01-01'
then the NC index will be used.
If you really need the DisplayText and that's a query you'll run frequently, maybe you should include that column in the index as an extra column in the leaf level:
DROP INDEX [IX_Actions_Executed] ON [dbo].[Actions]
CREATE NONCLUSTERED INDEX [IX_Actions_Executed]
ON [dbo].[Actions]([Executed] ASC, [ExecutedBy] ASC)
INCLUDE([DisplayText])
This would make your NC index a covering index, i.e. it could return all columns needed for your query. If you run your original query again with this covering index in place, I'm pretty sure SQL Server's query optimizer will indeed use it. The probability that any NC index will be used increases significantly if it is a covering index, i.e. if the query can get all the columns it needs from the NC index alone, without key lookups.
The missing index hints are a bit misleading at times. There are also known bugs that lead SQL Server Management Studio to continuously recommend indexes that are already in place. Don't bet too much of your money on those index hints!
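The difference between an index that needs a lookup and a covering index can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in, with two caveats: SQLite has no INCLUDE clause, so DisplayText is appended as a trailing key column instead, and without statistics SQLite will not show the cost-based switch to a full scan that SQL Server made in the question. The reduced column list is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Actions (
        ActionId    INTEGER PRIMARY KEY,
        Executed    TEXT,
        ExecutedBy  INTEGER,
        DisplayText TEXT
    )
    """
)

query = (
    "SELECT ActionId, Executed, ExecutedBy, DisplayText "
    "FROM Actions WHERE Executed = '2500-01-01'"
)


def plan():
    """Return the EXPLAIN QUERY PLAN detail text for the query above."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " | ".join(row[3] for row in rows)


# Original index: DisplayText must be fetched from the table for every match.
conn.execute("CREATE INDEX IX_Actions_Executed ON Actions (Executed, ExecutedBy)")
plan_lookup = plan()

# Rebuilt index that also carries DisplayText (SQLite substitute for INCLUDE).
conn.execute("DROP INDEX IX_Actions_Executed")
conn.execute(
    "CREATE INDEX IX_Actions_Executed "
    "ON Actions (Executed, ExecutedBy, DisplayText)"
)
plan_covering = plan()

print("with lookups:", plan_lookup)
print("covering:    ", plan_covering)
```

SQLite marks the second plan as using a COVERING INDEX, meaning the table itself is never touched; that is the property that makes the per-row lookups disappear.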

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
This statement takes 37 seconds to return 1050 rows:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it failed, complaining that there is a duplicate entry somewhere, which is to be expected.
Any insights are welcome!
You are requesting all columns (*) over a non-covering index (on Created). On a large data set you are guaranteed to hit the index tipping point, where a clustered index scan is more efficient than a nonclustered index range seek plus bookmark lookups.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
If not, then consider changing your query to a coverable query, e.g. select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this will add extra storage space and log write I/O.
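The suggestion to make Created the leftmost clustered key can be illustrated with a small comparison. The sketch below uses Python's built-in sqlite3 module as a stand-in, where a WITHOUT ROWID table stored in primary key order plays the role of a clustered index; SQL Server's own plans and costs will differ. The table names HistoryById and HistoryByCreated, the reduced column list, and the ISO date literal are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stored in order of the GUID-like key, as in the question.
    CREATE TABLE HistoryById (
        HistoryRequestID TEXT NOT NULL PRIMARY KEY,
        Created          TEXT NOT NULL,
        UserAgent        TEXT NOT NULL
    ) WITHOUT ROWID;

    -- Stored in order of Created; the ID only keeps the key unique.
    CREATE TABLE HistoryByCreated (
        HistoryRequestID TEXT NOT NULL,
        Created          TEXT NOT NULL,
        UserAgent        TEXT NOT NULL,
        PRIMARY KEY (Created, HistoryRequestID)
    ) WITHOUT ROWID;
    """
)


def plan(table):
    """Return the plan for the date-range query against the given table."""
    sql = (
        "EXPLAIN QUERY PLAN SELECT * FROM " + table +
        " WHERE Created > '2009-10-09' ORDER BY Created DESC"
    )
    return " | ".join(row[3] for row in conn.execute(sql))


plan_by_id = plan("HistoryById")
plan_by_created = plan("HistoryByCreated")

print("keyed on ID:     ", plan_by_id)
print("keyed on Created:", plan_by_created)
```

When the rows are stored in Created order, the range filter and the ORDER BY are both satisfied by walking the key, so all columns come back without a separate sort or per-row lookups.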
Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your select statement; but omit the XML. See how that affects the return time for results.
For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
OTOH, the foreign key suggests frequent querying by Campaign, in which case having the clustering done by that column could make sense, and would also probably do a better job of scattering the inserted keys in the indexes - both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.
Rebuild indexes. Use the WITH (NOLOCK) hint after the table names where appropriate; this probably applies if you want to run long(ish) queries against tables that are heavily used in a live environment (such as a log table). It basically means your query might miss some of the very latest records, or read rows that have not been committed yet, but you also aren't holding locks on the table, which would create additional overhead.