SQL Server 2008 seems to be picking the PK Index for every query, even if a better one seems to exist - sql

It sounds like a similar situation to what's asked here, but I'm not sure his details are the same as mine.
Basically I have a relational table, we'll call it User:
User
-----------
int Id
varchar(100) Name
int AddressId
varchar(max) Description
and it has the following indices:
PK_User_Id - Obviously the primary key.
IX_User_AddressId - which includes only the AddressId.
When I run the following query:
select Id, Name, AddressId, Description from User where AddressId > 200
The execution plan shows that a scan was done, and PK_User_Id was used.
If I run this query:
select AddressId from User where AddressId > 200
The execution plan shows that a scan was done and IX_User_AddressId was used.
If I include all of the columns in the IX_User_AddressId index, then my original query will use the proper index, but it still seems wrong that I'd have to do that.
So my SQL noob question is this: what in the world do I have to do to get my queries to use the fastest index? Be very specific, because I can't figure this out on my own.

Your query looks like it has tipped: since your index does not cover all the fields you want, the optimizer has fallen back to scanning the primary key index (check out Kimberly Tripp's article on the Tipping Point), which I'd take a pretty good guess is your clustered index.

When your IX_User_AddressId index contains only the AddressId, SQL must perform bookmark lookups on the base table to retrieve your other columns (Id, Name, Description). If the table is small enough, SQL may decide it is more efficient to scan the entire table rather than using an alternate index in combination with bookmark lookups. When you add those other columns to your index, you create what is called a covering index, meaning that all of the columns necessary to satisfy your query are available in the index itself. This is a good thing as it will eliminate the bookmark lookups.
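As a sketch (reusing the table and column names from the question), SQL Server lets you add the extra columns as non-key INCLUDE columns, so the index covers the query without bloating the index key itself:

```sql
-- Covering-index sketch: AddressId stays the only search key,
-- the other columns ride along at the leaf level of the index.
CREATE NONCLUSTERED INDEX IX_User_AddressId_Covering
ON [User] (AddressId)
INCLUDE (Name, Description);
-- Id comes along for free: the clustered key (PK_User_Id)
-- is stored in every nonclustered index row anyway.

-- The original query can now be answered entirely from the index:
SELECT Id, Name, AddressId, Description
FROM [User]
WHERE AddressId > 200;
```

With this in place the plan should show a seek (or a much cheaper scan) on the new index and no bookmark lookups.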

Related

MariaDB Indexing

Let's say I have a table of 200,000,000 users. For each user I have saved a certain attribute. Let it be their lastname.
I am unsure of which index type to use with MariaDB. The only queries made to the database will be in the form of SELECT lastname FROM table WHERE username='MYUSERNAME'.
Is it therefore best to just define the column username as a primary key, or do I need to do anything else? Also, how long is it going to take until the index is built?
Sorry for this question, but this is my first database with more than 200,000 rows.
I would go with:
CREATE INDEX userindex on `table`(username);
This will index the usernames, since that is what your query is searching on, and will speed up the results coming back because the username column is indexed.
Try it, and if it reduces performance just delete the index; nothing is lost (although make sure you do have backups! :))
This article will help you out https://mariadb.com/kb/en/getting-started-with-indexes/
It says primary keys are best set at table creation and as I guess yours is already in existence that would mean either copying it and creating a primary key or just using an index.
I recently indexed a 57-million-row table with non-unique strings as an ID, and although it took a few minutes to build the index, query performance was greatly improved.
-EDIT- Just re-read and thought it was 200,000 as mentioned at the end, but I see it is 200,000,000 in the title; that's a hella lotta rows.
username sounds like something that is "unique" and not null. So, make it NOT NULL and have PRIMARY KEY(username), without an AUTO_INCREMENT surrogate PK.
If it is not unique, or cannot be NOT NULL, then INDEX(username) is very likely to be useful.
To design indexes, you must first know what queries you will be performing. (If you had called it simply "col1", I would not have been able to guess at the above advice.)
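As a sketch of that advice (table and column names assumed from the question; MariaDB/InnoDB syntax), the natural-key layout would look like:

```sql
-- username as the natural primary key: with InnoDB the PK is the
-- clustered index, so the lookup and the row live in the same B+Tree.
CREATE TABLE users (
    username VARCHAR(64)  NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    PRIMARY KEY (username)
) ENGINE=InnoDB;

-- The only query the question mentions then becomes a single PK lookup:
SELECT lastname FROM users WHERE username = 'MYUSERNAME';
```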
There are 3 index types:
BTree (actually B+Tree; see Wikipedia). This is the default and the most commonly used index type. It is efficient at finding a row given a specific value (WHERE user_name = 'joe'). It is also useful for a range of values (WHERE user_name LIKE 'Smith%').
FULLTEXT is useful for a TEXT column where you want to search for "words" inside it.
SPATIAL is useful for 2-dimensional data, such as geographical points on a map or other type of grid.
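A minimal sketch of the two non-BTree types (the `bio` and `home_location` columns are made up for illustration; SPATIAL indexes require a NOT NULL geometry column, so on a populated table you would need to backfill values first):

```sql
-- FULLTEXT: word search inside a TEXT column
ALTER TABLE users
    ADD COLUMN bio TEXT,
    ADD FULLTEXT INDEX ft_users_bio (bio);
SELECT username FROM users
WHERE MATCH(bio) AGAINST('database indexing');

-- SPATIAL: 2-dimensional lookups on a geometry column
ALTER TABLE users
    ADD COLUMN home_location POINT NOT NULL,
    ADD SPATIAL INDEX sp_users_home (home_location);
```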

Define PK on multi tenant environment on SQL Server

I see different opinions about choosing the PK in a multi-tenancy environment. Let's say I have a table Employees. I created my Employee table like this:
EmployeeId INT IDENTITY PRIMARY KEY,
TenantId INT,
FirstName NVARCHAR(100),
LastName NVARCHAR(100)
I know I need to use the TenantId in all my queries, so in addition I created a nonclustered index on TenantId, so I can write queries like this:
In case I need all Employees for one specific Tenant:
Select EmployeeId, FirstName, LastName
from Employee
where TenantId = 1
In case I need one Employee for one specific Tenant:
Select EmployeeId, FirstName, LastName
from Employee
where EmployeeId = 1 and TenantId = 1
Testing with about 100,000 records and a single Tenant on the Employee table for now, I get a full scan on the first query (I guess that's normal: even with a nonclustered index defined on TenantId, there is only one Tenant in the table, so it needs to scan the whole table) and an index seek on the second one.
Is this a good approach, or do I need to add the TenantId to the clustered index too?
There is no simple answer to your question. You seem to have a low-cardinality column and a desire to query on that column; as a consequence, you will be returning many rows. You have observed this in the case of one value, which matches all the rows.
If you had 5 tenants randomly distributed in the 100,000 rows, then SQL Server would probably still do a full table scan, because it expects that all pages would have at least one of the records you are looking for. This is why non-clustered indexes work best on columns with high cardinality (which implies that few rows have any given value).
With a clustered index on tenant, then you will find all the rows in 1/5 of the pages. The query should be faster. However, the query is still returning a lot of data, so it is an open question whether the faster table scan is much of an overall benefit.
And, this comes at a cost. INSERTs no longer occur at the end of the table, so page splits become much more common. UPDATEs to tenant require deleting and re-inserting data, rather than modifying the record in place (and that additional work can have locking implications). These can be important considerations.
A common case where clustered indexes are useful on a low-cardinality column is the "most-recent data" problem. If you have a table and only 1% is the most recent data (or valid or whatever), then a clustered index on that column can be a big win.
Finally, if tenantid really is low cardinality, you might consider partitioning the table by this column. This might give you the best of both worlds, at least for the two queries that you suggest.
PRIMARY KEY is simply a tag to indicate that this is the main lookup key on the table! The real question is which key you make the CLUSTERED index. Personally, from what you've posted, what I've seen in the past, and knowing the engine, I'd cluster on (TenantId, EmployeeId) but also add a nonclustered unique key on EmployeeId, which is your surrogate key. Because you cluster on TenantId, and given the nature of B+ trees, each tenant's rows will be stored together (in order). It also lends itself to partitioning later on...
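A sketch of that layout, reusing the Employee table from the question (constraint names are my own):

```sql
-- Cluster on (TenantId, EmployeeId) so each tenant's rows are contiguous;
-- keep a nonclustered UNIQUE constraint on the surrogate EmployeeId
-- so single-row lookups by EmployeeId alone still get a seek.
CREATE TABLE Employee
(
    EmployeeId INT IDENTITY NOT NULL,
    TenantId   INT NOT NULL,
    FirstName  NVARCHAR(100),
    LastName   NVARCHAR(100),
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (TenantId, EmployeeId),
    CONSTRAINT UQ_Employee_EmployeeId UNIQUE NONCLUSTERED (EmployeeId)
);
```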

SQL design for performance

I am new to SQL and I have a basic question about performance.
I want to create a users database which will store information about my users:
Id
Log in name
Password
Real name
Later I want to perform a SELECT query on: Id, Log in name and Real name.
What would be the best design for this database, what tables and what keys should I create?
If it's only about those 4 fields, it looks like just one table: primary key on ID, unique index on LoginName. You may not want to store the password itself, but only a hash.
Depending on your queries, create different indexes. Furthermore, you may not need the ID field at all.
UPDATE:
Creating an index on certain column(s) enables the database to optimize its SQL statements. Given your user table:
USER
USER_ID BIGINT NOT NULL
LOGIN_ID VARCHAR(<size>) NOT NULL
PASSWORD VARCHAR(<size>) NOT NULL
NAME VARCHAR(<size>) NOT NULL
CONSTRAINT PK_USER PRIMARY KEY ( USER_ID )
The databases I know will automatically create an index on the primary key, which in effect means that the database maintains an optimized lookup table; see Wikipedia for further details.
Now say you want to query users by LOGIN_ID, which I guess is a fairly common use case; you can create another index like:
CREATE INDEX I_USER_1 ON USER ( LOGIN_ID asc )
The above index will optimize the select * from USER where LOGIN_ID='foo'. Furthermore, you can create a unique index instead, assuming that you do not want duplicate LOGIN_IDs:
CREATE UNIQUE INDEX UI_USER_1 ON USER ( LOGIN_ID asc )
That's the whole story: if you want to optimize a query for the user's real name (NAME), you just create another index:
CREATE INDEX I_USER_2 ON USER ( NAME asc )
Just to add to @homes' answer: you should work out what sort of queries you will be running and then optimize for those. For example, if you are doing a lot of writes and not as many reads, having lots of indexes can cause performance issues. It's a bit like tuning a car engine: are you going to be going quickly down a drag strip, or are you tuning it for driving long distances?
Anyway you also asked about the NAME column. If you are going to be matching on a varchar column it might be worth investigating the use of FULLTEXT Indexes.
http://msdn.microsoft.com/en-us/library/ms187317.aspx
This allows you to do optimized searches on names where you might be matching parts of a name and the like. As @homes' answer said, it really does depend on what your queries are and what your intent is when writing them.
It might be worth creating the table and using the query execution plan in something like SQL Server Management Studio against your queries, to see what impact your indexes have on the number of rows read and the sorts of lookups that are happening.
http://www.sql-server-performance.com/2006/query-execution-plan-analysis/

designing index for a web-cms database

I have a question, would you please help me?
I have designed a database for a web CMS. In the User table (which includes UserID, Username, Password, FirstName, LastName, …), which is the best choice to create the index on: Username, or FirstName and LastName? Or both of them?
By default UserID is the clustered index of the User table, so the next index must be nonclustered. But I am not sure about UserID as the clustered index: since this is a web site and many users can register or remove their accounts every day, is it a good choice to create the clustered index on UserID?
I am using SQL Server 2008.
You should define clustered indexes on fields that are often requested sequentially, contain a large number of distinct values, or are used in queries to join tables. That usually means the primary key is a good candidate.
Nonclustered indexes are good for fields that are used in the WHERE clause of queries.
Deciding which fields to create indexes on is very specific to your application. If you have critical queries that use the first name and last name fields, then I would say yes; otherwise it may not be worth the effort.
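If those name queries do turn out to be critical, a sketch (assuming the User table columns from the question) would be:

```sql
-- Nonclustered index supporting lookups by last name,
-- or by last name + first name together.
CREATE NONCLUSTERED INDEX IX_User_LastName_FirstName
ON [User] (LastName, FirstName);

-- Both of these can seek on that index (UserID comes along for free
-- because the clustered key is stored in every nonclustered index row):
SELECT UserID, Username FROM [User] WHERE LastName = N'Smith';
SELECT UserID, Username FROM [User]
WHERE LastName = N'Smith' AND FirstName = N'Anna';
```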
In terms of persons removing their accounts I am sure that you do not intend to delete the row from the table. Usually you just mark these as inactive because what happens to all the other related tables that may be affected by this user?

Clustered index - multi-part vs single-part index and effects of inserts/deletes

This question is about what happens with the reorganizing of data in a clustered index when an insert is done. I assume that it should be more expensive to do inserts on a table which has a clustered index than one that does not because reorganizing the data in a clustered index involves changing the physical layout of the data on the disk. I'm not sure how to phrase my question except through an example I came across at work.
Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:
--drop table Junk1
CREATE TABLE Junk1
(
Name char(5),
Something char(5),
WhoCares int
)
CREATE CLUSTERED INDEX IX_Name ON Junk1
(
Name
)
CREATE NONCLUSTERED INDEX IX_Name_Something ON Junk1
(
Name, Something
)
Now when I looked at the two indexes, it seemed that IX_Name was redundant, since IX_Name_Something can be used by any query that searches by Name. So I would eliminate IX_Name and make IX_Name_Something the clustered index instead:
--drop table Junk2
CREATE TABLE Junk2
(
Name char(5),
Something char(5),
WhoCares int
)
CREATE CLUSTERED INDEX IX_Name_Something ON Junk2
(
Name, Something
)
Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes (assume that there is no need to worry about updates for Name and Something). Would that make sense? I think the second indexing method would be better since it means one less index needs to be maintained.
I would appreciate any insight into this specific example or directing me to more info on maintenance of clustered indexes.
Yes, inserting into the middle of an existing table (or one of its pages) can be expensive when you have a less-than-optimal clustered index. The worst case is a page split: half the rows on the page have to be moved elsewhere, and all indexes (including nonclustered indexes on that table) need to be updated.
You can alleviate that problem by using the right clustered index - one that ideally is:
narrow (only a single field, as small as possible)
static (never changes)
unique (so that SQL Server doesn't need to add 4-byte uniqueifiers to your rows)
ever-increasing (like an INT IDENTITY)
You want a narrow key (ideally a single INT), since each and every entry in every nonclustered index also contains the clustering key(s); you don't want to put lots of columns in your clustering key, nor things like VARCHAR(200)!
With an ever-increasing clustered index, you will never see a page split caused by inserts. The only fragmentation you could encounter is from deletes (the "swiss cheese" problem).
Check out Kimberly Tripp's excellent blog posts on indexing - most notably:
GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues... - this one actually shows that a good clustered index will speed up all operations - including inserts, delete etc., compared to a heap with no clustered index!
Ever-increasing clustering key - the Clustered Index Debate..........again!
Assume there is a table (Junk) and there are two queries that are done on the table, the first query searches by Name and the second query searches by Name and Something. As I'm working on the database I discovered that the table has been created with two indexes, one to support each query, like so:
That's definitely not necessary: if you have one index on (Name, Something), that index can just as well be used if you search and restrict on just WHERE Name = 'abc'. Having a separate index with only the Name column is not needed and only wastes space (and costs time to keep up to date).
So basically, you only need a single index on (Name, Something), and I would agree with you: if you have no other indices on this table, you could make this the clustered key. However, since that key won't be ever-increasing and could possibly change, too (right?), this might not be such a great idea.
The other option would be to introduce a surrogate ID INT IDENTITY and cluster on that - with two benefits:
it's all a good clustered key should be, including ever-increasing -> you'll never have any issues with page splits and performance for INSERT operations
you still get all the benefits of having a clustering key (see Kim Tripps' blog posts - clustered tables are almost always preferable to heaps)
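As a sketch of that surrogate-key variant against the Junk table from the question (table and index names are my own):

```sql
-- Cluster on an ever-increasing IDENTITY to avoid page splits on insert;
-- keep one nonclustered index covering both search patterns.
CREATE TABLE Junk3
(
    Id        INT IDENTITY NOT NULL,
    Name      char(5),
    Something char(5),
    WhoCares  int,
    CONSTRAINT PK_Junk3 PRIMARY KEY CLUSTERED (Id)
);

-- Serves both "WHERE Name = ..." and "WHERE Name = ... AND Something = ..."
CREATE NONCLUSTERED INDEX IX_Junk3_Name_Something
ON Junk3 (Name, Something);
```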
Someone suggested that the first indexing scheme should be kept since it would result in more efficient inserts/deletes
That's a bogus claim: ordered data is ordered data, and the same IO would be performed either way. You can measure it yourself:
SET STATISTICS IO ON
-- your insert statement here
You can have only one clustered index per table (it may span several columns, as above), so choose the columns your app will mostly be querying on, like wildcard queries on customer full names, etc. (see discussion)