Fields after "ORDER BY" or "WHERE" and index in MySQL - sql

Do Fields after "ORDER BY" or "WHERE" might have index (PRIMARY, UNIQUE, INDEX) in mysql?
Consider a table with the following columns:
ID | AddedDate | CatID | Title | Description | Status | Editor
In these queries, are ID, AddedDate and CatID might have index?
SELECT *
FROM table WHERE ID = $id
SELECT *
FROM table
ORDER BY ID
SELECT *
FROM table
ORDER BY AddedDate
SELECT *
FROM table
ORDER BY CatID

You can order by any field. Please clarify our question if you want to know more / something else.
You might want to read ORDER BY optimization. There it says that fields with index might even improve the sorting as no extra has to be done (in the optimal case).
Update:
Yes, you can add an index if you want (if this is what you mean, it is still not clear as OMG Ponies points out). In general it is to say that you should add an index to those fields that you often use in WHERE clauses.

As far as I know, there are three basic ways to order rows:
In-memory sort: Read all rows into memeory and sort them. Very fast.
Using sorted index: Read one row at a time, looking up the columns that are not in the index in the base table.
File sort: Build a sort order by reading a part of the table at a time. This is really slow.
For tables that fit in memory, MySQL will probably choose option 1. That means it won't use an index even if it's present. The index will just be overhead.
But indexes shine for bigger tables. If the table is too big for memory, MySQL can avoid the painful file sort and rely on the index.
These days, memory is plentiful, and tables almost always fit in memory. I would only add indexes for ordering after I saw a file sort happening.

One of the main benefits of having an index that it lets you select only that subset of rows you're interested in. The alternative to using an index is to do a "full table scan".
Unless you have a "where" clause, you're not really going to get much benefit from having indexes.

Related

Strange behavior when doing where and order by query in postgres

Background: A large table, 50M+, all column in query is indexed.
when I do a query like this:
select * from table where A=? order by id DESC limit 10;
In statement, A, id are both indexed.
Now confusing things happen:
the more rows where returned, the less time whole sql cost
the less rows where returned, the more time whole sql cost
I have a guess here: postgres do the order by first, and then where , so it cost more time to find 10 row in the orderd index when target rowset is small(like find 10 particular sand on beach); oppositeļ¼Œ if target rowset is large, it's easy to find the first 10.
Is it right? Or there are some other reason for this?
Final question: How to optimize this situation?
It can either use the index on A to apply the selectivity, then sort on "id" and apply the limit. Or it can read them already in order using the index on "id", then filter out the ones that meet the A condition until it finds 10 of them. It will choose the one it thinks is faster, and sometimes it makes the wrong choice.
If you had a multi-column index, on (A,id) it could use that one index to do both things, get the selectivity on A and still fetch the already in order by "id", at the same time.
Do you know PGAdmin? With "explain verbose" before your statement, you can check how the query is executed (meaning the order of the operators). Usually first happens the filter and only afterwards the sorting...

Getting RID Lookup instead of Table Scan?

SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SqlFiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows)
Otherwise the query plan shown in SQL Fiddle is identical, the only difference being that for the second query the RID Lookup in SQL Fiddle, is a Table Scan in production :(
I would like to understand possibilities that would cause this behavior? What kinds of things would you look at to help determine the cause of it using a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Detail column, then the performance of both queries is very close in production. It is only after adding other columns like Detail that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used an RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but would like to have a understanding of what is going on now before I start shooting in the dark with random indexes.
What if you would change your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself....
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the 'Details' column is included. This is an almost sure sign that either the 'Details' column is not part of an index, or if it is part of an index, the data in that column is mostly unique such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path that will be chose by the optimiser, you would need to look at possible indexes that could be added/modified to make an index access more efficient, but this should be done with care as it can adversely affect other queries as well as possibly degrading insert performance.
The other important activity you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency that is appropriate to the rate of change of the frequency distribution in the table data
If it's true that 99% of the rows would be omitted if it performed the query using the relevant index + RID then the likeliest problem in your production environment is that your statistics are out of date and the optimiser doesn't realise that ForeignID in (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint - if you know the index that your plan should be using which contains the ID and ForeignID columns then stick that in your query as a hint and force SQL optimiser to use the index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);

Avoid "SELECT TOP 1" and "ORDER BY" in Queries

I have the very table in sql server 2008 with lot of data
|ID|Name|Column_1|Column_2|
|..|....|........|........|
more than 18,000 records. So i need to the the row with the lowest value of Column_1 that is date but could by any data type (that is unsorted) so I use these sentence
SELECT TOP 1 ID, Name from table ORDER BY Column_1 ASC
But this is very very slow. And i think that i don't need to to sort the whole table. My question es how to get the same date with out using TOP 1 and ORDER BY
I cannot see why 18,000 rows of information would cause too much of a slow down, but that is obviously without seeing what the data is you are storing.
If you are regularly going to be using the Column_1 field, then I would suggest you place a non-clustered index on it... that will speed up your query.
You can do it by "designing" your table via Sql Server Management Studio, or directly via TSQL...
CREATE INDEX IX_myTable_Column_1 ON myTable (Column_1 ASC)
More information on MSDN about creating indexes here
Update thanks to comments by #GarethD who helped me with this, as I wasn't actually aware of it.
As an extra part of the above TSQL statement, it will increase the speed of your queries if you include the names of the other columns that will be used within the index....
CREATE INDEX IX_myTable_Column_1 ON myTable (Column_1 ASC) INCLUDE (ID, Name)
As GarethD points out, using this SQLFiddle as proof, the execution plan is much quicker as it avoids a "RID" (or Row Identifier) lookup.
More information on MSDN about creating indexes with include columns here
Thank you #GarethD
Would this work faster? When I read this question, this was the code that came to mind:
Select top 1 ID, Name
from table
where Column_1 = (Select min(Column_1) from table)

SQL Server index included columns

I need help understanding how to create indexes. I have a table that looks like this
Id
Name
Age
Location
Education,
PhoneNumber
My query looks like this:
SELECT *
FROM table1
WHERE name = 'sam'
What's the correct way to create an index for this with included columns?
What if the query has a order by statement?
SELECT *
FROM table1
WHERE name = 'sam'
ORDER BY id DESC
What if I have 2 parameters in my where statement?
SELECT *
FROM table1
WHERE name = 'sam'
AND age > 12
The correct way to create an index with included columns? Either via Management Studio/Toad/etc, or SQL (documentation):
CREATE INDEX idx_table_1 ON db.table_1 (name) INCLUDE (id)
What if the Query has an ORDER BY
The ORDER BY can use indexes, if the optimizer sees fit to (determined by table statistics & query). It's up to you to test if a composite index or an index with INCLUDE columns works best by reviewing the query cost.
If id is the clustered key (not always the primary key though), I probably wouldn't INCLUDE the column...
What if I have 2 parameters in my where statement?
Same as above - you need to test what works best for your query. Might be composite, or include, or separate indexes.
But keep in mind that:
tweaking for one query won't necessarily benefit every other query
indexes do slow down INSERT/UPDATE/DELETE statements, and require maintenance
You can use the Database Tuning Advisor (DTA) for index recommendations, including when some are redundant
Recommended reading
I highly recommend reading Kimberly Tripp's "The Tipping Point" for a better understanding of index decisions and impacts.
Since I do not know which exactly tasks your DB is going to implement and how many records in it, I would suggest that you take a look at the Index Basics MSDN article. It will allow you to decide yourself which indexes to create.
If ID is your primary and/or clustered index key, just create an index on Name, Age. This will cover all three queries.
Included fields are best used to retrieve row-level values for columns that are not in the filter list, or to retrieve aggregate values where the sorted field is in the GROUP BY clause.
If inserts are rare, create as much indexes as You want.
For first query create index for name column.
Id column I think already is primary key...
Create 2nd index with name and age. You can keep only one index: 'name, ag'e and it will not be much slower for 1st query.

Avoiding a Full Table Scan in MySQL

How can I avoid a full table scan on mysql?
In general, by making sure you have a usable index on fields that appear in WHERE, JOIN and ORDER BY clauses.
Index your data.
Write queries that use those indexes.
Anything more than that we need specifics.
Also note that sometimes you just can not rid of a full table scan, i.e. When you need all the rows from your table... or when the cost of scanning the index is gt the cost of scanning the full table.
Use a LIMIT clause when you know how many rows you are expecting to return, for example if you are looking for a record with a known ID field that is unique, limit your select to 1, that way mysql will stop searching after it finds the first record. The same goes for updates and deletes.
SELECT * FROM `yourTable` WHERE `idField` = 123 LIMIT 1