I have two indexed tables of a few million rows each. I'm looking to convert one of the indexes to DESC order to optimize some operations. However, will that affect join speed or other optimizations?
For example:
Table A:
a_id (pk)
Table B:
b_id (pk)
a_id (fk)
If the index on A.a_id is DESC and the index on B.a_id is ASC, will I encounter any problems or slowness on joins? Will Oracle be able to use the indexes for joining even though they have different sort orders? Do I have to make B.a_id DESC as well, or create a second index that is DESC? Obviously I'd like to try a simple experiment, but I don't have DBA access or a spare Oracle setup to work with.
> Will Oracle be able to use the indexes for joining even though they have different sort orders?
Indexes are not used "for joining". They're used to access data. The row sources thus created are then joined. The only reason I can think of that the sort order of the index would have any impact on joining would be if a merge join is occurring and the index is being used to avoid sorting. In this case, the impact of changing to a descending index might be that the data needs to be sorted in memory after it is accessed; or it might not, if the optimizer is intelligent enough to simply walk through that data in reverse order when doing the merge.
If you have queries whose execution plans rely on using the index on A.A_ID to get the data in ascending order (either for purposes of a merge join or to meet your requested ordering of the results), then changing the index to descending order could have an impact.
Edit: Just did a quick test on some sample data. The optimizer does seem to have the capability to merge row sources sorted in opposite orders without re-sorting either of them. So at the most obvious level, having one index ascending and the other descending should not cause serious performance problems. However, it does look like descending indexes can have other effects on the execution plan -- in my case, the ascending index was used for a fast full scan, while the descending one was used for a range scan. This could cause changes in query performance -- good or bad -- but the only way to know for certain is to test it.
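For anyone who wants to reproduce this kind of test, here's a rough sketch in Oracle SQL (table, column, and index names are made up, and the row counts are arbitrary):

CREATE TABLE a AS
  SELECT level AS a_id FROM dual CONNECT BY level <= 100000;
CREATE TABLE b AS
  SELECT level AS b_id, MOD(level, 1000) + 1 AS a_id
  FROM dual CONNECT BY level <= 100000;

-- NOT NULL lets the optimizer walk the indexes instead of sorting
ALTER TABLE a MODIFY a_id NOT NULL;
ALTER TABLE b MODIFY a_id NOT NULL;

CREATE INDEX a_desc_idx ON a (a_id DESC);
CREATE INDEX b_asc_idx ON b (a_id ASC);

-- force a merge join and look for extra SORT ORDER BY steps in the plan
EXPLAIN PLAN FOR
  SELECT /*+ USE_MERGE(a b) */ a.a_id, b.b_id
  FROM a JOIN b ON a.a_id = b.a_id;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);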
Oracle's B-tree index leaf blocks are doubly linked, so it makes no difference whether you specify an ASC or DESC index for a single column.
DESC indexes are a special case that helps when you have a multi-column index, e.g. if I have a query that often orders by colA ASC, colB DESC, then I might decide to add an index on (colA, colB DESC) in order to avoid a sort.
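A minimal sketch of that situation (t, colA, and colB are hypothetical):

CREATE INDEX t_mixed_idx ON t (colA ASC, colB DESC);

-- this can be answered by walking the index in order, with no sort step:
SELECT colA, colB FROM t ORDER BY colA ASC, colB DESC;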
Developing without a development and test system?
Your answer is to develop with one. Oracle runs on all platforms: just install it, add data, and do your work.
For you, just live dangerously and make the index change; who cares what happens? Grab for that brass ring. So what if you miss? You won't lose any data.
I'm not sure I get what you're trying to ask - you cannot "store" in descending or ascending order. You can fetch the results of the query and order them with an ORDER BY clause, which will sort the result set in ascending or descending order.
There is no guarantee that you're inserting any data in ascending or descending order.
Consequently, the "order" in which the data is inserted has no bearing on performance, because there is no order.
Generally speaking, an index can be scanned in either ASC or DESC order, since the two pointers in each index leaf block are sufficient to walk the leaf blocks in either direction without sorting in memory.
However, if we create an index with a DESC column definition, its structure will be much larger than a normal index's: a normal index on an incrementing key gets 90-10 block splits, whereas a DESC index gets 50-50 splits. That leads to unused space and makes the index a candidate for rebuilds, which means additional maintenance and overhead.
DESC indexes can be helpful when you have a multi-column index where one column is needed in ASC order and the other in DESC order, to avoid sorting in memory.
Early optimization is a waste of time. Just leave this problem and do the next thing. When there are 100 million rows in this table change the indexes and test what happens, until then your ten rows of data are not worth the time to "optimize".
Related
I am always wondering whether ORDER BY is efficient, because I believe it inevitably needs a whole-table scan, even if the ordering field is indexed.
For example, say I order by created_at and limit to 10 results. Because the database cannot know a priori that I will order by created_at, I think it has to sort the whole data set and return the first 10 items. Of course, if we have an index on created_at, things might be better.
However, even with an index, I think we can still run into trouble. For example, I want to sort by a function of a field, say (age^2-age-10). Even if we indexed the age field, the database cannot know a priori what function I will use, so it has to calculate that function on all rows.
Am I wrong? Anyway, could anyone explain to me the workflow behind ORDER BY?
If there is an index that is sorted in the same order as specified in the ORDER BY clause, the database will not need to perform a sort operation. The query optimizer looks for indexes that can speed up your query. It analyzes your SQL query and, in the case of ORDER BY clauses, looks for indexes that have the same order. See Indexing ORDER BY for more details.
Some database engines allow indexing computed columns, which would cover the case you mentioned.
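In PostgreSQL, for instance, that would be an expression index; people is a hypothetical table here:

CREATE INDEX people_age_expr_idx ON people ((age * age - age - 10));

-- the planner can use the index because the ORDER BY expression matches it exactly:
SELECT * FROM people ORDER BY age * age - age - 10 LIMIT 10;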
In theory, the database optimizer can take into account the limit clause when determining the query plan. This is most obviously useful with a limit 1 query, which can be implemented just by keeping track of which row has the extreme value for the columns in the order by. The same idea can be extended to larger limit sizes.
In practice, I don't think that most databases implement this optimization when the limit is larger than 1. Some may for the special case of limit 1 (or top 1 or whatever the right syntax is).
An index can be used for an ORDER BY. In general, the columns in the ORDER BY need to exactly match the leading columns of the index. SQL optimizers are generally not smart enough to recognize simple conversions. On the other hand, people who write SQL usually don't do such transformations.
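To make the created_at example from the question concrete (posts is a hypothetical table): with an index on the sort column, the database can read the first 10 index entries and stop, instead of sorting the whole table:

CREATE INDEX posts_created_at_idx ON posts (created_at);

-- reads the first 10 index entries and stops; no full sort
SELECT * FROM posts ORDER BY created_at LIMIT 10;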
I have a Rails app on a Postgres database which relies heavily on queries like this:
SELECT DISTINCT client_id FROM orders WHERE orders.total > 100
I need, essentially, the ids of all the clients who have orders which meet a certain condition. I only need the id, so I figured this is way faster than using joins.
Would I benefit from adding an index to the column "total"? I don't mind insert speed, I just need the query to run extremely fast.
I would expect the following multicolumn index to be fastest:
CREATE INDEX orders_foo_idx ON orders (total DESC, client_id);
PostgreSQL 9.2 could benefit even more. With its index-only scans feature, it could serve the query without hitting the table at all under favorable circumstances: no writes since the last VACUUM.
DESC or ASC hardly matters in this case. A B-tree index can be searched in both directions almost equally efficiently.
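If you want to verify what you're getting with the index above (and, on 9.2+, whether an index-only scan kicks in), a quick check:

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT client_id FROM orders WHERE total > 100;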
Absolutely. With no index on the total column, this query will require a table scan. With an index on the total column, it will require an index seek and key lookup. This will provide your query with huge performance gains as the size of the table grows.
> I only need the id, so I figured this is way faster than using joins.
True, though I'm not sure why you would consider using joins in the first place in this case.
As cmotley said, you're going to require an index on the total column for this query. However, optimal performance is going to depend on exactly which queries you're running. For example, for this query, with this table structure, the fastest you're going to get is to create an index like so:
CREATE INDEX IX_OrderTotals ON orders (total, client_id)
By including the client_id in the index, you create something called a covered index on the client_id column, so the database engine won't have to look up the row behind the scenes in order to fetch your data.
I have a very specific query. I tried lots of ways, but I couldn't reach the performance I want.
SELECT *
FROM items
WHERE user_id = 1
  AND (item_start < 20000 AND item_end > 30000)
I created an index on (user_id, item_start, item_end).
This didn't work, so I dropped all indexes and created new ones:
user_id, (item_start, item_end)
That didn't work either.
(user_id, item_start and item_end are int)
Edit: the database is MySQL 5.1.44, the engine is InnoDB.
UPDATE: per your comment below, you need all the columns in the query (hence your SELECT *). If that's the case, you have a few options to maximize query performance:
create (or change) your clustered index to be on item_user_id, item_start, item_end. This will ensure that as few rows as possible are examined for each query. Per my original answer below, this approach may speed up this particular query but may slow down others, so you'll need to be careful.
if it's not practical to change your clustered index, you can create a non-clustered index on item_user_id, item_start, item_end and any other columns your query needs. This will slow down inserts somewhat, and will double the storage required for your table, but will speed up this particular query.
There are always other ways to increase performance (e.g. by reducing the size of each row) but the primary way is to decrease the number of rows which must be accessed and to increase the % of rows which are accessed sequentially rather than randomly. The indexing suggestions above do both.
ORIGINAL ANSWER BELOW:
Without knowing the exact schema or query plan, the main performance problem with this query is that SELECT * forces a lookup back to your clustered index for every row. If there are large numbers of matching rows for a particular user ID, and if your clustered index's first column is not item_user_id, then this will likely be a very inefficient operation, because your disk will be trying to fetch lots of randomly distributed rows from the clustered index.
In other words, even though filtering the rows you want is fast (because of your index), actually fetching the data is slower.
If, however, your clustered index is ordered by item_user_id, item_start, item_end, then that should speed things up. Note that this is not a panacea, since if you have other queries which depend on a different ordering, or if you're inserting rows in a different order, you could end up slowing down other queries.
A less impactful solution would be to create a covering index which contains only the columns you want (also ordered by item_user_id, item_start, item_end, and then add the other columns you need). Then change your query to only pull back the columns you need, instead of using SELECT *.
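In MySQL/InnoDB terms (where the primary key is the clustered index), a covering secondary index might look like the sketch below; price stands in for whatever extra columns the query actually needs:

CREATE INDEX ix_items_covering
  ON items (user_id, item_start, item_end, price);

-- select only the indexed columns so the query is answered from the index alone
SELECT user_id, item_start, item_end, price
FROM items
WHERE user_id = 1
  AND item_start < 20000
  AND item_end > 30000;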
If you post more info about the DBMS brand and version and the schema of your table, we can help with more details.
Do you need to SELECT *?
If not, you can create an index on user_id, item_start, item_end with the fields you need in the SELECT part as included columns. All this assumes you're using Microsoft SQL Server 2005+.
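A sketch of what that looks like in T-SQL (price and quantity are hypothetical columns from your SELECT list):

CREATE NONCLUSTERED INDEX IX_items_covering
  ON items (user_id, item_start, item_end)
  INCLUDE (price, quantity);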
What is the fastest way to reorder (resort) a table so that the physical representation most closely matches the logical, using PostgreSQL?
Having the data ordered does not mean that you won't need an ORDER BY clause in your queries anymore.
It just means that the logical order of the data is likely to match the physical order, and retrieving the data in logical order (say, from an index scan) will more likely result in sequential read access to the table.
Note that neither MySQL nor PostgreSQL guarantee that INSERT … SELECT … ORDER BY will result in the data being ordered in the table.
To order data in PostgreSQL you should define an index with the given order and issue this command:
CLUSTER mytable USING myindex
This will rebuild the table and all indexes defined on it.
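A typical flow, assuming you want the table ordered by a hypothetical created_at column:

CREATE INDEX mytable_created_at_idx ON mytable (created_at);
CLUSTER mytable USING mytable_created_at_idx; -- takes an exclusive lock and rewrites the table
ANALYZE mytable; -- refresh planner statistics after the rewrite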
Order of insertion does not, in the end, always control order in the table. Tables are ordered by their clustered index, if they have one. If they do not have one, the order is technically undefined. Although in many cases it is probably safe to assume rows are in insertion order, that doesn't mean they'll stay that way: lacking a specific ordering, the DB engine is free to reorder as it sees fit to optimize retrieval.
If you want to control order on disk, the single best way is to properly define your clustered index. Always.
Use CLUSTER to reorder a table by a given index.
(BTW, MySQL has ALTER TABLE ... ORDER BY ... for a similar purpose.)
I have a query that pulls 5 records from a table of ~10,000. The order clause isn't covered by an index, but the where clause is.
The query scans about 7,700 rows to pull these 5 results, and that seems like a bit much. I understand, though, that the complexity of the ordering criteria complicates matters. How, if at all, can I reduce the number of rows scanned?
The query looks like this:
SELECT *
FROM `mediatypes_article`
WHERE `mediatypes_article`.`is_published` = 1
ORDER BY `mediatypes_article`.`published_date` DESC,
         `mediatypes_article`.`ordering` ASC,
         `mediatypes_article`.`id` DESC
LIMIT 5;
mediatypes_article.is_published is indexed.
How many rows match "is_published = 1"?
I assume that is something like... 7,700 rows?
Either way, the full result set matching the WHERE clause has to be fetched and completely ordered by all sorting criteria. Only then is the sorted list of published articles truncated to the first 5 results.
Maybe it will help you to look at the MySQL documentation article about ORDER BY optimization, but first you should try applying indexes to the columns stated in the ORDER BY clause. It is very likely that this will speed things up greatly.
Executing OPTIMIZE TABLE may not help, but it doesn't hurt either.
When you have ordering, you have to traverse the whole B-tree to figure out the proper order.
10,000 records to order is not a big enough amount to worry about. Remember, with proper indexing the RDBMS doesn't fetch the whole record to figure out the order: it has the indexed columns in B-tree pages saved on disk, and with a few page reads the whole B-tree is loaded into memory and can be traversed.
In MySQL you can make an index that includes multiple columns. I think what you probably need to do is make an index that includes is_published and published_date. You should look at the output from the EXPLAIN statement to make sure it's doing things the smart way, and add an index if it is not.
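A sketch of such an index; note that the mixed ASC/DESC key order only becomes a real descending index key in MySQL 8.0+ (older versions parse DESC in index definitions but ignore it, so on 5.x this would still leave a sort for the ORDER BY):

CREATE INDEX ix_article_published
  ON mediatypes_article (is_published, published_date DESC, ordering ASC, id DESC);

-- check that the plan no longer shows "Using filesort"
EXPLAIN SELECT *
FROM mediatypes_article
WHERE is_published = 1
ORDER BY published_date DESC, ordering ASC, id DESC
LIMIT 5;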