Optimize query on timestamp column in PostgreSQL 8.x

Let's suppose you have an orders table with a timestamp column indicating when each order was created. A common query is to fetch the orders between two dates. Does anybody know how to optimize this query? Creating an index on the timestamp column has no effect, as shown by EXPLAIN ANALYZE.

Usually indexes are used, but only if the table is properly analyzed (VACUUM ANALYZE or just ANALYZE), and if the table size is large enough that index scans are faster than sequential scans.

An index should work. I suspect it's not working for you because your table is either tiny (PostgreSQL almost never uses indexes for tiny tables), or you haven't run ANALYZE on it.
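As a rough sketch of the checks suggested above, assuming a hypothetical table orders with a timestamp column created_at (the question gives neither name):

```sql
-- Hypothetical names: orders, created_at, orders_created_at_idx.
CREATE INDEX orders_created_at_idx ON orders (created_at);

-- Refresh planner statistics so row estimates reflect the real data.
ANALYZE orders;

-- Half-open range: all orders created in January 2009.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE created_at >= '2009-01-01'
  AND created_at <  '2009-02-01';

-- Diagnostic only: discourage sequential scans for this session to see
-- whether the planner can use the index at all, then compare the timings.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE created_at >= '2009-01-01'
  AND created_at <  '2009-02-01';
RESET enable_seqscan;
```

If the index plan only appears with enable_seqscan turned off and is not faster, the planner was probably right to prefer the sequential scan for that table size and date range.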

Related

PostgreSQL EXPLAIN: How do I see a plan AS IF certain tables had millions of rows?

This is a question about PostgreSQL's EXPLAIN command. This command shows you how the optimizer will execute your SQL based on the data in your tables. We are not in prod yet, so all of our tables have ~100 rows or less. Is there a way to get EXPLAIN to tell me what the explain plan would look like if certain tables had millions of rows instead of tens of rows?
I could generate the data somehow, but then I'd have to clear it and wait for it to be created. If that's the only way, I'll accept that as an answer, though.
I don't think so. PostgreSQL collects statistics about each table, and the optimizer uses them to choose the best plan. These statistics are not just the number of rows in the table; they also depend on the values stored in it.
From the postgres documentation:
the query planner needs to estimate the number of rows retrieved by a
query in order to make good choices of query plans.
What does that mean? Suppose we have an indexed column called foo with no unique constraint, and the following simple query:
SELECT * FROM test_table WHERE foo = 5
PostgreSQL will have to choose between different scan types:
sequential scan
index scan
bitmap scan
It will choose the type of scan based on how many rows it expects the query to retrieve. How does it know how many rows will be retrieved before running the query? From the statistics it collects. These statistics are based on the values inside your table. Suppose you have a table with 1 million rows and 90% of them have foo = 5. PostgreSQL may know that, because it has collected statistics about the distribution of your data. So it will choose a sequential scan, because according to its cost model that scan is the cheapest one.
In the end, it is not enough just to generate data; you should generate values that represent reality (the data you will have in the future).
You can already create indexes based on the queries you expect to run, so that you start with good performance in production. If that is not enough, you will have to tune your indexes after you go into production.
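A minimal sketch of the skewed-data scenario described above. The table layout, index name, and value distribution are assumptions made up for illustration; only test_table and foo come from the answer.

```sql
-- Assumed layout: the answer only names test_table and foo.
CREATE TABLE test_table (id integer, foo integer);

-- 1 million rows: about 90% get foo = 5, the rest are spread over
-- 100 other values (0, 10, 20, ... 990), roughly 1000 rows each.
INSERT INTO test_table (id, foo)
SELECT i,
       CASE WHEN i % 10 = 0 THEN i % 1000 ELSE 5 END
FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX test_table_foo_idx ON test_table (foo);
ANALYZE test_table;

-- Inspect what the planner has learned about the distribution of foo.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'test_table' AND attname = 'foo';

-- Very common value: a sequential scan is the likely choice.
EXPLAIN SELECT * FROM test_table WHERE foo = 5;

-- Rare value: an index or bitmap scan is the likely choice.
EXPLAIN SELECT * FROM test_table WHERE foo = 20;
```

The plans you actually get depend on your PostgreSQL version and cost settings, which is the point of the answer: the row count alone does not determine the plan.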

PostgreSQL Index Isn't Used on Query

I have a PostgreSQL database with a table that tracks the movement and fuel level of equipment; it has millions of rows.
To make queries faster, I created an index on the time column with this command:
create index gpsapi__index__time on gpsapi (time);
When I run a simple query with EXPLAIN ANALYZE like this
EXPLAIN ANALYZE
SELECT *
FROM gpsapi g
WHERE g.time >= NOW() - '1 months'::interval;
the plan doesn't show the query using the index I created.
Do you know how to solve this? Thanks!
If you read the execution plan closely, you'll see that Postgres is telling you that out of about 6 million records, 5.5 million matched (> 90%). Based on statistics, it is likely that Postgres realized that it would be returning a large percentage of total records in the table, and that it would be faster to just forgo using the index and scan the entire table.
The concept to understand here is that, while the index you defined does potentially let Postgres throw away non-matching records very quickly, it actually increases the time needed to look up the values for SELECT *. The reason is that, upon hitting a leaf node in the index, Postgres must then go back to the table itself (the heap) to fetch the column values. Assuming your query would return most of the table, it would be faster to just scan the table directly.
That said, there is nothing inherently wrong with your index. If your query used a narrower range, or searched for a specific timestamp, such that the expected result set was sufficiently small, then Postgres likely would have used the index.
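To check this against your own data, you could compare the original query with a narrower range. This is only a sketch; the one-day window is an arbitrary choice, and whether the index is picked depends on how many rows fall inside it.

```sql
-- Same table and index as in the question, but a much smaller window.
-- If only a small fraction of the rows match, the planner will probably
-- switch to an index scan or bitmap scan on gpsapi__index__time.
EXPLAIN ANALYZE
SELECT *
FROM gpsapi g
WHERE g.time >= NOW() - '1 day'::interval;
```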

How to get list of values stored in index?

I'm having this issue in Oracle 11g R2. I have a table with a NOT NULL column that has a non-unique index on it. The index contains no other columns.
I assumed that if I queried the distinct values of the column, Oracle would use the index to get them (sounds logical to me). However, the explain plan, at least, tells me it's doing a full table scan. The query also took some time, so the plan probably wasn't changed at run time. An optimizer index hint didn't help.
I searched for an answer but had no luck. Is there a way to get the values stored in the index, or to somehow run the query without "touching" the table at all (like multi-column index joins can)?
Thanks!
EDIT: This was about the Oracle EBS gl_balances table and the gl_balances_n2 index. I got an answer, and this changed the explain plan:
select /*+ index_ffs(gl gl_balances_n2) */
distinct gl.period_name
from gl_balances gl;
It may not be more efficient to scan the index than to scan the table -- don't forget that the index segment also contains branch nodes, and each index entry has to contain a ROWID of about 16 bytes (if memory serves).
So a "fast full index scan", which is the plan you're looking to get, may not be as fast as a full table scan. (You'd use an index_ffs() hint for that, by the way.)
Edit: It may be possible to use a more exotic method:
Maintaining your own list by periodically querying the table using DBMS_Scheduler.
A materialized view. Complete refresh on demand might be adequate, though barely better than just periodically querying the data and maintaining your own unique list.
Making the index compressed, though that would only be of value for longish index keys.
A bitmap index -- not for a concurrently modified table though.
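As one possible sketch of the materialized view option, assuming a hypothetical view name gl_period_names_mv (only gl_balances and period_name come from the question):

```sql
-- Hypothetical name: gl_period_names_mv.
-- Stores the distinct list once; it is refreshed only when asked.
CREATE MATERIALIZED VIEW gl_period_names_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT DISTINCT period_name
FROM gl_balances;

-- Complete ('C') refresh on demand, e.g. from a scheduled job.
BEGIN
  DBMS_MVIEW.REFRESH('GL_PERIOD_NAMES_MV', 'C');
END;
/

-- Readers query the small materialized view instead of the big table.
SELECT period_name FROM gl_period_names_mv;
```

Each complete refresh still has to scan gl_balances (or its index), so this only pays off if the list is read far more often than it needs to be refreshed.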

Postgres ignoring clustered index on date query

I have a large table that I run queries like select date_att > date '2001-01-01' on regularly. I'm trying to increase the speed of these queries by clustering the table on date_att, but when I run those queries through explain analyze it still chooses to sequentially scan the table, even on queries as simple as SELECT date_att from table where date_att > date '2001-01-01'. Why is this the case? I understand that since the query returns a large portion of the table, the optimizer will ignore the index, but since the table is clustered by that attribute, shouldn't it be able to really quickly binary search through the table to the point where date > '2001-01-01' and return all results after that? This query still takes as much time as without the clustering.
It seems like you are confusing two concepts:
PostgreSQL clustering of a table
Clustering a table according to an index in PostgreSQL aligns the order of table rows (stored in a heap table) to the order in the index at the time of clustering. From the docs:
Clustering is a one-time operation: when the table is subsequently
updated, the changes are not clustered.
http://www.postgresql.org/docs/9.3/static/sql-cluster.html
Clustering potentially (often) improves query speed for range queries because the selected rows are stored nearby in the heap table by coincidence. There is nothing that guarantees this order! Consequently the optimizer cannot assume that it is true.
E.g. if you insert a new row that fulfills your where clause, it might be inserted at any place in the table, for example where the rows for 1990 are stored. Hence, this assumption doesn't hold true:
but since the table is clustered by that attribute, shouldn't it be able to really quickly binary search through the table to the point where date > '2001-01-01' and return all results after that?
This brings us to the other concept you mentioned:
Clustered Indexes
This is something completely different, not supported by PostgreSQL at all but by many other databases (SQL Server, MySQL with InnoDB and also Oracle where it is called 'Index Organized Table').
In that case, the table data itself is stored in an index structure; there is no separate heap structure! As it is an index, the order is also maintained for each insert/update/delete. Hence your assumption would hold true, and indeed I'd expect the above-mentioned databases to behave as you expect (given that the date column is the clustering key!).
Hope that clarifies it.
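For the PostgreSQL side, a small sketch of what the one-time clustering looks like in practice. The names my_table and my_table_date_att_idx are placeholders (the question does not name the table or index), and the CLUSTER ... USING form assumes PostgreSQL 8.3 or later.

```sql
-- Placeholder names: my_table, my_table_date_att_idx.
CREATE INDEX my_table_date_att_idx ON my_table (date_att);

-- Rewrites the table in index order once; later inserts and updates
-- are not kept in that order.
CLUSTER my_table USING my_table_date_att_idx;
ANALYZE my_table;

-- The planner's view of the physical ordering: a correlation near 1 or -1
-- means the heap order closely follows date_att, near 0 means it does not.
SELECT correlation
FROM pg_stats
WHERE tablename = 'my_table' AND attname = 'date_att';
```

As rows are modified, the correlation drifts away from 1 until CLUSTER is run again, which is why the planner treats it as a statistic rather than a guarantee.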

Tuning table select SQL having a RAW column in Oracle 10g

I have a table with several columns and a unique RAW column. I created a unique index on the RAW column.
My query selects all columns from the table (6 million rows).
When I look at the cost of the query, it's too high (51K), and it's still using an INDEX FULL scan. The query does not have any filter conditions; it's a plain select * from.
Please suggest how I can tune the query.
Thanks in advance.
Why are you hinting it to use the index if you're retrieving all columns from all rows? The index would only help if you were filtering on the indexed column. If you were only retrieving the indexed column then an INDEX_FFS hint might help. But if you have to go back to the data for any non-indexed columns then using the index at all becomes counterproductive beyond a certain proportion of returned data as you're having to access both the index data blocks and the table data blocks repeatedly.
So, your query is:
select /*+ index (rawdata idx_test) */
rawdata.*
from v_wis_cds_cp_rawdata_test rawdata
and you want to know why Oracle is choosing an INDEX FULL scan?
Well, as Alex said, the reason is the "index (rawdata idx_test)" hint. This is a directive that tells the Oracle optimizer, "when you access rawdata, use an index access on the idx_test index", which means that's what Oracle will do if at all possible, even if that's not the best plan.
Hints don't make queries faster automatically. They are a way of telling the optimizer what to do, or what not to do.
I've seen queries like this before. Sometimes a hint like this is added in order to return the rows in sorted order without actually doing a sort. However, if this was the requirement, I'd strongly recommend adding an ORDER BY clause anyway, because if the hint becomes invalid for some reason (e.g. the index gets dropped or renamed), the sorting would no longer happen and no error would be reported.
If you don't need the rows returned in any particular order, I suggest you remove the hint and see if the performance improves.
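A sketch of the two variants suggested above. The column name raw_col is hypothetical, since the question never names the RAW column.

```sql
-- Variant 1: no hint, so the optimizer is free to pick a full table scan.
select rawdata.*
from v_wis_cds_cp_rawdata_test rawdata;

-- Variant 2: if sorted output is actually required, say so explicitly
-- instead of relying on the index hint (raw_col is a hypothetical name).
select rawdata.*
from v_wis_cds_cp_rawdata_test rawdata
order by rawdata.raw_col;
```

Comparing the explain plans and run times of the hinted and unhinted versions should show whether the hint was helping at all.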