Despite the existence of relevant indices, PostgreSQL query is slow

I have a table with 3 columns:
time (timestamptz)
price (numeric(8,2))
set_id (int)
The table contains 7.4M records.
I've created a simple index on time and an index on set_id.
I want to run the following query:
select * from test_prices where time BETWEEN '2015-06-05 00:00:00+00' and '2020-06-05 00:00:00+00';
Despite my indices, the query takes 2 minutes and 30 seconds.
See the explain analyze stats: https://explain.depesz.com/s/ZwSH
The GCP Postgres DB has the following stats:
What am I missing here? Why is this query so slow, and how can I improve it?

According to your explain plan, the query is returning 1.6 million rows out of 4.5 million. That means that a significant portion of the rows is being returned.
Postgres wisely decides that a full table scan is more efficient than using an index, because there is a good chance that all the data pages will need to be read anyway.
It is surprising that you are reporting 00:02:30 for the query. The explain is saying that the query completes in about 1.4 seconds -- which seems reasonable.
I suspect that the elapsed time is caused by the volume of data being returned (perhaps the rows are very wide), a slow network connection to the database, or contention on the database/server.
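One way to verify where the time goes (a sketch, not something from the original answer): run the same statement under EXPLAIN (ANALYZE, BUFFERS). It executes the query on the server but does not ship the 1.6 million rows to the client, so its timing excludes network transfer and client rendering:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_prices
WHERE time BETWEEN '2015-06-05 00:00:00+00' AND '2020-06-05 00:00:00+00';
-- If this finishes in ~1.5 s while the plain SELECT takes minutes in your client,
-- the difference is transfer/rendering time, not query execution time.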

Your query selects two thirds of the table. A sequential scan is the most efficient way to process such a query.
Your query executes in under 2 seconds. It must be your client that takes a long time to render the query result (pgAdmin is infamous for that). Use a different client.
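A quick way to test that theory (a sketch, reusing the table name from the question): aggregate on the server so that almost no data is sent back. If this also returns in a couple of seconds, the slowness is in the client, not the database:
SELECT count(*) FROM test_prices
WHERE time BETWEEN '2015-06-05 00:00:00+00' AND '2020-06-05 00:00:00+00';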

Related

Sqlite index seemingly broken after insertion, need to run ANALYZE which is locking

Sqlite query gets really slow after inserting 14,300,160 rows (the db size is about 3G).
Suppose I have a table called test and I have created an index on the column TIMESTAMP prior to insertion. A simple
SELECT DISTINCT TIMESTAMP FROM test;
would run for about 40 seconds, but after I do:
ANALYZE; -- Takes 1 minute or so on this DB
The same query runs in 40 milliseconds, which is expected since the column is indexed.
As I'm using the database for soft real-time applications, it is not possible to lock the DB for a minute or so just to run ANALYZE from time to time. I suspect that the insertion is breaking the index, which is why the ANALYZE helped. Is that really the case? And if so, is there any way to prevent this from happening?
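For illustration, one approach that is sometimes suggested for this situation (not taken from the original thread, and only available in newer SQLite versions) is to cap how much work the statistics gathering does, so the lock stays short:
PRAGMA analysis_limit = 400;  -- examine roughly 400 rows per index instead of the whole table
PRAGMA optimize;              -- re-analyze only the tables whose statistics look stale
SELECT DISTINCT TIMESTAMP FROM test;  -- should use the index again once statistics exist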

Log join and where clause execution time in Oracle

I would like to ask for some help with a query in Oracle Database.
I have a massive select with multiple tables joined together (over 10) and multiple where clauses applied (10-20). Some tables have 10 columns, some have 300+. Most tables have 10+ million rows, some of them even 60+ million.
The execution time is usually between 25 and 45 minutes; sometimes it drops to 30 seconds. Monitoring the server load shows that the load was almost the same.
We would like to optimize the select to reduce the usual execution time to 10-15 minutes or less.
My question is: is there any tool or technique which can provide me with information about which part of the query ran so long (something that can show me that in the last execution of the query, the 1st join took 36 secs, the 2nd join 40 secs, the 1st where clause 10 secs, etc.)?
(Note that I'm not asking for optimization advice, but for any tool or technique which can provide information about which part/operation of the executed query took so long.)
Thanks in advance, I hope I was clear! :)
One option is to do the following:
add /*+ gather_plan_statistics */ to your query
execute the query
after the query, select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
This gives you a plan with columns like actual rows, actual time, memory usage, and more.
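Putting those steps together as one runnable sketch (the table, column and bind variable below are placeholders, not from the question):
SELECT /*+ gather_plan_statistics */ *
FROM some_big_table t
WHERE t.some_column = :some_value;
SELECT * FROM table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
-- The A-Rows and A-Time columns show, per plan step (each join, filter, etc.),
-- how many rows it actually produced and how long it actually took.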
If you don't want to re-run the query you can generate actual rows and times of the last execution using a SQL Monitor Report like this:
select dbms_sqltune.report_sql_monitor(sql_id => ' add the sql_id here') from dual;
Using these tools allows you to focus on the relevant operation. A plain old explain plan isn't good enough for complex queries. AWR doesn't focus on individual queries. And tracing is a huge waste of time when there are faster alternatives.

SQL Server chooses wrong execution plan

When this query is executed, SQL Server chooses a wrong execution plan, why?
SELECT top 10 AccountNumber , AVERAGE
FROM [M].[dbo].[Account]
WHERE [Code] = 9201
Go
SELECT top 10 AccountNumber , AVERAGE
FROM [M].[dbo].[Account] with (index(IX_Account))
WHERE [Code] = 9201
SQL Server chooses the clustered PK index for this query with an elapsed time of 78254 ms, but if I force SQL Server to choose a non-clustered index, the elapsed time is 2 ms. The statistics on the Account table are up to date.
It's usually down to having bad statistics on the various indexes. Even with correct stats, an index can only hold so many samples, and occasionally, when there is a massive skew in the values, the optimiser can think that it won't find a sufficiently small number of rows.
Also, you can sometimes have a massive number of [almost] empty blocks to read through, with data values only at "the end". This can sometimes mean that where you have a couple of otherwise close plan variations, one will require drastically more IO to burn through the holes. Likewise, if you don't actually have 10 values for 9201, it will have to do an entire table scan if it chooses the PK/CI rather than a more fitting index. This is more prevalent when you've done plenty of deletes.
Try updating the stats on the various indexes and things like that, and see if it changes anything. 78 seconds is a lot of IO on a single table scan.
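For example (a sketch based on the names in the question; the FULLSCAN option is my assumption, not something the answer prescribes):
UPDATE STATISTICS [M].[dbo].[Account] WITH FULLSCAN;   -- refresh all statistics on the table
ALTER INDEX IX_Account ON [M].[dbo].[Account] REBUILD;  -- rebuilding the index also rebuilds its statistics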

SQL Performance, Using OPTION (FAST n)

Can anyone tell me what the disadvantages of using OPTION (FAST n) in SQL queries are?
For example, I can grab 100,000 records very quickly, but does this have an effect on other processes of SQL Server?
To get a bit closer to my actual issue:
I have to run a data process every week. The first result comes out after 5-7 seconds, and then I run my data process on these results. The results normally consist of a few thousand rows, and every row takes a few seconds to be processed. Normally the process waits for the whole result to be there before it starts processing. The result comes back in a DataSet (I am using a C# console app). So I want the top 10 results to come out quickly so that I can start processing immediately, and then the rest of the rows can come out, be added to the queue, and wait for their turn.
Any idea how I can do this?
Thanks
OPTION (FAST n) forces the query optimizer to optimize not for the total runtime of the query, but for the time it takes to fetch the first n rows.
If you have two tables of 1 million rows that you want to join, a standard query plan is to build a hash table of one table (a temporary structure of a million rows) and then do hash lookups against it for the rows of the other.
A FAST 10 optimisation would probably just use nested loops, because the effort of building that 1-million-row hash table is quite a bit more than the ten lookups of a nested loop. If you are after all 1 million rows, the nested loop could take 3 times longer, but under FAST 10 you'll get those first 10 rows quicker. (This example assumes the existence of a suitable index.)
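For reference, the hint is simply appended to the query; everything below except the hint itself is a made-up placeholder for the weekly job described above:
SELECT OrderId, CustomerName, Amount
FROM dbo.WeeklyData
WHERE ProcessedFlag = 0
ORDER BY OrderId
OPTION (FAST 10);  -- optimize for returning the first 10 rows as soon as possible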

SELECT query is slow (no index needed), why is it so slow?

I have a table with over 1 million entries.
The problem is with the speed of the SELECT queries. This one is very fast:
SELECT *
FROM tmp_pages_data
WHERE site_id = 14294
Showing rows 0 - 29 (1,273,042 total, Query took 0.0009 sec)
And this one is very slow:
SELECT *
FROM tmp_pages_data
WHERE page_status = 0
Showing rows 0 - 29 (15,394 total, Query took 0.3018 sec)
There is an index on the id column only, which is not needed in either of the selects. So there is no index on site_id or page_status.
The 0.30-second query is very disturbing, especially when there are thousands of requests.
So how can this be possible? What can I do to see what's slowing it down?
What can I do to see what's slowing it down?
It's quite obvious what is slowing it down - as you've already pointed out you don't have an index on the page_status column, and you should have one.
The only surprise is that your first query is so fast without the index. Looking at it more closely it seems that whatever client you are running these queries on is adding an implicit LIMIT 30 that you aren't showing in your question. Because there are so many rows that match it doesn't take long to find the first 30 of them, at which point it can stop searching and return the result. However your second query returns fewer matching rows so it takes longer to find them. Adding the index would solve this problem and make your query almost instant.
Short answer: add an index on the column page_status.
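As a concrete statement (a sketch using the table and column names from the question):
CREATE INDEX idx_page_status ON tmp_pages_data (page_status);
EXPLAIN SELECT * FROM tmp_pages_data WHERE page_status = 0;  -- should now show the new index instead of a full scan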
Ok, from our discussion in the comments we now know that the db somehow knows that the first query will return all rows. That's why it's so fast.
The second query is slow because it doesn't have an index. OMG Ponies already stated that a normal index won't work because the value set is too small. I'd just like to point you to 'bitmap indexes'. I've not used them myself yet but they are known to be designed for exactly this case.