How Can an Always-False PostgreSQL Query Run for Hours?

I've been trying to debug a slow query in another part of our system, and noticed that this query is active:
SELECT * FROM xdmdf.hematoma AS "zzz4" WHERE (0 = 1)
It has apparently been active for > 8 hours. With that WHERE clause, logically, this query should return zero rows. Why would a SQL engine even bother to evaluate it? Would a query like this be useful for anything, and if so, what could it be?
(xdmdf.hematoma is a view and I would expect SELECT * on it to take ~30 minutes under non-locky conditions.)
This statement:
explain select 1 from xdmdf.hematoma limit 1
(no analyze) has been running for about 10 minutes now.

There are two possibilities:
1. It takes forever to plan the statement, because you changed some planner settings and the view definition is very complicated (a partitioned table?). This is the unlikely explanation.
2. A concurrent transaction is holding an ACCESS EXCLUSIVE lock on a table involved in the view definition. Terminate any such concurrent transactions.
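If the lock theory is correct, you can usually identify the blocker from pg_stat_activity. A minimal sketch, assuming PostgreSQL 9.6 or later for pg_blocking_pids() (on older versions, join pg_locks with granted = false against pg_stat_activity instead):
-- list sessions that are blocked and the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
-- once you have verified it is safe to do so, terminate the blocker:
-- SELECT pg_terminate_backend(<blocking pid>);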

Related

Log join and where clause execution time in Oracle

I would like to ask for some help with a query in Oracle Database.
I have a massive select with multiple tables joined together (over 10) and multiple where clauses applied (10-20). Some tables have 10 columns, some have 300+. Most tables have 10+ million rows, some of them even 60+ million.
The execution time is usually between 25 and 45 minutes, though sometimes it drops to 30 seconds. Monitoring the server shows that the load is almost the same in both cases.
We would like to optimize the select to reduce the usual execution time to 10-15 minutes or less.
My question is: is there any tool or technique that can tell me which part of the query ran so long (something that can show me, for the last execution of the query, that the 1st join took 36 secs, the 2nd join 40 secs, the 1st where clause 10 secs, etc.)?
(Note that I'm not asking for optimization advice, but for any tool or technique that can tell me which part/operation of the executed query took so long.)
Thanks in advance, I hope I was clear! :)
One option is to do the following:
add /*+ gather_plan_statistics */ to your query
execute the query
after the query, select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
This gives you a plan with columns like actual rows, actual time, memory usage, and more.
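For example, a minimal sketch of the workflow (the table and column names below are made up for illustration; substitute your own query):
-- steps 1 and 2: run the query with the hint so row-source statistics are collected
select /*+ gather_plan_statistics */ o.order_id, c.customer_name
from orders o
join customers c on c.customer_id = o.customer_id
where o.created_on > sysdate - 1;
-- step 3: show the actual rows, elapsed time and memory for each plan operation
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));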
If you don't want to re-run the query you can generate actual rows and times of the last execution using a SQL Monitor Report like this:
select dbms_sqltune.report_sql_monitor(sql_id => ' add the sql_id here') from dual;
Using these tools allows you to focus on the relevant operation. A plain old explain plan isn't good enough for complex queries. AWR doesn't focus on individual queries. And tracing is a huge waste of time when there are faster alternatives.

Simple Oracle query started running slow when returning more than 251 results

As the title implies, I have a very simple Oracle query that starts taking around 5 seconds as soon as it returns more than 251 results. I am using SQL Developer and connecting with the built-in connection utility (there is no facility for an ODBC connection in this application).
The following query is fast (fast enough); pa_stu holds roughly 40k rows:
Select * From pa_stu Where rownum < 252;
Oracle returns the data to me in 0.521 seconds, according to SQL Developer.
The following query, and ones that pull larger sets of data, are the culprit:
Select * From pa_stu Where rownum < 253;
Oracle returns the data for that last one in 5.327 seconds, according to SQL Developer.
All queries used for testing have the same explain plan: a filter predicate of ROWNUM<251 (with 251 changed to whatever number is being used) and a TABLE ACCESS FULL.
The results above are consistent, and bumping the number up to about 1000 doubles the elapsed time to roughly 10 seconds (also consistently). It is as if some buffering is going on somewhere and that buffer is too small. Additionally, this is happening on only one of our Oracle servers. The other, more heavily used one (which holds different data) has no problem returning hundreds of thousands of records with similar statements.
The databases are controlled by a DBA, and I have run all of this by her; she does not have a solution. This started happening a month or so back and was not the case many months ago, if that is meaningful. It was just not as noticeable as it is now.
Thank you for any help.

What makes one of these queries faster?

I have a SQL query (below) that took 10 seconds to run; since it was on a production environment, I stopped it just to be sure there was no SQL locking going on:
SELECT TOP 1000000 *
FROM Table T
Where CONVERT(nvarchar(max), T.Data) like '%SearchPhrase%' --T.Data is initially XML
Now if I add an ORDER BY on creation time (which I do not believe is indexed), it takes 2 seconds and is done:
SELECT TOP 1000000 *
FROM Table T
Where CONVERT(nvarchar(max), T.Data) like '%SearchPhrase%' --T.Data is initially XML
order by T.CreatedOn asc
Now the kicker is that only about 3000 rows are returned, which tells me that even with the TOP 1000000 it isn't stopping short; it's still going through all the rows.
I have a basic understanding of how SQL Server works and how query parsing works, but I'm confused as to why the ORDER BY makes it so much faster in this situation.
The server is SQL Server 2008 R2.
The additional sort operation is apparently enough in this case for SQL Server to use a parallel plan.
The slower one (without the ORDER BY) gets a serial plan, whereas the faster one has a DegreeOfParallelism of 24, meaning the work is being done by 24 threads rather than just a single one.
This explains the much reduced elapsed time despite the additional work required for the sort.
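You can confirm this by comparing the two plans from the plan cache. A minimal sketch using the standard DMVs (the LIKE filter is just one way to locate the statement and assumes both plans are still cached); the root QueryPlan element of the XML carries the DegreeOfParallelism attribute:
-- pull the cached plan XML for both statements
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%SearchPhrase%';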

TOP 100 causing SQL Server 2008 hang?

I have inherited a VERY poorly designed and maintained database and have been using my knowledge of SQL Server, and a little luck, to keep this HIGH-availability server up and not completely coming down in flames (the previous developer, who quit, basically just kept the system up for 4 years).
I came across a very strange problem today. I hope someone can explain it to me so that if it happens again there is a way to fix it.
Anyway, there is a stored proc that is pretty simple. It joins two tables over a SHORT date/time range (a 5-minute range) and passes back the results (this query runs every 5 minutes via a Windows service). The larger table has 100k rows, the smaller one has 10k rows. The stored proc does:
NOTE: The table and column names have been changed to protect the innocent.
SELECT TOP 100 m.*
FROM dbo.mytable1 m WITH (nolock)
INNER JOIN dbo.mytable2 s WITH (nolock) ON m.Table2ID = s.Table2ID
WHERE m.RowActive = 1
AND s.DateStarted <= DATEADD(minute, -5, getdate())
ORDER BY m.DateStarted
Now, if I keep "TOP 100" in the query, the query hangs until I stop it (whether run in SSMS or via the stored proc). If I remove the TOP 100, the query works as planned and returns 50-ish rows, like it should (we don't want it to return more than 100 rows if we can help it).
So, I did some investigating, using sp_who, sp_who2, and looked at the master..sysprocesses and used DBCC INPUTBUFFER to look for any SPIDs that might be locking or blocking. No blocks and no locking.
This JUST STARTED today, with no changes to these two tables' designs, and from what I gather the last time this query and these tables were touched was 3 years ago, and it has been running without error since.
Now, a side note, and I don't know whether this has anything to do with it: I reindexed both of these tables about 24 hours earlier because they were 99% fragmented (remember, I said this was a poorly designed and poorly maintained server).
Can anyone explain why SQL Server 2008 would do this?
The ORDER BY is the killer. It has to read all the rows, sort them by that ORDER BY column, and then give you the first 100 rows.
The absolute first thing I would do is a side-by-side comparison of the query plans for the full and the TOP 100 queries to see whether the TOP 100 plan is the non-performant one. You might need to update statistics, or you may even have missing indexes.
I'd presume there's no index on mytable1.DateStarted. Something might be deciding to perform the sort earlier in the query when you use SELECT TOP 100.
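If that index really is missing, a hedged sketch of one option (the index name and included columns are assumptions; review against your actual workload before creating anything on a production box):
-- key on DateStarted serves the ORDER BY; the included columns cover the filter and join
CREATE NONCLUSTERED INDEX IX_mytable1_DateStarted
    ON dbo.mytable1 (DateStarted)
    INCLUDE (RowActive, Table2ID);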

Postgres 8.4.4 (x32 on Win7 x64) very slow UPDATE on small table

I have a very simple update statement:
UPDATE W SET state='thing'
WHERE state NOT IN ('this','that') AND losttime < CURRENT_TIMESTAMP;
The table W has only 90 rows, though the losttime and state columns of each row are updated about every 10 seconds. There are indexes on state and losttime (as well as the primary key index).
I'm noticing that with large databases (i.e. when the other tables have a lot of entries, not table W), the query gets slower and slower over time. After the database has been running for 48 hours, timing it in the query window of pgAdmin III shows it taking 17 minutes to execute!
I have a similar query on another table that's showing the same problem:
UPDATE H SET release='1'
WHERE a NOT IN (SELECT id from A WHERE state!='done') AND release!='1';
H does not have any indexes, but I have tried adding and removing an index on H(release) with no change in behaviour. This query, after the database has been up for 48 hours and the table H has ~100k rows, is taking 27 minutes. The Postgres server has one thread completely pegged (100% CPU utilization) for the duration of the query, so it doesn't look like there's any contention for network, disk, etc.
So in broad strokes, the behaviour I see is that my database runs as expected for about 5 minutes, then gradually everything grinds to a halt as basic maintenance-related UPDATE commands start to take longer and longer to run. By the second day, it's taking an hour to do a simple maintenance cycle (a handful of UPDATEs) that ran in ~100 ms at the outset. It seems clear to me that the performance degradation is super-linear in the amount of information in the database -- maybe N^2 or some such.
Autovacuum is on using the defaults. I read through the manual (again) and didn't see anything that jumped out at me.
I'm scratching my head here. I don't see any bug fixes that seem relevant in the 9.0.1 and 9.0.2 release notes. Can anyone help me understand what is happening? Thanks, M
-x-x-x-x-
Okay, so I may have two problems here.
The first update appears to run fast now. Not sure what happened, so I'll proceed there with the assumption that I need to run VACUUM / ANALYZE or some combination more frequently -- say every minute or so. I would really like to know why autovacuum isn't doing this for me.
The second update continues to run slowly. The query plan suggests that indexes are not being used effectively and that an 80k × 30k cross join is occurring, which could account for the super-linear runtime I seem to be observing. (Does everyone agree with this interpretation of the plan?)
I can convert the UPDATE to a SELECT:
SELECT * from H
where a not in (SELECT id from A where state='done') AND release!='1';
with a similar runtime (27 minutes).
If I don't trust the postgres optimizer and do this:
WITH r as (select id from A where state='done')
SELECT H.a from H
JOIN r on H.a=r.id
WHERE H.release='0';
then the query runs in ~500ms.
How do I translate this knowledge back into an UPDATE that runs with acceptable speed?
My attempt:
UPDATE H SET release='1'
FROM A
where A.state!='done' AND release!='1' AND A.id=H.a;
runs in about 140 seconds, which is faster, but still very very slow.
Where can I go from here?
-x-x-x-x-
VACUUM ANALYZE has been added as part of "routine maintenance" where the application will run it approximately once every minute or so independently of any autovacuum that is running.
Also, I rewrote the second query to eliminate the known-to-be-slow NOT IN clause, replacing it with a "Left Anti-Semi Join" (huh?):
UPDATE H SET release='1'
WHERE release='0' AND NOT EXISTS (SELECT * FROM A WHERE id=H.a AND state!='done');
PostgreSQL implements MVCC.
This means that each time you make an update, a new copy of the row is created and the old one is marked as deleted (but is not physically removed).
This slows down queries.
You should run VACUUM on a regular basis.
PostgreSQL 8.4.4 runs the autovacuum daemon to do this, but it may have some problems on your installation.
Does the situation improve when you run VACUUM manually?
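For example (using the table names from the question), a manual pass looks like this; ANALYZE is included so the planner statistics get refreshed at the same time:
-- reclaim dead row versions and refresh planner statistics
VACUUM ANALYZE W;
VACUUM ANALYZE H;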
Check with pg_total_relation_size('tablename') whether your tables are bloated out of proportion. If that is the case, you may need to tweak the autovacuum configuration.
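A quick way to check, as a minimal sketch (assuming the tables were created with unquoted names, so they are stored in lower case):
-- a 90-row table like W should only be a handful of kilobytes;
-- anything in the megabyte range points at bloat
SELECT pg_size_pretty(pg_total_relation_size('w')) AS w_total_size,
       pg_size_pretty(pg_total_relation_size('h')) AS h_total_size;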
The other alternative is that the tables are locked. Look into pg_stat_activity or pg_locks to find out.
I think you're not correctly closing transactions.