Postgres slow query (slow index scan) - sql

I have a table with 3 million rows and 1.3GB in size. Running Postgres 9.3 on my laptop with 4GB RAM.
explain analyze
select act_owner_id from cnt_contacts where act_owner_id = 2
I have a btree index on cnt_contacts.act_owner_id defined as:
CREATE INDEX cnt_contacts_idx_act_owner_id
ON public.cnt_contacts USING btree (act_owner_id, status_id);
The query runs in about 5 seconds
Bitmap Heap Scan on cnt_contacts (cost=2598.79..86290.73 rows=6208 width=4) (actual time=5865.617..5875.302 rows=5444 loops=1)
Recheck Cond: (act_owner_id = 2)
-> Bitmap Index Scan on cnt_contacts_idx_act_owner_id (cost=0.00..2597.24 rows=6208 width=0) (actual time=5865.407..5865.407 rows=5444 loops=1)
Index Cond: (act_owner_id = 2)
Total runtime: 5875.684 ms
Why is it taking so long?
work_mem = 1024MB;
shared_buffers = 128MB;
effective_cache_size = 1024MB
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 15.0 # same scale as above
cpu_tuple_cost = 3.0

OK, you have a big table, an index, and a long execution time in PG. Let's think about ways to improve your plan and reduce that time. You insert and delete rows, so PG writes and removes tuples, and both the table and the index can become bloated. For a fast search PG loads the index into shared buffers, so you want to keep your indexes as lean as possible. For a selection PG first reads pages into shared buffers and then searches them. Try tuning the buffer memory, reducing index and table bloat, and keeping the database clean.
Here is what to check and think about:
1) Check for duplicate or unused indexes, and make sure your indexes have good selectivity:
WITH table_scans as (
SELECT relid,
tables.idx_scan + tables.seq_scan as all_scans,
( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
pg_relation_size(relid) as table_size
FROM pg_stat_user_tables as tables
),
all_writes as (
SELECT sum(writes) as total_writes
FROM table_scans
),
indexes as (
SELECT idx_stat.relid, idx_stat.indexrelid,
idx_stat.schemaname, idx_stat.relname as tablename,
idx_stat.indexrelname as indexname,
idx_stat.idx_scan,
pg_relation_size(idx_stat.indexrelid) as index_bytes,
indexdef ~* 'USING btree' AS idx_is_btree
FROM pg_stat_user_indexes as idx_stat
JOIN pg_index
USING (indexrelid)
JOIN pg_indexes as indexes
ON idx_stat.schemaname = indexes.schemaname
AND idx_stat.relname = indexes.tablename
AND idx_stat.indexrelname = indexes.indexname
WHERE pg_index.indisunique = FALSE
),
index_ratios AS (
SELECT schemaname, tablename, indexname,
idx_scan, all_scans,
round(( CASE WHEN all_scans = 0 THEN 0.0::NUMERIC
ELSE idx_scan::NUMERIC/all_scans * 100 END),2) as index_scan_pct,
writes,
round((CASE WHEN writes = 0 THEN idx_scan::NUMERIC ELSE idx_scan::NUMERIC/writes END),2)
as scans_per_write,
pg_size_pretty(index_bytes) as index_size,
pg_size_pretty(table_size) as table_size,
idx_is_btree, index_bytes
FROM indexes
JOIN table_scans
USING (relid)
),
index_groups AS (
SELECT 'Never Used Indexes' as reason, *, 1 as grp
FROM index_ratios
WHERE
idx_scan = 0
and idx_is_btree
UNION ALL
SELECT 'Low Scans, High Writes' as reason, *, 2 as grp
FROM index_ratios
WHERE
scans_per_write <= 1
and index_scan_pct < 10
and idx_scan > 0
and writes > 100
and idx_is_btree
UNION ALL
SELECT 'Seldom Used Large Indexes' as reason, *, 3 as grp
FROM index_ratios
WHERE
index_scan_pct < 5
and scans_per_write > 1
and idx_scan > 0
and idx_is_btree
and index_bytes > 100000000
UNION ALL
SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp
FROM index_ratios, all_writes
WHERE
( writes::NUMERIC / ( total_writes + 1 ) ) > 0.02
AND NOT idx_is_btree
AND index_bytes > 100000000
ORDER BY grp, index_bytes DESC )
SELECT reason, schemaname, tablename, indexname,
index_scan_pct, scans_per_write, index_size, table_size
FROM index_groups;
2) Check whether your tables and indexes are bloated:
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC
3) Are dead tuples being cleaned up on disk? Is it time for a VACUUM?
SELECT
relname AS TableName
,n_live_tup AS LiveTuples
,n_dead_tup AS DeadTuples
FROM pg_stat_user_tables;
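If DeadTuples is high relative to LiveTuples, it may be time to vacuum the affected table manually (a sketch; plain VACUUM does not block reads and writes the way VACUUM FULL does):
-- Reclaim dead tuples and refresh planner statistics for the table
VACUUM (VERBOSE, ANALYZE) cnt_contacts;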
4) Think about selectivity. If you have 10 records in the table and 8 of them have id = 2, the index has poor selectivity for that value and PG will end up visiting all 8 rows anyway. But if you filter with id != 2 the index works well. Try to build indexes with good selectivity; the sketch below shows one way to check it.
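For example, you can get a quick feel for the selectivity of act_owner_id from the planner statistics (a sketch; assumes ANALYZE has been run recently):
-- n_distinct and the most common values/frequencies for the indexed column
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'cnt_contacts'
  AND attname = 'act_owner_id';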
5) Use the proper column types for your data. If a smaller type is enough for a column, convert it (example below).
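For instance, if status_id only ever holds small values (an assumption; check your data first), it could be shrunk like this:
-- Hypothetical: shrink a 4-byte integer column to a 2-byte smallint (rewrites the table and takes a lock)
ALTER TABLE cnt_contacts ALTER COLUMN status_id TYPE smallint;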
6) Review your database and its overall condition; the PostgreSQL performance documentation is a good starting point.
In short: look for unused data in your tables, keep indexes clean, and check index selectivity. Consider other index types where they fit (e.g. BRIN for naturally ordered data), and try recreating bloated indexes; a sketch follows.
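For the last point, rebuilding an index is a one-liner (note that BRIN indexes only exist from PostgreSQL 9.5 on, so they are not an option on 9.3):
-- Rebuild a possibly bloated index in place; blocks writes to the table while it runs
REINDEX INDEX cnt_contacts_idx_act_owner_id;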

You are selecting 5444 records scattered over a 1.3 GB table on a laptop. How long do you expect that to take?
It looks like your index is not cached, either because it can't be sustained in the cache, or because this is the first time you used that part of it. What happens if you run the exact same query repeatedly? The same query but with a different constant?
Running the query under EXPLAIN (ANALYZE, BUFFERS) would be helpful to get additional information, particularly if you turn track_io_timing on first.
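For example (a sketch; changing track_io_timing requires superuser privileges, or set it in postgresql.conf):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT act_owner_id FROM cnt_contacts WHERE act_owner_id = 2;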

Related

Distinct query causing the query to be slow

I have a Postgres query which takes almost 7 seconds.
It joins two tables with a WHERE clause; when I use DISTINCT it takes 7 seconds, and without DISTINCT I get the result in 500 ms. I even applied an index, but it was of no help. How can I tune the query for better performance?
select distinct RES.* from ACT_RU_TASK RES inner join ACT_RU_IDENTITYLINK I on
I.TASK_ID_ = RES.ID_ WHERE RES.ASSIGNEE_ is null
and I.TYPE_ = 'candidate' and ( I.GROUP_ID_ IN ( 'us1','us2') )
order by RES.priority_ desc LIMIT 10 OFFSET 0
For every RES.ID_ I have two I.TASK_ID_ rows, so I need only unique records.
Instead of using distinct use exists:
select RES.*
from ACT_RU_TASK RES
where exists (select 1
from ACT_RU_IDENTITYLINK I
where I.TASK_ID_ = RES.ID_ and
I.TYPE_ = 'candidate' and
I.GROUP_ID_ IN ( 'us1','us2')
) and
RES.ASSIGNEE_ is null
order by RES.priority_ desc
LIMIT 10 OFFSET 0;
For this query, you want an index on ACT_RU_IDENTITYLINK(TASK_ID_, TYPE_, GROUP_ID_). It is also possible that an index on ACT_RU_TASK(ASSIGNEE_, priority_, ID_) could be used.
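A sketch of those two indexes (the index names are made up; adjust to your naming convention):
CREATE INDEX act_ru_identitylink_task_type_group_idx
  ON ACT_RU_IDENTITYLINK (TASK_ID_, TYPE_, GROUP_ID_);
CREATE INDEX act_ru_task_assignee_priority_id_idx
  ON ACT_RU_TASK (ASSIGNEE_, priority_, ID_);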

Tracking drop column progress in Oracle

I have issued the following statement in Oracle on a huge table (1.8 TB in size):
alter table drop unused columns checkpoint
It has been running for almost 10 days now, crashing a couple of times due to lack of memory.
I resumed it with :
alter table drop columns continue
How can I track the progress and possibly get an estimate of the finish time?
I tried querying v$session_longops but there are no records for this session.
I know this is an old question, but given that nobody actually ever answered it, here's what I came up with when I ran into the same problem:
select v.*, to_char(100*"Processed"/"# of Blocks", '990D000000') "% Complete"
from (select sid, serial#, status, event, p1text, p1, p2text, p2, p3text, p3, x.segment_name, x.extent_id,
(select sum(case
when p2 not between block_id and block_id + blocks - 1 then blocks
else p2 - block_id
end)
from dba_extents x2
where x2.segment_name = x.segment_name
and extent_id <= x.extent_id) "Processed",
(select sum(blocks)
from dba_extents
where segment_name = x.segment_name) "# of Blocks"
from v$session s
left join dba_extents x on p1 = file_id and p2 between block_id and block_id + blocks - 1
where (sid, serial#) in ((15,40610))
) v
Basically, I'm taking the P2 value (Block #) from the wait event on the session doing the drop, and identifying where within the table's list of segments the process is at.
There is some work left to the reader here, as this assumes that:
you know the session and serial#,
the "db file sequential read" activities are, in fact, sequential,
the session isn't swapping between multiple files,
and the tablespace the table resides in isn't split across multiple files.
One could probably also only look at which extent the session was in compared to the total number of extents in the segment to give a rough estimate of progress as well...
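A hypothetical sketch of that simpler estimate, reusing the session from the example above (extent_id is 0-based, so the ordinal is extent_id + 1):
-- Rough progress: which extent the session is currently reading vs. total extents in the segment
select x.segment_name,
       x.extent_id + 1 as current_extent,
       (select count(*)
          from dba_extents t
         where t.owner = x.owner
           and t.segment_name = x.segment_name) as total_extents
  from v$session s
  join dba_extents x
    on s.p1 = x.file_id
   and s.p2 between x.block_id and x.block_id + x.blocks - 1
 where (s.sid, s.serial#) in ((15,40610));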
Hopefully somebody else can find this helpful, even if only as a breadcrumb to building a better solution.
Cheers
Good query above. I modified the query a bit:
select
v.*,
Round(100*"Processed"/"# of Blocks", 2) "% Complete"
from (
select
sid,
serial#,
status,
event,
p1text,
p1,
p2text,
p2,
p3text,
p3,
x.segment_name,
x.partition_name,
x.extent_id,
( select
sum(
case
when p2.partition_position < p.partition_position
then x2.blocks
when ( x2.partition_name is null or
p2.partition_position = p.partition_position) and
x2.file_id < x.file_id
then x2.blocks
when ( x2.partition_name is null or
p2.partition_position = p.partition_position) and
x2.file_id = x.file_id and
x2.block_id < x.block_id
then x2.blocks
when ( x2.partition_name is null or
p2.partition_position = p.partition_position) and
x2.file_id = x.file_id and
x2.block_id = x.block_id and
s.p2 < x2.block_id + x2.blocks
then s.p2 - x2.block_id
end
)
from DBA_EXTENTS x2
left join DBA_TAB_PARTITIONS p2
on p2.table_owner = x2.owner and
p2.table_name = x2.segment_name and
p2.partition_name = x2.partition_name
where x2.segment_name = x.segment_name
) as "Processed",
( select
sum(x2.blocks)
from DBA_EXTENTS x2
where x2.segment_name = x.segment_name
) as "# of Blocks"
from V$SESSION s
left join DBA_EXTENTS x
on s.p1 = x.file_id and
s.p2 between x.block_id and x.block_id + x.blocks - 1
left join DBA_TAB_PARTITIONS p
on p.table_owner = x.owner and
p.table_name = x.segment_name and
p.partition_name = x.partition_name
where
--(sid, serial#) in ((84,23431)) or
s.event = 'db file scattered read'
) v

postgres: select rows where foreign key count less than value

I've inherited a particularly slow-performing query, but I'm unclear on the best path to maintain the functionality while reducing the query cost.
A pared down version of the query looks like so:
select * from api_event where COALESCE(
(SELECT count(*) FROM api_ticket WHERE
event_id = api_event.id),
0
) < api_event.ticket_max AND COALESCE(
(SELECT count(*) FROM api_ticket WHERE
api_ticket.user_id = 45187 AND event_id = api_event.id
and api_ticket.status != 'x'),
0
) < api_event.ticket_max_per_user;
Running EXPLAIN ANALYZE on that seems to tell me that this requires a sequential scan on the api_event table:
Seq Scan on api_event (cost=0.00..69597.99 rows=448 width=243) (actual time=0.059..230.981 rows=1351 loops=1)
Filter: ((COALESCE((SubPlan 1), 0::bigint) < ticket_max) AND (COALESCE((SubPlan 2), 0::bigint) < ticket_max_per_user))
Rows Removed by Filter: 2647
Any suggestions on how I can improve this?
Rewriting the query as an explicit join will probably help:
select e.*
from api_event e left join
(select t.event_id, count(*) as cnt,
sum(case when t.user_id = 45187 and t.status <> 'x' then 1 else 0
end) as special_cnt
from api_ticket t
group by t.event_id
) t
on e.id = t.event_id
where coalesce(t.cnt, 0) < e.ticket_max and
coalesce(special_cnt, 0) < e.ticket_max_per_user;
These are correlated subqueries. I recently improved the performance of some queries by replacing correlated subqueries with WITH-based (CTE) queries; it is extremely fast in Oracle, and I hope it helps you with Postgres as well.
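A hedged sketch of what a WITH-based version of the same logic could look like (same table and column names as the question; untested):
with ticket_counts as (
    select t.event_id,
           count(*) as cnt,
           sum(case when t.user_id = 45187 and t.status <> 'x' then 1 else 0 end) as special_cnt
    from api_ticket t
    group by t.event_id
)
select e.*
from api_event e
left join ticket_counts tc on e.id = tc.event_id
where coalesce(tc.cnt, 0) < e.ticket_max
  and coalesce(tc.special_cnt, 0) < e.ticket_max_per_user;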

Excessive runtime sql. How to improve?

I have the following SQL query, executed from SQL Server Management Studio (SSMS), and it takes more than 5 seconds to run. The tables joined by the inner joins have just a few tens of thousands of records each. Why does it take so long?
The highest costs in the plan are: Clustered Index Scan [MyDB].[dbo].[LinPresup].[PK_LinPresup_Linea_IdPresupuesto_IdPedido] (78%) and Clustered Index Seek [MyDB].[dbo].[Pedidos].[PK_Pedidos_IdPedido] (19%).
Thank you.
Declare @FILTROPAG bigint
set @FILTROPAG = 1
Declare @FECHATRABAJO DATETIME
set @FECHATRABAJO = getDate()
Select * from(
SELECT distinct Linpresup.IdCliente, Linpresup.IdPedido, Linpresup.FSE, Linpresup.IdArticulo,
Linpresup.Des, ((Linpresup.can*linpresup.mca)-(linpresup.srv*linpresup.mca)) as Pendiente,
Linpresup.IdAlmacen, linpresup.IdPista, articulos.Tip, linpresup.Linea,
ROW_NUMBER() OVER(ORDER BY CONVERT(Char(19), Linpresup.FSE, 120) +
Linpresup.IdPedido + CONVERT(char(2), linpresup.Linea) DESC) as NUM_REG
FROM Linpresup INNER JOIN Pedidos on LinPresup.IdPedido = Pedidos.IdPedido
INNER JOIN Articulos ON Linpresup.IdArticulo = Articulos.IdArticulo
where pedidos.Cerrado = 'false' and linpresup.IdPedido <> '' and linpresup.can <> linpresup.srv
and Linpresup.FecAnulacion is null and Linpresup.Fse <= @FECHATRABAJO
and LinPresup.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11' or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
) as TablaTemp
WHERE NUM_REG BETWEEN @FILTROPAG AND 1500
order by NUM_REG ASC
----------
This is the new query with the changes applied:
CHECKPOINT;
go
dbcc freeproccache
go
dbcc dropcleanbuffers
go
Declare @FILTROPAG bigint
set @FILTROPAG = 1
Declare @FECHATRABAJO DATETIME
set @FECHATRABAJO = getDate()
SELECT Linpresup.IdCliente, Linpresup.IdPedido, Linpresup.FSE, Linpresup.IdArticulo,
Linpresup.Des, Linpresup.can, linpresup.mca, linpresup.srv,
Linpresup.IdAlmacen, linpresup.IdPista, linpresup.Linea
into #TEMPREP
FROM Linpresup
where Linpresup.FecAnulacion is null and linpresup.IdPedido <> ''
and (linpresup.can <> linpresup.srv) and Linpresup.Fse <= @FECHATRABAJO
Select *, ((can*mca)-(srv*mca)) as Pendiente
From(
Select tablaTemp.*, ROW_NUMBER() OVER(ORDER BY FSECONVERT + IDPedido + LINCONVERT DESC) as NUM_REG, Articulos.Tip
From(
Select #TEMPREP.*,
Substring(#TEMPREP.IdPedido, 5, 2) as NewCol,
CONVERT(Char(19), #TEMPREP.FSE, 120) as FSECONVERT, CONVERT(char(2), #TEMPREP.Linea) as LINCONVERT
from #TEMPREP INNER JOIN Pedidos on #TEMPREP.IdPedido = Pedidos.IdPedido
where Pedidos.Cerrado = 'false'
and #TEMPREP.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')) as tablaTemp
inner join Articulos on tablaTemp.IDArticulo = Articulos.IdArticulo
where (NewCol = '10' or NewCol = '11')) as TablaTemp2
where NUM_REG BETWEEN @FILTROPAG AND 1500
order by NUM_REG ASC
DROP TABLE #TEMPREP
The total execution time has decreased from 5336 ms to 3978 ms, and the wait time for a server response from 5309 ms to 2730 ms. It's something.
This part of your query is not SARGable and an index scan will be performed instead of a seek
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11'
or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
Functions around column names will, in general, lead to an index scan rather than a seek.
Without seeing your execution plan it's hard to say. That said the following jumps out at me as a potential danger point:
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11'
or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
I suspect that using the substring function here will cause any potentially useful indexes to not be used. Also, why are you using LIKE here? I'm guessing it probably gets optimized out, but it seems like a standard = would work...
I can't imagine why you would think such a query would run quickly. You are:
ordering the recordset twice (once using concatenation and functions in the ORDER BY),
your where clause has functions (which are not sargable) and ORs
which are almost always slow,
you use not in where not exists would probably be faster.
you have math calculations
And you haven't mentioned your indexing (which may or may not be helpful) or what the execution plan shows as the spots that are affecting performance the most.
I would probably start with pulling the distinct data into a CTE or temp table (you can index temp tables) without the calculations (to ensure that when you do the calcs later, it is against the smallest data set). Then I would convert the substrings to LinPresup.IdPedido LIKE '1[0-1]%'. I would convert the not in to not exists. I would put the math in the outer query so that it is only done on the smallest data set. A sketch of the NOT EXISTS piece is below.
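For illustration, the NOT IN on Clientes could become a NOT EXISTS like this (a sketch against the names from the question):
-- NOT EXISTS version of: LinPresup.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')
select lp.IdPedido
from LinPresup lp
where not exists (select 1
                  from Clientes c
                  where c.IdCliente = lp.IdCliente
                    and c.Ctd = '4');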

SQL MIN in Sub query causes huge delay

I have a SQL query that I'm trying to debug. It works fine for small sets of data, but on large sets this particular part of it takes 45-50 seconds instead of being sub-second. This subquery is one of the select items in a larger query. I'm basically trying to figure out the earliest work date that falls into the same category as the current row we are looking at (from table dr).
ISNULL(CONVERT(varchar(25),(SELECT MIN(drsd.DateWorked) FROM [TableName] drsd
WHERE drsd.UserID = dr.UserID
AND drsd.Val1 = dr.Val1
OR (((drsd.Val2 = dr.Val2 AND LEN(dr.Val2) > 0) AND (drsd.Val3 = dr.Val3 AND LEN(dr.Val3) > 0) AND (drsd.Val4 = dr.Val4 AND LEN(dr.Val4) > 0))
OR (drsd.Val5 = dr.Val5 AND LEN(dr.Val5) > 0)
OR ((drsd.Val6 = dr.Val6 AND LEN(dr.Val6) > 0) AND (drsd.Val7 = dr.Val7 AND LEN(dr.Val2) > 0))))), '') AS WorkStartDate,
This winds up executing a key lookup some 18 million times on a table that has 346,000 records. I've tried creating an index on it, but haven't had any success. Also, selecting a max value in this same query is sub second in time, as it doesn't have to execute very many times at all.
Any suggestions of a different approach to try? Thanks!
Create a composite index on drsd (UserID, DateWorked).
It is also possible that the record distribution in drsd is skewed towards the greater dates, like this:
DateWorked Condition
01.01.2001 FALSE
02.01.2001 FALSE
…
18.04.2010 FALSE
19.04.2010 TRUE
In this case, the MAX query will need to browse over only 1 record, while the MIN query will have to browse all records from 2001 and further on.
In this case, you'll need to create four separate indexes:
UserId, Val1, DateWorked
UserId, Val2, Val3, Val4, DateWorked
UserId, Val5, DateWorked
UserId, Val6, Val7, DateWorked
and rewrite the subquery:
SELECT MIN(dateWorked)
FROM (
SELECT MIN(DateWorked) AS DateWorked
FROM drsd
WHERE UserID = dr.UserID
AND Val1 = dr.Val1
UNION ALL
SELECT MIN(DateWorked)
FROM drsd
WHERE UserID = dr.UserID
AND drsd.Val2 = dr.Val2 AND LEN(dr.Val2) > 0
AND drsd.Val3 = dr.Val3 AND LEN(dr.Val3) > 0
AND drsd.Val4 = dr.Val4 AND LEN(dr.Val4) > 0
UNION ALL
SELECT MIN(DateWorked)
FROM drsd
WHERE UserID = dr.UserID
AND drsd.Val5 = dr.Val5 AND LEN(dr.Val5) > 0
UNION ALL
SELECT MIN(DateWorked)
FROM drsd
WHERE UserID = dr.UserID
AND drsd.Val6 = dr.Val6 AND LEN(dr.Val6) > 0
AND drsd.Val7 = dr.Val7 AND LEN(dr.Val7) > 0
) q
Each query will use its own index and the final query will just select the minimal of the four values (which is instant).
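A sketch of those four indexes (the index names are made up, and [TableName] is the table aliased as drsd in the question):
CREATE INDEX ix_drsd_user_val1     ON [TableName] (UserID, Val1, DateWorked);
CREATE INDEX ix_drsd_user_val2_3_4 ON [TableName] (UserID, Val2, Val3, Val4, DateWorked);
CREATE INDEX ix_drsd_user_val5     ON [TableName] (UserID, Val5, DateWorked);
CREATE INDEX ix_drsd_user_val6_7   ON [TableName] (UserID, Val6, Val7, DateWorked);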