postgres: select rows where foreign key count less than value - sql

I've inherited a particularly slow performing query but I'm unclear of the best path to maintain the functionality and reduce the query cost.
A pared down version of the query looks like so:
select * from api_event
where COALESCE(
        (SELECT count(*) FROM api_ticket
         WHERE event_id = api_event.id),
        0) < api_event.ticket_max
  AND COALESCE(
        (SELECT count(*) FROM api_ticket
         WHERE api_ticket.user_id = 45187
           AND event_id = api_event.id
           AND api_ticket.status != 'x'),
        0) < api_event.ticket_max_per_user;
Running EXPLAIN ANALYZE on that seems to tell me that this requires a sequential scan on the api_event table:
Seq Scan on api_event (cost=0.00..69597.99 rows=448 width=243) (actual time=0.059..230.981 rows=1351 loops=1)
Filter: ((COALESCE((SubPlan 1), 0::bigint) < ticket_max) AND (COALESCE((SubPlan 2), 0::bigint) < ticket_max_per_user))
Rows Removed by Filter: 2647
Any suggestions on how I can improve this?

Rewriting the query as an explicit join will probably help:
select e.*
from api_event e
left join (select t.event_id,
                  count(*) as cnt,
                  sum(case when t.user_id = 45187 and t.status <> 'x'
                           then 1 else 0 end) as special_cnt
           from api_ticket t
           group by t.event_id
          ) t
       on e.id = t.event_id
where coalesce(t.cnt, 0) < e.ticket_max and
      coalesce(t.special_cnt, 0) < e.ticket_max_per_user;
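In either form, an index on api_ticket(event_id) should help the counting; a hedged sketch, with user_id and status added so the per-user count can be served from the index as well (the index name and column choice are my assumptions):
CREATE INDEX api_ticket_event_user_status_idx
    ON api_ticket (event_id, user_id, status);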

These are correlated subqueries. I have recently improved the performance of several queries by avoiding correlated subqueries in favor of WITH-based (CTE) queries; the gain was dramatic in Oracle, and I hope it helps you with Postgres too.
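A hedged sketch of that idea against the schema above, folding both counts into one pass over api_ticket (the FILTER clause needs Postgres 9.4 or later; on older versions use the sum(case ...) form from the previous answer):
WITH ticket_counts AS (
    SELECT event_id,
           count(*) AS cnt,
           count(*) FILTER (WHERE user_id = 45187 AND status <> 'x') AS special_cnt
    FROM api_ticket
    GROUP BY event_id
)
SELECT e.*
FROM api_event e
LEFT JOIN ticket_counts t ON t.event_id = e.id
WHERE COALESCE(t.cnt, 0) < e.ticket_max
  AND COALESCE(t.special_cnt, 0) < e.ticket_max_per_user;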

Related

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table that contains over 30 million rows. The result set is then selected from using WHERE n = 1. In the query plan, the SORT operation in the ROW_NUMBER() windowed function is 95% of the query's cost and it is taking over 6 minutes to complete execution.
I already have an index on (same_voter, eid, country) INCLUDE (vid, nid, sid, vote, time_stamp, new) to cover the WHERE clause.
Is the most efficient way to correct this to add an index on (vid, nid, sid, new DESC, time_stamp DESC), or is there an alternative to the ROW_NUMBER() function that achieves the same results more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query basically says: get all rows matching your criteria, then join to any other rows that match the same criteria but would be ranked higher for the partition based on the new and time_stamp columns. If no such row is found, this must be the highest-ranked row, the one you want, and V2.vid will be NULL. I'm assuming that vid otherwise can never be NULL; if it is a NULLable column in your table, you'll need to adjust that last line of the query.
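If you also want to test the index route from the question, here is a hedged sketch of an index matching the partition and sort order (the index name, and the INCLUDE list meant to make it covering, are my guesses):
CREATE INDEX IX_Votes_vid_nid_sid_new_ts
    ON dbo.Votes (vid, nid, sid, new DESC, time_stamp DESC)
    INCLUDE (vote, eid, same_voter, country);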

PostgreSQL use case when result in where clause

I use a complex CASE WHEN for selecting values. I would like to use this result in the WHERE clause, but Postgres says column 'd' does not exist.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100 OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100 OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting 100 rows in the result. Probably (I am not sure) I could apply LIMIT and OFFSET outside the query with the CASE statement (where the WHERE clause is now), but I think (I am not sure why) this would be a performance hit.
The CASE returns an array or NULL. What is the best/fastest way to exclude rows whose CASE result is NULL? I need 100 rows (or fewer if that many don't exist, of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published = true
  AND coordinates IS NOT NULL
  AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872)
  AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your LIMIT returns fewer than 100 rows when the 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without your limiting logic (the CASE conditions) being re-written to work inside the WHERE clause:
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.
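A hedged sketch of that pattern against the toy table from above; the conditions the CASE tests are duplicated in the WHERE clause, so LIMIT counts only rows with a non-NULL result:
SELECT id, name,
       CASE WHEN name = 'foo' THEN true
            WHEN name = 'bar' THEN false
            ELSE NULL
       END AS c
FROM case_in_where
WHERE name IN ('foo', 'bar') -- same tests as the CASE, kept in sync by hand
LIMIT 100;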

Postgres slow query (slow index scan)

I have a table with 3 million rows, 1.3 GB in size. I am running Postgres 9.3 on my laptop with 4 GB RAM.
explain analyze
select act_owner_id from cnt_contacts where act_owner_id = 2
I have a btree index on cnt_contacts.act_owner_id, defined as:
CREATE INDEX cnt_contacts_idx_act_owner_id
ON public.cnt_contacts USING btree (act_owner_id, status_id);
The query runs in about 5 seconds:
Bitmap Heap Scan on cnt_contacts (cost=2598.79..86290.73 rows=6208 width=4) (actual time=5865.617..5875.302 rows=5444 loops=1)
Recheck Cond: (act_owner_id = 2)
-> Bitmap Index Scan on cnt_contacts_idx_act_owner_id (cost=0.00..2597.24 rows=6208 width=0) (actual time=5865.407..5865.407 rows=5444 loops=1)
Index Cond: (act_owner_id = 2)
Total runtime: 5875.684 ms
Why is it taking so long? These are my settings:
work_mem = 1024MB;
shared_buffers = 128MB;
effective_cache_size = 1024MB
seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 15.0 # same scale as above
cpu_tuple_cost = 3.0
OK, you have a big table, an index, and a long execution time in PG. Let's think about ways to improve your plan and reduce the time. You write and remove rows; PG writes and removes tuples, so the table and the index can become bloated. For fast searches PG loads the index into shared buffers, so you need to keep your indexes as clean as possible and size the buffers sensibly. Try to tune buffer memory, reduce index and table bloat, and keep the database clean.
What to do and think about:
1) Check for duplicate indexes and verify that your indexes have good selectivity:
WITH table_scans as (
SELECT relid,
tables.idx_scan + tables.seq_scan as all_scans,
( tables.n_tup_ins + tables.n_tup_upd + tables.n_tup_del ) as writes,
pg_relation_size(relid) as table_size
FROM pg_stat_user_tables as tables
),
all_writes as (
SELECT sum(writes) as total_writes
FROM table_scans
),
indexes as (
SELECT idx_stat.relid, idx_stat.indexrelid,
idx_stat.schemaname, idx_stat.relname as tablename,
idx_stat.indexrelname as indexname,
idx_stat.idx_scan,
pg_relation_size(idx_stat.indexrelid) as index_bytes,
indexdef ~* 'USING btree' AS idx_is_btree
FROM pg_stat_user_indexes as idx_stat
JOIN pg_index
USING (indexrelid)
JOIN pg_indexes as indexes
ON idx_stat.schemaname = indexes.schemaname
AND idx_stat.relname = indexes.tablename
AND idx_stat.indexrelname = indexes.indexname
WHERE pg_index.indisunique = FALSE
),
index_ratios AS (
SELECT schemaname, tablename, indexname,
idx_scan, all_scans,
round(( CASE WHEN all_scans = 0 THEN 0.0::NUMERIC
ELSE idx_scan::NUMERIC/all_scans * 100 END),2) as index_scan_pct,
writes,
round((CASE WHEN writes = 0 THEN idx_scan::NUMERIC ELSE idx_scan::NUMERIC/writes END),2)
as scans_per_write,
pg_size_pretty(index_bytes) as index_size,
pg_size_pretty(table_size) as table_size,
idx_is_btree, index_bytes
FROM indexes
JOIN table_scans
USING (relid)
),
index_groups AS (
SELECT 'Never Used Indexes' as reason, *, 1 as grp
FROM index_ratios
WHERE
idx_scan = 0
and idx_is_btree
UNION ALL
SELECT 'Low Scans, High Writes' as reason, *, 2 as grp
FROM index_ratios
WHERE
scans_per_write <= 1
and index_scan_pct < 10
and idx_scan > 0
and writes > 100
and idx_is_btree
UNION ALL
SELECT 'Seldom Used Large Indexes' as reason, *, 3 as grp
FROM index_ratios
WHERE
index_scan_pct < 5
and scans_per_write > 1
and idx_scan > 0
and idx_is_btree
and index_bytes > 100000000
UNION ALL
SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp
FROM index_ratios, all_writes
WHERE
( writes::NUMERIC / ( total_writes + 1 ) ) > 0.02
AND NOT idx_is_btree
AND index_bytes > 100000000
ORDER BY grp, index_bytes DESC )
SELECT reason, schemaname, tablename, indexname,
index_scan_pct, scans_per_write, index_size, table_size
FROM index_groups;
2) Check whether your tables and indexes are bloated:
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::FLOAT/otta END)::NUMERIC,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::FLOAT/iotta END)::NUMERIC,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::FLOAT)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::FLOAT)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::NUMERIC AS datahdr,
(maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+COUNT(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::NUMERIC) AS bs,
CASE WHEN SUBSTRING(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC
3) Are dead tuples being cleaned from disk? Is it time to vacuum?
SELECT
relname AS TableName
,n_live_tup AS LiveTuples
,n_dead_tup AS DeadTuples
FROM pg_stat_user_tables;
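If DeadTuples is large relative to LiveTuples, a manual pass is a reasonable next step; as a sketch (VACUUM ANALYZE also refreshes the planner statistics):
VACUUM (VERBOSE, ANALYZE) cnt_contacts;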
4) Think about selectivity. If you have 10 records in the DB and 8 of 10 have id = 2, the index has bad selectivity for that value and PG will scan all 8 records anyway. But if you query id != 2, the index works well. Build indexes with good selectivity.
5) Use the proper column type for your data. If a smaller type fits the column, convert to it.
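Purely as an illustration (the choice of status_id and smallint here is my assumption, and the conversion rewrites the table):
ALTER TABLE cnt_contacts ALTER COLUMN status_id TYPE smallint;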
6) Check your DB and its configuration as a whole; an introductory performance-tuning guide is a good place to start.
In short: see whether your tables hold unused data, keep indexes clean, check the selectivity of your indexes, consider other index types (such as BRIN) for your data, and try recreating indexes.
You are selecting 5444 records scattered over a 1.3 GB table on a laptop. How long do you expect that to take?
It looks like your index is not cached, either because it can't be sustained in the cache, or because this is the first time you used that part of it. What happens if you run the exact same query repeatedly? The same query but with a different constant?
Running the query under EXPLAIN (ANALYZE, BUFFERS) would be helpful to get additional information, particularly if you turn track_io_timing on first.
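For example (track_io_timing exists from 9.2 on; changing it may require superuser rights):
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT act_owner_id FROM cnt_contacts WHERE act_owner_id = 2;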

SQL: Using COUNT(*) Instead of EXISTS

Is it possible to use COUNT in place of EXISTS?
I have following query:
SELECT *
FROM Goals G
WHERE EXISTS (SELECT NULL FROM tfv_home_last6(G.Date, G.Home) WHERE GameNumber <= 6 AND
HomeGoals >= 3)
Instead of returning the row if at least one row exists in the subquery, I'd like to specify a number of rows that need to be returned in the subquery, something like
SELECT *
FROM Goals G
WHERE ROWCOUNT(*) >= 2 (SELECT NULL FROM tfv_home_last6(G.Date, G.Home) WHERE GameNumber <= 6 AND
HomeGoals >= 3)
I'm not sure how to go about it.
I'm using SQL Server 2012.
You can do the subquery pretty much just like you describe:
SELECT *
FROM Goals G
WHERE (SELECT count(*)
FROM tfv_home_last6(G.Date, G.Home)
WHERE GameNumber <= 6 AND HomeGoals >= 3
) > 0;
However, this requires calculating the entire count. The exists form is more efficient, because it stops at the first matching record.
In SQL Server 2012, you could also use CROSS APPLY:
SELECT *
FROM Goals G cross apply
(select count(*) as cnt
FROM tfv_home_last6(G.Date, G.Home)
WHERE GameNumber <= 6 AND HomeGoals >= 3
) a
WHERE a.cnt > 0;
I do not know which would have better performance: the correlated subquery in the WHERE clause or the CROSS APPLY version.
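Another hedged option on SQL Server 2012 that keeps the early-exit behavior of EXISTS: skip n - 1 rows with OFFSET, so the test succeeds only when at least n rows exist. Sketched for n = 2; the ORDER BY (SELECT NULL) is only there because OFFSET requires an ORDER BY:
SELECT *
FROM Goals G
WHERE EXISTS (SELECT 1
              FROM tfv_home_last6(G.Date, G.Home)
              WHERE GameNumber <= 6 AND HomeGoals >= 3
              ORDER BY (SELECT NULL)
              OFFSET 1 ROWS); -- true only if a second matching row exists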

Excessive runtime sql. How to improve?

I have the following SQL query, executed in SQL Server Management Studio, and it takes more than 5 seconds to run. The tables joined by the inner joins hold just a few tens of thousands of records each. Why does it take so long?
The highest costs in the query plan are:
- Clustered Index Scan [MyDB].[dbo].[LinPresup].[PK_LinPresup_Linea_IdPresupuesto_IdPedido]: 78%
- Clustered Index Seek [MyDB].[dbo].[Pedidos].[PK_Pedidos_IdPedido]: 19%
Thank you.
Declare @FILTROPAG bigint
set @FILTROPAG = 1
Declare @FECHATRABAJO DATETIME
set @FECHATRABAJO = getDate()
Select * from(
SELECT distinct Linpresup.IdCliente, Linpresup.IdPedido, Linpresup.FSE, Linpresup.IdArticulo,
Linpresup.Des, ((Linpresup.can*linpresup.mca)-(linpresup.srv*linpresup.mca)) as Pendiente,
Linpresup.IdAlmacen, linpresup.IdPista, articulos.Tip, linpresup.Linea,
ROW_NUMBER() OVER(ORDER BY CONVERT(Char(19), Linpresup.FSE, 120) +
Linpresup.IdPedido + CONVERT(char(2), linpresup.Linea) DESC) as NUM_REG
FROM Linpresup INNER JOIN Pedidos on LinPresup.IdPedido = Pedidos.IdPedido
INNER JOIN Articulos ON Linpresup.IdArticulo = Articulos.IdArticulo
where pedidos.Cerrado = 'false' and linpresup.IdPedido <> '' and linpresup.can <> linpresup.srv
and Linpresup.FecAnulacion is null and Linpresup.Fse <= @FECHATRABAJO
and LinPresup.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11' or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
) as TablaTemp
WHERE NUM_REG BETWEEN @FILTROPAG AND 1500
order by NUM_REG ASC
----------
This is the new query with the changes applied:
CHECKPOINT;
go
dbcc freeproccache
go
dbcc dropcleanbuffers
go
Declare @FILTROPAG bigint
set @FILTROPAG = 1
Declare @FECHATRABAJO DATETIME
set @FECHATRABAJO = getDate()
SELECT Linpresup.IdCliente, Linpresup.IdPedido, Linpresup.FSE, Linpresup.IdArticulo,
Linpresup.Des, Linpresup.can, linpresup.mca, linpresup.srv,
Linpresup.IdAlmacen, linpresup.IdPista, linpresup.Linea
into #TEMPREP
FROM Linpresup
where Linpresup.FecAnulacion is null and linpresup.IdPedido <> ''
and (linpresup.can <> linpresup.srv) and Linpresup.Fse <= @FECHATRABAJO
Select *, ((can*mca)-(srv*mca)) as Pendiente
From(
Select tablaTemp.*, ROW_NUMBER() OVER(ORDER BY FSECONVERT + IDPedido + LINCONVERT DESC) as NUM_REG, Articulos.Tip
From(
Select #TEMPREP.*,
Substring(#TEMPREP.IdPedido, 5, 2) as NewCol,
CONVERT(Char(19), #TEMPREP.FSE, 120) as FSECONVERT, CONVERT(char(2), #TEMPREP.Linea) as LINCONVERT
from #TEMPREP INNER JOIN Pedidos on #TEMPREP.IdPedido = Pedidos.IdPedido
where Pedidos.Cerrado = 'false'
and #TEMPREP.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')) as tablaTemp
inner join Articulos on tablaTemp.IDArticulo = Articulos.IdArticulo
where (NewCol = '10' or NewCol = '11')) as TablaTemp2
where NUM_REG BETWEEN @FILTROPAG AND 1500
order by NUM_REG ASC
DROP TABLE #TEMPREP
The total execution time has decreased from 5336 ms to 3978 ms, and the wait time for a server response from 5309 ms to 2730 ms. It's something.
This part of your query is not SARGable and an index scan will be performed instead of a seek
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11'
or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
Functions around column names will, in general, lead to an index scan.
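A common workaround, sketched here with assumed names rather than as a drop-in fix, is a computed column holding the substring; the column can be indexed and compared directly:
ALTER TABLE LinPresup ADD IdPedido_Pos5 AS SUBSTRING(IdPedido, 5, 2) PERSISTED;
CREATE INDEX IX_LinPresup_IdPedido_Pos5 ON LinPresup (IdPedido_Pos5);
-- the filter then becomes sargable: WHERE IdPedido_Pos5 IN ('10', '11')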
Without seeing your execution plan it's hard to say. That said the following jumps out at me as a potential danger point:
and Substring(LinPresup.IdPedido, 5, 2) LIKE '11'
or Substring(LinPresup.IdPedido, 5, 2) LIKE '10'
I suspect that using the substring function here will cause any potentially useful indexes to not be used. Also, why are you using LIKE here? I'm guessing it probably gets optimized out, but it seems like a standard = would work...
I can't imagine why you would think such a query would run quickly. You are:
- ordering the recordset twice (once in a window function using concatenation and functions),
- using functions in your WHERE clause (which are not sargable) and ORs, which are almost always slow,
- using NOT IN where NOT EXISTS would probably be faster,
- doing math calculations.
And you haven't mentioned your indexing (which may or may not be helpful) or what the execution plan shows as the spots that are affecting performance the most.
I would probably start by pulling the distinct data into a CTE or temp table (you can index temp tables) without the calculations, so that when you do the calcs later they run against the smallest data set. Then I would convert the substrings to a pattern such as LinPresup.IdPedido LIKE '____1[01]%' (the four underscores account for SUBSTRING starting at character 5). I would convert the NOT IN to NOT EXISTS, and I would put the math in the outer query so that it is only done on the smallest data set.
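A hedged sketch of the NOT IN to NOT EXISTS piece, using the tables from the question (only the relevant filter fragment is shown):
-- replaces: LinPresup.IdCliente not in (Select IdCliente from Clientes where Ctd = '4')
AND NOT EXISTS (SELECT 1
                FROM Clientes C
                WHERE C.Ctd = '4'
                  AND C.IdCliente = LinPresup.IdCliente)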