Count with subselect really slow in postgres - sql

I have this query:
SELECT c.name, COUNT(t.id)
FROM Cinema c
JOIN CinemaMovie cm ON cm.cinema_id = c.id
JOIN Ticket t ON cm.id = cinema_movie_id
WHERE cm.id IN (
    SELECT cm1.id
    FROM CinemaMovie cm1
    JOIN Movie m1 ON m1.id = cm1.movie_id
    JOIN Ticket t1 ON t1.cinema_movie_id = cm1.id
    WHERE m1.name = 'Hellboy'
      AND t1.time >= timestamp '2019-04-18 00:00:00'
      AND t1.time <= timestamp '2019-04-18 23:59:59' )
GROUP BY c.id;
The problem is that this query runs really slowly (more than a minute) once the table has around 20 million rows. From what I understand, the problem seems to be the inner query, as it takes a long time on its own. Also, I have indexes on all foreign keys. What am I missing?
Also note that when I filter only by name (omitting the date), everything takes about 10 seconds.
EDIT
What I am trying to do is count the number of tickets for each cinema name, based on the movie name and the timestamp on the ticket.

I don't understand why you are using a subquery. Does this do what you want?
SELECT c.name, COUNT(t.id)
FROM Cinema c JOIN
CinemaMovie cm
ON cm.cinema_id = c.id JOIN
Ticket t
ON cm.id = t.cinema_movie_id JOIN
Movie m
ON m.id = cm.movie_id
WHERE m.name = 'Hellboy' AND
t.time >= '2019-04-18'::timestamp and
t.time < '2019-04-19'::timestamp
GROUP BY c.id, c.name;
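If the flattened query is still slow, composite indexes covering both the join and the filter columns may help more than the single-column foreign key indexes alone. A sketch with hypothetical index names based on the columns in the question; verify the effect with EXPLAIN ANALYZE:
-- Hypothetical index names; adjust to your schema.
CREATE INDEX ticket_cinema_movie_time_idx ON Ticket (cinema_movie_id, time);
CREATE INDEX movie_name_idx ON Movie (name);
CREATE INDEX cinema_movie_movie_cinema_idx ON CinemaMovie (movie_id, cinema_id);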

Related

why "OR" operator is slower than union in oracle

Does anyone know why an "OR" operator is slower than a UNION in Oracle?
I have query like this:
Select
O.Order_number,
DA.ID,
DA.Country,
Sum(amount) Amount
from
Order O
left join Delivery_Address DA on
O.ID = DA.order_Id
left join TBL_A on
TBL_A.DA_ID = DA.ID
< ... Left joining another 10 tables>
Left join Transaction Tr on
TR.Order_id = O.id
where
DA.Country = 'USA'
OR
Tr.transaction_Date between to_date('20200701','yyyymmdd') and sysdate
This takes 200 secs for the first 50 records.
Select
O.Order_number,
DA.ID,
DA.Country,
Sum(amount) Amount
from
Order O
left join Delivery_Address DA on
O.ID = DA.order_Id
left join TBL_A on
TBL_A.DA_ID = DA.ID
< ... Left joining another 10 tables>
Left join Transaction Tr on
TR.Order_id = O.id
where
DA.Country = 'USA'
union
Select
O.Order_number,
DA.ID,
DA.Country,
Sum(amount) Amount
from
Order O
left join Delivery_Address DA on
O.ID = DA.order_Id
left join TBL_A on
TBL_A.DA_ID = DA.ID
< ... Left joining another 10 tables>
Left join Transaction Tr on
TR.Order_id = O.id
where
Tr.transaction_Date between to_date('20200701','yyyymmdd') and sysdate
This second query takes 13 secs for the first 50 records.
The transaction_date from the Transaction table is indexed, but the Country column is not indexed.
Anyone have any idea?
The UNION allows each subquery to be optimized independently.
You would have to look at the execution plans to see what is really happening. However, in the first subquery an index on da.country is probably being used, and in the second, one on tr.transaction_date.
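To see which plan Oracle actually picks for each variant, DBMS_XPLAN can display it; a minimal sketch (the SELECT 1 FROM dual is just a stand-in for your real statement):
-- Replace the statement below with either of the queries above.
EXPLAIN PLAN FOR
SELECT 1 FROM dual;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);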

Get all tables where there is no booking on this time or date

So basically I have a tables table and a bookings table. A table can be assigned to a booking via the table_no column. The booking also has reservation_time and reservation_date columns. What I'd like my query to do is return all tables that aren't linked to a booking at a certain time or on a certain date. It's really bugging me.
Here is what my query looks like as of now
select t.id, t.number
FROM tables t JOIN
bookings b
ON b.table_no = t.number JOIN
reservation_time_data r
ON r.id = b.reservation_time
WHERE t.number != b.table_no AND b.reservation_date != '2020-07-22' AND 45 NOT BETWEEN r.start_time AND r.end_time
You seem to want not exists. Based on your sample query, I think this is:
select t.id, t.number
from tables t
where not exists (select 1
                  from bookings b join
                       reservation_time_data r
                       on r.id = b.reservation_time
                  where b.table_no = t.number and
                        b.reservation_date = '2020-07-22' and
                        45 >= r.start_time and
                        45 <= r.end_time
                 );
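If the bookings table is large, an index on the columns the subquery correlates and filters on keeps the NOT EXISTS probe cheap. A sketch with a hypothetical index name:
-- Hypothetical name; covers the correlation on table_no plus the date filter.
CREATE INDEX bookings_table_no_reservation_date_idx ON bookings (table_no, reservation_date);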
I think you can get it with a left join. Move the booking conditions into the ON clause, then keep the rows where no booking matched:
select t.id, t.number
FROM tables t
Left JOIN bookings b ON b.table_no = t.number
    AND (b.reservation_date = '2020-07-22' Or b.[your time column here] BETWEEN b.start_time AND b.end_time)
WHERE b.table_no is null

Count columns of joined table

I am writing a query to summarize the data in a Postgres database:
SELECT products.id,
products.NAME,
product_types.type_name AS product_type,
delivery_types.delivery,
products.required_selections,
Count(s.id) AS selections_count,
Sum(CASE
WHEN ss.status = 'WARNING' THEN 1
ELSE 0
END) AS warning_count
FROM products
JOIN product_types
ON product_types.id = products.product_type_id
JOIN delivery_types
ON delivery_types.id = products.delivery_type_id
LEFT JOIN selections_products sp
ON products.id = sp.product_id
LEFT JOIN selections s
ON s.id = sp.selection_id
LEFT JOIN selection_statuses ss
ON ss.id = s.selection_status_id
LEFT JOIN listings l
ON ( s.listing_id = l.id
     AND l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD')
                               AND To_timestamp('2014/12/30', 'YYYY/mm/DD') )
GROUP BY products.id,
product_types.type_name,
delivery_types.delivery
Basically, we have a product with selections; these selections have listings, and the listings have a local_date. I need a list of all products and how many listings they have between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same goes for warning_count. Also, I don't really understand why Postgres requires me to add a group by here.
The schema looks like this (the parts you would care about anyway):
products
    name:string
    , product_type:fk
    , required_selections:integer
    , deliver_type:fk
selections_products
    product_id:fk
    , selection_id:fk
selections
    selection_status_id:fk
    , listing_id:fk
selection_status
    status:string
listing
    local_date:datetime
The way you have it, you LEFT JOIN to all selections regardless of listings.local_date_time.
There is room for interpretation, we would need to see actual table definitions with all constraints and data types to be sure. Going out on a limb, my educated guess is you can fix your query with the use of parentheses in the FROM clause to prioritize joins:
SELECT p.id
, p.name
, pt.type_name AS product_type
, dt.delivery
, p.required_selections
, count(s.id) AS selections_count
, sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM products p
JOIN product_types pt ON pt.id = p.product_type_id
JOIN delivery_types dt ON dt.id = p.delivery_type_id
LEFT JOIN ( -- LEFT JOIN!
selections_products sp
JOIN selections s ON s.id = sp.selection_id -- INNER JOIN!
JOIN listings l ON l.id = s.listing_id -- INNER JOIN!
AND l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
LEFT JOIN selection_statuses ss ON ss.id = s.selection_status_id
) ON sp.product_id = p.id
GROUP BY p.id, pt.type_name, dt.delivery;
This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that aren't in any applicable selection.
Related:
Join four tables involving LEFT JOIN without duplicates
While selecting all or most products, this can be rewritten to be faster:
SELECT p.id
, p.name
, pt.type_name AS product_type
, dt.delivery
, p.required_selections
, COALESCE(s.selections_count, 0) AS selections_count
, COALESCE(s.warning_count, 0) AS warning_count
FROM products p
JOIN product_types pt ON pt.id = p.product_type_id
JOIN delivery_types dt ON dt.id = p.delivery_type_id
LEFT JOIN (
SELECT sp.product_id
, count(*) AS selections_count
, count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
FROM selections_products sp
JOIN selections s ON s.id = sp.selection_id
JOIN listings l ON l.id = s.listing_id
LEFT JOIN selection_statuses ss ON ss.id = s.selection_status_id
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
GROUP BY 1
) s ON s.product_id = p.id;
It's cheaper to aggregate and count selections and warnings per product_id first, and then join to products. (Unless you only retrieve a small selection of products, then it's cheaper to reduce related rows first.)
Related:
Why does the following join increase the query time significantly?
Also, I don't really understand why Postgres requires me to add a group by here.
Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. That does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.
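A minimal illustration of that rule, assuming products.id is the primary key:
-- Valid since Postgres 9.1: grouping by the PK covers p.name and p.required_selections.
SELECT p.id, p.name, p.required_selections, count(*) AS selections_count
FROM products p
JOIN selections_products sp ON sp.product_id = p.id
GROUP BY p.id;

-- Columns from other tables must still be listed (or aggregated), functionally dependent or not:
-- GROUP BY p.id, pt.type_name, dt.delivery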
My second query avoids this problem from the outset by aggregating before the join.
Aside: chances are, this doesn't do what you want:
l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD')
AND To_timestamp('2014/12/30', 'YYYY/mm/DD')
Since local_date_time seems to be of type timestamp (not timestamptz!), you would include '2014-12-30 00:00', but exclude the rest of the day '2014-12-30'. And it's always better to use ISO 8601 format for dates and timestamps, which means the same thing with every locale and datestyle setting. Hence:
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2014-12-31'
This includes all of '2014-12-30', and nothing else. No idea why you chose to exclude '2014-12-31'. Maybe you really want to include all of Dec. 2014?
WHERE l.local_date_time >= '2014-12-01'
AND l.local_date_time < '2015-01-01'

Self join on joined table

My query looks like
Select m.cw_sport_match_id as MatchId,
m.season_id as SeasonId,
s.title as SeasonName,
c.title as ContestName
from dbo.cw_sport_match m
inner join dbo.cw_sport_season s
ON m.season_id = s.cw_sport_season_id
inner join dbo.cw_sport_contest c
ON m.contest_id = c.cw_sport_contest_id
Where s.date_start <= GETDATE() AND s.date_end >= GETDATE()
order by s.date_start
Now I need the name of the parent of the sport_contest (if there is one; it can be null). So basically a self join, but not on the same table the query is for. All the self join examples I can find are done directly on the table itself, not through another table.
Can any SQL pro help me?
So how can I join cw_sport_season to itself via season_parent_id and get its title?
If I'm understanding your question correctly, you want to outer join the cw_sport_season table to itself using the season_parent_id field. Maybe something along these lines:
Select m.cw_sport_match_id as MatchId,
m.season_id as SeasonId,
s.title as SeasonName,
parent.title as ParentSeasonName,
c.title as ContestName
from dbo.cw_sport_match m
inner join dbo.cw_sport_season s
ON m.season_id = s.cw_sport_season_id
inner join dbo.cw_sport_contest c
ON m.contest_id = c.cw_sport_contest_id
left join dbo.cw_sport_season parent
ON s.season_parent_id = parent.cw_sport_season_id
Where s.date_start <= GETDATE() AND s.date_end >= GETDATE()
order by s.date_start
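If you would rather fall back to the season's own title when there is no parent (a small variation, not part of the original answer), COALESCE handles the NULL produced by the outer join:
-- Drop-in replacement for the ParentSeasonName column in the select list above:
COALESCE(parent.title, s.title) as ParentSeasonName,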

PostgreSQL - INNER JOIN two tables with a LIMIT

I've seen this post, which almost coincides with my question, but my specific problem is that I need to put a limit on the third table/query, as in LIMIT 15, for example. Is there an easy way to achieve this? Thanks!
EDIT
My SQL SELECT statement would look something like this:
SELECT t2.name AS user_name, t3.name AS artist_name
FROM tbl1 t1
INNER JOIN tbl2 t2 ON t1.t1able_id = t2.id
INNER JOIN (SELECT * FROM tbl3 WHERE artist_id = 100 limit 15) t3
ON t2.id = t3.artist_id
WHERE t1.kind = 'kind'
To clarify: it's just a matter of joining two tables, but the second table has two states: the first state as a "common user" and the next state as an "artist" (both using the same table, e.g. users).
Try this query:
select *
from
tableA a
inner join
tableB b
on a.common = b.common
inner join
(select * from tableC order by some_column limit 15) c
on b.common = c.common
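Note that the ORDER BY inside the derived table matters: LIMIT without it would return an arbitrary 15 rows, and the subset could change between runs.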
A few days ago I searched for such an answer, and PostgreSQL lets you do this smoothly with rank().
My goal was to get the last 3 submitted annual report files before a commercial firm started its first insolvency process.
SELECT * FROM
(
SELECT r.id, f.id AS id_file, f.file_date, min(p.process_started) AS "first_process_started",
rank() OVER (PARTITION BY r.id ORDER BY f.file_date DESC) AS "rank"
FROM registry r
INNER JOIN files f ON (r.id = f.id_registry)
INNER JOIN processes p ON (r.id = p.id_registry)
WHERE
r.type = 'LIMITED_LIABILITY_COMPANY'
AND f.file_type = 'ANNUAL_REPORT'
AND p.process_type = 'INSOLVENCY'
GROUP BY r.id, f.id, f.file_date
HAVING f.file_date <= min(p.process_started)
) AS ranked_files
WHERE rank <= 3
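The inner query keeps only files dated no later than the registry's first insolvency process (the HAVING clause), then ranks what remains per registry by file_date, newest first; the outer WHERE rank <= 3 returns the last three such reports for each registry entry.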