Subquery instead of distinct and order by in postgres

I have two tables: Company, Event.
Company
{
id,
name
}
Event
{
id,
id_company,
date,
name
}
It's a one-to-many relation.
I want to have results of companies (every company only once) with the latest event.
I'm using postgres. I have this query:
select distinct on (COM.id)
COM.id,
COM.name,
EVT.date,
EVT.name
FROM Company COM
LEFT JOIN Event EVT on EVT.id_company = COM.id
ORDER BY COM.id, EVT.date DESC
It looks good but I'm wondering if I could have the same result using subqueries or something else instead of distinct and order by date of the event.

You can use row_number and then select row number 1, like below:
select COM.id,
COM.name,
EVT.date,
EVT.name
FROM Company COM
LEFT JOIN (select EVT.*, row_number() over(partition by id_company order by EVT.date DESC) rnk from Event EVT ) EVT on EVT.id_company = COM.id and rnk=1

You could use rank() to achieve your results using a subquery or CTE such as this one.
with event_rank as (
select id_company, date, name,
rank() over (partition by id_company order by date desc) as e_rank
from event
)
select c.id, c.name, er.date, er.name
from company c
left join event_rank er
on c.id = er.id_company
where er.e_rank = 1
or er.e_rank is null --to get the companies who don't have an event
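The two answers above differ when a company has several events on the same latest date. The following is a minimal sketch of that difference, run against SQLite through Python's sqlite3 module rather than Postgres (it assumes an SQLite build with window functions, 3.25 or later). The sample rows and the helper name latest_events are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, id_company INTEGER, date TEXT, name TEXT);
INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO event VALUES
    (1, 1, '2023-01-01', 'kickoff'),
    (2, 1, '2023-06-01', 'launch'),
    (3, 2, '2023-03-01', 'demo A'),
    (4, 2, '2023-03-01', 'demo B');
""")

def latest_events(window_fn, order_by):
    """Return each company with its top-ranked event(s), or NULLs if it has none."""
    sql = f"""
        SELECT c.id, c.name, e.date, e.name
        FROM company c
        LEFT JOIN (
            SELECT ev.*,
                   {window_fn}() OVER (PARTITION BY id_company ORDER BY {order_by}) AS rnk
            FROM event ev
        ) e ON e.id_company = c.id AND e.rnk = 1
        ORDER BY c.id, e.id
    """
    return conn.execute(sql).fetchall()

# row_number() with an id tiebreaker: exactly one row per company.
with_row_number = latest_events("row_number", "date DESC, id DESC")
# rank() on date alone: tied latest events both get rank 1.
with_rank = latest_events("rank", "date DESC")

print(with_row_number)
print(with_rank)
```

In this sample data, Globex appears once with row_number() and twice with rank(), while Initech (no events) appears once with NULL event columns in both cases. If "every company only once" is a hard requirement, row_number() with an explicit tiebreaker is probably the safer choice.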

Related

How optimize select with max subquery on the same table?

We have many old selects like this:
SELECT
tm."ID",tm."R_PERSONES",tm."R_DATASOURCE",tm."MATCHCODE",
d.NAME AS DATASOURCE,
p.PDID
FROM TABLE_MAPPINGS tm,
PERSONES p,
DATASOURCES d,
(select ID
from TABLE_MAPPINGS
where (R_PERSONES, MATCHCODE)
in (select
R_PERSONES, MATCHCODE
from TABLE_MAPPINGS
where
id in (select max(id)
from TABLE_MAPPINGS
group by MATCHCODE)
)
) tm2
WHERE tm.R_PERSONES = p.ID
AND tm.R_DATASOURCE=d.ID
and tm2.id = tm.id;
These are large tables, and queries take a long time.
How to rebuild them?
Thank you
You can query the table only once using something like (untested as you have not provided a minimal example of your create table statements or sample data):
SELECT *
FROM (
SELECT m.*,
COUNT(CASE WHEN rnk = 1 THEN 1 END)
OVER (PARTITION BY r_persones, matchcode) AS has_max_id
FROM (
SELECT tm.ID,
tm.R_PERSONES,
tm.R_DATASOURCE,
tm.MATCHCODE,
d.NAME AS DATASOURCE,
p.PDID,
RANK() OVER (PARTITION BY tm.matchcode ORDER BY tm.id DESC) As rnk
FROM TABLE_MAPPINGS tm
INNER JOIN PERSONES p ON tm.R_PERSONES = p.ID
INNER JOIN DATASOURCES d ON tm.R_DATASOURCE = d.ID
) m
)
WHERE has_max_id > 0;
This first finds the maximum ID per matchcode using the RANK analytic function, and then finds all the relevant r_persones, matchcode pairs using conditional aggregation in a COUNT analytic function.
Note: you want to use the RANK or DENSE_RANK analytic functions to match the maximums as it can match multiple rows per partition; whereas ROW_NUMBER will only ever put a single row per partition first.
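As a rough check of the idea above, here is a small sketch that runs a simplified form of the original nested IN query and the RANK plus conditional COUNT rewrite against the same data and compares the ids they return. It uses SQLite via Python's sqlite3 module instead of Oracle, keeps only the three columns that matter for the filter, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_mappings (id INTEGER PRIMARY KEY, r_persones INTEGER, matchcode TEXT);
INSERT INTO table_mappings VALUES
    (1, 10, 'A'), (2, 11, 'A'), (3, 10, 'A'),
    (4, 12, 'B'), (5, 12, 'B'),
    (6, 13, 'C');
""")

# Original shape: three passes over table_mappings.
nested_in = """
    SELECT id FROM table_mappings
    WHERE (r_persones, matchcode) IN (
        SELECT r_persones, matchcode FROM table_mappings
        WHERE id IN (SELECT MAX(id) FROM table_mappings GROUP BY matchcode)
    )
    ORDER BY id
"""

# Rewrite: rank ids within each matchcode, then flag every
# (r_persones, matchcode) pair that contains a rank-1 row.
single_scan = """
    SELECT id
    FROM (
        SELECT m.*,
               COUNT(CASE WHEN rnk = 1 THEN 1 END)
                   OVER (PARTITION BY r_persones, matchcode) AS has_max_id
        FROM (
            SELECT tm.*,
                   RANK() OVER (PARTITION BY matchcode ORDER BY id DESC) AS rnk
            FROM table_mappings tm
        ) m
    ) flagged
    WHERE has_max_id > 0
    ORDER BY id
"""

ids_nested = [row[0] for row in conn.execute(nested_in)]
ids_window = [row[0] for row in conn.execute(single_scan)]
print(ids_nested, ids_window)
```

Row 2 is dropped by both queries because its pair (11, 'A') does not contain the maximum id for matchcode 'A', while row 1 is kept because it shares a pair with row 3. Whether the rewrite is actually faster on the real tables would need to be checked against the execution plans.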
You're querying table_mappings 3 times; how about doing it only once?
WITH
tab_map
AS
(SELECT a.id,
a.r_persones,
a.matchcode,
a.r_datasource,
ROW_NUMBER ()
OVER (PARTITION BY a.matchcode ORDER BY a.id DESC) rn
FROM table_mappings a)
SELECT tm.id,
tm.r_persones,
tm.matchcode,
d.name AS datasource,
p.pdid
FROM tab_map tm
JOIN persones p ON p.id = tm.r_persones
JOIN datasources d ON d.id = tm.r_datasource
WHERE tm.rn = 1

Select second most recent date from inner join

I have this query :
SELECT
companies.display_name, companies.pay_schedule_id,
pay_schedule_periods.schedule_id,
pay_schedule_periods.created_at
FROM
companies
INNER JOIN
pay_schedule_periods ON pay_schedule_id = pay_schedule_periods.schedule_id
ORDER BY
companies.display_name, pay_schedule_periods.created_at DESC;
I get this result :
How can I select only the second most recent created_at date from each unique display_name ?
You could use row_number to assign a sequence to your dates and apply this before joining, then include as part of your join criteria, such as:
select c.display_name, c.pay_schedule_id, psp.schedule_id, psp.created_at
from companies c
join (
select schedule_id, created_at,
Row_Number() over(partition by schedule_id order by created_at desc) rn
from pay_schedule_periods
)psp on psp.schedule_id = c.pay_schedule_id and rn = 2
order by c.display_name, psp.created_at desc;
You could also apply this using a lateral join which would simplify further.
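The sketch below tries the rn = 2 idea on a tiny made-up dataset, using SQLite through Python's sqlite3 module. SQLite has no LATERAL join, so the second query substitutes a correlated subquery with LIMIT 1 OFFSET 1; it is only an approximation of what a lateral join would do, not the lateral syntax itself. The table contents are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (display_name TEXT, pay_schedule_id INTEGER);
CREATE TABLE pay_schedule_periods (schedule_id INTEGER, created_at TEXT);
INSERT INTO companies VALUES ('Alpha', 1), ('Beta', 2);
INSERT INTO pay_schedule_periods VALUES
    (1, '2023-01-01'), (1, '2023-02-01'), (1, '2023-03-01'),
    (2, '2023-05-01');
""")

# Number the periods per schedule, newest first, and join on the second one.
by_row_number = conn.execute("""
    SELECT c.display_name, psp.created_at
    FROM companies c
    JOIN (
        SELECT schedule_id, created_at,
               ROW_NUMBER() OVER (PARTITION BY schedule_id ORDER BY created_at DESC) AS rn
        FROM pay_schedule_periods
    ) psp ON psp.schedule_id = c.pay_schedule_id AND psp.rn = 2
    ORDER BY c.display_name
""").fetchall()

# Correlated subquery: skip the newest row and take the next one.
by_offset = conn.execute("""
    SELECT c.display_name,
           (SELECT p.created_at
            FROM pay_schedule_periods p
            WHERE p.schedule_id = c.pay_schedule_id
            ORDER BY p.created_at DESC
            LIMIT 1 OFFSET 1) AS second_latest
    FROM companies c
    ORDER BY c.display_name
""").fetchall()

print(by_row_number)
print(by_offset)
```

Note the difference for Beta, which has only one period: the inner join drops it, while the correlated subquery keeps it with a NULL date.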

How to group/aggregate rows in a table based on values in a column

I have a table of chats between buyers and sellers, and in many cases there are chats between the same buyer and a seller (with different chatids), that have different lastactivity dates (shown here in Unix time).
My goal is to be able to query this table so that only a single chatid is returned for each buyer/seller pair, and this chatid corresponds to whichever chat had the most recent lastactivity - so like this:
I've tried:
SELECT max(lastactivity), chatid, buyerid, supplierid
FROM chat_table
GROUP BY 2,3,4
but this doesn't seem to work...
Anyone able to help?
In Redshift, I would use window functions:
select ct.*
from (select ct.*,
row_number() over (partition by least(buyerid, supplierid), greatest(buyerid, supplierid)
order by lastactivity desc
) as seqnum
from chat_table ct
) ct
where seqnum = 1;
A simpler approach uses a self-join:
SELECT a.buyerid, a.supplierid, a.chatid, a.lastactivity
FROM chat_table a
JOIN (SELECT MAX(lastactivity) AS lastactivity, buyerid, supplierid FROM chat_table GROUP BY buyerid, supplierid) b
ON a.buyerid = b.buyerid
AND a.supplierid = b.supplierid
AND a.lastactivity = b.lastactivity ;
With NOT EXISTS:
SELECT c.*
FROM chat_table c
WHERE NOT EXISTS (
SELECT 1 FROM chat_table
WHERE buyerid = c.buyerid AND supplierid = c.supplierid AND lastactivity > c.lastactivity
)
Another option is to compute the latest activity per pair and join it back to the table to pick up the chatid:
select a.lastactivity, a.buyerid, a.supplierid, b.chatid
from (SELECT max(lastactivity) lastactivity, buyerid, supplierid
FROM chat_table
GROUP BY 2,3) a
left join chat_table b on a.buyerid=b.buyerid and a.supplierid=b.supplierid and a.lastactivity=b.lastactivity;
You need to use a window function such as row_number.
Partition by buyerid and supplierid,
and order by lastactivity descending.
In the outer query, keep only the rows where the row number is 1.
https://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
Since this question is tagged aws-redshift, the query below should work:
Select * from chat_table
where (buyerid, supplierid, lastactivity)
IN(
Select buyerid, supplierid, max(lastactivity) as lastactivity
from chat_table
Group by buyerid, supplierid);
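To see why the attempt in the question returns too many rows, the sketch below compares it with the NOT EXISTS form on three made-up chats. It runs on SQLite through Python's sqlite3 module rather than Redshift, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat_table (chatid INTEGER, buyerid INTEGER, supplierid INTEGER, lastactivity INTEGER);
INSERT INTO chat_table VALUES
    (1, 100, 200, 1000),
    (2, 100, 200, 2000),
    (3, 101, 200, 1500);
""")

# Grouping by chatid as well makes every chat its own group,
# so MAX(lastactivity) is taken over a single row each time.
grouped_with_chatid = conn.execute("""
    SELECT MAX(lastactivity), chatid, buyerid, supplierid
    FROM chat_table
    GROUP BY chatid, buyerid, supplierid
    ORDER BY chatid
""").fetchall()

# Keep a chat only if no other chat for the same pair is more recent.
latest_per_pair = conn.execute("""
    SELECT c.chatid, c.buyerid, c.supplierid, c.lastactivity
    FROM chat_table c
    WHERE NOT EXISTS (
        SELECT 1 FROM chat_table o
        WHERE o.buyerid = c.buyerid
          AND o.supplierid = c.supplierid
          AND o.lastactivity > c.lastactivity
    )
    ORDER BY c.chatid
""").fetchall()

print(grouped_with_chatid)
print(latest_per_pair)
```

The grouped query still returns all three chats, while the NOT EXISTS query returns chats 2 and 3. Like the join and IN variants above, NOT EXISTS would return more than one row for a pair whose latest chats share the same lastactivity value.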

Highest Count with a group

I'm having an absolute brain fade
SELECT p.ProductCategory, f.ProductSubCategory, COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory, f.ProductSubCategory
ORDER BY 1,3 DESC
This shows me the count for each ProductSubCategory, I would like to see only the highest ProductSubCategory per ProductCategory.
I wish to see (I don't care about the Count value)
There are a couple of different ways to do this. One involves joining the results back to themselves and using the max aggregate. But since you are using SQL Server, you can use ROW_NUMBER to achieve the same result:
with cte as (
select p.productcategory, p.ProductSubCategory, COUNT(*) cnt,
ROW_NUMBER() over (partition by p.productcategory order by count(*) desc) rn
from products p
join sales s on p.ProductSubCategory = s.ProductSubCategory
group by p.productcategory, p.ProductSubCategory
)
select *
from cte
where rn = 1
You already have an answer, but the following code may also help:
SELECT p.ProductCategory,
f.ProductSubCategory,
COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
JOIN (
SELECT p.ProductCategory,
f.ProductSubCategory,
ROW_NUMBER() OVER ( PARTITION BY p.ProductCategory
ORDER BY COUNT(*) DESC) [Row]
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory,
f.ProductSubCategory) Lu
ON P.ProductCategory = Lu.ProductCategory
AND f.ProductSubCategory = Lu.ProductSubCategory
WHERE Lu.Row = 1
GROUP By p.ProductCategory,
f.ProductSubCategory

oracle - maximum per group

University Table - UniversityName, UniversityId
Lease Table - LeaseId, BookId, UniversityId, LeaseDate
Book Table - BookId, UniversityId, Category, PageCount.
For each university, I have to find category that had the most number of books leased.
So, something like
UniversityName Category #OfTimesLeased
I have been playing around with it with some success using Dense_Rank etc - but if there is a tie, only one of them shows up, while I want both of them to show up.
Current Query:
select b.UniversityId, MAX(tempTable.category) KEEP (DENSE_RANK FIRST ORDER BY tempTable.counter DESC)
from book b
join
(select count(l.leaseid) AS counter, b.category, b.universityid
from lease l
join book b
on b.bookid =l.bookid AND b.universityid=l.universityid
group by b.category, b.universityid) tempTable
on tempTable.universityid= b.universityid
group by b.universityid
This does not handle ties, and it does not return the number of leases for the most-leased category.
Try this
WITH CTE AS
(
SELECT UniversityName, Category, Count(*) NumOfTimesLeased
FROM University u
INNER JOIN Book b on u.UniversityId = b.UniversityId
INNER JOIN Lease l on b.bookid = l.bookid and b.UniversityId = l.UniversityId
GROUP BY UniversityName, Category
),
CTE2 AS (
SELECT UniversityName, Category, NumOfTimesLeased,
RANK() OVER (PARTITION BY UniversityName
ORDER BY NumOfTimesLeased DESC) Rnk
FROM CTE)
SELECT * FROM CTE2 WHERE Rnk = 1
You are on the right track with the analytic functions:
select UniversityId, Category, NumLeased
from (select t.*,
row_number() over (partition by universityid order by Numleased desc) as seqnum
from (select l.universityid, b.category, count(*) as NumLeased
from lease l join
book b
on l.bookid = b.bookid
group by l.universityid, b.category
) t
) t
where seqnum = 1
I use the row_number() because you only want the one top value. Rank and dense_rank are more useful when you are looking for values other than "1".
If you want the top values to show up when there is a tie, then use dense_rank instead of row_number. The values will be on different rows.
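The count-then-rank pattern from the answers above can be tried on a small dataset where two categories tie for first place. This sketch uses SQLite via Python's sqlite3 module instead of Oracle, and the table contents and the CTE names counts and ranked are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE university (universityid INTEGER PRIMARY KEY, universityname TEXT);
CREATE TABLE book (bookid INTEGER PRIMARY KEY, universityid INTEGER, category TEXT);
CREATE TABLE lease (leaseid INTEGER PRIMARY KEY, bookid INTEGER, universityid INTEGER);
INSERT INTO university VALUES (1, 'North'), (2, 'South');
INSERT INTO book VALUES (1, 1, 'Math'), (2, 1, 'Physics'), (3, 1, 'Art'),
                        (4, 2, 'History'), (5, 2, 'Math');
INSERT INTO lease VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 1), (5, 3, 1),
                         (6, 4, 2), (7, 5, 2), (8, 5, 2);
""")

top_categories = conn.execute("""
    WITH counts AS (
        -- One row per university and category with its lease count.
        SELECT u.universityname, b.category, COUNT(*) AS num_leased
        FROM university u
        JOIN book b ON b.universityid = u.universityid
        JOIN lease l ON l.bookid = b.bookid AND l.universityid = b.universityid
        GROUP BY u.universityname, b.category
    ),
    ranked AS (
        -- RANK gives every tied top category the value 1.
        SELECT counts.*,
               RANK() OVER (PARTITION BY universityname ORDER BY num_leased DESC) AS rnk
        FROM counts
    )
    SELECT universityname, category, num_leased
    FROM ranked
    WHERE rnk = 1
    ORDER BY universityname, category
""").fetchall()

print(top_categories)
```

North has Math and Physics tied at two leases each, and both rows come back. For the rnk = 1 filter, RANK and DENSE_RANK behave the same; they only differ in how the ranks after a tie are numbered.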