Get max/min value of a column independent of the WHERE clause

I have the following query, which I run on PostgreSQL:
select
p.id as p_id,
p.name as p_name,
p.tags,
p.creator,
p.value
p.creation_date,
cp.id as c_part_id,
fr.distance
count(*) OVER() AS total_item
from t_p p
left join t_c_part cp on p.id = cp.p_id
left join t_fl fr on p.id = fr.p_id
where p.name = 'test'
ORDER BY p.id ASC, p.name ASC
OFFSET 0 FETCH NEXT 25 ROWS only
What is missing is that I also need max(p.value) and min(p.value) unaffected by the WHERE clause, that is, calculated over all rows of the table.
I am hoping to do this within one query to reduce the number of transactions.
Honestly, I am not sure it is possible.
What I tried is something like this:
SELECT
(SELECT COUNT(*) from t_p) as count,
(SELECT json_agg(t.*) FROM (
SELECT * FROM t_p
where ***
) AS t) AS rows
But this did not look very nice, as it requires additional JSON manipulation at the backend.
I discovered that I might use a WITH statement to create a temporary view so that the WHERE condition is evaluated only once, but I did not manage to make it work.

You can add the extra columns as scalar subqueries in the form (select min(value) from t_p). Their values are not related to the main query so they should be totally independent.
Your original query has some minor syntax issues (missing commas). I fixed those and the result is:
select
p.id as p_id,
p.name as p_name,
p.tags,
p.creator,
p.value,
p.creation_date,
cp.id as c_part_id,
fr.distance,
count(*) OVER() AS total_item,
(select min(value) from t_p) as min_value,
(select max(value) from t_p) as max_value
from t_p p
left join t_c_part cp on p.id = cp.p_id
left join t_fl fr on p.id = fr.p_id
where p.name = 'test'
ORDER BY p.id ASC, p.name ASC
OFFSET 0 FETCH NEXT 25 ROWS only
See running query (without any data) at DB Fiddle.
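To check that the scalar subqueries really ignore the outer WHERE, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; window functions need SQLite 3.25 or newer). The cut-down t_p table and its four rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of t_p: only the columns the demo needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_p (id INTEGER PRIMARY KEY, name TEXT, value INTEGER);
    INSERT INTO t_p (id, name, value) VALUES
        (1, 'test', 10),
        (2, 'other', 1),
        (3, 'test', 20),
        (4, 'other', 99);
""")

# The WHERE clause filters the outer rows, but each scalar subquery
# scans t_p on its own, so min/max cover the whole table.
rows = conn.execute("""
    SELECT p.id,
           p.value,
           count(*) OVER () AS total_item,
           (SELECT min(value) FROM t_p) AS min_value,
           (SELECT max(value) FROM t_p) AS max_value
    FROM t_p p
    WHERE p.name = 'test'
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Both returned rows carry min_value 1 and max_value 99, which come from rows that the WHERE clause excluded, while total_item counts only the two matching rows.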

You can join to a sub-query that calculates both MIN & MAX.
...
from t_p p
left join t_c_part cp on p.id = cp.p_id
left join t_fl fr on p.id = fr.p_id
cross join (
select
min(value) as min_value
, max(value) as max_value
, avg(value) as avg_value
from t_p
) as v
...
Then use v.min_value and v.max_value in the select.
Doesn't even have to be a LATERAL.

You could get the minimum and maximum "on the side" like this:
select
p.id as p_id,
p.name as p_name,
p.tags,
p.creator,
p.value,
p.creation_date,
cp.id as c_part_id,
fr.distance,
count(*) OVER() AS total_item,
p.min_value,
p.max_value
from (SELECT id,
name,
tags,
creator,
value,
creation_date,
min(value) OVER () AS min_value,
max(value) OVER () AS max_value
FROM t_p) AS p
left join t_c_part cp on p.id = cp.p_id
left join t_fl fr on p.id = fr.p_id
where p.name = 'test'
ORDER BY p.id ASC, p.name ASC
OFFSET 0 FETCH NEXT 25 ROWS only;

Related

How to optimize a select with a max subquery on the same table?

We have many old selects like this:
SELECT
tm."ID",tm."R_PERSONES",tm."R_DATASOURCE",tm."MATCHCODE",
d.NAME AS DATASOURCE,
p.PDID
FROM TABLE_MAPPINGS tm,
PERSONES p,
DATASOURCES d,
(select ID
from TABLE_MAPPINGS
where (R_PERSONES, MATCHCODE)
in (select
R_PERSONES, MATCHCODE
from TABLE_MAPPINGS
where
id in (select max(id)
from TABLE_MAPPINGS
group by MATCHCODE)
)
) tm2
WHERE tm.R_PERSONES = p.ID
AND tm.R_DATASOURCE=d.ID
and tm2.id = tm.id;
These are large tables, and queries take a long time.
How to rebuild them?
Thank you

You can query the table only once using something like (untested as you have not provided a minimal example of your create table statements or sample data):
SELECT *
FROM (
SELECT m.*,
COUNT(CASE WHEN rnk = 1 THEN 1 END)
OVER (PARTITION BY r_persones, matchcode) AS has_max_id
FROM (
SELECT tm.ID,
tm.R_PERSONES,
tm.R_DATASOURCE,
tm.MATCHCODE,
d.NAME AS DATASOURCE,
p.PDID,
RANK() OVER (PARTITION BY tm.matchcode ORDER BY tm.id DESC) As rnk
FROM TABLE_MAPPINGS tm
INNER JOIN PERSONES p ON tm.R_PERSONES = p.ID
INNER JOIN DATASOURCES d ON tm.R_DATASOURCE = d.ID
) m
)
WHERE has_max_id > 0;
This first finds the maximum ID per matchcode using the RANK analytic function, and then finds all the relevant (r_persones, matchcode) pairs using conditional aggregation in a COUNT analytic function.
Note: you want to use the RANK or DENSE_RANK analytic functions to match the maximums as it can match multiple rows per partition; whereas ROW_NUMBER will only ever put a single row per partition first.
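The difference between RANK and ROW_NUMBER on ties can be seen in a small sketch. It runs through Python's standard-library sqlite3 module rather than Oracle (an assumption for the sake of a runnable example; SQLite 3.25 or newer is needed), and the scores table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (grp TEXT, val INTEGER);
    -- Group 'a' has two rows tied for the maximum value.
    INSERT INTO scores (grp, val) VALUES
        ('a', 5), ('a', 5), ('a', 3), ('b', 7);
""")

ranked = conn.execute("""
    SELECT grp,
           val,
           rank()       OVER (PARTITION BY grp ORDER BY val DESC) AS rnk,
           row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rn
    FROM scores
""").fetchall()

# Keep the "first" rows according to each function.
top_by_rank = sorted((grp, val) for grp, val, rnk, rn in ranked if rnk == 1)
top_by_row_number = sorted((grp, val) for grp, val, rnk, rn in ranked if rn == 1)

print(top_by_rank)        # both tied rows of group 'a' survive
print(top_by_row_number)  # exactly one row per group survives
```

With RANK both tied rows of group 'a' are kept; with ROW_NUMBER one of them is dropped, and which one is arbitrary.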

You're querying table_mappings 3 times; how about doing it only once?
WITH
tab_map
AS
(SELECT a.id,
a.r_persones,
a.matchcode,
a.r_datasource,
ROW_NUMBER ()
OVER (PARTITION BY a.matchcode ORDER BY a.id DESC) rn
FROM table_mappings a)
SELECT tm.id,
tm.r_persones,
tm.matchcode,
d.name AS datasource,
p.pdid
FROM tab_map tm
JOIN persones p ON p.id = tm.r_persones
JOIN datasources d ON d.id = tm.r_datasource
WHERE tm.rn = 1

Is there a way to distinct multiple columns in sql?

Is there a way to apply DISTINCT to multiple columns? When I tried to do it with p.name, I got an error.
SELECT DISTINCT( V.NAME ),
POH.status,
poh.shipdate,
pod.orderqty,
POD.receivedqty,
POD.rejectedqty,
p.NAME
FROM purchasing.vendor v
INNER JOIN purchasing.productvendor pv
ON v.businessentityid = pv.businessentityid
INNER JOIN production.product p
ON pv.productid = P.productid
INNER JOIN purchasing.purchaseorderdetail POD
ON P.productid = POD.productid
INNER JOIN purchasing.purchaseorderheader POH
ON POD.purchaseorderid = POH.purchaseorderid
ORDER BY v.NAME,
p.NAME;

If you want one row per NAME, then you can use ROW_NUMBER():
with q as (
<your query here with columns renamed so there are no duplicates>
)
select q.*
from (select q.*,
row_number() over (partition by v_name order by v_name) as seqnum
from q
) q
where seqnum = 1;

DISTINCT is not a function. It is an operator, and its scope is the entire SELECT clause.
(The query formatting is just for emphasizing the point)
SELECT DISTINCT
V.NAME,
POH.status,
poh.shipdate,
pod.orderqty,
POD.receivedqty,
POD.rejectedqty,
p.NAME
FROM purchasing.vendor v
...
That explains the error you get; however, I doubt this will give you the results you are looking for.
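A quick way to convince yourself that the parentheses change nothing is to run both spellings side by side. The sketch below does so with Python's standard-library sqlite3 module (a stand-in for the asker's database, which is an assumption), on a hypothetical vendor_orders table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendor_orders (vendor TEXT, status INTEGER);
    INSERT INTO vendor_orders (vendor, status) VALUES
        ('Acme', 1), ('Acme', 2), ('Acme', 1), ('Bolt', 1);
""")

# "(vendor)" is just a parenthesized expression; DISTINCT still
# applies to the whole (vendor, status) row.
with_parens = conn.execute(
    "SELECT DISTINCT (vendor), status FROM vendor_orders "
    "ORDER BY vendor, status"
).fetchall()

without_parens = conn.execute(
    "SELECT DISTINCT vendor, status FROM vendor_orders "
    "ORDER BY vendor, status"
).fetchall()

print(with_parens)
print(without_parens)
```

Both queries return the same three rows, and 'Acme' still appears twice because its two rows differ in status.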

postgres join max date

I need to construct a join that will give me the most recent price for each product. I vastly simplified the table structures for the purpose of the example, and each table's row count will be in the millions. My previous stabs at this have not been very efficient.

In PostgreSQL, you could try DISTINCT ON to get only the first row per product id in descending create_date order:
SELECT DISTINCT ON (products.id) products.*, prices.*
FROM products
JOIN prices
ON products.id = prices.product_id
ORDER BY products.id, create_date DESC
(Of course, outside of illustrative examples, you should select only the exact columns you need.)

The simplest way to do it is using the row_number function.
SELECT
p.name,
t.amount AS latest_price
FROM (
SELECT
p.*,
row_number() OVER (PARTITION BY product_id ORDER BY create_date DESC) AS rn
FROM
prices p) t
JOIN products p ON p.id = t.product_id
WHERE
rn = 1
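As a small check of the row_number approach, the sketch below runs essentially the same query through Python's standard-library sqlite3 module (a stand-in for PostgreSQL, which is an assumption; SQLite 3.25 or newer is needed). The table contents are made up, and only the columns the query mentions are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prices (product_id INTEGER, amount INTEGER, create_date TEXT);
    INSERT INTO products (id, name) VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO prices (product_id, amount, create_date) VALUES
        (1, 100, '2020-01-01'),
        (1, 120, '2021-01-01'),
        (2,  50, '2019-06-01');
""")

# Number each product's prices from newest to oldest, keep number 1.
latest = conn.execute("""
    SELECT p.name, t.amount AS latest_price
    FROM (
        SELECT pr.*,
               row_number() OVER (PARTITION BY product_id
                                  ORDER BY create_date DESC) AS rn
        FROM prices pr
    ) t
    JOIN products p ON p.id = t.product_id
    WHERE t.rn = 1
    ORDER BY p.name
""").fetchall()

print(latest)
```

Each product appears once, paired with the amount from its most recent create_date.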

While the DISTINCT ON answer worked for my instance, I found there's a faster way for me to get what I need.
SELECT
DISTINCT ON(u.id) u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
ee.post_value
FROM
users_user u
JOIN events_event ee on u.id = ee.actor_id
WHERE
u.id > 20000
ORDER BY
u.id DESC,
ee.time DESC;
takes ~25s on my DB, while
SELECT
u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
e.post_value
FROM
users_user u
JOIN events_event e on u.id = e.actor_id
LEFT JOIN events_event ee on ee.actor_id = e.actor_id
AND ee.time > e.time
WHERE
u.id > 20000
AND ee.id IS NULL
ORDER BY
u.id DESC;
takes ~15s.
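The second query is a self anti-join: an event is the latest for its actor exactly when no later event for the same actor exists. A minimal sketch of that idea, run through Python's standard-library sqlite3 module (an assumption; the original runs on PostgreSQL) with a hypothetical, cut-down events_event table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_event (
        id INTEGER PRIMARY KEY,
        actor_id INTEGER,
        time INTEGER,
        post_value TEXT
    );
    INSERT INTO events_event (id, actor_id, time, post_value) VALUES
        (1, 1, 10, 'a'),
        (2, 1, 20, 'b'),
        (3, 2,  5, 'c');
""")

# For each event e, try to find a later event ee of the same actor.
# Rows where no such ee exists (ee.id IS NULL) are the latest ones.
latest_events = conn.execute("""
    SELECT e.actor_id, e.post_value
    FROM events_event e
    LEFT JOIN events_event ee
           ON ee.actor_id = e.actor_id
          AND ee.time > e.time
    WHERE ee.id IS NULL
    ORDER BY e.actor_id
""").fetchall()

print(latest_events)
```

Note that if two events of one actor share the same latest time, this form returns both, whereas DISTINCT ON returns only one of them.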

Select all threads and order by the latest one

Now that I got the Select all forums and get latest post too.. how? question answered, I am trying to write a query to select all threads in one particular forum and order them by the date of the latest post (column "updated_at").
This is my structure again:
forums                     forum_threads              forum_posts
----------                 -------------              -----------
id                         id                         id
parent_forum (NULLABLE)    forum_id                   content
name                       user_id                    thread_id
description                title                      user_id
icon                       views                      updated_at
                           created_at                 created_at
                           updated_at
                           last_post_id (NULLABLE)
I tried writing this query, and it works, but not as expected: it doesn't order the threads by their last post date:
SELECT DISTINCT ON(t.id) t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC;
How can I solve this one?

Assuming you want a single row per thread and not all rows for all posts.
DISTINCT ON is still the most convenient tool. But the leading ORDER BY items have to match the expressions of the DISTINCT ON clause. If you want to order the result some other way, you need to wrap it into a subquery and add another ORDER BY to the outer query:
SELECT *
FROM (
SELECT DISTINCT ON (t.id)
t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC
) sub
ORDER BY updated_at DESC;
If you are looking for a query without subquery for some unknown reason, this should work, too:
SELECT DISTINCT
t.id
, first_value(u.username) OVER w AS username
, first_value(p.updated_at) OVER w AS updated_at
, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
ORDER BY updated_at DESC;
There is quite a bit going on here:
The tables are joined and rows are selected according to JOIN and WHERE clauses.
The two instances of the window function first_value() are run (on the same window definition) to retrieve username and updated_at from the latest post per thread. This results in as many identical rows as there are posts in the thread.
The DISTINCT step is executed after the window functions and reduces each set to a single instance.
ORDER BY is applied last and updated_at references the OUT column (SELECT list), not one of the two IN columns (FROM list) of the same name.
Yet another variant, a subquery with the window function row_number():
SELECT id, username, updated_at, title
FROM (
SELECT t.id
, u.username
, p.updated_at
, t.title
, row_number() OVER (PARTITION BY t.id
ORDER BY p.updated_at DESC) AS rn
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
) sub
WHERE rn = 1
ORDER BY updated_at DESC;
Similar case:
Return records distinct on one column but order by another column
You'll have to test which is faster. Depends on a couple of circumstances.

Forget the distinct on:
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC;

SELECT distinct gives wrong count

I've got a problem where my count(*) will return the number of rows before distinct rows are filtered.
This is a simplified version of my query. Note that I'll extract a lot of other data from the tables, so group by won't return the same result, as I'd have to group by maybe 10 columns. The way it works is that m is a map mapping q, c and kl, so there can be several references to q.id. I only want one.
SELECT distinct on (q.id) count(*) over() as full_count
from q, c, kl, m
where c.id = m.chapter_id
and q.id = m.question_id
and q.active = 1
and c.class_id = m.class_id
and kl.id = m.class_id
order by q.id asc
If I run this I get full_count = 11210, while it only returns 9137 rows. If I run it without the distinct on (q.id), 11210 is indeed the number of rows.
So it seems that the count function doesn't have access to the filtered rows. How can I solve this? Do I need to rethink my approach?

I'm not entirely sure what exactly you are trying to count, but this might get you started:
select id,
full_count,
id_count
from (
SELECT q.id,
count(*) over() as full_count,
count(*) over (partition by q.id) as id_count,
row_number() over (partition by q.id order by q.id) as rn
from q
join m on q.id = m.question_id
join c on c.id = m.chapter_id and c.class_id = m.class_id
join kl on kl.id = m.class_id
where q.active = 1
) t
where rn = 1
order by id asc
If you need the count per id, then the column id_count would be what you need. If you need the overall count, but just one row per id, then full_count is probably what you want.
(note that I re-wrote your implicit join syntax to use explicit JOINs)
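The underlying reason for the surprising count is that window functions are evaluated before DISTINCT removes duplicates. The sketch below reproduces this with Python's standard-library sqlite3 module (an assumption; SQLite has no DISTINCT ON, so plain DISTINCT is used instead, and SQLite 3.25 or newer is needed). The one-column table m is a hypothetical stand-in for the joined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m (question_id INTEGER);
    -- Six mapping rows that refer to only three distinct questions.
    INSERT INTO m (question_id) VALUES (1), (1), (2), (3), (3), (3);
""")

# count(*) OVER () sees all six rows; DISTINCT is applied afterwards.
before = conn.execute("""
    SELECT DISTINCT question_id, count(*) OVER () AS full_count
    FROM m
    ORDER BY question_id
""").fetchall()

# De-duplicate first in a subquery, then count the remaining rows.
after = conn.execute("""
    SELECT question_id, count(*) OVER () AS full_count
    FROM (SELECT DISTINCT question_id FROM m) AS t
    ORDER BY question_id
""").fetchall()

print(before)
print(after)
```

The first query returns three rows that each report a count of 6; moving the de-duplication into a subquery makes the count match the three rows actually returned.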

You can use a subquery:
select qid, count(*) over () as full_count
from (SELECT distinct q.id as qid
from q, c, kl, m
where c.id = m.chapter_id
and q.id = m.question_id
and q.active = 1
and c.class_id = m.class_id
and kl.id = m.class_id
) t
order by qid asc
But the group by is the right approach. The keyword distinct in select is really just syntactic sugar for doing a group by on all non-aggregate-functioned columns.
