Postgres - order by aggregate function on SUM - sql

I use a Postgres SQL function in my query like this:
SELECT
message.id,
note,
earned_media_direct(
SUM(message_stat.posts_delivered)::int,
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks::int,
SUM(message_stat.likes)::int,
team.earned_media_multi_likes::int,
SUM(message_stat.comments)::int,
team.earned_media_multi_comments::int,
SUM(message_stat.shares)::int,
team.earned_media_multi_shares::int
) AS media_points,
count(*) OVER() AS total_count
FROM message
LEFT JOIN team ON team.id = 10
WHERE team_id = 10
GROUP BY message.id, team.id
{$orderBy}
LIMIT 20 OFFSET 1
When returning a list of messages, I want to ORDER BY rank (sorting by "rank" really means sorting by media points).
The function earned_media_direct is defined within Postgres like:
CREATE OR REPLACE FUNCTION public.earned_media_direct(posts bigint, clicks bigint, clicks_multiplier numeric, likes bigint, likes_multiplier numeric, comments bigint, comments_multiplier numeric, reshares bigint, shares_multiplier numeric)
RETURNS numeric
LANGUAGE plpgsql
AS $function$
BEGIN
RETURN COALESCE(clicks, 0) * clicks_multiplier +
COALESCE(likes, 0) * likes_multiplier +
COALESCE(comments, 0) * comments_multiplier +
(COALESCE(posts, 0) + COALESCE(reshares, 0)) * shares_multiplier;
END;
$function$
I tried adding:
ROW_NUMBER() OVER (
ORDER BY earned_media_direct(
SUM(message_stat.posts_delivered),
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks,
SUM(message_stat.likes),
team.earned_media_multi_likes,
SUM(message_stat.comments),
team.earned_media_multi_comments,
SUM(message_stat.shares),
team.earned_media_multi_shares) DESC
) AS rank
I am not sure I am using it correctly in my example. Is there another way to perform ORDER BY rank?
Thanks

We can probably use a subquery for this.
First: wrap your main query in a subquery and omit the ORDER BY inside it.
Second: apply the ordering in the outer query.
-- outer query
SELECT * FROM (
-- sub query (your main query)
SELECT
message.id,
note,
earned_media_direct(
SUM(message_stat.posts_delivered)::int,
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks::int,
SUM(message_stat.likes)::int,
team.earned_media_multi_likes::int,
SUM(message_stat.comments)::int,
team.earned_media_multi_comments::int,
SUM(message_stat.shares)::int,
team.earned_media_multi_shares::int
) AS media_points,
count(*) OVER() AS total_count
FROM message
LEFT JOIN team ON team.id = 10
WHERE team_id = 10
GROUP BY message.id, team.id
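) A
-- order in the outer query, then paginate, so LIMIT/OFFSET apply to the sorted rows
-- (DESC puts the highest media points first; drop it for ascending order)
ORDER BY A.media_points DESC
LIMIT 20 OFFSET 1;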
Hope this helps answer your question.
Reference:
https://www.postgresql.org/docs/current/functions-subquery.html
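As a minimal sketch of ordering on a computed aggregate, the snippet below uses Python's built-in sqlite3 module as a stand-in for Postgres. The simplified message/message_stat tables, the fixed multipliers 2 and 3, and the sample rows are assumptions made for illustration; they are not the asker's schema or the earned_media_direct function. It shows the outer query sorting on the aliased column, with LIMIT applied after the sort.

```python
import sqlite3

# In-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE message_stat (message_id INTEGER, clicks INTEGER, likes INTEGER);
    INSERT INTO message VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO message_stat VALUES (1, 1, 0), (1, 2, 1), (2, 10, 5), (3, 4, 4);
""")

# Inner query: aggregate per message and compute the score under an alias.
# Outer query: sort by that alias first, then paginate.
ranked = conn.execute("""
    SELECT a.id, a.media_points
    FROM (
        SELECT m.id,
               SUM(s.clicks) * 2 + SUM(s.likes) * 3 AS media_points
        FROM message m
        JOIN message_stat s ON s.message_id = m.id
        GROUP BY m.id
    ) AS a
    ORDER BY a.media_points DESC
    LIMIT 2
""").fetchall()

print(ranked)
```

Because ORDER BY and LIMIT sit at the same query level, the two rows returned are the two highest-scoring messages, not an arbitrary pair that is sorted afterwards.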

Related

PostgreSQL - multiple aggregate queries from the same function call

I have a function that returns a setof from a table:
CREATE OR REPLACE FUNCTION get_assoc_addrs_from_bbl(_bbl text)
RETURNS SETOF wow_bldgs AS $$
SELECT bldgs.* FROM wow_bldgs AS bldgs
...
$$ LANGUAGE SQL STABLE;
Here's a sample of what the table would return:
Now I'm writing an "aggregate" function that will return only one row with various (aggregated) data points about the table that this function returns. Here is my current working (and naive) example:
SELECT
count(distinct registrationid) as bldgs,
sum(unitsres) as units,
round(avg(yearbuilt), 1) as age,
(SELECT first(corpname) FROM (
SELECT unnest(corpnames) as corpname
FROM get_assoc_addrs_from_bbl('3012380016')
GROUP BY corpname ORDER BY count(*) DESC LIMIT 1
) corps) as topcorp,
(SELECT first(businessaddr) FROM (
SELECT unnest(businessaddrs) as businessaddr
FROM get_assoc_addrs_from_bbl('3012380016')
GROUP BY businessaddr ORDER BY count(*) DESC LIMIT 1
) rbas) as topbusinessaddr
FROM get_assoc_addrs_from_bbl('3012380016') assocbldgs
As you can see, for the two "subqueries" that require a custom grouping/ordering method, I need to repeat the call to get_assoc_addrs_from_bbl(). Ideally, I'm looking for a structure that would avoid the repeated calls as the function requires a lot of processing and I want the capacity for an arbitrary number of subqueries. I've looked into CTEs and window expressions and the like but no luck.
Any tips? Thank you!
Create simple aggregate function:
create aggregate array_agg2(anyarray) (
sfunc=array_cat,
stype=anyarray);
It aggregates array values into one single-dim array. Example:
# with t(x) as (values(array[1,2]),(array[2,3,4])) select array_agg2(x) from t;
┌─────────────┐
│ array_agg2 │
╞═════════════╡
│ {1,2,2,3,4} │
└─────────────┘
After that your query could be rewritten as
SELECT
count(distinct registrationid) as bldgs,
sum(unitsres) as units,
round(avg(yearbuilt), 1) as age,
(SELECT first(corpname) FROM (
SELECT * FROM unnest(array_agg2(corpnames)) as corpname
GROUP BY corpname ORDER BY count(*) DESC LIMIT 1
) corps) as topcorp,
(SELECT first(businessaddr) FROM (
SELECT * FROM unnest(array_agg2(businessaddrs)) as businessaddr
GROUP BY businessaddr ORDER BY count(*) DESC LIMIT 1
) rbas) as topbusinessaddr
FROM get_assoc_addrs_from_bbl('3012380016') assocbldgs
(assuming I understand your goal correctly)

PSQL replacement for stored procedure, too slow

I have properties. Each property has contracts, and each contract has an integer field rental_area.
Previously I had to get the rental_area of all contracts summed by property, and this worked:
SELECT
Sum(cr.rental_area) total_property_rental_area,
-- bunch of other cr fields
FROM appdata.contract_rental cr
INNER JOIN appdata.domain_building b1
ON ( b1.building_id = cr.building_id )
INNER JOIN appdata.domain_immovable im1
ON ( im1.immovable_id = b1.immovable_id )
GROUP BY im1.property_id
Now the logic has changed: a contract has a list of periods, and one of those periods contains the rental_area of that contract. Finding the correct period requires some special logic.
I tried to fold this logic into the query but could not find a place to put it, so I had to create a stored procedure.
SELECT Sum(p.rental_area) total_property_rental_area
-- bunch of other cr fields
FROM appdata.contract_rental cr
JOIN appdata.rental_period p
ON p.id = Get_current_period_id(cr.contract_rental_id,
cr.end_date_actual)
INNER JOIN appdata.domain_building b1
ON ( b1.building_id = cr.building_id )
INNER JOIN appdata.domain_immovable im1
ON ( im1.immovable_id = b1.immovable_id )
GROUP BY im1.property_id
Procedure:
CREATE OR REPLACE FUNCTION appdata.get_current_period_id(in contract_id_in bigint, in end_date_actual_in Date)
RETURNS bigint AS
$BODY$
DECLARE
period_id bigint;
BEGIN
-- find the period that matches with end date or current date
select id into period_id
from rental_period
where contract_id = contract_id_in
and Coalesce(end_date_actual_in, Now()) >= start_date
order by start_date desc limit 1;
-- if there was no period, just take the first one
IF period_id is null THEN
select id into period_id
from rental_period
where contract_id = contract_id_in
order by start_date asc
limit 1;
END IF;
return period_id;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
But now it is too slow. Is there a way to put the period-finding logic into the SQL to make it faster, without using a stored procedure? The point is that, for each contract, it has to get only a single period based on this logic.
Roll your stored procedure back into the main SQL. The main trick is to COALESCE your two queries (if the first returns null, use the second).
Example: Instead of
p.id = Get_current_period_id(cr.contract_rental_id,
cr.end_date_actual)
Use:
p.id = coalesce(
(select rpx.id
from rental_period rpx
where rpx.contract_id = cr.contract_rental_id
and Coalesce(cr.end_date_actual, Now()) >= rpx.start_date
order by rpx.start_date desc limit 1
),
( select rpy.id
from rental_period rpy
where rpy.contract_id = cr.contract_rental_id
order by rpy.start_date asc
limit 1
)
)
As per the comment below, the following index may also help:
create index on rental_period (contract_id, start_date asc)
Be sure to ANALYZE the table afterward to update the statistics.
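To see the COALESCE fallback in action, here is a small sketch that uses Python's sqlite3 module as a stand-in for Postgres. The trimmed-down tables, the ISO date strings, and the sample rows are assumptions for illustration only. Contract 1 has a period that starts before its end date; contract 2 has none, so the second subquery supplies its earliest period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_rental (
        contract_rental_id INTEGER PRIMARY KEY,
        end_date_actual TEXT
    );
    CREATE TABLE rental_period (
        id INTEGER PRIMARY KEY,
        contract_id INTEGER,
        start_date TEXT,
        rental_area INTEGER
    );
    INSERT INTO contract_rental VALUES (1, '2020-06-30'), (2, '2019-01-01');
    INSERT INTO rental_period VALUES
        (10, 1, '2020-01-01', 100),
        (11, 1, '2020-06-01', 120),
        (12, 1, '2021-01-01', 150),
        (20, 2, '2019-03-01', 70),
        (21, 2, '2019-09-01', 80);
""")

# First subquery: latest period starting on or before the end date (or today).
# Second subquery: fallback to the earliest period when the first finds nothing.
rows = conn.execute("""
    SELECT cr.contract_rental_id, p.id, p.rental_area
    FROM contract_rental cr
    JOIN rental_period p
      ON p.id = COALESCE(
            (SELECT rpx.id
             FROM rental_period rpx
             WHERE rpx.contract_id = cr.contract_rental_id
               AND COALESCE(cr.end_date_actual, date('now')) >= rpx.start_date
             ORDER BY rpx.start_date DESC
             LIMIT 1),
            (SELECT rpy.id
             FROM rental_period rpy
             WHERE rpy.contract_id = cr.contract_rental_id
             ORDER BY rpy.start_date ASC
             LIMIT 1))
    ORDER BY cr.contract_rental_id
""").fetchall()

print(rows)
```

Each contract joins to exactly one period, which is the property the original function was meant to guarantee.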

Record returned from function has columns concatenated

I have a table which stores account changes over time. I need to join that up with two other tables to create some records for a particular day, if those records don't already exist.
To make things easier (I hope), I've encapsulated the query that returns the correct historical data into a function that takes in an account id, and the day.
If I execute "Select * From account_servicetier_for_day(20424, '2014-08-12')", I get the expected result (all the data returned from the function in separate columns). If I use the function within another query, I get all the columns joined into one:
("2014-08-12 14:20:37",hollenbeck,691,12129,20424,69.95,"2Mb/1Mb 20GB Limit",2048,1024,20.000)
I'm using "PostgreSQL 9.2.4 on x86_64-slackware-linux-gnu, compiled by gcc (GCC) 4.7.1, 64-bit".
Query:
Select
'2014-08-12' As day, 0 As inbytes, 0 As outbytes, acct.username, acct.accountid, acct.userid,
account_servicetier_for_day(acct.accountid, '2014-08-12')
From account_tab acct
Where acct.isdsl = 1
And acct.dslservicetypeid Is Not Null
And acct.accountid Not In (Select accountid From dailyaccounting_tab Where Day = '2014-08-12')
Order By acct.username
Function:
CREATE OR REPLACE FUNCTION account_servicetier_for_day(_accountid integer, _day timestamp without time zone) RETURNS setof account_dsl_history_info AS
$BODY$
DECLARE _accountingrow record;
BEGIN
Return Query
Select * From account_dsl_history_info
Where accountid = _accountid And timestamp <= _day + interval '1 day - 1 millisecond'
Order By timestamp Desc
Limit 1;
END;
$BODY$ LANGUAGE plpgsql;
Generally, to decompose rows returned from a function and get individual columns:
SELECT * FROM account_servicetier_for_day(20424, '2014-08-12');
As for the query:
Postgres 9.3 or newer
Cleaner with JOIN LATERAL:
SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, f.* -- but avoid duplicate column names!
FROM account_tab a
, account_servicetier_for_day(a.accountid, '2014-08-12') f -- <-- HERE
WHERE a.isdsl = 1
AND a.dslservicetypeid IS NOT NULL
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
ORDER BY a.username;
The LATERAL keyword is implicit here; functions can always refer to earlier FROM items. The manual:
LATERAL can also precede a function-call FROM item, but in this
case it is a noise word, because the function expression can refer to
earlier FROM items in any case.
Related:
Insert multiple rows in one table based on number in another table
Short notation with a comma in the FROM list is (mostly) equivalent to a CROSS JOIN LATERAL (same as [INNER] JOIN LATERAL ... ON TRUE) and thus removes rows from the result where the function call returns no row. To retain such rows, use LEFT JOIN LATERAL ... ON TRUE:
...
FROM account_tab a
LEFT JOIN LATERAL account_servicetier_for_day(a.accountid, '2014-08-12') f ON TRUE
...
Also, don't use NOT IN (subquery) when you can avoid it. It's the slowest and most tricky of several ways to do that:
Select rows which are not present in other table
I suggest NOT EXISTS instead.
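One reason NOT IN is tricky is its handling of NULLs. The sketch below uses Python's sqlite3 module as a stand-in for Postgres; the two stripped-down tables and the NULL accountid row are assumptions added for illustration. A single NULL in the subquery result makes NOT IN return no rows, while NOT EXISTS still finds the missing accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_tab (accountid INTEGER);
    CREATE TABLE dailyaccounting_tab (accountid INTEGER, day TEXT);
    INSERT INTO account_tab VALUES (1), (2), (3);
    -- One accounting row has a NULL accountid.
    INSERT INTO dailyaccounting_tab VALUES (1, '2014-08-12'), (NULL, '2014-08-12');
""")

# NOT IN: comparing against a set containing NULL yields NULL, never TRUE.
not_in = conn.execute("""
    SELECT accountid
    FROM account_tab
    WHERE accountid NOT IN (
        SELECT accountid FROM dailyaccounting_tab WHERE day = '2014-08-12'
    )
    ORDER BY accountid
""").fetchall()

# NOT EXISTS: only checks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute("""
    SELECT a.accountid
    FROM account_tab a
    WHERE NOT EXISTS (
        SELECT 1
        FROM dailyaccounting_tab d
        WHERE d.day = '2014-08-12'
          AND d.accountid = a.accountid
    )
    ORDER BY a.accountid
""").fetchall()

print(not_in, not_exists)
```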
Postgres 9.2 or older
You can call a set-returning function in the SELECT list (which is a Postgres extension of standard SQL). For performance reasons, this is best done in a subquery. Decompose the (well-known!) row type in the outer query to avoid repeated evaluation of the function:
SELECT '2014-08-12' AS day, 0 AS inbytes, 0 AS outbytes
, a.username, a.accountid, a.userid
, (a.rec).* -- but be wary of duplicate column names!
FROM (
SELECT *, account_servicetier_for_day(a.accountid, '2014-08-12') AS rec
FROM account_tab a
WHERE a.isdsl = 1
AND a.dslservicetypeid Is Not Null
AND NOT EXISTS (
SELECT FROM dailyaccounting_tab
WHERE day = '2014-08-12'
AND accountid = a.accountid
)
) a
ORDER BY a.username;
Related answer by Craig Ringer with an explanation, why it's better not to decompose on the same query level:
How to avoid multiple function evals with the (func()).* syntax in an SQL query?
Postgres 10 removed some oddities in the behavior of set-returning functions in the SELECT:
What is the expected behaviour for multiple set-returning functions in SELECT clause?
Use the function in the FROM clause:
Select
'2014-08-12' As day,
0 As inbytes,
0 As outbytes,
acct.username,
acct.accountid,
acct.userid,
asfd.*
From
account_tab acct
cross join lateral
account_servicetier_for_day(acct.accountid, '2014-08-12') asfd
Where acct.isdsl = 1
And acct.dslservicetypeid Is Not Null
And acct.accountid Not In (Select accountid From dailyaccounting_tab Where Day = '2014-08-12')
Order By acct.username

How to get a last record when using group by in PostgreSQL

This is my table "AuctionDetails"
The following select:
select string_agg("AuctionNO",',' ) as "AuctionNO"
,sum("QuntityInAuction" ) as "QuntityInAuction"
,"AmmanatPattiID"
,"EntryPassDetailsId"
,"BrokerID"
,"TraderID"
,"IsSold"
,"IsActive"
,"IsExit"
,"IsNew"
,"CreationDate"
from "AuctionDetails"
group by "AmmanatPattiID"
,"EntryPassDetailsId"
,"TraderID"
,"IsSold"
,"IsActive"
,"IsExit"
,"IsNew"
,"BrokerID"
,"CreationDate"
gives me this result:
but I need a record like this:
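AuctionNO                             QuntityInAuction AmmanatPattiID EntryPassDetailsId BrokerID TraderID IsSold IsActive IsExit IsNew CreationDate
------------------------------------- ---------------- -------------- ------------------ -------- -------- ------ -------- ------ ----- ------------
AU8797897,AU8797886,AU596220196F37379 1050             -1             228,229            42       42       f      t        f      t     2013-10-10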
In the end I need the latest entry for trader and broker (in our case "42"), the sum of the quantity, and the concatenation of the auction numbers.
The Postgres wiki describes how to define your own FIRST and LAST aggregate functions. For example:
-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS $$
SELECT $2;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE public.LAST (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
The page is here: https://wiki.postgresql.org/wiki/First/last_(aggregate)
There are various ways to do this. Combinations of aggregate and window functions or a combination of window functions and DISTINCT ...
SELECT a.*, b.*
FROM (
SELECT string_agg("AuctionNO", ',') AS "AuctionNO"
,sum("QuntityInAuction") AS "QuntityInAuction"
FROM "AuctionDetails"
) a
CROSS JOIN (
SELECT "AmmanatPattiID"
,"EntryPassDetailsId"
,"BrokerID"
,"TraderID"
,"IsSold"
,"IsActive"
,"IsExit"
,"IsNew"
,"CreationDate"
FROM "AuctionDetails"
ORDER BY "AuctionID" DESC
LIMIT 1
) b
For the simple case of a single result row for a whole table, this may be simplest.
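The cross-join idea can be tried out with a small sketch in Python's sqlite3 module, used here as a stand-in for Postgres. The snake_case table and column names, the sample rows, and the use of SQLite's group_concat in place of string_agg are all assumptions for illustration. One derived table produces the whole-table aggregates, the other picks the most recent row, and the cross join glues them into a single result row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auction_details (
        auction_id INTEGER PRIMARY KEY,
        auction_no TEXT,
        quantity INTEGER,
        broker_id INTEGER,
        trader_id INTEGER
    );
    INSERT INTO auction_details VALUES
        (1, 'AU1', 500, 40, 40),
        (2, 'AU2', 300, 41, 41),
        (3, 'AU3', 250, 42, 42);
""")

# a: aggregates over the whole table (one row).
# b: the latest row by auction_id (one row).
row = conn.execute("""
    SELECT a.auction_nos, a.total_qty, b.broker_id, b.trader_id
    FROM (
        SELECT group_concat(auction_no, ',') AS auction_nos,
               SUM(quantity) AS total_qty
        FROM auction_details
    ) a
    CROSS JOIN (
        SELECT broker_id, trader_id
        FROM auction_details
        ORDER BY auction_id DESC
        LIMIT 1
    ) b
""").fetchone()

print(row)
```

The order of the concatenated auction numbers is not guaranteed in this sketch; in Postgres, string_agg accepts an ORDER BY inside the call if a fixed order is needed.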

Max returning all values on a self join

My problem requires me to query data from the table and include a column that calculates the % increase. I need to pull only the records with the highest % increase using MAX. I think I'm on the right track, but for some reason it's returning all records despite the HAVING clause calling for just the max.
Select
O.Grocery_Item,
TO_CHAR(sum(g.Price_IN_2000), '$99,990.00') TOTAL_IN_2000,
TO_CHAR(sum(g.Estimated_Price_In_2025), '$99,990.00') TOTAL_IN_2025,
TO_CHAR(Round(O.MY_OUTPUT),'9,990') || '%' as My_Output
From
GROCERY_PRICES g,
(SELECT
GROCERY_ITEM,
(((sum(Estimated_Price_In_2025) -
sum(Price_IN_2000))/sum(Price_IN_2000))*100) MY_OUTPUT
FROM
GROCERY_PRICES
GROUP BY GROCERY_ITEM) O
Where
G.GROCERY_ITEM = O.GROCERY_ITEM
GROUP BY
O.GROCERY_ITEM, O.MY_OUTPUT
Having
my_output IN (select Max(O.MY_OUTPUT) from GROCERY_PRICES);
Results:
GROCERY_ITEM TOTAL_IN_2000 TOTAL_IN_2025 MY_OUTPUT
------------------------------ ------------- ------------- ---------
M_004 $2.70 $5.65 109%
B_001 $0.80 $2.64 230%
T_006 $5.70 $6.65 17%
B_002 $2.72 $7.36 171%
E_001 $0.62 $1.78 187%
R_003 $4.00 $13.20 230%
6 rows selected
You can simplify your query so that you select from the Groceries table only once. Since your My_Output column is just a function of numbers you are already producing, the self join is not necessary. I've then used RANK to get the top records (although if you are not concerned about ties, ROWNUM will work better):
SELECT g.Grocery_Item,
g.TOTAL_IN_2000,
g.TOTAL_IN_2025,
g.My_Output
FROM ( SELECT Grocery_Item,
TO_CHAR(TOTAL_IN_2000, '$99,990.00') TOTAL_IN_2000,
TO_CHAR(TOTAL_IN_2025, '$99,990.00') TOTAL_IN_2025,
TO_CHAR(ROUND(((TOTAL_IN_2025 / TOTAL_IN_2000) - 1) * 100), '9,990') || '%' as My_Output,
RANK() OVER(ORDER BY (TOTAL_IN_2025 / TOTAL_IN_2000) - 1 DESC) AS GroceryRank -- rank across all items; partitioning by Grocery_Item would give every row rank 1
FROM ( SELECT g.Grocery_Item,
SUM(g.Price_IN_2000) TOTAL_IN_2000,
SUM(g.Estimated_Price_In_2025) TOTAL_IN_2025
FROM GROCERY_PRICES g
GROUP BY g.Grocery_Item
) g
) g
WHERE GroceryRank = 1;
I've also simplified your percentage calculation.
Try this instead:
select *
from (Select O.Grocery_Item, TO_CHAR(sum(g.Price_IN_2000), '$99,990.00') TOTAL_IN_2000,
TO_CHAR(sum(g.Estimated_Price_In_2025), '$99,990.00') TOTAL_IN_2025,
TO_CHAR(Round(O.MY_OUTPUT),'9,990') || '%' as My_Output
From GROCERY_PRICES g join
(SELECT GROCERY_ITEM,
(((sum(Estimated_Price_In_2025) -
sum(Price_IN_2000))/sum(Price_IN_2000))*100
) MY_OUTPUT
FROM GROCERY_PRICES
GROUP BY GROCERY_ITEM
) O
on G.GROCERY_ITEM = O.GROCERY_ITEM
GROUP BY O.GROCERY_ITEM, O.MY_OUTPUT
ORDER BY my_output desc
) t
where rownum = 1
The problem is that your subquery only has outer references. So o.my_output comes from the outer query, not from the FROM clause of the subquery. You are comparing a value to itself, which for non-NULL values is always true.
Since you want the maximum value, the easiest way is to order the list and take the first row. You can also do this with analytic functions, but rownum is usually more efficient.
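For comparison, here is a portable variant that keeps the asker's MAX idea but computes the maximum over the derived table itself, so the comparison is no longer a value against itself. It is a sketch in Python's sqlite3 module rather than Oracle; the integer-cent columns (chosen so that ties compare exactly) and the sample rows are assumptions for illustration. Unlike the rownum approach, it keeps ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grocery_prices (
        grocery_item TEXT,
        price_in_2000_cents INTEGER,
        estimated_price_in_2025_cents INTEGER
    );
    INSERT INTO grocery_prices VALUES
        ('M_004', 270, 565),
        ('B_001', 50, 164),
        ('B_001', 30, 100),
        ('T_006', 570, 665),
        ('R_003', 400, 1320);
""")

# totals: one row per item; growth: percentage increase per item.
# The final filter compares each pct with the MAX taken over growth itself.
rows = conn.execute("""
    WITH totals AS (
        SELECT grocery_item,
               SUM(price_in_2000_cents) AS t2000,
               SUM(estimated_price_in_2025_cents) AS t2025
        FROM grocery_prices
        GROUP BY grocery_item
    ),
    growth AS (
        SELECT grocery_item,
               (t2025 - t2000) * 100.0 / t2000 AS pct
        FROM totals
    )
    SELECT grocery_item, pct
    FROM growth
    WHERE pct = (SELECT MAX(pct) FROM growth)
    ORDER BY grocery_item
""").fetchall()

print(rows)
```

Both B_001 and R_003 come back because they share the top increase of 230%, which mirrors the tie visible in the question's output.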