PSQL replacement for stored procedure, too slow - sql

I have properties; each property has contracts, and each contract has an integer field rental_area.
Previously I needed the rental_area of all contracts summed per property, and this worked:
SELECT Sum(cr.rental_area) total_property_rental_area,
       -- bunch of other cr fields
FROM   appdata.contract_rental cr
       INNER JOIN appdata.domain_building b1
               ON ( b1.building_id = cr.building_id )
       INNER JOIN appdata.domain_immovable im1
               ON ( im1.immovable_id = b1.immovable_id )
GROUP  BY im1.property_id
Now the logic has changed: each contract has a list of periods, and one of those periods contains the rental_area of that contract. Finding the correct period requires some special logic.
I tried to move that logic into the query, but could not find a place to put it, so I had to create a stored procedure.
SELECT Sum(p.rental_area) total_property_rental_area
       -- bunch of other cr fields
FROM   appdata.contract_rental cr
       JOIN appdata.rental_period p
         ON p.id = Get_current_period_id(cr.contract_rental_id,
                                         cr.end_date_actual)
       INNER JOIN appdata.domain_building b1
               ON ( b1.building_id = cr.building_id )
       INNER JOIN appdata.domain_immovable im1
               ON ( im1.immovable_id = b1.immovable_id )
GROUP  BY im1.property_id
Procedure:
CREATE OR REPLACE FUNCTION appdata.get_current_period_id(IN contract_id_in bigint, IN end_date_actual_in date)
  RETURNS bigint AS
$BODY$
DECLARE
    period_id bigint;
BEGIN
    -- find the period that matches the actual end date or the current date
    SELECT id INTO period_id
    FROM   rental_period
    WHERE  contract_id = contract_id_in
    AND    Coalesce(end_date_actual_in, Now()) >= start_date
    ORDER  BY start_date DESC
    LIMIT  1;

    -- if there was no such period, just take the first one
    IF period_id IS NULL THEN
        SELECT id INTO period_id
        FROM   rental_period
        WHERE  contract_id = contract_id_in
        ORDER  BY start_date ASC
        LIMIT  1;
    END IF;

    RETURN period_id;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
But now it is too slow. Is there a way to put the period-finding logic into the SQL itself to make it faster, without using a stored procedure? The point is that, for each contract, only a single period must be picked based on that logic.

Roll your stored procedure back into the main SQL; the primary hint is coalescing your two queries (if the first one returns null, use the other).
Example: Instead of
p.id = Get_current_period_id(cr.contract_rental_id,
                             cr.end_date_actual)
Use:
p.id = coalesce(
           (select rpx.id
            from   rental_period rpx
            where  rpx.contract_id = cr.contract_rental_id
            and    Coalesce(cr.end_date_actual, Now()) >= rpx.start_date
            order  by rpx.start_date desc
            limit  1),
           (select rpy.id
            from   rental_period rpy
            where  rpy.contract_id = cr.contract_rental_id
            order  by rpy.start_date asc
            limit  1)
       )
As per the comment below, the following index may also help:
create index on rental_period (contract_id, start_date asc)
Be sure to ANALYZE the table afterward to update the statistics.
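If the pair of correlated subqueries is still slow, another option is a single LEFT JOIN LATERAL that picks the right period per contract in one ordered scan. This is only a sketch, assuming the rental_period columns used above (id, contract_id, start_date, rental_area); verify it against your schema:
SELECT Sum(p.rental_area) total_property_rental_area
       -- bunch of other cr fields
FROM   appdata.contract_rental cr
       LEFT JOIN LATERAL (
           SELECT rp.rental_area
           FROM   appdata.rental_period rp
           WHERE  rp.contract_id = cr.contract_rental_id
           -- periods that have already started sort first (latest of them on top);
           -- if none has started yet, fall back to the earliest period
           ORDER  BY (Coalesce(cr.end_date_actual, Now()) >= rp.start_date) DESC,
                     CASE WHEN Coalesce(cr.end_date_actual, Now()) >= rp.start_date
                          THEN rp.start_date END DESC,
                     rp.start_date ASC
           LIMIT  1
       ) p ON true
       INNER JOIN appdata.domain_building b1
               ON ( b1.building_id = cr.building_id )
       INNER JOIN appdata.domain_immovable im1
               ON ( im1.immovable_id = b1.immovable_id )
GROUP  BY im1.property_id
The (contract_id, start_date) index above should turn each lateral lookup into a short index scan.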

Related

Postgres - order by aggregate function on SUM

I use a Postgres SQL function in my query like this:
SELECT
message.id,
note,
earned_media_direct(
SUM(message_stat.posts_delivered)::int,
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks::int,
SUM(message_stat.likes)::int,
team.earned_media_multi_likes::int,
SUM(message_stat.comments)::int,
team.earned_media_multi_comments::int,
SUM(message_stat.shares)::int,
team.earned_media_multi_shares::int
) AS media_points,
count(*) OVER() AS total_count
FROM message
LEFT JOIN team ON team.id = 10
WHERE team_id = 10
GROUP BY message.id, team.id
{$orderBy}
LIMIT 20 OFFSET 1
When returning a list of messages I want to use ORDER BY rank (Sorting by "rank" really means sorting by Media Points)
The function earned_media_direct is defined within Postgres like:
CREATE OR REPLACE FUNCTION public.earned_media_direct(posts bigint, clicks bigint, clicks_multiplier numeric, likes bigint, likes_multiplier numeric, comments bigint, comments_multiplier numeric, reshares bigint, shares_multiplier numeric)
RETURNS numeric
LANGUAGE plpgsql
AS $function$
BEGIN
RETURN COALESCE(clicks, 0) * clicks_multiplier +
COALESCE(likes, 0) * likes_multiplier +
COALESCE(comments, 0) * comments_multiplier +
(COALESCE(posts, 0) + COALESCE(reshares, 0)) * shares_multiplier;
END;
$function$
I tried adding:
ROW_NUMBER() OVER (
ORDER BY earned_media_direct(
SUM(message_stat.posts_delivered),
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks,
SUM(message_stat.likes),
team.earned_media_multi_likes,
SUM(message_stat.comments),
team.earned_media_multi_comments,
SUM(message_stat.shares),
team.earned_media_multi_shares) DESC
) AS rank
I am not sure I am using it correctly in my example. Is there another way to perform ORDER BY rank?
Thanks
We can probably use a "subquery" approach for this.
First: wrap your main query in a subquery and omit the ordering inside it.
Second: apply the ORDER BY in the outer query.
-- outer query
SELECT * FROM (
-- sub query (your main query)
SELECT
message.id,
note,
earned_media_direct(
SUM(message_stat.posts_delivered)::int,
CAST(SUM(message_stat.clicks) AS bigint),
team.earned_media_multi_clicks::int,
SUM(message_stat.likes)::int,
team.earned_media_multi_likes::int,
SUM(message_stat.comments)::int,
team.earned_media_multi_comments::int,
SUM(message_stat.shares)::int,
team.earned_media_multi_shares::int
) AS media_points,
count(*) OVER() AS total_count
FROM message
LEFT JOIN team ON team.id = 10
WHERE team_id = 10
GROUP BY message.id, team.id
LIMIT 20 OFFSET 1
) A
-- move the order to outer query
ORDER BY A.media_points;
Hope this helps answer your question.
Reference:
https://www.postgresql.org/docs/current/functions-subquery.html
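As a side note, PostgreSQL also lets ORDER BY refer to an output-column alias, so another option (a sketch against the original query, with {$orderBy} substituted verbatim) is to order by the computed column directly instead of repeating the function call:
ORDER BY media_points DESC
Placed before LIMIT 20 OFFSET 1, the ordering is applied first and the page is taken from the ranked result.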

How can I improve the native query for a table with 7 million rows?

I have the view (table) below in my database (SQL Server).
I want to retrieve two things from this table.
The object which has the latest booking date for each product number.
This will return the objects {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
If no step number has a booking date for a product number, it will return the object with step number = 1, i.e. the object {0002, 1, NULL}.
The view has 7,000,000 rows and I must do it using a native query.
The first query retrieves the products with the latest booking date:
SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)
The second query retrieves the products with booking date NULL and step number = 1:
SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
I tried using a single query, but it takes too long.
For now I use two queries to get this information, but I need to improve this in the future. Do you have an alternative? I also cannot use a stored procedure or function inside SQL Server; I must do it with a native query from Java.
Try this,
Declare @p table(pnumber int, step int, bookdate datetime)
insert into @p values
 (1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')

;With CTE as
(
    select pnumber, max(bookdate) bookdate
    from @p p1
    where bookdate is not null
    group by pnumber
)
select p.*
from @p p
where exists(select 1 from CTE c
             where p.pnumber = c.pnumber and p.bookdate = c.bookdate)
union all
select p1.*
from @p p1
where p1.bookdate is null and step = 1
  and not exists(select 1 from CTE c
                 where p1.pnumber = c.pnumber)
If performance is the main concern, then whether you use 1 or 2 queries does not matter; in the end only the performance matters.
Create NonClustered index ix_Product on Product (ProductNumber, BookingDate, StepNumber)
Go
If more than 90% of the data falls on one side (BookingDate is not null, or BookingDate is null), you can create a filtered index for it instead:
Create NonClustered index ix_Product_BookingNotNull on Product (ProductNumber, BookingDate, StepNumber)
where BookingDate is not null
Go
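If the NULL branch (bookdate is null and step = 1) is also queried often, a second filtered index could cover it; this is a sketch with a hypothetical name, mirroring the one above:
Create NonClustered index ix_Product_BookingNull on Product (ProductNumber, StepNumber)
where BookingDate is null
Go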
Try row_number() with a proper ordering. Null values are treated as the lowest possible values by SQL Server's ORDER BY.
SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
Pay attention to the indexes SQL Server advises in order to get good performance.
Possibly the most efficient method is a correlated subquery:
select t.*
from t
where t.step_number = (select top (1) t2.step_number
                       from t t2
                       where t2.product_number = t.product_number
                       order by t2.booking_date desc, t2.step_number
                      );
In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
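In SQL Server syntax that index might look like the following sketch (table and index names are placeholders):
CREATE NONCLUSTERED INDEX ix_product_booking
    ON myTable (product_number, booking_date DESC, step_number);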

How to simplify nested SQL cross join?

I'm using Postgres 9.3 and have the following four tables to get maximum flexibility regarding price and/or tax / tax rate changes in the future (see below for more details):
CREATE TABLE main.products
(
  id serial NOT NULL,
  "productName" character varying(255) NOT NULL,
  "productStockAmount" real NOT NULL
);
CREATE TABLE main."productPrices"
(
  id serial NOT NULL,
  product_id integer NOT NULL,
  "productPriceValue" real NOT NULL,
  "productPriceValidFrom" timestamp without time zone NOT NULL
);
CREATE TABLE main."productTaxes"
(
  id serial NOT NULL,
  product_id integer NOT NULL,
  "productTaxValidFrom" timestamp without time zone NOT NULL,
  "taxRate_id" integer NOT NULL
);
CREATE TABLE main."taxRateValues"
(
  id integer NOT NULL,
  "taxRate_id" integer NOT NULL,
  "taxRateValueValidFrom" timestamp without time zone NOT NULL,
  "taxRateValue" real
);
I built a view based on the following query to get the currently relevant values:
SELECT p.id, p."productName", p."productStockAmount",
       sub."productPriceValue",
       CHR(64 + sub3."taxRate_id") AS taxRateId,
       sub3."taxRateValue"
FROM   main."products" p
CROSS  JOIN LATERAL (
       SELECT *
       FROM   main."productPrices" pp2
       WHERE  pp2."product_id" = p."id"
       AND    pp2."productPriceValidFrom" <= NOW()
       ORDER  BY pp2."productPriceValidFrom" DESC
       LIMIT  1
) AS sub
CROSS  JOIN LATERAL (
       SELECT *
       FROM   main."productTaxes" pt
       WHERE  pt."product_id" = p."id"
       AND    pt."productTaxValidFrom" <= NOW()
       ORDER  BY pt."productTaxValidFrom" DESC
       LIMIT  1
) AS sub2
CROSS  JOIN LATERAL (
       SELECT *
       FROM   main."taxRateValues" trv
       WHERE  trv."taxRate_id" = sub2."taxRate_id"
       AND    trv."taxRateValueValidFrom" <= NOW()
       ORDER  BY trv."taxRateValueValidFrom" DESC
       LIMIT  1
) AS sub3
This works fine and gives me the correct results, but I assume I will run into performance problems once several thousand products, price changes etc. are in the database.
Is there anything I can do to simplify the statement or the overall database design?
To use words to describe the needed flexibility:
Prices can be changed, and I have to record which price is valid at which time (archival, so not only the current price is needed)
The tax rate applied to a product can be changed (e.g. due to changes in the law) - archival also needed
Tax rates in general can be changed (also by law, but not related to a single product; it affects all products with this identifier)
Some examples of things that can happen:
Product X changes price from 100 to 200 at 2014-05-09
Product X changes tax rate from A to B at 2014-07-01
Tax rate value for tax rate A changes from 16 to 19 at 2014-09-01
As long as you fetch all rows or more than a few percent of all rows, it will be substantially faster to first aggregate once per table, and then join.
I suggest DISTINCT ON to pick the latest valid row per id:
SELECT p.id, p."productName", p."productStockAmount"
,pp."productPriceValue"
,CHR(64 + tr."taxRate_id") AS "taxRateId", tr."taxRateValue"
FROM main.products p
LEFT JOIN (
SELECT DISTINCT ON (product_id)
product_id, "productPriceValue"
FROM main."productPrices"
WHERE "productPriceValidFrom" <= now()
ORDER BY product_id, "productPriceValidFrom" DESC
) pp ON pp.product_id = p.id
LEFT JOIN (
SELECT DISTINCT ON (product_id)
product_id, "taxRate_id"
FROM main."productTaxes"
WHERE "productTaxValidFrom" <= now()
ORDER BY product_id, "productTaxValidFrom" DESC
) pt ON pt.product_id = p.id
LEFT JOIN (
SELECT DISTINCT ON ("taxRate_id") *
FROM main."taxRateValues"
WHERE "taxRateValueValidFrom" <= now()
ORDER BY "taxRate_id", "taxRateValueValidFrom" DESC
) tr ON tr."taxRate_id" = pt."taxRate_id";
Using LEFT JOIN to be on the safe side. Not every product might have entries in all sub-tables.
And I subscribe to what @Clodoaldo wrote about double-quoted identifiers. I never use anything but legal, lower-case names. That makes your life with Postgres easier.
Detailed explanation for DISTINCT ON:
Select first row in each GROUP BY group?
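If the sub-tables grow large, an index per table matching the DISTINCT ON sort order helps each subquery; a sketch (adjust to your actual access patterns):
CREATE INDEX ON main."productPrices" (product_id, "productPriceValidFrom" DESC);
CREATE INDEX ON main."productTaxes" (product_id, "productTaxValidFrom" DESC);
CREATE INDEX ON main."taxRateValues" ("taxRate_id", "taxRateValueValidFrom" DESC);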
Do not create quoted identifiers. Once you do it you are forever stuck with them and you will have to quote and remember the casing everywhere. You can use camel case whenever you want if you don't quote the identifier at creation time.
I don't understand why you need the CROSS JOIN LATERAL. I think it can be just:
select
p.id,
p."productName",
p."productStockAmount",
pp2."productPriceValue",
chr(64 + trv."taxRate_id") as "taxRateId",
trv."taxRateValue"
from
main."products" p
left join (
select *
from main."productPrices"
where "productPriceValidFrom" <= now()
order by "productPriceValidFrom" desc
limit 1
) pp2 on pp2."product_id" = p."id"
left join (
select "taxRate_id"
from main."productTaxes"
where "productTaxValidFrom" <= now()
order by "productTaxValidFrom" desc
limit 1
) pt on pt."product_id" = p."id"
left join (
select *
from main."taxRateValues"
where "taxRateValueValidFrom" <= now()
order by "taxRateValueValidFrom" desc
limit 1
) trv on trv."taxRate_id" = pt."taxRate_id"

How to do self join on min/max

I am new to SQL queries.
The table is defined as
(
  symbol varchar,
  high int,
  low int,
  today date,
  PRIMARY KEY (symbol, today)
)
I need to find, for each symbol in a given date range, max(high) and min(low) and the corresponding dates for max(high) and min(low).
It is okay to take the first max date and min date found in the table.
In a given date range some dates may be missing. If the start date is not present then the next available date should be used, and if the last date is not present then the earlier available date should be used.
The data covers one year and around 5000 symbols.
I tried something like this
SELECT a.symbol,
a.maxValue,
a.maxdate,
b.minValue,
b.mindate
FROM (
SELECT table1.symbol, max_a.maxValue, max_a.maxdate
FROM table1
INNER JOIN (
SELECT table1.symbol,
max(table1.high) AS maxValue,
table1.TODAY AS maxdate
FROM table1
GROUP BY table1.symbol
) AS max_a
ON max_a.symbol = table1.symbol
AND table1.today = max_a.maxdate
) AS a
INNER JOIN (
SELECT symbol,
min_b.minValue,
min_b.mindate
FROM table1
INNER JOIN (
SELECT symbol,
min(low) AS minValue,
table1.TODAY AS mindate
FROM table1
GROUP BY table1.symbol
) AS min_b
ON min_b.symbol = table1.symbol
AND table1.today = min_b.mindate
) AS b
ON a.symbol = b.symbol
The first INNER query pre-qualifies, for each symbol, what the low and high values are within the date range provided. After that, it joins back to the original table (with the same date range criteria) and adds the qualifier that EITHER the low OR the high matches the MIN() or MAX() from the PreQuery. If so, the row is allowed into the result set.
Now, the result columns. Not knowing which SQL version you are using, I have the first 3 columns as the "final" values. The 3 columns after that come from the record that qualified by EITHER of the qualifiers. As stocks go up and down all the time, it is possible for the high and/or low values to occur more than once within the same time period, so this will include ALL the entries that match the MIN() / MAX() criteria.
select
PreQuery.Symbol,
PreQuery.LowForSymbol,
PreQuery.HighForSymbol,
tFinal.Today as DateOfMatch,
tFinal.Low as DateMatchLow,
tFinal.High as DateMatchHigh
from
( select
t1.symbol,
min( t1.low ) as LowForSymbol,
max( t1.high ) as HighForSymbol
from
table1 t1
where
t1.today between YourFromDateParameter and YourToDateParameter
group by
t1.symbol ) PreQuery
JOIN table1 tFinal
on PreQuery.Symbol = tFinal.Symbol
AND tFinal.today between YourFromDateParameter and YourToDateParameter
AND ( tFinal.Low = LowForSymbol
OR tFinal.High = HighForSymbol )
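If your database supports window functions, the same result can also be sketched without the join back, by computing the per-symbol extremes in one pass and filtering on them (column and parameter names follow the query above):
select q.symbol,
       q.LowForSymbol,
       q.HighForSymbol,
       q.today as DateOfMatch,
       q.low   as DateMatchLow,
       q.high  as DateMatchHigh
from
  ( select t1.*,
           min( t1.low )  over (partition by t1.symbol) as LowForSymbol,
           max( t1.high ) over (partition by t1.symbol) as HighForSymbol
    from table1 t1
    where t1.today between YourFromDateParameter and YourToDateParameter
  ) q
where q.low  = q.LowForSymbol
   or q.high = q.HighForSymbol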

Remove duplicates (1 to many) or write a subquery that solves my problem

Referring to the diagram below: the Records table has unique records. Each record is updated, via comments, through an Updates table. When I join the two I get lots of duplicates.
How do I remove the duplicates? GROUP BY does not work for me, as I have more than 10 fields in the select list and some of them are functions.
Alternatively: write a subquery that pulls, for each record updated in a particular month, the last update from the Updates table. Joining with this subquery would solve my problem.
Thanks!
Edit
The table structure that is of interest is
create table Records(
  recordID int
  -- plus ~90 more fields of various types
)
create table Updates(
  update_id int,
  record_id int,
  comment text,
  byUser varchar(25),
  datecreate datetime
)
Here's one way.
SELECT * /* But list columns explicitly */
FROM Orange o
CROSS APPLY (SELECT TOP 1 *
             FROM Blue b
             WHERE b.datecreate >= '20110901'
               AND b.datecreate < '20111001'
               AND o.RecordID = b.Record_ID2
             ORDER BY b.datecreate DESC) b
Based on the limited information available...
WITH cteLastUpdate AS (
SELECT Record_ID2, UpdateDateTime,
ROW_NUMBER() OVER(PARTITION BY Record_ID2 ORDER BY UpdateDateTime DESC) AS RowNUM
FROM BlueTable
/* Add WHERE clause if needed to restrict date range */
)
SELECT *
FROM cteLastUpdate lu
INNER JOIN OrangeTable o
ON lu.Record_ID2 = o.RecordID
WHERE lu.RowNum = 1
Last updates per record and month:
SELECT *
FROM UPDATES outerUpd
WHERE exists
(
-- Magic part
SELECT 1
FROM UPDATES innerUpd
WHERE innerUpd.RecordId = outerUpd.RecordId
GROUP BY RecordId
, date_part('year', innerUpd.datecolumn)
, date_part('month', innerUpd.datecolumn)
HAVING max(innerUpd.datecolumn) = outerUpd.datecolumn
)
(Works on PostgreSQL, date_part is different in other RDBMS)
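On PostgreSQL, using the column names from the edit above, the duplicate-free join can also be sketched with DISTINCT ON, keeping only the latest update per record and month:
SELECT r.*, lu.comment, lu.byUser, lu.datecreate
FROM Records r
JOIN (
    SELECT DISTINCT ON (record_id, date_trunc('month', datecreate))
           record_id, comment, byUser, datecreate
    FROM Updates
    ORDER BY record_id, date_trunc('month', datecreate), datecreate DESC
) lu ON lu.record_id = r.recordID;
Add a WHERE clause on datecreate inside the subquery to restrict it to one particular month.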