How to Optimize "JOIN" in PostgreSQL


I have four tables to pull information from: user (first_name), mongouser (email, card_status), transaction (transaction_type, balance, posted_at, is_atm, is_purchase), and user_login (user_id, login_date, login_id...).
Before I added the fourth table, user_login, everything was efficient. However, the fourth JOIN made everything slow. I wrote the query shown below:
SELECT * FROM
(SELECT
ssluserid,
first_name,
m.email,
zipcode,
date_part('year',age(birthday)) AS birthday,
(current_date - DATE(created_date)) AS duration,
CASE WHEN card_status = 'ACTIVE' THEN 1 ELSE 0 END AS IS_ACTIVE,
SUM(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN balance END) AS LOAD_AMT,
SUM(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN balance END) AS SPEND_AMT,
COUNT(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN balance END) AS LOAD_CT,
COUNT(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN balance END) AS SPEND_CT,
MIN(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN DATE(posted_at) END) AS FIRST_LOAD,
MAX(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN DATE(posted_at) END) AS LAST_LOAD,
MIN(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN DATE(posted_at) END) AS FIRST_SPEND,
MAX(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN DATE(posted_at) END) AS LAST_SPEND,
SUM(CASE WHEN transaction_type = 'Debit' AND is_atm = 't' AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS ATM_AMT,
SUM(CASE WHEN transaction_type = 'Debit' AND is_purchase = 't' AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS POS_AMT,
SUM(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS LOAD_VOL,
COUNT(CASE WHEN DATE(login_date) >= CURRENT_DATE - INTERVAL '90 days' THEN
login_id END) AS CT_LOGIN
FROM
mongouser m
LEFT OUTER JOIN
user u
ON m.userid = u.id
LEFT OUTER JOIN transactions t
ON u.id = t.user_id
LEFT OUTER JOIN user_login l
ON m.userid = l.user_id
GROUP BY 1,2,3,4,5,6,7) t
WHERE LAST_LOAD >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY 9 DESC;
This query has been running for almost 40 minutes. Is there any way to optimize it?

Focusing on just your statements, you already know where the problem is. You had this before:
LEFT OUTER JOIN user u
ON m.userid = u.id
And you say things "weren't slow." Then you add this,
LEFT OUTER JOIN user_login l
ON m.userid = l.user_id
And you say things get slow. It's likely that you have an index on m.userid. Do you have an index on l.user_id?
CREATE INDEX foo ON user_login ( user_id );
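
Beyond the index, one thing that may be worth checking is row multiplication. Both transactions and user_login are one-to-many against the same user, so joining both before the GROUP BY pairs every transaction with every login. That inflates the intermediate result and also distorts the SUM and COUNT columns. The sketch below reproduces the effect on a toy schema and shows one possible fix: aggregating each child table before joining. It uses Python's built-in sqlite3 module only to make the example runnable; the table contents are invented, and the real PostgreSQL query would need the same restructuring applied by hand.

```python
import sqlite3

# Toy data (invented): one user with 2 transactions and 3 logins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mongouser    (userid INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE transactions (user_id INTEGER, balance REAL);
    CREATE TABLE user_login   (user_id INTEGER, login_id INTEGER);
    CREATE INDEX idx_login_user ON user_login (user_id);

    INSERT INTO mongouser VALUES (1, 'a@example.com');
    INSERT INTO transactions VALUES (1, 10.0), (1, 20.0);
    INSERT INTO user_login VALUES (1, 101), (1, 102), (1, 103);
""")

# Joining both child tables first yields 2 x 3 = 6 rows for the user,
# so every balance is summed 3 times and every login is counted twice.
naive = conn.execute("""
    SELECT m.userid, SUM(t.balance), COUNT(l.login_id)
    FROM mongouser m
    LEFT JOIN transactions t ON t.user_id = m.userid
    LEFT JOIN user_login   l ON l.user_id = m.userid
    GROUP BY m.userid
""").fetchone()

# Aggregating each child table to one row per user before joining
# keeps the join at one row per user and the totals correct.
pre_aggregated = conn.execute("""
    SELECT m.userid, t.total, l.logins
    FROM mongouser m
    LEFT JOIN (SELECT user_id, SUM(balance) AS total
               FROM transactions GROUP BY user_id) t ON t.user_id = m.userid
    LEFT JOIN (SELECT user_id, COUNT(*) AS logins
               FROM user_login GROUP BY user_id) l ON l.user_id = m.userid
""").fetchone()

print(naive)
print(pre_aggregated)
```

With thousands of transactions and logins per user, the naive join grows multiplicatively, which would be consistent with a query that was fast with three tables and slow with four.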

Related

SQL - pull data for date range based on a start date

I have the below SQL query, which pulls account revenues for the past 3 months, along with each account's service start date (I'm using Amazon Redshift via SQL Workbench)
select distinct r.account_id, r.account_name, s.start_date
,SUM(CASE WHEN r.datekey between '20200601' and '20200630' THEN revenue ELSE 0 END) AS "June 2020"
,SUM(CASE WHEN r.datekey between '20200701' and '20200731' THEN revenue ELSE 0 END) AS "July 2020"
,SUM(CASE WHEN r.datekey between '20200801' and '20200831' THEN revenue ELSE 0 END) AS "August 2020"
from revenues r
join start_dates s on r.account_id = s.account_id
group by r.account_id, r.account_name, s.start_date;
How can I modify the above query to pull revenues for the 3 months after each client's start date, keeping in mind this 3-month range will be different for each client? I've tried using DATEPART and DATEADD but I haven't found a solution using those functions.

You can change the join conditions to filter the revenues of each account_id to the 3 months that follow its start_date, and then use conditional aggregation:
select
s.account_id,
sum(case when r.datekey < dateadd(month, 1, s.start_date) then revenue else 0 end) as month1,
sum(case when r.datekey >= dateadd(month, 1, s.start_date) and r.datekey < dateadd(month, 2, s.start_date) then revenue else 0 end) as month2,
sum(case when r.datekey >= dateadd(month, 2, s.start_date) then revenue else 0 end) as month3
from start_dates s
left join revenues r
on r.account_id = s.account_id
and r.datekey >= s.start_date
and r.datekey < dateadd(month, 3, s.start_date)
group by s.account_id
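
To make the per-account window concrete, here is a small runnable sketch of the same join-plus-conditional-aggregation idea. It uses Python's sqlite3 module instead of Redshift, so `date(start_date, '+1 months')` stands in for `dateadd(month, 1, start_date)`. The sample rows are invented, and datekey is assumed to be stored as an ISO date string (not the `20200601` style key from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE start_dates (account_id INTEGER PRIMARY KEY, start_date TEXT);
    CREATE TABLE revenues    (account_id INTEGER, datekey TEXT, revenue INTEGER);

    INSERT INTO start_dates VALUES
        (1, '2020-01-15'), (2, '2020-03-01'), (3, '2020-06-01');
    INSERT INTO revenues VALUES
        (1, '2020-01-10',   5),  -- before the start date: excluded by the join
        (1, '2020-01-20', 100),
        (1, '2020-02-20', 200),
        (1, '2020-03-20', 300),
        (1, '2020-04-20', 400),  -- past the 3-month window: excluded by the join
        (2, '2020-03-05',  10),
        (2, '2020-05-31',  30);
""")

# The join keeps only revenue rows inside [start_date, start_date + 3 months);
# the CASE expressions then split that window into three monthly buckets.
rows = conn.execute("""
    SELECT s.account_id,
           SUM(CASE WHEN r.datekey <  date(s.start_date, '+1 months')
                    THEN r.revenue ELSE 0 END) AS month1,
           SUM(CASE WHEN r.datekey >= date(s.start_date, '+1 months')
                     AND r.datekey <  date(s.start_date, '+2 months')
                    THEN r.revenue ELSE 0 END) AS month2,
           SUM(CASE WHEN r.datekey >= date(s.start_date, '+2 months')
                    THEN r.revenue ELSE 0 END) AS month3
    FROM start_dates s
    LEFT JOIN revenues r
           ON r.account_id = s.account_id
          AND r.datekey >= s.start_date
          AND r.datekey <  date(s.start_date, '+3 months')
    GROUP BY s.account_id
    ORDER BY s.account_id
""").fetchall()

for row in rows:
    print(row)
```

Because the date filter lives in the LEFT JOIN condition and not in a WHERE clause, account 3, which has no revenue rows at all, still appears with zeros.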

Here, use DATEDIFF with start_date and GETDATE()
select distinct r.account_id, r.account_name, s.start_date
,SUM(CASE WHEN r.datekey between '20200601' and '20200630' THEN revenue ELSE 0 END) AS "June 2020"
,SUM(CASE WHEN r.datekey between '20200701' and '20200731' THEN revenue ELSE 0 END) AS "July 2020"
,SUM(CASE WHEN r.datekey between '20200801' and '20200831' THEN revenue ELSE 0 END) AS "August 2020"
from revenues r
join start_dates s on r.account_id = s.account_id
WHERE DATEDIFF(day, s.start_date, GETDATE()) <= 90
group by r.account_id, r.account_name, s.start_date;

SQL row in column

I have this SQL query
SELECT COUNT(*)*100/(SELECT COUNT(*) FROM tickets WHERE status = 'closed')
FROM tickets
WHERE closed_at <= due_at
UNION
SELECT COUNT(*)*100/(SELECT COUNT(*) FROM tickets WHERE status = 'closed')
FROM tickets
WHERE closed_at > due_at;
and it returns this
ROW 1 - 35
ROW 2 - 47
but I need the return like this:
1  | 2
35 | 47
I need the returns in columns, not rows.
Thanks.

Use conditional aggregation. I would recommend:
SELECT (SUM(CASE WHEN closed_at <= due_at THEN 100.0 ELSE 0 END) /
SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
),
(SUM(CASE WHEN closed_at > due_at THEN 100.0 ELSE 0 END) /
SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
)
FROM tickets ;
It seems strange that you are filtering on status = 'closed' in the denominator, but not in the numerator. If status = 'closed' should be the filter for both, then you can simplify this to:
SELECT AVG(CASE WHEN closed_at <= due_at THEN 100.0 ELSE 0 END),
AVG(CASE WHEN closed_at > due_at THEN 100.0 ELSE 0 END)
FROM tickets
WHERE status = 'closed';
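
As a quick check of the AVG form, the sketch below runs it against a handful of invented tickets using Python's sqlite3 module. The schema is a guess at the minimum the query needs (status, closed_at, due_at); the real tickets table presumably has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT,
                          closed_at TEXT, due_at TEXT);
    INSERT INTO tickets VALUES
        (1, 'closed', '2024-01-01', '2024-01-05'),  -- on time
        (2, 'closed', '2024-01-03', '2024-01-05'),  -- on time
        (3, 'closed', '2024-01-05', '2024-01-05'),  -- on time (closed on the due date)
        (4, 'closed', '2024-01-09', '2024-01-05'),  -- late
        (5, 'open',   NULL,         '2024-01-05');  -- ignored by the WHERE clause
""")

# Each CASE yields 100.0 or 0 per closed ticket, so AVG is the percentage
# of closed tickets that meet the condition. One row, two columns.
on_time_pct, late_pct = conn.execute("""
    SELECT AVG(CASE WHEN closed_at <= due_at THEN 100.0 ELSE 0 END),
           AVG(CASE WHEN closed_at >  due_at THEN 100.0 ELSE 0 END)
    FROM tickets
    WHERE status = 'closed'
""").fetchone()

print(on_time_pct, late_pct)
```

Using 100.0 instead of 100 also avoids the integer division that the original `COUNT(*)*100/...` form can fall into on some databases.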

How to pull data from multiple date ranges with one SQL query?

I have two queries. Each query pulls the total count of orders between organization and customer, and the sum of receivables for the orders. The queries are identical except for the date range.
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) between 31 and 60
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
But I need to make this one query so that the output is a single table with columns for orders and receivables in the first date range, and next to those columns another pair of columns for the second date range. (i.e. num_orders < 31, receivables < 31, num_orders 31-60, receivables 31-60)

You can put conditional (CASE) expressions inside the count() and sum() functions.
So if you adjusted your where clause to bring back all the orders (across both date ranges) then you could make multiple result columns in your select clause, each counting and summing from just the date range you want.
SELECT ...
count(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31 THEN 1 ELSE NULL END) as num_orders_a,
round(sum(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31 THEN cast(o.total_charge as real) ELSE NULL END)) as receivables_a,
count(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) BETWEEN 31 AND 60 THEN 1 ELSE NULL END) as num_orders_b,
round(sum(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) BETWEEN 31 AND 60 THEN cast(o.total_charge as real) ELSE NULL END)) as receivables_b
(same FROM, WHERE, GROUP BY, and ORDER BY sections)

There are a number of ways to skin this cat, and there is a real potential trade-off here between performance and code maintainability.
A CTE here would help with code readability / transparency / maintainability. This is a little bit of a hack way to do it, but this is one idea:
with order_data as (
SELECT
org.organization_id, org.name, cust.name as customer,
o.total_charge::real,
case
when current_date - o.delivery_confirmed_date::DATE < 31 then 1
when current_date - o.delivery_confirmed_date::date < 61 then 2
else 3
end as cat
FROM
organization as org,
orders as o,
organization as cust,
reconcile_order as ro
WHERE
org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
)
select
organization_id, name, customer,
sum (case when cat = 1 then 1 else 0 end) as "Orders < 31",
round (sum (case when cat = 1 then total_charge else 0 end)) as "Rec < 31",
sum (case when cat = 2 then 1 else 0 end) as "Orders 31-60",
round (sum (case when cat = 2 then total_charge else 0 end)) as "Rec 31-60",
sum (case when cat = 3 then 1 else 0 end) as "Orders 61+",
round (sum (case when cat = 3 then total_charge else 0 end)) as "Rec 61+"
from order_data
group by
organization_id, name, customer
order by name asc
I think the more common approach might be to pass a "days_delta" column from the CTE (as current_date - o.delivery_confirmed_date::DATE) and have your sum functions look more like this:
sum (case when days_delta between 31 and 60 then ... end) as "31-60"
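
A minimal runnable sketch of that days_delta variant follows, using Python's sqlite3 module. The single flattened orders table, its column names, and the fixed "today" parameter are simplifications invented for the example (the real query joins organization, orders and reconcile_order and would use current_date); julianday() stands in for PostgreSQL's date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (org_name TEXT, customer TEXT,
                         total_charge REAL, delivery_confirmed_date TEXT);
    INSERT INTO orders VALUES
        ('Acme', 'Bolt', 100.0, '2024-03-21'),  -- 10 days before "today"
        ('Acme', 'Bolt',  50.0, '2024-03-01'),  -- 30 days
        ('Acme', 'Bolt',  70.0, '2024-02-15'),  -- 45 days
        ('Acme', 'Cog',   20.0, '2024-02-29'),  -- 31 days
        ('Acme', 'Cog',  999.0, '2023-12-01');  -- 121 days: in neither bucket
""")

# The CTE computes the age of each order once; the outer query buckets on it.
sql = """
    WITH order_data AS (
        SELECT org_name, customer, total_charge,
               CAST(julianday(:today) - julianday(delivery_confirmed_date)
                    AS INTEGER) AS days_delta
        FROM orders
    )
    SELECT org_name, customer,
           COUNT(CASE WHEN days_delta < 31 THEN 1 END)                         AS orders_under_31,
           SUM(CASE WHEN days_delta < 31 THEN total_charge ELSE 0 END)         AS rec_under_31,
           COUNT(CASE WHEN days_delta BETWEEN 31 AND 60 THEN 1 END)            AS orders_31_60,
           SUM(CASE WHEN days_delta BETWEEN 31 AND 60 THEN total_charge ELSE 0 END) AS rec_31_60
    FROM order_data
    GROUP BY org_name, customer
    ORDER BY org_name, customer
"""
# "today" is pinned so the result does not depend on when the example runs.
rows = conn.execute(sql, {"today": "2024-03-31"}).fetchall()

for row in rows:
    print(row)
```

Keeping the raw day count in the CTE means a new bucket is just one more pair of CASE expressions, with no change to the inner query.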
And... anyone who says you don't need a CTE -- they're right. You don't. For me it just makes the code more pleasant to deal with.
-- EDIT --
The less attractive (and less functional) cousin of the CTE, the subquery:
select
organization_id, name, customer,
sum (case when cat = 1 then 1 else 0 end) as "Orders < 31",
round (sum (case when cat = 1 then total_charge else 0 end)) as "Rec < 31",
sum (case when cat = 2 then 1 else 0 end) as "Orders 31-60",
round (sum (case when cat = 2 then total_charge else 0 end)) as "Rec 31-60",
sum (case when cat = 3 then 1 else 0 end) as "Orders 61+",
round (sum (case when cat = 3 then total_charge else 0 end)) as "Rec 61+"
from (
SELECT
org.organization_id, org.name, cust.name as customer,
o.total_charge::real,
case
when current_date - o.delivery_confirmed_date::DATE < 31 then 1
when current_date - o.delivery_confirmed_date::date < 61 then 2
else 3
end as cat
FROM
organization as org,
orders as o,
organization as cust,
reconcile_order as ro
WHERE
org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
) as order_data
group by
organization_id, name, customer
order by name asc

I'm not sure that I understand your exact question, but how about this:
Select earlier_ones.organization_id, earlier_ones.name, earlier_ones.customer, earlier_ones.receivables AS receivables_under_31, later_ones.receivables AS receivables_31_60
FROM (
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
) earlier_ones
LEFT JOIN (
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) between 31 and 60
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
) later_ones ON earlier_ones.organization_id = later_ones.organization_id AND earlier_ones.customer = later_ones.customer;

Incremental adding in sql select

I have a table where customer transactions are stored in this format:
Account  Tran_type  Tran_Amount  tran_particular  Tran_date
165266   C          5000         deposit          19-SEP-2014
165266   D          3000         withdrawal       20-SEP-2014
165266   C          8000         Deposit          21-SEP-2014
I am attempting to extract the Information for a Statement like this:
select tran_date, tran_particular,
(case when tran_type = 'C' then tran_amt else 0 end) CREDIT,
(case when tran_type = 'D' then tran_amt else 0 end) DEBIT
from tran_table order by tran_date asc;
Is there a way to add a Balance column on each row so it shows the balance after the transaction? For example:
DATE         DESC        CREDIT  DEBIT  BALANCE
19-SEP-2014  DEPOSIT     5000    0      5000
20-SEP-2014  WITHDRAWAL  0       3000   2000
21-SEP-2014  DEPOSIT     8000    0      10000
Please assist.
EDIT: I have tried the answers suggested, but it seems my balance is tied to the date. See the output I have currently:
The balance does not change until the date changes.

select tran_date, tran_particular, Credit, Debit,
SUM(Delta) OVER (ORDER BY tran_date) AS Balance
from
(
select tran_date, tran_particular,
Case Tran_Type
When 'C' THEN Tran_Amount
Else 0
End AS Credit,
Case Tran_Type
When 'D' THEN Tran_Amount
Else 0
End AS Debit,
Case Tran_Type
When 'C' THEN Tran_Amount
When 'D' THEN -1 * Tran_Amount
Else 0
End AS Delta
from TRANSACTIONS
order by tran_date
)
That should do it.

Select *,Sum( case when type ='C'
then amount
else -amount
end ) over (ORDER BY date ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )'Balance'
from #tt1
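
The symptom in the question's edit (the balance only changing when the date changes) is what a window ORDER BY without an explicit frame produces: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes every peer row with the same tran_date. The sketch below contrasts that with an explicit ROWS frame plus a tiebreaker column. It runs on Python's sqlite3 module (window functions need SQLite 3.25 or later); the tran_id column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tran_table (tran_id INTEGER PRIMARY KEY, tran_date TEXT,
                             tran_type TEXT, tran_amount INTEGER);
    INSERT INTO tran_table VALUES
        (1, '2014-09-19', 'C', 5000),
        (2, '2014-09-20', 'D', 3000),  -- two transactions on the same day
        (3, '2014-09-20', 'C', 8000);
""")

rows = conn.execute("""
    SELECT tran_id,
           -- Default RANGE frame: rows sharing a tran_date are peers,
           -- so both 20-SEP rows get the end-of-day balance.
           SUM(CASE WHEN tran_type = 'C' THEN tran_amount ELSE -tran_amount END)
               OVER (ORDER BY tran_date) AS balance_by_date,
           -- ROWS frame with a unique tiebreaker: one running total per row.
           SUM(CASE WHEN tran_type = 'C' THEN tran_amount ELSE -tran_amount END)
               OVER (ORDER BY tran_date, tran_id
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance_by_row
    FROM tran_table
    ORDER BY tran_date, tran_id
""").fetchall()

for row in rows:
    print(row)
```

If the table has no unique column to break ties, a timestamp with time of day would serve the same purpose. Without any tiebreaker, the order of same-day rows is not guaranteed.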

That will require a correlated subquery:
SELECT tran_date, tran_particular,
(CASE when tran_type = 'C' THEN tran_amt ELSE 0 end) CREDIT,
(CASE when tran_type = 'D' THEN tran_amt ELSE 0 end) DEBIT,
(SELECT
SUM(CASE WHEN trn2.tran_type = 'C' THEN trn2.tran_amt ELSE (-1) * trn2.tran_amt END)
FROM tran_table trn2
WHERE
trn2.Account = trn1.Account
AND trn2.tran_id <= trn1.tran_id
-- AND trn2.tran_date <= trn1.tran_date
)
BALANCE
FROM
tran_table trn1 ORDER BY tran_date asc;
On large data sets, such a subquery is not recommended, because conceptually it is evaluated again for every row of the outer query. A materialized view is a more reasonable choice there.

Oracle opening and closing balance - SQL or PL/SQL needed?

select year,
month ,
d.PROD_ID,
T.CUSTOMER_ID,
SUM(CASE WHEN D.OP_TYPE = 1 THEN d.qty END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN d.qty END) SPEND
FROM TXN_HEADER T ,
TXN_DETAIL d ,
CUSTOMER A,
PRODUCT e
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND T.CUSTOMER_ID = A.CUSTOMER_ID
AND T.TXN_PK = D.TXN_PK
and d.PROD_ID = e.PROD_ID
and e.unit = 0
group by year, month ,d.PROD_ID, T.CUSTOMER_ID
ORDER BY 1,2,3,4
The output should be as follows (the OPENING and CLOSING columns are not generated by the query yet, but I need them to come from the query):
YEAR  MONTH  PROD  CUSTOMER  OPENING  EARNED  SPEND  CLOSING
----  -----  ----  --------  -------  ------  -----  -------
2012  8      548   12033     0        8       2      6
2012  9      509   12033     0        24      0      24
2012  9      509   12047     0        14      0      14
2012  9      548   12033     6        1       0      7
2012  9      548   12047     0        1       0      1
I need to generate the output as above. For each PROD_ID and CUSTOMER_ID, the previous closing balance should be populated dynamically as the opening balance, and the closing balance should be calculated as opening + earned - spend, month by month, per customer and per product. Is it possible to write this in SQL, or do I need PL/SQL?

I'd use analytics, with PROD_ID and CUSTOMER_ID in the partition clause to avoid mixing products and customers.
WITH
MONTHLY_BALANCE AS
(
SELECT
YEAR,
MONTH,
D.PROD_ID,
T.CUSTOMER_ID,
SUM(CASE WHEN D.OP_TYPE = 1 THEN D.QTY ELSE NULL END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN D.QTY ELSE NULL END) SPEND
FROM TXN_HEADER T
JOIN CUSTOMER A
ON T.CUSTOMER_ID = A.CUSTOMER_ID
JOIN TXN_DETAIL D
ON T.TXN_PK = D.TXN_PK
JOIN PRODUCT E
ON D.PROD_ID = E.PROD_ID
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND E.UNIT = 0
GROUP BY YEAR, MONTH, D.PROD_ID, T.CUSTOMER_ID
)
SELECT
YEAR,
MONTH,
PROD_ID,
CUSTOMER_ID,
NVL(SUM(NVL(EARNED, 0) - NVL(SPEND, 0)) OVER(PARTITION BY PROD_ID, CUSTOMER_ID ORDER BY YEAR, MONTH ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) OPENING,
EARNED,
SPEND,
SUM(NVL(EARNED, 0) - NVL(SPEND, 0)) OVER(PARTITION BY PROD_ID, CUSTOMER_ID ORDER BY YEAR, MONTH ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) CLOSING
FROM MONTHLY_BALANCE
ORDER BY 1, 2, 3, 4
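
The two window frames can be checked against the sample output from the question with a small runnable sketch. It uses Python's sqlite3 module in place of Oracle (SQLite 3.25 or later for window functions), with COALESCE standing in for NVL. The monthly_balance table is an assumption: it holds the already aggregated EARNED and SPEND figures that the CTE above would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_balance (year INTEGER, month INTEGER, prod_id INTEGER,
                                  customer_id INTEGER, earned INTEGER, spend INTEGER);
    -- EARNED / SPEND values copied from the question's sample output.
    INSERT INTO monthly_balance VALUES
        (2012, 8, 548, 12033,  8, 2),
        (2012, 9, 509, 12033, 24, 0),
        (2012, 9, 509, 12047, 14, 0),
        (2012, 9, 548, 12033,  1, 0),
        (2012, 9, 548, 12047,  1, 0);
""")

rows = conn.execute("""
    SELECT year, month, prod_id, customer_id,
           -- Opening = net movement of all earlier months for this product/customer.
           -- The frame is empty for the first month, so SUM is NULL; COALESCE makes it 0.
           COALESCE(SUM(earned - spend) OVER (
               PARTITION BY prod_id, customer_id ORDER BY year, month
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS opening,
           earned, spend,
           -- Closing = same running total, but including the current month.
           SUM(earned - spend) OVER (
               PARTITION BY prod_id, customer_id ORDER BY year, month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS closing
    FROM monthly_balance
    ORDER BY year, month, prod_id, customer_id
""").fetchall()

for row in rows:
    print(row)
```

The only row with a non-zero opening balance is product 548 for customer 12033 in month 9, which carries forward the closing balance of 6 from month 8, as in the question's expected output.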

Your CASE needs an ELSE
CASE WHEN D.OP_TYPE = 1 THEN d.qty ELSE 0 END
Without the ELSE, the CASE returns NULL whenever D.OP_TYPE is not equal to 1, and anything + NULL = NULL. SUM skips NULL inputs, but when no row in a group satisfies the WHEN, the SUM itself is NULL, and that is why you do not see anything for those columns.
To get OPENING and CLOSING calculated, you may want to use analytic functions like LEAD and LAG.
Select year,month,prod_id,customer_id,
LAG(closing,1,0) OVER (order by year,month,prod_id,customer_id) as opening,
earned,spend
,(LAG(closing,1,0) OVER (order by year,month,prod_id,customer_id)+closing) as closing
from (WITH temp AS (select year,
month ,
d.PROD_ID,
T.CUSTOMER_ID,
0 OPEN,
SUM(CASE WHEN D.OP_TYPE = 1 THEN d.qty ELSE 0 END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN d.qty ELSE 0 END) SPEND,
0 CLOSE
FROM TXN_HEADER T ,
TXN_DETAIL d ,
CUSTOMER A,
PRODUCT e
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND T.CUSTOMER_ID = A.CUSTOMER_ID
AND T.TXN_PK = D.TXN_PK
and d.PROD_ID = e.PROD_ID
and e.unit = 0
group by year, month ,d.PROD_ID, T.CUSTOMER_ID
ORDER BY 1,2,3,4)
SELECT year,month,prod_id,customer_id,open,earned,spend,(open+earned-spend) as closing
from temp);
