How to pull data from multiple date ranges with one SQL query? - sql

I have two queries. Each query pulls the total count of orders between organization and customer, and the sum of receivables for the orders. The queries are identical except for the date range.
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) between 31 and 60
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
But I need to make this one query so that the output is a single table with columns for orders and receivables in the first date range, and next to those columns another pair of columns for the second date range. (i.e. num_orders < 31, receivables < 31, num_orders 31-60, receivables 31-60)

You can put conditional (CASE) expressions inside the count() and sum() functions.
So if you adjusted your where clause to bring back all the orders (across both date ranges) then you could make multiple result columns in your select clause, each counting and summing from just the date range you want.
SELECT ...
count(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31 THEN 1 ELSE NULL END) as num_orders_a,
round(sum(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31 THEN cast(o.total_charge as real) ELSE NULL END)) as receivables_a,
count(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) BETWEEN 31 AND 60 THEN 1 ELSE NULL END) as num_orders_b,
round(sum(CASE WHEN (NOW()::DATE - o.delivery_confirmed_date::DATE) BETWEEN 31 AND 60 THEN cast(o.total_charge as real) ELSE NULL END)) as receivables_b
(same FROM, WHERE, GROUP BY, and ORDER BY sections)
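A minimal, runnable sketch of the same conditional-aggregation idea, using Python's built-in sqlite3 module. The simplified `orders` table and its precomputed `days_since_delivery` column are assumptions made for illustration; they are not the asker's schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for the joined tables in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (org TEXT, customer TEXT, "
    "total_charge REAL, days_since_delivery INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("Acme", "Bolt", 100.0, 5),
        ("Acme", "Bolt", 50.0, 20),
        ("Acme", "Bolt", 70.0, 45),
        ("Acme", "Nut", 30.0, 60),
        ("Acme", "Nut", 10.0, 90),  # older than 60 days, filtered out below
    ],
)

# A CASE without ELSE yields NULL, which COUNT and SUM both skip,
# so each pair of columns only sees the rows of its own date range.
query = """
SELECT org, customer,
       COUNT(CASE WHEN days_since_delivery < 31 THEN 1 END) AS num_orders_a,
       SUM(CASE WHEN days_since_delivery < 31 THEN total_charge END) AS receivables_a,
       COUNT(CASE WHEN days_since_delivery BETWEEN 31 AND 60 THEN 1 END) AS num_orders_b,
       SUM(CASE WHEN days_since_delivery BETWEEN 31 AND 60 THEN total_charge END) AS receivables_b
FROM orders
WHERE days_since_delivery <= 60   -- widened to cover both ranges
GROUP BY org, customer
ORDER BY org, customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that a group with no orders in a range gets a count of 0 but a NULL sum (None in Python); wrap the sum in COALESCE(..., 0) if a zero is preferred.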

There are a number of ways to skin this cat, and there is a real potential trade-off here between performance and code maintainability.
A CTE here would help with code readability / transparency / maintainability. This is a little bit of a hack way to do it, but this is one idea:
with order_data as (
SELECT
org.organization_id, org.name, cust.name as customer,
o.total_charge::real,
case
when current_date - o.delivery_confirmed_date::DATE < 31 then 1
when current_date - o.delivery_confirmed_date::date < 61 then 2
else 3
end as cat
FROM
organization as org,
orders as o,
organization as cust,
reconcile_order as ro
WHERE
org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
)
select
organization_id, name, customer,
sum (case when cat = 1 then 1 else 0 end) as "Orders < 31",
round (sum (case when cat = 1 then total_charge else 0 end)) as "Rec < 31",
sum (case when cat = 2 then 1 else 0 end) as "Orders 31-60",
round (sum (case when cat = 2 then total_charge else 0 end)) as "Rec 31-60",
sum (case when cat = 3 then 1 else 0 end) as "Orders 61+",
round (sum (case when cat = 3 then total_charge else 0 end)) as "Rec 61+"
from order_data
group by
organization_id, name, customer
order by name asc
I think the more common approach might be to pass a "days_delta" column from the CTE (as current_date - o.delivery_confirmed_date::DATE) and have your sum functions look more like this:
sum (case when days_delta between 31 and 60 then ... end) as "31-60"
And... anyone who says you don't need a CTE -- they're right. You don't. For me it just makes the code more pleasant to deal with.
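As a rough sketch of the days_delta variant mentioned above, here is a runnable version using Python's sqlite3 module. The table layout, the sample rows, and the fixed `today` parameter (used instead of current_date so the result is reproducible) are all assumptions for illustration; SQLite's julianday() stands in for PostgreSQL date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, flattened version of the order data.
conn.execute(
    "CREATE TABLE orders (customer TEXT, total_charge REAL, "
    "delivery_confirmed_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Bolt", 100.0, "2024-03-21"),  # 10 days before 'today'
        ("Bolt", 70.0, "2024-02-20"),   # 40 days
        ("Bolt", 25.0, "2024-01-01"),   # 90 days
        ("Nut", 30.0, "2024-03-01"),    # 30 days
    ],
)

# The CTE computes days_delta once; the outer query buckets on it.
query = """
WITH order_data AS (
    SELECT customer,
           total_charge,
           CAST(julianday(:today) - julianday(delivery_confirmed_date) AS INTEGER)
               AS days_delta
    FROM orders
)
SELECT customer,
       SUM(CASE WHEN days_delta < 31 THEN 1 ELSE 0 END) AS orders_lt_31,
       SUM(CASE WHEN days_delta BETWEEN 31 AND 60 THEN 1 ELSE 0 END) AS orders_31_60,
       SUM(CASE WHEN days_delta BETWEEN 31 AND 60 THEN total_charge ELSE 0 END) AS rec_31_60,
       SUM(CASE WHEN days_delta > 60 THEN 1 ELSE 0 END) AS orders_61_plus
FROM order_data
GROUP BY customer
ORDER BY customer
"""
rows = conn.execute(query, {"today": "2024-03-31"}).fetchall()
for row in rows:
    print(row)
```

Keeping the raw day count in the CTE means the bucket boundaries live only in the outer SELECT, so adding a range is a one-line change.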
-- EDIT --
The less attractive (and less functional) cousin of the CTE, the subquery:
select
organization_id, name, customer,
sum (case when cat = 1 then 1 else 0 end) as "Orders < 31",
round (sum (case when cat = 1 then total_charge else 0 end)) as "Rec < 31",
sum (case when cat = 2 then 1 else 0 end) as "Orders 31-60",
round (sum (case when cat = 2 then total_charge else 0 end)) as "Rec 31-60",
sum (case when cat = 3 then 1 else 0 end) as "Orders 61+",
round (sum (case when cat = 3 then total_charge else 0 end)) as "Rec 61+"
from (
SELECT
org.organization_id, org.name, cust.name as customer,
o.total_charge::real,
case
when current_date - o.delivery_confirmed_date::DATE < 31 then 1
when current_date - o.delivery_confirmed_date::date < 61 then 2
else 3
end as cat
FROM
organization as org,
orders as o,
organization as cust,
reconcile_order as ro
WHERE
org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
) as order_data
group by
organization_id, name, customer
order by name asc

I'm not sure that I understand your exact question, but how about this:
Select earlier_ones.organization_id, earlier_ones.name, earlier_ones.customer, earlier_ones.num_orders, earlier_ones.receivables, later_ones.num_orders, later_ones.receivables
FROM (
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) < 31
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
) earlier_ones
LEFT JOIN (
SELECT org.organization_id, org.name, cust.name as customer,
count(*) as num_orders, round (sum(cast(o.total_charge as real))) as receivables
FROM
organization as org, orders as o, organization as cust, reconcile_order as ro
WHERE org.organization_id = o.shipper_org_id
and o.broker_org_id = cust.organization_id
and o.order_id = ro.order_id
and o.status = 'D'
and (ro.receive_payment_in_full = 0 or ro.receive_payment_in_full is NULL)
and (NOW()::DATE - o.delivery_confirmed_date::DATE) between 31 and 60
group by org.organization_id, org.name,
cust.name
order by org.name asc limit 20
) later_ones ON earlier_ones.organization_id = later_ones.organization_id AND earlier_ones.customer = later_ones.customer;

Related

I am looking to find customers repurchase frequency in SQL from their first purchase date

I am trying to find customers' repurchase rates from their first order date. For example, for 2016: how many customers purchased once in days 1-365 from their initial purchase, how many purchased twice, etc.
I have a transaction_detail table which looks like below:
txn_date    Customer_ID  Transaction_Number  Sales
1/2/2019    1            12345               $10
4/3/2018    1            65890               $20
3/22/2019   3            64453               $30
4/3/2019    4            88567               $20
5/21/2019   4            85446               $15
1/23/2018   5            89464               $40
4/3/2019    5            99674               $30
4/3/2019    6            32224               $20
1/23/2018   6            46466               $30
1/20/2018   7            56558               $30
I am able to find the customers who shopped in 2016 and how many times they repurchased in 2016, but I need to find the customers who shopped in 2016 and how many times they came back, counting from their first purchase date.
I need a starting point for the query, I am not sure how to build this logic in my SQL code.
Any help would be appreciated.
I am using the below query:
WITH by_year
AS (SELECT
Customer_ID,
to_char(txn_date, 'YYYY') AS visit_year
FROM table
GROUP BY Customer_ID, to_char(txn_date, 'YYYY')),
with_first_year
AS (SELECT
Customer_ID,
visit_year,
FIRST_VALUE(visit_year) OVER (PARTITION BY Customer_ID ORDER BY visit_year) AS first_year
FROM by_year),
with_year_number
AS (SELECT
Customer_ID,
visit_year,
first_year,
(visit_year - first_year) AS year_number
FROM with_first_year)
SELECT
first_year AS first_year,
SUM(CASE WHEN year_number = 0 THEN 1 ELSE 0 END) AS year_0,
SUM(CASE WHEN year_number = 1 THEN 1 ELSE 0 END) AS year_1,
SUM(CASE WHEN year_number = 2 THEN 1 ELSE 0 END) AS year_2,
SUM(CASE WHEN year_number = 3 THEN 1 ELSE 0 END) AS year_3,
SUM(CASE WHEN year_number = 4 THEN 1 ELSE 0 END) AS year_4,
SUM(CASE WHEN year_number = 5 THEN 1 ELSE 0 END) AS year_5,
SUM(CASE WHEN year_number = 6 THEN 1 ELSE 0 END) AS year_6,
SUM(CASE WHEN year_number = 7 THEN 1 ELSE 0 END) AS year_7,
SUM(CASE WHEN year_number = 8 THEN 1 ELSE 0 END) AS year_8,
SUM(CASE WHEN year_number = 9 THEN 1 ELSE 0 END) AS year_9
FROM with_year_number
GROUP BY first_year
ORDER BY first_year
Use window functions and aggregation:
select cnt, count(*), min(customer_id), max(customer_id)
from (select customer_id, count(*) as cnt
from (select td.*,
min(txn_date) over (partition by Customer_ID) as min_txn_date
from transaction_detail td
) td
where txn_date >= min_txn_date and txn_date < min_txn_date + interval '365' day
group by customer_id
) c
group by cnt
order by cnt;
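To make the window-function approach concrete, here is a small runnable sketch using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The rows are the sample data from the question rewritten as ISO dates, and date(x, '+365 days') is SQLite's stand-in for the interval arithmetic; treat both as illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transaction_detail (txn_date TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO transaction_detail VALUES (?, ?)",
    [
        ("2019-01-02", 1), ("2018-04-03", 1),
        ("2019-03-22", 3),
        ("2019-04-03", 4), ("2019-05-21", 4),
        ("2018-01-23", 5), ("2019-04-03", 5),
        ("2019-04-03", 6), ("2018-01-23", 6),
        ("2018-01-20", 7),
    ],
)

# Innermost query: attach each customer's first purchase date to every row.
# Middle query: count purchases within 365 days of that first purchase.
# Outer query: count how many customers share each purchase count.
query = """
SELECT cnt, COUNT(*) AS num_customers
FROM (
    SELECT customer_id, COUNT(*) AS cnt
    FROM (
        SELECT customer_id,
               txn_date,
               MIN(txn_date) OVER (PARTITION BY customer_id) AS min_txn_date
        FROM transaction_detail
    ) AS td
    WHERE txn_date < date(min_txn_date, '+365 days')
    GROUP BY customer_id
) AS c
GROUP BY cnt
ORDER BY cnt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample, customers 1 and 4 each have two purchases inside their first 365 days, and the other four customers have one.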
As I understand it, you want the count of distinct customers who first purchased in 2016 and repurchased one year or more after that first purchase.
Select * from
(
Select customer_id,
-- months_between(later, earlier) is positive, so the later date goes first
Floor(months_between(lead_txn_date, txn_date)/12) as num_years
From
(
Select customer_id,
txn_date,
row_number() over (partition by Customer_ID order by txn_date) as rn,
lead(txn_date) over (partition by Customer_ID order by txn_date) as lead_txn_date
From your_table
)
Where txn_date >= date '2016-01-01'
and txn_date < date '2017-01-01'
and rn = 1
And months_between(lead_txn_date, txn_date) >= 12
)
Pivot
(
Count(1) for num_years in (1,2,3,4)
)
Ultimately, we are finding the number of years between first and second purchase of the customer. And first purchase must be in 2016.
Cheers!!

Curious if there are any methods to sum a total based on a weekly classification for a n day period aside from union

I am looking to sum a total, based on a CASE expression, for each one-week range in the query below, accumulating up to a 90-day period. I can currently accomplish this by limiting the dates in separate subselects and UNIONing them together; however, is there another way?
The given query covers only 2 weeks; I would have to keep adding UNIONed subselects to cover 90 days.
select comp_id, sum(total) from (
(
SELECT CASE
WHEN AVG(amount) < 10 THEN 0
WHEN COUNT(p_id) < SUM(amount)*.5
THEN SUM(amount)*.5
ELSE COUNT(p_id)
END as total, avg(amount), comp_id
FROM p_container INNER JOIN chg ON chg_p_id = p_id
INNER JOIN c_type ON c_type_id = chtype_id
where correction_name like '%correction word%'
AND p_date BETWEEN GETDATE () - 9 AND GETDATE () - 2
group by comp_id
) UNION ALL (
SELECT CASE
WHEN AVG(amount) < 5 THEN 0
WHEN COUNT(p_id) < SUM(amount)*.06
THEN SUM(amount)*.06
ELSE COUNT(p_id)
END as total, avg(amount), comp_id
FROM p_container INNER JOIN chg ON chg_p_id = p_id
INNER JOIN c_type ON c_type_id = chtype_id
where correction_name like '%correction word%'
AND p_date BETWEEN GETDATE () - 17 AND GETDATE () - 10
group by comp_id
)) group by comp_id
You can use a case expression:
SELECT (CASE WHEN p_date BETWEEN GETDATE() - 9 AND GETDATE() - 2 THEN 'group1'
WHEN p_date BETWEEN GETDATE() - 17 AND GETDATE() - 10 THEN 'group2'
END) as grp,
(CASE WHEN AVG(amount) < 10 THEN 0
WHEN COUNT(p_id) < SUM(amount)*0.5 THEN SUM(amount)*0.5
ELSE COUNT(p_id)
END) as total, AVG(amount),
comp_id
FROM p_container INNER JOIN
chg
ON chg_p_id = p_id INNER JOIN
c_type
ON c_type_id = chtype_id
WHERE correction_name like '%correction word%' AND
p_date >= GETDATE() - 17
GROUP BY grp, comp_id;
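One possible variation, sketched here with Python's sqlite3 module: instead of naming each week in a CASE, derive a numeric bucket from the row's age in days, group by it, and then sum the per-week totals per comp_id. The flattened table `p`, its `days_ago` column, and the `(days_ago - 2) / 8` bucket formula (which reproduces the 2..9 and 10..17 day windows) are assumptions for illustration, and the sketch applies the first week's thresholds to every bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened data: one row per charge, with its age in days.
conn.execute("CREATE TABLE p (comp_id INTEGER, amount INTEGER, days_ago INTEGER)")
conn.executemany(
    "INSERT INTO p VALUES (?, ?, ?)",
    [
        (1, 20, 3), (1, 30, 5),    # bucket 0: avg 25, count 2 < 25 -> 25.0
        (1, 4, 11), (1, 6, 12),    # bucket 1: avg 5 < 10 -> 0
        (2, 12, 15),               # bucket 1: avg 12, count 1 < 6 -> 6.0
        (2, 100, 1),               # too recent, filtered out
    ],
)

# Integer division maps days 2..9 to bucket 0 and days 10..17 to bucket 1.
# Widening the BETWEEN range adds more buckets without more UNION branches.
query = """
SELECT comp_id, SUM(total) AS total
FROM (
    SELECT comp_id, week_bucket,
           CASE WHEN AVG(amount) < 10 THEN 0
                WHEN COUNT(*) < SUM(amount) * 0.5 THEN SUM(amount) * 0.5
                ELSE COUNT(*)
           END AS total
    FROM (
        SELECT comp_id, amount, (days_ago - 2) / 8 AS week_bucket
        FROM p
        WHERE days_ago BETWEEN 2 AND 17
    ) AS b
    GROUP BY comp_id, week_bucket
) AS w
GROUP BY comp_id
ORDER BY comp_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the thresholds really differ per week, as in the question's second subselect, they would need to come from a lookup keyed on the bucket number.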

rank out of the total in postgres

I am writing a SQL script in which I want to get the total number of appointments per salesperson and also how that salesperson ranks among all salespersons, e.g. Salesperson x has 5 appointments and ranks 4 out of 10 salespersons.
**expected results**:
Salesperson x 5 4/10
Salesperson D 6 5/10
Salesperson s 8 7/10
Use rank()
with sales as
(
select Salesperson, count(appointment) appointments
from SalesTable
group by Salesperson
)
select sales.*, rank() over (order by appointments desc) as salesrank
from sales
Hi, thanks for your response. I tried it this way and it works:
select id,sales_person,"Appointment/Day",rank_for_the_day,"Appointment/Week",rank_for_the_week,"Appointment/Month",
rank_for_the_month,"Appointment/year",rank_for_the_year
from(
select supplied_id,salesperson,sum(case when appointment_date::date=current_date then 1 else 0 end )"Appointment/Day",
rank() over (order by sum(case when appointment_date::date=current_date then 1 else 0 end ) desc )||'/'||
(select sum(case when appointment_date::date=current_date then 1 else 0 end ) from match where date_part( 'year', appointment_date)=2017
and appointment_date is not null and date_part('day',appointment_date)=date_part('day',current_date) ) rank_for_the_day,
sum(case when appointment_date::date between current_date-7 and current_date then 1 else 0 end )"Appointment/Week",
rank() over (order by sum(case when appointment_date::date between current_date-7 and current_date then 1 else 0 end ) desc)||'/'||
(select sum(case when appointment_date::date between current_date-7 and current_date then 1 else 0 end )
from match m where date_part( 'year', appointment_date)=2017 and appointment_date is not null
and date_part('week',appointment_date)=date_part('week',current_date) ) rank_for_the_week,
sum(case when date_part('month',appointment_date)=date_part('month',current_date) then 1 else 0 end )"Appointment/Month",
rank() over (order by sum(case when date_part('month',appointment_date)=date_part('month',current_date) then 1 else 0 end ) desc)||'/'||
(select sum(case when date_part('month',appointment_date)=date_part('month',current_date) then 1 else 0 end )
from match m where date_part( 'year', appointment_date)=2017 and appointment_date is not null
and date_part('month',appointment_date)=date_part('month',current_date) ) rank_for_the_month,
sum(case when date_part('year',appointment_date)=date_part('year',current_date) then 1 else 0 end )"Appointment/year",
rank() over (order by sum(case when date_part('year',appointment_date)=date_part('year',current_date) then 1 else 0 end ) desc)||'/'||
(select sum(case when date_part('year',appointment_date)=date_part('year',current_date) then 1 else 0 end )
from match m where date_part( 'year', appointment_date)=2017 and appointment_date is not null
and date_part('year',appointment_date)=date_part('year',current_date) ) rank_for_the_year
from salespersontable
where date_part( 'year', appointment_date)=2017 and appointment_date is not null
group by id,salesperson
)x order by 6 desc
However, I would appreciate a more efficient way to write this query to minimize resource consumption.
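One way to get the "rank/total" format without correlated subqueries is to aggregate once and then apply two window functions over the aggregated rows: rank() for the position and count(*) over () for the number of salespersons. The sketch below uses Python's sqlite3 module (SQLite 3.25 or newer); the `appointments` table and its contents are made up for illustration, and here rank 1 means the most appointments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per appointment.
conn.execute("CREATE TABLE appointments (salesperson TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?)",
    [("x",)] * 5 + [("D",)] * 6 + [("s",)] * 8,
)

# The CTE scans the table once; both window functions then run over
# the small per-salesperson result instead of re-reading the base table.
query = """
WITH sales AS (
    SELECT salesperson, COUNT(*) AS appointments
    FROM appointments
    GROUP BY salesperson
)
SELECT salesperson,
       appointments,
       RANK() OVER (ORDER BY appointments DESC) || '/' || COUNT(*) OVER () AS salesrank
FROM sales
ORDER BY appointments DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The day, week, month, and year variants could follow the same pattern, with a conditional SUM per period in the CTE and one rank() per period in the outer query.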

How to Optimize "JOIN" in PostgreSQL

I have four tables to pull information from: user (first_name), mongouser (email, card_status), transaction (transaction_type, balance, posted_at, is_atm, is_purchase), and user_login (user_id, login_date, login_id)...
Before I added the fourth table - user_login, everything was efficient. However, the fourth JOIN made everything slow. I wrote queries as shown below
SELECT * FROM
(SELECT
ssluserid,
first_name,
m.email,
zipcode,
date_part('year',age(birthday)) AS birthday,
(current_date - DATE(created_date)) AS duration,
CASE WHEN card_status = 'ACTIVE' THEN 1 ELSE 0 END AS IS_ACTIVE,
SUM(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN balance END) AS LOAD_AMT,
SUM(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN balance END) AS SPEND_AMT,
COUNT(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN balance END) AS LOAD_CT,
COUNT(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN balance END) AS SPEND_CT,
MIN(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN DATE(posted_at) END) AS FIRST_LOAD,
MAX(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 THEN DATE(posted_at) END) AS LAST_LOAD,
MIN(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN DATE(posted_at) END) AS FIRST_SPEND,
MAX(CASE WHEN transaction_type = 'Debit' AND balance > 1.00 THEN DATE(posted_at) END) AS LAST_SPEND,
SUM(CASE WHEN transaction_type = 'Debit' AND is_atm = 't' AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS ATM_AMT,
SUM(CASE WHEN transaction_type = 'Debit' AND is_purchase = 't' AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS POS_AMT,
SUM(CASE WHEN transaction_type = 'Credit' AND balance > 1.00 AND DATE(posted_at) >= CURRENT_DATE - INTERVAL '90 days'
THEN balance END) AS LOAD_VOL,
COUNT(CASE WHEN DATE(login_date) >= CURRENT_DATE - INTERVAL '90 days' THEN
login_id END) AS CT_LOGIN
FROM
mongouser m
LEFT OUTER JOIN
user u
ON m.userid = u.id
LEFT OUTER JOIN transactions t
ON u.id = t.user_id
LEFT OUTER JOIN user_login l
ON m.userid = l.user_id
GROUP BY 1,2,3,4,5,6,7) t
WHERE LAST_LOAD >= CURRENT_DATE - INTERVAL '90 days'
ORDER BY 9 DESC;
This query has been running for almost 40 min... Are there any ways to optimize it?
Focusing on just your statements you know where the problem is. You had this before
LEFT OUTER JOIN user u
ON m.userid = u.id
And you say things "weren't slow." Then you add this,
LEFT OUTER JOIN user_login l
ON m.userid = l.user_id
And you say things get slow. It's likely that you have an index on m.userid. Do you have an index on l.user_id?
CREATE INDEX foo ON user_login ( user_id );

Oracle opening and closing balance - SQL or PL/SQL needed?

select year,
month ,
d.PROD_ID,
T.CUSTOMER_ID,
SUM(CASE WHEN D.OP_TYPE = 1 THEN d.qty END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN d.qty END) SPEND
FROM TXN_HEADER T ,
TXN_DETAIL d ,
CUSTOMER A,
PRODUCT e
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND T.CUSTOMER_ID = A.CUSTOMER_ID
AND T.TXN_PK = D.TXN_PK
and d.PROD_ID = e.PROD_ID
and e.unit = 0
group by year, month ,d.PROD_ID, T.CUSTOMER_ID
ORDER BY 1,2,3,4
The output is as follows (the OPENING and CLOSING columns are not generated by the query yet, but I need the query to produce them):
YEAR  MONTH  PROD  CUSTOMER  OPENING  EARNED  SPEND  CLOSING
----  -----  ----  --------  -------  ------  -----  -------
2012      8   548     12033        0       8      2        6
2012      9   509     12033        0      24      0       24
2012      9   509     12047        0      14      0       14
2012      9   548     12033        6       1      0        7
2012      9   548     12047        0       1      0        1
I need to generate the output above. For each PROD_ID and CUSTOMER_ID, the previous closing balance should be carried forward as the opening balance, and the closing balance should be calculated as opening + earned - spend, month by month, per customer and per product. Is it possible to write this in SQL, or do I need PL/SQL?
I'd use analytics, with PROD_ID and CUSTOMER_ID in the partition clause to avoid mixing products and customers.
WITH
MONTHLY_BALANCE AS
(
SELECT
YEAR,
MONTH,
D.PROD_ID,
T.CUSTOMER_ID,
SUM(CASE WHEN D.OP_TYPE = 1 THEN D.QTY ELSE NULL END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN D.QTY ELSE NULL END) SPEND
FROM TXN_HEADER T
JOIN CUSTOMER A
ON T.CUSTOMER_ID = A.CUSTOMER_ID
JOIN TXN_DETAIL D
ON T.TXN_PK = D.TXN_PK
JOIN PRODUCT E
ON D.PROD_ID = E.PROD_ID
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND E.UNIT = 0
GROUP BY YEAR, MONTH, D.PROD_ID, T.CUSTOMER_ID
)
SELECT
YEAR,
MONTH,
PROD_ID,
CUSTOMER_ID,
SUM(NVL(EARNED, 0) - NVL(SPEND, 0)) OVER(PARTITION BY PROD_ID, CUSTOMER_ID ORDER BY YEAR, MONTH ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) OPENING,
EARNED,
SPEND,
SUM(NVL(EARNED, 0) - NVL(SPEND, 0)) OVER(PARTITION BY PROD_ID, CUSTOMER_ID ORDER BY YEAR, MONTH ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) CLOSING
FROM MONTHLY_BALANCE
ORDER BY 1, 2, 3, 4
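The running-balance idea can be checked against the sample output from the question with a small sketch in Python's sqlite3 module (SQLite 3.25 or newer). The pre-aggregated `monthly_balance` table is an assumption standing in for the CTE, and the COALESCE around the opening balance is an addition of this sketch so that the first month shows 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated rows, taken from the question's sample output.
conn.execute(
    "CREATE TABLE monthly_balance (year INTEGER, month INTEGER, "
    "prod_id INTEGER, customer_id INTEGER, earned INTEGER, spend INTEGER)"
)
conn.executemany(
    "INSERT INTO monthly_balance VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2012, 8, 548, 12033, 8, 2),
        (2012, 9, 509, 12033, 24, 0),
        (2012, 9, 509, 12047, 14, 0),
        (2012, 9, 548, 12033, 1, 0),
        (2012, 9, 548, 12047, 1, 0),
    ],
)

# Opening = net movement of all earlier months for the same product/customer.
# Closing = the same running sum, including the current month.
query = """
SELECT year, month, prod_id, customer_id,
       COALESCE(SUM(earned - spend) OVER (
           PARTITION BY prod_id, customer_id ORDER BY year, month
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS opening,
       earned,
       spend,
       SUM(earned - spend) OVER (
           PARTITION BY prod_id, customer_id ORDER BY year, month
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS closing
FROM monthly_balance
ORDER BY year, month, prod_id, customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row for product 548 and customer 12033 in month 9 opens at 6 (the month 8 closing balance) and closes at 7, matching the expected output.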
Your CASE needs an ELSE
CASE WHEN D.OP_TYPE = 1 THEN d.qty ELSE 0 END
Without the ELSE, the CASE returns NULL when D.OP_TYPE is not equal to 1. SUM skips NULLs, but if no row in a group satisfies the WHEN, the whole sum is NULL, and anything + NULL = NULL. That is why you do not see anything for those columns.
To get OPENING and CLOSING calculated, you may want to use analytic functions like LEAD and LAG.
Select year,month,prod_id,customer_id,
LAG(closing,1,0) OVER (order by year,month,prod_id,customer_id) as opening,
earned,spend
,(LAG(closing,1,0) OVER (order by year,month,prod_id,customer_id)+closing) as closing
from (WITH temp AS (select year,
month ,
d.PROD_ID,
T.CUSTOMER_ID,
0 OPEN,
SUM(CASE WHEN D.OP_TYPE = 1 THEN d.qty END) EARNED,
SUM(CASE WHEN D.OP_TYPE = 2 THEN d.qty END) SPEND,
0 CLOSE
FROM TXN_HEADER T ,
TXN_DETAIL d ,
CUSTOMER A,
PRODUCT e
WHERE T.AMOUNT > 0
AND A.TYPE = 0
AND T.CUSTOMER_ID = A.CUSTOMER_ID
AND T.TXN_PK = D.TXN_PK
and d.PROD_ID = e.PROD_ID
and e.unit = 0
group by year, month ,d.PROD_ID, T.CUSTOMER_ID
ORDER BY 1,2,3,4)
SELECT year,month,prod_id,customer_id,open,earned,spend,(open+earned-spend) as closing
from temp);