Loop in Oracle SQL, comparing one month to another - sql

I have to draft a SQL query which does the following:
Compare current week (e.g. week 10) amount to the average amount over previous 4 weeks (Week# 9,8,7,6).
Now I need to run the query on a monthly basis so say for weeks (10,11,12,13).
As of now I am running it four times giving the week parameter on each run.
For example my current query is something like this :
select account_id, curr.amount,hist.AVG_Amt
from
(
select
to_char(run_date,'IW') Week_ID,
sum(amount) Amount,
account_id
from Transaction t
where to_char(run_date,'IW') = '10'
group by account_id,to_char(run_date,'IW')
) curr,
(
select account_id,
sum(amount) / count(to_char(run_date,'IW')) as AVG_Amt
from Transactions
where to_char(run_date,'IW') in ('6','7','8','9')
group by account_id
) hist
where
hist.account_id = curr.account_id
and curr.amount > 2*hist.AVG_Amt;
As you can see, if I have to run the above query for week 11,12,13 I have to run it three separate times. Is there a way to consolidate or structure the query such that I only run once and I get the comparison data all together?
Just an additional info, I need to export the data to Excel (which I do after running query on the PL/SQL developer) and export to Excel.
Thanks!
-Abhi

You can use a correlated sub-query to get the sum of amounts for the last 4 weeks for a given week.
select
to_char(run_date,'IW') Week_ID,
sum(amount) curAmount,
(select sum(amount)/4.0 from transaction
where account_id = t.account_id
and to_char(run_date,'IW') between to_char(t.run_date,'IW')-4
and to_char(t.run_date,'IW')-1
) hist_amount,
account_id
from Transaction t
where to_char(run_date,'IW') in ('10','11','12','13')
group by account_id,to_char(run_date,'IW')
Edit: Based on OP's comment on the performance of the query above, this can also be accomplished using lag to get the previous row's value. Count of number of records present in the last 4 weeks can be achieved using a case expression.
with sum_amounts as
(select to_char(run_date,'IW') wk, sum(amount) amount, account_id
from Transaction
group by account_id, to_char(run_date,'IW')
)
select wk, account_id, amount,
1.0 * (lag(amount,1,0) over (order by wk) + lag(amount,2,0) over (order by wk) +
lag(amount,3,0) over (order by wk) + lag(amount,4,0) over (order by wk))
/ case when lag(amount,1,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,2,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,3,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,4,0) over (order by wk) <> 0 then 1 else 0 end
as hist_avg_amount
from sum_amounts

I think that is what you are looking for:
with lagt as (select to_char(run_date,'IW') Week_ID, sum(amount) Amount, account_id
from Transaction t
group by account_id, to_char(run_date,'IW'))
select Week_ID, account_id, amount,
(lag(amount,1,0) over (order by week) + lag(amount,2,0) over (order by week) +
lag(amount,3,0) over (order by week) + lag(amount,4,0) over (order by week)) / 4 as average
from lagt;

Related

SQL - How to calculate opening and closing balance

I'm trying to figure out a way to calculate two new columns for the following model:
Link: http://sqlfiddle.com/#!7/0a6ce9/1
A) first column: should be the "OPENING_BALANCE". It needs to be the sum of the "AMOUNT" column starting off when transaction date started. The transaction date could be any.
B) Second column: Should be the "CLOSING_BALANCE". This one will always be the sum of the "opening balance" from previous day + "amount" of the current day. So, by the second TRANSACTION_DATE forward, the "opening balance" will always be the "closing balance" from the previous day.
Here's an example:
Can anyone share any examples of how I could achieve this?
You can use SUM() window function and a CASE expression to check for the 1st transaction:
SELECT *,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY TRANSACTION_DATE, TRANSACTION_ID) = 1 THEN AMOUNT
ELSE SUM(AMOUNT) OVER (ORDER BY TRANSACTION_DATE, TRANSACTION_ID) - AMOUNT
END OPENING_BALANCE,
SUM(AMOUNT) OVER (ORDER BY TRANSACTION_DATE, TRANSACTION_ID) CLOSING_BALANCE
FROM TRANSATION_TABLE;
See the demo.
select f.TRANSACTION_ID, f.TRANSACTION_DATE, f.AMOUNT,CASE WHEN sum(s.AMOUNT)-f.AMOUNT >0
THEN sum(s.AMOUNT)-f.AMOUNT ELSE s.AMOUNT END AS OPENING_BALANCE,sum(s.AMOUNT) as CLOSING_BALANCE
from TRANSATION_TABLE f inner join TRANSATION_TABLE s on f.TRANSACTION_ID >= s.TRANSACTION_ID
group by f.amount order by f.TRANSACTION_ID asc

SQL calculation with previous row + current row

I want to make a calculation based on the excel file. I succeed to obtain 2 of the first records with LAG (as you can check on the 2nd screenshot). Im out of ideas how to proceed from now and need help. I just need the Calculation column take its previous data. I want to automatically calculate it over all the dates. I also tried to make a LAG for the calculation but manually and the result was +1 row more data instead of NULL. This is a headache.
LAG(Data ingested, 1) OVER ( ORDER BY DATE ASC ) AS LAG
You seem to want cumulative sums:
select t.*,
(sum(reconciliation + aves - microa) over (order by date) -
first_value(aves - microa) over (order by date)
) as calculation
from CalcTable t;
Here is a SQL Fiddle.
EDIT:
Based on your comment, you just need to define a group:
select t.*,
(sum(reconciliation + aves - microa) over (partition by grp order by date) -
first_value(aves - microa) over (partition by grp order by date)
) as calculation
from (select t.*,
count(nullif(reconciliation, 0)) over (order by date) as grp
from CalcTable t
) t
order by date;
Imo this could be solved using a "gaps and islands" approach. When Reconciliation>0 then create a gap. SUM(GAP) OVER converts the gaps into island groupings. In the outer query the 'sum_over' column (which corresponds to the 'Calculation') is a cumumlative sum partitioned by the island groupings.
with
gap_cte as (
select *, case when [Reconciliation]>0 then 1 else 0 end gap
from CalcTable),
grp_cte as (
select *, sum(gap) over (order by [Date]) grp
from gap_cte)
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;
[EDIT]
The CASE statement could be CROSS APPLY'ed instead
with
grp_cte as (
select c.*, v.gap, sum(v.gap) over (order by [Date]) grp
from #CalcTable c
cross apply (values (case when [Reconciliation]>0 then 1 else 0 end)) v(gap))
select *, sum([Reconciliation]+
(case when gap=1 then 0 else Aves end)-
(case when gap=1 then 0 else Microa end))
over (partition by grp order by [Date]) sum_over
from grp_cte;
Here is a fiddle

How to get the difference between (multiple) two different rows?

I have a set of data containing some fields: month, customer_id, row_num (RANK), and verified_date.
The rank field indicates the first (1) and second (2) purchase of each customer. I would like to know the time difference between first and second purchase for each customer and show only its first month = month where row_num = 1.
https://i.ibb.co/PjJk5Y0/Capture.png
So my expected result is like below image:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using StandardSQL in Google Bigquery.
row_num, verified_date
from table
GROUP BY 1, 2```
We can try using a pivot query here, aggregating by the customer_id:
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by attr column row wise and also create additional columns in to show their counts per day and percentages as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by but unable to find out how to even seperate them to multiple columns. I tried to generate day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this also is not giving me correct answer, I'm getting all zeroes for percentage and count as 1. Any help is appreciated. I'm trying to do this in Redshift which follows postgresql syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day by day if you feel the need
I am trying to enhance the query #johnHC btw if you needs for 7days then you have to those days in case when
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A counting the rows per day and counting the rows per day AND attr
B for more readability I convert the date into numbers. Here I take the difference between current date of the row and the maximum date available in the table. So I get a counter from 0 (first day) up to n - 1 (last day)
C calculating the percentage and rounding
D pivot by filter the day numbers. The COALESCE avoids the NULL values and switched them into 0. To add more days you can multiply these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.

sql server - returning text for multiple vendors based on average values

I have a table giving ratings for various suppliers/areas. The format is below.
I would like to know, for each distinct month, and supplier (exp_id)
What was the highest and lowest rated pickup_ward_text
My expected output is similar to:
[year][month][exp_id] [highest rated pickup_ward] [lowest rated pickup_ward]
Where the 'rated' is an average of rating_driver, rating_punctuality & rating_vehicle
I am completely lost on how to achieve this, I have tried to past the first line of the table correctly below. a
Year Month exp_id RATING_DRIVER RATING_PUNCTUALITY RATING_VEHICLE booking_id pickup_date ratingjobref rating_date PICKUP_WARD_TEXT
2013 10 4 5.00 5.00 5.00 1559912 30:00.0 1559912 12/10/2013 18:29 N4
There's a common pattern using row_number() to find either the minimum or the maximum. You can combine them with a little trickery:
select
year,
month,
exp_id,
max(case rn1 when 1 then pickup_ward_text end) as min_pickup_ward_text,
max(case rn2 when 1 then pickup_ward_text end) as max_pickup_ward_text
from (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle
) rn1,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle desc
) rn2
from
mytable
) x
where
rn1 = 1 or rn2 = 1 -- this line isn't necessary, but might make things quicker
group by
year,
month,
exp_id
order by
year,
month,
exp_id
It may actually be faster to do two derived tables, for each part and inner join them. Some testing is in order:
select
n.year,
n.month,
n.exp_id,
n.pickup_ward_text as min_pickup_ward_text,
x.pickup_ward_text as max_pickup_ward_text
from (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle
) rn
from
mytable
) n
inner join (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle desc
) rn
from
mytable
) x
on n.year = x.year and n.month = x.month and n.exp_id = x.exp_id
where
n.rn = 1 and
x.rn = 1
order by
year,
month,
exp_id