SQL - SUMIF substitute?

This question is best asked using an example - if I have daily data (in this case, daily Domestic Box Office for the movie Elvis), how can I sum only the weekend values?
If the data looks like this:
Date        DBO
6/24/2022   12755467
6/25/2022    9929779
6/26/2022    8526333
6/27/2022    4253038
6/28/2022    5267391
6/29/2022    4010762
6/30/2022    3577241
7/1/2022     5320812
7/2/2022     6841224
7/3/2022     6290576
7/4/2022     4248679
7/5/2022     3639110
7/6/2022     3002182
7/7/2022     2460108
7/8/2022     3326066
7/9/2022     4324040
7/10/2022    3530965
I'd like to be able to get results that look like this:
Weekend   DBO Sum
1         31211579
2         18452612
3         11181071
Also - not sure how tricky this would be but would love to include percent change v. last weekend.
Weekend   DBO Sum    % Change
1         31211579
2         18452612   -41%
3         11181071   -39%
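(For reference, the percent change is (current - previous) / previous; for weekend 2 that is (18452612 - 31211579) / 31211579 ≈ -0.409, shown rounded to -41%.)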
I tried this with CASE WHEN but I got the results in different columns, which was not what I was looking for.
SELECT
SUM(CASE
WHEN DATE BETWEEN '2022-06-24' AND '2022-06-26' THEN index
ELSE 0
END) AS Weekend1
,SUM(CASE
WHEN DATE BETWEEN '2022-07-01' AND '2022-07-03' THEN index
ELSE 0
END) AS Weekend2
,SUM(CASE
WHEN DATE BETWEEN '2022-07-08' AND '2022-07-10' THEN index
ELSE 0
END) AS Weekend3
FROM Elvis

I would start by filtering the data on weekend days only. Then we can group by week to get the sum; the last step is to use window functions to compare each weekend with the previous one:
select iso_week,
       row_number() over(order by iso_week) as weekend_number,
       sum(index) as dbo_sum,
       (sum(index) - lag(sum(index)) over(order by iso_week))
         / nullif(lag(sum(index)) over(order by iso_week), 0) as ratio_change
from (
    select e.*, extract(isoweek from date) as iso_week
    from elvis e
    where extract(dayofweek from date) in (1, 7)  -- 1 = Sunday, 7 = Saturday
) e
group by iso_week
order by iso_week
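Note that EXTRACT(DAYOFWEEK FROM date) returns 1 for Sunday and 7 for Saturday in BigQuery, so the filter above captures Saturday/Sunday only, while the sample data treats Friday through Sunday as one weekend. Since an ISO week runs Monday through Sunday, a Friday-Sunday weekend still falls inside a single ISO week, so widening the filter is enough (a sketch, assuming the amount column is named dbo as in the question's data):
select row_number() over(order by iso_week) as weekend_number,
       sum(dbo) as dbo_sum
from (
    select dbo, extract(isoweek from date) as iso_week
    from elvis
    where extract(dayofweek from date) in (1, 6, 7)  -- Sunday, Friday, Saturday
)
group by iso_week
order by iso_week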

Consider below approach
select *,
round(100 * safe_divide(dbo_sum - lag(dbo_sum) over(order by week), lag(dbo_sum) over(order by week)), 2) change_percent
from (
select extract(week from date + 2) week, sum(dbo) dbo_sum
from your_table
where extract(dayofweek from date + 2) in (1, 2, 3)
group by week
)
The date + 2 shift moves a Friday-Sunday weekend into a single Sunday-based week bucket, and dayofweek in (1, 2, 3) on the shifted dates keeps exactly the original Friday, Saturday, and Sunday rows. Applied to the sample data in your question, this returns the expected sums (31211579, 18452612, 11181071), with change_percent of -40.88 and -39.41 for the second and third weekends.

Related

SQL Divide previous row balance by current row balance and insert that value into current rows column "Growth"

I have a table like this.
Year   ProcessDate   Month   Balance    RowNum   Calculation
2022   20220430      4       22855547   1
2022   20220330      3       22644455   2
2022   20220230      2       22588666   3
2022   20220130      1       33545444   4
2022   20221230      12      22466666   5
I need to take the Balance from the following row (the prior month, since rows are ordered by date descending) and divide it by the current row's Balance.
Ex: Row 1 Calculation should = Row 2 Balance / Row 1 Balance (22644455 / 22855547 = 0.99).
Row 2 Calculation should = Row 3 Balance / Row 2 Balance, etc.
Table is just a Temporary table I created titled #MonthlyLoanBalance2.
Now I just need to take it a step further.
Let me know what and how you would go about doing this.
Thank you in advance!
Insert into #MonthlytLoanBalance2 (
Year
,ProcessDate
,Month
,Balance
,RowNum
)
select
--CloseYearMonth,
left(ProcessDate,4) as 'Year',
ProcessDate,
--x.LOANTypeKey,
SUBSTRING(CAST(x.ProcessDate as varchar(38)),5,2) as 'Month',
sum(x.currentBalance) as Balance
,ROW_NUMBER()over (order by ProcessDate desc) as RowNum
from
(
select
distinct LoanServiceKey,
LoanTypeKey,
AccountNumber,
CurrentBalance,
OpenDateKey,
CloseDateKey,
ProcessDate
from
cu.LAFactLoanSnapShot
where LoanStatus = 'Open'
and LoanTypeKey = 0
and ProcessDate in (select DateKey from dimDate
where IsLastDayOfMonth = 'Y'
and DateKey > convert(varchar, getdate()-4000, 112)
)
) x
group by ProcessDate
order by ProcessDate desc;
I am assuming your data is already prepared as shown in the table. Now you can try the LEAD() function to resolve your issue. Note that FORMAT() is used to keep just two decimal places.
SELECT *,
FORMAT(ISNULL(LEAD(Balance, 1) OVER (ORDER BY RowNum), 1) * 1.0 / Balance, 'N2') AS Calculation  -- * 1.0 avoids integer division when Balance is an int column
FROM #MonthlytLoanBalance2
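If you also want to persist the result rather than just select it, here is a minimal sketch (T-SQL, assuming the temp table above has a character-typed Calculation column): SQL Server lets you UPDATE through a CTE, which is handy because window functions are not allowed directly in a SET clause.
WITH cte AS (
    SELECT Calculation,
           LEAD(Balance, 1) OVER (ORDER BY RowNum) * 1.0 / Balance AS Ratio
    FROM #MonthlytLoanBalance2
)
UPDATE cte
SET Calculation = FORMAT(Ratio, 'N2');  -- the last row keeps NULL (no following row)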

SQL - Calculate percentage by group, for multiple groups

I have a table in GBQ in the following format :
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders. I did it as follows:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders, by month, in such a way that every month totals 100%.
Currently, to do this, I execute the query once per month, which is not practical:
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way to execute the query once and get this result?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...
SELECT
SIGN(Orders) AS HasOrders,
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*)) OVER (PARTITION BY Month), 2) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
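Here SIGN() stands in for the CASE expression from the question: it returns 0 for zero orders and 1 for any positive count. A throwaway Postgres illustration:
SELECT Orders, SIGN(Orders) AS HasOrders
FROM (VALUES (23), (0), (3)) AS t(Orders);
-- 23 -> 1, 0 -> 0, 3 -> 1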
You've stated that it's important for the total to be 100%, so you might consider rounding down in the no-orders case and rounding up in the has-orders case whenever a percentage falls precisely on an odd multiple of 0.5%. Rounding toward even, or rounding the smallest value down, might be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 10000.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNCATE(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5
AND MOD(TRUNCATE(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
It's usually not advisable to test floating-point values for equality, and I don't know how BigQuery's float64 will behave in the comparison against 0.5 (one half is, at least, exactly representable in binary). The fiddle below shows these options in a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery, so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09
Consider below approach
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
with output in the requested shape: one row per hasOrders value per month, each month's parts summing to 100%.

How do I display records in the same row although I am using group by on 2 columns that appear in different rows right now?

This is the output I am getting now, but I want all the records for one gateway in one row. I am trying to find the damage count and total count of packages processed by an airport in a week. Currently I am grouping by airport and week, so the records for an airport land on a different row for each week; I want all the weeks for a particular airport in a single row.
I tried putting a conditional group by, but that did not work.
select tmp.gateway,tmp.weekbucket, sum(tmp.damaged_count) as DamageCount, sum(tmp.total_count) as TotalCount, round(sum(tmp.DPMO),0) as DPMO from
(
select a.gateway,
date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
count(distinct(b.fulfillment_shipment_id||b.package_id)) as damaged_count,
count(distinct(a.fulfillment_shipment_id||a.package_id)) as total_count,
count(distinct(b.fulfillment_shipment_id||b.package_id))*1.00/count(distinct(a.Fulfillment_Shipment_id || a.package_id))*1000000 as DPMO
from booker.d_air_shipments_na a
left join trex.d_ps_packages b
on (a.fulfillment_shipment_id||a.package_id =b.Fulfillment_Shipment_id||b.package_id)
where a.processing_date >= current_date-7
and (exception_summary in ('Reprint-Damaged Label') or exception_summary IS NULL)
and substring(route, position(a.gateway IN route) +6, 1) <> 'K'
group by a.gateway, weekbucket) as tmp
group by tmp.gateway, tmp.weekbucket
order by tmp.gateway, tmp.weekbucket desc;
Since your seven-day window spans parts of two calendar weeks, you will likely get two rows per gateway. You can try removing weekbucket from the outer GROUP BY and taking MAX(weekbucket) instead, so that the counts from both week buckets are summed into a single row per gateway:
select tmp.gateway,
       max(tmp.weekbucket) as weekbucket,
       sum(tmp.damaged_count) as DamageCount,
       sum(tmp.total_count) as TotalCount,
       round(sum(tmp.DPMO), 0) as DPMO
from (
    select a.gateway,
           date_trunc('week', (a.processing_date + interval '1 day')) - interval '1 day' as weekbucket,
           count(distinct(b.fulfillment_shipment_id || b.package_id)) as damaged_count,
           count(distinct(a.fulfillment_shipment_id || a.package_id)) as total_count,
           count(distinct(b.fulfillment_shipment_id || b.package_id)) * 1.00
             / count(distinct(a.fulfillment_shipment_id || a.package_id)) * 1000000 as DPMO
    from booker.d_air_shipments_na a
    left join trex.d_ps_packages b
        on (a.fulfillment_shipment_id || a.package_id = b.fulfillment_shipment_id || b.package_id)
    where a.processing_date >= current_date - 7
      and (exception_summary in ('Reprint-Damaged Label') or exception_summary is null)
      and substring(route, position(a.gateway in route) + 6, 1) <> 'K'
    group by a.gateway, weekbucket
) as tmp
group by tmp.gateway
order by tmp.gateway, max(tmp.weekbucket) desc;
So you want to pivot the two weeks into a single row with two sets of aggregates?:
select
    tmp.gateway,
    min(case when rn = 1 then tmp.damaged_count end) as DamageCountWeek1,
    min(case when rn = 2 then tmp.damaged_count end) as DamageCountWeek2,
    min(case when rn = 1 then tmp.total_count end) as TotalCountWeek1,
    min(case when rn = 2 then tmp.total_count end) as TotalCountWeek2,
    min(case when rn = 1 then round(tmp.DPMO, 0) end) as DPMOWeek1,
    min(case when rn = 2 then round(tmp.DPMO, 0) end) as DPMOWeek2
from (
    select row_number() over (partition by gateway order by weekbucket) as rn,
           ...
) as tmp
group by tmp.gateway
order by tmp.gateway;

Loop in Oracle SQL, comparing one month to another

I have to draft a SQL query which does the following:
Compare the current week's amount (e.g. week 10) to the average amount over the previous 4 weeks (weeks 9, 8, 7, 6).
Now I need to run the query on a monthly basis, say for weeks 10, 11, 12, and 13.
As of now I am running it four times, passing the week parameter on each run.
For example my current query is something like this :
select account_id, curr.amount,hist.AVG_Amt
from
(
select
to_char(run_date,'IW') Week_ID,
sum(amount) Amount,
account_id
from Transaction t
where to_char(run_date,'IW') = '10'
group by account_id,to_char(run_date,'IW')
) curr,
(
select account_id,
sum(amount) / count(to_char(run_date,'IW')) as AVG_Amt
from Transactions
where to_char(run_date,'IW') in ('6','7','8','9')
group by account_id
) hist
where
hist.account_id = curr.account_id
and curr.amount > 2*hist.AVG_Amt;
As you can see, to cover weeks 11, 12, and 13 I have to run the above query three more times. Is there a way to consolidate or restructure the query so that I run it only once and get all the comparison data together?
One additional note: I need to export the data to Excel (which I currently do from PL/SQL Developer after each run).
Thanks!
-Abhi
You can use a correlated sub-query to get the sum of amounts for the last 4 weeks for a given week.
select
to_char(run_date,'IW') Week_ID,
sum(amount) curAmount,
(select sum(amount)/4.0 from transaction
where account_id = t.account_id
and to_char(run_date,'IW') between to_char(t.run_date,'IW')-4
and to_char(t.run_date,'IW')-1
) hist_amount,
account_id
from Transaction t
where to_char(run_date,'IW') in ('10','11','12','13')
group by account_id,to_char(run_date,'IW')
Edit: Based on OP's comment on the performance of the query above, this can also be accomplished using lag to get the previous rows' values. The count of weeks with records in the last 4 weeks can be obtained with case expressions.
with sum_amounts as (
    select to_char(run_date,'IW') as wk, sum(amount) as amount, account_id
    from Transaction
    group by account_id, to_char(run_date,'IW')
)
select wk, account_id, amount,
       -- average over however many of the previous 4 weeks actually have data;
       -- partition by account_id keeps accounts separate, nullif avoids dividing by zero
       1.0 * (lag(amount,1,0) over (partition by account_id order by wk)
            + lag(amount,2,0) over (partition by account_id order by wk)
            + lag(amount,3,0) over (partition by account_id order by wk)
            + lag(amount,4,0) over (partition by account_id order by wk))
       / nullif( case when lag(amount,1,0) over (partition by account_id order by wk) <> 0 then 1 else 0 end
               + case when lag(amount,2,0) over (partition by account_id order by wk) <> 0 then 1 else 0 end
               + case when lag(amount,3,0) over (partition by account_id order by wk) <> 0 then 1 else 0 end
               + case when lag(amount,4,0) over (partition by account_id order by wk) <> 0 then 1 else 0 end, 0)
       as hist_avg_amount
from sum_amounts
I think that is what you are looking for:
with lagt as (
    select to_char(run_date,'IW') as Week_ID, sum(amount) as Amount, account_id
    from Transaction t
    group by account_id, to_char(run_date,'IW')
)
select Week_ID, account_id, amount,
       (lag(amount,1,0) over (partition by account_id order by Week_ID)
      + lag(amount,2,0) over (partition by account_id order by Week_ID)
      + lag(amount,3,0) over (partition by account_id order by Week_ID)
      + lag(amount,4,0) over (partition by account_id order by Week_ID)) / 4 as average
from lagt;

SQL: grouping by number of entries and entry date

I have the following table log:
event_time | name |
-------------------------
2014-07-16 11:40 Bob
2014-07-16 10:00 John
2014-07-16 09:20 Bob
2014-07-16 08:20 Bob
2014-07-15 11:20 Bob
2014-07-15 10:20 John
2014-07-15 09:00 Bob
I would like to generate a report, where I can group data by number of entries per day and by entry day. So the resulting report for the table above would be something like this:
event_date | 0-2 | 3 | 4-99 |
-------------------------------
2014-07-16 1 1 0
2014-07-15 2 0 0
I used the following approaches to solve it:
Select with grouping in range
How to select the count of values grouped by ranges
If I find an answer before anybody posts it here, I will share it.
Added:
I would like to count the number of daily entries for each name, then check which column that value belongs to and add 1 to that column.
I took it in two steps. Inner query gets the base counts. The outer query uses case statements to sum counts.
SQL Fiddle Example
select event_date,
sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
sum(case when cnt = 3 then 1 else 0 end) as "3",
sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from
(select cast(event_time as date) as event_date,
name,
count(1) as cnt
from log
group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date
Try it like this:
select da,
       sum(case when c < 3 then 1 else 0 end) as "0-2",
       sum(case when c = 3 then 1 else 0 end) as "3",
       sum(case when c > 3 then 1 else 0 end) as "4-99"
from (
    select cast(event_time as date) as da, count(*) as c
    from table1
    group by cast(event_time as date), name
) as aa
group by da
First aggregate in two steps:
SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2;
Names should be completely irrelevant according to your description.
Then take the query and run it through crosstab():
SELECT *
FROM crosstab(
$$SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
) AS f (day date, "0-2" int, "3" int, "4_or_more" int);
crosstab() is supplied by the additional module tablefunc. Details and instructions in this related answer:
PostgreSQL Crosstab Query
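(If tablefunc isn't installed yet, enabling it is a one-liner, given sufficient privileges:)
CREATE EXTENSION IF NOT EXISTS tablefunc;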
This is a variation on a PIVOT query (although PostgreSQL supports this via the crosstab(...) table functions). The existing answers cover the basic technique; I just prefer to construct queries without the use of CASE, where possible.
To get started, we need a couple of things. The first is essentially a Calendar Table, or entries from one (they're among the most useful dimension tables). If you don't already have one, the entries for the specified dates can easily be generated:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
CAST('2014-08-01' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay))
SQL Fiddle Demo
This is primarily used to create the first step in the double aggregate, like so:
SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name
SQL Fiddle Demo
Remember that most aggregate columns with a nullable expression (here, COUNT(Log.name)) will ignore null values (not count them). This is also one of the few times it's acceptable to not include a grouped-by column in the SELECT list (normally it makes the results ambiguous). For the actual queries I'll put this into a subquery, but it would also work as a CTE.
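As a quick illustration of that null-ignoring behavior (a throwaway Postgres example):
SELECT COUNT(*) AS all_rows, COUNT(name) AS named_rows
FROM (VALUES ('Bob'::text), (NULL)) v(name);
-- all_rows = 2, named_rows = 1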
We also need a way to construct our COUNT ranges. That's pretty easy too:
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SQL Fiddle Demo
We'll be querying these as "exclusive upper-bound" as well.
We now have all the pieces we need to do the query. We can actually use these virtual tables to make queries in both veins of the current answers.
First, the SUM(CASE...) style.
For this query, we'll take advantage of the null-ignoring qualities of aggregate functions again:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT startOfDay,
COUNT(Zero_To_Two.text) AS Zero_To_Two,
COUNT(Three.text) AS Three,
COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
ON Zero_To_Two.text = '0 - 2'
AND Entry_Count.count >= Zero_To_Two.start
AND Entry_Count.count < Zero_To_Two.next
LEFT JOIN Count_Range Three
ON Three.text = '3'
AND Entry_Count.count >= Three.start
AND Entry_Count.count < Three.next
LEFT JOIN Count_Range Four_And_Up
ON Four_And_Up.text = '4+'
AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay
SQL Fiddle Example
The other option is of course the crosstab query, where the CASE was being used to segment the results. We'll use the Count_Range table to decode the values for us:
SELECT startOfDay, "0 - 2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT Entry_Count.startOfDay, Count_Range.text, COUNT(*) AS count
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
JOIN Count_Range
ON Entry_Count.count >= Count_Range.start
AND (Entry_Count.count < Count_Range.next OR Count_Range.next IS NULL)
GROUP BY Entry_Count.startOfDay, Count_Range.text
ORDER BY Entry_Count.startOfDay, Count_Range.text$$,
$$VALUES('0 - 2'::text), ('3'), ('4+')$$) AS Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)
(I believe this is correct, but I don't have a way to test it; Fiddle doesn't seem to have the crosstab functionality loaded. In particular, the CTEs probably must go inside the function itself, but I'm not sure.)