SQL - Calculate percentage by group, for multiple groups - sql

I have a table in GBQ in the following format :
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders, I did it as following :
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders, by month, in a way that every month = 100%
Currently to do this I execute the query once per month, which is not practical :
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way execute a query once and have this result ?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...

SELECT
SIGN(Orders),
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*), 2) OVER (PARTITION BY Month)) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
You've stated that it's important for the total to be 100% so you might consider rounding down in the case of no orders and rounding up in the case of has orders for those scenarios where the percentages falls precisely on an odd multiple of 0.5%. Or perhaps rounding toward even or round smallest down would be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 10000.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNCATE(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5
AND MOD(TRUNCATE(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
It's usually not advisable to test floating-point values for equality and I don't know how BigQuery's float64 will work with the comparison against 0.5. One half is nevertheless representable in binary. See these in a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09

Consider below approach
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`,
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
with output like below

Related

How to spread month to day with amount value divided by total days per month

I have data with an amount of 1 month and want to change it to 30 days.
if 1 month the amount is 20000 then per day is 666.67
The following are sample data and results:
Account
Project
Date
Segment
Amount
Acc1
1
September 2022
Actual
20000
Result :
I need a query using sql server
You may try a set-based approach using an appropriate number table and a calculation with windowed COUNT().
Data:
SELECT *
INTO Data
FROM (VALUES
('Acc1', 1, CONVERT(date, '20220901'), 'Actual', 20000.00)
) v (Account, Project, [Date], Segment, Amount)
Statement for all versions, starting from SQL Server 2016 (the number table is generated using JSON-based approach with OPENJSON()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, CONVERT(int, [key]), d.[Date])
FROM OPENJSON('[1' + REPLICATE(',1', DATEDIFF(day, d.[Date], EOMONTH(d.[Date]))) + ']')
) a (Amount, Date)
Statement for SQL Server 2022 (the number table is generated with GENERATE_SERIES()):
SELECT d.Account, d.Project, a.[Date], d.Segment, a.Amount
FROM Data d
CROSS APPLY (
SELECT
d.Amount / COUNT(*) OVER (ORDER BY (SELECT NULL)),
DATEADD(day, [value], d.[Date])
FROM GENERATE_SERIES(0, DATEDIFF(day, d.[Date], EOMONTH(d.[Date])))
) a (Amount, Date)
Notes:
Both approaches calculate the days for each month. If you always want 30 days per month, replace DATEDIFF(day, d.[Date], EOMONTH(d.[Date])) with 29.
There is a rounding issue with this calculation. You may need to implement an additional calculation for the last day of the month.
You can use a recursive CTE to generate each day of the month and then divide the amount by the number of days in the month to achive the required output
DECLARE #Amount NUMERIC(18,2) = 20000,
#MonthStart DATE = '2022-09-01'
;WITH CTE
AS
(
SELECT
CurrentDate = #MonthStart,
DayAmount = CAST(#Amount/DAY(EOMONTH(#MonthStart)) AS NUMERIC(18,2)),
RemainingAmount = CAST(#Amount - (#Amount/DAY(EOMONTH(#MonthStart))) AS NUMERIC(18,2))
UNION ALL
SELECT
CurrentDate = DATEADD(DAY,1,CurrentDate),
DayAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(#MonthStart)
THEN RemainingAmount
ELSE DayAmount END,
RemainingAmount = CASE WHEN DATEADD(DAY,1,CurrentDate) = EOMONTH(#MonthStart)
THEN 0
ELSE CAST(RemainingAmount-DayAmount AS NUMERIC(18,2)) END
FROM CTE
WHERE CurrentDate < EOMONTH(#MonthStart)
)
SELECT
CurrentDate,
DayAmount
FROM CTE
In case you want an equal split without rounding errors and without loops you can use this calculation. It spreads the rounding error across all entries, so they are all as equal as possible.
DECLARE #Amount NUMERIC(18,2) = 20000,
#MonthStart DATE = '20220901'
SELECT DATEADD(DAY,Numbers.i - 1,#MonthStart)
, ShareSplit.Calculated_Share
, SUM(ShareSplit.Calculated_Share) OVER (ORDER BY (SELECT NULL)) AS Calculated_Total
FROM (SELECT DISTINCT number FROM master..spt_values WHERE number BETWEEN 1 AND DAY(EOMONTH(#MonthStart)))Numbers(i)
CROSS APPLY(SELECT CAST(ROUND(#Amount * 100 / DAY(EOMONTH(#MonthStart)),0) * 0.01
+ CASE
WHEN Numbers.i
<= ABS((#Amount - (ROUND(#Amount * 100 / DAY(EOMONTH(#MonthStart)),0) / 100.0 * DAY(EOMONTH(#MonthStart)))) * 100)
THEN 0.01 * SIGN(#Amount - (ROUND(#Amount * 100 / DAY(EOMONTH(#MonthStart)),0) / 100.0 * DAY(EOMONTH(#MonthStart))))
ELSE 0
END AS DEC(18,2)) AS Calculated_Share
)ShareSplit

SQL - Calculate the difference in number of orders by month

I am working on the orders table provided by this site, it has its own editor where you can test your SQL statements.
The order table looks like this
order_id
customer_id
order_date
1
7000
2016/04/18
2
5000
2016/04/18
3
8000
2016/04/19
4
4000
2016/04/20
5
NULL
2016/05/01
I want to get the difference in the number of orders for subsequent months.
To elaborate, the number of orders each month would be like this
SQL Statement
SELECT
MONTH(order_date) AS Month,
COUNT(MONTH(order_date)) AS Total_Orders
FROM
orders
GROUP BY
MONTH(order_date)
Result:
Month
Total_Orders
4
4
5
1
Now my goal is to get the difference in subsequent months which would be
Month
Total_Orders
Total_Orders_Diff
4
4
4 - Null = Null
5
1
1 - 4 = -3
My strategy was to self-join following this answer
This was my attempt
SELECT
MONTH(a.order_date),
COUNT(MONTH(a.order_date)),
COUNT(MONTH(b.order_date)) - COUNT(MONTH(a.order_date)) AS prev,
MONTH(b.order_date)
FROM
orders a
LEFT JOIN
orders b ON MONTH(a.order_date) = MONTH(b.order_date) - 1
GROUP BY
MONTH(a.order_date)
However, the result was just zeros (as shown below) which suggests that I am just subtracting from the same value rather than from the previous month (or subtracting from a null value)
MONTH(a.order_date)
COUNT(MONTH(a.order_date))
prev
MONTH(b.order_date)
4
4
0
NULL
5
1
0
NULL
Do you have any suggestions as to what I am doing wrong?
You have to use LAG window function in your SELECT statement.
LAG provides access to a row at a given physical offset that comes
before the current row.
So, this is what you need:
SELECT
MONTH(order_date) as Month,
COUNT(MONTH(order_date)) as Total_Orders,
COUNT(MONTH(order_date)) - (LAG (COUNT(MONTH(order_date))) OVER (ORDER BY (SELECT NULL))) AS Total_Orders_Diff
FROM orders
GROUP BY MONTH(order_date);
Here in an example on the SQL Fiddle: http://sqlfiddle.com/#!18/5ed75/1
Solution without using LAG window function:
WITH InitCTE AS
(
SELECT MONTH(order_date) AS Month,
COUNT(MONTH(order_date)) AS Total_Orders
FROM orders
GROUP BY MONTH(order_date)
)
SELECT InitCTE.Month, InitCTE.Total_Orders, R.Total_Orders_Diff
FROM InitCTE
OUTER APPLY (SELECT TOP 1 InitCTE.Total_Orders - CompareCTE.Total_Orders AS Total_Orders_Diff
FROM InitCTE AS CompareCTE
WHERE CompareCTE.Month < InitCTE.Month) R;
Something like the following should give you what you want - disclaimer, untested!
select *, Total_Orders - lag(Total_orders,1) over(order by Month) as Total_Orders_Diff
from (
select Month(order_date) as Month, Count(*) as Total_Orders
From orders
Group by Month(order_date)
)o

How to find value in a range of following rows - SQL Teradata

I have a table with the following columns:
account, validity_date,validity_month,amount.
For each row i want to check if the value in field "amount' exist over the rows range of the next month. if yes, indicator=1, else 0.
account validity_date validity_month amount **required_column**
------- ------------- --------------- ------- ----------------
123 15oct2019 201910 400 0
123 20oct2019 201910 500 1
123 15nov2019 201911 1000 0
123 20nov2019 201911 500 0
123 20nov2019 201911 2000 1
123 15dec2019 201912 400
123 15dec2019 201912 2000
Can anyone help?
Thanks
validity_month/100*12 + validity_month MOD 100 calculates a month number (for comparing across years, Jan to previous Dec) and the inner ROW_NUMBER reduces multiple rows with the same amount per month to a single row (kind of DISTINCT):
SELECT dt.*
,CASE -- next row is from next month
WHEN Lead(nextMonth IGNORE NULLS)
Over (PARTITION BY account, amount
ORDER BY validity_date)
= (validity_month/100*12 + validity_month MOD 100) +1
THEN 1
ELSE 0
END
FROM
(
SELECT t.*
,CASE -- one row per account/month/amount
WHEN Row_Number()
Over (PARTITION BY account, amount, validity_month
ORDER BY validity_date ) = 1
THEN validity_month/100*12 + validity_month MOD 100
END AS nextMonth
FROM tab AS t
) AS dt
Edit:
The previous is for exact matching amounts, for a range match the query is probably very hard to write with OLAP-functions, but easy with a Correlated Subquery:
SELECT t.*
,CASE
WHEN
( -- check if there's a row in the next month matching the current amount +/- 10 percent
SELECT Count(*)
FROM tab AS t2
WHERE t2.account_ = t.account_
AND (t2.validity_month/100*12 + t2.validity_month MOD 100)
= ( t.validity_month/100*12 + t.validity_month MOD 100) +1
AND t2.amount BETWEEN t.amount * 0.9 AND t.amount * 1.1
) > 0
THEN 1
ELSE 0
END
FROM tab AS t
But then performance might be really bad...
Assuming the values are unique within a month and you have a value for each month for each account, you can simplify this to:
select t.*,
(case when lead(seqnum) over (partition by account, amount order by validity_month) = seqnum + 1
then 1 else 0
end)
from (select t.*,
dense_rank() over (partition by account order by validity_month) as seqnum
from t
) t;
Note: This puts 0 for the last month rather than NULL, but that can easily be adjusted.
You can do this without the subquery by using month arithmetic. It is not clear what the data type of validity_month is. If I assume a number:
select t.*,
(case when lead(floor(validity_month / 100) * 12 + (validity_month mod 100)
) over (partition by account, amount order by validity_month) =
(validity_month / 100) * 12 + (validity_month mod 100) - 1
then 1 else 0
end)
from t;
Just to add another way to do this using Standard SQL. This query will return 1 when the condition is met, 0 when it is not, and null when there isn't a next month to evaluate (as implied in your result column).
It is assumed that we're partitioning on the account field. Also includes a 10% range match on the amount field based on the comment made. Note that if you have an id field, you should include it (if two rows have the same account, validity_date, validity_month, amount there will only be one resulting row, due to DISTINCT).
Performance-wise, should be similar to the answer from #dnoeth.
SELECT DISTINCT
t1.account,
t1.validity_date,
t1.validity_month,
t1.amount,
CASE
WHEN t2.amount IS NOT NULL THEN 1
WHEN MAX(t1.validity_month) OVER (PARTITION BY t1.account) > t1.validity_month THEN 0
ELSE NULL
END AS flag
FROM `project.dataset.table` t1
LEFT JOIN `project.dataset.table` t2
ON
t2.account = t1.account AND
DATE_DIFF(
PARSE_DATE("%Y%m", CAST(t2.validity_month AS STRING)),
PARSE_DATE("%Y%m", CAST(t1.validity_month AS STRING)),
MONTH
) = 1 AND
t2.amount BETWEEN t1.amount * 0.9 AND t1.amount * 1.1;

Loop in Oracle SQL, comparing one month to another

I have to draft a SQL query which does the following:
Compare current week (e.g. week 10) amount to the average amount over previous 4 weeks (Week# 9,8,7,6).
Now I need to run the query on a monthly basis so say for weeks (10,11,12,13).
As of now I am running it four times giving the week parameter on each run.
For example my current query is something like this :
select account_id, curr.amount,hist.AVG_Amt
from
(
select
to_char(run_date,'IW') Week_ID,
sum(amount) Amount,
account_id
from Transaction t
where to_char(run_date,'IW') = '10'
group by account_id,to_char(run_date,'IW')
) curr,
(
select account_id,
sum(amount) / count(to_char(run_date,'IW')) as AVG_Amt
from Transactions
where to_char(run_date,'IW') in ('6','7','8','9')
group by account_id
) hist
where
hist.account_id = curr.account_id
and curr.amount > 2*hist.AVG_Amt;
As you can see, if I have to run the above query for week 11,12,13 I have to run it three separate times. Is there a way to consolidate or structure the query such that I only run once and I get the comparison data all together?
Just an additional info, I need to export the data to Excel (which I do after running query on the PL/SQL developer) and export to Excel.
Thanks!
-Abhi
You can use a correlated sub-query to get the sum of amounts for the last 4 weeks for a given week.
select
to_char(run_date,'IW') Week_ID,
sum(amount) curAmount,
(select sum(amount)/4.0 from transaction
where account_id = t.account_id
and to_char(run_date,'IW') between to_char(t.run_date,'IW')-4
and to_char(t.run_date,'IW')-1
) hist_amount,
account_id
from Transaction t
where to_char(run_date,'IW') in ('10','11','12','13')
group by account_id,to_char(run_date,'IW')
Edit: Based on OP's comment on the performance of the query above, this can also be accomplished using lag to get the previous row's value. Count of number of records present in the last 4 weeks can be achieved using a case expression.
with sum_amounts as
(select to_char(run_date,'IW') wk, sum(amount) amount, account_id
from Transaction
group by account_id, to_char(run_date,'IW')
)
select wk, account_id, amount,
1.0 * (lag(amount,1,0) over (order by wk) + lag(amount,2,0) over (order by wk) +
lag(amount,3,0) over (order by wk) + lag(amount,4,0) over (order by wk))
/ case when lag(amount,1,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,2,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,3,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,4,0) over (order by wk) <> 0 then 1 else 0 end
as hist_avg_amount
from sum_amounts
I think that is what you are looking for:
with lagt as (select to_char(run_date,'IW') Week_ID, sum(amount) Amount, account_id
from Transaction t
group by account_id, to_char(run_date,'IW'))
select Week_ID, account_id, amount,
(lag(amount,1,0) over (order by week) + lag(amount,2,0) over (order by week) +
lag(amount,3,0) over (order by week) + lag(amount,4,0) over (order by week)) / 4 as average
from lagt;

Add a column into a select for calculation

I have 2 columns from a table and i want to add a third column to output the result of a calculation
select statement at the moment is:
select revenue, cost
from costdata
my 2 columns are revenue and cost
table name: costdata
my formula is: = ((revenue - cost)/ revenue)*100
I want the third column to be named 'result'
any idea on how to do this in a select statement?
SELECT revenue
, cost
, ((revenue - cost) / revenue) * 100 As result
FROM costdata
You mentioned in the comments that you get a divide by zero error. This wll occur when revenue equals zero.
What you want to happen in this scenario is up to you but this should get you started
SELECT revenue
, cost
, CASE WHEN revenue = 0 THEN
0
ELSE
((revenue - cost) / revenue) * 100
END As result
FROM costdata
Query:
SELECT revenue,
cost,
CASE WHEN revenue <> 0
THEN ((revenue - cost) / revenue) * 100
ELSE 0 END As result
FROM costdata
Try,
select revenue, cost,((revenue - cost)/ revenue)*100 AS result
from costdata
SELECT revenue, cost, ((revenue - cost)/ revenue)*100 AS result FROM costdata