I have a table with the following columns:
account, validity_date,validity_month,amount.
For each row i want to check if the value in field "amount' exist over the rows range of the next month. if yes, indicator=1, else 0.
account validity_date validity_month amount **required_column**
------- ------------- --------------- ------- ----------------
123 15oct2019 201910 400 0
123 20oct2019 201910 500 1
123 15nov2019 201911 1000 0
123 20nov2019 201911 500 0
123 20nov2019 201911 2000 1
123 15dec2019 201912 400
123 15dec2019 201912 2000
Can anyone help?
Thanks
validity_month/100*12 + validity_month MOD 100 calculates a month number (for comparing across years, Jan to previous Dec) and the inner ROW_NUMBER reduces multiple rows with the same amount per month to a single row (kind of DISTINCT):
SELECT dt.*
,CASE -- next row is from next month
WHEN Lead(nextMonth IGNORE NULLS)
Over (PARTITION BY account, amount
ORDER BY validity_date)
= (validity_month/100*12 + validity_month MOD 100) +1
THEN 1
ELSE 0
END
FROM
(
SELECT t.*
,CASE -- one row per account/month/amount
WHEN Row_Number()
Over (PARTITION BY account, amount, validity_month
ORDER BY validity_date ) = 1
THEN validity_month/100*12 + validity_month MOD 100
END AS nextMonth
FROM tab AS t
) AS dt
Edit:
The previous is for exact matching amounts, for a range match the query is probably very hard to write with OLAP-functions, but easy with a Correlated Subquery:
SELECT t.*
,CASE
WHEN
( -- check if there's a row in the next month matching the current amount +/- 10 percent
SELECT Count(*)
FROM tab AS t2
WHERE t2.account_ = t.account_
AND (t2.validity_month/100*12 + t2.validity_month MOD 100)
= ( t.validity_month/100*12 + t.validity_month MOD 100) +1
AND t2.amount BETWEEN t.amount * 0.9 AND t.amount * 1.1
) > 0
THEN 1
ELSE 0
END
FROM tab AS t
But then performance might be really bad...
Assuming the values are unique within a month and you have a value for each month for each account, you can simplify this to:
select t.*,
(case when lead(seqnum) over (partition by account, amount order by validity_month) = seqnum + 1
then 1 else 0
end)
from (select t.*,
dense_rank() over (partition by account order by validity_month) as seqnum
from t
) t;
Note: This puts 0 for the last month rather than NULL, but that can easily be adjusted.
You can do this without the subquery by using month arithmetic. It is not clear what the data type of validity_month is. If I assume a number:
select t.*,
(case when lead(floor(validity_month / 100) * 12 + (validity_month mod 100)
) over (partition by account, amount order by validity_month) =
(validity_month / 100) * 12 + (validity_month mod 100) - 1
then 1 else 0
end)
from t;
Just to add another way to do this using Standard SQL. This query will return 1 when the condition is met, 0 when it is not, and null when there isn't a next month to evaluate (as implied in your result column).
It is assumed that we're partitioning on the account field. Also includes a 10% range match on the amount field based on the comment made. Note that if you have an id field, you should include it (if two rows have the same account, validity_date, validity_month, amount there will only be one resulting row, due to DISTINCT).
Performance-wise, should be similar to the answer from #dnoeth.
SELECT DISTINCT
t1.account,
t1.validity_date,
t1.validity_month,
t1.amount,
CASE
WHEN t2.amount IS NOT NULL THEN 1
WHEN MAX(t1.validity_month) OVER (PARTITION BY t1.account) > t1.validity_month THEN 0
ELSE NULL
END AS flag
FROM `project.dataset.table` t1
LEFT JOIN `project.dataset.table` t2
ON
t2.account = t1.account AND
DATE_DIFF(
PARSE_DATE("%Y%m", CAST(t2.validity_month AS STRING)),
PARSE_DATE("%Y%m", CAST(t1.validity_month AS STRING)),
MONTH
) = 1 AND
t2.amount BETWEEN t1.amount * 0.9 AND t1.amount * 1.1;
Related
I have a table where like this.
Year
ProcessDate
Month
Balance
RowNum
Calculation
2022
20220430
4
22855547
1
2022
20220330
3
22644455
2
2022
20220230
2
22588666
3
2022
20220130
1
33545444
4
2022
20221230
12
22466666
5
I need to take the previous row of each column and divide that amount by the current row.
Ex: Row 1 calculation should = Row 2 Balance / Row 1 Balance (22644455/22855547 = .99% )
Row 2 calculation should = Row 3 Balance / Row 2 Balance etc....
Table is just a Temporary table I created titled #MonthlyLoanBalance2.
Now I just need to take it a step further.
Let me know what and how you would go about doing this.
Thank you in advance!
Insert into #MonthlytLoanBalance2 (
Year
,ProcessDate
,Month
,Balance
,RowNum
)
select
--CloseYearMonth,
left(ProcessDate,4) as 'Year',
ProcessDate,
--x.LOANTypeKey,
SUBSTRING(CAST(x.ProcessDate as varchar(38)),5,2) as 'Month',
sum(x.currentBalance) as Balance
,ROW_NUMBER()over (order by ProcessDate desc) as RowNum
from
(
select
distinct LoanServiceKey,
LoanTypeKey,
AccountNumber,
CurrentBalance,
OpenDateKey,
CloseDateKey,
ProcessDate
from
cu.LAFactLoanSnapShot
where LoanStatus = 'Open'
and LoanTypeKey = 0
and ProcessDate in (select DateKey from dimDate
where IsLastDayOfMonth = 'Y'
and DateKey > convert(varchar, getdate()-4000, 112)
)
) x
group by ProcessDate
order by ProcessDate desc;``
I am assuming your data is already prepared as shown in the table. Now you can try Lead() function to resolve your issue. Remember format() function is used for taking only two precision.
SELECT *,
FORMAT((ISNULL(LEAD(Balance,1) OVER (ORDER BY RowNum), 1)/Balance),'N2') Calculation
FROM #MonthlytLoanBalance2
I have a table in GBQ in the following format :
UserId Orders Month
XDT 23 1
XDT 0 4
FKR 3 6
GHR 23 4
... ... ...
It shows the number of orders per user and month.
I want to calculate the percentage of users who have orders, I did it as following :
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
GROUP BY
HasOrders
ORDER BY
Parts
It gives me the following result:
HasOrders Parts
0 35
1 65
I need to calculate the percentage of users who have orders, by month, in a way that every month = 100%
Currently to do this I execute the query once per month, which is not practical :
SELECT
HasOrders,
ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
SELECT
*,
CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
FROM `Table` )
WHERE Month = 1
GROUP BY
HasOrders
ORDER BY
Parts
Is there a way execute a query once and have this result ?
HasOrders Parts Month
0 25 1
1 75 1
0 45 2
1 55 2
... ... ...
SELECT
SIGN(Orders),
ROUND(COUNT(*) * 100.000 / SUM(COUNT(*), 2) OVER (PARTITION BY Month)) AS Parts,
Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
Demo on Postgres:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020
You've stated that it's important for the total to be 100% so you might consider rounding down in the case of no orders and rounding up in the case of has orders for those scenarios where the percentages falls precisely on an odd multiple of 0.5%. Or perhaps rounding toward even or round smallest down would be better options:
WITH DATA AS (
SELECT SIGN(Orders) AS HasOrders, Month,
COUNT(*) * 10000.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
PartsPercent - TRUNCATE(PartsPercent) AS Fraction,
CASE WHEN HasOrders = 0
THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
END AS PartsRound0Down,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5
AND MOD(TRUNCATE(PartsPercent), 2) = 0
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsRoundTowardEven,
CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5 AND PartsPercent < 50
THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
END AS PartsSmallestTowardZero
from DATA
It's usually not advisable to test floating-point values for equality and I don't know how BigQuery's float64 will work with the comparison against 0.5. One half is nevertheless representable in binary. See these in a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery so be aware that Postgres's rounding behavior is different:
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09
Consider below approach
select hasOrders, round(100 * parts, 2) as parts, month from (
select month,
countif(orders = 0) / count(*) `0`,
countif(orders > 0) / count(*) `1`,
from your_table
group by month
)
unpivot (parts for hasOrders in (`0`, `1`))
with output like below
Let's say I have this Table Rewards
CustomerID DateReceived Point
1 2020-01-01 15
1 2020-01-02 20
2 2020-01-08 20
2 2020-01-10 10
3 2020-01-10 50
Then I have this table MaxRewards
CustomerID MaxPoint
1 20
2 10
3 30
So, the idea is Customer's Point combined (summed) should never surpassed MaxPoint in MaxRewards table, if it does, then it will substract starting on latest date, so in the case above, the expected result should be
CustomerID DateReceived Point
1 2020-01-01 15
1 2020-01-02 5
2 2020-01-08 10
2 2020-01-10 0
3 2020-01-10 30
How do I execute a query to update the table without a loop? I've been thinking of using CROSS APPLY but I can't seem to get it right.
The following approach seems to be working:
WITH cte AS (
SELECT
r.CustomerID,
r.DateReceived,
r.Point,
mr.MaxPoint,
COALESCE(SUM(r.Point) OVER (PARTITION BY r.CustomerID ORDER BY r.DateReceived
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS cumPoints
FROM Rewards r
INNER JOIN MaxRewards mr ON r.CustomerID = mr.CustomerID
)
SELECT
CustomerID,
DateReceived,
CASE WHEN cumPoints + Point < MaxPoint
THEN Point
ELSE CASE WHEN MaxPoint - cumPoints > 0
THEN MaxPoint - cumPoints ELSE 0 END END AS Point
FROM cte
ORDER BY
CustomerID,
DateReceived;
Demo
The logic used here is to compute, in the CTE, the cumulative sum of points for each customer, before the current record in the date sequence. Then, we report one of two things as the Point in the output. For cases where the cumulative sum be already greater than the maximum number of points, we report 0. Otherwise, we report the difference between the maximum number of points and the cumulative sum, which gives the number of points from the current record which we want to add.
Make use of Lag and Case statement:
Demo
SELECT CASE WHEN RUNNING_SUM > MAXPOINT AND PREV_VALUE IS NULL AND POINT > MAXPOINT THEN MAXPOINT
WHEN RUNNING_SUM > MAXPOINT AND PREV_VALUE IS NULL THEN POINT - MAXPOINT
WHEN RUNNING_SUM > MAXPOINT AND PREV_VALUE IS NOT NULL AND PREV_VALUE < POINT THEN POINT - PREV_VALUE
WHEN RUNNING_SUM > MAXPOINT AND PREV_VALUE IS NOT NULL THEN POINT - MAXPOINT
WHEN RUNNING_SUM > MAXPOINT THEN MAXPOINT - PREV_VALUE
ELSE POINT END AS POINT,
CUSTOMERID, DATERECEIVED FROM (
SELECT C.CUSTOMERID,C.DATERECEIVED, C.POINT, M.MAXPOINT, LAG(POINT) OVER(PARTITION BY C.CUSTOMERID ORDER BY DateReceived) PREV_VALUE,
SUM(POINT) OVER(PARTITION BY C.CUSTOMERID ORDER BY DateReceived) RUNNING_SUM
FROM CUSTOMER C INNER JOIN MaxRewards M
ON(C.CUSTOMERID = M.CUSTOMERID) ) X;
You can achieve the result from your requirements using lag, derived table, and case
select
CustomerID,
DateReceived,
case
when R.LastPoint > R.MaxPoint then 0
when R.Point + R.LastPoint > R.MaxPoint
then R.MaxPoint - R.LastPoint
else Point
end as Point
from (
select
R.CustomerID,
R.DateReceived,
R.Point,
MR.MaxPoint,
isnull(LAG(R.Point,1) OVER (
PARTITION BY R.CustomerID
ORDER BY R.DateReceived
),0) LastPoint
from dbo.Rewards R
join dbo.MaxRewards MR
on R.CustomerID = MR.CustomerID
) R
http://sqlfiddle.com/#!18/1a368
Please find the below query, I used an accumulative sum to calculate how many points left for each customer, and used IIF to understand whether a customer surpassed its threshold :
SELECT CustomerID,DateReceived,IIF(temp_point>0,temp_point,0) as Point
FROM
(
select CustomerID,DateReceived,
IIF(PointsLeft>0,Point,Point-ABS(PointsLeft)) as temp_point
from
(
select Rewards.* ,
sum(Rewards.Point) over (partition by MaxRewards.CustomerID order by DateReceived asc) s,
MaxRewards.Point as MaxPoint,
MaxRewards.Point - sum(Rewards.Point) over (partition by MaxRewards.CustomerID order by DateReceived asc) PointsLeft
From Rewards, MaxRewards
where Rewards.CustomerID = MaxRewards.CustomerID
) a
) b
Help! We're trying to create a new column (Is Valid?) to reproduce the logic below.
It is a binary result that:
it is 1 if it is the first known value of an ID
it is 1 if it is 3 seconds or later than the previous "1" of that ID
Note 1: this is not the difference in seconds from the previous record
It is 0 if it is less than 3 seconds than the previous "1" of that ID
Note 2: there are many IDs in the data set
Note 3: original dataset has ID and Date
Attached a PoC of the data and the expected result.
You would have to do this using a recursive CTE, which is quite expensive:
with tt as (
select t.*, row_number() over (partition by id order by time) as seqnum
from t
),
recursive cte as (
select t.*, time as grp_start
from tt
where seqnum = 1
union all
select tt.*,
(case when tt.time < cte.grp_start + interval '3 second'
then tt.time
else tt.grp_start
end)
from cte join
tt
on tt.seqnum = cte.seqnum + 1
)
select cte.*,
(case when grp_start = lag(grp_start) over (partition by id order by time)
then 0 else 1
end) as isValid
from cte;
I have the below data in a table A which I need to insert into table B along with one computed column.
TABLE A:
Account_No | Balance | As_on_date
1001 |-100 | 1-Jan-2013
1001 |-150 | 2-Jan-2013
1001 | 200 | 3-Jan-2013
1001 |-250 | 4-Jan-2013
1001 |-300 | 5-Jan-2013
1001 |-310 | 6-Jan-2013
Table B:
In table B, there should be no of days to be shown when balance is negative and
the date one which it has gone into negative.
So, for 6-Jan-2013, this table should show below data:
Account_No | Balance | As_on_date | Days_passed | Start_date
1001 | -310 | 6-Jan-2013 | 3 | 4-Jan-2013
Here, no of days should be the days when the balance has gone negative in recent time and
not from the old entry.
I need to write a SQL query to get the no of days passed and the start date from when the
balance has gone negative.
I tried to formulate a query using Lag analytical function, but I am not succeeding.
How should I check the first instance of negative balance by traversing back using LAG function?
Even the first_value function was given a try but not getting how to partition in it based on negative value.
Any help or direction on this will be really helpful.
Here's a way to achive this using analytical functions.
INSERT INTO tableb
WITH tablea_grouped1
AS (SELECT account_no,
balance,
as_on_date,
SUM (CASE WHEN balance >= 0 THEN 1 ELSE 0 END)
OVER (PARTITION BY account_no ORDER BY as_on_date)
grp
FROM tablea),
tablea_grouped2
AS (SELECT account_no,
balance,
as_on_date,
grp,
LAST_VALUE (
balance)
OVER (
PARTITION BY account_no, grp
ORDER BY as_on_date
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
closing_balance
FROM tablea_grouped1
WHERE balance < 0
AND grp != 0 --keep this, if starting negative balance is to be ignored
)
SELECT account_no,
closing_balance,
MAX (as_on_date),
MAX (as_on_date) - MIN (as_on_date) + 1,
MIN (as_on_date)
FROM tablea_grouped2
GROUP BY account_no, grp, closing_balance
ORDER BY account_no, MIN (as_on_date);
First, SUM is used as analytical function to assign group number to consecutive balances less than 0.
LAST_VALUE function is then used to find the last -ve balance in each group
Finally, the result is aggregated based on each group. MAX(date) gives the last date, MIN(date) gives the starting date, and the difference of the two gives number of days.
Demo at sqlfiddle.
Try this and use gone_negative to computing specified column value for insert into another table:
select temp.account_no,
temp.balance,
temp.prev_balance,
temp.on_date,
temp.prev_on_date,
case
WHEN (temp.balance < 0 and temp.prev_balance >= 0) THEN
1
else
0
end as gone_negative
from (select account_no,
balance,
on_date,
lag(balance, 1, 0) OVER(partition by account_no ORDER BY account_no) prev_balance,
lag(on_date, 1) OVER(partition by account_no ORDER BY account_no) prev_on_date
from tblA
order by account_no) temp;
Hope this helps pal.
Here's on way to do it.
Select all records from my_table where the balance is positive.
Do a self-join and get all the records that have a as_on_date is greater than the current row, but the amounts are in negative
Once we get these, we cut-off the rows WHERE the date difference between the current and the previous row for as_on_date is > 1. We then filter the results a outer sub query
The Final select just groups the rows and gets the min, max values for the filtered rows which are grouped.
Query:
SELECT
account_no,
min(case when row_number = 1 then balance end) as balance,
max(mt2_date) as As_on_date,
max(mt2_date) - mt1_date as Days_passed,
min(mt2_date) as Start_date
FROM
(
SELECT
*,
MIN(break_date) OVER( PARTITION BY mt1_date ) AS min_break_date,
ROW_NUMBER() OVER( PARTITION BY mt1_date ORDER BY mt2_date desc ) AS row_number
FROM
(
SELECT
mt1.account_no,
mt2.balance,
mt1.as_on_date as mt1_date,
mt2.as_on_date as mt2_date,
case when mt2.as_on_date - lag(mt2.as_on_date,1) over () > 1 then mt2.as_on_date end as break_date
FROM
my_table mt1
JOIN my_table mt2 ON ( mt2.balance < mt1.balance AND mt2.as_on_date > mt1.as_on_date )
WHERE
MT1.balance > 0
order by
mt1.as_on_date,
mt2.as_on_date ) sub_query
) T
WHERE
min_break_date is null
OR mt2_date < min_break_date
GROUP BY
mt1_date,
account_no
SQLFIDDLE
I have a added a few more rows in the FIDDLE, just to test it out