Add a CASE on a SUM when using conditional aggregation - sql

I have the following table (called report) in SQL Server with millions of records, so performance is a factor.
+---------+-------------------------+---------+
| user_id | timestamp               | balance |
+---------+-------------------------+---------+
|       1 | 2021-04-29 09:31:10.100 |      10 |
|       1 | 2021-04-29 09:35:25.800 |      15 |
|       1 | 2021-04-29 09:36:30.550 |       5 |
|       2 | 2021-04-29 09:38:15.009 |     100 |
|       3 | 2021-04-29 10:36:30.550 |      50 |
|       3 | 2021-04-29 10:38:15.009 |     250 |
+---------+-------------------------+---------+
Here are the requirements:
I would like to group the opening balance, closing balance and net movement of all users between a date range.
I require two queries:
all movement greater than a variable threshold (let's call it 10)
all movement less than a variable threshold (let's call it 10)
The records must also be returned using the OFFSET x FETCH NEXT y ROWS syntax for a lighter response to the UI.
Here is a working query that does not yet take the greater-than / less-than threshold requirement into account.
select user_id,
       max(case when seqnum = 1 then balance end) as opening,
       max(case when seqnum_desc = 1 then balance end) as closing,
       sum(case when seqnum = 1 and seqnum_desc = 1 then 0
                when seqnum = 1 then -balance
                when seqnum_desc = 1 then balance
           end) as movement
from (select r.*,
             row_number() over (partition by user_id order by timestamp) as seqnum,
             row_number() over (partition by user_id order by timestamp desc) as seqnum_desc
      from report r
      where timestamp >= '2020-03-21 16:22:26.580'
        and timestamp <= '2022-03-24 16:22:26.580'
     ) r
where timestamp >= '2020-03-21 16:22:26.580'
  and timestamp <= '2022-03-24 16:22:26.580'
group by user_id
order by user_id
offset 0 rows fetch next 200 rows only
Here is the DB fiddle to get the output below
+---------+---------+---------+----------+
| user_id | opening | closing | movement |
+---------+---------+---------+----------+
|       1 |      10 |       5 |       -5 |
|       2 |     100 |     100 |        0 |
|       3 |      50 |     250 |      200 |
+---------+---------+---------+----------+
How do I conditionally return only the movements greater than 10 in one query, and only those less than 10 in the other?
Thank you in advance.

I would suggest a CTE or subquery:
with cte as (
<your query here>
)
select cte.*
from cte
where movement < 10; -- or whatever condition
Note: You might actually want the absolute value if you really mean -10 to 10 rather than 0 to 10:
where abs(movement) < 10
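Spelled out against the query above, here is a sketch of the greater-than variant, assuming SQL Server syntax and an illustrative @threshold variable; the paging clause moves to the outer query so it is applied after the filter:
declare @threshold int = 10;

with cte as (
    select user_id,
           max(case when seqnum = 1 then balance end) as opening,
           max(case when seqnum_desc = 1 then balance end) as closing,
           sum(case when seqnum = 1 and seqnum_desc = 1 then 0
                    when seqnum = 1 then -balance
                    when seqnum_desc = 1 then balance
               end) as movement
    from (select r.*,
                 row_number() over (partition by user_id order by timestamp) as seqnum,
                 row_number() over (partition by user_id order by timestamp desc) as seqnum_desc
          from report r
          where timestamp >= '2020-03-21 16:22:26.580'
            and timestamp <= '2022-03-24 16:22:26.580'
         ) r
    group by user_id
)
select *
from cte
where movement > @threshold   -- swap to movement < @threshold for the second query
order by user_id
offset 0 rows fetch next 200 rows only;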

Related

How to verify if the max value of a column corresponds to the max value of another column, grouped by a third column

I have a table
| Token | Acct_No | Customer_ID |
|:------|:-------:|------------:|
| 10    |    1    |         ABC |
| 7     |    2    |         ABC |
| 6     |    3    |         ABC |
| 12    |    4    |         ABC |
| 11    |    1    |         ABC |
| 8     |    1    |         ABC |
| 15    |    4    |         ABC |
| 16    |    3    |         ABC |
| 10    |    2    |         CDA |
I want to know if there are any rows where max(token) for max(acct_no) < max(token) for any other acct_no for a particular customer_id.
In this case, it is the second-to-last record (Token 16, Acct_No 3).
You can use the window function first_value() to calculate the maximum token for the biggest acct_no for each customer. Then, for the rows that have the biggest token for each customer/acct_no, check whether any token is larger:
select t.*
from (select t.*,
             first_value(token) over (partition by customerid order by acct_no desc, token desc) as token_biggest_acct_no,
             row_number() over (partition by customerid, acct_no order by token desc) as seqnum
      from t
     ) t
where seqnum = 1 and              -- only consider the last row per customer/account
      token > token_biggest_acct_no;
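A minimal way to try this out, assuming a table simply named t with the columns token, acct_no and customerid used in the query above:
-- hypothetical setup for the sample data in the question
create table t (token int, acct_no int, customerid varchar(10));
insert into t values
    (10, 1, 'ABC'), (7, 2, 'ABC'), (6, 3, 'ABC'), (12, 4, 'ABC'),
    (11, 1, 'ABC'), (8, 1, 'ABC'), (15, 4, 'ABC'), (16, 3, 'ABC'),
    (10, 2, 'CDA');
-- the query above then returns the (16, 3, 'ABC') row: acct_no 3 carries a
-- larger token (16) than the biggest token on ABC's highest acct_no (15 on acct_no 4)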

Get some values from the table by selecting

I have a table:
| id | Number | Address |
|----|--------|---------|
| 1  | 0      | NULL    |
| 1  | 1      | NULL    |
| 1  | 2      | 50      |
| 1  | 3      | NULL    |
| 2  | 0      | 10      |
| 3  | 1      | 30      |
| 3  | 2      | 20      |
| 3  | 3      | 20      |
| 4  | 0      | 75      |
| 4  | 1      | 22      |
| 4  | 2      | 30      |
| 5  | 0      | NULL    |
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number
from table dh
where dh.number = (select max(min(t.history))
                   from table t
                   where t.id = dh.id
                   group by t.address)
But this select does not correctly handle the case when the address first changed and then changed back to a previous value. For example, for id = 1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
             row_number() over (partition by id order by number desc) as seqnum1,
             row_number() over (partition by id, address order by number desc) as seqnum2
      from t
     ) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
The two values coincide exactly for the trailing run of rows that share the most recent address. Aggregation then pulls back the earliest row in that group, i.e. the number at which the address last changed.
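For example, for id = 3 in the sample data, the inner subquery evaluates to the following (illustrative trace):

| Number | Address | seqnum1 | seqnum2 |
|--------|---------|---------|---------|
| 3      | 20      | 1       | 1       |
| 2      | 20      | 2       | 2       |
| 1      | 30      | 3       | 1       |

Numbers 3 and 2 satisfy seqnum1 = seqnum2, and min(number) returns 2, the number at which the address last changed (from 30 to 20).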
I answered my question myself; if anyone needs it, here is my solution:
select *
from table dh1
where dh1.number = (select max(x.number)
                    from (select dh2.id, dh2.number, dh2.address,
                                 lag(dh2.address) over (order by dh2.number asc) as prev
                          from table dh2
                          where dh1.id = dh2.id
                         ) x
                    where NVL(x.address, 0) <> NVL(x.prev, 0));

How to transform a range of records to the values of the record after that range in SQL?

I am trying to replace some bad input records within a specific date range with correct records. However, I'm not sure if there is an efficient way to do so. Therefore my question is: how do I transform a (static) range of records to the values of the record after that range in SQL? Below you will find an example to clarify what I am trying to achieve.
In this example you can see that customer number 1 belongs to group number 0 (None) in the period from 25-06-2020 to 29-06-2020. From 30-06-2020 to 05-07-2020 this group number changes from 0 to 11 for customer number 1. This static period contains the wrong records, and should be changed to the values that are valid on 06-07-2020 (group number == 10). Is there a way to do this?
If I understand correctly, you can use window functions to get the data on that particular date and case logic to assign it to the specific date range:
select t.*,
       (case when date >= '2020-07-01' and date <= '2020-07-05'
             then max(case when date = '2020-07-06' then group_number end) over (partition by customer_number)
             else group_number
        end) as imputed_group_number,
       (case when date >= '2020-07-01' and date <= '2020-07-05'
             then max(case when date = '2020-07-06' then role end) over (partition by customer_number)
             else role
        end) as imputed_role
from t;
If you want to update the values, you can use JOIN:
update t
set group_number = tt.group_number,
    role = tt.role
from tt
where tt.customer_number = t.customer_number and
      tt.date = '2020-07-06'
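Here tt stands for whatever source holds the correct values. If that is simply the same table, a sketch of a self-join that also restricts the update to the bad range (assuming SQL Server-style UPDATE ... FROM and the date range used above):
update t
set group_number = src.group_number,
    role = src.role
from t
join t as src                          -- self-join: pick up each customer's 2020-07-06 row
    on src.customer_number = t.customer_number
   and src.date = '2020-07-06'
where t.date >= '2020-07-01'           -- only overwrite the bad range
  and t.date <= '2020-07-05';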
I think the window function first_value() does what you want:
select date,
       customer_number,
       first_value(group_number) over (partition by customer_number order by date) as group_number,
       first_value(role) over (partition by customer_number order by date) as role
from mytable
You can do the following as an example. Here I have chosen the criterion that if role = 'Leader' it is a bad record, and therefore the next available group_number and role are applied, in the columns group_number1 and role1.
I have used a smaller subset of the rows from your Excel example.
select date1,
       customer_number,
       group_number,
       case when role = 'Leader' then
                 (select t1.group_number
                  from t t1
                  where t1.date1 > t.date1
                    and t1.role <> 'Leader'
                  order by t1.date1 asc
                  limit 1)
            else group_number
       end as group_number1,
       role,
       case when role = 'Leader' then
                 (select t1.role
                  from t t1
                  where t1.date1 > t.date1
                    and t1.role <> 'Leader'
                  order by t1.date1 asc
                  limit 1)
            else role
       end as role1
from t
order by 1
+------------+-----------------+--------------+---------------+--------+--------+
| DATE1      | CUSTOMER_NUMBER | GROUP_NUMBER | GROUP_NUMBER1 | ROLE   | ROLE1  |
+------------+-----------------+--------------+---------------+--------+--------+
| 2020-06-25 |               1 |            0 |             0 | None   | None   |
| 2020-06-26 |               1 |            0 |             0 | None   | None   |
| 2020-06-27 |               1 |            0 |             0 | None   | None   |
| 2020-06-28 |               1 |            0 |             0 | None   | None   |
| 2020-06-29 |               1 |            0 |             0 | None   | None   |
| 2020-06-30 |               1 |           11 |            10 | Leader | Member |
| 2020-07-01 |               1 |           11 |            10 | Leader | Member |
| 2020-07-06 |               1 |           10 |            10 | Member | Member |
+------------+-----------------+--------------+---------------+--------+--------+
db fiddle link
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=c95d12ced067c1df94947848b5a94c14

Setting batch number for a set of records in SQL

I have the following table in SQL:
id,date,records
1,2019-03-28 01:22:12,5
2,2019-03-29 01:23:23,5
3,2019-03-30 01:28:54,5
4,2019-03-28 01:12:21,2
5,2019-03-12 01:08:11,1
6,2019-03-28 01:01:21,12
7,2019-03-12 01:02:11,1
What I am trying to achieve is to set a batch number that increments once the moving sum of records crosses 15, at which point the moving sum resets as well; in other words, I am trying to create batches of records whose moving sum totals 15.
For example, once the moving sum reaches 15, the batch number should increment, which would give me groups of rows with a total value of 15.
So the output I am looking for is:
id,date,records, moving_sum,batch_number
1,2019-03-28 01:22:12,5,5,1
2,2019-03-29 01:23:23,5,10,1
3,2019-03-30 01:28:54,5,15,1
4,2019-03-28 01:12:21,2,2,2
5,2019-03-12 01:08:11,1,1,2
6,2019-03-28 01:01:21,2,12,2
7,2019-03-12 01:02:11,1,1,3
You need a recursive query for this:
with
    tab as (select t.*, row_number() over (order by id) rn from mytable t),
    cte as (
        select id,
               date,
               records,
               records moving_sum,
               1 batch_number,
               rn
        from tab
        where rn = 1
        union all
        select t.id,
               t.date,
               t.records,
               case when c.moving_sum + t.records > 15 then t.records else c.moving_sum + t.records end,
               case when c.moving_sum + t.records > 15 then c.batch_number + 1 else c.batch_number end,
               t.rn
        from cte c
        inner join tab t on t.rn = c.rn + 1
    )
select id, date, records, moving_sum, batch_number from cte order by id
The syntax for recursive common table expressions slightly varies across databases, so you might need to adapt that a little depending on your database.
Also note that if ids start at 1 and always increment without gaps, you don't actually need the common table expression tab, and you can replace rn with id in the second common table expression.
Demo on DB Fiddle:
| id | date       | records | moving_sum | batch_number |
|---:|:-----------|--------:|-----------:|-------------:|
|  1 | 2019-03-28 |       5 |          5 |            1 |
|  2 | 2019-03-29 |       5 |         10 |            1 |
|  3 | 2019-03-30 |       5 |         15 |            1 |
|  4 | 2019-03-28 |       2 |          2 |            2 |
|  5 | 2019-03-12 |       1 |          3 |            2 |
|  6 | 2019-03-28 |      12 |         15 |            2 |
|  7 | 2019-03-12 |       1 |          1 |            3 |
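Following up on the note about gapless ids, a sketch of that simplification, keeping the same table name and 15 threshold as above (add the RECURSIVE keyword after WITH where your database requires it):
with cte as (
    -- anchor: the first row starts batch 1
    select id, date, records, records as moving_sum, 1 as batch_number
    from mytable
    where id = 1
    union all
    -- recursive step: walk to the next id, resetting when the sum would exceed 15
    select t.id, t.date, t.records,
           case when c.moving_sum + t.records > 15 then t.records else c.moving_sum + t.records end,
           case when c.moving_sum + t.records > 15 then c.batch_number + 1 else c.batch_number end
    from cte c
    inner join mytable t on t.id = c.id + 1
)
select id, date, records, moving_sum, batch_number from cte order by id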

Partition & consecutive in SQL

fellow stackers
I have a data set like so:
+---------+------+--------+
| user_id | date | metric |
+---------+------+--------+
|       1 |    1 |      1 |
|       1 |    2 |      1 |
|       1 |    3 |      1 |
|       2 |    1 |      1 |
|       2 |    2 |      1 |
|       2 |    3 |      0 |
|       2 |    4 |      1 |
+---------+------+--------+
I am looking to flag those customers who have 3 consecutive "1"s in the metric column. I have a solution, shown below.
select distinct user_id
from (select user_id,
             metric +
             ifnull(lag(metric, 1) over (partition by user_id order by date), 0) +
             ifnull(lag(metric, 2) over (partition by user_id order by date), 0)
                 as consecutive_3
      from df
     ) b
where consecutive_3 = 3
While it works, it is not scalable: one can imagine what the above query would look like if I were looking for 50 consecutive values.
May I ask if there is a scalable solution? Any cloud SQL will do. Thank you.
If you only want such users, you can use a sum(). Assuming that metric is only 0 or 1:
select user_id,
       (case when max(metric_3) = 3 then 1 else 0 end) as flag_3
from (select df.*,
             sum(metric) over (partition by user_id
                               order by date
                               rows between 2 preceding and current row
                              ) as metric_3
      from df
     ) df
group by user_id;
By using a windowing clause, you can easily expand to as many adjacent 1s as you like.
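For instance, a sketch of the 50-in-a-row version against the same assumed table df; only the frame size and the target value change:
select user_id,
       (case when max(metric_50) = 50 then 1 else 0 end) as flag_50
from (select df.*,
             -- sum of the current row and the 49 preceding rows per user
             sum(metric) over (partition by user_id
                               order by date
                               rows between 49 preceding and current row
                              ) as metric_50
      from df
     ) df
group by user_id;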