Counting rows until where clause condition is satisfied - sql

I have a table with attributes like body, offer_id and created_at. With the rows in chronological order, I need to count, for each offer_id, the rows up to and including the first one whose 'body' satisfies my 'where' clause, e.g.
created at | offer id | body
---------------------------------------------
Jan | 12 | does not satisfy
Feb | 12 | does not satisfy
Mar | 12 | satisfies
Jan | 13 | does not satisfy
Feb | 13 | satisfies
Jan | 14 | does not satisfy
Feb | 14 | satisfies
Mar | 14 | does not satisfy
Apr | 14 | does not satisfy
Expected output:
offer_id | count
---------|------
12 | 3
13 | 2
14 | 2

First, you need to number every record within its offer_id partition:
select t.*, row_number() over (partition by t.offer_ID order by t.created_at) as rn
from t
it will result in something like:
created at | offer id | body | rn
---------------------------------------------
Jan | 12 | does not satisfy | 1
Feb | 12 | does not satisfy | 2
Mar | 12 | satisfies | 3
Jan | 13 | does not satisfy | 1
Feb | 13 | satisfies | 2
Jan | 14 | does not satisfy | 1
Feb | 14 | satisfies | 2
Mar | 14 | does not satisfy | 3
Apr | 14 | does not satisfy | 4
From this subquery you can get the minimal rn (the first record that satisfies the condition):
with sub as (
select t.*, row_number() over (partition by t.offer_ID order by t.created_at) as rn
from t)
select offer_ID, min(rn)
from sub
where (satisfies)
group by offer_ID
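To check the row_number() approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the ISO dates standing in for the month names, and the literal condition body = 'satisfies' are assumptions made for illustration; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question. The year 2020 is an assumption; ISO dates
# are used so that text ordering matches chronological ordering.
ROWS = [
    ("2020-01-01", 12, "does not satisfy"),
    ("2020-02-01", 12, "does not satisfy"),
    ("2020-03-01", 12, "satisfies"),
    ("2020-01-01", 13, "does not satisfy"),
    ("2020-02-01", 13, "satisfies"),
    ("2020-01-01", 14, "does not satisfy"),
    ("2020-02-01", 14, "satisfies"),
    ("2020-03-01", 14, "does not satisfy"),
    ("2020-04-01", 14, "does not satisfy"),
]

QUERY = """
with sub as (
    select t.*,
           row_number() over (partition by t.offer_id order by t.created_at) as rn
    from t
)
select offer_id, min(rn) as cnt
from sub
where body = 'satisfies'   -- stand-in for the real condition
group by offer_id
order by offer_id
"""

def rows_until_first_match():
    """Return (offer_id, count) pairs: rows up to and including the first match."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (created_at text, offer_id integer, body text)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(rows_until_first_match())  # [(12, 3), (13, 2), (14, 2)]
```

Note that an offer_id with no satisfying row does not appear in the result at all, because the where clause filters out all of its rows.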

straight as an arrow
select t.offer_id, count(*)
from mytable t
where not exists
(
select 1 from mytable tt
where tt.offer_id = t.offer_id
and tt.created_at < t.created_at
and tt.body = 'satisfies'
)
group by t.offer_id
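The NOT EXISTS query can be tried out the same way. The sketch below uses Python's sqlite3 module; the ISO dates and the extra offer_id 15 (which never satisfies the condition) are hypothetical additions, included to show how this variant behaves in that case.

```python
import sqlite3

# Question's sample data plus a hypothetical offer 15 that never satisfies.
ROWS = [
    ("2020-01-01", 12, "does not satisfy"),
    ("2020-02-01", 12, "does not satisfy"),
    ("2020-03-01", 12, "satisfies"),
    ("2020-01-01", 13, "does not satisfy"),
    ("2020-02-01", 13, "satisfies"),
    ("2020-01-01", 14, "does not satisfy"),
    ("2020-02-01", 14, "satisfies"),
    ("2020-03-01", 14, "does not satisfy"),
    ("2020-04-01", 14, "does not satisfy"),
    ("2020-01-01", 15, "does not satisfy"),
    ("2020-02-01", 15, "does not satisfy"),
]

QUERY = """
select t.offer_id, count(*) as cnt
from mytable t
where not exists (
    -- an earlier row of the same offer that already satisfied the condition
    select 1
    from mytable tt
    where tt.offer_id = t.offer_id
      and tt.created_at < t.created_at
      and tt.body = 'satisfies'
)
group by t.offer_id
order by t.offer_id
"""

def count_with_not_exists():
    """Count, per offer, the rows that have no earlier satisfying row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (created_at text, offer_id integer, body text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(count_with_not_exists())  # [(12, 3), (13, 2), (14, 2), (15, 2)]
```

Offer 15 is reported with all of its rows, since none of them is preceded by a match. Whether that is the desired behaviour depends on the use case.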

Have you tried something like this?
select count(*)
from mytable
where "satisfies"
Or, if you want to count only the different offer_id:
select count(distinct offer_id)
from mytable
where "satisfies"
Or, finally:
select count(offer_id)
from mytable
where "satisfies"
group by offer_id
Is this what you need? If not, give me more details! ;)

One way to count the rows that come before the first one satisfying the condition is to use a cumulative sum:
select offer_id, count(*)
from (select t.*,
sum(case when <condition> then 1 else 0 end) over
(partition by offer_id order by created_at) as num
from t
) t
where num = 0
group by offer_id;
However, this is one less than the number you want, because it leaves out the first row that satisfies the condition. So, instead:
select offer_id,
(sum(case when num = 0 then 1 else 0 end) +
max(case when num = 1 then 1 else 0 end)
)
from (select t.*,
sum(case when <condition> then 1 else 0 end) over
(partition by offer_id order by created_at) as num
from t
) t
where num in (0, 1)
group by offer_id;

If you just want the count per offer_id, you can use the following:
select offer_id, count(*) as count_1 from table_name
where <<your condition>>
group by offer_id
If my understanding is wrong, please share a detailed description of what exactly you require.

You can break the task in two parts:
For each offer ID find the record/date that first satisfies the condition.
Count all records per offer ID until that found record/date.
With a subquery in SELECT:
select
offer_id,
(
select count(*)
from mytable m
where m.offer_id = mfit.offer_id
and m.created_at <= min(mfit.created_at)
) as cnt
from mytable mfit
where <condition>
group by offer_id
or a subquery in FROM:
select
mfit.offer_id,
count(*) as cnt
from
(
select offer_id, min(created_at) as min_date
from mytable
where <condition>
group by offer_id
) mfit
join mytable m on m.offer_id = mfit.offer_id and m.created_at <= mfit.min_date
group by mfit.offer_id;
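As a quick check of the two-step idea, the join variant can be run against the question's sample data with Python's sqlite3 module. This is only a sketch: the ISO dates and the literal condition body = 'satisfies' are assumptions, and the join condition compares against the subquery's min_date column.

```python
import sqlite3

# Question's sample data; the year 2020 is an assumption.
ROWS = [
    ("2020-01-01", 12, "does not satisfy"),
    ("2020-02-01", 12, "does not satisfy"),
    ("2020-03-01", 12, "satisfies"),
    ("2020-01-01", 13, "does not satisfy"),
    ("2020-02-01", 13, "satisfies"),
    ("2020-01-01", 14, "does not satisfy"),
    ("2020-02-01", 14, "satisfies"),
    ("2020-03-01", 14, "does not satisfy"),
    ("2020-04-01", 14, "does not satisfy"),
]

QUERY = """
select mfit.offer_id, count(*) as cnt
from (
    -- step 1: first date per offer on which the condition holds
    select offer_id, min(created_at) as min_date
    from mytable
    where body = 'satisfies'
    group by offer_id
) mfit
join mytable m
  on m.offer_id = mfit.offer_id
 and m.created_at <= mfit.min_date   -- step 2: rows up to that date
group by mfit.offer_id
order by mfit.offer_id
"""

def count_until_first_fit():
    """Return (offer_id, count) using the derived-table join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (created_at text, offer_id integer, body text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(count_until_first_fit())  # [(12, 3), (13, 2), (14, 2)]
```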

Here is another query using an analytic function. Analytic functions have the advantage that you read the table just once and get different aggregations on the fly. The idea is to compute two running values per offer_id, ordered by created_at: s, a running total that adds one for each record matching your condition, and c, a running count of records. This looks as follows:
created at | offer id | body | s | c
---------------------------------------------------
Jan | 13 | does not satisfy | 0 | 1
Feb | 13 | satisfies | 1 | 2
Jan | 14 | does not satisfy | 0 | 1
Feb | 14 | satisfies | 1 | 2
Mar | 14 | does not satisfy | 1 | 3
Apr | 14 | does not satisfy | 1 | 4
May | 14 | satisfies | 2 | 5
Jun | 14 | does not satisfy | 2 | 6
Apr | 14 | does not satisfy | 2 | 7
May | 14 | satisfies | 3 | 8
So we are simply looking for the min(c) for s = 1.
select offer_id, min(c) as cnt
from
(
select
offer_id,
sum(case when <condition> then 1 else 0 end)
over (partition by offer_id order by created_at) as s,
count(*) over (partition by offer_id order by created_at) as c
from mytable
) data
where s = 1
group by offer_id
order by offer_id;
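The running-total query can be exercised with Python's sqlite3 module (SQLite 3.25 or newer for window functions). In this sketch the ISO dates, the literal condition body = 'satisfies', and the second satisfying row for offer 14 are assumptions; the extra row is there to show that only the first match (s = 1) determines the count.

```python
import sqlite3

# Question's sample data plus a hypothetical second match for offer 14.
ROWS = [
    ("2020-01-01", 12, "does not satisfy"),
    ("2020-02-01", 12, "does not satisfy"),
    ("2020-03-01", 12, "satisfies"),
    ("2020-01-01", 13, "does not satisfy"),
    ("2020-02-01", 13, "satisfies"),
    ("2020-01-01", 14, "does not satisfy"),
    ("2020-02-01", 14, "satisfies"),
    ("2020-03-01", 14, "does not satisfy"),
    ("2020-04-01", 14, "does not satisfy"),
    ("2020-05-01", 14, "satisfies"),
]

QUERY = """
select offer_id, min(c) as cnt
from (
    select offer_id,
           -- s: how many matching rows have been seen so far
           sum(case when body = 'satisfies' then 1 else 0 end)
               over (partition by offer_id order by created_at) as s,
           -- c: how many rows have been seen so far
           count(*) over (partition by offer_id order by created_at) as c
    from mytable
) data
where s = 1
group by offer_id
order by offer_id
"""

def count_with_running_total():
    """Return (offer_id, count) where count is the position of the first match."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (created_at text, offer_id integer, body text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(count_with_running_total())  # [(12, 3), (13, 2), (14, 2)]
```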

Related

Get some values from the table by selecting

I have a table:
| id | Number |Address
| -----| ------------|-----------
| 1 | 0 | NULL
| 1 | 1 | NULL
| 1 | 2 | 50
| 1 | 3 | NULL
| 2 | 0 | 10
| 3 | 1 | 30
| 3 | 2 | 20
| 3 | 3 | 20
| 4 | 0 | 75
| 4 | 1 | 22
| 4 | 2 | 30
| 5 | 0 | NULL
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address changed and then changed back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
These values are the same only for the rows in the most recent run of the same address, that is, the rows after the last address change. Then aggregation pulls back the earliest row in this group.
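To see the two row numbers at work on the question's data, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or newer). The table name t follows the answer's query; NULL addresses are inserted as Python None, and the sketch relies on NULLs landing in the same window partition.

```python
import sqlite3

# (id, number, address) rows from the question; None stands for NULL.
ROWS = [
    (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
    (2, 0, 10),
    (3, 1, 30), (3, 2, 20), (3, 3, 20),
    (4, 0, 75), (4, 1, 22), (4, 2, 30),
    (5, 0, None),
]

QUERY = """
select t.id, min(number) as last_change
from (
    select t.*,
           row_number() over (partition by id order by number desc) as seqnum1,
           row_number() over (partition by id, address order by number desc) as seqnum2
    from t
) t
where seqnum1 = seqnum2   -- rows in the trailing run of the latest address
group by id
order by id
"""

def last_address_change():
    """Return (id, number) of the row where the current address started."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, number integer, address integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(last_address_change())  # [(1, 3), (2, 0), (3, 2), (4, 2), (5, 0)]
```

For id=1 the address goes NULL, NULL, 50, NULL, so only number 3 has equal sequence numbers; for id=3 the last two rows share address 20, so the minimum of that run, 2, is returned.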
I answered my own question. In case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

How to transform a range of records to the values of the record after that range in SQL?

I am trying to replace some bad input records within a specific date range with correct records. However, I'm not sure if there is an efficient way to do so. Therefore my question is how to transform a (static) range of records to the values of the record after that range in SQL? Below you will find an example to clarify what I try to achieve.
In this example you can see that customer number 1 belongs to group number 0 (None) in the period from 25-06-2020 to 29-06-2020. From 30-06-2020 to 05-07-2020 this group number changes from 0 to 11 for customer number 1. This static period contains the wrong records, and should be changed to the values that are valid on 06-07-2020 (group number == 10). Is there a way to do this?
If I understand correctly, you can use window functions to get the data on that particular date and case logic to assign it to the specific date range:
select t.*,
(case when date >= '2020-07-01' and date <= '2020-07-05'
then max(case when date = '2020-07-06' then group_number end) over (partition by customer_number)
else group_number
end) as imputed_group_number,
(case when date >= '2020-07-01' and date <= '2020-07-05'
then max(case when date = '2020-07-06' then role end) over (partition by customer_number)
else role
end) as imputed_role
from t;
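The conditional window aggregate can be tried on a handful of rows with Python's sqlite3 module (SQLite 3.25 or newer). This is a sketch under assumptions: the rows are the small subset shown in the result table further down in this thread, and the range bounds and reference date are passed as named parameters. The bounds used here follow the period stated in the question (30-06-2020 to 05-07-2020).

```python
import sqlite3

# (date, customer_number, group_number, role); 'None' is a role name here.
ROWS = [
    ("2020-06-25", 1, 0, "None"),
    ("2020-06-26", 1, 0, "None"),
    ("2020-06-27", 1, 0, "None"),
    ("2020-06-28", 1, 0, "None"),
    ("2020-06-29", 1, 0, "None"),
    ("2020-06-30", 1, 11, "Leader"),
    ("2020-07-01", 1, 11, "Leader"),
    ("2020-07-06", 1, 10, "Member"),
]

QUERY = """
select date, customer_number,
       case when date >= :range_start and date <= :range_end
            -- value of the customer's row on the reference date
            then max(case when date = :ref_date then group_number end)
                 over (partition by customer_number)
            else group_number
       end as imputed_group_number,
       case when date >= :range_start and date <= :range_end
            then max(case when date = :ref_date then role end)
                 over (partition by customer_number)
            else role
       end as imputed_role
from t
order by customer_number, date
"""

def impute(range_start, range_end, ref_date):
    """Replace values inside [range_start, range_end] with those on ref_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (date text, customer_number integer, "
        "group_number integer, role text)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
    params = {"range_start": range_start, "range_end": range_end, "ref_date": ref_date}
    result = conn.execute(QUERY, params).fetchall()
    conn.close()
    return result

for row in impute("2020-06-30", "2020-07-05", "2020-07-06"):
    print(row)
```

With these parameters the two 'Leader' rows come back with group number 10 and role 'Member', while the rows before 30 June keep their original values.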
If you want to update the values, you can use JOIN:
update t
set group_number = tt.group_number,
role = tt.role
from t tt
where tt.customer_number = t.customer_number and tt.date = '2020-07-06'
and t.date >= '2020-07-01' and t.date <= '2020-07-05'
I think that window function first_value() does what you want:
select
date,
customer_number,
first_value(group_number) over(partition by customer_number order by date) group_number,
first_value(role) over(partition by customer_number order by date) role
from mytable
You can do the following as an example. Here I have chosen the criterion that a row with role='Leader' is a bad record, so it gets the next available group_number and role; these are shown in the columns group_number1 and role1.
I have used a smaller subset of the rows you have in your excel example.
select date1
,customer_number
,group_number
,case when role='Leader' then
(select t1.group_number
from t t1
where t1.date1>t.date1
and t1.role<>'Leader'
order by t1.date1 asc
limit 1
)
else group_number
end as group_number1
,role
,case when role='Leader' then
(select t1.role
from t t1
where t1.date1>t.date1
and t1.role<>'Leader'
order by t1.date1 asc
limit 1
)
else role
end as role1
from t
order by 1
+------------+-----------------+--------------+---------------+--------+--------+
| DATE1 | CUSTOMER_NUMBER | GROUP_NUMBER | GROUP_NUMBER1 | ROLE | ROLE1 |
+------------+-----------------+--------------+---------------+--------+--------+
| 2020-06-25 | 1 | 0 | 0 | None | None |
| 2020-06-26 | 1 | 0 | 0 | None | None |
| 2020-06-27 | 1 | 0 | 0 | None | None |
| 2020-06-28 | 1 | 0 | 0 | None | None |
| 2020-06-29 | 1 | 0 | 0 | None | None |
| 2020-06-30 | 1 | 11 | 10 | Leader | Member |
| 2020-07-01 | 1 | 11 | 10 | Leader | Member |
| 2020-07-06 | 1 | 10 | 10 | Member | Member |
+------------+-----------------+--------------+---------------+--------+--------+
db fiddle link
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=c95d12ced067c1df94947848b5a94c14

Partition & consecutive in SQL

fellow stackers
I have a data set like so:
+---------+------+--------+
| user_id | date | metric |
+---------+------+--------+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 0 |
| 2 | 4 | 1 |
+---------+------+--------+
I am looking to flag those customers who have 3 consecutive "1"s in the metric column. I have a solution as below.
select distinct user_id
from (
select user_id
,metric +
ifnull( lag(metric, 1) OVER (PARTITION BY user_id ORDER BY date), 0 ) +
ifnull( lag(metric, 2) OVER (PARTITION BY user_id ORDER BY date), 0 )
as consecutive_3
from df
) b
where consecutive_3 = 3
While it works, it is not scalable: imagine what the above query would look like if I were looking for 50 consecutive 1s.
May I ask if there is a scalable solution? Any cloud SQL will do. Thank you.
If you only want such users, you can use a sum(). Assuming that metric is only 0 or 1:
select user_id,
(case when max(metric_3) = 3 then 1 else 0 end) as flag_3
from (select df.*,
sum(metric) over (partition by user_id
order by date
rows between 2 preceding and current row
) as metric_3
from df
) df
group by user_id;
By using a windowing clause, you can easily expand to as many adjacent 1s as you like.
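To illustrate how the window frame scales with the run length, here is a sketch using Python's sqlite3 module (SQLite 3.25 or newer). The function name users_with_run and its parameter n are hypothetical. The frame size is formatted into the SQL text after being validated as a positive integer, and metric is assumed to be only 0 or 1.

```python
import sqlite3

# (user_id, date, metric) rows from the question.
ROWS = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1),
    (2, 1, 1), (2, 2, 1), (2, 3, 0), (2, 4, 1),
]

def users_with_run(n):
    """Return (user_id, flag) where flag is 1 if the user has n consecutive 1s."""
    n = int(n)
    if n < 1:
        raise ValueError("n must be a positive integer")
    # The frame covers the current row and the n - 1 rows before it, so the
    # windowed sum equals n only when all n rows have metric = 1.
    query = f"""
    select user_id,
           case when max(metric_n) = {n} then 1 else 0 end as flag
    from (
        select df.*,
               sum(metric) over (
                   partition by user_id
                   order by date
                   rows between {n - 1} preceding and current row
               ) as metric_n
        from df
    ) d
    group by user_id
    order by user_id
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table df (user_id integer, date integer, metric integer)")
    conn.executemany("insert into df values (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(users_with_run(3))  # [(1, 1), (2, 0)]
print(users_with_run(2))  # [(1, 1), (2, 1)]
```

Only the frame bound and the comparison value change with n, so looking for 50 consecutive 1s is the same query with a different number.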

Group by similar number SQL (oracle sql)

I would like to find the number of sales that have a similar purchase value from the following table:
sale_number | value
------------+-------
1 | 10
2 | 11
3 | 21
4 | 30
A vanilla group by statement groups by exact value:
select count(sale_number), value from table group by value
Would give:
count(sale_number) | value
------------+-------
1 | 10
1 | 11
1 | 21
1 | 30
Is it possible to group by inexact numbers with a threshold (say +/- 10%)? Giving the desired result:
count(sale_number) | value
------------+-------
2 | 10
2 | 11
1 | 21
1 | 30
You can do what you want with a correlated subquery:
select t.*,
(select count(*)
from t t2
where t2.value >= t.value * 0.9 and
t2.value <= t.value * 1.1
) as cnt
from t;
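The correlated subquery can be checked against the question's four sales with Python's sqlite3 module. This is a minimal sketch; the table name t follows the answer, and the 10% threshold is hard-coded as in the query above.

```python
import sqlite3

# (sale_number, value) rows from the question.
ROWS = [(1, 10), (2, 11), (3, 21), (4, 30)]

QUERY = """
select t.sale_number, t.value,
       (select count(*)
        from t t2
        where t2.value >= t.value * 0.9   -- within 10% below
          and t2.value <= t.value * 1.1   -- within 10% above
       ) as cnt
from t
order by t.sale_number
"""

def similar_value_counts():
    """Return (sale_number, value, cnt): sales within +/- 10% of each value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (sale_number integer, value integer)")
    conn.executemany("insert into t values (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(similar_value_counts())  # [(1, 10, 2), (2, 11, 2), (3, 21, 1), (4, 30, 1)]
```

Each row is compared against its own +/- 10% band, so the relation is not necessarily symmetric: a larger value can fall inside a smaller value's band without the reverse being true.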

Select records from database in specific date period

Ok I have this example table:
+-------+--------+-----------+
| users | groups | startDate |
+-------+--------+-----------+
| Foo | A | 1 Aug 18 |
| Foo | B | 1 Jan 18 |
| Boo | C | 1 Jan 18 |
| Doo | B | 1 Jan 18 |
| Loo | B | 1 Sep 18 |
+-------+--------+-----------+
and I want to select the users whose group is B as of today: their group B record has a "startDate" not later than today, and they have no record for another group with a more recent "startDate" that is also not later than today. So the correct result should be:
+-------+--------+-----------+
| users | groups | startDate |
+-------+--------+-----------+
| Doo | B | 1 Jan 18 |
+-------+--------+-----------+
I tried the following code but didn't get what I need:
DECLARE @StartDate date = '2018-08-01';
DECLARE @GroupID varchar(1) = 'B';
WITH CurrentUsers AS (
SELECT users, groups, startDate,
ROW_NUMBER() OVER (PARTITION BY users
ORDER BY CASE WHEN startDate > @StartDate THEN 0 ELSE 1 END,
ABS(DATEDIFF(DAY, @StartDate, startDate)) ASC
) AS RowNum
FROM usersTable
)
SELECT users
FROM CurrentUsers
WHERE groups = @GroupID AND RowNum = 1
If I understand correctly, you seem to want:
select user
from currentusers cu
group by user
having sum(case when groups = @GroupID then 1 else 0 end) > 0 and -- in Group B
max(startdate) < @StartDate;
EDIT:
The above is based on a misunderstanding. You want the people whose active group as of today is the given group. I think you want:
WITH CurrentUsers AS (
SELECT users, groups, startDate,
ROW_NUMBER() OVER (PARTITION BY users
ORDER BY startDate DESC
) as seqnum
FROM usersTable
WHERE startDate <= @StartDate
)
SELECT users
FROM CurrentUsers
WHERE groups = @GroupID AND seqnum = 1;
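The "latest record as of a date" idea can be tried with Python's sqlite3 module (SQLite 3.25 or newer). In this sketch the table and column names (users_table, user_name, group_name, start_date) are hypothetical renames of the ones in the question, the dates are written in ISO form, and the date and group are passed as named parameters in place of the T-SQL variables.

```python
import sqlite3

# (user_name, group_name, start_date) rows from the question, ISO dates.
ROWS = [
    ("Foo", "A", "2018-08-01"),
    ("Foo", "B", "2018-01-01"),
    ("Boo", "C", "2018-01-01"),
    ("Doo", "B", "2018-01-01"),
    ("Loo", "B", "2018-09-01"),
]

QUERY = """
with current_users as (
    select user_name, group_name, start_date,
           -- seqnum = 1 marks each user's most recent record as of :as_of
           row_number() over (partition by user_name
                              order by start_date desc) as seqnum
    from users_table
    where start_date <= :as_of
)
select user_name
from current_users
where group_name = :group_name and seqnum = 1
order by user_name
"""

def users_in_group(as_of, group_name):
    """Return users whose latest record on or before as_of is in group_name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table users_table (user_name text, group_name text, start_date text)"
    )
    conn.executemany("insert into users_table values (?, ?, ?)", ROWS)
    params = {"as_of": as_of, "group_name": group_name}
    result = [row[0] for row in conn.execute(QUERY, params)]
    conn.close()
    return result

print(users_in_group("2018-08-01", "B"))  # ['Doo']
```

Foo is excluded because the group A record from 1 Aug 18 is more recent than the group B record, and Loo is excluded because the group B record only starts on 1 Sep 18.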