SQL - Calculate number of occurrences of previous day?

For each day, I want to count the number of people who also had an occurrence on the previous day, but I'm not sure how to do this.
Sample Table:
+----+-----------+
| ID | Date      |
+----+-----------+
|  1 | 1/10/2020 |
|  1 | 1/11/2020 |
|  2 | 2/20/2020 |
|  3 | 2/20/2020 |
|  3 | 2/21/2020 |
|  4 | 2/23/2020 |
|  4 | 2/24/2020 |
|  5 | 2/22/2020 |
|  5 | 2/23/2020 |
|  5 | 2/24/2020 |
+----+-----------+
Desired Output:
+-----------+-------+
| Date      | Count |
+-----------+-------+
| 1/11/2020 |     1 |
| 2/21/2020 |     1 |
| 2/23/2020 |     1 |
| 2/24/2020 |     2 |
+-----------+-------+
Edit: Added desired output. The count should be of unique IDs, not of date occurrences; i.e., ID 5 could appear on this list 10 times for dates 2/23/2020 and 2/24/2020, but it would still count as "1" per date.

Use lag():
select date, count(*)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
      from t
     ) t
where prev_date = dateadd(day, -1, date)
group by date;
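Given the question's edit (each ID should count only once per day even if it has duplicate rows), a hedged variant counts distinct IDs instead; note that dateadd() is SQL Server syntax, so other databases would use their own date arithmetic (e.g. date - interval '1' day):
select date, count(distinct id)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
      from t
     ) t
where prev_date = dateadd(day, -1, date)
group by date;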

Related

How do I find all “gaps” with predefined minimal gap size with SQL?

I read a lot of good answers (here, here, here) about finding gaps, but I still can't figure out how to find gaps with a minimal predefined size.
In my case, gaps are entries with no name, ordered by HE.
I also need to find gaps starting at the beginning of the table, as in the example.
Can anyone help with a nice and clean SQL statement that can easily be altered for different minimal gap sizes?
Example with expected output:
+-----------+----+   +----------------+   +----------------+   +----------------+
| name      | HE |   |   GAPS >= 1    |   |   GAPS >= 2    |   |   GAPS >= 3    |
+-----------+----+   +-----------+----+   +-----------+----+   +-----------+----+
|           |  1 |   | name      | HE |   | name      | HE |   | name      | HE |
| JohnDoe01 |  2 |   +-----------+----+   +-----------+----+   +-----------+----+
| JohnDoe02 |  3 |   |           |  1 |   |           |  4 |   |           | 12 |
|           |  4 |   |           |  4 |   |           |  5 |   |           | 13 |
|           |  5 |   |           |  5 |   |           |  9 |   |           | 14 |
| JohnDoe03 |  6 |   |           |  9 |   |           | 10 |   +-----------+----+
| JohnDoe04 |  7 |   |           | 10 |   |           | 12 |
| JohnDoe05 |  8 |   |           | 12 |   |           | 13 |
|           |  9 |   |           | 13 |   |           | 14 |
|           | 10 |   |           | 14 |   +-----------+----+
| JohnDoe06 | 11 |   +-----------+----+
|           | 12 |
|           | 13 |
|           | 14 |
| JohnDoe07 | 15 |
+-----------+----+
You can identify the gaps along with their starts and stops. To group the gap rows, take a running count of the non-gap rows and aggregate by it:
select min(he), max(he), count(*) as size
from (select t.*, count(name) over (order by he) as grp
      from t
     ) t
where name is null
group by grp;
You can then filter using having for gaps of a certain size, say 2:
having count(*) >= 2
for instance.
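Putting the pieces together, the full query for gaps of at least 2 rows would be:
select min(he), max(he), count(*) as size
from (select t.*, count(name) over (order by he) as grp
      from t
     ) t
where name is null
group by grp
having count(*) >= 2;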
This summarizes the gaps, one gap per row. That actually seems more useful to me than a separate row for each missing entry.
EDIT:
If you actually wanted the original rows, you could do:
select t.*
from (select t.*,
             max(he) filter (where name is not null) over (order by he) as prev_he,
             min(he) filter (where name is not null) over (order by he desc) as next_he,
             max(he) over () as max_he
      from t
     ) t
where name is null and
      (coalesce(next_he, max_he + 1) - coalesce(prev_he, 0) - 1) >= 2;
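(Note: the filter clause on window functions is PostgreSQL syntax; in databases without it, the same effect comes from a conditional aggregate such as max(case when name is not null then he end) over (order by he).)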
EDIT II:
In older versions of MySQL/MariaDB, you can use user variables instead:
select min(he), max(he), count(*) as size
from (select t.*,
             (@grp := @grp + (name is not null)) as grp
      from (select t.* from t order by he) t cross join
           (select @grp := 0) params
     ) t
where name is null
group by grp;
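In MySQL 8+ and MariaDB 10.2+, window functions are supported, so the count(name) over (order by he) version above works directly.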

SQL group by changing column

Suppose I have a table sorted by date, like so:
+-------------+--------+
| DATE        | VALUE  |
+-------------+--------+
| 01-09-2020  |      5 |
| 01-15-2020  |      5 |
| 01-17-2020  |      5 |
| 02-03-2020  |      8 |
| 02-13-2020  |      8 |
| 02-20-2020  |      8 |
| 02-23-2020  |      5 |
| 02-25-2020  |      5 |
| 02-28-2020  |      3 |
| 03-13-2020  |      3 |
| 03-18-2020  |      3 |
+-------------+--------+
I want to group by changes in value over that date range, and add a column whose value increments at each change to denote the group.
I have tried a number of different things, such as using the lag function:
SELECT value, value - lag(value) over (order by date) as count
GROUP BY value
In short, I want to take the table above and have it look like:
+-------------+--------+-------+
| DATE        | VALUE  | COUNT |
+-------------+--------+-------+
| 01-09-2020  |      5 |     1 |
| 01-15-2020  |      5 |     1 |
| 01-17-2020  |      5 |     1 |
| 02-03-2020  |      8 |     2 |
| 02-13-2020  |      8 |     2 |
| 02-20-2020  |      8 |     2 |
| 02-23-2020  |      5 |     3 |
| 02-25-2020  |      5 |     3 |
| 02-28-2020  |      3 |     4 |
| 03-13-2020  |      3 |     4 |
| 03-18-2020  |      3 |     4 |
+-------------+--------+-------+
Eventually, I want it all in one small table, with the earliest date for each group:
+-------------+--------+-------+
| DATE        | VALUE  | COUNT |
+-------------+--------+-------+
| 01-09-2020  |      5 |     1 |
| 02-03-2020  |      8 |     2 |
| 02-23-2020  |      5 |     3 |
| 02-28-2020  |      3 |     4 |
+-------------+--------+-------+
Any help would be much appreciated.
You can use a combination of the Row_number and Dense_rank functions to get the required results, like below:
;with cte as
(
    select t.DATE, t.VALUE,
           Dense_rank() over(partition by t.VALUE order by t.DATE) as d_rank,
           Row_number() over(partition by t.VALUE order by t.DATE) as r_num
    from table t
)
select cte.DATE, cte.VALUE, d_rank as count
from cte
where r_num = 1
You can use lag() and a cumulative sum in a subquery:
SELECT date, value,
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) as count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
      FROM t
     ) t
Here is a db<>fiddle.
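To get to the final compact table (earliest date per group), one option, sketched here by reusing the query above as a derived table, is to group by the running count:
SELECT MIN(date) as date, MIN(value) as value, count
FROM (SELECT date, value,
             SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) as count
      FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
            FROM t
           ) t
     ) t
GROUP BY count
ORDER BY count;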
You can use the lag() and row_number() analytic functions, filtering on an inequality between the current value and the lagged value:
WITH t2 AS
(
    SELECT LAG(value, 1, value - 1) OVER (ORDER BY date) as lg,
           t.*
    FROM t
)
SELECT t2.date, t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
FROM t2
WHERE value - lg != 0
Demo

SQL calculating sum and number of distinct values within group

I want to calculate
(1) total sales amount
(2) number of distinct stores per product
in one query, if possible. Suppose we have data:
+-----------+---------+-------+--------+
| store     | product | month | amount |
+-----------+---------+-------+--------+
| Anthill   | A       |     1 |      1 |
| Anthill   | A       |     2 |      1 |
| Anthill   | A       |     3 |      1 |
| Beetle    | A       |     1 |      1 |
| Beetle    | A       |     3 |      1 |
| Cockroach | A       |     1 |      1 |
| Cockroach | A       |     2 |      1 |
| Cockroach | A       |     3 |      1 |
| Anthill   | B       |     1 |      1 |
| Beetle    | B       |     2 |      1 |
| Cockroach | B       |     3 |      1 |
+-----------+---------+-------+--------+
I have tried this with no luck:
select
     [product]
    ,[month]
    ,[amount]
    ,cnt_distinct_stores = count(distinct(stores))
from dbo.temp
group by
     [product]
    ,[month]
order by 1,2
Would it be possible to combine the GROUP BY clause with window functions, like SUM(amount) OVER (PARTITION BY [product], [month] ORDER BY [month] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?
Try
SELECT product,
       SUM(amount),
       COUNT(DISTINCT store)
FROM dbo.temp
GROUP BY product
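If per-month rows are also wanted, as the attempted query suggests, a hedged sketch combines GROUP BY with a window function over the grouped result; COUNT(DISTINCT ...) is not supported as a window function in SQL Server, so the distinct-store count here is per product and month, while the windowed sum repeats the product total on each row:
SELECT product,
       month,
       SUM(amount) as month_amount,
       COUNT(DISTINCT store) as month_stores,
       SUM(SUM(amount)) OVER (PARTITION BY product) as product_amount
FROM dbo.temp
GROUP BY product, month
ORDER BY product, month;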

How to partition by a customized sum value?

I have a table with the following columns: customer_id, event_date_time
I'd like to figure out how many times a customer triggers an event every 12 hours from the start of an event. In other words, aggregate the time between events for up to 12 hours by customer.
For example, if a customer triggers an event (in order) at noon, 1:30pm, 5pm, 2am, and 3pm, I would want to return the noon, 2am, and 3pm records.
I've written this query:
select
cust_id,
event_datetime,
nvl(24*(event_datetime - lag(event_datetime) over (partition BY cust_id ORDER BY event_datetime)),0) as difference
from
tbl
I feel like I'm close with this. Is there a way to add something like
over (partition BY cust_id, sum(difference)<12 ORDER BY event_datetime)
EDIT: I'm adding some sample data:
+---------+-----------------+-------------+---+
| cust_id | event_datetime  | DIFFERENCE  | X |
+---------+-----------------+-------------+---+
|       1 | 6/20/2015 23:35 | 0           | x |
|       1 | 6/21/2015 0:09  | 0.558611111 |   |
|       1 | 6/21/2015 0:49  | 0.667777778 |   |
|       1 | 6/21/2015 1:30  | 0.688333333 |   |
|       1 | 6/21/2015 9:38  | 8.133055556 |   |
|       1 | 6/21/2015 10:09 | 0.511111111 |   |
|       1 | 6/21/2015 10:45 | 0.600555556 |   |
|       1 | 6/21/2015 11:09 | 0.411111111 |   |
|       1 | 6/21/2015 11:32 | 0.381666667 |   |
|       1 | 6/21/2015 11:55 | 0.385       | x |
|       1 | 6/21/2015 12:18 | 0.383055556 |   |
|       1 | 6/21/2015 12:23 | 0.074444444 |   |
|       1 | 6/22/2015 10:01 | 21.63527778 | x |
|       1 | 6/22/2015 10:24 | 0.380555556 |   |
|       1 | 6/22/2015 10:46 | 0.373611111 |   |
+---------+-----------------+-------------+---+
The "x" are the records that should be pulled since they're the first records in the 12 hour block.
If I understand correctly, you want the first record in each 12-hour block, where the blocks of time are defined by the first event time.
If so, you need to modify your query to get the difference from the first time for each customer. The rest is just arithmetic. The query would look something like this:
with t as (
      select cust_id, event_datetime,
             24 * (event_datetime -
                   min(event_datetime) over (partition by cust_id)
                  ) as difference
      from tbl
     )
select t.*
from (select t.*,
             row_number() over (partition by cust_id, floor(difference / 12)
                                order by difference) as seqnum
      from t
     ) t
where seqnum = 1;
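Checking this against the sample data: 6/20/2015 23:35 has difference 0 (block 0), 6/21/2015 11:55 is about 12.33 hours after the first event (block 1), and 6/22/2015 10:01 is about 34.43 hours after it (block 2), so exactly the three "x" rows get seqnum = 1.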

Oracle rank function issue

I am experiencing an issue with Oracle analytic functions.
I want the rank in Oracle to be displayed sequentially, but in a cyclic fashion, and this ranking should happen within a group.
Say I have 10 groups.
In each of the 10 groups, the rows must be ranked 1 through 9. Past 9, the rank value must start again from 1 and continue for however many rows there are.
+--------+------------+------------+------+
| emp id | date1      | date2      | Rank |
+--------+------------+------------+------+
| 123    | 13/6/2012  | 13/8/2021  |    1 |
| 123    | 14/2/2012  | 12/8/2014  |    2 |
| ...    | ...        | ...        |  ... |
| 123    | 9/10/2013  | 12/12/2015 |    9 |
| 123    | 16/10/2013 | 15/10/2013 |    1 |
| 123    | 16/3/2014  | 15/9/2015  |    2 |
+--------+------------+------------+------+
In the above example, for the group of rows for emp id 123, the rank splits into two subgroups: the first rows run sequentially from 1 to 9, and for the rest of the rows the rank starts again from 1. How can I achieve this with Oracle's rank functions?
As per the suggestion from Egor Skriptunoff above:
select
    empid, date1, date2,
    row_number() over (order by date1, date2) as rn,
    mod(row_number() over (order by date1, date2) - 1, 9) + 1 as ranked
from yourtable
Example result:
| empid | date1                | date2                | rn | ranked |
|-------|----------------------|----------------------|----|--------|
| 72232 | 2016-10-26T00:00:00Z | 2017-03-07T00:00:00Z | 1 | 1 |
| 04365 | 2016-11-03T00:00:00Z | 2017-07-29T00:00:00Z | 2 | 2 |
| 79203 | 2016-12-15T00:00:00Z | 2017-05-16T00:00:00Z | 3 | 3 |
| 68638 | 2016-12-18T00:00:00Z | 2017-02-08T00:00:00Z | 4 | 4 |
| 75784 | 2016-12-24T00:00:00Z | 2017-11-18T00:00:00Z | 5 | 5 |
| 72836 | 2016-12-24T00:00:00Z | 2018-09-10T00:00:00Z | 6 | 6 |
| 03679 | 2017-01-24T00:00:00Z | 2017-10-14T00:00:00Z | 7 | 7 |
| 43527 | 2017-02-12T00:00:00Z | 2017-01-15T00:00:00Z | 8 | 8 |
| 03138 | 2017-02-26T00:00:00Z | 2017-01-30T00:00:00Z | 9 | 9 |
| 89758 | 2017-03-29T00:00:00Z | 2018-04-12T00:00:00Z | 10 | 1 |
| 86377 | 2017-04-14T00:00:00Z | 2018-10-07T00:00:00Z | 11 | 2 |
| 49169 | 2017-04-28T00:00:00Z | 2017-04-21T00:00:00Z | 12 | 3 |
| 45523 | 2017-05-03T00:00:00Z | 2017-05-07T00:00:00Z | 13 | 4 |
SQL Fiddle
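Since the question asks for the cycle to restart within each emp id group, a hedged variant partitions the row numbering by employee:
select
    empid, date1, date2,
    mod(row_number() over (partition by empid order by date1, date2) - 1, 9) + 1 as ranked
from yourtable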