SQL - How can I count distinct IDs for each day within the last 7 days? - sql

So, I'm trying to get the number of distinct users that registered sales in the last 7 days, for each day. Here's a sample of the table I have:
ID Date
1 2018-01-01
2 2018-01-02
3 2018-01-03
3 2018-01-04
2 2018-01-05
4 2018-01-06
5 2018-01-07
2 2018-01-08
Here's the outcome that I'd expect:
Distinct IDs Date
1 2018-01-01
2 2018-01-02
3 2018-01-03
3 2018-01-04
3 2018-01-05
4 2018-01-06
5 2018-01-07
4 2018-01-08
It's as if I was counting the distinct IDs in groups made up of each date and the 6 days before it. Any ideas?

Probably the simplest method is to use a correlated subquery:
select t.date,
(select count(distinct t2.id)
from t t2
where t2.date >= t.date - interval '6 day' and t2.date <= t.date
) as uniques_7day
from (select distinct date
from t
) t;
Note that you haven't specified the database, so this uses ANSI/ISO standard SQL syntax.
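To try the correlated subquery against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The column name sale_date and the function name rolling_distinct_ids are placeholders of mine, and since SQLite has no INTERVAL literal, date(x, '-6 days') stands in for the interval arithmetic.

```python
import sqlite3

# Sample rows from the question: (id, sale_date).
SALES = [
    (1, "2018-01-01"), (2, "2018-01-02"), (3, "2018-01-03"), (3, "2018-01-04"),
    (2, "2018-01-05"), (4, "2018-01-06"), (5, "2018-01-07"), (2, "2018-01-08"),
]


def rolling_distinct_ids(rows, window_days=7):
    """Return (date, distinct ids in the window ending on that date) per date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, sale_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # A 7-day window is the current day plus the 6 days before it.
    modifier = f"-{window_days - 1} days"
    result = conn.execute(
        """
        SELECT d.sale_date,
               (SELECT COUNT(DISTINCT t2.id)
                FROM t AS t2
                WHERE t2.sale_date >= date(d.sale_date, ?)
                  AND t2.sale_date <= d.sale_date) AS uniques_7day
        FROM (SELECT DISTINCT sale_date FROM t) AS d
        ORDER BY d.sale_date
        """,
        (modifier,),
    ).fetchall()
    conn.close()
    return result


for day, uniques in rolling_distinct_ids(SALES):
    print(day, uniques)
```

On the sample rows this reproduces the counts listed in the question (1, 2, 3, 3, 3, 4, 5, 4).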

Related

Oracle SQL - Select users between two date by month

I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of. | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190630','yyyymmdd') between a.start_date and nvl (a.end_date, '31-dec-9999')
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190531','yyyymmdd') between a.start_date and nvl (a.end_date, '31-dec-9999')
union all
...
Not very optimized, and not sustainable if I want to cover 10 years, i.e. 120 months, lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_NUMBER() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
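As a rough illustration of that gap-filling step in application code, here is a small Python sketch. The function name fill_month_gaps and the ((year, month), count) row shape are my own assumptions, not something the query returns in exactly that form.

```python
def fill_month_gaps(changes, last_month):
    """Repeat the latest known count for every month without its own row.

    changes: non-empty list of ((year, month), active_count), sorted by month.
    last_month: (year, month) at which to stop, inclusive.
    """
    known = dict(changes)
    year, month = changes[0][0]
    count = 0
    filled = []
    while (year, month) <= last_month:
        # Use the row for this month if there is one, else carry the last count.
        count = known.get((year, month), count)
        filled.append(((year, month), count))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return filled


sparse = [((2019, 1), 3), ((2019, 3), 2), ((2019, 6), 1)]
for (year, month), count in fill_month_gaps(sparse, (2019, 7)):
    print(f"{year:04d}-{month:02d}", count)
```

Passing an explicit last_month keeps the far-off 9999-12 row from driving the loop.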
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
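For readers who want to check the +1/-1 idea without a database, the same steps (emit two events per row, sort, keep a running sum, take the last value in each month) can be sketched in plain Python. The names FAR_FUTURE and month_end_counts are illustrative only.

```python
from datetime import date
from itertools import groupby

# Stand-in for a NULL end date, mirroring the 9999-12-31 used in the query.
FAR_FUTURE = date(9999, 12, 31)

# (start_date, end_date) pairs from the question; None means still active.
USERS = [
    (date(2018, 1, 1), date(2019, 1, 1)),
    (date(2018, 1, 1), None),
    (date(2019, 1, 1), date(2019, 6, 1)),
    (date(2017, 1, 1), date(2019, 3, 1)),
]


def month_end_counts(intervals):
    """Return (YYYY-MM, active count) for each month in which the count changed."""
    events = []
    for start, end in intervals:
        events.append((start, 1))                # user becomes active
        events.append((end or FAR_FUTURE, -1))   # user stops being active
    events.sort()

    running = 0
    result = []
    # Events are sorted by date, so rows of the same month are adjacent.
    for (year, month), group in groupby(events, key=lambda e: (e[0].year, e[0].month)):
        for _, change in group:
            running += change
        # After consuming the whole month, `running` is the month's last value.
        result.append((f"{year:04d}-{month:02d}", running))
    return result


for label, count in month_end_counts(USERS):
    print(label, count)
```

On the four sample rows this gives the same At-end-of-month / Active-count pairs as the table above.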
I have created a query that returns a result for every month, from the minimum start date in the table to the maximum end date.
You can restrict the range by adding a condition in the WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE);
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
);
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!
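The calendar approach above (generate every month, then count the users whose interval covers the month's last day) can also be sketched in Python for comparison. The helper names here are mine. One difference from the SQL: the inner join drops months with no active users, while this sketch would report them with a count of 0.

```python
from datetime import date, timedelta

# (start_date, end_date) pairs from the question; None means no end date yet.
USERS = [
    (date(2018, 1, 1), date(2019, 1, 1)),
    (date(2018, 1, 1), None),
    (date(2019, 1, 1), date(2019, 6, 1)),
    (date(2017, 1, 1), date(2019, 3, 1)),
]


def month_starts(first, last):
    """Yield the first day of every month from first's month to last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def month_end(start):
    """Last day of the month that begins on `start`."""
    if start.month == 12:
        following = date(start.year + 1, 1, 1)
    else:
        following = date(start.year, start.month + 1, 1)
    return following - timedelta(days=1)


def active_per_month(users):
    """Return (month start, users active on the month's last day)."""
    first = min(start for start, _ in users)
    last = max(end for _, end in users if end is not None)
    rows = []
    for start_of_month in month_starts(first, last):
        as_of = month_end(start_of_month)
        active = sum(
            1 for start, end in users
            if start <= as_of and (end is None or as_of <= end)
        )
        rows.append((start_of_month.isoformat(), active))
    return rows


for month, count in active_per_month(USERS):
    print(month, count)
```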

Subtract subsequent row from previous row based on User

I have the following data and I want to subtract the previous row from the current row for each UserID. I tried the code below, but it does not give me what I want.
DECLARE @DATETBLE TABLE (UserID INT, Dates DATE)
INSERT INTO @DATETBLE VALUES
(1,'2018-01-01'), (1,'2018-01-02'), (1,'2018-01-03'),(1,'2018-01-13'),
(2,'2018-01-15'),(2,'2018-01-16'),(2,'2018-01-17'), (5,'2018-02-04'),
(5,'2018-02-05'),(5,'2018-02-06'),(5,'2018-02-11'), (5,'2018-02-17')
;with cte as (
select UserID,Dates, row_number() over (order by UserID) as seqnum
from @DATETBLE t
)
select t.UserID,t.Dates, datediff(day,tprev.Dates,t.Dates)as diff
from cte t left outer join
cte tprev
on t.seqnum = tprev.seqnum + 1;
Current Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 2
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 18
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
My Expected Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 NULL
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 NULL
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
Your tag (sql-server-2008) suggests using APPLY:
select t.userid, t.dates, datediff(day, t1.dates, t.dates) as diff
from @DATETBLE t outer apply
( select top (1) t1.*
from @DATETBLE t1
where t1.userid = t.userid and
t1.dates < t.dates
order by t1.dates desc
) t1;
If you have SQL Server version 2012 or higher, you could use LAG() with a partition by UserID:
SELECT UserID
, DATEDIFF(dd,COALESCE(LAG_DATES, Dates), Dates) as diff
FROM
(
SELECT UserID
, Dates
, LAG(Dates) OVER (PARTITION BY UserID ORDER BY Dates) as LAG_DATES
FROM @DATETBLE
) exp
This will give you a 0 value instead of a NULL value for the first date in the sequence though.
Since you tagged the post with SQL Server 2008, however, you may need to use a method that doesn't rely on this windowed function.
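Both answers come down to "for each user, compare each date with that user's previous date". A minimal Python sketch of that logic, with a made-up function name (days_since_previous), looks like this. It keeps None for the first row of each user, as in the expected output.

```python
from datetime import date
from itertools import groupby

# (UserID, Dates) rows from the question.
ROWS = [
    (1, date(2018, 1, 1)), (1, date(2018, 1, 2)), (1, date(2018, 1, 3)), (1, date(2018, 1, 13)),
    (2, date(2018, 1, 15)), (2, date(2018, 1, 16)), (2, date(2018, 1, 17)),
    (5, date(2018, 2, 4)), (5, date(2018, 2, 5)), (5, date(2018, 2, 6)),
    (5, date(2018, 2, 11)), (5, date(2018, 2, 17)),
]


def days_since_previous(rows):
    """Return (user, date, days since that user's previous date or None)."""
    result = []
    # Sorting by (user, date) plays the role of PARTITION BY UserID ORDER BY Dates.
    for user, group in groupby(sorted(rows), key=lambda row: row[0]):
        previous = None
        for _, current in group:
            diff = None if previous is None else (current - previous).days
            result.append((user, current, diff))
            previous = current
    return result


for user, day, diff in days_since_previous(ROWS):
    print(user, day, diff)
```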

Datediff between multiple rows for certain ranges

My DB Table has a data set with datetime values.
How can I return a result set that returns the datediff between the smallest and the highest date, only in cases where the datediff between two consecutive values is not larger than 5 minutes?
Date
2018-01-01 00:00:00
2018-01-01 00:01:00
2018-01-01 00:02:00
2018-01-01 00:03:00
2018-01-01 00:04:00
2018-01-01 00:13:00
2018-01-01 00:14:00
2018-01-01 00:15:00
2018-01-01 00:19:00
2018-01-01 00:54:00
2018-01-01 00:59:00
2018-01-01 01:00:00
Result set should look like this:
Ranges(min)
5
4
1
2
What would be an approach for that query?
You can put a break wherever the gap to the previous row is 5 minutes or more. Then accumulate the number of breaks to define a group and aggregate:
select min(dte), max(dte), count(*) as cnt
from (select t.*,
sum(isbreak) over (order by dte) as grp
from (select t.*,
(case when lag(dte) over (order by dte) > dateadd(minute, -5, dte)
then 0 else 1
end) as isbreak
from t
) t
) t
group by grp;
For some reason (not clear to me right now), I thought your question involved SQL Server, so it uses that syntax. lag() is ANSI standard functionality and available in most databases; date arithmetic does vary among databases.
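To see which rows end up in which group, the break-and-accumulate idea can be sketched in Python. The function name split_ranges is hypothetical. The comparison is strict on purpose, to mirror the query: a row stays in the current group only when it is less than 5 minutes after the previous row.

```python
from datetime import datetime, timedelta

# The Date column from the question.
STAMPS = [
    datetime(2018, 1, 1, 0, 0), datetime(2018, 1, 1, 0, 1), datetime(2018, 1, 1, 0, 2),
    datetime(2018, 1, 1, 0, 3), datetime(2018, 1, 1, 0, 4), datetime(2018, 1, 1, 0, 13),
    datetime(2018, 1, 1, 0, 14), datetime(2018, 1, 1, 0, 15), datetime(2018, 1, 1, 0, 19),
    datetime(2018, 1, 1, 0, 54), datetime(2018, 1, 1, 0, 59), datetime(2018, 1, 1, 1, 0),
]


def split_ranges(stamps, max_gap=timedelta(minutes=5)):
    """Return (min, max, row count) for each run of closely spaced timestamps."""
    groups = []
    for stamp in sorted(stamps):
        # Same group only if the previous row is less than max_gap earlier.
        if groups and stamp - groups[-1][-1] < max_gap:
            groups[-1].append(stamp)
        else:
            groups.append([stamp])  # first row, or a break
    return [(group[0], group[-1], len(group)) for group in groups]


for first, last, cnt in split_ranges(STAMPS):
    print(first, last, cnt)
```

On the sample data the row counts per group come out as 5, 4, 1 and 2.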

Find longest streak in sqlite

Please help me with getting streaks data. I have a table of goal achievements:
Table test
dt
2017-01-01
2017-01-02
2017-01-03 // 3 days, end of streak
2017-02-10 // 1 day
2017-02-15
2017-02-16
2017-02-17
2017-02-18 //4 days
I tried this in MySQL
Select dt, (select count(*) from test as t1 where t1.dt < t2.dt and datediff(t2.dt,t1.dt) =1) as str
from test as t2
And got
Dt str
2017-01-01 0
2017-01-02 1
2017-01-03 2
2017-02-10 0
2017-02-15 0
2017-02-16 1
2017-02-17 2
2017-02-18 3
Is it possible to get something like this
Dt Str
2017-01-03 3
2017-02-10 1
2017-02-18 4
And get Max of it?
You can subtract the row number (i.e., the number of rows <= the current row's date) from the current row's date to classify consecutive rows with a one-day difference into the same group. Then it is just a grouping operation to calculate the count.
select max(dt) as dt, count(*) as streak
from (select t1.dt
,date(t1.dt,-(select count(*) from t t2 where t2.dt<=t1.dt)||' day') as grp
from t t1
) t
group by grp
Run the inner query to see how groups are assigned.
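Here is a small sketch that runs the same grouping trick through Python's sqlite3 module and then picks the longest streak, which the question also asked for. The table name test follows the question, the function names are mine, and the modifier is written as '-' || (count) || ' days' only to keep the operator precedence obvious.

```python
import sqlite3

# The dt column from the question.
DATES = [
    "2017-01-01", "2017-01-02", "2017-01-03", "2017-02-10",
    "2017-02-15", "2017-02-16", "2017-02-17", "2017-02-18",
]


def find_streaks(dates):
    """Return (last day of streak, streak length) for every streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (dt TEXT)")
    conn.executemany("INSERT INTO test VALUES (?)", [(d,) for d in dates])
    streaks = conn.execute(
        """
        SELECT MAX(dt) AS dt, COUNT(*) AS streak
        FROM (SELECT t1.dt,
                     -- date minus row number is constant within a streak
                     date(t1.dt,
                          '-' || (SELECT COUNT(*) FROM test AS t2
                                  WHERE t2.dt <= t1.dt) || ' days') AS grp
              FROM test AS t1) AS g
        GROUP BY grp
        ORDER BY 1
        """
    ).fetchall()
    conn.close()
    return streaks


def longest_streak(dates):
    """Return the (last day, length) pair with the greatest length."""
    return max(find_streaks(dates), key=lambda row: row[1])


print(find_streaks(DATES))
print(longest_streak(DATES))
```

This assumes one row per day, as in the sample. Duplicate dates would inflate the row number and break the grouping.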

SQL Server - Count events that happen from 15 min to 14 days from base time

I am using SQL Server 2005. I am trying to count the number of repeats that fall between 15 minutes and 14 days when the Client and Type are the same.
The Table [Interactions] looks like:
eci_date user_ID Type Client
2012-05-01 10:29:59.000 user1 12 14
2012-05-01 10:35:04.000 user1 3 15
2012-05-01 10:45:14.000 user3 4 14
2012-05-01 11:50:22.000 user1 5 15
------------------------------------------
2012-05-02 10:30:28.000 user2 12 14
2012-05-02 10:48:59.000 user5 12 14
2012-05-02 10:52:23.000 user2 12 15
2012-05-02 12:49:45.000 user8 3 14
------------------------------------------
2012-05-03 10:30:47.000 user4 5 15
2012-05-03 10:35:00.000 user6 4 12
2012-05-03 10:59:10.000 user7 4 12
I would like the output to look like:
eci_date Type Total_Calls Total_Repeats
2012-05-01 12 1 2
2012-05-01 3 1 0
2012-05-01 4 1 0
2012-05-01 5 1 1
---------------------------------------------
2012-05-02 12 3 0
2012-05-02 3 1 0
---------------------------------------------
2012-05-03 4 2 1
2012-05-03 5 1 0
So there would be 2 repeats, because client 14 called in 2 more times after the first date they called in. Client and Type must be the same, and I need to filter by day.
Thank You.
With Metrics As
(
Select T1.Client, T1.Type
, Min(eci_Date) As FirstCallDate
From Table1 As T1
Group By T1.Client, T1.Type
)
Select DateAdd(d, DateDiff(d,0,T1.eci_date), 0) As [Day], Type, Count(*) As TotalCalls
, (
Select Count(*)
From Table1 As T2
Join Metrics As M2
On M2.Client = T2.Client
And M2.Type = T2.Type
Where T2.eci_Date >= DateAdd(mi,15,M2.FirstCallDate)
And T2.eci_date <= DateAdd(d,14,M2.FirstCallDate)
And DateAdd(d, DateDiff(d,0,T1.eci_date), 0) = DateAdd(d, DateDiff(d,0,T2.eci_date), 0)
) As Total_Repeats
From Table1 As T1
Group By DateAdd(d, DateDiff(d,0,T1.eci_date), 0), Type
Order By [Day] Asc, Type Desc
Your question is vague, so I'm interpreting it to mean the following:
* The "total_count" column is the number of distinct users on a given day
* The number of repeats is the number of calls after the first one in the next 14 days
The following query accomplishes this:
select t.eci_date, count(distinct t.user_id) as numusers, count(t2.user_id) as Total_repeats
from
(
select cast(eci_date as date) as eci_date,
user_id,
count(*) as total,
min(eci_date) as firstcall
from Interactions
group by cast(eci_date as date), user_id
) t
left outer join Interactions t2
on t.user_id = t2.user_id
and t2.eci_date between firstcall and dateadd(day, 14, firstcall)
and t2.eci_date <> firstcall
group by t.eci_date
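Note this uses the syntax cast(<datetime> as date) to extract the date portion from a datetime. The date type requires SQL Server 2008 or later; on SQL Server 2005, DateAdd(d, DateDiff(d, 0, eci_date), 0) does the same job.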
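Since the question leaves room for interpretation, here is one reading written out as a Python sketch: for each (Client, Type) pair, count the calls that fall between 15 minutes and 14 days after that pair's first call. The function name count_repeats and the tuple layout are assumptions of mine, and this only covers the repeat count, not the per-day Total_Calls column.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (eci_date, Client, Type) taken from the sample table in the question.
CALLS = [
    (datetime(2012, 5, 1, 10, 29, 59), 14, 12),
    (datetime(2012, 5, 1, 10, 35, 4), 15, 3),
    (datetime(2012, 5, 1, 10, 45, 14), 14, 4),
    (datetime(2012, 5, 1, 11, 50, 22), 15, 5),
    (datetime(2012, 5, 2, 10, 30, 28), 14, 12),
    (datetime(2012, 5, 2, 10, 48, 59), 14, 12),
    (datetime(2012, 5, 2, 10, 52, 23), 15, 12),
    (datetime(2012, 5, 2, 12, 49, 45), 14, 3),
    (datetime(2012, 5, 3, 10, 30, 47), 15, 5),
    (datetime(2012, 5, 3, 10, 35, 0), 12, 4),
    (datetime(2012, 5, 3, 10, 59, 10), 12, 4),
]


def count_repeats(calls, min_gap=timedelta(minutes=15), max_gap=timedelta(days=14)):
    """Map (client, type) to the number of calls inside the repeat window."""
    stamps_by_key = defaultdict(list)
    for stamp, client, call_type in calls:
        stamps_by_key[(client, call_type)].append(stamp)

    repeats = {}
    for key, stamps in stamps_by_key.items():
        first = min(stamps)
        # The first call itself is excluded because it is 0 minutes after itself.
        repeats[key] = sum(
            1 for stamp in stamps if first + min_gap <= stamp <= first + max_gap
        )
    return repeats


for (client, call_type), total in sorted(count_repeats(CALLS).items()):
    print(client, call_type, total)
```

Under this reading, client 14 with type 12 gets 2 repeats, which agrees with the explanation given in the question.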