How to group by Date Range starting from initial date - sql

I have the following table structure
Key int
MemberID int
VisitDate DateTime
How can I group all the dates falling within a given date range, say 15 days? The first visit for the same member should be considered as the starting date.
e.g.
Key ID VisitDate(MM/dd/YY)
1 1 02/01/11
2 1 02/09/11
3 1 02/12/11
4 1 02/17/11
5 2 02/03/11
6 2 02/19/11
In this case the result should be
ID StartDate EndDate
1 02/01/11 02/12/11
1 02/17/11 02/17/11
2 02/03/11 02/03/11
2 02/19/11 02/19/11

One way to do this would be to use window aggregate functions. Here's how:
Setup:
DECLARE @data TABLE (
[Key] int, ID int, VisitDate date
);
-- 'YYYYMMDD' literals are unambiguous regardless of the session's DATEFORMAT
INSERT INTO @data ([Key], ID, VisitDate)
SELECT 1, 1, '20110201' UNION ALL
SELECT 2, 1, '20110209' UNION ALL
SELECT 3, 1, '20110212' UNION ALL
SELECT 4, 1, '20110217' UNION ALL
SELECT 5, 2, '20110203' UNION ALL
SELECT 6, 2, '20110219';
Query:
WITH marked AS (
SELECT
*,
-- 0 for days 0-14 after the member's first visit, 1 for days 15-29, and so on
Grp = DATEDIFF(DAY, MIN(VisitDate) OVER (PARTITION BY ID), VisitDate) / 15
FROM @data
)
SELECT
ID,
StartDate = MIN(VisitDate),
EndDate = MAX(VisitDate)
FROM marked
GROUP BY ID, Grp
ORDER BY ID, StartDate;
Output:
ID StartDate EndDate
----------- ---------- ----------
1 2011-02-01 2011-02-12
1 2011-02-17 2011-02-17
2 2011-02-03 2011-02-03
2 2011-02-19 2011-02-19
Basically, for each row, the query calculates the number of days between VisitDate and the first VisitDate for the same ID, and divides it by 15. The result is then used as the grouping key. Note that SQL Server performs integer division when both operands of the / operator are integers, so the fractional part is discarded.
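To make the bucketing rule concrete, here is a small Python sketch (not part of the original answer) that applies the same arithmetic to the question's sample rows: bucket = days since the member's first visit, integer-divided by 15. The function name `group_by_window` and its `window_days` parameter are hypothetical.

```python
from datetime import date

# Sample rows from the question: (Key, ID, VisitDate)
visits = [
    (1, 1, date(2011, 2, 1)),
    (2, 1, date(2011, 2, 9)),
    (3, 1, date(2011, 2, 12)),
    (4, 1, date(2011, 2, 17)),
    (5, 2, date(2011, 2, 3)),
    (6, 2, date(2011, 2, 19)),
]


def group_by_window(rows, window_days=15):
    """Return (ID, StartDate, EndDate) per fixed window anchored at each ID's first visit."""
    first = {}
    for _, member, d in rows:
        first[member] = min(first.get(member, d), d)  # MIN(VisitDate) OVER (PARTITION BY ID)
    buckets = {}
    for _, member, d in rows:
        # Day differences are never negative here, so // matches T-SQL's truncating int division
        grp = (d - first[member]).days // window_days
        lo, hi = buckets.get((member, grp), (d, d))
        buckets[(member, grp)] = (min(lo, d), max(hi, d))  # MIN/MAX per (ID, Grp)
    return sorted((member, lo, hi) for (member, _), (lo, hi) in buckets.items())


for row in group_by_window(visits):
    print(row)
```

The 02/17 visit is 16 days after 02/01, so 16 // 15 = 1 puts it in a second group, exactly as in the expected output.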

Related

BigQuery: Possible to merge 2 arrays of ranges where you split the range if they overlap

I was wondering if the following case is possible within BigQuery.
There are 2 tables of intervals. The intervals in a single table do not overlap with other intervals in the same table. The intervals however can overlap with intervals in the other table.
I want to merge the intervals, but also divide the intervals into multiple intervals if they overlap. So if for example the interval is in table A from 5/8/2020 - 5/9/2020 and there is an interval in B 18/8/2020 - 1/9/2020, then I want to split the interval as 5/8/2020 - 18/8/2020 (in A), 18/8/2020 - 1/9/2020 (in A and B) and 1/9/2020 - 5/9/2020 (in A).
A more extensive example: We have a table with intervals where people eat Apples
ID  StartDate  EndDate
1   01/01/19   01/04/19
2   01/01/19   03/01/19
And a table with intervals where people eat Bananas
ID  StartDate  EndDate
1   15/12/18   12/01/19
1   01/02/19   17/02/19
1   15/03/19   15/04/19
2   01/06/19   01/07/19
And now we want to combine those intervals and classify them as either apple eaters, banana eaters, or apple and banana eaters.
ID  StartDate  EndDate   type
1   15/12/18   01/01/19  B
1   01/01/19   12/01/19  AB
1   12/01/19   01/02/19  A
1   01/02/19   17/02/19  AB
1   17/02/19   15/03/19  A
1   15/03/19   01/04/19  AB
1   01/04/19   15/04/19  B
2   01/01/19   03/01/19  A
2   01/06/19   01/07/19  B
Is it possible to solve this with BigQuery?
Consider the query below:
WITH stacked AS (
SELECT ID, date, STRING_AGG(type, '' ORDER BY type) type FROM (
SELECT *, 'A' type FROM Apples
UNION ALL
SELECT *, 'B' type FROM Bananas
), UNNEST (GENERATE_DATE_ARRAY(PARSE_DATE('%d/%m/%y', StartDate), PARSE_DATE('%d/%m/%y', EndDate), INTERVAL 1 DAY)) date
GROUP BY 1, 2
),
partitioned AS (
SELECT ID, date, type,
COUNTIF(flag) OVER w AS div,
type = 'AB' AND LEAD(type) OVER w <>'AB' in_AB,
type <> 'AB' AND LAG(type) OVER w = 'AB' out_AB,
type <> 'AB' AND LEAD(type, 1, 'A') OVER w <> 'AB' bw_AB,
FROM (
SELECT ID, date, type, type <> LAG(type) OVER (PARTITION BY ID ORDER BY date) AS flag
FROM stacked
)
WINDOW w AS (PARTITION BY ID ORDER BY date)
)
SELECT ID,
MIN(IF(out_AB, date - 1, date)) StartDate,
MAX(IF(in_AB or bw_AB, date, date + 1)) EndDate,
ANY_VALUE(type) type
FROM partitioned
GROUP BY ID, div
ORDER BY 1, 2;
With sample tables:
CREATE TEMP TABLE Apples AS
select 1 ID, '01/01/19' StartDate, '01/04/19'EndDate union all
select 2, '01/01/19', '03/01/19';
CREATE TEMP TABLE Bananas AS
select 1 ID, '15/12/18' StartDate, '12/01/19' EndDate union all
select 1, '01/02/19', '17/02/19' union all
select 1, '15/03/19', '15/04/19' union all
select 2, '01/06/19', '01/07/19';
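The query above expands every interval into single days and then merges runs of days with the same type. As an alternative illustration, here is a Python sketch of a different technique, a sweep over boundary points. Every start or end date becomes a split point, and each elementary segment is labelled with the tables that cover it. The sketch assumes the question's dd/mm/yy date format and, like the expected output, lets adjacent segments share their boundary date. All names are hypothetical.

```python
from datetime import datetime


def parse(s):
    # The question's dates are dd/mm/yy, e.g. '15/12/18' is 15 December 2018
    return datetime.strptime(s, "%d/%m/%y").date()


apples = [(1, "01/01/19", "01/04/19"), (2, "01/01/19", "03/01/19")]
bananas = [(1, "15/12/18", "12/01/19"), (1, "01/02/19", "17/02/19"),
           (1, "15/03/19", "15/04/19"), (2, "01/06/19", "01/07/19")]


def split_intervals(tables):
    """tables maps a label ('A', 'B') to rows of (id, start, end) strings."""
    by_id = {}
    for label, rows in tables.items():
        for pid, start, end in rows:
            by_id.setdefault(pid, []).append((parse(start), parse(end), label))
    result = []
    for pid in sorted(by_id):
        spans = by_id[pid]
        # Every start or end date is a potential split point
        points = sorted({d for s, e, _ in spans for d in (s, e)})
        for lo, hi in zip(points, points[1:]):
            # Labels of all intervals that cover the whole segment [lo, hi]
            labels = "".join(sorted({lab for s, e, lab in spans if s <= lo and hi <= e}))
            if not labels:
                continue  # gap: this person eats neither fruit here
            prev = result[-1] if result else None
            if prev and prev[0] == pid and prev[2] == lo and prev[3] == labels:
                result[-1] = (pid, prev[1], hi, labels)  # extend the previous run
            else:
                result.append((pid, lo, hi, labels))
    return result


for row in split_intervals({"A": apples, "B": bananas}):
    print(row)
```

Working from boundary points keeps the cost proportional to the number of intervals, not the number of days they span.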

For each quarter between two dates, add rows quarter by quarter in SQL SERVER

I have a table, with types int, datetime, datetime:
id start date end date
-- ---------- ----------
1 2019-04-02 2020-09-17
2 2019-08-10 2020-08-10
Here is create/insert:
CREATE TABLE dbo.something
(
id int,
[start date] datetime,
[end date] datetime
);
INSERT dbo.something(id,[start date],[end date])
VALUES(1,'20190402','20200917'),(2,'20190810','20200810');
What is a SQL query that can produce these results:
id Year Quarter
-- ---- ----------
1 2019 2
1 2019 3
1 2019 4
1 2020 1
1 2020 2
1 2020 3
2 2019 3
2 2019 4
2 2020 1
2 2020 2
2 2020 3
Just use a recursive CTE. This version counts quarters from year 0 (year * 4 + quarter - 1), so moving to the next quarter is simply + 1:
with cte as (
select id,
year([start date]) * 4 + datepart(quarter, [start date]) - 1 as yyyyq,
year([end date]) * 4 + datepart(quarter, [end date]) - 1 as end_yyyyq
from dbo.something
union all
select id, yyyyq + 1, end_yyyyq
from cte
where yyyyq < end_yyyyq
)
-- decode: integer division gives the year, the remainder gives quarter - 1
select id, yyyyq / 4 as [Year], (yyyyq % 4) + 1 as [Quarter]
from cte
order by id, yyyyq;
Here is a db<>fiddle.
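The year * 4 + quarter - 1 encoding can be checked outside SQL Server. This Python sketch (hypothetical helper names) derives the quarter from the month with (month - 1) // 3, then walks the encoded range. That is the same walk the recursive CTE performs one row at a time.

```python
from datetime import date


def quarter_index(d):
    # Same encoding as the CTE: year * 4 + quarter - 1, so consecutive quarters differ by 1
    return d.year * 4 + (d.month - 1) // 3


def quarters_between(start, end):
    # Decode each index back: year = q // 4, quarter = q % 4 + 1
    return [(q // 4, q % 4 + 1) for q in range(quarter_index(start), quarter_index(end) + 1)]


print(quarters_between(date(2019, 4, 2), date(2020, 9, 17)))
```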
If you cannot make another reference table/etc, you can use DATEDIFF (and DATEPART) using quarters, and then some simple date arithmetic.
The logic below simply finds, for each startdate, the first quarter and then the number of additional quarters needed to reach the enddate. A SELECT then adds each of those additional quarters to the startdate, giving one row per quarter.
The hardest part of the query to understand, in my opinion, is the WITH number_list section. All it does is generate a series of integers between 0 and the maximum quarter difference (at most 99 with this ones/tens cross join). If you already have a numbers table, you can use that instead.
Key code part is below, and here's a full DB_Fiddle with some additional test data.
CREATE TABLE #yourtable (id int, startdate date, enddate date)
INSERT INTO #yourtable (id, startdate, enddate) VALUES
(1, '2019-04-02', '2020-09-17'),
(2, '2019-08-10', '2020-08-20')
; WITH number_list AS
-- list of ints from 0 to maximum number of quarters
(SELECT n
FROM (SELECT ones.n + 10*tens.n AS n
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n)
) AS a
WHERE n <= (SELECT MAX(DATEDIFF(quarter,startdate,enddate)) FROM #yourtable)
)
SELECT id,
YEAR(DATEADD(quarter, number_list.n, startdate)) AS [Year],
DATEPART(quarter, DATEADD(quarter, number_list.n, startdate)) AS [Quarter]
FROM (SELECT id, startdate, DATEDIFF(quarter,startdate,enddate) AS num_additional_quarters FROM #yourtable) yt
CROSS JOIN number_list
WHERE number_list.n <= yt.num_additional_quarters
ORDER BY id, [Year], [Quarter];
DROP TABLE #yourtable;
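The DATEDIFF/DATEADD logic of this answer can be mimicked in Python. The sketch below assumes two T-SQL behaviors: DATEDIFF(quarter, ...) counts the quarter boundaries crossed, and DATEADD(quarter, n, d) adds 3 * n months, clamping the day to the end of the month. The helper names are hypothetical.

```python
import calendar
from datetime import date


def datediff_quarter(start, end):
    # Mimics DATEDIFF(quarter, start, end): number of quarter boundaries crossed
    return (end.year * 4 + (end.month - 1) // 3) - (start.year * 4 + (start.month - 1) // 3)


def dateadd_quarter(n, d):
    # Mimics DATEADD(quarter, n, d): add 3 * n months, clamping the day if needed
    months = d.year * 12 + (d.month - 1) + 3 * n
    year, month0 = divmod(months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


# Same as the number_list CTE: ones + 10 * tens gives 0..99
number_list = [ones + 10 * tens for tens in range(10) for ones in range(10)]


def quarter_rows(table):
    rows = []
    for id_, start, end in table:
        extra = datediff_quarter(start, end)  # num_additional_quarters
        for n in number_list:
            if n <= extra:  # WHERE number_list.n <= yt.num_additional_quarters
                d = dateadd_quarter(n, start)
                rows.append((id_, d.year, (d.month - 1) // 3 + 1))
    return rows


yourtable = [(1, date(2019, 4, 2), date(2020, 9, 17)), (2, date(2019, 8, 10), date(2020, 8, 20))]
for row in quarter_rows(yourtable):
    print(row)
```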
First create a date dimension table that contains each date with its corresponding quarter and year. Then use the query below to get the result. Tweak the column and table names according to your schema.
with q_date as
(
select 1 as id, CAST('2019-04-02' AS date) as start_date, CAST('2020-09-17' AS date) as end_date
UNION ALL
select 2 as id, CAST('2019-08-10' AS date) as start_date, CAST('2020-08-10' AS date) as end_date
)
select qd.id, dd.calendar_year, dd.calendar_quarter_number
from dim_date dd, q_date qd
where dd.date_dmk between qd.start_date and qd.end_date
group by qd.id, dd.calendar_year, dd.calendar_quarter_number
order by qd.id, dd.calendar_year, dd.calendar_quarter_number;
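Here is a runnable version of the date-dimension approach. It uses SQLite from Python's standard library as a stand-in engine, not SQL Server. The `dim_date` table and its column names come from the answer; its contents here are a hypothetical range covering 2019 and 2020. ISO date strings compare correctly as text in SQLite, so BETWEEN works on them.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_date (date_dmk TEXT PRIMARY KEY, "
             "calendar_year INTEGER, calendar_quarter_number INTEGER)")

# Hypothetical dimension contents: every day of 2019 and 2020 (365 + 366 days)
first, days = date(2019, 1, 1), 731
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    ((d.isoformat(), d.year, (d.month - 1) // 3 + 1)
     for d in (first + timedelta(days=i) for i in range(days))),
)

rows = conn.execute("""
WITH q_date AS (
  SELECT 1 AS id, '2019-04-02' AS start_date, '2020-09-17' AS end_date
  UNION ALL
  SELECT 2, '2019-08-10', '2020-08-10'
)
SELECT qd.id, dd.calendar_year, dd.calendar_quarter_number
FROM dim_date dd
JOIN q_date qd ON dd.date_dmk BETWEEN qd.start_date AND qd.end_date
GROUP BY qd.id, dd.calendar_year, dd.calendar_quarter_number
ORDER BY qd.id, dd.calendar_year, dd.calendar_quarter_number
""").fetchall()

for row in rows:
    print(row)
```

The GROUP BY collapses the many matching days per quarter into one row each, which is why the dimension table needs no special quarter-level rows.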

SQL: How to create a weekly user count summary by month

I'm trying to create a week-over-week active user count summary report/table aggregated by month. I have one table for June 2017 and one table for May 2017, which I need to join together. The date timestamp is created_utc, a UNIX timestamp, which I know how to transform into a human-readable format and from which I can extract the week-of-year value (1 through 52). The questions I have are:
1) How to number the weeks just with values 1 through 4: week 1 for June, week 1 for May, week 2 for June, week 2 for May, and so on.
2) How to join the tables on those week values (1 through 4).
3) How to pivot the table and add a WOW_Change column.
I'd like the final table to look like this:
| Week | June_count | May_count |WOW_Change |
|:-----------|:-----------:|:------------:|:----------:|
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
GROUP BY event_date, week_number;
SELECT DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
GROUP BY event_date, week_number;
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
You can test and play with the query above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with the result below. The May sample data covers only the first two weeks, so the inner join returns only two weeks; this should not be an issue when you apply it to real data.
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
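To double-check the day-of-month week arithmetic and the result above, here is a Python sketch over the same sample rows. Timestamps are converted in UTC, as TIMESTAMP_SECONDS does. The helper names are hypothetical; the WOW_Change formula is the one used in the query.

```python
from collections import defaultdict
from datetime import datetime, timezone

june = [
    (1496354167, '6eq4xf'), (1496362973, '6eqzz3'), (1496431934, '6ewlm8'), (1496870877, '6fwied'),
    (1496778080, '6fo79k'), (1496933893, '6g1gcg'), (1497154559, '6gjkid'), (1497618561, '6hmeud'),
    (1497377349, '6h1osm'), (1497221017, '6god73'), (1497731470, '6hvmic'), (1497273130, '6gs4ay'),
    (1498080798, '6ioz8q'), (1497769316, '6hyer4'), (1497415729, '6h5cgu'), (1497978764, '6iffwq'),
]
may = [
    (1493729491, '68sx7k'), (1493646801, '68m2s2'), (1493747285, '68uohf'), (1493664087, '68ntss'),
    (1493690759, '68qe5k'), (1493829196, '691fy9'), (1493646344, '68m1dv'), (1494166859, '69rhkl'),
    (1493883023, '6963qb'), (1494362328, '6a83wv'), (1494525998, '6alv6c'), (1493945230, '69bkhb'),
    (1494050355, '69jqtz'), (1494418011, '6accd0'), (1494425781, '6ad0xm'), (1494024697, '69hx2z'),
    (1494586576, '6aql9y'),
]


def weekly_counts(rows):
    """Distinct users per day-of-month week: days 1-7 -> 1, 8-14 -> 2, and so on."""
    users = defaultdict(set)
    for ts, userid in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).day
        users[(day - 1) // 7 + 1].add(userid)  # DIV(day - 1, 7) + 1
    return {week: len(ids) for week, ids in users.items()}


def compare(june_rows, may_rows):
    j, m = weekly_counts(june_rows), weekly_counts(may_rows)
    # Inner join on week, like JOIN ... USING(week)
    return [(f"Week_{w}", j[w], m[w], round((m[w] - j[w]) / j[w], 2))
            for w in sorted(j.keys() & m.keys())]


for row in compare(june, may):
    print(row)
```

June also has a third week in the sample data, but the inner join drops it because May has no matching week.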
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count AS june_user_count,
m.user_count AS may_user_count
-- DIV keeps the week an integer; / returns FLOAT64 in BigQuery
FROM (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 AS week_number,
COUNT(DISTINCT userid) AS user_count
FROM June
GROUP BY week_number
) j JOIN
(SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 AS week_number,
COUNT(DISTINCT userid) AS user_count
FROM May
GROUP BY week_number
) m
ON m.week_number = j.week_number;
Note that splitting data into different tables based only on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.

#SQL - Order By matching record first

I need help ordering this table (named "season") by matching the current date against BEGINDATE.
ID NAME BEGINDATE
----------- -------------------- ----------
1 2014-2015 2014-10-01
2 2015-2016 2015-10-01
3 2016-2017 2016-10-01
4 2017-2018 2017-10-01
For example, if the current date is 2016/10/28, we are in season 2016-2017 (id=3), so the result should be:
ID NAME BEGINDATE
----------- -------------------- ----------
3 2016-2017 2016-10-01
1 2014-2015 2014-10-01
2 2015-2016 2015-10-01
4 2017-2018 2017-10-01
UPDATE (SOLVED)
What I finally did was:
DECLARE @IDACTIVE AS INT = (SELECT MAX(ID) FROM SEASON WHERE BEGINDATE < GETDATE())
SELECT
1 AS ORDERBY,
ID,
NAME,
BEGINDATE
FROM SEASON
WHERE ID = @IDACTIVE
UNION ALL
SELECT
2 AS ORDERBY,
ID,
NAME,
BEGINDATE
FROM SEASON
WHERE ID <> @IDACTIVE
ORDER BY ORDERBY, ID
Follow this approach:
1) Get the only matching row by using TOP and WHERE clauses.
2) Get all records except the one you got in step 1.
3) Combine the results of the two SELECTs using UNION ALL.
Demo:
Create table season (id int, NAME varchar(20), BEGINDATE date)
go
insert into season values (1,'2014-2015','2014-10-01')
insert into season values (2,'2015-2016','2015-10-01')
insert into season values (3,'2016-2017','2016-10-01')
insert into season values (4,'2017-2018','2017-10-01')
go
-- sort_key makes the order explicit; UNION ALL alone does not guarantee it
select id, NAME, BEGINDATE from (
select 0 as sort_key, * from (
select top 1 * from season
where BEGINDATE < getdate()
order by BEGINDATE desc
) a
union all
select 1, * from season
where BEGINDATE != (
select top 1 BEGINDATE from season
where BEGINDATE < getdate()
order by BEGINDATE desc)
) s
order by sort_key, id
-- another solution: a season starts on October 1, so shifting today back
-- 9 months gives the calendar year in which the current season began
select * from season
order by case when YEAR(BEGINDATE) = YEAR(DATEADD(month, -9, getdate())) then 0 else 1 end, id
First move all future dates to the end, then order by beginDate
SELECT *
FROM season
ORDER BY CASE WHEN beginDate > GETDATE() THEN 1 ELSE 0 END,
beginDate
I think this is most easily done using window functions:
select s.*
from season s
order by (case when begindate = max(case when getdate() >= begindate then begindate end) over ()
then 1 else 2
end),
id
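The rule shared by these answers is that the active season is the one with the latest BEGINDATE on or before today. That rule can be sketched in Python as follows. The function name is hypothetical, and today is passed in explicitly so the result is testable instead of depending on GETDATE().

```python
from datetime import date

seasons = [(1, "2014-2015", date(2014, 10, 1)), (2, "2015-2016", date(2015, 10, 1)),
           (3, "2016-2017", date(2016, 10, 1)), (4, "2017-2018", date(2017, 10, 1))]


def current_first(rows, today):
    # Active season: the latest BEGINDATE that is on or before today
    started = [r for r in rows if r[2] <= today]
    active = max(started, key=lambda r: r[2]) if started else None
    # Sort key: active season first (False sorts before True), then by ID
    return sorted(rows, key=lambda r: (r is not active, r[0]))


print([r[0] for r in current_first(seasons, date(2016, 10, 28))])
```

With today = 2016-10-28 this yields the order 3, 1, 2, 4 from the question.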

How can I dynamically create dates between a specific timespan and weeks?

I have the following customer table:
ID | StartDate | WeekCount
1 | 01.12.2015 | 2
2 | 03.12.2015 | 4
3 | 06.06.2014 | 8
StartDate is the date of the customer's first visit, and WeekCount is the interval until the next visit (every X weeks).
I want to query the next visit dates for a time span.
Let's say the first visit is 03.12.2015 and I query for March 2016; then the expected date should be 03.03.2016.
So basically StartDate + WeekCount weeks, repeated, and then a date-between filter.
I think a recursive CTE will help you solve your problem.
DECLARE @from_date DATETIME = '20160301'
DECLARE @to_date DATETIME = '20160401'
;WITH test_data AS(
SELECT 1 AS id, CAST('20151201' AS DATETIME) AS startDate, 2 AS weekCount
UNION ALL
SELECT 2 AS id, CAST('20151203' AS DATETIME) AS startDate, 4 AS weekCount
UNION ALL
SELECT 3 AS id, CAST('20140606' AS DATETIME) AS startDate, 8 AS weekCount
),
result_tbl AS(
-- anchor: the first visit
SELECT id, startDate, weekCount FROM test_data
UNION ALL
-- recursive step: the next visit is weekCount weeks later, stopping at @to_date
SELECT id, DATEADD(ww, R.weekCount, R.startDate), weekCount FROM result_tbl AS R
WHERE DATEADD(ww, R.weekCount, R.startDate) < @to_date
)
SELECT * FROM result_tbl
WHERE startDate >= @from_date
ORDER BY id, startDate
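Instead of generating every visit from the start date recursively, the first visit inside the window can also be computed directly with a ceiling division, which avoids deep recursion for old start dates. This is a swapped-in technique, not the CTE above, sketched in Python with hypothetical names. The window is half-open: start <= visit < end.

```python
from datetime import date, timedelta

customers = [(1, date(2015, 12, 1), 2), (2, date(2015, 12, 3), 4), (3, date(2014, 6, 6), 8)]


def visits_between(customers, start, end):
    """Return (id, visit_date) for every visit with start <= visit_date < end."""
    out = []
    for cid, first, weeks in customers:
        step = timedelta(weeks=weeks)
        if first >= start:
            d = first
        else:
            # Jump straight to the first visit on or after start (ceiling division)
            k = -(-(start - first).days // step.days)
            d = first + k * step
        while d < end:
            out.append((cid, d))
            d += step
    return out


print(visits_between(customers, date(2016, 3, 1), date(2016, 4, 1)))
```

For March 2016, customer 3 (every 8 weeks from 06.06.2014) has no visit: the visits fall on 12.02.2016 and 08.04.2016.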
Provided the data type is date or datetime:
Select columns from your_table
where StartDate>='20160301' and StartDate<'20160401'