I have a SQL query that counts consecutive days, but I need it to bridge weekends too. For example, if someone has a Friday and the following Monday off, I need this to count as 2 consecutive days.
tables:
CREATE TABLE Absence(
Date Date,
Code varchar(10),
Name varchar(10),
Type varchar(10)
);
INSERT INTO Absence (Date, Code, Name, Type)
VALUES ('01-10-18', 'S', 'Sam', 'Sick'),
('01-11-18','S', 'Sam', 'Sick'),
('01-12-18','S', 'Sam', 'Sick'),
('01-21-18','S', 'Sam', 'Sick'),
('01-26-18','S', 'Sam', 'Sick'),
('01-27-18','S', 'Sam', 'Sick'),
('02-12-18','S', 'Sam', 'Holiday'),
('02-13-18','S', 'Sam', 'Holiday'),
('02-18-18','S', 'Sam', 'Holiday'),
('02-25-18','S', 'Sam', 'Holiday'),
('02-10-18','S', 'Sam', 'Holiday'),
('02-13-18','F', 'Fred', 'Sick'),
('02-14-18','F', 'Fred', 'Sick'),
('03-09-18','F', 'Fred', 'Sick'),
('03-12-18','F', 'Fred', 'Sick'),
('02-28-18','F', 'Fred', 'Sick');
I have this code:
select name, min(date), max(date), count(*) as numdays, type
from (select a.*,
row_number() over (partition by name, type order by date) as
seqnum_ct
from absence a
) a
group by name, type, dateadd(day, -seqnum_ct, date);
And it produces this result:
| name | | | numdays | type |
|------|------------|------------|---------|---------|
| Fred | 2018-02-13 | 2018-02-14 | 2 | Sick |
| Fred | 2018-02-28 | 2018-02-28 | 1 | Sick |
| Fred | 2018-03-09 | 2018-03-09 | 1 | Sick |
| Fred | 2018-03-12 | 2018-03-12 | 1 | Sick |
| Sam | 2018-02-10 | 2018-02-10 | 1 | Holiday |
| Sam | 2018-02-12 | 2018-02-13 | 2 | Holiday |
| Sam | 2018-02-18 | 2018-02-18 | 1 | Holiday |
| Sam | 2018-02-25 | 2018-02-25 | 1 | Holiday |
| Sam | 2018-01-10 | 2018-01-12 | 3 | Sick |
| Sam | 2018-01-21 | 2018-01-21 | 1 | Sick |
| Sam | 2018-01-26 | 2018-01-27 | 2 | Sick |
If you look at these lines
('03-09-18','F', 'Fred', 'Sick'),
('03-12-18','F', 'Fred', 'Sick'),
These should form 1 consecutive period, even though one is a Friday and the other is the following Monday. How can I edit this code so that it bridges weekends too?
Thanks
SQL fiddle - http://sqlfiddle.com/#!18/1de27/1
Try this:
select name, min(date), max(date), count(*) as numdays, type
from (
select date, code, name, type, seqnum_ct + sum(weekend) over (partition by name, type order by date) seqnum_ct
from (select a.*,
row_number() over (partition by name, type order by date) as seqnum_ct,
case when datepart(weekday, [date]) = 2 and
datepart(weekday, lag([date]) over (partition by name, type order by date)) = 6 then 2 else 0 end [weekend]
from #absence a
) a
) a
group by name, type, dateadd(day, -seqnum_ct, date);
You can use a running sum to create groups that handle weekends. For a given name and type, in date order, check whether the current row's weekday is 2 (Monday) and the previous row's is 6 (Friday). These weekday numbers assume the default DATEFIRST setting of 7 (U.S. English).
select name, min(date), max(date), count(*) as numdays, type
from (select a.*,sum(col) over(partition by Name,type order by [Date]) as grp
from (select a.*,
case when datediff(day,lag([Date]) over(partition by Name,type order by [Date]),[Date])=1 or
-- a Monday exactly 3 days after the previous row, so the previous row is the Friday just before it
(datepart(weekday,[Date])=2 and datediff(day,lag([Date]) over(partition by Name,type order by [Date]),[Date])=3)
then 0 else 1 end as col
from absence a
) a
) a
group by name, type, grp
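The grouping rule can also be checked outside the database. Below is a minimal Python sketch of the same idea, not a translation of either query: it walks one person's dates in order and starts a new run unless the gap is one day, or the previous date is a Friday and the current one is the Monday three days later. The function name `absence_runs` is made up for illustration.

```python
from datetime import date

def absence_runs(days):
    """Group dates into runs, treating Friday -> following Monday as consecutive."""
    runs = []
    for d in sorted(set(days)):
        if runs:
            prev = runs[-1][-1]
            gap = (d - prev).days
            # date.weekday(): Monday == 0 ... Friday == 4
            bridges_weekend = prev.weekday() == 4 and gap == 3
            if gap == 1 or bridges_weekend:
                runs[-1].append(d)
                continue
        runs.append([d])
    # (first day, last day, number of absence days) per run
    return [(r[0], r[-1], len(r)) for r in runs]

# Fred's 'Sick' rows from the question
fred_sick = [date(2018, 2, 13), date(2018, 2, 14), date(2018, 2, 28),
             date(2018, 3, 9), date(2018, 3, 12)]
result = absence_runs(fred_sick)
```

With Fred's data, 2018-03-09 (Friday) and 2018-03-12 (Monday) land in one run of 2 days, which is the behaviour asked for in the question.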
Related
I have weekly stock data for each product. I want to group it by year-month and get the first value of each month. In other words, I want to get the opening stock of each month, regardless of the day of the month.
+------------+---------+
| MyDate | MyValue |
+------------+---------+
| 2018-01-06 | 2 |*
| 2018-01-13 | 7 |
| 2018-01-20 | 5 |
| 2018-01-27 | 2 |
| 2018-02-03 | 3 |*
| 2018-02-10 | 10 |
| 2018-02-17 | 6 |
| 2018-02-24 | 4 |
| 2018-03-03 | 7 |*
| 2018-03-10 | 5 |
| 2018-03-17 | 3 |
| 2018-03-24 | 4 |
| 2018-03-31 | 6 |
+------------+---------+
Desired results:
+----------------+---------+
| FirstDayOfMonth| MyValue |
+----------------+---------+
| 2018-01-01 | 2 |
| 2018-02-01 | 3 |
| 2018-03-01 | 7 |
+----------------+---------+
I thought this might work, but it doesn't.
select
[product],
datefromparts(year([MyDate]), month([MyDate]), 1),
FIRST_VALUE(MyValue) OVER (PARTITION BY [Product], YEAR([MyDate]), MONTH([MyDate]) ORDER BY [MyDate] ASC) AS MyValue
from
MyTable
group by
[Product],
YEAR([MyDate]), MONTH([MyDate])
Edit. Thank you. The emphasis of my question is not on how to get the first day of the month. I know that there are different techniques for that.
The emphasis is on how to get the FIRST value in the month (the opening stock). If there is a way to get the closing stock in the same query, that would be great. The answers based on ROW_NUMBER do not give the closing stock in one pass; they would require two joins.
Edit after accepting answer
Please consider John Cappelletti's answer as an alternative to the accepted one: https://stackoverflow.com/a/53559750/1903793
You don't really need the GROUP BY if you have chosen the window function route:
SELECT Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) AS Month, MyValue
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) ORDER BY MyDate) AS rn
FROM t
) AS x
WHERE rn = 1
UPDATE
To get the last row for the month just do a UNION ALL <above query> but change the order by clause to ORDER BY MyDate DESC. This will give you two rows per product-month.
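To make the expected result concrete, here is a small Python sketch (not part of the original answer) that derives both the opening and the closing value per month in a single pass over date-ordered rows. It assumes a single product; the function name `opening_and_closing` is hypothetical.

```python
from datetime import date

def opening_and_closing(rows):
    """rows: iterable of (date, value).
    Returns {first_day_of_month: (opening_value, closing_value)}."""
    result = {}
    for d, value in sorted(rows):
        key = d.replace(day=1)
        # keep the first value seen for the month, overwrite the last one
        opening, _ = result.get(key, (value, value))
        result[key] = (opening, value)
    return result

# Weekly sample data from the question
stock = [
    (date(2018, 1, 6), 2), (date(2018, 1, 13), 7), (date(2018, 1, 20), 5),
    (date(2018, 1, 27), 2), (date(2018, 2, 3), 3), (date(2018, 2, 10), 10),
    (date(2018, 2, 17), 6), (date(2018, 2, 24), 4), (date(2018, 3, 3), 7),
    (date(2018, 3, 10), 5), (date(2018, 3, 17), 3), (date(2018, 3, 24), 4),
    (date(2018, 3, 31), 6),
]
by_month = opening_and_closing(stock)
```

The opening values (2, 3, 7) match the desired results in the question; the closing values come from the last weekly row of each month.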
You can use apply and eomonth: eomonth(mydate, -1) returns the last day of the previous month, and adding one day gives the first day of the current month:
select distinct dateadd(day, 1, eomonth(t1.mydate, -1)) as FirstDayOfMonth, t1.myvalue
from table t cross apply
( select top (1) t1.mydate, t1.myvalue
from table t1
where t1.product = t.product and
year(t1.MyDate) = year(t.MyDate) and month(t1.MyDate) = month(t.MyDate)
order by t1.mydate
) t1;
You could also use ROW_NUMBER() with a CTE:
WITH CTE as (
SELECT '2018-01-06' myDate, 2 Myvalue UNION ALL
SELECT '2018-01-13', 7 UNION ALL
SELECT '2018-01-20', 5 UNION ALL
SELECT '2018-01-27', 2 UNION ALL
SELECT '2018-02-03', 3 UNION ALL
SELECT '2018-02-10', 10 UNION ALL
SELECT '2018-02-17', 6 UNION ALL
SELECT '2018-02-24', 4 UNION ALL
SELECT '2018-03-03', 7 UNION ALL
SELECT '2018-03-10', 5 UNION ALL
SELECT '2018-03-17', 3 UNION ALL
SELECT '2018-03-24', 4 UNION ALL
SELECT '2018-03-31', 6),
CTE2 as (SELECT *
, Row_Number() over (partition by DATEADD(month, DATEDIFF(month, 0, MyDate), 0) order by myDate) RN
FROM CTE)
SELECT DATEADD(month, DATEDIFF(month, 0, MyDate), 0), MyValue
FROM cte2
WHERE RN = 1
Giving us:
+----+---------------------+---------+
| | (No column name) | MyValue |
+----+---------------------+---------+
| 1 | 01.01.2018 00:00:00 | 2 |
| 2 | 01.02.2018 00:00:00 | 3 |
| 3 | 01.03.2018 00:00:00 | 7 |
+----+---------------------+---------+
Another option is TOP 1 WITH TIES, plus a small string trick to build the date
Example
Select top 1 with ties
MyDate = convert(varchar(7),MyDate,120)+'-01'
,MyValue
from YourTable
Order By Row_Number() over (Partition By convert(varchar(7),MyDate,120) Order By MyDate)
Returns
MyDate MyValue
2018-01-01 2
2018-02-01 3
2018-03-01 7
Here is a sample of the dataset I have (~10 TB)
+----+------------+----------+----------------+--------------+
| id | date | campaign | campaign_start | campaign_end |
+----+------------+----------+----------------+--------------+
| 1 | 2018-01-01 | 1 | 2018-01-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-01 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-02 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-03 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 2 | 2018-01-23 | 1 | 2018-01-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 2 | 2018-02-03 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
I want to:
For every unique id + campaign:
Get the frequency of occurrences of an id within the period of that specific campaign
Get the frequency of occurrences of an id within a variable lookback period (say 3 months) before the start of the campaign, i.e. "date >= campaign_start - 3 months"
Get the earliest (first) and latest (last) date in that window
What I would like the output to be is:
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| id | campaign | campaign_frequency | total_lookback_frequency | campaign_start | campaign_end | first_date | last_date |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 1 | 1 | 1 | 1 | 2018-01-01 | 2018-02-03 | 2018-01-01 | 2018-01-01 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 1 | 2 | 3 | 4 | 2018-02-01 | 2018-02-03 | 2018-01-01 | 2018-02-03 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 2 | 1 | 1 | 1 | 2018-01-01 | 2018-02-03 | 2018-01-23 | 2018-01-23 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 2 | 2 | 1 | 2 | 2018-02-01 | 2018-02-03 | 2018-01-23 | 2018-02-03 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
The problem I have been having is that I can't get the total_lookback_frequency to work properly. It always returns the same result as campaign_frequency (which is just a count(id) grouped by id, campaign).
Below is what I had (that isn't working):
SELECT
id,
campaign,
min(date) as first_date,
max(date) as end_date,
count(id) as total_lookback_frequency,
WHERE
date >= sub(date, INTERVAL 730 hour)
GROUP BY
id,
campaign,
date
Would you be able to help out here?
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
SELECT
id,
campaign,
COUNT(1) campaign_frequency,
(
SELECT COUNT(1)
FROM `project.dataset.table`
WHERE id = t.id
AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
) total_lookback_frequency,
campaign_start,
campaign_end,
MIN(dt) AS first_date,
MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
You can test and play with the above using dummy data from your question, as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2018-01-01' dt, 1 campaign, DATE '2018-01-01' campaign_start, DATE '2018-02-03' campaign_end UNION ALL
SELECT 1, '2018-02-01', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 1, '2018-02-02', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 1, '2018-02-03', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 2, '2018-01-23', 1, '2018-01-01', '2018-02-03' UNION ALL
SELECT 2, '2018-02-03', 2, '2018-02-01', '2018-02-03'
)
SELECT
id,
campaign,
COUNT(1) campaign_frequency,
(
SELECT COUNT(1)
FROM `project.dataset.table`
WHERE id = t.id
AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
) total_lookback_frequency,
campaign_start,
campaign_end,
MIN(dt) AS first_date,
MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
-- ORDER BY id, campaign
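As a cross-check of the window logic, here is a rough Python sketch of the same two counts: rows inside the campaign, and rows for the same id in the 3 months before campaign_start. It is an illustration under assumptions rather than a port of the query: `months_before` and `lookback_counts` are hypothetical helpers, and the month arithmetic simply clamps the day to the end of the target month.

```python
import calendar
from datetime import date

def months_before(d, months):
    """Shift d back by whole months, clamping the day to the month's last day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def lookback_counts(rows, months=3):
    """rows: list of (id, dt, campaign, campaign_start, campaign_end).
    Returns {(id, campaign): (campaign_frequency, lookback_frequency)}."""
    out = {}
    for cid, campaign, start in sorted({(r[0], r[2], r[3]) for r in rows}):
        lo = months_before(start, months)
        in_campaign = sum(1 for r in rows if r[0] == cid and r[2] == campaign)
        # same id, any campaign, dated in [start - 3 months, start)
        before = sum(1 for r in rows if r[0] == cid and lo <= r[1] < start)
        out[(cid, campaign)] = (in_campaign, before)
    return out

# Sample rows from the question
c1 = (date(2018, 1, 1), date(2018, 2, 3))
c2 = (date(2018, 2, 1), date(2018, 2, 3))
rows = [
    (1, date(2018, 1, 1), 1, *c1), (1, date(2018, 2, 1), 2, *c2),
    (1, date(2018, 2, 2), 2, *c2), (1, date(2018, 2, 3), 2, *c2),
    (2, date(2018, 1, 23), 1, *c1), (2, date(2018, 2, 3), 2, *c2),
]
counts = lookback_counts(rows)
```

On the sample data, adding the two numbers for each (id, campaign) pair gives 1, 4, 1 and 2, which lines up with the total_lookback_frequency column in the desired output of the question.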
We have the following activity table and would like to query it to get the number of unique users for each month and the previous month. The date field (createdat) is a timestamp. The query needs to work in PostgreSQL.
Activity table:
| id | userid | createdat | username |
|--------|--------|-------------------------|----------------|
| 1d658a | 4957f3 | 2016-12-06 21:16:35:942 | Tom Jones |
| 3a86e3 | 684edf | 2016-12-03 21:16:35:943 | Harry Smith |
| 595756 | 582107 | 2016-12-26 21:16:35:944 | William Hanson |
| 2c87fe | 784723 | 2016-12-07 21:16:35:945 | April Cordon |
| 32509a | 4957f3 | 2016-12-20 21:16:35:946 | Tom Jones |
| 72e703 | 582107 | 2017-01-01 21:16:35:947 | William Hanson |
| 6d658a | 582107 | 2016-12-06 21:16:35:948 | William Hanson |
| 5c077c | 5934c4 | 2016-12-06 21:16:35:949 | Sandra Holmes |
| 92142b | 57ea5c | 2016-12-15 21:16:35:950 | Lucy Lawless |
| 3dd0a6 | 5934c4 | 2016-12-04 21:16:35:951 | Sandra Holmes |
| 43509a | 4957f3 | 2016-11-20 21:16:35:946 | Tom Jones |
| 85142b | 57ea5c | 2016-11-15 21:16:35:950 | Lucy Lawless |
| 7c87fe | 784723 | 2017-1-07 21:16:35:945 | April Cordon |
| 9c87fe | 784723 | 2017-2-07 21:16:35:946 | April Cordon |
Results:
| Month | UserThisMonth | UserPreviousMonth |
|----------|----------------|-------------------|
| Dec 2016 | 6 | 2 |
| Jan 2017 | 2 | 6 |
| Feb 2017 | 1 | 2 |
You can try this query. It uses to_char to build the MON YYYY label, and a subquery with the lag window function to get the UserPreviousMonth count.
SELECT *
FROM (SELECT To_char(createdat, 'MON YYYY') Months,
Count(DISTINCT username) UserThisMonth,
Lag(Count(DISTINCT username)) OVER (
ORDER BY Date_part('year', createdat),
Date_part('month',createdat)
) UserPreviousMonth
FROM t
GROUP BY Date_part('year', createdat),
To_char(createdat, 'MON YYYY'),
Date_part('month', createdat)) t
WHERE userpreviousmonth IS NOT NULL
sqlfiddle:http://sqlfiddle.com/#!15/45e52/2
| months | userthismonth | userpreviousmonth |
|----------|---------------|-------------------|
| DEC 2016 | 6 | 2 |
| JAN 2017 | 2 | 6 |
| FEB 2017 | 1 | 2 |
EDIT
Values like Dec 2016 and Jan 2017 must be strings, because a date type needs a full date such as 2017-01-01. If the result needs to be sorted and used on a graph, I suggest sorting on the years and months columns from this query and building the date string on the front end.
SELECT *
FROM (SELECT Date_part('year', createdat) years,
Date_part('month', createdat) months,
Count(DISTINCT username) UserThisMonth,
Lag(Count(DISTINCT username)) OVER (
ORDER BY Date_part('year', createdat),
Date_part('month',createdat)
) UserPreviousMonth
FROM user_activity
GROUP BY Date_part('year', createdat),
Date_part('month', createdat)) t
WHERE userpreviousmonth IS NOT NULL
sqlfiddle:http://sqlfiddle.com/#!15/2da2b/4
| years | months | userthismonth | userpreviousmonth |
|-------|--------|---------------|-------------------|
| 2016 | 12 | 6 | 2 |
| 2017 | 1 | 2 | 6 |
| 2017 | 2 | 1 | 2 |
Fastest and simplest with date_trunc(). Use to_char() once to display the month in preferred format:
WITH cte AS (
SELECT date_trunc('month', createdat) AS mon
, count(DISTINCT username) AS ct
FROM activity
GROUP BY 1
)
SELECT to_char(t1.mon, 'MON YYYY') AS month
, t1.ct AS users_this_month
, t2.ct AS users_previous_month
FROM cte t1
LEFT JOIN cte t2 ON t2.mon = t1.mon - interval '1 mon'
ORDER BY t1.mon;
db<>fiddle here
You commented:
the "Month" field in the results table needs to be a "date" data type so it can be sorted and used on the graph.
For this, simply cast in the final SELECT:
SELECT t1.mon::date AS month ...
Grouping and ordering by a (truncated) timestamp value is more efficient (and reliable) than by multiple values or a text representation.
The result includes the first month ('NOV 2016' in your demo), showing NULL for users_previous_month - like for any previous month without entries. You might want to display 0 instead or drop the row ...
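For readers who want to verify the expected numbers by hand, here is a short Python sketch of the same computation: count distinct users per calendar month, then look up the month before. It is only an illustration; it counts by userid rather than username, and the function name `monthly_users` is made up.

```python
from datetime import date

def monthly_users(events):
    """events: iterable of (userid, created_on).
    Returns rows of (month label, users this month, users previous month or None)."""
    users = {}
    for userid, created_on in events:
        users.setdefault((created_on.year, created_on.month), set()).add(userid)
    rows = []
    for year, month in sorted(users):
        prev = (year, month - 1) if month > 1 else (year - 1, 12)
        prev_count = len(users[prev]) if prev in users else None
        rows.append((f"{year}-{month:02d}", len(users[(year, month)]), prev_count))
    return rows

# userid and date of each row in the question's activity table
events = [
    ("4957f3", date(2016, 12, 6)), ("684edf", date(2016, 12, 3)),
    ("582107", date(2016, 12, 26)), ("784723", date(2016, 12, 7)),
    ("4957f3", date(2016, 12, 20)), ("582107", date(2017, 1, 1)),
    ("582107", date(2016, 12, 6)), ("5934c4", date(2016, 12, 6)),
    ("57ea5c", date(2016, 12, 15)), ("5934c4", date(2016, 12, 4)),
    ("4957f3", date(2016, 11, 20)), ("57ea5c", date(2016, 11, 15)),
    ("784723", date(2017, 1, 7)), ("784723", date(2017, 2, 7)),
]
report = monthly_users(events)
```

As with the LEFT JOIN version, the first month (2016-11) is kept and has no previous-month value.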
Related:
How to get the date and time from timestamp in PostgreSQL select query?
PostgreSQL: running count of rows for a query 'by minute'
Aside: usernames in the form of "Tom Jones" are typically not unique. You'll want to operate with a unique ID instead.
Edit:
Shamelessly using @D-Shih's superior method of generating year/month combinations.
A couple of solutions:
WITH ua AS (
SELECT
TO_CHAR(createdate, 'YYYYMM') AS year_month,
COUNT(DISTINCT userid) distinct_users
FROM user_activity
GROUP BY
TO_CHAR(createdate, 'YYYYMM')
)
SELECT * FROM (
SELECT
TO_DATE(ua.year_month || '01', 'YYYYMMDD')
+ INTERVAL '1 month'
- INTERVAL '1 day'
AS month_end,
ua.distinct_users,
LAG(ua.distinct_users) OVER (ORDER BY ua.year_month) distinct_users_last_month
FROM ua
) uas WHERE uas.distinct_users_last_month IS NOT NULL
ORDER BY month_end DESC;
No windowing required:
WITH ua AS (
SELECT
TO_CHAR(createdate, 'YYYYMM') AS year_month,
TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM') AS last_month,
COUNT(DISTINCT userid) AS distinct_users
FROM user_activity
GROUP BY
TO_CHAR(createdate, 'YYYYMM'),
TO_CHAR(createdate - INTERVAL '1 MONTH', 'YYYYMM')
)
SELECT
TO_DATE(ua1.year_month || '01', 'YYYYMMDD')
+ INTERVAL '1 month'
- INTERVAL '1 day'
AS month_end,
ua1.distinct_users,
ua2.distinct_users AS last_distinct_users
FROM
ua ua1 LEFT OUTER JOIN ua ua2
ON ua2.year_month = ua1.last_month
WHERE ua2.distinct_users IS NOT NULL
ORDER BY ua1.year_month DESC;
DDL:
CREATE TABLE user_activity (
id varchar(50),
userid varchar(50),
createdate timestamp,
username varchar(50)
);
COMMIT;
Data:
INSERT INTO user_activity VALUES ('1d658a','4957f3','20161206 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('3a86e3','684edf','20161203 21:16:35'::timestamp,'Harry Smith');
INSERT INTO user_activity VALUES ('595756','582107','20161226 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('2c87fe','784723','20161207 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('32509a','4957f3','20161220 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('72e703','582107','20170101 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('6d658a','582107','20161206 21:16:35'::timestamp,'William Hanson');
INSERT INTO user_activity VALUES ('5c077c','5934c4','20161206 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('92142b','57ea5c','20161215 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('3dd0a6','5934c4','20161204 21:16:35'::timestamp,'Sandra Holmes');
INSERT INTO user_activity VALUES ('43509a','4957f3','20161120 21:16:35'::timestamp,'Tom Jones');
INSERT INTO user_activity VALUES ('85142b','57ea5c','20161115 21:16:35'::timestamp,'Lucy Lawless');
INSERT INTO user_activity VALUES ('7c87fe','784723','20170107 21:16:35'::timestamp,'April Cordon');
INSERT INTO user_activity VALUES ('9c87fe','784723','20170207 21:16:35'::timestamp,'April Cordon');
COMMIT;
Basic T-SQL user here. I'm having problems trying to complete a task and would appreciate some guidance. Apologies in advance for any errors as English is not my mother tongue.
I have a table with a lot of transactions, for the sake of simplicity let's say that I only have two columns: CUSTOMER_ID, which is my customer and DATE which is the date of the transaction.
My customers make a lot of transactions while they're in town, but then weeks, months or even years can pass before they come back and start making transactions again. I would like to identify each one of those "Trips" and group the transactions involved, then I'd like to do things like calculate trip duration, number of transactions, etc.
I'd like to consider a Trip as any new transaction occurring after an IDLE period of 10 days.
Let me try to better explain my request by using some simple example:
This is my transactions table:
+-------------+------------+
| CUSTOMER_ID | DATE |
+-------------+------------+
| JHON | 01-01-2016 |
| JHON | 01-02-2016 |
| PEDRO | 01-02-2016 |
| JHON | 01-05-2016 |
| MIKE | 01-05-2016 |
| MIKE | 01-10-2016 |
| JHON | 01-07-2016 |
| … | … |
| JHON | 02-15-2016 |
| JHON | 02-18-2016 |
| MIKE | 02-19-2016 |
| MIKE | 02-19-2016 |
+-------------+------------+
So far I've made this query to enumerate the customer's visits:
SELECT
CUSTOMER_ID,
DATE,
ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM
FROM
TRANSACTIONS
WHERE
CUSTOMER_ID IN ('JHON','MIKE','PEDRO')
Running that query would give a result similar to this:
+-------------+------------+-----------+
| CUSTOMER_ID | DATE | VISIT_NUM |
+-------------+------------+-----------+
| JHON | 01-01-2016 | 1 |
| JHON | 01-02-2016 | 2 |
| JHON | 01-07-2016 | 3 |
| JHON | 02-15-2016 | 4 |
| JHON | 02-18-2016 | 5 |
| MIKE | 01-05-2016 | 1 |
| MIKE | 01-10-2016 | 2 |
| MIKE | 02-19-2016 | 3 |
| MIKE | 02-19-2016 | 4 |
| PEDRO | 01-02-2016 | 1 |
+-------------+------------+-----------+
Now comes the tricky part: I need to create a query that (maybe using the above query as a previous step) shows me each customer with their trip info. Continuing with the example, the ideal result would be like this:
+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON | 1 | 01-01-2016 | 01-07-2016 | 7 | 3 |
| JHON | 2 | 02-15-2016 | 02-18-2016 | 3 | 2 |
| MIKE | 1 | 01-05-2016 | 01-10-2016 | 5 | 2 |
| MIKE | 2 | 02-19-2016 | 02-19-2016 | 1 | 2 |
| PEDRO | 1 | 01-02-2016 | 01-02-2016 | 1 | 1 |
+-------------+----------+---------------+-------------+---------------+--------------+
As you can see, Mr. Jhon came 3 times during the month of January and came back again in February. As more than 10 days passed from his last transaction in January, I'd like to consider his new set of transactions as a new "trip" for him. Mike also had some activity in January, and came back in February too, in his second trip he made two transactions in the same day, I'd like to account that too. If a customer only came a single day and had some activity (as the case of Mr. Pedro) I'd also like to consider that single-day, single-transaction record as a trip record.
I would greatly appreciate any light on this. I honestly have no idea how to proceed (I've been reading about cursors, but they seem like dark magic at this point and I can't figure out a way to apply them to this).
Apologies again for any grammatical errors and any possible omissions on my part. I'd further clarify anything if necessary.
The trip duration in your example is not calculated the same way for every customer, so I have tweaked it to follow the rule used for the first customer ID throughout.
;with cte
as
(select cid,datee,datepart(month,datee) as monthh,
dense_rank () over (partition by cid order by datepart(month,datee)) as samemonth,
count(0) over (partition by cid,datepart(month,datee) ) as cnt
from #temp
)
,cte1 as
(
select cid,max(samemonth) as tripnumber,min(datee) as startdate,max(datee) as enddate,
max(cnt) as numberoftransactions
from cte
group by cid,samemonth
)
select *,datediff(day,startdate,dateadd(day,1,enddate))as duration
from cte1
Output:
cid tripnumber startdate enddate numberoftransactions duration
JHON 1 2016-01-01 2016-01-07 3 7
JHON 2 2016-02-15 2016-02-18 2 4
MIKE 1 2016-01-05 2016-01-10 2 6
MIKE 2 2016-02-19 2016-02-19 2 1
PEDRO 1 2016-01-02 2016-01-02 1 1
I found the perfect answer elsewhere. All credit goes to the Reddit user nvarscar for the amazing solution!
I'll just copy his/her answer below, in case someone else needs it in the future:
You may use a window function feature, which helps you to aggregate
rows between current row and all preceding ones. The code looks too
long, but at least you will see the steps taken.
CREATE TABLE #t
([CUSTOMER_ID] varchar(5), [DATE] datetime)
;
INSERT INTO #t
([CUSTOMER_ID], [DATE])
VALUES
('JHON', '2016-01-01 00:00:00'),
('JHON', '2016-01-02 00:00:00'),
('PEDRO', '2016-01-02 00:00:00'),
('JHON', '2016-01-05 00:00:00'),
('MIKE', '2016-01-05 00:00:00'),
('MIKE', '2016-01-10 00:00:00'),
('JHON', '2016-01-07 00:00:00'),
('JHON', '2016-02-15 00:00:00'),
('JHON', '2016-02-18 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('JHON', '2016-02-01 00:00:00'),
('JHON', '2016-02-02 00:00:00'),
('PEDRO', '2016-03-02 00:00:00'),
('JHON', '2016-03-05 00:00:00'),
('MIKE', '2016-05-05 00:00:00'),
('MIKE', '2016-05-10 00:00:00'),
('JHON', '2016-03-07 00:00:00'),
('JHON', '2016-04-15 00:00:00'),
('JHON', '2016-04-18 00:00:00'),
('MIKE', '2016-06-19 00:00:00'),
('MIKE', '2016-06-19 00:00:00')
;
WITH CTE1 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM #t
GROUP BY
[CUSTOMER_ID]
, [DATE]
)
, CTE2 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
, CTE3 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
, CTE4 AS (
SELECT
[CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
SELECT
[CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
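The same idle-gap rule can be sketched in a few lines of Python for a single customer. This is an illustration of the logic in the CTEs above, not a replacement for them: a new trip starts whenever more than 10 days have passed since the previous transaction. The function name `trips` and the `idle_days` parameter are assumptions made for the example.

```python
from datetime import date

def trips(dates, idle_days=10):
    """dates: transaction dates for one customer (duplicates allowed).
    Returns (trip_start, trip_end, transactions) per trip."""
    out = []
    for d in sorted(dates):
        if out and (d - out[-1][1]).days <= idle_days:
            # still within the idle window: extend the current trip
            start, _, n = out[-1]
            out[-1] = (start, d, n + 1)
        else:
            out.append((d, d, 1))
    return out

# MIKE's transactions from the question's example table
mike = [date(2016, 1, 5), date(2016, 1, 10), date(2016, 2, 19), date(2016, 2, 19)]
mike_trips = trips(mike)
```

Both transactions on 2016-02-19 are counted, so MIKE's second trip has one day and two transactions, as requested in the question.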
I have a table such as the following:
+----+----------------+-------------------------+
| id | employeeNumber | transactionTime |
+----+----------------+-------------------------+
| 1 | 1234 | 2016-02-23 15:11:00.000 |
+----+----------------+-------------------------+
| 2 | 1234 | 2016-02-22 11:01:00.000 |
+----+----------------+-------------------------+
| 3 | 1235 | 2016-02-22 07:22:00.000 |
+----+----------------+-------------------------+
| 4 | 1236 | 2016-02-20 09:16:00.000 |
+----+----------------+-------------------------+
| 5 | 1236 | 2016-02-19 11:01:00.000 |
+----+----------------+-------------------------+
| 6 | 1236 | 2016-02-18 11:44:00.000 |
+----+----------------+-------------------------+
| 7 | 1236 | 2016-02-17 12:12:00.000 |
+----+----------------+-------------------------+
| 8 | 1236 | 2016-02-16 11:09:00.000 |
+----+----------------+-------------------------+
| 9 | 1236 | 2016-02-15 11:19:00.000 |
+----+----------------+-------------------------+
| 10 | 1236 | 2016-02-14 09:12:00.000 |
+----+----------------+-------------------------+
I need to find a way to return the number of consecutive days on which each employee logged a transaction over the past 2 weeks, such as this:
+------+--------------+-------------------------+-------------------------+
| days |employeeNumber| startTime | endTime |
+------+--------------+-------------------------+-------------------------+
| 2 | 1234 | 2016-02-22 11:01:00.000 | 2016-02-23 15:11:00.000 |
+------+--------------+-------------------------+-------------------------+
| 1 | 1235 | 2016-02-22 07:22:00.000 | 2016-02-22 07:22:00.000 |
+------+--------------+-------------------------+-------------------------+
| 7 | 1236 | 2016-02-14 09:12:00.000 | 2016-02-20 09:16:00.000 |
+------+--------------+-------------------------+-------------------------+
I have been working with the following query, but it only returns a single user and doesn't restrict the results to the past 2 weeks.
WITH
dates(date) AS (
SELECT DISTINCT CAST(transactionTime AS DATE)
FROM Fuel.dbo.comdata
WHERE employeeNumber = 123456
),
groups AS (
SELECT ROW_NUMBER() OVER (ORDER BY date) AS rn,
DATEADD(DAY, -ROW_NUMBER() OVER (ORDER BY date), date) AS grp,
date
FROM dates
)
SELECT COUNT(*) AS consecutiveDates,
MIN(date) AS minDate, MAX(date) AS maxDate
FROM groups
GROUP BY grp
ORDER BY 1 DESC, 2 DESC
Any help is appreciated.
UPDATE
Thanks to Gordon Linoff's answer below, I have found the following query very helpful. However, I notice that the min/max dates don't match the number of consecutive days, as shown here with live data:
SELECT * FROM (
SELECT employeeNumber, COUNT(*) AS consecutiveDays,
MIN(transactionTime) AS startTime, MAX(transactionTime) AS endTime
FROM (
SELECT cd.*, DATEADD(DAY, -DENSE_RANK() OVER (PARTITION BY
employeeNumber ORDER BY transactionTime), CAST(transactionTime AS
DATE)) AS grp
FROM Fuel.dbo.comdata cd
WHERE transactionTime >= DATEADD(DAY, -14, GETDATE())
) cd
GROUP BY employeeNumber, grp
) AS tbl1
WHERE consecutiveDays >= 7
+--------+------+-------------------------+-------------------------+
| empNum | days | startTime | endTime |
+--------+------+-------------------------+-------------------------+
| 16742 | 7 | 2016-04-28 17:00:00.000 | 2016-05-07 17:04:00.000 |
+--------+------+-------------------------+-------------------------+
| 15056 | 8 | 2016-04-27 13:03:00.000 | 2016-05-08 09:51:00.000 |
+--------+------+-------------------------+-------------------------+
As you can see the number of consecutive days does not match the start/end time. Any ideas?
I would do this with the difference-of-row-numbers approach (assuming there is never more than one record per day per employee):
select employeeNumber, count(*) as numdays,
min(transactionTime) as startTime, max(transactionTime) as endTime
from (select cd.*,
dateadd(day,
- row_number() over (partition by employeeNumber order by transactionTime),
cast(transactionTime as date)
) as grp
from Fuel.dbo.comdata cd
) cd
group by employeeNumber, grp;
The idea is to generate a series of sequential numbers for each employee based on the transactionTime. The difference between this and the transactionTime is constant, when the transactions are on consecutive days.
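Here is a small Python sketch of why that difference is constant, using one employee's dates. It is an illustration only: each distinct date has its position in the sorted list subtracted from it, and dates that share the resulting anchor value form one run. The name `consecutive_day_runs` is made up for the example.

```python
from datetime import date, timedelta
from itertools import groupby

def consecutive_day_runs(days):
    """days: dates for one employee. Returns (start, end, length) per run."""
    ordered = sorted(set(days))
    # date minus its 0-based position stays constant inside a run of consecutive days
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    runs = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        members = [d for _, d in group]
        runs.append((members[0], members[-1], len(members)))
    return runs

# Employee 1236 from the question (2016-02-14 .. 2016-02-20), plus one isolated day
days_1236 = [date(2016, 2, d) for d in range(14, 21)] + [date(2016, 2, 25)]
runs_1236 = consecutive_day_runs(days_1236)
```

Deduplicating the dates first (the `set` call) plays the role that ranking by date, rather than by timestamp, plays in SQL when an employee has several transactions on one day.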
If you can have multiple transactions on the same day, use dense_rank() ordered by the date (not the full timestamp) and count distinct days:
select employeeNumber, count(distinct cast(transactionTime as date)) as numdays,
min(transactionTime) as startTime, max(transactionTime) as endTime
from (select cd.*,
dateadd(day,
- dense_rank() over (partition by employeeNumber order by cast(transactionTime as date)),
cast(transactionTime as date)
) as grp
from Fuel.dbo.comdata cd
) cd
group by employeeNumber, grp;