Just wandering, will it be possible to get data into every 5 mins group by name in SQL Server.
Example: I have the following data in a table:
id name message datetime
--- -------- ------- --------
1 David test 1 2017-12-18 10:00
2 David test 2 2017-12-18 10:01
3 David test 3 2017-12-18 10:03
4 Alvin bluh 1 2017-12-18 10:04
5 Alvin bluh 2 2017-12-18 10:04
6 David test 4 2017-12-18 10:06
How can I get the result as below in SQL Server?
id name message datetime
--- -------- ------- --------
1 David test 1 2017-12-18 10:00
2 David test 2
3 David test 3
4 Alvin bluh 1 2017-12-18 10:04
5 Alvin bluh 2
6 David test 4 2017-12-18 10:06
Group by Name, and group datetime by 5 mins
select id, name, message,
datetime = case when rn = 1 then datetime end
from (
select *, rn = row_number() over (partition by name,
dateadd(minute, datediff(minute, 0, datetime) / 5 * 5, 0)
order by datetime)
from yourtable t
) d
order by id
Related
I have table dbo.WorkSchedules(Id, From, To) where I store date ranges for work schedules. I want to create a view that will have all dates for all rows of WorkSchedules. Thanks to this I have 1 view with all dates for all schedules.
On web I only found solutions for 1 row like 2 parameters start and end. My issue is different where I have multiple rows with start and end range.
Example:
WorkSchedules
Id | From | To
---+------------+-----------
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
Desired result
1 | 2018-01-01
2 | 2018-01-02
3 | 2018-01-03
4 | 2018-01-04
5 | 2018-01-05
6 | 2018-01-08
7 | 2018-01-09
8 | 2018-01-10
9 | 2018-01-11
10| 2018-01-12
If you are regularly dealing with "jobs" and "schedules" then I propose that you need a permanent calendar table (a table where each row is a unique date). You can create rows for dates dynamically but why do this many times when you can do it once and just re-use?
A calendar table, even of several decades, isn't "big" and when indexed they can be very fast as well. You can also store information about holidays and/or fiscal periods etc.
There are many scripts available to produce these tables, here's an answer with 2 scripts on this site: https://stackoverflow.com/a/5635628/2067753
Assuming you use the second (more comprehensive) script, then you can exclude weekends, or other conditions such as holidays, from query results.
Once you have a permanent Calendar table this style of query may be used:
CREATE TABLE WorkSchedules(
Id INTEGER NOT NULL PRIMARY KEY
,[From] DATE NOT NULL
,[To] DATE NOT NULL
);
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (1,'2018-01-01','2018-01-05');
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (2,'2018-01-12','2018-01-12');
with range as (
select min(ws.[From]) as dt_from, max(ws.[To]) dt_to
from WorkSchedules as ws
)
select c.*
from calendar as c
inner join range on c.date between range.dt_from and range.dt_to
where c.KindOfDay = 'BANKDAY'
order by c.date
and the result looks like this (note: "News Years Day" has been excluded)
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- -------------
1 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
2 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
3 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
4 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
5 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
6 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
7 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
8 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
9 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Without the where clause the full range is:
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- ----------------
1 01.01.2018 00:00:00 2018 1 1 1 1 1 1 2018 1 1 HOLIDAY New Year's Day
2 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
3 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
4 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
5 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
6 06.01.2018 00:00:00 2018 1 1 1 6 6 6 2018 1 1 SATURDAY NULL
7 07.01.2018 00:00:00 2018 1 1 1 7 7 7 2018 1 1 SUNDAY NULL
8 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
9 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
10 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
11 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
12 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
and weekends and holidays may be excluded using the column KindOfDay
See this as a demonstration (with build of calendar table) here: http://rextester.com/CTSW63441
Ok, I worked this out for you, thinking you mean that you meant 01/08/2018 as a From date in the second row.
/*WorkSchedules
Id| From | To
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
*/
--DROP TABLE #WorkSchedules;
CREATE TABLE #WorkSchedules (
ID int,
[DateFrom] DATE,
[DateTo] DATE
)
INSERT INTO #WorkSchedules
SELECT 1, '2018-01-01', '2018-01-05'
UNION
SELECT 2, '2018-01-08', '2018-01-12'
;WITH CTEDATELIMITS AS (
SELECT [DateFrom], [DateTo]
FROM #WorkSchedules
)
,CTEDATES AS
(
SELECT [DateFrom] as [DateResult] FROM CTEDATELIMITS
UNION ALL
SELECT DATEADD(Day, 1, [DateResult]) FROM CTEDATES
JOIN CTEDATELIMITS ON CTEDATES.[DateResult] >= CTEDATELIMITS.[DateFrom]
AND CTEDATES.dateResult < CTEDATELIMITS.[DateTo]
)
SELECT [DateResult] FROM CTEDATES
ORDER BY [DateResult]
You would use a recursive CTE:
with dates as (
select from, to, from as date
from WorkSchedules
union all
select from, to, dateadd(day, 1, date)
from dates
where date < to
)
select row_number() over (order by date), date
from dates;
Note that from and to are reserved words in SQL. They are lousy names for identifiers. I have not escaped them because I assume they are not the actual names of the columns.
I am trying to collapse data that is in a sequence sorted by date. While grouping on the person and the type.
The data is stored in an SQL server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
ROW_NUMBER() OVER (ORDER BY date) AS seq
person,
date,
type,
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem so, you can use differences of row_number() & use them in grouping :
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.
i have data like this table
ItemId Value Date
1 2 2017-12-18 17:00:00.000
1 2 2017-12-18 17:02:00.000
1 2 2017-12-18 17:04:00.000
1 3 2017-12-18 17:06:00.000
1 3 2017-12-18 17:08:00.000
1 2 2017-12-18 17:10:00.000
1 2 2017-12-18 17:12:00.000
1 2 2017-12-18 17:16:00.000
1 4 2017-12-18 17:14:00.000
i want to output like this in sql server
ItemId Value MaxDate
1 2 2017-12-18 17:04:00.000
1 3 2017-12-18 17:08:00.000
1 2 2017-12-18 17:16:00.000
1 4 2017-12-18 17:14:00.000
thanks for your anwsers.
You appear to want the last row before value changes, although I'm not sure where value "4" comes from (my best guess is that the last input row should have a "4" and a different timestamp).
If so, you can simply use lead():
select t.*
from (select t.*,
lead(value) over (partition by itemId order by date) as next_value
from t
) t
where next_value is null or next_value <> value;
I have a table that looks like:
id code date1 date2 block
--------------------------------------------------
20 1234 2017-07-01 2017-07-31 1
15 1234 2017-06-01 2017-06-30 1
13 1234 2017-05-01 2017-05-31 0
11 1234 2017-03-01 2017-03-31 0
9 1234 2017-02-01 2017-02-28 1
8 1234 2017-01-01 2017-01-31 0
7 1234 2016-11-01 2016-11-31 0
6 1234 2016-10-01 2016-10-31 1
2 1234 2016-09-01 2016-09-31 1
I need to rank the rows according to the blocks of 0's and 1's, like:
id code date1 date2 block desired_rank
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 1
13 1234 2017-05-01 2017-05-31 0 2
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 4
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 5
2 1234 2016-09-01 2016-09-31 1 5
I've tried to use rank() and dense_rank(), but the result I end up with is:
id code date1 date2 block dense_rank()
-------------------------------------------------------------------
20 1234 2017-07-01 2017-07-31 1 1
15 1234 2017-06-01 2017-06-30 1 2
13 1234 2017-05-01 2017-05-31 0 1
11 1234 2017-03-01 2017-03-31 0 2
9 1234 2017-02-01 2017-02-28 1 3
8 1234 2017-01-01 2017-01-31 0 3
7 1234 2016-11-01 2016-11-31 0 4
6 1234 2016-10-01 2016-10-31 1 4
2 1234 2016-09-01 2016-09-31 1 5
In the last table, the rank doesn't care about the rows, it just takes all the 1's and 0's as a unit and sets an ascending count starting at the first 1 and 0.
My query goes like this:
CREATE TEMP TABLE data (id integer,code text, date1 date, date2 date, block integer);
INSERT INTO data VALUES
(20,'1234', '2017-07-01','2017-07-31',1),
(15,'1234', '2017-06-01','2017-06-30',1),
(13,'1234', '2017-05-01','2017-05-31',0),
(11,'1234', '2017-03-01','2017-03-31',0),
(9, '1234', '2017-02-01','2017-02-28',1),
(8, '1234', '2017-01-01','2017-01-31',0),
(7, '1234', '2016-11-01','2016-11-30',0),
(6, '1234', '2016-10-01','2016-10-31',1),
(2, '1234', '2016-09-01','2016-09-30',1);
SELECT *,dense_rank() OVER (PARTITION BY code,block ORDER BY date2 DESC)
FROM data
ORDER BY date2 DESC;
By the way, the database is in postgreSQL.
I hope there's a workaround... Thanks :)
Edit: Note that the blocks of 0's and 1's aren't equal.
There's no way to get this result using a single Window Function:
SELECT *,
Sum(flag) -- now sum the 0/1 to create the "rank"
Over (PARTITION BY code
ORDER BY date2 DESC)
FROM
(
SELECT *,
CASE
WHEN Lag(block) -- check if this is the 1st row of a new block
Over (PARTITION BY code
ORDER BY date2 DESC) = block
THEN 0
ELSE 1
END AS flag
FROM DATA
) AS dt
I have two SQL Server tables containing the following information:
Table t_venues:
venue_id is unique
venue_id | start_date | end_date
1 | 01/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 15/01/2014
4 | 20/01/2014 | 30/01/2014
Table t_venueuser:
venue_id is not unique
venue_id | start_date | end_date
1 | 02/01/2014 | 02/01/2014
2 | 05/01/2014 | 05/01/2014
3 | 09/01/2014 | 10/01/2014
4 | 23/01/2014 | 25/01/2014
From these two tables I need to find the dates that haven't been selected for each range, so the output would look like this:
venue_id | start_date | end_date
1 | 01/01/2014 | 01/01/2014
3 | 11/01/2014 | 15/01/2014
4 | 20/01/2014 | 22/01/2014
4 | 26/01/2014 | 30/01/2014
I can compare the two tables and get the date ranges from t_venues to appear in my query using 'except' but I can't get the query to produce the non-selected dates. Any help would be appreciated.
Calendar Table!
Another perfect candidate for a calendar table. If you can't be bothered to search for one, here's one I made earlier.
Setup Data
DECLARE #t_venues table (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venues (venue_id, start_date, end_date)
VALUES (1, '2014-01-01', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-15')
, (4, '2014-01-20', '2014-01-30')
;
DECLARE #t_venueuser table (
venue_id int
, start_date date
, end_date date
);
INSERT INTO #t_venueuser (venue_id, start_date, end_date)
VALUES (1, '2014-01-02', '2014-01-02')
, (2, '2014-01-05', '2014-01-05')
, (3, '2014-01-09', '2014-01-10')
, (4, '2014-01-23', '2014-01-25')
;
The Query
SELECT t_venues.venue_id
, calendar.the_date
, CASE WHEN t_venueuser.venue_id IS NULL THEN 1 ELSE 0 END As is_available
FROM dbo.calendar /* see: http://gvee.co.uk/files/sql/dbo.numbers%20&%20dbo.calendar.sql for an example */
INNER
JOIN #t_venues As t_venues
ON t_venues.start_date <= calendar.the_date
AND t_venues.end_date >= calendar.the_date
LEFT
JOIN #t_venueuser As t_venueuser
ON t_venueuser.venue_id = t_venues.venue_id
AND t_venueuser.start_date <= calendar.the_date
AND t_venueuser.end_date >= calendar.the_date
ORDER
BY t_venues.venue_id
, calendar.the_date
;
The Result
venue_id the_date is_available
----------- ----------------------- ------------
1 2014-01-01 00:00:00.000 1
1 2014-01-02 00:00:00.000 0
2 2014-01-05 00:00:00.000 0
3 2014-01-09 00:00:00.000 0
3 2014-01-10 00:00:00.000 0
3 2014-01-11 00:00:00.000 1
3 2014-01-12 00:00:00.000 1
3 2014-01-13 00:00:00.000 1
3 2014-01-14 00:00:00.000 1
3 2014-01-15 00:00:00.000 1
4 2014-01-20 00:00:00.000 1
4 2014-01-21 00:00:00.000 1
4 2014-01-22 00:00:00.000 1
4 2014-01-23 00:00:00.000 0
4 2014-01-24 00:00:00.000 0
4 2014-01-25 00:00:00.000 0
4 2014-01-26 00:00:00.000 1
4 2014-01-27 00:00:00.000 1
4 2014-01-28 00:00:00.000 1
4 2014-01-29 00:00:00.000 1
4 2014-01-30 00:00:00.000 1
(21 row(s) affected)
The Explanation
Our calendar tables contains an entry for every date.
We join our t_venues (as an aside, if you have the choice, lose the t_ prefix!) to return every day between our start_date and end_date. Example output for venue_id=4 for just this join:
venue_id the_date
----------- -----------------------
4 2014-01-20 00:00:00.000
4 2014-01-21 00:00:00.000
4 2014-01-22 00:00:00.000
4 2014-01-23 00:00:00.000
4 2014-01-24 00:00:00.000
4 2014-01-25 00:00:00.000
4 2014-01-26 00:00:00.000
4 2014-01-27 00:00:00.000
4 2014-01-28 00:00:00.000
4 2014-01-29 00:00:00.000
4 2014-01-30 00:00:00.000
(11 row(s) affected)
Now we have one row per day, we [outer] join our t_venueuser table. We join this in much the same manner as before, but with one added twist: we need to join based on the venue_id too!
Running this for venue_id=4 gives this result:
venue_id the_date t_venueuser_venue_id
----------- ----------------------- --------------------
4 2014-01-20 00:00:00.000 NULL
4 2014-01-21 00:00:00.000 NULL
4 2014-01-22 00:00:00.000 NULL
4 2014-01-23 00:00:00.000 4
4 2014-01-24 00:00:00.000 4
4 2014-01-25 00:00:00.000 4
4 2014-01-26 00:00:00.000 NULL
4 2014-01-27 00:00:00.000 NULL
4 2014-01-28 00:00:00.000 NULL
4 2014-01-29 00:00:00.000 NULL
4 2014-01-30 00:00:00.000 NULL
(11 row(s) affected)
See how we have a NULL value for rows where there is no t_venueuser record. Genius, no? ;-)
So in my first query I gave you a quick CASE statement that shows availability (1=available, 0=not available). This is for illustration only, but could be useful to you.
You can then either wrap the query up and then apply an extra filter on this calculated column or simply add a where clause in: WHERE t_venueuser.venue_id IS NULL and that will do the same trick.
This is a complete hack, but it gives the results you require, I've only tested it on the data you provided so there may well be gotchas with larger sets.
In general what you are looking at solving here is a variation of gaps and islands problem ,this is (briefly) a sequence where some items are missing. The missing items are referred as gaps and the existing items are referred as islands. If you would like to understand this issue in general check a few of the articles:
Simple talk article
blogs.MSDN article
SO answers tagged gaps-and-islands
Code:
;with dates as
(
SELECT vdates.venue_id,
vdates.vdate
FROM ( SELECT DATEADD(d,sv.number,v.start_date) vdate
, v.venue_id
FROM t_venues v
INNER JOIN master..spt_values sv
ON sv.type='P'
AND sv.number BETWEEN 0 AND datediff(d, v.start_date, v.end_date)) vdates
LEFT JOIN t_venueuser vu
ON vdates.vdate >= vu.start_date
AND vdates.vdate <= vu.end_date
AND vdates.venue_id = vu.venue_id
WHERE ISNULL(vu.venue_id,-1) = -1
)
SELECT venue_id, ISNULL([1],[2]) StartDate, [2] EndDate
FROM (SELECT venue_id, rDate, ROW_NUMBER() OVER (PARTITION BY venue_id, DateType ORDER BY rDate) AS rType, DateType as dType
FROM( SELECT d1.venue_id
,d1.vdate AS rDate
,'1' AS DateType
FROM dates AS d1
LEFT JOIN dates AS d0
ON DATEADD(d,-1,d1.vdate) = d0.vdate
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 1
AND ISNULL(d0.vdate, '01 Jan 1753') = '01 Jan 1753'
UNION
SELECT d1.venue_id
,ISNULL(d2.vdate,d1.vdate)
,'2'
FROM dates AS d1
LEFT JOIN dates AS d2
ON DATEADD(d,1,d1.vdate) = d2.vdate
WHERE CASE ISNULL(d2.vdate, '01 Jan 1753') WHEN '01 Jan 1753' THEN '2' ELSE '1' END = 2
) res
) src
PIVOT (MIN (rDate)
FOR dType IN
( [1], [2] )
) AS pvt
Results:
venue_id StartDate EndDate
1 2014-01-01 2014-01-01
3 2014-01-11 2014-01-15
4 2014-01-20 2014-01-22
4 2014-01-26 2014-01-30