TSQL - Sum on time column with condition - sql

I need a query that totals the time during which a bit was set ON (or OFF).
Example:
╔═══════════════════════════╗
║ TABLE ║
╠════╦════════╦═════╦═══════╣
║ ID ║ TIME ║ BIT ║ VALUE ║
╠════╬════════╬═════╬═══════╣
║ 1 ║ 13:40 ║ 1 ║ 5 ║
║ 2 ║ 13:45 ║ 1 ║ 3 ║
║ 3 ║ 13:50 ║ 1 ║ 1 ║
║ 4 ║ 13:55 ║ 0 ║ 2 ║
║ 5 ║ 14:00 ║ 0 ║ 7 ║
║ 6 ║ 14:05 ║ 1 ║ 3 ║
║ 7 ║ 14:10 ║ 1 ║ 4 ║
║ 8 ║ 14:15 ║ 0 ║ 2 ║
║ 9 ║ 14:20 ║ 1 ║ 2 ║
╚════╩════════╩═════╩═══════╝
I would like to have the total TIME (and the total VALUE, which is the simpler one) for the periods when the BIT was set ON:
13:40 - 13:50 = 10 mins
14:05 - 14:10 = 5 mins
14:20 = no end time, 0 mins
-----------------------------------------
15 mins
Have found:
How to sum up time field in SQL Server = Sort of good question, but there is static time as start (0:00:00) which won't work in this case
Aggregate function over a given time interval = there is aggregation but not conditioned and works on all data
I thought this could be done with a recursive function that passes along the last processed datetime and sums the elapsed time for as long as the BIT is ON.
SQL query for summing the VALUE (easy one):
SELECT SUM(Value)
FROM Table
WHERE Bit = 1
How should I get the total number of minutes during which the BIT was set ON?
EDIT: Query which can be used for testing:
DECLARE @Table TABLE(
ID INT Identity(1,1) PRIMARY KEY,
[TIME] DATETIME NOT NULL,
[BIT] BIT NOT NULL,
[VALUE] INT NOT NULL
);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('13:40',1,5);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('13:45',1,3);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('13:50',1,1);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('13:55',0,2);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('14:00',0,7);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('14:05',1,3);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('14:10',1,4);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('14:15',0,2);
INSERT INTO @Table([TIME],[BIT],[VALUE]) VALUES('14:20',1,2);
SELECT * FROM @Table;

Use the LEAD function to get the time of the next row and calculate the interval between the two. Then just group the result by [bit]:
WITH t AS(
SELECT
[time],
DATEDIFF(minute, [time], LEAD([time], 1, null) OVER (ORDER BY [time])) AS interval,
[bit],
[value]
FROM table1)
SELECT [bit], CAST(DATEADD(MINUTE, SUM(interval), '00:00') AS TIME), SUM([value]) FROM t
GROUP BY [bit]
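It may help to check what this approach actually measures: each row's interval runs to the next row regardless of that row's bit, so a run of ON rows is also credited with the gap up to the following OFF row. The sketch below replays the same idea on the question's sample through Python's sqlite3 module. It assumes a SQLite build of 3.25 or later (needed for window functions); the date 2024-01-01 and the table name t are made up for illustration, and strftime('%s', ...) stands in for DATEDIFF.

```python
import sqlite3

# Sample rows from the question as (id, time, bit, value).
# The calendar date is an assumption; the question only gives times of day.
ROWS = [
    (1, "2024-01-01 13:40:00", 1, 5),
    (2, "2024-01-01 13:45:00", 1, 3),
    (3, "2024-01-01 13:50:00", 1, 1),
    (4, "2024-01-01 13:55:00", 0, 2),
    (5, "2024-01-01 14:00:00", 0, 7),
    (6, "2024-01-01 14:05:00", 1, 3),
    (7, "2024-01-01 14:10:00", 1, 4),
    (8, "2024-01-01 14:15:00", 0, 2),
    (9, "2024-01-01 14:20:00", 1, 2),
]


def lead_minutes_per_bit(rows):
    """Sum, per bit value, the minutes from each row to the row after it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, time TEXT, bit INTEGER, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH s AS (
            SELECT bit,
                   (CAST(strftime('%s', LEAD(time) OVER (ORDER BY time)) AS INTEGER)
                    - CAST(strftime('%s', time) AS INTEGER)) / 60 AS interval
            FROM t
        )
        SELECT bit, SUM(interval) FROM s GROUP BY bit ORDER BY bit
    """
    result = dict(con.execute(query).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(lead_minutes_per_bit(ROWS))
```

On the sample data this gives 25 minutes for bit = 1 rather than the 15 minutes listed in the question, so it fits best when a reading is considered valid until the next reading arrives.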

You have two issues: summing up the time and identifying the adjacent values. You can handle the second with the difference-of-row-numbers approach. You can handle the former by converting to minutes:
select bit, min(time), max(time),
sum(datediff(minute, 0, time)) as minutes,
sum(value)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by bit order by id) as seqnum_b
from t
) t
group by (seqnum - seqnum_b), bit;
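To see why the difference of row numbers identifies adjacent rows, here is a small Python sketch that computes the same two counters by hand on the question's sample. The helper names (to_minutes, islands, minutes_per_bit) are invented for this illustration, and the elapsed time per island is taken as last time minus first time, which is an assumption about what "minutes ON" should mean.

```python
from collections import defaultdict

# Sample rows from the question as (id, time, bit).
ROWS = [
    (1, "13:40", 1), (2, "13:45", 1), (3, "13:50", 1),
    (4, "13:55", 0), (5, "14:00", 0),
    (6, "14:05", 1), (7, "14:10", 1),
    (8, "14:15", 0),
    (9, "14:20", 1),
]


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def islands(rows):
    """Group consecutive rows with the same bit via seqnum - seqnum_b."""
    seq_per_bit = defaultdict(int)  # plays the role of seqnum_b
    groups = {}
    for seqnum, (_id, time, bit) in enumerate(sorted(rows), start=1):
        seq_per_bit[bit] += 1
        # The difference stays constant while the bit does not change.
        key = (bit, seqnum - seq_per_bit[bit])
        groups.setdefault(key, []).append(to_minutes(time))
    return [(bit, min(ts), max(ts)) for (bit, _diff), ts in groups.items()]


def minutes_per_bit(rows):
    """Total elapsed minutes per bit, summed over its islands."""
    totals = defaultdict(int)
    for bit, start, end in islands(rows):
        totals[bit] += end - start
    return dict(totals)


if __name__ == "__main__":
    print(minutes_per_bit(ROWS))  # {1: 15, 0: 5}
```

The bit has to be part of the grouping key: on this sample the rows with ids 4 and 5 (bit 0) and the row with id 9 (bit 1) all end up with a difference of 3.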

This is a "gaps and islands" problem, with a pretty standard solution. I came up with this, which is pretty much the same as Gordon's, but has an extra step to calculate the intervals. That extra step is the only reason I am posting what is essentially a duplicate answer: I'm not sure that taking the difference in minutes from zero actually works.
DECLARE @table TABLE (id int, [time] TIME, [bit] BIT, value INT);
INSERT INTO @table SELECT 1, '13:40', 1, 5;
INSERT INTO @table SELECT 2, '13:45', 1, 3;
INSERT INTO @table SELECT 3, '13:50', 1, 1;
INSERT INTO @table SELECT 4, '13:55', 0, 2;
INSERT INTO @table SELECT 5, '14:00', 0, 7;
INSERT INTO @table SELECT 6, '14:05', 1, 3;
INSERT INTO @table SELECT 7, '14:10', 1, 4;
INSERT INTO @table SELECT 8, '14:15', 0, 2;
INSERT INTO @table SELECT 9, '14:20', 1, 2;
WITH x AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY [bit] ORDER BY id) AS a_id, ROW_NUMBER() OVER (ORDER BY id) AS b_id FROM @table),
y AS (
SELECT [bit], MIN([time]) AS min_time, MAX([time]) AS max_time, SUM(value) AS value FROM x GROUP BY a_id - b_id, [bit])
SELECT [bit], SUM(value) AS total_value, SUM(DATEDIFF(MINUTE, min_time, max_time)) AS total_minutes FROM y GROUP BY [bit];
Results:
bit total_value total_minutes
0 11 5
1 18 15
As a bonus, here is a solution that only solves the actual question, i.e. how much time elapsed while the BIT was set to 1:
WITH x AS (SELECT id, id - DENSE_RANK() OVER(ORDER BY id) AS grp FROM @table WHERE [bit] = 1), y AS (SELECT MIN(id) AS range_start, MAX(id) AS range_end FROM x GROUP BY grp)
SELECT SUM(DATEDIFF(MINUTE, t1.[time], t2.[time])) AS minutes_elapsed FROM y INNER JOIN @table t1 ON t1.id = y.range_start INNER JOIN @table t2 ON t2.id = y.range_end;

Related

Returning MIN and MAX values and ignoring nulls - populate null values with preceding non-null value

Using a table of events, I need to return the date and type for:
the first event
the most recent (non-null) event
The most recent event could have null values, which in that case needs to return the most recent non-null value
I found a few articles as well as posts here on SO that are similar (maybe even identical) but am not able to decode or understand the solution - i.e.
Fill null values with last non-null amount - Oracle SQL
https://www.itprotoday.com/sql-server/last-non-null-puzzle
https://koukia.ca/common-sql-problems-filling-null-values-with-preceding-non-null-values-ad538c9e62a6
Table is as follows - there are additional columns, but I am only including 3 for the sake of simplicity. Also note that the first Type and Date could be null. In this case returning null is desired.
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ Update ║ 2019-04-02 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
The output should be:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Update ║ 2019-04-02 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
The first method I tried was to join the table to itself using a subquery that finds the MIN and MAX dates using case statements:
select
T1.Email,
max(case when T1.Date = T2.Min_Date then T1.Type end) as FirstType,
max(case when T1.Date = T2.Min_Date then T1.Date end) as FirstDate,
max(case when T1.Date = T2.Max_Date then T1.Type end) as LastType,
max(case when T1.Date = T2.Max_Date then T1.Date end) as LastDate
from
T1
join
(select
Email,
max(Date) as Max_Date,
min(Date) as Min_Date
from
T1
group by
Email
) T2
on
T1.Email = T2.Email
group by
T1.Email
This seemed to work for the MIN values, but the MAX values would return null.
To solve the problem of returning the last non-null value I attempted this:
select
Email,
max(Date) over (partition by Email rows unbounded preceding) as LastDate,
max(Type) over (partition by Email rows unbounded preceding) as LastType
from
T1
group by
Email,
Date,
Type
However, this gives a result of 3 rows, instead of 1.
I'll admit I don't quite understand analytic functions since I have not had to deal with them at length. Any help would be greatly appreciated.
Edit:
The aforementioned example is an accurate representation of what the data could look like, however the below example is the exact sample data that I am using.
Sample:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║
║ A ║ null ║ null ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ Create ║ 2019-04-01 ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Additional Use-Case:
╔═══════╦════════╦════════════╗
║ Email ║ Type ║ Date ║
╠═══════╬════════╬════════════╣
║ A ║ null ║ null ║
║ A ║ Create ║ 2019-04-01 ║
╚═══════╩════════╩════════════╝
Desired Outcome:
╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate ║ LastType ║ LastDate ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A ║ null ║ null ║ Create ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝
Use window functions and conditional aggregation:
select t.email,
max(case when seqnum = 1 then type end) as first_type,
max(case when seqnum = 1 then date end) as first_date,
max(case when seqnum_nonull = 1 and type is not null then type end) as last_type,
max(case when seqnum_nonull = 1 and type is not null then date end) as last_date
from (select t.*,
row_number() over (partition by email order by date) as seqnum,
row_number() over (partition by email, (case when type is null then 1 else 2 end) order by date desc) as seqnum_nonull
from t
) t
group by t.email;
As Spark SQL window functions support the NULLS LAST|FIRST syntax, you could use that and then specify a pivot with multiple aggregates for rn values 1 and 2. I could do with seeing some more sample data, but this works for your dataset:
%sql
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp;
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date), MAX(type) FOR rn In ( 1, 2 ) )
Rename the columns by supplying your required parts in the query, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
Alternately supply a column list, eg
-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
), cte2 AS
(
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
)
SELECT *
FROM cte2 AS (Email, FirstDate, FirstType, LastDate, LastType)
This simple query uses ROW_NUMBER to assign a row number to the dataset ordered by the date column, but using the NULLS LAST syntax to ensure null rows appear last in the numbering. The PIVOT then converts the rows to columns.
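As a cross-check of the desired outcomes listed in the question, the rule "first event as-is, last event with a non-null type" can be sketched in plain Python. This assumes the rows arrive already in event order (the third use case in the question, where the null row comes first, suggests the order is not purely by date); the function name first_and_last is invented for this illustration.

```python
def first_and_last(events):
    """Map email -> (first_type, first_date, last_type, last_date).

    events: iterable of (email, type, date) tuples in event order.
    The first event is kept even if it is null; the last event is the
    most recent one whose type is not None.
    """
    result = {}
    for email, event_type, event_date in events:
        if email not in result:
            # First row for this email: it is both first and (so far) last.
            result[email] = [event_type, event_date, event_type, event_date]
        elif event_type is not None:
            # A later non-null row replaces the "last" half only.
            result[email][2:] = [event_type, event_date]
    return {email: tuple(values) for email, values in result.items()}


if __name__ == "__main__":
    sample = [
        ("A", "Create", "2019-04-01"),
        ("A", "Update", "2019-04-02"),
        ("A", None, None),
    ]
    print(first_and_last(sample))
```

Run against the three tables in the question, this reproduces the three desired outcomes, which makes it a convenient reference when testing a SQL version.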

Select maximum/minimum with another column

Is there a way to select the maximum of a value together with another column without using TOP and ORDER BY?
Assume we have a list of people and their ages, and we want to take the oldest/youngest. I want to select the name and the age. Even if we group them by name, that won't work.
SELECT nom,
max(age)
from Agents
group by nom
╔════════╦═════╗
║ Name ║ Age ║
╠════════╬═════╣
║ John ║ 200 ║
║ Bob ║ 150 ║
║ GSkill ║ 300 ║
║ Smith ║ 250 ║
║ John ║ 400 ║
║ Zid ║ 300 ║
║ Wick ║ 250 ║
║ Smith ║ 140 ║
╚════════╩═════╝
You could use ROW_NUMBER or DENSE_RANK. For example, if you have to show the employees having the MIN and MAX salary, you could use the following SQL statement:
SELECT x.Name, x.Salary,
IIF(x.RowNumMIN = 1, 1, 0) AS IsMin,
IIF(x.RowNumMAX = 1, 1, 0) AS IsMax
FROM (
SELECT x.Name, x.Salary,
ROW_NUMBER() OVER(ORDER BY x.Salary ASC) AS RowNumMIN,
ROW_NUMBER() OVER(ORDER BY x.Salary DESC) AS RowNumMAX
FROM dbo.SourceTable AS x
) AS x
WHERE x.RowNumMIN = 1 OR x.RowNumMAX = 1
If two or more people have the same min or max salary and you have to show all of them, you could use the DENSE_RANK function instead of ROW_NUMBER.
Try this query --
;WITH CTE
AS (
SELECT [NAME]
,AGE
,DENSE_RANK() OVER (
ORDER BY AGE DESC
) AS Older
,DENSE_RANK() OVER (
ORDER BY AGE ASC
) AS Younger
FROM tblSample
)
SELECT [NAME] + ': ' + CAST(AGE AS VARCHAR(50))
FROM CTE
WHERE Older = 1 OR Younger = 1
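For comparison, a different technique from the ranking queries above also avoids TOP and ORDER BY for the selection itself: filter on scalar subqueries for MAX and MIN. The sketch below runs it on the question's sample through Python's sqlite3 module; the table name agents and the column names are assumptions for the illustration, and the ORDER BY is only there to make the output deterministic.

```python
import sqlite3

# Sample data from the question as (name, age).
AGENTS = [
    ("John", 200), ("Bob", 150), ("GSkill", 300), ("Smith", 250),
    ("John", 400), ("Zid", 300), ("Wick", 250), ("Smith", 140),
]


def extremes(rows):
    """Return every (name, age) row whose age is the maximum or the minimum."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE agents (name TEXT, age INTEGER)")
    con.executemany("INSERT INTO agents VALUES (?, ?)", rows)
    query = """
        SELECT name, age
        FROM agents
        WHERE age = (SELECT MAX(age) FROM agents)
           OR age = (SELECT MIN(age) FROM agents)
        ORDER BY age DESC, name
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(extremes(AGENTS))  # [('John', 400), ('Smith', 140)]
```

Like DENSE_RANK, this returns all tied rows when several people share the extreme age.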

SQLite difference between latest and second latest row

I have table like this:
create table events(
event_type integer not null,
value integer not null,
time timestamp not null,
unique (event_type ,time)
);
insert into events values
(2, 5, '2015-05-09 12:42:00'),
(4, -42, '2015-05-09 13:19:57'),
(2, 2, '2015-05-09 14:48:39'),
(2, 7, '2015-05-09 13:54:39'),
(3, 16, '2015-05-09 13:19:57'),
(3, 20, '2015-05-09 15:01:09')
I would like a query that, for each event_type that has been registered more than once, returns the difference between the latest and the second latest value.
Given the above table, I am expecting following output:
event_type value
2 -5
3 4
As far as I know, in SQL Server/Oracle this can be achieved using row_number() over (partition by).
You could always simulate ROW_NUMBER:
WITH cte AS
(
SELECT *,
(SELECT COUNT(*) + 1
FROM "events" e1
WHERE e1.event_type = e.event_type
AND e1.time > e.time) AS rn
FROM "events" e
)
SELECT c.event_type, c."value" - c2."value" AS "value"
FROM cte c
JOIN cte c2
ON c.event_type = c2.event_type
AND c.rn = 1 AND c2.rn = 2
ORDER BY c.event_type, c.time;
Output:
╔═══════════════╦═══════╗
║ event_type ║ value ║
╠═══════════════╬═══════╣
║ 2 ║ -5 ║
║ 3 ║ 4 ║
╚═══════════════╩═══════╝
Identifiers like time/events/value are reserved words in some SQL dialects.
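To try the correlated-count idea without a fiddle, here is a minimal harness using Python's built-in sqlite3 module with the table and rows from the question. The function name latest_differences and the output alias diff are made up for this sketch, and the ORDER BY is qualified with the alias c so the sort column is unambiguous.

```python
import sqlite3

# Rows from the question as (event_type, value, time).
EVENTS = [
    (2, 5, "2015-05-09 12:42:00"),
    (4, -42, "2015-05-09 13:19:57"),
    (2, 2, "2015-05-09 14:48:39"),
    (2, 7, "2015-05-09 13:54:39"),
    (3, 16, "2015-05-09 13:19:57"),
    (3, 20, "2015-05-09 15:01:09"),
]


def latest_differences(rows):
    """Latest value minus second latest value, per event_type."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events ("
        " event_type INTEGER NOT NULL,"
        " value INTEGER NOT NULL,"
        " time TIMESTAMP NOT NULL,"
        " UNIQUE (event_type, time))"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT e.event_type, e.value, e.time,
                   -- rank 1 = latest: count the rows that are strictly newer
                   (SELECT COUNT(*) + 1
                    FROM events e1
                    WHERE e1.event_type = e.event_type
                      AND e1.time > e.time) AS rn
            FROM events e
        )
        SELECT c.event_type, c.value - c2.value AS diff
        FROM cte c
        JOIN cte c2
          ON c.event_type = c2.event_type
         AND c.rn = 1 AND c2.rn = 2
        ORDER BY c.event_type
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_differences(EVENTS))  # [(2, -5), (3, 4)]
```

Event type 4 has only one row, so it has no rank 2 partner and drops out of the join, which matches the "registered more than once" requirement.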

SQL Grouping Integers by Range

I have integer values: (199903, 199908, 201203, 201408, 201410, 201501, 201503)
and I would like to group these integers by integers falling within a range of 3.
In this example the grouping would be the following:
199903 (group 1)
199908 (group 2)
201203 (group 3)
201408 (group 4)
201410 (group 4)
201501 (group 5)
201503 (group 5)
You can use windowed function DENSE_RANK:
CREATE TABLE #mytable(val INTEGER);
INSERT INTO #mytable(val)
VALUES(199903),(199908),(201203),(201408),(201410),(201501),(201503);
SELECT
val,
[group] = DENSE_RANK() OVER (ORDER BY val/3)
FROM #mytable;
Output:
╔════════╦═══════╗
║ val ║ group ║
╠════════╬═══════╣
║ 199903 ║ 1 ║
║ 199908 ║ 2 ║
║ 201203 ║ 3 ║
║ 201408 ║ 4 ║
║ 201410 ║ 4 ║
║ 201501 ║ 5 ║
║ 201503 ║ 5 ║
╚════════╩═══════╝
I suspect you mean sequences that differ by three or less. So, a new period starts when the difference is greater than 3. In SQL Server 2012+, you can use lag() for this. In SQL Server 2008, here is one way:
with t as (
select t.*,
(case when t.val - tprev.val <= 3 then 0 else 1 end) as IsGroupStart
from table t outer apply
(select top 1 t2.val
from table t2
where t2.val < t.val
order by t2.val desc
) tprev
)
select t.val, t2.grp
from t outer apply
(select sum(IsGroupStart) as grp
from t t2
where t2.val <= t.val
) t2;
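The two answers implement different rules, which a short Python sketch can make concrete. bucket_groups mirrors the DENSE_RANK over val/3 idea (fixed buckets of width 3), while gap_groups starts a new group whenever the distance to the previous value exceeds 3. Both function names and the max_gap and width parameters are invented for this illustration.

```python
def gap_groups(values, max_gap=3):
    """Number groups so that a new one starts when the gap exceeds max_gap."""
    result = []
    group = 0
    previous = None
    for value in sorted(values):
        if previous is None or value - previous > max_gap:
            group += 1  # first value, or too far from the previous one
        result.append((value, group))
        previous = value
    return result


def bucket_groups(values, width=3):
    """Dense-rank the fixed-width bucket each value falls into (val / 3)."""
    buckets = sorted({value // width for value in values})
    rank = {bucket: index + 1 for index, bucket in enumerate(buckets)}
    return [(value, rank[value // width]) for value in sorted(values)]


if __name__ == "__main__":
    sample = [199903, 199908, 201203, 201408, 201410, 201501, 201503]
    print(gap_groups(sample))
    print(bucket_groups(sample))
    # The rules diverge when close values straddle a bucket boundary:
    print(gap_groups([201410, 201411]))     # [(201410, 1), (201411, 1)]
    print(bucket_groups([201410, 201411]))  # [(201410, 1), (201411, 2)]
```

On the question's seven values both rules produce groups 1, 2, 3, 4, 4, 5, 5; which one is right depends on whether "within a range of 3" means fixed ranges or distance between neighbours.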

How to select the equivalent row for another column of the max(column) in group by in SQL Server

I need to change the SQL below so that CreatedOn comes from the same row as Max(Value); see the -- todo line.
It should return 2/01/2015 and 8/01/2015, as shown in the query result below,
but Max(CreatedOn) returns the latest date in each group rather than
the date of the row that holds Max(Value).
Sql
SET DATEFIRST 1
SELECT
CONCAT(DATEPART(YEAR, CreatedOn),DATEPART(WEEK, CreatedOn)) Week,
MAX(CreatedOn) CreatedOn, -- todo: this should return 2/01/2015 and 8/01/2015
MAX(Value) AS MaxValue
FROM Table1
GROUP BY CONCAT(DATEPART(YEAR, CreatedOn),DATEPART(WEEK, CreatedOn))
Table 1:
╔════╦═══════════╦═══════╗
║ Id ║ CreatedOn ║ Value ║
╠════╬═══════════╬═══════╣
║ 1 ║ 1/01/2015 ║ 1 ║
║ 2 ║ 2/01/2015 ║ 2 ║
║ 3 ║ 8/01/2015 ║ 4 ║
║ 4 ║ 9/01/2015 ║ 2 ║
╚════╩═══════════╩═══════╝
Query Result:
╔════════╦═══════════╦══════════╗
║ Week ║ CreatedOn ║ MaxValue ║
╠════════╬═══════════╬══════════╣
║ 2015 1 ║ 2/01/2015 ║ 2 ║
║ 2015 2 ║ 8/01/2015 ║ 4 ║
╚════════╩═══════════╩══════════╝
*Edited: I need to return 8/01/2015 because it belongs to the row that holds the MaxValue (4).
You can use the ROW_NUMBER() over a partition of each week (PARTITION BY Week), ordering by the descending value (ORDER BY Value DESC) to 'rank' each record within the week. Selecting the row with the highest value in each week is then simply the top ranked row in each partition (WHERE Rnk = 1). I've used CTEs to prevent the repetition of the Week calculation.
WITH Weeks AS
(
SELECT CONCAT(DATEPART(YEAR, CreatedOn),DATEPART(WEEK, CreatedOn)) Week,
Id, CreatedOn, Value
FROM Table1
),
Ranked As
(
SELECT Week, CreatedOn, Value,
ROW_NUMBER() OVER (PARTITION BY Week ORDER BY Value DESC) Rnk
FROM Weeks
)
SELECT Week, CreatedOn, Value
FROM Ranked
WHERE Rnk = 1;
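The "top row per partition" logic can be sketched outside SQL as well. The Python below keeps, for each week, the whole row with the highest Value, using the question's four rows read as dd/mm/yyyy. As an assumption, the ISO week from date.isocalendar() stands in for DATEPART(WEEK, ...) with DATEFIRST 1; the two agree for these dates but are not identical in general. The function name top_row_per_week is made up for this sketch.

```python
from datetime import date

# Table1 from the question as (Id, CreatedOn, Value), dates read as dd/mm/yyyy.
ROWS = [
    (1, date(2015, 1, 1), 1),
    (2, date(2015, 1, 2), 2),
    (3, date(2015, 1, 8), 4),
    (4, date(2015, 1, 9), 2),
]


def top_row_per_week(rows):
    """Map (year, week) -> the full row with the highest Value in that week."""
    best = {}
    for row_id, created_on, value in rows:
        iso = created_on.isocalendar()
        week = (iso[0], iso[1])  # ISO year and ISO week number
        # Keep the first row seen on ties, much like ROW_NUMBER picks one row.
        if week not in best or value > best[week][2]:
            best[week] = (row_id, created_on, value)
    return best


if __name__ == "__main__":
    for week, row in sorted(top_row_per_week(ROWS).items()):
        print(week, row)
```

Because the whole row is carried along, CreatedOn is automatically the date of the row with the maximum Value (2 January and 8 January here), which is the behaviour the question asks for.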
Your query is correct; however, there is a problem with the date format. You are reading your dates as dd/mm/yyyy while the database is interpreting them as mm/dd/yyyy.
So a quick fix is to use the proper format when inserting values, like
insert into Table1 ( id , CreatedOn, value)
values (1 , '01/01/2015' , 1 )
insert into Table1 ( id , CreatedOn, value)
values (2 , '01/02/2015' , 2 )
insert into Table1 ( id , CreatedOn, value)
values (3 , '01/09/2015' , 4 )
insert into Table1 ( id , CreatedOn, value)
values (4 , '01/08/2015' , 2)
I tried this and it worked. Let me know if you need a SQLFiddle of the same.