I have a table of employee timeclock punches that looks something like this:
| EmployeeID | PunchDate | PunchTime | PunchType | Sequence |
|------------|------------|-----------|-----------|----------|
| 5386 | 12/27/2016 | 03:57:42 | On Duty | 552 |
| 5386 | 12/27/2016 | 09:30:00 | Off Duty | 563 |
| 5386 | 12/27/2016 | 10:02:00 | On Duty | 564 |
| 5386 | 12/27/2016 | 12:10:00 | Off Duty | 570 |
| 5386 | 12/27/2016 | 12:22:00 | On Duty | 571 |
| 5386 | 12/27/2016 | 05:13:32 | Off Duty | 578 |
What I need to do is delete any rows where the difference in minutes between an Off Duty punch and the following On Duty punch is less than, say, 25 minutes. In the example above, I would want to remove Sequence 570 and 571.
I'm already creating this table by pulling all Off Duty punches from another table and using this query to pull all On Duty punches that follow an Off Duty punch:
INSERT INTO [dbo].[UpdatePunches] (EmployeeID, PunchDate, PunchTime, PunchType, Sequence)
SELECT * FROM [dbo].[Punches]
WHERE Sequence IN (
    SELECT Sequence + 1
    FROM [dbo].[Punches]
    WHERE PunchType LIKE 'Off Duty%')
AND PunchType LIKE 'On Duty%'
I have been trying to fit some sort of DATEDIFF query both in this code and as a separate step to weed these out, but have not had any luck. I can't use specific Sequence numbers because those are going to change for every punch.
I'm using SQL Server 2008.
Any suggestions would be much appreciated.
You can assign row numbers per employee based on PunchDate and PunchTime, and join each row with the next one in ascending order of date and time.
Then get the row numbers of those rows where the difference is less than 25 minutes, and finally delete those rows.
with rownums as
(
    select t.*,
           row_number() over (partition by employeeid
                              order by cast(punchdate + ' ' + punchtime as datetime)) as rn
    from t
),
rownums_to_delete as
(
    select r1.rn, r1.employeeid
    from rownums r1
    join rownums r2 on r1.employeeid = r2.employeeid and r1.rn = r2.rn + 1
    where dateadd(minute, 25, cast(r2.punchdate + ' ' + r2.punchtime as datetime)) > cast(r1.punchdate + ' ' + r1.punchtime as datetime)
    and r1.punchtype <> r2.punchtype
    union all
    select r2.rn, r2.employeeid
    from rownums r1
    join rownums r2 on r1.employeeid = r2.employeeid and r1.rn = r2.rn + 1
    where dateadd(minute, 25, cast(r2.punchdate + ' ' + r2.punchtime as datetime)) > cast(r1.punchdate + ' ' + r1.punchtime as datetime)
    and r1.punchtype <> r2.punchtype
)
delete r
from rownums_to_delete rd
join rownums r on rd.employeeid = r.employeeid and r.rn = rd.rn
Sample Demo
If the date and time columns are not varchar but actual date and time datatypes, combine punchdate and punchtime into a datetime in the query instead of concatenating strings.
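As a minimal sketch (the date and time datatypes cannot be added directly in SQL Server 2008, so cast each side to datetime first; table and column names are taken from the question):
-- Sketch: build one datetime value from separate date and time columns.
-- date + time cannot be added directly, so cast both sides to datetime.
SELECT EmployeeID,
       Sequence,
       CAST(PunchDate AS datetime) + CAST(PunchTime AS datetime) AS PunchDateTime
FROM   [dbo].[Punches];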
Edit: An easier version of the query would be
with todelete as (
select t1.employeeid,cast(t2.punchdate+' '+t2.punchtime as datetime) as punchtime,
t2.punchtype,t2.sequence,
cast(t1.punchdate+' '+t1.punchtime as datetime) next_punchtime,
t1.punchtype as next_punchtype,t1.sequence as next_sequence
from t t1
join t t2 on t1.employeeid=t2.employeeid
and cast(t2.punchdate+' '+t2.punchtime as datetime) between dateadd(minute,-25,cast(t1.punchdate+' '+t1.punchtime as datetime)) and cast(t1.punchdate+' '+t1.punchtime as datetime)
where t2.punchtype <> t1.punchtype
)
delete t
from t
join todelete td on t.employeeid = td.employeeid
and cast(t.punchdate+' '+t.punchtime as datetime) in (td.punchtime,td.next_punchtime)
;
SQL Server has a nice ability called updatable CTEs. Using lead() and lag(), you can do exactly what you want. The following assumes that the date is actually stored as a datetime -- this is just for the convenience of adding the date and time together (you can also explicitly use conversion):
with todelete as (
select tcp.*,
(punchdate + punchtime) as punchdatetime,
lead(punchtype) over (partition by employeeid order by punchdate, punchtime) as next_punchtype,
lag(punchtype) over (partition by employeeid order by punchdate, punchtime) as prev_punchtype,
lead(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as next_punchdatetime,
lag(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as prev_punchdatetime
from timeclockpunches tcp
)
delete from todelete
where (punchtype = 'Off Duty' and
next_punchtype = 'On Duty' and
punchdatetime > dateadd(minute, -25, next_punchdatetime)
) or
(punchtype = 'On Duty' and
prev_punchtype = 'Off Duty' and
prev_punchdatetime > dateadd(minute, -25, punchdatetime)
);
EDIT:
In SQL Server 2008, you can use the same idea, just not as efficiently:
delete t
from t outer apply
     (select top 1 tprev.*
      from t tprev
      where tprev.employeeid = t.employeeid and
            (tprev.punchdate < t.punchdate or
             (tprev.punchdate = t.punchdate and tprev.punchtime < t.punchtime)
            )
      order by tprev.punchdate desc, tprev.punchtime desc
     ) tprev outer apply
     (select top 1 tnext.*
      from t tnext
      where tnext.employeeid = t.employeeid and
            (t.punchdate < tnext.punchdate or
             (t.punchdate = tnext.punchdate and t.punchtime < tnext.punchtime)
            )
      order by tnext.punchdate asc, tnext.punchtime asc
     ) tnext
where (t.punchtype = 'Off Duty' and
       tnext.punchtype = 'On Duty' and
       (t.punchdate + t.punchtime) > dateadd(minute, -25, tnext.punchdate + tnext.punchtime)
      ) or
      (t.punchtype = 'On Duty' and
       tprev.punchtype = 'Off Duty' and
       (tprev.punchdate + tprev.punchtime) > dateadd(minute, -25, t.punchdate + t.punchtime)
      );
You could create a DateTime from the Date and Time fields in a CTE and then look up the next On Duty time after the Off Duty time, like below:
;WITH OnDutyDateTime AS
(
SELECT
EmployeeID,
Sequence,
DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate))
FROM
#TempEmployeeData
where PunchType = 'On Duty'
),
OffDutyDateTime As
(
SELECT
EmployeeID,
Sequence,
DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate))
FROM
#TempEmployeeData
where PunchType = 'Off Duty'
)
SELECT
OffDutyDateTime = DutyDateTime,
OnDutyDateTime = (SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ),
DiffInMinutes = DATEDIFF(minute,DutyDateTime,(SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ))
FROM
OffDutyDateTime A
OffDutyDateTime OnDutyDateTime DiffInMinutes
----------------------- ----------------------- -------------
2016-12-27 09:30:00.000 2016-12-27 10:02:00.000 32
2016-12-27 12:10:00.000 2016-12-27 12:22:00.000 12
2016-12-28 05:13:32.000 NULL NULL
(3 row(s) affected)
Maybe something like this would be easy to slap in there. This simply uses a subquery to find the next 'On Duty' punch and compares it in the main query to the 'Off Duty' punch.
Delete p
FROM [dbo].[Punches] p
where p.PunchTime >=
    dateadd(minute, -25, isnull(
        (select top 1 p2.PunchTime
         from [dbo].[Punches] p2
         where p2.EmployeeID = p.EmployeeID
           and p2.PunchType = 'On Duty'
           and p.Sequence < p2.Sequence
           and p2.PunchDate = p.PunchDate
         order by p2.Sequence asc),
        '2500-01-01'))
and p.PunchType = 'Off Duty'
Related
I have the data below and need the output shown after it.
Input:
id startdate enddate
1 21/01/2019 23/01/2019
1 23/01/2019 24/01/2019
1 24/01/2019 27/01/2019
1 29/01/2019 02/02/2019
Output:
id startdate enddate
1 21/01/2019 27/01/2019
1 29/01/2019 02/02/2019
The logic needs to match each record's enddate against the following record's startdate.
This is a gaps-and-islands problem, where you want to group together "adjacent" dates. Here is one approach using window functions: the idea is to compare the current start date to the end date of the "previous" row, and use a window sum to define the groups:
select id, min(startdate) startdate, max(enddate) enddate
from (
select t.*,
sum(case when startdate = lag_enddate then 0 else 1 end) over(partition by id order by startdate) grp
from (
select t.*,
lag(enddate) over(partition by id order by startdate) lag_enddate
from mytable t
) t
) t
group by id, grp
Demo on DB Fiddle - with credits to Sander for creating the DDL statements in the first place:
id | startdate | enddate
-: | :--------- | :---------
1 | 2019-01-21 | 2019-01-27
1 | 2019-01-29 | 2019-02-02
Have a look at:
The NEXT VALUE FOR method, which works in 2016 and later.
Use a CTE or subquery (works in 2008) where you join your own table on the previous value. Here is a sample script I use showing backup growth:
declare @backupType char(1)
, @DatabaseName sysname
set @DatabaseName = db_name() --> Name of current database, null for all databases on server
set @backupType = 'D' /* valid options are:
D = Database
I = Database Differential
L = Log
F = File or Filegroup
G = File Differential
P = Partial
Q = Partial Differential
*/
select backup_start_date
, backup_finish_date
, DurationSec
, database_name,backup_size
, PreviouseBackupSize
, backup_size-PreviouseBackupSize as growth
,KbSec= format(KbSec,'N2')
FROM (
select backup_start_date
, backup_finish_date
, datediff(second,backup_start_date,b.backup_finish_date) as DurationSec
, b.database_name
, b.backup_size/1024./1024. as backup_size
,case when datediff(second,backup_start_date,b.backup_finish_date) >0
then ( b.backup_size/1024.)/datediff(second,backup_start_date,b.backup_finish_date)
else 0 end as KbSec
-- , b.compressed_backup_size
, (
select top (1) p.backup_size/1024./1024.
from msdb.dbo.backupset p
where p.database_name = b.database_name
and p.database_backup_lsn< b.database_backup_lsn
and type = @backupType
order by p.database_backup_lsn desc
) as PreviouseBackupSize
from msdb.dbo.backupset as b
where (@DatabaseName IS NULL OR database_name = @DatabaseName)
and type = @backupType
)as A
order by backup_start_date desc
using a "cursor local fast_forward" to loop over the data on a row-by-row and use a temporary table where you store & compaire prev value
Here is a solution with common table expressions that could work.
Sample data
create table data
(
id int,
startdate date,
enddate date
);
insert into data (id, startdate, enddate) values
(1, '2019-01-21', '2019-01-23'),
(1, '2019-01-23', '2019-01-24'),
(1, '2019-01-24', '2019-01-27'),
(1, '2019-01-29', '2019-02-02');
Solution
-- determine start dates
with cte_start as
(
select s.id,
s.startdate
from data s
where not exists ( select 'x'
from data e
where e.id = s.id
and e.enddate = s.startdate )
),
-- determine date boundaries
cte_startnext as
(
select s.id,
s.startdate,
lead(s.startdate) over (partition by s.id order by s.startdate) as startdate_next
from cte_start s
)
-- determine periods
select sn.id,
sn.startdate,
e.enddate
from cte_startnext sn
cross apply ( select top 1 e.enddate
from data e
where e.id = sn.id
and e.startdate >= sn.startdate
and (e.startdate < sn.startdate_next or sn.startdate_next is null)
order by e.enddate desc ) e
order by sn.id,
sn.startdate;
Result
id startdate enddate
-- ---------- ----------
1 2019-01-21 2019-01-27
1 2019-01-29 2019-02-02
Fiddle to see the build-up of the solution and the intermediate CTE results.
Situation:
I have 5 columns
id
subtotal (price of item)
order_date (purchase date)
updated_at (if refunded or any other status change)
status
Objective:
I need the order date as column 1
I need the subtotal for each day regardless of the status as column 2
I need the subtotal amount for refunds for the third column.
Example:
If a purchase is made on May 1st and refunded on May 3rd, the output should look like this:
+-------+----------+--------+
| date | subtotal | refund |
+-------+----------+--------+
| 05-01 | 10.00 | 0.00 |
| 05-02 | 00.00 | 0.00 |
| 05-03 | 00.00 | 10.00 |
+-------+----------+--------+
while the source row looks like this:
+-----+----------+------------+------------+----------+
| id | subtotal | order_date | updated_at | status |
+-----+----------+------------+------------+----------+
| 123 | 10 | 2019-05-01 | 2019-05-03 | refunded |
+-----+----------+------------+------------+----------+
Query:
Currently what I have looks like this:
Note: there is a timezone discrepancy, therefore I bring the dates back by 8 hours.
;with cte as (
select id as orderid
, CAST(dateadd(hour,-8,order_date) as date) as order_date
, CAST(dateadd(hour,-8,updated_at) as date) as updated_at
, subtotal
, status
from orders
)
select
b.dates
, sum(a.subtotal_price) as subtotal
, -- not sure how to aggregate it to get the refunds
from Orders as o
inner join cte as a on orders.id=cte.orderid
inner join (select * from cte where status = ('refund')) as b on o.id=cte.orderid
where dates between '2019-05-01' and '2019-05-31'
group by dates
And do I need to join it twice? Hopefully not since my table is huge.
This looks like a job for a Calendar Table. Bit of a stab in the dark, but:
--Overly simplistic Calendar table
CREATE TABLE dbo.Calendar (CalendarDate date);
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3, N N4, N N5) --Many years of data
INSERT INTO dbo.Calendar
SELECT DATEADD(DAY, T.I, 0)
FROM Tally T;
GO
SELECT C.CalendarDate AS [date],
CASE C.CalendarDate WHEN V.order_date THEN subtotal ELSE 0 END AS subtotal,
CASE WHEN C.CalendarDate = V.updated_at AND V.[status] = 'refunded' THEN subtotal ELSE 0.00 END AS refund
FROM (VALUES(123,10.00,CONVERT(date,'20190501'),CONVERT(date,'20190503'),'refunded'))V(id,subtotal,order_date,updated_at,status)
JOIN dbo.Calendar C ON V.order_date <= C.CalendarDate AND V.updated_at >= C.CalendarDate;
GO
DROP TABLE dbo.Calendar;
Consider joining on a recursive CTE of sequential dates:
WITH dates AS (
SELECT CONVERT(datetime, '2019-01-01') AS rec_date
UNION ALL
SELECT DATEADD(d, 1, CONVERT(datetime, rec_date))
FROM dates
WHERE rec_date < '2019-12-31'
),
cte AS (
SELECT id AS orderid
, CAST(dateadd(hour,-8,order_date) AS date) as order_date
, CAST(dateadd(hour,-8,updated_at) AS date) as updated_at
, subtotal
, status
FROM orders
)
SELECT rec_date AS date,
CASE
WHEN c.order_date = d.rec_date THEN subtotal
ELSE 0
END AS subtotal,
CASE
WHEN c.updated_at = d.rec_date THEN subtotal
ELSE 0
END AS refund
FROM cte c
JOIN dates d ON d.rec_date BETWEEN c.order_date AND c.updated_at
WHERE c.status = 'refund'
option (maxrecursion 0)
GO
Rextester demo
I have a SQL Server query that I'm trying to convert to run in BigQuery. There are three tables involved:
CalendarMonths
FirstDayOfMonth | FirstDayOfNextMonth
----------------------------+----------------------------
2017-02-01 00:00:00.000 UTC | 2017-03-01 00:00:00.000 UTC
2017-03-01 00:00:00.000 UTC | 2017-04-01 00:00:00.000 UTC
Clients
clientid | name | etc.
---------+----------------+------
1 | Bob's Shop |
2 | Anne's Cookies |
ClientLogs
id | clientid | timestamp | price_current | price_old | license_count_current | license_count_old |
----+----------+----------------+---------------+-----------+-----------------------+---------------
1 | 1 | 2017-02-01 UTC | 1200 | 0 | 10 | 0 |
2 | 1 | 2018-02-03 UTC | 2400 | 1200 | 20 | 10 |
3 | 2 | 2016-07-13 UTC | 1200 | 0 | 10 | 0 |
4 | 2 | 2018-03-30 UTC | 0 | 1200 | 0 | 10 |
The T-SQL query looks something like this:
SELECT
FirstDayOfMonth, FirstDayOfNextMonth,
(SELECT SUM(sizeatdatelog.price_current)
FROM clients c
CROSS APPLY (SELECT TOP 1 *
FROM clientlogs
WHERE clientid = c.clientid
AND [timestamp] < cm.FirstDayOfMonth
ORDER BY [timestamp] DESC) sizeatdatelog
WHERE sizeatdatelog.license_count_current > 0) as StartingRevenue,
(another subquery for starting client count) as StartingClientCount,
(another subquery for churned revenue) as ChurnedRevenue,
(there are about 6 other subqueries)
FROM
CalendarMonths cm
ORDER BY
cm.FirstDayOfMonth
And the final output looks like:
FirstDayOfMonth | FirstDayOfNextMonth | StartingRevenue | StartingClientCount | etc
-------------------------------------------------------------------------------------------------------
2017-02-01 00:00:00.000 UTC | 2017-03-01 00:00:00.000 UTC | 68382995.43 | 79430 |
2017-03-01 00:00:00.000 UTC | 2017-04-01 00:00:00.000 UTC | 69843625.12 | 80430 |
In BigQuery, I added a simple subquery in the select clause and it worked great:
SELECT FirstDayOfMonth, FirstDayOfNextMonth, (SELECT clientId FROM clientlogs LIMIT 1 ) as cl
FROM CalendarMonths cm
ORDER BY cm.FirstDayOfMonth
However, as soon as I add a where clause to the subquery, I get this error message:
Error: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
How should I proceed from this point? If I can't get the results I'm looking for in one query, maybe I should look into creating multiple scheduled jobs that create temporary tables and then a final scheduled job that joins it all together. Or maybe I could look at doing this in code via GCP or use the BigQuery API in app scripts. The data size isn't huge and the query isn't run often. I'm looking for maintainability more than efficiency, so ideally there is a way to get this data into one query.
Below is for BigQuery Standard SQL
#standardSQL
SELECT FirstDayOfMonth, FirstDayOfNextMonth,
SUM(price_current) StartingRevenue, COUNT(1) StartingClientCount
FROM (
SELECT FirstDayOfMonth, FirstDayOfNextMonth,
clientid, price_current
FROM (
SELECT FirstDayOfMonth, FirstDayOfNextMonth, clientid,
FIRST_VALUE(price_current) OVER(latest_values) price_current,
FIRST_VALUE(license_count_current) OVER(latest_values) license_count_current
FROM `project.dataset.CalendarMonths` cm
JOIN `project.dataset.ClientLogs` cl
ON `timestamp` < FirstDayOfMonth
WINDOW latest_values AS (PARTITION BY FirstDayOfMonth, clientid ORDER BY `timestamp` DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
)
WHERE license_count_current > 0
GROUP BY FirstDayOfMonth, FirstDayOfNextMonth, clientid, price_current
)
GROUP BY FirstDayOfMonth, FirstDayOfNextMonth
ORDER BY FirstDayOfMonth
Most likely the above can be extended to the rest of your subqueries.
A correlated subquery like
SELECT TOP 1 *
FROM clientlogs
WHERE clientid = c.clientid
AND [timestamp] < cm.FirstDayOfMonth
ORDER BY [timestamp] DESC
in BigQuery usually needs to be rewritten through aggregation along the lines of
SELECT ARRAY_AGG(foo ORDER BY [timestamp] DESC LIMIT 1)[offset(0)]
FROM ... as foo
WHERE correlated condition
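To make the pattern concrete, here is how that lookup might look rewritten with ARRAY_AGG against the question's tables (illustrative only; the table paths and the CROSS JOIN against Clients are assumptions, and BigQuery can still reject this form if it cannot de-correlate it):
#standardSQL
-- Illustrative rewrite of the correlated TOP 1 lookup using ARRAY_AGG.
-- Table paths are assumed; BigQuery may still require the JOIN + window
-- rewrite shown in the answer above if it cannot de-correlate this form.
SELECT
  cm.FirstDayOfMonth,
  c.clientid,
  (SELECT ARRAY_AGG(cl ORDER BY cl.`timestamp` DESC LIMIT 1)[OFFSET(0)].price_current
   FROM `project.dataset.ClientLogs` cl
   WHERE cl.clientid = c.clientid
     AND cl.`timestamp` < cm.FirstDayOfMonth) AS price_at_month_start
FROM `project.dataset.CalendarMonths` cm
CROSS JOIN `project.dataset.Clients` c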
BigQuery is more likely to work with simple correlated subqueries of the form
SELECT
{optional aggregation}
FROM table
WHERE {correlated condition}
For the sake of the community I'm posting the query I ended up using. Huge thanks to Mikhail Berlyant for his help with this one.
I ended up breaking the query into CTEs so I could use correlated subqueries to get the specific data I needed.
WITH previousMonths AS (
SELECT *
FROM (
SELECT FirstDayOfMonth, FirstDayOfNextMonth, account_c,
FIRST_VALUE(acl.timestamp_c ) OVER (start_values) timestamp_c,
FIRST_VALUE(acl.acv_current_c ) OVER (start_values) acv_current_c,
FIRST_VALUE(acl.license_count_current_c) OVER(start_values) license_count_current_c,
FIRST_VALUE(acl.price_current_c) OVER (start_values) price_current_c
FROM warehouse.project.calendar_months cm
JOIN warehouse.project.account_change_logs acl ON timestamp_c < FirstDayOfMonth
WINDOW start_values AS (PARTITION BY account_c, FirstDayOfMonth ORDER BY timestamp_c DESC)
)
GROUP BY FirstDayOfMonth, FirstDayOfNextMonth, account_c,
timestamp_c, acv_current_c, license_count_current_c, price_current_c
),
currentMonth AS (
SELECT *
FROM (
SELECT FirstDayOfMonth, FirstDayOfNextMonth, account_c,
FIRST_VALUE(acl.timestamp_c ) OVER (change_values) timestamp_c,
FIRST_VALUE(acl.acv_current_c ) OVER (change_values) acv_current_c,
FIRST_VALUE(acl.license_count_current_c) OVER(change_values) license_count_current_c,
FIRST_VALUE(acl.acv_old_c) OVER(PARTITION BY account_c, FirstDayOfMonth ORDER BY timestamp_c) acv_old_at_start_of_month_c,
FIRST_VALUE(acl.license_count_old_c) OVER(PARTITION BY account_c, FirstDayOfMonth ORDER BY timestamp_c) license_count_old_at_start_of_month_c,
FIRST_VALUE(acl.price_current_c) OVER (change_values) price_current_c
FROM warehouse.project.calendar_months cm
JOIN warehouse.project.account_change_logs acl
ON timestamp_c >= FirstDayOfMonth AND timestamp_c < FirstDayOfNextMonth
WINDOW change_values AS (PARTITION BY account_c, FirstDayOfMonth ORDER BY timestamp_c DESC)
)
GROUP BY FirstDayOfMonth, FirstDayOfNextMonth, account_c,
timestamp_c, acv_current_c, acv_old_at_start_of_month_c, license_count_current_c,
license_count_old_at_start_of_month_c, price_current_c
)
SELECT FirstDayOfMonth, FirstDayOfNextMonth,
(SELECT COUNT(acv_current_c) FROM previousMonths pm WHERE pm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_current_c > 0) as StartingAccounts,
(SELECT COUNT(acv_current_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_old_at_start_of_month_c = 0 AND license_count_current_c > 0) as NewAccounts,
(SELECT COUNT(acv_current_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_current_c = 0) as ChurnAccounts,
(SELECT SUM(license_count_current_c) FROM previousMonths pm WHERE pm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_current_c > 0) as StartingUsers,
(SELECT SUM(license_count_current_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_old_at_start_of_month_c = 0 AND license_count_current_c > 0) as NewUsers,
(SELECT SUM(license_count_current_c - license_count_old_at_start_of_month_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_old_at_start_of_month_c < license_count_current_c
AND license_count_old_at_start_of_month_c <> 0) as ExpansionUsers,
(SELECT SUM(license_count_old_at_start_of_month_c - license_count_current_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_old_at_start_of_month_c > license_count_current_c
AND license_count_current_c <> 0) as ContractionUsers,
(SELECT SUM(license_count_old_at_start_of_month_c - license_count_current_c) FROM currentMonth cm WHERE cm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_old_at_start_of_month_c > license_count_current_c
AND license_count_current_c = 0) as ChurnUsers,
(SELECT SUM(acv_current_c) FROM previousMonths pm WHERE pm.FirstDayOfMonth = cal.FirstDayOfMonth
AND license_count_current_c > 0) as StartingARR
--etc, etc,
FROM warehouse.project.calendar_months cal
ORDER BY FirstDayOfMonth
I have the following two columns.
Date | Market Value
------------------------------
2016-09-08 | 100
2016-09-07 | 130
2016-09-06 | 140
2016-09-05 | 180
I want to add a column that calculates the difference in Market Value between the two dates.
Date | Market Value | Delta
------------------------------------------
2016-09-08 | 100 | -30
2016-09-07 | 130 | -10
2016-09-06 | 140 | -40
2016-09-05 | 180 |
For example: 100 (2016-09-08) minus 130 (2016-09-07) = -30.
How do I write that function?
In SQL Server 2012+ the most efficient and simple way is to use the built-in LEAD function.
SELECT
[Date]
,[Market Value]
,LEAD([Market Value]) OVER (ORDER BY [Date] DESC) - [Market Value] AS Delta
FROM YourTable
;
LEAD returns the value of the next row as specified by its ORDER BY clause.
All other methods that self-join the table are less efficient.
If you have continuous dates you can do:
select t1.date, t1.market_value, t1.market_value - t2.market_value
from data_table t1
left join data_table t2 on t1.date - 1 = t2.date
If you don't have continuous dates and want to calculate, for example, the difference between Monday and Friday, you can use a row number (rownum) like this:
select t1.date, t1.market_value, t1.market_value - t2.market_value
from (select rownum, date, market_value from data_table) t1
left join (select rownum, date, market_value from data_table) t2 on t1.rownum - 1 = t2.rownum
CREATE PROCEDURE UPDATE_DELTA
    @START_DATE DATETIME,
    @END_DATE DATETIME
AS BEGIN
    UPDATE T
    SET DELTA = MARKET_VALUE - (SELECT MARKET_VALUE
                                FROM YOURTABLE
                                WHERE [DATE] = T.[DATE] - 1)
    FROM YOURTABLE T
    WHERE [DATE] BETWEEN @START_DATE AND @END_DATE
END
And then to execute:
EXEC UPDATE_DELTA '2016-09-05', '2016-09-08'
This works as long as you have sequenced dates.
For SQL Server versions below 2012 you could try this:
with cte as
(SELECT
    ROW_NUMBER() OVER (ORDER BY [Date] DESC) row,
    [Date],
    [Market Value]
FROM [YourTable])
SELECT
    a.[Date],
    a.[Market Value],
    a.[Market Value] - b.[Market Value] AS Delta
FROM
    cte a
LEFT JOIN cte b
    on b.row = a.row + 1
The original post is from here: SQL difference between rows
For SQL Server 2012 and above you can use the recommended LEAD function.
Add the column and then update it in the following way:
UPDATE t SET t.Delta = t.Market_Value-t2.Market_Value
FROM yourtable t
INNER JOIN yourtable t2 ON DATEADD(DD,-1,t.Date) = t2.Date
I would like to know how to make intersections or concatenations of adjacent date ranges in SQL.
I have a list of customer start and end dates, for example (in dd/mm/yyyy format, where 31/12/9999 means the customer is still a current customer).
CustID | StartDate | Enddate |
1 | 01/08/2011|19/06/2012|
1 | 20/06/2012|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|10/10/2010|
2 | 11/10/2010|31/12/9999|
3 | 01/08/2010|19/08/2010|
3 | 20/08/2010|26/12/2011|
Although the dates in different rows don't overlap, I would consider some of the ranges a contiguous period of time, e.g. when the start date comes one day after an end date (for a given customer). Hence I would like a query that returns just the intersection of the dates:
CustID | StartDate | Enddate |
1 | 01/08/2011|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|31/12/9999|
3 | 01/08/2010|26/12/2011|
I've looked at CTEs, but I can't figure out how to return just one row for one contiguous block of dates.
This should work in 2005 forward:
;WITH cte2 AS (SELECT 0 AS Number
UNION ALL
SELECT Number + 1
FROM cte2
WHERE Number < 10000)
SELECT CustID, Min(GroupStart) StartDate, MAX(EndDate) EndDate
FROM (SELECT *
, DATEADD(DAY,b.number,a.StartDate) GroupStart
, DATEADD(DAY,1- DENSE_RANK() OVER (PARTITION BY CustID ORDER BY DATEADD(DAY,b.number,a.StartDate)),DATEADD(DAY,b.number,a.StartDate)) GroupDate
FROM Table1 a
JOIN cte2 b
ON b.number <= DATEDIFF(d, startdate, EndDate)
) X
GROUP BY CustID, GroupDate
ORDER BY CustID, StartDate
OPTION (MAXRECURSION 0)
Demo: SQL Fiddle
You can build a table of numbers from 0 up to something large enough to cover the spread of dates in your ranges and use it in place of the CTE, so the CTE doesn't have to run each time; indexed properly, it will run quickly.
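For example, a minimal sketch of persisting that number list once (the table name dbo.Numbers is an assumption), after which the main query joins it instead of cte2:
-- Sketch only: persist the 0-10000 number list once so the recursive CTE
-- does not have to run on every execution; the primary key provides the index.
CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);

;WITH cte2 AS (SELECT 0 AS Number
               UNION ALL
               SELECT Number + 1 FROM cte2 WHERE Number < 10000)
INSERT INTO dbo.Numbers (Number)
SELECT Number FROM cte2
OPTION (MAXRECURSION 0);

-- The join in the main query then becomes:
-- JOIN dbo.Numbers b ON b.Number <= DATEDIFF(d, startdate, EndDate)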
You can do this with a recursive common table expression:
with cte as (
select t.CustID, t.StartDate, t.EndDate, t2.StartDate as NextStartDate
from Table1 as t
left outer join Table1 as t2 on t2.CustID = t.CustID and t2.StartDate = case when t.EndDate < '99991231' then dateadd(dd, 1, t.EndDate) end
), cte2 as (
select c.CustID, c.StartDate, c.EndDate, c.NextStartDate
from cte as c
where c.NextStartDate is null
union all
select c.CustID, c.StartDate, c2.EndDate, c2.NextStartDate
from cte2 as c2
inner join cte as c on c.CustID = c2.CustID and c.NextStartDate = c2.StartDate
)
select CustID, min(StartDate) as StartDate, EndDate
from cte2
group by CustID, EndDate
order by CustID, StartDate
option (maxrecursion 0);
sql fiddle demo
Quick performance tests:
Results on 750 rows, small periods of 2 days length:
sql fiddle demo
My query: 300 ms
Goat CO query with CTE: 10804 ms
Goat CO query with table of fixed numbers: 7 ms
Results on 5 rows, large periods:
sql fiddle demo
My query: 1 ms
Goat CO query with CTE: 700 ms
Goat CO query with table of fixed numbers: 36 ms