Computing difference in rows for all except consecutive days? - sql

I have a table as follows. I want to compute the difference in dates (in seconds) between consecutive rows according to the following:
If the dates differ by more than a day, then we go ahead and compute the difference
If the dates differ by more than a day and the later date is followed by consecutive days whose TotalSeconds value is 86400, then I want to first combine those dates before taking the difference
I am currently doing a self-join to handle the first case but am not sure if there is a good way to handle the second case. Any suggestions?
The following also gives an example:
CREATE TABLE #TEMP(Person VARCHAR(100), StartTime Datetime, TotalSeconds INT)
INSERT INTO #TEMP VALUES('A', '2013-02-20', 49800); -- We want to take the difference with the next row in this case
INSERT INTO #TEMP VALUES('A', '2013-02-25', 3000); -- Before taking the difference, I want to first merge the next four rows because 5th March is followed by three days with the value 86400
INSERT INTO #TEMP VALUES('A', '2013-03-05', 2100);
INSERT INTO #TEMP VALUES('A', '2013-03-06', 86400);
INSERT INTO #TEMP VALUES('A', '2013-03-07', 86400);
INSERT INTO #TEMP VALUES('A', '2013-03-08', 86400);
INSERT INTO #TEMP VALUES('A', '2013-03-09', 17100);
INSERT INTO #TEMP VALUES('B', '2012-04-24', 22500);
INSERT INTO #TEMP VALUES('B', '2012-04-26', 600);
INSERT INTO #TEMP VALUES('B', '2012-04-27', 10500);
INSERT INTO #TEMP VALUES('B', '2012-04-29', 41400);
INSERT INTO #TEMP VALUES('B', '2012-05-04', 86100);
SELECT *
FROM #TEMP
DROP TABLE #TEMP

The following handles the second case:
select Person, MIN(StartTime) as StartTime, MAX(StartTime) as maxStartTime
from (SELECT *,
dateadd(d, - ROW_NUMBER() over (partition by person order by StartTime), StartTime) as thegroup
FROM #TEMP t
) t
group by Person, thegroup
It groups all the time periods for a person, with consecutive dates collapsing into a single period (with a begin and end time). The trick is to assign a sequence number, using row_number() and then take the difference from StartTime. This difference is constant for a group of consecutive dates -- hence the outer group by.
You can use a with statement to put this into your query and then get the difference that you desire between consecutive rows.
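For example, a minimal sketch of what that might look like (assuming SQL Server 2012+ for LAG; the DATEDIFF below measures the gap from the end of one collapsed period to the start of the next, so adjust it if you also need TotalSeconds folded in):
;WITH periods AS (
    SELECT Person, MIN(StartTime) AS StartTime, MAX(StartTime) AS maxStartTime
    FROM (SELECT *,
                 DATEADD(d, -ROW_NUMBER() OVER (PARTITION BY Person ORDER BY StartTime), StartTime) AS thegroup
          FROM #TEMP) t
    GROUP BY Person, thegroup
)
SELECT Person, StartTime, maxStartTime,
       -- seconds between the end of the previous collapsed period and the start of this one
       DATEDIFF(SECOND,
                LAG(maxStartTime) OVER (PARTITION BY Person ORDER BY StartTime),
                StartTime) AS SecondsSincePreviousPeriod
FROM periods;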


For loop in Microsoft SQL Server

I have data with three different "Equipement" values, and each "Equipement" has different contract dates (start_date and end_date).
I want to write a script such that, for every "Equipement", if the end_date of one row overlaps the start_date of the next row, then the new_end_date of the first row should be (the next row's start_date - 1 day).
I've made an attempt, but it only covers the first two rows (it is not generalized):
SELECT [Ref]
      ,[Equipement]
      ,[start_date]
      ,[end_date]
      ,CASE WHEN DATEDIFF(day,
                          (SELECT [end_date] FROM [DWDiagnostics].[dbo].[Test1] WHERE [Ref] = 1290),
                          (SELECT [start_date] FROM [DWDiagnostics].[dbo].[Test1] WHERE [Ref] = 1380)) < 0
            THEN DATEADD(dd, -1, [start_date])
            ELSE [end_date]
       END AS [new_end_date]
FROM [DWDiagnostics].[dbo].[Test1]
Here's a screenshot of the result I want.
SQL code for the data:
DECLARE @Test TABLE
(
Ref VARCHAR(10),
Equipment VARCHAR(10),
start_date DATE,
end_date DATE
)
INSERT INTO @Test VALUES ('1290','9999','2014-03-01','2016-04-16')
INSERT INTO @Test VALUES ('1380','9999','2016-04-01','2018-05-17')
INSERT INTO @Test VALUES ('2000','9999','2018-05-01','2020-06-27')
INSERT INTO @Test VALUES ('2900','9999','2020-06-01','2021-06-29')
INSERT INTO @Test VALUES ('1556','8888','2016-01-01','2017-02-27')
INSERT INTO @Test VALUES ('1876','8888','2017-02-01','2018-04-26')
INSERT INTO @Test VALUES ('2897','8888','2018-04-01','2020-03-30')
INSERT INTO @Test VALUES ('2653','7777','2017-09-01','2018-10-14')
INSERT INTO @Test VALUES ('4536','7777','2018-10-01','2019-11-13')
INSERT INTO @Test VALUES ('2987','7777','2019-11-01','2020-12-27')
INSERT INTO @Test VALUES ('2776','7777','2020-12-01','2021-11-30')
SELECT * FROM @Test;
Thanks for posting sample data and table structures; it makes this so much easier to work on the problem. This should work based on your explanation of the issue. However, some of the new_end_date values you posted as desired do not match your description. For example, with Equipment 9999 you have the second start_date as 4/1/2016, but in your desired output you show 3/30; the day before 4/1 is 3/31. There are other dates in your desired output that are similarly off by a day.
DECLARE @Test TABLE
(
Ref VARCHAR(10),
Equipment VARCHAR(10),
start_date DATE,
end_date DATE
)
INSERT INTO @Test VALUES ('1290','9999','2014-03-01','2016-04-16')
INSERT INTO @Test VALUES ('1380','9999','2016-04-01','2018-05-17')
INSERT INTO @Test VALUES ('2000','9999','2018-05-01','2020-06-27')
INSERT INTO @Test VALUES ('2900','9999','2020-06-01','2021-06-29')
INSERT INTO @Test VALUES ('1556','8888','2016-01-01','2017-02-27')
INSERT INTO @Test VALUES ('1876','8888','2017-02-01','2018-04-26')
INSERT INTO @Test VALUES ('2897','8888','2018-04-01','2020-03-30')
INSERT INTO @Test VALUES ('2653','7777','2017-09-01','2018-10-14')
INSERT INTO @Test VALUES ('4536','7777','2018-10-01','2019-11-13')
INSERT INTO @Test VALUES ('2987','7777','2019-11-01','2020-12-27')
INSERT INTO @Test VALUES ('2776','7777','2020-12-01','2021-11-30')
select *
, new_end_date = isnull(dateadd(day, -1, lead(start_date, 1) over (partition by Equipment order by start_date)), end_date)
from @Test
ORDER BY Equipment desc
, start_date

Counting number of transactions within past 1 hour on a particular user

Is there any way (ideally without using a cursor) to count the number of transactions that the same user made in the previous hour?
That means that for this table:
CREATE TABLE #TR (PK INT, TR_DATE DATETIME, USER_PK INT)
INSERT INTO #TR VALUES (1,'2018-07-31 06:02:00.000',10)
INSERT INTO #TR VALUES (2,'2018-07-31 06:36:00.000',10)
INSERT INTO #TR VALUES (3,'2018-07-31 06:55:00.000',10)
INSERT INTO #TR VALUES (4,'2018-07-31 07:10:00.000',10)
INSERT INTO #TR VALUES (5,'2018-07-31 09:05:00.000',10)
INSERT INTO #TR VALUES (6,'2018-07-31 06:05:00.000',11)
INSERT INTO #TR VALUES (7,'2018-07-31 06:55:00.000',11)
INSERT INTO #TR VALUES (8,'2018-07-31 07:10:00.000',11)
INSERT INTO #TR VALUES (9,'2018-07-31 06:12:00.000',12)
The result should be:
The solution could be something like: COUNT(*) OVER (PARTITION BY USER_PK ORDER BY TR_DATE ROWS BETWEEN (WHERE DATEADD(HH, -1, PRECEDING.TR_DATE) > CURRENT ROW.TR_DATE) AND CURRENT ROW) ... but I know that ROWS BETWEEN cannot be used like that...
I am guessing SQL Server based on the syntax. In SQL Server, you can use apply:
select tr.*, tr2.result
from #tr tr outer apply
     (select count(*) as result
      from #tr tr2
      where tr2.user_pk = tr.user_pk and
            tr2.tr_date > dateadd(hour, -1, tr.tr_date) and
            tr2.tr_date <= tr.tr_date
     ) tr2;
SELECT USER_PK, COUNT(*) AS TransactionCount
FROM #TR
WHERE DATEDIFF(MINUTE, TR_DATE, GETDATE()) <= 60
AND DATEDIFF(MINUTE, TR_DATE, GETDATE()) >= 0
GROUP BY USER_PK
You can replace GETDATE() with whatever reference time you want, but both occurrences need to use the same value.
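For example, a minimal sketch with the reference time pulled into a variable (the variable name and value here are just illustrative):
DECLARE @ReferenceTime DATETIME;
SET @ReferenceTime = '2018-07-31 07:00:00.000'; -- any reference time instead of GETDATE()

SELECT USER_PK, COUNT(*) AS TransactionCount
FROM #TR
WHERE TR_DATE > DATEADD(HOUR, -1, @ReferenceTime)
  AND TR_DATE <= @ReferenceTime
GROUP BY USER_PK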

Can I write this query without CURSORs?

I am trying to find out the number of events that happened within a threshold time of timestamps found in another table for the same category. What is the fastest way to vary delta (in the case given below, delta is 5 minutes)? I just tested an approach using a cursor (set a variable to 5 and then keep incrementing and executing the same query), but it takes 10 seconds for each iteration. In my actual data, the number of rows in #EVENTS is approximately 100K and #CHANGES has about 500K.
My tables are as follows:
CREATE TABLE #EVENTS(Category varchar(20), Timestamp datetime)
GO
INSERT INTO #EVENTS VALUES('A', '2013-01-23 05:02:00.000')
INSERT INTO #EVENTS VALUES('A', '2013-01-23 05:04:00.000')
INSERT INTO #EVENTS VALUES('B', '2013-01-23 05:03:00.000')
INSERT INTO #EVENTS VALUES('B', '2013-01-21 05:02:00.000')
GO
CREATE TABLE #CHANGES(Category varchar(10), Timestamp datetime)
GO
INSERT INTO #CHANGES VALUES('A', '2013-01-23 05:00:00.000')
INSERT INTO #CHANGES VALUES('B', '2013-01-21 05:05:00.000')
SELECT *
FROM
(
SELECT X.Category, X.Timestamp, Y.Timestamp BeforeT, DATEADD(MINUTE, 5, Y.Timestamp) AfterT
FROM #EVENTS X, #CHANGES Y
WHERE X.Category = Y.Category
) X
WHERE X.Timestamp BETWEEN BeforeT AND AfterT
DROP TABLE #CHANGES
DROP TABLE #EVENTS
GO
Is this what you are looking for? It does a cross join to a CTE that defines the deltas:
with deltas as (
select 5 as delta union all
select 10 union all
select 20
)
SELECT *
FROM (SELECT e.Category, e.Timestamp, c.Timestamp BeforeT,
DATEADD(MINUTE, deltas.delta, c.Timestamp) AfterT,
deltas.delta
FROM #EVENTS e join
#CHANGES c
on e.Category = c.Category cross join
deltas
) X
WHERE X.Timestamp BETWEEN BeforeT AND AfterT
I also fixed your aliases. Queries read much better when the aliases are related to the underlying table name.
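If what you ultimately need is a count of events per delta, the same idea can be aggregated; for example (a sketch along the same lines):
with deltas as (
    select 5 as delta union all
    select 10 union all
    select 20
)
SELECT deltas.delta, COUNT(*) AS EventsWithinDelta
FROM #EVENTS e JOIN
     #CHANGES c
     ON e.Category = c.Category CROSS JOIN
     deltas
WHERE e.Timestamp BETWEEN c.Timestamp AND DATEADD(MINUTE, deltas.delta, c.Timestamp)
GROUP BY deltas.delta
Depending on what the count should mean when an event falls within the window of more than one change, a COUNT(DISTINCT ...) or an extra GROUP BY on Category may be more appropriate.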

SQL SERVER: View to get minimum and maximum values from a table

I have an MS SQL Server table like this:
id (auto-increment)
amount
date
account_id
Data are inserted throughout the day. I now need a view to get the opening and closing amounts for each account for each day.
My trouble is creating a fast query to access both minimum and maximum values.
Creating a view to access just the minimum is fine using an IN statement; however, getting both the minimum and maximum is tricky. I've tried using a WITH clause, but the query is incredibly slow.
BTW I am mapping the view to hibernate, so stored procedures and functions won't work the same way (that I know of).
Update
I guess my question wasn't clear from the responses I received. I want to get the opening and closing balances for each account. Maximum and minimum referred to getting the max and min (id) when grouped by date and account_id.
I want to get the amount when the id is equal to the maximum id (closing balance) and the amount when the id is equal to the minimum id (opening balance) for each account for each day.
SELECT account_id, date, MIN(amount), MAX(amount)
FROM <table>
GROUP BY account_id, date
There must be something missing from your question.
This does the work; I don't have enough data to evaluate performance:
create table #accounts
(
id integer identity,
account_id integer,
amount decimal(18,3),
tran_date datetime
)
go
insert into #accounts values (1,124.56,'06/01/2009 09:34:56');
insert into #accounts values (1,125.56,'06/01/2009 10:34:56');
insert into #accounts values (1,126.56,'06/01/2009 11:34:56');
insert into #accounts values (2,124.56,'06/01/2009 09:34:56');
insert into #accounts values (2,125.56,'06/01/2009 10:34:56');
insert into #accounts values (2,126.56,'06/01/2009 11:34:56');
insert into #accounts values (3,124.56,'06/01/2009 09:34:56');
insert into #accounts values (3,125.56,'06/01/2009 10:34:56');
insert into #accounts values (3,126.56,'06/01/2009 11:34:56');
insert into #accounts values (4,124.56,'06/01/2009 09:34:56');
insert into #accounts values (4,125.56,'06/01/2009 10:34:56');
insert into #accounts values (4,126.56,'06/01/2009 11:34:56');
insert into #accounts values (1,124.56,'06/02/2009 09:34:56');
insert into #accounts values (1,125.56,'06/02/2009 10:34:56');
insert into #accounts values (1,126.56,'06/02/2009 11:34:56');
insert into #accounts values (2,124.56,'06/02/2009 09:34:56');
insert into #accounts values (2,125.56,'06/02/2009 10:34:56');
insert into #accounts values (2,126.56,'06/02/2009 11:34:56');
insert into #accounts values (3,124.56,'06/02/2009 09:34:56');
insert into #accounts values (3,125.56,'06/02/2009 10:34:56');
insert into #accounts values (3,126.56,'06/02/2009 11:34:56');
insert into #accounts values (4,124.56,'06/02/2009 09:34:56');
insert into #accounts values (4,125.56,'06/02/2009 10:34:56');
insert into #accounts values (4,126.56,'06/02/2009 11:34:56');
go
select
ranges.tran_day transaction_day,
ranges.account_id account_id,
bod.amount bod_bal,
eod.amount eod_bal
from
-- Subquery to define min/max records per account per day
(
select
account_id,
cast(convert(varchar(10),tran_date,101) as datetime) tran_day,
max(id) max_id,
min(id) min_id
from
#accounts
group by
account_id,
cast(convert(varchar(10),tran_date,101) as datetime)
) ranges
-- Beginning of day balance
JOIN #accounts bod
on (bod.id = ranges.min_id)
-- End of day balance
JOIN #accounts eod
on (eod.id = ranges.max_id)
go
If you need better performance, store the subquery to a temp table first and put an index on it for the joins ... that might speed it up a bit.
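Something along these lines, for example (a sketch; the #ranges temp table and index names are chosen here purely for illustration):
-- Materialize the min/max ids per account per day, index them, then join back for the balances
SELECT account_id,
       cast(convert(varchar(10), tran_date, 101) as datetime) AS tran_day,
       MAX(id) AS max_id,
       MIN(id) AS min_id
INTO #ranges
FROM #accounts
GROUP BY account_id, cast(convert(varchar(10), tran_date, 101) as datetime);

CREATE INDEX ix_ranges_min ON #ranges (min_id);
CREATE INDEX ix_ranges_max ON #ranges (max_id);

SELECT r.tran_day   AS transaction_day,
       r.account_id AS account_id,
       bod.amount   AS bod_bal,
       eod.amount   AS eod_bal
FROM #ranges r
JOIN #accounts bod ON bod.id = r.min_id
JOIN #accounts eod ON eod.id = r.max_id;

DROP TABLE #ranges;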
Based on John Saunders' answer and Jeremy's comment:
SELECT account_id, DATEPART(Year, date) AS yr, DATEPART(Month, date) AS mth, DATEPART(Day, date) AS dy, MIN(amount), MAX(amount)
FROM <table>
GROUP BY account_id, DATEPART(Year, date), DATEPART(Month, date), DATEPART(Day, date)
Essentially I need the following query, but the with statement causes it to run slowly:
with x as (
select
MAX(ab.id) as maxId, MIN(ab.id) as minId
from Balance ab
group by ab.account_id, dbo.Get_PeriodDateFromDatetime(ab.StatementDate)
)
select
ab.Amount as openingBalance, ab2.Amount as closingBalance
from Balance ab, Balance ab2, x
where ab.id = x.maxId and ab2.id = x.minId
I don't know if this improves any, but the query you posted looks to be missing some parts, like account_id in the "with" query and joins on account_id in the main part:
with x as (
select
ab.account_id, MAX(ab.id) as closeId, MIN(ab.id) as openId
from Balance ab
group by ab.account_id, dbo.Get_PeriodDateFromDatetime(ab.StatementDate)
)
select
opbal.account_id, opbal.StatementDate,
opbal.Amount as openingBalance, clsbal.Amount as closingBalance
from Balance opbal, Balance clsbal, x
where clsbal.id = x.closeId
and clsbal.account_id = x.account_id
and opbal.id = x.openId
and opbal.account_id = x.account_id
I'm a little concerned about the call to dbo.Get_PeriodDateFromDatetime(ab.StatementDate): if you have an index on account_id and StatementDate (you do have that index, don't you? It looks like a good candidate for a clustered index, too) then it's maybe not too bad, unless the table is massive.
How slow is "slow", by the way?
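For reference, the kind of index meant above might look like this (a sketch with an illustrative name; keep it NONCLUSTERED if the table already has a clustered index, e.g. on its id primary key):
CREATE NONCLUSTERED INDEX ix_Balance_account_statementdate
    ON Balance (account_id, StatementDate);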

Pseudo Random Repeatable Sort in SQL Server (not NEWID() and not RAND())

I would like to randomly sort a result in a repeatable fashion for purposes such as paging. For this, NEWID() is too random, in that the same results cannot be re-obtained. ORDER BY Rand(seed) would be ideal, as with the same seed the same random collection would result. Unfortunately, the Rand() state resets with every row; does anyone have a solution?
declare @seed as int;
set @seed = 1000;
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- re-seeds for every item
select *, RAND(), RAND(id+@seed) as r from temp order by r
--1 2009-01-19 00:00:00.000 0.277720118060575 0.732224964471124
--2 2009-01-18 00:00:00.000 0.277720118060575 0.732243597442382
--3 2009-01-17 00:00:00.000 0.277720118060575 0.73226223041364
--4 2009-01-16 00:00:00.000 0.277720118060575 0.732280863384898
--5 2009-01-15 00:00:00.000 0.277720118060575 0.732299496356156
--6 2009-01-14 00:00:00.000 0.277720118060575 0.732318129327415
-- Note how the last column is +=~0.00002
drop table temp
-- interestingly this works:
select RAND(@seed), RAND()
--0.732206331499865 0.306382810665955
Note: I tried Rand(id), but that just turns out to be sorted. Apparently Rand(n) < Rand(n+1).
Building off of gkrogers' hash suggestion, this works great. Any thoughts on performance?
declare @seed as int;
set @seed = 10;
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- hashing (id + seed) gives a repeatable pseudo-random value per row
select *, HASHBYTES('md5',cast(id+@seed as varchar)) r
from temp order by r
--1 2009-01-19 00:00:00.000 0x6512BD43D9CAA6E02C990B0A82652DCA
--5 2009-01-15 00:00:00.000 0x9BF31C7FF062936A96D3C8BD1F8F2FF3
--4 2009-01-16 00:00:00.000 0xAAB3238922BCC25A6F606EB525FFDC56
--2 2009-01-18 00:00:00.000 0xC20AD4D76FE97759AA27A0C99BFF6710
--3 2009-01-17 00:00:00.000 0xC51CE410C124A10E0DB5E4B97FC2AF39
--6 2009-01-14 00:00:00.000 0xC74D97B01EAE257E44AA9D5BADE97BAF
drop table temp
EDIT: Note, the declaration of @seed and its use in the query could be replaced with a parameter or with a constant int if dynamic SQL is used (declaring @seed in a T-SQL fashion is not necessary).
You can use a value from each row to re-evaluate the rand function:
Select *, Rand(@seed + id) as r from temp order by r
Adding the id ensures that rand is reseeded for each row, but for a given value of the seed you will always get back the same sequence of rows (provided that the table does not change).
Creating a hash can be much more time consuming than creating a seeded random number.
To get more variation in the output of RAND([seed]) you need to make the [seed] vary significantly too, possibly such as...
SELECT
*,
RAND(id * 9999) AS [r]
FROM
temp
ORDER BY
r
Using a constant ensures the replicability you asked for. But be careful of the result of (id * 9999) causing an overflow if you expect your table to get big enough...
SELECT *, checksum(id) AS r FROM table ORDER BY r
This kind of works, although the output from checksum() does not look all that random to me. The MSDN documentation states:
[...], we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
But it may be faster.
After doing some reading this is an accepted method.
Select Rand(@seed) -- now rand is seeded
Select *, 0 * id + Rand() as r from temp order by r
Having id in the expression causes it to be re-evaluated for every row, but multiplying it by 0 ensures that it does not affect the outcome of rand.
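If the point is paging, the seeded ordering can then be combined with OFFSET/FETCH (SQL Server 2012+); a rough sketch, with illustrative page variables:
declare @seed as int, @pagenumber as int, @pagesize as int;
set @seed = 1000;
set @pagenumber = 2; -- which page to fetch (illustrative)
set @pagesize = 2;   -- rows per page (illustrative)

select Rand(@seed); -- seed once, as above

select *
from (select *, 0 * id + Rand() as r from temp) t
order by r
offset (@pagenumber - 1) * @pagesize rows
fetch next @pagesize rows only;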
What a horrible way of doing things!
create table temp (
id int,
date datetime)
insert into temp (id, date) values (1,'20090119')
insert into temp (id, date) values (2,'20090118')
insert into temp (id, date) values (3,'20090117')
insert into temp (id, date) values (4,'20090116')
insert into temp (id, date) values (5,'20090115')
insert into temp (id, date) values (6,'20090114')
-- NEWID() produces a different order on every execution (not repeatable)
select *, NEWID() r
from temp order by r
drop table temp
This has worked well for me in the past, and it can be applied to any table (just bolt on the ORDER BY clause):
SELECT *
FROM MY_TABLE
ORDER BY
(SELECT ABS(CAST(NEWID() AS BINARY(6)) % 1000) + 1);