SQL count number of time series events, with some start or stop entries missing


I have some START/STOP events and I need to count the total number of events, but sometimes a START or STOP is missing, for example:
Time Event
10:50 START
10:52 STOP
10:59 START
11:01 STOP
11:45 STOP
Count(Event) Where Event='START'
would return 2, but I also need to count the missing START, so the result should be 3. Any ideas on how this could be done? Thanks!

Two constraints must be met to enable event counting:
1. Two START-STOP periods cannot overlap.
2. A consecutive, chronologically ordered START and STOP pair must always belong to the same event. It must not be possible that they really originate from two different events, namely START + (missing STOP) and (missing START) + STOP.
If these conditions are met, a simple state machine can detect the "missing" events. Such row-by-row logic can almost always be implemented with a cursor.
N.B. To illustrate the generality of the cursor method, see also my other answers A (updating columns) and B (a tedious algorithm). The code structures are highly similar.
Test Dataset
use [testdb];
if OBJECT_ID('testdb..test') is not null
    drop table testdb..test;
create table test (
    [time] varchar(50),
    [event] varchar(50)
);
insert into test ([time], [event])
values ('10:50', 'START'), ('10:52', 'STOP'), ('10:59', 'START'),
       ('11:01', 'STOP'), ('11:45', 'STOP'), ('11:50', 'STOP'), ('11:55', 'START');
select * from test;
Code
/* cursor variables */
-- storage for each row
declare @time varchar(50),
        @event varchar(50),
        @state int = 0,   -- state variable
        @count int = 0;   -- event count
-- open a cursor ordered by [time]
declare cur CURSOR local
    for select [time], [event]
        from test
        order by [time];
open cur;
/* main loop */
while 1=1 BEGIN
    /* fetch next row and check termination condition */
    fetch next from cur
        into @time, @event;
    -- termination condition
    if @@FETCH_STATUS <> 0 begin
        -- check unfinished START before exit
        if @state = 1
            set @count += 1;
        -- exit loop
        break;
    end
    /* program body */
    -- case 1. state = 0 (clear state)
    if @state = 0 begin
        -- 1-1. normal case -> go to state 1
        if @event = 'START'
            set @state = 1;
        -- 1-2. a STOP without START -> keep state 0 and count++
        else if @event = 'STOP'
            set @count += 1;
        -- guard
        else
            print '[Error] Bad event name: ' + @event;
    end
    -- case 2. state = 1 (START is found)
    else if @state = 1 begin
        -- 2-1. normal case -> go to state 0 and count++
        if @event = 'STOP' begin
            set @count += 1;
            set @state = 0;
        end
        -- 2-2. a START without STOP -> keep state 1 and count++
        else if @event = 'START'
            set @count += 1;
        -- guard
        else
            print '[Error] Bad event name: ' + @event;
    end
END
-- cleanup
close cur;
deallocate cur;
Result
print @count; -- correct answer: 5
Tested on SQL Server 2017 (linux docker image, latest version).
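The cursor's state machine does not depend on SQL itself. A minimal Python sketch of the same transitions can be handy for checking expected counts on small datasets before running the cursor. count_events is a hypothetical name, and the event list is assumed to be already sorted by time.

```python
def count_events(events):
    """Count START/STOP events, treating an unmatched START or STOP as one event each.

    Mirrors the cursor above: `started` plays the role of @state = 1.
    Assumes `events` is already ordered by time.
    """
    count = 0
    started = False
    for event in events:
        if event == "START":
            if started:        # 2-2: a START without STOP, so count the unfinished event
                count += 1
            started = True     # 1-1: go to state 1
        elif event == "STOP":
            count += 1         # 1-2 (lone STOP) or 2-1 (normal START-STOP pair)
            started = False
        else:
            raise ValueError(f"Bad event name: {event}")
    if started:                # unfinished START at the end
        count += 1
    return count


test_data = ["START", "STOP", "START", "STOP", "STOP", "STOP", "START"]
print(count_events(test_data))  # 5
```

The same function returns 3 for the five events in the question.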

Well, you could count each start and then each "stop" where the preceding event is not a start (a STOP on the first row has no preceding event, so it is counted as well):
select count(*)
from (select t.*,
             lag(event) over (order by time) as prev_event
      from t
     ) t
where event = 'START' or
      (event = 'STOP' and coalesce(prev_event, 'STOP') <> 'START');
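To try the LAG() approach without a SQL Server instance, here is a sketch that runs the same logic against the test dataset in an in-memory SQLite database through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); count_events_sql is a hypothetical helper. Note that SQLite's = is case-sensitive, so the literals match the upper-case event names in the data.

```python
import sqlite3

TEST_ROWS = [("10:50", "START"), ("10:52", "STOP"), ("10:59", "START"),
             ("11:01", "STOP"), ("11:45", "STOP"), ("11:50", "STOP"),
             ("11:55", "START")]


def count_events_sql(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t ("time" TEXT, event TEXT)')
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        (n,) = con.execute("""
            SELECT COUNT(*)
            FROM (SELECT t.*, LAG(event) OVER (ORDER BY "time") AS prev_event
                  FROM t) AS x
            WHERE event = 'START'
               OR (event = 'STOP' AND COALESCE(prev_event, 'STOP') <> 'START')
        """).fetchone()
        return n
    finally:
        con.close()


print(count_events_sql(TEST_ROWS))  # 5
```

The COALESCE is what makes a leading STOP count: without it, prev_event is NULL on the first row and the comparison is never true.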

Related

weekly event select into file with different filenames depending on the variables (MariaDB)

I've always been a silent reader here until now.
Now I would like to ask for your expertise and post my very first question here.
I have to achieve the following task on a weekly basis in my MariaDB via Events:
Every week on Saturday at midnight, I want to save the results of a certain view in an Excel file (xlsx). The filename should vary depending on the site_id and the current timestamp.
After saving the results into the file, I want to clean up the DB tables with another event, but the cleanup event must only start if the previous event finished successfully.
Example filename:
viewname_[site_id]_timestamp.xlsx
overall_weekly_3_01082022.xlsx
This is what I have so far:
EVENT 1 (saving results into file):
CREATE EVENT overall_weekly
ON SCHEDULE EVERY 1 WEEK
STARTS TRUNCATE(CURRENT_TIMESTAMP) + '00:00:00' HOUR_SECONDS
ON COMPLETION PRESERVE
ENABLE
DO
DECLARE @path = char
DECLARE @view = char
DECLARE @site_id = int(3)
DECLARE @timestamp = timestamp
DECLARE @filetype = char(5)
DECLARE @full_filename = char
SET @path = "/home/reports/"
SET @view = "overall_traffic_weekly"
SET @site_id = 3
SET @timestamp = current_timestamp
SET @filetype = ".xlsx"
SET @full_filename = CONCAT(@path,@view,@site_id,@timestamp,@filetype)
SELECT * FROM
(
    SELECT 'Column_name_1','Column_name2', ...
    UNION ALL
    (
        SELECT * FROM overall_weekly
        WHERE site_id = 3
    )
) resulting_set
INTO OUTFILE @full_filename
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '/n';
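The CONCAT() call above joins its arguments with no separator, so it would not produce the underscores in the desired viewname_[site_id]_timestamp.xlsx pattern. A small Python sketch of the intended name, assuming the timestamp part is a day-month-year date (DDMMYYYY; the example 01082022 could also be read month-first) and using a hypothetical report_filename helper:

```python
from datetime import datetime


def report_filename(view: str, site_id: int, ts: datetime, ext: str = ".xlsx") -> str:
    # Hypothetical helper: viewname_[site_id]_timestamp.xlsx, with the date as DDMMYYYY
    # (an assumption; the example in the question is also readable as MMDDYYYY).
    return f"{view}_{site_id}_{ts:%d%m%Y}{ext}"


print(report_filename("overall_weekly", 3, datetime(2022, 8, 1)))
# overall_weekly_3_01082022.xlsx
```

In MariaDB, DATE_FORMAT(CURRENT_TIMESTAMP, '%d%m%Y') yields the date part in that form, and CONCAT_WS('_', ...) inserts the underscores between its arguments.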
EVENT 2 (cleanup):
EVENT 1 must be SUCCESSFULLY finished for event 2 to start.
IF event 1 finishes with errors, cleanup must not start.
CREATE EVENT cleanup
ON SCHEDULE EVERY 1 WEEK
STARTS TRUNCATE(CURRENT_TIMESTAMP) + '03:00:00' HOUR_SECONDS
ON COMPLETION PRESERVE
ENABLE
DO
TRUNCATE sourcetable1,
TRUNCATE Sourcetable2
;
Many thanks for reading.

Problem solved:
I used 2 tables instead and matched the 2 records together in a third table

SQL Server: Why does adding a null to a variable not cause an error?

I have a stored procedure that has a loop based on a counter. When the counter becomes NULL the loop ends without any error. Why doesn't SQL Server at least display a warning or error message like other programming languages?
Here is a code sample which exhibits the problem:
DECLARE @MasterCount int = 0;
DECLARE @Count int; -- initialized to NULL by SQL Server
PRINT 'Starting'
IF (@MasterCount IS NULL)
    PRINT '@MasterCount IS NULL';
ELSE
    PRINT '@MasterCount ' + CAST(@MasterCount AS varchar(10))
IF (@Count IS NULL)
    PRINT '@Count IS NULL';
WHILE (@MasterCount IS NOT NULL)
BEGIN
    SET @MasterCount += @Count;
    IF @@ERROR <> 0 PRINT 'Error occurred!'
    PRINT 'Loop @Count ' + CAST(@Count AS varchar(10))
    SET @Count -= 1;
END
IF @@ERROR <> 0 PRINT 'Error occurred!'
IF (@MasterCount IS NULL)
    PRINT '@MasterCount IS NULL';
ELSE
    PRINT '@MasterCount ' + CAST(@MasterCount AS varchar(10))
PRINT 'Ending'
Produces the following output:
Starting
@MasterCount 0
@Count IS NULL
@MasterCount IS NULL
Ending

It doesn't raise an error because this is defined, documented behaviour.
If you have two apples and you know the weight of only one then it makes sense that the weight of both of them added together is not known.
You can actually get a warning to appear if you slightly alter the formulation.
Instead of
SET @MasterCount += @Count;
You could use
SELECT @MasterCount = SUM(C)
FROM (VALUES (@Count),
             (@MasterCount)) V(C);
In which case it gives
Warning: Null value is eliminated by an aggregate or other SET operation.
This does change the semantics, however. Because the NULL value is ignored entirely, @MasterCount would simply be assigned back its original value rather than being set to NULL in your scenario.
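The two behaviours can be observed side by side in any SQL engine. A sketch using Python's built-in sqlite3 module (an in-memory SQLite database, not SQL Server, so no "Null value is eliminated" warning is raised): scalar addition with NULL yields NULL, while SUM() skips the NULL input.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Like SET @MasterCount += @Count with @Count NULL: the result is NULL (None in Python).
plus_result = con.execute("SELECT 0 + NULL").fetchone()[0]

# Like the SUM(C) reformulation: the NULL row is ignored, so the original value survives.
sum_result = con.execute(
    "SELECT SUM(c) FROM (SELECT NULL AS c UNION ALL SELECT 0)"
).fetchone()[0]

con.close()
print(plus_result, sum_result)  # None 0
```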

Procedure stops when legacy, new error traps are next to each other

We have a bunch of old stored procedures with legacy style error trapping. I changed one the other day and included a newer TRY...CATCH block. The stored procedure just stopped after the TRY/CATCH and returned as though there were an error in the legacy block.
If I put a
SELECT NULL
in between the two, everything works fine. Does anyone know why this is happening?
--BEGIN NEW ERROR TRAP--
BEGIN TRY
Do stuff...
END TRY
BEGIN CATCH
END CATCH
--END NEW ERROR TRAP---
----------------- OLD SCHOOL TRAP BEGIN -----------------
SELECT @spERROR = @@ERROR ,
       @spROWCOUNT = @@ROWCOUNT
SET @spRETURN = @spRETURN + 1
IF ( @spROWCOUNT <= 0
     OR @spERROR <> 0
   )
    SET @spRETURN = 0 - @spRETURN
IF ( @spROWCOUNT <= 0
     OR @spERROR <> 0
   )
    RETURN @spRETURN
SELECT @spROWCOUNT = -1 ,
       @spERROR = -1
------------------ OLD SCHOOL ERROR TRAP END ------------------

In your TRY...CATCH block, the last statement is probably doing something that sets the row count to 0. The "SELECT NULL" sets the row count to 1, because it returns one row, so no error is detected.
You can fix this by changing the logic in the "old" code or by setting your row count variable inside the TRY...CATCH code. I would recommend removing the SELECT NULL, since it makes the check always succeed and you may not want that behavior.

Optimizing Levenshtein distance algorithm

I have a stored procedure that uses Levenshtein distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using:
ALTER function dbo.Levenshtein
(
    @Source nvarchar(200),
    @Target nvarchar(200)
)
RETURNS int
AS
BEGIN
    DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000)
    SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0
    WHILE @j <= @Target_len
    BEGIN
        SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1
    END
    WHILE @i <= @Source_len
    BEGIN
        SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1
        WHILE @j <= @Target_len
        BEGIN
            SET @Dist = @Dist + 1
            SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) +
                CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END
            IF @Dist > @Dist_temp
            BEGIN
                SET @Dist = @Dist_temp
            END
            SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1
            IF @Dist > @Dist_temp SET @Dist = @Dist_temp
            BEGIN
                SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1
            END
        END
        SELECT @Distv1 = @Distv0, @i = @i + 1
    END
    RETURN @Dist
END
Where should I go from here?
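The T-SQL function above fills the classic Levenshtein dynamic-programming table two rows at a time (@Distv1 holds the previous row, @Distv0 the current one). As a reference point for experimenting with optimizations, here is a minimal Python sketch of the same recurrence. The max_dist parameter is a hypothetical addition, not part of the original function: because the minimum of each row never decreases, the loop can stop as soon as every cell in a row exceeds the bound, which avoids computing the full table for hopeless candidates.

```python
from typing import Optional


def levenshtein(source: str, target: str, max_dist: Optional[int] = None) -> int:
    # prev holds row i-1 of the DP table; row 0 is the distance from "" to each target prefix.
    prev = list(range(len(target) + 1))
    for i, sc in enumerate(source, start=1):
        cur = [i]  # distance from source[:i] to ""
        for j, tc in enumerate(target, start=1):
            cur.append(min(cur[j - 1] + 1,             # insertion
                           prev[j] + 1,                # deletion
                           prev[j - 1] + (sc != tc)))  # substitution or match
        # Row minima never decrease, so once the whole row exceeds the bound we can stop.
        if max_dist is not None and min(cur) > max_dist:
            return max_dist + 1
        prev = cur
    dist = prev[-1]
    if max_dist is not None and dist > max_dist:
        return max_dist + 1
    return dist


print(levenshtein("kitten", "sitting"))              # 3
print(levenshtein("kitten", "sitting", max_dist=1))  # 2, meaning "more than 1"
```

One difference to keep in mind: Python's len() counts trailing spaces, whereas T-SQL's LEN() ignores them.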

The way I've done this in the past is to store the "database" (actually a dictionary of words for a spelling corrector) as a trie.
Then I used a branch-and-bound routine to look up nearest matching entries. For small distances, the time it takes is exponential in the distance. For large distances, it is linear in the size of the dictionary, just as you are seeing now.
Branch-and-bound is basically a depth-first tree walk of the trie, but with an error budget. At each node, you keep track of the current Levenshtein distance, and if it exceeds the budget, you prune that branch of the tree.
First you do the walk with a budget of zero. That will only find exact matches. If you don't find a match, then you walk it with a budget of one. That will find matches at a distance of 1. If you don't find any, then you do it with a budget of 2, and so on. This sounds inefficient, but since each walk takes so much more time than the previous one, the time is dominated by the last walk that you make.
Added: outline of code (pardon my C):
#include <stddef.h>

// dumb version of trie node, indexed by letter. You can improve.
typedef struct tnodeTag {
    struct tnodeTag *p[128];
    int isWord;     // nonzero if a dictionary word ends at this node
} tnode;
tnode *top;         // the top of the trie
static void swap(char *a, char *b){ char t = *a; *a = *b; *b = t; }
void walk(tnode *p, char *s, int budget){
    int i;
    // prune: no such branch in the trie, or the error budget is exhausted
    if (p == NULL || budget < 0) return;
    if (*s == 0 && p->isWord){
        // print the current trie path
    }
    if (*s){
        // try deleting this letter
        walk(p, s+1, budget-1);
        // try swapping two adjacent letters
        if (s[1]){
            swap(&s[0], &s[1]);
            walk(p, s, budget-1);
            swap(&s[0], &s[1]);
        }
    }
    for (i = 1; i < 128; i++){
        if (p->p[i] == NULL) continue;
        // try exact match
        if (*s && i == *s) walk(p->p[i], s+1, budget);
        // try replacing this character
        else if (*s) walk(p->p[i], s+1, budget-1);
        // try inserting this letter
        walk(p->p[i], s, budget-1);
    }
}
Basically, you simulate deleting a letter by skipping it and searching at the same node. You simulate inserting a letter by descending the trie without advancing s. You simulate replacing a letter by acting as if the letter matched, even though it doesn't. When you get the hang of it, you can add other possible mismatches, like replacing 0 with O and 1 with L or I - dumb stuff like that.
You probably want to add a character array argument to represent the current word you are finding in the trie.
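The outline above, including the iterative deepening over the budget, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the dictionary fits in memory, each trie node uses a dict instead of a 128-slot array, the path string plays the role of the suggested character array argument, and build_trie, walk and nearest are hypothetical names. Adjacent swaps are charged one edit, so results can differ from the pure Levenshtein distance that the T-SQL function computes.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a dictionary word ends at this node


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def walk(node: TrieNode, s: str, i: int, budget: int, path: str, found: Set[str]) -> None:
    """Depth-first walk of the trie, pruning any branch whose error budget drops below zero."""
    if budget < 0:
        return
    if i == len(s) and node.is_word:
        found.add(path)
    if i < len(s):
        # Delete s[i]: skip it and stay at the same trie node.
        walk(node, s, i + 1, budget - 1, path, found)
        # Swap s[i] and s[i+1], then keep matching at the same position.
        if i + 1 < len(s):
            swapped = s[:i] + s[i + 1] + s[i] + s[i + 2:]
            walk(node, swapped, i, budget - 1, path, found)
    for ch, child in node.children.items():
        if i < len(s):
            # Exact match costs nothing; replacing s[i] with ch costs 1.
            walk(child, s, i + 1, budget - (ch != s[i]), path + ch, found)
        # Insert ch: descend the trie without advancing in s.
        walk(child, s, i, budget - 1, path + ch, found)


def nearest(root: TrieNode, s: str, max_budget: int = 3) -> Tuple[Optional[int], List[str]]:
    """Try budget 0, then 1, 2, ... and stop at the first budget that finds matches."""
    for budget in range(max_budget + 1):
        found: Set[str] = set()
        walk(root, s, 0, budget, "", found)
        if found:
            return budget, sorted(found)
    return None, []


root = build_trie(["cat", "cart", "cast", "dog"])
print(nearest(root, "cta"))  # (1, ['cat'])
```

Collecting matches in a set matters because the same word is usually reachable through several different edit sequences.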

How to organize infinite while loop in SQL Server?

I want to use an infinite WHILE loop in SQL Server 2005 and use the BREAK keyword to exit from it on a certain condition.
while true does not work, so I have to use while 1=1.
Is there a better way to organize an infinite loop?
I know that I can use goto, but while 1=1 begin ... end looks better structurally.

In addition to the WHILE 1 = 1 that the other answers suggest, I often add a "timeout" to my SQL "infinite" loops, as in the following example:
DECLARE @startTime datetime2(0) = GETDATE();
-- This will loop until BREAK is called, or until a timeout of 45 seconds.
WHILE (GETDATE() < DATEADD(SECOND, 45, @startTime))
BEGIN
-- Logic goes here: The loop can be broken with the BREAK command.
-- Throttle the loop for 2 seconds.
WAITFOR DELAY '00:00:02';
END
I found the above technique useful within a stored procedure that gets called from a long polling AJAX backend. Having the loop on the database-side frees the application from having to constantly hit the database to check for fresh data.
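The deadline-plus-throttle pattern is not specific to T-SQL. A Python sketch that makes the control flow explicit, where poll_until and its parameters are hypothetical names: returning early plays the role of BREAK, and time.sleep plays the role of WAITFOR DELAY. It uses time.monotonic rather than wall-clock time so that clock adjustments cannot stretch or shrink the timeout.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], timeout_s: float, interval_s: float) -> bool:
    """Call check() until it returns True or timeout_s elapses; True means success."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:  # like WHILE GETDATE() < DATEADD(SECOND, 45, @startTime)
        if check():
            return True                 # the equivalent of BREAK once fresh data is found
        time.sleep(interval_s)          # like WAITFOR DELAY '00:00:02'
    return False


# Example: stop after the third check reports fresh data.
answers = iter([False, False, True])
print(poll_until(lambda: next(answers), timeout_s=5.0, interval_s=0.01))  # True
```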

Using While 1 = 1 with a Break statement is the way to do it. There is no constant in T-SQL for TRUE or FALSE.

If you really have to use an infinite loop, then using while 1=1 is the way I'd do it.
The question here is, isn't there some other way to avoid an infinite loop? These things just tend to go wrong ;)

You could use the snippet below to kick off a stored procedure after some conditions are met. I assume that you have some sort of CurrentJobStatus table where all the jobs/stored procedures keep their status...
-- *** reload data on N Support.usp_OverrideMode with checks on Status
/* run
Support.usp_OverrideMode.Number1.sql
and
Support.usp_OverrideMode.Number2.sql
*/
DECLARE @FileNameSet TABLE (FileName VARCHAR(255));
INSERT INTO @FileNameSet
VALUES ('%SomeID1%');
INSERT INTO @FileNameSet
VALUES ('%SomeID2%');
DECLARE @BatchRunID INT;
DECLARE @CounterSuccess INT = 0;
DECLARE @CounterError INT = 0;
-- Loop until every file has completed successfully or any file has failed
WHILE (@CounterError = 0 AND @CounterSuccess < (select COUNT(1) c from @FileNameSet) )
BEGIN
    DECLARE @CurrentStatus VARCHAR(255)
    SELECT @CurrentStatus = CAST(GETDATE() AS VARCHAR)
    -- Logic goes here: The loop can be broken with the BREAK command.
    SELECT @CounterSuccess = COUNT(1)
    FROM dbo.CurrentJobStatus t
    INNER JOIN @FileNameSet fns
        ON (t.FileName LIKE fns.FileName)
    WHERE LoadStatus = 'Completed Successfully'
    SELECT @CounterError = COUNT(1)
    FROM dbo.CurrentJobStatus t
    INNER JOIN @FileNameSet fns
        ON (t.FileName LIKE fns.FileName)
    WHERE LoadStatus = 'Completed with Error(s)'
    -- Throttle the loop for 3 seconds.
    WAITFOR DELAY '00:00:03';
    select @CurrentStatus = @CurrentStatus + char(9) + '@CounterSuccess ' + CAST(@CounterSuccess AS VARCHAR(11))
        + char(9) + 'CounterError ' + CAST(@CounterError AS VARCHAR(11))
    RAISERROR (
        'Looping... # %s'
        ,0
        ,1
        ,@CurrentStatus
    )
    WITH NOWAIT;
END
-- TODO add some condition on @CounterError value
/* run
Support.usp_OverrideMode.WhenAllSuceed.sql
*/
Note that the code is flexible: you can add as many condition checks on the @FileNameSet table variable as you like.
Mario

We Keep Coding
