I'm using SQL Server as a warehouse to analyze log files. Those log files carry a kind of business hierarchy (a worker in this example):
Log Entry Id, Log Message
1 , Start Worker
2 , Do Cool Stuff
3 , Start Worker
4 , Do further cool stuff
5 , Start Worker
6 , This is a lot of working
7 , End worker
8 , End worker
9 , End worker
I need to relate the log entries to the current worker. The rule is quite simple: once a "Start worker" message is found, assign all following log entries to this worker. In the example hierarchy this means:
Log Entry Id, Log Message , Worker
1 , Start Worker , 1 (we take the entry id as worker id)
2 , Do Cool Stuff , 1
3 , Start Worker , 3
4 , Do further cool stuff , 3
5 , Start Worker , 5
6 , This is a lot of working , 5
7 , End worker , 5
8 , End worker , 3
9 , End worker , 1
Currently I'm using a stored procedure iterating all log entries with a cursor which basically uses a stack to establish the relationship between log entries and workers:
CREATE PROCEDURE CalculateRelations
AS
BEGIN
    DECLARE entries_cur CURSOR FOR
        SELECT Id, LogMessage
        FROM LogEntries
        ORDER BY Id;

    DECLARE @Id BIGINT;
    DECLARE @LogMessage VARCHAR(128);
    DECLARE @ParentWorker BIGINT;
    DECLARE @WorkerStack VARCHAR(MAX) = '';

    OPEN entries_cur;
    FETCH NEXT FROM entries_cur INTO @Id, @LogMessage;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.GetParentWorker @WorkerStack OUT, @Id, @LogMessage, @ParentWorker OUT;

        UPDATE LogEntries
        SET ParentWorker = @ParentWorker
        WHERE Id = @Id;

        FETCH NEXT FROM entries_cur INTO @Id, @LogMessage;
    END;

    CLOSE entries_cur;
    DEALLOCATE entries_cur;
END;
GO
GetParentWorker is a stored procedure which uses the given VARCHAR variable WorkerStack as a stack. This means:
a "Start worker" message leads to adding (push) the Id to that VARCHAR
an "End worker" message leads to removing and returning (pop) the last Id from that VARCHAR
all other messages lead to just returning (read) the last Id from that VARCHAR without modifying it
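For reference, here is a simplified sketch of what GetParentWorker does (the real procedure is a bit longer; the stack is just a comma-separated list of ids, and a well-formed log is assumed):
CREATE PROCEDURE GetParentWorker
    @WorkerStack VARCHAR(MAX) OUTPUT,
    @Id BIGINT,
    @LogMessage VARCHAR(128),
    @ParentWorker BIGINT OUTPUT
AS
BEGIN
    IF @LogMessage = 'Start Worker'
    BEGIN
        -- push: the starting entry's id becomes the current worker
        SET @WorkerStack = @WorkerStack + ',' + CAST(@Id AS VARCHAR(20));
        SET @ParentWorker = @Id;
    END
    ELSE
    BEGIN
        -- read: the current worker is the last id on the stack
        DECLARE @LastComma INT = LEN(@WorkerStack) - CHARINDEX(',', REVERSE(@WorkerStack)) + 1;
        SET @ParentWorker = CAST(SUBSTRING(@WorkerStack, @LastComma + 1, LEN(@WorkerStack)) AS BIGINT);

        -- pop: an "End worker" message also removes that id from the stack
        IF @LogMessage = 'End worker'
            SET @WorkerStack = LEFT(@WorkerStack, @LastComma - 1);
    END
END;
GO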
Now I'm wondering if it's possible to replace this cursor construct with a single UPDATE statement. I'm not that deep into SQL and SQL Server, but might it be possible to achieve this with dynamic variable assignment, CASE and the return value of GetParentWorker?
I think this is similar to Ian's, but I'll post it as a slightly different approach to the indent level. I think you definitely want to put that indent level into the table with some indexing, or this is going to be slow on large tables.
I'm using a CTE to calculate the indent level (basically just adding and subtracting one whenever we hit a start or an end, using a window function over the preceding rows, with a special case when the current row ends a worker). Outside of this toy solution, you'd want to limit the preceding rows to rows without an assigned worker, and also to rows back to the last time the level was zero.
Then we can just find the prior 'Start Worker' with the same level. These could probably be marked in pre-processing and indexed for quicker lookups.
UPDATE:
Simplified the update statement by introducing a window-function CTE to calculate the worker id. This should reduce individual row lookups and improve the performance of the update. See SQL Fiddle.
WITH
WorkerNestingLevel AS (
SELECT
AuditLog.LogId
, AuditLog.LogMessage
, SUM( CASE LogMessage WHEN 'Start Worker' THEN 1 WHEN 'End Worker' THEN -1 ELSE 0 END ) OVER (ORDER BY LogId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
+ CASE LogMessage WHEN 'End Worker' THEN 1 ELSE 0 END AS [WorkerLevel]
FROM
AuditLog
)
, WorkerBatch AS (
SELECT
WorkerNestingLevel.LogId
, MAX( CASE WorkerNestingLevel.LogMessage WHEN 'Start Worker' THEN WorkerNestingLevel.LogId ELSE NULL END) OVER (PARTITION BY WorkerNestingLevel.WorkerLevel ORDER BY WorkerNestingLevel.LogId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS WorkerId
FROM
WorkerNestingLevel
)
UPDATE
AuditLog
SET
WorkerId = WorkerBatch.WorkerId
FROM
AuditLog
JOIN
WorkerBatch ON (WorkerBatch.LogID = AuditLog.LogId);
My apologies for misunderstanding on the first attempt; hopefully I've understood this time that each 'End Worker' cancels out one of the 'Start Worker' entries that precede it. Here it is using a WITH statement that generates a dataset with a field called indent, which you need in order to establish how far back to look for the correct [Log Entry ID]. Does that meet the requirement?
WITH indenttable AS (SELECT [Log Entry ID]
, [Log Message]
, ((SELECT COUNT(*)
FROM yourtable y2
WHERE [Log Message]='Start Worker'
AND y2.[Log Entry ID]<=yourtable.[Log Entry ID])
-(SELECT COUNT(*)
FROM yourtable y2
WHERE [Log Message]='End Worker'
AND y2.[Log Entry ID]<yourtable.[Log Entry ID])) indent
FROM yourtable)
UPDATE yourtable
SET worker=(
SELECT TOP(1) [Log Entry ID]
FROM indenttable y2
WHERE [Log Message]='Start Worker'
AND y2.[Log Entry ID]<=indenttable.[Log Entry ID]
AND y2.indent<=indenttable.indent
ORDER BY [Log Entry ID] DESC)
FROM indenttable JOIN yourtable ON indenttable.[Log Entry ID]=yourtable.[Log Entry ID];
Related
How can I delete duplicate data based on the common value (Start and End)
(Time is unique key)
My table is:
Time       Data
--------   ------
10:24:11   Start
10:24:12   Result
10:24:13   Result
10:24:14   End
10:24:15   Start
10:24:16   Result
10:24:17   End
When duplicate Result rows occur between a Start and an End, I want to keep only the Result row with the MAX(Time), as such:
The result that I want:
Time       Data
--------   ------
10:24:11   Start
10:24:13   Result
10:24:14   End
10:24:15   Start
10:24:16   Result
10:24:17   End
I have tried rearranging the data, but couldn't seem to get the result that I want. Could someone give their advice on this case?
Update
I ended up not using either of the approaches suggested by @fredt and @airliquide, as my version of HSQLDB doesn't support the functions.
So what I did was add a sequence column and an indicator, making Start = 1, Result = 2, and End = 3.
Sequence   Time       Data     Indicator
--------   --------   ------   ---------
1          10:24:11   Start    1
2          10:24:12   Result   2
3          10:24:13   Result   2
4          10:24:14   End      3
5          10:24:15   Start    1
6          10:24:16   Result   2
7          10:24:17   End      3
From there, I use the indicator and sequence to keep only the latest Result: when a row's previous row also has indicator 2 (a Result), I remove that previous row.
The guide that I follow:
From: Is there a way to access the "previous row" value in a SELECT statement?
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
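The delete I ended up with then looked roughly like this (a sketch; mytable stands in for my actual table name):
-- Sketch only: delete a Result row whenever the next row (by Sequence) is also
-- a Result, so that only the latest Result before each End survives.
DELETE FROM mytable
WHERE Indicator = 2
  AND EXISTS (SELECT 1
              FROM mytable t2
              WHERE t2.Sequence = mytable.Sequence + 1
                AND t2.Indicator = 2);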
Hi, a first approach would be to use a LEAD window function, as follows:
select hour, status
from (select *, lead(status, 1) over (order by hour) as lead
      from newtable) compare
where compare.lead <> status
   or lead is null
This gives me what's expected using a Postgres engine.
You can do this sort of thing with SQL procedures.
-- create the table with only two columns
CREATE TABLE actions (attime TIME UNIQUE, data VARCHAR(10));
-- drop the procedure if it exists
DROP PROCEDURE del_duplicates IF EXISTS;
create procedure del_duplicates() MODIFIES SQL DATA begin atomic
DECLARE last_time time(0) default null;
for_loop:
-- loop over the rows in order
FOR SELECT * FROM actions ORDER BY attime DO
-- each time 'Start' is found, clear the last_time variable
IF data = 'Start' THEN
SET last_time = NULL;
ITERATE for_loop;
END IF;
-- each time 'Result' is found, delete the row with previous time
-- if last_time is null, no row is actually deleted
IF data = 'Result' THEN
DELETE FROM actions WHERE attime = last_time;
-- then store the latest time
SET last_time = attime;
ITERATE for_loop;
END IF;
END FOR;
END
Your data must all belong to a single day, otherwise there will be strange overlaps that cannot be distinguished. It is better to use TIMESTAMP instead of TIME.
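For example (a sketch, keeping the same column names):
-- same structure, but keyed on a full timestamp so rows from different days stay distinct
CREATE TABLE actions (attime TIMESTAMP(0) UNIQUE, data VARCHAR(10));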
I have two tables I'm trying to conditionally JOIN.
dbo.Users looks like this:
UserID
------
24525
5425
7676
dbo.TelemarketingCallAudits looks like this (date format dd/mm/yyyy):
UserID Date CampaignID
------ ---------- ----------
24525 21/01/2018 1
24525 26/08/2018 1
24525 17/02/2018 1
24525 12/01/2017 2
5425 22/01/2018 1
7676 16/11/2017 2
I'd like to return a table that contains ONLY users that I called at least 30 days ago (if CampaignID=1) and at least 70 days ago (if CampaignID=2).
The end result should look like this (today is 02/09/18):
UserID Date CampaignID
------ ---------- ----------
5425 22/01/2018 1
7676 16/11/2017 2
Note that because I called user 24525 with Campaign 1 only 7 days ago, I should not see that user at all.
I tried this simple AND/OR condition, but found out it will still return the users I shouldn't see, because they have rows for other calls and it simply ignores the conditioned calls... which obviously misses the goal.
I have no idea how to exclude a user entirely if ANY of their associated rows in the second table fails the condition.
AND
(
internal_TelemarketingCallAudits.CallAuditID IS NULL --No telemarketing calls is fine
OR
(
internal_TelemarketingCallAudits.CampaignID = 1 --Campaign 1
AND
DATEADD(dd, 75, MAX(internal_TelemarketingCallAudits.Date)) < GETDATE() --Last call occurred at least 75 days ago
)
OR
(
internal_TelemarketingCallAudits.CampaignID != 1 --Other campaigns
AND
DATEADD(dd, 10, MAX(internal_TelemarketingCallAudits.Date)) < GETDATE() --Last call occurred at least 10 days ago
)
)
I really appreciate your help.
Try this: SQL Fiddle
select *
from dbo.Users u
inner join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
inner join (
values (1, 60)
, (2, 70)
) c (CampaignId, DaysSinceLastCall)
on tca.CampaignId = c.CampaignId
) mrc
on mrc.UserId = u.UserId
and mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
I'm not comparing all rows here; rather, I noticed that what you're really interested in is the most recent call, and then whether that falls within the X-day window. There's a bit of additional complexity because X varies by campaign, so it's not so much the most recent call you care about as the call most likely to fall within its window. To get around that, I sort each user's calls so that those within their window come first, followed by those which aren't, and within those two groups I sort by most recent first. This gives me the field r.
By filtering on r = 1 for each user, we only get the most recent call (adjusted for campaign windows). By filtering on LastCalledInWindow = 0 we exclude those who have been called within the campaign's window.
NB: I've used an inner query (aliased c) to hold the campaign ids and their corresponding windows. In reality you'd probably want a campaigns table holding that same information instead of coding inside the query itself.
Hopefully everything else is self-explanatory; but give me a nudge in the comments if you need any further information.
UPDATE
Just realised you'd also said "no calls is fine"... Here's a tweaked version to allow for scenarios where the person has not been called.
SQL Fiddle Example.
select *
from dbo.Users u
left outer join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,c.DaysSinceLastCall, tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
inner join (
values (1, 60)
, (2, 70)
) c (CampaignId, DaysSinceLastCall)
on tca.CampaignId = c.CampaignId
) mrc
on mrc.UserId = u.UserId
where
(
mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
)
or mrc.r is null --no calls at all
Update: Including a default campaign offset
To include a default, you could do something like the code below (SQL Fiddle Example). Here, I've put each campaign's offset value in the Campaigns table, but created a default campaign with ID = -1 to handle anything for which there is no offset defined. I use a left join between the audit table and the campaigns table so that we get all records from the audit table, regardless of whether there's a campaign defined, then a cross join to get the default campaign. Finally, I use a coalesce to say "if the campaign isn't defined, use the default campaign".
select *
from dbo.Users u
left outer join ( --get the most recent call per user (taking into account different campaign timescales)
select tca.UserId
, tca.CampaignId
, tca.[Date]
, case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end LastCalledInWindow
, row_number() over (partition by tca.UserId order by case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r
from dbo.TelemarketingCallAudits tca
left outer join Campaigns c
on tca.CampaignId = c.CampaignId
cross join Campaigns dflt
where dflt.CampaignId = -1
) mrc
on mrc.UserId = u.UserId
where
(
mrc.r = 1 --only accept the most recent call
and mrc.LastCalledInWindow = 0 --only include if they haven't been contacted in the last x days
)
or mrc.r is null --no calls at all
That said, I'd recommend not using a default, but rather ensuring that every campaign has an offset defined. i.e. Presumably you already have a campaigns table; and since this offset value is defined per campaign, you can include a field in that table for holding this offset. Rather than leaving this as null for some records, you could set it to your default value; thus simplifying the logic / avoiding potential issues elsewhere where that value may subsequently be used.
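For example, something along these lines (a sketch; the constraint name and the 30-day default are assumptions):
-- Sketch only: store each campaign's call-back window on the campaign itself,
-- filling existing rows with an assumed default of 30 days rather than NULL.
ALTER TABLE dbo.Campaigns
    ADD DaysSinceLastCall INT NOT NULL
        CONSTRAINT DF_Campaigns_DaysSinceLastCall DEFAULT (30);

-- then set the per-campaign values used above
UPDATE dbo.Campaigns SET DaysSinceLastCall = 60 WHERE CampaignId = 1;
UPDATE dbo.Campaigns SET DaysSinceLastCall = 70 WHERE CampaignId = 2;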
You'd also asked about the order by clause. There is no order by 1/0; so I assume that's a typo. Rather the full statement is row_number() over (partition by tca.UserId order by case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end desc, tca.[Date] desc) r.
The purpose of this piece is to find the "most important" call for each user. By "most important" I basically mean the most recent, since that's generally what we're after; though there's one caveat. If a user is part of 2 campaigns, one with an offset of 30 days and one with an offset of 60 days, they may have had 2 calls, one 32 days ago and one 38 days ago. Though the call from 32 days ago is more recent, if that's on the campaign with the 30 day offset it's outside the window, whilst the older call from 38 days ago may be on the campaign with an offset of 60 days, meaning that it's within the window, so is more of interest (i.e. this user has been called within a campaign window).
Given the above requirement, here's how this code meets it:
row_number() produces a number from 1, counting up, for each row in the (sub)query's results. The counter is reset to 1 for each partition
partition by tca.UserId says that we're partitioning by the user id; so for each user there will be 1 row for which row_number() returns 1, then for each additional row for that user there will be a consecutive number returned.
The order by part of this statement defines which of each users' rows gets #1, then how the numbers progress thereafter; i.e. the first row according to the order by gets number 1, the next number 2, etc.
case when DateAdd(Day,coalesce(c.DaysSinceLastCall,dflt.DaysSinceLastCall), tca.[Date]) > getutcdate() then 1 else 0 end returns 1 for calls within their campaign's window, and 0 for those outside of the window. Since we're ordering by this result in ascending order, that says that any records within their campaign's window should be returned before any outside of their campaign's window.
we then order by tca.[Date] desc; i.e. the more recent calls are returned before the later calls.
finally, we name the output of this row number as r and in the outer query filter on r = 1; meaning that for each user we only take one row, and that's the first row according to the order criteria above; i.e. if there's a row in its campaign's window we take that, after which it's whichever call was most recent (within those in the window if there were any; then outside that window if there weren't).
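To see the ordering caveat from the points above in isolation, here's a tiny self-contained example (all values are made up):
-- user 1 has a call 32 days ago on a 30-day campaign (outside its window) and a
-- call 38 days ago on a 60-day campaign (still inside its window); the in-window
-- call gets r = 1 even though it is older.
select calls.UserId
     , calls.[Date]
     , calls.DaysSinceLastCall
     , case when dateadd(day, calls.DaysSinceLastCall, calls.[Date]) > getutcdate() then 1 else 0 end as InWindow
     , row_number() over (partition by calls.UserId
                          order by case when dateadd(day, calls.DaysSinceLastCall, calls.[Date]) > getutcdate() then 1 else 0 end desc
                                 , calls.[Date] desc) as r
from (values (1, dateadd(day, -32, getutcdate()), 30)
           , (1, dateadd(day, -38, getutcdate()), 60)
     ) calls (UserId, [Date], DaysSinceLastCall);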
Take a look at the output of the subquery to get a better idea of exactly how this works: SQL Fiddle
I hope that explanation makes some sense / helps you to understand the code? Sadly I can't find a way to explain it more concisely than the code itself does; so if it doesn't make sense try playing with the code and seeing how that affects the output to see if that helps your understanding.
I am using PostgreSQL on Amazon Redshift.
My table is :
drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100),start timestamp,end1 timestamp);
insert into APP_Tax values('AFH','2016-01-26 00:39:51','2016-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), --row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34') --row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I need to compare the end time of each record with the start time of the next record, and if the time difference is < 10 seconds, extend the range with the next record's end time, continuing until the last record.
I.e. datediff(second, '2016-01-26 00:39:55', '2016-01-26 00:39:56') is < 10 seconds.
I tried this :
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned:
b.start > a.start means that a row will never join with itself
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.
However, there are further logic errors in the query that it will not handle successfully. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of a DISTINCT ON clause, which returns one row per distinct value of a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to apply to an SQL query (which is designed to operate on rows of results). I would recommend extracting the data and running it through a simple script (eg in Python) that could then output the Start & End combinations you seek.
This is an excellent use-case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and would only output a record when the desired end-logic has been met.
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using Windows Functions.
The complete solution might look like this:
SELECT
start AS session_start,
session_end
FROM (
SELECT
start,
end1,
lead(end1, 1)
OVER (
ORDER BY end1) AS session_end,
session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN session_switch = 0 AND reverse_session_switch = 1
THEN 'start'
ELSE 'end' END AS session_boundary
FROM (
SELECT
start,
end1,
CASE WHEN datediff(seconds, end1, lead(start, 1)
OVER (
ORDER BY end1 ASC)) > 10
THEN 1
ELSE 0 END AS session_switch,
CASE WHEN datediff(seconds, lead(end1, 1)
OVER (
ORDER BY end1 DESC), start) > 10
THEN 1
ELSE 0 END AS reverse_session_switch
FROM app_tax
)
AS sessioned
WHERE session_switch != 0 OR reverse_session_switch != 0
UNION
SELECT
start,
end1,
'start'
FROM (
SELECT
start,
end1,
row_number()
OVER (PARTITION BY APP_nm
ORDER BY end1 ASC) AS row_num
FROM APP_Tax
) AS with_row_number
WHERE row_num = 1
) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC
;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in), i.e. the rows where the gap between one record's end and the next record's start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where the specific switches occur. If you run the subquery by itself it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions and are ignored.
with_end - finally, we combine each 'start' row with the end time of its matching 'end' row (thus defining the session duration), and remove the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.
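For comparison, here is a more compact sketch of the same sessionisation idea (an alternative to the query above, under the same 10-second assumption): flag a row as starting a new session when the gap from the previous row's end exceeds 10 seconds, turn the flags into session numbers with a running sum, then aggregate per session.
SELECT app_nm,
       MIN(start) AS session_start,
       MAX(end1)  AS session_end
FROM (
    SELECT app_nm,
           start,
           end1,
           -- running count of "new session" flags gives each row a session number
           SUM(is_new_session) OVER (PARTITION BY app_nm
                                     ORDER BY start
                                     ROWS UNBOUNDED PRECEDING) AS session_id
    FROM (
        SELECT app_nm,
               start,
               end1,
               -- 1 when there is no previous row, or the gap to it exceeds 10 seconds
               CASE WHEN LAG(end1) OVER (PARTITION BY app_nm ORDER BY start) IS NULL
                      OR datediff(seconds, LAG(end1) OVER (PARTITION BY app_nm ORDER BY start), start) > 10
                    THEN 1 ELSE 0 END AS is_new_session
        FROM app_tax
    ) AS flagged
) AS numbered
GROUP BY app_nm, session_id
ORDER BY session_start;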
Consider the table below.
The rule is - an employee cannot take a break (needs to clock out) from job num 1 before clocking in to job num 2. In this case the employee "A" was supposed to clock OUT instead of BREAK on jobnum 1 because he later clocked in to JobNum#2
Is it possible to write a query to find this in plain SQL?
The idea is to check whether the next record is a proper one. To find the next record, one has to find the first punchtime after the current one for the same employee. Once this information is retrieved, one can isolate that record and check the fields of interest, specifically whether jobnum is the same and [optionally] whether punch_type is 'IN'. If it is not, NOT EXISTS evaluates to true and the record is output.
select *
from #punch p
-- Isolate breaks only
where p.punch_type = 'BREAK'
-- The ones having no proper entry
and not exists
(
select null
-- The same table
from #punch a
where a.emplid = p.emplid
and a.jobnum = p.jobnum
-- Next record has punchtime from subquery
and a.punchtime = (select min (n.punchtime)
from #punch n
where n.emplid = p.emplid
and n.punchtime > p.punchtime
)
-- Optionally you might force next record to be 'IN'
and a.punch_type = 'IN'
)
Replace #punch with your table name. -- starts a comment in SQL Server; if you are not using this database, remove these lines. It is a good idea to tag your database and version, as there are probably faster/better ways to do this.
Here is the SQL
select * from employees e1 cross join employees e2 where e1.JOBNUM = (e2.JOBNUM + 1)
and e1.PUNCH_TYPE = 'BREAK' and e2.PUNCH_TYPE = 'IN'
and e1.PUNCHTIME < e2.PUNCHTIME
and e1.EMPLID = e2.EMPLID
CREATE TABLE IntegrationLog (
IntegrationLogID INT IDENTITY(1,1) NOT NULL,
RecordID INT NOT NULL,
SyncDate DATETIME NOT NULL,
Success BIT NOT NULL,
ErrorMessage VARCHAR(MAX) NULL,
PreviousError BIT NOT NULL --last sync attempt for record failed for syncdate
)
My goal here is to return every RecordID and ErrorMessage that has not been followed by a complete success; i.e. exclude a RecordID where a row with Success = 1 and PreviousError = 0 occurred after the last time this error happened. For each such RecordID, I also want to know whether there has ever been a success (partial or otherwise).
Or in other words, I want to see errors and the record they occurred on that haven't been fixed since the error occurred. I also want to know whether I have ever had a success for the particular recordid.
This works, but I am curious if there is a better way to do this?
SELECT errors.RecordID ,
errors.errorMessage,
CASE WHEN PartialSuccess.RecordID IS NOT NULL THEN 1
ELSE NULL
END AS Resolved
FROM ( SELECT errors.RecordID ,
errors.ErrorMessage ,
MAX(SyncDate) AS SyncDate
FROM dbo.IntegrationLog AS Errors
WHERE errors.Success = 0
GROUP BY errors.RecordID ,
errors.ErrorMessage ,
errors.ErrorDescription
) AS Errors
LEFT JOIN dbo.IntegrationLog AS FullSuccess ON FullSuccess.RecordID = Errors.RecordID
AND FullSuccess.Success = 1
AND FullSuccess.PreviousError = 0
AND FullSuccess.SyncDate > Errors.SyncDate
LEFT JOIN ( SELECT partialSuccess.RecordID
FROM dbo.IntegrationLog AS partialSuccess
WHERE partialSuccess.Success = 1
GROUP BY partialSuccess.RecordID
) AS PartialSuccess ON Errors.RecordID = PartialSuccess.RecordID
WHERE FullSuccess.RecordID IS NULL
I also created a pastebin with a few different ways I saw of structuring the query. http://pastebin.com/FtNv8Tqw
Is there another option as well?
If it helps, background for the project is that I am trying to sync records that have been updated since their last successful sync ( Partial or Full ) and log the attempts. A batch of records is identified to be synced. Each record attempt is logged. If it failed, depending on the error it might be possible try to massage the data and attempt again. For this 'job', the time we collected the records is used as the SyncDate. So for a given SyncDate, we might have records that successfully synced on the first try, records we gave up on the first attempt, records we massaged and were able to sync, etc. Each attempt is logged.
Does it change anything if, instead of wanting to know whether any success has occurred for that recordid, I wish to identify whether a partial success has occurred since the last error occurrence?
Thank You! Suggestions on my framing of the question are welcome as well.
You should probably look at the query plan, see where most of the time is being spent, and index appropriately.
That said, one thing you can try is to use the window function ROW_NUMBER instead of MAX.
WITH cte
     AS (SELECT errors.recordid,
                errors.errormessage,
                CASE
                  WHEN partialsuccess.recordid IS NOT NULL THEN 1
                  ELSE NULL
                END AS resolved,
                ROW_NUMBER() OVER (PARTITION BY errors.recordid
                                   ORDER BY errors.syncdate DESC) AS rn
         FROM   integrationlog errors
                LEFT JOIN integrationlog fullsuccess
                       ON fullsuccess.recordid = errors.recordid
                          AND fullsuccess.success = 1
                          AND fullsuccess.previouserror = 0
                          AND fullsuccess.syncdate > errors.syncdate
                LEFT JOIN (SELECT partialsuccess.recordid
                           FROM   dbo.integrationlog AS partialsuccess
                           WHERE  partialsuccess.success = 1
                           GROUP  BY partialsuccess.recordid) AS partialsuccess
                       ON errors.recordid = partialsuccess.recordid
         WHERE  errors.success = 0)
SELECT
recordid,
errormessage,
resolved
FROM cte
WHERE rn = 1