Find third last event in table - Incomplete output - sql

As my research into Firebird continues, I've attempted to improve some of my queries. As I use LibreOffice Base, I'm not 100% sure how the data entry code works, but I believe it's something like this:
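CREATE TABLE "Data Entry"(
ID int,
"Date" date,
"Vehicle Type" varchar(10),
"Events" int,
"Hours 1" int,
"Hours 2" int
);
INSERT INTO "Data Entry" VALUES (1, '2022-12-31', 'A', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (2, '2022-12-31', 'A', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (3, '2022-12-29', 'A', 3, 0, 1);
INSERT INTO "Data Entry" VALUES (4, '2022-06-25', 'B1', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (5, '2022-06-24', 'B1', 1, 1, 0);
INSERT INTO "Data Entry" VALUES (6, '2022-06-24', 'B1', 1, 1, 0);
INSERT INTO "Data Entry" VALUES (7, '2022-12-31', 'B2', 7, 0, 1);
INSERT INTO "Data Entry" VALUES (8, '2022-12-29', 'C', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (9, '2022-12-29', 'C', 2, 0, 1);
INSERT INTO "Data Entry" VALUES (10, '2022-01-19', 'D1', 5, 1, 0);
INSERT INTO "Data Entry" VALUES (11, '2022-01-23', 'D2', 6, 1, 1);
INSERT INTO "Data Entry" VALUES (12, '2019-07-29', 'D3', 5, 0, 1);
INSERT INTO "Data Entry" VALUES (13, '2022-12-21', 'D4', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (14, '2022-12-19', 'D4', 1, 1, 1);
INSERT INTO "Data Entry" VALUES (15, '2022-12-19', 'D4', 1, 0, 1);
INSERT INTO "Data Entry" VALUES (16, '2022-12-28', 'E', 2, 0, 1);
INSERT INTO "Data Entry" VALUES (17, '2022-12-24', 'E', 3, 0, 1);
INSERT INTO "Data Entry" VALUES (18, '2007-07-14', '1', 0, 0, 1);
INSERT INTO "Data Entry" VALUES (19, '2022-12-22', '2', 1, 0, 1);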
I tried this through the online Fiddle pages, but it throws up errors, so either I'm doing it incorrectly, or it's because there was no option for Firebird. Hopefully that's irrelevant, as I already have the table through the front end.
One of my earlier queries which works as expected is shown below, along with its output:
SELECT
"Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
"Event Count",
ROW_NUMBER() OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "rn"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
SUM("Events") OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "Event Count"
FROM "Data Entry"
)
WHERE "Event Count" >= 3
)
WHERE "rn" = 1
Vehicle Type   Days Since 3rd Last Event
A              3
B1             191
B2             1
C              3
D1             347
D2             343
D3             1252
D4             14
E              8
This output does not list every vehicle, because not all vehicles have an Event Count of 3 or more. The new query I am trying to put together combines several other queries (omitted here to keep things relevant; they already work on their own) with a rewrite of the code above:
SELECT
"Vehicle Type",
SUM("Hours 1" + "Hours 2") AS "Total Hours",
MAX(CASE
WHEN
"Total Events" = 3
THEN
DATEDIFF(DAY, "Date", CURRENT_DATE)
END
) "Days Since 3rd Last Event"
FROM
(
SELECT
"Vehicle Type",
"Date",
"Hours 1",
"Hours 2",
CASE
WHEN
"Events" > 0
THEN
SUM( "Events")
OVER(
PARTITION BY "Vehicle Type"
ORDER BY "Date" DESC
)
END
"Total Events"
FROM
"Data Entry"
)
GROUP BY "Vehicle Type"
ORDER BY "Vehicle Type"
The expected output should be:
Vehicle Type   Days Since 3rd Last Event   Total Hours
1              ---                         1
2              ---                         1
A              3                           3
B1             191                         3
B2             1                           1
C              3                           2
D1             347                         1
D2             343                         2
D3             1252                        1
D4             14                          4
E              8                           2
However, the actual output is:
Vehicle Type   Days Since 3rd Last Event   Total Hours
1              ---                         1
2              ---                         1
A              ---                         3
B1             191                         3
B2             ---                         1
C              3                           2
D1             ---                         1
D2             ---                         2
D3             ---                         1
D4             14                          4
E              ---                         2
Granted, I've mixed and matched code, made some up myself, and copied some parts from elsewhere online, so there's a good chance I've not understood something correctly and blindly added it in thinking it would work, but now I'm at a loss as to what that could be. I've had a play around with changing the values of the WHEN statements and altering the operators between =, >, and >=, but any deviation from what's currently shown above outputs incorrect numbers. At least the three numbers displayed in the actual output are correct.

You could try using two rankings:
the first one catches the last three rows for each vehicle type,
the second one picks the oldest row among those (up to) three,
then you compute your date differences.
WITH last_three AS (
SELECT "Vehicle Type", "Date",
SUM("Hours 1"+"Hours 2") OVER(PARTITION BY "Vehicle Type") AS "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS rn
FROM "Data Entry"
), last_third AS (
SELECT "Vehicle Type", "Date", "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY rn DESC) AS rn2
FROM last_three
WHERE rn <= 3
)
SELECT "Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event",
"Total Hours"
FROM last_third
WHERE rn2 = 1
ORDER BY "Vehicle Type"
Note: you will also get values for "Vehicle Type" 1 and 2. If you can explain the rationale for leaving those values empty, the query can be tweaked accordingly.
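For comparison, here is a sketch that keeps the question's own approach (a running SUM of "Events" per vehicle, newest first) but picks the most recent date where that running total is at least 3, using >= 3 with MIN instead of = 3 with MAX. With = 3, a vehicle only gets a value when some running total lands exactly on 3, which is why only B1, C and D4 were filled in. Unlike the ranking answer above, it counts events rather than rows. It runs in SQLite through Python's sqlite3 module as a stand-in for Firebird; the snake_case table and column names, the ISO dates and the fixed reference date 2023-01-01 in place of CURRENT_DATE are all assumptions. In Firebird the corresponding change would presumably be MIN(CASE WHEN "Total Events" >= 3 THEN DATEDIFF(DAY, "Date", CURRENT_DATE) END).

```python
import sqlite3

# The question's sample rows, with dd/mm/yy dates rewritten as ISO strings
# (assumption: '14/07/07' means 2007-07-14, '29/07/19' means 2019-07-29).
ROWS = [
    (1, "2022-12-31", "A", 1, 0, 1),
    (2, "2022-12-31", "A", 1, 0, 1),
    (3, "2022-12-29", "A", 3, 0, 1),
    (4, "2022-06-25", "B1", 1, 0, 1),
    (5, "2022-06-24", "B1", 1, 1, 0),
    (6, "2022-06-24", "B1", 1, 1, 0),
    (7, "2022-12-31", "B2", 7, 0, 1),
    (8, "2022-12-29", "C", 1, 0, 1),
    (9, "2022-12-29", "C", 2, 0, 1),
    (10, "2022-01-19", "D1", 5, 1, 0),
    (11, "2022-01-23", "D2", 6, 1, 1),
    (12, "2019-07-29", "D3", 5, 0, 1),
    (13, "2022-12-21", "D4", 1, 0, 1),
    (14, "2022-12-19", "D4", 1, 1, 1),
    (15, "2022-12-19", "D4", 1, 0, 1),
    (16, "2022-12-28", "E", 2, 0, 1),
    (17, "2022-12-24", "E", 3, 0, 1),
    (18, "2007-07-14", "1", 0, 0, 1),
    (19, "2022-12-22", "2", 1, 0, 1),
]

QUERY = """
SELECT vehicle_type,
       -- Smallest day count = most recent date where the running total is >= 3.
       MIN(CASE WHEN total_events >= 3 THEN days_since END) AS days_since_3rd_last,
       SUM(hours_1 + hours_2) AS total_hours
FROM (
    SELECT vehicle_type, hours_1, hours_2,
           CAST(julianday(?) - julianday(event_date) AS INTEGER) AS days_since,
           -- Running total of events, newest first. Rows sharing a date are
           -- peers under the default RANGE frame and get the same total.
           SUM(events) OVER (PARTITION BY vehicle_type
                             ORDER BY event_date DESC) AS total_events
    FROM data_entry
) AS t
GROUP BY vehicle_type
ORDER BY vehicle_type
"""


def third_last_event_report(rows, reference_date):
    """Return {vehicle_type: (days since 3rd last event or None, total hours)}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE data_entry (id INTEGER, event_date TEXT, vehicle_type TEXT,"
        " events INTEGER, hours_1 INTEGER, hours_2 INTEGER)"
    )
    con.executemany("INSERT INTO data_entry VALUES (?, ?, ?, ?, ?, ?)", rows)
    report = {v: (d, h) for v, d, h in con.execute(QUERY, (reference_date,))}
    con.close()
    return report


# Fixed stand-in for CURRENT_DATE so the output is reproducible (assumption).
report = third_last_event_report(ROWS, "2023-01-01")
for vehicle, (days, hours) in report.items():
    print(vehicle, days, hours)
```

With the sample rows exactly as typed in the question, D4 comes out at 13 days rather than the 14 shown, so the real table presumably differs slightly from the sample. Vehicles 1 and 2 get no date because they have fewer than 3 events in total.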

Related

Alternative to looping

I have a table like the following and I am required to show the subtotal of the use_time_sec column grouping by event_datetime, event_name (only show login), user_id and system_id.
sample input table
with sample_input as (
select '12/01/2023 14:27:59' as event_datetime, 'login' as event_name,'1' as user_id, 'X' as system_id, '0' as use_time_sec
union all
select '12/01/2023 14:28:05', 'screen 1', '1', 'X', '2'
union all
select '12/01/2023 14:28:05', 'screen 2', '1', 'X', '5'
union all
select '12/01/2023 14:28:17', 'screen 1', '1', 'X', '3'
union all
select '12/01/2023 14:28:23', 'logout', '1', '', '0'
union all
select '12/01/2023 14:28:23', 'login', '2', 'Y', '0'
union all
select '12/01/2023 14:28:23', 'screen 1', '2', 'Y', '10'
union all
select '12/01/2023 14:28:24', 'screen 2', '2', 'Y', '100'
union all
select '12/01/2023 14:28:29', 'login', '1', 'X', '0'
union all
select '12/01/2023 14:28:29', 'screen 1', '1', 'X', '500'
union all
select '12/01/2023 14:28:29', 'logout', '1', '', '0'
)
select * from sample_input
I can loop through the table to get my desired output, but that's not the most efficient solution, as there are a few million records in the table and it grows every day.
I'd appreciate it if someone could provide a better solution than mine.
Note: The data is in google BigQuery.
Thanks
This is known as the Gaps and Islands problem. We're trying to identify the islands of user sessions, so we need a query that gives us some way to identify each session. This relies heavily on window functions.
One way is to count the number of logins seen per user.
select
*,
count(case when event_name = 'login' then 1 end)
over(partition by user_id order by event_time)
as session_num
from events
order by event_time
That will keep a tally per user_id. It will add to the tally every time it sees a user login.
event_time  event_name  user_id  use_time_sec  session_num
1000        login       1        0             1
1001        things      1        3             1
1001        login       2        10            1
1002        logout      1        7             1
1005        logout      2        20            1
1100        login       1        5             2
1101        logout      1        10            2
Now we have a way to identify each user's sessions. We can group by user_id and session_num; these are our islands.
with sessions as (
select
*,
count(case when event_name = 'login' then 1 end)
over(partition by user_id order by event_time)
as session_num
from events
order by event_time
)
select
min(event_time) as session_start,
user_id,
sum(use_time_sec) as total_use_time_sec
from sessions
group by user_id, session_num
order by session_start
session_start  user_id  total_use_time_sec
1000           1        10
1001           2        30
1100           1        15
Demonstration in PostgreSQL, but it should work fine on BigQuery.
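A minimal runnable sketch of the session-numbering idea, executed in SQLite through Python's sqlite3 module rather than PostgreSQL or BigQuery (an assumption made for portability). The running login counter is written as COUNT(CASE WHEN ... THEN 1 END) OVER (...), a standard form that does not rely on the aggregate FILTER clause; the rows are the answer's example events, with event_time as a plain integer.

```python
import sqlite3

# The answer's example events: (event_time, event_name, user_id, use_time_sec).
EVENTS = [
    (1000, "login", 1, 0),
    (1001, "things", 1, 3),
    (1001, "login", 2, 10),
    (1002, "logout", 1, 7),
    (1005, "logout", 2, 20),
    (1100, "login", 1, 5),
    (1101, "logout", 1, 10),
]

QUERY = """
WITH sessions AS (
    SELECT *,
           -- Running count of logins per user: every row from the n-th login
           -- up to (but not including) the next login belongs to session n.
           COUNT(CASE WHEN event_name = 'login' THEN 1 END)
               OVER (PARTITION BY user_id ORDER BY event_time) AS session_num
    FROM events
)
SELECT MIN(event_time) AS session_start,
       user_id,
       SUM(use_time_sec) AS total_use_time_sec
FROM sessions
GROUP BY user_id, session_num
ORDER BY session_start
"""


def session_totals(events):
    """Return (session_start, user_id, total_use_time_sec) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE events (event_time INTEGER, event_name TEXT,"
        " user_id INTEGER, use_time_sec INTEGER)"
    )
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


totals = session_totals(EVENTS)
print(totals)
```

With these rows, user 2's single session totals 10 + 20 = 30 seconds, and user 1's two sessions total 0 + 3 + 7 = 10 and 5 + 10 = 15.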

Assign Specific Value To All Rows in Partition If Condition Met

I have to build the Exceptions Report to catch Overlaps or Gaps. The dataset has clients and assigned supervisors with start and end dates of supervision.
CREATE TABLE Report
(Id INT, ClientId INT, ClientName VARCHAR(30), SupervisorId INT, SupervisorName
VARCHAR(30), SupervisionStartDate DATE, SupervisionEndDate DATE);
INSERT INTO Report
VALUES
(1, 22, 'Client A', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(2, 22, 'Client A', 44, 'Supervisor B', '2022-05-01', '2022-08-23'),
(3, 22, 'Client A', 55, 'Supervisor C', '2022-08-24', NULL),
(4, 23, 'Client B', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(5, 23, 'Client B', 44, 'Supervisor B', '2022-04-30', '2022-08-23'),
(6, 24, 'Client C', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(7, 24, 'Client C', 44, 'Supervisor B', '2022-05-01', '2022-08-23'),
(8, 24, 'Client C', 55, 'Supervisor C', '2022-07-22', '2022-10-25'),
(9, 25, 'Client D', 33, 'Supervisor A', '2022-01-01', '2022-04-30'),
(10, 25, 'Client D', 44, 'Supervisor B', '2022-07-23', NULL)
SELECT * FROM Report
'Valid' status should be assigned to all rows associated with a client if no gaps or overlaps are present, for example:
Client A has 3 Supervisors - Supervisor A (01/01/2022 - 04/30/2022), Supervisor B (05/01/2022 - 08/23/2022) and Supervisor C (08/24/2022 - Present).
'Issue Found' status should be assigned to all rows associated with a client if any gaps or overlaps are present, for example:
Client B has 2 Supervisors - Supervisor A (01/01/2022 - 04/30/2022) and Supervisor B (04/30/2022 - 08/23/2022).
Client C has 3 Supervisors - Supervisor A (01/01/2022 - 04/30/2022), Supervisor B (05/01/2022 - 08/23/2022) and Supervisor C (07/22/2022 - 10/25/2022).
These are examples of overlaps.
Client D has 2 Supervisors - Supervisor A (01/01/2022 - 04/30/2022) and Supervisor B (07/23/2022 - Present).
This is an example of a gap.
I added some columns that might be helpful, but don't know how to accomplish the main goal.
However, I noticed that if the first record in the [Diff Between PreviousEndDate And SupervisionStartDate] column is NULL and all the others equal 1, then the client is Valid.
SELECT
Report.*,
ROW_NUMBER() OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS [ClientRecordNumber],
COUNT(*) OVER (PARTITION BY Report.ClientId) AS [TotalNumberOfClientRecords],
DATEDIFF(DAY, Report.SupervisionStartDate, Report.SupervisionEndDate) AS SupervisionAging,
LAG(Report.SupervisionStartDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS PreviousStartDate,
LAG(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS PreviousEndDate,
LEAD(Report.SupervisionStartDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS NextStartDate,
LEAD(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)) AS NextEndDate,
DATEDIFF(dd, LAG(Report.SupervisionEndDate) OVER (PARTITION BY Report.ClientId ORDER BY COALESCE(Report.SupervisionStartDate, Report.SupervisionEndDate)), Report.SupervisionStartDate) AS [Diff Between PreviousEndDate And SupervisionStartDate]
FROM Report
One approach:
Use the optional LAG parameters to provide a default value for when there is no previous row, and make that default a valid value, i.e. 1 day before the StartDate.
Use a CTE to calculate the difference in days between the StartDate and previous EndDate.
Then use a second CTE to determine for any given client whether there is an issue.
Finally display your desired results.
WITH cte1 AS (
SELECT
R.*
, DATEDIFF(day, LAG(R.SupervisionEndDate,1,dateadd(day,-1,R.SupervisionStartDate)) OVER (PARTITION BY R.ClientId ORDER BY COALESCE(R.SupervisionStartDate, R.SupervisionEndDate)), R.SupervisionStartDate) AS Diff
FROM Report R
), cte2 AS (
SELECT *
, MAX(COALESCE(Diff,0)) OVER (PARTITION BY ClientId) MaxDiff
, MIN(COALESCE(Diff,0)) OVER (PARTITION BY ClientId) MinDiff
FROM cte1
)
SELECT Id, ClientId, ClientName, SupervisorId, SupervisorName, SupervisionStartDate, SupervisionEndDate
--, Diff, MaxDiff, MinDiff -- Debug
, CASE WHEN MaxDiff = 1 AND MinDiff = 1 THEN 'Valid' ELSE 'Issue Found' END [Status]
FROM cte2
ORDER BY Id;
Notes:
Use the full name of the datepart you are diffing (e.g. day rather than dd); it's much clearer and easier to maintain.
Use short, relevant, table aliases to reduce the code.
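The answer's two CTEs can be sketched in SQLite through Python's sqlite3 module (an assumption; T-SQL's DATEDIFF and DATEADD are replaced by julianday() and date(..., '-1 day'), the columns are snake_case renames, and the supervisor columns are dropped because the status does not depend on them). The three-argument LAG supplies the "one day before start" default for each client's first row, so a client comes out Valid only when every diff equals 1.

```python
import sqlite3

# Rows from the question's INSERT: (id, client_id, client_name, start, end).
REPORT = [
    (1, 22, "Client A", "2022-01-01", "2022-04-30"),
    (2, 22, "Client A", "2022-05-01", "2022-08-23"),
    (3, 22, "Client A", "2022-08-24", None),
    (4, 23, "Client B", "2022-01-01", "2022-04-30"),
    (5, 23, "Client B", "2022-04-30", "2022-08-23"),
    (6, 24, "Client C", "2022-01-01", "2022-04-30"),
    (7, 24, "Client C", "2022-05-01", "2022-08-23"),
    (8, 24, "Client C", "2022-07-22", "2022-10-25"),
    (9, 25, "Client D", "2022-01-01", "2022-04-30"),
    (10, 25, "Client D", "2022-07-23", None),
]

QUERY = """
WITH cte1 AS (
    SELECT r.*,
           -- Days from the previous row's end date to this row's start date;
           -- LAG's third argument is used when there is no previous row.
           CAST(julianday(r.start_date) - julianday(
                    LAG(r.end_date, 1, date(r.start_date, '-1 day'))
                        OVER (PARTITION BY r.client_id
                              ORDER BY COALESCE(r.start_date, r.end_date)))
                AS INTEGER) AS diff
    FROM report AS r
), cte2 AS (
    SELECT *,
           MAX(COALESCE(diff, 0)) OVER (PARTITION BY client_id) AS max_diff,
           MIN(COALESCE(diff, 0)) OVER (PARTITION BY client_id) AS min_diff
    FROM cte1
)
SELECT id,
       CASE WHEN max_diff = 1 AND min_diff = 1 THEN 'Valid' ELSE 'Issue Found' END
FROM cte2
ORDER BY id
"""


def supervision_status(rows):
    """Return {id: 'Valid' | 'Issue Found'}."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE report (id INTEGER, client_id INTEGER, client_name TEXT,"
        " start_date TEXT, end_date TEXT)"
    )
    con.executemany("INSERT INTO report VALUES (?, ?, ?, ?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


statuses = supervision_status(REPORT)
print(statuses)
```

Client B's second row starts on the day the first one ends (diff 0), Client C's third row starts before the second ends (negative diff) and Client D has an 84-day gap, so all three end up as Issue Found.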

Run rate calculate in pgsql

I have this table:
CREATE TABLE data
(
Event_Date date,
approved int,
rejected int
);
INSERT INTO data (Event_date, approved, rejected)
VALUES
('20190910', '5', '2'),
('20190911', '6', '3'),
('20190912', '5', '2'),
('20190913', '7', '5'),
('20190914', '8', '4'),
('20190915', '10', '2'),
('20190916', '4', '1');
How can I calculate the run rate (with a loop or something else) and get results like the ones below? In the Rolling monthly rate column I've written how the formula should be applied:
Event_date approved, rejected Rolling monthly rate
------------------------------------------------------------
20190901 5 2 ---
20190902 6 3 6+5/5+6+2+3
20190903 4 2 6+4/6+3+4+2
20190903 7 5 7+4/4+2+7+5
20190904 8 4 8+4/7+5+8+4
20190905 10 2 ....
20190906 4 1 .....
The lag() function, which returns the previous value, is perfect for this task.
You need a CASE expression that skips the first entry, since it has no previous value, and calculates the desired formula for every other row.
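select *,
case when row_number() over (order by Event_Date) > 1
then (approved + lag(approved) over (order by Event_Date))::numeric
/ (approved + rejected + lag(approved) over (order by Event_Date) + lag(rejected) over (order by Event_Date))
end as rate
from data
order by Event_Date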
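A sketch of the rate formula with explicit parentheses, run in SQLite through Python's sqlite3 module (an assumption standing in for PostgreSQL) against the question's seven rows, with dates rewritten as ISO strings. Multiplying by 1.0 plays the role of a numeric cast so the division is not truncated to an integer, and every window has an explicit ORDER BY so "previous row" is well defined. The CASE on row_number() is not strictly needed: LAG() returns NULL on the first row, so the whole expression is already NULL there.

```python
import sqlite3

# The question's rows: (event_date, approved, rejected).
DATA = [
    ("2019-09-10", 5, 2),
    ("2019-09-11", 6, 3),
    ("2019-09-12", 5, 2),
    ("2019-09-13", 7, 5),
    ("2019-09-14", 8, 4),
    ("2019-09-15", 10, 2),
    ("2019-09-16", 4, 1),
]

QUERY = """
SELECT event_date,
       -- (approved + previous approved)
       --   / (approved + rejected + previous approved + previous rejected)
       1.0 * (approved + LAG(approved) OVER (ORDER BY event_date))
           / (approved + rejected
              + LAG(approved) OVER (ORDER BY event_date)
              + LAG(rejected) OVER (ORDER BY event_date)) AS rate
FROM data
ORDER BY event_date
"""


def run_rates(rows):
    """Return the rate for each row in date order (None for the first row)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (event_date TEXT, approved INTEGER, rejected INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    rates = [rate for _, rate in con.execute(QUERY)]
    con.close()
    return rates


rates = run_rates(DATA)
print(rates)
```

For the second row this gives (6 + 5) / (6 + 3 + 5 + 2) = 11/16 = 0.6875, matching the "6+5/5+6+2+3" pattern in the question.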

SSRS - Cannot read the next row for dataset DataSet1

with TotCFS as (select count(*)*1.0 as TotalCFS,
'Total CFS' as RowTitle
from PrivilegeData.TABLENAMEC c
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
)
select AreaCommand, CallPriority,
avg(datediff(second, CallCreatedDateTime, CallEntryDateTime)) as AverageSeconds,
left(dbo.[ConvertTimeToHHMMSS](avg(datediff(second, CallCreatedDateTime, CallEntryDateTime)), 's'), 7) as DisplayAvg,
'Create to Entry' as RowTitle, 1 as RowSort, b.SortOrder as ColumnSort
from PrivilegeData.TABLENAMEC c
inner join (select distinct AreaCommandAbbreviation, SortOrder from dimBeat) b on c.AreaCommand = b.AreaCommandAbbreviation
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
group by AreaCommand, CallPriority, SortOrder
UNION
select AreaCommand, CallPriority,
avg(datediff(second, CallEntryDateTime, CallDispatchDateTime)) as AvgEntryToDispatchSeconds,
left(dbo.ConvertTimeToHHMMSS(avg(datediff(second, CallEntryDateTime, CallDispatchDateTime)), 's'), 7) as DisplayAvgEntryToDispatchSeconds,
'Entry to Dispatch' as RowTitle, 2 , b.SortOrder
from PrivilegeData.TABLENAMEC c
inner join (select distinct AreaCommandAbbreviation, SortOrder from dimBeat) b on c.AreaCommand = b.AreaCommandAbbreviation
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
group by AreaCommand, CallPriority, SortOrder
I have about 8 UNIONs in this query; the main difference between them is the row title. This report has been running for about a year without any problems. I use this code in SSRS with query type Text. I also have one of my rowsets, named 'AverageSeconds', configured with this expression:
=IIf((Fields!RowSort.Value) < 7,Format(DateAdd("s", Avg(Fields!AverageSeconds.Value), "00:00:00"), "H:mm:ss"), Sum(Fields!AverageSeconds.Value))
The report somehow broke, and I have tried everything I could find by searching to fix it. Please help me with this error: 'rsErrorReadingNextDataRow'.
This has to be an issue with the data being operated on, maybe a 0 or NULL value condition. I would start by reviewing records that were added or changed around the time the problem began.
I dropped and recreated the fact table and reran my SSIS package, which seems to have fixed it. I did that because I couldn't find any NULL or 0 values.

Count number of consecutive grouped entries in SQL

I'd like to create and populate the "No. of Entries in Curr.Status" field shown below using SQL (SQL Server).
ID Sequence Prev.Status Curr.Status No. of Entries in Curr.Status
9-9999-9 1 Status D Status A 1
9-9999-9 2 Status A Status A 2
9-9999-9 3 Status A Status A 3
9-9999-9 4 Status A Status A 4
9-9999-9 5 Status A Status B 1
9-9999-9 6 Status B Status B 2
9-9999-9 7 Status B Status B 3
9-9999-9 8 Status B Status A 1
9-9999-9 9 Status A Status A 2
9-9999-9 10 Status A Status C 1
9-9999-9 11 Status C Status C 2
Is there a quick way, using something like row_number() (which alone doesn't appear to be sufficient), to create the field I'm looking for?
Thanks!
This appears to be a Gaps and Islands problem. There are plenty of examples out there on how to solve it; however:
WITH VTE AS(
SELECT *
FROM (VALUES('9-9999-9',1 ,'Status D','Status A'),
('9-9999-9',2 ,'Status A','Status A'),
('9-9999-9',3 ,'Status A','Status A'),
('9-9999-9',4 ,'Status A','Status A'),
('9-9999-9',5 ,'Status A','Status B'),
('9-9999-9',6 ,'Status B','Status B'),
('9-9999-9',7 ,'Status B','Status B'),
('9-9999-9',8 ,'Status B','Status A'),
('9-9999-9',9 ,'Status A','Status A'),
('9-9999-9',10,'Status A','Status C'),
('9-9999-9',11,'Status C','Status C')) V(ID, Sequence, PrevStatus,CurrStatus)),
CTE AS(
SELECT ID,
[Sequence],
PrevStatus,
CurrStatus,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Sequence]) -
ROW_NUMBER() OVER (PARTITION BY ID,CurrStatus ORDER BY [Sequence]) AS Grp
FROM VTE V)
SELECT ID,
[Sequence],
PrevStatus,
CurrStatus,
ROW_NUMBER() OVER (PARTITION BY ID, CurrStatus, Grp ORDER BY [Sequence]) AS Entries
FROM CTE;
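The difference-of-row-numbers technique can be checked in SQLite through Python's sqlite3 module (an assumption standing in for SQL Server; Sequence is renamed seq and the other columns are snake_case). Within one run of the same status both row numbers advance together, so their difference is constant for the run. The final ROW_NUMBER() here partitions by status as well as by the group number, because runs of different statuses can share a group number.

```python
import sqlite3

# The question's rows: (id, seq, prev_status, curr_status).
ROWS = [
    ("9-9999-9", 1, "Status D", "Status A"),
    ("9-9999-9", 2, "Status A", "Status A"),
    ("9-9999-9", 3, "Status A", "Status A"),
    ("9-9999-9", 4, "Status A", "Status A"),
    ("9-9999-9", 5, "Status A", "Status B"),
    ("9-9999-9", 6, "Status B", "Status B"),
    ("9-9999-9", 7, "Status B", "Status B"),
    ("9-9999-9", 8, "Status B", "Status A"),
    ("9-9999-9", 9, "Status A", "Status A"),
    ("9-9999-9", 10, "Status A", "Status C"),
    ("9-9999-9", 11, "Status C", "Status C"),
]

QUERY = """
WITH runs AS (
    SELECT id, seq, curr_status,
           -- Constant within one run of identical statuses.
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq)
             - ROW_NUMBER() OVER (PARTITION BY id, curr_status ORDER BY seq) AS grp
    FROM status_log
)
SELECT seq,
       -- curr_status is part of the partition because runs of different
       -- statuses can end up with the same grp value.
       ROW_NUMBER() OVER (PARTITION BY id, curr_status, grp ORDER BY seq) AS entries
FROM runs
ORDER BY seq
"""


def entries_in_status(rows):
    """Return the position of each row within its run, in seq order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE status_log (id TEXT, seq INTEGER,"
        " prev_status TEXT, curr_status TEXT)"
    )
    con.executemany("INSERT INTO status_log VALUES (?, ?, ?, ?)", rows)
    entries = [n for _, n in con.execute(QUERY)]
    con.close()
    return entries


print(entries_in_status(ROWS))

# Four alternating rows A, B, A, B: the rows at seq 2 and 3 both get grp = 1,
# so partitioning by grp alone would number them 1 and 2.
ALTERNATING = [("x", 1, None, "A"), ("x", 2, None, "B"),
               ("x", 3, None, "A"), ("x", 4, None, "B")]
print(entries_in_status(ALTERNATING))
```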
You can mark the rows where the status changes using the LAG function, and then use SUM() OVER () to assign a unique number to each group. Numbering within each group is then trivial:
DECLARE @t TABLE (ID VARCHAR(100), Sequence INT, PrevStatus VARCHAR(100), CurrStatus VARCHAR(100));
INSERT INTO @t VALUES
('9-9999-9', 1, 'Status D', 'Status A'),
('9-9999-9', 2, 'Status A', 'Status A'),
('9-9999-9', 3, 'Status A', 'Status A'),
('9-9999-9', 4, 'Status A', 'Status A'),
('9-9999-9', 5, 'Status A', 'Status B'),
('9-9999-9', 6, 'Status B', 'Status B'),
('9-9999-9', 7, 'Status B', 'Status B'),
('9-9999-9', 8, 'Status B', 'Status A'),
('9-9999-9', 9, 'Status A', 'Status A'),
('9-9999-9', 10, 'Status A', 'Status C'),
('9-9999-9', 11, 'Status C', 'Status C');
WITH cte1 AS (
SELECT *, CASE WHEN LAG(CurrStatus) OVER(ORDER BY Sequence) = CurrStatus THEN 0 ELSE 1 END AS chg
FROM @t
), cte2 AS (
SELECT *, SUM(chg) OVER(ORDER BY Sequence) AS grp
FROM cte1
), cte3 AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY Sequence) AS SeqInGroup
FROM cte2
)
SELECT *
FROM cte3
ORDER BY Sequence
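The LAG-plus-running-SUM approach can be sketched the same way in SQLite through Python's sqlite3 module (an assumption standing in for SQL Server; Sequence is renamed seq). Unlike the answer, every window here also has PARTITION BY id, on the assumption that the real table may hold several IDs; with the single ID in the sample this makes no difference.

```python
import sqlite3

# The question's rows: (id, seq, prev_status, curr_status).
ROWS = [
    ("9-9999-9", 1, "Status D", "Status A"),
    ("9-9999-9", 2, "Status A", "Status A"),
    ("9-9999-9", 3, "Status A", "Status A"),
    ("9-9999-9", 4, "Status A", "Status A"),
    ("9-9999-9", 5, "Status A", "Status B"),
    ("9-9999-9", 6, "Status B", "Status B"),
    ("9-9999-9", 7, "Status B", "Status B"),
    ("9-9999-9", 8, "Status B", "Status A"),
    ("9-9999-9", 9, "Status A", "Status A"),
    ("9-9999-9", 10, "Status A", "Status C"),
    ("9-9999-9", 11, "Status C", "Status C"),
]

QUERY = """
WITH flagged AS (
    SELECT id, seq, curr_status,
           -- 1 when the status differs from the previous row's status, or
           -- when there is no previous row (LAG() gives NULL), else 0.
           CASE WHEN LAG(curr_status) OVER (PARTITION BY id ORDER BY seq) = curr_status
                THEN 0 ELSE 1 END AS chg
    FROM status_log
), numbered AS (
    -- Running total of the change flags gives each run its own number.
    SELECT *, SUM(chg) OVER (PARTITION BY id ORDER BY seq) AS grp
    FROM flagged
)
SELECT seq, ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY seq) AS seq_in_group
FROM numbered
ORDER BY seq
"""


def seq_in_group(rows):
    """Return the position of each row within its run, in seq order."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE status_log (id TEXT, seq INTEGER,"
        " prev_status TEXT, curr_status TEXT)"
    )
    con.executemany("INSERT INTO status_log VALUES (?, ?, ?, ?)", rows)
    result = [n for _, n in con.execute(QUERY)]
    con.close()
    return result


print(seq_in_group(ROWS))
```

Because grp only ever increases, runs of different statuses can never share a group number here, so partitioning the final ROW_NUMBER() by grp (and id) is enough.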
If Sequence runs consecutively (1, 2, 3, ... with no gaps) within each ID, as in the sample, you can do:
select t.*,
row_number() over (partition by id, [Curr.Status], (Sequence - seq) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
row_number() over (partition by id, [Curr.Status] order by Sequence) as seq
from your_table t
) t;
Otherwise, you need to generate two row numbers:
select t.*,
row_number() over (partition by id, [Curr.Status], (seq1 - seq2) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
row_number() over (partition by id order by Sequence) as seq1,
row_number() over (partition by id, [Curr.Status] order by Sequence) as seq2
from your_table t
) t;