I have a table like the following, and I need to show the subtotal of the use_time_sec column grouped by event_datetime, event_name (only showing login), user_id and system_id.
sample input table
with sample_input as (
select '12/01/2023 14:27:59' as event_datetime, 'login' as event_name, '1' as user_id, 'X' as system_id, '0' as use_time_sec
union all
select '12/01/2023 14:28:05', 'screen 1', '1', 'X', '2'
union all
select '12/01/2023 14:28:05', 'screen 2', '1', 'X', '5'
union all
select '12/01/2023 14:28:17', 'screen 1', '1', 'X', '3'
union all
select '12/01/2023 14:28:23', 'logout', '1', '', '0'
union all
select '12/01/2023 14:28:23', 'login', '2', 'Y', '0'
union all
select '12/01/2023 14:28:23', 'screen 1', '2', 'Y', '10'
union all
select '12/01/2023 14:28:24', 'screen 2', '2', 'Y', '100'
union all
select '12/01/2023 14:28:29', 'login', '1', 'X', '0'
union all
select '12/01/2023 14:28:29', 'screen 1', '1', 'X', '500'
union all
select '12/01/2023 14:28:29', 'logout', '1', '', '0'
)
select * from sample_input
sample output
I can loop through the table to get my desired output, but that's not the most efficient solution, as there are a few million records in the table and it's growing every day.
I'd appreciate it if someone could provide a better solution than what I have.
Note: The data is in Google BigQuery.
Thanks
This is known as the Gaps and Islands problem: we're trying to identify the islands of user sessions. We need a query that gives us some way to identify each session, and that relies heavily on window functions.
One way is to count the number of logins seen per user.
select
*,
sum(1)
filter(where event_name = 'login')
over(partition by user_id order by event_time)
as session_num
from events
order by event_time
That will keep a tally per user_id. It will add to the tally every time it sees a user login.
event_time | event_type | user_id | use_time_sec | session_num
---------- | ---------- | ------- | ------------ | -----------
1000       | login      | 1       | 0            | 1
1001       | things     | 1      | 3            | 1
1001       | login      | 2       | 10           | 1
1002       | logout     | 1       | 7            | 1
1005       | logout     | 2       | 20           | 1
1100       | login      | 1       | 5            | 2
1101       | logout     | 1       | 10           | 2
Now we have a way to identify each user's sessions. We can group by user_id and session_num. These are our islands.
with sessions as (
select
*,
sum(1)
filter(where event_name = 'login')
over(partition by user_id order by event_time)
as session_num
from events
)
select
min(event_time) as session_start,
user_id,
sum(use_time_sec) as total_use_time_sec
from sessions
group by user_id, session_num
order by session_start
session_start | user_id | total_use_time_sec
------------- | ------- | ------------------
1000          | 1       | 10
1001          | 2       | 30
1100          | 1       | 15
Demonstration in PostgreSQL. Note that BigQuery does not support the filter clause on window functions, so on BigQuery substitute countif(event_name = 'login') for sum(1) filter(where event_name = 'login').
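A BigQuery-flavored sketch of the same approach, applied to the question's sample_input (an assumption on my part: the column names come from the question, countif replaces the unsupported filter clause, and use_time_sec is cast because the sample stores it as a string):

with sessions as (
select
*,
-- countif stands in for sum(1) filter(where ...), which BigQuery lacks
countif(event_name = 'login')
over (partition by user_id order by event_datetime)
as session_num
from sample_input
)
select
min(event_datetime) as session_start,
user_id,
-- use_time_sec is a string in the sample data, so cast before summing
sum(cast(use_time_sec as int64)) as total_use_time_sec
from sessions
group by user_id, session_num
order by session_start

If event_datetime is also a string in the real table, parse it first (e.g. with parse_datetime), since string ordering only matches time ordering within the same day.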
As my research into Firebird continues, I've attempted to improve some of my queries. As I use Libreoffice Base, I'm not 100% sure how the data entry code works, but I believe it's something like this:
CREATE TABLE "Data Entry"(
ID int,
"Date" date,
"Vehicle Type" varchar(10),
"Events" int,
"Hours 1" int,
"Hours 2" int
);
INSERT INTO "Data Entry" VALUES
(1, '2022-12-31', 'A', 1, 0, 1),
(2, '2022-12-31', 'A', 1, 0, 1),
(3, '2022-12-29', 'A', 3, 0, 1),
(4, '2022-06-25', 'B1', 1, 0, 1),
(5, '2022-06-24', 'B1', 1, 1, 0),
(6, '2022-06-24', 'B1', 1, 1, 0),
(7, '2022-12-31', 'B2', 7, 0, 1),
(8, '2022-12-29', 'C', 1, 0, 1),
(9, '2022-12-29', 'C', 2, 0, 1),
(10, '2022-01-19', 'D1', 5, 1, 0),
(11, '2022-01-23', 'D2', 6, 1, 1),
(12, '2019-07-29', 'D3', 5, 0, 1),
(13, '2022-12-21', 'D4', 1, 0, 1),
(14, '2022-12-19', 'D4', 1, 1, 1),
(15, '2022-12-19', 'D4', 1, 0, 1),
(16, '2022-12-28', 'E', 2, 0, 1),
(17, '2022-12-24', 'E', 3, 0, 1),
(18, '2007-07-14', '1', 0, 0, 1),
(19, '2022-12-22', '2', 1, 0, 1);
I tried this through the online Fiddle pages, but it throws up errors, so either I'm doing it incorrectly, or it's because there was no option for Firebird. Hopefully irrelevant, as I have the table already through the front-end.
One of my earlier queries which works as expected is shown below, along with its output:
SELECT
"Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
"Event Count",
ROW_NUMBER() OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "rn"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
SUM("Events") OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "Event Count"
FROM "Data Entry"
)
WHERE "Event Count" >= 3
)
WHERE "rn" = 1
Vehicle Type | Days Since 3rd Last Event
------------ | -------------------------
A            | 3
B1           | 191
B2           | 1
C            | 3
D1           | 347
D2           | 343
D3           | 1252
D4           | 14
E            | 8
In this output, it does not list every vehicle because not all vehicles have an Event Count that is equal to or greater than 3. The new query I am trying to put together is a combination of different queries (omitted to keep things relevant, plus they already work on their own), with a rewrite of the above code as well:
SELECT
"Vehicle Type",
SUM("Hours 1" + "Hours 2") AS "Total Hours",
MAX(CASE
WHEN
"Total Events" = 3
THEN
DATEDIFF(DAY, "Date", CURRENT_DATE)
END
) "Days Since 3rd Last Event"
FROM
(
SELECT
"Vehicle Type",
"Date",
"Hours 1",
"Hours 2",
CASE
WHEN
"Events" > 0
THEN
SUM( "Events")
OVER(
PARTITION BY "Vehicle Type"
ORDER BY "Date" DESC
)
END
"Total Events"
FROM
"Data Entry"
)
GROUP BY "Vehicle Type"
ORDER BY "Vehicle Type"
The expected output should be:
Vehicle Type | Days Since 3rd Last Event | Total Hours
------------ | ------------------------- | -----------
1            |                           | 1
2            |                           | 1
A            | 3                         | 3
B1           | 191                       | 3
B2           | 1                         | 1
C            | 3                         | 2
D1           | 347                       | 1
D2           | 343                       | 2
D3           | 1252                      | 1
D4           | 14                        | 4
E            | 8                         | 2
However, the actual output is:
Vehicle Type | Days Since 3rd Last Event | Total Hours
------------ | ------------------------- | -----------
1            |                           | 1
2            |                           | 1
A            |                           | 3
B1           | 191                       | 3
B2           |                           | 1
C            | 3                         | 2
D1           |                           | 1
D2           |                           | 2
D3           |                           | 1
D4           | 14                        | 4
E            |                           | 2
Granted, I've mixed and matched code, made some parts up myself, and copied other parts from elsewhere online, so there's a good chance I've not understood something correctly and blindly added it in thinking it would work. Now I'm at a loss as to what the problem could be. I've played around with changing the values in the WHEN clauses and altering the operators between =, >, and >=, but any deviation from what's currently shown above outputs incorrect numbers. At least the three numbers that are displayed in the actual output are correct.
You could try using two rankings:
- the first one catches the last three rows per vehicle
- the second one catches the last row among those (up to) three
Then get your date differences.
WITH last_three AS (
SELECT "Vehicle Type", "Date",
SUM("Hours 1"+"Hours 2") OVER(PARTITION BY "Vehicle Type") AS "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS rn
FROM "Data Entry"
), last_third AS (
SELECT "Vehicle Type", "Date", "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY rn DESC) AS rn2
FROM last_three
WHERE rn <= 3
)
SELECT "Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event",
"Total Hours"
FROM last_third
WHERE rn2 = 1
ORDER BY "Vehicle Type"
Check the demo here.
Note: You will get values for the "Vehicle Type" 1 and 2 too. If you can explain the rationale behind having those values empty, this query can be tweaked accordingly.
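For instance, if the rule is "leave the days empty when a vehicle has fewer than 3 recorded events in total" (an assumption; the question doesn't state the rationale), the event total can be carried through the CTEs and used to blank out the value:

WITH last_three AS (
SELECT "Vehicle Type", "Date",
SUM("Hours 1"+"Hours 2") OVER(PARTITION BY "Vehicle Type") AS "Total Hours",
-- total events per vehicle, used below to blank the days column
SUM("Events") OVER(PARTITION BY "Vehicle Type") AS "Total Events",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS rn
FROM "Data Entry"
), last_third AS (
SELECT "Vehicle Type", "Date", "Total Hours", "Total Events",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY rn DESC) AS rn2
FROM last_three
WHERE rn <= 3
)
SELECT "Vehicle Type",
CASE WHEN "Total Events" >= 3
THEN DATEDIFF(DAY, "Date", CURRENT_DATE)
END AS "Days Since 3rd Last Event",
"Total Hours"
FROM last_third
WHERE rn2 = 1
ORDER BY "Vehicle Type"

With the sample data above, this would leave the days empty for vehicle types 1 and 2 (0 and 1 total events) while keeping them for the rest.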
Greetings. I need to select a specific date, and if the specific date does not match, I need to select the max. I'm no expert in SQL; this is what I've achieved so far, but it is returning all records:
select
scs.subscription_id,
case
when scs.end_date = max(scs.end_date) then max(scs.end_date)
when scs.end_date = '1900-01-01 00:00:00.000Z' then '1900-01-01 00:00:00.000Z'
end as end_date
from
sim_cards sc
inner join sim_card_subscriptions scs on
sc.id = scs.sim_card_id
where
scs.subscription_id = 1
group by
scs.end_date,
scs.subscription_id
Assuming you want that policy (match a date X, or use MAX) applied to each row individually, with the MAX calculated only across related subscriptions (those sharing the same sim_card_id), you can use the max(end_date::date) over(partition by sim_card_id) window function in your CASE expression to fall back to MAX when the specific date is not matched.
Consider the following sample dataset
with data (id, sim_card_id, end_date) as (
values
(1, 1, '2021-08-05 20:21:00'),
(2, 1, '2021-10-10 12:12:10'),
(3, 1, '2021-12-11 00:11:14'),
(4, 2, '2021-12-14 09:08:45'),
(5, 2, '2021-12-14 15:42:07'),
(6, 3, '2021-10-09 20:20:33')
)
select
id,
case
when end_date::date = '2021-10-10' then end_date::date
else max(end_date::date) over(partition by sim_card_id)
end as end_date
from data
which yields the following output:
1 2021-12-11 -- max across sim_card ID=1
2 2021-10-10 -- matches desired date
3 2021-12-11 -- max across sim_card ID=1
4 2021-12-14 -- max across sim_card ID=2
5 2021-12-14 -- max across sim_card ID=2
6 2021-10-09 -- max across sim_card ID=3 (the desired date is not present)
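Applied back to the tables in the question (a sketch; it assumes the join and column names from your query, and that end_date compares cleanly against the literal as written), this also removes the need for GROUP BY, because the window function works per row:

select
scs.subscription_id,
case
-- keep the sentinel date when it matches
when scs.end_date = '1900-01-01 00:00:00.000Z' then scs.end_date
-- otherwise fall back to the max across the same sim card
else max(scs.end_date) over (partition by scs.sim_card_id)
end as end_date
from sim_cards sc
inner join sim_card_subscriptions scs on
sc.id = scs.sim_card_id
where scs.subscription_id = 1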
with TotCFS as (select count(*)*1.0 as TotalCFS,
'Total CFS' as RowTitle
from PrivilegeData.TABLENAMEC c
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
)
select AreaCommand, CallPriority,
avg(datediff(second, CallCreatedDateTime, CallEntryDateTime)) as AverageSeconds,
left(dbo.[ConvertTimeToHHMMSS](avg(datediff(second, CallCreatedDateTime, CallEntryDateTime)), 's'), 7) as DisplayAvg,
'Create to Entry' as RowTitle, 1 as RowSort, b.SortOrder as ColumnSort
from PrivilegeData.TABLENAMEC c
inner join (select distinct AreaCommandAbbreviation, SortOrder from dimBeat) b on c.AreaCommand = b.AreaCommandAbbreviation
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
group by AreaCommand, CallPriority, SortOrder
UNION
select AreaCommand, CallPriority,
avg(datediff(second, CallEntryDateTime, CallDispatchDateTime)) as AvgEntryToDispatchSeconds,
left(dbo.ConvertTimeToHHMMSS(avg(datediff(second, CallEntryDateTime, CallDispatchDateTime)), 's'), 7) as DisplayAvgEntryToDispatchSeconds,
'Entry to Dispatch' as RowTitle, 2 , b.SortOrder
from PrivilegeData.TABLENAMEC c
inner join (select distinct AreaCommandAbbreviation, SortOrder from dimBeat) b on c.AreaCommand = b.AreaCommandAbbreviation
where cast(CallCreatedDateTime as date) between @StartDate and @EndDate and CallPriority in ('1', '2', '3', '4', '5') and AreaCommand in ('FH', 'VA', 'NE', 'NW', 'SE', 'SW') and IsBolo = 0
group by AreaCommand, CallPriority, SortOrder
I have about 8 unions in this code; the only difference between them is the row titles. This report had been running for about a year without any problems. I use this code in SSRS with query type text. I also have one of my rowset fields, 'AverageSeconds', configured to read this expression:
=IIf((Fields!RowSort.Value) < 7,Format(DateAdd("s", Avg(Fields!AverageSeconds.Value), "00:00:00"), "H:mm:ss"), Sum(Fields!AverageSeconds.Value))
The report somehow broke, and I have tried everything I could find while searching for a fix. Please help me with this error: 'rsErrorReadingNextDataRow'.
This has got to be an issue with the data being operated upon, maybe a 0 or NULL value condition. I would start by reviewing records that were added or changed around the time the problem began.
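As a first diagnostic (a sketch; the table and column names are taken from the query in the question), you could look for NULL timestamps or negative intervals, either of which can make the report's DateAdd/formatting expression fail while reading rows:

select *
from PrivilegeData.TABLENAMEC
where CallCreatedDateTime is null
or CallEntryDateTime is null
or CallDispatchDateTime is null
-- a negative duration would also break the time-formatting expression
or datediff(second, CallCreatedDateTime, CallEntryDateTime) < 0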
I dropped and recreated the fact table and ran my SSIS package, which seems to have fixed it. The reason I did that is that I couldn't find a NULL or 0 value.
I am trying to make the week number keep increasing across years, so that week numbers will not repeat.
For example, 18/12/2016 is week number 51, 25/12/2016 is week number 52, 01/01/2017 is week number 1, and 08/01/2017 is week number 2.
What I want is: 18/12/2016 week no. 51, 25/12/2016 week no. 52, 01/01/2017 week no. 53, 08/01/2017 week no. 54, etc., so that no week number repeats.
See the sample of my dataset and the query I have tried:
DECLARE @Table1 table (Week_End_Date DATE)
INSERT INTO @Table1
VALUES ('2016-04-03'), ('2016-04-10'), ('2016-04-17'), ('2016-04-24'),
('2016-05-01'), ('2016-05-08'), ('2016-05-15'), ('2016-05-22'),
('2016-05-29'), ('2016-06-05'), ('2016-06-12'), ('2016-06-19'),
('2016-06-26'), ('2016-07-03'), ('2016-07-10'), ('2016-07-17'),
('2016-07-24'), ('2016-07-31'), ('2016-08-07'), ('2016-08-14'),
('2016-08-21'), ('2016-08-28'), ('2016-09-04'), ('2016-09-11'),
('2016-09-18'), ('2016-09-25'), ('2016-10-02'), ('2016-10-09'),
('2016-10-16'), ('2016-10-23'), ('2016-10-30'), ('2016-11-06'),
('2016-11-13'), ('2016-11-20'), ('2016-11-27'), ('2016-12-04'),
('2016-12-11'), ('2016-12-18'), ('2016-12-25'), ('2017-01-01'),
('2017-01-08'), ('2017-01-15'), ('2017-01-22'), ('2017-01-29'),
('2017-02-05'), ('2017-02-12'), ('2017-02-19'), ('2017-02-26'),
('2017-03-05'), ('2017-03-12'), ('2017-03-19'), ('2017-03-26'),
('2017-04-02'), ('2017-04-09'), ('2017-04-16'), ('2017-04-23'),
('2017-04-30'), ('2017-05-07'), ('2017-05-14'), ('2017-05-21'),
('2017-05-28'), ('2017-06-04'), ('2017-06-11'), ('2017-06-18'),
('2017-06-25'), ('2017-07-02'), ('2017-07-09'), ('2017-07-16'),
('2017-07-23'), ('2017-07-30'), ('2017-08-06'), ('2017-08-13'),
('2017-08-20'), ('2017-08-27'), ('2017-09-03'), ('2017-09-10'),
('2017-09-17'), ('2017-09-24'), ('2017-10-01'), ('2017-10-08'),
('2017-10-15'), ('2017-10-22'), ('2017-10-29'), ('2017-11-05'),
('2017-11-12'), ('2017-11-19'), ('2017-11-26'), ('2017-12-03'),
('2017-12-10'), ('2017-12-17'), ('2017-12-24'), ('2017-12-31'),
('2018-01-07'), ('2018-01-14'), ('2018-01-21'), ('2018-01-28'),
('2018-02-04'), ('2018-02-11'), ('2018-02-18'), ('2018-02-25'),
('2018-03-04'), ('2018-03-11'), ('2018-03-18'), ('2018-03-25'),
('2018-04-01')
Query:
SELECT
Week_End_Date,
DATEPART(WEEK,Week_End_Date) AS WeekNumber,
REPLACE(LEFT(Week_End_Date,7),'-','') +
CASE
WHEN CAST(DATEPART(WEEK, Week_End_Date) AS VARCHAR(2)) IN ('1', '2', '3', '4', '5', '6', '7', '8', '9')
THEN '0' + CAST(DATEPART(WEEK, Week_End_Date) AS VARCHAR(2))
ELSE CAST(DATEPART(WEEK, Week_End_Date) AS VARCHAR(2))
END Wk_NO_Norepeat
FROM
@Table1
ORDER BY
Week_End_Date
Desired output
Date Current week number Expected output
04/12/2016 49 49
11/12/2016 50 50
18/12/2016 51 51
25/12/2016 52 52
01/01/2017 1 53
08/01/2017 2 54
15/01/2017 3 55
22/01/2017 4 56
29/01/2017 5 57
You could use ROW_NUMBER(). If I understand it correctly, you may use something like:
DECLARE @Table1 table (Week_End_Date DATE)
INSERT INTO @Table1
VALUES ('2016-04-03'), ('2016-04-10'), ('2016-04-17'), ('2016-04-24'),
('2016-05-01'), ('2016-05-08'), ('2016-05-15'), ('2016-05-22'),
('2016-05-29'), ('2016-06-05'), ('2016-06-12'), ('2016-06-19'),
('2016-06-26'), ('2016-07-03'), ('2016-07-10'), ('2016-07-17'),
('2016-07-24'), ('2016-07-31'), ('2016-08-07'), ('2016-08-14'),
('2016-08-21'), ('2016-08-28'), ('2016-09-04'), ('2016-09-11'),
('2016-09-18'), ('2016-09-25'), ('2016-10-02'), ('2016-10-09'),
('2016-10-16'), ('2016-10-23'), ('2016-10-30'), ('2016-11-06'),
('2016-11-13'), ('2016-11-20'), ('2016-11-27'), ('2016-12-04'),
('2016-12-11'), ('2016-12-18'), ('2016-12-25'), ('2017-01-01'),
('2017-01-08'), ('2017-01-15'), ('2017-01-22'), ('2017-01-29'),
('2017-02-05'), ('2017-02-12'), ('2017-02-19'), ('2017-02-26'),
('2017-03-05'), ('2017-03-12'), ('2017-03-19'), ('2017-03-26'),
('2017-04-02'), ('2017-04-09'), ('2017-04-16'), ('2017-04-23'),
('2017-04-30'), ('2017-05-07'), ('2017-05-14'), ('2017-05-21'),
('2017-05-28'), ('2017-06-04'), ('2017-06-11'), ('2017-06-18'),
('2017-06-25'), ('2017-07-02'), ('2017-07-09'), ('2017-07-16'),
('2017-07-23'), ('2017-07-30'), ('2017-08-06'), ('2017-08-13'),
('2017-08-20'), ('2017-08-27'), ('2017-09-03'), ('2017-09-10'),
('2017-09-17'), ('2017-09-24'), ('2017-10-01'), ('2017-10-08'),
('2017-10-15'), ('2017-10-22'), ('2017-10-29'), ('2017-11-05'),
('2017-11-12'), ('2017-11-19'), ('2017-11-26'), ('2017-12-03'),
('2017-12-10'), ('2017-12-17'), ('2017-12-24'), ('2017-12-31'),
('2018-01-07'), ('2018-01-14'), ('2018-01-21'), ('2018-01-28'),
('2018-02-04'), ('2018-02-11'), ('2018-02-18'), ('2018-02-25'),
('2018-03-04'), ('2018-03-11'), ('2018-03-18'), ('2018-03-25'),
('2018-04-01')
DECLARE @FirstWeek INT = (SELECT DATEPART(WEEK, MIN(Week_End_Date)) - 1 FROM @Table1)
SELECT
Week_End_Date,
ROW_NUMBER() OVER (ORDER BY Week_End_Date) + @FirstWeek AS WeekNumber
FROM
@Table1
Perhaps this does what you want:
select t1.Week_End_Date, datepart(week, t1.Week_End_Date) as WeekNumber,
(first_value(datepart(week, t1.Week_End_Date)) over (order by t1.Week_End_Date) +
row_number() over (order by t1.Week_End_Date) - 1
) as NewWeekNumber
from @Table1 t1;
It calculates the first week number in the data and then just increments the week number by 1 in subsequent rows.