I have a table like the following and I am required to show the subtotal of the use_time_sec column grouping by event_datetime, event_name (only show login), user_id and system_id.
sample input table
with sample_input as (
select '12/01/2023 14:27:59' as event_datetime, 'login' as event_name,'1' as user_id, 'X' as system_id, '0' as use_time_sec
union all
select '12/01/2023 14:28:05', 'screen 1', '1', 'X', '2'
union all
select '12/01/2023 14:28:05', 'screen 2', '1', 'X', '5',
union all
select '12/01/2023 14:28:17', 'screen 1', '1', 'X', '3',
union all
select '12/01/2023 14:28:23', 'logout', '1', '', '0',
union all
select '12/01/2023 14:28:23', 'login', '2', 'Y', '0',
union all
select '12/01/2023 14:28:23', 'screen 1', '2', 'Y', '10',
union all
select '12/01/2023 14:28:24', 'screen 2', '2', 'Y', '100',
union all
select '12/01/2023 14:28:29', 'login', '1', 'X', '0',
union all
select '12/01/2023 14:28:29', 'screen 1', '1', 'X', '500',
union all
select '12/01/2023 14:28:29', 'logout', '1', '', '0',
)
select * from sample_input
sample output
I can loop through the table to get my desired output, but that's not the most efficient solution, as there are a few million records in the table and it is growing every day.
Will appreciate if someone can provide a better solution than what I have.
Note: The data is in google BigQuery.
Thanks
This is known as the Gaps and Islands problem. We're trying to identify the islands of user sessions. We need to do a query which gives us some way to identify a session. This relies heavily on window functions.
One way is to count the number of logins seen per user.
select
*,
sum(1)
filter(where event_name = 'login')
over(partition by user_id order by event_time)
as session_num
from events
order by event_time
That will keep a tally per user_id. It will add to the tally every time it sees a user login.
event_time  event_type  user_id  use_time_sec  session_num
-----------------------------------------------------------
1000        login       1        0             1
1001        things      1        3             1
1001        login       2        10            1
1002        logout      1        7             1
1005        logout      2        20            1
1100        login       1        5             2
1101        logout      1        10            2
Now we have a way to identify each user's sessions. We can group by user_id and session_num. These are our islands.
with sessions as (
select
*,
sum(1)
filter(where event_name = 'login')
over(partition by user_id order by event_time)
as session_num
from events
order by event_time
)
select
min(event_time) as session_start,
user_id,
sum(use_time_sec) as total_use_time_sec
from sessions
group by user_id, session_num
order by session_start
session_start  user_id  total_use_time_sec
-------------------------------------------
1000           1        10
1001           2        130
1100           1        15
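Demonstration in PostgreSQL. As far as I know BigQuery does not accept the filter clause, so there the tally would likely need to be written as countif(event_name = 'login') over(partition by user_id order by event_time); the rest should work fine on BigQuery.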
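The running-login-count idea can be tried against the sample rows from the question. This is only a sketch under assumptions: it runs the SQL in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), it writes the tally as sum(case ...) instead of filter, and it stores user_id and use_time_sec as integers. Rows of one user that share a timestamp end up in the same session because the default window frame includes peers.

```python
import sqlite3

# Sample rows from the question:
# (event_datetime, event_name, user_id, system_id, use_time_sec)
rows = [
    ("12/01/2023 14:27:59", "login", 1, "X", 0),
    ("12/01/2023 14:28:05", "screen 1", 1, "X", 2),
    ("12/01/2023 14:28:05", "screen 2", 1, "X", 5),
    ("12/01/2023 14:28:17", "screen 1", 1, "X", 3),
    ("12/01/2023 14:28:23", "logout", 1, "", 0),
    ("12/01/2023 14:28:23", "login", 2, "Y", 0),
    ("12/01/2023 14:28:23", "screen 1", 2, "Y", 10),
    ("12/01/2023 14:28:24", "screen 2", 2, "Y", 100),
    ("12/01/2023 14:28:29", "login", 1, "X", 0),
    ("12/01/2023 14:28:29", "screen 1", 1, "X", 500),
    ("12/01/2023 14:28:29", "logout", 1, "", 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "create table sample_input (event_datetime text, event_name text,"
    " user_id int, system_id text, use_time_sec int)"
)
con.executemany("insert into sample_input values (?, ?, ?, ?, ?)", rows)

# The running count of logins per user labels each session (the island);
# grouping by (user_id, session_num) then yields one subtotal per session.
# The timestamps are compared as text, which is fine here only because
# every row has the same date and format.
sessions = con.execute("""
    with numbered as (
        select *,
               sum(case when event_name = 'login' then 1 else 0 end)
                   over (partition by user_id order by event_datetime) as session_num
        from sample_input
    )
    select min(event_datetime) as session_start,
           user_id,
           sum(use_time_sec) as total_use_time_sec
    from numbered
    group by user_id, session_num
    order by session_start
""").fetchall()

for session in sessions:
    print(session)
```

If a logout and the next login of the same user could share a timestamp, a tie-breaking column would be needed in the order by.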
As my research into Firebird continues, I've attempted to improve some of my queries. As I use Libreoffice Base, I'm not 100% sure how the data entry code works, but I believe it's something like this:
CREATE TABLE "Data Entry"(
ID int,
Date date,
"Vehicle Type" varchar,
events int,
"Hours 1" int,
"Hours 2" int
);
INSERT INTO "Data Entry" VALUES
(1, '31/12/22', 'A', '1', '0', '1'),
(2, '31/12/22', 'A', '1', '0', '1'),
(3, '29/12/22', 'A', '3', '0', '1'),
(4, '25/06/22', 'B1', '1', '0', '1'),
(5, '24/06/22' , 'B1', '1', '1', '0'),
(6, '24/06/22' , 'B1', '1', '1', '0'),
(7, '31/12/22' , 'B2', '7', '0', '1'),
(8, '29/12/22' , 'C', '1', '0', '1'),
(9, '29/12/22' , 'C', '2', '0', '1'),
(10, '19/01/22' , 'D1', '5', '1', '0'),
(11, '23/01/22' , 'D2', '6', '1', '1'),
(12, '29/07/19' , 'D3', '5', '0', '1'),
(13, '21/12/22' , 'D4', '1', '0', '1'),
(14, '19/12/22' , 'D4', '1', '1', '1'),
(15, '19/12/22' , 'D4', '1', '0', '1'),
(16, '28/12/22' , 'E', '2', '0', '1'),
(17, '24/12/22' , 'E', '3', '0', '1'),
(18, '14/07/07' , '1', '0', '0', '1'),
(19, '22/12/22' , '2', '1', '0', '1');
I tried this through the online Fiddle pages, but it throws up errors, so either I'm doing it incorrectly, or it's because there was no option for Firebird. Hopefully irrelevant, as I have the table already through the front-end.
One of my earlier queries which works as expected is shown below, along with its output:
SELECT
"Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
"Event Count",
ROW_NUMBER() OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "rn"
FROM
(
SELECT
"Date",
"Events",
"Vehicle Type",
SUM("Events") OVER (PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS "Event Count"
FROM "Data Entry"
)
WHERE "Event Count" >= 3
)
WHERE "rn" = 1
Vehicle Type   Days Since 3rd Last Event
-----------------------------------------
A              3
B1             191
B2             1
C              3
D1             347
D2             343
D3             1252
D4             14
E              8
In this output, it does not list every vehicle because not all vehicles have an Event Count that is equal to or greater than 3. The new query I am trying to put together is a combination of different queries (omitted to keep things relevant, plus they already work on their own), with a rewrite of the above code as well:
SELECT
"Vehicle Type",
SUM("Hours 1" + "Hours 2") AS "Total Hours",
MAX(CASE
WHEN
"Total Events" = 3
THEN
DATEDIFF(DAY, "Date", CURRENT_DATE)
END
) "Days Since 3rd Last Event"
FROM
(
SELECT
"Vehicle Type",
"Date",
"Hours 1",
"Hours 2",
CASE
WHEN
"Events" > 0
THEN
SUM( "Events")
OVER(
PARTITION BY "Vehicle Type"
ORDER BY "Date" DESC
)
END
"Total Events"
FROM
"Data Entry"
)
GROUP BY "Vehicle Type"
ORDER BY "Vehicle Type"
The expected output should be:
Vehicle Type   Days Since 3rd Last Event   Total Hours
-------------------------------------------------------
1                                          1
2                                          1
A              3                           3
B1             191                         3
B2             1                           1
C              3                           2
D1             347                         1
D2             343                         2
D3             1252                        1
D4             14                          4
E              8                           2
However, the actual output is:
Vehicle Type   Days Since 3rd Last Event   Total Hours
-------------------------------------------------------
1                                          1
2                                          1
A                                          3
B1             191                         3
B2                                         1
C              3                           2
D1                                         1
D2                                         2
D3                                         1
D4             14                          4
E                                          2
Granted, I've mixed and matched code, made some up myself, and copied some parts from elsewhere online, so there's a good chance I've not understood something correctly and blindly added it in thinking it would work, but now I'm at a loss as to what that could be. I've had a play around with changing the values of the WHEN statements and altering the operators between =, >, and >=, but any deviation from what's currently shown above outputs incorrect numbers. At least the three numbers displayed in the actual output are correct.
You could try using two rankings:
the first one that catches last three rows
the second one that catches your last row among the possible three
then get your date differences.
WITH last_three AS (
SELECT "Vehicle Type", "Date",
SUM("Hours 1"+"Hours 2") OVER(PARTITION BY "Vehicle Type") AS "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY "Date" DESC) AS rn
FROM "Data Entry"
), last_third AS (
SELECT "Vehicle Type", "Date", "Total Hours",
ROW_NUMBER() OVER(PARTITION BY "Vehicle Type" ORDER BY rn DESC) AS rn2
FROM last_three
WHERE rn <= 3
)
SELECT "Vehicle Type",
DATEDIFF(DAY, "Date", CURRENT_DATE) AS "Days Since 3rd Last Event",
"Total Hours"
FROM last_third
WHERE rn2 = 1
ORDER BY "Vehicle Type"
Check the demo here.
Note: You will get values for the "Vehicle Type" 1 and 2 too. If you can explain the rationale behind having those values empty, this query can be tweaked accordingly.
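As a side note on the blanks in the question's actual output: one plausible cause is that the test "Total Events" = 3 only matches when the running total lands exactly on 3. For vehicle A, for example, it goes from 2 straight to 5. A hedged sketch of an alternative is to test >= 3 and take the smallest day count. The code below is not Firebird: it assumes SQLite through Python's sqlite3 module, ISO-formatted dates, a julianday subtraction in place of DATEDIFF, a subset of the question's rows, and a fixed reference date of 2023-01-01 standing in for CURRENT_DATE.

```python
import sqlite3

# Subset of the question's rows, dates rewritten in ISO format:
# (vehicle_type, date, events, hours_1, hours_2)
rows = [
    ("A", "2022-12-31", 1, 0, 1),
    ("A", "2022-12-31", 1, 0, 1),
    ("A", "2022-12-29", 3, 0, 1),
    ("B1", "2022-06-25", 1, 0, 1),
    ("B1", "2022-06-24", 1, 1, 0),
    ("B1", "2022-06-24", 1, 1, 0),
    ("B2", "2022-12-31", 7, 0, 1),
    ("1", "2007-07-14", 0, 0, 1),
]
TODAY = "2023-01-01"  # assumed reference date, stands in for CURRENT_DATE

con = sqlite3.connect(":memory:")
con.execute(
    "create table data_entry (vehicle_type text, date text,"
    " events int, hours_1 int, hours_2 int)"
)
con.executemany("insert into data_entry values (?, ?, ?, ?, ?)", rows)

result = con.execute("""
    select vehicle_type,
           sum(hours_1 + hours_2) as total_hours,
           -- smallest day count among rows whose running total reached 3
           min(case when total_events >= 3
                    then cast(julianday(:today) - julianday(date) as integer)
               end) as days_since_3rd_last_event
    from (
        select *,
               sum(events) over (partition by vehicle_type
                                 order by date desc) as total_events
        from data_entry
    )
    group by vehicle_type
    order by vehicle_type
""", {"today": TODAY}).fetchall()

for row in result:
    print(row)
```

Vehicle 1 never reaches a running total of 3, so its day count stays empty, which matches the expected output in the question.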
I have this table:
CREATE TABLE data
(
Event_Date date,
approved int,
rejected int
)
INSERT INTO data (Event_date, approved, rejected)
VALUES
('20190910', '5', '2'),
('20190911', '6', '3'),
('20190912', '5', '2'),
('20190913', '7', '5'),
('20190914', '8', '4'),
('20190915', '10', '2'),
('20190916', '4', '1')
How can I write a loop (or something better) to calculate the run rate and get results like the ones below? The "Rolling monthly rate" column shows the formula that needs to be used:
Event_date approved, rejected Rolling monthly rate
------------------------------------------------------------
20190901 5 2 ---
20190902 6 3 6+5/5+6+2+3
20190903 4 2 6+4/6+3+4+2
20190903 7 5 7+4/4+2+7+5
20190904 8 4 8+4/7+5+8+4
20190905 10 2 ....
20190906 4 1 .....
The lag() function, which returns the previous value, is perfect for this task.
You need to write a case when statement and skip the first entry since there is no previous value and then calculate using the desired formula.
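-- order the window explicitly so "previous row" is well defined;
-- parenthesise numerator and denominator, and multiply by 1.0 to avoid integer division
select *, case when row_number() over (order by Event_Date) > 1
then (approved + lag(approved) over (order by Event_Date)) * 1.0
/ (approved + rejected + lag(approved) over (order by Event_Date) + lag(rejected) over (order by Event_Date))
end as rate
from my_table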
Demo in DBfiddle
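A small runnable check of the lag() approach, using the rows from the question's INSERT. It is a sketch under assumptions: it runs in SQLite through Python's sqlite3 module (not the asker's engine), stores the dates as text, and reads the asker's formula as (approved + previous approved) / (approved + rejected + previous approved + previous rejected).

```python
import sqlite3

# Rows from the question's INSERT: (event_date, approved, rejected)
rows = [
    ("20190910", 5, 2),
    ("20190911", 6, 3),
    ("20190912", 5, 2),
    ("20190913", 7, 5),
    ("20190914", 8, 4),
    ("20190915", 10, 2),
    ("20190916", 4, 1),
]

con = sqlite3.connect(":memory:")
con.execute("create table data (event_date text, approved int, rejected int)")
con.executemany("insert into data values (?, ?, ?)", rows)

# lag() fetches the previous day's values; on the first row they are NULL,
# so the whole expression is NULL there without needing a CASE.
rates = con.execute("""
    select event_date,
           (approved + prev_approved) * 1.0
               / (approved + rejected + prev_approved + prev_rejected) as rate
    from (
        select *,
               lag(approved) over (order by event_date) as prev_approved,
               lag(rejected) over (order by event_date) as prev_rejected
        from data
    )
    order by event_date
""").fetchall()

for event_date, rate in rates:
    print(event_date, rate)
```

For 20190911 this gives (6 + 5) / (6 + 3 + 5 + 2) = 11 / 16.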
Would you please help me compute a cumulative sum in SQL Server 2017? The conditions are: 1) partition by client, 2) order by date_tm. The desired result is in the table below.
create table #clients (client nvarchar(1)
, date_tm datetime
,sum_pay int
, desirable_result int)
insert into #clients
(client, date_tm, sum_pay, desirable_result)
select '1', '2020-01-01', 10, 10 union all
select '1', '2020-01-02', 20, 30 union all
select '2', '2020-01-03', 20, 60 union all
select '2', '2020-01-01', 20, 20 union all
select '2', '2020-01-02', 20, 40 union all
select '3', '2020-01-01', 20, 20 union all
select '3', '2020-01-04', 20, 70 union all
select '3', '2020-01-02', 30, 50
select * from #clients
drop table if exists #clients
Thank you very much.
Are you looking for the query below?
select c.*,sum(sum_pay) over(partition by client order by date_tm)
from #clients c
You can use sum()over() window function as below:
select * ,SUM (sum_pay) OVER (partition by client order by date_tm) AS cummulativesum from #clients
SELECT * ,
CASE WHEN desirable_result = cum_sum THEN 'OK' ELSE 'NO' END AS Status
FROM
(
select
*,
SUM (sum_pay) OVER (partition by client order by date_tm) AS cum_sum
from #clients as tbl
) as a
With this code you can compare desirable_result with the cumulative sum.
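The comparison can also be run end to end on the question's rows. This is a sketch that assumes SQLite through Python's sqlite3 module rather than SQL Server 2017, with a regular table named clients in place of the temp table #clients.

```python
import sqlite3

# Rows from the question: (client, date_tm, sum_pay, desirable_result)
rows = [
    ("1", "2020-01-01", 10, 10),
    ("1", "2020-01-02", 20, 30),
    ("2", "2020-01-03", 20, 60),
    ("2", "2020-01-01", 20, 20),
    ("2", "2020-01-02", 20, 40),
    ("3", "2020-01-01", 20, 20),
    ("3", "2020-01-04", 20, 70),
    ("3", "2020-01-02", 30, 50),
]

con = sqlite3.connect(":memory:")
con.execute(
    "create table clients (client text, date_tm text,"
    " sum_pay int, desirable_result int)"
)
con.executemany("insert into clients values (?, ?, ?, ?)", rows)

# The running total restarts for each client and follows date order.
checked = con.execute("""
    select client, date_tm, desirable_result,
           sum(sum_pay) over (partition by client order by date_tm) as cum_sum
    from clients
    order by client, date_tm
""").fetchall()

for client, date_tm, wanted, cum_sum in checked:
    print(client, date_tm, cum_sum, "OK" if wanted == cum_sum else "NO")
```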
I'm looking into my user database to see how often users change gender. There are three possible genders in the database: Men, Women and Unknown.
To track the changes I have a table that gets a new row every time a user changes any user data. I would like to use this to see how many "gender swaps" are performed every month.
Below is the table for user 123. Note that there could have been other changes to the user that didn't change the gender, so not every row is a gender change.
The closest I have gotten is the query below, which finds whether there were two different genders within one month. But that requires the user to have two rows with different genders within the same month, which doesn't cover all situations. For example, they could have been registered as M and then changed to W in July.
#Test Query
SELECT
user_id ,
CONCAT(
(CAST(EXTRACT(year
FROM
modified_date ) AS string)),
( CAST(EXTRACT(month
FROM
modified_date ) AS string)))AS yearmonth ,
array_agg(gender order by modified_date asc limit 1)[safe_ordinal(1)] as agg_first,
array_agg(gender order by modified_date desc limit 1)[safe_ordinal(1)] as agg_last
FROM
START_TABLE
group by 1,2
#START_TABLE
SELECT 123 as user_id, "MALE" as gender, "2019-06-03 14:53:13 UTC" as modified_date
UNION ALL
select 123,"MALE", "2019-06-09 14:53:13 UTC"
UNION ALL
select 123,"FEMALE", "2019-06-14 14:53:13 UTC"
union all
select 123, "MALE", "2019-07-03 14:53:13 UTC"
UNION ALL
select 123,"MALE", "2019-07-09 14:53:13 UTC"
UNION ALL
select 123,"MALE", "2019-07-21 14:53:13 UTC"
union all
select 123,"MALE", "2019-08-01 14:53:13 UTC"
union all
select 123,"MALE", "2019-08-02 14:53:13 UTC"
union all
select 123, "UNKNOWN", "2019-09-03 14:53:13 UTC"
#RESULT_TABLE
SELECT "2019-06" as yearmonth, 1 as m_to_w, 0 as m_to_u, 0 as w_to_m, 0 as w_to_u, 0 as u_to_m, 0 as u_to_w
UNION ALL
select "2019-07",0,0,1,0,0,0
UNION ALL
select "2019-09",0,1,0,0,0,0
Below is for BigQuery Standard SQL
#standardSQL
SELECT user_id, FORMAT_DATE('%Y-%m', DATE(modified_date)) yearmonth,
COUNTIF((prev_gender, gender) = ('MALE', 'FEMALE')) AS m_to_w,
COUNTIF((prev_gender, gender) = ('MALE', 'UNKNOWN')) AS m_to_u,
COUNTIF((prev_gender, gender) = ('FEMALE', 'MALE')) AS w_to_m,
COUNTIF((prev_gender, gender) = ('FEMALE', 'UNKNOWN')) AS w_to_u,
COUNTIF((prev_gender, gender) = ('UNKNOWN', 'MALE')) AS u_to_m,
COUNTIF((prev_gender, gender) = ('UNKNOWN', 'FEMALE')) AS u_to_w
FROM (
SELECT *, LAG(gender) OVER(PARTITION BY user_id ORDER BY modified_date) prev_gender
FROM `project.dataset.table`
)
GROUP BY user_id, yearmonth
HAVING m_to_w + m_to_u + w_to_m + w_to_u + u_to_m + u_to_w > 0
-- ORDER BY user_id, yearmonth
If applied to the sample data from your question
WITH `project.dataset.table` AS (
SELECT 123 AS user_id, 'MALE' AS gender, TIMESTAMP '2019-06-03 14:53:13 UTC' AS modified_date UNION ALL
SELECT 123, 'MALE', '2019-06-09 14:53:13 UTC' UNION ALL
SELECT 123, 'FEMALE', '2019-06-14 14:53:13 UTC' UNION ALL
SELECT 123, 'MALE', '2019-07-03 14:53:13 UTC' UNION ALL
SELECT 123, 'MALE', '2019-07-09 14:53:13 UTC' UNION ALL
SELECT 123, 'MALE', '2019-07-21 14:53:13 UTC' UNION ALL
SELECT 123, 'MALE', '2019-08-01 14:53:13 UTC' UNION ALL
SELECT 123, 'MALE', '2019-08-02 14:53:13 UTC' UNION ALL
SELECT 123, 'UNKNOWN', '2019-09-03 14:53:13 UTC'
)
result is
Row user_id yearmonth m_to_w m_to_u w_to_m w_to_u u_to_m u_to_w
1 123 2019-06 1 0 0 0 0 0
2 123 2019-07 0 0 1 0 0 0
3 123 2019-09 0 1 0 0 0 0
Use lag() to find when users change gender:
select st.*
from (select st.*,
lag(gender) over (partition by user_id order by modified_date) as prev_gender
from start_table st
) st
where prev_gender <> gender;
With this logic, you can summarize:
select date_trunc(modified_date, month) as yyyymm,
count(*) as num_changes,
count(distinct user_id) as num_users
from (select st.*,
lag(gender) over (partition by user_id order by modified_date) as prev_gender
from start_table st
) st
where prev_gender <> gender
group by yyyymm
order by yyyymm;
EDIT:
You want the specific changes as well. You can use COUNTIF():
select date_trunc(modified_date, month) as yyyymm,
countif( prev_gender = 'MALE' and gender = 'FEMALE' ) as m_to_f,
countif( prev_gender = 'MALE' and gender = 'UNKNOWN' ) as m_to_u,
. . .
from (select st.*,
lag(gender) over (partition by user_id order by modified_date) as prev_gender
from start_table st
) st
group by yyyymm
order by yyyymm;
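The lag() logic can be checked against the sample rows from the question. This is a sketch that assumes SQLite through Python's sqlite3 module instead of BigQuery, drops the " UTC" suffix from the timestamps, derives the month with substr, and returns one row per (month, from, to) transition rather than one column per transition.

```python
import sqlite3

# Sample rows from the question: (user_id, gender, modified_date)
rows = [
    (123, "MALE", "2019-06-03 14:53:13"),
    (123, "MALE", "2019-06-09 14:53:13"),
    (123, "FEMALE", "2019-06-14 14:53:13"),
    (123, "MALE", "2019-07-03 14:53:13"),
    (123, "MALE", "2019-07-09 14:53:13"),
    (123, "MALE", "2019-07-21 14:53:13"),
    (123, "MALE", "2019-08-01 14:53:13"),
    (123, "MALE", "2019-08-02 14:53:13"),
    (123, "UNKNOWN", "2019-09-03 14:53:13"),
]

con = sqlite3.connect(":memory:")
con.execute("create table start_table (user_id int, gender text, modified_date text)")
con.executemany("insert into start_table values (?, ?, ?)", rows)

# Each row is compared with the same user's previous row; the first row of a
# user has prev_gender NULL, so the <> test drops it.
changes = con.execute("""
    select substr(modified_date, 1, 7) as yearmonth,
           prev_gender,
           gender,
           count(*) as num_changes
    from (
        select *,
               lag(gender) over (partition by user_id
                                 order by modified_date) as prev_gender
        from start_table
    )
    where prev_gender <> gender
    group by yearmonth, prev_gender, gender
    order by yearmonth
""").fetchall()

for change in changes:
    print(change)
```

August has no rows in the result because the gender did not change in that month.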
You can use LAG function to achieve this. Here is a version which uses LAG and IF function to take into account those discrete conditions:
with processed as (
select
concat(
cast(extract(year from modified_date) as string),
'-',
cast(extract(month from modified_date) as string)
) as yearmonth,
lower(gender) as gender,
lower(lag(gender) over (partition by user_id order by modified_date)) as previous_gender
from `mytable.dataset`
)
select
yearmonth,
sum(if (previous_gender = "male" and gender = "female", 1, 0) ) as m_to_f,
sum(if (previous_gender = "male" and gender = "unknown", 1, 0) ) as m_to_u,
sum(if (previous_gender = "female" and gender = "male", 1, 0) ) as f_to_m,
sum(if (previous_gender = "female" and gender = "unknown", 1, 0) ) as f_to_u,
sum(if (previous_gender = "unknown" and gender = "male", 1, 0) ) as u_to_m,
sum(if (previous_gender = "unknown" and gender = "female", 1, 0) ) as u_to_f
from processed
group by 1
Hope it helps.
I need to count occurrences of protocol violations and their durations between two dates, to produce a statistics table like the one in the picture below:
Expected effect:
Explanation:
As you can see I need to select 'Country', 'Site' existing in Violations table and: 'Numbers', 'Maximum', 'Minimum' and 'Mean' of protocol violations duration existing in DB in the same table 'Violations' between two dates. So we have to count:
protocol violations occurrences existing in Violations table by country and site
min/max/avg durations of protocol violations by country and site
under two different conditions:
occurrences from Date Discovered to Date Reported
occurrences from Date Reported to Date Confirmed
Database Structure:
Available on SQLFiddle: Look HERE
I will add that the code in the attached SQLFiddle has more tables and a query, but they are not needed for this problem. Feel free to use it.
I didn't remove the old query because it shows a nice way to produce the
'- All -' and
'- Unknown -' values.
Violation table:
create table violations (
id long,
country varchar(20),
site varchar(20),
status_id int,
trial_id int,
discovered_date date,
reporded_date date,
confirmed_date date
);
Site table:
create table site (
id long,
site varchar(20)
);
My First try:
Here is my new SQLFIDDLE with query needed to improve commented lines:
SELECT v.country as country, v.site as site,
COUNT(*) as N --,
--MAX(list of durations in days between discovered date to repored date on each violation by country and site) as "Maximum",
--MIN(list of durations in days between discovered date to repored date on each violation by country and site) as "Minimum",
--AVG(list of durations in days between discovered date to repored date on each violation by country and site) as "Mean"
FROM violations v
WHERE v.trial_id = 3
GROUP BY ROLLUP (v.country, v.site)
I've managed to write a skeleton query for my idea, but I can't work out the correct expressions for MAX, MIN and AVG, which must select the max/min/avg value from the list of durations in days between discovered date and reported date for each violation, by country and site.
Could you help me please?
Please check this query. It is simplified, but it may give you an idea and a direction; if you need more than this, let me know. Copy and paste it to see the results. The query selects and calculates only the results between the two dates in the WHERE clause, so run the inner query first without the WHERE to see all the dates. This query counts violations between two dates. I am not sure what the list of durations in days is, so see below for a count of durations. You may add MAX/MIN etc.
-- Days between (duration) = (end_date-start_date) = number of days (number) --
SELECT (to_date('14-MAR-2013') - to_date('01-MAR-2013')) days_between
FROM dual
/
SELECT country, site
, Count(*) total_viol
, MAX(susp_viol) max_susp_viol
, MIN(susp_viol) min_susp_viol
FROM
(
SELECT 'GERMANY' country, '12222' site, 1 susp_viol, 2 conf_viol, trunc(Sysdate-30) disc_date, trunc(Sysdate-25) conf_date
FROM dual
UNION
SELECT 'GERMANY', '12222' , 3 , 14, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '12222' , 6 , 25, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '12222' , 2 , 1, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'GERMANY', '13333' , 10 , 5, trunc(Sysdate-15) , trunc(Sysdate-10) FROM dual
UNION
SELECT 'GERMANY', '13333' , 15 , 3, trunc(Sysdate-15) , trunc(Sysdate-10) FROM dual
UNION
SELECT 'GERMANY', 'Unknown Site' , 0 , 7, trunc(Sysdate-5) , trunc(Sysdate-2) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 1 , 5, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 2 , 10, trunc(Sysdate-15) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'RUSSIA', 'Unknown Site' , 10 , 10, trunc(Sysdate-3) , trunc(Sysdate-1) FROM dual
)
-- replace sysdate with your_date-default format is to_date('14-MAR-2013') or give format mask
WHERE conf_date BETWEEN trunc(Sysdate-20) AND trunc(Sysdate-10)
GROUP BY ROLLUP (country, site)
ORDER BY country, site
/
Count of duration:
SELECT country, site, (conf_date-disc_date) duration, count(*) total_durations
FROM
(
SELECT 'GERMANY' country, '12222' site, 1 susp_viol, 2 conf_viol, trunc(Sysdate-30) disc_date, trunc(Sysdate-20) conf_date
FROM dual
UNION
SELECT 'GERMANY', '12222' , 3 , 14, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '12222' , 6 , 25, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '12222' , 2 , 1, trunc(Sysdate-20) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'GERMANY', '13333' , 10 , 5, trunc(Sysdate-12) , trunc(Sysdate-6) FROM dual
UNION
SELECT 'GERMANY', '13333' , 15 , 3, trunc(Sysdate-17) , trunc(Sysdate-11) FROM dual
UNION
SELECT 'GERMANY', 'Unknown Site' , 0 , 7, trunc(Sysdate-5) , trunc(Sysdate-2) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 1 , 5, trunc(Sysdate-20) , trunc(Sysdate-15) FROM dual
UNION
SELECT 'RUSSIA', '12345' , 2 , 10, trunc(Sysdate-15) , trunc(Sysdate-12) FROM dual
UNION
SELECT 'RUSSIA', 'Unknown Site' , 10 , 10, trunc(Sysdate-3) , trunc(Sysdate-1) FROM dual
)
WHERE conf_date BETWEEN trunc(Sysdate-20) AND trunc(Sysdate-10)
GROUP BY ROLLUP (country, site, (conf_date-disc_date))
ORDER BY country, site
/
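Since subtracting two Oracle dates gives a number of days, as in the days_between example above, the commented-out MAX/MIN/AVG lines in the question could plausibly wrap that difference directly. The sketch below shows the shape of such a query under assumptions: it runs in SQLite through Python's sqlite3 module instead of Oracle, uses julianday subtraction for the day difference, leaves out ROLLUP (which SQLite does not support), and uses made-up rows. The column name reporded_date is spelled as in the question's table definition.

```python
import sqlite3

# Hypothetical rows: (country, site, discovered_date, reporded_date)
rows = [
    ("GERMANY", "12222", "2013-03-01", "2013-03-05"),
    ("GERMANY", "12222", "2013-03-02", "2013-03-12"),
    ("GERMANY", "13333", "2013-03-03", "2013-03-04"),
    ("RUSSIA", "12345", "2013-03-01", "2013-03-08"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "create table violations (country text, site text,"
    " discovered_date text, reporded_date text)"
)
con.executemany("insert into violations values (?, ?, ?, ?)", rows)

# Compute one duration per violation first, then aggregate per country/site.
stats = con.execute("""
    select country, site,
           count(*)      as n,
           max(duration) as maximum,
           min(duration) as minimum,
           avg(duration) as mean
    from (
        select country, site,
               cast(julianday(reporded_date) - julianday(discovered_date)
                    as integer) as duration
        from violations
    )
    group by country, site
    order by country, site
""").fetchall()

for stat in stats:
    print(stat)
```

The second condition (Date Reported to Date Confirmed) would follow the same pattern with the other pair of date columns.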