How to prevent SQL query from returning overlapping groups?

I'm trying to generate a report that displays the number of failed login attempts that happen within 30 minutes of each other. The data for this report is in a SQL database.
This is the query I'm using to pull the data out.
SELECT
A.LoginID,
A.LogDatetime AS firstAttempt,
MAX(B.LogDatetime) AS lastAttempt,
COUNT(B.LoginID) + 1 AS attempts
FROM
UserLoginHistory A
JOIN UserLoginHistory B ON A.LoginID = B.LoginID
WHERE
A.SuccessfulFlag = 0
AND B.SuccessfulFlag = 0
AND A.LogDatetime < B.LogDatetime
AND B.LogDatetime <= DATEADD(minute, 30, A.LogDatetime)
GROUP BY
A.LoginID, A.LogDatetime
ORDER BY
A.LoginID, A.LogDatetime
This returns results that look something like this:
Row  LoginID  firstAttempt      lastAttempt       attempts
---  -------  ----------------  ----------------  --------
1    1        2022-05-01 00:00  2022-05-01 00:29  6
2    1        2022-05-01 00:06  2022-05-01 00:33  6
3    1        2022-05-01 00:13  2022-05-01 00:39  6
4    1        2022-05-01 00:15  2022-05-01 00:45  6
5    1        2022-05-01 00:20  2022-05-01 00:50  6
6    1        2022-05-01 00:29  2022-05-01 00:55  6
7    1        2022-05-01 00:33  2022-05-01 01:01  6
8    1        2022-05-01 00:39  2022-05-01 01:04  6
...  ...      ...               ...               ...
However, you can see that the rows overlap a lot. For example, row 1 shows attempts from 00:00 to 00:29, which overlaps with row 2 showing attempts from 00:06 to 00:33. Row 2 ought to be like row 7 (00:33 - 01:01), since that row's firstAttempt is the next one after row 1's lastAttempt.

You might need to use recursive CTEs, or insert your data into a temp table and loop over it with updates to remove the overlaps.
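The recursive-CTE idea can be sketched like this. It is a sketch in SQLite syntax (run from Python so it can be executed), with a simplified single-LoginID table named attempts; the table and column names are placeholders, and SQL Server would use DATEADD instead of SQLite's datetime(). Each group starts at the first failed attempt after the previous group's lastAttempt, which is the non-overlapping behavior the question asks for:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attempts (t TEXT)")  # failed attempts for one LoginID
con.executemany("INSERT INTO attempts VALUES (?)", [
    ("2022-05-01 00:00:00",), ("2022-05-01 00:06:00",),
    ("2022-05-01 00:13:00",), ("2022-05-01 00:15:00",),
    ("2022-05-01 00:20:00",), ("2022-05-01 00:29:00",),
    ("2022-05-01 00:33:00",), ("2022-05-01 00:39:00",),
    ("2022-05-01 00:45:00",), ("2022-05-01 00:50:00",),
    ("2022-05-01 00:55:00",), ("2022-05-01 01:01:00",),
    ("2022-05-01 01:04:00",),
])
groups = con.execute("""
    WITH RECURSIVE grp(firstAttempt) AS (
        -- anchor: the earliest failed attempt starts the first window
        SELECT MIN(t) FROM attempts
        UNION ALL
        -- next anchor: first attempt after the current window's last attempt
        SELECT (SELECT MIN(a.t) FROM attempts a
                WHERE a.t > (SELECT MAX(b.t) FROM attempts b
                             WHERE b.t <= datetime(g.firstAttempt, '+30 minutes')))
        FROM grp g
        WHERE g.firstAttempt IS NOT NULL   -- stop once no next attempt exists
    )
    SELECT firstAttempt,
           (SELECT MAX(t) FROM attempts
            WHERE t <= datetime(firstAttempt, '+30 minutes')) AS lastAttempt,
           (SELECT COUNT(*) FROM attempts
            WHERE t BETWEEN firstAttempt
                        AND datetime(firstAttempt, '+30 minutes')) AS attempts
    FROM grp
    WHERE firstAttempt IS NOT NULL
    ORDER BY firstAttempt
""").fetchall()
for g in groups:
    print(g)
```

On the question's sample data this yields one row per non-overlapping window (00:00-00:29, then 00:33-01:01, then 01:04), matching the "row 2 ought to be like row 7" expectation.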
Do you need to have set starting times? As a quick workaround you could round the DATETIME down to 30-minute intervals. That would ensure the logins don't overlap, but it will only group the attempts into fixed 30-minute buckets.
SELECT
A.LoginID,
DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01') AS LoginInterval,
MIN(A.LogDatetime) AS firstAttempt,
MAX(A.LogDatetime) AS lastAttempt,
COUNT(*) AS attempts
FROM
UserLoginHistory A
WHERE
A.SuccessfulFlag = 0
GROUP BY
A.LoginID, DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01')
ORDER BY
A.LoginID, LoginInterval

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month-end date. The SQL below is the text of the query that counts valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#;
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
--  ------------
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
---------  -------------  ---------------
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
ID  Cert_Iss_Dte  Cert_Exp_Dte
--  ------------  ------------
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of up to 50 months, but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count() referencing a textbox on form for start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1  Dt2
---  ---
10   8
Or a report with 60 textboxes, each calling a DCount() expression with the same criteria as used in the query.
Or a VBA procedure that writes data to a 'temp' table.
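If the back end allows generic SQL, the 60 columns can instead come from a generated month series joined against the table. A minimal sketch, assuming SQLite syntax (Access itself has no recursive CTEs, so there you would keep a small table of month-start dates instead); only three months and five sample rows are shown:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dbo_Insp_Type (Id INTEGER, CERT_EXP_DTE TEXT)")
con.executemany("INSERT INTO dbo_Insp_Type VALUES (?, ?)", [
    (1, "2022-01-15"), (2, "2022-01-23"), (3, "2022-02-01"),
    (4, "2022-02-03"), (5, "2022-05-01"),
])
rows = con.execute("""
    WITH RECURSIVE months(m) AS (
        -- generate one row per month start instead of one query per month
        SELECT '2022-01-01'
        UNION ALL
        SELECT date(m, '+1 month') FROM months WHERE m < '2022-03-01'
    )
    SELECT m,
           (SELECT COUNT(*) FROM dbo_Insp_Type
            WHERE CERT_EXP_DTE >= m) AS ValidInspectors
    FROM months
""").fetchall()
for r in rows:
    print(r)
```

Extending the recursion bound to cover 60 months gives all the counts in a single query, with no per-month manual input.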

Postgres table transformation: transposing values of a column into new columns

Is there a way to transpose/flatten the following table -
userId  time window    propertyId  count  sum  avg  max
------  -------------  ----------  -----  ---  ---  ---
1       01:00 - 02:00  a           2      5    1.5  3
1       02:00 - 03:00  a           4      15   2.5  6
1       01:00 - 02:00  b           2      5    1.5  3
1       02:00 - 03:00  b           4      15   2.5  6
2       01:00 - 02:00  a           2      5    1.5  3
2       02:00 - 03:00  a           4      15   2.5  6
2       01:00 - 02:00  b           2      5    1.5  3
2       02:00 - 03:00  b           4      15   2.5  6
to something like this -
userId  time window    a_count  a_sum  a_avg  a_max  b_count  b_sum  b_avg  b_max
------  -------------  -------  -----  -----  -----  -------  -----  -----  -----
1       01:00 - 02:00  2        5      1.5    3      2        5      1.5    3
1       02:00 - 03:00  4        15     2.5    6      4        15     2.5    6
2       01:00 - 02:00  2        5      1.5    3      2        5      1.5    3
2       02:00 - 03:00  4        15     2.5    6      4        15     2.5    6
Basically, I want to flatten the table by having the aggregation columns (count, sum, avg, max) per propertyId, so the new columns are a_count, a_sum, a_avg, a_max, b_count, b_sum, ... All the rows have these values per userId per time window.
Important clarification: The values in propertyId column can change and hence, the number of columns can change as well. So, if there are n different values for propertyId, then there will be n*4 aggregation columns created.
SQL does not allow a dynamic number of result columns, on principle. It demands to know the number and data types of the resulting columns at call time. The only way to make it "dynamic" is a two-step process:
Generate the query.
Execute it.
If you don't actually need separate columns, returning arrays or document-type columns (json, jsonb, xml, hstore, ...) containing a variable number of data sets would be a feasible alternative.
See:
Execute a dynamic crosstab query
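As a concrete illustration of the document-type alternative, here is a sketch using SQLite's json_group_object (the Postgres analogue is jsonb_object_agg); it assumes the JSON1 functions are available, and the stat columns are renamed (cnt, total, mx) to avoid the reserved words count/sum/max:

```python
import sqlite3
import json

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE stats
               (userId INT, win TEXT, propertyId TEXT,
                cnt INT, total INT, avg REAL, mx INT)""")
con.executemany("INSERT INTO stats VALUES (?,?,?,?,?,?,?)", [
    (1, "01:00 - 02:00", "a", 2, 5, 1.5, 3),
    (1, "01:00 - 02:00", "b", 2, 5, 1.5, 3),
    (1, "02:00 - 03:00", "a", 4, 15, 2.5, 6),
    (1, "02:00 - 03:00", "b", 4, 15, 2.5, 6),
])
# one JSON document per (userId, window) holding every property's stats,
# so the column count stays fixed no matter how many propertyId values exist
rows = con.execute("""
    SELECT userId, win,
           json_group_object(propertyId,
                             json_array(cnt, total, avg, mx)) AS props
    FROM stats
    GROUP BY userId, win
    ORDER BY userId, win
""").fetchall()
for r in rows:
    print(r)
```

The client then unpacks the JSON, instead of the database having to invent columns per propertyId.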

Oracle SQL - count number of active/open tickets per hour by day

I have a dataset from an Oracle DB that looks something like this:
ticket_num start_date repair_date
1 1/1/2021 02:05:15 1/4/2021 09:30:00
2 1/2/2021 12:15:45 1/2/2021 14:03:00
3 1/2/2021 12:20:00 1/2/2021 13:54:00
I need to calculate the number of active tickets in each one-hour time slot. So if a ticket was opened before that hour and closed after it, it would be counted. All days and hours need to be represented, regardless of whether there are active tickets open during that time. The expected output is:
month day hour #active_tix
1 1 2 1
1 1 3 1
...
1 2 12 3
1 2 13 3
1 2 14 2
1 2 15 1
...
1 4 9 1
1 4 10 0
Any help would be greatly appreciated.
You need a calendar table. In the query below it is created on the fly:
select c.hstart, count(t.ticket_num) n
from (
-- create calendar on the fly
select timestamp '2021-01-01 00:00:00' + NUMTODSINTERVAL(level-1, 'hour') hstart
from dual
connect by timestamp '2021-01-01 00:00:00' + NUMTODSINTERVAL(level-1, 'hour') < timestamp '2022-01-01 00:00:00'
) c
left join mytable t on t.start_date < c.hstart and t.repair_date >= c.hstart
group by c.hstart
order by c.hstart
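To sanity-check the join logic, the same calendar-on-the-fly idea can be written with a recursive CTE in place of Oracle's CONNECT BY. A sketch in SQLite, generating only the first few days of 2021:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (ticket_num INT, start_date TEXT, repair_date TEXT)")
con.executemany("INSERT INTO mytable VALUES (?,?,?)", [
    (1, "2021-01-01 02:05:15", "2021-01-04 09:30:00"),
    (2, "2021-01-02 12:15:45", "2021-01-02 14:03:00"),
    (3, "2021-01-02 12:20:00", "2021-01-02 13:54:00"),
])
# hour -> number of tickets open at that hour boundary
active = dict(con.execute("""
    WITH RECURSIVE cal(hstart) AS (
        -- calendar generated on the fly, one row per hour
        SELECT '2021-01-01 00:00:00'
        UNION ALL
        SELECT datetime(hstart, '+1 hour') FROM cal
        WHERE hstart < '2021-01-04 23:00:00'
    )
    SELECT c.hstart, COUNT(t.ticket_num) AS n
    FROM cal c
    LEFT JOIN mytable t
           ON t.start_date < c.hstart AND t.repair_date >= c.hstart
    GROUP BY c.hstart
""").fetchall())
print(active["2021-01-02 13:00:00"])
```

The LEFT JOIN keeps hours with zero active tickets in the output, which is what "all days and hours need to be represented" requires.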

SELECTING records based on unique date and counting how many records on that date

I have a table that I'm going to simplify. Here's what it looks like:
tid session pos dateOn
-----------------------------------------------
1 23 0 12/24/2020 1:00:00
2 23 1 12/24/2020 1:01:23
3 12 0 12/24/2020 1:02:43
4 23 2 12/24/2020 1:04:01
5 23 3 12/24/2020 1:04:12
6 45 0 12/26/2020 4:23:15
This table tells me that there were 2 unique sessions on 12/24/2020 and 1 on 12/26.
How do I write my SQL statement so that I get a result like this:
date recordCount
----------------------------
12/24/2020 2
12/26/2020 1
You should simply be able to convert to a date and aggregate:
select convert(date, dateon), count(distinct session)
from t
group by convert(date, dateon)
order by convert(date, dateon);
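A runnable sanity check of that query, with SQLite's date() standing in for SQL Server's convert(date, ...):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (tid INT, session INT, pos INT, dateOn TEXT)")
con.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    (1, 23, 0, "2020-12-24 01:00:00"),
    (2, 23, 1, "2020-12-24 01:01:23"),
    (3, 12, 0, "2020-12-24 01:02:43"),
    (4, 23, 2, "2020-12-24 01:04:01"),
    (5, 23, 3, "2020-12-24 01:04:12"),
    (6, 45, 0, "2020-12-26 04:23:15"),
])
# truncate to the date, then count distinct sessions per date
rows = con.execute("""
    SELECT date(dateOn) AS d, COUNT(DISTINCT session) AS recordCount
    FROM t
    GROUP BY date(dateOn)
    ORDER BY d
""").fetchall()
for r in rows:
    print(r)
```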

Calculate Churn by aggregating by date range in SQL

I am trying to calculate the churn rate from data that has customer_id, group, and date columns. The aggregation is going to be by id, group, and date. The churn formula is (customers in previous cohort - customers in last cohort) / customers in previous cohort, where:
customers in previous cohort refers to cohorts in before 28 days
customers in last cohort refers to cohorts in last 28 days
I am not sure how to aggregate them by date range to calculate the churn.
Here is sample data that I copied from SQL Group by Date Range:
Date Group Customer_id
2014-03-01 A 1
2014-04-02 A 2
2014-04-03 A 3
2014-05-04 A 3
2014-05-05 A 6
2015-08-06 A 1
2015-08-07 A 2
2014-08-29 XXXX 2
2014-08-09 XXXX 3
2014-08-10 BB 4
2014-08-11 CCC 3
2015-08-12 CCC 2
2015-03-13 CCC 3
2014-04-14 CCC 5
2014-04-19 CCC 4
2014-08-16 CCC 5
2014-08-17 CCC 3
2014-08-18 XXXX 2
2015-01-10 XXXX 3
2015-01-20 XXXX 4
2014-08-21 XXXX 5
2014-08-22 XXXX 2
2014-01-23 XXXX 3
2014-08-24 XXXX 2
2014-02-25 XXXX 3
2014-08-26 XXXX 2
2014-06-27 XXXX 4
2014-08-28 XXXX 1
2014-08-29 XXXX 1
2015-08-30 XXXX 2
2015-09-31 XXXX 3
The goal is to calculate the churn rate every 28 days in between 2014 and 2015 by the formula given above. So, it is going to be aggregating the data by rolling it by 28 days and calculating the churn by the formula.
Here is what I tried to aggregate the data by date range:
SELECT COUNT(distinct customer_id) AS count_ids, Group,
DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
FROM churn_agg
GROUP BY count_ids, Group, Date_min, Date_max
Hope someone will help me with the aggregation and churn calculation. I simply want to deduct each aggregated count_ids value from the next aggregated count_ids value 28 days later, i.e. a successive deduction of the same column value (count_ids). I am not sure if I have to use a rolling window or simple aggregation to find the churn.
As corrected by @jarlh, it's not 2015-09-31 but 2015-09-30.
You can use this to create a 28-day calendar:
create table daysby28 (i int, _Date date);
insert into daysby28 (i, _Date)
SELECT i, cast('01-01-2014' as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;
After you create the churn_agg table @jarlh sent with the fiddle, this query gets you what you want:
with cte as
(
select count(Customer) as TotalCustomer, Cohort, CohortDateStart From
(
select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
from churn_agg a left join daysby28 b on a._Date >= b._Date and a._Date < b._Date + INTERVAL '28 day'
) a
group by Cohort, CohortDateStart
)
select a.CohortDateStart,
1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn from cte a
left join cte b on a.cohort > b.cohort
and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1
The fiddle with it all together is here.
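The "successive deduction" step the question asks about can also be done with a window function once the per-cohort counts exist: LAG gives the previous cohort's count, and churn = (previous - current) / previous. A minimal sketch with made-up cohort counts (table and column names are illustrative, not from the fiddle):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cohorts (cohort INT, customers INT)")
# pretend these distinct-customer counts came from the 28-day grouping step
con.executemany("INSERT INTO cohorts VALUES (?,?)", [(0, 10), (1, 8), (2, 6)])
rows = con.execute("""
    SELECT cohort,
           -- (previous cohort - current cohort) / previous cohort
           1.0 * (LAG(customers) OVER (ORDER BY cohort) - customers)
               / LAG(customers) OVER (ORDER BY cohort) AS churn
    FROM cohorts
    ORDER BY cohort
""").fetchall()
for r in rows:
    print(r)
```

The first cohort has no predecessor, so its churn comes back NULL; the self-join with NOT EXISTS in the answer above achieves the same pairing in engines without window functions.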