Find discarded records in PostgreSQL - sql

I have this query
select count(id) filter (where id > 2 and id <= 50) from table;
I want to find the records that are eliminated by this filter.
Yes! I can do this to find those records:
select count(id) filter (where id <= 2 or id > 50) from table;
But suppose I have a complex query where a formula replaces id in the query above.
I have a formula that calculates three different times based on different values. If I want to filter each time on some condition, I can use FILTER, for example.
These are my filters:
> start_time<= 40 mins and start_time> 5 mins
> end_time<= 10 mins and end_time> 1 mins
> journey_time<= 80 mins and journey_time> 10 mins
> Total_time(start_time+end_time+journey_time) <= 150 and Total_time(start_time+end_time+journey_time) > 15
If I want to filter, I have to write my formula 8 times (to filter < and >= for each time and the total time). This will be my query:
select
  avg(start_time_formula) filter (where start_time_formula <= 40 and start_time_formula > 5),
  avg(end_time_formula) filter (where end_time_formula <= 10 and end_time_formula > 1),
  avg(journey_time_formula) filter (where journey_time_formula <= 80 and journey_time_formula > 10)
from table
where start_time_formula + end_time_formula + journey_time_formula <= 150
  and start_time_formula + end_time_formula + journey_time_formula > 15
Now I want to find all the discarded values as well.
Do I have to write the same formula 8 more times, replacing > with <= and "AND" with "OR", so it gives me the discarded results, or is there another way to find the discarded values?
Update
My table values are
id  start_time           end_time             journey_time         Out_time
1   2018-04-06 01:37:36  2018-04-06 10:37:36  2018-04-06 04:37:36  2018-04-06 11:37:36
2   2018-04-16 02:37:36  2018-04-16 08:37:36  2018-04-16 06:37:36  2018-04-16 07:37:36
3   2018-05-10 01:37:36  2018-04-10 11:37:36  2018-04-06 09:37:36  2018-04-10 10:11:36
4   2018-05-10 04:37:36  2018-05-10 05:00:36  2018-05-10 04:47:36  2018-05-10 05:05:36
My Calculations are
start_time = journey_time - start_time
journey_time = end_time - journey_time
end_time = Out_time - end_time
This is my desired Output
start_time journey_time end_time discarded
10 mins 13 mins 5 mins 3
thanks

Use conditional aggregation with CASE WHEN; your query would look like this:
select
  sum(case when start_time_formula <= 40 and start_time_formula > 5 then 1 else 0 end),
  sum(case when end_time_formula <= 10 and end_time_formula > 1 then 1 else 0 end),
  sum(case when journey_time_formula <= 80 and journey_time_formula > 10 then 1 else 0 end)
from table
where start_time_formula + end_time_formula + journey_time_formula <= 150
  and start_time_formula + end_time_formula + journey_time_formula > 15
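These conditional sums count the rows each filter keeps, so the discarded count per filter is just COUNT(*) minus the matching sum; no second set of inverted predicates is needed. A minimal sketch of that idea in Python with an in-memory SQLite table (the table name `trips` and column `start_t` are made up, and the values are hypothetical stand-ins for the formula results):

```python
import sqlite3

# Hypothetical sample data standing in for the computed *_formula values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trips (start_t REAL, end_t REAL, journey_t REAL)")
con.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(10, 5, 40), (3, 2, 60), (35, 8, 100), (50, 0.5, 9)],
)

# One pass over the table: SUM(CASE ...) counts rows that pass the filter,
# and COUNT(*) minus that sum is the number of rows the filter discarded.
total, kept_start = con.execute("""
    SELECT
        COUNT(*),
        SUM(CASE WHEN start_t <= 40 AND start_t > 5 THEN 1 ELSE 0 END)
    FROM trips
""").fetchone()
print(total, kept_start, total - kept_start)  # kept and discarded from one scan
```

The same subtraction works per filter, so the formula is still written only once per condition.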

Related

How to prevent SQL query from returning overlapping groups?

I'm trying to generate a report that displays the number of failed login attempts that happen within 30 minutes of each other. The data for this report is in a SQL database.
This is the query I'm using to pull the data out.
SELECT
A.LoginID,
A.LogDatetime AS firstAttempt,
MAX(B.LogDatetime) AS lastAttempt,
COUNT(B.LoginID) + 1 AS attempts
FROM
UserLoginHistory A
JOIN UserLoginHistory B ON A.LoginID = B.LoginID
WHERE
A.SuccessfulFlag = 0
AND B.SuccessfulFlag = 0
AND A.LogDatetime < B.LogDatetime
AND B.LogDatetime <= DATEADD(minute, 30, A.LogDatetime)
GROUP BY
A.LoginID, A.LogDatetime
ORDER BY
A.LoginID, A.LogDatetime
This returns results that look something like this:
Row  LoginID  firstAttempt      lastAttempt       attempts
1    1        2022-05-01 00:00  2022-05-01 00:29  6
2    1        2022-05-01 00:06  2022-05-01 00:33  6
3    1        2022-05-01 00:13  2022-05-01 00:39  6
4    1        2022-05-01 00:15  2022-05-01 00:45  6
5    1        2022-05-01 00:20  2022-05-01 00:50  6
6    1        2022-05-01 00:29  2022-05-01 00:55  6
7    1        2022-05-01 00:33  2022-05-01 01:01  6
8    1        2022-05-01 00:39  2022-05-01 01:04  6
...
However, you can see that the rows overlap a lot. For example, row 1 shows attempts from 00:00 to 00:29, which overlaps with row 2 showing attempts from 00:06 to 00:33. Row 2 ought to be like row 7 (00:33 - 01:01), since that row's firstAttempt is the next one after row 1's lastAttempt.
You might need to use recursive CTEs, or insert your data into a temp table and loop over it with updates to remove the overlaps.
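The de-overlapping described above amounts to a greedy single pass over the sorted timestamps: open a 30-minute window at the first attempt, swallow every attempt inside it, then start the next window at the first attempt after it. A Python sketch with made-up timestamps matching the sample data:

```python
from datetime import datetime, timedelta

# Failed-attempt timestamps for one LoginID (made-up sample data
# mirroring the question's first rows).
attempts = sorted(datetime(2022, 5, 1, 0, m) for m in (0, 6, 13, 15, 20, 29, 33, 39))

# Greedy pass: a new session starts only at the first attempt that falls
# outside the current 30-minute window, so sessions never overlap.
sessions = []
window_start = None
for t in attempts:
    if window_start is None or t > window_start + timedelta(minutes=30):
        window_start = t
        sessions.append({"first": t, "last": t, "attempts": 1})
    else:
        sessions[-1]["last"] = t
        sessions[-1]["attempts"] += 1

for s in sessions:
    print(s["first"], s["last"], s["attempts"])
```

On this data the second session starts at 00:33, matching the expectation that the row after (00:00 - 00:29) should be the 00:33 one.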
Do you need to have set starting times? As a quick workaround you could round the DATETIME down to 30-minute intervals. That ensures the sessions don't overlap, but it only groups the attempts into fixed 30-minute buckets.
SELECT
A.LoginID,
DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01') AS LoginInterval,
MIN(A.LogDatetime) AS firstAttempt,
MAX(A.LogDatetime) AS lastAttempt,
COUNT(*) AS attempts
FROM
UserLoginHistory A
WHERE
A.SuccessfulFlag = 0
GROUP BY
A.LoginID, DATEADD(MINUTE, (DATEDIFF(MINUTE, '2022-01-01', A.LogDatetime) / 30) * 30, '2022-01-01')
ORDER BY
A.LoginID, LoginInterval
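Note that rounding *down* calls for a floor rather than T-SQL's ROUND (which rounds to the nearest value); with DATEDIFF returning whole minutes, integer division by 30 does the flooring. A small Python sketch of the same floor-to-30-minutes bucketing, using a made-up anchor date, to check the bucket edges:

```python
from datetime import datetime, timedelta

EPOCH = datetime(2022, 1, 1)  # arbitrary anchor, like '2022-01-01' in the query

def bucket_30min(ts):
    """Floor a timestamp to its 30-minute interval: whole minutes since the
    anchor, integer-divided by 30, scaled back up and re-added."""
    minutes = int((ts - EPOCH).total_seconds() // 60)
    return EPOCH + timedelta(minutes=(minutes // 30) * 30)

print(bucket_30min(datetime(2022, 5, 1, 0, 29)))  # lands in the 00:00 bucket
print(bucket_30min(datetime(2022, 5, 1, 0, 33)))  # lands in the 00:30 bucket
```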

Oracle SQL - count number of active/open tickets per hour by day

I have a dataset from oracle db that looks something like this:
ticket_num start_date repair_date
1 1/1/2021 02:05:15 1/4/2021 09:30:00
2 1/2/2021 12:15:45 1/2/2021 14:03:00
3 1/2/2021 12:20:00 1/2/2021 13:54:00
I need to calculate the number of active tickets in an hour time slot. So if the ticket was opened before that hour, and closed after the hour it would be counted. All days and hours need to be represented regardless if there are active tickets open during that time. The expected output is:
month day hour #active_tix
1 1 2 1
1 1 3 1
...
1 2 12 3
1 2 13 3
1 2 14 2
1 2 15 1
...
1 4 9 1
1 4 10 0
Any help would be greatly appreciated.
You need a calendar table. In the query below, it is created on the fly:
select c.hstart, count(t.ticket_num) n
from (
-- create calendar on the fly
select timestamp '2021-01-01 00:00:00' + NUMTODSINTERVAL(level-1, 'hour') hstart
from dual
connect by timestamp '2021-01-01 00:00:00' + NUMTODSINTERVAL(level-1, 'hour') < timestamp '2022-01-01 00:00:00'
) c
left join mytable t on t.start_date < c.hstart and t.repair_date >= c.hstart
group by c.hstart
order by c.hstart
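The same calendar-plus-join logic can be sketched in plain Python (the `active_per_hour` helper is a made-up name; the tickets are the question's sample data): a ticket is active in an hour slot if it opened before the slot start and was repaired at or after it, which is the join predicate above.

```python
from datetime import datetime, timedelta

# The question's tickets: (start_date, repair_date).
tickets = [
    (datetime(2021, 1, 1, 2, 5, 15), datetime(2021, 1, 4, 9, 30)),
    (datetime(2021, 1, 2, 12, 15, 45), datetime(2021, 1, 2, 14, 3)),
    (datetime(2021, 1, 2, 12, 20), datetime(2021, 1, 2, 13, 54)),
]

def active_per_hour(tickets, first_hour, last_hour):
    """Walk every hour slot (the 'calendar'); count tickets that opened
    before the slot start and were repaired at or after it."""
    counts = {}
    h = first_hour
    while h <= last_hour:
        counts[h] = sum(1 for s, r in tickets if s < h and r >= h)
        h += timedelta(hours=1)
    return counts

counts = active_per_hour(tickets, datetime(2021, 1, 2, 12), datetime(2021, 1, 2, 15))
for h, n in counts.items():
    print(h, n)
```

Hours with no open tickets still appear with a count of 0, mirroring the LEFT JOIN against the generated calendar.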

Binning SQL data

Sorry if this has been asked before. Let's imagine I have table of temperature measurements inside a set of mechanical components:
ComponentID  Timestamp           Value
A            1st Jan 2020 00:00  20 C
A            1st Jan 2020 00:10  25 C
B            1st Jan 2018 00:00  19 C
...and so on. Size of the table is fairly big, i.e. I have thousands of components with 10-minute measurements over a couple of years. What I need is a tally of the temperatures for each component in each year into, say, 5-degree bins, so a table looking like this:
ComponentID  Year  [-20;-15)  [-15;-10)  [-10;-5)  ...
A            2018  5          20         300       ...
A            2019  0          41         150       ...
B            2018  60         10         1         ...
..so for each component in each year, I count the number of measurements where the temperature has been in the [-20,-15) range, the number of measurements in the [-15,-10) range, and so on. I have a query doing this, but it's awfully slow. Is there an 'optimal' way of doing this kind of aggregation?
I'd say you should first pre-process your data to make it simpler to aggregate, then aggregate it with another query like this (MySQL syntax):
SELECT cats.ComponentID, cats.Year,
SUM(tm5) `[-5;0)`,
SUM(t00) `[0;5)`,
SUM(tp5) `[5;10)`,
SUM(tp10) `[10;15)`,
SUM(tp15) `[15;20)`,
SUM(tp20) `[20;25)`,
SUM(tp25) `[25;30)`
FROM (
SELECT
ComponentID,
YEAR(`Timestamp`) `Year`,
(`Value` BETWEEN -5 AND -0.0001 ) tm5,
(`Value` BETWEEN 0 AND 4.9999 ) t00,
(`Value` BETWEEN 5 AND 9.9999 ) tp5,
(`Value` BETWEEN 10 AND 14.9999) tp10,
(`Value` BETWEEN 15 AND 19.9999) tp15,
(`Value` BETWEEN 20 AND 24.9999) tp20,
(`Value` BETWEEN 25 AND 29.9999) tp25
FROM
measurements
) cats
GROUP BY cats.ComponentID, cats.Year
ORDER BY cats.ComponentID, cats.Year
The inner query could be materialized into a temporary table if it's too much of a strain on memory.
I've ignored the fact that your temperatures are expressed as strings including the unit; of course you should convert them to numbers at some point, but that was not the point of the question.
Input (table named measurements):
id ComponentID Timestamp Value
------ ----------- ------------------- --------
3 B 2018-01-01 00:00:00 19
4 A 2019-03-05 05:10:00 16
5 A 2019-12-01 00:00:00 18
1 A 2020-01-01 00:00:00 20
2 A 2020-01-01 00:10:00 25
Result:
ComponentID Year [-5;0) [0;5) [5;10) [10;15) [15;20) [20;25) [25;30)
----------- ------ ------ ------ ------ ------- ------- ------- ---------
A 2019 0 0 0 0 2 0 0
A 2020 0 0 0 0 0 1 1
B 2018 0 0 0 0 1 0 0
I would suggest:
SELECT ComponentID, YEAR(`Timestamp`) as `Year`,
       SUM(Value >= -20 AND Value < -15) as `[-20;-15)`,
       SUM(Value >= -15 AND Value < -10) as `[-15;-10)`,
       SUM(Value >= -10 AND Value < -5) as `[-10;-5)`,
       SUM(Value >= -5 AND Value < 0) as `[-5;0)`,
       SUM(Value >= 0 AND Value < 5) as `[0;5)`,
       . . .
FROM measurements m
GROUP BY m.ComponentID, `Year`;
Note the use of inequalities to capture the exact ranges that you want.
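Both answers enumerate one range test per bin. Floor division by the bin width maps a temperature straight to its bin in a single expression, so one pass tallies everything without a predicate per bin; a Python sketch with made-up measurements:

```python
from collections import Counter
from datetime import datetime

# Made-up measurements: (component, timestamp, temperature in degrees C).
rows = [
    ("A", datetime(2020, 1, 1, 0, 0), 20.0),
    ("A", datetime(2020, 1, 1, 0, 10), 25.0),
    ("B", datetime(2018, 1, 1, 0, 0), 19.0),
    ("B", datetime(2018, 1, 1, 0, 10), -12.5),
]

# temp // 5 * 5 is the lower edge of the [lo, lo+5) bin, and Python's
# floor division handles negative temperatures correctly (-12.5 -> -15).
tally = Counter()
for comp, ts, temp in rows:
    lo = int(temp // 5) * 5
    tally[(comp, ts.year, lo)] += 1

for key, n in sorted(tally.items()):
    print(key, n)
```

This is the shape of the inner "pre-process" step; pivoting the (component, year, bin) counts into one column per bin is then a presentation concern.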

Calculate overlap time in seconds for groups in SQL

I have a bunch of timestamps grouped by ID and type in the sample data shown below.
I would like to find overlapped time between start_time and end_time columns in seconds for each group of ID and between each lead and follower combinations. I would like to show the overlap time only for the first record of each group which will always be the "lead" type.
For example, for ID 1, the follower's start and end times in row 3 overlap with the lead's in row 1 for 193 seconds (from 09:00:00 to 09:03:13). The follower's times in row 3 also overlap with the lead's in row 2 for 133 seconds (09:01:00 to 09:03:13). That's a total of 326 seconds (193 + 133).
I used the partition clause to rank rows by ID and type and order them by start_time as a start.
How do I get the overlap column?
row#  ID  type      start_time           end_time             rank  overlap
1     1   lead      2020-05-07 09:00:00  2020-05-07 09:03:34  1     326
2     1   lead      2020-05-07 09:01:00  2020-05-07 09:03:13  2
3     1   follower  2020-05-07 08:59:00  2020-05-07 09:03:13  1
4     2   lead      2020-05-07 11:23:00  2020-05-07 11:33:00  1     540
4     2   follower  2020-05-07 11:27:00  2020-05-07 11:32:00  1
5     3   lead      2020-05-07 14:45:00  2020-05-07 15:00:00  1     305
6     3   follower  2020-05-07 14:44:00  2020-05-07 14:44:45  1
7     3   follower  2020-05-07 14:50:00  2020-05-07 14:55:05  2
In your example, the times completely cover the total duration. If this is always true, you can use the following logic:
select id,
       sum(datediff(second, start_time, end_time)) -
       datediff(second, min(start_time), max(end_time)) as overlap
from t
group by id;
To add this as an additional column, then either use window functions or join in the result from the above query.
If the overall time has gaps, then the problem is quite a bit more complicated. I would suggest that you ask a new question and set up a db fiddle for the problem.
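The duration-minus-span formula is easy to sanity-check with the ID 1 intervals from the sample data; every second covered more than once is counted once per extra covering, and it reproduces the 326-second overlap from the question.

```python
from datetime import datetime

# Intervals for ID 1 from the question (lead rows 1-2 and follower row 3).
intervals = [
    (datetime(2020, 5, 7, 9, 0, 0), datetime(2020, 5, 7, 9, 3, 34)),
    (datetime(2020, 5, 7, 9, 1, 0), datetime(2020, 5, 7, 9, 3, 13)),
    (datetime(2020, 5, 7, 8, 59, 0), datetime(2020, 5, 7, 9, 3, 13)),
]

# Sum of individual durations minus the overall span; valid only when the
# intervals leave no gap in the span, as the answer notes.
total = sum((e - s).total_seconds() for s, e in intervals)
span = (max(e for _, e in intervals) - min(s for s, _ in intervals)).total_seconds()
overlap = total - span
print(overlap)  # 326.0
```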
I tried this a couple of ways and got it to work.
I first joined two tables with individual records for each type, 'lead' and 'follower', and used CASE statements to calculate the later of the lead and follower start times and the earlier of the lead and follower end times, storing the result in a temp table.
CASE
  WHEN lead_table.start_time > follower_table.start_time THEN lead_table.start_time
  ELSE follower_table.start_time
END as overlap_start_time,
CASE
  WHEN follower_table.end_time < lead_table.end_time THEN follower_table.end_time
  ELSE lead_table.end_time
END as overlap_end_time
Then I created an outer query over that temp table to find the difference in seconds between the overlap start and end times for each lead and follower combination:
select temp_table.id,
temp_table.overlap_start_time,
temp_table.overlap_end_time,
DATEDIFF_BIG(second,
temp_table.overlap_start_time,
temp_table.overlap_end_time) as overlap_time
FROM temp_table

MS SQL, how to get summary parameter by the time periods (hours) during time period (each work day)

MS SQL 2014.
At the plant 2 work shifts of 12 hours each. I need to create a statistics table, with the columns of time, work shift, bunker number and the weight of products in each bunker (kg).
For example:
DateTime Shift Bunker Weight
> 2018-02-25 12:43:50.9480000 1 1 123
> 2018-02-25 13:57:49.3300000 1 2 200
> 2018-02-25 15:21:15.2970000 1 2 100
> 2018-02-25 01:57:49.3300000 2 1 345
> 2018-02-25 02:21:15.2970000 2 1 55
> 2018-02-26 13:56:02.5570000 1 1 561
> 2018-02-26 14:57:49.3300000 1 2 254
> 2018-02-26 03:57:49.3300000 2 2 400
> 2018-02-26 05:57:49.3300000 2 2 200
How to make a query to output the total weight of products in each bunker for each working shift, for each day? Like this:
DateTime Shift Bunker Weight
> 2018-02-25 1 1 123
> 2018-02-25 1 2 300
> 2018-02-25 2 1 400
> 2018-02-26 1 1 561
> 2018-02-26 1 2 254
> 2018-02-26 2 2 600
This is more than my capabilities in SQL. Thanks.
select CONVERT(date, datetime), shift, bunker, sum(Weight) as Weight
from table1
group by CONVERT(date, datetime), shift, bunker
You need to do a GROUP BY on the date part of the DateTime column, along with Shift and Bunker.
The following query should give you the desired output.
SELECT CAST([DATETIME] AS DATE) AS [DateTime], [Shift],[Bunker] ,SUM([Weight]) AS [Weight]
FROM [TABLE_NAME]
GROUP BY CAST([DATETIME] AS DATE), [Shift], [Bunker]
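The same truncate-and-group idea, sketched with Python's sqlite3 (SQLite's date() plays the role of CAST(... AS DATE); table and column names here are made up):

```python
import sqlite3

# The question's first day of data: (datetime, shift, bunker, weight).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stats (dt TEXT, shift INTEGER, bunker INTEGER, weight INTEGER)")
con.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", [
    ("2018-02-25 12:43:50", 1, 1, 123),
    ("2018-02-25 13:57:49", 1, 2, 200),
    ("2018-02-25 15:21:15", 1, 2, 100),
    ("2018-02-25 01:57:49", 2, 1, 345),
    ("2018-02-25 02:21:15", 2, 1, 55),
])

# Truncate the timestamp to its date part and group on it together with
# shift and bunker, summing the weights per group.
rows = con.execute("""
    SELECT date(dt), shift, bunker, SUM(weight)
    FROM stats
    GROUP BY date(dt), shift, bunker
    ORDER BY date(dt), shift, bunker
""").fetchall()
for r in rows:
    print(r)
```

On this sample it collapses the five readings into the three per-day, per-shift, per-bunker totals from the desired output (123, 300, 400).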