I am new to BigQuery and am trying to do a rolling 90-day calculation over some sample data.
The sample data looks like this:
+-------------+------------+-----------------+--------------+-----------------------+-------------+
| incident_id | inc_start  | inc_description | element_name | uid                   | is_repeated |
+-------------+------------+-----------------+--------------+-----------------------+-------------+
| 1           | 1/5/2022   | server down     | vm-001       | vm-001_server_down    | No          |
| 2           | 1/5/2022   | server down     | vm-001       | vm-001_server_down    | No          |
| 3           | 2/5/2022   | firewall issue  | vm-002       | vm-002_firewall_issue | No          |
| 4           | 3/5/2022   | firewall issue  | vm-003       | vm-003_firewall_issue | No          |
| 5           | 1/6/2022   | server down     | vm-001       | vm-001_server_down    | Yes         |
| 6           | 1/6/2022   | server down     | vm-001       | vm-001_server_down    | Yes         |
| 7           | 2/6/2022   | server unreach  | vm-003       | vm-003_server_unreach | No          |
| 8           | 19/11/2022 | server down     | vm-001       | vm-001_server_down    | No          |
+-------------+------------+-----------------+--------------+-----------------------+-------------+
If the same inc_description and uid occur more than twice within 90 days, the is_repeated column should show "Yes".
What is the fastest way to achieve this in SQL?
Consider the query below: it first aggregates the incident days over the trailing 90 days for each uid,
then removes duplicate days and counts the unique incident days.
WITH incidents AS (
  SELECT *,
    ARRAY_AGG(start) OVER (
      PARTITION BY uid
      ORDER BY UNIX_DATE(PARSE_DATE('%e/%m/%Y', start))
      RANGE BETWEEN 89 PRECEDING AND CURRENT ROW
    ) AS inc_days
  FROM sample_table
)
SELECT * EXCEPT (inc_days),
  IF((SELECT COUNT(DISTINCT day) FROM UNNEST(inc_days) day) > 1, 'YES', 'NO') AS is_repeated
FROM incidents
ORDER BY incident_id;
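The query itself is BigQuery-specific, but the window logic (more than one distinct incident day per uid within the trailing 90 days) can be sanity-checked against the sample data with a small Python sketch; the column names are taken from the sample table above:

```python
from collections import defaultdict
from datetime import datetime

# (incident_id, inc_start, uid) from the sample table above
rows = [
    (1, "1/5/2022",   "vm-001_server_down"),
    (2, "1/5/2022",   "vm-001_server_down"),
    (3, "2/5/2022",   "vm-002_firewall_issue"),
    (4, "3/5/2022",   "vm-003_firewall_issue"),
    (5, "1/6/2022",   "vm-001_server_down"),
    (6, "1/6/2022",   "vm-001_server_down"),
    (7, "2/6/2022",   "vm-003_server_unreach"),
    (8, "19/11/2022", "vm-001_server_down"),
]

def is_repeated(rows):
    """'Yes' if the uid was seen on more than one distinct day in the trailing 90 days."""
    seen = defaultdict(list)  # uid -> incident dates observed so far
    out = {}
    for inc_id, d, uid in rows:
        day = datetime.strptime(d, "%d/%m/%Y").date()
        # mirrors RANGE BETWEEN 89 PRECEDING AND CURRENT ROW
        window = {x for x in seen[uid] if (day - x).days <= 89}
        window.add(day)
        out[inc_id] = "Yes" if len(window) > 1 else "No"
        seen[uid].append(day)
    return out

flags = is_repeated(rows)
```

Running this reproduces the expected is_repeated column, including "No" for incident 8, whose previous occurrences fall outside the 90-day window.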
I have read a lot of good answers about finding gaps, but I still can't figure out how to find gaps with a predefined minimum size.
In my case, gaps are entries with no name, ordered by HE.
I also need to find gaps starting at the beginning of the table, as in the example.
Can anyone help with a clean SQL statement that can easily be altered for a predefined minimum gap size?
Example with expected output:
+-----------+----+ +----------------+ +----------------+ +----------------+
| name | HE | | GAPS >= 1 | | GAPS >= 2 | | GAPS >= 3 |
+-----------+----+ +-----------+----+ +-----------+----+ +-----------+----+
| | 1 | | name | HE | | name | HE | | name | HE |
| JohnDoe01 | 2 | +-----------+----+ +-----------+----+ +-----------+----+
| JohnDoe02 | 3 | | | 1 | | | 4 | | | 12 |
| | 4 | | | 4 | | | 5 | | | 13 |
| | 5 | | | 5 | | | 9 | | | 14 |
| JohnDoe03 | 6 | | | 9 | | | 10 | +-----------+----+
| JohnDoe04 | 7 | | | 10 | | | 12 |
| JohnDoe05 | 8 | | | 12 | | | 13 |
| | 9 | | | 13 | | | 14 |
| | 10 | | | 14 | +-----------+----+
| JohnDoe06 | 11 | +-----------+----+
| | 12 |
| | 13 |
| | 14 |
| JohnDoe07 | 15 |
+-----------+----+
You can identify the gaps along with their starts and stops. To identify a gap, count the number of non-gap rows up to each row and aggregate:
select min(he), max(he), count(*) as size
from (select t.*, count(name) over (order by he) as grp
from t
) t
where name is null
group by grp;
You can then filter using having for gaps of a certain size, say 2:
having count(*) >= 2
for instance.
This summarizes the gaps, one gap per row. That actually seems more useful to me than a separate output row for each gap row.
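The grouping trick can be verified on the example table using SQLite, which supports the same window aggregates (a quick sketch, table and column names as in the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (name text, he int)")
con.executemany("insert into t values (?, ?)", [
    (None, 1), ("JohnDoe01", 2), ("JohnDoe02", 3), (None, 4), (None, 5),
    ("JohnDoe03", 6), ("JohnDoe04", 7), ("JohnDoe05", 8), (None, 9), (None, 10),
    ("JohnDoe06", 11), (None, 12), (None, 13), (None, 14), ("JohnDoe07", 15),
])

# one row per gap: (first HE, last HE, size), keeping only gaps of size >= 2
gaps = con.execute("""
    select min(he), max(he), count(*) as size
    from (select t.*, count(name) over (order by he) as grp
          from t) t
    where name is null
    group by grp
    having count(*) >= 2
    order by 1
""").fetchall()
```

For the example data this returns the gaps 4-5, 9-10, and 12-14, matching the "GAPS >= 2" column above; the single-row gap at HE 1 is filtered out by the HAVING clause.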
EDIT:
If you actually wanted the original rows, you could do:
select t.*
from (select t.*,
             max(he) filter (where name is not null) over (order by he) as prev_he,
             min(he) filter (where name is not null) over (order by he desc) as next_he,
             max(he) over () as max_he
      from t
     ) t
where name is null and
      (coalesce(next_he, max_he + 1) - coalesce(prev_he, 0) - 1) >= 2;
EDIT II:
In older versions of MySQL/MariaDB, you can use variables:
select min(he), max(he), count(*) as size
from (select t.*,
             (@grp := @grp + (name is not null)) as grp
      from (select t.* from t order by he) t cross join
           (select @grp := 0) params
     ) t
where name is null
group by grp;
I want to calculate, on a daily basis, the number of people who also had an occurrence the previous day, but I'm not sure how to do this.
Sample Table:
| ID | Date |
+----+-----------+
| 1 | 1/10/2020 |
| 1 | 1/11/2020 |
| 2 | 2/20/2020 |
| 3 | 2/20/2020 |
| 3 | 2/21/2020 |
| 4 | 2/23/2020 |
| 4 | 2/24/2020 |
| 5 | 2/22/2020 |
| 5 | 2/23/2020 |
| 5 | 2/24/2020 |
+----+-----------+
Desired Output:
| Date | Count |
+-----------+-------+
| 1/11/2020 | 1 |
| 2/21/2020 | 1 |
| 2/23/2020 | 1 |
| 2/24/2020 | 2 |
+-----------+-------+
Edit: Added desired output. The output count should be unique to the ID, not the number of date occurrences. i.e. an ID 5 can appear on this list 10 times for dates 2/23/2020 and 2/24/2020, but that would count as "1".
Use lag():
select date, count(distinct id)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
      from t
     ) t
where prev_date = dateadd(day, -1, date)
group by date;
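The dateadd() call above is SQL Server syntax; the same idea can be checked in SQLite, which spells date arithmetic as date(d, '-1 day'). A sketch with the sample data stored as ISO date strings (table and column names are placeholders):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id int, d text)")  # dates stored as ISO strings
con.executemany("insert into t values (?, ?)", [
    (1, "2020-01-10"), (1, "2020-01-11"), (2, "2020-02-20"), (3, "2020-02-20"),
    (3, "2020-02-21"), (4, "2020-02-23"), (4, "2020-02-24"), (5, "2020-02-22"),
    (5, "2020-02-23"), (5, "2020-02-24"),
])

# count distinct ids whose previous occurrence was exactly one day earlier
rows = con.execute("""
    select d, count(distinct id)
    from (select t.*, lag(d) over (partition by id order by d) as prev_d
          from t) t
    where prev_d = date(d, '-1 day')
    group by d
    order by d
""").fetchall()
```

This reproduces the desired output: one match each on 1/11, 2/21, and 2/23, and two (IDs 4 and 5) on 2/24.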
I am trying to sum up consecutive rows if they have the same id and status.
The database runs on Windows Server 2016 and is Microsoft SQL Server 2014.
I was thinking about using a self join, but that would only sum up two rows, or somehow using lead/lag.
Here is what the table looks like (Duration is the number of days between this row and the next row with the same id, sorted by mod_Date):
+-----+--------------+-------------------------+----------+
| ID | Status | mod_Date | Duration |
+-----+--------------+-------------------------+----------+
| 1 | In Inventory | 2015-04-10 09:11:37.000 | 12 |
| 1 | Deployed | 2015-04-22 10:13:35.000 | 354 |
| 1 | Deployed | 2016-04-10 09:11:37.000 | 30 |
| 1 | In Inventory | 2016-05-10 09:11:37.000 | Null |
| 2 | In Inventory | 2013-04-10 09:11:37.000 | 12 |
| ... | ... | ... | ... |
+-----+--------------+-------------------------+----------+
There can be several rows with the same status and id following each other, not just two.
And what I want to get is:
+-----+--------------+-------------------------+----------+
| ID | Status | mod_Date | Duration |
+-----+--------------+-------------------------+----------+
| 1 | In Inventory | 2015-04-10 09:11:37.000 | 12 |
| 1 | Deployed | 2015-04-22 10:13:35.000 | 384 |
| 1 | In Inventory | 2016-05-10 09:11:37.000 | Null |
| 2 | In Inventory | 2013-04-10 09:11:37.000 | 12 |
| ... | ... | ... | ... |
+-----+--------------+-------------------------+----------+
This is an example of gaps and islands. In this case, I think the difference of row numbers suffices:
select id, status, min(mod_date) as mod_date, sum(duration) as duration
from (select t.*,
             row_number() over (partition by id, status order by mod_date) as seqnum_is,
             row_number() over (partition by id order by mod_date) as seqnum_i
      from t
     ) t
group by id, status, seqnum_i - seqnum_is;
The trick here is that the difference of two increasing sequences identifies "islands" of where the values are the same. This is rather mysterious the first time you see it. But if you run the subquery, you'll probably quickly see how this works.
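You can see the islands form by running the query against the sample rows, for example in SQLite (a sketch; the table name t and column names follow the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id int, status text, mod_date text, duration int)")
con.executemany("insert into t values (?, ?, ?, ?)", [
    (1, "In Inventory", "2015-04-10 09:11:37", 12),
    (1, "Deployed",     "2015-04-22 10:13:35", 354),
    (1, "Deployed",     "2016-04-10 09:11:37", 30),
    (1, "In Inventory", "2016-05-10 09:11:37", None),
    (2, "In Inventory", "2013-04-10 09:11:37", 12),
])

# the difference of the two row_number sequences is constant within each
# island of consecutive rows sharing the same id and status
rows = con.execute("""
    select id, status, min(mod_date) as mod_date, sum(duration) as duration
    from (select t.*,
                 row_number() over (partition by id, status order by mod_date) as seqnum_is,
                 row_number() over (partition by id order by mod_date) as seqnum_i
          from t) t
    group by id, status, seqnum_i - seqnum_is
    order by id, mod_date
""").fetchall()
```

The two consecutive "Deployed" rows collapse into one row with duration 354 + 30 = 384, exactly as in the expected output.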
For example, I have a table like this:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 4 | 4 | A |
| 5 | 5 | A |
| 6 | 3 | A |
| 7 | 4 | U |
| 8 | 5 | A |
| 9 | 6 | A |
| 10 | 7 | A |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
The security column is 'A' for Authorized and 'U' for Unauthorized. I need to exclude the records that fall under an Unauthorized record, based on their level.
(In the original screenshot, arrows point at the Unauthorized records; the records nested under them should be excluded.)
So the SQL result should be the following table:
+---------+-------+----------+
| sort_id | level | security |
+---------+-------+----------+
| 1 | 1 | A |
| 2 | 2 | A |
| 3 | 3 | U |
| 6 | 3 | A |
| 7 | 4 | U |
| 11 | 3 | A |
| 12 | 3 | A |
+---------+-------+----------+
How can we produce this using a simple SELECT statement? Thanks in advance! Just comment if something is unclear.
If I understand "under the unauthorized records" as meaning the run of records with increasing levels that immediately follows an unauthorized record, then here is an approach:
select sort_id, level, security
from (select t.*,
             min(case when security = 'U' then sort_id end) over (partition by grp) as min_u_id
      from (select t.*,
                   (row_number() over (order by sort_id) - level) as grp
            from t
           ) t
     ) t
where min_u_id is null or sort_id <= min_u_id;
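The grouping can be checked against the sample table in SQLite (a sketch; the keep/drop condition retains each group's rows up to and including the first Unauthorized record):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (sort_id int, level int, security text)")
con.executemany("insert into t values (?, ?, ?)", [
    (1, 1, "A"), (2, 2, "A"), (3, 3, "U"), (4, 4, "A"), (5, 5, "A"),
    (6, 3, "A"), (7, 4, "U"), (8, 5, "A"), (9, 6, "A"), (10, 7, "A"),
    (11, 3, "A"), (12, 3, "A"),
])

# rows whose level increases by 1 each step share a grp value; within each
# grp, drop everything after the first Unauthorized row
kept = con.execute("""
    select sort_id
    from (select t.*,
                 min(case when security = 'U' then sort_id end)
                     over (partition by grp) as min_u_id
          from (select t.*,
                       row_number() over (order by sort_id) - level as grp
                from t) t) t
    where min_u_id is null or sort_id <= min_u_id
    order by sort_id
""").fetchall()
```

The surviving sort_ids are 1, 2, 3, 6, 7, 11, and 12, matching the expected result table above.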
I am experiencing an issue with Oracle analytic functions.
I want the rank in Oracle to be displayed sequentially but in a cyclic fashion, and the ranking should happen within a group.
Say I have 10 groups.
Within each group, rows are ranked 1 through 9; past 9, the rank must start again from 1 and continue for however many rows remain.
emp_id   date1        date2        rank
123      13/6/2012    13/8/2021    1
123      14/2/2012    12/8/2014    2
...
123      9/10/2013    12/12/2015   9
123      16/10/2013   15/10/2013   1
123      16/3/2014    15/9/2015    2
In the example above, the rows for empid 123 are split into two subgroups: the rank runs sequentially from 1 to 9 for the first subgroup, and for the remaining rows it starts again from 1. How can this be achieved with Oracle's rank functions?
As per the suggestion from Egor Skriptunoff above:
select
    empid, date1, date2,
    row_number() over (order by date1, date2) as rn,
    mod(row_number() over (order by date1, date2) - 1, 9) + 1 as ranked
from yourtable
Example result:
| empid | date1 | date2 | rn | ranked |
|-------|----------------------|----------------------|----|--------|
| 72232 | 2016-10-26T00:00:00Z | 2017-03-07T00:00:00Z | 1 | 1 |
| 04365 | 2016-11-03T00:00:00Z | 2017-07-29T00:00:00Z | 2 | 2 |
| 79203 | 2016-12-15T00:00:00Z | 2017-05-16T00:00:00Z | 3 | 3 |
| 68638 | 2016-12-18T00:00:00Z | 2017-02-08T00:00:00Z | 4 | 4 |
| 75784 | 2016-12-24T00:00:00Z | 2017-11-18T00:00:00Z | 5 | 5 |
| 72836 | 2016-12-24T00:00:00Z | 2018-09-10T00:00:00Z | 6 | 6 |
| 03679 | 2017-01-24T00:00:00Z | 2017-10-14T00:00:00Z | 7 | 7 |
| 43527 | 2017-02-12T00:00:00Z | 2017-01-15T00:00:00Z | 8 | 8 |
| 03138 | 2017-02-26T00:00:00Z | 2017-01-30T00:00:00Z | 9 | 9 |
| 89758 | 2017-03-29T00:00:00Z | 2018-04-12T00:00:00Z | 10 | 1 |
| 86377 | 2017-04-14T00:00:00Z | 2018-10-07T00:00:00Z | 11 | 2 |
| 49169 | 2017-04-28T00:00:00Z | 2017-04-21T00:00:00Z | 12 | 3 |
| 45523 | 2017-05-03T00:00:00Z | 2017-05-07T00:00:00Z | 13 | 4 |
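The cyclic part is just modular arithmetic on the row number; a quick check in SQLite (which uses the % operator in place of Oracle's mod()), with a generated sequence standing in for the ordered rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 13 generated rows stand in for the rows ordered by (date1, date2)
rows = con.execute("""
    with recursive seq(n) as (select 1 union all select n + 1 from seq where n < 13)
    select n as rn, (n - 1) % 9 + 1 as ranked
    from seq
""").fetchall()
```

The ranked column runs 1..9 and then wraps back to 1 at row 10, matching the example result above.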