Give IDs to variation on gaps and islands problem

This dataset contains one ordered timestamp column (A) and a pair of marker columns (B and C) that flag the start and end of a 'block'. What I'm looking to produce is column D.
I've had a hard time explaining this problem to colleagues, but essentially I need a way of giving an ID to these blocks, which vary in row count. Note that a block can sometimes occupy only one row, as on row 8.
| A | B | C | D |
-----------------------------------------
| 06/10/2018 13:17:40 | 1 | 0 | 1 |
| 06/10/2018 13:17:56 | 0 | 0 | 1 |
| 06/10/2018 13:18:08 | 0 | 1 | 1 |
| 06/10/2018 13:18:21 | 1 | 0 | 2 |
| 06/10/2018 13:18:26 | 0 | 0 | 2 |
| 06/10/2018 13:18:26 | 0 | 0 | 2 |
| 06/10/2018 13:18:28 | 0 | 1 | 2 |
| 06/10/2018 13:18:28 | 1 | 1 | 3 |
| 06/10/2018 13:18:31 | 1 | 0 | 4 |
| 06/10/2018 19:49:26 | 0 | 0 | 4 |
| 06/10/2018 19:50:24 | 0 | 1 | 4 |

You can use the LAG window function in a subquery to fetch the previous row's C, then use a SUM window function over a conditional expression to count how many blocks have already ended.
SELECT A, B, C,
       SUM(CASE WHEN preC = 1 THEN 1 ELSE 0 END) OVER (ORDER BY A, preC) + 1 AS D
FROM (
    SELECT *,
           LAG(C, 1, C) OVER (ORDER BY A) AS preC
    FROM T
) t1
Result
| A | B | C | D |
-----------------------------------------
| 06/10/2018 13:17:40 | 1 | 0 | 1 |
| 06/10/2018 13:17:56 | 0 | 0 | 1 |
| 06/10/2018 13:18:08 | 0 | 1 | 1 |
| 06/10/2018 13:18:21 | 1 | 0 | 2 |
| 06/10/2018 13:18:26 | 0 | 0 | 2 |
| 06/10/2018 13:18:26 | 0 | 0 | 2 |
| 06/10/2018 13:18:28 | 0 | 1 | 2 |
| 06/10/2018 13:18:28 | 1 | 1 | 3 |
| 06/10/2018 13:18:31 | 1 | 0 | 4 |
| 06/10/2018 19:49:26 | 0 | 0 | 4 |
| 06/10/2018 19:50:24 | 0 | 1 | 4 |
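
The LAG-then-SUM idea above can be tried out with a minimal sketch in Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). Two things here are assumptions that are not in the original answer: a hypothetical `seq` column that fixes the order of rows with equal timestamps, and a LAG default of 0 rather than C, which gives the same result for this data because the first row has C = 0.

```python
import sqlite3

# Rows from the question. `seq` is a hypothetical tiebreaker column (not in the
# original table) that makes the order of equal timestamps explicit.
ROWS = [
    (1, "06/10/2018 13:17:40", 1, 0),
    (2, "06/10/2018 13:17:56", 0, 0),
    (3, "06/10/2018 13:18:08", 0, 1),
    (4, "06/10/2018 13:18:21", 1, 0),
    (5, "06/10/2018 13:18:26", 0, 0),
    (6, "06/10/2018 13:18:26", 0, 0),
    (7, "06/10/2018 13:18:28", 0, 1),
    (8, "06/10/2018 13:18:28", 1, 1),
    (9, "06/10/2018 13:18:31", 1, 0),
    (10, "06/10/2018 19:49:26", 0, 0),
    (11, "06/10/2018 19:50:24", 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Inner query: previous row's end marker. Outer query: count the blocks that
# ended before the current row, plus 1.
QUERY = """
SELECT seq,
       1 + SUM(CASE WHEN prev_c = 1 THEN 1 ELSE 0 END)
               OVER (ORDER BY seq
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS d
FROM (
    SELECT seq, LAG(c, 1, 0) OVER (ORDER BY seq) AS prev_c
    FROM t
) AS t1
ORDER BY seq
"""

block_ids = [d for _, d in conn.execute(QUERY)]
print(block_ids)
```

Ordering by an unambiguous key matters here because two pairs of rows share a timestamp, and LAG over A alone leaves their relative order up to the engine.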

I don't see what C has to do with the problem. This is just a cumulative sum on B:
select a, b, c,
sum(b) over (order by a) as d
from t;
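
One caveat with the cumulative sum on B: the sample data has rows with identical timestamps, and with only ORDER BY a the default window frame treats tied rows as peers, so they all get the same running total. The sketch below, using Python's sqlite3 module (SQLite 3.25 or later), compares the plain query with a variant that adds a hypothetical `seq` tiebreaker column, which is an assumption and not part of the original table. The timestamp strings compare correctly as text only because every row shares the same date.

```python
import sqlite3

# Rows from the question; `seq` is a hypothetical tiebreaker column.
ROWS = [
    (1, "06/10/2018 13:17:40", 1, 0),
    (2, "06/10/2018 13:17:56", 0, 0),
    (3, "06/10/2018 13:18:08", 0, 1),
    (4, "06/10/2018 13:18:21", 1, 0),
    (5, "06/10/2018 13:18:26", 0, 0),
    (6, "06/10/2018 13:18:26", 0, 0),
    (7, "06/10/2018 13:18:28", 0, 1),
    (8, "06/10/2018 13:18:28", 1, 1),
    (9, "06/10/2018 13:18:31", 1, 0),
    (10, "06/10/2018 19:49:26", 0, 0),
    (11, "06/10/2018 19:50:24", 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Plain version: rows with equal `a` are peers and share one running total.
naive = [d for (d,) in conn.execute(
    "SELECT SUM(b) OVER (ORDER BY a) AS d FROM t ORDER BY seq"
)]

# Tiebreaker version: a unique sort key plus an explicit ROWS frame.
stable = [d for (d,) in conn.execute(
    """
    SELECT SUM(b) OVER (ORDER BY a, seq
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS d
    FROM t
    ORDER BY seq
    """
)]

print(naive)   # the 7th row is pulled into the block that starts on the 8th row
print(stable)  # matches column D from the question
```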

Related

How to assign duplicate increment in SQL?

While going through the rows in order, each time the text "NEW" appears in the Calc column, the count in the Results column should increase by one. The count starts at 1.
It should look like this on the output:

The following uses an id column to resolve the order issue. Replace that with your corresponding expression. This also addresses the requirement to start the display sequence with 1 and also show 0 for the 'NEW' rows.
The SQL (updated):
SELECT logs.*
     , CASE WHEN text = 'NEW' THEN 0
            ELSE COALESCE(SUM(CASE WHEN text = 'NEW' THEN 1 END)
                          OVER (PARTITION BY xrank ORDER BY id) + 1, 1)
       END AS display
FROM logs
ORDER BY id
The result:
+----+-------+------+---------+
| id | xrank | text | display |
+----+-------+------+---------+
| 1 | 1 | A | 1 |
| 2 | 1 | B | 1 |
| 3 | 1 | C | 1 |
| 4 | 1 | NEW | 0 |
| 5 | 1 | D | 2 |
| 6 | 1 | Q | 2 |
| 7 | 1 | B | 2 |
| 8 | 1 | NEW | 0 |
| 9 | 1 | D | 3 |
| 10 | 1 | Z | 3 |
| 11 | 2 | A | 1 |
| 12 | 2 | B | 1 |
| 13 | 2 | C | 1 |
| 14 | 2 | NEW | 0 |
| 15 | 2 | D | 2 |
| 16 | 2 | Q | 2 |
| 17 | 2 | B | 2 |
| 18 | 2 | NEW | 0 |
| 19 | 2 | D | 3 |
| 20 | 2 | Z | 3 |
+----+-------+------+---------+

You need a column that specifies the ordering for the table. With that, just use a cumulative sum:
select t.*,
1 + sum(case when Calc = 'NEW' then 1 else 0 end) over (partition by Rank_Id order by Seq) as display
from t;

dense_rank over boolean column

Good day. I have a table derived from an event log, and I am running a Redshift database. I split the events into session starts (bool = 1) and session continuations (bool = 0), like this:
=======================
| ID | BOOL |
=======================
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 1 |
| 12 | 0 |
| 13 | 0 |
| 14 | 1 |
| 15 | 0 |
| 16 | 0 |
=======================
I need to create a session_id column with something like dense_rank:
================================
| ID | BOOL | D_RANK |
================================
| 1 | 0 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 2 |
| 4 | 0 | 2 |
| 5 | 0 | 2 |
| 6 | 0 | 2 |
| 7 | 0 | 2 |
| 8 | 0 | 2 |
| 9 | 0 | 2 |
| 10 | 0 | 2 |
| 11 | 1 | 3 |
| 12 | 0 | 3 |
| 13 | 0 | 3 |
| 14 | 1 | 4 |
| 15 | 0 | 4 |
| 16 | 0 | 4 |
================================
Is there any option to do this? Would appreciate any help.

Use a cumulative sum. Assuming that bool is the start of a new session:
select t.*,
sum(bool) over (order by id) as session_id
from t;
Note: With this data the result starts at 0, because the first row has bool = 0. Add 1 if you need the IDs to start at 1.
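
To check the running sum against the desired D_RANK column, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later) rather than Redshift. The table name `events` and the column name `is_start` (standing in for BOOL) are assumptions made for illustration. The frame clause is written out explicitly; as far as I know, Redshift requires one whenever an aggregate window function has an ORDER BY.

```python
import sqlite3

# IDs 2, 11 and 14 are the session starts in the question's table.
START_IDS = {2, 11, 14}
rows = [(i, 1 if i in START_IDS else 0) for i in range(1, 17)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, is_start INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Running count of session starts, shifted by 1 so the first ID is 1.
QUERY = """
SELECT id,
       1 + SUM(is_start) OVER (ORDER BY id
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS session_id
FROM events
ORDER BY id
"""

session_ids = [sid for _, sid in conn.execute(QUERY)]
print(session_ids)
```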

SQL - Partition restarted based on a column value

I need to create a new column that increments at every 0 value of the column Repeated Call, restarting for each Customer_ID:
+-------------+---------+----------------------+---------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call |
+-------------+---------+----------------------+---------------+
| 1 | 1 | Null | 0 |
| 1 | 2 | 45 | 0 |
| 1 | 3 | 0 | 1 |
| 1 | 4 | 0 | 1 |
| 1 | 5 | 0 | 1 |
| 1 | 6 | 48 | 0 |
| 1 | 7 | 1 | 1 |
| 2 | 8 | Null | 0 |
| 2 | 9 | 1 | 1 |
+-------------+---------+----------------------+---------------+
Into something like this:
+-------------+---------+----------------------+---------------+-------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call | Order_Group |
+-------------+---------+----------------------+---------------+-------------+
| 1 | 1 | Null | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | Null | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |
+-------------+---------+----------------------+---------------+-------------+
Appreciate your suggestion, thanks!

You can use the SUM() window function:
select t.*,
       sum(case when Repeated_Call = 0 then 1 else 0 end)
           over (partition by Customer_ID order by Call_ID) as Order_Group
from tablename t
See the demo (for MySQL, but it is standard SQL).
Results:
| Customer_ID | Call_ID | Days Since Last Call | Repeated_Call | Order_Group |
| ----------- | ------- | -------------------- | ------------- | ----------- |
| 1 | 1 | | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |

You can count every 0 value in the column Repeated_Call (for each customer) using the analytic window function COUNT with ROWS UNBOUNDED PRECEDING:
SELECT t.*,
       COUNT(CASE WHEN Repeated_Call = 0 THEN 1 ELSE NULL END)
           OVER (PARTITION BY Customer_ID
                 ORDER BY Call_ID ROWS UNBOUNDED PRECEDING) AS Order_Gr
FROM tablename t
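
Both answers boil down to a per-customer running count of the rows where Repeated_Call is 0. A minimal sketch with Python's sqlite3 module (SQLite 3.25 or later) reproduces the expected Order_Group column. The table name `calls` and the lower-case column names are assumptions made for illustration.

```python
import sqlite3

# (customer_id, call_id, days_since_last_call, repeated_call) from the question.
ROWS = [
    (1, 1, None, 0),
    (1, 2, 45, 0),
    (1, 3, 0, 1),
    (1, 4, 0, 1),
    (1, 5, 0, 1),
    (1, 6, 48, 0),
    (1, 7, 1, 1),
    (2, 8, None, 0),
    (2, 9, 1, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (customer_id INTEGER, call_id INTEGER, "
    "days_since_last_call INTEGER, repeated_call INTEGER)"
)
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", ROWS)

# COUNT ignores NULLs, so only rows with repeated_call = 0 advance the counter.
# PARTITION BY makes the counter restart for each customer.
QUERY = """
SELECT customer_id, call_id,
       COUNT(CASE WHEN repeated_call = 0 THEN 1 END)
           OVER (PARTITION BY customer_id
                 ORDER BY call_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS order_group
FROM calls
ORDER BY customer_id, call_id
"""

order_groups = [g for _, _, g in conn.execute(QUERY)]
print(order_groups)
```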

How do I conditionally increase the value of the proceeding row number by 1

I need each row's value to be the preceding row's value plus 1. When a row meets another condition, I need to reset the counter. This is probably easiest to explain with an example:
+---------+------------+------------+-----------+----------------+
| Acct_ID | Ins_Date | Acct_RowID | indicator | Desired_Output |
+---------+------------+------------+-----------+----------------+
| 5841 | 07/11/2019 | 1 | 1 | 1 |
| 5841 | 08/11/2019 | 2 | 0 | 2 |
| 5841 | 09/11/2019 | 3 | 0 | 3 |
| 5841 | 10/11/2019 | 4 | 0 | 4 |
| 5841 | 11/11/2019 | 5 | 1 | 1 |
| 5841 | 12/11/2019 | 6 | 0 | 2 |
| 5841 | 13/11/2019 | 7 | 1 | 1 |
| 5841 | 14/11/2019 | 8 | 0 | 2 |
| 5841 | 15/11/2019 | 9 | 0 | 3 |
| 5841 | 16/11/2019 | 10 | 0 | 4 |
| 5841 | 17/11/2019 | 11 | 0 | 5 |
| 5841 | 18/11/2019 | 12 | 0 | 6 |
| 5132 | 11/03/2019 | 1 | 1 | 1 |
| 5132 | 12/03/2019 | 2 | 0 | 2 |
| 5132 | 13/03/2019 | 3 | 0 | 3 |
| 5132 | 14/03/2019 | 4 | 1 | 1 |
| 5132 | 15/03/2019 | 5 | 0 | 2 |
| 5132 | 16/03/2019 | 6 | 0 | 3 |
| 5132 | 17/03/2019 | 7 | 0 | 4 |
| 5132 | 18/03/2019 | 8 | 0 | 5 |
| 5132 | 19/03/2019 | 9 | 1 | 1 |
| 5132 | 20/03/2019 | 10 | 0 | 2 |
+---------+------------+------------+-----------+----------------+
The column I want to create is 'Desired_Output', and it is driven by the column 'indicator'. Each row should be the previous row's value plus 1, unless its indicator is 1, in which case the counter resets to 1.
I have tried to use a loop method of some sort but this did not produce the desired results.
Is this possible in some way?

The trick is to identify each group of consecutive rows that starts at an indicator of 1 and runs until the next 1. This is achieved with CROSS APPLY, which finds the latest Acct_RowID with indicator = 1 at or before the current row; that value becomes Grp_RowID, which is then used in the PARTITION BY of the row_number() window function:
select *,
       Desired_Output = row_number() over (partition by t.Acct_ID, Grp_RowID
                                           order by Acct_RowID)
from your_table t
cross apply
(
    select Grp_RowID = max(Acct_RowID)
    from your_table x
    where x.Acct_ID = t.Acct_ID
      and x.Acct_RowID <= t.Acct_RowID
      and x.indicator = 1
) g
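
For engines without CROSS APPLY, the same grouping can be sketched with a running sum of indicator as the group key. Note that this is a swapped-in technique and not the query from the answer above. The sketch uses Python's sqlite3 module (SQLite 3.25 or later); the table name `accounts` and the lower-case column names are assumptions, and the Ins_Date column is left out because it plays no part in the calculation.

```python
import sqlite3

# indicator values per account, in Acct_RowID order, taken from the question.
INDICATORS = {
    5841: [1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    5132: [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
}
rows = [
    (acct, row_id, flag)
    for acct, flags in INDICATORS.items()
    for row_id, flag in enumerate(flags, start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (acct_id INTEGER, acct_rowid INTEGER, indicator INTEGER)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)

# Step 1: the running sum of indicator gives each run of rows a group number.
# Step 2: ROW_NUMBER restarts at 1 inside every (account, group) pair.
QUERY = """
SELECT acct_id, acct_rowid,
       ROW_NUMBER() OVER (PARTITION BY acct_id, grp
                          ORDER BY acct_rowid) AS desired_output
FROM (
    SELECT acct_id, acct_rowid,
           SUM(indicator) OVER (PARTITION BY acct_id
                                ORDER BY acct_rowid
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS grp
    FROM accounts
) AS g
ORDER BY acct_id, acct_rowid
"""

counters = {}
for acct_id, _, value in conn.execute(QUERY):
    counters.setdefault(acct_id, []).append(value)
print(counters)
```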

SQL how to force to display row with 0 if no data available?

My query returns results like the following (a row is skipped if an HourOfDay has no data for a particular ID):
ID HourOfDay Counts
--------------------------
1 5 5
1 13 10
1 23 3
..........................HourOfDay up till 23
2 9 1
and so on.
What I am trying to achieve is to force rows with 0 to be shown for every HourOfDay that has no data, like the following:
ID HourOfDay Counts
--------------------------
1 0 0
1 1 0
1 2 0
1......................
1 5 5
1 6 0
1......................
1 23 3
2 0 0
2 1 0
etc.
I have researched this. It looks like I can achieve this result if I create an extra table and outer join it. So I have created a table variable in the SP (as a temporary workaround):
DECLARE @Hours TABLE
(
    [Hour] INT NULL
);
INSERT INTO @Hours VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
,(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
However, no matter how I join it, it does not achieve the desired result.
How do I proceed? Do I add extra columns to join on? Completely different approach? Any hint in the right direction is appreciated!

Use a derived table of the distinct Ids, cross joined to @Hours and then left joined to your table:
select
    i.Id
  , h.Hour
  , coalesce(t.Counts, 0) as Counts
from (select distinct Id from t) as i
cross join @Hours as h
left join t
  on i.Id = t.Id
 and h.Hour = t.HourOfDay
rextester demo: http://rextester.com/XFZYX88502
returns:
+----+------+--------+
| Id | Hour | Counts |
+----+------+--------+
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 5 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 1 | 9 | 0 |
| 1 | 10 | 0 |
| 1 | 11 | 0 |
| 1 | 12 | 0 |
| 1 | 13 | 10 |
| 1 | 14 | 0 |
| 1 | 15 | 0 |
| 1 | 16 | 0 |
| 1 | 17 | 0 |
| 1 | 18 | 0 |
| 1 | 19 | 0 |
| 1 | 20 | 0 |
| 1 | 21 | 0 |
| 1 | 22 | 0 |
| 1 | 23 | 3 |
| 2 | 0 | 0 |
| 2 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 0 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
| 2 | 9 | 1 |
| 2 | 10 | 0 |
| 2 | 11 | 0 |
| 2 | 12 | 0 |
| 2 | 13 | 0 |
| 2 | 14 | 0 |
| 2 | 15 | 0 |
| 2 | 16 | 0 |
| 2 | 17 | 0 |
| 2 | 18 | 0 |
| 2 | 19 | 0 |
| 2 | 20 | 0 |
| 2 | 21 | 0 |
| 2 | 22 | 0 |
| 2 | 23 | 0 |
+----+------+--------+
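
The same cross join plus left join pattern can be tried outside SQL Server with a minimal sketch in Python's sqlite3 module. The assumptions here are the lower-case column names and a recursive CTE that generates hours 0 to 23 in place of the hours table. The sample rows are the four non-zero rows visible in the result above.

```python
import sqlite3

# (id, hour_of_day, counts): only hours that actually have data are stored.
ROWS = [(1, 5, 5), (1, 13, 10), (1, 23, 3), (2, 9, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, hour_of_day INTEGER, counts INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Build every (id, hour) pair first, then left join the real data onto it.
# COALESCE turns the NULLs from unmatched pairs into 0.
QUERY = """
WITH RECURSIVE hours(hour) AS (
    SELECT 0
    UNION ALL
    SELECT hour + 1 FROM hours WHERE hour < 23
)
SELECT i.id, h.hour, COALESCE(t.counts, 0) AS counts
FROM (SELECT DISTINCT id FROM t) AS i
CROSS JOIN hours AS h
LEFT JOIN t
       ON t.id = i.id
      AND t.hour_of_day = h.hour
ORDER BY i.id, h.hour
"""

filled = conn.execute(QUERY).fetchall()
print(len(filled))
print([row for row in filled if row[2] != 0])
```

The key design point is that the join conditions on both id and hour go in the ON clause of the left join. Putting a filter on the data table in a WHERE clause would drop the zero rows again.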
