Sequential numbering of data sequence - sql

I do have the following table:
create table test_seq (id int, obs int);
insert into test_seq values (1,1);
insert into test_seq values (2,1);
insert into test_seq values (3,1);
insert into test_seq values (4,0);
insert into test_seq values (5,0);
insert into test_seq values (6,1);
insert into test_seq values (7,1);
insert into test_seq values (8,0);
insert into test_seq values (9,0);
insert into test_seq values (10,1);
insert into test_seq values (11,0);
Is there s SQL way, how to create the following output?
id obs seq_num
1 1 1
2 1 1
3 1 1
4 0 2
5 0 2
6 1 3
7 1 3
8 0 4
9 0 4
10 1 5
11 0 6
seq_num is increased by 1 every time when value in column obs is changed compared to the previous row (ordered by id). I can solve this easily in Excel (using simple if formula), but can't figure out this in postgres.

using analytic functions, something like :
select id, obs, sum(cnt) over (order by id) as seq_num
from (
select id, obs, case when obs <> (lag(obs) over (order by id)) then 1 else 0 end as cnt
from test_seq
)
order by id;

I've figured it out:
with t as (
select
id,
obs,
case when lag(obs,1) over (order by id) <> obs then 1 else 0 end as test
from
tmp.test_seq
)
select
*,
sum(test) over (order by id rows between unbounded preceding and current row) + 1
from
t

You can use rownumber to do that.

Related

TSQL Make partitions "gaps and island"

I need to create partitions. Suppose I have this table:
CREATE TABLE MyTable (Pos INT UNIQUE, X INT)
INSERT INTO MyTable VALUES (3, 2)
INSERT INTO MyTable VALUES (5, 0)
INSERT INTO MyTable VALUES (6, 0)
INSERT INTO MyTable VALUES (9, 0)
INSERT INTO MyTable VALUES (43, 9)
INSERT INTO MyTable VALUES (53, 8)
INSERT INTO MyTable VALUES (56, 0)
INSERT INTO MyTable VALUES (81, 0)
INSERT INTO MyTable VALUES (163, 1)
INSERT INTO MyTable VALUES (9716, 0)
The query result should be this table with a column Y added, Y should be
IF X=0 : the previous value X<>0 (OR NULL, if not exists), ordered by Pos
IF X<>0 : X
Desired answer table looks like this
SELECT *
FROM MyQuery as a function of MyTable
ORDER BY Pos
Pos X Y
3 2 2
5 0 2
6 0 2
9 0 2
43 9 9
53 8 8
56 0 8
81 0 8
163 1 1
9716 0 1
This is a type of gaps-and-islands problem.
There are many solutions, here is one:
Use a running conditional count to number the rows that we want to group together
Use a partitioned conditional MIN to take the only value that we actually want, per group
WITH StartPoints AS (
SELECT *,
GroupId = COUNT(NULLIF(X, 0)) OVER (ORDER BY Pos)
FROM MyTable
)
SELECT
Pos,
X,
Y = MIN(NULLIF(X, 0)) OVER (PARTITION BY GroupId)
FROM StartPoints
ORDER BY Pos;
db<>fiddle

SQL count number of records where value remains constant

I need to find the count of tracker_id where position remains 1 through out the table.
tracker_id | position
---------------------
5 | 1
11 | 1
4 | 1
4 | 2
5 | 2
4 | 1
4 | 1
11 | 1
14 | 1
9 | 2
Here, the output should be 2 since, position of tracker_id:11 and 14 remains 1 through out the table.
You can use not exists
select count(*) from tbl a
where not exists(select 1
from tbl b
where a.tracker_id = b.tracker_id
and a.position <> b.position )
and a.position = 1
Output: 2
declare #table1 as table (tracker_id int,postion int)
insert into #table1 values (5,1)
insert into #table1 values (11,1)
insert into #table1 values (4,1)
insert into #table1 values (4,2)
insert into #table1 values (5,2)
insert into #table1 values (4,1)
insert into #table1 values (4,1)
insert into #table1 values (11,1)
insert into #table1 values (14,1)
insert into #table1 values (9,2)
select count(tracker_id),tracker_id,postion from #table1 group by tracker_id,postion
You can also do:
select ( count(distinct tracker_id) -
count(distinct tracker_id) filter (where position <> 1)
) as num_all_1s
from t;
Using uncorrelated subquery
select count(distinct tracker_id)
from t
where position=1
and tracker_id not in (select tracker_id from t where position<>1);
Using window function
select count(distinct tracker_id)
from (select *, avg(position) over (partition by tracker_id) as avg_pos from t) a
where avg_pos=1;
This one is just for giggles
select distinct count(*) over ()
from t
group by tracker_id
having count(*) = sum(position);
And if you really want to have fun
select count(distinct tracker_id)-count(distinct case when position<>1 then tracker_id end)
from t;
If position can only be 1, then you can use this, which gets all the tracker_ids with only a single position value, and then limits that to those records where position = 1:
WITH agg AS
(
SELECT
tracker_id
, p = MAX(position)
FROM table1
GROUP BY tracker_id
HAVING COUNT(DISTINCT position) = 1
)
SELECT COUNT(tracker_id)
FROM agg
WHERE p = 1

How can I group a set split by change in a field with respect to an order?

I have a set of records.
ID Value
1 a
2 b
3 b
4 b
5 a
6 a
7 b
8 b
And I would like to group them like so.
MIN(ID) MAX(ID) Value
1 1 a
2 4 b
5 6 a
7 8 b
I'm vaguely aware of oracle over() analytical function which looks to be the right direction, but I don't know what this problem is called much less how to solve it.
Probably an easier way, but this may help to start. I ran it on Postgres, but should work (maybe with a minor tweak) on Oracle. The inner most query puts the previous value on each row. We can use that to detect a grouping change (when value does not equal previous value). Every time there is a group change, we flag it with a "1". Sum these group changes and we now have a group id which increments every time there is a value change. Then we can perform our normal group by function.
create table x(id int, value varchar(1));
insert into x values(1, 'a');
insert into x values(2, 'b');
insert into x values(3, 'b');
insert into x values(4, 'b');
insert into x values(5, 'a');
insert into x values(6, 'a');
insert into x values(7, 'b');
insert into x values(8, 'b');
SELECT MIN(id), MAX(id), value
FROM ( SELECT id
,value
,previous_value
,SUM( CASE WHEN value = previous_value THEN 0 ELSE 1 END ) OVER(ORDER BY id) AS group_id
FROM ( SELECT id
,value
,COALESCE( LAG(value) OVER(ORDER BY id), value ) previous_value
FROM x
ORDER BY id
) y
) z
GROUP BY group_id, value
ORDER BY 1, 2;
min | max | value
-----+-----+-------
1 | 1 | a
2 | 4 | b
5 | 6 | a
7 | 8 | b
(4 rows)

How do I select TOP 5 PERCENT from each group?

I have a sample table like this:
CREATE TABLE #TEMP(Category VARCHAR(100), Name VARCHAR(100))
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'John')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Adam')
INSERT INTO #TEMP VALUES('A', 'Lisa')
INSERT INTO #TEMP VALUES('A', 'Lisa')
INSERT INTO #TEMP VALUES('A', 'Bucky')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Lily')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Tom')
INSERT INTO #TEMP VALUES('B', 'Ross')
INSERT INTO #TEMP VALUES('B', 'Ross')
INSERT INTO #TEMP VALUES('B', 'Ross')
SELECT Category, Name, COUNT(Name) Total
FROM #TEMP
GROUP BY Category, Name
ORDER BY Category, Total DESC
DROP TABLE #TEMP
Gives me the following:
A John 6
A Adam 4
A Lisa 2
A Bucky 1
B Lily 5
B Tom 4
B Ross 3
Now, how do I select the TOP 5 PERCENT records from each category assuming each category has more than 100 records (did not show in sample table here)? For instance, in my actual table, it should remove the John record from A and Lily record from B as appropriate (again, I did not show the full table here) to get:
A Adam 4
A Lisa 2
A Bucky 1
B Tom 4
B Ross 3
I have been trying to use CTEs and PARTITION BY clauses but cannot seem to achieve what I want. It removes the TOP 5 PERCENT from the overall result but not from each category. Any suggestions?
You could use a CTE (Common Table Expression) paired with the NTILE windowing function - this will slice up your data into as many slices as you need, e.g. in your case, into 20 slices (each 5%).
;WITH SlicedData AS
(
SELECT Category, Name, COUNT(Name) Total,
NTILE(20) OVER(PARTITION BY Category ORDER BY COUNT(Name) DESC) AS 'NTile'
FROM #TEMP
GROUP BY Category, Name
)
SELECT *
FROM SlicedData
WHERE NTile > 1
This basically groups your data by Category,Name, orders by something else (not sure if COUNT(Name) is really the thing you want here), and then slices it up into 20 pieces, each representing 5% of your data partition. The slice with NTile = 1 is the top 5% slice - just ignore that when selecting from the CTE.
See:
MSDN docs on NTILE
SQL Server 2005 ranking functions
SQL SERVER – 2005 – Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE
for more info
select Category,name,CountTotal,RankSeq,(50*CountTotal)/100 from (
select Category,name,COUNT(*)
over (partition by Category,name ) as CountTotal,
ROW_NUMBER()
over (partition by Category,name order by Category) RankSeq from #TEMP
--group by Category,Name
) temp
where RankSeq <= ((50*CountTotal)/100)
order by Category,Name,RankSeq
Output:
Category name CountTotal RankSeq 50*CountTotal)/100
A Adam 4 1 2
A Adam 4 2 2
A John 6 1 3
A John 6 2 3
A John 6 3 3
A Lisa 2 1 1
B Lily 5 1 2
B Lily 5 2 2
B Ross 3 1 1
B Tom 4 1 2
B Tom 4 2 2
I hope this helps :)
;WITH SlicedData AS
(
SELECT Category, Name, COUNT(Name) Total,
**PERCENT_RANK() OVER(PARTITION BY Category ORDER BY COUNT(Name) DESC) * 100** AS 'Percent'
FROM #TEMP
GROUP BY Category, Name
)
SELECT *
FROM SlicedData
WHERE Percent < 5
NTile will not work if number of records is less than your tile number.

SQL query to populate index field based on groups

I'm fairly new to sql and was hoping that someone can help me with an update query. I have a users table with group_id (foreign key), user_id and user_index columns. There are multiple users corresponding to each individual group_id, and user_id is a serial column which goes from 1 to the table size.
I'm looking for a query that will update the user_index column so that, for each group_id, each user will have a unique, sequential index starting with 1. So within group 1 there would be user_index 1,2,3... and within group 2 there would be user_index 1,2,3... and so on. Here is an example to clarify:
Initial state:
user_id | group_id | user_index
1 1 0
2 1 0
3 1 0
4 2 0
5 3 0
6 3 0
Desired state:
user_id | group_id | user_index
1 1 1
2 1 2
3 1 3
4 2 1
5 3 1
6 3 2
I hope that's clear. This would be easy to do in C or C++, but I'm wondering if there's a way to do it in sql.
UPDATE TableName
SET user_index = (SELECT COUNT(1) FROM TableName t2
WHERE t2.group_id = TableName.group_id AND t2.user_id <= TableName.user_id)
EDIT:
After looking at author's comment I created a test to see is this the right solution. Here's the test:
CREATE TABLE #table (user_id int, group_id int, user_index int)
INSERT INTO #table VALUES (1, 1, 0)
INSERT INTO #table VALUES (2, 1, 0)
INSERT INTO #table VALUES (3, 1, 0)
INSERT INTO #table VALUES (4, 2, 0)
INSERT INTO #table VALUES (5, 3, 0)
INSERT INTO #table VALUES (6, 3, 0)
SELECT * FROM #table
UPDATE #table
SET user_index = (SELECT COUNT(1) FROM #table t2
WHERE t2.group_id = #table.group_id AND t2.user_id <= #table.user_id)
SELECT * FROM #table
DROP TABLE #table
The output of two selects are exactly the same as in author's two selects - the first as beginning state, and the second as the desired outcome.