SQL Server GROUP BY COUNT Consecutive Rows Only

SQL Server GROUP BY COUNT Consecutive Rows Only - sql

I have a table called DATA on Microsoft SQL Server 2008 R2 with three non-nullable integer fields: ID, Sequence, and Value. Sequence values with the same ID will be consecutive, but can start with any value. I need a query that will return a count of consecutive rows with the same ID and Value.
For example, let's say I have the following data:
ID Sequence Value
-- -------- -----
1 1 1
5 1 100
5 2 200
5 3 200
5 4 100
10 10 10
I want the following result:
ID Start Value Count
-- ----- ----- -----
1 1 1 1
5 1 100 1
5 2 200 2
5 4 100 1
10 10 10 1
I tried
SELECT ID, MIN([Sequence]) AS Start, Value, COUNT(*) AS [Count]
FROM DATA
GROUP BY ID, Value
ORDER BY ID, Start
but that gives
ID Start Value Count
-- ----- ----- -----
1 1 1 1
5 1 100 2
5 2 200 2
10 10 10 1
which groups all rows with the same values, not just consecutive rows.
Any ideas? From what I've seen, I believe I have to left join the table with itself on consecutive rows using ROW_NUMBER(), but I am not sure exactly how to get counts from that.
Thanks in advance.

You can use Sequence - ROW_NUMBER() OVER (ORDER BY ID, Val, Sequence) AS g to create a group:
SELECT
ID,
MIN(Sequence) AS Sequence,
Val,
COUNT(*) AS cnt
FROM
(
SELECT
ID,
Sequence,
Sequence - ROW_NUMBER() OVER (ORDER BY ID, Val, Sequence) AS g,
Val
FROM
yourtable
) AS s
GROUP BY
ID, Val, g
Please see a fiddle here.

Related

sql - select single ID for each group with the lowest value

Consider the following table:
ID GroupId Rank
1 1 1
2 1 2
3 1 1
4 2 10
5 2 1
6 3 1
7 4 5
I need an sql (for MS-SQL) select query selecting a single Id for each group with the lowest rank. Each group needs to only return a single ID, even if there are two with the same rank (as 1 and 2 do in the above table). I've tried to select the min value, but the requirement that only one be returned, and the value to be returned is the ID column, is throwing me.
Does anyone know how to do this?

Use row_number():
select t.*
from (select t.*,
row_number() over (partition by groupid order by rank) as seqnum
from t
) t
where seqnum = 1;

Calculate "position in run" in SQL

I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.

One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t

Hive: window function - how to exclude the CURRENT ROW

I wish to calculate the minimum of a value over a partition, but the current row should not be taken into account.
SELECT *,
MIN(val) OVER(PARTITION BY col1)
FROM table
outputs the minimum over all rows in the partition.
The documentation shows ways to use CURRENT ROW, but not how to exclude it while performing the windowing operation.
I am looking for something like this:
SELECT *,
MIN(val) OVER(PARTITION BY col1 ROWS NOT CURRENT ROW)
FROM table
but this does not work.

I can think of a way to do this. The min over a window excluding the current row will always be the min over the window except when the row you are at is the min; then then min will be the 2nd min over the window. Example:
Data:
-----------
key | val
-----------
1 8
1 2
1 4
1 6
1 11
2 3
2 5
2 7
2 9
Query:
select key, val, act_min, val_arr
, case when act_min=val then val_arr[1] else act_min
end as min_except_for_c_row
from (
select key, val, act_min, sort_array(val_arr) val_arr
from (
select key, val
, min(val) over (partition by key) act_min
, collect_set(val) over (partition by key) val_arr
from db.table ) A
) B
I left all the columns in for illustration. You can modify the query as needed.
Output:
key val act_min val_arr min_except_for_c_row
1 8 2 [2,4,6,8,11] 2
1 2 2 [2,4,6,8,11] 4
1 4 2 [2,4,6,8,11] 2
1 6 2 [2,4,6,8,11] 2
1 11 2 [2,4,6,8,11] 2
2 3 3 [3,5,7,9] 5
2 5 3 [3,5,7,9] 3
2 7 3 [3,5,7,9] 3
2 9 3 [3,5,7,9] 3

Select Query to Get Unique Cells in Two Columns

I have an SQL Server database, that logs weather device sensor data.
The table looks like this:
Id DeviceId SensorId Value
1 1 1 42
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
6 3 1 26
7 3 1 23
8 3 2 1
In return the query should return the following:
Id DeviceId SensorId Value
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
7 3 1 23
8 3 2 1
For each device the sensor should be unique. i.e. Values in Columns DeviceId and SensorId should be unique (row-wise).
Apologies if I'm not clear enough.

If you don't want to sum Value as your desired result suggest, so you just want to take an "arbitrary" row of each "DeviceId + SensorId"-group:
WITH CTE AS
(
SELECT Id, DeviceId, SensorId, Value,
RN = ROW_NUMBER() OVER (PARTITION BY DeviceId, SensorId ORDER BY ID DESC)
FROM dbo.TableName
)
SELECT Id, DeviceId, SensorId, Value
FROM CTE
WHERE RN = 1
ORDER BY ID
This returns the row with the highest ID per group. You need to change ORDER BY ID DESC if you want a different result. Demo: http://sqlfiddle.com/#!6/8e31b/2/0 (your result)

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a Table as such:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227

The solution using rank works nicely when you have an odd number of members in each group, i.e. the median exists within the sample, where you have an even number of members the rank method will fall down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
Example on SQL Fiddle
In SQL Server 2012 you can use PERCENTILE_CONT
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
Example on SQL Fiddle

SQL Server does not have a function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Server GROUP BY COUNT Consecutive Rows Only - sql

Related

sql - select single ID for each group with the lowest value

Calculate "position in run" in SQL

Hive: window function - how to exclude the CURRENT ROW

Select Query to Get Unique Cells in Two Columns

How to find the SQL medians for a grouping

Categories

Resources