sql best strategy to partition same values based on temporal sequence - sql

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID LEVEL DATE
1 10 10/1/2000
1 10 11/20/2001
1 10 12/01/2001
1 30 02/15/2002
1 30 02/15/2002
1 20 05/17/2002
1 20 01/04/2003
1 30 07/20/2003
1 30 03/16/2004
1 30 04/15/2004
I want to acquire a count per each ID/LEVEL/DATE block that looks like this:
ID LEVEL COUNT
1 10 3
1 30 2
1 20 2
1 30 3
The problem is that if I use the count windows function and partition by level, it groups 30 together regardless of the temporal sequence. I want the count for level 30 both before and after 20 to be distinct. Does anyone know how to do that?

A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS set_ordinal,
ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
FROM
yourData
)
SELECT
id,
level,
set_ordinal - grp_ordinal,
MIN(date),
COUNT(*)
FROM
ordered
GROUP BY
id,
level,
set_ordinal - grp_ordinal
ORDER BY
id,
MIN(date)
Visualising the effect of the two row numbers...
ID LEVEL DATE set_ordinal grp_ordinal set-grp GROUP
-- ----- ---------- ----------- ----------- ------- --------
1 10 10/01/2000 1 1 0 1,10,0
1 10 11/20/2001 2 2 0 1,10,0
1 10 12/01/2001 3 3 0 1,10,0
1 30 02/15/2002 4 1 3 1,30,3
1 30 02/15/2002 5 2 3 1,30,3
1 20 05/17/2002 6 1 5 1,20,5
1 20 01/04/2003 7 2 5 1,20,5
1 30 07/20/2003 8 3 5 1,30,5
1 30 03/16/2004 9 4 5 1,30,5
1 30 04/15/2004 10 5 5 1,30,5

Related

calculate avg(value) for last 10 records postgresql

i have a tricky task,
lets assume we have table "Racings", and there we have columns TRACK, CAR, CIRCLE_TIME
here is an example how data could be look like:
id
track
car
circle_time
10
1
10
15
9
1
10
14
8
1
10
16
7
1
10
15
6
1
10
13
5
2
10
7
4
2
10
4
3
2
10
5
2
3
10
8
1
3
10
10
what i need, i to add one more coumn like avg3_circle_time which will show me an average time from last 3 circle_time from each track, example:
id
track
car
circle_time
avg3_circle_time
10
1
10
15
15
9
1
10
14
15
8
1
10
16
14.6
7
1
10
15
null
6
1
10
13
null
5
2
10
7
5.3
4
2
10
4
null
3
2
10
5
null
2
3
10
8
null
1
3
10
10
null
I know how it could works in oracle, you could use something like rowid, but in case of postgresql i don't know, i have a draft like .....avg(circle_time) OVER(PARTITION BY track,car.....) as avg3_circle_time..... help me to solve that task please
You can use window functions to calculate moving averages:
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
PARTITION BY track
ORDER BY id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
FROM t
ORDER BY track, id
Depending on your definition of previous three, the window could be ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
If you want only values when at least 3 circles available
select *
, case when lag(id, 2) over(partition by TRACK, CAR order by id) is not null then
avg(CIRCLE_TIME) over(partition by TRACK, CAR order by id rows between 2 preceding and current row) end a
from Racing
order by id desc;
db<>fiddle
Output
id track car circle_time a
10 1 10 15 15.0000000000000000
9 1 10 14 15.0000000000000000
8 1 10 16 14.6666666666666667
7 1 10 15 null
6 1 10 13 null
5 2 10 7 5.3333333333333333
4 2 10 4 null
3 2 10 5 null
2 3 10 8 null
1 3 10 10 null
Use LAED() then checking one of the next 2 rows is NULL or not. THEN sum of three values for calculating average.
-- PostgreSQL
SELECT *
, CASE WHEN next_circle_time IS NULL OR next_next_circle_time IS NULL
THEN NULL
ELSE ((t.circle_time + COALESCE(next_circle_time, 0) + COALESCE(next_next_circle_time, 0)) / 3 :: DECIMAL) :: DECIMAL(10, 1)
END avg_circle_time
FROM (SELECT *
, LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) next_circle_time
, LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) next_next_circle_time
FROM Racings) t
Another way Use AVG()
SELECT *
, CASE WHEN LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
OR LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
THEN NULL
ELSE AVG(circle_time) OVER (PARTITION BY track ORDER BY id DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END :: DECIMAL(10, 2) avg_circle_time
FROM Racings
Please check from url where both query exists https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f0cd868623725a1b92bf988cfb2deba3
Several of the posted answers end up repeating the window definition. You can avoid this with the window clause:
select *,
case when row_number() over(track_window) > 2
then trunc(avg(CIRCLE_TIME) over(track_window rows 2 preceding), 1)
end a
from Racing
window track_window as (partition by track order by id)
order by id desc
Note how, in this sample, track_window is defined once, then reused for both row_number and avg. In the latter case, the window clause is embellished with a frame as well (rows 2 preceding).

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
sum(case when prev_id = id and num_of_days % 5 <> 0
then 0 else 1
end) over (order by id, num_of_days)
from (select t.*,
lag(id) over (order by id, num_of_days) as prev_id
from t
) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.

Add order within group and mark whether a row is the last in its group

I have a table in an SQL Server database on the following form, sorted according to id.
id group
1 10
17 10
24 10
2 20
16 20
72 20
104 20
8 30
9 30
I would like to select every row grouped according to the row group and add the following information to this table: the order (as sorted) within the group and whether the row is the last row in the group. In other words, something similar to this:
id group order last
1 10 1 0
17 10 2 0
24 10 3 1
2 20 1 0
16 20 2 0
72 20 3 0
104 20 4 1
8 30 1 0
9 30 2 1
I've tried fiddling around with ROW_NUMBER, but I'm not all that experienced with SQL Server and I can't get it to work. Does anyone have a suggestion?
Use ROW_NUMBER window function
select id,[group],
row_number()over(partition by [group] order by id) as [order],
case when row_number()over(partition by [group] order by id desc) = 1 then 1 else 0 end as Last
From yourtable

Select Query to Get Unique Cells in Two Columns

I have an SQL Server database, that logs weather device sensor data.
The table looks like this:
Id DeviceId SensorId Value
1 1 1 42
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
6 3 1 26
7 3 1 23
8 3 2 1
In return the query should return the following:
Id DeviceId SensorId Value
2 1 1 3
3 1 2 30
4 2 2 0
5 2 1 1
7 3 1 23
8 3 2 1
For each device the sensor should be unique. i.e. Values in Columns DeviceId and SensorId should be unique (row-wise).
Apologies if I'm not clear enough.
If you don't want to sum Value as your desired result suggest, so you just want to take an "arbitrary" row of each "DeviceId + SensorId"-group:
WITH CTE AS
(
SELECT Id, DeviceId, SensorId, Value,
RN = ROW_NUMBER() OVER (PARTITION BY DeviceId, SensorId ORDER BY ID DESC)
FROM dbo.TableName
)
SELECT Id, DeviceId, SensorId, Value
FROM CTE
WHERE RN = 1
ORDER BY ID
This returns the row with the highest ID per group. You need to change ORDER BY ID DESC if you want a different result. Demo: http://sqlfiddle.com/#!6/8e31b/2/0 (your result)

Queries to fetch items based on top Ranks

I have a table which rank the items which i have.
I need a queries which will pick up only the top 2 ranks for a given item, the rank may not be in sequential order.
I need to fetch the item with least two ranks, there will same rank for two items as well.
Here is the snap shot of my table.
Item Id Supp Id Rank
1 2 2
1 1 7
1 7 5
1 9 11
2 67 4
2 9 14
2 10 14
2 34 4
2 25 3
2 60 3
2 79 5
my requirement is if I enter 2 i should get the result as below
Item Id Supp_id Rank
2 25 3
2 60 3
2 67 4
2 34 4
I am using oracle 10g version.
As one of the approaches it can be done as follows. Here we are using dense_rank() over() analytic function to assign a rank for a row in a ordered by rank group of rows .
select t.item_id
, t.supp_id
, t.rank
from (select item_id
, supp_id
, rank
, dense_rank() over(partition by item_id
order by rank) as rn
from t1
where item_id = 2
) t
where t.rn <= 2
Result:
ITEM_ID SUPP_ID RANK
---------- ---------- ----------
2 25 3
2 60 3
2 67 4
2 34 4
SQLFiddle Demo