Row numbering based on contiguous data? - sql

I need to assign numbers to rows based on a date. The rule is that the same number is assigned to multiple contiguous rows with the same date. When a row's date value differs from the previous row's date value, the number is incremented. The result set would look something like this (the first column would be used to determine row order):
1 7/1/2021 1
2 7/2/2021 2
3 7/2/2021 2
4 7/1/2021 3
5 7/2/2021 4
The value of the date is not what's relevant in this case. As you can see, there are repeats of the same date that get assigned different numeric values because they are not contiguous. I'm struggling to figure out how I would accomplish this.

This is a Gaps & Islands problem. You need to provide the extra ordering columns for the query to make sense.
If you added these, the solution would go along the lines of:
select
    d,
    1 + sum(inc) over (order by ordering_columns) as grp
from (
    select
        d,
        ordering_columns,
        case when d <> lag(d) over (order by ordering_columns) then 1 else 0 end as inc
    from t
) x
order by ordering_columns
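As an illustration, here is a minimal sketch that runs the query above against the sample rows from the question, using Python's built-in sqlite3 module. The table name t, the id column (standing in for ordering_columns), and the choice of SQLite are assumptions made for this example; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question; `id` is an assumed name for the ordering column.
rows = [
    (1, "7/1/2021"),
    (2, "7/2/2021"),
    (3, "7/2/2021"),
    (4, "7/1/2021"),
    (5, "7/2/2021"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
SELECT id, d, 1 + SUM(inc) OVER (ORDER BY id) AS grp
FROM (
    SELECT id, d,
           -- 1 whenever the date differs from the previous row's date;
           -- the first row compares against NULL and falls through to 0
           CASE WHEN d <> LAG(d) OVER (ORDER BY id) THEN 1 ELSE 0 END AS inc
    FROM t
) x
ORDER BY id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data the grp column comes out as 1, 2, 2, 3, 4, matching the numbering requested in the question.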

Related

adjust recursive sql query to exclude holidays and weekends

I have a dataset like this called data_per_day
instructional_day points
2023-01-24 2
2023-01-23 2
2023-01-20 1
2023-01-19 0
and so on. The table lists instructional days (weekdays, excluding holidays and weekends) and the number of points someone has earned. 1 is the start of a streak and 0 is the end of a streak. 2 is the maximum number of points once a streak has started.
I need to find the length of the latest streak, so in this case the result should be 3.
I created a recursive CTE, but the query returns 2 as the streak count because I'm using a lag mechanism based on calendar days. Instead, I need to adjust it so that instructional days are used rather than all dates.
WITH RECURSIVE cte AS (
  SELECT
    student_unique_id,
    instructional_day,
    points,
    1 AS cnt
  FROM
    `data_per_day`
  WHERE
    instructional_day = DATE_ADD(CURRENT_DATE('America/Chicago'), INTERVAL -1 DAY)
  UNION ALL
  SELECT
    a.student_unique_id,
    a.instructional_day,
    a.points,
    c.cnt+1
  FROM (
    SELECT
      *
    FROM
      `data_per_day`
    WHERE
      points > 0 ) a
  INNER JOIN
    cte c
  ON
    a.student_unique_id = c.student_unique_id
    AND a.instructional_day = c.instructional_day - INTERVAL '1' day )
SELECT
  student_unique_id,
  MAX(cnt) AS streak
FROM
  cte
WHERE
  student_unique_id = "419"
GROUP BY
  student_unique_id
How do I adjust the query?
This is not a trivial coding exercise, so I won't actually write the code and provide it.
What you have here is a gaps and islands question. You want to identify the largest "island" of days with points within a date range. Depending upon what dates are contained in your data, you may need to generate a list of sequential dates that meet your criteria.
One problem I see is that you are trying to combine the steps to generate the date range (the recursive CTE) with the points. You'll need to separate those steps.
1. Define the date range.
2. Generate the dates within the range.
3. Keep only the dates that are weekdays and not holidays (for example, isweekday = 'yes' and isholiday = 'no'). You will probably want to add a row number during this step.
4. [Left] join the dates to your data, including coalesce(points, 0).
5. Filter the data to points > 0.
6. Identify the islands.
7. Identify the largest island per student.
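The island steps above can be sketched as follows. This is only an illustration under stated assumptions: it assumes data_per_day already contains one row per instructional day (so no calendar generation or holiday filter is needed), it runs on SQLite through Python's sqlite3 module rather than BigQuery, the 2023-01-18 row is made up to give the data a second island, and it reports the most recent island rather than the largest one, since that is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_per_day "
    "(student_unique_id TEXT, instructional_day TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO data_per_day VALUES (?, ?, ?)",
    [
        ("419", "2023-01-24", 2),
        ("419", "2023-01-23", 2),
        ("419", "2023-01-20", 1),
        ("419", "2023-01-19", 0),
        ("419", "2023-01-18", 2),  # hypothetical extra row, not in the question
    ],
)

query = """
WITH numbered AS (
    -- Number instructional days so weekends and holidays leave no gaps
    SELECT student_unique_id, instructional_day, points,
           ROW_NUMBER() OVER (PARTITION BY student_unique_id
                              ORDER BY instructional_day) AS rn
    FROM data_per_day
),
islands AS (
    -- Among days with points, consecutive days share the same rn difference
    SELECT student_unique_id, instructional_day,
           rn - ROW_NUMBER() OVER (PARTITION BY student_unique_id
                                   ORDER BY instructional_day) AS island_id
    FROM numbered
    WHERE points > 0
),
sized AS (
    SELECT student_unique_id, island_id,
           COUNT(*) AS streak,
           MAX(instructional_day) AS last_day
    FROM islands
    GROUP BY student_unique_id, island_id
)
SELECT s.student_unique_id, s.streak
FROM sized s
WHERE s.last_day = (SELECT MAX(last_day) FROM sized
                    WHERE student_unique_id = s.student_unique_id)
"""

latest_streaks = conn.execute(query).fetchall()
print(latest_streaks)
```

For the sample rows this yields a streak of 3 for student 419, because 2023-01-20 and 2023-01-23 are adjacent instructional days even though they are not adjacent calendar days.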

Carry the most recent highest value in the row field forward over the date column using Tableau or SQL

I have data with the values 0, 1, and 2 in the row field and dates in increasing order in the column field, and I would like the last '2' value to stay constant moving forward through the dates. Please let me know a workaround. For example, May 2027 has 2 and then 0, but I would like to have 2 in June 2027 and all later dates. I would like to keep the earlier values the same, but have the maximum value (2 in this case) carried forward to the later dates.
Thank you.
MS SQL Syntax:
SELECT TOP 1 value FROM YourTableName WHERE value <> 0 ORDER BY date_action DESC
MySQL Syntax:
SELECT value FROM YourTableName WHERE value <> 0 ORDER BY date_action DESC LIMIT 1
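The queries above return a single value: the most recent non-zero one. If the goal is to show that value on every later date, one possible alternative (not part of the answer above) is a running maximum over the date order. The sketch below assumes the values only rise before the peak, as in the question's example, and uses made-up sample rows with the answer's table and column names; it runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTableName (date_action TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO YourTableName VALUES (?, ?)",
    [
        # Hypothetical sample rows shaped like the example in the question
        ("2027-03-01", 0),
        ("2027-04-01", 1),
        ("2027-05-01", 2),
        ("2027-06-01", 0),
        ("2027-07-01", 0),
    ],
)

query = """
SELECT date_action,
       value,
       -- Running maximum: the highest value seen up to and including this date
       MAX(value) OVER (ORDER BY date_action) AS carried_value
FROM YourTableName
ORDER BY date_action
"""

carried = conn.execute(query).fetchall()
for row in carried:
    print(row)
```

Here the carried_value column reads 0, 1, 2, 2, 2, so the 2 from May 2027 persists into June and July while earlier values are unchanged.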

Need to calculate next milestone in the sequence

I have a dataset something like this
I want to calculate the next clinical milestone for each ID according to the sequence number.
E.g. for 665, the next clinical milestone in the sequence should be DBF, as it doesn't have a date in the actual column. (We need to ignore intermediate values like FPA and FCI, where the actual column is empty, because the data is really dirty and dates can be earlier than the last one in the sequence.)
There is another case: when all data in the actual column for an ID is null, the first clinical milestone with a non-null planned value should be the next one.
E.g. for ID 666, CPC should be the next clinical milestone.
I thought about using the LAG function for this, together with the max of actual for an ID, but I'm not sure how it would work when two rows have the same actual date.
Use MAX() OVER () with a CASE expression to work out the current sequence value for each id, then filter based on that.
WITH
  resequenced AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY sequence) AS new_sequence
  FROM
    yourTable
  WHERE
       actual IS NOT NULL
    OR planned IS NOT NULL
),
  summarised AS
(
  SELECT
    *,
    MAX(CASE WHEN actual IS NOT NULL THEN new_sequence ELSE 0 END) OVER (PARTITION BY id) AS last_sequence
  FROM
    resequenced
)
SELECT
  *
FROM
  summarised
WHERE
  new_sequence = last_sequence + 1
EDIT: Adapted to deal with gaps in both the actual and planned columns.
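Since the question's dataset is not reproduced here, the following sketch runs the query on made-up rows to show how it behaves. The column names id, sequence, milestone, planned and actual, the dates, and the placeholder milestone M4 are all assumptions for illustration; the engine is SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(id INTEGER, sequence INTEGER, milestone TEXT, planned TEXT, actual TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        # Hypothetical data: 665 has gaps in actual before its last actual date
        (665, 1, "CPC", "2021-01-01", "2021-01-05"),
        (665, 2, "FPA", "2021-02-01", None),
        (665, 3, "FCI", "2021-03-01", None),
        (665, 4, "M4", "2021-04-01", "2021-04-02"),  # made-up milestone name
        (665, 5, "DBF", "2021-05-01", None),
        # 666 has no actual dates at all
        (666, 1, "CPC", "2021-01-01", None),
        (666, 2, "FPA", "2021-02-01", None),
    ],
)

query = """
WITH resequenced AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY sequence) AS new_sequence
    FROM yourTable
    WHERE actual IS NOT NULL OR planned IS NOT NULL
),
summarised AS (
    SELECT *,
           -- Highest position with an actual date; 0 when the id has none
           MAX(CASE WHEN actual IS NOT NULL THEN new_sequence ELSE 0 END)
               OVER (PARTITION BY id) AS last_sequence
    FROM resequenced
)
SELECT id, milestone
FROM summarised
WHERE new_sequence = last_sequence + 1
ORDER BY id
"""

next_milestones = conn.execute(query).fetchall()
print(next_milestones)
```

With these rows, 665 resolves to DBF (the row after its last actual date, skipping FPA and FCI) and 666 resolves to CPC (its first row with a planned date).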

Count rows with equal values in a window function

I have a time series in a SQLite Database and want to analyze it.
The important part of the time series consists of a column with different but not unique string values.
I want to do something like this:
Value concat countValue
A A 1
A A,A 2
B A,A,B 1
B A,B,B 2
B B,B,B 3
C B,B,C 1
B B,C,B 2
I don't know how to get the countValue column. It should count all Values in the partition (the window frame) that are equal to the current row's Value.
I tried this, but it counts every Value in the partition rather than only the Values equal to the current row's Value.
SELECT
    Value,
    group_concat(Value) OVER wind AS concat,
    Sum(Case When Value Like Value Then 1 Else 0 End) OVER wind AS countValue
FROM TimeSeries
WINDOW
    wind AS (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY
    date
;
The query is also limited by these factors:
The query should work with any amount of unique Values
The query should work with any Partition Size (ROWS BETWEEN n PRECEDING AND CURRENT ROW)
Is this even possible using only SQL?
Here is an approach using string functions:
select
    value,
    group_concat(value) over wind as concat,
    (
        length(group_concat(value) over wind) - length(replace(group_concat(value) over wind, value, ''))
    ) / length(value) cnt_value
from timeseries
window wind as (order by date rows between 2 preceding and current row)
order by date;
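To check the string-function approach, here is a small sketch that loads the seven values from the question into an in-memory SQLite database through Python's sqlite3 module and runs the query above. The date values are invented purely to fix the row order.

```python
import sqlite3

values = ["A", "A", "B", "B", "B", "C", "B"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TimeSeries (date TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO TimeSeries VALUES (?, ?)",
    # Hypothetical dates, one per row, only used for ordering
    [(f"2024-01-{i + 1:02d}", v) for i, v in enumerate(values)],
)

query = """
SELECT
    Value,
    group_concat(Value) OVER wind AS concat,
    -- Characters removed when deleting Value, divided by its length,
    -- gives the number of occurrences inside the window
    (
        length(group_concat(Value) OVER wind)
        - length(replace(group_concat(Value) OVER wind, Value, ''))
    ) / length(Value) AS cnt_value
FROM TimeSeries
WINDOW wind AS (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY date
"""

counted = conn.execute(query).fetchall()
for row in counted:
    print(row)
```

One caveat worth keeping in mind: because this counts substring matches in the concatenated text, it over-counts when one value is contained in another (for example 'A' inside 'AB'), so it is only safe when no value is a substring of a different value.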

Calculate stdev over a variable range in SQL Server

Table format is as follows:
Date ID subID value
-----------------------------
7/1/1996 100 1 .0543
7/1/1996 100 2 .0023
7/1/1996 200 1 -.0410
8/1/1996 100 1 -.0230
8/1/1996 200 1 .0121
I'd like to apply STDEV to the value column where date falls within a specified range, grouping on the ID column.
Desired output would look something like this:
DateRange, ID, std_v
1 100 .0232
2 100 .0323
1 200 .0423
One idea I've had that works but is clunky involves creating an additional column (which I've called 'partition') to identify a 'group' of values over which STDEV is taken (by using the OVER clause with PARTITION BY applied to the 'partition' and 'ID' columns).
Creating the partition column involves a prior CASE expression in which a given record is assigned a partition based on its date falling within a given range, i.e.:
...
, partition = CASE
WHEN date BETWEEN '7/1/1996' AND '10/1/1996' THEN 1
WHEN date BETWEEN '10/1/1996' AND '1/1/1997' THEN 2
...
Ideally, I'd be able to apply STDEV with an OVER clause that partitions on ID and on variable date ranges (e.g. trailing 3 months for a given reference date). Once this works for the 3-month period described above, I'd like to make the date range variable, creating an additional '#dateRange' variable at the start of the program so I can run this for 2-month, 3-month, 6-month, etc. ranges.
I ended up finding a solution to my question.
You can join the original table to a second table consisting of a unique list of the dates in the first table, applying a BETWEEN clause to specify the desired range.
Sample query below.
Initial table (#excessRet), with columns:
Date, ID, subID, value
Second table (#dates), a unique list of the dates in the first table, with columns:
Date
select d.date, er.id, STDEV(er.value)
from #dates d
inner join #excessRet er
on er.date between DATEADD(m, -36, d.date) and d.date
group by d.date, er.id
order by er.id, d.date
To achieve the desired next step referenced above (making range variable), simply create a variable at the outset and replace "36" with the variable.
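The same join can be tried outside SQL Server with the sketch below, which is only an approximation of the approach under several assumptions: it uses SQLite through Python's sqlite3 module, registers a custom STDEV aggregate because SQLite has none built in, stores dates as ISO strings, replaces DATEADD with SQLite's date() modifier, and names the table excess_ret and the helper trailing_stdev for illustration. The rows are the five sample rows from the question.

```python
import sqlite3
import statistics


class SampleStdev:
    """Sample standard deviation aggregate (SQLite has no built-in STDEV)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Like SQL Server's STDEV, return NULL when fewer than two values exist
        return statistics.stdev(self.values) if len(self.values) > 1 else None


def trailing_stdev(conn, months):
    """Stdev of value per (date, id) over the trailing `months` months."""
    query = """
    WITH dates AS (SELECT DISTINCT date FROM excess_ret)
    SELECT d.date, er.id, STDEV(er.value) AS std_v
    FROM dates d
    INNER JOIN excess_ret er
        ON er.date BETWEEN date(d.date, ?) AND d.date
    GROUP BY d.date, er.id
    ORDER BY er.id, d.date
    """
    return conn.execute(query, (f"-{months} months",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, SampleStdev)
conn.execute(
    "CREATE TABLE excess_ret (date TEXT, id INTEGER, subID INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO excess_ret VALUES (?, ?, ?, ?)",
    [
        ("1996-07-01", 100, 1, 0.0543),
        ("1996-07-01", 100, 2, 0.0023),
        ("1996-07-01", 200, 1, -0.0410),
        ("1996-08-01", 100, 1, -0.0230),
        ("1996-08-01", 200, 1, 0.0121),
    ],
)

stdev_rows = trailing_stdev(conn, 3)
for row in stdev_rows:
    print(row)
```

Passing the window length as a parameter plays the role of the variable mentioned above: calling trailing_stdev(conn, 6) would widen the range without touching the query text.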