SQL analytic function with window size based on the result in the window

I need the window size to depend on the result of the calculation over the window itself.
For example, consider the calculation in column col_2:
idx  col_0  col_1  col_2  rows in window
1    A      10     30     1,2
2    A      20     NULL   -
3    A      50     50     3
4    B      10     50     4,5,6
5    B      10     NULL   -
6    B      30     30     6
Basically, col_2 is something like sum(col_1) when that sum falls in [30, 50], computed within each col_0 partition over the current row and the rows that follow it. A query could look like this (the WHERE inside OVER is not valid SQL; it only illustrates the intent):
SELECT *,
       SUM(col_1) OVER (PARTITION BY col_0
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                        WHERE SUM(col_1) BETWEEN 30 AND 50) AS col_2
FROM table
I didn't find any function or sql snippet which efficiently solves this problem.
Thanks for your help.

Related

SQL Get max value of n next rows

Say I have a table with two columns, time and value. For each time, I want the max value over the next n seconds.
For example, if I want the max value over every next 3 seconds, the following table:
time  value
1     6
2     1
3     4
4     2
5     5
6     1
7     1
8     3
9     7
Should return:
time  value  max
1     6      6
2     1      4
3     4      5
4     2      5
5     5      5
6     1      3
7     1      7
8     3      NULL
9     7      NULL
Is there a way to do this directly with an sql query?
You can use the max window function:
select *,
       case
         when row_number() over(order by time desc) > 2 then
           max(value) over(order by time rows between current row and 2 following)
       end as max
from table_name;
The case expression checks that there are more than 2 rows after the current row to calculate the max, otherwise null is returned (for the last 2 rows ordered by time).
This is similar to Zakaria's version, but it uses about 40% less CPU (benchmarked at 3M rows), because both window functions use exactly the same OVER clause, which the engine can optimize better.
Optimized Max Value of Rolling Window of 3 Rows
SELECT *,
       MaxValueIn3SecondWindow = CASE
           /* Check that 3 rows exist to compare. If so, calculate the max value. */
           WHEN 3 = COUNT(*) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
           /* Returns the max [Value] between the current row and the next 2 rows */
           THEN MAX(A.[Value]) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
       END
FROM #YourTable AS A
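To check both guards against the sample data, here is a minimal sketch that runs them in an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later); the table name readings and the alias max_next are made up for illustration, and the T-SQL alias syntax is rewritten with AS so SQLite accepts it.

```python
import sqlite3

# Sample data from the question: (time, value)
ROWS = [(1, 6), (2, 1), (3, 4), (4, 2), (5, 5), (6, 1), (7, 1), (8, 3), (9, 7)]

# Guard 1: ROW_NUMBER() counted from the end must exceed 2,
# i.e. at least 2 rows follow the current one.
ROW_NUMBER_GUARD = """
SELECT time, value,
       CASE WHEN ROW_NUMBER() OVER (ORDER BY time DESC) > 2
            THEN MAX(value) OVER (ORDER BY time
                                  ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
       END AS max_next
FROM readings
ORDER BY time
"""

# Guard 2: the frame itself must contain exactly 3 rows.
COUNT_GUARD = """
SELECT time, value,
       CASE WHEN COUNT(*) OVER (ORDER BY time
                                ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) = 3
            THEN MAX(value) OVER (ORDER BY time
                                  ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
       END AS max_next
FROM readings
ORDER BY time
"""


def run(query):
    """Load the sample rows into a fresh in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (time INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


by_row_number = run(ROW_NUMBER_GUARD)
by_count = run(COUNT_GUARD)
```

Both result lists should match the expected table in the question, with NULL (None in Python) for the last two rows. Note that the frame is defined in rows, so it equals "the next 3 seconds" only while the times are consecutive, as in the sample.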

Calculating median of 3 columns in a BigQuery table

I am trying to build a query to calculate the median of 3 column values. My table looks like this:
Item  Column 1  Column 2  Column 3
A     10        12        4
B     5         14        20
C     15        5         4
I want to be able to output,
Item  Column 1  Column 2  Column 3  Median
A     10        12        4         10
B     5         14        20        14
C     15        5         4         5
I have tried percentile_cont(), but that seems to work only on values in a single column. How do I achieve this?
Consider the approach below:
select *,
  ( select distinct percentile_disc(col, 0.5) over()
    from unnest([Column1, Column2, Column3]) as col
  ) AS Median
from your_table
Applied to the sample data in your question, this should return the medians 10, 14 and 5 for items A, B and C, matching the desired output.
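The idea behind the subquery is to turn the three column values of each row into a small collection and take the median of that collection. As a rough illustration outside BigQuery, the same per-row logic can be sketched in Python; the function name with_row_median is made up for this example, and statistics.median stands in for percentile_disc, which agrees with it whenever the number of values is odd.

```python
from statistics import median

# Sample data from the question: (Item, Column 1, Column 2, Column 3)
rows = [("A", 10, 12, 4), ("B", 5, 14, 20), ("C", 15, 5, 4)]


def with_row_median(rows):
    """Append the median of the three column values to each row."""
    return [
        (item, c1, c2, c3, median([c1, c2, c3]))
        for item, c1, c2, c3 in rows
    ]


result = with_row_median(rows)
for row in result:
    print(row)
```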
Have you tried this:
select Col1, Col2, Col3,
PERCENTILE_CONT([Col1, Col2, Col3], 0.5) OVER() AS Median
from tableName

Row Number with specific window size

I want to group records by row number: rows 1-3 in group 1, rows 4-6 in group 2, rows 7-9 in group 3, and so on.
Suppose the table structure is as below:
Row Number  Data  Value
1           A     10
2           A     5
3           A     1
4           A     33
5           A     2
6           A     127
1           B     1
2           B     0
3           B     7
4           B     7
5           B     5
6           B     8
7           B     1
8           B     0
I want an output like this:
Group  Value
1      10
1      5
1      1
2      33
2      2
2      127
1      1
1      0
1      7
2      7
2      5
2      8
3      1
3      0
I am using Oracle 11G.
I can achieve this using PL/SQL, but I have to use SQL only, as the query will be used in a reporting tool.
If this is a duplicate question please provide the link of the answered question.
Subtract 1 from the column "RowNumber" and divide by 3.
Then use TRUNC() to get the integer part:
SELECT TRUNC(("RowNumber" - 1) / 3) + 1 "Group",
"Value"
FROM tablename
I would assume the name of the first column is ordering.
You can do:
select
  1 + trunc((row_number() over(partition by data order by ordering) - 1) / 3) as grp,
  value
from t
What you show looks like the output from something like this:
select ceil(rn/3) as grp, value
from your_table
order by rn;
Note that "row number" and "group" are reserved words/phrases which should not be used as column names. I used rn and grp instead.
I think the ceiling function is the simplest way to arrive at what you want. If you want to base it on the RowNumber column:
select ceil( RowNumber / 3.0) as grouping
If you want to calculate it yourself using row_number():
select ceil( row_number() over (order by RowNumber) / 3.0 ) as grouping
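The TRUNC-based and CEIL-based answers compute the same group number for any row number starting at 1. A small Python sketch of the arithmetic (the function names and the size parameter are made up for illustration):

```python
import math


def group_by_trunc(n, size=3):
    """Mirror of TRUNC((n - 1) / size) + 1 for row numbers n >= 1."""
    return (n - 1) // size + 1


def group_by_ceil(n, size=3):
    """Mirror of CEIL(n / size) for row numbers n >= 1."""
    return math.ceil(n / size)


# Row numbers 1..8, as in the "B" rows of the question
groups = [group_by_trunc(n) for n in range(1, 9)]
print(groups)
```

For row numbers 1 to 8 both functions give groups 1, 1, 1, 2, 2, 2, 3, 3, which is the pattern in the desired output.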

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a Table as such:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when each group has an odd number of members, i.e. the median exists within the sample. With an even number of members the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
(   SELECT  Code,
            Value,
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        Median = (MAX(CASE WHEN half1 = 1 THEN Value END) +
                  MIN(CASE WHEN half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;
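To see the two-halves idea work on the data from the question, here is a minimal sketch that runs an equivalent query in an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later), and the T-SQL "alias = expression" syntax is rewritten with AS so SQLite accepts it.

```python
import sqlite3

# Sample data from the question: (Code, Value); None stands for NULL
DATA = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

QUERY = """
WITH cte AS (
    SELECT code, value,
           NTILE(2) OVER (PARTITION BY code ORDER BY value)      AS half1,
           NTILE(2) OVER (PARTITION BY code ORDER BY value DESC) AS half2
    FROM t
    WHERE value IS NOT NULL
)
SELECT code,
       -- top of the lower half plus bottom of the upper half, averaged
       (MAX(CASE WHEN half1 = 1 THEN value END) +
        MIN(CASE WHEN half2 = 1 THEN value END)) / 2.0 AS median
FROM cte
GROUP BY code
ORDER BY code
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", DATA)
medians = conn.execute(QUERY).fetchall()
conn.close()

for code, med in medians:
    print(code, med)
```

With an odd group size, NTILE(2) puts the middle row in the first bucket for both sort directions, so the same value is averaged with itself; with an even size, the two middle values are averaged.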
In SQL Server 2012 you can use PERCENTILE_CONT:
SELECT DISTINCT
        Code,
        Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM    T;
SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
    SELECT Code, Value,
           ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
           COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.

Concatenate data if null

I would like ROW_Number() to work normally UNLESS column 'box' is Null. If 'box' is null the row number doesn't increase.
I have data that looks like this...
Row Box
1 5
2 3
3 1
4 Null
5 Null
6 2
7 8
8 Null
9 Null
I want my query to pull out data that looks like this...
Row Box
1 5
2 3
3 1
3 Null
3 Null
4 2
5 8
5 Null
5 Null
I'm trying to avoid using a cursor but I can't figure out how to get this working without one.
You can do this with a correlated subquery. Here is one way:
select (select count(box) from t t2 where t2.row <= t.row) as row,
       box
from t
order by t.row;
This is counting the number of valid box values up to a given row.
In SQL Server 2012, you can do this with a cumulative count():
select count(box) over (order by row) as row, box
from t
order by t.row;
These assume that row is set as in the question. If row does not start with those values, then you have a problem. SQL tables are inherently unordered, and you need some column to specify the ordering.
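The cumulative count can be checked against the sample data with a minimal sketch that runs it in an in-memory SQLite database from Python. It assumes a SQLite build with window-function support (3.25 or later); the alias new_row is made up for illustration, and the column name row is quoted because ROW is a keyword in SQLite.

```python
import sqlite3

# Sample data from the question: (Row, Box); None stands for NULL
DATA = [(1, 5), (2, 3), (3, 1), (4, None), (5, None),
        (6, 2), (7, 8), (8, None), (9, None)]

# COUNT(box) skips NULLs, so the running count only grows on non-NULL boxes.
QUERY = """
SELECT COUNT(box) OVER (ORDER BY "row") AS new_row, box
FROM t
ORDER BY "row"
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("row" INTEGER, box INTEGER)')
conn.executemany("INSERT INTO t VALUES (?, ?)", DATA)
renumbered = conn.execute(QUERY).fetchall()
conn.close()

for new_row, box in renumbered:
    print(new_row, box)
```

The result reproduces the desired output from the question: the numbering advances on 5, 3, 1, 2 and 8 and stays put on the NULL rows.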