Redshift SQL - Histogram With Equal Width Bins

I currently have a table like so:
ID    Count Value
1     45
2     24
3     13
4     67
5     3
6     21
...   ...
Does anyone know how to create a table that I can use to build a histogram with equal-width bins going from min to max?
The end result would look something like this:
Bin of Values    Count (of IDs)
min-5            3
6-10             20
11-15            5
16-20            2
21-25            35
...              ...
(max-5)-max      1
I have used width_bucket in the past, but Redshift does not support the function. Any help would be greatly appreciated. Thank you!

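You can use a CASE expression in Redshift. It's a bit more laborious than width_bucket, but the results can be the same. BETWEEN is inclusive on both ends, so the ranges below are written so that each value matches exactly one bin and the labels agree with the boundaries:
select
    case
        when val between 0 and 5 then '0-5'
        when val > 5 and val <= 10 then '6-10'
        when val > 10 and val <= 15 then '11-15'
        when val > 15 and val <= 20 then '16-20'
        ...
    end as bin,
    count(1) as id_count
from my_table
group by 1;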
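
When the range is wide, writing every CASE branch by hand gets tedious. As a hedged alternative, the bin can be computed arithmetically from the minimum and maximum. The sketch below reuses the my_table and val names from the answer above; the bin count of 10 is a hypothetical choice, val is assumed to be numeric, and max(val) is assumed to be greater than min(val) (otherwise the division fails). It is not something the original answer proposes.

```sql
-- Sketch: equal-width bins from min(val) to max(val), computed rather than hand-written.
-- Assumptions: table my_table with a numeric column val; 10 bins (hypothetical);
-- max(val) > min(val), so the bin width is non-zero.
with bounds as (
    select min(val) as lo, max(val) as hi, 10 as n_bins
    from my_table
),
binned as (
    select
        -- 0-based bin index. least(...) keeps the maximum value in the
        -- last bin instead of opening an extra bin for it.
        least(
            floor((t.val - b.lo) * b.n_bins::float / (b.hi - b.lo)),
            b.n_bins - 1
        ) as bin_idx,
        b.lo, b.hi, b.n_bins
    from my_table t
    cross join bounds b
)
select
    lo + bin_idx * (hi - lo)::float / n_bins       as bin_start,
    lo + (bin_idx + 1) * (hi - lo)::float / n_bins as bin_end,
    count(*)                                       as id_count
from binned
group by 1, 2
order by 1;
```

Bins that contain no rows do not appear in the output; if empty bins must be listed, the result would need to be joined against a generated list of bin indexes.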

Related

Burndown analysis in SQL Server Management Studio

I'm trying to prepare my data to create a burndown visual. As you can see the Rate column isn't simply A - B, as it carries forward the previous value if B is null.
I've tried some case statements using lag and sums, but to no avail.
Some direction on the case statement or an optimal solution would be ideal.
For example, this is how my data looks:
ID  A   B
1   20  NULL
2   20  3
3   20  NULL
4   20  7
5   20  NULL
6   20  NULL
7   20  NULL
8   20  5
9   20  7
And I want a rate column that looks like this:
ID  A   B     Rate
1   20  NULL  20
2   20  3     17
3   20  NULL  17
4   20  7     10
5   20  NULL  10
6   20  NULL  10
7   20  NULL  10
8   20  5     5
9   20  7     -2
Thanks to @Larnu for the guidance.
Here is the solution when your data is partitioned by some group ID and ordered by some date or row ID. The rate is COL_A minus the running total of COL_B, with NULLs counted as 0:
SELECT
    GROUP_ID,
    ROW_ID,
    COL_A,
    COL_B,
    COL_A - SUM(ISNULL(COL_B, 0)) OVER (
        PARTITION BY GROUP_ID
        ORDER BY ROW_ID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RATE
FROM your_table;
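
To see the running-total idea on the question's own numbers, here is a minimal T-SQL sketch that inlines the sample rows. The CTE name sample_data is hypothetical, and because the sample has no group column, the window has no PARTITION BY.

```sql
-- T-SQL sketch: the question's sample rows as an inline table.
-- sample_data is a hypothetical name; columns follow the question (ID, A, B).
WITH sample_data (ID, A, B) AS (
    SELECT ID, A, B
    FROM (VALUES
        (1, 20, CAST(NULL AS int)), (2, 20, 3), (3, 20, NULL),
        (4, 20, 7), (5, 20, NULL), (6, 20, NULL),
        (7, 20, NULL), (8, 20, 5), (9, 20, 7)
    ) AS v (ID, A, B)
)
SELECT
    ID, A, B,
    -- A minus the cumulative sum of B so far; NULL B values count as 0,
    -- which is what carries the previous rate forward.
    A - SUM(ISNULL(B, 0)) OVER (
        ORDER BY ID
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS Rate
FROM sample_data
ORDER BY ID;
```

The cumulative sums of B are 0, 3, 3, 10, 10, 10, 10, 15, 22, so Rate comes out as 20, 17, 17, 10, 10, 10, 10, 5, -2, which matches the desired column in the question.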

Group rows using the cumulative sum of a third column

I have a table with two columns:
sort_column = A column I use for sorting
value_column = My metric of interest (a positive integer)
Using SQL, I need to create contiguous groups of rows, ordered by sort_column, such that the sum of value_column within each group is the largest possible but staying below 100 (100 not included).
Find below an example of my desired result.
Thanks
sort_column  value_column  desired_result
1            53            1
2            25            1
3            33            2
4            25            2
5            10            2
6            46            3
7            9             3
8            49            4
9            48            4
10           53            5
11           33            5
12           52            6
13           29            6
14           16            6
15           66            7
16           1             7
17           62            8
18           57            9
19           47            10
20           12            10
OK, so after a few lengthy attempts, I came to the conclusion that the task is impossible with plain, non-recursive SQL. A given value of the desired column depends on previous values of that same column, in a way that cannot be derived from the first two columns alone. The problem therefore cannot be tackled without a recursive CTE, which BigQuery does not support.
I solved the issue by writing a javascript UDF for the task. It seems to be working fine and produces the expected results.
Many thanks everyone!
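
The dependency described in this answer can be made concrete: each row either joins the current group (if the running sum stays below 100) or opens a new one. The following is a minimal sketch of that rule as a recursive CTE in PostgreSQL-style SQL. It is not the author's JavaScript UDF, which is not shown in the answer. The table name t, the assumption that sort_column is a gap-free sequence starting at 1, and the assumption that every value_column is below 100 are mine; whether a given engine accepts WITH RECURSIVE needs checking.

```sql
-- Sketch (PostgreSQL-style): greedy contiguous grouping with a running sum below 100.
-- Assumptions: hypothetical table t(sort_column, value_column); sort_column runs
-- 1, 2, 3, ... without gaps; every value_column is below 100.
with recursive grouped as (
    -- Anchor: the first row opens group 1.
    select sort_column,
           value_column,
           value_column as running_sum,
           1            as group_id
    from t
    where sort_column = 1

    union all

    -- Step: extend the current group while the sum stays below 100,
    -- otherwise start a new group with this row.
    select t.sort_column,
           t.value_column,
           case when g.running_sum + t.value_column < 100
                then g.running_sum + t.value_column
                else t.value_column
           end,
           case when g.running_sum + t.value_column < 100
                then g.group_id
                else g.group_id + 1
           end
    from grouped g
    join t on t.sort_column = g.sort_column + 1
)
select sort_column, value_column, group_id as desired_result
from grouped
order by sort_column;
```

Traced by hand against the sample data, this rule gives the groups listed in desired_result. For example, rows 1 and 2 sum to 78, and adding row 3 would reach 111, so row 3 opens group 2.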

SQL - Select rows after reaching minimum value/threshold

I'm using SQL Server Management Studio. My data set is as below.
ID Days Value Threshold
A 1 10 30
A 2 20 30
A 3 34 30
A 4 25 30
A 5 20 30
B 1 5 15
B 2 10 15
B 3 12 15
B 4 17 15
B 5 20 15
I want to run a query so only rows after the threshold has been reached are selected for each ID. Also, I want to create a new days column starting at 1 from where the rows are selected. The expected output for the above dataset will look like
ID Days Value Threshold NewDayColumn
A 3 34 30 1
A 4 25 30 2
A 5 20 30 3
B 4 17 15 1
B 5 20 15 2
It doesn't matter if the data goes below the threshold in later rows. I want to take the first row where the threshold is crossed as day 1 and continue counting rows for that ID.
Thank you!
You can use window functions for this. Here is one method:
select t.*,
       row_number() over (partition by id order by days) as newDayColumn
from (select t.*,
             min(case when value > threshold then days end)
                 over (partition by id) as threshold_days
      from t
     ) t
where days >= threshold_days;
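
As a hedged illustration, the same query can be run against the question's sample rows inlined as a CTE. This is a T-SQL sketch; the subquery alias s is my own naming, chosen to avoid reusing t for two different things.

```sql
-- T-SQL sketch: the question's sample data inlined so the query is self-contained.
WITH t (ID, Days, Value, Threshold) AS (
    SELECT ID, Days, Value, Threshold
    FROM (VALUES
        ('A', 1, 10, 30), ('A', 2, 20, 30), ('A', 3, 34, 30),
        ('A', 4, 25, 30), ('A', 5, 20, 30),
        ('B', 1, 5, 15),  ('B', 2, 10, 15), ('B', 3, 12, 15),
        ('B', 4, 17, 15), ('B', 5, 20, 15)
    ) AS v (ID, Days, Value, Threshold)
)
SELECT s.ID, s.Days, s.Value, s.Threshold,
       -- Numbering happens after the WHERE filter, so it restarts at 1
       -- on the first row that crossed the threshold.
       ROW_NUMBER() OVER (PARTITION BY s.ID ORDER BY s.Days) AS NewDayColumn
FROM (
    SELECT t.*,
           -- Earliest day per ID on which Value exceeded Threshold.
           MIN(CASE WHEN Value > Threshold THEN Days END)
               OVER (PARTITION BY ID) AS threshold_days
    FROM t
) AS s
WHERE s.Days >= s.threshold_days
ORDER BY s.ID, s.Days;
```

For ID A the first crossing is day 3 (34 > 30) and for ID B it is day 4 (17 > 15), so the rows kept are A 3-5 and B 4-5, as in the expected output. An ID that never crosses its threshold has a NULL threshold_days and returns no rows.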

Select every ten steps SQL

I have the following table:
----------------------------------------------
oNumber oValue1
----------------------------------------------
1 54
2 44
3 89
4 65
ff.
10 33
11 22
ff.
20 43
21 76
ff.
100 45
I want to select every 10th value in oNumber. So the result should be:
----------------------------------------------
oNumber oValue1
----------------------------------------------
10 33
20 43
ff.
100 45
Also, oNumber is not a sequence number. It's just a value. Even though it isn't a sequence number, 10, 20, 30 and so on will always appear in the oNumber field.
Does anyone know the T-SQL for this case?
Thank you.
select * from table where oNumber % 10 = 0
https://msdn.microsoft.com/en-us/library/ms190279.aspx
Use the "Modulo" operator - %. So in this case, the answer would be something like:
SELECT * FROM table WHERE oNumber % 10 = 0
This returns a row only if oNumber is divisible by ten (and therefore has a remainder of zero).
In the case you simply want multiples of 10, then just use the modulo operator as stated by Daniel and Ian.
select *
from table
where oNumber % 10 = 0;
However, I felt that you could be alluding to the fact that you want to get every 10th item in your list. If that's the case, which it may not be, you would number the rows in oNumber order and apply the modulo operator to that row number instead.
select *
from (
    select *,
           RowNum = row_number() over (order by oNumber)
    from table
) a
where RowNum % 10 = 0;
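
The two readings only differ when oNumber has gaps. Here is a small T-SQL sketch with made-up values (5, 10, 15, ..., 60, which are not from the question) that flags both conditions side by side.

```sql
-- T-SQL sketch with hypothetical data: oNumber in steps of 5.
SELECT
    oNumber,
    -- 1 when the value itself is a multiple of 10
    CASE WHEN oNumber % 10 = 0 THEN 1 ELSE 0 END AS is_multiple_of_10,
    -- 1 when the row is the 10th, 20th, ... in oNumber order
    CASE WHEN ROW_NUMBER() OVER (ORDER BY oNumber) % 10 = 0
         THEN 1 ELSE 0 END AS is_every_10th_row
FROM (VALUES (5), (10), (15), (20), (25), (30),
             (35), (40), (45), (50), (55), (60)) AS v (oNumber)
ORDER BY oNumber;
```

With these twelve rows, is_multiple_of_10 is set for 10, 20, 30, 40, 50 and 60, while is_every_10th_row is set only for 50, the tenth row. Since the question says 10, 20, 30 and so on always appear in oNumber, the plain modulo filter is the one that matches it.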

SQL - Retrieve Closest Lower Value

When there is no exact match on a column value, I would like to retrieve the pay for the closest lower value.
For instance, 10 yearsOfService should return 650.00 and 14 yearsOfService should return 840.00 from the incentive table below:
ID Pay yearsOfService
1 125.00 0
2 156.00 2
3 188.00 3
4 206.00 4
5 650.00 6
6 840.00 14
7 585.00 22
8 495.00 23
9 385.00 24
10 250.00 25
I have tried several different approaches; including:
SELECT TOP 1 (pay) as incentivePay
FROM incentive
WHERE yearsOfService = '10'
This works but only for yearsOfService that match.
With 10 yearsOfService:
RESULTSET = [1 650.00]
Any ideas?
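Please try the following. It keeps only the rows at or below the requested yearsOfService, sorts them from highest to lowest, and takes the first one. The value is compared as a number rather than as the string '10', so the comparison does not depend on an implicit conversion:
SELECT TOP 1 pay AS incentivePay
FROM incentive
WHERE yearsOfService <= 10
ORDER BY yearsOfService DESC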
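
A minimal T-SQL sketch of the same lookup against the question's incentive rows, inlined as a CTE so it is self-contained. The variable @years is a hypothetical parameter name, and yearsOfService is assumed to be an integer column.

```sql
-- T-SQL sketch: closest lower-or-equal match on yearsOfService.
DECLARE @years int = 10;  -- hypothetical lookup value

WITH incentive (ID, pay, yearsOfService) AS (
    SELECT ID, pay, yearsOfService
    FROM (VALUES
        (1, 125.00, 0),  (2, 156.00, 2),  (3, 188.00, 3),
        (4, 206.00, 4),  (5, 650.00, 6),  (6, 840.00, 14),
        (7, 585.00, 22), (8, 495.00, 23), (9, 385.00, 24),
        (10, 250.00, 25)
    ) AS v (ID, pay, yearsOfService)
)
SELECT TOP 1 pay AS incentivePay
FROM incentive
WHERE yearsOfService <= @years   -- candidates at or below the target
ORDER BY yearsOfService DESC;    -- the closest one sorts first
```

With @years = 10 the candidates are the rows for 0, 2, 3, 4 and 6 years, so the row for 6 years is returned (650.00). With @years = 14 there is an exact match and the result is 840.00.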