Please help me with a SQL Server query that can bucket data dynamically into ranges.
Here is my source data:
Value
=======
45
33.5
33.1
33
32.8
25.3
25.2
25.1
25
21.3
21.2
21.1
20.9
12.3
12.2
12.15
12.1
12
11.8
Expected output:
Value Rank
=============
45 1
(mean value in this range is 45)
33.5 2
33.1 2
33 2
32.8 2
(mean value is 33.1 - any value in the range (-10%) 29.79 to 36.41 (+10%) should be given a rank of 2)
25.3 3
25.2 3
25.1 3
25 3
21.3 4
21.2 4
21.1 4
20.9 4
12.3 5
12.2 5
12.15 5
12.1 5
12 5
11.8 5
DENSE_RANK, RANK and NTILE do not seem to give me a ranking like this. The ranges are dynamic and not known in advance. Any help highly appreciated.
The bucketing rule is:
Each bucket contains the values that fall within 10% of the bucket's mean value.
Here's one way:
select val, dense_rank() over (order by cast(val/10 as int) desc) ntile
from yourtable
Use dense_rank but specify your buckets in the order by clause. (I'm assuming this is how it works for your sample data)
First, convert the value to a number with 2 decimal places.
Then use a CASE expression to apply either FLOOR or ROUND, based on the first digit after the decimal point.
Finally, use DENSE_RANK to assign a rank based on the rounded value.
Query
select z.[Value],
       dense_rank() over (order by z.[val_rounded] desc) as [Rank]
from (
    select t.[Value],
           case when substring(t.[Value2], charindex('.', t.[Value2], 1) + 1, 1) > 5
                then round(t.[Value], 0)
                else floor(t.[Value])
           end as [val_rounded]
    from (
        select [Value],
               cast(cast([Value] as decimal(6, 2)) as varchar(50)) as [Value2]
        from [your_table_name]
    ) t
) z;
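As a runnable sketch, here is the rounding idea ported to SQLite via Python's sqlite3 (window functions need SQLite 3.25+). The table name `readings` is made up for the demo, and the string-splitting CASE is replaced by arithmetic on the fractional part, which has the same effect for positive values.

```python
import sqlite3

# Round each value to the nearest integer, flooring an exact .5 (same
# behavior as the original FLOOR/ROUND CASE), then DENSE_RANK the result.
# Works for positive values only, since CAST truncates toward zero.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?)",
    [(v,) for v in [45, 33.5, 33.1, 33, 32.8, 25.3, 25.2, 25.1, 25,
                    21.3, 21.2, 21.1, 20.9, 12.3, 12.2, 12.15, 12.1, 12, 11.8]],
)
rows = conn.execute("""
    SELECT value,
           DENSE_RANK() OVER (
               ORDER BY CASE WHEN value - CAST(value AS INTEGER) > 0.5
                             THEN CAST(value AS INTEGER) + 1
                             ELSE CAST(value AS INTEGER)
                        END DESC
           ) AS rnk
    FROM readings
""").fetchall()
for value, rnk in rows:
    print(value, rnk)
```

This reproduces the asker's expected ranking (45 → 1, the 33-ish group → 2, and so on down to the 12-ish group → 5).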
I come to you today because I'm struggling with a query that involves the LAG function (FYI, I am using PostgreSQL).
I have a table that contains the quantities of a product sold by country to another one on a monthly basis. The table is defined like this:
create table market_research.test_tonnage(
origin text, -- Origin country
desti text, -- Destination country
yr int, -- Year
mt int, -- Month
q numeric -- quantity sold (always > 0)
)
Here is the content:
origin   desti    yr    mt   q
------   ------   ----  ---  ----
toto     coucou   2019  1    1.4
toto     coucou   2019  2    2.5
toto     coucou   2019  3    1.2
tata     yoyo     2018  11   5.4
tata     yoyo     2018  12   5.5
tata     yoyo     2019  1    5.2
I am trying to create a view that will add 2 calculated fields, as follows:
beginning_stock: initial value of 0, then beginning_stock = ending_stock of the previous month
ending_stock: ending_stock = beginning_stock - q
origin   desti    yr    mt   q     beginning_stock   ending_stock
------   ------   ----  ---  ----  ---------------   ------------
toto     coucou   2019  1    1.4   0                 -1.4
toto     coucou   2019  2    2.5   -1.4              -3.9
toto     coucou   2019  3    1.2   -3.9              -5.1
tata     yoyo     2018  11   5.4   0                 -5.4
tata     yoyo     2018  12   5.5   -5.4              -10.9
tata     yoyo     2019  1    5.2   -10.9             -16.1
I have tried many queries using the LAG function, but I think the problem comes from the sequential nature of the calculation over time. Here is an example of my attempt:
select origin,
desti,
yr,
mt,
q,
COALESCE(lag(ending_stock, 1) over (partition by origin order by yr, mt), 0) beginning_stock,
beginning_stock - q ending_stock
from market_research.test_tonnage
Thank you for your help!
Max
You need a cumulative SUM() function instead of LAG():
demo:db<>fiddle
SELECT
*,
SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt) + q as beginning, -- 2
SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt) as ending -- 1
FROM my_table
Summing all quantities up to and including the current row (negated, since you want negative values) gives the current running total (ending).
The same sum without the current value (add q back, because the SUM() already subtracted it) gives the beginning.
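The cumulative SUM() answer can be sketched end to end under SQLite via Python's sqlite3 (window semantics are the same as PostgreSQL's for this query):

```python
import sqlite3

# Running totals per origin: SUM(-q) up to the current row is the ending
# stock; adding q back gives the beginning stock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_tonnage (origin TEXT, desti TEXT, yr INT, mt INT, q REAL)")
conn.executemany(
    "INSERT INTO test_tonnage VALUES (?, ?, ?, ?, ?)",
    [("toto", "coucou", 2019, 1, 1.4),
     ("toto", "coucou", 2019, 2, 2.5),
     ("toto", "coucou", 2019, 3, 1.2),
     ("tata", "yoyo", 2018, 11, 5.4),
     ("tata", "yoyo", 2018, 12, 5.5),
     ("tata", "yoyo", 2019, 1, 5.2)],
)
rows = conn.execute("""
    SELECT origin, yr, mt, q,
           SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt) + q AS beginning_stock,
           SUM(-q) OVER (PARTITION BY origin ORDER BY yr, mt)     AS ending_stock
    FROM test_tonnage
    ORDER BY origin DESC, yr, mt
""").fetchall()
for r in rows:
    print(r)
```

The output matches the asker's expected table, e.g. toto's last month ends at -5.1 and tata's at -16.1.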
I would like to create a column that totals the hours based on the store column and the hours column (see the table below). It should total up rep1, rep2, rep3 from store 142 and rep1, rep2 from store 356. Then I would also like to divide hours by total to get a contribution % column.
Date store rep hours total cont%
--------------------------------------------------
x 142 rep1 5 11 0.45
x 142 rep2 2 11 0.18
x 142 rep3 4 11 0.36
x 356 rep1 4 7 0.57
x 356 rep2 3 7 0.42
Thank you!
You want window functions:
select t.*, sum(hours) over (partition by store) as total,
t.hours * 1.0 / sum(hours) over (partition by store) as cont_percent
from t;
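A minimal sketch of that answer under SQLite (table and column names assumed from the question; the OVER (PARTITION BY) semantics are the same as SQL Server's):

```python
import sqlite3

# Per-store total via a windowed SUM, plus each rep's share of it.
# Multiplying by 1.0 forces floating-point division, as in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (store INT, rep TEXT, hours REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(142, "rep1", 5), (142, "rep2", 2), (142, "rep3", 4),
                  (356, "rep1", 4), (356, "rep2", 3)])
rows = conn.execute("""
    SELECT store, rep, hours,
           SUM(hours) OVER (PARTITION BY store) AS total,
           hours * 1.0 / SUM(hours) OVER (PARTITION BY store) AS cont_percent
    FROM t
""").fetchall()
for r in rows:
    print(r)
```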
Consider this dataset:
id site_id type_id value date
------- ------- ------- ------- -------------------
1 1 1 50 2017-08-09 06:49:47
2 1 2 48 2017-08-10 08:19:49
3 1 1 52 2017-08-11 06:15:00
4 1 1 45 2017-08-12 10:39:47
5 1 2 40 2017-08-14 10:33:00
6 2 1 30 2017-08-09 07:25:32
7 2 2 32 2017-08-12 04:11:05
8 3 1 80 2017-08-09 19:55:12
9 3 2 75 2017-08-13 02:54:47
10 2 1 25 2017-08-15 10:00:05
I would like to construct a query that returns a running total for each date, by type. I can get close with a window function, but I only want the latest value for each site to be summed into the running total (a simple window function will not work because it sums all values up to a date, not just the last value for each site). So I guess it could be better described as a running distinct total?
The result I'm looking for would be like this:
type_id date sum
------- ------------------- -------
1 2017-08-09 06:49:47 50
1 2017-08-09 07:25:32 80
1 2017-08-09 19:55:12 160
1 2017-08-11 06:15:00 162
1 2017-08-12 10:39:47 155
1 2017-08-15 10:00:05 150
2 2017-08-10 08:19:49 48
2 2017-08-12 04:11:05 80
2 2017-08-13 02:54:47 155
2 2017-08-14 10:33:00 147
The key here is that the sum is not a running sum. It should only be the sum of the most recent values for each site, by type, at each date. I think I can help explain it by walking through the result set I've provided above. For my explanation, I'll walk through the original data chronologically and try to explain the expected result.
The first row of the result starts us off, at 2017-08-09 06:49:47, where chronologically, there is only one record of type 1 and it is 50, so that is our sum for 2017-08-09 06:49:47.
The second row of the result is at 2017-08-09 07:25:32, at this point in time we have 2 unique sites with values for type_id = 1. They have values of 50 and 30, so the sum is 80.
The third row of the result occurs at 2017-08-09 19:55:12, where now we have 3 sites with values for type_id = 1. 50 + 30 + 80 = 160.
The fourth row is where it gets interesting. At 2017-08-11 06:15:00 there are 4 records with a type_id = 1, but 2 of them are for the same site. I'm only interested in the most recent value for each site so the values I'd like to sum are: 30 + 80 + 52 resulting in 162.
The 5th row is similar to the 4th since the value for site_id:1, type_id:1 has changed again and is now 45. This results in the latest values for type_id:1 at 2017-08-12 10:39:47 are now: 30 + 80 + 45 = 155.
Reviewing the 6th row is also interesting when we consider that at 2017-08-15 10:00:05, site 2 has a new value for type_id 1, which gives us: 80 + 45 + 25 = 150 for 2017-08-15 10:00:05.
You can get a cumulative total (running total) by including an ORDER BY clause in your window frame.
select
type_id,
date,
sum(value) over (partition by type_id order by date) as sum
from your_table;
The ORDER BY works because
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
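The quoted default can be checked directly. Under SQLite (which shares these defaults with PostgreSQL), spelling the frame out gives identical results; the demo table loosely follows the question's data:

```python
import sqlite3

# An ORDER BY with no explicit frame behaves exactly like
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (type_id INT, value INT, date TEXT)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)",
                 [(1, 50, "2017-08-09 06:49:47"),
                  (1, 30, "2017-08-09 07:25:32"),
                  (1, 80, "2017-08-09 19:55:12"),
                  (2, 48, "2017-08-10 08:19:49"),
                  (2, 32, "2017-08-12 04:11:05")])
rows = conn.execute("""
    SELECT type_id, date,
           SUM(value) OVER (PARTITION BY type_id ORDER BY date) AS implicit_frame,
           SUM(value) OVER (PARTITION BY type_id ORDER BY date
                            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame
    FROM obs
""").fetchall()
for r in rows:
    print(r)
```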
SELECT type_id,
date,
SUM(value) OVER (PARTITION BY type_id ORDER BY type_id, date) - (SUM(value) OVER (PARTITION BY type_id, site_id ORDER BY type_id, date) - value) AS sum
FROM your_table
ORDER BY type_id,
date
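As an alternative sketch (not taken from the answers above): since only each site's most recent value should count at any event date, a correlated subquery can pick the latest row per site and sum those directly. Table name `obs` is made up; columns follow the question. This reproduces the asker's expected output exactly:

```python
import sqlite3

# For each event row d, sum only those rows x that are the latest
# observation for their (type_id, site_id) as of d.date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INT, site_id INT, type_id INT, value INT, date TEXT)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 50, "2017-08-09 06:49:47"),
                  (2, 1, 2, 48, "2017-08-10 08:19:49"),
                  (3, 1, 1, 52, "2017-08-11 06:15:00"),
                  (4, 1, 1, 45, "2017-08-12 10:39:47"),
                  (5, 1, 2, 40, "2017-08-14 10:33:00"),
                  (6, 2, 1, 30, "2017-08-09 07:25:32"),
                  (7, 2, 2, 32, "2017-08-12 04:11:05"),
                  (8, 3, 1, 80, "2017-08-09 19:55:12"),
                  (9, 3, 2, 75, "2017-08-13 02:54:47"),
                  (10, 2, 1, 25, "2017-08-15 10:00:05")])
rows = conn.execute("""
    SELECT d.type_id, d.date,
           (SELECT SUM(x.value)
            FROM obs x
            WHERE x.type_id = d.type_id
              AND x.date = (SELECT MAX(y.date) FROM obs y
                            WHERE y.type_id = x.type_id
                              AND y.site_id = x.site_id
                              AND y.date <= d.date)) AS latest_sum
    FROM obs d
    ORDER BY d.type_id, d.date
""").fetchall()
for r in rows:
    print(r)
```

The nested correlation is O(n²)-ish, so on large tables a LATERAL join or a `ROW_NUMBER()`-based latest-row derivation would scale better.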
I have a table ScheduleRotationDetail that contains these as columns:
ScheduleRotationID ScheduleID Ordinal Duration
379 61 1 1
379 379 2 20
379 512 3 1
379 89 4 20
I have a query that goes like this in order to get the day of the year each schedule is supposed to start on:
SELECT ScheduleID, Ordinal, Duration
    ,Duration * 7 AS DurationDays
,( SELECT ( ISNULL( SUM(ISNULL( Duration, 0 )), 0 ) - 1 ) * 7
FROM ScheduleRotationDetail WHERE ScheduleRotationID = srd.ScheduleRotationID
AND Ordinal <= srd.Ordinal ) AS StartDay
FROM ScheduleRotationDetail srd
WHERE srd.ScheduleRotationID = 379
That outputs this as the result set:
ScheduleID Ordinal Duration DurationDays StartDay
61 1 1 7 0
379 2 20 140 140
512 3 1 7 147
89 4 20 140 287
Yet what I need the start day column values to be are:
0
7
147
154
I have tried CTEs but can't get it to work so I've come to here for advice.
It looks like you want a cumulative sum. In SQL Server 2012+, you can do:
SELECT ScheduleID, Ordinal, Duration,
SUM(Duration*7) OVER (ORDER BY Ordinal) - Duration*7 as StartDay
FROM ScheduleRotationDetail srd ;
In earlier versions, you can use APPLY for this purpose (or a correlated subquery).
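A runnable sketch of that cumulative sum, using SQLite in place of SQL Server 2012+ (the window arithmetic is identical). With multiple rotations in the table, the window would also need PARTITION BY ScheduleRotationID:

```python
import sqlite3

# Cumulative sum of DurationDays up to and including the current row,
# minus the current row's own DurationDays, gives each row's StartDay.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ScheduleRotationDetail "
             "(ScheduleRotationID INT, ScheduleID INT, Ordinal INT, Duration INT)")
conn.executemany("INSERT INTO ScheduleRotationDetail VALUES (?, ?, ?, ?)",
                 [(379, 61, 1, 1), (379, 379, 2, 20),
                  (379, 512, 3, 1), (379, 89, 4, 20)])
rows = conn.execute("""
    SELECT ScheduleID, Ordinal, Duration,
           SUM(Duration * 7) OVER (ORDER BY Ordinal) - Duration * 7 AS StartDay
    FROM ScheduleRotationDetail
    WHERE ScheduleRotationID = 379
    ORDER BY Ordinal
""").fetchall()
for r in rows:
    print(r)
```

This yields the StartDay sequence the asker wanted: 0, 7, 147, 154.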
My question is similar to how to generate Serial numbers +Add 1 in select statement
But I need the sequence as below, in Oracle SQL.
table 1 data:
facility store stop_seq
32 729 1
32 380 2
32 603 3
12 722 4
12 671 5
48 423 6
I need result as below:
facility res_seq
32 1
12 2
48 3
Here res_seq should be ordered based on stop_seq in table 1.
Please help!
select facility, row_number() over(order by max(stop_seq)) res_seq
from your_tab group by facility;
ROW_NUMBER is explained in the link posted in the question
Analytic functions are performed after GROUP BY, so in this query the data is aggregated by facility first, and then the row numbers are assigned.
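The same result can be sketched under SQLite; here the GROUP BY is pushed into a subquery for portability, which is equivalent since the analytic function runs over the aggregated rows either way:

```python
import sqlite3

# Aggregate to one MAX(stop_seq) per facility, then number the
# facilities in that order with ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_tab (facility INT, store INT, stop_seq INT)")
conn.executemany("INSERT INTO your_tab VALUES (?, ?, ?)",
                 [(32, 729, 1), (32, 380, 2), (32, 603, 3),
                  (12, 722, 4), (12, 671, 5), (48, 423, 6)])
rows = conn.execute("""
    SELECT facility,
           ROW_NUMBER() OVER (ORDER BY max_seq) AS res_seq
    FROM (SELECT facility, MAX(stop_seq) AS max_seq
          FROM your_tab
          GROUP BY facility) m
""").fetchall()
for facility, res_seq in rows:
    print(facility, res_seq)
```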