Grouping by a given number of row values - sql

I have a list of values in column 1. For each row I want the sum of the next 5 (or 6) row values, as shown below, and to populate the result in the appropriate column.
For example, the first-row value of the column 'next-5row-value' would be the sum of the values from the current row through the next 5 rows, which would be 9, and the next column would be the sum of the next 5 row values from that reference point.
I am trying to write functions that loop through the rows to arrive at the sum. Is there a more efficient way? Can someone help me out? I am using Postgres (Greenplum). Thanks!

For example, if you have this simple table:
sebpa=# \d numbers
Table "public.numbers"
Column | Type | Modifiers
--------+---------+------------------------------------------------------
id | integer | not null default nextval('numbers_id_seq'::regclass)
i | integer |
sebpa=# select * from numbers limit 15;
id | i
------+---
3001 | 3
3002 | 0
3003 | 5
3004 | 1
3005 | 1
3006 | 4
3007 | 1
3008 | 1
3009 | 4
3010 | 0
3011 | 4
3012 | 0
3013 | 3
3014 | 2
3015 | 1
(15 rows)
you can use this SQL:
sebpa=# select id, i, sum(i) over( order by id rows between 0 preceding and 4 following) from numbers;
id | i | sum
------+---+-----
3001 | 3 | 10
3002 | 0 | 11
3003 | 5 | 12
3004 | 1 | 8
3005 | 1 | 11
3006 | 4 | 10
3007 | 1 | 10
3008 | 1 | 9
3009 | 4 | 11
3010 | 0 | 9
3011 | 4 | 10
3012 | 0 | 10
3013 | 3 | 13
3014 | 2 | 15
3015 | 1 | 17
3016 | 4 | 20
3017 | 3 | 17
-- output truncated
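A minimal sketch of the same sliding-frame sum, run through Python's built-in sqlite3 module so it can be tried without a Postgres server. It assumes the bundled SQLite is recent enough to support window functions and that SQLite's frame semantics match Postgres for this query; the 15 values are the ones listed in the sample table, and the alias next5 is only illustrative.

```python
import sqlite3

# Sample values copied from the 15 rows of the example table (ids 3001..3015).
VALUES = [3, 0, 5, 1, 1, 4, 1, 1, 4, 0, 4, 0, 3, 2, 1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, i INTEGER)")
conn.executemany(
    "INSERT INTO numbers (id, i) VALUES (?, ?)",
    [(3001 + n, v) for n, v in enumerate(VALUES)],
)

# Frame = the current row plus the 4 rows after it, in id order.
# "CURRENT ROW" is equivalent to "0 PRECEDING" in the query above.
window_sums = conn.execute(
    """
    SELECT id, i,
           SUM(i) OVER (ORDER BY id
                        ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS next5
    FROM numbers
    ORDER BY id
    """
).fetchall()

for row in window_sums[:3]:
    print(row)
```

Near the end of the table the frame simply shrinks, so the last rows are summed over fewer than five values instead of becoming NULL.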

You can try something like this:
SELECT V,
SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING) AS next_5row_value,
SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 6 FOLLOWING) AS next_6row_value
FROM YourTable;
With these frames, rows that have no following rows get NULL. If you don't want NULLs in your "next_5row_value" column (and likewise for the 6-row column), you can use the COALESCE function (supported in PostgreSQL too), which returns its first non-null argument.
Something like this:
SELECT V,
COALESCE(SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING), 0) AS next_5row_value,
COALESCE(SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 6 FOLLOWING), 0) AS next_6row_value
FROM YourTable;

I don't think Postgres 8.4 supports the full functionality of window frames. You can do this by using lead():
select value,
(value +
lead(value, 1) over (order by id) +
lead(value, 2) over (order by id) +
lead(value, 3) over (order by id) +
lead(value, 4) over (order by id)
) as next5,
(value +
lead(value, 1) over (order by id) +
lead(value, 2) over (order by id) +
lead(value, 3) over (order by id) +
lead(value, 4) over (order by id) +
lead(value, 5) over (order by id)
) as next6
from table t;
Using a window frame definition is definitely a better approach if the database supports it. But the above will also work.
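One thing to watch with the lead() approach: lead() returns NULL once it runs past the last row, and NULL makes the whole addition NULL. The sketch below shows this with Python's sqlite3 module as a stand-in for Postgres (an assumption; lead() and COALESCE behave the same way there as far as I know). It uses a shortened 3-row window and made-up table contents purely to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(n + 1, v) for n, v in enumerate([3, 0, 5, 1, 1, 4])],
)

# raw_next3: plain lead() sum, NULL for the last two rows.
# safe_next3: each lead() wrapped in COALESCE so missing rows count as 0.
lead_sums = conn.execute(
    """
    SELECT id,
           value
             + LEAD(value, 1) OVER (ORDER BY id)
             + LEAD(value, 2) OVER (ORDER BY id) AS raw_next3,
           value
             + COALESCE(LEAD(value, 1) OVER (ORDER BY id), 0)
             + COALESCE(LEAD(value, 2) OVER (ORDER BY id), 0) AS safe_next3
    FROM t
    ORDER BY id
    """
).fetchall()

for row in lead_sums:
    print(row)
```

Whether a partial sum or NULL is the right answer for the trailing rows depends on the report; the COALESCE form just makes the choice explicit.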

Related

Postgres: Range Lookup with Auto increment

I have the following table: table1
begin | value | end
---------------------
1 | 3 | 10
1 | 5 | 10
1 | 2 | 10
1 | 7 | 10
11 | 19 | 20
11 | 16 | 20
11 | 14 | 20
I am looking for the following output:
begin | value | end | case
-----------------------------
1 | 3 | 10 | 1
1 | 5 | 10 | 1
1 | 2 | 10 | 1
1 | 7 | 10 | 1
11 | 19 | 20 | 2
11 | 16 | 20 | 2
11 | 14 | 20 | 2
I want to assign a unique number for numbers falling within a particular range but I am unable to find my way around it. Any suggestions?
This is a gaps-and-islands problem. You can identify where an island starts by checking that no earlier row overlaps it. For that, you can use a cumulative max.
This gets you close:
select t.*,
count(*) filter (where prev_end < start) over (order by start) as grp
from (select t.*,
max(end) over (order by start range between unbounded preceding and 1 preceding) as prev_end
from t
) t;
However, the ties in the data mean that this has gaps. So, one more level:
select t.*, dense_rank() over (order by grp) as sequential_grp
from (select t.*,
count(*) filter (where prev_end < start) over (order by start) as grp
from (select t.*,
max(end) over (order by start range between unbounded preceding and 1 preceding) as prev_end
from t
) t
) t;
Here is a db<>fiddle -- with the column names changed, because names like begin and end are SQL keywords and hence a bad idea for column names.
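A runnable sketch of the three-level query, using Python's sqlite3 module in place of Postgres. The assumptions: the bundled SQLite supports RANGE frames with an offset and FILTER on window aggregates (both present in recent releases), and the columns are renamed to range_start and range_end (names chosen here, not taken from the fiddle) to avoid the keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (range_start INTEGER, value INTEGER, range_end INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 3, 10), (1, 5, 10), (1, 2, 10), (1, 7, 10),
    (11, 19, 20), (11, 16, 20), (11, 14, 20),
])

cases = conn.execute(
    """
    SELECT range_start, value, range_end,
           DENSE_RANK() OVER (ORDER BY grp) AS case_no   -- close the gaps
    FROM (SELECT s.*,
                 -- count island starts seen so far (ties counted together)
                 COUNT(*) FILTER (WHERE prev_end < range_start)
                     OVER (ORDER BY range_start) AS grp
          FROM (SELECT t.*,
                       -- largest end among rows that start strictly earlier
                       MAX(range_end) OVER (
                           ORDER BY range_start
                           RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                       ) AS prev_end
                FROM t) AS s) AS g
    ORDER BY range_start
    """
).fetchall()

for row in cases:
    print(row)
```

On the sample data the intermediate grp values are 0 for the first range and 3 for the second, which is why the final DENSE_RANK step is needed to get 1 and 2.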

Select Rows Whose Sum Value = 80% of the Total

Here is an example of the business problem.
I have 10 sales that resulted in negative margin.
We want to review these records; we generally use the 20/80 rule in reviews.
That is, 20 percent of the sales will likely represent 80 percent of the negative margin.
So with the records below...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
| 5 | 5 |
| 6 | 5 |
| 7 | 2 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
+----+-------+
I would want to return...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
+----+-------+
The total of Value is 106; 80% of that is 84.8.
I need all the records, sorted in descending order, whose cumulative value gets me to at least 84.8.
We use Microsoft APS PDW SQL, but can process on SMP if needed.
Assuming window functions are supported, you can use
with cte as (select id,value
,sum(value) over(order by value desc,id) as running_sum
,sum(value) over() as total
from tbl
)
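select id,value from cte where running_sum < total*0.8
union all
-- the TOP 1 lookup needs its own ORDER BY, so it goes in a derived table;
-- an ORDER BY placed after a UNION sorts the combined result instead
select id,value
from (select top 1 id,value from cte where running_sum >= total*0.8 order by running_sum) as first_over
order by value desc, id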
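A compact variant of the running-total idea, sketched with Python's sqlite3 module rather than APS PDW (so TOP is not needed). Instead of a UNION, it keeps a row whenever the running total before that row is still below 80% of the total; this is a rephrasing of the condition, not the query from the answer above. The table name tbl and the sample values come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    list(enumerate([30, 30, 20, 10, 5, 5, 2, 2, 1, 1], start=1)),
)

top_rows = conn.execute(
    """
    WITH cte AS (
        SELECT id, value,
               -- id breaks ties so the running sum is deterministic
               SUM(value) OVER (ORDER BY value DESC, id) AS running_sum,
               SUM(value) OVER () AS total
        FROM tbl
    )
    SELECT id, value
    FROM cte
    -- running_sum - value is the total accumulated before this row
    WHERE running_sum - value < total * 0.8
    ORDER BY value DESC, id
    """
).fetchall()

print(top_rows)
```

With the sample data the threshold is 84.8; the row with id 4 is kept because the total before it (80) is still below the threshold, and it is the row that carries the sum to 90.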
One way is to use running totals:
select
id,
value
from
(
select
id,
value,
sum(value) over () as total,
sum(value) over (order by value desc) as till_here,
sum(value) over (order by value desc rows between unbounded preceding and 1 preceding)
as till_prev
from mytable
) summed_up
where till_here * 1.0 / total <= 0.8
or (till_here * 1.0 / total >= 0.8 and coalesce(till_prev, 0) * 1.0 / total < 0.8)
order by value desc;
This link could be useful, it calculates running totals:
https://www.codeproject.com/Articles/300785/Calculating-simple-running-totals-in-SQL-Server

Window running function except current row

I have a theoretical question, so I'm not interested in alternative solutions. Sorry.
Q: Is it possible to get the window running function values for all previous rows, except current?
For example:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over (partition by x order by i) - y as sum,
max(y) over (partition by x order by i) as max,
count(*) filter (where y > 2) over (partition by x order by i) as cnt
from
t;
Actual result is
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | 1 | 0
2 | 1 | 3 | 1 | 3 | 1
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | 4 | 1
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 8 | 2
(6 rows)
I want to have max and cnt columns behavior like sum column, so, result should be:
i | x | y | sum | max | cnt
---+---+---+-----+-----+-----
1 | 1 | 1 | 0 | | 0
2 | 1 | 3 | 1 | 1 | 0
3 | 1 | 2 | 4 | 3 | 1
4 | 2 | 4 | 0 | | 0
5 | 2 | 2 | 4 | 4 | 1
6 | 2 | 8 | 6 | 4 | 1
(6 rows)
It can be achieved using simple subquery like
select t.*, lag(y,1) over (partition by x order by i) as yy from t
but is it possible using only window function syntax, without subqueries?
Yes, you can. This does the trick:
with
t(i,x,y) as (
values
(1,1,1),(2,1,3),(3,1,2),
(4,2,4),(5,2,2),(6,2,8)
)
select
t.*,
sum(y) over w as sum,
max(y) over w as max,
count(*) filter (where y > 2) over w as cnt
from t
window w as (partition by x order by i
rows between unbounded preceding and 1 preceding);
The frame clause restricts the window frame to the rows you are interested in: here, every row of the partition that precedes the current row.
Note that in the sum column you'll get null rather than 0 because of the frame clause: the first row of each partition has no rows before it, so its frame is empty. You can coalesce() this away if needed.
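For a quick check without a Postgres instance, here is a sketch of the same named-window query run through Python's sqlite3 module. It assumes SQLite treats the frame and the FILTER clause the same way Postgres does for this case; the column aliases sum_prev, max_prev and cnt_prev are made up for the example, and the sum is wrapped in COALESCE as suggested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

prev_rows = conn.execute(
    """
    WITH t(i, x, y) AS (
        VALUES (1,1,1),(2,1,3),(3,1,2),
               (4,2,4),(5,2,2),(6,2,8)
    )
    SELECT i, x, y,
           COALESCE(SUM(y) OVER w, 0)          AS sum_prev,
           MAX(y) OVER w                        AS max_prev,
           COUNT(*) FILTER (WHERE y > 2) OVER w AS cnt_prev
    FROM t
    -- frame: all rows of the partition strictly before the current row
    WINDOW w AS (PARTITION BY x ORDER BY i
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    ORDER BY i
    """
).fetchall()

for row in prev_rows:
    print(row)
```

count(*) over an empty frame is 0, so only sum and max need any NULL handling.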

Order rows by ntile and row_number

I'm trying to build a stored procedure that will return data for a Crystal Reports report.
Inside CR I'm using a multi-column layout.
I want a three-column layout, something like this:
1 5 8
2 6 9
3 7 10
4
But because CR has some layout issues, it orders my table like this:
1 2 3
4 5 6
7 8 9
10
So I've tried to create a procedure that returns an extra column on which I can sort my data.
So instead of the order 1,2,3,4 I need 1,4,7,10,2,5,8,3,6,9...
I have a table with this data:
ID | CASE_ID | CASE_DATE
--------------------------
1 | 1 | 2014-02-03
2 | 1 | 2014-02-04
3 | 1 | 2014-02-05
4 | 1 | 2014-02-06
5 | 1 | 2014-02-07
6 | 1 | 2014-02-08
7 | 1 | 2014-02-09
8 | 1 | 2014-02-10
9 | 1 | 2014-02-11
10 | 1 | 2014-02-12
and I need a stored procedure that will return this data:
ID | CASE_ID | CASE_DATE | ORDER
---------------------------------
1 | 1 | 2014-02-03 | 1
2 | 1 | 2014-02-04 | 5
3 | 1 | 2014-02-05 | 8
4 | 1 | 2014-02-06 | 2
5 | 1 | 2014-02-07 | 6
6 | 1 | 2014-02-08 | 9
7 | 1 | 2014-02-09 | 3
8 | 1 | 2014-02-10 | 7
9 | 1 | 2014-02-11 | 10
10 | 1 | 2014-02-12 | 4
Here is sql fiddle with sample data and my code: http://sqlfiddle.com/#!3/c24c1/1
The idea behind the sort column:
divide all rows into 3 groups (ntile), then take the first item from the first group, the first from the second, and the first from the third group, and so on.
EDIT:
Here is my temporary solution; I hope that running it will clarify what I had in mind when I asked this question:
--DECLARE @NUM INT;
--SET @NUM=3;
SELECT ID,
       CASE_ID,
       CONVERT(NVARCHAR(10),CASE_DATE,121) AS DATA,
       (ROW1 - 1) * 3/*@NUM*/ + COL AS [ORDER]
FROM
  ( SELECT CASE_ID,
           ID,
           ROW AS LP,
           COL,
           ROW_NUMBER() OVER (PARTITION BY CASE_ID, COL ORDER BY ROW) AS ROW1,
           CASE_DATE
    FROM
      (SELECT ROW_NUMBER() OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS ROW,
              NTILE(3/*@NUM*/) OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS COL,
              ID,
              D.CASE_ID,
              CASE_DATE
       FROM DATA D
       WHERE D.CASE_ID = 1)X )Y
ORDER BY Y.CASE_ID,
         LP
Edit: It looks like you actually want the ORDER column, not just returning the columns in that order.
SELECT ID,
CASE_ID,
DATA,
ROW_NUMBER() OVER (ORDER BY ROW, N) AS [ORDER]
FROM (
SELECT ID,
CASE_ID,
N,
ROW_NUMBER() OVER (PARTITION BY CASE_ID, N ORDER BY ID) AS ROW,
DATA
FROM (
SELECT
ID,
CASE_ID,
NTILE(3) OVER (PARTITION BY CASE_ID ORDER BY ID) AS N,
CONVERT(NVARCHAR(10), CASE_DATE,121) AS DATA
FROM DATA
WHERE CASE_ID = 1 ) X ) Y
ORDER BY ID;
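To see what sort key this produces, here is a sketch of the same NTILE plus ROW_NUMBER nesting run through Python's sqlite3 module instead of SQL Server (an assumption; NTILE(3) splits 10 rows into groups of 4, 3 and 3 in both as far as I know). The date column is left out and the names col, row_in_col and sort_key are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, case_id INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, 1)", [(n,) for n in range(1, 11)])

sort_keys = conn.execute(
    """
    SELECT id,
           -- walk the layout row by row, left to right
           ROW_NUMBER() OVER (ORDER BY row_in_col, col) AS sort_key
    FROM (SELECT id, col,
                 -- position of the item inside its layout column
                 ROW_NUMBER() OVER (PARTITION BY case_id, col
                                    ORDER BY id) AS row_in_col
          FROM (SELECT id, case_id,
                       -- which of the 3 layout columns the item belongs to
                       NTILE(3) OVER (PARTITION BY case_id
                                      ORDER BY id) AS col
                FROM data
                WHERE case_id = 1) AS x) AS y
    ORDER BY id
    """
).fetchall()

print([key for _, key in sort_keys])
# IDs in the order the report would lay them out:
print([i for i, _ in sorted(sort_keys, key=lambda r: r[1])])
```

Sorting by sort_key yields the IDs as 1, 5, 8, 2, 6, 9, 3, 7, 10, 4, which fills a left-to-right layout so that each visual column reads top to bottom.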

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to build dynamic groups of rows (of products, by TYPE and COLOR).
I don't know whether this is possible with a single SELECT query.
I want to create groups of rows (a PRODUCT is a TYPE and a COLOR) whose size is given by the NB_PER_GROUP column, and I want to form the groups in date order (ORDER BY DATE).
A single product with an NB_PER_GROUP of 2 is excluded from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based position within the group that resets to 0 each time nb_per_group rows have been counted.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.
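As written, the query above also assigns group numbers to leftover rows that never fill a group. A possible variant that drops them is sketched below with Python's sqlite3 module; it is not the query from this answer but a swapped-in technique: integer division of the row number gives a chunk id, a windowed COUNT checks that the chunk is full, and DENSE_RANK numbers the surviving chunks. It assumes unique dates and, because the question elides the DATE values, uses NUM as a stand-in for date order; the names chunk, members and first_date are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products "
    "(num INTEGER, type INTEGER, color INTEGER, nb_per_group INTEGER, date INTEGER)"
)
sample = [(0, 1, 1, 2), (1, 1, 1, 2), (2, 1, 2, 2), (3, 1, 2, 2),
          (4, 1, 1, 2), (5, 1, 1, 2), (6, 4, 1, 3), (7, 1, 1, 2),
          (8, 4, 1, 3), (9, 4, 1, 3), (10, 5, 1, 2)]
# Assumption: date order equals num order, so num doubles as the date.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [(n, t, c, k, n) for n, t, c, k in sample],
)

groups = conn.execute(
    """
    WITH chunked AS (
        SELECT num, type, color, nb_per_group, date,
               -- 0,0,1,1,2,... per (type, color) when nb_per_group = 2
               (ROW_NUMBER() OVER (PARTITION BY type, color
                                   ORDER BY date) - 1) / nb_per_group AS chunk
        FROM products
    ),
    sized AS (
        SELECT chunked.*,
               COUNT(*)  OVER (PARTITION BY type, color, chunk) AS members,
               MIN(date) OVER (PARTITION BY type, color, chunk) AS first_date
        FROM chunked
    )
    SELECT DENSE_RANK() OVER (ORDER BY first_date) - 1 AS group_number, num
    FROM sized
    WHERE members = nb_per_group      -- keep only complete groups
    ORDER BY group_number, num
    """
).fetchall()

for row in groups:
    print(row)
```

On the sample table this leaves out rows 7 and 10, which are the ones that cannot complete a group.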