Postgres: Range Lookup with Auto increment - sql

I have the following table: table1
begin | value | end
---------------------
1 | 3 | 10
1 | 5 | 10
1 | 2 | 10
1 | 7 | 10
11 | 19 | 20
11 | 16 | 20
11 | 14 | 20
I am looking for the following output:
begin | value | end | case
-----------------------------
1 | 3 | 10 | 1
1 | 5 | 10 | 1
1 | 2 | 10 | 1
1 | 7 | 10 | 1
11 | 19 | 20 | 2
11 | 16 | 20 | 2
11 | 14 | 20 | 2
I want to assign a unique number for numbers falling within a particular range but I am unable to find my way around it. Any suggestions?

Hmmm. This is a gap and islands problem. You can identify where islands start by checking that there are no other rows that overlap with them. For that, you can use a cumulative max.
This gets you close:
select t.*,
count(*) filter (where prev_end < start) over (order by start) as grp
from (select t.*,
max(end) over (order by start range between unbounded preceding and 1 preceding) as prev_end
from t
) t;
However, the ties in the data mean that this has gaps. So, one more level:
select t.*, dense_rank() over (order by grp) as sequential_grp
from (select t.*,
count(*) filter (where prev_end < start) over (order by start) as grp
from (select t.*,
max(end) over (order by start range between unbounded preceding and 1 preceding) as prev_end
from t
) t
) t;
Here is a db<>fiddle -- with the column names changed, because names like begin and end are SQL keywords and hence a bad idea for column names.
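For reference, here is a self-contained sketch of the same approach that can be pasted into PostgreSQL. The names range_start, range_end and case_no are my own stand-ins for begin, end and case (not necessarily the names used in the fiddle), and the RANGE ... 1 PRECEDING frame assumes PostgreSQL 11 or later:

```sql
-- Sample rows from the question, with renamed columns (assumed names).
with t (range_start, value, range_end) as (
    values (1, 3, 10), (1, 5, 10), (1, 2, 10), (1, 7, 10),
           (11, 19, 20), (11, 16, 20), (11, 14, 20)
),
with_prev as (
    -- Largest end seen among rows that start strictly before this one.
    select t.*,
           max(range_end) over (order by range_start
                                range between unbounded preceding and 1 preceding) as prev_end
    from t
),
flagged as (
    -- Running count of rows that open a new island; ties make it jump by more than 1.
    select w.*,
           count(*) filter (where prev_end < range_start)
               over (order by range_start) as grp
    from with_prev w
)
select range_start, value, range_end,
       dense_rank() over (order by grp) as case_no  -- closes the gaps left by ties
from flagged
order by range_start, value;
```

On the sample data this should label the four rows starting at 1 with case_no 1 and the three rows starting at 11 with case_no 2.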

Related

Grouped LIMIT in PostgreSQL: show the first N rows for each group, BUT only if the first of those row equals specific data

Consider the following table:
SELECT * FROM report_raw_data;
ts | d_stamp | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1 | 60
1605464634 | 2020-11-15 | 2 | 54
1605382126 | 2020-11-14 | 1 | 40
1605362085 | 2020-11-14 | 3 | 33
1605355089 | 2020-11-13 | 1 | 60
1605202153 | 2020-11-12 | 2 | 30
What I need is to get the first two rows ordered by ts of each id_mod but only if the d_stamp is the current date (in this case 2020-11-15).
So far I have managed to get the first two rows of each id_mod ordered by ts, but I struggle with the only current date 2020-11-15.
Here is my attempt and its (wrong) result:
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS r,t.* FROM
report_raw_data t) x WHERE x.r <= 2;
ts | d_stamp | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1 | 60
1605382126 | 2020-11-14 | 1 | 40
1605464634 | 2020-11-15 | 2 | 54
1605202153 | 2020-11-12 | 2 | 30
1605362085 | 2020-11-14 | 3 | 33
If I add WHERE d_stamp = '2020-11-15' to the query, I get only the records for that date, without the second rows that I need.
This is what I would like to get (ignoring id_mod number 3, since its first row does not start on 2020-11-15):
ts | d_stamp | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1 | 60
1605382126 | 2020-11-14 | 1 | 40
1605464634 | 2020-11-15 | 2 | 54
1605202153 | 2020-11-12 | 2 | 30
One more note: I will need to be able to use LIMIT and OFFSET with the query to be able to paginate through the results on the frontend.
Starting from your current query, a simple approach is to use a window MAX() in the subquery to recover the latest d_stamp per id_mod. You can then use that for additional filtering in the outer query.
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS rn,
MAX(d_stamp) OVER (PARTITION BY id_mod) AS max_d_stamp
FROM report_raw_data t
) x
WHERE rn <= 2 AND max_d_stamp = current_date;
Assuming you have no future data, I would suggest:
SELECT rdr.*
FROM (SELECT rdr.*,
ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS seqnum
FROM report_raw_data rdr
WHERE d_stamp = current_date
) rdr
WHERE seqnum <= 2;
Filtering based on the time in the subquery should significantly improve performance. And for optimal performance, you want an index on (d_stamp, id_mod, ts desc).
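To see the window MAX() idea run end to end, including the LIMIT and OFFSET the question asks for, here is a small sketch for PostgreSQL. It inlines the sample rows, assumes d_stamp is a date column, and uses the literal date '2020-11-15' as a stand-in for current_date so the result is reproducible; the alias latest_d_stamp is my own:

```sql
-- Sample rows from the question, inlined so the query runs on its own.
with report_raw_data (ts, d_stamp, id_mod, value) as (
    values (1605450647, date '2020-11-15', 1, 60),
           (1605464634, date '2020-11-15', 2, 54),
           (1605382126, date '2020-11-14', 1, 40),
           (1605362085, date '2020-11-14', 3, 33),
           (1605355089, date '2020-11-13', 1, 60),
           (1605202153, date '2020-11-12', 2, 30)
),
ranked as (
    select r.*,
           row_number() over (partition by id_mod order by ts desc) as rn,
           max(d_stamp) over (partition by id_mod) as latest_d_stamp
    from report_raw_data r
)
select ts, d_stamp, id_mod, value
from ranked
where rn <= 2
  and latest_d_stamp = date '2020-11-15'  -- stand-in for current_date
order by id_mod, ts desc                  -- a stable order is needed for paging
limit 10 offset 0;
```

This should return two rows each for id_mod 1 and 2 and nothing for id_mod 3, whose latest d_stamp is 2020-11-14. The explicit ORDER BY matters for pagination: without it, LIMIT and OFFSET are not guaranteed to page through the rows consistently.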

How to calculate average of values without including the last value (sql)?

I have a table. I partition it by the id and want to calculate average of the values previous to the current, without including the current value. Here is a sample table:
+----+-------+------------+
| id | Value | Date |
+----+-------+------------+
| 1 | 51 | 2020-11-26 |
| 1 | 45 | 2020-11-25 |
| 1 | 47 | 2020-11-24 |
| 2 | 32 | 2020-11-26 |
| 2 | 51 | 2020-11-25 |
| 2 | 45 | 2020-11-24 |
| 3 | 47 | 2020-11-26 |
| 3 | 32 | 2020-11-25 |
| 3 | 35 | 2020-11-24 |
+----+-------+------------+
In this case, it means calculating the average of values for dates BEFORE 2020-11-26. This is the expected result
+----+-------+
| id | Value |
+----+-------+
| 1 | 46 |
| 2 | 48 |
| 3 | 33.5 |
+----+-------+
I have calculated it using ROWS N PRECEDING but it appears that this way I average N preceding + last row, and I want to exclude the last row (which is the most recent date in my case).
Here is my query:
SELECT ID,
(avg(Value) OVER(
PARTITION BY ID
ORDER BY Date
ROWS 9 PRECEDING )) as avg9
FROM t1
Then define your window frame in full, giving both its start and its end with BETWEEN:
SELECT ID,
(AVG(Value) OVER (PARTITION BY ID ORDER BY Date ROWS BETWEEN 9 PRECEDING AND 1 PRECEDING)) AS avg9
FROM t1;
Why not just filter:
select id, avg(value)
from t1
where date < '2020-11-26'
group by id;
If you want the date to be flexible -- say, the most recent date for each id -- then:
select id, avg(value)
from (select t1.*,
max(date) over (partition by id) as max_date
from t1
) t1
where date < max_date
group by id;
Use row_number() over (partition by id order by [Date] desc). This assigns RN = 1 to the row with the latest date in each partition. Wrap it in a CTE and then calculate the average for each id over the rows where RN > 1.
;with a as
(
select id, value, Date, row_number() over (partition by id order by date
desc) as RN
from t1
)
select id, avg(Value) from a where RN > 1 group by id
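A self-contained sketch of the "exclude the latest row per id" idea, written for PostgreSQL with the sample rows inlined. The column name obs_date is my own replacement for Date (to avoid clashing with the type name), and avg_before_latest is an assumed alias:

```sql
-- Sample rows from the question; obs_date stands in for the Date column.
with t1 (id, value, obs_date) as (
    values (1, 51, date '2020-11-26'), (1, 45, date '2020-11-25'), (1, 47, date '2020-11-24'),
           (2, 32, date '2020-11-26'), (2, 51, date '2020-11-25'), (2, 45, date '2020-11-24'),
           (3, 47, date '2020-11-26'), (3, 32, date '2020-11-25'), (3, 35, date '2020-11-24')
),
flagged as (
    -- Attach each id's most recent date to every one of its rows.
    select t1.*, max(obs_date) over (partition by id) as max_date
    from t1
)
select id, avg(value) as avg_before_latest
from flagged
where obs_date < max_date   -- drop the most recent row of each id
group by id
order by id;
```

On the sample data the averages should come out as 46, 48 and 33.5 for ids 1, 2 and 3, matching the expected result in the question (PostgreSQL displays them as numeric values with trailing zeros).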

Select Rows Whose Sum Value = 80% of the Total

Here is an example of the business problem.
I have 10 sales that resulted in negative margin.
We want to review these records, and we generally use the 20/80 rule in reviews.
That is, 20 percent of the sales will likely represent 80 percent of the negative margin.
So with the records below...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
| 5 | 5 |
| 6 | 5 |
| 7 | 2 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
+----+-------+
I would want to return...
+----+-------+
| ID | Value |
+----+-------+
| 1 | 30 |
| 2 | 30 |
| 3 | 20 |
| 4 | 10 |
+----+-------+
The total of Value is 106, so 80% is 84.8.
I need all the records, sorted descending, whose cumulative value gets me to at least 84.8.
We use Microsoft APS PDW SQL, but can process on SMP if needed.
Assuming window functions are supported, you can use
with cte as (select id,value
,sum(value) over(order by value desc,id) as running_sum
,sum(value) over() as total
from tbl
)
select id,value from cte where running_sum < total*0.8
union all
select top 1 id,value from cte where running_sum >= total*0.8 order by value desc
One way is to use running totals:
select
id,
value
from
(
select
id,
value,
sum(value) over () as total,
sum(value) over (order by value desc) as till_here,
sum(value) over (order by value desc rows between unbounded preceding and 1 preceding)
as till_prev
from mytable
) summed_up
where till_here * 1.0 / total <= 0.8
or (till_here * 1.0 / total >= 0.8 and coalesce(till_prev, 0) * 1.0 / total < 0.8)
order by value desc;
This link could be useful, it calculates running totals:
https://www.codeproject.com/Articles/300785/Calculating-simple-running-totals-in-SQL-Server
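As a compact variant of the running-total idea, a row can be kept whenever the running sum of the rows before it is still below 80% of the total; that keeps the row that crosses the threshold without needing a UNION. This sketch is written for PostgreSQL with the sample rows inlined (the inline VALUES syntax may need adjusting on PDW), and id is added to the ordering as an assumed tie-breaker:

```sql
-- Sample rows from the question, inlined so the query runs on its own.
with tbl (id, value) as (
    values (1, 30), (2, 30), (3, 20), (4, 10), (5, 5),
           (6, 5), (7, 2), (8, 2), (9, 1), (10, 1)
),
summed as (
    select id, value,
           sum(value) over (order by value desc, id) as running_sum,  -- id breaks ties
           sum(value) over () as total
    from tbl
)
select id, value
from summed
where running_sum - value < total * 0.8   -- sum of the rows before this one
order by value desc, id;
```

With a total of 106 and a threshold of 84.8, the sums before each row are 0, 30, 60, 80, 90, and so on, so ids 1 through 4 should be returned.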

Order rows by ntile and row_number

I'm trying to build a stored procedure that will return data for a Crystal Reports report.
Inside CR I'm using a multi-column layout.
I want to get a 3-column layout, something like this:
1 5 8
2 6 9
3 7 10
4
But because CR has some layout issues it is ordering my table like this:
1 2 3
4 5 6
7 8 9
10
So I've tried to create a procedure that returns an extra column on which I'll sort my data.
So instead of the 1,2,3,4 order I need 1,4,7,10,2,5,8,3,6,9...
I have table with that data:
ID | CASE_ID | CASE_DATE
--------------------------
1 | 1 | 2014-02-03
2 | 1 | 2014-02-04
3 | 1 | 2014-02-05
4 | 1 | 2014-02-06
5 | 1 | 2014-02-07
6 | 1 | 2014-02-08
7 | 1 | 2014-02-09
8 | 1 | 2014-02-10
9 | 1 | 2014-02-11
10 | 1 | 2014-02-12
And I need a stored procedure that will return this data:
ID | CASE_ID | CASE_DATE | ORDER
---------------------------------
1 | 1 | 2014-02-03 | 1
2 | 1 | 2014-02-04 | 5
3 | 1 | 2014-02-05 | 8
4 | 1 | 2014-02-06 | 2
5 | 1 | 2014-02-07 | 6
6 | 1 | 2014-02-08 | 9
7 | 1 | 2014-02-09 | 3
8 | 1 | 2014-02-10 | 7
9 | 1 | 2014-02-11 | 10
10 | 1 | 2014-02-12 | 4
Here is sql fiddle with sample data and my code: http://sqlfiddle.com/#!3/c24c1/1
Idea behind sort column:
divide all rows into 3 groups (ntile), take first item from first group, then first from second and first from third group
EDIT:
Here is my temporary solution, I hope that running this will clarify what I had in mind when I was asking this question:
--DECLARE @NUM INT;
--SET @NUM=3;
SELECT ID,
CASE_ID,
CONVERT(NVARCHAR(10),CASE_DATE,121) AS DATA,
(ROW1 - 1) * 3/*@NUM*/ + COL AS [ORDER]
FROM
( SELECT CASE_ID,
ID,
ROW AS LP,
COL,
ROW_NUMBER() OVER (PARTITION BY CASE_ID, COL ORDER BY ROW) AS ROW1,
CASE_DATE
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS ROW,
NTILE(3/*@NUM*/) OVER (PARTITION BY D.CASE_ID ORDER BY D.ID) AS COL,
ID,
D.CASE_ID,
CASE_DATE
FROM DATA D
WHERE D.CASE_ID = 1)X )Y
ORDER BY Y.CASE_ID,
LP
Edit: It looks like you actually want the ORDER column, not just returning the columns in that order.
SELECT ID,
CASE_ID,
DATA,
ROW_NUMBER() OVER (ORDER BY ROW, N) AS [ORDER]
FROM (
SELECT ID,
CASE_ID,
N,
ROW_NUMBER() OVER (PARTITION BY CASE_ID, N ORDER BY ID) AS ROW,
DATA
FROM (
SELECT
ID,
CASE_ID,
NTILE(3) OVER (PARTITION BY CASE_ID ORDER BY ID) AS N,
CONVERT(NVARCHAR(10), CASE_DATE,121) AS DATA
FROM DATA
WHERE CASE_ID = 1 ) X ) Y
ORDER BY ID;
SQLFiddle
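To make the ntile-then-row_number idea easier to follow, here is a stripped-down sketch with just ten ids. It is written for PostgreSQL rather than SQL Server (generate_series is a PostgreSQL function), and the names col_no, row_no and sort_pos are my own:

```sql
-- Ten ids standing in for the rows of one CASE_ID.
with data (id) as (
    select g from generate_series(1, 10) as g
),
cols as (
    -- ntile(3) splits the ids into three columns: 1-4, 5-7, 8-10.
    select id, ntile(3) over (order by id) as col_no
    from data
),
cells as (
    -- Position of each id inside its column.
    select id, col_no,
           row_number() over (partition by col_no order by id) as row_no
    from cols
)
select id, col_no, row_no,
       row_number() over (order by row_no, col_no) as sort_pos  -- walk the grid row by row
from cells
order by sort_pos;
```

Read in sort_pos order, the ids should come out as 1, 5, 8, 2, 6, 9, 3, 7, 10, 4, which is the 3-column layout from the question read row by row.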

Grouping by a given number of row values

I have a list of values in column 1. Now I want to get the sum of the next 5 (or 6) row values, as below, and populate the result in the appropriate column.
For example, the 1st-row value of the column 'next-5row-value' would be the sum of the values from the current row through the next 5 rows, which would be 9, and the next column would be the sum of the next 5 row values from that reference point.
I am trying to write functions that loop through the rows to arrive at the sum. Is there a more efficient way? Can someone help me out? I am using Postgres / Greenplum. Thanks!
For example, if you have this simple table:
sebpa=# \d numbers
Table "public.numbers"
Column | Type | Modifiers
--------+---------+------------------------------------------------------
id | integer | not null default nextval('numbers_id_seq'::regclass)
i | integer |
sebpa=# select * from numbers limit 15;
id | i
------+---
3001 | 3
3002 | 0
3003 | 5
3004 | 1
3005 | 1
3006 | 4
3007 | 1
3008 | 1
3009 | 4
3010 | 0
3011 | 4
3012 | 0
3013 | 3
3014 | 2
3015 | 1
(15 rows)
you can use this sql:
sebpa=# select id, i, sum(i) over( order by id rows between 0 preceding and 4 following) from numbers;
id | i | sum
------+---+-----
3001 | 3 | 10
3002 | 0 | 11
3003 | 5 | 12
3004 | 1 | 8
3005 | 1 | 11
3006 | 4 | 10
3007 | 1 | 10
3008 | 1 | 9
3009 | 4 | 11
3010 | 0 | 9
3011 | 4 | 10
3012 | 0 | 10
3013 | 3 | 13
3014 | 2 | 15
3015 | 1 | 17
3016 | 4 | 20
3017 | 3 | 17
-- output truncated
You can try something like this:
SELECT V,
SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING) AS next_5row_value,
SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 6 FOLLOWING) AS next_6row_value
FROM YourTable;
If you don't want NULLs in your "next_5row_value" column (the same for the 6row), you can use the COALESCE function (supported in PostgreSQL too) that returns the first not null expression.
Something like this:
SELECT V,
COALESCE(SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 5 FOLLOWING), 0) AS next_5row_value,
COALESCE(SUM(V) OVER(ORDER BY YourOrderingField
ROWS BETWEEN 1 FOLLOWING AND 6 FOLLOWING), 0) AS next_6row_value
FROM YourTable;
I don't think Postgres 8.4 supports the full functionality of window frames. You can do this by using lead():
select value,
(value +
lead(value, 1) over (order by id) +
lead(value, 2) over (order by id) +
lead(value, 3) over (order by id) +
lead(value, 4) over (order by id)
) as next5,
(value +
lead(value, 1) over (order by id) +
lead(value, 2) over (order by id) +
lead(value, 3) over (order by id) +
lead(value, 4) over (order by id) +
lead(value, 5) over (order by id)
) as next6
from table t;
Using a window frame definition is definitely a better approach if the database supports it. But the above will also work.
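The two frame choices discussed above (including or excluding the current row) can be compared side by side. This is a sketch for a reasonably recent PostgreSQL, using the first 15 sample rows inlined; the aliases this_and_next_4 and next_5_only are my own:

```sql
-- First 15 rows of the numbers table from the earlier answer, inlined.
with numbers (id, i) as (
    values (3001, 3), (3002, 0), (3003, 5), (3004, 1), (3005, 1),
           (3006, 4), (3007, 1), (3008, 1), (3009, 4), (3010, 0),
           (3011, 4), (3012, 0), (3013, 3), (3014, 2), (3015, 1)
)
select id, i,
       -- Current row plus the next 4 rows (5 rows in total).
       sum(i) over (order by id
                    rows between current row and 4 following) as this_and_next_4,
       -- Next 5 rows only; the frame is empty on the last row, hence coalesce.
       coalesce(sum(i) over (order by id
                             rows between 1 following and 5 following), 0) as next_5_only
from numbers
order by id;
```

For id 3001 this should give 10 (3+0+5+1+1) in the first column and 11 (0+5+1+1+4) in the second. Near the end of the table the frames simply shrink, and the last row gets 0 in next_5_only because of the COALESCE.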