SQL Range Groups Start and End ID

I have a query that I want to break into 'chunks' of size 200 and return the start id and end id of each 'chunk'.
Example:
select t.id
from t
where t.x = y --this predicate will cause the ids to not be sequential
If the example above were the query I'm trying to break into 'chunks', I'd want to return:
(1st ID, 200th ID), (201st ID, 400th ID)...(start of final range ID, end of range ID)
Edit: For the final range, if it is not a full 200 rows it should still supply the final id in the query.
Is there a way to do this with just SQL or will I have to resort to application processing and/or multiple queries similar to a pagination implementation?
If there is a way to do this in SQL please supply an example.

Hmmm, I think the easiest way is to use row_number():
select id
from (select t.*, row_number() over (order by id) as seqnum
      from t
      where t.x = y
     ) t
where (seqnum % 200) in (0, 1);
EDIT:
Based on your comments:
select min(id) as startid, max(id) as endid
from (select t.*,
             floor((row_number() over (order by id) - 1) / 200) as grp
      from t
      where t.x = y
     ) t
group by grp;

L for Left and R for Right
WITH cte AS (
    SELECT t.id,
           row_number() over (order by id) as seqnum
    FROM Table t
    WHERE t.x = y
)
SELECT L.id as start_id,
       COALESCE(R.id, (SELECT MAX(ID) FROM cte)) as end_id
FROM cte L
LEFT JOIN cte R
       ON L.seqnum = R.seqnum - 199
WHERE L.seqnum % 200 = 1
SqlFiddleDemo (the demo filters only even numbers and uses a block size of 4). Note how the R.seqnum - 199 offset corresponds to a block size of 200: the offset is always the block size minus one.
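For reference, a sketch of what that demo variant would look like with only even ids kept and a block size of 4; the even-number filter here stands in for the question's t.x = y predicate, and the table and column names follow the question:
-- Demo variant (assumed names): keep even ids only, build ranges of 4.
WITH cte AS (
    SELECT t.id,
           row_number() over (order by t.id) as seqnum
    FROM t
    WHERE t.id % 2 = 0                 -- "filtering only even numbers"
)
SELECT L.id AS start_id,
       COALESCE(R.id, (SELECT MAX(id) FROM cte)) AS end_id
FROM cte L
LEFT JOIN cte R
       ON L.seqnum = R.seqnum - 3      -- block size 4, so the offset is 4 - 1 = 3
WHERE L.seqnum % 4 = 1
The same substitution in reverse gives the offset 199 and the % 200 = 1 filter for blocks of 200.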

Related

Select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id  area  count  function
1   100   20     living
1   200   30     industry
2   400   10     living
2   400   10     industry
2   400   20     education
3   150   1      industry
3   150   1      education
I want to group by id and get the dominant function based on max area, while summing up the rows for area and count. When the areas are equal it should be based on max count; when area and count are both equal it should be based on function priority (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id  area  count  function
1   300   50     industry
2   1200  40     education
3   300   2      industry
I tried a lot of things and maybe it's easy, but I don't get it. Can someone help me get the right SQL?
One method uses row_number() and conditional aggregation:
select id, sum(area), sum(count),
       max(function) filter (where seqnum = 1) as function
from (select t.*,
             row_number() over (partition by id order by area desc) as seqnum
      from t
     ) t
group by id;
Another method uses `distinct on`:
select distinct on (id)
       id,
       sum(area) over (partition by id) as area,
       sum(count) over (partition by id) as count,
       function
from t
order by id, area desc;
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
       (select "function"
        from the_table
        where id = t.id
        order by area desc, count desc, "function" desc
        limit 1
       ) as "function"
from the_table as t
group by t.id
order by t.id;
SQL Fiddle
You can use sum as a window function:
select distinct on (t.id)
id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
( select function from tbl_test where tbl_test.id = t.id order by count desc limit 1 ) as function
from tbl_test t
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
       on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(we ensure that no yt2 row exists with the same id but a higher area)
This would work nicely, but there might be several rows sharing the maximum area with different function values. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
       on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
    select yt1.id, max(yt1.function) as function
    from yourtable yt1
    left join yourtable yt2
           on yt1.id = yt2.id and yt1.area < yt2.area
    where yt2.id is null
    group by yt1.id
) t
    on yourtable.id = t.id
group by yourtable.id, t.function;

Select every second record then determine earliest date

I have a table that looks like the following
I have to select every second record per PatientID, which would give the following result (my last query returns this result)
I then have to select the record with the oldest date, which would be the following (this is the end result I want)
What I have done so far: I have a CTE that gets all the data I need
WITH cte
AS
(
    SELECT visit.PatientTreatmentVisitID, mat.PatientMatchID, pat.PatientID, visit.RegimenDate AS VisitDate,
           ROW_NUMBER() OVER (PARTITION BY mat.PatientMatchID, pat.PatientID ORDER BY visit.VisitDate ASC) AS RowNumber
    FROM tblPatient pat
    INNER JOIN tblPatientMatch mat ON mat.PatientID = pat.PatientID
    LEFT JOIN tblPatientTreatmentVisit visit ON visit.PatientID = pat.PatientID
)
I then write a query against the CTE but so far I can only return the second row for each patientID
SELECT *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate, RowNumber FROM cte
) as X
WHERE RowNumber = 2
How do I return the record with the oldest date only? Is there perhaps a MIN() function that I could be including somewhere?
If I follow you correctly, you can just order your existing resultset and retain the top row only.
In standard SQL, you would write this using a FETCH clause:
SELECT *
FROM (
SELECT
visit.PatientTreatmentVisitID,
mat.PatientMatchID,
pat.PatientID,
visit.RegimenDate AS VisitDate,
ROW_NUMBER() OVER(PARTITION BY mat.PatientMatchID, pat.PatientID ORDER BY visit.VisitDate ASC) AS rn
FROM tblPatient pat
INNER JOIN tblPatientMatch mat ON mat.PatientID = pat.PatientID
LEFT JOIN tblPatientTreatmentVisit visit ON visit.PatientID = pat.PatientID
) t
WHERE rn = 2
ORDER BY VisitDate
OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY
This syntax is supported in Postgres, Oracle, SQL Server (and possibly other databases).
If you need to get the oldest date from all the selected dates (every second row for each patient ID), then you can try the MIN window function:
SELECT * FROM
(
SELECT *, MIN(VisitDate) OVER (Order By VisitDate) MinDate
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate,
RowNumber FROM cte
) as X
WHERE RowNumber = 2
) Y
WHERE VisitDate=MinDate
Or you can use the SELECT TOP statement. The SELECT TOP clause allows you to limit the number of rows returned in a query result set:
SELECT TOP 1 PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate FROM
(
SELECT *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate,
RowNumber FROM cte
) as X
WHERE RowNumber = 2
) Y
ORDER BY VisitDate
For simplicity, order ascending on the date column and use TOP to get the first row only
SELECT TOP 1 *
FROM
(
SELECT PatientTreatmentVisitID,PatientMatchID,PatientID, VisitDate, RowNumber FROM cte
) as X
WHERE RowNumber = 2
order by VisitDate asc

SQL - delete record where sum = 0

I have a table which has the below values:
If the sum of the values for rows with the same ID is 0, I want to delete those rows from the table. So the result should look like this:
The code I have:
DELETE FROM tmp_table
WHERE ID in
(SELECT ID
FROM tmp_table WITH(NOLOCK)
GROUP BY ID
HAVING SUM(value) = 0)
Only deletes rows with ID = 2.
UPD: Including an additional example:
Rows in yellow need to be deleted.
Your query is working correctly: the only group whose total is zero is id 2. The others have sub-groups which total zero (such as the first two rows with id 1), but the total across all of those records is -3.
What you want is a much more complex algorithm, akin to "bin packing", in order to remove the sub-groups which sum to zero.
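As a quick way to see which IDs the current predicate actually matches, you can list the per-ID totals (same table and column names as in the question):
-- Only IDs whose grand total is exactly 0 are hit by the original DELETE.
SELECT ID, SUM(value) AS total
FROM tmp_table
GROUP BY ID
ORDER BY ID;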
You can do what you want using window functions -- by enumerating the values for each id. Taking your approach using a subquery:
with t as (
      select t.*,
             row_number() over (partition by id, value order by id) as seqnum
      from tmp_table t
     )
delete from t
where exists (select 1
              from t t2
              where t2.id = t.id and t2.value = -t.value and t2.seqnum = t.seqnum
             );
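To illustrate how the enumeration pairs rows off, here is a small hypothetical data set (the question's screenshots are not reproduced here), consistent with id 1 totaling -3 while containing a sub-group that sums to zero:
-- Hypothetical test data: id 1 holds a +5/-5 pair plus a leftover -3; id 2 sums to 0.
create table tmp_table (id int, value int);
insert into tmp_table (id, value)
values (1, 5), (1, -5), (1, -3),
       (2, 4), (2, -4);
With seqnum partitioned by (id, value), the rows (1, 5) and (1, -5) both get seqnum 1, so each finds its opposite via the EXISTS check and is deleted; (1, -3) has no matching +3 and is kept, and both rows of id 2 pair off and are deleted.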
You can also do this with a second layer of window functions:
with t as (
      select t.*,
             row_number() over (partition by id, value order by id) as seqnum
      from tmp_table t
     ),
     tt as (
      select t.*, count(*) over (partition by id, abs(value), seqnum) as cnt
      from t
     )
delete from tt
where cnt = 2;

PostgreSQL - column value changed - select query optimization

Say we have a table:
CREATE TABLE p
(
id serial NOT NULL,
val boolean NOT NULL,
PRIMARY KEY (id)
);
Populated with some rows:
insert into p (val)
values (true),(false),(false),(true),(true),(true),(false);
ID VAL
1 1
2 0
3 0
4 1
5 1
6 1
7 0
I want to determine when the value has been changed. So the result of my query should be:
ID VAL
2 0
4 1
7 0
I have a solution with joins and subqueries:
select min(id) id, val from
(
select p1.id, p1.val, max(p2.id) last_prev
from p p1
join p p2
on p2.id < p1.id and p2.val != p1.val
group by p1.id, p1.val
) tmp
group by val, last_prev
order by id;
But it is very inefficient and will be extremely slow for tables with many rows.
I believe there could be a more efficient solution using PostgreSQL window functions?
SQL Fiddle
This is how I would do it with an analytic:
SELECT id, val
FROM ( SELECT id, val
,LAG(val) OVER (ORDER BY id) AS prev_val
FROM p ) x
WHERE val <> COALESCE(prev_val, val)
ORDER BY id
Update (some explanation):
Analytic functions operate as a post-processing step. The query result is broken into groupings (partition by) and the analytic function is applied within the context of a grouping.
In this case, the query is a selection from p. The analytic function being applied is LAG. Since there is no partition by clause, there is only one grouping: the entire result set. This grouping is ordered by id. LAG returns the value of the previous row in the grouping using the specified order. The result is each row having an additional column (aliased prev_val) which is the val of the preceding row. That is the subquery.
Then we look for rows where the val does not match the val of the previous row (prev_val). The COALESCE handles the special case of the first row which does not have a previous value.
Analytic functions may seem a bit strange at first, but a search on analytic functions finds a lot of examples walking through how they work. For example: http://www.cs.utexas.edu/~cannata/dbms/Analytic%20Functions%20in%20Oracle%208i%20and%209i.htm Just remember that it is a post-processing step. You won't be able to perform filtering, etc on the value of an analytic function unless you subquery it.
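To make the post-processing step concrete, here is the subquery above run on its own against the sample data from the question; the expected prev_val values are shown as comments, and rows 2, 4 and 7 are the ones the outer WHERE keeps:
SELECT id, val, LAG(val) OVER (ORDER BY id) AS prev_val
FROM p
ORDER BY id;
-- ID VAL PREV_VAL
--  1  1  (null)
--  2  0   1        <- changed
--  3  0   0
--  4  1   0        <- changed
--  5  1   1
--  6  1   1
--  7  0   1        <- changed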
Window function
Instead of calling COALESCE, you can provide a default from the window function lag() directly. A minor detail in this case since all columns are defined NOT NULL. But this may be essential to distinguish "no previous row" from "NULL in previous row".
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Compute the result of the comparison immediately, since the previous value is not of interest per se, only a possible change. Shorter and may be a tiny bit faster.
If you consider the first row to be "changed" (unlike your demo output suggests), you need to observe NULL values - even though your columns are defined NOT NULL. Basic lag() returns NULL in case there is no previous row:
SELECT id, val
FROM (
SELECT id, val, lag(val) OVER (ORDER BY id) IS DISTINCT FROM val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Or employ the additional parameters of lag() once again:
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, NOT val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Recursive CTE
As proof of concept. :)
Performance won't keep up with posted alternatives.
WITH RECURSIVE cte AS (
    SELECT id, val
    FROM p
    WHERE NOT EXISTS (
        SELECT 1
        FROM p p0
        WHERE p0.id < p.id
    )
    UNION ALL
    SELECT p.id, p.val
    FROM cte
    JOIN p ON p.id > cte.id
          AND p.val <> cte.val
    WHERE NOT EXISTS (
        SELECT 1
        FROM p p0
        WHERE p0.id > cte.id
        AND p0.val <> cte.val
        AND p0.id < p.id
    )
)
SELECT * FROM cte;
With an improvement from #wildplasser.
SQL Fiddle demonstrating all.
Can even be done without window functions.
SELECT *
FROM p p0
WHERE EXISTS (
    SELECT *
    FROM p ex
    WHERE ex.id < p0.id
    AND ex.val <> p0.val
    AND NOT EXISTS (
        SELECT *
        FROM p nx
        WHERE nx.id < p0.id
        AND nx.id > ex.id
    )
);
UPDATE: Self-joining a non-recursive CTE (could also be a subquery instead of a CTE)
WITH drag AS (
SELECT id
, rank() OVER (ORDER BY id) AS rnk
, val
FROM p
)
SELECT d1.*
FROM drag d1
JOIN drag d0 ON d0.rnk = d1.rnk -1
WHERE d1.val <> d0.val
;
This nonrecursive CTE approach is surprisingly fast, although it needs an implicit sort.
Using 2 row_number() computations: this is also possible to do with the usual "islands and gaps" SQL technique (which could be useful if you can't use the lag() window function for some reason):
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select *, rn1 - rn2 as g
from cte1
order by id
So this query will give you all islands
ID VAL RN1 RN2 G
1 1 1 1 0
2 0 2 1 1
3 0 3 2 1
4 1 4 2 2
5 1 5 3 2
6 1 6 4 2
7 0 7 3 4
You can see how the G field can be used to group these islands together:
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select
min(id) as id,
val
from cte1
group by val, rn1 - rn2
order by 1
So you'll get
ID VAL
1 1
2 0
4 1
7 0
The only thing left now is that you have to remove the first record, which can be done with the min(...) over () window function:
with cte1 as (
...
), cte2 as (
select
min(id) as id,
val,
min(min(id)) over() as mid
from cte1
group by val, rn1 - rn2
)
select id, val
from cte2
where id <> mid
And results:
ID VAL
2 0
4 1
7 0
A simple inner join can do it. SQL Fiddle
select p2.id, p2.val
from
p p1
inner join
p p2 on p2.id = p1.id + 1
where p2.val != p1.val

Calculate arithmetic return from a table of values

I've created a table of index price levels (e.g., the S&P 500) that I'd like to calculate the daily return of. The table structure looks like this:
Date Value
2009-07-02 880.167341
2009-07-03 882.235134
2009-07-06 881.338052
2009-07-07 863.731494
2009-07-08 862.458985
I'd like to calculate the daily arithmetic return (ie, percentage return) of the index, defined as:
Daily Return = P(2)/P(1) - 1
Where P represents the index value in this case. Given the input table presented above, the desired output would look like this:
Date Return
2009-07-03 0.002349318
2009-07-06 -0.001016829
2009-07-07 -0.019977077
2009-07-08 -0.001473269
It occurs to me that a self join would work, but I'm not sure of the best way to increment the date on the second table to account for weekends.
Any thoughts on the best way to go about this?
WITH cteRank AS (
SELECT [Date], Value,
ROW_NUMBER() OVER(ORDER BY [Date]) AS RowNum
FROM YourTable
)
SELECT c1.[Date], c1.Value/c2.Value - 1 as [Return]
from cteRank c1
inner join cteRank c2
on c1.RowNum - 1 = c2.RowNum
where c1.RowNum > 1
A simple CROSS APPLY
SELECT
Tlater.Date, (Tlater.Value / TPrev2.Value) - 1
FROM
MyTable Tlater
CROSS APPLY
(
SELECT TOP 1 TPrev.Value
FROM MyTable TPrev
WHERE TPrev.Date < Tlater.Date
ORDER BY TPrev.Date DESC
) TPrev2
Note: this becomes trivial in Denali (SQL Server 2012) with LAG (untested, may need a CTE)
SELECT
Date,
(Value / (LAG(Value) OVER (ORDER BY Date))) -1
FROM
MyTable
Or
;WITH cPairs AS
(
SELECT
Date,
Value AS Curr,
LAG(Value) OVER (ORDER BY Date) AS Prev
FROM
MyTable
)
SELECT
Date,
(Curr / Prev) -1
FROM
cPairs
If you're using 2005+, you can use the ROW_NUMBER function combined with a CTE:
;with RowNums as (
    select *, row_number() over (order by date) as RN
    from table
)
select *, r1.Value / r.Value - 1 as Return
from RowNums r
inner join RowNums r1
on r.RN = r1.RN - 1