SQL - How to SELECT the best two months which are next to each other

How can I select the best two adjacent months from a table in PostgreSQL?
Table:
ID Month Value
1 2019-06 100
2 2019-07 120
3 2019-08 70
4 2019-09 200
5 2019-10 100
6 2019-11 50
I would like to select the IDs of the two adjacent months whose sum(Value) is the highest.
In the following case the result will be:
4 2019-09
5 2019-10
where the sum of values is equal to 300.

You can put the data on one row using join:
select t1.*, t2.*
from t t1 join
t t2
on t2.month = t1.month + interval '1 month'
order by t1.value + t2.value desc
limit 1;
Getting separate rows is trickier. You can easily get the first row using lead():
select t.*
from (select t.*, lead(value, 1, 0) over (order by month) as next_value
from t
) t
order by (value + next_value) desc
limit 1;
Getting the second month is much trickier. I am thinking that the simplest method is to unpivot the first results:
select t.*
from (select t1, t2
from t t1 join
t t2
on t2.month = t1.month + interval '1 month'
order by t1.value + t2.value desc
limit 1
) x cross join lateral
unnest(array[t1, t2]) t
order by t.month;
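As a rough way to check the self-join idea against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It is not the PostgreSQL query itself: SQLite is swapped in for convenience, month is assumed to be stored as 'YYYY-MM' text, and strftime() with a '+1 month' modifier stands in for interval arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, month text, value integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2019-06", 100),
        (2, "2019-07", 120),
        (3, "2019-08", 70),
        (4, "2019-09", 200),
        (5, "2019-10", 100),
        (6, "2019-11", 50),
    ],
)

# Pair every month with the month that follows it, then keep the pair
# with the largest combined value.
best = conn.execute(
    """
    select t1.id, t2.id, t1.value + t2.value as total
    from t t1
    join t t2
      on t2.month = strftime('%Y-%m', t1.month || '-01', '+1 month')
    order by total desc
    limit 1
    """
).fetchone()

print(best)  # (first id, second id, combined value)
```

Because the join matches on the calendar month rather than on row position, a missing month simply produces no pair instead of pairing two non-adjacent months.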

Here is a solution that uses solely window functions and that does not assume that month is of a date-like datatype.
This works as follows:
first, number the records by increasing month with row_number() (aliased as rn) and compute the sum of the current and previous value (aliased as vals)
then rank the records by descending vals (aliased as rnk)
expose the row number of the record that has the highest vals (aliased as rn_max)
finally, pull out this record and the preceding one (i.e. the one that has the previous row number)
Query:
select id, month, value
from (
select t.*, first_value(rn) over(order by rnk) rn_max
from (
select t.*, rank() over(order by vals desc) rnk
from (
select
t.*,
value + lag(value, 1, 0) over (order by month) vals,
row_number() over(order by month) rn
from mytable t
) t
) t
) t
where rn in (rn_max, rn_max - 1)
order by month
Step-by-step demo on DB Fiddle:
id | month | value
-: | :------ | ----:
4 | 2019-09 | 200
5 | 2019-10 | 100

Related

SELECT statement that shows continuous data with condition

I consider myself good at SQL but failed at this problem.
I need a SELECT statement that shows all rows with a volume above 100, but only if they are part of
a run of 3 or more consecutive rows above 100.
Given Table "Trend":
| id | volume |
+----+---------+
| 0 | 200 |
| 1 | 90 |
| 2 | 101 |
| 3 | 120 |
| 4 | 200 |
| 5 | 10 |
| 6 | 400 |
I need a SELECT statement to produce:
| 2 | 101 |
| 3 | 120 |
| 4 | 200 |
I suspect that you are after the following logic:
select *
from (
select t.*,
sum(case when volume > 100 then 1 else 0 end) over(order by id rows between 2 preceding and 2 following) cnt
from mytable t
) t
where volume > 100 and cnt >= 3
This counts how many values are above 100 in the range made of the two preceding rows, the current row and the next two rows. Then we filter on rows whose window count is 3 or more.
This uses a syntax that most databases support (provided that window functions are available). Neater expressions may be available depending on the actual database you are using.
In MySQL:
sum(volume > 100) over(order by id rows between 2 preceding and 2 following) cnt
In Postgres:
count(*) filter(where volume > 100) over(order by id rows between 2 preceding and 2 following) cnt
Or:
sum((volume > 100)::int) over(order by id rows between 2 preceding and 2 following) cnt
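To see what the five-row window count does, here is a small pure-Python sketch of the same filter. The function name and parameters are made up for illustration. It also shows one caveat worth knowing: the window counts qualifying rows anywhere within two rows on either side, so it does not strictly require them to be contiguous.

```python
def window_count_filter(volumes, threshold=100, radius=2, min_count=3):
    """Return the positions kept by the 'rows between 2 preceding and 2 following' filter."""
    flags = [v > threshold for v in volumes]
    kept = []
    for i, v in enumerate(volumes):
        lo = max(0, i - radius)
        # Count qualifying rows in the window around position i (inclusive).
        cnt = sum(flags[lo:i + radius + 1])
        if v > threshold and cnt >= min_count:
            kept.append(i)
    return kept


sample = [200, 90, 101, 120, 200, 10, 400]
kept_sample = window_count_filter(sample)

# A made-up series with no run of three: the middle row still passes,
# because three of the five rows in its window are above the threshold.
alternating = [200, 90, 101, 90, 200]
kept_alternating = window_count_filter(alternating)

print(kept_sample, kept_alternating)
```

On the sample data the filter keeps positions 2, 3 and 4, as wanted. On the alternating series it keeps position 2 even though no three consecutive rows are above 100, so this approach is only safe if that case cannot occur or does not matter.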
This is tricky because you want the original rows . . . I am going to suggest lag() and lead():
select id, volume
from (select t.*,
lag(volume, 2) over (order by id) as prev_volume_2,
lag(volume) over (order by id) as prev_volume,
lead(volume, 2) over (order by id) as next_volume_2,
lead(volume) over (order by id) as next_volume
from t
) t
where volume > 100 and
( (prev_volume_2 > 100 and prev_volume > 100) or
(prev_volume > 100 and next_volume > 100) or
(next_volume_2 > 100 and next_volume > 100)
);
Another method is to treat this as a gaps-and-islands problem. This makes the solution more generalizable. You can assign a group by counting the number of rows less than or equal to 100 up to each row. Then count the number that are greater than 100 to see if those groups qualify to be in the final results:
select id, volume
from (select t.*,
sum(case when volume > 100 then 1 else 0 end) over (partition by grp) as cnt
from (select t.*,
sum(case when volume <= 100 then 1 else 0 end) over (order by id) as grp
from t
) t
) t
where volume > 100 and cnt >= 3;
Here is a db<>fiddle with these two approaches.
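For comparison, the gaps-and-islands idea can be sketched outside the database with itertools.groupby, which splits the ordered rows into consecutive runs. This is only an illustration of the grouping logic, with a hypothetical function name, not a replacement for the SQL.

```python
from itertools import groupby


def rows_in_long_runs(rows, threshold=100, min_run=3):
    """Keep rows that sit in a run of at least min_run consecutive rows above threshold."""
    result = []
    # groupby starts a new group whenever the "above threshold" flag changes,
    # which plays the role of the running grp counter in the SQL.
    for qualifies, run in groupby(rows, key=lambda r: r[1] > threshold):
        run = list(run)
        if qualifies and len(run) >= min_run:
            result.extend(run)
    return result


trend = [(0, 200), (1, 90), (2, 101), (3, 120), (4, 200), (5, 10), (6, 400)]
selected = rows_in_long_runs(trend)
print(selected)
```

Changing min_run generalizes the check to runs of any length, which is the main attraction of treating the problem as islands.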
Key point here is "3 rows or more". MATCH_RECOGNIZE could be used:
SELECT *
FROM trend
MATCH_RECOGNIZE (
ORDER BY id -- ordering of a streak
MEASURES FINAL COUNT(*) AS l -- count "per" match
ALL ROWS PER MATCH -- get all rows
PATTERN(a{3,}) -- 3 or more
DEFINE a AS volume > 100 -- condition of streak
)
ORDER BY l DESC FETCH FIRST 1 ROWS WITH TIES;
-- choose the group that has the longest streak
The strength of this approach is the PATTERN part, which can be modified to handle different scenarios such as a{3,5} (between 3 and 5 occurrences), a{4} (exactly 4 occurrences) and so on. More conditions can be defined, which allows complex pattern detection to be built.
db<>fiddle demo
Get the min value of volume for every window of 3 consecutive rows of the table.
Then join to the table and keep only the rows belonging to a window whose min is > 100:
select distinct t.*
from Trend t
inner join (
select t.*,
min(t.volume) over (order by t.id rows between current row and 2 following) min_volume,
lead(t.id, 1) over (order by t.id) next1,
lead(t.id, 2) over (order by t.id) next2
from Trend t
) m on t.id in (m.id, m.next1, m.next2)
where m.min_volume > 100 and m.next1 is not null and m.next2 is not null
See the demo for SQL Server, MySql, Postgresql, Oracle, SQLite.
Results:
> id | volume
> -: | -----:
> 2 | 101
> 3 | 120
> 4 | 200
A simplistic approach:
--CREATE TABLE Trend (id integer, volume integer);
--insert into Trend VALUES
-- (0,200),
-- (1,90),
-- (2,101),
-- (3,120),
-- (4,200),
-- (5,10),
-- (6,400);
SELECT
t1.id, t1.volume
--,t2.id, t2.volume
--,t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100
union all
SELECT
--t1.id, t1.volume
t2.id, t2.volume
--,t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100
union all
SELECT
--t1.id, t1.volume
--t2.id, t2.volume
t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100

Get most recent measurement

I have a table that has some measurements, an ID and a date.
The table is built like so
ID DATE M1 M2
1 2020 1 NULL
1 2020 NULL 15
1 2018 2 NULL
2 2019 1 NULL
2 2019 NULL 1
I would like to end up with a table that has one row per ID with the most recent measurement
ID M1 M2
1 1 15
2 1 1
Any ideas?
You can use a correlated subquery with aggregation:
select id, max(m1), max(m2)
from t
where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
group by id;
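A quick way to try the correlated subquery on the sample rows is Python's built-in sqlite3 module. This is a sketch only: the table name t and the integer type for the date column are assumptions taken from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date integer, m1 integer, m2 integer)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 2020, 1, None),
        (1, 2020, None, 15),
        (1, 2018, 2, None),
        (2, 2019, 1, None),
        (2, 2019, None, 1),
    ],
)

# Keep only the rows carrying each id's latest date, then collapse them:
# max() skips NULLs, so the two partial rows merge into one.
latest = conn.execute(
    """
    select id, max(m1), max(m2)
    from t
    where t.date = (select max(t1.date) from t t1 where t1.id = t.id)
    group by id
    order by id
    """
).fetchall()

print(latest)
```

Note that the filter keeps every row that shares the latest date, which is what lets M1 and M2 from two different rows end up on the same output row.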
Use RANK combined with an aggregation:
WITH cte AS (
SELECT *, RANK() OVER (PARTITION BY ID ORDER BY DATE DESC) rnk
FROM yourTable
)
SELECT ID, MAX(M1) AS M1, MAX(M2) AS M2
FROM cte
WHERE rnk = 1
GROUP BY ID;
RANK keeps every record for each ID that shares the most recent date (ROW_NUMBER would keep only one of the tied rows, so one of the measurements would come back NULL). Then we aggregate to find the max values for M1 and M2.
In standard SQL, you can use lag(ignore nulls):
select id, coalesce(m1, prev_m1), coalesce(m2, prev_m2)
from (select t.*,
lag(m1 ignore nulls) over (partition by id order by date) as prev_m1,
lag(m2 ignore nulls) over (partition by id order by date) as prev_m2,
row_number() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum = 1;

Comparing row values in oracle

I have Table1 with three columns:
Key | Date | Price
----------------------
1 | 26-May | 2
1 | 25-May | 2
1 | 24-May | 2
1 | 23 May | 3
1 | 22 May | 4
2 | 26-May | 2
2 | 25-May | 2
2 | 24-May | 2
2 | 23 May | 3
2 | 22 May | 4
I want to select the row where value 2 was last updated (24-May). The Date was sorted using RANK function.
I am not able to get the desired results. Any help will be appreciated.
SELECT *
FROM (SELECT key, DATE, price,
RANK() over (partition BY key order by DATE DESC) AS r2
FROM Table1 ORDER BY DATE DESC) temp;
Another way of looking at the problem is that you want to find the most recent record with a price different from the last price. Then you want the next record.
with lastprice as (
select t.*
from (select t.*
from table1 t
order by date desc
) t
where rownum = 1
)
select t.*
from (select t.*
from table1 t
where date > (select max(date)
from table1 t2
where t2.price <> (select price from lastprice)
)
order by date asc
) t
where rownum = 1;
This query looks complicated. But, it is structured so it can take advantage of indexes on table1(date). The subqueries are necessary in Oracle pre-12. In the most recent version, you can use fetch first 1 row only.
EDIT:
Another solution is to use lag() and find the most recent time when the value changed:
select t1.*
from (select t1.*
from (select t1.*,
lag(price) over (order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where rownum = 1;
Under many circumstances, I would expect the first version to have better performance, because the only heavy work is done in the innermost subquery to get the max(date). This version has to calculate the lag() as well as doing the order by. However, if performance is an issue, you should test on your data in your environment.
EDIT II:
My best guess is that you want this per key. Your original question says nothing about key, but:
select t1.*
from (select t1.*,
row_number() over (partition by key order by date desc) as seqnum
from (select t1.*,
lag(price) over (partition by key order by date) as prev_price
from table1 t1
) t1
where prev_price is null or prev_price <> price
order by date desc
) t1
where seqnum = 1;
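The per-key logic (walk each key's rows in date order and remember the last row whose price differs from the previous one) can be sketched in plain Python. The year in the dates below is made up, since the question only gives day and month, and the function name is hypothetical.

```python
from itertools import groupby


def last_price_change(rows):
    """For each key, return the (date, price) of the most recent row where the price changed."""
    result = {}
    # Sorting tuples orders by key first, then by date ascending.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev_price = None
        for _, date, price in group:
            # Same test as "prev_price is null or prev_price <> price".
            if prev_price is None or price != prev_price:
                result[key] = (date, price)
            prev_price = price
    return result


# Sample data from the question; the year 2024 is an assumption.
prices = [4, 3, 2, 2, 2]
days = ["2024-05-22", "2024-05-23", "2024-05-24", "2024-05-25", "2024-05-26"]
table1 = [(key, d, p) for key in (1, 2) for d, p in zip(days, prices)]

last_changes = last_price_change(table1)
print(last_changes)
```

For both keys this lands on 24 May, the first day of the current price of 2, which matches the row the question asks for.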
You can try this:
SELECT Date FROM Table1
WHERE Price = 2
AND PrimaryKey = (SELECT MAX(PrimaryKey) FROM Table1
WHERE Price = 2)
This is very similar to the second option by Gordon Linoff but introduces a second windowed function row_number() to locate the most recent row that changed the price. This will work for all or a range of keys.
select
*
from (
select
t2.*
, row_number() over(partition by Key order by date DESC) rn
from (
select
t1.*
, NVL(lag(Price) over(partition by Key order by date),0) prevPrice
from table1 t1
where Key IN (1,2,3,4,5) -- as an example
) t2
where Price <> prevPrice
)
where rn = 1
Apologies, but I haven't been able to test this at all.

sql server select query a bit complex

I have an item_prices table with prices. Those prices can change at any time.
I want to display all items where date is highest
ITEM_prices
id | Items_name | item_price | item_date
------------------------------------------
1 A 10 2012-01-01
2 B 15 2012-01-01
3 B 16 2013-01-01
4 C 50 2013-01-01
5 A 20 2013-01-01
I want to display ABC items once each with highest date like as below
id | Items_name | item_price | item_date
-------------------------------------------
3 B 16 2013-01-01
4 C 50 2013-01-01
5 A 20 2013-01-01
When plain aggregate functions are enough, why reach for a window function or a CTE?
SELECT t1.*
FROM ITEM_prices t1
JOIN
(
SELECT Items_name,MAX(item_date) AS MaxItemDate
FROM ITEM_prices
GROUP BY Items_name
)t2
ON t1.Items_name=t2.Items_name AND t1.item_date=t2.MaxItemDate
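Here is a small sketch that runs this join against the sample rows using Python's built-in sqlite3 module. SQLite stands in for SQL Server here purely for convenience, and the dates are stored as ISO text, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ITEM_prices (id integer, Items_name text, item_price integer, item_date text)"
)
conn.executemany(
    "insert into ITEM_prices values (?, ?, ?, ?)",
    [
        (1, "A", 10, "2012-01-01"),
        (2, "B", 15, "2012-01-01"),
        (3, "B", 16, "2013-01-01"),
        (4, "C", 50, "2013-01-01"),
        (5, "A", 20, "2013-01-01"),
    ],
)

# The derived table holds one (item, latest date) pair per item;
# joining back on both columns recovers the full row for that date.
newest = conn.execute(
    """
    SELECT t1.id, t1.Items_name, t1.item_price, t1.item_date
    FROM ITEM_prices t1
    JOIN (
        SELECT Items_name, MAX(item_date) AS MaxItemDate
        FROM ITEM_prices
        GROUP BY Items_name
    ) t2
      ON t1.Items_name = t2.Items_name AND t1.item_date = t2.MaxItemDate
    ORDER BY t1.id
    """
).fetchall()

print(newest)
```

If an item has two rows with the same latest date, both are returned, so this behaves differently from a ROW_NUMBER() = 1 filter in that case.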
One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your Items_name - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH NewestItem AS
(
SELECT
id, Items_name, item_price, item_date,
RowNum = ROW_NUMBER() OVER(PARTITION BY Items_name ORDER BY item_date DESC)
FROM
dbo.ITEM_Prices
)
SELECT
id, Items_name, item_price, item_date
FROM
NewestItem
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each Items_Name) - ordered by the item_date in descending order (newest date gets RowNum = 1).
Does that approach what you're looking for??
One way is to use window functions to find the maximum date for each item:
select id, Items_name, item_price, item_date
from (select ip.*,
max(item_date) over (partition by items_name) as max_item_date
from item_prices ip
) ip
where item_date = max_item_date;
This will select all rows with the max date.
SELECT *
FROM item_prices
WHERE item_date = (SELECT max(item_date) FROM item_prices)
ORDER BY ID
This will select all rows for each item with the max date for that item.
select p.id, p.Items_name, p.item_price, p.item_date
from item_prices p
join (select items_name, max(item_date) max_item_date
from item_prices
group by items_name
) ip
on p.items_name = ip.items_name and p.item_date = ip.max_item_date

Rows inside the greatest streak?

Given the Rows
symbol_id profit date
1 100 2009-08-18 01:01:00
1 100 2009-08-18 01:01:01
1 156 2009-08-18 01:01:04
1 -56 2009-08-18 01:01:06
1 18 2009-08-18 01:01:07
How would I most efficiently select the rows that are involved in the greatest streak (of profit).
The greatest streak would be the first 3 rows, and I would want those rows. The query I came up with is just a bunch of nested queries and derived tables. I am looking for an efficient way to do this using common table expressions or something more advanced.
You haven't defined how 0 profit should be treated or what happens if there is a tie for longest streak. But something like...
;WITH T1 AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY symbol_id ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY symbol_id, SIGN(profit)
ORDER BY date) AS Grp
FROM Data
), T2 AS
(
SELECT *,
COUNT(*) OVER (PARTITION BY symbol_id, SIGN(profit), Grp) AS StreakLen
FROM T1
)
SELECT TOP 1 WITH TIES *
FROM T2
ORDER BY StreakLen DESC
Or - if you are looking for most profitable streak
;WITH T1 AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY symbol_id ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY symbol_id, CASE WHEN profit >= 0 THEN 1 END
ORDER BY date) AS Grp
FROM Data
), T2 AS
(
SELECT *,
SUM(profit) OVER (PARTITION BY symbol_id, CASE WHEN profit >= 0 THEN 1 END, Grp) AS StreakProfit
FROM T1
)
SELECT TOP 1 WITH TIES *
FROM T2
ORDER BY StreakProfit DESC
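The difference-of-row-numbers trick in these queries is easier to follow with the numbers written out. Below is a minimal pure-Python sketch (function names are made up) that computes, for each row, its overall row number minus its row number within rows of the same sign. Rows in the same streak share that difference. The sketch keys each streak by the sign together with the difference, because the difference alone can coincide for two streaks of different sign.

```python
from collections import Counter, defaultdict


def sign(x):
    return (x > 0) - (x < 0)


def streak_keys(profits):
    """Return a (sign, overall_rn - per_sign_rn) key for each row, in date order."""
    per_sign_rn = defaultdict(int)
    keys = []
    for overall_rn, profit in enumerate(profits, start=1):
        s = sign(profit)
        per_sign_rn[s] += 1
        keys.append((s, overall_rn - per_sign_rn[s]))
    return keys


profits = [100, 100, 156, -56, 18]  # sample rows, already ordered by date
keys = streak_keys(profits)

# Length of each streak, and the rows belonging to the longest one.
lengths = Counter(keys)
longest_key = max(lengths, key=lengths.get)
longest = [p for p, k in zip(profits, keys) if k == longest_key]

print(keys)
print(longest)
```

For a sequence such as +, -, + the second and third rows both get a difference of 1, which is why the sign has to be part of the grouping key.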
declare @T table
(
symbol_id int,
profit int,
[date] datetime
)
insert into @T values
(1, 100, '2009-08-18 01:01:00'),
(1, 100, '2009-08-18 01:01:01'),
(1, 156, '2009-08-18 01:01:04'),
(1, -56, '2009-08-18 01:01:06'),
(1, 18 , '2009-08-18 01:01:07')
;with C1 as
(
select *,
row_number() over(order by [date]) as rn
from @T
),
C2 as
(
select *,
rn - row_number() over(order by rn) as grp
from C1
where profit >= 0
)
select top 1 with ties *
from C2
order by sum(profit) over(partition by grp) desc
Result:
symbol_id profit date rn grp
----------- ----------- ----------------------- -------------------- --------------------
1 100 2009-08-18 01:01:00.000 1 0
1 100 2009-08-18 01:01:01.000 2 0
1 156 2009-08-18 01:01:04.000 3 0
If this is SQL Server, you may want to consider using TOP 3 in your SELECT clause
with ORDER BY profit DESC.
On MySQL or PostgreSQL you may want to consider using LIMIT in your SELECT with
the same ORDER BY (this happens to match the sample data, but it ranks rows by profit and does not detect streaks in general).
Hope this helps.