SELECT statement that shows continuous data with condition - sql

I consider myself good at SQL, but I failed at this problem.
I need a SELECT statement that shows all rows with a volume above 100,
but only where 3 or more such rows are next to each other.
Given Table "Trend":
| id | volume |
+----+---------+
| 0 | 200 |
| 1 | 90 |
| 2 | 101 |
| 3 | 120 |
| 4 | 200 |
| 5 | 10 |
| 6 | 400 |
I need a SELECT statement to produce:
| 2 | 101 |
| 3 | 120 |
| 4 | 200 |

I suspect that you are after the following logic:
select *
from (
select t.*,
sum(case when volume > 100 then 1 else 0 end) over(order by id rows between 2 preceding and 2 following) cnt
from mytable t
) t
where volume > 100 and cnt >= 3
This counts how many values are above 100 in the range made of the two preceding rows, the current row and the next two rows. Then we filter on rows whose window count is 3 or more.
This uses a syntax that most databases support (provided that window functions are available). Neater expressions may be available depending on the actual database you are using.
In MySQL:
sum(volume > 100) over(order by id rows between 2 preceding and 2 following) cnt
In Postgres:
count(*) filter(where volume > 100) over(order by id rows between 2 preceding and 2 following) cnt
Or:
sum((volume > 100)::int) over(order by id rows between 2 preceding and 2 following) cnt
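To see what the window count returns, here is a minimal sketch that runs the generic query above against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later for window functions; the table name Trend comes from the question, while the run helper and the "scattered" data set are made up for illustration. Note that the count covers the whole five-row window, so three values above 100 do not have to be adjacent to reach a count of 3.

```python
import sqlite3

# Generic form of the query from the answer, against the question's table.
QUERY = """
SELECT id, volume
FROM (
    SELECT t.*,
           SUM(CASE WHEN volume > 100 THEN 1 ELSE 0 END)
               OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS cnt
    FROM Trend t
) s
WHERE volume > 100 AND cnt >= 3
ORDER BY id
"""

def run(data):
    """Load (id, volume) pairs into a fresh in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Trend (id INTEGER, volume INTEGER)")
    conn.executemany("INSERT INTO Trend VALUES (?, ?)", data)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

# Sample data from the question.
sample = [(0, 200), (1, 90), (2, 101), (3, 120), (4, 200), (5, 10), (6, 400)]
print(run(sample))     # [(2, 101), (3, 120), (4, 200)]

# Hypothetical data with no run of three adjacent rows above 100.
scattered = [(0, 200), (1, 90), (2, 101), (3, 90), (4, 200)]
print(run(scattered))  # [(2, 101)]
```

If the three rows must be strictly adjacent, the second result is a false positive, and one of the approaches that look at neighbouring rows or at islands is the safer choice.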

This is tricky because you want the original rows . . . I am going to suggest lag() and lead():
select id, volume
from (select t.*,
lag(volume, 2) over (order by id) as prev_volume_2,
lag(volume) over (order by id) as prev_volume,
lead(volume, 2) over (order by id) as next_volume_2,
lead(volume) over (order by id) as next_volume
from t
) t
where volume > 100 and
( (prev_volume_2 > 100 and prev_volume > 100) or
(prev_volume > 100 and next_volume > 100) or
(next_volume_2 > 100 and next_volume > 100)
);
Another method is to treat this as a gaps-and-islands problem. This makes the solution more generalizable. You can assign a group by counting the number of rows less than or equal to 100 up to each row. Then count the number that are greater than 100 to see if those groups qualify to be in the final results:
select id, volume
from (select t.*,
sum(case when volume > 100 then 1 else 0 end) over (partition by grp) as cnt
from (select t.*,
sum(case when volume <= 100 then 1 else 0 end) over (order by id) as grp
from t
) t
) t
where volume > 100 and cnt >= 3;
Here is a db<>fiddle with these two approaches.
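As a rough check of the gaps-and-islands version, the sketch below runs it from Python against an in-memory SQLite database (SQLite 3.25 or later assumed). The table is named Trend as in the question, and the subquery aliases g and s are only there to keep the nesting readable. The first query prints the intermediate grp column so the islands are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Trend (id INTEGER, volume INTEGER)")
conn.executemany(
    "INSERT INTO Trend VALUES (?, ?)",
    [(0, 200), (1, 90), (2, 101), (3, 120), (4, 200), (5, 10), (6, 400)],
)

# Step 1: a running count of rows <= 100 gives every row an island number.
groups = conn.execute("""
    SELECT id, volume,
           SUM(CASE WHEN volume <= 100 THEN 1 ELSE 0 END) OVER (ORDER BY id) AS grp
    FROM Trend
    ORDER BY id
""").fetchall()
print(groups)
# [(0, 200, 0), (1, 90, 1), (2, 101, 1), (3, 120, 1), (4, 200, 1), (5, 10, 2), (6, 400, 2)]

# Step 2: count the rows above 100 per island and keep islands with 3 or more.
islands = conn.execute("""
    SELECT id, volume
    FROM (
        SELECT g.*,
               SUM(CASE WHEN volume > 100 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY grp) AS cnt
        FROM (
            SELECT t.*,
                   SUM(CASE WHEN volume <= 100 THEN 1 ELSE 0 END)
                       OVER (ORDER BY id) AS grp
            FROM Trend t
        ) g
    ) s
    WHERE volume > 100 AND cnt >= 3
    ORDER BY id
""").fetchall()
print(islands)  # [(2, 101), (3, 120), (4, 200)]
```

Changing the threshold in the final WHERE clause (cnt >= 3) is all it takes to ask for longer or shorter runs.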

Key point here is "3 rows or more". MATCH_RECOGNIZE could be used:
SELECT *
FROM trend
MATCH_RECOGNIZE (
ORDER BY id -- ordering of a streak
MEASURES FINAL COUNT(*) AS l -- count "per" match
ALL ROWS PER MATCH -- get all rows
PATTERN(a{3,}) -- 3 or more
DEFINE a AS volume > 100 -- condition of streak
)
ORDER BY l DESC FETCH FIRST 1 ROWS WITH TIES;
-- choose the group that has the longest streak
The strength of this approach is the PATTERN part, which can be modified to handle different scenarios, such as a{3,5} (between 3 and 5 occurrences) or a{4} (exactly 4 occurrences). More conditions can be defined, which allows building complex pattern detection.

Get the minimum volume over every window of 3 consecutive rows of the table.
Then join back to the table and keep only the rows belonging to a window whose minimum is above 100:
select distinct t.*
from Trend t
inner join (
select t.*,
min(t.volume) over (order by t.id rows between current row and 2 following) min_volume,
lead(t.id, 1) over (order by t.id) next1,
lead(t.id, 2) over (order by t.id) next2
from Trend t
) m on t.id in (m.id, m.next1, m.next2)
where m.min_volume > 100 and m.next1 is not null and m.next2 is not null
See the demo for SQL Server, MySQL, PostgreSQL, Oracle, SQLite.
Results:
> id | volume
> -: | -----:
> 2 | 101
> 3 | 120
> 4 | 200

A simplistic approach:
--CREATE TABLE Trend (id integer, volume integer);
--insert into Trend VALUES
-- (0,200),
-- (1,90),
-- (2,101),
-- (3,120),
-- (4,200),
-- (5,10),
-- (6,400);
SELECT
t1.id, t1.volume
--,t2.id, t2.volume
--,t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100
union all
SELECT
--t1.id, t1.volume
t2.id, t2.volume
--,t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100
union all
SELECT
--t1.id, t1.volume
--t2.id, t2.volume
t3.id, t3.volume
FROM Trend t1
INNER JOIN Trend t2 ON t2.id>t1.id and t2.volume>100 and not exists (select * from Trend t5 where t5.id between t1.id+1 and t2.id-1)
INNER JOIN Trend t3 ON t3.id>t2.id and t3.volume>100 and not exists (select * from Trend where id between t2.id+1 and t3.id-1)
WHERE t1.volume>100

Related

SQL - How to SELECT the best two months which are next to each other

How to select in PostgreSQL the best two months from Table.
Table:
ID Month Value
1 2019-06 100
2 2019-07 120
3 2019-08 70
4 2019-09 200
5 2019-10 100
6 2019-11 50
I would like to select ID where sum(Value) of two months which are next to each other is the highest.
In the following case the result will be:
4 2019-09
5 2019-10
where the sum of values is equal to 300.
You can put the data on one row using join:
select t1.*, t2.*
from t t1 join
t t2
on t2.month = t1.month + interval '1 month'
order by t1.value + t2.value desc
limit 1;
Getting separate rows is trickier. You can easily get the first row using lead():
select t.*
from (select t.*, lead(value, 1, 0) over (order by month) as next_value
from t
) t
order by (value + next_value) desc
limit 1;
Getting the second month is much trickier. I am thinking that the simplest method is to unpivot the first results:
select t.*
from (select t1, t2
from t t1 join
t t2
on t2.month = t1.month + interval '1 month'
order by t1.value + t2.value desc
limit 1
) x cross join lateral
unnest(array[t1, t2]) t
order by t.month;
Here is a solution that uses solely window functions and that does not assume that month is of a date-like datatype.
This works as follows:
first, number the records by increasing month with row_number() (aliased as rn) and compute the sum of the current and previous value (aliased as vals)
then rank the records by descending vals (aliased as rnk)
expose the row number of the record that has the highest vals (aliased as rn_max)
finally, pull out this record and the preceding one (i.e. the one that has the previous row number)
Query:
select id, month, value
from (
select t.*, first_value(rn) over(order by rnk) rn_max
from (
select t.*, rank() over(order by vals desc) rnk
from (
select
t.*,
value + lag(value, 1, 0) over (order by month) vals,
row_number() over(order by month) rn
from mytable t
) t
) t
) t
where rn in (rn_max, rn_max - 1)
order by month
Step-by-step demo on DB Fiddle:
id | month | value
-: | :------ | ----:
4 | 2019-09 | 200
5 | 2019-10 | 100

SELECT SQL Matching Number

I have millions of rows of data that have similar values like this:
Id Reff Amount
1 a1 1000
2 a2 -1000
3 a3 -2500
4 a4 -1500
5 a5 1500
Every amount should have both a positive and a negative counterpart. The question is: how do I show only the records that don't have a matching opposite value, like the row with Id 3? Thanks for the help.
You can use not exists:
select t.*
from mytable t
where not exists (select 1 from mytable t1 where t1.amount = -1 * t.amount)
An anti-join written as a left join would also get the job done:
select t.*
from mytable t
left join mytable t1 on t1.amount = -1 * t.amount
where t1.id is null
Demo on DB Fiddle:
Id | Reff | Amount
-: | :--- | -----:
3 | a3 | -2500
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE Test(
Id int
,Reff varchar(2)
,Amount int
);
INSERT INTO Test(Id,Reff,Amount) VALUES (1,'a1',1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (2,'a2',-1000);
INSERT INTO Test(Id,Reff,Amount) VALUES (3,'a3',-2500);
INSERT INTO Test(Id,Reff,Amount) VALUES (4,'a4',-1500);
INSERT INTO Test(Id,Reff,Amount) VALUES (5,'a5',1500);
Query 1:
select t.*
from Test t
left join Test t1 on t1.amount = -t.amount -- match the opposite amount, not the absolute value
where t1.id is null
Results:
| Id | Reff | Amount |
|----|------|--------|
| 3 | a3 | -2500 |
Using NOT EXISTS or a LEFT JOIN works fine to find the amounts that don't have an opposite amount anywhere in the data.
But what if you really need the amounts that are not balanced out, taking the order by Id into account?
Such an SQL puzzle can be handled as a gaps-and-islands problem.
The solution might look a bit more complicated, but it's actually quite simple.
It first calculates a ranking per absolute value.
Then, based on that ranking, it keeps the last amount of each group where the SUM per ranking isn't balanced out (not 0):
SELECT Id, Reff, Amount
FROM
(
SELECT *,
SUM(Amount) OVER (PARTITION BY Rnk) AS SumAmountByRank,
ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Id DESC) AS Rn
FROM
(
SELECT Id, Reff, Amount,
ROW_NUMBER() OVER (ORDER BY Id) - ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id) AS Rnk
FROM YourTable
) AS q1
) AS q2
WHERE SumAmountByRank != 0
AND Rn = 1
ORDER BY Id;
A test on rextester here
If the sequence doesn't matter and only the balance matters, the query can be simplified:
SELECT Id, Reff, Amount
FROM
(
SELECT Id, Reff, Amount,
SUM(Amount) OVER (PARTITION BY ABS(Amount)) AS SumByAbsAmount,
ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id DESC) AS Rn
FROM YourTable
) AS q
WHERE SumByAbsAmount != 0
AND Rn = 1
ORDER BY Id;
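To make the difference between "has an opposite amount somewhere" and "balances out" concrete, here is a small sketch that runs the simplified query from Python against an in-memory SQLite database (SQLite 3.25 or later assumed). The first five rows are the sample data from the question; the sixth row (a6) is a hypothetical extra row added only to show an amount that has an opposite value in the table but still does not balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER, Reff TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "a1", 1000), (2, "a2", -1000), (3, "a3", -2500),
     (4, "a4", -1500), (5, "a5", 1500)],
)

# Simplified query: sum per absolute value, keep the last row of unbalanced groups.
BALANCE_QUERY = """
SELECT Id, Reff, Amount
FROM (
    SELECT Id, Reff, Amount,
           SUM(Amount) OVER (PARTITION BY ABS(Amount)) AS SumByAbsAmount,
           ROW_NUMBER() OVER (PARTITION BY ABS(Amount) ORDER BY Id DESC) AS Rn
    FROM YourTable
) AS q
WHERE SumByAbsAmount != 0 AND Rn = 1
ORDER BY Id
"""

# NOT EXISTS variant, for comparison.
NOT_EXISTS_QUERY = """
SELECT t.Id, t.Reff, t.Amount
FROM YourTable t
WHERE NOT EXISTS (SELECT 1 FROM YourTable t1 WHERE t1.Amount = -1 * t.Amount)
ORDER BY t.Id
"""

original = conn.execute(BALANCE_QUERY).fetchall()
print(original)    # [(3, 'a3', -2500)]

# Hypothetical extra row: a second +1000 with only one -1000 to offset it.
conn.execute("INSERT INTO YourTable VALUES (6, 'a6', 1000)")
unbalanced = conn.execute(BALANCE_QUERY).fetchall()
not_exists = conn.execute(NOT_EXISTS_QUERY).fetchall()
print(unbalanced)  # [(3, 'a3', -2500), (6, 'a6', 1000)]
print(not_exists)  # [(3, 'a3', -2500)]
```

With the extra row, NOT EXISTS still reports only a3 because a -1000 exists somewhere, while the balance query also flags the surplus 1000.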

SQL: How do I display all records per unique id, but not the first record ever recorded in SQL

Example:
id Pricemoney time/date
1 100 01/20/2017
1 10 01/21/2017
1 1000 01/21/2017
2 10 01/23/2017
2 100 01/24/2017
3 1000 01/19/2017
3 100 01/22/2017
3 10 01/24/2017
I want to run a SQL query that displays every id and its Pricemoney but does NOT include the first record (based on time/date) per unique id.
Just to clarify what I do not want to be displayed:
userid Pricemoney issuedate
1 100 01/20/2017 -- not included
1 10 01/21/2017
1 1000 01/21/2017
2 10 01/23/2017 -- not included
2 100 01/24/2017
3 1000 01/19/2017 -- not included
3 100 01/22/2017
3 10 01/24/2017
Expected result:
id Pricemoney time/date
1 10 01/21/2017
1 1000 01/21/2017
2 100 01/24/2017
3 100 01/22/2017
3 10 01/24/2017
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by id order by time_date asc) as seqnum
from <tablename> t
) t
where seqnum > 1;
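A quick way to check the row_number() query is to run it from Python against an in-memory SQLite database (SQLite 3.25 or later assumed). This is only a sketch: the table name prices and the column names pricemoney and time_date are placeholders for the real ones, and the dates are written in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names standing in for the real ones.
conn.execute("CREATE TABLE prices (id INTEGER, pricemoney INTEGER, time_date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [(1, 100, "2017-01-20"), (1, 10, "2017-01-21"), (1, 1000, "2017-01-21"),
     (2, 10, "2017-01-23"), (2, 100, "2017-01-24"),
     (3, 1000, "2017-01-19"), (3, 100, "2017-01-22"), (3, 10, "2017-01-24")],
)

# Number the rows per id by date and drop row 1 of every id.
rows = conn.execute("""
    SELECT id, pricemoney, time_date
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_date ASC) AS seqnum
        FROM prices p
    ) s
    WHERE seqnum > 1
    ORDER BY id, time_date, pricemoney
""").fetchall()

for row in rows:
    print(row)
# (1, 10, '2017-01-21')
# (1, 1000, '2017-01-21')
# (2, 100, '2017-01-24')
# (3, 100, '2017-01-22')
# (3, 10, '2017-01-24')
```

If two rows of the same id share the earliest date, row_number() drops only one of them, and which one is not defined unless the ORDER BY in the window has a tie-breaker.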
If you want to keep ids that have only a single row, you can do:
select t.*
from (select t.*,
row_number() over (partition by id order by time_date asc) as seqnum,
count(*) over (partition by id) as cnt
from <tablename> t
) t
where seqnum > 1 or cnt = 1;
You may use EXISTS:
select t1.*
from data t1
where exists (
select 1
from data t2
where t1.id = t2.id and t2.time_date < t1.time_date
)
You can try this:
select data1.id,data1.Date,data1.Pricemoney from data1
left join (
select id ,min(Date) date from data1
group by id
) as t
on data1.date= t.date and t.id = data1.id
where t.id is null
group by data1.id,data1.Date,data1.Pricemoney
The query above also drops ids that have only a single record. If you want
to keep those, add having count(id) > 1 to the subquery, e.g.:
select data1.id,data1.Date,data1.Pricemoney from data1
left join (
select id ,min(Date) date from data1
group by id
having COUNT(id) > 1
) as t
on data1.date= t.date and t.id = data1.id
where t.id is null
group by data1.id,data1.Date,data1.Pricemoney

Finding Missing Numbers series when Data Is Grouped in sql server

I need to write a query that will calculate the missing numbers, with their count, in a sequence when the data is "grouped". The data is in multiple groups, and each group is in sequence.
For example, I have number series like 1001-1050, 1245-1270, 4571-4590. All numbers of each series (1001, 1002, 1003, ... 1050) are stored in Table1, and some of those numbers are also stored in another table, Table2 (e.g. 1001, 1002, 1003, 1004, 1005).
I want to get output like this:
Utilized Numbers | Balance Numbers |
----------- -------------------------
1001 - 1005 = 5 | 1006 - 1050 = 45 |
1245 - 1251 = 7 | 1252 - 1270 = 19 |
4571 - 4573 = 3 | 4574 - 4590 = 17 |
Each number of a series is a single field that is stored in both tables.
You haven't really explained your data, but I'm guessing that "Utilized" are the numbers found in both Table1 and Table2, and "Balance" are the numbers found only in Table1.
You can get the result at least this way; it's a little bit messy, mostly because of formatting the results:
Edit: This is a new version that does not use lag.
select
min (case when C2 = 1 then MINID end), max (case when C2 = 1 then MAXID end), max(case when C2=1 then ROWS end),
min (case when C2 = 0 then MINID end), max (case when C2 = 0 then MAXID end), max(case when C2=0 then ROWS end)
from (
select min(ID) as MINID, max(ID) as MAXID, count(*) as ROWS, C2, row_number() over (partition by C2 order by min(ID)) as GRP3 from (
select *, ID - RN as GRP1, ID - RN2 as GRP2 from (
select
T1.ID, row_number() over (order by T1.ID) as RN,
case when T2.ID is NULL then 0 else 1 end as C2,
row_number() over (partition by case when T2.ID is NULL then 0 else 1 end order by T1.ID) as RN2,
T2.ID as ID2
from #Table1 T1
left outer join #Table2 T2 on T1.ID = T2.ID
) X
) Y
group by GRP1, GRP2, C2
) Z
group by GRP3
order by 1
The idea here is to compute a row number ordered by Table1.ID and compare it to Table1.ID: whenever the difference changes, a new group starts. The same logic is used a second time, but partitioned by whether the row exists in Table2, to handle changes between "Utilized" and "Balance".
From those groupings you can get the min and max value plus the number of rows. There's one additional grouping with min/max and case to format the result into 2 columns.
See the demo.
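The row-number-difference trick can be tried out with a reduced sketch. It is not the full query above: it uses a single row_number() partitioned by the utilized/balance flag and returns one row per range instead of formatting two columns. It runs from Python against an in-memory SQLite database (SQLite 3.25 or later assumed), and the two short series and the table contents are made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER)")
conn.execute("CREATE TABLE Table2 (ID INTEGER)")

# Made-up data: two series in Table1, the first few numbers of each in Table2.
table1_ids = list(range(1001, 1011)) + list(range(1245, 1251))
table2_ids = [1001, 1002, 1003, 1245, 1246]
conn.executemany("INSERT INTO Table1 VALUES (?)", [(i,) for i in table1_ids])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(i,) for i in table2_ids])

# ID minus a per-flag row number stays constant inside a run of consecutive
# IDs and changes as soon as there is a gap, so it can serve as a group key.
ranges = conn.execute("""
    SELECT C2, MIN(ID) AS MINID, MAX(ID) AS MAXID, COUNT(*) AS CNT
    FROM (
        SELECT T1.ID,
               CASE WHEN T2.ID IS NULL THEN 0 ELSE 1 END AS C2,
               T1.ID - ROW_NUMBER() OVER (
                   PARTITION BY CASE WHEN T2.ID IS NULL THEN 0 ELSE 1 END
                   ORDER BY T1.ID
               ) AS GRP
        FROM Table1 T1
        LEFT JOIN Table2 T2 ON T1.ID = T2.ID
    ) X
    GROUP BY C2, GRP
    ORDER BY MIN(ID)
""").fetchall()

for flag, lo, hi, cnt in ranges:
    label = "Utilized" if flag == 1 else "Balance"
    print(f"{label}: {lo} - {hi} = {cnt}")
# Utilized: 1001 - 1003 = 3
# Balance: 1004 - 1010 = 7
# Utilized: 1245 - 1246 = 2
# Balance: 1247 - 1250 = 4
```

Pairing each "Utilized" range with its "Balance" range on one output row is what the extra grouping by GRP3 in the full query takes care of.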

Any other alternative to write this SQL query

I need to select data based upon three conditions:
Find the latest date (StorageDate column) in the table for each record
See if there is more than one entry for the date found in the first step for the same ID (ID column)
And then see if DuplicateTypeID = 2
So if the table has the following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get the following results:
ID
1
4
I have written the following query, but it is really slow. I was wondering if anyone has a better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
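For a quick sanity check of the logic (not of the performance), the CTE can be run against the sample rows from the question. The sketch below does this from Python with an in-memory SQLite database (SQLite 3.25 or later assumed). It uses the ID column name from the sample table rather than RecordID, and adds an ORDER BY so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, StorageDate TEXT, DuplicateTypeID INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "2014-10-22", 1), (1, "2014-10-22", 2), (1, "2014-10-18", 1),
     (2, "2014-10-12", 1), (3, "2014-10-11", 1),
     (4, "2014-09-02", 1), (4, "2014-09-02", 2)],
)

ids = conn.execute("""
    WITH tmp AS (
        SELECT *,
               RANK()    OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
               COUNT(ID) OVER (PARTITION BY ID, StorageDate)              AS StorageDateCount
        FROM MyTable
    )
    SELECT DISTINCT ID
    FROM tmp
    WHERE StorageDateRank = 1      -- latest date for each ID
      AND StorageDateCount > 1     -- more than 1 entry for that date
      AND DuplicateTypeID = 2      -- one of those entries has type 2
    ORDER BY ID
""").fetchall()

print([row[0] for row in ids])  # [1, 4]
```

RANK() rather than ROW_NUMBER() matters here: all rows sharing the latest date get rank 1, so the type 2 row is not lost when it ties with another row.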
You can use the analytic function rank. Can you try this query?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1