GROUP values separated by specific records - sql

I want to build a counter that increases by one each time a specific record (here, START) appears in the rows.
time   event     revenue  counter
13.37  START     20       1
13.38  action A  10       1
13.40  action B  5        1
13.42  end                1
14.15  START     20       2
14.16  action B  5        2
14.18  end                2
15.10  START     20       3
15.12  end                3
I need to find the total revenue for every visit (the actions between START and END). I was thinking the best way would be to set a counter like the one in the last column, so I could group events. But if you have a better solution, I would be grateful.

You can use a query similar to the following:
with StartTimes as
(
  select time,
    startRank = row_number() over (order by time)
  from events
  where event = 'START'
)
select e.*, counter = st.startRank
from events e
  outer apply
  (
    select top 1 st.startRank
    from StartTimes st
    where e.time >= st.time
    order by st.time desc
  ) st
SQL Fiddle with demo.
This works for the sample data, but it may need to be adapted to the characteristics of the actual data, such as duplicate times or missing events.

SQL Server 2012 supports an OVER clause for aggregates, so if you're up to date on version, this will give you the counter you want:
count(case when eventname='START' then 1 end) over (order by eventtime)
You could also use the latest START time instead of a counter to group by, like this:
with t as (
  select
    *,
    max(case when eventname='START' then eventtime end)
      over (order by eventtime) as timeStart
  from YourTable
)
select
  timeStart,
  max(eventtime) as timeEnd,
  sum(revenue) as totalRevenue
from t
group by timeStart;
Here's a SQL Fiddle demo using the schema Ian posted for his solution.
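For anyone who wants to try the running-count idea without a SQL Server instance, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. It assumes SQLite 3.25 or later (for window functions), uses the question's column names (time, event, revenue), and the function name visit_totals is made up for this illustration.

```python
import sqlite3

# Sample rows from the question; "end" rows carry no revenue (NULL).
EVENTS = [
    ("13.37", "START", 20),
    ("13.38", "action A", 10),
    ("13.40", "action B", 5),
    ("13.42", "end", None),
    ("14.15", "START", 20),
    ("14.16", "action B", 5),
    ("14.18", "end", None),
    ("15.10", "START", 20),
    ("15.12", "end", None),
]


def visit_totals(rows):
    """Return (counter, first_time, last_time, total_revenue) per visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (time TEXT, event TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH numbered AS (
            SELECT time, event, revenue,
                   -- running count of START rows seen so far, in time order
                   COUNT(CASE WHEN event = 'START' THEN 1 END)
                       OVER (ORDER BY time) AS counter
            FROM events
        )
        SELECT counter, MIN(time), MAX(time), SUM(revenue)
        FROM numbered
        GROUP BY counter
        ORDER BY counter
        """
    ).fetchall()
    conn.close()
    return result


totals = visit_totals(EVENTS)
for row in totals:
    print(row)
```

The counter is only as reliable as the ordering column, so duplicate times would need a tie-breaker in the ORDER BY.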

Related

Oracle SQL - select last 3 rows after a specific row

Below is my data:
My requirement is to get the first 3 consecutive approvals. So from above data, ID 4, 5 and 6 are the rows that I need to select. ID 1 and 2 are not eligible, because ID 3 is a rejection and hence breaks the consecutive condition of actions. Basically, I am looking for the last rejection in the list and then finding the 3 consecutive approvals after that.
Also, if there are no rejections in the chain of actions then the first 3 actions should be the result. For below data:
So my output should be ID 11, 12 and 13.
And if there are less than 3 approvals, then the output should be the list of approvals. For below data:
output should be ID 21 and 22.
Is there any way to achieve this with SQL query only - i.e. no PL-SQL code?
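Here is one method that uses window functions:
For each row, count the approvals among that row and the two rows that follow it.
Find the minimum action_at among the rows where that count is three.
Filter to the approvals at or after that point (or to all approvals if there is no such row).
Keep the first three rows.
This version uses fetch, which is available in Oracle 12+: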
select t.*
from (select t.*,
             min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
      from (select t.*,
                   sum(case when action = 'APPROVAL' then 1 else 0 end) over (order by action_at rows between current row and 2 following) as has_approval_3
            from t
           ) t
     ) t
where action = 'APPROVAL' and
      (action_at >= first_action_at or first_action_at is null)
order by action_at
fetch first 3 rows only;
You can use the ROW_NUMBER analytic function together with a subquery that finds the last rejection, as follows:
SELECT * FROM
( SELECT
    Y.*,
    ROW_NUMBER() OVER(ORDER BY Y.ACTION_AT) AS RN
  FROM YOUR_TABLE Y
  WHERE Y.ACTION = 'APPROVE'
    AND Y.ACTION_AT >= COALESCE(
          (SELECT MAX(YIN.ACTION_AT)
           FROM YOUR_TABLE YIN
           WHERE YIN.ACTION = 'REJECT'
          ), Y.ACTION_AT) )
WHERE RN <= 3;
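The "approvals after the last rejection, at most three" logic can be sketched end to end as below. The question's data did not survive, so the table name actions, its columns, the integer action_at values, and the three sample chains are all hypothetical; the query runs on SQLite (3.25+) from Python purely so it can be tried quickly, and is not Oracle syntax.

```python
import sqlite3


def first_approvals(rows):
    """Return the ids of up to three approvals that follow the last rejection.

    rows: (id, action, action_at) tuples; action is 'APPROVE' or 'REJECT'.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (id INTEGER, action TEXT, action_at INTEGER)")
    conn.executemany("INSERT INTO actions VALUES (?, ?, ?)", rows)
    ids = conn.execute(
        """
        WITH last_reject AS (
            -- one row; last_at is NULL when the chain has no rejection
            SELECT MAX(action_at) AS last_at
            FROM actions
            WHERE action = 'REJECT'
        )
        SELECT id
        FROM (
            SELECT a.id,
                   ROW_NUMBER() OVER (ORDER BY a.action_at) AS rn
            FROM actions a
            CROSS JOIN last_reject r
            WHERE a.action = 'APPROVE'
              AND (r.last_at IS NULL OR a.action_at > r.last_at)
        )
        WHERE rn <= 3
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in ids]


# Hypothetical chains mirroring the three cases described in the question.
with_rejection = [(1, "APPROVE", 1), (2, "APPROVE", 2), (3, "REJECT", 3),
                  (4, "APPROVE", 4), (5, "APPROVE", 5), (6, "APPROVE", 6),
                  (7, "APPROVE", 7)]
no_rejection = [(11, "APPROVE", 1), (12, "APPROVE", 2),
                (13, "APPROVE", 3), (14, "APPROVE", 4)]
few_approvals = [(20, "REJECT", 1), (21, "APPROVE", 2), (22, "APPROVE", 3)]

result_with_rejection = first_approvals(with_rejection)
result_no_rejection = first_approvals(no_rejection)
result_few_approvals = first_approvals(few_approvals)
print(result_with_rejection, result_no_rejection, result_few_approvals)
```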

oracle sql need help on data according to pattern

I have sample data in an Oracle database, shown below:
set_no  set_eff_dt  set_term_dt
1000    1/1/2015    12/31/2016
1000    1/1/2017    10/31/2017
1000    11/1/2017   12/31/2018
1000    2/1/2019    10/31/2019
1000    11/1/2019   10/31/2020
I want the output to look like this:
1000    1/1/2015    12/31/2018
1000    2/1/2019    10/31/2020
Let me explain the pattern and how the output is derived.
In the second row, set_eff_dt equals the first row's set_term_dt + 1.
In the third row, set_eff_dt equals the second row's set_term_dt + 1.
In the 4th row, set_eff_dt is not the third row's set_term_dt + 1, so a new group starts here.
In the 5th row, set_eff_dt is again the 4th row's set_term_dt + 1,
so it is collapsed with the 4th row, as shown in the output.
We have thousands of records that follow this pattern, and we want to collapse them using the logic described.
What I have tried:
SELECT SET_NO,SET_EFF_DT,
case when LEAD (SET_EFF_DT,1) OVER (ORDER BY SET_EFF_DT)-1 = set_trm_dt then 1 else 0 end flg
FROM xx_fl_test
I was only able to flag the rows where the next row's SET_EFF_DT is set_trm_dt + 1. I still don't understand how to collapse the rows based on that flag.
This is a gaps-and-islands problem. I would solve by calculating a grouping variable, in the following steps:
Determine where a group begins. For this, I do a lag to get the previous set_trm_dt and use case logic to see if there is no "connection".
Do a cumulative sum of the flag, to assign a grp to each row.
Aggregate by grp.
The code looks like this:
select set_no, min(set_eff_dt), max(set_trm_dt)
from (select t.*,
             sum(case when set_eff_dt > prev_set_trm_dt + 1 then 1 else 0 end) over (partition by set_no order by set_eff_dt) as grp
      from (select t.*,
                   lag(set_trm_dt) over (partition by set_no order by set_eff_dt) as prev_set_trm_dt
            from xx_fl_test t
           ) t
     ) t
group by set_no, grp;
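The three steps (lag, cumulative sum, aggregate) can be tried on the question's sample rows with the sketch below. It runs on an in-memory SQLite database (3.25+ assumed) from Python rather than Oracle, so the dates are stored as ISO text and the "+ 1 day" arithmetic uses SQLite's date() function; the function name collapse_ranges is made up for the example.

```python
import sqlite3

# The question's sample rows, with dates rewritten as ISO strings.
ROWS = [
    (1000, "2015-01-01", "2016-12-31"),
    (1000, "2017-01-01", "2017-10-31"),
    (1000, "2017-11-01", "2018-12-31"),
    (1000, "2019-02-01", "2019-10-31"),
    (1000, "2019-11-01", "2020-10-31"),
]


def collapse_ranges(rows):
    """Merge rows whose set_eff_dt directly follows the previous set_trm_dt."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xx_fl_test (set_no INTEGER, set_eff_dt TEXT, set_trm_dt TEXT)"
    )
    conn.executemany("INSERT INTO xx_fl_test VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH lagged AS (
            -- step 1: fetch the previous row's end date
            SELECT set_no, set_eff_dt, set_trm_dt,
                   LAG(set_trm_dt) OVER (PARTITION BY set_no
                                         ORDER BY set_eff_dt) AS prev_trm
            FROM xx_fl_test
        ),
        grouped AS (
            -- step 2: a gap starts a new group; the running sum numbers the groups
            SELECT set_no, set_eff_dt, set_trm_dt,
                   SUM(CASE WHEN set_eff_dt > date(prev_trm, '+1 day')
                            THEN 1 ELSE 0 END)
                       OVER (PARTITION BY set_no ORDER BY set_eff_dt) AS grp
            FROM lagged
        )
        -- step 3: one output row per group
        SELECT set_no, MIN(set_eff_dt) AS eff, MAX(set_trm_dt) AS trm
        FROM grouped
        GROUP BY set_no, grp
        ORDER BY set_no, eff
        """
    ).fetchall()
    conn.close()
    return result


collapsed = collapse_ranges(ROWS)
for row in collapsed:
    print(row)
```

The first row of each set_no has no previous end date, so the comparison is NULL and the row falls into group 0 through the ELSE branch.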
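Consider cumulatively summing your generated column to produce a grouping variable. This requires two CTEs: one for your flg calculation and a second for the cumulative sum of flg with a window function. Finally, aggregate by cum_flg (but conditionally add 1 to the very first grouping value, which starts at 1, so that the first row joins the rows that follow it). Because flg marks rows whose successor continues them, a group made up of a single row gets no flag of its own and is merged with a neighbouring group, so check the result against data that contains isolated rows.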
WITH sub AS
(SELECT SET_NO, SET_EFF_DT, SET_TRM_DT,
CASE WHEN LEAD (SET_EFF_DT,1) OVER (ORDER BY SET_EFF_DT)-1 = SET_TRM_DT
THEN 1
ELSE 0
END AS flg
FROM xx_fl_test),
calc AS
(SELECT SET_NO, SET_EFF_DT, SET_TRM_DT,
SUM (flg) OVER (PARTITION BY SET_NO ORDER BY SET_EFF_DT) AS cum_flg
FROM sub)
SELECT SET_NO,
MIN(SET_EFF_DT) AS MIN_SET_EFF_DT,
MAX(SET_TRM_DT) AS MAX_SET_TRM_DT
FROM calc
GROUP BY SET_NO,
CASE cum_flg
WHEN 1
THEN cum_flg + 1
ELSE cum_flg
END
Rextester Demo

how can I speed up query with 3 function calls in PostgreSQL

I have a function pb(IDCODE,0); running it with IDCODE=320 gives this sample data:
select *
from pb(320,0)
logid  entrydate  qty
1      1.10.17    5
2      1.10.17    6
3      1.10.17    5
4      1.10.17    -3
5      2.10.17    6
6      3.10.17    -100
*It actually returns many more rows (around 20000), but I reduced it for the example.
pb is a very heavy function; in simple terms, it shows activities in order.
I want to find the entrydate of the first occurrence of qty<0 after the last row with qty>=0.
In order to do that I need to do something like this:
Select Min(logid) where qty<0 and logid>(select max(logid) where qty>=0)
In the above sample the requested result is 3.10.17
Because:
logid=5 is max(logid) where qty>=0
and
logid=6 is Min(logid) where qty<0 and logid>(select max(logid) where qty>=0)
which in fact is : Select Min(logid) where qty<0 and logid>5
So I wrote the following query:
select entrydate
from pb(320,0)
where logid= ( SELECT min(logid)
FROM pb(320,0)
where qty<0 and logid>(SELECT coalesce(max(logid),0)
FROM pb(320,0)
WHERE qty >= 0))
This works, but it calls the function pb(320,0) three times.
That is hugely time consuming, and I actually run this query for many IDCODEs (around 214), so pb(IDCODE,0) runs 214*3 times, which is horrible.
What can I do?
First, use a CTE, because Postgres might materialize the CTE.
However, you need only one table reference if you use window functions:
with t as (
      select *
      from pb(320,0)
     )
select t.*
from (select t.*, max(case when qty > 0 then logid end) over () as last_poslogid
      from t
     ) t
where logid > last_poslogid and qty < 0
order by logid
fetch first 1 row only;
More recent versions of Postgres support the filter clause which is a bit more efficient than the case.
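Here is a small sketch of the single-scan idea that can be run without Postgres. It uses an in-memory SQLite database (3.25+ assumed) from Python, and a plain table named pb_result stands in for the output of pb(320,0); that table name and the function first_negative_after_last_positive are made up for the example. It follows the question's own query in treating qty >= 0 as "positive" and in falling back to 0 when there is no such row.

```python
import sqlite3

# Sample output of pb(320,0) from the question.
PB_ROWS = [
    (1, "1.10.17", 5),
    (2, "1.10.17", 6),
    (3, "1.10.17", 5),
    (4, "1.10.17", -3),
    (5, "2.10.17", 6),
    (6, "3.10.17", -100),
]


def first_negative_after_last_positive(rows):
    """Return the entrydate of the first qty<0 row after the last qty>=0 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pb_result (logid INTEGER, entrydate TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO pb_result VALUES (?, ?, ?)", rows)
    row = conn.execute(
        """
        SELECT entrydate
        FROM (
            SELECT logid, entrydate, qty,
                   -- the same value on every row: the highest logid with qty >= 0
                   MAX(CASE WHEN qty >= 0 THEN logid END) OVER () AS last_pos
            FROM pb_result          -- the expensive source is read only once
        )
        WHERE qty < 0
          AND logid > COALESCE(last_pos, 0)
        ORDER BY logid
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return row[0] if row else None


entrydate = first_negative_after_last_positive(PB_ROWS)
print(entrydate)
```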

SQL: How to Group By task and choose only those that are completed?

I'm using SQL Server and need to query a to-do list as follows.
ID  Date       Status  Phase
1   21.1.2017  Done    1
1   22.1.2017  Done    2
2   20.1.2017  Done    1
2   22.1.2017  Undone  2
3   23.1.2017  Undone  1
3   25.1.2017  Undone  2
I need to find the tasks that are Done (all of their statuses are Done) and then take the last date, so MAX(Date). I don't need the Phase info.
The result should be
ID  Date       Status
1   22.1.2017  Done
Can you please help me with how to GROUP BY and take the MAX(Date), while also enforcing that all statuses are Done?
You can use this.
SELECT TOP 1 [ID], [Date], [Status] FROM MyTable
WHERE [Status] ='Done'
ORDER BY [Date] DESC
Here is one method:
select t.id, max(t.date) as date, max(t.status) as status
from t
group by t.id
having min(t.status) = max(t.status) and min(t.status) = 'Done';
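A quick way to check the HAVING approach against the sample data is the sketch below, which runs the same query shape on an in-memory SQLite database from Python. The table name todo, the column name task_date, and the function name completed_tasks are assumptions for the example, and the dates are rewritten as ISO strings so that MAX orders them correctly.

```python
import sqlite3

# Sample rows from the question: (id, date, status, phase).
TODO_ROWS = [
    (1, "2017-01-21", "Done", 1),
    (1, "2017-01-22", "Done", 2),
    (2, "2017-01-20", "Done", 1),
    (2, "2017-01-22", "Undone", 2),
    (3, "2017-01-23", "Undone", 1),
    (3, "2017-01-25", "Undone", 2),
]


def completed_tasks(rows):
    """Return (id, last_date, status) for tasks whose phases are all Done."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE todo (id INTEGER, task_date TEXT, status TEXT, phase INTEGER)"
    )
    conn.executemany("INSERT INTO todo VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, MAX(task_date) AS last_date, MAX(status) AS status
        FROM todo
        GROUP BY id
        -- min = max means the task has a single distinct status
        HAVING MIN(status) = MAX(status) AND MIN(status) = 'Done'
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


done = completed_tasks(TODO_ROWS)
print(done)
```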

Joining next Sequential Row

I am planning an SQL statement right now and need someone to look over my thoughts.
This is my Table:
id stat period
--- ------- --------
1 10 1/1/2008
2 25 2/1/2008
3 5 3/1/2008
4 15 4/1/2008
5 30 5/1/2008
6 9 6/1/2008
7 22 7/1/2008
8 29 8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible join syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be computed as the difference between the last value and the first value, divided by one less than the number of elements. Note that this is the mean of the signed gaps, not of their absolute values:
select sum(case when seqnum = num then stat else - stat end) * 1.0 / (max(num) - 1)
from (select stat, period, row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
You can also achieve this with a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
x.id thisStatId,
LAG(x.id) OVER (ORDER BY x.id) lastStatId,
x.stat thisStatValue,
LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x
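To tie the pieces together, here is a minimal sketch that computes each gap with LEAD() and then averages the absolute values, which is what the question asks for. It runs on an in-memory SQLite database (3.25+ assumed) from Python rather than SQL Server, keeps only the id and stat columns of tbstats, and the function name gaps_and_mean is made up for the example.

```python
import sqlite3

# stat values from the question, in id order.
STATS = [10, 25, 5, 15, 30, 9, 22, 29]


def gaps_and_mean(stats):
    """Return (list of next-minus-current gaps, mean of their absolute values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbstats (id INTEGER PRIMARY KEY, stat INTEGER NOT NULL)")
    conn.executemany("INSERT INTO tbstats (stat) VALUES (?)", [(s,) for s in stats])
    gap_query = """
        SELECT id, LEAD(stat) OVER (ORDER BY id) - stat AS gap
        FROM tbstats
    """
    # The last row has no successor, so its gap is NULL and is skipped here.
    gaps = [
        gap
        for (_id, gap) in conn.execute(gap_query + " ORDER BY id")
        if gap is not None
    ]
    # AVG ignores NULLs, so the last row does not affect the mean either.
    (mean_abs,) = conn.execute(
        "SELECT AVG(ABS(gap)) FROM (" + gap_query + ")"
    ).fetchone()
    conn.close()
    return gaps, mean_abs


gaps, mean_abs_gap = gaps_and_mean(STATS)
print(gaps, mean_abs_gap)
```

Unlike the id + 1 self-join, this does not depend on the id sequence being free of gaps; it only needs a column that defines the order.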