SQL Query: Partition into groups and calculate max - min value

I need help with a SQL query in an Oracle database. I want to partition my data into groups, where each group starts at a row with event = "Start". For example, rows 1-6 form one group and rows 7-9 form another. I want to ignore rows with event = "Ignore". Finally, I want to calculate max(Value) - min(Value) for each group. I don't have any column I can group the data by.
Can this be achieved? Is it possible to use PARTITION BY with Event = 'Start'? Sample data is below:
Row Event Value Required result (max(Value) - min(Value) of the group)
1 Start 10
2 A 11
3 B 12
4 C 13
5 D 14
6 E 15 5
--------------------------------------------
7 Start 16
8 A 18
9 B 20 4
--------------------------------------------
10 Start 27
11 A 30
12 B 33
13 C 34 7
--------------------------------------------
14 Ignore 35
--------------------------------------------
15 Ignore 36
--------------------------------------------
16 Start 33
17 A 34
18 B 35
19 C 36
20 D 37
21 E 38 5
--------------------------------------------

Yes, you can do this in SQL.
The following query first finds the group that a row belongs to, by finding the largest Start row id at or before the current row (it assumes the row-number column is named id). This version uses a correlated subquery for that calculation.
It then filters out the Ignore rows, groups on that id and does the calculation.
select groupid, max(value) - min(value) as diff
from (select t.*,
             (select max(t2.id) from t t2
              where t2.id <= t.id and t2.event = 'Start'
             ) as groupid
      from t
     ) t
where event <> 'Ignore'
group by groupid
order by groupid
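To sanity-check the logic, here is a minimal Python sketch that loads the sample rows into an in-memory SQLite database and runs both the corrected correlated-subquery query and a swapped-in window-function variant. Assumptions: SQLite rather than Oracle, a table named t, and the Row column named id. The window version numbers the groups with a running count of Start rows, which avoids the correlated subquery; it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Sample rows from the question as (id, event, value); "id" stands in for the Row column
ROWS = [
    (1, "Start", 10), (2, "A", 11), (3, "B", 12), (4, "C", 13), (5, "D", 14), (6, "E", 15),
    (7, "Start", 16), (8, "A", 18), (9, "B", 20),
    (10, "Start", 27), (11, "A", 30), (12, "B", 33), (13, "C", 34),
    (14, "Ignore", 35), (15, "Ignore", 36),
    (16, "Start", 33), (17, "A", 34), (18, "B", 35), (19, "C", 36), (20, "D", 37), (21, "E", 38),
]

# Correlated subquery: group id = largest Start id at or before the current row
CORRELATED = """
SELECT groupid, MAX(value) - MIN(value) AS diff
FROM (SELECT t.*,
             (SELECT MAX(t2.id) FROM t AS t2
              WHERE t2.id <= t.id AND t2.event = 'Start') AS groupid
      FROM t) AS x
WHERE event <> 'Ignore'
GROUP BY groupid
ORDER BY groupid
"""

# Window function: a running count of Start rows gives the group number
WINDOWED = """
SELECT grp, MAX(value) - MIN(value) AS diff
FROM (SELECT t.*,
             SUM(CASE WHEN event = 'Start' THEN 1 ELSE 0 END)
                 OVER (ORDER BY id) AS grp
      FROM t) AS x
WHERE event <> 'Ignore'
GROUP BY grp
ORDER BY grp
"""

def diffs(query, rows=ROWS):
    """Run `query` against an in-memory table t and return max - min per group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, event TEXT, value INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = [diff for _, diff in con.execute(query)]
    con.close()
    return result

if __name__ == "__main__":
    print(diffs(CORRELATED))  # [5, 4, 7, 5]
    print(diffs(WINDOWED))    # [5, 4, 7, 5]
```

The Ignore rows (14 and 15) are still assigned to the group that started at row 10, but the WHERE clause removes them before aggregation, so they never affect max - min. On large tables the window version may be preferable, since it can compute all group numbers in one ordered pass.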

Is there a way to query a table referencing the result column for prior periods?

The Problem
For each ID, I want a query in which the calculation for the current time period uses the aggregated results of all prior time periods.
The solution I came up with is to calculate each time period separately, but this is problematic because the number of time periods changes.
Is there a general way to query this that does not require hardcoding the result set for each time period?
The Math
For each ID in the data the calculations would be as follows
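In time period 0 there are no prior results, so the aggregate is zero and the equation would be:
$R_0 = \frac{(B_0 + 0)\,P_0}{A_0}$
For time period 1 it should be:
$R_1 = \frac{(B_1 + R_0)\,P_1}{A_1}$
And to show the goal, I'll jump to time period 3:
$R_3 = \frac{(B_3 + R_0 + R_1 + R_2)\,P_3}{A_3}$
For ID1 and time period 3 the result would be:
$R_3 = \frac{(15 + 0 + 1.7857 + 2.6013) \cdot 32}{43} \approx 14.4275$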
As seen in the WANT column in the data below
The Data
The data I have is ID, T, B, P and A. WANT is the expected result and should match R in the equations.
ID T B P A WANT
ID1 0 25 0 75 0.0000
ID1 1 25 5 70 1.7857
ID1 2 20 8 67 2.6013
ID1 3 15 32 43 14.4275
ID2 0 25 0 75 0.0000
ID2 1 20 5 70 1.4286
ID2 2 17 8 67 2.2004
ID2 3 10 32 43 10.1425
ID3 0 25 0 75 0.0000
ID3 1 25 5 70 1.7857
ID3 2 25 8 67 3.1983
ID3 3 5 32 43 7.4300
Now with the solution provided by gsalem:
Try this in your dbfiddle:
with get_rates (id, t, b, p, a, prt, w) as
(-- anchor: period 0, no prior results yet
 select id, t, b, p, a, 0 prt, (0 + b)*p/a w
 from data
 where t = 0
 union all
 -- step: prt = sum of all earlier w values for the same id
 select d.id, d.t, d.b, d.p, d.a, r.prt + r.w, (r.prt + r.w + d.b)*d.p/d.a
 from data d join get_rates r on (d.id = r.id and d.t = r.t + 1))
select id, t, b, p, a, w
from get_rates
order by id, t
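The recursive CTE carries the running total of earlier results in prt, so each step computes w = (prt + B) · P / A, which reproduces the WANT column. Here is a minimal sketch of the same recursive CTE run in SQLite from Python, assuming SQLite instead of the dbfiddle database and the table name data used in the query. One SQLite-specific detail: integer division truncates there, so the anchor uses 0.0 to force floating-point arithmetic.

```python
import sqlite3

# (id, t, b, p, a) rows from the question's table
DATA = [
    ("ID1", 0, 25, 0, 75), ("ID1", 1, 25, 5, 70), ("ID1", 2, 20, 8, 67), ("ID1", 3, 15, 32, 43),
    ("ID2", 0, 25, 0, 75), ("ID2", 1, 20, 5, 70), ("ID2", 2, 17, 8, 67), ("ID2", 3, 10, 32, 43),
    ("ID3", 0, 25, 0, 75), ("ID3", 1, 25, 5, 70), ("ID3", 2, 25, 8, 67), ("ID3", 3, 5, 32, 43),
]

QUERY = """
WITH RECURSIVE get_rates(id, t, b, p, a, prt, w) AS (
    -- anchor: period 0, no prior results yet (prt = 0); 0.0 forces real arithmetic
    SELECT id, t, b, p, a, 0.0, (0.0 + b) * p / a
    FROM data WHERE t = 0
    UNION ALL
    -- step: prt accumulates all earlier w values for the same id
    SELECT d.id, d.t, d.b, d.p, d.a,
           r.prt + r.w,
           (r.prt + r.w + d.b) * d.p / d.a
    FROM data AS d
    JOIN get_rates AS r ON d.id = r.id AND d.t = r.t + 1
)
SELECT id, t, w FROM get_rates ORDER BY id, t
"""

def running_rates(rows):
    """Return (id, t, w rounded to 4 places) for every period of every id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id TEXT, t INTEGER, b INTEGER, p INTEGER, a INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)
    out = [(i, t, round(w, 4)) for i, t, w in con.execute(QUERY)]
    con.close()
    return out

if __name__ == "__main__":
    for row in running_rates(DATA):
        print(row)
```

The recursion walks forward one period at a time per id, so it adapts to any number of periods without hardcoding them, as long as t is contiguous (0, 1, 2, ...) within each id.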

SAS sum observations not in a group, by group

I have a data set :
data have;
input group $ value;
datalines;
A 4
A 3
A 2
A 1
B 1
C 1
D 2
D 1
E 1
F 1
G 2
G 1
H 1
;
run;
The first variable is a group identifier, the second a value.
For each observation, I want a new variable "sum" holding the sum of all values in the column, except those in the observation's own group.
My issue is that I have to do this on nearly 30 million observations, so efficiency matters.
I found that using a DATA step was more efficient than using procs.
The final data set should look like:
data want;
input group $ value sum;
datalines;
A 4 11
A 3 11
A 2 11
A 1 11
B 1 20
C 1 20
D 2 18
D 1 18
E 1 20
F 1 20
G 2 18
G 1 18
H 1 20
;
run;
Any idea how to do this?
Edit: I don't know if this matters, but the example I gave is a simplified version of my issue. In the real case, I have 2 other group variables, so taking the sum of the whole column and subtracting the sum in the group is not a viable solution.
The requirement
sum of all values in the column, except for the group the observation is in
indicates that two passes over the data must occur:
1. Compute allsum and each group's group_sum. A hash can store each group's sum, computed via a specified suminc: variable and .ref() method invocation. A variable can accumulate allsum.
2. Compute allsum - group_sum for each row of a group. The group_sum is retrieved from the hash and subtracted from allsum.
Example:
data want;
if 0 then set have; * prep pdv;
declare hash sums (suminc:'value');
sums.defineKey('group');
sums.defineDone();
do while (not hash_loaded);
set have end=hash_loaded;
sums.ref(); * adds value to internal sum of hash data record;
allsum + value;
end;
do while (not last_have);
set have end=last_have;
sums.sum(sum:sum); * retrieve groups sum. Do you hear the Dragnet theme too?;
sum = allsum - sum; * subtract from allsum;
output;
end;
stop;
run;
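The same two-pass idea can be sketched in Python purely to show the logic (this is not a replacement for the DATA step): the first pass accumulates each group's sum and the grand total, the second subtracts each row's own group sum. The key and value parameters are hypothetical; passing a tuple key such as (group, var2) is one possible way to handle the extra grouping variables mentioned in the edit, if the rows to exclude are those sharing the combined key.

```python
from collections import defaultdict

# HAVE data set from the question as (group, value) rows
HAVE = [("A", 4), ("A", 3), ("A", 2), ("A", 1), ("B", 1), ("C", 1), ("D", 2),
        ("D", 1), ("E", 1), ("F", 1), ("G", 2), ("G", 1), ("H", 1)]

def sum_outside_group(rows, key=lambda row: row[0], value=lambda row: row[1]):
    """Return a list of (row, sum of all values outside the row's group)."""
    group_sums = defaultdict(int)
    grand_total = 0
    # Pass 1: per-group sums (like the hash with suminc:) plus the grand total (allsum)
    for row in rows:
        v = value(row)
        group_sums[key(row)] += v
        grand_total += v
    # Pass 2: for every row, subtract its own group's sum from the grand total
    return [(row, grand_total - group_sums[key(row)]) for row in rows]

if __name__ == "__main__":
    for row, total in sum_outside_group(HAVE):
        print(*row, total)
```

For rows shaped like (group, var2, value), a call such as sum_outside_group(rows, key=lambda r: (r[0], r[1]), value=lambda r: r[2]) would exclude the combined group instead.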
What is wrong with a straightforward approach? You need to make two passes no matter what you do.
Like this. I included extra variables so you can see how the values are derived.
proc sql ;
create table want as
select a.*,b.grand,sum(value) as total, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group
;
quit;
Results:
Obs group value grand total sum
1 A 3 21 10 11
2 A 1 21 10 11
3 A 2 21 10 11
4 A 4 21 10 11
5 B 1 21 1 20
6 C 1 21 1 20
7 D 2 21 3 18
8 D 1 21 3 18
9 E 1 21 1 20
10 F 1 21 1 20
11 G 1 21 3 18
12 G 2 21 3 18
13 H 1 21 1 20
Note that it does not matter which variables you have in your GROUP BY clause, so the same query also works with your additional grouping variables.
Do you really need to output all of the original observations? Why not just output the summary table?
proc sql ;
create table want as
select a.group, sum(value) as total, b.grand - sum(value) as sum
from have a
, (select sum(value) as grand from have) b
group by a.group, b.grand
;
quit;
Results
Obs group total sum
1 A 10 11
2 B 1 20
3 C 1 20
4 D 3 18
5 E 1 20
6 F 1 20
7 G 3 18
8 H 1 20
I would break this out into two different segments:
1.) You could start by using PROC SQL to get the sums by the group
2.) Then use some IF/THEN statements to reassign the values by group

Two Condition Where-clause SQL

I need to filter some rows when 2 conditions are met, but not excluding the other rows.
Table:
idRow idMaster idList
1 10 45
2 10 46
3 10 47
4 11 10
5 11 98
6 14 56
7 16 28
8 20 55
Example:
When:
idMaster=10 and idList=45 (only show this combination for idMaster 10)
idMaster=11 and idList=98 (only show this combination for idMaster 11)
List all other rows as well.
Expected result:
idRow idMaster idList
1 10 45
5 11 98
6 14 56
7 16 28
8 20 55
Running SQL Server 2014
I tried combinations of CASE and IF, but every attempt returned only the idMaster = 10, 11 and idList = 45, 98 rows, excluding the other rows.
This query logic is standard SQL, so it should work in SQL Server 2014 as well as in other databases:
SELECT *
FROM your_table
WHERE idMaster NOT IN (10,11)
OR (idMaster = 10 AND idList = 45)
OR (idMaster = 11 AND idList = 98)
You can also do this with a (nested) CASE expression. Hopefully this helps you understand the logic better.
select *
from your_table
where case idMaster
        when 10 then case idList when 45 then 1 end
        when 11 then case idList when 98 then 1 end
        else 1
      end = 1
This might be the best option, though (note that the conditions test idMaster, not idList):
not ((idMaster = 10 and idList <> 45) or (idMaster = 11 and idList <> 98))
Overall, it's usually beneficial to avoid repeating the list of idMaster values in multiple places, as the NOT IN (10,11) version does. Both of these forms avoid the need to keep the lists in sync when the values change.
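A quick way to confirm that the NOT IN, nested CASE and negated forms select the same rows is to run them against the question's table. This sketch uses SQLite through Python (an assumption; the question is about SQL Server 2014, but these predicates are plain standard SQL), and the negated predicate is written with idMaster in the conditions.

```python
import sqlite3

# (idRow, idMaster, idList) rows from the question
ROWS = [(1, 10, 45), (2, 10, 46), (3, 10, 47), (4, 11, 10),
        (5, 11, 98), (6, 14, 56), (7, 16, 28), (8, 20, 55)]

PREDICATES = {
    "not_in": "idMaster NOT IN (10, 11)"
              " OR (idMaster = 10 AND idList = 45)"
              " OR (idMaster = 11 AND idList = 98)",
    "nested_case": "CASE idMaster"
                   " WHEN 10 THEN CASE idList WHEN 45 THEN 1 END"
                   " WHEN 11 THEN CASE idList WHEN 98 THEN 1 END"
                   " ELSE 1 END = 1",
    "negated": "NOT ((idMaster = 10 AND idList <> 45)"
               " OR (idMaster = 11 AND idList <> 98))",
}

def matching_rows(predicate):
    """Return the idRow values that satisfy `predicate`, in idRow order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (idRow INTEGER, idMaster INTEGER, idList INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)
    sql = f"SELECT idRow FROM your_table WHERE {predicate} ORDER BY idRow"
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids

if __name__ == "__main__":
    for name, predicate in PREDICATES.items():
        print(name, matching_rows(predicate))  # each prints [1, 5, 6, 7, 8]
```

In the nested CASE form, a non-matching idList yields NULL, and NULL = 1 is not true, so those rows are filtered out without needing an explicit ELSE 0.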

Sort a string column that has numbers and letters (Oracle SQL)

I want to sort a string column which can include both numbers and alphabets.
SQL Script:
select distinct a.UoA, b.rating , b.tot from omt_source a left join
wlm_progress_Scored b
on a.UoA = b.UoA
where a.UoA in (select UoA from UserAccess_dev
where trim(App_User) = lower(:APP_USER))
order by
regexp_substr(UoA, '^\D*'),
to_number(regexp_substr(UoA, '\d+'));
Output I'm currently getting:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
23
26B
26A
27
28
30
31
32
33
34B
34A
But I want the 26 and 34 values in this order:
26A
26B
34A
34B
Any suggestions would be very helpful.
Thanks
Keep your existing sort keys, so that the primary sort order is based on the numerical component of the UoA field, and then add the UoA field itself as a final key. Values with the same number (26A and 26B) are then ordered by the full string. I.e.
order by
regexp_substr(UoA, '^\D*'),
to_number(regexp_substr(UoA, '\d+')),
UoA;
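To see why a final UoA key fixes the 26A/26B order without breaking the numeric order, here is a small Python sketch of the same three-part sort key: non-digit prefix, first number, then the full string. The helper name uoa_sort_key is hypothetical; a value with no digits sorts last here, mirroring Oracle's default of NULLS LAST for ascending sorts.

```python
import re

def uoa_sort_key(uoa):
    """Three-part key: leading non-digits, first run of digits as a number, full string."""
    prefix = re.match(r"\D*", uoa).group(0)   # like regexp_substr(UoA, '^\D*')
    digits = re.search(r"\d+", uoa)           # like regexp_substr(UoA, '\d+')
    number = int(digits.group(0)) if digits else float("inf")  # no number: sort last
    return (prefix, number, uoa)              # full string breaks ties: 26A before 26B

values = ["1", "10", "2", "26B", "26A", "27", "34B", "34A"]
print(sorted(values, key=uoa_sort_key))
```

Sorting by the UoA string alone would put "10" before "2"; keeping the numeric key first and using the string only as a tie-breaker avoids that.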

SQL - Select rows after reaching minimum value/threshold

I'm using SQL Server Management Studio. My data set is below.
ID Days Value Threshold
A 1 10 30
A 2 20 30
A 3 34 30
A 4 25 30
A 5 20 30
B 1 5 15
B 2 10 15
B 3 12 15
B 4 17 15
B 5 20 15
I want to run a query so that, for each ID, only the rows from the point where the threshold is first crossed onward are selected. I also want to create a new days column that starts at 1 at the first selected row. The expected output for the above dataset would look like:
ID Days Value Threshold NewDayColumn
A 3 34 30 1
A 4 25 30 2
A 5 20 30 3
B 4 17 15 1
B 5 20 15 2
It doesn't matter if the data drops below the threshold in later rows; I want the first row where the threshold is crossed to be day 1 and then continue counting rows for that ID.
Thank you!
You can use window functions for this. Here is one method:
select t.*, row_number() over (partition by id order by days) as newDayColumn
from (select t.*,
min(case when value > threshold then days end) over (partition by id) as threshold_days
from t
) t
where days >= threshold_days;
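Here is a minimal sketch that runs the same window-function query against the sample data in SQLite from Python (an assumption; the question uses SQL Server, and SQLite needs version 3.25 or later for window functions). ROW_NUMBER() is evaluated after the WHERE clause, so numbering starts at 1 on the first kept row. An ID whose value never exceeds its threshold gets a NULL threshold_days and drops out entirely. The query keeps the answer's value > threshold; use >= if "reached" should include equality.

```python
import sqlite3

# (id, days, value, threshold) rows from the question
ROWS = [("A", 1, 10, 30), ("A", 2, 20, 30), ("A", 3, 34, 30), ("A", 4, 25, 30), ("A", 5, 20, 30),
        ("B", 1, 5, 15), ("B", 2, 10, 15), ("B", 3, 12, 15), ("B", 4, 17, 15), ("B", 5, 20, 15)]

QUERY = """
SELECT id, days, value, threshold,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY days) AS new_day
FROM (
    SELECT t.*,
           -- first day on which the value exceeds the threshold, per id
           MIN(CASE WHEN value > threshold THEN days END)
               OVER (PARTITION BY id) AS threshold_days
    FROM t
) AS x
WHERE days >= threshold_days
ORDER BY id, days
"""

def rows_after_threshold(rows=ROWS):
    """Return (id, days, value, threshold, new_day) for rows from the first crossing on."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, days INTEGER, value INTEGER, threshold INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in rows_after_threshold():
        print(*row)
```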