Grouping when using analytic functions

Grouping when using analytic functions - sql

Let's suppose we have a table that looks like this:
Level|Depth|Descrip|
0 | 0 | Base |
1 | 50 | Level_1 |
2 | 53 | Level_2 |
3 | 60 | Level_3 |
8 | 80 | Level_8 |
10 | 81 | Level_10|
15 | 101 | Level_15|
16 | 102 | Level_16|
17 | 102 | Level_16_bis|
18 | 103 | Level_17|
I need, in first place, to get the rows that represent significative(more than 15 mts) depth jump respecting the previous ones. I get those rows doing something like this:
Select level,depth, descrip from(
Select level
, depth
,lag(depth) over (order by level asc) as prev_depth
, descrip
from ground_levels
)
Where abs(depth-prev_depth) > 15 and depth > 0
Which give me a table like this:
Level|Depth|Descrip|
1 | 50 | Level_1|
8 | 80 | Level_8|
15 | 101 | Level_15|
Now, I need to collect the levels that falls in between the jumps. So, I need something like this:
Level|Depth| Descrip | Equivalent_levels |
1 | 50 | Level_1 | 2,3 |
8 | 80 | Level_8 | 10 |
15 | 101 | Level_15| 16,17,18 |
I have being doing some searching about use "listagg", rank() and other analytic functions but I'm stuck with the script :(
In addition, it would be great if I can start a grouping when this condition is meet: abs(depth-prev_depth) > 15, so I can get something like that:
Level|Depth|Descrip | Group_ID
1 | 50 | Level_1 | 1 |
2 | 53 | Level_2 | 1 |
3 | 60 | Level_3 | 1 |
8 | 80 | Level_8 | 2 |
10 | 81 | Level_10| 2 |
15 | 101 | Level_15| 3 |
16 | 102 | Level_16| 3 |
17 | 102 | Level_16_bis| 3 |
18 | 103 | Level_17| 3 |
Any ideas ??
P.S: Sorry my bad english...

You can use a cumulative sum to define the groups. And then aggregation:
Select min(level) as level,
min(depth) keep (dense_rank first order by level) as depth,
min(descrip) keep (dense_rank first order by level) as descrip,
list_agg(level, ',') within group (order by level) as levels
from (select gl.*,
sum(case when abs(prev_depth - depth) > 15 and depth > 0 then 1 else 0 end) over (order by level) as grp
from (select gl.*, lag(depth) over (order by level asc) as prev_depth
from ground_levels
) gl
) gl
group grp;
This actually keeps the starting level in the list. It can be removed, but that requires a bit more work.

Related

Running maths over an entire database and ranking all users

I have a database of bets. Each bet has a 'Win', 'Loss', or 'Pending' state. What I want to do is to have an SQL statement that will get the last, say, 20 bets a user has placed, find out their ROI (Total profit / Total staked * 100).
So I'm just wondering if there is a better way to do this. Do I basically have to get the users table, loop over every user, get their last 20 bets, find the ROI and then order it. If my User table gets huge then this process is going to take ages, right?
Is creating a 'View' going to save on this time?
Is there a way to do this in one statement that won't cost my life in processing time?
Here are the tables
Users
| ID | User |
| 1 | Test1 |
| 2 | Test2 |
| 3 | Test3 |
| 4 | Test4 |
Bets
| ID | User | Amount | Odds | Result |
| 1 | 1 | 10 | 1.35 | Win |
| 2 | 1 | 25 | 2.55 | Win |
| 3 | 3 | 15 | 1.65 | Loss |
| 4 | 2 | 11 | 2.12 | Pending |
Se essentially I would like a table that ranks them as ROI.
| User | AmountBet | AmountWon | ROI |
| 1 | 35 | 77 | 215 |
| 2 | 11 | 0 | 0 |
| 3 | 15 | 0 | 0 |
| 4 | 0 | 0 | 0 |

Assuming the ID of the bets table represents increasing time such that it can be used to identify "last 20", then
WITH b
AS
(
SELECT id,
user,
CASE WHEN result = 'Pending' THEN 0 ELSE amount END AS amount,
CASE WHEN result = 'Win' THEN amount * odds ELSE 0 END as winnings,
ROW_NUMBER() OVER (PARTITION BY user ORDER BY id DESC) AS rownum
FROM bets
)
SELECT user,
SUM(amount) AS amount_bet,
SUM(winnings) AS amount_won,
CASE
WHEN SUM(amount) > 0
THEN SUM(winnings) * 100 / SUM(amount)
ELSE 0
END AS roi
FROM b
WHERE rownum < 21
GROUP BY user;
dbfiddle.uk

oracle query alternating records

I have a table that looks like this:
SEQ TICKER INDUSTRY
1 AAPL 10
1 FB 10
1 IBM 10
1 CSCO 10
1 FEYE 20
1 F 20
2 JNJ 10
2 CMPQ 10
2 CYBR 10
2 PFPT 10
2 K 20
2 PANW 20
What I need is record with the same industry code, to alternate between the 1 & 2 records like this:
1 AAPL 10
2 IBM 10
1 FB 10
2 CSCO 10
1 FEYE 20
2 PANW 20
So basically, grouped by the same industry code, alternate between the 1 & 2 records.
Can't figure out how.

Use an analytic function to create a row number that starts over for each group (industry and sequence), then sort by that row number.
select seq, ticker, industry
,row_number() over (partition by industry, seq order by ticker)custom_order
from stocks
order by industry, custom_order, seq;
See this SQL Fiddle for a full example. (It doesn't perfectly match your example results but either your example results are incorrect or there's something else to this question I don't understand.)

Don't see how you arrived at the example result in your question, but this result:
| SEQ | TICKER | INDUSTRY |
|-----|--------|----------|
| 1 | AAPL | 10 |
| 2 | CMPQ | 10 |
| 1 | CSCO | 10 |
| 2 | CYBR | 10 |
| 1 | FB | 10 |
| 2 | IBM | 10 |
| 1 | JNJ | 10 |
| 2 | PFPT | 10 |
| 1 | F | 20 |
| 2 | FEYE | 20 |
| 1 | K | 20 |
| 2 | PANW | 20 |
Was produced using this query, where (I assume) you want the SEQ column calculated for you:
select
1 + mod(rn,2) Seq
, ticker
, industry
from (
select
ticker
, industry
, 1+ row_number() over (partition by industry
order by ticker) rn
from stocks
)
order by industry, rn
Please note this is a derivative of the earlier answer by Jon Heller, this derivative can be found online at http://sqlfiddle.com/#!4/088271/1

SQL order by highest to lowest in one table referencing another table in an UPDATE

Hey all I have the following tables that need in order to get data from one that matches the other and have it from highest to lowest depending on the int of TempVersion.
UPDATE
net_Users
SET
net_Users.DefaultId = b.TId
FROM
(SELECT
TOP 1 IndivId,
TId
FROM
UTeams
WHERE
UTeams.[Active] = 1
ORDER BY
TempVersion DESC
) AS b
WHERE
net_Users.IndivId = b.IndivId
In the above I am trying to order from the highest TempVersion to the lowest.
The query above seems to just update 1 of those records with the TempVersion and stop there. I am needing it to loop to find all associated users with the same IndivId matching.
Anyone able to help me out with this?
sample data
net_Users:
name | DefaultId | IndivId | etc...
--------+-----------+---------+-------
Bob | | 87 | etc...
Jan | | 231 | etc...
Luke | | 8 | etc...
UTeams:
IndivId | TempVersion | etc...
--------+-------------+-------
8 | 44 | etc...
17 | 18 | etc...
8 | 51 | etc...
8 | 2 | etc...
7 | 22 | etc...
8 | 125 | etc...
87 | 10 | etc...
14 | 88 | etc...
8 | 5 | etc...
15 | 54 | etc...
65 | 11 | etc...
87 | 15 | etc...
39 | 104 | etc...
And the output I would be needing is (going to choose IndivId 8):
In net_users:
Name | DefaultId | IndivId | etc...
-----+-----------+---------+-------
Luke | 125 | 8 | etc...
Luke | 51 | 8 | etc...
Luke | 44 | 8 | etc...
Luke | 5 | 8 | etc...
Luke | 2 | 8 | etc...

I think this is what you were trying to do:
update net_Users
set net_Users.DefaultId = coalesce((
select top 1 TId
from UTeams
where UTeams.[Active] = 1
and net_Users.IndivId = UTeams.IndivId
order by u.TempVersion desc
)
,net_Users.DefaultId
)
another way using cross apply()
update n
set DefaultId = coalesce(x.Tid,n.DefaultId)
from net_Users as n
cross apply (
select top 1 TId
from UTeams as u
where u.[Active] = 1
and n.IndivId = u.IndivId
order by u.TempVersion desc
) as x
another way to do that with a common table expression and row_number()
with cte as (
select
n.IndivId
, n.DefaultId
, u.Tid
, rn = row_number() over (
partition by n.IndivId
order by TempVersion desc
)
from net_users as n
inner join UTeams as u
on n.IndivId = u.IndivId
where u.[Active]=1
)
update cte
set DefaultId = Tid
where rn = 1

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+------------------------------------------------------------------------------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------------------------------------------------------------------------------------------------------------
| 900 | 1 | null | Y | 7 | May16 |
| 900 | 2 | null | Y | 3 | Oct 16 |
| 900 | 3 | null | N | 9 | Oct 16 |
| 901 | 4 | 378 | Y | 3 | Oct 16 |
| 901 | 5 | null | N | 2 | Oct 16 |
| 902 | 6 | null | N | 5 | May16 |
| 902 | 7 | null | N | 9 | Oct 16 |
| 903 | 8 | null | Y | 3 | Oct 16 |
| 903 | 9 | null | Y | 3 | May16 |
| 904 | 10 | null | N | 0 | May 16 |
| 904 | 11 | null | N | 0 | May16
--------------------------------------------------------------------------------------
Output table
+---------------------------------------------------------------------------------------------------+
| group_id | my_id | previous | in_this | higher_value |most_recent|
+----------------------------------------------------------------------------------------------------
| 900 | 1 | null | Y | 7 | May16 |
| 902 | 7 | null | N | 9 | Oct 16 |
| 903 | 8 | null | Y | 3 | Oct 16 |
---------------------------------------------------------------------------------------------------------
Hi all, I need help with a query that returns one record within a group based on the importance of the field. The importance is ranked as follows:
previous- if one record within the group_id is not null, then neither record within a group_id is returned (because according to our rules, all records within a group should have the same previous value)
in_this- If one record is Y, and the other is N within a group_id, then we keep the Y; If all records are Y or all are N, then we move to the next attribute
Higher_value- If all records in the ‘in_this’ field are equal, then we need to select the record with the greater value from this field. If both records have an equal value, we move to the next attribute
Most_recent- If all records were of equal value in the ‘higher_value’ field, then we consider the newest record. If these are equal, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.

You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
max(in_this) over (partition by group_id) as max_in_this,
min(higher_value) over (partition by group_id) as min_higher_value,
max(higher_value) over (partition by group_id) as max_higher_value,
row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
min(most_recent) over (partition by group_id) as min_most_recent,
max(most_recent) over (partition by group_id) as max_most_recent,
row_number() over (partition by group_id order by most_recent) as seqnum_mr
from t
) t
where max_in_this is not null and
( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
(min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1
)
);
The third condition as stated makes no sense, but you should get the idea for how to implement this.

SQL Data to interpolate/extrapolate

If i have a table that keeps a running average of kW usage at a certain temperature, and I wanted to get a kW usage for a temperature that has not been recorded before, how could i get either
(A) Two data points above or two points below the temperature to extrapolate.
(B) Closest data above and below the temperature to interpolate
The table temperatures looks like this
Column | Type | Modifiers | Storage | Stats target | Description
-------------------------+------------------+-----------+---------+--------------+---------------
temperature_io_id | integer | not null | plain | |
temperature_station_id | integer | not null | plain | |
temperature_value | integer | not null | plain | | in Fahrenheit
temperature_current_kw | double precision | not null | plain | |
temperature_value_added | integer | default 1 | plain | |
temperature_kw_year_1 | double precision | default 0 | plain | |
"temperatures_pkey" PRIMARY KEY, btree (temperature_io_id, temperature_station_id, temperature_value)
(A) Proposed Solution
This would be a bit easier I think. The query would order the rows by the temperature value > or < the temperature im going for, then limit the results to 2? This would give me the two closest values that are above or below the temperature. Of course the order would have to be descending and ascending to make sure i get the right side of the items.
SELECT * FROM temperatures
WHERE
temperature_value > ACTUALTEMP and temperature_io_id = ACTUAL_IO_id
ORDER BY
temperature_value
LIMIT 2;
I think similar to above, but just limit it to 1 and do 2 queries, one for > and the other for <. I feel like this could be done better though?
Edit - Some sample data
temperature_io_id | temperature_station_id | temperature_value | temperature_current_kw | temperature_value_added | temperature_kw_year_1
-------------------+------------------------+-------------------+------------------------+-------------------------+-----------------------
18751 | 151 | 35 | 26.1 | 2 | 0
18752 | 151 | 35 | 30.5 | 2 | 0
18753 | 151 | 35 | 15.5 | 2 | 0
18754 | 151 | 35 | 12.8 | 2 | 0
18643 | 151 | 35 | 4.25 | 2 | 0
18644 | 151 | 35 | 22.15 | 2 | 0
18645 | 151 | 35 | 7.45 | 2 | 0
18646 | 151 | 35 | 7.5 | 2 | 0
18751 | 151 | 34 | 25.34 | 5 | 0
18752 | 151 | 34 | 30.54 | 5 | 0
18753 | 151 | 34 | 15.48 | 5 | 0
18754 | 151 | 34 | 13.08 | 5 | 0
18643 | 151 | 34 | 4.3 | 5 | 0
18644 | 151 | 34 | 22.44 | 5 | 0
18645 | 151 | 34 | 7.34 | 5 | 0
18646 | 151 | 34 | 7.54 | 5 | 0

You can get the nearest rows using:
select t.*
from temperatures t
order by abs(temperature_value - ACTUAL_TEMPERATURE) asc
limit 2
Or, a better idea in this case, is union:
(select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
This version is better because it returns only one row if the temperature is in the table. This is a case where the UNION and duplicate removal is useful.
Next use conditional aggregation to get the information needed. This uses a short-cut, assuming that the kw increases with temperature:
select min(temperature_value) as mintv, max(temperature_value) as maxtv,
min(temperature_current_kw) as minck, max(temperature_current_kw) as maxck
from ((select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
) t;
Finally, do some arithmetic to get the weighted average:
select (case when maxtv = mintv then minkw
else minkw + (ACTUAL_TEMPERATURE - mintv) * ((maxkw - minkw) / (maxtv - mintv))
end)
from (select min(temperature_value) as mintv, max(temperature_value) as maxtv,
min(temperature_current_kw) as minkw, max(temperature_current_kw) as maxkw
from ((select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
) t
) t;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Grouping when using analytic functions - sql

Related

Running maths over an entire database and ranking all users

oracle query alternating records

SQL order by highest to lowest in one table referencing another table in an UPDATE

How do I do multiple selection based on a flowchart of criteria?

SQL Data to interpolate/extrapolate

Categories

Resources