Calculate average amount for each value between two columns - SQL

I have this table called ltv:

tick_lower   tick_upper   ltv_usdc         ltv_eth
204800       204880       38470            -30.252800179
204620       205420       1107583.610283   867.663698001
The problem is that ltv_usdc and ltv_eth are distributed equally between tick_lower and tick_upper. So I aim to return a table where ltv is calculated for each tick.
Here is an example of the calculations for the first row in the above table:

tick     ltv_usdc                                   ltv_eth
204800   38470/(tick_upper-tick_lower) = 480.875    -30.252800179/(tick_upper-tick_lower) = -0.37816000223
204801   480.875                                    -0.37816000223
...      ...                                        ...
204880   480.875                                    -0.37816000223
Finally, I want to aggregate all the rows for each tick.
So far I haven't found a solution.

This can be done in a few steps:

1. Get the max ticks (tick_upper - tick_lower) in ltv
2. Generate a number series from 0 to max_ticks
3. Calculate all ticks for each row in ltv
4. Aggregate ltv_usdc and ltv_eth per tick

Tested in MySQL 8: db<>fiddle
with recursive
cte_param as (
    -- the widest range determines how many ticks we must generate
    select max(tick_upper - tick_lower) as max_ticks
    from ltv),
cte_n (i) as (
    -- number series 0 .. max_ticks
    select 0 as i
    union all
    select n.i + 1
    from cte_n n
    where i < (select max_ticks from cte_param)),
cte_ltv as (
    -- each row's value spread evenly over its tick range
    select tick_lower,
           tick_upper,
           ltv_usdc / (tick_upper - tick_lower) as ltv_usdc,
           ltv_eth / (tick_upper - tick_lower) as ltv_eth
    from ltv)
select l.tick_lower + n.i as tick,
       sum(l.ltv_usdc) as ltv_usdc,
       sum(l.ltv_eth) as ltv_eth
from cte_ltv l
join cte_n n
  on l.tick_lower + n.i <= l.tick_upper
group by tick
order by tick;
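One caveat (my addition, not in the original answer): MySQL limits recursive CTE depth through the cte_max_recursion_depth system variable, which defaults to 1000. The widest range in the sample data is 800 ticks, so it fits, but wider ranges would need the cap raised first:

-- assumption: tick ranges wider than ~1000; raise MySQL 8's recursion cap for this session
set session cte_max_recursion_depth = 100000;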


Frequency table of continuous variable in SQL?

I have a SQL table with a continuous variable:

    x
1   622.108
2   622.189
3   622.048
4   622.758
5   622.191
6   622.677
7   622.598
8   622.020
9   621.228
10  622.690
...
and I am trying to get a simple frequency table, e.g. with 3 buckets, like this:

bucket             n
[621.228-621.738[  1
[621.738-622.248[  5
[622.248-622.758]  4
Seems easy, but I cannot manage to do it in SQL (I am running it on a Cloudera Impala engine).
I have looked into dense_rank() and ntile() without success.
Any ideas?
You can use window functions to divide the range into three equal parts and then use arithmetic:

select min_x + bucket_width * (row_number() over (order by min(x)) - 1) as bucket_lo,
       min_x + bucket_width * row_number() over (order by min(x)) as bucket_hi,
       count(*) as n
from (select t.*,
             min(x) over () as min_x,
             max(x) over () as max_x,
             -- a small epsilon keeps max(x) inside the last bucket
             (0.000001 + max(x) over () - min(x) over ()) / 3 as bucket_width
      from t
     ) t
group by floor((x - min_x) / bucket_width), min_x, bucket_width;
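As a sanity check on the arithmetic (my own illustration, using the sample data): bucket_width = (0.000001 + 622.758 - 621.228) / 3 ≈ 0.510000, so x = 622.108 falls in the middle bucket:

select floor((622.108 - 621.228) / ((0.000001 + 622.758 - 621.228) / 3)) as bucket;
-- floor(0.880 / 0.510000) = floor(1.7255...) = 1, i.e. [621.738-622.248[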
There are at least two problems with your question:

1. You have not provided any code to show us what you have tried. It really is good sometimes to just work the problem out yourself. Nevertheless, I found the problem interesting and decided to play.
2. Your range blocks overlap. If, for example, you were to have the value 621.738 in your list, which bucket would contain it: [621.228-621.738] or [621.738-622.248]?

There are also at least three problems with my answer, so I don't expect you to accept it. However, maybe it will get you started. Hopefully, this disclaimer will keep me from getting downvoted. :-)

1. The answer is in T-SQL. Sorry, it's what I have to work with.
2. The answer is not generic: it always creates three and only three buckets.
3. It only works if the data type limits the result to 3 decimal places.

Remember, this is only one possible solution, and in my mind a very weak one at that.
With those disclaimers, here's what I wrote:
SELECT
    '[' + STR( RANGES.RANGESTART, 7, 3 )
        + ' - '
        + STR( RANGES.RANGEEND, 7, 3 ) + ']' AS 'BUCKET'
    ,COUNT(*) AS 'N'
FROM
    ( SELECT
          VALS.MINVAL + (CAST( CNT.INC AS DECIMAL(7,3) ) * VALS.RANGEWIDTH) AS 'RANGESTART'
          ,CASE WHEN CNT.INC < 2
                THEN VALS.MINVAL + (CAST( CNT.INC + 1 AS DECIMAL(7,3) ) * VALS.RANGEWIDTH) - 0.001
                ELSE VALS.MINVAL + (CAST( CNT.INC + 1 AS DECIMAL(7,3) ) * VALS.RANGEWIDTH)
           END AS 'RANGEEND'
      FROM
          ( SELECT
                MIN(CURVAL) AS 'MINVAL'
                ,MAX(CURVAL) AS 'MAXVAL'
                ,(MAX(CURVAL) - MIN(CURVAL)) / 3 AS 'RANGEWIDTH'
            FROM
                MYVALUE ) VALS
          CROSS JOIN ( VALUES (0), (1), (2) ) CNT(INC)
    ) RANGES
    INNER JOIN MYVALUE V
        ON V.CURVAL BETWEEN RANGES.RANGESTART AND RANGES.RANGEEND
GROUP BY
    RANGES.RANGESTART
    ,RANGES.RANGEEND
ORDER BY 1
;
In the above, your values would be in the CURVAL column of the MYVALUE table.
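If you want to run the query as written, here is a minimal setup sketch (my addition, using the sample values from the question; DECIMAL(7,3) matches the 3-decimal assumption above):

CREATE TABLE MYVALUE (CURVAL DECIMAL(7,3));
INSERT INTO MYVALUE (CURVAL) VALUES
    (622.108),(622.189),(622.048),(622.758),(622.191),
    (622.677),(622.598),(622.020),(621.228),(622.690);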
Good luck. I hope this helps you on your way.

find the maximum in a column, but only when two other columns match

I need help in PostgreSQL.
I have two tables:

Prediction - predicts future disasters and the casualties for each city.
Measures - matches damage-control providers to each type of disaster (incl. cost and percent of "averted casualties").

Each disaster and provider combination has an amount of averted casualties (the percent from measures * the predicted casualties for that disaster * 0.01).
For each combination of city and disaster, I need to find two providers such that:
1) their combined cost is less than a million, and
2) their combined number of averted casualties is the biggest.
My work and result so far:

select o1.cname, o1.etype, o1.provider as provider1, o2.provider as provider2,
       (o1.averted + o2.averted) as averted_casualties
from (select cname, m.etype, provider, mcost, (percent * Casualties * 0.01) as averted
      from measures m, prediction p
      where (m.etype = p.etype)) as o1,
     (select cname, m.etype, provider, mcost, (percent * Casualties * 0.01) as averted
      from measures m, prediction p
      where (m.etype = p.etype)) as o2
where (o1.cname = o2.cname) and (o1.etype = o2.etype) and (o1.provider < o2.provider)
  and (o1.mcost + o2.mcost < 1000000)
How do I change this query so it will show me the best averted_casualties for each city/disaster combo (not just the max over the whole table, but the max for each combo)?
(The desired outcome was shown as an image, not reproduced here.)
P.S. I'm not allowed to use ordering, views or functions.
First, construct all pairs of providers and do the casualty and cost calculation:

select p.*, m1.provider as provider_1, m2.provider as provider_2,
       p.casualties * (1 - m1.percent / 100.0) * (1 - m2.percent / 100.0) as net_casualties,
       (m1.mcost + m2.mcost) as total_cost
from measures m1 join
     measures m2
     on m1.etype = m2.etype and m1.provider < m2.provider join
     prediction p
     on m1.etype = p.etype;
Then, apply your conditions. Normally you would use window functions, but since ordering isn't allowed for this exercise, you want to use a subquery:

with pairs as (
      select p.*, m1.provider as provider_1, m2.provider as provider_2,
             p.casualties * (1 - m1.percent / 100.0) * (1 - m2.percent / 100.0) as net_casualties,
             (m1.mcost + m2.mcost) as total_cost
      from measures m1 join
           measures m2
           on m1.etype = m2.etype and m1.provider < m2.provider join
           prediction p
           on m1.etype = p.etype
     )
select p.*
from pairs p
where p.total_cost < 1000000 and
      p.net_casualties = (select min(p2.net_casualties)
                          from pairs p2
                          where p2.cname = p.cname and p2.etype = p.etype and
                                p2.total_cost < 1000000
                         );
The biggest number of averted casualties results in the smallest number of net casualties; they are the same thing.
As for your attempted solution: just seeing the , in the from clause tells me that you need to study up on join. Simple rule: never use commas in the from clause. Always use proper, explicit, standard join syntax.
Your repeated subqueries also suggest that you need to learn about CTEs.
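To make that concrete, here is a hedged sketch (my own, not part of the answer) of the attempted query rewritten with a single CTE and explicit joins, reusing the column names from the question:

with averted as (
    -- one row per (city, disaster, provider): its cost and averted casualties
    select p.cname, m.etype, m.provider, m.mcost,
           m.percent * p.casualties * 0.01 as averted
    from measures m
    join prediction p on m.etype = p.etype
)
select o1.cname, o1.etype, o1.provider as provider1, o2.provider as provider2,
       o1.averted + o2.averted as averted_casualties
from averted o1
join averted o2
  on o1.cname = o2.cname
 and o1.etype = o2.etype
 and o1.provider < o2.provider   -- count each unordered pair once
where o1.mcost + o2.mcost < 1000000;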

Is there a way I can Query Missing numbers in a table?

I work for a logistics company and we have to have a 7-digit pro number on each piece of freight, in a pre-determined order. We know there are gaps in the numbers, but is there any way I can query the system and find out which ones are missing?
So: show me all the numbers from 1000000 to 2000000 that do not exist in the column trace_number.
As you can see below, the sequence goes 1024397, 1024398, then 1051152, so I know there is a substantial gap of about 26k pro numbers. Is there any way to query just the gaps?
select t.trace_number,
       integer(trace_number) as number,
       ISNUMERIC(trace_number) as check
from trace as t
left join tlorder as tl on t.detail_number = tl.detail_line_id
where left(t.trace_number, 1) in ('0','1','2','3','4','5','6','7','8','9')
  and date(pick_up_by) >= current_date - 1 years
  and length(t.trace_number) = 7
  and t.trace_type = '2'
  and site_id in ('SITE5','SITE9','SITE10')
  and ISNUMERIC(trace_number) = 'True'
order by 2
fetch first 10000 rows only
I'm not sure what your query has to do with the question, but you can identify gaps using lag()/lead(). The idea is:

select (trace_number + 1) as start_gap,
       (next_tn - 1) as end_gap
from (select t.*,
             lead(trace_number) over (order by trace_number) as next_tn
      from trace t
     ) t
where next_tn <> trace_number + 1;
This does not find them within a range. It just finds all gaps.
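To restrict the output to the 1000000-2000000 range, here is a hedged variant of the same idea (my addition; assumes greatest()/least() exist on your platform):

select greatest(trace_number + 1, 1000000) as start_gap,
       least(next_tn - 1, 2000000) as end_gap
from (select trace_number,
             lead(trace_number) over (order by trace_number) as next_tn
      from trace
     ) t
where next_tn > trace_number + 1      -- a real gap
  and next_tn - 1 >= 1000000          -- gap overlaps the range
  and trace_number + 1 <= 2000000;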
Try something like this (adapt the WHERE condition, or move it into the ON clause):

with Range (nb) as (
    values 1000000
    union all
    select nb + 1 from Range
    where nb < 2000000
)
select *
from Range f1 left outer join trace f2
    on f2.trace_number = f1.nb
    and f2.trace_number between 1000000 and 2000000
where f2.trace_number is null

Fetch rows based on condition

I am using PostgreSQL on Amazon Redshift.
My table is:

drop table APP_Tax;
create temp table APP_Tax(APP_nm varchar(100), start timestamp, end1 timestamp);
insert into APP_Tax values('AFH','2016-01-26 00:39:51','2016-01-26 00:39:55'),
('AFH','2016-01-26 00:39:56','2016-01-26 00:40:01'),
('AFH','2016-01-26 00:40:05','2016-01-26 00:40:11'),
('AFH','2016-01-26 00:40:12','2016-01-26 00:40:15'), -- row x
('AFH','2016-01-26 00:40:35','2016-01-26 00:41:34')  -- row y
Expected output:
'AFH','2016-01-26 00:39:51','2016-01-26 00:40:15'
'AFH','2016-01-26 00:40:35','2016-01-26 00:41:34'
I need to compare the end time of each record with the start time of the next; if the time difference is < 10 seconds, take the next record's end time instead, continuing until the last record of the run.
I.e. datediff(seconds, '2016-01-26 00:39:55', '2016-01-26 00:39:56') is < 10 seconds.
I tried this :
SELECT a.app_nm
,min(a.start)
,max(b.end1)
FROM APP_Tax a
INNER JOIN APP_Tax b
ON a.APP_nm = b.APP_nm
AND b.start > a.start
WHERE datediff(second, a.end1, b.start) < 10
GROUP BY 1
It works, but it doesn't return row y when the condition fails.
There are two reasons that row y is not returned, both due to the conditions:

b.start > a.start means that a row will never join with itself.
The GROUP BY will return only one record per APP_nm value, yet all rows have the same value.

However, there are further logic errors that the query does not handle successfully. For example, how does it know when a "new" session begins?
The logic you seek can be achieved in normal PostgreSQL with the help of the DISTINCT ON clause, which shows one row per input value in a specific column. However, DISTINCT ON is not supported by Redshift.
Some potential workarounds: DISTINCT ON like functionality for Redshift
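For reference, a minimal sketch of DISTINCT ON in plain PostgreSQL (my own illustration; it returns the first row per APP_nm, which is only a fragment of the sessionisation logic you need):

select distinct on (APP_nm) APP_nm, start, end1
from APP_Tax
order by APP_nm, start;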
The output you seek would be trivial using a programming language (which can loop through results and store variables) but is difficult to apply to an SQL query (which is designed to operate on rows of results). I would recommend extracting the data and running it through a simple script (eg in Python) that could then output the Start & End combinations you seek.
This is an excellent use-case for a Hadoop Streaming function, which I have successfully implemented in the past. It would take the records as input, then 'remember' the start time and would only output a record when the desired end-logic has been met.
Sounds like what you are after is "sessionisation" of the activity events. You can achieve that in Redshift using window functions.
The complete solution might look like this:
SELECT start AS session_start,
       session_end
FROM (
    SELECT start,
           end1,
           lead(end1, 1) OVER (ORDER BY end1) AS session_end,
           session_boundary
    FROM (
        SELECT start,
               end1,
               CASE WHEN session_switch = 0 AND reverse_session_switch = 1
                    THEN 'start'
                    ELSE 'end' END AS session_boundary
        FROM (
            SELECT start,
                   end1,
                   CASE WHEN datediff(seconds, end1,
                                      lead(start, 1) OVER (ORDER BY end1 ASC)) > 10
                        THEN 1
                        ELSE 0 END AS session_switch,
                   CASE WHEN datediff(seconds,
                                      lead(end1, 1) OVER (ORDER BY end1 DESC),
                                      start) > 10
                        THEN 1
                        ELSE 0 END AS reverse_session_switch
            FROM app_tax
        ) AS sessioned
        WHERE session_switch != 0 OR reverse_session_switch != 0
        UNION
        SELECT start,
               end1,
               'start'
        FROM (
            SELECT start,
                   end1,
                   row_number() OVER (PARTITION BY APP_nm ORDER BY end1 ASC) AS row_num
            FROM APP_Tax
        ) AS with_row_number
        WHERE row_num = 1
    ) AS with_boundary
) AS with_end
WHERE session_boundary = 'start'
ORDER BY start ASC;
Here is the breakdown (by subquery name):
sessioned - we first identify the switch rows (out and in), i.e. the rows where the gap between end and start exceeds the limit.
with_row_number - just a patch to extract the first row, because there is no switch into it (there is an implicit switch that we record as 'start').
with_boundary - then we identify the rows where specific switches occur. If you run the subquery by itself, it is clear that a session starts when session_switch = 0 AND reverse_session_switch = 1, and ends when the opposite occurs. All other rows are in the middle of sessions, so they are ignored.
with_end - finally, we pair each 'start' row with the end time of the matching 'end' row (thus defining the session duration), and remove the 'end' rows.
The with_boundary subquery answers your initial question, but typically you'd want to combine those rows to get the final result, which is the session duration.
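As an aside (my own addition, not from the answer): the same result can be sketched more compactly with the classic gaps-and-islands pattern, assuming Redshift's lag() and running sum() are available:

with flagged as (
    -- flag a row as starting a new session when it is the first row,
    -- or the gap since the previous end exceeds 10 seconds
    select APP_nm, start, end1,
           case when lag(end1) over (partition by APP_nm order by start) is null
                  or datediff(seconds,
                              lag(end1) over (partition by APP_nm order by start),
                              start) > 10
                then 1 else 0 end as new_session
    from APP_Tax
),
numbered as (
    -- running sum of the flags assigns a session id to every row
    select APP_nm, start, end1,
           sum(new_session) over (partition by APP_nm order by start
                                  rows unbounded preceding) as session_id
    from flagged
)
select APP_nm, min(start) as session_start, max(end1) as session_end
from numbered
group by APP_nm, session_id;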

Calculating Geometrically Linked Returns in SQL SERVER 2008

Calculating geometrically linked returns: how do you multiply record2 * record1?
The desire is to return values for the actual rate and the annualized rate.
Given table interval:

   EndDate   PctReturn
1. 05/31/06  -0.2271835
2. 06/30/06  -0.1095986
3. 07/31/06   0.6984908
4. 08/31/06   1.4865360
5. 09/30/06   0.8938896
The desired output should look like this:

EndDate    PctReturn    Percentage   UnitReturn
05/31/06   -0.2271835   -0.002272    0.997728
06/30/06   -0.1095986   -0.001096    0.996634669
07/31/06    0.6984908    0.006985    1.00359607
08/31/06    1.4865360    0.014865    1.018514887
09/30/06    0.8938896    0.008939    1.027619286
Percentage = PctReturn / 100
UnitReturn = running product (1 + S1) x (1 + S2) x ... x (1 + Sn), where Si is the i-th Percentage

Aggregate values desired:

Actual Rate  2.761928596
Annualized   6.757253223

Mathematics behind the aggregate values:

Actual Rate:     1.027619 - 1 = 0.027619, * 100 = 2.761928596
Annualized Rate: (UnitReturn ^ (12 / number of intervals) - 1) * 100 = (1.027619 ^ (12/5) - 1) * 100 = 6.757253
Number of intervals in the example = 5 (there are only 5 records, or intervals)
I did try utilizing SUM in the select statement, but this did not allow for multiplying record2 by record1 to link the returns. I thought a WHILE loop would allow for stepping record by record to multiply up the values of UnitReturn. My starter level in SQL has me looking for help.
You have two options for getting a product in SQL Server.
1. Simulate it using logs and exponents:
SQL Fiddle
create table returns
(
    returnDate date,
    returnValue float
);

insert into returns values('05/31/06', -0.002271835);
insert into returns values('06/30/06', -0.001095986);
insert into returns values('07/31/06', 0.006984908);
insert into returns values('08/31/06', 0.014865360);
insert into returns values('09/30/06', 0.008938896);

-- product(1 + r) = 10 ^ sum(log10(1 + r))
select totalReturn = power
(
    cast(10.0 as float)
    , sum(log10(returnValue + 1.0))
) - 1
from returns;

with tr as
(
    select totalReturn = power
           (
               cast(10.0 as float)
               , sum(log10(returnValue + 1.0))
           ) - 1
         , months = cast(count(1) as float)
    from returns
)
-- annualized = (1 + total) ^ (12 / months) - 1
select annualized = power(totalReturn + 1, (1.0 / (months / 12.0))) - 1
from tr;
This leverages logs and exponents to simulate a product calculation. More info: User defined functions.
The one issue here is that it will fail for returns < -100%. If you don't expect these, it's fine; otherwise you'll need to cap any values < -100% at -100%.
You can then use this actual return to get an annualized return as required.
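A hedged sketch of such a guard (my own addition; the 0.000001 floor is an arbitrary stand-in, since log10(0) itself is undefined):

select totalReturn = power
(
    cast(10.0 as float)
    , sum(log10(case when returnValue + 1.0 <= 0.0
                     then 0.000001   -- hypothetical floor for returns <= -100%
                     else returnValue + 1.0 end))
) - 1
from returns;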
2. Define a custom aggregate with CLR:
See Books Online.
You can create a CLR custom function and then hook it up as an aggregate for use in your queries. This is more work and you'll have to enable CLR on your server, but once it's done you can use it as much as required.
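For reference, the server-level switch is the standard sp_configure option (the aggregate itself still has to be written in .NET and deployed separately):

exec sp_configure 'clr enabled', 1;
reconfigure;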