SQL Data to interpolate/extrapolate - sql

If I have a table that keeps a running average of kW usage at a given temperature, and I want to estimate the kW usage for a temperature that has not been recorded before, how can I get either:
(A) Two data points above or two data points below the temperature, to extrapolate.
(B) The closest data points above and below the temperature, to interpolate.
The table temperatures looks like this:
Column | Type | Modifiers | Storage | Stats target | Description
-------------------------+------------------+-----------+---------+--------------+---------------
temperature_io_id | integer | not null | plain | |
temperature_station_id | integer | not null | plain | |
temperature_value | integer | not null | plain | | in Fahrenheit
temperature_current_kw | double precision | not null | plain | |
temperature_value_added | integer | default 1 | plain | |
temperature_kw_year_1 | double precision | default 0 | plain | |
"temperatures_pkey" PRIMARY KEY, btree (temperature_io_id, temperature_station_id, temperature_value)
(A) Proposed Solution
This would be a bit easier, I think. The query would filter the rows to temperature values > or < the temperature I'm going for, order them by temperature, then limit the results to 2. This would give me the two closest values above or below the temperature. Of course, the order would have to be ascending for the values above and descending for the values below, to make sure I get the nearest rows on the right side.
SELECT * FROM temperatures
WHERE
temperature_value > ACTUALTEMP and temperature_io_id = ACTUAL_IO_id
ORDER BY
temperature_value
LIMIT 2;
(B) Proposed Solution
I think this is similar to the above, but limited to 1 row and done as 2 queries, one for > and the other for <. I feel like this could be done better, though?
Edit - Some sample data
temperature_io_id | temperature_station_id | temperature_value | temperature_current_kw | temperature_value_added | temperature_kw_year_1
-------------------+------------------------+-------------------+------------------------+-------------------------+-----------------------
18751 | 151 | 35 | 26.1 | 2 | 0
18752 | 151 | 35 | 30.5 | 2 | 0
18753 | 151 | 35 | 15.5 | 2 | 0
18754 | 151 | 35 | 12.8 | 2 | 0
18643 | 151 | 35 | 4.25 | 2 | 0
18644 | 151 | 35 | 22.15 | 2 | 0
18645 | 151 | 35 | 7.45 | 2 | 0
18646 | 151 | 35 | 7.5 | 2 | 0
18751 | 151 | 34 | 25.34 | 5 | 0
18752 | 151 | 34 | 30.54 | 5 | 0
18753 | 151 | 34 | 15.48 | 5 | 0
18754 | 151 | 34 | 13.08 | 5 | 0
18643 | 151 | 34 | 4.3 | 5 | 0
18644 | 151 | 34 | 22.44 | 5 | 0
18645 | 151 | 34 | 7.34 | 5 | 0
18646 | 151 | 34 | 7.54 | 5 | 0

You can get the nearest rows using:
select t.*
from temperatures t
order by abs(temperature_value - ACTUAL_TEMPERATURE) asc
limit 2
Or, a better idea in this case, use a union:
(select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
This version is better because it returns only one row if the temperature is in the table. This is a case where the UNION and duplicate removal is useful.
Next, use aggregation to collapse the two rows into one. This uses a shortcut, assuming that the kW increases with temperature, so that the min and max kW line up with the min and max temperature:
select min(temperature_value) as mintv, max(temperature_value) as maxtv,
min(temperature_current_kw) as minkw, max(temperature_current_kw) as maxkw
from ((select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
) t;
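Finally, do some arithmetic to get the linearly interpolated value (to restrict the lookup to a single io_id, as in the question, add temperature_io_id = ACTUAL_IO_id to both WHERE clauses):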
select (case when maxtv = mintv then minkw
else minkw + (ACTUAL_TEMPERATURE - mintv) * ((maxkw - minkw) / (maxtv - mintv))
end)
from (select min(temperature_value) as mintv, max(temperature_value) as maxtv,
min(temperature_current_kw) as minkw, max(temperature_current_kw) as maxkw
from ((select t.*
from temperatures t
where temperature_value <= ACTUAL_TEMPERATURE
order by temperature_value desc
limit 1
) union
(select t.*
from temperatures t
where temperature_value >= ACTUAL_TEMPERATURE
order by temperature_value asc
limit 1
)
) t
) t;
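The arithmetic in the last query is ordinary linear interpolation between the two bracketing rows. As a minimal sketch of the same calculation outside the database, here it is in Python. The function name estimate_kw and the list-of-pairs input are assumptions for illustration, not part of the answer. It also covers case (A) from the question by reusing the same formula with the two nearest points on one side:

```python
def estimate_kw(points, target_temp):
    """Estimate kW at target_temp from (temperature, kw) pairs.

    points is a hypothetical stand-in for the rows of ONE io_id,
    with at most one row per temperature.
    """
    pts = sorted(points)
    below = [p for p in pts if p[0] <= target_temp]
    above = [p for p in pts if p[0] >= target_temp]
    if below and above:
        # Case (B): closest point on each side (or an exact match).
        (t0, k0), (t1, k1) = below[-1], above[0]
    elif len(above) >= 2:
        # Case (A): extrapolate from the two nearest points above.
        (t0, k0), (t1, k1) = above[0], above[1]
    elif len(below) >= 2:
        # Case (A): extrapolate from the two nearest points below.
        (t0, k0), (t1, k1) = below[-2], below[-1]
    else:
        raise ValueError("need an exact match or at least two points")
    if t0 == t1:
        return k0  # exact match, nothing to interpolate
    return k0 + (target_temp - t0) * (k1 - k0) / (t1 - t0)


# Sample rows for io_id 18751 from the question.
rows = [(34, 25.34), (35, 26.1)]
print(estimate_kw(rows, 34))   # exact match
print(estimate_kw(rows, 36))   # extrapolated from the two points below
```

Unlike the SQL shortcut, this pairs each kW value with its own temperature, so it does not rely on kW increasing with temperature.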

Related

Running maths over an entire database and ranking all users

I have a database of bets. Each bet has a 'Win', 'Loss', or 'Pending' state. What I want to do is to have an SQL statement that will get the last, say, 20 bets a user has placed, find out their ROI (Total profit / Total staked * 100).
So I'm just wondering if there is a better way to do this. Do I basically have to get the users table, loop over every user, get their last 20 bets, find the ROI and then order it. If my User table gets huge then this process is going to take ages, right?
Is creating a 'View' going to save on this time?
Is there a way to do this in one statement that won't cost my life in processing time?
Here are the tables
Users
| ID | User |
| 1 | Test1 |
| 2 | Test2 |
| 3 | Test3 |
| 4 | Test4 |
Bets
| ID | User | Amount | Odds | Result |
| 1 | 1 | 10 | 1.35 | Win |
| 2 | 1 | 25 | 2.55 | Win |
| 3 | 3 | 15 | 1.65 | Loss |
| 4 | 2 | 11 | 2.12 | Pending |
So essentially I would like a table that ranks them by ROI.
| User | AmountBet | AmountWon | ROI |
| 1 | 35 | 77 | 215 |
| 2 | 11 | 0 | 0 |
| 3 | 15 | 0 | 0 |
| 4 | 0 | 0 | 0 |
Assuming the ID of the bets table represents increasing time, such that it can be used to identify the "last 20", then:
WITH b
AS
(
SELECT id,
user,
CASE WHEN result = 'Pending' THEN 0 ELSE amount END AS amount,
CASE WHEN result = 'Win' THEN amount * odds ELSE 0 END as winnings,
ROW_NUMBER() OVER (PARTITION BY user ORDER BY id DESC) AS rownum
FROM bets
)
SELECT user,
SUM(amount) AS amount_bet,
SUM(winnings) AS amount_won,
CASE
WHEN SUM(amount) > 0
THEN SUM(winnings) * 100 / SUM(amount)
ELSE 0
END AS roi
FROM b
WHERE rownum < 21
GROUP BY user;
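To make the logic of the query easier to check by hand, here is a rough Python sketch of the same steps: keep each user's latest n bets by ID, count pending bets as zero stake, and compute winnings * 100 / stake. The function name roi_last_n and the tuple layout are assumptions for illustration only.

```python
from collections import defaultdict


def roi_last_n(bets, n=20):
    """bets: iterable of (bet_id, user_id, amount, odds, result) tuples.

    Returns {user_id: (amount_bet, amount_won, roi)} over each user's
    latest n bets, assuming a higher bet_id means a more recent bet.
    """
    by_user = defaultdict(list)
    for bet in bets:
        by_user[bet[1]].append(bet)

    report = {}
    for user_id, user_bets in by_user.items():
        latest = sorted(user_bets, key=lambda b: b[0], reverse=True)[:n]
        # Pending bets contribute nothing to the stake, as in the query.
        staked = sum(b[2] for b in latest if b[4] != "Pending")
        won = sum(b[2] * b[3] for b in latest if b[4] == "Win")
        roi = won * 100 / staked if staked > 0 else 0
        report[user_id] = (staked, won, roi)
    return report


sample = [
    (1, 1, 10, 1.35, "Win"),
    (2, 1, 25, 2.55, "Win"),
    (3, 3, 15, 1.65, "Loss"),
    (4, 2, 11, 2.12, "Pending"),
]
for user_id, row in sorted(roi_last_n(sample).items()):
    print(user_id, row)
```

As with the query, users without any bets (user 4 in the sample) do not appear in the result; a left join from the users table would be needed to list them.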

Grouping when using analytic functions

Let's suppose we have a table that looks like this:
Level|Depth|Descrip|
0 | 0 | Base |
1 | 50 | Level_1 |
2 | 53 | Level_2 |
3 | 60 | Level_3 |
8 | 80 | Level_8 |
10 | 81 | Level_10|
15 | 101 | Level_15|
16 | 102 | Level_16|
17 | 102 | Level_16_bis|
18 | 103 | Level_17|
First, I need to get the rows that represent a significant depth jump (more than 15 m) relative to the previous row. I get those rows with something like this:
Select level,depth, descrip from(
Select level
, depth
,lag(depth) over (order by level asc) as prev_depth
, descrip
from ground_levels
)
Where abs(depth-prev_depth) > 15 and depth > 0
Which gives me a table like this:
Level|Depth|Descrip|
1 | 50 | Level_1|
8 | 80 | Level_8|
15 | 101 | Level_15|
Now I need to collect the levels that fall in between the jumps. So I need something like this:
Level|Depth| Descrip | Equivalent_levels |
1 | 50 | Level_1 | 2,3 |
8 | 80 | Level_8 | 10 |
15 | 101 | Level_15| 16,17,18 |
I have been searching for ways to use listagg, rank() and other analytic functions, but I'm stuck with the script :(
In addition, it would be great if I could start a new group whenever this condition is met: abs(depth-prev_depth) > 15, so I can get something like this:
Level|Depth|Descrip | Group_ID
1 | 50 | Level_1 | 1 |
2 | 53 | Level_2 | 1 |
3 | 60 | Level_3 | 1 |
8 | 80 | Level_8 | 2 |
10 | 81 | Level_10| 2 |
15 | 101 | Level_15| 3 |
16 | 102 | Level_16| 3 |
17 | 102 | Level_16_bis| 3 |
18 | 103 | Level_17| 3 |
Any ideas ??
P.S: Sorry my bad english...
You can use a cumulative sum to define the groups. And then aggregation:
Select min(level) as level,
min(depth) keep (dense_rank first order by level) as depth,
min(descrip) keep (dense_rank first order by level) as descrip,
       listagg(level, ',') within group (order by level) as levels
from (select gl.*,
             -- the running count of jumps so far is the group number
             sum(case when abs(prev_depth - depth) > 15 and depth > 0 then 1 else 0 end) over (order by level) as grp
      from (select gl.*, lag(depth) over (order by level asc) as prev_depth
            from ground_levels gl
           ) gl
     ) gl
group by grp;
This actually keeps the starting level in the list. It can be removed, but that requires a bit more work.
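The cumulative-sum idea can be traced by hand with a small Python sketch: walk the rows in level order and bump a counter every time the jump condition holds, which is what the windowed sum computes. The function names and the (level, depth) tuple layout are assumptions for illustration. The second function shows one way to drop the starting level from each list.

```python
def assign_groups(rows, jump=15):
    """rows: list of (level, depth) sorted by level.

    Returns (level, depth, group_id); group_id is the running count of
    rows whose depth differs from the previous row by more than jump.
    """
    out = []
    group_id = 0
    prev_depth = None
    for level, depth in rows:
        if prev_depth is not None and abs(depth - prev_depth) > jump and depth > 0:
            group_id += 1
        out.append((level, depth, group_id))
        prev_depth = depth
    return out


def equivalent_levels(rows, jump=15):
    """Map each group's starting level to the other levels in its group."""
    groups = {}
    for level, _depth, gid in assign_groups(rows, jump):
        if gid > 0:  # group 0 holds only the rows before the first jump
            groups.setdefault(gid, []).append(level)
    return {levels[0]: levels[1:] for levels in groups.values()}


ground_levels = [(0, 0), (1, 50), (2, 53), (3, 60), (8, 80),
                 (10, 81), (15, 101), (16, 102), (17, 102), (18, 103)]
print(assign_groups(ground_levels))
print(equivalent_levels(ground_levels))
```

On the sample data the Base row lands in group 0, since no jump precedes it, so it is filtered out before building the lists.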

SQL select with preference on column values

I am new to SQL and I would like to ask about how to select entries based on preferences and grouping.
+----------+----------+------+
| ENTRY_ID | ROUTE_ID | TYPE |
+----------+----------+------+
| 1 | 15 | 0 |
| 1 | 26 | 1 |
| 1 | 39 | 1 |
| 2 | 22 | 1 |
| 2 | 15 | 1 |
| 3 | 30 | 1 |
| 3 | 35 | 0 |
| 3 | 40 | 1 |
+----------+----------+------+
With the table above, I would like to select 1 entry for each ENTRY_ID with the following preference for the returned ROUTE_ID:
IF TYPE = 0 is available
for any one of the entries with the same ENTRY_ID, return the minimum ROUTE_ID for all entries with TYPE = 0
IF for the same ENTRY_ID only TYPE = 1 is available, return the minimum ROUTE_ID
The expected outcome for the query will be the following:
+----------+----------+------+
| ENTRY_ID | ROUTE_ID | TYPE |
+----------+----------+------+
| 1 | 15 | 0 |
| 2 | 15 | 1 |
| 3 | 35 | 0 |
+----------+----------+------+
Thank you for your help!
You can group by both TYPE and ENTRY_ID, and then use the HAVING clause to filter out those where TYPE is not the minimal value for that record.
SELECT ENTRY_ID, MIN(ROUTE_ID), TYPE
FROM MyTable
GROUP BY ENTRY_ID, TYPE
HAVING TYPE = (SELECT MIN(s.TYPE) FROM MyTable s WHERE s.ENTRY_ID = MyTable.ENTRY_ID)
This relies on type only being able to be 0 or 1. If there are more possible values, it will only return the lowest type.
If you want complete rows, use a correlated subquery:
select t.*
from t
where t.route_id = (select top 1 t2.route_id
from t as t2
where t2.entry_id = t.entry_id
order by iif(t2.type = 0, 1, 2), -- put type 0 first
t2.route_id asc -- then the first route_id
);
This has the advantage that it can return more than just the three columns you show in the question.
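The ordering used in the correlated subquery (type 0 first, then the smallest route_id) amounts to picking the minimum of a two-part sort key per ENTRY_ID. A small Python sketch of that rule, with a hypothetical function name and tuple layout, run against the sample table:

```python
def pick_routes(rows):
    """rows: iterable of (entry_id, route_id, type) tuples.

    For each entry_id keep the row with the smallest key
    (0 if type == 0 else 1, route_id), i.e. prefer type 0,
    then the lowest route_id.
    """
    best = {}
    for entry_id, route_id, type_ in rows:
        key = (0 if type_ == 0 else 1, route_id)
        if entry_id not in best or key < best[entry_id][0]:
            best[entry_id] = (key, (entry_id, route_id, type_))
    return sorted(row for _key, row in best.values())


table = [
    (1, 15, 0), (1, 26, 1), (1, 39, 1),
    (2, 22, 1), (2, 15, 1),
    (3, 30, 1), (3, 35, 0), (3, 40, 1),
]
print(pick_routes(table))
```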

Record batching on the basis of running total values by a specific number (FileSize-wise batching)

We are dealing with a large record set and are currently using NTILE() to get ranges of FileIDs, then using the FileID column in a BETWEEN clause to get a specific set of records. Using FileID in a BETWEEN clause is a mandatory requirement from the developers, so we cannot have random FileIDs in one batch; the range has to be incremental.
As per a new requirement, we have to build the ranges based on the FileSize column, e.g. 100 GB per batch.
For example:
Batch 1: FileID 1 has size 100 GB, so the batch contains that 1 record only.
Batch 2: FileIDs 2, 3, 4, 5 total 80 GB, which is < 100 GB, so FileID 6 (120 GB) has to be taken as well (running total 300 GB).
Batch 3: FileID 7 is > 100 GB on its own, so 1 record only.
And so on…
Below is my sample code, but it is not giving the expected result:
CREATE TABLE zFiles
(
FileId INT
,FileSize INT
)
INSERT INTO dbo.zFiles (
FileId
,FileSize
)
VALUES (1, 100)
,(2, 20)
,(3, 20)
,(4, 30)
,(5, 10)
,(6, 120)
,(7, 400)
,(8, 50)
,(9, 100)
,(10, 60)
,(11, 40)
,(12, 5)
,(13, 20)
,(14, 95)
,(15, 40)
DECLARE @intBatchSize FLOAT = 100;
SELECT y.FileID ,
       y.FileSize ,
       y.RunningTotal ,
       DENSE_RANK() OVER (ORDER BY CEILING(RunningTotal / @intBatchSize)) Batch
FROM ( SELECT i.FileID ,
              i.FileSize ,
              RunningTotal = SUM(i.FileSize) OVER ( ORDER BY i.FileID ) -- RANGE UNBOUNDED PRECEDING)
       FROM dbo.zFiles AS i WITH ( NOLOCK )
     ) y
ORDER BY y.FileID;
Result:
+--------+----------+--------------+-------+
| FileID | FileSize | RunningTotal | Batch |
+--------+----------+--------------+-------+
| 1 | 100 | 100 | 1 |
| 2 | 20 | 120 | 2 |
| 3 | 20 | 140 | 2 |
| 4 | 30 | 170 | 2 |
| 5 | 10 | 180 | 2 |
| 6 | 120 | 300 | 3 |
| 7 | 400 | 700 | 4 |
| 8 | 50 | 750 | 5 |
| 9 | 100 | 850 | 6 |
| 10 | 60 | 910 | 7 |
| 11 | 40 | 950 | 7 |
| 12 | 5 | 955 | 7 |
| 13 | 20 | 975 | 7 |
| 14 | 95 | 1070 | 8 |
| 15 | 40 | 1110 | 9 |
+--------+----------+--------------+-------+
Expected Result:
+--------+---------------+---------+
| FileID | FileSize (GB) | BatchNo |
+--------+---------------+---------+
| 1 | 100 | 1 |
| 2 | 20 | 2 |
| 3 | 20 | 2 |
| 4 | 30 | 2 |
| 5 | 10 | 2 |
| 6 | 120 | 2 |
| 7 | 400 | 3 |
| 8 | 50 | 4 |
| 9 | 100 | 4 |
| 10 | 60 | 5 |
| 11 | 40 | 5 |
| 12 | 5 | 6 |
| 13 | 20 | 6 |
| 14 | 95 | 6 |
| 15 | 40 | 7 |
+--------+---------------+---------+
We could achieve this if we could somehow reset the running total once it goes over 100. We could write a loop to get this result, but that means going record by record, which is time-consuming.
Could somebody please help us with this?
You need to do this with a recursive CTE:
with cte as (
      select z.fileid, z.filesize, z.filesize as batch_filesize, 1 as batchnum
      from zfiles z
      where z.fileid = 1
      union all
      select z.fileid, z.filesize,
             (case when cte.batch_filesize + z.filesize > @intBatchSize
                   then z.filesize
                   else cte.batch_filesize + z.filesize
              end),
             (case when cte.batch_filesize + z.filesize > @intBatchSize
                   then cte.batchnum + 1
                   else cte.batchnum
              end)
      from cte join
           zfiles z
           on z.fileid = cte.fileid + 1
     )
select *
from cte;
Note: I realize that fileid probably is not a sequence. You can create a sequence using row_number() in a CTE, to make this work.
There is a technical reason why running sums don't work for this. Essentially, any given fileid needs to know the breaks before it.
A small modification to the answer above by Gordon Linoff gave the expected result: a new batch starts only once the current batch has already reached the limit.
DECLARE @intBatchSize INT = 100
;WITH cte as (
      select z.fileid, z.filesize, z.filesize as batch_filesize, 1 as batchnum
      from zfiles z
      where z.fileid = 1
      union all
      select z.fileid, z.filesize,
             (case when cte.batch_filesize >= @intBatchSize
                   then z.filesize
                   else cte.batch_filesize + z.filesize
              end),
             (case when cte.batch_filesize >= @intBatchSize
                   then cte.batchnum + 1
                   else cte.batchnum
              end)
      from cte join
           zfiles z
           on z.fileid = cte.fileid + 1
     )
select *
from cte;
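The recursion carries two pieces of state from row to row: the size accumulated in the current batch and the batch number. A short Python sketch of that same rule makes the reset explicit and can be checked against the expected result in the question. The function name and list layout are assumptions for illustration.

```python
def assign_batches(files, batch_size=100):
    """files: list of (file_id, file_size) in file_id order.

    A new batch starts only when the current batch has already
    reached batch_size; the file that crosses the limit stays in
    the batch it was added to.
    """
    result = []
    batch_no = 1
    batch_total = 0
    for file_id, size in files:
        if batch_total >= batch_size:
            batch_no += 1
            batch_total = 0  # reset the running total for the new batch
        batch_total += size
        result.append((file_id, size, batch_no))
    return result


z_files = [(1, 100), (2, 20), (3, 20), (4, 30), (5, 10), (6, 120),
           (7, 400), (8, 50), (9, 100), (10, 60), (11, 40), (12, 5),
           (13, 20), (14, 95), (15, 40)]
for row in assign_batches(z_files):
    print(row)
```

Each row's batch depends on where every earlier reset happened, which is why a plain windowed running total cannot express it and some form of row-by-row state (the recursive CTE, or a loop as here) is needed.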

SQL: Complex query with subtraction from different cells

I have two tables and I want to combine their data.
The first table
+------------+-----+------+-------+
| BusinessID | Lat | Long | Stars |
+------------+-----+------+-------+
| abc123 | 32 | 74 | 4.5 |
| abd123 | 32 | 75 | 4 |
| abe123 | 33 | 76 | 3 |
+------------+-----+------+-------+
The second table is:
+------------+-----+------+-------+
| BusinessID | day | time | count |
+------------+-----+------+-------+
| abc123 | 1 | 14 | 5 |
| abc123 | 1 | 15 | 6 |
| abc123 | 2 | 13 | 1 |
| abd123 | 4 | 12 | 4 |
| abd123 | 4 | 13 | 8 |
| abd123 | 5 | 11 | 2 |
+------------+-----+------+-------+
So what I want to do is find all the businesses that are within a specific radius and have more check-ins in the next hour than in the current one.
So the results are
+------------+
| BusinessID |
+------------+
| abd123 |
| abc123 |
+------------+
Because they have more check-ins in the next hour than in the previous one (6 > 5, 8 > 4).
What is more, it would be helpful if the results were ordered by the difference in the number of check-ins, e.g. (8 - 4 > 6 - 5).
SELECT *
FROM table2 t2
WHERE t2.BusinessID IN (
SELECT t1.BusinessID
FROM table1 t1
WHERE earth_box(ll_to_earth(32, 74), 4000/1.609) @> ll_to_earth(Lat, Long)
ORDER by earth_distance(ll_to_earth(32, 74), ll_to_earth(Lat, Long)), stars DESC
) AND checkin_day = 1 AND checkin_time = 14;
With the above query I can find the businesses within a radius and then find their check-ins at the specified time, e.g. 14. What I need to do now is find the number of check-ins in hour 15 (for the same businesses) and check whether that number is greater than it was in the previous hour.
I think you want something like this:
SELECT
t1.BusinessID
FROM
table1 t1
JOIN
(SELECT
*,
"count" - LAG("count") OVER (PARTITION BY BusinessID, "day" ORDER BY "time") "grow"
FROM
table2
WHERE
/* Some condition on table2 */) t2
ON t1.BusinessID = t2.BusinessID AND t2.grow > 0
WHERE
/* Some condition on table1 */
ORDER BY
t2.grow DESC;
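The LAG expression compares each row's count with the previous row for the same business and day. A rough Python sketch of that comparison, using the sample check-in data, is below. The function name is hypothetical, and unlike the query it keeps only the largest growth per business rather than one row per growing pair.

```python
def growing_businesses(checkins):
    """checkins: iterable of (business_id, day, time, count) tuples.

    Returns (business_id, grow) pairs, largest growth first, where
    grow is count minus the previous count for the same business
    and day, ordered by time.
    """
    best = {}
    prev = {}
    for business_id, day, _time, count in sorted(checkins):
        key = (business_id, day)
        if key in prev:
            grow = count - prev[key]
            if grow > 0:
                best[business_id] = max(grow, best.get(business_id, 0))
        prev[key] = count
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


table2 = [
    ("abc123", 1, 14, 5), ("abc123", 1, 15, 6), ("abc123", 2, 13, 1),
    ("abd123", 4, 12, 4), ("abd123", 4, 13, 8), ("abd123", 5, 11, 2),
]
print(growing_businesses(table2))
```

Note that both the query and this sketch compare against the previous recorded row, which is the previous hour only if no hours are missing from the data.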