SQL Window function to get 2 closest events - sql

I was trying to solve the "analyze weather patterns" problem as described here (https://joins-238123.netlify.com/window-functions/)
You're worried that hurricanes are happening more frequently, so you decide to do a tiny bit of analysis. For each kind of weather event, find the 2 events that occurred closest together and when they happened.
The weather table has data like:
type day
rain 6
rain 12
thunderstorm 13
rain 21
rain 27
rain 37
rain 44
rain 54
thunderstorm 56
rain 58
rain 61
rain 65
rain 68
rain 73
rain 82
hurricane 87
rain 92
rain 95
rain 98
rain 108
thunderstorm 111
rain 118
rain 123
rain 128
rain 131
hurricane 135
rain 136
rain 140
rain 149
thunderstorm 158
rain 159
rain 167
rain 175
hurricane 178
rain 179
rain 186
rain 192
rain 200
thunderstorm 202
rain 210
rain 219
thunderstorm 222
rain 226
rain 232
thunderstorm 238
rain 241
rain 246
rain 253
thunderstorm 257
rain 257
rain 267
rain 277
rain 286
rain 295
rain 302
rain 307
thunderstorm 312
rain 316
rain 325
thunderstorm 330
So far I have come up with:
select type, day, COALESCE(day - LAG(day, 1) over (partition by type order by day), 0) as days_since_previous from weather
It gives me results like:
type day days_since_previous
hurricane 87 0
hurricane 135 48
hurricane 178 43
rain 6 0
rain 12 6
rain 21 9
rain 27 6
But I can't narrow the results down to the 2 closest events and display only the days between them.
How can I get the desired result, like:
type day days_since_previous
rain 61 3
hurricane 178 43
thunderstorm 238 16

You can use another window function to whittle down the rows:
SELECT type, day, days_since_previous
FROM (
SELECT type, day, (day - prev_day) AS days_since_previous,
ROW_NUMBER() OVER(PARTITION BY type ORDER BY (day - prev_day)) AS RowNum
FROM (
select type, day,
LAG(day, 1) over (partition by type order by day) as prev_day
from weather
) src
WHERE prev_day IS NOT NULL -- Ignore "first" events
) src
WHERE RowNum = 1
order by day
I also removed the COALESCE, since it gave each type's "first" event a gap of 0, which would always win the ROW_NUMBER ordering.
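To check the ROW_NUMBER approach end to end, here is a minimal sketch that runs it against the question's data in an in-memory SQLite database from Python. Assumptions: SQLite 3.25 or later (the first version with window functions), and one change from the answer: ROW_NUMBER also orders by `day` as a tie-breaker, because rain has several 3-day gaps (ending on days 61, 68, 95, 98 and 131) and without it the database may return any of them.

```python
import sqlite3

# Event days per weather type, copied from the question's table.
DAYS = {
    "rain": [6, 12, 21, 27, 37, 44, 54, 58, 61, 65, 68, 73, 82, 92, 95, 98,
             108, 118, 123, 128, 131, 136, 140, 149, 159, 167, 175, 179, 186,
             192, 200, 210, 219, 226, 232, 241, 246, 253, 257, 267, 277, 286,
             295, 302, 307, 316, 325],
    "thunderstorm": [13, 56, 111, 158, 202, 222, 238, 257, 312, 330],
    "hurricane": [87, 135, 178],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (type TEXT, day INTEGER)")
conn.executemany(
    "INSERT INTO weather (type, day) VALUES (?, ?)",
    [(kind, day) for kind, days in DAYS.items() for day in days],
)

QUERY = """
SELECT type, day, days_since_previous
FROM (
    SELECT type, day, day - prev_day AS days_since_previous,
           -- smallest gap first; ties go to the earliest day (added tie-breaker)
           ROW_NUMBER() OVER (PARTITION BY type
                              ORDER BY day - prev_day, day) AS rn
    FROM (
        SELECT type, day,
               LAG(day, 1) OVER (PARTITION BY type ORDER BY day) AS prev_day
        FROM weather
    ) gaps
    WHERE prev_day IS NOT NULL  -- the first event of each type has no gap
) ranked
WHERE rn = 1
ORDER BY day
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the tie-breaker the result matches the desired output row for row: rain 61 3, hurricane 178 43, thunderstorm 238 16.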

If you don't need to display the day value, you could run a nested query:
In one SELECT (in a WITH clause or a nested sub-select), add the gap to the previous day as an OLAP (window) function, as you did. There's really no need for COALESCE.
Then run a GROUP BY query over that fullselect.
Like so:
WITH
w_gap2prev AS (
SELECT
*
, day - LAG(day) OVER(PARTITION BY type ORDER BY day) AS gap
FROM weather
)
SELECT
type
, MIN(gap) AS days_since_previous
FROM w_gap2prev
WHERE gap IS NOT NULL
GROUP BY type
;
-- out type | days_since_previous
-- out --------------+---------------------
-- out hurricane | 43
-- out rain | 3
-- out thunderstorm | 16
-- out (3 rows)
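The GROUP BY variant can be checked the same way. This sketch (again assuming SQLite 3.25+ through Python's sqlite3 module) runs the WITH query on the question's data and reproduces the three gaps listed above; MIN(gap) alone cannot say which day a gap ended on, which is why this version has no day column.

```python
import sqlite3

# Event days per weather type, copied from the question's table.
DAYS = {
    "rain": [6, 12, 21, 27, 37, 44, 54, 58, 61, 65, 68, 73, 82, 92, 95, 98,
             108, 118, 123, 128, 131, 136, 140, 149, 159, 167, 175, 179, 186,
             192, 200, 210, 219, 226, 232, 241, 246, 253, 257, 267, 277, 286,
             295, 302, 307, 316, 325],
    "thunderstorm": [13, 56, 111, 158, 202, 222, 238, 257, 312, 330],
    "hurricane": [87, 135, 178],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (type TEXT, day INTEGER)")
conn.executemany(
    "INSERT INTO weather (type, day) VALUES (?, ?)",
    [(kind, day) for kind, days in DAYS.items() for day in days],
)

QUERY = """
WITH w_gap2prev AS (
    SELECT *,
           day - LAG(day) OVER (PARTITION BY type ORDER BY day) AS gap
    FROM weather
)
SELECT type, MIN(gap) AS days_since_previous
FROM w_gap2prev
WHERE gap IS NOT NULL  -- drop each type's first event
GROUP BY type
ORDER BY type
"""

result = dict(conn.execute(QUERY).fetchall())
print(result)
```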

Related

Summing column that is grouped - SQL

I have a query:
SELECT
o.date,
COUNT(o.row_number) FILTER (WHERE o.row_number > 1 AND o.date_ddr IS NOT NULL AND o.telephone_number <> 'Anonymous') AS repeat_calls_24h
FROM (
SELECT
ddr.date,
ddr.telephone_number,
ddr.date_ddr,
ROW_NUMBER() OVER (PARTITION BY ddr.telephone_number ORDER BY ddr.date) AS row_number
FROM
table_a ddr
) o
GROUP BY 1
This generates the following table:
date        Repeat calls_24h
17/09/2022  182
18/09/2022  381
19/09/2022  81
20/09/2022  24
21/09/2022  91
22/09/2022  110
23/09/2022  231
What can I add to my query to get the sum over the current day and the two previous days, as below?
date        Repeat calls_24h  Repeat Calls 3d
17/09/2022  182
18/09/2022  381
19/09/2022  81                644
20/09/2022  24                486
21/09/2022  91                196
22/09/2022  110               225
23/09/2022  231               432
Thanks
We can do it using lag.
select "date"
,"Repeat calls_24h"
,"Repeat calls_24h" + lag("Repeat calls_24h") over(order by "date") + lag("Repeat calls_24h", 2) over(order by "date") as "Repeat Calls 3d"
from t
date        Repeat calls_24h  Repeat Calls 3d
2022-09-17  182               null
2022-09-18  381               null
2022-09-19  81                644
2022-09-20  24                486
2022-09-21  91                196
2022-09-22  110               225
2022-09-23  231               432
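The LAG expression returns null for the first two dates. If partial sums are wanted there instead, a window frame does the same job in a single expression. The sketch below (SQLite through Python's sqlite3 module, table t as in the answer, dates stored as ISO text by assumption) computes both side by side. The frame version counts rows, not calendar days, so it assumes exactly one row per date with no gaps.

```python
import sqlite3

daily = [
    ("2022-09-17", 182), ("2022-09-18", 381), ("2022-09-19", 81),
    ("2022-09-20", 24), ("2022-09-21", 91), ("2022-09-22", 110),
    ("2022-09-23", 231),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("date" TEXT, repeat_calls_24h INTEGER)')
conn.executemany("INSERT INTO t VALUES (?, ?)", daily)

rows = conn.execute("""
SELECT "date",
       repeat_calls_24h,
       -- LAG version from the answer: NULL until two earlier rows exist
       repeat_calls_24h
         + LAG(repeat_calls_24h) OVER (ORDER BY "date")
         + LAG(repeat_calls_24h, 2) OVER (ORDER BY "date") AS calls_3d_lag,
       -- frame version: sums whichever rows exist in the 3-row window
       SUM(repeat_calls_24h) OVER (
           ORDER BY "date" ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS calls_3d_frame
FROM t
ORDER BY "date"
""").fetchall()

for row in rows:
    print(row)
```

If some dates can be missing, PostgreSQL 11 and later also accept a RANGE frame with an interval offset, such as RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW, which counts calendar days instead of rows.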

how to select a value based on multiple criteria

I'm trying to select some values from proprietary data, so I've renamed the variables to refer to house prices.
I'm trying to get the total offers for houses that were sold at the bid or the ask price, with offers under 15 and offers * sale price less than 5,000,000.
I then want the total number of offers for each neighborhood on each day, but instead I'm getting the total offers summed across all neighborhoods (n1 + n2 + n3 + n4 + n5) over all dates, and the total offers in the dataset over all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
I'm expecting one row per neighborhood per date, with the dates repeating throughout the result as d1, d2, d3, etc., but I'm instead receiving the overall totals described above on every row.
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392
I think this is what you need.
Because we group by neighborhood, we don't need DISTINCT.
We take SUM(offers) for total_offers directly from the table, and the bids from a sub-query that we join to, so that they are grouped by neighborhood.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
Or the following, which changes the initial query more but may be closer to what you need, since it also groups the bids by day.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
DATE(date_of_sale) AS dos, -- group by calendar day, matching the outer query
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY
neighborhood,
DATE(date_of_sale)) s
ON h.neighborhood = s.neighborhood
AND DATE(h.date_of_sale) = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
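A different technique from the two joins above, offered as an alternative rather than as the answerer's method, is conditional aggregation: both sums come out of a single GROUP BY over one read of the table, with no self-join. The sketch below runs it in SQLite on the question's sample rows. Assumptions: the dates are stored as ISO strings, and because the sample has no sale_price column, accepted_sale_price stands in for it. In BigQuery the same CASE WHEN expression should carry over, with DATE(date_of_sale) in place of the plain column.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption).
SAMPLE = [
    ("bronx", "2022-04-01", 3, 323, 320, 323),
    ("manhattan", "2022-04-01", 4, 244, 230, 244),
    ("manhattan", "2022-04-01", 8, 856, 856, 900),
    ("queens", "2022-04-01", 15, 110, 110, 135),
    ("brooklyn", "2022-04-02", 12, 115, 100, 115),
    ("manhattan", "2022-04-02", 9, 255, 255, 275),
    ("bronx", "2022-04-02", 6, 330, 300, 330),
    ("queens", "2022-04-02", 10, 405, 395, 405),
    ("brooklyn", "2022-04-02", 4, 254, 254, 265),
    ("staten_island", "2022-04-03", 2, 442, 430, 442),
    ("staten_island", "2022-04-03", 13, 195, 195, 225),
    ("bronx", "2022-04-03", 4, 650, 650, 690),
    ("manhattan", "2022-04-03", 2, 286, 266, 286),
    ("manhattan", "2022-04-03", 6, 356, 356, 400),
    ("staten_island", "2022-04-04", 4, 361, 361, 401),
    ("staten_island", "2022-04-04", 5, 348, 348, 399),
    ("bronx", "2022-04-04", 8, 397, 340, 397),
    ("manhattan", "2022-04-04", 9, 333, 333, 394),
    ("manhattan", "2022-04-04", 11, 392, 325, 392),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE houseprices (
    neighborhood TEXT, date_of_sale TEXT, offers INTEGER,
    accepted_sale_price INTEGER, house_bid INTEGER, house_ask INTEGER)""")
conn.executemany("INSERT INTO houseprices VALUES (?, ?, ?, ?, ?, ?)", SAMPLE)

QUERY = """
SELECT neighborhood,
       date_of_sale,
       -- only offers that pass every filter count toward bid_ask_off
       SUM(CASE WHEN offers * accepted_sale_price < 5000000
                 AND offers < 15
                 AND (house_bid = accepted_sale_price
                      OR house_ask = accepted_sale_price)
                THEN offers ELSE 0 END) AS bid_ask_off,
       SUM(offers) AS total_offers
FROM houseprices
GROUP BY neighborhood, date_of_sale
ORDER BY date_of_sale, neighborhood
"""

result = conn.execute(QUERY).fetchall()
for r in result:
    print(r)
```

On this sample only the queens row on 2022-04-01 fails a filter (15 offers is not under 15), so that group shows bid_ask_off 0 against total_offers 15.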

Get Value Difference and Time Stamp Difference from SQL Table that is not Ideal

This problem is way over my head. Can a report be created from the table below that searches the common date stamps and returns the Tank1Level difference? A few complications can occur: the day on the time stamp can change, and there can be 3 to 5 entries in the database per tank-filling process.
The report would show how much the tank was filled, along with the last t_stamp and last T1_Lot of the fill.
Here is the data:
index T1_Lot Tank1Level Tank1Temp t_stamp quality_code
30 70517 - 1 43781.1875 120 7/10/2017 6:43 192
29 70517 - 1 242.6184692 119 7/10/2017 0:54 192
26 70617 - 2 242.6184692 119 7/10/2017 0:51 192
23 70617 - 2 44921.03516 134 7/8/2017 14:22 192
22 70617 - 2 892.652771 107 7/8/2017 8:29 192
21 62917 - 3 892.652771 107 7/8/2017 8:28 192
20 62917 - 3 42352.94141 124 7/6/2017 13:15 192
19 62917 - 3 5291.829102 121 7/6/2017 8:06 192
18 62917 - 2 5273.518066 121 7/6/2017 8:05 192
17 60817 - 2 444.0375366 97 7/6/2017 7:23 192
16 60817 - 2 476.0814819 97 7/5/2017 18:09 192
11 62817 - 3 45374.23047 113 6/30/2017 11:38 192
Here is what the report should look like.
At 7/10/2017 6:43, T1_Lot = 70517 - 1, Tank1Level difference = 43,629, and it took 5:52.
At 7/8/2017 14:22, T1_Lot = 70517 - 1, Tank1Level difference = 44,028, and it took 5:54.
At 7/6/2017 13:15, T1_Lot = 62917 - 3, Tank1Level difference = 41,877, and it took 5:10.
Here is how that was calculated:
1. Find the most recent time stamp with a Tank1Level value > 40,000.
2. Then find the next one with Tank1Level > 40,000.
3. Go one index up from it. (Alternatively, it could be done by accumulating less than 8 hours; as the second report line shows, some data should be ignored.)
4. Report the last t_stamp of the series with its T1_Lot.
5. Calculate the difference in Tank1Level and report it.
6. Calculate the t_stamp difference in hh:mm and report it.
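Based on the data you provided, a self join on adjacent index values might work:
select afterFill.t_stamp, afterFill.T1_Lot, afterFill.Tank1Level - beforeFill.Tank1Level as level_difference
from yourTable beforeFill join yourTable afterFill on beforeFill.t1_lot = afterFill.t1_lot
and beforeFill."index" = afterFill."index" - 1 -- index is a reserved word, so it is quoted
where afterFill.Tank1Level > 40000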
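Here is a minimal sketch of that self join on the question's rows, run in SQLite from Python. Assumptions: the table name yourTable comes from the answer, t_stamp is rewritten as ISO-8601 text so SQLite's julianday() can parse it, and the elapsed time uses julianday(), which other databases spell differently. Because the join only pairs rows whose index values differ by exactly 1, it does not implement the "next reading above 40,000" search or the 8-hour rule from the question, so its figures do not all match the expected report.

```python
import sqlite3

# Rows from the question; t_stamp rewritten as ISO-8601 text (assumption).
DATA = [
    (30, "70517 - 1", 43781.1875, 120, "2017-07-10 06:43", 192),
    (29, "70517 - 1", 242.6184692, 119, "2017-07-10 00:54", 192),
    (26, "70617 - 2", 242.6184692, 119, "2017-07-10 00:51", 192),
    (23, "70617 - 2", 44921.03516, 134, "2017-07-08 14:22", 192),
    (22, "70617 - 2", 892.652771, 107, "2017-07-08 08:29", 192),
    (21, "62917 - 3", 892.652771, 107, "2017-07-08 08:28", 192),
    (20, "62917 - 3", 42352.94141, 124, "2017-07-06 13:15", 192),
    (19, "62917 - 3", 5291.829102, 121, "2017-07-06 08:06", 192),
    (18, "62917 - 2", 5273.518066, 121, "2017-07-06 08:05", 192),
    (17, "60817 - 2", 444.0375366, 97, "2017-07-06 07:23", 192),
    (16, "60817 - 2", 476.0814819, 97, "2017-07-05 18:09", 192),
    (11, "62817 - 3", 45374.23047, 113, "2017-06-30 11:38", 192),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE yourTable ("index" INTEGER, T1_Lot TEXT,
    Tank1Level REAL, Tank1Temp INTEGER, t_stamp TEXT, quality_code INTEGER)""")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)", DATA)

QUERY = """
SELECT a."index", a.T1_Lot, a.t_stamp,
       ROUND(a.Tank1Level - b.Tank1Level, 2) AS level_difference,
       -- elapsed minutes between the two readings
       CAST(ROUND((julianday(a.t_stamp) - julianday(b.t_stamp)) * 24 * 60)
            AS INTEGER) AS minutes
FROM yourTable b
JOIN yourTable a
  ON b.T1_Lot = a.T1_Lot
 AND b."index" = a."index" - 1
WHERE a.Tank1Level > 40000  -- keep only rows where the tank ended up full
ORDER BY a."index" DESC
"""

results = conn.execute(QUERY).fetchall()
for idx, lot, ts, diff, minutes in results:
    print(f"At {ts} T1_Lot = {lot}, Tank1Level difference = {diff:,.2f}, "
          f"took {minutes // 60}:{minutes % 60:02d}")
```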

Date operation with round giving wrong results (oracle sql on db.grussell.org)

I'm practicing SQL with the exercises at db.grussell.org, and exercise 10 from tutorial 5 (https://db.grussell.org/sql/interface.cgi?tn=Tutorial%205&qn=9) asks how old the employees are in months.
Here is the question:
How old is each employee in months.
Format this as employee number against age in months.
Round to the nearest whole number of months.
Here is my code (empno is the employee ID, and dob is the date of birth, stored as a date type on the employee table):
select e.empno,
round( months_between ( trunc(sysdate,'mm'),trunc(dob) ) )
from employee e
Here are my results
EMPLOYEE MONTHS_BETWEEN
1 882
2 845
3 674
4 647
5 705
6 756
7 832
8 714
9 714
10 670
11 658
12 700
13 902
14 853
15 748
16 658
17 683
18 673
19 702
20 640
21 622
22 927
23 751
24 810
25 758
26 689
27 779
28 732
29 586
30 633
31 744
32 695
For employees 5, 10 and 16 the result is incorrect for a reason I can't understand; for all the others it is correct.
I suspect I'm missing something in the round function, or that I should be adding or subtracting a date in the months_between parameters.
Could someone point me in the right direction on what would be wrong with those?
I can't really understand why some of the data is correct and some isn't...
There is a chance that the exercise is wrong but I would not be able to tell.
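This happens because you are truncating sysdate. As of this month (July 2017), you are measuring the age as of July 1st instead of the current date. MONTHS_BETWEEN counts the leftover days as a fraction of a month, so moving the end date back to the 1st changes the rounded result only for employees whose fractional part crosses the .5 boundary, which is why just a few rows differ.
Instead, just do: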
select e.empno,
round(months_between(sysdate, e.dob)) as age_in_months
from employee e;
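The effect of the truncation can be reproduced with a little arithmetic. The sketch below is a Python approximation of Oracle's documented MONTHS_BETWEEN rule for dates without a time part: whole months when both dates fall on the same day of the month or are both month-ends, otherwise the leftover days divided by 31. The date of birth and the "current" date are hypothetical. Moving the end date from July 10 back to July 1 is enough to push the rounded age down by a month.

```python
import calendar
from datetime import date


def months_between(d1: date, d2: date) -> float:
    """Approximate Oracle MONTHS_BETWEEN(d1, d2) for dates with no time part."""
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    last1 = d1.day == calendar.monthrange(d1.year, d1.month)[1]
    last2 = d2.day == calendar.monthrange(d2.year, d2.month)[1]
    if d1.day == d2.day or (last1 and last2):
        return float(whole)  # same day of month, or both month-ends
    return whole + (d1.day - d2.day) / 31  # fraction based on a 31-day month


dob = date(1958, 1, 20)  # hypothetical date of birth
truncated = months_between(date(2017, 7, 1), dob)  # like trunc(sysdate, 'mm')
current = months_between(date(2017, 7, 10), dob)   # like sysdate on July 10
print(round(truncated), round(current))
```

Here the truncated version gives 714 - 19/31 (about 713.39, rounding to 713), while the untruncated one gives 714 - 10/31 (about 713.68, rounding to 714).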

SQL order dates sequentially by year

I have a SQL view that produces the following list of Mondays in a specific date range:
Date Number
16/12/2013 208
23/12/2013 190
30/12/2013 187
15/12/2014 203
22/12/2014 190
29/12/2014 153
14/12/2015 225
21/12/2015 217
28/12/2015 223
Is it possible to order them by the first Monday of each year, then the second, then the third, etc., to give the results shown below?
Date Number
16/12/2013 208
15/12/2014 203
14/12/2015 225
23/12/2013 190
22/12/2014 190
21/12/2015 217
30/12/2013 187
29/12/2014 153
28/12/2015 223
Thank you in advance for any help or advice.
I think you should be able to get what you want by using ROW_NUMBER() partitioned by the year, for example:
Select [Date], [Number],
Row_Number() over (PARTITION BY Year([DATE]) order by [DATE]) as WEEK_IN_YR
from [table]
order by WEEK_IN_YR, [Date]
https://msdn.microsoft.com/en-gb/library/ms186734.aspx
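The ordering trick can be tried in SQLite as well. In this sketch the table name mondays is hypothetical (the answer uses the placeholder [table]), the dates are stored as ISO strings, and strftime('%Y', ...) stands in for SQL Server's YEAR(). ROW_NUMBER numbers the Mondays within each year, and sorting by that number first interleaves the years.

```python
import sqlite3

MONDAYS = [
    ("2013-12-16", 208), ("2013-12-23", 190), ("2013-12-30", 187),
    ("2014-12-15", 203), ("2014-12-22", 190), ("2014-12-29", 153),
    ("2015-12-14", 225), ("2015-12-21", 217), ("2015-12-28", 223),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mondays ("Date" TEXT, "Number" INTEGER)')
conn.executemany("INSERT INTO mondays VALUES (?, ?)", MONDAYS)

rows = conn.execute("""
SELECT "Date", "Number",
       -- 1 for the first Monday listed in each year, 2 for the second, ...
       ROW_NUMBER() OVER (PARTITION BY strftime('%Y', "Date")
                          ORDER BY "Date") AS week_in_yr
FROM mondays
ORDER BY week_in_yr, "Date"
""").fetchall()

for row in rows:
    print(row)
```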