Redshift MIN window function over multiple dates/IDs - SQL

I have some data that looks like this (truncated):
date        listing_id  inquiry_id  listed_on                 inquiry_date  days_between_list_inquiry
2021-06-08  957         16891       2021-06-08T00:00:00.000Z  2020-12-22    168
2021-06-09  957         17045       2021-06-09T00:00:00.000Z  2020-12-22    169
2021-06-09  957         16985       2021-06-09T00:00:00.000Z  2020-12-22    169
2021-03-04  1117        6869        2021-03-04T00:00:00.000Z  2021-03-01    3
2021-03-05  1117        6933        2021-03-05T00:00:00.000Z  2021-03-01    4
2021-03-08  1117        7212        2021-03-08T00:00:00.000Z  2021-03-01    7
2021-03-11  1117        7449        2021-03-11T00:00:00.000Z  2021-03-01    10
The table captures a daily record of each listing, at the day level.
For each listing_id, I'd like to create a column that captures the first_inquiry_date related to that listing. So, for listing_id 957, that would be 2020-12-22; for ID 1117, it would be 2021-03-01.
I tried:
min(date_trunc('day',li.created_at)) over (order by ll.id asc, date asc rows unbounded preceding) as min_inquiry_date,
and
min(date_trunc('day',li.created_at)) over (order by ll.id date rows unbounded preceding) as min_inquiry_date,
and a variety of other ORDER BY clauses, but I'm not getting what I'm looking for.
Any help would be greatly appreciated. Thank you!

You need a partition by:
min(date_trunc('day',li.created_at)) over (partition by listing_id) as min_inquiry_date
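In context, a minimal sketch against the sample data from the question (assuming the table is called listing_daily; the name is hypothetical):
select
    date,
    listing_id,
    inquiry_id,
    min(inquiry_date) over (partition by listing_id) as first_inquiry_date
from listing_daily;
With no ORDER BY in the OVER clause, the window covers the whole partition, so every row for a listing_id gets that listing's earliest inquiry_date.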

You can just do this:
select
    listing_id,
    min(inquiry_date) as first_inquiry_date
from [table name]
group by 1
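If you need first_inquiry_date as an extra column on every daily row rather than one row per listing, one option is to join that aggregate back (a sketch, again assuming the hypothetical table name listing_daily):
select d.*, f.first_inquiry_date
from listing_daily d
join (
    select listing_id, min(inquiry_date) as first_inquiry_date
    from listing_daily
    group by 1
) f on f.listing_id = d.listing_id;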


Why does my cumulative column not work as expected?

Here is my query. I have a column called cum_balance which is supposed to calculate the cumulative balance, but after row number 10 there is an anomaly and it doesn't work as expected; all I notice is that from row number 10 onwards the hour column has the same value. What's the right syntax for this?
select
    hour,
    symbol,
    amount_usd,
    category,
    sum(amount_usd) over (
        order by hour asc
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cum_balance
from combined_transfers_usd_netflow
order by hour
I have tried removing RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, adding a PARTITION BY hour, and a GROUP BY hour. None of them gave the expected result or an error.
Row Number  Hour                 SYMBOL  AMOUNT_USD    CATEGORY  CUM_BALANCE
1           2021-12-02 23:00:00  WETH    227.2795      in        227.2795
2           2021-12-03 00:00:00  WETH    -226.4801153  out       0.7993847087
3           2022-01-05 21:00:00  WETH    5123.716203   in        5124.515587
4           2022-01-18 14:00:00  WETH    -4466.2366    out       658.2789873
5           2022-01-19 00:00:00  WETH    2442.618599   in        3100.897586
6           2022-01-21 14:00:00  USDC    99928.68644   in        103029.584
7           2022-03-01 16:00:00  UNI     8545.36098    in        111574.945
8           2022-03-04 22:00:00  USDC    -2999.343     out       108575.602
9           2022-03-09 22:00:00  USDC    -5042.947675  out       103532.6543
10          2022-03-16 21:00:00  USDC    -4110.6579    out       98594.35101
11          2022-03-16 21:00:00  UNI     -3.209306045  out       98594.35101
12          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
13          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
14          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
15          2022-03-16 21:00:00  UNI     -6.418612089  out       98594.35101
The "problem" with your data in all the ORDER BY values after row 10 are the same.
So if we shrink the data down a little, and use for groups to repeat the experiment:
with data(grp, date, val) as (
    select * from values
        (1, '2021-01-01'::date, 10),
        (1, '2021-01-02'::date, 11),
        (1, '2021-01-03'::date, 12),
        (2, '2021-01-01'::date, 20),
        (2, '2021-01-02'::date, 21),
        (2, '2021-01-02'::date, 22),
        (2, '2021-01-04'::date, 23)
)
select d.*
    ,sum(val) over (partition by grp order by date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_val_1
    ,sum(val) over (partition by grp order by date) as cum_val_2
from data as d
order by 1, 2;
we get:
GRP  DATE        VAL  CUM_VAL_1  CUM_VAL_2
1    2021-01-01  10   10         10
1    2021-01-02  11   21         21
1    2021-01-03  12   33         33
2    2021-01-01  20   20         20
2    2021-01-02  21   63         63
2    2021-01-02  22   63         63
2    2021-01-04  23   86         86
We see with group 1 that values accumulate as we expect. So for group 2 we put in duplicate values and see those rows get the same cumulative value, while the rows after them "work as expected" again.
This tells us how the function behaves across unstable data (values that sort the same): all tied rows are stepped in one leap.
Thus, if you want each row to be different, the ORDER BY needs more distinctness. This could be forced by adding random values, but random values can themselves collide, so you really should use ROW_NUMBER or a sequence (SEQx) to get unique tie-breaker values.
The second sum also shows the two framings are equal: it's an ORDER BY problem, not a problem with the framing of "which rows" are used.
with data(grp, date, val) as (
    select * from values
        (1, '2021-01-01'::date, 10),
        (1, '2021-01-02'::date, 11),
        (1, '2021-01-03'::date, 12),
        (2, '2021-01-01'::date, 20),
        (2, '2021-01-02'::date, 21),
        (2, '2021-01-02'::date, 22),
        (2, '2021-01-04'::date, 23)
)
select d.*
    ,seq8() as s
    ,sum(val) over (partition by grp order by date) as cum_val_1
    ,sum(val) over (partition by grp order by date, s) as cum_val_2
    ,sum(val) over (partition by grp order by date, seq8()) as cum_val_3
from data as d
order by 1, 2;
gives:
GRP  DATE        VAL  S  CUM_VAL_1  CUM_VAL_2  CUM_VAL_3
1    2021-01-01  10   0  10         10         10
1    2021-01-02  11   1  21         21         21
1    2021-01-03  12   2  33         33         33
2    2021-01-01  20   3  20         20         20
2    2021-01-02  21   4  63         41         41
2    2021-01-02  22   5  63         63         63
2    2021-01-04  23   6  86         86         86
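Applied back to the original query, the fix is to give the window's ORDER BY a unique tie-breaker. A sketch using ROW_NUMBER computed in a subquery (window functions can't be nested directly):
with numbered as (
    select
        hour,
        symbol,
        amount_usd,
        category,
        row_number() over (order by hour) as rn  -- unique tie-breaker for rows sharing an hour
    from combined_transfers_usd_netflow
)
select
    hour,
    symbol,
    amount_usd,
    category,
    sum(amount_usd) over (order by hour, rn) as cum_balance
from numbered
order by hour, rn;
Because (hour, rn) is unique, no two rows tie in the ORDER BY, so each row gets its own cumulative step.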

PostgreSQL: How to show personal bests by comparing to previous runs

I'm looking to create the "fastest_run_time" column in PostgreSQL by looking at what the "current" personal best is as of the month of that row. So for example:
In 2016-07 my personal best was 762; it was beaten by a 720 in 2016-08
Since the run on 2016-09 of 745 isn't an improvement on 720, the fastest_run_time should stay as 720
It's only updated again when it is beaten with a 691 in 2016-12.
I've tried doing some partitioning and max/mins and have got it into this format, but can't really see where to go from here.
If the PARTITION BY syntax is supported:
select mt.*,
    min(run_time) over (
        partition by run_type
        order by period
        rows between unbounded preceding and current row
    ) as fastest_run_time
from mytbl mt
Just a subquery:
select run_type, to_char(period, 'YYYY-MM'), run_time, (
    select min(rs.run_time)
    from run rs
    where rs.period <= run.period
) fastest_run_time
from run;
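If there were more than one run_type, the correlated subquery would presumably also need to match on it, along these lines:
select min(rs.run_time)
from run rs
where rs.run_type = run.run_type
  and rs.period <= run.period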
Result:
run_type  to_char  run_time  fastest_run_time
A         2021-05
A         2021-06  762       762
A         2021-07  762       762
A         2021-08  720       720
A         2021-09  745       720
A         2021-10  745       720
A         2021-11  745       720
A         2021-12  691       691

Combine two rows per month, containing null values, into one row per month

I would like to have a dataframe where each month of data is contained in a single row.
month       cust_id  closed_deals  cum_closed_deals  checkout  cum_checkout
2019-10-01  1        15            15                null      null
2019-10-01  1        null          15                210       210
2019-11-01  1        27            42                null      210
2019-11-01  1        null          42                369       579
Expected result:
month       cust_id  closed_deals  cum_closed_deals  checkout  cum_checkout
2019-10-01  1        15            15                210       210
2019-11-01  1        27            42                369       579
At first I thought a normal groupby would work, but when I tried to group by only "month" and "cust_id", I got an error saying that closed_deals and checkout also need to be in the group by.
You may simply aggregate by the (first of the) month and cust_id and take the MAX of all other columns. Aggregate functions ignore NULLs, so the single non-null value in each column survives:
SELECT
    month,
    cust_id,
    MAX(closed_deals) AS closed_deals,
    MAX(cum_closed_deals) AS cum_closed_deals,
    MAX(checkout) AS checkout,
    MAX(cum_checkout) AS cum_checkout
FROM yourTable
GROUP BY
    month,
    cust_id;

How to calculate RSI in SQL Server?

Can somebody give me an idea how to calculate the RSI (Relative Strength Index)?
The formula for RSI:
100 - 100 / (1 + RS)
RS (Relative Strength) = AverageGain / AverageLoss
RSI calculation is based on 14 periods. Losses are expressed as positive values, not negative values.
The very first calculations for average gain and average loss are simple 14-period averages:
First Average Gain = Sum of Gains over the past 14 periods / 14
First Average Loss = Sum of Losses over the past 14 periods / 14
The second, and subsequent, calculations are based on the prior averages and the current gain/loss:
Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.
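For instance, with made-up numbers: if the previous Average Gain is 0.48 and the current Gain is 1.20, the new Average Gain is (0.48 x 13 + 1.20) / 14 = 7.44 / 14 ≈ 0.53.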
Below is my query, which is not working because something in my recursion part is wrong. A note on how I calculate ROW_NUMBER: I number rows so that the first row is the most recent trading day. I also calculate over 250 periods for reasons unrelated to this calculation:
WITH bas AS
(
SELECT *
FROM
(SELECT
p.CurrentTicker, p.ClosePrice, p.PriceDate, p.ClosePriceChange,
(CASE WHEN p.ClosePriceChange > 0 THEN p.ClosePriceChange ELSE 0 END) Gain,
(CASE WHEN p.ClosePriceChange < 0 THEN ABS(p.ClosePriceChange) ELSE 0 END) Loss,
rownum
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY CurrentTicker ORDER BY PriceDate DESC) rownum
FROM
PriceHistory
WHERE
PriceDate >= DATEADD(day, -400, GETDATE())) p
INNER JOIN
Ticker t ON p.CurrentTicker = t.CurrentTicker
WHERE
rownum <= 250) a
),
rec AS
(
SELECT
CurrentTicker, PriceDate, ClosePriceChange, rownum,
AvgGain = SUM(Gain) OVER (PARTITION BY CurrentTicker ORDER BY PriceDate ROWS BETWEEN 13 PRECEDING and CURRENT ROW) / 14,
AvgLoss = SUM(Loss) OVER (PARTITION BY CurrentTicker ORDER BY PriceDate ROWS BETWEEN 13 PRECEDING and CURRENT ROW) / 14
FROM
bas
WHERE
rownum = (SELECT MAX(rownum) - 14 FROM bas)
UNION ALL
SELECT
bas.CurrentTicker, bas.PriceDate, bas.ClosePriceChange, bas.rownum,
a.AvgGain, a.AvgLoss
FROM
rec
INNER JOIN
bas ON rec.rownum - 1 = bas.rownum
AND rec.CurrentTicker = bas.CurrentTicker
CROSS APPLY
(SELECT
(rec.AvgGain * 13 + bas.Gain) / 14 AS AvgGain,
(rec.AvgLoss * 13 + bas.Loss)/14 as AvgLoss) a
)
SELECT *
FROM rec
WHERE CurrentTicker = 'AAPL'
ORDER BY PriceDate DESC
OPTION (MAXRECURSION 0)
Here is the AAPL prices table:
PriceDate   ClosePrice
2021-02-23  125.86
2021-02-22  126
2021-02-19  129.87
2021-02-17  130.84
2021-02-16  133.19
2021-02-12  135.37
2021-02-11  135.13
2021-02-10  135.39
2021-02-09  136.01
2021-02-08  136.91
2021-02-05  136.76
2021-02-04  137.39
2021-02-03  133.94
2021-02-02  134.99
2021-02-01  134.14
2021-01-29  131.96
2021-01-28  137.09
2021-01-27  142.06
2021-01-26  143.16
2021-01-25  142.92
2021-01-22  139.07
2021-01-21  136.87
2021-01-20  132.03
2021-01-19  127.83
2021-01-15  127.14
2021-01-14  128.91
2021-01-13  130.89
2021-01-12  128.8
2021-01-11  128.98
2021-01-08  132.05
2021-01-07  130.92
2021-01-06  126.6
2021-01-05  131.01
2021-01-04  129.41
2020-12-31  132.69
2020-12-30  133.72
2020-12-29  134.87
2020-12-28  136.69
2020-12-24  131.97
2020-12-23  130.96
2020-12-22  131.88
2020-12-21  128.23
2020-12-18  126.655
2020-12-17  128.7
2020-12-16  127.81
2020-12-15  127.88
2020-12-14  121.78
2020-12-11  122.41
2020-12-10  123.24
2020-12-09  121.78
Now I hope someone will be able to help me. Thanks in advance.
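For reference, the Wilder smoothing on its own can be isolated in a much smaller recursive sketch. This is a hedged example rather than a fix for the query above: it assumes a simplified, hypothetical prices table for a single ticker, with a gapless rownum where 1 is the oldest period and Gain/Loss precomputed as in the question:
WITH rec AS
(
    -- Anchor: plain 14-period averages, seeded at row 14.
    -- CAST to float so the anchor and recursive member agree on types.
    SELECT rownum,
           AvgGain = CAST((SELECT SUM(Gain) FROM prices WHERE rownum <= 14) / 14.0 AS float),
           AvgLoss = CAST((SELECT SUM(Loss) FROM prices WHERE rownum <= 14) / 14.0 AS float)
    FROM prices
    WHERE rownum = 14
    UNION ALL
    -- Recursive step: Wilder smoothing of the prior averages.
    SELECT p.rownum,
           (rec.AvgGain * 13 + p.Gain) / 14,
           (rec.AvgLoss * 13 + p.Loss) / 14
    FROM rec
    INNER JOIN prices p ON p.rownum = rec.rownum + 1
)
SELECT rownum,
       -- NULLIF guards against division by zero when there were no losses.
       100 - 100 / (1 + AvgGain / NULLIF(AvgLoss, 0)) AS RSI
FROM rec
OPTION (MAXRECURSION 0);
One thing that stands out in the query in the question: the anchor member filters bas down to a single row per ticker (rownum = MAX(rownum) - 14) before the windowed SUMs run, and window functions are evaluated after the WHERE clause, so the seed "averages" there are computed over one row rather than fourteen.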

Postgresql: Average for each day in interval

I have a table that is structured like this:
item_id  first_observed  last_observed  price
1        2016-10-21      2016-10-27     121
1        2016-10-28      2016-10-31     145
2        2016-10-22      2016-10-28     135
2        2016-10-29      2016-10-30     169
What I want is to get the average price for every day. I obviously cannot just group by first_observed or last_observed. Does Postgres offer a smart way of doing this?
The expected output would be like this:
date avg(price)
2016-10-21 121
2016-10-22 128
2016-10-23 128
2016-10-24 128
2016-10-25 128
2016-10-26 128
2016-10-27 128
2016-10-28 140
2016-10-29 157
2016-10-30 157
2016-10-31 157
It could also be output like this (both are fine):
start end avg(price)
2016-10-21 2016-10-21 121
2016-10-22 2016-10-27 128
2016-10-28 2016-10-28 140
2016-10-29 2016-10-31 157
generate_series allows you to expand date ranges:
First step:
SELECT
    generate_series(first_observed, last_observed, interval '1 day')::date as observed,
    AVG(price)::int as avg_price
FROM items
GROUP BY observed
ORDER BY observed
This expands each date range into individual days and groups the days for the AVG aggregate.
Second step:
SELECT
    MIN(observed) as start,
    MAX(observed) as "end",  -- "end" is a reserved word, so it must be quoted
    avg_price
FROM (
    SELECT
        generate_series(first_observed, last_observed, interval '1 day')::date as observed,
        AVG(price)::int as avg_price
    FROM items
    GROUP BY observed
) s
GROUP BY avg_price
ORDER BY start
Grouping by avg_price gives the MIN/MAX date for each price level. Note this relies on each distinct average occurring in only one contiguous date range, as it does here; non-adjacent ranges that happened to share an average would be merged.
Alternatively, in a single query:
WITH ObserveDates (ObserveDate) AS (
    SELECT * FROM generate_series((SELECT MIN(first_observed) FROM T), (SELECT MAX(last_observed) FROM T), '1 days')
)
SELECT ObserveDate, AVG(Price)
FROM ObserveDates
JOIN T ON ObserveDate BETWEEN first_observed AND last_observed
GROUP BY ObserveDate
ORDER BY ObserveDate