I have a table #t with the columns Datecol, Ticker, open, high, low, and close.
create table #t
(
[Datecol] date,
Ticker varchar(10),
[open] decimal (10,2),
[high] decimal (10,2),
[low] decimal (10,2),
[close] decimal(10,2)
)
insert into #t values
('20180215', 'ABC', '122.01', '125.76', '118.79' , '123.29')
,('20180216', 'ABC', '123.02', '130.62', '119.94' , '128.85')
,('20180217', 'ABC', '131.03', '139.80', '129.42' , '136.75')
,('20180218', 'ABC', '136.40', '137.95', '124.32' , '127.38')
,('20180219', 'ABC', '127.24', '138.52', '126.70' , '137.47')
,('20180220', 'ABC', '137.95', '142.01', '127.86' , '128.36')
,('20180215', 'JKL', '9.94', '10.30', '9.77' , '10.17')
,('20180216', 'JKL', '10.15', '10.24', '9.70' , '10.02')
,('20180217', 'JKL', '10.01', '10.18', '9.93' , '10.15')
,('20180218', 'JKL', '10.16', '10.20', '9.23' , '9.38')
,('20180219', 'JKL', '9.37', '9.79', '9.36' , '9.68')
,('20180220', 'JKL', '9.69', '10.01', '9.26' , '9.28')
I'm interested in calculating the daily Average True Range (ATR) for each ticker:
ATR = MAX(today's high, yesterday's close) - MIN(today's low, yesterday's close)
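For example, for ABC on 2018-02-16 the value is MAX(130.62, 123.29) - MIN(119.94, 123.29) = 130.62 - 119.94 = 10.68.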
Using the LAG function, I can get yesterday's close:
SELECT
*,
LAG([close], 1) OVER (PARTITION BY Ticker ORDER BY [Datecol]) AS yest_close
FROM
#t t
Datecol Ticker open high low close yest_close
--------------------------------------------------------------
2018-02-15 ABC 122.01 125.76 118.79 123.29 NULL
2018-02-16 ABC 123.02 130.62 119.94 128.85 123.29
2018-02-17 ABC 131.03 139.80 129.42 136.75 128.85
2018-02-18 ABC 136.40 137.95 124.32 127.38 136.75
2018-02-19 ABC 127.24 138.52 126.70 137.47 127.38
2018-02-20 ABC 137.95 142.01 127.86 128.36 137.47
2018-02-15 JKL 9.94 10.30 9.77 10.17 NULL
2018-02-16 JKL 10.15 10.24 9.70 10.02 10.17
2018-02-17 JKL 10.01 10.18 9.93 10.15 10.02
2018-02-18 JKL 10.16 10.20 9.23 9.38 10.15
2018-02-19 JKL 9.37 9.79 9.36 9.68 9.38
2018-02-20 JKL 9.69 10.01 9.26 9.28 9.68
How do I get MAX(today's high, yesterday's close)?
You can use CASE (or IIF, available since SQL Server 2012) to find the max or min of two values. Here's a sample:
select
*, ATR = iif([high] > yest_close, [high], yest_close) - iif([low] > yest_close, yest_close, [low])
from (
select
*, yest_close = lag([close]) over (partition by Ticker order by [Datecol])
from #t
) t
Output:
Datecol Ticker open high low close yest_close ATR
------------------------------------------------------------------------
2018-02-15 ABC 122.01 125.76 118.79 123.29 NULL NULL
2018-02-16 ABC 123.02 130.62 119.94 128.85 123.29 10.68
2018-02-17 ABC 131.03 139.80 129.42 136.75 128.85 10.95
2018-02-18 ABC 136.40 137.95 124.32 127.38 136.75 13.63
2018-02-19 ABC 127.24 138.52 126.70 137.47 127.38 11.82
2018-02-20 ABC 137.95 142.01 127.86 128.36 137.47 14.15
2018-02-15 JKL 9.94 10.30 9.77 10.17 NULL NULL
2018-02-16 JKL 10.15 10.24 9.70 10.02 10.17 0.54
2018-02-17 JKL 10.01 10.18 9.93 10.15 10.02 0.25
2018-02-18 JKL 10.16 10.20 9.23 9.38 10.15 0.97
2018-02-19 JKL 9.37 9.79 9.36 9.68 9.38 0.43
2018-02-20 JKL 9.69 10.01 9.26 9.28 9.68 0.75
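As an aside, if you are on SQL Server 2022+ (or Azure SQL Database), the built-in GREATEST and LEAST functions express the same thing more directly. A sketch of the equivalent query:
select
    *, ATR = greatest([high], yest_close) - least([low], yest_close)
from (
    select
        *, yest_close = lag([close]) over (partition by Ticker order by [Datecol])
    from #t
) t
-- caveat: GREATEST/LEAST ignore NULL inputs, so the first row per ticker
-- should come back as [high] - [low] rather than NULL as in the IIF version.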
I'm trying to fill nulls in a table with data from the most recent record within a 10-day lookback window in Snowflake SQL.
CREATE TABLE activities
(
activity_id NUMBER,
activity_datetime DATE,
offer_id VARCHAR,
member_id NUMBER
);
INSERT INTO activities (activity_id, activity_datetime, offer_id, member_id)
VALUES (1, '2022-10-01', '1111', 10001)
, (2, '2022-10-05', '5555', 10001)
, (3, '2022-10-09', NULL, 10001)
, (4, '2022-10-09', NULL, 10001)
, (5, '2022-10-17', NULL, 10001)
, (6, '2022-10-17', NULL, 10001)
, (7, '2022-10-19', '18887', 10001)
, (8, '2022-10-21', '23331', 10001)
, (9, '2022-10-25', '27775', 10001)
, (10, '2022-10-29', '32219', 10001)
, (11, '2022-10-01', '1111', 20001)
, (12, '2022-10-05', '5555', 20001)
, (13, '2022-10-09', NULL, 20001)
, (14, '2022-10-09', NULL, 20001)
, (15, '2022-10-17', NULL, 20001)
, (16, '2022-10-17', NULL, 20001)
, (17, '2022-10-19', '18887', 20001)
, (18, '2022-10-21', '23331', 20001)
, (19, '2022-10-25', '27775', 20001)
, (20, '2022-10-29', '32219', 20001);
TABLE:
ACTIVITY_ID  ACTIVITY_DATETIME  OFFER_ID  MEMBER_ID
---------------------------------------------------
1            2022-10-01         1111      10001
2            2022-10-05         5555      10001
3            2022-10-09         null      10001
4            2022-10-09         null      10001
5            2022-10-17         null      10001
6            2022-10-17         null      10001
7            2022-10-19         18887     10001
8            2022-10-21         23331     10001
9            2022-10-25         27775     10001
10           2022-10-29         32219     10001
11           2022-10-01         1111      20001
12           2022-10-05         5555      20001
13           2022-10-09         null      20001
14           2022-10-09         null      20001
15           2022-10-17         null      20001
16           2022-10-17         null      20001
17           2022-10-19         18887     20001
18           2022-10-21         23331     20001
19           2022-10-25         27775     20001
20           2022-10-29         32219     20001
DESIRED RESULT:
ACTIVITY_ID  ACTIVITY_DATETIME  OFFER_ID  MEMBER_ID
---------------------------------------------------
1            2022-10-01         1111      10001
2            2022-10-05         5555      10001
3            2022-10-09         5555      10001
4            2022-10-09         5555      10001
5            2022-10-17         null      10001
6            2022-10-17         null      10001
7            2022-10-19         18887     10001
8            2022-10-21         23331     10001
9            2022-10-25         27775     10001
10           2022-10-29         32219     10001
11           2022-10-01         1111      20001
12           2022-10-05         5555      20001
13           2022-10-09         5555      20001
14           2022-10-09         5555      20001
15           2022-10-17         null      20001
16           2022-10-17         null      20001
17           2022-10-19         18887     20001
18           2022-10-21         23331     20001
19           2022-10-25         27775     20001
20           2022-10-29         32219     20001
The query below seems close, but I cannot figure out how to produce the desired results efficiently. It isn't great because it returns a row for every candidate match for each null row, instead of only the most recent record within the 10-day lookback window.
WITH activity_nulls AS (SELECT *
FROM activities
WHERE offer_id IS NULL)
, activity_non_null AS (SELECT *
FROM activities
WHERE offer_id IS NOT NULL)
SELECT activity_nulls.activity_id activity_id_nulls
, activity_nulls.activity_datetime dt_nulls
, activity_non_null.offer_id
, activity_non_null.activity_datetime
FROM activity_nulls
INNER JOIN activity_non_null
ON activity_non_null.member_id = activity_nulls.member_id
WHERE activity_non_null.activity_datetime BETWEEN DATEADD(DAY, -14, activity_nulls.activity_datetime)
AND activity_nulls.activity_datetime;
RESULT:
ACTIVITY_ID_NULLS  DT_NULLS    OFFER_ID  ACTIVITY_DATETIME
----------------------------------------------------------
3                  2022-10-09  1111      2022-10-01
3                  2022-10-09  5555      2022-10-05
4                  2022-10-09  1111      2022-10-01
4                  2022-10-09  5555      2022-10-05
5                  2022-10-17  5555      2022-10-05
6                  2022-10-17  5555      2022-10-05
13                 2022-10-09  5555      2022-10-05
13                 2022-10-09  1111      2022-10-01
14                 2022-10-09  5555      2022-10-05
14                 2022-10-09  1111      2022-10-01
15                 2022-10-17  5555      2022-10-05
16                 2022-10-17  5555      2022-10-05
The query should only produce a single row with the '5555' offer_id for each null row, since that is the most recent record within the 10-day lookback window.
Joins alone are probably not going to get you there; this is a problem that calls out for window functions. The following CTE works in two parts. Part 1: lag() back to the last non-null offer ID (IGNORE NULLS skips over the null rows in between). Part 2: figure out whether that lagged non-null value is within 10 days; CONDITIONAL_TRUE_EVENT increments its counter each time it passes a non-null offer, so every DATE_GROUP starts on the row holding the offer being carried forward.
with FILLED_DATES as
(
select activity_id
,activity_datetime
,offer_id
,lag(offer_id, 1, offer_id) ignore nulls over (partition by member_id order by activity_id) as LAGGED_OFFER_ID
,conditional_true_event(OFFER_ID is not null) over (partition by member_id order by activity_id) as DATE_GROUP
,member_id
from ACTIVITIES
)
select activity_id
,activity_datetime
,case
when offer_id is null and
activity_datetime - min(activity_datetime) over (partition by MEMBER_ID, DATE_GROUP order by activity_id) <= 10
then lagged_offer_id
else offer_id
end as offer_id
,MEMBER_ID
from FILLED_DATES order by ACTIVITY_ID
;
ACTIVITY_ID  ACTIVITY_DATETIME    OFFER_ID  MEMBER_ID
-----------------------------------------------------
1            2022-10-01 00:00:00  1111      10001
2            2022-10-05 00:00:00  5555      10001
3            2022-10-09 00:00:00  5555      10001
4            2022-10-09 00:00:00  5555      10001
5            2022-10-17 00:00:00  null      10001
6            2022-10-17 00:00:00  null      10001
7            2022-10-19 00:00:00  18887     10001
8            2022-10-21 00:00:00  23331     10001
9            2022-10-25 00:00:00  27775     10001
10           2022-10-29 00:00:00  32219     10001
11           2022-10-01 00:00:00  1111      20001
12           2022-10-05 00:00:00  5555      20001
13           2022-10-09 00:00:00  5555      20001
14           2022-10-09 00:00:00  5555      20001
15           2022-10-17 00:00:00  null      20001
16           2022-10-17 00:00:00  null      20001
17           2022-10-19 00:00:00  18887     20001
18           2022-10-21 00:00:00  23331     20001
19           2022-10-25 00:00:00  27775     20001
20           2022-10-29 00:00:00  32219     20001
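For comparison, here is an alternative sketch of the same fill (assuming Snowflake's LAG ... IGNORE NULLS and DATEDIFF): carry forward both the last non-null offer_id and the date it occurred on, then apply the 10-day cutoff directly.
select activity_id
      ,activity_datetime
      ,case
           when offer_id is not null then offer_id
           when datediff('day', last_offer_date, activity_datetime) <= 10 then last_offer_id
       end as offer_id
      ,member_id
from (
    select a.*
          -- last non-null offer seen so far for this member
          ,lag(offer_id) ignore nulls
               over (partition by member_id order by activity_datetime, activity_id) as last_offer_id
          -- date on which that offer occurred
          ,lag(case when offer_id is not null then activity_datetime end) ignore nulls
               over (partition by member_id order by activity_datetime, activity_id) as last_offer_date
    from activities a
) t
order by activity_id;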
I want to sum using a WHERE clause per mode, and I don't know how to show off_duration in the next column.
SELECT CAST([RecordDateTime] AS DATE) AS DATE
,SUM(CAST([units] AS FLOAT)) AS Units
,sum(cast(duration AS INT) / 60) on_duration
FROM [Energies]
WHERE duration_mode = 'ON'
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) DESC
Output:
Date        units   on_duration
-------------------------------
2020-01-17 3.53 758
2020-01-16 7.66 973
2020-01-15 15.12 1806
2020-01-13 10.4 500
Expected output:
date units on_duration off_duration
-----------------------------------------------
2020-01-17 3.53 758 28
2020-01-16 7.66 973 9
2020-01-15 15.12 1806 96
2020-01-13 10.4 500 95
Sample data:
duration_mode duration RecordDateTime units
-------------------------------------------------------------
ON 187 2020-01-07 20:18:33.9744232 0.19
ON 187 2020-01-07 20:19:03.1554359 0.19
OFF 10 2020-01-07 20:22:13.5283932 0.00
ON 187 2020-01-07 20:24:39.0510166 0.19
I think that you are looking for conditional aggregation:
SELECT
CAST([RecordDateTime] AS DATE) as Date,
SUM(CAST([units] as float)) as Units,
SUM(CASE WHEN duration_mode = 'ON' THEN CAST(duration as int)/60 END) on_duration,
SUM(CASE WHEN duration_mode = 'OFF' THEN CAST(duration as int)/60 END) off_duration
FROM [Energies]
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) desc
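If you would rather show 0 than NULL on a date that has no rows for one of the modes, wrap the conditional sums in COALESCE. A variant sketch:
SELECT
    CAST([RecordDateTime] AS DATE) as Date,
    SUM(CAST([units] as float)) as Units,
    COALESCE(SUM(CASE WHEN duration_mode = 'ON' THEN CAST(duration as int)/60 END), 0) as on_duration,
    COALESCE(SUM(CASE WHEN duration_mode = 'OFF' THEN CAST(duration as int)/60 END), 0) as off_duration
FROM [Energies]
GROUP BY CAST([RecordDateTime] AS DATE)
ORDER BY CAST([RecordDateTime] AS DATE) desc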
I have start and end date columns, and in some rows the start date equals the previous row's end date with no gap between them. Starting from the row whose END_DTE is NULL, I want to "zig-zag" upward, following rows while each START_DTE matches the previous row's END_DTE, and stop once they no longer connect.
I've tried CTEs and ROW_NUMBER() OVER().
START_DTE END_DTE
2018-01-17 2018-01-19
2018-01-26 2018-02-22
2018-02-22 2018-08-24
2018-08-24 2018-09-24
2018-09-24 NULL
Expected:
START_DTE END_DTE
2018-01-26 2018-09-24
EDIT
Using a proposed solution, with an added CTE to ensure the dates don't carry time components:
WITH
CTE_TABLE_NAME AS
(
SELECT
ID_NUM,
CONVERT(DATE,START_DTE) START_DTE,
CONVERT(DATE,END_DTE) END_DTE
FROM
TABLE_NAME
WHERE ID_NUM = 123
)
select min(start_dte) as start_dte, max(end_dte) as end_dte, grp
from (select t.*,
sum(case when prev_end_dte = end_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from CTE_TABLE_NAME t
) t
) t
group by grp;
The query above provides these results:
start_dte end_dte grp
2014-08-24 2014-12-19 1
2014-08-31 2014-09-02 2
2014-09-02 2014-09-18 3
2014-09-18 2014-11-03 4
2014-11-18 2014-12-09 5
2014-12-09 2015-01-16 6
2015-01-30 2015-02-02 7
2015-02-02 2015-05-15 8
2015-05-15 2015-07-08 9
2015-07-08 2015-07-09 10
2015-07-09 2015-08-25 11
2015-08-31 2015-09-01 12
2015-10-06 2015-10-29 13
2015-11-10 2015-12-11 14
2015-12-11 2015-12-15 15
2015-12-15 2016-01-20 16
2016-01-29 2016-02-01 17
2016-02-01 2016-03-03 18
2016-03-30 2016-08-29 19
2016-08-30 2016-12-06 20
2017-01-27 2017-02-20 21
2017-02-20 2017-08-15 22
2017-08-15 2017-08-29 23
2017-08-29 2018-01-17 24
2018-01-17 2018-01-19 25
2018-01-26 2018-02-22 26
2018-02-22 2018-08-24 27
2018-08-24 2018-09-24 28
2018-09-24 NULL 29
I tried using having count(*) > 1 as suggested, but it provided no results. (The grp expression above compares prev_end_dte to end_dte rather than start_dte, so every row lands in its own group, which explains the empty result.)
Expected output:
START_DTE END_DTE
2017-01-27 2018-01-17
2018-01-26 2018-09-24
You can identify where groups of connected rows start by looking for where adjacent rows are not connected. A cumulative sum of these starts then gives you the groups.
select min(start_dte) as start_dte, max(end_dte) as end_dte
from (select t.*,
sum(case when prev_end_dte = start_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from t
) t
) t
group by grp;
If you want only multiply connected rows (as implied by your question), then add having count(*) > 1 to the outer query.
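Applied to the query above, the filtered version looks like this:
select min(start_dte) as start_dte, max(end_dte) as end_dte
from (select t.*,
             sum(case when prev_end_dte = start_dte then 0 else 1 end) over (order by start_dte) as grp
      from (select t.*,
                   lag(end_dte) over (order by start_dte) as prev_end_dte
            from t
           ) t
     ) t
group by grp
having count(*) > 1;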
Here is a db<>fiddle.
I have a table of stock prices:
CREATE TABLE #table (ClosingDate DATE, Ticker VARCHAR(6), Price DECIMAL(6,2))
INSERT INTO #Table
VALUES ('1/1/13' , 'ABC' , '100.00')
,('1/2/13' , 'ABC' , '101.50')
,('1/3/13' , 'ABC' , '99.80')
,('1/4/13' , 'ABC' , '95.50')
,('1/5/13' , 'ABC' , '78.00')
,('1/1/13' , 'JKL' , '34.57')
,('1/2/13' , 'JKL' , '33.99')
,('1/3/13' , 'JKL' , '31.85')
,('1/4/13' , 'JKL' , '30.11')
,('1/5/13' , 'JKL' , '45.00')
,('1/1/13' , 'XYZ' , '11.50')
,('1/2/13' , 'XYZ' , '12.10')
,('1/3/13' , 'XYZ' , '17.15')
,('1/4/13' , 'XYZ' , '14.10')
,('1/5/13' , 'XYZ' , '15.55')
I calculate drawdowns (the % decline from the running maximum price) for each ticker:
SELECT Ticker,
t.ClosingDate,
t.Price,
MAX(t.[Price]) OVER (PARTITION BY Ticker ORDER BY ClosingDate) AS max_price,
(t.[Price] / MAX(t.[Price]) OVER (PARTITION BY Ticker ORDER BY ClosingDate)) - 1 AS Drawdown
FROM
#Table t;
Output:
Ticker ClosingDate Price max_price Drawdown
-----------------------------------------------------
ABC 2013-01-01 100.00 100.00 0.000000000
ABC 2013-01-02 101.50 101.50 0.000000000
ABC 2013-01-03 99.80 101.50 -0.016748769
ABC 2013-01-04 95.50 101.50 -0.059113301
ABC 2013-01-05 78.00 101.50 -0.231527094
JKL 2013-01-01 34.57 34.57 0.000000000
JKL 2013-01-02 33.99 34.57 -0.016777553
JKL 2013-01-03 31.85 34.57 -0.078680938
JKL 2013-01-04 30.11 34.57 -0.129013596
JKL 2013-01-05 45.00 45.00 0.000000000
XYZ 2013-01-01 11.50 11.50 0.000000000
XYZ 2013-01-02 12.10 12.10 0.000000000
XYZ 2013-01-03 17.15 17.15 0.000000000
XYZ 2013-01-04 14.10 17.15 -0.177842566
XYZ 2013-01-05 15.55 17.15 -0.093294461
A new high price is designated as a drawdown of 0.
How can I add a days-in-drawdown column? Any date where drawdown = 0 resets the day count to 0, and the count then builds for each day the ticker remains in drawdown (price < max_price). (One possible approach is sketched after the expected output below.)
Here is my expected output:
Ticker ClosingDate Price max_price Drawdown Days in DD
--------------------------------------------------------------------
ABC 1/1/2013 100.00 100.00 0.0000 0
ABC 1/2/2013 101.50 101.50 0.0000 0
ABC 1/3/2013 99.80 101.50 -0.0167 1
ABC 1/4/2013 95.50 101.50 -0.0591 2
ABC 1/5/2013 78.00 101.50 -0.2315 3
JKL 1/1/2013 34.57 34.57 0.0000 0
JKL 1/2/2013 33.99 34.57 -0.0168 1
JKL 1/3/2013 31.85 34.57 -0.0787 2
JKL 1/4/2013 30.11 34.57 -0.1290 3
JKL 1/5/2013 45.00 45.00 0.0000 0
XYZ 1/1/2013 11.50 11.50 0.0000 0
XYZ 1/2/2013 12.10 12.10 0.0000 0
XYZ 1/3/2013 17.15 17.15 0.0000 0
XYZ 1/4/2013 14.10 17.15 -0.1778 1
XYZ 1/5/2013 15.55 17.15 -0.0933 2
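A sketch of one possible approach (an untested outline, not an authoritative solution): treat every zero-drawdown row as the start of a new streak, using the same cumulative-sum grouping trick as in the date-range question above, then number the rows within each streak:
WITH peaks AS (
    SELECT Ticker, ClosingDate, Price,
           MAX(Price) OVER (PARTITION BY Ticker ORDER BY ClosingDate) AS max_price
    FROM #table
), streaks AS (
    SELECT *,
           -- each new-high (drawdown = 0) row bumps the streak counter, so the
           -- drawdown days that follow it share its group number
           SUM(CASE WHEN Price = max_price THEN 1 ELSE 0 END)
               OVER (PARTITION BY Ticker ORDER BY ClosingDate) AS grp
    FROM peaks
)
SELECT Ticker, ClosingDate, Price, max_price,
       (Price / max_price) - 1 AS Drawdown,
       -- each streak starts with its new-high row, so drawdown rows are
       -- numbered 2, 3, ... within the streak; subtracting 1 gives the days
       CASE WHEN Price < max_price
            THEN ROW_NUMBER() OVER (PARTITION BY Ticker, grp ORDER BY ClosingDate) - 1
            ELSE 0
       END AS [Days in DD]
FROM streaks
ORDER BY Ticker, ClosingDate;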
I am creating a DataFrame from a csv file, where my index (rows) is date and my column names are names of cities.
After I create the raw DataFrame, I am trying to create a DataFrame from selected columns. I have tried:
A = df['city1']  # city 1
B = df['city2']
C = pd.merge(A, B)
but it doesn't work. This is what A looks like (B has the same shape):
Date
2013-11-01 2.56
2013-12-01 1.77
2014-01-01 0.00
2014-02-01 0.38
2014-03-01 13.16
2014-04-01 10.29
2014-05-01 15.43
2014-06-01 11.48
2014-07-01 8.54
2014-08-01 11.11
2014-09-01 2.71
2014-10-01 4.16
2014-11-01 13.01
2014-12-01 9.59
Name: Seattle.Washington, dtype: float64
And this is what I am looking to create:
City1 City2
Date
2013-11-01 0.00 2.94
2013-12-01 8.26 3.41
2014-01-01 1.11 14.27
2014-02-01 32.86 84.26
2014-03-01 34.12 0.00
2014-04-01 68.39 0.00
2014-05-01 27.17 9.09
2014-06-01 10.47 32.00
2014-07-01 14.19 26.83
2014-08-01 14.91 6.36
2014-09-01 3.76 8.32
2014-10-01 5.83 2.19
2014-11-01 10.79 2.64
2014-12-01 21.24 8.08
Any suggestions?
Error Message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-222-ec50ff9f372f> in <module>()
14 S = df['City1']
15 A = df['City2']
16
---> 17 print merge(S,A)
18 #df2=pd.merge(A,A)
19 #print df2
C:\...\merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
36 right_on=right_on, left_index=left_index,
37 right_index=right_index, sort=sort, suffixes=suffixes,
---> 38 copy=copy)
39 return op.get_result()
40 if __debug__:
Answer (courtesy of @EdChum): pass a list of column names, which returns a DataFrame:
df[['City1', 'City2']]
pd.merge(A, B) fails here because A and B are Series, and that version of merge expects DataFrame inputs (plus join keys). Since the two Series share the same Date index, pd.concat([A, B], axis=1) would also assemble the two-column frame.