How to transform ticks into minute bars in SQL

I have market data stored in a table in the following format:
Timestamp Price Quantity Condition
01/11/2016 09:03:57 14.34 1 S
01/11/2016 09:03:58 14.31 5
01/11/2016 09:03:59 14.34 1 S
01/11/2016 09:03:59 14.35 2
etc.
I want to group this into bars of one minute length, looking something like this:
BarEndTime Open High Low Close
01/11/2016 09:03 14.15 14.16 14.13 14.15
01/12/2016 09:04 14.17 14.19 14.17 14.18
How do I group this data into one minute clusters based on the timestamp of the base data set? I do this fairly easily in R, but for a number of reasons I'd like to build these in SQL as well.

I have no knowledge of R, so I can only guess what "buckets" and "cluster" mean there. But if you are interested in the opening, minimum, maximum and closing values of Price for each one-minute interval, then the following might be helpful:
;WITH cte AS (
SELECT CONVERT(char(16),Timestamp,126) ts, MIN(Price) p0, MAX(Price) p1,
MIN(Timestamp) t0, MAX(Timestamp) t1
FROM #tbl GROUP BY CONVERT(char(16),Timestamp,126)
)
SELECT ts,(SELECT min(Price) FROM #tbl WHERE Timestamp=t0) po,
p0,p1,
(SELECT max(Price) FROM #tbl WHERE Timestamp=t1) pc
FROM cte
Input:
Timestamp Price Qty Cnd
01/11/2016 09:03:57 14.34 1 S
01/11/2016 09:03:58 14.31 5
01/11/2016 09:03:59 14.34 1 S
01/11/2016 09:03:59 14.35 2
01/11/2016 09:04:37 11.84 1 S
01/11/2016 09:04:48 12.36 5
01/11/2016 09:04:49 14.54 1 S
01/11/2016 09:04:59 13.35 2
Output:
ts po p0 p1 pc
2016-01-11T09:03 14.34 14.31 14.35 14.35
2016-01-11T09:04 11.84 11.84 14.54 13.35
Since, according to the sample data, there can be more than one Price for a given Timestamp, I had to wrap the opening and closing price subqueries, e.g. (SELECT min(Price) FROM #tbl WHERE Timestamp=t0), in a min()/max() aggregate function. Maybe you can find a better way to limit these subqueries to a single value.
In my solution I used a common table expression (CTE), which is not available in some database systems, such as MySQL before version 8.0. If you are using an RDBMS without CTE support, you can easily rewrite the above with a simple subquery, since the CTE is only referenced once anyway:
SELECT ts,(SELECT min(Price) FROM #tbl WHERE Timestamp=t0) po,p0,p1,
(SELECT max(Price) FROM #tbl WHERE Timestamp=t1) pc
FROM
(SELECT CONVERT(char(16),Timestamp,126) ts, MIN(Price) p0, MAX(Price) p1,
MIN(Timestamp) t0, MAX(Timestamp) t1
FROM #tbl GROUP BY CONVERT(char(16),Timestamp,126)) subq
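To try the same idea without a SQL Server instance, here is a minimal sketch that runs the subquery variant in SQLite through Python's sqlite3 module. It assumes the timestamps are stored as ISO-8601 text, so substr(ts, 1, 16) plays the role of CONVERT(char(16), Timestamp, 126); the table name ticks and the column names are made up for the illustration.

```python
import sqlite3

# Sample ticks from the input table above, stored as ISO-8601 text so that
# string order equals chronological order.
ticks = [
    ("2016-01-11 09:03:57", 14.34, 1, "S"),
    ("2016-01-11 09:03:58", 14.31, 5, None),
    ("2016-01-11 09:03:59", 14.34, 1, "S"),
    ("2016-01-11 09:03:59", 14.35, 2, None),
    ("2016-01-11 09:04:37", 11.84, 1, "S"),
    ("2016-01-11 09:04:48", 12.36, 5, None),
    ("2016-01-11 09:04:49", 14.54, 1, "S"),
    ("2016-01-11 09:04:59", 13.35, 2, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts TEXT, price REAL, qty INTEGER, cond TEXT)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", ticks)

query = """
SELECT minute,
       -- open: lowest price on the first timestamp of the minute
       (SELECT MIN(price) FROM ticks WHERE ts = t0) AS open,
       high,
       low,
       -- close: highest price on the last timestamp of the minute
       (SELECT MAX(price) FROM ticks WHERE ts = t1) AS close
FROM (
    SELECT substr(ts, 1, 16) AS minute,   -- 'YYYY-MM-DD HH:MM'
           MIN(price) AS low, MAX(price) AS high,
           MIN(ts) AS t0, MAX(ts) AS t1
    FROM ticks
    GROUP BY substr(ts, 1, 16)
) AS bars
ORDER BY minute
"""
bars = conn.execute(query).fetchall()
for bar in bars:
    print(bar)
```

Each row comes back as (minute, open, high, low, close), so the column order differs from the po/p0/p1/pc layout above.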

If you are on Oracle:
Taking open as the value occurring first in a minute (the lowest value if there is more than one on the first timestamp) and close as the value occurring last in a minute (the highest value if several share the last timestamp), analytics become your friend.
with dat as(
SELECT to_Date('01/11/2016 09:03:57','dd/mm/yyyy hh24:mi:ss') ts, 14.34 val, 1 qty, 'S' cond from dual union all
SELECT to_Date('01/11/2016 09:03:58','dd/mm/yyyy hh24:mi:ss') ts, 14.31 val, 5 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:03:59','dd/mm/yyyy hh24:mi:ss') ts, 14.34 val, 1 qty, 'S' cond from dual union all
SELECT to_Date('01/11/2016 09:03:59','dd/mm/yyyy hh24:mi:ss') ts, 14.35 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:03:51','dd/mm/yyyy hh24:mi:ss') ts, 14.35 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:09','dd/mm/yyyy hh24:mi:ss') ts, 14.45 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:19','dd/mm/yyyy hh24:mi:ss') ts, 14.15 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:29','dd/mm/yyyy hh24:mi:ss') ts, 14.55 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:39','dd/mm/yyyy hh24:mi:ss') ts, 14.85 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:49','dd/mm/yyyy hh24:mi:ss') ts, 14.45 val, 2 qty, null cond from dual union all
SELECT to_Date('01/11/2016 09:04:59','dd/mm/yyyy hh24:mi:ss') ts, 14.25 val, 2 qty, null cond from dual )
select trunc(ts,'mi') as ts_minute,
min (val) keep (dense_rank first order by ts) as open_val,
max (val) keep (dense_rank last order by ts) as close_val,
min (val) min_val,
max(val) max_val
from dat
group by trunc(ts,'mi') ;
TS_MINUTE, OPEN_VAL, CLOSE_VAL, MIN_VAL, MAX_VAL
01/11/2016 9:03:00 AM, 14.35, 14.35, 14.31, 14.35
01/11/2016 9:04:00 AM, 14.45, 14.25, 14.15, 14.85
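KEEP (DENSE_RANK FIRST/LAST) is Oracle syntax. As a rough, non-authoritative sketch of the same tie-breaking rule elsewhere, the block below uses the FIRST_VALUE window function in SQLite (window functions need SQLite 3.25 or later), driven from Python. The table name dat and the ISO-8601 text timestamps are assumptions made for the illustration; the values are the Oracle sample rows.

```python
import sqlite3

# Same sample rows as the Oracle example (dd/mm/yyyy read as 1 Nov 2016).
sample = [
    ("2016-11-01 09:03:57", 14.34), ("2016-11-01 09:03:58", 14.31),
    ("2016-11-01 09:03:59", 14.34), ("2016-11-01 09:03:59", 14.35),
    ("2016-11-01 09:03:51", 14.35), ("2016-11-01 09:04:09", 14.45),
    ("2016-11-01 09:04:19", 14.15), ("2016-11-01 09:04:29", 14.55),
    ("2016-11-01 09:04:39", 14.85), ("2016-11-01 09:04:49", 14.45),
    ("2016-11-01 09:04:59", 14.25),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dat (ts TEXT, val REAL)")
conn.executemany("INSERT INTO dat VALUES (?, ?)", sample)

query = """
WITH dat_min AS (
    SELECT substr(ts, 1, 16) AS ts_minute, ts, val FROM dat
)
SELECT DISTINCT
       ts_minute,
       -- earliest timestamp first; lowest value wins a tie
       FIRST_VALUE(val) OVER (PARTITION BY ts_minute
                              ORDER BY ts, val) AS open_val,
       -- latest timestamp first; highest value wins a tie
       FIRST_VALUE(val) OVER (PARTITION BY ts_minute
                              ORDER BY ts DESC, val DESC) AS close_val,
       MIN(val) OVER (PARTITION BY ts_minute) AS min_val,
       MAX(val) OVER (PARTITION BY ts_minute) AS max_val
FROM dat_min
ORDER BY ts_minute
"""
minute_bars = conn.execute(query).fetchall()
for row in minute_bars:
    print(row)
```

Because window functions return one row per input row, DISTINCT collapses the result to one row per minute.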

Related

LAG with condition

I want to get a value from the previous row that matches a certain condition.
For example, for each row I want to get the timestamp of the most recent row where event = 1.
I feel I can do it without joins with LAG and PARTITION BY with CASE but I am not able to crack it.
Please help.
Here is one approach using analytic functions:
WITH cte AS (
SELECT *, COUNT(CASE WHEN event = 1 THEN 1 END) OVER
(PARTITION BY customer_id ORDER BY ts) cnt
FROM yourTable
)
SELECT ts, customer_id, event,
MAX(CASE WHEN event = 1 THEN ts END) OVER
(PARTITION BY customer_id, cnt) AS desired_result
FROM cte
ORDER BY customer_id, ts;
We can articulate your problem by saying that you want the desired_result column to contain the most recent timestamp at which the event was 1. The count (cnt) in the CTE above starts a new pseudo group of records each time the event is 1. Then we simply do a conditional aggregation over customer and pseudo group to find the timestamp value.
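The pseudo-group idea can be checked on a small data set. The sketch below runs the same query shape in SQLite (3.25 or later) from Python; the table name events and all rows are hypothetical, chosen so that one customer starts with a row that has no earlier event = 1.

```python
import sqlite3

# Hypothetical rows: (ts, customer_id, event)
events = [
    ("09:00", 111, 2), ("09:01", 111, 1), ("09:02", 111, 2),
    ("09:03", 111, 3), ("09:04", 111, 1), ("09:05", 111, 2),
    ("09:00", 222, 1), ("09:01", 222, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, customer_id INTEGER, event INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

query = """
WITH cte AS (
    -- cnt grows by one at every event = 1, so all rows up to the next
    -- event = 1 share the same (customer_id, cnt) pseudo group
    SELECT *, COUNT(CASE WHEN event = 1 THEN 1 END) OVER
                (PARTITION BY customer_id ORDER BY ts) AS cnt
    FROM events
)
SELECT ts, customer_id, event,
       MAX(CASE WHEN event = 1 THEN ts END) OVER
           (PARTITION BY customer_id, cnt) AS desired_result
FROM cte
ORDER BY customer_id, ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The first row of customer 111 falls into group 0, which contains no event = 1, so its desired_result is NULL (None in Python).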
One more approach with "one query":
with data as
(
select sysdate - 0.29 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.28 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.27 ts, 111 customer_id, 3 event from dual union all
select sysdate - 0.26 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.25 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.24 ts, 111 customer_id, 2 event from dual union all
select sysdate - 0.23 ts, 111 customer_id, 1 event from dual union all
select sysdate - 0.22 ts, 111 customer_id, 1 event from dual
)
select
ts, event,
last_value(case when event=1 then ts end) ignore nulls
over (partition by customer_id order by ts) desired_result,
max(case when event=1 then ts end)
over (partition by customer_id order by ts) desired_result_2
from data
order by ts
Edit: As suggested by MatBailie the max(case...) works as well and is a more general approach. The "last_value ... ignore nulls" is Oracle specific.

Finding the most recent thing prior to a specific event

I'm doing some timestamp problem solving but am stuck with some join logic.
I have a table of data like so:
id, event_time, event_type, location
1001, 2018-06-04 18:23:48.526895 UTC, I, d
1001, 2018-06-04 19:26:44.359296 UTC, I, h
1001, 2018-06-05 06:07:03.658263 UTC, I, w
1001, 2018-06-07 00:47:44.651841 UTC, I, d
1001, 2018-06-07 00:48:17.857729 UTC, C, d
1001, 2018-06-08 00:04:53.086240 UTC, I, a
1001, 2018-06-12 21:23:03.071829 UTC, I, d
...
And I'm trying to find the timestamp difference between when a user has an event_type of C and the most recent event type of I up to event_type C for a given location value.
Ultimately the schema I'm after is:
id, location, timestamp_diff
1001, d, 33
1001, z, 21
1002, a, 55
...
I tried the following, which works for only one id value but doesn't seem to work for multiple ids. I might be over-complicating the issue, but I wasn't sure. On one id it gives about 5 rows, which is right. However, when I open it up to two ids, I get upwards of 200 rows when I should get something like 7 (5 for the first id and 2 for the second):
with c as (
select
id
,event_time as c_time
,location
from data
where event_type = 'C'
and id = '1001'
)
,i as (
select
id
,event_time as i_time
,location
from data
where event_type = 'I'
)
,check1 as (
select
c.*
,i.i_time
from c
left join i on (c.id = i.id and c.location = i.location)
group by 1,2,3,4
having i_time <= c_time
)
,check2 as (
select
id
,c_time
,location
,max(i_time) as i_time
from check1
group by 1,2,3
)
select
id
,location
,timestamp_diff(c_time, i_time, second) as timestamp_diff
from check2
#standardSQL
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
This version addresses some edge cases, for example consecutive 'C' events with a "missing" 'I' event between them, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1001 id, TIMESTAMP '2018-06-04 18:23:48.526895 UTC' event_time, 'I' event_type, 'd' location UNION ALL
SELECT 1001, '2018-06-04 19:26:44.359296 UTC', 'I', 'h' UNION ALL
SELECT 1001, '2018-06-05 06:07:03.658263 UTC', 'I', 'w' UNION ALL
SELECT 1001, '2018-06-07 00:47:44.651841 UTC', 'I', 'd' UNION ALL
SELECT 1001, '2018-06-07 00:48:17.857729 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-08 00:04:53.086240 UTC', 'C', 'd' UNION ALL
SELECT 1001, '2018-06-12 21:23:03.071829 UTC', 'I', 'd'
)
SELECT id, location, TIMESTAMP_DIFF(event_time, i_event_time, SECOND) AS diff
FROM (
SELECT *, MAX(IF(event_type = 'I', event_time, NULL)) OVER(win2) AS i_event_time
FROM (
SELECT *, COUNTIF(event_type = 'C') OVER(win1) grp
FROM `project.dataset.table`
WINDOW win1 AS (PARTITION BY id, location ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win2 AS (PARTITION BY id, location, grp ORDER BY event_time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE event_type = 'C'
AND NOT i_event_time IS NULL
The result is:
Row id location diff
1 1001 d 33
whereas without handling that edge case it would be:
Row id location diff
1 1001 d 33
2 1001 d 83795
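For readers without BigQuery access, here is a hedged sketch of the grp-based query ported to SQLite (3.25 or later) and run from Python. BigQuery's COUNTIF and IF are written as CASE expressions, and TIMESTAMP_DIFF is approximated with julianday(); the table name events and the text storage of the timestamps are assumptions for the illustration.

```python
import sqlite3

# Sample rows from the WITH clause above, including the two consecutive 'C' events.
events = [
    (1001, "2018-06-04 18:23:48.526895", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240", "C", "d"),
    (1001, "2018-06-12 21:23:03.071829", "I", "d"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, event_time TEXT, event_type TEXT, location TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)

query = """
WITH grouped AS (
    -- grp = number of 'C' events strictly before the current row
    SELECT *, COUNT(CASE WHEN event_type = 'C' THEN 1 END) OVER (
                  PARTITION BY id, location ORDER BY event_time
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM events
), with_i AS (
    -- latest 'I' event before the current row, within the same grp only
    SELECT *, MAX(CASE WHEN event_type = 'I' THEN event_time END) OVER (
                  PARTITION BY id, location, grp ORDER BY event_time
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS i_event_time
    FROM grouped
)
SELECT id, location,
       -- difference in whole seconds, truncated; this is floating-point
       -- arithmetic, so exact whole-second gaps may need extra care
       CAST((julianday(event_time) - julianday(i_event_time)) * 86400 AS INTEGER) AS diff
FROM with_i
WHERE event_type = 'C'
  AND i_event_time IS NOT NULL
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

The second 'C' event sits alone in its group, so it finds no 'I' event and is filtered out, leaving a single row for location d.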
You can use a cumulative max() function to get the most recent i time before every event.
Then just filter based on the C event:
select id, location,
timestamp_diff(event_time, i_event_time, second) as diff
from (select t.*,
max(case when event_type = 'I' then event_time end) over (partition by id, location order by event_time) as i_event_time
from t
) t
where event_type = 'C';
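A quick way to see that the cumulative max() handles several ids at once is to run it on a small table. The sketch below does that in SQLite (3.25 or later) from Python; the rows for id 1001 come from the question, while the two rows for id 1002 are hypothetical, and julianday() stands in for timestamp_diff as an approximation.

```python
import sqlite3

rows_in = [
    (1001, "2018-06-04 18:23:48.526895", "I", "d"),
    (1001, "2018-06-04 19:26:44.359296", "I", "h"),
    (1001, "2018-06-05 06:07:03.658263", "I", "w"),
    (1001, "2018-06-07 00:47:44.651841", "I", "d"),
    (1001, "2018-06-07 00:48:17.857729", "C", "d"),
    (1001, "2018-06-08 00:04:53.086240", "I", "a"),
    (1001, "2018-06-12 21:23:03.071829", "I", "d"),
    # hypothetical second id, 55.5 seconds between 'I' and 'C'
    (1002, "2018-06-05 10:00:00.250000", "I", "a"),
    (1002, "2018-06-05 10:00:55.750000", "C", "a"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, event_time TEXT, event_type TEXT, location TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows_in)

query = """
SELECT id, location,
       -- truncated difference in seconds (floating-point approximation)
       CAST((julianday(event_time) - julianday(i_event_time)) * 86400 AS INTEGER) AS diff
FROM (
    SELECT t.*,
           -- running maximum of the 'I' timestamps up to the current row
           MAX(CASE WHEN event_type = 'I' THEN event_time END)
               OVER (PARTITION BY id, location ORDER BY event_time) AS i_event_time
    FROM t
) AS x
WHERE event_type = 'C'
ORDER BY id, location
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

Partitioning by id and location keeps each user's events separate, so no join between 'C' and 'I' rows is needed.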

Month counts between dates

I have the table below. I need to count how many ids were active in a given month. I think I'll need to create a row for each id for every month it was active, so that the id can be counted in each of those months. An id should still get a row for the month its term_dt falls in.
active_dt term_dt id
1/1/2018 101
1/1/2018 5/15/2018 102
3/1/2018 6/1/2018 103
1/1/2018 4/25/18 104
Apparently this is a "count number of overlapping intervals" problem. The algorithm goes like this:
Create a sorted list of all start and end points
Calculate a running sum over this list, add one when you encounter a start and subtract one when you encounter an end
If two points are same then perform subtractions first
You will end up with list of all points where the sum changed
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
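The running-sum steps can be traced on the four sample rows. The following sketch runs the same query shape in SQLite (3.25 or later) from Python, assuming ISO-8601 text dates and using strftime('%Y-%m', ...) in place of YEAR() and MONTH(); the table name t and column name dt are chosen for the illustration.

```python
import sqlite3

# (active_dt, term_dt, id) from the question, as ISO-8601 text
spans = [
    ("2018-01-01", None, 101),
    ("2018-01-01", "2018-05-15", 102),
    ("2018-03-01", "2018-06-01", 103),
    ("2018-01-01", "2018-04-25", 104),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (active_dt TEXT, term_dt TEXT, id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", spans)

query = """
WITH cte1(dt, val) AS (
    SELECT active_dt, 1 FROM t                        -- +1 at every start
    UNION ALL
    SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM t -- -1 at every end
), cte2 AS (
    -- ordering by val as well puts the -1 rows first on equal dates
    SELECT dt, SUM(val) OVER (ORDER BY dt, val) AS rs
    FROM cte1
)
SELECT strftime('%Y-%m', dt) AS year_month, MAX(rs) AS max_active
FROM cte2
GROUP BY strftime('%Y-%m', dt)
ORDER BY year_month
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

Only months that contain a start or end point appear in the result, so February 2018 has no row here even though three ids were active.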
I was toying with a simpler query, that seemed to do the trick, for Oracle:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.
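CONNECT BY is Oracle-specific. As a sketch of the same row-per-month expansion elsewhere, the block below uses a recursive CTE in SQLite, driven from Python. It is a simplification rather than a port: it counts every calendar month from the month of active_dt to the month of term_dt, which matches months_between here only because every active_dt is the first of a month. The :cutoff parameter is a hypothetical stand-in for SYSDATE, fixed so that the result is reproducible.

```python
import sqlite3

spans = [
    ("2018-01-01", None, 101),
    ("2018-01-01", "2018-05-15", 102),
    ("2018-03-01", "2018-06-01", 103),
    ("2018-01-01", "2018-04-25", 104),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (active_dt TEXT, term_dt TEXT, id INTEGER)")
conn.executemany("INSERT INTO spans VALUES (?, ?, ?)", spans)

query = """
WITH RECURSIVE months(id, active_month, last_month) AS (
    -- anchor: first month of each id, plus the last month to generate
    SELECT id,
           date(active_dt, 'start of month'),
           date(COALESCE(term_dt, :cutoff), 'start of month')
    FROM spans
    UNION ALL
    -- step: add one month until the last month is reached
    SELECT id, date(active_month, '+1 month'), last_month
    FROM months
    WHERE date(active_month, '+1 month') <= last_month
)
SELECT active_month, COUNT(*) AS num_active_ids
FROM months
GROUP BY active_month
ORDER BY active_month
"""
# A cutoff in October 2018 is assumed for ids without a term_dt.
counts = conn.execute(query, {"cutoff": "2018-10-15"}).fetchall()
for row in counts:
    print(row)
```

With that cutoff the counts line up with the ACTIVE_MONTH / NUM_ACTIVE_IDS listing above.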

SQL to calculate difference between 2 latest recent values by event_types

The events table looks like
event_type value timestamp
2 2 06-06-2016 14:00:00
2 7 06-06-2016 13:00:00
2 2 06-06-2016 12:00:00
3 3 06-06-2016 14:00:00
3 9 06-06-2016 13:00:00
4 9 06-06-2016 13:00:00
My goal is to keep the event types that occur at least twice, subtract the two most recent values (latest minus previous), and show the result by event_type.
The end result would be
event_type value
2 -5
3 -6
I was able to filter the events that occur at least twice and order them by timestamp descending within each event_type.
The difficult part for me is subtracting the two most recent values and showing the result by event_type.
DB / SQL experts, please help.
You can use a query like this:
SELECT event_type, diff
FROM (
SELECT event_type, value, "timestamp", rn,
value - LEAD(value) OVER (PARTITION BY event_type
ORDER BY "timestamp" DESC) AS diff
FROM (
SELECT event_type, value, "timestamp",
COUNT(*) OVER (PARTITION BY event_type) AS cnt,
ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY "timestamp" DESC) AS rn
FROM mytable) AS t
WHERE cnt >=2 AND rn <= 2 ) AS s
WHERE rn = 1
The innermost subquery uses:
Window function COUNT with PARTITION BY clause, so as to calculate the population of each event_type slice.
Window function ROW_NUMBER so as to get the two latest records within each event_type slice.
The mid-level query uses LEAD window function, so as to calculate the difference between the first and the second records. The outermost query simply returns this difference.
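The three levels described above can be run against the question's sample rows. This is a minimal sketch using SQLite (3.25 or later) from Python; the table name events, the column name ts (instead of "timestamp") and the ISO-8601 text timestamps are assumptions for the illustration.

```python
import sqlite3

# (event_type, value, ts) from the question
events = [
    (2, 2, "2016-06-06 14:00:00"),
    (2, 7, "2016-06-06 13:00:00"),
    (2, 2, "2016-06-06 12:00:00"),
    (3, 3, "2016-06-06 14:00:00"),
    (3, 9, "2016-06-06 13:00:00"),
    (4, 9, "2016-06-06 13:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type INTEGER, value INTEGER, ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

query = """
SELECT event_type, diff
FROM (
    -- mid level: latest value minus the one before it
    SELECT event_type, rn,
           value - LEAD(value) OVER (PARTITION BY event_type
                                     ORDER BY ts DESC) AS diff
    FROM (
        -- innermost: size of each event_type slice and recency rank
        SELECT event_type, value, ts,
               COUNT(*) OVER (PARTITION BY event_type) AS cnt,
               ROW_NUMBER() OVER (PARTITION BY event_type
                                  ORDER BY ts DESC) AS rn
        FROM events
    ) AS t
    WHERE cnt >= 2 AND rn <= 2
) AS s
WHERE rn = 1
ORDER BY event_type
"""
diffs = conn.execute(query).fetchall()
print(diffs)
```

Event type 4 has a single row, so the cnt >= 2 filter drops it.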
This example is for Oracle only.
Test data:
with t(event_type,
value,
timestamp) as
(select 2, 2, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 7, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 2, to_timestamp('06-06-2016 12:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 3, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 4, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual)
Query:
select event_type,
max(value) keep(dense_rank first order by rn) - max(value) keep(dense_rank last order by rn) as value
from (select event_type,
row_number() over(partition by event_type order by timestamp desc) rn,
value
from t) t
where rn in (1, 2)
group by event_type
having count (*) >= 2
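Outside Oracle, where KEEP (DENSE_RANK ...) is not available, a similar result can be sketched with plain conditional aggregation. Note that this swaps in a different technique, MAX(CASE WHEN rn = ... THEN value END), rather than reproducing the Oracle syntax. The block runs it in SQLite (3.25 or later) from Python; the table name events and column name ts are assumptions.

```python
import sqlite3

events = [
    (2, 2, "2016-06-06 14:00:00"),
    (2, 7, "2016-06-06 13:00:00"),
    (2, 2, "2016-06-06 12:00:00"),
    (3, 3, "2016-06-06 14:00:00"),
    (3, 9, "2016-06-06 13:00:00"),
    (4, 9, "2016-06-06 13:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type INTEGER, value INTEGER, ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)

query = """
SELECT event_type,
       -- rn = 1 is the latest row, rn = 2 the one before it
       MAX(CASE WHEN rn = 1 THEN value END)
         - MAX(CASE WHEN rn = 2 THEN value END) AS value
FROM (
    SELECT event_type, value,
           ROW_NUMBER() OVER (PARTITION BY event_type
                              ORDER BY ts DESC) AS rn
    FROM events
) AS t
WHERE rn IN (1, 2)
GROUP BY event_type
HAVING COUNT(*) >= 2      -- drop event types with a single row
ORDER BY event_type
"""
result = conn.execute(query).fetchall()
print(result)
```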

SQL Oracle Query self query

I am trying to figure out how to populate the NULL values below: with 1.245 for the dates from 07-OCT-14 back to 29-SEP-14, and with 1.447 for the dates from 26-SEP-14 back to 28-JUL-14.
In other words, for a given date, use the value from the latest effective date that is less than or equal to that date.
We could select the last available index_ratio value for the given security_alias with effective_date <= p.effective_date. So the SQL needs to be modified so that the subquery returns the index_ratio for the maximum available effective date that is less than or equal to the position's effective date.
How do I populate the value?
select ab.security_alias,
ab.index_ratio,
ab.effective_date
from securitydbo.security_analytics_fi ab
where ab.security_alias = 123627
order by ab.effective_date desc
Below should be the output
Assuming I understand your requirements correctly, I think the analytic function LAST_VALUE() is what you're after. E.g.:
with sample_data as (select 1 id, 10 val, to_date('01/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('02/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('03/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('04/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, 20 val, to_date('05/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, 21 val, to_date('06/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('07/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('08/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, 31 val, to_date('09/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, null val, to_date('10/08/2015', 'dd/mm/yyyy') dt from dual union all
select 1 id, 42 val, to_date('11/08/2015', 'dd/mm/yyyy') dt from dual)
select id,
last_value(val ignore nulls) over (partition by id order by dt) val,
dt
from sample_data
order by id, dt desc;
ID VAL DT
---------- ---------- ----------
1 42 11/08/2015
1 31 10/08/2015
1 31 09/08/2015
1 21 08/08/2015
1 21 07/08/2015
1 21 06/08/2015
1 20 05/08/2015
1 10 04/08/2015
1 10 03/08/2015
1 10 02/08/2015
1 10 01/08/2015
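Where IGNORE NULLS is not supported, the same fill-forward can be sketched with a different technique: COUNT(val) over the ordered rows increases only at non-NULL values, so it labels each run of rows that should share a value. The block below tries this in SQLite (3.25 or later) from Python, reusing the sample data with ISO-8601 text dates; the table name sample_data is kept from the Oracle example.

```python
import sqlite3

# (id, val, dt) from the sample data above
sample = [
    (1, 10, "2015-08-01"), (1, None, "2015-08-02"), (1, None, "2015-08-03"),
    (1, None, "2015-08-04"), (1, 20, "2015-08-05"), (1, 21, "2015-08-06"),
    (1, None, "2015-08-07"), (1, None, "2015-08-08"), (1, 31, "2015-08-09"),
    (1, None, "2015-08-10"), (1, 42, "2015-08-11"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (id INTEGER, val INTEGER, dt TEXT)")
conn.executemany("INSERT INTO sample_data VALUES (?, ?, ?)", sample)

query = """
WITH grouped AS (
    -- COUNT(val) skips NULLs, so grp only increases on rows that carry a value
    SELECT id, val, dt,
           COUNT(val) OVER (PARTITION BY id ORDER BY dt) AS grp
    FROM sample_data
)
SELECT id,
       -- each group holds at most one non-NULL val: the one that opened it
       MAX(val) OVER (PARTITION BY id, grp) AS val,
       dt
FROM grouped
ORDER BY id, dt DESC
"""
filled = conn.execute(query).fetchall()
for row in filled:
    print(row)
```

Rows before the first non-NULL value of an id would land in group 0 and stay NULL, which matches what LAST_VALUE ... IGNORE NULLS returns for them.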