I have a query that takes transactions (buying and selling of items) and calculates the gain/loss whenever the running total resets back to 0.
The fiddle is here: https://www.db-fiddle.com/f/974UVvE6id2rEiBPR78CKx/0
Units of each item can be added and subtracted, and each time they come back to 0 for an account and item combination we want to calculate the net result of those transactions.
You can see it working in the fiddle for the first few cases (when open = 0); however, it fails when there are multiple transactions before getting back to 0 (e.g. one increment followed by two separate decrements of units).
From this data:
INSERT INTO t
(account, item, units, price, created_at)
VALUES
(2, 'A', -1, '$120.00', '2022-09-23 17:33:07'),
(2, 'A', 1, '$110.00', '2022-09-23 17:34:31'),
(1, 'B', -1, '$70.00', '2022-09-23 17:38:31'),
(1, 'B', 1, '$50.00', '2022-09-23 17:36:31'),
(1, 'B', 2, '$50.00', '2022-09-23 17:40:31'),
(1, 'B', -1, '$60.00', '2022-09-23 17:41:31'),
(1, 'B', -1, '$70.00', '2022-09-23 17:42:31'),
(1, 'B', 1, '$50.00', '2022-09-23 17:35:31'),
(1, 'B', -1, '$60.00', '2022-09-23 17:33:31'),
(2, 'B', 1, '$70.00', '2022-09-23 17:43:31'),
(2, 'B', 1, '$75.00', '2022-09-23 17:45:31'),
(2, 'B', -2, '$80.00', '2022-09-23 17:46:31')
;
I need to produce the result below (net is the relevant column, which we cannot get right in the fiddle: it shows incorrect values for the last two net rows):
account  item  units  price    created_at                open  cost      net
2        A     -1     $120.00  2022-09-23T17:33:07.000Z  -1    $120.00
1        B     -1     $60.00   2022-09-23T17:33:31.000Z  -1    $60.00
2        A     1      $110.00  2022-09-23T17:34:31.000Z  0     -$110.00  $10.00
1        B     1      $50.00   2022-09-23T17:35:31.000Z  0     -$50.00   $10.00
1        B     1      $50.00   2022-09-23T17:36:31.000Z  1     -$50.00
1        B     -1     $70.00   2022-09-23T17:38:31.000Z  0     $70.00    $20.00
1        B     2      $50.00   2022-09-23T17:40:31.000Z  2     -$100.00
1        B     -1     $60.00   2022-09-23T17:41:31.000Z  1     $60.00
1        B     -1     $70.00   2022-09-23T17:42:31.000Z  0     $70.00    $30.00
2        B     1      $70.00   2022-09-23T17:43:31.000Z  1     -$70.00
2        B     1      $75.00   2022-09-23T17:45:31.000Z  2     -$75.00
2        B     -2     $80.00   2022-09-23T17:46:31.000Z  0     $160.00   $15.00
We start by establishing cost and marking every time the running total hits 0. Using lag and count, we form a group out of every run that ends in zero, partitioned by account and item. We then take the running total of cost over the groups we just created, but only display it on rows where the original running_total = 0.
select account
,item
,units
,price
,created_at
,running_total as open
,cost
,case running_total when 0 then sum(cost) over(partition by account, item, grp order by created_at) end as net
from
(
select *
,count(mark_0) over(partition by account, item order by created_at) as grp
from (
select *
,case when lag(running_total) over(partition by account, item order by created_at) = 0 then 1 when lag(running_total) over(partition by account, item order by created_at) is null then 1 end as mark_0
from (
select *
,sum(units) over(partition by account, item order by created_at) as running_total
,price*units*-1 as cost
from t
) t
) t
) t
order by created_at
Result:
account  item  units  price   created_at              open  cost     net
2        A     -1     120.00  2022-09-23 17:33:07+01  -1    120.00   null
1        B     -1     60.00   2022-09-23 17:33:31+01  -1    60.00    null
2        A     1      110.00  2022-09-23 17:34:31+01  0     -110.00  10.00
1        B     1      50.00   2022-09-23 17:35:31+01  0     -50.00   10.00
1        B     1      50.00   2022-09-23 17:36:31+01  1     -50.00   null
1        B     -1     70.00   2022-09-23 17:38:31+01  0     70.00    20.00
1        B     2      50.00   2022-09-23 17:40:31+01  2     -100.00  null
1        B     -1     60.00   2022-09-23 17:41:31+01  1     60.00    null
1        B     -1     70.00   2022-09-23 17:42:31+01  0     70.00    30.00
2        B     1      70.00   2022-09-23 17:43:31+01  1     -70.00   null
2        B     1      75.00   2022-09-23 17:45:31+01  2     -75.00   null
2        B     -2     80.00   2022-09-23 17:46:31+01  0     160.00   15.00
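To see the mark-and-count trick in isolation, here is a minimal, self-contained sketch (hypothetical five-row data whose running totals are 1, 0, 2, 1, 0, forming two groups):
-- each row following a zero (or the first row) gets mark_0 = 1;
-- the running count of those marks is the group number
select rn, running_total,
       count(mark_0) over(order by rn) as grp
from (
  select rn, running_total,
         case when lag(running_total) over(order by rn) = 0 then 1
              when lag(running_total) over(order by rn) is null then 1
         end as mark_0
  from (values (1, 1), (2, 0), (3, 2), (4, 1), (5, 0)) v(rn, running_total)
) s
order by rn;
-- grp comes out as 1, 1, 2, 2, 2: each run that ends in zero gets its own group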
You can use a recursive CTE, building up the result row by row and using a JSON object to carry the running open units and cost for every distinct item:
with recursive transactions as (
  select row_number() over (order by t1.created_at) id, t1.* from t t1
  order by t1.created_at
),
cte(id, account, item, units, price, created_at, open, cost, net, p) as (
  select t.*, t.units, -1*t.price*t.units, 0, (select jsonb_object_agg(t1.item,
      jsonb_build_object('u', 0, 'c', 0)) from transactions t1)||jsonb_build_object(t.item,
      jsonb_build_object('u', t.units, 'c', -1*t.price*t.units))
  from transactions t where t.id = 1
  union all
  select t.*, (c.p -> t.item -> 'u')::int + t.units, -1*t.price*t.units,
    case when (c.p -> t.item -> 'u')::int + t.units = 0
      then (c.p -> t.item -> 'c')::int + -1*t.price*t.units else 0 end,
    c.p || jsonb_build_object(t.item, jsonb_build_object('u', (c.p -> t.item -> 'u')::int + t.units, 'c',
      case when (c.p -> t.item -> 'u')::int + t.units = 0 then 0
      else (c.p -> t.item -> 'c')::int + -1*t.price*t.units end))
  from cte c join transactions t on t.id = c.id + 1
)
select account, item, units, price, created_at,
       open, cost, case when net > 0 then net end
from cte;
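To make the carried state concrete, here is a hand-trace of the p column over the first three rows of the sample data, ordered by created_at (hand-computed, not fiddle output):
id 1 (account 2, item A, units -1, price $120):  p = {"A": {"u": -1, "c": 120}, "B": {"u": 0, "c": 0}}
id 2 (account 1, item B, units -1, price $60):   p = {"A": {"u": -1, "c": 120}, "B": {"u": -1, "c": 60}}
id 3 (account 2, item A, units 1, price $110):   A returns to 0, so net = 120 - 110 = 10 and A's stored cost resets:
                                                 p = {"A": {"u": 0, "c": 0}, "B": {"u": -1, "c": 60}}
Note that p is keyed by item only, so this relies on two accounts never holding the same item at the same time, which happens to be true for the sample data.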
I am doing some roster analysis and need to identify when an employee has worked for 5 or more consecutive days. From my table I can extract data something like the below (note: there are a lot more columns; this is just a cut-down example):
Emp   Start       First_Entry
1234  23/06/2016  1
1234  24/06/2016  1
1234  24/06/2016  0
1234  25/06/2016  1
1234  26/06/2016  1
1234  27/06/2016  1
1234  28/06/2016  1
1234  29/06/2016  1
1234  29/06/2016  0
1234  30/06/2016  1
1234  2/07/2016   1
1234  3/07/2016   1
1234  3/07/2016   0
1234  4/07/2016   1
1234  4/07/2016   0
1234  5/07/2016   1
1234  6/07/2016   1
1234  9/07/2016   1
1234  10/07/2016  1
1234  11/07/2016  1
1234  12/07/2016  1
And what I am after is something like this:
Emp   Start       First_Entry  Consecutive_Days  Over_5  Status
1234  23/06/2016  1            1                 0       Worked < 5
1234  24/06/2016  1            2                 0       Worked < 5
1234  24/06/2016  0            2                 0       Worked < 5
1234  25/06/2016  1            3                 0       Worked < 5
1234  26/06/2016  1            4                 0       Worked < 5
1234  27/06/2016  1            5                 1       Worked >= 5
1234  28/06/2016  1            6                 1       Worked >= 5
1234  29/06/2016  1            7                 1       Worked >= 5
1234  29/06/2016  0            7                 1       Worked >= 5
1234  30/06/2016  1            8                 1       Worked >= 5
1234  02/07/2016  1            1                 0       Worked < 5
1234  03/07/2016  1            2                 0       Worked < 5
1234  03/07/2016  0            2                 0       Worked < 5
1234  04/07/2016  1            3                 0       Worked < 5
1234  04/07/2016  0            3                 0       Worked < 5
1234  05/07/2016  1            4                 0       Worked < 5
1234  06/07/2016  1            5                 1       Worked >= 5
1234  09/07/2016  1            1                 0       Worked < 5
1234  10/07/2016  1            2                 0       Worked < 5
1234  11/07/2016  1            3                 0       Worked < 5
1234  12/07/2016  1            4                 0       Worked < 5
I'm really not sure how to go about getting the cumulative count for consecutive days, so any help you can give would be amazing.
Someone will probably come up with a more elegant solution, but this will do. Your problem is a classic "gaps and islands" problem: once we find the islands of consecutive dates, the rest follows easily. In the SQL below, #minDate is not a must, but it makes the date arithmetic easier.
CREATE TABLE #temptable
(
[Emp] CHAR(4),
[startDate] DATE,
[First_Entry] BIT
);
INSERT INTO #temptable
(
[Emp],
[startDate],
[First_Entry]
)
VALUES
('1234', N'2016-06-23', 1),
('1234', N'2016-06-24', 1),
('1234', N'2016-06-24', 0),
('1234', N'2016-06-25', 1),
('1234', N'2016-06-26', 1),
('1234', N'2016-06-27', 1),
('1234', N'2016-06-28', 1),
('1234', N'2016-06-29', 1),
('1234', N'2016-06-29', 0),
('1234', N'2016-06-30', 1),
('1234', N'2016-07-02', 1),
('1234', N'2016-07-03', 1),
('1234', N'2016-07-03', 0),
('1234', N'2016-07-04', 1),
('1234', N'2016-07-04', 0),
('1234', N'2016-07-05', 1),
('1234', N'2016-07-06', 1),
('1234', N'2016-07-09', 1),
('1234', N'2016-07-10', 1),
('1234', N'2016-07-11', 1),
('1234', N'2016-07-12', 1);
DECLARE #minDate DATE;
SELECT #minDate = DATEADD(d, -1, MIN(startDate))
FROM #temptable;
WITH firstOnly
AS (SELECT *
FROM #temptable
WHERE First_Entry = 1),
grouper (emp, startDate, grp)
AS (SELECT Emp,
startDate,
DATEDIFF(d, #minDate, startDate) - ROW_NUMBER() OVER (PARTITION BY Emp ORDER BY startDate)
FROM firstOnly),
islands (emp, START, [end])
AS (SELECT emp,
MIN(startDate),
MAX(startDate)
FROM grouper
GROUP BY emp,
grp),
consecutives (emp, startDate, consecutive_days)
AS (SELECT f.Emp,
           f.startDate,
           ROW_NUMBER() OVER (PARTITION BY f.Emp, i.START ORDER BY f.startDate)
    FROM firstOnly f
        INNER JOIN islands i
            ON f.startDate
               BETWEEN i.START AND i.[end])
SELECT t.Emp,
t.startDate,
t.First_Entry,
c.consecutive_days,
CAST(CASE
WHEN c.consecutive_days < 5 THEN
0
ELSE
1
END AS BIT) Over_5,
CASE
WHEN c.consecutive_days < 5 THEN
'Worked < 5'
ELSE
'Worked >= 5'
END [Status]
FROM consecutives c
INNER JOIN #temptable t
ON t.Emp = c.emp
AND t.startDate = c.startDate;
DROP TABLE #temptable;
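The key step is the grouper CTE: for consecutive dates, DATEDIFF grows by exactly 1 per row, just as ROW_NUMBER does, so their difference (grp) is constant within an island and jumps when a gap appears. A hand-trace over the first rows of firstOnly (with #minDate = 2016-06-22):
startDate   DATEDIFF  ROW_NUMBER  grp
2016-06-23  1         1           0
2016-06-24  2         2           0
...
2016-06-30  8         8           0
2016-07-02  10        9           1   <- the gap on 2016-07-01 bumps grp
2016-07-03  11        10          1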
This is a gaps and islands problem. You can use the LAG window function to get the previous startDate for each Emp, then a SUM window function to work out which days are consecutive.
Finally, a CASE WHEN expression determines whether the consecutive count has reached 5.
;WITH CTE AS (
SELECT [Emp],
[startDate],
[First_Entry],
SUM(CASE WHEN DATEDIFF(dd,f_Dt,startDate) <= 1 THEN 0 ELSE 1 END) OVER(PARTITION BY Emp ORDER BY startDate) grp
FROM (
SELECT *,
LAG(startDate,1,startDate) OVER(PARTITION BY Emp ORDER BY startDate) f_Dt
FROM T
) t1
)
SELECT [Emp],
[startDate],
[First_Entry],
SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) Consecutive_Days,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 1 ELSE 0 END) Over_5,
(CASE WHEN SUM(CASE WHEN First_Entry = 1 THEN 1 ELSE 0 END) OVER(PARTITION BY Emp,grp ORDER BY startDate) >= 5 THEN 'Worked >= 5' ELSE 'Worked < 5' END) Status
FROM CTE
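To illustrate: f_Dt defaults to the row's own startDate on the first row, so DATEDIFF(dd, f_Dt, startDate) is 0 for a first or duplicate-date row, 1 for a consecutive day, and larger across a gap; summing the gap flag yields grp (hand-computed for a few rows):
startDate   f_Dt        gap  grp
2016-06-23  2016-06-23  0    0
2016-06-24  2016-06-23  0    0
2016-06-24  2016-06-24  0    0
...
2016-06-30  2016-06-29  0    0
2016-07-02  2016-06-30  1    1   <- gap starts a new group
2016-07-03  2016-07-02  0    1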
I create a table using the command below:
CREATE TABLE IF NOT EXISTS stats (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
session_kind INTEGER NOT NULL,
ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)
I insert some time series data using the command below:
INSERT INTO stats (session_kind) values (?1)
After executing the insert command several times, I have time series data like the below:
id session_kind ts
-----------------------------------------
1 0 2020-04-18 12:59:51 // day 1
2 1 2020-04-19 12:59:52 // day 2
3 0 2020-04-19 12:59:53
4 1 2020-04-19 12:59:54
5 0 2020-04-19 12:59:55
6 2 2020-04-19 12:59:56
7 2 2020-04-19 12:59:57
8 2 2020-04-19 12:59:58
9 2 2020-04-19 12:59:59
10 0 2020-04-20 12:59:51 // day 3
11 1 2020-04-20 12:59:52
12 0 2020-04-20 12:59:53
13 1 2020-04-20 12:59:54
14 0 2020-04-20 12:59:55
15 2 2020-04-20 12:59:56
16 2 2020-04-20 12:59:57
17 2 2020-04-20 12:59:58
18 2 2020-04-21 12:59:59 // day 4
I would like a query that groups my data by date, from the most recent day to the least recent, with a count of each session_kind, like below (I don't want to pass any parameters to this query):
0 1 2 ts
-------------------------
0 0 1 2020-04-21 // day 4
3 2 3 2020-04-20 // day 3
2 2 4 2020-04-19 // day 2
1 0 0 2020-04-18 // day 1
How can I group my data as above?
You can do conditional aggregation. In SQLite a boolean expression such as session_kind = 0 evaluates to 1 or 0, so summing it counts the matching rows:
select
  sum(session_kind = 0) session_kind_0,
  sum(session_kind = 1) session_kind_1,
  sum(session_kind = 2) session_kind_2,
  date(ts) ts_day
from stats
group by date(ts)
order by ts_day desc
If you want something dynamic, then it might be simpler to put the results in rows rather than columns:
select date(ts) ts_day, session_kind, count(*) cnt
from mytable
group by date(ts), session_kind
order by ts_day desc, session_kind
If I understand correctly, you just want to sum the values, with one conditional sum per session_kind (your sample data uses kinds 0, 1 and 2):
select date(ts),
       sum(case when session_kind = 0 then 1 else 0 end) as cnt_0,
       sum(case when session_kind = 1 then 1 else 0 end) as cnt_1,
       sum(case when session_kind = 2 then 1 else 0 end) as cnt_2
from stats
group by date(ts);
You can also simplify this:
select date(ts),
       sum( session_kind = 0 ) as cnt_0,
       sum( session_kind = 1 ) as cnt_1,
       sum( session_kind = 2 ) as cnt_2
from stats
group by date(ts);
I have this BigQuery dataframe where a 1 in long_entry or short_entry represents entering a trade at that time with a corresponding long/short position, while a 1 in long_exit or short_exit means exiting a trade. I would like two new columns: long_pnl, which tabulates the PnL generated from individual long trades, and short_pnl, which tabulates the PnL generated from individual short trades.
There is a maximum of one trade/position at any point in time for this backtesting.
Below is my dataframe. As you can see, a long trade is entered on 26/2/2019 and closed on 1/3/2019, so its PnL is $64.45, while a short trade is entered on 4/3/2019 and closed on 5/3/2019 with a PnL of -$119.11 (a loss).
date price long_entry long_exit short_entry short_exit
0 24/2/2019 4124.25 0 0 0 0
1 25/2/2019 4130.67 0 0 0 0
2 26/2/2019 4145.67 1 0 0 0
3 27/2/2019 4180.10 0 0 0 0
4 28/2/2019 4200.05 0 0 0 0
5 1/3/2019 4210.12 0 1 0 0
6 2/3/2019 4198.10 0 0 0 0
7 3/3/2019 4210.34 0 0 0 0
8 4/3/2019 4100.12 0 0 1 0
9 5/3/2019 4219.23 0 0 0 1
I hope to have an output like this, with another column for short_pnl:
date price long_entry long_exit short_entry short_exit long_pnl
0 24/2/2019 4124.25 0 0 0 0 NaN
1 25/2/2019 4130.67 0 0 0 0 NaN
2 26/2/2019 4145.67 1 0 0 0 64.45
3 27/2/2019 4180.10 0 0 0 0 NaN
4 28/2/2019 4200.05 0 0 0 0 NaN
5 1/3/2019 4210.12 0 1 0 0 NaN
6 2/3/2019 4198.10 0 0 0 0 NaN
7 3/3/2019 4210.34 0 0 0 0 NaN
8 4/3/2019 4100.12 0 0 1 0 NaN
9 5/3/2019 4219.23 0 0 0 1 NaN
Below is for BigQuery Standard SQL
#standardSQL
WITH temp1 AS (
SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS numeric) price, long_entry, long_exit, short_entry, short_exit
FROM `project.dataset.table`
), temp2 AS (
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
SUM(long_entry) OVER(ORDER BY dt) + SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) long_grp,
SUM(short_entry) OVER(ORDER BY dt) + SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) short_grp
FROM temp1
)
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
IF(long_entry = 0, NULL,
FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) -
LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
) long_pnl,
IF(short_entry = 0, NULL,
LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) -
FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
) short_pnl
FROM temp2
Applying the above to the sample data in your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '24/2/2019' dt, 4124.25 price, 0 long_entry, 0 long_exit, 0 short_entry, 0 short_exit UNION ALL
SELECT '25/2/2019', 4130.67, 0, 0, 0, 0 UNION ALL
SELECT '26/2/2019', 4145.67, 1, 0, 0, 0 UNION ALL
SELECT '27/2/2019', 4180.10, 0, 0, 0, 0 UNION ALL
SELECT '28/2/2019', 4200.05, 0, 0, 0, 0 UNION ALL
SELECT '1/3/2019', 4210.12, 0, 1, 0, 0 UNION ALL
SELECT '2/3/2019', 4198.10, 0, 0, 0, 0 UNION ALL
SELECT '3/3/2019', 4210.34, 0, 0, 0, 0 UNION ALL
SELECT '4/3/2019', 4100.12, 0, 0, 1, 0 UNION ALL
SELECT '5/3/2019', 4219.23, 0, 0, 0, 1
), temp1 AS (
SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS numeric) price, long_entry, long_exit, short_entry, short_exit
FROM `project.dataset.table`
), temp2 AS (
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
SUM(long_entry) OVER(ORDER BY dt) + SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) long_grp,
SUM(short_entry) OVER(ORDER BY dt) + SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) short_grp
FROM temp1
)
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
IF(long_entry = 0, NULL,
FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) -
LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
) long_pnl,
IF(short_entry = 0, NULL,
LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) -
FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
) short_pnl
FROM temp2
-- ORDER BY dt
the result will be:
Row dt price long_entry long_exit short_entry short_exit long_pnl short_pnl
1 2019-02-24 4124.25 0 0 0 0 null null
2 2019-02-25 4130.67 0 0 0 0 null null
3 2019-02-26 4145.67 1 0 0 0 64.45 null
4 2019-02-27 4180.1 0 0 0 0 null null
5 2019-02-28 4200.05 0 0 0 0 null null
6 2019-03-01 4210.12 0 1 0 0 null null
7 2019-03-02 4198.1 0 0 0 0 null null
8 2019-03-03 4210.34 0 0 0 0 null null
9 2019-03-04 4100.12 0 0 1 0 null -119.11
10 2019-03-05 4219.23 0 0 0 1 null null
I feel there should be a "shorter" solution, but I think the above is still good enough to use.
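The heart of this is long_grp (and likewise short_grp): the running count of entries plus the running count of exits up to the previous row stays constant from an entry row through its matching exit row, so each trade lands in its own partition. Hand-traced for the long columns of the sample data:
dt          long_entry  long_exit  long_grp
2019-02-24  0           0          null
2019-02-25  0           0          0
2019-02-26  1           0          1   <- trade opens
2019-02-27  0           0          1
2019-02-28  0           0          1
2019-03-01  0           1          1   <- trade closes
2019-03-02  0           0          2
Within partition long_grp = 1, FIRST_VALUE(price ... ORDER BY dt DESC) is the exit price (4210.12) and LAST_VALUE is the entry-row price (4145.67), so long_pnl = 64.45 is reported on the entry row.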
The requirement is to group records of a table into 10-second intervals. Given this table:
Id DateTime Rank
1 2011-09-27 18:36:15 1
2 2011-09-27 18:36:15 1
3 2011-09-27 18:36:19 1
4 2011-09-27 18:36:23 1
5 2011-09-27 18:36:26 1
6 2011-09-27 18:36:30 1
7 2011-09-27 18:36:32 1
8 2011-09-27 18:36:14 2
9 2011-09-27 18:36:16 2
10 2011-09-27 18:36:35 2
The groups should look like this:
Id DateTime Rank GroupRank
1 2011-09-27 18:36:15 1 1
2 2011-09-27 18:36:15 1 1
3 2011-09-27 18:36:19 1 1
4 2011-09-27 18:36:23 1 1
5 2011-09-27 18:36:26 1 2
6 2011-09-27 18:36:30 1 2
7 2011-09-27 18:36:32 1 2
8 2011-09-27 18:36:14 2 3
9 2011-09-27 18:36:16 2 3
10 2011-09-27 18:36:35 2 4
For Rank 1 the minimum time is 18:36:15, so all records from 18:36:15 through 18:36:24 should be in one group, and so on.
I want GroupRank in the same table, so it would be something with a DENSE_RANK() OVER clause. Can anyone help me write the query in SQL?
You need to do this in two steps. The first is to assign each record to its 10-second group, by taking the number of seconds between the minimum time for its rank and the record's own time, dividing by 10, and rounding down to the nearest integer:
SELECT *,
SecondGroup = FLOOR(DATEDIFF(SECOND,
MIN([DateTime]) OVER(PARTITION BY [Rank]),
[DateTime]) / 10.0)
FROM #T;
Which gives:
Id DateTime Rank SecondGroup
---------------------------------------------------
1 2011-09-27 18:36:15.000 1 0
2 2011-09-27 18:36:15.000 1 0
3 2011-09-27 18:36:19.000 1 0
4 2011-09-27 18:36:23.000 1 0
5 2011-09-27 18:36:26.000 1 1
6 2011-09-27 18:36:30.000 1 1
7 2011-09-27 18:36:32.000 1 1
8 2011-09-27 18:36:14.000 2 0
9 2011-09-27 18:36:16.000 2 0
10 2011-09-27 18:36:35.000 2 2
Then you can do your DENSE_RANK ordering by Rank and SecondGroup:
SELECT Id, [DateTime], [Rank],
GroupRank = DENSE_RANK() OVER(ORDER BY [Rank], SecondGroup)
FROM ( SELECT *,
SecondGroup = FLOOR(DATEDIFF(SECOND,
MIN([DateTime]) OVER(PARTITION BY [Rank]),
[DateTime]) / 10.0)
FROM #T
) AS t;
Which gives your desired output.
SAMPLE DATA
CREATE TABLE #T (Id INT, [DateTime] DATETIME, [Rank] INT);
INSERT #T (Id, [DateTime], [Rank])
VALUES
(1, '2011-09-27 18:36:15', 1),
(2, '2011-09-27 18:36:15', 1),
(3, '2011-09-27 18:36:19', 1),
(4, '2011-09-27 18:36:23', 1),
(5, '2011-09-27 18:36:26', 1),
(6, '2011-09-27 18:36:30', 1),
(7, '2011-09-27 18:36:32', 1),
(8, '2011-09-27 18:36:14', 2),
(9, '2011-09-27 18:36:16', 2),
(10, '2011-09-27 18:36:35', 2);