Lag function to create new column

I have a table with two columns, X and Y, and I want to create a new column Z from both columns plus the results from all previous rows: Z = (X + the sum of Z over all previous rows) * Y. For example:
Xdate X Y
2022-01-01 1 2
2022-01-02 3 5
2022-01-03 2 2
2022-01-04 4 6
The result should look like this:
Xdate X Y Z
2022-01-01 1 2 2 (1 * 2)
2022-01-02 2 2 8 ((2 + 2) * 2, adding Z from all previous rows)
2022-01-03 2 1 12 ((2 + 8 + 2) * 1, adding Z from all previous rows)
2022-01-04 3 1 25 ((3 + 12 + 8 + 2) * 1, adding Z from all previous rows)
So far I have this, but I can't figure out how to finish the query
select
*,
Lag(X+Y, 1, X+Y) OVER(ORDER BY Xdate) AS Z
from temp

You can do this with a recursive CTE. I added a ROW_NUMBER function in case there are gaps in your dates, so that each row can be joined to the one before it by rn.
WITH cte_rn
AS
(
SELECT xdate,x,y,ROW_NUMBER() OVER (ORDER BY xdate) as rn
FROM #temp
)
,
cte_rec
AS
(
SELECT cte_rn.xdate, cte_rn.x,cte_rn.y,cte_rn.rn,cte_rn.x * cte_rn.y as z , cte_rn.x * cte_rn.y as Zsum
FROM cte_rn
WHERE rn = 1
UNION ALL
SELECT t.xdate,t.x,t.y,t.rn,(t.x + r.Zsum) * t.y as z ,r.Zsum+(t.x + r.Zsum) * t.y as Zsum
FROM cte_rec r
JOIN cte_rn t
ON r.rn + 1 = t.rn
)
SELECT *
FROM cte_rec;
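
LAG alone cannot finish the query because each Z depends on every earlier Z, not just on the previous row's X and Y, so some running state has to be carried from row to row. The following is a minimal Python sketch of that recurrence, useful for checking the CTE's output by hand. The `rows` list and the `running_z` helper are illustrative names, not part of the original answer; the data mirrors the X and Y values in the expected-result table.

```python
# Hypothetical sample rows: (xdate, x, y), taken from the expected-result table.
rows = [
    ("2022-01-01", 1, 2),
    ("2022-01-02", 2, 2),
    ("2022-01-03", 2, 1),
    ("2022-01-04", 3, 1),
]


def running_z(rows):
    """Return Z for each row, where Z = (X + sum of all earlier Z) * Y."""
    z_values = []
    z_sum = 0  # plays the role of Zsum in the recursive CTE
    for _xdate, x, y in sorted(rows):  # sort by date, like ORDER BY xdate
        z = (x + z_sum) * y
        z_values.append(z)
        z_sum += z  # carry the running total into the next row
    return z_values


z_values = running_z(rows)
print(z_values)  # [2, 8, 12, 25]
```

The loop variable `z_sum` corresponds to the Zsum column that the recursive member passes from one row to the next.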

Related

How to insert and fill a table based on a column in another table in SQL with several million rows

I have a table A, similar to the one below, with 2 million records:
ROW ID ITEM NoOfUnit
1 1 A 2
2 2 B 1
3 3 C 3
.
.
.
I want to fill table B based on NoOfUnit from A, similar to the below:
ROW ID ITEM QTY
1 1 A 1
2 1 A 1
3 2 B 1
4 3 C 1
5 3 C 1
6 3 C 1
.
.
.
The number of rows in table B is very large, and a cursor is very slow.

I would just use a recursive CTE:
with cte as (
select id, item, NoOfUnit, 1 as n
from a
union all
select id, item, NoOfUnit, n + 1
from cte
where n < NoOfUnit
)
insert into b (id, item, qty)
select id, item, 1
from cte;
If NoOfUnit is ever greater than 100, then you need option (maxrecursion 0), because the default recursion limit is 100.

All you need to do here is duplicate your rows based on the number held in NoOfUnit, which you could do with a numbers table. You then insert the result of this into your destination table.
An example of how to do this is as follows:
Query
declare @d table(ID int, ITEM char(1),NoOfUnit int);
insert into @d values
(1,'A',2)
,(2,'B',1)
,(3,'C',3)
;
with t as(select t from(values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) as t(t)) -- table with 10 rows
,n as(select row_number() over (order by (select null)) as n from t,t t2,t t3,t t4,t t5) -- cross join 10 rows 5 times for 10 * 10 * 10 * 10 * 10 = 100,000 rows with incrementing value
select d.ID
,d.ITEM
,1 as QTY
from @d as d
join n
on d.NoOfUnit >= n.n
order by d.ID
,d.ITEM;
Output
ID ITEM QTY
1 A 1
1 A 1
2 B 1
3 C 1
3 C 1
3 C 1
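
To make the join condition concrete, here is a rough Python equivalent of joining against a numbers table on NoOfUnit >= n. It is only a sketch of the logic, not a replacement for the set-based SQL; `expand_rows`, `source`, and `expanded` are illustrative names.

```python
def expand_rows(rows):
    """Emit one (id, item, 1) row for every n in 1..NoOfUnit.

    Mirrors: from source d join numbers n on d.NoOfUnit >= n.n
    """
    if not rows:
        return []
    # The "numbers table": 1 .. the largest NoOfUnit present in the data.
    numbers = range(1, max(no_of_unit for _id, _item, no_of_unit in rows) + 1)
    return [
        (row_id, item, 1)
        for row_id, item, no_of_unit in rows
        for n in numbers
        if no_of_unit >= n
    ]


# Sample data from the question: (ID, ITEM, NoOfUnit).
source = [(1, "A", 2), (2, "B", 1), (3, "C", 3)]
expanded = expand_rows(source)
for row in expanded:
    print(row)
```

Each source row is repeated once for every number that does not exceed its NoOfUnit, which is exactly what the join produces.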

SQL Server - group and number matching contiguous values

I have a list of stock transactions and I am using Over(Partition By) to calculate the running totals (positions) by security. Over time a holding in a particular security can be long, short or flat. I am trying to find an efficient way to extract only the transactions relating to the current position for each security.
I have created a simplified sqlfiddle to show what I have so far. The cte query generates the running total for each security (code_id) and identifies when the holdings are long (L), short (S) or flat (F). What I need is to group and number matching contiguous values of L, S or F for each code_id.
What I have so far is this:
; WITH RunningTotals as
(
SELECT
*,
RunningTotal = sum(qty) OVER (Partition By code_id Order By id)
FROM
TradeData
), LongShortFlat as
(
SELECT
*,
LSF = CASE
WHEN RunningTotal > 0 THEN 'L'
WHEN RunningTotal < 0 THEN 'S'
ELSE 'F'
END
FROM
RunningTotals
)
SELECT
*
FROM
LongShortFlat r
I think what I need to do is create a GroupNum column by applying a row_number for each group of L, S and F within each code_id so the results look like this:
id code_id qty RunningTotal LSF GroupNum
1 1 5 5 L 1
2 1 2 7 L 1
3 1 7 14 L 1
4 1 -3 11 L 1
5 1 -5 6 L 1
6 1 -6 0 F 2
7 1 5 5 L 3
8 1 5 10 L 3
9 1 -2 8 L 3
10 1 -4 4 L 3
11 2 5 5 L 1
12 2 3 8 L 1
13 2 -4 4 L 1
14 2 -2 2 L 1
15 2 -2 0 F 2
16 2 6 6 L 3
17 2 -5 1 L 3
18 2 -5 -4 S 4
19 2 2 -2 S 4
20 2 4 2 L 5
21 2 -5 -3 S 6
22 2 -2 -5 S 6
23 3 5 5 L 1
24 3 2 7 L 1
25 3 1 8 L 1
I am struggling to generate the GroupNum column.
Thanks in advance for your help.

[Revised]
Sorry about that, I read your question too quickly. I came up with a solution using a recursive common table expression (below), then saw that you've worked out a solution using LAG. I'll post my revised query anyway, for posterity. Either way, the resulting query is (imho) pretty ugly.
;WITH cteBaseAgg
as (
-- Build the "sum increases over time" data
SELECT
row_number() over (partition by td.code_id order by td.code_id, td.Id) RecurseKey
,td.code_id
,td.id
,td.qty
,sum(tdPrior.qty) RunningTotal
,case
when sum(tdPrior.qty) > 0 then 'L'
when sum(tdPrior.qty) < 0 then 'S'
else 'F'
end LSF
from dbo.TradeData td
inner join dbo.TradeData tdPrior
on tdPrior.code_id = td.code_id -- All for this code_id
and tdPrior.id <= td.Id -- For this and any prior Ids
group by
td.code_id
,td.id
,td.qty
)
,cteRecurse
as (
-- "Set" the first row for each code_id
SELECT
RecurseKey
,code_id
,id
,qty
,RunningTotal
,LSF
,1 GroupNum
from cteBaseAgg
where RecurseKey = 1
-- For each successive row in each set, check if need to increment GroupNum
UNION ALL SELECT
agg.RecurseKey
,agg.code_id
,agg.id
,agg.qty
,agg.RunningTotal
,agg.LSF
,rec.GroupNum + case when rec.LSF = agg.LSF then 0 else 1 end
from cteBaseAgg agg
inner join cteRecurse rec
on rec.code_id = agg.code_id
and agg.RecurseKey - 1 = rec.RecurseKey
)
-- Show results
SELECT
id
,code_id
,qty
,RunningTotal
,LSF
,GroupNum
from cteRecurse
order by
code_id
,id

Sorry for making this question a bit more complicated than it needed to be but for the sake of closure I have found a solution using the lag function.
In order to achieve what I wanted I continued my cte above with the following:
, a as
(
SELECT
*,
Lag(LSF, 1, LSF) OVER(Partition By code_id ORDER BY id) AS prev_LSF,
Lag(code_id, 1, code_id) OVER(Partition By code_id ORDER BY id) AS prev_code
FROM
LongShortFlat
), b as
(
SELECT
id,
LSF,
code_id,
Sum(CASE
WHEN LSF <> prev_LSF AND code_id = prev_code
THEN 1
ELSE 0
END) OVER(Partition By code_id ORDER BY id) AS grp
FROM
a
)
select * from b order by id
Here is the updated sqlfiddle.
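
For readers who want to check the grouping logic outside the database, the following Python sketch applies the same idea: compute the running total per code_id, classify it as L, S or F, and start a new group whenever the classification differs from the previous row. The function name `number_groups` and the sample list are illustrative; the quantities are the code_id 2 rows from the question. Note that this sketch numbers groups from 1, as in the desired output, whereas the running SUM in the LAG-based query starts at 0.

```python
from itertools import groupby


def number_groups(trades):
    """trades: iterable of (id, code_id, qty).

    Returns a list of (id, code_id, qty, running_total, lsf, group_num).
    """
    result = []
    ordered = sorted(trades, key=lambda t: (t[1], t[0]))  # by code_id, then id
    for _code_id, rows in groupby(ordered, key=lambda t: t[1]):
        running_total = 0
        prev_lsf = None
        group_num = 0
        for trade_id, code_id, qty in rows:
            running_total += qty
            if running_total > 0:
                lsf = "L"
            elif running_total < 0:
                lsf = "S"
            else:
                lsf = "F"
            if lsf != prev_lsf:  # a new contiguous run starts here
                group_num += 1
            prev_lsf = lsf
            result.append((trade_id, code_id, qty, running_total, lsf, group_num))
    return result


# Rows for code_id 2 from the question: (id, code_id, qty).
sample = [
    (11, 2, 5), (12, 2, 3), (13, 2, -4), (14, 2, -2), (15, 2, -2), (16, 2, 6),
    (17, 2, -5), (18, 2, -5), (19, 2, 2), (20, 2, 4), (21, 2, -5), (22, 2, -2),
]
numbered = number_groups(sample)
for row in numbered:
    print(row)
```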

Random row from range (1:n) in groups in SQL

I need to select a random row from a table for each group, but the row's position within its group (ordered by time) must not exceed a constant (for example, const = 3).
What I mean:
id time x
1 10:20 1
1 11:21 9
1 16:14 4
1 08:13 8
2 01:20 2
2 21:13 0
For id=1 rows could be:
id time x
1 10:20 1
1 11:21 9
1 08:13 8
but not
1 16:14 4
because, ordered by time, its row number within the group is greater than 3.
For id = 2, any row is acceptable.

WITH cte as (
SELECT *, ROW_NUMBER() OVER (partition by id ORDER BY RANDOM()) as rn
FROM myTable
)
SELECT *
FROM cte
WHERE rn <= 3

Something like this:
SELECT distinct on (id) *
FROM (select *,
row_number() over (partition by id order by time) as up_lim
from tab1) as a
WHERE up_lim <= 3
ORDER by id, random();
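
The intended logic, as I read the question, is: within each id, keep only the first n rows by time, then pick one of those at random. A small Python sketch of that, with illustrative names (`pick_random_from_first_n`, `sample_rows`) and the data from the question:

```python
import random
from collections import defaultdict


def pick_random_from_first_n(rows, n=3, rng=random):
    """rows: iterable of (id, time, x). Returns {id: one randomly chosen row}.

    Only the first n rows of each id, ordered by time, are candidates.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    picks = {}
    for key, group in by_id.items():
        # Zero-padded HH:MM strings sort correctly as plain text.
        candidates = sorted(group, key=lambda r: r[1])[:n]
        picks[key] = rng.choice(candidates)
    return picks


sample_rows = [
    (1, "10:20", 1), (1, "11:21", 9), (1, "16:14", 4), (1, "08:13", 8),
    (2, "01:20", 2), (2, "21:13", 0),
]
print(pick_random_from_first_n(sample_rows))
```

For id 1 the row with time 16:14 is fourth in time order, so it can never be chosen; for id 2 both rows are candidates.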

Getting both an individual value and a sum from a table in BigQuery

Suppose after a query from a bigger dataset I have a table like this:
day x y
1 4 5
2 3 6
3 3 2
4 2 1
5 8 3
From that table I want to get the values of x and y from day 1 and the sums of x and y from all days into a new table. And how to have the results in a table with two rows instead of just one? Like this:
x y
day1 4 5
days1-5 20 17
Now the best I can do is this:
SELECT
SUM(x) AS allx,
SUM(y) AS ally,
SUM(CASE WHEN day = 1 THEN x END) AS day1x,
SUM(CASE WHEN day = 1 THEN y END) AS day1y
FROM (
..
..
)
I guess there is a more clever way of doing this.

BigQuery - Legacy SQL:
Using comma style UNION ALL
SELECT
day, x, y
FROM
( SELECT 'day1' AS day, x, y
FROM YourTable
WHERE day = 1 ),
( SELECT
CONCAT('day1-',STRING(COUNT(1))) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable )
OR
Using ROLLUP
SELECT
CONCAT('day_', IFNULL(STRING(day), 'all')) AS day,
x,
y
FROM (
SELECT
DAY,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
GROUP BY ROLLUP(day)
)
WHERE IFNULL(day, 1) = 1
BigQuery - Standard SQL:
Don't forget to uncheck Use Legacy SQL checkbox under Show Options
SELECT
'day1' AS day,
x,
y
FROM YourTable
WHERE day = 1
UNION ALL
SELECT
FORMAT('day1-%d', COUNT(1)) AS day,
SUM(x) AS x,
SUM(y) AS y
FROM YourTable
Output from all is as expected:
day x y
day1 4 5
day1-5 20 17
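
As a quick cross-check of that output, independent of BigQuery, the same two rows can be derived in a few lines of Python. The names `day_rows` and `day1_and_totals` are illustrative, and the data is the sample table from the question.

```python
# Sample table from the question: (day, x, y).
day_rows = [(1, 4, 5), (2, 3, 6), (3, 3, 2), (4, 2, 1), (5, 8, 3)]


def day1_and_totals(rows):
    """Return the day-1 row followed by one row of totals, like the UNION ALL."""
    first = [("day1", x, y) for day, x, y in rows if day == 1]
    total = (
        "day1-%d" % len(rows),          # mirrors FORMAT('day1-%d', COUNT(1))
        sum(x for _day, x, _y in rows),
        sum(y for _day, _x, y in rows),
    )
    return first + [total]


summary = day1_and_totals(day_rows)
for row in summary:
    print(row)
```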

"Cluster" Code Help in SQL

I am a relative newcomer to SQL, but have gained many useful ideas through the site. Now I'm stuck on a piece of code that seems simple enough, but for some reason I can't wrap my head around it.
I am trying to create a third column (Column Z) based off of the first two columns below:
Column X Column Y
-------------------
1 a
1 b
1 c
2 a
2 d
2 e
2 f
4 b
5 i
5 c
3 g
3 h
6 j
6 k
6 l
What I need to have happen in Column Z:
For each individual value found in Column Y, note the value of Column X
Likewise, for each individual value in Column X, note the value of Column Y
Then, cluster (RANK/ROW_NUMBER?) these into groups seen below:
Column X Column Y Column Z
-----------------------------
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
I hope I've been clear enough without over-complicating things. My head has been spinning all morning. Let me know if anyone needs any more info.
Greatly appreciated in advance!

I have faced exactly this problem for some analyses in the past. The only way I could get it to work is by doing a loop, that incrementally adds in the information.
The loop assigns the minimum "x" value within each group as the group id. By your rules, this is guaranteed to be unique. It starts by assigning the current x value to z. It then finds the minimum z along the x and y dimensions. It repeats this process until no records change.
Given your data, the following is an outline of how to do it:
update t set z = x
while 1=1
begin
with toupdate as (
select t.*,
min(z) over (partition by x) as idx,
min(z) over (partition by y) as idy from t
)
update toupdate
set z = (case when idx < idy then idx else idy end)
where z > idx or z > idy;
if (@@ROWCOUNT = 0) break;
end;
;with a as
(
select z, dense_rank() over (order by z) newZ from t
)
update a set z = newZ
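
Outside the database, the same clustering can be expressed as a connected-components problem: every row links an x value to a y value, and rows belong to the same cluster when they are connected through shared values. The sketch below swaps in a union-find structure for the iterative update loop, so it is a different technique that should give the same grouping, not a translation of the SQL above. All names (`cluster`, `pairs`, `clustered`) are illustrative, and clusters are numbered by their smallest x, as in the loop.

```python
def cluster(pairs):
    """pairs: list of (x, y). Returns a list of (x, y, z) with z = cluster number."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag values so an x of 1 and a y of 1 would never be confused.
    for x, y in pairs:
        union(("x", x), ("y", y))

    # Smallest x in each component identifies the cluster.
    min_x = {}
    for x, _y in pairs:
        root = find(("x", x))
        min_x[root] = min(min_x.get(root, x), x)

    # Dense rank of those minima gives consecutive cluster numbers 1, 2, 3, ...
    rank = {m: i + 1 for i, m in enumerate(sorted(set(min_x.values())))}
    return [(x, y, rank[min_x[find(("x", x))]]) for x, y in pairs]


# Data from the question.
pairs = [
    (1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "d"), (2, "e"), (2, "f"),
    (4, "b"), (5, "i"), (5, "c"), (3, "g"), (3, "h"),
    (6, "j"), (6, "k"), (6, "l"),
]
clustered = cluster(pairs)
for row in clustered:
    print(row)
```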

Maybe not the best way, but it works
SQLFiddle http://sqlfiddle.com/#!3/99532/1
;WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS row_nb
FROM #t
)
, c2 AS (
SELECT e1.*
,CASE WHEN EXISTS(SELECT * FROM cte e2 WHERE e1.Y = e2.Y and e2.row_nb < e1.row_nb) THEN 1 ELSE 0 END as ex
FROM cte e1
)
, c3 AS (
SELECT X,1 - SIGN(SUM(ex)) as ex,MAX(row_nb) as max_row_nb
FROM c2
GROUP BY X
)
SELECT
cte.X,cte.Y
,(SELECT SUM(cc3.ex) FROM c3 cc3 where cc3.max_row_nb<= c3.max_row_nb) AS Z
FROM cte
INNER JOIN c3
ON c3.X = cte.X
ORDER BY cte.row_nb

declare @t table (x tinyint, y char(1), z tinyint)
insert @t (x,y) values(1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'d'),(2,'e'),(2,'c'),
(2,'f'),(4,'b'),(5,'i'),(5,'c'),(3,'g'),(3,'h'),(6,'j'),(6,'k'),(6,'l'),(7,'v')
;with a as
(
select x,parent from
(
select x, min(x) over (partition by y) parent from @t
) a
where x > parent
), b as
(
select x, parent from a
union all
select a.x, b.parent
from a join b on a.parent = b.x
), c as
(
select x, min(parent) parent
from b
group by x
), d as
(
select t.x,t.y, t.z,
dense_rank() over (order by coalesce(c.parent, t.x)) calculatedZ
from @t t
left join c on t.x = c.x
)
select x,y,calculatedZ as z from d
-- if you want to update instead of selecting, replace last line with:
-- update d set z = calculatedZ
-- select x,y,z from @t
option (maxrecursion 0)
Result:
x y z
1 a 1
1 b 1
1 c 1
2 a 1
2 d 1
2 e 1
2 c 1
2 f 1
4 b 1
5 i 1
5 c 1
3 g 2
3 h 2
6 j 3
6 k 3
6 l 3
8 j 3
7 v 4
