Calculating loads while avoiding cursors - SQL

The following table represents a bus route where passengers get on and off the bus and are counted by a door sensor. There is also a person on the bus with a clipboard who records a spot count at some stops.
CREATE TABLE BusLoad(
ROUTE CHAR(4) NOT NULL,
StopNumber INT NOT NULL,
ONS INT,
OFFS INT,
SPOT_CHECK INT)
go
INSERT BusLoad VALUES('AAAA', 1, 5, 0, null)
INSERT BusLoad VALUES('AAAA', 2, 0, 0, null)
INSERT BusLoad VALUES('AAAA', 3, 2, 1, null)
INSERT BusLoad VALUES('AAAA', 4, 6, 3, 8)
INSERT BusLoad VALUES('AAAA', 5, 1, 0, null)
INSERT BusLoad VALUES('AAAA', 6, 0, 1, 7)
INSERT BusLoad VALUES('AAAA', 7, 0, 3, null)
I want to add a column "LOAD" to this table that calculates the load at each stop.
If SPOT_CHECK is null, LOAD = previous stop's LOAD + current stop's ONS - current stop's OFFS;
otherwise LOAD = SPOT_CHECK.
Expected Results:
ROUTE StopNumber ONS OFFS SPOT_CHECK LOAD
AAAA 1 5 0 NULL 5
AAAA 2 0 0 NULL 5
AAAA 3 2 1 NULL 6
AAAA 4 6 3 8 8
AAAA 5 1 0 NULL 9
AAAA 6 0 1 7 7
AAAA 7 0 3 NULL 4
I can do this with a cursor, but is there a way to do it using a query?

You can use the following query:
select ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
SUM(CASE WHEN SPOT_CHECK IS NULL THEN 0 ELSE 1 END)
OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
from BusLoad
to get:
ROUTE StopNumber ONS OFFS SPOT_CHECK ld grp
----------------------------------------------------
AAAA 1 5 0 NULL 5 0
AAAA 2 0 0 NULL 0 0
AAAA 3 2 1 NULL 1 0
AAAA 4 6 3 8 8 1
AAAA 5 1 0 NULL 1 1
AAAA 6 0 1 7 7 2
AAAA 7 0 3 NULL -3 2
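All you need now is a running total of ld within each (ROUTE, grp) partition:
;WITH CTE AS (
-- ld: the spot check if there is one, otherwise the net change at this stop
-- grp: number of spot checks seen so far, so every spot check starts a new group
select ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK,
COALESCE(SPOT_CHECK, ONS - OFFS) AS ld,
SUM(CASE WHEN SPOT_CHECK IS NULL THEN 0 ELSE 1 END)
OVER (PARTITION BY ROUTE ORDER BY StopNumber) AS grp
from BusLoad
)
select ROUTE, StopNumber, ONS, OFFS, SPOT_CHECK, grp,
sum(ld) over (PARTITION BY ROUTE, grp ORDER BY StopNumber) as load
from CTE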
Note: The above query works in SQL Server 2012 and later. For SQL Server 2008 you have to simulate sum() over (order by ...) some other way. There are many relevant posts about that on Stack Overflow.
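To make the grouping idea easier to follow, here is a rough Python sketch of the same two steps (compute ld and grp, then keep a running total per route and grp). It is only an illustration of the logic, not a replacement for the query; the names BUS_ROWS and bus_loads are made up for this example.

```python
# Sample rows from the question: (route, stop_number, ons, offs, spot_check)
BUS_ROWS = [
    ("AAAA", 1, 5, 0, None),
    ("AAAA", 2, 0, 0, None),
    ("AAAA", 3, 2, 1, None),
    ("AAAA", 4, 6, 3, 8),
    ("AAAA", 5, 1, 0, None),
    ("AAAA", 6, 0, 1, 7),
    ("AAAA", 7, 0, 3, None),
]


def bus_loads(rows):
    """Return (route, stop_number, grp, load) using the ld/grp technique."""
    spot_checks_seen = {}  # route -> number of spot checks so far (the grp value)
    running_total = {}     # (route, grp) -> running sum of ld
    result = []
    for route, stop, ons, offs, spot in sorted(rows, key=lambda r: (r[0], r[1])):
        # grp goes up by one on every row that has a spot check
        grp = spot_checks_seen.get(route, 0) + (1 if spot is not None else 0)
        spot_checks_seen[route] = grp
        # ld is the spot check if present, otherwise the net change at this stop
        ld = spot if spot is not None else ons - offs
        running_total[(route, grp)] = running_total.get((route, grp), 0) + ld
        result.append((route, stop, grp, running_total[(route, grp)]))
    return result


loads = [row[3] for row in bus_loads(BUS_ROWS)]
```

Because every spot check opens a new group, the running total restarts from the spot check value, which is exactly what the second window function does.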

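You can use a recursive query. Note that it assumes the stop numbers are consecutive and start at 1, and that the anchor is limited to route 'AAAA':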
with act_load as
(
select *, ons load
from busload
where stopnumber = 1 and route = 'AAAA'
union all
select b.*, case when b.spot_check is null then l.load + b.ons - b.offs
else b.spot_check
end load
from busload b
join act_load l on b.StopNumber = l.StopNumber + 1 and
b.route = l.route
)
select *
from act_load
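The recurrence the recursive query follows can be sketched in Python like this. It is a minimal illustration under the same assumptions as the query (one route, stops numbered 1, 2, 3, ... without gaps); ROUTE_ROWS and loads_by_recurrence are hypothetical names.

```python
# Sample rows from the question: (route, stop_number, ons, offs, spot_check)
ROUTE_ROWS = [
    ("AAAA", 1, 5, 0, None),
    ("AAAA", 2, 0, 0, None),
    ("AAAA", 3, 2, 1, None),
    ("AAAA", 4, 6, 3, 8),
    ("AAAA", 5, 1, 0, None),
    ("AAAA", 6, 0, 1, 7),
    ("AAAA", 7, 0, 3, None),
]


def loads_by_recurrence(rows, route="AAAA"):
    """Return {stop_number: load}, stepping from stop n to stop n + 1."""
    by_stop = {stop: (ons, offs, spot)
               for r, stop, ons, offs, spot in rows if r == route}
    if 1 not in by_stop:
        return {}
    # Anchor member: the load at stop 1 is simply its ONS, as in the query
    loads = {1: by_stop[1][0]}
    stop = 1
    # Recursive member: join stop n + 1 to the row already produced for stop n
    while stop + 1 in by_stop:
        ons, offs, spot = by_stop[stop + 1]
        loads[stop + 1] = spot if spot is not None else loads[stop] + ons - offs
        stop += 1
    return loads


recurrence_loads = loads_by_recurrence(ROUTE_ROWS)
```

If a stop number is missing, the walk stops there, which mirrors how the join on StopNumber + 1 would end the recursion.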


SQL to ensure unique node names in adjacency list

So I have an adjacency list that forms a hierarchy simulating a versioned file structure. The problem is that the incoming file names are not currently unique and they need to be. To make things slightly more interesting the files may have different versions which should keep the name of the first version (note the versions all have the same NodeID).
Adjacency List
ParentID NodeID VersionNum FileName
--------- ------- ----------- ---------------
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument
1 5 2 FirstDocument
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument
3 11 1 ThirdDocument
Targeted Result
ParentID NodeID VersionNum FileName
--------- ------- ----------- ---------------
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument_1
1 5 2 FirstDocument_1
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument_1
3 11 1 ThirdDocument_2
*I should also note that the folder names are already guaranteed to be unique (they already exist, it is the documents that are incoming) and they only have 1 version.
CREATE TABLE #tmp_tree
(
ParentID INT,
NodeID INT,
VersionNum INT,
FileName VARCHAR(50),
);
INSERT INTO #tmp_tree (ParentID, NodeID, VersionNum, FileName)
VALUES (-1, 1, 1, 'FirstFolder' ),
(1, 2, 1, 'SecondFolder' ),
(1, 3, 1, 'ThirdFolder' ),
(1, 4, 1, 'FirstDocument' ),
(1, 4, 2, 'FirstDocument' ),
(1, 5, 1, 'FirstDocument' ),
(1, 5, 2, 'FirstDocument' ),
(2, 6, 1, 'FirstDocument' ),
(2, 6, 2, 'FirstDocument' ),
(2, 7, 1, 'SecondDocument' ),
(3, 8, 1, 'SecondDocument' ),
(3, 9, 1, 'ThirdDocument' ),
(3, 9, 2, 'ThirdDocument' ),
(3, 10, 1, 'ThirdDocument' ),
(3, 11, 1, 'ThirdDocument' );
I really don't know how to approach this without resorting to a stored procedure. Adjacency lists scream CTEs to me, but that got me nowhere fast. GROUP BY loses the NodeID, so while I can find the names of the documents that need to be renamed, I don't know how to use that to select the second occurrence of the name (ordered by NodeID).
-- I don't see how this helps... but this finds the names that need to change.
select ParentID, FileName,VersionNum, count(*) from #tmp_tree
GROUP BY ParentID, FileName, VersionNum
HAVING VersionNum = 1 and count(*) > 1
order by FileName
I know how to solve this procedurally but not declaratively.
I don't know if this is closer or farther away from the solution:
select f2.*, Row_Number() over (order by f2.FileName) from
(select top 10 f.*, count(FileName) over (PARTITION by ParentID, FileName) as n from (select * from #tmp_tree where versionNum = 1) as f
order by f.ParentID, f.FileName) as f2
Where n > 1
I would assume the last line (3, 11) in the targeted result is a mistake.
You can find the repeated names with a window function in a subquery and then join it during the update. In short, you can do:
update #tmp_tree
set #tmp_tree.filename = concat(#tmp_tree.filename, '_', x.rn)
from #tmp_tree
join (
select *,
row_number() over(partition by parentid, filename order by nodeid) as rn
from #tmp_tree
where versionnum = 1
) x on x.rn > 1 and x.nodeid = #tmp_tree.nodeid;
Result:
ParentID NodeID VersionNum FileName
--------- ------- ----------- ---------------
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument_2
1 5 2 FirstDocument_2
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument_2
You don't need to self-join the table. You can update the derived table directly after calculating the row number with DENSE_RANK:
update x
set filename = concat(x.filename, '_', x.rn)
from (
select *,
dense_rank() over(partition by parentid, filename order by nodeid) as rn
from #tmp_tree
) x
where x.rn > 1;
DENSE_RANK returns the same number for rows that tie on the ordering clause, so every version of a node (same NodeID) gets the same suffix.
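As a rough illustration of what the DENSE_RANK version does, here is a small Python sketch. It ranks the distinct NodeIDs within each (ParentID, FileName) pair and appends the rank when it is greater than 1, so it yields the _2, _3 suffixes of the queries above rather than the _1, _2 of the targeted result. TREE_ROWS and rename_duplicates are names invented for this example.

```python
from collections import defaultdict

# Sample rows from the question: (parent_id, node_id, version_num, file_name)
TREE_ROWS = [
    (-1, 1, 1, "FirstFolder"), (1, 2, 1, "SecondFolder"), (1, 3, 1, "ThirdFolder"),
    (1, 4, 1, "FirstDocument"), (1, 4, 2, "FirstDocument"),
    (1, 5, 1, "FirstDocument"), (1, 5, 2, "FirstDocument"),
    (2, 6, 1, "FirstDocument"), (2, 6, 2, "FirstDocument"),
    (2, 7, 1, "SecondDocument"), (3, 8, 1, "SecondDocument"),
    (3, 9, 1, "ThirdDocument"), (3, 9, 2, "ThirdDocument"),
    (3, 10, 1, "ThirdDocument"), (3, 11, 1, "ThirdDocument"),
]


def rename_duplicates(rows):
    """Append _<rank> to names whose node is not the first in its folder."""
    # Collect the distinct node ids for each (parent, name) pair
    nodes_by_key = defaultdict(set)
    for parent, node, _version, name in rows:
        nodes_by_key[(parent, name)].add(node)
    # Dense rank: all versions of one node share the rank of that node id
    rank = {}
    for key, nodes in nodes_by_key.items():
        for position, node in enumerate(sorted(nodes), start=1):
            rank[(key, node)] = position
    renamed = []
    for parent, node, version, name in rows:
        r = rank[((parent, name), node)]
        renamed.append((parent, node, version, name if r == 1 else f"{name}_{r}"))
    return renamed


names_by_node = {(node, version): name
                 for _parent, node, version, name in rename_duplicates(TREE_ROWS)}
```

Changing the suffix to r - 1 would reproduce the numbering of the targeted result, if that is what you prefer.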

How to get a Parent-Child record from different level in a same table in SSIS?

Below are the details and output required.
The table has 3 columns:
Record
Parent Record
isactive
The required output, based on the Isactive column, is shown below:
e.g. 1
Record Parent_Record Isactive
1 0 1
2 1 0
3 1 0
4 2 0
5 3 1
output
Record Parent_Record Isactive
5 1 1
e.g. 2
Record Parent_Record Isactive
1 0 0
2 1 0
3 1 1
4 2 0
5 3 1
output
Record Parent_Record Isactive
5 3 1
You can use a recursive CTE to walk the hierarchy from the leaf node up through its ancestors and pick the highest active one:
declare @tmp table (Record int, Parent_Record int, Isactive bit)
declare @recordToCheck int = 5
insert into @tmp values
(1, 0, 1)
,(2, 1, 0)
,(3, 1, 0)
,(4, 2, 0)
,(5, 3, 1)
;WITH RESULT (Record, Parent_Record, Isactive, Lev)
AS
(
--anchor: the record we start from
SELECT A.Record, A.Parent_Record, A.Isactive, 1 AS Lev
FROM @tmp AS A
WHERE A.Record = @recordToCheck
UNION ALL
--recursive member: step up to the parent of the previous level
SELECT C.Record, C.Parent_Record, C.Isactive, B.Lev + 1
FROM @tmp AS C
INNER JOIN RESULT AS B
ON C.Record = B.Parent_Record
)
select top 1 @recordToCheck as Record, Record as Parent_Record, Isactive
from RESULT
where Isactive = 1
order by Lev desc
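Result for e.g. 1:
Record Parent_Record Isactive
5 1 1
Result for e.g. 2 (with the second data set loaded instead):
Record Parent_Record Isactive
5 3 1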
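If it helps to see the walk without the CTE syntax, this Python sketch follows the parent links from the starting record and remembers the last active record it meets, which is the one furthest up the chain. It is only a sketch of the logic; EXAMPLE_1, EXAMPLE_2 and highest_active_ancestor are hypothetical names, and like the query it counts the starting record itself as a candidate.

```python
# {record: (parent_record, isactive)} for the two examples in the question
EXAMPLE_1 = {1: (0, 1), 2: (1, 0), 3: (1, 0), 4: (2, 0), 5: (3, 1)}
EXAMPLE_2 = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (2, 0), 5: (3, 1)}


def highest_active_ancestor(records, start):
    """Return (start, highest active record on the path to the root, 1) or None."""
    best = None
    current = start
    visited = set()  # guards against cycles in malformed data
    while current in records and current not in visited:
        visited.add(current)
        parent, is_active = records[current]
        if is_active:
            best = current  # later hits are higher up, so they overwrite
        current = parent
    return None if best is None else (start, best, 1)


result_1 = highest_active_ancestor(EXAMPLE_1, 5)
result_2 = highest_active_ancestor(EXAMPLE_2, 5)
```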

Adjusting table based on previous values in BigQuery

I have a table that looks like below:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|0| 0
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 0
2 |3/1/16|1| 0
3 |3/1/16|2| 0
I'm trying to make it so that flag is populated if X=2 in the PREVIOUS month. As such, it should look like this:
ID|Date |X| Flag |
1 |1/1/16|2| 0
2 |1/1/16|0| 0
3 |1/1/16|0| 0
1 |2/1/16|2| 1
2 |2/1/16|1| 0
3 |2/1/16|2| 0
1 |3/1/16|2| 1
2 |3/1/16|1| 0
3 |3/1/16|2| 1
I use this in SQL:
`select ID, date, X, flag into Work_Table from t
(
Select ID, date, X, flag,
Lag(X) Over (Partition By ID Order By date Asc) As Prev into Flag_table
From Work_Table
)
Update [dbo].[Flag_table]
Set flag = 1
where prev = '2'
UPDATE t
Set t.flag = [dbo].[Flag_table].flag FROM T
JOIN [dbo].[Flag_table]
ON t.ID= [dbo].[Flag_table].ID where T.date = [dbo].[Flag_table].date`
However, I cannot do this in BigQuery. Any ideas?
The following works in BigQuery Standard SQL:
#standardSQL
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
You can test it with the dummy data from your question:
#standardSQL
WITH `project.dataset.work_table` AS (
SELECT 1 id, '1/1/16' dt, 2 x, 0 flag UNION ALL
SELECT 2, '1/1/16', 0, 0 UNION ALL
SELECT 3, '1/1/16', 0, 0 UNION ALL
SELECT 1, '2/1/16', 0, 0 UNION ALL
SELECT 2, '2/1/16', 1, 0 UNION ALL
SELECT 3, '2/1/16', 2, 0 UNION ALL
SELECT 1, '3/1/16', 2, 0 UNION ALL
SELECT 2, '3/1/16', 1, 0 UNION ALL
SELECT 3, '3/1/16', 2, 0
)
SELECT id, dt, x,
IF(LAG(x = 2) OVER(PARTITION BY id ORDER BY dt), 1, 0) flag
FROM `project.dataset.work_table`
ORDER BY dt, id
with this result:
Row id dt x flag
1 1 1/1/16 2 0
2 2 1/1/16 0 0
3 3 1/1/16 0 0
4 1 2/1/16 0 1
5 2 2/1/16 1 0
6 3 2/1/16 2 0
7 1 3/1/16 2 0
8 2 3/1/16 1 0
9 3 3/1/16 2 1
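For readers who want to check the LAG logic outside BigQuery, here is a small Python sketch of the same idea: for each id, look at the previous row by date and set the flag when its x was 2. One assumption to call out: the sketch parses the dates as month/day/year, whereas the query above orders the dt strings as text, which happens to give the same order for this sample. FLAG_ROWS and add_flags are made-up names.

```python
from datetime import datetime

# Dummy data from the answer: (id, dt, x)
FLAG_ROWS = [
    (1, "1/1/16", 2), (2, "1/1/16", 0), (3, "1/1/16", 0),
    (1, "2/1/16", 0), (2, "2/1/16", 1), (3, "2/1/16", 2),
    (1, "3/1/16", 2), (2, "3/1/16", 1), (3, "3/1/16", 2),
]


def add_flags(rows, date_format="%m/%d/%y"):
    """Return (id, dt, x, flag) ordered by date then id; flag = previous x was 2."""
    # Order by id, then by the parsed date, like PARTITION BY id ORDER BY dt
    parsed = sorted((id_, datetime.strptime(dt, date_format), dt, x)
                    for id_, dt, x in rows)
    previous_x = {}  # id -> x of the previous row for that id
    flagged = []
    for id_, when, dt, x in parsed:
        flag = 1 if previous_x.get(id_) == 2 else 0
        previous_x[id_] = x
        flagged.append((id_, when, dt, x, flag))
    flagged.sort(key=lambda row: (row[1], row[0]))
    return [(id_, dt, x, flag) for id_, _when, dt, x, flag in flagged]


flags = [row[3] for row in add_flags(FLAG_ROWS)]
```

With real data it is probably safer to store dt as a DATE (or parse it in the query) so the ordering does not depend on how the strings happen to sort.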

skip consecutive rows after specific value

Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception:
skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
-- temp table, so the queries below can reference #req_data
create table #req_data (
id int primary key identity,
session_id int,
value int
)
insert into #req_data(session_id, value) values (5, 7)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (5, 1) -- ignore this one too
insert into #req_data(session_id, value) values (5, 12)
insert into #req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (5, 1) -- show this too
insert into #req_data(session_id, value) values (6, 7)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into #req_data(session_id, value) values (6, 3)
insert into #req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into #req_data(session_id, value) values (7, 7)
insert into #req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into #req_data(session_id, value) values (8, 2)
insert into #req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by id) = 7
then 1
else 0
end as 'skip'
from #req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
The following should work well for this.
WITH
cte_ControlValue AS (
SELECT
rd.id, rd.session_id, rd.value,
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
)
SELECT
cv.id, cv.session_id, cv.value
FROM
cte_ControlValue cv
WHERE
cv.value = 1
AND cv.ControlValue <> 7;
Results...
id session_id value
----------- ----------- -----------
5 5 1
6 5 1
7 5 1
11 6 1
13 8 1
Edit: How and why it works...
The basic premise is taken from Itzik Ben-Gan's "The Last non NULL Puzzle".
Essentially, we are relying on two different behaviors that most people don't usually think about...
1) NULL + anything = NULL.
2) You can CAST or CONVERT an INT into a fixed length BINARY data type and it will continue to sort as an INT (as opposed to sorting like a text string).
This is easier to see when the intermediate steps are added to the query in the CTE...
SELECT
rd.id, rd.session_id, rd.value,
bv.BinVal,
SmearedBinVal = MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id),
SecondHalfAsINT = CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT),
ControlValue = ISNULL(CAST(SUBSTRING(MAX(bv.BinVal) OVER (PARTITION BY rd.session_id ORDER BY rd.id), 5, 4) AS INT), 999)
FROM
#req_data rd
CROSS APPLY ( VALUES (CAST(rd.id AS BINARY(4)) + CAST(NULLIF(rd.value, 1) AS BINARY(4))) ) bv (BinVal)
Results...
id session_id value BinVal SmearedBinVal SecondHalfAsINT ControlValue
----------- ----------- ----------- ------------------ ------------------ --------------- ------------
1 5 7 0x0000000100000007 0x0000000100000007 7 7
2 5 1 NULL 0x0000000100000007 7 7
3 5 1 NULL 0x0000000100000007 7 7
4 5 12 0x000000040000000C 0x000000040000000C 12 12
5 5 1 NULL 0x000000040000000C 12 12
6 5 1 NULL 0x000000040000000C 12 12
7 5 1 NULL 0x000000040000000C 12 12
8 6 7 0x0000000800000007 0x0000000800000007 7 7
9 6 1 NULL 0x0000000800000007 7 7
10 6 3 0x0000000A00000003 0x0000000A00000003 3 3
11 6 1 NULL 0x0000000A00000003 3 3
12 7 7 0x0000000C00000007 0x0000000C00000007 7 7
13 8 1 NULL NULL NULL 999
14 8 2 0x0000000E00000002 0x0000000E00000002 2 2
15 8 3 0x0000000F00000003 0x0000000F00000003 3 3
Looking at the BinVal column, we see an 8-byte hex value for every row where [value] <> 1 and NULL where [value] = 1. The first 4 bytes are the id (used for ordering) and the last 4 bytes are [value] (used to carry the "previous non-1 value", or to set the whole thing to NULL).
The 2nd step is to "smear" the non-NULL values into the NULLs using the window framed MAX function, partitioned by session_id and ordered by id.
The 3rd step is to parse out the last 4 bytes and convert them back to an INT data type (SecondHalfAsINT) and deal with any nulls that result from not having any non-1 preceding value (ControlValue).
Since we can't reference a windowed function in the WHERE clause, we have to throw the query into a CTE (a derived table would work just as well) so that we can use the new ControlValue in the where clause.
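The three steps can also be tried out in Python, where bytes objects compare byte by byte just like fixed-length BINARY values. This is a sketch of the idea only, assuming non-negative ids and values (big-endian packing only preserves integer order in that case); SESSION_ROWS and ids_to_show are hypothetical names.

```python
import struct

# Sample rows from the question: (id, session_id, value)
SESSION_ROWS = [
    (1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1), (7, 5, 1),
    (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1),
    (12, 7, 7),
    (13, 8, 1), (14, 8, 2), (15, 8, 3),
]


def ids_to_show(rows):
    """Return ids of value-1 rows whose last preceding non-1 value is not 7."""
    smeared = {}  # session_id -> running MAX of the packed (id, value) bytes
    shown = []
    for id_, session_id, value in sorted(rows):
        if value != 1:
            # Step 1: 4 bytes of id followed by 4 bytes of value (NULL when value = 1)
            packed = struct.pack(">ii", id_, value)
            # Step 2: running MAX per session "smears" the latest non-1 row forward
            if session_id not in smeared or packed > smeared[session_id]:
                smeared[session_id] = packed
        # Step 3: take the last 4 bytes back out, defaulting to 999 like ISNULL
        latest = smeared.get(session_id)
        control_value = 999 if latest is None else struct.unpack(">ii", latest)[1]
        if value == 1 and control_value != 7:
            shown.append(id_)
    return shown


shown_ids = ids_to_show(SESSION_ROWS)
```

Because the id sits in the leading bytes, the maximum packed value is always the most recent non-1 row, which is why MAX works as a "last non-NULL" here.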
SELECT CRow.id
FROM #req_data AS CRow
-- find the closest preceding row in the same session whose value is not 1
CROSS APPLY (SELECT MAX(id) AS id
FROM #req_data AS PRev
WHERE PRev.id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1) AS MaxPRow
LEFT JOIN #req_data AS PRow ON MaxPRow.id = PRow.id
-- keep the value 1 rows unless that preceding row is a 7
WHERE CRow.value = 1 AND ISNULL(PRow.value, 1) <> 7
You can use the following query:
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
to get:
id session_id value grp
----------------------------
1 5 7 1
2 5 1 1
3 5 1 1
4 5 12 2
5 5 1 2
6 5 1 2
7 5 1 2
8 6 7 1
9 6 1 1
10 6 3 2
11 6 1 2
12 7 7 1
13 8 1 0
14 8 2 1
15 8 3 2
So, this query detects islands of consecutive 1 records that belong to the same group, as specified by the first preceding row with value <> 1.
You can use a window function once more to flag the islands that contain a 7. If you wrap this in a second CTE, you can then get the desired result by filtering out those islands:
;with session_islands as (
select id, session_id, value,
coalesce(sum(case when value <> 1 then 1 end)
over (partition by session_id order by id), 0) as grp
from #req_data
), islands_with_7 as (
select id, grp, value,
count(case when value = 7 then 1 end)
over (partition by session_id, grp) as cnt_7
from session_islands
)
select id
from islands_with_7
where cnt_7 = 0 and value = 1
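The same two passes can be sketched in Python: first label each row with grp (the number of non-1 values seen so far in its session), then drop the value-1 rows whose (session_id, grp) island contains a 7. This is just an illustration of the logic; ISLAND_ROWS and ids_outside_7_islands are made-up names.

```python
from collections import defaultdict

# Sample rows from the question: (id, session_id, value)
ISLAND_ROWS = [
    (1, 5, 7), (2, 5, 1), (3, 5, 1), (4, 5, 12), (5, 5, 1), (6, 5, 1), (7, 5, 1),
    (8, 6, 7), (9, 6, 1), (10, 6, 3), (11, 6, 1),
    (12, 7, 7),
    (13, 8, 1), (14, 8, 2), (15, 8, 3),
]


def ids_outside_7_islands(rows):
    """Return ids of value-1 rows that are not in an island opened by a 7."""
    non_one_seen = defaultdict(int)  # session_id -> running count of value <> 1
    labelled = []                    # (id, session_id, grp, value)
    island_has_7 = set()             # (session_id, grp) pairs that contain a 7
    for id_, session_id, value in sorted(rows):
        if value != 1:
            non_one_seen[session_id] += 1
        grp = non_one_seen[session_id]
        labelled.append((id_, session_id, grp, value))
        if value == 7:
            island_has_7.add((session_id, grp))
    return [id_ for id_, session_id, grp, value in labelled
            if value == 1 and (session_id, grp) not in island_has_7]


island_ids = ids_outside_7_islands(ISLAND_ROWS)
```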

Different select criteria in odd and even events

I have a table that looks like this (10 billion rows):
AID BID CID
1 2 1
1 6 9
0 1 4
1 3 2
1 100 2
0 4 2
0 0 1
The AID could only be 0 or 1. BID and CID could be anything.
Now I want to select events first with AID=1 and then AID=0, and again AID=1 and then AID=0.
The idea is to select equal numbers of AID=1 and AID=0 events.
How can I achieve that?
The expected result is
AID BID CID
1 2 1
0 1 4
1 6 9
0 4 2
1 3 2
0 0 1
;WITH cte AS (
select *
FROM (VALUES
(1, 2, 1),
(1, 6, 9),
(0, 1, 4),
(1, 3, 2),
(1, 100, 2),
(0, 4, 2),
(0, 0, 1)
) as t(AID, BID, CID)
),
withrow AS (
SELECT ROW_NUMBER() OVER (PARTITION BY AID ORDER BY AID) as RN, *
FROM cte)
SELECT AID,BID,CID
FROM withrow
ORDER BY RN asc , aid desc
Output:
AID BID CID
----------- ----------- -----------
1 100 2
0 4 2
1 3 2
0 1 4
1 6 9
0 0 1
1 2 1
(7 row(s) affected)
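Note that the output above still contains the unmatched seventh row and, because the ROW_NUMBER ordering is not tied to any real column, the rows within each AID come back in an arbitrary order. As a rough sketch of the pairing the question describes (equal numbers of each, alternating 1 then 0), here is a small in-memory Python version. It assumes the input order is the order you want to keep and is obviously not meant for 10 billion rows; EVENT_ROWS and interleave_events are hypothetical names.

```python
# Sample rows from the question: (AID, BID, CID)
EVENT_ROWS = [
    (1, 2, 1),
    (1, 6, 9),
    (0, 1, 4),
    (1, 3, 2),
    (1, 100, 2),
    (0, 4, 2),
    (0, 0, 1),
]


def interleave_events(rows):
    """Pair the n-th AID=1 row with the n-th AID=0 row, dropping leftovers."""
    ones = [row for row in rows if row[0] == 1]
    zeros = [row for row in rows if row[0] == 0]
    paired = []
    # zip stops at the shorter list, which keeps the two counts equal
    for one, zero in zip(ones, zeros):
        paired.extend((one, zero))
    return paired


interleaved = interleave_events(EVENT_ROWS)
```

In SQL the equivalent would be to number the rows per AID by a column that defines the real order and keep only the row numbers that exist for both AID values.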