Sum over multiple levels of nested repeated fields

I have several order-detail tables in the source database: Order Header -> Order Line -> Shipped Line -> Received Line
I create a BQ table with two levels of nested repeated fields. Here is what some sample data looks like:
WITH stol as (
SELECT 1 AS stol_id, "stol-1.1" AS stol_number, 1 AS stol_transfer_order_line_id, 3 AS stol_quantity
UNION ALL
SELECT 2 AS stol_id, "stol-2.1" AS stol_number, 2 AS stol_transfer_order_line_id, 2 AS stol_quantity
UNION ALL
SELECT 3 AS stol_id, "stol-2.2" AS stol_number, 2 AS stol_transfer_order_line_id, 2 AS stol_quantity
UNION ALL
SELECT 4 AS stol_id, "stol-2.3" AS stol_number, 2 AS stol_transfer_order_line_id, 1 AS stol_quantity
),
rtol as (
SELECT 1 AS stol_id, "rtol-1.1" as rtol_number, 2 as rtol_quantity
UNION ALL
SELECT 1 as stol_id, "rtol-1.2" as rtol_number, 1 AS rtol_quantity
UNION ALL
SELECT 2 as stol_id, "rtol-2.1" as rtol_number, 2 AS rtol_quantity
UNION ALL
SELECT 3 as stol_id, "rtol-2.2" as rtol_number, 1 AS rtol_quantity
),
tol as (
SELECT 1 as tol_id, "tol-1" as tol_number, 3 as tol_transfer_quantity
UNION ALL
SELECT 2 as tol_id, "tol-2" AS tol_number, 5 AS tol_transfer_quantity
),
nest AS (
SELECT s.stol_id,
s.stol_number,
s.stol_quantity,
s.stol_transfer_order_line_id,
ARRAY_AGG(STRUCT(r.rtol_number, r.rtol_quantity)) as received
FROM stol s
LEFT JOIN rtol r ON s.stol_id = r.stol_id
GROUP BY 1, 2, 3, 4
),
final as (
SELECT t.tol_id
,t.tol_number
,t.tol_transfer_quantity
,ARRAY_AGG(STRUCT(n.stol_number, n.stol_quantity, n.received)) as shipped
FROM tol t
LEFT JOIN nest n ON t.tol_id = n.stol_transfer_order_line_id
GROUP BY 1, 2, 3
)
I want to sum the shipped and received quantities for each order line. I can get the correct result like so:
shipped as (
SELECT tol_number
,SUM(stol_quantity) as shipped_q
FROM final t, t.shipped
GROUP BY 1
),
received as (
SELECT tol_number
,SUM(rtol_quantity) as received_q
FROM final t, t.shipped s, s.received
GROUP BY 1
)
SELECT t.tol_number
,t.tol_transfer_quantity
,s.shipped_q
,r.received_q
FROM final t
LEFT JOIN shipped s on t.tol_number = s.tol_number
LEFT JOIN received r ON t.tol_number = r.tol_number
Correct results:
Row tol_number tol_transfer_quantity shipped_q received_q
1 tol-1 3 3 3
2 tol-2 5 5 3
What I am wondering is whether there is a better way to do this. Something like the following over-counts the first level of nesting, but it feels and looks a lot cleaner:
SELECT tol_number
,tol_transfer_quantity
,SUM(stol_quantity) as shipped_q
,SUM(rtol_quantity) as shipped_r
FROM final t, t.shipped s, s.received
GROUP BY 1, 2
Wrong result for shipped_q:
Row tol_number tol_transfer_quantity shipped_q shipped_r
1 tol-2 5 5 3
2 tol-1 3 6 3
Many thanks for any ideas.

#standardSQL
SELECT
tol_id,
tol_transfer_quantity,
(SELECT SUM(stol_quantity) FROM final.shipped) shipped_q,
(SELECT SUM(rtol_quantity) FROM final.shipped s, s.received) shipped_r
FROM final

I'd suggest you use sub-selects in which you treat your arrays like tables:
SELECT
tol_id,
SUM(tol_transfer_quantity),
SUM( (SELECT SUM(stol_quantity) FROM final.shipped) ) shipped_q,
SUM( (SELECT SUM(rtol_quantity) FROM final.shipped s, s.received) ) shipped_r
FROM
final
GROUP BY
1
hth!
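To see why the sub-select form avoids the double counting, here is a minimal Python sketch that mirrors the shape of `final` with plain lists and dicts. The data is copied from the question's sample CTEs; the function names `flattened_sums` and `per_row_sums` are made up for illustration, and this is a model of the logic rather than BigQuery code.

```python
# Hypothetical in-memory mirror of the question's `final` table:
# one dict per order line; `shipped` and `received` are the nested arrays.
final = [
    {"tol_number": "tol-1", "tol_transfer_quantity": 3, "shipped": [
        {"stol_number": "stol-1.1", "stol_quantity": 3, "received": [
            {"rtol_number": "rtol-1.1", "rtol_quantity": 2},
            {"rtol_number": "rtol-1.2", "rtol_quantity": 1}]}]},
    {"tol_number": "tol-2", "tol_transfer_quantity": 5, "shipped": [
        {"stol_number": "stol-2.1", "stol_quantity": 2, "received": [
            {"rtol_number": "rtol-2.1", "rtol_quantity": 2}]},
        {"stol_number": "stol-2.2", "stol_quantity": 2, "received": [
            {"rtol_number": "rtol-2.2", "rtol_quantity": 1}]},
        # The LEFT JOIN in `nest` yields one all-NULL struct for this line.
        {"stol_number": "stol-2.3", "stol_quantity": 1, "received": [
            {"rtol_number": None, "rtol_quantity": None}]}]},
]


def flattened_sums(rows):
    """Like `FROM final t, t.shipped s, s.received`: one output row per
    received element, so each stol_quantity repeats once per receipt."""
    out = {}
    for t in rows:
        shipped_q = received_q = 0
        for s in t["shipped"]:
            for r in s["received"]:
                shipped_q += s["stol_quantity"]
                received_q += r["rtol_quantity"] or 0  # SUM skips NULLs
        out[t["tol_number"]] = (shipped_q, received_q)
    return out


def per_row_sums(rows):
    """Like the correlated sub-selects: each array is summed at its own level."""
    out = {}
    for t in rows:
        shipped_q = sum(s["stol_quantity"] for s in t["shipped"])
        received_q = sum(r["rtol_quantity"] or 0
                         for s in t["shipped"] for r in s["received"])
        out[t["tol_number"]] = (shipped_q, received_q)
    return out


print(flattened_sums(final))  # tol-1 shipped_q is over-counted here
print(per_row_sums(final))    # matches the "Correct results" table
```

The flattened version adds stol-1.1's quantity once per received row, which is where the 6 in the question's wrong result comes from.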

Related

How to display null values in IN operator for SQL with two conditions in where

I have this query
select *
from dbo.EventLogs
where EntityID = 60181615
and EventTypeID in (1, 2, 3, 4, 5)
and NewValue = 'Received'
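If event types 2 and 4 do not exist with NewValue 'Received', the query returns no rows for them:
[image: current results]
What I want is a row for every event type in the list, even when no matching log entry exists:
[image: what I want]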

Ideally you should maintain a table somewhere containing all possible EventTypeID values. Failing that, we can use an inline CTE along with a left join:
WITH EventTypes AS (
SELECT 1 AS ID UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
)
SELECT et.ID AS EventTypeId, el.*
FROM EventTypes et
LEFT JOIN dbo.EventLogs el
ON el.EventTypeID = et.ID AND
el.EntityID = 60181615 AND
el.NewValue = 'Received'
WHERE
et.ID IN (1,2,3,4,5);
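As a quick way to check this pattern, here is a small sketch that runs the same idea against an in-memory SQLite database from Python. The table is a cut-down, hypothetical stand-in for dbo.EventLogs (only three columns), the sample rows are invented, and the SQL is in SQLite's dialect rather than T-SQL. The point it illustrates is that the event type match and both filters sit in the ON clause, so unmatched event types survive as NULL rows.

```python
import sqlite3


def event_rows():
    """Return one row per event type 1..5, with NULL where no log matches."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for dbo.EventLogs with only the columns used here.
    con.execute(
        "CREATE TABLE EventLogs (EntityID INT, EventTypeID INT, NewValue TEXT)")
    con.executemany(
        "INSERT INTO EventLogs VALUES (?, ?, ?)",
        [(60181615, 1, "Received"),
         (60181615, 2, "Pending"),    # wrong NewValue, must not match
         (60181615, 3, "Received"),
         (60181615, 5, "Received"),
         (11111111, 4, "Received")])  # wrong entity, must not match
    rows = con.execute("""
        WITH EventTypes(ID) AS (VALUES (1), (2), (3), (4), (5))
        SELECT et.ID, el.NewValue
        FROM EventTypes et
        LEFT JOIN EventLogs el
               ON el.EventTypeID = et.ID
              AND el.EntityID = 60181615
              AND el.NewValue = 'Received'
        ORDER BY et.ID
    """).fetchall()
    con.close()
    return rows


print(event_rows())
```

Moving either filter into a WHERE clause would discard the NULL rows again, because the condition is then evaluated after the outer join.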

How to add rows to a specific number multiple times in the same query

I already asked for help on a part of my problem here.
I used to get 10 rows whether they were filled or not. But now I'm facing something else: I need to do it multiple times in the same query result.
WITH NUMBERS AS
(
SELECT 1 rowNumber
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9
UNION ALL
SELECT 10
)
SELECT DISTINCT sp.SLC_ID, c.rowNumber, c.PCE_ID
FROM SELECT_PART sp
LEFT JOIN (
SELECT b.*
FROM NUMBERS
LEFT OUTER JOIN (
SELECT a.*
FROM (
SELECT SELECT_PART.SLC_ID, ROW_NUMBER() OVER (ORDER BY SELECT_PART.SLC_ID) as
rowNumber, SELECT_PART.PCE_ID
FROM SELECT_PART
WHERE SELECT_PART.SLC_ID = (must be the same as sp.SLC_ID and can''t hardcode it)
) a
) b
ON b.rowNumber = NUMBERS.rowNumber
) c ON c.SLC_ID = sp.SLC_ID
ORDER BY sp.SLC_ID, c.rowNumber
It works fine for the first 10 lines, but the next SLC_ID only gets 1 empty line.
I need it to look like this:
SLC_ID rowNumber PCE_ID
1 1 0001
1 2 0002
1 3 NULL
1 ... ...
1 10 NULL
2 1 0011
2 2 0012
2 3 0013
2 ... ...
2 10 0020
3 1 0021
3 ... ...
Really need it that way to build a report.

Instead of manually building a query-specific number list where you have to include every possible number you need (1 through 10 in this case), create a numbers table.
DECLARE @UpperBound INT = 1000000;
;WITH cteN(Number) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id]) - 1
FROM sys.all_columns AS s1
CROSS JOIN sys.all_columns AS s2
)
SELECT [Number] INTO dbo.Numbers
FROM cteN WHERE [Number] <= @UpperBound;
CREATE UNIQUE CLUSTERED INDEX CIX_Number ON dbo.Numbers([Number])
WITH
(
FILLFACTOR = 100, -- in the event server default has been changed
DATA_COMPRESSION = ROW -- if Enterprise & table large enough to matter
);
Source: mssqltips
Alternatively, since you can't add data, use a table that already exists in SQL Server.
WITH NUMBERS AS
(
SELECT DISTINCT Number as rowNumber FROM master..spt_values where type = 'P'
)
SELECT DISTINCT sp.SLC_ID, c.rowNumber, c.PCE_ID
FROM SELECT_PART sp
LEFT JOIN (
SELECT b.*
FROM NUMBERS
LEFT OUTER JOIN (
SELECT a.*
FROM(
SELECT SELECT_PART.SLC_ID, ROW_NUMBER() OVER (ORDER BY SELECT_PART.SLC_ID) as
rowNumber, SELECT_PART.PCE_ID
FROM SELECT_PART
WHERE SELECT_PART.SLC_ID = (must be the same as sp.SLC_ID and can''t hardcode it)
) a
) b
ON b.rowNumber = NUMBERS.rowNumber
) c ON c.SLC_ID = sp.SLC_ID
ORDER BY sp.SLC_ID, c.rowNumber
NOTE: Max value for this solution is 2047
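The per-SLC_ID padding itself can be sketched as follows, assuming the goal is a fixed number of rows for every SLC_ID. This is one possible approach rather than the query from the question: it cross joins the distinct SLC_ID values with the number list and numbers the parts per SLC_ID with PARTITION BY, which removes the need for the hard-coded SLC_ID filter. The example runs on SQLite from Python, the sample rows are invented, and the names `ids`, `numbered` and `padded_rows` are hypothetical.

```python
import sqlite3


def padded_rows(slots=10):
    """Return `slots` rows per SLC_ID, padding missing parts with NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SELECT_PART (SLC_ID INT, PCE_ID TEXT)")
    con.executemany(
        "INSERT INTO SELECT_PART VALUES (?, ?)",
        [(1, "0001"), (1, "0002"), (2, "0011"), (2, "0012"), (2, "0013")])
    rows = con.execute("""
        WITH RECURSIVE numbers(rowNumber) AS (
            SELECT 1
            UNION ALL
            SELECT rowNumber + 1 FROM numbers WHERE rowNumber < ?
        ),
        ids AS (SELECT DISTINCT SLC_ID FROM SELECT_PART),
        numbered AS (
            -- restart the numbering for every SLC_ID
            SELECT SLC_ID, PCE_ID,
                   ROW_NUMBER() OVER (PARTITION BY SLC_ID
                                      ORDER BY PCE_ID) AS rowNumber
            FROM SELECT_PART
        )
        SELECT i.SLC_ID, n.rowNumber, p.PCE_ID
        FROM ids i
        CROSS JOIN numbers n
        LEFT JOIN numbered p
               ON p.SLC_ID = i.SLC_ID AND p.rowNumber = n.rowNumber
        ORDER BY i.SLC_ID, n.rowNumber
    """, (slots,)).fetchall()
    con.close()
    return rows


for row in padded_rows(3):
    print(row)
```

In SQL Server the recursive `numbers` CTE could be swapped for dbo.Numbers or master..spt_values as described above; the cross join and the partitioned ROW_NUMBER() are the parts that matter.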

SQL: Query complete hierarchy based on a primary key

We have a table like below
folderid name parent
==========================
1 one null
2 two 1
3 three 2
4 four 3
5 five 4
6 six 5
Is there a way to retrieve the complete list of records when given a folderid? For example, if 1 is passed it should return the complete hierarchy down to the leaf, that is 6. If 6 is passed it should return the complete hierarchy up to the root, that is 1. If 4 is passed it should return the complete hierarchy from root to leaf, that is from 1 to 6.

You can use a recursive CTE:
with cte as (
select folderid
from t
where folderid = 1
union all
select t.folderid
from cte join
t
on cte.folderid = t.parent
)
select *
from cte
option (maxrecursion 0);
If you want additional columns, you can either include them in the recursive CTE or you can join them in the outer query.
Here is a db<>fiddle.
EDIT:
If you want to walk up and down the tree, I would recommend two CTEs:
with cte_c as (
select folderid, 1 as lev
from t
where folderid = 4
union all
select t.folderid, lev + 1
from cte_c join
t
on cte_c.folderid = t.parent
),
cte_p as (
select parent, 1 as lev
from t
where folderid = 4
union all
select t.parent, lev + 1
from cte_p join
t
on cte_p.parent = t.folderid
where t.parent is not null
)
select folderid
from cte_c
union all
select parent
from cte_p
where parent is not null
option (maxrecursion 0);
Here is a db<>fiddle for this version.
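For anyone who wants to try the two-direction walk without a SQL Server instance, here is a rough equivalent on SQLite, driven from Python. The table and rows are the sample from the question; the CTE names `down` and `up` and the function name `hierarchy` are made up for this sketch, and SQLite needs the RECURSIVE keyword where T-SQL does not.

```python
import sqlite3


def hierarchy(folder_id):
    """Return every folderid on the path through `folder_id`, sorted."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (folderid INT, name TEXT, parent INT)")
    con.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, "one", None), (2, "two", 1), (3, "three", 2),
         (4, "four", 3), (5, "five", 4), (6, "six", 5)])
    rows = con.execute("""
        WITH RECURSIVE
        down(folderid) AS (            -- the start row and its descendants
            SELECT folderid FROM t WHERE folderid = :id
            UNION ALL
            SELECT t.folderid FROM down JOIN t ON t.parent = down.folderid
        ),
        up(folderid) AS (              -- the ancestors of the start row
            SELECT parent FROM t WHERE folderid = :id AND parent IS NOT NULL
            UNION ALL
            SELECT t.parent FROM up JOIN t ON t.folderid = up.folderid
            WHERE t.parent IS NOT NULL
        )
        SELECT folderid FROM down
        UNION ALL
        SELECT folderid FROM up
        ORDER BY folderid
    """, {"id": folder_id}).fetchall()
    con.close()
    return [row[0] for row in rows]


print(hierarchy(4))
```

Because the sample data is a single chain, passing 1, 4 or 6 returns the same six folders. In a branching tree, `down` would return only the subtree below the starting folder.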

Select Union SQL

I am using the following query :
select 8 Union Select 0 Union Select 15
to populate these 3 numbers in a column. The result I get is:
0
8
15
But I want 8 to come first and then 0 and then 15, e.g.
8
0
15
How do I do this?

Use UNION ALL
E.g.
select 8 UNION ALL Select 0 UNION ALL Select 15

@SimonMartin's answer works for the exact data set you give, but be aware that if your data set contains duplicate values, UNION ALL will produce different results than UNION.
The UNION operator removes duplicates, whereas UNION ALL preserves them (as well as their order, as noted in @SimonMartin's answer).
If you want to combine the duplicate removal of UNION with the ordering behavior of UNION ALL, you need to start with UNION ALL and then filter out the duplicate values yourself:
-- baseline query + 1 duplicate record at the end
with query as
(
select 8 as Val
UNION ALL
Select 0 as Val
UNION ALL
Select 15 as Val
UNION ALL
Select 0 as Val
)
-- now add row numbers
, queryWithRowNumbers as
(
select row_number() over (order by (select 0)) as rn, Val
from query
)
-- finally, get rid of the duplicates
select Val from (
select Val, min(rn) as minRn
from queryWithRowNumbers
group by Val
) q
order by minRn
This will give results of
8
0
15
whereas if you ONLY use UNION ALL you will end up with
8
0
15
0
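One caveat worth keeping in mind: SQL only guarantees the order of a result when the outermost query has an ORDER BY, so relying on the order in which the UNION ALL branches are written is an assumption about the engine. A variant that does not depend on that is to carry an explicit position column and sort on it. The sketch below shows this on SQLite from Python; the `pos` column and the function name are hypothetical additions, not part of the original query.

```python
import sqlite3


def distinct_in_first_seen_order():
    """Return each value once, ordered by where it first appeared."""
    con = sqlite3.connect(":memory:")
    rows = con.execute("""
        WITH query(val, pos) AS (
            -- pos is a hypothetical, hand-written position column
            VALUES (8, 1), (0, 2), (15, 3), (0, 4)
        )
        SELECT val
        FROM query
        GROUP BY val          -- removes the duplicate 0
        ORDER BY MIN(pos)     -- keeps the position of the first occurrence
    """).fetchall()
    con.close()
    return [row[0] for row in rows]


print(distinct_in_first_seen_order())
```

The same GROUP BY with ORDER BY MIN(...) shape works with the row_number() column from the answer above; the only difference is where the position values come from.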

SELECT DISTINCT for data groups

I have following table:
ID Data
1 A
2 A
2 B
3 A
3 B
4 C
5 D
6 A
6 B
etc. In other words, I have groups of data per ID. You will notice that the data group (A, B) occurs multiple times. I want a query that can identify the distinct data groups and number them, such as:
DataID Data
101 A
102 A
102 B
103 C
104 D
So DataID 102 would represent data (A,B), DataID 103 would represent data (C), etc. That way I could rewrite my original table in this form:
ID DataID
1 101
2 102
3 102
4 103
5 104
6 102
How can I do that?
PS. Code to generate the first table:
CREATE TABLE #t1 (id INT, data VARCHAR(10))
INSERT INTO #t1
SELECT 1, 'A'
UNION ALL SELECT 2, 'A'
UNION ALL SELECT 2, 'B'
UNION ALL SELECT 3, 'A'
UNION ALL SELECT 3, 'B'
UNION ALL SELECT 4, 'C'
UNION ALL SELECT 5, 'D'
UNION ALL SELECT 6, 'A'
UNION ALL SELECT 6, 'B'

In my opinion, you have to create a custom aggregate that concatenates data (for strings, a CLR approach is recommended for performance reasons).
Then I would group by ID and select distinct from the grouping, adding a row_number() or a dense_rank(), your choice. Anyway, it should look something like this:
with groupings as (
select distinct concat(data) groups -- concat: the custom aggregate described above
from Table1
group by ID
)
select groups, row_number() over (order by groups) from groupings

The following query using CASE will give you the result shown below.
From there on, getting the distinct datagroups and proceeding further should not really be a problem.
SELECT
id,
MAX(CASE data WHEN 'A' THEN data ELSE '' END) +
MAX(CASE data WHEN 'B' THEN data ELSE '' END) +
MAX(CASE data WHEN 'C' THEN data ELSE '' END) +
MAX(CASE data WHEN 'D' THEN data ELSE '' END) AS DataGroups
FROM t1
GROUP BY id
ID DataGroups
1 A
2 AB
3 AB
4 C
5 D
6 AB
However, this kind of logic only works if the "Data" values are both fixed and known beforehand.
In your case, you do say that they are. However, considering that you also say there are 1000 of them, this would frankly be a ridiculous-looking query :-)
LuckyLuke's suggestion above would be the more generic and probably saner way to implement the solution in your case.

From your sample data (having added the missing 2,'A' tuple), the following gives the renumbered (and uniquified) data:
with NonDups as (
select t1.id
from #t1 t1 left join #t1 t2
on t1.id > t2.id and t1.data = t2.data
group by t1.id
having COUNT(t1.data) > COUNT(t2.data)
), DataAddedBack as (
select ID,data
from #t1 where id in (select id from NonDups)
), Renumbered as (
select DENSE_RANK() OVER (ORDER BY id) as ID,Data from DataAddedBack
)
select * from Renumbered
Giving:
1 A
2 A
2 B
3 C
4 D
I think then, it's a matter of relational division to match up rows from this output with the rows in the original table.
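That matching step might look roughly like the sketch below, which goes straight from the original table to the ID-to-DataID mapping. It treats two ids as the same group when they have the same number of rows and share that many data values, and it assumes each (id, data) pair occurs only once. It runs on SQLite from Python rather than SQL Server, and the CTE names `cnt`, `same` and `rep` are invented for this example.

```python
import sqlite3


def number_data_groups():
    """Map each id to a DataID shared by all ids with the same set of data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INT, data TEXT)")
    con.executemany(
        "INSERT INTO t1 VALUES (?, ?)",
        [(1, "A"), (2, "A"), (2, "B"), (3, "A"), (3, "B"),
         (4, "C"), (5, "D"), (6, "A"), (6, "B")])
    rows = con.execute("""
        WITH cnt AS (
            SELECT id, COUNT(*) AS n FROM t1 GROUP BY id
        ),
        same AS (      -- pairs of ids whose data sets are identical
            SELECT a.id AS id, b.id AS other_id
            FROM t1 a
            JOIN t1 b   ON b.data = a.data
            JOIN cnt ca ON ca.id = a.id
            JOIN cnt cb ON cb.id = b.id AND cb.n = ca.n
            GROUP BY a.id, b.id, ca.n
            HAVING COUNT(*) = ca.n
        ),
        rep AS (       -- the smallest id of each group is its representative
            SELECT id, MIN(other_id) AS rep_id FROM same GROUP BY id
        )
        SELECT id, 100 + DENSE_RANK() OVER (ORDER BY rep_id) AS data_id
        FROM rep
        ORDER BY id
    """).fetchall()
    con.close()
    return rows


for row in number_data_groups():
    print(row)
```

The self-join compares every pair of ids that share a value, so on a large table this is likely to be much slower than the concatenation-based answers.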

Just to share my own dirty solution that I'm using for the moment:
SELECT DISTINCT t1.id, D.data
FROM #t1 t1
CROSS APPLY (
SELECT CAST(Data AS VARCHAR) + ','
FROM #t1 t2
WHERE t2.id = t1.id
ORDER BY Data ASC
FOR XML PATH('') )
D ( Data )
And then proceeding analogously to LuckyLuke's solution.
