Flatten multiple arrays with uneven lengths in BigQuery

I'm trying to flatten arrays in different columns with different lengths without duplicating the results.
For example (using standard SQL):
WITH x AS (
  SELECT
    ARRAY[1, 2, 3] AS a,
    ARRAY[1, 2] AS b
)
SELECT
  a,
  b
FROM
  x,
  x.a,
  x.b
Produces:
+---+---+
| a | b |
+---+---+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+---+---+
It should look like this:
+---+------+
| a | b    |
+---+------+
| 1 | 1    |
| 2 | 2    |
| 3 | null |
+---+------+

You can use JOIN:
SELECT a, b
FROM x LEFT JOIN
UNNEST(x.a) a left join
unnest(x.b) b
ON a = b;
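
Note that ON a = b pairs elements by value, which happens to coincide with position in the sample data. If the goal is to pair elements by position, one option in BigQuery is to expose each element's index with UNNEST(...) WITH OFFSET and join on that index. The sketch below only illustrates that idea outside BigQuery, using Python's built-in sqlite3 module. The tables arr_a and arr_b and their pos column are hypothetical stand-ins for the unnested arrays and their offsets, and the union of positions stands in for a full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the unnested arrays: one row per element,
    -- with pos playing the role of the array offset.
    CREATE TABLE arr_a (pos INTEGER, val INTEGER);
    CREATE TABLE arr_b (pos INTEGER, val INTEGER);
    INSERT INTO arr_a VALUES (0, 1), (1, 2), (2, 3);
    INSERT INTO arr_b VALUES (0, 1), (1, 2);
""")

# Collect every position that occurs in either array, then attach the element
# found at that position in each array (NULL when the array is too short).
rows = conn.execute("""
    SELECT arr_a.val AS a, arr_b.val AS b
    FROM (SELECT pos FROM arr_a UNION SELECT pos FROM arr_b) AS p
    LEFT JOIN arr_a ON arr_a.pos = p.pos
    LEFT JOIN arr_b ON arr_b.pos = p.pos
    ORDER BY p.pos
""").fetchall()

for row in rows:
    print(row)
```

Because the join key is the position rather than the value, this keeps working when the arrays hold repeated or unrelated values.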

Related

Oracle SQL unpivot and keep rows with null values [duplicate]

I'm currently doing an unpivot for an Oracle data source (v12.2) like this:
SELECT *
FROM some_table
UNPIVOT (
(X,Y,Val)
FOR SITE
IN (
(SITE1_X, SITE1_Y, SITE1_VAL) AS '1',
(SITE2_X, SITE2_Y, SITE2_VAL) AS '2',
(SITE3_X, SITE3_Y, SITE3_VAL) AS '3'
))
This works totally fine so far. There is only one exception - I have another column, let's say extend_info, ... if this column has the value y, there will be only one row of this column and all the site columns will be null. Nevertheless I would like to keep this row and not drop it.
I'm not really sure how to do this or what would be a nice way to do this. Any recommendations?
Example:
Original Table:
ID | SITE1_X | SITE1_Y |SITE1_VAL | SITE2_X | SITE2_Y | SITE2_VAL | ... | extend_info
-------
1 | 0 | 0 | 5 | 1 | 1 | 10 | ... | n
2 | 0 | 0 | 3 | null | null | null | ... | n
3 | null | null | null | null | null | null | ... | y
current output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
desired output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
4 | | | | | y
I don't really care what is in SITE|X|Y|VAL in that case, can be 0 for everything or null.
Bonus question:
If extend_info is y I would like to join another table with this ID. The other table looks like this:
ID | F_ID | X | Y | VAL
-----
1 | 4 | 1 | 1 | 8
2 | 4 | 2 | 2 | 9
and in that case my final output table should look like:
ID | SITE | X | Y | VAL | X_OTHER_TABLE | Y_OTHER_TABLE
-------
1 | 1 | 0 | 0 | 5 |
2 | 1 | 1 | 1 | 10 |
3 | 2 | 0 | 0 | 3 |
4 | | | | 8 | 1 | 1
5 | | | | 9 | 2 | 2
I know... the database structure is super ugly but that is what a vendor provides us and we are trying to create a View to make it easier to perform some data analysis tasks on it.
It doesn't have to match my final example 1:1, but I hope my intention is clear: I want one single table/view with all the information in a single format.
Thanks for any help!

I would recommend a lateral join:
SELECT s.id, u.*
FROM some_table s CROSS JOIN LATERAL
(SELECT s.SITE1_X as SITE_X, s.SITE1_Y as SITE_Y, s.SITE1_VAL as SITE_VAL FROM DUAL UNION ALL
SELECT s.SITE2_X, s.SITE2_Y, s.SITE2_VAL FROM DUAL UNION ALL
SELECT s.SITE3_X, s.SITE3_Y, s.SITE3_VAL FROM DUAL
) u;
You can just join additional tables to this as you like.
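
As a rough illustration of why this keeps the all-null row, here is a small sketch using Python's built-in sqlite3 module. SQLite has no LATERAL, so the sketch swaps in a cross join against a list of site numbers plus CASE expressions, which is a different technique with a similar effect. It covers only two sites, reuses the sample rows from the question, and the final WHERE filter (drop empty sites, but keep one row when extend_info is 'y') is an assumption about the desired output, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (
        id INTEGER,
        site1_x INTEGER, site1_y INTEGER, site1_val INTEGER,
        site2_x INTEGER, site2_y INTEGER, site2_val INTEGER,
        extend_info TEXT
    );
    INSERT INTO some_table VALUES
        (1, 0, 0, 5, 1, 1, 10, 'n'),
        (2, 0, 0, 3, NULL, NULL, NULL, 'n'),
        (3, NULL, NULL, NULL, NULL, NULL, NULL, 'y');
""")

# One output row per (source row, site number); CASE picks that site's columns.
# The outer filter is an assumption: drop empty sites unless extend_info = 'y',
# in which case a single placeholder row (site 1) is kept.
rows = conn.execute("""
    SELECT id, site, x, y, val, extend_info
    FROM (
        SELECT s.id, n.site,
               CASE n.site WHEN 1 THEN s.site1_x   ELSE s.site2_x   END AS x,
               CASE n.site WHEN 1 THEN s.site1_y   ELSE s.site2_y   END AS y,
               CASE n.site WHEN 1 THEN s.site1_val ELSE s.site2_val END AS val,
               s.extend_info
        FROM some_table AS s
        CROSS JOIN (SELECT 1 AS site UNION ALL SELECT 2) AS n
    ) AS u
    WHERE val IS NOT NULL
       OR (extend_info = 'y' AND site = 1)
    ORDER BY id, site
""").fetchall()

for row in rows:
    print(row)
```

Unlike UNPIVOT with its default null handling, nothing is dropped unless the filter says so, which is why the extend_info = 'y' row survives.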

Cte within Cte in SQL

I have run into a situation where I need to apply WHERE and GROUP BY conditions to the result of a CTE inside another CTE.
Table 1 as follows
+---+---+---+---+
| x | y | z | w |
+---+---+---+---+
| 1 | 2 | 3 | 1 |
| 2 | 3 | 4 | 2 |
| 3 | 2 | 5 | 3 |
| 1 | 2 | 6 | 2 |
+---+---+---+---+
Table 2 as follows
+---+---+-----+---+
| a | b | c | d |
+---+---+-----+---+
| 1 | m | 100 | 1 |
| 2 | n | 23 | 2 |
| 4 | o | 34 | 4 |
| 1 | m | 23 | 2 |
+---+---+-----+---+
Assume the result of the following SQL query is available as a table called TAB:
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a and cte.w=cte1.d
Result of above CTE would be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 2 | 3 | 4 | 2 | n | 23 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I would like to query the following from the table TAB
select * from TAB where (X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123)
I'm trying to formulate the SQL query as follows, but it doesn't work as I expected:
select * from (
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a and cte.w=cte1.d) as TAB
where ((X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123))
The final result must be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+

I don't think DB2 allows CTEs in subqueries or to be nested. Why not just write this using another CTE?
with cte as (
    select x, y, z, w
    from table1
),
cte1 as (
    select a, b, c, d
    from table2
),
tab as (
    select cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
    from cte left join
         cte1
         on cte.x = cte1.a and cte.w = cte1.d
)
select *
from tab
where (x || b) in (select (x || b) from tab group by (x || b) having sum(c) = 123);
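
To check the chained-CTE idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than DB2. It makes a few assumptions: the join is taken to be cte.x = cte1.a and cte.w = cte1.d, the CTEs also select w and d so the join can see them, and the subquery groups on x || b so that the selected expression matches the grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER, w INTEGER);
    CREATE TABLE table2 (a INTEGER, b TEXT, c INTEGER, d INTEGER);
    INSERT INTO table1 VALUES (1, 2, 3, 1), (2, 3, 4, 2), (3, 2, 5, 3), (1, 2, 6, 2);
    INSERT INTO table2 VALUES (1, 'm', 100, 1), (2, 'n', 23, 2),
                              (4, 'o', 34, 4), (1, 'm', 23, 2);
""")

# Three CTEs in one WITH list: the third (tab) builds on the first two, and the
# final SELECT can reference tab twice (outer query and IN subquery).
rows = conn.execute("""
    WITH cte AS (SELECT x, y, z, w FROM table1),
         cte1 AS (SELECT a, b, c, d FROM table2),
         tab AS (
             SELECT cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
             FROM cte
             LEFT JOIN cte1 ON cte.x = cte1.a AND cte.w = cte1.d
         )
    SELECT x, y, z, w, b, c
    FROM tab
    WHERE (x || b) IN (
        SELECT x || b FROM tab GROUP BY x || b HAVING SUM(c) = 123
    )
    ORDER BY z
""").fetchall()

for row in rows:
    print(row)
```

Only the two rows with x = 1 and b = 'm' come back, since their c values (100 and 23) add up to 123.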

Select Tree Structure from two tables

Let us assume I have table A
| PK | Name |
-------------
| 1 | AA |
| 2 | BB |
| 3 | CC |
and table B
| PK | FK | Value |
-------------------
| 1 | 1 | i |
| 2 | 1 | j |
| 3 | 2 | x |
| 4 | 2 | y |
| 5 | 3 | l |
| 6 | 3 | k |
How can I select the result below?
| PK | Name |
-------------
| 1 | AA |
| 1 | i |
| 2 | j |
| 2 | BB |
| 3 | x |
| 4 | y |
| 3 | CC |
| 3 | l |
| 4 | k |
That is, list each parent and, under each parent, its children.
Many thanks for any help.

Very interesting table design. I believe it's just a matter of unioning your data and then ordering your results how you want them. If it's only a single level child-parent relationship, this should work just fine.
If Object_Id('tempdb..#TableA') Is Not Null Drop Table #TableA;
If Object_Id('tempdb..#TableB') Is Not Null Drop Table #TableB;
Select * Into #TableA
From (Values (1,'AA'),(2,'BB'),(3,'CC')) As a(PK,[Name])
Select * Into #TableB
From (Values (1,1,'i'),(2,1,'j'),(3,2,'x'),(4,2,'y'),(5,3,'l'),(6,3,'k')) As a(PK,FK,[Value])
;With Cte
As
(
Select PK,[Name],PK As OrderID,0 As LevelID /*Used to ensure parents are put above children*/
From #TableA
Union All
Select PK,[Value],FK,1
From #TableB
)
Select [PK], [Name]
From Cte
Order By OrderID,LevelID,PK
Results:
PK | Name
1 | AA
1 | i
2 | j
2 | BB
3 | x
4 | y
3 | CC
5 | l
6 | k
Note: My last two rows (l and k) differ slightly from your expected results. I assumed it was a typo when you put 3 and 4 as the IDs rather than 5 and 6.

SQL - Results based partially on aggregate of particular column

Thanks in advance for any assistance. I have a situation where I need a snapshot of SQL data but part of the results need to be based on the aggregate of one column. Here's a tiny subset of my data:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2000 | 01/01/2003 | 1 | 1 |
| 1 | 3 | 01/01/2001 | 01/01/2004 | 1 | 2 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
My results need to be grouped by columns A and B, the MAX of last_date and the MIN of next date. But the kicker is that the values for columns C and D should be the values that correspond to the MIN of next date. So for the above data subset my results would be:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2001 | 01/01/2003 | 1 | 1 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
Note how the first row of results has the value of last_date from the 2nd row of the initial data, but the values for columns C and D correspond to the first row from the initial data. In the case where there is an exact duplication of columns A, B, max(last_date), and min(next_date) but the values for columns C and D don't match, then I don't care which one is returned - but I must only return one row, not multiples.

You can use ROW_NUMBER to get these results:
Select A, B, MaxLast_date, MinNext_date, C, D from (
select *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date,
next_rn = Row_number() over(partition by A, B order by next_date) from #yourtable
) a
Where a.next_rn = 1
Another way is to use TOP (1) WITH TIES:
Select top(1) with ties *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date
from #yourtable
Order by Row_number() over(partition by A, B order by next_date)
Output:
+---+---+--------------+--------------+---+---+
| A | B | MaxLast_date | MinNext_date | C | D |
+---+---+--------------+--------------+---+---+
| 1 | 3 | 2001-01-01 | 2003-01-01 | 1 | 1 |
| 2 | 3 | 2002-01-01 | 2005-01-01 | 2 | 3 |
| 2 | 4 | 2003-01-01 | 2006-01-01 | 3 | 4 |
+---+---+--------------+--------------+---+---+
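
For readers who want to try the ROW_NUMBER approach without SQL Server, the following is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions), a hypothetical table name snapshot, and dates stored as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding the sample rows from the question.
    CREATE TABLE snapshot (a INTEGER, b INTEGER, last_date TEXT,
                           next_date TEXT, c INTEGER, d INTEGER);
    INSERT INTO snapshot VALUES
        (1, 3, '2000-01-01', '2003-01-01', 1, 1),
        (1, 3, '2001-01-01', '2004-01-01', 1, 2),
        (2, 3, '2002-01-01', '2005-01-01', 2, 3),
        (2, 4, '2003-01-01', '2006-01-01', 3, 4);
""")

# Window aggregates compute the per-group MAX/MIN on every row; ROW_NUMBER then
# keeps only the row with the earliest next_date, so c and d come from that row.
rows = conn.execute("""
    SELECT a, b, max_last_date, min_next_date, c, d
    FROM (
        SELECT t.*,
               MAX(last_date) OVER (PARTITION BY a, b) AS max_last_date,
               MIN(next_date) OVER (PARTITION BY a, b) AS min_next_date,
               ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY next_date) AS next_rn
        FROM snapshot AS t
    ) AS ranked
    WHERE next_rn = 1
    ORDER BY a, b
""").fetchall()

for row in rows:
    print(row)
```

Because ROW_NUMBER assigns exactly one row the value 1 per group, ties on next_date still yield a single row, which matches the "only one row" requirement.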

SQL::Self join a table to satisfy a particular condition?

I have the following table:
mysql> SELECT * FROM temp;
+----+------+
| id | a |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+----+------+
I am trying to get the following output:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 4 |
+----+------+------+
but I am having a small problem. I wrote the following query:
mysql> SELECT A.id, A.a, B.a FROM temp A, temp B WHERE B.a>A.a;
but my output is the following:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
+----+------+------+
Can someone tell me how to convert this into the desired output? I am trying to get a form where only the consecutive values are produced. I mean, if 2 is greater than 1 and 3 is greater than 2, I do not want 3 is greater than 1.

Option 1: "Triangular Join" - Quadratic Complexity
SELECT A.id, A.a, MIN(B.a) AS a
FROM temp A
JOIN temp B ON B.a>A.a
GROUP BY A.id, A.a;
Option 2: "Pseudo Row_Number()" - Linear Complexity
select a_numbered.id, a_numbered.a, b_numbered.a
from
(
    select id,
           a,
           @rownum := @rownum + 1 as rn
    from temp
    join (select @rownum := 0) r
    order by id
) a_numbered join (
    select id,
           a,
           @rownum2 := @rownum2 + 1 as rn
    from temp
    join (select @rownum2 := 0) r
    order by id
) b_numbered
on b_numbered.rn = a_numbered.rn+1
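
Option 1 can be tried quickly with Python's built-in sqlite3 module. This is only a sketch: the table is named temp_rows here (a hypothetical name, chosen to avoid SQLite's TEMP keyword), and the third column is aliased next_a to keep the column names distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_rows (id INTEGER, a INTEGER);
    INSERT INTO temp_rows VALUES (1, 1), (2, 2), (3, 3), (4, 4);
""")

# For each row, join to every row with a larger value, then keep only the
# smallest of those larger values. The last row has no larger value, so the
# inner join drops it.
rows = conn.execute("""
    SELECT A.id, A.a, MIN(B.a) AS next_a
    FROM temp_rows AS A
    JOIN temp_rows AS B ON B.a > A.a
    GROUP BY A.id, A.a
    ORDER BY A.id
""").fetchall()

for row in rows:
    print(row)
```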
