BigQuery full join two tables without a column to join on - google-bigquery

I have two tables:
t1
col1 col2
a a
a c
b a
b d
c a
c d
t2
team game
mazs 1
mazs 2
doos 1
bahs 3
...
t2 is a very long table with many teams and games, whereas t1 is shown in its entirety - a table with 6 rows, featuring combinations of the letters a,b,c,d. Note that t1 is not a fully exhaustive list of a,b,c,d pairings, only the pairings in the 6 rows that appear.
I'd like to create a table that looks like this:
output
team game col1 col2
mazs 1 a a
mazs 1 a c
mazs 1 b a
mazs 1 b d
mazs 1 c a
mazs 1 c d
mazs 2 a a
mazs 2 a c
mazs 2 b a
mazs 2 b d
mazs 2 c a
mazs 2 c d
What's going on here is that for each row in t2, there are 6 rows in output, one row for each of the col1, col2 pairings from t1.
t1 and t2 are created on my end from the following queries:
SELECT col1, col2 FROM sometable GROUP BY col1, col2
SELECT DISTINCT team, game FROM anothertable
The first query creates t1 and the second query creates t2.

This is called a CROSS JOIN: it pairs every row of t2 with every row of t1, so no join column is needed. See the example below:
With t1 as (
select 'a' col1, 'a' col2 union all
select 'a' col1, 'c' col2 union all
select 'b' col1, 'a' col2 union all
select 'b' col1, 'd' col2 union all
select 'c' col1, 'a' col2 union all
select 'c' col1, 'd' col2),
t2 as (
select 'mazs' team, 1 game union all
select 'mazs' team, 2 game union all
select 'doos' team, 1 game union all
select 'bahs' team, 3 game
)
SELECT * FROM t2 cross join t1;
Output:
+------+------+------+------+
| team | game | col1 | col2 |
+------+------+------+------+
| mazs | 1 | a | a |
| mazs | 1 | a | c |
| mazs | 1 | b | a |
| mazs | 1 | b | d |
| mazs | 1 | c | a |
| mazs | 1 | c | d |
| mazs | 2 | a | a |
| mazs | 2 | a | c |
| mazs | 2 | b | a |
| mazs | 2 | b | d |
| mazs | 2 | c | a |
| mazs | 2 | c | d |
| doos | 1 | a | a |
| doos | 1 | a | c |
| doos | 1 | b | a |
| doos | 1 | b | d |
| doos | 1 | c | a |
| doos | 1 | c | d |
| bahs | 3 | a | a |
| bahs | 3 | a | c |
| bahs | 3 | b | a |
| bahs | 3 | b | d |
| bahs | 3 | c | a |
| bahs | 3 | c | d |
+------+------+------+------+
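Since t1 and t2 come from two deduplicating queries, the CROSS JOIN can also be applied to those queries directly as subqueries. The following is a minimal sketch of that idea, run against SQLite through Python's sqlite3 module rather than BigQuery; the table names come from the question, but the rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-ins for the asker's source tables; rows are made up
# and deliberately contain duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (col1 TEXT, col2 TEXT);
    INSERT INTO sometable VALUES ('a','a'), ('a','a'), ('a','c'), ('b','d');
    CREATE TABLE anothertable (team TEXT, game INTEGER);
    INSERT INTO anothertable VALUES ('mazs',1), ('mazs',1), ('mazs',2), ('doos',1);
""")

# The two queries from the question become inline subqueries, and
# CROSS JOIN pairs every row of t2 with every row of t1.
rows = conn.execute("""
    SELECT t2.team, t2.game, t1.col1, t1.col2
    FROM (SELECT DISTINCT team, game FROM anothertable) AS t2
    CROSS JOIN (SELECT col1, col2 FROM sometable GROUP BY col1, col2) AS t1
    ORDER BY t2.team, t2.game, t1.col1, t1.col2
""").fetchall()

for row in rows:
    print(row)
```

With 3 distinct (team, game) pairs and 3 distinct (col1, col2) pairs, the result has 3 x 3 = 9 rows.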

Related

SQL: Get row number which increases every time a value changes

I have the following table in Vertica:
+----------+----------+----------+
| column_1 | column_2 | column_3 |
+----------+----------+----------+
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 1 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 2 |
| c | 3 | 2 |
+----------+----------+----------+
The table is ordered by column_1 and column_3.
I would like to add a row number that increases every time column_1 or column_3 changes value. It would look something like this:
+----------+----------+----------+------------+
| column_1 | column_2 | column_3 | row_number |
+----------+----------+----------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 1 | 1 | 2 |
| b | 2 | 1 | 2 |
| b | 3 | 1 | 2 |
| c | 1 | 1 | 3 |
| c | 2 | 1 | 3 |
| c | 3 | 1 | 3 |
| c | 1 | 2 | 4 |
| c | 2 | 2 | 4 |
| c | 3 | 2 | 4 |
+----------+----------+----------+------------+
I tried using partition over but I can't find the right syntax.
Vertica has the CONDITIONAL_CHANGE_EVENT() analytic function.
It starts at 0 and increments by 1 each time the value of the expression passed as its argument changes from one row to the next.
Like so:
WITH
indata(column_1,column_2,column_3,rn) AS (
SELECT 'a',1,1,1
UNION ALL SELECT 'a',2,1,1
UNION ALL SELECT 'a',3,1,1
UNION ALL SELECT 'b',1,1,2
UNION ALL SELECT 'b',2,1,2
UNION ALL SELECT 'b',3,1,2
UNION ALL SELECT 'c',1,1,3
UNION ALL SELECT 'c',2,1,3
UNION ALL SELECT 'c',3,1,3
UNION ALL SELECT 'c',1,2,4
UNION ALL SELECT 'c',2,2,4
UNION ALL SELECT 'c',3,2,4
)
SELECT
*
, CONDITIONAL_CHANGE_EVENT(
column_1||column_3::VARCHAR
) OVER w + 1 AS rownum
FROM indata
WINDOW w AS (ORDER BY column_1,column_3,column_2)
;
-- out column_1 | column_2 | column_3 | rn | rownum
-- out ----------+----------+----------+----+--------
-- out a | 1 | 1 | 1 | 1
-- out a | 2 | 1 | 1 | 1
-- out a | 3 | 1 | 1 | 1
-- out b | 1 | 1 | 2 | 2
-- out b | 2 | 1 | 2 | 2
-- out b | 3 | 1 | 2 | 2
-- out c | 1 | 1 | 3 | 3
-- out c | 2 | 1 | 3 | 3
-- out c | 3 | 1 | 3 | 3
-- out c | 1 | 2 | 4 | 4
-- out c | 2 | 2 | 4 | 4
-- out c | 3 | 2 | 4 | 4
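On engines without CONDITIONAL_CHANGE_EVENT(), the same "count the changes" idea can be approximated with LAG() and a running SUM(). The sketch below is an emulation, not the Vertica function itself; it runs on SQLite through Python's sqlite3 module and assumes a SQLite build with window functions (3.25 or later). The table name indata is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE indata (column_1 TEXT, column_2 INTEGER, column_3 INTEGER);
    INSERT INTO indata VALUES
        ('a',1,1), ('a',2,1), ('a',3,1),
        ('b',1,1), ('b',2,1), ('b',3,1),
        ('c',1,1), ('c',2,1), ('c',3,1),
        ('c',1,2), ('c',2,2), ('c',3,2);
""")

# Step 1: pair each row's key with the previous row's key (LAG).
# Step 2: add 1 to a running total every time the key differs from the previous one.
rows = conn.execute("""
    WITH keyed AS (
        SELECT column_1, column_2, column_3,
               column_1 || '|' || column_3 AS k,
               LAG(column_1 || '|' || column_3)
                   OVER (ORDER BY column_1, column_3, column_2) AS prev_k
        FROM indata
    )
    SELECT column_1, column_2, column_3,
           1 + SUM(CASE WHEN prev_k IS NOT NULL AND prev_k <> k THEN 1 ELSE 0 END)
               OVER (ORDER BY column_1, column_3, column_2
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rownum
    FROM keyed
    ORDER BY column_1, column_3, column_2
""").fetchall()

for row in rows:
    print(row)
```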
In the absence of an ORDER BY, SQL data sets are unordered. To establish the order in your example, I've therefore assumed the dataset can be sorted with ORDER BY column_1, column_3, column_2.
If that assumption doesn't hold, you MUST add additional columns that the data can be deterministically sorted by.
That gives the following query:
SELECT
yourTable.*,
DENSE_RANK() OVER (ORDER BY column_1, column_3) AS row_number
FROM
yourTable
ORDER BY
column_1, column_3, column_2
This also works and doesn't require the table to be sorted:
Find the distinct (column_1, column_3) pairs and give each one a new index.
Join the result back to the original table on column_1 and column_3.
select t1.*, t2.row_number
from
your_table t1
join
(select column_1, column_3, row_number() over (order by column_1, column_3) as row_number from (select distinct column_1, column_3 from your_table) foo) t2
on
t1.column_1=t2.column_1 and t1.column_3=t2.column_3;
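As a quick cross-check, the DENSE_RANK() approach and the "number the distinct pairs, then join back" approach can be compared on the sample data. This is a sketch on SQLite via Python's sqlite3 module (window functions assumed available), not Vertica; the rows are inserted out of order on purpose, and the alias rn is used here instead of row_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (column_1 TEXT, column_2 INTEGER, column_3 INTEGER);
    -- Inserted in shuffled order: neither query relies on physical row order.
    INSERT INTO your_table VALUES
        ('c',2,2), ('a',1,1), ('b',3,1), ('c',1,1), ('a',3,1), ('c',3,2),
        ('b',1,1), ('c',2,1), ('a',2,1), ('c',1,2), ('b',2,1), ('c',3,1);
""")

# Approach 1: rank the (column_1, column_3) pairs directly.
dense = conn.execute("""
    SELECT column_1, column_2, column_3,
           DENSE_RANK() OVER (ORDER BY column_1, column_3) AS rn
    FROM your_table
    ORDER BY column_1, column_3, column_2
""").fetchall()

# Approach 2: number the distinct pairs, then join back to the table.
joined = conn.execute("""
    SELECT t1.column_1, t1.column_2, t1.column_3, t2.rn
    FROM your_table t1
    JOIN (
        SELECT column_1, column_3,
               ROW_NUMBER() OVER (ORDER BY column_1, column_3) AS rn
        FROM (SELECT DISTINCT column_1, column_3 FROM your_table) d
    ) t2
      ON t1.column_1 = t2.column_1 AND t1.column_3 = t2.column_3
    ORDER BY t1.column_1, t1.column_3, t1.column_2
""").fetchall()

print(dense == joined)
```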

Cte within Cte in SQL

I have run into a situation where I need to apply WHERE and GROUP BY conditions to the result of a CTE, inside another CTE.
Table 1 as follows
+---+---+---+---+
| x | y | z | w |
+---+---+---+---+
| 1 | 2 | 3 | 1 |
| 2 | 3 | 4 | 2 |
| 3 | 2 | 5 | 3 |
| 1 | 2 | 6 | 2 |
+---+---+---+---+
Table 2 as follows
+---+---+-----+---+
| a | b | c | d |
+---+---+-----+---+
| 1 | m | 100 | 1 |
| 2 | n | 23 | 2 |
| 4 | o | 34 | 4 |
| 1 | m | 23 | 2 |
+---+---+-----+---+
Assume the result of the following SQL query is available as a table called TAB:
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a and cte.w=cte1.d
Result of above CTE would be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 2 | 3 | 4 | 2 | n | 23 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I would like to query the following from the table TAB
select * from TAB where (X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123)
I'm trying to formulate the SQL query as follows, but it doesn't work as I expected:
select * from (
with cte as (
select x,y,z from table1),
cte1 as (select a,b,c from table2)
select cte.x,cte1.y,cte1.z,cte2.b,cte2.c from cte left join cte1 on cte.x=cte.a) as TAB
where ((X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123))
The final result must be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I don't think DB2 allows CTEs in subqueries or to be nested. Why not just write this using another CTE?
with cte as (
select x,y,z,w from
table1
),
cte1 as (
select a,b,c,d
from table2
),
tab as (
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c
from cte left join
cte1
on cte.x=cte1.a and cte.w=cte1.d
)
select *
from TAB
where (X||b) in (select (X||b) from TAB group by (X||b) having sum(c)=123);
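To check the "one more CTE" approach against the sample data, here is a small sketch that runs on SQLite through Python's sqlite3 module (not DB2). The column lists and join aliases are spelled out so that the query runs, and the subquery groups on the same x || b expression that it selects; both choices are assumptions about what the question intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER, w INTEGER);
    INSERT INTO table1 VALUES (1,2,3,1), (2,3,4,2), (3,2,5,3), (1,2,6,2);
    CREATE TABLE table2 (a INTEGER, b TEXT, c INTEGER, d INTEGER);
    INSERT INTO table2 VALUES (1,'m',100,1), (2,'n',23,2), (4,'o',34,4), (1,'m',23,2);
""")

# tab is a third CTE built on the first two; the final SELECT can then
# reference it twice: once as the outer query and once in the subquery.
rows = conn.execute("""
    WITH cte AS (SELECT x, y, z, w FROM table1),
         cte1 AS (SELECT a, b, c, d FROM table2),
         tab AS (
             SELECT cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
             FROM cte
             LEFT JOIN cte1 ON cte.x = cte1.a AND cte.w = cte1.d
         )
    SELECT *
    FROM tab
    WHERE (x || b) IN (SELECT (x || b) FROM tab
                       GROUP BY (x || b)
                       HAVING SUM(c) = 123)
    ORDER BY z
""").fetchall()

for row in rows:
    print(row)
```

Only the two x = 1 rows survive, because their c values (100 and 23) are the only ones that sum to 123.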

Flatten multiple arrays with uneven lengths in BigQuery

I'm trying to flatten arrays in different columns with different lengths without duplicating the results.
For example (using standard SQL):
WITH
x AS (
SELECT
ARRAY[1, 2, 3] AS a,
ARRAY[1, 2] AS b)
SELECT
a,
b
FROM
x,
x.a,
x.b
Produces:
+-----+-----+
| a | b |
+-----+-----+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
+-----+-----+
It should look like this:
+-----+------+
| a | b |
+-----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | null |
+-----+------+
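You can use a JOIN. Note that the ON a = b condition matches elements by value, which gives the desired result here because the two arrays share their leading values: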
SELECT a, b
FROM x LEFT JOIN
UNNEST(x.a) a left join
unnest(x.b) b
ON a = b;
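If the arrays should be paired by position rather than by value, the join key has to be the element index. In BigQuery that index is available through UNNEST(...) WITH OFFSET; the sketch below only illustrates the join logic on SQLite via Python's sqlite3 module, with each array stored as hypothetical (off, val) rows. It assumes the left-hand array is at least as long as the right-hand one; otherwise a FULL JOIN would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each array is stored as (off, val) rows; off plays the role of the
    -- element index. Table names and values are invented for illustration.
    CREATE TABLE a_items (off INTEGER, val INTEGER);
    INSERT INTO a_items VALUES (0, 10), (1, 20), (2, 30);
    CREATE TABLE b_items (off INTEGER, val INTEGER);
    INSERT INTO b_items VALUES (0, 7), (1, 8);
""")

# Join on the index, not on the value: unmatched positions in b become NULL.
rows = conn.execute("""
    SELECT a_items.val AS a, b_items.val AS b
    FROM a_items
    LEFT JOIN b_items ON a_items.off = b_items.off
    ORDER BY a_items.off
""").fetchall()

for row in rows:
    print(row)
```

The values in the two arrays differ on purpose, to show that the pairing depends only on position.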

T-SQL Compare a group of rows with other groups of rows

I'm trying to filter one table down to the groups of rows (grouped by one column) whose values match those of a group of rows in another table. For example:
###Table1###
+--------+-------+
| Symbol | Value |
+--------+-------+
| A | 1 |
| A | 2 |
| A | 3 |
| B | 9 |
| B | 8 |
+--------+-------+
###Table2###
+--------+-------+
| Symbol | Value |
+--------+-------+
| C | 9 |
| C | 8 |
| D | 1 |
| D | 2 |
| D | 4 |
| E | 9 |
| E | 8 |
| F | 1 |
| F | 2 |
| F | 3 |
+--------+-------+
The query needs to return C, E, and F but not D because the values for A match the values of F, and the values of B match the values of C and E.
I hope this makes sense.
You can get the match by joining the tables on the value and then counting the symbols. For your data, this should work:
select t2.symbol, t1.symbol
from (select t1.*, count(*) over (partition by symbol) as cnt
from table1 t1
) t1 join
table2 t2
on t1.value = t2.value
group by t1.symbol, t2.symbol, t1.cnt
having count(*) = t1.cnt;
This assumes:
No duplicates in either table.
You are looking for rows in table2 that match table1, but table2 could have additional values not in table1.
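The counting approach and its second assumption can be tried out with a short sketch on SQLite via Python's sqlite3 module (not SQL Server; window functions assumed available). The data is the sample from the question plus one hypothetical symbol G with values 9, 8, 7, added to show that a table2 group with extra values still counts as a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (symbol TEXT, value INTEGER);
    INSERT INTO table1 VALUES ('A',1), ('A',2), ('A',3), ('B',9), ('B',8);
    CREATE TABLE table2 (symbol TEXT, value INTEGER);
    INSERT INTO table2 VALUES
        ('C',9), ('C',8),
        ('D',1), ('D',2), ('D',4),
        ('E',9), ('E',8),
        ('F',1), ('F',2), ('F',3),
        ('G',9), ('G',8), ('G',7);  -- hypothetical extra group, not in the question
""")

# cnt = number of values per table1 symbol. A table2 symbol matches when
# the number of joined values equals cnt, i.e. it covers the whole group.
rows = conn.execute("""
    SELECT t2.symbol, t1.symbol
    FROM (SELECT symbol, value,
                 COUNT(*) OVER (PARTITION BY symbol) AS cnt
          FROM table1) t1
    JOIN table2 t2 ON t1.value = t2.value
    GROUP BY t1.symbol, t2.symbol, t1.cnt
    HAVING COUNT(*) = t1.cnt
    ORDER BY t2.symbol
""").fetchall()

for row in rows:
    print(row)
```

D is excluded because it covers only two of A's three values, while G is included even though it has an extra value (7). To require an exact match, the per-symbol count in table2 would have to be compared as well.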

SQL::Self join a table to satisfy a particular condition?

I have the following table:
mysql> SELECT * FROM temp;
+----+------+
| id | a |
+----+------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+----+------+
I am trying to get the following output:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 4 |
+----+------+------+
but I am having a small problem. I wrote the following query:
mysql> SELECT A.id, A.a, B.a FROM temp A, temp B WHERE B.a>A.a;
but my output is the following:
+----+------+------+
| id | a | a |
+----+------+------+
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 2 | 2 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 4 |
| 3 | 3 | 4 |
+----+------+------+
Can someone tell me how to convert this into the desired output? I am trying to get a form where only the consecutive values are produced. I mean, if 2 is greater than 1 and 3 is greater than 2, I do not want 3 is greater than 1.
Option 1: "Triangular Join" - Quadratic Complexity
SELECT A.id, A.a, MIN(B.a) AS a
FROM temp A
JOIN temp B ON B.a>A.a
GROUP BY A.id, A.a;
Option 2: "Pseudo Row_Number()" - Linear Complexity
select a_numbered.id, a_numbered.a, b_numbered.a
from
(
select id,
a,
@rownum := @rownum + 1 as rn
from temp
join (select @rownum := 0) r
order by id
) a_numbered join (
select id,
a,
@rownum2 := @rownum2 + 1 as rn
from temp
join (select @rownum2 := 0) r
order by id
) b_numbered
on b_numbered.rn = a_numbered.rn+1
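On engines with window functions (MySQL provides LEAD() from version 8.0), a third option is to fetch the next value directly, without a self join. The sketch below is not part of the options above; it runs on SQLite via Python's sqlite3 module and assumes "consecutive" means the next larger value of a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp (id INTEGER, a INTEGER);
    INSERT INTO temp VALUES (1,1), (2,2), (3,3), (4,4);
""")

# LEAD(a) returns the value of a from the following row in the given order;
# the last row has no successor (NULL) and is filtered out.
rows = conn.execute("""
    SELECT id, a, next_a
    FROM (SELECT id, a, LEAD(a) OVER (ORDER BY a) AS next_a FROM temp) numbered
    WHERE next_a IS NOT NULL
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```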