BigQuery How to cross join unnest two arrays at the same time?

I have a large JSON object in BigQuery. A few levels into the object, I've got two different arrays at the same level.
Now, if I want to pull the contents of one array into my SELECT, I can do something like:
SELECT arrayA
FROM ...
CROSS JOIN UNNEST(JSON_QUERY_ARRAY(json.level2.level3.arrayA)) AS arrayA
This works well. But now I'd like to cross join arrayB as well in the same operation, so that I get a join of the higher-level information with all elements of arrayA and arrayB. To clarify: no row should contain information from both arrayA and arrayB. I want each row to be a join with an element of either one array or the other.
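One way to get that shape, as a minimal sketch rather than a definitive answer, is to unnest each array in its own SELECT and stack the results with UNION ALL. The table name my_table, the id column, the sample JSON, and the source and element aliases are all assumptions made for illustration; only the json.level2.level3 path comes from the question.

```sql
-- Hypothetical sample data: one row with a JSON column named json.
WITH my_table AS (
  SELECT
    1 AS id,
    JSON '{"level2": {"level3": {"arrayA": [1, 2], "arrayB": ["x"]}}}' AS json
)
-- Rows built from arrayA only.
SELECT id, 'arrayA' AS source, element
FROM my_table
CROSS JOIN UNNEST(JSON_QUERY_ARRAY(json.level2.level3.arrayA)) AS element
UNION ALL
-- Rows built from arrayB only.
SELECT id, 'arrayB' AS source, element
FROM my_table
CROSS JOIN UNNEST(JSON_QUERY_ARRAY(json.level2.level3.arrayB)) AS element;
```

Each output row carries the higher-level columns plus one element from exactly one of the two arrays, and the source column records which array it came from.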

Related

SQL: CROSS JOIN UNNEST and include data from rows with NULLs in CROSS JOIN UNNEST column

I'm looking for assistance with the SQL query below, where column policy_array has NULL values in some rows but arrays of policy data in others. I would like to be able to include data from rows even when policy_array is NULL in the output.
When I execute the query below, it performs the CROSS JOIN UNNEST as expected, but it also drops every row where policy_array is NULL, which is expected as well. I can imagine a workaround using an intermediate table in which the NULLs in policy_array are replaced with something else, but I would really prefer not to do that.
SELECT
policy,
account_id,
rejects,
overturns,
appeals,
submits
FROM relevant_table
CROSS JOIN UNNEST(policy_array) AS p (policy)
WHERE
...

There are two options. Either use LEFT JOIN with ON true:
FROM relevant_table
LEFT JOIN UNNEST(policy_array) AS p (policy) ON true
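Or, a little bit more hackish, use the fact that UNNEST supports multiple arrays and add an array with one element. When the arrays differ in length, the shorter ones are padded with NULLs, so the extra one-element array guarantees at least one output row (also note the succinct comma syntax for CROSS JOIN UNNEST):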
FROM relevant_table,
UNNEST(policy_array, array[1]) AS p (policy, ignored)
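To see the first option in action, here is a small sketch. It assumes a Presto/Trino-style engine, which the AS p (policy) and array[1] syntax suggests but the question does not state. The two sample rows and their values are made up, and only account_id and policy_array are modeled.

```sql
-- Hypothetical two-row sample: account 2 has a NULL policy_array.
WITH relevant_table (account_id, policy_array) AS (
  VALUES
    (1, CAST(ARRAY['late_fee', 'refund'] AS ARRAY(VARCHAR))),
    (2, CAST(NULL AS ARRAY(VARCHAR)))
)
SELECT account_id, policy
FROM relevant_table
LEFT JOIN UNNEST(policy_array) AS p (policy) ON TRUE;
```

Account 1 produces one row per policy, and account 2 is kept as a single row with a NULL policy instead of being dropped.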

BigQuery - How to unnest multiple nested values

I am trying to Select the two values I have highlighted in the image (attributes.price.list.item.net AND attributes.price.list.item.listPrice.gross)
I am using the following snippet but it just flattens the whole list array and returns every column within. If I try to unnest any other way, I only get errors. How can I unnest multiple nested arrays like this?
SELECT attributes.price.list
FROM my_table LEFT JOIN UNNEST(attributes.price.list)

Consider the approach below: give the unnested array an alias and select the nested fields through that alias.
SELECT
el.item.net,
el.item.listPrice.gross
FROM my_table
LEFT JOIN UNNEST(attributes.price.list) el

How to UNNEST multiple arrays in BigQuery standardSQL

I am selecting data from a Google BigQuery table which includes a JSON column. My table has multiple nested arrays, one of which includes two nested levels.
Here is my table schema:
https://imgur.com/UBPKUMx
My statement is:
SELECT
items.*,
pay.*,
credits.creditnoteid,
credits.id,
credits.total
FROM client_account.invoices,
UNNEST(lineitems) items,
UNNEST(items.tracking),
UNNEST(payments) pay,
UNNEST(creditnotes) credits
https://imgur.com/c1YT258
Unfortunately I get no results...
Can you help me unnest all of the arrays?

OK, I ran a test on one of my datasets. I think that creditnotes is always NULL, because in my case I get no results when I unnest a column that is always NULL. You can fix this by using LEFT JOIN. I modified your query to use left joins, but you might be able to tune it further.
SELECT
items.*,
tracking.*,
pay.*,
credits.creditnoteid,
credits.id,
credits.total
FROM client_account.invoices
LEFT JOIN UNNEST(lineitems) items
LEFT JOIN UNNEST(items.tracking) tracking
LEFT JOIN UNNEST(payments) pay
LEFT JOIN UNNEST(creditnotes) credits
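The effect is easy to reproduce with a tiny example. This is only a sketch with made-up data: the invoice_id, payment_id, and id fields are assumptions, since the real schema is only available as an image.

```sql
-- Hypothetical one-row table: payments has one element, creditnotes is NULL.
WITH invoices AS (
  SELECT
    1 AS invoice_id,
    [STRUCT('p1' AS payment_id)] AS payments,
    CAST(NULL AS ARRAY<STRUCT<id STRING>>) AS creditnotes
)
SELECT invoice_id, pay.payment_id, credits.id AS credit_id
FROM invoices
LEFT JOIN UNNEST(payments) AS pay
LEFT JOIN UNNEST(creditnotes) AS credits;
```

With LEFT JOIN the invoice comes back as one row with a NULL credit_id. If you swap the last LEFT JOIN for a comma (an implicit CROSS JOIN), the NULL array contributes zero rows and the whole invoice disappears from the result.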

Duplicate Rows when self joining tables in SQL

I am trying to self join a table together based on the column "Warehouse Number". The goal is to list part numbers, descriptions, and item class of any pairs of parts that are in the same item class and same warehouse. Below is an example of the desired output and starting data.
STARTING DATA
EXAMPLE OF SOME DESIRED DATA
However, when that self join happens, there aren't "exact" duplicates but the pairs appear twice in the table.
EXAMPLE OF OUTPUT WITH PROBLEMS (HIGHLIGHTED)
I have tried most iterations of UNION, INNER JOIN, and other join methods. Is it possible to remove the pairs since it isn't technically an exact duplicate of another row?
Current SQL Code

You may alter your join condition to check that the first part number is strictly less than the second one:
SELECT
t1.PARTNUMB, t1.PARTDESC, t1.ITEMCLSS, t2.PARTNUMB, t2.PARTDESC, t2.ITEMCLSS
FROM PARTFIRST t1
INNER JOIN PARTSECOND t2
ON t1.WRHSNUMB = t2.WRHSNUMB AND
t1.ITEMCLSS = t2.ITEMCLSS AND
t1.PARTNUMB < t2.PARTNUMB;
The problem with using FIRST.PARTNUMB <> SECOND.PARTNUMB is that it reports every pair of different part numbers twice, once in each left/right order. A strictly-less-than inequality keeps only one ordering of each pair, which excludes the "duplicates," as you view them.
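Here is a self-contained sketch of the same idea on made-up data. The parts table and its three rows are hypothetical, and it is joined to itself rather than using the two table names above.

```sql
-- Hypothetical sample: A1 and B2 share an item class and a warehouse.
WITH parts AS (
  SELECT 'A1' AS PARTNUMB, 'HW' AS ITEMCLSS, 1 AS WRHSNUMB
  UNION ALL SELECT 'B2', 'HW', 1
  UNION ALL SELECT 'C3', 'SG', 1
)
SELECT t1.PARTNUMB AS first_part, t2.PARTNUMB AS second_part
FROM parts t1
INNER JOIN parts t2
  ON t1.WRHSNUMB = t2.WRHSNUMB
 AND t1.ITEMCLSS = t2.ITEMCLSS
 AND t1.PARTNUMB < t2.PARTNUMB;  -- keeps (A1, B2) but not (B2, A1)
```

With < the pair shows up once, as (A1, B2). With <> in its place you would get both (A1, B2) and (B2, A1).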

Integrating multiple datasets for the purpose of filtering (without joins) using BigQuery

I am trying to filter a dataset according to a condition in another dataset. In code it is something like this (though this doesn't work):
SELECT
location_integer
FROM
[datasetA]
WHERE
(SELECT COUNT(*) FROM datasetB
WHERE
datasetB.region_start < datasetA.location_integer
AND datasetA.location_integer < datasetB.region_end) > 1
In words: datasetA has a column of locations (integers) and datasetB has a column of regions specified by region_start and region_end. I want to filter datasetA by whether there exists a region containing datasetA.location in datasetB. If no such region exists, I want to filter that row out.
It would be equally good to create an intermediate table containing the number of regions that contain each location in datasetA and then filter on that, but I haven't managed to figure that out either.
Do these tables have to be included in the same dataset in order for this to work?
Thanks for the help.

Pretty much any answer that combines data from two tables in BigQuery will boil down to a union or a join. Union is clearly not helpful in your use case, so you're looking at a join.
Unfortunately, this is a pretty tricky problem in BigQuery's legacy SQL, since its join conditions only allow conjunctions of equalities (e.g., a.f = b.f AND a.g = b.g). BigQuery's standard SQL does accept inequality conditions in an inner join.
If your tables are not too large, you can CROSS JOIN them, and filter out the remaining rows. But that solution doesn't scale well as your tables become large, because the amount of intermediate data generated can be pretty huge.
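A minimal sketch of the CROSS JOIN-and-filter idea, written in standard SQL with made-up sample rows (the values and the a and b aliases are assumptions for illustration):

```sql
-- Hypothetical sample: location 5 falls inside both regions, 50 in none.
WITH datasetA AS (
  SELECT 5 AS location_integer
  UNION ALL SELECT 50
),
datasetB AS (
  SELECT 1 AS region_start, 10 AS region_end
  UNION ALL SELECT 3, 8
)
SELECT DISTINCT a.location_integer
FROM datasetA AS a
CROSS JOIN datasetB AS b
WHERE b.region_start < a.location_integer
  AND a.location_integer < b.region_end;
```

The DISTINCT is there because a location that sits inside several regions would otherwise come back once per region. Note that it also collapses repeated locations in datasetA, so drop it or group differently if you need to keep those.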
Alternatively, if your regions contain a small number of discrete values, you could join datasetB with a table of integers in order to generate the list of points contained in each region, and then join that table with the location table.
SELECT location_integer
FROM datasetA
WHERE location_integer IN
(SELECT (datasetB.region_start + integers.n) AS region_point
FROM datasetB
CROSS JOIN integers
WHERE integers.n <= (datasetB.region_end - datasetB.region_start))
This approach cuts down on the size of the CROSS JOIN assuming you can guarantee that the maximum size of a region is small.
