Querying STRUCT elements in an ARRAY (BigQuery) - google-bigquery

Hi guys, I need help with this example from the official documentation: I can't get the same results as the documentation page, even after copying and pasting the code from there.
https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#querying_nested_arrays
The query from that page, whose documented output I am trying to reproduce:
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
AS participants)
SELECT
race,
participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant;
My output when I run the query above:

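How do I get the output shown in the documentation? What should the correct query be?
Use the query below (keep the same WITH races AS (...) clause from the question in front of it). FORMAT('%T', participant) renders each participant STRUCT as a single string literal, and TRANSLATE then replaces the parentheses with braces and removes the double quotes: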
SELECT
race,
TRANSLATE(FORMAT('%T', participant), '()"', '{}') AS participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant
with output
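To see what the TRANSLATE call does to the %T string, here is a small Python sketch that emulates BigQuery's TRANSLATE semantics: each source character maps to the character at the same position in the target, and source characters without a counterpart are deleted. `bq_translate` is a hypothetical helper, not a BigQuery API, and the input string is an assumption about how FORMAT('%T', participant) renders the first participant.

```python
def bq_translate(expression: str, source_characters: str, target_characters: str) -> str:
    """Emulate BigQuery TRANSLATE(expression, source_characters, target_characters)."""
    if len(set(source_characters)) != len(source_characters):
        # BigQuery rejects duplicate characters in source_characters.
        raise ValueError("duplicate character in source_characters")
    table = {}
    for i, ch in enumerate(source_characters):
        # Map to the character at the same position, or delete it (None)
        # when target_characters is shorter than source_characters.
        table[ord(ch)] = target_characters[i] if i < len(target_characters) else None
    return expression.translate(table)


# Assumed %T rendering of the first participant STRUCT.
formatted = '("Rudisha", [23.4, 26.3, 26.4, 26.1])'
print(bq_translate(formatted, '()"', '{}'))  # {Rudisha, [23.4, 26.3, 26.4, 26.1]}
```

Because `'()"'` has one more character than `'{}'`, the double quote has no counterpart and is dropped, which is why the names appear unquoted.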

Related

BigQuery SQL: single-row result instead of two rows

I'm using BigQuery SQL here.
This is the table I built:
This table shows that customer id_123 has purchases in type_shop, with both delivery_shop and delivery_home.
Is it possible to get the result reflected in a single row instead of two different rows?
I only want one row showing that customer id_123 purchased in type_shop and used both delivery_home and delivery_shop.
I tried a few methods using array_agg(stru), but it still shows two rows of results instead of one.
I'm not sure which other SQL function I should try here. I searched Stack Overflow for similar questions, but couldn't find one that I can apply.
Assuming your sample data is your first table, consider the approach below:
with sample_data as (
select 'id_123' as customer, 'm' as gender, [1,1] as type_shop, [0,0] as type_online, [0,0] as delivery_pickup,[0,1] as delivery_home, [1,0] as delivery_shop,
union all select 'id_456' as customer, 'f' as gender, [1,0,1] as type_shop, [0,1,0] as type_online, [0,0,0] as delivery_pickup,[1,0,0] as delivery_home, [0,1,1] as delivery_shop,
),
normalize_data as (
select
customer,
gender,
type_shop[safe_offset(index)] as type_shop,
type_online[safe_offset(index)] as type_online,
delivery_pickup[safe_offset(index)] as delivery_pickup,
delivery_home[safe_offset(index)] as delivery_home,
delivery_shop[safe_offset(index)] as delivery_shop,
from sample_data,
unnest(generate_array(0,array_length(type_shop)-1)) as index
),
join_data as (
select
customer,
gender,
max(type_shop) as t_shop,
max(type_online) as t_online,
max(delivery_pickup) as delivery_pickup,
max(delivery_home) as delivery_home,
max(delivery_shop) as delivery_shop,
from normalize_data
group by customer,gender,type_shop,type_online
)
select
customer,
gender,
array_agg(t_shop) as type_shop,
array_agg(t_online) as type_online,
array_agg(delivery_pickup) as delivery_pickup,
array_agg(delivery_home) as delivery_home,
array_agg(delivery_shop) as delivery_shop,
from join_data
group by customer,gender
Output:

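The three CTEs in the answer can be traced in plain Python to check the logic: normalize_data explodes the parallel arrays into one row per position, join_data collapses rows that share type_shop/type_online and takes MAX of the delivery flags, and the final SELECT rebuilds the arrays per customer. This is a sketch only; the function names are hypothetical, and array order here follows first appearance, whereas ARRAY_AGG without ORDER BY gives no ordering guarantee.

```python
FLAG_COLUMNS = ["type_shop", "type_online", "delivery_pickup", "delivery_home", "delivery_shop"]
DELIVERY_COLUMNS = ["delivery_pickup", "delivery_home", "delivery_shop"]


def sql_max(a, b):
    """MAX() that ignores NULL (None), as SQL aggregates do."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def normalize(sample_data):
    """normalize_data: one flat row per array position (UNNEST + SAFE_OFFSET)."""
    for rec in sample_data:
        for i in range(len(rec["type_shop"])):
            row = {"customer": rec["customer"], "gender": rec["gender"]}
            for col in FLAG_COLUMNS:
                values = rec[col]
                row[col] = values[i] if i < len(values) else None  # SAFE_OFFSET -> NULL
            yield row


def join_data(rows):
    """join_data: GROUP BY customer, gender, type_shop, type_online with MAX() on delivery flags."""
    groups = {}
    for row in rows:
        key = (row["customer"], row["gender"], row["type_shop"], row["type_online"])
        acc = groups.setdefault(key, dict.fromkeys(DELIVERY_COLUMNS))
        for col in DELIVERY_COLUMNS:
            acc[col] = sql_max(acc[col], row[col])
    return groups


def re_aggregate(groups):
    """Final SELECT: GROUP BY customer, gender with ARRAY_AGG on every flag column."""
    out = {}
    for (customer, gender, t_shop, t_online), flags in groups.items():
        rec = out.setdefault((customer, gender), {col: [] for col in FLAG_COLUMNS})
        rec["type_shop"].append(t_shop)
        rec["type_online"].append(t_online)
        for col in DELIVERY_COLUMNS:
            rec[col].append(flags[col])
    return out


sample_data = [
    {"customer": "id_123", "gender": "m", "type_shop": [1, 1], "type_online": [0, 0],
     "delivery_pickup": [0, 0], "delivery_home": [0, 1], "delivery_shop": [1, 0]},
    {"customer": "id_456", "gender": "f", "type_shop": [1, 0, 1], "type_online": [0, 1, 0],
     "delivery_pickup": [0, 0, 0], "delivery_home": [1, 0, 0], "delivery_shop": [0, 1, 1]},
]
result = re_aggregate(join_data(normalize(sample_data)))
print(result[("id_123", "m")])
# {'type_shop': [1], 'type_online': [0], 'delivery_pickup': [0], 'delivery_home': [1], 'delivery_shop': [1]}
```

The two id_123 rows share type_shop = 1 and type_online = 0, so they fall into one group and MAX merges the home and shop delivery flags into a single row.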
Get all tables data with Node.js and SQLite

I have these tables in a database:
Athlete with fields: athlete_id, name, surname, date_of_birth, height, weight, bio, photo_id
AthletePhoto with fields: photo_id, photo, mime_type
AthleteResult with fields: athlete_id, gold, silver, bronze
Game with fields: game_id, city, year
The db model:
The code so far can only send data for one of the tables:
db.serialize(function () {
db.all(
'SELECT athlete_id, name, surname FROM Athlete',
function (err, rows) {
return res.send(rows);
}
);
});
So it only runs this query: SELECT athlete_id, name, surname FROM Athlete.
Is there a way to combine the tables and send all data?
I've tried to combine the tables with UNION, but it didn't send any data:
SELECT athlete_id, name FROM Athlete UNION SELECT game_id, city, year FROM Game UNION SELECT photo_id as athlete_id, mime_type as name FROM AthletePhoto
Assuming that your database structure correctly represents your application's needs, the query you are trying to write will look something like this:
SELECT
a.athlete_id, a.name, a.surname, a.date_of_birth, a.bio, a.height, a.weight,
ap.photo, ap.mime_type,
ar.gold, ar.silver, ar.bronze,
g.city, g.year
FROM Athlete a
JOIN AthletePhoto ap ON a.photo_id = ap.photo_id
JOIN AthleteResult ar ON a.athlete_id = ar.athlete_id
-- assumes AthleteResult also has a game_id column linking it to Game
JOIN Game g ON ar.game_id = g.game_id;
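There is one mistake in the Athlete table: the date_of_birth column is defined twice, so you should rename one of them. You don't need UNION to combine results from different tables; use JOIN instead. Your UNION attempt also fails because its SELECTs return different numbers of columns (2, 3 and 2), which UNION does not allow.
JOIN combines tables column-wise: each output row holds columns from the matching rows of the joined tables.
UNION combines result sets row-wise: it stacks the rows of SELECTs that return the same number of columns.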
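Because the question's callback never checks err, a rejected query such as that UNION simply sends no rows. The JOIN-versus-UNION difference is easy to check against an in-memory SQLite database. The sketch below uses Python's built-in sqlite3 module rather than Node.js; the sample rows and the game_id column in AthleteResult are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE AthletePhoto (photo_id INTEGER PRIMARY KEY, photo BLOB, mime_type TEXT);
CREATE TABLE Athlete (athlete_id INTEGER PRIMARY KEY, name TEXT, surname TEXT,
                      date_of_birth TEXT, height REAL, weight REAL, bio TEXT, photo_id INTEGER);
CREATE TABLE Game (game_id INTEGER PRIMARY KEY, city TEXT, year INTEGER);
-- Assumption: AthleteResult carries a game_id so results can be joined to Game.
CREATE TABLE AthleteResult (athlete_id INTEGER, game_id INTEGER,
                            gold INTEGER, silver INTEGER, bronze INTEGER);
-- Hypothetical sample rows.
INSERT INTO AthletePhoto VALUES (1, NULL, 'image/jpeg');
INSERT INTO Athlete VALUES (10, 'Jane', 'Doe', '1990-01-01', 1.70, 60.0, 'bio', 1);
INSERT INTO Game VALUES (100, 'Tokyo', 2021);
INSERT INTO AthleteResult VALUES (10, 100, 1, 0, 2);
"""

JOIN_QUERY = """
SELECT a.athlete_id, a.name, a.surname, ap.mime_type,
       ar.gold, ar.silver, ar.bronze, g.city, g.year
FROM Athlete a
JOIN AthletePhoto ap ON a.photo_id = ap.photo_id
JOIN AthleteResult ar ON a.athlete_id = ar.athlete_id
JOIN Game g ON ar.game_id = g.game_id
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.executescript(SCHEMA)

# JOIN: columns from all four tables end up side by side in one row.
joined = [dict(row) for row in conn.execute(JOIN_QUERY)]
print(joined)

# UNION ALL: rows from both SELECTs are stacked; the column counts must match.
stacked = [row[0] for row in conn.execute(
    "SELECT name FROM Athlete UNION ALL SELECT city FROM Game ORDER BY 1")]
print(stacked)  # ['Jane', 'Tokyo']

# The question's UNION mixes 2- and 3-column SELECTs, so SQLite rejects it.
try:
    conn.execute("SELECT athlete_id, name FROM Athlete UNION SELECT game_id, city, year FROM Game")
    union_error = None
except sqlite3.OperationalError as exc:
    union_error = str(exc)
print(union_error)
```

In the Node.js callback, the equivalent of the except branch is checking err before calling res.send(rows).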

BigQuery Materialized View of a STRUCT

We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming web activity inserts, is multi-tenant, and really leverages BQ's nested columnar structure.
We want to create a subset of this table for more efficient, near-real time query execution with minimal administrative overhead. We thought the simplest solution would be to create a materialized view which is just a subset of rows (by client) and columns, but currently materialized views require aggregation.
Additionally, the materialized view beta supports a limited set of aggregation functions and does not support sub-selects or UNNEST operations. We have not found a good method of extracting the deeply nested STRUCTs into the materialized view. A simple example:
SELECT
'7602E3E96349E972' as session_id,
'084F0262' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SAVE50'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'7602E3E96349E972' as session_id,
'01ECB6EF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['SPRING','LOVE'] as value),
STRUCT(
'discounts' as name,
['14.99','6.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'508082BC49BAC09F' as session_id,
'038B67CF' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['FREESHIP','HOLIDAY25'] as value),
STRUCT(
'discounts' as name,
['9.99'] as value)
] as modifiers
)] as contexts_transaction
UNION ALL
SELECT
'C88AE153C784D910' as session_id,
'EA716BD2' as transaction_id,
[STRUCT(
[STRUCT(
'promotions' as name,
['CYBER'] as value),
STRUCT(
'discounts' as name,
['9.99','19.99'] as value)
] as modifiers
)] as contexts_transaction
Ideally we would retain this STRUCT as is; we are trying to accomplish something like the following in the materialized view (recognizing that these are not supported MV features):
SELECT
session_id,
transaction_id,
ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name,mods_array.value)) as modifiers
FROM data,
UNNEST(contexts_transaction) trans_array,
UNNEST(trans_array.modifiers) mods_array
GROUP BY 1,2
We are open to any method of subsetting this massive table, not just MV, but would love it to have the same benefits (low maintenance, automatic, low cost). Any suggestions appreciated!
As far as I understand your question, you want output similar to this:
with rawdata AS
(
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;
So, the input table looks like this,
while the output table looks like this.
If your intention is different, please state it in the question with more details.
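For reference, the per-user result this query aims for can be traced in a few lines of Python: for each row, the scalar subquery picks the value array whose name matches, and ARRAY_CONCAT_AGG concatenates those arrays per userid. This is a sketch of the semantics only, with a hypothetical helper name. It keeps input order, whereas ARRAY_CONCAT_AGG without ORDER BY does not guarantee one, and it does nothing to get around the materialized view restriction.

```python
def concat_values_by_name(rows, wanted_name):
    """Emulate, per userid:
    ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = wanted_name))
    rows: list of (userid, [(name, [values...]), ...]) tuples.
    """
    result = {}
    for userid, transactions in rows:
        result.setdefault(userid, None)  # only NULL inputs -> NULL (None) result
        matches = [value for name, value in transactions if name == wanted_name]
        if len(matches) > 1:
            # A scalar subquery returning more than one row is an error in BigQuery.
            raise ValueError("scalar subquery produced more than one element")
        if matches:
            # NULL inputs are skipped; non-NULL arrays are concatenated.
            result[userid] = (result[userid] or []) + list(matches[0])
    return result


rawdata = [
    (1, [("transactionIds", ["ABCDEF"]), ("couponIds", ["123456"])]),
    (1, [("transactionIds", ["XYZ", "KLM"]), ("couponIds", ["789", "567"])]),
    (2, [("transactionIds", ["XY", "KL"]), ("couponIds", ["10", "15"])]),
    (2, [("transactionIds", ["X", "K"]), ("couponIds", ["20", "25"])]),
]
print(concat_values_by_name(rawdata, "transactionIds"))
# {1: ['ABCDEF', 'XYZ', 'KLM'], 2: ['XY', 'KL', 'X', 'K']}
```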
For this purpose, I tried to create that query as a materialized view.
create or replace table project.dataset.rawdata as
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;
create materialized view project.dataset.mview as
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid
However, I get the error "Unsupported aggregation function in materialized view: array_concat_agg".
Since materialized views are still in beta, we don't know whether this will be supported in the future, but it is not possible with the current capabilities.
@fhoffa may be able to tell you more about it.

How to shorten the runtime of a BigQuery query

I have two tables:
1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL (PK, Name, Address, Zip, Email)
This is a one-to-many relationship: the tables are linked by Name, Address, and Zip.
What I need is:
PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)
What I have so far is this:
#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
SELECT
P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
, ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
FROM `db.ds.table1` P
left JOIN `db.ds.table2` E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
Group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a
My problem is that this already goes past the six-hour time limit.
Is there any way to make this run faster?
Thanks!
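I think the query below does the same, but in a more optimized way: it first aggregates table2 per FirstName, LastName, Address and Zip (at most three emails per key) and only then joins the result to table1, so the expensive GROUP BY over all of table1's columns is no longer needed.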
#standardSQL
SELECT
PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
SELECT FirstName, LastName, Address, Zip,
-- ORDER BY PK in both aggregates keeps Emails[n] paired with FK_Email[n]
ARRAY_AGG(Email ORDER BY PK LIMIT 3) Emails, ARRAY_AGG(PK ORDER BY PK LIMIT 3) FK_Email
FROM `db.ds.table2`
GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
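The idea behind the faster query (aggregate the email table once per join key, then join) can be sketched in plain Python. The helper names and sample rows below are hypothetical. The sketch orders each key's emails by PK before keeping three, so Email n and FK_Email n always come from the same EMAIL row, and it mimics SAFE_ORDINAL by returning None for missing positions.

```python
from collections import defaultdict

KEY_COLUMNS = ("FirstName", "LastName", "Address", "Zip")


def top_emails_by_key(email_rows, limit=3):
    """Pre-aggregate EMAIL: at most `limit` (PK, Email) pairs per join key, ordered by PK."""
    grouped = defaultdict(list)
    for row in email_rows:
        key = tuple(row[c] for c in KEY_COLUMNS)
        grouped[key].append((row["PK"], row["Email"]))
    return {key: sorted(pairs, key=lambda p: p[0])[:limit] for key, pairs in grouped.items()}


def attach_emails(people, email_rows, slots=3):
    """LEFT JOIN people to the pre-aggregated emails and spread them into numbered columns."""
    lookup = top_emails_by_key(email_rows, limit=slots)
    result = []
    for person in people:
        pairs = lookup.get(tuple(person[c] for c in KEY_COLUMNS), [])  # no match -> empty
        row = dict(person)
        for i in range(slots):
            # Like SAFE_ORDINAL: a missing position yields NULL (None) instead of an error.
            fk, email = pairs[i] if i < len(pairs) else (None, None)
            row[f"FK_Email{i + 1}"] = fk
            row[f"Email{i + 1}"] = email
        result.append(row)
    return result


people = [
    {"PK": 1, "FirstName": "Ann", "LastName": "Lee", "Address": "1 Main St", "Zip": "10001"},
    {"PK": 2, "FirstName": "Bob", "LastName": "Ray", "Address": "2 Oak Ave", "Zip": "20002"},
]
emails = [
    {"PK": pk, "FirstName": "Ann", "LastName": "Lee", "Address": "1 Main St", "Zip": "10001",
     "Email": f"ann{pk}@example.com"}
    for pk in (13, 11, 14, 12)
]
for row in attach_emails(people, emails):
    print(row["FirstName"], row["Email1"], row["Email2"], row["Email3"])
# Ann ann11@example.com ann12@example.com ann13@example.com
# Bob None None None
```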

Flat to multi-level nested data

In BigQuery, I have used ARRAY_AGG(STRUCT(...)) to restructure some flat data, but I want to go a level further: create an array of records within an array of records.
Although STRUCT does not exist in PostgreSQL, I am interested in how one would tackle this there too.
Considering the flat data:
WITH a AS (
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'work' ph_type, '+123' ph_nr
UNION ALL
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'cell' ph_type, '+987'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'work' ph_type, '+127'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'cell' ph_type, '+988'
UNION ALL
SELECT 'XYZ' company, 'adress3' address, 'name3' name, 'email3' email, 'work' ph_type, '+456'
)
I can nest the contacts like this:
SELECT company, address, ARRAY_AGG(STRUCT(name, email, ph_type, ph_nr)) contact
FROM a
GROUP BY company, address
ORDER BY 1
But how can I also nest the phones (an array of records within each contact) in the same SELECT statement?
The JSON representation for the first contact would look like this:
[
  {
    "company": "ABC",
    "address": "adress1",
    "contact": [
      {
        "name": "name1",
        "email": "email1",
        "phone": [
          {
            "ph_type": "work",
            "ph_nr": "+123"
          },
          {
            "ph_type": "cell",
            "ph_nr": "+987"
          }
        ]
      },
      ...
This can probably be done with a WITH clause or a subquery that performs the aggregations sequentially, but I'm not sure that would perform well (would the data be read twice?).
I have 600M records to process daily, so I'm wondering about the most efficient approach.
EDIT: corrected name definition
The answer to your question is two levels of aggregation.
However, the question itself confuses me, because the query uses name but that is not defined in the data.
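Here is an example of what to do. The inner query builds the phones array for each contact, and the outer query aggregates those contacts per company and address, so the source data is only read once: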
SELECT company, address, ARRAY_AGG(STRUCT(name, email, phones)) as contact
FROM (SELECT company, name, address, email, ARRAY_AGG(STRUCT(ph_type, ph_nr)) as phones
FROM a
GROUP BY company, name, address, email
) a
GROUP BY company, address
ORDER BY 1
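The two-level grouping can be traced in plain Python on the question's sample rows: the first pass plays the role of the inner query (phones per contact) and the second the outer query (contacts per company and address). The function name is hypothetical, and the sketch only illustrates the shape of the result, not how BigQuery executes or optimizes the query.

```python
import json


def nest_contacts(rows):
    """Two-level aggregation: phones per contact, then contacts per company/address."""
    # Level 1 (inner query): group phones by company, address, name, email.
    contacts = {}
    for r in rows:
        key = (r["company"], r["address"], r["name"], r["email"])
        contacts.setdefault(key, []).append({"ph_type": r["ph_type"], "ph_nr": r["ph_nr"]})
    # Level 2 (outer query): group the contacts by company, address.
    companies = {}
    for (company, address, name, email), phones in contacts.items():
        companies.setdefault((company, address), []).append(
            {"name": name, "email": email, "phone": phones}
        )
    # ORDER BY company (the grouping key)
    return [
        {"company": company, "address": address, "contact": contact}
        for (company, address), contact in sorted(companies.items(), key=lambda kv: kv[0])
    ]


FIELDS = ("company", "address", "name", "email", "ph_type", "ph_nr")
flat = [dict(zip(FIELDS, values)) for values in [
    ("ABC", "adress1", "name1", "email1", "work", "+123"),
    ("ABC", "adress1", "name1", "email1", "cell", "+987"),
    ("DEF", "adress2", "name2", "email2", "work", "+127"),
    ("DEF", "adress2", "name2", "email2", "cell", "+988"),
    ("XYZ", "adress3", "name3", "email3", "work", "+456"),
]]
print(json.dumps(nest_contacts(flat)[0], indent=2))
```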