I'm using BigQuery SQL here.
This is how the table is built.
The table shows that customer id_123 has a purchase in type_shop and uses both delivery_shop and delivery_home.
Is it possible to get the result in a single row instead of 2 different rows?
I only want to show, in one row, that customer id_123 purchased in type_shop and uses delivery_home and delivery_shop.
I tried a few methods using array_agg(stru) but it still shows 2 rows of results instead of 1.
I'm not sure what other SQL function I should try here. I tried searching for similar content on Stack Overflow, but there isn't one that I can apply.
Assuming your sample data is your first table, consider the approach below:
with sample_data as (
  select 'id_123' as customer, 'm' as gender, [1,1] as type_shop, [0,0] as type_online, [0,0] as delivery_pickup, [0,1] as delivery_home, [1,0] as delivery_shop
  union all
  select 'id_456' as customer, 'f' as gender, [1,0,1] as type_shop, [0,1,0] as type_online, [0,0,0] as delivery_pickup, [1,0,0] as delivery_home, [0,1,1] as delivery_shop
),
normalize_data as (
  select
    customer,
    gender,
    type_shop[safe_offset(index)] as type_shop,
    type_online[safe_offset(index)] as type_online,
    delivery_pickup[safe_offset(index)] as delivery_pickup,
    delivery_home[safe_offset(index)] as delivery_home,
    delivery_shop[safe_offset(index)] as delivery_shop,
  from sample_data,
    unnest(generate_array(0, array_length(type_shop) - 1)) as index
),
join_data as (
  select
    customer,
    gender,
    max(type_shop) as t_shop,
    max(type_online) as t_online,
    max(delivery_pickup) as delivery_pickup,
    max(delivery_home) as delivery_home,
    max(delivery_shop) as delivery_shop,
  from normalize_data
  group by customer, gender, type_shop, type_online
)
select
  customer,
  gender,
  array_agg(t_shop) as type_shop,
  array_agg(t_online) as type_online,
  array_agg(delivery_pickup) as delivery_pickup,
  array_agg(delivery_home) as delivery_home,
  array_agg(delivery_shop) as delivery_shop,
from join_data
group by customer, gender
Output:
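A note on the join_data step in the query above: it collapses rows by taking max() over the 0/1 flags within each group. Here is a minimal sketch of that idea in isolation, assuming a hypothetical flat layout (one row per purchase with plain integer flags, not the arrays used above). The name flat_rows and its two rows are made up for illustration.

```sql
-- flat_rows is a hypothetical stand-in: one row per purchase, 0/1 flags as plain integers
with flat_rows as (
  select 'id_123' as customer, 'm' as gender, 1 as type_shop, 0 as type_online, 0 as delivery_pickup, 0 as delivery_home, 1 as delivery_shop
  union all
  select 'id_123', 'm', 1, 0, 0, 1, 0
)
select
  customer,
  gender,
  -- max() over 0/1 flags behaves like a logical OR across the customer's rows
  max(type_shop) as type_shop,
  max(type_online) as type_online,
  max(delivery_pickup) as delivery_pickup,
  max(delivery_home) as delivery_home,
  max(delivery_shop) as delivery_shop
from flat_rows
group by customer, gender
```

For the two id_123 rows this should give a single row with type_shop, delivery_home and delivery_shop set to 1 and the other flags set to 0.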
We are trying to create a materialized view of a large BQ table. The table receives a high volume of streaming web activity inserts, is multi-tenant, and really leverages BQ's nested columnar structure.
We want to create a subset of this table for more efficient, near-real-time query execution with minimal administrative overhead. We thought the simplest solution would be to create a materialized view that is just a subset of rows (by client) and columns, but materialized views currently require aggregation.
Additionally, the materialized view beta supports a limited set of aggregation functions and does not support sub-selects or UNNEST operations. We have not found a good method of extracting the deeply nested STRUCTs into the materialized view. A simple example:
SELECT
  '7602E3E96349E972' as session_id,
  '084F0262' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SAVE50'] as value),
    STRUCT(
      'discounts' as name,
      ['9.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT
  '7602E3E96349E972' as session_id,
  '01ECB6EF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['SPRING','LOVE'] as value),
    STRUCT(
      'discounts' as name,
      ['14.99','6.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT
  '508082BC49BAC09F' as session_id,
  '038B67CF' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['FREESHIP','HOLIDAY25'] as value),
    STRUCT(
      'discounts' as name,
      ['9.99'] as value)
    ] as modifiers
  )] as contexts_transaction
UNION ALL
SELECT
  'C88AE153C784D910' as session_id,
  'EA716BD2' as transaction_id,
  [STRUCT(
    [STRUCT(
      'promotions' as name,
      ['CYBER'] as value),
    STRUCT(
      'discounts' as name,
      ['9.99','19.99'] as value)
    ] as modifiers
  )] as contexts_transaction
Ideally we would retain this STRUCT as is. We are trying to accomplish something like this in the materialized view (recognizing that these are not supported MV features):
SELECT
session_id,
transaction_id,
ARRAY_AGG(STRUCT<name STRING, value ARRAY<STRING>>(mods_array.name,mods_array.value)) as modifiers
FROM data,
UNNEST(contexts_transaction) trans_array,
UNNEST(trans_array.modifiers) mods_array
GROUP BY 1,2
We are open to any method of subsetting this massive table, not just MV, but would love it to have the same benefits (low maintenance, automatic, low cost). Any suggestions appreciated!
As far as I could understand from your question, you want to have a similar output to this:
with rawdata AS
(
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
)
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from rawdata
group by userid;
So, the input table looks like this,
while the output table looks like this.
If your intention is different, please state it in the question with more details.
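The same pattern can be pointed at the column names from the question. The sketch below uses a trimmed copy of the question's first two sample rows and assumes that each row has exactly one contexts_transaction element and at most one modifier per name (otherwise the scalar subquery would return more than one row and fail). The output column names promotions and discounts are my own choice.

```sql
-- Trimmed copy of the question's sample rows; `data` matches the name used in the question's query
WITH data AS (
  SELECT
    '7602E3E96349E972' AS session_id,
    '084F0262' AS transaction_id,
    [STRUCT(
      [STRUCT('promotions' AS name, ['SAVE50'] AS value),
       STRUCT('discounts' AS name, ['9.99'] AS value)] AS modifiers
    )] AS contexts_transaction
  UNION ALL
  SELECT
    '7602E3E96349E972',
    '01ECB6EF',
    [STRUCT(
      [STRUCT('promotions' AS name, ['SPRING', 'LOVE'] AS value),
       STRUCT('discounts' AS name, ['14.99', '6.99'] AS value)] AS modifiers
    )]
)
SELECT
  session_id,
  -- Each scalar subquery must yield at most one row (one matching modifier per input row)
  ARRAY_CONCAT_AGG((
    SELECT m.value
    FROM UNNEST(contexts_transaction) AS t, UNNEST(t.modifiers) AS m
    WHERE m.name = 'promotions'
  )) AS promotions,
  ARRAY_CONCAT_AGG((
    SELECT m.value
    FROM UNNEST(contexts_transaction) AS t, UNNEST(t.modifiers) AS m
    WHERE m.name = 'discounts'
  )) AS discounts
FROM data
GROUP BY session_id
```

Both sample rows share one session_id, so this should return a single row whose arrays hold the three promotion codes and the three discount values. The order of elements inside each array is not guaranteed without an ORDER BY in the aggregate.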
For this purpose, I tried to create that query as a materialized view.
create or replace table project.dataset.rawdata as
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['ABCDEF'] as value), STRUCT('couponIds' as name, ['123456'] as value)] as transactions union all
SELECT 1 as userid, [STRUCT('transactionIds' as name, ['XYZ', 'KLM'] as value), STRUCT('couponIds' as name, ['789', '567'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['XY', 'KL'] as value), STRUCT('couponIds' as name, ['10', '15'] as value)] union all
SELECT 2 as userid, [STRUCT('transactionIds' as name, ['X', 'K'] as value), STRUCT('couponIds' as name, ['20', '25'] as value)]
;
create materialized view project.dataset.mview as
select
userid,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
GROUP BY userid
However, I get the error "Unsupported aggregation function in materialized view: array_concat_agg."
Since materialized views are still in beta, we don't know whether this will be supported in the future. However, it's not possible with the current capabilities.
@fhoffa may be able to say more about it.
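One fallback that stays within regular features is an ordinary (logical) view, which accepts the same query. This is only a sketch: the view name rawdata_by_user is hypothetical, and a logical view is re-evaluated each time it is queried, so it does not bring the precomputation that makes materialized views attractive.

```sql
-- project.dataset.rawdata is the table created above; the view name is a hypothetical choice
create or replace view project.dataset.rawdata_by_user as
select
  userid,
  -- same aggregation as in the failed materialized view, which a logical view can hold
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'transactionIds')) as transactionIds,
  ARRAY_CONCAT_AGG((SELECT trx.value FROM UNNEST(transactions) trx WHERE trx.name = 'couponIds')) as couponIds
from project.dataset.rawdata
group by userid
```

Whether this is good enough depends on how often the view is queried, since every read scans the underlying columns again.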
I have two tables:
1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL (PK, Name, Address, Zip, Email)
This is a one-to-many relationship in which the tables are linked by Name, Address, and Zip.
What I need is:
PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)
What I have so far is this:
#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
SELECT
P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
, ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
FROM `db.ds.table1` P
left JOIN `db.ds.table2` E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
Group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a
My problem is that this already exceeds the six-hour time limit.
Is there any way to make this run faster?
Thanks!
I feel the query below does the same, but in a more optimized way:
#standardSQL
SELECT
PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
SELECT FirstName, LastName, Address, Zip,
ARRAY_AGG(Email LIMIT 3) Emails, ARRAY_AGG(PK LIMIT 3) FK_Email
FROM `db.ds.table2`
GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
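To see the shape of this rewrite on something small, here is a sketch with two made-up inline tables (people and email, with invented rows and only the join columns). The emails are aggregated per key before the join, so the wide select list no longer has to go through an 87-column GROUP BY. The ORDER BY PK inside ARRAY_AGG is my addition, on the assumption that you want Email1 and FK_Email1 to come from the same row. As far as I know, the element order of the two arrays is not guaranteed without it.

```sql
-- people and email are hypothetical stand-ins for db.ds.table1 and db.ds.table2
WITH people AS (
  SELECT 1 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName, '1 Main St' AS Address, '12345' AS Zip
  UNION ALL
  SELECT 2, 'Bob', 'Ray', '2 Oak Ave', '67890'
),
email AS (
  SELECT 10 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName, '1 Main St' AS Address, '12345' AS Zip, 'ann@example.com' AS Email
  UNION ALL
  SELECT 11, 'Ann', 'Lee', '1 Main St', '12345', 'ann.lee@example.com'
)
SELECT
  P.PK, P.FirstName, P.LastName, P.Address, P.Zip,
  -- SAFE_ORDINAL returns NULL instead of failing when the array is shorter than the index
  E.FK_Email[SAFE_ORDINAL(1)] AS FK_Email1, E.Emails[SAFE_ORDINAL(1)] AS Email1,
  E.FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, E.Emails[SAFE_ORDINAL(2)] AS Email2,
  E.FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, E.Emails[SAFE_ORDINAL(3)] AS Email3
FROM people AS P
LEFT JOIN (
  -- One row per key: at most three emails, both arrays sorted the same way
  SELECT FirstName, LastName, Address, Zip,
    ARRAY_AGG(Email ORDER BY PK LIMIT 3) AS Emails,
    ARRAY_AGG(PK ORDER BY PK LIMIT 3) AS FK_Email
  FROM email
  GROUP BY FirstName, LastName, Address, Zip
) AS E
  ON P.FirstName = E.FirstName
  AND P.LastName = E.LastName
  AND P.Address = E.Address
  AND P.Zip = E.Zip
```

Ann should come back with the first two email pairs filled in and the third pair NULL. Bob, who has no match, gets NULL in all six email columns.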
In BQ, I have used ARRAY_AGG(STRUCT(... to restructure some flat data but wanted to go a level further: create another array of records within an array of records.
Although STRUCT does not exist in PostgreSQL, I am interested in how one would tackle that there too.
Considering the flat data:
WITH a AS (
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'work' ph_type, '+123' ph_nr
UNION ALL
SELECT 'ABC' company, 'adress1' address, 'name1' name, 'email1' email, 'cell' ph_type, '+987'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'work' ph_type, '+127'
UNION ALL
SELECT 'DEF' company, 'adress2' address, 'name2' name, 'email2' email, 'cell' ph_type, '+988'
UNION ALL
SELECT 'XYZ' company, 'adress3' address, 'name3' name, 'email3' email, 'work' ph_type, '+456'
)
I can nest contact like so
SELECT company, address, ARRAY_AGG(STRUCT(name, email, ph_type, ph_nr)) contact
FROM a
GROUP BY company, address
ORDER BY 1
but how can I nest, in the same select statement, phones as well (an array of records within contact)?
The JSON representation would look like - for the first contact:
[
  {
    "company": "ABC",
    "address": "adress1",
    "contact": [
      {
        "name": "name1",
        "email": "email1",
        "phone": [
          {
            "ph_type": "work",
            "ph_nr": "+123"
          },
          {
            "ph_type": "cell",
            "ph_nr": "+987"
          }
        ]
      },
      ...
This can probably be done with a WITH clause or a subselect that processes the aggregations sequentially, but I am not sure this would perform well (is the data read twice?).
I have 600M records to parse daily so wondering about the most efficient way.
EDIT: corrected name definition
The answer to your question is two levels of aggregation.
However, the question itself confuses me, because the query uses name but that is not defined in the data.
Here is an example of what to do:
SELECT company, address, ARRAY_AGG(STRUCT(email, phones)) as contact
FROM (
  SELECT company, name, address, email, ARRAY_AGG(STRUCT(ph_type, ph_nr)) as phones
  FROM a
  GROUP BY company, name, address, email
) a
GROUP BY company, address
ORDER BY 1
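Since the question was edited to define name, here is a sketch of the same two-level aggregation that also keeps name in the contact record and calls the inner field phone, as in the JSON in the question. It repeats a few rows of the sample data so that it can run on its own. The CTE name per_contact is my own. As far as I understand, the outer aggregation works on the result of the inner one and does not scan a a second time, but that is worth confirming in the query plan for 600M rows.

```sql
WITH a AS (
  SELECT 'ABC' AS company, 'adress1' AS address, 'name1' AS name, 'email1' AS email, 'work' AS ph_type, '+123' AS ph_nr
  UNION ALL
  SELECT 'ABC', 'adress1', 'name1', 'email1', 'cell', '+987'
  UNION ALL
  SELECT 'XYZ', 'adress3', 'name3', 'email3', 'work', '+456'
),
-- Level 1: one row per contact, phones nested as an array of records
per_contact AS (
  SELECT company, address, name, email,
    ARRAY_AGG(STRUCT(ph_type, ph_nr)) AS phone
  FROM a
  GROUP BY company, address, name, email
)
-- Level 2: one row per company, contacts (each carrying its phone array) nested
SELECT company, address,
  ARRAY_AGG(STRUCT(name, email, phone)) AS contact
FROM per_contact
GROUP BY company, address
ORDER BY company
```

With these rows, ABC should have one contact holding two phone records and XYZ one contact holding one.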