Array Aggregation - Retrieving an entire row of data in BigQuery - google-bigquery

We used the array aggregation method and loaded the data into BigQuery.
Clarification:
Is it possible to retrieve a specific value with the array aggregation method? What methods are available for retrieving data from a field that holds multiple records?
Query Clarification
We tried to retrieve all values of a particular field that has multiple values (see the screenshot) using the query below, but we got an error.
Sample Query
select fv,product.productSKU,product.productVariant,product.productBrand
from dataset.tablename
where hn=9 and product.productBrand='Politix'

You should use UNNEST, as in the example below:
#standardSQL
SELECT
fv,
product.productSKU,
product.productVariant,
product.productBrand
FROM `dataset.tablename`,
UNNEST(product) AS product  -- one output row per element of the repeated product field
WHERE hn=9
AND product.productBrand='Politix'
You can also check the "Working with Arrays in Standard SQL" page of the BigQuery documentation.
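Since the title asks for the entire row, note that the comma join above returns one output row per matching array element. To keep each original row whole, with its product array intact, a correlated EXISTS subquery over UNNEST can filter rows without flattening them. A minimal sketch, using made-up inline data in place of dataset.tablename (the values, and the assumption that product is an ARRAY of STRUCTs with these three fields, are hypothetical):

```sql
#standardSQL
-- Hypothetical stand-in for `dataset.tablename`: product is assumed to be
-- an ARRAY<STRUCT<productSKU, productVariant, productBrand>>.
WITH t AS (
  SELECT 1 AS fv, 9 AS hn,
    [STRUCT('SKU1' AS productSKU, 'Red' AS productVariant, 'Politix' AS productBrand),
     STRUCT('SKU2' AS productSKU, 'Blue' AS productVariant, 'Other' AS productBrand)] AS product
  UNION ALL
  SELECT 2 AS fv, 9 AS hn,
    [STRUCT('SKU3' AS productSKU, 'Grey' AS productVariant, 'Other' AS productBrand)] AS product
)
SELECT *
FROM t
WHERE hn = 9
  -- keep the row if at least one array element is a Politix product;
  -- the row, including its full product array, is returned once
  AND EXISTS (
    SELECT 1
    FROM UNNEST(t.product) AS p
    WHERE p.productBrand = 'Politix'
  )
```

Here only the fv = 1 row comes back, with both of its products. Conversely, the comma join drops rows whose product array is empty; writing LEFT JOIN UNNEST(product) AS product instead keeps them, with NULLs in the product columns.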

Related

SQL JSON HELP | Selecting ALL records with a certain JSON value

I am trying to get all records in my users table whose inventory column contains WEAPON_COMBATPISTOL.
The column holds JSON, and I am lost: I have tried JSON_EXTRACT and JSON_CONTAINS.
JSON DATA:
[{"slot":1,"count":450,"name":"money"},
{"slot":2,"count":54,"name":"ammo-9"},
{"metadata":{"serial":"643280CXJ213639","durability":97.91999999999998,"registered":"Barry McCeiner","components":[],"ammo":11},"slot":3,"count":1,"name":"WEAPON_COMBATPISTOL"},
{"slot":4,"count":8,"name":"burger"},
{"slot":5,"count":8,"name":"icetea"},
{"slot":6,"count":7,"name":"stone"},
{"slot":10,"count":6,"name":"lockpick"}]
SQL Statement:
SELECT * FROM users WHERE JSON_CONTAINS(inventory, 'WEAPON_COMBATPISTOL', '$.name');
I would also like to remove just the WEAPON_COMBATPISTOL entry from the JSON data, if that is possible.
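A sketch for this question, assuming MySQL 5.7 or later (JSON_CONTAINS is a MySQL function) and that inventory holds a JSON array like the sample. The original call fails for two reasons: the candidate argument must itself be valid JSON, so the string needs double quotes inside the SQL literal, and the path '$.name' looks for a name key on the top-level array rather than on its elements. Passing a candidate object instead checks every element, because MySQL treats a non-array candidate as contained in an array when it is contained in some element of that array.

```sql
-- MySQL 5.7+ sketch; assumes users.inventory holds a JSON array like the sample.

-- 1) Rows whose inventory has an element with "name": "WEAPON_COMBATPISTOL".
SELECT *
FROM users
WHERE JSON_CONTAINS(inventory, '{"name": "WEAPON_COMBATPISTOL"}');

-- 2) Remove that element. JSON_SEARCH returns a path such as "$[2].name";
--    SUBSTRING_INDEX keeps "$[2]", the path of the whole element.
--    The WHERE clause matters: without a match JSON_SEARCH returns NULL,
--    and JSON_REMOVE with a NULL path would set inventory to NULL.
--    Caveats: JSON_SEARCH uses LIKE matching, so '_' matches any single
--    character, and 'one' removes only the first matching element per run.
UPDATE users
SET inventory = JSON_REMOVE(
  inventory,
  SUBSTRING_INDEX(
    JSON_UNQUOTE(JSON_SEARCH(inventory, 'one', 'WEAPON_COMBATPISTOL', NULL, '$[*].name')),
    '.', 1))
WHERE JSON_CONTAINS(inventory, '{"name": "WEAPON_COMBATPISTOL"}');
```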

Looking for guidance on my sql query that apparently includes an array

I'm quite new to SQL and looking for help on what I'm doing wrong.
With the code below, I'm getting the error "cannot access field value on a value with type array<struct> at [1:30]".
The "audience size value" comes from public_campaigns, whereas the engagement rate comes from public_instagram_channels.
I think the table causing the issue here is public_campaigns.
Thanks in advance for your help!
SELECT creator_audience_size.value, AVG(engagement_rate/1000000) AS avgER
FROM `public_instagram_channels` AS pic
JOIN `public_campaigns`AS pc
ON pic.id=pc.id
GROUP BY creator_audience_size.value
This error comes from one of the columns having REPEATED mode, i.e. an ARRAY type.
In Google BigQuery you have to use UNNEST on repeated columns to get their individual values into the result set.
It's not certain from what you've posted which column is repeated, although the error position [1:30] points at value in creator_audience_size.value, which suggests creator_audience_size is the ARRAY<STRUCT> column. The table definitions for public_instagram_channels and public_campaigns will confirm this: look for the word REPEATED in the Mode column of each schema.
Once you've found it, include UNNEST in your query, as in this untested example:
SELECT cas.value, AVG(pic.engagement_rate/1000000) AS avgER
FROM `public_instagram_channels` AS pic
JOIN `public_campaigns` AS pc ON pic.id = pc.id
-- assumes creator_audience_size (from public_campaigns) is the REPEATED column
CROSS JOIN UNNEST(pc.creator_audience_size) AS cas
GROUP BY cas.value
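To find the repeated column without opening each schema in the console, a query against INFORMATION_SCHEMA.COLUMNS can list the array-typed columns. A sketch; your_project and your_dataset are placeholders for wherever the two tables live:

```sql
#standardSQL
-- your_project / your_dataset are placeholders; replace them with the
-- project and dataset that contain the two tables.
SELECT table_name, column_name, data_type
FROM `your_project`.your_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name IN ('public_instagram_channels', 'public_campaigns')
  -- REPEATED columns are reported as ARRAY<...>; the leading wildcard also
  -- catches repeated fields nested inside a RECORD (STRUCT<... ARRAY<...> ...>)
  AND data_type LIKE '%ARRAY<%'
ORDER BY table_name, ordinal_position
```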

How can data studio read a repeatable column as values of a single record?

I'm moving a Mongo collection into BigQuery to do analysis and visualizations in Google Data Studio. Specifically, I'm trying to map all results of a locations collection, which has one record per location. Each record stores its coordinates as an array of 2 numbers.
In Data Studio, when I try to map the locations.coordinates value, it fails because it only pulls in the first value of the array. If I output the result as a table instead of a map, I see 2 rows for each record, with the same _id but a different locations.coordinates value: one row has the latitude (locations.coordinates[0]) and the other has the longitude (locations.coordinates[1]).
I think I have to do this as a scheduled query in BigQuery that runs after every data sync, but I'm hoping there is a way to do it with a calculated field or a blended data source in Google Data Studio.
Screenshots: data as it exists in Mongo, in BigQuery and in Data Studio; additionally, the BigQuery record types.
You can address array elements directly by index and reshape the data with STRUCT, for example:
WITH t AS (
SELECT * FROM UNNEST([
STRUCT('a' AS company, STRUCT([-71.2, 42.0] as coordinates, 'Point' as type) AS location),
('b', ([-71.0, 42.2], 'Point')),
('c', ([-71.4, 42.4], 'Point'))
])
)
--show source structure of example data
--SELECT * FROM t
SELECT * except(location),
STRUCT(
location.coordinates[safe_offset(0)] as long,
location.coordinates[safe_offset(1)] as lat,
location.type
) as location
FROM t
Use OFFSET() for 0-based access and ORDINAL() for 1-based access. The SAFE_ variants (SAFE_OFFSET(), SAFE_ORDINAL()) return NULL instead of raising an error when the index doesn't exist in the array. If you need to know that values are missing, use the version without SAFE_.
Either way, choosing specific values from the array makes the structure flat: there are no repeated rows any more, so it should work with Data Studio or any other visualization tool.
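A tiny self-contained illustration of the four accessors, on a made-up literal array:

```sql
#standardSQL
SELECT
  arr[OFFSET(0)]       AS by_offset,          -- 0-based: 10
  arr[ORDINAL(1)]      AS by_ordinal,         -- 1-based: 10
  arr[SAFE_OFFSET(5)]  AS safe_offset_miss,   -- out of range: NULL
  arr[SAFE_ORDINAL(0)] AS safe_ordinal_miss   -- ORDINAL starts at 1: NULL
  -- arr[OFFSET(5)] here would make the whole query fail with an
  -- out-of-bounds error instead of returning NULL
FROM (SELECT [10, 20, 30] AS arr)
```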
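On the scheduled query mentioned in the question: a view runs its query each time it is read, so a view over the synced table stays current after every sync without a schedule. A sketch, assuming hypothetical table names and the location STRUCT layout from the example above, with GeoJSON's [longitude, latitude] order (which matches the sample values):

```sql
#standardSQL
-- Hypothetical names: `project.dataset.locations` is the synced table and
-- `project.dataset.locations_flat` is the view Data Studio connects to.
CREATE OR REPLACE VIEW `project.dataset.locations_flat` AS
SELECT
  _id,
  -- GeoJSON Point order is [longitude, latitude]
  location.coordinates[SAFE_OFFSET(0)] AS longitude,
  location.coordinates[SAFE_OFFSET(1)] AS latitude,
  -- "lat,long" text, assumed here as input for Data Studio's
  -- Latitude, Longitude geo field type
  CONCAT(
    CAST(location.coordinates[SAFE_OFFSET(1)] AS STRING), ',',
    CAST(location.coordinates[SAFE_OFFSET(0)] AS STRING)
  ) AS lat_long
FROM `project.dataset.locations`
```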

GCP Bigquery - query empty values from a record type value

I'm trying to query all resources that have empty records in a specific column, but I'm unable to make it work. Here's the query that I'm using:
SELECT
service.description,
project.labels,
cost AS cost
FROM
`xxxxxx.xxxxx.xxxx.xxxx`
WHERE
service.description = 'BigQuery' ;
Here are the results:
As you can see, that query returns everything, but I only want the resources with empty records, for example rows 229, 230 and so on.
It's worth mentioning that the schema for the column I'm trying to query is:
project.labels RECORD REPEATED
I mention this because I tried several combinations of WHERE conditions, but they all end in an error.
To identify an empty repeated record, you can use ARRAY_LENGTH in the WHERE clause, as in the example below:
WHERE ARRAY_LENGTH(project.labels) = 0
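A self-contained version with made-up rows shaped like the billing columns in the question. IS NULL would not find these rows, since BigQuery stores an empty REPEATED field as an empty array rather than NULL, and comparing with = [] is not an option because BigQuery does not support equality on arrays:

```sql
#standardSQL
-- Hypothetical rows mimicking service.description, project.labels and cost.
WITH billing AS (
  SELECT
    STRUCT('BigQuery' AS description) AS service,
    STRUCT([STRUCT('env' AS key, 'prod' AS value)] AS labels) AS project,
    1.5 AS cost
  UNION ALL
  SELECT
    STRUCT('BigQuery' AS description),
    STRUCT(ARRAY<STRUCT<key STRING, value STRING>>[] AS labels),
    0.7
)
SELECT service.description, project.labels, cost
FROM billing
WHERE service.description = 'BigQuery'
  AND ARRAY_LENGTH(project.labels) = 0  -- only the 0.7 row passes
```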

Flatten nested data in Big Query to a single row

This is what the data looks like
This is what I am trying to achieve
I just need the flattened data to show destination 1 and destination 2 as well as duration 1 and duration 2.
I have used the UNNEST function in BigQuery, but it creates multiple rows, and I am unable to use any aggregation to group those rows back together because the data is non-numeric. Thank you for helping!
Below is for BigQuery Standard SQL
#standardSQL
SELECT EnquiryReference,
Destinations[OFFSET(0)].Name AS Destination1,
Destinations[SAFE_OFFSET(1)].Name AS Destination2,
Destinations[OFFSET(0)].Duration AS Duration1,
Destinations[SAFE_OFFSET(1)].Duration AS Duration2
FROM `project.dataset.table`
Applied to the sample data from your question, this returns one row per input row, with the first and second destination names and durations side by side.
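On the point in the question that non-numeric rows can't be aggregated: MAX works on STRING values too, so UNNEST ... WITH OFFSET plus conditional aggregation also collapses each enquiry back to one row. A sketch with hypothetical inline data (Duration is assumed to be an integer number of nights):

```sql
#standardSQL
-- Hypothetical data: EnquiryReference plus a repeated Destinations record.
WITH t AS (
  SELECT 'ENQ1' AS EnquiryReference,
    [STRUCT('Paris' AS Name, 3 AS Duration),
     STRUCT('Rome' AS Name, 4 AS Duration)] AS Destinations
  UNION ALL
  SELECT 'ENQ2', [STRUCT('Lisbon' AS Name, 7 AS Duration)]
)
SELECT
  EnquiryReference,
  -- pos is the 0-based position of each element; MAX picks the single
  -- non-NULL value per group, whether it is a STRING or a number
  MAX(IF(pos = 0, d.Name, NULL))     AS Destination1,
  MAX(IF(pos = 1, d.Name, NULL))     AS Destination2,
  MAX(IF(pos = 0, d.Duration, NULL)) AS Duration1,
  MAX(IF(pos = 1, d.Duration, NULL)) AS Duration2
FROM t, UNNEST(t.Destinations) AS d WITH OFFSET AS pos
GROUP BY EnquiryReference
```

For ENQ2, which has a single destination, Destination2 and Duration2 come out NULL, mirroring SAFE_OFFSET(1). The OFFSET/SAFE_OFFSET query is simpler and needs no GROUP BY; the pivot form is mainly useful when the data has already been unnested.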