Flatten nested data in Big Query to a single row - sql

This is what the data looks like
This is what I am trying to achieve
I just need the flattened data to show destination 1 and destination 2 as well as duration 1 and duration 2.
I have used the unnest function in Big Query but it creates multiple rows. I am unable to use any aggregation to group the multiple rows as the data is non-numeric. Thank you for helping!

The query below is for BigQuery Standard SQL:
#standardSQL
SELECT EnquiryReference,
Destinations[OFFSET(0)].Name AS Destination1,
Destinations[SAFE_OFFSET(1)].Name AS Destination2,
Destinations[OFFSET(0)].Duration AS Duration1,
Destinations[SAFE_OFFSET(1)].Duration AS Duration2
FROM `project.dataset.table`
Applied to the sample data from your question, this returns one row per EnquiryReference with the columns Destination1, Destination2, Duration1, and Duration2.
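To make the difference between OFFSET and SAFE_OFFSET concrete, here is a minimal Python sketch of the same flattening logic. The sample rows and the helper names (`safe_offset`, `flatten`) are made up for illustration; only the column names come from the query above.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def safe_offset(items: Sequence[T], index: int) -> Optional[T]:
    """Mimic SAFE_OFFSET: return None instead of failing when index is out of range."""
    return items[index] if 0 <= index < len(items) else None


def flatten(row: dict) -> dict:
    dests = row["Destinations"]
    first = dests[0]                 # like OFFSET(0): raises if the array is empty
    second = safe_offset(dests, 1)   # like SAFE_OFFSET(1): None if there is no second item
    return {
        "EnquiryReference": row["EnquiryReference"],
        "Destination1": first["Name"],
        "Destination2": second["Name"] if second else None,
        "Duration1": first["Duration"],
        "Duration2": second["Duration"] if second else None,
    }


# Hypothetical sample rows (not taken from the question)
rows = [
    {"EnquiryReference": "A1",
     "Destinations": [{"Name": "Paris", "Duration": "3 days"},
                      {"Name": "Rome", "Duration": "2 days"}]},
    {"EnquiryReference": "B2",
     "Destinations": [{"Name": "Oslo", "Duration": "5 days"}]},
]

flat = [flatten(r) for r in rows]
```

Each input row stays a single output row; a row with only one destination simply gets empty values for Destination2 and Duration2.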

Related

SQL - Parse a field and SUM numbers at regular delimiter intervals

I request your help for an issue beyond my current skills...
I'm using Google Big Query to store analytics data about my website, and to calculate the revenue I have a quite difficult query to build.
We have the field %product% which is formatted as following :
;%productID%;%productQuantity%;%productRevenue%;;
If more than one product has been bought, the different products data will be delimited by ",", which can give this :
;12345678;1;49.99;;,;45678912;1;54.99;;
;45678912;2;59.98;;,;14521452;2;139.98;;,;12345678;2;19.98;;
;14521452;1;54.99;;
The only way to calculate the revenue is to sum all the different %productRevenue% from a line and store this into a column.
I have no idea how to do it just with a SQL query... Maybe with RegEx ? Any idea ?
I'd like to create a view with that info to easily pull the data into PowerBI then. But maybe I should process that with M directly in PBI ?
Thanks a lot,
Alex
The query below is for BigQuery Standard SQL:
#standardSQL
SELECT
SPLIT(i, ';')[OFFSET(1)] productID,
SUM(CAST(SPLIT(i, ';')[OFFSET(2)] AS INT64)) productQuantity,
SUM(CAST(SPLIT(i, ';')[OFFSET(3)] AS FLOAT64)) productRevenue
FROM `project.dataset.table`,
UNNEST(SPLIT(product)) i
GROUP BY productID
Applied to the sample data from your question, the output is:
Row  productID  productQuantity  productRevenue
1    12345678   3                69.97
2    45678912   3                114.97
3    14521452   3                194.97
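The split-then-aggregate logic can be checked outside BigQuery with a short Python sketch that uses the three sample lines from the question. It uses `Decimal` to avoid floating-point noise in the totals. The final `line_revenue` list is an extra, on the assumption that a per-line revenue total (as described in the question) is also wanted.

```python
from collections import defaultdict
from decimal import Decimal

# The three sample values of the product field from the question
lines = [
    ";12345678;1;49.99;;,;45678912;1;54.99;;",
    ";45678912;2;59.98;;,;14521452;2;139.98;;,;12345678;2;19.98;;",
    ";14521452;1;54.99;;",
]


def parse(line):
    """Yield (productID, quantity, revenue) for each comma-separated product."""
    for item in line.split(","):
        fields = item.split(";")  # ['', id, quantity, revenue, '', '']
        yield fields[1], int(fields[2]), Decimal(fields[3])


# Per-product totals, mirroring GROUP BY productID
totals = defaultdict(lambda: [0, Decimal("0")])
for line in lines:
    for product_id, quantity, revenue in parse(line):
        totals[product_id][0] += quantity
        totals[product_id][1] += revenue

# Per-line revenue, one value per original row
line_revenue = [sum(revenue for _, _, revenue in parse(line)) for line in lines]
```

The per-product totals match the output table above.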

GCP: select query with unnest from array has very big process data to run compared to hardcoded values

In BigQuery on GCP, I am trying to select rows from a table where the date matches one of the dates in a list of values. If I hardcode the list of values in the query, it is vastly cheaper to run than if I use a temporary structure such as an array.
Is there a way to use the temporary structure but avoid the enormous processing cost?
Why is it so expensive for something this small and simple?
Please see the examples below:
**-----1/ array structure example: this query processes 144.8 GB----------**
WITH
get_a as (
SELECT
GENERATE_DATE_ARRAY('2000-01-01','2000-01-02') as array_of_dates
)
SELECT
a.heading as title,
a.ingest_time as proc_date
FROM
`veiw_a.events` as a,
get_a as b,
UNNEST(b.array_of_dates) as c
WHERE
c in (CAST(a.ingest_time AS DATE))
**------2/ hardcoded example: this query processes 936.5 MB, over 154 times less--------**
SELECT
a.heading as title,
a.ingest_time as proc_date
FROM
`veiw_a.events` as a
WHERE
(CAST(a.ingest_time as DATE)) IN ('2000-01-01','2000-01-02')
Presumably, your view_a.events table is partitioned by the ingest_time.
The issue is that partition pruning is very conservative (buggy?). With the direct comparisons, BigQuery is smart enough to recognize exactly which partitions are used for the query. But with the generated version, BigQuery is not able to figure this out, so the entire table needs to be read.
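One possible workaround, assuming the query is built client-side, is to expand the date range into literals before sending the query, so that BigQuery sees the same hardcoded form that the question reports as cheap. The sketch below only builds the query text; `date_literals` and `build_query` are hypothetical helper names, and whether pruning actually applies still depends on how the table is partitioned.

```python
from datetime import date, timedelta


def date_literals(start: date, end: date) -> str:
    """Return every date from start to end (inclusive) as quoted SQL literals."""
    days = (end - start).days
    return ", ".join(
        f"'{(start + timedelta(days=i)).isoformat()}'" for i in range(days + 1)
    )


def build_query(start: date, end: date) -> str:
    # Table and column names are copied from the question.
    return (
        "SELECT a.heading AS title, a.ingest_time AS proc_date\n"
        "FROM `veiw_a.events` AS a\n"
        f"WHERE CAST(a.ingest_time AS DATE) IN ({date_literals(start, end)})"
    )


query = build_query(date(2000, 1, 1), date(2000, 1, 2))
```

Because the values come from `date` objects rather than user-supplied strings, formatting them into the query text does not open an injection path here.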

Array Aggregation - Retrieving an entire row of data in BigQuery

We loaded the data into BigQuery using array aggregation.
Clarification:
Is it possible to retrieve a specific value from an aggregated array? What methods are available for retrieving data from a field that holds multiple records?
Query clarification:
We tried to retrieve all the values of a field that holds multiple values (shown in the screenshot) using the query below, but we got an error.
Sample query:
select fv,product.productSKU,product.productVariant,product.productBrand
from dataset.tablename
where hn=9 and product.productBrand='Politix'
Use UNNEST, as in the example below:
#standardSQL
SELECT
fv,
product.productSKU,
product.productVariant,
product.productBrand
FROM `dataset.tablename`,
UNNEST(product) product
WHERE hn=9
AND product.productBrand='Politix'
You can also check Working with Arrays in Standard SQL
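As a rough mental model, the comma join with UNNEST pairs each row with every element of its own array, and the WHERE clause then filters those pairs. Here is a minimal Python sketch of that behaviour; the sample rows are invented, and only the field names come from the query.

```python
# Hypothetical rows; "product" is a repeated (array) field
rows = [
    {"fv": "v1", "hn": 9, "product": [
        {"productSKU": "S1", "productVariant": "Blue", "productBrand": "Politix"},
        {"productSKU": "S2", "productVariant": "Red", "productBrand": "Other"},
    ]},
    {"fv": "v2", "hn": 3, "product": [
        {"productSKU": "S3", "productVariant": "Grey", "productBrand": "Politix"},
    ]},
]

result = [
    (row["fv"], p["productSKU"], p["productVariant"], p["productBrand"])
    for row in rows
    for p in row["product"]  # the UNNEST step: one output candidate per array element
    if row["hn"] == 9 and p["productBrand"] == "Politix"
]
```

Only the matching array element of the matching row survives; the other element of the same row is dropped.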

Query to return the amount of time each field equals a true value

I'm collecting data and storing it in SQL Server. The table consists of the data shown in the Example Table below. I need to show the amount of time that each field [Fault0-Fault10] is in a fault state. A fault state is represented by a value of 1 in the field.
I have used the following query and got the desired results, but only for one field. I need the total time for each field in a single query, and I'm having trouble pulling all of the fields into one query.
SELECT
Distinct [Fault0]
,datediff(mi, min(DateAndTime), max(DateAndTime)) TotalTimeInMins
FROM [dbo].[Fault]
Where Fault0 =1
group by [Fault0]
Results
Assuming you want the max() - min(), then you simply need to unpivot the data. Your query can look like:
SELECT v.faultname,
datediff(minute, min(f.DateAndTime), max(f.DateAndTime)) as TotalTimeInMins
FROM [dbo].[Fault] f CROSS APPLY
(VALUES ('fault0', fault0), ('fault1', fault1), . . ., ('fault10', fault10)
) v(faultname, value)
WHERE v.value = 1
GROUP BY v.faultname;
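The unpivot-then-aggregate idea can be sketched in Python as follows. The rows are hypothetical and show only three fault columns. As in the answer, the sketch assumes that max minus min is the desired measure, which treats each fault as one continuous period.

```python
from datetime import datetime

# Hypothetical rows; the real table has Fault0 through Fault10
rows = [
    {"DateAndTime": datetime(2020, 1, 1, 8, 0), "Fault0": 1, "Fault1": 0, "Fault2": 0},
    {"DateAndTime": datetime(2020, 1, 1, 8, 10), "Fault0": 1, "Fault1": 1, "Fault2": 0},
    {"DateAndTime": datetime(2020, 1, 1, 8, 25), "Fault0": 0, "Fault1": 1, "Fault2": 0},
]
fault_columns = ["Fault0", "Fault1", "Fault2"]

# Unpivot: one (fault name, timestamp) pair per column that is in a fault state
unpivoted = [
    (name, row["DateAndTime"])
    for row in rows
    for name in fault_columns
    if row[name] == 1
]

# Group by fault name and track the earliest and latest timestamp
spans = {}
for name, ts in unpivoted:
    earliest, latest = spans.get(name, (ts, ts))
    spans[name] = (min(earliest, ts), max(latest, ts))

total_minutes = {
    name: int((latest - earliest).total_seconds() // 60)
    for name, (earliest, latest) in spans.items()
}
```

A column that never enters a fault state (Fault2 here) produces no row at all, just as the WHERE clause in the query filters it out before grouping.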

How can I convert this SAS datastep to oracle sql for a conditional counter column?

I apologize for how general this question is. I'm trying to create a column that groups rows together based on the time between the current and previous observation. The code below is code that I wrote and that works correctly in SAS. However, because of the way a data step runs compared with how Oracle SQL runs, I can't figure out how to do this in Oracle SQL. Any help would be greatly appreciated!
DATA GROUP;
SET LAG1;
BY CUSTOMER_KEY;
IF (TIME_BTW>5 OR TIME_BTW=.) THEN JOURNEY=0;
JOURNEY+1;
IF FIRST.CUSTOMER_KEY THEN GROUP=0;
IF JOURNEY=1 THEN GROUP+1;
RUN;
It looks like you are defining groups based on time_btw. You seem to want an analytic function. I think the code is like this:
select t.*,
sum(case when time_btw > 5 or time_btw is null then 1 else 0 end) over (partition by customer_key order by ??) as grp
from t;
Note that in SQL (unlike SAS), tables represent unordered sets. This means that you need a column that specifies the ordering.
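A cumulative conditional sum per customer can be sketched in Python to show what the window function computes. The rows and the ordering column `seq` are hypothetical. A missing `time_btw` is treated as the start of a new group, mirroring the `TIME_BTW=.` test in the SAS step.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; `seq` stands in for whatever column defines the order
rows = [
    {"customer_key": "A", "seq": 1, "time_btw": None},
    {"customer_key": "A", "seq": 2, "time_btw": 2},
    {"customer_key": "A", "seq": 3, "time_btw": 9},
    {"customer_key": "A", "seq": 4, "time_btw": 1},
    {"customer_key": "B", "seq": 1, "time_btw": None},
    {"customer_key": "B", "seq": 2, "time_btw": 6},
]


def assign_groups(rows):
    """Running count of 'new group' flags within each customer, in seq order."""
    out = []
    ordered = sorted(rows, key=itemgetter("customer_key", "seq"))
    for _, customer_rows in groupby(ordered, key=itemgetter("customer_key")):
        grp = 0  # restarts for each customer, like PARTITION BY
        for row in customer_rows:
            gap = row["time_btw"]
            if gap is None or gap > 5:  # a long or missing gap starts a new group
                grp += 1
            out.append({**row, "grp": grp})
    return out


groups = [r["grp"] for r in assign_groups(rows)]
```

The explicit sort plays the role of the ORDER BY inside the window; without a defined order the running count is not well defined.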