Is there a way to modify and rename a column that is within a RECORD in BigQuery, and then keep this column with the same name and location as before? - google-bigquery

I have a RECORD in BigQuery with the structure:
Parent
|___Child_1
|___Child_2
|___Child_3
|___...
Child_1 is of type TIMESTAMP, and I would like to convert it to an INT64 representing the number of milliseconds since the Unix epoch. This is done via the UNIX_MILLIS function.
I am having trouble getting this done for nested fields. Below are my attempts:
select *, unix_millis(parent.child_1) as parent.child_1 from `dataset.table`
When I tried the above, the query editor in BigQuery underlined "child_1" in "as parent.child_1", and gave the error Syntax error: Expected end of input but got "."
I expected this to work because, for non-nested fields, it is possible to apply UNIX_MILLIS and then use AS to keep the column under its original name, as shown below.
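For reference, a minimal sketch of the flat-field pattern that does work (flat_ts is a hypothetical top-level TIMESTAMP column, not from the original table):
#standardSQL
-- the converted value keeps the original column name
SELECT UNIX_MILLIS(flat_ts) AS flat_ts
FROM `dataset.table`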
So how would I go about performing the unix_millis function and then make sure that the resulting column has the same name and location within the RECORD as before?

Below is for BigQuery Standard SQL
#standardSQL
SELECT *
REPLACE((
  SELECT AS STRUCT * REPLACE(UNIX_MILLIS(child_1) AS child_1)
  FROM UNNEST([parent])
) AS parent)
FROM `project.dataset.table`
You can test and play with the above using some simplified dummy data, as in the example below (the dummy schema names the children child1, child2, child3):
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, STRUCT<child1 TIMESTAMP, child2 STRING, child3 INT64>(CURRENT_TIMESTAMP(), 'test1', 123) parent UNION ALL
  SELECT 2, STRUCT<child1 TIMESTAMP, child2 STRING, child3 INT64>(CURRENT_TIMESTAMP(), 'test2', 456)
)
SELECT *
REPLACE((
  SELECT AS STRUCT * REPLACE(UNIX_MILLIS(child1) AS child1)
  FROM UNNEST([parent])
) AS parent)
FROM `project.dataset.table`
with output:
Row | id | parent.child1 | parent.child2 | parent.child3
1   | 1  | 1599154064128 | test1         | 123
2   | 2  | 1599154064128 | test2         | 456
while the original data was:
Row | id | parent.child1                  | parent.child2 | parent.child3
1   | 1  | 2020-09-03 17:29:09.512794 UTC | test1         | 123
2   | 2  | 2020-09-03 17:29:09.512794 UTC | test2         | 456
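If the goal is to persist the converted column (so child_1 is stored as INT64 under the same name and position), a minimal sketch, assuming you are willing to rewrite the table in place with a CREATE OR REPLACE TABLE statement:
#standardSQL
-- rewrites `project.dataset.table`, replacing parent.child_1 with its UNIX_MILLIS value
CREATE OR REPLACE TABLE `project.dataset.table` AS
SELECT *
REPLACE((
  SELECT AS STRUCT * REPLACE(UNIX_MILLIS(child_1) AS child_1)
  FROM UNNEST([parent])
) AS parent)
FROM `project.dataset.table`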

Related

Select Json object keys as columns in prestodb (sql)

This is my database:
mytable:
SensorID | Name | Data
1        | Prox | {"O3":33, "CO2":12, "PM10":12}
3        | IR   | {"O3":33, "CO2":12, "PM10":12}
SELECT (how to select field without mentioning object keys?) FROM mytable WHERE SensorID=1
Actually, I tried this method and it works:
SELECT SensorID, Name, Data.O3, Data.CO2, Data.PM10 FROM mytable WHERE SensorID=1
The problem is that sometimes I don't know which keys are inside the Data column's object.
Expected output is:
SensorID | Name | O3 | CO2 | PM10
1        | Prox | 33 | 12  | 12
How can I achieve this?
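One hedged direction in Presto, as a sketch: parse the Data string into a map at query time, which at least lets you discover the keys without hard-coding them (this assumes Data is stored as a JSON string; pivoting unknown keys into real columns would still require building the SQL dynamically):
SELECT
  SensorID,
  Name,
  -- list every key present in this row's Data object
  map_keys(CAST(json_parse(Data) AS MAP(VARCHAR, JSON))) AS data_keys
FROM mytable
WHERE SensorID = 1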

Complex aggregation with select

I have a table in my DB like this (the ID column is not a unique UUID, just some object ID; a primary key exists but is omitted from the example):
ID  | Option | Value | Number of searches | Search date
1   | abc    | a     | 1                  | 2021-01-01
1   | abc    | b     | 2                  | 2021-01-01
1   | abc    | a     | 3                  | 2021-01-02
1   | abc    | b     | 4                  | 2021-01-02
1   | def    | a     | 5                  | 2021-01-01
1   | def    | b     | 6                  | 2021-01-01
1   | def    | a     | 7                  | 2021-01-02
1   | def    | b     | 8                  | 2021-01-02
2   | ...    | ...   | ...                | ...
... | ...    | ...   | ...                | ...
N   | xyz    | xyz   | M                  | any date
I want to get some kind of statistics report like:
ID | Total searches | Option | Total number of option searches | Value | Total value searches
1  | 36             | abc    | 10                              | a     | 4
   |                |        |                                 | b     | 6
   |                | def    | 26                              | a     | 12
   |                |        |                                 | b     | 14
Is it possible in some way? UNION isn't working here, and I have no idea how a GROUP BY clause could solve it either.
I can do it easily in Kotlin by requesting everything and aggregating into classes like these:
data class SearchAggregate(
    val id: String,
    val options: List<Option>,
    val values: List<Value>
)

data class Option(
    val name: String,
    val totalSearches: Long
)

data class Value(
    val name: String,
    val totalSearches: Long
)
and then exporting to a file, but I have to request the data via SQL.
You can use the COUNT() window function in a subquery to preprocess the data. For example:
select
  id,
  max(total_searches) as total_searches,
  option,
  max(total_options) as total_options,
  value,
  max(total_values) as total_values
from (
  select
    id,
    count(*) over(partition by id) as total_searches,
    option,
    count(*) over(partition by id, option) as total_options,
    value,
    count(*) over(partition by id, option, value) as total_values
  from t
) x
group by id, option, value
See running example at DB Fiddle #1.
Or you can use a shorter query, as in:
select
  id,
  sum(cnt) over(partition by id) as total_searches,
  option,
  sum(cnt) over(partition by id, option) as total_options,
  value,
  cnt
from (
  select id, option, value, count(*) as cnt from t group by id, option, value
) x
See running example at DB Fiddle #2.
The first option is to use ROLLUP, as that is the intended SQL pattern. It doesn't give you the results in the exact format you asked for, but that's a reflection of the requested format not being normalised.
SELECT
  id,
  option,
  value,
  SUM(`Number of searches`) AS total_searches
FROM your_table
GROUP BY ROLLUP(id, option, value)
It's concise, standard practice, SQL-friendly, etc.
Thinking in terms of these normalised patterns will make your use of SQL much more effective.
That said, you CAN use SQL to aggregate and restructure the results. You get the structure you want, but with more code and increased maintenance, lower flexibility, etc.
SELECT
  id,
  SUM(SUM(`Number of searches`)) OVER (PARTITION BY id) AS total_by_id,
  option,
  SUM(SUM(`Number of searches`)) OVER (PARTITION BY id, option) AS total_by_id_option,
  value,
  SUM(`Number of searches`) AS total_by_id_option_value
FROM your_table
GROUP BY id, option, value
That doesn't leave blanks where you have them, but that's because to do so is a SQL Anti-Pattern, and should be handled in your presentation layer, not in the database.
Oh, and please don't use column names with spaces; stick to alphanumeric characters with underscores.
Demo : https://www.db-fiddle.com/f/fX3tNL82gqgVCRoa3v6snP/5

Select Select Field Using SQL (BigQuery)

I have a table named Table1. I have the following fields:
Field Name             | Type   | Mode
name                   | STRING | NULLABLE
items                  | RECORD | REPEATED
items.properties       | RECORD | REPEATED
items.properties.name  | STRING | NULLABLE
items.properties.value | STRING | NULLABLE
Here's an example of what the table looks like:
name items.properties.name items.properties.value
---------------------------------------------------------------
ABC1 type 1
frequent 1
---------------------------------------------------------------
ABC2 type 2
frequent 1
---------------------------------------------------------------
ABC3 type 2
frequent 2
---------------------------------------------------------------
ABC4 type 1
frequent 2
Ultimately, I want to select the names and values of these items, but I'm consistently getting errors. Here's what I'm trying as a start:
SELECT ARRAY(SELECT properties FROM UNNEST(items)) AS itemProp
FROM `Table1`
I've exhausted all my other ideas. But essentially, I just want to pull out all the names and values of the properties as individual rows so I can filter where items.properties.name = 'type' AND items.properties.value = '1'.
Any ideas?
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM `project.dataset.table1` t,
  UNNEST(items) item,
  UNNEST(item.properties) property
WHERE property.name = 'type'
  AND property.value = '1'
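And to first pull every property name and value out as individual rows (as the question describes), the same UNNEST pattern can be reused; a sketch against the same assumed table:
#standardSQL
SELECT
  t.name,
  property.name AS property_name,
  property.value AS property_value
FROM `project.dataset.table1` t,
  UNNEST(items) item,
  UNNEST(item.properties) property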
Something like this:
select prop.*
from t cross join
  unnest(t.items) item cross join
  unnest(item.properties) prop;
You have a strange data structure. Only one repeated column is necessary. I see no reason to repeat properties within items.

BigQuery - nested json - select where nested item equals

Having the following table in a BigQuery database, where the f0_ column contains:
Row | f0_
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
3 | {"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}
4 | {"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}
5 | {"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}
6 | {"configuration":[{"param1":"value1"}]}
The f0_ column is a plain string.
Is there a way to write a select query where the "param2" value equals the [3.0, 45] array, meaning it would return only rows 1 and 2? Preferably it would be accomplished without directly indexing the first element of the "configuration" array, as the order might not be guaranteed.
Below is for BigQuery Standard SQL
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}' line UNION ALL
  SELECT '{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}' UNION ALL
  SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}' UNION ALL
  SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}' UNION ALL
  SELECT '{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}' UNION ALL
  SELECT '{"configuration":[{"param1":"value1"}]}'
)
SELECT line
FROM `project.dataset.table`
WHERE REGEXP_EXTRACT(JSON_EXTRACT(line, '$.configuration'), r'{"param2":(.*?)}') = '[3.0,45]'
with the result:
Row | line
1   | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2   | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}
Preferably would be great to accomplish it without directly indexing the first element in the "configuration" array as the order might not be guaranteed.
Note: this solution does not depend on the position of "param2" within the configuration array.
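As a further hedged sketch, BigQuery's JSON_EXTRACT_ARRAY can make the position independence explicit by exploding the configuration array and checking each element for param2, wherever it appears:
#standardSQL
SELECT line
FROM `project.dataset.table`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(JSON_EXTRACT_ARRAY(line, '$.configuration')) elem
  WHERE JSON_EXTRACT(elem, '$.param2') = '[3.0,45]'
)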
You can use some of BQ's neat JSON functions as described here.
Based on that, you can locate param2 and check if its value matches what you're looking for. If you aren't sure of the configuration order, you can iterate through the array to find param2, but it's not particularly efficient. I recommend you try to find a way where param2 is always the second field in the array. I was able to get the correct results like so:
SELECT json_text AS correct_configurations
FROM UNNEST([
  '{"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}',
  '{"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}',
  '{"configuration":[{"param1":"value1"},{"param2":[3.0,36]}]}',
  '{"configuration":[{"param1":"value1"},{"param2":[3.0,46]}]}',
  '{"configuration":[{"param1":"value1"},{"param2":[3.0,30]}]}',
  '{"configuration":[{"param1":"value1"}]}'
]) AS json_text
WHERE JSON_EXTRACT(json_text, '$.configuration[1].param2') LIKE "[3.0,45]";
Gives a result of:
Row | correct_configurations
1 | {"configuration":[{"param1":"value1"},{"param2":[3.0,45]}]}
2 | {"configuration":[{"param1":"value2"},{"param2":[3.0,45]}]}

StandardSQL NTH() and FIRST() functions for BigQuery

I'm starting out in BigQuery, with some experience in psql (PostgreSQL).
The #legacySQL query I'm running successfully is:
SELECT
  FIRST(SPLIT(ewTerms, '/')) AS place,
  NTH(2, SPLIT(ewTerms, '/')) AS divisor
FROM (SELECT ewTerms FROM account.free)
The string values in the ewTerms column of the free table are single-digit fractions, such as "2/4" and "3/5". This #legacySQL query successfully creates two columns from ewTerms, reading:
Row | place | divisor
1   | 3     | 5
2   | 2     | 4
I now need to use this column creation inside a WITH clause, so I have to switch to #standardSQL.
Can anyone tell me how I can call to the string's FIRST() and NTH() functions using #standardSQL? I've tried:
WITH prep AS (
  SELECT
    SPLIT(ewTerms, '/') AS split
  FROM (SELECT ewTerms FROM accounts.free)
)
SELECT
  split[SAFE_ORDINAL(1)] AS place,
  split[SAFE_ORDINAL(2)] AS divisor
FROM prep
but this is wrong. Help anyone?
Your question is not clear about what is wrong. This query works for me:
#standardSQL
WITH Input AS (
  SELECT '3/5' AS ewTerms UNION ALL
  SELECT '2/4' AS ewTerms
), prep AS (
  SELECT
    SPLIT(ewTerms, '/') AS split
  FROM Input
)
SELECT
  split[SAFE_ORDINAL(1)] AS place,
  split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
The output is:
+-------+---------+
| place | divisor |
+-------+---------+
| 2 | 4 |
| 3 | 5 |
+-------+---------+
Using your original table, your query would be:
#standardSQL
WITH prep AS (
  SELECT
    SPLIT(ewTerms, '/') AS split
  FROM accounts.free
)
SELECT
  split[SAFE_ORDINAL(1)] AS place,
  split[SAFE_ORDINAL(2)] AS divisor
FROM prep;
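One follow-up note: SPLIT returns an ARRAY<STRING>, so place and divisor come back as strings. If numeric values are needed downstream, a hedged variation of the final SELECT is to cast the elements (SAFE_CAST yields NULL instead of an error on malformed input):
SELECT
  SAFE_CAST(split[SAFE_ORDINAL(1)] AS INT64) AS place,
  SAFE_CAST(split[SAFE_ORDINAL(2)] AS INT64) AS divisor
FROM prep;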