BigQuery UDF define constant dictionary and match for a given function argument - google-bigquery

What is the best way to define a map: Map(1 -> "One", 2 -> "Two") and define a function which will match to above function? I am thinking of defining a dictionary via Javascript and matching in the function body. An example would be great. Thanks

Below example for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION MAP(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS (
IFNULL((
SELECT result
FROM (SELECT NULL AS search, NULL AS result UNION ALL SELECT * FROM UNNEST(map))
WHERE search = expr), `default`)
);
WITH `project.dataset.table` AS (
SELECT 1 id, 4 location_id UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5
)
SELECT id, location_id,
MAP(location_id,
[ (1, 'Los Angeles'),
(2, 'San Francisco'),
(3, 'New York'),
(4, 'Seattle')
], 'Non US') AS `Location`
FROM `project.dataset.table`
with result
Row id location_id Location
1 1 4 Seattle
2 2 2 San Francisco
3 3 5 Non US

Related

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that only have the tag A
I can get close by doing
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
however, this will include unique names that only have 1 distinct tag, regardless of what tag is it.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note max could be min as well - SQL just doesn't have single() aggregate function but that's different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.

How to query on fields from nested records without referring to the parent records in BigQuery?

I have data structured as follows:
{
"results": {
"A": {"first": 1, "second": 2, "third": 3},
"B": {"first": 4, "second": 5, "third": 6},
"C": {"first": 7, "second": 8, "third": 9},
"D": {"first": 1, "second": 2, "third": 3},
... },
...
}
i.e. nested records, where the lowest level has the same schema for all records in the level above. The schema would be similar to this:
results RECORD NULLABLE
results.A RECORD NULLABLE
results.A.first INTEGER NULLABLE
results.A.second INTEGER NULLABLE
results.A.third INTEGER NULLABLE
results.B RECORD NULLABLE
results.B.first INTEGER NULLABLE
...
Is there a way to do (e.g. aggregate) queries in BigQuery on fields from the lowest level, without knowledge of the keys on the (direct) parent level? Put differently, can I do a query on first for all records in results without having to specify A, B, ... in my query?
I would for example want to achieve something like
SELECT SUM(results.*.first) FROM table
in order to get 1+4+7+1 = 13,
but SELECT results.*.first isn't supported.
(I've tried playing around with STRUCTs, but haven't gotten far.)
Below trick is for BigQuery Standard SQL
#standardSQL
SELECT id, (
SELECT AS STRUCT
SUM(first) AS sum_first,
SUM(second) AS sum_second,
SUM(third) AS sum_third
FROM UNNEST([a]||[b]||[c]||[d])
).*
FROM `project.dataset.table`,
UNNEST([results])
You can test, play with above using dummy/sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 AS id, STRUCT(
STRUCT(1 AS first, 2 AS second, 3 AS third) AS A,
STRUCT(4 AS first, 5 AS second, 6 AS third) AS B,
STRUCT(7 AS first, 8 AS second, 9 AS third) AS C,
STRUCT(1 AS first, 2 AS second, 3 AS third) AS D
) AS results
)
SELECT id, (
SELECT AS STRUCT
SUM(first) AS sum_first,
SUM(second) AS sum_second,
SUM(third) AS sum_third
FROM UNNEST([a]||[b]||[c]||[d])
).*
FROM `project.dataset.table`,
UNNEST([results])
with output
Row id sum_first sum_second sum_third
1 1 13 17 21
Is there a way to do (e.g. aggregate) queries in BigQuery on fields from the lowest level, without knowledge of the keys on the (direct) parent level?
Below is for BigQuery Standard SQL and totally avoids referencing parent records (A, B, C, D, etc.)
#standardSQL
CREATE TEMP FUNCTION Nested_SUM(entries ANY TYPE, field_name STRING) AS ((
SELECT SUM(CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
UNNEST(SPLIT(entry)) kv
WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
));
SELECT id,
Nested_SUM(results, 'first') AS first_sum,
Nested_SUM(results, 'second') AS second_sum,
Nested_SUM(results, 'third') AS third_sum,
Nested_SUM(results, 'forth') AS forth_sum
FROM `project.dataset.table`
if to apply to sample data from your question as in below example
#standardSQL
CREATE TEMP FUNCTION Nested_SUM(entries ANY TYPE, field_name STRING) AS ((
SELECT SUM(CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
UNNEST(SPLIT(entry)) kv
WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
));
WITH `project.dataset.table` AS (
SELECT 1 AS id, STRUCT(
STRUCT(1 AS first, 2 AS second, 3 AS third) AS A,
STRUCT(4 AS first, 5 AS second, 6 AS third) AS B,
STRUCT(7 AS first, 8 AS second, 9 AS third) AS C,
STRUCT(1 AS first, 2 AS second, 3 AS third) AS D
) AS results
)
SELECT id,
Nested_SUM(results, 'first') AS first_sum,
Nested_SUM(results, 'second') AS second_sum,
Nested_SUM(results, 'third') AS third_sum,
Nested_SUM(results, 'forth') AS forth_sum
FROM `project.dataset.table`
output is
Row id first_sum second_sum third_sum forth_sum
1 1 13 17 21 null
I adapted Mikhail's answer in order to support grouping on the values of the lowest-level fields:
#standardSQL
CREATE TEMP FUNCTION Nested_AGGREGATE(entries ANY TYPE, field_name STRING) AS ((
SELECT ARRAY(
SELECT AS STRUCT TRIM(SPLIT(kv, ':')[OFFSET(1)], '"') AS value, COUNT(SPLIT(kv, ':')[OFFSET(1)]) AS count
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
UNNEST(SPLIT(entry)) kv
WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
GROUP BY TRIM(SPLIT(kv, ':')[OFFSET(1)], '"')
)
));
SELECT id,
Nested_AGGREGATE(results, 'first') AS first_agg,
Nested_AGGREGATE(results, 'second') AS second_agg,
Nested_AGGREGATE(results, 'third') AS third_agg,
FROM `project.dataset.table`
Output for WITH `project.dataset.table` AS (SELECT 1 AS id, STRUCT( STRUCT(1 AS first, 2 AS second, 3 AS third) AS A, STRUCT(4 AS first, 5 AS second, 6 AS third) AS B, STRUCT(7 AS first, 8 AS second, 9 AS third) AS C, STRUCT(1 AS first, 2 AS second, 3 AS third) AS D) AS results ):
Row id first_agg.value first_agg.count second_agg.value second_agg.count third_agg.value third_agg.count
1 1 1 2 2 2 3 2
4 1 5 1 6 1
7 1 8 1 9 1

How to implement generic Oracle DECODE function in BigQuery?

I'm looking into implementing the Oracle DECODE function as a UDF.
Below is the outward functionaliy
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions040.htm
Below is the outward functionality and syntax of a decode in Oracle:
Oracle:
DECODE( <expr> , <search1> , <result1> [ , <search2> , <result2> ... ] [ , <default> ] )
SELECT product_id,
DECODE (warehouse_id, 1, 'Southlake',
2, 'San Francisco',
3, 'New Jersey',
4, 'Seattle',
'Non domestic')
"Location of inventory" FROM inventories;
Primarily, with BigQuery UDFs SQL or JavaScript, with BigQuery UDFs, when you define the UDF function you need to know the number of parameters you are accepting and typing. When you define the SQL UDF function, you can also accept an array of type anything, but I am not sure if it can work and SQL UDF can perform what we want with an array. It seems based on the Javascript UDF documentation all parameters are named and typed and known up front.
Is there a way to accomplish this using a BigQuery UDF, it has to be dynamic like Oracle decode and fit any scenario you put in front of it not static knowing what you are decoding
Below is for BigQuery Standard SQL
CREATE TEMP FUNCTION DECODE(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS ((
IFNULL((SELECT result FROM UNNEST(map) WHERE search = expr), `default`)
));
You can see how it works using below example
#standardSQL
CREATE TEMP FUNCTION DECODE(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS ((
IFNULL((SELECT result FROM UNNEST(map) WHERE search = expr), `default`)
));
WITH `project.dataset.inventories` AS (
SELECT 1 product_id, 4 warehouse_id UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5
)
SELECT product_id, warehouse_id,
DECODE(warehouse_id,
[STRUCT<search INT64, result STRING>
(1,'Southlake'),
(2,'San Francisco'),
(3,'New Jersey'),
(4,'Seattle')
], 'Non domestic') AS `Location_of_inventory`
FROM `project.dataset.inventories`
with result
Row product_id warehouse_id Location_of_inventory
1 1 4 Seattle
2 2 2 San Francisco
3 3 5 Non domestic
Another example of use is:
#standardSQL
CREATE TEMP FUNCTION DECODE(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS ((
IFNULL((SELECT result FROM UNNEST(map) WHERE search = expr), `default`)
));
WITH `project.dataset.inventories` AS (
SELECT 1 product_id, 4 warehouse_id UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5
), map AS (
SELECT 1 search, 'Southlake' result UNION ALL
SELECT 2, 'San Francisco' UNION ALL
SELECT 3, 'New Jersey' UNION ALL
SELECT 4, 'Seattle'
)
SELECT product_id, warehouse_id,
DECODE(warehouse_id, kv, 'Non domestic') AS `Location_of_inventory`
FROM `project.dataset.inventories`,
(SELECT ARRAY_AGG(STRUCT(search, result)) AS kv FROM map) arr
with the same output
Update to address - "for a reusable UDF, not having to name the fields makes it closer to Oracle's implementation."
CREATE TEMP FUNCTION DECODE(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS (
IFNULL((
SELECT result FROM (
SELECT NULL AS search, NULL AS result UNION ALL SELECT * FROM UNNEST(map)
)
WHERE search = expr
), `default`)
);
So now - previous examples can be used w/o explicit naming fields as in example below
#standardSQL
CREATE TEMP FUNCTION DECODE(expr ANY TYPE, map ANY TYPE, `default` ANY TYPE ) AS (
IFNULL((
SELECT result FROM (
SELECT NULL AS search, NULL AS result UNION ALL SELECT * FROM UNNEST(map)
)
WHERE search = expr
), `default`)
);
WITH `project.dataset.inventories` AS (
SELECT 1 product_id, 4 warehouse_id UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5
)
SELECT product_id, warehouse_id,
DECODE(warehouse_id,
[ (1,'Southlake'),
(2,'San Francisco'),
(3,'New Jersey'),
(4,'Seattle')
], 'Non domestic') AS `Location_of_inventory`
FROM `project.dataset.inventories`
still with same output as before
```
SELECT product_id,
case
when warehouse_id = 1 then 'Southlake'
when warehouse_id = 2 then 'San Francisco'
when warehouse_id = 3 then 'New Jersey'
when warehouse_id = 4 then 'Seattle'
else
'Non domestic'
end as "Location of inventory" FROM inventories;
```

Pivoting Multiple Attributres and grouping them as a 'single' attribute (many to one)

So I Have a table called Value that's associated with different 'Fields'. Note that some of these fields have similar 'names' but they are named differently. Ultimately I want these 'similar names' to be pivoted/grouped as the same field name in the result set
VALUE_ID VALUE_TX FIELD_NAME Version_ID
1 Yes Adult 1
2 18 Age 1
3 Black Eye Color 1
4 Yes Is_Adult 2
5 25 Years_old 2
6 Brown Color_of_Eyes 2
I have a table called Submitted that looks like the following:
Version_ID Version_Name
1 TEST_RUN
2 REAL_RUN
I need a result set that Looks like this:
Submitted_Name Adult? Age Eye_Color
TEST_RUN Yes 18 Black
REAL_RUN Yes 25 Brown
I've tried the following:
SELECT * FROM (
select value_Tx, field_name, version_id
from VALUE
)
PIVOT (max (value_tx) for field_name in (('Adult', 'Is_Adult') as 'Adult?', ('Age', 'Years_old') as 'Age', ('Eye Color', 'Color_of_Eyes') as 'Eye_Color')
);
What am I doing wrong? Please let me know if I need to add any additional details / data.
Thanks in advance!
The error message that I am getting is the following:
ORA-00907: missing right parenthesis
I would change the field names in the subquery:
SELECT *
FROM (select value_Tx,
(case when field_name in ('Adult', 'Is_Adult') then 'Adult?'
field_name in ('Age', 'Years_old') then 'Age'
field_name in ('Eye Color', 'Color_of_Eyes') then 'Eye_Color'
else field_name
end) as field_name, version_id
from VALUE
)
PIVOT (max(value_tx) for field_name in ('Adult?', 'Age', 'Eye_Color'));
You can use double quotes for column aliasing within the pivot clause's part, and I think decode function suits well for this question. You can consider using the following query :
with value( value_id, value_tx, field_name, version_id ) as
(
select 1 ,'Yes' ,'Adult' ,1 from dual union all
select 2 ,'18' ,'Age' ,1 from dual union all
select 3 ,'Black','Eye_Color' ,1 from dual union all
select 4 ,'Yes' ,'Is_Adult' ,2 from dual union all
select 5 ,'25' ,'Years_old' ,2 from dual union all
select 6 ,'Brown','Color_of_Eyes',2 from dual
), Submitted( version_id, version_name ) as
(
select 1 ,'TEST_RUN' from dual union all
select 2 ,'REAL_RUN' from dual
)
select * from
(
select s.version_name as "Submitted_Name", v.value_Tx,
decode(v.field_name,'Adult','Is_Adult','Age','Years_old','Eye_Color',
'Color_of_Eyes',v.field_name) field_name
from value v
join Submitted s
on s.version_id = v.version_id
group by decode(v.field_name,'Adult','Is_Adult','Age','Years_old','Eye_Color',
'Color_of_Eyes',v.field_name),
v.value_Tx, s.Version_Name
)
pivot(
max(value_tx) for field_name in ( 'Is_Adult' as "Adult?", 'Years_old' as "Age",
'Color_of_Eyes' as "Eye_Color" )
);
Submitted_Name Adult? Age Eye_Color
REAL_RUN Yes 25 Brown
TEST_RUN Yes 18 Black
I think, better to solve as much as shorter way, as an example, using modular arithmetic would even be better as below :
select *
from
(
select s.version_name as "Submitted_Name", v.value_Tx, mod(v.value_id,3) as value_id
from value v
join Submitted s
on s.version_id = v.version_id
group by v.value_Tx, s.version_name, mod(v.value_id,3)
)
pivot(
max(value_tx) for value_id in ( 1 as "Adult?", 2 as "Age", 0 as "Eye_Color" )
)
Demo

How do I combine 2 records with a single field into 1 row with 2 fields (Oracle 11g)?

Here's a sample data
record1: field1 = test2
record2: field1 = test3
The actual output I want is
record1: field1 = test2 | field2 = test3
I've looked around the net but can't find what I'm looking for. I can use a custom function to get it in this format but I'm trying to see if there's a way to make it work without resorting to that.
thanks a lot
You need to use pivot:
with t(id, d) as (
select 1, 'field1 = test2' from dual union all
select 2, 'field1 = test3' from dual
)
select *
from t
pivot (max (d) for id in (1, 2))
If you don't have the id field you can generate it, but you will have XML type:
with t(d) as (
select 'field1 = test2' from dual union all
select 'field1 = test3' from dual
), t1(id, d) as (
select ROW_NUMBER() OVER(ORDER BY d), d from t
)
select *
from t1
pivot xml (max (d) for id in (select id from t1))
There are several ways to approach this - google pivot rows to columns. Here is one set of answers: http://www.dba-oracle.com/t_converting_rows_columns.htm