How to add column name to value in BigQuery - google-bigquery

I have a BigQuery table that contains 3 "code" fields. Some of these fields are used to look up against a code table. Assume the table looks like this:
data table:
id  code1  code2  code3  data1
1   Y      3      A      IA
2   Y      2      B      IB
3   N      5      C      IC
In order to perform the lookup, I have to concat the field name to the value, delimited by a colon. I cannot hardcode the column name. Using BigQuery, is there a way to use the table object to infer the column name within the SELECT statement?
for example:
select * from code_table join data_table where code1 = code.code_values
The value of code1 going out should be 'code1:Y', not just 'Y'. I'm wondering if there's a way to inject the column name dynamically into the code1 value as it goes out to the code_table.
UPDATE 1:
Here's an example output from data_table to join against code_table:
1, code1:Y, code2:3, code3:A, IA
2, code1:Y, code2:2, code3:B, IB
3, code1:N, code2:5, code3:C, IC
Thanks
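For a single known column, the tagging itself is just string concatenation; the hard part is avoiding one hardcoded expression per column. A hypothetical hardcoded baseline (assuming the data_table layout above), for contrast with the answers below:
SELECT id,
  CONCAT('code1:', code1) AS code1_tagged,
  CONCAT('code2:', CAST(code2 AS STRING)) AS code2_tagged,
  CONCAT('code3:', code3) AS code3_tagged,
  data1
FROM `project.dataset.data_table`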

Does using the TO_JSON_STRING function give the desired output? Here is an example using your data:
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT TO_JSON_STRING(t) AS json
FROM `project.dataset.table` AS t;
+---------------------------------------------------------+
| json |
+---------------------------------------------------------+
| {"id":1,"code1":"Y","code2":3,"code3":"A","data1":"IA"} |
| {"id":2,"code1":"Y","code2":2,"code3":"B","data1":"IB"} |
| {"id":3,"code1":"N","code2":5,"code3":"C","data1":"IC"} |
+---------------------------------------------------------+
If you want to strip out the quotes, you can do that too (note that this also removes any quote characters inside the values, which is fine for this sample data):
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT REPLACE(TO_JSON_STRING(t), '"', '') AS json
FROM `project.dataset.table` AS t;
+-----------------------------------------+
| json |
+-----------------------------------------+
| {id:1,code1:Y,code2:3,code3:A,data1:IA} |
| {id:2,code1:Y,code2:2,code3:B,data1:IB} |
| {id:3,code1:N,code2:5,code3:C,data1:IC} |
+-----------------------------------------+
Edit: this gives the exact desired output. I'm assuming that you are okay with referencing id and data1 by name, since it sounds like you don't want to format them in the same way.
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT
  REGEXP_REPLACE(
    FORMAT(
      '%d %s %s',
      id,
      REGEXP_REPLACE(
        TO_JSON_STRING(
          (SELECT AS STRUCT t.* EXCEPT (id, data1))
        ),
        '["{}]', ''),
      data1
    ),
    r'[ ,]', ', '
  ) AS output
FROM `project.dataset.table` AS t;
+----------------------------------+
| output |
+----------------------------------+
| 1, code1:Y, code2:3, code3:A, IA |
| 2, code1:Y, code2:2, code3:B, IB |
| 3, code1:N, code2:5, code3:C, IC |
+----------------------------------+
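To then perform the lookup the question describes, one could keep the tagged values as an array and join each element against the code table. A minimal sketch, assuming a code_table whose tagged values live in a code_values column (as in the question's pseudocode) and a data_table with the layout above:
WITH tagged AS (
  SELECT
    t.id,
    t.data1,
    -- 'code1:Y,code2:3,code3:A' -> ['code1:Y', 'code2:3', 'code3:A']
    SPLIT(REGEXP_REPLACE(
      TO_JSON_STRING((SELECT AS STRUCT t.* EXCEPT (id, data1))),
      '["{}]', '')) AS codes
  FROM `project.dataset.data_table` AS t
)
SELECT tg.id, tg.data1, code, c.*
FROM tagged AS tg, UNNEST(tg.codes) AS code
JOIN `project.dataset.code_table` AS c
  ON code = c.code_values;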

#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'Y' AS code1, 3 AS code2, 'A' AS code3, 'IA' AS data1 UNION ALL
SELECT 2, 'Y', 2, 'B', 'IB' UNION ALL
SELECT 3, 'N', 5, 'C', 'IC'
)
SELECT
id,
MAX(IF(col = 1, val, NULL)) AS col1,
MAX(IF(col = 2, val, NULL)) AS col2,
MAX(IF(col = 3, val, NULL)) AS col3,
data1
FROM `project.dataset.table` AS t, UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'"|{|}', ''))) AS val WITH OFFSET col
WHERE col BETWEEN 1 AND 3
GROUP BY id, data1
ORDER BY id
Output as below:
id  col1     col2     col3     data1
1   code1:Y  code2:3  code3:A  IA
2   code1:Y  code2:2  code3:B  IB
3   code1:N  code2:5  code3:C  IC
With the above query you just need to know the number of code columns; if there are 5, for example, add two more MAX(IF(...)) lines to the SELECT and change BETWEEN 1 AND 3 to BETWEEN 1 AND 5, as in the sketch below.
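For instance, a hypothetical five-code-column variant (a sketch; it assumes columns code4 and code5 sit between code3 and data1 in the table):
#standardSQL
SELECT
  id,
  MAX(IF(col = 1, val, NULL)) AS col1,
  MAX(IF(col = 2, val, NULL)) AS col2,
  MAX(IF(col = 3, val, NULL)) AS col3,
  MAX(IF(col = 4, val, NULL)) AS col4,
  MAX(IF(col = 5, val, NULL)) AS col5,
  data1
FROM `project.dataset.table` AS t,
  UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'"|{|}', ''))) AS val WITH OFFSET col
WHERE col BETWEEN 1 AND 5
GROUP BY id, data1
ORDER BY id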

Related

How to use query for Oracle pivot?

I have four columns (piece_id, attrb_code, attrb_a_value, attrb_b_value) like below.
Will I be able to get something like one column per attrb_code value?
Thanks
I would recommend conditional aggregation. It is a database-independent syntax, which is more flexible than Oracle-specific pivot syntax:
select
piece_id,
max(case when attrb_code = 'A' then attrb_a_value end) a,
max(case when attrb_code = 'B' then attrb_a_value end) b,
max(case when attrb_code = 'C' then attrb_a_value end) c,
max(case when attrb_code = 'D' then attrb_b_value end) d
from mytable
group by piece_id
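Run against the sample table defined in the next answer (piece_id, attrb_code, attrb_a_value, attrb_b_value), this should produce the same pivoted result; row order may vary without an ORDER BY:
PIECE_ID | A | B | C | D
-------: | -: | -: | -: | -:
22333 | 8 | 9 | 4 | 5
22332 | 2 | 3 | 7 | 5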
Just use COALESCE (or NVL) in the PIVOT:
SELECT *
FROM table_name
PIVOT (
MAX( COALESCE( attrb_a_value, attrb_b_value ) )
FOR attrb_code IN (
'A' AS A,
'B' AS B,
'C' AS C,
'D' AS D
)
)
So, for your sample data:
CREATE TABLE table_name ( piece_id, attrb_code, attrb_a_value, attrb_b_value ) AS
SELECT 22333, 'A', 8, NULL FROM DUAL UNION ALL
SELECT 22333, 'B', 9, NULL FROM DUAL UNION ALL
SELECT 22333, 'C', 4, NULL FROM DUAL UNION ALL
SELECT 22333, 'D', NULL, 5 FROM DUAL UNION ALL
SELECT 22332, 'A', 2, NULL FROM DUAL UNION ALL
SELECT 22332, 'B', 3, NULL FROM DUAL UNION ALL
SELECT 22332, 'C', 7, NULL FROM DUAL UNION ALL
SELECT 22332, 'D', NULL, 5 FROM DUAL
This outputs:
PIECE_ID | A | B | C | D
-------: | -: | -: | -: | -:
22333 | 8 | 9 | 4 | 5
22332 | 2 | 3 | 7 | 5

How to get the index of element in an array and return the next element in Hive?

I have two tables.
table1:
id | array1
1 | ['a', 'b', 'c']
2 | ['b', 'a', 'c']
3 | ['c', 'b', 'a']
table2:
id | value2
1 | 'b'
3 | 'a'
I wish to get the following table:
id | value3
1 | 'c'
2 | 'b'
3 | 'c'
Explanation: if the id in table1 doesn't exist in table2, return the first element of array1. If the id in table1 exists in table2, return the element that follows value2 in array1 (and if value2 is the last element of array1, wrap around to the first element).
How can I achieve this goal?
Explode the array using posexplode, join with table2, calculate the position of the next element for the joined rows, aggregate, and extract the array elements. The wrap-around case works because indexing past the end of an array returns NULL in Hive, so nvl(array1[pos], array1[0]) falls back to the first element.
Demo:
with table1 as (
  select stack(3,
    1, array('a', 'b', 'c'),
    2, array('b', 'a', 'c'),
    3, array('c', 'b', 'a')
  ) as (id, array1)
),
table2 as (
  select stack(2,
    1, 'b',
    3, 'a'
  ) as (id, value2)
)
select s.id, nvl(s.array1[pos], s.array1[0]) value3
from
(
  select s.id, s.array1, min(case when t2.id is not null then s.pos + 1 end) pos
  from
  (
    select t.id, t.array1, a.pos, a.value1
    from table1 t
    lateral view posexplode(t.array1) a as pos, value1
  ) s left join table2 t2 on s.id = t2.id and s.value1 = t2.value2
  group by s.id, s.array1
) s
order by id
Result:
id value3
1 c
2 b
3 c

Identify only when value matches

I need to return only the IDs that match a value, e.g. Value = 'A', but I only need the IDs that have 'A' and no other values.
T1:
ID Value
1 A
1 B
1 C
2 A
3 A
3 B
4 A
5 B
5 D
5 E
5 F
Desired Output:
2
4
How can I achieve this? When I try the following, 1 and 3 are also returned:
select ID from T1 where Value ='A'
With NOT EXISTS:
select t.id
from tablename t
where t.value = 'A'
and not exists (
select 1 from tablename
where id = t.id and value <> 'A'
)
From the sample data you posted there is no need to use select distinct t.id, but if you get duplicates then use it.
Another way if there are no null values:
select id
from tablename
group by id
having sum(case when value <> 'A' then 1 else 0 end) = 0
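A NULL-safe variant of the same idea (a sketch; it counts NULL as a non-matching value, so an id with both 'A' and NULL rows is excluded):
select id
from tablename
group by id
having sum(case when value = 'A' then 0 else 1 end) = 0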
Or if you want the rows where the id has only 1 value = 'A':
select id
from tablename
group by id
having count(*) = 1 and max(value) = 'A'
I think the simplest way is aggregation with having:
select id
from tablename
group by id
having min(value) = max(value) and
min(value) = 'A';
Note that this ignores NULL values so it could return ids with both NULL and A. If you want to avoid that:
select id
from tablename
group by id
having count(value) = count(*) and
min(value) = max(value) and
min(value) = 'A';
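To see the difference between the two queries, consider a hypothetical id 6 with one 'A' row and one NULL row (not part of the sample data):
-- hypothetical rows illustrating the NULL caveat
insert into tablename (id, value) values (6, 'A');
insert into tablename (id, value) values (6, NULL);
-- aggregates ignore NULLs, so min(value) = max(value) = 'A' for id 6:
-- the first query would return 6, while the count(value) = count(*)
-- guard in the second query excludes it.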
Oracle Setup:
CREATE TABLE test_data ( ID, Value ) AS
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 1, 'B' FROM DUAL UNION ALL
SELECT 1, 'C' FROM DUAL UNION ALL
SELECT 2, 'A' FROM DUAL UNION ALL
SELECT 3, 'A' FROM DUAL UNION ALL
SELECT 3, 'B' FROM DUAL UNION ALL
SELECT 4, 'A' FROM DUAL UNION ALL
SELECT 5, 'B' FROM DUAL UNION ALL
SELECT 5, 'D' FROM DUAL UNION ALL
SELECT 5, 'E' FROM DUAL UNION ALL
SELECT 5, 'F' FROM DUAL
Query:
SELECT ID
FROM test_data
GROUP BY ID
HAVING COUNT( CASE Value WHEN 'A' THEN 1 END ) = 1
AND COUNT( CASE Value WHEN 'A' THEN NULL ELSE 1 END ) = 0
Output:
| ID |
| -: |
| 2 |
| 4 |

Aggregate multiple columns into an array only when the columns have non null value in Bigquery

I have a table that looks like this:
+----+------+------+------+------+------+
| id | col1 | col2 | col3 | col4 | col5 |
+----+------+------+------+------+------+
| a | 1 | null | null | null | null |
| b | 1 | 2 | 3 | 4 | null |
| c | 1 | 2 | 3 | 4 | 5 |
| d | 2 | 1 | 7 | null | 4 |
+----+------+------+------+------+------+
I want to create an aggregated table where for each id I want an array that contains non null value from all the other columns. The output should look like this:
+-----+-------------+
| id | agg_col |
+-----+-------------+
| a | [1] |
| b | [1,2,3,4] |
| c | [1,2,3,4,5] |
| d | [2,1,7,4] |
+-----+-------------+
Is it possible to produce the output using bigquery standard sql?
Below is not a super generic solution, but it works for the specific example you provided: id is alphanumeric (not starting with a digit) and the rest of the columns are integers.
#standardSQL
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
You can test and play with the above using the sample data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != '') AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(ARRAY(SELECT * FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d*)')) col WHERE col != ''), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
with result as below:
Row  id  agg_col_as_array  agg_col_as_string
1    a   [1]               [1]
2    b   [1,2,3,4]         [1,2,3,4]
3    c   [1,2,3,4,5]       [1,2,3,4,5]
4    d   [2,1,7,4]         [2,1,7,4]
Do you think it is possible to do this by mentioning specific columns and then binding them into an array?
Sure, it is doable - see below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT CAST(col AS STRING)
FROM UNNEST([col1, col2, col3, col4, col5]) col
WHERE NOT col IS NULL
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
BUT ... this is not the best option you have, as you need to manage and adjust the number and names of the columns for each different use case.
The solution below is an adjusted version of my original answer to address your latest comment: "Actually the sample was too simple. Both my id and other columns have alphanumeric and special characters."
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, 1 col1, NULL col2, NULL col3, NULL col4, NULL col5 UNION ALL
SELECT 'b', 1, 2, 3, 4, NULL UNION ALL
SELECT 'c', 1, 2, 3, 4, 5 UNION ALL
SELECT 'd', 2, 1, 7, NULL, 4
)
SELECT id,
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
) AS agg_col_as_array,
CONCAT('[', ARRAY_TO_STRING(
ARRAY(
SELECT col
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
WHERE col != 'null' AND OFFSET > 0
), ','), ']') AS agg_col_as_string
FROM `project.dataset.table` t
-- ORDER BY id
both with the same result as before:
Row  id  agg_col_as_array  agg_col_as_string
1    a   [1]               [1]
2    b   [1,2,3,4]         [1,2,3,4]
3    c   [1,2,3,4,5]       [1,2,3,4,5]
4    d   [2,1,7,4]         [2,1,7,4]
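One caveat with this generic version: the regex captures values as JSON literals, so string columns would come back wrapped in their JSON quotes (e.g. "Y" rather than Y), and a comma inside a string value would cut the value short. A minimal sketch of stripping the surrounding quotes, assuming no embedded quotes or commas in the values:
#standardSQL
SELECT id,
  ARRAY(
    SELECT TRIM(col, '"')  -- strip the surrounding JSON quotes from string values
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(.*?)(?:,|})')) col WITH OFFSET
    WHERE col != 'null' AND OFFSET > 0
  ) AS agg_col_as_array
FROM `project.dataset.table` t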

T-SQL ORDER BY base on MIN of a group's column

Hi, take the following data as an example:
id | value
----------
A | 3
A | 9
B | 7
B | 2
C | 4
C | 5
I want to list out all the data based on the min value of each id group, so that the expected output is:
id | value
----------
B | 2
B | 7
A | 3
A | 9
C | 4
C | 5
i.e. the min of group A is 3, of group B is 2, of group C is 4, so group B comes first (with its rows in ascending order), then group A, then group C.
I tried this, but that's not what I want:
SELECT * FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
GROUP BY id, value
ORDER BY MIN(value)
Please help! Thank you
SELECT * FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
ORDER BY MIN(value) OVER(PARTITION BY id), id, value
OVER Clause (Transact-SQL)
Add the OVER() clause to your query's output and you can see what it does for you:
SELECT *,
MIN(value) OVER(PARTITION BY id) OrderedBy FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
ORDER BY MIN(value) OVER(PARTITION BY id), id, value
Result:
id value OrderedBy
---- ----- ---------
B 2 2
B 7 2
A 3 3
A 9 3
C 4 4
C 5 4
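Note that value is a varchar in this sample, so MIN(value) compares strings; that happens to work for single digits but would put '10' before '2'. A minimal variant, assuming the values should sort numerically:
SELECT * FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
ORDER BY MIN(CAST(value AS int)) OVER(PARTITION BY id), id, CAST(value AS int)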