BigQuery SQL - adding new record to existing record

I have two tables I want to create a UNION of, but they do not have the same schema.
Table 1 looks like this:
[image: Table 1 schema]
Table 2 looks like this:
[image: Table 2 schema]
How can I select all the data from Table 2 and add a record for details.employment.locations with the fields city, regionCode, and Formatted? I just want to include null values for all of them, using BigQuery SQL.
Thank you,

This should work. I have created CTEs for table1 and table2 as per your schema. Since you already have the tables in place, you can start your query from the UNIONED CTE.
WITH TABLE1 AS
(
  SELECT
    'WEBSITE 1' AS WEBSITE,
    STRUCT (
      STRUCT ('NAME 1') AS NAME,
      STRUCT ('AGE 1') AS AGE,
      'GENDER' AS GENDER,
      STRUCT(
        'EMPLOYEMENT NAME 1' AS NAME,
        FALSE AS _CURRENT,
        'EMPLOYEMENT TITLE 1' AS TITLE,
        STRUCT(
          'EMPLOYEMENT LOCATION 1' AS CITY,
          'EMPLOYEMENT REGION_CODE 1' AS REGION_CODE,
          'EMPLOYEMENT FORMATTED 1' AS FORMATTED
        ) AS LOCATION
      ) AS EMPLOYEMENT
    ) AS DETAILS
)
, TABLE2 AS
(
  SELECT
    'WEBSITE 1' AS WEBSITE,
    STRUCT (
      STRUCT ('NAME 2') AS NAME,
      STRUCT ('AGE 2') AS AGE,
      'GENDER' AS GENDER,
      STRUCT(
        'EMPLOYEMENT NAME 2' AS NAME,
        FALSE AS _CURRENT,
        'EMPLOYEMENT TITLE 2' AS TITLE
      ) AS EMPLOYEMENT
    ) AS DETAILS
)
-- Flatten both tables to plain columns so they can be unioned
, UNIONED AS
(
  SELECT WEBSITE,
    DETAILS.NAME AS NAME,
    DETAILS.AGE AS AGE,
    DETAILS.GENDER AS GENDER,
    DETAILS.EMPLOYEMENT.NAME AS EMPLOYEMENT_NAME,
    DETAILS.EMPLOYEMENT._CURRENT AS EMPLOYEMENT_CURRENT,
    DETAILS.EMPLOYEMENT.TITLE AS EMPLOYEMENT_TITLE,
    DETAILS.EMPLOYEMENT.LOCATION.CITY AS EMPLOYEMENT_LOCATION_CITY,
    DETAILS.EMPLOYEMENT.LOCATION.REGION_CODE AS EMPLOYEMENT_LOCATION_REGION_CODE,
    DETAILS.EMPLOYEMENT.LOCATION.FORMATTED AS EMPLOYEMENT_LOCATION_FORMATTED
  FROM TABLE1
  UNION ALL
  SELECT WEBSITE,
    DETAILS.NAME,
    DETAILS.AGE,
    DETAILS.GENDER,
    DETAILS.EMPLOYEMENT.NAME,
    DETAILS.EMPLOYEMENT._CURRENT,
    DETAILS.EMPLOYEMENT.TITLE,
    NULL,  -- TABLE2 has no LOCATION.CITY
    NULL,  -- TABLE2 has no LOCATION.REGION_CODE
    NULL   -- TABLE2 has no LOCATION.FORMATTED
  FROM TABLE2
)
-- Rebuild the nested DETAILS record from the flattened columns
SELECT
  U.WEBSITE,
  STRUCT (
    U.NAME AS NAME,  -- already a STRUCT in the CTEs, so it is not wrapped again
    U.AGE AS AGE,    -- same as NAME
    U.GENDER AS GENDER,
    STRUCT (
      U.EMPLOYEMENT_NAME AS NAME,
      U.EMPLOYEMENT_CURRENT AS _CURRENT,
      U.EMPLOYEMENT_TITLE AS TITLE,
      STRUCT (
        U.EMPLOYEMENT_LOCATION_CITY AS CITY,
        U.EMPLOYEMENT_LOCATION_REGION_CODE AS REGION_CODE,
        U.EMPLOYEMENT_LOCATION_FORMATTED AS FORMATTED
      ) AS LOCATION
    ) AS EMPLOYEMENT
  ) AS DETAILS
FROM UNIONED AS U
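
The core of the query above is the padding step: the branch that lacks a column supplies NULL in that position so both sides of the UNION ALL line up. A minimal sketch of just that step follows, run against SQLite through Python's sqlite3 module so it can be tried locally. SQLite has no STRUCT type, so the nested fields are modelled as flat columns, and the table and column names here are hypothetical stand-ins, not the asker's schema.

```python
import sqlite3

# Hypothetical flat stand-ins for the two tables (SQLite has no STRUCT type).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (website TEXT, employment_title TEXT, location_city TEXT);
    CREATE TABLE table2 (website TEXT, employment_title TEXT);
    INSERT INTO table1 VALUES ('WEBSITE 1', 'TITLE 1', 'CITY 1');
    INSERT INTO table2 VALUES ('WEBSITE 2', 'TITLE 2');
""")

# Pad the column that table2 lacks with NULL so both branches have the same shape.
padded_union = conn.execute("""
    SELECT website, employment_title, location_city FROM table1
    UNION ALL
    SELECT website, employment_title, NULL FROM table2
""").fetchall()

for row in padded_union:
    print(row)
```

In BigQuery the same idea applies, followed by the STRUCT rebuild shown in the answer.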

Related

Select data from tables when needed columns are stored as records in a different table

An app is being developed in which a user picks what data to see in a report. Given the following data:
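ReportDataValues
ID | TableName | ColumnName
---------------------------
1  | customer  | first_name
2  | address   | zip_code
Customer
ID | first_name | last_name | address_id
----------------------------------------
1  | joe        | powell    | 1
2  | andy       | smith     | 2
Address
ID | street       | zip_code
----------------------------
1  | main ave.    | 48521
2  | central str. | 56851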
Is it possible, using generic SQL mechanisms (PIVOT, UNPIVOT or some other way), to select data from only the table.column pairs stored as rows in the ReportDataValues table, so that the query is compatible with both SQL Server and Oracle and does not use dynamic execution of generated statements (such as EXEC(query) or EXECUTE IMMEDIATE (query))? The result would look like:
Col1 | Col2
------------
joe  | 48521
andy | 56851
The SQL statement will later be used in the SAP Crystal Reports reporting engine.

In Oracle, join the customer and address tables to every row of reportdatavalues and then use a CASE expression to correlate the expected value with the table columns and pivot:
SELECT col1, col2
FROM (
SELECT c.id,
r.id AS value_id,
CASE
WHEN r.tablename = 'customer' AND r.columnname = 'id'
THEN TO_CHAR(c.id)
WHEN r.tablename = 'customer' AND r.columnname = 'first_name'
THEN c.first_name
WHEN r.tablename = 'customer' AND r.columnname = 'last_name'
THEN c.last_name
WHEN r.tablename = 'address' AND r.columnname = 'street'
THEN a.street
WHEN r.tablename = 'address' AND r.columnname = 'zip_code'
THEN TO_CHAR(a.zip_code)
END AS value
FROM customer c
INNER JOIN address a
ON a.id = c.address_id
CROSS JOIN ReportDataValues r
)
PIVOT (
MAX(value) FOR value_id IN (1 AS col1, 2 AS col2)
)
Which, for the sample data:
CREATE TABLE ReportDataValues (ID, TableName, ColumnName) AS
SELECT 1, 'customer', 'first_name' FROM DUAL UNION ALL
SELECT 2, 'address', 'zip_code' FROM DUAL;
CREATE TABLE Customer (ID, first_name, last_name, address_id) AS
SELECT 1, 'joe', 'powell', 1 FROM DUAL UNION ALL
SELECT 2, 'andy', 'smith', 2 FROM DUAL;
CREATE TABLE Address (ID, street, zip_code) AS
SELECT 1, 'main ave.', 48521 FROM DUAL UNION ALL
SELECT 2, 'central str.', 56851 FROM DUAL;
Outputs:
COL1 | COL2
------------
joe  | 48521
andy | 56851
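
Since the question asks for something that runs on both SQL Server and Oracle, it may help to note that the PIVOT clause can be swapped for plain conditional aggregation, MAX(CASE ...) with GROUP BY, which is standard SQL. The sketch below is a substitution for the PIVOT step, not the answerer's query, and it runs on SQLite through Python's sqlite3 module only so that it is easy to try. CAST(... AS TEXT) stands in for Oracle's TO_CHAR, and the inner alias customer_id is introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ReportDataValues (ID INTEGER, TableName TEXT, ColumnName TEXT);
    CREATE TABLE Customer (ID INTEGER, first_name TEXT, last_name TEXT, address_id INTEGER);
    CREATE TABLE Address (ID INTEGER, street TEXT, zip_code INTEGER);
    INSERT INTO ReportDataValues VALUES (1, 'customer', 'first_name'), (2, 'address', 'zip_code');
    INSERT INTO Customer VALUES (1, 'joe', 'powell', 1), (2, 'andy', 'smith', 2);
    INSERT INTO Address VALUES (1, 'main ave.', 48521), (2, 'central str.', 56851);
""")

# MAX(CASE ...) per customer replaces PIVOT (... FOR value_id IN (1 AS col1, 2 AS col2)).
report_rows = conn.execute("""
    SELECT MAX(CASE WHEN value_id = 1 THEN value END) AS col1,
           MAX(CASE WHEN value_id = 2 THEN value END) AS col2
    FROM (
        SELECT c.ID AS customer_id,
               r.ID AS value_id,
               CASE
                   WHEN r.TableName = 'customer' AND r.ColumnName = 'first_name' THEN c.first_name
                   WHEN r.TableName = 'customer' AND r.ColumnName = 'last_name'  THEN c.last_name
                   WHEN r.TableName = 'address'  AND r.ColumnName = 'street'     THEN a.street
                   WHEN r.TableName = 'address'  AND r.ColumnName = 'zip_code'   THEN CAST(a.zip_code AS TEXT)
               END AS value
        FROM Customer c
        INNER JOIN Address a ON a.ID = c.address_id
        CROSS JOIN ReportDataValues r
    )
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(report_rows)
```

As with the PIVOT version, the number of output columns is fixed in the statement; only which table.column feeds each one is driven by ReportDataValues.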

How to write a BigQuery query that produces the count of the unique transactions and the combination of column names populated

I’m trying to write a query in BigQuery that produces the count of the unique transactions and the combination of column names populated.
I have a table:
TRAN CODE | Full Name | Given Name | Surname | DOB | Phone
The result set I’m after is:
TRAN CODE | UNIQUE TRANSACTIONS | NAME OF POPULATED COLUMNS
-----------------------------------------------------------
A         | 3                   | Full Name
A         | 4                   | Full Name,Phone
B         | 5                   | Given Name,Surname
B         | 10                  | Given Name,Surname,DOB,Phone
The result set shows that for TRAN CODE A
3 distinct customers provided Full Name
4 distinct customers provided Full Name and Phone #
For TRAN CODE B
5 distinct customers provided Given Name and Surname
10 distinct customers provided Given Name, Surname, DOB, Phone #
Currently I produce these results manually.
I tried using ARRAY_AGG but couldn't get it working.
Any advice would be appreciated.
Thank you.

I think you want something like this:
select tran_code,
array_to_string(array[case when full_name is not null then 'full_name' end,
case when given_name is not null then 'given_name' end,
case when surname is not null then 'surname' end,
case when dob is not null then 'dob' end,
case when phone is not null then 'phone' end
], ','),
count(*)
from t
group by 1, 2
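
The idea in the query above is to turn each row into a label listing its non-null columns and then group on that label. Here is a small sketch of the same idea that can be run locally on SQLite through Python's sqlite3 module. SQLite has no array_to_string, so the label is built by string concatenation and RTRIM instead, which is a substitution. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (tran_code TEXT, full_name TEXT, given_name TEXT,
                    surname TEXT, dob TEXT, phone TEXT);
    -- Made-up sample rows
    INSERT INTO t VALUES
        ('A', 'Ann Lee', NULL, NULL, NULL, NULL),
        ('A', 'Bob Ray', NULL, NULL, NULL, '555-0100'),
        ('A', 'Cy Dunn', NULL, NULL, NULL, '555-0101'),
        ('B', NULL, 'Dee', 'Fox', NULL, NULL);
""")

# Each CASE contributes 'name,' when the column is populated; RTRIM drops the last comma.
combo_counts = conn.execute("""
    SELECT tran_code,
           RTRIM(
               CASE WHEN full_name  IS NOT NULL THEN 'full_name,'  ELSE '' END ||
               CASE WHEN given_name IS NOT NULL THEN 'given_name,' ELSE '' END ||
               CASE WHEN surname    IS NOT NULL THEN 'surname,'    ELSE '' END ||
               CASE WHEN dob        IS NOT NULL THEN 'dob,'        ELSE '' END ||
               CASE WHEN phone      IS NOT NULL THEN 'phone,'      ELSE '' END,
               ','
           ) AS populated_columns,
           COUNT(*) AS transactions
    FROM t
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()

for row in combo_counts:
    print(row)
```

Note that COUNT(*) counts rows; if one customer can appear on several rows, a COUNT(DISTINCT ...) over whatever identifies a customer would be needed instead.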

Consider the approach below. It has no dependency on any column name other than TRAN_CODE, so it is quite generic.
select TRAN_CODE,
count(distinct POPULATED_VALUES) as UNIQUE_TRANSACTIONS,
POPULATED_COLUMNS
from (
select TRAN_CODE,
( select as struct
string_agg(col, ', ' order by offset) POPULATED_COLUMNS,
string_agg(val order by offset) POPULATED_VALUES,
string_agg(cast(offset as string) order by offset) pos
from unnest(regexp_extract_all(to_json_string(t), r'"([^"]+?)":')) col with offset
join unnest(regexp_extract_all(to_json_string(t), r'"[^"]+?":("[^"]+?"|null)')) val with offset
using(offset)
where val != 'null'
and col != 'TRAN_CODE'
).*
from `project.dataset.table` t
)
group by TRAN_CODE, POPULATED_COLUMNS
order by TRAN_CODE, any_value(pos)

@Gordon_Linoff's solution is the best, but an alternative would be the following:
SELECT
TRAN_CODE,
COUNT(TRAN_ROW) AS unique_transactions,
populated_columns
FROM (
SELECT
TRAN_CODE,
TRAN_ROW,
# COUNT(value) AS unique_transactions,
STRING_AGG(field, ",") AS populated_columns
FROM (
SELECT
* EXCEPT(DOB),
CAST(DOB AS STRING ) AS DOB,
ROW_NUMBER() OVER () AS TRAN_ROW
FROM
sample) UNPIVOT(value FOR field IN (Full_name,
Given_name,
Surname,
DOB,
Phone))
GROUP BY
TRAN_CODE,
TRAN_ROW )
GROUP BY
TRAN_CODE,
populated_columns
But this should be more expensive...

Oracle conditional select based on address type

I'm working on a query to get student contact mailing addresses, and I am a bit stuck. I have managed to get a list of all students and their contacts, but now, when I try to join the contacts to their addresses, I'm not sure how to get the correct address.
The address table can hold multiple kinds of addresses (Home, Mailing, Business, Pickup, Dropoff), and I need to bring back only one address per contact.
Normally this would be the home address, unless there is a mailing address.
So my question is: how do I write a conditional statement that only gets entries WHERE ADDRESS_TYPE_NAME = 'Home', unless there is also an entry WHERE ADDRESS_TYPE_NAME = 'Mailing' for the same PERSON_ID?
Thanks

with CTE as
(
select Person_id,
Address_Type_Name,
Address_Info -- replace with your real column names
from Address_Table
where Address_Type_Name in ('Home','Mailing')
)
select Person_id, Address_info
from CTE a1
where Address_Type_Name = 'Home'
and not exists (select 1
from CTE a2
where a2.Address_Type_Name = 'Mailing'
and a2.Person_id = a1.Person_id)
union
select Person_id, Address_info
from CTE a1
where Address_Type_Name = 'Mailing'

You can prioritize Address Type and get highest priority type with
select Person_id,
case min(case Address_Type_Name
when 'Mailing' then 1
when 'Home' then 2
-- more
end)
when 1 then 'Mailing'
when 2 then 'Home'
-- more
end Best_Address_Type_Name
from Address_Table
group by Person_id;
Then join the result to your data as needed
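
One way the join-back step could look is sketched below, run on SQLite through Python's sqlite3 module so it can be tried locally. The priority number is kept as-is instead of being mapped back to a type name, and the join repeats the same CASE expression. The Address_Info column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Address_Info is a placeholder column name; the rows are made up.
    CREATE TABLE Address_Table (Person_id INTEGER, Address_Type_Name TEXT, Address_Info TEXT);
    INSERT INTO Address_Table VALUES
        (1, 'Home',     '1 Home St'),
        (1, 'Mailing',  'PO Box 1'),
        (2, 'Home',     '2 Home St'),
        (2, 'Business', '2 Office Rd');
""")

# Step 1 (CTE): best priority per person. Step 2: join back to fetch that address row.
best_addresses = conn.execute("""
    WITH best AS (
        SELECT Person_id,
               MIN(CASE Address_Type_Name
                       WHEN 'Mailing' THEN 1
                       WHEN 'Home'    THEN 2
                   END) AS priority
        FROM Address_Table
        GROUP BY Person_id
    )
    SELECT a.Person_id, a.Address_Type_Name, a.Address_Info
    FROM Address_Table a
    JOIN best b
      ON b.Person_id = a.Person_id
     AND b.priority = CASE a.Address_Type_Name
                          WHEN 'Mailing' THEN 1
                          WHEN 'Home'    THEN 2
                      END
    ORDER BY a.Person_id
""").fetchall()

print(best_addresses)
```

A person with neither a Mailing nor a Home address gets a NULL priority and drops out of this inner join; a person with two addresses of the winning type would come back twice.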

Here is one way to do it, using the row_number() analytic function and not requiring any joins, explicit or implicit. It also handles various special cases: a student who has neither mailing nor home address (but still needs to be shown in the output), and another student with two mailing addresses (in which case a random one is chosen; if there are criteria to prefer one to the other, the query can be easily adapted to accommodate that).
with
students ( id, name, address_type, address ) as (
select 11, 'Andy', 'home' , '123 X street' from dual union all
select 11, 'Andy', 'office' , 'somewhere else' from dual union all
select 15, 'Eva' , 'mailing', 'post office' from dual union all
select 18, 'Jim' , 'office' , '1 building' from dual union all
select 30, 'Mary', 'mailing', 'mail addr 1' from dual union all
select 30, 'Mary', 'office' , '1 building' from dual union all
select 30, 'Mary', 'home' , 'her home' from dual union all
select 30, 'Mary', 'mailing', 'mail addr 2' from dual
)
-- End of test data (not needed for the SQL query - reference your actual table)
select id, name, address_type,
case when address_type is not null then address end as address
from (
select id, name,
case when address_type in ('home', 'mailing')
then address_type end as address_type,
address,
row_number() over (partition by id
order by case address_type when 'mailing' then 0
when 'home' then 1 end) as rn
from students
)
where rn = 1
;
ID NAME ADDRESS_TYPE ADDRESS
--- ---- ------------ --------------
11 Andy home 123 X street
15 Eva mailing post office
18 Jim
30 Mary mailing mail addr 1
4 rows selected.

Selecting id's where all rows match

I have two tables: DOCUMENT and METADATA. DOCUMENT stores an ID and some information we're not interested in; METADATA stores "tags" for those documents. A tag is composed of a key and a value.
So for one document there is only one entry in the DOCUMENT table, but possibly many in the METADATA table.
What I need is to pass a set of key/value pairs and retrieve from the METADATA table only the documents that match ALL of them. That means inspecting different rows "at the same time", and I don't really know how to do that.
Quick example:
META_KEY | META_VALUE | META_DOCUMENT_ID
----------------------------------------
Firstname| Chris | 1
Lastname | Doe | 1
Firstname| Chris | 2
Lastname | Moe | 2
So if I query with the following tags : "Firstname"="Chris", "Lastname"="Doe", I want 1 as result. If I only specify "Firstname"="Chris" I want both 1 and 2 as results.
Thanks a lot for any help !
EDIT :
How about something where I count the number of tags that have to match ?
Like this :
select meta_document_id, count(*)
from metadata
where (meta_key = 'Firstname' and meta_value = 'Chris')
or (meta_key = 'Lastname' and meta_value = 'Doe')
group by meta_document_id
With the count(*) I can easily find out if all the input key/value pairs have matched. How would that run performance-wise ?

Well, you are using a database model known as "key-value" or "entity-attribute-value".
This is usually not the best choice; you can read more about it in these questions:
Key/Value pairs in a database table
Key value pairs in relational database
You need two separate queries for these two cases, like this:
-- Case 1: only Firstname is specified
SELECT distinct META_DOCUMENT_ID
FROM METADATA
WHERE meta_key = 'Firstname' and meta_value = 'Chris';
-- Case 2: Firstname and Lastname are both specified (one self-join per extra pair)
SELECT distinct m1.META_DOCUMENT_ID
FROM METADATA m1
JOIN METADATA m2
ON m1.META_DOCUMENT_ID = m2.META_DOCUMENT_ID
WHERE m1.meta_key = 'Firstname' and m1.meta_value = 'Chris'
AND m2.meta_key = 'Lastname' and m2.meta_value = 'Doe';
EDIT:
I suppose I'll have to join N times the table for N key/value pairs ?
This can be done without a join, for example as below (assuming that each document id has no more than one row per meta_key):
SELECT META_DOCUMENT_ID
FROM METADATA
WHERE (meta_key, meta_value) IN
( ('Firstname' ,'Chris'), ('Lastname', 'Doe' ) )
GROUP BY META_DOCUMENT_ID
HAVING COUNT(*) = 2 /* 2 means that we are looking for 2 meta keys */
How is that going to run performance-wise ?
Terribly. See the explanations in the links above about this model.
In many cases this query must do a full table scan (especially when the number of attributes/keys we are looking for is more than a few), count the values for each id, and then pick the ids that have count = 2.
In a normalized model this is a simple query that can use indexes to quickly pick only the few rows with firstname = 'Chris':
SELECT *
FROM table
WHERE firstname = 'Chris' and lastname = 'Doe'
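
For the key-value table itself, the GROUP BY / HAVING COUNT query from the EDIT above extends to any number of pairs by generating one predicate per pair and comparing the count to the number of pairs. A rough sketch follows, run on SQLite through Python's sqlite3 module. The helper name documents_matching is made up for illustration, it expects a non-empty list of pairs, and it relies on the same assumption that a document has at most one row per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE METADATA (META_KEY TEXT, META_VALUE TEXT, META_DOCUMENT_ID INTEGER);
    INSERT INTO METADATA VALUES
        ('Firstname', 'Chris', 1), ('Lastname', 'Doe', 1),
        ('Firstname', 'Chris', 2), ('Lastname', 'Moe', 2);
""")

def documents_matching(conn, tags):
    """Return ids of documents that carry every (key, value) pair in tags.

    Hypothetical helper; tags must be a non-empty list of (key, value) tuples.
    """
    # One parameterized predicate per requested pair, OR-ed together.
    pair_filter = " OR ".join("(META_KEY = ? AND META_VALUE = ?)" for _ in tags)
    # Flatten the pairs into bind parameters, then append the required match count.
    params = [part for pair in tags for part in pair] + [len(tags)]
    sql = f"""
        SELECT META_DOCUMENT_ID
        FROM METADATA
        WHERE {pair_filter}
        GROUP BY META_DOCUMENT_ID
        HAVING COUNT(*) = ?
        ORDER BY META_DOCUMENT_ID
    """
    return [row[0] for row in conn.execute(sql, params)]

print(documents_matching(conn, [("Firstname", "Chris"), ("Lastname", "Doe")]))
print(documents_matching(conn, [("Firstname", "Chris")]))
```

Only the placeholders are generated dynamically; the values themselves are always passed as bind parameters.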

Oracle Setup:
CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
KEY VARCHAR2(50),
VALUE VARCHAR2(50)
);
/
CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/
CREATE TABLE meta_data ( meta_key, meta_value, meta_document_id ) AS
SELECT 'Firstname', 'Chris', 1 FROM DUAL UNION ALL
SELECT 'Lastname', 'Doe', 1 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-2368', 1 FROM DUAL UNION ALL
SELECT 'Firstname', 'Chris', 2 FROM DUAL UNION ALL
SELECT 'Lastname', 'Moe', 2 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-0001', 2 FROM DUAL;
Query:
SELECT meta_document_id
FROM (
SELECT meta_document_id,
CAST(
COLLECT(
KEY_VALUE_PAIR( meta_key, meta_value )
) AS KEY_VALUE_TABLE
) AS key_values
FROM meta_data
GROUP BY meta_document_id
)
WHERE KEY_VALUE_TABLE(
-- Your values here:
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' )
)
SUBMULTISET OF key_values;
Output:
META_DOCUMENT_ID
------------------
1
Update - Reimplementing the meta data table using a nested table:
Oracle Setup:
CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
META_KEY VARCHAR2(50),
META_VALUE VARCHAR2(50)
);
/
CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/
CREATE TABLE meta_data (
meta_document_id INT,
key_values KEY_VALUE_TABLE
) NESTED TABLE key_values STORE AS meta_data_key_values;
CREATE UNIQUE INDEX META_DATA_KEY_VALUES_IDX ON META_DATA_KEY_VALUES (
NESTED_TABLE_ID,
META_KEY,
META_VALUE
);
-- Insert everything in one go:
INSERT INTO META_DATA VALUES(
1,
KEY_VALUE_TABLE(
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' ),
KEY_VALUE_PAIR( 'Phonenumber', '555-2368' )
)
);
-- Insert everything in bits:
INSERT INTO meta_data VALUES ( 2, KEY_VALUE_TABLE() );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Firstname', 'Chris' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Lastname', 'Moe' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
( meta_key, meta_value ) VALUES( 'Phonenumber', '555-0001' );
--Select all the key-value pairs:
SELECT META_DOCUMENT_ID,
META_KEY,
META_VALUE
FROM META_DATA md,
TABLE( md.KEY_VALUES );
Query:
The changes above let you simplify the query a lot:
SELECT META_DOCUMENT_ID
FROM meta_data
WHERE KEY_VALUE_TABLE(
-- Your values here:
KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
KEY_VALUE_PAIR( 'Lastname', 'Doe' )
)
SUBMULTISET OF key_values;

If you know all the possible tags in advance, one approach is to use PIVOT:
with METADATA (META_KEY, META_VALUE, META_DOCUMENT_ID) as
(
select 'Firstname', 'Chris',1 from dual union all
select 'Lastname', 'Doe',1 from dual union all
select 'Firstname', 'Chris',2 from dual union all
select 'Lastname', 'Moe',2 from dual
)
select *
from metadata
PIVOT ( max (META_VALUE ) FOR (META_KEY) IN ('Firstname' AS Firstname, 'Lastname' AS Lastname))
where Firstname = 'Chris' /* and Lastname ='Doe' ...*/

Print the record as vertical result SQL

How can I show a result vertically in SQL?
I have an idea using PIVOT, but I can't make it work.
SELECT '1' ID
, 'Vincent' Name
, 'Enteng' NickName
, 'Male' Gender
The result is a single row with four columns, but I want the result to be:
ID 1
Name Vincent
NickName Enteng
Gender Male

If you are just dealing with one record, then use union all:
SELECT 'ID' as which, '1' as value union all
SELECT 'Name', 'Vincent' union all
SELECT 'NickName', 'Enteng' union all
SELECT 'Gender', 'Male'
Note that in some databases you might need FROM DUAL or some other construct.
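
If the transposition can happen in the client instead of in SQL, a single row can also be turned into name/value pairs by pairing the cursor's column names with the row's values. This is a different technique from the UNION ALL query above, sketched here with Python's sqlite3 module as an assumed client; the same pattern should apply to other DB-API drivers that expose cursor.description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The original single-row query, unchanged apart from explicit AS keywords.
cur = conn.execute("""
    SELECT '1' AS ID,
           'Vincent' AS Name,
           'Enteng' AS NickName,
           'Male' AS Gender
""")

# cursor.description holds one entry per result column; element 0 is the column name.
column_names = [col[0] for col in cur.description]
row = cur.fetchone()

# Pair each column name with its value to get the vertical layout.
vertical = list(zip(column_names, row))

for name, value in vertical:
    print(name, value)
```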
