Insert into nested records in BigQuery from another nested table - sql

I am trying to insert data from one BigQuery table (nested) into another BigQuery table (nested). However, I am running into errors during the insert.
Source schema: T1
FieldName Type Mode
User STRING NULLABLE
order RECORD REPEATED
order.Name STRING NULLABLE
order.location STRING NULLABLE
order.subscription RECORD NULLABLE
order.subscription.date TIMESTAMP NULLABLE
order.Details RECORD REPEATED
order.Details.id STRING NULLABLE
order.Details.nextDate STRING NULLABLE
Target schema: T2
FieldName Type Mode
User STRING NULLABLE
order RECORD REPEATED
order.Name STRING NULLABLE
order.location STRING NULLABLE
order.subscription RECORD NULLABLE
order.subscription.date TIMESTAMP NULLABLE
order.Details RECORD REPEATED
order.Details.id STRING NULLABLE
order.Details.nextDate STRING NULLABLE
I am trying to use the INSERT INTO functionality of BigQuery, and I only want to insert a few fields from the source table. My query looks like this:
INSERT INTO T2 (user, order.name, order.subscription.date, details.id)
SELECT user, order.name, order.subscription.date, details.id
FROM T1 o
JOIN UNNEST(o.order) order,
UNNEST(order.details) details
After a bit of googling, I am aware that I would need to use STRUCT when defining field names while inserting, but I am not sure how to do it. Any help is appreciated. Thanks in advance!

You will have to insert the records in the shape your destination table requires: STRUCT types need to be inserted fully (with all the fields they contain).
I provide a small sample below. I build the following table with a single record to explain this:
create or replace table `project-id.dataset-id.table-source` (
  user STRING,
  order_detail STRUCT<name STRING, location STRING, subscription STRUCT<datesub TIMESTAMP>, details STRUCT<id STRING, nextDate STRING>>
)

insert into `project-id.dataset-id.table-source` (user, order_detail)
values ('Karen', STRUCT('ShopAPurchase', 'Germany', STRUCT('2022-03-01'), STRUCT('1', '2022-03-05')))
With that information we can now start inserting into our destination tables. In our sample, I'm reusing the source table and just adding an additional record into it like this:
insert into `project-id.dataset-id.table-source` (user, order_detail)
select 'Anna', struct(ox.name, 'Japan', ox.subscription, struct('2', dx.nextDate))
from `project-id.dataset-id.table-source` o
join unnest([o.order_detail]) ox, unnest([o.order_detail.details]) dx
You will see that in order to UNNEST a STRUCT I have to wrap the value inside an array ([ ]), since UNNEST flattens the struct into a single row. Also, when inserting STRUCT types you have to build the struct explicitly, or use the flattened records to create that struct column.
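As a standalone illustration of that array-wrapping trick (the field names a and b here are made up):
-- UNNEST only accepts arrays, so a lone STRUCT is wrapped in [ ]
-- and comes back out as a single row whose fields can be addressed
select s.a, s.b
from unnest([struct(1 as a, 'x' as b)]) as s
-- returns one row: a = 1, b = 'x'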
If you want to add additional records inside a STRUCT, you will have to declare your destination table with an ARRAY inside of it. Let's look at this new table, source_array:
create or replace table `project-id.dataset-id.table-source_array` (
  user STRING,
  order_detail STRUCT<name STRING, location STRING, subscription STRUCT<datesub TIMESTAMP>, details ARRAY<STRUCT<id STRING, nextDate STRING>>>
)
insert into `project-id.dataset-id.table-source_array` (user, order_detail)
values ('Karen', STRUCT('ShopAPurchase', 'Germany', STRUCT('2022-03-01'), [STRUCT('1', '2022-03-05')]))
insert into `project-id.dataset-id.table-source_array` (user, order_detail)
select 'Anna', struct(ox.name, 'Japan', ox.subscription, [struct('2', dx.nextDate), struct('3', dx.nextDate)])
from `project-id.dataset-id.table-source` o
join unnest([o.order_detail]) ox, unnest([o.order_detail.details]) dx
Keep in mind that you should be careful when dealing with this, as you might run into subarray errors (BigQuery does not support arrays of arrays), which may cause issues.
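The usual workaround for that error is to wrap the inner array in a STRUCT, for example:
-- select [[1, 2], [3]]  -- fails: cannot construct an array of arrays
select [struct([1, 2] as arr), struct([3] as arr)]  -- works: array of structs of arrays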
I made use of the following documentation for this sample:
STRUCT
UNNEST
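Putting this together, here is an untested sketch applied back to the question's T1/T2 schema (fields that are not selected, such as location and nextDate, still have to be supplied, here as NULLs; `order` needs backticks because ORDER is a reserved keyword):
insert into T2 (User, `order`)
select
  User,
  array(
    select as struct
      o.Name,
      cast(null as string) as location,
      struct(o.subscription.date as date) as subscription,
      array(
        select as struct d.id, cast(null as string) as nextDate
        from unnest(o.Details) as d
      ) as Details
    from unnest(t.`order`) as o
  )
from T1 as t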

Related

Change Schema while creating table

I have an issue later in my process when I want to append tables with different datatypes.
I am creating a new table out of an existing table. One column is the calendar week (KW), which was originally a STRING. In order to append my tables later on, I need the same datatype for the column.
Is there a way to change the datatype of a column while creating the new table?
CREATE TABLE IF NOT EXISTS
  MyNewTable
AS (
  SELECT
    Column_1 AS Column_1_alias,
    KW_ AS KW
  FROM
    SourceTable);
What this query does is grab only the rows where the column KW contains a number, remove any non-digit characters from the STRING, and finally CAST the result to the desired type, so it ends up as an INT64.
CREATE TABLE IF NOT EXISTS
  dataset.MyNewTable
AS (
  SELECT
    Column1 AS Column1_alias,
    CAST(REGEXP_REPLACE(KW, '[^0-9]', '') AS INT64) AS KW_Alias
  FROM
    `project.dataset.source`
  WHERE REGEXP_CONTAINS(KW, '[0-9]')
);
Another possible solution is to use the REPLACE function instead of REGEXP_REPLACE to turn the string into a number.
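For completeness, a sketch of that REPLACE variant, assuming the KW values are strings like 'KW01' (that prefix is an assumption about the data):
CREATE TABLE IF NOT EXISTS
  dataset.MyNewTable
AS (
  SELECT
    Column1 AS Column1_alias,
    -- REPLACE only strips the literal 'KW' prefix; any other stray character would still break the CAST
    CAST(REPLACE(KW, 'KW', '') AS INT64) AS KW_Alias
  FROM
    `project.dataset.source`
);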

BigQuery ARRAY data type related issue if the array is NULL

Please find below the table nullarraytest. The create statement:
create table nullarraytest (name array<string>, city string);
The values in the table:
insert into nullarraytest values([],"Mumbai");
insert into nullarraytest values(["abc","def"],"Pune");
insert into nullarraytest values(null,"Surat");
Issue/doubt:
The below query returns no data:
select city from nullarraytest where name is NULL;
It should return 2 rows "Mumbai" and "Surat".
The below query works properly as expected:
select city from nullarraytest where array_length(name)=0;
This returns 2 rows "Mumbai" and "Surat".
Why doesn't the filter "name is null" work?
As @Jaytiger mentioned in the comments,
It's given in this GCP public documentation that BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside a query, NULL and empty ARRAYs are two distinct values. An empty array is not null. For nullable data types, NULL is a valid value. Currently, all existing data types are nullable, but conditions apply for ARRAYs.
You can also go through this Stack Overflow post.
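If you want a filter that is robust in both directions (stored empty arrays as well as genuinely NULL arrays produced inside a query, where ARRAY_LENGTH(NULL) returns NULL rather than 0), you can combine the two checks, e.g.:
select city
from nullarraytest
where name is null or array_length(name) = 0;
-- returns "Mumbai" and "Surat"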

Spark SQL: insert string column into array of struct type column

I am trying to insert a STRING type column into an ARRAY of STRUCT type column, but I am facing errors. Could you point me in the right direction to do the INSERT?
In a Databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to be inserted into a transformed table (tl_lms.transformedTable) where the columns are arrays of struct type.
CREATE TABLE raw_lms.rawTable
( PrimaryOwners STRING
,Owners STRING
)
USING DELTA LOCATION 'xxxx/rawTable'
CREATE TABLE tl_lms.transformedTable
( PrimaryOwners array<struct<Id:STRING>>
,Owners array<struct<Id:STRING>>
)
USING DELTA LOCATION 'xxxx/transformedTable'
The raw table has the below values populated, e.g.:
INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
"[]"
)
I try to insert into the transformed table and get the below error:
INSERT INTO tl_lms.transformedTable
SELECT PrimaryOwners,
Owners
FROM raw_lms.rawTable
Error in SQL statement: AnalysisException: cannot resolve
'spark_catalog.raw_lms.rawTable.PrimaryOwners' due to data type
mismatch: cannot cast string to array<struct<Id:string>>;
I do not want to explode the data. I only need to insert row by row between rawTable and transformedTable, whose columns have different data types.
Thanks for your time and help.
As the error message states, you can't insert a string as an array. You need to use the array and named_struct functions.
Change the raw table's columns to the correct types instead of strings and try this:
INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')),
null
);
Or, if you want to keep the columns as strings in the raw table, then use from_json to parse the strings into the correct type before inserting:
INSERT INTO tl_lms.transformedTable
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable
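As a quick sanity check of the parsing (a sketch; Spark's JSON reader accepts the single-quoted keys by default via allowSingleQuotes, and from_json yields NULL when the string doesn't parse):
SELECT from_json("[{'Id': 'a'}, {'Id': 'b'}]", 'array<struct<Id:STRING>>') AS parsed;
-- parsed: [{"Id":"a"},{"Id":"b"}]  (a malformed string would give NULL instead)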

Most elegant way to create 'subrecord' type keeping names of columns

So I'm playing with Postgres' composite types, and I cannot figure out one thing. Suppose I want to use a subset of columns of a certain table, or a mix of different columns of several different tables used in the query, and create a record type out of them.
Logically, a simple (c.id, c.name) should work, but it seems that column names are actually lost - it's not possible to address fields of the record by name (like id), and, for example, the to_json function cannot use field names when creating JSON out of this record. Using a subquery (select c.id, c.name) predictably fails with a "subquery must return only one column" error.
I can, of course, use a lateral join or a common table expression to create this sub-type, but I'm wondering - is there a more elegant way?
See the db<>fiddle demo with a table example and test query:
create table test(id integer, name text, price int);
insert into test(id,name,price)
values
(1,'name1',1),
(2,'name2',12),
(3,'name3',23),
(5,'name5',4),
(9,'name9',3);
create type sub_test as (id integer, name text);
select
c.price,
-- using predefined type - works
to_json((c.id, c.name)::sub_test),
-- creating row type on the fly - doesn't work, names are lost
to_json((c.id, c.name)),
-- using derived table created with lateral join - works
to_json(d)
from test as c, lateral (select c.id, c.name) as d;
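For the first row of test, the three expressions come out roughly like this (anonymous ROW values fall back to positional f1/f2 keys, which is exactly the "names are lost" problem):
-- to_json((c.id, c.name)::sub_test)  -> {"id":1,"name":"name1"}
-- to_json((c.id, c.name))            -> {"f1":1,"f2":"name1"}
-- to_json(d)                         -> {"id":1,"name":"name1"}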

How can I insert a key-value pair into a hive map?

Based on the following tutorial, Hive has a map type. However, there does not seem to be a documented way to insert a new key-value pair into a Hive map, via a SELECT with some UDF or built-in function. Is this possible?
As a clarification, suppose I have a table called foo with a single column, typed map, named column_containing_map.
Now I want to create a new table that also has one column, typed map, but I want each map (which is contained within a single column) to have an additional key-value pair.
A query might look like this:
CREATE TABLE IF NOT EXISTS bar AS
SELECT ADD_TO_MAP(column_containing_map, "NewKey", "NewValue")
FROM foo;
Then the table bar would contain the same maps as table foo except each map in bar would have an additional key-value pair.
Suppose you have a student table which contains student marks in various subjects.
hive> desc student;
id string
name string
class string
marks map<string,string>
You can insert values directly into the table as below.
INSERT INTO TABLE student
SELECT STACK(1,
'100','Sekar','Mathematics',map("Mathematics","78")
)
FROM empinfo
LIMIT 1;
Here, 'empinfo' can be any table in your database that has at least one row; the LIMIT 1 ensures the stacked values are emitted only once.
And the results are:
100 Sekar Mathematics {"Mathematics":"78"}
For key-value pairs, you can insert with SQL like the following:
INSERT INTO TABLE student values( "id","name",'class',
map("key1","value1","key2","value2","key3","value3","key4","value4") )
Please pay attention to the sequence of the values in the map.
I think the combine function from Brickhouse will do what you need. Slightly modifying the query in your original question, it would look something like this:
SELECT
combine(column_containing_map, str_to_map("NewKey:NewValue"))
FROM
foo;
The limitation with this example is that str_to_map creates a MAP<STRING,STRING>. If your Hive map contains other primitive types for the keys or values, this won't work.
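If the values are not strings, a hedged workaround is to build the second map with map() so the types line up (this assumes Brickhouse's combine accepts any two maps of the same type):
-- assuming column_containing_map is MAP<STRING, INT>
SELECT combine(column_containing_map, map("NewKey", 2))
FROM foo;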
I'm sorry, I didn't quite get this. What do you mean by "with some UDF or built-in function"? If you wish to insert into a table which has a Map field, it's similar to any other datatype. For example:
I have a table called complex1, created like this:
CREATE TABLE complex1(c1 array<string>, c2 map<int,string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';
I also have a file, called com.txt, which contains this :
Mohammad-Tariq,007:Bond
Now, I'll load this data into the above created table:
load data inpath '/inputs/com.txt' into table complex1;
So this table contains:
select * from complex1;
OK
["Mohammad","Tariq"] {7:"Bond"}
Time taken: 0.062 seconds
I have one more table, called complex2 :
CREATE TABLE complex2(c1 map<int,string>);
Now, to select data from complex1 and insert it into complex2, I'll do this:
insert into table complex2 select c2 from complex1;
Scan the table to cross-check:
select * from complex2;
OK
{7:"Bond"}
Time taken: 0.062 seconds
HTH