Spark SQL: insert a string column into an array-of-struct column

I am trying to insert a STRING type column into an ARRAY of STRUCT type column, but I am facing errors. Could you point me in the right direction for this INSERT?
In a Databricks notebook, I have a raw table (raw_lms.rawTable) where all the columns are string type. This needs to be inserted into a transform table (tl_lms.transformedTable) where the columns are arrays of struct type.
CREATE TABLE raw_lms.rawTable
( PrimaryOwners STRING
,Owners STRING
)
USING DELTA LOCATION 'xxxx/rawTable'
CREATE TABLE tl_lms.transformedTable
( PrimaryOwners array<struct<Id:STRING>>
,Owners array<struct<Id:STRING>>
)
USING DELTA LOCATION 'xxxx/transformedTable'
The raw table has the below values populated, e.g.:
INSERT INTO TABLE raw_lms.rawTable
VALUES
("[{'Id': '1393fe1b-bba2-4343-dff0-08d9dea59a03'}, {'Id': 'cf2e6549-5d07-458c-9d30-08d9dd5885cf'}]",
"[]"
)
I try to insert into the transform table and get the below error:
INSERT INTO tl_lms.transformedTable
SELECT PrimaryOwners,
Owners
FROM raw_lms.rawTable
Error in SQL statement: AnalysisException: cannot resolve
'spark_catalog.raw_lms.rawTable.PrimaryOwners' due to data type
mismatch: cannot cast string to array<struct<Id:string>>;
I do not want to explode the data. I only need to insert rows one-for-one from rawTable into transformedTable, whose columns have different data types.
Thanks for your time and help.

As the error message states, you can't insert a string as an array. You need to use the array and named_struct functions.
Either change the raw table's columns to the correct types instead of strings and insert typed values like this:
INSERT INTO TABLE raw_lms.rawTable
VALUES
(array(named_struct('Id', '1393fe1b-bba2-4343-dff0-08d9dea59a03'), named_struct('Id', 'cf2e6549-5d07-458c-9d30-08d9dd5885cf')),
null
);
Or, if you want to keep the columns as strings in the raw table, use from_json to parse the strings into the correct type while inserting:
INSERT INTO tl_lms.transformedTable
SELECT from_json(PrimaryOwners, 'array<struct<Id:STRING>>'),
from_json(Owners, 'array<struct<Id:STRING>>')
FROM raw_lms.rawTable
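Before running the INSERT, it can be worth previewing the parse with a plain SELECT, since from_json returns NULL for strings that don't match the schema — malformed rows would silently become NULL arrays rather than fail. A quick sanity check:

```sql
-- Preview the parse; rows that don't match the schema come back as NULL
SELECT PrimaryOwners,
       from_json(PrimaryOwners, 'array<struct<Id:STRING>>') AS parsed
FROM raw_lms.rawTable
```

Note that the sample values use single quotes inside the JSON; Spark's JSON parser accepts these by default (the allowSingleQuotes option is on), so from_json still parses them.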

Related

Big Query Array data type related issue if the array is NULL

Please find below the table nullarraytest. The create statement:
create table nullarraytest (name array<string>, city string);
The values in the table:
insert into nullarraytest values([],"Mumbai");
insert into nullarraytest values(["abc","def"],"Pune");
insert into nullarraytest values(null,"Surat");
Issue/doubt:
The below query returns no data:
select city from nullarraytest where name is NULL;
It should return 2 rows "Mumbai" and "Surat".
The below query works properly as expected:
select city from nullarraytest where array_length(name)=0;
This returns 2 rows "Mumbai" and "Surat".
Why doesn't the filter "name is null" work?
As @Jaytiger mentioned in the comments, the GCP public documentation states that BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values. An empty array is not null. For nullable data types, NULL is a valid value. Currently, all existing data types are nullable, but conditions apply for ARRAYs.
You can also go through this Stack Overflow post.
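Given that distinction, a filter that is robust to how the rows were written can check for both conditions — ARRAY_LENGTH(name) on a NULL array yields NULL (not 0), so neither predicate alone covers both cases. A sketch against the table above:

```sql
-- Matches both empty arrays and NULL arrays
SELECT city
FROM nullarraytest
WHERE name IS NULL OR ARRAY_LENGTH(name) = 0;
```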

'Row value misused' when inserting an array of structs into database

The following insert query resulted in row value misused error (query generated by golang gorm):
INSERT INTO employees_details (img_uid, img_files)
VALUES
(
"asfe123y3uygy43",
(
"{\"ImageID\":\"ISDx-Y0fudfhv4lC_M25j\",\"ImageType\":\"original\",\"MediaType\":\"img/png\",\"MediaPath\":\"it-profile/faergbgbder34fgb/original/154895123-owen.png\"}",
"{\"ImageID\":\"fgsrtbrdsthrb\",\"ImageType\":\"thumbnail\",\"MediaType\":\"img/png\",\"MediaPath\":\"it-profile/faergbgbder34fgb/thumb/154895123-owen.png\"}"
)
)
The column img_files is a column containing an array of structs ([]img_files{string, string, string, string}). When I insert only one struct, the query works fine, but when I insert two at once (an array of 2 elements), I get the row value misused error.
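The error itself is a hint: "row value misused" is SQLite's message when a parenthesized value list appears where a single scalar is expected, and SQLite has no native array type. One workaround sketch (assuming SQLite with the JSON1 functions available) is to collapse the whole array into a single JSON document:

```sql
-- Sketch: store the two structs as one JSON array value (SQLite JSON1)
INSERT INTO employees_details (img_uid, img_files)
VALUES (
  'asfe123y3uygy43',
  json_array(
    json('{"ImageID":"ISDx-Y0fudfhv4lC_M25j","ImageType":"original","MediaType":"img/png","MediaPath":"it-profile/faergbgbder34fgb/original/154895123-owen.png"}'),
    json('{"ImageID":"fgsrtbrdsthrb","ImageType":"thumbnail","MediaType":"img/png","MediaPath":"it-profile/faergbgbder34fgb/thumb/154895123-owen.png"}')
  )
);
```

On the gorm side this corresponds to serializing the slice to one string (e.g. with a JSON serializer) before binding it, rather than passing the slice elements as separate values.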

Insert into Nested records in Bigquery FROM another nested table

I am trying to insert data from one Bigquery table (nested) to another bigquery table (nested). However, I am getting issues during insert.
Source schema: T1
FieldName Type Mode
User STRING NULLABLE
order RECORD REPEATED
order.Name STRING NULLABLE
order.location STRING NULLABLE
order.subscription RECORD NULLABLE
order.subscription.date TIMESTAMP NULLABLE
order.Details RECORD REPEATED
order.Details.id STRING NULLABLE
order.Details.nextDate STRING NULLABLE
Target schema: T2
FieldName Type Mode
User STRING NULLABLE
order RECORD REPEATED
order.Name STRING NULLABLE
order.location STRING NULLABLE
order.subscription RECORD NULLABLE
order.subscription.date TIMESTAMP NULLABLE
order.Details RECORD REPEATED
order.Details.id STRING NULLABLE
order.Details.nextDate STRING NULLABLE
I am trying to use the INSERT INTO functionality of BigQuery, and I want to insert only a few fields from the source table. My query is like below:
INSERT INTO T2 (user,order.name,order.subscription.date,details.id)
SELECT user,order.name,order.subscription.date,details.id
from
T1 o
join unnest (o.order) order,
unnest ( order.details) details
After a bit of googling I am aware that I would need to use STRUCT when defining field names while inserting, but not sure how to do it. Any help is appreciated. Thanks in advance!
You have to insert records in the shape the destination table needs: STRUCT values must be inserted whole, with all the fields they contain.
Here is a small sample. I built the following table with a single record to explain this:
create or replace table `project-id.dataset-id.table-source` (
user STRING,
order_detail STRUCT<name STRING, location STRING,subscription STRUCT<datesub TIMESTAMP>,details STRUCT<id STRING,nextDate STRING>>
)
insert into `project-id.dataset-id.table-source` (user,order_detail)
values ('Karen',STRUCT('ShopAPurchase','Germany',STRUCT('2022-03-01'),STRUCT('1','2022-03-05')))
With that information we can now start inserting into our destination table. In this sample, I'm reusing the source table and just adding an additional record into it, like this:
insert into `project-id.dataset-id.table-source` (user,order_detail)
select 'Anna',struct(ox.name,'Japan',ox.subscription,struct('2',dx.nextDate))
from `project-id.dataset-id.table-source` o
join unnest ([o.order_detail]) ox, unnest ([o.order_detail.details]) dx
You will see that in order to unnest a STRUCT I have to wrap the value in an array ([]), since UNNEST flattens an array into rows. Also, when inserting STRUCT types you have to build the struct explicitly (or reuse the flattened records) to create that struct column.
If you want to store multiple records inside a STRUCT, you have to declare your destination table with an ARRAY inside of it. Let's look at this new table, source_array:
create or replace table `project-id.dataset-id.table-source_array` (
user STRING,
order_detail STRUCT<name STRING, location STRING,subscription STRUCT<datesub TIMESTAMP>,details ARRAY<STRUCT<id STRING ,nextDate STRING>>>
)
insert into `project-id.dataset-id.table-source_array` (user,order_detail)
values ('Karen',STRUCT('ShopAPurchase','Germany',STRUCT(TIMESTAMP '2022-03-01'),[STRUCT('1','2022-03-05')]))
insert into `project-id.dataset-id.table-source_array` (user,order_detail)
select 'Anna',struct(ox.name,'Japan',ox.subscription,[struct('2',dx.nextDate),struct('3',dx.nextDate)])
from `project-id.dataset-id.table-source` o
join unnest ([o.order_detail]) ox, unnest ([o.order_detail.details]) dx
Keep in mind that you should be careful when dealing with this, as you might encounter subarray errors, which can cause issues.
I made use of the following documentation for this sample:
STRUCT
UNNEST

Add a column which stores JSON data

I have created table like:
CREATE TABLE demo
(
name varchar(50),
adress nvarchar
);
But I'm not getting how to insert data into the adress column, which stores a JSON object, like:
INSERT INTO demo (name, adress)
VALUES ('vamsi', N'{"city":"avhfb","pin":46374}');
The number of values I get for this adress column is dynamic, which is why I need to store it in JSON format.
Don't overthink it. JSON is still a string, so nvarchar will be fine, but add a length to the datatype (probably nvarchar(max) for a JSON object).
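Putting that together (assuming SQL Server, given the N'' literal), the table and insert might look like this; the ISJSON check constraint is optional and requires SQL Server 2016 or later:

```sql
CREATE TABLE demo
(
    name   varchar(50),
    adress nvarchar(max)  -- nvarchar with no length defaults to nvarchar(1)
        CONSTRAINT chk_adress_json CHECK (ISJSON(adress) = 1)  -- optional guard, SQL Server 2016+
);

INSERT INTO demo (name, adress)
VALUES ('vamsi', N'{"city":"avhfb","pin":46374}');
```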

Retrieve and insert into type objects in oracle

I have created an object type (address: city, state) in Oracle 10g. The table cust_contact contains a field of type address. Can anyone please provide SQL queries to insert and retrieve values from this table, including the type?
Selection is easy. Just include the type column in the query projection. Assuming that the ADDRESS column is called contact_address:
select id, contact_name, contact_address
from cust_contact
/
With inserts you need to specify the type in the statement:
insert into cust_contact values
(some_seq.nextval
, 'MR KNOX'
, address(34, 'Main Street', 'Whoville', 'SU')
)
/
You can also use the "." syntax when retrieving columns:
select c.contact_address.city from cust_contact c;
Please note that if cust_contact is a table of objects, then you must use the table alias "c".
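The same alias-qualified dot syntax also works in a WHERE clause, e.g. filtering on an attribute of the type (column names as assumed above):

```sql
-- Filter on an attribute of the object type; the alias "c" is still required
select c.id, c.contact_name
from cust_contact c
where c.contact_address.city = 'Whoville';
```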
For example, first create an object type, say address_ty. The syntax for this is:
create type address_ty as object(Street varchar2(50), City char(10), Zip number(6));
Now use address_ty as a datatype at table creation time, for example:
create table Example(emp_name varchar2(10), emp_id number(10), address address_ty);
This creates the table Example with the address column of type address_ty.
Now insert values into the Example table:
Insert Into Example Values('Sandeep Kumar', 595, address_ty('Snap on sector 126', 'Noida', 201301));
Thanks.