I am using two columns from the iris dataset as an example - sepal_length and sepal_width.
I have two tables:
create table iris(sepal_length real, sepal_width real); and
create table raw_json(data jsonb);
And inside the JSON file I have data like this
[{"sepal_width":3.5,"sepal_length":5.1},{"sepal_width":3.0,"sepal_length":4.9}]
The first thing I do is run copy raw_json from '/data.json';
So far I have only been able to figure out how to use jsonb_array_elements.
select jsonb_array_elements(data) from raw_json; gives back
jsonb_array_elements
-------------------------------------------
{"sepal_width": 3.5, "sepal_length": 5.1}
{"sepal_width": 3.0, "sepal_length": 4.9}
I want to insert (append, actually) the data from the raw_json table into the iris table. I have figured out that I need to use either jsonb_to_recordset or json_populate_recordset, but how?
Also, could this be done without the raw_json table?
PS - Almost all the existing SE questions use a raw json string inside their queries. So that didn't work for me.
You must extract the JSON fields from the output of jsonb_array_elements, used in the FROM clause:
INSERT INTO iris(sepal_length, sepal_width)
SELECT (j->>'sepal_length')::real,
       (j->>'sepal_width')::real
FROM raw_json CROSS JOIN jsonb_array_elements(data) AS j;
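Since you asked about jsonb_to_recordset and about skipping the staging table, here is a minimal sketch of both variants. The jsonb_to_recordset call maps each array element onto the column list you give it; the second statement assumes the file sits on the database server and that your role is allowed to call pg_read_file().

-- Variant 1: jsonb_to_recordset instead of the ->> casts
INSERT INTO iris(sepal_length, sepal_width)
SELECT r.sepal_length, r.sepal_width
FROM raw_json
CROSS JOIN jsonb_to_recordset(data) AS r(sepal_length real, sepal_width real);

-- Variant 2: no raw_json table at all; read and parse the file in one statement
-- (requires superuser or the pg_read_server_files role)
INSERT INTO iris(sepal_length, sepal_width)
SELECT (j->>'sepal_length')::real,
       (j->>'sepal_width')::real
FROM jsonb_array_elements(pg_read_file('/data.json')::jsonb) AS j;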
How can I add a new key/val pair to an already existing JSON column in BigQuery using SQL (BigQuery flavor), turning the current value into something like the original JSON plus the new pair?
BigQuery provides Data Manipulation Language (DML) statements such as the SQL Update statement. See:
https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_statement
What you will have to do is retrieve the original value of your structured column and then run a SQL UPDATE statement that sets the column to the complete new value that you want.
Take care to realize that BigQuery is an OLAP database and is optimized for queries rather than updates or deletes. Make sure you read the information on using DML statements in BigQuery found here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language
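For example, a minimal sketch of such an UPDATE, assuming json_col is a STRING column holding the JSON text (the dataset, table, and column names here are hypothetical):

-- Overwrite the whole JSON value for one row with the new, complete value
UPDATE your_dataset.your_table
SET json_col = '{"key1":"value1","key2":"value2","key3":"value3"}'
WHERE id = 1;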
I feel like this question is less about how to update the table and more about how to adjust existing JSON with an extra/new key:value pair (and then either update the table or simply select the result out).
So, I assume you have a table with an id column and a JSON string column (json_col),
and you might have another table with those new key:value pairs to use
In case you don't really have a second table, you can just use a CTE like the one below:
with new_key_val as (
select 1 id, '{"key3":"value3"}' add_json union all
select 2 id, '{"key14":"value14"}'
)
So, having the above, you can use the approach below:
select *,
( select '{' || string_agg(trim(kv)) || ',' || trim(add_json, '{}') || '}'
from unnest(split(trim(json_col, '{}'), ',')) kv
) adjusted_json
from your_table
left join new_key_val
using(id)
with the output showing the original columns plus the adjusted_json column.
BigQuery supports JSON as a native data type but only offers a limited set of JSON functions. Unless your json data has a pre-defined, simple schema with known keys, you probably want to go the string-manipulation way.
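For reference, a self-contained version of the string-manipulation approach above; your_table is mocked with a CTE here (the sample rows are hypothetical) so the statement runs as-is in BigQuery Standard SQL:

with your_table as (
  select 1 id, '{"key1":"value1","key2":"value2"}' json_col union all
  select 2 id, '{"key11":"value11","key12":"value12"}' json_col
),
new_key_val as (
  select 1 id, '{"key3":"value3"}' add_json union all
  select 2 id, '{"key14":"value14"}'
)
select *,
  -- split the existing JSON body, re-aggregate it, and append the new pair
  ( select '{' || string_agg(trim(kv)) || ',' || trim(add_json, '{}') || '}'
    from unnest(split(trim(json_col, '{}'), ',')) kv
  ) adjusted_json
from your_table
left join new_key_val using(id)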
I have a DB table called ex_table, and location is one of its columns.
When I run a query it shows an array structure, and I need to extract an element from it.
My query was
Select location from ex_table
it shows
[{country=BD, state=NIL, city=NIL}]
How do I select only city from the location column?
Try the following:
WITH dataset AS (
SELECT location
FROM ex_table
)
SELECT places.city
FROM dataset, UNNEST (location) AS t(places)
As this is an array of objects, you need to flatten the data. This is done using the UNNEST syntax in Athena. More info on this can be found in the AWS documentation
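If you don't need the intermediate CTE, the same flattening works directly against the table. A minimal sketch, assuming location is an array(row(...)) column:

-- UNNEST turns each array element into its own row, exposed here as places
SELECT places.city
FROM ex_table
CROSS JOIN UNNEST(location) AS t(places)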
I have a Hive table with 3 columns: the first is a string, the second is a string-ified JSON array, and the third is a string-ified JSON object. I want to retrieve a field from the string-ified JSON object, whose key can be obtained by combining the first column with the first element of the string-ified JSON array in the second column.
get_json_object(
  get_json_object(
    column3,
    concat("$.", column1, "__",
           get_json_object(column2, "$[0]"))
  ),
  "$.fieldofinterest"
) as field_of_interest
I wrote the above construct to retrieve the field of interest.
When it is written as part of a select ... from statement, I get the correct output in the field_of_interest column.
When it is written as part of a create table t1 as select ... from statement, the table gets created but field_of_interest is NULL for all rows.
There is no failure in the create ... select statement, and all other columns get populated fine. I am using get_json_object for other columns too; they are not nested and they populate fine. Only this column doesn't.
What could be causing this? How can I begin to debug it? I've had no luck with other Stack Overflow answers.
Figured this out. It was a version gap between where I was testing my query and where I was eventually running it.
I was testing on HiveServer2 with a Spark 2.2 backend, where get_json_object is able to parse JSON arrays. The server where I eventually ran it was on Hive 0.13, which doesn't parse JSON arrays. I wrapped the array in an object and it worked like a charm.
get_json_object(
  get_json_object(
    column3,
    concat(
      "$.",
      column1,
      "__",
      get_json_object(
        concat('{"x":', column2, '}'),
        "$.x[0]"
      )
    )
  ),
  "$.fieldofinterest"
) as field_of_interest
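A quick way to sanity-check the wrapping trick in isolation, using a throwaway array literal rather than the real data (on Hive 0.13+ a FROM clause is optional for a constant select like this):

-- Per the version note above, a path rooted at a bare array comes back NULL
-- on the older Hive, while wrapping the array in an object makes it resolvable
select
  get_json_object('["a","b"]', '$[0]')                          as bare_array,    -- NULL on Hive 0.13
  get_json_object(concat('{"x":', '["a","b"]', '}'), '$.x[0]')  as wrapped_array  -- "a"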
My Hive table looks like this:
CREATE EXTERNAL TABLE sample(id STRING,products STRUCT<urls:ARRAY<STRUCT<url:STRING>>,product_names:ARRAY<STRUCT<name:STRING>>,user:ARRAY<STRUCT<user_id:STRING>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/user/input/sample';
Is there any way to explode the products field so that the url, name, and user_id end up in three different columns?
Can anyone please suggest how to do this?
You should be able to explode your three arrays as follows:
select url, product_name, user_id
from sample
lateral view explode(products.urls) A as url
lateral view explode(products.product_names) B as product_name
lateral view explode(products.user) C as user_id;
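Note that chaining several lateral view explode calls produces a cross product once the arrays hold more than one element, and each exploded value here is still a single-field struct. If the three arrays are parallel (element i of each belongs to the same product), a hedged sketch using posexplode to line them up by index and pull out the inner fields:

-- p1/p2/p3 are element positions; keep only the aligned combinations
select u.url, n.name, usr.user_id
from sample
lateral view posexplode(products.urls) A as p1, u
lateral view posexplode(products.product_names) B as p2, n
lateral view posexplode(products.user) C as p3, usr
where p1 = p2 and p2 = p3;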
I have a requirement to construct a SQL query whose WHERE clause needs to use entries read from a file.
SELECT DISTINCT '/', t_05.puid, ',', t_01.puid, '/', t_01.poriginal_file_name
FROM PWORKSPACEOBJECT t_02, PREF_LIST_0 t_03, PPOM_APPLICATION_OBJECT t_04, PDATASET t_05, PIMANFILE t_01
WHERE t_03.pvalu_0 = t_01.puid
  AND t_02.puid = t_03.puid
  AND t_03.puid = t_04.puid
  AND t_04.puid = t_05.puid
  AND t_02.puid IN ('izeVNXjf44e$yB', 'gWYRvN9044e$yB');
The above is the SQL query. As you can see, the IN clause has two different strings (puids) to be considered. In my case, though, this list is about 50k entries long; it comes from Splunk and will be in a text file.
A sample of the text file looks as below:
'gWYRvN9044e$yB',
'DOZVpdOQ44e$yB',
'TlfVpdOQ44e$yB',
'wOWRehUc44e$yB',
'wyeRehUc44e$yB',
'w6URehUc44e$yB',
'wScRehUc44e$yB',
'yzXVNXjf44e$yB',
'guWRvN9044e$yB',
'QiYRehUc44e$yB',
'gycRvN9044e$yB'
I am not an SQL guru, but a quick Google search on this gave me a reference to the OPENROWSET construct, which is not available in Oracle.
Can you please suggest some pointers on how to work around this?
Thanks,
Pavan.
Consider using an external table, SQL*Loader, or perhaps loading the file into a table in the application layer and querying it normally.
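A minimal sketch of the external-table route, assuming a directory object pointing at the folder that holds the Splunk export (the directory, table, and file names here are hypothetical):

-- One-column external table that reads the quoted, comma-separated puids
CREATE OR REPLACE DIRECTORY splunk_dir AS '/path/to/export';

CREATE TABLE puid_list_ext (
  puid VARCHAR2(30)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY splunk_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY "'"
  )
  LOCATION ('puids.txt')
);

-- The literal IN list then becomes a subquery:
-- ... AND t_02.puid IN (SELECT puid FROM puid_list_ext);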
I would recommend creating a Global Temporary table, adding the rows to that table, and then joining to your temp table.
How to create a temporary table in Oracle
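A minimal sketch of that approach (the staging table name is hypothetical; the rest reuses the query from the question):

-- Session-scoped staging table for the ~50k puids
CREATE GLOBAL TEMPORARY TABLE puid_stage (
  puid VARCHAR2(30)
) ON COMMIT PRESERVE ROWS;

-- Load the file into puid_stage (SQL*Loader, the application, etc.), then:
SELECT DISTINCT '/', t_05.puid, ',', t_01.puid, '/', t_01.poriginal_file_name
FROM PWORKSPACEOBJECT t_02, PREF_LIST_0 t_03, PPOM_APPLICATION_OBJECT t_04, PDATASET t_05, PIMANFILE t_01
WHERE t_03.pvalu_0 = t_01.puid
  AND t_02.puid = t_03.puid
  AND t_03.puid = t_04.puid
  AND t_04.puid = t_05.puid
  AND t_02.puid IN (SELECT puid FROM puid_stage);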
Other options:
You could also use pipelined functions:
https://oracle-base.com/articles/misc/pipelined-table-functions
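For example, a hedged sketch of a pipelined function that reads the file with UTL_FILE and emits one puid per row (the directory, function, and type names are hypothetical):

CREATE OR REPLACE TYPE puid_tab AS TABLE OF VARCHAR2(30);
/
CREATE OR REPLACE FUNCTION puids_from_file(p_dir VARCHAR2, p_file VARCHAR2)
  RETURN puid_tab PIPELINED
IS
  f    UTL_FILE.FILE_TYPE;
  line VARCHAR2(4000);
BEGIN
  f := UTL_FILE.FOPEN(p_dir, p_file, 'r');
  LOOP
    BEGIN
      UTL_FILE.GET_LINE(f, line);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;
    END;
    -- strip the surrounding quotes and the trailing comma from each value
    PIPE ROW (REPLACE(REPLACE(TRIM(line), '''', ''), ',', ''));
  END LOOP;
  UTL_FILE.FCLOSE(f);
  RETURN;
END;
/
-- Usage: ... AND t_02.puid IN
--   (SELECT column_value FROM TABLE(puids_from_file('SPLUNK_DIR', 'puids.txt')));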
Or use the WITH ... AS construct to fold the data into the SQL itself, but that would create a very long SQL statement.
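A sketch of that variant with just two sample rows; for 50k entries each value needs its own UNION ALL branch, which is why the statement gets unwieldy:

WITH puid_list (puid) AS (
  SELECT 'izeVNXjf44e$yB' FROM dual UNION ALL
  SELECT 'gWYRvN9044e$yB' FROM dual
)
SELECT DISTINCT '/', t_05.puid, ',', t_01.puid, '/', t_01.poriginal_file_name
FROM PWORKSPACEOBJECT t_02, PREF_LIST_0 t_03, PPOM_APPLICATION_OBJECT t_04, PDATASET t_05, PIMANFILE t_01
WHERE t_03.pvalu_0 = t_01.puid
  AND t_02.puid = t_03.puid
  AND t_03.puid = t_04.puid
  AND t_04.puid = t_05.puid
  AND t_02.puid IN (SELECT puid FROM puid_list);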