I store JSON objects in a string column. I want to convert them into a table with a proper schema.
JSON_DATA
{"id":"ksah2132","connections":{"structure":["123","456","789"]},"options":[{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}]}
{"id":"ksah3321","connections":{"structure":["567","332","435"]},"options":[{"id":"AA133","type":"optionA"},{"id":"BB156","type":"optionB"},{"id":"CC445","type":"optionC"}]}
Target table with schema:
CREATE TABLE `sandboxabc.structured_data`(`options` array<struct<id:string,type:string>>, `connections` struct<structure:array<string>>, `id` string)
How can I use Spark SQL to insert overwrite into the new table?
My code:
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
from_json (JSON_DATA,'$.options') AS options
,from_json (JSON_DATA,'$.connections') AS connections
,from_json (JSON_DATA,'$.id') AS id
FROM
sandboxabc.raw_data
Sample of output:
id       | connections                       | options
ksah2132 | {"structure":["123","456","789"]} | [{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}]
The Spark SQL code below should work for you. Note that Hive support must be enabled and the Hive-related JARs must be on the classpath.
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
    from_json(options, "array<struct<id:string,type:string>>") AS options,
    from_json(connections, "struct<structure:array<string>>") AS connections,
    id
FROM (
    SELECT
        get_json_object(JSON_DATA, '$.id') AS id,
        get_json_object(JSON_DATA, '$.connections') AS connections,
        get_json_object(JSON_DATA, '$.options') AS options
    FROM sandboxabc.raw_data
) t
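If you prefer to parse each row only once, an alternative sketch (assuming from_json is available in your Spark SQL version, i.e. Spark 2.2+) is to parse the whole document with a single schema and then project the fields:
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
    parsed.options,
    parsed.connections,
    parsed.id
FROM (
    SELECT from_json(JSON_DATA, 'struct<id:string,connections:struct<structure:array<string>>,options:array<struct<id:string,type:string>>>') AS parsed
    FROM sandboxabc.raw_data
) t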
Related
I have a table in RDBMS like so:
create table test (sno number, entry_date date default sysdate).
Now I want to create a table in Hive with the same structure, including a default value for a column.
Hive currently doesn't support assigning a default value to a column when creating a table.
As a workaround, load the data into a temporary table and use an INSERT OVERWRITE TABLE statement to add the current date and time into the main table.
Create a temporary table:
create table test (sno int);
Load data into the table:
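For example, assuming the source rows sit in a local file (the path here is purely illustrative):
load data local inpath '/tmp/test_data.csv' into table test;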
Create final table:
create table final_table (sno int, createDate string);
Finally load the data from temp test table to the final table:
insert overwrite table final_table select sno, FROM_UNIXTIME(UNIX_TIMESTAMP(), 'dd/MM/yyyy') from test;
Hive doesn't support DEFAULT fields.
That doesn't mean you can't do it, though. It's just a two-step process: create one "staging" table, then insert into a second table, selecting that "default" value as part of the query.
Adding a default value to a column while creating table in hive
Since you mention "I have a table in RDBMS", you could also use your existing table and use Sqoop to import the data into Hive.
Actually, I want to move one table to another database, but Spark doesn't permit this. So how can I copy a table with Spark SQL?
I already tried this.
SELECT *
INTO table1 IN new_database
FROM old_database.table1
But it did not work.
Maybe try:
CREATE TABLE new_db.new_table AS
SELECT *
FROM old_db.old_table;
To preserve partitioning and storage format, do the following:
Get the complete schema of the existing table by running:
show create table db.old_table
The above query will output the table's schema, which you can execute as-is after changing the table name and path.
Then insert all the rows into the new blank table using:
insert into db.new_table select * from db.old_table
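For illustration, with a hypothetical ORC table db.old_table(id int, name string), the flow looks like this:
show create table db.old_table;
-- the output is roughly: CREATE TABLE `db.old_table` (`id` int, `name` string) STORED AS ORC LOCATION '...'
-- re-run it with the new table name (and, if needed, a new location):
CREATE TABLE db.new_table (id int, name string) STORED AS ORC;
insert into db.new_table select * from db.old_table;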
The following snippet will create a new table while preserving the definition of the "old" table.
CREATE TABLE db.new_table LIKE db.old_table;
For more info, check the docs for CREATE TABLE.
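Note that CREATE TABLE ... LIKE only copies the table definition, not the data, so you still need an INSERT to copy the rows:
INSERT INTO db.new_table SELECT * FROM db.old_table;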
I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into table2. First I wanted a simple test to copy the rows. Can you create an external table from a select statement?
drop table if exists table2;
create external table table2 as
select *
from table1
STORED AS TEXTFILE LOCATION 'wasb://{container name}@{storage name}.blob.core.windows.net/';
Yes, but you have to separate it into two commands: first create the external table, then fill it.
create external table table2(attribute STRING)
STORED AS TEXTFILE
LOCATION 'table2';
INSERT OVERWRITE TABLE table2 SELECT * FROM table1;
The schema of table2 has to match the select query; in this example it consists of only one string attribute.
I know this is a stale question, but here is the solution.
CREATE EXTERNAL TABLE table2
STORED AS textfile
LOCATION 'wasb://....'
AS SELECT * FROM table1;
Since CREATE EXTERNAL TABLE with an AS SELECT clause is not supported in Hive, first we need to create the external table with a complete DDL command and then load the data into it. Please go through this for the different data formats supported.
create external table table_ext(col1 typ1,...)
STORED AS ORC
LOCATION 'table2'; -- optional; if not provided, the default location is used
INSERT OVERWRITE TABLE table_ext Select * from table1;
Make sure table_ext has the same schema as table1.
I'm currently working on a project that uses a Redshift table with 51 columns. However, the person who made the table forgot to add a sortkey to our time column which will hurt performance for our use case if we don't add it.
How can I make a version of the table with our time column as the sortkey? I'm aware that you can't make a column a sortkey if it's a member of an existing table, but I was hoping there's a way to do it that doesn't involve writing out the CREATE TABLE syntax by hand; for example, something like this would be nice:
timecube=# CREATE TABLE foo (like bar) sortkey(time);
ERROR: CREATE TABLE LIKE is not supported with DISTSTYLE, DISTKEY(), or SORTKEY() clauses
but as you can see it's not supported. Is there another way? As we're still developing, we don't need to keep any of the existing data.
Using traditional tools like pg_dump didn't work well because they don't include any of the Redshift extras like compression encodings.
Redshift supports specifying the DIST and SORT keys as part of CREATE TABLE AS statements, as per the docs.
CREATE TABLE table_name
DISTSTYLE KEY
DISTKEY ( column )
SORTKEY ( column )
AS
(SELECT *
FROM source_table)
;
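Applied to the names from the question (assuming bar is the existing table and time is the column to sort on; the quoting is a precaution since TIME is a reserved word in Redshift), this becomes:
CREATE TABLE foo
SORTKEY ("time")
AS
(SELECT *
FROM bar);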
The first step is to get the CREATE TABLE statement for the existing table. Then create the new table, this time adding the sort key.
Check the encodings of the old table (when you load data with the COPY command, Redshift automatically assigns compression encodings):
select "column", type, encoding
from pg_table_def where tablename = 'old_table'
When creating the new table, add the encoding type for each column and create the table with the sort key.
Once the new table is created, use the command below:
insert into new_table (select * from old_table order by time asc);
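Putting the steps together for a small hypothetical table (the column names and encodings below are assumptions; the sort key column is usually left as raw):
create table new_table (
    id         bigint       encode zstd,
    event_time timestamp    encode raw,
    payload    varchar(256) encode lzo
)
sortkey (event_time);

insert into new_table (select * from old_table order by event_time asc);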
I have two databases with the same schema. My DB is Postgres. I want to copy the data of any table (e.g. product) from my 1st DB into the same table in the 2nd DB.
Is it possible to do so using a query?
You can't do it as a single SQL command (at least not without dblink), but the easiest way is probably to just use a pipe between two psql processes: use COPY on both ends, one sending the data out in CSV format and the other one receiving it.
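A minimal sketch of that approach (database names are assumptions): the first statement runs in a psql session connected to the source database, the second in a session connected to the target, and a shell pipe connects the two.
-- in psql connected to the source database:
COPY product TO STDOUT WITH (FORMAT csv);
-- in psql connected to the target database:
COPY product FROM STDIN WITH (FORMAT csv);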
try
insert into db1.table1 select * from db2.table2
It's not possible in a vanilla PostgreSQL installation.
If you are able to install contrib modules, use dblink:
INSERT
INTO product
SELECT *
FROM dblink
(
'dbname=sourcedb',
'
SELECT *
FROM product
'
) AS p (id INT, column1 INT, column2 TEXT, …)
This should be run in the target database.
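Note that on PostgreSQL 9.1 and later, the dblink module is enabled per database with:
CREATE EXTENSION dblink;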