Hive - FROM <BLAH> INSERT - hive

In the Hive documentation page about dynamic partitioning, there is this particular example of a multi-table insert:
FROM S
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03, hr=12)
SELECT key, value, ds, hr from srcpart where ds is not null and hr = 12;
What is the FROM S clause all about? The multi-table insert query looks like it should work even without it, so what am I missing?

That is the syntax for multiple inserts.If you do not want multiple passes over the Hive object,then having FROM from_statement(table,view,file,subquery) precede the multiple Insert Overwrite Table will result in a single pass over the from_statement object.
In the above example, it really should be FROM srcpart or it should FROM S srcpart which refers to Hive object S with alias srcpart from which the two partitions T & R are created.
Example
Query 1:
hive> INSERT OVERWRITE TABLE Raining SELECT * FROM weather WHERE rain = 1;
hive> INSERT OVERWRITE TABLE Sunny SELECT * FROM weather WHERE rain = 0;
Query 2:
hive> FROM weather
INSERT OVERWRITE TABLE Raining SELECT * WHERE rain = 1
INSERT OVERWRITE TABLE Sunny SELECT * WHERE rain = 0;
The query 2 will result in a single pass over table weather while the query 1 will result in 2 passes for creating the same 2 objects.

Related

`CREATE TABLE AS SELECT FROM` in Oracle Cloud doesn't create a new table

I was trying to create a series of tables in a single SQL query in Oracle Cloud under the ADMIN account. In the minimum script below, RAW_TABLE refers to an existing table.
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
This returns a view of the first 100 rows in BASE3, but it doesn't create the three tables along the way. Did I miss something or is there something peculiar about create table statements in Oracle SQL?
EDIT: The environment is Oracle Database Actions in Oracle Cloud. The three tables would not be available in the list of tables in the database, and doing something like select * from BASE3 in a subsequent query would fail.
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
Above is a valid query sequence for Oracle database. It should have been created three new tables in database. Since it's not happening please do the work in few steps to find out what's wrong.
First please check whether RAW_TABLE is available in database or not. Then try to select data from RAW_TABLE
select * from RAW_TABLE;
If all those are successful then try to create single table with below query:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
Hope you would find the problem by then.
DB-Fiddle:
Creating RAW_TABLE and populating data
create table RAW_TABLE (id int, name varchar(50));
insert into RAW_TABLE values (1,'A');
Query to create three more tables ans selecting from the last table:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
SELECT * FROM BASE3
Output:
ID
NAME
1
A
db<>fiddle here
your query fails because you are executing the whole script as one batch and each line is depends on another one , the transactional DBMS's work with blocks of code as one transaction , and that block of code doesn't commit until sql engine can parse and validate the whole block, and since in your block, BASE1 and BASE2 tables doesn't exists just yet , It fails.
so you need to run each statement as a separate batch. either by executing them one by one or in Oracle you can use / as batch separator, like in sql server you can use GO. these commands are not SQL or Oracle commands and are not sent to the database server , they are just break block of code in batches on your client ( like SQL*Plus or shell or SSMS (for Microsoft sql server), so It would look like this:
CREATE TABLE BASE1 AS SELECT * FROM RAW_TABLE;
/
CREATE TABLE BASE2 AS SELECT * FROM BASE1;
/
CREATE TABLE BASE3 AS SELECT * FROM BASE2;
/
SELECT * FROM BASE3
if your client doesn't support that then you only have to run them one by one in separate batches.

Get just Inserted/Modified record based on updated_timestamp

I want to get just Inserted/Modified record based on updated_timestamp.
I have following scenario for DB2 database:
Triggering insert or update query to DB. The table contains updated_timestamp field which capture the insert or updated time.
Want to get my previous inserted/ updated record only using select query.
Example
insert into table_name(x,y,CURRENT TIMESTAMP);
want to get the above inserted record using select as
select * from table_name where updated_timestamp > ?
with what value should I replace the ?, above query should return me latest inserted record as x,y,<time_stamp>
If I understand what your asking, couldn't you use a subquery pulling the max(updated_timestamp) and other values from the table and use that to filter to only the most recently updated records for each one?
Something like this:
insert into table_name (x, y, timestamp)
Select table_name.x, table_name.y, DateTime()
from table_name join (select x, y, Max(updated_timestamp)
updated_timestamp from table_name) table_name2
on table_name.x = table_name2.x and table_name.y = tablename2.y
and table_name.updated_timestamp = table_name2.updated_timestamp
if your db2 version as this option you can use final table like this
SELECT updated_timestamp
FROM FINAL TABLE (INSERT INTO table_name (X, X, updated_timestamp )
VALUES(valueforX, valueforY, CURRENT TIMESTAMP));
look IBM Doc
you can use a variable too:
CREATE OR REPLACE VARIABLE YOURLIB.MYTIMESTAMP TIMESTAMP DEFAULT CURRENT TIMESTAMP;
INSERT INTO table_name (X, X, updated_timestamp )
VALUES(valueforX, valueforY, YOURLIB.MYTIMESTAMP));
but the best solution is update your table with you primary key and get your timestamp with your primary key after.
A suggestion, you use trigger for update last timestamp. May be can you use autotimestamp like this :
CREATE TABLE table_name
(
X VARCHAR(36),
Y VARCHAR(36),
CHANGE_TS TIMESTAMP FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP NOT NULL
)

Embedding Current Date in Hive Insert

I am trying to get the current date into a Hive database (version 0.13 running on an HDInsight cluster) with the following script
SET curdt = from_unixtime(unix_timestamp());
DROP TABLE IF EXISTS curtime_test;
CREATE TABLE curtime_test (
dateEntered STRING
);
INSERT INTO TABLE curtime_test
SELECT '${hivevar:curdt}' FROM hivesampletable limit 3;
SELECT * FROM curtime_test;
Note that I want to have the same insert date for all the inserted records, this is a toy example, but the real one I want to use it on has millions of records to insert. This version I tried above just inserts the string '${hivevar:curdt}' into the database, which is not what I want:
${hivevar:curdt}
${hivevar:curdt}
${hivevar:curdt}
Omitting the quotes causes the insert to error out because of the spaces in the string. How can I do this right?
Update:
Using the line
SELECT ${hiveconf:curdt} FROM hivesampletable limit 3;
as per the comment from Charlie Haley (I mixed up ${hivevar} and ${hiveconf}), gives me the results that I want. If he writes it up as an answer I will mark it as right.
The following code sample works for me. Does this solve your problem?
DROP TABLE IF EXISTS curtime_test;
CREATE TABLE curtime_test (
dateEntered STRING
);
INSERT INTO TABLE curtime_test
SELECT unix_timestamp() FROM hivesampletable limit 1;
SELECT * FROM curtime_test;

insert data from one mysql table to another

i am trying to run this SQL query to copy data from one table to another but it shows zero rows returned:
INSERT INTO ticket_updates
VALUES
(
SELECT *
FROM ticket_updates2
WHERE sequence = '4715'
)
You cannot put a select statement into a values method.
Your syntax should look like this:
INSERT into ticket_updates(all,my,columns)
select all,my,columns from ticket_updates2 where sequence = '4715'

Hive insert query like SQL

I am new to hive, and want to know if there is anyway to insert data into Hive table like we do in SQL. I want to insert my data into hive like
INSERT INTO tablename VALUES (value1,value2..)
I have read that you can load the data from a file to hive table or you can import data from one table to hive table but is there any way to append the data as in SQL?
Some of the answers here are out of date as of Hive 0.14
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
It is now possible to insert using syntax such as:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));
INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
You can use the table generating function stack to insert literal values into a table.
First you need a dummy table which contains only one line. You can generate it with the help of limit.
CREATE TABLE one AS
SELECT 1 AS one
FROM any_table_in_your_database
LIMIT 1;
Now you can create a new table with literal values like this:
CREATE TABLE my_table AS
SELECT stack(3
, "row1", 1
, "row2", 2
, "row3", 3
) AS (column1, column2)
FROM one
;
The first argument of stack is the number of rows you are generating.
You can also add values to an existing table:
INSERT INTO TABLE my_table
SELECT stack(2
, "row4", 1
, "row5", 2
) AS (column1, column2)
FROM one
;
Slightly better version of the unique2 suggestion is below:
insert overwrite table target_table
select * from
(
select stack(
3, # generating new table with 3 records
'John', 80, # record_1
'Bill', 61 # record_2
'Martha', 101 # record_3
)
) s;
Which does not require the hack with using an already exiting table.
You can use below approach. With this, You don't need to create temp table OR txt/csv file for further select and load respectively.
INSERT INTO TABLE tablename SELECT value1,value2 FROM tempTable_with_atleast_one_records LIMIT 1.
Where tempTable_with_atleast_one_records is any table with atleast one record.
But problem with this approach is that If you have INSERT statement which inserts multiple rows like below one.
INSERT INTO yourTable values (1 , 'value1') , (2 , 'value2') , (3 , 'value3') ;
Then, You need to have separate INSERT hive statement for each rows. See below.
INSERT INTO TABLE yourTable SELECT 1 , 'value1' FROM tempTable_with_atleast_one_records LIMIT 1;
INSERT INTO TABLE yourTable SELECT 2 , 'value2' FROM tempTable_with_atleast_one_records LIMIT 1;
INSERT INTO TABLE yourTable SELECT 3 , 'value3' FROM tempTable_with_atleast_one_records LIMIT 1;
No. This INSERT INTO tablename VALUES (x,y,z) syntax is currently not supported in Hive.
You could definitely append data into an existing table. (But it is actually not an append at the HDFS level). It's just that whenever you do a LOAD or INSERT operation on an existing Hive table without OVERWRITE clause the new data will be put without replacing the old data. A new file will be created for this newly inserted data inside the directory corresponding to that table. For example :
I have a file named demo.txt which has 2 lines :
ABC
XYZ
Create a table and load this file into it
hive> create table demo(foo string);
hive> load data inpath '/demo.txt' into table demo;
Now,if I do a SELECT on this table it'll give me :
hive> select * from demo;
OK
ABC
XYZ
Suppose, I have one more file named demo2.txt which has :
PQR
And I do a LOAD again on this table without using overwrite,
hive> load data inpath '/demo2.txt' into table demo;
Now, if I do a SELECT now, it'll give me,
hive> select * from demo;
OK
ABC
XYZ
PQR
HTH
Ways to insert data into Hive table:
for demonstration, I am using table name as table1 and table2
create table table2 as select * from table1 where 1=1;
or
create table table2 as select * from table1;
insert overwrite table table2 select * from table1;
--it will insert data from one to another. Note: It will refresh the target.
insert into table table2 select * from table1;
--it will insert data from one to another. Note: It will append into the target.
load data local inpath 'local_path' overwrite into table table1;
--it will load data from local into the target table and also refresh the target table.
load data inpath 'hdfs_path' overwrite into table table1;
--it will load data from hdfs location iand also refresh the target table.
or
create table table2(
col1 string,
col2 string,
col3 string)
row format delimited fields terminated by ','
location 'hdfs_location';
load data local inpath 'local_path' into table table1;
--it will load data from local and also append into the target table.
load data inpath 'hdfs_path' into table table1;
--it will load data from hdfs location and also append into the target table.
insert into table2 values('aa','bb','cc');
--Lets say table2 have 3 columns only.
Multiple insertion into hive table
Yes you can insert but not as similar to SQL.
In SQL we can insert the row level data, but here you can insert by fields (columns).
During this you have to make sure target table and the query should have same datatype and same number of columns.
eg:
CREATE TABLE test(stu_name STRING,stu_id INT,stu_marks INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
INSERT OVERWRITE TABLE test SELECT lang_name, lang_id, lang_legacy_id FROM export_table;
To insert entire data of table2 in table1. Below is a query:
INSERT INTO TABLE table1 SELECT * FROM table2;
You can't do insert into to insert single record. It's not supported by Hive. You may place all new records that you want to insert in a file and load that file into a temp table in Hive. Then using insert overwrite..select command insert those rows into a new partition of your main Hive table. The constraint here is your main table will have to be pre partitioned. If you don't use partition then your whole table will be replaced with these new records.
Enter the following command to insert data into the testlog table with some condition:
INSERT INTO TABLE testlog SELECT * FROM table1 WHERE some condition;
I think in such scenarios you should be using HBASE which facilitates such kind of insertion but it does not provide any SQL kind of query language. You need you use Java API of HBASE like the put method to do such kind of insertion. Moreover HBASE is column oriented no-sql database.
You still can insert into complex type in Hive - it works
(id is int, colleagues array)
insert into emp (id,colleagues) select 11, array('Alex','Jian') from (select '1')
you can add values to specific columns as well, just specify the column names in which you like to add corresponding values:
Insert into Table (Col1, Col2, Col4,col5,Col7) Values ('Va11','Va2','Val4','Val5','Val7');
Make sure the columns you skip dont have not null value type.
There are few properties to set to make a Hive table support ACID properties and to insert the values into tables as like in SQL .
Conditions to create a ACID table in Hive.
The table should be stored as ORC file. Only ORC format can support ACID prpoperties for now.
The table must be bucketed
Properties to set to create ACID table:
set hive.support.concurrency =true;
set hive.enforce.bucketing =true;
set hive.exec.dynamic.partition.mode =nonstrict
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads= 1;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set the property hive.in.test to true in hive.site.xml
After setting all these properties , the table should be created with tblproperty 'transactional' ='true'. The table should be bucketed and saved as orc
CREATE TABLE table_name (col1 int,col2 string, col3 int) CLUSTERED BY col1 INTO 4
BUCKETS STORED AS orc tblproperties('transactional' ='true');
Now its possible to inserte values into the table like SQL query.
INSERT INTO TABLE table_name VALUES (1,'a',100),(2,'b',200),(3,'c',300);
Yes we can use Insert query in Hive.
hive> create table test (id int, name string);
INSERT: INSERT...VALUES is available starting in version 0.14.
hive> insert into table test values (1,'mytest');
This is going to work for insert. We have to use values keyword.
Note: User cannot insert data into a complex datatype column (array, map, struct, union) using the INSERT INTO...VALUES clause.