How to insert direct values into a hive table? - sql

I am new to hive. I just wanted to know how I can insert data into Hive table directly
Create table t1 ( name string)
and I want to insert a value eg name = 'John'
But I have seen so many documentation there isn't any example that inserts data directly into the table. Either I need to create a file internal or externally and add the value 'John' and load this data into the table or i can load data from another table.
My goal is to add data directly into the hive table by providing a values directly? I have provided an oracle example of a sql query I want to achieve:
INSERT INTO t1 (name)
values ('John')
I want an equivalent statement as above in Hive ?

You can use hive's table generating functions,like exlode() or stack()
Example
Table struct as (name String, age Int)
INSERT INTO TABLE target_table
SELECT STACK(
2, # Amount of record
'John', 80, # record 1
'Bill', 61 # record 2
)
FROM dual # Any table already exists
LIMIT 2; # Amount of record! Have to add this line!
That will add 2 records in your target_table.

As of latest version of Hive, insert into .. values (...)is not supported. The enhancement to insert/update/delete syntax is under development. Please look at the Implement insert, update, and delete in Hive with full ACID support

Inserting values into a table is now supported by HIVE from the version Hive 0.14.
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
More can be found at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL

Related

how to select a column with more than one values from non-existing table

How to select more than one value from a 'non-existing' table.
For example:
SELECT ('val1', 'val2') as "test"
How to make a column "test" with two values/rows val1 and val2?
To temporarily generate rows with values, PostgreSQL has following options:
VALUES Lists like VALUES (1, 'one'), (2, 'two'), (3, 'three');
WITH Queries (Common Table Expressions) common-table-expression
CREATE TABLE TEMPORARY to create a table for temporary use in this session
For small numbers of rows I would suggest the VALUES expression.
For more rows and usage in complex queries I would suggest the temporary table.
See also:
Temporary table postgresql function
A Step-by-Step Guide To PostgreSQL Temporary Table
PostgreSQL Common Table Expressions vs a temporary table? - Database Administrators Stack Exchange
you can use union two rows in table like below code:
select 'val1' as "test"
union
select 'val2' as "test"
If you use SQL Server family, you can use a query similar to this :
declare #tmpTable as table (test varchar(50))
insert into #tmpTable
select 'val' Go 100
now you can use #tmpTable

Inserting Values Into Table with Identity Column via Databricks

I've created a table in Databricks that is mapped to a table hosted in an Azure SQL DB. I'm trying to do a very simple insert statement on a small table, but an identity column is giving me issues. This table has the aforementioned identity column and three additional columns.
I first tried something similar to below:
%sql
INSERT INTO tableName (col2, col3, col4)
VALUES (1, 'Test Value', '2018-11-16')
That was giving me a syntax error, so I did some searching and learned that Hive SQL doesn't allow you to specify columns for an INSERT statement. So then I tried something like below as a test:
%sql
INSERT INTO tableName
VALUES (100, 1, 'Test Value', '2018-11-16')
That gives me an error message that I can't insert explicit values into an identity column, but that's what I expected to happen.
If I can't specify the columns for my INSERT statement, how do I avoid issues when I have an identity column? I just want to insert values for the non-identity columns, and I want the ID column to continue incrementing like normal. The above example is extremely watered-down. I will need to do much larger insertions based on SELECT statements eventually, so any solution involving toggling on IDENTITY_INSERT probably isn't feasible.
Below is how we can create a table with an identity column -
CREATE TABLE table_name
(column_name1 data_type GENERATED ALWAYS AS IDENTITY,
column_name2......)
Below are the two ways how we can insert the data into the table with the Identity column -
First way -
INSERT INTO T2 (CHARCOL2)
SELECT CHARCOL1 FROM T1;
Second way -
INSERT INTO T2 (CHARCOL2,IDENTCOL2) OVERRIDING USER VALUE
SELECT * FROM T1;
Links for reference-
Create table - https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html
Insert into table - https://www.ibm.com/docs/en/db2-for-zos/11?topic=statement-rules-inserting-data-into-identity-column

Hive insert query like SQL

I am new to hive, and want to know if there is anyway to insert data into Hive table like we do in SQL. I want to insert my data into hive like
INSERT INTO tablename VALUES (value1,value2..)
I have read that you can load the data from a file to hive table or you can import data from one table to hive table but is there any way to append the data as in SQL?
Some of the answers here are out of date as of Hive 0.14
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
It is now possible to insert using syntax such as:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2));
INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
You can use the table generating function stack to insert literal values into a table.
First you need a dummy table which contains only one line. You can generate it with the help of limit.
CREATE TABLE one AS
SELECT 1 AS one
FROM any_table_in_your_database
LIMIT 1;
Now you can create a new table with literal values like this:
CREATE TABLE my_table AS
SELECT stack(3
, "row1", 1
, "row2", 2
, "row3", 3
) AS (column1, column2)
FROM one
;
The first argument of stack is the number of rows you are generating.
You can also add values to an existing table:
INSERT INTO TABLE my_table
SELECT stack(2
, "row4", 1
, "row5", 2
) AS (column1, column2)
FROM one
;
Slightly better version of the unique2 suggestion is below:
insert overwrite table target_table
select * from
(
select stack(
3, # generating new table with 3 records
'John', 80, # record_1
'Bill', 61 # record_2
'Martha', 101 # record_3
)
) s;
Which does not require the hack with using an already exiting table.
You can use below approach. With this, You don't need to create temp table OR txt/csv file for further select and load respectively.
INSERT INTO TABLE tablename SELECT value1,value2 FROM tempTable_with_atleast_one_records LIMIT 1.
Where tempTable_with_atleast_one_records is any table with atleast one record.
But problem with this approach is that If you have INSERT statement which inserts multiple rows like below one.
INSERT INTO yourTable values (1 , 'value1') , (2 , 'value2') , (3 , 'value3') ;
Then, You need to have separate INSERT hive statement for each rows. See below.
INSERT INTO TABLE yourTable SELECT 1 , 'value1' FROM tempTable_with_atleast_one_records LIMIT 1;
INSERT INTO TABLE yourTable SELECT 2 , 'value2' FROM tempTable_with_atleast_one_records LIMIT 1;
INSERT INTO TABLE yourTable SELECT 3 , 'value3' FROM tempTable_with_atleast_one_records LIMIT 1;
No. This INSERT INTO tablename VALUES (x,y,z) syntax is currently not supported in Hive.
You could definitely append data into an existing table. (But it is actually not an append at the HDFS level). It's just that whenever you do a LOAD or INSERT operation on an existing Hive table without OVERWRITE clause the new data will be put without replacing the old data. A new file will be created for this newly inserted data inside the directory corresponding to that table. For example :
I have a file named demo.txt which has 2 lines :
ABC
XYZ
Create a table and load this file into it
hive> create table demo(foo string);
hive> load data inpath '/demo.txt' into table demo;
Now,if I do a SELECT on this table it'll give me :
hive> select * from demo;
OK
ABC
XYZ
Suppose, I have one more file named demo2.txt which has :
PQR
And I do a LOAD again on this table without using overwrite,
hive> load data inpath '/demo2.txt' into table demo;
Now, if I do a SELECT now, it'll give me,
hive> select * from demo;
OK
ABC
XYZ
PQR
HTH
Ways to insert data into Hive table:
for demonstration, I am using table name as table1 and table2
create table table2 as select * from table1 where 1=1;
or
create table table2 as select * from table1;
insert overwrite table table2 select * from table1;
--it will insert data from one to another. Note: It will refresh the target.
insert into table table2 select * from table1;
--it will insert data from one to another. Note: It will append into the target.
load data local inpath 'local_path' overwrite into table table1;
--it will load data from local into the target table and also refresh the target table.
load data inpath 'hdfs_path' overwrite into table table1;
--it will load data from hdfs location iand also refresh the target table.
or
create table table2(
col1 string,
col2 string,
col3 string)
row format delimited fields terminated by ','
location 'hdfs_location';
load data local inpath 'local_path' into table table1;
--it will load data from local and also append into the target table.
load data inpath 'hdfs_path' into table table1;
--it will load data from hdfs location and also append into the target table.
insert into table2 values('aa','bb','cc');
--Lets say table2 have 3 columns only.
Multiple insertion into hive table
Yes you can insert but not as similar to SQL.
In SQL we can insert the row level data, but here you can insert by fields (columns).
During this you have to make sure target table and the query should have same datatype and same number of columns.
eg:
CREATE TABLE test(stu_name STRING,stu_id INT,stu_marks INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
INSERT OVERWRITE TABLE test SELECT lang_name, lang_id, lang_legacy_id FROM export_table;
To insert entire data of table2 in table1. Below is a query:
INSERT INTO TABLE table1 SELECT * FROM table2;
You can't do insert into to insert single record. It's not supported by Hive. You may place all new records that you want to insert in a file and load that file into a temp table in Hive. Then using insert overwrite..select command insert those rows into a new partition of your main Hive table. The constraint here is your main table will have to be pre partitioned. If you don't use partition then your whole table will be replaced with these new records.
Enter the following command to insert data into the testlog table with some condition:
INSERT INTO TABLE testlog SELECT * FROM table1 WHERE some condition;
I think in such scenarios you should be using HBASE which facilitates such kind of insertion but it does not provide any SQL kind of query language. You need you use Java API of HBASE like the put method to do such kind of insertion. Moreover HBASE is column oriented no-sql database.
You still can insert into complex type in Hive - it works
(id is int, colleagues array)
insert into emp (id,colleagues) select 11, array('Alex','Jian') from (select '1')
you can add values to specific columns as well, just specify the column names in which you like to add corresponding values:
Insert into Table (Col1, Col2, Col4,col5,Col7) Values ('Va11','Va2','Val4','Val5','Val7');
Make sure the columns you skip dont have not null value type.
There are few properties to set to make a Hive table support ACID properties and to insert the values into tables as like in SQL .
Conditions to create a ACID table in Hive.
The table should be stored as ORC file. Only ORC format can support ACID prpoperties for now.
The table must be bucketed
Properties to set to create ACID table:
set hive.support.concurrency =true;
set hive.enforce.bucketing =true;
set hive.exec.dynamic.partition.mode =nonstrict
set hive.compactor.initiator.on = true;
set hive.compactor.worker.threads= 1;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set the property hive.in.test to true in hive.site.xml
After setting all these properties , the table should be created with tblproperty 'transactional' ='true'. The table should be bucketed and saved as orc
CREATE TABLE table_name (col1 int,col2 string, col3 int) CLUSTERED BY col1 INTO 4
BUCKETS STORED AS orc tblproperties('transactional' ='true');
Now its possible to inserte values into the table like SQL query.
INSERT INTO TABLE table_name VALUES (1,'a',100),(2,'b',200),(3,'c',300);
Yes we can use Insert query in Hive.
hive> create table test (id int, name string);
INSERT: INSERT...VALUES is available starting in version 0.14.
hive> insert into table test values (1,'mytest');
This is going to work for insert. We have to use values keyword.
Note: User cannot insert data into a complex datatype column (array, map, struct, union) using the INSERT INTO...VALUES clause.

Retrieve inserted row ID in SQL

How do I retrieve the ID of an inserted row in SQL?
Users Table:
Column | Type
--------|--------------------------------
ID | * Auto-incrementing primary key
Name |
Age |
Query Sample:
insert into users (Name, Age) values ('charuka',12)
In MySQL:
SELECT LAST_INSERT_ID();
In SQL Server:
SELECT SCOPE_IDENTITY();
In Oracle:
SELECT SEQNAME.CURRVAL FROM DUAL;
In PostgreSQL:
SELECT lastval();
(edited: lastval is any, currval requires a named sequence)
Note: lastval() returns the latest sequence value assigned by your session, independently of what is happening in other sessions.
In SQL Server, you can do (in addition to the other solutions already present):
INSERT INTO dbo.Users(Name, Age)
OUTPUT INSERTED.ID AS 'New User ID'
VALUES('charuka', 12)
The OUTPUT clause is very handy when doing inserts, updates, deletes, and you can return any of the columns - not just the auto-incremented ID column.
Read more about the OUTPUT clause in the SQL Server Books Online.
In Oracle and PostgreSQL you can do this:
INSERT INTO some_table (name, age)
VALUES
('charuka', 12)
RETURNING ID
When doing this through JDBC you can also do that in a cross-DBMS manner (without the need for RETURNING) by calling getGeneratedKeys() after running the INSERT
I had the same need and found this answer ..
This creates a record in the company table (comp), it the grabs the auto ID created on the company table and drops that into a Staff table (staff) so the 2 tables can be linked, MANY staff to ONE company. It works on my SQL 2008 DB, should work on SQL 2005 and above.
===========================
CREATE PROCEDURE [dbo].[InsertNewCompanyAndStaffDetails]
#comp_name varchar(55) = 'Big Company',
#comp_regno nchar(8) = '12345678',
#comp_email nvarchar(50) = 'no1#home.com',
#recID INT OUTPUT
-- The '#recID' is used to hold the Company auto generated ID number that we are about to grab
AS
Begin
SET NOCOUNT ON
DECLARE #tableVar TABLE (tempID INT)
-- The line above is used to create a tempory table to hold the auto generated ID number for later use. It has only one field 'tempID' and its type INT is the same as the '#recID'.
INSERT INTO comp(comp_name, comp_regno, comp_email)
OUTPUT inserted.comp_id INTO #tableVar
-- The 'OUTPUT inserted.' line above is used to grab data out of any field in the record it is creating right now. This data we want is the ID autonumber. So make sure it says the correct field name for your table, mine is 'comp_id'. This is then dropped into the tempory table we created earlier.
VALUES (#comp_name, #comp_regno, #comp_email)
SET #recID = (SELECT tempID FROM #tableVar)
-- The line above is used to search the tempory table we created earlier where the ID we need is saved. Since there is only one record in this tempory table, and only one field, it will only select the ID number you need and drop it into '#recID'. '#recID' now has the ID number you want and you can use it how you want like i have used it below.
INSERT INTO staff(Staff_comp_id)
VALUES (#recID)
End
-- So there you go. I was looking for something like this for ages, with this detailed break down, I hope this helps.

large insert in two tables. First table will feed second table with its generated Id

One question about how to t-sql program the following query:
Table 1
I insert 400.000 mobilephonenumbers in a table with two columns. The number to insert and identity id.
Table 2
The second table is called SendList. It is a list with 3columns, a identity id, a List id, and a phonenumberid.
Table 3
Is called ListInfo and contains PK list id. and info about the list.
My question is how should I using T-sql:
Insert large list with phonenumbers to table 1, insert the generated id from the insert of phonenum. in table1, to table 2. AND in a optimized way. It cant take long time, that is my problem.
Greatly appreciated if someone could guide me on this one.
Thanks
Sebastian
What version of SQL Server are you using? If you are using 2008 you can use the OUTPUT clause to insert multiple records and output all the identity records to a table variable. Then you can use this to insert to the child tables.
DECLARE #MyTableVar table(MyID int);
INSERT MyTabLe (field1, field2)
OUTPUT INSERTED.MyID
INTO #MyTableVar
select Field1, Field2 from MyOtherTable where field3 = 'test'
--Display the result set of the table variable.
Insert MyChildTable (myID,field1, field2)
Select MyID, test, getdate() from #MyTableVar
I've not tried this directly with a bulk insert, but you could always bulkinsert to a staging table and then use the processs, described above. Inserting groups of records is much much faster than one at a time.