INSERT in table Hive - hive

I have created the following table in hive:
hive> CREATE TABLE IF NOT EXISTS Sensorreading ( recvtime String, nodeid int, sensorid int, systemid int, value float);
OK
Time taken: 3.007 seconds
hive> describe Sensorreading;
OK
recvtime string
nodeid int
sensorid int
systemid int
value float
Time taken: 0.381 seconds
hive>
And now I need to insert data into it. I have tried this, but it doesn't work:
INSERT INTO TABLE Sensorreading (recvtime, nodeid, sensorid, systemid, value) VALUES ('2015-05-29 11:10:00',1,1,1,-45.4);
What is the correct syntax for INSERT? Thanks

INSERT...VALUES is available starting in Hive 0.14.
Check if your Hive version is 0.14 or later.
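For reference, a minimal sketch of the statement on Hive 0.14+ (some Hive versions also reject the column list in INSERT ... VALUES, so it is omitted here; the values are the ones from the question):
INSERT INTO TABLE Sensorreading VALUES ('2015-05-29 11:10:00', 1, 1, 1, -45.4);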

Insert is possible in Hive 0.14 and later. If you need to insert rows on an older version, there are two manual workarounds (no particular command):
1. Load the data from a text file (edit the file to include your rows, then load it, as shown in the sketch after this list).
2. Copy the part file from the table's HDFS directory to local, make the changes, and then move it back to the original path.
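A sketch of the first workaround (the file path is just an example, and the file's field delimiter must match the table's row format, which defaults to \001 (Ctrl-A) for this table):
LOAD DATA LOCAL INPATH '/tmp/sensorreading.txt' INTO TABLE Sensorreading;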


Impala insert vs hive insert

When I try to insert integer values into a column of a Parquet table with a Hive command, the values are not inserted and show up as null. When I use an Impala command instead, it works, but the partition size is smaller with the Impala insert. Also, the number of rows in the partitions (show partitions) shows as -1. What is the reason for this?
CREATE TABLE `TEST.LOGS`(
`recordtype` string,
`recordstatus` string,
`recordnumber` string,
`starttime` string,
`endtime` string,
`acctsessionid` string,
`subscriberid` string,
`framedip` string,
`servicename` string,
`totalbytes` int,
`rxbytes` int,
`txbytes` int,
`time` int,
`plan` string,
`tcpudp` string,
`intport` string)
PARTITIONED BY (`ymd` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://dev-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
TBLPROPERTIES (
'transient_lastDdlTime'='1634390569')
Insert Statement
Hive
sudo -u hdfs hive -e 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
Impala
impala-shell --ssl -i xxxxxxxxxxx:21000 -q 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
When I try to insert integer values into a column of a Parquet table with a Hive command, the values are not inserted and show up as null.
Could you please share your exact insert statement and table definition for a precise answer? If I had to guess, this may be because of a difference in implicit data type conversion between Hive and Impala.
HIVE - If you set hive.metastore.disallow.incompatible.col.type.changes to false, the types of columns in the Metastore can be changed from any type to any other type. After such a type change, if the data can be shown correctly with the new type, the data will be displayed. Otherwise, the data will be displayed as NULL. Per the documentation, a widening conversion works (int > bigint) whereas a narrowing one (bigint > smallint) does not and produces NULL.
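For illustration, the setting mentioned above could be relaxed like this (it is normally configured in hive-site.xml; shown here as a session-level override, and whether you want it depends on your schema-evolution policy):
set hive.metastore.disallow.incompatible.col.type.changes=false;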
Impala - it supports a limited set of implicit casts to avoid undesired results from unexpected casting behavior. Impala does perform implicit casts among the numeric types, when going from a smaller or less precise type to a larger or more precise one. For example, Impala will implicitly convert a SMALLINT to a BIGINT.
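If a type mismatch is the cause, one way to make both engines behave the same is to cast explicitly in the INSERT ... SELECT. A sketch, assuming TEMP.LOGS has the same column names and order as TEST.LOGS:
insert into table TEST.LOGS partition (ymd="20220221")
select `recordtype`, `recordstatus`, `recordnumber`, `starttime`, `endtime`,
       `acctsessionid`, `subscriberid`, `framedip`, `servicename`,
       cast(`totalbytes` as int), cast(`rxbytes` as int), cast(`txbytes` as int),
       cast(`time` as int), `plan`, `tcpudp`, `intport`
from TEMP.LOGS;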
Also, the number of rows in the partitions (show partitions) shows as -1 -
Please run compute stats table_name to fix this issue.
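For example, from impala-shell (using the table from the question):
compute stats TEST.LOGS;
-- or, on a recent Impala version, for a single partition:
compute incremental stats TEST.LOGS partition (ymd="20220221");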

String is too long and would be truncated

Query:
CREATE TABLE SRC(SRC_STRING VARCHAR(20))
CREATE OR REPLACE TABLE TGT(tgt_STRING VARCHAR(10))
INSERT INTO SRC VALUES('JKNHJYGHTFGRTYGHJ')
INSERT INTO TGT(TGT_STRING) SELECT SRC_STRING::VARCHAR(10) FROM SRC
Error: String 'JKNHJYGHTFGRTYGHJ' is too long and would be truncated
Is there any way to enable enforced length (not for the COPY command) while inserting data from a higher-precision column into a lower-precision column?
I'd recommend using the SUBSTR() function to pick the piece of data you want. An example follows where I take the first 10 characters (if available; if there were only 5, it would use those 5 characters).
CREATE OR REPLACE TEMPORARY TABLE SRC(
src_string VARCHAR(20));
CREATE OR REPLACE TEMPORARY TABLE TGT(
tgt_STRING VARCHAR(10));
INSERT INTO src
VALUES('JKNHJYGHTFGRTYGHJ');
INSERT INTO tgt(tgt_string)
SELECT SUBSTR(src_string, 1, 10)
FROM SRC;
SELECT * FROM tgt; --JKNHJYGHTF
Here's the documentation on the function:
https://docs.snowflake.com/en/sql-reference/functions/substr.html

Hive record inserted but then I get an error

I create a table in hive:
CREATE TABLE `test3`.`shop_dim` (
`shop_id` bigint,
`shop_name` string,
`shop_company_id` bigint,
`shop_url1` string,
`shop_url2` string,
`sid` string,
`shop_open_duration` string,
`date_modified` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ("path"="hdfs://myhdfs/warehouse/tablespace/managed/hive/test3.db/shop_dim")
STORED AS PARQUET
TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"date_modified\":\"true\",\"shop_company_id\":\"true\",\"shop_id\":\"true\",\"shop_name\":\"true\",\"shop_open_duration\":\"true\",\"shop_url1\":\"true\",\"shop_url2\":\"true\",\"sid\":\"true\"}}', 'bucketing_version'='2', 'numFiles'='12', 'numRows'='12', 'rawDataSize'='96', 'spark.sql.create.version'='2.3.0', 'spark.sql.sources.provider'='parquet', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"Shop_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_company_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_open_duration\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Date_modified\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}}]}', 'totalSize'='17168')
GO
then I insert a record use below sql:
insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',unix_timestamp())
I can see the record is inserted, but after waiting a long time, this error appears:
>[Error] Script lines: 1-2 --------------------------
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
[Executed: 2018-10-24 12:00:03 PM] [Execution: 0ms]
I use aqua studio as the tool. Why does this error occur?
This issue can happen if the values being inserted do not match the expected types.
In your case, the "date_modified" column is of timestamp type, but unix_timestamp() returns a bigint (the current Unix timestamp in seconds).
If you execute the query
select unix_timestamp();
Output would be like : 1558547043
Instead, you need to use current_timestamp.
select current_timestamp;
Output would be like : 2019-05-22 17:50:18.803
You can refer Hive manual for in-built date functions at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
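So, assuming your Hive version allows function calls in the VALUES clause (the original insert with unix_timestamp() suggests it does), the statement from the question would become:
insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',current_timestamp);
Otherwise, rewrite it as an insert ... select that produces the same row.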
The Hive settings given below can help resolve org.apache.hadoop.hive.ql.exec.StatsTask (state=08S01,code=1):
set hive.stats.column.autogather=false; or set hive.stats.autogather=false;
set hive.optimize.sort.dynamic.partition=true;

Insert values in Hive tables with primitive and complex data type

Suppose I have only one table, such as student, with the following table definition and schema:
hive>
create table student1(S_Id int,
> S_name Varchar(100),
> Address Struct<a:int, b:String, c:int>,
> marks Map<String, Int>);
OK
Time taken: 0.439 seconds
hive>
hive> Describe Student1;
OK
s_id int
s_name varchar(100)
address struct<a:int,b:string,c:int>
marks map<string,int>
Time taken: 0.112 seconds, Fetched: 4 row(s)
Now I am trying to insert values into that Student1 table like this:
hive> insert into table student1 values(1, 'Afzal', Struct(42, 'nelson Ave NY', 08309),MAP("MATH", 89));
I am getting this error:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
How do I insert values for one record in one go? Can anyone please help me?
It works when using an insert ... select statement. Create a dummy table with a single row, or use some existing table and add limit 1. Also use the named_struct function:
Demo:
hive> insert into table student1
select 1 s_id,
'Afzal' s_name,
named_struct('a',42, 'b','nelson Ave NY', 'c',08309) address,
MAP('MATH', 89) marks
from default.dual limit 1; --this is dummy table
Loading data to table dev.student1
Table dev.student1 stats: [numFiles=1, numRows=1, totalSize=48, rawDataSize=37]
OK
Time taken: 27.175 seconds
Check data:
hive> select * from student1;
OK
1 Afzal {"a":42,"b":"nelson Ave NY","c":8309} {"MATH":89}
Time taken: 0.125 seconds, Fetched: 1 row(s)
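If you don't already have a one-row table to play the role of default.dual in the demo above, a minimal sketch of creating one (the table and column names are only illustrative; any single-row table works):
create table default.dual (dummy string);
insert into table default.dual values ('x');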

Convert table column data type from image to varbinary

I have a table like:
create table tbl (
id int,
data image
)
It's found that the data column holds only very small values, which could be stored in varbinary(200).
So the new table would be:
create table tbl (
id int,
data varbinary(200)
)
How can I migrate this table to the new design without losing the data in it?
Just do two separate ALTER TABLEs, since you can only convert image to varbinary(max), but you can, afterwards, change its length:
create table tbl (
id int,
data image
)
go
insert into tbl(id,data) values
(1,0x0101010101),
(2,0x0204081632)
go
alter table tbl alter column data varbinary(max)
go
alter table tbl alter column data varbinary(200)
go
select * from tbl
Result:
id data
----------- ---------------
1 0x0101010101
2 0x0204081632
You can use this ALTER statement to convert the existing IMAGE column to VARBINARY(MAX):
ALTER Table tbl ALTER COLUMN DATA VARBINARY(MAX)
After this conversion, you will surely get your data back out.
NOTE: Don't forget to take a backup before execution.
The IMAGE data type is deprecated and will be removed in a future version of SQL Server, so it should be converted to VARBINARY(MAX) wherever possible.
How about you create a NewTable with the varbinary, then copy the data from the OldTable into it?
INSERT INTO [dbo].[NewTable] ([id], [data])
SELECT [id], [data] FROM [dbo].[OldTable]
First of all from BOL:
image: Variable-length binary data from 0 through 2^31-1
(2,147,483,647) bytes.
The image data type is essentially an alias for varbinary (2GB), so converting it to a varbinary(max) should not lead to data loss.
But to be sure:
1. Back up your existing data.
2. Add a new field (varbinary(max)).
3. Copy the data from the old field to the new field.
4. Swap the fields with sp_rename (a sketch follows after this list).
5. Test.
6. After a successful test, drop the old column.
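A sketch of those steps using the table and column names from the question (data_new and data_old are illustrative names):
alter table tbl add data_new varbinary(max)
go
update tbl set data_new = cast(data as varbinary(max))
go
exec sp_rename 'tbl.data', 'data_old', 'COLUMN'
exec sp_rename 'tbl.data_new', 'data', 'COLUMN'
go
-- after a successful test:
alter table tbl drop column data_old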