Hive-HBase integration - issues while inserting data

I was able to successfully integrate Hive and HBase for straightforward scenarios (no partitioning or bucketing) and could insert data into both Hive and HBase in those simple cases.
I am having issues with a Hive partitioned table stored in HBase. The CREATE TABLE DDL statement executes fine, but when I try to perform an INSERT I get an error saying "Must specify table name":
CREATE TABLE hivehbase_customer(id int,holdid int,fname string,lname string,address string,zipcode string)
partitioned by (city string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:hold_id,personal_data:f_name,personal_data:l_name,personal_address:address,personal_address:zipcode")
TBLPROPERTIES ("hbase.table.name" = "hivehbase_custom", "hbase.mapred.output.outputtable" = "hivehbase_custom");
insert into table hivehbase_customer partition(city= 'tegacay') values (7394,NULL,NULL,NULL,NULL,29708);

Try the following insert query:
insert into table hivehbase_customer partition(city) values (7394,NULL,NULL,NULL,NULL,29708,'tegacay');
The partition column needs to be specified as the last column in the insert query.
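If the insert then fails with a dynamic-partition error, dynamic partitioning may need to be enabled for the session first (a hedged note; the defaults depend on the Hive version):
-- enable dynamic partitioning so partition(city) can be resolved from the last VALUES column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
With these set, the insert above should create the 'tegacay' partition dynamically.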

Related

How to copy SQL column data to new column then drop the original column?

I am following the recommendation in a Snowflake SQL forum to transform an integer column into a varchar by creating a new column. I want to drop the original integer column when I am done, but doing so always results in the new column no longer working and any future queries erroring out.
For instance, test_num is the integer column and test_num_to_char is the varchar:
alter table test_table
add test_num_to_char varchar as CAST(test_num as varchar)
then
alter table test_table
drop column test_num
select *
from test_table
results in an error message:
SQL execution internal error: Processing aborted due to error 300002:224117369
Is there a different transformation method that removes the dependency on the original integer column so I can drop it?
alter table test_table add test_num_to_char varchar(10);
update test_table set test_num_to_char = CAST(test_num as varchar);
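Because the new column is now an ordinary column populated by UPDATE rather than a computed expression, it should no longer depend on the original column, so the drop from the question can then be run (a sketch using the question's names):
-- drop the original integer column and verify the table still queries
alter table test_table drop column test_num;
select * from test_table;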
Try the TO_DECIMAL conversion function.
Its documentation is available in the Snowflake docs.
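For reference, a minimal sketch of TO_DECIMAL applied to the question's varchar column (the precision and scale here are arbitrary choices, not from the original post):
-- convert the varchar values back to a numeric type
select to_decimal(test_num_to_char, 10, 2) from test_table;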

How can I disable transactions for a Hive table?

I have a Hive table that was originally created as transactional, but I want to disable transactions on the table because they are not actually needed.
I tried to disable them using ALTER TABLE, but I got an error:
hive> ALTER TABLE foo SET TBLPROPERTIES('transactional'='false');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. TBLPROPERTIES with 'transactional'='true' cannot be unset
I am using Hive 2.3.2.
According to the documentation, setting TBLPROPERTIES ("transactional"="false") on a table that is already transactional is not allowed.
You can re-create the table.
Make a backup of the table first:
create table bkp_table as
select * from your_table;
Then drop the table, create it again without the transactional property, and reload the data from the backup.
Alternatively, create a new table, load the data from the old one, drop the old table, and rename the new one (see the sketch below).
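A sketch of that second variant, assuming a simple two-column layout for your_table (adjust the column list to match your real table):
-- create a non-transactional copy, load it, then swap the names
create table your_table_new (col string, col2 string)
tblproperties ('transactional'='false');
insert into your_table_new select * from your_table;
drop table your_table;
alter table your_table_new rename to your_table;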
You have to re-create the table.
First back up the table if you want, then DROP TABLE.
Re-create the table with TBLPROPERTIES ('transactional'='false'):
CREATE TABLE your_table(
`col` string,
`col2` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'transactional'='false'
)
You can choose whichever input and output formats suit your data.
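If you took a backup as in the previous answer, reload it after re-creating the table (a sketch, assuming the bkp_table created above):
-- reload the data from the backup, then clean up
insert into table your_table select * from bkp_table;
drop table bkp_table;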

How to truncate a partitioned external table in hive?

I'm planning to truncate a Hive external table which has one partition, so I used the following command:
hive> truncate table abc;
But it throws an error stating: Cannot truncate non-managed table abc.
Can anyone please suggest a way around this?
Make your table MANAGED first:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');
Then truncate:
truncate table abc;
And finally you can make it external again:
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');
By default, TRUNCATE TABLE is supported only on managed tables. Attempting to truncate an external table results in the following error:
Error: org.apache.spark.sql.AnalysisException: Operation not allowed: TRUNCATE TABLE on external tables
Action required: change applications so they do not run TRUNCATE TABLE on an external table.
Alternatively, set the table property external.table.purge to true to allow truncation of an external table:
ALTER TABLE mytable SET TBLPROPERTIES ('external.table.purge'='true');
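With the property set, the truncate itself should then be accepted (hedged: this relies on a Hive version that honors external.table.purge, such as Hive 3.x):
-- remove the data files of the external table
TRUNCATE TABLE mytable;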
There is an even better solution to this, which is basically a one-liner.
insert overwrite table table_xyz select * from table_xyz where 1=2;
This deletes all the files and leaves an empty file at the external table's location, with zero records.
Look at https://issues.apache.org/jira/browse/HIVE-4367: use
truncate table my_ext_table force;

Hive Table is MANAGED or EXTERNAL - issue post table type conversion

I have a hive table in XYZ db named ABC.
When I run describe formatted XYZ.ABC; from Hue, the output includes:
Table Type: MANAGED_TABLE
Table Parameters: EXTERNAL True
So is this actually an external or a managed/internal hive table?
This is treated as an EXTERNAL table: dropping it will keep the underlying HDFS data. The table type is shown as MANAGED_TABLE because the parameter EXTERNAL is set to True instead of TRUE.
To fix this metadata, you can run this query:
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Some details:
The table XYZ.ABC must have been created via this kind of query:
hive> CREATE TABLE XYZ.ABC
<additional table definition details>
TBLPROPERTIES (
'EXTERNAL'='True');
Describing this table will give:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: MANAGED_TABLE
:
Table Parameters:
EXTERNAL True
Dropping this table will keep the data at the Location shown in the describe output.
hive> drop table XYZ.ABC;
-- does not drop the table data in HDFS
The Table Type still shows as MANAGED_TABLE, which is confusing.
Setting the value of EXTERNAL to TRUE fixes this:
hive> ALTER TABLE XYZ.ABC SET TBLPROPERTIES('EXTERNAL'='TRUE');
Now, doing a describe will show it as expected:
hive> desc formatted XYZ.ABC;
:
Location: hdfs://<location_of_data>
Table Type: EXTERNAL_TABLE
:
Table Parameters:
EXTERNAL TRUE
Example:
Let's create a sample MANAGED table:
CREATE TABLE TEST_TBL(abc int, xyz string);
INSERT INTO TABLE test_tbl values(1, 'abc'),(2, 'xyz');
DESCRIBE FORMATTED test_tbl;
Changing the type to EXTERNAL (the wrong way, using True instead of TRUE):
ALTER TABLE test_tbl SET TBLPROPERTIES('EXTERNAL'='True');
Describing the table now shows EXTERNAL True in the table parameters.
Now let's DROP the table:
DROP TABLE test_tbl;
The result: the table is dropped, but the data on HDFS isn't, showing correct external table behavior.
If we re-create the table we can see the data still exists:
CREATE TABLE test_tbl(abc int, xyz string);
SELECT * FROM test_tbl;
Result: the previously inserted rows are still there.
The describe, however, still wrongly shows MANAGED_TABLE along with EXTERNAL True because of a case-sensitive .equals check in the metastore code.
Hive issue JIRA: HIVE-20057
Proposed fix: use a case-insensitive comparison.

Save big dataset from R into temp table SQL database

I have a data frame (predict_prc) with 60k rows and 2 variables (chrt_id and prc), and I need to save it into an MS SQL database.
The approach I chose: create a temp table, insert the new values, and execute the stored procedure.
I tried the code below:
sql = paste("
CREATE TABLE #t (chrt_id INT PRIMARY KEY,prc FLOAT)
INSERT INTO #t
VALUES",
paste0(sprintf("(%.2i, ", predict_prc$chrt_id), sprintf("%.2f)", predict_prc$predict_prc), collapse = ", ")
,"EXEC DM.LoadChrtPrc
")
But there are too many values to insert this way.
Then I tried the following code:
sql_create = paste("
IF (SELECT object_id('#t')) IS NOT NULL
BEGIN
DROP TABLE #t
END
CREATE TABLE #t (chrt_id FLOAT PRIMARY KEY, prc FLOAT)
")
sql_exec = paste("
EXEC DM.LoadChrtPrc
")
channel <- odbcConnect('db.w')
create <- sqlQuery(channel, sql_create)
save <- sqlSave(channel, predict_prc, tablename = '#t', fast=TRUE, append=F, rownames=FALSE)
output <- sqlQuery(channel, sql_exec)
odbcClose(channel)
But I got an error:
> save <- sqlSave(channel, predict_prc, tablename = '#t', fast=TRUE, append=F, rownames=FALSE)
Error in sqlSave(channel, predict_prc, tablename = "#t", fast = TRUE, :
42S01 2714 [Microsoft][ODBC SQL Server Driver][SQL Server]There is already an object named '#t' in the database.
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE "#t" ("chrt_id" float, "prc" float)'
If I execute save without create, I get this error:
> save <- sqlSave(channel, predict_prc, tablename = '#t1', fast=TRUE, append=F, rownames=FALSE)
Error in sqlColumns(channel, tablename) :
‘#t’: table not found on channel
Can anybody help me with this issue?
SQL Server doesn't allow more than 1000 rows in a single INSERT ... VALUES statement.
You can insert all the values by splitting them into chunks of 1000.
For every 1000 rows, build a new SQL query and run it (see the sketch below).
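A sketch of what the generated SQL should look like once it is split into chunks of at most 1000 rows each (the values shown are placeholders, not real data):
-- chunk 1: rows 1 to 1000
INSERT INTO #t (chrt_id, prc) VALUES (1, 10.50), (2, 11.25), /* ... */ (1000, 9.99);
-- chunk 2: rows 1001 to 2000
INSERT INTO #t (chrt_id, prc) VALUES (1001, 12.00), /* ... */ (2000, 8.75);
-- ... repeat until all 60k rows are loaded, then run the stored procedure
EXEC DM.LoadChrtPrc;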
To solve the problem I had to create several temporary tables and insert 1000 records per table.
So before you create the temporary table, count the number of records you are going to put into it, divide by 1000, and create temp tables accordingly.
This is a one-time solution.
If you want to automate the process, use something else.