not able to create table in kudu using impala-shell - impala

I was doing R&D on Hadoop, Hive, Impala, and Kudu, and installed the HADOOP, HIVE, IMPALA, and KUDU servers.
I have configured --kudu_master_hosts in the /etc/default/impala file, like below:
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
--kudu_master_hosts=<HOST_NAME>:<PORT>"
==============
After that I restarted the servers.
Then, using the Kudu Java client, I was able to create a table in Kudu and insert some records.
Then I mapped the same table in Impala like this:
CREATE EXTERNAL TABLE my_mapping_table
STORED AS KUDU
TBLPROPERTIES (
'kudu.table_name' = 'testT1'
);
I was successfully able to access the Kudu table in Impala and see all of the records.
Now I am trying to create a table in Kudu using impala-shell.
[<HOST_NAME>:21000] > CREATE TABLE my_first_table
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> )
> STORED AS KUDU;
But this gives an error:
ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::default.my_first_table'
CAUSED BY: NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive.
Can anyone explain what is happening, or what the solution for this error is?
I have read through the Kudu documentation but did not get any ideas.
Regards,
Akshay

This query will create the table:
CREATE TABLE emp
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1'
);
This query will work only if --kudu_master_hosts is set in the /etc/default/impala file.
Otherwise you have to give the Kudu master hosts in the table properties, i.e.:
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1',
'kudu.master_addresses' = '<HOST_NAME>:<PORT>'
);
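Putting it together, a complete statement with the master addresses given inline might look like the following sketch (the emp table is the one from the answer above; the host and port are placeholders):
-- assumes the Kudu master is reachable at <HOST_NAME>:<PORT>
CREATE TABLE emp
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '<HOST_NAME>:<PORT>',
'kudu.num_tablet_replicas' = '1'
);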

Related

presto error "Hive table '' is corrupt. Found sub-directory in bucket directory for partition"?

I have already disabled Hive transactions by setting "hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" according to the site.
But when I queried through Presto, it still showed the exception message "Hive table '' is corrupt. Found sub-directory in bucket directory for partition".
I found that if I insert into the table manually, Presto's query on this table returns OK. But if the table was populated by Flume, Presto's query on this table would fail.
My manual SQL is like below.
INSERT INTO test_flume6 PARTITION (area = 'x', business='x',
minuteTime='y') VALUES (201,'x','x');
My Presto version is 0.229, and my Hive version is hive-1.1.0-cdh5.9.0.
My Flume version is flume-ng-1.6.0-cdh5.9.0.
My Flume HiveSink configuration is like below.
# Hive Sink
agent.sinks.hive_sink.type = hive
agent.sinks.hive_sink.hive.metastore = thrift://hive:9083
agent.sinks.hive_sink.hive.database = estate
agent.sinks.hive_sink.hive.table = test_flume6
#agent.sinks.hive_sink.hive.txnsPerBatchAsk = 2
agent.sinks.hive_sink.hive.partition = asia,stock,%y-%m-%d-%H-%M
#agent.sinks.hive_sink.batchSize = 10
agent.sinks.hive_sink.serializer = DELIMITED
agent.sinks.hive_sink.serializer.delimiter = ,
agent.sinks.hive_sink.serializer.fieldnames = id,username,email
My Hive table's creation SQL is like below.
create table test_flume6(
`id` string,
username string,
email string
)
PARTITIONED BY(area string, business string, minuteTime string)
clustered by(id)
into 5 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
stored as orc
TBLPROPERTIES ('transactional'='false');

Select count(*) from table, select * from table doesn't yield any output

I am trying to build a managed table (ORC formatted, bucketed, and with the transactional table property set to true) on which I can run UPDATE/INSERT statements in Hive.
I am running this whole setup on AWS EMR; the Hive version is 2.4.3 and the default directory to store the data is S3.
I am able to populate the table from another external table.
However, I am getting zero for select count(*) and no output for select *.
I dropped the table, recreated it, and repopulated the data.
ANALYZE TABLE TABLE-NAME COMPUTE STATISTICS gives the proper output.
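For reference, a minimal sketch of the kind of table described above (ORC, bucketed, transactional) might look like this; the table and column names are made up for illustration, not taken from the question:
-- hypothetical managed table: ORC, bucketed, 'transactional' = 'true'
CREATE TABLE managed_txn_table (
id INT,
name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');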

NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive

I am trying to create a Kudu table using Impala-shell.
Query:
CREATE TABLE lol
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1'
);
CREATE TABLE t (k INT PRIMARY KEY) STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1'
);
But I am getting this error:
ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::default.t'
CAUSED BY: NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive.
Please suggest what should be done for this.
I am new to Kudu.
NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3 - this error occurs because the replication factor is not specified in the query.
In Kudu, the default replication factor is 3.
If you are running the query on a standalone cluster, only 1 tablet server (kudu tserver) is alive.
For the above query the replication factor should be 1.
You can modify the replication factor as required by setting
table_num_replicas (optional)
- the number of replicas
Query:
CREATE TABLE lol
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1',
'kudu.num_tablet_replicas' = '1'
);
In Kudu, partitioning should be specified for a large amount of data.
Query:
create table test
(
id int not null,
code string,
primary key(id)
)
partition by hash partitions 8
stored as KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1' ,
'kudu.num_tablet_replicas' = '1'
);
For more properties, refer to https://kudu.apache.org/docs/command_line_tools_reference.html
In addition to the answer above, you can also set the "Default Number of Replicas" in Kudu's configuration to 1. This way you avoid the hassle of setting it in every command you type.
You can access this configuration from Cloudera Manager --> Kudu --> Configuration,
then search for "Default Number of Replicas".
You might need to suppress the warning for this setting, because the recommended value is 3.
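With the default lowered to 1 (and --kudu_master_hosts still set in the Impala startup flags, as in the question), the original statement from the question should then work without any extra table properties, for example:
CREATE TABLE my_first_table
(
id BIGINT,
name STRING,
PRIMARY KEY(id)
)
STORED AS KUDU;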

Hive external table is unable to read already partitioned hdfs directory

I have a MapReduce job that already writes out records to HDFS using the Hive partition naming convention,
e.g.
/user/test/generated/code=1/channel=A
/user/test/generated/code=1/channel=B
After I create an external table, it does not see the partition.
create external table test_1 ( id string, name string )
partitioned by (code string, channel string)
STORED AS PARQUET
LOCATION '/user/test/generated'
Even with the alter command
alter table test_1 ADD PARTITION (code = '1', channel = 'A')
it does not see the partition or records, because
select * from test_1 limit 1
produces 0 results.
If I use an empty location when I create the external table, and then use
load data inpath ...
then it works. But the issue is that there are too many partitions for load data inpath to work.
Is there a way to make Hive recognize the partitions automatically (without doing an insert query)?
Using MSCK, it seems to work, but I had to exit the Hive session and connect again.
MSCK REPAIR TABLE test_1
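As a quick check after the repair (using the table name from the question), the partitions and data should now be visible:
SHOW PARTITIONS test_1;
SELECT * FROM test_1 LIMIT 1;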

Create External Hive Table Pointing to HBase Table

I have a table named "HISTORY" in HBase with column family "VDS" and the columns ROWKEY, ID, START_TIME, END_TIME, and VALUE. I am using the Cloudera Hadoop distribution. I want to provide an SQL interface to the HBase table using Impala. In order to do this, do we have to create a corresponding external table in Hive? If so, how do I create an external Hive table pointing to this HBase table?
Run the following code in Hive Query Editor:
CREATE EXTERNAL TABLE IF NOT EXISTS HISTORY
(
ROWKEY STRING,
ID STRING,
START_TIME STRING,
END_TIME STRING,
VALUE DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
(
"hbase.columns.mapping" = ":key,VDS:ID,VDS:START_TIME,VDS:END_TIME,VDS:VALUE"
)
TBLPROPERTIES("hbase.table.name" = "HISTORY");
Don't forget to refresh the Impala metadata after creating the external table, with the following bash command:
echo "INVALIDATE METADATA" | impala-shell;