NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive - impala

I am trying to create a Kudu table using Impala-shell.
Query:
CREATE TABLE lol
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1'
);
CREATE TABLE t (k INT PRIMARY KEY) STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1'
);
But I am getting error:
ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::default.t'
CAUSED BY: NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive.
Please suggest what should be done for this.
I am new to Kudu.

NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. This error occurs because the query does not specify a replication factor, and in Kudu the default replication factor is 3.
If you are running a standalone (single-node) cluster, only 1 tablet server (kudu tserver) is alive, so for the above query the replication factor should be 1.
You can modify the replication factor as required by setting the optional number-of-replicas property on the table (kudu.num_tablet_replicas in the query below).
Query:
CREATE TABLE lol
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1',
'kudu.num_tablet_replicas' = '1'
);
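To double-check that the replica setting was picked up, one option is to look at the DDL that Impala generated (a minimal sketch; it assumes the lol table above was created, and that Impala lists the Kudu table properties in its output):
-- Inspect the generated DDL and its TBLPROPERTIES
SHOW CREATE TABLE lol;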
In Kudu, for a large amount of data, partitioning should be specified.
Query:
create table test
(
id int not null,
code string,
primary key(id)
)
partition by hash partitions 8
stored as KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '127.0.0.1' ,
'kudu.num_tablet_replicas' = '1'
);
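As a quick sanity check that the single-replica, hash-partitioned table is usable, a minimal sketch assuming the test table above exists:
-- Write one row through Impala and read it back
INSERT INTO test VALUES (1, 'abc');
SELECT id, code FROM test;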
For setting more properties, refer to https://kudu.apache.org/docs/command_line_tools_reference.html

In addition to the answer above, you can also set the "Default Number of Replicas" in Kudu's configuration to 1. This way you avoid the hassle of setting it in every command you type.
You can access this configuration from Cloudera Manager --> Kudu --> Configuration,
then search for "Default Number of Replicas".
You might need to suppress the warning for this setting, because the recommended value is 3.

Related

PostgreSQL query executes much slower on the server

I have a table like this which stores file operation logs on a computer:
CREATE TABLE tbl_logs_file_ops(
log_id bigserial primary key,
username VARCHAR(255),
computer VARCHAR(255),
activity VARCHAR(255),
is_directory bool,
event_type VARCHAR(255),
src text,
dst text,
event_time timestamp(6));
And I run the following query on it to retrieve the names of the files that are the top 10 most used:
SELECT src, COUNT(src)
FROM tbl_logs_file_ops
GROUP BY src
ORDER BY COUNT(src) DESC
NULLS LAST
LIMIT 10
When I execute this query from pgadmin4, which runs on the same local network as the server where the database is located, it takes about 3 seconds. When I execute the query through an API deployed on the same server as the database, it takes about a minute. I initially thought this was a problem with the API, but then I executed the same query on the server using psql and it was just as slow.
I've tried disabling SSL in the postgresql.conf file, as some other posts suggested, but nothing changed.
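Since the same query behaves so differently from different clients, it may help to capture the actual plan and timing from both pgadmin4 and psql before tuning anything. A minimal sketch, assuming the table and column names above:
-- Run from both clients and compare total execution time and the plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT src, COUNT(src)
FROM tbl_logs_file_ops
GROUP BY src
ORDER BY COUNT(src) DESC NULLS LAST
LIMIT 10;
If the reported execution times match, the difference is likely in how each client fetches or renders the result set rather than in the query itself.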

Greenplum's FDW tool duplicates data many times

I have a Greenplum database, version 6.14.1, running on a CentOS 7.2 host.
I am trying to copy data from Postgres 11 to Greenplum 6.14 via a Foreign Data Wrapper.
With the default options I receive N rows, and all data comes through the master node.
So I decided to change the option to (mpp_execute 'all segments'),
but in this case I receive 24*N rows, because my cluster has 24 segment nodes.
I think this is a well-known issue, but unfortunately I can't find a solution at all.
Steps to reproduce the behavior:
On Postgres server
create table x(id int, value float8);
insert into x select r, r * random() from generate_series(1,1000) r;
select count(*) from x;
1000
(1 row)
On Greenplum server
CREATE EXTENSION postgres_fdw;
create server foreign_server_x FOREIGN DATA WRAPPER postgres_fdw
OPTIONS(host '172.16.128.135', port '5432', dbname 'postgres');
-- user mapping
CREATE USER MAPPING FOR current_user
SERVER foreign_server_x OPTIONS (user 'admin', password 'admin');
-- foreign table foreign_x
CREATE FOREIGN TABLE foreign_x
(id int, value float8) SERVER foreign_server_x OPTIONS (schema_name 'public', table_name 'x');
select count(*) from foreign_x;
1000
(1 row)
-- mpp_execute = all segments
alter foreign table foreign_x options (add mpp_execute 'all segments');
-- foreign_x (24 segments)
select count(*) from foreign_x;
24000
(1 row)
This would be expected behavior, since you have 24 segments and you are asking all of them to go query the database. I would suggest executing only from the master, selecting a distinct count(*), or leveraging an external table instead of the FDW.
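If the goal is just to get correct row counts again, one option (a minimal sketch, using the foreign_x table from the question) is to revert the option so only the master runs the foreign scan:
-- Execute the foreign scan from the master only
ALTER FOREIGN TABLE foreign_x OPTIONS (SET mpp_execute 'master');
-- or drop the option and fall back to the default behavior:
-- ALTER FOREIGN TABLE foreign_x OPTIONS (DROP mpp_execute);
SELECT count(*) FROM foreign_x;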

Can't delete from table after switch from logical to streaming replication

On my DEV server I tested logical replication and returned to streaming replication after that.
Now wal_level = replica and I have two slaves:
pid |state |application_name |client_addr|write_lag |flush_lag |replay_lag |sync_priority|sync_state|
-----|----------|--------------------|-----------|---------------|---------------|---------------|-------------|----------|
12811|streaming |db-slave1 |*.*.*.* |00:00:00.000569|00:00:00.001914|00:00:00.001932| 0|async |
25978|streaming |db-slave2 |*.*.*.* |00:00:00.000568|00:00:00.001913|00:00:00.001931| 0|async |
Now I created a new table and inserted one record. For example:
create table test_delete (
id int
);
insert into test_delete values (1);
delete from test_delete where id = 1;
The table was created and replicated to both slaves, but the delete query failed with this error:
SQL Error [55000]: ERROR: cannot delete from table "test_delete" because it does not have a replica identity and publishes deletes
Hint: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.
So I need help restoring the state from before the switch to logical replication, and the ability to delete from tables.
After some investigation I found a solution. Despite the fact that wal_level was changed in postgresql.conf, all the tables still appear in pg_publication_tables.
So, to check the publication status, I used:
select * from pg_publication_tables;
and to remove the records:
drop publication <publication_name>;
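If dropping the publication is not an option, the hint in the error message suggests an alternative: give the table a replica identity so deletes can be published. A minimal sketch, assuming the test_delete table above (which has no primary key):
-- Use the whole row as the replica identity, then retry the delete
ALTER TABLE test_delete REPLICA IDENTITY FULL;
DELETE FROM test_delete WHERE id = 1;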

presto error "Hive table '' is corrupt. Found sub-directory in bucket directory for partition"?

I have already disabled Hive transactions by setting "hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager" according to the site.
But when I query with Presto, it still shows an exception message like "Hive table '' is corrupt. Found sub-directory in bucket directory for partition".
I found that if I insert into the table manually, Presto's query on this table returns OK. But if the table was written by Flume, Presto's query on this table fails.
My manual SQL is like below.
INSERT INTO test_flume6 PARTITION (area = 'x', business='x',
minuteTime='y') VALUES (201,'x','x');
My Presto version is 0.229, my Hive version is hive-1.1.0-cdh5.9.0,
and my Flume version is flume-ng-1.6.0-cdh5.9.0.
My Flume Hive sink configuration is like below.
# Hive Sink
agent.sinks.hive_sink.type = hive
agent.sinks.hive_sink.hive.metastore = thrift://hive:9083
agent.sinks.hive_sink.hive.database = estate
agent.sinks.hive_sink.hive.table = test_flume6
#agent.sinks.hive_sink.hive.txnsPerBatchAsk = 2
agent.sinks.hive_sink.hive.partition = asia,stock,%y-%m-%d-%H-%M
#agent.sinks.hive_sink.batchSize = 10
agent.sinks.hive_sink.serializer = DELIMITED
agent.sinks.hive_sink.serializer.delimiter = ,
agent.sinks.hive_sink.serializer.fieldnames = id,username,email
My Hive table's creation SQL is like below.
create table test_flume6(
`id` string,
username string,
email string
)
PARTITIONED BY(area string, business string, minuteTime string)
clustered by(id)
into 5 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
stored as orc
TBLPROPERTIES ('transactional'='false');

not able to create table in kudu using impala-shell

I was doing R&D on Hadoop, Hive, Impala, and Kudu, and installed the HADOOP, HIVE, IMPALA, and KUDU servers.
I have configured --kudu_master_hosts=: in the /etc/default/impala file, i.e. like below:
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}\
--kudu_master_hosts=<HOST_NAME>:<PORT>"
==============
After that I restarted the servers.
Then, using the Kudu Java client, I was able to create a table in Kudu and insert some records.
Then I mapped the same table in Impala by doing this:
CREATE EXTERNAL TABLE my_mapping_table
STORED AS KUDU
TBLPROPERTIES (
'kudu.table_name' = 'testT1'
);
I was successfully able to access the Kudu table in Impala and see all the records.
Now I am trying to create a table in Kudu using impala-shell.
[<HOST_NAME>:21000] > CREATE TABLE my_first_table
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> )
> STORED AS KUDU;
But this is giving an error:
ERROR: ImpalaRuntimeException: Error creating Kudu table 'impala::default.my_first_table'
CAUSED BY: NonRecoverableException: Not enough live tablet servers to create a table with the requested replication factor 3. 1 tablet servers are alive.
Can anyone explain what is happening, or what the solution for this error is?
I have read through the Kudu documentation but am not getting any ideas.
Regards,
Akshay
This query will help to create the table:
CREATE TABLE emp
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1'
);
This query will work only if --kudu_master_hosts is set in the /etc/default/impala file.
Otherwise you have to give the Kudu master addresses in the table properties, i.e.:
TBLPROPERTIES (
'kudu.num_tablet_replicas' = '1',
'kudu.master_addresses' = '<HOST_NAME>:<PORT>'
);
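Putting it together, a complete statement with both properties might look like this (a minimal sketch; <HOST_NAME>:<PORT> stands for your Kudu master address):
CREATE TABLE emp
(
uname STRING,
age INTEGER,
PRIMARY KEY(uname)
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = '<HOST_NAME>:<PORT>',
'kudu.num_tablet_replicas' = '1'
);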