ClickHouse connection to Hive with HDFS HA fails - hive

When I run:
SELECT * FROM hive('thrift://xxx:9083', 'ods_qn', 'ods_crm_prod_on_off_line_msg_es_df', 'bizid Nullable(String), corpid Nullable(Int32),time Nullable(Int64),reasontype Nullable(Int32),weworkid Nullable(Int64), type Nullable(Int8),pt String', 'pt');
I get:
Received exception from server (version 22.3.2):
Code: 210. DB::Exception: Received from localhost:9000. DB::Exception: Unable to connect to HDFS: InvalidParameter: Cannot create namenode proxy, does not contain host or port. (NETWORK_ERROR)
PS:
My HDFS runs in HA mode.
This is the HDFS-related setting in my ClickHouse config.xml:
<libhdfs3_conf>/etc/clickhouse-server/hdfs-client.xml</libhdfs3_conf>
How can I fix this? Thank you.
PS:
When I use the HDFS table engine directly:
CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://nn1:8020/testck/other_test', 'TSV')
INSERT INTO hdfs_engine_table VALUES ('one', 1), ('two', 2), ('three', 3)
select * from hdfs_engine_table;
SELECT *
FROM hdfs_engine_table
Query id: f736cbf4-09e5-4a0f-91b4-4d869b78e6e7
┌─name──┬─value─┐
│ one   │     1 │
│ two   │     2 │
│ three │     3 │
└───────┴───────┘
it works fine.
But when I use the hive() table function, I get the error above.
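For reference, libhdfs3 can only resolve an HA nameservice if the hdfs-client.xml referenced above defines it. A minimal sketch of such a file (the nameservice name ns1 and the namenode hosts/ports below are placeholders for your own cluster's values, usually copied from hdfs-site.xml):
<configuration>
    <!-- placeholder nameservice name; it must match the one used by your cluster -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>namenode1.example.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>namenode2.example.com:8020</value>
    </property>
</configuration>
With something like this in place, an HA path is addressed by the nameservice (hdfs://ns1/...) rather than a single host and port, which is likely what the "does not contain host or port" error is complaining about.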

Related

Trying to import a CSV file into a table in SQL

I have 4 CSV files, each with 500,000 rows. I am trying to import the CSV data into my Exasol database, but there is an error with the date column, and I have a problem with the first, unwanted column in the files.
Here is an example CSV file:
unnamed:0 , time, lat, lon, nobs_cloud_day
0, 2006-03-30, 24.125, -119.375, 22.0
1, 2006-03-30, 24.125, -119.125, 25.0
The table I created to import the CSV into is
CREATE TABLE cloud_coverage_CONUS (
index_cloud DECIMAL(10,0)
,"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
The command to import is
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv';
But I get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3050: [Column=0 Row=0] [Transformation of value='Unnamed: 0' failed - invalid character value for cast; Value: 'Unnamed: 0'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:59205' FILE 'e12a96a6-a98f-4c0a-963a-e5dad7319fd5' ;'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
Alternatively I use this table (without the first column):
CREATE TABLE cloud_coverage_CONUS (
"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
And use this import code:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'(2 FORMAT='YYYY-MM-DD', 3 .. 5);
But I still get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3052: [Column=0 Row=0] [Transformation of value='time' failed - invalid value for YYYY format token; Value: 'time' Format: 'YYYY-MM-DD'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:60350' FILE '22c64219-cd10-4c35-9e81-018d20146222' (2 FORMAT='YYYY-MM-DD', 3 .. 5);'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
(I actually do want to ignore the first column in the files.)
How can I solve this issue?
Solution:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv' (2 .. 5) ROW SEPARATOR = 'CRLF' COLUMN SEPARATOR = ',' SKIP = 1;
I did not realise that MySQL syntax is different from Exasol's.
Looking at the first error message, a few things stand out. First we see this:
[Column=0 Row=0]
This tells us the problem is with the very first value in the file. This brings us to the next thing, where the message even tells us what value was read:
Transformation of value='Unnamed: 0' failed
So it's failing to convert Unnamed: 0. You also provided the table definition, where we see the first column in the table is a decimal type.
This makes sense. Unnamed: 0 is not a decimal. For this to work, the CSV data MUST align with the data types for the columns in the table.
But we also see that this looks like a header row. Assuming everything else matches, we can fix it by telling the database to skip this first row. I'm not familiar with Exasol, but according to the documentation, I believe the correct code will look like this:
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1;

How do I update a JSON field in PostgreSQL?

I have a table with a column called data that contains some JSON. If the data column for any given row in the table is not null, it will contain a JSON-encoded object with a key called companyDescription. The value associated with companyDescription is an arbitrary JavaScript object.
If I query my table like this
select data->>'companyDescription' from companies where data is not null;
I get rows like this
{"ops":[{"insert":"\n"}]}
I am trying to update all rows in the table so that the companyDescription values will be wrapped in another JSON-encoded JavaScript object in the following manner:
{"type":"quill","content":{"ops":[{"insert":"\n"}]}}
Here's what I have tried, but I think it won't work because the ->> operator is for selecting some JSON field as text, and indeed it fails with a syntax error.
update companies
set data->>'companyDescription' = CONCAT(
'{"type":"quill","content":',
(select data->>'companyDescription' from companies),
'}'
);
What is the correct way to do this?
You can use the function jsonb_set. Currently, XML and JSON values are immutable: you cannot update parts of these values in place, but you can replace the whole value with a new, modified one.
postgres=# select * from test;
┌──────────────────────────────────────────────────────────────────────┐
│                                  v                                   │
╞══════════════════════════════════════════════════════════════════════╡
│ {"companyId": 10, "companyDescription": {"ops": [{"insert": "\n"}]}} │
└──────────────────────────────────────────────────────────────────────┘
(1 row)
postgres=# select jsonb_build_object('type', 'quill', 'content', v->'companyDescription') from test;
┌───────────────────────────────────────────────────────────┐
│                    jsonb_build_object                     │
╞═══════════════════════════════════════════════════════════╡
│ {"type": "quill", "content": {"ops": [{"insert": "\n"}]}} │
└───────────────────────────────────────────────────────────┘
(1 row)
postgres=# select jsonb_set(v, ARRAY['companyDescription'], jsonb_build_object('type', 'quill', 'content', v->'companyDescription')) from test;
┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              jsonb_set                                              │
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ {"companyId": 10, "companyDescription": {"type": "quill", "content": {"ops": [{"insert": "\n"}]}}} │
└────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)
So your final statement can look like this:
update companies
set data = jsonb_set(data::jsonb,
                     ARRAY['companyDescription'],
                     jsonb_build_object('type', 'quill',
                                        'content', data->'companyDescription'))
where data is not null;
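If you want to sanity-check the transformation before touching any rows, you can run the same expression as a plain SELECT first; a small sketch against the companies table from the question:
select jsonb_set(data::jsonb,
                 ARRAY['companyDescription'],
                 jsonb_build_object('type', 'quill',
                                    'content', data->'companyDescription')) as new_data
from companies
where data is not null;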

splitByChar Nullable

I have the following query:
SELECT DISTINCT col_name, toTypeName(col_name)
FROM remote('host_name', 'db.table', 'user', 'password')
The result is 6 records (WITHOUT NULLs). Example:
some_prefix-1, Nullable(String)
...
some_prefix-6, Nullable(String)
Now I try splitByChar, but I'm getting:
Code: 43, e.displayText() = DB::Exception: Nested type Array(String)
cannot be inside Nullable type (version 20.1.2.4 (official build))
I tried adding a NOT NULL condition and converting the type, but the problem remains. Like this:
SELECT DISTINCT toString(col_name) AS col_name_str,
splitByChar('-', col_name_str)
FROM remote('host_name', 'db.table', 'user', 'password')
WHERE col_name IS NOT NULL
Is this expected behavior? How to fix this?
This is the lack of Nullable support in splitByChar (https://github.com/ClickHouse/ClickHouse/issues/6517).
You are using the wrong cast (toString); use cast(col_name, 'String') instead:
SELECT DISTINCT
cast(col_name, 'String') AS col_name_str,
splitByChar('-', col_name_str)
FROM
(
SELECT cast('aaaaa-vvvv', 'Nullable(String)') AS col_name
)
WHERE isNotNull(col_name)
┌─col_name_str─┬─splitByChar('-', cast(col_name, 'String'))─┐
│ aaaaa-vvvv   │ ['aaaaa','vvvv']                            │
└──────────────┴─────────────────────────────────────────────┘
or assumeNotNull
SELECT DISTINCT
assumeNotNull(col_name) AS col_name_str,
splitByChar('-', col_name_str)
FROM
(
SELECT cast('aaaaa-vvvv', 'Nullable(String)') AS col_name
)
WHERE isNotNull(col_name)
┌─col_name_str─┬─splitByChar('-', assumeNotNull(col_name))─┐
│ aaaaa-vvvv   │ ['aaaaa','vvvv']                           │
└──────────────┴────────────────────────────────────────────┘
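Applied to the original remote() query from the question, the same workaround would look roughly like this:
SELECT DISTINCT
    assumeNotNull(col_name) AS col_name_str,
    splitByChar('-', col_name_str)
FROM remote('host_name', 'db.table', 'user', 'password')
WHERE col_name IS NOT NULL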

Avoid SQL injection while inserting a mix of hard-coded and variable values?

I'm writing database queries with pg-promise. My tables look like this:
Table "public.setting"
│ user_id │ integer │ not null
│ visualisation_id │ integer │ not null
│ name │ character varying │ not null
Table "public.visualisation"
│ visualisation_id │ integer │ not null
│ template_id │ integer │ not null
I want to insert some values into setting - three are hard-coded, and one I need to look up from visualisation.
The following statement does what I need, but must be vulnerable to SQL injection:
var q = "INSERT INTO setting (user_id, visualisation_id, template_id) (" +
"SELECT $1, $2, template_id, $3 FROM visualisation WHERE id = $2)";
conn.query(q, [2, 54, 'foo']).then(data => {
console.log(data);
});
I'm aware I should be using SQL names, but if I try using them as follows I get TypeError: Invalid sql name: 2:
var q = "INSERT INTO setting (user_id, visualisation_id, template_id) (" +
"SELECT $1~, $2~, template_id, $3~ FROM visualisation WHERE id = $2)";
which I guess is not surprising since it's putting the 2 in double quotes, so SQL thinks it's a column name.
If I try rewriting the query to use VALUES I also get a syntax error:
var q = "INSERT INTO setting (user_id, visualisation_id, template_id) VALUES (" +
"$1, $2, SELECT template_id FROM visualisation WHERE id = $2, $3)";
What's the best way to insert a mix of hard-coded and variable values, while avoiding SQL injection risks?
Your query is fine. I think you already know about value placeholders ($X parameters) and SQL Names, but you are a bit confused about when each applies.
In your query you only assign values to placeholders; the database driver handles them for you, providing proper escaping and variable substitution.
The documentation says:
When a parameter's data type is not specified or is declared as
unknown, the type is inferred from the context in which the parameter
is used (if possible).
I can't find a source that states what the default type is, but I think the INSERT statement provides enough context to infer the real types.
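If you ever want to remove any ambiguity, you can also spell the types out in the SQL text itself; a small sketch using the setting table's columns from the question:
INSERT INTO setting (user_id, visualisation_id, name)
VALUES ($1::integer, $2::integer, $3::varchar);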
On the other hand, you have to use SQL Names when you build your query dynamically, for example when you have variable column or table names. These must be inserted through $1~ or $1:name style parameters, which keeps you safe from injection attacks.
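As a rough sketch (the dynamic table name here is purely hypothetical), mixing plain value parameters with an SQL Name in pg-promise could look like this:
// Values are passed as $1, $2, ...; an identifier that has to be dynamic
// (here a hypothetical table name) goes through the SQL Name filter
// ($1~ or $1:name) so it is escaped as an identifier, not as a value.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/mydb'); // assumed connection details

const table = 'setting'; // dynamic identifier, e.g. chosen from a whitelist

db.query('SELECT user_id, visualisation_id, name FROM $1~ WHERE user_id = $2', [table, 2])
    .then(rows => {
        console.log(rows);
    });
This keeps identifiers and values in separate roles, so neither can break out of its place in the statement.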

Record in Hive is not complete when using Flume's Hive sink

I want to use Flume to collect data into a Hive database.
The data ends up in Hive, but it is not complete.
I want to insert records like the following:
1201,Gopal
1202,Manisha
1203,Masthanvali
1204,Kiran
1205,Kranthi
When I run Flume, a bucket_00000 and a bucket_00000_flush_length appear in HDFS (/user/hive/warehouse/test2.db/employee12/delta_0000501_0000600). (The database is test2, the table name is employee12.)
When I run "select * from employee12", it shows the following:
--------------------------------------------------------------------
hive> select * from employee12;
OK
(two lines follow)
1201 Gopal
1202
Time taken: 0.802 seconds, Fetched: 1 row(s)
----------------------------------------------------------------------
Can anyone help me figure out:
Why does it only show two rows?
Why does the second row contain only 1202?
How do I set up the correct config?
Flume config:
agenthive.sources = spooldirSource
agenthive.channels = memoryChannel
agenthive.sinks = hiveSink
agenthive.sources.spooldirSource.type=spooldir
agenthive.sources.spooldirSource.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agenthive.sources.spooldirSource.spoolDir=/home/flume/flume_test_home/spooldir
agenthive.sources.spooldirSource.channels=memoryChannel
agenthive.sources.spooldirSource.basenameHeader=true
agenthive.sources.spooldirSource.basenameHeaderKey=basename
agenthive.sinks.hiveSink.type=hive
agenthive.sinks.hiveSink.hive.metastore = thrift://127.0.0.1:9083
agenthive.sinks.hiveSink.hive.database = test2
agenthive.sinks.hiveSink.hive.table = employee12
agenthive.sinks.hiveSink.round = true
agenthive.sinks.hiveSink.roundValue = 10
agenthive.sinks.hiveSink.roundUnit = second
agenthive.sinks.hiveSink.serializer = DELIMITED
agenthive.sinks.hiveSink.serializer.delimiter = ","
agenthive.sinks.hiveSink.serializer.serdeSeparator = ','
agenthive.sinks.hiveSink.serializer.fieldnames =eid,name
agenthive.sinks.hiveSink.channel=memoryChannel
agenthive.channels.memoryChannel.type=memory
agenthive.channels.memoryChannel.capacity=100
Hive create table statement:
create table if not exists employee12 (eid int,name string)
comment 'this is comment'
clustered by(eid) into 1 buckets
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as orc
tblproperties('transactional'='true');
Try using external tables. I found this article when working on a similar setup.
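As a rough sketch of that suggestion (the table name and HDFS path below are hypothetical), a plain external, non-transactional table that an HDFS sink could write delimited files into might look like this:
-- files written into the LOCATION directory become visible to Hive
-- without the ACID delta/compaction machinery used by the Hive sink
create external table if not exists employee12_ext (eid int, name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/user/flume/employee12_ext';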