How to read data from a Hive table using the Flink SQL client?

I tried to read data from a Hive table using the Flink SQL client, as described in the Flink documentation, but it failed. I can read the table metadata, but not the data.
Here is my Hive session:
0: jdbc:hive2://localhost:10000> create database testdb ;
No rows affected (2.048 seconds)
0: jdbc:hive2://localhost:10000> use testdb ;
No rows affected (0.106 seconds)
0: jdbc:hive2://localhost:10000> create table source (a bigint, b bigint) ;
No rows affected (1.026 seconds)
0: jdbc:hive2://localhost:10000> show tables ;
+-----------+--+
| tab_name |
+-----------+--+
| source |
+-----------+--+
1 row selected (0.877 seconds)
0: jdbc:hive2://localhost:10000> insert into source values (1, 1) ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
No rows affected (7.498 seconds)
0: jdbc:hive2://localhost:10000> select a, b from source ;
+----+----+--+
| a | b |
+----+----+--+
| 1 | 1 |
+----+----+--+
Here is my Flink SQL client session:
Flink SQL> show catalogs ;
+-----------------+
| catalog name |
+-----------------+
| default_catalog |
| myhive |
+-----------------+
2 rows in set
Flink SQL> use catalog myhive ;
[INFO] Execute statement succeed.
Flink SQL> show databases ;
+---------------+
| database name |
+---------------+
| default |
| testdb |
+---------------+
2 rows in set
Flink SQL> use testdb ;
[INFO] Execute statement succeed.
Flink SQL> show tables ;
+------------+
| table name |
+------------+
| source |
+------------+
1 row in set
Flink SQL> SET sql-client.execution.result-mode=tableau;
[INFO] Session property has been set.
Flink SQL> select a, b from source ;
Empty set
I have added a new project to quickly reproduce this error: https://github.com/lianxmfor/flink-hive
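For anyone reproducing this: a Hive catalog like myhive is typically registered in the Flink SQL client along these lines (the hive-conf-dir path below is a placeholder for the directory holding your own hive-site.xml):
CREATE CATALOG myhive WITH (
  'type' = 'hive',                      -- built-in Hive catalog factory
  'hive-conf-dir' = '/opt/hive/conf'    -- placeholder: directory containing hive-site.xml
);
USE CATALOG myhive;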

Related

TiDB cannot fuzzy query with LIKE '_' for double-byte characters?

In my program, I want to look up "测试" with:
select * from test where name like '测_';
After testing, I found that MySQL can do this, but TiDB can't. Why?
This seems to be working fine for me. What character set is your table, column and connection? What TiDB version are you using?
mysql> CREATE TABLE t1 (id char(2) character set utf8mb4 primary key);
Query OK, 0 rows affected (0.18 sec)
mysql> INSERT INTO t1 VALUES('测试'),('测x');
Query OK, 2 rows affected (0.12 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t1 WHERE id LIKE '测_';
+--------+
| id |
+--------+
| 测x |
| 测试 |
+--------+
2 rows in set (0.11 sec)
mysql> SELECT tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version() |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v5.4.0
Edition: Community
Git Commit Hash: 55f3b24c1c9f506bd652ef1d162283541e428872
Git Branch: heads/refs/tags/v5.4.0
UTC Build Time: 2022-01-25 08:39:26
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.11 sec)
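To check the character sets the answer asks about, the following statements work in both MySQL and TiDB (table name t1 as in the example above):
SHOW CREATE TABLE t1;                   -- shows the table and column character sets
SHOW VARIABLES LIKE 'character_set%';   -- shows the connection character-set settings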

How to display all columns and their data types in a table via SQL query

I am trying to print the column names from a table called 'meta' and I also need their data types.
I tried this query:
SELECT meta FROM INFORMATION_SCHEMA.TABLES;
but it throws an error saying no information schema is available. Could you please help me? I am a beginner in SQL.
Edit:
select tables.name from tables join schemas on
tables.schema_id=schemas.id where schemas.name='sprl_db';
This query gives me all the tables in database 'sprl_db'
You can use the MonetDB catalog:
select c.name, c.type, c.type_digits, c.type_scale
from sys.columns c
inner join sys.tables t on t.id = c.table_id and t.name = 'meta';
As you are using MonetDB, you can get that by querying sys.columns, which returns all information related to table columns.
You can also check the Schema, table and columns documentation for MonetDB.
In SQL Server, the equivalent is: exec sp_columns TableName
If I understand correctly, you need to see the columns and types of a table called meta that you (or some other user) defined?
There are at least two ways to do this:
First (as @GMB mentioned in their answer) you can query the SQL catalog: https://www.monetdb.org/Documentation/SQLcatalog/TablesColumns
SELECT * FROM sys.tables WHERE NAME='meta';
+------+------+-----------+-------+------+--------+---------------+--------+-----------+
| id | name | schema_id | query | type | system | commit_action | access | temporary |
+======+======+===========+=======+======+========+===============+========+===========+
| 9098 | meta | 2000 | null | 0 | false | 0 | 0 | 0 |
+------+------+-----------+-------+------+--------+---------------+--------+-----------+
1 tuple
So this gets all the relevant information about the table meta. We are mostly interested in the value of the column id because this uniquely identifies the table.
(Please note that this id will probably be different in your system)
After we have this information we can query the columns table with this table id:
SELECT * FROM sys.columns WHERE table_id=9098;
+------+------+------+-------------+------------+----------+---------+-------+--------+---------+
| id | name | type | type_digits | type_scale | table_id | default | null | number | storage |
+======+======+======+=============+============+==========+=========+=======+========+=========+
| 9096 | i | int | 32 | 0 | 9098 | null | true | 0 | null |
| 9097 | j | clob | 0 | 0 | 9098 | null | true | 1 | null |
+------+------+------+-------------+------------+----------+---------+-------+--------+---------+
2 tuples
Since you are only interested in the names and types of the columns, you can modify this query as follows:
SELECT name, type FROM sys.columns WHERE table_id=9098;
+------+------+
| name | type |
+======+======+
| i | int |
| j | clob |
+------+------+
2 tuples
You can combine the two queries above with a join:
SELECT col.name, col.type FROM sys.tables as tab JOIN sys.columns as col ON tab.id=col.table_id WHERE tab.name='meta';
+------+------+
| name | type |
+======+======+
| i | int |
| j | clob |
+------+------+
2 tuples
The second, and preferred, way to get this information if you are using the mclient utility of MonetDB is its describe meta-command. When used without arguments it presents a list of tables that have been defined in the current database, and when it is given the name of a table it prints its SQL definition:
sql>\d
TABLE sys.data
TABLE sys.meta
sql>\d sys.meta
CREATE TABLE "sys"."meta" (
"i" INTEGER,
"j" CHARACTER LARGE OBJECT
);
You can use the \? meta-command to see a list of all meta-commands in mclient:
sql>\?
\? - show this message
\<file - read input from file
\>file - save response in file, or stdout if no file is given
\|cmd - pipe result to process, or stop when no command is given
\history - show the readline history
\help - synopsis of the SQL syntax
\D table - dumps the table, or the complete database if none given.
\d[Stvsfn]+ [obj] - list database objects, or describe if obj given
\A - enable auto commit
\a - disable auto commit
\e - echo the query in sql formatting mode
\t - set the timer {none,clock,performance} (none is default)
\f - format using renderer {csv,tab,raw,sql,xml,trash,rowcount,expanded,sam}
\w# - set maximal page width (-1=unlimited, 0=terminal width, >0=limit to num)
\r# - set maximum rows per page (-1=raw)
\L file - save client-server interaction
\X - trace mclient code
\q - terminate session and quit mclient
For MySQL:
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'yourdatabasename'
AND table_name = 'yourtablename';
Output:
+-------------+-----------+
| COLUMN_NAME | DATA_TYPE |
+-------------+-----------+
| Id | int |
| Address | varchar |
| Money | decimal |
+-------------+-----------+
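If a quick interactive look is enough, MySQL's SHOW COLUMNS (or its DESCRIBE shorthand) returns the same column names and types:
SHOW COLUMNS FROM yourtablename;
-- equivalent shorthand:
DESCRIBE yourtablename;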

How to delete Hive table records?

How do I delete Hive table records? We have 100 records and I need to delete only 10 of them.
When I use
dfs -rmr table_name
the whole table is deleted. If deleting is possible in HBase instead, should we move the data to HBase?
You cannot delete directly from a Hive table.
However, you can use a workaround: overwrite the table with only the rows you want to keep.
insert overwrite table table_name
select * from table_name
where id not in (1,2,3,...);
You can't delete data from Hive tables since it is already written to files in HDFS. You can only drop partitions, which deletes directories in HDFS. So the best practice is to use partitions if you want to delete data in the future.
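For example, if the table were partitioned by a date column, a whole partition could be deleted like this (the partition column dt is illustrative):
ALTER TABLE table_name DROP IF EXISTS PARTITION (dt='2022-01-01');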
To delete records in a table, you can use the SQL DELETE syntax from your Hive client (this requires an ACID transactional table):
DELETE FROM tablename [WHERE expression]
Try a WHERE clause with your key in an IN subquery:
DELETE FROM tablename where id in (select id from tablename limit 10);
Example:
I had an ACID transactional table in Hive:
select * from trans;
+-----+-------+--+
| id | name |
+-----+-------+--+
| 2 | hcc |
| 1 | hi |
| 3 | hdp |
+-----+-------+--+
Now, if I want to delete only id 2, my delete statement would be:
delete from trans where id in (select id from trans limit 1);
Result:
select * from trans;
+-----+-------+--+
| id | name |
+-----+-------+--+
| 1 | hi |
| 3 | hdp |
+-----+-------+--+
So we have just deleted the first record. In the same way, you can specify limit 10 and Hive will delete the first 10 records.
You can also add an ORDER BY or other clauses to the subquery if you need to delete only the first 10 rows in a specific order (for example, deleting ids 1 through 10).
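Note that DELETE only works on ACID transactional tables. A minimal sketch of how a table like trans above might have been created (on Hive 2.x, ACID tables must be bucketed and stored as ORC; column names are taken from the example):
CREATE TABLE trans (id INT, name STRING)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');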

Parquet-backed Hive table: array column not queryable in Impala

Although Impala is much faster than Hive, we used Hive because it supports complex (nested) data types such as arrays and maps.
I notice that Impala, as of CDH 5.5, now supports complex data types. Since it's also possible to run Hive UDFs in Impala, we can probably do everything we want in Impala, but much, much faster. That's great news!
As I scan through the documentation, I see that Impala expects data to be stored in Parquet format. My data, in its raw form, happens to be a two-column CSV where the first column is an ID, and the second column is a pipe-delimited array of strings, e.g.:
123,ASDFG|SDFGH|DFGHJ|FGHJK
234,QWERT|WERTY|ERTYU
A Hive table was created:
CREATE TABLE `id_member_of`(
`id` INT,
`member_of` ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
The raw data was loaded into the Hive table:
LOAD DATA LOCAL INPATH 'raw_data.csv' INTO TABLE id_member_of;
A Parquet version of the table was created:
CREATE TABLE `id_member_of_parquet` (
`id` STRING,
`member_of` ARRAY<STRING>)
STORED AS PARQUET;
The data from the CSV-backed table was inserted into the Parquet table:
INSERT INTO id_member_of_parquet SELECT id, member_of FROM id_member_of;
And the Parquet table is now queryable in Hive:
hive> select * from id_member_of_parquet;
123 ["ASDFG","SDFGH","DFGHJ","FGHJK"]
234 ["QWERT","WERTY","ERTYU"]
Strangely, when I query the same Parquet-backed table in Impala, it doesn't return the array column:
[hadoop01:21000] > invalidate metadata;
[hadoop01:21000] > select * from id_member_of_parquet;
+-----+
| id |
+-----+
| 123 |
| 234 |
+-----+
Question: What happened to the array column? Can you see what I'm doing wrong?
It turned out to be really simple: we can access the array by adding it to the FROM clause with a dot, e.g.:
Query: select * from id_member_of_parquet, id_member_of_parquet.member_of
+-----+-------+
| id | item |
+-----+-------+
| 123 | ASDFG |
| 123 | SDFGH |
| 123 | DFGHJ |
| 123 | FGHJK |
| 234 | QWERT |
| 234 | WERTY |
| 234 | ERTYU |
+-----+-------+
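Equivalently, you can alias both the table and the nested collection and reference the array's built-in item pseudocolumn explicitly (the aliases t and m are arbitrary):
SELECT t.id, m.item
FROM id_member_of_parquet t, t.member_of m;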

"Invalid combination field1 field2 field3" error message while trying to insert record into postgresql database

I'm trying to restore a database from server A to server B. For some reason, the import fails on 3 specific INSERT statements:
INSERT INTO tbl1 (device_id, group_name, param_id, value) VALUES (15, 'regX', 13, '4323');
INSERT INTO tbl1 (device_id, group_name, param_id, value) VALUES (15, 'device', 1, 'aatd');
INSERT INTO tbl1 (device_id, group_name, param_id, value) VALUES (15, 'regX', 14, 'ttdf');
The error returned is:
ERROR: Invalid combination of device, group, and parameter
It's the same error for each record.
Here's what the table definition looks like:
testdb=# \d+ tbl1;
Table "public.tbl1"
Column | Type | Modifiers | Storage | Stats target | Description
------------+------------------------+-----------+----------+--------------+-------------
device_id | integer | | plain | |
group_name | character varying(255) | | extended | |
param_id | integer | | plain | |
value | character varying(255) | | extended | |
Other records that look similar work, with no issues. For example:
INSERT INTO tbl1 (device_id, group_name, param_id, value) VALUES (103, 'regX', 13, '130');
In fact, the database / import file has over 900 records and these are the only 3 that fail.
How I created the dump file / How I'm importing the dump:
To export:
pg_dump --create -U postgres origdb > outputfile.sql
And then on the new server, to import:
psql -f outputfile.sql -U postgres
What I've Tried So Far:
I've confirmed that in the original database, these records exist, and match what was generated by the dump command.
Here's what the data looks like in the original database:
origdb=# select * from tbl1 where device_id = 15;
device_id | group_name | param_id | value
-----------+------------+----------+--------------
15 | regX | 13 | 4323
15 | device | 1 | aatd
15 | regX | 14 | ttdf
(3 rows)
I've tried to import these records manually on the new server vs. importing the entire dump file. I get the same error message.
I've also been checking to see what primary keys have been defined:
testdb=# SELECT
pg_attribute.attname,
format_type(pg_attribute.atttypid, pg_attribute.atttypmod)
FROM pg_index, pg_class, pg_attribute, pg_namespace
WHERE
pg_class.oid = 'tbl1'::regclass AND
indrelid = pg_class.oid AND
nspname = 'public' AND
pg_class.relnamespace = pg_namespace.oid AND
pg_attribute.attrelid = pg_class.oid AND
pg_attribute.attnum = any(pg_index.indkey)
AND indisprimary;
attname | format_type
---------+-------------
(0 rows)
Questions:
I'm not quite sure where it's getting the names "device, group, and parameter" in the error message. What do these correspond to? I assume field names, but how can I verify this?
Any suggestions on what else to check to troubleshoot? I'm hunting around for any foreign keys on this table and the like, but any suggestions would be appreciated.
I didn't create this database, so I'm not sure of all the relations.
Thanks.
This looks like a trigger that blocks these specific inserts and raises a custom error message.
The trigger may be disabled in the original database but not on the new one, which would explain why the rows exist there but fail to import.
See your user-created triggers with this command:
SELECT * FROM pg_trigger;
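To narrow this down to the table in question and skip PostgreSQL's internal triggers, a query along these lines should work (tgenabled indicates whether the trigger fires):
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'tbl1'::regclass   -- only triggers on tbl1
  AND NOT tgisinternal;            -- skip internal constraint triggers
In psql, \d tbl1 also lists any triggers defined on the table, and for a data-only dump pg_dump's --disable-triggers option can suppress triggers during the restore.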