I have two Hadoop clusters, A and B, and I have Hive set up on cluster A. I am not able to access the data on cluster B from this Hive setup.
I am doing this to create the table:
CREATE EXTERNAL TABLE test_table (id string)
LOCATION "hdfs://B/user/siddharth.a/test_sid";
I am getting this exception:
message:java.lang.IllegalArgumentException: java.net.UnknownHostException: B
I am able to create the table if I use the NameNode's IP address instead. For example, the command below works:
CREATE EXTERNAL TABLE test_table (id string)
LOCATION "hdfs://ip_of_namenode_of_B/user/siddharth.a/test_sid";
If I use
dfs -ls hdfs://B/user/siddharth.a
this works, which is confusing: I would expect this last command to fail as well, for the same reason.
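As a sketch of the workaround in DDL form (the hostname and port here are assumptions; substitute the address of cluster B's active NameNode), the table can reference the NameNode explicitly until the nameservice B is resolvable by the HDFS configuration that Hive uses:
CREATE EXTERNAL TABLE test_table (id string)
LOCATION 'hdfs://namenode-host-of-B:8020/user/siddharth.a/test_sid';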
I had a client upload a malformed table with a name like foo.bar into an Athena instance. What syntax can I use to drop the table? If I try
drop table if exists `foo.bar`
The command silently fails, presumably because the parser interprets foo as the database name. If I try adding the database name explicitly as
drop table if exists dbname."foo.bar"
or
drop table if exists dbname.`foo.bar`
I get a parse error from Athena.
Unfortunately, I don't have access to the Glue console to remove the table from there so I was wondering if it's possible to drop such a table via Athena SQL. Thanks!
Even if you don't have access to the Glue console, you can use the AWS CLI to delete the table directly through the Glue API:
aws glue delete-table --database-name dbname --name foo.bar
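Once the table is removed from the Glue catalog, Athena no longer sees it; a quick sanity check from the Athena query editor (dbname is the same placeholder as above):
SHOW TABLES IN dbname;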
I am trying to create a Hive managed table based on an HBase table.
I created a sample HBase table like this:
create 'hbase_table','name'
and copied hbase-site.xml to hive/conf,
then created an auxlib directory in the Hive root directory, added the ZooKeeper, hive-hbase-handler, and HBase jars to auxlib, and set that path in hive.aux.jars.path in hive-site.xml.
I loaded the data into the HBase table and I am able to access the HBase data.
Now I am trying to create the Hive external table. Here is my syntax:
CREATE EXTERNAL TABLE hive_table (Row_key string, First_name string, last_name string, age int, City string, Team string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,name:first_name,name:last_name,details:age,details:city,details:team")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
I am getting the following exception:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.TableNotFoundException: hbase_table
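A heavily hedged suggestion for narrowing this down: a minimal probe table (hive_probe is a made-up name) that maps nothing except the row key. If this also fails with TableNotFoundException, the problem is the connection to HBase itself (for example, the hbase-site.xml / ZooKeeper settings Hive is actually picking up) rather than the column mapping:
CREATE EXTERNAL TABLE hive_probe (row_key string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");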
I am troubleshooting an application issue on an external (unmanaged) table that was created using the CREATE TABLE X LIKE PARQUET syntax via Cloudera Impala. I am trying to determine the location of the files comprising the partitions of the external table, but I am having difficulty figuring out how to do this or finding documentation that describes it.
If I do a:
show create table T1;
I see the hive-managed location such as:
LOCATION 'hdfs://nameservice1/user/hive/warehouse/databaseName'
If I do a:
describe formatted T1;
I see that the table is in fact external, but it doesn't give any insight into the unmanaged location.
| Table Type: | EXTERNAL_TABLE
| Location: | hdfs://nameservice1/user/hive/warehouse/databaseName/T1
Question:
How do I determine the Location/URI/Parent Directory of the actual external files that comprise this External Table?
When you create an external table with Impala or Hive and you want to know where the data lives, you should specify the HDFS location explicitly, for example:
CREATE EXTERNAL TABLE my_db.table_name
(column string) LOCATION 'hdfs_path';
If you don't provide a LOCATION, these files probably end up under the default directory for the user that executed the CREATE TABLE command.
For more detail you can see this link:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_create_table.html
I hope this helps!
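One addition: for a partitioned table, the per-partition locations can differ from the table-level Location shown above, so they are worth checking individually. A sketch in Hive syntax (the partition column name and value are assumptions):
SHOW PARTITIONS T1;
DESCRIBE FORMATTED T1 PARTITION (part_col='some_value');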
In Hive, for a managed table, the location was changed to a non-HDFS location, for example Amazon S3. When we drop the managed table, will the data at that location be lost?
The explanation below provides the steps to create the scenario (done on a Cloudera quickstart instance).
1) Create the database mytraining.
2) Create an internal/managed table at a chosen HDFS location. This creates the table states_internal pointing to the location hdfs://quickstart.cloudera:8020/user/demo/states.
CREATE TABLE mytraining.states_internal (state string) LOCATION '/user/demo/states';
3) Load the data from the local file system:
LOAD DATA LOCAL INPATH 'file:///home/cloudera/Desktop/Hivedocs/hivedata/states.txt' INTO TABLE myTraining.states_internal;
4) Change the table location from HDFS to the local file system:
ALTER TABLE states_internal SET LOCATION 'file:///home/cloudera/Desktop/Hivedocs/hivedata/states.txt';
5) Drop the table states_internal and check the result: it deletes the local file file:///home/cloudera/Desktop/Hivedocs/hivedata/states.txt, while the data previously present in HDFS at hdfs://quickstart.cloudera:8020/user/demo/states is still there.
DROP TABLE states_internal;
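Before the DROP, it is easy to confirm which location and table type the metastore has recorded, so you know what will be purged (same table as above):
DESCRIBE FORMATTED mytraining.states_internal;
-- check the Location: and Table Type: rows in the output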
In a cluster having Hive installed, what do the metastore and the NameNode contain? I understand that the metastore has all the table schemas, partition details, and metadata. Now, what is this metadata? Then what does the NameNode have? And where is this metastore present in a cluster?
The NameNode keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It also keeps track of all DataNodes (dead and live) through a heartbeat mechanism, and it helps clients with reads and writes by receiving their requests and redirecting them to the appropriate DataNodes.
The metadata which the metastore stores contains things like:
IDs of Database
IDs of Tables
IDs of Index
The time of creation of an Index
The time of creation of a Table
IDs of roles assigned to a particular user
InputFormat used for a Table
OutputFormat used for a Table etc etc.
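Much of this metadata can also be surfaced from the Hive side without touching the metastore database directly; for example (the table name is just a placeholder):
DESCRIBE FORMATTED some_table;   -- owner, CreateTime, Location, InputFormat, OutputFormat, SerDe
SHOW CREATE TABLE some_table;    -- reconstructs the DDL from the stored metadata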
Is this what you wanted to know?
And it is not mandatory to have the metastore in the cluster itself. Any machine (inside or outside the cluster) with a JDBC-compliant database can be used for the metastore.
HTH
P.S : You might find the E/R diagram of metastore useful.
Hive data (not metadata) is spread across Hadoop HDFS DataNode servers. Typically, each block of data is stored on 3 different DataNodes. The NameNode keeps track of which DataNodes have which blocks of actual data.
For a Hive production environment, the metastore service should run in an isolated JVM. Hive processes can communicate with the metastore service using Thrift. The Hive metastore data is persisted in an ACID database such as Oracle DB or MySQL. You can use SQL to find out what is in the Hive metastore:
Here are the tables in the Hive metastore:
SQL> select table_name from user_tables;
DBS
DATABASE_PARAMS
SEQUENCE_TABLE
SERDES
TBLS
SDS
CDS
BUCKETING_COLS
TABLE_PARAMS
PARTITION_KEYS
SORT_COLS
SD_PARAMS
COLUMNS_V2
SERDE_PARAMS
You can describe the structure of each table:
SQL> describe partition_keys;
TBL_ID NUMBER
PKEY_COMMENT VARCHAR2(4000)
PKEY_NAME VARCHAR2(128)
PKEY_TYPE VARCHAR2(767)
INTEGER_IDX NUMBER(10)
And find the contents of each table:
SQL> select * from partition_keys;
So if in Hive you run "CREATE TABLE xxx (...) PARTITIONED BY (...)", the Hive partitioning data is stored in the metastore database (Oracle, MySQL, ...).
For example, in Hive if you create a table like this:
hive> create table employee_table (id bigint, name string) partitioned by (region string);
You will find this in the metastore:
SQL> select tbl_id,pkey_name from partition_keys;
TBL_ID PKEY_NAME
------ ---------
8 region
SQL> select tbl_name from tbls where tbl_id=8;
TBL_NAME
--------
employee_table
When you insert data into employee_table, the data will be stored in HDFS on Hadoop DataNodes and the NameNode will keep track of which DataNodes have the data.
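In the same spirit, the HDFS location recorded for a table can be read out of the metastore by joining TBLS to SDS; a sketch against a typical metastore schema (column names can vary slightly between Hive versions):
SQL> select s.location from tbls t join sds s on t.sd_id = s.sd_id where t.tbl_name = 'employee_table';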
Metastore - It's a database which stores metadata, i.e. all the details about the tables you create in Hive. By default, Hive comes with and uses the Derby database, but you can use any other database like MySQL or Oracle.
Use of the metastore: whenever you fire a query from your Hive CLI, the execution engine gathers all the details regarding the table and creates an execution plan (job). These details come from the metastore. Finally, the execution engine sends the job to Hadoop. From there, the common Hadoop MapReduce job is executed and the result is sent back to Hive. The NameNode communicates with the execution engine to successfully execute the MR job.
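You can see the plan that the execution engine builds from this metadata with EXPLAIN; for example, reusing the table from the previous answer:
hive> explain select count(*) from employee_table;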
Regarding the Hive metastore (not a Hadoop metastore):
It is not necessary/compulsory to have a metastore in your Hadoop environment; it is only required if you are using Hive on top of your HDFS cluster.
The metastore is the metadata repository for Hive only, and Hive uses it to store only the metadata of the database objects it creates (not the actual data, which already lives in HDFS; Hive does not store data itself, it works on data already stored in the file system).
A Hive installation requires a metastore service backed by some RDBMS.
Regarding the NameNode (the Hadoop NameNode):
It is a core part of Hadoop and behaves like a metastore for the cluster.
It is not an RDBMS; it stores the file system metadata in the file system itself.