I have created a database using HQL, and it does get created. But I am not able to use that database from an Impala application, even though the table exists in Hive and we can query it there. This issue is seen only for some newly created tables. Can somebody please help?
Issue the following command in the Impala shell:
invalidate metadata;
This will load the metadata information into the Impala coordinator node you are connected to.
http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_invalidate_metadata.html
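If you only need to pick up a few newly created objects rather than reload the whole catalog, a targeted form also works (the database and table names below are placeholders):
-- pick up a table created outside Impala, e.g. through Hive
INVALIDATE METADATA my_new_db.my_new_table;
-- pick up new data files added to a table Impala already knows about
REFRESH my_new_db.my_new_table;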
I am using Hive on an HDInsight Hadoop cluster -- Hadoop 2.7 (HDI 3.6).
We have some old Hive tables that point to storage accounts that don't exist any more. These tables still point to those storage locations; in other words, the Hive Metastore still contains references to the deleted storage accounts. If I try to drop such a Hive table, I get an error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException org.apache.hadoop.fs.azure.AzureException: No credentials found for account <deletedstorage>.blob.core.windows.net in the configuration, and its container data is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.)
Manipulating the Hive Metastore directly is risky, as it could land the Metastore in an invalid state.
Is there any way to get rid of these orphan tables?
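One workaround that is sometimes suggested (treat it as an assumption; it is not verified against this exact HDInsight setup, and the table name and location below are placeholders) is to keep Hive from touching the dead storage account during the drop, either by marking the table external so DROP only removes metadata, or by repointing it at a location that still exists:
-- assumption: as an external table, DROP should only remove metadata
ALTER TABLE orphan_table SET TBLPROPERTIES ('EXTERNAL'='TRUE');
DROP TABLE orphan_table;
-- alternative: repoint the table at an accessible location first, then drop it
ALTER TABLE orphan_table SET LOCATION 'wasb://container@livestorage.blob.core.windows.net/tmp/orphan_table';
DROP TABLE orphan_table;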
Is there any way to see which database we are currently working in from the Hive terminal? While working in Hive through the web GUI (Hue), there is a list of databases from which we can select the active database.
Yes, we can. For that, we have to set the following property in the Hive terminal:
SET hive.cli.print.current.db=true;
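With that property set, the prompt shows the active database; you can also query it directly (the database name below is a placeholder):
-- the prompt changes from hive> to hive (default)>
USE sales_db;
-- the prompt is now hive (sales_db)>
SELECT current_database();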
We have created a scheduler that pulls data from an on-premise SQL Server and puts it on HDFS. Now the problem is that we need to verify that the pushed data is correct and consistent with the on-premise data.
Could you please help me with how to compare these tables and their data for correctness? Anything will help. Thanks.
You can use Sqoop, which also supports validating the copied data against the source database with the --validate option.
Refer to Sqoop User Guide - Validation for more details.
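As a rough sketch (the connection string, credentials, table name, and target directory are placeholders, not details from this thread), a plain HDFS import with row-count validation looks like this:
sqoop import \
  --connect "jdbc:sqlserver://onprem-host:1433;databaseName=sales" \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders \
  --validate
Note that --validate only compares source and target row counts after the copy, and the guide above lists cases it does not cover (for example free-form queries and incremental imports).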
I created a MapR DB JSON table using Apache Drill and I would like to query that using Hive.
Is that possible, or do I need to load the data into a newly created MapR-DB JSON Hive table?
Let me know.
Thanks,
Pratap
You can't create a MapR-DB JSON table using Apache Drill. CTAS is currently supported only for dfs storage in Drill; see [1] for more details.
Drill can query MapR-DB tables directly [2]. It can also query Hive MapR-DB tables via the Hive storage handler (hive-maprdb-json-handler) or via Drill's native MapR-DB reader for Hive tables [3].
If you already have JSON tables in MapR-DB, you can create Hive external tables over them and then query them from Hive [4]; see the sketch after the references below.
[1] https://drill.apache.org/docs/create-table-as-ctas/
[2] https://drill.apache.org/docs/mapr-db-format/
[3] https://issues.apache.org/jira/browse/DRILL-6454
[4] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables
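A minimal sketch of [4] for a MapR-DB JSON table (the table path, columns, and exact storage handler class are assumptions here and depend on your MapR and Hive versions):
CREATE EXTERNAL TABLE customers (
  _id STRING,
  name STRING,
  age INT
)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES ('maprdb.table.name' = '/tables/customers', 'maprdb.column.id' = '_id');
-- then query it from Hive as usual
SELECT name, age FROM customers WHERE age > 30;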
I want to use a Hive table as the source and an Oracle table as the target in my Informatica Developer tool mapping in the Hive environment. That is, I want to run my Informatica Developer tool mapping in Hive mode. Is this possible? If yes, please let me know the steps.
Thanks
Yes, it is possible. You just need to change the execution engine.
Refer to the Informatica BDM help documentation for the detailed steps.
You can create an Oracle connection via Sqoop and use it with any of the Hadoop engines in Informatica BDM.
Yes, this is possible using Sqoop for Oracle. But there is a limitation: with Sqoop you can only insert into Oracle.
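For reference, a bare-bones Sqoop export from a Hive warehouse directory into Oracle looks roughly like this (the host, service name, credentials, table, path, and delimiter are placeholders; Hive's default field delimiter \001 is assumed), and by default it only does inserts:
sqoop export \
  --connect jdbc:oracle:thin:@//orahost:1521/ORCL \
  --username scott -P \
  --table TARGET_TABLE \
  --export-dir /user/hive/warehouse/mydb.db/source_table \
  --input-fields-terminated-by '\001'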
You can use PowerExchange for Hadoop to load data from PowerCenter to new Hive tables.
Sources and targets will appear in PowerCenter as ODBC data sources or ODBC targets.