Cannot Load Hive Table into Pig via HCatalog - hive

I am currently configuring a Cloudera HDP dev image using this tutorial on CentOS 6.5, installing the base and then adding the different components as I need them. Currently, I am installing / testing HCatalog using this section of the tutorial linked above.
I have successfully installed the package and am now testing HCatalog integration with Pig with the following script:
A = LOAD 'groups' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
I have previously created and populated a 'groups' table in Hive before running the command. When I run the script with the command pig -useHCatalog test.pig I get an exception rather than the expected output. Below is the initial part of the stacktrace:
Pig Stack Trace
---------------
ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
...
Has anyone encountered this error before? Any help would be much appreciated. I would be happy to provide more information if you need it.

The error was caused by HBase's Thrift server not being proper configured. I installed/configured Thrift and added the following to my hive-xml.site with the proper server information added:
<property>
<name>hive.metastore.uris</name>
<value>thrift://<!--URL of Your Server-->:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
I thought the snippet above was not required since I am running Cloudera HDP in pseudo-distributed mode.Turns out, it and HBase Thrift are required to use HCatalog with Pig.

Related

Unable to connect to AWS RDS using Liquibase on Mac

I am unable to connect to an AWS RDS instance using liquibase from the terminal using a Mac. I am able to connect to the same database and URL using MySQL workbench and pymysql in python. I have mysql-connector-java.jar in Liquibase's lib folder. I have also done brew install mysql and brew install mysql-client. I get the same error when using an invalid URL, but I have checked (many times) to make sure the URL string is correct. The exact error I get is:
Unexpected error running Liquibase: liquibase.exception.DatabaseException: liquibase.exception.DatabaseException: Connection could not be created to jdbc:mysql://-URL IS HERE-?createDatabaseIfNotExists=true with driver com.mysql.cj.jdbc.Driver. Communications link failure
From the command:
liquibase --url=jdbc:mysql://-URL IS HERE-:3306/cdp_sms?createDatabaseIfNotExists=true --username="cdpadmin" --password=$CDP_DB_PW --changeLogFile=db.changelog-master.xml update
Any help would be greatly appreciated.
I had encounter a similar issue a while ago with Liquibase and AWS RDS. I would suggest you to check your security groups (inbound/outbound rules) in the aws settings.

Hive with HBase (both Kerberos) java.net.SocketTimeoutException .. on table 'hbase:meta'

Error
Receiving Timeout errors when trying to query HBase from Hive using HBaseStorageHandler.
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68199: row 'phoenix_test310,,'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase-master.example.com,16020,1583728693297, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
... 3 more
I tried to follow what documentation I could and have some hbase configuraiton options added to hive-site.xml based on this Cloudera link
Environment:
Hadoop 2.9.2
HBase 1.5
Hive 2.3.6
Zookeeper 3.5.6
First, the Cloudera link should be ignored, Hive detects the presence of HBase through environment variables and then automatically reads the hbase-site.xml configuration settings.
There is no need to duplicate HBase settings within hive-site.xml
Configuring Hive for HBase
Modify your hive-env.sh as folllows:
# replace <hbase-install> with your installation path /etc/hbase for example
export HBASE_BIN="<hbase-install>/bin/hbase"
export HBASE_CONF_DIR="<hbase-install>/conf"
Separately you should ensure HADOOP_* environment variables are set as well in hive-env.sh,
and that the hbase lib directory is added to HADOOP_CLASSPATH.
We solved this error,by adding this property hbase.client.scanner.timeout.period=600000
hbase 1.2
https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hbase_scanner_heartbeat.html#concept_xsl_dz1_jt

Alluxio + Hive on EMR

I have Alluxio 1.8 installed on an EMR 5.19.0 cluster, and can see my S3 tables using /usr/local/alluxio/bin/alluxio fs ls /.
However, when I start up hive and issue
hive> [[DDL w/ LOCATION = alluxio://master_host:19998/my_table ]]], I get the following:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found
Is there a way of getting past this? I've tried starting hive with --auxpath pointing to both /usr/local/alluxio/client/alluxio-1.8.1-client.jar and a copy of the jar on hdfs without any success.
Any help?
I posted a blog talking about the reasons for the error message java.lang.ClassNotFoundException: Class alluxio.hadoop.FileSystem not found. Here are some tips, hope they can help:
For Hive, set environment variable HIVE_AUX_JARS_PATH in conf/hive-env.sh:
export HIVE_AUX_JARS_PATH=/<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar:${HIVE_AUX_JARS_PATH}
which I guess is equivalent to what you have done to set --auxpath.
Depending on your setting of Hive (e.g., Hive on MR or Spark or Tez), you may also need to make sure the runtime is also able to access the client jar. Take Hive on MR as an example, you perhaps also need to append the path to Alluxio client jar to mapreduce.application.classpath or yarn.application.classpath to ensure each task of the MR jobs can access this jar.

issue in accessing ORC transactional table through apache drill

Issue while accessing ORC transactional hive table through apache drill.
Apache drill 1.10.0
Hive 1.2.1
Below is the error coming while accessing data from the mentioned table through apache drill.
Query Failed: An Error Occurred
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException: For input string: "0000112_0000" [Error Id: ad9b4243-d48d-43c7-9755-388202d7c54d on inbbrdssvm16.india.tcs.com:31010]
Please help me in resolving the issue.
I suggest you to move onto latest Drill and Hive versions.
This issue is resolved in Apache Drill 1.13.0 version
https://issues.apache.org/jira/browse/DRILL-5978

Not able to start hiveserver2 for Apache Hive

Could any one help to resolve below problem, I'm trying to start hserver2 and I configured hive_site.xml and configuration file for Hadoop Directory path as well and jar file hive-service-rpc-2.1.1.jar also available at directory lib. And I am able to start using hive but not hiveserver2
$ hive --service hiveserver2 Exception in thread "main" java.lang.ClassNotFoundException: /home/directory/Hadoop/Hive/apache-hive-2/1/1-bin/lib/hive-service-rpc-2/1/1/jar
export HIVE_HOME=/usr/local/hive-1.2.1/
export HIVE_HOME=/usr/local/hive-2.1.1
I am glad that I solve it's problem. Here is my question ,I have different version hive ,and My command use 1.2.1, but it find it's jar form 2.1.1.
you can user command which hive server 2 ,find where is you command from .