Missing hive execution jar 3.1.2 - hive

Hi, I am trying to run the following command while installing Hive 3.1.2:
bin/schematool -dbType derby -initSchema
When I run this it tells me "Missing Hive Execution Jar: home/<user>/hive/lib/hive-exec-*.jar". I looked into my /hive/lib directory and it does contain hive-exec-3.1.2.jar. I'm running a 32-bit Ubuntu VM that already has Hadoop installed and working, and Java is up to date, if that helps. Thanks for the help.
Here is what I did. First, I unpacked my Apache Hive tar file, moved it to my home directory, and renamed it to just hive. Then I set
export HIVE_HOME="home/<user>/hive"
export PATH=$PATH:$HIVE_HOME/bin
Next, I made the following change to core-site.xml in Hadoop:
<property>
<name>hadoop.proxyuser.firepower.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.firepower.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.server.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.server.groups</name>
<value>*</value>
</property>
</configuration>
Then I made a tmp directory and a user directory in HDFS with write privileges. However, when I did this it gave me an error about a disabled stack guard, saying that I should run execstack -c, and a WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platfor... using builtin-java classes. But I was still able to create the directories. After that I tried to initialize the Derby database with the schematool.
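For reference, a rough sketch of those steps as shell commands (a sketch only; the paths are illustrative and the HDFS directories are the ones suggested in the Hive getting-started guide):
export HIVE_HOME=/home/$USER/hive                     # Hive install directory; note this is an absolute path
export PATH=$PATH:$HIVE_HOME/bin
hdfs dfs -mkdir -p /tmp                               # HDFS scratch directory
hdfs dfs -mkdir -p /user/hive/warehouse               # Hive warehouse directory
hdfs dfs -chmod g+w /tmp /user/hive/warehouse         # give the group write permission
$HIVE_HOME/bin/schematool -dbType derby -initSchema   # initialize the embedded Derby metastore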

Related

Zeppelin configuration properties file: Can't load BigQuery interpreter configuration

I am attempting to set my zeppelin.bigquery.project_id (or any bigquery configuration property) via my zeppelin-site.xml, but my changes are not loaded when I start Zeppelin. The project ID always defaults to ' '. I am able to change other configuration properties (ex. zeppelin.notebook.storage). I am using Zeppelin 0.7.3 from https://hub.docker.com/r/apache/zeppelin/.
zeppelin-site.xml (created before starting Zeppelin, before an interpreter.json file exists):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>zeppelin.notebook.storage</name>
<value>org.apache.zeppelin.notebook.repo.S3NotebookRepo</value>
<description>notebook persistence layer implementation</description>
</property>
... etc ...
<property>
<name>zeppelin.bigquery.project_id</name>
<value>my-project-id</value>
<description>Google Project Id</description>
</property>
</configuration>
Am I configuring the interpreter incorrectly? Could this parameter be overridden elsewhere?
I am not really familiar with Apache Zeppelin, but I have found some documentation pages that make me think that you should actually store the BigQuery configuration parameters in your Interpreter configuration file:
This entry in the GCP blog explains how to use the BigQuery Interpreter for Apache Zeppelin. It includes some examples of how to use it with Dataproc, Apache Spark and the Interpreter.
The BigQuery Interpreter documentation for Zeppelin 0.7.3 mentions that zeppelin.bigquery.project_id is the right parameter to configure, so that is not the issue here. Here there is some information on how to configure the Zeppelin Interpreters.
The GitHub page of the BigQuery Interpreter states that you have to configure the properties during Interpreter creation, and then you should enable it by using %bigquery.sql.
Finally, make sure that you are specifying the BigQuery interpreter in the appropriate field in the zeppelin-site.xml (like done in the template) or instead enable it by clicking on the "Gear" icon and selecting "bigquery".
Edit /usr/lib/zeppelin/conf/interpreter.json, change zeppelin.bigquery.project_id to be the value of your project and run sudo systemctl restart zeppelin.service.
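As a rough sketch of that edit-and-restart flow (the path and property name are taken from the answer above; the grep is only there to locate the entry):
grep -n "zeppelin.bigquery.project_id" /usr/lib/zeppelin/conf/interpreter.json   # find the current value
# edit the value in interpreter.json with any editor, then restart Zeppelin so the file is re-read
sudo systemctl restart zeppelin.service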

Running Sqoop job on YARN using Oozie

I've got a problem with running a Sqoop job on YARN in Oozie using Hue. I want to download a table from an Oracle database and upload it to HDFS. I've got a multi-node cluster consisting of 4 nodes.
I want to run a simple Sqoop statement:
import --options-file /tmp/oracle_dos.txt --table BD.BD_TABLE --target-dir /user/user1/files/user_temp_20160930_30 --m 1
The options file is located on the local file system of node number 1; the other nodes have no options file in their /tmp/ directory. I created an Oozie workflow with the Sqoop job and tried to run it, but I got this error:
3432 [main] ERROR org.apache.sqoop.Sqoop - Error while expanding arguments
java.lang.Exception: Unable to read options file: /tmp/oracle_dos.txt
The weirdest thing is that the job is sometimes OK, but sometimes fails. The log file gave me the answer why: Oozie runs Sqoop jobs on YARN.
The Resource Manager (a component of YARN) decides which node will execute the Sqoop job. When the Resource Manager decides that Node 1 (which has the options file on its local file system) should execute the job, everything is OK. But when the RM decides that one of the other 3 nodes should execute the Sqoop job, it fails.
This is a big problem for me, because I don't want to upload the options file to every node (what if I have 1000 nodes?). So my question is: is there any way to tell the Resource Manager which node it should use?
You can make a custom file available to your Oozie action on a node; this can be done by using the <file> tag in your Sqoop action. Look at this syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<action name="[NODE-NAME]">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<command>[SQOOP-COMMAND]</command>
<arg>[SQOOP-ARGUMENT]</arg>
...
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
</sqoop>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
Also read this:
The file, archive elements make available, to map-reduce jobs, files
and archives. If the specified path is relative, it is assumed the
file or archive is within the application directory, in the
corresponding sub-path. If the path is absolute, the file or archive
is expected at the given absolute path.
Files specified with the file element, will be symbolic links in the
home directory of the task.
...
So in the simplest case you put your file oracle_dos.txt in your workflow directory, add a <file>oracle_dos.txt</file> element to workflow.xml, and change your command to something like this:
import --options-file ./oracle_dos.txt --table BD.BD_TABLE --target-dir /user/user1/files/user_temp_20160930_30 --m 1
In this case, even though your Sqoop action runs on some randomly picked node in the cluster, Oozie will copy oracle_dos.txt to that node and you can refer to it as a local file.
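As a sketch of how that staging could look (the HDFS application path and the Oozie URL below are made-up examples, not values from the question):
hdfs dfs -put /tmp/oracle_dos.txt /user/user1/apps/sqoop_wf/   # options file next to workflow.xml
# workflow.xml in the same directory references it with <file>oracle_dos.txt</file>
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run   # Oozie localizes the file on whichever node runs the action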
Perhaps this is about file permissions. Try to put this file in /home/{user}.

Hortonworks Hive and SpagoBI

I want to connect Hortonworks Hive with SpagoBI Studio. I am using the JDBC driver to make the connection, but it's not working. Can anyone please help me solve this problem?
Thank you
First of all, you should create an environment file for SpagoBI. In that file you need to provide the paths for the jars in the Hive lib directory and for hadoop-core.jar (for Hadoop version 1).
Then you need to source the env file and after that start SpagoBI.
It should then run properly.
Basically this env file gives SpagoBI access to the jars in the Hive lib directory (including your hive-jdbc-*.jar).
The environment file is
HADOOP_HOME=/usr/lib/hadoop
HIVE_HOME=/usr/lib/hive
# sample data file with Ctrl-A (\x01) separated fields, apparently for a quick Hive load test
mkdir -p /tmp/spagobi
echo -e '1\x01foo' > /tmp/spagobi/a.txt
echo -e '2\x01bar' >> /tmp/spagobi/a.txt
# build the classpath from hadoop-core.jar, the Hive conf directory and every jar in Hive's lib
CLASSPATH=.:$HADOOP_HOME/hadoop-core.jar:$HIVE_HOME/conf
for i in ${HIVE_HOME}/lib/*.jar ; do
  CLASSPATH=$CLASSPATH:$i
done
# export so the variables are visible to SpagoBI when it is launched from this shell
export HADOOP_HOME HIVE_HOME CLASSPATH
Just save the code in a file, e.g.
spagobi.env
and then source it with: . spagobi.env
If you are using an updated version of Hive, then download the jar files given below:
1. hadoop-common-2.6.0.2.2.0.0-2041.jar
2. z-hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar
Then, in SpagoBI Studio, go to the data source connection, select the Hive driver, add the new driver file "z-hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar", and click OK.
Then provide the credentials to connect with Hive, as given below:
URL : jdbc:hive2://localhost:10000/xyz
Driver: org.apache.hive.jdbc.HiveDriver
This will definitely work.
Edit server.xml
Path: All-In-One-SpagoBI-X.X.X\conf\server.xml
Add:
<!-- Hive Configuration-->
<Resource name="jdbc/hive" auth="Container" type="javax.sql.DataSource" driverClassName="org.apache.hive.jdbc.HiveDriver"
url="jdbc:hive2://data_node_server.com:10000/wsms" username=" " password=" "
maxActive="20" maxIdle="10" maxWait="-1"/>
Edit context.xml
Path: All-In-One-SpagoBI-X.X.X\webapps\spagobi\META-INF\context.xml
Add:
<ResourceLink global="jdbc/hive" name="jdbc/hive" type="javax.sql.DataSource"/>
Do the same with the context.xml for each engine:
All-In-One-SpagoBI-XXX\webapps\XXXEngine\META-INF
Add Following Jars in All-In-One-SpagoBI-XXX\lib:
httpcore-4.3.jar
httpclient-4.3-beta2.jar
httpclient-4.2.jar
hadoop-common-xxx.jar
hive-exec.jar
hive-jdbc-xxx.jar
hive-metastore-xxx.jar
hive-service-xxx.jar
slf4j-api-xxx.jar (present in webapps/spagobi/WEB-INF/lib)
hadoop-auth-xxx.jar (optional but recommended - may be required for kerberos or other authentication)
Data source
Label : hive2_conn (hive-jdbc-1.2.1000.2.4.0.0-169)
Description : Connecting to hive
Dialect : Hive QL
url : jdbc:hive2://server_name.com:10000/wsms (Note: use the data node server address; the name node server won't work)
user : hive_user_name
pwd : hive_user_pwd
Driver : org.apache.hive.jdbc.HiveDriver
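Before wiring up SpagoBI, it can help to verify the URL and credentials directly with Beeline; a sketch using the placeholder host, database and user from the data source above:
beeline -u "jdbc:hive2://server_name.com:10000/wsms" -n hive_user_name -p hive_user_pwd
# if hive.server2.transport.mode is set to http (see hive-site.xml below), append
# ";transportMode=http;httpPath=cliservice" to the JDBC URL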
Environment Variables - Hive Server (Data Node) (set the variables for the user that is used in the data source)
vi ~/.bash_profile
HADOOP_HOME=/usr/hdp/2.4.0.0-169/hadoop
HIVE_HOME=/usr/hdp/2.4.0.0-169/hive
CLASSPATH=.:$HADOOP_HOME/*.jar:$HADOOP_HOME/lib/*.jar:$HIVE_HOME/lib/*.jar
Hadoop Server Configurations (hive-site.xml):
<!-- hive Multi user Support -->
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<property>
<name>atlas.hook.hive.maxThreads</name>
<value>50</value>
</property>
<property>
<name>atlas.hook.hive.minThreads</name>
<value>5</value>
</property>
<!-- Configure to support the HTTP protocol; default value is binary (set it to http) -->
<property>
<name>hive.server2.transport.mode</name>
<value>http</value><!--default is binary-->
</property>
<!-- Query Optimization -->
<!-- Enable Cost Based Optimization, to optimize the query execution plan; default value is false (set it to true) -->
<property>
<name>hive.cbo.enable</name>
<value>true</value>
</property>

Cannot Load Hive Table into Pig via HCatalog

I am currently configuring a Cloudera HDP dev image using this tutorial on CentOS 6.5, installing the base and then adding the different components as I need them. Right now I am installing and testing HCatalog using this section of the tutorial linked above.
I have successfully installed the package and am now testing HCatalog integration with Pig with the following script:
A = LOAD 'groups' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
I have previously created and populated a 'groups' table in Hive before running the command. When I run the script with the command pig -useHCatalog test.pig I get an exception rather than the expected output. Below is the initial part of the stacktrace:
Pig Stack Trace
---------------
ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1608)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
...
Has anyone encountered this error before? Any help would be much appreciated. I would be happy to provide more information if you need it.
The error was caused by HBase's Thrift server not being properly configured. I installed/configured Thrift and added the following to my hive-site.xml with the proper server information:
<property>
<name>hive.metastore.uris</name>
<value>thrift://<!--URL of Your Server-->:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
I thought the snippet above was not required since I am running Cloudera HDP in pseudo-distributed mode. It turns out that it, and HBase Thrift, are required to use HCatalog with Pig.
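A quick sanity check for that setup, assuming the metastore listens on the default port 9083 configured above:
hive --service metastore &     # start the Hive metastore Thrift service in the background
pig -useHCatalog test.pig      # re-run the script with the HCatalog jars on the classpath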

Why 'mapred-site.xml' is not included in the latest Hadoop 2.2.0?

The latest build of Hadoop provides mapred-site.xml.template.
Do we need to create a new mapred-site.xml file using this?
Any link on documentation or explanation related to Hadoop 2.2.0 will be much appreciated.
I believe it's still required. For our basic two-node Hadoop 2.2.0 cluster setup, which we have working, I did the following from the setup documentation:
"
From the base of the Hadoop installation, edit the etc/hadoop/mapred-site.xml file. A new
configuration option for Hadoop 2 is the capability to specify a framework name for
MapReduce, setting the mapreduce.framework.name property. In this install we will use the
value of "yarn" to tell MapReduce that it will run as a YARN application.
First, copy the template file to the mapred-site.xml.
cp mapred-site.xml.template mapred-site.xml
Next, copy the following into Hadoop etc/hadoop/mapred-site.xml file and remove the original empty tags.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
"
Regarding documentation, I found this the most useful. Also, the etc/hosts configuration for cluster setup and other cluster-related settings were a bit hard to figure out.