I've been trying to import a table from Vectorwise into Hive using Sqoop. I downloaded the Vectorwise JDBC driver and set everything up, but it just isn't working.
This is the command I'm using:
sudo -u hdfs sqoop import --driver com.ingres.jdbc.IngresDriver --connect jdbc:ingres://172.16.63.157:VW7/amit --username ingres -password ingres --table vector_table --hive-table=vector_table --hive-import --create-hive-table -m 1
And I'm getting the error:
12/06/07 22:08:27 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.ingres.jdbc.IngresDriver
java.lang.RuntimeException: Could not load db driver class: com.ingres.jdbc.IngresDriver
at com.cloudera.sqoop.manager.SqlManager.makeConnection(SqlManager.java:635)
at com.cloudera.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:53)
at com.cloudera.sqoop.manager.SqlManager.execute(SqlManager.java:524)
at com.cloudera.sqoop.manager.SqlManager.execute(SqlManager.java:547)
at com.cloudera.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:191)
at com.cloudera.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:175)
at com.cloudera.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:263)
at com.cloudera.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1226)
at com.cloudera.sqoop.orm.ClassWriter.generate(ClassWriter.java:1051)
at com.cloudera.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:84)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:370)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:456)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:146)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:239)
I'd really appreciate it if someone could help me out here.
Thanks in advance! :)
I can't comment yet, so I'm posting this as an answer:
This is a quote from the documentation:
You can use Sqoop with any other JDBC-compliant database. First,
download the appropriate JDBC driver for the type of database you want
to import, and install the .jar file in the $SQOOP_HOME/lib directory
on your client machine. (This will be /usr/lib/sqoop/lib if you
installed from an RPM or Debian package.) Each driver .jar file also
has a specific driver class which defines the entry-point to the
driver. For example, MySQL’s Connector/J library has a driver class of
com.mysql.jdbc.Driver. Refer to your database vendor-specific
documentation to determine the main driver class. This class must be
provided as an argument to Sqoop with --driver.
Do you have the proper jar file in a directory that's accessible by Sqoop?
For the future, it is also always useful to give a bit more information about your environment, such as which version of Sqoop you are using.
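If the driver jar is not there yet, a minimal sketch of getting it in place (assuming the Vectorwise/Ingres driver ships as iijdbc.jar and Sqoop was installed from a package; adjust the source path to wherever you downloaded the jar) would be:
# copy the Ingres/Vectorwise JDBC driver into Sqoop's lib directory
sudo cp /path/to/iijdbc.jar /usr/lib/sqoop/lib/
# make sure the user that runs Sqoop (here hdfs) can read it
sudo chmod 644 /usr/lib/sqoop/lib/iijdbc.jar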
Okay, I got it working. It was a simple permission issue. I changed the owner of iijdbc.jar to hdfs.
sudo chown hdfs /usr/lib/sqoop/lib/iijdbc.jar
Now it's working! :)
I can now import my Vectorwise tables to Hive using Sqoop. Great!
Could anyone help resolve the problem below? I'm trying to start HiveServer2. I configured hive-site.xml and the configuration file for the Hadoop directory path as well, and the jar file hive-service-rpc-2.1.1.jar is also available in the lib directory. I am able to start Hive, but not HiveServer2.
$ hive --service hiveserver2
Exception in thread "main" java.lang.ClassNotFoundException: /home/directory/Hadoop/Hive/apache-hive-2/1/1-bin/lib/hive-service-rpc-2/1/1/jar
export HIVE_HOME=/usr/local/hive-1.2.1/
export HIVE_HOME=/usr/local/hive-2.1.1
I am glad that I solved this problem. Here is what was going on: I have different versions of Hive installed, and my command used 1.2.1, but it was finding its jars from 2.1.1.
You can use the command which hiveserver2 to find out where your command is coming from.
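For example, a quick check (a sketch, assuming both installations live under /usr/local) could look like this:
# see which binaries are actually first on your PATH
which hive
which hiveserver2
# confirm HIVE_HOME points at the same installation
echo $HIVE_HOME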
I am running Spark Thrift Server on EMR. I start up the Spark Thrift Server by:
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar
Notice that I have a custom UDF jar and I want to add it to the Thrift Server classpath, so I added --jars /opt/lib/custom-udfs.jar to the above command.
Once I am in my EMR, I issued the following to connect to the Spark Thrift Server.
beeline -u jdbc:hive2://localhost:10000/default
Then I was able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup script would also add the jar as a Hive resource.
The only way I can access the custom UDF right now is by adding the custom UDF jar as a Hive resource:
add jar /opt/lib/custom-udfs.jar
and then creating the function for the UDF.
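For reference, that manual workaround looks roughly like this from beeline (the function name my_udf and the class com.example.udf.MyUdf are placeholders; substitute your own):
# run the ADD JAR and CREATE TEMPORARY FUNCTION statements in a beeline session
beeline -u jdbc:hive2://localhost:10000/default \
  -e "ADD JAR /opt/lib/custom-udfs.jar; CREATE TEMPORARY FUNCTION my_udf AS 'com.example.udf.MyUdf';"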
Question:
Is there a way to configure the custom UDF jar automatically, without having to add the jar to the Spark session each time?
Thanks!
The easiest way is to edit the file start-thriftserver.sh so that, at the end, it:
- waits until the server is ready
- executes the setup SQL query
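A rough sketch of what that could look like, appended to start-thriftserver.sh (the setup file path /opt/lib/setup.sql and the retry loop are just illustrative):
# wait until the Thrift Server accepts JDBC connections, then run the setup SQL
for i in $(seq 1 30); do
  if beeline -u jdbc:hive2://localhost:10000/default -e "show databases;" > /dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/setup.sql
    break
  fi
  sleep 5
done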
You could also post a proposal on JIRA; "execute setup code at startup" would be a very useful feature.
The problem here seems to be that --jars has to be positioned correctly: it should be the first argument. I too had trouble getting the jars to work properly. This worked for me:
# if your spark installation is in /usr/lib/
# (--properties-file is only needed if you want to customize the Spark configuration;
#  the file looks similar to spark-defaults.conf)
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh \
--jars /path/to/jars/jar1.jar,/path/to/jars/jar2.jar \
--properties-file ./spark-thrift-sparkconf.conf \
--class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
This is a cluster with Hadoop 2.5.0, Spark 1.2.0, Scala 2.10, provided by CDH 5.3.2. I used a compiled spark-notebook distro.
It seems Spark-Notebook cannot find the Hive metastore by default.
How to specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?
Here is what I tried:
link all files from /etc/hive/conf, with hive-site.xml included, to the current directory
specify SPARK_CONF_DIR variable in bash
When you start the notebook, set the environment variable EXTRA_CLASSPATH to the path where you have located hive-site.xml. This works for me:
EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook
I have also passed the jar of my MySQL connector because I use Hive with MySQL.
I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351
Using the CDH 5.5.0 Quickstart VM, the solution is the following: you need to make hive-site.xml, which provides the access information for the Hive metastore, visible to the notebook. By default, spark-notebook uses an internal metastore.
You can then define the following environment variable in ~/.bash_profile:
HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CONF_DIR
(Make sure you execute source ~/.bash_profile if you do not open a new terminal.)
(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)
When I tried to load local data files into a Hive table, it reported an error while moving the files. I found the link below, which gives suggestions on how to fix this issue. I followed those steps, but it still doesn't work.
http://answers.mapr.com/questions/3565/getting-started-with-hive-load-the-data-from-sample-table txt-into-the-table-fails
After mkdir /user/hive/tmp and setting hive.exec.scratchdir=/user/hive/tmp, it still reports RuntimeException Cannot make directory: file/user/hive/tmp/hive_2013*. How can I fix this issue? Can anyone who is familiar with Hive help me? Thanks!
hive version is 0.10.0
hadoop version is 1.1.2
I suspect a permission issue here, because you are using the MapR distribution.
Make sure that the user trying to create the directory has permissions to create the directory on CLDB.
An easy way to debug this is to do
$hadoop fs -chmod -R 777 /user/hive
and then try to load the data, to confirm whether it's a permission issue.
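If it does turn out to be a permission or ownership problem, a more targeted sketch (assuming the directory should be owned by the user that runs Hive, shown here as hive, and should live on the cluster file system rather than the local one) could be:
# create the scratch directory on the cluster file system and open it up
hadoop fs -mkdir /user/hive/tmp
hadoop fs -chown hive /user/hive/tmp
hadoop fs -chmod 777 /user/hive/tmp
# if the error still mentions file:/user/hive/tmp, hive.exec.scratchdir in hive-site.xml
# may need to be a full URI, e.g. maprfs:///user/hive/tmp (or hdfs://... on plain Hadoop)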
I installed Hadoop, Hive, HBase, Sqoop and added them to the PATH.
When I try to execute a Sqoop command, I get this error:
Error: Could not find or load main class org.apache.sqoop.Sqoop
Development Environment:
OS : Ubuntu 12.04 64-bit
Hadoop Version: 1.0.4
Hive Version: 0.9.0
Hbase Version: 0.94.5
Sqoop Version: 1.4.3
Make sure you have sqoop-1.4.3.jar under your SQOOP_HOME directory.
Note: this may be because you downloaded the wrong Sqoop distribution.
I have resolved this issue on CentOS 6.3.
I have Hadoop-1.0.4, hbase-0.94.6, hive-0.10.0, pig-0.11.1, sqoop-1.4.3.bin__hadoop-1.0.0, zookeeper-3.4.5 installed.
I was also running into the same problem with Sqoop: Error - Could not find the main class: org.apache.sqoop.Sqoop.
To resolve this issue I copied the jar file sqoop-1.4.3.jar from $SQOOP_HOME/ into $HADOOP_HOME/lib/.
Hope this helps someone who is struggling to get Sqoop working with Hadoop.
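For anyone following along, that copy is just (a sketch, assuming SQOOP_HOME and HADOOP_HOME are already set in your environment):
# copy the Sqoop jar into Hadoop's lib directory so the hadoop script can find the main class
cp $SQOOP_HOME/sqoop-1.4.3.jar $HADOOP_HOME/lib/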
Unfortunately, I didn't find a complete answer for my problem. The Sqoop version I used was 1.4.6. I am not sure whether sqoop-1.4.6.tar.gz requires compiling the source code, but I was able to get past the same error (Error - Could not find the main class: org.apache.sqoop.Sqoop) using the following instructions:
Instead, I downloaded sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz from the Apache Sqoop site, installed it at /home/ubuntu/SQOOP/, and renamed sqoop-1.4.6.bin__hadoop-2.0.4-alpha to sqoop. I wanted to use it with YARN.
Then export and set $SQOOP_HOME. I used this:
export SQOOP_HOME=/home/ubuntu/SQOOP/sqoop/
export PATH=$PATH:$SQOOP_HOME/bin
Now if you go to $SQOOP_HOME/bin and try
./sqoop help
It should work without any issue.
The problem in my case was that the hadoop-env.sh file had this line in it:
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
It seems that when you call sqoop, it internally calls configure-sqoop, which sets HADOOP_CLASSPATH correctly; but then when sqoop calls hadoop, hadoop ignores that variable and resets it back to what is in hadoop-env.sh.
The fix was to change the hadoop-env.sh to have this line instead:
export HADOOP_CLASSPATH="${JAVA_HOME}/lib/tools.jar:$HADOOP_CLASSPATH"
@user225003's solution magically worked, and after looking into some of the files, here is what happens under the hood when you execute the "sqoop" script.
The "sqoop" script essentially executes the "hadoop" script from the $HADOOP_COMMON_HOME/bin/ directory. While configuring Sqoop, in "sqoop-env.sh" we set $HADOOP_COMMON_HOME to the Hadoop installation directory. If your Sqoop and Hadoop installations are not in the regular location /usr/local, I believe sqoop-x.x.x.jar is not on the hadoop script's classpath.