Why am I getting this error when importing table data from Oracle DB to Hive using Sqoop? - hive

I am getting the below error while importing data from Oracle DB to Hive using Sqoop:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot run program "hive": error=2, No such file or directory
Below is the command I am executing:
sqoop import --connect jdbc:oracle:thin:#host:port/xe --username sa --password sa --table SA.SHIVAMSAMPLE --hive-import -m 1
The data gets created in HDFS, but the Hive table is not created; a folder appears under the default directory (visible with bin/hdfs dfs -ls).
Only when I explicitly set the warehouse path does the data go into a warehouse directory such as "user/hive/warehouse", but even then the table is not created and no data is loaded.
I installed Hadoop in "Amit/hadoop-2.6.5", Hive in "Amit/apache-hive-1.2.1-bin", and Sqoop in "Amit/sqoop-1.4.5-cdh5.3.2". In .bashrc I set only the Hadoop path.
Do I need to set the Hive and Sqoop paths as well?
When I set HIVE_HOME in the sqoop-env.sh file, the above command runs fine, but the table is still not created; only a file is created in HDFS under /user/hive/warehouse/shivamsample.
Can you please tell me what extra configuration is required to resolve this issue?
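For reference, a minimal sketch of the .bashrc entries this question is about, assuming the three directories mentioned above sit under the home directory (adjust the paths to your actual layout):

# Assumed install locations, based on the paths mentioned in the question
export HADOOP_HOME=$HOME/Amit/hadoop-2.6.5
export HIVE_HOME=$HOME/Amit/apache-hive-1.2.1-bin
export SQOOP_HOME=$HOME/Amit/sqoop-1.4.5-cdh5.3.2
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin

After sourcing .bashrc (source ~/.bashrc), the hive launcher should be resolvable from the shell; --hive-import shells out to the hive binary, which is why "Cannot run program "hive"" appears when it is not on the PATH.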

Related

Hive CLI giving problems while starting it

When I run the hive command it only starts from the bin folder, because the metastore was created there; if I run it from my home directory it fails to start and shows an error.
I have added these lines to my .bashrc file for Hive:
HIVE env variables
export HIVE_HOME=/opt/hadoop/hive/apache-hive-2.3.4-bin
export PATH=$HIVE_HOME/bin:$PATH
Can you try to set up the path as mentioned below and retry?
user#ubuntu:~$ sudo gedit ~/.bashrc
Copy and paste the following lines at the end of the file:
# Set HIVE_HOME
export HIVE_HOME="/opt/hadoop/hive/apache-hive-2.3.4-bin"
PATH=$PATH:$HIVE_HOME/bin
export PATH
But my suggestion here is: instead of using the Hive command prompt, use the recommended way, which is the Beeline client. If you have HiveServer2 configured, you can connect with Beeline and query Hive.
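For example, a Beeline connection could look like the line below; the host, port, and username are placeholders, assuming HiveServer2 is listening on its default port 10000:

# Connect to HiveServer2 and run a quick sanity-check query
beeline -u "jdbc:hive2://localhost:10000" -n hiveuser -e "SHOW DATABASES;"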

Error while using sqoop to copy data to s3

I am using Sqoop to copy a Postgres table to S3 using the following command:
sqoop import -m 1 --connect jdbc:postgresql://xx.us-west-2.rds.amazonaws.com:5432/prod_db --username user_ro --password user_pwd --table content --target-dir s3://test/user/sqoop_test --as-avrodatafile
This works the first time. Before the next execution I deleted the target directory using:
aws s3 rm s3://test/user/sqoop_test
The next execution of Sqoop results in the following error:
18/07/21 05:31:53 ERROR tool.ImportTool: Encountered IOException running import job: com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'user/sqoop_test' present in the metadata but not s3
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:453)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:380)
I have also tried "emrfs delete..." followed by "emrfs import..." and "emrfs sync...", but that didn't help resolve the problem. Any help will be appreciated.
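For reference, a sketch of how those EMRFS commands would be spelled out against the path used in the import above (the exact metadata behavior depends on your EMR release, so treat this as an illustration rather than a guaranteed fix):

# Drop the stale EMRFS consistency metadata for the prefix, then rebuild it from S3
emrfs delete s3://test/user/sqoop_test
emrfs sync s3://test/user/sqoop_test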

sqoop failed to overwrite

I'm using the below command to import data from SQL Server to Azure Blob Storage:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect "jdbc:sqlserver://server-IP;database=database_name;username=user;password=password"
--username test --password "password" --query "select top 5 * from employ where \$CONDITIONS" --delete-target-dir --target-dir 'wasb://sample#workingclusterblob.blob.core.windows.net/source/employ'
-m 1
I am getting the below error:
18/01/30 03:35:45 INFO tool.ImportTool: Destination directory wasb://sample#workingclusterblob.blob.core.windows.net/source/employ is not present, hence not deleting.
18/01/30 03:35:45 INFO mapreduce.ImportJobBase: Beginning query import.
18/01/30 03:35:46 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.19:10200
18/01/30 03:35:46 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory wasb://sample#workingclusterblob.blob.core.windows.net/source/employ already exists
The log statements are confusing: they say the directory is not present (hence nothing to delete), yet report that it already exists while writing.
From the Apache Sqoop user guide:
By default, imports go to a new target location. If the destination
directory already exists in HDFS, Sqoop will refuse to import and
overwrite that directory’s contents. If you use the --append argument,
Sqoop will import data to a temporary directory and then rename the
files into the normal target directory in a manner that does not
conflict with existing filenames in that directory.
I'm not in an Azure environment to reproduce and validate the solution, but try adding --append to your sqoop import and let me know.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect "jdbc:sqlserver://server-IP;database=database_name;username=user;password=password"
--username test --password "password" --query "select top 5 * from employ where \$CONDITIONS" --delete-target-dir --append --target-dir 'wasb://sample#workingclusterblob.blob.core.windows.net/source/employ'
-m 1
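If your Sqoop version complains that --append and --delete-target-dir cannot be combined, a variant that relies on --append alone (same placeholders as above) would be:

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect "jdbc:sqlserver://server-IP;database=database_name;username=user;password=password" --username test --password "password" --query "select top 5 * from employ where \$CONDITIONS" --append --target-dir 'wasb://sample#workingclusterblob.blob.core.windows.net/source/employ' -m 1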

How can I import a table from an RDBMS to Hive?

I am trying to import a table from MySQL to Hive and I am getting the following error.
Can you please provide a solution for this?
bin/sqoop import --connect jdbc:mysql://202.63.155.22:3306/demo --username careuser --P --table caremanager --hive-import --verbose -m 1
3/12/30 02:42:05 WARN hive.TableDefWriter:
Column createddate had to be cast to a less precise type in Hive
13/12/30 02:42:05 ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Cannot run program "hive": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
Actually, Sqoop is trying to invoke Hive to create the Hive table but cannot execute it. Is Hive installed on your machine, and is the Hive home's bin directory on the PATH environment variable? Please verify that. Hope this solves your problem.
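A quick way to verify, as a sketch (the Hive location below is an example, not the asker's actual install path):

# Check whether the hive launcher is resolvable from the shell running Sqoop
which hive

# If it is not found, point HIVE_HOME at the Hive install and add its bin to PATH
export HIVE_HOME=/opt/hive          # example location only
export PATH=$PATH:$HIVE_HOME/bin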

How to move mysql data to Hive using sqoop2

I have successfully installed it and imported data from MySQL to HDFS, but I couldn't get the data into Hive. The data imported from MySQL is present in HDFS in comma-separated format.
Older versions of Sqoop use:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-table foo.tb
In Sqoop2 I couldn't find an option to set the Hive table options while creating the job or the connectors.
Please help...
Sqoop2 currently does not have Hive integration and can import data only into HDFS. You can load the files into Hive manually from there, though (using the LOAD DATA command), or you can continue using Sqoop1 for the time being.
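For example, assuming the import landed in an HDFS directory such as /user/you/tb and a matching Hive table foo.tb already exists (both names are illustrative), the manual load from the shell could look like:

# LOAD DATA INPATH moves the imported files into the table's warehouse location
hive -e "LOAD DATA INPATH '/user/you/tb' INTO TABLE foo.tb;"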