Error while using sqoop to copy data to s3 - amazon-s3

I am using Sqoop to copy a Postgres table to S3 with the following command:
sqoop import -m 1 --connect jdbc:postgresql://xx.us-west-2.rds.amazonaws.com:5432/prod_db --username user_ro --password user_pwd --table content --target-dir s3://test/user/sqoop_test --as-avrodatafile
This works the first time. Before the next execution I deleted the target directory using:
aws s3 rm s3://test/user/sqoop_test
The next execution of Sqoop results in the following error:
18/07/21 05:31:53 ERROR tool.ImportTool: Encountered IOException running import job: com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'user/sqoop_test' present in the metadata but not s3
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:453)
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:380)
I have also tried "emrfs delete ..." followed by "emrfs import ..." and "emrfs sync ...", but that didn't resolve the problem. Any help will be appreciated.
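The error says that the EMRFS consistent-view metadata still lists 'user/sqoop_test' while the objects are gone from S3, which is the state one would expect after deleting with aws s3 rm, since that command does not go through EMRFS. A minimal, untested sketch of a cleanup that may be worth trying, assuming it runs on the EMR master node where the emrfs and hadoop commands are available. The run helper and the DRY_RUN variable are hypothetical names used only for illustration, and by default the script just prints the commands:

```shell
#!/bin/sh
# Target path taken from the question.
TARGET="s3://test/user/sqoop_test"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

cleanup_target() {
    # Reconcile the EMRFS metadata with what is actually in S3 under this path.
    run emrfs sync "$1"
    # Delete through the Hadoop filesystem layer instead of "aws s3 rm", so that
    # EMRFS sees the deletion; -f suppresses the error if the path is already gone.
    run hadoop fs -rm -r -f "$1"
}

cleanup_target "$TARGET"
```

Run it with DRY_RUN=0 to execute the commands for real. Whether this clears the inconsistency on a given cluster is an assumption, not something confirmed in this thread.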

Related

Why am I getting an error when importing table data from Oracle DB to Hive using Sqoop?

I am getting the error below while importing data from Oracle DB to Hive using Sqoop:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot run program "hive": error=2, No such file or directory
Below is the command I am executing:
sqoop import --connect jdbc:oracle:thin:@host:port/xe --username sa --password sa --table SA.SHIVAMSAMPLE --hive-import -m 1
The data is created in HDFS, but the Hive table is not created, i.e. a folder is created in the default directory (as listed by bin/hdfs dfs -ls).
Only when I explicitly give the warehouse path is the data stored in the warehouse directory, e.g. "user/hive/warehouse", and even then the table is not created and no data is loaded.
I installed Hadoop in "Amit/hadoop-2.6.5", Hive in "Amit/apache-hive-1.2.1-bin" and Sqoop in "Amit/sqoop-1.4.5-cdh5.3.2". In .bashrc I set only the Hadoop path.
Is it required to set the Hive and Sqoop paths as well?
When I set the Hive home in the sqoop-env.sh file, the above command runs fine, but the table is not created and a file is created in HDFS under /user/hive/warehouse/shivamsample.
Can you please tell me whether any extra configuration is required to resolve this issue?
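With --hive-import, Sqoop launches Hive by running the hive executable, so the shell that runs sqoop must be able to find it. A minimal sketch of the .bashrc additions this implies, assuming the three packages are unpacked under a hypothetical base directory $HOME/Amit (the question only gives relative paths, so INSTALL_BASE must be adjusted to the real location):

```shell
# Hypothetical base directory; the question only says "Amit/...".
INSTALL_BASE="${INSTALL_BASE:-$HOME/Amit}"

export HADOOP_HOME="$INSTALL_BASE/hadoop-2.6.5"
export HIVE_HOME="$INSTALL_BASE/apache-hive-1.2.1-bin"
export SQOOP_HOME="$INSTALL_BASE/sqoop-1.4.5-cdh5.3.2"

# Put each bin directory on PATH so that "hive" and "sqoop" resolve by name.
export PATH="$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$PATH"
```

After sourcing the file in a new shell, "command -v hive" should print a path if the directories are correct.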

Sqoop import of all tables from MySQL to Hive

I am trying to import all tables from a MySQL schema to Hive using the Sqoop command below:
sqoop import-all-tables --connect jdbc:mysql://ip-172-31-20-247:3306/retail_db --username sqoopuser -P --hive-import --hive-import --create-hive-table -m 3
It fails with:
18/09/01 09:24:52 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
hdfs://ip-172-31-35-141.ec2.internal:8020/user/kumarrupesh2389619/categories already exists
Run the command below (-rmr is deprecated in newer Hadoop releases in favor of -rm -r):
hdfs dfs -rm -r /user/kumarrupesh2389619/categories
Your command is failing because the output directory already exists.
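Since import-all-tables writes one directory per table, the same failure can recur for other tables on a rerun, so the cleanup can be scripted. A minimal sketch, assuming the hdfs client is on PATH; remove_if_exists and the DRY_RUN switch are hypothetical names, and with the default DRY_RUN=1 the commands are only printed:

```shell
# Hypothetical helper: remove each given HDFS directory if it exists.
remove_if_exists() {
    for dir in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show what would be executed without touching HDFS.
            echo "+ hdfs dfs -test -d $dir && hdfs dfs -rm -r $dir"
            continue
        fi
        # -test -d exits 0 only when the path exists and is a directory.
        if hdfs dfs -test -d "$dir"; then
            hdfs dfs -rm -r "$dir"
        fi
    done
}

# Directory named in the error message; add other table directories as needed.
remove_if_exists /user/kumarrupesh2389619/categories
```

Set DRY_RUN=0 to perform the deletion.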

sqoop failed to overwrite

I'm using the command below to import data from SQL Server to Azure Blob storage:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect "jdbc:sqlserver://server-IP;database=database_name;username=user;password=password" \
--username test --password "password" --query "select top 5 * from employ where \$CONDITIONS" --delete-target-dir --target-dir 'wasb://sample@workingclusterblob.blob.core.windows.net/source/employ' \
-m 1
I get the error below:
18/01/30 03:35:45 INFO tool.ImportTool: Destination directory wasb://sample@workingclusterblob.blob.core.windows.net/source/employ is not present, hence not deleting.
18/01/30 03:35:45 INFO mapreduce.ImportJobBase: Beginning query import.
18/01/30 03:35:46 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.19:10200
18/01/30 03:35:46 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory wasb://sample@workingclusterblob.blob.core.windows.net/source/employ already exists
The log statements are confusing: they report the directory as not present when deleting, but as already existing when writing.
From Apache Sqoop user guide:
By default, imports go to a new target location. If the destination
directory already exists in HDFS, Sqoop will refuse to import and
overwrite that directory’s contents. If you use the --append argument,
Sqoop will import data to a temporary directory and then rename the
files into the normal target directory in a manner that does not
conflict with existing filenames in that directory.
I'm not in an Azure environment to reproduce and validate the solution, but try adding --append to your sqoop import and let me know.
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect "jdbc:sqlserver://server-IP;database=database_name;username=user;password=password" \
--username test --password "password" --query "select top 5 * from employ where \$CONDITIONS" --delete-target-dir --append --target-dir 'wasb://sample@workingclusterblob.blob.core.windows.net/source/employ' \
-m 1

Sqoop import statement

The sqoop import statement below worked for me the other day, but today the same statement shows an error. Below are the error and the import statement.
ERROR:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'sqoop import
sqoop import \
--connect jdbc:mysql://localhost/loudacre \
--username training --password training \
--table device \
-- target-dir /loudacre/device
-m 1
Check whether you are missing 'where $CONDITIONS' in your import command. 'where $CONDITIONS' is mandatory when importing with --query (a plain --table import does not need it), and $CONDITIONS must be in upper case. Please make this correction and try again.
You have missed the "\" at the end of the --target-dir line, and "-- target-dir" has a stray space after the dashes.
Can you run this and check it once:
sqoop import \
--connect jdbc:mysql://localhost/loudacre \
--username training --password training \
--table device \
--target-dir /loudacre/device \
-m 1
Yes, you may face an error saying that the --target-dir /loudacre/device already exists. When you run the command for the first time, the target directory does not yet exist in HDFS, so the script runs fine; on the second run the directory is already present in HDFS, so Sqoop throws an error saying the directory already exists.
Solutions to resolve this error:
1. Give a new directory, or delete the existing directory and try the script again.
2. You can also import the MySQL data in Sqoop append mode or overwrite mode. Refer to the link below:
Using sqoop import, How to append rows into existing hive table?
Kindly let me know if it works.
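Option 1 can be automated by deriving a fresh directory name for each run. A minimal sketch, assuming a timestamp suffix is acceptable in the target path; unique_target_dir is a hypothetical helper name, and the sqoop command is only echoed here rather than executed:

```shell
# Hypothetical helper: append a UTC timestamp so each run gets a new directory.
unique_target_dir() {
    echo "${1}_$(date -u +%Y%m%d_%H%M%S)"
}

target_dir="$(unique_target_dir /loudacre/device)"

# Print the import command that would be run with the fresh directory.
echo sqoop import \
    --connect jdbc:mysql://localhost/loudacre \
    --username training --password training \
    --table device \
    --target-dir "$target_dir" \
    -m 1
```

Remove the leading echo to run the import. Old directories then accumulate and need separate cleanup.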
Error solved, guys. It turns out that when you copy and paste the command from Notepad to the terminal, the " character changes and results in an error. We need to type " explicitly in the terminal.
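Editors and word processors often replace plain quotes with typographic ones, which the shell does not treat as quotes. A minimal sketch for checking a saved command before pasting it, assuming the command is stored in a file; has_non_ascii is a hypothetical helper name:

```shell
# Hypothetical helper: exit 0 if the file contains any byte outside printable
# ASCII and whitespace; grep -n prints the matching lines with line numbers.
has_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Example: a command file containing typographic quotes (UTF-8, octal escapes).
tmp="$(mktemp)"
printf 'sqoop import --password \342\200\234training\342\200\235\n' > "$tmp"
if has_non_ascii "$tmp"; then
    echo "non-ASCII characters found; retype the flagged parts by hand"
fi
rm -f "$tmp"
```

The check is byte-based, so it also flags legitimate non-ASCII text such as accented table names.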

How can I import a table from an RDBMS to Hive?

I am trying to import a table from MySQL to Hive and I am getting the following error.
Can you please provide a solution for this?
bin/sqoop import --connect jdbc:mysql://202.63.155.22:3306/demo --username careuser --P --table caremanager --hive-import --verbose -m 1
13/12/30 02:42:05 WARN hive.TableDefWriter:
Column createddate had to be cast to a less precise type in Hive
13/12/30 02:42:05 ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Cannot run program "hive": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
Sqoop is trying to run Hive to create the Hive table but cannot execute it. Is Hive installed on your machine, and is the Hive home set in the PATH environment variable? Please verify this. I hope this solves your problem.
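A quick way to verify this before rerunning the import is to check that the hive executable resolves on PATH. A minimal sketch; require_cmd is a hypothetical helper name, and the printed hint assumes a standard layout with the launcher in $HIVE_HOME/bin:

```shell
# Hypothetical helper: report whether a command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# The hint is printed literally (single quotes), as text to copy into the shell.
require_cmd hive || echo 'hint: export PATH="$HIVE_HOME/bin:$PATH" and retry'
```

Run it in the same shell and as the same user that runs sqoop, since PATH can differ between sessions.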