Sqoop hive import from oozie - hive

I am using the Cloudera QuickStart Docker image, which has MySQL installed in it. When I run the following Sqoop command from the command line to import the categories table, it works and I can see that the categories table is created:
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
Then I logged into Hue as the cloudera user and created a new Oozie workflow with a single Sqoop action. When I execute it, Sqoop is able to download the data into HDFS, but it fails when it tries to create the Hive table on top of that data.
This is how my workflow.xml looks:
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
    <start to="sqoop-4467"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="sqoop-4467">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite</command>
        </sqoop>
        <ok to="End"/>
        <error to="Kill"/>
        <sla:info>
            <sla:nominal-time>${nominal_time}</sla:nominal-time>
            <sla:should-end>${30 * MINUTES}</sla:should-end>
        </sla:info>
    </action>
    <end name="End"/>
</workflow-app>
This is how my job.properties file looks:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
nominal_time=2016-12-20T20:53Z
jobTracker=quickstart.cloudera:8032
After the job failed, when I checked the /user/home/cloudera folder I could see the categories folder with the data, but the Hive table was not created. This is the error I see in the JobHistory server for the failed job:
Sqoop command arguments :
import
--connect
jdbc:mysql://localhost/retail_db
--username
root
--password
cloudera
-m
1
--table
categories
--hive-import
--hive-overwrite
Fetching child yarn jobs
tag id : oozie-3ff81b7743470e73dcb44de6729a66d9
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
6223 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6302 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
6336 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - Using Hive-specific delimiters for output. You can override
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - delimiters with --fields-terminated-by, etc.
6367 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6654 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
6666 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
7250 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7279 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7281 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
9303 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/4fd8773510dfe4082d136b2ab7d27eb3/categories.jar
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
9314 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
9318 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of categories
9388 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
10238 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
29055 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 1.0049 KB in 19.659 seconds (52.3425 bytes/sec)
29061 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 58 records.
29076 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
29097 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000012-161221020706124-oozie-oozi-W/sqoop-4467--sqoop/action-data.seq
Oozie Launcher ends

Did you copy hive-site.xml to HDFS? That should do it. Alternatively, you can import the table to an HDFS path using --target-dir and set the location of the Hive table to point to that path.
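For reference, one common way to make the Hive client configuration visible to the Sqoop action is to upload hive-site.xml to HDFS and attach it to the action with a <file> element. A minimal sketch of the edited action, assuming the file was uploaded to /user/cloudera/hive-site.xml (that path is only an example; adjust it to wherever you actually put the file):
<action name="sqoop-4467">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite</command>
        <!-- assumed location: ships hive-site.xml into the action's working directory so the Hive import step can find the metastore configuration -->
        <file>/user/cloudera/hive-site.xml#hive-site.xml</file>
    </sqoop>
    <ok to="End"/>
    <error to="Kill"/>
</action>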

Related

Import CLOB data in Parquet format with Sqoop

I'm trying to import CLOB data with Sqoop in Parquet format. Here is my command line:
sshpass -p ${MDP_MAPR} ssh -n ${USR_MAPR}@${CNX_MAPR} sqoop import -Dmapred.job.queue.name=root.leasing.dev --connect ${CNX_DB} --username ${USR_DB} --password ${MDP_DB} --query "${query}" --delete-target-dir --target-dir ${DST_HDFS}/${SOURCE}_${table} --hive-overwrite --hive-import --hive-table ${SOURCE}_${table} --hive-database ${DST_HIVE} --hive-drop-import-delims -m 1 ${DRIVER_DB} --as-parquetfile >>${ficTrace} 2>&1
It doesn't work and I can't find out why. Here is the log from its execution:
Warning: /opt/mapr/sqoop/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/07/09 14:44:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-mapr-1703
18/07/09 14:44:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/09 14:44:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/07/09 14:44:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-2.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/07/09 14:44:43 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/07/09 14:44:43 INFO manager.SqlManager: Using default fetchSize of 1000
18/07/09 14:44:43 INFO tool.CodeGenTool: Beginning code generation
18/07/09 14:44:44 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/mapr/hadoop/hadoop-2.7.0
Note: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/07/09 14:44:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.jar
18/07/09 14:44:53 INFO tool.ImportTool: Destination directory /app/list/datum/data/calf_hors_prod-cluster/datum/dev/leasing/tmp_sqoop/DE_DECISIONS is not present, hence not deleting.
18/07/09 14:44:53 INFO mapreduce.ImportJobBase: Beginning query import.
18/07/09 14:44:53 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/09 14:44:53 INFO mapreduce.JobBase: Setting default value for hadoop.job.history.user.location=none
18/07/09 14:44:53 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:54 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 2005
Thank you for your help.
You can try adding this at the end of your Sqoop command:
--map-column-java <ORACLE_CLOB_COLUMN_NAME>=String
For example, if the Oracle table has a column named BODY of type CLOB, add this at the end:
--map-column-java BODY=String
This will provide guidance to Sqoop on Oracle CLOB type to Java type mapping.
If there are multiple columns, you can use this syntax pattern:
--map-column-java <ORACLE_CLOB_COLUMN_NAME_1>=String,<ORACLE_CLOB_COLUMN_NAME_2>=String
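Putting this together with the command from the question, here is a hedged sketch of the full invocation with the mapping appended (CLOB_COLUMN is a placeholder for the actual CLOB column name in doe.DE_DECISIONS):
sshpass -p ${MDP_MAPR} ssh -n ${USR_MAPR}@${CNX_MAPR} sqoop import -Dmapred.job.queue.name=root.leasing.dev --connect ${CNX_DB} --username ${USR_DB} --password ${MDP_DB} --query "${query}" --delete-target-dir --target-dir ${DST_HDFS}/${SOURCE}_${table} --hive-overwrite --hive-import --hive-table ${SOURCE}_${table} --hive-database ${DST_HIVE} --hive-drop-import-delims -m 1 ${DRIVER_DB} --as-parquetfile --map-column-java CLOB_COLUMN=String >>${ficTrace} 2>&1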

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS.
I start HiveServer2 from the Ambari UI and check the contents of /var/log/hive/hiveserver2.log.
Below is the error log.
Any help would be appreciated.
Contents of hiveserver2.log
2018-03-08 04:41:53,345 WARN [main-EventThread]: server.HiveServer2 (HiveServer2.java:process(343)) - This instance of HiveServer2 has been removed from the list of server instances available for dynamic service discovery. The last client session has ended - will shutdown now.
2018-03-08 04:41:53,347 INFO [main]: zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x16203aad5af0040 closed
2018-03-08 04:41:53,347 INFO [main]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:stop(405)) - Shutting down HiveServer2
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-03-08 04:41:53,348 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(508)) - Error starting HiveServer2 on attempt 1, will retry in 60 seconds
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1520480101488_0046 failed 2 times due to AM Container for appattempt_1520480101488_0046_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-91-7.ap-northeast-2.compute.internal:8088/cluster/app/application_1520480101488_0046 Then click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2: tar: Removing leading `/' from member names
tar: Skipping to next header
gzip: /hadoopfs/fs1/yarn/nodemanager/filecache/60_tmp/tmp_tez.tar.gz: invalid compressed data--format violated
tar: Exiting with failure status due to previous errors
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:76)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:488)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:87)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:720)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I had exactly the same issue with HDP on AWS. FYI, in my case the issue was with HDP version 2.6.4.5-2. I'm going to show how I fixed it on this version because it is the latest at this time.
As the error log shows, the problem is that tez.tar.gz is corrupted, so YARN is unable to decompress it in the YARN container.
This tez.tar.gz file is copied from hdfs:///hdp/apps/<hdp_version>/tez/tez.tar.gz.
To reproduce the error and confirm that this file is corrupted, you can run the following commands:
sudo su
su hdfs
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -xvzf tez.tar.gz
You will get the following error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The fix is pretty simple: just replace the HDFS file with the one on your local file system by running the following commands:
hdfs dfs -rm /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
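Before restarting, you could re-run the extraction test from above against the new copy as a quick sanity check (assuming the same 2.6.4.5-2 paths as before):
rm -f tez.tar.gz
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -tzf tez.tar.gz > /dev/null && echo "tez.tar.gz looks intact"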
Now restart the HiveServer2 service and you're done!
NOTE: If something similar happens with other services, you can do the same thing. The following link has more details: https://community.hortonworks.com/articles/30096/foxing-broken-targz-and-jar-files-in-hdp-24.html
Hope this helps!

Sqoop Import Error: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)

I am using the HDP 2.6 Sandbox. I have created a user space for the root user under the hdfs group, and when executing the following Sqoop Hive import I encounter the following two errors:
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
However, the data was imported correctly into the Hive table.
Please help me understand the significance of these errors and how I can overcome them.
[root@sandbox-hdp ~]# sqoop import \
> --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
> --username retail_dba \
> --password hadoop \
> --table departments \
> --hive-home /apps/hive/warehouse \
> --hive-import \
> --create-hive-table \
> --hive-table retail_db.departments \
> --target-dir /user/root/hive_import \
> --outdir java_files
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/14 09:42:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/01/14 09:42:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 09:42:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/14 09:42:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/14 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/14 09:42:38 INFO tool.CodeGenTool: Beginning code generation
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.3.0-235/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/14 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.jar
18/01/14 09:42:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/14 09:42:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/14 09:42:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/14 09:42:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/14 09:42:40 INFO mapreduce.ImportJobBase: Beginning import of departments
18/01/14 09:42:41 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.17.0.2:8032
18/01/14 09:42:42 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.17.0.2:10200
18/01/14 09:42:46 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/14 09:42:46 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`department_id`), MAX(`department_id`) FROM `departments`
18/01/14 09:42:46 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 2 to: 7
18/01/14 09:42:46 INFO mapreduce.JobSubmitter: number of splits:4
18/01/14 09:42:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515818851132_0050
18/01/14 09:42:47 INFO impl.YarnClientImpl: Submitted application application_1515818851132_0050
18/01/14 09:42:47 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1515818851132_0050/
18/01/14 09:42:47 INFO mapreduce.Job: Running job: job_1515818851132_0050
18/01/14 09:42:55 INFO mapreduce.Job: Job job_1515818851132_0050 running in uber mode : false
18/01/14 09:42:55 INFO mapreduce.Job: map 0% reduce 0%
18/01/14 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
18/01/14 09:43:09 INFO mapreduce.Job: map 50% reduce 0%
18/01/14 09:43:12 INFO mapreduce.Job: map 75% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: map 100% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: Job job_1515818851132_0050 completed successfully
18/01/14 09:43:16 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=682132
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=481
HDFS: Number of bytes written=60
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=44760
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=44760
Total vcore-milliseconds taken by all map tasks=44760
Total megabyte-milliseconds taken by all map tasks=11190000
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=481
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1284
CPU time spent (ms)=5360
Physical memory (bytes) snapshot=561950720
Virtual memory (bytes) snapshot=8531210240
Total committed heap usage (bytes)=176685056
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=60
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Transferred 60 bytes in 34.7351 seconds (1.7274 bytes/sec)
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/01/14 09:43:16 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:284)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
18/01/14 09:43:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:43:16 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.6.3.0-235/hive/lib/hive-common-1.2.1000.2.6.3.0-235.jar!/hive-log4j.properties
OK
Time taken: 10.427 seconds
Loading data to table retail_db.departments
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1873) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
The first error
WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
You need to check whether the Sqoop binaries are OK. It is easier to copy them again so you don't need to check file by file.
The second error
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop
This is because you are executing Sqoop as the "root" user. Change it to a user that exists in the Hadoop cluster.
Two ideas
ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
There is a class missing somewhere.
And I see you're trying to run your sqoop command using your root account under Linux. Make sure root belongs to the hdfs group; I'm not sure root is included by default.
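As a rough illustration of that suggestion (standard Linux commands, not specific to HDP; verify the group name your cluster actually uses):
id root                      # show which groups root currently belongs to
usermod -a -G hdfs root      # add root to the hdfs group as a supplementary group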
Sometimes null values are not handled by Sqoop while importing data into Hive from an RDBMS, so you should handle them explicitly by using the following options:
--null-string and --null-non-string
The complete command is:
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db --username retail_dba --password hadoop --table departments --hive-home /apps/hive/warehouse --null-string 'na' --null-non-string 'na' --hive-import --create-hive-table --hive-table retail_db.departments --target-dir /user/root/hive_import
This occurs because of the following property in /etc/hive/conf/hive-site.xml:
<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>true</value>
</property>
Set the value to false and try to run the same query.
Alternatively, make the --target-dir /user/root/hive_import a directory with read/write access, or remove it so that the Hive home directory is used.
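For clarity, after that change the entry in /etc/hive/conf/hive-site.xml would look like this (a sketch of the edited property):
<property>
  <name>hive.warehouse.subdir.inherit.perms</name>
  <value>false</value>
</property>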

"No FileSystem for scheme: s3" when importing data from postgres to s3 using sqoop

I tried to import data from a local Postgres database to S3 using Sqoop. My command is:
sqoop import --connect jdbc:postgresql://localhost/postgres --username username --password password --table table --driver org.postgresql.Driver --target-dir s3://xxxxxxxx/data/ -m 1
I was able to import to a local directory, but it failed for an S3 bucket.
The log is posted below.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/05/25 23:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/05/25 23:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/25 23:39:51 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/05/25 23:39:51 INFO manager.SqlManager: Using default fetchSize of 1000
17/05/25 23:39:51 INFO tool.CodeGenTool: Beginning code generation
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local
Note: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/05/25 23:40:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.jar
17/05/25 23:40:00 INFO mapreduce.ImportJobBase: Beginning import of table
17/05/25 23:40:00 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/05/25 23:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/25 23:40:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/05/25 23:40:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:40:01 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:164)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:156)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:160)
... 11 more

Oozie Sqoop Issue

I am trying to run an Oozie Sqoop job to import from Teradata to Hive.
Sqoop runs fine from the CLI, but I am facing issues when scheduling it with Oozie.
Note: I am able to run shell actions in Oozie and they work fine.
Find below the error logs and the workflow.
Error logs:
Log Type: stderr
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 513
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:WARN No appenders could be found for logger (org.apache.hadoop.yarn.client.RMProxy).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
No such sqoop tool: sqoop. See 'sqoop help'.
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Log Type: stdout
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 158473
Showing 4096 bytes of 158473 total. Click here for the full log.
curity.ShellBasedUnixGroupsMapping
dfs.client.domain.socket.data.traffic=false
dfs.client.read.shortcircuit.streams.cache.size=256
fs.s3a.connection.timeout=200000
dfs.datanode.block-pinning.enabled=false
mapreduce.job.end-notification.max.retry.interval=5000
yarn.acl.enable=true
yarn.nm.liveness-monitor.expiry-interval-ms=600000
mapreduce.application.classpath=$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH
mapreduce.input.fileinputformat.list-status.num-threads=1
dfs.client.mmap.cache.size=256
mapreduce.tasktracker.map.tasks.maximum=2
yarn.scheduler.fair.user-as-default-queue=true
yarn.timeline-service.ttl-enable=true
yarn.nodemanager.linux-container-executor.resources-handler.class=org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler
dfs.namenode.max.objects=0
dfs.namenode.service.handler.count=10
dfs.namenode.kerberos.principal.pattern=*
yarn.resourcemanager.state-store.max-completed-applications=${yarn.resourcemanager.max-completed-applications}
dfs.namenode.delegation.token.max-lifetime=604800000
mapreduce.job.classloader=false
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
mapreduce.job.hdfs-servers=${fs.defaultFS}
yarn.application.classpath=$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
dfs.datanode.hdfs-blocks-metadata.enabled=true
mapreduce.tasktracker.dns.nameserver=default
dfs.datanode.readahead.bytes=4193404
mapreduce.job.ubertask.maxreduces=1
dfs.image.compress=false
mapreduce.shuffle.ssl.enabled=false
yarn.log-aggregation-enable=false
mapreduce.tasktracker.report.address=127.0.0.1:0
mapreduce.tasktracker.http.threads=40
dfs.stream-buffer-size=4096
tfile.fs.output.buffer.size=262144
fs.permissions.umask-mode=022
dfs.client.datanode-restart.timeout=30
dfs.namenode.resource.du.reserved=104857600
yarn.resourcemanager.am.max-attempts=2
yarn.nodemanager.resource.percentage-physical-cpu-limit=100
ha.failover-controller.graceful-fence.connection.retries=1
mapreduce.job.speculative.speculative-cap-running-tasks=0.1
hadoop.proxyuser.hdfs.groups=*
dfs.datanode.drop.cache.behind.writes=false
hadoop.proxyuser.HTTP.hosts=*
hadoop.common.configuration.version=0.23.0
mapreduce.job.ubertask.enable=false
yarn.app.mapreduce.am.resource.cpu-vcores=1
dfs.namenode.replication.work.multiplier.per.iteration=2
mapreduce.job.acl-modify-job=
io.seqfile.local.dir=${hadoop.tmp.dir}/io/local
yarn.resourcemanager.system-metrics-publisher.enabled=false
fs.s3.sleepTimeSeconds=10
mapreduce.client.output.filter=FAILED
------------------------
Sqoop command arguments :
sqoop
import
--connect
"jdbc:teradata://xx.xxx.xx:xxxx/DATABASE=Database_name"
--verbose
--username
xxx
-password
'xxx'
--table
BILL_DETL_EXTRC
--split-by
EXTRC_RUN_ID
--m
1
--fields-terminated-by
,
--hive-import
--hive-table
OPS_TEST.bill_detl_extr213
--target-dir
/hadoop/dev/TD_archive/bill_detl_extrc
Fetching child yarn jobs
tag id : oozie-56ea2084fcb1d55591f8919b405f0be0
Child yarn jobs are found -
=================================================================
Invoking Sqoop command line now >>>
3324 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://namenode:8020/user/hadoopadm/oozie-oozi/0000039-170123205203054-oozie-oozi-W/sqoop-action--sqoop/action-data.seq
Oozie Launcher ends
Log Type: syslog
Log Upload Time: Wed Feb 01 04:19:00 -0500 2017
Log Length: 16065
Showing 4096 bytes of 16065 total. Click here for the full log.
adoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Job jar is not present. Not adding any jar to the list of resources.
2017-02-01 04:18:51,990 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file on the remote FS is /user/hadoopadm/.staging/job_1485220715968_0219/job.xml
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #5 tokens and #1 secret keys for NM use for launching container
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of containertokens_dob is 6
2017-02-01 04:18:52,074 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle token in serviceData
2017-02-01 04:18:52,174 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://svacld001.bcbsnc.com:8020]
2017-02-01 04:18:52,240 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapred.JobConf: Task java-opts do not specify heap size. Setting task attempt jvm max heap size to -Xmx820m
2017-02-01 04:18:52,243 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1485220715968_0219_m_000000_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED
2017-02-01 04:18:52,243 INFO [uber-EventHandler] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_LAUNCH for container container_1485220715968_0219_01_000001 taskAttempt attempt_1485220715968_0219_m_000000_0
2017-02-01 04:18:52,245 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: [attempt_1485220715968_0219_m_000000_0] using containerId: [container_1485220715968_0219_01_000001 on NM: [svacld005.bcbsnc.com:8041]
2017-02-01 04:18:52,246 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: mapreduce.cluster.local.dir for uber task: /disk1/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk10/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk11/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk12/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk2/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk3/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk4/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk5/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk6/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk7/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk8/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219,/disk9/yarn/nm/usercache/hadoopadm/appcache/application_1485220715968_0219
2017-02-01 04:18:52,247 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1485220715968_0219_m_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
2017-02-01 04:18:52,247 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1485220715968_0219_m_000000 Task Transitioned from SCHEDULED to RUNNING
2017-02-01 04:18:52,249 INFO [uber-SubtaskRunner] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-02-01 04:18:52,258 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-02-01 04:18:52,324 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.oozie.action.hadoop.OozieLauncherInputFormat$EmptySplit#9c73765
2017-02-01 04:18:52,329 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2017-02-01 04:18:52,340 INFO [uber-SubtaskRunner] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
WORKFLOW
<workflow-app xmlns="uri:oozie:workflow:0.5" name="oozie-wf">
    <start to="sqoop-wf"/>
    <action name="sqoop-wf">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>xx.xx.xx:8032</job-tracker>
            <name-node>hdfs://xx.xxx.xx:8020</name-node>
            <command>import --connect "jdbc:teradata://ip/DATABASE=EDW_EXTRC_TAB_HST" --connection-manager "com.cloudera.connector.teradata.TeradataManager" --verbose --username HADOOP -password 'xxxxx' --table BILL_DETL_EXTRC --split-by EXTRC_RUN_ID --m 1 --fields-terminated-by , --hive-import --hive-table OPS_TEST.bill_detl_extrc1 --target-dir /hadoop/dev/TD_archive/data/PDCRDATA_TEST/bill_detl_extrc</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
JOB PROPERTIES
oozie.wf.application.path=hdfs:///hadoop/dev/TD_archive/workflow1.xml
oozie.use.system.libpath=true
security_enabled=True
dryrun=False
jobtracker=xxx.xxx:8032
nameNode=hdfs://xx.xx:8020
NOTE:
We are using Cloudera CDH 5.5.
All the necessary JARs (sqoop-connector-teradata-1.5c5.jar, tdgssconfig.jar, terajdbc4.jar) are placed in /var/lib/sqoop as well as in HDFS.