import clob data in parquet format with scoop

import clob data in parquet format with scoop - sql

I'm trying to importe clob data with scoop in parquet format here is my command line:
sshpass -p ${MDP_MAPR} ssh -n ${USR_MAPR}#${CNX_MAPR} sqoop import -Dmapred.job.queue.name=root.leasing.dev --connect ${CNX_DB} --username ${USR_DB} --password ${MDP_DB} --query "${query}" --delete-target-dir --target-dir ${DST_HDFS}/${SOURCE}_${table} --hive-overwrite --hive-import --hive-table ${SOURCE}_${table} --hive-database ${DST_HIVE} --hive-drop-import-delims -m 1 ${DRIVER_DB} --as-parquetfile >>${ficTrace} 2>&1
but it doesn't work and I can't find why, here is the log I get from it's execution:
Warning: /opt/mapr/sqoop/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/07/09 14:44:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-mapr-1703
18/07/09 14:44:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/09 14:44:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/07/09 14:44:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-2.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/07/09 14:44:43 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/07/09 14:44:43 INFO manager.SqlManager: Using default fetchSize of 1000
18/07/09 14:44:43 INFO tool.CodeGenTool: Beginning code generation
18/07/09 14:44:44 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/mapr/hadoop/hadoop-2.7.0
Note: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/07/09 14:44:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.jar
18/07/09 14:44:53 INFO tool.ImportTool: Destination directory /app/list/datum/data/calf_hors_prod-cluster/datum/dev/leasing/tmp_sqoop/DE_DECISIONS is not present, hence not deleting.
18/07/09 14:44:53 INFO mapreduce.ImportJobBase: Beginning query import.
18/07/09 14:44:53 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/09 14:44:53 INFO mapreduce.JobBase: Setting default value for hadoop.job.history.user.location=none
18/07/09 14:44:53 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:54 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 2005
thank you for your help.

You can try adding this at the end of your Sqoop command:
--map-column-java <ORACLE_CLOB_COLUMN_NAME>=String
For example, if the Oracle table has a column named BODY of type CLOB, add this at the end:
--map-column-java BODY=String
This will provide guidance to Sqoop on Oracle CLOB type to Java type mapping.
If there are multiple columns, you can use this syntax pattern:
--map-column-java <ORACLE_CLOB_COLUMN_NAME_1>=String,<ORACLE_CLOB_COLUMN_NAME_2>=String

Related

Sqoop Import Error: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)

I am using HDP 2.6 Sandbox. I have created a user space with user root under hdfs group and executing following sqoop hive import and encountering following 2 errors:
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
However, data got imported correctly into hive table.
Please help me to understand the significant of this error and how can I overcome this error.
[root#sandbox-hdp ~]# sqoop import \
> --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
> --username retail_dba \
> --password hadoop \
> --table departments \
> --hive-home /apps/hive/warehouse \
> --hive-import \
> --create-hive-table \
> --hive-table retail_db.departments \
> --target-dir /user/root/hive_import \
> --outdir java_files
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/14 09:42:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/01/14 09:42:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 09:42:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/14 09:42:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/14 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/14 09:42:38 INFO tool.CodeGenTool: Beginning code generation
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.3.0-235/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/14 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.jar
18/01/14 09:42:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/14 09:42:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/14 09:42:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/14 09:42:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/14 09:42:40 INFO mapreduce.ImportJobBase: Beginning import of departments
18/01/14 09:42:41 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.17.0.2:8032
18/01/14 09:42:42 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.17.0.2:10200
18/01/14 09:42:46 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/14 09:42:46 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`department_id`), MAX(`department_id`) FROM `departments`
18/01/14 09:42:46 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 2 to: 7
18/01/14 09:42:46 INFO mapreduce.JobSubmitter: number of splits:4
18/01/14 09:42:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515818851132_0050
18/01/14 09:42:47 INFO impl.YarnClientImpl: Submitted application application_1515818851132_0050
18/01/14 09:42:47 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1515818851132_0050/
18/01/14 09:42:47 INFO mapreduce.Job: Running job: job_1515818851132_0050
18/01/14 09:42:55 INFO mapreduce.Job: Job job_1515818851132_0050 running in uber mode : false
18/01/14 09:42:55 INFO mapreduce.Job: map 0% reduce 0%
18/01/14 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
18/01/14 09:43:09 INFO mapreduce.Job: map 50% reduce 0%
18/01/14 09:43:12 INFO mapreduce.Job: map 75% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: map 100% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: Job job_1515818851132_0050 completed successfully
18/01/14 09:43:16 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=682132
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=481
HDFS: Number of bytes written=60
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=44760
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=44760
Total vcore-milliseconds taken by all map tasks=44760
Total megabyte-milliseconds taken by all map tasks=11190000
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=481
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1284
CPU time spent (ms)=5360
Physical memory (bytes) snapshot=561950720
Virtual memory (bytes) snapshot=8531210240
Total committed heap usage (bytes)=176685056
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=60
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Transferred 60 bytes in 34.7351 seconds (1.7274 bytes/sec)
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/01/14 09:43:16 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:284)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
18/01/14 09:43:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:43:16 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.6.3.0-235/hive/lib/hive-common-1.2.1000.2.6.3.0-235.jar!/hive-log4j.properties
OK
Time taken: 10.427 seconds
Loading data to table retail_db.departments
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1873) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The first error
WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
You need to check if Sqoop binaries are ok. Better copy them again so you don't need to ckeck file by file.
The second error
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop
Is because you are executing sqoop with "root" user. Change it to user that exists in the hadoop cluster.

Two ideas
ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
There is a class missing somewhere.
And I see you're trying to run you sqoop command using your root account under LINUX. Make sure root belong to hdfs group. I'm not sure root is included by default.

Sometimes null values will not handle by Sqoop while importing data into Hive from RDBMS so you should handle them explicitly by using the following keys:
--null-string and --null-non-string
Complete command is
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db --username retail_dba --password hadoop --table departments --hive-home /apps/hive/warehouse --null-string 'na' --null-non-string 'na' --hive-import --create-hive-table --hive-table retail_db.departments --target-dir /user/root/hive_import

It's occurrence is due to the field in in /etc/hive/conf/hive-site.xml:
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
Set the value to false and try to run the same query,
Or else make the --target-dir /user/root/hive_import to read/write access directory or remove it, it will take the hive home directory

message:Hive Schema version 1.2.0 does not match metastore's schema version 2.1.0 Metastore is not upgraded or corrupt

enviroment: spark2.11 hive2.2 hadoop2.8.2
hive shell run successfully! and hava no error or warning.
but when run application.sh, start failed
/usr/local/spark/bin/spark-submit \
--class cn.spark.sql.Demo \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 3 \
--files /usr/local/hive/conf/hive-site.xml \
--driver-class-path /usr/local/hive/lib/mysql-connector-java.jar \
/usr/local/java/sql/sparkstudyjava.jar \
and the error tips:
Exception in thread "main" java.lang.IllegalArgumentException: Error while
instantiating 'org.apache.spark.sql.hive.HiveSessionState':
...
Caused by: java.lang.IllegalArgumentException: Error while instantiating
'org.apache.spark.sql.hive.HiveExternalCatalog':
...
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to
instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
...
Caused by: java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
...
Caused by: MetaException(message:Hive Schema version 1.2.0 does not match
metastore's schema version 2.1.0 Metastore is not upgraded or corrupt)
...
i try many method to solving this errors, but errors still occurs.
how to fix?

Probably hive is referring to another version of hive (which is configured differently).
Execute below command & see if output is different from /usr/local/hive.
$which hive
If both are same hive directories, add below properties in hive-site.xml.
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>

Sometimes the hive jar in spark could be different than hive version installed, to handle this scenario, You can Pass the jars and version in conf parameter to spark job submit
E.g.
--conf spark.sql.hive.metastore.version=2.3.0 --conf spark.sql.hive.metastore.jars=/home/apache-hive-2.3.6-bin/lib/*

I got the similar issue due to Spark & Hive Version mis-match.Spark 2.0 points to 1.2.0 Hive version and the default Hive i was using was 0.14.0. So by passing version while starting the pyspark will resolve the issue.
pyspark --master yarn --num-executors 1 --executor-memory 512M --conf spark.sql.hive.metastore.version=0.14.0 --conf spark.sql.hive.metastore.jars=/usr/local/hive/apache-hive-0.14.0-bin/*

"No FileSystem for scheme: s3" when importing data from postgres to s3 using sqoop

I tried to import data from local postgres to s3 using sqoop. My command is
sqoop import --connect jdbc:postgresql://localhost/postgres --username username --password password --table table --driver org.postgresql.Driver --target-dir s3://xxxxxxxx/data/ -m 1```
I was able to import to a local directory, but failed for a s3 bucket.
The log is posted as below.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/05/25 23:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/05/25 23:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/25 23:39:51 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/05/25 23:39:51 INFO manager.SqlManager: Using default fetchSize of 1000
17/05/25 23:39:51 INFO tool.CodeGenTool: Beginning code generation
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local
Note: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/05/25 23:40:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.jar
17/05/25 23:40:00 INFO mapreduce.ImportJobBase: Beginning import of table
17/05/25 23:40:00 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/05/25 23:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/25 23:40:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/05/25 23:40:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:40:01 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:164)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:156)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:160)
... 11 more
Really not sure how to add details to the log, or how to make log not look like code so it will not ask for more details. sorry.

Sqoop hive import from oozie

I am using Cloudera Quick Start Docker image
The quickstart image has mysql installed in it. When i use following sqoop command from command line to import categories table it works and i can see that categories table is created
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
Then i logged into Hue as cloudera user and i did create a new oozie workflow with single sqoop task, but when i try to execute that sqoop is able to download the data into HDFS, but when it tries to create hive table on top of that it fails
This is how my workflow.xml looks like
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
<start to="sqoop-4467"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-4467">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
</command>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
<sla:info>
<sla:nominal-time>${nominal_time}</sla:nominal-time>
<sla:should-end>${30 * MINUTES}</sla:should-end>
</sla:info>
</action>
<end name="End"/>
</workflow-app>
This is how my job.properties file looks like
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
nominal_time=2016-12-20T20:53Z
jobTracker=quickstart.cloudera:8032
After the job failed, when i checked the /user/home/cloudera folder i can see the categories folder with data but i dont see the hive table being created. This is the error that i see in the jobhistory server for the failed job
Sqoop command arguments :
import
--connect
jdbc:mysql://localhost/retail_db
--username
root
--password
cloudera
-m
1
--table
categories
--hive-import
--hive-overwrite
Fetching child yarn jobs
tag id : oozie-3ff81b7743470e73dcb44de6729a66d9
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
6223 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6302 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
6336 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - Using Hive-specific delimiters for output. You can override
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - delimiters with --fields-terminated-by, etc.
6367 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6654 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
6666 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
7250 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7279 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7281 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
9303 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/4fd8773510dfe4082d136b2ab7d27eb3/categories.jar
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
9314 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
9318 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of categories
9388 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
10238 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
29055 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 1.0049 KB in 19.659 seconds (52.3425 bytes/sec)
29061 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 58 records.
29076 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
29097 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000012-161221020706124-oozie-oozi-W/sqoop-4467--sqoop/action-data.seq
Oozie Launcher ends

did you copy the hive-site.xml to HDFS ?that will do or you can import the table to hdfs path using --target-dir and set the location of hive table to point that path

Failed to schematool -initSchema -dbType derby

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Failed to create database 'metastore_db', see the next exception for details.
SQL Error code: 40000
Use --verbose for detailed stacktrace.
* schemaTool failed *

FYI,
please check the permission on hive installation directory.
hive installation directory should be owned by the same user that is for hadoop.
that's how it worked for me.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

import clob data in parquet format with scoop - sql

Related

Sqoop Import Error: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)

message:Hive Schema version 1.2.0 does not match metastore's schema version 2.1.0 Metastore is not upgraded or corrupt

"No FileSystem for scheme: s3" when importing data from postgres to s3 using sqoop

Sqoop hive import from oozie

Failed to schematool -initSchema -dbType derby

Categories

Resources