"No FileSystem for scheme: s3" when importing data from postgres to s3 using sqoop - amazon-s3

I tried to import data from a local Postgres database to S3 using Sqoop. My command is:
sqoop import --connect jdbc:postgresql://localhost/postgres --username username --password password --table table --driver org.postgresql.Driver --target-dir s3://xxxxxxxx/data/ -m 1
I was able to import to a local directory, but it failed for an S3 bucket.
The log is posted as below.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/Cellar/sqoop/1.4.6/libexec/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/05/25 23:39:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/05/25 23:39:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/05/25 23:39:51 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/05/25 23:39:51 INFO manager.SqlManager: Using default fetchSize of 1000
17/05/25 23:39:51 INFO tool.CodeGenTool: Beginning code generation
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:39:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local
Note: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/05/25 23:40:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-user/compile/faa4eb1b79a8e71f5c732c605f8968d8/table.jar
17/05/25 23:40:00 INFO mapreduce.ImportJobBase: Beginning import of table
17/05/25 23:40:00 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/05/25 23:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/25 23:40:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/05/25 23:40:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM table AS t WHERE 1=0
17/05/25 23:40:01 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:164)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:156)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:160)
... 11 more
I'm really not sure how to add more details, or how to format the log so it doesn't look like code and the site stops asking for more details. Sorry.
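For reference, a plain Apache Hadoop/Sqoop install (unlike EMR) typically has no FileSystem implementation registered for the bare s3 scheme, which is what the exception is complaining about. A rough sketch of the usual workaround, assuming a reasonably recent Hadoop with the hadoop-aws jar and its matching aws-java-sdk jar on the Hadoop/Sqoop classpath (the bucket name and credentials below are placeholders):
# assumes hadoop-aws (which provides the s3a:// connector) and aws-java-sdk are on the classpath
sqoop import \
  -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
  -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
  --connect jdbc:postgresql://localhost/postgres \
  --username username --password password \
  --table table --driver org.postgresql.Driver \
  --target-dir s3a://your-bucket/data/ \
  -m 1
The generic -D options must come right after the tool name (import), before the Sqoop-specific arguments; the keys can also live in core-site.xml instead of on the command line.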

Related

Import CLOB data in Parquet format with Sqoop

I'm trying to import CLOB data with Sqoop in Parquet format. Here is my command line:
sshpass -p ${MDP_MAPR} ssh -n ${USR_MAPR}@${CNX_MAPR} sqoop import -Dmapred.job.queue.name=root.leasing.dev --connect ${CNX_DB} --username ${USR_DB} --password ${MDP_DB} --query "${query}" --delete-target-dir --target-dir ${DST_HDFS}/${SOURCE}_${table} --hive-overwrite --hive-import --hive-table ${SOURCE}_${table} --hive-database ${DST_HIVE} --hive-drop-import-delims -m 1 ${DRIVER_DB} --as-parquetfile >>${ficTrace} 2>&1
But it doesn't work and I can't find out why. Here is the log I get from its execution:
Warning: /opt/mapr/sqoop/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/07/09 14:44:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-mapr-1703
18/07/09 14:44:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/07/09 14:44:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/07/09 14:44:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hive/hive-2.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/07/09 14:44:43 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/07/09 14:44:43 INFO manager.SqlManager: Using default fetchSize of 1000
18/07/09 14:44:43 INFO tool.CodeGenTool: Beginning code generation
18/07/09 14:44:44 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/mapr/hadoop/hadoop-2.7.0
Note: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/07/09 14:44:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-mapr/compile/2b49a98afbeb2ac1135adc84c66cf092/QueryResult.jar
18/07/09 14:44:53 INFO tool.ImportTool: Destination directory /app/list/datum/data/calf_hors_prod-cluster/datum/dev/leasing/tmp_sqoop/DE_DECISIONS is not present, hence not deleting.
18/07/09 14:44:53 INFO mapreduce.ImportJobBase: Beginning query import.
18/07/09 14:44:53 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/07/09 14:44:53 INFO mapreduce.JobBase: Setting default value for hadoop.job.history.user.location=none
18/07/09 14:44:53 INFO manager.OracleManager: Time zone has been set to GMT
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:53 INFO manager.SqlManager: Executing SQL statement: select * from doe.DE_DECISIONS where (1 = 0)
18/07/09 14:44:54 ERROR tool.ImportTool: Imported Failed: Cannot convert SQL type 2005
thank you for your help.
You can try adding this at the end of your Sqoop command:
--map-column-java <ORACLE_CLOB_COLUMN_NAME>=String
For example, if the Oracle table has a column named BODY of type CLOB, add this at the end:
--map-column-java BODY=String
This tells Sqoop how to map the Oracle CLOB type to a Java type (String).
If there are multiple columns, you can use this syntax pattern:
--map-column-java <ORACLE_CLOB_COLUMN_NAME_1>=String,<ORACLE_CLOB_COLUMN_NAME_2>=String
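Applied to the command in the question, a sketch of the full import with the mapping appended (BODY is a placeholder for the actual CLOB column name in doe.DE_DECISIONS):
# BODY below is a placeholder column name; substitute your CLOB column(s)
sqoop import \
  --connect ${CNX_DB} --username ${USR_DB} --password ${MDP_DB} \
  --query "${query}" \
  --delete-target-dir --target-dir ${DST_HDFS}/${SOURCE}_${table} \
  --hive-overwrite --hive-import --hive-table ${SOURCE}_${table} --hive-database ${DST_HIVE} \
  --hive-drop-import-delims -m 1 --as-parquetfile \
  --map-column-java BODY=String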

Sqoop Import Error: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)

I am using the HDP 2.6 Sandbox. I have created a user space with the root user under the hdfs group, and when executing the following Sqoop Hive import I encounter the following two errors:
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
However, the data was imported correctly into the Hive table.
Please help me understand the significance of these errors and how I can overcome them.
[root@sandbox-hdp ~]# sqoop import \
> --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
> --username retail_dba \
> --password hadoop \
> --table departments \
> --hive-home /apps/hive/warehouse \
> --hive-import \
> --create-hive-table \
> --hive-table retail_db.departments \
> --target-dir /user/root/hive_import \
> --outdir java_files
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/14 09:42:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/01/14 09:42:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 09:42:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/14 09:42:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/14 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/14 09:42:38 INFO tool.CodeGenTool: Beginning code generation
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.3.0-235/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/14 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.jar
18/01/14 09:42:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/14 09:42:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/14 09:42:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/14 09:42:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/14 09:42:40 INFO mapreduce.ImportJobBase: Beginning import of departments
18/01/14 09:42:41 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.17.0.2:8032
18/01/14 09:42:42 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.17.0.2:10200
18/01/14 09:42:46 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/14 09:42:46 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`department_id`), MAX(`department_id`) FROM `departments`
18/01/14 09:42:46 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 2 to: 7
18/01/14 09:42:46 INFO mapreduce.JobSubmitter: number of splits:4
18/01/14 09:42:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515818851132_0050
18/01/14 09:42:47 INFO impl.YarnClientImpl: Submitted application application_1515818851132_0050
18/01/14 09:42:47 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1515818851132_0050/
18/01/14 09:42:47 INFO mapreduce.Job: Running job: job_1515818851132_0050
18/01/14 09:42:55 INFO mapreduce.Job: Job job_1515818851132_0050 running in uber mode : false
18/01/14 09:42:55 INFO mapreduce.Job: map 0% reduce 0%
18/01/14 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
18/01/14 09:43:09 INFO mapreduce.Job: map 50% reduce 0%
18/01/14 09:43:12 INFO mapreduce.Job: map 75% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: map 100% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: Job job_1515818851132_0050 completed successfully
18/01/14 09:43:16 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=682132
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=481
HDFS: Number of bytes written=60
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=44760
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=44760
Total vcore-milliseconds taken by all map tasks=44760
Total megabyte-milliseconds taken by all map tasks=11190000
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=481
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1284
CPU time spent (ms)=5360
Physical memory (bytes) snapshot=561950720
Virtual memory (bytes) snapshot=8531210240
Total committed heap usage (bytes)=176685056
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=60
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Transferred 60 bytes in 34.7351 seconds (1.7274 bytes/sec)
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/01/14 09:43:16 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:284)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
18/01/14 09:43:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:43:16 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.6.3.0-235/hive/lib/hive-common-1.2.1000.2.6.3.0-235.jar!/hive-log4j.properties
OK
Time taken: 10.427 seconds
Loading data to table retail_db.departments
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1873) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
The first error
WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
You need to check that the Sqoop binaries are OK. It is easier to copy them again than to check file by file.
The second error
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop
This is because you are executing Sqoop as the "root" user. Change it to a user that exists in the Hadoop cluster.
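For example, a possible way to run the same import as an existing cluster user on the sandbox (the hdfs service user is assumed to exist, as it normally does on HDP, and the target directory is adjusted to match):
# hdfs is an assumed existing cluster user; adjust --target-dir for that user's home
sudo -u hdfs sqoop import \
  --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
  --username retail_dba --password hadoop \
  --table departments \
  --hive-import --create-hive-table \
  --hive-table retail_db.departments \
  --target-dir /user/hdfs/hive_import \
  --outdir java_files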
Two ideas
ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
There is a class missing somewhere.
And I see you're trying to run your sqoop command using your root account under Linux. Make sure root belongs to the hdfs group; I'm not sure root is included by default.
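If you want to keep using root, a minimal sketch of checking and fixing the group membership (the group name is an assumption: hdfs per the suggestion above, though the error message itself mentions the Hadoop group, so substitute whichever applies in your cluster):
id root                  # list the groups root currently belongs to
usermod -a -G hdfs root  # append root to the hdfs group (group name assumed)
id root                  # verify the new membership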
Sometimes null values are not handled by Sqoop while importing data into Hive from an RDBMS, so you should handle them explicitly by using the following options:
--null-string and --null-non-string
The complete command is:
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db --username retail_dba --password hadoop --table departments --hive-home /apps/hive/warehouse --null-string 'na' --null-non-string 'na' --hive-import --create-hive-table --hive-table retail_db.departments --target-dir /user/root/hive_import
This occurs because of the following property in /etc/hive/conf/hive-site.xml:
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
Set the value to false and try to run the same query again.
Alternatively, make the --target-dir /user/root/hive_import a directory with read/write access, or remove it so the import uses the Hive home directory.
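A sketch of that second option, run as a user with HDFS superuser rights (the paths are taken from the question; the owner/group are assumptions for your cluster):
hdfs dfs -mkdir -p /user/root/hive_import            # create the target directory if missing
hdfs dfs -chown -R root:hdfs /user/root/hive_import  # hand ownership to the importing user (assumed group)
hdfs dfs -chmod -R 775 /user/root/hive_import        # grant read/write access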

Unable to initialize hive with Derby from Brew install

It had been my understanding that Derby creates its file(s) in the current directory, but there are none there.
So I tried to do the Hive initialization using Derby, but it seems there is a Derby database already.
schematool --verbose -initSchema -dbType derby
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.10.2.0 - (1582446))
Driver: Apache Derby Embedded JDBC Driver (version 10.10.2.0 - (1582446))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> !autocommit on
Autocommit status: true
0: jdbc:derby:> CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii'
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:291)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:264)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:505)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Schema script failed, errorcode 2
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:390)
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:347)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:287)
So... where is it?
Update: I have reinstalled hive from scratch using
brew reinstall hive
And the same error occurs.
Another update: given the new direction of this error, it is now answered within another question.
An answer to a non-OS X, but otherwise similar, question was found that can serve here:
https://stackoverflow.com/a/40017753/1056563
I installed Hive with Homebrew (macOS) at /usr/local/Cellar/hive and after running schematool -dbType derby -initSchema I get the following error message:
Starting metastore schema initialization to 2.0.0 Initialization script hive-schema-2.0.0.derby.sql Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000) org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
However, I can't find either the metastore_db or metastore_db.tmp folder under the install path, so I tried:
find /usr/ -name hive-schema-2.0.0.derby.sql
vi /usr/local/Cellar/hive/2.0.1/libexec/scripts/metastore/upgrade/derby/hive-schema-2.0.0.derby.sql
comment out the 'NUCLEUS_ASCII' and 'NUCLEUS_MATCHES' functions
rerun schematool -dbType derby -initSchema, then everything goes well!
Homebrew installs Hive (version 2.3.1) unconfigured. The default settings use an in-process Derby database (Hive already includes the required lib).
The only thing you have to do (immediately after brew install hive) is to initialize the database:
schematool -initSchema -dbType derby
and then you can run hive, and it will work. However, if you try to run hive before initializing the database, Hive will actually semi-create an incomplete database and will fail to work:
show tables;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Since the database is semi-created, schematool will now fail as well:
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
To fix that, you will have to delete the database:
rm -Rf metastore_db
and run the initialization command again.
Notice that I deleted metastore_db from the current directory? This is another problem: Hive is configured to create and use the Derby database in the current working directory, because it has the following default value for 'javax.jdo.option.ConnectionURL':
jdbc:derby:;databaseName=metastore_db;create=true
To fix that, create file /usr/local/opt/hive/libexec/conf/hive-site.xml as
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:/usr/local/var/hive/metastore_db;create=true</value>
</property>
</configuration>
and recreate the database like before. Now the database is in /usr/local/var/hive, so in case you again accidentally run hive before initializing the DB, delete it with:
rm -Rf /usr/local/var/hive
You might have to look at the Hive configuration file. That should tell you where the metastore is being initialized.
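For instance, a quick way to see which connection URL (and therefore which Derby location) is in effect, assuming the Homebrew layout mentioned above:
grep -A 1 "javax.jdo.option.ConnectionURL" /usr/local/opt/hive/libexec/conf/hive-site.xml
If the file does not exist or has no such property, the default jdbc:derby:;databaseName=metastore_db;create=true applies and the database lands in whatever directory you ran hive from.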

Sqoop hive import from oozie

I am using the Cloudera Quick Start Docker image.
The quickstart image has MySQL installed in it. When I use the following sqoop command from the command line to import the categories table, it works and I can see that the categories table is created:
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
Then I logged into Hue as the cloudera user and created a new Oozie workflow with a single Sqoop task. When I try to execute it, Sqoop is able to download the data into HDFS, but it fails when it tries to create the Hive table on top of it.
This is what my workflow.xml looks like:
<workflow-app name="My_Workflow" xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
<start to="sqoop-4467"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="sqoop-4467">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<command>import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera -m 1 --table categories --hive-import --hive-overwrite
</command>
</sqoop>
<ok to="End"/>
<error to="Kill"/>
<sla:info>
<sla:nominal-time>${nominal_time}</sla:nominal-time>
<sla:should-end>${30 * MINUTES}</sla:should-end>
</sla:info>
</action>
<end name="End"/>
</workflow-app>
This is what my job.properties file looks like:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
nominal_time=2016-12-20T20:53Z
jobTracker=quickstart.cloudera:8032
After the job failed, when I checked the /user/home/cloudera folder I could see the categories folder with data, but I don't see the Hive table being created. This is the error that I see in the JobHistory server for the failed job:
Sqoop command arguments :
import
--connect
jdbc:mysql://localhost/retail_db
--username
root
--password
cloudera
-m
1
--table
categories
--hive-import
--hive-overwrite
Fetching child yarn jobs
tag id : oozie-3ff81b7743470e73dcb44de6729a66d9
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
6223 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6302 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
6336 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - Using Hive-specific delimiters for output. You can override
6336 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.BaseSqoopTool - delimiters with --fields-terminated-by, etc.
6367 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
6654 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Preparing to use a MySQL streaming resultset.
6666 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
7250 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7279 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
7281 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
9303 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/4fd8773510dfe4082d136b2ab7d27eb3/categories.jar
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - It looks like you are importing from mysql.
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - This transfer can be faster! Use the --direct
9314 [uber-SubtaskRunner] WARN org.apache.sqoop.manager.MySQLManager - option to exercise a MySQL-specific fast path.
9314 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.MySQLManager - Setting zero DATETIME behavior to convertToNull (mysql)
9318 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of categories
9388 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
10238 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
29055 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 1.0049 KB in 19.659 seconds (52.3425 bytes/sec)
29061 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 58 records.
29076 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
29097 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000012-161221020706124-oozie-oozi-W/sqoop-4467--sqoop/action-data.seq
Oozie Launcher ends
Did you copy the hive-site.xml to HDFS? That will do it; or you can import the table to an HDFS path using --target-dir and set the location of the Hive table to point to that path.
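A sketch of the first suggestion, assuming the workflow is deployed under /user/cloudera/my_workflow (that path is a placeholder). Copy the client hive-site.xml into HDFS:
# /user/cloudera/my_workflow is a placeholder for your workflow's HDFS directory
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/cloudera/my_workflow/hive-site.xml
then reference it from the Sqoop action by adding a <file>/user/cloudera/my_workflow/hive-site.xml#hive-site.xml</file> element after </command> in workflow.xml.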

Failed to schematool -initSchema -dbType derby

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Failed to create database 'metastore_db', see the next exception for details.
SQL Error code: 40000
Use --verbose for detailed stacktrace.
* schemaTool failed *
FYI,
please check the permissions on the Hive installation directory.
The Hive installation directory should be owned by the same user as Hadoop.
That's how it worked for me.
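For example (the install path and the user/group names below are placeholders for your environment):
ls -ld /usr/local/hive                        # check who currently owns the Hive install (path assumed)
sudo chown -R hadoop:hadoop /usr/local/hive   # make it match the user that runs Hadoop (user/group assumed)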