hadoop hive loading error

hadoop hive loading error - apache

While loading files to hadoop via hive. I got following error:
Failed with exception org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/tmp/hive-hadoop/hive_2012-11-22_19-31-25_550_6464715632657097841/-ext-10000/outlog_11_oct12.csv
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1257)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.CopyTask
And according to other threads, its datanode issue, but all of my datanodes are up and running.

I think the problem maybe lack of write permissions on /tmp directory.
Try hadoop dfs -chmod g+w /tmp ?

Related

Apache hive beeline error: JAVA Error: A JNI error has occurred, please check your installation and try again

When I run the beeline command that came with Apache Hive 3.1.2 I get an error that says:
PS C:\Users\bluet> beeline.cmd
File Not Found
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/jdbc/JdbcUriParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:650)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:632)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.jdbc.JdbcUriParseException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
This is what the beeline.cmd command looks like.
Hadoop is running smoothly including hdfs and yarn.
I am on Java 8.
What could be the problem?

I also received the same error. Then when i checked the file "beeline.cmd", i saw it needs hive-jdbc--standalone.jar**. So Downloading that and placing it in C:\Users-----\apache-hive-3.1.2-bin\lib solved the issue.

How to copy files from s3 to s3 same folder?

I am trying to combine log files from s3 to s3 using the following command.
s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/ym=2020/ --groupBy='.*/(\d{8}).+(\.json) --deleteOnSuccess'
I have the following files in
s3://path/to/ym=2020/20200802010010.json
s3://path/to/ym=2020/20200802010020.json
s3://path/to/ym=2020/20200802010030.json
s3://path/to/ym=2020/20200802010040.json
s3://path/to/ym=2020/20200802010050.json
s3://path/to/ym=2020/20200803010010.json
s3://path/to/ym=2020/20200803010020.json
s3://path/to/ym=2020/20200803010030.json
s3://path/to/ym=2020/20200803010040.json
s3://path/to/ym=2020/20200803010050.json
expected result is
s3://path/to/ym=2020/20200802.json
s3://path/to/ym=2020/20200803.json
but I get the following errors
20/08/04 03:29:14 INFO s3distcp.S3DistCp: Created 1 files to copy 856 files
Exception in thread "main" java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.mkdirs(S3NativeFileSystem.java:1052)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1961)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.mkdirs(EmrFileSystem.java:443)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:893)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:728)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
The following command is working.(change dest path to other path)
s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/archives/ym=2020/ --groupBy='.*/(\d{8}).+(\.json) --deleteOnSuccess'

Hive can't load file to table because it can't find it in hive warehouse

I can't load data to hive table and the logs show this problem
The file that I want to load:
> [hdfs#vmi200937 root]$ hdfs dfs -ls /suppression-files Found 1 items
> -rw-rw-rw- 3 hdfs hdfs 694218562 2018-12-21 05:06 /suppression-files/md5.txt
Hive directory:
> [hdfs#vmi200937 root]$ hdfs dfs -ls
> /apps/hive/warehouse/suppression.db Found 1 items drwxrwxrwx - hive
> hadoop 0 2018-12-21 06:30
> /apps/hive/warehouse/suppression.db/md5supp
Here is the Hive Query:
> hive (suppression)> LOAD DATA INPATH '/suppression-files/md5.txt' INTO
> TABLE md5supp;
Logs:
Loading data to table suppression.md5supp Failed with exception
java.io.FileNotFoundException: Directory/File does not exist
/apps/hive/warehouse/suppression.db/md5supp/md5.txt at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1901)
at
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:82)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1877)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 40000 from
org.apache.hadoop.hive.ql.exec.MoveTask.
java.io.FileNotFoundException: Directory/File does not exist
/apps/hive/warehouse/suppression.db/md5supp/md5.txt at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1901)
at
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:82)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1877)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

I found the solution !
I should just set the owner of the directory /suppression-file to hive:hdfs
by hdfs dfs chown -R hive:hdfs /suppression-file

not able to run HPL/SQL query from HIVE cli

I tried to run "dbms_output.put_line('This is HPL/Sql');" from Hive cli it has given below exception.
NoViableAltException(26#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'dbms_output' '.' 'put_line'
I am using Hive 2.1.0.
as per HPL/sql documentation HPL/SQL is included to Apache Hive since version 2.0.
Is there any additional configuration changes is required to enable hpl/sql support in hive.

We can't run HPL/SQL query directly from Hive Cli. we should use either
1. hplsql -e 'query' or 2. hplsql -e sql/hql file.
For Example-
hplsql -e 'dbms_output.put_line(`this is hplsql`)';
or
hplsql -e 'PRINT `this is hplsql`';

Write function in HPL/SQL, register it in hive and Use it.

Pig script running into jave heap space error 2997

Here is the pig stack trace. I am running my code over daily data individually, but it will fail some days. Others are done within 5 min. I have about 10 group all parallel 1 at the end to do some counts. Is this the reason of error?
Backend error message
Error: Java heap space
Pig Stack Trace
ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:819)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:452)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:282)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
at org.apache.pig.PigServer.execute(PigServer.java:1405)
at org.apache.pig.PigServer.executeBatch(PigServer.java:456)
at org.apache.pig.PigServer.executeBatch(PigServer.java:439)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
at org.apache.pig.backend.hadoop.executionengine.Launcher.getErrorMessages(Launcher.java:215)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:803)
... 19 more
================================================================================
Pig Stack Trace
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:179)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

hadoop hive loading error - apache

I think the problem maybe lack of write permissions on /tmp directory. Try hadoop dfs -chmod g+w /tmp ?

Related

Apache hive beeline error: JAVA Error: A JNI error has occurred, please check your installation and try again

How to copy files from s3 to s3 same folder?

Hive can't load file to table because it can't find it in hive warehouse

not able to run HPL/SQL query from HIVE cli

Pig script running into jave heap space error 2997

Categories

Resources