hadoop hive loading error - apache

While loading files into Hadoop via Hive, I got the following error:
Failed with exception org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:/tmp/hive-hadoop/hive_2012-11-22_19-31-25_550_6464715632657097841/-ext-10000/outlog_11_oct12.csv
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1257)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.CopyTask
According to other threads, this is a DataNode issue, but all of my DataNodes are up and running.

I think the problem may be a lack of write permission on the /tmp directory.
Try: hadoop dfs -chmod g+w /tmp
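HDFS permissions follow the POSIX user/group/other model, so g+w only adds write access for the owning group and leaves every other bit alone. A minimal Python sketch of that bit arithmetic (apply_g_plus_w and group_can_write are hypothetical helper names), which can also be used to read the permission column of an HDFS listing such as drwxr-xr-x:

```python
import stat

def apply_g_plus_w(mode: int) -> int:
    """Return mode with the group-write bit set, which is all `chmod g+w` changes."""
    return mode | stat.S_IWGRP

def group_can_write(perm: str) -> bool:
    """Check an ls-style permission string such as 'drwxr-xr-x'.

    Character 0 is the file type; characters 1-3, 4-6 and 7-9 are the
    user, group and other rwx triplets, so index 5 is the group 'w' flag.
    """
    return len(perm) >= 10 and perm[5] == "w"

# A typical 755 directory before and after `chmod g+w`.
before = stat.S_IFDIR | 0o755
after = stat.S_IFDIR | apply_g_plus_w(0o755)
print(stat.filemode(before), "->", stat.filemode(after))  # drwxr-xr-x -> drwxrwxr-x
```

Only the group-write bit changes; whether that is enough depends on the Hive user actually belonging to the directory's group.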

Related

Apache Hive beeline error: Java Error: A JNI error has occurred, please check your installation and try again

When I run the beeline command that came with Apache Hive 3.1.2, I get an error that says:
PS C:\Users\bluet> beeline.cmd
File Not Found
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/jdbc/JdbcUriParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:650)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:632)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.jdbc.JdbcUriParseException
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
This is what the beeline.cmd command looks like.
Hadoop, including HDFS and YARN, is running smoothly.
I am on Java 8.
What could be the problem?
I received the same error too. When I checked the file beeline.cmd, I saw that it needs hive-jdbc--standalone.jar. Downloading that jar and placing it in C:\Users-----\apache-hive-3.1.2-bin\lib solved the issue.
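The fix amounts to making sure a standalone Hive JDBC jar sits in Hive's lib directory. A minimal Python check, assuming the jar name has the form hive-jdbc-<version>-standalone.jar and that beeline.cmd loads jars from the lib folder, as reported above. The function names and the demo jar names (hive-exec-3.1.2.jar, hive-jdbc-3.1.2-standalone.jar) are illustrative only:

```python
import glob
import os
import tempfile

def find_standalone_jdbc_jars(hive_home: str) -> list:
    """Return the hive-jdbc-*-standalone.jar files in <hive_home>/lib, sorted by name."""
    pattern = os.path.join(hive_home, "lib", "hive-jdbc-*-standalone.jar")
    return sorted(glob.glob(pattern))

def make_demo_hive_home(jar_names) -> str:
    """Create a throwaway HIVE_HOME/lib with empty placeholder files (illustration only)."""
    home = tempfile.mkdtemp()
    lib = os.path.join(home, "lib")
    os.makedirs(lib)
    for name in jar_names:
        with open(os.path.join(lib, name), "w"):
            pass
    return home

home = make_demo_hive_home(["hive-exec-3.1.2.jar", "hive-jdbc-3.1.2-standalone.jar"])
jars = find_standalone_jdbc_jars(home)
if jars:
    print("found:", [os.path.basename(j) for j in jars])
else:
    print("no standalone JDBC jar found in lib")
```

On a real install, pass the actual Hive home directory (the apache-hive-3.1.2-bin folder) instead of the demo directory.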

How to copy files from S3 to the same S3 folder?

I am trying to combine log files within S3 (S3 to S3) using the following command:
s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/ym=2020/ --groupBy='.*/(\d{8}).+(\.json)' --deleteOnSuccess
I have the following files:
s3://path/to/ym=2020/20200802010010.json
s3://path/to/ym=2020/20200802010020.json
s3://path/to/ym=2020/20200802010030.json
s3://path/to/ym=2020/20200802010040.json
s3://path/to/ym=2020/20200802010050.json
s3://path/to/ym=2020/20200803010010.json
s3://path/to/ym=2020/20200803010020.json
s3://path/to/ym=2020/20200803010030.json
s3://path/to/ym=2020/20200803010040.json
s3://path/to/ym=2020/20200803010050.json
The expected result is:
s3://path/to/ym=2020/20200802.json
s3://path/to/ym=2020/20200803.json
but I get the following error:
20/08/04 03:29:14 INFO s3distcp.S3DistCp: Created 1 files to copy 856 files
Exception in thread "main" java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.mkdirs(S3NativeFileSystem.java:1052)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1961)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.mkdirs(EmrFileSystem.java:443)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:893)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:728)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
The following command works (the only change is that the destination is a different path):
s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/archives/ym=2020/ --groupBy='.*/(\d{8}).+(\.json)' --deleteOnSuccess
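To check which output file each input key would land in, here is a small Python simulation of the regular expression in the --groupBy value. It assumes, as the expected result in the question implies, that files whose capture groups concatenate to the same string are merged into one output named after that string. S3DistCp itself evaluates the pattern with Java's regex engine; Python's re module behaves the same for this particular pattern. group_by_key is a hypothetical helper, not part of S3DistCp:

```python
import re
from collections import defaultdict

# Regex part of the --groupBy value from the question (raw string keeps \d and \.).
GROUP_BY = re.compile(r".*/(\d{8}).+(\.json)")

def group_by_key(paths):
    """Map each group key (the concatenated capture groups) to its input paths.

    Assumption: a path takes part only if the pattern matches the whole path;
    non-matching paths are simply skipped in this sketch.
    """
    groups = defaultdict(list)
    for path in paths:
        match = GROUP_BY.fullmatch(path)
        if match is None:
            continue
        groups["".join(match.groups())].append(path)
    return dict(groups)

# The ten input keys listed in the question.
inputs = [
    f"s3://path/to/ym=2020/{day}{hhmmss}.json"
    for day in ("20200802", "20200803")
    for hhmmss in ("010010", "010020", "010030", "010040", "010050")
]
for key, members in sorted(group_by_key(inputs).items()):
    print(key, len(members))  # 20200802.json 5, then 20200803.json 5
```

A merged name such as 20200802.json does not itself match the pattern, because .+ needs at least one character between the eight digits and .json.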

Hive can't load a file into a table because it can't find it in the Hive warehouse

I can't load data into a Hive table, and the logs show the problem below.
The file that I want to load:
[hdfs@vmi200937 root]$ hdfs dfs -ls /suppression-files
Found 1 items
-rw-rw-rw- 3 hdfs hdfs 694218562 2018-12-21 05:06 /suppression-files/md5.txt
Hive directory:
[hdfs@vmi200937 root]$ hdfs dfs -ls /apps/hive/warehouse/suppression.db
Found 1 items
drwxrwxrwx - hive hadoop 0 2018-12-21 06:30 /apps/hive/warehouse/suppression.db/md5supp
Here is the Hive query:
hive (suppression)> LOAD DATA INPATH '/suppression-files/md5.txt' INTO TABLE md5supp;
Logs:
Loading data to table suppression.md5supp
Failed with exception java.io.FileNotFoundException: Directory/File does not exist /apps/hive/warehouse/suppression.db/md5supp/md5.txt
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1901)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:82)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1877)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.io.FileNotFoundException: Directory/File does not exist /apps/hive/warehouse/suppression.db/md5supp/md5.txt
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1901)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:82)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1877)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
I found the solution!
I just had to set the owner of the directory /suppression-files to hive:hdfs with:
hdfs dfs -chown -R hive:hdfs /suppression-files
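If the ownership change needs to be scripted, a minimal Python sketch that builds and runs the same kind of hdfs dfs -chown call. It assumes the hdfs client is on PATH and that the script runs as the HDFS superuser, since HDFS only lets the superuser change a file's owner; chown_argv and run are hypothetical helpers:

```python
import subprocess

def chown_argv(path, owner="hive", group="hdfs", recursive=True):
    """Build the argv for: hdfs dfs -chown [-R] OWNER:GROUP PATH."""
    argv = ["hdfs", "dfs", "-chown"]
    if recursive:
        argv.append("-R")
    argv += [f"{owner}:{group}", path]
    return argv

def run(argv):
    """Run the command without a shell; check=True raises CalledProcessError on failure."""
    subprocess.run(argv, check=True)

print(chown_argv("/suppression-files"))
# run(chown_argv("/suppression-files"))  # needs the hdfs client and superuser rights
```

Building the argument list separately keeps the command easy to inspect or log before anything touches HDFS.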

Not able to run an HPL/SQL query from the Hive CLI

I tried to run "dbms_output.put_line('This is HPL/Sql');" from the Hive CLI, and it gave the exception below.
NoViableAltException(26#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'dbms_output' '.' 'put_line'
I am using Hive 2.1.0.
According to the HPL/SQL documentation, HPL/SQL has been included in Apache Hive since version 2.0.
Are any additional configuration changes required to enable HPL/SQL support in Hive?
HPL/SQL statements can't be run directly from the Hive CLI. Use the hplsql tool instead, either
1. hplsql -e 'query' for an inline statement, or
2. hplsql -f <sql/hql file> for a script file.
For example:
hplsql -e "dbms_output.put_line('this is hplsql')"
or
hplsql -e "PRINT 'this is hplsql'"
Alternatively, write a function in HPL/SQL, register it in Hive, and use it from there.
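To call HPL/SQL from a script, one option is to build the command line as a list and run it with subprocess, so that SQL string quotes never have to be nested inside shell quotes. A minimal Python sketch, assuming the hplsql launcher is on PATH; hplsql_argv and run_hplsql are hypothetical helpers, and only the -e (inline statement) and -f (script file) options of hplsql are used:

```python
import subprocess

def hplsql_argv(query=None, script=None):
    """Build an hplsql command line: -e for an inline statement, -f for a script file."""
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    if query is not None:
        return ["hplsql", "-e", query]
    return ["hplsql", "-f", script]

def run_hplsql(argv):
    """Run hplsql without a shell and return its stdout; raises if it exits non-zero."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout

# The statement is a single list element, so its single quotes reach HPL/SQL unchanged.
print(hplsql_argv(query="PRINT 'this is hplsql'"))
# run_hplsql(hplsql_argv(query="PRINT 'this is hplsql'"))  # needs hplsql on PATH
```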

Pig script running into Java heap space error 2997

Here is the Pig stack trace. I am running my script over each day's data separately; it fails on some days, while the other days finish within 5 minutes. At the end I have about 10 GROUP ... ALL statements with PARALLEL 1 to do some counts. Is this the cause of the error?
Backend error message
Error: Java heap space
Pig Stack Trace
ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:819)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:452)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:282)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
at org.apache.pig.PigServer.execute(PigServer.java:1405)
at org.apache.pig.PigServer.executeBatch(PigServer.java:456)
at org.apache.pig.PigServer.executeBatch(PigServer.java:439)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
at org.apache.pig.backend.hadoop.executionengine.Launcher.getErrorMessages(Launcher.java:215)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:803)
... 19 more
================================================================================
Pig Stack Trace
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:179)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)