I am using Apache Pig and I am trying to load a comma-separated file as a Pig table. Loading the file does not throw any error,
but when I try to print that table using the "dump" command, it gives an error.
File I loaded
Error,fdgdf
Error,dfgdf
Error,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Info,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Debug,dfgdf
Command to load
logFile1 = LOAD 'PigTestFile' using PigStorage();
Command to print table
dump logFile1
Error I get
Failed Jobs:
JobId Alias Feature Message Outputs
job_1454617624671_0152 logFile1 MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs:
//ip-172-31-53-48.ec2.internal:8020/user/e1681fe26eed362777aabca1682510/PigTestFile
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-53-48.ec2.internal:8020/user/e1681fe26eed362777aabca1682510/PigTestFile
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
hdfs://ip-172-31-53-48.ec2.internal:8020/tmp/temp1258481141/tmp-1928081547,
:
:
2016-02-07 06:31:20,100 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-02-07 06:31:20,107 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias logFile1. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
[EDIT]
When I read the log closely I found that Pig could not find the file which was used to load the table. It was expecting it to be in HDFS, whereas my file was on my local box.
I then moved the file into HDFS and ran the same commands. It worked fine.
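For completeness, copying the file into HDFS looked roughly like this (the /user/&lt;your-user&gt; home directory is an assumption; use whatever your HDFS home directory is):

hadoop fs -mkdir -p /user/&lt;your-user&gt;            # create the HDFS home directory if needed
hadoop fs -put PigTestFile /user/&lt;your-user&gt;/    # copy the local file into HDFS
hadoop fs -ls /user/&lt;your-user&gt;/PigTestFile      # confirm it is there before running LOAD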
But then why did it not give an error while executing the "LOAD" command itself?
As explained by Murali in his answer (which I have accepted), Map/Reduce jobs for a script are triggered only when a STORE or DUMP is encountered.
Here is more explanation about it from the Apache Pig documentation:
In general, Pig processes Pig Latin statements as follows:
First, Pig validates the syntax and semantics of all statements.
Next, if Pig encounters a DUMP or STORE, Pig will execute the statements.
In this example Pig will validate, but not execute, the LOAD and FOREACH statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
In this example, Pig will validate and then execute the LOAD, FOREACH, and DUMP statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
DUMP B;
(John)
(Mary)
(Bill)
(Joe)
Map/Reduce jobs for a script are triggered only when a STORE or DUMP is encountered.
In this case, the Map phase for the LOAD command will start only when a STORE or DUMP is encountered in the script.
The default execution mode is mapreduce. If the file is on a local path you have to use local mode for execution:
pig -x local {pigfilename.pig}
Refer : https://pig.apache.org/docs/r0.9.1/start.html#execution-modes
Extract from above link :
Pig has two execution modes or exectypes:
Local Mode - To run Pig in local mode, you need access to a single
machine; all files are installed and run using your local host and
file system. Specify local mode using the -x flag (pig -x local).
Mapreduce Mode - To run Pig in mapreduce mode, you need access to a
Hadoop cluster and HDFS installation. Mapreduce mode is the default
mode; you can, but don't need to, specify it using the -x flag (pig OR
pig -x mapreduce).
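Putting that together, a minimal sketch of loading and dumping the file from the question in local mode (assuming PigTestFile sits in the current local directory; note that PigStorage() defaults to a tab delimiter, so a comma-separated file usually needs PigStorage(',')):

cat > test.pig <<'EOF'
logFile1 = LOAD 'PigTestFile' USING PigStorage(',');
DUMP logFile1;
EOF
pig -x local test.pig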
Related
I have Hive in an HDInsight cluster and NiFi on my local machine.
I am trying to execute a Hive script from an ExecuteProcess processor which has properties set as below:
command: hive
command argument: -f /home/name/firstq.hql
Redirect Error Stream: true
I have controller services for the Hive connection pool. When I start the processor, the error is thrown as shown below:
o.a.n.processors.standard.ExecuteProcess ExecuteProcess[id=d5db18b2-0159-1000-6569-c054490cbfa5] Failed to create process due to java.io.IOException: Cannot run program "hive": CreateProcess error=2, The system cannot find the file specified: java.io.IOException: Cannot run program "hive": CreateProcess error=2, The system cannot find the file specified
org.apache.nifi.util.ReflectionUtils Failed while invoking annotated method 'public void org.apache.nifi.processors.standard.ExecuteProcess.shutdownExecutor()' with arguments '[]'.
I have tried giving the local machine path as the command argument too, though the same error is thrown.
In the script, I am trying to insert one row into an existing table.
Please help me with what I am doing wrong.
Thanks.
I am trying to get Nutch 1.11 to execute a crawl. I am using Cygwin to run these commands on Windows 7.
Nutch is running and I am getting results from running bin/nutch, but I keep getting error messages when I try to run a crawl.
I am getting the following error when I try to execute a crawl with Nutch:
Error running: /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt
Failed with exit value 127.
I have my JAVA_HOME classpath set, and I have altered the hosts file to include 127.0.0.1 as localhost.
I am curious if I am calling the right directory correctly, and if maybe that is the problem.
The full printout looks like:
User5#User5-PC /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ TestCrawl/ 2
Injecting seed URLs
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Injector: starting at 2015-12-23 17:48:21
Injector: crawlDb: TestCrawl/crawldb
Injector: urlDir: C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Error running:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Failed with exit value 127.
The Hadoop log that I think may have something to do with the error is:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
2016-01-07 12:24:40,450 ERROR crawl.Injector - Injector: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3048)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 4 more
You are running Linux commands from Cygwin, and there is no C:\ path on Linux systems. The correct command should be something like:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch1.11/runtime/local/bin/nutch inject TestCrawl/crawldb /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch1.11/runtime/local/urls/seed.txt
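If you are unsure what the Cygwin equivalent of a Windows path is, cygpath can convert it (using the path from the question as an example):

cygpath -u 'C:\Users\User5\Documents\Nutch\apache-nutch-1.11\runtime\local\urls\seed.txt'
# prints: /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt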
You have the answer to your problem in this message:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This is happening because the Hadoop version included with Nutch 1.11 is designed to work on Linux out of the box, not on Windows.
I had the same situation and ended up using Nutch 1.11 in an Ubuntu virtual box.
A hadoop-core jar file is needed when you are working with Nutch.
The hadoop-core jar compatible with Nutch 1.11 is version 0.20.0.
Please download the jar from this link:
http://www.java2s.com/Code/Jar/h/Downloadhadoop0200corejar.htm
Paste that jar into the "C:\cygwin64\home\apache-nutch-1.11\lib" folder
and it will run successfully.
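From the Cygwin shell, the copy step might look like this (the download location is an assumption, and C:\cygwin64\home maps to /home inside Cygwin):

# adjust the jar file name to match what you actually downloaded
cp /cygdrive/c/Users/User5/Downloads/hadoop-0.20.0-core.jar /home/apache-nutch-1.11/lib/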
The problem is pretty clear. According to your Hadoop log, it cannot find the winutils.exe file. Include winutils.exe in the %HADOOP_HOME%\bin folder.
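For reference, a sketch of that setup from a Cygwin shell, assuming you already have a winutils.exe matching your Hadoop version and want to keep it under C:\hadoop (both are assumptions):

export HADOOP_HOME='C:\hadoop'        # Hadoop builds the path as %HADOOP_HOME%\bin\winutils.exe
mkdir -p /cygdrive/c/hadoop/bin
cp /cygdrive/c/Users/User5/Downloads/winutils.exe /cygdrive/c/hadoop/bin/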
When I run my Pig script locally (pig -file script.pig -param INPUT=val -param OUTPUT=val) everything works fine. But when I schedule my Pig script using Oozie (Coordinator/Workflow), the script fails. I don't understand why...
Can someone help me out?
Pig script
alarms = LOAD '$INPUT' USING PigStorage('|', '-noschema') AS (
row_num:long,
timestamp:chararray,
protocol_name:chararray,
source_ip:chararray,
destination_ip:chararray,
source_port:int,
destination_port:int
);
alarms_projection = FOREACH alarms {
GENERATE
SUBSTRING(timestamp, 0, 10) as alarm_date:chararray,
SUBSTRING(timestamp, 11, 19) as alarm_time:chararray,
protocol_name,
source_ip,
destination_ip,
source_port,
destination_port;
}
STORE alarms_projection INTO '$OUTPUT' USING PigStorage('|');
ERROR
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exception invoking main(), Scheme not present in uri /etl/av/complete/alarms
org.apache.oozie.action.hadoop.LauncherException: Scheme not present in uri /etl/av/complete/alarms
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:177)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.oozie.action.hadoop.LauncherException: Scheme not present in uri /etl/av/complete/alarms
at org.apache.oozie.action.hadoop.LauncherURIHandlerFactory.getURIHandler(LauncherURIHandlerFactory.java:41)
at org.apache.oozie.action.hadoop.PrepareActionsDriver.doOperations(PrepareActionsDriver.java:65)
at org.apache.oozie.action.hadoop.LauncherMapper.executePrepare(LauncherMapper.java:444)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:173)
... 8 more
Oozie Launcher failed, finishing Hadoop job gracefully
It was a configuration error in the workflow.xml. I was using a prepare statement to empty the output directory, but instead of setting the path as hdfs://node:port/path/to/the/files I used /path/to/the/files.
The right way of using prepare:
<prepare>
<delete path="hdfs://node:8020/path/to/files"/>
</prepare>
We are deploying on one of our servers and we get the error below.
ERROR: tooltwist.fip.FipException Unknown response from server: 500: Internal Server Error
Exception: tooltwist.fip.FipException: tooltwist.fip.FipException: Unknown response from server: 500: Internal Server Error
Looking at the FIP log, it shows:
Error installing batch: tooltwist.fip.FipException: Pre-commit command failed: protected/pre_commit.sh
tooltwist.fip.FipException: Pre-commit command failed: protected/pre_commit.sh
at tooltwist.fip.FipServer_updateExecuter.commitTransaction(FipServer_updateExecuter.java:309)
at tooltwist.fip.FipServer_updateExecuter.prepareUpdates_1_3(FipServer_updateExecuter.java:250)
at tooltwist.fip.FipServer_updateExecuter.executeUpdates(FipServer_updateExecuter.java:142)
at tooltwist.fip.FipServer.destination_installBatchOfFiles(FipServer.java:199)
at tooltwist.fip.jetty.InstallBatchServlet.doPost(InstallBatchServlet.java:134)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:530)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:426)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:931)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:361)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
at org.eclipse.jetty.server.Server.handle(Server.java:337)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:581)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1020)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:775)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:228)
at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:417)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:474)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:437)
at java.lang.Thread.run(Thread.java:662)
Any idea about the error?
A couple of suggestions:
1. Does the pre_commit.sh shell script exist on the server?
2. Does it have +x permissions?
If FIP was installed in the normal way, this should not cause an issue.
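A quick way to check both on the destination server (the path is relative to the FIP installation, matching the log above):

ls -l protected/pre_commit.sh        # does the script exist, and does it have the x bit?
chmod +x protected/pre_commit.sh     # add execute permission if it is missing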
The fipserver initially saves the files it receives on the destination server, but into temporary locations. Once all of the files have been received and saved, it runs a three-step process to complete the installation:
1. Run a script named protected/pre-commit.sh. The normal operation of this script is to shut down the web server.
2. For each new file:
a) move any existing file to .fip-rollback-xxxxxx/filename.
b) move the new file from its temporary location to the correct location.
3. Run a script named protected/post-commit.sh. This most commonly restarts the server.
The pre and post commit scripts are user provided. They should normally exit with a status of zero, as any other status indicates that an error has occurred.
As suggested in the previous answer, check that these scripts exist, and that they are executable. If this fails to solve your problem, insert debug into the scripts to determine where and why they are failing.
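As a sketch of what that debugging might look like, a pre-commit script along the following lines logs each step so a non-zero exit can be traced afterwards (the service name and log path are assumptions; use whatever actually stops your web server):

#!/bin/sh
# protected/pre_commit.sh - stop the web server before FIP swaps the new files in
exec >> /tmp/fip-pre-commit.log 2>&1
echo "pre-commit started at $(date)"

if ! service jetty stop; then        # assumed service name
    echo "failed to stop web server"
    exit 1                           # non-zero exit tells FIP the pre-commit failed
fi

echo "pre-commit finished OK"
exit 0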
I'm trying to create an offline backup of my HSQLDB (using the HSQLDB 2.2.6 jar) as explained in the HSQLDB User Guide: http://hsqldb.org/doc/2.0/guide/management-chapt.html#N1400A
java -cp path/to/hsqldb.jar org.hsqldb.lib.tar.DbBackup --save \
tar/path.tar db/base/path
But I can't find out where db/base/path is supposed to point. This is the remark in the User Guide:
db/base/path is the file path to the catalog file base name (in the same fashion as in server.database.* settings and JDBC URLs with catalog type file:).
And that's the error message I get:
Exception in thread "main" java.io.FileNotFoundException: File not found:
path\to\hsqldb.jar.properties
at org.hsqldb.lib.tar.DbBackup.write(Unknown Source)
at org.hsqldb.lib.tar.DbBackup.main(Unknown Source)
The paths in the Guide must be replaced with the paths that you use. For example, if you want to save the backup to a directory named /backupdir/, your database files are named mydatabase, and they are located in /dbdir/, then the command is:
java -cp hsqldb.jar org.hsqldb.lib.tar.DbBackup --save /backupdir/mydatabase.tar /dbdir/mydatabase
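In other words, db/base/path is the common prefix of the catalog files, not a path to the jar (which is why the error above reports looking for path\to\hsqldb.jar.properties). A quick sanity check, under the assumed names:

ls /dbdir/
# mydatabase.properties  mydatabase.script   (the catalog files share the base name "mydatabase")
tar tvf /backupdir/mydatabase.tar            # after running DbBackup, list the archive to verify the backup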