Nutch problems executing crawl - apache

I am trying to get Nutch 1.11 to execute a crawl. I am using Cygwin to run these commands on Windows 7.
Nutch is running, and I get results from running bin/nutch, but I keep getting error messages when I try to run a crawl.
I am getting the following error when I try to execute a crawl with Nutch:
Error running: /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt
Failed with exit value 127.
I have my JAVA_HOME set, and I have altered the hosts file to map 127.0.0.1 to localhost.
I am wondering whether I am referencing the right directory, and if maybe that is the problem.
The full printout looks like:
User5@User5-PC /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ TestCrawl/ 2
Injecting seed URLs
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Injector: starting at 2015-12-23 17:48:21
Injector: crawlDb: TestCrawl/crawldb
Injector: urlDir: C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Error running:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Failed with exit value 127.
The Hadoop log entry that I think may have something to do with the error is:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
2016-01-07 12:24:40,450 ERROR crawl.Injector - Injector: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3048)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 4 more

You are running Linux commands from Cygwin, and there are no C:\ paths on Linux systems. The correct command should be something like:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt
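If you ever need to convert between the two path forms, Cygwin's cygpath tool does it for you; for example (using the seed file path from the question):
$ cygpath -u 'C:\Users\User5\Documents\Nutch\apache-nutch-1.11\runtime\local\urls\seed.txt'
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt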

You have the answer to your problem in this message:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This is happening because the Hadoop version included with Nutch 1.11 is designed to work on Linux out of the box, not on Windows.
I had the same situation, and I ended up running Nutch 1.11 in an Ubuntu VirtualBox VM.

A hadoop-core jar file is needed when you are working with Nutch.
The hadoop-core jar compatible with Nutch 1.11 is 0.20.0.
Please download the jar from this link:
http://www.java2s.com/Code/Jar/h/Downloadhadoop0200corejar.htm
Paste that jar into the "C:\cygwin64\home\apache-nutch-1.11\lib" folder, and it will run successfully.
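As a quick check from Cygwin that the jar landed where Nutch will see it (assuming C:\cygwin64 is your Cygwin root, so that folder appears as /home inside Cygwin):
$ ls /home/apache-nutch-1.11/lib | grep hadoop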

The problem is pretty clear. According to your Hadoop log, it cannot find the winutils.exe file. Include winutils.exe in your %HADOOP_HOME%/bin folder.
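A minimal sketch of that setup, assuming you have obtained a winutils.exe built for the Hadoop version bundled with Nutch 1.11 (C:\hadoop below is an arbitrary location, adjust to taste):
REM in a Windows command prompt
mkdir C:\hadoop\bin
copy winutils.exe C:\hadoop\bin\
setx HADOOP_HOME C:\hadoop
Then open a new shell so HADOOP_HOME is visible and re-run the crawl. The "null\bin\winutils.exe" in your log is Hadoop resolving HADOOP_HOME to null because the variable was never set.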

Related

Flink 1.10 not connecting to minio (s3)

I'm running Flink and MinIO locally via Docker Compose.
When I try to connect to MinIO, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
my flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as follows:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
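Note: as I understand it, Flink only loads plugins from the plugins directory under the Flink installation root of each running process, so with Docker Compose the jar has to be visible under /opt/flink/plugins inside both the jobmanager and taskmanager containers, not just on the host. A sketch of checking that, assuming the official flink image and a jobmanager service name:
docker-compose exec jobmanager ls /opt/flink/plugins/s3-fs-presto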
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more

JDeveloper 12c | IntegratedWebLogicServer | Error while building the default domain

I'm trying to run the IntegratedWebLogicServer using JDeveloper version 12.2.1.4.0, and I get the following message:
ERROR: An error occurred while building the default domain.
And the log shows:
BuildDefaultDomain1.py 2020-02-10 11:53:05
cmd.exe /c ""C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin\wlst.cmd" "C:\Users\nauana.nonato\AppData\Roaming\JDeveloper\system12.2.1.4.42.190911.2248\o.j2ee.adrs\BuildDefaultDomain1.py""
Cannot run program "cmd.exe" (in directory "C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin"): Malformed argument has embedded quote: "C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin\wlst.cmd" "C:\Users\nauana.nonato\AppData\Roaming\JDeveloper\system12.2.1.4.42.190911.2248\o.j2ee.adrs\BuildDefaultDomain1.py"
java.io.IOException: Cannot run program "cmd.exe" (in directory "C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin"): Malformed argument has embedded quote: "C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin\wlst.cmd" "C:\Users\nauana.nonato\AppData\Roaming\JDeveloper\system12.2.1.4.42.190911.2248\o.j2ee.adrs\BuildDefaultDomain1.py"
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at oracle.jdevimpl.adrs.weblogic.wlst.ScriptRunnerImpl.runScript(ScriptRunnerImpl.java:106)
at oracle.jdevimpl.adrs.weblogic.builder.DomainScriptRunnerImpl.runScript(DomainScriptRunnerImpl.java:146)
at oracle.jdevimpl.adrs.weblogic.builder.DefaultDomainBuilder.createDomain(DefaultDomainBuilder.java:606)
at oracle.jdevimpl.adrs.weblogic.builder.DefaultDomainBuilder.build(DefaultDomainBuilder.java:274)
at oracle.jdevimpl.adrs.weblogic.builder.DefaultDomainBuilder$1.run(DefaultDomainBuilder.java:225)
at org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:1443)
at org.netbeans.modules.openide.util.GlobalLookup.execute(GlobalLookup.java:68)
at org.openide.util.lookup.Lookups.executeWith(Lookups.java:303)
at org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:2058)
Caused by: java.lang.IllegalArgumentException: Malformed argument has embedded quote: "C:\Oracle\Middleware\Oracle_Home\oracle_common\common\bin\wlst.cmd" "C:\Users\nauana.nonato\AppData\Roaming\JDeveloper\system12.2.1.4.42.190911.2248\o.j2ee.adrs\BuildDefaultDomain1.py"
at java.lang.ProcessImpl.needsEscaping(ProcessImpl.java:279)
at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:202)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:436)
at java.lang.ProcessImpl.start(ProcessImpl.java:140)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 9 more
What should I do?
Follow these steps to fix this issue in 12.2.1.4:
Go to the Oracle_Home\jdeveloper\ide\bin folder.
Edit the ide.conf file.
Under the "# Other OSGi configuration options for locating bundles and boot delegation" section (or any other section), add the following line:
AddVMOption -Djdk.lang.Process.allowAmbiguousCommands=true
Restart JDeveloper.
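After the edit, the relevant part of ide.conf would look something like this (the comment line is the one named above; any other options already in the file stay as they are):
# Other OSGi configuration options for locating bundles and boot delegation
AddVMOption -Djdk.lang.Process.allowAmbiguousCommands=true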
I am having the same problem as you. I am installing Oracle SOA Quick Start 12c version 12.1.
JDK version: javac 1.8.0_251
To solve this problem, set the parameter in your environment:
-Djdk.lang.Process.allowAmbiguousCommands=true
set JAVA_TOOL_OPTIONS=-Djdk.lang.Process.allowAmbiguousCommands=true
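To make it persist across new shells on Windows, setx should also work; an untested sketch:
setx JAVA_TOOL_OPTIONS "-Djdk.lang.Process.allowAmbiguousCommands=true"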
good luck: ManhKM

zeppelin interpreter error even after giving correct details

I am getting the below error, and I have given the MySQL settings in the interpreter:
com.mysql.jdbc.Driver
jdbc:mysql://:3306/
username and password
I restarted the interpreter and bound it, but I still get the error.
I am using use and select commands.
java.lang.NullPointerException
at org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:201)
at org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:288)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Adding the jar to the $ZEPPELIN_HOME/lib folder like user3921855 did not work for me.
I got it working by adding the MySQL connector to the dependency section in the interpreter config (e.g. Artifact: mysql:mysql-connector-java:5.1.38).
IMPORTANT:
You need to restart the Zeppelin daemon for the interpreter to pick up the new jar. I don't know why, because it said it had restarted the sub-process; it might be a bug.
Stop/start reminder:
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh stop
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start

Exception trying to create a new Weblogic Domain

I have installed Oracle WebLogic Server 12.1.2.0 per the instructions. When I ran ./configure.sh, it asked whether I wanted to create a domain, and I said no. Further down, the instructions ask if I want to create a new domain and start WLS. The following commands are listed:
$ mkdir /home/myhome/mydomain
$ cd /home/myhome/mydomain
$ $JAVA_HOME/bin/java $JAVA_OPTIONS -Xmx1024m -XX:MaxPermSize=256m weblogic.Server
When the command is executed, the following exception is generated:
[tester@kohls-enterprise-dev gravityDomain]$ $JAVA_HOME/bin/java $JAVA_OPTIONS -Xmx1024m -XX:MaxPermSize=256m weblogic.Server
Exception in thread "main" java.lang.NoClassDefFoundError:
weblogic/Server
Caused by: java.lang.ClassNotFoundException: weblogic.Server
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: weblogic.Server. Program will exit.
I have ensured that MW_HOME is set per the instructions and have run setWLSEnv.sh.
Any suggestions?
Thank you in advance,
O. Frank
You need to have weblogic.jar and many other jar files in the CLASSPATH. This is not the way to run a WebLogic domain; you should create the domain and do it the prescribed way. If you really want to go low-level, include the following in your CLASSPATH:
ant-all.jar
ant-contrib.jar
config-launch.jar
derbyclient.jar
derbynet.jar
tools.jar
weblogic.jar
weblogic_patch.jar
weblogic.server.modules_10.3.6.0.jar
weblogic_sp.jar
webservices.jar
xqrl.jar
Plus you would need a config.xml and additional files. It would be best to create a domain and scale down from there if you want.
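For example, sourcing setWLSEnv.sh puts weblogic.jar and the rest of these on the CLASSPATH before launching the server; a sketch, assuming the standard 12.1.2 layout under MW_HOME:
$ . $MW_HOME/wlserver/server/bin/setWLSEnv.sh
$ cd /home/myhome/mydomain
$ java $JAVA_OPTIONS -Xmx1024m -XX:MaxPermSize=256m weblogic.Server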
You need to run the configuration wizard (config.sh):
A domain contains several files/directories, not only a few jar files.
The wizard will help you create the domain (and the servers as well).
Follow the steps from Oracle's documentation:
https://docs.oracle.com/middleware/1212/wls/WLDCW/newdom.htm#WLDCW109
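On a typical 12c install the wizard lives under oracle_common; the exact path may vary by version:
$ $MW_HOME/oracle_common/common/bin/config.sh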

Why is the jmap command failing with java.lang.reflect.InvocationTargetException? [duplicate]

I get the following exception when I take a heap dump using:
jmap -F -dump:format=b,file=/tmp/heapdump/before.hprof 10737
Attaching to process ID 10737, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jmap.JMap.runTool(JMap.java:179)
at sun.tools.jmap.JMap.main(JMap.java:110)
Caused by: java.lang.RuntimeException: Type "nmethodBucket*", referenced in VMStructs::localHotSpotVMStructs in the remote VM, was not present in the remote VMStructs::localHotSpotVMTypes table (should have been caught in the debug build of that VM). Can not continue.
at sun.jvm.hotspot.HotSpotTypeDataBase.lookupOrFail(HotSpotTypeDataBase.java:361)
at sun.jvm.hotspot.HotSpotTypeDataBase.readVMStructs(HotSpotTypeDataBase.java:252)
at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:87)
at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:77)
Does anyone know how to resolve this?
I was seeing the same error because my path to jmap wasn't the same as the path to the java process (i.e., they targeted two different JDK versions).
Running jmap with the full path to my JDK resolved it.
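A quick way to confirm the mismatch on Linux (10737 is the PID from the question; the JDK path in the last line is illustrative):
$ readlink /proc/10737/exe     # java binary the target process is running
$ which jmap                   # jmap you are invoking
$ /usr/lib/jvm/java-7-openjdk/bin/jmap -F -dump:format=b,file=/tmp/heapdump/before.hprof 10737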
If OpenJDK is used, this requires installing the debuginfo packages.
On CentOS this works with:
- sudo debuginfo-install java-1.8.0-openjdk
- or sudo yum install java-1.8.0-openjdk-debuginfo.x86_64
See
- https://bugzilla.redhat.com/show_bug.cgi?id=1010786#c15
- amazon linux - install openjdk-debuginfo?