I run hadoop jar /home/apache-nutch-2.3.1/runtime/deploy/apache-nutch-2.3.1.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 5
But I get the following error:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.nutch.crawl.Crawl
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:316)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
I created a urls/seed.text file in /home/apache-nutch-2.3.1/build/ that contains the following URLs:
http://nutch.apache.org
http://apache.org
and I edited conf/regex-urlfilter.txt as follows:
+^http://([a-z0-9]*\.)*apache.org/
The class org.apache.nutch.crawl.Crawl was removed in version 1.8. It's recommended to run the bin/crawl shell script instead; it launches a Hadoop job for each step of a crawl: inject, generate, fetch, parse, etc.
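As a rough sketch (the exact arguments vary between releases, so check the usage line printed by your own bin/crawl; the crawl ID and Solr URL below are placeholders):
# run from the deploy runtime so the script submits Hadoop jobs
cd /home/apache-nutch-2.3.1/runtime/deploy
# seed directory, crawl ID, Solr URL (if your script version expects one) and number of rounds
bin/crawl urls myCrawl http://localhost:8983/solr/ 3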
I am trying to run liquibase update on Windows, providing details such as the classpath, dialect, URL, user, and password, and I am getting an slf4j LoggerFactory class-not-found exception.
The command looks like this:
java -jar liquibase.jar update --classpath=mypath ...
The slf4j jar is present in the lib folder of my Liquibase installation.
What should I do to solve this error?
The error log is as follows:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/slf4j/LoggerFactory at
liquibase.logging.core.Slf4JLoggerFactory.getLog(Slf4JLoggerFactory.java:9)
at liquibase.logging.LogService.getLog(LogService.java:39)
at liquibase.integration.commandline.Main.<clinit>(Main.java:61)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 3 more
When I try to run it through a bat file with the command
liquibase update --changeLogFile=C:\Desktop\migrate\update-test\changelog\databasechangelog.xml --driver=org.hibernate.dialect.PostgreSQLDialect --classpath=C:\Desktop\migrate\update-test\postgresql --url=jdbc:postgresql://myserver:5432/ --username=postgres --password=postgres --logLevel=debug
I get an "unexpected command parameters" error:
unexpected command parameters: [--changeLogFile=C:Desktop\migrate\update-test\changelog\databasechangelog.xml, --driver=org.hibernate.dialect.PostgreSQLDialect, --classpath=C:\migrate\update-test\postgresql, --url=jdbc:postgresql://myserver.COM:5432/, --username=postgres, --password=postgres]
What have I done wrong here?
In the future, it is helpful if you include the full error in your question.
That said, you might be seeing the same thing as in this question: unable to use liquibase standalone shell script
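If it is indeed the slf4j jar not reaching the JVM classpath, one illustrative workaround (a sketch only, assuming the jar sits in Liquibase's lib folder; the file names and connection details below are placeholders) is to launch the CLI main class directly, since java -jar ignores any -cp setting:
# put liquibase.jar and everything in its lib/ folder (including slf4j) on the classpath
java -cp "liquibase.jar;lib/*" liquibase.integration.commandline.Main --classpath=C:\Desktop\migrate\update-test\postgresql --url=jdbc:postgresql://myserver:5432/ --username=postgres --password=postgres --changeLogFile=changelog.xml update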
I'm using the following spark-submit command:
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000
However, it gives me the error java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3, even though IParser3.class is present in emr-core-1.0-SNAPSHOT.jar. I don't understand why it throws that error. I have tried several approaches but couldn't succeed. How can I resolve this?
I am able to run the same command in client mode and also as a standalone Spark application; I get this error only in YARN cluster mode.
Exception from container-launch.
Container id: container_e37_1526066605784_0014_02_000001
Exit code: 15
Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
g.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646)
18/05/14 14:00:13 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$resolveTry(Promise.scala:47)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:664)
Caused by: java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.findClass(SimpleClassLoader.java:152)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646)
Failing this attempt. Failing the application.
Quoting from the Spark documentation:
http://spark.apache.org/docs/latest/running-on-yarn.html
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
So in cluster mode the jar is executed on whichever node YARN chooses, so you can try these two approaches:
1) Copy the dependency jar to each node.
2) Copy the jar to HDFS and reference it from there, as in the sketch below.
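For illustration, a minimal sketch of the second approach; the HDFS target directory below is hypothetical:
# upload the dependency jar to HDFS so every node can fetch it (example path)
hdfs dfs -mkdir -p /user/hadoop/libs
hdfs dfs -put /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar /user/hadoop/libs/
# point --jars at the HDFS copy instead of a local path
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars hdfs:///user/hadoop/libs/emr-core-1.0-SNAPSHOT.jar /home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000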
For more details, have a look at:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
I am getting the following error while running JUnit 5 with Spring Boot.
Here is the stack trace that I get when I run
./gradlew clean test
:junitPlatformTest
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/util/ReflectionUtil
at org.apache.logging.log4j.jul.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:34)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:46)
at org.apache.logging.log4j.jul.LogManager.getLogger(LogManager.java:89)
at java.util.logging.LogManager.demandLogger(LogManager.java:551)
at java.util.logging.Logger.demandLogger(Logger.java:455)
at java.util.logging.Logger.getLogger(Logger.java:502)
at org.junit.platform.commons.logging.LoggerFactory$DelegatingLogger.<init>(LoggerFactory.java:62)
at org.junit.platform.commons.logging.LoggerFactory.getLogger(LoggerFactory.java:49)
at org.junit.platform.launcher.core.DefaultLauncher.<clinit>(DefaultLauncher.java:44)
at org.junit.platform.launcher.core.LauncherFactory.create(LauncherFactory.java:59)
at org.junit.platform.console.tasks.ConsoleTestExecutor.executeTests(ConsoleTestExecutor.java:61)
at org.junit.platform.console.tasks.ConsoleTestExecutor.lambda$execute$0(ConsoleTestExecutor.java:57)
at org.junit.platform.console.tasks.CustomContextClassLoaderExecutor.invoke(CustomContextClassLoaderExecutor.java:33)
at org.junit.platform.console.tasks.ConsoleTestExecutor.execute(ConsoleTestExecutor.java:57)
at org.junit.platform.console.ConsoleLauncher.executeTests(ConsoleLauncher.java:85)
at org.junit.platform.console.ConsoleLauncher.execute(ConsoleLauncher.java:75)
at org.junit.platform.console.ConsoleLauncher.execute(ConsoleLauncher.java:48)
at org.junit.platform.console.ConsoleLauncher.main(ConsoleLauncher.java:40)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.util.ReflectionUtil
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
The GitHub URL for the above code is: github link
This is caused by mixing different versions of Log4j on the test runtime classpath: Spring Boot pulls in 2.10.0, while your build.gradle declares 2.6.2.
If you remove the version from your dependency definitions, Spring Boot will manage the version and it will work.
So, please replace these two lines:
testRuntime("org.apache.logging.log4j:log4j-core:${log4jVersion}")
testRuntime("org.apache.logging.log4j:log4j-jul:${log4jVersion}")
with these:
testRuntime("org.apache.logging.log4j:log4j-core")
testRuntime("org.apache.logging.log4j:log4j-jul")
Alternatively, you can delete the logManager property from the junitPlatform extension and remove both dependencies.
I am using Lucene 5.4 and recently wanted to migrate a project to the Spring Framework.
If I invoke my indexing code from a plain Java main method, it works without errors, but when I deploy the code on Tomcat 9.0 it fails with the error below. The WEB-INF/lib folder has four Lucene jars: lucene-core-5.4.0.jar, lucene-facet-5.4.0.jar, lucene-queries-5.4.0.jar and lucene-queryparser-5.4.0.jar. I think these four jars should be enough for document indexing, right? Also, since I am using Lucene 5.4, why does the code try to find the Lucene50Codec class rather than the Lucene54Codec class?
Tomcat Exception report
message Handler processing failed; nested exception is java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene50/Lucene50Codec
description The server encountered an internal error that prevented it from fulfilling this request.
exception
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene50/Lucene50Codec
org.springframework.web.servlet.DispatcherServlet.triggerAfterCompletionWithError(DispatcherServlet.java:1302)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:977)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:969)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:860)
javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:845)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
root cause
java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene50/Lucene50Codec
java.lang.Class.getDeclaredConstructors0(Native Method)
java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
java.lang.Class.getConstructor0(Class.java:3075)
java.lang.Class.newInstance(Class.java:412)
org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:47)
org.apache.lucene.codecs.Codec.getDefault(Codec.java:140)
org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:120)
org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:140)
com.zhaoyun.r3ds.core.lucene.LuceneFactoryImpl.createWriter(LuceneFactoryImpl.java:113)
com.zhaoyun.r3ds.core.engine.SearchEngineImpl.getImageWriter(SearchEngineImpl.java:87)
com.zhaoyun.r3ds.core.engine.ImageEngine.addImageDocument(ImageEngine.java:50)
com.zhaoyun.r3ds.restful.controller.SemanticController.index(SemanticController.java:43)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:497)
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:222)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:814)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:737)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:969)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:860)
javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:845)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
The index might have been written with an earlier version of Lucene, and that codec is no longer bundled in Lucene 5.4.
You need to include the lucene-backward-codecs-5.4.0.jar file as well.
Alternatively, you might have multiple versions of Lucene on Tomcat's classpath, some of version 5.0 and some of version 5.4. Make sure that there is only one version of Lucene on Tomcat's classpath.
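For illustration, assuming a standard Tomcat layout (the install path and webapp name below are placeholders):
# add the backward-codecs jar next to the other Lucene 5.4.0 jars
cp lucene-backward-codecs-5.4.0.jar /opt/tomcat/webapps/myapp/WEB-INF/lib/
# then check that only one Lucene version is visible to the webapp
find /opt/tomcat/webapps/myapp/WEB-INF/lib /opt/tomcat/lib -name 'lucene-*.jar'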
I have an Oozie workflow, designed using Hue, that runs a Nutch crawl.
All steps in the process work except for indexing to Solr.
The Oozie action that defines the solrindex step is as follows:
<start to="solr-test"/>
<action name="solr-test">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.apache.nutch.indexer.IndexingJob</main-class>
<java-opts>solr.server.url=http://ip-redacted:8983/solr/raw</java-opts>
<arg>hdfs://ip-redacted:8020/user/admin/c</arg>
<arg>-dir</arg>
<arg>hdfs://ip-redacted:8020/user/admin/s000</arg>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
When I run the action, I get the following error message:
Main class [org.apache.oozie.action.hadoop.JavaMain], exit code [-1]
The locations hdfs://ip-redacted:8020/user/admin/c and hdfs://ip-redacted:8020/user/admin/s000 contain the crawldb and the segments, respectively.
The stderr of the job says:
Log Length: 122
Intercepting System.exit(-1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], exit code [-1]
The syslog says:
ERROR [main] org.apache.nutch.indexer.IndexingJob: Indexer: java.lang.RuntimeException: org.apache.nutch.indexer.IndexWriter not found.
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:51)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:100)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:55)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I have verified that the class exists in the apache-nutch-1.7.jar file.
And if I run it as a Hadoop map-reduce job from the command shell as follows:
hadoop jar apache-nutch-1.7.jar org.apache.nutch.indexer.IndexingJob -D solr.server.url=http://ip-redacted:8983/solr/raw hdfs://ip-redacted:8020/user/admin/c -dir hdfs://ip-redacted:8020/user/admin/s000
it works! But when I run it as an Oozie job created through Hue, it fails.
Also, other actions like inject, generate, fetch, and parse work fine in Hue. It's only the solrindex step that fails, and I don't know how to fix it. Any input on this would be great!
Did you put the Nutch jar (and dependencies if needed) in a 'lib' directory in the HDFS workspace of the workflow?
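As a sketch, assuming the workflow application lives at /user/admin/workflows/nutch-crawl on HDFS (the path is a placeholder):
# Oozie automatically puts jars from the workflow's lib/ directory on the action classpath
hdfs dfs -mkdir -p /user/admin/workflows/nutch-crawl/lib
hdfs dfs -put apache-nutch-1.7.jar /user/admin/workflows/nutch-crawl/lib/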
Ah, I'm beginning to loathe the packaging of Nutch!
Try extracting the classes/plugins folder from the job archive, copying it to HDFS (something like hdfs dfs -put plugins lib), and then adding the HDFS path of the plugins folder to the "files" list of the indexing step; a sketch follows.
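For illustration (the file names and HDFS paths below are placeholders; adjust them to your job file and workflow workspace):
# the Nutch job file is a zip archive; pull out the plugins folder
unzip apache-nutch-1.7.job 'classes/plugins/*' -d nutch-job
# copy the plugins folder into the workflow's HDFS workspace
hdfs dfs -put nutch-job/classes/plugins /user/admin/workflows/nutch-crawl/plugins
Then add that HDFS plugins path to the "files" list of the indexing step in Hue.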
Best,
Edoardo