Configuring Nutch 2.3 with HSQL 2.3.3 - ClassNotFoundException : org/apache/avro/ipc/ByteBufferOutputStream - hsqldb

I'm getting ClassNotFoundException : org/apache/avro/ipc/ByteBufferOutputStream when I run apache Nutch with HSQLDB although I have all the avro related jar files under lib
avro-1.7.6.jar
avro-compiler-1.7.6.jar
avro-ipc-1.7.6.jar
avro-mapred-1.7.6.jar
This is what I did:
Got HSQLDB up and running
root#elephant hsqldb# sudo java -cp /home/hsqldb/hsqldb-2.3.3/hsqldb/lib/hsqldb.jar org.hsqldb.server.Server --props /home/hsqldb/hsqldb-2.3.3/hsqldb/conf/server.properties
[Server#372f7a8d]: [Thread[main,5,main]]: checkRunning(false) entered
[Server#372f7a8d]: [Thread[main,5,main]]: checkRunning(false) exited
[Server#372f7a8d]: Startup sequence initiated from main() method
[Server#372f7a8d]: Loaded properties from [/home/hsqldb/hsqldb-2.3.3/hsqldb/conf/server.properties]
[Server#372f7a8d]: Initiating startup sequence...
[Server#372f7a8d]: Server socket opened successfully in 28 ms.
[Server#372f7a8d]: Database [index=0, id=0, db=file:/home/hsqldb/hsqldb-2.3.3/hsqldb/data/nutch, alias=nutchdb] opened sucessfully in 1406 ms.
[Server#372f7a8d]: Startup sequence completed in 1438 ms.
[Server#372f7a8d]: 2015-12-26 18:30:13.841 HSQLDB server 2.3.3 is online on port 9001
[Server#372f7a8d]: To close normally, connect and execute SHUTDOWN SQL
[Server#372f7a8d]: From command line, use [Ctrl]+[C] to abort abruptly
Configured ivy/ivy.xml
uncommented below lines in ivy.xml
<dependency org="org.apache.gora" name="gora-core" rev="0.5" conf="*->default"/>
and
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating"
conf="*->default" />
uncommented the below lines conf/gora.properites
###############################
# Default SqlStore properties #
###############################
gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchdb
gora.sqlstore.jdbc.user=sa
gora.sqlstore.jdbc.password=
Ran ant build
ant runtime
Added configuration for nutch-site.xml
root#elephant conf# cat nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
</property>
<property>
<name>http.agent.name</name>
<value>NutchCrawler</value>
</property>
<property>
<name>http.robots.agents</name>
<value>NutchCrawler,*</value>
</property>
</configuration>
Created seed.txt under urls folder
Executed the nutch by injecting the urls
[root#elephant local]# bin/nutch inject urls/
InjectorJob: starting at 2015-12-26 19:11:24
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/ipc/ByteBufferOutputStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:259)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:93)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:77)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.ipc.ByteBufferOutputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more

Gora-sql is not supported. Due some licenses issues (if I am not wrong), it became disabled around Gora 0.2.
So I suggest you to use other storage like, for example, HBase.
How to get HBase up&running fast: read answer at https://stackoverflow.com/a/39837926/582789

Related

Failing Oozie Launcher, Cannot Run Program

I am trying to run an oozie workflow with a very basic script. The script itself is very simple, it takes a csv file and loads it into a table in impala. My workflow looks like this.
<workflow-app xmlns='uri:oozie:workflow:0.5' name='TEST'>
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
<start to='postLoad' />
<action name="postLoad">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>${nameNode}/${baseCodePath}/Util/Test.sh</exec>
<env-var>impalaConn=${impalaConn}</env-var>
<env-var>xferUser=${xferUser}</env-var>
<env-var>ingUser=${ingUser}</env-var>
<env-var>explTableName=${explTableName}</env-var>
<env-var>stagingPath=${stagingPath}</env-var>
<file>${nameNode}/${baseCodePath}/Util/Test.sh</file>
<file>${nameNode}/${commonCodePath}/Util/loadUsrEnv.sh</file>
</shell>
...
However when I run it, I always seem to get this error and I'm not sure why it cannot run the program. The directories/files are all pointed to the right places.
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], main() threw exception, Cannot run program "Test.sh" (in directory "/data13/yarn/nm/usercache/user/appcache/application"): error=2, No such file or directory
java.io.IOException: Cannot run program "Test.sh" (in directory "/data13/yarn/nm/usercache/user/appcache/application"): error=2, No such file or directory
Use
<exec>Test.sh</exec>
The <file> tag tells Oozie to download the file from a HDFS location to the YARN container directory

Oozie workflow: Hive action failed because of Tez

Running my script on a data node running the hive client tools is working. But when i schedule the hive script using Oozie than i get the Error as shown below.
I've set the tez.lib.uris in the tez-site.xml to hdfs:///apps/tez/,hdfs:///apps/tez/lib/
What I'm missing here?
Hive script:
USE av_raw;
LOAD DATA INPATH '${INPUT}' INTO TABLE alarms_stg;
INSERT INTO TABLE alarms PARTITION (year, month)
SELECT * FROM alarms_stg WHERE job_id = '${JOBID}';
Workflow action:
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Error:
Log Type: stderr
Log Length: 3227
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/grid/5/hadoop/yarn/local/filecache/2418/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
Logging initialized using configuration in file:/grid/2/hadoop/yarn/local/usercache/hdfs/appcache/application_1417175595182_12259/container_1417175595182_12259_01_000002/hive-log4j.properties
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:358)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:316)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:277)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:137)
at org.apache.tez.client.TezSession.start(TezSession.java:105)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:185)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:356)
... 19 more
Please try to add tez.lib.uris=hdfs:///apps/tez/,hdfs:///apps/tez/lib/ in workflow.xml of your Oozie job
e.g) workflow.xml
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
</configuration>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Eventually you can try to add value of "tez.lib.uris" directly in "Workflow Settings" under "Hadoop Properties" .
tez.lib.uris = hdfs:///apps/tez/,hdfs:///apps/tez/lib/
Before you add it verify the correct value in tez-site.xml.

tomee persistence.xml in EJB JAR META-INF crashes my application deployment

I am using TomEE to deploy an EAR file, that contains one EJB JAR and one WAR.
I want to add entities using the default provider. I have created a resource in tomee.xml to use MySQL DB.
Then I would like to use entity manager so I am trying to create the following persistence.xml in the EJB JAR META-INF directory:
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="1.0">
<persistence-unit name="MyProjectDataBase" transaction-type="JTA">
<provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
<jta-data-source>MyProjectDS</jta-data-source>
<non-jta-data-source>MyProjectDSUnmanaged</non-jta-data-source>
<properties>
<property name="openjpa.jdbc.DBDictionary" value="mysql" />
<property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema" />
</properties>
</persistence-unit>
</persistence>
MyProject & MyProjectUnmanaged are the resources Ids I created in tomee.xml.
Once I add this persistence.xml I get the following exception in catalina.out and my app is not deployed:
SEVERE: Application could not be deployed: /Users/avitale/Development/apache-tomee-jaxrs-1.5.0/apps/projecteam-ear
org.apache.openejb.OpenEJBException: Creating application failed: /Users/avitale/Development/apache-tomee-jaxrs-1.5.0/apps/projecteam-ear: loader (instance of org/apache/catalina/loader/StandardClassLoader): attempted duplicate class definition for name: "org/apache/openejb/cdi/CdiPlugin"
at org.apache.openejb.assembler.classic.Assembler.createApplication(Assembler.java:940)
at org.apache.openejb.assembler.classic.Assembler.createApplication(Assembler.java:532)
at org.apache.openejb.assembler.classic.Assembler.buildContainerSystem(Assembler.java:433)
at org.apache.openejb.assembler.classic.Assembler.build(Assembler.java:341)
at org.apache.openejb.OpenEJB$Instance.<init>(OpenEJB.java:144)
at org.apache.openejb.OpenEJB.init(OpenEJB.java:290)
at org.apache.tomee.catalina.TomcatLoader.initialize(TomcatLoader.java:231)
at org.apache.tomee.catalina.TomcatLoader.init(TomcatLoader.java:131)
at org.apache.tomee.catalina.ServerListener.lifecycleEvent(ServerListener.java:113)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:401)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:110)
at org.apache.catalina.startup.Catalina.load(Catalina.java:633)
at org.apache.catalina.startup.Catalina.load(Catalina.java:658)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:281)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:450)
Caused by: java.lang.LinkageError: loader (instance of org/apache/catalina/loader/StandardClassLoader): attempted duplicate class definition for name: "org/apache/openejb/cdi/CdiPlugin"
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.apache.openejb.cdi.OptimizedLoaderService.loadWebBeansPlugins(OptimizedLoaderService.java:70)
at org.apache.openejb.cdi.OptimizedLoaderService.load(OptimizedLoaderService.java:53)
at org.apache.openejb.cdi.OptimizedLoaderService.load(OptimizedLoaderService.java:47)
at org.apache.webbeans.plugins.PluginLoader.startUp(PluginLoader.java:75)
at org.apache.openejb.cdi.OpenEJBLifecycle.startApplication(OpenEJBLifecycle.java:159)
at org.apache.openejb.cdi.ThreadSingletonServiceImpl.initialize(ThreadSingletonServiceImpl.java:150)
at org.apache.openejb.cdi.CdiBuilder.build(CdiBuilder.java:44)
at org.apache.openejb.assembler.classic.Assembler.createApplication(Assembler.java:794)
... 20 more
Once I remove the persistence.xml then the application is successfully deployed.
Please help me as I don't understand how to proceed :(
Thanks in advance.
Would be helpful if you could specify the version you are actually using.
More over, did you check you won't deliver TomEE or any related dependencies in your EAR file (I mean lib/ or WEB-INF/lib)?
For your information, TomEE 1.5.1 will be released shortly with some fixes around EAR deployments.

Using cargo to deploy war to remote Jboss7 server

I would like to use Cargo to deploy my maven generated war file to a remote JBoss Server that is already running.
I have configured my pom.xml like this:
...
<plugin>
<groupId>org.codehaus.cargo</groupId>
<artifactId>cargo-maven2-plugin</artifactId>
<dependencies>
<dependency>
<groupId>org.jboss.as</groupId>
<artifactId>jboss-as-controller-client</artifactId>
<version>7.1.0.Final</version>
</dependency>
</dependencies>
<configuration>
<!-- Container configuration -->
<container>
<timeout>300000</timeout> <!-- 5 minutes -->
<containerId>jboss71x</containerId>
<type>remote</type>
</container>
<!-- Configuration to use with the container -->
<configuration>
<type>runtime</type>
<properties>
<cargo.hostname><myIP></cargo.hostname>
<cargo.jboss.management.port>9999</cargo.jboss.management.port>
<cargo.remote.username><myUser></cargo.remote.username>
<cargo.remote.password><myPass></cargo.remote.password>
</properties>
</configuration>
<!-- Deployer configuration -->
<deployer>
<type>remote</type>
</deployer>
<!-- Deployables configuration -->
<deployables>
<deployable>
<groupId>de.<myGroup></groupId>
<artifactId><myArtifact></artifactId>
<type>war</type>
<pingURL>http://<myIP>:8080/<myContextPath></pingURL>
<pingTimeout>60000</pingTimeout>
</deployable>
</deployables>
</configuration>
</plugin>
...
Of course all variables of the form <my...> are filled out with the real values.
If I run this with the maven command:
mvn -X cargo:deploy
The maven console says:
...
Sep 12, 2012 5:06:12 PM org.xnio.nio.NioXnio <clinit>
INFO: XNIO NIO Implementation Version 3.0.3.GA
Sep 12, 2012 5:06:12 PM org.jboss.remoting3.EndpointImpl <clinit>
INFO: JBoss Remoting version 3.2.2.GA
[DEBUG] [swordCallbackHandler] Responded to a RealmCallback
[DEBUG] [swordCallbackHandler] Responded to a NameCallback
[DEBUG] [swordCallbackHandler] Responded to a PasswordCallback
[INFO] [Boss7xRemoteDeployer] The deployment has failed: org.codehaus.cargo.util.CargoException: Cannot deploy deployable org.codehaus.cargo.container.jboss.deployable.JBossWAR[myWar.war]
...
On the running JBoss server I can see a logging message in the console like this:
17:07:24,197 ERROR [org.jboss.remoting.remote.connection] (Remoting "pc09:MANAGEMENT" read-1) JBREM000200: Remote connection failed: java.io.IOException: Eine vorhandene Verbindung wurde vom Remotehost geschlossen.
(sorry for the german inside of the log. It basically says: An existing connection was closed by the remote host.)
Does anybody have an idea what could be wrong or how I could get more debug info from cargo to find out what exactly the problem is?
BTW: I have used the JBoss CLI access to that Jboss Server for Arquillian tests so I am pretty confident that the access to that Jboss server via CLI should be ok.
EDIT:
Seems like I have to undeploy first. To reassure that the access to the CLI works I just connected to it using the jboss-cli.bat. Then I just coincidentally undeployed the existing war and after that the deployment via cargo started to work.
(Can I close this question or mark it as resolved in any way?)

javax.naming.NameNotFoundException: Resource /WEB-INF/classes not found with maven tomcat plugin

i am running on maven 2 and i am using the Apache Tomcat Maven Plugin for tomcat 7
with the following configuration:
<plugin>
<groupId>org.apache.tomcat.maven</groupId>
<artifactId>tomcat7-maven-plugin</artifactId>
<version>2.0-beta-1</version>
<configuration>
<path>/${project.build.finalName}</path>
</configuration>
</plugin>
but when trying to run the application with mvn tomcat7:run
i am getting the following exception:
SEVERE: Unable to determine URL for WEB-INF/classes
javax.naming.NameNotFoundException: Resource /WEB-INF/classes not found
at org.apache.naming.resources.BaseDirContext.listBindings(BaseDirContext.java:733)
at org.apache.naming.resources.ProxyDirContext.listBindings(ProxyDirContext.java:546)
at org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1197)
at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:825)
at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:300)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5161)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1568)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1558)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
please advise how to fix this exception.
It seems that is a problem in the tomcat version bundled with the tomcat7-maven-plugin (which is fixed to 7.0.25). See this answer on a similar question.