Nutch 2.3 + Elasticsearch 2.1.1. Cannot load Elasticsearch dependencies - apache

I'm trying to integrate Nutch 2.3 with Elasticsearch 2.1.1, the latest release, so that Nutch can push data to it.
I started updating versions and dependencies in the following files:
<plugin id="indexer-elastic" name="ElasticIndexWriter" version="1.0.0"
<library name="indexer-elastic.jar">
<export name="*" />
<library name="elasticsearch-2.1.1.jar"/>
<library name="hppc-0.7.1.jar"/>
<library name="jackson-core-2.6.2.jar"/>
<library name="jackson-dataformat-cbor-2.6.2.jar"/>
<library name="jackson-dataformat-smile-2.6.2.jar"/>
<library name="jackson-dataformat-yaml-2.6.2.jar"/>
<library name="guava-18.0.jar"/>
<library name="compress-lzf-1.0.2.jar"/>
<library name="t-digest-3.0.jar"/>
<library name="jsr166e-1.1.0.jar"/>
<library name="commons-cli-1.3.1.jar"/>
<library name="netty-3.10.5.Final.jar"/>
<library name="joda-time-2.8.2.jar"/>
<library name="lucene-analyzers-common-5.3.1.jar"/>
<library name="lucene-backward-codecs-5.3.1.jar"/>
<library name="lucene-core-5.3.1.jar"/>
<library name="lucene-highlighter-5.3.1.jar"/>
<library name="lucene-join-5.3.1.jar"/>
<library name="lucene-memory-5.3.1.jar"/>
<library name="lucene-queries-5.3.1.jar"/>
<library name="lucene-queryparser-5.3.1.jar"/>
<library name="lucene-spatial-5.3.1.jar"/>
<library name="lucene-suggest-5.3.1.jar"/>
<library name="HdrHistogram-2.1.6.jar"/>
<library name="joda-convert-1.2.jar"/>
<import plugin="nutch-extensionpoints" />
<extension id="org.apache.nutch.indexer.elastic"
name="Elasticsearch Index Writer"
<implementation id="ElasticIndexWriter"
class="org.apache.nutch.indexwriter.elastic.ElasticIndexWriter" />
<ivy-module version="1.0">
<info organisation="org.apache.nutch" module="${}">
<license name="Apache 2.0" />
<ivyauthor name="Apache Nutch Team" url="" />
<description>Apache Nutch</description>
<include file="../../..//ivy/ivy-configurations.xml" />
<!--get the artifact from our module name -->
<artifact conf="master" />
<dependency org="org.elasticsearch" name="elasticsearch"
rev="2.1.1" conf="*->default" />
<dependency org="" name="guava" rev="18.0" />
I also reworked org.apache.nutch.indexwriter.elastic.ElasticIndexWriter to work against the new interface of the elasticsearch 2.1.1 client.
So what is the problem?
It seems that the dependencies listed in indexer-elastic/plugin.xml are not loaded automatically at runtime, so the Elasticsearch client cannot find them and throws exceptions.
So I tried a different approach: adding the dependencies one by one, guided by each exception, to $NUTCH_ROOT/ivy/ivy.xml, where the main dependencies of Apache Nutch are listed. That's not the right approach, but it more or less works.
How to deal with plugin dependencies?
What is the strategy for using a newer version of a library in a plugin? For example, Nutch uses Guava v11.0.2 but Elasticsearch 2.1.1 requires Guava v18.0. Although I'm specifying it explicitly in indexer-elastic/ivy.xml, the old version seems to be loaded at runtime.

Plugin dependencies should be declared both in the plugin's ivy.xml and in the plugin.xml files. I haven't tested the files you included but cannot see anything immediately wrong with them. As you pointed out, declaring the deps in the main ivy file is not great.
See this note on how to upgrade the Tika plugin, the same logic applies to all the plugins.
As for resolving conflicts between the main dependencies and the ones from the plugins, unfortunately you'll have to deal with it yourself e.g. force the version you need in the main ivy.xml as Nutch does not handle the plugins as dependencies (in the Maven sense) of the main code.
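A minimal sketch of forcing the newer Guava in the main ivy.xml, as suggested above. The `force` attribute is standard Ivy; the `conf="*->default"` mapping is an assumption modelled on the Elasticsearch entry in the question, and whether the rest of Nutch 2.3 runs correctly against Guava 18.0 has not been tested here.

```xml
<!-- $NUTCH_ROOT/ivy/ivy.xml (fragment, goes inside <dependencies>) -->
<!-- force="true" hints to Ivy's conflict manager that this revision
     should win over the Guava 11.0.2 pulled in by other dependencies -->
<dependency org="com.google.guava" name="guava" rev="18.0"
            conf="*->default" force="true" />
```

After changing the file, a clean rebuild is probably needed so that the old jar does not linger in the runtime lib directory.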


apache ant ivy conditionally build a module

I have a use case in which I had to create a new module in our project. Our main project has multiple modules, and each module is a Java project. We use Ivy for dependency resolution. The problem is that the new module uses a Java 1.7 API (WatchService) that does not exist in Java 1.6. In build.xml I can check the Java version and build the new module only when it is 1.7. The difficulty is in the ivy.xml of our main web project, where I have to list the new module's jar as a dependency so that it is included in the generated war file. With Java 1.7 there is no problem: the jar is built and the dependency resolves. With Java 1.6 the jar is not created, so when the war file is generated Ivy cannot resolve the dependency. Maybe my approach is wrong. Please advise me on how to work around this use case.
In Ivy you can use configurations to maintain different sets of dependencies:
<ivy-module version="2.0">
<info organisation="com.myspotontheweb" module="demo"/>
<conf name="compile_jdk7" description="Java JDK7 compile dependencies"/>
<conf name="compile_jdk6" description="Java JDK6 compile dependencies"/>
<!-- JDK7 dependencies -->
<dependency org="org.myorg" name="module1" rev="latest.integration" conf="compile_jdk7->default"/>
<dependency org="org.myorg" name="module2" rev="latest.integration" conf="compile_jdk7->default"/>
<dependency org="org.myorg" name="module3" rev="latest.integration" conf="compile_jdk7->default"/>
<!-- JDK6 dependencies -->
<dependency org="org.myorg" name="module1" rev="latest.integration" conf="compile_jdk6->default"/>
<dependency org="org.myorg" name="module3" rev="latest.integration" conf="compile_jdk6->default"/>
and in the build file use a condition task to choose which configuration is used at run-time to populate the classpath, using the cachepath task:
<project name="demo" default="compile" xmlns:ivy="antlib:org.apache.ivy.ant">
<condition property="compile.config" value="compile_jdk7">
<equals arg1="${}" arg2="1.7"/>
<condition property="compile.config" value="compile_jdk6">
<equals arg1="${}" arg2="1.6"/>
<target name="resolve" description="Use ivy to resolve classpaths">
<ivy:cachepath pathid="compile.path" conf="${compile.config}"/>
<target name="compile" depends="resolve" description="Compile code">
<javac ...... classpathref="compile.path"/>
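For reference, here is a self-contained sketch of the same idea. It assumes the version check uses Ant's built-in `ant.java.version` property (the property name in the snippet above was lost), and the `src` and `build/classes` directories are hypothetical.

```xml
<project name="demo" default="compile" xmlns:ivy="antlib:org.apache.ivy.ant">

    <!-- ant.java.version is set by Ant itself, e.g. "1.6" or "1.7" -->
    <condition property="compile.config" value="compile_jdk7">
        <equals arg1="${ant.java.version}" arg2="1.7"/>
    </condition>
    <condition property="compile.config" value="compile_jdk6">
        <equals arg1="${ant.java.version}" arg2="1.6"/>
    </condition>

    <target name="resolve" description="Use ivy to resolve classpaths">
        <!-- Stop early if neither condition matched -->
        <fail unless="compile.config"
              message="Unsupported JDK: ${ant.java.version}"/>
        <ivy:cachepath pathid="compile.path" conf="${compile.config}"/>
    </target>

    <target name="compile" depends="resolve" description="Compile code">
        <mkdir dir="build/classes"/>
        <javac srcdir="src" destdir="build/classes"
               classpathref="compile.path" includeantruntime="false"/>
    </target>
</project>
```

Because only the selected configuration is resolved, the JDK7-only module is never looked up when building on Java 1.6.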

Ivy/Maven Resolve: Don't pull transitive "provided" jars

I'm using Ivy for my projects, but we're using Artifactory as our jar repository. I actually use <ivy:makepom> Ant task to create a Maven pom.xml, so I can deploy the jars and wars back to my Maven repository via the Maven deploy:deploy workflow.
I build a big jar called common-all.jar that requires about 30 jars for its compilation. I specify about 10 jars, and Ivy pulls down the dependencies. As part of the compile process, I specify the log4j jar, and some JBoss jars. These jars, of course, will be provided by our environment.
With this jar, I also build a bunch of wars. I specify common-all.jar as a dependency, and the 30 jars that common-all.jar requires are also pulled down. All is well and good.
The problem is when I build the war. I do not want the JBoss jars or the log4j jars included in the war, because the environment provides them. I marked them as provided in the pom.xml file when I built common-all.jar.
Now, the question is: how do I specify that I want these jars when I compile the code for the war, but don't want them included in the war itself?
Here's a sample of my ivy.xml file.
How can I specify that common-all.jar requires certain specific jars for compilation, but that I don't want all of these jars when I build it into a war?
<ivy-module version="1.0">
<conf name="default" visibility="public"
description="The single built artifact. Nothing else"/>
<conf name="compile" visibility="public"
description="The master module and transitive dependencies"/>
<conf name="provided" visibility="public"
description="Needed for compile. Will be provided outside or war"/>
<conf name="runtime" visibility="public"
description="Not required for compile, but for runtime"
<conf name="default" visibility="public"
description="The default configuration"
<conf name="test" visibility="private"
description="Required for testing" extends="runtime"/>
<!-- Normal Compile Dependencies -->
<dependency org="ximpleware" name="vtd-xml"
rev="2.5" conf="compile->default"/>
<dependency org="com.travelclick" name="common-all"
rev="4.1" conf="compile->compile,runtime"/>
<!-- Testing -->
<dependency org="junit" name="junit"
rev="4.10" conf="test->default"/>
You haven't demonstrated how you declare the common-all dependency, so I'll make up the following example:
<dependency org="mygroup" name="common-all" rev="1.0" conf="compile->default;provided"/>
The magic is the configuration mapping:
The local "compile" configuration is mapped to the common module and its default (compile) scope dependencies, and
The local "provided" configuration is mapped to the common module and its provided scope dependencies.
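Written out in full, the shorthand above corresponds to the mapping below. This is a sketch under some assumptions: `mygroup` and `1.0` are the made-up coordinates from the example, the local `runtime` configuration extending `compile` is my own choice, and common-all is assumed to have been published from a POM so that Ivy exposes `default` and `provided` configurations for it.

```xml
<configurations>
    <conf name="compile"  description="Needed to compile; packaged in the war"/>
    <conf name="provided" description="Needed to compile; supplied by the container"/>
    <conf name="runtime"  extends="compile" description="Packaged in the war"/>
</configurations>

<dependencies>
    <!-- A configuration named without an arrow ("provided") maps to the
         same-named configuration of the dependency: provided->provided -->
    <dependency org="mygroup" name="common-all" rev="1.0"
                conf="compile->default;provided->provided"/>
</dependencies>
```

Only the `runtime` configuration is retrieved into the war, so anything that is reachable solely through `provided` stays out of it.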
Inside your build file the configurations are used as follows:
<project name="demo" default="build" xmlns:ivy="antlib:org.apache.ivy.ant">
<target name="resolve">
<ivy:cachepath pathid="compile.path" conf="compile"/>
<ivy:cachepath pathid="provided.path" conf="provided"/>
<target name="compile" depends="resolve">
<javac ...
<path refid="compile.path"/>
<path refid="provided.path"/>
<target name="build" depends="compile">
<ivy:retrieve pattern="build/lib/[artifact].[ext]" conf="runtime"/>
<war ...
<lib dir="build/lib"/>
<target name="clean">
<delete dir="build"/>

Keep Ivy from including test dependencies

Consider an ivy.xml like the following:
<ivy-module version="2.0">
<info organisation="" module="FooBar" />
<dependency org="net.sf.ehcache" name="ehcache-core" rev="2.2.0" />
When I run Ivy, it fetches all dependencies for EHCache, even testing dependencies. Specifically, it tries to pull in Hibernate 3.5.1 (which, in the POM file, is listed as a "test" dependency).
How do I prevent Ivy from including test dependencies? I could list it as an excluded dependency, but I don't want to have to do this for every test dependency. I'm new to Ivy and used to the way Maven does things. I was reading about configurations but I don't understand how this aspect of Maven's "scope" maps to "configurations."
You need to define the configuration of the dependency like:
<dependency org="net.sf.ehcache" name="ehcache-core" rev="2.2.0" conf="compile"/>
If you omit conf, Ivy assumes you meant conf="*", which downloads all configurations of that dependency.
Here is a simple example:
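For a module published from a Maven POM, Ivy exposes the Maven scopes as configurations, so the test scope can be left out by mapping to a specific one. A sketch, in which the organisation `example` and the local `compile` configuration name are placeholders:

```xml
<ivy-module version="2.0">
    <info organisation="example" module="FooBar"/>
    <configurations>
        <conf name="compile" description="Needed to build and run"/>
    </configurations>
    <dependencies>
        <!-- "default" covers the module's own artifact plus its runtime
             dependencies; test-scoped dependencies are not part of it -->
        <dependency org="net.sf.ehcache" name="ehcache-core" rev="2.2.0"
                    conf="compile->default"/>
    </dependencies>
</ivy-module>
```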
<conf name="test" visibility="public" />
<conf name="compile" visibility="public" />
<artifact name="${}" type="jar" conf="compile" ext="jar"/>
<artifact name="${}-test" type="jar" conf="test" ext="jar"/>
<!-- COMPILE -->
<dependency org="log4j" name="log4j" rev="1.2.14" conf="compile->*"/>
<dependency org="apache" name="commons-net" rev="2.0" conf="compile->*"/>
<dependency org="itext" name="itext" rev="1.4.6" conf="compile->*"/>
<!-- TEST -->
<dependency org="jsch" name="jsch" rev="0.1.29" conf="test->*"/>
In this example jsch is included in the test configuration only.
If you later resolve this module with conf="compile", you will get all dependencies EXCEPT jsch.
If you resolve it with conf="test", you will get jsch only.
If test extended compile, resolving test would give you all the jars:
<conf name="test" visibility="public" extends="compile" />
<conf name="compile" visibility="public" />
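To see the effect, each configuration can be retrieved into its own directory. A sketch, assuming the Ivy Ant tasks are on Ant's classpath and the ivy.xml sits next to the build file; the `lib/[conf]` layout is just one possible choice.

```xml
<project name="conf-demo" default="retrieve" xmlns:ivy="antlib:org.apache.ivy.ant">
    <target name="retrieve" description="Copy jars into one directory per configuration">
        <!-- [conf] in the pattern expands to the configuration name -->
        <ivy:retrieve conf="compile,test"
                      pattern="lib/[conf]/[artifact]-[revision].[ext]"/>
    </target>
</project>
```

Comparing the two directories shows which jars each configuration actually pulls in.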

Issues using ivy

I am a newbie to Ivy.
I am using the packager resolver. It resolves the zip file, unzips it, and extracts the jar files into a temporary build directory, but they stay there only temporarily: only the jar file whose name matches the module name is copied to the destination, and the rest are ignored. Is there a way to get all the jar files? I use preserveBuildDirectories, but is there a better way to do it?
Also, is it possible to publish an artifact to svn using plain Ivy? I got an error when I tried with Ivy 2.1.0 on XP using Ant 1.8.0: a java.lang.IllegalArgumentException saying authorization failed. Is there a way to do this through ivy:publish?
Is there a way to use an Ivy variable in packager.xml?
Thanks in advance,
1) Packager resolver
You need to include an ivy file for the repackaged module listing all the artifacts.
Here's my example that downloads the files associated with the Solr distribution
<settings defaultResolver="maven2"/>
<caches defaultCacheDir="${user.home}/.ivy2/cache"/>
<ibiblio name="maven2" m2compatible="true"/>
<packager name="repackage" buildRoot="${user.home}/.ivy2/packager/build" resourceCache="${user.home}/.ivy2/packager/cache" preserveBuildDirectories="false">
<ivy pattern="file:///${ivy.settings.dir}/packager/[organisation]/[module]/ivy-[revision].xml"/>
<artifact pattern="file:///${ivy.settings.dir}/packager/[organisation]/[module]/packager-[revision].xml"/>
<module organisation="org.apache.solr" name="solr" resolver="repackage"/>
Note how the packager resolver specifies a path to both an ivy and packager file.
The ivy file specifies the artifacts that are part of the package in the publications section.
<ivy-module version="2.0">
<info organisation="org.apache.solr" module="solr" revision="1.4.0"/>
<conf name="jars" description="Jars released with SOLR distribution"/>
<conf name="webapps" description="Web applications"/>
<!-- jars -->
<artifact name="solr-cell" conf="jars"/>
<artifact name="solr-clustering" conf="jars"/>
<artifact name="solr-core" conf="jars"/>
<artifact name="solr-dataimporthandler" conf="jars"/>
<artifact name="solr-dataimporthandler-extras" conf="jars"/>
<!-- webapps -->
<artifact name="solr" type="war" conf="webapps"/>
The packager file contains the logic that copies out each artifact listed in the ivy file for the solr module.
<packager-module version="1.0">
<property name="name" value="${ivy.packager.module}"/>
<property name="version" value="${ivy.packager.revision}"/>
<resource dest="archive" url="" sha1="521d4d7ce536dd16c424a11ae8837b65e6b7bd2d">
<url href=""/>
<!-- Jar artifacts -->
<move file="archive/apache-${name}-${version}/dist/apache-${name}-cell-${version}.jar" tofile="artifacts/jars/${name}-cell.jar"/>
<move file="archive/apache-${name}-${version}/dist/apache-${name}-clustering-${version}.jar" tofile="artifacts/jars/${name}-clustering.jar"/>
<move file="archive/apache-${name}-${version}/dist/apache-${name}-core-${version}.jar" tofile="artifacts/jars/${name}-core.jar"/>
<move file="archive/apache-${name}-${version}/dist/apache-${name}-dataimporthandler-${version}.jar" tofile="artifacts/jars/${name}-dataimporthandler.jar"/>
<move file="archive/apache-${name}-${version}/dist/apache-${name}-dataimporthandler-extras-${version}.jar" tofile="artifacts/jars/${name}-dataimporthandler-extras.jar"/>
<!-- War artifacts -->
<move file="archive/apache-${name}-${version}/dist/apache-${name}-${version}.war" tofile="artifacts/wars/${name}.war"/>
2) Publish to subversion
I've never used it myself, but I think you need to configure the subversion resolver and use it to publish your artifacts.
3) Using ivy variable in packager file
The packager file listed above uses two ivy variables. Not sure what your question is.
Update: Supporting 3rd party jars
The publications section of the ivy file includes the version number in the name of each 3rd party jar:
ivy file
<artifact name="abc-1.0" conf="jars"/>
<artifact name="pqr-2.0" conf="jars"/>
packager file
<move file="archive/apache-${name}-${version}/dist/abc-1.0.jar" tofile="artifacts/jars/abc-1.0.jar"/>
<move file="archive/apache-${name}-${version}/dist/pqr-2.0.jar" tofile="artifacts/jars/pqr-2.0.jar"/>

Using Maven from Ant

Are there ant plugins that wrap maven so that I can make use of its dependency management features to download jars for me and place them in my ant build's lib folder?
My specific problem is that I'm using the Crap4j plugin for Hudson, but it doesn't support Maven yet. Since it's a small project, Maven is overkill, but I don't want to go without mvn dependency:copy-dependencies if I don't have to.
Any suggestions? (other than suck it up)
There is a new set of Ant tasks that use Mercury. Mercury is the refactored code, implemented by Oleg Gusakov, that will be the basis of the way Maven 3 interacts with Maven (and OSGi) repositories. Mercury is well tested, and you can start using it in Ant projects today. Take a look at some of the How-to documents that Oleg has written:
Here's a simple example of using Mercury in an Ant build.xml file. The following build file creates a classpath that depends on version 3.0 of the asm artifact:
<javac srcdir="src/main/java"
<dependency name="asm:asm:3.0"/>
There are a lot of advanced features such as support for verifying PGP signatures or MD5 digests. You can also start to define different repositories that Mercury depends on. This XML allows you to define a reference to a repository such as Nexus in addition to using a local directory as a repository:
<repo id="myCentral"
<repository dir="/my/local/repo"/>
<javac srcdir="src/main/java"
<dependency name="asm:asm:3.0"/>
If you need to reference a repository that requires authentication Mercury has support for storing a username and password:
<repo id="myCentral"
<auth name="foo" pass="bar"/>
<javac srcdir="src/main/java"
<dependency name="asm:asm:3.0"/>
Most compelling is the ability to publish an artifact to a repository from an Ant build file. If you work in an organization of any scale, you'll want to start thinking about deploying artifacts to a repository manager like Nexus. With Mercury, you can start deploying artifacts to a repository manager without having to adopt Maven. Here's a build file that defines an authenticated repository and writes an artifact:
<repo id="myCentral"
<auth name="foo" pass="bar"/>
<write repoid="myCentral"
Mercury is ready to use, and you can expect a lot of developments from Oleg going forward. If you want to start using it, the best place to look is at Oleg's How-to Page. (Note: This information will soon be integrated into the Definitive Guide)
Whilst the mercury tasks work, I haven't used them. I have had good success with their predecessors, the maven-ant-tasks. They're fairly simple to get going, if you already have a POM handy.
<project name="blah" xmlns:artifact="antlib:org.apache.maven.artifact.ant">
<!-- If you drop the maven-ant-tasks in ~/.ant/lib, you don't need these two bits. -->
<taskdef uri="antlib:org.apache.maven.artifact.ant"
classpathref="ant.classpath" />
<path id="ant.classpath">
<fileset dir="${ant.tasks.dir}">
<include name="*.jar" />
<target name="resolve" description="--> retrieve dependencies with maven">
<!-- Resolve dependencies -->
<artifact:dependencies filesetId="dependency.fileset">
<pom file="pom.xml" />
<!-- Copy all dependencies to the correct location. -->
<copy todir="${web.dir}/WEB-INF/lib">
<fileset refid="dependency.fileset" />
<!-- This mapper strips off all leading directory information -->
<mapper type="flatten" />
I like to keep my ant task jars inside the project, so I've added the taskdef and path. But if you want to put maven-ant-tasks-2.0.9.jar in ~/.ant/lib, then you don't need to declare this stuff. I think.
If you think that Maven is overkill in your project, you could/should try Apache Ivy: It's a very powerful dependency management library similar to the Maven one.
If you're hosting a project on the web, also take a look at Ivy Roundup, a repository of Ivy definitions for various libraries.
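As a rough sketch of the Ivy route for the original question (filling an Ant build's lib folder), assuming the Ivy jar is available to Ant (for example in ~/.ant/lib) and that an ivy.xml listing the dependencies sits next to the build file:

```xml
<project name="fetch-libs" default="retrieve" xmlns:ivy="antlib:org.apache.ivy.ant">
    <target name="retrieve" description="Download dependencies into lib/">
        <!-- Resolves ivy.xml with Ivy's default settings and copies
             the resolved artifacts into the lib directory -->
        <ivy:retrieve pattern="lib/[artifact]-[revision].[ext]"/>
    </target>
</project>
```

With no settings file, Ivy falls back to its default settings, which include the public Maven repository, so ordinary Maven coordinates can be used in ivy.xml.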
Just use the Maven Ant Tasks. They can be downloaded at the normal maven download page.
See: Why you should use the Maven Ant Tasks instead of Maven or Ivy
I wouldn't really recommend Ivy for reasons given in the link above.
It is very simple to run a Maven goal from Ant:
<target name="buildProject" description="Builds the individual project">
<exec dir="${source.dir}\${projectName}" executable="cmd">
<arg value="/c"/>
<arg value="${env.MAVEN_HOME}\bin\mvn.bat"/>
<arg line="clean install" />
Using this, you can run any Maven goal from Ant.
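The snippet above is Windows-only. A hedged sketch of a variant that also runs on Unix-like systems, using the `osfamily` attribute of Ant's exec task; `project.dir` is a hypothetical property, and `mvn` is assumed to be on the PATH:

```xml
<project name="run-maven" default="buildProject">
    <!-- Hypothetical property: directory containing the pom.xml -->
    <property name="project.dir" location="."/>

    <target name="buildProject" description="Runs 'mvn clean install'">
        <!-- Windows: batch scripts have to be launched through cmd /c -->
        <exec dir="${project.dir}" executable="cmd" osfamily="windows"
              failonerror="true">
            <arg value="/c"/>
            <arg value="mvn"/>
            <arg line="clean install"/>
        </exec>
        <!-- Unix-like systems (including macOS): call mvn directly -->
        <exec dir="${project.dir}" executable="mvn" osfamily="unix"
              failonerror="true">
            <arg line="clean install"/>
        </exec>
    </target>
</project>
```

Each exec is skipped on the other platform, and `failonerror="true"` makes the Ant build fail when Maven does.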
In my case I just wanted an EJB jar to be in the repository so that I could use it as a Maven dependency in another project:
<target name="runMaven" depends="deploy" description="Calls Maven.">
<exec executable="cmd">
<arg value="/c"/>
<arg value="mvn.bat install:install-file -DgroupId=com.advance.fisa.prototipo.camel -DartifactId=batch-process -Dversion=1.0 -Dpackaging=jar -Dfile=${jarDirectory}\batch-process.jar"/>
Download Maven Ant Tasks then use this:
<target name="getDependencies">
<path id="maven-ant-tasks.classpath" path="${basedir}${file.separator}maven${file.separator}lib${file.separator}maven-ant-tasks.jar" />
<typedef resource="org/apache/maven/artifact/ant/antlib.xml" uri="antlib:org.apache.maven.artifact.ant" classpathref="maven-ant-tasks.classpath" />
<artifact:dependencies filesetId="dependency.fileset" type="jar">
<pom file="pom.xml" />
<!--TODO take care of existing duplicates in the case of changed/upgraded dependencies-->
<copy todir="lib">
<fileset refid="dependency.fileset" />
<mapper type="flatten" from="${dependency.versions}" />
I am working on the same problem right now. I installed all the necessary libs in my local Maven repo and from there put them into our company Maven repo.
It is not working quite right yet. Some of the tests fail that work nicely in my Maven test run, but since the outcome of the test is not important for the coverage data, I am quite satisfied.
Here is my Maven snippet. I hope that helps.
<property name="compile_classpath" refid="maven.compile.classpath"/>
<property name="runtime_classpath" refid="maven.runtime.classpath"/>
<property name="test_classpath" refid="maven.test.classpath"/>
<property name="plugin_classpath" refid="maven.plugin.classpath"/>
<property name="CRAP4J_HOME" value="${user.home}/Projects/crap4j"/>
<taskdef name="crap4j" classname="org.crap4j.anttask.Crap4jAntTask">
<fileset dir="${CRAP4J_HOME}/lib">
<include name="**/*.jar"/>
<crap4j projectdir="${project.basedir}/alm-jar-server"
<pathElement location="${project.basedir}/target/classes"/>
<pathElement location="${project.basedir}/src/main/java"/>
<pathElement location="${project.basedir}/target/test-classes"/>
<fileset dir="${user.home}/.m2/repository">
<include name="**/*.jar"/>