Oozie workflow: Hive action failed because of Tez

Running my script directly on a data node that has the Hive client tools installed works. But when I schedule the Hive script through Oozie, I get the error shown below.
I've set tez.lib.uris in tez-site.xml to hdfs:///apps/tez/,hdfs:///apps/tez/lib/
What am I missing here?
Hive script:
USE av_raw;
LOAD DATA INPATH '${INPUT}' INTO TABLE alarms_stg;
INSERT INTO TABLE alarms PARTITION (year, month)
SELECT * FROM alarms_stg WHERE job_id = '${JOBID}';
Workflow action:
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Error:
Log Type: stderr
Log Length: 3227
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/grid/5/hadoop/yarn/local/filecache/2418/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
Logging initialized using configuration in file:/grid/2/hadoop/yarn/local/usercache/hdfs/appcache/application_1417175595182_12259/container_1417175595182_12259_01_000002/hive-log4j.properties
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:358)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:316)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:277)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:137)
at org.apache.tez.client.TezSession.start(TezSession.java:105)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:185)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:356)
... 19 more

Please try adding tez.lib.uris=hdfs:///apps/tez/,hdfs:///apps/tez/lib/ to the workflow.xml of your Oozie job,
e.g. workflow.xml:
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
</configuration>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
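If more than one action needs the property, a possible alternative (assuming a workflow schema of 0.4 or later, where the <global> element is available, and an Oozie version that propagates global configuration to Hive actions) is to declare it once at workflow level; a minimal sketch:
<global>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
</configuration>
</global>
Every action in the workflow then inherits the property without repeating the <configuration> block.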

Alternatively, you can try adding the value of "tez.lib.uris" directly in "Workflow Settings" under "Hadoop Properties":
tez.lib.uris = hdfs:///apps/tez/,hdfs:///apps/tez/lib/
Before you add it, verify the correct value in tez-site.xml.
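For reference, a minimal sketch of the entry to look for in tez-site.xml (the HDFS paths are simply the ones used in this question; your cluster may stage the Tez jars elsewhere):
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
It is also worth confirming that the referenced directories actually exist on HDFS before pointing the workflow at them.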

Related

Oozie shell action failing

I am trying to test an Oozie shell action in my Cloudera VM (QuickStart VM). A script that runs a simple HDFS command (hadoop fs -put ...) works, but when I trigger a Hive script the Oozie job finishes with status "KILLED". The only error message I get on the Oozie console is
"Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]"
while the underlying job in the history server (NameNode logs) shows as SUCCEEDED. Below are the Oozie job details:
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="WorkFlow1">
<start to="shell-node" />
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${myscript}</exec>
<file>${myscriptpath}#${myscript}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}] </message>
</kill>
<end name="end" />
</workflow-app>
------------------------------------
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=hdfs://quickstart.cloudera:8032
queueName=default
myscript=test.sh
myscriptpath=${nameNode}/oozie/sl/test.sh
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/oozie/sl/
workflowAppUri=${nameNode}/oozie/sl/
-----------------------------------------------
test.sh
hive -e "create table test2 as select * from test"
Would really appreciate it if anyone can point me in the direction of what I am getting wrong.
It would be good to have a look at the Oozie Hive action.
It's pretty easy to configure, and the Hive action takes care of setting everything up.
https://oozie.apache.org/docs/4.3.0/DG_HiveActionExtension.html
To connect to Hive, you need to explicitly add the hive-site.xml (or the Hive server details) so it can connect.
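As a rough sketch, the shell action above could be replaced by a Hive action along these lines, assuming the statement from test.sh is moved into a script file (the name create_test2.hql is only an example) that sits in the application path next to workflow.xml, together with a copy of hive-site.xml:
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>create_test2.hql</script>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Here create_test2.hql would contain the same statement as test.sh: create table test2 as select * from test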

Hive action failing with SLF4J error : SLF4J: Class path contains multiple SLF4J bindings

I am trying to create a simple workflow with a hive action. I'm using Cloudera Quickstart VM (CDH 5.12). The following are the components of my workflow:
1) top_n_products.hql
create table instacart.top_n as
(
select * from
(
select row_number() over (order by no_of_times_ordered desc)as num_rank, product_id, product_name, no_of_times_ordered
from
(
select A.product_id, B.product_name, count(*) as no_of_times_ordered from
instacart.order_products__train as A
left outer join
instacart.products as B
on A.product_id=B.product_id
group by A.product_id, B.product_name
)C
)D
where num_rank <= ${N}
);
2) hive-config.xml
I have basically copied the default hive-site.xml from /etc/hive/conf into my workflow workspace folder and renamed it to hive-config.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
3) Workflow properties
In the hive action, I set the following:
- set HIVE XML, Job XML paths to my hive-config.xml
- Also added hive-config.xml to Files
- In the workflow properties, set the path to my workspace
- Defined the parameter N in my query
Screenshot of my Hive Action properties
When I try to run the workflow it fails, and stderr shows the following error:
Log Type: stderr
Log Upload Time: Mon Nov 20 19:49:04 -0800 2017
Log Length: 2759
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/130/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Nov 20, 2017 7:47:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
.
.
.
.
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Below are the workflow.xml and job.properties that are generated:
1) Workflow XML:
<workflow-app name="Top_N_Products" xmlns="uri:oozie:workflow:0.5">
<global>
<job-xml>hive-config.xml</job-xml>
</global>
<start to="hive-87ac"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="hive-87ac" cred="hcat">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-config.xml</job-xml>
<script>top_n_products.hql</script>
<param>N={N}</param>
<file>hive-config.xml#hive-config.xml</file>
</hive>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
2) job.properties
security_enabled=False
send_email=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
N=10
Please note that the hive query runs perfectly fine through the Hive query editor. Am I missing something while configuring the workflow? Any help is appreciated!
Thanks,
Deb

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain]

I'm trying to run a Pig script by triggering it through Oozie. Here are the workflow.xml, job.properties, and error message. Please help me solve the issue. I am using the BigInsights VM to run this.
workflow.xml
<workflow-app name="PigApp" xmlns="uri:oozie:workflow:0.1">
<start to="PigAction"/>
<action name="PigAction">
<pig>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<prepare></prepare>
<configuration>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
<property>
<name>oozie.action.sharelib.for.pig</name>
<value>pig</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m -Xms1000m -Xmn100m</value>
</property>
</configuration>
</pig>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Error message[${wf:errorMessage()}]</message>
</kill>
<end name="end"/>
</workflow-app>
Job.properties
#JobTracker and NodeName
jobtracker=bivm:9001
namenode=bivm:9000
#HDFS path where you need to copy workflow.xml and lib/*.jar to
oozie.wf.application.path=hdfs://bivm:9000/user/biadmin/oozieWF/
oozie.libpath=hdfs://bivm:9000/user/biadmin/oozieWF/lib
oozie.use.system.libpath=true
oozie.action.sharelib.for.pig=pig
wf_path=hdfs://bivm:9000/user/biadmin/oozieWF/
#one of the values from Hadoop mapred.queue.names
queueName=default
Error Message:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], main() threw exception, jline.ConsoleReaderInputStream
java.lang.NoClassDefFoundError: jline.ConsoleReaderInputStream
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:219)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:366)
at javax.security.auth.Subject.doAs(Subject.java:572)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:665)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:942)
at java.lang.ClassLoader.loadClass(ClassLoader.java:851)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:827)
... 18 more
If it is a problem related to the Pig jar, then please specify the version and a link to download it. I'm using the Pig 0.12.0 jar.

Error in sqoop action in oozie while fetching data from teradata to hive

I am using HDP 2.3. I am getting the following error in a Sqoop action in Oozie while fetching data from Teradata to Hive.
Sqoop action in my workflow.xml:
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
<job-tracker>${hadoop_jobTrackerURL}</job-tracker>
<name-node>${hadoop_nameNodeURL}</name-node>
<job-xml>lib/hive-site.xml</job-xml>
<configuration>
<property>
<name>oozie.launcher.mapreduce.user.classpath.first</name>
<value>true</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${hadoop_yarnQueueName}</value>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:teradata://192.168.145.129/DBS_PORT=1025,DATABASE=DS_TBL_DB</arg>
<arg>--username</arg>
<arg>dbc</arg>
<arg>--password</arg>
<arg>dbc</arg>
<arg>--driver</arg>
<arg>com.teradata.jdbc.TeraDriver</arg>
<arg>--query</arg>
<arg>select * from ds_tbl_db.catalog_page WHERE $CONDITIONS</arg>
<arg>--hive-import</arg>
<arg>--hive-table</arg>
<arg>catalog_page123</arg>
<arg>--target-dir</arg>
<arg>/user/root/db/catalog_page1234</arg>
<arg>-m</arg>
<arg>1</arg>
<arg>--verbose</arg>
</sqoop>
NOTE: I added tdgssconfig.jar & terajdbc4.jar and all Hive dependency jars to /share/lib. I also tried including the dependencies in the workflow's lib folder.
Error Stack:
ERROR [main] tool.ImportTool (ImportTool.java:run(613)) - Encountered IOException running import job: java.io.IOException: Cannot run program "hive": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:528)
at org.apache.sqoop.util.Executor.exec(Executor.java:76)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:391)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:344)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:245)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 31 more

Configuring Nutch 2.3 with HSQL 2.3.3 - ClassNotFoundException : org/apache/avro/ipc/ByteBufferOutputStream

I'm getting ClassNotFoundException: org/apache/avro/ipc/ByteBufferOutputStream when I run Apache Nutch with HSQLDB, although I have all the Avro-related jar files under lib:
avro-1.7.6.jar
avro-compiler-1.7.6.jar
avro-ipc-1.7.6.jar
avro-mapred-1.7.6.jar
This is what I did:
Got HSQLDB up and running
root@elephant hsqldb# sudo java -cp /home/hsqldb/hsqldb-2.3.3/hsqldb/lib/hsqldb.jar org.hsqldb.server.Server --props /home/hsqldb/hsqldb-2.3.3/hsqldb/conf/server.properties
[Server@372f7a8d]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@372f7a8d]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@372f7a8d]: Startup sequence initiated from main() method
[Server@372f7a8d]: Loaded properties from [/home/hsqldb/hsqldb-2.3.3/hsqldb/conf/server.properties]
[Server@372f7a8d]: Initiating startup sequence...
[Server@372f7a8d]: Server socket opened successfully in 28 ms.
[Server@372f7a8d]: Database [index=0, id=0, db=file:/home/hsqldb/hsqldb-2.3.3/hsqldb/data/nutch, alias=nutchdb] opened sucessfully in 1406 ms.
[Server@372f7a8d]: Startup sequence completed in 1438 ms.
[Server@372f7a8d]: 2015-12-26 18:30:13.841 HSQLDB server 2.3.3 is online on port 9001
[Server@372f7a8d]: To close normally, connect and execute SHUTDOWN SQL
[Server@372f7a8d]: From command line, use [Ctrl]+[C] to abort abruptly
Configured ivy/ivy.xml
Uncommented the below lines in ivy.xml:
<dependency org="org.apache.gora" name="gora-core" rev="0.5" conf="*->default"/>
and
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating"
conf="*->default" />
Uncommented the below lines in conf/gora.properties:
###############################
# Default SqlStore properties #
###############################
gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchdb
gora.sqlstore.jdbc.user=sa
gora.sqlstore.jdbc.password=
Ran ant build
ant runtime
Added configuration for nutch-site.xml
root@elephant conf# cat nutch-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.sql.store.SqlStore</value>
</property>
<property>
<name>http.agent.name</name>
<value>NutchCrawler</value>
</property>
<property>
<name>http.robots.agents</name>
<value>NutchCrawler,*</value>
</property>
</configuration>
Created seed.txt under urls folder
Executed Nutch by injecting the urls:
[root@elephant local]# bin/nutch inject urls/
InjectorJob: starting at 2015-12-26 19:11:24
InjectorJob: Injecting urlDir: urls
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/ipc/ByteBufferOutputStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:259)
at org.apache.nutch.storage.StorageUtils.getDataStoreClass(StorageUtils.java:93)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:77)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:218)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.ipc.ByteBufferOutputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
Gora-sql is not supported. Due to some license issues (if I am not wrong), it was disabled around Gora 0.2.
So I suggest you use another storage backend, for example HBase.
How to get HBase up & running fast: read the answer at https://stackoverflow.com/a/39837926/582789
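As a rough sketch of the switch (the gora-hbase revision below simply mirrors the gora-core revision already used in this question's ivy.xml; check the version that matches your Nutch build), you would enable the HBase backend in ivy/ivy.xml and point storage.data.store.class at the HBase store in nutch-site.xml:
<!-- ivy/ivy.xml -->
<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
<!-- nutch-site.xml -->
<property>
<name>storage.data.store.class</name>
<value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
After that, rebuild with ant runtime and make sure a running HBase is reachable (typically by placing its hbase-site.xml on the Nutch classpath).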