java.io.IOException: Job failed - apache

I'm trying to index a site with Apache Nutch 1.4, and when I run the command below, the error "java.io.IOException: Job failed" occurs:
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
I installed Tomcat 6 and Apache Solr 3.5.0 to work with Nutch, but unfortunately it is not working.
Console session:
root@debian:/usr/share/nutch/runtime/local$ bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2012-03-28 18:45:25
Adding 48 documents
java.io.IOException: Job failed!
root@debian:/usr/share/nutch/runtime/local$
Can someone help me please?

This error often occurs when the mapping of Nutch result fields onto Solr fields is incorrect or incomplete. That causes the "update" action to be rejected by the Solr server. Unfortunately, at some point in the call chain this error is converted into an "IO error", which is a little misleading. My recommendation is to open the web console of the Solr server (reachable at the same URL used for submitting links, e.g. in this case http://some.solr.server:8983/solr/) and go to the logging tab. Errors concerning the mapping will show up there!
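In Nutch 1.x that mapping lives in conf/solrindex-mapping.xml; a trimmed sketch of the file is shown below (the field names are illustrative, the exact list depends on your setup). Every dest field it names must also be declared in Solr's schema.xml, otherwise the update is rejected and the job fails with exactly this kind of IOException.

<mapping>
  <!-- copy Nutch document fields into the Solr fields they should be indexed as -->
  <fields>
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="host" source="host"/>
    <field dest="id" source="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>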

Looks like Solr is not configured right. (Also, please ensure that the input linkdb, crawldb and segments are present at the locations you pass on the command line.)
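A quick sanity check of those paths, using the layout from the question:

ls crawl/crawldb crawl/linkdb crawl/segments/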
Read:
Setting up Solr 1.4 with Apache Tomcat 6.X
Nutch 1.3 and Solr Integration.

Related

Cannot start renderd service for mod_tile

I am building an OSM tile server as per the directions available here: https://switch2osm.org/manually-building-a-tile-server-16-04-2-lts/ on an Amazon EC2 instance with Ubuntu 16.04 LTS.
Everything is working well until the step of starting renderd as a service:
sudo /etc/init.d/renderd start
This returns an error of: "Job for renderd.service failed because the control process exited with error code. See "systemctl status renderd.service" and "journalctl -xe" for details."
Checking the details mentioned gives messages like:
"renderd.service: Control process exited, code=exited status=203"
"The error number returned by this process is 8."
I can, however, run renderd directly with no problem, as below, and can even (slowly) load tiles into a Leaflet map; I just cannot run it as a service.
sudo -u username renderd -f -c /usr/local/etc/renderd.conf
I have also tried changing to my rendering user and starting the service from there, but then I get a password prompt for user ubuntu (which has no password).
What else can I test out or investigate to find out what the problem is?
I decided to start building my server again from scratch, this time also using information from other tutorials: https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04 and https://ircama.github.io/osm-carto-tutorials/tile-server-ubuntu
Following those instructions, renderd now runs as a service. The main difference I noticed was that those tutorials use https://github.com/openstreetmap/mod_tile.git rather than the https://github.com/SomeoneElseOSM/mod_tile.git source I used before, so perhaps the settings of the branched mod_tile were not compatible with my server.
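A note for anyone else hitting this: in systemd, an exit status of 203 generally means the service manager could not execute the configured binary at all, so it is worth comparing the path in the unit file or init script against where renderd was actually installed. A rough check (the paths below are illustrative; adjust to your install prefix):

systemctl cat renderd.service    # shows the ExecStart line the service will use
which renderd                    # where the binary actually ended up
ls -l /usr/local/bin/renderd     # common prefix when building mod_tile from source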

RavenDB in RAM and 404 Not Found errors

I'm trying to use a full in-memory RavenDB (version 2.5.2996) to run some integration tests.
I started the RavenDB server using the following command:
Raven.Server.exe --ram --debug
The server started correctly.
The integration tests get stuck, and I get a lot of errors in the RavenDB debug console:
Request #143: GET - 0 ms - <database name> - 404 - /indexes/Raven/DocumentByEntityName?definition=yes
Using a normal RavenDB instance (not in memory), the integration tests pass.
I tried to search the RavenDB documentation for some clues, but I didn't find anything. Can anyone help me understand why it is not working?
The --ram option only applies to the system database; you need to specify that each database also runs in memory by setting Raven/RunInMemory = true when you create the database.
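As a rough sketch of what that looks like against the 2.x HTTP API (the endpoint, port, database name and Raven/DataDir value here are assumptions for illustration, not taken from the question):

# TestDb, port 8080 and the DataDir path below are placeholders
curl -X PUT http://localhost:8080/admin/databases/TestDb \
     -d '{ "Settings": { "Raven/RunInMemory": "true", "Raven/DataDir": "~/Databases/TestDb" } }'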

HBase Nutch error [Ljava.lang.StackTraceElement

My Apache Nutch crawler is running, and the following error appears in the log file.
ERROR store.HBaseStore - Connection refused 2014-11-17 00:00:38,255 ERROR store.HBaseStore - [Ljava.lang.StackTraceElement;@6dce5061
How can I remove this error? According to my research, this error comes from HBase and not from Nutch. This question has been posted here before, but it has no answer; I will have to put a bounty on it if I don't get an answer, which is why I am posting again.
Some information about my small cluster follows (a 2-machine cluster):
On machine one, Hadoop and HBase are running.
On machine two, the Apache Nutch crawler (2.2.1) is running.
When I check the log files of HBase and Hadoop, there isn't any information about the bug. Because of this bug, crawled data is not being saved in HBase (machine 1). That's a real problem for me, and my crawler is not crawling properly. There are about 266 GB of already-crawled data in the table.
This problem "Connection refused" is simply because your region server is not running properly

Mule with redis

I am trying to use the Redis connector in Mule, but when I start the application it gives the error below:
org.xml.sax.SAXParseException: schema_reference.4: Failed to read schema document 'http://www.mulesoft.org/schema/mule/redis/current/mule-redis.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>
I checked the URL, and it is giving 404 error. I tried replacing current with 3.2, but no luck.
Does anybody have an idea how we can get it running?
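For context, the URL in that error comes from the schemaLocation pair declared for the Redis namespace in the Mule config, which typically looks something like the snippet below (the surrounding mule element and the other namespaces are omitted here):

xmlns:redis="http://www.mulesoft.org/schema/mule/redis"
xsi:schemaLocation="http://www.mulesoft.org/schema/mule/redis
                    http://www.mulesoft.org/schema/mule/redis/current/mule-redis.xsd"

The parser fails because it cannot fetch or resolve the .xsd given in the second half of that pair.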

Sunspot lock issue on EngineYard

I am having a problem when creating a new record on a RoR3 server.
It updates Solr indexes, and it's having a problem with a lock.
RSolr::Error::Http (RSolr::Error::Http - 500 Internal Server Error
Error: Lock obtain timed out: NativeFSLock@/data/dfcgit_r3/releases/20130620195714/solr/data/production/index/write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/data/dfcgit_r3/releases/20130620195714/solr/data/production/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:219)
Any help with this?
We had the same error when running Sunspot Solr on Amazon EC2.
The write.lock indicated that some process had not released the lock on a resource: either the web server process was still holding it, or Solr had some other process running. I checked the running Solr processes by executing
ps -aux |grep solr
And it showed there were 4 processes running! So I stopped Solr with the sunspot:solr:stop command, ran the grep again, killed the Solr processes still listed (kill -9), and then ran sunspot:solr:start.
And the sun shone again. It worked fine thereafter.
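For reference, the full sequence was roughly the following, assuming the standard rake tasks shipped with the sunspot_solr gem:

rake sunspot:solr:stop
ps aux | grep solr       # kill -9 any leftover Solr PIDs that are still listed
rake sunspot:solr:start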