Crawling Jira with ManifoldCF and Solr - String index out of range

I am using ManifoldCF v2.7.1 and Solr v5.2.1 to crawl Jira with the Jira connector, and I am getting the following error in ManifoldCF:
Error: Repeated service interruptions - failure processing document:
Error from server at (servername:port/solr/jira): String index out of range: -11
Note: I removed my server and port info from the error message.
One of the error logs from Solr is showing the following at the top of the stacktrace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -11
at org.apache.solr.request.macro.MacroExpander._expand(MacroExpander.java:144)
I don't know what is causing this error or how to fix it. Thanks in advance!

It turns out there was a Jira issue with Java code written in its comments section. I suspect it wasn't being escaped properly by ManifoldCF. To work around this, I excluded that one problematic issue from future crawls.
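For context, the Solr stack trace points at MacroExpander, which treats ${...} sequences in request parameter values as macros, and ManifoldCF's Solr connector appears to pass document fields as request parameters to the extracting handler. So a comment containing Java code with a ${...} placeholder could plausibly trip the expander. One defensive option is to break up those sequences before the text reaches Solr; below is a minimal sketch, where the class and method names are hypothetical, not ManifoldCF or Solr API:

/**
 * Hypothetical pre-indexing sanitizer: breaks up "${" sequences in field
 * values so Solr's MacroExpander never sees them as macro openers.
 */
public final class SolrMacroSanitizer {

    private SolrMacroSanitizer() {
    }

    /** Inserts a space between '$' and '{' to defeat macro detection. */
    public static String sanitize(String fieldValue) {
        if (fieldValue == null) {
            return null;
        }
        return fieldValue.replace("${", "$ {");
    }

    public static void main(String[] args) {
        String comment = "String msg = \"${user.name} logged in\";";
        // Prints: String msg = "$ {user.name} logged in";
        System.out.println(sanitize(comment));
    }
}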

Related

Ontotext GraphDB Repository cannot be used for queries

I am getting an error message while trying to run a SPARQL query against a particular repository.
Error :
The currently selected repository cannot be used for queries due to an error:
Page [id=7, ref=1,private=false,deprecated=false] from pso has size of 206 != 820 which is written in the index: PageIndex#244 [OPENED] ref:3 (parent=null freePages=1 privatePages=0 deprecatedPages=0 unusedPages=0)
So I tried to recreate the repository by uploading a new RDF file, but the issue persists. Any solution? Thanks in advance.
The error indicates an inconsistency between what is written in the index (pso.index) and the actual page (pso). Is there any chance that the binary files were modified, overwritten, or partially merged? Under normal operation, you should never get this error.
The only way to hide this error is to start GraphDB with ./graphdb -Dthrow.exception.on.index.inconsistency=false. I would recommend doing this only to dump the repository content into an RDF file, then drop the repository and recreate it.
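If you want to script that dump, GraphDB exposes the standard RDF4J repository API. Here is a minimal sketch, assuming GraphDB runs on the default port 7200 and the repository id is myRepo (both hypothetical):

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.http.HTTPRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.Rio;

public class DumpRepository {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and repository id; adjust to your setup.
        HTTPRepository repo =
            new HTTPRepository("http://localhost:7200/repositories/myRepo");
        repo.init();
        try (RepositoryConnection conn = repo.getConnection();
             OutputStream out = new FileOutputStream("dump.ttl")) {
            // Streams every statement in the repository into a Turtle file.
            conn.export(Rio.createWriter(RDFFormat.TURTLE, out));
        } finally {
            repo.shutDown();
        }
    }
}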

Not able to start the ignite server through java code

I am using Ignite native persistence with atomicity set to TRANSACTIONAL_SNAPSHOT. When I try to load old storage that was configured with atomicity TRANSACTIONAL, it gives the Unknown page type error even after I delete the .dat file, but with new storage it works fine. Can anybody help me?
org.h2.jdbc.JdbcSQLException: General error: "java.lang.IllegalStateException: Unknown page type: 10009 pageId: 0002ffff00000006"; SQL statement:
CREATE TABLE "DFM"."ANSWER_TYPE_ENUM" (_KEY VARCHAR INVISIBLE NOT NULL,_VAL OTHER INVISIBLE,"ID" VARCHAR,"ENUM_VALUE" VARCHAR) engine "org.apache.ignite.internal.processors.query.h2.H2TableEngine" [50000-197]
I've never seen errors like these, but I would say that TRANSACTIONAL_SNAPSHOT is experimental and should be avoided for now.
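For what it's worth, a minimal persistence-enabled configuration that stays on plain TRANSACTIONAL atomicity might look like this (the cache name and value types are hypothetical):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartIgnite {
    public static void main(String[] args) {
        // Enable native persistence on the default data region.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // TRANSACTIONAL (not TRANSACTIONAL_SNAPSHOT) keeps the on-disk page
        // format compatible with storage created under the same mode.
        CacheConfiguration<String, Object> cacheCfg =
            new CacheConfiguration<>("answerTypeEnum"); // hypothetical name
        cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        cfg.setCacheConfiguration(cacheCfg);

        Ignite ignite = Ignition.start(cfg);
        ignite.cluster().active(true); // activate the persistent cluster
    }
}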

Talend (7.0.1) - Cannot modify mapred.job.name at runtime

I am having some trouble running a simple tHiveCreateTable job in Talend OS for Big Data (screenshot of the failing job omitted).
The Hive connection is fine and the job worked until Ranger was activated in the cluster.
After Ranger was enabled, I started getting the following log:
[statistics] connecting to socket on port 3345
[statistics] connected
Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
[statistics] disconnected
This error occurs either using Tez or MapReduce for the job, throwing an exception in the following line of the automatically generated code:
// For MapReduce Mode
stmt_tHiveCreateTable_1.execute("set mapred.job.name=" + queryIdentifier);
Do you know of any solution or workaround for this?
Thanks in advance
It is possible to stop Talend 7 jobs from setting mapred.job.name and hive.query.name at runtime.
Edit the file
{talend_install_dir}/plugins/org.talend.designer.components.localprovider_7.1.1.20181026_1147/components/templates/Hive/SetQueryName.javajet
and comment out lines 6 and 11 like this:
// stmt_<%=cid %>.execute("set mapred.job.name=" + queryIdentifier_<%=cid %>);
// stmt_<%=cid %>.execute("set hive.query.name=" + queryIdentifier_<%=cid %>);
It solved this issue for me.

Error when connecting Hive with kibi?

I am using the kibi-community-demo-full-4.6.4-linux-x64 version.
In datasource:
"connection_string": "jdbc:hive://localhost:10000/root",
"libpath": "/home/pare/Downloads/jar/",
"drivername": "org.apache.hadoop.hive.jdbc.HiveDriver",
"libs": "hive-jdbc-0.11.0.jar,hive-metastore-0.11.0.jar,libthrift-0.9.1.jar,hive-service-0.13.1.jar,hive-jdbc-1.2.1.2.3.2.0-2950-standalone.jar,hadoop-common-2.7.1.2.3.2.0-2950.jar",
After that, when I write a query in the Queries Editor, it shows an error like:
Queries Editor: Error 400 Bad Request: Error running static method java.lang.IllegalArgumentException: Bad URL format at org.apache.hive.jdbc.Utils.parseURL(Utils.java:185) at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:84)
What does this error mean? Can anyone explain how to solve it?
I was able to connect after changing the JAR versions and also changing the driver name to "org.apache.hive.jdbc.HiveDriver".
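For context, org.apache.hive.jdbc.HiveDriver is the HiveServer2 driver and expects a jdbc:hive2:// URL, which is why the old jdbc:hive://... connection string fails inside Utils.parseURL. Here is a minimal standalone connectivity check, with the host, port, database, and user as assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // HiveServer2 driver; pairs with the jdbc:hive2:// URL scheme.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical host/port/database; adjust to your cluster.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "pare", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}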

Lucene Search Error Stack

I am seeing the following error when trying to search using Lucene (version 1.4.3). Any ideas why I might be seeing this and how to fix it?
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:195)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:55)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:109)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:106)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:43)
In this same environment I also see the following error:
Caused by: java.io.IOException: Lock obtain timed out:
Lock#/tmp/lucene-3ec31395c8e06a56e2939f1fdda16c67-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:58)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:223)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:213)
The same code works in a test environment but not in production. I cannot identify any obvious differences between the two environments.
Either the file permissions are wrong (the process needs write permission) or you are unable to access a locked file that the current process needs.
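If a crashed process left a stale write lock behind, the Lucene 1.4 API can clear it. A minimal sketch follows; the index path is hypothetical, and this is only safe when you are sure no other writer is still running:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ClearStaleLock {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location; the process also needs write access
        // to the lock directory (/tmp by default in Lucene 1.4).
        Directory dir = FSDirectory.getDirectory("/path/to/index", false);
        if (IndexReader.isLocked(dir)) {
            // Removes the stale write lock left by a crashed process.
            IndexReader.unlock(dir);
        }
        dir.close();
    }
}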