Apache Solr 6: multiple DataImportHandlers

I want to index from two different databases, so I created two data-config.xml files with different names and registered two request handlers using DataImportHandler in solrconfig.xml:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-847.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport857" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-857.xml</str>
  </lst>
</requestHandler>
But it does not work. The same configuration worked without problems in Solr 4.7. What is different between Solr 4.7 and Solr 6.0, and how do I get it working?

This is probably SOLR-8993, which affects the new Admin UI.
Workarounds:
Use the legacy Admin UI, accessible through a link at the top of the screen.
Pass the config value as a URL parameter by invoking the DIH URL directly rather than via the Admin UI. The defaults section is just that: defaults that can be overridden with URL parameters.
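For the second workaround, each handler can be invoked directly over HTTP; a minimal sketch, assuming a core named mycore (the core name and parameter values are placeholders, adjust them to your installation):

```shell
# Trigger a full import through each handler, passing DIH parameters on the URL.
# "mycore" is a placeholder core name.
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true&commit=true"
curl "http://localhost:8983/solr/mycore/dataimport857?command=full-import&clean=true&commit=true"
```

Because the values in the defaults section are only defaults, the config file can also be overridden per request in the same way, e.g. by appending &config=data-config-847.xml.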

Apache Solr 9.0 OCR of Image Saved As PDF

I've got Solr 9.0 running with a request handler set up for Tika per https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html.
If I pass in a pdf document (that is, a text document that is stored as a PDF), I get the expected results of being able to query for the content of the document.
If, however, I pass in a PDF that is an image (a scanned page from a newsletter, saved as a PDF), no OCR takes place. I'm using SolrJ to communicate with the Solr install.
I also tried indexing the PDF after it was exported as a PNG. This worked testing with locally running tika, but not with solr.
fun index(file: File) {
    val urlString = "http://localhost:8983/solr/films"
    val solr = HttpSolrClient.Builder(urlString).build()
    solr.parser = XMLResponseParser()
    val req = ContentStreamUpdateRequest("/update/extract")
    // I've tried both "image/pdf" and "application/pdf"
    req.addFile(file, "image/pdf")
    req.setParam("literal.id", file.name)
    req.setAction(ACTION.COMMIT, true, true)
    val result = solr.request(req)
    println("Result: $result")
}
solrconfig.xml
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
I decided to test Tika in isolation, so I started the Docker container:
docker run -it \
--name tika-server-ocr \
-d \
-p 9998:9998 \
apache/tika:1.24-full
If I passed in the file as a PDF, it did not work:
curl -T "285 October-5.pdf" http://localhost:9998/tika
If I pass in an exported png from the PDF, it does work:
curl -T "285 October-5 copy.png" http://localhost:9998/tika
NEGOTIATING GARRY'S ANCHORAGE
Garry's Anchorage is a popular rest spot on the western
I'm guessing there is a bit of config or perhaps a parameter I need to send in to solr during the extraction?
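One thing worth checking in the isolated Tika test: by default Tika does not run OCR on images embedded inside a PDF; inline-image extraction has to be switched on. A sketch against the tika-server container started above (the header corresponds to Tika's PDF parser option; verify it against your Tika version):

```shell
# Ask tika-server to extract inline images from the PDF and run OCR on them.
curl -T "285 October-5.pdf" http://localhost:9998/tika \
  -H "X-Tika-PDFextractInlineImages: true"
```

If this returns the OCR'd text, the equivalent setting then needs to reach the Tika instance embedded in Solr, e.g. via a Tika configuration file referenced from the extract handler, since Solr does not appear to enable it by default.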

'sharedLib' in solr.xml has no effect after upgrading from Solr 4.7 to Solr 5.5.5

I am trying to upgrade from Solr 4.7 to Solr 5.5.5.
In Solr 4.7, solr.xml contained <str name="sharedLib">common</str>, so configuration files such as solrconfig.xml were loaded from '$SOLR_HOME/solr/common/conf' instead of from each core.
Now, after upgrading to Solr 5.5.5, I added the same 'sharedLib' entry, but after starting Solr I see errors in the UI for each core:
mycore_1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycore_1: Error loading solr config from ($SOLR_HOME)/solr/mycore_1/conf/solrconfig.xml
mycore_2: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycore_2: Error loading solr config from ($SOLR_HOME)/solr/mycore_2/conf/solrconfig.xml
remote_shared_instance: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core remote_shared_instance: Error loading solr config from ($SOLR_HOME)/solr/remote_shared_instance/conf/solrconfig.xml
It seems that Solr searched for the configuration files in each core's folder instead of the shared folder.
solr.xml is in the right folder ($SOLR_HOME), because when I delete it I get the message 'Solr home directory $SOLR_HOME/solr must contain a solr.xml file!' when starting Solr.
I start Solr from $SOLR_HOME, which is different from the Solr installation folder.
This is how the 'solr' section in solr.xml file looks:
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
  <str name="sharedLib">common</str>
</solr>
I also tried using configsets: I created a new folder under $SOLR_HOME, 'solr/solr/configsets/common/conf', with conf files, but how can I tell Solr to refer to this folder? Note that the original 'configsets' folder is under the Solr installation folder 'solr-5.5.5/server/solr/configsets' and not under $SOLR_HOME.
Have I missed something?
Thanks!
Mike
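For the configsets route, a sketch of what should work in Solr 5.x (paths here are illustrative; check the 5.5 reference guide for your setup): Solr looks for configsets under its home directory's configsets folder by default, overridable with <str name="configSetBaseDir"> in solr.xml, and each core opts in through its core.properties:

```
# core.properties in the core's directory (illustrative)
name=mycore_1
configSet=common
```

With this in place, the core loads solrconfig.xml and the schema from configsets/common/conf rather than from its own conf folder.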

Solr 5 restart losing cores

Noob Solr question
I am trying to set up Solr, and to help I have been using the Apache Solr installer from Bitnami.
This installs Solr 5.4.
I created a new core, and everything looks good. However, when I restart Solr, the core I just created is lost.
I have not altered any configuration from what is installed by Bitnami.
I have been reading up on how Solr 5 is self-discovering, and I am sure that everything is correct.
This is a copy of my solr.xml file from C:\Bitnami\solr-5.4.0-0\apache-solr\solr
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>
</solr>
And I have checked: in the core folder I created, there is a core.properties file in the conf folder. This is the contents of the file:
#Written by CorePropertiesLocator
#Tue Dec 22 10:37:24 UTC 2015
name=sitecore_analytics_index
config=solrconfig.xml
schema=schema.xml
dataDir=data
loadOnStartup=true
So I cannot understand why the core is not being discovered. Any help is greatly appreciated.
PS: I am doing this on Windows, not *nix.
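For reference, Solr 5's core discovery walks the Solr home tree looking for core.properties files and treats the directory containing each file as that core's instance directory, so the expected layout is roughly the following (a sketch, using the core name from above):

```
C:\Bitnami\solr-5.4.0-0\apache-solr\solr\
  solr.xml
  sitecore_analytics_index\
    core.properties      <- in the core's root directory, not under conf\
    conf\
      solrconfig.xml
      schema.xml
    data\
```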

Allow more than 2 GB file upload in Struts2

I am using Struts 2.1 in my project.
The maxSize constant in my project's struts.xml is as follows:
<constant name="struts.multipart.maxSize" value="2147483648" />
For the file upload process, is it possible to exceed the normal 2 GB file limit of Struts2?
You should migrate to the latest version of Struts2.
From 2.3.20 onwards, a new MultipartRequest implementation can be used to upload large files:
Alternate Libraries
The struts.multipart.parser used by the fileUpload interceptor to
handle HTTP POST requests, encoded using the MIME-type
multipart/form-data, can be changed out. Currently there are two
choices, jakarta and pell. The jakarta parser is a standard part of
the Struts 2 framework needing only its required libraries added to a
project. The pell parser uses Jason Pell's multipart parser instead of
the Commons-FileUpload library. The pell parser is a Struts 2 plugin,
for more details see:
http://cwiki.apache.org/S2PLUGINS/pell-multipart-plugin.html. There
was a third alternative, cos, but it was removed due to licensing
incompatibilities.
As of Struts version 2.3.18, a new implementation of MultiPartRequest
was added - JakartaStreamMultiPartRequest. It can be used to handle
large files; see WW-3025 for more details. You can simply set
<constant name="struts.multipart.parser" value="jakarta-stream" />
in struts.xml to start using it.
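Putting both settings together, a sketch of the relevant struts.xml constants (the 4 GB maxSize value is illustrative):

```xml
<!-- Use the streaming multipart parser for large uploads (Struts 2.3.20+) -->
<constant name="struts.multipart.parser" value="jakarta-stream" />
<!-- Overall request size limit in bytes (here 4 GB, illustrative) -->
<constant name="struts.multipart.maxSize" value="4294967296" />
```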

Special character encoding issue on Solaris with weblogic server

I have an application which uses FOP and XSLT to generate a PDF file. Special characters such as §£?ÐÅÆ appear as ???? in the PDF.
The WebLogic server is running on a Solaris machine.
I have already tried with
<charset-params>
<input-charset>
<resource-path>/*</resource-path>
<java-charset-name>UTF-8</java-charset-name>
</input-charset>
<charset-mapping>
<iana-charset-name>UTF-8</iana-charset-name>
<java-charset-name>UTF-8</java-charset-name>
</charset-mapping>
</charset-params>
in weblogic.xml.
I have also tried with
transformer.setOutputProperty( OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.setOutputProperty( OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
Nothing seems to be working.
Have you set up FOP to find fonts containing those characters? For instance, on Solaris 11,
using FOP (though not with WebLogic), I had to set up font paths in a fop-conf.xml:
<?xml version="1.0"?>
<!-- NOTE: This is the version of the configuration -->
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- register all the fonts found in a directory -->
        <directory>/usr/share/fonts/TrueType/core/</directory>
        <directory>/usr/share/fonts/TrueType/dejavu/</directory>
        <directory>/usr/share/fonts/TrueType/liberation/</directory>
        <directory>/usr/share/fonts/TrueType/unifont/</directory>
        <!-- register all the fonts found in a directory and all of its sub directories (use with care) -->
        <!-- <directory recursive="true">C:\MyFonts2</directory> -->
        <!-- automatically detect operating system installed fonts -->
        <auto-detect/>
      </fonts>
    </renderer>
    <renderer mime="application/postscript">
      <fonts>
        <directory>/usr/share/fonts/X11/Type1/</directory>
        <directory>/usr/share/ghostscript/fonts/</directory>
        <directory>/usr/share/fonts/TrueType/core/</directory>
        <directory>/usr/share/fonts/TrueType/dejavu/</directory>
        <directory>/usr/share/fonts/TrueType/liberation/</directory>
        <directory>/usr/share/fonts/TrueType/unifont/</directory>
      </fonts>
    </renderer>
  </renderers>
</fop>
(Font paths will be different on older versions of Solaris.)
For more details, see:
http://xmlgraphics.apache.org/fop/trunk/fonts.html
http://www.sagehill.net/docbookxsl/AddFont.html#ConfigFontFop
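Note that FOP has to be told to read this file explicitly; a sketch for the command-line tool (file names are illustrative; when embedding FOP in Java, FOP 1.x has an equivalent FopFactory.setUserConfig(...) call - check the embedding docs for your version):

```shell
# Run FOP with the font configuration applied via -c.
fop -c fop-conf.xml -fo input.fo -pdf output.pdf
```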