Solr - Suggester not returning any suggestions

I'm trying to set up the Solr suggester module.
I've followed the guide and set up my core as follows:
solrconfig.xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">city</str> <!-- the indexed field to derive suggestions from -->
<float name="threshold">0</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
schema.xml
<types>
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<!-- my other fields -->
<field name="city" type="textSpell" indexed="true" stored="true"/>
</fields>
and then rebuild the spellcheck with:
http://localhost:4569/solr/myCore/suggest?q=a&spellcheck=true&spellcheck.build=true
and then do a search:
http://localhost:4569/solr/myCore/suggest?q=aberdean&spellcheck=true&spellcheck=on
but I always get an empty suggestions list in the response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions"/>
</lst>
</response>
I've checked the suggestions in this question
Any idea why I'm not getting results?

Can you commit and then try your query again? My guess is that the problem is related to this setting:
<str name="buildOnCommit">true</str>
Can you also try again after changing it to false?
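To see whether a commit or an explicit rebuild changes anything, it can help to script the two requests and inspect the response. The following is a minimal Python sketch, not part of the original setup: it only builds the request URLs and parses the XML response format shown in the question. The helper names `suggest_url` and `suggestion_names` are made up for illustration, and the base URL is the one from the question.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL taken from the question; adjust host, port and core name as needed.
BASE = "http://localhost:4569/solr/myCore/suggest"


def suggest_url(q, build=False, base=BASE):
    """Build a /suggest request URL; build=True adds spellcheck.build=true."""
    params = {"q": q, "spellcheck": "true"}
    if build:
        params["spellcheck.build"] = "true"
    return base + "?" + urlencode(params)


def suggestion_names(xml_text):
    """Return the terms that received suggestions in a spellcheck XML response."""
    root = ET.fromstring(xml_text)
    for lst in root.iter("lst"):
        if lst.get("name") == "suggestions":
            # Each term with suggestions appears as a nested <lst name="term">.
            return [child.get("name") for child in lst.findall("lst")]
    return []
```

Fetching `suggest_url("a", build=True)` once and then `suggest_url("aberdean")` (for example with `urllib.request.urlopen`), and passing the response body to `suggestion_names`, shows at a glance whether the list is still empty after each configuration change.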

Related

Solr suggester no results

I want to use the Solr suggester component for city names. I have the following settings:
schema.xml
Field definition
<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The field I want to apply the suggester to
<field name="city" type="string" indexed="true" stored="false"/>
The copy field
<copyField source="city" dest="citySuggest"/>
The field
<field name="citySuggest" type="textSuggest" stored="false" indexed="true" />
solrconfig.xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">citySuggest</str>
<str name="suggestAnalyzerFieldType">string</str>
</lst>
</searchComponent>
Then I run
http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath&suggest.build=true
to build the suggest component.
Finally I run
http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath
but I get an empty result set:
{"responseHeader":{"status":0,"QTime":0},"suggest":{"mySuggester":{"Ath":{"numFound":0,"suggestions":[]}}}}
Are there any obvious mistakes? Any thoughts?

Try the following:
In the field with name="citySuggest" set the attribute stored="true".
Then rebuild the suggester.
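To confirm that the rebuild picked up the change, the two requests from the question can be scripted and the `numFound` value read from the JSON response. A minimal Python sketch under those assumptions: `suggester_url` and `num_found` are hypothetical helper names, and the response layout is the one shown in the question.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the question; adjust host, port and core name as needed.
BASE = "http://localhost:8983/solr/company/suggest"


def suggester_url(dictionary, q, build=False, base=BASE):
    """Build a SuggestComponent request; build=True adds suggest.build=true."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "wt": "json",
        "suggest.q": q,
    }
    if build:
        params["suggest.build"] = "true"
    return base + "?" + urlencode(params)


def num_found(response_text, dictionary, q):
    """Read numFound for the given dictionary and query from a JSON response."""
    data = json.loads(response_text)
    return data["suggest"][dictionary][q]["numFound"]
```

After changing the field to stored="true" and reindexing, request `suggester_url("mySuggester", "Ath", build=True)` once, then the same URL without `build`; a `num_found` value above zero indicates the suggester now has entries.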

Extract the content excerpt from Apache Solr

I have used Solr to index and search my MySQL table. Now I want to get an excerpt from the matched results and also highlight it.
Where do I have to make changes?
Is it here? If yes, what should I change?
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- VelocityResponseWriter settings -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<str name="title">Solritas</str>
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="mlt.qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
<int name="mlt.count">3</int>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="facet">on</str>
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>
<str name="facet.mincount">1</str>
<str name="facet.pivot">cat,inStock</str>
<str name="facet.range">price</str>
<int name="f.price.facet.range.start">0</int>
<int name="f.price.facet.range.end">600</int>
<int name="f.price.facet.range.gap">50</int>
<str name="f.price.facet.range.other">after</str>
<str name="facet.range">manufacturedate_dt</str>
<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
<str name="f.manufacturedate_dt.facet.range.other">before</str>
<str name="f.manufacturedate_dt.facet.range.other">after</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
<!--
<str name="url-scheme">httpx</str>
-->
</requestHandler>

To enable highlighting and specify the fields it applies to, you need to change the following configuration:
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
See HighlightingParameters, which describes each of the parameters you can configure for highlighting.
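The same highlighting parameters can also be passed per request instead of being set as handler defaults, which makes it easy to experiment. A minimal Python sketch: `hl`, `hl.fl`, `hl.snippets` and `hl.fragsize` are standard Solr highlighting parameters, while the base URL, the helper names and the sample data in the usage note are assumptions for illustration. Note that a field generally has to be stored for Solr to return highlighted excerpts from it.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace host, port and core with your own.
BASE = "http://localhost:8983/solr/collection1/select"


def highlight_url(q, fields, snippets=1, fragsize=100, base=BASE):
    """Build a search URL that asks Solr for highlighted excerpts."""
    params = {
        "q": q,
        "wt": "json",
        "hl": "true",
        "hl.fl": ",".join(fields),   # fields to highlight
        "hl.snippets": snippets,     # max fragments per field
        "hl.fragsize": fragsize,     # approximate fragment size in characters
    }
    return base + "?" + urlencode(params)


def excerpts(response):
    """Map document id -> list of highlighted fragments from a parsed JSON response."""
    result = {}
    for doc_id, per_field in response.get("highlighting", {}).items():
        fragments = []
        for field_fragments in per_field.values():
            fragments.extend(field_fragments)
        result[doc_id] = fragments
    return result
```

The `highlighting` section of the response is keyed by the document's unique key, so `excerpts` can be joined back to the entries in `response.docs` by id.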

Solr 4.0 UI issue

I just discovered today that there is a new Solr release (4.0 ALPHA), so I gave it a try. After setting it up (under Tomcat), I got the following error message:
This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml:
<!-- Admin Handlers - This will register all the standard admin RequestHandlers. -->
<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
The above error popped up when I added the following to solrconfig.xml:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Does anybody know what is wrong?
Thank you in advance,
Tom
Greece
The solr folder structure:
+solr
+conf
+data
+lib
-contrib
-dist
The solr.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
<cores adminPath="/admin/cores" defaultCoreName="collection1">
<core name="collection1" instanceDir="." />
</cores>
</solr>
The solrconfig.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<luceneMatchVersion>LUCENE_40</luceneMatchVersion>
<lib dir="lib/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="lib/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<!--<lib dir="lib/contrib/dataimporthandler/lib/" regex=".*\.jar" />-->
<lib dir="lib/dist/" regex="apache-solr-langid-\d.*\.jar" />
<lib dir="lib/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-velocity-\d.*\.jar" />
<lib dir="lib/contrib/velocity/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex="tika-core-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex="tika-parsers-\d.*\.jar" />
<lib dir="/total/crap/dir/ignored" />
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<indexConfig>
</indexConfig>
<jmx />
<updateHandler class="solr.DirectUpdateHandler2">
<autoCommit>
<maxTime>15000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<updateLog>
<str name="dir">${solr.data.dir:}</str>
</updateLog>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">static firstSearcher warming in solrconfig.xml</str>
</lst>
</arr>
</listener>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="false" >
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
<httpCaching never304="true" />
</requestDispatcher>
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
</requestHandler>
<requestHandler name="/query" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="df">text</str>
</lst>
</requestHandler>
<requestHandler name="/get" class="solr.RealTimeGetHandler">
<lst name="defaults">
<str name="omitHeader">true</str>
</lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- VelocityResponseWriter settings -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<str name="title">Solritas</str>
<!-- Query settings -->
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="mlt.qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
<int name="mlt.count">3</int>
<!-- Faceting defaults -->
<str name="facet">on</str>
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>
<str name="facet.mincount">1</str>
<str name="facet.pivot">cat,inStock</str>
<str name="facet.range.other">after</str>
<str name="facet.range">price</str>
<int name="f.price.facet.range.start">0</int>
<int name="f.price.facet.range.end">600</int>
<int name="f.price.facet.range.gap">50</int>
<str name="facet.range">popularity</str>
<int name="f.popularity.facet.range.start">0</int>
<int name="f.popularity.facet.range.end">10</int>
<int name="f.popularity.facet.range.gap">3</int>
<str name="facet.range">manufacturedate_dt</str>
<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
<str name="f.manufacturedate_dt.facet.range.other">before</str>
<str name="f.manufacturedate_dt.facet.range.other">after</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
<!-- Spell checking defaults -->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>
</lst>
<!-- append spellchecking to our list of components -->
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
</requestHandler>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>
<requestHandler name="/analysis/field"
startup="lazy"
class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/analysis/document"
class="solr.DocumentAnalysisRequestHandler"
startup="lazy" />
<requestHandler name="/admin/"
class="solr.admin.AdminHandlers" />
<!-- ping/healthcheck -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" >
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name</str>
<str name="classname">solr.DirectSolrSpellChecker</str>
<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
<str name="distanceMeasure">internal</str>
<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
<float name="accuracy">0.5</float>
<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
<int name="maxEdits">2</int>
<!-- the minimum shared prefix when enumerating terms -->
<int name="minPrefix">1</int>
<!-- maximum number of inspections per result. -->
<int name="maxInspections">5</int>
<!-- minimum length of a query term to be considered for correction -->
<int name="minQueryLength">4</int>
<!-- maximum threshold of documents a query term can appear to be considered for correction -->
<float name="maxQueryFrequency">0.01</float>
<!-- uncomment this to require suggestions to occur in 1% of the documents
<float name="thresholdTokenFrequency">.01</float>
-->
</lst>
<!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">name</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">5</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">10</str>
<str name="spellcheck.maxCollations">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
<searchComponent name="clustering"
enable="${solr.clustering.enabled:false}"
class="solr.clustering.ClusteringComponent" >
<!-- Declare an engine -->
<lst name="engine">
<!-- The name, only one can be named "default" -->
<str name="name">default</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
<str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
</lst>
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
</searchComponent>
<requestHandler name="/clustering"
startup="lazy"
enable="${solr.clustering.enabled:false}"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- The title field -->
<str name="carrot.title">name</str>
<str name="carrot.url">id</str>
<!-- The field to cluster on -->
<str name="carrot.snippet">features</str>
<!-- produce summaries -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">false</bool>
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!-- A request handler for demonstrating the elevator component -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<fragmenter name="regex"
class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"&apos;]{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html"
default="true"
class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<!-- Configure the standard encoder -->
<encoder name="html"
class="solr.highlight.HtmlEncoder" />
<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple"
class="solr.highlight.SimpleFragListBuilder"/>
<!-- Configure the single fragListBuilder -->
<fragListBuilder name="single"
class="solr.highlight.SingleFragListBuilder"/>
<!-- Configure the weighted fragListBuilder -->
<fragListBuilder name="weighted"
default="true"
class="solr.highlight.WeightedFragListBuilder"/>
<!-- default tag FragmentsBuilder -->
<fragmentsBuilder name="default"
default="true"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<!--
<lst name="defaults">
<str name="hl.multiValuedSeparatorChar">/</str>
</lst>
-->
</fragmentsBuilder>
<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default"
default="true"
class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!?
</str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator"
class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
<str name="hl.bs.type">WORD</str>
<!-- language and country are used when constructing Locale object. -->
<!-- And the Locale object will be used when getting instance of BreakIterator -->
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>
<queryResponseWriter name="json" class="solr.JSONResponseWriter">
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>

I checked the log file and I guess the error lies here: Can't find resource 'solrconfig.xml' in classpath or 'solr.\conf/'. It is not able to locate the solrconfig.xml file.
Apologies, I don't know how to rectify it. Maybe a pro can. Cheers :)

This is most likely caused by some other error in your config. To find out what is going on, go to the Tomcat /logs/ folder and look for a file named catalina.*.log.
Reading through the exception message, you should be able to figure out which part of the configuration fails; if not, paste it here.
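Tomcat logs can be long, so a small filter helps to surface the relevant lines. A minimal Python sketch, assuming the log uses the default java.util.logging format, where errors are marked `SEVERE` and stack traces mention `Exception`; the function names and the markers are illustrative choices, not anything Tomcat or Solr prescribes.

```python
import glob
import os


def suspicious_lines(lines, markers=("SEVERE", "Exception")):
    """Return (line_number, text) for every line containing one of the markers."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_catalina_logs(log_dir):
    """Scan every catalina.*.log file in log_dir and collect suspicious lines per file."""
    report = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "catalina.*.log"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            report[path] = suspicious_lines(handle)
    return report
```

Calling `scan_catalina_logs` with the path to your Tomcat logs directory returns a dictionary keyed by log file, so the first `SEVERE` entry after startup is easy to locate and paste into a question.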

I was having a similar problem when trying to deploy as a webapp under Tomcat. I found this tutorial http://adeithzya.wordpress.com/2011/08/25/using-apache-solr-with-spring-framework/ and was able to get the admin UI to work with one minor change: I was using Mac OS paths instead of the Windows file structure.
"I will be using Apache Tomcat as the web server for SOLR, hence browse to Tomcat configuration directory, in my case it’s “C:\apache-tomcat-6.0.32\conf\Catalina\localhost” add new .xml file “my-solr-app.xml” (it can be any name). Here are the content of the .xml file:
Start Apache Tomcat and open a browser to the “http://localhost:8080/my-solr-app/”, you may need to change the port number configured for Tomcat web server. If you see the welcome screen, then you’ve just finished Apache SOLR setup."
I had a small hiccup because I did not copy the correct folder; make sure you copy the entire "example" directory into your new directory.

The Solr 4.0 ALPHA distribution contains example folders. Go to the example/solr/conf folder and copy solrconfig.xml to whatever core you have created. If you are using the data import handler, also add the /dataimport request handler that references data-config.xml to solrconfig.xml.
This solved the problem for me; I have already tried it successfully.

I had this same error. To fix it, first make sure Tomcat knows where your Solr home is.
You can either set the path to the Solr home in the webapps/solr/web.xml file:
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/Path/To/My/solr/Home/solr/</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
Or you can set it through JAVA_OPTS:
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"
I also upgraded to Tomcat 7.0.8 because there is a URL bug in earlier versions of Tomcat 7 that prevents Solr from working.
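Before restarting Tomcat, it can be worth confirming that the directory passed as Solr home really contains the files Solr looks for, since a wrong path produces the "Can't find resource 'solrconfig.xml'" error mentioned earlier. A minimal Python sketch, assuming the legacy layout from the question (solr.xml in the home directory, and conf/solrconfig.xml plus conf/schema.xml under the core's instanceDir); `missing_solr_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_solr_files(solr_home, instance_dir="."):
    """Return the expected config files that do not exist under solr_home."""
    home = Path(solr_home)
    expected = [
        home / "solr.xml",
        home / instance_dir / "conf" / "solrconfig.xml",
        home / instance_dir / "conf" / "schema.xml",
    ]
    return [str(path) for path in expected if not path.is_file()]
```

An empty list from `missing_solr_files("/my/custom/solr/home/dir/")` means the basic files are in place; any path it returns is a good first thing to check against the value configured in web.xml or JAVA_OPTS.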

Indexing binary files from database issue (no errors)

I am trying to index binary files stored in a database (MySQL) and I have had no success. I have Solr configured as below:
Solr file structure
+solr
+bookledger(core0)
-conf
+lib(all necessary libraries)
+contrib
+dist
+data
+bookledger
-index
-spellchecker
+ktimatologio
-index
-spellchecker
+ktimatologio(core1)
-conf
+lib(all necessary libraries)
+contrib
+dist
As you can see, this is a multicore Solr setup. On bookledger (core0) I have successfully indexed binary files (stored in a database). In the second core, when I run a full-import I see no errors! But when I query the binary content, the output looks like: [B#660b1b14. What am I missing here?
Thank you in advance,
Tom
The solr.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
<cores adminPath="/admin/cores">
<core name="ktimatologio" instanceDir="ktimatologio" dataDir="../data/ktimatologio"/>
<core name="bookledger" instanceDir="bookledger" dataDir="../data/bookledger"/>
</cores>
</solr>
The solrconfig.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
<lib dir="lib/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-langid-\d.*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-velocity-\d.*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/velocity/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex="tika-core-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex="tika-parsers-\d.*\.jar" />
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<indexConfig>
</indexConfig>
<jmx />
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">static firstSearcher warming in solrconfig.xml</str>
</lst>
</arr>
</listener>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher>
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
</requestDispatcher>
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">100</int>
</lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- VelocityResponseWriter settings -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<str name="title">Solritas</str>
<str name="df">text</str>
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="mlt.qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
<int name="mlt.count">3</int>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="facet">on</str>
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>
<str name="facet.mincount">1</str>
<str name="facet.pivot">cat,inStock</str>
<str name="facet.range.other">after</str>
<str name="facet.range">price</str>
<int name="f.price.facet.range.start">0</int>
<int name="f.price.facet.range.end">600</int>
<int name="f.price.facet.range.gap">50</int>
<str name="facet.range">popularity</str>
<int name="f.popularity.facet.range.start">0</int>
<int name="f.popularity.facet.range.end">10</int>
<int name="f.popularity.facet.range.gap">3</int>
<str name="facet.range">manufacturedate_dt</str>
<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
<str name="f.manufacturedate_dt.facet.range.other">before</str>
<str name="f.manufacturedate_dt.facet.range.other">after</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<requestHandler name="/update"
class="solr.XmlUpdateRequestHandler">
</requestHandler>
<requestHandler name="/update/javabin"
class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv"
class="solr.CSVRequestHandler"
startup="lazy" />
<requestHandler name="/update/json"
class="solr.JsonUpdateRequestHandler"
startup="lazy" />
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>
<requestHandler name="/update/xslt"
startup="lazy"
class="solr.XsltUpdateRequestHandler"/>
<requestHandler name="/analysis/field"
startup="lazy"
class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/analysis/document"
class="solr.DocumentAnalysisRequestHandler"
startup="lazy" />
<!-- Admin Handlers
Admin Handlers - This will register all the standard admin
RequestHandlers.
-->
<requestHandler name="/admin/"
class="solr.admin.AdminHandlers" />
<!-- ping/healthcheck -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" >
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
<searchComponent name="clustering"
enable="${solr.clustering.enabled:false}"
class="solr.clustering.ClusteringComponent" >
<!-- Declare an engine -->
<lst name="engine">
<!-- The name, only one can be named "default" -->
<str name="name">default</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
<str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
</lst>
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
</searchComponent>
<requestHandler name="/clustering"
startup="lazy"
enable="${solr.clustering.enabled:false}"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- The title field -->
<str name="carrot.title">name</str>
<str name="carrot.url">id</str>
<!-- The field to cluster on -->
<str name="carrot.snippet">features</str>
<!-- produce summaries -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">false</bool>
<str name="df">text</str>
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!-- A request handler for demonstrating the elevator component -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<!-- Highlighting Component
http://wiki.apache.org/solr/HighlightingParameters
-->
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!-- A regular-expression-based fragmenter
(for sentence extraction)
-->
<fragmenter name="regex"
class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"&apos;]{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html"
default="true"
class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<!-- Configure the standard encoder -->
<encoder name="html"
class="solr.highlight.HtmlEncoder" />
<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple"
default="true"
class="solr.highlight.SimpleFragListBuilder"/>
<!-- Configure the single fragListBuilder -->
<fragListBuilder name="single"
class="solr.highlight.SingleFragListBuilder"/>
<!-- default tag FragmentsBuilder -->
<fragmentsBuilder name="default"
default="true"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<!--
<lst name="defaults">
<str name="hl.multiValuedSeparatorChar">/</str>
</lst>
-->
</fragmentsBuilder>
<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawgreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default"
default="true"
class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!?
</str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator"
class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.type">WORD</str>
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>
<queryResponseWriter name="json" class="solr.JSONResponseWriter">
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
The schema.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="ktimatologio" version="1.5">
<types>
<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
<fieldtype name="binary" class="solr.BinaryField"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<!-- A Trie based date field for faster date range queries and date faceting. -->
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="pint" class="solr.IntField"/>
<fieldType name="plong" class="solr.LongField"/>
<fieldType name="pfloat" class="solr.FloatField"/>
<fieldType name="pdouble" class="solr.DoubleField"/>
<fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<!-- Greek -->
<fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- greek specific lowercase for sigma -->
<filter class="solr.GreekLowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" enablePositionIncrements="true"/>
<filter class="solr.GreekStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="text_ktimatologio" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
<filter class="solr.GreekStemFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
<filter class="solr.GreekLowerCaseFilterFactory"/>
<filter class="solr.GreekStemFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
<fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<fieldtype name="geohash" class="solr.GeoHashField"/>
<fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="solr_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text_ktimatologio" indexed="true" stored="true"/>
<field name="model" type="text_ktimatologio" indexed="true" stored="true" multiValued="false"/>
<field name="type" type="text_ktimatologio" indexed="true" stored="true"/>
<field name="url" type="text_ktimatologio" indexed="true" stored="true"/>
<field name="content" type="text_ktimatologio" indexed="true" stored="true" multiValued="true"/>
<field name="last_modified" type="string" indexed="true" stored="true"/>
</fields>
<uniqueKey>solr_id</uniqueKey>
<defaultSearchField>content</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="title" dest="content" />
</schema>
The data-config.xml file:
<dataConfig>
<dataSource type="JdbcDataSource"
autoCommit="true" batchSize="-1"
convertType="false"
driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://127.0.0.1:3306/ktimatologio"
user="root"
password="1a2b3c4d"/>
<dataSource name="fieldReader" type="FieldStreamDataSource" />
<document>
<entity name="aitiologikes_ektheseis"
dataSource="db"
transformer="HTMLStripTransformer"
query="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, body AS content from aitiologikes_ektheseis where type = 'text'"
deltaImportQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, body AS content from aitiologikes_ektheseis where type = 'text' and id='${dataimporter.delta.id}'"
deltaQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, body AS content from aitiologikes_ektheseis where type = 'text' and last_modified > '${dataimporter.last_index_time}'">
<field column="id" name="id" />
<field column="solr_id" name="solr_id" />
<field column="title" name="title" stripHTML="true" />
<field column="model" name="model" stripHTML="true" />
<field column="type" name="type" stripHTML="true" />
<field column="url" name="url" stripHTML="true" />
<field column="last_modified" name="last_modified" stripHTML="true" />
<field column="content" name="content" stripHTML="true" />
</entity>
<entity name="aitiologikes_ektheseis_bin"
query="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin'"
deltaImportQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin' and id='${dataimporter.delta.id}'"
deltaQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin' and last_modified > '${dataimporter.last_index_time}'"
transformer="TemplateTransformer"
dataSource="db">
<entity dataSource="fieldReader" processor="TikaEntityProcessor" dataField="aitiologikes_ektheseis_bin.content" format="text">
<field column="id" name="id" />
<field column="solr_id" name="solr_id" />
<field column="title" name="title" stripHTML="true" />
<field column="model" name="model" stripHTML="true" />
<field column="type" name="type" stripHTML="true" />
<field column="url" name="url" stripHTML="true" />
<field column="last_modified" name="last_modified" stripHTML="true" />
<field column="content" name="content" stripHTML="true" />
</entity>
</entity>
</document>
</dataConfig>

I finally found the solution. Note the entity queries and the column definitions in the original data-config.xml:
....
<entity name="aitiologikes_ektheseis_bin"
query="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin'"
deltaImportQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin' and id='${dataimporter.delta.id}'"
deltaQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS content from aitiologikes_ektheseis where type = 'bin' and last_modified > '${dataimporter.last_index_time}'"
transformer="TemplateTransformer"
dataSource="db">
<entity dataSource="fieldReader" processor="TikaEntityProcessor" dataField="aitiologikes_ektheseis_bin.content" format="text">
<field column="id" name="id" />
<field column="solr_id" name="solr_id" />
<field column="title" name="title" stripHTML="true" />
<field column="model" name="model" stripHTML="true" />
<field column="type" name="type" stripHTML="true" />
<field column="url" name="url" stripHTML="true" />
<field column="last_modified" name="last_modified" stripHTML="true" />
<field column="content" name="content" stripHTML="true" />
</entity>
</entity>
</document>
</dataConfig>
For Tika to "see" the content and extract it, I had to change the column alias from "content" to "text".
One more thing. The correct syntax is:
<entity name="aitiologikes_ektheseis_bin"
query="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS text from aitiologikes_ektheseis where type = 'bin'"
deltaImportQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS text from aitiologikes_ektheseis where type = 'bin' and id='${dataimporter.delta.id}'"
deltaQuery="select id, title, model, type, url, last_modified, CONCAT_WS('_',id,model) AS solr_id, bin_con AS text from aitiologikes_ektheseis where type = 'bin' and last_modified > '${dataimporter.last_index_time}'"
transformer="TemplateTransformer"
dataSource="db">
<field column="id" name="id" />
<field column="solr_id" name="solr_id" />
<field column="title" name="title" />
<field column="model" name="model" />
<field column="type" name="type" />
<field column="url" name="url" />
<field column="last_modified" name="last_modified" />
<entity dataSource="fieldReader" processor="TikaEntityProcessor" dataField="aitiologikes_ektheseis_bin.text" format="text">
<field column="text" name="content" />
</entity>
</entity>
</document>
</dataConfig>
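Stripped of the long queries, the pattern above can be sketched roughly as follows. This is only an illustration under a few assumptions: the entity name docs_bin, the name="db" attribute on the JDBC data source, and the placeholder credentials are hypothetical and not taken from my real setup. The outer entity aliases the binary column to "text", the nested TikaEntityProcessor reads it through dataField, and the extracted body (which Tika emits in a column called "text") is mapped to the Solr field "content".

```xml
<dataConfig>
  <!-- name="db" is an assumption here, so the outer entity has a data source to reference -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/ktimatologio"
              user="solr_user" password="changeme"/>
  <!-- Hands a column of the parent row to the nested entity as a stream -->
  <dataSource name="fieldReader" type="FieldStreamDataSource"/>
  <document>
    <!-- Outer entity: one row per document, binary column aliased to "text" -->
    <entity name="docs_bin" dataSource="db"
            query="select id, CONCAT_WS('_',id,model) AS solr_id, bin_con AS text from aitiologikes_ektheseis where type = 'bin'">
      <field column="id" name="id"/>
      <field column="solr_id" name="solr_id"/>
      <!-- Inner entity: Tika parses docs_bin.text and returns the extracted body as column "text" -->
      <entity dataSource="fieldReader" processor="TikaEntityProcessor"
              dataField="docs_bin.text" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The plain database columns are mapped on the outer entity, and only the Tika output is mapped inside the nested one.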
I hope this helps someone.
Be well,
Tom

Solr PDF indexing issues

I am trying to index PDFs with Solr, without success. Is the problem the baseDir and/or url attribute in data-config.xml? How do I set these attributes correctly? This is what I get when indexing a PDF:
From Solr:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst><lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst><str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:4.231</str>
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">1</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-05-11 18:43:30</str>
<str name="">Indexing failed. Rolled back all changes.</str>
<str name="Rolledback">2012-05-11 18:43:30</str></lst><str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
The log file:
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*} 0 4
11 May 2012 6:55:28 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:915)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:635)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.lang.ClassNotFoundException: Unable to load TikaEntityProcessor or org.apache.solr.handler.dataimport.TikaEntityProcessor
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:1110)
at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:912)
... 8 more
Caused by: org.apache.solr.common.SolrException: Error loading class 'TikaEntityProcessor'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:394)
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:1100)
... 9 more
Caused by: java.lang.ClassNotFoundException: TikaEntityProcessor
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:378)
... 10 more
The data-config.xml:
<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
<dataSource type="BinFileDataSource" name="binary" />
<document>
<entity name="f" dataSource="binary" rootEntity="false" processor="FileListEntityProcessor" baseDir="/solr/solr/docu/" fileName=".*pdf" recursive="true">
<entity name="tika" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text">
<field column="id" name="id" meta="true" />
<field column="fake_id" name="fake_id" />
<field column="model" name="model" meta="true" />
<field column="text" name="biog" />
</entity>
</entity>
</document>
</dataConfig>
The solrconfig.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
<lib dir="lib/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="lib/contrib/extraction/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="lib/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<lib dir="lib/contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-langid-\d.*\.jar" />
<lib dir="lib/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="lib/dist/" regex="apache-solr-velocity-\d.*\.jar" />
<lib dir="lib/contrib/velocity/lib/" regex=".*\.jar" />
<lib dir="lib/contrib/extraction/lib/" />
<dataDir>${solr.data.dir:}</dataDir>
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<indexConfig>
</indexConfig>
<jmx />
<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">
</updateHandler>
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Query section - these settings control query time things like caches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.FastLRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<documentCache class="solr.LRUCache"
size="512"
initialSize="512"
autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<listener event="newSearcher" class="solr.QuerySenderListener">
<arr name="queries">
</arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
<arr name="queries">
<lst>
<str name="q">static firstSearcher warming in solrconfig.xml</str>
</lst>
</arr>
</listener>
<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>2</maxWarmingSearchers>
</query>
<requestDispatcher>
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048000" />
<httpCaching never304="true" />
</requestDispatcher>
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">100</int>
<str name="df">biog</str>
</lst>
</requestHandler>
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- VelocityResponseWriter settings -->
<str name="wt">velocity</str>
<str name="v.template">browse</str>
<str name="v.layout">layout</str>
<str name="title">Solritas</str>
<str name="df">text</str>
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="mlt.qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
<int name="mlt.count">3</int>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="facet">on</str>
<str name="facet.field">cat</str>
<str name="facet.field">manu_exact</str>
<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>
<str name="facet.mincount">1</str>
<str name="facet.pivot">cat,inStock</str>
<str name="facet.range.other">after</str>
<str name="facet.range">price</str>
<int name="f.price.facet.range.start">0</int>
<int name="f.price.facet.range.end">600</int>
<int name="f.price.facet.range.gap">50</int>
<str name="facet.range">popularity</str>
<int name="f.popularity.facet.range.start">0</int>
<int name="f.popularity.facet.range.end">10</int>
<int name="f.popularity.facet.range.gap">3</int>
<str name="facet.range">manufacturedate_dt</str>
<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
<str name="f.manufacturedate_dt.facet.range.other">before</str>
<str name="f.manufacturedate_dt.facet.range.other">after</str>
<!-- Highlighting defaults -->
<str name="hl">on</str>
<str name="hl.fl">text features name</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
<!--
<str name="url-scheme">httpx</str>
-->
</requestHandler>
<requestHandler name="/update"
class="solr.XmlUpdateRequestHandler">
</requestHandler>
<requestHandler name="/update/javabin"
class="solr.BinaryUpdateRequestHandler" />
<requestHandler name="/update/csv"
class="solr.CSVRequestHandler"
startup="lazy" />
<requestHandler name="/update/json"
class="solr.JsonUpdateRequestHandler"
startup="lazy" />
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<!-- All the main content goes into "text"... if you need to return
the extracted text or do highlighting, use a stored field. -->
<str name="fmap.content">text</str>
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<!-- capture link hrefs but ignore div attributes -->
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
</lst>
</requestHandler>
<requestHandler name="/update/xslt"
startup="lazy"
class="solr.XsltUpdateRequestHandler"/>
<requestHandler name="/analysis/field"
startup="lazy"
class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/analysis/document"
class="solr.DocumentAnalysisRequestHandler"
startup="lazy" />
<requestHandler name="/admin/"
class="solr.admin.AdminHandlers" />
<!-- ping/healthcheck -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
<lst name="invariants">
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<!-- Echo the request contents back to the client -->
<requestHandler name="/debug/dump" class="solr.DumpRequestHandler" >
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="df">text</str>
<bool name="tv">true</bool>
</lst>
<arr name="last-components">
<str>tvComponent</str>
</arr>
</requestHandler>
<searchComponent name="clustering"
enable="${solr.clustering.enabled:false}"
class="solr.clustering.ClusteringComponent" >
<!-- Declare an engine -->
<lst name="engine">
<!-- The name, only one can be named "default" -->
<str name="name">default</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
<str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
</lst>
<lst name="engine">
<str name="name">stc</str>
<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
</lst>
</searchComponent>
<requestHandler name="/clustering"
startup="lazy"
enable="${solr.clustering.enabled:false}"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">default</str>
<bool name="clustering.results">true</bool>
<!-- The title field -->
<str name="carrot.title">name</str>
<str name="carrot.url">id</str>
<!-- The field to cluster on -->
<str name="carrot.snippet">features</str>
<!-- produce summaries -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<!--<int name="carrot.numDescriptions">5</int>-->
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">false</bool>
<str name="df">text</str>
<str name="defType">edismax</str>
<str name="qf">
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
<!-- A request handler for demonstrating the elevator component -->
<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="df">text</str>
</lst>
<arr name="last-components">
<str>elevator</str>
</arr>
</requestHandler>
<!-- Highlighting Component
http://wiki.apache.org/solr/HighlightingParameters
-->
<searchComponent class="solr.HighlightComponent" name="highlight">
<highlighting>
<!-- Configure the standard fragmenter -->
<!-- This could most likely be commented out in the "default" case -->
<fragmenter name="gap"
default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">100</int>
</lst>
</fragmenter>
<!-- A regular-expression-based fragmenter
(for sentence extraction)
-->
<fragmenter name="regex"
class="solr.highlight.RegexFragmenter">
<lst name="defaults">
<!-- slightly smaller fragsizes work better because of slop -->
<int name="hl.fragsize">70</int>
<!-- allow 50% slop on fragment sizes -->
<float name="hl.regex.slop">0.5</float>
<!-- a basic sentence pattern -->
<str name="hl.regex.pattern">[-\w ,/\n\"&apos;]{20,200}</str>
</lst>
</fragmenter>
<!-- Configure the standard formatter -->
<formatter name="html"
default="true"
class="solr.highlight.HtmlFormatter">
<lst name="defaults">
<str name="hl.simple.pre"><![CDATA[<em>]]></str>
<str name="hl.simple.post"><![CDATA[</em>]]></str>
</lst>
</formatter>
<!-- Configure the standard encoder -->
<encoder name="html"
class="solr.highlight.HtmlEncoder" />
<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple"
default="true"
class="solr.highlight.SimpleFragListBuilder"/>
<!-- Configure the single fragListBuilder -->
<fragListBuilder name="single"
class="solr.highlight.SingleFragListBuilder"/>
<!-- default tag FragmentsBuilder -->
<fragmentsBuilder name="default"
default="true"
class="solr.highlight.ScoreOrderFragmentsBuilder">
</fragmentsBuilder>
<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
class="solr.highlight.ScoreOrderFragmentsBuilder">
<lst name="defaults">
<str name="hl.tag.pre"><![CDATA[
<b style="background:yellow">,<b style="background:lawngreen">,
<b style="background:aquamarine">,<b style="background:magenta">,
<b style="background:palegreen">,<b style="background:coral">,
<b style="background:wheat">,<b style="background:khaki">,
<b style="background:lime">,<b style="background:deepskyblue">]]></str>
<str name="hl.tag.post"><![CDATA[</b>]]></str>
</lst>
</fragmentsBuilder>
<boundaryScanner name="default"
default="true"
class="solr.highlight.SimpleBoundaryScanner">
<lst name="defaults">
<str name="hl.bs.maxScan">10</str>
<str name="hl.bs.chars">.,!?
</str>
</lst>
</boundaryScanner>
<boundaryScanner name="breakIterator"
class="solr.highlight.BreakIteratorBoundaryScanner">
<lst name="defaults">
<!-- type should be one of:
* CHARACTER
* WORD (default)
* LINE
* SENTENCE
-->
<str name="hl.bs.type">WORD</str>
<!-- language and country are used when constructing Locale
object which will be used when getting instance of
BreakIterator
-->
<str name="hl.bs.language">en</str>
<str name="hl.bs.country">US</str>
</lst>
</boundaryScanner>
</highlighting>
</searchComponent>
<queryResponseWriter name="json" class="solr.JSONResponseWriter">
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>
<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
<int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>
<!-- Legacy config for the admin interface -->
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>

For Tika you need the apache-solr-dataimporthandler-extras-3.6.0 jar, which is in the dist directory.

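I have indexed PDF and DOC files using the SolrJ library. The following code snippet works:
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

// "file" (java.io.File) and "solrId" (String) are supplied by the calling code.
String urlString = "http://localhost:8983/solr";
try {
    SolrServer solr = new CommonsHttpSolrServer(urlString);
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(file);
    up.setParam("literal.id", solrId);            // unique id for this document
    up.setParam("uprefix", "attr_");              // prefix for fields not defined in the schema
    up.setParam("fmap.content", "attr_content");  // map the extracted body to attr_content
    // Commit immediately, with waitFlush=true and waitSearcher=true.
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    solr.request(up);
} catch (IOException e) {
    // Also covers MalformedURLException from the CommonsHttpSolrServer constructor.
    e.printStackTrace();
} catch (SolrServerException e) {
    e.printStackTrace();
}
Once indexed, you can query the "attr_content" field, which holds the content of the PDF files.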
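As a rough sketch of that last step, the query can be issued as a plain HTTP request against the standard /select handler. The class name, the helper names, and the search term "invoice" below are made up for illustration and are not part of the original answer; the base URL is the one used in the snippet above. The term is only URL-encoded, so Lucene query syntax characters in it would still need escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AttrContentQuery {
    // Hypothetical helper: URL-encodes a parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a /select URL that searches the attr_content field for one term.
    // Assumes the default /select handler is enabled on the target Solr instance.
    static String selectUrl(String baseUrl, String term) {
        return baseUrl + "/select?q=" + enc("attr_content:" + term);
    }

    public static void main(String[] args) {
        // "invoice" is an arbitrary example term.
        System.out.println(selectUrl("http://localhost:8983/solr", "invoice"));
    }
}
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the documents whose extracted text matches the term.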


Solr - Suggester not returning any suggestions - apache

Can you commit and then try your query again? I guess it's because of <str name="buildOnCommit">true</str>. Can you try again after setting it to false?
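One way to try this is to issue the requests in a fixed order: commit, rebuild the suggester dictionary, then ask for suggestions. The sketch below only builds the three URLs. The host, port, core name, and /suggest handler come from the question. The /update?commit=true call is the usual way to trigger a commit over HTTP, but whether this sequence fixes the empty result is not confirmed here. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SuggesterRetry {
    // Hypothetical helper: URL-encodes a parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 1: commit pending documents so the indexed field has terms to draw from.
    static String commitUrl(String coreUrl) {
        return coreUrl + "/update?commit=true";
    }

    // Step 2: rebuild the suggester dictionary explicitly.
    static String buildUrl(String coreUrl) {
        return coreUrl + "/suggest?q=a&spellcheck=true&spellcheck.build=true";
    }

    // Step 3: request suggestions for what the user has typed so far.
    static String suggestUrl(String coreUrl, String typed) {
        return coreUrl + "/suggest?spellcheck=true&q=" + enc(typed);
    }

    public static void main(String[] args) {
        String core = "http://localhost:4569/solr/myCore"; // core URL from the question
        System.out.println(commitUrl(core));
        System.out.println(buildUrl(core));
        System.out.println(suggestUrl(core, "aberd"));
    }
}
```

Requesting the three printed URLs in that order separates indexing problems from dictionary build problems: if the last request still returns an empty suggestions list after an explicit build, the commit timing is probably not the cause.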
