Solr - How to make several searches on a single query? - apache

I'm trying to make a query that:
First, tries to match strings with only lowercase processing, and generates a score with boost x.
Second, tries the 'text_en' processing and generates a score with boost y.
Third, uses 'text_en' with synonyms and generates a score with boost z.
Is this possible?
I made the following configuration, but I don't think it's doing what I described... It works, but I don't trust the results.
solrconfig.xml:
<requestHandler name="/querystems" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
<str name="qf">
lowercase^2 title_without_synonym description_without_synonym title_stems^1.5 description_stems^1
</str>
<str name="q.alt">*:*</str>
<str name="rows">100</str>
<str name="fl">id,title,description,reuseCount,score,title_stems,description_stems</str>
<str name="defType">edismax</str>
</lst>
</requestHandler>
schema.xml:
<field name="title" type="text_en_test" indexed="true" stored="true"/>
<field name="title_without_synonym" type="text_en_without_synonym" indexed="true" stored="true"/>
<field name="title_stems" type="text_en_stems" indexed="true" stored="true"/>
<copyField source="title" dest="title_without_synonym" maxChars="30000" />
<copyField source="title" dest="title_stems" maxChars="30000" />
<field name="description" type="text_en_test" indexed="true" stored="true"/>
<field name="description_without_synonym" type="text_en_without_synonym" indexed="true" stored="true"/>
<field name="description_stems" type="text_en_stems" indexed="true" stored="true"/>
<copyField source="description" dest="description_without_synonym" maxChars="30000" />
<copyField source="description" dest="description_stems" maxChars="30000" />
I don't even know why that 'lowercase' entry works in the query configuration, since it's a 'fieldType', not a 'field'... Can someone help me understand and better configure this scenario?
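One way to make each stage explicit is to give every analysis chain its own field, so that every qf entry refers to a real field rather than a fieldType name. A minimal sketch (the text_lowercase type and title_lowercase field names are hypothetical, not from the original schema):

```xml
<!-- Hypothetical: a type that keeps the whole value as one token and lowercases it -->
<fieldType name="text_lowercase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_lowercase" type="text_lowercase" indexed="true" stored="false"/>
<copyField source="title" dest="title_lowercase" maxChars="30000"/>
```

The qf could then read something like title_lowercase^2 title_without_synonym title_stems^1.5, and edismax would score the match against each field independently and combine the results with the given boosts.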

Related

index all files inside a folder in solr

I am having trouble indexing a folder in Solr.
example-data-config.xml:
<dataConfig>
<dataSource type="BinFileDataSource" />
<document>
<entity name="files"
dataSource="null"
rootEntity="false"
processor="FileListEntityProcessor"
baseDir="C:\Temp\" fileName=".*"
recursive="true"
onError="skip">
<field column="fileAbsolutePath" name="id" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastModified" />
<entity
name="documentImport"
processor="TikaEntityProcessor"
url="${files.fileAbsolutePath}"
format="text">
<field column="file" name="fileName"/>
<field column="Author" name="author" meta="true"/>
<field column="text" name="text"/>
</entity>
</entity>
</document>
</dataConfig>
Then I create the schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="fileName" type="string" indexed="true" stored="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="size" type="plong" indexed="true" stored="true" />
<field name="lastModified" type="pdate" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
Finally, I modify solrconfig.xml, adding the request handler and the dataimporthandler and dataimporthandler-extras jars:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">example-data-config.xml</str>
</lst>
</requestHandler>
I run it and the result is:
Inside that folder there are about 20,000 files in different formats (.py, .java, .wsdl, etc.).
Any suggestion will be appreciated. Thanks :)
Check your Solr logs. The answer for the root cause will definitely be there. I faced the same situation once and found through the Solr logs that my DataImportHandler was throwing exceptions because of encrypted documents in the folder. Your reason may be different, but first analyze your Solr logs: execute your entity again in the DataImport section, then check the immediate logs for errors via the Logging section of the admin page. If you are getting errors other than the ones I mentioned, post them here so they can be understood and deciphered.
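Following that advice, the entity can be re-run and its state inspected directly through the DataImportHandler endpoints; for example (the core name mycore is an assumption):

```
http://localhost:8983/solr/mycore/dataimport?command=full-import
http://localhost:8983/solr/mycore/dataimport?command=status
```

Checking the status output right after the import, together with the Logging page, usually narrows down which files were skipped and why.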

Solr suggester no results

I want to use the Solr suggester component for city names. I have the following settings:
schema.xml
Field definition
<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The field I want to apply the suggester to:
<field name="city" type="string" indexed="true" stored="false"/>
The copy field
<copyField source="city" dest="citySuggest"/>
The field
<field name="citySuggest" type="textSuggest" stored="false" indexed="true" />
solr-config.xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">citySuggest</str>
<str name="suggestAnalyzerFieldType">string</str>
</lst>
</searchComponent>
Then I run
http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath&suggest.build=true
to build the suggest component
Finally I run
http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath
but I get an empty result set:
{"responseHeader":{"status":0,"QTime":0},"suggest":{"mySuggester":{"Ath":{"numFound":0,"suggestions":[]}}}}
Are there any obvious mistakes? Any thoughts?
Try the following:
In the field with name="citySuggest" set the attribute stored="true".
Then rebuild the suggester.
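With that change applied, the field definition would read (same names as above):

```xml
<field name="citySuggest" type="textSuggest" stored="true" indexed="true" />
```

DocumentDictionaryFactory builds its suggestions from stored field values, so after changing the attribute the documents need to be re-indexed and the dictionary rebuilt with the suggest.build=true request shown earlier.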

Solr: Facet is not creating any output

I am using Solr 4.4.0 and running some basic queries. This is what I get when I enter title:* in the query box:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
<lst name="params">
<str name="q">title:*</str>
<str name="indent">true</str>
<str name="wt">xml</str>
<str name="_">1430883449558</str>
</lst>
</lst>
<result name="response" numFound="70" start="0">
<doc>
<str name="id">db01</str>
<str name="isbn">1933988177</str>
<str name="author">Michael McCandless, Erik Hatcher, Otis Gospodnetic</str>
<str name="author_s">Michael McCandless, Erik Hatcher, Otis Gospodnetic</str>
<int name="numpages">475</int>
<str name="description">When Lucene first hit the scene five years ago, it was nothing short of amazing. By using this open-source, highly scalable, super-fast search engine, developers could integrate search into applications quickly and efficiently. A lot has changed since then-search has grown from a "nice-to-have" feature into an indispensable part of most enterprise applications. Lucene now powers search in diverse companies including Akamai, Netflix, LinkedIn, Technorati, HotJobs, Epiphany, FedEx, Mayo Clinic, MIT, New Scientist Magazine, and many others.</str>
<str name="category">Computers/Programming/Information Retrieval/Lucene</str>
<float name="price">31.49</float>
<str name="price_c">31.49,USD</str>
<arr name="title">
<str>Lucene In Action, 2nd</str>
</arr>
<str name="yearpub">2010</str>
<date name="pubdate">2010-07-28T00:00:01Z</date>
<str name="publisher">Manning Publications</str>
<str name="store">37.763649,-122.24313</str>
<long name="_version_">1500385802538975232</long></doc>
and so on and so forth 70 times...
And this is okay, because this is the result I want (70 books). But when I try to add facet.field=publisher, it doesn't do anything: it returns exactly the same output as above. How can I get this facet to work? Indexing is set to true and everything. What am I doing wrong? Here is an excerpt of my schema:
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="keywords" type="text_general" indexed="true" stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>
<field name="resourcename" type="text_general" indexed="true" stored="true"/>
<field name="url" type="text_general" indexed="true" stored="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="last_modified" type="date" indexed="true" stored="true"/>
<field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="yearpub" type="string" indexed="true" stored="true"/>
<field name="pubdate" type="date" indexed="true" stored="true"/>
<field name="publisher" type="text_general" indexed="true" stored="true"/>
<field name="numpages" type="int" indexed="true" stored="true"/>
<field name="isbn" type="text_general" indexed="true" stored="true"/>
You need to change the text_general type on the publisher field: that type uses WhitespaceTokenizerFactory, which means it splits phrases/strings into chunks whenever it encounters whitespace.
<field name="publisher" type="text_general" indexed="true" stored="true"/>
So Cambridge University Press is divided into:
Cambridge
University
Press
Either remove that tokenizer or use another fieldType that doesn't use WhitespaceTokenizerFactory.
You can use the string fieldType, so make the following update, restart Solr, and index your data again:
<field name="publisher" type="string" indexed="true" stored="true"/>
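Note also that facet parameters only take effect when faceting is switched on with facet=true; with the field reindexed as string, a request like this (core name assumed) should return a publisher facet block:

```
http://localhost:8983/solr/collection1/select?q=title:*&facet=true&facet.field=publisher&wt=xml
```

Each distinct publisher value then appears with its document count under facet_counts in the response.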

Solr 4 - missing required field: uuid

I'm having issues generating a UUID using the DataImportHandler in Solr 4. I'm trying to import from an existing MySQL database.
My schema.xml contains:
<fields>
<field name="uuid" type="uuid" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="address" type="text_general" indexed="true" stored="true"/>
<field name="city" type="text_general" indexed="true" stored="true" />
<field name="county" type="string" indexed="true" stored="true" />
<field name="lat" type="text_general" indexed="true" stored="true" />
<field name="lng" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="price" type="float" indexed="true" stored="true"/>
<field name="bedrooms" type="float" indexed="true" stored="true" />
<field name="image" type="string" indexed="true" stored="true"/>
<field name="region" type="location_rpt" indexed="true" stored="true" />
<defaultSearchField>address</defaultSearchField>
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>
<uniqueKey>uuid</uniqueKey>
and then in <types>
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
My Solrconfig.xml contains:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">  
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uuid</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Whenever I run the update, some docs are inserted fine, but many fail with:
org.apache.solr.common.SolrException: [doc=204] missing required field: uuid
Going by the example at the link, it should be:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
.........
<lst name="defaults">
<str name="config">data-config.xml</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uuid</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
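Once the chain is attached via update.chain and the import re-run, every imported document should carry a generated uuid; a quick spot check (core name assumed):

```
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,uuid&rows=5
```

If any document still comes back without a uuid value, the chain is not being applied to that update path.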

Indexing data from pdf

I am trying to index data from PDF files now, and I am getting the following response from Solr:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:1.236</str>
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">1</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-05-11 15:45:01</str>
<str name="">Indexing failed. Rolled back all changes.</str>
<str name="Rolledback">2012-05-11 15:45:01</str></lst><str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
The log files show this:
org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/apache/tika/parser/AutoDetectParser
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/apache/tika/parser/AutoDetectParser
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoClassDefFoundError: org/apache/tika/parser/AutoDetectParser
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:759)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.lang.NoClassDefFoundError: org/apache/tika/parser/AutoDetectParser
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:388)
at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:1100)
at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:912)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:635)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.apache.tika.parser.AutoDetectParser
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 13 more
The configuration file looks like:
data-config.xml:
<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
<dataSource type="BinFileDataSource" name="binary" />
<document>
<entity name="f" dataSource="binary" rootEntity="false" processor="FileListEntityProcessor" baseDir="C:\solr\solr\docu" fileName=".*pdf" recursive="true">
<entity name="tika" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text">
<field column="id" name="id" meta="true" />
<field column="fake_id" name="fake_id" />
<field column="model" name="model" meta="true" />
<field column="text" name="biog" />
</entity>
</entity>
</document>
</dataConfig>
schema.xml:
<fields>
<field name="id" type="string" indexed="true" stored="true" />
<field name="fake_id" type="string" indexed="true" stored="true" />
<field name="model" type="text_en" indexed="true" stored="true" />
<field name="firstname" type="text_en" indexed="true" stored="true"/>
<field name="lastname" type="text_en" indexed="true" stored="true"/>
<field name="title" type="text_en" indexed="true" stored="true"/>
<field name="biog" type="text_en" indexed="true" stored="true"/>
</fields>
<uniqueKey>fake_id</uniqueKey>
<defaultSearchField>biog</defaultSearchField>
Finally the “Tika” jars that I have are:
tika-core-1.0.jar and tika-parsers-1.0.jar
What is going wrong?
Thanks
The problem could be in your data-config.xml: you specified the binary file dataSource on the entity named "f" (with the FileListEntityProcessor) instead of on the entity with the TikaEntityProcessor.
I think you could try this code:
<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
<dataSource type="BinFileDataSource" name="binary" />
<document>
<entity name="f" dataSource="null" rootEntity="false" processor="FileListEntityProcessor" baseDir="C:\solr\solr\docu" fileName=".*pdf" recursive="true">
<entity name="tika" dataSource="binary" processor="TikaEntityProcessor" url="${f.fileAbsolutePath}" format="text">
<field column="id" name="id" meta="true" />
<field column="fake_id" name="fake_id" />
<field column="model" name="model" meta="true" />
<field column="text" name="biog" />
</entity>
</entity>
</document>
</dataConfig>
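Separately, the stack trace shows a NoClassDefFoundError for org.apache.tika.parser.AutoDetectParser, which suggests the Tika jars are not on Solr's classpath at all. One way to load them is with <lib> directives in solrconfig.xml (the dir paths below are assumptions; adjust them to wherever the jars actually live in your installation):

```xml
<!-- Assumed locations: point these at the directories holding the DIH and Tika/extraction jars -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```

After adding the directives, restart Solr and re-run the full import.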