Solr Language Detection - apache

I have a field "text", which I need to copy to text_en or text_es based on the language of "text".
Below is my managed_schema.xml:
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <bool name="langid">true</bool>
    <str name="langid.fl">text</str>
    <str name="langid.langField">tweet_lang</str>
    <str name="langid.whitelist">es,en</str>
    <bool name="langid.map">true</bool>
    <!--bool name="langid.map.individual">true</bool-->
    <str name="langid.map.individual.fl">text</str>
    <bool name="langid.map.keepOrig">true</bool>
    <str name="langid.fallback">ko</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I created copy fields for text_en and text_es. When I post data in Spanish, it is copied from text to both text_en and text_es!
How do I solve this?
Thanks!

By creating copyFields from text to text_en and text_es, you get incoming data in both fields regardless of the language detection; that is what copyField is supposed to do.
The updateRequestProcessor will actually make a copy (rather than a move) because you set <bool name="langid.map.keepOrig">true</bool>.
Other than that, the processor's config looks fine. Just remove these copyFields and make sure the mapped fields text_en and text_es are properly defined in your schema.
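As a rough sketch of what "properly defined" could look like, the target fields might be declared explicitly in the schema. The field type names (text_general, text_en, text_es) are assumptions; check which types your own schema provides. Note also that langid.fallback is set to ko, so documents matching neither whitelisted language would presumably be mapped to text_ko, which may need a definition of its own.

```xml
<!-- Sketch only: the field types used here are assumed to exist in your schema -->
<field name="text"    type="text_general" indexed="true" stored="true"/>
<field name="text_en" type="text_en"      indexed="true" stored="true"/>
<field name="text_es" type="text_es"      indexed="true" stored="true"/>
<!-- No copyField from text to text_en or text_es: the langid processor fills the matching field -->
```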

Thanks for the heads-up!
The issue was solved by removing the copy fields and creating dynamic fields *_es and *_en in schema.xml.
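The dynamic fields described here might look roughly like the following. The field type names are assumptions and are not taken from the original schema.

```xml
<!-- Sketch only: text_en and text_es are assumed field types -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_es" type="text_es" indexed="true" stored="true"/>
```

With patterns like these, text_en and text_es need no explicit field declaration, and any other field mapped by the language identifier (a hypothetical title_es, for instance) would match as well.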

Related

No "content" field created when indexing PDF with solr

I have successfully indexed PDFs using the POST command as described in the following link: http://makble.com/how-to-extract-text-from-pdf-and-post-into-solr
Terms stored within an indexed PDF file can be queried and can be found using general queries or the text field.
However, I do not see a "content" field being generated, although the other PDF-related fields are. I tried editing the managed-schema file to add the fields:
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="content" dest="text"/>
I get the following error when I attempt to reload the core:
<str name="msg">Error handling 'reload' action</str>
<str name="trace">
org.apache.solr.common.SolrException: Error handling 'reload' action
    at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:110)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
My solrconfig.xml has this:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
I would like to have the "content" field available to perform search only for the text located within the indexed pdf files.
1) Do not manually edit the schema file. Instead use the Schema API.
2) fmap.content maps the content field to the _text_ field in your case.
If you have a content field already defined, then just removing this particular parameter from the ExtractingRequestHandler definition should do the job.
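A minimal sketch of how the two points could fit together is shown below. The handler is the one from the question with fmap.content removed. The field definitions show what the schema would end up containing (ideally added through the Schema API rather than by hand). Setting indexed="true" and using _text_ as the copyField destination are assumptions: a field with indexed="false" cannot be searched directly, and the catch-all field name should match whatever your schema actually defines.

```xml
<!-- solrconfig.xml: same handler as in the question, minus fmap.content -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <!-- fmap.content removed, so the extracted body text stays in "content" -->
  </lst>
</requestHandler>

<!-- schema: indexed="true" is an assumption, needed if "content" itself is to be searched -->
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
<!-- Optional: keeps general queries working; assumes the catch-all field is named _text_ -->
<copyField source="content" dest="_text_"/>
```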

Apache Solr search is not working when I give the criteria q=value to search

Apache Solr search is not working when I give the criterion q='value to search'. It works fine when I give q=*:*, which fetches all the results.
I am using Apache Solr version 4.7.0.
The question needs more information.
Still, the lack of results could have the following causes.
Did you use the default query field (df set to text), or did you change it in solrconfig.xml?
<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">text</str>
</lst>
If the default is the text field, did you populate data into the field named "text" defined in schema.xml?
If the default field is something else, did you populate that field?
With the above clues you should be able to work it out.
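One common way to populate a default search field is with copyField rules in schema.xml. The sketch below assumes a text_general field type and uses title and description as hypothetical source fields; substitute the fields your documents actually contain.

```xml
<!-- schema.xml: "title" and "description" are hypothetical source fields -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="description" dest="text"/>
```

Documents indexed before such a change would need to be reindexed for the text field to be filled. Alternatively, a query can name the field explicitly, for example q=title:value.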

How do I set a default field list in my solr schema?

I have a schema that contains a fairly large text field.
I've gzipped it and enabled lazy loading, but it will still be fetched unless every client using solr explicitly sets the field list (fl) parameter.
How can I configure solr to omit the large gzipped text field from the results when querying without a field list parameter?
The simplest way to do this is to add the field list to the requestHandler. Assuming that you are using the default /select request handler, you will need to modify your solrconfig.xml adding the fl option to the list of defaults for the /select requestHandler. See below for an example.
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="fl">field1,field2,field3</str>
  </lst>
  ....
</requestHandler>
So in this example, I am setting the fl parameter so the query will return field1, field2 and field3 by default. These will be the fields returned when querying unless the request specifies the fl parameter and then whatever fields are sent will be returned.
These defaults can be set per requestHandler, so if you are using a different requestHandler, just modify your configuration as needed.
Hope this helps.

Solr/Lucene spellcheck suggestions based on multiple fields

I have a database with vendor information: name and address (address, city, zip and country fields). I need to search this database and return some vendors. In the search box, the user could type anything: the name of the vendor, part of the address, city, zip, and so on. If I can't find any results, I need to implement a Google-like "Did you mean" feature to give the user a suggestion.
I thought about using Solr/Lucene to do it. I've installed Solr, exported the information I need to a CSV file, and created the indexes based on this file. Now I am able to get suggestions from a Solr field using solr.SpellCheckComponent. The problem is that my suggestions are based on a single field, and I need them to draw on the address, city, zip, country and name fields.
On solr config file I have something like this:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
I can run queries like:
http://localhost:8983/solr/spell?q=some_company_name&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
Does anyone know how to change my config file in order to have suggestions from multiple fields?
Thanks!!!
To configure Solr spellcheck to use words from several fields:
1) Declare a new field. The new field declaration should use the properties type="textSpell" and multiValued="true". For example: <field name="didYouMean" type="textSpell" indexed="true" multiValued="true"/>.
2) Copy every field whose words should be part of the spellcheck index into the new field. For example: <copyField source="field1" dest="didYouMean"/> <copyField source="field2" dest="didYouMean"/>.
3) Configure Solr to use the new field by setting the spellchecker's field name to it. For example: <str name="field">didYouMean</str>.
For more detailed information, see "Solr spellcheck compound from several fields".
You can use copyField for this in schema.xml. <copyField source="*" dest="contentSpell"/> will copy all the fields to contentSpell.
Then change <str name="field">name</str> to <str name="field">contentSpell</str> and you will get suggestions from all fields.
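Putting the answers together for the fields named in the question, the configuration could look something like this sketch. The textSpell field type is referenced in the question but never shown, so its definition here is an assumption (a plain tokenizer plus lowercasing); the copyField sources assume the indexed field names are exactly name, address, city, zip and country.

```xml
<!-- schema.xml -->
<!-- Assumed definition: textSpell is referenced in the question but not shown -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Combined field that feeds the spellcheck dictionary -->
<field name="didYouMean" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<copyField source="name"    dest="didYouMean"/>
<copyField source="address" dest="didYouMean"/>
<copyField source="city"    dest="didYouMean"/>
<copyField source="zip"     dest="didYouMean"/>
<copyField source="country" dest="didYouMean"/>

<!-- solrconfig.xml: same component as in the question, pointed at the combined field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">didYouMean</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
</searchComponent>
```

After a change like this, the documents need to be reindexed so the combined field is populated, and the spellcheck index rebuilt once with spellcheck.build=true.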

How do I implement a solr spell checker?

I want to implement a spellchecker component in my search application using Solr. What configuration changes are required?
Add the following section to your solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!--
      Optional, it is required when more than one spellchecker is configured.
      Select non-default name with spellcheck.dictionary in request handler.
    -->
    <str name="name">default</str>
    <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!--
      Load tokens from the following field for spell checking,
      analyzer for the field's type as defined in schema.xml are used
    -->
    <str name="field">spell</str>
    <!-- Optional, by default use in-memory index (RAMDirectory) -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
    <str name="accuracy">0.7</str>
    <!-- Require terms to occur in 1/100th of 1% of documents in order to be included in the dictionary -->
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
  <!-- Example of using different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
  <!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
  <str name="queryAnalyzerFieldType">textSpell</str>
</searchComponent>
<!--
  The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens. Uses a simple regular expression
  to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser.
  Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>
<!-- Add to a RequestHandler
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  NOTE: YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT. THIS IS DONE HERE SOLELY FOR
  THE SIMPLICITY OF THE EXAMPLE. YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
    <str name="spellcheck.dictionary">default</str>
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">false</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <!-- Add to a RequestHandler
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    REPEAT NOTE: YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT. THIS IS DONE HERE SOLELY FOR
    THE SIMPLICITY OF THE EXAMPLE. YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
This config sample is from the Solr Wiki.
After adding it, you can request a build of the spellchecker index:
http://localhost:8983/solr/spell?q=some query&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
Do not include the last parameter (spellcheck.build=true) in every request, because that rebuilds the spelling index each time. After the first request, the query becomes:
http://localhost:8983/solr/spell?q=some query&spellcheck=true&spellcheck.collate=true
In the XML section above, don't forget to replace the field spell with the field you want to build your spellchecker against.
And now you can feel the power of spellchecking.
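The note inside the sample config recommends binding the component to the standard /select handler rather than a separate one. A minimal sketch of that binding is below; it assumes a spellcheck searchComponent named spellcheck is already defined as above, and the echoParams and rows defaults are only illustrative.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Spellchecking only runs when the request passes spellcheck=true -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <!-- Run the spellcheck component after the normal query components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With this in place, requests would go to /select with spellcheck=true added, instead of to a dedicated spellcheck path.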