Solr Sunspot non-indexed field - lucene

Solr (via Lucene) supports several flags that control how a field is handled in a document: indexed, tokenized, stored, and so on.
I'm looking for a way to have fields that are stored in Solr but are not indexed. Is there a way to achieve that in Sunspot?

Sunspot's configuration DSL supports a :stored => true option for many of its default types. For a stored string, the simplest form is:
searchable do
string :name, :stored => true
end
This generates a field name of name_ss corresponding to the following dynamicField already present in Sunspot's standard schema:
<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>
You can also create your own custom field or dynamicField in your schema.xml to be stored but not indexed, and then use the Sunspot 1.2 :as option to specify a corresponding field name.
For example, here is a more verbose version of the above that is stored but not indexed. In your schema:
<dynamicField name="*_stored_string" type="string" indexed="false" stored="true" />
And in your model:
searchable do
string :name, :as => 'name_stored_string'
end

You can try the Luke request handler:
http://localhost:8983/solr/admin/luke?numTerms=0
Then use XPath or a regex to pick out the fields by their schema flags. The response includes a key for the flags:
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
You will get something like:
<lst name="field">
<str name="type">stringGeneralType</str>
<str name="schema">--SM---------</str>
</lst>
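
To make the flag check concrete, here is a minimal Python sketch that filters for fields that are stored but not indexed. The sample XML and the field names in it are made up for illustration and only mimic the snippet above; a real Luke response nests more elements, so the lookup path would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the snippet above, not a real Luke response.
SAMPLE = """
<lst name="fields">
  <lst name="name_stored_string">
    <str name="type">string</str>
    <str name="schema">--S----------</str>
  </lst>
  <lst name="name_ss">
    <str name="type">string</str>
    <str name="schema">I-S----------</str>
  </lst>
</lst>
"""

def stored_not_indexed(luke_xml):
    """Return names of fields whose flags contain S (Stored) but not I (Indexed)."""
    root = ET.fromstring(luke_xml)
    names = []
    for field in root.findall("lst"):
        flags = field.findtext("str[@name='schema']", default="")
        if "S" in flags and "I" not in flags:
            names.append(field.get("name"))
    return names

print(stored_not_indexed(SAMPLE))
```

The check relies only on the upper-case I and S letters from the key shown above, so it does not depend on the exact position of each flag in the string.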

Related

Apache Solr - Document is missing mandatory uniqueKey field: id

I'm using Solr 7.1 (SolrCloud mode) and I don't have a requirement to enforce document uniqueness.
Hence I marked the id field (designated as the unique key) in the schema as required="false".
<field name="id" type="string" indexed="true" stored="false" required="false" multiValued="false" />
<uniqueKey>id</uniqueKey>
I am trying to index some documents using the Solr Admin UI without specifying the 'id' field.
{
"cat": "books",
"name": "JayStore"
}
I was expecting it to index successfully, but Solr throws an error saying 'mandatory unique key field id is missing'.
Could someone tell me what I'm doing wrong?
The uniqueKey field is required internally by Solr for certain features, such as using cursorMark - meaning that the field that is defined as a uniqueKey is required. It's also used for routing etc. inside SolrCloud by default (IIRC), so if it's not present Solr won't be able to shard your documents correctly. Setting it as not required in the schema won't relax that requirement.
But you can work around this by defining a UUID field and using a UUID Update Processor, as described in the old wiki. This will generate a UUID for each document when you index it, meaning each document gets a unique identifier attached by default.
UUID is short for Universally Unique IDentifier. The UUID standard RFC-4122 includes several types of UUID with different input formats. There is a UUID field type (called UUIDField) in Solr 1.4 which implements version 4. Fields are defined in the schema.xml file with:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
In Solr 4, this field must be populated via solr.UUIDUpdateProcessorFactory.
<field name="id" type="uuid" indexed="true" stored="true" required="true"/>
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
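
If you would rather not touch solrconfig.xml, a different option (not the update processor described above) is to assign the id on the client before sending the document. A minimal Python sketch, assuming the uniqueKey field is called id as in the question; ensure_id is a hypothetical helper name:

```python
import uuid

def ensure_id(doc, key="id"):
    """Return a copy of doc with a random (version 4) UUID under `key` if it is missing."""
    if doc.get(key):
        return dict(doc)
    return {**doc, key: str(uuid.uuid4())}

# The document from the question, which has no id of its own.
doc = ensure_id({"cat": "books", "name": "JayStore"})
print(doc["id"])
```

Either way the document always arrives with a uniqueKey value, which is what Solr insists on.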

No "content" field created when indexing PDF with solr

I have successfully indexed PDFs using the POST command as described in the following link: http://makble.com/how-to-extract-text-from-pdf-and-post-into-solr
Terms stored within an indexed PDF file can be queried and can be found using general queries or the text field.
However, I do not see a "content" field generated alongside the other PDF-related fields. I tried editing the managed-schema file to add it:
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="content" dest="text"/>
I get the following error when I attempt to reload the core:
<str name="msg">Error handling 'reload' action</str>
<str name="trace">
org.apache.solr.common.SolrException: Error handling 'reload' action
at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:110)
at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
My solrconfig.xml has this:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
I would like to have the "content" field available to perform search only for the text located within the indexed pdf files.
1) Do not manually edit the schema file. Instead use the Schema API.
2) fmap.content maps the content field to the _text_ field in your case.
If you have a content field already defined, then just removing this particular parameter from the ExtractingRequestHandler definition should do the job.
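
As a rough illustration of point 1, this Python sketch builds the JSON body for a Schema API add-field command. The helper name is made up, and indexed is set to true here as an assumption (the question used indexed="false", which would not allow searching on the field). Nothing is sent; the body would be POSTed to the schema endpoint of your core.

```python
import json

def add_field_command(name, field_type, indexed, stored, multi_valued):
    """Build a Schema API 'add-field' command as a plain dict."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }

# indexed=True is an assumption so that the field can be searched.
payload = add_field_command("content", "text_general", True, True, True)
body = json.dumps(payload)

# POST `body` with Content-Type: application/json to
# http://localhost:8983/solr/<your core>/schema (core name not given in the question).
print(body)
```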

Apache Solr undefined field score field in function query

I am using Solr 4.10. I need to adjust the relevance of documents based on a boost field and the document score, and I understand that a function query is the way to do that. The boost field is defined in the schema as follows:
<field name="boost" type="float" stored="true" indexed="false" default="1.0"/>
My first question: can function queries be used on fields that are stored but not indexed?
When I use the schema above with the following query
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id&sort=pow(score,%20boost)%20asc
I get an error like this:
sort param could not be parsed as a query, and is not a field that exists in the index:
Then I changed the schema to:
<field name="boost" type="float" stored="true" indexed="true" default="1.0"/>
That error went away, but a new one appeared for this query:
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id,pow(score,%20boost)
The error is:
<lst name="error">
<str name="msg">undefined field: "score"</str>
<int name="code">400</int>
</lst>
Where am I going wrong?
Was I right to change the attributes of the boost field?
I would recommend using a boost function and sorting by score alone (that is the default, so no sort parameter is needed).
bf=linear(boost,100,0)
You may use other functions; which one fits depends on your use case.
Check the Solr docs on function queries for the options.
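
For reference, a small Python sketch of what such a request could look like. The defType=edismax parameter is an assumption on my part (bf is a parameter of the dismax/edismax parsers); the host, q and df values are taken from the question. The linear helper only mirrors the arithmetic of Solr's linear(x,m,c), which evaluates m*x + c.

```python
from urllib.parse import urlencode, parse_qs

def linear(x, m, c):
    """Same arithmetic as Solr's linear(x,m,c): m*x + c."""
    return m * x + c

params = {
    "q": "bank",
    "df": "keywords",
    "defType": "edismax",          # assumption: bf needs the (e)dismax parser
    "fl": "id,score",
    "bf": "linear(boost,100,0)",   # contributes 100 * boost to the score
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string

print(url)
print(linear(1.5, 100, 0))  # contribution for a document with boost = 1.5
```

Note that urlencode escapes the parentheses and commas for you, so there is no need to hand-encode the function.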

How to get record creation time in Solr?

I'm using DataImportHandler to index data from Postgres.
I would like to get the record creation time so I can compare it to the actual object creation time later.
These records are being updated (by id), so adding a "NOW" field won't do the trick.
This is how I did it eventually:
1. Use multiValued
schema.xml:
<field name="creation_time" type="date" indexed="false" stored="true" required="false" multiValued="true" />
2. Add FirstFieldValueUpdateProcessorFactory to the update process chain, which will keep only the first value
solrconfig.xml under updateRequestProcessorChain:
<processor class="solr.FirstFieldValueUpdateProcessorFactory">
<str name="fieldName">creation_time</str>
</processor>
3. When indexing, use the Solr 4.0 atomic update "add" on this field:
{"creation_time": {"add":"2012-03-06T15:02:45.017Z"}}
The solution is taken from here:
https://issues.apache.org/jira/browse/SOLR-4468
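
To show why the three steps add up to "first write wins", here is a small Python simulation. It is not Solr code: keep_first just imitates the effect of keeping only the first value of the field, and the document id "42" and the second timestamp are made up.

```python
import json

def atomic_add(doc_id, field, value):
    """Build one atomic-update document that adds `value` to `field`."""
    return {"id": doc_id, field: {"add": value}}

def keep_first(values):
    """Imitate the effect of keeping only the first value of a field."""
    return values[:1]

# Two updates to the same document: the second "add" is appended,
# then trimmed away again, so the original timestamp survives.
stored = []
for ts in ("2012-03-06T15:02:45.017Z", "2013-01-01T00:00:00Z"):
    stored = keep_first(stored + [ts])

print(json.dumps([atomic_add("42", "creation_time", "2012-03-06T15:02:45.017Z")]))
print(stored)
```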

Facet query will give wrong output on dynamicfield in solr

I have a dynamicField named 'pa_mydynamicfieldname' in Solr 4.0.
I index the values after encoding them with System.Web.HttpUtility.UrlEncode(pa_mydynamicfieldname),
for example: 2.2+GHz+Intel+Pentium+Dual-Core+E2200
When I apply a facet query, the output is:
<lst name="facet_fields">
<lst name="pa_mydynamicfieldname">
<int name="2.2">1</int>
<int name="2.5">1</int>
<int name="core">1</int>
<int name="dual">1</int>
<int name="e2200">1</int>
<int name="ghz">1</int>
<int name="intel">1</int>
<int name="pentium">1</int>
</lst>
</lst>
Instead, I want this output:
<lst name="facet_fields">
<lst name="pa_mydynamicfieldname">
<int name="2.2+GHz+Intel+Pentium+Dual-Core+E2200">1</int>
</lst>
</lst>
How can I do this in Solr when applying a facet query?
Updated on 15-May-13
From Schema, text field is defined as:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
And dynamic field is defined as:
<dynamicField name="pa_*" type="text" indexed="true" stored="true" multiValued="true" required="false" />
We need it as a multi-valued field, because a document may have multiple values defined for each product.
Please help me.
Thanks
To get the behavior you want, you will need to change the fieldType for the dynamic field in your schema.xml. Currently, your pa_mydynamicfieldname is defined with a tokenized text type and multiValued="true". So your field value is split into tokens at index time, and faceting counts those indexed tokens rather than the original value. This is why you see individual words/tokens being returned as facet values.
Since you want to keep the original value as you submit it, please change your fieldType to just a plain old string and not multivalued:
<dynamicField name="*_mydynamicfieldname" type="string"
indexed="true" stored="true"/>
Or you can alternatively take advantage of the predefined string-based dynamic field defined in the example schema.xml:
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
You will need to reindex your data after making this change to your schema.xml for new field types to be stored properly and reflected in the search results.
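
If it helps to see the difference, here is a rough Python imitation of faceting on a tokenized field versus a string field. The regex split is only a crude stand-in for a real Solr analyzer, and facet_counts is a made-up helper, but it reproduces the terms from your output.

```python
import re
from collections import Counter

def facet_counts(values, tokenized):
    """Count documents per facet term, with or without crude tokenization."""
    counts = Counter()
    for value in values:
        if tokenized:
            # Stand-in for an analyzer: lowercase, split on '+', '-' and whitespace.
            terms = [t for t in re.split(r"[+\-\s]+", value.lower()) if t]
        else:
            # A string field keeps the whole value as a single term.
            terms = [value]
        counts.update(set(terms))
    return counts

value = "2.2+GHz+Intel+Pentium+Dual-Core+E2200"
print(sorted(facet_counts([value], tokenized=True)))
print(dict(facet_counts([value], tokenized=False)))
```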