Apache Solr: undefined field "score" in function query

I am using Solr 4.10. I need to change the relevance of documents based on a boost field and the document score, and I understand that a function query is the way to do this. The boost field is defined in the schema as follows:
<field name="boost" type="float" stored="true" indexed="false" default="1.0"/>
My first question: can function queries be used on fields that are stored but not indexed?
With the schema above, I ran the following query:
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id&sort=pow(score,%20boost)%20asc
It failed with this error:
sort param could not be parsed as a query, and is not a field that exists in the index:
I then changed the schema to:
<field name="boost" type="float" stored="true" indexed="true" default="1.0"/>
That error went away, but a new one appeared for this query:
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id,pow(score,%20boost)
The error was:
<lst name="error">
<str name="msg">undefined field: "score"</str>
<int name="code">400</int>
</lst>
Where am I going wrong?
Was I right to change the attributes of the boost field?

I would recommend using a boost function and sorting by score alone (the default, so no sort parameter is needed).
bf=linear(boost,100,0)
You may use other functions, depending on your use case.
See the Solr documentation on function queries for the options.
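As a rough illustration of the suggestion above, the request could be assembled as follows. This is a minimal sketch, not a tested Solr setup: the base URL and the keywords field come from the question, while defType=edismax and qf are assumptions added here because bf is a parameter of the (e)dismax query parsers. It also assumes the boost field stays indexed="true", since function queries generally read values from the index rather than from stored values.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust to your own core.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_boosted_query(term, boost_function="linear(boost,100,0)"):
    """Return a select URL that adds a boost function to the relevance score."""
    params = {
        "q": term,
        "defType": "edismax",  # assumption: bf needs the (e)dismax parser
        "qf": "keywords",      # field searched in the question (df=keywords)
        "bf": boost_function,  # its value is added to each match's score
        "fl": "id,score",      # score is a pseudo-field that can be listed in fl
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(build_boosted_query("bank"))
```

Note that score can be requested in fl but is not a schema field, which is consistent with the "undefined field" error when it is used inside pow(). If the score is needed inside a function, the query() function with a parameter reference such as query($q) may be worth trying.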

Related

Search Predicate Builder

I am using Lucene search with Sitecore 7.2 and the predicate builder to search for data. I have included a computed field in the index, which is a string. When I search on that field using .Contains(mystring), it fails when mystring contains 'and'. If there is no 'and' in mystring, it works.
Can you suggest anything?
By default, when the field and the query are processed, Lucene strips out so-called "stop words" such as "and" and "the".
If you don't want this behaviour, you can add an entry to the fieldMap section of your configuration to tell Sitecore how to process the field:
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="YOURFIELDNAME" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
</field>
...
</fieldNames>
This example tells Sitecore not to tokenize that field and to convert everything to lowercase. You can switch to a different analyzer to get the results you want.
You can try setting the indexType to TOKENIZED but still using the LowerCaseKeywordAnalyzer as another combination. UN_TOKENIZED will mean that your string will be processed as a single token which may not be what you want.
I have solved it, taking a hint from @Stephen Pope's reply. To make a computed field untokenized, you have to add it to both raw:AddFieldByFieldName and AddComputedIndexField.
See link below
http://www.sitecore.net/Community/Technical-Blogs/Martina-Welander-Sitecore-Blog/Posts/2013/09/Sitecore-7-Search-Tips-Computed-Fields.aspx

How to get record creation time in Solr?

I'm using DataImportHandler to index data from Postgres.
I would like to get the record creation time so that I can compare it to the actual object creation time later.
These records are updated (by id), so adding a "NOW" field won't do the trick.
This is how I did it eventually:
1. Use a multiValued field
schema.xml:
<field name="creation_time" type="date" indexed="false" stored="true" required="false" multiValued="true" />
2. Add FirstFieldValueUpdateProcessorFactory to the update processor chain, which keeps only the first value
solrconfig.xml under updateRequestProcessorChain:
<processor class="solr.FirstFieldValueUpdateProcessorFactory">
<str name="fieldName">creation_time</str>
</processor>
3. When indexing, use the Solr 4.0 atomic update "add" operation on this field:
{"creation_time": {"add":"2012-03-06T15:02:45.017Z"}}
The solution is taken from here:
https://issues.apache.org/jira/browse/SOLR-4468
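Step 3 can be sketched in Python as below. This only builds the JSON body; the uniqueKey name "id" and the document id "42" are hypothetical, and the body would still need to be posted to the core's update handler with a JSON content type.

```python
import json

def creation_time_update(doc_id, created_at):
    """Build an atomic-update document that adds a value to creation_time."""
    # "id" is assumed to be the uniqueKey field of the schema.
    return {"id": doc_id, "creation_time": {"add": created_at}}

# Solr's JSON update format accepts a list of documents.
payload = json.dumps([creation_time_update("42", "2012-03-06T15:02:45.017Z")])
print(payload)
```

Because every update appends a value and the processor from step 2 keeps only the first one, the earliest timestamp is the one intended to survive.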

Solr Sunspot non-indexed field

Solr (via Lucene) supports different ways to indicate the way a field is indexed in a document: indexed, tokenized, stored,...
I'm looking for a way to have fields that are stored in Solr but are not indexed. Is there a way to achieve that in Sunspot?
Sunspot's configuration DSL supports a :stored => true option for many of its default types. For a stored string, it is as simple as:
searchable do
string :name, :stored => true
end
This generates a field name of name_ss corresponding to the following dynamicField already present in Sunspot's standard schema:
<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>
You can also create your own custom field or dynamicField in your schema.xml to be stored but not indexed, and then use the Sunspot 1.2 :as option to specify a corresponding field name.
For example, here is a more verbose version of the above. In your schema:
<dynamicField name="*_stored_string" type="string" indexed="false" stored="true" />
And in your model:
searchable do
string :name, :as => 'name_stored_string'
end
You can try:
http://localhost:8983/solr/admin/luke?numTerms=0
Then use XPath or a regex to pick out the fields by their schema flags, where the letters mean:
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
You will get something like:
<lst name="field">
<str name="type">stringGeneralType</str>
<str name="schema">--SM---------</str>
</lst>
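The filtering step could look like the sketch below. The sample response is a trimmed, hypothetical stand-in for real Luke output (the field names are borrowed from the Sunspot answer above), and the code assumes the letters I and S appear in the flag string only for Indexed and Stored, as in the key above.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample of a Luke response; a real one has more entries.
SAMPLE = """
<response>
  <lst name="fields">
    <lst name="name_stored_string">
      <str name="type">string</str>
      <str name="schema">--S----------</str>
    </lst>
    <lst name="name_ss">
      <str name="type">string</str>
      <str name="schema">I-S----------</str>
    </lst>
  </lst>
</response>
"""

def stored_not_indexed(xml_text):
    """Return names of fields whose schema flags contain S but not I."""
    root = ET.fromstring(xml_text)
    names = []
    for field in root.findall("./lst[@name='fields']/lst"):
        flags = field.findtext("str[@name='schema']", default="")
        if "S" in flags and "I" not in flags:
            names.append(field.get("name"))
    return names

print(stored_not_indexed(SAMPLE))
```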

What is the use of "multiValued" field type in Solr?

I'm new to Apache Solr. Even after reading the documentation, I find it difficult to understand the purpose and use of the multiValued field type property.
How does Solr internally treat a field that is marked as multiValued?
What is the difference in indexing between a field that is multiValued and one that is not?
Can somebody explain with a good example?
Doc says:
multiValued=true|false
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
A multivalued field is useful when a field can have more than one value. An easy example is tags: a document can have multiple tags that need to be indexed. If the tags field is multivalued, the Solr response returns a list instead of a single string value. One point to note is that you need to submit a separate field entry for each value of tags, like:
<field name="tags">tag1</field>
<field name="tags">tag2</field>
...
<field name="tags">tagn</field>
Once all the values are indexed, you can search or filter results by any value, e.g. you can find all documents with tag1 using a query like
q=tags:tag1
or use the tags to filter out results like
q=query&fq=tags:tag1
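To make the "one entry per value" point concrete, here is a small sketch that builds such an update message. The id field and its value are assumptions for illustration; only the tags field comes from the answer above.

```python
import xml.etree.ElementTree as ET

def build_add_doc(doc_id, tags):
    """Return an <add><doc> update message with one <field> per tag."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" is a hypothetical uniqueKey field.
    ET.SubElement(doc, "field", name="id").text = doc_id
    for tag in tags:
        # A multiValued field is simply repeated, once per value.
        ET.SubElement(doc, "field", name="tags").text = tag
    return ET.tostring(add, encoding="unicode")

print(build_add_doc("1", ["tag1", "tag2"]))
```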
multiValued defines in the schema whether the field is allowed to have more than one value.
For instance:
If I have a field called id that is multiValued=false, indexing a document such as this:
doc {
id : [ 1, 2]
...
}
would cause an exception to be thrown in the indexing thread, and the document would not be indexed (schema validation fails).
On the other hand, if I do have multiple values for a field, I would set multiValued=true to make sure indexing is done correctly, for example:
doc {
id : 1
keywords: [ hello, world ]
...
}
In this case you would define "keywords" as a multiValued field.
I use multivalued fields only with copyField. Think of it this way: every field is single-valued unless it is a copyField destination. For example, I have the following fields:
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="subject" type="string" indexed="true" stored="true"/>
<field name="location" type="string" indexed="true" stored="true"/>
I want to query a single field and still search all four fields above, so I need copyField: first create a new field called 'all', then copy everything into it.
<field name="all" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="all"/>
The field 'all' now needs to be multivalued.

Location aware search

I am trying out location-aware search with the spatial example found at
http://www.ibm.com/developerworks/java/library/j-spatial/#indexing.approaches.
The schema.xml has a geohash field, but this field is not present in any of the .osm files (in the data folder) that are indexed. I don't understand how its value is assigned, given that when I run this query
http://localhost:8983/solr/select/?q=_val_:"recip (ghhsin(geohash(44.79, -93), geohash, 3963.205), 1, 1, 0)"^100
the result set includes a geohash value. How does this happen? Please help.
The Solr wiki has a pretty good page on how spatial search can be done with Solr 1.5+.
To summarize, your schema defines 'geohash' typed fields:
<fieldtype name="geohash" class="solr.GeoHashField"/>
<field name="destination" type="geohash" indexed="true" stored="true"/>
Data feeders pass in geohashed coordinates:
<field name="destination">cbj1pb56p4b</field> <!-- 45.17614 -93.87341 -->
You probably should go back to using simple latitude and longitude coordinates to start off with. There are better docs for it.
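For readers wondering how a coordinate pair turns into a string like the one in the sample above, here is a minimal sketch of the publicly documented geohash scheme. It is not Solr's own code, and whether a particular feeder or field type does this encoding for you is outside what this thread confirms; the 11-character length is chosen only to match the sample value.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=11):
    """Encode a latitude/longitude pair with the standard geohash scheme."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(45.17614, -93.87341))
```

Nearby points share a common prefix, which is what makes the encoding usable for proximity matching; the printed value can be compared against the destination value in the sample.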