I'm using DataImportHandler to index data from Postgres.
I would like to store the record creation time so I can compare it to the actual object creation time later.
These records are updated (by id), so adding a "NOW" field won't do the trick: it would be overwritten on every update.
This is how I did it eventually:
1. Use multiValued
schema.xml:
<field name="creation_time" type="date" indexed="false" stored="true" required="false" multiValued="true" />
2. Add FirstFieldValueUpdateProcessorFactory to the update process chain, which will keep only the first value
solrconfig.xml under updateRequestProcessorChain:
<processor class="solr.FirstFieldValueUpdateProcessorFactory">
<str name="fieldName">creation_time</str>
</processor>
3. When indexing, use the Solr 4.0 atomic update "add" operation on this field:
{"creation_time": {"add":"2012-03-06T15:02:45.017Z"}}
The solution is taken from here:
https://issues.apache.org/jira/browse/SOLR-4468
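For reference, a minimal sketch of what the complete chain from step 2 might look like. The chain name keep-first-creation-time is made up for illustration, and the ordering is an assumption on my part: as far as I understand, atomic updates are merged with the stored document inside the distributed update processor, so the first-value processor is placed after it in order to see both the old and the newly added value.

```xml
<!-- solrconfig.xml: hypothetical chain name, chosen for illustration only -->
<updateRequestProcessorChain name="keep-first-creation-time">
  <!-- Assumption: the atomic "add" is merged with the existing document here -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- Keeps only the first value of creation_time and drops later ones -->
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">creation_time</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Must come last: this is the processor that actually writes to the index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

A chain that is not marked as default has to be selected per request, for example with update.chain=keep-first-creation-time.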
I am migrating from SOLR 4.10.2 to 8.1.1. For some reason, in the 8.1.1 core, a pdate index named IDX_ExpirationDate is appearing as a field in the search results documents.
I have several other indexes that are defined and (correctly) do not appear in the results. But the index I am having trouble with is the only one based on a pdate.
Here is a sample 8.1.1 response that demonstrates the issue:
"response":{"numFound":58871,"start":0,"docs":[
{
"id":"11111",
"ExpirationDate":"2018-01-26T00:00:00Z",
"_version_":1641033044033798170,
"IDX_ExpirationDate":["2018-01-26T00:00:00Z"]},
{
"id":"22222",
"ExpirationDate":"2018-02-20T00:00:00Z",
"_version_":1641032965380112384,
"IDX_ExpirationDate":["2018-02-20T00:00:00Z"]},
ExpirationDate is supposed to be there, but IDX_ExpirationDate should not. I know that I can probably keep using date, but it is deprecated, and part of the reason for upgrading to 8.1.1 is to use the latest non-deprecated stuff ;-)
I have an index named IDX_ExpirationDate based on a field called ExpirationDate that was a date field in 4.10.2:
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<field name="IDX_ExpirationDate" type="date" indexed="true" stored="false" multiValued="true" />
<field name="ExpirationDate" type = "date" indexed = "true" stored = "true" />
<copyField source="ExpirationDate" dest="IDX_ExpirationDate"/>
In the 8.1.1 core, I have this configured as a pdate:
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false" multiValued="true" />
<field name="ExpirationDate" type = "pdate" indexed = "true" stored = "true" />
<copyField source="ExpirationDate" dest="IDX_ExpirationDate"/>
Fixed.
According to Shawn Heisey on the solr-user mailing list, the pdate type defaults to docValues="true" and useDocValuesAsStored="true", which makes the field appear in results even though it is not stored.
So I changed the IDX_ExpirationDate field by adding useDocValuesAsStored="false", reloaded the index, and it no longer appears in the results:
<field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false" multiValued="true" useDocValuesAsStored="false"/>
I'm using Solr 7.1 (SolrCloud mode) and I don't have a requirement to enforce document uniqueness.
Hence I marked the id field (designated as the unique key) in the schema as required="false".
<field name="id" type="string" indexed="true" stored="false" required="false" multiValued="false" />
<uniqueKey>id</uniqueKey>
I am trying to index some documents from the Solr Admin UI without specifying the 'id' field:
{
"cat": "books",
"name": "JayStore"
}
I was expecting the document to be indexed successfully, but Solr throws an error saying 'mandatory unique key field id is missing'.
Could someone tell me what I'm doing wrong?
The uniqueKey field is required internally by Solr for certain features, such as using cursorMark - meaning that the field that is defined as a uniqueKey is required. It's also used for routing etc. inside SolrCloud by default (IIRC), so if it's not present Solr won't be able to shard your documents correctly. Setting it as not required in the schema won't relax that requirement.
But you can work around this by defining a UUID field and using a UUID Update Processor, as described in the old wiki. This generates a UUID for each document when you index it, so every document gets a unique identifier by default.
UUID is short for Universal Unique IDentifier. The UUID standard RFC-4122 includes several types of UUID with different input formats. There is a UUID field type (called UUIDField) in Solr 1.4 which implements version 4. Fields are defined in the schema.xml file with:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
in Solr 4, this field must be populated via solr.UUIDUpdateProcessorFactory.
<field name="id" type="uuid" indexed="true" stored="true" required="true"/>
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
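The chain above only runs if an update request selects it. One way to do that, sketched here on the assumption that you are on a version that supports initParams (Solr 5 and later, which includes the 7.1 from the question), is to make it the default for all update handlers:

```xml
<!-- solrconfig.xml: apply the "uuid" chain to every handler under /update -->
<initParams path="/update/**">
  <lst name="defaults">
    <!-- "uuid" is the chain name defined above -->
    <str name="update.chain">uuid</str>
  </lst>
</initParams>
```

Alternatively, pass update.chain=uuid as a request parameter, or mark the chain itself with default="true".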
I am using Solr 4.10. I need to change the relevance of documents based on a boost field and the document score. I have learned that I should use a function query for that. The boost field is defined in the schema as follows:
<field name="boost" type="float" stored="true" indexed="false" default="1.0"/>
My first question: can function queries be used on fields that are stored but not indexed?
When I use the schema above with the following query
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id&sort=pow(score,%20boost)%20asc
I get an error like
sort param could not be parsed as a query, and is not a field that exists in the index:
Then I changed the schema to
<field name="boost" type="float" stored="true" indexed="true" default="1.0"/>
That problem went away, but a new error appeared for this query:
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id,pow(score,%20boost)
The following error appeared:
<lst name="error">
<str name="msg">undefined field: "score"</str>
<int name="code">400</int>
</lst>
Where am I going wrong?
Was I right to change the attributes of the boost field?
I would recommend using a boost function and sorting just by score (this is the default, so no sort parameter is needed).
bf=linear(boost,100,0)
You may use other functions. That depends on your use case.
Just check out the solr docs for function queries.
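As a rough sketch of how that bf could be wired in, assuming the (e)dismax query parser: bf is a parameter of the dismax/edismax parsers, so it has no effect with the default lucene parser. The handler name /boosted is made up for illustration.

```xml
<!-- solrconfig.xml: hypothetical handler name, for illustration only -->
<requestHandler name="/boosted" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- bf is only understood by the dismax/edismax query parsers -->
    <str name="defType">edismax</str>
    <str name="df">keywords</str>
    <!-- Additive boost: adds 100 * boost + 0 to each document's score.
         In Solr 4.10 the boost field must be indexed for this to work. -->
    <str name="bf">linear(boost,100,0)</str>
  </lst>
</requestHandler>
```

Note that bf adds to the score. If you want the field to multiply the score instead, edismax also has a boost parameter that takes a function in the same way.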
I'm stuck on this one issue. What I want to do is query a multiValued field and check whether a value occurs at least twice. For example, the field must be "FREE","FREE" and not just "FREE" or "FREE","IN_USE".
Field
<field name="point_statusses" type="string" indexed="true" stored="true" multiValued="true" />
Type
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
SQL
GROUP_CONCAT(cp.status) as point_statusses
Clarification:
I have an object that has multiple plugs, and each plug has a status of FREE, IN_USE or ERROR. What I want to do is filter on the objects that have two plugs with status FREE, and I can't change the structure of schema.xml. How do I query for this?
Unfortunately, it cannot be done without applying any changes to schema, because solr.StrField does not preserve term frequency information.
Quote from schema.xml:
...
1.2: omitTermFreqAndPositions attribute introduced, true by default
except for text fields.
...
However, if you can apply some changes, then the following will work (tested on Solr 4.5.1):
1) Make one of the following changes to schema:
Change field to text_general (or any solr.TextField field);
<field name="point_statusses" type="text_general" indexed="true" stored="true" multiValued="true" />
OR add omitTermFreqAndPositions="false" to point_statusses definition:
<field name="point_statusses" type="string" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>
2) Filter by term frequency.
Examples:
Search documents having exactly 2 'FREE' point_statusses:
{!frange l=2 u=2}termfreq(point_statusses,'FREE')
Or from 2 to 3 'FREE' point_statusses:
{!frange l=2 u=3}termfreq(point_statusses,'FREE')
The final solr query may look like this:
http://localhost:8983/solr/stack20746538/select?q=*:*&fq={!frange l=2 u=3}termfreq(point_statusses,'FREE')
I'm new to Apache Solr. Even after reading the documentation, I'm finding it difficult to understand the purpose and use of the multiValued field property.
How does Solr internally treat a field that is marked as multiValued?
What is the difference in indexing between a field that is multiValued and one that is not?
Can somebody explain with a good example?
The doc says:
multiValued=true|false
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
A multivalued field is useful when there is more than one value present for the field. An easy example would be tags: there can be multiple tags that need to be indexed. If the tags field is multivalued, the Solr response will return a list instead of a single string value. One point to note is that you need to submit a separate field element for each value of the tags, like:
<field name="tags">tag1</field>
<field name="tags">tag2</field>
...
<field name="tags">tagn</field>
Once you have all the values indexed, you can search or filter results by any value, e.g. you can find all documents with tag1 using a query like
q=tags:tag1
or use the tags to filter out results like
q=query&fq=tags:tag1
multiValued defines in the schema whether the field is allowed to have more than one value.
For instance:
if I have a field called id which is multiValued=false, indexing a document such as this:
doc {
id : [ 1, 2]
...
}
would cause an exception to be thrown in the indexing thread and the document will not be indexed (schema validation will fail).
On the other hand if I do have multiple values for a field I would want to set multiValued=true in order to guarantee that indexing is done correctly, for example:
doc {
id : 1
keywords: [ hello, world ]
...
}
In this case you would define "keywords" as a multiValued field.
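Put together, a minimal sketch of the two pieces might look like this. The field types are assumptions (plain string fields), and the two fragments belong in different places: the field definitions go in schema.xml, while the add message is what gets posted to the update handler.

```xml
<!-- schema.xml fragment: id holds one value, keywords may hold several -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Update message: repeat the field element once per value -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="keywords">hello</field>
    <field name="keywords">world</field>
  </doc>
</add>
```

Repeating the id element in the same way would be rejected, because id is declared with multiValued="false".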
I use multivalued fields only with copyField, so think of it this way: every field is single valued unless it is a copyField destination. For example, I have the following fields:
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="subject" type="string" indexed="true" stored="true"/>
<field name="location" type="string" indexed="true" stored="true"/>
I want to query a single field and still search all 4 fields above, so I need to use copyField: first create a new field called 'all', then copy everything into 'all'.
<field name="all" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="all"/>
Now the field 'all' needs to be multiValued, because it receives one value from each source field.
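If copying every field with source="*" is broader than you want, a variation is to list the sources explicitly. This is only a sketch using the field names from the example above; it leaves id out on the assumption that you don't need it in the catch-all field.

```xml
<!-- schema.xml: copy only the fields that should be searchable via 'all' -->
<copyField source="name" dest="all"/>
<copyField source="subject" dest="all"/>
<copyField source="location" dest="all"/>
```

The destination still has to be multiValued, since each document contributes up to three values to it.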