SOLR index on pdate field included in search results - apache

I am migrating from SOLR 4.10.2 to 8.1.1. For some reason, in the 8.1.1 core, a pdate index named IDX_ExpirationDate is appearing as a field in the search results documents.
I have several other indexes that are defined and (correctly) do not appear in the results. But the index I am having trouble with is the only one based on a pdate.
Here is a sample 8.1.1 response that demonstrates the issue:
"response":{"numFound":58871,"start":0,"docs":[
{
"id":"11111",
"ExpirationDate":"2018-01-26T00:00:00Z",
"_version_":1641033044033798170,
"IDX_ExpirationDate":["2018-01-26T00:00:00Z"]},
{
"id":"22222",
"ExpirationDate":"2018-02-20T00:00:00Z",
"_version_":1641032965380112384,
"IDX_ExpirationDate":["2018-02-20T00:00:00Z"]},
ExpirationDate is supposed to be there, but IDX_ExpirationDate should not. I know that I can probably keep using date, but it is deprecated, and part of the reason for upgrading to 8.1.1 is to use the latest non-deprecated stuff ;-)
I have an index named IDX_ExpirationDate based on a field called ExpirationDate that was a date field in 4.10.2:
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<field name="IDX_ExpirationDate" type="date" indexed="true" stored="false" multiValued="true" />
<field name="ExpirationDate" type = "date" indexed = "true" stored = "true" />
<copyField source="ExpirationDate" dest="IDX_ExpirationDate"/>
In the 8.1.1 core, I have this configured as a pdate:
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false" multiValued="true" />
<field name="ExpirationDate" type = "pdate" indexed = "true" stored = "true" />
<copyField source="ExpirationDate" dest="IDX_ExpirationDate"/>

Fixed.
According to Shawn Heisey on the solr-user mailing list, the pdate type has docValues="true", and a docValues field defaults to useDocValuesAsStored="true", which makes it appear in results even with stored="false".
So I changed IDX_ExpirationDate by adding useDocValuesAsStored="false", reloaded the index, and it no longer appears in the results:
<field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false" multiValued="true" useDocValuesAsStored="false"/>
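To make the rule concrete, here is a minimal sketch of when a field is returned by fl=*. It is a simplified model of the behaviour described above, not Solr code, and the function name returned_by_default is made up for illustration.

```python
def returned_by_default(stored, doc_values, use_doc_values_as_stored=True):
    """Simplified model (an assumption, not Solr source): a field is returned
    for fl=* if it is stored, or if it has docValues and
    useDocValuesAsStored is left at its default of true."""
    return stored or (doc_values and use_doc_values_as_stored)


# The three field definitions discussed in this thread.
fields = {
    "ExpirationDate": dict(stored=True, doc_values=True),
    "IDX_ExpirationDate (before fix)": dict(stored=False, doc_values=True),
    "IDX_ExpirationDate (after fix)": dict(
        stored=False, doc_values=True, use_doc_values_as_stored=False
    ),
}

for name, attrs in fields.items():
    print(name, "->", returned_by_default(**attrs))
```

Only the last definition keeps the field searchable while hiding it from the returned documents.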

Related

Solr + schema.xml creating custom FieldType Object

This is only an example, but it will help me get further.
I have an object "person" with the fields [age, name].
My schema.xml:
<field name="age" type="string" indexed="true" stored="false"/>
<field name="name" type="string" indexed="true" stored="false"/>
Everything is OK, but I want to add one more field, "relation" (or parents, children, etc.):
Person[age,name,relation] -> Relation also has [age,name]
How can I add a Relation field type to my schema.xml?
<field name="age" type="string" indexed="true" stored="false"/>
<field name="name" type="string" indexed="true" stored="false"/>
<field name="relation" type="???" indexed="true" stored="false"/>
I want to add a field that takes all existing fields, like this:
<field name="field1" type="string"/>
<field name="field2" type="string"/>
<field name="field3" type="string"/>
<field name="field4" type="field1,field2,field3"/>
Solr doesn't really support what you want. One option is to index a multivalued field that contains ids pointing to the other documents (any reason why the age field is a string and not an int?):
<field name="id" type="int" indexed="true" stored="false"/>
<field name="age" type="string" indexed="true" stored="false"/>
<field name="name" type="string" indexed="true" stored="false"/>
<field name="relation" type="int" multiValued="true" indexed="true" stored="false" />
.. and then query all documents with a given relation when displaying a document (making two queries to Solr).
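As a rough sketch of the two-query approach, the snippet below only builds the query strings; it does not talk to Solr. The function names are hypothetical, and the field names id and relation are taken from the schema above.

```python
from urllib.parse import urlencode


def person_query(person_id):
    # First query: fetch the document being displayed.
    return urlencode({"q": f"id:{person_id}"})


def related_docs_query(person_id):
    # Second query: every document whose multivalued "relation"
    # field contains the id of the displayed document.
    return urlencode({"q": f"relation:{person_id}"})


print(person_query(42))
print(related_docs_query(42))
```

Each string can be appended to the select handler URL of your core.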
You can also use nested child documents, but it requires a bit more handling (since everything is contained in one document, you'll have to update everything together).
Solr prefers data to be denormalized, and a multivalued field is a step in that direction. But as @MatsLindh said, it involves two queries, because most of the time the child entities are more than a single field (an array of strings vs. an array of entities).
(Parent and child in your case are Person and "relation".)
Nested child documents are closer to an object relation, like in other frameworks. You have parent documents and child documents, Solr maintains the relationship, and you need a field that distinguishes parents from children. The good parts are:
You can get the parent documents whose children match on a child field
You can get all child documents for parents matching on a parent field
It takes only one query
Nested documents are a recent addition. We use Lucidworks to interact with Solr, and they suggested not using nested documents, so we ended up with a multivalued field. But if your infrastructure allows it, and since Solr itself has the feature, I see nothing wrong with using it.
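For the nested-document route, the parent/child lookups are usually written with Solr's block join query parsers ({!parent which=...} and {!child of=...}). The sketch below only assembles the query strings. The doc_type field that marks parent documents is an assumption for illustration and is not part of the schema in the question.

```python
def parents_with_matching_child(parent_filter, child_query):
    # Returns parent documents that have at least one child matching child_query.
    return f'{{!parent which="{parent_filter}"}}{child_query}'


def children_of_matching_parent(parent_filter, parent_query):
    # Returns the child documents of parents matching parent_query.
    return f'{{!child of="{parent_filter}"}}{parent_query}'


# "doc_type:person" is a hypothetical marker that identifies ALL parent documents.
print(parents_with_matching_child("doc_type:person", "name:Anna"))
print(children_of_matching_parent("doc_type:person", "name:Bob"))
```

Note that the filter inside which/of has to match every parent document, not just the ones you are searching for.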

Solr query search for multiple instances for single keyword

I'm stuck on this one issue. What I want to do is query a multivalued field and check whether a value occurs at least twice. For example, the field must contain "FREE","FREE" and not just "FREE" or "FREE","IN_USE".
Field
<field name="point_statusses" type="string" indexed="true" stored="true" multiValued="true" />
Type
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
SQL
GROUP_CONCAT(cp.status) as point_statusses
Clarification:
I have an object that has multiple plugs, and each plug has a status of FREE, IN_USE or ERROR. What I want to do is filter on the objects that have two plugs with status FREE, and I can't change the structure of the schema.xml. How do I query for this?
Unfortunately, it cannot be done without applying any changes to schema, because solr.StrField does not preserve term frequency information.
Quote from schema.xml:
...
1.2: omitTermFreqAndPositions attribute introduced, true by default
except for text fields.
...
However, if you can apply some changes, then the following will work (tested on Solr 4.5.1):
1) Make one of the following changes to the schema:
Change the field type to text_general (or any solr.TextField type):
<field name="point_statusses" type="text_general" indexed="true" stored="true" multiValued="true" />
OR add omitTermFreqAndPositions="false" to point_statusses definition:
<field name="point_statusses" type="string" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>
2) Filter by term frequency.
Examples:
Search documents having exactly 2 'FREE' point_statusses:
{!frange l=2 u=2}termfreq(point_statusses,'FREE')
Or from 2 to 3 'FREE' point_statusses:
{!frange l=2 u=3}termfreq(point_statusses,'FREE')
The final solr query may look like this:
http://localhost:8983/solr/stack20746538/select?q=*:*&fq={!frange l=2 u=3}termfreq(point_statusses,'FREE')
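Since the filter contains braces, spaces and quotes, it is safer to URL-encode it when building the request in code. Here is a small sketch using only the standard library; the helper name termfreq_filter is made up, and the core name is the one from the URL above.

```python
from urllib.parse import urlencode


def termfreq_filter(field, term, low, high):
    # Builds an frange filter that keeps documents in which `term`
    # occurs between `low` and `high` times (inclusive) in `field`.
    return f"{{!frange l={low} u={high}}}termfreq({field},'{term}')"


params = urlencode({
    "q": "*:*",
    "fq": termfreq_filter("point_statusses", "FREE", 2, 3),
})
url = "http://localhost:8983/solr/stack20746538/select?" + params
print(url)
```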

Solr, multiple indexes

I want to index 2 different entities (2 tables in SQL in this case) into my Lucene index. One table containing products, another containing news items.
To use the same search method (query) for both products and news items, I understand they must be in the same index, so a multi-core Solr setup wouldn't work, right?
In data-config.xml I have defined 2 document types with corresponding entities.
In schema.xml I have defined fields for products and news items as well.
In my database design, the product table's unique key is called "ProductID", whereas the news item table's unique key is called "Id" (this is set by the CMS I'm using).
In data-config.xml, should I just map both unique ids to the same "name"? Would that be all it takes to make this work?
Am I following the right approach here?
Example of what I'm thinking;
data-config.xml
<!-- Products -->
<document name="products">
<entity name="product" dataSource="sqlServer" pk="ProductID" query="SELECT
ProductID,
ProductNumber,
ProductName
FROM EcomProducts">
<field column="ProductID" name="Id"/>
<field column="ProductNumber" name="ProductNumber"/>
<field column="ProductName" name="ProductName"/>
</entity>
</document>
<!-- News items -->
<document name="newsitems">
<entity name="newsitems" dataSource="sqlServer" pk="id" query="SELECT
Id,
NewsItemTitle,
NewsItemContent
FROM ItemType_NewsItems">
<field column="Id" name="Id"/>
<field column="NewsItemTitle" name="NewsItemTitle"/>
<field column="NewsItemContent" name="NewsItemContent"/>
</entity>
</document>
schema.xml
<!-- Products -->
<field name="Id" type="text_general" indexed="true" stored="true" required="true" />
<field name="ProductNumber" type="text_general" indexed="true" stored="true" required="false" />
<field name="ProductName" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<!-- News items (tips and tricks) -->
<field name="Id" type="text_general" indexed="true" stored="true" required="true" />
<field name="NewsItemTitle" type="text_general" indexed="true" stored="true" required="false" />
<field name="NewsItemContent" type="text_general" indexed="true" stored="true" required="false" />
If you want to search them together, I would map the metadata to a common schema. Perhaps both ProductName and NewsItemTitle would go in a "title" field. Some metadata will be unique to each type. Or you can index the info twice, once as ProductName and once as title.
Unless you can be sure the IDs will always, always be unique across the two sources, you may want to prefix them. It is also very handy to have a field that marks the type of each document. That allows searching only one type and, in DIH, you can use it to delete only one type.
In your SQL, you can add columns like this:
concat('product-',cast(ProductId as char)) as id,
'product' as type,
That is MySQL syntax, it might need tweaking for SQLServer.
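The same idea expressed outside SQL, as a minimal sketch: prefix each id with its type, add a type field, and map the name or title to a shared field. The function to_solr_doc and the sample values are made up for illustration.

```python
def to_solr_doc(doc_type, source_id, title):
    # Prefix the source id with the document type so that ids from the
    # two tables can never collide, and keep the type in its own field.
    return {
        "id": f"{doc_type}-{source_id}",
        "type": doc_type,
        "title": title,
    }


# Hypothetical rows: both tables happen to use the same numeric key.
product = to_solr_doc("product", 17, "Garden hose")
news = to_solr_doc("news", 17, "Spring sale")
print(product)
print(news)
```

With the type field in place, a filter such as fq=type:product restricts a search to one kind of document.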

What is the use of "multiValued" field type in Solr?

I'm new to Apache Solr. Even after reading the documentation part, I'm finding it difficult to clearly understand the functionality and use of the multiValued field type property.
What does Solr do internally with a field that is marked as multiValued?
What is the difference in indexing between a field that is multiValued and one that is not?
Can somebody explain with a good example?
Doc says:
multiValued=true|false
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
A multivalued field is useful when there is more than one value for the field. An easy example is tags: a document can have multiple tags that need to be indexed. If the tags field is multivalued, the Solr response will return a list instead of a single string value. One point to note is that you need to submit one field element per value, like:
<field name="tags">tag1</field>
<field name="tags">tag2</field>
...
<field name="tags">tagn</field>
Once all the values are indexed, you can search or filter results by any value, e.g. you can find all documents with tag1 using a query like
q=tags:tag1
or use the tags to filter out results like
q=query&fq=tags:tag1
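If you index with JSON instead of XML, a multivalued field is simply a list. This is a minimal sketch that only builds the payload and does not send it; the document id is made up.

```python
import json

# One document with three values in the multivalued "tags" field.
doc = {"id": "1", "tags": ["tag1", "tag2", "tagn"]}

# JSON body containing a list of documents, ready to be posted
# to the update handler of a core.
payload = json.dumps([doc])
print(payload)
```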
multiValued defines in the schema whether the field is allowed to have more than one value.
For instance:
If I have a field called id with multiValued=false, indexing a document such as this:
doc {
id : [ 1, 2]
...
}
would cause an exception to be thrown in the indexing thread, and the document would not be indexed (schema validation fails).
On the other hand, if I do have multiple values for a field, I would set multiValued=true so that indexing is done correctly, for example:
doc {
id : 1
keywords: [ hello, world ]
...
}
In this case you would define "keywords" as a multiValued field.
I use multivalued fields only with copyField. Think of it this way: all fields are single-valued unless they are the destination of a copyField. For example, I have the following fields:
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="subject" type="string" indexed="true" stored="true"/>
<field name="location" type="string" indexed="true" stored="true"/>
If I want to query a single field and still search all 4 fields above, I need copyField: first create a new field called 'all', then copy everything into it:
<field name="all" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="all"/>
The field 'all' now needs to be multivalued.
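To see why the destination has to be multivalued, here is a toy model of the copy step. It is an illustration only, not how Solr implements copyField, and the sample document is made up.

```python
def apply_copy_field(doc, dest="all"):
    # Toy model: every source field contributes its own separate value
    # to the destination, so the destination ends up with several values.
    copied = [value for name, value in doc.items() if name != dest]
    return {**doc, dest: copied}


doc = {"id": "1", "name": "Anna", "subject": "math", "location": "Oslo"}
result = apply_copy_field(doc)
print(result["all"])
```

Four source fields produce four values in 'all', which a single-valued field could not hold.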

Problem with Solr dynamic/copy Field

I have a dynamic field in schema.xml:
<dynamicField name="sec_*" type="text" indexed="true" stored="false"/>
and a regular field:
<field name="Contents" type="text" indexed="true" stored="false" multiValued="true"/>
The dynamic field is copied to the Contents field:
<copyField source="sec_*" dest="Contents"/>
Now, when I search on a dynamic field, e.g. "sec_1069:risk", Solr filters out documents that do not contain the dynamic field sec_1069. Can anybody tell me how to make Solr not filter out documents that don't have that dynamic field?
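Try sec_1069:risk OR (*:* -sec_1069:[* TO *])
The *:* is needed because a purely negative clause nested inside an OR matches nothing on its own.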
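If you build this query in code, a small helper keeps the two clauses in sync. This is a sketch with a made-up function name. It wraps the negative clause as (*:* -field:[* TO *]), on the assumption that a bare negative clause inside an OR does not match documents that lack the field.

```python
def match_or_missing(field, term):
    # Matches documents where `field` contains `term`,
    # OR documents that have no value in `field` at all.
    return f"{field}:{term} OR (*:* -{field}:[* TO *])"


print(match_or_missing("sec_1069", "risk"))
```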