Facet query will give wrong output on dynamicfield in solr


I have a dynamicField named 'pa_mydynamicfieldname' in Solr 4.0.
I index the values of this field after encoding them with System.Web.HttpUtility.UrlEncode(pa_mydynamicfieldname),
for example: 2.2+GHz+Intel+Pentium+Dual-Core+E2200
When I apply a facet query, the output is:
<lst name="facet_fields">
<lst name="pa_mydynamicfieldname">
<int name="2.2">1</int>
<int name="2.5">1</int>
<int name="core">1</int>
<int name="dual">1</int>
<int name="e2200">1</int>
<int name="ghz">1</int>
<int name="intel">1</int>
<int name="pentium">1</int>
</lst>
</lst>
Instead, I want the output to be:
<lst name="facet_fields">
<lst name="pa_mydynamicfieldname">
<int name="2.2+GHz+Intel+Pentium+Dual-Core+E2200">1</int>
</lst>
</lst>
How can I do this in Solr when applying a facet query?
Updated on 15-May-13
In the schema, the text field type is defined as:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
And dynamic field is defined as:
<dynamicField name="pa_*" type="text" indexed="true" stored="true" multiValued="true" required="false" />
We need it to be a multi-valued field, because a document may have multiple values defined for each product.
Please help me.
Thanks

To get the behavior you want, you will need to change the fieldType for the dynamic field in your schema.xml. Currently, your pa_mydynamicfieldname field is defined with type="text" and multiValued="true". A text field is tokenized at index time, and faceting reports the indexed terms, so every token comes back as its own facet value. This is producing the behavior you show, with individual words/tokens being returned as facet values.
Since you want to keep the original value as you submit it, please change your fieldType to a plain string, which is not tokenized:
<dynamicField name="pa_*" type="string"
indexed="true" stored="true"/>
If a document still needs several values in this field, you can keep multiValued="true" on the string field; it is the tokenized text type, not multiValued, that splits the value into words.
Or you can alternately take advantage of the predefined string based dynamic field defined in the example schema.xml:
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
You will need to reindex your data after making this change to your schema.xml for new field types to be stored properly and reflected in the search results.
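To see why the text type produces one facet value per word while a string type keeps the value whole, here is a toy model in Python. It is not Solr's analyzer: toy_text_terms is a hypothetical stand-in that splits on anything other than letters, digits and dots and lowercases the result, and facet_counts just counts documents per indexed term, which is roughly what field faceting reports.

```python
import re
from collections import Counter


def toy_text_terms(value):
    """Rough stand-in for a tokenized, lowercased text field (NOT Solr's real analyzer)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9.]+", value)]


def string_terms(value):
    """A string field indexes the whole value as a single term."""
    return [value]


def facet_counts(docs, analyze):
    """Count, per indexed term, how many documents contain it.

    docs is a list of documents; each document is a list of values,
    mirroring a multi-valued field.
    """
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            terms.update(analyze(value))
        counts.update(terms)
    return dict(counts)


docs = [["2.2+GHz+Intel+Pentium+Dual-Core+E2200"]]

print(facet_counts(docs, toy_text_terms))  # one entry per token
print(facet_counts(docs, string_terms))    # one entry for the whole value
```

The document is a list of values on purpose: a multi-valued string field would simply contribute one whole-value term per entry.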

Related

Solr - custom text field is being processed as Long instead of Text/String

I'm running Solr 7.5 for a Sitecore 9.2 instance. I added a custom field type to the Solr schema (/conf/managed-schema.xml), so that I can search on string fields case-insensitively, based on this post
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have the field in my site's solr config:
<field fieldName="c_item_part_number_ci" returnType="string_ci">Feature.Products.ComputedFields.ItemPartNumberField,Feature.Products</field>
But in my Solr admin it shows up as a LongPoint field, and when I try to index my items I get an error, because I'm passing text where a number is expected. (The field is called Item Part Number, but it can contain text, in this case "Compact-Item-20".)
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">246</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.lang.NumberFormatException</str>
</lst>
<str name="msg">ERROR: [doc=sitecore://web/{46730869-114b-47ab-9c71-218fbe858caf}?lang=en&ver=1&ndx=sitecore_web_index] Error adding field 'c_item_part_number_ci'='CAB' msg=For input string: "CAB"</str>
<int name="code">400</int>
</lst>
</response>

No "content" field created when indexing PDF with solr

I have successfully indexed PDFs using the POST command as described in the following link: http://makble.com/how-to-extract-text-from-pdf-and-post-into-solr
Terms stored within an indexed PDF file can be queried and can be found using general queries or the text field.
However, I do not see a "content" field generated the way the other PDF-related fields are. I tried editing the managed-schema file to add the field:
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="content" dest="text"/>
I get the following error when I attempt to reload the core:
<str name="msg">Error handling 'reload' action</str>
<str name="trace">
org.apache.solr.common.SolrException: Error handling 'reload' action
at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:110)
at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
My solrconfig.xml has this:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
I would like to have the "content" field available to perform search only for the text located within the indexed pdf files.

1) Do not manually edit the schema file. Instead use the Schema API.
2) fmap.content maps the content field to the _text_ field in your case.
If you have a content field already defined, then just removing this particular parameter from the ExtractingRequestHandler definition should do the job.
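As a rough sketch of point 1, the request below adds the field through the Schema API instead of editing managed-schema by hand. The base URL, the core name "mycore" and the helper add_field_request are assumptions made for illustration, and indexed is set to true here (unlike the hand-edited definition in the question) on the assumption that you want to search on the field. The request is only built, not sent.

```python
import json
from urllib.request import Request


def add_field_request(base_url, core, field):
    """Build (but do not send) a Schema API request that adds one field."""
    body = json.dumps({"add-field": field}).encode("utf-8")
    return Request(
        url=f"{base_url}/{core}/schema",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical core name and host; adjust to your installation.
request = add_field_request(
    "http://localhost:8983/solr",
    "mycore",
    {
        "name": "content",
        "type": "text_general",
        "indexed": True,   # assumption: the field should be searchable
        "stored": True,
        "multiValued": True,
    },
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually apply it: urllib.request.urlopen(request)
```

With the field in place and fmap.content removed from the handler defaults, the extracted text should land in content rather than in _text_.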

Solr Sunspot non-indexed field

Solr (via Lucene) supports different ways to indicate the way a field is indexed in a document: indexed, tokenized, stored,...
I'm looking for a way to have fields that are stored in Solr but are not indexed. Is there a way to achieve that in Sunspot?

Sunspot's configuration DSL supports a :stored => true option for many of its default types. For a stored string, that looks like:
searchable do
string :name, :stored => true
end
This generates a field name of name_ss corresponding to the following dynamicField already present in Sunspot's standard schema:
<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>
You can also create your own custom field or dynamicField in your schema.xml to be stored but not indexed, and then use the Sunspot 1.2 :as option to specify a corresponding field name.
For example, here is a more verbose version of the above. In your schema:
<dynamicField name="*_stored_string" type="string" indexed="false" stored="true" />
And in your model:
searchable do
string :name, :as => 'name_stored_string'
end

You can try:
http://localhost:8983/solr/admin/luke?numTerms=0
Then use XPath or a regex to read each field's schema attribute, whose flags include:
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
You will get something like:
<lst name="field">
<str name="type">stringGeneralType</str>
<str name="schema">--SM---------</str>
</lst>
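A minimal sketch of reading those flags in Python, assuming the Luke response has been fetched as XML and that each field is a <lst> whose name attribute is the field name and which has a <str name="schema"> child. The sample document and the field names in it are made up for illustration; the check relies only on the letters I and S appearing in the flag string, not on their position.

```python
import xml.etree.ElementTree as ET


def stored_not_indexed(luke_xml):
    """Return names of fields whose schema flags contain S (Stored) but not I (Indexed)."""
    root = ET.fromstring(luke_xml)
    names = []
    for lst in root.iter("lst"):
        flags = lst.find("str[@name='schema']")
        if flags is None or flags.text is None:
            continue  # not a field entry
        if "S" in flags.text and "I" not in flags.text:
            names.append(lst.get("name"))
    return names


# Hypothetical, trimmed-down Luke response for illustration only.
SAMPLE = """<response>
  <lst name="fields">
    <lst name="name_stored_string">
      <str name="type">string</str>
      <str name="schema">--S----------</str>
    </lst>
    <lst name="name_ss">
      <str name="type">string</str>
      <str name="schema">I-S----------</str>
    </lst>
  </lst>
</response>"""

print(stored_not_indexed(SAMPLE))
```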

SOLR DataImportHandler does not evaluate expressions

I'm trying to use the Solr DataImportHandler to feed data. Configuration was simple and straightforward, and everything worked fine when I was importing only one field from the root entity.
But when I tried to import fields from nested entities, it didn't work, and I'm really puzzled and stuck.
Here is the relevant snippet from my data config:
<dataConfig>
<dataSource ... />
<document>
<entity name="a" query="select id, b_id from a" pk="id">
<entity name="b" query="select title from b where id ='${a.b_id}'">
<field column="title" name="title" />
</entity>
</entity>
</document>
</dataConfig>
When I debug the import using the DIH Development Console with verbose switched on, I see something like:
...
<lst name="document#3">
<str>----------- row #1-------------</str>
<str name="ID">PST_210-SI.10 </str>
<str name="B_ID">6c2r3490seeqvb86pgb4c4trf9</str>
<str>---------------------------------------------</str>
<lst name="entity:b">
<str name="query">select title from b where id =''</str>
<str name="query">select title from b where id =''</str>
<str name="query">select title from b where id =''</str>
<str name="time-taken">0:0:0.1</str>
<str name="time-taken">0:0:0.1</str>
<str name="time-taken">0:0:0.1</str>
</lst>
</lst>
I think the interesting point is the three queries in entity b, where the id value is empty. It seems to me that ${a.b_id} is not being evaluated, but I can't find out why.
Can anyone help, please?
Thanks in advance.

Ha, as usual: after spending the whole afternoon looking for a solution, I ran out of ideas and asked the community a question, and then suddenly found the solution myself :)
The catch was case sensitivity. If you look closely at the verbose XML output, the column name is for some reason in upper case: <str name="B_ID">. So I tried the expression ${a.B_ID} and it works!
Maybe the upper case is specific to the Oracle JDBC driver.
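The effect can be reproduced with a toy placeholder resolver in Python. This is not DIH's implementation; resolve is a hypothetical helper that looks names up case-sensitively and substitutes an empty string for anything it cannot find, which matches the empty id = '' seen in the verbose output.

```python
import re


def resolve(template, namespaces):
    """Replace ${entity.column} placeholders; unknown names become ''."""
    def lookup(match):
        entity, _, column = match.group(1).partition(".")
        return str(namespaces.get(entity, {}).get(column, ""))
    return re.sub(r"\$\{([^}]+)\}", lookup, template)


# Row as returned for entity "a": the driver reports upper-case column names.
row_a = {"ID": "PST_210-SI.10", "B_ID": "6c2r3490seeqvb86pgb4c4trf9"}

print(resolve("select title from b where id ='${a.b_id}'", {"a": row_a}))
print(resolve("select title from b where id ='${a.B_ID}'", {"a": row_a}))
```

The first call leaves the id empty because no key named b_id exists; the second fills it in.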

Faceting with Solr using "string" fields, "text" fields and "copy" fields

I have a problem with Solr and faceting and am wondering if anyone knows of a fix. I have a workaround for it at the moment, but I really want to work out why my query isn't working.
Here is my Schema, simplified to make it easier to follow:
<fields>
<field name="uniqueid" type="string" indexed="true" required="true"/>
<!-- Indexed and Stored Field -->
<field name="recordtype" type="text" indexed="true" stored="true"/>
<!-- Facet Version of fields -->
<field name="frecordtype" type="string" indexed="true" stored="false"/>
</fields>
<!-- Copy fields for facet searches -->
<copyField source="recordtype" dest="frecordtype"/>
As you can see, I have a case-insensitive field called recordtype, and it is copied to a case-sensitive field, frecordtype, which does not tokenize the text. This is because Solr returns the indexed value rather than the stored value in the faceting results.
When I try the following query:
http://localhost:8080
/solr
/select
?version=2.2
&facet.field=%7b!ex%3dfrecordtype%7dfrecordtype
&facet=on
&fq=%7b!tag%3dfrecordtype%7dfrecordtype%3aLarge%20Record
&fl=*%2cscore
&rows=20
&start=0
&qt=standard
&q=text%3a%25
I don't get any results; however, the faceting still shows there is 1 record.
<result name="response" numFound="0" start="0" />
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="frecordtype">
<int name="Large Record">1</int>
<int name="Small Record">12</int>
<int name="Other">1</int>
</lst>
</lst>
<lst name="facet_dates" />
</lst>
However, if I change the filter query (line 7 only) to be on "recordtype" instead of frecordtype:
http://localhost:8080
/solr
/select
?version=2.2
&facet.field=%7b!ex%3dfrecordtype%7dfrecordtype
&facet=on
&fq=%7b!tag%3dfrecordtype%7drecordtype%3aLarge%20Record
&fl=*%2cscore
&rows=20
&start=0
&qt=standard
&q=text%3a%25
I get the 1 result back that I want.
<result name="response" numFound="1" start="0" />
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="frecordtype">
<int name="Large Record">1</int>
<int name="Small Record">12</int>
<int name="Other">1</int>
</lst>
</lst>
<lst name="facet_dates" />
</lst>
So my question is: is there something I need to do to get the first version of the query to return the results I want? Perhaps it has something to do with URL encoding? Any hints from Solr gurus or anyone else would be much appreciated.
NOTE: This isn't necessarily a faceting question, as the faceting is actually working. It's more a query question, in that I can't perform a query on a "string" field even though the case and spacing are exactly the same as in the indexed version.
EDIT: For more information on faceting you can check out these links:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
Thanks
Dave

You need quotes around the value. E.g.
frecordtype:"Large Record"
works, whereas
frecordtype:Large Record
searches for Large in frecordtype, which brings back nothing, and then for Record in Solr's default field.
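A small Python sketch of building the quoted filter query and URL-encoding it. The helper build_query and the default q=*:* are assumptions for illustration; the tag/ex local params follow the query in the question.

```python
from urllib.parse import urlencode, parse_qs


def build_query(field, value, q="*:*"):
    """Build a Solr query string with a quoted, tagged filter on a string field."""
    # Escape backslashes and double quotes so the value stays one phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", q),
        ("facet", "on"),
        ("facet.field", "{!ex=%s}%s" % (field, field)),
        ("fq", '{!tag=%s}%s:"%s"' % (field, field, escaped)),
        ("rows", "20"),
        ("start", "0"),
    ]
    return urlencode(params)


query_string = build_query("frecordtype", "Large Record")
print("http://localhost:8080/solr/select?" + query_string)
```

Letting urlencode handle the escaping avoids hand-writing sequences such as %7b and %3a and makes it harder to lose the quotes around a value that contains spaces.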
