SOLR 6 - indexing documents - indexing

I need to index documents on a SOLR server and update a specific field. I am using post.jar on Windows to index the documents.
First question: is it possible to set the value of the required field directly from the post tool?
If not: the text field is indexed but not stored. As a result, when I update the field, the text field loses all its content. I am updating the field through the HTTP update handler (POST). The POST parameters are: {"id":"D:\TESTNEWATTACH\AnexaNr.docx","PCC_TABLENAME":{"set":"PCC_CRM_ATTACH"}}
The main question is: how can I index a document and set a field belonging to that document without losing the ability to search the document content?

To update just a single field in a document, all fields have to be stored. If the fields are not stored, you'll lose their content when doing the update, because internally the process is: retrieve the document, update it, resubmit it.
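As a minimal sketch of the update described above, the request body for setting one field can be built like this. The helper name atomic_set is hypothetical; the id and field values are taken from the question. Building the body with json.dumps also escapes the backslashes in the Windows path, which raw JSON requires.

```python
import json


def atomic_set(doc_id, field, value):
    """Build one atomic-update document that sets a single field (hypothetical helper)."""
    return {"id": doc_id, field: {"set": value}}


# Values taken from the question; the raw string keeps the backslashes literal.
update = [atomic_set(r"D:\TESTNEWATTACH\AnexaNr.docx", "PCC_TABLENAME", "PCC_CRM_ATTACH")]

# json.dumps escapes each backslash so the body is valid JSON.
body = json.dumps(update)
print(body)
```

The resulting string would be sent as the body of the POST to the update handler. It only preserves the other fields if they are stored, as explained above.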
The post tool supports giving arbitrary parameters to the update handler:
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
.. which you can use with literal.fieldname=value to provide a value for the field(s) directly in the post request.
literal.<fieldname>
Populates a field with the name supplied with the specified value for each document. The data can be multivalued if the field is multivalued.
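Since the values must be URL-encoded, one way to produce the parameter string is to build it programmatically. This is only a sketch: literal_params is a hypothetical helper, and the field name and value come from the question.

```python
from urllib.parse import urlencode


def literal_params(fields):
    """Turn {field: value} into a URL-encoded literal.<field>=<value> string (hypothetical helper)."""
    return urlencode({f"literal.{name}": value for name, value in fields.items()})


params = literal_params({"PCC_TABLENAME": "PCC_CRM_ATTACH"})
print(params)  # literal.PCC_TABLENAME=PCC_CRM_ATTACH
```

The printed string is what would be handed to the post tool as its params value, so that the field is set at indexing time and no later update is needed.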

Related

Apache Solr only return fields that value/query string was found in

I am just getting started with Apache Solr.
I have successfully run through the Apache tutorials and have now created my own collection and indexed my files.
Whilst the documentation is extensive I cannot find if there is a way to query all fields, but only return the fields that the search string/query was found in.
For example, if I have a file:
Filename: Weekly Report For Company X.pdf
Associated / indexed meta-data:
"id":"S:\\Weekly Reports\\JAN\\Weekly Report For Company X.PDF",
"date":["2017-11-02T19:14:07Z"],
"pdf_pdfversion":[1.6],
"company":["Microsoft"],
"access_permission_can_print_degraded":[true],
"subject":["weekly report; reports; weekly"],
"contenttypeid":["0x010100F29081EC69D67544A17D8172A093E42E"],
"dc_format":["application/pdf; version=1.6"],
If I query for "Weekly Report" I only want to return the 'id' and 'subject' fields as these are the only fields that contain the actual queried values. If other fields contained the string, I would want them returned too.
I'm leaning towards 'it cannot be done' (but hope I am wrong) as I liken it to a SQL query. It has to know what fields to return in the SQL statement and does not remove fields based on no matching string.
Since I don't know the matched fields before running the query I cannot use the filter list option at the point of executing the query.
Is this possible?
While this may not be precisely what you want, you could mimic similar behaviour with highlighting.
All you need to do is create a dismax query with qf listing all the fields you have (e.g. qf=id subject company; qf is documented as a space-separated list).
Then request highlighting on all of those fields (hl.fl=id,subject,company) and enable hl.requireFieldMatch, which makes Solr highlight a field only when the query matched in that field.
The response will then have a highlighting section that contains the ids of the matched documents and, for each of them, only the highlighted content of the fields that matched.
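A rough sketch of this approach: build the request parameters, then read the matched field names out of the highlighting section of the JSON response. The function names and the sample response are hypothetical. The field list comes from the question, and the fields are assumed to be stored so that they can be highlighted.

```python
from urllib.parse import urlencode


def highlight_query(text, fields):
    """Build the query string for a dismax search with per-field highlighting."""
    return urlencode({
        "q": text,
        "defType": "dismax",
        "qf": " ".join(fields),          # qf is space-separated
        "hl": "true",
        "hl.fl": ",".join(fields),
        "hl.requireFieldMatch": "true",
    })


def matched_fields(response):
    """Map each document id to the sorted names of the fields that have highlights."""
    highlighting = response.get("highlighting", {})
    return {doc_id: sorted(snippets) for doc_id, snippets in highlighting.items()}


# Hypothetical response fragment in the shape Solr uses for highlighting.
sample = {
    "highlighting": {
        "S:\\Weekly Reports\\JAN\\Weekly Report For Company X.PDF": {
            "subject": ["<em>weekly</em> <em>report</em>; reports; weekly"],
            "id": ["S:\\<em>Weekly</em> Reports\\JAN\\..."],
        }
    }
}
print(highlight_query("Weekly Report", ["id", "subject", "company"]))
print(matched_fields(sample))
```

The application can then use the field names from matched_fields to decide which fields of each document to show.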

Lucene - exclude fields from being searched

I have a search index and require a lucene query which will conditionally search specified fields. The end result will be that if you're logged into the website, all fields will be searched, or if you're logged out, specified fields will be skipped by modifying the lucene query.
The closest I have at the moment is:
+(term1~ term2~) +_culture:([en-gb TO en-gb] [invariantifieldivaluei TO invariantifieldivaluei]) -FieldToIgnore1:(term1 term2) -FieldToIgnore2:(term1 term2)
The problem with this, however, is that if one of the search terms exists in one of the excluded fields (FieldToIgnore1 or FieldToIgnore2), the whole document is dropped from the results, because the exclusion clause matched.
How can this be modified so lucene doesn't even match against the fields to ignore?
Instead of qualifying your search via Lucene and the Smart Search Results webpart, have you tried modifying the searchability of the document fields themselves? You can set search parameters on the Page Type or on the index itself.
Go to Page Types --> [your doc type] --> Search fields, and set what fields are and aren't exposed to searching.
Version 9 gives you these settings in the Smart Search app. See these docs for details.

Adding an extra field to already indexed data Solr

I have indexed approximately 1000 documents in Solr. But all of them are missing a field. I need to add a field to all these documents, and this field will have the same value for all of them. I do not have access to these documents to index them again. Is there any way to do this without re-indexing all the data again?
Unless you've configured your schema to store all values, no, there is no usable way to add a field to the documents without reindexing. If all your fields are stored, you can use atomic updates to add a new field to a document, so you could query Solr for the ids of all existing documents and perform the update that way.
Otherwise you're going to have to go with the suggestion from @michielvoo and return a static value from the query string. You could also just append it in your application before returning the result to the user, or add the field as a default value for the request handler in solrconfig.xml, so that you can edit and change it server side.
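A minimal sketch of the atomic-update route, assuming all fields are stored and the ids have already been fetched from Solr. The ids, the field name source, and its value are hypothetical placeholders.

```python
import json


def add_field_updates(ids, field, value):
    """Build one atomic 'set' update per document id, all with the same value."""
    return [{"id": doc_id, field: {"set": value}} for doc_id in ids]


# Hypothetical ids, as they might come back from a query that returns only the id field.
ids = ["doc-1", "doc-2", "doc-3"]
body = json.dumps(add_field_updates(ids, "source", "legacy-import"))
print(body)
```

The body would be posted to the update handler, followed by a commit. For about 1000 documents a single request is likely enough, though batching is an option.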

SOLR query to check if a field is set

I added a new field called 'theField' to my documents. Some of them have the field set correctly, others don't.
How do I write a SOLR query that selects only the documents whose 'theField' field is set?
You can use a range query with both ends open, like:
theField:[* TO *]
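As a small illustration, the open-ended range query and its complement can be generated for any field name and URL-encoded for use as the q parameter. The helper names are hypothetical. The complement is written in the common "*:* -field:[* TO *]" form, which selects the documents where the field is missing.

```python
from urllib.parse import urlencode


def field_is_set(field):
    """Query matching documents that have any value in the field."""
    return f"{field}:[* TO *]"


def field_is_missing(field):
    """Query matching all documents except those with a value in the field."""
    return f"*:* -{field}:[* TO *]"


print(field_is_set("theField"))      # theField:[* TO *]
print(field_is_missing("theField"))  # *:* -theField:[* TO *]

# URL-encode the query before putting it in a request.
print(urlencode({"q": field_is_set("theField")}))
```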

How to avoid retrieve entire stored field from solr

I'm using Sunspot and Solr in a Rails app to search ebook contents. For the highlight feature I have to make ebook_content a stored field, but every time I query Solr it sends back the entire content of the book, which makes the query very slow.
How can I get the results without the stored field?
The fl parameter of Solr allows you to specify which fields you want returned in the result. If you had fields id, title, ebook_content, then you could use fl=id,title to omit the ebook_content field. I don't think there's support in Solr for getting all fields except one (e.g. -ebook_content).
Update
If you don't want to return the field in the normal results, but still want highlighting on that field, exclude the field as I described above, then turn on the highlighter:
hl=true
set the field(s) which should be highlighted:
hl.fl=ebook_content
and set the size of the highlighting fragment (in characters):
hl.fragsize=50
Your finished query looks something like this:
?q=search term&fl=id,title&hl=true&hl.fl=ebook_content&hl.fragsize=50
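When the query is sent from code, the space in the search term has to be URL-encoded. A short sketch that assembles the same parameters and lets the standard library do the encoding (the parameter values are the ones from the answer):

```python
from urllib.parse import urlencode

params = {
    "q": "search term",
    "fl": "id,title",           # return only these stored fields
    "hl": "true",               # turn on the highlighter
    "hl.fl": "ebook_content",   # highlight this field without returning it in full
    "hl.fragsize": 50,          # fragment size in characters
}
query_string = "?" + urlencode(params)
print(query_string)
```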