Lucene - exclude fields from being searched


I have a search index and need a Lucene query that conditionally searches specified fields. The end result should be that if you're logged into the website, all fields are searched; if you're logged out, specified fields are skipped by modifying the Lucene query.
The closest I have at the moment is:
+(term1~ term2~) +_culture:([en-gb TO en-gb] [invariantifieldivaluei TO invariantifieldivaluei]) -FieldToIgnore1:(term1 term2) -FieldToIgnore2:(term1 term2)
The problem is that if one of the search terms occurs in FieldToIgnore1 or FieldToIgnore2, the document is dropped from the results entirely, because the negated clause for that field matched, even when the term also occurs in a field that should be searched.
How can this be modified so that Lucene doesn't match against the fields to ignore at all?

Instead of qualifying your search via Lucene and the Smart Search Results webpart, have you tried modifying the searchability of the document fields themselves? You can set search parameters on the Page Type or on the index itself.
Go to Page Types --> [your doc type] --> Search fields, and set what fields are and aren't exposed to searching.
Version 9 gives you these settings in the Smart Search app. See these docs for details.
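
If changing the index configuration is not an option, another approach is to name the fields that may be searched explicitly, instead of negating the ones to skip. The following is only a sketch of building such a query string. `Title` and `Summary` are hypothetical field names, and it assumes those fields are indexed individually so that Lucene's `field:(terms)` syntax can address them.

```python
# Sketch: build a Lucene query that only names the fields allowed for the
# current visitor. Field names Title/Summary are hypothetical placeholders.
CULTURE = "_culture:([en-gb TO en-gb] [invariantifieldivaluei TO invariantifieldivaluei])"
PUBLIC_FIELDS = ["Title", "Summary"]
MEMBER_FIELDS = PUBLIC_FIELDS + ["FieldToIgnore1", "FieldToIgnore2"]


def build_query(terms, searchable_fields, culture_clause=CULTURE):
    """Return a query with one fuzzy clause per searchable field.

    Terms are used as given; special characters are NOT escaped here.
    """
    fuzzy_terms = " ".join(f"{term}~" for term in terms)
    field_clauses = " ".join(
        f"{field}:({fuzzy_terms})" for field in searchable_fields
    )
    return f"+({field_clauses}) +{culture_clause}"


def query_for(terms, logged_in):
    """Pick the field list based on whether the visitor is logged in."""
    fields = MEMBER_FIELDS if logged_in else PUBLIC_FIELDS
    return build_query(terms, fields)


if __name__ == "__main__":
    print(query_for(["term1", "term2"], logged_in=False))
```

Because the restricted fields never appear in the logged-out query, a term that occurs in one of them can neither match nor exclude the document.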

Related

Apache Solr only return fields that value/query string was found in

I am just getting started with Apache Solr.
I have successfully run through the Apache tutorials and have now created my own collection and indexed my files.
Whilst the documentation is extensive I cannot find if there is a way to query all fields, but only return the fields that the search string/query was found in.
For example, if I have a file:
Filename: Weekly Report For Company X.pdf
Associated / indexed meta-data:
"id":"S:\\Weekly Reports\\JAN\\Weekly Report For Company X.PDF",
"date":["2017-11-02T19:14:07Z"],
"pdf_pdfversion":[1.6],
"company":["Microsoft"],
"access_permission_can_print_degraded":[true],
"subject":["weekly report; reports; weekly"],
"contenttypeid":["0x010100F29081EC69D67544A17D8172A093E42E"],
"dc_format":["application/pdf; version=1.6"],
If I query for "Weekly Report" I only want to return the 'id' and 'subject' fields as these are the only fields that contain the actual queried values. If other fields contained the string, I would want them returned too.
I'm leaning towards 'it cannot be done' (but hope I am wrong) as I liken it to a SQL query. It has to know what fields to return in the SQL statement and does not remove fields based on no matching string.
Since I don't know the matched fields before running the query I cannot use the filter list option at the point of executing the query.
Is this possible?

While this may not be precisely what you want, you could mimic similar behaviour with highlighting.
All you need to do is create a dismax query with qf listing all the fields you have (e.g. qf=id subject company; the qf list is whitespace-separated).
Then request highlighting on all of those fields (hl.fl=id,subject,company) and enable hl.requireFieldMatch, which makes Solr highlight a field only if the query matched in that field.
In this case the response will have a highlighting section that contains the ids of the matched documents and only the highlighted contents of the matched fields.
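
As a rough illustration, the request described above could be assembled like this. The base URL and collection name are assumptions, and the sketch only builds the URL rather than contacting a server. The qf value is written whitespace-separated, which is the form the DisMax parser documents. Highlighting also needs the listed fields to be stored.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, text, fields):
    """Build a dismax query that highlights only the fields that matched."""
    params = {
        "q": text,
        "defType": "dismax",
        "qf": " ".join(fields),          # fields to search, space-separated
        "hl": "true",                    # switch highlighting on
        "hl.fl": ",".join(fields),       # fields to highlight
        "hl.requireFieldMatch": "true",  # highlight a field only if it matched
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and collection name.
    print(highlight_query_url(
        "http://localhost:8983/solr/mycollection/select",
        "Weekly Report",
        ["id", "subject", "company"],
    ))
```

The matched fields are then read from the `highlighting` section of the response, keyed by document id, rather than from the regular document list.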

SOLR 6 - indexing documents

I need to index documents on a Solr server and update a specific field. I am using post.jar on Windows to index the documents.
First question: is it possible to set the value of the required field directly from the post tool?
If not: the text field is indexed but not stored, so when I update the field, the text field loses all its content. I am updating the field with an HTTP POST to the update handler. The POST parameters are: {"id":"D:\TESTNEWATTACH\AnexaNr.docx","PCC_TABLENAME":{"set":"PCC_CRM_ATTACH"}}
The main question is: how can I index a document and set a field belonging to that document without losing the document content search ability?

To update just a single field in a document, all fields have to be stored. If the fields are not stored, you'll lose their content when doing the update, because internally the process is: retrieve the document, update it, resubmit it.
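
For reference, the atomic "set" update from the question can be generated as shown below. This is a minimal sketch that only builds the JSON body. It assumes the body is then POSTed as JSON to the collection's update handler. Generating it with a JSON library also takes care of escaping the backslashes in the Windows path. It does not avoid the limitation just described: non-stored fields are still lost.

```python
import json


def atomic_set(doc_id, field, value):
    """Return a JSON body that sets one field on one existing document."""
    # Atomic updates are sent as a list of partial documents.
    return json.dumps([{"id": doc_id, field: {"set": value}}])


if __name__ == "__main__":
    body = atomic_set(
        r"D:\TESTNEWATTACH\AnexaNr.docx",  # raw string keeps the backslashes
        "PCC_TABLENAME",
        "PCC_CRM_ATTACH",
    )
    print(body)
```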
The post tool supports giving arbitrary parameters to the update handler:
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
.. which you can use with literal.fieldname=value to provide a value for the field(s) directly in the post request.
literal.<fieldname>
Populates a field with the name supplied with the specified value for each document. The data can be multivalued if the field is multivalued.
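
Since the question uses post.jar on Windows, here is a hedged sketch of how such a command line might be put together. It assumes that post.jar accepts the same pass-through parameters as a `-Dparams` system property, along with `-Dc` for the collection and `-Dauto=yes` to route rich documents to the extracting handler; check `java -jar post.jar -h` for your version. The collection name is hypothetical.

```python
from urllib.parse import urlencode


def post_jar_command(collection, literals, path):
    """Build a post.jar invocation that sets literal.<field> values.

    `literals` maps field names to the value every posted document gets.
    """
    # URL-encode the pass-through parameters, as the tool's help requires.
    params = urlencode(
        {f"literal.{name}": value for name, value in literals.items()}
    )
    return [
        "java",
        f"-Dc={collection}",
        "-Dauto=yes",
        f"-Dparams={params}",
        "-jar",
        "post.jar",
        path,
    ]


if __name__ == "__main__":
    cmd = post_jar_command(
        "mycollection",  # hypothetical collection name
        {"PCC_TABLENAME": "PCC_CRM_ATTACH"},
        r"D:\TESTNEWATTACH\AnexaNr.docx",
    )
    print(" ".join(cmd))
```

Setting the field at index time this way avoids the separate atomic update, so the non-stored text field is never rewritten.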

How to allow only one find per document searched on Lucene

I only want my Lucene search to give the highest scoring highlighted fragment per document. So say I have 5 documents with the word "performance" on each one three times, I still only want 5 results to be printed and highlighted to the results page. How can I go about doing that? Thanks!

You get only one fragment per document returned from the search by calling getBestFragment, rather than getBestFragments.
If your search returns the same document more than once, you very likely have more than one copy of that document in your index. If you intend to create a new index, make sure you open your IndexWriter with its OpenMode set to IndexWriterConfig.OpenMode.CREATE.

Showing more feature attributes in Solr Highlight

How can I get more feature fields from Solr highlight output?
Currently the Highlight just returns the text snippet and docID.
During the indexing step I indexed the feature alongside with other fields I'd like to get back.
Thank you in advance!

You can specify the fields to return highlighting on using the hl.fl parameter. For multiple fields, just repeat that parameter. For example, if you want to highlight in the fields author and title, you would append
&hl.fl=author&hl.fl=title
to your Solr query. Take a look at the linked page for other highlighting options.

How to avoid retrieve entire stored field from solr

I'm using Sunspot and Solr in a Rails app to search ebook contents. For the highlight feature I have to make ebook_content a stored field, but every time I query Solr it sends back the entire content of the book, which makes the query very slow.
How could I only get the result without the stored field?

The fl parameter of Solr allows you to specify which fields you want returned in the result. If you had fields id, title, ebook_content, then you could use fl=id,title to omit the ebook_content field. I don't think there's support in Solr for getting all fields except one (e.g. -ebook_content).
Update
If you don't want to return the field in the normal results, but still want highlighting on that field, exclude the field as I described above, then turn on the highlighter:
hl=true
set the field(s) which should be highlighted:
hl.fl=ebook_content
and set the size of the highlighting fragment (in characters):
hl.fragsize=50
The finished query looks something like this:
?q=search term&fl=id,title&hl=true&hl.fl=ebook_content&hl.fragsize=50
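
When the query string is built in code, the space in the search text and the comma in fl need URL-encoding. A small sketch using the same parameters as above (the function name is illustrative only):

```python
from urllib.parse import urlencode


def ebook_search_query(text, fragsize=50):
    """Return a query string that omits ebook_content but highlights it."""
    params = {
        "q": text,
        "fl": "id,title",           # stored fields to return
        "hl": "true",               # turn the highlighter on
        "hl.fl": "ebook_content",   # field to build snippets from
        "hl.fragsize": fragsize,    # snippet size in characters
    }
    return "?" + urlencode(params)


if __name__ == "__main__":
    print(ebook_search_query("search term"))
    # ?q=search+term&fl=id%2Ctitle&hl=true&hl.fl=ebook_content&hl.fragsize=50
```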
