I added a new field called 'theField' to my documents. Some of them have the field set correctly; others don't.
How do I write a Solr query that selects only the documents whose 'theField' field is set?
You can use a range query with both ends open, like this:
theField:[* TO *]
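As a rough sketch of how that range query could be sent from a script, the snippet below builds a URL-encoded select URL. The base URL and core name (`mycore`) are assumptions for illustration, not taken from the question; the negated form is the usual idiom for finding the documents where the field is missing.

```python
from urllib.parse import urlencode

# Hypothetical base URL and core name; adjust to your own Solr instance.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def has_value_query(field: str) -> str:
    """Range query with both ends open: matches documents with any value in `field`."""
    return f"{field}:[* TO *]"


def missing_value_query(field: str) -> str:
    """Negated form: matches documents that have no value in `field`."""
    return f"*:* -{field}:[* TO *]"


def select_url(q: str, **params: str) -> str:
    """Build a select URL with the query and extra parameters URL-encoded."""
    return SOLR_SELECT + "?" + urlencode({"q": q, **params})


url = select_url(has_value_query("theField"), wt="json")
print(url)
```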
Related
I am just getting started with Apache Solr.
I have successfully run through the Apache tutorials and have now created my own collection and indexed my files.
Whilst the documentation is extensive I cannot find if there is a way to query all fields, but only return the fields that the search string/query was found in.
For example, if I have a file:
Filename: Weekly Report For Company X.pdf
Associated / indexed meta-data:
"id":"S:\\Weekly Reports\\JAN\\Weekly Report For Company X.PDF",
"date":["2017-11-02T19:14:07Z"],
"pdf_pdfversion":[1.6],
"company":["Microsoft"],
"access_permission_can_print_degraded":[true],
"subject":["weekly report; reports; weekly"],
"contenttypeid":["0x010100F29081EC69D67544A17D8172A093E42E"],
"dc_format":["application/pdf; version=1.6"],
If I query for "Weekly Report" I only want to return the 'id' and 'subject' fields as these are the only fields that contain the actual queried values. If other fields contained the string, I would want them returned too.
I'm leaning towards 'it cannot be done' (but hope I am wrong) as I liken it to a SQL query. It has to know what fields to return in the SQL statement and does not remove fields based on no matching string.
Since I don't know the matched fields before running the query I cannot use the filter list option at the point of executing the query.
Is this possible?
This may not be precisely what you want, but you can mimic similar behaviour with highlighting.
All you need to do is create a dismax query with qf listing all the fields you have (e.g. qf=id subject company; qf takes a space-separated list).
Then request highlighting on all of those fields (hl.fl=id,subject,company) and enable hl.requireFieldMatch=true, which makes Solr highlight a field only if the query matched in that field.
The response will then have a highlighting section containing the ids of the matched documents and, for each, only the highlighted content of the fields that matched.
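A minimal sketch of the highlighting approach described above: one helper assembles the request parameters, another reads the highlighting section of a JSON response and lists which fields matched per document. The sample response is hand-written for illustration (it is not real Solr output), and the document ids in it are hypothetical.

```python
def highlight_params(query, fields):
    """Request parameters for a dismax query that highlights only matching fields."""
    return {
        "q": query,
        "defType": "dismax",
        "qf": " ".join(fields),        # dismax expects space-separated field names
        "hl": "true",
        "hl.fl": ",".join(fields),     # fields to try highlighting on
        "hl.requireFieldMatch": "true",
        "wt": "json",
    }


def matched_fields(response):
    """Map each document id to the sorted list of fields that have highlight snippets."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: sorted(name for name, snippets in fields.items() if snippets)
        for doc_id, fields in highlighting.items()
    }


# Hand-written sample shaped like the "highlighting" section of a JSON response.
sample = {
    "highlighting": {
        "doc-1": {
            "id": ["<em>Weekly</em> <em>Report</em> For Company X.PDF"],
            "subject": ["<em>weekly</em> <em>report</em>; reports; weekly"],
        },
        "doc-2": {},
    }
}

params = highlight_params("Weekly Report", ["id", "subject", "company"])
print(matched_fields(sample))
```

Note that the matched field names come from the highlighting section, not from the regular document list, so the application has to read them from there.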
I need to index documents on a SOLR server and update a specific field. I am using post jar on Windows for indexing the documents.
First question: is it possible to set the value of the required field directly from the post tool?
If not: the text field is indexed but not stored, so when I update the field, the text field loses all its content. I am updating the field through the HTTP update handler (POST). The POST parameters are: {"id":"D:\TESTNEWATTACH\AnexaNr.docx","PCC_TABLENAME":{"set":"PCC_CRM_ATTACH"}}
The main question is: how can I index a document and set a field belonging to that document without losing the document content search ability?
To update just a single field in a document, all fields have to be stored. If the fields aren't stored, you'll lose their content when doing the update (internally, the process is: retrieve the document, update it, resubmit it).
The post tool supports giving arbitrary parameters to the update handler:
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
.. which you can use with literal.fieldname=value to provide a value for the field(s) directly in the post request.
literal.<fieldname>
Populates a field with the name supplied with the specified value for each document. The data can be multivalued if the field is multivalued.
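Since the values passed via -params must be URL-encoded, a small helper can build that string. This is only a sketch: the field name and value are taken from the question, while the helper name `post_params` is made up for illustration.

```python
from urllib.parse import urlencode


def post_params(literals):
    """Build the value for the post tool's -params option from field literals.

    Each entry becomes literal.<fieldname>=<value>, URL-encoded and joined with '&'.
    """
    return urlencode({f"literal.{name}": value for name, value in literals.items()})


params = post_params({"PCC_TABLENAME": "PCC_CRM_ATTACH"})
print(f'-params "{params}"')  # -params "literal.PCC_TABLENAME=PCC_CRM_ATTACH"
```

Setting the field at index time this way avoids the later partial update, so the non-stored text field is never rewritten.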
I have indexed approximately 1000 documents in Solr. But all of them are missing a field. I need to add a field to all these documents, and this field will have the same value for all of them. I do not have access to these documents to index them again. Is there any way to do this without re-indexing all the data again?
Unless you've configured your schema to store all values, no, there is no usable way to add a field to the documents without reindexing. If all your fields are stored, you can use atomic updates to add a new field to a document, so you could query Solr for the ids of all existing documents and update each one that way.
Otherwise you're going to have to go with the suggestion from #michielvoo, and return a static value from the query string .. but then you could also just append it in your application before returning it to the user (or, you could add the field as a default value for the request handler in solrconfig.xml, so that you can edit and change it server side).
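For the atomic-update route (which assumes all fields are stored), the request could be prepared roughly as follows. The update URL, core name, ids, field name and value are all hypothetical; the request object is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical update endpoint; adjust host and core name to your setup.
UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def atomic_set_updates(ids, field, value):
    """One atomic 'set' update per document id, all with the same value."""
    return [{"id": doc_id, field: {"set": value}} for doc_id in ids]


def build_update_request(docs):
    """Prepare (but do not send) a JSON POST request to the update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ids, e.g. collected beforehand with q=*:*&fl=id.
ids = ["doc-1", "doc-2", "doc-3"]
request = build_update_request(atomic_set_updates(ids, "new_field", "same value"))
# urllib.request.urlopen(request) would send it to a running Solr instance.
```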
I'm using Sunspot and Solr in a Rails app to search ebook contents. For the highlight feature I have to make ebook_content a stored field, so every time I query Solr it sends back the entire content of the book, which makes the query very slow.
How can I get the results without the stored field?
The fl parameter of Solr allows you to specify which fields you want returned in the result. If you had fields id, title, ebook_content, then you could use fl=id,title to omit the ebook_content field. I don't think there's support in Solr for getting all fields except one (e.g. -ebook_content).
Update
If you don't want to return the field in the normal results, but still want highlighting on that field, exclude the field as I described above, then turn on the highlighter:
hl=true
set the field(s) which should be highlighted:
hl.fl=ebook_content
and set the size of the highlighting fragment (in characters):
hl.fragsize=50
your finished query looks something like this:
?q=search term&fl=id,title&hl=true&hl.fl=ebook_content&hl.fragsize=50
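On the client side, the short snippets then have to be paired with the slimmed-down documents. A possible sketch, assuming a JSON response with the usual `response.docs` and `highlighting` sections; the sample data is hand-written for illustration.

```python
from urllib.parse import urlencode


def search_params(term):
    """Parameters from the query above: small field list plus highlighting."""
    return {
        "q": term,
        "fl": "id,title",             # ebook_content deliberately left out
        "hl": "true",
        "hl.fl": "ebook_content",
        "hl.fragsize": "50",
    }


def attach_snippets(response, field="ebook_content"):
    """Return each document with the highlight snippets for `field` added."""
    highlighting = response.get("highlighting", {})
    results = []
    for doc in response["response"]["docs"]:
        snippets = highlighting.get(str(doc["id"]), {}).get(field, [])
        results.append({**doc, "snippets": snippets})
    return results


# Hand-written sample response, not real Solr output.
sample = {
    "response": {"docs": [{"id": "1", "title": "Book A"}, {"id": "2", "title": "Book B"}]},
    "highlighting": {"1": {"ebook_content": ["a <em>search</em> <em>term</em> here"]}, "2": {}},
}

query_string = urlencode(search_params("search term"))
results = attach_snippets(sample)
```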
I am running a Solr instance on Jetty and when I search using the Solr admin panel, it returns the entire document. What should I do to get only specified fields from each Solr document returned by the search?
/?q=query&fl=field1,field2,field3
From the Solr Admin home page, click on "Full Interface". On that page there is a box called "Fields to Return". You can list the fields you want there (comma-separated). "*" means all fields.
http://xx.xxx.xx.xx:8983/solr/corename/select?indent=on&q=*:*&wt=json&fl=ImageID,Imagepath,Category
This URL has an fl parameter.
fl is the field list: the response will contain only the listed fields of each matching document.
The easiest way is to run the query from the Admin console. When you run it, the console also shows the actual request URL that was executed. Just copy that URL and use it.
As for the question of selecting specific fields: in the Admin console, look for the 'fl' text box. Enter the names of the fields you want to retrieve, comma-separated, and hit the 'Execute Query' button.
The generated URL is shown at the top right.
Generated Query: ......select?fl=FIELDNAME&indent=on&q=*:*&wt=json
You can simply pass the fl parameter with the required field names in your query.
&fl=field1,field2,field3&q=*:*
Your response documents will then contain only the listed fields.
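If you copy a query URL from the Admin console and want to change the returned fields in code, something like the following could work. It is a sketch using only the standard library; the helper name `with_fields` and the host in the example URL are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_fields(url, fields):
    """Return `url` with its fl parameter replaced by the given field names."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous fl.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "fl"
    ]
    params.append(("fl", ",".join(fields)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical host; the path and parameters mirror the example URL above.
base = "http://localhost:8983/solr/corename/select?indent=on&q=*:*&wt=json"
result = with_fields(base, ["ImageID", "Imagepath", "Category"])
print(result)
```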