SOLR: Get Full Text Content for document that matches query - apache

I have a SOLR instance and wish to extract out the full text content that was indexed within the instance. Is this possible?
If there is a query type I can use to fetch the full text content from the instance, I'd be grateful if someone could point me to it!

Well, it turns out that the SOLR schema must specify stored="true" on the field that holds the full text in order for a query to return the content of that field.
My schema specified the opposite, which means that this text is not retrievable (though it is searchable, as the schema specifies indexed="true").
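For illustration, here is a minimal sketch of a request that asks Solr to return a stored field through the fl (field list) parameter. The host, the core name "mycore", and the field names "id" and "content" are assumptions, not taken from the question; the content field only comes back if it is declared with stored="true".

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, fields):
    """Build a Solr /select URL that requests specific fields via fl."""
    params = {
        "q": query,               # what to search for (needs indexed="true")
        "fl": ",".join(fields),   # what to return (needs stored="true")
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "mycore", "content:lucene", ["id", "content"]
)
print(url)
```

If the field is indexed but not stored, the query still matches documents, but the field is simply absent from the returned documents.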

Related

How to use Lucene Luke for testing search results on more than one field?

I am using Lucene Luke to test search index results and noticed that I cannot select more than one field in the 'Default field' drop-down list. Is this by design, or can the Luke tool not be used for searching against multiple fields?
Basically, I would like to know the Lucene equivalent of SOLR's qf (query fields) parameter.
Thanks
You can search using format field:query.
For details see: https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
Lucene supports fielded data. When performing a search you can either
specify a field, or use the default field. The field names and default
field is implementation specific.
You can search any field by typing the field name followed by a colon
":" and then the term you are looking for.
As an example, let's assume a Lucene index contains two fields, title
and text and text is the default field. If you want to find the
document entitled "The Right Way" which contains the text "don't go
this way", you can enter:
title:"The Right Way" AND text:go
or
title:"The Right Way" AND go
Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes, so the query
title:The Right Way
will only find "The" in the title field. It will find "Right" and "Way" in the default field (in this case the text field).
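Since the field prefix only covers the term it directly precedes, one way to approximate a multi-field search in the classic syntax is to repeat every term for every field. The helper below is a rough sketch under that assumption; the function name is hypothetical and it does not escape query-syntax special characters. (Lucene's queryparser module also provides a MultiFieldQueryParser class for use from code.)

```python
def expand_across_fields(terms, fields):
    """Repeat each term for each field, as field:term clauses joined by OR."""
    clauses = []
    for term in terms:
        # Quote phrases so the field prefix applies to the whole phrase,
        # not just its first word.
        text = f'"{term}"' if " " in term else term
        per_field = " OR ".join(f"{field}:{text}" for field in fields)
        clauses.append(f"({per_field})")
    # Require every term to match in at least one of the fields.
    return " AND ".join(clauses)


query = expand_across_fields(["The Right Way", "go"], ["title", "text"])
print(query)
# (title:"The Right Way" OR text:"The Right Way") AND (title:go OR text:go)
```

The resulting string can be pasted into a query box that accepts the classic query parser syntax.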

Search in all columns in crate

Elasticsearch has the _all field, but I am not able to find anything like that in CrateDB. So do we need to maintain our own analyzed field for that purpose, or does Crate provide something built in?
The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.
The _all field allows you to search for values in documents without knowing which field contains the value. This makes it a useful option when getting started with a new dataset.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html
We don't have something similar to that, so you'd need to add it to the query or maintain a dedicated column.
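As a sketch of the "add it to the query" option: CrateDB's MATCH predicate accepts a parenthesised list of columns, so the columns can be listed explicitly. The helper, table and column names below are hypothetical, the sketch assumes each listed column has a fulltext index, and identifiers are not escaped.

```python
def build_match_query(table, columns):
    """Build a CrateDB query that searches several columns with MATCH."""
    column_list = ", ".join(columns)
    # The search term is left as a "?" placeholder to be bound by the client.
    return f"SELECT * FROM {table} WHERE MATCH(({column_list}), ?)"


sql = build_match_query("articles", ["title", "body"])
print(sql)
# SELECT * FROM articles WHERE MATCH((title, body), ?)
```

The alternative mentioned above, a dedicated column, trades this longer query for extra storage and indexing work at insert time.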

Search an analyzed field through the stored original value in Elasticsearch

In Elasticsearch I have a field that is analyzed, and I am also storing the original value. I want to search the field by the stored value, not the analyzed value.
Is there any way to do it?
note: I cannot make the field not_analyzed, because I am searching the analyzed values also.
Take a look at the multi fields type, which will allow you to store the field both analyzed for full text search and not_analyzed for exact matches.
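A minimal sketch of what such a mapping and the two kinds of query could look like, written as Python dicts that mirror the JSON request bodies. The field name "title" and the sub-field name "raw" are assumptions. The sketch uses the newer text/keyword types; on older versions the equivalent is a string field whose sub-field sets "index": "not_analyzed".

```python
import json

# Hypothetical mapping: "title" is analyzed, "title.raw" keeps the exact value.
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "raw": {"type": "keyword"}
            },
        }
    }
}

# Full text search runs against the analyzed field.
full_text_query = {"query": {"match": {"title": "quick fox"}}}

# Exact matching runs against the un-analyzed sub-field.
exact_query = {"query": {"term": {"title.raw": "The Quick Fox"}}}

print(json.dumps(mapping, indent=2))
```

Both queries hit the same source value; only the sub-field addressed in the query decides whether the analyzed or the exact form is compared.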

Solr: what are the default values for fields which do not have a default value explicitly set?

I'm working with Solr's schema.xml, and I know that I can use the 'default' attribute to specify a default value which is to be used if a value for a given field has not been provided. However, say that I choose not to set the 'default' attribute, which default value will Solr then fall back to?
I would think that the field type which I've used for the given field would have a default value which would be used, but I have had no success finding any details about this. Alternatively, I'd think that not providing a value and not setting a default value would effectively be as if that field does not exist for the particular document.
However, I'm not sure and I'd like to know :-)
UPDATE 1
As far as I can see, Solr just throws an error and returns an error 400 "Bad Request" if no default value has been set and no value has been provided for a given field. In other words, Solr does not seem to apply any "fallback" default values in case no value is provided and no default value has been set in schema.xml.
UPDATE 2
My above update seems to be wrong. If no value has been provided for a field and no default value has been set for that field, then Solr will just treat the field as if it does not exist for that particular document. This behaviour does, of course, not apply if the field is required.
If you don't supply a value for a field during indexing, Solr will use the default value as defined in the schema.xml file. If no default is defined, Solr ignores the field. If the field is marked as required in schema.xml, Solr will reject the document with an error.
Example:
<field name="comments" type="text" indexed="true" stored="true" required="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />
From my experience if you do not specify a field when loading documents, then Solr will just ignore that field when it indexes the document and your statement "not providing a value and not setting a default value effectively would be as if that field does not exist for the particular document" is true. The catch is that you need to only specify the fields that you want to add for the given document. Check out the xml exampledocs that come with the Solr Distribution to see some examples of files that contain differing field sets.
Though you define fields in a file called schema.xml, Solr documents are in fact schemaless. That means that internally the Solr engine (Lucene) doesn't have any definitions of fields each document must have. With Lucene you can easily add a field myCompletelyNewField to any document without affecting other documents in any way.
So, what is the reason for schema.xml? Each field in Solr/Lucene has several properties, the best known of which are the indexed and stored properties. Moreover, all fields must be bound to some internal data type and processing units. For example, the id field must be stored as a string, and the description field must be analyzed with some English analyzer, cleaned with a stopwords filter and so on. Passing all this information in the add request to Solr is very inconvenient. Since you know what fields you will use and have access to the Solr server (in most cases, at least), it is much easier to move all this info to a separate file. And this file is schema.xml.
So, now you must understand that schema.xml defines the fields that are allowed, but not the fields that must exist in a document. Additional modifiers like required and default just provide additional services before documents are added to the index. I.e. required will force Solr's "front-end" to check whether the specified field exists in the new document. If yes, it passes the document further, otherwise it rejects the new doc. default causes the same check, but if the field is absent, it adds it with the default value and passes the document further.
As for your "Bad Request" error, I guess you have an error somewhere else, e.g. you add an empty field (the field exists, but its value is "") while that is not allowed, or use an incorrect value for the field, or have some other modifiers that contradict the actual field added.
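The checks described above can be pictured with a small toy model. This is not Solr's code, only a sketch of the described behaviour; the function name and the dict-based schema are hypothetical, and the default "NOW" is copied as a literal string rather than evaluated to a timestamp.

```python
# Toy schema mirroring the example fields above, plus one optional field.
SCHEMA = {
    "comments": {"required": True},
    "timestamp": {"default": "NOW"},
    "note": {},  # neither required nor defaulted
}


def apply_schema(doc, schema):
    """Mimic the described pre-indexing checks on a single document."""
    unknown = set(doc) - set(schema)
    if unknown:
        # The schema lists the fields that are allowed.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    result = dict(doc)
    for name, props in schema.items():
        if name in result:
            continue
        if "default" in props:
            result[name] = props["default"]   # absent: fill in the default
        elif props.get("required"):
            raise ValueError(f"missing required field: {name}")
        # Otherwise the field simply does not exist for this document.
    return result


print(apply_schema({"comments": "hi"}, SCHEMA))
# {'comments': 'hi', 'timestamp': 'NOW'}
```

Note that "note" never appears in the result: with no value and no default, the field is just absent from that document.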

SQL Full Text search on HTML/XML data

I have a SQL full text catalog on a CMS database (SQL 2005). The database holds the CMS page content within an ntext column which is part of the full text catalog. As expected, the searching takes into account the XML tags within the page content, so searching for "H1" returns all the pages with H1 tags.
Is it possible to apply filters within the full text search so that only the data within the XML tags is indexed, not the tags themselves?
I can see it is possible for SQL full text search to index/search .html binary types or xml columns. However as you can see the setup is slightly different to this.
Many Thanks,
Adam
Unfortunately, you can't change away from the default "text" iFilter on a text/varchar or ntext/nvarchar column.
If you can't change the data type of the column to varbinary, your next-best bet might be to add the HTML tag names as stop words, so they get ignored during indexing and searching.
I should add that ntext has been deprecated, so you will need to move away from it eventually anyway.