Location aware search - lucene

I am trying out location-aware search with the spatial example found at
http://www.ibm.com/developerworks/java/library/j-spatial/#indexing.approaches.
The schema.xml has a geohash field, but this field is not present in any of the .osm files (in the data folder) used for indexing. I do not understand how a value gets assigned to it, so that when I run this query
http://localhost:8983/solr/select/?q=_val_:"recip (ghhsin(geohash(44.79, -93), geohash, 3963.205), 1, 1, 0)"^100
the result set includes a geohash value. How does this happen? Please help me.

The Solr wiki has a pretty good page on how spatial search can be done with Solr 1.5+.
To summarize, your schema defines a 'geohash' field type and fields that use it:
<fieldtype name="geohash" class="solr.GeoHashField"/>
<field name="destination" type="geohash" indexed="true" stored="true"/>
Data feeders pass in geohashed coordinates:
<field name="destination">cbj1pb56p4b</field> <!-- 45.17614 -93.87341 -->
You should probably go back to plain latitude and longitude coordinates to start off with. There are better docs for that approach.
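To make the "geohashed coordinates" step less mysterious, here is a minimal sketch of the standard geohash encoding in Python. It is not Solr's own code; the function name geohash_encode and the default precision of 11 characters are assumptions chosen to mirror the length of the value shown above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a geohash string (hypothetical helper)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Coordinates taken from the comment in the feed example above.
print(geohash_encode(45.17614, -93.87341))
```

Nearby points share a common prefix, which is why a hash can stand in for a coordinate pair in distance functions.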

Related

How do I get empty fields in SOLR indexed for a schemaless collection?

How do I get empty fields in SOLR indexed? I am using Solr 7.2.0.
I am using schemaless SOLR to try to index everything as string, but for files with empty fields, those fields do not get indexed. Is there a way to get them to show up?
col1,col2,col3
a,,1
d,e,
g,h,3
For example, the first row shows up as
{
"col1":"a",
"col3":"1"
}
I'm trying to also get col2 to show up.
In my solrconfig.xml I have this:
<dynamicField name="*" type="text_general" indexed="true" stored="true" required="true" default="" />
and I have removed any traces of the remove-blank processor from my config. I've reloaded and deleted/recreated my collection multiple times. Is there a solution for this?
The CSV import module has its own option to keep empty fields - f.<field name>.keepEmpty=true.
If you don't give that option, the CSV handler will never give the empty field value to the next step in your indexing process.
Giving f.col2.keepEmpty=true as a URL argument should at least give you a better starting point.
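As a rough illustration, the URL with one keepEmpty argument per column could be assembled like this. The host, port and the collection name "mycollection" are assumptions, as is the helper name csv_update_url; only the f.<field name>.keepEmpty parameter comes from the answer above.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port and collection name to your setup.
BASE_URL = "http://localhost:8983/solr/mycollection/update"


def csv_update_url(columns):
    """Build an update URL that asks the CSV handler to keep empty values."""
    params = [("commit", "true")]
    for name in columns:
        # One per-field option for each column that may be empty.
        params.append((f"f.{name}.keepEmpty", "true"))
    return BASE_URL + "?" + urlencode(params)


url = csv_update_url(["col1", "col2", "col3"])
print(url)
```

The CSV file itself would then be posted to that URL with a CSV content type.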
Maybe preprocess your CSV file like this:
s/,,/, ,/g
That is, add a space between the two commas (you will have to deal with the last value on each line differently though; there is a regex for that).
And then try again. Right now Solr is reading the value as nonexistent. Making it a space has a better chance of making it through, and would not change search results (unless you have some crazy analysis chains).
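If a regex feels fragile, the same preprocessing can be sketched with Python's csv module, which also covers an empty last value and several empty values in a row. This is a swapped-in technique rather than the sed expression above, and the function name fill_empty_fields is made up for the example.

```python
import csv
import io


def fill_empty_fields(csv_text, filler=" "):
    """Return csv_text with every empty cell replaced by filler."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell != "" else filler for cell in row])
    return out.getvalue()


# Sample data from the question.
sample = "col1,col2,col3\na,,1\nd,e,\ng,h,3\n"
filled = fill_empty_fields(sample)
print(filled)
```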

Apache Solr undefined field score field in function query

I am using Solr 4.10. I have to change the relevance of documents based on a boost field and the document score. For that, I have learned that I should use a function query. The following is the definition of the boost field in the schema:
<field name="boost" type="float" stored="true" indexed="false" default="1.0"/>
My first question: can function queries only be used on stored fields?
When I use the above schema with a query like the following
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id&sort=pow(score,%20boost)%20asc
I got an error like
sort param could not be parsed as a query, and is not a field that exists in the index:
then I changed the schema like
<field name="boost" type="float" stored="true" indexed="true" default="1.0"/>
Then the above problem was gone, but a new error appeared for the query
http://localhost:8983/solr/select?q=bank&df=keywords&fl=id,pow(score,%20boost)
Following error appeared
<lst name="error">
<str name="msg">undefined field: "score"</str>
<int name="code">400</int>
</lst>
Where am I going wrong?
Am I correct to change the attributes of the boost field?
I would recommend using a boost function and sorting just by score (that is the default, so no sort param is needed).
bf=linear(boost,100,0)
You may use other functions; that depends on your use case.
Just check out the Solr docs for function queries.
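A minimal sketch of such a request, assuming the eDisMax parser is used (bf is a DisMax/eDisMax parameter whose result is added to the score) and that qf=keywords stands in for df=keywords from the question. The small linear helper only mirrors what linear(x,m,c) computes, namely m*x + c; it is not part of Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {
    "q": "bank",
    "defType": "edismax",          # assumption: bf needs a dismax-style parser
    "qf": "keywords",              # assumption: replaces df=keywords
    "bf": "linear(boost,100,0)",   # additive boost based on the boost field
    "fl": "id,score",              # score is a pseudo-field, valid in fl
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)


def linear(x, m, c):
    """Plain-Python mirror of Solr's linear(x,m,c) function: m*x + c."""
    return m * x + c


# With the schema default boost of 1.0 the function contributes 100 to the score.
print(linear(1.0, 100, 0))
```

Results then come back in the default order, by descending score, with the boost already folded in.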

Solr returning different output fields for each document in result

Hello, I've read the Solr wiki and searched here but didn't find a solution for my use case:
We're indexing customer data with different kinds of contracts into a single document.
So each customer results in one Solr document with one or more different contracts.
The fields for each contract are added dynamically via import (e.g. contract_type_1_s, contract_type_2_s, ...; contract_change_date_1_dt, contract_change_date_2_dt, ...). So all fields with '2' are related to contract no. 2.
With this, the user is able to search for customers who have a contract of type one and none of type two, and so on.
My use case now is to return only the fields of the contract which matched the query.
Here's an example:
<doc>
<field name="id">100</field>
<field name="customer_name">paul</field>
<field name="contract_type_1_s">inhouse</field>
<field name="contract_change_date_1_dt">2012-09-01T00:00:00Z</field>
</doc>
<doc>
<field name="id">101</field>
<field name="customer_name">fred</field>
<field name="contract_type_1_s">inhouse</field>
<field name="contract_change_date_1_dt">2012-09-01T00:00:00Z</field>
<field name="contract_type_2_s">external</field>
<field name="contract_change_date_2_dt">2012-09-01T00:00:00Z</field>
</doc>
<doc>
<field name="id">102</field>
<field name="customer_name">karl</field>
<field name="contract_type_1_s">external</field>
<field name="contract_change_date_1_dt">2012-09-01T00:00:00Z</field>
<field name="contract_type_2_s">inhouse</field>
<field name="contract_change_date_2_dt">2012-09-01T00:00:00Z</field>
</doc>
If the user now searches for customers with contract type 'external', the documents with ids 101 and 102 are in the result. Now I want to return, per document, only the fields of the contract which matched the query.
In this example these should be contract_change_date_1_dt for document 102 and contract_change_date_2_dt for document 101, since contract no. 1 is external in document 102 and contract no. 2 is external in document 101.
Is there a way to achieve this behavior with built-in components?
I know that I can find out which fields matched the query with the highlight component.
I ended up with the following solution, but it forces me to extend Solr:
Write a QParser to identify the needed fields and add them to the fl param
Do a highlighting query before returning the results to the client
Iterate over all docs in the result and add the fields which matched the query per doc to the result list
I hope I made my problem clear. Any suggestions on a good way to achieve this are really appreciated.
greetings René
In case someone needs a similar thing ;-)
I have now managed to build my custom result list in the following way:
Create a custom QueryComponent (extending the standard QueryComponent) to store the fields which are used in the query. In the prepare method, activate highlighting with the stored fields:
// Make the request params modifiable
ModifiableSolrParams modifiableParams = new ModifiableSolrParams(params);
req.setParams(modifiableParams);
modifiableParams.set(HighlightParams.FIELDS, queryFieldList);
modifiableParams.set(HighlightParams.HIGHLIGHT, "true");
modifiableParams.set(HighlightParams.FIELD_MATCH, "true");
modifiableParams.set(HighlightParams.SIMPLE_PRE, "");
modifiableParams.set(HighlightParams.SIMPLE_POST, "");
Create a custom HighlightComponent (extending the standard HighlightComponent) to build the result out of the standard result. In the process method I now get the highlight info and extract the information I need:
NamedList<Object> rspValues = rb.rsp.getValues();
NamedList<Object> nlHl = (NamedList<Object>) rspValues.get("highlighting");
this.hlDocsAndFields = extractHighlightingInfo(nlHl);
For that I created a custom list which is able to count the matches per contract (how many fields of contract_X_s are in the highlighted results).
This works fine.
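The per-contract counting can be sketched outside Solr roughly like this. It is not the component code described above, only a client-side model: the highlighting dict imitates the shape of the highlighting section of a Solr response, and the names matched_contracts and fields_for_contract are made up for the example.

```python
import re
from collections import Counter

# Field names look like contract_<something>_<number>_<type suffix>.
CONTRACT_FIELD = re.compile(r"^contract_.+_(\d+)_[a-z]+$")


def matched_contracts(highlighting):
    """Map each doc id to a Counter of contract numbers that have highlighted fields."""
    result = {}
    for doc_id, fields in highlighting.items():
        counts = Counter()
        for field_name, snippets in fields.items():
            match = CONTRACT_FIELD.match(field_name)
            if match and snippets:
                counts[match.group(1)] += 1
        result[doc_id] = counts
    return result


def fields_for_contract(doc, number):
    """Return only the fields of doc that belong to the given contract number."""
    pattern = re.compile(rf"^contract_.+_{re.escape(str(number))}_[a-z]+$")
    return {name: value for name, value in doc.items() if pattern.match(name)}


# Sample data modelled on documents 101 and 102 from the question.
highlighting = {
    "101": {"contract_type_2_s": ["external"]},
    "102": {"contract_type_1_s": ["external"]},
}
doc_101 = {
    "id": "101",
    "customer_name": "fred",
    "contract_type_1_s": "inhouse",
    "contract_change_date_1_dt": "2012-09-01T00:00:00Z",
    "contract_type_2_s": "external",
    "contract_change_date_2_dt": "2012-09-01T00:00:00Z",
}

counts = matched_contracts(highlighting)
best = counts["101"].most_common(1)[0][0]   # contract number with most matches
selected = fields_for_contract(doc_101, best)
print(best, selected)
```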
I am now stuck at the response writer, which resolves the document fields itself when it builds the response :-(
Does anyone have a suggestion on changing/customizing the response writer?
greetings René
I have now managed the whole thing.
I did not have to change the response writer at all :-)
I just had to store all fields which I resolved for each document and add them to the response:
rb.rsp.setReturnFields(globalResultFields);
greetings René

How to index blob field in Apache Solr indexing?

I am using Apache Solr to index my data. I have a blob field which I want to be indexed too, but I don't know which fieldType to declare in 'schema.xml'.
I tried the following:
<field name="abstract" type="text" indexed="true" stored="true" required="true"/>
but when I search, that field is shown as:
id, abstract, title, price, publishedDate
1, [B#1e9b7b2, Spain Consumer, 3795.0, 2009-01-19T18:30:00Z
'abstract' is my blob field, which is nothing but a big string. I want text search on that field, but after indexing it shows up like this.
Please suggest what I can do.
Thanks in advance.
The Solr FAQ covers this for blobs: http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
You can also check searching-rich-format-documents-stored-dbms.
There was a JIRA issue for contributing a BlobTransformer, but it doesn't seem to have made it into the code. You can probably refer to the patch and pick up the transformer for your own use.
Not sure whether it has been renamed or refactored in the current versions.

What is the use of "multiValued" field type in Solr?

I'm new to Apache Solr. Even after reading the documentation part, I'm finding it difficult to clearly understand the functionality and use of the multiValued field type property.
How does Solr internally treat a field that is marked as multiValued?
What is the difference in indexing in Solr between a field that is multiValued and one that is not?
Can somebody explain with a good example?
Doc says:
multiValued=true|false
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
A multivalued field is useful when there is more than one value present for the field. An easy example would be tags: there can be multiple tags that need to be indexed. If the tags field is multivalued, the Solr response will return a list instead of a single string value. One point to note is that you need to submit one field element for each value of the tags, like:
<field name="tags">tag1</field>
<field name="tags">tag2</field>
...
<field name="tags">tagn</field>
Once you have all the values indexed, you can search or filter results by any value, e.g. you can find all documents with tag1 using a query like
q=tags:tag1
or use the tags to filter results, like
q=query&fq=tags:tag1
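A small sketch of how a feeder might generate those repeated field elements, using only Python's standard library. The helper name build_add_xml and the sample id and tag values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, tags):
    """Build an <add><doc> payload with one <field name="tags"> per tag value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for tag in tags:
        # The same field name is repeated once per value.
        ET.SubElement(doc, "field", name="tags").text = tag
    return ET.tostring(add, encoding="unicode")


xml_payload = build_add_xml("1", ["tag1", "tag2", "tag3"])
print(xml_payload)
```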
multiValued defines in the schema whether the field is allowed to have more than one value.
For instance:
if I have a field called id with multiValued=false, indexing a document such as this:
doc {
id : [ 1, 2]
...
}
would cause an exception to be thrown in the indexing thread and the document will not be indexed (schema validation will fail).
On the other hand, if I do have multiple values for a field, I would want to set multiValued=true in order to guarantee that indexing is done correctly, for example:
doc {
id : 1
keywords: [ hello, world ]
...
}
In this case you would define "keywords" as a multiValued field.
I use multivalued fields only with copyFields. So think of it this way: all fields are single-valued unless they are a copyField destination. For example, I have the following fields:
<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="subject" type="string" indexed="true" stored="true"/>
<field name="location" type="string" indexed="true" stored="true"/>
I want to be able to query one field only, and also to search all 4 fields above at once. For that we need copyField: first create a new field called 'all', then copy everything into 'all':
<field name="all" type="text" indexed="true" stored="true" multiValued="true"/>
<copyField source="*" dest="all"/>
Now the field 'all' needs to be multi-valued.
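A toy model of why the destination has to be multi-valued: copying four source fields produces four values for 'all'. This is plain Python, not Solr code; the function apply_copy_field, the error text and the sample document values are invented for the illustration.

```python
def apply_copy_field(doc, dest="all", multi_valued=True):
    """Toy copyField source="*": collect every field value into dest."""
    values = [value for name, value in doc.items() if name != dest]
    if not multi_valued and len(values) > 1:
        # A single-valued destination cannot hold more than one value.
        raise ValueError(f"field '{dest}' is not multiValued but got {len(values)} values")
    copied = dict(doc)
    copied[dest] = values
    return copied


doc = {"id": "1", "name": "solr", "subject": "search", "location": "nyc"}
indexed = apply_copy_field(doc)
print(indexed["all"])
```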