Adding a prefix when creating an index using JRediSearch - redis

I use JRediSearch (com.redislabs:jredisearch:2.0.0) to store data in an index. I want to add a prefix while creating the index. I am able to add a prefix using the following RediSearch command:
FT.CREATE MyIndex ON HASH PREFIX 1 doc: SCHEMA name TEXT
But I am not able to find an option for this when writing it in Java. I use the following code:
client.createIndex(schema, Client.IndexOptions.defaultOptions());
Could anyone suggest how to add a prefix when using JRediSearch?

The IndexDefinition class has a setPrefixes(...) method which serves your purpose.
Note: You may have to create the IndexDefinition using new IndexDefinition().
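For example, a minimal sketch reusing the client and schema from the question (assuming JRediSearch 2.x, where Client.IndexOptions exposes a setDefinition(...) method):

// Only index hashes whose keys start with "doc:"
IndexDefinition definition = new IndexDefinition().setPrefixes("doc:");
client.createIndex(schema, Client.IndexOptions.defaultOptions().setDefinition(definition));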

Related

nested field in Solr 5.2

I'm new to Solr and I have a very specific problem that I need to solve:
I have a CSV file that contains my Solr documents. One column (field) is not only multiValued but also contains 'subfields'.
For example:
"id":"0101",
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
"mainProperty":"mainproperty1",
"URL":"http://www.mySite..."
where id, addMaterials, mainProperty, and URL are my main fields while 'name' and 'property' are my subfields. I know that Solr is designed to handle denormalized documents but denormalizing is not a possible solution for my application.
What I'm thinking is to just separate my data set and move the fields (that have subfields) to another document and somehow make a new field to link it to the original document (e.g. fromIdField).
Is there any other solution to do this? My minimum goal is to index the values of the addMaterials field (even without indexing the subfields)
from:
"addMaterials":[{"name":"Mat1", "property":"prop1"},
{"name":"Mat2","property":"prop2"},
{"name":"Mat3","property":"prop3"}],
to
"addMaterials":{"name":"Mat1", "property":"prop1"}
"addMaterials":{"name":"Mat2", "property":"prop2"}
"addMaterials":{"name":"Mat3", "property":"prop3"}
Thanks in advance.
I have found a solution to my problem. Instead of separating my data set, I kept addMaterials as a multiValued field and ignored the subfields, so I only have one multiValued field to index. What I did was to use Solr's /update/csv request to index my CSV file and put },{ as the separator for my addMaterials multiValued field. The indexed document looks like this:
"addMaterials": ["[{\"name\":\"Mat1\", \"property\":\"prop1\"",
"\"name\":\"Mat2\", \"property\":\"prop2\"",
"\"name\":\"Mat3\", \"property\":\"prop3\"}]"]
I indexed my document using this:
curl "http://localhost:8983/solr/<coreName>/update/csv?
stream.file=C:/userName/Solr/solr-5.2.0/documentFolder/myFile.csv&
f.addMaterials.split=true&
f.addMaterials.separator=\},\{&
stream.contentType=text/plain;charset=utf-8"
Also, this assumes that the addMaterials field is a multiValued field, so make sure you modify your schema before indexing your document using the procedure above. Otherwise, it will give an error saying that the field is not multiValued.
Of course, if you need to query against the sub-fields then I guess you can use Solr's {!join} query parser.
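If you do go the separate-documents route and link them through a field such as fromIdField (the hypothetical field name from the question), a join query from SolrJ might look like the sketch below; the core name myCore is a placeholder, and it assumes a SolrJ version that provides HttpSolrClient.Builder:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinQueryDemo {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr/myCore").build()) {
            // Return parent documents whose linked material document has name:Mat1
            SolrQuery query = new SolrQuery("{!join from=fromIdField to=id}name:Mat1");
            QueryResponse response = solr.query(query);
            System.out.println(response.getResults().getNumFound());
        }
    }
}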

Date range search using Google Custom Search API

I am using the Google Custom Search API to search for images. My implementation is using Java, and this is how I build my search string:
URL url = new URL("https://ajax.googleapis.com/ajax/services/search/images?"
+ "v=1.0&q=barack%20obama&userip=INSERT-USER-IP");
How would I modify the URL to limit search results to a date range, for example 2014-08-15 to 2014-09-31?
You can specify a date range using the sort parameter. For your example, you would add this to your query string: sort=date:r:20140815:20140931.
This is documented at https://developers.google.com/custom-search/docs/structured_data#page_dates
Also, if you use Google's Java API, you can use the Query class and its setSort() method rather than building the URL by hand.
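For example, sticking with a hand-built URL as in the question, the restriction is just another query parameter. This is only a sketch: it targets the current Custom Search JSON API endpoint rather than the deprecated AJAX endpoint above, and API_KEY and SEARCH_ENGINE_ID are placeholders for your own credentials:

// Restrict results to 2014-08-15 .. 2014-09-31 via the sort parameter
String sort = "date:r:20140815:20140931";
URL url = new URL("https://www.googleapis.com/customsearch/v1?"
        + "key=API_KEY&cx=SEARCH_ENGINE_ID"
        + "&q=barack%20obama&searchType=image"
        + "&sort=" + sort);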
I think the better way is to put this into the query itself. The query parameter supports an 'after' flag, which can be used like this:
https://customsearch.googleapis.com/customsearch/v1?
key=<api_key>&
cx=<search_engine_id>&
q="<your_search_word> after:<YYYY-MM-DD>"

How to write an LDAP schema?

I want to introduce a new class "MyAppValues".
When you add this class to an existing object, it should be possible to add the following attributes:
MyAppKey is a mandatory string that cannot be empty
MyAppOptionalValue is an optional string that can be empty
How can this be done?
Is there an easy tutorial out there?
You need to define your own schema. Read this page to understand LDAP schema basics, and then you can use Apache Directory Studio to create the schema.

Can I customize Elastic Search to use my own Stop Word list?

Specifically, I want to index everything (e.g. "the who") with no stop word list. Is Elasticsearch flexible enough and easy enough to change?
By default, Elasticsearch uses the standard analyzer with the default Lucene English stopwords. I have configured Elasticsearch to use the same analyzer but without stopwords by adding the following to the elasticsearch.yml file:
# Index Settings
index:
  analysis:
    analyzer:
      # set standard analyzer with no stop words as the default for both indexing and searching
      default:
        type: standard
        stopwords: _none_
Yes, you can do this using Elasticsearch's internal config YAML file.
See the config docs for how to change the analyzer settings.
You can override default analyzer globally and turn off the stopword filter by adding these lines to your elasticsearch.yml:
index.analysis.analyzer.default:
  type: custom
  tokenizer: standard
  filter: standard, lowercase
This will create a custom analyzer with the standard tokenizer and two filters: standard and lowercase. This way your custom analyzer will be identical to the standard analyzer, but it will not use the stopword filter. Because it's named "default", Elasticsearch will use it everywhere an analyzer is not explicitly set.
Certainly you can. Use stopwords_path instead of stopwords. For more information, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.html

Indexing file paths or URIs in Lucene

Some of the documents I store in Lucene have fields that contain file paths or URIs. I'd like users to be able to retrieve these documents if their query terms contain a path or URI segment.
For example, if the path is
C:\home\user\research\whitepapers\analysis\detail.txt
I'd like the user to be able to find it by querying for path:whitepapers.
Likewise, if the URI is
http://www.stackoverflow.com/questions/ask
A query containing uri:questions would retrieve it.
Do I need to use a special analyzer for these fields, or will StandardAnalyzer do the job? Will I need to do any pre-processing of these fields (to replace the forward slashes or backslashes with spaces, for example)?
Suggestions welcome!
You can use StandardAnalyzer.
I tested this by adding the following function to Lucene's TestStandardAnalyzer.java:
public void testBackslashes() throws Exception {
  assertAnalyzesTo(a, "C:\\home\\user\\research\\whitepapers\\analysis\\detail.txt",
      new String[]{"c", "home", "user", "research", "whitepapers", "analysis", "detail.txt"});
  assertAnalyzesTo(a, "http://www.stackoverflow.com/questions/ask",
      new String[]{"http", "www.stackoverflow.com", "questions", "ask"});
}
This unit test passed using Lucene 2.9.1. You may want to try it with your specific Lucene distribution. I guess it does what you want, while keeping domain names and file names unbroken. Did I mention that I like unit tests?
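If you want to see the same behaviour end to end, here is a minimal sketch that indexes a path field with StandardAnalyzer and then searches for a single segment. It assumes a recent Lucene release (classes such as TextField and ByteBuffersDirectory), so the names differ from the 2.9-era API used in the test above:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PathFieldDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Index one document whose "path" field is analyzed by StandardAnalyzer
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("path",
                    "C:\\home\\user\\research\\whitepapers\\analysis\\detail.txt", Field.Store.YES));
            writer.addDocument(doc);
        }

        // The analyzer split the path on backslashes at index time, so querying
        // a single segment such as "whitepapers" matches the document
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        Query query = new QueryParser("path", analyzer).parse("whitepapers");
        System.out.println(searcher.count(query)); // expected: 1
    }
}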