Elasticsearch analyzed fields - VB.NET

I'm building my search but need to analyze one field with different analyzers. For this field I need an analyzer for stemming (snowball) and also one that keeps the full value as a single token (keyword). I can get this to work with the following index settings:
curl -X PUT "http://localhost:9200/$IndexName/" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer1": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "standard", "lowercase", "stop", "snowball", "my_synonyms" ]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "title": {
          "type": "string",
          "search_analyzer": "analyzer1",
          "index_analyzer": "analyzer1"
        }
      }
    }
  }
}'
The problem comes when searching for a single word in the title field. If it's populated with "The Cat in the Hat", it's stored as the single token "The Cat in the Hat", but if I search for cats I get nothing back.
Is this even possible to accomplish, or do I need two separate fields, analyzing one with keyword and the other with snowball?
I'm using NEST in VB code to index the data, if that matters.
Thanks
Robert

You can apply two different analyzers to the same field using the fields property (previously known as multi fields).
My VB.NET is a bit rusty, so I hope you don't mind the C# examples. If you're using the latest code from the dev branch, Fields was just added to each core mapping descriptor so you can now do this:
client.Map<Foo>(m => m
    .Properties(props => props
        .String(s => s
            .Name(o => o.Bar)
            .Analyzer("keyword")
            .Fields(fs => fs
                .String(f => f
                    .Name(o => o.Bar.Suffix("stemmed"))
                    .Analyzer("snowball")
                )
            )
        )
    )
);
Otherwise, if you're using NEST 1.0.2 or earlier (which you likely are), you have to accomplish this via the older multi field type way:
client.Map<Foo>(m => m
    .Properties(props => props
        .MultiField(mf => mf
            .Name(o => o.Bar)
            .Fields(fs => fs
                .String(s => s
                    .Name(o => o.Bar)
                    .Analyzer("keyword"))
                .String(s => s
                    .Name(o => o.Bar.Suffix("stemmed"))
                    .Analyzer("snowball"))
            )
        )
    )
);
Both ways are supported by Elasticsearch and do exactly the same thing: they apply the keyword analyzer to the primary bar field and the snowball analyzer to the bar.stemmed field. stemmed is just the suffix I chose for these examples; you can use whatever suffix you like. In fact, you don't need a suffix at all: you can name the multi field something completely different from the primary field.
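For completeness, a sketch of how a search against the stemmed sub-field might look (the index name and query text here are illustrative, not from the original post):

curl -XGET "http://localhost:9200/myindex/_search" -d '{
  "query": {
    "match": {
      "bar.stemmed": "cats"
    }
  }
}'

A match query on the primary bar field matches the keyword-analyzed value as a whole, while the same query on bar.stemmed goes through the snowball analyzer, so cats stems to cat and matches.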

Related

Use sprintf syntax inside logstash's sprintf syntax

For the below data structure:
{
  "sprints": [
    {
      "id": 17193,
      "name": "Sprint 12"
    },
    {
      "id": 16510,
      "name": "Sprint 11"
    }
  ],
  "velocityStatEntries": {
    "16510": {
      "estimated": {
        "value": 49
      },
      "completed": {
        "value": 36
      }
    },
    "17193": {
      "estimated": {
        "value": 52
      },
      "completed": {
        "value": 70
      }
    }
  }
}
Given this, I want to be able to produce an Elasticsearch object that's easier to handle, by adding the values of the Estimated and Completed fields to the sprints with their matching IDs.
Ideally, I would like to handle this without writing Ruby, but I am not finding a Logstash-native solution that handles this scenario.
First, I split the data on the sprints field using split, so that I'm only working with a single sprints object at a time and can use [sprints][id] to know which sprint I'm processing.
Then, I have attempted to work with the mutate filter, in one of two ways:
- using merge to add the [velocityStatEntries][] object to the current sprint
- using add_field to add the two fields I need
Syntactically, is this possible? Ideally, I would want to be able to do a 'double substitution' of sorts, obtaining the estimated time for the current sprint with something like:
add_field => {
  "estimatedTime" => "%{[velocityStatEntries][%{[sprints][id]}][estimated][value]}"
}
but this only seems to work with a hardcoded format such as "estimatedTime" => "%{[velocityStatEntries][1234][estimated][value]}"
Do I have to use the Ruby format for this?
For what it's worth, the Ruby solution is very simple:
ruby {
  code => "
    # look up this sprint's entries in velocityStatEntries by id
    sprintId = event.get('[sprints][id]')
    estimated = event.get('[velocityStatEntries][' + sprintId.to_s + '][estimated][value]')
    completed = event.get('[velocityStatEntries][' + sprintId.to_s + '][completed][value]')
    event.set('[sprints][estimatedUnits]', estimated)
    event.set('[sprints][completedUnits]', completed)
  "
}
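Putting it together with the split filter described in the question, the whole filter block would look something like this sketch (it assumes events arrive with the structure shown in the question):

filter {
  # one event per element of the sprints array
  split { field => "sprints" }

  # enrich each sprint from velocityStatEntries (the same Ruby code as above)
  ruby {
    code => "
      sprintId = event.get('[sprints][id]')
      event.set('[sprints][estimatedUnits]', event.get('[velocityStatEntries][' + sprintId.to_s + '][estimated][value]'))
      event.set('[sprints][completedUnits]', event.get('[velocityStatEntries][' + sprintId.to_s + '][completed][value]'))
    "
  }
}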

Creating Customer on Balanced Payments

I am confused by the Balanced Payments documentation, specifically for creating customers:
The Balanced docs say to create a customer with this code:
$marketplace = Balanced\Marketplace::mine();
$customer = $marketplace->customers->create(array(
    'address' => array(
        'postal_code' => '48120',
    ),
    'dob_month' => '7',
    'dob_year' => '1963',
    'name' => 'Henry Ford',
));
The goal is to get a JSON response:
{
  "customers": [
    {
      "address": {
        "postal_code": "48120",
        // more key -> value pairs
      },
      // more key -> value pairs
      "href": "/customers/CU3SSJgvA5Z69kt05MusbPeE"
    }
  ]
}
The problem that I am having is that I cannot find any reference as to how to send the info to Balanced. Do I use balanced.js to tokenize it the same way I tokenize a credit card?
Take a look at https://docs.balancedpayments.com/1.1/api/customers/#create-a-customer . You would use one of the clients, such as the Python or PHP clients found at https://docs.balancedpayments.com/1.1/overview/getting-started/#client-libraries .
Balanced.js is just for card tokenization, to aid with PCI compliance: using it, sensitive card information is posted directly to Balanced and never has to touch your own servers.
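For reference, a rough sketch of the underlying REST call; the secret-key basic auth and the form-encoded parameter names are assumptions based on the linked docs, so verify them there:

# Sketch only: auth style and parameter encoding are assumptions; check the linked docs.
curl https://api.balancedpayments.com/customers \
  -u YOUR_SECRET_KEY: \
  -d "name=Henry Ford" \
  -d "address[postal_code]=48120" \
  -d "dob_month=7" \
  -d "dob_year=1963"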

Join / split search words in elasticsearch (using tire)

I have the following analyzer (a slight tweak to the way snowball would be set up):
string_analyzer: {
  filter: [ "standard", "stop", "snowball" ],
  tokenizer: "lowercase"
}
Here is the field it is applied to:
indexes :title, type: 'string', analyzer: 'string_analyzer'

query do
  match ['title'], search_terms, fuzziness: 0.5, max_expansions: 10, operator: 'and'
end
I have a record in my index with title foo bar.
If I search for foo bar it appears in the results.
However, if I search for foobar it doesn't.
Can someone explain why and if possible how I could get it to?
Can someone explain how I could get the reverse of this to work as well so that if I had a record with title foobar a user could search for foo bar and see it as a result?
Thanks
You can only search for tokens that are in your index. So let's look at what you are indexing.
You're currently using the lowercase tokenizer (which tokenizes a string on non-letter characters and lowercases the resulting tokens), then applying the standard filter (redundant, because you are not using the standard tokenizer) plus the stop and snowball filters.
If we create that analyzer:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "string_analyzer" : {
          "filter" : [
            "standard",
            "stop",
            "snowball"
          ],
          "tokenizer" : "lowercase"
        }
      }
    }
  }
}
'
and use the analyze API to test it out:
curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=foo+bar&analyzer=string_analyzer'
you'll see that "foo bar" produces the terms ["foo","bar"] and "foobar" produces the term ["foobar"]. So indexing "foo bar" and searching for "foobar" currently cannot work.
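Checking the "foobar" case is the same _analyze call with a different text parameter:

curl -XGET 'http://127.0.0.1:9200/test/_analyze?pretty=1&text=foobar&analyzer=string_analyzer'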
If you want to be able to search "inside" words, then you need to break words up into smaller tokens. To do this, we use an ngram token filter inside a custom analyzer.
So delete the test index:
curl -XDELETE 'http://127.0.0.1:9200/test/?pretty=1'
and specify a new analyzer:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "ngrams" : {
          "max_gram" : 5,
          "min_gram" : 1,
          "type" : "ngram"
        }
      },
      "analyzer" : {
        "ngrams" : {
          "filter" : [
            "standard",
            "lowercase",
            "ngrams"
          ],
          "tokenizer" : "standard"
        }
      }
    }
  }
}
'
Now, if we test the analyzer, we get:
"foo bar" => [f,o,o,fo,oo,foo,b,a,r,ba,ar,bar]
"foobar" => [f,o,o,b,a,r,fo,oo,ob,ba,ar,foo,oob,oba,bar,foob,ooba,obar,fooba,oobar]
So if we index "foo bar" and we search for "foobar" using the match query, then the query becomes a query looking for any of those tokens, some of which exist in the index.
Unfortunately, it'll also overlap with "wear the fox hat" (f,o,a). While foobar will appear higher up the list of results because it has more tokens in common, you will still get apparently unrelated results.
This can be controlled by using the minimum_should_match parameter, eg:
curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
  "query" : {
    "match" : {
      "my_field" : {
        "minimum_should_match" : "60%",
        "query" : "foobar"
      }
    }
  }
}
'
The exact value for minimum_should_match depends upon your data - experiment with it.

Unable to filter out shingle (n-gram) facets using the "exclude" words option provided in the "facets" query

I am trying to make a tagcloud of words and phrases using the facets feature of elasticsearch.
My mapping:
curl -XPOST http://localhost:9200/myIndex/ -d '{
  ...
  "analysis": {
    "filter": {
      "myCustomShingle": {
        "type": "shingle",
        "max_shingle_size": 3,
        "output_unigrams": true
      }
    },
    "analyzer": { // making a custom analyzer
      "myAnalyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "myCustomShingle",
          "stop"
        ]
      }
    }
  },
  ...
  "mappings": {
    ...
    "description": { // the field to be analyzed for making the tag cloud
      "type": "string",
      "analyzer": "myAnalyzer",
      "null_value": "null"
    },
    ...
  }
}'
Query for generating facets:
curl -X POST "http://localhost:9200/myIndex/myType/_search?&pretty=true" -d '
{
  "size": "0",
  "query": {
    "match_all": {}
  },
  "facets": {
    "blah": {
      "terms": {
        "fields": ["description"],
        "exclude": ["evil"], // remove facets that contain these words
        "size": "50"
      }
    }
  }
}'
My problem is, when I insert a word, say 'evil', in the "exclude" option of "facets", it successfully removes the facets containing the words (or single shingles) that match 'evil'. But it doesn't remove the two- and three-word shingles: "resident evil", "evil computer", "my evil cat". How do I remove the facets of phrases containing the "exclude" words?
It isn't completely clear what you want to achieve. You usually wouldn't make facets on analyzed fields. Maybe you could explain why you're making shingles so that we can help you achieve what you want in a better way.
With the exclude facet parameter you can exclude some specific entry, but evil is not the same as resident evil. If you want to exclude it you need to specify it. Facets are made based on indexed terms, and resident evil is in fact a single term in the index, which is not the same as the term evil.
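For example, with the terms facet from the question, you would have to list every indexed term you want to drop, shingles included (a sketch reusing the phrases from the question; the real list depends on your data):

"facets": {
  "blah": {
    "terms": {
      "fields": ["description"],
      "exclude": ["evil", "resident evil", "evil computer", "my evil cat"],
      "size": "50"
    }
  }
}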
Given the choice that you already made for indexing and faceting, there is a way to achieve what you want. Elasticsearch has a really powerful scripting module. You can use a script to decide whether each entry should be included in the facet or not, like this (the script must return false for the entries you want to drop, hence the inverted check):
{
  "query": {
    "match_all": {}
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "tags",
        "script": "term.contains('evil') ? false : true"
      }
    }
  }
}

Highlighting matched results on _all fields

I want the matched results to be highlighted. This works for me if I mention the field name, and it returns the highlighted text; however, if I give the field as "_all", it does not return any value.
This works for me:
curl -XGET "http://localhost:9200/my_index/my_type/_search?q=stackoverflow&size=999" -d '{
  "highlight": {
    "fields": {
      "my_field": {}
    }
  }
}'
This returns the expected value as follows:
[highlight] => stdClass Object ( [my_field] => Array ( [0] => stackoverflow is the best website for techies ) )
But when I give this:
curl -XGET "http://localhost:9200/my_index/my_type/_search?q=stackoverflow&size=999" -d '{
  "highlight": {
    "fields": {
      "_all": {}
    }
  }
}'
I get null value/no result.
[highlight] => stdClass Object ( [_all] => Array () )
How do I get it to work on any field so that I don't have to mention the field name?
To avoid the need to add _all as a stored field in your index, an alternative quick fix is to use * instead of _all:
curl -XGET "http://localhost:9200/my_index/my_type/_search?q=stackoverflow&size=999" -d '{
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}'
If you are using ES 2.x, then you need to set the require_field_match option to false, due to changes made to highlighting. From the docs:
The default value for the require_field_match option has changed from false to true, meaning that the highlighters will, by default, only take the fields that were queried into account.
This means that, when querying the _all field, trying to highlight on
any field other than _all will produce no highlighted snippets.
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
}
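Wired into a complete request, that gives (this just combines the two snippets above):

curl -XGET "http://localhost:9200/my_index/my_type/_search?q=stackoverflow&size=999" -d '{
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match": false
  }
}'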
You need to map the _all field as stored. The mapping below should do the trick. Note though that this will add to the index size.
{
  "my_type": {
    "_all": {
      "enabled": true,
      "store": "yes"
    }
  }
}
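Once the _all field is stored (and existing documents are reindexed, since mapping changes don't apply retroactively), the original request from the question should start returning highlights:

curl -XGET "http://localhost:9200/my_index/my_type/_search?q=stackoverflow&size=999" -d '{
  "highlight": {
    "fields": {
      "_all": {}
    }
  }
}'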
This library has functions for query highlighting, including highlighting across all fields. The README explains how to create an Elasticsearch index with the _all field stored, etc.:
https://github.com/niranjan-uma-shankar/Elasticsearch-PHP-class