Query for missing fields in nested documents - lucene

I have a user document which contains many tags.
Here is the mapping:
{
"user" : {
"properties" : {
"tags" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "string",
"index" : "not_analyzed",
"store" : "yes"
},
"current" : {
"type" : "boolean"
},
"type" : {
"type" : "string"
},
"value" : {
"type" : "multi_field",
"fields" : {
"value" : {
"type" : "string",
"analyzer" : "name_analyzer"
},
"value_untouched" : {
"type" : "string",
"index" : "not_analyzed",
"include_in_all" : false
}
}
}
}
}
}
}
}
Here are the sample user documents:
User 1:
{
"created_at": 1317484762000,
"updated_at": 1367040856000,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
},
{
"type": "company",
"value": "alma connect",
"id": "58ad4afcc8415216ea451339aaecf311ed40e132"
},
{
"type": "company",
"value": "Google",
"id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
"current": true
},
{
"type": "discipline",
"value": "B.Tech.",
"id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
"institute_id": "4fb424a5addf32296f00013a"
}
]
}
User 2:
{
"created_at": 1318513355000,
"updated_at": 1364888695000,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
},
{
"type": "college",
"value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
"id": "d20730345465a974dc61f2132eb72b04e2f5330c"
},
{
"type": "company",
"value": "Alma Connect",
"id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
},
{
"type": "sector",
"value": "Website and Software Development",
"id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
}
]
}
User 3:
{
"created_at": 1318513355001,
"updated_at": 1364888695010,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
},
{
"type": "sector",
"value": "Website and Software Development",
"id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
}
]
}
Using the above ES documents, I want to construct a query that fetches users who either have a given company tag among their nested tag documents or have no company tags at all. What should my search query be?
For example, if I search for the google tag, the returned documents should be User 1 and User 3 (User 1 has the company tag google, and User 3 has no company tag). User 2 is not returned because its only company tag is not google.

Not trivial at all, mainly because of the "does not have a type:company tag" clause. Here's what I came up with:
{
"or" : {
"filters" : [ {
"nested" : {
"filter" : {
"and" : {
"filters" : [ {
"term" : {
"tags.value" : "google"
}
}, {
"term" : {
"tags.type" : "company"
}
} ]
}
},
"path" : "tags"
}
}, {
"not" : {
"filter" : {
"nested" : {
"filter" : {
"term" : {
"tags.type" : "company"
}
},
"path" : "tags"
}
}
}
} ]
}
}
It contains an or filter with two nested clauses: the first one finds the documents that have tags.type:company and tags.value:google, while the second one finds all the documents that don't have any tags.type:company.
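To sanity-check the logic of the two clauses against the sample documents, here is a plain JavaScript sketch that mimics them in memory. It is not Elasticsearch code: the helper names are made up, ids are omitted, and tag values are lowercased before comparison on the assumption that name_analyzer lowercases its input.

```javascript
// Hypothetical in-memory stand-in for the filter; only type and value are kept.
const users = {
  user1: [
    { type: "college", value: "Dhirubhai Ambani Institute of Information and Communication Technology" },
    { type: "company", value: "alma connect" },
    { type: "company", value: "Google", current: true },
    { type: "discipline", value: "B.Tech." },
  ],
  user2: [
    { type: "college", value: "Dhirubhai Ambani Institute of Information and Communication Technology" },
    { type: "college", value: "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad" },
    { type: "company", value: "Alma Connect" },
    { type: "sector", value: "Website and Software Development" },
  ],
  user3: [
    { type: "college", value: "Dhirubhai Ambani Institute of Information and Communication Technology" },
    { type: "sector", value: "Website and Software Development" },
  ],
};

// Clause 1: a single tag is both type "company" and has the searched value.
function hasCompanyTag(tags, term) {
  return tags.some((t) => t.type === "company" && t.value.toLowerCase() === term);
}

// Clause 2: no tag at all has type "company".
function hasNoCompanyTag(tags) {
  return !tags.some((t) => t.type === "company");
}

// The "or" of both clauses.
function matches(tags, term) {
  return hasCompanyTag(tags, term) || hasNoCompanyTag(tags);
}

const hits = Object.keys(users).filter((name) => matches(users[name], "google"));
console.log(hits); // [ 'user1', 'user3' ]
```

Both conditions of the first clause are checked on the same tag, which is what the nested filter guarantees; a non-nested query could match type and value on different tags.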
This needs to be optimized, though, since and/or/not filters don't take advantage of caching for filters that work with bitsets, as the term filter does. It would be best to take some more time to find a way to use a bool filter and obtain the same result. Have a look at this article to learn more.
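One possible shape for the bool version, written as a JavaScript object so it can be printed and pasted into a request. This is a sketch, untested against a live cluster: it assumes the older filter DSL in which a bool filter takes must, should and must_not clauses, and the variable names and the match_all wrapper are only for illustration.

```javascript
// Hypothetical search term; term filters are not analyzed, so it is lowercase.
const companyTerm = "google";

const boolFilter = {
  bool: {
    should: [
      {
        // Same tag must be a company AND have the searched value.
        nested: {
          path: "tags",
          filter: {
            bool: {
              must: [
                { term: { "tags.value": companyTerm } },
                { term: { "tags.type": "company" } },
              ],
            },
          },
        },
      },
      {
        // No nested tag of type company at all.
        bool: {
          must_not: {
            nested: {
              path: "tags",
              filter: { term: { "tags.type": "company" } },
            },
          },
        },
      },
    ],
  },
};

// Wrap the filter in a filtered query (assumed request shape) and print it.
const body = {
  query: { filtered: { query: { match_all: {} }, filter: boolFilter } },
};
console.log(JSON.stringify(body, null, 2));
```

Whether a bool filter that contains only a must_not clause behaves as expected is worth verifying on your Elasticsearch version before relying on this.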

Related

How to find match elements in between two collections in mongodb?

I am working on a MongoDB database, but I am a little stuck on one piece of logic: how do I find matching elements between two collections in MongoDB?
Users Collection
[{
"_id": "57cd539d168df87ae2695543",
"userid": "3658975589",
"name": "John Doe",
"email": "johndoe#gmail.com",
"number": "123654789"
}, {
"_id": "57cd53e6168df87ae2695544",
"userid": "789456123",
"name": "William Rust",
"email": "williamrust#gmail.com",
"number": "963258741"
}]
Contacts Collection
[{
"_id": "57cd2f6c3966037787ce9550",
"contact": [{
"id": "457899979",
"fullname": "Abcd Hello",
"phonenumber": "123575784565",
"currentUserid": "123456789"
}, {
"id": "7994949849",
"fullname": "Keyboard Mouse",
"phonenumber": "23658974262",
"currentUserid": "123456789"
}, {
"id": "7848848885",
"fullname": "John Doe",
"phonenumber": "852147852",
"currentUserid": "123456789"
}]
}]
So I want to find the elements whose phone numbers match between these two collections and list those elements with their name and email.
Please go through my post and suggest a solution.
I'm guessing that what you want to do is "aggregate + lookup". Something like this:
db.users.aggregate([{$lookup:
{
from: "contacts",
localField: "number",
foreignField: "phonenumber",
as: "same"
}
},
{
$match: { "same": { $ne: [] } }
}
])
As a result you get:
{
"_id" : "57cd539d168df87ae2695543",
"userid" : "3658975589",
"name" : "Anshuman Pattnaik",
"email" : "anshuman#gmail.com",
"number" : "7022650603",
"same" : [
{
"_id" : ObjectId("5b361b864aa5144b974c9733"),
"id" : "7848848885",
"fullname" : "Anshuman Pattnaik",
"phonenumber" : "7022650603",
"currentUserid" : "123456789"
}
]
}
If you want to show only the name and the email, you have to add { $project: { name: 1, email: 1, _id: 0 } }:
db.users.aggregate([{$lookup:
{
from: "contacts",
localField: "number",
foreignField: "phonenumber",
as: "same"
}
},
{
$match: { "same": { $ne: [] } }
},
{ $project: { name: 1, email: 1, _id: 0 } }
])
Then you'll get:
{ "name" : "Anshuman Pattnaik", "email" : "anshuman#gmail.com" }
For this to work you have to correct the insert of your contacts like this:
db.contacts.insert(
[{
"id": "457899979",
"fullname": "Abcd Hello",
"phonenumber": "123575784565",
"currentUserid": "123456789"
}, {
"id": "7994949849",
"fullname": "Keyboard Mouse",
"phonenumber": "23658974262",
"currentUserid": "123456789"
}, {
"id": "7848848885",
"fullname": "Anshuman Pattnaik",
"phonenumber": "7022650603",
"currentUserid": "123456789"
}]
)
Hope it works!
For more information https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
It's not a complete answer, but it may help you solve your problem.
You can compare two documents using the function below. For more details, see this answer.
var compareCollections = function(){
db.users.find().forEach(function(obj1){
db.contacts.find({/*if you know some properties, you can put them here...if don't, leave this empty*/}).forEach(function(obj2){
var equals = function(o1, o2){
// some code.
};
if(equals(obj1, obj2)){
// Do what you want to do
}
});
});
};
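// Note: db.eval was deprecated in MongoDB 3.0 and removed in 4.2;
// on newer versions, call compareCollections() directly in the shell instead.
db.eval(compareCollections);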
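The comparison idea can also be tried out without a database. Below is a minimal plain JavaScript sketch, assuming the contacts keep the nested contact array from the question. The sample values (including the example.com emails) are made up so that one phone number actually matches, and it swaps the nested loops for a Set lookup.

```javascript
// Hypothetical sample data shaped like the question's two collections.
const users = [
  { userid: "1", name: "John Doe", email: "john@example.com", number: "852147852" },
  { userid: "2", name: "William Rust", email: "william@example.com", number: "963258741" },
];
const contacts = [
  {
    contact: [
      { id: "a", fullname: "Abcd Hello", phonenumber: "123575784565" },
      { id: "b", fullname: "John Doe", phonenumber: "852147852" },
    ],
  },
];

// Collect every phone number found in any nested contact array.
const contactNumbers = new Set(
  contacts.flatMap((doc) => doc.contact.map((c) => c.phonenumber))
);

// Keep users whose number is in the set, projecting only name and email.
const matched = users
  .filter((u) => contactNumbers.has(u.number))
  .map(({ name, email }) => ({ name, email }));

console.log(matched);
```

Building the Set first means each user needs one lookup instead of a scan over every contact, which matters once the collections grow.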

Elasticsearch - Index Mapping settings for both exact and partial matching

I'm new to elasticsearch and am trying to learn how to index using optimal mapping settings to achieve the following.
If I have a document like this
{"name":"Galapagos Islands"}
I want to get this as a result for both of the following queries:
1) Partial matching
{
"query": {
"match": {
"name": "ga"
}
}
}
2) Exact matching
{
"query": {
"term": {
"name": "Galapagos Islands"
}
}
}
With the settings I have currently, I am able to achieve the partial matching part, but exact matching returns no results. Please find below the settings with which I indexed.
{
"mappings": {
"islands": {
"properties": {
"name":{
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "search_ngram"
}
}
}
},
"settings":{
"analysis":{
"analyzer":{
"autocomplete":{
"type":"custom",
"tokenizer":"standard",
"filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ]
},
"search_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
},
"filter":{
"ngram":{
"type":"ngram",
"min_gram":2,
"max_gram":15
}
}
}
}
}
What is the correct way to do exact matching and partial matching on a field?
UPDATE
After recreating the index with the settings given below, my mappings look like this:
curl -XGET 'localhost:9200/testing/_mappings?pretty'
{
"testing" : {
"mappings" : {
"islands" : {
"properties" : {
"name" : {
"type" : "string",
"index_analyzer" : "autocomplete",
"search_analyzer" : "search_ngram",
"fields" : {
"raw" : {
"type" : "string",
"analyzer" : "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
}
My index settings are below:
{
"mappings": {
"islands": {
"properties": {
"name":{
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "search_ngram",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
},
"settings":{
"analysis":{
"analyzer":{
"autocomplete":{
"type":"custom",
"tokenizer":"standard",
"filter":[ "standard", "lowercase", "stop", "kstem", "ngram" ]
},
"search_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
},
"my_keyword_lowercase_analyzer": {
"type": "custom",
"filter": ["lowercase"],
"tokenizer": "keyword"
}
},
"filter":{
"ngram":{
"type":"ngram",
"min_gram":2,
"max_gram":15
}
}
}
}
}
And with all of the above, when I query like this, I get no hits:
curl -XGET 'localhost:9200/testing/islands/_search?pretty' -d '{"query": {"term": {"name.raw" : "Galapagos Islands"}}}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
And my document is this:
curl -XGET 'localhost:9200/testing/islands/1?pretty'
{
"_index" : "testing",
"_type" : "islands",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source":{"name":"Galapagos Islands"}
}
Add a subfield to your name property, which should be not_analyzed. Or, if you want matching to ignore lowercase/uppercase differences, use a keyword tokenizer together with a lowercase filter.
This should index "Galapagos Islands" as a single token, without modifications other than lowercasing. Then you can do your term search.
For example, a keyword analyzer together with a lowercase filter:
"my_keyword_lowercase_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
And the mapping:
"properties": {
"name":{
"type": "string",
"index_analyzer": "autocomplete",
"search_analyzer": "search_ngram",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
The query to be used is:
{
"query": {
"term": {
"name.raw": "galapagos islands"
}
}
}
So, instead of using the same field - name - you should be using name.raw (the subfield).
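The reason the term has to be lowercase is that a term query is not analyzed, so it is compared as-is against the tokens stored in the index. Here is a rough JavaScript simulation of that behavior. The functions are simplified stand-ins, not Elasticsearch code, and the ngram part ignores the stop and kstem filters.

```javascript
// keyword tokenizer + lowercase filter: the whole input becomes one lowercased token.
function keywordLowercase(text) {
  return [text.toLowerCase()];
}

// A term query is not analyzed: it matches only if that exact token was indexed.
function termMatches(indexedTokens, term) {
  return indexedTokens.includes(term);
}

// All substrings of length min..max of one token (simplified ngram filter).
function ngrams(token, min, max) {
  const grams = [];
  for (let start = 0; start < token.length; start++) {
    for (let len = min; len <= max && start + len <= token.length; len++) {
      grams.push(token.slice(start, start + len));
    }
  }
  return grams;
}

const rawTokens = keywordLowercase("Galapagos Islands");
console.log(termMatches(rawTokens, "Galapagos Islands")); // false
console.log(termMatches(rawTokens, "galapagos islands")); // true
console.log(ngrams("galapagos", 2, 15).includes("ga")); // true
```

This is why the query from the update, which sends "Galapagos Islands" with capitals to name.raw, finds nothing, while the lowercase version matches.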

make a new array from a nested object using Lodash

Here is my data
[
{
"properties": {
"key": {
"data": "companya data",
"company": "Company A"
}
},
"uniqueId" : 1
},
{
"properties": {
"key": {
"data": "companyb data",
"company": "Company B"
}
},
"uniqueId" : 2
},
{
"properties": {
"key": {
"data": "companyc data",
"company": "Company C"
}
},
"uniqueId" : 3
}
]
The format I need for my typeahead directive is below. I was trying to work it out from the other post I made but still couldn't make it work. The best approach seems to be to flatten the nested collection into a simple collection of objects.
[
{
"uniqueId" : 1,
"data": "companya data"
},
{
"uniqueId" : 2,
"data": "companyb data"
},
{
"uniqueId" : 3,
"data": "companyc data"
}
]
I got it!
console.log(
_(jsonData).map(function(obj) {
return {
d : obj.properties.key.data,
id : obj.uniqueId
}
})
.value()
);
You do not have to use the chaining feature of lodash as long as you are only performing one operation. You can simply use:
_.map(jsonData, function(obj) {
return {
d : obj.properties.key.data,
id : obj.uniqueId
}
});
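Note that the snippets above emit the keys d and id. If the typeahead needs the exact keys from the requested format, a variant like the following should do it. This sketch swaps lodash for the native Array.prototype.map so it runs without dependencies; the sample input is trimmed to two entries.

```javascript
// Sample input copied from the question (two entries only).
const jsonData = [
  { properties: { key: { data: "companya data", company: "Company A" } }, uniqueId: 1 },
  { properties: { key: { data: "companyb data", company: "Company B" } }, uniqueId: 2 },
];

// Flatten each nested object into { uniqueId, data }.
const flattened = jsonData.map((obj) => ({
  uniqueId: obj.uniqueId,
  data: obj.properties.key.data,
}));

console.log(flattened);
```

The same callback works unchanged with _.map(jsonData, ...) if you prefer to stay with lodash.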

Dojo: how to insert data into a Tree's children?

Hi, I want to insert data into the children of a Tree. For example, I want to update the information in children[0]. Rather than creating a new tree, I'd like to update the existing data.
My Tree.json:
{
"name": "SCATTER/BUBBLE CHART",
"id": "SCATTERBUBBLE",
"children": [
{
"name": "Series",
"id": "SERIES",
"children": [
{
"name" : "Data:X",
"id" : "DX"
},
{
"name" : "Data:Y",
"id" : "DY"
}
]
},
{
"name": "XAxis",
"id": "X"
},
{
"name": "YAxis",
"id": "Y"
}
]
}
When I click a button, I want this result:
{
"name": "SCATTER/BUBBLE CHART",
"id": "SCATTERBUBBLE",
"children": [
{
"name": "Series",
"id": "SERIES",
"children": [
{
"name" : "Data:X",
"id" : "DX"
},
{
"name" : "Data:Y",
"id" : "DY"
},
{
"name" : "Data:Z",
"id" : "DZ"
}
]
},
{
"name": "XAxis",
"id": "X"
},
{
"name": "YAxis",
"id": "Y"
},
{
"name": "ZAxis",
"id": "Z"
}
]
}
I don't know how to update the tree's children. Any advice?
Use node.item to get the store item object from which the node was created. This assumes you already have the node object. For instance, to get the root node of your tree:
var rootNode = dijit.byId("treeID").attr("rootNode");
After you get the node's item object, you can update any of its attributes, and your store will be modified. Your store should also extend "dojo/store/Observable", so that your tree gets updated with the changes to the store.
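To illustrate just the data change (appending to an existing node rather than rebuilding the tree), here is a plain JavaScript sketch. It does not touch Dojo's store or Tree widget, and findNode and addChild are hypothetical helper names.

```javascript
// The tree from the question, before the update.
const tree = {
  name: "SCATTER/BUBBLE CHART",
  id: "SCATTERBUBBLE",
  children: [
    {
      name: "Series",
      id: "SERIES",
      children: [
        { name: "Data:X", id: "DX" },
        { name: "Data:Y", id: "DY" },
      ],
    },
    { name: "XAxis", id: "X" },
    { name: "YAxis", id: "Y" },
  ],
};

// Depth-first search for the node with the given id; undefined if absent.
function findNode(node, id) {
  if (node.id === id) return node;
  for (const child of node.children || []) {
    const found = findNode(child, id);
    if (found) return found;
  }
  return undefined;
}

// Append a child to an existing node instead of creating a new tree.
function addChild(root, parentId, child) {
  const parent = findNode(root, parentId);
  if (!parent) throw new Error("No node with id " + parentId);
  parent.children = parent.children || [];
  parent.children.push(child);
}

addChild(tree, "SERIES", { name: "Data:Z", id: "DZ" });
addChild(tree, "SCATTERBUBBLE", { name: "ZAxis", id: "Z" });
console.log(JSON.stringify(tree, null, 2));
```

In the Dojo setup, the equivalent change would go through the store item obtained from node.item, so that the tree is notified of the update.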

ElasticSearch - how to give priority to the matching from the same row

I have the following documents in ElasticSearch 0.19.11, using:
{ "title": "dogs species",
"col_names": [ "name", "description", "country_of_origin" ],
"rows": [
{ "row": [ "Boxer", "good dog", "Germany" ] },
{ "row": [ "Irish Setter", "great dog", "Ireland" ] }
]
}
{ "title": "Misc stuff",
"col_names": [ "foo" ],
"rows": [
{ "row": [ "Setter is impotant" ] },
{ "row": [ "Ireland is green" ] }
]
}
The mapping is as follows:
{
"table" : {
"properties" : {
"title" : {"type" : "string"},
"col_names" : {"type" : "string"},
"rows" : {
"properties" : {
"row" : {"type" : "string"}
}
}
}
}
}
Question: I'm now searching for "Ireland Setter" and I need to have a higher score for documents that have search terms in the same row.
Currently the second document gets a score of 0.22, while the first one gets 0.14.
I want the first document to get a higher score in this case, since it has both "Ireland" and "Setter" in the same row. How can it be done?
With great help from the Elasticsearch Google Group members, a solution was found.
Here is the link to the discussion: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/4O9dff2SNhg