Scoring documents in Lucene 6.2.0

Scoring documents in Lucene 6.2.0 - lucene

My query in lucene 6.2.0 goes like:
query query = new PhraseQuery.Builder()
.add(new Term("country","russia"))
.setSlop(1)
.build();
Basically among all my documents which are:
{
"_id" : ObjectId("586b723b4b9a835db416fa26"),
"name" : "test",
"countries" : {
"country" : [
{
"name" : "russia"
},
{
"name" : "USA china"
}
]
}
}
{
"_id" : ObjectId("586b73f24b9a835fefb10ca5"),
"name" : "nitika jain",
"countries" : {
"country" : [
{
"name" : "russia and denmrk"
},
{
"name" : "USA china"
}
]
}
}
{
"_id" : ObjectId("586b744f4b9a835fefb10ca7"),
"name" : "arjun",
"countries" : {
"country" : [
{
"name" : "russia pakistan"
},
{
"name" : "india iraq"
}
]
}
}
I want a document which has only russia. Ideally it should be the one highest scored, but instead I get something like "Found 3 hits."
Document<stored,indexed,tokenized<id:586b723b4b9a835db416fa26> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name:test> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<countries:{ "country" : [ { "name" : "russia"} , { "name" : "USA china"}]}> stored,indexed,tokenized<country:russia> stored,indexed,tokenized<country:USA china>>**0.12874341**
Document<stored,indexed,tokenized<id:586b73f24b9a835fefb10ca5> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name:nitika jain> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<countries:{ "country" : [ { "name" : "russia and denmrk"} , { "name" : "USA china"}]}> stored,indexed,tokenized<country:russia and denmrk> stored,indexed,tokenized<country:USA china>>**0.12874341**
Document<stored,indexed,tokenized<id:586b744f4b9a835fefb10ca7> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name:arjun> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<countries:{ "country" : [ { "name" : "russia pakistan"} , { "name" : "india iraq"}]}> stored,indexed,tokenized<country:russia pakistan> stored,indexed,tokenized<country:india iraq>>**0.12874341**
All 3 results are equally scored. How can I get the document with only russia to be highest scored?

In Phrase queries, the slop is zero by default, requiring exact matches. that means that if you modify your query in this way:
query query = new PhraseQuery.Builder()
.add(new Term("country","russia"))
.build();
you'll get what you're looking for.

Related

How could I create indexes in postgres using jsonb?

I have a table in my database as follows
my_table:jsonb
[ {
"name" : "world map",
"type" : "activated",
"map" : [ {
"displayOrder" : 0,
"value" : 123
}, {
"displayOrder" : 1,
"value" : 456
}, {
"displayOrder" : 2,
"value" : 789
} ]
}, {
"name" : "regional map",
"type" : "disabled"
} ]
I would like to create indices for the name, type and displayOrder fields, which would be the best way?

Remove Subdocument items with $pull

I'm trying to remove items from subdocuments using ExpressJS and Mongoose but it is only removing the first items, not the sub items.
So I want to remove "subitem 2" in the messages Array
This is the structure:
{
"_id" : ObjectId("5c4ee94b30ebd71cbed89a35"),
"title" : "Test",
"subitem" : [
{
"_id" : ObjectId("5c4ee95630ebd71cbed89a36"),
"title" : "Item 1",
"messages" : [
{
"_id" : ObjectId("5c4ee95f30ebd71cbed89a37"),
"type" : "single_article",
"date" : "Jan 28, 2019",
"title" : "subitem 1",
"text" : ""
}
]
},
{
"_id" : ObjectId("5c4ee96830ebd71cbed89a38"),
"title" : "item 2",
"messages" : [
{
"_id" : ObjectId("5c4ee96e30ebd71cbed89a39"),
"type" : "single_article",
"date" : "Jan 28, 2019",
"title" : "subitem 2",
"text" : ""
}
]
}
],
"__v" : 0
}
And this is the $pull method:
getController.deleteRec = function(req,res,collection){
var id = req.params.id;
console.log(id);
collection.updateOne({'subitem.messages._id': id}, {$pull: {'subitem.0.messages': {"_id": id}}}).
then(function(result){
console.log(result);
});
};
Now I know why it is only deleting the first item because I have "subitem.0.messages". How can I loop over this, so it can delete all items?

You can use $ as a wildcard index, removing all elements in the array matching your query like this:
{$pull: {'subitem.$.messages': {"_id": id}}}
if you want to remove multiple documents:
{$pull: {'subitem.$.messages': {"_id": {$in : [id, id2, id3...]}}}}

Mongodb: Pull corresponding value from another collection

I have two diff collections:
CollectionA
{
"_id" : 1.0,
"1234" : "GROUP"
}
{
"_id" : 2.0,
"2345" : "SUBGROUP"
}
CollectionB
{
"_id" : 1.0,
"config" : "1234",
"description" : "DCS"
}
{
"_id" : 2.0,
"config" : "2345",
"description" : "BCS"
}
I was expecting the below output when i write a find query by joining the two collections. Can we able to get the requested output by using $lookup function?
{
"_id" : 1.0,
"config" : "GROUP",
"description" : "DCS",
}
{
"_id" : 2.0,
"config" : "SUBGROUP",
"description" : "BCS",
}

You can implement your query like this:
db.getCollection('a').aggregate([{
$lookup: {
from: 'b',
localField: '_id',
foreignField: '_id',
as: 'data'
}
},
{
$unwind: '$data'
},
{
$project: {
config: '$1234', //In your schema you may have the same key instead of 1234 and 2345
description: '$data.description'
}
}
]);

db.find vs db.aggregation to select nested array Object

I'v tried to perform the following query :
db.getCollection('fxh').find({"username": "user1", "pf.acc.accnbr" : 915177},{userid: true, "pf.pfid": true, "pf.acc.accid":true})
and my collection is the following :
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"username" : "user1",
"email" : "user1#gmail.com",
"fullname" : "User 1",
"pf" : {
"acc" : [
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496513941
},
"status" : "New",
"name" : "QPT202017_M1",
"cycid" : 1496513940
}
],
"status" : "New",
"accnbr" : 915177,
"accid" : 1496513939
},
{
"cyc" : [
{
"det" : {
"status" : "New",
"dcycid" : 1496552643
},
"status" : "New",
"name" : "QPT202017_S8",
"cycid" : 1496552642
}
],
"status" : "New",
"accnbr" : 73497,
"accid" : 1496552641
}
],
"pfid" : 1496513935,
},
"lastupdate" : ISODate("2017-06-03T18:18:55.080Z"),
"__v" : 0
}
When I execute the query the result is the following :
{
"_id" : ObjectId("5932fd8f381d4c0a7de21942"),
"userid" : 1496513894,
"portfolio" : {
"acc" : [
{
"accid" : 1496513939
},
{
"accid" : 1496552641
}
],
"pfid" : 1496513935
}
}
And my problem is that I need to see only the concerned accid and the result returns the all accid !.
Any idea how just to return the selected accid of accnbr ?
NB : I have also tried to add $ sign at the end of my query , it
selects the right acc but it returns the all objects or I need just
only ONE returned object.
On 6/5/17
I also used the aggregate command instead of find and it get result by using this :
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc"} , { $match : {"username":"adh1", "pf.acc.accbr": 915177 } }, {$project : {_id:0, accid: "$pf.acc.accid"}}])
But could NOT get a lower level result, when I ran this :
db.getCollection('fxh').aggregate([ { $unwind : "$pf.acc.cyc"} , { $match : {"username":"adh1", "pf.acc.accbr": 915177, "pf.acc.cyc.name": "QPT202017_M1" } }, {$project : {_id:0, cycid: "$pf.acc.cyc.cycid"}}])
Any idea ?

You can try the below aggregation pipeline.
The idea is to $unwind one nested level at a time, starting from the outermost to the innermost.
For each nested level unwinding, you can apply the$match to limit the documents and continue till you have the desired shape.
You can $group it together at the end to get back to the original shape.
db.getCollection('fxh').aggregate([
{ $match : {"username":"adh1"} },
{ $unwind : "$pf.acc"} ,
{ $match : {"pf.acc.accbr": 915177 } },
{ $unwind : "$pf.acc.cyc"},
{ $match : {"pf.acc.cyc.name": "QPT202017_M1" } },
{$project : {_id:0, accid: "$pf.acc.accid", cycid: "$pf.acc.cyc.cycid"}}])

Custom score in Elastic Search

I have an index, where each document represents a class, with list of students and a list of teachers.
Here is the mapping:
{
"properties" : {
"students" : {
"include_in_root" : 1,
"type" : "nested",
"properties" : {
"name" : {
"first" : "text",
"last" : "text",
},
},
"email" : { "type" : "string", "index" : "not_analyzed" },
},
"teachers" : {
"include_in_root" : 0,
"include_in_all" : 0,
"type" : "nested",
"properties" : {
"name" : {
"title" : { "type" : "string", "index" : "not_analyzed" },
"first" : "text",
"last" : "text",
},
},
},
},
}
I would like to influence Elastic Search score function, such that it will give higher score to documents with matched techers.
For example, if I have search for "Smith", I would like the documents with teacher Smith be with higher score than documents with students Smith.
Does anyone knows how it should be done in Elastic Search?
In other words, how can I return the results ordered by some logic?
Thanks in advance!

Answered here:
https://groups.google.com/forum/?fromgroups#!search/%22Use$20a$20bool$20query$20to$20run$20two$20nested$20queries%22/elasticsearch/XTAb7eHcbqI/UgnKDh7ou48J

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Scoring documents in Lucene 6.2.0 - lucene

In Phrase queries, the slop is zero by default, requiring exact matches. that means that if you modify your query in this way: query query = new PhraseQuery.Builder() .add(new Term("country","russia")) .build(); you'll get what you're looking for.

Related

How could I create indexes in postgres using jsonb?

Remove Subdocument items with $pull

Mongodb: Pull corresponding value from another collection

db.find vs db.aggregation to select nested array Object

Custom score in Elastic Search

Categories

Resources