How to know whether an email address exists in Elasticsearch? - api

So I have added some data (email addresses with names) to Elasticsearch. Now I want to verify whether a particular email address exists or not.
Below is the request I executed via Postman:
**URL:** localhost:9200/demo1/demo_emails/_bulk (PUT request)
**Raw data (JSON):**
{ "create" : { "_index" : "demo1", "_type" : "Supp_emails", "_id" : "2" } }
{ "name" : "x1", "email": "x1#r.com" }
{ "create" : { "_index" : "demo1", "_type" : "Supp_emails", "_id" : "3" } }
{ "name" : "x2", "email": "x2#r.com" }
You can see "demo1" is the index and "demo_emails" is the type, and I have added two email addresses to that index.
Now I want to verify whether 'x1#r.com' exists or not.
I have tried the query below, but it returns all the documents instead of just the one matching email.
**URL:** localhost:9200/demo1/Supp_emails/_search?q=email:x1#r.com (GET request)
**Output:**
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.87546873,
"hits": [
{
"_index": "demo1",
"_type": "demo_emails",
"_id": "1",
"_score": 0.87546873,
"_source": {
"name": "x1",
"email": "x1#r.com"
}
},
{
"_index": "demo1",
"_type": "demo_emails",
"_id": "2",
"_score": 0.87546873,
"_source": {
"name": "x2",
"email": "x2#r.com"
}
}
]
}
}

When you search for email:x1#r.com, the tokenizer breaks the string up into tokens. Which tokens you get depends on the analyzer and tokenizer you've used.
I'm assuming you didn't do any fancy analysis, so the tokens would be:
GET index/_analyze
{
"analyzer": "standard",
"text": "x1#r.com"
}
which returns:
{
"tokens": [
{
"token": "x1",
"start_offset": 0,
"end_offset": 2,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "r.com",
"start_offset": 3,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 1
}
]
}
If you take a look at the documentation of query_string you'll find that the default operator is OR.
So your query actually looks for x1 OR r.com, which is why both of these documents are returned.
You can see this by adding the parameter http://elastic...?q=...&explain=true.
How do you solve this issue? Use a different operator. Just change the default operator to AND: http://elastic...?q=...&default_operator=AND
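For example, here is a minimal sketch using the index and type from the question, written as a request body (which also avoids having to URL-encode the # in the query string; with the URI form you would need %23):
POST localhost:9200/demo1/Supp_emails/_search
{
"query": {
"query_string": {
"query": "email:x1#r.com",
"default_operator": "AND"
}
}
}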
Useful links:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/indices-analyze.html
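Side note: if the goal is only an exact existence check, a term query against an un-analyzed version of the field is usually more reliable than depending on how the analyzer splits the address. This is just a sketch and assumes the mapping has a keyword sub-field (called email.keyword here), which may not exist in your index:
POST localhost:9200/demo1/Supp_emails/_search
{
"size": 0,
"query": {
"term": {
"email.keyword": "x1#r.com"
}
}
}
If hits.total comes back greater than 0, the address exists.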

Related

Karate API JSON response - How to validate the presence of a key which sometimes comes and sometimes does not in the API response

I need help validating the presence of one key in the response. The response of the API looks like this:
{
"persons": [
{
"id": "27",
"source": {
"personId": 281,
"emailAddress": "abc#abc.com",
"firstName": "Steve"
}
},
{
"id": "28",
"source": {
"personId": 353,
"emailAddress": "abcd#abc.com",
"firstName": "John"
"LastName" : "Cena"
}
}
]
}
I want to assert whether source.LastName appears or not, and if it does, it should always hold a string value.
The solution should be generic so that it also works for 30 or 40 persons objects down the line. I am currently using Karate version 0.9.4 and need a way to handle such scenarios.
Thanks in advance!
In the schema, "##string" validates that the field is optional: it may be missing or null, and if present it must be a string.
Sample Code:
Feature: Schema validation
Scenario:
* def resp =
"""
{
"persons": [
{
"id": "27",
"source": {
"personId": 281,
"emailAddress": "abc#abc.com",
"firstName": "Steve"
}
},
{
"id": "28",
"source": {
"personId": 353,
"emailAddress": "abcd#abc.com",
"firstName": "John",
"LastName" : "Cena"
}
}
]
}
"""
* def schema =
"""
{
"id": "#string",
"source": {
"personId": "#number",
"emailAddress": "#string",
"firstName": "#string",
"LastName": "##string"
}
}
"""
* match each resp.persons == schema
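As a sketch of how the same schema could be applied to the live API response instead of a hard-coded def (the URL below is just a placeholder; match each already scales to any number of persons objects):
Feature: Schema validation against the real endpoint
Scenario:
* def schema =
"""
{
"id": "#string",
"source": {
"personId": "#number",
"emailAddress": "#string",
"firstName": "#string",
"LastName": "##string"
}
}
"""
Given url 'https://your-api.example.com/persons'
When method get
Then status 200
And match each response.persons == schema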

How to find matching elements between two collections in MongoDB?

I am working with a MongoDB database, but I am a little stuck on one piece of logic: how do I find matching elements between two collections in MongoDB?
Users Collection
[{
"_id": "57cd539d168df87ae2695543",
"userid": "3658975589",
"name": "John Doe",
"email": "johndoe#gmail.com",
"number": "123654789"
}, {
"_id": "57cd53e6168df87ae2695544",
"userid": "789456123",
"name": "William Rust",
"email": "williamrust#gmail.com",
"number": "963258741"
}]
Contacts Collection
[{
"_id": "57cd2f6c3966037787ce9550",
"contact": [{
"id": "457899979",
"fullname": "Abcd Hello",
"phonenumber": "123575784565",
"currentUserid": "123456789"
}, {
"id": "7994949849",
"fullname": "Keyboard Mouse",
"phonenumber": "23658974262",
"currentUserid": "123456789"
}, {
"id": "7848848885",
"fullname": "John Doe",
"phonenumber": "852147852",
"currentUserid": "123456789"
}]
}]
So I want to find the elements whose phone numbers match across these two collections, and list those elements with their name and email.
Please go through my post and suggest a solution.
I'm guessing that what you want to do is an aggregation with $lookup. Something like this:
db.users.aggregate([{$lookup:
{
from: "contacts",
localField: "number",
foreignField: "phonenumber",
as: "same"
}
},
{
$match: { "same": { $ne: [] } }
}
])
As a result you get:
{
"_id" : "57cd539d168df87ae2695543",
"userid" : "3658975589",
"name" : "Anshuman Pattnaik",
"email" : "anshuman#gmail.com",
"number" : "7022650603",
"same" : [
{
"_id" : ObjectId("5b361b864aa5144b974c9733"),
"id" : "7848848885",
"fullname" : "Anshuman Pattnaik",
"phonenumber" : "7022650603",
"currentUserid" : "123456789"
}
]
}
If you want to show only the name and the email, you have to add a { $project: { name: 1, email: 1, _id: 0 } } stage:
db.users.aggregate([{$lookup:
{
from: "contacts",
localField: "number",
foreignField: "phonenumber",
as: "same"
}
},
{
$match: { "same": { $ne: [] } }
},
{ $project: { name: 1, email: 1, _id: 0 } }
])
Then you'll get:
{ "name" : "Anshuman Pattnaik", "email" : "anshuman#gmail.com" }
For this to work, you have to correct the insert of your contacts (so the phone numbers are top-level fields rather than nested inside the contact array), like this:
db.contacts.insert(
[{
"id": "457899979",
"fullname": "Abcd Hello",
"phonenumber": "123575784565",
"currentUserid": "123456789"
}, {
"id": "7994949849",
"fullname": "Keyboard Mouse",
"phonenumber": "23658974262",
"currentUserid": "123456789"
}, {
"id": "7848848885",
"fullname": "Anshuman Pattnaik",
"phonenumber": "7022650603",
"currentUserid": "123456789"
}]
)
Hope it works!
For more information https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
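If you would rather keep the contacts documents nested exactly as they are in the question, $lookup can also match on the dotted path into the array. A sketch of that variant (untested against your exact data; note that "same" will then contain the whole matching contacts document, contact array and all):
db.users.aggregate([
{ $lookup:
{
from: "contacts",
localField: "number",
foreignField: "contact.phonenumber",
as: "same"
}
},
{ $match: { "same": { $ne: [] } } },
{ $project: { name: 1, email: 1, _id: 0 } }
])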
This is not a complete answer, but it may help you solve your problem.
You can compare two collections document-by-document using the function below. For more details see this answer.
var compareCollections = function(){
    db.users.find().forEach(function(obj1){
        db.contacts.find({/* if you know some properties, you can put them here... if not, leave this empty */}).forEach(function(obj2){
            var equals = function(o1, o2){
                // some code comparing o1 and o2.
            };
            if(equals(obj1, obj2)){
                // Do what you want to do
            }
        });
    });
};
db.eval(compareCollections); // note: db.eval is deprecated in modern MongoDB; you can run the same loop directly in the mongo shell instead

Transform JSON response with lodash

I'm new to lodash (v3.10.1), and having a hard time understanding it.
Hope someone can help.
I have an input that looks something like this:
[
{"id":1,"name":"Matthew","company":{"id":1,"name":"abc","industry":{"id":5,"name":"Medical"}}},
{"id":2,"name":"Mark","company":{"id":1,"name":"abc","industry":{"id":5,"name":"Medical"}}},
{"id":3,"name":"Luke","company":{"id":1,"name":"abc","industry":{"id":5,"name":"Medical"}}},
{"id":4,"name":"John","company":{"id":1,"name":"abc","industry":{"id":5,"name":"Medical"}}},
{"id":5,"name":"Paul","company":{"id":1,"name":"abc","industry":{"id":5,"name":"Medical"}}}
];
I would like to output this or close to this:
{
"industries": [
{
"industry":{
"id":5,
"name":"Medical",
"companies": [
{
"company":{
"id":1,
"name":"abc",
"employees": [
{"id":1,"name":"Matthew"},
{"id":2,"name":"Mark"},
{"id":3,"name":"Luke"},
{"id":4,"name":"John"},
{"id":5,"name":"Paul"}
]
}
}
]
}
}
]
}
Here's something that gets you close to what you want. I structured the output to be an object instead of an array. You don't need the industries or industry properties in your example output. The output structure looks like this:
{
"industry name": {
"id": "id of industry",
"companies": [
{
"company name": "name of company",
"id": "id of company",
"employees": [
{
"id": "id of company",
"name": "name of employee"
}
]
}
]
}
}
I use the _.chain function to wrap the collection with a lodash wrapper object. This enables me to explicitly chain lodash functions.
From there, I use the _.groupBy function to group elements of the collection by their industry name. Since I'm chaining, I don't have to pass in the array again to the function. It's implicitly passed via the lodash wrapper. The second argument of the _.groupBy is the path to the value I want to group elements by. In this case, it's the path to the industry name: company.industry.name. _.groupBy returns an object with each employee grouped by their industry (industries are keys for this object).
I then use _.transform to transform each industry group. _.transform is essentially _.reduce, except that the result returned from _.transform is always an object.
The function passed to _.transform gets executed against each key/value pair in the object. In the function, I use _.groupBy again to group employees by company. Based on the results of _.groupBy, I map the values to the final structure I want for each company and its employees.
I then call the _.value function because I want to unwrap the output collection from the lodash wrapper object.
I hope this made sense. If it doesn't, I highly recommend reading Lo-Dash Essentials. After reading the book, I finally got why lodash is so useful.
"use strict";
var _ = require('lodash');
var emps = [
{ "id": 1, "name": "Matthew", "company": { "id": 1, "name": "abc", "industry": { "id": 5, "name": "Medical" } } },
{ "id": 2, "name": "Mark", "company": { "id": 1, "name": "abc", "industry": { "id": 5, "name": "Medical" } } },
{ "id": 3, "name": "Luke", "company": { "id": 1, "name": "abc", "industry": { "id": 5, "name": "Medical" } } },
{ "id": 4, "name": "John", "company": { "id": 1, "name": "abc", "industry": { "id": 5, "name": "Medical" } } },
{ "id": 5, "name": "Paul", "company": { "id": 1, "name": "abc", "industry": { "id": 5, "name": "Medical" } } }
];
var result = _.chain(emps)
.groupBy("company.industry.name")
.transform(function(result, employees, industry) {
result[industry] = {};
result[industry].id = _.get(employees[0], "company.industry.id");
result[ industry ][ 'companies' ] = _.map(_.groupBy(employees, "company.name"), function( employees, company ) {
return {
company: company,
id: _.get(employees[ 0 ], 'company.id'),
employees: _.map(employees, _.partialRight(_.pick, [ 'id', 'name' ]))
};
});
return result;
})
.value();
Results from your example are as follows:
{
"Medical": {
"id": 5,
"companies": [
{
"company": "abc",
"id": 1,
"employees": [
{
"id": 1,
"name": "Matthew"
},
{
"id": 2,
"name": "Mark"
},
{
"id": 3,
"name": "Luke"
},
{
"id": 4,
"name": "John"
},
{
"id": 5,
"name": "Paul"
}
]
}
]
}
}
If you ever wanted the exact same structure as in the question, I solved it using the jsonata library:
(
/* lets flatten it out for ease of accessing the properties*/
$step1 := $ ~> | $ |
{
"employee_id": id,
"employee_name": name,
"company_id": company.id,
"company_name": company.name,
"industry_id": company.industry.id,
"industry_name": company.industry.name
},
["company", "id", "name"] |;
/* now the magic begins*/
$step2 := {
"industries":
[($step1{
"industry" & $string(industry_id): ${
"id": $distinct(industry_id)#$I,
"name": $distinct(industry_name),
"companies": [({
"company" & $string(company_id): {
"id": $distinct(company_id),
"name": $distinct(company_name),
"employees": [$.{
"id": $distinct(employee_id),
"name": $distinct(employee_name)
}]
}
} ~> $each(function($v){ {"company": $v} }))]
}
} ~> $each(function($v){ {"industry": $v} }))]
};
)
You can see it in action on the live demo site: https://try.jsonata.org/VvW4uTRz_

ElasticSearch - return the complete value of a facet for a query

I've recently started using ElasticSearch. I'm trying to cover some use cases, and I have a problem with one of them.
I have indexed some users with their full name (e.g. "Jean-Paul Gautier", "Jean De La Fontaine").
I am trying to get all the full names matching some query.
For example, I want the 100 most frequent full names beginning with "J":
{
"query": {
"query_string" : { "query": "full_name:J*" } }
},
"facets":{
"name":{
"terms":{
"field": "full_name",
"size":100
}
}
}
}
The result I get is all the individual words of the full names: "Jean", "Paul", "Gautier", "De", "La", "Fontaine".
How do I get "Jean-Paul Gautier" and "Jean De La Fontaine" (all the full_name values beginning with 'J')? The "post_filter" option does not do this; it only restricts the above subset.
Do I have to configure how this full_name facet works?
Do I have to add some options to the current query?
Do I have to do some "mapping" (very obscure to me for the moment)?
Thanks
You just need to set "index": "not_analyzed" on the field, and you will be able to get back the full, unmodified field values in your facet.
Typically, it's nice to have one version of the field that isn't analyzed (for faceting) and another that is (for searching). The "multi_field" field type is useful for this.
So in this case, I can define a mapping as follows:
curl -XPUT "http://localhost:9200/test_index/" -d'
{
"mappings": {
"people": {
"properties": {
"full_name": {
"type": "multi_field",
"fields": {
"untouched": {
"type": "string",
"index": "not_analyzed"
},
"full_name": {
"type": "string"
}
}
}
}
}
}
}'
Here we have two sub-fields. The one with the same name as the parent will be the default, so if you search against the "full_name" field, Elasticsearch will actually use "full_name.full_name". "full_name.untouched" will give you the facet results you want.
So next I add two documents:
curl -XPUT "http://localhost:9200/test_index/people/1" -d'
{
"full_name": "Jean-Paul Gautier"
}'
curl -XPUT "http://localhost:9200/test_index/people/2" -d'
{
"full_name": "Jean De La Fontaine"
}'
And then I can facet on each field to see what is returned:
curl -XPOST "http://localhost:9200/test_index/_search" -d'
{
"size": 0,
"facets": {
"name_terms": {
"terms": {
"field": "full_name"
}
},
"name_untouched": {
"terms": {
"field": "full_name.untouched",
"size": 100
}
}
}
}'
and I get back the following:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"facets": {
"name_terms": {
"_type": "terms",
"missing": 0,
"total": 7,
"other": 0,
"terms": [
{
"term": "jean",
"count": 2
},
{
"term": "paul",
"count": 1
},
{
"term": "la",
"count": 1
},
{
"term": "gautier",
"count": 1
},
{
"term": "fontaine",
"count": 1
},
{
"term": "de",
"count": 1
}
]
},
"name_untouched": {
"_type": "terms",
"missing": 0,
"total": 2,
"other": 0,
"terms": [
{
"term": "Jean-Paul Gautier",
"count": 1
},
{
"term": "Jean De La Fontaine",
"count": 1
}
]
}
}
}
As you can see, the analyzed field returns single-word, lower-cased tokens (when you don't specify an analyzer, the standard analyzer is used), and the un-analyzed sub-field returns the unmodified original text.
Here is a runnable example you can play with:
http://sense.qbox.io/gist/7abc063e2611846011dd874648fd1b77450b19a5
Try altering the mapping for "full_name":
"properties": {
"full_name": {
"type": "string",
"index": "not_analyzed"
}
...
}
not_analyzed means that the value is kept as is (capitals, spaces, dashes, etc.), so that "Jean De La Fontaine" stays findable as a whole and is not tokenized into "Jean", "De", "La", "Fontaine".
You can experiment with different analyzers using the _analyze API.
Notice what the standard one does to a multi-part name:
GET /_analyze?analyzer=standard
{'Jean Claude Van Dame'}
{
"tokens": [
{
"token": "jean",
"start_offset": 2,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "claude",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "van",
"start_offset": 14,
"end_offset": 17,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "dame",
"start_offset": 18,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 4
}
]
}

Query for missing fields in nested documents

I have a user document which contains many tags
Here is the mapping:
{
"user" : {
"properties" : {
"tags" : {
"type" : "nested",
"properties" : {
"id" : {
"type" : "string",
"index" : "not_analyzed",
"store" : "yes"
},
"current" : {
"type" : "boolean"
},
"type" : {
"type" : "string"
},
"value" : {
"type" : "multi_field",
"fields" : {
"value" : {
"type" : "string",
"analyzer" : "name_analyzer"
},
"value_untouched" : {
"type" : "string",
"index" : "not_analyzed",
"include_in_all" : false
}
}
}
}
}
}
}
}
Here are the sample user documents:
User 1
{
"created_at": 1317484762000,
"updated_at": 1367040856000,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
},
{
"type": "company",
"value": "alma connect",
"id": "58ad4afcc8415216ea451339aaecf311ed40e132"
},
{
"type": "company",
"value": "Google",
"id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5",
"current": true
},
{
"type": "discipline",
"value": "B.Tech.",
"id": "a7706af7f1477cbb1ac0ceb0e8531de8da4ef1eb",
"institute_id": "4fb424a5addf32296f00013a"
}
]
}
User 2:
{
"created_at": 1318513355000,
"updated_at": 1364888695000,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361829"
},
{
"type": "college",
"value": "Bharatiya Vidya Bhavan's Public School, Jubilee hills, Hyderabad",
"id": "d20730345465a974dc61f2132eb72b04e2f5330c"
},
{
"type": "company",
"value": "Alma Connect",
"id": "93bc8199c5fe7adfd181d59e7182c73fec74eab5"
},
{
"type": "sector",
"value": "Website and Software Development",
"id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a4"
}
]
}
User 3:
{
"created_at": 1318513355001,
"updated_at": 1364888695010,
"tags": [
{
"type": "college",
"value": "Dhirubhai Ambani Institute of Information and Communication Technology",
"id": "a6f51ef8b34eb8f24d1c5be5e4ff509e2a361821"
},
{
"type": "sector",
"value": "Website and Software Development",
"id": "dc387d78fc99ab43e6ae2b83562c85cf3503a8a1"
}
]
}
Using the above ES documents, I want to construct a query that fetches users who have a matching company tag in the nested tag documents, as well as users who do not have any company tag at all. What should my search query be?
For example, in the above case, if I search for the google tag, then the returned documents should be 'user 1' and 'user 3' (as user 1 has the company tag google and user 3 has no company tag at all). User 2 is not returned because its only company tag is something other than google.
Not trivial at all, mainly due to the "does not have any type:company tag" clause. Here's what I came up with:
{
"or" : {
"filters" : [ {
"nested" : {
"filter" : {
"and" : {
"filters" : [ {
"term" : {
"tags.value" : "google"
}
}, {
"term" : {
"tags.type" : "company"
}
} ]
}
},
"path" : "tags"
}
}, {
"not" : {
"filter" : {
"nested" : {
"filter" : {
"term" : {
"tags.type" : "company"
}
},
"path" : "tags"
}
}
}
} ]
}
}
It contains an or filter with two nested clauses: the first one finds the documents that have tags.type:company and tags.value:google, while the second one finds all the documents that don't have any tags.type:company.
This needs to be optimized though, since and/or/not filters don't take advantage of caching for filters that work with bitsets, like the term filter does. It would be best to take some more time to find a way to use a bool filter and obtain the same result. Have a look at this article to know more.
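A sketch of what that bool version might look like (the same logic rewritten with should / must_not; untested, with the field and path names taken straight from the filter above):
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"filter": {
"bool": {
"must": [
{ "term": { "tags.type": "company" } },
{ "term": { "tags.value": "google" } }
]
}
}
}
},
{
"bool": {
"must_not": {
"nested": {
"path": "tags",
"filter": { "term": { "tags.type": "company" } }
}
}
}
}
]
}
}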