Search including special characters in MongoDB Atlas - mongodb-query

I faced with the issue when I try to search for several words including a special character (section sign "§").
Example: AB § 32.
I want all words "AB", "32" and symbol "§" to be included in found documents.
In some cases document can be found, in some not.
If my document contains the following text then search finds it:
Lagrum: 32 § 1 mom. första stycket a) kommunalskattelagen (1928:370) AB
But if document contains this text then search doesn't find:
Lagrum: 32 § 1 mom. första stycket AB
For symbol "§" I use UT8-encoding "\xc2\xa7".
Index uses "lucene.swedish" analyzer.
"Content": [
{
"analyzer": "lucene.swedish",
"minGrams": 4,
"tokenization": "nGram",
"type": "autocomplete"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
Query looks like:
{
"index": "test_index",
"compound": {
"filter": [
{
"text": {
"query": [
"111111111111"
],
"path": "ProductId"
}
},
],
"must": [
{
"autocomplete": {
"query": [
"AB"
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"\xc2\xa7",
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"32"
],
"path": "Content"
}
}
],
},
"count": {
"type": "lowerBound",
"threshold": 500
}
}
The question is what is wrong with the search and how can I get a correct result (return both above mentioned documents) ?

Focusing only on the content field, here is an index definition that should work for your requirements. The docs are here. Let me know if this works for you.
{
"mappings": {
"dynamic": false,
"fields": {
"content": [
{
"type": "autocomplete",
"tokenization": "nGram",
"minGrams": 4,
"maxGrams": 7,
"foldDiacritics": false,
"analyzer": "lucene.whitespace"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
}
}
}

Related

Not able to match same json which has mismatch in indexes of array of objects (Karate) [duplicate]

This question already has an answer here:
Is there a simple match for objects containing array where the array content order doesn't matter?
(1 answer)
Closed 1 year ago.
Trying to match two jsons, but getting test fails. Well, both jsons are the same but objects indexes inside the array are not same. I think should not make any difference. Following are two jsons:
This is the code line: And match response contains ScenarioModelResponse where
**response : **
{
"relationships": [
{
"sourceId": "36",
"targetId": "149",
"type": "Reid Enright"
}
],
"modelId": "027f93d1-ef9e-4f1e-b2c4-684436c5b18a",
"elements": [
{
"externalRefId": "36",
"attributes": {
"jsonPbject": "Reid Enright"
},
"id": "057f7b7e-11b9-4779-97c0-67485153c285",
"type": "Rocky Shore"
},
{
"externalRefId": "149",
"attributes": {
"jsonPbject": "Ben Lyon"
},
"id": "325b989e-b299-4cfc-86b5-0813106da38e",
"type": "Claire Voyance"
}
]
}
ScenarioModelResponse :
{
"relationships": [
{
"sourceId": "36",
"targetId": "149",
"type": "Reid Enright"
}
],
"modelId": "027f93d1-ef9e-4f1e-b2c4-684436c5b18a",
"elements": [
{
"externalRefId": "149",
"attributes": {
"jsonPbject": "Ben Lyon"
},
"id": "325b989e-b299-4cfc-86b5-0813106da38e",
"type": "Claire Voyance"
},
{
"externalRefId": "36",
"attributes": {
"jsonPbject": "Reid Enright"
},
"id": "057f7b7e-11b9-4779-97c0-67485153c285",
"type": "Rocky Shore"
}
]
}
This the error I am getting after execution :
$.elements[0].externalRefId | not equal (STRING:STRING)
'149'
'36'
The arrays are NOT the same. This can be solved in 2 lines:
* match response.relationships == expected.relationships
* match response.elements contains only expected.elements
For a detailed explanation, refer:
https://stackoverflow.com/a/65939070/143475
https://stackoverflow.com/a/55710769/143475

Nested "for loop" searches in SQL - Azure CosmosDB

I am using Cosmos DB and have a document with the following simplified structure:
{
"id1":"123",
"stuff": [
{
"id2": "stuff",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big ostrich",
"meta": 1
}
]
},
{
"id3": "default",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
}
My issue is - I have an array of these documents and need to search name to see if it matches my search word. For example I want both big trees and trees to return if a user types in trees.
So currently I push every document into an array and do the following:
For each document
for each stuff
for each a.b.c.d[0].e
for each classes
var splice = name.split(' ')
if (splice.includes(searchWord))
return id1, id2 and id3.
Using cosmosDB I am using SQL with the following code:
client.queryDocuments(
collection,
`SELECT * FROM root r`
).toArray((err, results) => {stuff});
This effectively brings every document in my collection into an array to perform the search manually above as mentioned.
This is going to cause issues when I have 1000s or 1,000,000s of documents in the array and I believe I should be leveraging the search mechanics available within Cosmos itself. Is anyone able to help me to work out what SQL query would be able to perform this type of function?
Having searched everything is it also possible to search the 5 latest documents?
Thanks for any insight in advance!
1.Is anyone able to help me to work out what SQL query would be able to
perform this type of function?
According to your sample and description, I suggest you using ARRAY_CONTAINS in cosmos db sql. Please refer to my sample:
sample documents:
[
{
"id1": "123",
"stuff": [
{
"id2": "stuff",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big ostrich",
"meta": 1
}
]
},
{
"id3": "default",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
},
{
"id1": "456",
"stuff": [
{
"id2": "stuff2",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things2",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "trees",
"meta": 1
}
]
},
{
"id3": "default2",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
},
{
"id1": "789",
"stuff": [
{
"id2": "stuff3",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things3",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big",
"meta": 1
}
]
},
{
"id3": "default3",
"name": "other",
"classes": [
{
"name": "big trees",
"meta": 1
}
]
}
]
}
]
}
}
}
}
]
}
]
query :
SELECT distinct c.id1,stuff.id2,e.id3 FROM c
join stuff in c.stuff
join d in stuff.a.b.c.d
join e in d.e
where ARRAY_CONTAINS(e.classes,{name:"trees"},true)
or ARRAY_CONTAINS(e.classes,{name:"big trees"},true)
output:
2.Having searched everything is it also possible to search the 5 latest
documents?
Per my research, features like LIMIT is not supported in cosmos so far. However , TOP is supported by cosmos db. So if you could add sort field(such as date or id), then you could use sql:
select top 5 from c order by c.sort desc

Elastic Search Query for displaying the documents when it contains all the set of specific properties

I am new to Elastic Search APIs. I have a requirement where i need to query and list the documents which compulsorily contains following properties, say
"request: "/v3?id=100000" & "type: "GET"
Result should contains list of documents containing both the above. I have tried the following and it gets either of the above.
{
"query": {
"match": {
"type": "GET"
}
}
}
I tried
{
"query": {
"match": {
"type": "GET",
"request: "/v3/id=100000"
}
}
}
It fails...
Can someone suggest me a query to list all the docs with both the properties set as above ? Not sure how to use filters, if I try it shows failures - parse exceptions.
My example document:
{
"_index": "logstash-2016.04.22",
"_type": "endpoint-access",
"_id": "fAhTQkDRQTiHKlzuleNA",
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2016-04-22T15:26:35.153Z",
"offset": "43714176",
"ident": "-",
"auth": "-",
"timestamp": "22/Apr/2016:15:26:35 +0000",
"type": "GET",
"request": "/v3?id=1b32e833-b521",
"httpversion": "1.1",
"response": "500",
"bytes": "265",
"referrer": "-",
"agent": "-",
"x_forwarded_for": "\"101.2.123.24\""
"host": "101.123.115.167"
},
"sort": [
1461338795153,
1461338795153
]
}
You may use "must" to get the result:
{
"query": {
"bool": {
"must": [
{
"match": {
"type": "GET"
}
},
{
"match": {
"request": "/v3/id=100000"
}
}
]
}
}
}

Cloudant Search Queries Index Function

I can't find very much documentation on how to properly define the index function such that I can do a full text search on the information that I need.
I've used the Alchemy API to add "entities" json to my documents.
For instance, I have a document with the following:
"_id": "redacted",
"_rev": "redacted",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
I haven't been able to find any code snippets showing how to construct a search index function that looks at nested json.
My stab at indexing the whole object appears to be incorrect.
This is the full design document:
{
"_id": "_design/entities",
"_rev": "redacted",
"views": {},
"language": "javascript",
"indexes": {
"entities": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.entities.relevance > 0.5){\n index(\"default\", doc.entities.text, {\"store\":\"yes\"});\n }\n\n}"
}
}
}
And the search index formatted a little bit more clearly is
function (doc) {
if (doc.entities.relevance > 0.5){
index("default", doc.entities.text, {"store":"yes"});
}
}
Adding the for loop as suggested below makes a lot of sense.
However, I still am not able to return any results.
My query is
"https://user.cloudant.com/calbills/_design/entities/_search/entities?q=Governors"
Server response is:
{"total_rows":0,"bookmark":"g2o","rows":[]}
The "for..in" style loop doesn't seem to work.
However, I do get results using the more standard for loop loops.
function (doc) {
if(doc.entities){
var arrayLength = doc.entities.length;
for (var i = 0; i < arrayLength; i++) {
if (parseFloat(doc.entities[i].relevance) > 0.5)
index("default", doc.entities[i].text);
}
}
}
Cheers!
Your need to loop on the elements in the doc.entities array.
function (doc) {
for(entity in doc.entities){
if (parseFloat(entity.relevance) > 0.5){
index("default", entity.text, {"store":"yes"});
}
}
}
This is what I tried :
function(doc){
if(doc.entities){
for( var p in doc.entities ){
if (doc.entities[p].relevance > 0.5)
{
index("entitiestext", doc.entities[p].text, {"store":"yes"});
}
}
}
}
Query String used :"q=entitiestext:California Constitution&include_docs=true"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
},
"doc": {
"_id": "redacted",
"_rev": "4-7f6e6db246abcf2f884dc0b91451272a",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
}
}
]
}
Query String used: q=entitiestext:California Constitution
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
}
}
]
}

elasticsearch / lucene highlight

I'm using ElasticSearch to index documents.
My mapping is:
"mongodocid": {
"boost": 1.0,
"store": "yes",
"type": "string"
},
"fulltext": {
"boost": 1.0,
"index": "analyzed",
"store": "yes",
"type": "string",
"term_vector": "with_positions_offsets"
}
To highlight the complete fulltext I am setting number_of_framgments to 0.
If I do the following Lucene-like string query:
{
"highlight": {
"pre_tags": "<b>",
"fields": {
"fulltext": {
"number_of_fragments": 0
}
},
"post_tags": "</b>"
},
"query": {
"query_string": {
"query": "fulltext:test"
}
},
"size": 100
}
For some documents in the result set the length of the highlighted fulltext is smaller than the fulltext itself.
Since I am setting number_of_fragments to 0 and pre_tags/post_tags are added this should not happen.
Now comes the strange behaviour: If I only search for one of the failing elements by doing this:
{
"highlight": {
"pre_tags": "<b>",
"fields": {
"fulltext": {
"number_of_fragments": 0
}
},
"post_tags": "</b>"
},
"query": {
"query_string": {
"query": "fulltext:test AND mongodocid:4d0a861c2ebef6032c00b1ec"
}
},
"size": 100
}
then all works fine.
Any ideas?
Sounds like issue which has been fixed in 0.14.0 (see #479). As of writing the 0.14.0 hasn't been released yet, can you try master?