ranked full text search results using Lucene with modeshape

ranked full text search results using Lucene with modeshape - lucene

I'm trying to get full-text search working with modeshape. I'm particularly interested in ranked results based on lucene index. Here is my repository configuration
"indexProviders": {
"lucene": {
"classname": "lucene",
"directory": "${user.home}/repository/indexes"
}
},
"indexes": {
"textFromFiles": {
"kind": "text",
"provider": "lucene",
"nodeType": "nt:resource",
"columns": "jcr:data(BINARY)"
}
},
I noticed a lucene index created at the specified location. I added 10-15 filesc with varied number of occurrence of search term into repository, and tried searching using some words. I am printing the score as shown below
QueryManager querymgr = session.getWorkspace().getQueryManager();
String query = "SELECT file.* FROM [nt:hierarchyNode] as file LEFT JOIN [nt:resource] as data ON ISCHILDNODE(data , file) WHERE "
+ "contains(data.*, '" + searchText + "')";
Query createQuery = querymgr.createQuery(query, Query.JCR_SQL2);
QueryResult result = createQuery.execute();
RowIterator rows = result.getRows();
while(rows.hasNext()){
Row nextRow = rows.nextRow();
LOGGER.info("score : {}", nextRow.getScore());
}
But, here score is always 1.0 for all results.
Also tried a simpler query without join...
SELECT data.* FROM [nt:resource] as data WHERE contains(data.*, 'searchterm')
but no luck

Related

Query for entire JSON document in nested JSON schema

Background:
I wish to locate the entire JSON document that has a condition where "state" = "new" and where length(Features.id) > 4
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
This is what I have tried to do:
Since this is a nested document. My query looks like this:
A stackoverflow member has helped me to access the nested contents within the query, but is there a way to obtain the full document
I have used:
SELECT VALUE t.id FROM t IN f.feedback.Features where t.state = 'new' and length(t.id)>4
This will give me the ids.
My desire is to have access to the full document with this condition?
{
"id": "123"
"feedback": {
"Features": [
{
"state": "new"
"id": "12345"
}
]
}
}
Any help is appreciated

Try this
SELECT *
FROM f
WHERE
f.feedback.Features[0].state = 'new'
AND length(f.feedback.Features[0].id)>4
Here is the SELECT spec for CosmosDB for more details
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-select
Also, check out "working with JSON" in CosmosDB notes
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-working-with-json
If the Features array has more than 1 value, you can use EXISTS clause to search within them. See specs of EXISTS here with examples:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#exists-expression

SQL query as string in JSON file

It is possible to store SQL query in JSON array of objects?? Because when i have something like this:
[{
"id": "1",
"query": "SELECT ID FROM table"
},
{
"id": "2",
"query": "SELECT ID FROM table"
},
{
"id": "3",
"query": "SELECT USER FROM table"
}
]
JSON file in VSCode is ok no error it is getting nasty when i want to store complex queries with joins etc.
for example this query even if i format it correctly it will generate error in JSON file about formatting
(just example i not it is not valid)
SELECT user, id, , count(price) as numrev
FROM price
where id = 1 and user = 0
group by user, id, price
that it can't be stored in string

It is bit easy to do, but requires on extra step.
Simply convert/encode you raw SQL queries in base64 text.
Decode the text before you execute the queries in you code.
If the JSON file is created automatically by a program/code
All most all programming languages proved base64 encode / decode functions as part of the core if not download compatible package / library to achieve this automation
var queries = [{
"id": "1",
"query": "U0VMRUNUIElEIEZST00gdGFibGU="
},
{
"id": "2",
"query": "U0VMRUNUIElEIEZST00gdGFibGU="
},
{
"id": "3",
"query": "U0VMRUNUIFVTRVIgRlJPTSB0YWJsZQ=="
},
{
"id": "4",
"query": "U0VMRUNUIHVzZXIsIGlkLCAsIGNvdW50KHByaWNlKSBhcyBudW1yZXYKICBGUk9NIHByaWNlCiAgd2hlcmUgaWQgPSAxIGFuZCB1c2VyID0gMCAKICBncm91cCBieSB1c2VyLCBpZCwgcHJpY2U="
}
];
for (i = 0; i < queries.length; i++) {
console.log("id = " + queries[i].id + ", query = " + atob(queries[i].query));
}
when you parse you JSON array make sure to decode before you execute the SQL queries.
let me know this one helped you.. ☺☻☺
FYI , refer http://www.utilities-online.info/base64/
enter image description here

Filter datetime value within a JSON collection of a collection inside CosmoDB using SQL

Using Microsoft CosmoDBs SQL like syntax. I have a collection of entries that follow a schema like this (simplified for this post)
{"id":"123456",
"activities": {
"activityA": {
"loginType": "siteA",
"lastLogin": "2018-02-06T19:42:22.205Z"
},
"activityB": {
"loginType": "siteB",
"lastLogin": "2018-03-07T11:39:50.346Z"
},
"activityC": {
"loginType": "siteC",
"lastLogin": "2018-04-08T15:21:15.312Z"
}
}
}
Without knowing the exact index into the activities entry activities list/sub collection, how can I query to get back all items in the Cosmo db collection that have a "lastLogin" matching a date range?
If I only wanted to search on the first item in the activities list, I could do something like this using index 0.
SELECT * FROM c where (c.activities[0].lastLogin > '2018-01-01T00:00:00') and (c.activities[0].lastLogin <= '2019-02-15T00:00:00')
But I want to search all entries in the list. Would be nice if there was something like this:
SELECT * FROM c where (c.activities[?].lastLogin > '2018-01-01T00:00:00') and (c.activities[?].lastLogin <= '2019-02-15T00:00:00')
But that doesn't exist.

The answer is that you can not iterate over a non list collection. Had the collection item been structured like this
{"id":"123456",
"activities": [
{ "label": "activityA",
"loginType": "siteA",
"lastLogin": "2018-02-06T19:42:22.205Z"
},
{
"label": "activtyB",
"loginType": "siteB",
"lastLogin": "2018-03-07T11:39:50.346Z"
},
etc...
It would be easy to crease a UDF to iterate over with something like this
UDF: filterActivityList
function(activityList, targetDateTimeStart, targetDateTimeEnd) {
var s, _i, _len;
for (_i = 0, _len = activityList.length; _i < _len; _i++) {
s = activityList[_i];
if ((s.lastLogin >= targetDateTimeStart) && (s.lastLogin < targetDateTimeEnd))
{
return true;
}
}
return false;
}
Then to query:
select * from c WHERE udf.filterActivityList(c.activities, '2018-01-01T00:00:00', '2018-02-01T00:00:00');
If I were to leave the structure as a JSON hierarchy instead of converting it to a JSON list then I would have to write another udf to accept the top level node of the hierarchy as an input parameter and have it convert the notes under it to a list, then apply the udf.filterActivityList UDF to the result. From my experience this approach is resource intensive and takes a very long time for Cosmo to process.

MongoDB like statement with multiple fields

With SQL we can do the following :
select * from x where concat(x.y ," ",x.z) like "%find m%"
when x.y = "find" and x.z = "me".
How do I do the same thing with MongoDB, When I use a JSON structure similar to this:
{
data:
[
{
id:1,
value : "find"
},
{
id:2,
value : "me"
}
]
}

The comparison to SQL here is not valid since no relational database has the same concept of embedded arrays that MongoDB has, and is provided in your example. You can only "concat" between "fields in a row" of a table. Basically not the same thing.
You can do this with the JavaScript evaluation of $where, which is not optimal, but it's a start. And you can add some extra "smarts" to the match as well with caution:
db.collection.find({
"$or": [
{ "data.value": /^f/ },
{ "data.value": /^m/ }
],
"$where": function() {
var items = [];
this.data.forEach(function(item) {
items.push(item.value);
});
var myString = items.join(" ");
if ( myString.match(/find m/) != null )
return 1;
}
})
So there you go. We optimized this a bit by taking the first characters from your "test string" in each word and compared the tokens to each element of the array in the document.
The next part "concatenates" the array elements into a string and then does a "regex" comparison ( same as "like" ) on the concatenated result to see if it matches. Where it does then the document is considered a match and returned.
Not optimal, but these are the options available to MongoDB on a structure like this. Perhaps the structure should be different. But you don't specify why you want this so we can't advise a better solution to what you want to achieve.

facets with ravendb

i am trying to work with the facet ability in ravendb but getting strange results.
i have a documents like :
{
"SearchableModel": "42LC2RR ",
"ModelName": "42LC2RR",
"ModelID": 490578,
"Name": "LG 42 Television 42LC2RR",
"Desctription": "fffff",
"Image": "1/4/9/8/18278941c",
"MinPrice": 9400.0,
"MaxPrice": 9400.0,
"StoreAmounts": 1,
"AuctionAmounts": 0,
"Popolarity": 3,
"ViewScore": 0.0,
"ReviewAmount": 2,
"ReviewScore": 45,
"Sog": "E-TV",
"SogID": 1,
"IsModel": true,
"Manufacrurer": "LG",
"ParamsList": [
"1994267",
"46570",
"4134",
"4132",
"4118",
"46566",
"4110",
"180676",
"239517",
"750771",
"2658507",
"2658498",
"46627",
"4136",
"169941",
"169846",
"145620",
"169940",
"141416",
"3190767",
"3190768",
"144720",
"2300706",
"4093",
"4009",
"1418470",
"179766",
"190025",
"170557",
"170189",
"43768",
"4138",
"67976",
"239516",
"3190771",
"141195"
],
}
where the ParamList each represents a property of the product and in our application we have in cache what each param represents.
when searching for a specific product i would like to count all the returning attributes to be able to add the amount of each item after the search.
After searching lg in televisions category i want to get :
Param:4134 witch is a representative of LCD and the amount :65.
but unfortunately i am getting strange results. only some params are counted and some not.
on some searchers where i am getting results back i dont get any amounts back.
i am using the latest stable version of RavenDB.
index :
from doc in docs
from param in doc.ParamsList
select new {Name=doc.Name,Description=doc.Description,SearchNotVisible = doc.SearchNotVisible,SogID=doc.SogID,Param =param}
facet :
DocumentStore documentStore = new DocumentStore { ConnectionStringName = "Server" };
documentStore.Initialize();
using (IDocumentSession session = documentStore.OpenSession())
{
List<Facet> _facets = new List<Facet>
{
new Facet {Name = "Param"}
};
session.Store(new FacetSetup { Id = "facets/Params", Facets = _facets });
session.SaveChanges();
}
usage example :
IDictionary<string, IEnumerable<FacetValue>> facets = session.Advanced.DatabaseCommands.GetFacets("FullIndexParams", new IndexQuery { Query = "Name:lg" }, "facets/Params");
i tried many variations without success.
does anyone have ideas what am i doing wrong ?
Thanks

Use this index, it should resolve your problem:
from doc in docs
select new {Name=doc.Name,Description=doc.Description,SearchNotVisible = doc.SearchNotVisible,SogID=doc.SogID,Param = doc.ParamsList}

What analyzer you set for "Name" field. I see you search by Name "lg". By default, Ravendb use KeywordAnalyzer, means you must search by exact name. You should set another analyzer for Name or Description field (StandardAnalyzer for example).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

ranked full text search results using Lucene with modeshape - lucene

Related

Query for entire JSON document in nested JSON schema

SQL query as string in JSON file

Filter datetime value within a JSON collection of a collection inside CosmoDB using SQL

MongoDB like statement with multiple fields

facets with ravendb

Categories

Resources