Ordering a query by the string length of one of the fields - lucene

In RavenDB (build 2330) I'm trying to order my results by the string length of one of the indexed terms.
var result = session.Query<Entity, IndexDefinition>()
.Where(condition)
.OrderBy(x => x.Token.Length);
However the results look to be un-sorted. Is this possible in RavenDB (or via a Lucene query) and if so what is the syntax?

You need to add a field to IndexDefinition to order by, and define the SortOption to Int or something more appropriate (however you don't want to use String which is default).
If you want to use the Linq API like in your example you need to add a field named Token_Length to the index' Map function (see Matt's comment):
from doc in docs
select new
{
...
Token_Length = doc.TokenLength
}
And then you can query using the Linq API:
var result = session.Query<Entity, IndexDefinition>()
.Where(condition)
.OrderBy(x => x.Token.Length);
Or if you really want the field to be called TokenLength (or something other than Token_Length) you can use a LuceneQuery:
from doc in docs
select new
{
...
TokenLength = doc.Token.Length
}
And you'd query like this:
var result = session.Advanced.LuceneQuery<Entity, IndexDefinition>()
.Where(condition)
.OrderBy("TokenLength");

Related

How can I get Examine to perform a phrase query in Umbraco 7?

I am trying to build some custom search logic in Umbraco 7 (7.3.6) which will search for multiple terms supplied by a user, where those terms may include phrases enclosed in quotes.
I have the following code which takes the supplied term, uses a regex to split individual terms (whilst maintaining those enclosed in quotes), then uses a series of GroupedOr calls to search against multiple fields
var searcher = Examine.ExamineManager.Instance.SearchProviderCollection[this.searchConfig.SiteSearchProviderName];
var searchCriteria = searcher.CreateSearchCriteria(Examine.SearchCriteria.BooleanOperation.Or);
var splitTerms = Regex.Matches(term, #"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
var query = searchCriteria.GroupedOr(
new[] { BaseContent.FIELD_NodeName },
this.GetValues(splitTerms, 3, 0.8F))
.Or()
.GroupedOr(
new[] { this.searchConfig.ContentFieldName },
this.GetValues(splitTerms, 1, 0.8F));
This is the GetValues method:
private IExamineValue[] GetValues(string[] terms, float boost, float fuzziness)
{
return terms
.Select(t => t.Boost(boost).Value.Fuzzy(fuzziness))
.ToArray();
}
I have a document in my index which contains the term "The quick brown fox jumps over the lazy dog". If I pass the string "\"brown fox\"" through the above logic then examine my query I can see my query object contains the following Lucene query:
(nodeName:"brown fox"~0.8) (_content:"brown fox"~0.8)
However, when I use this to build a search query as follows, I get no results.
var searchQuery = searcher
.Search(query.Compile(), 100)
.OrderByDescending(x => x.Score)
.TakeWhile(x => x.Score > 0.05f);
But if I run the exact same Lucene query using Luke I get the result I was expecting.
Is anyone able to help me understand this? Extra marks if you can explain why my boost values aren't being added to the Lucene query!!

Entity Framework filter data by string sql

I am storing some filter data in my table. Let me make it more clear: I want to store some where clauses and their values in a database and use them when I want to retrieve data from a database.
For example, consider a people table (entity set) and some filters on it in another table:
"age" , "> 70"
"gender" , "= male"
Now when I retrieve data from the people table I want to get these filters to filter my data.
I know I can generate a SQL query as a string and execute that but is there any other better way in EF, LINQ?
One solution is to use Dynamic Linq Library , using this library you can have:
filterTable = //some code to retrive it
var whereClause = string.Join(" AND ", filterTable.Select(x=> x.Left + x.Right));
var result = context.People.Where(whereClause).ToList();
Assuming that filter table has columns Left and Right and you want to join filters by AND.
My suggestion is to include more details in the filter table, for example separate the operators from operands and add a column that determines the join is And or OR and a column that determines the other row which joins this one. You need a tree structure if you want to handle more complex queries like (A and B)Or(C and D).
Another solution is to build expression tree from filter table. Here is a simple example:
var arg = Expression.Parameter(typeof(People));
Expression whereClause;
for(var row in filterTable)
{
Expression rowClause;
var left = Expression.PropertyOrField(arg, row.PropertyName);
//here a type cast is needed for example
//var right = Expression.Constant(int.Parse(row.Right));
var right = Expression.Constant(row.Right, left.Member.MemberType);
switch(row.Operator)
{
case "=":
rowClause = Expression.Equal(left, right);
break;
case ">":
rowClause = Expression.GreaterThan(left, right);
break;
case ">=":
rowClause = Expression.GreaterThanOrEqual(left, right);
break;
}
if(whereClause == null)
{
whereClause = rowClause;
}
else
{
whereClause = Expression.AndAlso(whereClause, rowClause);
}
}
var lambda = Expression.Lambda<Func<People, bool>>(whereClause, arg);
context.People.Where(lambda);
this is very simplified example, you should do many validations type casting and ... in order to make this works for all kind of queries.
This is an interesting question. First off, make sure you're honest with yourself: you are creating a new query language, and this is not a trivial task (however trivial your expressions may seem).
If you're certain you're not underestimating the task, then you'll want to look at LINQ expression trees (reference documentation).
Unfortunately, it's quite a broad subject, I encourage you to learn the basics and ask more specific questions as they come up. Your goal is to interpret your filter expression records (fetched from your table) and create a LINQ expression tree for the predicate that they represent. You can then pass the tree to Where() calls as usual.
Without knowing what your UI looks like here is a simple example of what I was talking about in my comments regarding Serialize.Linq library
public void QuerySerializeDeserialize()
{
var exp = "(User.Age > 7 AND User.FirstName == \"Daniel\") OR User.Age < 10";
var user = Expression.Parameter(typeof (User), "User");
var parsExpression =
System.Linq.Dynamic.DynamicExpression.ParseLambda(new[] {user}, null, exp);
//Convert the Expression to JSON
var query = e.ToJson();
//Deserialize JSON back to expression
var serializer = new ExpressionSerializer(new JsonSerializer());
var dExp = serializer.DeserializeText(query);
using (var context = new AppContext())
{
var set = context.Set<User>().Where((Expression<Func<User, bool>>) dExp);
}
}
You can probably get fancier using reflection and invoking your generic LINQ query based on the types coming in from the expression. This way you can avoid casting the expression like I did at the end of the example.

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

What return value to use to match two different kinds of Types

I'm using LINQ towards my MSSQL database. I have the TypeOfMetaData-table, the UserMetaData-table and the MetaDataHasType-table that has foreign-keys from the TypeOfMetaData- and the UserMetaData-table. What I need to do a method to get all MetaData and Types and return them. The problem is that I don't know what kind of return value I should use to be able to match the correct rows together.
Thanks for the help,
wardh
You could use an Anonymous Type (var) to store the result:
var result =
yourDataContext
.UserMetaData_Table
.Select(
userMetaData =>
new
{
UserMetaData = userMetaData,
Types = userMetaData.MetaDataHasTypes.Select(types => types.TypeOfMetaData),
})
.ToArray();
If this isn't what you want, could you update you question to with an example of the data context and the classes you have and what you have tried so far.

Does Dapper support the like operator?

Using Dapper-dot-net...
The following yields no results in the data object:
var data = conn.Query(#"
select top 25
Term as Label,
Type,
ID
from SearchTerms
WHERE Term like '%#T%'",
new { T = (string)term });
However, when I just use a regular String Format like:
string QueryString = String.Format("select top 25 Term as Label, Type, ID from SearchTerms WHERE Term like '%{0}%'", term);
var data = conn.Query(QueryString);
I get 25 rows back in the collection. Is Dapper not correctly parsing the end of the parameter #T?
Try:
term = "whateverterm";
var encodeForLike = term => term.Replace("[", "[[]").Replace("%", "[%]");
string term = "%" + encodeForLike(term) + "%";
var data = conn.Query(#"
select top 25
Term as Label,
Type,
ID
from SearchTerms
WHERE Term like #term",
new { term });
There is nothing special about like operators, you never want your params inside string literals, they will not work, instead they will be interpreted as a string.
note
The hard-coded example in your second snippet is strongly discouraged, besides being a huge problem with sql injection, it can cause dapper to leak.
caveat
Any like match that is leading with a wildcard is not SARGable, which means it is slow and will require an index scan.
Yes it does. This simple solution has worked for me everytime:
db.Query<Remitente>("SELECT *
FROM Remitentes
WHERE Nombre LIKE #n", new { n = "%" + nombre + "%" })
.ToList();
Best way to use this to add concat function in query as it save in sql injecting as well, but concat function is only support above than sql 2012
string query = "SELECT * from country WHERE Name LIKE CONCAT('%',#name,'%');"
var results = connection.query<country>(query, new {name});
The answer from Sam wasn't working for me so after some testing I came up with using the SQLite CONCAT equivalent which seems to work:
string sql = "SELECT * FROM myTable WHERE Name LIKE '%' || #NAME || '%'";
var data = IEnumerable data = conn.Query(sql, new { NAME = Name });
Just to digress on Sam's answer, here is how I created two helper methods to make searches a bit easier using the LIKE operator.
First, creating a method for generating a parameterized query, this method uses dynamic: , but creating a strongly typed generic method should be more desired in many cases where you want static typing instead of dynamic.
public static dynamic ParameterizedQuery(this IDbConnection connection, string sql, Dictionary<string, object> parametersDictionary)
{
if (string.IsNullOrEmpty(sql))
{
return null;
}
string missingParameters = string.Empty;
foreach (var item in parametersDictionary)
{
if (!sql.Contains(item.Key))
{
missingParameters += $"Missing parameter: {item.Key}";
}
}
if (!string.IsNullOrEmpty(missingParameters))
{
throw new ArgumentException($"Parameterized query failed. {missingParameters}");
}
var parameters = new DynamicParameters(parametersDictionary);
return connection.Query(sql, parameters);
}
Then adding a method to create a Like search term that will work with Dapper.
public static string Like(string searchTerm)
{
if (string.IsNullOrEmpty(searchTerm))
{
return null;
}
Func<string, string> encodeForLike = searchTerm => searchTerm.Replace("[", "[[]").Replace("%", "[%]");
return $"%{encodeForLike(searchTerm)}%";
}
Example usage:
var sql = $"select * from products where ProductName like #ProdName";
var herringsInNorthwindDb = connection.ParameterizedQuery(sql, new Dictionary<string, object> { { "#ProdName", Like("sild") } });
foreach (var herring in herringsInNorthwindDb)
{
Console.WriteLine($"{herring.ProductName}");
}
And we get our sample data from Northwind DB:
I like this approach, since we get helper extension methods to do repetitive work.
My solution simple to this problem :
parameter.Add("#nomeCliente", dfNomeCliPesquisa.Text.ToUpper());
query = "SELECT * FROM cadastrocliente WHERE upper(nome) LIKE " + "'%" + dfNomeCliPesquisa.Text.ToUpper() + "%'";