SQL - Combine search queries instead of multiple DB trips - sql

A search term comes from UI to search entities of a table. The order that these search results should show up in UI is like this:
First: exact match
Second: starts with that term
Third: contains a word of that term
Forth: ends with that term
Fifth: contains the term in any matter
So I first got the entities from DB:
result = entities.Where(e => e.Name.Contains(searchTerm)).ToList();
And then I rearranged them in memory:
var sortedEntities = result.Where(e => e.Name.ToLower() == searchTerm.ToLower())
.Union(result.Where(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
.Union(result.Where(e => e.Name.Contains($" {searchTerm} ")))
.Union(result.Where(e => e.Name.EndsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
.Union(result.Where(e => e.Name.Contains(searchTerm)));
It was working fine until I added paging. Now if an exact match is on page 2 (in data coming from DB) it won't show up first.
The only solution I can think of is to separate the requests (so 5 requests in this case) and keep track of page size manually. My question is that is there a way to tell DB to respect that order and get the sorted data in one DB trip?

It took me some time to realize that you use Union in an attempt to order data by "match strength": first the ones that match exactly, then the ones that match with different case, etc. When I see Unions with predicates my Pavlov-conditioned mind translates it into ORs. I had to switch from thinking fast to slow.
So the problem is that there is no predictable sorting. No doubt, the chained Union statements do produce a deterministic final sort order, but it's not necessarily the order of the Unions, because each Union also executes an implicit Distinct. The general rule is, if you want a specific sort order, use OrderBy methods.
Having said that, and taking...
var result = entities
.Where(e => e.Name.Contains(searchTerm))
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
...the desired result seems to be obtainable by:
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
.ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
... etc.
(descending, because false orders before true)
However, if there are more matches than pageSize the ordering will be too late. If pageSize = 20 and item 21 is the first exact match this item will not be on page 1. Which means: the ordering should be done before paging.
The first step would be to remove the .ToList() from the first statement. If you remove it, the first statement is an IQueryable expression and Entity Framework is able to combine the full statement into one SQL statement. The next step would be to move Skip/Take to the end of the full statement and it'll also be part of the SQL.
var result = entities.Where(e => e.Name.Contains(searchTerm));
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
.ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
... etc
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
But now a new problem blows in.
Since string comparison with StringComparison.OrdinalIgnoreCase isn't supported Entity Framework will auto-switch to client-side evaluation for part of the statement. All of the filtered results will be returned from the database, but most of the the ordering and all of the paging will be done in memory.
That may not be too bad when the filter is narrow, but very bad when it's wide. So, ultimately, to do this right, you have to remove StringComparison.OrdinalIgnoreCase and settle with a somewhat less refined match strength. Bringing us to the
End result:
var result = entities.Where(e => e.Name.Contains(searchTerm));
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.StartsWith(searchTerm))
.ThenByDescending(e => e.Name.Contains($" {searchTerm} "))
.ThenByDescending(e => e.Name.EndsWith(searchTerm))
.ThenByDescending(e => e.Name.Contains(searchTerm))
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
Why "less refined"? Because, according your comments, the database collation isn't case sensitive, so SQL can't distinguish exact matches by case without adding COLLATE statements. That's something we can't do with LINQ.

Related

How can I use a method to add a filter on a property in Entity Framework Core?

I need to conditionally add a filter to particular dates in a query. There are common preconditions and the filter will be the same. Therefore I would like the common code to be in a method which can perform these checks and then have the consumer pass in the property which the filter should be applied to (could be applied to multiple).
Here is a simplified version of my code.
var query = dbContext.Documents.AsQueryable();
query = FilterDocumentsByDate(query, x => x.CreatedDate);
query = FilterDocumentsByDate(query, x => x.SubmittedDate);
private IQueryable<Document> FilterDocumentsByDate(IQueryable<Document> query, Func<Document, DateTime> propertyToSearch)
{
query = query.Where(x => propertyToSearch(x).Year > 2000);
return query;
}
When I look at the query in SQL profiler, I can see that the query is missing the WHERE clause (so all documents are being retrieved and the filter is being done in memory). If I copy/paste the code inline for both dates (instead of calling the method twice) then the WHERE clause for the both dates are included in the query.
Is there no way to add a WHERE condition to an IQueryable by passing a property in a Func which can be properly translated to SQL by Entity Framework?
EF is unable to understand your query, so it breaks and executes WHERE clause in memory.
The solution is creating dynamic expressions.
var query = dbContext.Documents.AsQueryable();
query = FilterDocumentsByDate(query, x => x.CreatedDate.Year);
query = FilterDocumentsByDate(query, x => x.SubmittedDate.Year);
private IQueryable<Document> FilterDocumentsByDate(IQueryable<Document> query, Expression<Func<Document, int>> expression)
{
var parameter = expression.Parameters.FirstOrDefault();
Expression comparisonExpression = Expression.Equal(expression.Body, Expression.Constant(2000));
Expression<Func<Document, bool>> exp = Expression.Lambda<Func<Document, bool>>(comparisonExpression, parameter);
query = query.Where(exp);
return query;
}
I am sorry, I haven't run this myself, but this should create WHERE statement. Let me know how it goes.

Doctrine2 fetch Count more optimized and faster way Or Zf2 library

I am using Doctrine2 and Zf2 , now when I need to fetch count of rows, I have got the following two ways to fetch it. But my worry is which will be more optimized and faster way, as in future the rows would be more than 50k. Any suggestions or any other ways to fetch the count ?? Is there any function to get count which can be used with findBy ???
Or should I use normal Zf2 Database library to fetch count. I just found that ORM is not preferred to fetch results when data is huge. Please any help would be appreciated
$members = $this->getEntityManager()->getRepository('User\Entity\Members')->findBy(array('id' => $id, 'status' => '1'));
$membersCnt = sizeof($members);
or
$qb = $this->getEntityManager()->createQueryBuilder();
$qb->select('count(p)')
->from('User\Entity\Members', 'p')
->where('p.id = '.$id)
->andWhere('p.status = 1');
$membersCnt = $qb->getQuery()->getSingleScalarResult();
Comparison
1) Your EntityRepository::findBy() approach will do this:
Query the database for the rows matching your criteria. The database will return the complete rows.
The database result is then transformed (hydrated) into full PHP objects (entities).
2) Your EntityManager::createQueryBuilder() approach will do this:
Query the database for the number of rows matching your criteria. The database will return a simple number (actually a string representing a number).
The database result is then transformed from a string to a PHP integer.
You can safely conclude that option 2 is far more efficient than option 1:
The database can optimize the query for counting, which might make the query faster (take less time).
Far less data is returned from the database.
No entities are hydrated (only a simple string to integer cast).
All in all less processing power and less memory will be used.
Security comment
Never concatenate values into a query!
This can make you vulnerable to SQL injection attacks when those values are (derived from) user-input.
Also, Doctrine2 can't make use of prepared statements / parameter binding, which can lead to some performance-loss when the same query is used often (with or without different parameters).
In other words, replace this:
->where('p.id = '.$id)
->andWhere('p.status = 1')
with this:
->where('p.id = :id')
->andWhere('p.status = :status')
->setParameters(array('id' => $id, 'status' => 1))
or:
->where($qb->expr()->andX(
$qb->expr()->eq('p.id', ':id'),
$qb->expr()->eq('p.status', ':status')
)
->setParameters(array('id' => $id, 'status' => 1))
Additionally
For this particular query, there's no need to use the QueryBuilder, you can use straight DQL in stead:
$dql = 'SELECT COUNT(p) FROM User\Entity\Members p WHERE p.id = :id AND p.status = :status';
$q = $this->getEntityManager()->createQuery($dql);
$q->setParameters(array('id' => $id, 'status' => 1));
$membersCnt = $q->getSingleScalarResult();
You should totally go to the dql version of the count.
With the first method you will hydrate (convert from db resultset to objects) each of the rows as single object and put them on one array and then count the amount items in that array. That will be a totally waste of memory and cycles if the only objective is to know the number of elements in that result set.
With the second method the dql will be gracefully converted to SELECT COUNT(*) Blah blah blah
plain SQL sentence and will retrieve directly the count from db.
The comment about ORM is not preferred to when to retrieve data is huge is true, in big batch process you should paginate your query to retrieve data instead all at the same time to avoid memory overrides but in that case you are only retrieving a single number, the total count so this rule doesn’t apply.
Query builder is so slow .
Use DQL for faster select .
$query = $this->getEntityManager()->createQuery("SELECT count(m) FROM User\Entity\Members m WHERE m.status = 1 AND m.id = :id ");
$query->setParameter(':id', $id);
You need setParameter for prevent SQL injection .
Stored procedure is fastest but it depend on your DB .
Make all relations of entity Lazy.

How to build the correct LINQ query (generate OR instead of AND)

I need some help in building the correct query. I have an Employees table. I need to get a list of all employees, that EENO (Employee ID) contains a string from a supplied array of partial Employee IDs.
When I use this code
// IEnumerable<string> employeeIds is a collection of partial Employee IDs
IQueryable<Employee> query = Employees;
foreach (string id in employeeIds)
{
query = query.Where(e => e.EENO.Contains(id));
}
return query;
I will get something like:
SELECT *
FROM Employees
WHERE EENO LIKE '%1111111%'
AND EENO LIKE '%2222222%'
AND EENO LIKE '%3333333%'
AND EENO LIKE '%4444444%'
Which doesn't make sense.
I need "OR" instead of "AND" in resulting SQL.
Thank you!
UPDATE
This code I wrote using PredicateBuilder works perfectly when I need to include these employees.
var predicate = PredicateBuilder.False<Employee>();
foreach (string id in employeeIds)
{
var temp = id;
predicate = predicate.Or(e => e.EENO.Contains(temp));
}
var query = Employees.Where(predicate);
Now, I need to write an opposite code, to exclude these employees,
here it is but it is not working: the generated SQL is totally weird.
var predicate = PredicateBuilder.False<Employee>();
foreach (string id in employeeIds)
{
var temp = id;
predicate = predicate.And(e => !e.EENO.Contains(temp)); // changed to "And" and "!"
}
var query = Employees.Where(predicate);
return query;
It's supposed to generate SQL Where clause like this one:
WHERE EENO NOT LIKE '%11111%'
AND NOT LIKE '%22222%'
AND NOT LIKE '%33333%'
But it's not happening
The SQL generated is this: http://i.imgur.com/9MDP7.png
Any help is appreciated. Thanks.
Instead of the foreach, just build the IQueryable once:
query = query.Where(e => employeeIds.Contains(e.EENO));
I'd take a look at http://www.albahari.com/nutshell/predicatebuilder.aspx. This has a great way of building Or queries, and is written by the guy that wrote LinqPad. The above link also has examples of usage.
I believe you can use Any():
var query = Employees.Where(emp => employeeIds.Any(id => id.Contains(emp.EENO)));
If you don't want to use a predicate builder, then the only other option is to UNION each of the collections together on an intermediate query:
// IEnumerable<string> employeeIds is a collection of partial Employee IDs
IQueryable<Employee> query = Enumerable.Empty<Employee>().AsQueryable();
foreach (string id in employeeIds)
{
string tempID = id;
query = query.Union(Employees.Where(e => e.EENO.Contains(tempID));
}
return query;
Also keep in mind that closure rules are going to break your predicate and only end up filtering on your last criteria. That's why I have the tempID variable inside the foreach loop.
EDIT: So here's the compendium of all the issues you've run across:
Generate ORs instead of ANDS
Done, using PredicateBuilder.
Only last predicate is being applied
Addressed by assigning a temp variable in your inner loop (due to closure rules)
Exclusion predicates not working
You need to start with the correct base case. When you use ORs, you need to make sure you start with the false case first, that way you only include records where AT LEAST ONE predicate matches (otherwise doesn't return anything). The reason for this is that the base case should just get ignored for purposes of evaluation. In other words false || predicate1 || predicate2 || ... really is just predicate1 || predicate2 || ... because you're looking for at least one true in your list of predicates (and you just need a base to build on). The opposite applies to the AND case. You start with true so that it gets "ignored" for purposes of evaluation, but you still need a base case. In other words: true && predicate1 && ... is the same as predicate1 && .... Hope that addresses your last issue.

get only one row and one column from table using nhibernate

I have query:
var query = this.session.QueryOver<Products>()
.Where(uic => uic.PageNumber == nextPage[0])
.SingleOrDefault(uic => uic.ProductNumber)
But this query result is type Products. It is possible that result will be only integer type of column ProductNumber ?
Try something like this:
var query = this.session.QueryOver<Products>()
.Where(uic => uic.PageNumber == nextPage[0])
.Select(uic => uic.ProductNumber)
.SingleOrDefault<int>();
Since you need a single primitive type value, you can do .Select to define the result column, and then do .SingleOrDefault to get the only result. For complex types, you'd need to use transformers.
You can find more info about QueryOver in this blog post on nhibernate.info: http://nhibernate.info/blog/2009/12/17/queryover-in-nh-3-0.html
You can use Miroslav's answer for QueryOver, but this would look cleaner with LINQ:
var productNumber = session.Query<Products>()
.Where(uic => uic.PageNumber == nextPage[0])
.Select(uic => uic.ProductNumber)
.SingleOrDefault();
Notice you don't need a cast, as the Select operator changes the expression type to the return type of its parameter (which is the type of ProductNumber).

linq match word with boundaries

say i have a nvarchar field in my database that looks like this
1, "abc abccc dabc"
2, "abccc dabc"
3, "abccc abc dabc"
i need a select LINQ query that would match the word "abc" with boundaries not part of a string
in this case only row 1 and 3 would match
from row in table.AsEnumerable()
where row.Foo.Split(new char[] {' ', '\t'}, StringSplitOptions.None)
.Contains("abc")
select row
It's important to include the call to AsEnumerable, which means the query is executed on the client-side, else (I'm pretty sure) the Where clause won't get converted into SQL succesfully.
Maybe a regular expression like this (nb - not compiled or tested):
var matches = from a in yourCollection
where Regex.Match(a.field, ".*\sabc\s.*")
select a;
datacontext.Table.Where(
e => Regex.Match(e.field, #"(.*?[\s\t]|^)abc([\s\t].*?|$)")
);
or
datacontext.Table.Where(
e => e.Split(' ', '\t').Contains("abc");
);
For efficiency, you want to do as much of the filtering as possible on the server, and then the rest of the filtering on the client. You can't use Regex on the server (SQL Server doesn't support it) so the solution is to first use a LIKE-type search (by calling .Contains) then use Regex on the client to further refine the results:
db.MyTable
.Where (t => t.MyField.Contains ("abc"))
.AsEnumerable() // Executes locally from this point on
.Where (t => Regex.IsMatch (t.MyField, #"\babc\b"))
This ensures that you retrieve only the rows from SQL Server than contain the letters 'abc' (regardless of whether they're a word-boundary match or not) and use Regex on the client-side to further restrict the result set so that only matches that are on word boundaries are included.