Doctrine2 fetch Count more optimized and faster way Or Zf2 library - orm

I am using Doctrine2 and Zf2 , now when I need to fetch count of rows, I have got the following two ways to fetch it. But my worry is which will be more optimized and faster way, as in future the rows would be more than 50k. Any suggestions or any other ways to fetch the count ?? Is there any function to get count which can be used with findBy ???
Or should I use normal Zf2 Database library to fetch count. I just found that ORM is not preferred to fetch results when data is huge. Please any help would be appreciated
$members = $this->getEntityManager()->getRepository('User\Entity\Members')->findBy(array('id' => $id, 'status' => '1'));
$membersCnt = sizeof($members);
or
$qb = $this->getEntityManager()->createQueryBuilder();
$qb->select('count(p)')
->from('User\Entity\Members', 'p')
->where('p.id = '.$id)
->andWhere('p.status = 1');
$membersCnt = $qb->getQuery()->getSingleScalarResult();

Comparison
1) Your EntityRepository::findBy() approach will do this:
Query the database for the rows matching your criteria. The database will return the complete rows.
The database result is then transformed (hydrated) into full PHP objects (entities).
2) Your EntityManager::createQueryBuilder() approach will do this:
Query the database for the number of rows matching your criteria. The database will return a simple number (actually a string representing a number).
The database result is then transformed from a string to a PHP integer.
You can safely conclude that option 2 is far more efficient than option 1:
The database can optimize the query for counting, which might make the query faster (take less time).
Far less data is returned from the database.
No entities are hydrated (only a simple string to integer cast).
All in all less processing power and less memory will be used.
Security comment
Never concatenate values into a query!
This can make you vulnerable to SQL injection attacks when those values are (derived from) user-input.
Also, Doctrine2 can't make use of prepared statements / parameter binding, which can lead to some performance-loss when the same query is used often (with or without different parameters).
In other words, replace this:
->where('p.id = '.$id)
->andWhere('p.status = 1')
with this:
->where('p.id = :id')
->andWhere('p.status = :status')
->setParameters(array('id' => $id, 'status' => 1))
or:
->where($qb->expr()->andX(
$qb->expr()->eq('p.id', ':id'),
$qb->expr()->eq('p.status', ':status')
)
->setParameters(array('id' => $id, 'status' => 1))
Additionally
For this particular query, there's no need to use the QueryBuilder, you can use straight DQL in stead:
$dql = 'SELECT COUNT(p) FROM User\Entity\Members p WHERE p.id = :id AND p.status = :status';
$q = $this->getEntityManager()->createQuery($dql);
$q->setParameters(array('id' => $id, 'status' => 1));
$membersCnt = $q->getSingleScalarResult();

You should totally go to the dql version of the count.
With the first method you will hydrate (convert from db resultset to objects) each of the rows as single object and put them on one array and then count the amount items in that array. That will be a totally waste of memory and cycles if the only objective is to know the number of elements in that result set.
With the second method the dql will be gracefully converted to SELECT COUNT(*) Blah blah blah
plain SQL sentence and will retrieve directly the count from db.
The comment about ORM is not preferred to when to retrieve data is huge is true, in big batch process you should paginate your query to retrieve data instead all at the same time to avoid memory overrides but in that case you are only retrieving a single number, the total count so this rule doesn’t apply.

Query builder is so slow .
Use DQL for faster select .
$query = $this->getEntityManager()->createQuery("SELECT count(m) FROM User\Entity\Members m WHERE m.status = 1 AND m.id = :id ");
$query->setParameter(':id', $id);
You need setParameter for prevent SQL injection .
Stored procedure is fastest but it depend on your DB .
Make all relations of entity Lazy.

Related

SQL - Combine search queries instead of multiple DB trips

A search term comes from UI to search entities of a table. The order that these search results should show up in UI is like this:
First: exact match
Second: starts with that term
Third: contains a word of that term
Forth: ends with that term
Fifth: contains the term in any matter
So I first got the entities from DB:
result = entities.Where(e => e.Name.Contains(searchTerm)).ToList();
And then I rearranged them in memory:
var sortedEntities = result.Where(e => e.Name.ToLower() == searchTerm.ToLower())
.Union(result.Where(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
.Union(result.Where(e => e.Name.Contains($" {searchTerm} ")))
.Union(result.Where(e => e.Name.EndsWith(searchTerm, StringComparison.OrdinalIgnoreCase)))
.Union(result.Where(e => e.Name.Contains(searchTerm)));
It was working fine until I added paging. Now if an exact match is on page 2 (in data coming from DB) it won't show up first.
The only solution I can think of is to separate the requests (so 5 requests in this case) and keep track of page size manually. My question is that is there a way to tell DB to respect that order and get the sorted data in one DB trip?
It took me some time to realize that you use Union in an attempt to order data by "match strength": first the ones that match exactly, then the ones that match with different case, etc. When I see Unions with predicates my Pavlov-conditioned mind translates it into ORs. I had to switch from thinking fast to slow.
So the problem is that there is no predictable sorting. No doubt, the chained Union statements do produce a deterministic final sort order, but it's not necessarily the order of the Unions, because each Union also executes an implicit Distinct. The general rule is, if you want a specific sort order, use OrderBy methods.
Having said that, and taking...
var result = entities
.Where(e => e.Name.Contains(searchTerm))
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
...the desired result seems to be obtainable by:
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
.ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
... etc.
(descending, because false orders before true)
However, if there are more matches than pageSize the ordering will be too late. If pageSize = 20 and item 21 is the first exact match this item will not be on page 1. Which means: the ordering should be done before paging.
The first step would be to remove the .ToList() from the first statement. If you remove it, the first statement is an IQueryable expression and Entity Framework is able to combine the full statement into one SQL statement. The next step would be to move Skip/Take to the end of the full statement and it'll also be part of the SQL.
var result = entities.Where(e => e.Name.Contains(searchTerm));
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.ToLower() == searchTerm.ToLower())
.ThenByDescending(e => e.Name.StartsWith(searchTerm, StringComparison.OrdinalIgnoreCase))
... etc
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
But now a new problem blows in.
Since string comparison with StringComparison.OrdinalIgnoreCase isn't supported Entity Framework will auto-switch to client-side evaluation for part of the statement. All of the filtered results will be returned from the database, but most of the the ordering and all of the paging will be done in memory.
That may not be too bad when the filter is narrow, but very bad when it's wide. So, ultimately, to do this right, you have to remove StringComparison.OrdinalIgnoreCase and settle with a somewhat less refined match strength. Bringing us to the
End result:
var result = entities.Where(e => e.Name.Contains(searchTerm));
var sortedEntities = result
.OrderByDescending(e => e.Name == searchTerm)
.ThenByDescending(e => e.Name.StartsWith(searchTerm))
.ThenByDescending(e => e.Name.Contains($" {searchTerm} "))
.ThenByDescending(e => e.Name.EndsWith(searchTerm))
.ThenByDescending(e => e.Name.Contains(searchTerm))
.Skip((pageNumber - 1) * pageSize)
.Take(pageSize).ToList();
Why "less refined"? Because, according your comments, the database collation isn't case sensitive, so SQL can't distinguish exact matches by case without adding COLLATE statements. That's something we can't do with LINQ.

Handle null values within SQL IN clause

I have following sql query in my hbm file. The SCHEMA, A and B are schema and two tables.
select
*
from SCHEMA.A os
inner join SCHEMA.B o
on o.ORGANIZATION_ID = os.ORGANIZATION_ID
where
case
when (:pass = 'N' and os.ORG_ID in (:orgIdList)) then 1
when (:pass = 'Y') then 1
end = 1
and (os.ORG_SYNONYM like :orgSynonym or :orgSynonym is null)
This is a pretty simple query. I had to use the case - when to handle the null value of "orgIdList" parameter(when null is passed to sql IN it gives error). Below is the relevant java code which sets the parameter.
if (_orgSynonym.getOrgIdList().isEmpty()) {
query.setString("orgIdList", "pass");
query.setString("pass", "Y");
} else {
query.setString("pass", "N");
query.setParameterList("orgIdList", _orgSynonym.getOrgIdList());
}
This works and give me the expected output. But I would like to know if there is a better way to handle this situation(orgIdList sometimes become null).
There must be at least one element in the comma separated list that defines the set of values for the IN expression.
In other words, regardless of Hibernate's ability to parse the query and to pass an IN(), regardless of the support of this syntax by particular databases (PosgreSQL doesn't according to the Jira issue), Best practice is use a dynamic query here if you want your code to be portable (and I usually prefer to use the Criteria API for dynamic queries).
If not need some other work around like what you have done.
or wrap the list from custom list et.

doctrine native sql not accepting parameter list

I'm trying to do native SQL in Doctrine. Basically I have 2 parameters:
CANDIDATE_ID - user for who we delete entries,
list of FILE_ID to keep
So I make
$this->getEntityManager()->getConnection()->
executeUpdate( "DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN :KEEPID",
array(
"ID" => $candidate->id,
"KEEPID" => array(2) )
);
But Doctrine fails:
Notice: Array to string conversion in D:\xampp\htdocs\azk\vendor\doctrine\dbal\lib\Doctrine\DBAL\Connection.php on line 786
Is this bug in Doctrine? I'm making somewhere else select with IN but with QueryBuilder and it's working. Maybe someone could suggest better way of deleting entries, with QueryBuilder for example?
$stmt = $conn->executeQuery('SELECT * FROM articles WHERE id IN (?)',
array(array(1, 2, 3, 4, 5, 6)),
array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY)
);
From Doctrine's documentation.
You can't pass an array of IDs to a parameter. You can do this for scalar values, but even if this had a 'toString', it wouldn't be what you want.
String concatenation is one method,
"DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN (". implode(",", $list_of_ids) .")"
But this method goes straight around parameters, and therefore suffers in terms of readability, and is limited to a certain maximum line length, which can vary between databases.
Another approach is to write a function returning a table result, which takes a string of IDs as a parameter.
You could also solve this with a join to a table containing the IDs to keep.
It's a problem I've seen many times with few good answers, but it's usually caused by a misunderstanding in the way the database is modelled. This is a 'code smell' for database access.

Yii Framework: How to get the num_rows?

As the official documentation does not say how to do a simply "num_rows" with their system, i need some help here: How to get the amount of rows in the result set ?
Assuming:
$connection=Yii::app()->db;
$command=$connection->createCommand($sql);
This will work for insert, update and delete:
$rowCount=$command->execute();
execute(): performs a non-query SQL statement, such as INSERT, UPDATE and DELETE. If successful, it returns the number of rows that are affected by the execution.
For select, you could do the following:
$dataReader=$command->query();
This generates the CDbDataReader instance and CDbDataReader provides a rowCount property
http://www.yiiframework.com/doc/api/1.1/CDbDataReader#rowCount-detail
$rowCount = $dataReader->rowCount;
About rowCount => Returns the number of rows in the result set. Note, most DBMS may not give a meaningful count. In this case, use "SELECT COUNT(*) FROM tableName" to obtain the number of rows.
ActiveRecord has count method which can be used.
$cntCriteria = new CDbCriteria();
$cntCriteria->condition = "categoryId = :categoryId";
$cntCriteria->params[':categoryId'] = $categoryRow->categoryId;
$articleCount = Article::model()->count($cntCriteria);
There is one more way to do this. When we execute a sql query it will return the result as array only. So we can able get the count of the rows using count() function like below.
$output=User::model()->findAllBySql("select * from user");//User is a model belongs to the user table
$count_val=count($output);//$count_val has the value of number of rows in the output.

checking if all the items in list occur in another list using linq

I am stuck with a problem here. I am trying to compare items in a list to another list with much more items using linq.
For example:
list 1: 10,15,20
list 2: 10,13,14,15,20,30,45,54,67,87
I should get TRUE if all the items in list 1 occur in list 2. So the example above should return TRUE
Like you can see I can't use sequenceEquals
Any ideas?
EDIT:
list2 is actually not a list it is a column in sql thas has following values:
<id>673</id><id>698</id><id>735</id><id>1118</id><id>1120</id><id>25353</id>.
in linq I did the following queries thanks to Jon Skeets help:
var query = from e in db
where e.taxonomy_parent_id == 722
select e.taxonomy_item_id;
query is IQueryable of longs at this moment
var query2 = from e in db
where query.Contains(e.taxonomy_item_id)
where !lsTaxIDstring.Except(e.taxonomy_ids.Replace("<id>", "")
.Replace("</id>", "")
.Split(',').ToList())
.Any()
select e.taxonomy_item_id;
But now I am getting the error Local sequence cannot be used in LINQ to SQL implementation of query operators except the Contains() operator.
How about:
if (!list1.Except(list2).Any())
That's about the simplest approach I can think of. You could explicitly create sets etc if you want:
HashSet<int> set2 = new HashSet<int>(list2);
if (!list1.Any(x => set2.Contains(x)))
but I'd expect that to pretty much be the implementation of Except anyway.
This should be what you want:
!list1.Except(list2).Any()
var result = list1.All(i => list2.Any(i2 => i2 == i));