ScrollableResults size includes repeated values - SQL

I am working on an application using Hibernate and Spring. I am trying to get the count of the results returned by a query using ScrollableResults, but since the query contains lots of joins (inner joins), the result contains the same id repeated many times. This creates a problem when I use ScrollableResults to find the total number of unique rows (unique ids) returned from the database. Please help. Part of the code is below:
StringBuffer queryBuf = new StringBuffer("Some SQL query with lots of Joins");
Query query = getSession().createSQLQuery(queryBuf.toString());
query.setReadOnly(true);
ScrollableResults results = query.scroll();
if (!results.isLast()) {
    results.last();
}
int total = results.getRowNumber() + 1;
logger.debug(">>>>>>TOTAL COUNT<<<<<< = {}", total);
It gives a total count of 1440, but the actual number of unique rows in the database is 504.
Thanks in advance.

You can run the count as a separate query and read it as a single scalar:
Integer count = ((Long) query.uniqueResult()).intValue();
(This assumes query is a count query; an HQL SELECT COUNT returns a Long, while a native SQL count may come back as a BigInteger depending on the driver.)
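If what you actually need is the number of unique ids despite the joins, a COUNT(DISTINCT ...) wrapper is the usual fix. A minimal sketch, assuming your database supports derived tables and the joined query exposes an id column (the placeholder stands in for the original SQL):

// Wrap the original joined query so ids repeated by the inner joins are counted once.
// The derived-table alias q is required by most databases.
Query countQuery = getSession().createSQLQuery(
        "SELECT COUNT(DISTINCT q.id) FROM (Some SQL query with lots of Joins) q");
// Native SQL counts may come back as Long or BigInteger depending on the driver,
// so cast to Number before narrowing.
int total = ((Number) countQuery.uniqueResult()).intValue();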

Unfortunately, getRowNumber does not give you the size or the number of results, but the current position in the results. ScrollableResults does not provide a way to get the number of results out of the box.
I am referring to ScrollableResults in Hibernate 5.4.
As a workaround, you can try
long resultsCount = 0L;
while (results.next()) {
    resultsCount++;
}

getRowNumber() gives the number of the current row.
Call last() first; afterwards, getRowNumber() + 1 gives the total number of results. Note that with the joins in your query this counts every returned row, including the duplicated ids.


How to remove collection from model [duplicate]

I have a query which selects documents to be removed. Right now, I remove them manually, like this (using Python):
for id in mycoll.find(query, fields={}):
    mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:
def reduce_duplicates(mydb, max_group_size):
    # 1. Count the group sizes
    res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response=True)
    # 2. For each entry from the filter scratch collection having count > max_group_size
    deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
    for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
        key = entry['_id']
        group_size = int(entry['value'])
        # 2b. Query the original collection by the entry key, order by test_date ascending,
        #     limit to the group size minus max_group_size
        for id in mydb.static.find(key, limit=group_size - max_group_size, **deleteFindArgs):
            mydb.static.remove(id)
    return res['counts']['input']
So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:
1. MR the data to (key, count) pairs.
2. Iterate over all the pairs with count > max_group_size.
3. Query the data by key, sorting ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records.
4. Delete each and every found record.
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So the last two steps are foreach-found-remove, and this is the important detail of my question that changes everything; I had to be more specific about it - sorry.
Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:
mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to corrupt Mongo. Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
You can use a query to remove all matching documents
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.
Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than one query that deletes all 100k.
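A minimal pymongo sketch of that chunked approach, reusing the query from above (the MongoClient connection and mydb database name are hypothetical; mycoll is the collection from the question):

from pymongo import MongoClient

mycoll = MongoClient().mydb.mycoll  # hypothetical connection and database name
query = {'name': 'John'}
batch_size = 1000

while True:
    # fetch only the _ids of the next batch of matching documents
    ids = [doc['_id'] for doc in mycoll.find(query, {'_id': 1}).limit(batch_size)]
    if not ids:
        break
    # delete just this batch, keeping each individual operation small
    mycoll.delete_many({'_id': {'$in': ids}})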
You can remove documents directly using the MongoDB shell:
db.mycoll.remove({_id:'your_id_here'});
Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6m documents in a 100m-document collection. Documentation: https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany
db.collection.deleteMany(
    <filter>,
    {
        writeConcern: <document>,
        collation: <document>
    }
)
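Since the question uses pymongo, the equivalent there is delete_many; a minimal sketch with the same hypothetical connection details as above:

from pymongo import MongoClient

mycoll = MongoClient().mydb.mycoll  # hypothetical database name; mycoll as in the question
result = mycoll.delete_many({'name': 'John'})
print(result.deleted_count)  # number of documents removed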
I would recommend paging if there is a large number of records.
First, get the count of the data you want to delete:
-------------------------- COUNT --------------------------
var query = {"FIELD": "XYZ", 'DATE': {$lt: new ISODate("2019-11-10")}};
db.COL.aggregate([
    {$match: query},
    {$count: "all"}
])
Second: Start deleting chunk by chunk:
-------------------------- DELETE --------------------------
var query = {"FIELD": "XYZ", 'DATE': {$lt: new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
    {$match: query},
    {$limit: 5}
])
cursor.forEach(function (doc) {
    db.COL.remove({"_id": doc._id});
});
and this should be faster:
var query = {"FIELD": "XYZ", 'DATE': {$lt: new ISODate("2019-11-10")}};
var ids = db.COL.find(query, {_id: 1}).limit(5);
db.COL.deleteMany({"_id": {"$in": ids.map(r => r._id)}});
Run this query in the mongo shell:
db.users.remove({"_id": ObjectId("5a5f1c472ce1070e11fde4af")});
If you are using Node.js, write this:
User.remove({ _id: req.body.id }, function(err) { ... });

Share NHibernate parameter in multiple queries

I have two queries: the first returns the top 10/20 records, and the second returns the total record count for the first query. Both queries need to use the same filter condition.
How can I write the filter condition, and the parameters used in it, in one place and use them in both queries?
I can store the condition in a string variable and use it in both queries, but how do I share the parameters?
I am using HQL.
Check this similar Q & A: Nhibernate migrate ICriteria to QueryOver
There is native support in NHibernate for row count. Let's have some query:
// the QueryOver
var query = session.QueryOver<MyEntity>();
It could have any number of where parts, projections... Now we just take its underlying criteria and use a transformer to create a brand new ICriteria, ready out of the box to get the total row count:
// GET A ROW COUNT query (ICriteria)
var rowCount = CriteriaTransformer.TransformToRowCount(query.UnderlyingCriteria);
The next step is to use a Future to get both queries in one round trip to the DB:
// ask for a list, but with a Future, to combine both in one SQL statement
var futureList = query
    .Future<MyEntity>();
// execute the main and count query at once
var count = rowCount
    .FutureValue<int>()
    .Value;
// list is now in memory, ready to be used
var list = futureList
    .ToList();

PDO SQL issue displaying multiple rows when using COUNT()

To display my results from PDO, I always use PHP code like the following:
$STH = $DBH->prepare("SELECT logo_id, guess_count, guessed, count(id) AS counter FROM guess WHERE user_id=:id");
$STH->bindParam(":id", $loginuser['id']);
$STH->execute();
while ($row = $STH->fetch()) {
    print_r($row);
}
Now the issue is that I only get one result. I used to use $STH->rowCount() to check the number of rows returned, but that method isn't really advised for SELECT statements because it doesn't behave correctly with some databases. So I used count(id) AS counter, but now I only ever get one row, even though the value of $row['counter'] is larger than one.
What is the correct way to count the number of results in one query?
If you want to check the number of rows that are returned by a query, there are a couple of options.
You could do a ->fetchAll to get an array of all rows. This isn't advisable for large result sets (i.e. a lot of rows returned by the query); you could add a LIMIT clause to the query to avoid returning more than a certain number of rows. If all you are checking is whether you get more than one row back, you only need to retrieve two rows. Checking the length of the array is trivial.
Another option is to run a separate query to get the count, e.g.
SELECT COUNT(1) AS counter FROM guess WHERE user_id=:id
But that approach requires another round trip to the database.
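A minimal PDO sketch of that separate count query, reusing $DBH and $loginuser from the question:

$countSth = $DBH->prepare("SELECT COUNT(1) AS counter FROM guess WHERE user_id = :id");
$countSth->bindParam(":id", $loginuser['id']);
$countSth->execute();
$count = (int) $countSth->fetchColumn(); // reads the single column of the single result row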
And the old standby SQL_CALC_FOUND_ROWS is another option, though that too can have problematic performance with large sets.
You could also just add a loop counter in your existing code:
$i = 0;
while ($row = $STH->fetch()) {
    $i++;
    print_r($row);
}
print "fetched row count: " . $i;
If what you need is an exact count of the rows that satisfy a particular predicate PRIOR to running the query that returns them, then the separate COUNT(1) query is likely the most suitable approach. Yes, it's extra code in your app; I recommend you preface it with a comment that explains its purpose: to get an exact count of the rows that satisfy a set of predicates, before running the query that retrieves them.
If I had to process the rows anyway, and adding LIMIT 0,100 to the query was acceptable, I would go for the ->fetchAll(), get the count from the length of the array, and process the rows from the array.
You have to use GROUP BY; without it, the aggregate collapses the whole result set into a single row. Your query should look like:
SELECT logo_id, guess_count, guessed, COUNT(id) AS counter
FROM guess
WHERE user_id=:id
GROUP BY logo_id, guess_count, guessed

Total count for a limited result

So I implemented paging for dojo.store.jsonRest to use as the store in dojox.grid.DataGrid. On the server I'm using Symfony 2 with Doctrine as the ORM; I'm new to these two frameworks.
For Dojo's jsonRest, the response from the server must have a Content-Range header containing the result offset, limit, and the total number of records (without the limit).
So for a response with a Content-Range: items 0-24/66 header, if the user were to scroll the grid to row 24, it will make an async request with a Range: 24-66 header, and the response should then have a Content-Range: items 24-66/66 header. This is done so Dojo knows how many requests it can make for the paginated data and the record range for the current and subsequent requests.
So my problem is that to get the total number of records without the limit, I had to run a COUNT query built from the same query that has the offset and limit. I don't like this.
I want to know if there is a way to get the total count and the limited result without making two queries.
public function getByTextCount($text)
{
    $dql = "SELECT COUNT(s.id) FROM Bundle:Something s WHERE s.text LIKE :text";
    $query = $this->getEntityManager()->createQuery($dql);
    $query->setParameter('text', '%'.$text.'%');
    return $query->getSingleScalarResult();
}

public function getByText($text, $offset = 0, $limit = 24)
{
    $dql = "SELECT s FROM Bundle:Something s WHERE s.text LIKE :text";
    $query = $this->getEntityManager()->createQuery($dql);
    $query->setParameter('text', '%'.$text.'%');
    $query->setFirstResult($offset);
    $query->setMaxResults($limit);
    return $query->getArrayResult();
}
If you're using MySQL, you can do a SELECT FOUND_ROWS().
From the documentation.
A SELECT statement may include a LIMIT clause to restrict the number of rows the server returns to the client. In some cases, it is desirable to know how many rows the statement would have returned without the LIMIT, but without running the statement again. To obtain this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke FOUND_ROWS() afterward:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
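In the Doctrine setting of this question you would drop down to the DBAL connection for this, since FOUND_ROWS() is connection-scoped and must be read right after the main statement on the same connection. A sketch only, with a hypothetical something table standing in for the mapped entity's table (note also that MySQL deprecated SQL_CALC_FOUND_ROWS as of 8.0.17):

$conn = $this->getEntityManager()->getConnection();
$stmt = $conn->executeQuery(
    "SELECT SQL_CALC_FOUND_ROWS * FROM something WHERE text LIKE ? LIMIT 0, 24",
    array('%'.$text.'%')
);
$rows = $stmt->fetchAll();
// must run on the same connection, immediately afterwards
$total = (int) $conn->executeQuery("SELECT FOUND_ROWS()")->fetchColumn();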
If you want to use Doctrine only (i.e. to avoid vendor-specific SQL) you can always reset part of the query after you have selected the entities:
// $qb is a Doctrine Query Builder
// $query is the actual DQL query returned from $qb->getQuery(),
// then updated with ->setFirstResult(OFFSET) and ->setMaxResults(LIMIT)

// Get the entities as an array ready for JSON serialization
$entities = $query->getArrayResult();

// Reset the query and get the total records ready for the Range header;
// 'e' in COUNT(e) is the alias for the entity specified in the Query Builder
$count = $qb->resetDQLPart('orderBy')
    ->select('COUNT(e)')
    ->getQuery()
    ->getSingleScalarResult();

What's the least expensive way to get the number of rows (data) in a SQLite DB?

When I need to get the number of rows in a SQLite database, I run the following pseudocode:
cmd = "SELECT Count(*) FROM benchmark"
res = runcommand(cmd)
read res to get result.
But I'm not sure if it's the best way to go. What would be the optimal way to get the row count from a SQLite DB? I use Python for accessing SQLite.
Your query is correct but I would add an alias to make it easier to refer to the result:
SELECT COUNT(*) AS cnt FROM benchmark
Regarding this line:
read res to get result.
You don't want to count the number of rows in the result set; there will always be exactly one row. Just read the result from the cnt column of that single row.
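A minimal Python sqlite3 sketch of exactly that (the database file name is hypothetical; the benchmark table comes from the question):

import sqlite3

conn = sqlite3.connect('benchmark.db')  # hypothetical file name
try:
    cur = conn.execute("SELECT COUNT(*) AS cnt FROM benchmark")
    (row_count,) = cur.fetchone()  # COUNT(*) always yields exactly one row
    print(row_count)
finally:
    conn.close()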