Best way to get Count for paging in ravenDB - ravendb

I need to find the number of documents that are in the raven database , so that I can properly page the documents out. I had the following implementation -
public int Getcount<T>()
{
IQueryable<T> queryable = from p in _session.Query<T>().Customize(x =>x.WaitForNonStaleResultsAsOfLastWrite())
select p;
return queryable.Count();
}
But if the count is too large then it times out.
I tried the method suggested in FAQs -
public int GetCount<T>()
{
//IQueryable<T> queryable = from p in _session.Query<T>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
// select p;
//return queryable.Count();
RavenQueryStatistics stats;
var results = _session.Query<T>()
.Statistics(out stats);
return stats.TotalResults;
}
This always returns 0.
What am I doing wrong?

stats.TotalResults is 0 because the query was never executed. Try this instead:
var results = _session
.Query<T>()
.Statistics(out stats)
.Take(0)
.ToArray();

The strange syntax to get the statistics tripped me up as well. I can see why the query needs to be run in order to populate the statistic object but the syntax is a bit verbose imo.
I have written the following extension method for use in my unit tests. It helps keep the code terse.
Extension Method
public static int QuickCount<T>(this IRavenQueryable<T> results)
{
RavenQueryStatistics stats;
results.Statistics(out stats).Take(0).ToArray();
return stats.TotalResults;
}
Unit Test
...
db.Query<T>().QuickCount().ShouldBeGreaterThan(128);
...

Related

Select Count(*) Query using Dapper in .Net Core API returns incorrect value

I'm trying to do a select count query in Sql Server using Dapper. The expected response should be 0 when a profile does not exist. When I do the query in SSMS it returns correctly, but in the API using Dapper it returns 1. Any idea why this is happening?
public IActionResult GetProfileCount(string profileId)
{
int profileCount = 0;
using (IDbConnection db = new SqlConnection(connectionString))
{
try
{
profileCount = db.Query($"select count(*) from Profile where Id='{profileId}'").Count();
}
catch(Exception ex)
{
Console.WriteLine($"Error retrieving count for ProfileId: {profileId}", ex.Message);
}
}
return Ok(profileCount);
}
I see you added your own answer but could I recommend not doing it that way. When you do
profileCount = db.Query($"select * from Profile where Id='{profileId}'").Count();
What you are actually doing is selecting every field from the database, pulling it into your C# application, and then counting how many results you got back. Then you are binning all that data you got back, very inefficient!
Change it to this :
profileCount = db.QueryFirst<int>($"select count(*) from Profile where Id = #profileId", new { profileId })");
Instead you are selecting an "int" from the result set, which just so happens to be your count(*). Perfect!
More on querying in Dapper here : https://dotnetcoretutorials.com/2019/08/05/dapper-in-net-core-part-2-dapper-query-basics/
Also notice that (similar to the other answer), I am using parameterized queries. I also heavily recommend this as it protects you from SQL Injection. Your initial example is very vulnerable!
You can read a little more about SQL Injection in C#/MSSQL here https://dotnetcoretutorials.com/2017/10/11/owasp-top-10-asp-net-core-sql-injection/ But just know that Dapper protects you from it as long as you use the inbuilt helpers to add parameters to your queries.
Another option is use the method ExecuteScalar for "select count" queries:
profileCount = db.ExecuteScalar<int>("select count(*) from Profile where Id=#profileId", new { profileId });
Ref.: https://www.learndapper.com/selecting-scalar-values
Try and change your query to the following:
db.Query($"select count(*) from Profile where Id = #ProfileId", new { ProfileId = profileId }).Count()
I figured it out. The .Count() is counting the rows of the result, which is going to be 1 because the result is one row displaying the number 0. I switched my code to this and it works now.
public IActionResult GetProfileCount(string profileId)
{
int profileCount = 0;
using (IDbConnection db = new SqlConnection(connectionString))
{
try
{
profileCount = db.Query($"select * from Profile where Id='{profileId}'").Count();
}
catch(Exception ex)
{
Console.WriteLine($"Error retrieving count for ProfileId: {profileId}", ex.Message);
}
}
return Ok(profileCount);
}

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

Having difficulty with specific Future<T> query and collections with nHibernate.

For the sake of example, I am removing non-queried and non-essential data just to figure out how to do the initial query here.
I have a model structure like this.
class Path {
Guid Id { get; protected set; }
IList<Step> Steps { get; set; }
void AddStep(Step entity) {
// write up bidirectional association
}
}
class Step {
Guid Id { get; protected set; }
Path Path { get; set; }
// other data irreleveent
}
Now assuming 50000 steps, each with 5000 steps... I do realize I don't want to return all of them at once. But putting a limit on my query fetch isn't my real problem.
Here is the exact query I am attempting to use. I am getting the exception..
NHibernate.QueryException : duplicate alias: lpStep
----> System.ArgumentException : An item with the same key has already been added.
I'm not entirely sure how to handle this scenario. if I use a flat out Fetch on the Path query, I get Select+N errors from the NHibernate Profiler.
I do have batching enabled - but as far as I am aware, that only really applies to inserts, not retrievals. But in any case I am getting back these errors and not sure how to handle it. Any ideas?
using (var Transaction = Session.BeginTransaction()) {
Path lpPath = null;
Step lpStep = null;
var lpPaths = Session.QueryOver<Path>(() => lpPath)
.Take(50)
.Future<Path>();
var lpSteps = Session.QueryOver<Step>(() => lpStep)
.JoinAlias(() => lpPath.Steps, () => lpStep)
.Where(o => o.Path.Id == lpPath.Id)
.Take(12)
.Future<Step>();
Transaction.Commit();
foreach (var path in lpPaths) {
Console.WriteLine("{0} fetched {1} Steps",
path.Id, path.Steps.Count);
}
}
I basically want to say ..
Select (50) Paths, also, as a separate select but part of the same trip, Select the first (12) Steps that belong the previously selected Paths.
But if I use a flat out join, I get 110 rows, whereas I expect to have 2 tables, 1 of 50 rows, 1 of 600 rows.
Can someone explain to me what I am doing wrong?
mind you, I can do some minor alterations and the query runs, but it isn't 'optimized'. I can get the data I want, but it takes multiple trips and lazy loading. I can optimize the actual Path selection easily enough but it is those blasted Steps. If I just take a restrictive where clause out of the lpSteps query, it just returns the first 12 steps, not returning 12 steps for each query done.
I've looked at some of the other stack overflow posts on Future<T> and found them to look a lot like this. So I don't understand why it isn't working. I suspect that what is going on is this..
lpPaths runs.
lpSteps tries to run, first one succeeds.
lpSteps then tries to run again, finds it cannot redefine lpPaths.
Apocolypse
I'm really hoping someone smarter than me can enlighten me on the absolute most optimal way to write this.
i cant really understand what your use case is. why do you only need the first 12 Steps of each Path? What about batches of Steps to process
IList<Guid> pathIds;
while ((pathIds = QueryOver.For<Path>()
.Where(...)
.Projection(path => path.Id)
.SetmaxResults(100)).Count > 0)
{
int batch = 0;
const int batchsize = 600;
IList<Step> steps;
while ((steps = Session.QueryOver<Step>()
.Where(step => step.Path.Id).In(pathIds)
.Where(step => step. ...)
.SetFirstResult(batch * batchsize)
.Take(batchsize)
.List<Step>()).Count > 0)
{
DoSomething(steps);
batch++;
}
}

Querying Raven with Where() only filters against the first 128 documents?

We're using Raven to validate logins so people can get into our site.
What we've found is that if you do this:
// Context is an IDocumentSession
Context.Query<UserModels>()
.SingleOrDefault(u => u.Email.ToLower() == email.ToLower());
The query only filters on the first 128 docs of the documents in
Raven. There are several thousand in our database, so unless your
email happens to be in that first 128 returned, you're out of luck.
None of the Raven samples code or any
sample code I've come across on the net performs any looping using
Skip() and Take() to iterate through the set.
Is this the desired behavior of Raven?
Is it the same behavior even if you use an advanced Lucene Query? ie; Do advanced queries behave any differently?
Is the solution below appropriate? Looks a little ugly. :P
My solution is to loop through the set of all documents until I
encounter a non null result, then I break and return .
public T SingleWithIndex(string indexName, Func<T, bool> where)
{
var pageIndex = 1;
const int pageSize = 1024;
RavenQueryStatistics stats;
var queryResults = Context.Query<T>(indexName)
.Statistics(out stats)
.Customize(x => x.WaitForNonStaleResults())
.Take(pageSize)
.Where(where).SingleOrDefault();
if (queryResults == null && stats.TotalResults > pageSize)
{
for (var i = 0; i < (stats.TotalResults / (pageIndex * pageSize)); i++)
{
queryResults = Context.Query<T>(indexName)
.Statistics(out stats)
.Customize(x => x.WaitForNonStaleResults())
.Skip(pageIndex * pageSize)
.Take(pageSize)
.Where(where).SingleOrDefault();
if (queryResults != null) break;
pageIndex++;
}
}
return queryResults;
}
EDIT:
Using the fix below is not passing query params to my RavenDB instance. Not sure why yet.
Context.Query<UserModels>()
.Where(u => u.Email == email)
.SingleOrDefault();
In the end I am using Advanced Lucene Syntax instead of linq queries and things are working as expected.
RavenDB does not understand SingleOrDefault, so it performs a query without the filter. Your condition is then executed on the result set, but per default Raven only returns the first 128 documents.
Instead, you have to call
Context.Query<UserModels>()
.Where(u => u.Email == email)
.SingleOrDefault();
so the filtering is done by RavenDB/Lucene.

NHibernate, Sum Query

If i have a simple named query defined, the preforms a count function, on one column:
<query name="Activity.GetAllMiles">
<![CDATA[
select sum(Distance) from Activity
]]>
</query>
How do I get the result of a sum or any query that dont return of one the mapped entities, with NHibernate using Either IQuery or ICriteria?
Here is my attempt (im unable to test it right now), would this work?
public decimal Find(String namedQuery)
{
using (ISession session = NHibernateHelper.OpenSession())
{
IQuery query = session.GetNamedQuery(namedQuery);
return query.UniqueResult<decimal>();
}
}
As an indirect answer to your question, here is how I do it without a named query.
var session = GetSession();
var criteria = session.CreateCriteria(typeof(Order))
.Add(Restrictions.Eq("Product", product))
.SetProjection(Projections.CountDistinct("Price"));
return (int) criteria.UniqueResult();
Sorry! I actually wanted a sum, not a count, which explains alot. Iv edited the post accordingly
This works fine:
var criteria = session.CreateCriteria(typeof(Activity))
.SetProjection(Projections.Sum("Distance"));
return (double)criteria.UniqueResult();
The named query approach still dies, "Errors in named queries: {Activity.GetAllMiles}":
using (ISession session = NHibernateHelper.OpenSession())
{
IQuery query = session.GetNamedQuery("Activity.GetAllMiles");
return query.UniqueResult<double>();
}
I think in your original example, you just need to to query.UniqueResult(); the count will return an integer.