NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = _session.Query<OptionVersion>().Where(x => x.Option.Id == choice.OptionId).OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault();
choice.ChoiceVersion = _session.Query<ChoiceVersion>().OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).Where(x => x.Choice.Id == choice.ChoiceId).FirstOrDefault();
}
One option is to extract OptionId and ChoiceId from all the LineItemChoices into two lists in local memory. Then issue just two queries, one for options and one for choices, giving these lists in .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to SQL IN operator. This requires some postprocessing. You will get two result lists (transform to dictionary or lookup if you expect many results), that you need to process to populate the choice objects. This postprocessing is local and tends to be very cheap compared to database roundtrips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than result for a single optionId? If not, this code could instead have used SingleOrDefault, which could just be dropped if converting to use IN-queries.
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For Linq it means to use ToFuture or ToFutureValue at the end, which also conflicts with FirstOrDefault I believe. The important thing is that you need to loop over all line item choices to initialize ALL queries BEFORE you access the value of any of them. So this is likely to also result in some postprocessing, where you would first store the future values in some list, and then in a second loop access the real value from each query to populate the line item choice.
If you to expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can just use Take(1) instead, as that will still return an IQueryable where you can apply the future method.
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep the limit on the maximum number of parameters that can be given in an SQL query in mind. If there can be thousands of line item choices, you may need to split them in batches and query for at most 2000 identifiers per round trip.
Adding on the Oskar answer, NHibernate Futures was implement in NHibernate 2.1. It is available on method Future for collections and FutureValue for single values.
In your case, you could separate the IDs of the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId);
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId);
... and execute two queries using Future<T> to get two lits in one hit over the database.
var optionVersions = _session.Query<OptionVersion>()
.Where(x => optionIds.Contains(x.Option.Id))
.OrderByDescending(x => x.OptionVersionNumber)
.Future<OptionVersion>();
var choiceVersions = _session.Query<ChoiceVersion>()
.Where(x => choiceIds.Contains(x.Choice.Id))
.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
.Future<ChoiceVersion>();
After with all you need in memory, you could loop on the original collection you have and search in memory the data to fill up the choice object.
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = optionVersions.OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault(x => x.Option.Id == choice.OptionId);
choice.ChoiceVersion = choiceVersions.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}
In my controller method for the the index view I have the following line.
#students_instance = Student.includes(:memo_tests => {:memo_target => :memo_level})
So for each Student I eager-load all necessary info.
Later on in a .map block, I call the .where() method on one of the relations as shown below.
#all_students = #students_instance.map do |student|
...
last_pass = student.memo_tests.where(:result => true).last.created_at.utc
difference_in_weeks = ((last_pass.to_i - current_date.to_i) / 1.week).round
...
end
This leads to a single SQL query for each student. And since I have over 300+ students, leads to very slow load times and over 300+ SQL queries.
Am I right in thinking that this is caused by the .where() method. I think this because I have checked everything else and these are the two lines that cause all of the queries.
More importantly, is there a better way to do this that reduces these queries to a single query?
The moment you ask where, the statement is translated to a query. Normally, the result should be sql-cached...
Anyway, in order to be sure, you can instead add programming logic to your statement. That way, you are not requesting a NEW sql statement.
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.sort.last
EDIT
I see the OP's question does not require sorting... So, leaving the sorting out:
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.last
compact is required to remove nil results from the array.
I want to use Second level Cache for my query with eager loading(query is below wrote in 3 different ways, i use query cache). I have standard one to many association. I set entity cache for parent, child, and association between parent and class. And the 2nd level cache doesn't work because i got exceptions.
I wrote my query in 3 different way:
Criteria:
session.CreateCriteria<DictionaryMaster>().SetFetchMode("DictionaryItems", FetchMode.Eager)
.SetResultTransformer(new DistinctRootEntityResultTransformer())
.SetCacheable(true).SetCacheMode(CacheMode.Normal)
.List<DictionaryMaster>().ToList();
When I invoked this query i got exception "Unable to perform find[SQL: SQL not available]"
I think that the problem exists because I use DistinctRootEntityResultTransformer transform. I will start create my custom transform class and I hope that will work.
Query over:
session.QueryOver<DictionaryMaster>().Fetch(x => x.DictionaryItems).Eager
.TransformUsing(new DistinctRootEntityResultTransformer())
.Cacheable().CacheMode(CacheMode.Normal)
.List<DictionaryMaster>().ToList();
Exception is the same as in Criteria.
Linq:
session.Query<DictionaryMaster>().Fetch(x => x.DictionaryItems).Cacheable().CacheMode(CacheMode.Normal).ToList();
Here error depends from the version of nhibernate in 3.1 a got this error https://nhibernate.jira.com/browse/NH-2587?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
but in 3.2 version i got this: https://nhibernate.jira.com/browse/NH-2856
Thanks in advance
sJHony I have found solution, but for me is more like workaround. Therefore if you know any other solution give me sign.
I utilize QueryOver without any transformation, the fault of this solution is that the query return elements equal amount of childs. Next I retrieve not multiplied list in memory using distinct.
This solution is ok, but when we add one more Fetch for query, then collection in object also be multiplied, that is why i modified collection type from IList(Bag) to ISet(Set).
Code looks like:
var queryCacheResult =
session.QueryOver<DictionaryMaster>()
.Fetch(x => x.DictionaryItems).Eager
.Cacheable().CacheMode(CacheMode.Normal)
.List<DictionaryMaster>().ToList();
return queryCacheResult.Distinct(new KeyEqualityComparer<DictionaryMaster>(x => x.Code)).ToList();
I have rails controller coding as below:
#checked_contact_ids = #list.contacts.all(
:conditions => {
"contacts_lists.contact_id" => #list.contacts.map(&:id),
"contacts_lists.is_checked" => true
}
).map(&:id)
its equivalent to sql
SELECT *
FROM "contacts"
INNER JOIN "contacts_lists" ON "contacts".id = "contacts_lists".contact_id
WHERE ("contacts_lists".list_id = 67494 )
This above query takes more time to run, I want another way to run the same query with minimum time.
Is anyone knows please notice me Or is it possible? or is the above query enough for give output?
I am waiting information...................
I think the main problem with your original AR query is that it isn't doing any joins at all; you pull a bunch of objects out of the database via #list.contacts and then throw most of that work away to get just the IDs.
A first step would be to replace the "contacts_lists.contact_id" => #list.contacts.map(&:id) with a :joins => 'contact_lists' but you'd still be pulling a bunch of stuff out of the database, instantiating a bunch of objects, and then throwing it all away with the .map(&:id) to get just ID numbers.
You know SQL already so I'd probably go straight to SQL via a convenience method on your List model (or whatever #list is), something like this:
def checked_contact_ids
connection.execute(%Q{
SELECT contacts.id
FROM contacts
INNER JOIN contacts_lists ON contacts.id = contacts_lists.contact_id
WHERE contacts_lists.list_id = #{self.id}
AND contacts_lists.is_checked = 't'
}).map { |r| r['id'] }
end
And then, in your controller:
#checked_contact_ids = #list.checked_contact_ids
If that isn't fast enough then review your indexes on the contacts_lists table.
There's no good reason not go straight to SQL when you know exactly what data you need and you need it fast; just keep the SQL isolated inside your models and you shouldn't have any problems.
In a Rails app, I have a model, Machine, that contains the following named scope:
named_scope :needs_updates, lambda {
{ :select => self.column_names.collect{|c| "\"machines\".\"#{c}\""}.join(','),
:group => self.column_names.collect{|c| "\"machines\".\"#{c}\""}.join(','),
:joins => 'LEFT JOIN "machine_updates" ON "machine_updates"."machine_id" = "machines"."id"',
:having => ['"machines"."manual_updates" = ? AND "machines"."in_use" = ? AND (MAX("machine_updates"."date") IS NULL OR MAX("machine_updates"."date") < ?)', true, true, UPDATE_THRESHOLD.days.ago]
}
}
This named scope works fine in development mode. In production mode, however, it returns the 2 models as expected, but the models are empty or uninitialized; that is, actual objects are returned (not nil), but all the fields are nil. For example, when inspecting the return value of the named scope in the console, the following is returned:
[#<Machine >, #<Machine >]
But, as you can see, all the fields of the objects returned are set to nil.
The production and development environments are essentially the same. Both are using a SQLite database. Here is the SQL statement that is generated for the query:
SELECT
"machines"."id",
"machines"."machine_name",
"machines"."hostname",
"machines"."mac_address",
"machines"."ip_address",
"machines"."hard_drive",
"machines"."ram",
"machines"."machine_type",
"machines"."use",
"machines"."comments",
"machines"."in_use",
"machines"."model",
"machines"."vendor_id",
"machines"."operating_system_id",
"machines"."location",
"machines"."acquisition_date",
"machines"."rpi_tag",
"machines"."processor",
"machines"."processor_speed",
"machines"."manual_updates",
"machines"."serial_number",
"machines"."owner"
FROM
"machines"
LEFT JOIN
"machine_updates" ON "machine_updates"."machine_id" = "machines"."id"
GROUP BY
"machines"."id",
"machines"."machine_name",
"machines"."hostname",
"machines"."mac_address",
"machines"."ip_address",
"machines"."hard_drive",
"machines"."ram",
"machines"."machine_type",
"machines"."use",
"machines"."comments",
"machines"."in_use",
"machines"."model",
"machines"."vendor_id",
"machines"."operating_system_id",
"machines"."location",
"machines"."acquisition_date",
"machines"."rpi_tag",
"machines"."processor",
"machines"."processor_speed",
"machines"."manual_updates",
"machines"."serial_number",
"machines"."owner"
HAVING
"machines"."manual_updates" = 't'
AND "machines"."in_use" = 't'
AND (MAX("machine_updates"."date") IS NULL
OR MAX("machine_updates"."date") < '2010-03-26 13:46:28')
Any ideas what's going wrong?
This might not be related to what is happening to you, but it sounds similar enough, so here it goes: are you using the rails cache for anything?
I got nearly the same results as you when I tried to cache the results of a query (as explained on railscast #115).
I tracked down the issue to a still open rails bug that makes cached ActiveRecords unusable - you have to choose between not using cached AR or applying a patch and getting memory leaks.
The cache works ok with non-AR objects, so I ended up "translating" the stuff I needed to integers and arrays, and cached that.
Hope this helps!
Seems like the grouping may be causing the problem. Is the data also identical in both dev & production?
Um, I'm not sure you're having the problem you think you're having.
[#<Machine >, #<Machine >]
implies that you have called "inspect" on the array... but not on each of the individual machine-objects inside it. This may be a silly question, but have you actually tried calling inspect on the individual Machine objects returned to really see if they have nil in the columns?
Machine.needs_updates.each do |m|
p m.inspect
end
?
If that does in fact result in nil-column data. My next suggestion is that you copy the generated SQL and go into the standard mysql interface and see what you get when you run that SQL... and then paste it into your question above so we can see.