How to Convert a for loop to NHibernate Futures for performance - nhibernate

NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = _session.Query<OptionVersion>()
        .Where(x => x.Option.Id == choice.OptionId)
        .OrderByDescending(x => x.OptionVersionNumber)
        .FirstOrDefault();
    choice.ChoiceVersion = _session.Query<ChoiceVersion>()
        .Where(x => x.Choice.Id == choice.ChoiceId)
        .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
        .FirstOrDefault();
}

One option is to extract the OptionId and ChoiceId values from all the LineItemChoices into two lists in local memory. Then issue just two queries, one for options and one for choices, passing these lists in .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to the SQL IN operator. It requires some postprocessing: you will get two result lists (transform them to a dictionary or lookup if you expect many results), which you then process to populate the choice objects. This postprocessing is local and tends to be very cheap compared to database round trips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than one result for a single optionId? If not, this code could instead have used SingleOrDefault, which could simply be dropped when converting to IN queries.
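For example, a minimal sketch of this approach, using the entity and property names from the question (the ToLookup covers the case where one id matches several versions, so the FirstOrDefault behavior is preserved):
var optionIds = lineItem.LineItemChoices.Select(c => c.OptionId).Distinct().ToList();

// One round trip for all option versions, via the SQL IN operator.
var optionVersionsById = _session.Query<OptionVersion>()
    .Where(x => optionIds.Contains(x.Option.Id))
    .ToList()
    .ToLookup(x => x.Option.Id);

// Local postprocessing: pick the latest version for each choice.
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = optionVersionsById[choice.OptionId]
        .OrderByDescending(x => x.OptionVersionNumber)
        .FirstOrDefault();
}

// ChoiceVersion can be filled the same way with a second query.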
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For LINQ this means appending ToFuture or ToFutureValue to the query, which I believe also conflicts with FirstOrDefault. The important thing is that you need to loop over all line item choices to initialize ALL the queries BEFORE you access the value of any of them. So this is likely to also require some postprocessing, where you would first store the future values in a list and then, in a second loop, access the real value of each query to populate the line item choice.
If you do expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can just use Take(1) instead, as that still returns an IQueryable on which you can apply the future method.
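A minimal sketch of the futures approach, assuming the entity names from the question (LineItemChoice is my guess for the collection's element type):
// First pass: register one future query per line item choice.
// Nothing is sent to the database yet.
var pendingOptions = new Dictionary<LineItemChoice, IEnumerable<OptionVersion>>();
foreach (var choice in lineItem.LineItemChoices)
{
    pendingOptions[choice] = _session.Query<OptionVersion>()
        .Where(x => x.Option.Id == choice.OptionId)
        .OrderByDescending(x => x.OptionVersionNumber)
        .Take(1)
        .ToFuture();
}

// Second pass: enumerating the first future executes ALL pending
// futures in a single round trip (given a driver that supports batching).
foreach (var pair in pendingOptions)
{
    pair.Key.OptionVersion = pair.Value.FirstOrDefault();
}
The ChoiceVersion queries can be registered in the same first pass, so everything still goes out in one batch.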
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep in mind the limit on the maximum number of parameters that can be passed in an SQL query. If there can be thousands of line item choices, you may need to split them into batches and query for at most 2000 identifiers per round trip.
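For example, a rough batching sketch, assuming the optionIds list built as in option one (the 2000 cutoff is illustrative; the actual limit depends on the database and driver):
const int batchSize = 2000;
var allOptionVersions = new List<OptionVersion>();
for (int i = 0; i < optionIds.Count; i += batchSize)
{
    // Each batch becomes one IN (...) query with at most 2000 parameters.
    var batch = optionIds.Skip(i).Take(batchSize).ToList();
    allOptionVersions.AddRange(
        _session.Query<OptionVersion>()
            .Where(x => batch.Contains(x.Option.Id))
            .ToList());
}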

Adding to Oskar's answer: NHibernate Futures were implemented in NHibernate 2.1. The feature is available through the Future method for collections and FutureValue for single values (for LINQ queries, ToFuture and ToFutureValue).
In your case, you could first collect the IDs from the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId).Distinct().ToList();
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId).Distinct().ToList();
... and execute two queries using futures to get the two lists in a single round trip to the database.
var optionVersions = _session.Query<OptionVersion>()
    .Where(x => optionIds.Contains(x.Option.Id))
    .OrderByDescending(x => x.OptionVersionNumber)
    .ToFuture();
var choiceVersions = _session.Query<ChoiceVersion>()
    .Where(x => choiceIds.Contains(x.Choice.Id))
    .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
    .ToFuture();
Then, with everything you need in memory, you can loop over the original collection and search the in-memory results to populate each choice object. Note that the futures are not executed until one of them is enumerated, at which point both queries go to the database in a single round trip.
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = optionVersions
        .OrderByDescending(x => x.OptionVersionNumber)
        .FirstOrDefault(x => x.Option.Id == choice.OptionId);
    choice.ChoiceVersion = choiceVersions
        .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
        .FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}

Related

Why does the where() method run SQL queries after all nested relations are eager-loaded?

In my controller method for the index view I have the following line.
@students_instance = Student.includes(:memo_tests => {:memo_target => :memo_level})
So for each Student I eager-load all necessary info.
Later on in a .map block, I call the .where() method on one of the relations as shown below.
@all_students = @students_instance.map do |student|
...
last_pass = student.memo_tests.where(:result => true).last.created_at.utc
difference_in_weeks = ((last_pass.to_i - current_date.to_i) / 1.week).round
...
end
This leads to a separate SQL query for each student. And since I have over 300 students, that leads to very slow load times and 300+ SQL queries.
Am I right in thinking that this is caused by the .where() method? I think this because I have checked everything else, and these are the two lines that cause all of the queries.
More importantly, is there a better way to do this that reduces these queries to a single query?
The moment you call where, the statement is translated into a new SQL query; it does not filter the already eager-loaded records in memory. Normally, the result should be SQL-cached...
Anyway, in order to be sure, you can apply plain Ruby logic to the loaded collection instead. That way, you are not requesting a NEW SQL statement.
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.sort.last
EDIT
I see the OP's question does not require sorting... So, leaving the sorting out:
last_pass = student.memo_tests.map {|m| m.created_at if m.result}.compact.last
compact is required to remove nil results from the array.

How to handle caching counters with Redis?

I am using Postgres as the main DB and Redis for caching. I am working on a caching mechanism for one DB query which takes too much time (it's about 5-6 JOINs plus nested SELECTs). For now I am caching the result of this query using SET 'some key' JSON.stringify(query.result). This works fine; however, I have one column that cannot be cached - it is called commentsCount. It has to be always up to date. As a temporary solution, I am querying the DB just for this one particular field, like this:
app.get('/post/getBySlug/:slug', function(req, res, next){
    var cacheKey = req.params.slug + '|' + req.params.language; // "my-post-slug|en-us" for example
    cache.get(cacheKey, function(err, post){
        if (err) throw err;
        if (post) {
            db.getPostCommentsCount({ where: { id: post.id }}).done(function(err, commentsCount){
                if (err) throw err;
                post.commentsCount = commentsCount;
                res.json(post);
                next();
            });
        } else {
            db.getFullPostBySlug(req.params.slug, req.params.language).done(function(err, post){
                if (err) throw err;
                cache.set(cacheKey, post);
                res.json(post);
                next();
            });
        }
    });
});
But it is still not what I want, because the main DB is still queried. Is there any standard/good practice for storing counters in Redis? My comment insert function looks like this:
START TRANSACTION
INSERT INTO "Comments" VALUES (...) // insert comments
UPDATE "Posts" SET "commentsCount" = "commentsCount" + 1 WHERE "Posts"."id" = 123456 // update counter on post
COMMIT TRANSACTION
I am using a transaction because I don't want a comment to be inserted without incrementing the comments count. As a "side" question: is it better to make two SQL queries in a transaction, or to write a trigger that handles incrementing the counter?
Regarding my query (I posted a link to a gist in the comments):
We don't plan on more than 2 languages (though it is possible).
I made those counters because I have to keep the counters separate per language, be able to order by those separate counters, and also be able to order by the sum of the counters (the total for all languages) - I found it hard to write a query that would order by a sum of columns from separate rows while still returning those rows... (At the beginning, the counters were stored in the language translations.)
Generally, this query looks for a post where a translation exists with a specific 'slug' and 'language' (slug+language on the post translation is a unique index). Moreover, the post has to be published (isPublished = boolean) and post.status has to be 'published' (status = enum) or post.isComingSoon has to be true (isComingSoon = boolean). Do you have any idea what index/ordering I could add to this query? Or should I just remove the limit?
In every translation table I keep the language as TEXT. It can be, for example, en-us or zh-cn, etc. Do you think I should make it an enum, or maybe I should make another table to store languages and just keep a language_id in the translations?
Author actually can be null :)

Save huge array to database

First the introduction, in case there is a better approach: I have a product table with product_id and stock, where stock can be as big as 5000 or 10000. I need to create a list (in another table) with a row for each item; that is, if a product_id has stock 1000, I'll have 1000 rows with this product_id. In addition, this list needs to be random.
I chose a PHP (Symfony2) solution, as I found how to get a single random product_id based on stock and even how to randomly order the product list, but I didn't find how to "multiply" these rows by stock.
Now, the main problem:
So, in PHP it's not so difficult: get the product_id list, "multiply" by stock and shuffle. The problem comes when I want to save:
If I use $em->flush() every 100 records or more, I get a memory overflow after a while
If I use $em->flush() on every record, it takes ages to save
This is my code to save which maybe you can improve:
foreach ($huge_random_list as $indice => $id_product)
{
    $preasignacion = new ListaPreasignacion();
    $preasignacion->setProductId($id_product);
    $preasignacion->setOrden($indice + 1);
    $em->persist($preasignacion);
    if ($indice % 100 == 0) $em->flush();
}
$em->flush();
Edit with the final solution, based on @Pazi's suggestion:
$conn = $em->getConnection();
foreach ($huge_random_list as $indice => $id_product)
{
    // "order" is quoted because ORDER is a reserved word in SQL;
    // parameters avoid building the SQL by string interpolation
    $conn->executeUpdate('INSERT INTO product_list (product_id, "order") VALUES (?, ?)',
        array($id_product, $indice));
}
I would suggest abstaining from the Doctrine ORM and using the DBAL connection and pure SQL queries for this purpose. I always do this in my applications where I have to store a lot of data in a short time. Doctrine adds too much overhead with its objects, checks and hydration. (If you do stay with the ORM, calling $em->clear() after each flush detaches the managed entities, which keeps batched inserts from overflowing memory.) You can retrieve the DBAL connection via the DI container. For example, in a controller:
$conn = $this->get('database_connection');
Read more about DBAL

Rails 3 selecting only values

In rails 3, I would like to do the following:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
This works, but I get the following from the DB:
[{"some_other_connection_id":254},{"some_other_connection_id":315}]
Now, those ids are the ones I need, but I am unable to write a query that gives me only the ids. I do not want to have to iterate over the results just to get those numbers out. Is there any way for me to do this with something like:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").values()
Or something of that nature?
I have been trying the ".select_values()" found on GitHub, but it only returns "some_other_connection_id".
I am not an expert in Rails, so this info might also be helpful:
The "SomeModel" is a connecting table for a many-to-many relation in one of my other models. So, actually, what I am trying to do is to get, from the array of IDs, all the entries on the other side of the connection. Basically I have the source ids, and I want to get the data from the models with all the target ids. If there is a magic way of getting these without me having to write all the SQL myself (with some help from ActiveRecord), it would be really nice!
Thanks :)
Try the pluck method:
SomeModel.where(:some => condition).pluck("some_field")
It works like:
SomeModel.where(:some => condition).select("some_field").map(&:some_field)
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").map &:some_other_connection_id
This is essentially a shorthand for:
results = SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
results.map {|row| row.some_other_connection_id}
Look at Array#map for details on the map method.
Beware that there is no lazy loading here, as it iterates over the results, but it shouldn't be a problem unless you want to add more constructs to your query or retrieve some associated objects (which should not be the case, as you haven't got the ids for loading the associated objects).

Grouping items into first occurrences

I totally can't get my head around writing a correct query for my problem this morning so here's hoping that someone out there can help me out.
I have a database table called Sessions which basically looks like this
Sessions:
SessionID
SessionStarted
IPAddress
..other meta data..
I have a requirement where I am to show how many new Sessions (where new is defined as from a previously unseen IPAddress) arrive each day over a given period. Basically, each IPAddress should count only once in the results, namely for the day of the first session from the IPAddress. So I'm looking for a result like:
[Date] [New]
2009-10-01 : 11
2009-10-02 : 6
2009-10-03 : 19
..and so on
...which I can plot on some nice chart and show to important people. I would very much prefer a Linq2SQL query as that is what we are currently using for data access, but if I'm out of luck I may be able to go with some raw SQL (accessed via stored procedure, but I would really, really, really prefer Linq2SQL).
(As a bonus my next step will very likely be qualifying which sessions should be included by filtering on some of the other meta data)
Hoping that someone clever will help me out here...
I would use something like this.
var result = data.OrderBy(x => x.SessionStarted)
.GroupBy(x => x.IPAddress)
.Select(x => x.First())
.GroupBy(x => x.SessionStarted.Date)
.Select(x => new { Date = x.Key, New = x.Count() });
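Note that if data is a Linq2SQL query rather than an in-memory collection, the First() inside the grouping may not translate to SQL. A variant that tends to translate better (a sketch; it takes the earliest SessionStarted per IPAddress and then groups those timestamps by day):
var result = data
    .GroupBy(x => x.IPAddress)
    // the earliest session per IP address marks the day that IP was first seen
    .Select(g => g.Min(x => x.SessionStarted))
    .GroupBy(d => d.Date)
    .Select(g => new { Date = g.Key, New = g.Count() })
    .OrderBy(x => x.Date);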