For our web application (ASP.NET) we're using Fluent NHibernate (2.1.2) with second-level caching not only for entities, but also for queries (we generate the queries with the criteria API). We're using the session-per-request pattern and one SessionFactory application-wide, so the cache serves all NHibernate sessions.
Problem:
We have to deal with different access rights per user on the data objects in our legacy database (Oracle); that is, views constrain the returned data according to the user's rights.
So there's the situation where, for example, the same view is queried by our criteria with the exact same query, but returns a different result set depending on the user's rights.
Now, to gain performance, the mentioned query is cached. But this gives us the problem that when the query is first fired from an action of user A, it caches the resulting IDs, which are the IDs user A has access rights to. Shortly after, the same query is fired from an action of user B, and NHibernate then picks the cached IDs from the first call (from user A) and tries to fetch the corresponding entities, to which user B doesn't have access rights (or maybe not to all of them). We check the rights with event listeners, so our application throws an access-rights exception in the mentioned case.
Thoughts:
Not caching the queries could be an option against this. But performance is clearly an issue in our application, so it would be really desirable to have queries cached per user.
We even thought about a SessionFactory per user, to get a sort of per-user cache. But this clearly has an impact on resources, is somewhat of an overkill, and honestly isn't an option, because there are entities which have to be accessed, and are manipulated, by multiple users (think of a user group), creating issues with stale data in the "individual caches" and so on. So that's a no-go.
What would be a valid solution for this? Is there something like a "best practice" for such a situation?
Idea:
As I was stuck with this yesterday, seeing no way out, I slept on it, and today I came up with some sort of a "hack".
As NHibernate caches the query by query text and parameters ("clauses"), I thought about a way to "smuggle" something user-dependent into the signature of the query, so it would be cached per user without altering the query itself (as far as its result is concerned).
So "creativity" guided me to this (example code):
string userName = GetCurrentUser();
ICriteria criteria = session.CreateCriteria(typeof(EntityType))
    .SetCacheable(true)
    .SetCacheMode(CacheMode.Normal)
    .Add(Expression.Eq("PropertyA", 1))
    .Add(Expression.IsNotNull("PropertyB"))
    // Always-true clause that smuggles the user name into the query text,
    // and thereby into NHibernate's query-cache key:
    .Add(Expression.Sql(string.Format("'{0}' = '{0}'", userName)));
return criteria.List();
This line:
.Add(Expression.Sql(string.Format("'{0}' = '{0}'", userName)))
results in a where clause which always evaluates to true, but "changes" the query from NHibernate's viewpoint, so it caches once per distinct userName.
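A variation that occurred to me: the user name could travel as a bound parameter instead of being spliced into the SQL text, since, as far as I can tell, NHibernate includes parameter values in the query-cache key; that would also avoid quoting trouble if a user name ever contains an apostrophe. A sketch, untested, using the Expression.Sql overload that takes a value and an IType:

// Still an always-true clause, but the user name is bound as a parameter:
.Add(Expression.Sql("? is not null", userName, NHibernateUtil.String))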
I know it's kind of ugly, and I'm not really pleased with it.
Does anybody know an alternative approach?
Thanks in advance.
Related
I have the following scenario: a search returns a list of user_id values (1, 2, 3, 4, 5, 6, etc.). If the search were run again, the results are guaranteed to change after some time. However, I need to store that instance of the search results to be used in the future.
We have a current (legacy) implementation which creates a record for the search_id with the criteria and inserts every row returned into a second table with the associated search_id.
table search_results
search_id unsigned int FK, PK (clustered index)
user_id unsigned int FK
This is an unacceptable approach, as the table has grown to millions of records. I've considered partitioning the table, but that would leave me with a huge number of partitions (thousands).
I've already arranged for search results to expire unless they're used elsewhere, so all of the stored search results are still referenced somewhere.
In the current schema, I cannot store the results as serialized arrays or XML. I am looking to store the search result information in a way that can be accessed efficiently later, without being burdened by the number of records.
EDIT: Thank you for the answers. I don't have any problems running the searches themselves, but the result set of a search is used in this case for recipient lists, which will be reused over and over again; the purpose of storing them is exactly to have a snapshot of the data at a given time.
The answer is don't store query results. It's a terrible idea!
It introduces statefulness, which is very bad unless you really (really really) need it
It isn't scalable (as you're finding out)
The data is stale as soon as it's stored
The correct approach is to fix your query/database so it runs acceptably quickly.
If you can't make the queries faster using better SQL and/or indexes etc., I recommend using Lucene (or any text-based search engine) and denormalizing your database into it. Lucene queries are incredibly fast.
I recently did exactly this on a large web site that was doing what you're doing: it was caching query results from the production relational database in the session object in an attempt to speed up queries, but it was a mess and wasn't much faster anyway - a decision made before my time.
I put in Solr (a Java search server built on Lucene) and kept it up to date with the relational database (using work queues), and the web queries now take just a few milliseconds.
Is there a reason why you need to store every search? Surely you would want the most up-to-date information available for the user?
I'll admit first, this isn't a great solution.
Set up another database alongside your current one [SYS_Searches]
The save script could use SELECT INTO [SYS_Searches].Results_{Search_ID}
The script that retrieves can do a simple SELECT out of the matching table.
Benefits:
Every search is neatly packed into its own table, [preferably in another DB]
The retrieval query is very simple
The retrieval time should be very quick, no massive table scans.
Drawbacks:
You will have a table for every x users * y searches a user can store.
This could get very silly very quickly unless there is management involved to expire results, or unless a user can only have one cached search result set.
Not pretty, but I can't think of another way.
Excuse the potential n00bness of this question - still trying to get my head around this non-relational NoSQL stuff.
I've been super impressed with the performance and simplicity of ElasticSearch, but I've got a mapping (borderline NoSQL theory) question to answer before I dive too deeply into the implementation.
Let's continue to use the Twitter examples ElasticSearch has in its documentation.
Basically, we know a tweet belongs to a user, and a user has many tweets.
The objects look something like this:
user = {'screen_name':'d2kagw', 'id_str':'1234567890', 'favourites_count':'15', ...}
tweet = {'message':'lorem lipsum...', 'user_id_str':'1234567890', ...}
What I'm wondering is, can the tweet object have a reference to the user object?
Since I want to be able to write queries like:
{'query': {
    'term':{'message':'lipsum'},
    'range':{'user.favourites_count':{'from':10, 'to':30}}
}}
I would like this to return the matching tweets with their user objects as part of the response (vs. having to lazy-load them later).
Am I asking too much of it?
Should I be expected to throw all the user data into the tweet object if I want to query the data in that way?
In my implementation (which doesn't use Twitter; that was just an elegant example), I need to keep the two datasets in different indexes due to the various ways I have to query the data, so I'm not sure if I can use an object type AND have the index structure I require.
Thanks in advance for your help.
ElasticSearch doesn't really support the table joins that we are so used to in the SQL world. The closest it gets is the Has Child Query, which allows limiting results in one index based on the presence of a record in another, and even that is limited to a one-to-many (parent-children) relationship.
So, a common approach in this world would be to denormalize everything and query one index at a time.
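For the Twitter example above, that would mean copying the user fields you filter on into each tweet document; a sketch, reusing the field names from the question (the user_favourites_count field is a made-up name for the copied value):

tweet = {'message':'lorem lipsum...', 'user_id_str':'1234567890', 'user_favourites_count':15, ...}

The range clause can then run against 'user_favourites_count' on the tweet index alone, at the cost of re-indexing tweets whenever those user fields change.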
Version: NHibernate 2.1
As can be seen from the vast array of similar questions, we're not alone in experiencing problems with paging generating duplicates. We thought it was only happening with HQL queries, but one of our clients has reported seeing it where the query is a Criteria query.
So far we've only seen it on the reporting side, where we tend to collect bits of information from various "associated" entities and use the AliasToBeanTransformer to put them into a DTO (data transfer object):
.SetResultTransformer(new AliasToBeanResultTransformer(typeof(OurDTO)));
We're not new to NHibernate, but we're certainly not aware of so many of its subtleties, and as a result weren't aware of
new NHibernate.Transform.DistinctRootEntityResultTransformer()
which could potentially eliminate our duplicates; but I'm struggling to see how we could use it when the result is not a mapped entity, i.e. a DTO.
We've tried creating a custom dialect which seems to have served some people well enough to be confident of consistent behaviour.
I realise there's no such thing as a silver bullet and context is always the kicker, but has anyone managed to come up with a solution for this?
The code we use to handle the collation of the pages is as follows:
query.SetMaxResults(50);
// Collect the full result set one page (50 rows) at a time.
for (int i = 0; ; ++i)
{
    query.SetFirstResult(i * 50);
    IList results = query.List();
    cumulativeResults.AddRange(results);
    OnRecordsLoaded(results.Count);
    if (results.Count < 50)
    { break; }
}
Many thanks for any input on this.
With kind regards
Colin
NHibernate does not produce the duplicates; the relational database does, and you cannot prevent that.
If your query involves a one-to-many join (say you have customer and order tables, there is a one-to-many relation between customers and orders, and you query the customers filtering by order), you will get the same customer multiple times (with the same identity).
The way to prevent it is to use hashed sets in memory, assuming you properly overrode Equals and GetHashCode for your entities, which you should. If you put the results into a HashedSet (from Iesi, or HashSet<T> in .NET 4), the duplicates will be eliminated.
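A minimal sketch of that de-duplication step (the Customer entity is hypothetical and is assumed to compare by its database identity via Equals/GetHashCode):

IList<Customer> page = query.List<Customer>();
// HashSet<T> uses Equals/GetHashCode, so the duplicate rows produced
// by the one-to-many join collapse into one entry per customer:
var distinctCustomers = new HashSet<Customer>(page);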
That's one of the gotchas of ORMs.
I'm working on a large project using the Kohana 3 framework, and I need to improve it by adding a cache system to reduce the number of MySQL connections.
I'm thinking of developing a basic (but general) module that caches full query results while managing the results of each table's queries in separate groups.
For example:
cache groups: users, roles, roles_users, etc.
Each group contains all the query results from the corresponding table. So, if I want to get values from 'users', the cache system would automatically add the result to the cache, but if I update the 'users' table, all the keys in the 'users' group would be deleted. I know it's not that smart, but it's fast and safe (the system also generates user lists, and the results must be correct).
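Just to pin the idea down, a rough sketch of that bookkeeping (written in C# only for illustration; a real Kohana module would do this in PHP against Memcache, and Hash/Execute here are hypothetical helpers):

// group (table name) -> query hash -> cached rows
var cache = new Dictionary<string, Dictionary<string, object>>();

object CachedQuery(string table, string sql)
{
    if (!cache.TryGetValue(table, out var group))
        cache[table] = group = new Dictionary<string, object>();
    string key = Hash(sql);                 // e.g. an MD5 of the full SQL text
    if (!group.TryGetValue(key, out var rows))
        group[key] = rows = Execute(sql);   // cache miss: run it against MySQL
    return rows;
}

// Any INSERT/UPDATE/DELETE on a table drops its whole group:
void Invalidate(string table) => cache.Remove(table);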
Then, my question is: where and how can I inject my code into the application tree?
First (to generate a hash key) I need the full query (for a certain table, used as the group), plus the result of that query to store. And when another query in that group hashes to the same stored key, the value must be fetched from Memcached.
So, I need the table name, the query, and the result... I think it's possible by extending the Database class and implementing the cache in the execute() method, but I can't find it!
Am I on the right track? Where is the execute() method?
I built a Kohana 3 module that accomplishes this, but it must be used with the query builder. It also uses Memcache to cache the queries. It invalidates on inserts/updates/deletes.
Here's a link:
Kohana Memcache Query Caching
I am starting to play with (Fluent) NHibernate and I am wondering if someone can help with the following. I'm sure it's a total noob question.
I want to do:
delete from TABX where name = 'abc'
where table TABX is defined as:
ID int
name varchar(32)
...
I built the code based on internet samples:
using (ITransaction transaction = session.BeginTransaction())
{
    // Fetch the matching entity first, then delete it:
    IQuery query = session.CreateQuery("FROM TABX WHERE name = :uid")
        .SetString("uid", "abc");
    session.Delete(query.List<TABX>()[0]);
    transaction.Commit();
}
but alas, it generates two queries (one SELECT and one DELETE). I want to do this in a single statement, as in my original SQL. What is the correct way of doing this?
Also, I noticed that in most samples on the internet, people tend to always wrap all queries in transactions. Why is that? If I'm only running a single statement, that seems like overkill. Do people tend to just mindlessly cut and paste, or is there a reason beyond that? For example, in my query above, if I do manage to get it from two queries down to one, I should be able to remove the begin/commit transaction, no?
If it matters, I'm using PostgreSQL for experimenting.
You can do a delete in one step with the following code:
session.CreateQuery("DELETE TABX WHERE name = :uid")
.SetString("uid", "abc")
.ExecuteUpdate();
However, by doing it that way you bypass event-listener calls (it's just translated to a simple SQL statement), cache updates, etc.
Your first query comes from query.List<TABX>().
Your actual delete statement comes from session.Delete(...).
Usually, when you are dealing with only one object, you will use Load() or Get().
Session.Load(type, id) will return a proxy without looking the object up in the database. However, as soon as you access one of the object's properties, NHibernate will hydrate it.
Session.Get(type, id) will actually query the database for the data immediately.
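A quick sketch of the difference (reusing the TABX entity from the question; the Name property is assumed to be mapped):

// Load: returns an uninitialized proxy; no SELECT is issued yet.
var proxy = session.Load<TABX>(1);   // no database hit here
var name = proxy.Name;               // first property access triggers the SELECT

// Get: queries the database immediately (returns null if the row is missing).
var entity = session.Get<TABX>(1);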
As far as transactions go, this is a good article explaining why it is good to wrap all of your NHibernate queries in transactions:
http://nhprof.com/Learn/Alerts/DoNotUseImplicitTransactions
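For example, the one-statement delete from above wrapped in an explicit transaction:

using (ITransaction tx = session.BeginTransaction())
{
    session.CreateQuery("DELETE TABX WHERE name = :uid")
        .SetString("uid", "abc")
        .ExecuteUpdate();
    tx.Commit();
}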
In NHibernate, I've noticed it is most common to do a delete with two queries, as you see. I believe this is expected behavior. The only way around it, off the top of my head, is to use caching; then the first query could be served from the cache if it happened to have run earlier.
As far as wrapping everything in a transaction: in most databases, transactions are implicit for every query anyway. The explicit transaction is just a guarantee that the data won't be changed out from under you mid-operation.