ASP.NET Core - caching users from Identity

I'm working with a standard ASP.NET Core 2.1 application, and a great many of my controller methods need to retrieve the currently logged-on user.
I noticed that the ASP.NET Core Identity code uses a DbSet to hold the user entities, so subsequent calls should be served from the entities already tracked in memory rather than hitting the database. Yet it appears that every call in my code triggers a database read (I know because I'm running SQL Profiler and can see the SELECT queries against AspNetUsers being executed with Id as the key).
I know there are many ways to set Identity up, and it has changed across versions, so maybe I'm not doing something right - or is there a fundamental problem here that could be addressed?
I set up the default EF and Identity stores in startup.cs's ConfigureServices:
services.AddDbContext<MyDBContext>(options => options.UseSqlServer(Configuration.GetConnectionString("MyDBContext")));
services.AddIdentity<CustomIdentity, Models.Role>().AddDefaultTokenProviders().AddEntityFrameworkStores<MyDBContext>();
and read the user in each controller method:
var user = await _userManager.GetUserAsync(HttpContext.User);
In the Identity code, this method appears to call the UserStore's FindByIdAsync method, which in turn calls FindAsync on the DbSet of users.
The EF performance paper says:
It’s important to note that two different ObjectContext instances will have two different ObjectStateManager instances, meaning that they have separate object caches.
So what could be going wrong here - any suggestions why ASP.NET Core's EF calls within UserStore are not using the local DbSet of entities? Or am I thinking about this the wrong way, and a new EF context is created each time a call is made to a controller?

any suggestions why ASP.NET Core's EF calls within UserStore are not using the local DbSet of entities?
Actually, FindAsync does do that. Quoting MSDN (emphasis mine):
Asynchronously finds an entity with the given primary key values. If an entity with the given primary key values exists in the context, then it is returned immediately without making a request to the store. Otherwise, a request is made to the store for an entity with the given primary key values and this entity, if found, is attached to the context and returned. If no entity is found in the context or the store, then null is returned.
So you can't avoid the initial read per request for the object, but subsequent reads in the same request won't query the store. That's the best you can do outside crazy levels of micro-optimization.
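For illustration, a minimal sketch (the controller and action names are hypothetical) of the behaviour described above, assuming the request-scoped MyDBContext and the UserManager<CustomIdentity> from the question - the second lookup in the same request is resolved from the tracked entities rather than with another SELECT:
using Microsoft.AspNetCore.Identity;
using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

public class OrdersController : Controller
{
    private readonly UserManager<CustomIdentity> _userManager;

    public OrdersController(UserManager<CustomIdentity> userManager)
    {
        _userManager = userManager;
    }

    public async Task<IActionResult> Index()
    {
        // First call in this request: FindByIdAsync -> DbSet.FindAsync -> one SELECT against AspNetUsers.
        var user = await _userManager.GetUserAsync(HttpContext.User);

        // Second call in the same request: the entity is already tracked by the
        // request-scoped context, so FindAsync returns it without another query.
        var sameUser = await _userManager.GetUserAsync(HttpContext.User);

        return Ok(ReferenceEquals(user, sameUser)); // true: same tracked instance
    }
}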

Yes. Controllers are instantiated and destroyed with each request, regardless of whether it's the same or a different user making the request. Further, the context is request-scoped, so it too is created and destroyed with each request. If you query the same user multiple times during the same request, EF will use the tracked entity for the subsequent lookups, but you're likely not doing that.
That said, this is a textbook example of premature optimization. Querying a user from the database is an extremely quick query: it's just a simple select on a primary key, and it doesn't get any quicker or simpler as far as database queries go. You might be able to save a few milliseconds if you use memory caching, but that comes with a whole set of considerations, particularly being careful to segregate the cache by user so that you don't accidentally serve the wrong data to the wrong user. More to the point, in-memory caching is problematic for a host of reasons, so it's more typical to use distributed caching in production. Once you go there, caching doesn't really buy you anything for a simple query like this, because you're merely fetching it from the distributed cache store (which could even be a database like SQL Server) instead of your own database. It only makes sense to cache complex and/or slow queries, as it's only then that retrieving the result from cache actually ends up being quicker than just hitting the database again.
Long and short, don't be afraid to query the database. That's what it's there for. It's your source for data, so if you need the data, make the query. Once you have your site going, you can profile or otherwise monitor the performance, and if you notice slow or excessive queries, then you can start looking at ways to optimize. Don't worry about it until it's actually a problem.
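If you do decide to experiment with memory caching despite the caveats above, here is a minimal sketch of the per-user segregation idea (the cache-key prefix and the five-minute expiry are arbitrary illustrations, not recommendations; CustomIdentity is the user type from the question):
using Microsoft.AspNetCore.Identity;
using Microsoft.Extensions.Caching.Memory;
using System;
using System.Security.Claims;
using System.Threading.Tasks;

public class CachedUserLookup
{
    private readonly UserManager<CustomIdentity> _userManager;
    private readonly IMemoryCache _cache;

    public CachedUserLookup(UserManager<CustomIdentity> userManager, IMemoryCache cache)
    {
        _userManager = userManager;
        _cache = cache;
    }

    public Task<CustomIdentity> GetUserAsync(ClaimsPrincipal principal)
    {
        // Key the entry by user id so one user's cached entity can never be served to another user.
        var userId = _userManager.GetUserId(principal);
        return _cache.GetOrCreateAsync("user:" + userId, entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return _userManager.FindByIdAsync(userId);
        });
    }
}
Note that on later requests the cached entity is detached from the then-current DbContext, so it is only suitable for reading, not for tracked updates.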

Related

Best Practice For Updating Entity in Web Api

I'm researching best practice for updating an entity from an action called by the client. There are several ways to do it, but none of them seems like best practice.
1- Get the data to be updated from the request model via reflection and update the entity with those properties. But using reflection isn't recommended in a Web API.
2- Send all of the entity's data to the client and get its updated version back in the request. That seems to create unnecessary traffic.
3- Get the data to be updated and check it with if/else conditions to work out which properties changed. That's very basic and not generic; it seems unprofessional.
The request model I'm talking about is a clone of the entity model.
First off, don't use Reflection. It's slow as hell and makes your code extra fragile.
When it comes to EF, usually there are 3 possible solutions:
1. The client sends the whole updated entity, and only the updated entity. In this case, you simply attach the entity to the corresponding entity set and mark its state as Modified.
2. The client sends both the original entity and the updated entity. You attach the original and set its CurrentValues from the updated entity.
3. The client sends only the modified properties, not the whole entity. In this case you have to query the original entity from the database and set the properties either one by one or, again, by overriding the CurrentValues.
The three approaches differ in their bandwidth requirements and in the number of queries they make.
1. If we take this as the baseline, it requires sending one entity from the client to the server and then sending that one entity from the server to the database. That makes one database query altogether (attaching does not require querying, so only saving the changes initiates a query).
2. This requires sending two entities from the client to the server. You send less data from the server to the database, because the changed properties are calculated when you set the CurrentValues. Again, just one query (attaching and setting CurrentValues don't initiate queries, so only saving the changes creates one).
3. This has the lowest bandwidth requirement both from the client to the server and from the server to the database (in both cases only the changed properties are sent). However, it does need one more query besides the save, because you have to read the original values from the database before applying the changes.
I usually find the first approach a good trade-off between the other two. It sends more data than the third but still less than the second, and it only issues the one query to save the data. I also like to minimize the traffic between the client and the server even if that means more traffic between the server and the database. The clients (for me, at least) are usually mobile, so there is no guaranteed bandwidth and no guaranteed battery lifetime. The server and the database are much "closer" and don't have these restrictions. But of course this can be different for your application.
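As a rough sketch of the first and third approaches (EF Core flavoured; AppDbContext, Product and the method names are hypothetical, and EF6 spells the attach call slightly differently):
using Microsoft.EntityFrameworkCore;
using System.Threading.Tasks;

public class ProductUpdater
{
    private readonly AppDbContext _db;

    public ProductUpdater(AppDbContext db) { _db = db; }

    // Approach 1: the client sent the whole updated entity.
    public Task UpdateWholeEntityAsync(Product updated)
    {
        // No SELECT is issued; SaveChanges produces a single UPDATE covering all columns.
        _db.Attach(updated).State = EntityState.Modified;
        return _db.SaveChangesAsync();
    }

    // Approach 3: the client sent only the changes (here carried in an object of the same shape).
    public async Task UpdateChangedPropertiesAsync(int id, Product changes)
    {
        var original = await _db.Products.FindAsync(id);        // one extra SELECT
        _db.Entry(original).CurrentValues.SetValues(changes);   // only properties whose values differ are marked modified
        await _db.SaveChangesAsync();
    }
}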

NHibernate Second Level Cache with database change notification on desktop App

I am developing a WPF application using NHibernate to communicate with a PostgreSQL Database.
The only caching provider that works in a desktop app is Bamboo Prevalence (correct me if I'm wrong). Given that every computer running my application will have a different session factory, my application retrieves stale data from the cache.
My question is, how can I tell NHibernate/Prevalence to look at the timestamp of when the data was last updated, and if the cache is stale, refresh it?
Well, I found out that there is no way the second-level cache can know whether the database was changed outside NHibernate/the cache, so what I did was add a new 'Timestamp' column to all my tables.
In my queries, I first select the timestamp from the database using CacheMode.Ignore and compare it with the result from the cache. If the timestamps differ, I invalidate the cache for that query and run it again.
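Roughly, the check looks something like the sketch below (the entity, property and region names are illustrative, and I'm assuming the query-cache region can be cleared with ISessionFactory.EvictQueries):
using NHibernate;
using System;
using System.Collections.Generic;

public class TimestampCheckedQuery
{
    private readonly ISessionFactory _factory;
    private DateTime? _lastKnownTimestamp;

    public TimestampCheckedQuery(ISessionFactory factory) { _factory = factory; }

    public IList<Order> GetOrders(ISession session)
    {
        // Read the table's latest timestamp, bypassing every cache.
        var dbTimestamp = session.CreateQuery("select max(o.Timestamp) from Order o")
            .SetCacheMode(CacheMode.Ignore)
            .UniqueResult<DateTime?>();

        if (dbTimestamp != _lastKnownTimestamp)
        {
            // The table changed behind the cache's back: drop the cached results for this region.
            _factory.EvictQueries("orders");
            _lastKnownTimestamp = dbTimestamp;
        }

        return session.CreateQuery("from Order")
            .SetCacheable(true)
            .SetCacheRegion("orders")
            .List<Order>();
    }
}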
About SysCache: even knowing it 'can work' in a WPF desktop app, I wasn't keen on using System.Web.Cache, as my application would then need the complete .NET Framework instead of the Client Profile. I did a search and, to my delight, someone wrote an NHibernate cache provider based on System.Runtime.Caching, which is not an ASP.NET component. If anyone is interested, you can find the source at:
https://github.com/Leftyx/nhcontrib/tree/master/src/NHibernate.Caches/MemoryCache
Well, that is a property you could set at the cache level to expire items according to your application's needs. NCache is a possible L2 cache provider for NHibernate. NCache ensures that its cache is consistent across multiple servers and that all cache updates are synchronized correctly, so no data-integrity issues arise. To learn more, please visit:
http://www.alachisoft.com/ncache/nhibernate-l2cache-index.html

Repeating a query does not refresh the properties of the returned objects

When doing a criteria query with NHibernate, I want to get fresh results and not old ones from a cache.
The process is basically:
Query persistent objects into the NHibernate application.
Change database entries externally (another program, a manual edit in SSMS / MSSQL, etc.).
Query the persistent objects again (with the same query code); the previously loaded objects should be refreshed from the database.
Here's the code (slightly changed object names):
public IOrder GetOrderByOrderId(int orderId)
{
    ...
    IList result;
    var query =
        session.CreateCriteria(typeof(Order))
            .SetFetchMode("Products", FetchMode.Eager)
            .SetFetchMode("Customer", FetchMode.Eager)
            .SetFetchMode("OrderItems", FetchMode.Eager)
            .Add(Restrictions.Eq("OrderId", orderId));
    query.SetCacheMode(CacheMode.Ignore);
    query.SetCacheable(false);
    result = query.List();
    ...
}
The SetCacheMode and SetCacheable calls were added by me to disable the cache. Also, the NHibernate factory is set up with the config parameter UseQueryCache=false:
Cfg.SetProperty(NHibernate.Cfg.Environment.UseQueryCache, "false");
No matter what I do, including Put/Refresh cache modes for the query or the session, NHibernate keeps returning outdated objects the second time the query is called, without the externally committed changes. For context: the outdated value in this case is the value of a Version column (to test whether a stale object state can be detected before saving). But I need fresh query results for multiple reasons!
NHibernate even generates an SQL query, but it is never used for the values returned.
Keeping the sessions open is necessary to do dynamic updates on dirty columns only (and stateless sessions are not a solution either). I don't want to add Clear(), Evict() or the like everywhere in the code, especially since the query sits at a lower level and doesn't know about the objects previously loaded. Pessimistic locking would kill performance (multi-user environment!).
Is there any way to force NHibernate, by configuration, to send queries directly to the DB and get fresh results, not using unwanted caching functions?
First of all: this doesn't have anything to do with second-level caching (which is what SetCacheMode and SetCacheable control). Even if it did, those control caching of the query, not caching of the returned entities.
When an object has already been loaded into the current session (also called the "first-level cache" by some people, although it's not a cache but an identity map), querying it again from the DB using any method will never overwrite its values.
This is by design and there are good reasons for it behaving this way.
If you need to update potentially changed values in multiple records with a query, you will have to Evict them first.
Alternatively, you might want to read about Stateless Sessions.
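A minimal sketch of the Evict route for the query in the question (assuming the previously loaded Order instance is at hand; session.Clear() would detach every tracked entity instead):
using NHibernate;
using NHibernate.Criterion;

public static class OrderReloader
{
    public static Order Reload(ISession session, Order staleOrder, int orderId)
    {
        // Detach the stale instance so the next query re-hydrates it from the database.
        // Alternative for a single known instance: session.Refresh(staleOrder);
        session.Evict(staleOrder);

        return session.CreateCriteria(typeof(Order))
            .Add(Restrictions.Eq("OrderId", orderId))
            .UniqueResult<Order>();
    }
}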
Is this code running in a transaction? Or is that external process running in a transaction? If one of those two is still in a transaction, you will not see any updates.
If that is not the case, you might be able to find the problem in the log messages that NHibernate is creating. These are very informative and will always tell you exactly what it is doing.
Keeping the sessions open is neccessary to do dynamic updates on dirty columns only
This is either the problem now, or it will become a problem in the future. NHibernate is doing all it can to make your life better, but you are doing as much as possible to prevent NHibernate from doing its job properly.
If you want NHibernate to update only the dirty columns, you could look at the dynamic-update attribute in your class mapping file.
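For what it's worth, in an .hbm.xml mapping that is the dynamic-update="true" attribute on the <class> element; if you map with Fluent NHibernate, I believe the equivalent is a DynamicUpdate() call in the class map (Order and its properties below are just placeholders):
using FluentNHibernate.Mapping;

public class OrderMap : ClassMap<Order>
{
    public OrderMap()
    {
        // Generate UPDATE statements that include only the columns whose values actually changed.
        DynamicUpdate();

        Id(x => x.OrderId);
        Map(x => x.Status);
    }
}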

Optimize NHibernate: Load all data from specified tables

Scenario: I have built an ASP.NET MVC application that manages my cooking recipes. I am using FluentNHibernate to access data from the following tables:
Users
Categories
Recipes
RecipeCategories (many-to-many junction table)
UserBookmarkedRecipes (many-to-many junction table)
UserCookedRecipes (many-to-many junction table)
Question: Is there any way to tell NHibernate to load all data from all tables listed above and store it in memory / in NHibernate's cache so that there do not have to be any additional database requests?
Motivation behind question: The variety of many-to-many relationships poses a problem and would greatly benefit from that optimization.
Note regarding data: The overall amount of data is extremely small. We are talking about less than 100 recipes at the moment.
Instead of preloading everything, I would suggest loading once and then holding it as it's accessed.
NHibernate maintains two different caches and does a pretty good job of keeping them in sync with your underlying data store. By default, it uses what is called a "first-level" cache on a per-session basis, but I don't think that's what you want. You can read about the differences on the NHibernate FAQ page on caching.
I suspect a second level cache is what you need (this is available throughout your app). You'll need to get a cache provider from NHContrib (download the version that matches your version of NHibernate). The SysCache2 provider will probably be the easiest to set up for your scenario, as long as your app will be the ONLY thing writing to the database. If other processes will be writing, you will need to ensure that all are using the same cache as an intermediary if you want it to stay in sync.
The second-level cache is configured with a timeout that you can set to whatever you need. I don't think it can be infinite, but you can set it to long periods if you want (it's probably not a terrible idea to go back to the DB from time to time anyway). If you want to preload everything up front, you can simply access all your entities from your global.asax's Application_Start method, but this shouldn't be necessary.
You will need to configure your session factory to use the cache. Call the .Cache(...) method when fluently configuring your session factory; it should be relatively self-explanatory.
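For instance, a sketch of the factory configuration (the database choice, connection string and provider class are assumptions here; the SysCache2 provider ships in the NHibernate.Caches.SysCache2 package from NHContrib):
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;
using NHibernate;
using NHibernate.Caches.SysCache2;

public static class SessionFactoryBuilder
{
    public static ISessionFactory Build(string connectionString)
    {
        return Fluently.Configure()
            .Database(MsSqlConfiguration.MsSql2008.ConnectionString(connectionString))
            .Cache(c => c.ProviderClass<SysCacheProvider>()   // second-level cache provider
                         .UseSecondLevelCache()
                         .UseQueryCache())
            .Mappings(m => m.FluentMappings.AddFromAssemblyOf<RecipeMap>())
            .BuildSessionFactory();
    }
}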
You will also need to set Cache.ReadWrite() in both your entity mappings AND your relationship mappings. You can do this by convention or by calling Cache.ReadWrite() in your fluent mappings.
Something like:
public class RecipeMap : ClassMap<Recipe>
{
    public RecipeMap()
    {
        Cache.ReadWrite();
        Id(x => x.Id);
        HasManyToMany(x => x.Ingredients).Cache.ReadWrite();
    }
}
On the Cache calls in your mappings you can specify ReadOnly or whatever else you may need. NonStrictReadWrite is an interesting one: it can boost performance significantly, but at an increased risk of reading stale data from the cache.

NHibernate Caching Dilemma

My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.
Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.
Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.
In my ideal fantasy world, my services would not have to change at all:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}
There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.
However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:
Configure it to always have the last 7 days worth of data in the cache. eg. "For this table, cache all records where this field is between 7 days ago and now."
Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all. eg. If the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.
Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
Assuming #3 is asking too much, I would require some logic in my services like this:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}
This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.
I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...
Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?
Thanks
PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.
Stop using your transactional (OLTP) data source for analytical (OLAP) queries and the problem goes away.
When a domain-significant event occurs (e.g. a new entity enters the system or is updated), fire an event (a la domain events). Wire up a handler for the event that takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed for reporting the aggregates you need (most likely pushing the data into a star schema). Now your reporting is simply the querying of aggregates (which may even be precalculated) along predefined axes, requiring nothing more than a simple select and a few joins. Querying can be carried out using something like LINQ to SQL or even simple parameterised queries and data readers.
Performance gains should be significant, as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and a reduced index load on writes.
Additional performance and scalability is also gained because, once you have migrated to this approach, you can physically separate your read and write stores so that you can run n read stores for every write store, allowing your solution to scale out to meet increased read demand while write demand increases at a lower rate.
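A bare-bones sketch of that event/handler shape (every name here is hypothetical; IReportingStore stands in for whatever writes to the denormalised reporting database):
using System;

// Raised by the domain when a new entity enters the system or is updated.
public class EntityRecordedEvent
{
    public DateTime OccurredAt { get; set; }
    public decimal Value { get; set; }
}

public interface IReportingStore
{
    void AddToDailyAggregate(DateTime day, decimal value);
}

public class ReportingProjection
{
    private readonly IReportingStore _store;

    public ReportingProjection(IReportingStore store) { _store = store; }

    public void Handle(EntityRecordedEvent e)
    {
        // Write (or increment) the pre-aggregated row for the event's day,
        // so reads become a simple select over a small fact table.
        _store.AddToDailyAggregate(e.OccurredAt.Date, e.Value);
    }
}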
Define two cache regions, "aggregation" and "aggregation.today", with a large expiry time. Use these for your aggregation queries for previous days and for today, respectively.
In DoIt(), make one NH query per day in the requested range using cacheable queries. Combine the query results in C#.
Prime the cache with a background process that calls DoIt() periodically with the date range you need cached. The interval between runs must be shorter than the expiry time of the aggregation cache regions.
When today's data changes, clear the cache region "aggregation.today". If you want this cache region reloaded quickly, either do so immediately or have another, more frequent background process that calls DoIt() for today.
When you have query caching enabled, NHibernate will pull the results from the cache if possible. This is keyed on the query and its parameter values.
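As a sketch of the per-day cacheable queries and the region eviction (the Measurement entity, its properties and the HQL are illustrative only):
using System;
using NHibernate;

public class AggregationService
{
    private readonly ISessionFactory _factory;

    public AggregationService(ISessionFactory factory) { _factory = factory; }

    public decimal SumPerDay(ISession session, DateTime start, DateTime end)
    {
        decimal total = 0;
        for (var day = start.Date; day <= end.Date; day = day.AddDays(1))
        {
            // One cacheable query per day: older days live in the long-lived
            // "aggregation" region, today's data in "aggregation.today".
            var region = day == DateTime.Today ? "aggregation.today" : "aggregation";
            var daySum = session.CreateQuery(
                    "select sum(m.Value) from Measurement m where m.Timestamp >= :from and m.Timestamp < :to")
                .SetDateTime("from", day)
                .SetDateTime("to", day.AddDays(1))
                .SetCacheable(true)
                .SetCacheRegion(region)
                .UniqueResult<decimal?>() ?? 0m;
            total += daySum;
        }
        return total;
    }

    public void InvalidateToday()
    {
        // When today's data changes, clear only today's region.
        _factory.EvictQueries("aggregation.today");
    }
}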
When analyzing the NHibernate cache details, I remember reading that you should not rely on the cache being there, which seems like good advice.
Instead of trying to make your O/R mapper cover all of your application's needs, I think rolling your own data/cache management strategy might be more reasonable.
Also, the 7-day caching rule you talk about sounds like a business rule, which is something the O/R mapper should not know about.
In conclusion, make your app work without any caching at all, then use a profiler (or several - the .NET, SQL and NHibernate profilers) to see where the bottlenecks are, and start improving the "red" parts by adding caching or other optimizations.
PS: about caching in general - in my experience one caching point is fine, two caches are in the gray zone (you should have a strong reason for the separation), and more than two is asking for trouble.
Hope it helps.