Best Practice For Updating Entity in Web Api - sql

I'm researching the best practice for updating an entity from an action called by the client. There are several ways to do it, but none of them seems like best practice.
1- Getting the data to be updated from the request model via reflection and updating the entity with those properties. But reflection isn't recommended in a Web API.
2- Sending all of the entity's data to the client and getting its updated version back in the request. This seems to create unnecessary traffic.
3- Getting the data to be updated and checking it with if/else conditions to find out which properties changed. It's very basic and not generic, and seems unprofessional.
The request model I'm talking about is a clone of the entity model.

First off, don't use Reflection. It's slow as hell and makes your code extra fragile.
When it comes to EF, usually there are 3 possible solutions:
1; The client sends the whole updated entity, and only the updated entity. In this case, you simply attach the entity to the corresponding entity set and mark its state as Modified.
2; The client sends both the original entity and the updated entity. You attach the original and set its CurrentValues to the updated entity.
3; The client sends only the modified properties, not the whole entity. In this case you have to query the original entity from the db and set the properties either one by one or, again, by overriding the CurrentValues.
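Approach 3 can be illustrated outside EF. The following is a minimal Python/sqlite3 sketch, not EF code. The `products` table, its columns, and the `UPDATABLE_COLUMNS` whitelist are hypothetical. It reads the original row, keeps only the whitelisted properties whose values actually differ, and writes just those columns. In EF the change tracker does this comparison for you when you set CurrentValues; here it is done by hand.

```python
import sqlite3

# Hypothetical whitelist: the only columns a client is allowed to change ("id" is excluded).
UPDATABLE_COLUMNS = ("name", "price", "stock")


def changed_columns(original: dict, changes: dict) -> dict:
    """Keep only whitelisted columns whose new value differs from the stored one."""
    return {
        col: value
        for col, value in changes.items()
        if col in UPDATABLE_COLUMNS and original.get(col) != value
    }


def partial_update(conn: sqlite3.Connection, product_id: int, changes: dict) -> dict:
    """Approach 3: query the original row, apply the modified properties, save."""
    row = conn.execute(
        f"SELECT {', '.join(UPDATABLE_COLUMNS)} FROM products WHERE id = ?",
        (product_id,),
    ).fetchone()  # the extra query this approach needs
    if row is None:
        raise KeyError(product_id)
    diff = changed_columns(dict(zip(UPDATABLE_COLUMNS, row)), changes)
    if diff:
        # Column names come from the whitelist, never from unchecked client input.
        assignments = ", ".join(f"{col} = ?" for col in diff)
        with conn:  # commit on success, roll back on error
            conn.execute(
                f"UPDATE products SET {assignments} WHERE id = ?",
                (*diff.values(), product_id),
            )
    return diff


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, stock INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 'Widget', 9.5, 10)")
conn.commit()
result = partial_update(conn, 1, {"price": 9.5, "stock": 7})  # only "stock" is written
```

Because the price sent by the client equals the stored one, only `stock` ends up in the UPDATE statement.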
The 3 approaches differ in their bandwidth requirements and the number of queries they make.
1; If we take this as the baseline, it requires sending one entity from the client to the server, and then sending that one entity from the server to the db. This makes 1 db query altogether (attaching does not require querying, so only saving the changes initiates a query).
2; This requires sending two entities from the client to the server. Here you send less data from the server to the db, because the changed properties are calculated when you set the CurrentValues. Again, just 1 query (attaching and setting CurrentValues don't initiate queries, so only saving the changes creates a query).
3; This has the lowest bandwidth requirement both from the client to the server and from the server to the db (both times only the changed properties are sent). However, it needs one more query besides saving, because you have to query the original values from the db before setting the changes.
I usually find the first approach a good trade-off between the other two. It sends more data than the third, but still less than the second, and it initiates only the one query for saving data. Also, I like to minimize the traffic between the client and the server even if it means more traffic between the server and the db. The clients (for me at least) are usually mobile, so there's no guaranteed bandwidth and no guaranteed battery life. The server and the db are much "closer" and don't have these restrictions. But of course this can be different for your application.

Related

ASP.NET Core - caching users from Identity

I'm working with a standard ASP.NET Core 2.1 program, and I've been considering the problem that a great many of my controller methods need to retrieve the currently logged-on user.
I noticed that the ASP.NET Core Identity code uses a DbSet to hold the entities, so subsequent calls should read from the local entities in memory rather than hit the DB. But my code appears to require a DB read every time (I know because I'm running SQL Profiler and can see SELECT queries against AspNetUsers being run with Id as the key).
I know there are many ways to set up Identity, and it has changed across versions, so maybe I'm not doing something right. Or is there a fundamental problem here that could be addressed?
I set up the default EF and Identity stores in startup.cs's ConfigureServices:
services.AddDbContext<MyDBContext>(options => options.UseSqlServer(Configuration.GetConnectionString("MyDBContext")));
services.AddIdentity<CustomIdentity, Models.Role>().AddDefaultTokenProviders().AddEntityFrameworkStores<MyDBContext>();
and read the user in each controller method:
var user = await _userManager.GetUserAsync(HttpContext.User);
In the Identity code, it seems that this method calls the UserStore's FindByIdAsync method, which calls FindAsync on the DbSet of users.
The EF performance paper says:
It’s important to note that two different ObjectContext instances will have two different ObjectStateManager instances, meaning that they have separate object caches.
So what could be going wrong here? Any suggestions why ASP.NET Core's EF calls within UserStore are not using the local DbSet of entities? Or am I thinking about this wrongly, and a new EF context is created each time a call is made to a controller?
any suggestions why ASP.NET Core's EF calls within Userstore are not using the local DBSet of entities?
Actually, FindAsync does do that. Quoting MSDN:
Asynchronously finds an entity with the given primary key values. If an entity with the given primary key values exists in the context, then it is returned immediately without making a request to the store. Otherwise, a request is made to the store for an entity with the given primary key values and this entity, if found, is attached to the context and returned. If no entity is found in the context or the store, then null is returned.
So you can't avoid the initial read of the object per request, but subsequent reads in the same request won't query the store. That's the best you can do short of crazy levels of micro-optimization.
Yes. Controllers are instantiated and destroyed with each request, regardless of whether the same user or a different one is making the request. Further, the context is request-scoped, so it too is instantiated and destroyed with each request. If you query the same user multiple times during the same request, it will attempt to use the entity cache for subsequent queries, but you're likely not doing that.
That said, this is a textbook example of premature optimization. Querying a user from the database is an extremely quick query: a simple select statement on a primary key, which is as quick and simple as database queries get. You might save a few milliseconds with memory caching, but that comes with a whole set of considerations, in particular being careful to segregate the cache by user, so that you don't accidentally bring in the wrong data for the wrong user. More to the point, memory cache is problematic for a host of reasons (for example, it isn't shared between servers in a web farm and is lost whenever the app restarts), so it's more typical to use distributed caching in production. Once you go there, caching doesn't really buy you anything for a simple query like this, because you're merely fetching the data from the distributed cache store (which could even be a database like SQL Server) instead of your database. It only makes sense to cache complex and/or slow queries, because only then is retrieving the result from cache actually quicker than hitting the database again.
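The per-request identity map described above can be illustrated with a small Python/sqlite3 sketch. `RequestContext` is a hypothetical stand-in for a request-scoped DbContext, not EF code. Its `find` method returns an already-tracked entity without a round trip, as FindAsync does. Each new context starts with an empty map, though, so every request pays for one query.

```python
import sqlite3


class RequestContext:
    """Hypothetical stand-in for a request-scoped DbContext with an identity map."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.local = {}   # entities already loaded by this context, keyed by id
        self.queries = 0  # round trips this context has made to the store

    def find(self, user_id):
        # Like FindAsync: a tracked entity is returned without querying the store.
        if user_id in self.local:
            return self.local[user_id]
        self.queries += 1
        row = self.conn.execute(
            "SELECT id, user_name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is not None:
            self.local[user_id] = {"id": row[0], "user_name": row[1]}
        return self.local.get(user_id)  # None if the user does not exist


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# Request 1: two lookups of the same user cost one query.
request1 = RequestContext(conn)
request1.find(1)
request1.find(1)
# Request 2: a fresh context has an empty identity map, so it queries again.
request2 = RequestContext(conn)
request2.find(1)
```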
Long and short, don't be afraid to query the database. That's what it's there for. It's your source for data, so if you need the data, make the query. Once you have your site going, you can profile or otherwise monitor the performance, and if you notice slow or excessive queries, then you can start looking at ways to optimize. Don't worry about it until it's actually a problem.

Handling paging with changing sort orders

I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns it to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is that this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump between pages in between a client's requests.
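To make the problem concrete, here is a small Python/sqlite3 sketch with a hypothetical `posts` table, paged with LIMIT/OFFSET on a `votes` column. A single vote between two requests is enough to make a client see one row twice and miss another entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, votes INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)", [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10)]
)


def page(number, size=2):
    """LIMIT/OFFSET page ordered by the frequently changing votes column."""
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY votes DESC, id LIMIT ? OFFSET ?",
        (size, number * size),
    ).fetchall()
    return [r[0] for r in rows]


first = page(0)  # [1, 2]
# Between the two requests, post 4 is upvoted past posts 2 and 3.
conn.execute("UPDATE posts SET votes = 45 WHERE id = 4")
second = page(1)  # [2, 3]: post 2 is shown twice and post 4 is never shown
```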
I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.
Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).
Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?
Edit: I should mention that the smartphone app lets users view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.
This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:
Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.
Minimise amount of per-client state that the server must keep track of, for scalability with large numbers of clients.
Maintain different state for each client.
This is a "pick any two" kind of situation. You have to compromise; accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.
There are variations within those that mix the various compromises, but that's what it all boils down to.
For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.
Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
Etc.
See also:
How to provide an API client with 1,000,000 database results?
Using "Cursors" for paging in PostgreSQL
Iterate over large external postgres db, manipulate rows, write output to rails postgres db
offset/limit performance optimization
If PostgreSQL count(*) is always slow how to paginate complex queries?
How to return sample row from database one by one
I'd probably implement a hybrid solution of some form, like:
Using a cursor, read and immediately send the first part of the data to the client.
Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it in a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever, under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.
If the client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading direct from the DB rather than the cache or generate a new bigger cached dataset.
That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server. However, you need a big boofy cache to get away with this. Its practicality depends on whether your clients can cope with pagination breaking: if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at first request, etc. It also depends on the data set size and how much data each client usually requires.
I am not aware of a perfect solution for this problem. But if you want the user to have a stale view of the data, then a cursor is the way to go. The only tuning you can do is to store only the data for the first 2 pages in the cursor; beyond that, you fetch it again.
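The hybrid approach above can be sketched in Python/sqlite3 under simplifying assumptions. A single LIMIT query stands in for the cursor, and an in-process `OrderedDict` stands in for memcached/Redis. The `posts` table, the `SnapshotPager` class and its `prefetch`/`max_clients` parameters are hypothetical. Pages inside a client's snapshot stay stable, while pages beyond it are read live and may shift.

```python
import sqlite3
from collections import OrderedDict


class SnapshotPager:
    """Hybrid paging sketch: cache the first `prefetch` rows per client, evict LRU."""

    def __init__(self, conn, prefetch=100, max_clients=1000):
        self.conn = conn
        self.prefetch = prefetch        # rows kept per client ("enough for most clients")
        self.max_clients = max_clients  # LRU capacity of the cache
        self.snapshots = OrderedDict()  # client_id -> ordered list of row ids

    def _query(self, limit, offset):
        rows = self.conn.execute(
            "SELECT id FROM posts ORDER BY votes DESC, id LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()
        return [r[0] for r in rows]

    def page(self, client_id, number, size):
        start, end = number * size, (number + 1) * size
        snapshot = self.snapshots.get(client_id)
        if snapshot is None:
            # One read fetches the first page plus the extra rows; nothing stays open.
            snapshot = self._query(self.prefetch, 0)
            self.snapshots[client_id] = snapshot
        self.snapshots.move_to_end(client_id)  # mark as most recently used
        while len(self.snapshots) > self.max_clients:
            self.snapshots.popitem(last=False)  # evict the least recently used client
        if end <= len(snapshot) or len(snapshot) < self.prefetch:
            return snapshot[start:end]  # stable pagination from the snapshot
        # Past the snapshot: read the live data; pagination may shift here.
        return self._query(size, start)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, votes INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(i, 100 - i) for i in range(1, 11)])
pager = SnapshotPager(conn, prefetch=4, max_clients=2)
first = pager.page("a", 0, 2)  # snapshot for client "a" holds [1, 2, 3, 4]
conn.execute("UPDATE posts SET votes = 1000 WHERE id = 4")  # post 4 jumps to the top
second = pager.page("a", 1, 2)  # still served from the snapshot
```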

Good ways to decouple GUIs from SOAP/WS-API update/write calls?

Let's assume we have some configuration GUI that in its current form uses direct DB transactions to submit new configurations for more than one configurable component in a consistent manner.
Now let's move the data (DB) stuff behind some SOAP/WS API. The GUI has no direct DB access anymore. The transactional behaviour must remain, but the API should NOT be designed to explicitly accommodate the GUI form submissions. In fact, I don't even know how the new GUI will work or how the user input will be structured. Therefore I need to provide something like WS-AtomicTransaction on the API server side. However, there are (at least) two caveats:
The GUI is written in PHP: I don't think there is any WS-Transaction support in PHP available.
I don't want to keep DB transactions open on the server side while waiting for additional client requests.
Solutions I can think of:
Using Camel's aggregation. However, that would make things more complicated in at least two ways:
You cannot use DB row ids of newly inserted rows in subsequent calls inside the same transaction. You need some sort of symbolic back-referencing, because there would be no communication between client and server while processing the aggregated messages.
Call replies would not be immediate (or the immediate, separate reply to each single call would only be some sort of stub, i.e. not containing any useful information beyond "your message has been attached to TX xyz", if that's possible at all in the Camel aggregation case).
The two disadvantages of the previous solution make me think of request batches, where the WS standards might provide means for referencing call results in subsequent calls inside the batch transaction. Is there any such thing already available? Maybe even as a PHP client?
Trying to eliminate lock contention in the database by carefully using row-level locks etc. However, when inserting new elements, my guess is that the DB usually needs to lock pages and index pages.
Maybe some server-side persistence layer using optimistic locking? But again, that would not return any DB IDs to the client before the final commit if DB writes were postponed until the commit (I don't know if that's possible at all).
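The symbolic back-referencing idea can be sketched in Python/sqlite3. The batch format is hypothetical: `table`, `ref`, `values`, and `"$name"` placeholders are not a WS-* standard or an existing PHP client. The whole batch runs in one short server-side transaction. Later inserts refer to ids generated by earlier ones, so the batch needs no round trip to the client.

```python
import sqlite3

# Hypothetical whitelist of tables/columns a batch may write to.
ALLOWED = {
    "component": {"name"},
    "setting": {"component_id", "key", "value"},
}


def run_batch(conn, batch):
    """Run inserts atomically; a value "$ref" resolves to the id of an earlier insert.

    Limitation of this sketch: a literal string starting with "$" cannot be stored.
    """
    ids = {}
    with conn:  # one transaction: commit if every statement succeeds, else roll back
        for op in batch:
            table, values = op["table"], dict(op["values"])
            if table not in ALLOWED or not set(values) <= ALLOWED[table]:
                raise ValueError(f"not allowed: {table} {sorted(values)}")
            for col, v in values.items():
                if isinstance(v, str) and v.startswith("$"):
                    values[col] = ids[v[1:]]  # KeyError for an unknown reference
            cols = ", ".join(values)
            marks = ", ".join("?" for _ in values)
            cur = conn.execute(
                f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(values.values())
            )
            if "ref" in op:
                ids[op["ref"]] = cur.lastrowid
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE setting (id INTEGER PRIMARY KEY, component_id INTEGER, key TEXT, value TEXT)")
ids = run_batch(conn, [
    {"table": "component", "ref": "router", "values": {"name": "router"}},
    {"table": "setting", "values": {"component_id": "$router", "key": "mtu", "value": "1500"}},
])
```

If any statement fails (for example, an unknown `$` reference), the earlier inserts of that batch are rolled back along with it.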
What do YOU think?
Transactions are a powerful tool and we easily get into a thinking pattern in which we see every problem as a nail we hit with this big hammer. I can relate to your confusion because I've experienced it myself. Unfortunately I have no better advice for you than to try not to think in terms of transactions but of atomic API calls.
When I think in terms of transactions, my thought pattern usually goes like this:
start transaction
read (repeat as required)
update (repeat as required)
commit/roll back
It takes some time to realize that we overuse this pattern. Actual conflicts are rare and there are many other ways of dealing with them. Here is one commonly used in APIs:
read and send data to client (atomic API call)
update data (on the client)
send original + updates back to the server (atomic API call)
start transaction (on server)
read
compare with original from client
if not same, return error (client should retry)
if same, update
commit
The last six points are part of the implementation of the API call.
Ferenc Mihaly
http://theamiableapi.com
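The server-side steps of that API call (start transaction, read, compare, update or refuse, commit) can be sketched in Python/sqlite3. The `config` table and `update_with_check` function are hypothetical. The client sends the value it originally read plus its update. A stale original makes the call return False so the client can re-read and retry. `BEGIN IMMEDIATE` takes SQLite's write lock up front, so no other writer can slip in between the read and the update.

```python
import sqlite3


def update_with_check(conn, config_id, original, updated):
    """One atomic API call: read, compare with the client's original, update or refuse."""
    conn.execute("BEGIN IMMEDIATE")  # start transaction (takes SQLite's write lock)
    try:
        row = conn.execute(
            "SELECT value FROM config WHERE id = ?", (config_id,)
        ).fetchone()  # read
        same = row is not None and row[0] == original  # compare with original from client
        if same:
            conn.execute(
                "UPDATE config SET value = ? WHERE id = ?", (updated, config_id)
            )  # if same, update
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT" if same else "ROLLBACK")  # commit, or discard the read-only transaction
    return same  # False: not same, the client should re-read and retry


# isolation_level=None: the sqlite3 module does not open transactions implicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE config (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO config VALUES (1, 'a')")
ok1 = update_with_check(conn, 1, original="a", updated="b")  # True
ok2 = update_with_check(conn, 1, original="a", updated="c")  # False: "a" is stale
```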

WCF/RIA with one common set of CRUD methods

I am very new to WCF/RIA services. I am looking to build an application using PRISM/MEF where I can offer new plug-ins for the application from time to time. Now, my database structure is pretty much static. It will not see many changes during its life (but there still might be a few). The new plug-ins will use the entity classes exposed by the database.
My question: when I create new plug-in controls, these controls might need some special server-side methods to be run, which would mean updating my WCF/RIA service to account for the new methods. I really want to avoid that and was wondering if it is possible to create a WCF service that has just 4 CRUD methods. I can pass any entity to these methods and, depending upon the type, the entity gets saved, updated or deleted. It also lets me pass any kind of LINQ query to the get method and returns the appropriate results. The goal is to avoid making changes to the WCF service unless the underlying DB structure changes.
Whatever special methods I add to my plug-in, they could simply mean passing complex LINQ queries to the generic Get method and getting the results on the client side. Most of the entity management happens on the client. WCF becomes a simple (yet powerful) layer over my database that lets me access any entity and process any complex query based on client-side LINQ queries.
Thanks,
M
Have these 4 CRUD operations in a separate Domain Service.
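The generic-CRUD idea from the question can be sketched in Python/sqlite3 by dispatching on the entity's type. This is not WCF or RIA Services code. The entity classes and the `TABLES` registry are hypothetical, and simple equality filters stand in for arbitrary LINQ queries, which this sketch does not attempt. Restricting `get` filters to declared fields keeps client input out of the SQL text.

```python
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class Customer:  # hypothetical entity
    id: int
    name: str


@dataclass
class Order:  # hypothetical entity
    id: int
    customer_id: int
    total: float


# Hypothetical registry: entity type -> table. A new entity type only needs a new entry.
TABLES = {Customer: "customer", Order: "orders"}


class GenericCrud:
    """Four operations that work for any registered entity type (each has an `id`)."""

    def __init__(self, conn):
        self.conn = conn

    def create(self, entity):
        data = asdict(entity)
        cols, marks = ", ".join(data), ", ".join("?" for _ in data)
        with self.conn:
            self.conn.execute(
                f"INSERT INTO {TABLES[type(entity)]} ({cols}) VALUES ({marks})",
                tuple(data.values()),
            )

    def get(self, entity_type, **filters):
        cols = [f.name for f in fields(entity_type)]
        if not set(filters) <= set(cols):
            raise ValueError(f"unknown columns: {set(filters) - set(cols)}")
        where = " AND ".join(f"{c} = ?" for c in filters) or "1 = 1"
        rows = self.conn.execute(
            f"SELECT {', '.join(cols)} FROM {TABLES[entity_type]} WHERE {where} ORDER BY id",
            tuple(filters.values()),
        ).fetchall()
        return [entity_type(*row) for row in rows]

    def update(self, entity):
        data = asdict(entity)
        key = data.pop("id")
        sets = ", ".join(f"{c} = ?" for c in data)
        with self.conn:
            self.conn.execute(
                f"UPDATE {TABLES[type(entity)]} SET {sets} WHERE id = ?",
                (*data.values(), key),
            )

    def delete(self, entity):
        with self.conn:
            self.conn.execute(f"DELETE FROM {TABLES[type(entity)]} WHERE id = ?", (entity.id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
crud = GenericCrud(conn)
crud.create(Customer(1, "Ada"))
crud.create(Order(10, 1, 25.0))
crud.update(Customer(1, "Ada L."))
```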

NHibernate Caching Dilemma

My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.
Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.
Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.
In my ideal fantasy world, my services would not have to change at all:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}
There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.
However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:
Configure it to always have the last 7 days' worth of data in the cache, e.g. "For this table, cache all records where this field is between 7 days ago and now."
Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all, e.g. if the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.
Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
Assuming #3 is asking too much, I would require some logic in my services like this:
public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}
This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
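The cache-aware branch of DoIt() can be sketched in Python, independent of NHibernate. `RecentDataCache`, `Reading`, and the `query_db` fallback are hypothetical, and the filter parameter is left out. The sketch assumes the cache was fully primed with the last 7 days at startup. Only under that assumption does "starting is inside the window" imply that every row in the range is cached. The `total` method is the second copy of the aggregation logic that the question warns about.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:  # hypothetical entity
    timestamp: datetime
    value: float


class RecentDataCache:
    """Holds every row from the last `days` days, assuming it was fully primed at startup."""

    def __init__(self, now, days=7):
        self.window = timedelta(days=days)
        self.now = now
        self.rows = []

    @property
    def start(self):
        return self.now - self.window

    def add(self, row):
        """Push new data straight into the cache instead of waiting for invalidation."""
        if row.timestamp >= self.start:
            self.rows.append(row)

    def advance(self, now):
        """Move the window forward and drop rows that fell out of it."""
        self.now = now
        self.rows = [r for r in self.rows if r.timestamp >= self.start]

    def can_service(self, starting, ending):
        return starting >= self.start and ending <= self.now

    def total(self, starting, ending):
        # Second copy of the aggregation logic, written against objects instead of HQL.
        return sum(r.value for r in self.rows if starting <= r.timestamp < ending)


def do_it(cache, starting, ending, query_db):
    if cache.can_service(starting, ending):
        return cache.total(starting, ending)
    return query_db(starting, ending)  # hypothetical HQL/criteria fallback


cache = RecentDataCache(now=datetime(2024, 1, 8))
for day, value in [(2, 1.0), (5, 2.0), (7, 4.0)]:
    cache.add(Reading(datetime(2024, 1, day), value))
recent = do_it(cache, datetime(2024, 1, 5), datetime(2024, 1, 8), lambda s, e: "db")
older = do_it(cache, datetime(2023, 12, 25), datetime(2024, 1, 8), lambda s, e: "db")
```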
That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.
I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...
Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?
Thanks
PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.
Stop using your transactional (OLTP) datasource for analytical (OLAP) queries and the problem goes away.
When a domain-significant event occurs (e.g. a new entity enters the system or is updated), fire an event (à la domain events). Wire up a handler for the event that takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed for reporting the aggregates you desire (most likely, push the data into a star schema). Now your reporting is simply the querying of aggregates (which may even be precalculated) along predefined axes, requiring nothing more than a simple select and a few joins. Querying can be carried out using something like L2SQL or even simple parameterised queries and data readers.
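The event-to-reporting-store flow can be sketched in Python/sqlite3, simplified to a single precalculated per-day table rather than a full star schema. The handler is called directly instead of through an event bus. `SaleRecorded`, `daily_sales` and the function names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass
class SaleRecorded:  # hypothetical domain event
    day: date
    amount: float


def create_reporting_store(conn):
    # Denormalised, precalculated aggregate: one row per day.
    conn.execute(
        "CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL NOT NULL, n_sales INTEGER NOT NULL)"
    )


def on_sale_recorded(conn, event):
    """Event handler: fold the new fact into the per-day aggregate."""
    key = event.day.isoformat()
    with conn:
        conn.execute("INSERT OR IGNORE INTO daily_sales VALUES (?, 0, 0)", (key,))
        conn.execute(
            "UPDATE daily_sales SET total = total + ?, n_sales = n_sales + 1 WHERE day = ?",
            (event.amount, key),
        )


def total_between(conn, start, end):
    """Reporting query: a plain select over rows that are already aggregated."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM daily_sales WHERE day BETWEEN ? AND ?",
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    return total


reporting = sqlite3.connect(":memory:")
create_reporting_store(reporting)
for event in [SaleRecorded(date(2024, 1, 1), 10.0),
              SaleRecorded(date(2024, 1, 1), 5.0),
              SaleRecorded(date(2024, 1, 3), 2.5)]:
    on_sale_recorded(reporting, event)
```

ISO-formatted date strings sort in calendar order, so the BETWEEN filter on the text key behaves as a date range.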
Performance gains should be significant as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and reduced index load on write.
Additional performance and scalability is also gained as once you have migrated to this approach, you can then physically separate your read and write stores such that you can run n read stores for every write store thereby allowing your solution to scale out to meet increased read demands while write demands increase at a lower rate.
Define 2 cache regions "aggregation" and "aggregation.today" with a large expiry time. Use these for your aggregation queries for previous days and today respectively.
In DoIt(), make 1 NH query per day in the requested range using cacheable queries. Combine the query results in C#.
Prime the cache with a background process which calls DoIt() periodically with the date range that you need to be cached. This process must run more often than the aggregation cache regions expire, i.e. its interval must be shorter than their expiry time.
When today's data changes, clear cache region "aggregation.today". If you want to reload this cache region quickly, either do so immediately or have another more frequent background process which calls DoIt() for today.
When you have query caching enabled, NHibernate will pull the results from cache if possible. This is based on the query and parameters values.
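The per-day strategy above can be sketched in Python with a dict-based stand-in for the two regions. This is not NHibernate's cache API, and expiry and the background priming process are left out. `RegionCache` and `query_day` are hypothetical. Each day is cached under its own key, so clearing "aggregation.today" forces only today's query to run again.

```python
from datetime import date, timedelta


class RegionCache:
    """Dict-based stand-in for two cache regions (no expiry in this sketch)."""

    def __init__(self):
        self.regions = {"aggregation": {}, "aggregation.today": {}}

    def get_or_load(self, region, key, loader):
        store = self.regions[region]
        if key not in store:  # cache miss: run the query once and remember the result
            store[key] = loader(key)
        return store[key]

    def clear(self, region):
        self.regions[region].clear()


def do_it(cache, starting, ending, today, query_day):
    """One cacheable query per day in [starting, ending]; results combined in code."""
    total = 0
    day = starting
    while day <= ending:
        region = "aggregation.today" if day == today else "aggregation"
        total += cache.get_or_load(region, day, query_day)
        day += timedelta(days=1)
    return total


calls = []


def query_day(day):  # hypothetical per-day aggregate query
    calls.append(day)
    return day.day  # pretend the day's aggregate equals its day-of-month


cache = RegionCache()
today = date(2024, 1, 3)
first = do_it(cache, date(2024, 1, 1), today, today, query_day)   # three queries
second = do_it(cache, date(2024, 1, 1), today, today, query_day)  # served from cache
cache.clear("aggregation.today")  # today's data changed
third = do_it(cache, date(2024, 1, 1), today, today, query_day)   # only today re-queried
```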
When analyzing the NHibernate cache details, I remember reading that you should not rely on the cache being there, which seems a good suggestion.
Instead of trying to make your O/R mapper cover your application's needs, I think rolling your own data/cache management strategy might be more reasonable.
Also, the 7-day caching rule you talk about sounds like something business-related, which is something the O/R mapper should not know about.
In conclusion, make your app work without any caching at all, then use a profiler (or several: .NET, SQL, NHibernate profilers) to see where the bottlenecks are, and start improving the "red" parts, possibly by adding caching or other optimizations.
PS: about caching in general, in my experience one caching point is fine, two caches are in the gray zone and you should have a strong reason for the separation, and more than two is asking for trouble.
Hope it helps.