Strange behaviour of code inside TransactionScope? (WCF)

We are facing a very complex issue in our production application.
We have a WCF method which creates a complex Entity in the database with all its relation.
public void InsertEntity(Entity entity)
{
    using (TransactionScope scope = new TransactionScope())
    {
        EntityDao.Create(entity);
    }
}
The EntityDao.Create(entity) method is very complex and contains a large amount of logic. During the creation process it creates several child entities and also runs several queries against the database.
For the duration of the WCF request that creates the entity, the connection is normally kept in a [ThreadStatic] variable and reused by the DAOs, although some of the queries in the DAO described above use a new connection and close it after use.
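For illustration, the connection handling looks roughly like the following sketch (the holder class and method names here are made up; this is not our actual code):

using System.Data.SqlClient;

// Simplified sketch of a [ThreadStatic] connection holder of the kind described above.
public static class ConnectionHolder
{
    [ThreadStatic]
    private static SqlConnection _connection;

    public static SqlConnection GetOpenConnection(string connectionString)
    {
        if (_connection == null || _connection.State != System.Data.ConnectionState.Open)
        {
            _connection = new SqlConnection(connectionString);
            _connection.Open(); // enlists in the ambient TransactionScope, if one is active
        }
        return _connection;
    }
}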
Overall we have seen that the behaviour of the above process is erratic. Some of the queries in the inner DAO do not even return the actual data from the database. The same query, when run directly against the data store, gives the correct result.
What could be the possible reason for this behaviour?

ThreadStatic is not recommended. Use CallContext instead. I have code at http://code.google.com/p/softwareishardwork/ which demonstrates the correct way to handle connections in the manner you describe (tested in severe high-performance scenarios). Try a test case using this code.
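The linked project isn't reproduced here, but the general shape of a CallContext-based approach looks something like this minimal sketch (the key name and helper are illustrative assumptions, not the project's actual API):

using System.Data;
using System.Data.SqlClient;
using System.Runtime.Remoting.Messaging;

// Sketch: flow the connection through the logical call context instead of [ThreadStatic],
// so it follows the logical flow of the request rather than a particular thread.
public static class AmbientConnection
{
    private const string Key = "Ambient.DbConnection";

    public static IDbConnection GetOrCreate(string connectionString)
    {
        var existing = CallContext.LogicalGetData(Key) as IDbConnection;
        if (existing != null && existing.State == ConnectionState.Open)
            return existing;

        var connection = new SqlConnection(connectionString);
        connection.Open();
        CallContext.LogicalSetData(Key, connection);
        return connection;
    }

    public static void Release()
    {
        var connection = CallContext.LogicalGetData(Key) as IDbConnection;
        CallContext.FreeNamedDataSlot(Key);
        if (connection != null)
            connection.Dispose();
    }
}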

Related

Using IgniteSet as part of dynamic remote filter for Ignite's Continuous Query

Has anyone ever tried using IgniteSet or similar data structures for setting remote filters on a continuous query? There is not much documentation on how IgniteSet works and hence this question. Basically my use case is as follows:
I have a distributed cache implemented using Ignite. A user is interested in real time updates from my cache based on some criteria. I will have more than one user subscribing to these updates. Rather than run n continuous queries, I intend to run one continuous query for n users with the remote filter backed by some distributed data structure.
I think IgniteSet could work for me. But I am not sure how this will affect the performance of my app server in production, since I am not entirely sure how IgniteSet works (due to the minimal documentation on this topic). Basically, if I update the IgniteSet data structure, will the change be visible on all remote nodes as well, and does that mean I will start receiving updates for entries that the filter now evaluates to true on those nodes?
qry.setRemoteFilterFactory(new Factory<CacheEntryEventFilter<PersonKey, Person>>() {
    @Override public CacheEntryEventFilter<PersonKey, Person> create() {
        return new CacheEntryEventFilter<PersonKey, Person>() {
            @Override public boolean evaluate(CacheEntryEvent<? extends PersonKey, ? extends Person> e) {
                // IgniteSet maintained outside of the filter
                return igniteSet.contains(e.getKey().getCity());
            }
        };
    }
});
Sorry if I am missing something obvious here.
Any help would be greatly appreciated!
IgniteSet is backed by a cache, and like all Ignite caches, is designed to allow all nodes to see updates as soon as they are available.
See https://ignite.apache.org/docs/latest/data-structures/queue-and-set for configuration settings.
The design you are proposing is subject to race conditions. A consumer of the continuous query could come in before the appropriate writer had a chance to update the given IgniteSet.
Use appropriate synchronization mechanisms to work out all edge conditions. Examples/Descriptions here: https://ignite.apache.org/features/datastructures.html

ASP.NET Core - caching users from Identity

I'm working with a standard ASP.NET Core 2.1 program, and I've been considering the problem that a great many of my controller methods require the currently logged-on user to be retrieved.
I noticed that the ASP.NET Core Identity code uses a DbSet to hold the entities, so subsequent calls to it should read from the local entities in memory rather than hitting the DB. However, it appears that my code triggers a DB read every time (I know because I'm running SQL Profiler and can see the SELECT queries against AspNetUsers being run with Id as the key).
I know there are many ways to set Identity up, and it has changed over the versions, so maybe I'm not doing something right; or is there a fundamental problem here that could be addressed?
I set up the default EF and Identity stores in Startup.cs's ConfigureServices:
services.AddDbContext<MyDBContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("MyDBContext")));
services.AddIdentity<CustomIdentity, Models.Role>()
    .AddDefaultTokenProviders()
    .AddEntityFrameworkStores<MyDBContext>();
and read the user in each controller method:
var user = await _userManager.GetUserAsync(HttpContext.User);
In the Identity code, it seems that this method calls the UserStore's FindByIdAsync method, which calls FindAsync on the DbSet of users.
The EF performance paper says:
It’s important to note that two different ObjectContext instances will have two different ObjectStateManager instances, meaning that they have separate object caches.
So what could be going wrong here? Any suggestions as to why ASP.NET Core's EF calls within the UserStore are not using the local DbSet of entities? Or am I thinking about this wrongly, and a new EF context is created each time a call is made to a controller?
any suggestions why ASP.NET Core's EF calls within Userstore are not using the local DBSet of entities?
Actually, FindAsync does do that. Quoting MSDN...
Asynchronously finds an entity with the given primary key values. If an entity with the given primary key values exists in the context, then it is returned immediately without making a request to the store. Otherwise, a request is made to the store for an entity with the given primary key values and this entity, if found, is attached to the context and returned. If no entity is found in the context or the store, then null is returned.
So you can't avoid the initial read per request for the object, but subsequent reads in the same request won't query the store. That's the best you can do, short of crazy levels of micro-optimization.
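For example, a small sketch using the context directly (the type names are taken from your setup code; the key type is assumed to be a string):

using System.Diagnostics;
using System.Threading.Tasks;

public static class FindAsyncCachingDemo
{
    // The first FindAsync issues a SELECT; the second is served from the context's
    // local cache (its identity map), so no second query reaches the database.
    public static async Task<CustomIdentity> LoadTwiceAsync(MyDBContext context, string userId)
    {
        var first = await context.Users.FindAsync(userId);   // hits the database
        var second = await context.Users.FindAsync(userId);  // returned from the context

        Debug.Assert(ReferenceEquals(first, second));
        return second;
    }
}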
Yes. Controllers are instantiated and destroyed with each request, regardless of whether it's the same or a different user making the request. Further, the context is request-scoped, so it too is instantiated and destroyed with each request. If you query the same user multiple times during the same request, it will attempt to use the entity cache for subsequent queries, but you're likely not doing that.
That said, this is a textbook example of premature optimization. Querying a user from the database is an extremely quick query: it's just a simple select statement on a primary key, and it doesn't get any quicker or simpler as far as database queries go. You might be able to save a few milliseconds if you utilize memory caching, but that comes with a whole set of considerations, particularly being careful to segregate the cache by user so that you don't accidentally bring in the wrong data for the wrong user.

More to the point, memory cache is problematic for a host of reasons, so it's more typical to use distributed caching in production. Once you go there, caching doesn't really buy you anything for a simple query like this, because you're merely fetching it from the distributed cache store (which could even be a database like SQL Server) instead of your database. It only makes sense to cache complex and/or slow queries, as it's only then that retrieving from cache actually ends up being quicker than just hitting the database again.
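If you did go down the in-memory caching route despite those caveats, keying by user id is the critical part. A minimal sketch with IMemoryCache (the key format and expiration are arbitrary choices for illustration):

using System;
using System.Security.Claims;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Identity;
using Microsoft.Extensions.Caching.Memory;

public static class CachedUserLookup
{
    // Cache the resolved user per user id so one user's data can never be returned
    // for another. Note the cached entity is detached from later requests' contexts.
    public static Task<CustomIdentity> GetCachedUserAsync(
        IMemoryCache cache, UserManager<CustomIdentity> userManager, ClaimsPrincipal principal)
    {
        var userId = userManager.GetUserId(principal);
        return cache.GetOrCreateAsync("user:" + userId, entry =>
        {
            entry.SlidingExpiration = TimeSpan.FromMinutes(5); // naive expiry; no invalidation
            return userManager.FindByIdAsync(userId);
        });
    }
}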
Long and short, don't be afraid to query the database. That's what it's there for. It's your source for data, so if you need the data, make the query. Once you have your site going, you can profile or otherwise monitor the performance, and if you notice slow or excessive queries, then you can start looking at ways to optimize. Don't worry about it until it's actually a problem.

How Can a Data Access Object (DAO) Allow Simultaneous Updates to a Subset of Columns?

Please forgive me if I misuse any OOP terminology as I'm still getting my feet wet on the subject.
I've been reading up on object oriented programming (OOP) - specifically for web applications. I have been going over the concept of a data access object (DAO). The DAO is responsible for CRUD (Create, Read, Update, and Delete) methods and connecting your application's service (business logic) layer to the database.
My question specifically pertains to the update() method within a DAO. In the examples I've read about, developers typically pass a bean object into the DAO's update() method as its main argument, e.g. updateCustomer(customerBean). The method then executes some SQL which updates all of the columns based on the data in the bean.
The problem I see with this logic is that the update() method updates ALL columns within the database based on the bean's data and could theoretically cause it to overwrite columns another user or system might need to update simultaneously.
A simplified example might be:
User 1 updates field A in the bean
User 2 updates field B in the bean
User 2 passes bean to DAO, DAO updates all fields.
User 1 passes bean to DAO, DAO updates all fields.
User 2's changes have been lost!
I've read about Optimistic Locking and Pessimistic Locking as possible solutions for only allowing one update at a time but I can think of many cases where an application needs to allow for editing different parts of a record at the same time without locking or throwing an error.
For example, let's say an administrator is updating a customer's lastName at the same time the customer logs into the web site and the login system needs to update the dateLastLoggedIn column, while simultaneously a scheduled task needs to update a lastPaymentReminderDate. In this crazy example, if you were passing a bean object to the update() method and saving the entire record of data each time, it's possible that whichever process runs the update() method last would overwrite all of the data.
Surely there must be a way to solve this. I've come up with a few possibilities based on my research but I would be curious to know the proper/best way to accomplish this.
Possible solution 1: DAO Update() Method Does Not Accept Bean as Argument
If the update() method accepted a structure of data containing only the columns that need updating, instead of a bean object, you could make your SQL statement smart enough to update only the fields that were passed to the method. For example, the argument might look like this:
{
    customerID: 1,
    firstName: 'John'
}
This would basically tell the update() method to only update the column firstName based on the customerID, 1. This would make your DAO extremely flexible and would give the service layer the ability to dynamically interact with the database. I have a gut feeling that this violates some "golden rule" of OOP but I'm not sure which. I've also never seen any examples online of a DAO behaving like this.
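A rough sketch of how that could look with a DAO in C#/ADO.NET (the table name, column whitelist, and method shape are made up for illustration; whitelisting the column names keeps the dynamic SQL safe, and the values themselves stay parameterized):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public class CustomerDao
{
    // Columns the service layer is allowed to update (an assumption for this sketch).
    private static readonly HashSet<string> UpdatableColumns = new HashSet<string>
    {
        "firstName", "lastName", "dateLastLoggedIn", "lastPaymentReminderDate"
    };

    // Updates only the columns present in 'changes'; unknown keys are ignored.
    public void Update(SqlConnection connection, int customerId, IDictionary<string, object> changes)
    {
        var columns = changes.Keys.Where(UpdatableColumns.Contains).ToList();
        if (columns.Count == 0)
            return;

        var setClause = string.Join(", ", columns.Select(c => c + " = @" + c));
        using (var command = new SqlCommand(
            "UPDATE Customer SET " + setClause + " WHERE customerID = @customerID", connection))
        {
            command.Parameters.AddWithValue("@customerID", customerId);
            foreach (var column in columns)
                command.Parameters.AddWithValue("@" + column, changes[column]);
            command.ExecuteNonQuery();
        }
    }
}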
Possible Solution 2: Add additional update() methods to your DAO.
You could also solve this by adding more specific update() methods to your DAO. For example, you might have one for dateLastLoggedIn and another for lastPaymentReminderDate. This way each service that needs to update the record could theoretically do so simultaneously. Any locking could be done within each specific update method if needed.
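A sketch of solution 2 in the same vein (again, the table and column names are just for illustration):

using System;
using System.Data.SqlClient;

public class CustomerUpdateDao
{
    // One narrowly scoped update per concern; each statement only touches its own column.
    public void UpdateDateLastLoggedIn(SqlConnection connection, int customerId, DateTime loggedInAt)
    {
        using (var command = new SqlCommand(
            "UPDATE Customer SET dateLastLoggedIn = @value WHERE customerID = @customerID", connection))
        {
            command.Parameters.AddWithValue("@value", loggedInAt);
            command.Parameters.AddWithValue("@customerID", customerId);
            command.ExecuteNonQuery();
        }
    }

    // ...and a similar method for lastPaymentReminderDate, lastName, and so on.
}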
The main downside of this approach is that your DAO will start to get pretty muddy with all kinds of update statements, and I've seen many blog posts written about how messy DAOs can quickly become.
How would you solve this type of conundrum with DAO objects assuming you need to allow for updating subsets of record data simultaneously? Would you stick with passing a bean to the DAO or is there some other solution I haven't considered?
If you do a DAO.read() operation that returns a bean, then update the bean with the user's new values, and then pass that bean to the DAO.update(bean) method, you shouldn't have a problem unless the two user operations happen within milliseconds of each other. Your question implies that the beans are being stored in the session scope or something like that before being passed to the update() method. If that's what you're doing, don't, for exactly the reasons you described: you don't want your bean getting out of sync with the DB record. For even better protection, wrap a transaction around the read and update operations; then there would be no way for the two users to step on each other's toes, even if user 2 submits his changes at the exact same time as user 1.
Read(), set values, update() is the way to go, I think. Keep the beans fresh. Nobody wants stale beans.
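A bare-bones sketch of that read -> set -> update cycle wrapped in a transaction (the bean and DAO shapes here are assumptions, not a specific framework's API):

using System.Data.SqlClient;

// Minimal types just for the sketch; the real bean and DAO would come from your application.
public class Customer
{
    public int CustomerId { get; set; }
    public string LastName { get; set; }
}

public interface ICustomerDao
{
    Customer Read(SqlConnection connection, SqlTransaction transaction, int customerId);
    void Update(SqlConnection connection, SqlTransaction transaction, Customer customer);
}

public static class CustomerService
{
    // Read a fresh copy of the row, change only the field this caller owns,
    // and write it back inside the same transaction so nothing goes stale in between.
    public static void ChangeLastName(SqlConnection connection, ICustomerDao dao,
        int customerId, string newLastName)
    {
        using (var transaction = connection.BeginTransaction())
        {
            var customer = dao.Read(connection, transaction, customerId);
            customer.LastName = newLastName;
            dao.Update(connection, transaction, customer);
            transaction.Commit();
        }
    }
}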

WCF insert into DB using EF

I have created a WCF service with the operation below to insert into the DB (pseudo code):
// Context mode is PerCall
public void AddUser(Record record)
{
    var dbCtx = new MyEntity();
    dbCtx.Set<Record>().Add(record);
    dbCtx.SaveChanges();
}
This method is called many times by the client asynchronously. How can I improve its performance? How can I perform a group insert and call SaveChanges across multiple calls?
For performance improvement, you firstly need to benchmark your method calls, e.g. how long the method takes when it is called by n users.
As one option, you may use the Visual Studio Instrumentation profiler (https://msdn.microsoft.com/en-us/library/dd264994.aspx) to find the hot path and then work on improving it.
Also, on the WCF side, you can definitely make some improvements; refer to these links:
1. https://msdn.microsoft.com/en-us/library/vstudio/Hh273113%28v=VS.100%29.aspx
2. Performance Tuning WCF Service
For EF, you can make optimizations like pre-compiled queries, etc. More details at https://msdn.microsoft.com/en-us/data/hh949853.aspx
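On the grouping question specifically, one possible approach is to buffer incoming records and flush them with a single SaveChanges once a batch size is reached. A hedged sketch (the batching policy, the static buffer, and the reuse of the MyEntity/Record names from the pseudo code above are all assumptions):

using System.Collections.Concurrent;
using System.Collections.Generic;

public static class RecordBatchWriter
{
    private const int BatchSize = 100;
    private static readonly ConcurrentQueue<Record> Pending = new ConcurrentQueue<Record>();

    // Called from AddUser instead of writing straight to the context.
    public static void Enqueue(Record record)
    {
        Pending.Enqueue(record);
        if (Pending.Count >= BatchSize)
            Flush();
    }

    // Drains up to one batch and saves it with a single SaveChanges call.
    public static void Flush()
    {
        var batch = new List<Record>();
        Record record;
        while (batch.Count < BatchSize && Pending.TryDequeue(out record))
            batch.Add(record);

        if (batch.Count == 0)
            return;

        using (var dbCtx = new MyEntity())
        {
            dbCtx.Set<Record>().AddRange(batch); // one change-tracking pass for the batch
            dbCtx.SaveChanges();                 // one SaveChanges for the whole batch
        }
    }
}

Note the trade-off: anything still sitting in the queue is lost if the process dies before a flush, so you would also want a timer-based flush and/or an acknowledgement strategy that matches your reliability requirements.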

NHibernate: Creating a ConnectionProvider that dynamically chooses which of several databases to connect to?

I have a project that connects to many SQL Server databases. They all have the same schema, but different data. Data is essentially separated by customer. When a request comes in to the ASP.NET app, it can tell which database is needed and sets up a session.
What we're doing now is creating a new SessionFactory for each customer database. This has worked out alright for a while, but with more customers we're creating more databases. We're starting to run into memory issues because each factory has its own QueryPlanCache. I wrote a post about my debugging of the memory usage.
I want to make it so that we have one SessionFactory that uses a ConnectionProvider to open a connection to the right database. What I have so far looks something like this:
public class DatabaseSpecificConnectionProvider : DriverConnectionProvider
{
    public override IDbConnection GetConnection()
    {
        if (!ThreadConnectionString.HasValue)
            return base.GetConnection();

        var connection = Driver.CreateConnection();
        try
        {
            connection.ConnectionString = ThreadConnectionString.Value;
            connection.Open();
        }
        catch (DbException)
        {
            connection.Dispose();
            throw;
        }
        return connection;
    }
}
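(The ThreadConnectionString helper it relies on isn't shown here; one possible shape, purely for illustration, is a thread-local wrapper like this:)

using System.Threading;

// Illustrative only: a thread-local connection string holder matching the usage above.
public static class ThreadConnectionString
{
    private static readonly ThreadLocal<string> Current = new ThreadLocal<string>();

    public static bool HasValue
    {
        get { return Current.IsValueCreated && !string.IsNullOrEmpty(Current.Value); }
    }

    public static string Value
    {
        get { return Current.Value; }
        set { Current.Value = value; }
    }
}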
This works great if there is only one database needed to handle the request, since I can set the connection string in a thread-local variable during initialization. Where I run into trouble is when I have an admin-level operation that needs to access several databases.
Since the ConnectionProvider has no idea which session is opening the connection, it can't decide which one to use. I could set a thread-local variable before opening the session, but that runs into trouble since session connections are opened lazily.
I'm also going to need to create a CacheProvider to avoid cache collisions. That's going to run into a similar problem.
So any ideas? Or is this just asking too much from NHibernate?
Edit: I found this answer that suggests I'd have to rely on some global state which is what I'd like to avoid. If I have multiple sessions active, I'd like the ConnectionProvider to respond with a connection to the appropriate database.
Edit 2: I'm leaning towards a solution that would create a ConnectionProvider for the default Session that is always used for each site, and then for connections to additional databases I'd open the connection and pass it in. The downsides I can see are that I can't use the second-level cache on ancillary Sessions and that I'll have to track and close the connection myself.
I've settled on a workaround and I'm listing it here in case anyone runs across this again.
It turned out I couldn't find any way to make the ConnectionProvider change databases depending on the session. It could only realistically depend on the context of the current request.
In my case, 95% of the time only the one customer's database is going to be needed. I created a SessionFactory and a ConnectionProvider that would handle that. For the remaining corner cases, I created a second SessionFactory and when I open the Session, I pass in a new Connection.
The downside to that is that the Session that talks to the second database can't use the second level cache and I have to make sure I close the connection at the end of the request.
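In code, the ancillary-session part of the workaround looks roughly like this (a sketch; the secondary factory and the connection-string lookup are simplified, and NHibernate won't close a connection it was handed):

using System.Data.SqlClient;
using NHibernate;

public static class CustomerDatabaseRunner
{
    // Open the ancillary session on an explicitly supplied connection so the shared
    // SessionFactory never has to decide which database to talk to.
    public static void RunAgainstCustomerDatabase(ISessionFactory secondaryFactory, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var session = secondaryFactory.OpenSession(connection))
            using (var tx = session.BeginTransaction())
            {
                // ... work against the customer-specific database ...
                tx.Commit();
            }
            // The connection is disposed here, by us, not by NHibernate.
        }
    }
}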
That seems to be working well enough for now, but I'm curious how well it'll stand up in the long run.