ADO.NET (WCF) Data Services Query Interceptor Hangs IIS - wcf

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database.
Logically I have table-per-type (TPT) inheritance in my data model, but the EDM doesn't implement inheritance. (This is a limitation of Data Services with navigation properties on derived types, STILL not fixed in .NET 4!) I can query my EDM directly (from a separate project) using a copy of the query I'm trying to run against the web service, and results are returned within 10 seconds. With the query interceptors disabled, I can make the same query against the web service and results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand fewer of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods.)
Up until this point, SQL Profiler indicates the slowdown is in the database. (That's a post for a different day.) But when I enable all my query interceptors and expand all the properties I'd like to have, the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database, i.e. the query never makes it past the web server. This implies to me that yes, my implementation probably sucks, but regardless, the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye.
Details:
Data model: Agent->Person->Student
Student has a collection of referrals
Students and referrals are private; queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent->Organization->School) can be accessed by anyone who has authenticated.
The existing security model is poorly suited to this type of filtering for this type of data access, so the query interceptors are complicated and cause EF to generate some entertaining SQL queries.
Sample Interceptor
[QueryInterceptor("Agents")]
public Expression<Func<Agent, Boolean>> OnQueryAgents()
{
    //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist
    return ag =>
        (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) &&
        ag.Person.OrganizationPersons.Count<OrganizationPerson>(op =>
            op.Organization.ScopePermissions.Any<ScopePermission>
                (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) ||
            op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission>
                (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0;
}
The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above.
Sample Query
This sample query is just that, a sample, intended to illustrate to third parties how to access the data using the provided web service. I realize a production query wouldn't have that many expands. (But also remember that to get the entire object in the OOP sense I need an Agent, Person, and Student row.)
var referrals =
(from r in service.Referrals
.Expand("Organization/ParentOrganization")
.Expand("Educator/Person/Agent")
.Expand("Student/Person/Agent")
.Expand("Student")
.Expand("Grade")
.Expand("ProblemBehavior")
.Expand("Location")
.Expand("Motivation")
.Expand("AdminDecision")
.Expand("OthersInvolved")
where
r.DateCreated >= coupledays &&
r.DateDeleted == null
select r);
Any suggestions or tips would be greatly appreciated, whether for fixing my current implementation or for developing a new one, with the caveat that existing database logic can't be changed (though I can add to it) and that ultimately I need to expose a large portion of the database via a web service that limits each caller to the data they are authorized for, for the purpose of data integration with multiple outside parties. These outside parties will be performing regular batch jobs to import our data into their database/data warehouse.
THANK YOU!!!
UPDATE: Posted this issue on MSDN, received similar feedback. http://social.msdn.microsoft.com/Forums/en-US/adodotnetdataservices/thread/1ccfc96c-dd35-4879-b36b-57e915d5e02f/

I'm just guessing here... but doing that many expands is almost never a good idea. The query will undoubtedly expand into some pretty awful SQL, that could easily cause timeouts.
Add TPT to the equation and things only get worse :(
Alex

Related

Returning objects on CQRS commands with MediatR

I have been reading about MediatR and CQRS lately and I saw many people saying that commands shouldn't return domain objects. They can return values, but they're limited to returning error values, failure/success information, and the Id of the newly created entities.
My question is how to return this new object to the client if the command can return only the Id of the new entity.
1) Should I query the database again with this new Id? If so, isn't it bad that I'm making a new trip to the database to get an object that was in memory a few seconds ago?
2) What's the correct way of returning the entities created by the commands?
I think the more important question is why you shouldn't return domain objects from commands. If the reason for that seems like a valid reason for you, you should look into alternatives such as executing a query right after the command to fetch the domain object.
If, however, returning the domain object from the command fits your needs and does not impose any direct problems, then why not just do it and keep things simple and straightforward?

How to query multiple aggregates efficiently with DDD?

When I need to invoke some business method, I need to get all aggregate roots related to the operation, even if the operation is as primitive as the one given below (just adding an item to a collection). What am I missing? Or is a CRUD-based approach, where you run one single query including table joins, selects and an insert at the end and the database engine does all the work for you, actually better in terms of performance?
In the code below I need to query a separate aggregate root (which creates another database connection and sends another select query). In real-world applications I have been querying a lot more than one single aggregate, up to 8 for a single business action. How can I reduce this query overhead and improve performance?
Domain aggregate roots:
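// requires: using System.Collections.Generic;
class Device
{
    // The Device references Parameters by identity only.
    private readonly HashSet<ParameterId> parameters = new HashSet<ParameterId>();

    public void AddParameter(Parameter parameter)
    {
        parameters.Add(parameter.Id);
    }
}

class Parameter
{
    public ParameterId Id { get; }
}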
Application layer:
class DeviceApplication
{
    private DeviceRepository _deviceRepo;
    private ParameterRepository _parameterRepo;

    public void AddParameterToDevice(string deviceId, string parameterId)
    {
        var aParameterId = new ParameterId(parameterId);
        var aDeviceId = new DeviceId(deviceId);

        // First round trip: load the Parameter aggregate.
        var parameter = _parameterRepo.FindById(aParameterId);
        if (parameter == null) throw new InvalidOperationException("Parameter not found.");

        // Second round trip: load the Device aggregate.
        var device = _deviceRepo.FindById(aDeviceId);
        if (device == null) throw new InvalidOperationException("Device not found.");

        device.AddParameter(parameter);
        _deviceRepo.Save(device);
    }
}
Possible solution
I've been told that you can pass just an Id of another aggregate like this:
class Device
{
void AddParameter(ParameterId parameterId)
{
parameters.Add(parameterId);
}
}
But IMO it breaks encapsulation (by explicitly bringing the term ID into the business language), and it also doesn't prevent passing a wrong or otherwise invalid identity (created by the user).
And Vaughn Vernon gives examples of application services that use the first approach (passing whole aggregate instance).
The short answer is - don't query your aggregates at all.
An aggregate is a model that exposes behaviour, not data. Generally, it is considered a code smell to have getters on aggregates (ID is the exception). This makes querying a little tricky.
Broadly speaking there are 2 related ways to go about solving this. There are probably more but at least these don't break the encapsulation.
Option 1: Use domain events -
By getting your domain (aggregate roots) to emit events which illustrate the changes to internal state you can build up tables in your database specifically designed for querying. Done right you will have highly performant, denormalised queryable data, which can be linearly scaled if necessary. This makes for very simple queries. I have an example of this on this blog post: How to Build a Master-Details View when using CQRS and Event Sourcing
Option 2: Infer query tables -
I'm a huge fan of option 1 but if you don't have an event sourced approach you will still need to persist the state of your aggregates at some point. There are all sorts of ways to do this but you could plug into the persistence pipeline for your aggregates a process whereby you extract queryable data into a read model for use with your queries.
I hope that makes sense.
If you figured out that having RDBMS query with joins will work in this case - probably you have wrong aggregate boundaries.
For example - why would you need to load the Parameter in order to add it to the Device? You already have the identity of this Parameter; all you need to do is add this id to the list of referenced Parameters in the Device. If you do it in order to satisfy your ORM - you're most probably doing something wrong.
Also remember that your aggregate is the transactional boundary. You really would want to complete all database operations inside one transaction and one connection.

Why is Orchard so slow when executing a content item query?

Let's say I want to query all Orchard user IDs, and I want to include those users that have been removed (aka soft deleted) as well. The DB contains around 1000 users.
Option A - takes around 2 minutes
Orchard.ContentManagement.IContentManager lContentManager = ...;
lContentManager
.Query<Orchard.Users.Models.UserPart, Orchard.Users.Models.UserPartRecord>(Orchard.ContentManagement.VersionOptions.AllVersions)
.List()
.Select(u => u.Id)
.ToList();
Option B - executes with almost unnoticeable delay
Orchard.Data.IRepository<Orchard.Users.Models.UserPartRecord> UserRepository = ...;
UserRepository .Fetch(u => true).Select(u => u.Id).ToList();
I don't see any SQL queries being executed in SQL Profiler when using Option A. I guess it has something to do with NHibernate or caching.
Is there any way to optimize Option A?
Could it be because the IContentManager version is accessing the data via Infoset (basically an XML representation of the data), whereas the IRepository version uses the actual DB table itself?
I seem to remember reading that though Infoset is great in many cases, when you're dealing with larger datasets with sorting / filtering it is more efficient to go direct to the table, as using Infoset requires each XML fragment to be parsed and elements extracted before you get to the data.
Since 'the shift', Orchard uses both, so you can use whichever method best suits your needs. I can't find the article that explained it now, but this explains the shift & infosets quite nicely:
http://weblogs.asp.net/bleroy/the-shift-how-orchard-painlessly-shifted-to-document-storage-and-how-it-ll-affect-you
Hope that helps you?

Single Instance VS PerCall in WCF

There are a lot of posts saying that SingleInstance is a bad design. But I think it is the best choice in my situation.
In my service I have to return a list of currently logged-in users (with additional data). This list is identical for all clients. I want to retrieve this list from database every 5 seconds (for example) and return a copy of it to the client, when needed.
If I use PerCall instancing mode, I will retrieve this list from database every single time. This list is supposed to contain ~200-500 records, but can grow up to 10 000 in the future. Every record is complex and contains about 10 fields.
So what about performance? Is it better to use "bad design" and get list once or to use "good approach" and get list from database on every call?
Performance and good design are NOT mutually exclusive. The problem with using a single instance is that, with the default concurrency mode, it can only service a single request at a time. So all other requests are waiting on it to finish doing its thing.
Alternatively you could just leverage a caching layer to hold the results of your query instead of coupling that to your service.
Then your code might look something like this:
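public IEnumerable<BigDataRecord> GetBigExpensiveQuery()
{
    // Double-checked locking prevents several threads from filling
    // the cache at the same time. Cache stands in for whatever
    // caching layer you use.
    var data = Cache["BigQuery"] as IEnumerable<BigDataRecord>;
    if (data == null)
    {
        lock (_bigQueryLock)
        {
            data = Cache["BigQuery"] as IEnumerable<BigDataRecord>;
            if (data == null)
            {
                data = DoBigQuery();
                Cache.AddCacheItem("BigQuery", data, TimeSpan.FromSeconds(5));
            }
        }
    }
    // Return the local reference: the entry could expire between
    // the check above and a second cache lookup.
    return data;
}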
Now you can have as many instances as you want all accessing the same Cache.

Memory leak in Rails 3.0.11 migration

A migration contains the following:
# Must be initialised outside the block so it survives between iterations.
last_service_id = nil
Service.find_by_sql("select
    service_id,
    registrations.regulator_given_id,
    registrations.regulator_id
  from
    registrations
  order by
    service_id, updated_at desc").each do |s|
  this_service_id = s["service_id"]
  # The first row for each service is its most recent registration.
  if this_service_id != last_service_id
    Service.find(this_service_id).update_attributes!(:regulator_id => s["regulator_id"],
                                                     :regulator_given_id => s["regulator_given_id"])
    last_service_id = this_service_id
  end
end
and it is eating up memory, to the point where it will not run in the 512MB allowed in Heroku (the registrations table has 60,000 items). Is there a known problem? Workaround? Fix in a later version of Rails?
Thanks in advance
Edit following request to clarify:
That is all the relevant source - the rest of the migration creates the two new columns that are being populated. The situation is that I have data about services from multiple sources (regulators of the services) in the registrations table. I have decided to 'promote' some of the data ([prime]regulator_id and [prime]regulator_given_key) into the services table for the prime regulators to speed up certain queries.
This will load all 60000 items in one go and keep those 60000 AR objects around, which will consume a fair amount of memory. Rails does provide a find_each method for breaking down a query like that into chunks of 1000 objects at a time, but it doesn't allow you to specify an ordering as you do.
You're probably best off implementing your own paging scheme. Using limit/offset is a possibility; however, large OFFSET values are usually inefficient because the database server has to generate a bunch of results that it then discards.
An alternative is to add conditions to your query that ensure you don't return already processed items, for example specifying that service_id be greater than the previously returned values (the query is ordered by ascending service_id). This is more complicated if, when compared in this manner, some items are equal. With both of these paging schemes you probably need to think about what happens if a row gets inserted into your registrations table while you are processing it (probably not a problem with migrations, assuming you run them with access to the site disabled).
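The condition-based paging idea can be sketched in plain Ruby. This is only an illustration under stated assumptions: an in-memory array stands in for the registrations table, and `fetch_page`, `PAGE_SIZE` and the sample rows are hypothetical names and data, not part of the original migration. Because the rows are ordered by ascending service_id, each page asks for rows whose service_id is greater than the last one already seen.

```ruby
# In-memory stand-in for the registrations table (hypothetical data).
REGISTRATIONS = [
  { service_id: 1, regulator_id: 10, updated_at: 3 },
  { service_id: 1, regulator_id: 11, updated_at: 5 },
  { service_id: 2, regulator_id: 20, updated_at: 1 },
  { service_id: 3, regulator_id: 30, updated_at: 2 },
  { service_id: 3, regulator_id: 31, updated_at: 9 },
  { service_id: 3, regulator_id: 32, updated_at: 4 },
].freeze

PAGE_SIZE = 2

# Hypothetical helper. It emulates a query of the form:
#   SELECT ... FROM registrations WHERE service_id > ?
#   ORDER BY service_id, updated_at DESC LIMIT ?
def fetch_page(after_service_id, limit)
  REGISTRATIONS
    .select { |r| r[:service_id] > after_service_id }
    .sort_by { |r| [r[:service_id], -r[:updated_at]] }
    .first(limit)
end

# Walks the table one page at a time and keeps the newest row per service,
# so only PAGE_SIZE rows are held at once (plus the result hash).
def latest_regulator_per_service
  result = {}
  last_service_id = 0
  loop do
    page = fetch_page(last_service_id, PAGE_SIZE)
    break if page.empty?
    page.each do |row|
      # The first row seen for a service is its most recent registration.
      result[row[:service_id]] ||= row[:regulator_id]
    end
    # Resume after the last service seen; its older rows are skipped.
    last_service_id = page.last[:service_id]
  end
  result
end

LATEST = latest_regulator_per_service
```

In this particular case the "equal items" complication is harmless: only the first row per service matters, so skipping the remaining rows of a service that was split across two pages loses nothing. A paging scheme that needs every row would have to compare on a unique column as well.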
(Note: OP reports this didn't work)
Try something like this:
previous = nil
Registration.select('service_id, regulator_id, regulator_given_id')
.order('service_id, updated_at DESC')
.each do |r|
if previous != r.service_id
service = Service.find r.service_id
service.update_attributes(:regulator_id => r.regulator_id, :regulator_given_id => r.regulator_given_id)
previous = r.service_id
end
end
This is a kind of hacky way of getting the most recent record from regulators -- there's undoubtedly a better way to do it with DISTINCT or GROUP BY in SQL all in a single query, which would not only be a lot faster, but also more elegant. But this is just a migration, right? And I didn't promise elegant. I also am not sure it will work and resolve the problem, but I think so :-)
The key change is that instead of using SQL, this uses AREL, meaning (I think) the update operation is performed once on each associated record as AREL returns them. With SQL, you return them all and store in an array, then update them all. I also don't think it's necessary to use the .select(...) clause.
Very interested in the result, so let me know if it works!
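As a rough illustration of the "most recent record per service" reduction that a GROUP BY query would perform, here is a plain-Ruby sketch. It is not the answerer's code: `SAMPLE_ROWS` and `latest_per_service` are hypothetical, and the rows are in-memory hashes, not ActiveRecord objects. The point is that the reduction does not need sorted input, so the rows could be fed in batches.

```ruby
# Hypothetical sample rows standing in for the registrations table,
# deliberately in no particular order.
SAMPLE_ROWS = [
  { service_id: 2, regulator_id: 20, updated_at: 1 },
  { service_id: 1, regulator_id: 10, updated_at: 3 },
  { service_id: 3, regulator_id: 32, updated_at: 4 },
  { service_id: 1, regulator_id: 11, updated_at: 5 },
  { service_id: 3, regulator_id: 31, updated_at: 9 },
].freeze

# Keeps only the newest row per service_id in a single pass.
# Accepts any Enumerable, so rows can be streamed in rather than
# loaded all at once.
def latest_per_service(rows)
  rows.each_with_object({}) do |row, latest|
    current = latest[row[:service_id]]
    if current.nil? || row[:updated_at] > current[:updated_at]
      latest[row[:service_id]] = row
    end
  end
end

NEWEST = latest_per_service(SAMPLE_ROWS)
NEWEST.each do |service_id, row|
  puts "service #{service_id} -> regulator #{row[:regulator_id]}"
end
```

The hash grows with the number of services, not the number of registrations. Since no ordering is required, a batching iterator such as find_each could in principle supply the rows, though that combination is an assumption and untested here.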