Single Instance vs PerCall in WCF

There are a lot of posts saying that SingleInstance is a bad design. But I think it is the best choice in my situation.
In my service I have to return a list of currently logged-in users (with additional data). This list is identical for all clients. I want to retrieve this list from the database every 5 seconds (for example) and return a copy of it to a client when needed.
If I use PerCall instancing mode, I will retrieve this list from the database on every single call. The list currently contains ~200-500 records, but it could grow to 10,000 in the future. Every record is complex and contains about 10 fields.
So what about performance? Is it better to use "bad design" and get list once or to use "good approach" and get list from database on every call?

So what about performance? Is it better to use "bad design" and get the list once or to use the "good approach" and get the list from the database on every call?
Performance and good design are NOT mutually exclusive. The problem with using a single instance (with the default ConcurrencyMode of Single) is that it can only service one request at a time, so all other requests are left waiting for it to finish doing its thing.
Alternatively you could just leverage a caching layer to hold the results of your query instead of coupling that to your service.
Then your code might look something like this:
public IEnumerable<BigDataRecord> GetBigExpensiveQuery()
{
    // Double-checked locking is necessary to prevent filling the
    // cache multiple times in a multi-threaded environment.
    if (Cache["BigQuery"] == null)
    {
        lock (_bigQueryLock)
        {
            if (Cache["BigQuery"] == null)
            {
                var data = DoBigQuery();
                Cache.AddCacheItem("BigQuery", data, TimeSpan.FromSeconds(5));
            }
        }
    }

    return (IEnumerable<BigDataRecord>)Cache["BigQuery"];
}
Now you can have as many instances as you want all accessing the same Cache.
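For example, here is a rough sketch using System.Runtime.Caching.MemoryCache together with a PerCall service. The service contract, type names and the 5-second window are made up for illustration, not your actual contract:

using System;
using System.Collections.Generic;
using System.Runtime.Caching;
using System.ServiceModel;

[ServiceContract]
public interface IUserListService
{
    [OperationContract]
    List<BigDataRecord> GetLoggedInUsers();
}

// PerCall instancing: every request gets a fresh service instance,
// but they all share the static MemoryCache, so the database is hit
// at most once per 5-second window.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class UserListService : IUserListService
{
    private static readonly object _bigQueryLock = new object();

    public List<BigDataRecord> GetLoggedInUsers()
    {
        var cached = MemoryCache.Default.Get("BigQuery") as List<BigDataRecord>;
        if (cached != null)
            return cached;

        lock (_bigQueryLock)
        {
            // Re-check: another thread may have filled the cache while we waited.
            cached = MemoryCache.Default.Get("BigQuery") as List<BigDataRecord>;
            if (cached != null)
                return cached;

            var data = DoBigQuery();
            MemoryCache.Default.Set("BigQuery", data, DateTimeOffset.Now.AddSeconds(5));
            return data;
        }
    }

    private static List<BigDataRecord> DoBigQuery()
    {
        // Placeholder for the real database query (~200-500 rows today).
        return new List<BigDataRecord>();
    }
}

public class BigDataRecord { /* ~10 fields in the real service */ }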

Related

Ravendb memory leak on query

I'm having a hard time solving an issue with RavenDB.
At work we have a process that tries to identify potential duplicates in a specific collection in our database (let's call it the users collection).
That means I'm iterating through the collection, and for each document there is a query that tries to find similar entities. So, as you can imagine, it's quite a long task to run.
My problem is that when the task starts running, RavenDB's memory consumption goes higher and higher; it literally just keeps growing, and it seems to continue until it reaches the system's maximum memory.
But that doesn't really make sense, since I'm only querying: I'm using one single index and the default page size (128).
Has anybody met a similar problem? I really have no idea what is going on in RavenDB, but it looks like a memory leak.
RavenDB version: 3.0.179
When I need to do massive operations on large collections, I follow these steps to prevent problems with memory usage (there's a sketch after the steps):
I use query streaming to extract the ids of all the documents I want to process (with a dedicated session).
I open a new session for each id, load the document, and then do whatever I need with it.
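A minimal sketch of that approach in C# (the index name, the User class and the duplicate check itself are placeholders; streamed results are not tracked by the session, which is what keeps memory flat):

using System.Collections.Generic;
using Raven.Client;

public class DuplicateScanner
{
    public void Run(IDocumentStore store)
    {
        var ids = new List<string>();

        // Step 1: a dedicated session that only streams the ids.
        using (var session = store.OpenSession())
        {
            var query = session.Query<User>("Users/ByEmail"); // hypothetical index
            using (var enumerator = session.Advanced.Stream(query))
            {
                while (enumerator.MoveNext())
                    ids.Add(enumerator.Current.Document.Id);
            }
        }

        // Step 2: a fresh session per document, disposed immediately afterwards,
        // so the client-side cache never grows unbounded.
        foreach (var id in ids)
        {
            using (var session = store.OpenSession())
            {
                var user = session.Load<User>(id);
                // ... run the duplicate check for this user ...
            }
        }
    }
}

public class User
{
    public string Id { get; set; }   // conventional RavenDB identity property
    public string Email { get; set; }
}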
First, a recommendation: if you don't want duplicates, store them with a well-known ID. For example, suppose you don't want duplicate User objects. You'd store them with an ID that makes them unique:
var user = new User() { Email = "foo@bar.com" };
var id = "Users/" + user.Email; // A well-known ID
dbSession.Store(user, id);
Then, when you want to check for duplicates, just check against the well-known ID:
public string RegisterNewUser(string email)
{
    // Unlike .Query, the .Load call is ACID and never stale.
    var existingUser = dbSession.Load<User>("Users/" + email);
    if (existingUser != null)
    {
        return "Sorry, that email is already taken.";
    }

    // Free to register: store the new user under the well-known ID.
    var newUser = new User { Email = email };
    dbSession.Store(newUser, "Users/" + email);
    dbSession.SaveChanges();
    return "Thanks for registering!";
}
If you follow this pattern, you won't have to worry about running complex queries nor worry about stale indexes.
If this scenario can't work for you for some reason, then we can help diagnose your memory issues. But to diagnose that, we'll need to see your code.

How to query multiple aggregates efficiently with DDD?

When I need to invoke some business method, I need to get all the aggregate roots related to the operation, even if the operation is as primitive as the one given below (just adding an item to a collection). What am I missing? Or is a CRUD-based approach, where you run one single query with table joins, selects and an insert at the end, and the database engine does all the work for you, actually better in terms of performance?
In the code below I need to query a separate aggregate root (which creates another database connection and sends another select query). In real-world applications I have been querying a lot more than one single aggregate, up to 8 for a single business action. How can I improve performance/query overhead?
Domain aggregate roots:
class Device
{
    private readonly ISet<ParameterId> parameters = new HashSet<ParameterId>();

    public void AddParameter(Parameter parameter)
    {
        parameters.Add(parameter.Id);
    }
}
class Parameter
{
    public ParameterId Id { get; }
}
Application layer:
class DeviceApplication
{
    private DeviceRepository _deviceRepo;
    private ParameterRepository _parameterRepo;

    public void AddParameterToDevice(string deviceId, string parameterId)
    {
        var aParameterId = new ParameterId(parameterId);
        var aDeviceId = new DeviceId(deviceId);

        var parameter = _parameterRepo.FindById(aParameterId);
        if (parameter == null) throw new ArgumentException("Unknown parameter", nameof(parameterId));

        var device = _deviceRepo.FindById(aDeviceId);
        if (device == null) throw new ArgumentException("Unknown device", nameof(deviceId));

        device.AddParameter(parameter);
        _deviceRepo.Save(device);
    }
}
Possible solution
I've been told that you can pass just an Id of another aggregate like this:
class Device
{
    void AddParameter(ParameterId parameterId)
    {
        parameters.Add(parameterId);
    }
}
But IMO this breaks encapsulation (by explicitly leaking the notion of an ID into the business language), and it also doesn't prevent passing a wrong or otherwise invalid identity (one made up by the user).
And Vaughn Vernon gives examples of application services that use the first approach (passing the whole aggregate instance).
The short answer is - don't query your aggregates at all.
An aggregate is a model that exposes behaviour, not data. Generally, it is considered a code smell to have getters on aggregates (ID is the exception). This makes querying a little tricky.
Broadly speaking there are two related ways to go about solving this. There are probably more, but at least these two don't break encapsulation.
Option 1: Use domain events -
By getting your domain (aggregate roots) to emit events which illustrate the changes to internal state you can build up tables in your database specifically designed for querying. Done right you will have highly performant, denormalised queryable data, which can be linearly scaled if necessary. This makes for very simple queries. I have an example of this on this blog post: How to Build a Master-Details View when using CQRS and Event Sourcing
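For illustration, here is a minimal, hypothetical sketch of that flow (all the names are made up): the aggregate records an event, and a handler projects it into a flat table that exists purely to be queried.

using System;
using System.Collections.Generic;
using System.Linq;

// The event: a plain record of something that happened inside the aggregate.
public class ParameterAddedToDevice
{
    public Guid DeviceId { get; set; }
    public Guid ParameterId { get; set; }
    public DateTime OccurredAt { get; set; }
}

public class Device
{
    private readonly HashSet<Guid> parameters = new HashSet<Guid>();
    private readonly List<object> pendingEvents = new List<object>();

    public Guid Id { get; private set; }

    public void AddParameter(Guid parameterId)
    {
        if (!parameters.Add(parameterId)) return; // already referenced

        // Record what happened; infrastructure publishes it after the commit.
        pendingEvents.Add(new ParameterAddedToDevice
        {
            DeviceId = Id,
            ParameterId = parameterId,
            OccurredAt = DateTime.UtcNow
        });
    }

    public IReadOnlyCollection<object> DequeueEvents()
    {
        var events = pendingEvents.ToArray();
        pendingEvents.Clear();
        return events;
    }
}

// The projection keeps a denormalised, query-friendly table up to date.
public interface IDeviceReadModel
{
    void InsertRow(Guid deviceId, Guid parameterId, DateTime occurredAt);
}

public class DeviceParameterProjection
{
    private readonly IDeviceReadModel readModel;

    public DeviceParameterProjection(IDeviceReadModel readModel)
    {
        this.readModel = readModel;
    }

    public void Handle(ParameterAddedToDevice e)
    {
        readModel.InsertRow(e.DeviceId, e.ParameterId, e.OccurredAt);
    }
}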
Option 2: Infer query tables -
I'm a huge fan of option 1 but if you don't have an event sourced approach you will still need to persist the state of your aggregates at some point. There are all sorts of ways to do this but you could plug into the persistence pipeline for your aggregates a process whereby you extract queryable data into a read model for use with your queries.
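If you go that route, a sketch of what the persistence hook might look like (again, all names are assumptions): the repository saves the aggregate and refreshes a flat summary row in the same unit of work.

using System;

// Hypothetical write-side and read-side gateways.
public interface IDeviceStore { void Save(Device device); }
public interface IDeviceSummaryTable { void Upsert(DeviceSummaryRow row); }

public class DeviceSummaryRow
{
    public Guid DeviceId { get; set; }
    public int ParameterCount { get; set; }
}

public class Device
{
    public Guid Id { get; private set; }
    public int ParameterCount { get; private set; }
    // behaviour omitted for brevity
}

public class DeviceRepository
{
    private readonly IDeviceStore store;
    private readonly IDeviceSummaryTable summaryTable;

    public DeviceRepository(IDeviceStore store, IDeviceSummaryTable summaryTable)
    {
        this.store = store;
        this.summaryTable = summaryTable;
    }

    public void Save(Device device)
    {
        // Persist the aggregate state as usual...
        store.Save(device);

        // ...and, in the same unit of work, extract the data the queries need.
        summaryTable.Upsert(new DeviceSummaryRow
        {
            DeviceId = device.Id,
            ParameterCount = device.ParameterCount
        });
    }
}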
I hope that makes sense.
If you figured out that an RDBMS query with joins would work in this case, you probably have the wrong aggregate boundaries.
For example, why would you need to load the Parameter in order to add it to the Device? You already have the identity of that Parameter; all you need to do is add its id to the list of referenced Parameters in the Device. If you are doing it only to satisfy your ORM, you're most probably doing something wrong.
Also remember that your aggregate is the transactional boundary. You really want to complete all database operations inside one transaction and one connection, something like the sketch below.
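A minimal sketch of that idea, reusing the question's classes (the Exists check and the TransactionScope usage are my assumptions, not part of the original code):

using System;
using System.Transactions;

class DeviceApplication
{
    private DeviceRepository _deviceRepo;
    private ParameterRepository _parameterRepo;

    public void AddParameterToDevice(string deviceId, string parameterId)
    {
        var aParameterId = new ParameterId(parameterId);
        var aDeviceId = new DeviceId(deviceId);

        // A lightweight existence check (e.g. SELECT 1 against the ids)
        // instead of materialising the whole Parameter aggregate.
        if (!_parameterRepo.Exists(aParameterId))
            throw new ArgumentException("Unknown parameter", nameof(parameterId));

        using (var tx = new TransactionScope())
        {
            var device = _deviceRepo.FindById(aDeviceId);
            if (device == null) throw new ArgumentException("Unknown device", nameof(deviceId));

            device.AddParameter(aParameterId); // pass only the identity
            _deviceRepo.Save(device);          // one aggregate, one transaction

            tx.Complete();
        }
    }
}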

Work with dto's to build an API with DDD

I'm starting to work with dto's (data transfer objects) and I have some doubts about the best way to build the system architecture of the API.
Imagine a domain entity 'A' with relations to 'B', 'C' and 'D'. We have a service 'S' that returns a JSON list with all "A's". Is it correct to create an 'ADTO' in that service and fill it with "BDTO's", "CDTO's" and "DDTO's"? If we then have another service "S2" that needs to return a specific set of "B's", do we need to create another tree of "B2DTO's" with "C2DTO's", "D2DTO's"...? Is this the correct way to do it?
I can see that this way we'll end up with a huge and complex tree of DTOs, with specific DTOs for each use case.
EDIT:
I forgot the assemblers part. Is it necessary to implement a different assembler for every DTO? For example, for an entity A we have two DTOs. Can I use the same assembler, or is it better to have A1Assembler and A2Assembler?
Your DTOs should represent the set of data that you want your client to have. Usually, you should never 'copy' your entities into DTOs, because the entities may have fields that you don't want to share with the world. Suppose you automatically fill a 'tracking' column with the ID of whoever entered the data, or that you have a Customer entity with password fields; you don't want those to be part of your DTOs. That's why you must be EXTRA CAREFUL when using AutoMapper etc.
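To make that concrete, a small hypothetical example (Customer, CustomerDto and the assembler are invented names): the entity carries fields the client must never see, so the DTO only exposes what the endpoint needs.

public class Customer                        // domain/persistence entity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public string PasswordHash { get; set; }    // never leaves the server
    public string CreatedByUserId { get; set; } // internal tracking column
}

public class CustomerDto                     // what the API actually returns
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class CustomerAssembler
{
    // Explicit mapping instead of a blind AutoMapper configuration.
    public static CustomerDto ToDto(Customer customer) => new CustomerDto
    {
        Id = customer.Id,
        Name = customer.Name,
        Email = customer.Email
    };
}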
When you design DTOs, think about what your client needs from that endpoint specifically. Sometimes DTOs may look the same, and that's OK. Also, your DTOs can be as simple or as complex as needed. One crazy example: say a page shows an artist, his songs, the voting rate for those songs and some extra data.
If your use case justifies it, you may very well put all of that into a single DTO. All a DTO does is carry data.
YES, your services should return DTOs (POCOs).
Also, DTO is just a naming convention. Don't get caught up in the "DTO" suffix. A 'command' coming from a client is a DTO too, but in that case you would call it AddNewCustomerCommand, for example.
Makes sense?
I think you are mistaken about what your DTOs are. Roughly speaking, you'll have two kinds of DTOs:
1) They can be your domain entities; then you can return an ADTO, a BDTO and a CDTO. But those DTOs can be fairly consistent (why would a B2DTO be any different from a BDTO?).
Look at what your JSON would look like:
{
    Id: 1,
    name: "foobar",
    $type: "A",
    B: [ {
        name: "b-bar",
        $type: "B" } ],
    CIds: [ 2, 23, 42 ]
}
Here you see two kinds of objects: some (the B's) are returned in full inside your DTO as sub-objects; others (like C) are returned by id and can be queried separately. Whether it's S2 that implements the C query or not, you don't care.
2) When you get to an architecture like CQRS you do get different DTOs (projections or commands), but then you would also see this in the naming of the DTOs, for example:
AListOnOverviewPageDTO, AUserEditDetailDTO, etc.
Now it makes very much sense to have different DTOs, since they are projections representing very different use cases (and not a full object, as is common in DDD).
Update: The reason you want different DTOs is twofold. First, it allows you to optimise each call separately: maybe the list needs to be faster, so (using CQRS) you can put the right indexes on your data and your list DTO is returned faster. Second, it allows you to reason about use cases more easily, since each DTO represents one use case/screen (otherwise you end up with "OK, in the userListDTO I only need to populate these 3 fields in this case", etc.).
Furthermore, you need to make sure your API is honest. NEVER EVER return an "age" field with empty data from one call but have another call return the same user with a real age; it makes your backend appear broken. However, if I had a call to /users/list and another call to /users/1/detail, it would be natural for the detail call to return more fields about a specific user.
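As a hypothetical illustration of both points (the names are invented), the list call and the detail call each get their own DTO, and the list DTO simply omits the fields it doesn't load instead of returning them empty:

public class UserListItemDto            // GET /users/list
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public class UserDetailDto              // GET /users/{id}/detail
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
    public string Email { get; set; }
    public int Age { get; set; }        // only returned where it is actually populated
}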

Getting specific Backbone.js models from a collection without getting all models first

I'm new to Backbone.js. I'm intrigued by the idea that you can just supply a URL to a collection and then proceed to create, update, delete, and get models from that collection and it handle all the interaction with the API.
In the small task-management sample applications and numerous demos I've seen on the web, it seems that collection.fetch() is used to pull down all models from the server and then do something with them. However, more often than not, in a real application you don't want to pull down hundreds of thousands or even millions of records by issuing a GET to the API.
Using the baked-in collection.sync method, how can I specify parameters to GET specific record sets? For example, I may want to GET records with a date of 2/1/2014, or GET records owned by a specific user id.
In this question, collection.find is used to do this, but does it still pull down all records to the client first and then "find" them, or does collection.sync know to specify arguments when doing a GET to the server?
You do use fetch, but you provide options as seen in collection.fetch([options]).
So for example to obtain the one model where id is myIDvar:
collection.fetch({
    data: { id: myIDvar },
    success: function (collection, response, options) {
        // do a little dance;
    }
});
My offhand recollection is that find, findWhere and where all work against models that have already been downloaded, so the filtering takes place on the client. I believe with fetch (and a data option) the filtering takes place on the server side.
You can implement some kind of pagination on the server side and update your collection with a limited number of records. That way all your data will stay in sync with the backend.
You can do it by overriding the fetch method with your own implementation, or by specifying params.
For example:
collection.fetch({ data: { page: 3 } })
You can also use the findWhere method here:
collection.findWhere(attributes)

ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database.
Logically I have table-per-type (TPT) inheritance in my data model, but the EDM doesn't implement inheritance. (A limitation of Data Services and navigation properties on derived types; STILL not fixed in .NET 4!) I can query my EDM directly (from a separate project) using a copy of the query I'm trying to run against the web service, and results are returned within 10 seconds. With the query interceptors disabled I can make the same query against the web service, and results come back similarly quickly. If I enable some of the query interceptors, the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand fewer of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods.)
Up to this point SQL Profiler indicates the slow-down is the database. (That's a post for a different day.) But when I enable all my query interceptors and expand all the properties I'd like to have, the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database, i.e. the request never makes it past the web server. This implies to me that yes, my implementation probably sucks, but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye.
Details:
Data model: Agent->Person->Student
Student has a collection of referrals
Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent->Organization->School) can be accessed by anyone who has authenticated.
The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries.
Sample Interceptor
[QueryInterceptor("Agents")]
public Expression<Func<Agent, Boolean>> OnQueryAgents()
{
    // Agent is a Person (1), Educator (2), Student (3), or Other Person (13); allow if scope permissions exist
    return ag =>
        (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) &&
        ag.Person.OrganizationPersons.Count<OrganizationPerson>(op =>
            op.Organization.ScopePermissions.Any<ScopePermission>
                (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) ||
            op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission>
                (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0;
}
The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above.
Sample Query
This sample query is just that, a sample, intended to illustrate to third parties how to access the data using the provided web service. I realize a production query wouldn't have that many expands. (But also remember that to get the entire object in the OOP sense I need an Agent, Person, and Student row.)
var referrals =
    (from r in service.Referrals
        .Expand("Organization/ParentOrganization")
        .Expand("Educator/Person/Agent")
        .Expand("Student/Person/Agent")
        .Expand("Student")
        .Expand("Grade")
        .Expand("ProblemBehavior")
        .Expand("Location")
        .Expand("Motivation")
        .Expand("AdminDecision")
        .Expand("OthersInvolved")
     where
        r.DateCreated >= coupledays &&
        r.DateDeleted == null
     select r);
Any suggestions or tips would be greatly appreciated, whether for fixing my current implementation or for developing a new one, with the caveat that existing database logic can't be changed (though I can add to it) and that ultimately I need to expose a large portion of the database via a web service that limits access to the data the caller is authorized for, for the purpose of data integration with multiple outside parties. These outside parties will be performing regular batch jobs to import our data into their databases/data warehouses.
THANK YOU!!!
UPDATE: Posted this issue on MSDN, received similar feedback. http://social.msdn.microsoft.com/Forums/en-US/adodotnetdataservices/thread/1ccfc96c-dd35-4879-b36b-57e915d5e02f/
I'm just guessing here... but doing that many expands is almost never a good idea. The query will undoubtedly expand into some pretty awful SQL, which could easily cause timeouts.
Add TPT to the equation and things only get worse :(
Alex