NHibernate Filtering data best practices

I have the following situation:
User logs in and opens an overview of all products, but may only see the products that match a condition. This condition is variable, for example: WHERE category IN ('catA', 'CatB')
Administrator logs in and opens the same overview; he can see all products, with no filter applied.
I need to make this as dynamic as possible. My data access classes use generics most of the time.
I've looked at NHibernate filters, but my conditions vary a lot, so I don't see them as scalable enough.

We use NHibernate filters for something similar, and it works fine. If no filter needs to be applied, you simply don't enable the filter for the session or set any of its parameter values. We use these filters for the more basic cases, filters that are applied nearly 100% of the time, e.g. filtering out deleted objects, segregating client data, etc. I'm not sure which scalability aspect you are concerned about.
For more high level and complex filtering, we use a custom class that manipulates a repository root. Something like the following:
public IQueryOver<TIn, TOut> Apply(IQueryOver<TIn, TOut> query)
{
return query.Where(x => ... );
}
If you have an IoC container integrated with your NHibernate usage, this can easily be generalized and plugged into your stack. We have repository manipulators that add simple where clauses, others that generate complex where clauses referencing domain logic, and others that join a second table and filter on it.
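The manipulator idea can be sketched in Python against SQLite rather than NHibernate: each manipulator is a function that takes a query and returns a narrower one, and the list of manipulators is chosen per user. `ProductQuery`, `exclude_deleted`, `only_categories` and the admin/non-admin policy are hypothetical names for illustration, not NHibernate APIs; in the C# version an IoC container would supply the list.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ProductQuery:
    """Hypothetical query root: accumulated WHERE fragments plus their parameters."""
    clauses: Tuple[str, ...] = ()
    params: Tuple[object, ...] = ()

    def where(self, clause: str, *params: object) -> "ProductQuery":
        # Return a new query instead of mutating, so a base query can be shared safely.
        return ProductQuery(self.clauses + (clause,), self.params + params)

    def to_sql(self) -> Tuple[str, Tuple[object, ...]]:
        sql = "SELECT name FROM products"
        if self.clauses:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.clauses)
        return sql + " ORDER BY name", self.params


# A manipulator narrows a query, like the Apply(IQueryOver) method above.
Manipulator = Callable[[ProductQuery], ProductQuery]


def exclude_deleted(query: ProductQuery) -> ProductQuery:
    return query.where("deleted = 0")


def only_categories(categories: Sequence[str]) -> Manipulator:
    def apply(query: ProductQuery) -> ProductQuery:
        placeholders = ", ".join("?" for _ in categories)
        return query.where(f"category IN ({placeholders})", *categories)
    return apply


def manipulators_for(is_admin: bool, categories: Sequence[str]) -> List[Manipulator]:
    # Hypothetical policy: every user gets the soft-delete filter,
    # only non-administrators get the category restriction.
    chain: List[Manipulator] = [exclude_deleted]
    if not is_admin:
        chain.append(only_categories(categories))
    return chain


def list_products(conn: sqlite3.Connection, chain: List[Manipulator]) -> List[str]:
    query = ProductQuery()
    for manipulate in chain:
        query = manipulate(query)
    sql, params = query.to_sql()
    return [name for (name,) in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, deleted INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("apple", "catA", 0), ("bolt", "catB", 0), ("cable", "catC", 0), ("drill", "catA", 1)],
)
print(list_products(conn, manipulators_for(is_admin=True, categories=[])))  # ['apple', 'bolt', 'cable']
print(list_products(conn, manipulators_for(is_admin=False, categories=["catA", "catB"])))  # ['apple', 'bolt']
```

Returning a new query from every manipulator keeps them independent, so they can be registered and combined in any order.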

You could save all allowed categories in a category list and pass this list to the query. If the list is not null and contains elements, you can use the following:
List<string> allowedCategoriesList = new List<string>();
allowedCategoriesList.Add(...);
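...
// only restrict when there is something to restrict on
if (allowedCategoriesList.Count > 0)
{
query = query.WhereRestrictionOn(x => x.category).IsIn(allowedCategoriesList);
}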
The important part is to skip this restriction when you have no filter at all (i.e. you want to see all entries, as the administrator does); otherwise the IN list is empty and you will not see a single result.
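A minimal Python/SQLite sketch of "skip the restriction when there is no filter", assuming `None` stands for "no filter" (the administrator case) and a list holds a user's allowed categories. Keeping `None` separate from an empty list matters: the sketch shows that an empty `IN ()` list matches nothing. (SQLite accepts `IN ()`; many other databases reject it as a syntax error.) The table and function names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple


def product_query(categories: Optional[Sequence[str]]) -> Tuple[str, List[str]]:
    """None means "no restriction" (administrator); a list means "only these categories"."""
    sql = "SELECT name FROM products"
    params: List[str] = []
    if categories is not None:
        # An empty list still yields "category IN ()", which matches no rows at all.
        placeholders = ", ".join("?" for _ in categories)
        sql += f" WHERE category IN ({placeholders})"
        params.extend(categories)
    return sql + " ORDER BY name", params


def visible_products(conn: sqlite3.Connection, categories: Optional[Sequence[str]]) -> List[str]:
    sql, params = product_query(categories)
    return [name for (name,) in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", "catA"), ("bolt", "catB"), ("cable", "catC")],
)
print(visible_products(conn, None))              # ['apple', 'bolt', 'cable']
print(visible_products(conn, ["catA", "catB"]))  # ['apple', 'bolt']
print(visible_products(conn, []))                # []
```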

Related

How to handle concurrency in faunadb

I have some backend APIs that connect to FaunaDB. I can do everything I need with the data, but I have serious doubts about concurrent modifications (which may not be specific to FaunaDB, but I'd like to understand how to deal with them using this technology).
One example above all: I want to create a new document (A) in a collection (X) which is linked (via reference or other fields) to other documents (B and C) in another collection (Y); in order to be linked, these documents (B and C) must satisfy a condition (e.g. field F = "V"). Once A has been created, B and C cannot be modified (or the condition will be invalidated!).
Of course the API to create the document A can run concurrently with the API used to modify documents B and C.
Here comes the doubt: what if, while creating the document A linked to document B and C, someone else changes field F of document B to something different from "V"?
I could end up with A linked to a wrong document, because neither API knows what the other one is doing.
Do I need to use the "Do" function in both APIs to create atomic transactions? So I can:
Check if B and C are valid and, if yes, create A in a single transaction
Check if B is linked to A and, if it isn't, modify it in a single transaction
Thanks everyone.
Fauna tries to present a consistent view of the data no matter where or when your clients ask. Isolation of transaction effects is what matters on short time scales (typically less than 10 ms).
The Do function merely lets you combine multiple disparate FQL expressions into a single query. There is no conditional processing aspect to Do.
You can certainly check conditions before undertaking operations, and all Fauna queries are atomic transactions: either all of the query succeeds or none of it does.
Arranging for intermediate query values in order to perform conditional logic does tend to make FQL queries more complex, but they are definitely possible:
The query for your first API might look something like this:
Let(
{
document_b: Get(<reference to B document>),
document_c: Get(<reference to C document>),
required_b: Select(["data", "required_field"], Var("document_b")),
required_c: Select(["data", "other_required"], Var("document_c")),
condition: And(Var("required_b"), Var("required_c")),
},
If(
Var("condition"),
Create(Collection("A"), { data: { <document A data> } }),
Abort("Cannot create A because the conditions have not been met.")
)
)
The Let function allows you to compose named values for intermediate expressions, which can read or write whatever they need, along with logical operations that determine which conditions need to be tested. The value composition is followed by an expression which, in this example, tests the conditions and only creates the document in the A collection when the conditions are met. When the conditions are not met, the transaction is aborted with an appropriate error message.
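The same check-then-write idea can be sketched outside Fauna. The Python sketch below uses SQLite, not FQL, as a stand-in: the tables `x` and `y`, the column `f` and the link columns `b_id`/`c_id` are hypothetical mappings of the question's collections X and Y and field F, and `BEGIN IMMEDIATE` is SQLite's way of taking the write lock before the check runs. It covers both APIs from the question: creating A only when B and C satisfy F = "V", and modifying a Y document only when nothing in X links to it. Fauna's own isolation mechanism is different; the point is only that the check and the write share one transaction, and a failed check aborts everything.

```python
import sqlite3


class ConditionNotMet(Exception):
    """Plays the role of FQL's Abort(): the whole transaction is rolled back."""


def _in_write_transaction(conn, work):
    # BEGIN IMMEDIATE takes SQLite's write lock before the check runs, so no other
    # writer can change the rows between the check and the write.
    conn.execute("BEGIN IMMEDIATE")
    try:
        result = work()
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return result


def create_a(conn, b_id, c_id, payload):
    """First API: create A only if B and C both satisfy f = 'V' (B and C assumed distinct)."""
    def work():
        valid = conn.execute(
            "SELECT COUNT(*) FROM y WHERE id IN (?, ?) AND f = 'V'", (b_id, c_id)
        ).fetchone()[0]
        if valid != 2:
            raise ConditionNotMet("Cannot create A because the conditions have not been met.")
        cur = conn.execute(
            "INSERT INTO x (payload, b_id, c_id) VALUES (?, ?, ?)", (payload, b_id, c_id)
        )
        return cur.lastrowid
    return _in_write_transaction(conn, work)


def set_f(conn, y_id, new_value):
    """Second API: modify a Y row only if no X row links to it."""
    def work():
        linked = conn.execute(
            "SELECT 1 FROM x WHERE b_id = ? OR c_id = ? LIMIT 1", (y_id, y_id)
        ).fetchone()
        if linked is not None:
            raise ConditionNotMet("Cannot modify: a document in X links to it.")
        conn.execute("UPDATE y SET f = ? WHERE id = ?", (new_value, y_id))
    return _in_write_transaction(conn, work)


# isolation_level=None: the sqlite3 module issues no implicit BEGINs, so we control them.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE y (id INTEGER PRIMARY KEY, f TEXT);
    CREATE TABLE x (id INTEGER PRIMARY KEY, payload TEXT, b_id INTEGER, c_id INTEGER);
    INSERT INTO y (id, f) VALUES (1, 'V'), (2, 'V'), (3, 'W');
    """
)
print(create_a(conn, 1, 2, "linked to B and C"))  # 1
```

With an in-memory database there is only one connection, so the lock is uncontested here; the same functions would apply to a file database shared by several connections.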
Let can nest Lets as much as required, provided the query fits within the maximum query length of 16 MB, so you can embed a significant amount of logic into your queries. When a single query is not sufficient, you can define user-defined functions (UDFs), which let you store business logic that can be called any number of times.
See the E-commerce tutorial for a UDF that performs all of the processing required to submit an order, check if there is sufficient product in stock, deduct requested quantities from inventory, set backordered status, and create the order.

Efficient Querying Data With Shared Conditions

I have multiple sets of data which are sourced from an Entity Framework code-first context (SQL CE). There's a GUI which displays the number of records in each query set, and upon changing some set condition (e.g. Date), the sets all need to recalculate their "count" value.
While every set's query is slightly different in some way, most of them share common conditions in some way. A simple example:
RelevantCustomers = People.Where(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0 && P.Type == "Customer")
RelevantSuppliers = People.Where(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0 && P.Type == "Supplier")
So the thing is, there are enough of these demanding queries that each time the user changes some condition (e.g. SelectedDate), it takes a really long time to recalculate the number of records in each set.
I realise that part of the reason for this is the need to query through, for example, the transactions each time to check what is really the same condition for both RelevantCustomers and RelevantSuppliers.
So my question is: given that these sets share common "base conditions" which depend on the same sets of data, is there a more efficient way to calculate these sets?
I was thinking something with custom generic classes like this:
new QueryGroup<People>(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0)
{
new Query<People>("Customers", P=>P.Type=="Customer"),
new Query<People>("Suppliers", P=>P.Type=="Supplier")
}
I can structure this just fine, but what I'm finding is that it makes basically no difference to the efficiency as it still needs to repeat the "shared condition" for each set.
I've also tried pulling the base condition data out as a static "ToList()" first, but this causes issues when running into navigation entities (i.e. People.Addresses don't get loaded).
Is there some method I'm not aware of here in terms of efficiency?
Thanks in advance!
Give something like this a try: combine "similar" values into fewer queries, then separate the results afterwards. Also, use Any() rather than Count() for existence checks. Your updated attempt goes part-way, but will still result in two hits to the database. When querying, it also helps to ensure that you are filtering on indexed fields, and those indexes will be more efficient with numeric IDs than with strings (i.e. a TypeID of 1 vs. 2 for "Customer" vs. "Supplier"). Normalized values are better for indexing and lead to smaller records, at the cost of somewhat more verbose queries.
var types = new string[] {"Customer", "Supplier"};
var people = People.Where(p => types.Contains(p.Type)
&& p.Transactions.Any(t => t.Date > selectedDate)).ToList();
var relevantCustomers = people.Where(p => p.Type == "Customer").ToList();
var relevantSuppliers = people.Where(p => p.Type == "Supplier").ToList();
This results in just one hit to the database, and Any() should perform better than computing a full count. We split the customers and suppliers afterwards from the in-memory set. The caveat is that any attempt to access details such as transactions on customers and suppliers would trigger lazy-load hits, since we didn't eager load them. If you need entire entity graphs, be sure to .Include() the relevant details, or be more selective about the data extracted from the first query, e.g. select anonymous types with the applicable details rather than the whole entity.
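The single-query-then-split approach can be sketched with Python and SQLite as a stand-in for the EF/LINQ code above: one SQL statement uses `EXISTS` (what `Any()` usually translates to) for the shared transaction condition and `IN` for the types, and the rows are grouped by type in memory. The schema, the composite index and the ISO date strings are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List, Sequence


def relevant_people(conn: sqlite3.Connection, selected_date: str, types: Sequence[str]) -> Dict[str, List[str]]:
    """One round trip for all types; the shared condition is evaluated once per person."""
    placeholders = ", ".join("?" for _ in types)
    sql = (
        "SELECT p.name, p.type FROM people p "
        f"WHERE p.type IN ({placeholders}) "
        # EXISTS can stop at the first matching transaction instead of counting them all.
        "AND EXISTS (SELECT 1 FROM transactions t WHERE t.person_id = p.id AND t.date > ?) "
        "ORDER BY p.id"
    )
    groups: Dict[str, List[str]] = {t: [] for t in types}
    for name, person_type in conn.execute(sql, (*types, selected_date)):
        groups[person_type].append(name)  # split by type in memory, after the query
    return groups


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE transactions (person_id INTEGER, date TEXT);
    CREATE INDEX ix_transactions_person_date ON transactions (person_id, date);
    INSERT INTO people VALUES (1, 'Ann', 'Customer'), (2, 'Bob', 'Supplier'),
                              (3, 'Cid', 'Customer'), (4, 'Dee', 'Employee');
    INSERT INTO transactions VALUES (1, '2024-03-01'), (1, '2024-04-01'), (2, '2024-05-10'),
                                    (3, '2023-12-31'), (4, '2024-06-01');
    """
)
result = relevant_people(conn, "2024-01-01", ["Customer", "Supplier"])
print(result)  # {'Customer': ['Ann'], 'Supplier': ['Bob']}
print({t: len(names) for t, names in result.items()})  # {'Customer': 1, 'Supplier': 1}
```

Ann has two qualifying transactions but appears once, since `EXISTS` only tests for at least one match.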

How to query multiple aggregates efficiently with DDD?

When I need to invoke some business method, I need to load all aggregate roots related to the operation, even if the operation is as primitive as the one below (just adding an item to a collection). What am I missing? Or is a CRUD-based approach, where you run a single query with table joins, selects and an insert at the end and the database engine does all the work for you, actually better in terms of performance?
In the code below I need to query a separate aggregate root (which opens another database connection and sends another select query). In real-world applications I have been querying many more than one aggregate, up to 8 for a single business action. How can I reduce this performance and query overhead?
Domain aggregate roots:
class Device
{
HashSet<ParameterId> parameters = new HashSet<ParameterId>();
void AddParameter(Parameter parameter)
{
parameters.Add(parameter.Id);
}
}
class Parameter
{
ParameterId Id { get; }
}
Application layer:
class DeviceApplication
{
private DeviceRepository _deviceRepo;
private ParameterRepository _parameterRepo;
void AddParameterToDevice(string deviceId, string parameterId)
{
var aParameterId = new ParameterId(parameterId);
var aDeviceId = new DeviceId(deviceId);
var parameter = _parameterRepo.FindById(aParameterId);
if (parameter == null) throw new InvalidOperationException("Parameter not found.");
var device = _deviceRepo.FindById(aDeviceId);
if (device == null) throw new InvalidOperationException("Device not found.");
device.AddParameter(parameter);
_deviceRepo.Save(device);
}
}
Possible solution
I've been told that you can pass just an Id of another aggregate like this:
class Device
{
void AddParameter(ParameterId parameterId)
{
parameters.Add(parameterId);
}
}
But in my opinion this breaks encapsulation (by explicitly bringing the notion of an ID into the business logic), and it also doesn't prevent passing a wrong or otherwise invalid identity (e.g. one created by the user).
And Vaughn Vernon gives examples of application services that use the first approach (passing the whole aggregate instance).
The short answer is - don't query your aggregates at all.
An aggregate is a model that exposes behaviour, not data. Generally, it is considered a code smell to have getters on aggregates (ID is the exception). This makes querying a little tricky.
Broadly speaking, there are two related ways to solve this. There are probably more, but at least these don't break encapsulation.
Option 1: Use domain events -
By getting your domain (aggregate roots) to emit events that describe the changes to internal state, you can build up tables in your database specifically designed for querying. Done right, you will have highly performant, denormalised, queryable data, which can be scaled linearly if necessary. This makes for very simple queries. I have an example of this in this blog post: How to Build a Master-Details View when using CQRS and Event Sourcing
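Option 1 can be sketched in Python: the aggregate records a domain event whenever its state changes, and a projection writes those events into a SQLite table shaped purely for querying. `ParameterAdded`, `DeviceParametersProjection` and the table layout are hypothetical; in a real system the events would be dispatched after the aggregate is saved, often asynchronously.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ParameterAdded:
    """Hypothetical domain event emitted by the Device aggregate."""
    device_id: str
    parameter_id: str


class Device:
    def __init__(self, device_id: str) -> None:
        self._id = device_id
        self._parameter_ids: Set[str] = set()
        self.pending_events: List[ParameterAdded] = []

    def add_parameter(self, parameter_id: str) -> None:
        if parameter_id in self._parameter_ids:
            return  # no state change, so no event
        self._parameter_ids.add(parameter_id)
        self.pending_events.append(ParameterAdded(self._id, parameter_id))


class DeviceParametersProjection:
    """Keeps a query-only table in sync with ParameterAdded events."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS device_parameters ("
            "device_id TEXT, parameter_id TEXT, PRIMARY KEY (device_id, parameter_id))"
        )

    def handle(self, event: ParameterAdded) -> None:
        # INSERT OR IGNORE makes replaying the same event harmless.
        self._conn.execute(
            "INSERT OR IGNORE INTO device_parameters VALUES (?, ?)",
            (event.device_id, event.parameter_id),
        )

    def parameter_ids(self, device_id: str) -> List[str]:
        # The query side reads the projection directly; no aggregate is loaded.
        rows = self._conn.execute(
            "SELECT parameter_id FROM device_parameters WHERE device_id = ? ORDER BY parameter_id",
            (device_id,),
        )
        return [pid for (pid,) in rows]


conn = sqlite3.connect(":memory:")
projection = DeviceParametersProjection(conn)
device = Device("dev-1")
device.add_parameter("p-2")
device.add_parameter("p-1")
device.add_parameter("p-1")  # already present: no second event
for event in device.pending_events:  # normally dispatched after the aggregate is saved
    projection.handle(event)
device.pending_events.clear()
print(projection.parameter_ids("dev-1"))  # ['p-1', 'p-2']
```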
Option 2: Infer query tables -
I'm a huge fan of option 1, but if you don't have an event-sourced approach you will still need to persist the state of your aggregates at some point. There are all sorts of ways to do this, but you could plug a process into the persistence pipeline of your aggregates that extracts queryable data into a read model for use with your queries.
I hope that makes sense.
If you find that an RDBMS query with joins would work better in this case, you probably have the wrong aggregate boundaries.
For example, why would you need to load the Parameter in order to add it to the Device? You already have the identity of this Parameter; all you need to do is add this id to the list of referenced Parameters in the Device. If you load it only to satisfy your ORM, you're most probably doing something wrong.
Also remember that your aggregate is the transactional boundary. You really would want to complete all database operations inside one transaction and one connection.

Getting specific Backbone.js models from a collection without getting all models first

I'm new to Backbone.js. I'm intrigued by the idea that you can just supply a URL to a collection and then proceed to create, update, delete, and get models from that collection and it handle all the interaction with the API.
In the small task management sample applications and numerous demos I've seen of this on the web, it seems that collection.fetch() is used to pull down all models from the server and then do something with them. However, more often than not, in a real application you don't want to pull down hundreds of thousands or even millions of records by issuing a GET request to the API.
Using the baked-in collection.sync method, how can I specify parameters to GET specific record sets? For example, I may want to GET records with a date of 2/1/2014, or records that are owned by a specific user id.
In this question, collection.find is used to do this, but does it still pull all records down to the client first and then "find" them, or does collection.sync know to pass arguments when doing a GET to the server?
You do use fetch, but you provide options as seen in collection.fetch([options]).
So for example to obtain the one model where id is myIDvar:
collection.fetch(
{
data: { id: myIDvar },
success: function (collection, response, options) {
// do a little dance;
}
});
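My offhand recollection is that find, findWhere and where only operate on the models already loaded into the collection: they don't make a request, so everything has to be downloaded first and the filtering takes place on the client. With fetch, the data option is sent to the server as query parameters, so the filtering can take place on the server side, provided your API supports those parameters.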
You can implement some kind of pagination on the server side and update your collection with a limited number of records. That way your data stays up to date with the backend.
You can do this by overriding the fetch method with your own implementation, or by specifying params.
For example:
collection.fetch({data: {page: 3}});
You can also use the findWhere method, but it only searches models that are already in the collection:
collection.findWhere(attributes)
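Since the filtering has to happen on the server, a Python sketch of both halves may help: `fetch_url` mimics how a GET's `data` option ends up in the query string (Backbone.sync hands it to jQuery.ajax, which serialises it for GET requests), and `handle_get` is a hypothetical server endpoint that filters by `id`, `owner` or `date` and returns one page. The task records, field names and page size are made up for illustration.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side data.
TASKS: List[Dict[str, str]] = [
    {"id": "1", "owner": "42", "date": "2014-02-01"},
    {"id": "2", "owner": "7", "date": "2014-02-01"},
    {"id": "3", "owner": "42", "date": "2014-02-02"},
    {"id": "4", "owner": "42", "date": "2014-02-03"},
    {"id": "5", "owner": "42", "date": "2014-02-04"},
]
FILTERABLE = ("id", "owner", "date")


def fetch_url(collection_url: str, data: Dict[str, object]) -> str:
    # Mimics a GET request: the `data` option is appended as a query string.
    return f"{collection_url}?{urlencode(data)}"


def handle_get(url: str, page_size: int = 2) -> List[Dict[str, str]]:
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    # Filter on the server, so only matching records cross the wire.
    matches = [t for t in TASKS if all(t[f] == params[f] for f in FILTERABLE if f in params)]
    page = int(params.get("page", "1"))  # 1-based page number
    start = (page - 1) * page_size
    return matches[start:start + page_size]


url = fetch_url("/tasks", {"owner": 42, "page": 2})
print(url)  # /tasks?owner=42&page=2
print([t["id"] for t in handle_get(url)])  # ['4', '5']
```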

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (the db is optimized and performs this very quickly), but it is a bit too much for Python to handle: each row references a long string storing the URLs for thumbnails.
I only really need three fields from each row, but, if all the fields are included, it suddenly consumes about 5kB/row which sadly pushes the RAM to the limit.
The values(*fields) method lets you specify which fields you want: only those columns are selected, and each row comes back as a dict containing just those keys.
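With Django this would look like `Photograph.objects.filter(**movie.get_selectors()).values('id', 'movie_id', 'taken_at')`, where the three field names are hypothetical since the question doesn't name them. The Python sketch below shows the equivalent query with plain sqlite3, not Django, so that the large `thumbnail_urls` column never reaches Python. Iterating over the cursor rather than calling `fetchall()` also avoids holding every row at once, in a similar spirit to Django's `QuerySet.iterator()`. The schema is assumed.

```python
import sqlite3
from typing import Dict, Iterator, Sequence

# Hypothetical schema: three small fields plus the large thumbnail column.
COLUMNS = {"id", "movie_id", "taken_at", "thumbnail_urls"}


def narrow_rows(conn: sqlite3.Connection, movie_id: int, fields: Sequence[str]) -> Iterator[Dict[str, object]]:
    unknown = set(fields) - COLUMNS
    if unknown:
        # Column names cannot be bound as parameters, so only whitelisted names are interpolated.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    sql = f"SELECT {', '.join(fields)} FROM photograph WHERE movie_id = ? ORDER BY id"
    # Iterating the cursor streams rows instead of building one big list with fetchall().
    for row in conn.execute(sql, (movie_id,)):
        yield dict(row)


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support keys(), so dict(row) works
conn.execute(
    "CREATE TABLE photograph (id INTEGER PRIMARY KEY, movie_id INTEGER, taken_at TEXT, thumbnail_urls TEXT)"
)
conn.executemany(
    "INSERT INTO photograph (movie_id, taken_at, thumbnail_urls) VALUES (?, ?, ?)",
    [(1, "2014-01-01", "x" * 5000), (1, "2014-01-02", "y" * 5000), (2, "2014-01-03", "z" * 5000)],
)
for row in narrow_rows(conn, 1, ["id", "movie_id", "taken_at"]):
    print(row)
```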
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet will not load the other fields of your objects until you try to access them (each such access triggers an additional query).
If you have to deal with ForeignKeys that must also be fetched, also check out select_related.
The two links above to the Django documentation have good examples, that should clarify their use.
Take a look at Django Debug Toolbar. It comes with a debugsqlshell management command that lets you see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.