How to Model Complex Query Classes in CQRS

How should Query classes be modeled in CQRS, given that data is accumulated from various places and business logic is then run on top of this data? Currently, we have code that pulls out the required data in a Manager class and business logic in a Domain Model. Is there a better way? High-level suggestions will help. The hierarchy is: WebAPI Controller -> Manager -> Domain Model, with the Manager also calling Infrastructure (to get the required data).

Generally speaking, write models (updated via Commands) do not mirror the read models (fetched via Queries).
Write models (Aggregate Roots) are designed to ensure consistency and invariants of a domain, while read models are mostly used to build UI and/or an API.
If you design a simple domain for a blog, you may have a Post aggregate on the write side, and read models such as PostSummary, PostDetails, or even a simple Post.
Both are named similarly but used in different contexts.
Your aggregate will probably refer to its author only by reference (id), while your read model may be flattened and pre-built with all the necessary information required by your UI.
You end up with two models, where your aggregate does not even expose any getters (that's the read model's purpose).
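As a rough C# sketch of the blog example above (the specific fields of PostSummary are hypothetical):

using System;

// Write model: the aggregate enforces invariants and exposes behaviour, not getters.
public class Post
{
    private readonly Guid _id;
    private readonly Guid _authorId;   // the author is held by reference (id) only
    private string _title;
    private bool _published;

    public Post(Guid id, Guid authorId, string title)
    {
        if (string.IsNullOrWhiteSpace(title))
            throw new ArgumentException("A post must have a title.");
        _id = id;
        _authorId = authorId;
        _title = title;
    }

    public void Publish()
    {
        if (_published)
            throw new InvalidOperationException("Post is already published.");
        _published = true;
    }
}

// Read model: flattened and pre-built for the UI, plain getters and no behaviour.
public class PostSummary
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string AuthorName { get; set; }   // denormalized so the UI needs no join
    public int CommentCount { get; set; }
}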

It sounds like you're doing just the C part of CQRS, not the Q. In CQRS there are two data models: one that is updated via commands (the write model) and one that is custom-made just for display purposes (the read model). When a command makes a change to data, it loads a full aggregate with business rules from the write model, makes the appropriate changes, and saves. It then (usually by sending a message) requests an update of the read model.
The read model is a collection of tables that are custom-built for specific target UI pages. Data duplication is everywhere. The idea is that reads should be very fast because they are just a "select *" query from the read table.
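A minimal sketch of that flow, with hypothetical handler and message names (the Post aggregate is the one sketched above):

using System;

public class PublishPost { public Guid PostId; }
public class PostPublished { public Guid PostId; public PostPublished(Guid postId) { PostId = postId; } }

public interface IPostRepository { Post Load(Guid id); void Save(Post post); }
public interface IMessageBus { void Publish(object message); }

public class PublishPostHandler
{
    private readonly IPostRepository _posts;  // write-side repository
    private readonly IMessageBus _bus;        // assumed messaging abstraction

    public PublishPostHandler(IPostRepository posts, IMessageBus bus)
    {
        _posts = posts;
        _bus = bus;
    }

    public void Handle(PublishPost command)
    {
        var post = _posts.Load(command.PostId);  // full aggregate with business rules
        post.Publish();                          // invariants enforced in the aggregate
        _posts.Save(post);
        _bus.Publish(new PostPublished(command.PostId));  // request a read model update
    }
}

// A denormalizer listens for the event and updates the flat read table,
// e.g. UPDATE post_summaries SET published = 1 WHERE id = @PostId
public class PostSummaryDenormalizer
{
    public void When(PostPublished e) { /* write to the read table here */ }
}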
If you had implemented a read model, this question wouldn't arise, because there would be no complex query classes. If you've not implemented CQRS, then the normal advice applies, such as creating repositories to contain the query, etc.


Best practice around GraphQL nesting depth

Is there an optimum maximum depth to nesting?
We are often presented with the option of representing complex hierarchical data models with the nesting they demonstrate in real life. In my work this is genetics, modelling protein / transcript / homology relationships, where it is possible to have very deep nesting of maybe 7-8 levels. We use dataloader to make nested batching more efficient, and resolver-level caching with directives. Is it good practice to model a schema on the real-life data model, or should you focus on making your resolvers reasonable to query and keep nesting to a maximum ideal depth of, say, 4 levels?
When designing a schema, is it better to create a different parent resolver for a type, or to use arguments to direct a conditional response?
Say I have two sets of 'cars': cars produced by Volvo and cars produced by Tesla. The underlying data, while having similarities, is originally pulled from different APIs with different characteristics. Is it best practice to have tesla_cars and volvo_cars resolvers, or one cars resolver which uses, for example, a manufacturer argument to act differently on the data it returns and homogenise the response, especially where a sub-resolver may expect certain fields which are not similar in the original data?
Or is it better to say that these two things are both cars, but the shape of the data we have for them is significantly different, so it's better to create separate resolvers with totally or notably different fields?
Should my resolvers and GraphQL APIs try to model the data they describe, or should I allow duplication in order to create efficient, application-focused queries and responses?
We often find ourselves wondering: do we have a separate API for applications X and Y, which may use the underlying data (and possibly even multiple sources, different databases or even API calls inside resolvers) very differently, or should we try to make a resolver work with any application, even if that means using type-like arguments to allow custom filtering and conditional behaviour?
Is there an optimum maximum depth to nesting?
In general I'd say: don't restrict your schema. Your resolvers / data fetchers will only get called when the client requests the corresponding fields.
Look at it from this point of view: if your client needs the data from 8 levels of the hierarchy to work, then it will ask for it no matter what. With a restricted schema the client will execute multiple requests; with an unrestricted schema it can get all it needs in a single request. The amount of processing on your server side and the amount of data will still be the same, just split across multiple network requests.
The unrestricted schema has several benefits:
The client can decide whether it wants all the data at once or spread across multiple requests
The server may be able to optimize the data fetching process (e.g. avoid fetching duplicate data) when it knows everything the client wants to receive
The restricted schema on the other hand has only downsides.
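For illustration, a hypothetical unrestricted schema for the genetics example from the question; the deeper resolvers only run when a query actually selects those fields:

type Gene {
  id: ID!
  symbol: String!
  transcripts: [Transcript!]!   # resolver fires only if this field is requested
}

type Transcript {
  id: ID!
  proteins: [Protein!]!
}

type Protein {
  id: ID!
  homologs: [Protein!]!         # nesting can go as deep as the data does
}

type Query {
  gene(id: ID!): Gene
}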
When designing a schema is it better to create a different parent resolver for a type or use arguments to direct a conditional response
That's a matter of taste and what you want to achieve. But if you expect your application to grow and incorporate more car manufacturers, your API may become messy if there are lots of abc_cars and xyz_cars queries.
Another thing to keep in mind: even if the shape of the data is different, all cars have something in common: they are some kind of type Car. And all of them have, for example, a construction year. If you now want to be able to query "all cars sorted by construction year", you will need a single query endpoint.
You can have a single cars query endpoint in your API and then use interfaces to query different kinds of cars, just like GraphQL Relay's node endpoint works: a single endpoint that can query all types that implement the Node interface, as sketched below.
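A sketch of what such a single endpoint with interfaces might look like (the manufacturer-specific field names are made up):

interface Car {
  id: ID!
  constructionYear: Int!
}

type VolvoCar implements Car {
  id: ID!
  constructionYear: Int!
  safetyRating: Float          # manufacturer-specific field
}

type TeslaCar implements Car {
  id: ID!
  constructionYear: Int!
  batteryRangeKm: Int          # manufacturer-specific field
}

type Query {
  # one endpoint for all manufacturers; "all cars sorted by construction year"
  # becomes possible because every Car shares that field
  cars(sortByConstructionYear: Boolean): [Car!]!
}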
On the other hand, if you've got a very specialized application where your type is not extensible (like, for example, white and black chess pieces), then I think it's totally valid to have white_pieces and black_pieces endpoints in your API.
Another thing to keep in mind: With a single endpoint some queries become extremely hard (or even impossible), like "sort white_pieces by value ascending, and black_pieces by value descending". This is much easier if there are separate endpoints for each color.
But even this is solvable if you have a single endpoint for all pieces, and simply call it twice.
Should my resolvers and GraphQL APIs try to model the data they describe, or should I allow duplication in order to create efficient, application-focused queries and responses?
That's a question of use case and scalability. If you have exactly two types of clients that use the API in different ways, just build two separate APIs. But if you expect your application to grow and gain more different clients, then of course it will become an unmaintainable mess to have 20 APIs.
In this case, have a look at schema directives. You can, for example, decorate your types and fields to make them behave differently for each client, or even show/hide parts of your API depending on the client, as sketched below.
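A sketch of the idea with a hypothetical directive (the declaration syntax is standard SDL, but the hiding logic itself would live in your server implementation, not in the schema):

directive @visibleTo(clients: [String!]!) on FIELD_DEFINITION

type Car {
  id: ID!
  constructionYear: Int!
  dealerMargin: Float @visibleTo(clients: ["backoffice"])  # hidden from other clients
}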
Summary:
Build your API with your clients in mind.
Keep things object-oriented; make use of interfaces for similar types.
Don't provide endpoints your clients don't need; you can still extend your schema later if necessary.
Think of your data as a huge graph ;) that's what GraphQL is all about.

In DDD, are repositories the only type of classes which can touch persistence?

In DDD, aggregate roots are persisted via repositories. But are repositories the only classes that can touch persistence in a bounded context?
I am using CQRS alongside DDD. On the query side, things like view counts and upvotes need to be persisted, but I feel it is awkward to model them as aggregate roots. I am limiting DDD aggregate root modeling to the command side; the query side is not allowed to use repositories. But the query side often asks for a small amount of persistence capability.
Also, I am using domain events, and certain domain events also need to be persisted. I need something like event storage, but I have only heard that term in the context of event sourcing (ES), and I am not using ES.
If such persistence classes are indeed needed, what should I call them, and which layer should they belong to?
[Update]
When I read the answers below, I realized my question is a bit ambiguous. By touch, I mainly mean write (but also read).
Thanks.
On the query side, things like view counts and upvotes need to be persisted
Not necessarily. CQRS doesn't specify
whether the read model should be materialized in its own database
how the read model is updated
The simplest CQRS implementation is one where the query side and command side use the same tables. The persistent source for Read Models could also be SQL (materialized) views based on these tables. If you do have a separate database for reads, it can be kept up-to-date by additional Command Handlers or sub-handlers or Event Handlers that operate after the command has been executed.
You can see a minimalist - yet perfectly CQRS compliant - implementation here : https://github.com/gregoryyoung/m-r/tree/master/SimpleCQRS
But are repositories the only classes that can touch persistence in a bounded context?
No. In a CQRS context, Read Model Facades (a.k.a. read-side repositories) can also read from persistence, and your read model update mechanism writes to it.
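A sketch of what a Read Model Facade could look like in C#, reading straight from a read table or SQL view (table and column names are hypothetical):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Flat DTO for the UI; no aggregate, no invariants.
public class PostSummaryDto
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public int UpvoteCount { get; set; }
    public int ViewCount { get; set; }
}

// Thin read-side facade: just a query against the read store.
public class PostReadModelFacade
{
    private readonly string _connectionString;

    public PostReadModelFacade(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IList<PostSummaryDto> GetFrontPage()
    {
        var result = new List<PostSummaryDto>();
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Id, Title, UpvoteCount, ViewCount FROM PostSummaries", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    result.Add(new PostSummaryDto
                    {
                        Id = reader.GetGuid(0),
                        Title = reader.GetString(1),
                        UpvoteCount = reader.GetInt32(2),
                        ViewCount = reader.GetInt32(3)
                    });
                }
            }
        }
        return result;
    }
}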
Also, I am using domain events, and certain domain events also need to be persisted. I need something like event storage, but I have only heard that term in the context of event sourcing (ES), and I am not using ES.
Event stores are the main storage technology of event-sourced systems. You could use one to store a few domain events on the side in a non-ES application, but it may be overkill and too complex for the task. It depends on whether you need all the guarantees an event store offers in terms of delivery, consistency, concurrency/versioning, etc. Otherwise, a regular RDBMS or NoSQL store can do the trick.
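If a regular RDBMS is enough, persisting a domain event can be as simple as appending one serialized row per event to an ordinary table; a hypothetical sketch:

using System;

// One row per domain event; the payload is serialized (e.g. to JSON)
// by whatever serializer you already use.
public class StoredEvent
{
    public Guid Id { get; set; }
    public string EventType { get; set; }    // e.g. "PostPublished"
    public string Payload { get; set; }      // serialized event data
    public DateTime OccurredOn { get; set; }
}

public interface IDomainEventStore
{
    // Implemented as a plain INSERT INTO stored_events ...
    void Append(StoredEvent storedEvent);
}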
First, you need to think about your object model independently of how you will store it in the database. You're designing an object model; forget about the database for a moment.
You're saying that you don't want view counts or upvotes to be aggregate roots. That means you want to put them in an aggregate with some other objects. One of those objects is the aggregate root.
Without knowing more about your model it's hard to give details, but the basic approach would be to persist the aggregate root with the corresponding repository. The repository is responsible for storing not only the aggregate root but the entire aggregate, following the relationships.
Think about the other side, when you use the repository to retrieve an entity: you get an instance of your aggregate root, and if you follow the relationships, you also have all those other objects. It's perfectly logical that when you save an entity, all those other objects are saved too.
I don't know which technology you're using, but you should write your repository so that it does that.
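A sketch of that idea, assuming a Post aggregate that owns its upvotes (all names are hypothetical):

using System;
using System.Collections.Generic;

public class Upvote   // lives inside the aggregate, not an aggregate root itself
{
    public Guid UserId { get; set; }
    public DateTime CastAt { get; set; }
}

public class Post     // the aggregate root
{
    private readonly List<Upvote> _upvotes = new List<Upvote>();

    public Guid Id { get; private set; }

    public void AddUpvote(Guid userId)
    {
        _upvotes.Add(new Upvote { UserId = userId, CastAt = DateTime.UtcNow });
    }
}

public interface IPostRepository
{
    // Saving the root persists the whole aggregate, upvotes included;
    // retrieving it brings the upvotes back by following the relationship.
    Post Get(Guid id);
    void Save(Post post);
}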
Also, why is the query side not allowed to use repositories? Repositories are not only used to save data; they are also used to retrieve it. How are you retrieving objects without repositories (even if you don't modify them)?

Kohana ORM & MVC

Although I am pretty decent at PHP, I am new to frameworks.
I started with CI last week and found myself looking at Kohana this week.
I have a few questions in that regard:
Why ORM vs traditional SQL or active queries?
If the model must fetch data from the DB, how come in ORM most of the action happens in the controller (or so it seems), e.g. $data = $q->where('category', '=', 'articles')->find_all();
How would I do a conditional query in ORM (something like if (isset($_GET['category'])) ... etc.)? Is the condition passed to the model, or should the controller handle all the conditions?
FYI, my queries tend to have numerous joins, and my limited knowledge tells me that I should have a query controller that passes query parameters to a query model, which does the query and returns the results.
Please let me know if my understanding is correct.
Thank you very much.
ORM is a kind of wrapper over the DB layer. So you just call $user->find($id) instead of $db->query('select * from users where id='.$id) or DB::select()->from('users')->where('id', '=', $id)->limit(1)->execute(). You declare the model params (table name, relations, etc.) and use only model methods to work with its data. You can easily change the DB structure or DB engine without modifying a lot of controller code.
Agreed with Ikke: the controller should avoid model-specific data like query conditions. For example, create a method get_by_category($category).
See #2. All the arguments you want should be passed into the model method (this can be done using chaining, like $object->set_category($category)->set_time_limit(time())->limit(10)).
ORM is just another way to get at your data. The idea is that there are many common kinds of operations that could be automated, and because the relations between tables can easily be translated to objects referencing each other, ORM was created.
It's up to you if you want to use the supplied ORM module. There are others which are also commonly used (like Sprig, Jelly and Auto-Modeler).
My personal opinion is to limit that kind of operation in controllers to a minimum. Very simple operations can be done there, since placing them in the model barely produces any advantage, but the best way is to put as much of the business logic in the models as possible.
Another point is that it should be the view that gets the data from the models. That way, when you want to reuse a view, very little code has to be duplicated. But to prevent too much logic from getting into your views, it's recommended to use so-called view classes, which contain the logic for your views and act as the interface for your views to talk to.
There is a Validation library to make sure that all the data for your model is correct. Your models shouldn't know about $_GET and $_POST, but the data from those arrays can be passed to your models.

Complex taxonomy ORM mapping - looking for suggestions

In my project (ASP.NET MVC + NHibernate) I have all my entities, let's say Documents, described by a set of custom metadata. The metadata is contained in a structure that can have multiple tags, categories, etc. These terms have the most importance for users seeking the document they want, so they have an impact on views as well as on the underlying data structures, database querying, etc.
From the view side of the application, what interests me the most are the string values of the terms. Ideally I would like to operate directly on collections of strings like this:
class MetadataAsSeenInViews
{
    public IList<string> Categories;
    public IList<string> Tags;
    // etc.
}
From the model perspective, I could use the same structure, do the simplest possible ORM mapping, and use it in queries like "fetch all documents with metadata exactly like this".
But that kind of structure could turn out to be useless if the application needs to perform complex database queries like "fetch all documents for which at least one of the categories is IN (cat1, cat2, ..., catN) OR at least one of the tags is IN (tag1, ..., tagN)". In that case, for performance reasons, we would probably use numeric keys for categories and tags.
So one can imagine a structure, opposite to MetadataAsSeenInViews, that operates on numeric keys and provides complex mappings from integers to strings and the other way round. But that solution doesn't really satisfy me, for several reasons:
it smells like a single-responsibility violation, as we're dealing with database-specific issues when we just want to describe the Document business object
database keys are leaking through all layers
it adds unnecessary complexity in views
and I believe it doesn't take advantage of what a good ORM can do
Ideally I would like to have:
a single, as-simple-as-possible metadata structure (ideally like the one at the top) across my whole application
complex querying issues addressed only in the database layer (meaning DB + ORM + as little additional code as possible in the data layer)
Do you have any ideas on how to structure the code and do the ORM mappings so that they are as elegant, effective, and performant as possible?
I have found that it is problematic to use domain entities directly in the views. To help decouple things I apply two different techniques.
Most importantly I'm using separate ViewModel classes to pass data to views. When the data corresponds nicely with a domain model entity, AutoMapper can ease the pain of copying data between them, but otherwise a bit of manual wiring is needed. Seems like a lot of work in the beginning but really helps out once the project starts growing, and is especially important if you haven't just designed the database from scratch. I'm also using an intermediate service layer to obtain ViewModels in order to keep the controllers lean and to be able to reuse the logic.
The second option is mostly for performance reasons, but I usually end up creating custom repositories for fetching data that spans entities. That is, I create a custom class to hold the data I'm interested in, and then write custom LINQ (or whatever) to project the result into that. This can often dramatically increase performance over just fetching entities and applying the projection after the data has been retrieved.
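A sketch of that second technique, projecting directly into a custom result class with LINQ (entity and property names are hypothetical):

using System.Collections.Generic;
using System.Linq;

public class Document
{
    public int Id { get; set; }
    public string Title { get; set; }
    public ICollection<string> Tags { get; set; }
}

// Holds exactly the fields the view needs, nothing more.
public class DocumentListItem
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int TagCount { get; set; }
}

public class DocumentQueryRepository
{
    private readonly IQueryable<Document> _documents;  // however your ORM exposes it

    public DocumentQueryRepository(IQueryable<Document> documents)
    {
        _documents = documents;
    }

    public DocumentListItem[] GetListItems()
    {
        // The projection runs in the database, not after the entities are materialized.
        return _documents
            .Select(d => new DocumentListItem
            {
                Id = d.Id,
                Title = d.Title,
                TagCount = d.Tags.Count
            })
            .ToArray();
    }
}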
Let me know if I haven't been elaborate enough.
The solution I finally implemented doesn't fully satisfy me, but it will do for now.
I've divided my tags/categories into "real entities", mapped in NHibernate as separate entities, and "references", mapped as components belonging to the entities they describe.
So in my C# code I have two separate classes, TagEntity and TagReference, which both carry the same information from the domain perspective. TagEntity knows the database id and is managed by NHibernate sessions, whereas TagReference carries only the tag name as a string, so it is quite handy to use throughout the application; if needed, it is still easily convertible to TagEntity using a static lookup dictionary.
That entity/reference separation allows me to query the database more efficiently, joining only two tables, like select ... from articles join articles_tags ... where articles_tags.tag_id = X, without joining the tags table, which would also be joined when doing simple, fully object-oriented NHibernate queries.
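A sketch of the entity/reference split described above (simplified; the lookup mechanics are hypothetical):

using System.Collections.Generic;

// Mapped in NHibernate as a separate entity with its own table and id.
public class TagEntity
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

// Mapped as a component on the entities it describes; carries only the name.
public class TagReference
{
    public string Name { get; set; }

    // Convert back to the managed entity via a static lookup dictionary.
    public TagEntity ToEntity(IDictionary<string, TagEntity> lookup)
    {
        return lookup[Name];
    }
}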

What is the difference between Database Abstraction Layer & Data Access Layer?

I am actually stuck on 3-tier structure. I surfed the internet and found two terms: "Database Abstraction Layer" and "Data Access Layer".
What are the differences between the two?
Data Access Layer = Create, Read, Update, Delete (CRUD) operations specific to your application domain.
Data Abstraction Layer = generic database operations (connections, commands, parameters), insulating you from vendor-specific data libraries and providing one high-level API for accessing data regardless of whether you use MySQL, Microsoft SQL Server, Oracle, DB2, etc.
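To make that contrast concrete, a hypothetical C# sketch: the DAL speaks in domain terms, while the abstraction layer speaks in generic database terms:

using System.Collections.Generic;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Data Access Layer: CRUD in terms of your application domain.
public interface IUserDal
{
    User GetById(int id);
    void Register(User user);
    void Delete(int id);
}

// Database Abstraction Layer: generic, vendor-neutral operations.
public interface IDatabase
{
    void Open(string connectionString);
    IEnumerable<IDictionary<string, object>> Query(string sql, params object[] args);
    int Execute(string sql, params object[] args);
}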
My understanding is that a data access layer does not actually abstract the database, but rather makes database operations and query building easier.
For example, data access layers usually have APIs very similar to SQL syntax that still require knowledge of the database's structure in order to write:
$Users->select('name,email,datejoined')->where('rank > 0')->limit(10);
Data abstraction layers are usually full-blown ORMs (Object-Relational Mappers) that theoretically remove the need to understand the underlying database structure or have any knowledge of SQL. The syntax might be something like this:
Factory::find('Users', 10)->filter('rank > 0');
And all the objects might be fully populated with all the fields, possibly joined with any parent or child objects if you set it that way.
However, this abstraction comes at a price. I personally find ORMs like Doctrine or Propel to be unnecessary and inefficient. In most cases a simple data access layer will do fine, with manual SQL for anything that requires special attention, instead of destroying your application's performance for some syntactic sugar. This area is a pretty heated debate, so I won't go into it any further.
If you meant database abstraction layer, then it would be something along the lines of PDO, so that your code can be used with a larger number of database vendors. PDO works with MySQL, PostgreSQL, and SQLite, among others, I believe.
From Wiki:
Data Access Layer
A data access layer (DAL) in computer software is a layer of a computer program which provides simplified access to data stored in persistent storage of some kind, such as an entity-relational database.

For example, the DAL might return a reference to an object (in terms of object-oriented programming) complete with its attributes instead of a row of fields from a database table. This allows the client (or user) modules to be created with a higher level of abstraction. This kind of model could be implemented by creating a class of data access methods that directly reference a corresponding set of database stored procedures. Another implementation could potentially retrieve or write records to or from a file system. The DAL hides this complexity of the underlying data store from the external world.

For example, instead of using commands such as insert, delete, and update to access a specific table in a database, a class and a few stored procedures could be created in the database. The procedures would be called from a method inside the class, which would return an object containing the requested values. Or, the insert, delete and update commands could be executed within simple functions like registeruser or loginuser stored within the data access layer.
In short: your basic CRUD functionality/logic on business objects, pushing to and pulling from the persistence/storage layer, falls here. For most cases you might want just this. ORM mapping, interfaces of the Model's business objects, etc. fall here.
Database Abstraction Layer
A database abstraction layer is an application programming interface which unifies the communication between a computer application and databases such as SQL Server, DB2, MySQL, PostgreSQL, Oracle or SQLite. Traditionally, all database vendors provide their own interface tailored to their products, which leaves it to the application programmer to implement code for all database interfaces he or she would like to support. Database abstraction layers reduce the amount of work by providing a consistent API to the developer and hide the database specifics behind this interface as much as possible. There exist many abstraction layers with different interfaces in numerous programming languages.
Basically, it's an additional layer of abstraction so that you CRUD against vendor-independent interfaces and worry less about the implementation details of the various database vendors. You will need it only if you want to support more than one database. ORMs, micro-ORMs, wrappers, generic driver classes, whatever the name, that deal with connection establishment, parameter handling, execution, etc. fall here. It's just an additional layer sitting just before the persistence/storage layer. In 3-tier terminology, both these layers fall under one tier, as they are not logically separate.
To summarize: DAL is about data, DbAL is about the database. DAL defines operations; DbAL operates. DAL sits behind DbAL, which is just behind the actual Db. DAL calls DbAL. DAL is a good way to separate business logic (in the Model) from CRUD logic, while DbAL is seldom needed (but I love it). DAL is more high-level design mapping; DbAL is more low-level architecture and implementation. Both separate responsibilities. ORMs are massive structures that do both for you, and I'm not sure how you would separate the two when using an ORM; you need not, since ORMs handle all of that for you. Ideally, I would have the DAL in one project and the DbAL in another, which I would simply call the persistence layer, since there is no point in separating the Db and the operations on it.