I would like to have a simple CQRS implementation on an API.
In short:
Separate routes for Command and Query.
Separate DB tables (on the same DB at the moment). Normalized one for Command and a de-normalized one for Query.
Asynchronous event-driven update of Query Read Model, using existing external Event Bus.
After the Command is executed, naturally I need to raise an event and pass it to the Event Bus.
The Event Bus would process the event and pass it to its subscriber(s).
In this case the subscriber is Read Model which needs to be updated.
So I need a callback route on the API which gets the event from the Event Bus and updates the Read Model projection (i.e. updates the de-normalized DB table which is used for Queries).
The problem is that the update of the Read Model projection is neither a Command (we do not execute any Domain Logic) nor a Query.
The question is:
How should this async Read Model update work in order to be compliant both with CQRS and DDD?
I normally think of the flow of information as a triangle.
We copy information from the outside world into our "write model", via commands
We copy information from the write model into our "read model"
We copy information from the read model to the outside world, via queries.
Common language for that middle step is "projection".
So the projection (typically) runs asynchronously, querying the "write model" and updating the "read model".
In the architecture you outlined, it would be the projection that is subscribed to the bus. When the bus signals that the write model has changed, we wake up the projection, and let it run so that it can update the read model.
(Note the flow of information - the signal we get from the bus triggers the projection to run, but the projection copies data from the write model, not from the event bus message. This isn't the only way to arrange things, but it is simple, and therefore easy to reason about when things start going pear shaped.)
It is often the case that the projection will store some of its own metadata when it updates the read model, so as to not repeat work.
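A minimal sketch of that arrangement in Java, with hypothetical names throughout (the projection, WriteModelRepository, ReadModelTable and CheckpointStore are assumptions, not a specific framework):

```java
import java.util.List;

// All types here are hypothetical; they sketch the triangle described above.
interface WriteModelRepository { List<OrderSnapshot> changedSince(long version); }
interface ReadModelTable { void upsertSummaryRow(String id, String summaryJson); }
interface CheckpointStore { long lastProcessedVersion(String name); void save(String name, long version); }

class OrderSnapshot {
    String id; long version; String summaryJson;
}

// Subscribed to the event bus; the event is only a wake-up signal.
public class OrderSummaryProjection {

    private final WriteModelRepository writeModel; // normalized Command-side tables
    private final ReadModelTable readModel;        // de-normalized Query-side table
    private final CheckpointStore checkpoints;     // projection's own metadata

    public OrderSummaryProjection(WriteModelRepository w, ReadModelTable r, CheckpointStore c) {
        this.writeModel = w; this.readModel = r; this.checkpoints = c;
    }

    // Called by the bus subscriber; neither a Command nor a Query, just projection code.
    public void onWriteModelChanged() {
        long lastSeen = checkpoints.lastProcessedVersion("order-summary");

        // Copy from the write model, not from the event message itself.
        for (OrderSnapshot order : writeModel.changedSince(lastSeen)) {
            readModel.upsertSummaryRow(order.id, order.summaryJson);
            lastSeen = Math.max(lastSeen, order.version);
        }

        // Remember how far we got so duplicate signals / replays do no extra work.
        checkpoints.save("order-summary", lastSeen);
    }
}
```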
I have a number of pipelines that need to cycle depending on the availability of data. If the data is not there, wait and try again. The pipeline behaviours are largely controlled by a database that captures logs, which are used to make decisions about processing.
I read the Microsoft documentation about the Execute Pipeline activity which states that
The Execute Pipeline activity allows a Data Factory or Synapse pipeline to invoke another pipeline.
It does not explicitly state that a pipeline invoking itself is impossible, though. I tried to reference Pipe_A from Pipe_A, but the pipe is not visible in the drop-down. I need a work-around for this restriction.
Constraints:
The pipe must not call all pipes again, just the pipe in question. The preceding pipe is running all pipes in parallel.
I don't know how many iterations are needed and cannot specify this quantity.
As far as possible, best effort has been implemented and this pattern should continue.
Ideas:
Create an intermediary pipe that can be referenced. This is no good: I would need to do this for every pipe that requires this behaviour, because dynamic content is not allowed for pipe selection. This approach would also pollute the Data Factory workspace.
Direct control flow backwards after waiting inside the same pipeline if a condition is met. This won't work either; the If activity does not allow expression of flow within the same context as the If activity itself.
I thought about externalising this behaviour to a Python application, which could be attached to an Azure Function if needed. The application would handle the scheduling and waiting; it could call any pipe it needed and could itself be invoked by the pipe in question. This seems drastic!
Finally, I discovered the Until activity, which has do-while behaviour. I could wrap these pipes in an Until: the pipe executes and either finishes and sets the database state to 'finished', or cannot finish, sets the state to 'incomplete', and waits. The expression then either kicks off another execution or it does not. Additional conditional logic can be included as required in the procedure that sets the value of the variable used by the Until expression. I would need a variable per pipe.
I think idea 4 makes sense; I thought I would post this anyway in case people can spot limitations in this approach and/or recommend an alternative.
Yes, I absolutely agree with All About BI; it seems that in your scenario the best-suited ADF activity is Until:
The Until activity in ADF functions as a wrapper and parent component for iterations, with inner child activities comprising the block of items to iterate over. The result(s) from those inner child activities must then be used in the parent Until expression to determine if another iteration is necessary. Alternatively, if the pipeline can be maintained …
The assessment condition for the Until activity might comprise outputs from other activities, pipeline parameters, or variables.
When used in conjunction with the Wait activity, the Until activity allows you to create loop conditions to periodically check the status of specific operations. Here are some examples:
Check to see if the database table has been updated with new rows.
Check to see if the SQL job is complete.
Check to see whether any new files have been added to a specific folder.
I have a Mininet (v2.2.2) network with Open vSwitch (v2.5.2), controlled by OpenDaylight Carbon. My application is an OpenDaylight Karaf feature.
The application creates a flow (for multicasts) to a group table (type=all) and adds/removes buckets as needed.
To add/remove buckets, I first check if there is an existing group table:
// Build the path to the group under the node's FlowCapableNode augmentation
InstanceIdentifier<Group> groupIid = InstanceIdentifier.builder(Nodes.class)
        .child(Node.class, new NodeKey(NodId))
        .augmentation(FlowCapableNode.class)
        .child(Group.class, grpKey)
        .build();

// Read the group from the operational datastore
ReadOnlyTransaction roTx = dataBroker.newReadOnlyTransaction();
Future<Optional<Group>> futOptGrp = roTx.read(LogicalDatastoreType.OPERATIONAL, groupIid);
If it doesn't find the group table, it is created (SalGroupService.addGroup()). If it does find the group table, it is updated (SalGroupService.updateGroup()).
The problem is that it takes some time after the RPC call add/updateGroup() to see the changes in the data model. Waiting for the Future<RPCResult<?>> doesn't guarantee that the data model has the same state as the device.
So, how do I read the group table and bucket list from the data model and make sure that I am indeed reading the same state as the current state of the device?
I know that
Add/UpdateGroupInputBuilder has setTransactionUri()
DataBroker gives transaction to read/write
you should use transaction chaining
But I cannot figure out how to combine these.
Thank you
EDIT: Or do I have to use write transactions instead of RPC calls?
I dropped using RPC calls for writing flows and switched to writes to the config datastore. It still takes some time for the changes to appear in the actual device and in the operational datastore, but that is OK as long as I use the config datastore for both reads and writes.
However, I have to keep in mind that it is not guaranteed that changes to the config datastore will always make it to the actual device. My flows are not that complicated, in the sense that conflicts are unlikely to happen. Still, I will probably check consistency between the operational and configuration datastores.
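For reference, a minimal sketch of the config-datastore approach (same dataBroker and groupIid as above; group is the Group being written; Carbon-era MD-SAL API, so exact calls may differ slightly in other releases):

```java
// Write the group to the CONFIGURATION datastore instead of calling the RPC.
WriteTransaction wTx = dataBroker.newWriteOnlyTransaction();
wTx.merge(LogicalDatastoreType.CONFIGURATION, groupIid, group, true); // true = create missing parents
wTx.submit().checkedGet(); // blocks until the config-datastore commit has succeeded

// Reads also go to CONFIGURATION, so we always see our own writes,
// even if the switch / operational datastore has not caught up yet.
ReadOnlyTransaction rTx = dataBroker.newReadOnlyTransaction();
Optional<Group> optGrp = rTx.read(LogicalDatastoreType.CONFIGURATION, groupIid).checkedGet();
if (optGrp.isPresent()) {
    List<Bucket> buckets = optGrp.get().getBuckets().getBucket();
    // modify the bucket list and merge the updated group back the same way
}
```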
I have a problem with a topology. I'll try to explain the workflow...
I have a source that emits ~500k tuples every 2 minutes. These tuples must be read by a spout and processed exactly once as a single unit (I think a batch in Trident).
After that, a bolt/function/whatever else must append a timestamp and save the tuples into Redis.
I tried to implement a Trident topology with a Function that saves all the tuples into Redis using a Jedis object (a Redis library for Java) inside this Function class, but when I deploy I receive a NotSerializableException on this object.
My question is: how can I implement a Function that writes this batch of tuples to Redis? Reading on the web, I cannot find any example that writes from a Function to Redis, or any example using a State object in Trident (probably I have to use it...).
My simple topology:
TridentTopology topology = new TridentTopology();
topology.newStream("myStream", new mySpout()).each(new Fields("field1", "field2"), new myFunction("redis_ip", "6379"));
Thanks in advance
(replying about state in general since the specific issue related to Redis seems solved in other comments)
The concept of DB updates in Storm becomes clearer when we keep in mind that Storm reads from distributed (or "partitioned") data sources (through Storm "spouts"), processes streams of data on many nodes in parallel, optionally performs calculations on those streams of data (called "aggregations"), and saves the results to distributed data stores (called "states"). Aggregation is a very broad term that just means "computing stuff": for example, computing the minimum value over a stream is seen in Storm as an aggregation of the previously known minimum value with the new values currently processed in some node of the cluster.
With the concepts of aggregations and partitions in mind, we can have a look at the two main primitives in Storm that allow us to save something to a state: partitionPersist and persistentAggregate. The first runs at the level of each cluster node, without coordination with the other partitions, and feels a bit like talking to the DB through a DAO. The second involves "repartitioning" the tuples (i.e. re-distributing them across the cluster, typically along some group-by logic) and doing some calculation (an "aggregate") before reading/saving something to the DB; it feels a bit like talking to a HashMap rather than a DB (Storm calls the DB a "MapState" in that case, or a "Snapshot" if there's only one key in the map).
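To make that concrete, here is a minimal partitionPersist sketch in Java (MyRedisState, MyRedisStateFactory and MyRedisUpdater are hypothetical names; the Jedis connection is created inside the state on the worker, which also avoids the NotSerializableException from the question):

```java
import java.util.List;
import java.util.Map;
import backtype.storm.generated.StormTopology;
import backtype.storm.task.IMetricsContext;
import backtype.storm.tuple.Fields;
import redis.clients.jedis.Jedis;
import storm.trident.TridentTopology;
import storm.trident.operation.TridentCollector;
import storm.trident.state.BaseStateUpdater;
import storm.trident.state.State;
import storm.trident.state.StateFactory;
import storm.trident.tuple.TridentTuple;

// Hypothetical state that owns the (non-serializable) Jedis connection.
// The factory creates it on the worker, so nothing needs to be serialized.
class MyRedisState implements State {
    private final Jedis jedis;
    MyRedisState(String host, int port) { this.jedis = new Jedis(host, port); }
    public void beginCommit(Long txid) {}   // could mark the batch as started
    public void commit(Long txid) {}        // could mark the batch as committed
    void write(String key, String value) { jedis.set(key, value); }
}

class MyRedisStateFactory implements StateFactory {
    public State makeState(Map conf, IMetricsContext metrics, int partitionIndex, int numPartitions) {
        return new MyRedisState("redis_ip", 6379);  // host/port assumed, as in the question
    }
}

// Called once per batch and per partition with all tuples of that batch.
class MyRedisUpdater extends BaseStateUpdater<MyRedisState> {
    public void updateState(MyRedisState state, List<TridentTuple> tuples, TridentCollector collector) {
        long ts = System.currentTimeMillis();  // the timestamp appended by the topology
        for (TridentTuple t : tuples) {
            state.write(t.getStringByField("field1"), t.getStringByField("field2") + "|" + ts);
        }
    }
}

public class RedisTridentTopology {
    public static StormTopology build() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("myStream", new mySpout())        // the spout from the question
                .partitionPersist(new MyRedisStateFactory(),
                                  new Fields("field1", "field2"),
                                  new MyRedisUpdater());
        return topology.build();
    }
}
```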
One more thing to keep in mind is that the exactly-once semantics of Storm are not achieved by processing each tuple exactly once: this would be too brittle since there are potentially several read/write operations per tuple defined in our topology, we want to avoid 2-phase commits for scalability reasons, and at large scale network partitions become more likely. Rather, Storm will typically keep replaying the tuples until it is sure they have been completely and successfully processed at least once. The important relationship of this to state updates is that Storm gives us a primitive (OpaqueMap) that allows idempotent state updates, so that those replays do not corrupt previously stored data. For example, if we are summing up the numbers [1,2,3,4,5], the result saved in the DB will always be 15, even if they are replayed and processed in the "sum" operation several times due to some transient failure. OpaqueMap has a slight impact on the format used to save data in the DB. Note that this replay and opaque logic is only present if we tell Storm to act like that, but we usually do.
If you're interested in reading more, I posted two blog articles on the subject:
http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/
http://svendvanderveken.wordpress.com/2014/02/05/error-handling-in-storm-trident-topologies/
One last thing: as hinted by the replay mechanism above, Storm is a very asynchronous mechanism by nature: we typically have some data producer that posts events into a queueing system (e.g. Kafka or 0MQ) and Storm reads from there. As a result, assigning a timestamp from within Storm, as suggested in the question, may or may not have the desired effect: this timestamp will reflect the "latest successful processing time", not the data ingestion time, and of course it will not be identical in case of replayed tuples.
Have you tried trident-state for Redis? There is code on GitHub that does it already:
https://github.com/kstyrc/trident-redis.
Let me know if this answers your question or not.
In CQRS, we separate Commands and Queries. As I understand it, Commands raise Domain Events that may modify Entity states, while Queries return View-specific DTOs directly from a data store. According to this article, the UI issues commands through a Command Bus, which creates Commands that are handled by their respective CommandHandlers, who then orchestrate the Domain Logic to determine the occurrence of Domain Events and persist/publish any state changes to a Repository (optionally using Event Sourcing). After being persisted, state changes are available through Queries.
Now, what if a Command creates an Entity that is not persisted/published immediately? Firstly, where is that not-yet-persisted Entity held? Is it in the Command Bus, the Command Handler, the Repository, or should a new thin application layer hold it? How should a Query gain access to it?
The problem here is that it seems like any Queries for unpersisted Entities differ significantly from those of persisted Entities, unless CQRS demands that ALL Entities be persisted upon creation, which IMO is not necessarily compatible with all Domains.
Specifically, I'm trying to build software to record training information for various Training Sessions. However, I would like it if Training Sessions were persisted manually by a Save Session button as opposed to always upon creation. I don't know where a StartNewTrainingSessionCommand would store the new Training Session so that it can be Queried, if not in the data store.
I think you have understood things a bit wrong: a command is sent via a service bus to a command handler, which uses the business objects to do the work. Domain events should be generated by the business (domain) objects, but sometimes the command handler does that too.
I don't see a reason for a created entity not to be saved. In your particular case, if the domain allows it, you can have a default, empty TrainingSession saved automatically and then updated when the user presses the Save button.
If this approach is not feasible, then simply store the input data (pretty much the view models) in a temporary place (session, DB) and issue the command only when the user clicks the button.
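A minimal sketch of that second option, with hypothetical names (the draft storage via a map, CommandBus and StartNewTrainingSessionCommand are assumptions, not a prescribed API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical: the not-yet-persisted session exists only as a view model / draft,
// outside the domain. No entity is created until the user presses Save.
class TrainingSessionDraft {
    String name;
    String notes;
}

class StartNewTrainingSessionCommand {               // assumed command shape
    final String name;
    final String notes;
    StartNewTrainingSessionCommand(String name, String notes) { this.name = name; this.notes = notes; }
}

interface CommandBus { void send(Object command); }  // assumed abstraction

class TrainingSessionAppService {
    // Draft storage: HTTP session, a "drafts" table, or just memory as here.
    private final Map<String, TrainingSessionDraft> drafts = new ConcurrentHashMap<>();
    private final CommandBus commandBus;

    TrainingSessionAppService(CommandBus commandBus) { this.commandBus = commandBus; }

    // "Start" only stashes raw input; the UI can read it back directly from the draft store.
    void startSession(String userId, TrainingSessionDraft draft) {
        drafts.put(userId, draft);
    }

    // The command, and hence the domain, is only involved when the user saves.
    void saveSession(String userId) {
        TrainingSessionDraft d = drafts.remove(userId);
        commandBus.send(new StartNewTrainingSessionCommand(d.name, d.notes));
    }
}
```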
We are developing an application that makes use of deep object models that are being stored in Gemfire. Clients will access a REST service to perform cache operations instead of accessing the cache via the Gemfire client. We are also planning to have a database underneath Gemfire as the system of record making use of asynchronous write-behind to do the DB updates and inserts.
In this scenario it seems to be a non-trivial problem to guarantee that an insert or update into Gemfire will result in a successful insert or update in the DB without setting up elaborate server-side validation (essentially the constraints on the Gemfire operation would have to match the DB operation constraints). We cannot bubble the DB insert or update success/failure back to the client without making the DB call synchronous with the Gemfire operation. This would obviously defeat the purpose of using Gemfire for low-latency client operations.
We are curious how other Gemfire adopters who are using write-behind have solved the problem of keeping the DB in sync with the Gemfire data fabric.
We generally recommend implementing validations in GemFire in a CacheWriter, since by definition it is called before any modification occurs on the cache.
http://gemfire.docs.gopivotal.com/javadocs/japi/com/gemstone/gemfire/cache/CacheWriter.html
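For example, a minimal sketch of such a writer (the Order value type and the validation rule are assumptions):

```java
import com.gemstone.gemfire.cache.CacheWriterException;
import com.gemstone.gemfire.cache.EntryEvent;
import com.gemstone.gemfire.cache.util.CacheWriterAdapter;

class Order { String customerId; }  // hypothetical value type

// Runs before the entry is created/updated in the region; throwing
// CacheWriterException aborts the cache operation, so the caller learns
// synchronously that the value would not satisfy the DB constraints.
public class OrderConstraintWriter extends CacheWriterAdapter<String, Order> {

    @Override
    public void beforeCreate(EntryEvent<String, Order> event) throws CacheWriterException {
        validate(event.getNewValue());
    }

    @Override
    public void beforeUpdate(EntryEvent<String, Order> event) throws CacheWriterException {
        validate(event.getNewValue());
    }

    private void validate(Order order) {
        // Mirror the DB constraints you care about (hypothetical rule).
        if (order.customerId == null) {
            throw new CacheWriterException("customerId must not be null");
        }
    }
}
```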
That said, a pattern I saw at a customer was to receive data on region "A" and implement basic validations there, accepting that data. The data is then copied to the DB using write-behind as you describe, batched through an AsyncEventListener. In the try/catch around the insertion, if any error occurs they store that data in another region that has its own non-batched write-behind listener, so you can actually see which record failed and decide what to do accordingly.
Something like:
data -> cacheWriter basic checks -> region A -> AEL (batch 500~N events) -> DB
If error occurs copy to region B -> Listener persist individual records on DB ->
Do some action on the failed record.
In their case, they had an interface to manually clear pending records on region B.
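A minimal sketch of that listener shape (region names, the value handling and the DB call are assumptions):

```java
import java.util.List;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.Region;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEvent;
import com.gemstone.gemfire.cache.asyncqueue.AsyncEventListener;

// Attached to region "A"'s async event queue (write-behind, batched).
public class WriteBehindListener implements AsyncEventListener {

    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        Region<Object, Object> errorRegion =
                CacheFactory.getAnyInstance().getRegion("B"); // the non-batched error region
        for (AsyncEvent event : events) {
            try {
                insertIntoDb(event.getKey(), event.getDeserializedValue());
            } catch (Exception e) {
                // Park the failed record so the listener on region "B" (or an
                // operator) can retry or inspect it individually.
                errorRegion.put(event.getKey(), event.getDeserializedValue());
            }
        }
        return true; // batch handled; GemFire may remove it from the queue
    }

    private void insertIntoDb(Object key, Object value) {
        // JDBC / DAO call goes here (omitted in this sketch)
    }

    @Override
    public void close() {}
}
```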
Hope that helps, if not maybe you can provide more details about your use case...
Cheers.
You can register an AsyncEventListener and update your DB with batch updates. For more info:
http://pubs.vmware.com/vfabric53/topic/com.vmware.vfabric.gemfire.7.0/developing/events/implementing_write_behind_event_handler.html
GemFire v8 and above solves your problem of accessing cache objects via a REST service.
Gemfire REST