How to process invokeAll EntryProcessor from map/set's values in custom order? - ignite

For the function:
invokeAll()
It takes a Map/Set containing the entries to be processed. I want to process each entry in a custom order, i.e. in the same order as the keys.
in document:
The order that the entries for the keys are processed is undefined. Implementations may choose to process the entries in any order, including concurrently. Furthermore there is no guarantee implementations will use the same EntryProcessor instance to process each entry, as the case may be in a non-local cache topology.
For this line:
Implementations may choose to process the entries in any order, including concurrently
I don't understand how that applies here; is there an example?
If I use a TreeMap/TreeSet to store the keys in order, will the entries be processed in the same order as the keys in the TreeMap/TreeSet?
By the way, since invoke takes an internal lock, will invokeAll also hold the lock for all the keys in the map/set until the EntryProcessor has finished?

The documentation you're referring to is, in fact, inherited from javax.cache.Cache::invokeAll. "Implementation" here means not an EntryProcessor but an implementation of JSR 107 (AKA JCache, AKA the javax.cache package) - and Ignite implements it in IgniteCache.
What this documentation means is that the specification of the javax.cache.Cache interface allows its implementations to invoke EntryProcessors in any order. Ignite does not add any guarantees on top of that, and there is no way to influence the order here.
Also, remember that Ignite is distributed, so the processing of entries in invokeAll is inherently concurrent. If you need strict order, it's probably better to iterate over the keys and use invoke on each key.
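A minimal sketch of that last suggestion, with a plain java.util.Map standing in for the cache (with Ignite you would call IgniteCache.invoke inside the loop instead): iterating a TreeSet fixes the key order, which invokeAll does not guarantee.

```java
import java.util.*;
import java.util.function.BiFunction;

public class OrderedInvoke {
    // Stand-in for cache.invoke(key, processor): applies the processor to a
    // single entry. With Ignite you would call IgniteCache.invoke here, which
    // locks and processes one key at a time.
    static <K, V, T> T invoke(Map<K, V> cache, K key, BiFunction<K, V, T> processor) {
        return processor.apply(key, cache.get(key));
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        cache.put("a", 1); cache.put("b", 2); cache.put("c", 3);

        // A TreeSet fixes the iteration order of the keys; invokeAll would not.
        NavigableSet<String> keys = new TreeSet<>(List.of("c", "a", "b"));
        List<String> results = new ArrayList<>();
        for (String key : keys) {
            results.add(invoke(cache, key, (k, v) -> k + "=" + v));
        }
        System.out.println(results); // [a=1, b=2, c=3]
    }
}
```

The trade-off is that you lose invokeAll's batching: one network round trip per key instead of one per node.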

Related

Way to call data source only once for multiple keys in apache geode

I have Apache Geode as an inline cache, connected to Postgres as the datasource. When the getAll method is invoked on a region, it calls the CacheLoader sequentially for each missing key. Is there a way to fetch all the keys from my datasource at once, e.g. with an IN query from the CacheLoader?
I don't think there's a way of accomplishing this out of the box using a CacheLoader since, as you already verified, the callback is invoked sequentially on every key not found within the Region. You might be able to pre-populate the Region with all those keys you know must be there, but keys not found while executing Region.getAll() will still be retrieved sequentially by invoking the configured CacheLoader.

When and how to assign unique id to an entity in DDD?

The best example would be a User entity which needs to be persisted. I have the following candidates for assigning a unique identifier to a user:
Assign keys provided by back-end (NDB, MySQL etc.).
Hand crafting unique identifier through some service (like system clock).
Properties like emailId.
Taking a simple example of a detailed view, we often have a detailed display of a user at some/path/users/{user_id}; if we keep emailId as the unique id, there's a chance a user may change their email id one day, which breaks the URL.
Which is a better approach to identify the same entity?
Named UUID.
UUID, because it gives the identifier a nice predictable structure, without introducing any semantic implications (like your email id example). Think surrogate key.
Named UUID, because you want the generated id to be deterministic. Deterministic means reproducible: you can move your system to a test environment, and replay commands to inspect the results.
It also gives you an extra way to detect duplicated work - what should happen in your system if a create user command is repeated (example: user POSTs the same web form twice). There are various ways that you can guard against this in your intermediate layers, but a really easy way to cover this in your persistence layer (aka in your system of record) is to put a uniqueness constraint on the id. Because running the command a second time produces a "new" user entity with the same id, the persistence layer will object to the duplication, and you can handle things from there.
Thus, you get idempotent command handling even if all of your intermediate guard layers restart during the interval between the duplicated commands.
Named UUID gives you these properties; for instance, you might build the uuid from an identifier for the type of the entity and the id of the command (the duplicated command will have the same id when it is resent).
You can use transient properties of the user (like email address) as part of the seed for your named uuid if you have a guarantee that the property won't ever be assigned to someone else. Are you sure vivek@stackoverflow.com won't be assigned to another user? Then it's not a good seed to use.
Back end key assignment won't detect a collision if a command is duplicated - you would need to rely on some other bit of state to detect the collision.
System clock isn't a good choice, because it makes reproducing the same id difficult. A local copy of the system clock can work, if you can reproduce the updates to the local clock in your test environment. But that's a bunch of extra effort you don't want if time isn't already part of your domain model.
https://www.ietf.org/rfc/rfc4122.txt (Section 4.3)
http://www.rfc-editor.org/errata_search.php?rfc=4122&eid=1352 (Errata for the example in the spec)
Generating v5 UUID. What is name and namespace?
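For illustration, Java's built-in name-based UUIDs show the determinism described above. Note that UUID.nameUUIDFromBytes produces version 3 (MD5) UUIDs; version 5 (SHA-1, RFC 4122 section 4.3) would need to be assembled by hand. The "user:" tag and command-id scheme below are just one possible seed, not anything from the answer:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class NamedIds {
    // Deterministic id built from an entity-type tag plus the command id.
    // Resending the same command reproduces the same UUID, so a uniqueness
    // constraint on the id catches the duplicate in the persistence layer.
    static UUID userId(String commandId) {
        String name = "user:" + commandId; // "user" is an illustrative type tag
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first = userId("cmd-42");
        UUID resent = userId("cmd-42"); // duplicated command -> same id
        System.out.println(first.equals(resent)); // true
        System.out.println(first.version());      // 3
    }
}
```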
I agree with @VoiceOfUnreason but only partially. We all know that UUIDs are terrible to spell and keep track of. All methods to use incremental and meaningful UUIDs resolve only parts of these issues.
An aggregate is being created with some id that is already available to the creating party. Although UUID can be generated without involving any external components, this is not the only solution. Using an external identity provider like Twitter Snowflake (retired) is an option too.
It is not very complicated to create a very simple and reliable identity provider that returns an incrementing long value when given an aggregate type name.
Surely, it increases the complexity and can only be justified when there is a requirement to generate sequential unique numeric values. Resilience of this service becomes very important and needs to be addressed carefully. But it can just be seen as any other critical infrastructure component and we know that every system has quite a few of those anyway.
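A minimal in-process sketch of such a provider (all names are illustrative; a real one would need durable storage and the resilience discussed above):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// One incrementing sequence per aggregate type name. This in-memory version
// only demonstrates the interface; a production provider would persist the
// counters and be deployed as a critical infrastructure component.
public class IdentityProvider {
    private final ConcurrentHashMap<String, AtomicLong> sequences = new ConcurrentHashMap<>();

    public long nextId(String aggregateType) {
        return sequences
                .computeIfAbsent(aggregateType, t -> new AtomicLong())
                .incrementAndGet();
    }

    public static void main(String[] args) {
        IdentityProvider ids = new IdentityProvider();
        System.out.println(ids.nextId("Order")); // 1
        System.out.println(ids.nextId("Order")); // 2
        System.out.println(ids.nextId("User"));  // 1 - independent sequence
    }
}
```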

Are BPMN sequenceFlow allowed to reference specific activities within another process/subprocess?

I am modeling a complex process using BPMN 2.0
I have split the process into multiple global processes which can reference one another through call activity.
However, in one or two special cases, I would like to actually call directly into the middle of one of the other processes. I do not want to have to create an entirely duplicate [sub]process with just the first couple nodes missing and would also prefer not to split those couple nodes into their own little process.
I don't think common BPMN 2.0 tools support this, but is it explicitly disallowed by the spec? For instance, I read through http://www.omg.org/spec/BPMN/2.0.2/PDF and I don't see anywhere that it claims that a sequenceFlow's targetRef must be within the same FlowElementsContainer. Maybe it is just implied?
The correct way to do this would be to create several "none" start events in the global process and then reference the correct one via the targetRef attribute of a sequence flow incoming to the call activity. The spec says on p. 239:
"If the Process is used as a global Process (a callable Process that can be invoked from Call Activities of other Processes) and there are multiple None Start Events, then when flow is transferred from the parent Process to the global Process, only one of the global Process’s Start Events will be triggered. The targetRef attribute of a Sequence Flow incoming to the Call Activity object can be extended to identify the appropriate Start Event."

CQRS - options around "get or create"

I'm putting something together using a CQRS pattern (no event sourcing, nor DDD, but a clear difference between command and query).
The operation I'm trying to model is a "get-or-create", given a set of parameters. The item being created (or gotten) is effectively a unique communications link ID. Either of two parties can say "get-or-create comms link between me and the other" and a new temporary random ID is returned (which would be valid between them both). They can then send/receive messages using that ID (a PostMessage command or GetRecentMessages query). This temporary ID can be passed around, but can also be centrally invalidated, controlled, etc. Different sessions between the two parties should be recorded separately.
I know that the more typical "insert-then-get-me-the-ID-back" is handled by the command having a GUID parameter. But this doesn't seem to apply here, because of course the item might already exist.
My options, I believe:
Execute a GetOrCreateCommsLink command followed by a GetActiveCommsLinkId query, i.e. command, then query. Feels wrong because commands are supposedly typically asynchronous (though not in my simple prototype so far), and is it right to wait for a command then run a query in my service layer?
Run a GetExistingOrNewActiveCommsLinkId query, which will either return an existing session ID, or create and return one. Feels wrong because it's a dirty cheat, both reading and mutating state in a query.
Don't use CQRS for this part of the app
Have each client use their own ID for the session - a NotifyCommsLinkIdentifier command from each side specifies the parameters and their own ID, which is linked internally to the actual ID by the command. Then run a GetUnderlyingCommsLinkId query, given the identifier previously specified, to uncover the ID if needed. Feels wrong too, because inventing this extra concept seems to be driven by the CQRS pattern rather than any actual domain/business need.
I suppose my question in general is how to deal with potential get-then-act, or act-then-get scenarios. Should I simply chain them together in my service layer, as per option 1?
Is there a standard approach, or standard approaches, to this?
So you are talking about CQS, not CQRS. Basically you are trying to find workarounds in order to strictly implement CQS pattern for something that naturally may not really be an asynchronous command.
My advice is: don't try to apply a pattern because of the pattern, but because it makes sense. Does it make sense in your case? What would be the benefit? Remember that you are not Amazon. Do you really need it?
That said, what I typically do is not the purist way, but to allow a command to return a simple ID if it's needed. This will make your architecture a lot simpler, and you still separate commands from queries, which to me is the most important advantage.
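A sketch of that non-purist option, assuming a synchronous GetOrCreateCommsLink handled in-process (all class and method names here are invented for illustration, not from any framework):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// The command handler returns the link id directly: an existing one if the
// two parties already have an active link, otherwise a freshly created one.
public class CommsLinkHandler {
    private final Map<String, UUID> linksByPair = new ConcurrentHashMap<>();

    public UUID getOrCreateCommsLink(String partyA, String partyB) {
        // Key the active link by the unordered pair, so either party may ask.
        String pair = partyA.compareTo(partyB) < 0 ? partyA + "|" + partyB
                                                   : partyB + "|" + partyA;
        return linksByPair.computeIfAbsent(pair, p -> UUID.randomUUID());
    }

    public static void main(String[] args) {
        CommsLinkHandler handler = new CommsLinkHandler();
        UUID id = handler.getOrCreateCommsLink("alice", "bob");
        // Either party asking again gets the same id back.
        System.out.println(id.equals(handler.getOrCreateCommsLink("bob", "alice"))); // true
    }
}
```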

RESTful implementation for "archiving" an entry

Say I have a table for machines, and one of the columns is called status. This can be either active, inactive, or discarded. A discarded machine is no longer useful and is only kept for transactional purposes.
Assume now I want to discard a machine. Since this is an entry update, RESTfully, this would be a PUT request to /machines/:id. However, since this is a very special kind of an update, there could be other operations that would occur as well (for instance, remove any assigned users and what not).
So should this be more like a POST to /machines/:id/discard?
From a strict REST perspective, have you considered implementing PATCH? That way the request updates just the status field, and you can tie in whatever other operations are necessary.
References:
https://www.mnot.net/blog/2012/09/05/patch
http://jasonsirota.com/rest-partial-updates-use-post-put-or-patch
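For illustration, such a PATCH request might look like the following (the endpoint and field name are taken from the question; the media type assumes JSON Merge Patch, RFC 7396):

```http
PATCH /machines/42 HTTP/1.1
Host: api.example.com
Content-Type: application/merge-patch+json

{"status": "discarded"}
```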
I think the most generic way would be to POST a Machine object with { status: 'discarded' } to /machines/:id/.
Personally, I'd prefer the /machines/:id/discard approach, though. It might not be exactly in accordance with the spec, but it's more clear and easier to filter. For example, some users might be allowed to update a machine but not "archive" it.
Personally, I think POST should be used when the resource id is either unknown or not relevant to the update being made. That makes PUT the method I would use here, especially since you have other status values that will also need to be updated:
Path
/machines/:id
Message body
{"status": "discarded"}