When and how to assign a unique id to an entity in DDD?

The best example would be a User entity that needs to be persisted. I have the following candidates for assigning a unique identifier to a user:
Assign keys provided by the back end (NDB, MySQL, etc.).
Hand-craft a unique identifier through some service (like the system clock).
Use a property like emailId.
Take a simple example of a detail view: we often display a user at a path like some/path/users/{user_id}. If we use emailId as the unique id, the user may change their email id one day, which breaks that path.
Which is a better approach to identify the same entity?

Named UUID.
UUID, because it gives the identifier a nice predictable structure, without introducing any semantic implications (like your email id example). Think surrogate key.
Named UUID, because you want the generated id to be deterministic. Deterministic means reproducible: you can move your system to a test environment and replay commands to inspect the results.
It also gives you an extra way to detect duplicated work. What should happen in your system if a create user command is repeated (for example, the user POSTs the same web form twice)? There are various ways to guard against this in your intermediate layers, but a really easy way to cover it in your persistence layer (aka your system of record) is to put a uniqueness constraint on the id. Because running the command a second time produces a "new" user entity with the same id, the persistence layer will object to the duplication, and you can handle things from there.
Thus, you get idempotent command handling even if all of your intermediate guard layers restart during the interval between the duplicated commands.
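A minimal sketch of that persistence-layer guard, using SQLite from the Python standard library as a stand-in for the system of record (the users table, the create_user helper, and the id value are hypothetical). The PRIMARY KEY on id is what turns a replayed create command into a detectable duplicate:

```python
import sqlite3


def create_user(conn: sqlite3.Connection, user_id: str, name: str) -> bool:
    """Insert a user; return False if a user with this id already exists.

    Hypothetical helper: the PRIMARY KEY constraint detects the duplicate.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        return True
    except sqlite3.IntegrityError:
        # Same id seen before: treat the repeated command as already handled.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

first = create_user(conn, "hypothetical-id-123", "vivek")
second = create_user(conn, "hypothetical-id-123", "vivek")  # replayed command
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```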
Named UUID gives you these properties; for instance, you might build the uuid from an identifier for the type of the entity and the id of the command (the duplicated command will have the same id when it is resent).
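As an illustration, Python's uuid.uuid5 builds a name-based (version 5, SHA-1) UUID from a namespace UUID and a name. The sketch below derives a per-type namespace from a hypothetical application namespace and then names the entity by the command id; the namespace URL and the command ids are made up:

```python
import uuid

# Hypothetical, fixed application namespace: it must never change once ids exist.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-app")


def entity_id(entity_type: str, command_id: str) -> uuid.UUID:
    """Deterministic name-based (version 5) UUID for an entity."""
    type_namespace = uuid.uuid5(APP_NAMESPACE, entity_type)  # one namespace per type
    return uuid.uuid5(type_namespace, command_id)


original = entity_id("User", "cmd-7f3a")  # first submission
resent = entity_id("User", "cmd-7f3a")    # same command, resent
other = entity_id("User", "cmd-9b21")     # a different command
```

Because the same command id always yields the same UUID, a resent command collides with the existing row instead of creating a second user.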
You can use transient properties of the user (like the email address) as part of the seed for your named UUID only if you have a guarantee that the property will never be assigned to someone else. Are you sure vivek@stackoverflow.com won't ever be assigned to another user? If you aren't, it's not a good seed to use.
Back-end key assignment won't detect a collision if a command is duplicated; you would need to rely on some other bit of state to detect the collision.
The system clock isn't a good choice, because it makes reproducing the same id difficult. A local copy of the system clock can work, if you can reproduce the updates to the local clock in your test environment. But that's a bunch of extra effort you don't want if time isn't already part of your domain model.
https://www.ietf.org/rfc/rfc4122.txt (Section 4.3)
http://www.rfc-editor.org/errata_search.php?rfc=4122&eid=1352 (Errata for the example in the spec)
Generating v5 UUID. What is name and namespace?

I agree with @VoiceOfUnreason, but only partially. We all know that UUIDs are terrible to spell and keep track of, and every attempt to make them incremental or meaningful solves only part of these problems.
An aggregate is created with an id that is already available to the creating party. Although a UUID can be generated without involving any external component, that is not the only solution. Using an external identity provider like Twitter Snowflake (retired) is an option too.
It is not very complicated to create a very simple and reliable identity provider that returns an incrementing long value when given an aggregate type name.
Admittedly, this increases complexity and can only be justified when there is a requirement to generate sequential unique numeric values. The resilience of this service becomes very important and needs to be addressed carefully. But it can be seen as just another critical infrastructure component, and we know that every system has quite a few of those anyway.
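A minimal in-memory sketch of such a provider, assuming ids only need to be unique and increasing per aggregate type (the class and method names are hypothetical). A production version would persist the counters, or reserve blocks of ids, so that a restart never reissues an id:

```python
import threading
from collections import defaultdict


class IdentityProvider:
    """Hypothetical provider: incrementing long ids per aggregate type name.

    In memory only; counters are lost on restart in this sketch.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()      # serialize concurrent callers
        self._counters = defaultdict(int)  # aggregate type -> last issued id

    def next_id(self, aggregate_type: str) -> int:
        with self._lock:
            self._counters[aggregate_type] += 1
            return self._counters[aggregate_type]


provider = IdentityProvider()
ids = [provider.next_id("Order"), provider.next_id("Order"), provider.next_id("Customer")]
```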

Related

Google App Engine: Is there any security concern with giving away datastore urlsafe entity keys in an API?

I want to give out anonymous IDs to certain entities in a public token-based API.
Is there any reason I shouldn't use the urlsafe string of entity keys for that, since they are already anonymized (at least in my case, where I'm not using my own data to construct the key)?
Google App Engine and the Datastore are considered safe as long as I'm not handing anyone the key, which I'm not, right?
Thank you.
One of the documentation pages says: "...The urlsafe keyword parameter uses a websafe-base64-encoded serialized reference but it's best to think of it as just an opaque unique string..." I think this is what you're referring to when you say it is anonymized.
But a later page says: "...The string representation of a key looks cryptic, but is not encrypted! It can be converted back to the raw key data, both kind and identifier. If you don't want to expose this data to your users (and allow them to easily guess other entities' keys), then encrypt these strings or use something else..."
You can decode the key yourself via base64 - usually there is no risk in giving it away.
The huge risk is in taking urlsafe entity keys as parameters and using them to read from the datastore. An attacker can trick your application into reading arbitrary data from your Datastore project. To my knowledge, this is not documented anywhere.
So basically, any variant of this is a no-go in a web server:
from google.appengine.ext import ndb

def get(params):
    # DON'T: the client controls which kind and path this key points to
    key = ndb.Key(urlsafe=params.key)
    return key.get()
Any key supplied from outside should never be used with the datastore, since you cannot be sure you are reading the kind / path you expect. This is essentially the same class of risk as SQL injection.
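One way to follow both warnings (don't hand out decodable keys, and don't read whatever key a client sends back) is to issue random opaque tokens and keep the token-to-entity mapping on the server. This is a sketch under that assumption, not an App Engine API: the registry class is hypothetical and would need to be persisted (for example as its own Datastore kind) in a real app:

```python
import secrets


class OpaqueIdRegistry:
    """Maps random public tokens to (kind, id) pairs kept only on the server.

    Hypothetical helper: clients never see a Datastore key, so they cannot
    decode it, guess neighboring keys, or forge one for another kind.
    """

    def __init__(self) -> None:
        self._by_token: dict[str, tuple[str, int]] = {}

    def issue(self, kind: str, entity_id: int) -> str:
        token = secrets.token_urlsafe(16)  # 16 random bytes as URL-safe text
        self._by_token[token] = (kind, entity_id)
        return token

    def resolve(self, token: str, expected_kind: str) -> int:
        """Return the entity id; raise KeyError for unknown or wrong-kind tokens."""
        try:
            kind, entity_id = self._by_token[token]
        except KeyError:
            raise KeyError("unknown token") from None
        if kind != expected_kind:
            raise KeyError("token does not refer to a " + expected_kind)
        return entity_id


registry = OpaqueIdRegistry()
token = registry.issue("Document", 42)
doc_id = registry.resolve(token, "Document")  # 42
```

With NDB, the handler would then build the key itself, e.g. ndb.Key("Document", doc_id), so the kind is always the one the server expects.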

CQRS - options around "get or create"

I'm putting something together using a CQRS pattern (no event sourcing or DDD, but a clear separation between commands and queries).
The operation I'm trying to model is a "get-or-create", given a set of parameters. The item being created (or gotten) is effectively a unique communications link ID. Either of two parties can say "get-or-create comms link between me and the other" and a new temporary random ID is returned (which would be valid between them both). They can then send/receive messages using that ID (a PostMessage command or GetRecentMessages query). This temporary ID can be passed around, but can also be centrally invalidated, controlled, etc. Different sessions between the two parties should be recorded separately.
I know that the more typical "insert-then-get-me-the-ID-back" case is handled by giving the command a GUID parameter. But this doesn't seem to apply here, because of course the item might already exist.
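For reference, the typical pattern mentioned here looks roughly like this: the caller generates the id and puts it in the command, so the command handler does not need to return anything. All names are hypothetical:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CreateCustomer:
    """Hypothetical command: the caller chooses the new aggregate's id."""
    name: str
    customer_id: uuid.UUID = field(default_factory=uuid.uuid4)


class CustomerStore:
    def __init__(self) -> None:
        self.customers: dict[uuid.UUID, str] = {}

    def handle(self, cmd: CreateCustomer) -> None:
        # Returns nothing; replaying the same command is a no-op.
        self.customers.setdefault(cmd.customer_id, cmd.name)


store = CustomerStore()
cmd = CreateCustomer(name="Alice")
store.handle(cmd)
store.handle(cmd)  # duplicate delivery
name = store.customers[cmd.customer_id]  # query using the id the caller chose
```

It breaks down for get-or-create because, when the link already exists, the id the caller generated is not the one that should be used.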
My options, I believe:
1. Execute a GetOrCreateCommsLink command followed by a GetActiveCommsLinkId query, i.e. command, then query. Feels wrong because commands are supposedly asynchronous (though not in my simple prototype so far), and is it right to wait for a command and then run a query in my service layer?
2. Run a GetExistingOrNewActiveCommsLinkId query, which will either return an existing session ID or create and return one. Feels wrong because it's a dirty cheat, both reading and mutating state in a query.
3. Don't use CQRS for this part of the app.
4. Have each client use its own ID for the session: a NotifyCommsLinkIdentifier command from each side specifies the parameters and its own ID, which the command links internally to the actual ID. Then run a GetUnderlyingCommsLinkId query, given the previously specified identifier, to uncover the ID if needed. Feels wrong too, because this extra concept seems to exist only because of the CQRS pattern, rather than any actual domain/business need.
I suppose my general question is how to deal with potential get-then-act or act-then-get scenarios. Should I simply chain them together in my service layer, as in option 1?
Is there a standard approach, or standard approaches, to this?
So you are talking about CQS, not CQRS. Basically you are trying to find workarounds in order to strictly implement CQS pattern for something that naturally may not really be an asynchronous command.
My advice is: don't try to apply a pattern because of the pattern, but because it makes sense. Does it make sense in your case? What would be the benefit? Remember that you are not Amazon. Do you really need it?
That said, what I typically do isn't the purist way: I allow a command to return a simple ID when it's needed. This makes your architecture a lot simpler, and you still separate commands from queries, which to me is the most important advantage.
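A minimal sketch of that pragmatic approach for the comms-link case: a get-or-create command that returns the link id directly. The service, the in-memory dict, and the frozenset pair key are assumptions for illustration; a real implementation would need a uniqueness constraint or lock so two concurrent requests cannot create two links for the same pair, and would also persist ended sessions:

```python
import uuid


class CommsLinkService:
    """Hypothetical service whose get-or-create command returns the link id."""

    def __init__(self) -> None:
        # Active link per unordered pair of parties (in memory for the sketch).
        self._active: dict[frozenset[str], str] = {}

    def get_or_create_comms_link(self, me: str, other: str) -> str:
        pair = frozenset((me, other))  # same key whichever party asks
        link_id = self._active.get(pair)
        if link_id is None:
            link_id = uuid.uuid4().hex  # new temporary random id
            self._active[pair] = link_id
        return link_id

    def invalidate(self, link_id: str) -> None:
        """Central invalidation: the next get-or-create starts a new session."""
        self._active = {p: i for p, i in self._active.items() if i != link_id}


svc = CommsLinkService()
a = svc.get_or_create_comms_link("alice", "bob")
b = svc.get_or_create_comms_link("bob", "alice")  # same link as a
svc.invalidate(a)
c = svc.get_or_create_comms_link("alice", "bob")  # new session, new id
```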

RESTful implementation for "archiving" an entry

Say I have a table for machines, and one of the columns is called status. This can be either active, inactive, or discarded. A discarded machine is no longer useful and is only kept for transaction purposes.
Assume now I want to discard a machine. Since this is an entry update, RESTfully, this would be a PUT request to /machines/:id. However, since this is a very special kind of an update, there could be other operations that would occur as well (for instance, remove any assigned users and what not).
So should this be more like a POST to /machines/:id/discard?
From a strict REST perspective, have you considered implementing a PATCH? That way, you can have it update just the status field and also tie it in to updating everything else that is necessary.
References:
https://www.mnot.net/blog/2012/09/05/patch
http://jasonsirota.com/rest-partial-updates-use-post-put-or-patch
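The answer doesn't say which PATCH body format to use. One common choice is JSON Merge Patch (RFC 7396), where a body of {"status": "discarded"} changes only that member and null removes a member. A sketch of the merge algorithm plus a hypothetical handler that ties the extra discard work to the status change (the machine fields and the unassign rule are assumptions):

```python
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7396) to a parsed JSON value."""
    if not isinstance(patch, dict):
        return patch  # non-object patches replace the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)  # null means "remove this member"
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


def patch_machine(machine: dict, patch: dict) -> dict:
    """Hypothetical PATCH /machines/:id handler with a discard side effect."""
    updated = merge_patch(machine, patch)
    if updated.get("status") == "discarded" and machine.get("status") != "discarded":
        updated["assigned_users"] = []  # e.g. unassign everyone on discard
    return updated


machine = {"id": 7, "status": "active", "assigned_users": ["ann", "raj"]}
discarded = patch_machine(machine, {"status": "discarded"})
```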
I think the most generic way would be to POST a Machine object with { status: 'discarded' } to /machines/:id/.
Personally, I'd prefer the /machines/:id/discard approach, though. It might not be exactly in accordance with the spec, but it's clearer and easier to filter. For example, some users might be allowed to update a machine but not "archive" it.
Personally, I think POST should be used when the resource id is either unknown or not relevant to the update being made.
That makes PUT the method I would use, especially since you have other status types that will also need to be updated:
Path: /machines/id
Message body: {"status":"discarded"}

RESTful way to preallocate an ID

For various reasons, my application needs an API that flows like this:
Client calls the server to get an ID for a new resource.
Then the user spends a while filling out the forms for the resource.
Then the user clicks save (or not...), and when he does, the client saves by writing to /myresource/{id}.
What is the RESTful way to design this?
What should the first call look like? On the server side, it's a matter of generating an ID and returning it. It has side effects (it increments a sequence and thus "reserves space"), but it doesn't explicitly create a resource.
If I understand correctly, the 3rd call should be a PUT because it creates something with a known URI.
One way you could do it is:
client POSTs empty body to /myresource/
server answers with status code 302 (Found) with a Location response header set to /myresource/newresourceid (to indicate the ID / URI that should be used to create the resource)
client PUTs the new resource to /myresource/newresourceid once the user is done filling the form.
Seems RESTful enough. ;)
I'm interested to see the other answers to this question as I imagine there's a lot of ways to do this.
If possible, I would let the auto-incrementing ID in your database serve as your surrogate key and use another field as your business identifier. It could be something like a product code or a GUID.
With this in mind, the client can now create the ID itself, which removes the need for step 1 entirely. It would do a PUT to a URL such as /myresource/MLN5001 or /myresource/3F2504E0-4F89-11D3-9A0C-0305E82C3301 to create the resource. If the ID is already in use, return a 409 Conflict with the conflict described in the response body. Otherwise, return a 201 Created and include the URI of the resource in the response body and the Location header.
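A minimal sketch of that create-by-PUT handler with an in-memory store (the store and function names are hypothetical, and a real handler would also need to guard against two concurrent PUTs with the same id):

```python
from http import HTTPStatus

# Hypothetical in-memory store keyed by the client-chosen business id.
resources: dict[str, dict] = {}


def put_resource(resource_id: str, body: dict) -> tuple[int, dict, dict]:
    """Sketch of PUT /myresource/{id} where the client picks the id."""
    if resource_id in resources:
        # 409 Conflict, describing the conflict in the response body.
        return HTTPStatus.CONFLICT, {}, {"error": f"{resource_id} is already in use"}
    resources[resource_id] = body
    location = f"/myresource/{resource_id}"
    # 201 Created, with the new URI in both the Location header and the body.
    return HTTPStatus.CREATED, {"Location": location}, {"uri": location}


first = put_resource("MLN5001", {"name": "widget"})
again = put_resource("MLN5001", {"name": "gadget"})
```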
I would go with
GET /myresource/new-id
POST /myresource/{id}
Your walkthrough is pretty clear on the verb:
"to GET [an] ID for a new resource"
You could rename new-id to whatever you think makes it clearest. If you have multiple resources you need to do this for, it would probably be better to split the generator out into its own resource, such as
GET /id-generator/myresource
GET /id-generator/my-other-resource
If there are multiple cases, users will quickly learn they need to hit id-generator to get their ID. If there is only one case, a separate generator resource they rarely use is just an annoyance.
I guess you could also do
GET /myresource-id-generator/next
which looks a little clearer, but then if you ever need another ID to be generated you have to make another resource to do it.
ID allocation is non-idempotent (two invocations of the allocation operation will get different IDs), so it should always be a POST. From that point on, the resource should conceptually exist. However, what I'd do at that point is fill it out with reasonable default values (whether that involves doing POSTs or PUTs is rather immaterial to the RESTfulness of the overall design), so the user can then take their time to change whatever they want.
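A minimal in-memory sketch of the flow described here, assuming a counter for id allocation and a dict of drafts with default values (all names and defaults are hypothetical): POST allocates an id and creates the resource, and PUT later replaces it with the user's edits:

```python
import itertools
from http import HTTPStatus

_ids = itertools.count(1)  # stands in for a database sequence
drafts: dict[int, dict] = {}


def post_myresource() -> tuple[int, dict]:
    """Sketch of POST /myresource/: allocate an id and create the resource.

    Not idempotent: each call allocates a fresh id, hence POST.
    """
    new_id = next(_ids)
    drafts[new_id] = {"title": "", "public": False}  # hypothetical defaults
    return HTTPStatus.CREATED, {"Location": f"/myresource/{new_id}"}


def put_myresource(resource_id: int, body: dict) -> int:
    """Sketch of PUT /myresource/{id}: replace the resource with the user's edits."""
    if resource_id not in drafts:
        return HTTPStatus.NOT_FOUND
    drafts[resource_id] = body
    return HTTPStatus.NO_CONTENT


status, headers = post_myresource()
_, other_headers = post_myresource()  # a second POST gets a different id
new_id = int(headers["Location"].rsplit("/", 1)[1])
put_status = put_myresource(new_id, {"title": "My thing", "public": True})
```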
The question then becomes one of whether there should be some kind of “release this; I'm done with altering it” operation at the end. Strict RESTfulness says there shouldn't, as if you know the resource identifier (the URL) then you can talk about it. On the other hand, that doesn't mean the hosting server has to tell anyone else about the resource until the creating user is happy with it; general HATEOAS principles say nothing about when others can discover that a resource exists or whether knowing the name lets you read the thing, but it's entirely reasonable to deny to third parties that a resource exists until the owner of the resource turns on the “make this public” flag.

Event Sourcing using NHibernate

I'm moving from a pure DDD paradigm to CQRS. My current concern is with Event Sourcing and, more specifically, organizing the Event Store. I've read tons of blog posts but still can't understand some things, so correct me if I'm wrong.
Each event basically consists of:
- Event date/time
- type of Event (we can figure out type of AggregateRoot from this as well)
- AggregateRoot id (Guid)
- AggregateRoot version (to maintain the order of updates)
- Event data (some serialized class with data necessary to make update)
Now, if my Event data consists of simple value types (ints, strings, enums, etc.) then it's easy. But what if I have to pass another AggregateRoot? I can't serialize the whole AR as a part of Event data (think of all the data and lazy loading), basically I only need to store Id of that AR. But then, when I need to apply that event, I'd need to get that AR from database first. And it doesn't feel right to do so from my Domain Model (calling Repositories and working with AR Ids).
What's the best approach for this?
P.S. For a concrete example, let's assume there's a model that consists of Task and User entities (both ARs). A Task holds a reference to the responsible User, but the responsible User can be changed.
Update: I think I've found the source of my confusion. I believe event sourcing should be used only for building the read model, and in that case passing Ids and raw data is OK. But the same events are also applied to the aggregates themselves, and this is what I cannot understand.
In DDD, an aggregate is a consistency/invariant boundary, so one aggregate may never depend on another to maintain its invariants. Once we apply this design restriction, we find very few situations where it is necessary to store a full reference to the other aggregate; usually we store its id, (if necessary) its version, and a copy of the relevant attributes.
For example, in the usual Order/LineItem and Product problem, we would copy the Product's id and price into the LineItem instead of holding a full reference. This prevents changes in the Product's price from affecting the Order/LineItem aggregate's invariants. If it is necessary to update the LineItem price after the Product price changes, we need to keep track of the PriceChanged event from the Products in use and send a compensating command to the Order/LineItem. Usually this coordination/synchronization is handled by a saga.
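A small sketch of that idea in Python (class names follow the example; the fields are assumptions). The LineItem keeps the Product's id and a copy of the price at the time of ordering, so a later price change in the Product aggregate does not alter it:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Product:
    """Its own aggregate; its price may change at any time."""
    product_id: str
    price: Decimal


@dataclass(frozen=True)
class LineItem:
    """Part of the Order aggregate: Product id plus a *copy* of its price."""
    product_id: str
    unit_price: Decimal
    quantity: int

    @property
    def total(self) -> Decimal:
        return self.unit_price * self.quantity


product = Product("p-1", Decimal("9.99"))
item = LineItem(product.product_id, product.price, quantity=3)
product.price = Decimal("12.50")  # a price change inside the Product aggregate
```

Propagating the new price, if the business wants that, would be the saga's job, via a compensating command to the Order.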
In Event Sourcing, the state of the aggregate is defined by events, and nothing more. All the domain model machinery (à la DDD) is there just to decide which domain events should be raised. An event should know nothing about your domain; it should be a simple DTO. In fact, it is perfectly OK to have Event Sourcing without DDD.
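A minimal sketch of event-sourced state for the Task/User example from the question, assuming events are plain immutable DTOs and the responsible User is referenced only by id (event and field names are hypothetical). The aggregate's state is rebuilt purely by replaying its events, and its methods only decide which new events to raise:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TaskCreated:
    task_id: str
    title: str


@dataclass(frozen=True)
class ResponsibleUserChanged:
    task_id: str
    user_id: str  # only the other aggregate's id, not the User itself


Event = Union[TaskCreated, ResponsibleUserChanged]


class Task:
    def __init__(self, events: list[Event]) -> None:
        self.task_id: Optional[str] = None
        self.title = ""
        self.responsible_user_id: Optional[str] = None
        self.version = 0
        for event in events:  # state comes from the events alone
            self._apply(event)

    def _apply(self, event: Event) -> None:
        if isinstance(event, TaskCreated):
            self.task_id, self.title = event.task_id, event.title
        elif isinstance(event, ResponsibleUserChanged):
            self.responsible_user_id = event.user_id
        self.version += 1

    def assign_to(self, user_id: str) -> list[Event]:
        """Decide which events to raise; the caller appends them to the store."""
        if user_id == self.responsible_user_id:
            return []  # no change, no event
        event = ResponsibleUserChanged(self.task_id, user_id)
        self._apply(event)
        return [event]


history = [TaskCreated("t-1", "Write report"), ResponsibleUserChanged("t-1", "u-1")]
task = Task(history)
new_events = task.assign_to("u-2")
```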
As I understand Event Sourcing, it is supposed to help people get rid of relational data models and ORMs like NHibernate or Entity Framework, since each of them is a science of its own. Programmers could then simply focus on business logic. I have seen some relational schemas for event stores here, and they were simply ID, Version, Timestamp plus an NClob or NVarchar(max) column to store the event payload schema-less.
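A sketch of such a schema-less event store, using SQLite as a stand-in for the relational database (table and column names are assumptions; the event_type column mirrors the event type field from the question). The composite primary key on (aggregate_id, version) doubles as an optimistic-concurrency check:

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           aggregate_id TEXT NOT NULL,
           version      INTEGER NOT NULL,
           timestamp    TEXT NOT NULL,
           event_type   TEXT NOT NULL,
           payload      TEXT NOT NULL,           -- schema-less JSON
           PRIMARY KEY (aggregate_id, version)   -- rejects conflicting writers
       )"""
)


def append(aggregate_id: str, expected_version: int, event_type: str, data: dict) -> None:
    """Append one event; raises sqlite3.IntegrityError if that version exists."""
    with conn:
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
            (aggregate_id, expected_version + 1,
             datetime.now(timezone.utc).isoformat(), event_type, json.dumps(data)),
        )


def load(aggregate_id: str) -> list[tuple[str, dict]]:
    """Return (event_type, data) pairs in version order."""
    rows = conn.execute(
        "SELECT event_type, payload FROM events WHERE aggregate_id = ? ORDER BY version",
        (aggregate_id,),
    )
    return [(event_type, json.loads(payload)) for event_type, payload in rows]


append("task-1", 0, "TaskCreated", {"title": "Write report"})
append("task-1", 1, "ResponsibleUserChanged", {"user_id": "u-1"})
history = load("task-1")
```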