How to model the domain with Amazon S3

Let's say I have an Illustration entity which is an aggregate root. That entity contains some data about the artwork and is persisted in a SQL database, while the artwork itself is persisted on Amazon S3. I also want to save some scaled-down or thumbnail versions of the artwork, so I introduced a Blob entity in a many-to-one relationship with Illustration to represent the binary data of the artwork in its various versions.
Now I wonder how I should design the persistence of Blobs. Amazon S3 is a kind of database (please don't start a flame war about what a true database is ;) ), just a different kind than SQL, and I think it should be abstracted accordingly, that is, by a Repository. So I would have a BlobRepository where I could store artwork data. On the other hand, in this domain a Blob is definitely not an aggregate root: it is always used as part of the Illustration aggregate, so it shouldn't have its own repository.
So maybe Amazon S3 should not be treated as a persistence technology, but rather as a generic external service, next to EmailSender, CurrencyConverter, etc.? If so, where should I inject this service: into Illustration entity methods, IllustrationsRepository, or the application service layer?

First of all, when dealing with DDD there is no many-to-one or any other RDBMS concept, because in DDD the database does not exist; everything is sent to a Repository. If you're using an ORM, know that the ORM entities are NOT Domain entities, they are Persistence objects.
That being said, I think the Illustration Repository should abstract both the RDBMS and S3. These are persistence implementation details, the repo should deal with them. Basically the repo will receive the Illustration AR, which will be saved partly in RDBMS and partly as a blob in S3.
So the Domain doesn't know, and shouldn't know, about Amazon S3. Maybe tomorrow you'll want to use Azure DB instead, so why should the Domain care about it? It's the Repository's responsibility to deal with it.
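To make the idea concrete, here is a minimal sketch (in Java, for illustration) of a repository that hides both stores behind one interface. The class and method names (MetadataStore, BlobStore, IllustrationRepository) are hypothetical, not from any real library; real adapters would wrap JDBC and the AWS S3 SDK.

```java
import java.util.HashMap;
import java.util.Map;

// Domain aggregate: knows nothing about SQL or S3.
class Illustration {
    final String id;
    final String title;
    final byte[] artwork; // original binary, part of the aggregate
    Illustration(String id, String title, byte[] artwork) {
        this.id = id; this.title = title; this.artwork = artwork;
    }
}

// Ports the repository depends on; concrete adapters would wrap
// the RDBMS and Amazon S3 respectively.
interface MetadataStore { void save(String id, String title); String loadTitle(String id); }
interface BlobStore { void put(String key, byte[] data); byte[] get(String key); }

// The repository is the only place that knows the aggregate is split
// across two storage technologies.
class IllustrationRepository {
    private final MetadataStore metadata;
    private final BlobStore blobs;
    IllustrationRepository(MetadataStore m, BlobStore b) { metadata = m; blobs = b; }

    public void save(Illustration ill) {
        metadata.save(ill.id, ill.title);            // row in the RDBMS
        blobs.put("artwork/" + ill.id, ill.artwork); // object in S3
    }

    public Illustration getById(String id) {
        return new Illustration(id, metadata.loadTitle(id), blobs.get("artwork/" + id));
    }
}
```

Callers deal only with the Illustration aggregate; swapping S3 for Azure storage later would mean writing a new BlobStore adapter and touching nothing in the Domain.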

Related

Using an ORM in Dapr

Currently I am learning Dapr, and I find it powerful for microservices development, but after reading more I found some limitations, the most important of which is state management.
How can we use tools like EF Core or Dapper for interaction with databases in Dapr? Is DaprClient class the only way of interaction with databases?
You should continue to use those ORMs alongside Dapr, as it's not a replacement for them. ORMs already act as a translation layer (a Repository layer) for relationally stored data. State management is simple KVP storage with some minimal query ability. Think of your state as a Dictionary<string, object>, where that object can be anything simple or complex. This is well suited to technologies like Redis, DynamoDB, MongoDB, Cassandra, etc. Again, what you would put in a cache is what you would put in state storage. So you can still (and should) have an ORM in your service for relational data and pass in runtime configuration for the provider, connection string, etc., while being able to utilize all the other features of Dapr via DaprClient, HttpClient, or gRPC clients.
As a bonus, with many providers in EF Core/EF you can add tracing headers (https://github.com/opentracing-contrib/csharp-netcore) that will seamlessly plug into the tracing provided for free with Dapr. That way you will see a full end-to-end trace all the way through your state and relational data.
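The "Dictionary<string, object>" mental model above can be sketched as a toy class (Java here, for illustration; FakeStateStore is a hypothetical stand-in, not the real DaprClient, which talks to a component such as Redis or DynamoDB over HTTP/gRPC):

```java
import java.util.HashMap;
import java.util.Map;

// A toy mental model of Dapr state management: state is just a key/value
// dictionary whose values can be any simple or complex object, which is
// why it suits Redis, DynamoDB, MongoDB, Cassandra, etc.
class FakeStateStore {
    private final Map<String, Object> state = new HashMap<>();

    public void saveState(String key, Object value) { state.put(key, value); }

    @SuppressWarnings("unchecked")
    public <T> T getState(String key) { return (T) state.get(key); }
}
```

Anything you would happily put in a cache fits this shape; anything that needs joins or ad hoc relational queries belongs behind your ORM instead.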

Sending data using Avro objects, is there an advantage to using schema registry?

I have an application where I generate Avro objects from an AVSC file and then produce using those objects. I can consume them with the same schema in another application if I wish, by generating the POJOs there. This is done with an Avro plugin. The thing I noticed is that the schema doesn't exist in the schema registry.
I think if I change my producer type/settings it might create it there (I am using Spring Kafka). Is there any advantage to having it there? Is what I am doing at the minute just serialization of data; is it the same as, say, just creating GSON objects from data and producing them?
Is it bad practice not having the schema in the registry?
To answer the question "is there an advantage": yes. At the least, it allows other applications to discover what is contained in the topic, whether that's another Java application using Spring or not.
You don't require the schemas to be contained within the consumer codebase
And if you're using the Confluent serializers, there's no way to "skip" schema registration by default, so the schemas should end up in the Registry under "your_topic-value".
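For reference, a sketch of the producer settings that trigger this behaviour (broker and registry URLs below are placeholders; the serializer class names are Confluent's, and auto.register.schemas defaults to true in Confluent's serializer, so the first produced record registers its schema under "<topic>-value"):

```java
import java.util.Properties;

class AvroProducerConfig {
    // Typical settings when producing with Confluent's KafkaAvroSerializer.
    // With auto.register.schemas left at its default, producing a record
    // registers its schema with the Schema Registry on first use.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }
}
```

If the schema is missing from your Registry, the usual explanation is that the producer is using a plain Avro (or JSON/GSON-style) serializer rather than the registry-aware one above.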

Best Practice for Restoring Both S3 and DynamoDB Together

One of the things I see becoming more of a problem in microservice architecture is disaster recovery. For instance, a common pattern is to store large data objects such as multimedia in S3, while JSON data goes in DynamoDB. But what happens when a hacker comes along and manages to delete a whole bunch of data from your DynamoDB?
You also need to make sure your S3 bucket is restored to the same state it was in at that time, but are there elegant ways of doing this? The concern is that it will be difficult to guarantee that the S3 backup and the DynamoDB database are in sync.
I am not aware of a solution for a genuine synchronised backup-restore across services. However, you could use DynamoDB's native point-in-time restore and the third-party s3-pit-restore library to restore both services to a common point in time.
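A sketch of what that orchestration might look like (Java, with hypothetical DynamoRestorer/S3Restorer interfaces; real adapters would call DynamoDB's point-in-time restore API and the s3-pit-restore tool). The one important design decision is choosing a single timestamp up front and handing the same instant to both restores:

```java
import java.time.Instant;

// Hypothetical ports for the two restore mechanisms.
interface DynamoRestorer { void restoreTable(String table, Instant pointInTime); }
interface S3Restorer { void restoreBucket(String bucket, Instant pointInTime); }

// The orchestrator's only job is to pick ONE timestamp and pass that same
// instant to both restores, so both stores land on a common point in time.
class DisasterRecovery {
    private final DynamoRestorer dynamo;
    private final S3Restorer s3;

    DisasterRecovery(DynamoRestorer dynamo, S3Restorer s3) {
        this.dynamo = dynamo;
        this.s3 = s3;
    }

    public Instant restoreBoth(String table, String bucket, Instant pointInTime) {
        dynamo.restoreTable(table, pointInTime);
        s3.restoreBucket(bucket, pointInTime);
        return pointInTime;
    }
}
```

This doesn't make the two restores transactional, but it does remove the easiest way to get them out of sync: restoring each service to a separately chosen time.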

Is this a good use-case for Redis on a ServiceStack REST API?

I'm creating a mobile app and it requires an API service backend to get/put information for each user. I'll be developing the web service on ServiceStack, but I was wondering about the storage. I love the idea of a fast in-memory caching system like Redis, but I have a few questions:
I created a sample schema of what my data store should look like. Does this seem like a good case for using Redis as opposed to a MySQL DB or something like that?
schema http://www.miles3.com/uploads/redis.png
How difficult is the setup for persisting the Redis store to disk, or is it kind of built in when you do writes to the store? (I'm a newbie to this NoSQL stuff)
I currently have my setup on AWS using a Linux micro instance (because it's free for a year). I know many factors go into this answer, but in general will this be enough for my web service and Redis? Since Redis is in-memory, will that be enough? I guess if my mobile app skyrockets (hey, we can dream, right?) then I'll start hitting the ceiling of the instance.
What to think about when designing a NoSQL Redis application
1) To develop correctly with Redis you should be thinking more about how you would structure the relationships in your C# program, i.e. with the C# collection classes, rather than a Relational Model meant for an RDBMS. The better mindset is to think of data storage more like a Document database than RDBMS tables. Essentially everything gets blobbed in Redis via a key (index), so you just need to work out which are your primary entities (i.e. aggregate roots), each of which gets kept in its own 'key namespace', and which are non-primary entities, i.e. simply metadata that should just get persisted with its parent entity.
Examples of Redis as a primary Data Store
Here is a good article that walks through creating a simple blogging application using Redis:
http://www.servicestack.net/docs/redis-client/designing-nosql-database
You can also look at the source code of RedisStackOverflow for another real world example using Redis.
Basically you would need to store and fetch the items of each type separately.
var redisUsers = redis.As<User>();   // typed client scoped to the User entity's key namespace
var user = redisUsers.GetById(1);
var userIsWatching = redisUsers.GetRelatedEntities<Watching>(user.Id);
The way you store relationships between entities is by making use of Redis's Sets; e.g. you can store the Users/Watchers relationship conceptually with:
SET["ids:User>Watcher:{UserId}"] = [{watcherId1},{watcherId2},...]
Redis is schema-less and idempotent
Storing ids in Redis sets is idempotent, i.e. you can add watcherId1 to the same set multiple times and it will only ever have one occurrence. This is nice because it means you don't ever need to check for the existence of the relationship and can freely keep adding related ids as if they'd never existed.
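The SADD semantics described above can be modelled with a plain in-memory set (sketched in Java for illustration; WatcherRelationship is a hypothetical name, and a real implementation would issue SADD against a Redis client instead):

```java
import java.util.HashSet;
import java.util.Set;

// Redis SADD semantics modelled with a plain Set: adding the same member
// repeatedly still leaves exactly one occurrence, so writers never need
// an existence check before recording a relationship.
class WatcherRelationship {
    private final Set<String> watcherIds = new HashSet<>();

    // Safe to call any number of times for the same watcher, like SADD.
    public void addWatcher(String watcherId) { watcherIds.add(watcherId); }

    public int watcherCount() { return watcherIds.size(); }
}
```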
Related: writing to or reading from a Redis collection (e.g. a List) that does not exist is the same as writing to an empty collection; i.e. a list gets created on the fly when you add an item to it, whilst accessing a non-existent list simply returns 0 results. This is a friction-free productivity win since you don't have to define your schemas up front in order to use them. Although, should you need to, Redis provides the EXISTS operation to determine whether a key exists and the TYPE operation to determine its type.
Create your relationships/indexes on your writes
One thing to remember is that because there are no implicit indexes in Redis, you will generally need to set up the indexes/relationships needed for reading yourself during your writes. Basically you need to think about all your query requirements up front and ensure you set up the necessary relationships at write time. The RedisStackOverflow source code above is a good example of this.
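A small sketch of that write-time indexing (Java, with in-memory maps standing in for Redis keys; QuestionStore and the key layout are hypothetical, loosely following the urn/ids naming used in examples like RedisStackOverflow):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Because Redis has no implicit indexes, the write path must create every
// structure the read path will need: here, an entity blob plus a
// "questions by user" index set, both written in the same save().
class QuestionStore {
    private final Map<String, String> blobs = new HashMap<>();       // urn:question:{id} -> serialized entity
    private final Map<String, Set<String>> byUser = new HashMap<>(); // ids:User>Question:{userId} -> question ids

    public void save(String questionId, String userId, String json) {
        blobs.put("urn:question:" + questionId, json);
        // The index is maintained at write time, not computed at read time.
        byUser.computeIfAbsent("ids:User>Question:" + userId, k -> new HashSet<>())
              .add(questionId);
    }

    public Set<String> questionIdsFor(String userId) {
        return byUser.getOrDefault("ids:User>Question:" + userId, Collections.emptySet());
    }
}
```

If you later need a new query (say, questions by tag), you have to add another index set and backfill it; there is no CREATE INDEX to run after the fact.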
Note: the ServiceStack.Redis C# provider assumes you have a unique field called Id that is its primary key. You can configure it to use a different field with the ModelConfig.Id() config mapping.
Redis Persistance
2) Redis supports two persistence modes out of the box: RDB and the Append Only File (AOF). RDB writes routine snapshots, whilst the Append Only File acts like a transaction journal recording all the changes in between snapshots. I recommend enabling both until you're comfortable with what each does and what your application needs. You can read all about Redis persistence at http://redis.io/topics/persistence.
Note: Redis also supports trivial replication, which you can read more about at http://redis.io/topics/replication
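For orientation, a redis.conf excerpt enabling both modes (the thresholds below are illustrative values, not a recommendation; the directive names are standard Redis configuration):

```
# redis.conf excerpt: RDB snapshots plus the AOF journal
save 900 1            # RDB: snapshot if at least 1 key changed in 900s
save 300 10           # RDB: snapshot if at least 10 keys changed in 300s
appendonly yes        # AOF: journal every write between snapshots
appendfsync everysec  # AOF: fsync the journal once per second
```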
Redis loves RAM
3) Since Redis operates predominantly in memory, the most important resource is having enough RAM to hold your entire dataset, plus a buffer for when it snapshots to disk. Redis is very efficient, so even a small AWS instance will be able to handle a lot of load; what you want to watch for is having enough RAM.
Visualizing your data with the Redis Admin UI
Finally, if you're using the ServiceStack C# Redis Client, I recommend installing the Redis Admin UI, which provides a nice visual view of your entities. You can see a live demo of it at:
http://servicestack.net/RedisAdminUI/AjaxClient/

Eventual Consistency and messaging

I came across this problem, and so far it seems that the only solution is a stronger consistency model. The service is Amazon S3, which provides eventual consistency. We use it as a blob storage backend.
The problem is, we introduced a messaging pattern to our application, and we love it. There's no doubt about its benefits. However, it seems to demand stronger consistency. Scenario:
subsystem acquires data from user
data is saved to S3
message is sent
message is received by another subsystem
data is read from S3
...crickets. Is this the old data? Sometimes it is.
So we tried the obvious: send the data in the message to avoid an inconsistent read from S3. But that's a pretty nasty thing to do; the messages get unnecessarily big, and when the receiver is too busy or goes down and only receives the message late, while newer data is already available, it fails.
Is there a solution to this or do we really need to dump S3 for some more consistent backend like RDBMS or MongoDB?
If your scenario allows for your data to always be written to S3 under a new key (by always creating new objects), then you can rely on Amazon's read-after-write consistency.
Here is Amazon's description of this consistency model:
Amazon S3 buckets in the US West (Northern California), EU (Ireland), Asia Pacific (Singapore), and Asia Pacific (Tokyo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency.
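One way to stay on the read-after-write path is to never overwrite: each write mints a brand-new key, and the message carries that exact key, so a consumer can never pick up a stale overwrite. A minimal sketch (Java; the key layout and class name are hypothetical):

```java
import java.util.UUID;

// Every write creates a new, never-reused S3 key; the message then carries
// this exact key instead of a mutable "latest" key.
class VersionedKeys {
    public static String newKey(String entityId) {
        return "artwork/" + entityId + "/" + UUID.randomUUID();
    }
}
```

The trade-off is that old versions accumulate, so you would typically pair this with a lifecycle rule or cleanup job that deletes superseded objects.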