This is an issue I have struggled with in a number of systems, but this one is a good example. It arises when one system consumes WCF services from another system, each system has its own database, and there are relationships between the two databases.
We have a central database that holds a record of all documents in the company. This database includes Document and Folder tables and it mimics a Windows file structure. NHibernate takes care of data access, a domain layer handles logic (validating filenames, preventing identical filenames in the same folder, etc.) and a service layer sits on top of that, with services named 'CreateDocument(byte[])', 'RenameDocument(id, newName)', 'SearchDocuments(filename, filesize, createdDate)' etc. These services are exposed with WCF.
An HR system consumes these services. The HR system has its own separate database with foreign keys to the Document database: it contains an HRDocument table with a foreign key DocumentId, plus HR-specific columns such as EmployeeId and ContractId.
Here are the problems, among others:
1) In order to save a document, I have to call the WCF service to save it to the central DB, get the ID back, and then save to the HRDocument table (along with the HR-specific information). Because of the WCF call, and because all Document-specific data access is done within the Document application, this can't all be done within one transaction, resulting in a possible loss of transactional integrity.
2) In order to search on, say, employeeId and createdDate, I have to call the search service passing in the createdDate (a Document-database-specific field) and then search the HRDocument table on the IDs of the returned records to filter the results. This feels messy, slow and just wrong.
I could duplicate the NHibernate mapping files for the Document database in the DAL of the HR application. That would let me specify the relationship between HRDocument and Document, so I could join the tables and search that way, but it would also mean duplicating domain logic and violating the DRY principle, with all that entails.
I can't help feeling I'm doing something wrong here and have missed something simple.
I recommend applying CQRS and Event-Driven Architecture principles here:
* Use GUIDs as primary keys: then you will be able to generate the primary key for the document yourself and pass it in the WCF method call (see the sketch after this list).
* Use messaging on the other side of the WCF service to prevent data loss (in case of database failure and the like).
* Remove constraints between databases: immediately consistent applications don't scale; use the eventual consistency paradigm instead.
* Introduce a separate data store for read purposes that contains denormalized data. Then search becomes very easy. To ensure consistency in your read storage (in case Document creation fails) you could implement a simple workflow (a saga in CQRS terms).
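As a minimal sketch of the first point (the GUID primary key): the service contract, repository and entity names below are assumptions for illustration, not the original system's types. The HR side generates the key itself, so it never has to wait for the Document database to hand back an identity value.

using System;

public interface IDocumentService           // assumed WCF contract shape
{
    void CreateDocument(Guid documentId, string fileName, byte[] content);
}

public interface IHRDocumentRepository      // assumed local HR data access
{
    void Save(HRDocument document);
}

public class HRDocument
{
    public Guid DocumentId { get; set; }
    public int EmployeeId { get; set; }
    public int ContractId { get; set; }
}

public class HRDocumentWorkflow
{
    private readonly IDocumentService _documents;
    private readonly IHRDocumentRepository _hrDocuments;

    public HRDocumentWorkflow(IDocumentService documents, IHRDocumentRepository hrDocuments)
    {
        _documents = documents;
        _hrDocuments = hrDocuments;
    }

    public Guid CreateContractDocument(byte[] content, string fileName, int employeeId, int contractId)
    {
        // The caller generates the primary key, so both systems agree on the id
        // without a blocking round trip to the Document database.
        var documentId = Guid.NewGuid();

        // Ideally this call is a durable message rather than a synchronous WCF call,
        // so a Document-side failure can be retried later.
        _documents.CreateDocument(documentId, fileName, content);

        // Save the HR-specific row keyed by the same id.
        _hrDocuments.Save(new HRDocument { DocumentId = documentId, EmployeeId = employeeId, ContractId = contractId });

        return documentId;
    }
}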
You can create a common codebase which includes a base implementation of Document along with all the mappings, the base domain model, etc.
The Document Service and the HR system both use this codebase, but in the HR system you extend the base Document class (or classes) with your HRDocument, using whichever inheritance mapping strategy suits your needs best.
public class HRDocument : Document
From the HR system you don't even have to call the Document Service anymore; you just use NHibernate and enjoy ACID and all that. The Document Service is still there, and there's no code duplication.
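A minimal sketch of that shared-codebase idea, using NHibernate's mapping-by-code and a table-per-subclass (joined-subclass) strategy purely for illustration; the property and table names are assumptions, not the original project's:

using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

// Shared codebase: the base entity owned by the Document service.
public class Document
{
    public virtual int Id { get; protected set; }
    public virtual string FileName { get; set; }
}

// HR system: extends the shared entity with HR-specific data.
public class HRDocument : Document
{
    public virtual int EmployeeId { get; set; }
    public virtual int ContractId { get; set; }
}

// Base mapping, kept in the shared codebase.
public class DocumentMap : ClassMapping<Document>
{
    public DocumentMap()
    {
        Table("Document");
        Id(x => x.Id, m => m.Generator(Generators.Identity));
        Property(x => x.FileName);
    }
}

// HR-side mapping: HRDocument rows live in their own table,
// joined to Document through the shared DocumentId key.
public class HRDocumentMap : JoinedSubclassMapping<HRDocument>
{
    public HRDocumentMap()
    {
        Table("HRDocument");
        Key(k => k.Column("DocumentId"));
        Property(x => x.EmployeeId);
        Property(x => x.ContractId);
    }
}

With a mapping along these lines, the HR system can query HRDocument and the shared Document columns in a single join, while the Document service keeps mapping only the base class.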
The database we are designing allows users to authenticate with multiple 3rd party services, mostly social media (twitter, facebook, etc). There will be an unknown and growing number of these services. Each service requires a unique set of data for authentication that is not standard with the other services.
One user may authenticate with many services, but they may only authenticate with one of each type of service.
Possible Solutions:
A) The most direct solution to this issue is to simply add a column for each service to the user table which contains the JSON authentication for that service. However, this violates normalization by leaving a large number of nulls in the database. What happens when there are 50 of these integrations for instance?
B) Each service gets its own table in the database. JSON is no longer needed, as each field can be properly described. Then a lookup table, "user_has_service", is needed: a table containing only two foreign keys, one for the user and one for the service, linking them together. This option seems the most correct, but it is very inefficient and will take many operations to determine which services a user has, increasing with the number of services. I also believe that in this case the ID field for the lookup table would need to be some kind of hash of the user and service together so that duplicate inserts are not possible.
Not at all a database expert and I have been grappling with this one for quite a while. Any thoughts?
A) The most direct solution ... JSON
You are right, option A is grossly incorrect. It breaks Codd's First Normal Form, thus it is not Relational. NULL in the database is an indication of incomplete Normalisation, which leads to complex SQL code. To be avoided at all costs.
similar but unique
To be clear, that they are unique to the Service is true. That {LoginName; UserName; Email; UserId; etc} are all similar is true in the implementation sense only, not in the data.
I may need to sketch this out.
That is a great idea. A visual data model is far more effective, because (a) the mind can comprehend it much better than text, and (b) it can therefore work out details; contradictions; missing bits; etc. It is much easier to progress each iteration visually than with text.
Second, we have had visual modelling tools since 1987 (1984 for a closed group), which were made a Standard in 1993. Hopefully you appreciate that a standard-compliant model is better than a home-grown or corporate-supplied one. It displays all technical details rather than a small subset.
Is there a name for this strategy
It is plain old Relational Data Modelling, which includes Normalisation (ensuring compliance with Codd's Normal Forms, as opposed to the insanity of implementing the NFs in fragmented progressive steps).
Obstacle
One problem that needs to be understood and eliminated is this. The "theoreticians" market and propagate 1960's Record Filing Systems under the banner of "relational". That is characterised by Record IDs in every file. That method ensures the database remains physical, not logical, the very thing that Codd overcame with his Relational Model: a database that is logical and therefore extremely easy to navigate, by any querying party, current; planned; or unplanned.
The essential difference between 1960's RFS and post-1970 Relational Databases is:
whereas the RFS maintains references between Files by physical pointer (Record ID), the Relational Database maintains references between Tables by logical Key.
A logical Key is "made up from the data" as per Codd
(A datum that is fabricated by the system is not "made up from the data")
(Use of the SQL command PRIMARY KEY does not magically anoint the datum with the properties and qualities of a Relational Key: if you use PRIMARY KEY RecordID you are in 1960's physical paradigm, not the post-1970 Relational paradigm)
Logical Keys provide Relational Integrity (as distinct from Referential Integrity, which is an ordinary function of SQL), which is far superior to that obtained by 1960's RFS
As well as far superior Speed and Power (far fewer JOINs, and smaller sets)
Relational Database
Therefore I will give you the answer as a Relational Data Model, as per Codd.
Just one example of Relational Integrity:
the ServiceProperty FK elements in UserServiceProperty are constrained to the PK (a particular combination) in ServiceProperty
a UserServiceProperty row with Facebook.Email is prevented
A Record ID based 1960's RFS that the "theoreticians" promote as "relational" cannot do that, various errors such as that one are allowed.
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
The IDEF1X Anatomy is a refresher for those who have lapsed.
If you have trouble reading the Predicates directly from the Data Model, let me know and I will produce them in text form.
Please feel free to ask questions, the more specific the better.
You could set up:
a referential table called services to list all the available services, with columns like service_id (primary key), service_name, description and so on. Each service is represented as one record in this table.
a table called services_properties to store the properties of the services; this table has 3 columns: service_id (foreign key to the primary key of services), property_name and property_value. A unique constraint can be set up on (service_id, property_name) tuples to avoid duplicate properties for a service. Each service has several records in the services_properties table. This flexible structure lets you store as many different properties as needed for each service without creating a new table for each service.
a mapping table called user_services that relates users to services. Its columns would be service_id and user_id, foreign keys to the primary keys of the services and users tables. You can query this table to easily list the services subscribed to by each user.
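A rough sketch of that three-table layout, expressed as an EF Core model purely for illustration (the ORM choice and all entity names are assumptions; the same constraints can of course be declared directly in the database):

using Microsoft.EntityFrameworkCore;

public class Service
{
    public int ServiceId { get; set; }
    public string ServiceName { get; set; }
    public string Description { get; set; }
}

public class ServiceProperty
{
    public int ServicePropertyId { get; set; }
    public int ServiceId { get; set; }
    public string PropertyName { get; set; }
    public string PropertyValue { get; set; }
}

public class UserService
{
    public int UserId { get; set; }    // FK to an existing users table (assumed)
    public int ServiceId { get; set; }
}

public class AuthDbContext : DbContext
{
    public DbSet<Service> Services { get; set; }
    public DbSet<ServiceProperty> ServiceProperties { get; set; }
    public DbSet<UserService> UserServices { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // One row per (service, property name): prevents duplicate properties for a service.
        modelBuilder.Entity<ServiceProperty>()
            .HasIndex(p => new { p.ServiceId, p.PropertyName })
            .IsUnique();

        // Composite key on the mapping table: a user can subscribe to a given service only once.
        modelBuilder.Entity<UserService>()
            .HasKey(us => new { us.UserId, us.ServiceId });
    }
}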
I'm creating a mobile app and it requires an API service backend to get/put information for each user. I'll be developing the web service on ServiceStack, but I was wondering about the storage. I love the idea of a fast in-memory caching system like Redis, but I have a few questions:
I created a sample schema of what my data store should look like. Does this seem like a good case for using Redis as opposed to a MySQL DB or something like that?
schema http://www.miles3.com/uploads/redis.png
How difficult is the setup for persisting the Redis store to disk or is it kind of built-in when you do writes to the store? (I'm a newbie on this NoSQL stuff)
I currently have my setup on AWS using a Linux micro instance (because it's free for a year). I know many factors go into this answer, but in general will this be enough for my web service and Redis? Since Redis is in-memory will that be enough? I guess if my mobile app skyrockets (hey, we can dream right?) then I'll start hitting the ceiling of the instance.
What to think about when designing a NoSQL Redis application
1) To develop correctly in Redis you should be thinking more about how you would structure the relationships in your C# program, i.e. with the C# collection classes, rather than a relational model meant for an RDBMS. The better mindset is to think of data storage more like a document database than RDBMS tables. Essentially everything gets blobbed in Redis via a key (index), so you just need to work out which entities are primary (i.e. aggregate roots) and get kept in their own 'key namespace', and which are non-primary entities, i.e. simply metadata that should just get persisted with its parent entity.
Examples of Redis as a primary Data Store
Here is a good article that walks through creating a simple blogging application using Redis:
http://www.servicestack.net/docs/redis-client/designing-nosql-database
You can also look at the source code of RedisStackOverflow for another real world example using Redis.
Basically you would need to store and fetch the items of each type separately.
// assumes an open ServiceStack.Redis client, e.g. var redis = new RedisClient("localhost")
var redisUsers = redis.As<User>();                                      // typed client for User entities
var user = redisUsers.GetById(1);                                       // fetch the User with Id == 1
var userIsWatching = redisUsers.GetRelatedEntities<Watching>(user.Id);  // fetch the Watching entities related to that user
The way you store relationships between entities is by making use of Redis's Sets, e.g. you can store the Users/Watchers relationship conceptually with:
SET["ids:User>Watcher:{UserId}"] = [{watcherId1},{watcherId2},...]
Redis is schema-less and idempotent
Storing ids in Redis sets is idempotent, i.e. you can add watcherId1 to the same set multiple times and it will only ever have one occurrence. This is nice because it means you never need to check for the existence of the relationship and can freely keep adding related ids as though they had never been added.
Related: writing to or reading from a Redis collection (e.g. a List) that does not exist is the same as writing to an empty collection, i.e. a list gets created on the fly when you add an item to it, whilst accessing a non-existent list will simply return 0 results. This is friction-free and a productivity win, since you don't have to define your schemas up front in order to use them. Although, should you need to, Redis provides the EXISTS operation to determine whether a key exists and a TYPE operation so you can determine its type.
Create your relationships/indexes on your writes
One thing to remember is that because there are no implicit indexes in Redis, you will generally need to set up the indexes/relationships needed for reading yourself during your writes. Basically you need to think about all your query requirements up front and ensure you set up the necessary relationships at write time. The RedisStackOverflow source code above is a good example that shows this.
Note: the ServiceStack.Redis C# provider assumes you have a unique field called Id that is its primary key. You can configure it to use a different field with the ModelConfig.Id() config mapping.
Redis Persistence
2) Redis supports 2 persistence modes out of the box: RDB and Append Only File (AOF). RDB writes routine snapshots whilst the Append Only File acts like a transaction journal recording all the changes in between snapshots. I recommend enabling both until you're comfortable with what each does and what your application needs. You can read all about Redis persistence at http://redis.io/topics/persistence.
Note: Redis also supports trivial replication, which you can read more about at: http://redis.io/topics/replication
Redis loves RAM
3) Since Redis operates predominantly in memory, the most important resource is RAM: you need enough to hold your entire dataset in memory, plus a buffer for when it snapshots to disk. Redis is very efficient, so even a small AWS instance will be able to handle a lot of load; what you want to look for is having enough RAM.
Visualizing your data with the Redis Admin UI
Finally, if you're using the ServiceStack C# Redis Client, I recommend installing the Redis Admin UI, which provides a nice visual view of your entities. You can see a live demo of it at:
http://servicestack.net/RedisAdminUI/AjaxClient/
I have a requirement to create a customer in an external system whenever I create one in AX. The first solution that came to my mind was to consume the external party's web service from AX.
But how would that handle distributed transactions?
Your integration options are many, but I would not recommend inserting directly into SQL. That is a really bad solution. The obvious reason is that having records inserted outside of the AOT breaks the business logic persisted in the AOT. The database is only meant for persisting data and optimizing reads and writes; the business logic in the AOT is responsible for holding all the logic when creating and updating records.
In my opinion the only choice you have is through X++ code. Standard AX has a special class for this purpose, "AxCustTable". You want to use this class to make sure your customer data is in the best shape throughout the life of your integration.
As for transportation of data between the systems, your choices are many:
* Text files
* XML files
* Web services
* Reading data from a different SQL database
* etc.
Just make sure that when you write data to your CustTable, all the current and future business logic is still in effect. Inserting data directly into the CustTable at the database level would effectively break that dependency.
Good luck!
You didn't write out all the requirements fully, so this is what I'm thinking.
I would use a message queue for this. It can happen that the external system cannot save the customer at the moment the user creates the customer in AX (downtime or something else). I wouldn't let the external system stand too much in the way of my application...
How will it implement distributed transactions?
It will not!
But does it matter?
Customer data changes rarely and rarely needs transactional scope.
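A minimal sketch of that queued hand-off, using MSMQ via System.Messaging purely as an example transport; the queue path and message shape are assumptions, not part of standard AX:

using System.Messaging;

public class CustomerCreatedMessage
{
    public string CustomerAccount { get; set; }
    public string Name { get; set; }
}

public static class CustomerIntegration
{
    private const string QueuePath = @".\Private$\CustomerExport";   // assumed local private queue

    // Called when the customer is created in AX: enqueue and return immediately.
    public static void Publish(CustomerCreatedMessage message)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(CustomerCreatedMessage) });
            queue.Send(message, "CustomerCreated");
        }
    }

    // A separate worker drains the queue and calls the external system,
    // so downtime on their side never blocks customer creation in AX.
    public static CustomerCreatedMessage ReceiveNext()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(CustomerCreatedMessage) });
            var message = queue.Receive();   // blocks until a message arrives
            return (CustomerCreatedMessage)message.Body;
        }
    }
}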
You can create a Linked Server and use distributed transactions with direct SQL (the ODBCConnection class).
We have a Data Access Service (DAS) in our SOA WCF system. This service is responsible for doing CRUD (create, read, update, delete) operations on "system wide" database tables, and is also the source of this data for queries. Any other service in the system wanting to access the tables under the control of the DAS has to go to the DAS to read or modify them. We use Entity Framework and built our own POCO state tracking system for this DAS.
We have other tables in our database that belong to single services and store data only for their own use, i.e. state information they can access if they crash and resume, or records of business information. We have a rule that any one table cannot be accessed by more than one service, so data needed by multiple services ends up in the DAS.
Truth is, I have never really understood why a Data Access Service is a good idea as opposed to just accessing tables directly. It seems to me to be slower, our DAS is not transactional as it cannot send back a POCO graph for database update (only single POCOs at a time), and we also have issues where the DAS is actually a client to another service which needs data from it... circular dependency.
Why bother with a DAS? Why is a DAS so important when it comes to SOA? What am I missing here? Single point of control?
Is it also an SOA design flaw that not all tables are part of a DAS and that some services have their own "private" tables?
Any discussion about this welcome.
You're correct in thinking that this is the proper way to do things, and you're also correct that it slows things down and can occasionally be cumbersome. SOA necessarily trades off some efficiency in exchange for ensuring single points of control for all data associated with a service. In fact, even the idea of having a "common DAS" service is slightly smelly in some SOA circles.
By centralizing all CRUD operations in one service in an SOA application, you can ensure data integrity and that business rules are being acted upon properly. To give an example, think of an entity you'd like to store that has business rules associated with it that are difficult to approach from a pure SQL perspective: say, a table that stores file references, and create/update services that ensure those files exist.
With SOA and a single access point to those tables, you can code the logic into the create/update methods and be reasonably assured that the data you're receiving from the service is valid, i.e. the files referenced exist. If anyone were capable of writing to these tables or retrieving data from them, no such assurance would exist; even if you're calling the service yourself, you don't know whether other programmers, through malice or just plain forgetfulness, failed to implement that critical business rule. This leads to defensive programming where every bit of client code verifies business logic independently, and ultimately a tangled mess of business logic scattered throughout your application.
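As a rough illustration of keeping that rule in one place (the file-reference example above; every type name here is made up for the sketch):

using System;
using System.IO;

public class FileReference
{
    public int Id { get; set; }
    public string Path { get; set; }
}

public interface IFileReferenceRepository    // assumed data-access abstraction behind the service
{
    void Insert(FileReference reference);
}

public class FileReferenceService
{
    private readonly IFileReferenceRepository _repository;

    public FileReferenceService(IFileReferenceRepository repository)
    {
        _repository = repository;
    }

    public FileReference Create(string path)
    {
        // The business rule lives here, once, rather than in every caller:
        // a reference may only be stored if the file actually exists.
        if (!File.Exists(path))
            throw new InvalidOperationException("Cannot store a reference to a missing file: " + path);

        var reference = new FileReference { Path = path };
        _repository.Insert(reference);
        return reference;
    }
}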
Another benefit is scalability and maintainability. Let's say one of your services is accessing a huge chunk of data. With SOA, everything is "black-boxed" so that your client code doesn't have much knowledge of how the data is ultimately obtained. You could change your RDBMS, partition tables, or implement caching, and make all of that invisible to the client code calling it, ensuring your painful updates only need to be made in one place. With database code scattered throughout your app, this sort of upgrade becomes extremely painful.
We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on MSDN:
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer gets only the correct data is potentially error-prone. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
It depends on the type of application and the scale of the data. Each option has drawbacks.
1a) Separate databases + a single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time? What if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is that if your application is data intensive you have to worry about data scalability, for example if every customer is expected to have tens or hundreds of millions of rows of data. Then you will start to run into issues with query performance for individual customers due to the total customer-base volume. Clients are more forgiving of slowdowns caused by their own data volume; being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start, I would probably go with #2 for now, and begin looking at clustering or moving to a 1a/1b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB/Shared Schema with a TenantId). Some things to consider for a shared DB / same schema for all:
As mentioned above, a high volume of data for one tenant may affect the performance of the other tenants if you're not careful; for starters, index your tables properly/carefully and never ever run queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on, based on some criteria that makes sense for your domain.
Data separation is very, very important: you don't want to end up showing a piece of data to one tenant that belongs to another tenant. Every query must have a WHERE TenantId = ... clause in it, and you should be able to verify/enforce this during development (see the sketch after this list).
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can get around it by designing a way to extend the fields associated with the documents/tables in your domain where it makes sense (i.e. metadata for tables, as the MSDN article mentions).
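One way to make the "every query must filter on TenantId" rule hard to forget is to centralise it in the data access layer. The sketch below uses an EF Core global query filter purely as a modern illustration of the idea; the entity and the way the current tenant is resolved are assumptions:

using Microsoft.EntityFrameworkCore;

public class Invoice
{
    public int Id { get; set; }
    public int TenantId { get; set; }
    public decimal Amount { get; set; }
}

public class TenantDbContext : DbContext
{
    private readonly int _tenantId;   // resolved per request from the auth context (assumed)

    public TenantDbContext(DbContextOptions<TenantDbContext> options, int tenantId)
        : base(options)
    {
        _tenantId = tenantId;
    }

    public DbSet<Invoice> Invoices { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Every query against Invoice is automatically filtered to the current tenant,
        // so a forgotten WHERE clause cannot leak another tenant's rows.
        modelBuilder.Entity<Invoice>()
            .HasQueryFilter(i => i.TenantId == _tenantId);
    }
}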
What about solutions that provide an out-of-the-box architecture, like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time rather than at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I have a similar case, but my solution takes advantage of both approaches.
Where the data lives and how it is stored is a question tenants will ask. As a tenant, of course I don't want my data shared; I want it isolated and secure, and I want to be able to get it whenever I want.
Certain data can possibly be shared, e.g. a company list. So there should be a global database plus tenant databases; just make sure the tenant database schema is locked down operationally, and have a procedure to update all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server/web service, so no matter where the database lives it reaches the client as a service and is only rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.