WCF Data Service / Surrogate Key

I want to build a WCF data service which should be used for CRUD operations on a database backend. In order to identify the related record of the object in the database I have to know its primary key. I use surrogate keys in my database schema.
Is it good practice to pass the surrogate keys to the caller, so that it is possible to identify the records in the database in subsequent calls? (Caller retrieves object, caller modifies object, caller calls WCF update method.) I know that surrogate keys should normally not be used outside the database. If that is not a good idea, what other options do I have?
Any advice is greatly appreciated.

Yes, your solution is totally adequate. It's the simplest way to map CLR objects to persistent entities. Moreover, your services' consumers may find this unique identifier useful when programming the UI, for logging purposes, etc.
I'd go this way without hesitation.

I think this all depends on what type of data you are talking about. If you are talking about pure resource-driven data, there are no issues with exposing surrogate keys. However, if this is business data you should only expose business keys. Exposing business keys allows disjoint systems to talk to each other in a generic way.
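The round trip described in the question (caller retrieves object, caller modifies object, caller calls update) can be sketched as follows. This is only a minimal illustration in Python with an in-memory SQLite database standing in for the real backend; `CustomerDto`, `CustomerService` and the `customer` table are hypothetical names, not part of WCF or of the original question.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerDto:
    id: int    # surrogate key; the caller treats it as an opaque handle
    name: str


class CustomerService:
    """Hypothetical stand-in for the service's retrieve/update operations."""

    def __init__(self, conn):
        self.conn = conn

    def get(self, customer_id):
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return CustomerDto(*row) if row else None

    def update(self, dto):
        # The surrogate key carried by the DTO identifies the row to change.
        cur = self.conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (dto.name, dto.id)
        )
        self.conn.commit()
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES ('Alice')")

service = CustomerService(conn)
dto = service.get(1)            # caller retrieves the object
dto.name = "Alice Smith"        # caller modifies the object
updated = service.update(dto)   # caller calls the update method
```

The caller never needs to interpret the key; it only hands it back, which is what keeps the mapping between object and record simple.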

Related

Multiple data insertions using async writing with Apache Geode

We have Apache Geode connected to Postgres using an AEQ + AsyncEventListener configured to write data to Postgres. During an async write, we submit the list of events that we want to persist and it asynchronously inserts those events. Let's say I have two client applications which call processEvents for async writing, and both have some events in common which violate some key. After a client calls processEvents, control is immediately returned to the client. In such cases, how will the client know if some issue occurred? What are the best practices to tackle this?

What do you mean by the events in common "violate some key"? Like a primary or foreign key constraint, or some other database constraint perhaps (e.g. uniqueness, non-null values, etc)?
Handling a conflict depends on the importance and nature of the data being inserted, or written to the backend (Postgres) database from Geode and its significance to the application, from a requirements and business logic POV.
If 2 (or more) client applications are writing to the same cache/database entries/records, then certainly some type of collision will eventually occur, and how it is handled will depend on the data and the type of operation performed on the data.
In general, handling the violation closer to where and when the violation occurs (e.g. inside the AsyncEventListener itself) may be preferable, since then you should have most of the necessary information (e.g. DataAccessException, events, additional capabilities to query the DB) to deal with the situation.
Inside the AEQ Listener, you could employ different strategies depending on the data and operation as determined by the application:
First update wins (enforced by optimistic locking)
Perform a merge
Log [failed] event(s)
Overwrite value(s) (last update wins).
...
You could employ Geode to conflate events stored in the AEQ for the same key, which should minimize collisions/conflicts.
If the client (as in "client" in a client/server topology) needs to be informed, then you could write the failed events to another Region where a client registers a CQ to be notified when entries are written to this (failed events) Region. The client-side handler associated with the CQ could then take the appropriate action, such as notifying the end-user, refreshing and then retrying the operation, and so on.
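To make the idea concrete, here is a rough sketch of "handle the violation inside the listener and park the failed events where a client can be notified". It is plain Python with SQLite, not the Geode API: `process_events` only mimics the role of the listener callback, the `failed_events` dict stands in for the failed-events Region, and the `on_failure` callback stands in for a CQ handler. The `orders` table is a made-up example.

```python
import sqlite3


def process_events(conn, events, failed_events, on_failure=None):
    """Write each (key, value) event; park constraint violations instead of losing them."""
    for key, value in events:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO orders (order_id, payload) VALUES (?, ?)",
                    (key, value),
                )
        except sqlite3.IntegrityError as exc:
            # "First update wins": keep the existing row, record the loser.
            failed_events[key] = {"value": value, "reason": str(exc)}
            if on_failure is not None:
                on_failure(key, failed_events[key])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, payload TEXT)")

failed_events = {}   # stand-in for the failed-events Region
notified = []        # what a client-side handler would have been told about


def notify(key, info):
    notified.append(key)


# Two clients submit batches that share key "B".
process_events(conn, [("A", "from client 1"), ("B", "from client 1")], failed_events, notify)
process_events(conn, [("B", "from client 2"), ("C", "from client 2")], failed_events, notify)
```

Here the second write for "B" is rejected by the primary key, the first value stays in the table, and the rejected event is kept with its reason so that something (or someone) can decide what to do with it later.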
Given the async nature of the initial write, you can only respond asynchronously once the violation occurs. This is not unlike in a Reactive world (namely with onSuccess/onFailure event handlers).
So, in this situation, I don't think there really is a "best practice" per se, rather only "recommendations". For example, handle the situation as near to the actual occurrence of the violation as possible, since handling the violation usually involves having the necessary information readily available to make the best possible, informed decision on the right course of action.
Sometimes you can automate the recovery, other times you might need manual intervention. Most definitely, do not guess. Clearly document your application's/system's (configured) behavior when it can handle a situation and when it cannot.
I don't think there is a general, one-size-fits-all solution in this case.
I hope this gives you some ideas to think about.

What is the optimal relational database design for storing an unknown number of similar but unique entities

The database we are designing allows users to authenticate with multiple 3rd party services, mostly social media (Twitter, Facebook, etc.). There will be an unknown and growing number of these services. Each service requires a unique set of data for authentication that is not standard with the other services.
One user may authenticate many services, but they may only authenticate with one of each type of service.
Possible Solutions:
A) The most direct solution to this issue is to simply add a column for each service to the user table which contains the JSON authentication for that service. However, this violates normalization by leaving a large number of nulls in the database. What happens when there are 50 of these integrations for instance?
B) Each service gets its own table in the database. JSON is no longer needed as each field can be properly described. Then a lookup table is needed "user_has_service" for each service. This is a table which contains only two foreign keys, one for the user and one for the service, linking them together. This option seems the most correct but is very inefficient and will take many operations to determine what services a user has, increasing with the number of services. I believe also in this case, the ID field for the lookup table would need to be some kind of hash of the user and service together so that duplicate inserts are not possible.
I am not at all a database expert, and I have been grappling with this one for quite a while. Any thoughts?

A) The most direct solution ... JSON
You are right, option A is grossly incorrect. It breaks Codd's First Normal Form, thus it is not Relational. NULL in the database is an indication of incomplete Normalisation, which leads to complex SQL code. To be avoided at all costs.
similar but unique
To be clear, that they are unique to the Service is true. That {LoginName; UserName; Email; UserId; etc} are all similar is true in the implementation sense only, not in the data.
I may need to sketch this out.
That is a great idea. A visual data model is far more effective, because (a) the mind can comprehend it much better than text, and (b) therefore work out details; contradictions; missing bits; etc. Much easier to progress each iteration visually, than with text.
Second, we have had visual modelling tools since 1987 (1984 for a closed group), which were made a Standard in 1993. Hopefully you appreciate that a standard-compliant model is better than a home-grown or corporate-supplied one. It displays all technical details rather than a small subset.
Is there a name for this strategy
It is plain old Relational Data Modelling, which includes Normalisation (ensuring compliance with Codd's Normal Forms, as opposed to the insanity of implementing the NFs in fragmented progressive steps).
Obstacle
One problem that needs to be understood and eliminated is this. The "theoreticians" market and propagate 1960's Record Filing Systems under the banner of "relational". That is characterised by a Record ID in every file. That method ensures the database remains physical, not logical, the very thing that Codd overcame with his Relational Model: a database that is logical and therefore extremely easy to navigate, by any querying party, current; planned; or unplanned.
The essential difference between 1960's RFS and post-1970 Relational Databases is:
whereas the RFS maintains references between Files by physical pointer (Record ID), the Relational Database maintains references between Tables by logical Key.
A logical Key is "made up from the data" as per Codd
(A datum that is fabricated by the system is not "made up from the data")
(Use of the SQL command PRIMARY KEY does not magically anoint the datum with the properties and qualities of a Relational Key: if you use PRIMARY KEY RecordID you are in 1960's physical paradigm, not the post-1970 Relational paradigm)
Logical Keys provide Relational Integrity (as distinct from Referential Integrity, which is an ordinary function of SQL), which is far superior to that obtained by 1960's RFS
As well as far superior Speed and Power (far fewer JOINs, and smaller sets)
Relational Database
Therefore I will give you the answer as a Relational Data Model, as per Codd.
Just one example of Relational Integrity:
the ServiceProperty FK elements in UserServiceProperty are constrained to the PK (particular combination) in ServiceProperty
a UserServiceProperty row with Facebook.Email is prevented
A Record ID based 1960's RFS that the "theoreticians" promote as "relational" cannot do that; various errors such as that one are allowed.
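The constraint described above can be tried out in a few lines. The sketch below uses Python with SQLite and covers only the two tables named in the text; the column names, the sample rows, and the assumption that Facebook has no Email property are mine, since the actual data model is a diagram that is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE ServiceProperty (
    Service  TEXT NOT NULL,
    Property TEXT NOT NULL,
    PRIMARY KEY (Service, Property)          -- logical key, made up from the data
);
CREATE TABLE UserServiceProperty (
    UserName TEXT NOT NULL,
    Service  TEXT NOT NULL,
    Property TEXT NOT NULL,
    Value    TEXT NOT NULL,
    PRIMARY KEY (UserName, Service, Property),
    FOREIGN KEY (Service, Property)          -- the particular combination must exist
        REFERENCES ServiceProperty (Service, Property)
);
-- Sample data (assumed): Twitter has an Email property, Facebook does not.
INSERT INTO ServiceProperty VALUES ('Twitter', 'Email'), ('Facebook', 'UserId');
INSERT INTO UserServiceProperty VALUES ('fred', 'Twitter', 'Email', 'fred@example.com');
""")

try:
    conn.execute(
        "INSERT INTO UserServiceProperty VALUES ('fred', 'Facebook', 'Email', 'fred@example.com')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # Facebook.Email is not a row in ServiceProperty
```

With a Record ID as the key of ServiceProperty, the child row would carry only that ID, and nothing in the child table itself would tie the Service column to the Property it is paired with.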
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993
My IDEF1X Introduction is essential reading for beginners.
The IDEF1X Anatomy is a refresher for those who have lapsed.
If you have trouble reading the Predicates directly from the Data Model, let me know and I will produce them in text form.
Please feel free to ask questions, the more specific the better.

You could set up:
a referential table called services to list all the available services, with columns like service_id (primary key), service_name and descriptions and so on. Each service is represented as one record in this table.
a table called services_properties to store the properties of the services; this table has 3 columns: service_id (foreign key to the primary key of services), property_name and property_value. A unique constraint can be set up on service_id/property_name tuples to avoid duplicates. Each service has several records in the services_properties table. This flexible structure lets you store as many different properties as needed for each service without creating a new table for each service
a mapping table called user_services, that relates users to services. Columns would be service_id and user_id, as foreign keys to the primary keys of the services table and users table. You can query this table to easily list the services subscribed by each user.
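As a rough sketch, the three tables could look like this (Python with an in-memory SQLite database, just so it can be run as-is). The `users` table, the exact column types, the sample rows and the helper `services_for_user` are assumptions for illustration, and I assume the unique constraint is meant to cover service_id together with property_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE services (
    service_id   INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL UNIQUE,
    description  TEXT
);
CREATE TABLE services_properties (
    service_id     INTEGER NOT NULL REFERENCES services (service_id),
    property_name  TEXT NOT NULL,
    property_value TEXT,
    UNIQUE (service_id, property_name)
);
CREATE TABLE user_services (
    user_id    INTEGER NOT NULL REFERENCES users (user_id),
    service_id INTEGER NOT NULL REFERENCES services (service_id),
    PRIMARY KEY (user_id, service_id)   -- one of each service per user
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO services VALUES (1, 'twitter', NULL), (2, 'facebook', NULL);
INSERT INTO user_services VALUES (1, 1), (1, 2);
""")


def services_for_user(conn, user_id):
    """List the services a user has, with a single join."""
    rows = conn.execute(
        """SELECT s.service_name
           FROM user_services us
           JOIN services s ON s.service_id = us.service_id
           WHERE us.user_id = ?
           ORDER BY s.service_name""",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


alice_services = services_for_user(conn, 1)
```

The composite primary key on user_services already prevents duplicate inserts, so no hash of user and service is needed, and listing a user's services stays a single query no matter how many services are added.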

Microservices are compatible with existing SQL database?

I'm creating a microservice architecture with Core, RabbitMQ, the strangler pattern ... but I have to use an existing SQL database (transaction requirement).
I did some research but did not find much information on how to use a SQL database with microservices, and I think it's impossible to do a transactional operation across different services at the same time.
1- Must every service have access to the entire database?
2- Is it a good idea to have one service dedicated to transactional operations?
3- Is SQL with microservices perhaps too slow?
I don't know whether a standard exists for this.
Thanks.

The whole point of microservices is about having small, independent services that are decoupled as much as possible.
Sharing a common database introduces very strong coupling, and is not recommended.
If two services need the same data, you could either (a) have a different database for each, and replicate the data, or (b) introduce a third service that is responsible for access to the database.
If you're looking for a bigger-scale distributed transaction across microservices, then you should look into things like sagas. Typically you'll have a coordinator ("process manager" in some literature) that tracks the various operations, and can compensate or cancel actions that have been performed if the transaction as a whole is bound to fail.
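A bare-bones illustration of such a coordinator, in plain Python: each step pairs an action with a compensating action, and when a step fails the already completed steps are undone in reverse order. The step names and the `run_saga` helper are invented for the example; a real saga would talk to the services over messaging and persist its own state.

```python
class SagaFailed(Exception):
    """Raised after the completed steps have been compensated."""


def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones if a step fails."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Compensate in reverse order of completion.
            for _done_name, undo in reversed(completed):
                undo()
            raise SagaFailed(f"step {name!r} failed: {exc}") from exc
        completed.append((name, compensate))
    return [done_name for done_name, _ in completed]


log = []


def fail_payment():
    raise RuntimeError("card declined")


steps = [
    ("reserve_stock", lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    ("charge_payment", fail_payment, lambda: log.append("payment refunded")),
    ("ship_order", lambda: log.append("order shipped"), lambda: log.append("shipment cancelled")),
]

try:
    run_saga(steps)
    outcome = "committed"
except SagaFailed:
    outcome = "compensated"
```

Note that the failed step itself is not compensated, only the steps that had completed before it, and later steps never run.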
3- Is SQL with microservices perhaps too slow?
What makes you think so?
There is nothing about SQL that makes it inadequate for microservices. Microservices may vary wildly in terms of what they do and what they require. SQL will be perfectly suitable for some microservices, and possibly not so suitable for others. It depends on the service.

It looks like you need distributed transactions in your system
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681205(v=vs.85).aspx
Also, there is a nice book devoted to microservices. It covers distributed transactions and other patterns used in microservice-based apps.
http://shop.oreilly.com/product/0636920033158.do

1- Must every service have access to the entire database?
No. A microservice has its own schema related to the Aggregate Root / Service that it offers. If a service needs data of another entity, it invokes the APIs provided by the other microservice.
2- Is it a good idea to have one service dedicated to transactional operations?
No. Each microservice is a transaction boundary in its own right. Distributed transactions, particularly using 2PC, do not perform particularly well.
3- Is SQL with microservices perhaps too slow?
I am not totally clear as to why you make such a statement.

System consuming WCF services from another system, when underlying databases have relationships

This is an issue that I have struggled with in a number of systems but this one is a good example. It is to do with when one system consumes WCF services from another system, and each system has their own database, but there are relationships between the two databases.
We have a central database that holds a record of all documents in the company. This database includes Document and Folder tables and it mimics a Windows file structure. NHibernate takes care of data access, a domain layer handles logic (validating filenames/no identical filenames in the same folder etc.) and a service layer sits on that, with services named 'CreateDocument(bytes[])', 'RenameDocument(id, newName)', 'SearchDocuments(filename, filesize, createdDate)' etc. These services are exposed with WCF.
An HR system consumes these services. The HR system has a separate database that has foreign keys to the Document database: it contains an HRDocument table that has a foreign key DocumentId, and then HR-specific fields such as EmployeeId and ContractId.
Here are the problems, among others:
1) In order to save a document, I have to call the WCF service to save it to the central db, return the ID and then save to the HRDocument table (along with the HR specific information). Because of the WCF call and all Document specific data access being done within the Document application, this can't be done all within one transaction, resulting in a possible loss of transaction integrity.
2) In order to search on, say, employeeId and createdDate, I have to call the search service passing in the createdDate (Document database specific fields) and then search the HRDocument database on the IDs of the returned records to filter the results returned. This feels messy, slow and just wrong.
I could duplicate the NHibernate mapping files to the Document database in the DAL of the HR application. This means I could specify the relationship between HRDocument and Document. This means I could join the tables and search like that but would also mean I would have to duplicate domain logic and violate the DRY principle, and all that entails.
I can't help feeling I'm doing something wrong here and have missed something simple.

I recommend applying CQRS and Event Driven Architecture principles here:
Use GUIDs as primary keys. Then you will be able to generate the primary key for a document yourself and pass it to the WCF method call.
Use messaging on the other side of the WCF service to prevent data loss (in case of a database failure or something like that).
Remove constraints between the databases. Immediately consistent applications don't scale; use the eventual consistency paradigm instead.
Introduce a separate data store for reads that contains denormalized data. Then you will be able to search very easily. To ensure consistency in your read store (in case Document creation failed) you could implement a simple workflow (a saga, in CQRS terms).
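The first point is the easiest to show in code. The sketch below is plain Python with two in-memory SQLite databases standing in for the Document and HR databases; `DocumentService`, `save_hr_document` and the table layouts are made-up stand-ins, not the real WCF contract. Because the caller generates the key, it does not have to wait for the service to return an ID, and a retried call cannot create a second document.

```python
import sqlite3
import uuid


class DocumentService:
    """In-process stand-in for the remote document service (hypothetical)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE document (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create_document(self, doc_id, name):
        # The caller supplies the key, so repeating the call is harmless.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO document (id, name) VALUES (?, ?)",
                (doc_id, name),
            )


def save_hr_document(service, hr_conn, name, employee_id):
    doc_id = str(uuid.uuid4())  # key is known before any remote call is made
    service.create_document(doc_id, name)
    with hr_conn:
        hr_conn.execute(
            "INSERT INTO hr_document (document_id, employee_id) VALUES (?, ?)",
            (doc_id, employee_id),
        )
    return doc_id


service = DocumentService()
hr_conn = sqlite3.connect(":memory:")
hr_conn.execute(
    "CREATE TABLE hr_document (document_id TEXT PRIMARY KEY, employee_id INTEGER NOT NULL)"
)

doc_id = save_hr_document(service, hr_conn, "contract.pdf", 42)
service.create_document(doc_id, "contract.pdf")  # simulated retry after a timeout
```

This does not give you a distributed transaction; it only makes the create call safe to repeat, which is what the messaging and saga points then build on.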

You can create a common codebase which will include base implementation of Document along with all the mappings, base Domain Model etc.
A Document Service and an HR System use the same codebase. But in HR System you extend base Document class (or classes) with your HRDocument using inheritance mapping strategy which will suit your needs the best.
public class HRDocument : Document
And from HR System you don't even have to call Document Service anymore, you just use NH and enjoy ACID and all that. But Document Service is still there and there's no code duplication.

WCF SOA: CRUD Data Access Service...why bother (or is our design wrong)?

We have a Data Access service in our SOA WCF system. This service is responsible for doing CRUD (create, read, update, delete) operations on "system wide" database tables, and is also the source of this data for queries. Any other service in the system wanting to access the tables under the control of the DAS has to go to the DAS to get or modify the data. We use Entity Framework and built our own POCO state tracking system for this DAS.
We have other tables in our database that belong to single services and store data only for their own use, i.e. state information they can access if they crash and resume, or recordings of business information. We have a rule that any one table cannot be accessed by more than one service, so data needed by multiple services ends up in the DAS.
Truth is, I have never really understood why a Data Access Service is a good idea as opposed to just accessing tables directly. It seems to me to be slower, our DAS is not transactional as it cannot send back a POCO graph for database update (only single POCOs at a time), and we also have issues where the DAS is actually a client to another service which needs data from it... a circular dependency.
Why bother with a DAS? Why is a DAS so important when it comes to SOA? What am I missing here? Single point of control?
Is it also an SOA design flaw that not all tables are part of a DAS and that some services have their own "private" tables?
Any discussion about this is welcome.

You're correct in thinking that this is the proper way to do things, and you're also correct that it slows things down and can occasionally be cumbersome. SOA necessarily trades off some efficiency in exchange for ensuring single points of control for all data associated with a service. In fact, even the idea of having a "common DAS" service is slightly smelly in some SOA circles.
By centralizing all CRUD operations to one service in an SOA application, you can ensure data integrity and that business rules are being acted upon properly. To give an example, think of an entity you'd like to store that has some business rules associated with it that are difficult to approach from a pure SQL perspective - for example, let's say a table that stores file references, and create / update services that ensure that these files exist.
With SOA and a single access point to those tables, you can code the logic into the create / update methods and be reasonably assured that the data you're receiving from the service is valid - i.e. the files referenced exist. If anyone was capable of writing to these tables or retrieving data from them, no such assurance would exist - even if you're calling the service yourself, you don't know which other programmers, through malice or just plain forgetfulness, failed to implement that critical business rule. This leads to defensive programming where every bit of client code is ensuring business logic independently, and ultimately a tangled mess of business logic scattered throughout your application.
Another benefit is scalability and maintainability. Let's say one of your services is accessing a huge chunk of data. With SOA, everything is "black-boxed" so that your client code doesn't have much knowledge of how the data is ultimately obtained. You could change your RDBMS, partition tables, or implement caching, and make that all invisible to the client code calling it - ensuring your painful updates only need to be made in one place. With database code scattered throughout your app, this sort of upgrade becomes extremely painful.
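To illustrate the file-reference example, here is a minimal sketch in Python with SQLite. The class and table names are invented, and a real DAS would sit behind a WCF contract rather than a local class, but the point carries over: the rule "the referenced file must exist" lives in exactly one place.

```python
import os
import sqlite3
import tempfile


class FileReferenceService:
    """Single access point for the file_reference table (hypothetical example)."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS file_reference ("
            "id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE)"
        )

    def create(self, path):
        # Business rule enforced once, for every caller.
        if not os.path.isfile(path):
            raise ValueError(f"file does not exist: {path}")
        with self._conn:
            cur = self._conn.execute(
                "INSERT INTO file_reference (path) VALUES (?)", (path,)
            )
        return cur.lastrowid

    def list_paths(self):
        rows = self._conn.execute("SELECT path FROM file_reference ORDER BY id")
        return [path for (path,) in rows]


service = FileReferenceService(sqlite3.connect(":memory:"))

with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "contract.pdf")
    open(real, "w").close()  # create an empty file to reference
    service.create(real)
    try:
        service.create(os.path.join(tmp, "missing.pdf"))
        rejected = False
    except ValueError:
        rejected = True
    stored = service.list_paths()
```

Callers only see `create` and `list_paths`, so the storage behind them (a different RDBMS, partitioning, a cache) could change without touching client code.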
