Use of GUIDs as IDs in a public-facing data integration web service - WCF

To put the question into some context, the system exposing the web service uses GUIDs internally as identifiers for all entities.
In such case, when designing a public facing data integration web service (used mainly for importing or exporting data from the system to other, external systems), what would you consider as pros and cons of using the internal identifiers in the interface of the service?
With such a solution, the export methods of the web service would return DTOs identified by GUIDs, and the import methods would accept similar DTOs - I assume the external system would be responsible for generating new GUIDs for new entities.
What are the potential issues one might face with this approach?
The technology stack for the system and web service is .NET / WCF / SOAP.

First, let's look at the more generic "how do I set up a public API" question. My first exercise is determining what information is needed by the consumer of the service. I also look to see if there is company-specific naming in the object model. I then create a service model (a data contract, if you want to be WCF-specific) that matches the view I want to show the consumer. This includes a unique key, which is more often a SKU string (a human-readable key) than a GUID/int (the actual derived primary key), as the SKU is public and the means of storing it in the database is not. So, in general, I would not expose these primary-key concepts, if that is what the GUID is.
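To make that concrete, here is a minimal sketch of the idea (type and member names are invented for illustration): the internal GUID key stays on the entity, while the data contract exposed to consumers carries only the public SKU.

using System;
using System.Runtime.Serialization;

// Internal entity: the GUID surrogate key never leaves the system.
public class ProductEntity
{
    public Guid Id { get; set; }       // internal primary key
    public string Sku { get; set; }    // public, human-readable key
    public string Name { get; set; }
}

// Public data contract: consumers identify products by SKU only.
[DataContract]
public class ProductDto
{
    [DataMember]
    public string Sku { get; set; }

    [DataMember]
    public string Name { get; set; }
}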
Now to the question of "do you see problems with this approach". I will focus on more general concepts so you can make a more informed decision, as there is no 100% right/wrong answer.
As long as this is machine-to-machine and the use of the GUID is something both systems are aware of, I see nothing particularly scary about this approach. If this ultimately ends up in a human-facing system where the GUID has to be interacted with, then you have an issue.
One potential issue with the system is exposing your own primary-key information to customer or client systems, which shouldn't need to understand this level of detail. If this is actually "semi-public" with a select list of vendors, the "risk" might be less. This is the primary issue I see.
One could argue the weight of the GUID (128 bits) versus a smaller identifier, but this is a bogus argument, IMO, as network latency more than outweighs the cost of sending a few more bytes as a request parameter.

Related

What is the difference between performing a procedure on a resource and performing a state transformation?

I'm new to web development and I'm attempting to understand REST. The tutorial I'm watching mentions the difference between "procedures" and "state transformation", stating that REST is based on the notion of "state transformation", but it does not delineate the difference between the two.
This has left me wondering: what is the difference between the two? Why can't an operation that transforms the state of a resource also be considered a procedure? After all, "procedure" sounds like a generic enough term that it would also encompass an operation that transforms the state of a resource.
So, what is the difference between performing a procedure on a resource, and performing a state transformation? Or is it merely a matter of semantics?
I have also tried searching for the answer but can't seem to find anything that will shed light on this.
TL;DR
RPC focuses on sending a payload containing method names and arguments in a predefined format. Clients couple tightly to servers through a shared interface (skeleton classes, WSDL, or other interface definition languages (IDLs)).
REST focuses on decoupling clients from servers and on introducing indirections, like support for multiple different media types to marshal resource state in, and the whole interaction concept summarized by HATEOAS, where hypertext controls are used to drive the application state forward through a domain application protocol / state machine on the server side. Responses usually contain semi-structured data that follows the corresponding media-type definition (i.e. the HTML spec), which usually doesn't fit simple CRUD applications well. If you will, the state of a resource is transformed into a representation format adhering to the rules in the media-type definition and transferred to the remote side.
In network programming, remote procedure call (RPC)-style invocations, as often used in RMI, CORBA, SOAP, or similar frameworks, will usually send a method name that should be invoked at the server, along with parameters to feed that method. The return value is then marshalled into a corresponding response and sent back to the caller. What a client can invoke is usually exposed via external artifacts, i.e. skeleton classes, WSDL, or other forms of contracts. So far, so simple. This is how most networking code works. However, the drawback here is that a client is tightly coupled to the exposed interface (skeleton classes, WSDL, external documentation), and many problems in internet computing arise from changes over time that are not adequately representable in those interfaces.
If you take a closer look, though, at how the Web has worked for decades, change is an inherent part of it. Your browser will just show the most recent state of a resource (Web page) it has. It might have gotten it either from its cache or from a server it asked. If the version available in its cache is older than a predefined threshold value, it will ignore the cached value and request a new version. If an update happened since the last version, your browser is automatically served the new version. Fielding, who was working on the HTTP 1.0 and 1.1 specs back then, analyzed how interaction on the Web takes place and generalized his findings into the REST architectural style. So, if you will, REST is just Web surfing for applications.
Unfortunately, a majority of enthusiasts and professionals have not yet understood what REST really is, and there is a lot of false information available regarding REST; even here at Stack Overflow, most people don't seem to care, and posts explaining the true nature of REST are downvoted while wrong information is upvoted.
So, what does REST do differently than typical RPC-like method invocations?
First, REST relies on a set of uniform interfaces that are the same for every participant in the architecture, e.g. HTTP as the transfer protocol and a naming scheme for resources (URIs), so that everyone acts on these fixed principles. This helps to reduce the interoperability issues that are way too common in traditional network programming.
Next, it relies on a basic principle: servers teach clients what they need to know. But how does a server know what a client needs to know? Well, as Jim Webber pointed out, the designer of the application develops a state machine (or domain application protocol) that a client will follow. Think of a checkout system at your favorite online shop. At one point it presents you the items in your trolley and offers you a choice to progress to the next "page", where you enter the shipping address; on progressing further through the state machine, you are asked for your payment options, and so on, until at one point you have finished the checkout and are served a "Thank you" page that summarizes your order. Under the hood, you just progressed through their protocol for placing orders and used hypermedia controls to advance your client through their state machine. You were therefore served Web forms and links that you used to fulfill your task. In essence, this is what Hypertext As The Engine Of Application State (HATEOAS for short) is all about.
On the Web, HTML forms are used to teach a client what properties a resource supports, which ones are editable, and so on. Besides that, a form also teaches clients the actual URI to send input data to, the HTTP operation to utilize, and, mostly implicitly, the media type to marshal the request into. E.g. a regular HTML form will use application/x-www-form-urlencoded as its default media type to send the data to the server. So a full HTTP request for an input of a first and last name may look like this:
POST /path/to/resource HTTP/1.1
Host: acme.org
Connection: close
Accept: */*
User-Agent: ...
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

firstName=Roman&lastName=Vottner
The same data could be sent using a different representation format, if that format were supported by the form. Unfortunately, HTML does not support that many.
Links provided by a server should usually be annotated (or accompanied) by so-called link relation names that put the current resource in relation to the target URI. If you will, they are the predicate in a triple of subject (current page), predicate (link relation name), and object (link target resource). Such names, of course, should be standardized or at least follow the Web Linking extension mechanism. URIs themselves are opaque, meaning they don't carry meaning by themselves and should therefore not be parsed or analyzed at all.

A common mistake often seen in so-called "REST APIs" is that they have typed resources, i.e. a user resource or a car resource that is marshalled on the client side to a programming-language-specific object (i.e. a Java object of class User or the like), which is pretty common in traditional RPC-style programming. In a REST architecture, however, the representation format is usually semi-structured data, i.e. a mix of syntax-defining control elements and actual data. As such, a direct mapping from DB entry to model object to a resource itself, as done by so many CRUD applications, is not possible.
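For illustration (the markup and URI here are invented, not from the original answer), a server could advertise the next step in a checkout protocol like this:

<a href="/orders/42/payment" rel="payment">Proceed to payment</a>

Here "payment" is a registered link relation name, so a client can act on the meaning of the link without ever parsing the URI itself.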
Why is this all done in the first place?
In traditional network programming, a client is usually only able to work with that one server, and if something at that server changes, clients may be affected and stop working. The tight coupling between the two is apparent. The REST architecture introduces a couple of indirections, i.e. the use of link relations instead of attempting to analyze meaningful URIs, as well as the use of a multitude of possible media types instead of relying on one specific message format, which help to decouple clients from servers. Instead of coupling to the server with regard to the messages exchanged, both client and server couple to media types. Through content-type negotiation, a client simply tells the server its capabilities, and the server should generate a response the client can process. Instead of focusing on one message format, REST has the freedom of almost infinitely many, as long as both client and server support them. The more media types a peer supports, the more likely it will be able to interact with other peers in that network.
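As a hypothetical illustration of such a negotiation, the client lists the media types it can process, weighted by preference, and the server picks the best representation it can produce:

GET /orders/42 HTTP/1.1
Host: acme.org
Accept: application/hal+json;q=1.0, application/xml;q=0.8, */*;q=0.1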
All the points I've mentioned above lead to a strict decoupling of clients and servers, which grants the latter the freedom to evolve without having to fear that the changes introduced will break clients, as neither the transport protocol nor the naming scheme has changed and the changes introduced are still in scope of the media-type definition. So, well-behaved peers in that network will be able to pick up changes on the fly automatically. This is especially handy if you develop an application that should withstand the sands of time and still serve clients years to come.
If you don't need such properties, there is nothing wrong with not being "RESTful" at all; just don't call such services/APIs REST then. Also, developing a REST architecture certainly involves more overhead compared to typical RPC-style interactions.

Resolving design dependency between microservices

I am designing an e-commerce application with a microservice approach, using an ORM (JPA) for data persistence in one of the microservices, named OrderService. OrderService owns functionality related to persisting and reporting orders, which essentially includes customer and product information. Customer and product functionality is managed by different microservices.
My question is: at the ORM layer, OrderService needs POJOs that belong to ProductService and CustomerService. What is the best way to deal with this dependency between services? Should the application be designed in a different way?
There are a few things one should take into consideration when trying to find a solution:
1. You cannot access the database of other services; you have to make a call.
2. You should try not to keep data from other services in yours. Data duplication leads to an inconsistent state and should be avoided if you can.
3. You should have a means to query data from other services when asked for it.
With those points in mind, I would mostly restrict data from other services to some reference IDs (which should be immutable). At the ORM layer I would just fetch the reference IDs and expand them by making an API call to the concerned services (in the business layer).
You may realize that you are making way too many calls, say for getting a customer name from the customer service by customer ID; if that is the case, you may consider saving some of this information in your system. But be cautioned: data that you save should not be volatile, and make sure you have done due diligence before making that call.
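The pattern itself is language-neutral; here is a minimal sketch (shown in C# to match the rest of this page, with an invented client interface, although the question itself uses JPA/Java) of an order that stores only reference IDs and expands them through a service call in the business layer:

using System;
using System.Collections.Generic;

// The order service persists only immutable reference IDs, never
// full copies of data owned by other services.
public class Order
{
    public Guid Id { get; set; }
    public string CustomerId { get; set; }        // owned by CustomerService
    public List<string> ProductIds { get; set; }  // owned by ProductService
}

// Hypothetical client for the customer service's API.
public interface ICustomerServiceClient
{
    CustomerSummary GetCustomer(string customerId);
}

public class CustomerSummary
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Business layer: expand reference IDs into display data on demand.
public class OrderEnricher
{
    private readonly ICustomerServiceClient _customers;

    public OrderEnricher(ICustomerServiceClient customers)
    {
        _customers = customers;
    }

    public string DescribeOrder(Order order)
    {
        // One remote call per order; batch or cache if this becomes hot.
        CustomerSummary customer = _customers.GetCustomer(order.CustomerId);
        return string.Format("Order {0} for {1}", order.Id, customer.Name);
    }
}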
Recently, I have gone through many design principles for microservices and realized that CQRS-ES and data replication with eventual consistency are the best solutions to this issue. We try to make communication asynchronous as much as possible and use point-to-point synchronous communication between microservices only when necessary.
This is a fairly common situation when designing microservices. Most microservices will require access to data available through another microservice or an external provider.
The best way to deal with this is to design each microservice as a "separate" application and think of all other microservices as external to it.
So, the developer of Microservice #1 (M1) would have to check the Microservice #2 (M2) spec and write simple POJO classes for the data he fetches from there, just as he would if he were using some external API like Facebook's.
Do note that M1 will always talk to M2 (via REST, for example) and never directly to M2's database for the data it needs.
Ideally, each microservice would have its own database (or a partial clone of a central database).

Need some advice for a web service API?

My company has a product that I feel can benefit from a web service API. We are using MSMQ to route messages back and forth through the backend system. Currently we are building an ASP.NET application that communicates with a web service (WCF) that, in turn, talks to MSMQ for us. Later on down the road, we may have other client applications (not necessarily written in .NET). The message going into MSMQ is an object that has a property made up of an array of strings. There is also a property that contains the command (a string) that will be routed through the system. Personally, I am not a huge fan of this, but I was told it is for scalability and every system can use strings.
My thought regarding the web services was to model objects based on our data that can be passed into and out of the web services, so they are easily consumed by the client. Initially, I was passing the message object mentioned above, with the array of strings in it. I found that I was creating objects on the client to represent that data, making the client responsible for creating those objects. I feel the web service layer should really be handling this; that is how I have always worked with services, and it makes it easier to move data around the client.
It was recommended to our group that we maintain a "single entry point" into the system by offering an object that contains commands and having one web service take care of everything. So, the web service would have one method in it; let's call it MakeRequest. It would return an object (either serialized XML or JSON). The suggestion was to have a base object that may contain some sort of list of commands that other objects can inherit from. Any other object may have its own command structure but still inherit the base commands. What is passed back from the service is not clear right now, but it could be that "message object" with an object attached to it representing the data. I don't know.
My recommendation was to model our objects after our actual data and create services for the types of data we are working with. We would create a base service interface that would house any common methods used by all services, for example GetById, GetByName, GetAll, Save, etc. Anything specific to a given service would be implemented in that specific implementation. So a User service may have a method GetUserByUsernameAndPassword, but since it implements the base interface it would also contain the "base" methods. We would have several methods in a service that would return the type of object expected, based on the service being called. We could house everything in one service, but I would still like to get something back that is more usable. I feel this approach keeps the client out of making decisions about what commands to pass. When I connect to a User service and call the method GetById(int id), I expect to get back a User object.
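A minimal sketch of what that could look like (names are invented for illustration; note that WCF does not support generic service contracts, so common operations declared in a shared plain C# interface would still need to be re-exposed on each attributed contract):

using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface IUserService
{
    // "Base" operations shared by every service:
    [OperationContract]
    User GetById(int id);

    [OperationContract]
    IList<User> GetAll();

    [OperationContract]
    void Save(User user);

    // Operation specific to this service:
    [OperationContract]
    User GetUserByUsernameAndPassword(string username, string password);
}

[DataContract]
public class User
{
    [DataMember]
    public int Id { get; set; }

    [DataMember]
    public string Username { get; set; }
}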
I had the luxury of working with MS when I started developing WCF services. So, I have a good foundation and understanding of the technology, but I am not the one designing it this time.
So, I am not opposed to the “single entry point” idea, but any thoughts about why either approach is more scalable than the other would be appreciated. I have never worked with such a systematic approach to a service layer before. Maybe I need to get over that?
I think there are merits to both approaches.
Typically, if you are writing an API that is going to be consumed by a completely separate group of developers (perhaps in another company), then you want the API to be as self-explanatory and discoverable as possible. Having specific web service methods that return specific objects is much easier to work with from the consumer's perspective.
However, many companies use web services as one of many layers to their applications. In this case, it may reduce maintenance to have a generic API. I've seen some clever mechanisms that require no changes whatsoever to the service in order to add another column to a table that is returned from the database.
My personal preference is for the specific API. I think specific methods are much easier to work with, and they are largely self-documenting. The specific operation needs to be executed at some point, so why not expose it for what it is? You'd get laughed at if you wrote:
public void MyApiMethod(string operationToPerform, params object[] args)
{
    switch (operationToPerform)
    {
        case "InsertCustomer":
            InsertCustomer(args);
            break;
        case "UpdateCustomer":
            UpdateCustomer(args);
            break;
        ...
        case "Juggle5BallsAtOnce":
            Juggle5BallsAtOnce(args);
            break;
    }
}
So why do that with a Web Service? It'd be much better to have:
public void InsertCustomer(Customer customer)
{
    ...
}

public void UpdateCustomer(Customer customer)
{
    ...
}

...

public void Juggle5BallsAtOnce(bool useApplesAndEatThemConcurrently)
{
    ...
}

Designing WCF data contracts and operations

I'm starting to design a WCF service bus. It is small now but will grow as our business grows, so I'm concerned about growing pains while also trying not to YAGNI too much. It's an e-commerce platform. The problem is I'm having too many second thoughts about where to put stuff. I will give a scenario to demonstrate all my questions.
We have an e-commerce website that sells products and ultimately delivers them. For this we have a PlaceOrder service which, among other parameters, expects an Address object that in this context (our website placing an order) is made up of City, Street, and ZipCode.
We also do business with partners that use our platform only to sell products; they take care of the delivery themselves. For this scenario we have a PlaceOrderForPartner service that, among other objects, expects an Address object. However, in this context (a partner placing an order) the Address object is made up of different information that is relevant only to an order placed by a partner.
Given this scenario I have several questions:
1) How do I organize these DataContract objects into namespaces and folders in my solution? I thought about having a folder per context (Partner, Customer, etc.) to keep the services and the DataContracts together.
So I would have
- MySolution.sln
  - Partner (folder)
    - PartnerService.svc
    - DataContracts (folder)
      - Address
  - Customer (folder)
    - Customer.svc
    - DataContracts (folder)
      - Address
This way I would have a namespace in which to place all my context-specific data contracts.
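As a sketch of that layout (wire namespaces invented for illustration), the two context-specific Address contracts could live side by side, disambiguated both by CLR namespace and by the XML namespace on the contract:

using System.Runtime.Serialization;

namespace MySolution.Partner.DataContracts
{
    [DataContract(Namespace = "http://schemas.example.com/partner")]
    public class Address
    {
        // Partner-specific delivery information goes here.
        [DataMember]
        public string PartnerReference { get; set; }
    }
}

namespace MySolution.Customer.DataContracts
{
    [DataContract(Namespace = "http://schemas.example.com/customer")]
    public class Address
    {
        [DataMember]
        public string City { get; set; }

        [DataMember]
        public string Street { get; set; }

        [DataMember]
        public string ZipCode { get; set; }
    }
}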
2) What about service design? Should I create a service for each party that might place an order and create a PlaceOrder method inside it, like this:
Partner.svc/PlaceOrder
Customer.svc/PlaceOrder
or create an Order service with PlaceOrderForPartner and PlaceOrderForCustomer methods, like this:
Order.svc/PlaceOrderForPartner
Order.svc/PlaceOrderForCustomer
3) Assuming I pick the first option in the last question, what should I do with the operations that are performed on the order and are common to Partner and Customer?
4) Should I put the DataContracts and service definitions in the same assembly? One for each? Everything with the service implementation?
5) How should I name the input and output messages for operations? Should I use the entities themselves, or go for the OperationNameRequest / OperationNameResponse template?
Bottom line, my big question is: how do I "organize" the data contracts and services involved in creating a service?
Thanks in advance for any thoughts on this!
Besides what TomTom mentioned, I would also like to add my 2 cents here:
I like to structure my WCF solutions like this:
Contracts (class library)
Contains all the service, operation, fault, and data contracts. Can be shared between server and client in a pure .NET-to-.NET scenario.
Service implementation (class library)
Contains the code to implement the services, and any support/helper methods needed to achieve this. Nothing else.
Service host(s) (optional - can be Winforms, Console App, NT Service)
Contains service host(s) for debugging/testing, or possibly also for production.
This basically gives me the server-side of things.
On the client side:
Client proxies (class library)
I like to package my client proxies into a separate class library so that they can be reused by multiple actual client apps. This can be done using svcutil or "Add Service Reference" and manually tweaking the resulting horrible app.config's, or by manually implementing client proxies (when sharing the contracts assembly) using ClientBase<T> or ChannelFactory<T> constructs (see the ChannelFactory sketch after this list).
1-n actual clients (any type of app)
Will typically only reference the client proxies assembly, or maybe the contracts assembly, too, if it's being shared. This can be ASP.NET, WPF, Winforms, console app, other services - you name it.
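A minimal hand-rolled proxy using ChannelFactory<T>, assuming the contracts assembly is shared between client and server (the contract, binding, and address below are illustrative):

using System.ServiceModel;

[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    void PlaceOrder(string orderId);
}

public static class OrderServiceProxy
{
    public static IOrderService Create()
    {
        // In production code, cache the factory and close/abort channels properly.
        var binding = new BasicHttpBinding();
        var address = new EndpointAddress("http://services.example.com/Order.svc");
        var factory = new ChannelFactory<IOrderService>(binding, address);
        return factory.CreateChannel();
    }
}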
That way, I have a nice and clean layout that I use consistently over and over again, and I really think this has made my code cleaner and easier to maintain.
This was inspired by Miguel Castro's Extreme WCF screencast on DotNet Rocks TV with Carl Franklin; highly recommended!
You start wrong at the highest level.
It should not be "PlaceOrder" service, but "OrderManager". Maybe you want to add more service functions later - like inquiring for orders, cancel orders, change orders - who knows. In general, I would keep the number of "services" (.svc) small and add methods there. Otherwise you end up with a HUGH overhead for using them, in code - without any real benefit.
Why separate partners and customers? I am sure that with 15 minutes of data design you could break things down to exactly ONE data structure, so you could have only one service. If not... make that two methods on one interface, limited by security. But I seriously would NOT like two programs for that. Rather, have two address fields, "Address" and "PartnerInfo", where only one can be set (the other has to be null), checked in the logic; a sketch of this follows.
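A sketch of that consolidated shape (types and member names invented), with the either/or rule enforced in the service logic:

using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class PlaceOrderRequest
{
    [DataMember]
    public Address Address { get; set; }          // set for customer orders

    [DataMember]
    public PartnerInfo PartnerInfo { get; set; }  // set for partner orders
}

[DataContract]
public class Address
{
    [DataMember]
    public string City { get; set; }
}

[DataContract]
public class PartnerInfo
{
    [DataMember]
    public string PartnerId { get; set; }
}

public static class PlaceOrderValidator
{
    public static void Validate(PlaceOrderRequest request)
    {
        // Exactly one of the two members must be set; the other must be null.
        bool hasAddress = request.Address != null;
        bool hasPartner = request.PartnerInfo != null;
        if (hasAddress == hasPartner)
            throw new FaultException("Set either Address or PartnerInfo, not both.");
    }
}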
Separate things out into two projects. Interfaces and data contracts go into a separate project (blabalbla.Api) so that customers can actually get the DLL if they want; at the least it makes things a lot easier on your end, since you can rely on shared types with no need to generate the wrappers internally. It also allows a lot better testing (sub-projects can forget to regenerate the wrappers, which may lead to errors when testing them).
I always put the implementation into a "blabla.Service" project. The URL would be "services.blabla.com/" on a subdomain (or "api.blabla.com"; it depends mostly on mood, but lately I am going mostly for api), which separates things out from the main website.

WCF - Domain Objects and IExtensibleDataObject

Typical scenario. We use old-school XML Web Services internally for communicating between a server farm and several distributed and local clients. No third parties involved, only our applications used by ourselves and our customers.
We're currently pondering a move from XML Web Services to a WCF/object-based model and have been experimenting with various approaches. One of them involves transferring the domain objects/aggregates directly over the wire, possibly applying DataContract attributes to them.
By using IExtensibleDataObject and a DataContract with the Order property on the DataMembers, we should be able to cope with simple property-versioning issues (remember, we control all clients and can easily force-update them).
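For reference, a minimal sketch of that combination (member names invented): explicit ordering on the data members plus the IExtensibleDataObject plumbing, so that an older endpoint round-trips members it does not yet know about:

using System;
using System.Runtime.Serialization;

[DataContract]
public class OrderDto : IExtensibleDataObject
{
    [DataMember(Order = 1)]
    public Guid Id { get; set; }

    [DataMember(Order = 2)]
    public string CustomerName { get; set; }

    // Unknown members from newer contract versions are captured here on
    // deserialization and written back out on serialization.
    public ExtensionDataObject ExtensionData { get; set; }
}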
I keep hearing that we should use dedicated, transfer-only Data Transfer Objects (DTOs) over the wire.
Why? Is there still a reason to do so? We use the same domain model on the server side and client side, of course, prefilling collections etc. only when deemed right and "necessary." Collection properties use the service locator principle and IoC to invoke either an NHibernate-based "service" to fetch data directly (on the server side) or a WCF "service" client on the client side to talk to the WCF server farm.
So - why do we need to use DTOs?
Having worked with both approaches (shared domain objects and DTOs), I'd say the big problem with shared domain objects arises when you don't control all clients; but based on my past experience, I'd usually use DTOs unless development speed were of the essence.
If there's any chance that you won't always be in control of the clients, then I'd definitely recommend DTOs, because as soon as you share your domain objects with someone else's client application, you start tying your internals to someone else's dev cycle.
I've also found DTOs useful when working in a versioned service environment, which allowed us to radically change the internals of our app but still accept calls to the old versions of our service interfaces.
Finally, if you have a lot of client applications, it might also be beneficial to use DTOs, as you're then protected by an easily versionable service layer.
In my experience DTOs are most useful for:
Strictly defining what will be sent over the wire and having a type specifically devoted to that definition.
Isolating the rest of your application, client and server, from future changes.
Interoperability with non-.NET systems. DTOs certainly aren't a requirement, but they make it easier to design "safe" types.
In your scenario these design features may not matter that much. I've used WCF with both strict DTOs and shared domain objects, and in both scenarios it worked great. The only thing I noticed when sending domain objects over the wire was that I tended to send more data (and in unexpected ways) than I needed to. This was likely due more to my lack of experience with WCF than anything else, but it's something you should definitely be wary of should you choose to go that route.
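To illustrate that last point with a small sketch (types invented for illustration): the DTO pins down exactly what crosses the wire, while the domain object can carry collections and state that are easy to serialize by accident:

using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Server-side domain object; the collection may be lazily loaded and
// could be pulled over the wire unintentionally.
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public ICollection<string> OrderNumbers { get; set; }
}

// Transfer-only type: nothing crosses the wire but these two members.
[DataContract]
public class CustomerDto
{
    [DataMember]
    public Guid Id { get; set; }

    [DataMember]
    public string Name { get; set; }

    public static CustomerDto From(Customer customer)
    {
        return new CustomerDto { Id = customer.Id, Name = customer.Name };
    }
}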