Strategies for dealing with a split application - WCF

To deal with recent growth, our application has been split across two sets of separate infrastructure. Approximately half of our customers are on set 1 and the other half are on set 2.
The two sets have different URLs (api1.ourdomain.com and api2.ourdomain.com).
The problem is that clients accidentally use the wrong URL and then wonder why they get error messages.
Other than user education, are there any other strategies for dealing with this mess?
Is it possible to redirect requests to the correct endpoint?
Thanks.

I don't think your question is detailed enough to provide meaningful feedback; there are several factors that would influence a recommendation.
Does your application make use of user profiles (or a similar construct)? If so, you might consider associating a primary URI with each user in their profile, including logic in your application to interrogate the profile on each request, and redirecting if a user goes to the wrong URI.
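If you do keep a primary URI in the profile, the redirect itself is cheap to implement at the front of each set. Here is a minimal sketch (the question is WCF, but the idea is stack-agnostic) in Python/Flask, assuming a recent Flask, a hypothetical get_home_cluster() profile lookup, and a hypothetical X-User-Id header for identifying the caller; a 307 preserves the method and body of the original request:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    CLUSTER_URLS = {1: "https://api1.ourdomain.com", 2: "https://api2.ourdomain.com"}
    THIS_CLUSTER = 1  # which set this instance belongs to

    def get_home_cluster(user_id):
        """Hypothetical profile lookup: returns 1 or 2 for the user's home set."""
        raise NotImplementedError

    @app.before_request
    def redirect_to_home_cluster():
        user_id = request.headers.get("X-User-Id")  # however you identify callers
        if user_id is None:
            return None  # let unauthenticated requests fall through
        home = get_home_cluster(user_id)
        if home != THIS_CLUSTER:
            # 307 keeps the method and body intact (unlike 301/302 for POSTs),
            # and full_path keeps the query string.
            return redirect(CLUSTER_URLS[home] + request.full_path, code=307)
        return None

Whether clients follow the redirect automatically depends on their HTTP library, so it's worth testing against the clients you actually have.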
Is this an authorization issue? If so you might consider including some basic authorization routing that provides a custom 403 page with the proper URL.
If you could provide additional detail I think we could be more helpful.

REST API responses based on authentication, best practices?

I have an API with endpoint GET /users/{id} which returns a User object. The User object can contain sensitive fields such as cardLast4, cardBrand, etc.
{
  firstName: ...,
  lastName: ...,
  cardLast4: ...,
  cardBrand: ...
}
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone else's ID, then cardLast4 and cardBrand should be hidden.
I want to know what are the best practices here for designing my response. I see three options:
Option 1. Two DTOs, one with all fields and one without the hidden fields:
// OtherUserDTO
{
  firstName: ...,
  lastName: ... // cardLast4 and cardBrand hidden
}
I can see this getting out of hand with role-based DTOs: what if I later need UserDTOForAdminRole, UserDTOForAccountingRole, etc.? The number of potential DTOs quickly grows.
Option 2. One response object being the User, but null out the values that the user should not be able to see.
{
  firstName: ...,
  lastName: ...,
  cardLast4: null, // hidden
  cardBrand: null // hidden
}
Option 3. Create another endpoint such as /payment-methods?userId={userId}, even though PaymentMethod is not an entity in my database. This now requires two API calls to get all the data. If the userId is not the caller's own, it returns 403 Forbidden.
{
  cardLast4: ...,
  cardBrand: ...
}
What are the best practices here?
You're gonna get different opinions about this, but I feel that doing a GET request on some endpoint and getting a different shape of data depending on the authorization status can be confusing.
So I would be tempted, if it's reasonable to do this, to expose the privileged data via a secondary endpoint: either by just exposing the private properties there, or by having two distinct endpoints, one with the unprivileged data and a second that repeats that data plus the private properties.
I tend to go for the first of those two approaches, because an API endpoint is not just a means to get data. The URI is an identity, so I would want /users/123 to mean the same thing everywhere, and have a second /users/123/secret-properties for the privileged fields.
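A minimal sketch of that shape, assuming a Python/Flask backend; the data and the current_user_id() helper are hypothetical stand-ins for your own storage and auth:

    from flask import Flask, abort, jsonify

    app = Flask(__name__)

    # Toy data standing in for the real user store.
    USERS = {123: {"firstName": "Ada", "lastName": "Lovelace",
                   "cardLast4": "4242", "cardBrand": "visa"}}

    def current_user_id():
        """Hypothetical: resolve the authenticated caller from the request."""
        return 123

    @app.get("/users/<int:user_id>")
    def get_user(user_id):
        user = USERS.get(user_id) or abort(404)
        # Same shape for every caller: /users/123 means one thing everywhere.
        return jsonify(firstName=user["firstName"], lastName=user["lastName"])

    @app.get("/users/<int:user_id>/secret-properties")
    def get_secret_properties(user_id):
        if user_id != current_user_id():
            abort(403)  # the privileged view is its own lockable resource
        user = USERS.get(user_id) or abort(404)
        return jsonify(cardLast4=user["cardLast4"], cardBrand=user["cardBrand"])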
I have an API with endpoint GET /users/{id} which returns a User object.
In general, it may help to reframe your thinking -- resources in REST are generalizations of documents (think "web pages"), not generalizations of objects. "HTTP is an application protocol whose application domain is the transfer of documents over a network" -- Jim Webber, 2011
If the user calls that endpoint with their own ID, all fields should be visible. However, if it is someone elses ID then cardLast4 and cardBrand should be hidden.
Big picture view: in HTTP, you've got a bit of tension between privacy (only show documents with sensitive information to people allowed access) and caching (save bandwidth and server pressure by using copies of documents to satisfy more than one request).
Cache is an important architectural constraint in the REST architectural style; that's the bit that puts the "web scale" in the world wide web.
OK, good news first -- HTTP has special rules for caching responses to requests with Authorization headers. Unless you deliberately opt in to allowing those responses to be re-used, you don't have to worry about caching.
Treating the two different views as two different documents, with different identifiers, makes almost everything easier -- the public documents are available to the public, the sensitive documents are locked down, operators looking at traffic in the log can distinguish the two different views because the logged identifier is different, and so on.
The thing that isn't easier: the case where someone is editing (POST/PUT/PATCH) one document and expecting to see the changes appear in the other. Cache-invalidation is one of the two hard problems in computer science. HTTP doesn't have a general purpose mechanism that allows the origin server to mark arbitrary documents for invalidation - successful unsafe requests will invalidate the effective-target-uri, the Location, the Content-Location, and that's it... and all three of those values have other important uses, making them more challenging to game.
Documents with different absolute-uri are different documents, and those documents, once copied from the origin server, can get out of sync.
This is the option I would normally choose; a client looking at cached copies of a document isn't seeing changes made by the server.
OK, so you decide that you don't like those trade-offs. Can we do it with just one resource identifier? You immediately lose some clarity in your general purpose logs, but perhaps a bespoke logging system will get you past that.
You probably also have to dump public caching at this point. The only general purpose header that changes between the user allowed to look at the sensitive information and the user who isn't? That's the Authorization header, and there's no "Vary" mechanism for authorization.
You've also got something of a challenge for the user who is making changes to the sensitive copy but now wants to review the public copy (to make sure nothing leaked? or to make sure that the publicly visible changes took hold?).
There's no general purpose header for "show me the public version", so either you need to use a non-standard header (which general purpose components will ignore), or you need to try standardizing something and then driving adoption by the implementors of general purpose components. It's doable (PATCH happened, after all) but it's a lot of work.
The other trick you can try is to play games with the Content-Type and Accept headers -- perhaps clients use something normal for the public version (e.g. application/json) and a specialized type for the sensitive version (application/prs.example-sensitive+json).
That would allow the origin server to use the Vary header to indicate that the response is only suitable when the same Accept header is used.
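A minimal sketch of that content-negotiation trick, again in Python/Flask with hypothetical data and a hypothetical current_user_id() helper; the prs. media type is deliberately made up, which is exactly the problem discussed next:

    from flask import Flask, abort, jsonify, request

    app = Flask(__name__)
    SENSITIVE = "application/prs.example-sensitive+json"  # bespoke type

    def current_user_id():
        """Hypothetical: resolve the authenticated caller."""
        return 123

    @app.get("/users/<int:user_id>")
    def get_user(user_id):
        public = {"firstName": "Ada", "lastName": "Lovelace"}  # toy data
        if SENSITIVE in request.accept_mimetypes:
            if user_id != current_user_id():
                abort(403)  # asked for the sensitive view of someone else
            resp = jsonify(dict(public, cardLast4="4242", cardBrand="visa"))
            resp.headers["Content-Type"] = SENSITIVE
        else:
            resp = jsonify(public)
        resp.headers["Vary"] = "Accept"  # the cache key must include Accept
        return resp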
Once again, general purpose components aren't going to know about your bespoke content-type, and are never going to ask for it.
The standardization route really isn't going to help you here, because the thing you really need is for clients to discriminate between the two modes, whereas general purpose components today are trying to use that channel to advertise all of the standardized representations that they can handle.
I don't think this actually gets you anywhere that you can't fake more easily with a bespoke header.
REST leans heavily into the idea of using readily standardizable forms; if you think this is a general problem that could potentially apply to all resources in the world, then a header is the right way to go. So a reasonable approach would be to try a custom header, and get a bunch of experience with it, then try writing something up and getting everybody to buy in.
If you want something that just works with the out-of-the-box web that we have today, use two different URIs and move on to solving important problems.

What kind of information can be collected about "website third parties"?

I have collected all the requests made by websites, with the aim of identifying third parties through the requests a website makes. I used Selenium and WebDriver to do that.
These requests can be made by the JavaScript present in the source code of the website, can be triggered dynamically by advertisements on the page, or can be initiated by services such as Google, DoubleClick, or Facebook. They help to track the data that is being shared by these websites, with or without the user's consent.
You can see an example of the requests made when the browser loads www.focuscamera.com in this Excel file:
https://drive.google.com/file/d/16wNA0dFUehrjPww31TAIj8GZUZ05LsIU/view?usp=sharing
My questions are:
1- Which kind of HTTP header fields can I use for my analysis if I want to gather information about third parties? My goal is to distinguish and differentiate third-party behavior.
For example, the Content-Length field in a request indicates the size of the entity body. So does a request with a higher Content-Length mean that the third party received and collected more data?
2- What exactly does Content-Length indicate? What exactly does the HTTP request body contain?
3- Are there any other HTTP header fields that I can use to distinguish and differentiate third-party behavior? (A list of the fields I collect can be found in sheet 1 of the Excel file I shared above.)
4- Is there any other information on the internet that I can use to distinguish and differentiate third-party behavior? For example, I use cookiepedia.co.uk to find out what kind of service a third party provides: functionality, performance, or targeting/advertising.
It sounds like you may be reinventing the wheel here. Take a look at https://webbkoll.dataskydd.net; it provides lots of security and privacy analysis of any site you like. You can also generate nice visual request maps using https://requestmap.webperf.tools.
Try using that tool on sites like wired.com and forbes.com to see how spectacularly bad it can get!
To answer your questions specifically:
1. Headers are not massively useful in themselves (it's the request itself that's more interesting), but the important ones from a privacy perspective are Referer and Set-Cookie. Content-Length does indeed tell you how big the request body is. It will always be 0 on a GET request and so is usually omitted; large POST requests indicate more data being transmitted, but that may be down to inefficiency rather than anything else.
2. Content-Length indicates the length (in bytes) of the body of a POST request. An HTTP request body can contain any kind of data: text, images, video, audio, or other formatted data.
3. There are some, but most headers are functional rather than semantic; they are concerned with making the request actually work. It's more interesting that requests happen at all than what they contain.
4. You can't necessarily tell what kind of service a third party is providing from the requests themselves, but the domains they go to are more interesting. For example, anything going to doubleclick.com is going to be ad- and tracking-related because of what that domain is known to be used for (Webbkoll cites these as "known trackers"). So you're correct that sites like Cookiepedia can help you find out what a particular service does. The divisions between functional/performance/profiling are mostly made up by ad companies to excuse their behaviour; you can't tell what they are using data for, only whether they are receiving data and what data they are receiving (because you can see what's in the requests they make using browser developer tools). To clarify: a site could receive your full name and address but do absolutely nothing with it, and you can't tell that from looking at the data that's sent. In privacy terms it's always best to assume the worst (because ad companies absolutely cannot be trusted!), so if they are receiving data, assume it will be abused.
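If you want to go beyond eyeballing the spreadsheet, here is a minimal sketch of that kind of per-domain tally in Python, assuming you export the captured traffic as a HAR file (browsers and Selenium-driven proxies can produce these); the file name and first-party domain are hypothetical:

    import json
    from collections import defaultdict
    from urllib.parse import urlsplit

    # Tally request counts and uploaded bytes per third-party domain
    # from a HAR capture ("requests.har" is a hypothetical file name).
    with open("requests.har", encoding="utf-8") as f:
        har = json.load(f)

    FIRST_PARTY = "focuscamera.com"  # the site under test
    stats = defaultdict(lambda: {"requests": 0, "bytes_sent": 0})

    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or ""
        if host.endswith(FIRST_PARTY):
            continue  # skip first-party traffic
        body = entry["request"].get("postData", {}).get("text", "")
        stats[host]["requests"] += 1
        stats[host]["bytes_sent"] += len(body.encode("utf-8"))

    for host, s in sorted(stats.items(), key=lambda kv: -kv[1]["bytes_sent"]):
        print(f"{host}: {s['requests']} requests, {s['bytes_sent']} bytes sent")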

Using HTTP for REST API: automatically cacheable?

I was wondering about the following: to make a "RESTful API" you need to satisfy the six architectural constraints listed here:
http://en.wikipedia.org/wiki/Representational_state_transfer#Architectural_constraints
Is it safe to state that when you are creating a REST API over the HTTP protocol, the "cacheable" constraint is automatically satisfied? Because HTTP already provides a cache system "out-of-the-box" through HTTP headers: http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
So is there no need to worry about that anymore?
Maybe this sounds like a stupid question, but I want to be sure. :-)
Kind regards!
K.
Let me expand a bit on the challenges of creating correct caching logic:
Typically, the backend of the API is a database holding all kinds of little pieces of information.
The typical presentation within a REST API can be an accumulated view (so, let's say, a user's activity log, containing a list of the last user actions within a portal, something along those lines).
Now, in order to know whether your API URL /user/123/activity has changed (after the timestamp the client is sending you in the "If-Modified-Since" header), you would have to check whether there have been any additional activities after the last request. The overhead of doing that might be the same as simply fetching the result again. So, in a lot of cases, people just don't bother, which is a shame, as proper caching can have a huge impact on web app performance.
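A minimal sketch of such a conditional check, assuming a recent Python/Flask (Werkzeug parses If-Modified-Since for you) and a hypothetical latest_activity_timestamp() query that is cheaper than building the full activity log:

    from datetime import datetime, timezone
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def latest_activity_timestamp(user_id):
        """Hypothetical cheap query: when did this user's log last change?"""
        return datetime(2024, 1, 1, tzinfo=timezone.utc)

    @app.get("/user/<int:user_id>/activity")
    def activity(user_id):
        # HTTP dates have one-second resolution, so drop microseconds.
        last_modified = latest_activity_timestamp(user_id).replace(microsecond=0)
        ims = request.if_modified_since  # parsed If-Modified-Since, or None
        if ims is not None and last_modified <= ims:
            return "", 304  # client's copy is still fresh
        resp = jsonify(activities=[])  # build the real log here
        resp.last_modified = last_modified
        return resp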
Maybe this gives a bit more detail,
Jan
You are correct: HTTP already gives you the means to identify cacheable elements, but as your API will be generated by some server-side logic, you will still need to make sure the code "behind" your API sets the right HTTP headers and, in an ideal world, is ready and able to react to "If-Modified-Since" requests.
Creating a reliable "Last-Modified" timestamp, as well as checking against it reliably, is actually quite a feat ;-)
Hope this helps a bit,
Jan

How RESTful is using subdomains as resource identifiers?

We have a single-page app (AngularJs) which interacts with the backend using REST API. The app allows each user to see information about the company the user works at, but not any other company's data. Our current REST API looks like this:
domain.com/companies/123
domain.com/companies/123/employees
domain.com/employees/987
NOTE: All ids are GUIDs, hence the last end-point doesn't have company id, just the employee id.
We recently started working on enforcing the requirement that each user has access only to information related to the company where the user works. This means that on the backend we need to track who the logged-in user is (which is a simple auth problem) as well as determine the company whose information is being accessed. The latter is not easy to determine from our REST API calls, because some of them do not include a company id, such as the last one shown above.
We decided that instead of tracking company ID in the UI and sending it with each request, we would put it in the subdomain. So, assuming that ACME company has id=123 our API would change as follows:
acme.domain.com
acme.domain.com/employees
acme.domain.com/employees/987
This makes identifying the company very easy on the backend and requires minor changes to REST calls from our single-page app. However, my concern is that it breaks the RESTfulness of our API. This may also introduce some CORS problems, but I don't have a use case for it now.
I would like to hear your thoughts on this and how you dealt with this problem in the past.
Thanks!
In a similar application, we put the 'company id' into the path (every company-specific path), not into a subdomain.
I wouldn't care a jot about whether some terminology enthusiast thought my design was "RESTful" or not, but I can see several disadvantages to using domains, mostly stemming from the fact that the world tends to assume that the domain identifies "the server" and the path is how you find an item on that server. There will be a certain amount of extra stuff you'll have to deal with when using multiple domains which you wouldn't with paths:
HTTPS - you'd need a wildcard certificate instead of a simple one
DNS - you're either going to have wildcard DNS entries, or your application management is now going to involve DNS management
All the CORS stuff which you mention - may or may not be a headache in your specific application - anything which is making 'same domain' assumptions about security policy is going to be affected.
Of course, if you want lots of isolation between companies, and effectively you would be as happy running a separate server for each company, then it's not a bad design. I can't see it's more or less RESTful, as that's just a matter of viewpoint.
There is nothing "unrestful" in using subdomains. URIs in REST are opaque, meaning that you don't really care about what the URI is, but only about the fact that every single resource in the system can be identified and referenced independently.
Also, in a RESTful application you never compose URLs manually; you traverse the hypermedia links you find at the API endpoint and in all the returned responses. Since you don't need to manually compose URIs, from the REST point of view it makes no difference how they look. Having a URI such as
//domain.com/ABGHTYT12345H
would be as RESTful as
//domain.com/companies/acme/employees/123
or
//domain.com/acme/employees/smith-charles
or
//acme.domain.com/employees/123
All of those are equally RESTful.
But... I like to think of usable APIs, and when it comes to usability, having readable, meaningful URLs is a must for me. Following conventions is also a good idea. In your particular case there is nothing un-RESTful about the route, but it is unusual to find that kind of behaviour in an API, so it might not be the best practice. Also, as someone pointed out, it might complicate your development (not the CORS part specifically, though; that one is easily solved by sending a few HTTP headers).
So even though I can't see anything non-REST in your proposal, convention elsewhere would argue against subdomains in an API.
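On that CORS aside, here is a minimal sketch of the "few HTTP headers", assuming a Python/Flask backend behind the subdomains and the domain.com naming from the question; the suffix check is deliberately simplified (a real implementation would use an explicit allow-list):

    from flask import Flask, request

    app = Flask(__name__)
    PARENT = ".domain.com"  # the shared parent domain from the question

    @app.after_request
    def add_cors_headers(response):
        origin = request.headers.get("Origin", "")
        # Echo the origin back only for our own https subdomains.
        if origin.startswith("https://") and origin.endswith(PARENT):
            response.headers["Access-Control-Allow-Origin"] = origin
            response.headers["Access-Control-Allow-Credentials"] = "true"
            response.headers["Vary"] = "Origin"
        return response

    @app.get("/employees/<employee_id>")
    def get_employee(employee_id):
        return {"id": employee_id}  # toy payload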

Suggestions on addressing REST resources for API

I'm a new REST convert and I'm trying to design my first RESTful (hopefully) API, so here is my question about addressing resources.
Some notes first:
The data described here are 3D render jobs.
A user (graphics company) has multiple projects.
A project has multiple render jobs.
A render job has multiple frames.
There is a hierarchy enforced in the data (one render job belongs to one project, which belongs to one user).
How's this for naming my resources...?
https://api.myrenderjobsite.com/
/users/graphicscompany/projects
/users/graphicscompany/projects/112233
/users/graphicscompany/projects/112233/renders/
/users/graphicscompany/projects/112233/renders/889900
/users/graphicscompany/projects/112233/renders/889900/frames/0004
OR a shortened address for renders?
/users/graphicscompany/renders/889900
/users/graphicscompany/renders/889900/frames/0004
OR should I shorten (even more) the address if possible, omitting the user when not needed...?
/projects/112233/
/renders/889900/
/renders/889900/frames/0004
THANK YOU!
Instead of thinking about your API in terms of URLs, try thinking of it more like pages and links between those pages.
Consider the following:
Will it be reasonable to create a resource for users? Do you have 10, 20 or 50 users? Or do you have 10,000 users? If it is the latter, then obviously a single resource that represents all users is probably not going to work too well when you do a GET on it.
Is the list of users a reasonable root URL, i.e. the entry point into your service? Should the list of projects that belong to a graphics company be a separate resource, or should it just be embedded into the graphics company resource? You can ask the same question of each of the one-to-many relationships that exist. Even if you do decide to merge the list of projects into the graphics company resource, you may still want a distinct resource to exist simply for the purpose of being able to POST to it in order to create a new project for that company.
Using this approach you should be able to get a good idea of most of the resources in your API and how they are connected, without having to worry about what your URLs look like. In fact, if you do the design right, any client application you write will not need to know anything about the URLs that you create. The only part of the system that cares what the URL looks like is your server, so that it can dispatch the request to the right controller.
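To make that concrete, here is a minimal sketch, assuming a Python/Flask server and the URL shapes from the question; the data and route details are hypothetical. The representations embed links, so only the server's route table knows what the URLs look like:

    from flask import Flask, jsonify, url_for

    app = Flask(__name__)

    @app.get("/users/<company>/projects/<project_id>")
    def get_project(company, project_id):
        # The representation carries links; clients follow these instead of
        # building URLs themselves.
        return jsonify(
            id=project_id,
            renders=url_for("list_renders", company=company,
                            project_id=project_id),
        )

    @app.get("/users/<company>/projects/<project_id>/renders/")
    def list_renders(company, project_id):
        render_ids = ["889900"]  # hypothetical data
        return jsonify(renders=[
            url_for("get_render", company=company, project_id=project_id,
                    render_id=r)
            for r in render_ids
        ])

    @app.get("/users/<company>/projects/<project_id>/renders/<render_id>")
    def get_render(company, project_id, render_id):
        return jsonify(id=render_id)

If the URL scheme later changes, only the route declarations and url_for targets move; clients that follow the embedded links keep working.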
The other significant question you need to ask yourself is what media type you are going to use for these resources. How many different clients will need to access them? Are you writing the clients, or is someone else? Should you attempt to reuse an existing standard like XHTML and classes/microformats? Could you squeeze most of the information into Atom? Maybe Atom with some extra namespaces, like GData does? Or is this only going to be used internally, so you can just create your own media types, like application/vnd.YourCompany.Project+xml, application/vnd.YourCompany.Render+xml, etc.?
There are many things to think about when designing a REST api, don't get hung up on what your URLs look like and you should really try to avoid doing "design by URL".
Presuming that you authenticate to the service, I would use the first option, but remove the user segment, particularly if the user is the currently logged-in user.
If the user segment actually represents something else (like a client), I would include it, but not if it simply designates the currently logged-in user. I agree with StaxMan, though: don't worry too much about squeezing the paths, as readability is key in RESTful APIs.
Personally I would not try to squeeze the path too much; that is, some amount of redundant information is helpful, both for quickly seeing what a resource is and for future expansion.
Generally users won't be typing paths anyway, so verbosity is not all that bad.