RESTful API to safeguard server and client from large datasets - api

I am working on designing a RESTful API and need second opinion on the design. I will be abstracting away the problem statement for better understanding.
Consider a URI /search?key1=value1&key2=value2, which can potentially return a huge result set for a given search criteria for key1 and key2.
My mandate is to make sure that the server and client are bounded by limits to prevent performance degradation. If that limit is reached and the intended data is not found in result set, user will be asked to refine the search query to narrow down. (I am not thinking of pagination, that is for a different problem set)
Approach is to allow client specify a limit to server that it(client) can comfortably handle, and to help server set a limit for itself to prevent from generating huge result sets affecting performance.
Client can do /search?key1=value1&key2=value2&maxresults=xxxx to specify it's limit.
Server can set it's own limit as a configuration param for search URI. While serving a request, server will take a min of (client's limit, server's limit) and generate result set satisfying the effective limit.
The JSON generated will have a meta data part which will mention if the result was truncated or not, and the effective limit set. The client can inspect this part and ask the user to refine search if "truncated" is "true". The problem domain actually allows the user to refine to a single item.
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
The questions I am trying to answer are:
Is this violating any REST principles?
Is there a standard convention to do the same that I might follow?
Are there good examples on public APIs that you can quote? (Jira RESTful API has a couple of examples)
Is there any gotcha in this design which may affect us in the future?
Any view on this will be appreciated ...
Thanks!

From my point of view this fits REST principles quite well. I would suggest not to add result size meta data values to the response payload but as HTTP headers. So instead of
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
The service would send
{
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
and add additional custom HTTP headers
x-result-truncated:1
x-result-limit:1000
This approach has the benefit that meta data values that are not a part of the payload from a client's perspective are sent in the meta data section of the your response where for example content-type are transmitted.
An additional benefit is that packing the meta data into HTTP headers is reusable for other services as well and you do not have to change the schema of the returned payload, that means clients keep working as expected (except that some results may be truncated).

Related

Is having two dependent resources not compliance with the RESTFul approach?

Context
In our project, we need to represent resources defined by the users. That is, every user can have different resources, with different fields, different validations, etc. So we have two different things to represent in our API:
Resource definition: this is just a really similar thing to a json schema, it contains the fields definitions of the resource and its limitations (like min and max value for numeric fields). For instance, this could be the resource definition for a Person:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
Resource instance: this is just an instance of the specified resource. For instance, for the Person resource definition, we can have the following instances:
[
{
"firstName": "Elena",
"lastName": "Gomez",
},
{
"firstName": "Elena2",
"lastName": "Gomez2",
},
]
First opinion
So, it seems this kind of presents some conflicts with the Restful API approach. In particular, I think it has some problems with the Uniform Interface. When you get a resource, you should be able to handle the resource without any additional information. With this design, you need to make an additional request to first get the resource definition. Let's see this with an example:
Suppose you are our web client. And you are logged in as an user with the Person resource. To show a person in the UI, you first need to know the structure of the Person resource, that is, you to do the following request: GET /resource_definitions/person. And then, you need to request the person object: GET /resource/person/123.
Second opinion
Others seem to think that this is not a problem and that the design is still RESTful. Every time you ask for something to an API, you need to know the format previously, is not self-documented in the API, so it makes sense for this endpoint to behave the same as the others.
Question
So what do you think? Is the proposed solution compliance with the RESTful approach to API design?
The simple solution is to add a link:
{
"_links": {
"describedby": {
"href": "https://example.com/person.schema.json",
"type": "application/schema+json"
}
},
"firstName": "Elena",
"lastName": "Gomez"
}
You could also put this in a header. This is semantically equivalent:
Link: <https://example.com/person.schema.json>; rel="describedby" type="application/schema+json"
It does not violate the uniform interface if there is no standard for this kind of stuff, but there is. RDF e.g. JSON-LD and schema.org vocab can handle most of these types. Even for REST there is an RDF vocab called Hydra, though the community is not that active nowadays.
As of the actual problem, I would look around, maybe RDF technologies or graph technologies are better for it, though I am not sure how much connection there is in your graph. If it is just a few types and instances, then I would probably stick to REST.
Ohh I see meanwhile, you used an actual JSON schema. Then that part is certainly uniform interface compatible. As of the instances you need to add something like type: "https://example.com/person.schema.json" and you are ok. Maybe a vendor specific JSON derived MIME type which describes what "type" means in this context if you want to be super precise or just use JSON-LD instead. https://www.w3.org/2019/wot/json-schema Or an alternative more common solution is using RDFS and XSD with JSON-LD instead of JSON Schema.

JSON API relationship with meta information

I have an entity contract with relationship contract_contacts that should be presented in JSON API format.
To be more clear here's the structure of my entities:
Contract
id
name
ContractContact
contract_id
contact_id
type
comment
Contact
id
name
Possible JSON API output will look like:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"type": "contract_contacts",
"id": "1"
},
{
"type": "contract_contacts",
"id": "2"
}
]
}
}
}
}
This approach is not good enough - you have to create additional resource for relation where you will store your contact and comment with type. You have to include with 2 levels deep to get you contact fields. Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship
with id from above
The second approach is seems hacky to me because it will use meta and it's up to you how to use it. Example:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"meta": {
"comment": "comment 1",
"type": 1
},
"type": "contacts",
"id": "10"
},
{
"meta": {
"comment": "comment 2",
"type": 2
},
"type": "contacts",
"id": "11"
}
]
}
}
}
}
This approach will simplify the mess with api requests that was in previous example.
But is that correct to POST/PUT/PATCH with meta fields as they are not supposed to be changed from client (or supposed to be)? I'm confused with this part.
The relationship that you are describing is often referred to as a has-many-through relationship: A contract has a many contacts through a contract_contacts. These is defined as a relationship that links two resources through an intermediate resource.
JSON:API specification does not provide first-level support for these kind of relationship. You should instead model them through separate resources as described by you as your first option. This allows you to create, modify and delete your intermediate resource in the same way as any other resource. Doing so reduces the complexity as the intermediate resource is just another resource type as any other.
You mentioned two problems with doing so:
You have to include with 2 levels deep to get you contact fields.
This is true but shouldn't be an issue. include query parameter allows your client to sideload resources any many level deep as it needs. The response document might be a little bit bigger than it would be if the information of the intermediate resource is stored on the relationship itself but that shouldn't be relevant in production after gzip.
Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship with id from above
This is true and a serious limitation of the current stable version of JSON:API specification (v1.0). It's not directly related to has-many-through relationships so. It's a general limitation of the specification, which does not support creating, modifying and/or deleting more than one resource with one request.
An official Atomic Operations extension is proposed for v1.1 of the specification to address that limitation. It's very likely that these one or a similar proposal will be included in the upcoming version.
It might be tempting to store the information of the intermediate model as meta data on the relationship. But doing so will introduce serious limitations which for I would strongly recommend to not take that path:
The JSON:API specification does not cover changing meta data. You would need to introduce your own specification to create or update these meta data.
Client-side libraries for the JSON:API specification do not expect such information to be available as meta data of the relationship. It's very likely that the consumers will have a hard time processing the information.
Storing information of the intermediate resource would lock you into using resource linkage to express relationship information in a resource document. You would not be able to use related resource links. These may introduce serious performance issues as resource linkage requires to always lookup the IDs of related resources in the database, which is not required if using related resource links.

How to add content and moreDetailsUrl for Google Search suggest?

I'm using GSA (version 6.14) and we would like to get an auto suggest function on our website. Works fine for basic requests, but it seems the GSA offers more functionality when you would be using user-added results. However, I can find nowhere a reference on how to add user-added results.
This is what the information tells me today :
/suggest?q=<query>&max=<num>&site=<collection>&client=<frontend>&access=p&format=rich
should return a response as below :
{
"query": "<query>",
"results": [
{ "name": "<term 1>", "type": "suggest"},
{ "name": "<term 2>", "type": "suggest"},
{ "name": "<term 3>", "type": "uar", "content": "Title of UAR",
"moreDetailsUrl": "URL of UAR"}
]
}
I am able to get results as the first 2 lines, but would like to get results as the last line also, so with content and a moreDetailsUrl. So maybe a very stupid question but I am not able to find the answer anywhere : How and where do I add this UAR ?
I actually want to understand if it's feasible to get metadata into the content part of the JSON, so if for instance an icon meta is available I'd like to have it included in the JSON so I can enrich my search results.
User Added Results are a OneBox that can be added to multiple frontends. See this: https://developers.google.com/search-appliance/documentation/614/admin_searchexp/ce_improving_search#uar
When done with Suggest, the data is fed from user entering 'keymatches' directly. What's different about them is that they are a direct link versus a suggested query. If you use the out of the box experience, you'll click a link to the url instead of running another query.

RESTful API design: reasonable to accept unique identifiers rather than resource URIs?

Are there disadvantages to allowing a RESTful API to accept a representation with an implicit link to another resource?
To illustrate this, take that I have two resources:
GET /people/:id
GET /houses/:id
A person has a unique identifier in addition to id, which is their email.
Are there disadvantages to allowing the following interaction?
POST /houses
{
"_links": {
"owner": {
"email": "example#example.com"
}
},
"street_number": 20
}
The server know that the email field is unique, and therefore can be used to identify the people resource. It will create an association to that person.
The reason to allow this would be to make it easier on the API client, where they don't have to first look up the URI of the resource.
In contrast, I would certainly allow this type of call:
POST /houses
{
"_links": {
"owner": {
"href": "/people/3"
}
},
"street_number": 20
}
The common sense should drive here. If in your business model the user has a unique email, there is no drawback for me.

RESTfully handling sub-resources

I've been creating a RESTful application and am undecided over how I should handle requests that don't return all entities of a resource or return multiple resources (a GET /resource/all request). Please allow me a few moments to setup the situation (I'll try to generalize this as much as possible so it can apply to others besides me):
Let's say I'm creating a product API. For simplicity, let's say it returns JSON (after the proper accept headers are sent). Products can be accessed at /product/[id]. Products have reviews which can be accessed at /products/[id]/review/[id].
My first question lies in this sub-resource pattern. Since you may not always want the reviews when you GET a product, they are accessible by another URI. From what I read I should include the URI of the request that will return all review URI's for a product in the response for a product request. How should I go about this so that it abides to RESTful standards? Should it be a header like Reviews-URI: /product/123/review/all or should I include the URL in the response body like so:
{ 'name': 'Shamwow',
'price': '$14.99',
'reviews': '/product/123/review/all'
}
My second question is about how the /product/[id]/review/all request should function. I've heard that I should just send the URL's of all of the reviews and make the user GET each of them instead of packaging all of them into one request. How should I indicate this array of review URIs according to RESTful standards? Should I use a header or list the URIs in the response body like so:
{ 'reviews': [ '/product/123/review/1',
'/product/123/review/2',
'/product/123/review/3'
]
}
Your problem is you're not using Hypermedia. Hypermedia specifically has elements that hold links to other things.
You should consider HAL, as this is a Hypermedia content type that happens to also be in JSON.
Then you can leverage the links within HAL to provide references to your reviews.
As to your first question (header or body), definitely do not invent your own custom header. Some here will argue that you should use the Link header, but I think you'll find plenty of need for nested links and should keep them in the body.
How you indicate either the URI to the reviews/ resource, or the list of URI's within that, is entirely up to the media type you select to represent each resource. If you're using HTML, for example, you can use an anchor tag. If you're using plain JSON, which has no hypermedia syntax, you'll have to spend some time in the documentation for your API describing which values are URI's, either by nominating them with special keys, or wrapping them in special syntax like {"link": "reviews/123"}, or with a related schema document.
Take a look at Shoji, a JSON-based media type which was designed explicitly for this pattern of subresources.
The JSON Schema standard might help you here, in particular Hyper-Schemas.
It lets you define how to extract link URIs from your data, and what their "rel"s are - essentially turning your JSON data into hyper-media. So for your first bit of data, you might write a schema like:
{
"title": "Product",
"type": "object",
"properties": {...},
"links": [
{"rel": "reviews", "href": "{reviews}"}
]
}
The value of href is a URI Template - so for example, if your data included productId, then you could replace the value of href with "/product/{productId}/review/all".
For the second bit of example data (the list of reviews) you might have a schema like this:
{
"type": "object",
"properties": {
"reviews": {
"type": "array",
"items": {
"links": [
{"rel": "full", "href": "{$}"}
]
}
}
}
}
In the URI Template of href, the special value of {$} means "the value of the JSON node itself". So that Hyper-Schema specifies that each item in the reviews array should be replaced with the data at the specified URL (rel="full").