Graph Store protocol support in GraphDB

I'm having trouble using the Graph Store protocol as documented in GraphDB's help section (the REST API docs). Specifically, I have two issues:
The Graph Store protocol is supposed to support PUT requests (see https://rdf4j.org/documentation/reference/rest-api/), but the GraphDB REST API documentation only lists GET, DELETE and POST operations under the "graph-store" section of the docs.
The notion of a "directly referenced graph" does not seem to work; I'm not sure if I'm doing something wrong. Here is what I tried:
Step 1. I created a repository myrepo and included a named graph with the IRI http://example.org/graph1
Step 2. I tried to access the graph by including various forms of its IRI in the URL. None of the following works:
http://localhost:7200/repositories/myrepo/rdf-graphs/graph1
http://localhost:7200/repositories/myrepo/rdf-graphs/http://example.org/graph1
http://localhost:7200/repositories/myrepo/rdf-graphs/http%3A%2F%2Fexample.org%2Fgraph1
Also, the "Try it out!" button provided in the REST API docs under each operation reports Bad Request if I try to fill those boxes (repository=myrepo, graph=graph1)
Any ideas how this feature can actually be used?
Is there a specific way of writing the "directly referenced named graph" in the request URL? (Perhaps GraphDB generates some resolvable identifiers for each named graph? What would they look like?)

I confirm your observations and posted a bug, GDB-5486.
Instead of PUT you could use DELETE then POST.
For the time being, use "indirectly referenced" graphs.
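For example, the DELETE-then-POST workaround against the indirect service endpoint could look like this (a sketch; the Turtle file data.ttl holding the replacement data is a placeholder):
> curl -X DELETE 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'
> curl -X POST -H 'Content-Type: text/turtle' --data-binary @data.ttl 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'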
For the record, the "indirectly referenced graph" works and returns various formats, e.g.:
> curl -H 'Accept: text/turtle' 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'
<http://example.org/s> <http://example.org/p> <http://example.org/o> .
> curl -H 'Accept: application/trig' 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'
<http://example.org/graph1> {
  <http://example.org/s> <http://example.org/p> <http://example.org/o> .
}
> curl -H 'Accept: text/nquads' 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'
<http://example.org/s> <http://example.org/p> <http://example.org/o> <http://example.org/graph1> .
> curl -H 'Accept: application/ld+json' 'http://localhost:7200/repositories/myrepo/rdf-graphs/service?graph=http%3A%2F%2Fexample.org%2Fgraph1'
[ {
  "@graph" : [ {
    "@id" : "http://example.org/s",
    "http://example.org/p" : [ {
      "@id" : "http://example.org/o"
    } ]
  } ],
  "@id" : "http://example.org/graph1"
} ]

The SPARQL 1.1 Graph Store HTTP protocol is often misunderstood, particularly the notion of a "directly referenced graph". When you call the protocol with a URL like http://localhost:7200/repositories/myrepo/rdf-graphs/graph1, you literally address a named graph identified by that whole URL, i.e. your named graph would be "http://localhost:7200/repositories/myrepo/rdf-graphs/graph1" and not just "graph1". Consequently, you can't use a URL like "http://localhost:7200/repositories/myrepo/rdf-graphs/http://example.org/graph1" and expect the protocol to interpret it as addressing the named graph "http://example.org/graph1". The protocol also supports "indirectly referenced graphs", which is the only way to use a graph URI that isn't derived from the URL used to call the protocol. Please see https://www.w3.org/TR/sparql11-http-rdf-update/#direct-graph-identification for a more detailed explanation.
Because of the above confusion, I recommend avoiding the Graph Store protocol entirely and instead using the SPARQL 1.1 Protocol, which can do everything the Graph Store protocol can except for the convoluted notion of directly referenced graphs. Admittedly, the REST API doc "Try it out" feature is broken for some of the Graph Store protocol endpoints.
E.g. to fetch all statements in the named graph http://example.org/graph1 you could do this with curl:
curl -H 'Accept: text/turtle' 'http://localhost:7200/repositories/myrepo/statements?context=%3Chttp%3A%2F%2Fexample.org%2Fgraph1%3E'
To add data to a named graph, send the data using POST; to replace the data, use PUT; and to delete the data, issue a DELETE request.
This is available in the REST API doc section of the GraphDB Workbench, under "repositories". Note that in the SPARQL 1.1 Protocol, URIs must be enclosed in < >, unlike in the SPARQL 1.1 Graph Store protocol.
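For illustration, all three operations against the statements endpoint could look like this (a sketch; the Turtle file data.ttl is a placeholder):
# Add (append) data to the named graph
curl -X POST -H 'Content-Type: text/turtle' --data-binary @data.ttl 'http://localhost:7200/repositories/myrepo/statements?context=%3Chttp%3A%2F%2Fexample.org%2Fgraph1%3E'
# Replace the contents of the named graph
curl -X PUT -H 'Content-Type: text/turtle' --data-binary @data.ttl 'http://localhost:7200/repositories/myrepo/statements?context=%3Chttp%3A%2F%2Fexample.org%2Fgraph1%3E'
# Delete all statements in the named graph
curl -X DELETE 'http://localhost:7200/repositories/myrepo/statements?context=%3Chttp%3A%2F%2Fexample.org%2Fgraph1%3E'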


Creating API - general question about verbs

I decided to move my application to a new level by creating a RESTful API.
I think I understand the general principles; I have read some tutorials.
My model is pretty simple. I have Projects and Tasks.
So to get the list of Tasks for a Project you call:
GET /project/:id/tasks
to get a single Task:
GET /task/:id
To create a Task in a Project
CREATE /task
payload: { projectId: :id }
To edit a Task
PATCH /task/:taskId
payload: { data to be changed }
etc...
So far, so good.
But now I want to implement an operation that moves a Task from one Project to another.
My first guess was to do:
PATCH /task/:taskId
payload: { projectId: :projectId }
but I do not feel comfortable with revealing the internal structure of my backend to the frontend.
Of course, it is just a convention and has nothing to do with security, but I would feel better with something like:
PATCH /task/:taskId
payload: { newProject: :projectId }
where there is no direct relation between the 'newProject' and the real column in the database.
But then, the next operation comes.
I want to copy ALL tasks from Project A to Project B with one API call.
PUT /task
payload: { fromProject: :projectA, toProject: :projectB }
Is it a correct RESTful approach? If not - what is the correct one?
What is missing here is "a second verb".
You can see that we are creating new tasks (hence 'PUT'), but we are also copying, which is implied by fromProject and toProject.
Is it a correct RESTful approach? If not - what is the correct one?
To begin, think about how you would do it in a web browser: the world wide web is the reference implementation for the REST architectural style.
One of the first things that you will notice: on the web, we are almost always using POST to make changes to the server. You fill in a form in a browser, submit the form, the browser takes information from the input controls of the form to create the HTTP request body, the server figures out how to do the work that is described.
What we have in HTTP is a standardized semantics for messages that manipulate individual documents ("resources"); doing useful work is a side effect of manipulating documents (see Webber 2011).
The trick of POST is that it is the method whose standardized meaning includes the case where "this method isn't worth standardizing" (see Fielding 2009).
POST /2cc3e500-77d5-4d6d-b3ac-e384fca9fb8d
Content-Type: text/plain

Bob,
Please copy all of the tasks from project A to project B
The request line and headers here are metadata in the transfer of documents over a network domain. That is to say, that's the information we are sharing with the general purpose HTTP application.
The actual underlying business semantics of the changes we are making to documents is not something that the HTTP application cares about -- that's the whole point, after all.
That said - if you are really trying to do manipulation of document hierarchies in general purpose and standardized way, then you should maybe see if your problem is a close match to the WebDAV specifications (RFC 2291, RFC 4918, RFC 3253, etc).
If the constraints described by those documents are acceptable to you, then you may find that a lot of the work has already been done.
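Putting that advice into the asker's terms, the copy could be a POST to a resource representing the operation; the path and payload below are illustrative, not prescribed by any spec:
POST /task-copies HTTP/1.1
Content-Type: application/json

{ "fromProject": "A", "toProject": "B" }
The server can then perform the whole copy atomically and respond with 201 Created pointing at a resource describing the outcome.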

How to specify the model version label in a REST API request?

As described in the documentation, using the version_labels field you can assign a label to a model version in order to handle canary deployments.
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/serving_config.md#assigning-string-labels-to-model-versions-to-simplify-canary-and-rollback
For example, you can have model 43 labeled as stable and model 44 labeled as canary.
That feature sounds really neat, but I did not find in the doc how to adapt my POST request to specify the label I want to use.
Until now, I was using something of the sort:
curl -d '{"instances": <<my input data>>}' -X POST http://localhost:8501/v1/models/<<my model name>>:predict
Any idea?
Update:
Based on comments on this GitHub Issue, @misterpeddy states that, as of August 14th 2019:
Re: not being able to access the version using labels via HTTP - this is something that's not possible today (AFAIR) - only through the grpc interface can you declare labels :(
To the best of my knowledge, this feature is yet to be implemented.
Original Answer:
It looks like the current implementation of the HTTP API Handler expects the version to be numeric.
You can see the regular expression that attempts to parse the URL here.
prediction_api_regex_(
    R"((?i)/v1/models/([^/:]+)(?:/versions/(\d+))?:(classify|regress|predict))")
The \d defines an expectation for a numeric version indicator rather than a text label.
I've opened a corresponding TensorFlow Serving issue here.
The REST API for TensorFlow Serving is defined here: https://www.tensorflow.org/tfx/serving/api_rest#url_4
For the predict method it would be:
http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
where ${MODEL_VERSION} would be stable or canary
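Note that pinning a numeric version over HTTP does work (it is what the \d in the regex above matches), so a request like the following is valid, assuming model version 43 is loaded:
curl -d '{"instances": <<my input data>>}' -X POST http://localhost:8501/v1/models/<<my model name>>/versions/43:predict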

in REST what method to use for the sync operation

Synchronizing data once the user gets online involves both inserts and updates (upserts). I'm sending both kinds of records in a single request (an array), and the server iterates through the records to determine whether to insert or update each one.
My question is whether to use POST or PUT.
Also, what should the server's JSON response body look like? The data sent is an array, for example:
{
  "ids" : "15,16,17",
  "success" : true
}
Edit:
And what should the response code be, given that the request involves both create and update operations:
200 OK
201 Created
REST is not CRUD. Mapping HTTP methods to CRUD operations is a convention introduced by some frameworks, but it has nothing to do with REST. Read this answer for some clarification on that.
A PUT is a complete replacement that ignores the current state of the resource. Think of the mv command in a shell: if there's nothing at the destination, it creates it; if there's something, it replaces it completely, ignoring whatever is there. That's how a PUT should work. Ideally, your application should have a uniform implementation of PUT that works in exactly the same way with any URI that supports the method.
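As a sketch (the path and body are illustrative, not from the question), a PUT fully replaces whatever is currently stored at the target, creating the resource if it did not exist:
PUT /record/15 HTTP/1.1
Content-Type: application/json

{ "name": "complete replacement representation" }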
A POST submits the payload to be processed by the target resource under predefined rules. This means you can use POST for any operation that isn't already standardized by the HTTP protocol.
In your case, it's clearly not a complete replacement, so it's not a case for PUT. Use POST.
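A hypothetical shape for the sync call following that advice (the /sync endpoint name is illustrative): records carrying an id are updated, records without one are inserted, and the response echoes the resulting ids as in the question. The 200 status shown is one reasonable convention for a mixed batch, not something the answer prescribes:
POST /sync HTTP/1.1
Content-Type: application/json

[
  { "id": 15, "name": "updated record" },
  { "name": "new record" }
]

HTTP/1.1 200 OK
Content-Type: application/json

{ "ids" : "15,16,17", "success" : true }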

HTTP: Is it acceptable to use the COPY method where the Destination header is not a URI?

Background
I'm building an API that allows clients to manipulate geospatial objects. These objects contain a location on the world (in latitude/longitude), and a good bit of metadata. The actual API is rather large, so I present a simplified version here.
Current API
Consider an API with two objects, features and attributes.
The feature endpoint is /api/feature and looks like:
{
  id: 5,
  name: "My super cool feature",
  geometry: {
    type: "Point",
    coordinates: [
      -88.043355345726,
      43.293055846667315
    ]
  }
}
The attribute endpoint is /api/attribute. An attribute looks like:
{
  id: 3,
  feature_id: 5,
  name: "attr-name",
  value: "value"
}
You can interact with these objects by issuing HTTP requests to their endpoints using different HTTP methods, like you might expect:
GET /api/feature/5 reads the feature with id 5.
PUT /api/feature/5 updates the feature with id 5.
POST /api/feature creates a new feature.
DELETE /api/feature/5 deletes the feature with id 5.
Same goes for attributes.
Attributes are related to features by foreign key (commonly expressed as "features have many attributes").
The Problem
It would be useful to be able to make a copy of a feature and all its metadata (all the attributes that belong to it). The use case is more or less, "I just made this feature and gave it a bunch of attributes, now I want the same thing... but over there." So the only difference between the two features would be their geometries.
Solution #1: Make the client do it.
My first thought was to just have the client do it. Create a new feature with the same name at a new location, then iterate through all the attributes on the source feature, issuing POST requests to make copies of them on the new feature. This, however, suffers from a few problems. First, it isn't atomic. Should the client's Internet connection flake out during this process, you'd be left with an incomplete copy, which is lame. Second, it'd probably be slow, especially for features with many attributes. Anyway, this is a bad idea.
Solution #2: Add copy functionality to the API.
Doing the copy server-side, in a single API call, would be the better approach. This leads me to https://www.rfc-editor.org/rfc/rfc2518#section-8.8 and the COPY method. Being able to do a deep copy of a feature in a single COPY /api/feature/5 request seems ideal.
The Question
My issue, here, is the semantics of COPY don't quite fit the use I envision for it. Issuing a COPY request on a resource executes a copy of that resource to the destination specified in the Destination header. According to the RFC, Destination must be present, and it must be a URI specifying where the copied resource will end up. In my case, the destination for the copied feature is a geometry, which is decidedly not a URI.
So, my questions are: Would stuffing json for the geometry into the Destination header of a COPY request be a perversion of the spec? Is COPY even the right thing to use, here? If not, what alternatives are there? I just want to be sure I'm implementing this in the most HTTP-kosher way.
Well, you'll need a way to make the Destination a URI then (why is that a problem?). If you're using the Destination header field for something else, you're not using COPY per spec. (And, by the way, the current specification is RFC 4918.)
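For reference, a spec-conformant COPY per RFC 4918 carries the new resource's URI in the Destination header (the host and target URI here are hypothetical):
COPY /api/feature/5 HTTP/1.1
Host: api.example.com
Destination: http://api.example.com/api/feature/6

One way to handle the new geometry would then be a follow-up PUT or PATCH against the copy, since COPY itself only duplicates the source resource.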

Link relation granularity vs precision in a custom media type?

I am in the process of designing a custom media type for a RESTful API, and have researched the types and semantic meaning of some of the 'standard' link relations to give my design some steer.
To demonstrate the problem let's say that I have a resource that I can perform standard read, change, delete methods on and that I use the HTTP idioms of GET, PUT and DELETE respectively to implement those methods.
I could reasonably (re)use the "edit" link relation (from the IANA link registry) as defined in RFC5023 which states:
"...The value of "edit" specifies that the value of the href attribute
is the IRI of an editable Member Entry. When appearing within an
atom:entry, the href IRI can be used to retrieve, update, and delete
the Resource represented by that Entry...."
In this way, the user-agent can understand that a link with an "edit" relationship allows the resource to be retrieved, updated, and deleted via GET, PUT, and DELETE.
However, and herein lies the problem, if the resource state is edited such that the resource now supports only GET and DELETE operations, the "edit" relation is no longer precise.
In order to retain the precision I need to either i) OPTION A: specify another (compound) link relation that supports GET & DELETE only, or ii) OPTION B: specify individual links for each possible state transfer and use the appropriate ones to indicate the permitted state transfers. The latter approach offers precision but seems overly verbose.
Alternatively (OPTION C), I could leave the "edit" relationship in place and accept the lack of precision, i.e. the link would convey the GET, PUT, DELETE semantics, but a user-agent attempting a PUT would be met with an HTTP '405 Method Not Allowed' error. However, I'm not happy with this approach either, as it implies to the client a state transition which is not supported.
In summary, the question is what is the most sensible way to balance link relation generality and precision?
After some serious investigation I conclude that I'm trying to solve the wrong problem. Rather than be concerned with the granularity of HTTP verb in the definition of the Link Relation, a more refined question is 'Should the HTTP idioms (verbs) be conflated into the Link Relation?'.
I had used AtomPub as a reference of how to do Link Relations (for REST) and it turns out that this was an error. In the AtomPub mail archive Roy Fielding advises that (in REST terms) the approach to 'edit' is wrong and concludes that it is unnecessary. The argument suggests that there are other (HTTP) mechanisms to convey such properties and that they therefore have no place in 'rel' attribute.
The other mechanisms aren't made explicit in the mail archive, but I suspect they include the following options:
Let the user-agent try and examine the response (2xx or 4xx), or
Use OPTIONS to ask the resource for the permitted operations, or
Include an 'Allow' header in successful GET requests to convey permitted resource operations to the user-agent.
Interestingly, Roy considers the 'Allow' header to be "a form of hypertext".
In summary, the answer to my own question is:
"Do not conflate HTTP operations into the meaning of 'rel' "
and
"Use the (provided) HTTP mechanisms to determine permitted resource operations"
Edit: I should add that there are some special uses of POST as a data sink where these rules need to be bent a little, but those are a special case.
The WRML specification takes an approach where each "link" object can have a rel property.
GET /dogs/1
{
  "links" : {
    "self" : {
      "href" : "http://api.example.com/dogs/1",
      "rel" : "http://api.example.com/relations/self"
    }
  }
}
And the client can then follow the rel URL:
GET /relations/self
{
  "name" : "self",
  "description" : "A reference back to the same object you are currently interacting with",
  "method" : "GET"
}
The spec does recommend that each rel should have exactly one method specified. This has the benefit of being very explicit with your clients about what they should do, and it limits the amount of out-of-band knowledge that is required. I personally go back and forth on this, because I think there is some value in saying that a given "rel" provides multiple HTTP methods. Imagine a link for the owner of the dog:
GET /dogs/1
{
  "links" : {
    "self" : {
      "href" : "http://api.example.com/dogs/1",
      "rel" : "http://api.example.com/relations/self"
    },
    "owner" : {
      "href" : "http://api.example.com/owner/1",
      "rel" : "http://api.example.com/relations/owner"
    }
  }
}
It would be nice to let "owner" imply both GET and PUT, since those are both valid actions. The counter to that is that you should always do a GET before doing an update, so giving that information prior to retrieving the resource is of questionable value.
So I guess all that said I would vote for OPTION B.
Another option would be to leave the "edit" relation and allow a consumer who wants to know what they can currently perform on the resource to make a request with the OPTIONS HTTP method; the server can return a response with an Allow header to indicate the methods allowed on the resource given its current state.
It doesn't tell you whether PUT is available without an extra request, but it is fairly "clean" and lets you use a standard relation and HTTP mechanism.
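As a sketch, such an exchange could look like this (the resource path is hypothetical); the Allow header lists what the resource supports in its current state:
OPTIONS /api/articles/5 HTTP/1.1
Host: api.example.com

HTTP/1.1 200 OK
Allow: GET, DELETE
Content-Length: 0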