API design for machine learning models - api

Using Django rest framework to build an API webservice that contains many of already trained machine learning models. Some models can predict a batch_size of 1 or an image at a time. Others need a history of data (timelines) to be able to predict/forecasts. Usually these timelines can hardly fit and passed as parameter. Being that, we want to give the requester the ability to request by either:
sending the data (small batches) to predict as parameter.
passing a database id/reference as parameter then the API will query the database and do the predictions.
So the question is, what would be the best API design for identifying which approach the requester chose?. Some considered approaches:
Add /db to the path of the endpoint ex: POST models/<X>/db. The problem with this approach is that (2x) endpoints are generated for each model.
Add parameter db as boolean to each request. The problem with such approach is that it adds additional overhead for each request just to check which approach. Also, make the code less readable.
Global variable set for each requester when signed for the API token. The problem is that you restricted the requester for 1 mode which is not convenient.
What would be the best approach for this case

The fact that you currently have more than one source would cause me to seriously consider attempting to abstract the "source" component as much as possible, to allow all manner of sources. For example, suppose that future users would like to pull data out of a mongodb, instead of a whatever db you currently are using? Or from some other storage structure? Or pull from a third party? Or, or, or....
In any case the question is now "how much do they all have in common, and what should they all implement?"
class Source(object):
def __get_batch__(self, batch_size=1):
raise NotImplementedError() #each source needs to implement this on its own
#http_library.POST_endpoint("/db")
class DBSource(Source):
def __init__(self, post_data):
if post_data["table"] in ["data1", "data2"]:
self.table = table
else:
raise Exception("Must use predefined table to prevent SQL injection")
def __get_batch__(self, batch_size=1):
return sql_library.query("SELECT * FROM {} LIMIT ?".format(self.table), batch_size)
#http_library.POST_endpoint("/local")
class LocalSource(Source):
def __init__(self, post_data):
self.data = post_data["data"]
def __get_batch__(self, batch_size=1):
data = self.data[self.i, self.i+batch_size]
i += batch_size
return data
This is just an example. However, if a fixed part of your path designates "the source", then you have left yourself open to scale this indefinitely.

Add /db to the path of the endpoint ex: POST models//db. The problem with this approach is that (2x) endpoints are generated for
each model.
Inevitable. DRY out common code to sub-methods.
Add parameter db as boolean to each request. The problem with such approach is that it adds additional overhead for each request just to
check which approach. Also, make the code less readable.
There won't be any additional overhead (that's what your underlying framework does to match a URL to a function/method anyway). However, these are 2 separate functionalities, I would keep them separate, so I would prefer the first approach.
Global variable set for each requester when signed for the API token. The problem is that you restricted the requester for 1 mode
which is not convenient.
Yikes! unless you provide a UI letting a user to select his preference and apply it globally (I don't think any UX will agree to that)
That being said, the api design should be driven by questioning who is mastering (or owning) the data. If it's the application and user already knows the ID of that entity, then you shouldn't be asking the data from the user.
If it's the user, and then if it won't fit in a POST body, then I would say, a real-time API may not be the right solution, think about message queues/pub-sub based systems.
If you need a hybrid solution as you asked in the question, then, I would prefer the 1st approach.

Related

OOP - How to pass data "up" in abstraction?

I have run into a problem when designing my software.
My software consists of a few classes, Bot, Website, and Scraper.
Bot is the most abstract, executive class responsible for managing the program at a high-level.
Website is a class which contains scraped data from that particular website.
Scraper is a class which may have multiple instances per Website. Each instance is responsible for a different part of a single website.
Scraper has a function scrape_data() which returns the JSON data associated with the Website. I want to pass this data into the Website somehow, but can't find a way since Scraper sits on a lower level of abstraction. Here's the ideas I've tried:
# In this idea, Website would have to poll scraper. Scraper is already polling Server, so this seems messy and inefficient
class Website:
def __init__(self):
self.scrapers = list()
self.data = dict()
def add_scraper(self, scraper):
self.scrapers.append(scraper)
def add_data(type, json):
self.data[type] = json
...
# The problem here is scraper has no awareness of the dict of websites. It cannot pass the data returned by Scraper into the respective Website
class Bot:
def __init__(self):
self.scrapers = list()
self.websites = dict()
How can I solve my problem? What sort of more fundamental rules or design patterns apply to this problem, so I can use them in the future?
As soon as you start talking about a many-to-many parent/child relationship, You should be thinking about compositional patterns rather than traditional inheritance. Specifically, the Decorator Pattern. Your add_scraper method is kind of a tipoff that you're essentially looking to build a handler-stack.
The classic example for this pattern is a set of classes responsible for producing the price of a coffee. You start with a base component "coffee", and you have one class per ingredient, each with its own price modifier. A class for whole milk, one for skim, one for sugar, one for hazelnut syrup, one for chocolate, etc. And all the ingredients as well as the base components share an interface that guarantees the existence of a 'getPrice' method. As the user places their order, the base component gets injected into the first ingredient/wrapper-class. The wrapped object gets injected into subsequent ingredient-wrappers and so-on, until finally getPrice is called. And each instance of getPrice should be written to first pull from the previously injected one, so the calculation reaches throughout all layers.
The benefits are that new ingredients can be added without impacting the existing menu, existing ones can have their price changed in isolation, and ingredients can be added to multiple types of drinks.
In your case, the data-struct being decorated is the Website object. The ingredient classes would be your Scrapers, and the getPrice method would be scrape_data. And the scrape_data method should expect to receive an instance of Website as a parameter, and return it after hydration. Each Scraper needs no awareness of how the other scrapers work, or which ones to implement. All it needs to know is that a previous one exists and adheres to an interface guaranteeing that it too has a scrape_data method. And all will ultimately be manipulating the same Website object, so that what gets spit back out to your Bot has been hydrated by all of them.
This puts the onus of knowing what Scrapers to apply to what Website on your Bot class, which is essentially now a Service class. Since it lives in the upper abstraction layer, it has the high-level perspective needed to know this.
One way to go about this is, taking inspiration from noded structures, to have an atribute in the Scraper class that directly references its respective Website, as if I'm understanding correctly you described a one-to-many relationship (one Website can have multiple Scrapers). Then, when a Scraper needs to pass its data to its Website, you can reference directly said atribute:
class Website:
def __init__(self):
self.scrapers = list() #You can indeed remove this list of scrapers since the
#scrapper will reference its master website, not the other way around
self.data = dict() #I'm not sure how you want the data to be stores,
#it could be a list, a dict, etc.
def add_scraper(self, scraper):
self.scrapers.append(scraper)
def add_data(type, json):
self.data[type] = json
class Scraper:
def __init__(self, master_website):
#Respective code
self.master = master_website #This way you have a direct reference to the website.
#This master_website is a Website object
...
def scrape_data(self):
json = #this returns the scraped data in JSON format
self.master.add_data(type, json)
I don't know how efficient this would be or if you want to know at any moment which scrapers are linked to which website, though

How to organize endpoints when using FeathersJS's seemingly restrictive api methods?

I'm trying to figure out if FeathersJS suits my needs. I have looked at several examples and use cases. FeathersJS uses a set of request methods : find, get, create, update, patch and delete. No other methods let alone custom methods can be implemented and used, as confirmed on this other SO post..
Let's imagine this application where users can save their app settings. Careless of following method conventions, I would create an endpoint describing the action that is performed by the user. In this case, we could have, for instance: /saveSettings. Knowing there won't be any setting-finding, -creation, -updating (only some -patching) or -deleting. I might also need a /getSettings route.
My question is: can every action be reduced down to these request methods? To me, these actions are strongly bound to a specific collection/model. Sometimes, we need to create actions that are not bound to a single collection and could potentially interact with more than one collection/model.
For this example, I'm guessing it would be translated in FeathersJS with a service named Setting which would hold two methods: get() and a patch().
If that is the correct approach, it looks to me as if this solution is more server-oriented than client-oriented in the sense that we have to know, client-side, what underlying collection is going to get changed or affected. It feels like we are losing some level of freedom by not having some kind of routing between endpoints and services (like we have in vanilla ExpressJS).
Here's another example: I have a game character that can skill-up. When the user decides to skill-up a particular skill, a request is sent to the server. This endpoint can look like POST: /skillUp What would it be in FeathersJS? by implementing SkillUpService#create?
I hope you get the issue I'm trying to highlight here. Do you have some ideas to share or recommendations on how to organize the API in this particular framework?
I'm not an expert of featherJs, but if you build your database and models with a good logic,
these methods are all you need :
for the settings example, saveSettings corresponds to setting.patch({options}) so to the route settings/:id?options (method PATCH) since the user already has some default settings (created whith the user). getSetting would correspond to setting.find(query)
To create the user AND the settings, I guess you have a method to call setting.create({defaultOptions}) when the user CREATE route is called. This would be the right way.
for the skillUp route, depends on the conception of your database, but I guess it would be something like a table that gives you the level/skills/character, so you need a service for this specific table and to call skillLevel.patch({character, level})
In addition to the correct answer that #gui3 has already given, it is probably worth pointing out that Feathers is intentionally restricting in order to help you create RESTful APIs which focus on resources (data) and a known set of methods you can execute on them.
Aside from the answer you linked, this is also explained in more detail in the FAQ and an introduction to REST API design and why Feathers does what it does can be found in this article: Design patterns for modern web APIs. These are best practises that helped scale the internet (specifically the HTTP protocol) to what it is today and can work really well for creating APIs. If you still want to use the routes you are suggesting (which a not RESTful) then Feathers is not the right tool for the job.
One strategy you may want to consider is using a request parameter in a POST body such as { "action": "type" } and use a switch statement to conditionally perform the desired action. An example of this strategy is discussed in this tutorial.

Ember change normalizeResponse based on queried model

I'm using a second datastore with my Ember app, so I can communicate with a separate external API. I have no control over this API.
With a DS.JSONSerializer I can add some missing properties like id:
normalizeResponse(store, primaryModelClass, payload, id, requestType) {
if (requestType == 'query') {
payload.forEach(function(el, index) {
payload[index].id = index
})
}
Now I can do some different tricks for each different requestType. But every response is parsed. Now sometimes a response from one request needs to be parsed differently.
So what I am trying to do is change the normalizeResponse functionality for each different request path (mapped to a fake model using pathForType in an adapter for this store). But the argument store is always the same (obviously) and the argument promaryModelClass is always "unknown mixin" - not sure if this can be any help.
How can I find what model was requested? With this information I could do a switch() in normalizeResponse.
Is there a different way to achieve my goal that does not require me to make a separate adapter for every path/model?
There are over a dozen normalize functions available. Something should work for what I am trying to achieve.
I think this is a great example of a use case of not using ember data.
Assuming that you have models A,B,C that are all working great with ember data, leave those alone.
I'd create a separate service and make raw requests to that different endpoint. So you'd replace this.store.query('thing', {args}) with a separate service that uses ember-ajax (or ember-fetch or whatever). If you need, you can use that service to hold the data that you need (Ember-data is just a service anyway) or you can create models and push them into the store manually.
Without knowing more about your exact situation, hard to give a specific code/advice, but I'd just avoid this problem and write your own custom service.
You can use primaryModelClass.modelName.

How to design a RESTful API that queries info about a verb (e.g. a potential POST request)?

I'm learning how to design a RESTful API and I've come across a quandary.
Say I have a POST endpoint to perform an action. The action has a certain cost associated with it. The cost depends on what the action is, particularly, on the body of the POST. For example, given these two requests:
POST /flooblinate
{"intensity": 50, "colorful": true, "blargs": [{"norg": 43}]}
POST /flooblinate
{"intensity": 100, "colorful": false, "blargs": []}
Say the first one costs 500 and the second one costs 740.
I want to create a method which will tell me what the cost of posting the action will be. Since I am not creating or updating anything, it seems that GET is the most appropriate verb. However, a request body with GET should not have any meaning. This means that I'd have to put the data in the query string, say by URL encoding the request body to be passed to the POST:
GET /flooblinate/getCost?body=%7B%22intensity%22%3A+50%2C+%22colorful%22%3A+true%2C+%22blargs%22%3A+%5B%7B%22norg%22%3A+43%7D%5D%7D
This seems less than ideal since it's two data formats for the same thing. But the following:
POST /flooblinate/getCost
{"intensity": 50, "colorful": true, "blargs": [{"norg": 43}]}
This also seems less than ideal since it's abusing the POST verb to query information, instead of to create or update something.
What's the correct choice to make, here? Is there any third alternative? Is there a way to rethink this design fundamentally which will obviate the need to make this choice?
Personally I'm not for adding dryRyn flags. I try to avoid boolean flags in general unless they're really required.
I've two ideas to cover this scenario:
One is to introduce state on the backend site, e.g. STARTED, FINISHED. When given resource action is submitted it has STARTED state and calculated cost which cam be viewed via GET method. Such resource may be modified with PUT and PATCH methods and is submitted when given method changes its state to FINISHED. Resources that didn't change its state for given amount of time are removed are their state is changed to other terminal value.
Second idea is to introduce a new endpoint called e.g. /calculations. If you need to calculate the cost of given action you just send the same payload (via POST) to this endpoint and in return a cost is given. Then resource send may be kept on server for some established TTL or kept forever.
In all the scenarios given (including yours) there's a need to make at least two requests.
The nicest choice here seems to be to have the endpoints return the info that need querying, and add a dryRun parameter to those endpoints. Thus:
POST /flooblinate?dryRun=true
{"intensity": 50, "colorful": true, "blargs": [{"norg": 43}]}
Returns:
{"cost": 500, /* whatever else */
And then posting with dryRun=false actually commits the action.

How do multiple versions of a REST API share the same data model?

There is a ton of documentation on academic theory and best practices on how to manage versioning for RESTful Web Services, however I have not seen much discussion on how multiple REST APIs interact with data.
I'd like to see various architectural strategies or documentation on how to handle hosting multiple versions of your app that rely on the same data pool.
For instance, suppose you make a database level destructive change to a database table that causes you to have to increment your major API version to v2.
Now at any given time, users could be interacting with the v1 web service and the v2 web service at the same time and creating data that is visible and editable by both services. How should this be handled?
Most of changes introduced to API affect the content of the response, till changes introduced are incremental this is not a very big problem (note: you should never expose the exact DB model directly to the clients).
When you make a destructive/significant change to DB model and new API version of API is introduced, there are two options:
Turn the previous version off, filter out all queries to reply with 301 and new location.
If 1. is impossible to need to maintain both previous and current version of the API. Since this might time and money consuming it should be done only for some time and finally previous version should be turned off.
What with DB model? When two versions of API are active at the same time I'd try to keep the DB model as consistent as possible - having in mind that running two versions at the same time is just temporary. But as I wrote earlier, DB model should never be exposed directly to the clients - this may help you to avoid a lot of problems.
I have given this a little thought...
One solution may be this:
Just because the v1 API should not change, it doesn't mean the underlying implementation cannot change. You can modify the v1 implementation code to set a default value, omit the saving of a field, return an unchecked exception, or do some kind of computational logic that helps the v1 API to be compatible with the shared datasource. Then, implement a better, cleaner, more idealistic implementation in v2.
when you are going to change any thing in your API structure that can change the response, you most increase you'r API Version.
for example you have this request and response:
request post: a, b, c, d
res: {a,b,c+d}
and your are going to add 'e' in your response fetched from database.
if you don't have any change based on 'e' in current client versions, you can add it on your current API version.
but if you'r new changes are going to change last responses, for example:
res: {a+e, b, c+d}
you most increase API number to prevent crashing.
changing in the request input's are the same.