Is this API too simple?

There are many key-value stores available, and currently you have to choose one and stick with it. I believe an independent, open API, not made by a key-value store vendor, would make switching between stores much easier.
Therefore I'm building a datastore abstraction layer (like ODBC, but focused on simpler key-value stores) so that someone can build an app once and change key-value stores later if necessary. Is this API too simple?
get(Key)
set(Key, Value)
exists(Key)
delete(Key)
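For illustration, a minimal Python sketch of what such an abstraction layer could look like (the class and backend names are invented):

# Sketch: one thin wrapper, many interchangeable backends.
class KVStore:
    def __init__(self, backend):
        self.backend = backend  # any object exposing the same four methods

    def get(self, key):
        return self.backend.get(key)

    def set(self, key, value):
        self.backend.set(key, value)

    def exists(self, key):
        return self.backend.exists(key)

    def delete(self, key):
        self.backend.delete(key)

# An in-memory backend, handy for tests:
class DictBackend:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]  # raises KeyError if the key is absent

    def set(self, key, value):
        self._data[key] = value

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        del self._data[key]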
All the APIs I have seen so far seem to add so much more, so I was wondering: how many additional methods are actually necessary?
I have received some replies saying that set(null) could be used to delete an item, and that if get returns null the item doesn't exist. This is bad for two reasons: first, it is not good to mix return values and statuses, and second, not all languages have the concept of null. See:
Do all programming languages have a clear concept of NIL, null, or undefined?
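One way to keep get free of in-band nulls is an out-of-band signal such as an exception or an explicit sentinel; a Python sketch (the sentinel name is invented):

# Sketch: signal "missing" without overloading the value null/None.
MISSING = object()

def get_or_default(store, key, default=MISSING):
    if store.exists(key):
        return store.get(key)
    if default is MISSING:
        raise KeyError(key)  # status is reported out-of-band, not via the value
    return default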
I do want to be able to perform many types of operations on the data, but as I understand it, everything can be built on top of a key-value store. Is this correct? And should I provide these value-added functions too, e.g. mapreduce or indexes?
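As an illustration of the "built on top" claim, here is a minimal Python sketch of a secondary index layered on just the four primitives (the helper names and key scheme are invented):

# Sketch: a secondary index built only from get/set/exists/delete.
# Index entries are ordinary key-value pairs: "idx:<field>:<value>" -> primary key.
def set_with_index(store, key, record, field):
    store.set(key, record)
    store.set('idx:%s:%s' % (field, record[field]), key)

def find_by(store, field, value):
    index_key = 'idx:%s:%s' % (field, value)
    if not store.exists(index_key):
        return None
    return store.get(store.get(index_key))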
Internally we already have a basic version of this in Erlang and Ruby, and it has saved us a lot of time. It has also enabled us to test the performance of different key-value stores for specific use cases.

Do only what is absolutely necessary. Instead of asking whether it is too simple, ask whether it is too much, even if it only has one method.

Your API lacks some useful functions like "hasKey" and "clear". You might want to look at, say, Python's hack at it, http://docs.python.org/tutorial/datastructures.html#dictionaries, and pick and choose additional functions.
Everyone is saying, "simple is good" and that's true until "simple is too simple."

If all you are doing is getting, setting, and deleting keys, this is fine.

There is no such thing as "too simple" for an API. The simpler the better! If it solves the need the way it is, then leave it.

The delete method is unnecessary. You can just pass null to set.
Edited to add:
I'm only kidding! I would keep delete, and probably add Count, Contains, and maybe an enumerator (or two).

When creating an API, you need to ask yourself what it provides to the user. If your API is so simplistic that it is faster and easier for your clients to write their own, then it has failed. Ask yourself: does my functionality give them specific benefits? If the answer is no, it is too simplistic and generic.

I am all for simplifying an interface to its bare minimum but without having more details about the requirements of the system, it is tough to tell if this interface is sufficient. Sure looks concise enough though.
Don't forget to document the semantics for "key non-existent", as they aren't clear from reading your API definition above. Updated: I see you have added the exists method. Is it necessary? You could use the get method and define a NIL of some sort, no?
Maybe worth thinking about: how about considering "freshness" of a value? i.e. an associated "last-modified" timestamp? Of course, it depends on your system requirements.
What about access control? Is it within scope of the API definition?
What about iterating through the keys? If there is a possibility of a large set, you might want to include some pagination semantics.
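If iteration does make it into the API, cursor-based paging keeps it usable for large keyspaces; a rough Python sketch (list_keys is an assumed backend capability, and a real backend would page natively rather than materialize every key):

# Sketch: cursor-based key listing so a huge keyspace is never returned at once.
def scan(store, cursor=0, count=100):
    keys = store.list_keys()  # assumption: the backend can enumerate its keys
    page = keys[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(keys) else None
    return page, next_cursor  # next_cursor is None on the last page

# Usage:
# cursor = 0
# while cursor is not None:
#     page, cursor = scan(store, cursor)
#     handle(page)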

As mentioned, the simpler the better, but a simple iterator or key-listing method could be of use. I always end up needing to iterate through the set. A "size()" method too, if not taken care of by the iterator. It obviously depends on your usage, though.

It's not too simple, it's beautiful. If "exists(key)" is just a convenient shorthand for "get(Key) != null", you should consider removing it. I guess that depends on how large or complex the value you get() is.


Model validation logic in the database via constraints. Good idea, bad idea, or not worth it?

It's always rubbed me the wrong way to write code in my model's clean method to validate various constraints on the data when these same constraints aren't also present in the database.
After all, the database already has constraints for some of my data, like NOT NULL.
So, in my most recent project I've been writing RawSQL migrations that ADD CONSTRAINT some_logic, matching whatever logic I have in my clean() method.
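For concreteness, such a migration might look like this (the app, table, constraint, and condition are made up):

# Hypothetical example of the kind of migration described above.
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('billing', '0002_invoice')]

    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE billing_invoice "
                "ADD CONSTRAINT amount_positive CHECK (amount > 0);",
            reverse_sql="ALTER TABLE billing_invoice "
                        "DROP CONSTRAINT amount_positive;",
        ),
    ]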
It works OK, but it isn't an insignificant task to remember to add these constraints, add tests for these migrations, and update them when my model changes. Also, of course, I'm violating DRY by writing code in two places to do the same thing.
Should I give up this quixotic quest?
This is by no means a comprehensive answer, but at least I wanted to give my opinion.
There have been many frameworks that pushed the idea of removing constraints from the database in order to check them at the application level. The idea seemed nice to me at first (in the early 2000s), but after some years I came to the (very personal) conclusion that this is a bad idea.
To me, it boils down to two things:
Data survives much longer than the applications. Whole systems go obsolete, but the data lives on for many more years. Sometimes the application is replaced, but the database is still the same one.
The application is not as reliable when it comes to validating data. I'm talking about programming defects here: one version of the app may work well, and then the next one has a bug. Or a developer leaves the company and the replacement, who doesn't know the system as well, changes the app with disastrous consequences. All that time, a simple database constraint (usually very cheap to implement) could have enforced data quality.
Yep, I'm a fan of strict database constraints. Nevertheless, this doesn't mean I'm against application-level validations; they can show much nicer error messages.
If writing too much logic in clean() feels dirty, an in-between solution would be to use Django's built-in validators directly on your model fields.
The validation logic isn't saved in the database, but it is tracked in migrations. Like clean() logic, Validators require you to call Model.clean_fields(), but a ModelForm does this automatically.
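A minimal sketch of that approach (the model, field, and validator are invented):

# Sketch: a field-level validator instead of custom clean() logic.
from django.core.exceptions import ValidationError
from django.db import models

def validate_positive(value):
    if value <= 0:
        raise ValidationError('%s is not positive' % value)

class Invoice(models.Model):
    # Runs in full_clean()/clean_fields(), and automatically in a ModelForm.
    amount = models.DecimalField(max_digits=10, decimal_places=2,
                                 validators=[validate_positive])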
You can also dig into django-db-constraints. The library might help do what you're looking to do, and the source code might help you roll a solution that fits your needs.

redis - see if a string contains any key from a set of keys

I have a set of strings, which I was planning to store in a redis set. What I want to do is check whether any of these strings [s] is present inside a subject string (say S1).
I read about SSCAN in redis, but it lets me check whether any set member matches a pattern. I want it the other way around: I want to check whether any of the patterns matches my string. Is this possible?
What you want to do is not possible, but if your plan is to match prefixes, as in an autocomplete, take a look at sorted sets and ZRANGEBYLEX. Have a look at this example; even though it's not working right now, the code is very simple.
There are several ways to do it; it just depends on how you want it done. SSCAN is a perfectly legitimate approach where you do the processing client-side, potentially over the network. Depending on your requirements, this may or may not be a good choice.
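A client-side sketch of that with redis-py (the set key and subject are placeholders):

import redis

r = redis.Redis()

def first_contained_member(set_key, subject):
    # Stream set members with SSCAN and test each one against the
    # subject string on the client.
    for member in r.sscan_iter(set_key):
        if member.decode() in subject:
            return member
    return None

# first_contained_member('patterns', 'the subject string S1')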
The opposite way is to let Redis do it for you, or as much of it as possible, to save on bandwidth, latency, and client CPU. The trade-off is, of course, having Redis's CPU do the work, so it may impact performance in some cases.
When it comes to letting Redis do the work, first understand that the functionality you're describing is not built in, so you need to build your own. Again, the right design depends on your specific use case (e.g. how big s and S1 are, whether S1 is indexable as well, ...). Without this information it is hard to make accurate recommendations, but the naive approach (mine) would be to use Lua for the job. The script's logic should either check every substring of S1 for existence in s with SISMEMBER, or do Lua pattern matching of each of s's members against S1.
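A naive sketch of that Lua route, driven from redis-py (the set name and subject are placeholders; the nested loop issues O(n^2) SISMEMBER calls in the length of S1):

import redis

r = redis.Redis()

# Lua sketch: return the first substring of ARGV[1] that is a member of KEYS[1].
LUA = """
local subject = ARGV[1]
for i = 1, #subject do
    for j = i, #subject do
        local candidate = string.sub(subject, i, j)
        if redis.call('SISMEMBER', KEYS[1], candidate) == 1 then
            return candidate
        end
    end
end
return false
"""

find_member = r.register_script(LUA)
# match = find_member(keys=['patterns'], args=['the subject string S1'])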
This solution, of course, has plenty of room for optimization if some assumptions/rules are set.
Edit: sorted sets and the ZLEX* commands are also possibly good for this kind of thing, as @Soveran points out. To augment his example and for further inspiration, see here for a reversed version and think of the possibilities :) I still can't understand why no one has gone and implemented FTS in Redis!

Writing an API, benefits of: including nested objects automatically, not at all, or provide a parameter to specify which to include?

For example, we have an entity called ServiceConfig that contains a pointer to a Service and a Professional. Returned without including the nested objects, the pointer would look like this:
{
    'type': '__Pointer',
    'className': 'Service',
    'objectId': 'q92he840'
}
At that point they could query again to retrieve the service. However, it is often the case that they need the Service name, and then it is inefficient to have to query again for the service every time.
Options:
Automatically return the Service. In that case we should automatically return the Industry for that Service as well, in case they need that... and the same applies to everything else. It seems like we would be returning data too often here.
Allow them to pass an includes parameter that specifies which entities to include. The format is an array of strings, where a . lets them include nested entities. In this case ['Professional', 'Service.Industry'] would work.
Can anyone identify why any one solution would be better than the others? I feel that the last solution is the best, but it does not seem to be common in the APIs I've seen.
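For illustration, a rough Python sketch of how the includes parameter could be expanded server-side (fetch_object and the field layout are invented):

# Sketch: expand pointer fields named in an includes list such as
# ['Professional', 'Service.Industry'].
def expand(obj, includes, fetch_object):
    for path in includes:
        head, _, rest = path.partition('.')
        pointer = obj.get(head)
        if isinstance(pointer, dict) and pointer.get('type') == '__Pointer':
            nested = fetch_object(pointer['className'], pointer['objectId'])
            if rest:
                expand(nested, [rest], fetch_object)
            obj[head] = nested
    return obj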
This is a good API Design decision to spend your time on before you release an initial version. Both your approaches are valid and it all depends on what you think are the most common ways that clients would use your API.
Here are some points that you could consider:
You might prefer the first approach, where you do not give all the data upfront. Sometimes it is about efficiency, and at times it is also about security: ensuring that additional important data is only fetched on an as-needed basis and with authorization.
Implementing the second approach is going to take more effort on the part of your team to design, code, and test the API. So you might want to consider how much effort you want to put into release 1.0.
Since you have nested data, the second approach will serve you well. Several public APIs do this, as a matter of fact. For example, look at the LinkedIn public API, particularly the facets section, where you can specify the fields or additional information that you would like returned.
Look at some of the client applications that you have written; if you can identify for sure that some data is always needed upfront, that can help in designing the returned data.
Eventually, monitoring API usage and analyzing the number of calls and methods invoked will give you good input on what to do next.
If I had to make a choice and have a little bit more leeway in terms of effort, I would go with the 2nd option, even if it is a simple version at first.

What is the "right" way to make a list (or, more generically, just an object) available in multiple places?

In a program that I'm responsible for, we want to start keeping track of milestones. These milestones are quite simple and consist of a unique identifier, the project they're assigned to, a description, and a date that they should be accomplished by (or not, if there's no concrete due date).
We use a slightly modified Model-View-Presenter architecture, and currently I'm passing this list around through the presenters, but it seems fairly clunky, so I was wondering:
What's the best way to make this list available to all the presenters/views that need it?
We're using VB.NET 3.5, and I was toying with the idea of making this a shared property of the main presenter, but it does seem like that adds some unnecessary coupling.
I agree with Oded about keeping it as you have it, but if you insist on having it the way you describe, you could consider implementing it (the collection) as a singleton.
Have a read through this article
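For what the singleton route could look like, a minimal sketch (in Python for brevity; a VB.NET version would hang the list off a Shared property in the same way):

# Sketch: a lazily created, process-wide milestone list (classic singleton).
class MilestoneRegistry:
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self.milestones = []

# Every presenter reaches the same list:
# MilestoneRegistry.instance().milestones.append(milestone)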

Modelica - how to implement a constructor for a record

What is the best way to implement a constructor for a record? It seems like a function should be able to return a record object when the record is instantiated in some model higher up the tree, but I can't get that to work. For now I just use a bunch of parameters at the top of the record to populate the variables stored in it, but it seems like that will only work in simple cases.
Can anyone shed a little light? Perhaps I shouldn't be using a record but a model. Also, does anyone know how the PDE functionality is coming along? The book only says that it is coming, but I have seen some other things around.
I don't seem to have the clout to add tags (which makes sense, since my "reputation" is lower than yours) so sorry about that. I thought I had actually added one at one point, but perhaps I am mistaken.
I think you need to be clear what you mean by constructor since it has a very specific meaning in Modelica. If I understand your question correctly, it sounds like what you want to do is create an instance of a record that has some fields that are specified in the constructor arguments and from those arguments a bunch of other fields in the record are computed. Is that correct?
If so, there is a mechanism to do this. You mention "the book", but it isn't clear which one you mean. If it is mine, it definitely has no mention of these so-called "record constructors" because it is too old. I do not know whether Peter Fritzson's book mentions them either. However, they do exist and are documented in Section 12.6 of the Modelica 3.2 specification.
As for PDEs, there has been work on this kind of thing, but nothing has really been done within the design group on the topic. I would add that if you want to solve elliptic or parabolic PDEs on regular grids, this isn't too hard even with the current language; the only real drawback is that most tools probably don't handle sparsity very efficiently. Irregular grids would also be possible, but then you get into complicated basis functions. Finally, hyperbolic PDEs are, in my opinion, quite tricky (in any environment) due to the implicit physical constraints between time and space, which are difficult to express (i.e. the CFL condition).
I hope that answers your questions so far.
I can only comment on your question regarding the book of Peter Fritzson. He confirmed that he's working on an update and he hopes to get it ready 'in the course of 2011'.
Original post here:
http://openmodelica.org/index.php/forum/topic?id=50
And thanks for initiating the modelica tag; it might be useful for me too in the near future... :-)
regards,
Roel