Is it possible to use Morphia in Scala?
Are there any other lightweight ORMs for MongoDB that support scala?
Check out Salat:
https://github.com/novus/salat
Salat uses pickled Scala signatures to serialize and deserialize case classes.
Morphia is just a persistence layer based on mongo-java-driver that uses JPA-style annotations for object mapping. It should work perfectly well with Scala.
Among the "native" Scala drivers (it's worth mentioning that all of them are also based on mongo-java-driver), Rogue (developed by Foursquare) is ideologically the closest to Morphia (though it doesn't use annotations, which aren't considered Scala-idiomatic).
I prefer "Mongo Scala Driver":
https://github.com/osinka/mongo-scala-driver
Morphia is probably much more approachable and has a (much) smoother learning curve, but it's crucial to realize that the static type-safety and auto-completion support Rogue gives you when querying is really one level above Morphia, which is only runtime-safe, as its README admits right at the beginning.
Compare:
val checkin: Option[Checkin] =
  Checkin where (_.venueid eqs id)
    and (_.userid eqs mayor.id)
    and (_.cheat eqs false)
    and (_._id after sixtyDaysAgo)
    limit(1).get()
vs
Employee scottsBoss =
ds.find(Employee.class).filter("underlings", scottsKey).get();
If you change any of the field names or query values to be incorrect, you'll get an immediate typing error, whereas Morphia will only throw an exception at runtime.
See http://engineering.foursquare.com/2011/01/21/rogue-a-type-safe-scala-dsl-for-querying-mongodb/
I have been exploring CQRS/DDD principles and patterns for a while now and have started implementing a sample project where I have split my storage model into a WriteModel and a ReadModel. The WriteModel will use a simple NoSQL-like database where aggregates are stored in a key-value style, with the value being just a serialized version of the aggregate.
I am now looking at ProtoBuf-Net for serializing and deserializing my domain model aggregates in and out of storage. Other than this post, I haven't found any guidance or tips for using ProtoBuf-Net in this area. The point is that the (ideal) requirement for serializing and deserializing aggregates is that the domain model should know as little as possible about this infrastructural concern, which implies the following:
No attributes on the classes
No constructors, getters, setters or any other piece of code just for the sake of serialization.
Ability to use any (custom) type possible and have it serialized/deserialized.
Thus far I have implemented just the serialization of the first versions of my aggregates, which works perfectly fine. I use the RuntimeTypeModel.Default instance to configure the meta-model at runtime and have UseConstructor = false everywhere, which enables me to completely separate the serialization mechanics from my domain assembly. I have even implemented a custom post-deserialization mechanism that enables me to just-in-time initialize fields after ProtoBuf-Net has deserialized the data into a valid instance. So suppose I have class AggregateA like so:
[Version(1)]
public sealed class AggregateA
{
private readonly int _x;
private readonly string _y;
...
}
Then, in my serialization library, I have code along the following lines:
var metaType = RuntimeTypeModel.Default.Add(typeof(AggregateA), false);
metaType.UseConstructor = false;
metaType.AddField(1, "_x");
metaType.AddField(2, "_y");
...
However, I realize that up to this point I have only implemented the basic scenario, and I am now starting to think about how to approach versioning of my model. I am particularly interested in larger refactoring scenarios, where type A has been split into types A1 and A2, for example:
[Version(2)]
public sealed class AggregateA1
{
private readonly int _x;
...
}
[Version(2)]
public sealed class AggregateA2
{
private readonly string _y;
...
}
Suppose I have a bunch of serialized instances of AggregateA, but now my domain model knows only AggregateA1 and AggregateA2: how would you handle this scenario with ProtoBuf-Net?
A second question deals with point 3: is ProtoBuf-Net capable of handling arbitrary types if you're willing to put in some extra configuration effort? I've read about exceptions raised when using the DateTimeOffset type, which makes me think not all types can be serialized by the framework out of the box, but can I serialize these types by registering them in the RuntimeTypeModel? Should I even want to go there? Or is it better to forget about serializing common .NET types other than the simple ones?
protobuf-net is intended to work with predictable known models. It is true that everything can be configured at runtime, but I have not put any thought into how to handle your A1/A2 scenario, precisely because that is not a supported scenario (in my defense, I can't see that working nicely with most serializers). Thinking off the top of my head, if you have the configuration/mapping data somewhere, then you could simply deserialize twice; i.e. as long as we still tell it that AggregateA1._x maps to 1 and AggregateA2._y maps to 2, you can do:
object a1 = model.Deserialize(source, null, typeof(AggregateA1));
source.Position = 0; // rewind
object a2 = model.Deserialize(source, null, typeof(AggregateA2));
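For illustration, a minimal sketch of that mapping, configured in the same runtime style the question already uses (the tag numbers reuse the ones AggregateA had):
var model = RuntimeTypeModel.Create();

var a1 = model.Add(typeof(AggregateA1), false);
a1.UseConstructor = false;
a1.AddField(1, "_x"); // the tag AggregateA originally used for _x

var a2 = model.Add(typeof(AggregateA2), false);
a2.UseConstructor = false;
a2.AddField(2, "_y"); // the tag AggregateA originally used for _y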
However, more complex tweaks would require additional thought.
Re "arbitrary types"... define "arbitrary" ;p In particular, there is support for "surrogate" types which can be useful for some transformations - but without a very specific "problem statement" it is hard to answer completely.
Summary:
protobuf-net has an intended usage, which includes both serialization-aware (attributed, etc) and non-aware scenarios (runtime configuration, etc) - but it also works for a range of more bespoke scenarios (letting you drop to the raw reader/writer API if you want to). It does not and cannot guarantee to be a direct fit for every serialization scenario imaginable, and how well it behaves will depend on how far from that scenario you are.
I have a lot to learn in the way of OO patterns and this is a problem I've come across over the years. I end up in situations where my classes' sole purpose is procedural, just basically wrapping a procedure up in a class. It doesn't seem like the right OO way to do things, and I wonder if someone is experienced with this problem enough to help me consider it in a different way. My specific example in the current application follows.
In my application I'm taking a set of points from engineering survey equipment and normalizing them to be used elsewhere in the program. By "normalize" I mean a set of transformations of the full data set until a destination orientation is reached.
Each transformation procedure will take the input of an array of points (i.e. of the form class point { float x; float y; float z; }) and return an array of the same length but with different values. For example, a transformation like point[] RotateXY(point[] inList, float angle). The other kind of procedure would be of the analysis type, used to supplement the normalization process and decide what transformation to do next. This type of procedure takes in the same points as a parameter but returns a different kind of dataset.
My question is, what is a good pattern to use in this situation? The one I was about to code was a Normalization class which inherits from class types such as RotationXY, for instance. But RotationXY's sole purpose is to rotate the points, so it would basically be implementing a single function. This doesn't seem very nice, though, for the reasons I mentioned in the first paragraph.
Thanks in advance!
The most common/natural approach for finding candidate classes in your problem domain is to look for nouns and then scan for the verbs/actions associated with those nouns to find the behavior that each class should implement. While this is generally good advice, it doesn't mean that your objects must only represent concrete elements. When processes (which are generally modeled as methods) start to grow and become complex, it is a good practice to model them as objects. So, if your transformation has enough weight of its own, it is OK to model it as an object and do something like:
class RotateXY
{
    public function apply(point p)
    {
        // Apply the transformation
    }
}

t = new RotateXY();
newPoint = t->apply(oldPoint);
In case you have many transformations, you can create a polymorphic hierarchy and even chain one transformation after another. If you want to dig a bit deeper, you can also take a look at the Command design pattern, which closely relates to this.
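A rough C# sketch of such a hierarchy (ITransformation, RotateXY and CompositeTransformation are illustrative names, not anything from the question):
using System;

public readonly record struct Point(float X, float Y, float Z);

public interface ITransformation
{
    // one transformation step: takes a point, returns a transformed point
    Point Apply(Point p);
}

public sealed class RotateXY : ITransformation
{
    private readonly float _angle;
    public RotateXY(float angle) { _angle = angle; }

    public Point Apply(Point p)
    {
        var cos = (float)Math.Cos(_angle);
        var sin = (float)Math.Sin(_angle);
        return new Point(p.X * cos - p.Y * sin, p.X * sin + p.Y * cos, p.Z);
    }
}

// chains several transformations in order (the Composite mentioned in the comments below)
public sealed class CompositeTransformation : ITransformation
{
    private readonly ITransformation[] _steps;
    public CompositeTransformation(params ITransformation[] steps) { _steps = steps; }

    public Point Apply(Point p)
    {
        foreach (var step in _steps) p = step.Apply(p);
        return p;
    }
}
Applying a transformation to the whole survey data set is then just a loop (or a Select) over the array of points.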
Some final comments:
If it fits your case, it is a good idea to model the transformation at the point level and then apply it to a collection of points. That way you can properly isolate the transformation concept, and it is also easier to write test cases. You can later even create a Composite of transformations if you need to.
I generally don't like the Utils (or similar) classes with a bunch of static methods, since in most of the cases it means that your model is missing the abstraction that should carry that behavior.
HTH
Typically, when it comes to classes that contain only static methods, I name them Util, e.g. DbUtil for facading DB access, FileUtil for file I/O, etc. So find some term that all your methods have in common and use it to name your Util class. Maybe in your case GeometryUtil or something along those lines.
Since the particulars of the transformations you apply seem ad hoc to the problem and possibly prone to change in the future, you could code them in a configuration file.
The point's client would read from the file and know what to do. As for the rotation or any other transformation method, they could fit well as part of the Point class.
I see nothing particularly wrong with classes/interfaces having essentially just one member.
In your case the member is an "operation with some arguments of one type that returns the same type" - common for some math/functional problems. You may find it convenient to have an interface/base class plus helper methods that combine multiple transformation classes into a more complex transformation.
Alternative approach: if your language supports it, just go functional style altogether (similar to LINQ in C#).
On the functional-style suggestion: I'd start with the following basic functions (you can probably just find them in the standard libraries for the language):
collection = map(collection, perItemFunction) to transform all items in a collection (Select in C#)
item = reduce(collection, aggregateFunction) to reduce all items into a single entity (Aggregate in C#)
Combine two functions on an item: funcOnItem = combine(funcFirst, funcSecond). Can be expressed as a lambda in C#: Func<T,T> combined = x => second(first(x));
"bind"/curry - fix one of the arguments of a function: functionOfOneArg = curry(funcOfTwoArgs, fixedFirstArg). Can be expressed in C# as a lambda: Func<T,T> curried = x => funcOfTwoArgs(fixedFirstArg, x);
This list will let you do something like "rotate all points in the collection around the X axis by 10 and shift Y by 15": map(points, combine(curry(rotateX, 10), curry(shiftY, 15))).
The syntax will depend on the language. In JavaScript, for example, you just pass functions (and map/reduce are already part of the language); in C#, lambdas and the Func delegates (e.g. a one-argument function is Func<T,R>) are an option. In some languages you have to explicitly use a class/interface to represent a "function" object.
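A minimal C# sketch of that pipeline (Point, RotateX, ShiftY and Combine are illustrative names, not anything from the question):
using System;
using System.Linq;

public readonly record struct Point(float X, float Y, float Z);

public static class PointTransforms
{
    // combine: apply 'first', then 'second'
    public static Func<Point, Point> Combine(Func<Point, Point> first, Func<Point, Point> second) =>
        p => second(first(p));

    // "curry" the angle argument of a rotation around the X axis
    public static Func<Point, Point> RotateX(float angle) => p =>
    {
        var cos = (float)Math.Cos(angle);
        var sin = (float)Math.Sin(angle);
        return new Point(p.X, p.Y * cos - p.Z * sin, p.Y * sin + p.Z * cos);
    };

    // "curry" the offset argument of a shift along Y
    public static Func<Point, Point> ShiftY(float dy) => p => new Point(p.X, p.Y + dy, p.Z);
}

// "map" is Select in C#:
// var normalized = points.Select(PointTransforms.Combine(PointTransforms.RotateX(10f), PointTransforms.ShiftY(15f))).ToArray();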
Alternative approach: if you are actually dealing with points and transformations, another traditional approach is to use matrices to represent all linear operations (if your language supports custom operators, you get very natural-looking code).
Object oriented programming in one way or another is very much possible in R. However, unlike for example Python, there are many ways to achieve object orientation:
The R.oo package
S3 and S4 classes
Reference classes
the proto package
My question is:
What major differences distinguish these ways of OO programming in R?
Ideally the answers here will serve as a reference for R programmers trying to decide which OO programming method best suits their needs.
As such, I am asking for detail, presented in an objective manner, based on experience and backed with facts and references. Bonus points for clarifying how these methods map to standard OO practices.
S3 classes
Not really objects, more of a naming convention
Based around the . syntax: e.g. for print, print calls print.lm, print.anova, etc., and if nothing more specific is found, print.default.
S4 classes
Can dispatch on multiple arguments
More complicated to implement than S3
Reference classes
Primarily useful to avoid making copies of large objects (pass by reference)
Description of reasons to use RefClasses
proto
ggplot2 was originally written in proto, but will eventually be rewritten using S3.
Neat concept (prototypes, not classes), but seems tricky in practice
Next version of ggplot2 seems to be moving away from it
Description of the concept and implementation
R6 classes
By-reference
Does not depend on S4 classes
"Creating an R6 class is similar to the reference class, except that there’s no need to separate the fields and methods, and you can’t specify the types of the fields."
Edit on 3/8/12: The answer below responds to a piece of the originally posted question which has since been removed. I've copied it below, to provide context for my answer:
How do the different OO methods map to the more standard OO methods used in e.g. Java or Python?
My contribution relates to your second question, about how R's OO methods map to more standard OO methods. As I've thought about this in the past, I've returned again and again to two passages, one by Friedrich Leisch, and the other by John Chambers. Both do a good job of articulating why OO-like programming in R has a different flavor than in many other languages.
First, Friedrich Leisch, from "Creating R Packages: A Tutorial" (warning: PDF):
S is rare because it is both interactive and has a system for object-orientation. Designing classes clearly is programming, yet to make S useful as an interactive data analysis environment, it makes sense that it is a functional language. In "real" object-oriented programming (OOP) languages like C++ or Java class and method definitions are tightly bound together, methods are part of classes (and hence objects). We want incremental and interactive additions like user-defined methods for pre-defined classes. These additions can be made at any point in time, even on the fly at the command line prompt while we analyze a data set. S tries to make a compromise between object orientation and interactive use, and although compromises are never optimal with respect to all goals they try to reach, they often work surprisingly well in practice.
The other passage comes from John Chambers' superb book "Software for Data Analysis". (Link to quoted passage):
The OOP programming model differs from the S language in all but the first point, even though S and some other functional languages support classes and methods. Method definitions in an OOP system are local to the class; there is no requirement that the same name for a method means the same thing for an unrelated class. In contrast, method definitions in R do not reside in a class definition; conceptually, they are associated with the generic function. Class definitions enter in determining method selection, directly or through inheritance. Programmers used to the OOP model are sometimes frustrated or confused that their programming does not transfer to R directly, but it cannot. The functional use of methods is more complicated but also more attuned to having meaningful functions, and can't be reduced to the OOP version.
S3 and S4 seem to be the official (i.e. built-in) approaches for OO programming. I have begun using a combination of S3 with functions embedded in the constructor function. My goal was to have an object$method() type syntax so that I have semi-private fields. I say semi-private because there is no way of really hiding them (as far as I know). Here is a simple example that doesn't actually do anything:
#' Constructor
EmailClass <- function(name, email) {
  nc = list(
    name = name,
    email = email,
    get = function(x) nc[[x]],
    set = function(x, value) nc[[x]] <<- value,
    props = list(),
    history = list(),
    getHistory = function() return(nc$history),
    getNumMessagesSent = function() return(length(nc$history))
  )
  # Add a few more methods
  nc$sendMail = function(to) {
    cat(paste("Sending mail to", to, 'from', nc$email))
    h <- nc$history
    h[[(length(h) + 1)]] <- list(to = to, timestamp = Sys.time())
    assign('history', h, envir = nc)
  }
  nc$addProp = function(name, value) {
    p <- nc$props
    p[[name]] <- value
    assign('props', p, envir = nc)
  }
  nc <- list2env(nc)
  class(nc) <- "EmailClass"
  return(nc)
}

#' Define S3 generic method for the print function.
print.EmailClass <- function(x) {
  if (class(x) != "EmailClass") stop()
  cat(paste(x$get("name"), "'s email address is ", x$get("email"), sep = ''))
}
And some test code:
test <- EmailClass(name="Jason", "jason@bryer.org")
test$addProp('hello', 'world')
test$props
test
class(test)
str(test)
test$get("name")
test$get("email")
test$set("name", "Heather")
test$get("name")
test
test$sendMail("jbryer@excelsior.edu")
test$getHistory()
test$sendMail("test@domain.edu")
test$getNumMessagesSent()
test2 <- EmailClass("Nobody", "dontemailme@nowhere.com")
test2
test2$props
test2$getHistory()
test2$sendMail('nobody@exclesior.edu')
Here is a link to a blog post I wrote about this approach: http://bryer.org/2012/object-oriented-programming-in-r. I would welcome comments, criticisms, and suggestions on this approach, as I am not convinced myself that it is the best one. However, for the problem I was trying to solve it has worked great. Specifically, for the makeR package (http://jbryer.github.com/makeR) I did not want users to change data fields directly, because I needed to ensure that an XML file representing my object's state would stay in sync. This worked perfectly as long as users adhered to the rules I outline in the documentation.
If I search the internet for NHibernate Criteria API query examples, some use Restrictions and others use Expression. What are the differences between those two?
For example:
posts = session.CreateCriteria<Post>()
    .Add(Expression.Eq("Id", 1))
    .List<Post>();

posts = session.CreateCriteria<Post>()
    .Add(Restrictions.Eq("Id", 1))
    .List<Post>();
I think Restrictions was introduced in NH2 and is now the favoured way.
According to ReSharper, whenever I use Expression I get a hint saying "Access to a static member of a type via a derived type".
Also, according to this post by Ayende:
Prefer to use the Restrictions instead of the Expression class for defining Criteria queries.
In the source code for NHibernate.Criterion.Expression it says: "This class is semi-deprecated, use Restrictions".
Expression inherits from Restrictions, but it is recommended to use Restrictions; Expression is apparently deprecated.
According to Ayende (old post about NH 2.0), documentation will usually refer to Restrictions.
I'm designing a language, and I'm wondering if it's reasonable to make reference types non-nullable by default, and use "?" for nullable value and reference types. Are there any problems with this? What would you do about this:
class Foo {
    Bar? b;
    Bar b2;
    Foo() {
        b.DoSomething();  // valid, but will cause exception
        b2.DoSomething(); // ?
    }
}
My current language design philosophy is that nullability should be something a programmer is forced to ask for, not given by default on reference types (in this, I agree with Tony Hoare - Google for his recent QCon talk).
On this specific example, with the non-nullable b2, it wouldn't even pass static checks: conservative analysis cannot guarantee that b2 isn't null, so the program is not semantically meaningful.
My ethos is simple enough. References are an indirection handle to some resource, which we can traverse to obtain access to that resource. Nullable references are either an indirection handle to a resource, or a notification that the resource is not available, and one is never sure up front which semantics are being used. This gives either a multitude of checks up front (Is it null? No? Yay!), or the inevitable NPE (or equivalent). Most programming resources are, these days, not massively resource constrained or bound to some finite underlying model - null references are, simplistically, one of...
Laziness: "I'll just bung a null in here". Which frankly, I don't have too much sympathy with
Confusion: "I don't know what to put in here yet". Typically also a legacy of older languages, where you had to declare your resource names before you knew what your resources were.
Errors: "It went wrong, here's a NULL". Better error reporting mechanisms are thus essential in a language
A hole: "I know I'll have something soon, give me a placeholder". This has more merit, and we can think of ways to combat this.
Of course, solving each of the cases that NULL currently caters for with a better linguistic choice is no small feat, and may add more confusion than it helps. We can always move to immutable resources, so NULL in its only useful states (error, and hole) isn't of much real use. Imperative techniques are here to stay though, and I'm frankly glad - this makes the search for better solutions in this space worthwhile.
Having reference types be non-nullable by default is the only reasonable choice. We are plagued by languages and runtimes that have screwed this up; you should do the Right Thing.
This feature was in Spec#. They defaulted to nullable references and used ! to indicate non-nullables. This was because they wanted backward compatibility.
In my dream language (of which I'd probably be the only user!) I'd make the same choice as you, non-nullable by default.
I would also make it illegal to use the . operator on a nullable reference (or anything else that would dereference it). How would you use them? You'd have to convert them to non-nullables first. How would you do this? By testing them for null.
In Java and C#, the if statement can only accept a bool test expression. I'd extend it to accept the name of a nullable reference variable:
if (myObj)
{
    // in this scope, myObj is non-nullable, so can be used
}
This special syntax would be unsurprising to C/C++ programmers. I'd prefer a special syntax like this to make it clear that we are doing a check that modifies the type of the name myObj within the truth-branch.
I'd add a further bit of sugar:
if (SomeMethodReturningANullable() into anotherObj)
{
    // anotherObj is non-nullable, so can be used
}
This just gives the name anotherObj to the result of the expression on the left of the into, so it can be used in the scope where it is valid.
I'd do the same kind of thing for the ?: operator.
string message = GetMessage() into m ? m : "No message available";
Note that string message is non-nullable, but so are the two possible results of the test above, so the assignment is valid.
And then maybe a bit of sugar for the presumably common case of substituting a value for null:
string message = GetMessage() or "No message available";
Obviously or would only be validly applied to a nullable type on the left side, and a non-nullable on the right side.
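For comparison, C#'s null-coalescing operator ?? behaves much like the or proposed above; a tiny self-contained illustration (GetMessage is a made-up method):
#nullable enable
using System;

class OrOperatorDemo
{
    // hypothetical method that may have no message to return
    static string? GetMessage() => DateTime.Now.Second % 2 == 0 ? "Hello" : null;

    static void Main()
    {
        // ?? uses the left-hand value if it is non-null, otherwise the right-hand one
        string message = GetMessage() ?? "No message available";
        Console.WriteLine(message);
    }
}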
(I'd also have a built-in notion of ownership for instance fields; the compiler would generate the IDisposable.Dispose method automatically, and the ~Destructor syntax would be used to augment Dispose, exactly as in C++/CLI.)
Spec# had another syntactic extension related to non-nullables, due to the problem of ensuring that non-nullables had been initialized correctly during construction:
class SpecSharpExampleClass
{
    private string! _nonNullableExampleField;

    public SpecSharpExampleClass(string s)
        : _nonNullableExampleField(s)
    {
    }
}
In other words, you have to initialize fields in the same way as you'd call other constructors with base or this - unless of course you initialize them directly next to the field declaration.
Have a look at the Elvis operator proposal for Java 7. This does something similar, in that it encapsulates a null check and method dispatch in one operator, with a specified return value if the object is null. Hence:
String s = mayBeNull?.toString() ?: "null";
checks whether mayBeNull is null: if it is, s is assigned the string "null"; otherwise s gets the value of mayBeNull.toString(). Food for thought, perhaps.
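For reference, C# later gained essentially the same feature as the null-conditional operator; a small illustration (mayBeNull is a made-up variable):
#nullable enable
using System;

class ElvisDemo
{
    static void Main()
    {
        object? mayBeNull = null;                   // some value that may or may not be null
        string s = mayBeNull?.ToString() ?? "null"; // ?. short-circuits on null, ?? supplies the fallback
        Console.WriteLine(s);                       // prints "null"
    }
}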
A couple of examples of similar features in other languages:
boost::optional (C++)
Maybe (Haskell)
There's also Nullable<T> (from C#) but that is not such a good example because of the different treatment of reference vs. value types.
In your example you could add a conditional message send operator, e.g.
b?->DoSomething();
to send a message to b only if it is non-null.
Have nullability be a configuration setting, enforceable in the author's source code. That way, you allow people who like nullable objects by default to enjoy them in their source code, while allowing those who would like all their objects to be non-nullable by default to have exactly that. Additionally, provide keywords or some other facility to explicitly mark which of your declarations of objects and types can be nullable and which cannot, with something like nullable and not-nullable, to override the global defaults.
For instance
/// "translation unit 1"
#set nullable
{ /// Scope of default override, making all declarations within the scope nullable implicitly
    Bar bar;                /// Can be null
    non-null Foo foo;       /// Overridden, cannot be null
    nullable FooBar foobar; /// Overridden, can be null, even without the scope definition above
}
/// Same style for opposite
/// ...
/// Top-bottom, until reset by scoped-setting or simply reset to another value
#set nullable;
/// Nullable types implicitly
#clear nullable;
/// Can also use '#set nullable = false' or '#set not-nullable = true'. Ugly, but human mind is a very original, mhm, thing.
Many people argue that giving everyone what they want is impossible, but if you are designing a new language, try new things. Tony Hoare introduced the concept of null in 1965 because he could not resist (his own words), and we have been paying for it ever since (also his own words; the man regrets it). The point is, smart, experienced people make mistakes that cost the rest of us; don't take anyone's advice on this page as if it were the only truth, including mine. Evaluate and think about it.
I've read many many rants on how it's us poor inexperienced programmers who really don't understand where to really use null and where not, showing us patterns and antipatterns that are meant to prevent shooting ourselves in the foot. All the while, millions of still inexperienced programmers produce more code in languages that allow null. I may be inexperienced, but I know which of my objects don't benefit from being nullable.
Here we are, 13 years later, and C# did it (see the sketch after the quoted abstract below).
And, yes, this is the biggest improvement in languages since Barbara and Stephen invented types in 1974:
Programming With Abstract Data Types
Barbara Liskov
Massachusetts Institute of Technology
Project MAC
Cambridge, Massachusetts
Stephen Zilles
Cambridge Systems Group
IBM Systems Development Division
Cambridge, Massachusetts
Abstract
The motivation behind the work in very-high-level languages is to ease the programming task by providing the programmer with a language containing primitives or abstractions suitable to his problem area. The programmer is then able to spend his effort in the right place; he concentrates on solving his problem, and the resulting program will be more reliable as a result. Clearly, this is a worthwhile goal. Unfortunately, it is very difficult for a designer to select in advance all the abstractions which the users of his language might need. If a language is to be used at all, it is likely to be used to solve problems which its designer did not envision, and for which the abstractions embedded in the language are not sufficient. This paper presents an approach which allows the set of built-in abstractions to be augmented when the need for a new data abstraction is discovered. This approach to the handling of abstraction is an outgrowth of work on designing a language for structured programming. Relevant aspects of this language are described, and examples of the use and definitions of abstractions are given.
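As a rough illustration of "C# did it", here is the question's Foo/Bar example under C# 8 nullable reference types (a sketch; the exact warnings depend on compiler settings):
#nullable enable

class Bar
{
    public void DoSomething() { }
}

class Foo
{
    Bar? b;   // explicitly nullable, as the question proposes with '?'
    Bar b2;   // non-nullable by default

    public Foo(Bar b2)
    {
        this.b2 = b2;     // leaving b2 unassigned would draw a compiler warning
        b?.DoSomething(); // the nullable field has to be checked (or ?.-guarded) before use
        b2.DoSomething(); // fine: b2 is known to be non-null here
    }
}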
I think null values are good: They are a clear indication that you did something wrong. If you fail to initialize a reference somewhere, you'll get an immediate notice.
The alternative would be that values are sometimes initialized to a default value. Logical errors are then a lot more difficult to detect, unless you put detection logic in those default values. This would be the same as just getting a null pointer exception.