When defining classes, does R have any notion of private vs public slots?
If there is no strict feature for this in the language, is there a common naming scheme?
EDIT :
To give a bit of history: In the setClass function an option 'access' was provided to create so-called 'privileged slots' that could only be accessed through getters and setters provided with the class. This would allow to create private slots (by not providing a getter), but this feature has never been implemented. The help page of ?setClass currently reads :
access and version, included for compatibility with S-Plus, but
currently ignored.
So there is no such thing as private and public slots, as through the # notation every slot is reachable. And personally I'm very happy with that, as it allows me to use information from objects that is not readily reachible by using the getters and setters included in the package. It also allows me to economize heavy calculations by avoiding overhead created by getters and setters.
I am not aware of any naming convention for making a distinction between public and "private" slots. You can make the distinction for yourself by preceding the "private" slots with a dot, but that has no effect on how that slots behave. It's also not common practice, as most R programmers don't care about public and private slots. They just don't provide getters and setters for slots that the average user shouldn't reach.
To give a trivial example : the following creates a class with two slots, and a getter and setter for only one slot.
setClass("Example",
representation(
aslot = "numeric",
.ahiddenslot = "character"
)
)
setGeneric("aslot", function(x) standardGeneric("aslot"))
setMethod("aslot","Example",function(x){
x#aslot
})
setGeneric("aslot<-", function(x,value) standardGeneric("aslot<-"))
setMethod("aslot<-","Example",function(x,value){
x#aslot <- value
x
})
You can also set a show method that just doesn't print out the hidden slot:
setMethod("show","Example",function(object){
cat("Example with value for aslot: ", object#aslot,"\n")
})
This gives the following normal use :
> X <- new("Example",aslot=1,.ahiddenslot="b")
> X
Example with value for aslot: 1
> aslot(X)
[1] 1
> aslot(X) <- 3
But the .ahiddenslot is still reachable:
> slot(X,".ahiddenslot")
[1] "b"
Related
Using getters and setters is a very well known practice in object oriented languages. This is done in order to have a greater control on the variables. To achieve this, we make the variables private in java and hence we need both getters and setters there.
But in kotlin this is not the case. Here even public variables are accessed through getters and setters by default. Though setters can be used to validate an assignment to a variable, getters just return the variable as it is stored (and I think this is it for them). Hence custom getters are not required at all.
I have also seen some wrong usage of this feature where instead of writing a zero argument function, they use a val and do the computation in the getter. This creates an illusion that the thing is just a val but in reality it does not store anything and instead it performs a computation every time.
So is there a real need to have a custom getter?
getters just return the variable as it is stored (and I think this is it for them). Hence custom getters are not required at all.
If that was really the case, why have getters at all in Java? One of the goals of encapsulation is to make sure a change in the class doesn't change it's API. It's the same in Kotlin.
I have also seen some wrong usage of this feature where instead of writing a zero argument function, they use a val and do the computation in the getter. This creates an illusion that the thing is just a val but in reality it does not store anything and instead it performs a computation every time.
This is a perfectly valid use case for a custom getter. In Kotlin, one must not assume that using a property is entirely free of overhead. There are many questions to ask yourself when choosing between a property with a getter or a zero-arg function:
Does it describe behavior? Use a function (walk(), build(), etc)
Does it describe state? Use a property (firstName, lastIndex, etc)
Additionally, a property getter should not throw an exception, should be either cheap to calculate or cached on first access, and should return the same result for multiple consecutive executions. Here's examples from the standard library:
ArrayDeque.first() is a function, it throws if deque is empty.
List.lastIndex is a property, it's cheap to calculate.
Lazy<T>.value is a property, the value is computed and cached on first access.
Most delegated properties make use of custom getters.
More reading:
Why use getters and setters/accessors?
Kotlin: should I define Function or Property?
Just some more info. Other than readability, the possibility of defining a custom getter allows you to evolve a class without changing its public members, even if you started with a simple val with no custom getter.
In a language without properties like Java, if you define a public field:
public class Foo {
public final int value;
public Foo(int value) {
this.value = value;
}
}
And then later you want to modify the class to add a feature where it returns negated values if you flip a Boolean, there's no way to do it without breaking code that uses the original version of the class. So you should have used getters and setters to begin with.
But in Kotlin, you can't directly expose a backing field like this, so it's impossible to paint yourself in a corner like you could with a public field in Java. If your original class is like this:
class Foo(val value: Int)
You could modify it like this to add the feature and have no impact on code that already uses the class.
class Foo(private val originalValue: Int) {
var isNegated = false
val value: Int
get() = if (isNegated) -originalValue else originalValue
}
#Private attribute example
class C {
has $!w; #private attribute
multi method w { $!w } #getter method
multi method w ( $_ ) { #setter method
warn “Don’t go changing my w!”; #some side action
$!w = $_
}
}
my $c = C.new
$c.w( 42 )
say $c.w #prints 42
$c.w: 43
say $c.w #prints 43
#but not
$c.w = 44
Cannot modify an immutable Int (43)
so far, so reasonable, and then
#Public attribute example
class C {
has $.v is rw #public attribute with automatic accessors
}
my $c = C.new
$c.v = 42
say $c.v #prints 42
#but not
$c.v( 43 ) #or $c.v: 43
Too many positionals passed; expected 1 argument but got 2
I like the immediacy of the ‘=‘ assignment, but I need the ease of bunging in side actions that multi methods provide. I understand that these are two different worlds, and that they do not mix.
BUT - I do not understand why I can’t just go
$c.v( 43 )
To set a public attribute
I feel that raku is guiding me to not mix these two modes - some attributes private and some public and that the pressure is towards the method method (with some : sugar from the colon) - is this the intent of Raku's design?
Am I missing something?
is this the intent of Raku's design?
It's fair to say that Raku isn't entirely unopinionated in this area. Your question touches on two themes in Raku's design, which are both worth a little discussion.
Raku has first-class l-values
Raku makes plentiful use of l-values being a first-class thing. When we write:
has $.x is rw;
The method that is generated is:
method x() is rw { $!x }
The is rw here indicates that the method is returning an l-value - that is, something that can be assigned to. Thus when we write:
$obj.x = 42;
This is not syntactic sugar: it really is a method call, and then the assignment operator being applied to the result of it. This works out, because the method call returns the Scalar container of the attribute, which can then be assigned into. One can use binding to split this into two steps, to see it's not a trivial syntactic transform. For example, this:
my $target := $obj.x;
$target = 42;
Would be assigning to the object attribute. This same mechanism is behind numerous other features, including list assignment. For example, this:
($x, $y) = "foo", "bar";
Works by constructing a List containing the containers $x and $y, and then the assignment operator in this case iterates each side pairwise to do the assignment. This means we can use rw object accessors there:
($obj.x, $obj.y) = "foo", "bar";
And it all just naturally works. This is also the mechanism behind assigning to slices of arrays and hashes.
One can also use Proxy in order to create an l-value container where the behavior of reading and writing it are under your control. Thus, you could put the side-actions into STORE. However...
Raku encourages semantic methods over "setters"
When we describe OO, terms like "encapsulation" and "data hiding" often come up. The key idea here is that the state model inside the object - that is, the way it chooses to represent the data it needs in order to implement its behaviors (the methods) - is free to evolve, for example to handle new requirements. The more complex the object, the more liberating this becomes.
However, getters and setters are methods that have an implicit connection with the state. While we might claim we're achieving data hiding because we're calling a method, not accessing state directly, my experience is that we quickly end up at a place where outside code is making sequences of setter calls to achieve an operation - which is a form of the feature envy anti-pattern. And if we're doing that, it's pretty certain we'll end up with logic outside of the object that does a mix of getter and setter operations to achieve an operation. Really, these operations should have been exposed as methods with a names that describes what is being achieved. This becomes even more important if we're in a concurrent setting; a well-designed object is often fairly easy to protect at the method boundary.
That said, many uses of class are really record/product types: they exist to simply group together a bunch of data items. It's no accident that the . sigil doesn't just generate an accessor, but also:
Opts the attribute into being set by the default object initialization logic (that is, a class Point { has $.x; has $.y; } can be instantiated as Point.new(x => 1, y => 2)), and also renders that in the .raku dumping method.
Opts the attribute into the default .Capture object, meaning we can use it in destructuring (e.g. sub translated(Point (:$x, :$y)) { ... }).
Which are the things you'd want if you were writing in a more procedural or functional style and using class as a means to define a record type.
The Raku design is not optimized for doing clever things in setters, because that is considered a poor thing to optimize for. It's beyond what's needed for a record type; in some languages we could argue we want to do validation of what's being assigned, but in Raku we can turn to subset types for that. At the same time, if we're really doing an OO design, then we want an API of meaningful behaviors that hides the state model, rather than to be thinking in terms of getters/setters, which tend to lead to a failure to colocate data and behavior, which is much of the point of doing OO anyway.
BUT - I do not understand why I can’t just go $c.v( 43 ) To set a public attribute
Well, that's really up to the architect. But seriously, no, that's simply not the standard way Raku works.
Now, it would be entirely possible to create an Attribute trait in module space, something like is settable, that would create an alternate accessor method that would accept a single value to set the value. The problem with doing this in core is, is that I think there are basically 2 camps in the world about the return value of such a mutator: would it return the new value, or the old value?
Please contact me if you're interested in implementing such a trait in module space.
I currently suspect you just got confused.1 Before I touch on that, let's start over with what you're not confused about:
I like the immediacy of the = assignment, but I need the ease of bunging in side actions that multi methods provide. ... I do not understand why I can’t just go $c.v( 43 ) To set a public attribute
You can do all of these things. That is to say you use = assignment, and multi methods, and "just go $c.v( 43 )", all at the same time if you want to:
class C {
has $!v;
multi method v is rw { $!v }
multi method v ( :$trace! ) is rw { say 'trace'; $!v }
multi method v ( $new-value ) { say 'new-value'; $!v = $new-value }
}
my $c = C.new;
$c.v = 41;
say $c.v; # 41
$c.v(:trace) = 42; # trace
say $c.v; # 42
$c.v(43); # new-value
say $c.v; # 43
A possible source of confusion1
Behind the scenes, has $.foo is rw generates an attribute and a single method along the lines of:
has $!foo;
method foo () is rw { $!foo }
The above isn't quite right though. Given the behavior we're seeing, the compiler's autogenerated foo method is somehow being declared in such a way that any new method of the same name silently shadows it.2
So if you want one or more custom methods with the same name as an attribute you must manually replicate the automatically generated method if you wish to retain the behavior it would normally be responsible for.
Footnotes
1 See jnthn's answer for a clear, thorough, authoritative accounting of Raku's opinion about private vs public getters/setters and what it does behind the scenes when you declare public getters/setters (i.e. write has $.foo).
2 If an autogenerated accessor method for an attribute was declared only, then Raku would, I presume, throw an exception if a method with the same name was declared. If it were declared multi, then it should not be shadowed if the new method was also declared multi, and should throw an exception if not. So the autogenerated accessor is being declared with neither only nor multi but instead in some way that allows silent shadowing.
I used to think of a Record as a container for (immutable) data, until I came across some enlightening reading.
Given that functions can be seen as values in F#, record fields can hold function values as well. This offers possibilities for state encapsulation.
module RecordFun =
type CounterRecord = {GetState : unit -> int ; Increment : unit -> unit}
// Constructor
let makeRecord() =
let count = ref 0
{GetState = (fun () -> !count) ; Increment = (fun () -> incr count)}
module ClassFun =
// Equivalent
type CounterClass() =
let count = ref 0
member x.GetState() = !count
member x.Increment() = incr count
usage
counter.GetState()
counter.Increment()
counter.GetState()
It seems that, apart from inheritance, there’s not much you can do with a Class, that you couldn’t do with a Record and a helper function. Which plays better with functional concepts, such as pattern matching, type inference, higher order functions, generic equality...
Analyzing further, the Record could be seen as an interface implemented by the makeRecord() constructor. Applying (sort of) separation of concerns, where the logic in the makeRecord function can be changed without risk of breaking the contract, i.e. record fields.
This separation becomes apparent when replacing the makeRecord function with a module that matches the type’s name (ref Christmas Tree Record).
module RecordFun =
type CounterRecord = {GetState : unit -> int ; Increment : unit -> unit}
// Module showing allowed operations
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module CounterRecord =
let private count = ref 0
let create () =
{GetState = (fun () -> !count) ; Increment = (fun () -> incr count)}
Q’s: Should records be looked upon as simple containers for data or does state encapsulation make sense? Where should we draw the line, when should we use a Class instead of a Record?
Note the model from the linked post is pure, whereas the code above is not.
I do not think there is a single universal answer to this question. It is certainly true that records and classes overlap in some of their potential uses and you can choose either of them.
The one difference that is worth keeping in mind is that the compiler automatically generates structural equality and structural comparison for records, which is something you do not get for free for classes. This is why records are an obvious choice for "data types".
The rules that I tend to follow when choosing between records & classes are:
Use records for data types (to get structural equality for free)
Use classes when I want to provide C#-friendly or .NET-style public API (e.g. with optional parameters). You can do this with records too, but I find classes more straightforward
Use records for types used locally - I think you often end up using records directly (e.g. creating them) and so adding/removing fields is more work. This is not a problem for records that are used within just a single file.
Use records if I need to create clones using the { ... with ... } syntax. This is particularly nice if you are writing some recursive processing and need to keep state.
I don't think everyone would agree with this and it is not covering all choices - but generally speaking, using records for data and local types and classes for the rest seems like a reasonable method for choosing between the two.
If you want to achieve data hiding in a record, I feel there are better ways of going about it, like abstract data type "pattern".
Take a look at this:
type CounterRecord =
private {
mutable count : int
}
member this.Count = this.count
member this.Increment() = this.count <- this.count + 1
static member Make() = { count = 0 }
The record constructor is private, so the only way of constructing an instance is through the static Make member,
count field is mutable - not something to be proud about, but I'd say fair game for your counter example. Also it's not accessible from outside the module where it's defined due to private modifier. To access it from outside, you have the read-only Count property.
Like in your example, there's an Increment function on the record that mutates the internal state.
Unlike your example, you can compare CounterRecord instances using auto-generated structural comparisons - as Tomas mentioned, the selling point of records.
As for records-as-interfaces, you might see that sometimes in the field, though I think it's more of a JavaScript/Haskell idiom. Unlike those languages, F# has the interface system of .NET, made even stronger when coupled with object expressions. I feel there's not much reason to repurpose records for that.
Up-front: I am aware that R is a functional language, so please don't bite ;-)
I've had great experiences with using an OOP approach for a lot of my programs.
Now, I'm wondering if there's a way to make a distinction between public and private methods when using S4 Reference Classes in R?
Example
Class Definitions
setRefClass("B",
field=list(
b.1="numeric",
b.2="logical"
),
methods=list(
thisIsPublic=function(...) {
thisIsPublic_ref(.self=.self, ...)
},
thisIsPrivate=function(...) {
thisIsPrivate_ref(.self=.self, ...)
}
)
)
setRefClass("A",
field=list(
a.1="B"
)
)
NOTE
I usually do not place the actual method definition within the class def but separate it to a S4 method (i.e. thisIsPublic_ref) for the following reasons:
That way the class def stays clearly arranged and is easier to read in cases when the individual method defs grow quite large.
It allows you to switch to a functional execution of methods at any time. Be x an instance of a certain class, you are able to call foo_ref(.self=x) instead of x$foo().
It allows you to byte-compile the methods via compiler::cmpfun() which I think is not possible if you have "plain" Reference Class methods.
It sure does not really make sense to make it that complicated for this specific example, but I thought I'd nevertheless illustrate that approach.
Method Definitions
setGeneric(
name="thisIsPublic_ref",
signature=c(".self"),
def=function(
.self,
...
) {
standardGeneric("thisIsPublic_ref")
}
)
setGeneric(
name="thisIsPrivate_ref",
signature=c(".self"),
def=function(
.self,
...
) {
standardGeneric("thisIsPrivate_ref")
}
)
require(compiler)
setMethod(
f="thisIsPublic_ref",
signature=signature(.self="B"),
definition=cmpfun(function(
.self,
...
){
.self$b.1 * 1000
})
)
setMethod(
f="thisIsPrivate_ref",
signature=signature(.self="B"),
definition=cmpfun(function(
.self,
...
){
.self$b.2
})
)
Instances
x.b <- new("B", b.1=10, b.2=TRUE)
x.a <- new("A", a.1=x.b, a.2="hello world")
Public vs. private
Instances of class A (i.e. x.a) should be allowed to use class B's public methods:
> x.a$a.1$thisIsPublic()
[1] 10000
Instances of class A (i.e. x.a) should not be allowed to use class B's private methods. So I would want this not to work, i.e. result in an error:
> x.a$a.1$thisIsPrivate()
[1] TRUE
Any idea how one could specify this?
The only thing I came up with so far:
Adding a sender argument to each method, explicitly specify it for each method call and check if class(.self) == class(sender). But that seems a bit “explicit“.
As functions are first-class objects in R, you can embed one inside the other, as follows:
hello <- function() {
print_ <- function() {
return ('hello world')
}
print_()
}
Yes, it's cheeky, probably not the cleanest way, but it does work... Invoke using 'hello()'.
The short answer is to make a package. R's object systems and it's means of partitioning code (namespaces) are more separate than their equivalents in Java-like languages.
When you make a package, you specify what gets exported in a file called NAMESPACE using directives export and exportMethods. You can choose not to export methods and other R objects that you wish to be package private (to use Java terminology). See the Namespaces with S4 classes and methods section of the Writing R Extensions manual
Making a package is tricky the first time you do it, but there's lot's of help. See the docs for package.skeleton and the Writing R Extensions manual linked above.
Make sure Reference classes are really what you want. Regular S4 classes are usually the more R-ish way, for whatever that's worth. A great source of information about R's many OO constructs (and about packaging, too) is on Hadley Wickham's devtools wiki.
I'm often running into the same trail of thought when I'm creating private methods, which application is to modify (usually initialize) an existing variable in scope of the class.
I can't decide which of the following two methods I prefer.
Lets say we have a class Test with a field variable x. Let it be an integer. How do you usually modify / initialize x ?
a) Modifying the field directly
private void initX(){
// Do something to determine x. Here its very simple.
x = 60;
}
b) Using a return value
private int initX(){
// Do something to determine x. Here its very simple.
return 60;
}
And in the constructor:
public Test(){
// a)
initX();
// b)
x = initX();
}
I like that its clear in b) which variable we are dealing with. But on the other hand, a) seems sufficient most of the time - the function name implies perfectly well what we are doing!
Which one do you prefer and why?
Thank for your answers guys! I'll make this a community wiki as I realize that there is no correct answer to this.
I usually prefer b), only I pick a different name, like computeX() in this case. A few reasons for why:
if I declare computeX() as protected, there is a simple way for a subclass to influent how it works, yet x itself can remain a private field;
I like to declare fields final if that's what they are; in this case a) is not an option since initialization has to happen in compiler (this is Java-specific, but your examples all look Java as well).
That said, I don't have a strong preference between the two methods. For instance, if I need to initialize several related fields at once, I will usually pick option a). That, though, only if I cannot or don't want for some reason, to initialize directly in constructor.
For initialization I prefer constructor initialization if it's possible,
public Test():x(val){...}, or write initialization code in the constructor body. Constructor is the best place to initialize all the fields (actually, it is the purpose of constructor). I'd use private initX() approach only if initialization code for X is too long (just for readability) and call this function from constructor. private int initX() in my opinion has nothing to do with initialization(unless you implement lazy initialization,but in this case it should return &int or const &int) , it is an accessor.
I would prefer option b), because you can make it a const function in languages that support it.
With option a), there is a temptation for new, lazy or just time-stressed developers to start adding little extra tasks into the initX method, instead of creating a new one.
Also, in b), you can remove initX() from the class definition, so consumers of the object don't even have to know it's there. For example, in C++.
In the header:
class Test {
private: int X;
public: Test();
...
}
In the CPP file:
static int initX() { return 60; }
Test::Test() {
X = initX();
}
Removing the init functions from the header file simplifies the class for the people that have to use it.
Neither?
I prefer to initialize in the constructor and only extract out an initialization method if I need a lot of fields initialized and/or need the ability to re-initialize at another point in the life time of an instance (without going through a destruct/construct).
More importantly, what does 60 mean?
If it is a meaningful value, make it a const with a meaningful name: NUMBER_OF_XXXXX, MINUTES_PER_HOUR, FIVE_DOZEN_APPLES, SPEED_LIMIT, ... regardless of how and where you subsequently use it (constructor, init method or getter function).
Making it a named constant makes the value re-useable in and of itself. And using a const is much more "findable", especially for more ubiquitous values (like 1 or -1) then using the actual value.
Only when you want to tie this const value to a specific class would it make sense to me to create a class const or var, or - it the language does not support those - a getter class function.
Another reason to make it a (virtual) getter function would be if descendant classes need the ability to start with a different initial value.
Edit (in response to comments):
For initializations that involve complex calculations I would also extract out a method to do the calculation. The choice of making that method a procedure that directly modifies the field value (a) or a function that returns the value it should be given (b), would be driven by the question whether or not the calculation would be needed at other times than "just the constructor".
If only needed at initialization in the constructor, I would prefer method (a).
If the calculation needs to be done at other times as well, I would opt for method (b) as it also makes it possible to assign the outcome to some other field or local variable and so can be used by descendants or other users of the class without affecting the inner state of the instance.
Actually only a) method behaves as expected (by analyzing method name). Method b) should be named 'return60' in your example or 'getXValue' in some more complicated one.
Both options are correct in my opinion. It all depeneds what was your intention when certain design was choosen. If your method has to do initialization only I would prefer a) beacuse it is simplier. In case x value is also used for something else somewhere in logic using b) option might lead to more consistent code.
You should also always write method names clearly and make those names corresponding with actual logic. (in this case method b) has confusing name).
#Frederik, if you use option b) and you have a LOT of field variables, the constructor will become a quite unwieldy block of code. Sometimes you just can't help but have lots and lots of member variables in a class (example: it's a domain object and it's data comes straight from a very wide table in the database). The most pragmatic approach would be to modularize the code as you need to.