I have an algorithm with one entry-point function, calculate(x, y), which calls many smaller functions. Those functions need two big files that I want to load only once.
One way to load them only once is to load them up front and pass them to calculate(x, y, f1, f2), which then passes them on to each function called inside:
def calculate(x, y, f1, f2):
    a = function1(x, y, f1)
    b = function2(x, y, f2)
def function1(x, y, f1):
    # process further with f1, which is passed on as an argument further and further...
    ...
Another way is to make a class:
class Algo:
    def __init__(self):
        self.f1 = load_f1...
        self.f2 = load_f2...
    def calculate(self, x, y):
        a = self.function1(x, y)
        b = self.function2(x, y)
    def function1(self, x, y):
        # process further with self.f1...
        ...
which loads the files in __init__ and is then called this way.
Is this a proper way to encapsulate a function in a class?
Is there a pattern that solves this problem?
A common technique for solving this problem is encapsulating the large files in an object and passing those objects to the constructor of your algorithm class. For example:
class Algo:
    def __init__(self, file1, file2):
        self.f1 = file1
        self.f2 = file2
    def calculate(self, x, y):
        a = self.function1(x, y)
        b = self.function2(x, y)
    def function1(self, x, y):
        # process further with self.f1...
        ...
Opening the files is the responsibility of the object that creates the Algo object. For example:
large_file_1 = open("some_file.txt")
large_file_2 = open("some_other_file.txt")
algo = Algo(large_file_1, large_file_2)
It may seem like a minor difference between passing the files into the constructor as arguments or simply opening the files within the constructor (as in your second example), but there are some important distinctions:
The responsibility of opening the files is moved outside of the Algo class
Fake files can be injected into the Algo class for testing
In the first case, the Algo class should only be concerned with the execution of your desired algorithm. Opening a file may not seem like a separate responsibility, but what if the logic for opening the file were to change? For example, suppose that the files were later compressed as .zip files to save space: the Algo class should not have to change. Instead, the creator of the Algo object would be responsible for altering how the files are opened. From the perspective of the Algo class, as long as a file handle is provided, it does not care whether the file was stored as text, binary, compressed text, etc. By passing the files as arguments to the Algo constructor, we ensure that the Algo class is decoupled from the specific representation of the file.
In the second case, we are able to test the Algo class in isolation. Instead of using real data, we could inject fake files and test that the algorithm works with those fake files. If we had included the logic for opening the files within the body of the constructor, the Algo class would be forced to use the production (non-test) files and would be unable to test against faked/stubbed data.
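As a minimal sketch of the testing point, assuming the algorithm only reads lines from the file handles it is given, in-memory io.StringIO objects can stand in for the real files. The constructor here parses the contents once, and the body of calculate is hypothetical; both are assumptions made only so there is something to check:

```python
import io


class Algo:
    def __init__(self, file1, file2):
        # Read each file-like object once and keep the parsed numbers.
        self.f1 = [int(line) for line in file1]
        self.f2 = [int(line) for line in file2]

    def calculate(self, x, y):
        # Hypothetical computation, used only for illustration.
        return x * sum(self.f1) + y * sum(self.f2)


# Fake "files" injected in place of the production ones.
fake_file_1 = io.StringIO("1\n2\n3\n")
fake_file_2 = io.StringIO("10\n20\n")

algo = Algo(fake_file_1, fake_file_2)
result = algo.calculate(2, 1)
print(result)  # 2 * 6 + 1 * 30 = 42
```

Because Algo never calls open() itself, the same class works unchanged with real file handles in production.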
I think, concerning the SRP (single responsibility principle), it is definitely possible to use a separate class just for loading your two big files.
I think it is far more logical to use a separate class rather than calling different methods, because as a user of your interface (or however your code will be used) I don't want to call several different methods; I just want to do
FileLoader fileLoader = new FileLoader();
List<File> loadedFiles = fileLoader.init(filePaths);

public class FileLoader {
    private final List<File> readFiles = new ArrayList<>();

    // loadFile is assumed to return a File for the given path
    public List<File> init(List<String> filePaths) {
        filePaths.forEach(s -> readFiles.add(loadFile(s)));
        return readFiles;
    }

    public int calculate(Algorithm alg) { ... }
}
If you need these files only for the calculations you mentioned, you could simply do the init and then call the calculation method in this class and inject your desired algorithm.
If you want to retrieve the files for further use, you could store them in an accessible list like I showed in the code example. My example is in Java but can be translated to any other language.
I know what classes are about, but for better understanding I need a use case. Recently I discovered the construct of data classes. I get the idea behind normal classes, but I cannot imagine a real use case for data classes.
When should I use a data class, and when should I use a "normal" class? As far as I know, all classes keep data.
Can you provide a good example that distinguishes data classes from non-data classes?
A data class is used to store data. It's lighter than a normal class and can be compared to an array with key/value pairs (dictionary, hash, etc.), but represented as an object with fixed attributes. In Kotlin, according to the documentation, declaring a class as a data class adds the following members to it:
equals()/hashCode() pair
toString() of the form "User(name=John, age=42)"
componentN() functions corresponding to the properties in their order of declaration.
copy() function
It also behaves differently with respect to class inheritance:
If there are explicit implementations of equals(), hashCode(), or toString() in the data class body or final implementations in a superclass, then these functions are not generated, and the existing implementations are used.
If a supertype has componentN() functions that are open and return compatible types, the corresponding functions are generated for the data class and override those of the supertype. If the functions of the supertype cannot be overridden due to incompatible signatures or due to their being final, an error is reported.
Providing explicit implementations for the componentN() and copy() functions is not allowed.
So in Kotlin, if you want to describe an object (a piece of data), then you may use a data class, but if you're creating a complex application and your class needs special behavior in the constructor, or uses inheritance or abstraction, then you should use a normal class.
I do not know Kotlin, but in Python a dataclass can be seen as a structured dict. When you want to use a dict to store an object that always has the same attributes, you should not put it in a dict but use a dataclass.
The advantage over a normal class is that you don't need to write the __init__ method, as it is "automatic" (generated by the decorator).
Example :
This is a normal class
class Apple:
    def __init__(self, size:int, color:str, sweet:bool=True):
        self.size = size
        self.color = color
        self.sweet = sweet
Same class as a dataclass
from dataclasses import dataclass
@dataclass
class Apple:
    size: int
    color: str
    sweet: bool = True
The advantage compared to a dict is that you are sure of which attributes it has. It can also contain methods.
The advantage over a normal class is that it is simpler to declare and makes the code lighter. We can see that each attribute name (e.g. size) is repeated 3 times in the normal class, but appears only once in the dataclass.
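To make the "lighter" claim concrete, here is a small sketch of what the decorator generates for the Apple dataclass besides __init__: a readable __repr__ and a field-by-field __eq__. The values passed in are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Apple:
    size: int
    color: str
    sweet: bool = True


a = Apple(3, "red")
b = Apple(3, "red")

print(a)       # Apple(size=3, color='red', sweet=True)
print(a == b)  # True: generated __eq__ compares the fields
print(a is b)  # False: they are still two separate objects
```

A hand-written class would need explicit __repr__ and __eq__ methods to behave the same way.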
The advantage of a normal class is that you can customize the __init__ method (you can in a dataclass too, but then you lose its main advantage, I think). Example:
# You need only 2 variables to initialize your class
class Apple:
    def __init__(self, size:int, color:str):
        self.size = size
        self.color = color
        # But you get much more info from those two.
        self.sweet = True if color == 'red' else False
        self.weight = self.__compute_weight()
        # PRICE_PER_GRAM is assumed to be defined elsewhere
        self.price = self.weight * PRICE_PER_GRAM
    def __compute_weight(self):
        # ...
        return (self.size**2)*10 # That's a random example
Abstractly, a data class is a pure, inert information record that doesn’t require any special handling when copied or passed around, and it represents nothing more than what is contained in its fields; it has no identity of its own. A typical example is a point in 3D space:
data class Point3D(
    val x: Double,
    val y: Double,
    val z: Double
)
As long as the values are valid, an instance of a data class is entirely interchangeable with its fields, and it can be taken apart or rematerialized at will. Often there is even little use for encapsulation: users of the data class can just access the instance’s fields directly. The Kotlin language provides a number of convenience features when data classes are declared as such in your code, which are described in the documentation. Those are useful when, for example, building more complex data structures that employ data classes: you can have a hashmap assign values to particular points in space, and then look up a value using a newly constructed Point3D.
val map = HashMap<Point3D, String>()
map.set(Point3D(3.0, 4.0, 5.0), "point of interest")
println(map.get(Point3D(3.0, 4.0, 5.0))) // prints "point of interest"
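For readers following the Python examples elsewhere in this discussion, the same lookup can be sketched with a frozen dataclass. Passing frozen=True makes instances immutable and hashable, so a newly constructed point with equal fields finds the stored entry. This is an illustrative Python analogue, not a statement about how Kotlin implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


points = {Point3D(3.0, 4.0, 5.0): "point of interest"}

# A different instance with the same field values is an equal key.
label = points[Point3D(3.0, 4.0, 5.0)]
print(label)  # point of interest
```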
For an example of a class that is not a data class, take FileReader. Underneath, this class probably keeps some kind of file handle in a private field, which you can assume to be an integer (as it actually is on at least some platforms). But you cannot expect to store this integer in a database, have another process read that same integer from the database, reconstruct a FileReader from it and expect it to work. Passing file handles between processes requires more ceremony than that, if it is even possible on a given platform. That property makes FileReader not a data class. Many examples of non-data classes will be of this kind: any class whose instances represent transient, local resources like a network connection, a position within a file or a running process, cannot be a data class. Likewise, any class where different instances should not be considered equal even if they contain the same information is not a data class either.
From the comments, it sounds like your question is really about why non-data classes exist in Kotlin and why you would ever choose not to make a data class. Here are some reasons.
Data classes are a lot more restrictive than a regular class:
They have to have a primary constructor, and every parameter of the primary constructor has to be a property.
They cannot have an empty primary constructor.
They cannot be open so they cannot be subclassed.
Here are other reasons:
Sometimes you don't want a class to have a copy function. If a class holds onto some heavy state that is expensive to copy, maybe it shouldn't advertise that it should be copied by presenting a copy function.
Sometimes you want to use an instance of a class in a Set or as Map keys without two different instances being considered as equivalent just because their properties have the same values.
The features of data classes are useful specifically for simple data holders, so the drawbacks are often something you want to avoid.
We all know the following code is bad because of the use of a global variable:
a = 3.14
def inc():
    global a
    a += 1
inc()
b = a
It is so bad that in Python one has to intentionally declare global to "break" the ban. But in the name of OO, how come the code below is justified?
class A:
    def __init__(self):
        self.a = 3.14
    def inc(self):
        self.a += 1
A_instance = A()
A_instance.inc()
b = A_instance.a
Within the scope of the class instance, isn't the member variable simply the same as a global variable? self basically grants every member function unrestricted access to all variables, regardless of necessity, making the class very stateful, prone to mistakes, and difficult to read (basically all the bad traits of global variables).
We also know the clean and better way is:
def inc(_a):
    return _a+1
a = 3.14
b = inc(a)
So the function inc() is at least stateless and has predictable behavior. What's the best practice for using classes while keeping them as stateless as possible? Or is OO stateful by definition, so that one has to go to functional programming without classes at all? I have also heard people say that @staticmethod methods are stupid, break encapsulation, and should be avoided...
It comes down to whether you need to save state or not. If you do need to save state then it needs to be saved somewhere. Classes allow you to save state alongside the methods that operate on that state and enforce a class-local scope that is less error prone than storing state within a global scope.
As a concrete example, I recently implemented a Tree class for a forest simulation. One of the variables that is important to associate with a tree is its age. Where should the age of the tree be stored? It makes more sense to store it within the Tree class than outside of it.
If protecting the age of the tree from being manipulated by code outside of the class were important, I could have used Python's excellent @property syntax for that purpose, or, less directly, the Python _varname naming convention.
Within the class, yes, the age of the tree is visible to the other methods, but that is a relatively narrow scope, certainly not akin to a global scope if your classes are written with a single responsibility in accord with SOLID object oriented design principles.
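A minimal sketch of that Tree idea, assuming the age only ever increases through a method of the class. The grow method name is hypothetical and is not taken from the original simulation:

```python
class Tree:
    def __init__(self):
        self._age = 0  # leading underscore: internal by convention

    @property
    def age(self):
        # Read-only view of the internal state.
        return self._age

    def grow(self):
        # The only place where the age changes.
        self._age += 1


tree = Tree()
tree.grow()
tree.grow()
print(tree.age)  # 2

try:
    tree.age = 100  # no setter was defined, so this fails
except AttributeError:
    print("age is read-only")
```

The state is still mutable, but every change goes through one method of one small class, which is a much narrower scope than a module-level global.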
The reason self is not considered bad practice is that, if you create a class, everything within that class ought to be related in one way or another, usually through the attributes that instances are initialized with in the __init__ dunder method. In other words, if you are going to create a class, there ought to be a link of some sort between its parts: a "class global", if you will, in the form of those attributes.
That said, creating a class whose state is hard-coded behind self, like this,
class A:
    def __init__(self):
        self.a = 3.14
    def inc(self):
        self.a += 1
is not something that is often seen. Rather, people choose the following form, which many consider cleaner and more predictable:
class A:
    def __init__(self, a):
        self.a = a
    def inc(self):
        self.a += 1
With this, you have more control over what each instance actually holds, so instances are created with explicit values:
A_instance = A(3.14) #a will be 3.14 in this instance
B_instance = A(3.142) #a will be 3.142 in this instance
and so on.
Class variables, which are declared inside the class but outside the __init__ function, are considerably different. They look like this:
class A:
    a = 22/7
    def __init__(self):
        ...
Here a is shared across the whole class (it belongs to the class itself, not to an instance). So what does that look like in practice?
class A:
    a = 22/7
    def __init__(self, item):
        self.item = item
A_instance = A(3.14)
B_instance = A(5)
Although A_instance and B_instance are completely different objects (A_instance.item equals 3.14 while B_instance.item equals 5), both see the class variable a as 22/7:
A_instance.a #this will return a value of 22/7
B_instance.a #this will also return a value of 22/7
So the class variable remains consistent/the same throughout all instances of that class unless it is actively altered.
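What "actively altered" means deserves a short sketch, since Python has two different ways of doing it. Assigning through an instance creates a new instance attribute that shadows the class variable for that instance only, while assigning through the class changes what every non-shadowing instance sees. The variable names first and second are illustrative:

```python
class A:
    a = 22 / 7

    def __init__(self, item):
        self.item = item


first = A(3.14)
second = A(5)

first.a = 1.0              # shadows the class variable on `first` only
print(second.a == 22 / 7)  # True: `second` still sees the class value

A.a = 3.0                  # rebinds the class variable itself
print(second.a)            # 3.0
print(first.a)             # 1.0: the instance attribute still wins
```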
The point is that if you are going to create a class, it should have specific variables that are intertwined with each other and that affect a particular group of functions. Those are the functions you make into methods. Used this way, self within classes is not wrong.
I've always known what static methods are by definition, but I've always avoided using them at school because I was afraid of what I didn't know.
I already understand that you can use it as a counter throughout your entire project.
Now that I am interning, I want to know when exactly static methods are used. From my observation so far, static classes/methods are used when a class contains a lot of functions that will be used in many different classes and does not itself hold much critical state, so that it is not necessary to create an instance of it.
So as an example, you can have a static class called Zip that zips and unzips files and provide it to many different classes for them to do whatever with it.
Am I right? Do I have the right idea? I'm pretty sure there are many ways to use it.
Static functions are helpful as they do not rely on an instantiated member of whatever class they are attached to.
Static functions can provide functionality related to a particular class without requiring the programmer to first create an instance of that class.
See this comparison:
class Numbers
{
    public int Add(int x, int y)
    {
        return x + y;
    }
    public static int AddNumbers(int x, int y)
    {
        return x + y;
    }
}
class Main
{
    //in this first case, we use the non-static version of the Add function
    int z1 = (new Numbers()).Add(2, 4);
    //in the second case, we use the static one
    int z2 = Numbers.AddNumbers(3, 5);
}
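The same comparison can be sketched in Python, where the @staticmethod decorator marks a method that receives no instance. The method names are adapted to Python naming conventions and are otherwise the same as in the comparison above:

```python
class Numbers:
    def add(self, x, y):
        # Instance method: needs an object to be called on.
        return x + y

    @staticmethod
    def add_numbers(x, y):
        # Static method: callable directly on the class.
        return x + y


z1 = Numbers().add(2, 4)        # an instance is created first
z2 = Numbers.add_numbers(3, 5)  # no instance required
print(z1, z2)  # 6 8
```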
Technically, answers above are correct.
But the examples are not correct from the OOP point of view.
For example you have a class like this:
class Zip
{
    public static function zipFile($fileName)
    {
        //
    }
    public static function unzipFile($fileName)
    {
        //
    }
}
The truth is that there is nothing object-oriented here. You just defined two functions which you need to call using the fancy syntax like Zip::zipFile($myFile) instead of just zipFile($myFile).
You don't create any objects here and the Zip class is only used as a namespace.
So in this case it is better to just define these functions outside of a class, as regular functions. PHP has had namespaces since version 5.3; you can use them if you want to group your functions.
With the OOP approach, your class would look like this:
class ZipArchive
{
    private $_archiveFileName;
    private $_files;
    public function __construct($archiveFileName) {
        $this->_archiveFileName = $archiveFileName;
        $this->_files = [];
    }
    public function add($fileName)
    {
        $this->_files[] = $fileName;
        return $this; // allows to chain calls
    }
    public function zip()
    {
        // zip the files into archive specified
        // by $_archiveFileName
    }
}
And then you can use it like this:
$archive = new ZipArchive('path/to/archive.zip');
$archive->add('file1')->add('file2')->zip();
What is more important, you can now use the zip functionality in an OOP way.
For example, you can have a base class Archive and sub-classes like ZipArchive, TarGzArchive, etc.
Now you can create an instance of a specific sub-class and pass it to other code, which will not even know whether the files are going to be zipped or tar.gz-ipped. For example:
if ($config['archive_type'] === 'targz') {
    // use tar.gz if specified
    $archive = new TarGzArchive($path);
} else {
    // use zip by default
    $archive = new ZipArchive($path);
}
$backup = new Backup($archive /*, other params*/);
$backup->run();
Now the $backup object will use the specified archive type. Internally it doesn't know and doesn't care how exactly files will be archived.
You can even have a CopyArchive class which will simply copy files to another location.
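The structure described here can be sketched end to end in Python. Everything below is a hypothetical illustration: the write method, the Backup constructor arguments, and the returned description strings are assumptions, and no real archive is produced. The point is only that Backup depends on the Archive interface and not on a concrete format.

```python
from abc import ABC, abstractmethod


class Archive(ABC):
    def __init__(self, path):
        self.path = path
        self.files = []

    def add(self, file_name):
        self.files.append(file_name)
        return self  # allows chained calls

    @abstractmethod
    def write(self):
        """Return a description of what would be written."""


class ZipArchive(Archive):
    def write(self):
        return f"zip {self.files} -> {self.path}"


class TarGzArchive(Archive):
    def write(self):
        return f"tar.gz {self.files} -> {self.path}"


class Backup:
    def __init__(self, archive, files):
        self.archive = archive
        self.files = files

    def run(self):
        # Backup only talks to the Archive interface.
        for name in self.files:
            self.archive.add(name)
        return self.archive.write()


config = {"archive_type": "targz"}
archive_cls = TarGzArchive if config["archive_type"] == "targz" else ZipArchive
report = Backup(archive_cls("backup.out"), ["file1", "file2"]).run()
print(report)  # tar.gz ['file1', 'file2'] -> backup.out
```

Adding another format means adding one more subclass; Backup does not change.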
It is easy to do it this way because your archive support is written in an OOP way. You have small objects responsible for specific things; you create and combine them and get the result you want.
And if you just have a bunch of static methods instead of a real class, you will be forced to write procedural-style code.
So I would not recommend using static methods to implement actual features of your application.
Static methods may be helpful to support logging, debugging, testing and similar things. For example, if you want to count the number of objects created, you can use a class-level static counter, increment it in the constructor, and have a static method that reads the counter and prints it or writes it to the log file.
Yes, static classes are used for problems that require stateless computation, such as adding two numbers or zipping a file.
If your class requires state, where you need to store connections or other longer-living entities, then you wouldn't use static.
AFAIK, static methods do not depend on a class instance. Just that.
As an example:
If you have a single-threaded program that will have only ONE database connection and will run several queries against the database, it will be better to implement it as a static class (note that I specified that you will never connect to several databases or have several threads).
So you will not need to create several connection objects, because you already know that you will only use one. Singletons are also an option in this scenario.
There are other examples.
If you create a class to convert values:
class Convert {
public:
    static std::string fromIntToString(int value);
};
This way you will not need to create an instance of Convert every time you need to convert an integer to a string.
std::string value = Convert::fromIntToString(10);
If you hadn't done that, you would need to instantiate this class several times throughout your program.
I know that you can find several other examples. It is up to you and your scenario to decide when to do that.
I'm new to OOP and I'm in the following situation: I have something like a report "Engine" that is used for several reports, the only thing needed is the path of a config file.
I'll code in Python, but this is a language-agnostic question. So, I have the following two approaches:
A) class ReportEngine is an abstract class that has everything needed BUT the path for the config file. This way you just have to instantiate the ReportX class
class ReportEngine(object):
    ...
class Report1(ReportEngine):
    _config_path = '...'
class Report2(ReportEngine):
    _config_path = '...'
report_1 = Report1()
B) class ReportEngine can be instantiated passing the config file path
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    ...
report_1 = ReportEngine(config_path="/files/...")
Which approach is the right one? In case it matters, the report object would be inserted in another class, using composition.
IMHO approach A) is better if you need to implement report engines that are different from each other. If your reports are populated using different logic, follow this approach.
But if the only difference among your report engines is the _config_path, I think approach B) is the right one for you. Obviously, this way you'll have shared logic to build every report, regardless of the report type.
Generally speaking, put everything that every Report has in the superclass. Put specific things in the subclasses.
So in your case, put the _config_path in the superclass ReportEngine as in B) (since every Report has a _config_path), but instantiate specific Reports as in A), where every Report can set its own path.
I don't know Python, but did a quick search for the proper syntax for Python 3.0+, I hope it makes sense:
class ReportEngine(object):
    def __init__(self, config_path):
        self._config_path = config_path
    def printPath(self):
        print(self._config_path)
    ...
class Report1(ReportEngine):
    def __init__(self):
        super().__init__('/files/report1/...')
Then a
reportObj = Report1()
reportObj.printPath()
should print
'/files/report1/...'
Basically, the main difference is that approach A is more flexible than B (a change in one report does not influence the other reports), while B is simpler and clearer (it shows exactly where the difference is), but a change affecting only one report type would require more work. If you are pretty sure the reports won't change over time, go with B; if you feel that future changes will not be common to all reports, go with A.
Up-front: I am aware that R is a functional language, so please don't bite ;-)
I've had great experiences with using an OOP approach for a lot of my programs.
Now, I'm wondering if there's a way to make a distinction between public and private methods when using S4 Reference Classes in R?
Example
Class Definitions
setRefClass("B",
    field=list(
        b.1="numeric",
        b.2="logical"
    ),
    methods=list(
        thisIsPublic=function(...) {
            thisIsPublic_ref(.self=.self, ...)
        },
        thisIsPrivate=function(...) {
            thisIsPrivate_ref(.self=.self, ...)
        }
    )
)
setRefClass("A",
    field=list(
        a.1="B",
        a.2="character"  # needed by the instance created below
    )
)
NOTE
I usually do not place the actual method definition within the class def but separate it into an S4 method (i.e. thisIsPublic_ref) for the following reasons:
That way the class def stays clearly arranged and is easier to read when the individual method defs grow quite large.
It allows you to switch to a functional execution of methods at any time. If x is an instance of a certain class, you can call foo_ref(.self=x) instead of x$foo().
It allows you to byte-compile the methods via compiler::cmpfun(), which I think is not possible with "plain" Reference Class methods.
It sure does not really make sense to make it that complicated for this specific example, but I thought I'd nevertheless illustrate that approach.
Method Definitions
setGeneric(
    name="thisIsPublic_ref",
    signature=c(".self"),
    def=function(
        .self,
        ...
    ) {
        standardGeneric("thisIsPublic_ref")
    }
)
setGeneric(
    name="thisIsPrivate_ref",
    signature=c(".self"),
    def=function(
        .self,
        ...
    ) {
        standardGeneric("thisIsPrivate_ref")
    }
)
require(compiler)
setMethod(
    f="thisIsPublic_ref",
    signature=signature(.self="B"),
    definition=cmpfun(function(
        .self,
        ...
    ){
        .self$b.1 * 1000
    })
)
setMethod(
    f="thisIsPrivate_ref",
    signature=signature(.self="B"),
    definition=cmpfun(function(
        .self,
        ...
    ){
        .self$b.2
    })
)
Instances
x.b <- new("B", b.1=10, b.2=TRUE)
x.a <- new("A", a.1=x.b, a.2="hello world")
Public vs. private
Instances of class A (i.e. x.a) should be allowed to use class B's public methods:
> x.a$a.1$thisIsPublic()
[1] 10000
Instances of class A (i.e. x.a) should not be allowed to use class B's private methods. So I would want this not to work, i.e. result in an error:
> x.a$a.1$thisIsPrivate()
[1] TRUE
Any idea how one could specify this?
The only thing I came up with so far:
Adding a sender argument to each method, explicitly specifying it for each method call, and checking if class(.self) == class(sender). But that seems a bit "explicit".
As functions are first-class objects in R, you can embed one inside the other, as follows:
hello <- function() {
    print_ <- function() {
        return('hello world')
    }
    print_()
}
Yes, it's cheeky, probably not the cleanest way, but it does work... Invoke using 'hello()'.
The short answer is to make a package. R's object systems and its means of partitioning code (namespaces) are more separate than their equivalents in Java-like languages.
When you make a package, you specify what gets exported in a file called NAMESPACE using the directives export and exportMethods. You can choose not to export methods and other R objects that you wish to be package-private (to use Java terminology). See the "Namespaces with S4 classes and methods" section of the Writing R Extensions manual.
Making a package is tricky the first time you do it, but there's lots of help. See the docs for package.skeleton and the Writing R Extensions manual mentioned above.
Make sure Reference classes are really what you want. Regular S4 classes are usually the more R-ish way, for whatever that's worth. A great source of information about R's many OO constructs (and about packaging, too) is on Hadley Wickham's devtools wiki.