I was doing some reading here (1, 2), and I was wondering whether making sure the parameters provided during initialization are correct violates the guideline that constructors shouldn't do work.
For example (Python):
class Employee:
    def __init__(self, empFirstname, empLastname, empEmail):
        self._validate_employee(empFirstname, "First Name")
        self._validate_employee(empLastname, "Last name")
        self._validate_employee(empEmail, "Email")
        self._validate_email(empEmail, "Email")
        self.empFirstname = empFirstname
        self.empLastname = empLastname
        self.empEmail = empEmail

    @property
    def email(self):
        return self.empEmail

    def _validate_employee(self, parameter, error_message):
        if not parameter:
            raise TypeError("{0} is missing".format(error_message))

    def _validate_email(self, email, parameter):
        if "@" not in email or "." not in email:
            raise TypeError("{0} is invalid".format(parameter))
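For instance, the checks fire as soon as the object is constructed:

Employee("Ada", "Lovelace", "ada@example.com")  # OK
Employee("", "Lovelace", "ada@example.com")     # TypeError: First Name is missing
Employee("Ada", "Lovelace", "ada.example")      # TypeError: Email is invalid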
In my example, I check to make sure the first and last names aren't blank and that the email is valid. Did I violate the guideline?
Update:
I'm not asking if it should throw, I'm asking if I'm violating the guideline that constructors shouldn't do work when it validates my parameters.
The requirement to provide arguments that satisfy certain conditions is a part of Design by Contract, namely it corresponds to preconditions. The client (of a constructor in your case) should guarantee that the arguments are as expected. If this is not the case, the supplier (the constructor in your case) cannot ensure the resulting object is in a valid state after the constructor returns. In languages that support Design by Contract natively, one would associate a class invariant stating that the names are not empty and the email address follows the naming convention. This could be achieved only when the arguments passed to the constructor are valid.
The conditions you mention would be written as preconditions and could be turned on or off depending on the policy established in the software development process. When enabled, they are checked on entry to the constructor, before its real body is executed. From this point of view, your code is not doing any additional work, but makes sure the arguments are correct. After program verification that ensures the preconditions are satisfied all the time, or extensive testing that gives confidence that the preconditions are not violated by the client, they can be disabled, thus removing any "additional work".
To summarize:
the "additional code" in your examples are preconditions
preconditions are part of software contracts
the checks of the contracts can be enabled or disabled depending on the adopted policy, it is safe to disable them as soon as all clients guarantee the preconditions are never violated, thus removing any "additional work" from the constructor (or any other software component)
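Python has no native Design by Contract support, but assert statements approximate this toggle: they run on entry like preconditions, and the interpreter strips them entirely when run with python -O. A minimal sketch of the Employee example in that style:

class Employee:
    def __init__(self, first_name, last_name, email):
        # Preconditions: checked on entry, before the real body runs.
        # Running with `python -O` compiles these asserts away,
        # removing the "additional work" once clients are trusted.
        assert first_name, "First name is missing"
        assert last_name, "Last name is missing"
        assert "@" in email and "." in email, "Email is invalid"
        self.first_name = first_name
        self.last_name = last_name
        self.email = email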
No. Validating parameters and inputs is vital for all functions, including constructors. Blindly accepting parameters without checking or error handling can lead to serious issues, ranging from unexpected behavior to malicious actors manipulating your program in ways that were never intended.
Related
In which place is it best practice to validate parameters sent by the user in an API design? By parameter validation I mean checking that required params are sent, ensuring they have the correct format, and so on. Here are a couple of simple examples that validate that an id has been sent, written in Python using Flask:
A) Add validation logic in the route definition, within the controller.
@api.route('/job', methods=['GET'])
def get_jobs():
    try:
        if params.get("id") is None:
            raise Exception("Invalid param ID parameters.")
        job = job_manager.get_job(params["id"])
        return jsonify(job)
    except Exception as e:
        return jsonify({"error": str(e)}), 400
B) In the core of the app. This is the business layer, where logic is applied to transform data.
class JobManager:
    def get_job(self, job_id):
        if job_id is None:
            raise Exception("Invalid param ID parameters.")
        # ... look up and return the job
In more complex scenarios a validator service or decorators could be used, but the question would be the same: at which point of the code is it best practice to validate a user's input?
If the answer is none of the scenarios above (or both), please provide more details on your answer. If possible, try to be language agnostic as I'm looking for a best practice that can be applied anywhere.
Parsing, as a rule, should happen at the point where information enters your system, or as close to that point as is practical.
Therefore certainly "application layer" rather than "domain layer/business layer": either invoked by the controller itself, or very close to it. (Not typically "in" the controller, because you should be able to test the parser without being coupled to a bunch of HTTP ceremony.)
@api.route('/job', methods=['GET'])
def get_jobs():
    try:
        job_id = parse_job_id(params["id"])
        job = job_manager.get_job(job_id)
        return jsonify(job)
    except ValueError as e:
        return jsonify({"error": str(e)}), 400
In typed languages, this can make your life a lot easier, because you greatly reduce the number of places you have to ask "does this general purpose data structure have the information I expect?"
Checks against business policy, on the other hand, normally belong in the domain layer.
For example: if your API requires a date, the checks that the date is actually present, and that the date is represented in the appropriate ISO-8601 format, and so on... these kinds of checks all happen as part of the parsing of the input by the controller.
On the other hand, checking that the date is "in the future", or that the date is within warranty, or ... these are checks that belong in your domain code.
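As a sketch of what such a parser might look like (parse_job_id and JobId are illustrative names, not from any library): it converts the raw request string into a typed value once, at the boundary, so the domain layer never sees unvalidated input:

class JobId:
    # A parsed, known-good job identifier.
    def __init__(self, value):
        self.value = value

def parse_job_id(raw):
    # Reject missing or malformed input at the edge of the system.
    if raw is None:
        raise ValueError("Missing required parameter: id")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("id must be an integer")
    if value <= 0:
        raise ValueError("id must be a positive integer")
    return JobId(value)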
Generally, I split validation into phases:
Immediate syntax validation of input data in the REST controller
Business-logic validation in the services
The first validation should only flag things that are definitely wrong, e.g. missing required fields, type mismatches, unparsable strings, any attempt at code injection, and the presence (or absence) of security tokens.
When this validation passes, the input data is at least syntactically correct and may be passed on to the services, where a more strict validation occurs; i.e. does the input data make sense business-wise, does the resource with that ID exist - and so on.
Short version: The first validation looks for things that are obviously wrong, while the following validation ensures input data is correct and meaningful business-wise.
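A sketch of the two phases (the names BookingService and BusinessRuleViolation are invented for illustration):

import datetime

class BusinessRuleViolation(Exception):
    pass

class BookingService:
    # Phase 2: business-logic validation in the service
    def create_booking(self, date):
        if date <= datetime.date.today():
            raise BusinessRuleViolation("booking date must be in the future")
        # ... proceed with the booking

booking_service = BookingService()

# Phase 1: syntax validation in the REST controller
def create_booking_endpoint(payload):
    if "date" not in payload:
        raise ValueError("date is required")             # definitely wrong
    date = datetime.date.fromisoformat(payload["date"])  # raises ValueError if unparsable
    return booking_service.create_booking(date)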
From the docs, which say:
Returns the self-reference to the instance itself:
my $b; # defaults to Any
say $b.serial.^name; # OUTPUT: «Any»
my $breakfast = 'food';
$breakfast.serial.say; # OUTPUT: «food»
I do not have the foggiest idea what this routine does; can someone please explain?
On Supplys, it is an informational method that is supposed to indicate whether there will never be any concurrent emit on that Supply.
On HyperSeq and RaceSeq, it returns a serialized Seq, so you could consider it the opposite of the hyper and race method.
In general, it appears to return itself, which seems to make sense from the HyperSeq and RaceSeq point of view.
And yes, these should be documented properly, so please create a documentation issue. Thank you!
In the doc example it does nothing. That is, if you remove it you get the same results:
my $b; # defaults to Any
say $b.^name; # OUTPUT: «Any»
my $breakfast = 'food';
$breakfast.say; # OUTPUT: «food»
More generally, I think you'd best ignore the serial method other than to open a doc issue pointing to this SO if you'd like to improve the doc.
The serial method does not appear to be in the official language
A search of the roast repo for "serial" yields zero matches.
Within Rakudo source code the method name serial has been overloaded to have one of three meanings:
A boolean declaring whether a Supply sequence is always serial. Rakudo source examples: 1, 2. This looks to me like an internal method that doesn't need to be documented.
A coercion of parallel sequence (hyper or race) to a serial version of the same sequence. This looks to me like an internal method that doesn't need to be documented.
A "no op" that returns its invocant. I suspect it would be best if it were not documented, at least until such time as its raison d'etre is clear; its official status viz-a-viz the spec (roast) is clear; and/or there's an attempt to systematically document which operations have the is nodal set on them.
None of the above seems to warrant ordinary users' attention or documentation.
The Any class definition of a serial method seems pointless
The Any class serial method returns self, i.e. when called it is a no-op.
I don't currently understand why there is an Any class definition.
One possible point for it would be that there are .serial calls made by internal code on instances of an unknown and generally unknowable class and there thus needs to be a default definition of serial in the Any class.
But a search of the rakudo repo for ".serial" suggests that calls are only made to supplies or hyper/race seqs.
That said, I note the is nodal trait on the proto serial declaration in Any that immediately precedes the multi method serial declaration. Perhaps that is the reason it's in Any.
See also Arbitrary drift of methods to Mu and Any.
The documentation you quoted seems pointless
The definition and example seem to reflect someone's sense of humor. I applaud the use of humor, but in this case I suspect the best improvement would be to just remove the page you linked.
What does noautovalidity mean on a struct member definition in the Vulkan API schema (vk.xml)?
registry.rnc describes it as a "tag stating that no automatic validity language should be generated".
The Vulkan registry spec says it "prevents automatic validity language being generated for the tagged item. Only suppresses item-specific validity - parenting issues etc. are still captured. It must also be used for structures that have no implicit validity when such structure has explicit validity."
What is automatic validity language?
What is an example of something that has noautovalidity, and for that example, what was the item-specific validity that was suppressed, and why was it decided to suppress it?
Consult the XML schema doc.
noautovalidity means most Valid Usage (Implicit) entries will not be generated for a given parameter or structure member.
It may mean Implicit Validity rules do not apply to the given item, although many times very similar Explicit VU is given in its place.
Examples:
VkWriteDescriptorSet::pImageInfo has noautovalidity because it shares descriptorCount with the struct's other descriptor arrays. It prevents generating something like "pImageInfo has to be an array of descriptorCount elements". Instead, explicit VU is given, e.g.: "If descriptorType is *_IMAGE, then pImageInfo has to be an array of descriptorCount elements".
VkComputePipelineCreateInfo::basePipelineHandle is noautovalidity because the API allows the parameter to be ignored if flags does not contain VK_PIPELINE_CREATE_DERIVATIVE_BIT. Without the attribute the spec would insist the handle has to be a valid VkPipeline (or additionally allowed to be VK_NULL_HANDLE if optional).
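For reference, a registry entry carrying the attribute looks roughly like the following; this is an illustrative paraphrase of the vk.xml shape, not a verbatim copy:

<member noautovalidity="true" len="descriptorCount">const <type>VkDescriptorImageInfo</type>* <name>pImageInfo</name></member>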
Corner case example:
VkViewport::x has noautovalidity. It seemingly does not need it, but in this case it signifies that although the struct has no Implicit Validity, some Explicit Validity may be present in the spec.
Are idempotent and deterministic functions both just functions that return the same result given the same inputs?
Or is there a distinction that I'm missing?
(And if there is a distinction, could you please help me understand what it is)
In simpler terms:
Pure deterministic function: The output is based entirely, and only, on the input values and nothing else: there is no other (hidden) input or state that it relies on to generate its output. There are no side-effects or other output.
Impure deterministic function: As with a pure deterministic function, the output is based entirely, and only, on the input values and nothing else; there is no other (hidden) input or state that it relies on to generate its output. However, there is other output (side-effects).
Idempotency: The practical definition is that you can safely call the same function multiple times without fear of negative side-effects. More formally: there are no changes of state between subsequent identical calls.
Idempotency does not imply determinacy (as a function can alter state on the first call while being idempotent on subsequent calls), but all pure deterministic functions are inherently idempotent (as there is no internal state to persist between calls). Impure deterministic functions are not necessarily idempotent.
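A few minimal Python sketches of these categories (illustrative only):

import time

def add(a, b):
    # Pure deterministic: output depends only on the inputs.
    return a + b

def log_and_add(a, b, log):
    # Impure deterministic: same return value for the same inputs,
    # but appending to `log` is a side-effect.
    log.append((a, b))
    return a + b

def now():
    # Pure nondeterministic: no side-effects, but hidden state
    # (the system clock) feeds the output.
    return time.time()

seen = set()

def mark_seen(item):
    # Idempotent: calling it twice with the same item leaves the
    # same state as calling it once.
    seen.add(item)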
| | Pure deterministic | Impure deterministic | Pure Nondeterministic | Impure Nondeterministic | Idempotent |
| --- | --- | --- | --- | --- | --- |
| Input | Only parameter arguments (incl. this) | Only parameter arguments (incl. this) | Parameter arguments and hidden state | Parameter arguments and hidden state | Any |
| Output | Only return value | Return value or side-effects | Only return value | Return value or side-effects | Any |
| Side-effects | None | Yes | None | Yes | After 1st call: Maybe. After 2nd call: None |
| SQL Example | UCASE | CREATE TABLE | GETDATE | DROP TABLE | |
| C# Example | String.IndexOf | | DateTime.Now | | Directory.Create(String) (see Footnote 1) |

Footnote 1: Directory.Create(String) is idempotent because if the directory already exists it doesn't raise an error; instead it returns a new DirectoryInfo instance pointing to the specified extant filesystem directory (instead of creating the filesystem directory first and then returning a new DirectoryInfo instance pointing to it). This is just like how Win32's CreateFile can be used to open an existing file.
A temporary note on non-scalar parameters, this, and mutating input arguments:
(I'm currently unsure how instance methods in OOP languages (with their hidden this parameter) can be categorized as pure/impure or deterministic or not, especially when it comes to mutating the target of this, so I've asked the experts on CS.SE to help me come to an answer; once I have a satisfactory answer there I'll update this answer.)
A note on Exceptions
Many (most?) programming languages today treat thrown exceptions as either a separate "kind" of return (i.e. "return to nearest catch") or as an explicit side-effect (often due to how that language's runtime works). However, as far as this answer is concerned, a given function's ability to throw an exception does not alter its pure/impure/deterministic/nondeterministic label; the same goes for idempotency (in fact, throwing is often how idempotency is implemented in the first place: a function can avoid causing any side-effects simply by throwing right before it makes those state changes, though it could equally just return).
So, for our CS-theoretical purposes, if a given function can throw an exception then you can consider the exception simply part of that function's output. What does matter is whether the exception is thrown deterministically or not (e.g. List<T>.get(int index) deterministically throws if index < 0).
Note that things are very different for functions that catch exceptions, however.
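A sketch of a deterministically throwing function in Python:

def get_item(items, index):
    # Deterministic throwing: the same (items, index) pair always yields
    # the same outcome, including the same exception.
    if index < 0:
        raise IndexError("index must be non-negative")
    return items[index]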
Determinacy of Pure Functions
For example, UCASE(val) in SQL and String.IndexOf in C#/.NET are both deterministic because the output depends only on the input. Note that in instance methods (such as IndexOf) the instance object (i.e. the hidden this parameter) counts as input, even though it's "hidden":
"foo".IndexOf("o") == 1 // first call
"foo".IndexOf("o") == 1 // second call
// the third call will also be == 1
Whereas NOW() in SQL and DateTime.UtcNow in C#/.NET are not deterministic, because the output changes even though the input remains the same (note that property getters in .NET are equivalent to a method that accepts no parameters besides the implicit this parameter):
DateTime.UtcNow == 2016-10-27 18:10:01 // first call
DateTime.UtcNow == 2016-10-27 18:10:02 // second call
Idempotency
A good example in .NET is the Dispose() method: See Should IDisposable.Dispose() implementations be idempotent?
a Dispose method should be callable multiple times without throwing an exception.
So if a parent component X makes an initial call to foo.Dispose(), it will invoke the disposal operation and X can now consider foo to be disposed. Execution/control then passes to another component Y, which also tries to dispose of foo; after Y calls foo.Dispose(), it too can expect foo to be disposed (which it is), even though X already disposed it. This means Y does not need to check whether foo is already disposed, saving the developer time, and also eliminating bugs where calling Dispose a second time might throw an exception, for example.
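A rough Python analogue of an idempotent Dispose (the Resource class name is made up):

class Resource:
    def __init__(self):
        self._closed = False

    def close(self):
        # Idempotent: second and later calls are no-ops, so any number
        # of owners can call close() without checking first.
        if self._closed:
            return
        self._closed = True
        # ... release the underlying handle here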
Another (general) example is in REST: the RFC for HTTP/1.1 states that GET, HEAD, PUT, and DELETE are idempotent, but POST is not (https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html):
Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE SHOULD NOT have side effects, and so are inherently idempotent.
So if you use DELETE then:
Client->Server: DELETE /foo/bar
// foo/bar is now deleted
Server->Client: 200 OK

Client->Server: DELETE /foo/bar
// foo/bar is already deleted, so there's nothing to do, but inform the client that foo/bar doesn't exist
Server->Client: 404 Not Found

// the client asks again:
Client->Server: DELETE /foo/bar
// foo/bar is already deleted, so there's nothing to do, but inform the client that foo/bar doesn't exist
Server->Client: 404 Not Found
So you see in the above example that DELETE is idempotent in that the state of the server did not change between the last two DELETE requests, but it is not deterministic because the server returned 200 for the first request but 404 for the second request.
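A tiny Python sketch of this distinction (the names are invented for illustration):

store = {"foo/bar": b"contents"}

def delete(path):
    # Idempotent: after any number of calls, `path` is gone from `store`.
    # Not deterministic: the first call returns 200, later calls return 404.
    if path in store:
        del store[path]
        return 200
    return 404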
A deterministic function is just a function in the mathematical sense. Given the same input, you always get the same output. On the other hand, an idempotent function is a function which satisfies the identity
f(f(x)) = f(x)
As a simple example: if UCase() is a function that converts a string to an upper-case string, then clearly UCase(UCase(s)) = UCase(s).
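The same identity in Python, using str.upper:

s = "mixedCase"
assert s.upper() == s.upper().upper()  # f(f(x)) == f(x)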
Idempotent functions are a subset of all functions.
A deterministic function will return the same result for the same inputs, regardless of how many times you call it.
An idempotent function may NOT return the same result (it will return the result in the same form, but the value could be different; see the HTTP example below). It only guarantees that it will have no side-effects. In other words, it will not change anything.
For example, the GET verb is meant to be idempotent in the HTTP protocol. If you call "~/employees/1" it will return the info for the employee with ID 1 in a specific format. It should never change anything, but simply return the employee information. If you call it 10 or 100 times, the returned format will always be the same. However, by no means is it deterministic. Perhaps by the second call the employee info has changed, or the employee no longer even exists. But it should never have side-effects or return the result in a different format.
My Opinion
Idempotent is a weird word, but knowing the origin can be very helpful: idem meaning same and potent meaning power. In other words, it means having the same power, which clearly doesn't mean no side effects, so I'm not sure where that usage comes from. A classic case of "there are only two hard things in computer science: cache invalidation and naming things". Why couldn't they just use read-only? Oh wait, they wanted to sound extra smart, perhaps? Perhaps like cyclomatic complexity?
public void PublicMethod(FooBar fooBar)
{
    if (fooBar == null)
        throw new ArgumentNullException("fooBar", "fooBar cannot be null");

    // log the call [added: Thanks S.Lott]
    _logger.Log("PublicMethod called with fooBar class " + fooBar.Classification);

    int action = DetermineAction();
    PrivateMethod(fooBar, action);
}

private void PrivateMethod(FooBar fooBar, int action)
{
    if (fooBar == null)
        throw new ArgumentNullException("fooBar", "fooBar cannot be null"); // Is this line superfluous?

    /*
        Do something
    */
}
Is it OK to skip this kind of error checking in private methods if the input is already checked on the public interface? Usually there's some sort of rule-of-thumb one can go by...
Edit:
Maybe ArgumentNullException isn't such a good example because the argument can be made that you should check at both levels but return different error messages.
I would say no.
While it certainly holds true that in this case you know it has already been checked for nullability, in two months' time the youngest intern will come along and write PublicMethod2, which also calls PrivateMethod, and, lo and behold, he forgets to check for null.
Since the public method doesn't really use fooBar, I'm not sure why it's checking. The current private method cares, but it's the private method's responsibility to care. Indeed, the whole point of a private method is to delegate all the responsibilities to it.
A method checks the input it actually uses; it doesn't check stuff it's just passing through.
If a different subclass has the same public method, but some different private method implementation -- one that can tolerate nulls -- what now? You have a public method that now has wrong constraints for the new subclass.
You want to do as little as possible in the public method so that various private implementations are free to do the right thing. Don't "over-check" or "just-in-case" check. Delegate responsibility.
I'd error-check everything you can; you never know when something might happen that you didn't think about (and it's better safe than sorry).
When using design by contract (http://en.wikipedia.org/wiki/Design_by_contract) it is normally the client's (public method's) responsibility to make a correct invocation, i.e. to pass valid parameters. In this particular scenario it depends on whether null belongs to the set of valid input values, so there are 3 options:
1) Null is a valid value: throwing exceptions or errors would mean breaking the contract; the server (private method) has to process the null and shouldn't complain.
2) Null is an invalid value and is passed by code within your control: it is up to the server (private method) to decide how to react. Obviously, throwing an exception is the more graceful way of handling the situation, but it has the cost of having to handle that exception somewhere else up the stack. Exceptions are not the best way to deal with contract violations caused by programming blunders. You really should throw exceptions not when a contract is already violated, but when it cannot be fulfilled because of environmental problems that cannot be controlled in software. Blunders are better handled by sticking an assertion at the beginning of the private method to check that the parameter is not null. This keeps the complexity of your code down, there is no cost of handling the exception up the stack, and it achieves the goal of highlighting broken contracts during testing.
3) Then there is defensive programming (http://en.wikipedia.org/wiki/Defensive_programming). When dealing with parameters passed by external code outside your control, the immediate layer of your code needs to run a paranoid level of checks and return errors according to its communication contract with the external world. Then, going deeper into the code layers not exposed externally, it still makes more sense to stick to programming by contract.
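A sketch of option 2 in Python (the names are invented for illustration): an exception at the public boundary, an assertion on the internal contract:

class Processor:
    def process(self, foo_bar):
        # Public boundary: violating callers get a clear exception.
        if foo_bar is None:
            raise TypeError("foo_bar cannot be None")
        self._process(foo_bar, self._determine_action())

    def _process(self, foo_bar, action):
        # Internal contract: callers must already have validated.
        # `python -O` strips this assert in production, but it flags
        # broken contracts loudly during testing.
        assert foo_bar is not None, "contract violation: foo_bar is None"
        # ... do something with foo_bar and action

    def _determine_action(self):
        return 0  # placeholder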
At least put a comment that PrivateMethod must have a non-null FooBar and that PublicMethod checks this.
You might want to also mark the "private" method as private or protected.
That depends on whether a null value indicates an error for the method. Remember that methods can also be seen as messages to an object; they operate on the state of the object as well. Parameters can specialize the kind of message sent.
If publicMethod() does not use a parameter and changes the state of the instance while privateMethod() uses the parameter, do not consider it an error in publicMethod, but do in privateMethod().
If publicMethod() does not change state, consider it an error.
You could see the latter case as providing an interface to the internal functioning of an object.
I'd consider the answer to be "yes, do the check again", because:
The private member could be reused again in the future from a different path through the code, so program defensively against that situation.
If you perform unit tests on private methods, those tests call the private method directly, bypassing the public method's checks.
My view might change if I had a static analyser that could pick this up and not flag the potential use of a null reference in the private method.
In cases where PrivateMethod will be called frequently with input that has already been verified, and only rarely with user input, I would use the PublicMethod/PrivateMethod concept with no error checking in PrivateMethod (and with PublicMethod doing nothing other than checking the parameters and calling PrivateMethod).
I would also name the private method something like PublicMethod_impl (for "implementation") so it's clear that it's an internal-use, no-checking method.
I maintain that this design leads to a more robust application, as it forces you to think about what's checked when. Too often, people who always check parameters fall into the trap of "I've checked something, therefore I've checked everything".
As an example of this, a former co-worker (programming in C) would, before using a pointer, always check whether it was null. Generally, the pointers in his code were initialized at startup and never changed, so the chances of one being null were quite low. Moreover, a pointer has one correct value and 65535 possible wrong values, and he was only checking for one of those wrong values.