Understanding what Fault, Error and Failure mean


Consider the following class:
class Xyz {
    public int count;
    public void numZero(int[] x) {
        // Effects: if x == null throw NullPointerException
        // else return the number of occurrences of 0 in x
        int count = 0;
        for (int i = 1; i < x.length; i++) // we have a bug here
        {
            if (x[i] == 0)
            {
                count++;
            }
        }
        this.count = count;
    }
}
I'm trying to wrap my head around what Fault, Error and Failure really mean.
Fault
From what I've come to understand, a Fault in this context would be a flaw in the code's written logic.
So in this case the Fault would be the fact that the code instructs the computer to start iterating over the elements of x at index 1 instead of the expected 0.
Error
When running the above method, we always get an Error except in one instance (when x.length == 0): what we really want is to iterate over all elements of x, but since we start at i = 1, that does not happen.
With an empty array as input, we never enter the for loop, so the incorrect code isn't run, the Error doesn't happen, and everything behaves as it should in theory.
Failure
Since our code has a Fault that at execution time will almost always manifest as an Error, we only have a Failure when we actually see the incorrect output.
Assuming that an Error actually happened in my program, we only have a Failure if it is in some way visible to the outside world. That is, had I declared private int count; instead of public int count; I'd never have a Failure in my class (of course it'd be the most useless class ever!). Is this right?
Is everything I said correct or am I erring in something?
Thanks

Failure: A difference from the expected result. This is the problem you observe.
Fault: The cause of the failure.
Error: The mistake which caused the fault to occur, e.g., a typo.
An example of failure, fault and error.
pre: param is an integer.
post: returns the product of the param multiplied by 2.
1. int double (int param) {
2. int result;
3. result = param * param;
4. return result;
5. }
• A call to double(3) returns 9, but the post condition says it should return 6.
• Result 9 represents a failure.
• The failure is due to the fault at line 3, ( "* param" is used instead of "* 2")
• The error is a typo, ( someone typed "* param" instead of "* 2" by mistake).
Why give three different labels for a "Bug"?
They help communicate how precisely you know what the problem is.
Saying "failure" means you know something is wrong but don't know the cause.
Saying "fault" means you know the cause, but don't know why the fault occurred.
Saying "Error" means you know why the fault occurred; e.g.: The coder was distracted by a firetruck passing by.
You could ask, "But why did the person make a typo?" But that gets into human factors and is out of the scope of the question.
Source: Zhen Ming (Jack) Jiang - EECS 4413, Software Testing, York University.

First, a failure occurs whenever the actual service delivered by a system deviates from its expected service. Note that since even specifications can go wrong, the definition does not rely on them.
Second, an error is the part of the system state that may lead to a failure. The state of the system can go wrong but never reach the output, thus not lead to a failure.
Third, a fault is the cause of an error. It can be a design fault, a cosmic ray or whatever. If, as you point out, the fault is not activated, no error is produced.
Take a look at the basic concepts and terminology of dependability for more information.
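To make the chain concrete with the numZero method from the question, here is a minimal sketch. It assumes the method is rewritten to return the count so the result can be checked directly; the class and method names (FaultErrorFailureDemo, numZeroFaulty, numZeroCorrect) are made up for illustration.

```java
public class FaultErrorFailureDemo {
    // Faulty version from the question: the loop starts at 1 instead of 0 (the fault).
    public static int numZeroFaulty(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }

    // Reference version used only to compute the expected result.
    public static int numZeroCorrect(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Fault activated and element 0 is skipped (erroneous state),
        // but the skipped element is not a zero, so the output is still right: no failure.
        int[] errorOnly = {2, 7, 0};
        // Same fault, same erroneous state, but now the skipped element is a zero,
        // so the wrong state reaches the output: failure.
        int[] failure = {0, 7, 2};

        for (int[] input : new int[][] {errorOnly, failure}) {
            int actual = numZeroFaulty(input);
            int expected = numZeroCorrect(input);
            System.out.println("actual=" + actual + " expected=" + expected
                    + (actual == expected ? " (no failure)" : " (failure)"));
        }
    }
}
```

Both inputs run the same faulty loop; only the second one lets the wrong internal state show up in the returned value.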

An error is a deviation between the actual and the expected result. It represents a mistake made by people.
A fault is the result of an error. It is the incorrect step or process that makes the program or software behave in an unintended manner.
A bug is evidence of a fault in a program, due to which the program does not behave in the intended manner.
A failure is the inability of the system or a component to perform its required function. A failure occurs when a fault is executed.
A defect is said to be detected when a failure occurs.

There are several different definitions; the one I personally prefer is the following:
Fault -> Error -> Failure
Fault: The verified or hypothesized cause of an error (malfunctions, external interference, design errors).
Error: The manifestation of a fault within a program or data structure (difference between actual output and expected output).
Failure: The event that occurs when an error reaches the service interface, altering the service itself (leads to the inability of a system or component to perform required function according to its specification).

The Error in Error/Fault/Failure refers to the human error that introduced the problem. The human error was the incorrect thinking that caused the user to create an incorrect for statement in your example.
Errors are hard to measure or understand. It is difficult in many cases to know what the developer was thinking when they made the error that introduced the fault. That is why people like to differentiate between error and fault. We can see that there is a fault in the code, but it is hard to know why the error was made. It could be that the code was correct, and then during a subsequent change, the for loop was changed.
I always remember that an Error by a programmer leads to a fault in the code that results in a failure for the user. Not all errors result in a fault. Not all faults result in failures.

The software Fault refers to a bug in the code. And it is DURING the software activity.
While software Failure is when the system misbehaves. This is observed LATER than a fault.
Fault may be the cause for a Failure. Fault is "WHAT" and Failure is "WHEN".
Those are only fundamentals, but still I hope that it sheds some light on the matter.

Related

Why do I get “Lexical with name '$x' does not exist in this frame” when using “will leave”?

I have the following Raku code:
class Thing {
    method close {
        say "closed";
    }
};
for 1..1000 {
    my $x will leave { .close } = Thing.new;
}
Running it, I get the error:
Lexical with name '$x' does not exist in this frame
in block <unit> at c.raku line 7
Interestingly, this only happens if the number of iterations is high enough (with 500 or 100, I do not get any error messages).
If I change the body of the loop to
my $x = Thing.new;
LEAVE $x.close;
then everything works without errors as well.
What is happening here? Did I misunderstand the will leave construct? (It seems to me the two versions should be equivalent.)
EDIT: Further observation – when running the code multiple times, the error appears nondeterministically. This suggests that the problem is somehow connected to garbage collection, but I am still confused as to what actually happens here.
I am using Rakudo v2021.03.

This is a bug. I have filed an issue for it: https://github.com/rakudo/rakudo/issues/4403
I suggest using the workaround in the meantime.

Is division by zero considered error or failure?

When my program stops working due to division by 0, is that considered an error or a failure?
An error is when a human action produces an incorrect result. A failure is when we get a different result than expected. I'm just wondering whether the unexpected stop of the program is an error by a human or a failure of the program.

It is considered an error; it can also be called an exception. If you don't want to prevent it up front, you can use exception handling.
Depending on the programming environment and the type of number (e.g. floating point, integer) being divided by zero, it may generate positive or negative infinity, generate an exception, generate an error message, or cause the program to terminate.
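As one concrete instance of "depending on the environment", this sketch shows how Java treats the two cases; other languages and runtimes behave differently. The class and method names (DivideByZeroDemo, describeIntDivision) are hypothetical.

```java
public class DivideByZeroDemo {
    // Integer division by zero throws ArithmeticException in Java,
    // so it can be dealt with through ordinary exception handling.
    public static String describeIntDivision(int a, int b) {
        try {
            return "result " + (a / b);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        // Floating-point division follows IEEE 754: no exception is raised.
        System.out.println(1.0 / 0.0);   // Infinity
        System.out.println(-1.0 / 0.0);  // -Infinity
        System.out.println(0.0 / 0.0);   // NaN

        System.out.println(describeIntDivision(6, 3)); // result 2
        System.out.println(describeIntDivision(1, 0)); // ArithmeticException
    }
}
```

In the terms used above, the unguarded division is the fault in the code; whether the user ever sees a failure depends on whether the exception (or the Infinity/NaN value) is handled before it reaches the program's output.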

It would be a type of failure if it happened at compile time, but it is an error, because this type of error usually occurs when the program has been compiled and is then executed.

Is it ok to execute the entire ensure block if the process is terminated?

Today I've learned that in Pharo the execution of:
[v := 1] ensure: [self halt. v := 2]
will end up setting v = 2, even when we abandon the process at the halt window(!).
I find this debatable. For me, the semantics of #ensure: means that the sequence
self halt. v := 2
must be executed regardless of the circumstances of the receiver block, not regardless of the logic of the argument block. And since the logic of #halt includes the event of terminating the process, I find the obstinate evaluation of the second statement intrusive.
Next I tried the following:
[v := 1] ensure: [1 / 0. v := 2]
When the ZeroDivide exception popped up I closed the debugger, and the value of v was still 2 (same as with #halt).
Finally, I evaluated:
[v := 1] ensure: [n := 1 / 0. v := v + n]
and closed the debugger on the ZeroDivide exception. This time the value of v was 1 but I got no exception from the fact that v + n cannot be evaluated. In other words, the error went on silently.
So my question is: what's the rationale behind this behavior? Shouldn't the process just terminate at the point where it would terminate under "normal" circumstances, i.e., with no #ensure: involved?

Interesting one. It seems that your answer lies in the method BlockClosure>>valueNoContextSwitch, which is called by #ensure:. If you read the comment there, it says that it creates an exact copy of BlockClosure>>value (in a primitive), and the return value of that copy gets returned, not the return value of the original block containing your halt which you terminated. So the copy gets executed (apparently ignoring the copied halt), even if the original doesn't get to finish.
My guess is that this is intended to ensure (no pun intended) that the ensure: block always runs, but has the unintended side effect of ignoring the termination of the original block. I agree with you that this is not only counter-intuitive, but also probably not what was intended.

I guess this is behavior which is not fully defined by any (ANSI) standard, but correct me, if I am wrong.
Other Smalltalks seem to behave differently. I tried it in Smalltalk/X, where the Debugger offers 3 options: "Continue" (i.e. proceed), "Abort" (i.e. unwind) and "Terminate" (i.e. kill the process). I guess "Terminate" corresponds to what Squeak does when you close the debugger.
With "Abort" and "Terminate", the rest of the ensure block is NOT executed, with "Continue" it is. I guess that is ok, and what you would expect.
On Abort and Terminate (which are both unwinds to corresponding exception handlers), it should not try to reevaluate or proceed with the potentially wrong/bad/failing ensure block.
It is the choice of the handler (which the Debugger basically is) whether it wants to proceed or not. If not, then it should get out of the ensure block and continue to execute any other ensure blocks which may be above it in the calling chain.
This is consistent with the behavior of exception handling blocks, which are also not reevaluated or proceeded if the same exception is raised within them. In ST/X, there is explicit code in the exception classes which takes care of this situation, so it is definitely on purpose and not a side effect.
My guess is that this is wrong in Squeak and the Squeak developers should be told.

Why ZeroDivide is resumable?

In Pharo (and other dialects) the ZeroDivide exception is resumable. Why? For instance, if you evaluate 1 / 0 and then proceed, the answer is the ZeroDivide error. Why is this? Shouldn't ZeroDivide be non-resumable?
EDIT:
Let's review this issue in more depth.
The problem here is that if the exception happens, what we get is the ZeroDivide exception. So, the only reason I can think of for making this exception resumable is to enable the following:
[a / b] on: ZeroDivide do: [:ex | ex resume: self anythingButTheQuotient],
right?
But this could also have been written
[a / b] on: ZeroDivide do: [self anythingButTheQuotient]
without requiring the exception to be resumable.
Resumable exceptions make sense if there is an "interesting" #defaultAction. But this seems not to be the case with ZeroDivide.
One could be tempted to say that in many cases one has this kind of code:
b = 0 ifTrue: [^0] ifFalse: [^a / b]
so why not use 0 as the #defaultAction? That would make the above code simpler (in those cases) and would only require a special handler in the (arguably) few that must behave differently. However, this would be a really bad decision because the default behavior would hide errors and, as we all know, the later they manifest, the worse.

Yes, this is surprising at first sight, but the ANSI standard says:
Zero divide exceptions are resumable so any message in this protocol
that signal such an exception may ultimately return to their sender.
The example you gave was trivial, but when installing the handler a few methods above, it's less trivial to resume the exception where it was signalled.
[self doSomethingComplex]
    on: ZeroDivide
    do:
        [:exception |
        "Handle zero divide as inf/nan as if performed in floating point arithmetic"
        exception resume: exception dividend asFloat / 0.0]
In Squeak or Pharo, see the references to signalContext instance variable in class Exception. You'll see that resuming is your sole option for returning control to the signaller.

How to handle GSM buffer on the Microcontroller?

I have a GSM module hooked up to a PIC18F87J11 and they communicate just fine. I can send an AT command from the microcontroller and read the response back. However, I have to know how many characters are in the response so I can have the PIC wait for that many characters. But if an error occurs, the response length might change. What is the best way to handle such a scenario?
For Example:
AT+CMGF=1
Will result in the following response.
\r\nOK\r\n
So I have to tell the PIC to wait for 6 characters. However, if the response was an error message, it would be something like this.
\r\nERROR\r\n
And if I already told the PIC to wait for only 6 characters, then it will miss the rest of the characters; as a result, they might appear the next time I tell the PIC to read the response to a new AT command.
What is the best way to find the end of the line automatically and handle any error messages?
Thanks!

In a single line
There is no single best way, only trade-offs.
In detail
The problem can be divided in two related subproblems.
1. Receiving messages of arbitrary finite length
The trade-offs:
available memory vs implementation complexity;
bandwidth overhead vs implementation complexity.
In the simplest case, the amount of available RAM is not restricted. We just use a buffer wide enough to hold the longest possible message and keep receiving the messages bytewise. Then, we have to determine somehow that a complete message has been received and can be passed to further processing. That essentially means analyzing the received data.
2. Parsing the received messages
Analyzing the data in search of its syntactic structure is parsing by definition. And that is where the subtasks are related. Parsing in general is a very complex topic, and dealing with it is expensive, both computationally and in development effort. It's often possible to reduce the costs if we limit the genericity of the data: the simpler the data structure, the easier it is to parse. And that limitation is called "transport layer protocol".
Thus, we have to read the data to parse it, and parse the data to read it. This kind of interlocked problem is generally solved with coroutines.
In your case we have to deal with the AT protocol. It is old and it is human-oriented by design. That's bad news, because parsing it correctly can be challenging despite how simple it can look sometimes. It has some terribly inconvenient features, such as '+++' escape timing!
Things become worse when you're short of memory. In such situation we can't defer parsing until the end of the message, because it very well might not even fit in the available RAM -- we have to parse it chunkwise.
...And we are not even close to opening the TCP connections or making calls! And you'll meet some unexpected troubles there as well, such as these dreaded "unsolicited result codes". The matter is wide enough for a whole book. Please have a look at least here:
http://en.wikibooks.org/wiki/Serial_Programming/Modems_and_AT_Commands. The wikibook discloses many more problems with the Hayes protocol, and describes some approaches to solve them.

Let's break the problem down into some layers of abstraction.
At the top layer is your application. The application layer deals with the response message as a whole and understands the meaning of a message. It shouldn't be mired down with details such as how many characters it should expect to receive.
The next layer is responsible for framing a message from a stream of characters. Framing is extracting the message from a stream by identifying the beginning and end of a message.
The bottom layer is responsible for reading individual characters from the port.
Your application could call a function such as GetResponse(), which implements the framing layer. And GetResponse() could call GetChar(), which implements the bottom layer. It sounds like you've got the bottom layer under control and your question is about the framing layer.
A good pattern for framing a stream of characters into a message is to use a state machine. In your case the state machine includes states such as BEGIN_DELIM, MESSAGE_BODY, and END_DELIM. For more complex serial protocols other states might include MESSAGE_HEADER and MESSAGE_CHECKSUM, for example.
Here is some very basic code to give you an idea of how to implement the state machine in GetResponse(). You should add various types of error checking to prevent a buffer overflow and to handle dropped characters and such.
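#include <stdbool.h>

/* Framing states; the names come from the description above. */
enum { BEGIN_DELIM1, BEGIN_DELIM2, MESSAGE_BODY, END_DELIM };

/* Bottom layer, assumed to exist: blocks until one character arrives. */
char GetChar(void);

void GetResponse(char *message_buffer)
{
    unsigned int state = BEGIN_DELIM1;
    bool is_message_complete = false;
    while (!is_message_complete)
    {
        char c = GetChar();
        switch (state)
        {
        case BEGIN_DELIM1:
            if (c == '\r')          /* comparison, not assignment */
                state = BEGIN_DELIM2;
            break;
        case BEGIN_DELIM2:
            if (c == '\n')
                state = MESSAGE_BODY;
            break;
        case MESSAGE_BODY:
            if (c == '\r')
                state = END_DELIM;
            else
                *message_buffer++ = c;  /* no bounds check yet, see the note above */
            break;
        case END_DELIM:
            if (c == '\n')
                is_message_complete = true;
            break;
        }
    }
    *message_buffer = '\0';  /* terminate so the caller can treat the body as a string */
}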
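The same framing idea can also be written as a character-at-a-time feeder with a bounded buffer, which addresses the overflow concern mentioned above and is convenient to call from a receive loop. This is only a sketch of the technique on a host machine, not PIC code: the class name ResponseFramer, the feed method and the capacity parameter are all hypothetical, and on the microcontroller the same transitions would live in the C state machine.

```java
public class ResponseFramer {
    private enum State { BEGIN_CR, BEGIN_LF, BODY, END_LF }

    private final char[] buffer;
    private int length = 0;
    private State state = State.BEGIN_CR;

    public ResponseFramer(int capacity) {
        buffer = new char[capacity];
    }

    /** Feeds one received character; returns the message body when a frame completes, else null. */
    public String feed(char c) {
        switch (state) {
            case BEGIN_CR:
                if (c == '\r') state = State.BEGIN_LF;
                break;
            case BEGIN_LF:
                if (c == '\n') state = State.BODY;
                else if (c != '\r') state = State.BEGIN_CR; // not a delimiter: resynchronize
                break;
            case BODY:
                if (c == '\r') state = State.END_LF;
                else if (length < buffer.length) buffer[length++] = c; // drop excess, never overrun
                break;
            case END_LF:
                if (c == '\n') {
                    String message = new String(buffer, 0, length);
                    length = 0;
                    state = State.BEGIN_CR;
                    return message;
                }
                break;
        }
        return null;
    }

    public static void main(String[] args) {
        ResponseFramer framer = new ResponseFramer(32);
        // Two responses of different lengths arrive back to back on the same stream.
        for (char c : "\r\nOK\r\n\r\nERROR\r\n".toCharArray()) {
            String message = framer.feed(c);
            if (message != null) {
                System.out.println(message.equals("OK") ? "command accepted" : "modem said: " + message);
            }
        }
    }
}
```

Because the framer decides where a message ends, the application never has to know in advance whether 6 or 9 characters are coming; it just compares the returned body with "OK" or "ERROR".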
