The PSP0 Process and modern tools and programming languages [closed]

The Personal Software Process (PSP) is designed to allow software engineers to understand and improve their performance. The PSP uses scripts to guide a practitioner through the process. Each script defines the purpose, the entry criteria, the steps to perform, and the exit criteria. PSP0 is designed to be a framework that allows for starting a personal process.
One of the scripts used in PSP0 is the Development Script, which guides development. This script is entered once you have a requirements statement, a project plan summary, time and defect recording logs, and an established defect type standard. The activities of the script are design, code, compile, and test, and it is exited when you have a thoroughly tested application and complete time and defect logs.
In the Design phase, you review the requirements and produce a design, recording any requirements defects and your time. In the Code phase, you implement the design, again recording defects and time. In the Compile phase, you compile, fix any compile-time errors, and repeat until the program compiles, recording any defects and time. Finally, in the Test phase, you test until all tests run without error and all defects are fixed, while recording time and defects.
My concerns are with how to manage the code, compile, and test phases when using modern programming languages (especially interpreted languages like Python, Perl, and Ruby) and IDEs.
My questions:
In interpreted languages, there is no compile time. However, there might be problems in execution. Is executing the script, outside of the unit (and other) tests, considered "compile" or "test" time? Should errors with execution be considered "compile" or "test" errors when tracking defects?
If a test case encounters a syntax error, is that considered a code defect, a compile defect, or a test defect? The test actually found the error, but it is a code problem.
If an IDE identifies an error that would prevent compilation before actually compiling, should that be identified? If so, should it be identified and tracked as a compile error or a code error?
It seems like the PSP, at least the PSP0 baseline process, is designed to be used with a compiled language and small applications written using a text editor (and not an IDE). In addition to my questions, I would appreciate the advice and commentary of anyone who is using or has used the PSP.

In general, as the PSP is a personal improvement process, the answers to your actual questions do not matter as long as you pick one answer and apply it consistently. That way you will be able to measure the times you take in each defined phase, which is what PSP is after. If your team is collectively using the PSP then you should all agree on which scripts to use and how to answer your questions.
My takes on the actual questions are (not that they are relevant):
In interpreted languages, there is no compile time. However, there might be problems in execution. Is executing the script, outside of the unit (and other) tests, considered "compile" or "test" time? Should errors with execution be considered "compile" or "test" errors when tracking defects?
To me, test time is only the time when the actual tests run and nothing else. In this case, I would record both the errors and the execution time under 'compile' time: time spent generating and running the code.
If a test case encounters a syntax error, is that considered a code defect, a compile defect, or a test defect? The test actually found the error, but it is a code problem.
Syntax errors are code defects.
If an IDE identifies an error that would prevent compilation before actually compiling, should that be identified? If so, should it be identified and tracked as a compile error or a code error?
If the IDE is part of your toolchain, then its flagging an error is just like you spotting the error yourself, so these are code errors. If you don't use the IDE regularly, then I'd count them as compile errors.

I've used PSP for years. As others have said, it is a personal process, and you will need to evolve PSP0 to improve your development process. Nonetheless, our team (all PSP-trained) grappled with these issues on several fronts. Let me give you an idea of the components involved, and then I'll say how we managed.
We had a PowerBuilder "tier"; the PowerBuilder IDE prevents you from even saving your code until it compiles and links correctly. Part of the system used JSP, though the quantity of Java was minor and boilerplate, so in practice we didn't count it at all. A large portion of the system was in JS/JavaScript; this was done before the wonderful Ajax libraries came along, and it represented a large portion of the work. The other large portion was Oracle PL/SQL, which has a somewhat more traditional compile phase.
When working in PowerBuilder, the compile (and link) phase started when the developer saved the object. If the save succeeded, we recorded a compile time of 0. Otherwise, we recorded the time it took for us to fix the error(s) that caused the compile-time defect. Most often, these were defects injected in coding, removed in compile phase.
That compile/link-on-save behavior of the PowerBuilder IDE forced us to move the code review phase to after compiling. Initially, this caused us some distress, because we weren't sure how, or if, such a change would affect the meaning of the data. In practice, it became a non-issue. In fact, many of us also moved our Oracle PL/SQL code reviews to after the compile phase, because we found that when reviewing the code, we would often gloss over syntax errors that the compiler would report anyway.
There is nothing wrong with a compile time of 0, any more than there is anything wrong with a test time of 0 (meaning your unit test passed without detecting errors, and ran significantly quicker than your unit of measure). If those times are zero, then you don't remove any defects in those phases, and you won't encounter a div/0 problem. You could also record a nominal minimum of 1 minute, if that makes you more comfortable, or if your measures require a non-zero value.
Your second question is independent of the development environment. When you encounter a defect, you record which phase you injected it in (typically design or code) and the phase you removed it (typically design/code review, compile or test). That gives you the measure called "leverage", which indicates the relative effectiveness of removing a defect in a particular phase (and supports the "common knowledge" that removing defects sooner is more effective than removing them later in the process). The phase the defect was injected in is its type, i.e., a design or coding defect. The phase the defect is removed in doesn't affect its type.
Similarly, with JS/JavaScript, the compile time is effectively immeasurable. We didn't record any times for compile phase, but then again, we didn't remove any defects in that phase. The bulk of the JS/JavaScript defects were injected in design/coding and removed in design review, code review, or test.
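To make the injected/removed bookkeeping concrete, here is a minimal sketch of a defect log in Python. The phase names, fields, and sample entries are my own illustrative assumptions, not the official PSP0 forms.

from dataclasses import dataclass
from collections import Counter

@dataclass
class Defect:
    description: str
    injected: str    # phase the defect was introduced in, e.g. "design" or "code"
    removed: str     # phase it was found and fixed in, e.g. "code review" or "test"
    fix_minutes: int # time spent removing it

log = [
    Defect("off-by-one in loop bound", injected="code", removed="compile", fix_minutes=3),
    Defect("missing requirements check", injected="design", removed="test", fix_minutes=25),
]

# Tallying removals (and fix effort) per phase is the raw material for the
# "leverage" comparison described above.
print(Counter(d.removed for d in log))
print({phase: sum(d.fix_minutes for d in log if d.removed == phase)
       for phase in {d.removed for d in log}})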

It sounds, basically, like your formal process doesn't match your practice process. Step back, re-evaluate what you're doing and whether you should choose a different formal approach (if in fact you need a formal approach to begin with).

In interpreted languages, there is no compile time. However, there might be problems in execution. Is executing the script, outside of the unit (and other) tests, considered "compile" or "test" time? Should errors with execution be considered "compile" or "test" errors when tracking defects?
The errors should be categorized according to when they were created, not when you found them.
If a test case encounters a syntax error, is that considered a code defect, a compile defect, or a test defect? The test actually found the error, but it is a code problem.
Same as above: always go back to the earliest point in time. If the syntax error was introduced while coding, it belongs to the coding phase; if it was introduced while fixing a defect, it belongs to the phase in which that fix was made.
If an IDE identifies an error that would prevent compilation before actually compiling, should that be identified? If so, should it be identified and tracked as a compile error or a code error?
I believe that should not be recorded separately; it's just time spent on writing the code.
As a side note, I've used the Process Dashboard tool to track PSP data and found it quite nice. It's free and Java-based, so it should run anywhere. You can get it here:
http://processdash.sourceforge.net/

After reading the replies by Mike Burton, Vinko Vrsalovic, and JRL and re-reading the appropriate chapters in PSP: A Self-Improvement Process for Software Engineers, I've come up with my own takes on these problems. Helpfully, I also found a section in the book that I had originally missed because two pages were stuck together.
In interpreted languages, there is no compile time. However, there might be problems in execution. Is executing the script, outside of the unit (and other) tests, considered "compile" or "test" time? Should errors with execution be considered "compile" or "test" errors when tracking defects?
The book says that "if you are using a development environment that does not compile, then you should merely skip the compile step." However, it also says that if you have a build step, "you can record the build time and any build errors under the compile phase".
This means that for interpreted languages, you will either remove the compile phase from the tracking or replace compilation with your build scripts. Because the PSP0 is generally used with small applications (similar to what you would expect in a university lab), I would expect that you would not have a build process and would simply omit the step.
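If you do keep a build step and count it as the compile phase, a rough sketch of recording its time might look like the following. The CSV layout and the byte-compile "build" command are my own assumptions, not part of PSP0.

import csv, subprocess, time
from datetime import datetime

def run_build(cmd=("python", "-m", "compileall", "src")):
    """Run the 'build' (here, byte-compiling a Python tree) and log its time under 'compile'."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    minutes = (time.monotonic() - start) / 60
    with open("time_log.csv", "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(timespec="minutes"),
                                "compile", round(minutes, 2), result.returncode])
    return result.returncode

if __name__ == "__main__":
    run_build()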
If a test case encounters a syntax error, is that considered a code defect, a compile defect, or a test defect? The test actually found the error, but it is a code problem.
I would record errors where they are located.
For example, if a test case has a defect, that would be a test defect. If the test ran, and an error was found in the application being tested, that would be a code or design defect, depending on where the problem actually originated.
If an IDE identifies an error that would prevent compilation before actually compiling, should that be identified? If so, should it be identified and tracked as a compile error or a code error?
If the IDE identifies a syntax error, that is the same as you spotting the error yourself before execution. If you use an IDE properly, there is little excuse for letting through defects that would break execution (as opposed to logic/implementation errors).

Related

Is testing for specific text on page render a code smell?

Occasionally, I'll find tests written by others that rely on specific text being on a page (e.g. a success message, an empty warning, etc.).
I find these distasteful and usually replace them with either a test for a specific selector (e.g. #success-message or .error) or an I18n value (e.g. I18n.t('foobar.success') or I18n.t('form.error.missing_error')).
The latter seems more future-proof, since if the copy changes my tests won't fail. However, some have argued that if you accidentally change the message, then it won't be caught as a failure.
Is there a standard practice when utilizing these sorts of things that I'm not aware of?
I do both in my tests: check selectors and check copy. It's good if tests fail due to copy changes. It's a bit more maintenance, but the people who change copy should also be running tests, or you should have a continuous integration setup that notifies you immediately if tests fail.
It gets a bit more complicated if you start running A/B or multivariate tests, but it isn't insurmountable... Like I said, just more maintenance.
IMO, the tradeoff of maintenance vs confidence in code coverage is well worth it.
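For example, a test that checks both the selector and the copy might look roughly like this. This is only a sketch, assuming Selenium's Python bindings, an already-open WebDriver on the right page, and a stand-in translations dict; the selector and translation key are hypothetical.

from selenium.webdriver.common.by import By

TRANSLATIONS = {"foobar.success": "Your changes were saved."}  # stand-in for the app's I18n catalogue

def assert_success_message(driver):
    element = driver.find_element(By.CSS_SELECTOR, "#success-message")  # selector check
    assert element.is_displayed()
    assert element.text == TRANSLATIONS["foobar.success"]               # copy check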

PC Lint for incremental build - with error from latest code

I have a CI setup with incremental builds. As part of the static checking, I am planning to configure an incremental PC-lint report: ignoring everything in previous Lint reports, it should list only the errors introduced by the new code. Is there any tool which would do this?
Any hint on the relevant area to explore would help us.
I tried diffing the reports, but since the line numbers vary from the last check-in, that does not isolate the errors actually introduced.
I am using Linux for my project build, and using Windows for PC Lint report generation.
Wouldn't it be easier to just fix all the reported errors and have a strict policy against creating new ones? That way you don't need to worry about diffs, which, by the nature of the problem, are going to be hard or impossible to get right.
You could write a script that takes the warnings from lint, removes the line numbers, and adds a few lines of source code from around where each warning occurs. Diffing this would show all new lint warnings. One flaw in this is that it would also show any warnings where source was modified near an existing warning without fixing the warning. On the other hand, that might actually be useful.
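A rough Python sketch of that idea follows. It assumes PC-lint has been configured to emit messages in a "file(line): message" format; the format, file names, and context window are assumptions to adapt to your setup.

import re
import sys

WARNING_RE = re.compile(r"^(?P<file>[^(]+)\((?P<line>\d+)\):\s*(?P<msg>.*)$")

def normalize(report_path):
    """Turn a lint report into line-number-free keys that carry a little source context."""
    keys = set()
    for raw in open(report_path):
        m = WARNING_RE.match(raw.strip())
        if not m:
            continue
        path, line, msg = m["file"], int(m["line"]), m["msg"]
        try:
            src = open(path).read().splitlines()
            context = " | ".join(s.strip() for s in src[max(0, line - 2):line + 1])
        except OSError:
            context = ""
        keys.add(f"{path} :: {msg} :: {context}")
    return keys

if __name__ == "__main__":
    old_report, new_report = normalize(sys.argv[1]), normalize(sys.argv[2])
    for introduced in sorted(new_report - old_report):  # warnings present only in the new report
        print(introduced)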
Years ago, I saw a utility on BSD Unix that would take your compiler errors and stuff them into your source code as comments, which might be useful for this exercise. Unfortunately, I can't remember what it was called.

Portland group FORTRAN pgf90 program fails when compiled with -fast, succeeds with -fast -Mnounroll

This code hummed along merrily for a long time, until we recently discovered an edge case where it fails silently, with no errors returned.
The failure is apparently pretty subtle. We can get the code to run uneventfully in the edge case by:
1) compiling with any set of options that includes -traceback or debug (-g or -gopt);
2) compiling with -fast -Mnounroll;
3) compiling with optimization <2;
4) adding WRITE statements into the code to determine the location of the fail;
In other words, most of the tools useful for debugging the failure actually make the failure disappear.
I am probing for any information on failures related to loop unrolling or other optimization, and their resolution.
Thank you all in advance.
I'm not familiar with pgf (heck, it's been 10 years since I used any Fortran), but here are some general suggestions for tracking down (potential) compiler bugs:
Simplify a reproducible case. I.e. try to reproduce the problem with a similar looking piece of code that has all the superfluous details removed. This is helpful because a) you'll be less hesitant to release the code publicly, and b) if someone attempts to diagnose the problem, it will be easier for them with less surrounding material.
Talk to the experts: If you have a support contract for pgf, use it! There's a support request form on their site. If not, there's a User Forums section where you might be able to post your information - someone else may have better workaround, or an employee there may be able to log your problem.
Double-check your code. Is it possible that you're relying on some sort of unspecified behavior? This is the sort of thing that would cause your program to change behavior when changing optimization levels. I'm not saying compiler bugs are impossible, but the problem could be in your code too.
Hope that's helpful.

Software testing advice?

Where I am working, we have the following issue:
Our current test procedure is that our business analysts test the release based on their specifications/tests. If it passes those tests, it is given to the quality department, where they test the new release and the entire system to check whether something else was broken.
Just to mention, we outsource our development. Unfortunately, the release given to us is rarely tested by the developers, and that's "the relationship" we have had with them for the last 7 years...
As a result, if the patch/release fails the tests at the functional testing level or at the quality level, then with each patch given we need to test the whole thing again, not just the release.
Is there a way we can prevent this from happening?
You have two options:
Separate the code into independent modules so that a patch/change in one module only means you have to re-test that one module. However, due to dependencies this is effective only to a very limited degree.
Introduce automated tests so that re-testing is not as expensive. It takes some more work at first, but will definitely pay off in your scenario. You don't have to do unit tests or TDD - integration tests based on capture-replay tools are often easier to introduce in your scenario (an established project with a manual testing process).
Implement a continuous testing framework that you and the developers can access. Something like CruiseControl.NET and NUnit to automate the functional tests.
Given access, they'll be able to see nightly tests on the build. Heck, they don't even need to test it themselves: your tests will run every night (or regularly), and they'll know straight away what faults they've caused, or fixed, if any.
Define a 'Quality SLA' - namely that all unit tests must pass, all new code must have a certain level of coverage, all new code must have a certain score in some static analysis checker.
Of course anything like this can be gamed, so have regular post release debriefs where you discuss areas of concern and put in place contingency to avoid it in future.
Implement a Go server with its dashboard, and use the Go agent GUI at your end.
http://www.thoughtworks-studios.com/forms/form/go/download

How would one go about testing an interpreter or a compiler?

I've been experimenting with creating an interpreter for Brainfuck, and while it is quite simple to make and get up and running, part of me wants to be able to run tests against it. I can't seem to fathom how many tests one might have to write to cover all the possible instruction combinations and ensure that the implementation is proper.
Obviously, with Brainfuck, the instruction set is small, but I can't help but think that as more instructions are added, your test code would grow exponentially. More so than your typical tests at any rate.
Now, I'm about as newbie as you can get in terms of writing compilers and interpreters, so my assumptions could very well be way off base.
Basically, where do you even begin with testing on something like this?
Testing a compiler is a little different from testing some other kinds of apps, because it's OK for the compiler to produce different assembly-code versions of a program as long as they all do the right thing. However, if you're just testing an interpreter, it's pretty much the same as any other text-based application. Here is a Unix-centric view:
You will want to build up a regression test suite. Each test should have
Source code you will interpret, say test001.bf
Standard input to the program you will interpret, say test001.0
What you expect the interpreter to produce on standard output, say test001.1
What you expect the interpreter to produce on standard error, say test001.2 (you care about standard error because you want to test your interpreter's error messages)
You will need a "run test" script that does something like the following
function fail {
  echo "Unexpected differences on $1:"
  diff "$2" "$3"
  exit 1
}

for testname                      # loops over the test names passed as arguments
do
  tmp1=$(tempfile)
  tmp2=$(tempfile)
  brainfuck "$testname.bf" < "$testname.0" > "$tmp1" 2> "$tmp2"
  cmp -s "$testname.1" "$tmp1" || fail "stdout" "$testname.1" "$tmp1"
  cmp -s "$testname.2" "$tmp2" || fail "stderr" "$testname.2" "$tmp2"
done
You will find it helpful to have a "create test" script that does something like
brainfuck $testname.bf < $testname.0 > $testname.1 2> $testname.2
You run this only when you're totally confident that the interpreter works for that case.
You keep your test suite under source control.
It's convenient to embellish your test script so you can leave out files that are expected to be empty.
Any time anything changes, you re-run all the tests. You probably also re-run them all nightly via a cron job.
Finally, you want to add enough tests to get good test coverage of your compiler's source code. The quality of coverage tools varies widely, but GNU Gcov is an adequate coverage tool.
Good luck with your interpreter! If you want to see a lovingly crafted but not very well documented testing infrastructure, go look at the test2 directory for the Quick C-- compiler.
I don't think there's anything 'special' about testing a compiler; in a sense it's almost easier than testing some programs, since a compiler has such a basic high-level summary - you hand in source, it gives you back (possibly) compiled code and (possibly) a set of diagnostic messages.
Like any complex software entity, there will be many code paths, but since it's all very data-oriented (text in, text and bytes out) it's straightforward to author tests.
I’ve written an article on compiler testing, the original conclusion of which (slightly toned down for publication) was: It’s morally wrong to reinvent the wheel. Unless you already know all about the preexisting solutions and have a very good reason for ignoring them, you should start by looking at the tools that already exist. The easiest place to start is Gnu C Torture, but bear in mind that it’s based on Deja Gnu, which has, shall we say, issues. (It took me six attempts even to get the maintainer to allow a critical bug report about the Hello World example onto the mailing list.)
I’ll immodestly suggest that you look at the following as a starting place for tools to investigate:
Software: Practice and Experience, April 2007. (Payware, not available to the general public; a free preprint is at http://pobox.com/~flash/Practical_Testing_of_C99.pdf.)
http://en.wikipedia.org/wiki/Compiler_correctness#Testing (Largely written by me.)
Compiler testing bibliography (Please let me know of any updates I’ve missed.)
In the case of brainfuck, I think testing it should be done with brainfuck scripts. I would test the following, though:
1: Are all the cells initialized to 0?
2: What happens when you decrement the data pointer when it's currently pointing to the first cell? Does it wrap? Does it point to invalid memory?
3: What happens when you increment the data pointer when it's pointing at the last cell? Does it wrap? Does it point to invalid memory?
4: Does output function correctly?
5: Does input function correctly?
6: Does the [ ] stuff work correctly?
7: What happens when you increment a byte more than 255 times? Does it wrap to 0 properly, or is it incorrectly treated as an integer or other value?
More tests are possible too, but this is probably where I'd start. I wrote a BF compiler a few years ago, and it had a few extra tests. In particular, I tested the [ ] handling heavily, by putting a lot of code inside the block, since an early version of my code generator had issues there (on x86, using a jxx, I had issues when the block produced more than 128 bytes or so of code, resulting in invalid x86 asm).
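To make a few of the checks above concrete, here is a sketch in Python written against a hypothetical run_bf(source: bytes, stdin: bytes = b"") -> bytes entry point; adapt the name and signature to whatever your interpreter exposes.

def test_cells_start_at_zero():
    # "." on a fresh cell should emit byte 0
    assert run_bf(b".") == b"\x00"

def test_byte_wraps_at_256():
    # 256 increments should wrap back to 0, if your dialect uses 8-bit cells
    assert run_bf(b"+" * 256 + b".") == b"\x00"

def test_io_round_trip():
    # "," reads one byte from input, "." writes it back out
    assert run_bf(b",.", stdin=b"A") == b"A"

def test_loop_skips_when_cell_is_zero():
    # "[" on a zero cell should jump past the matching "]", producing no output
    assert run_bf(b"[+.]") == b""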
You can also test with some already-written Brainfuck programs.
The secret is to:
Separate the concerns
Observe the law of Demeter
Inject your dependencies
Well, software that is hard to test is a sign that the developer wrote it like it's 1985. Sorry to say that, but by applying the three principles I presented here, even line-numbered BASIC would be unit-testable (it IS possible to inject dependencies into BASIC, because you can do "goto variable").