Boundary Value Analysis: why use two values inside the boundary? - testing

I can't understand why you should use two values inside the boundary when using Boundary Value Analysis.
For instance, a program has the requirement: 1) values between 1 and 100 are true, otherwise false.
def calc(x):
    if x >= 1 and x <= 100:
        return True
    else:
        return False
A lot of books (Pressman, for instance) say you have to use the inputs 0, 1, 2, 99, 100 and 101 to test such a program.
So, my question is: why use the inputs '2' and '99'?
I tried to write a program with a fault that the test case set (0, 1, 2, 99, 100, 101) exposes but the test case set (0, 1, 100, 101) does not.
I couldn't write such a program. Could you?
If not, creating the redundant test cases '2' and '99' is a waste of resources.

The basic requirement is to test the boundary values and their ±1 neighbours. So, to test values for the range 1-100:
One test case for the exact boundary values of the input domain: 1 and 100.
One test case for the value just below each boundary: 0 and 99.
One test case for the value just above each boundary: 2 and 101.
To answer your question - why use the inputs '2' and '99'? Because if you are following BVA, you check both limits of the range (upper as well as lower), on both sides of each boundary, to ensure that the software behaves correctly. However, there are no hard and fast rules. If the range is big enough, you should use more test points. You can also test the middle values as part of BVA.
Also, you can write such a program with switch/case statements or with multiple ifs.
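As a quick illustration (my sketch, not part of the original answer), here is what the six BVA test cases look like when run against the calc function above; the expected results follow directly from the stated requirement:
# BVA test values for the requirement "1..100 is valid":
# just below, on, and just above each boundary.
test_cases = {0: False, 1: True, 2: True, 99: True, 100: True, 101: False}

def calc(x):
    # mirrors the function from the question
    return 1 <= x <= 100

for value, expected in test_cases.items():
    assert calc(value) == expected, f"calc({value}) should be {expected}"
print("all BVA test cases passed")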

Related

Valid equivalent partitions in a range from -100 to 100?

I can't figure out this question.
A program which accepts an integer in the range -100 to +100:
1) How many valid equivalent partitions are there?
2) For which range what are minimum and maximum values?
3) Using BVA, what values need to be checked for the partitions?
So, according to equivalence testing, you have valid and invalid values. I suppose the invalid values would be anything less than -100 or greater than 100. However, I can't find information on how to derive the equivalence partitions.
I mean, I could choose to say it has 20 equivalence partitions, for example -100 to -90 | -89 to 70, etc., but is there a method for deriving this?
For the other questions: is it possible to take the previous partition, so that the minimum value would be -100 and the maximum -90?
Here's an example of how basic EPA & BVA apply to your data.
So, practically, in your case you'll have 3 values from equivalence partitioning and 4 values from boundary value analysis.
Don't forget to pay attention to 0. It's always a tricky point.
Good luck!
For boundary value analysis, the main focus should be the corner cases.
So for the above range, the values to check with BVA are:
-101, -100, 100, 101 (this is as per ISTQB)
I assume that this is a <-100; 100> range, so -100 and 100 are valid.
1) How many valid equivalent partitions are there?
Only one, containing any number from the given range.
2) For which range what are minimum and maximum values?
The minimum is -100 and the maximum is 100
3) Using BVA, what values need to be checked for the partitions?
Using BVA you have 6 values to check: -101, -100, -99, 99, 100 and 101 (the minimum, the maximum, and the value just inside each boundary). Checking around each boundary as well as on it catches off-by-one mistakes: for example, if the programmer wrote x > -100 instead of x >= -100, testing -100 exposes the bug, and testing -99 confirms that values just inside the range are still accepted.
1) How many valid equivalent partitions are there?
Theoretically, for your range of -100 to 100 there would be three equivalence class partitions:
1) One partition with values below -100, i.e. -101, -102, etc. This is an invalid values class.
2) A second partition with values between -100 and 100 (including -100 and 100). This is the valid values class.
3) A third partition with values greater than 100, i.e. 101, 102, etc. This is an invalid values class.
Now you can choose one value from each partition. For example,
1) You can choose -118 from the first class (invalid class partition).
2) You can choose 70 from the second class (valid class partition).
3) You can choose 170 from the third class (invalid class partition).
But in my view, if you want to check more values, you can create further partitions within the -100 to 100 class. For example, you can divide it into -100 to -51, -50 to 0, 1 to 50, and 51 to 100, and then choose one value from each of these partitions.
The main purpose of ECP is to reduce the number of test cases (test values), so if you have enough time you can choose more than one value from each class, or you can make more classes and choose values from them.
2) For which range what are minimum and maximum values?
1) For the first class the minimum value cannot be defined; the maximum value is -101.
2) For the second class the minimum value is -100 and the maximum value is 100.
3) For the third class the minimum value is 101 and the maximum value cannot be defined.
3) Using BVA, what values need to be checked for the partitions?
For BVA the following values need to be checked:
1) The value immediately below the minimum value, i.e. -101.
2) The minimum value, i.e. -100.
3) The value immediately above the minimum value of the range, i.e. -99.
4) The value immediately below the maximum value of the range, i.e. 99.
5) The maximum value of the range, i.e. 100.
6) The value immediately above the maximum value of the range, i.e. 101.
A program which accepts an integer in the range -100 to +100:
In equivalence partitioning:
Class I: values < -100 => invalid class
Class II: -100 to +100 => valid class
Class III: values > +100 => invalid class
In BVA:
1) Test cases with test data exactly on the boundaries of the input domain, i.e. the values -100 and +100 in our case.
2) Test data with values just below the extreme edges of the input domain, i.e. the values -101 and +99.
3) Test data with values just above the extreme edges of the input domain, i.e. the values -99 and +101.
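To tie the answers above together, here is a minimal sketch (my illustration, not taken from any of the answers) that checks the EP representatives and the BVA values against a hypothetical implementation of the -100 to +100 acceptance check:
def accepts(x):
    # hypothetical implementation of "accepts an integer in the range -100 to +100"
    return -100 <= x <= 100

# one representative per equivalence class (values from the answer above)
ep_values = {-118: False, 70: True, 170: False}

# boundary value analysis: the boundaries plus their immediate neighbours
bva_values = {-101: False, -100: True, -99: True, 99: True, 100: True, 101: False}

for value, expected in {**ep_values, **bva_values}.items():
    assert accepts(value) == expected, f"accepts({value}) should be {expected}"
print("all EP and BVA checks passed")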

How to accept ONLY numeric characters as input in IJVM?

I'll do a "SIMPLE program" on IJVM, but it asks:
You must get on input ONLY numeric characters ( 0x30 to 0x39).
So if I'll insert for example (A or b or g etc.. ) it will stop with "HALT".
How can I make a condition that take the value from 0x30 to 0x39 without alphabetic characters?
You will need two separate tests.
First, test if the input is not less than 0x30.
Second, test that the input is less than 0x40.
If it meets both conditions, then it is input that you want.
Response to comment about three types of 'if':
Each conditional branch has two possible jump targets, one for when the condition is true, the other for when the condition is false.
For the n < 0 test, the TRUE address is taken when n < 0, and the FALSE address is taken when n >= 0. The same n < 0 test therefore also serves as an n >= 0 test, depending on which address you branch to.
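To make the two tests concrete, here is a sketch of the logic only (in Python, not IJVM). Note that the answer's upper bound of 0x40 would also let the punctuation characters 0x3A-0x3F through, so this sketch uses 0x3A as the exclusive upper bound to match the 0x30-0x39 requirement exactly. In IJVM, each comparison becomes a subtraction (ISUB) followed by a conditional branch such as IFLT:
DIGIT_LOW = 0x30             # '0'
DIGIT_HIGH_EXCLUSIVE = 0x3A  # one past '9'

def is_digit_char(c):
    # First test: reject anything below 0x30
    # (in IJVM: push 0x30, ISUB, IFLT -> branch to HALT).
    if c < DIGIT_LOW:
        return False
    # Second test: reject anything at or above the upper bound
    # (in IJVM: push the bound, ISUB, then fall through to HALT unless the result is negative).
    if c >= DIGIT_HIGH_EXCLUSIVE:
        return False
    return True

assert is_digit_char(0x30) and is_digit_char(0x39)
assert not is_digit_char(ord('A')) and not is_digit_char(ord('g'))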

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the dataframe into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10
data['group'][mask] = 1
mask2 = (data['length'] > 9) & (data['length'] < 15)
data['group'][mask2] = 2
mask3 = data['length'] > 14
data['group'][mask3] = 3
This code is not good, of course. The reason it is not good is that you don't know at run time whether data['group'][mask3], for example, will be a view (and thus actually change the dataframe) or a copy (and thus leave the dataframe unchanged). It took me quite some time to explain this to her, since she argued, not unreasonably, that she was doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we verified in two different ways whether the assignment took place:
1. By typing data in the console and examining the dataframe summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
2. By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed to 12,000!
So by method 2, the group column contained no null values and had all the values we wanted it to have. But by method 1 it didn't!
Can anyone explain this?
See the docs here.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know whether you are actually changing the data. Pandas 0.13 will raise/warn when you attempt to do this, but it is easiest/best to just access it like:
data.loc[mask3,'group'] = 3
which guarantees an in-place setitem.
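Applied to all three assignments from the question, the .loc pattern would look something like this (a runnable sketch with a small made-up dataframe standing in for the 12,000-row one):
import numpy as np
import pandas as pd

# small stand-in for the 12,000-row dataframe from the question
data = pd.DataFrame({'length': np.random.randint(0, 21, size=100)})

data['group'] = np.nan
data.loc[data['length'] < 10, 'group'] = 1                           # group 1: 0-9
data.loc[(data['length'] > 9) & (data['length'] < 15), 'group'] = 2  # group 2: 10-14
data.loc[data['length'] > 14, 'group'] = 3                           # group 3: 15-20

print(data['group'].isnull().sum())   # 0 - every row got a group
print(data['group'].value_counts())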

Find biggest subset that sums to zero in Excel or Access (VBA, SQL or anything)

I have a column of numbers in Excel, with positives and negatives. It is an accounting book. I need to eliminate cells that sum to zero. That is, I want to remove the subset so that the remaining elements cannot form any subset that sums to zero. I think this amounts to finding the largest subset that sums to zero. By remove/eliminate, I mean marking them in Excel.
For example:
a set {1,-1,2,-2,3,-3,4,-4,5,-5,6,7,8,9},
I need a function that finds the subset {1,-1,2,-2,3,-3,4,-4,5,-5} and marks each of its elements.
This suggestion may be a little heavy-handed, but it should be able to handle a broad class of problems -- like when one credit may be zeroed out by more than one debit (or vice versa) -- if that's what you want. Like you asked for, it will literally find the largest subset that sums to zero:
Enter your numbers in column A, say in the range A1:A14.
In column B, beside your numbers, enter 0 in each of the cells B1:B14. Eventually, these cells will be set to 1 if the corresponding number in column A is selected, or 0 if it isn't.
In cell C1, enter the formula =A1*B1. Copy the formula down to cells C2:C14.
At the bottom of column B, in cell B15, enter the formula =SUM(B1:B14). This formula calculates the count of your numbers that are selected.
At the bottom of column C, in cell C15, enter the formula =SUM(C1:C14). This formula calculates the sum of your numbers that are selected.
Activate the Solver Add-In and use it for the steps that follow.
Set the objective to maximize the value of cell $B$15 -- in other words, to maximize the count of your numbers that are selected (that is, to find the largest subset).
Set the following three constraints to require the values in cells B1:B14 (that indicate whether or not each of your numbers is selected) to be 0 or 1: a) $B$1:$B$14 >= 0, b) $B$1:$B$14 <= 1, and, c) $B$1:$B$14 = integer.
Set the following constraint to require the selected numbers to add up to 0: $C$15 = 0.
Use the Solver Add-In to solve the problem.
Hope this helps.
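If you'd rather sanity-check the Solver result outside Excel (the question does say "or anything"), here is a minimal brute-force sketch in Python; it is only practical for small sets like the 14-number example, since it enumerates every subset:
from itertools import combinations

numbers = [1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, 7, 8, 9]

# Try subset sizes from largest to smallest and stop at the first zero-sum subset found.
largest = ()
for size in range(len(numbers), 0, -1):
    zero_sum = next((c for c in combinations(range(len(numbers)), size)
                     if sum(numbers[i] for i in c) == 0), None)
    if zero_sum is not None:
        largest = zero_sum
        break

marked = [numbers[i] for i in largest]
print("largest zero-sum subset:", marked)   # [1, -1, 2, -2, 3, -3, 4, -4, 5, -5]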
I think that you need to better define your problem because as it is currently stated there is no clear answer.
Here's why. Take this set of numbers:
{ -9, -5, -1, 6, 7, 10 }
There are 64 possible subsets - including the empty set - and of these three have zero sums:
{ -9, -1, 10 }, { -5, -1, 6 } & { }
There are two possible "biggest" zero-sum subsets.
If you remove either of these you end up with either of:
{ -5, 6, 7 } or { -9, 7, 10 }
Neither of these sum to zero, but there's no rule to determine which subset to pick.
You could decide to remove the "merged" set of zero sum subsets. This would leave you with:
{ 7 }
But does that make sense in your accounting package?
Equally you could just decide to eliminate only pairs of matching positive & negative numbers, but many transactions would involve triples (i.e. sale = cost + tax).
I'm not sure your question can be answered unless you describe your requirements more clearly.

Power-law distribution in T-SQL

I basically need the answer to this SO question that provides a power-law distribution, translated to T-SQL for me.
I want to pull a last name, one at a time, from a census provided table of names. I want to get roughly the same distribution as occurs in the population. The table has 88,799 names ranked by frequency. "Smith" is rank 1 with 1.006% frequency, "Alderink" is rank 88,799 with frequency of 1.7 x 10^-6. "Sanders" is rank 75 with a frequency of 0.100%.
The curve doesn't have to fit precisely at all. Just give me about 1% "Smith" and about 1 in a million "Alderink"
Here's what I have so far.
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank] = ROUND(88799 * RAND(), 0)
But this of course yields a uniform distribution.
I promise I'll still be trying to figure this out myself by the time a smarter person responds.
Why settle for a power-law distribution when you can draw from the actual distribution?
I suggest you alter the LastNames table to include a numeric column containing, for each name, the actual number of individuals with a more common name. You'll probably want a number on a smaller but proportional scale, say 10,000 for each percent of representation.
The list would then look something like:
(other than the 3 names mentioned in the question, I'm guessing about White, Johnson et al)
Smith 0
White 10,060
Johnson 19,123
Williams 28,456
...
Sanders 200,987
..
Alderink 999,997
And the name selection would be
SELECT TOP 1 [LastName]
FROM [LastNames] as LN
WHERE LN.[number_described_above] < ROUND(1000000 * RAND(), 0)
ORDER BY [number_described_above] DESC
That picks the first name whose number does not exceed the [uniformly distributed] random number. Note how the query uses less-than and orders in descending order; this guarantees that the very first entry (Smith) can get picked. The alternative would be to start the series with Smith at 10,060 rather than zero and to discard random draws smaller than this value.
Aside from the matter of boundary management (starting at zero rather than 10,060) mentioned above, this solution, along with the two other responses so far, is the same as the one suggested in dmckee's answer to the question referenced in this question. Essentially, the idea is to use the CDF (cumulative distribution function).
Edit:
If you insist on using a mathematical function rather than the actual distribution, the following should provide a power-law function which conveys, roughly, the "long tail" shape of the real distribution. You may want to tweak the @PwrCoef value (which, by the way, needn't be an integer); essentially, the bigger the coefficient, the more skewed towards the beginning of the list the function is.
DECLARE @PwrCoef INT
SET @PwrCoef = 2
SELECT 88799 - ROUND(POWER(POWER(88799.0, @PwrCoef) * RAND(), 1.0/@PwrCoef), 0)
Notes:
- the extra ".0" in the function above are important to force SQL to perform float operations rather than integer operations.
- the reason why we subtract the power calculation from 88799 is that the calculation's distribution is such that the closer a number is closer to the end of our scale, the more likely it is to be drawn. The List of family names being sorted in the reverse order (most likely names first), we need this substraction.
Assuming a power of, say, 3 the query would then look something like
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 88799 - ROUND(POWER(POWER(88799.0, 3) * RAND(), 1.0/3), 0)
Which is the query from the question except for the last line.
Re-Edit:
In looking at the actual distribution, as apparent in the Census data, the curve is extremely steep and would require a very big power coefficient, which in turn would cause overflows and/or extreme rounding errors in the naive formula shown above.
A more sensible approach may be to operate in several tiers, i.e. to perform an equal number of draws in each of, say, three thirds (or four quarters, or...) of the cumulative distribution; within each of these partial lists, we would draw using a power-law function, possibly with the same coefficient but with different ranges.
For example
Assuming thirds, the list divides as follows:
First third = 425 names, from Smith to Alvarado
Second third = 6,277 names, from to Gainer
Last third = 82,097 names, from Frisby to the end
If we were to need, say, 1,000 names, we'd draw 334 from the top third of the list, 333 from the second third and 333 from the last third.
For each of the thirds we'd use a similar formula, maybe with a bigger power coefficient for the first third (where we are really interested in favoring the earlier names in the list, and also where the relative frequencies are more statistically relevant). The three selection queries could look like the following:
-- Random Drawing of a single Name in top third
-- Power Coef = 12
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 425 - ROUND(POWER(POWER(425.0, 12) * RAND(), 1.0/12), 0)
-- Second third; Power Coef = 7
...
WHERE LN.[Rank]
= (425 + 6277) - ROUND(POWER(POWER(6277.0, 7) * RAND(), 1.0/7), 0)
-- Bottom third; Power Coef = 4
...
WHERE LN.[Rank]
= (425 + 6277 + 82097) - ROUND(POWER(POWER(82097.0, 4) * RAND(), 1.0/4), 0)
Instead of storing the pdf as the rank, store the CDF (the sum of all frequencies up to that name, starting from Alderink).
Then modify your select to retrieve the first LN whose value is greater than your formula result.
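A minimal sketch of that CDF-based draw, in Python rather than T-SQL and with a few made-up frequencies standing in for the real census table (the technique is the inverse-transform sampling the answer describes):
import bisect
import random

# made-up (name, frequency) pairs standing in for the census table
names = ["Smith", "Johnson", "Williams", "Sanders", "Alderink"]
freqs = [0.01006, 0.00810, 0.00699, 0.00100, 0.0000017]

# build the CDF: running sum of the frequencies
cdf = []
total = 0.0
for f in freqs:
    total += f
    cdf.append(total)

def draw_name():
    # uniform draw over the total mass, then find the first CDF entry that exceeds it
    r = random.uniform(0, total)
    return names[bisect.bisect_left(cdf, r)]

print([draw_name() for _ in range(10)])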
I read the question as "I need to get a stream of names which will mirror the frequency of last names from the 1990 US Census"
I might have read the question a bit differently from the other suggestions, and although an answer has been accepted (and a very thorough answer it is), I will contribute my experience with the Census last names.
I had downloaded the same data from the 1990 census. My goal was to produce a large number of names to be submitted for search testing during performance testing of a medical record app. I inserted the last names and the percentage frequencies into a table. I added a column and filled it with an integer which was the product of "total names required * frequency". The frequency data from the census did not add up to exactly 100%, so my total number of names was also a bit short of the requirement. I was able to correct the number by selecting random names from the list and increasing their count until I had exactly the required number; the randomly added count never amounted to more than .05% of the total of 10 million.
I generated 10 million random numbers in the range of 1 to 88799. With each random number I would pick that name from the list and decrement the counter for that name. My approach was to simulate dealing a deck of cards, except my deck had many more distinct cards and a varying number of each card.
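A simplified sketch of that deck idea (my illustration, with a tiny made-up frequency table in place of the census file; it builds and shuffles the deck explicitly rather than drawing ranks and decrementing counters as described above):
import random

TOTAL_NAMES_REQUIRED = 1000

# made-up (name, frequency) pairs standing in for the 1990 census file
census = [("Smith", 0.01006), ("Sanders", 0.00100), ("Alderink", 0.0000017), ("Other", 0.98894)]

# counter per name: how many copies of each "card" go into the deck
counts = {name: round(freq * TOTAL_NAMES_REQUIRED) for name, freq in census}

# the frequencies rarely add up exactly, so top up the deck with randomly chosen names
while sum(counts.values()) < TOTAL_NAMES_REQUIRED:
    counts[random.choice(list(counts))] += 1

# deal the deck: each draw removes one copy, like dealing cards
deck = [name for name, n in counts.items() for _ in range(n)]
random.shuffle(deck)
print(deck[:10])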
Do you store the actual frequencies with the ranks?
Converting the algebra from that accepted answer to T-SQL is no bother, if you know what values to use for n. y would be what you currently have, ROUND(88799 * RAND(), 0), and x0, x1 = 1, 88799 I think, though I might be misunderstanding it. The only non-standard maths operator involved, from a T-SQL perspective, is ^, which is just POWER(x, y) == x^y.