What should test cases look like? - testing

Let us say I want to make a boundary-value test case. I prepare and do the testing, but how do I write the test cases in a nice way? Is there any standardized way of doing this?

When writing test cases for boundary value analysis, always consider test data on the boundary, just below the boundary, and just above the boundary.
Let us take an example:
If the allowed number range is 1 to 1000, then the following conditions should be taken into consideration:
1) Test data exactly at the boundaries of the input domain, i.e. values 1 and 1000 in our case.
2) Test data with values just below the extreme edges of the input domain, i.e. values 0 and 999.
3) Test data with values just above the extreme edges of the input domain, i.e. values 2 and 1001.
Alternatively, you could also take 0, 1, 999, 1000, and 1001.
Boundary value analysis is often considered part of stress and negative testing.
Note: There is no hard-and-fast rule to test only one value from each equivalence class you created for input domains. You can select multiple valid and invalid values from each equivalence class according to your needs and previous judgments.
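As a minimal sketch, here is one way those boundary values could be written down as executable test cases. Note that accepts_number is a hypothetical stand-in for the system under test, included only so the example runs:

import pytest

def accepts_number(value: int) -> bool:
    # Stand-in for the system under test: accept 1..1000 inclusive.
    return 1 <= value <= 1000

@pytest.mark.parametrize("value,expected", [
    (1, True),      # exactly on the lower boundary
    (1000, True),   # exactly on the upper boundary
    (0, False),     # just below the lower boundary
    (2, True),      # just above the lower boundary
    (999, True),    # just below the upper boundary
    (1001, False),  # just above the upper boundary
])
def test_boundary_values(value, expected):
    assert accepts_number(value) == expected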

Related

How to sample rows from a table with a specific probability?

I'm using BigQuery at my new position, and I'm totally new to SQL/BigQuery.
I'm testing a machine learning model and monitoring an A/B test with a different ratio, e.g., 3 vs. 10. To compare the A/B results, e.g., # of page view, I want to make the ratios equal first so that I can compare easily. For example, say we have a table with 13 records (3 are from A and 10 are from B). In addition, each row contains an id field that is identical. What I want to do is to extract only 3 samples out of 10 for B to match the sample number to A.
I'm trying to use the FARM_FINGERPRINT function to map fields to integers. Then I'm taking ABS and then calculating MOD to convert the integer numbers to a specific range, e.g., [0, 10). Eventually, I would like to get 3 in 10 items using the following line:
MOD(ABS(FARM_FINGERPRINT(field)), 10) < 3
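As a rough illustration of this bucketing idea outside BigQuery, here is a Python sketch; MD5 stands in for FARM_FINGERPRINT purely for demonstration, so it only approximates the behaviour:

import hashlib

def bucket(value: str, buckets: int = 10) -> int:
    # Map a string to a stable bucket in [0, buckets), mirroring
    # MOD(ABS(FARM_FINGERPRINT(field)), 10) conceptually.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

ids = ["user_%d" % i for i in range(1000)]
sampled = [i for i in ids if bucket(i) < 3]   # keep ~3 out of every 10 ids
print(len(sampled) / len(ids))                # roughly 0.3, only approximately for small tables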
However, I found that even if I run A and B with exactly the same ML model and only a different A/B ratio, the results differ between A and B (they should be the same, because A and B are running the same ML model with just a different ratio). This made me suspect that the implementation above may introduce biased sampling. I also read this post and confirmed that FARM_FINGERPRINT might not produce a randomly distributed result.
*There's a critical reason why I cannot simply multiply B by 3/10; it is confidential and I cannot disclose it here.
Is there a better way to accomplish equally distributed sampling?
Thank you in advance. (I'm sorry if the question is vague, as I'm hiding the confidential parts.)

SQL Edit Distance: How have you handled Fuzzy String Matching using SQL in the past?

I have always wanted to ask for your views on this topic, so here we go:
My team just provided me with a list of customer accounts that we need to match against other databases, and the main challenge we face is that our list is non-standardized, so the same accounts are named similarly but not identically to the names in our databases. For example:
My_List.Customers_Name    |    Customers_Database.Customers_Name
Charles Schwab            |    Charles Schwab Corporation
So, for example, I use the Jaro-Winkler similarity function and edit distance to gather a list of similar strings and then manually match the accounts if needed. My question is:
Which rules/filters do you apply to the results of the fuzzy matching in order to reduce the amount of manual matching?
I am referring to rules like:
If the string has more than 20 characters and the edit distance is <= 1, then it is probably the same account, so consider it a match. If the string has fewer than 4 characters and the edit distance is > 0, then it is probably not the same account, so consider it a mismatch.
The rules I apply are completely made up on my side; I am wondering if there is some standard convention for applying fuzzy string matching so that only useful results are retrieved and the manual matching workload is reduced.
If there is not, could you share your experience and how you have handled this before?
Thank you so much
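As a rough sketch, the length/edit-distance thresholds described in the question could be prototyped like this; the cutoffs are the made-up rules from above, not an established standard:

def edit_distance(a: str, b: str) -> int:
    # Plain Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def classify(name_a: str, name_b: str) -> str:
    # Apply the two threshold rules; everything else still needs a human look.
    distance = edit_distance(name_a, name_b)
    if len(name_a) > 20 and distance <= 1:
        return "match"
    if len(name_a) < 4 and distance > 0:
        return "mismatch"
    return "needs manual review"

print(classify("Charles Schwab Corporation", "Charles Schwab Corporatio"))  # match
print(classify("IBM", "IBN"))                                               # mismatch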
I've done this a few times. It's hugely dependent on the data sets, and the rules change every time.
My process is:
select a random set of sample records to check my rule set - large enough to be representative, small enough to be able to scan visually.
create a "match" table with "original", "match" and "confidence score" columns.
write the rules, as "insert" or "update" statements to create records in the "match" table
run the rules on my sample data set
evaluate the matches on the samples. Tweak, add, configure the rules.
rinse & repeat
The "rules" depend hugely on the data set. Commonly I use the following:
strip out punctuation
apply common substitutions (e.g. "Corp" becomes "Corporation")
split into separate words; score the fraction of exact word matches out of 10 (so "Charles Schwab" matching "Charles Schwab Corporation" would be 2/3 ≈ 7 points, and "HSBC" matching "HSBC" is 1/1 = 10 points)
split into separate words; score the fraction of close word matches out of 5 (so "Chls Schwab" matching "Charles Schwab Corporation" would be 2/3 ≈ 3 points, and "HSBC" matching "HSCB" is 1/1 = 5 points)
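As a sketch of that word-level scoring in Python (the normalization, the substitution list, and the 0.6 close-match cutoff are illustrative assumptions, not a fixed recipe):

import difflib
import string

SUBSTITUTIONS = {"corp": "corporation", "co": "company", "intl": "international"}

def words(name: str) -> list:
    # Strip punctuation, lowercase, apply common substitutions, split into words.
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return [SUBSTITUTIONS.get(w, w) for w in cleaned.split()]

def exact_word_score(a: str, b: str) -> float:
    # Fraction of exact word matches, scaled to 10 points.
    wa, wb = words(a), words(b)
    hits = sum(w in set(wb) for w in wa)
    return 10 * hits / max(len(wa), len(wb))

def close_word_score(a: str, b: str) -> float:
    # Fraction of close (fuzzy) word matches, scaled to 5 points.
    wa, wb = words(a), words(b)
    hits = sum(bool(difflib.get_close_matches(w, wb, n=1, cutoff=0.6)) for w in wa)
    return 5 * hits / max(len(wa), len(wb))

print(exact_word_score("Charles Schwab", "Charles Schwab Corporation"))  # 2/3 of 10, about 6.7
print(close_word_score("Chls Schwab", "Charles Schwab Corporation"))     # 2/3 of 5, about 3.3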

Creating Test cases using Decision Table Method

Assume you are a test analyst working on a banking project to upgrade an existing automated teller machine system to allow customers to obtain cash advances from supported credit cards. The system should allow cash advances from 20 dollars to 500 dollars, inclusively, for all supported credit cards. The correct list of supported credit cards is American Express, Visa, Japan Credit Bank, Eurocard, and MasterCard. The user interface starts with a default amount of 100 dollars for advances, and the ATM keypad is used to increase or decrease that amount in 20-dollar increments.
Consider the decision table shown in table 1.0 that describes the handling of these transactions.
Table 1.0. Cash advance decision table
Check the table in attached image
Assume that you want to design a set of test cases where the following coverage is achieved:
Decision table coverage
Boundary values for allowed and disallowed advance amounts
Successful advance for each supported card
Design a set of test cases that achieves this level of coverage with the minimum possible number of test cases. Assume each test case consists of a single combination of conditions to create and a single combination of actions to check. How many test cases do you need?
Can someone help me understand this problem and its solution?
Thanks in Advance :-)
Decision table coverage; boundary values for allowed and disallowed advance amounts ->
Boundary values for your example will be: less than 0; 0; 20-500; 500+.
Equivalence partitioning, boundary value testing, and decision tables are described here: http://www.maniuk.net/search/label/test%20design%20technique
Successful advance for each supported card ->
Instruction set number 5 (in the decision table) should be applied for all types of supported cards. Depending on the risks, #4 should be tested too.
Design a set of test cases that achieves this level of coverage with the minimum possible number of test cases. -->
a. If we can assume that the cards work in exactly the same way, with the same limits and processing procedures, then 9 test cases are needed; during boundary testing you can use different cards, so each card gets used.
b. If we assume that some card-specific processing still exists, then 13 test cases are needed (the 9 from the previous case + 4 other cards to test instruction #5).
c. If the cards have different limits of their own, additional verification will be needed.
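As a hedged sketch of what such a test set could look like in executable form: request_cash_advance is a hypothetical stand-in for the ATM logic, and the 0/520 disallowed amounts assume the 20-dollar keypad increments; both are assumptions, not part of the original exercise:

import pytest

CARDS = ["American Express", "Visa", "Japan Credit Bank", "Eurocard", "MasterCard"]

def request_cash_advance(card: str, amount: int) -> bool:
    # Stand-in for the real system: approve 20..500 inclusive on supported cards.
    return card in CARDS and 20 <= amount <= 500

@pytest.mark.parametrize("amount,expected", [
    (0, False),    # nearest reachable disallowed amount below the range
    (20, True),    # lower boundary, allowed
    (500, True),   # upper boundary, allowed
    (520, False),  # nearest reachable disallowed amount above the range
])
def test_advance_amount_boundaries(amount, expected):
    assert request_cash_advance("Visa", amount) == expected

@pytest.mark.parametrize("card", CARDS)
def test_successful_advance_per_card(card):
    # The default amount of 100 is inside the allowed range for every card.
    assert request_cash_advance(card, 100) is True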

Why does the boundary test require 3 values?

In boundary testing, suppose the correct boundary is salary < 20000. The guideline then suggests setting the test cases at the boundary to 19999, 20000, and 20001 to locate the defect. If the rule is wrongly implemented as <= 20000, the failure can be identified by the 20000 test case.
The problem is that the defect can already be found using the 19999 and 20000 test cases (2 values), so why does the guideline suggest using 3 values at the boundary? What is the use of the third value? Is it necessary?
Usually, input values at the extreme ends of the input criteria cause more errors in the system. Hence, we test the lower and upper boundaries as part of stress/negative testing. It is a general practice in software testing to have a minimum of 3 tests to check boundary values. However, it is not mandatory. In your case, if the test fails in the second test, you would not execute the third test as there is no need for it. Once the bug is fixed, you would then test all three values to ensure everything is as per the requirement. Software testing relies on the tester's judgement, and you should follow the usual guidelines, but nothing is mandatory.
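To make the role of the third value concrete, here is a small sketch; the faulty variants are hypothetical implementations of the "salary < 20000" rule:

def correct(salary):
    return salary < 20000          # the intended rule

def wrong_operator(salary):
    return salary <= 20000         # fault: <= instead of <

def wrong_comparison(salary):
    return salary != 20000         # fault: only the exact boundary value is rejected

tests = [(19999, True), (20000, False), (20001, False)]

for impl in (correct, wrong_operator, wrong_comparison):
    failing = [value for value, expected in tests if impl(value) != expected]
    print(impl.__name__, "exposed by:", failing or "nothing")

Running this shows that 20000 exposes the <= fault, while only 20001 exposes the second fault, which is why the just-above-boundary value is usually kept.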

What is difference between equivalence class testing and input domain partitioning?

I'm learning software testing now, and I'm just wondering what the difference is between equivalence class testing and input domain partitioning; it seems like both of them are about partitioning the input domain.
Frankly speaking, during my career as a software testing engineer I haven't seen many mentions of input domain partitioning.
Nevertheless, the term exists, so let's take a look at whether there is a difference between equivalence class testing and input domain partitioning.
The equivalence class technique divides the possible test data for, let's say, an application module into partitions of equivalent data. They're "equivalent" because any member of a partition can represent every other member of that partition, and in theory you need only one test using one of a partition's members in order to test that partition sufficiently. Moreover, the partitions should not overlap.
Yes, I know, that's a little bit cumbersome, but let's look at an example: you have an input field on a web page which accepts all kinds of characters, but only up to 256 of them. That gives you the following equivalence partitions (simplified):
Char types:
only letters
only numbers
only special chars
mixed chars (letters + numbers + spec. chars)
Char quantity:
0
>0
<256
256
Each of that equivalence partitions has sub-partitions, e.g. "letters":
Big letters
Small letters
Mixed letters
That means that in order to sufficiently test the "letters" partition, you have to design a test case which covers at least one of those sub-partitions. Let's say it will be "letters -> Big letters": "TEST INPUT STRING". Notice that here we've also combined our test string with the "Char quantity: >0" equivalence partition.
So basically, by combining sub-partitions of the "Char types" and "Char quantity" partitions, you'll be able to design a minimal test set for testing the input data of that field.
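As a small sketch of such a minimal test set: accepts_input is a hypothetical stand-in for the field's validation, included only so the example runs, and the validity of the empty and over-limit cases is an assumption:

import pytest

def accepts_input(text: str) -> bool:
    # Stand-in for the field under test: any characters, at most 256 of them.
    return len(text) <= 256

@pytest.mark.parametrize("text,expected", [
    ("TEST INPUT STRING", True),  # letters (big letters), quantity > 0
    ("1234567890", True),         # only numbers
    ("!@#$%", True),              # only special chars
    ("abc123!?", True),           # mixed chars
    ("", True),                   # quantity boundary: 0
    ("a" * 256, True),            # quantity boundary: exactly 256
    ("a" * 257, False),           # just over the limit (an extra boundary check)
])
def test_input_field_partitions(text, expected):
    assert accepts_input(text) == expected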
On the other side, the input domain of a program contains all the possible inputs to that program, which is fairly close to the equivalence classes of the possible inputs of the application module.
Sometimes those who speak about the input domain of a program also speak about regions, which are the same thing as sub-partitions of equivalence partitions. Moreover, those input domains (and accordingly regions) must not overlap (just as they must not in equivalence partition testing).
With all that said, I would consider these two terms to describe the same matter using different words.