Can PICT handle independent parameters - testing

Can PICT (=Pairwise Independent Combinatorial Testing) handle/model independent parameters.
For example in following input a and b are independent, so they should not be combined.
Input in PICT:
a: 1, 2, 3, 4
b: 5, 6, 7, 8
//some line which models the independence: a independent of b
Output, that I would expect:
a b
1 5
2 6
3 7
4 8
This example, with only 2 parameters, of course normally would not make much sense, but it's illustrative.
The same could be applied to 3 parameters (a,b,c), where a is independent of b, but not c.
The main goal of declaring parameters as independent would be the reduce the number of tests.
I read the paper/user guide to PICT, but I didn't found any useful information.

I will answer my question by myself:
The solution is to define submodels and set the default order from 2 (=pairwise) to 1 (= no combination).
For example parameter a = {a_1, a_2, a_3} should be independent of
b = {b_1, b_2, b_3} and
c = {c_1, ..., c_4}.
Therefor I would expect 12 tests ((b x c) + a).
Resulting in the following input file:
a: 1,2,3
b: 1,2,3
c: 1,2,3,4
{b,c}#2
{b,c}#2 defines a submodel, consisting of b and c, which uses pairwise combination.
And running pict with the option: "/o:1".

In PICT you can define conditions and achieve your goal to not combine them with the rest. You can add additional option as N/A and have something like this;
If [A] = '1' Then [B] = 'N/A';
This is one possible option to handle this case.

Related

Order-independent Deep Learning Model

I have a dataset with parallel time series. The column 'A' depends on columns 'B' and 'C'. The order (and the number) of dependent columns can change. For example:
A B C
2022-07-23 1 10 100
2022-07-24 2 20 200
2022-07-25 3 30 300
How should I transform this data, or how should I build the model so the order of columns 'B' and 'C' ('A', 'B', 'C' vs 'A', C', 'B'`) doesn't change the result? I know about GCN, but I don't know how to implement it. Maybe there are other ways to achieve it.
UPDATE:
I want to generalize my question and make one more example. Let's say we have a matrix as a singe observation (no time series data):
col1 col2 target
0 1 a 20
1 2 a 30
2 3 b 30
3 4 b 40
I would like to predict one value 'target' per each row/instance. Each instance depends on other instances. The order of rows is irrelevant, and the number of rows in each observation can change.
You are looking for a permutation invariant operation on the columns.
One way of achieving this would be to apply column-wise operation, followed by a global pooling operation.
How that achieves your goal:
column-wise operations are permutation equivariant; that is, applying the operation on the columns and permuting the output, is the same as permuting the columns and then applying the operation.
A global pooling operation (e.g., max-pool, avg-pool) across the columns is permutation invariant: the result of an average pool does not depend on the order of the columns.
Applying a permutation invariant operation on top of a permutation equivariant one results in an overall permutation invariant function.
Additionally, you should look at self-attention layers, which are also permutation equivariant.
What I would try is:
Learn a representation (RNN/Transformer) for a single time series. Apply this representation to A, B and C.
Learn a transformer between the representation of A to those of B and C: that is, use the representation of A as "query" and those of B and C as "keys" and "values".
This will give you a representation of A that is permutation invariant in B and C.
Update (Aug 3rd, 2022):
For the case of "observations" with varying number of rows, and fixed number of columns:
I think you can treat each row as a "token" (with a fixed dimension = number of columns), and apply a Transformer encoder to predict the target for each "token", from the encoded tokens.

pandas create Cross-Validation based on specific columns

I have a dataframe of few hundreds rows , that can be grouped to ids as follows:
df = Val1 Val2 Val3 Id
2 2 8 b
1 2 3 a
5 7 8 z
5 1 4 a
0 9 0 c
3 1 3 b
2 7 5 z
7 2 8 c
6 5 5 d
...
5 1 8 a
4 9 0 z
1 8 2 z
I want to use GridSearchCV , but with a custom CV that will assure that all the rows from the same ID will always be on the same set.
So either all the rows if a are in the test set , or all of them are in the train set - and so for all the different IDs.
I want to have 5 folds - so 80% of the ids will go to the train and 20% to the test.
I understand that it can't guarentee that all folds will have the exact same amount of rows - since one ID might have more rows than the other.
What is the best way to do so?
As stated, you can provide cv with an iterator. You can use GroupShuffleSplit(). For example, once you use it to split your dataset, you can put the result within GridSearchCV() for the cv parameter.
As mentioned in the sklearn documentation, there's a parameter called "cv" where you can provide "An iterable yielding (train, test) splits as arrays of indices."
Do check out the documentation in future first.
As mentioned previously, GroupShuffleSplit() splits data based on group lables. However, the test sets aren't necessarily disjoint (i.e. doing multiple splits, an ID may appear in multiple test sets). If you want each ID to appear in exactly one test fold, you could use GroupKFold(). This is also available in Sklearn.model_selection, and directly extends KFold to take into account group lables.

What's the preferred way to implement Data Driven Testing using Elixir/ExUnit?

I'd like to reuse the same code of a test case for several handcrafted combinations of input data and expected outcomes, but without copypasting the code for each set. Frameworks in other languages support it in different ways, for example in Groovy/Spock:
def "maximum of two numbers"(int a, int b, int c) {
expect:
Math.max(a, b) == c
where:
a | b | c
1 | 3 | 3
7 | 4 | 4
0 | 0 | 0
}
What's the preferred way to do this in ExUnit?
Maybe ExUnit is not the best framework to this?
I guess you could do something very simple like:
test "Math.max/2" do
data = [
{1, 3, 3},
{7, 4, 4},
]
for {a, b, c} <- data do
assert Math.max(b, c) == a
end
end
Pattern matching allows you to be pretty explicit when doing these thing in my opinion. And you can keep the simplicity of just variables and assertions, all inside ExUnit.
You can also wrap test definitions with an enumerable. Dave Thomas's Blog provides an extended example.
data = [
{1, 3, 3},
{7, 4, 4},
]
for {a,b,c} <- data do
#a a
#b b
#c c
test "func "<>#a<>", "<>#b<>" equals"<>#c do
assert fun(#a,#b) == #c
end
end
For a relatively simple test like this, I'm not sure this approach is any better, but for more complex tests this can provide better reporting and you can dynamically create tests at test time by using a function as in
the blog entry.

Comparing vectors

I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.
I have a data.frame M with 100,000 lines (and many columns, out of which 2 columns are relevant to this problem, I'll call it M1, M2). I have another data.frame where column V1 with about 10,000 elements is essential to this task. My task is this:
For each of the element in V1, find where does it occur in M2 and pull out the corresponding M1. I am able to do this using for-loop and it is terribly slow! I am used to Matlab and Perl and this is taking for EVER in R! Surely there's a better way. I would appreciate any valuable suggestions in accomplishing this task...
for (x in c(1:length(V$V1)) {
start[x] = M$M1[M$M2 == V$V1[x]]
}
There is only 1 element that will match, and so I can use the logical statement to directly get the element in start vector. How can I vectorize this?
Thank you!
Here is another solution using the same example by #aix.
M[match(V$V1, M$M2),]
To benchmark performance, we can use the R package rbenchmark.
library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame
benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
test replications elapsed relative
2 f_aix() 10000 12.907 7.068456
3 f_chase() 10000 2.010 1.100767
1 f_ramnath() 10000 1.826 1.000000
Another option is to use the %in% operator:
> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1] 6 8 11 9 19 1 3 5
Sounds like you're looking for merge:
> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
V1 M1
1 -1 10
2 12 3
3 5 15
4 7 3
If V$V1 might contain values not present in M$M2, you may want to specify all.x=T. This will fill in the missing values with NAs instead of omitting them from the result.

Deterministic/non-deterministic state system mapping

I read in a book on non-deterministic mapping there is mapping from Q*∑ to 2Q for M=(Q,∑,trans,q0,F)
where Q is a set of states.
But I am not able to understand how it's 2Q;
if there are 3 states a, b, c, how does it map to 8 states?
I always found that the easiest way to think about these (since the set of states is finite) is as having each of those subsets be an encoding of a base-2 number that ranges from 0 (all bits zero) to 2|Q|-1 (all bits one), where there are as many bits in the number as there are members in the state set, Q. Then, you can just take one of these numbers and map it into a subset by using whether a particular bit in the number is set. Easy!
Here's a worked example where Q = {a,b,c}. In this case, |Q| is 3 (there are three elements) and so 23 is 8. That means we get this if we say that the leading bit is for element a, the next bit is for b, and the trailing bit for c:
0 = 000 = {}
1 = 001 = {c}
2 = 010 = {b}
3 = 011 = {b,c}
4 = 100 = {a}
5 = 101 = {a,c}
6 = 110 = {a,b}
7 = 111 = {a,b,c}
See? That initial three states has been transformed into 8, and we have a natural numbering of them that we could use to create the labels of those states if we chose.
Now, to the interpretations of this within a non-deterministic context. Basically, the non-determinism means that we're uncertain about what state we're in. We represent this by using a pseudo-state that is the set of “real” states that we might be in; if we have total non-determinism then we are in the pseudo-state where all real-states are possible (i.e., {a,b,c}) whereas the pseudo-state where no real-states are possible (i.e., {}) is the converse (and really ought to be impossible to reach in the transition system). In a real system, you're usually not dealing with either of those extremes.
The logic of how you convert the deterministic transition system into a non-deterministic one is rather more complex than I want to go into here. (I had to read a substantial PhD thesis to learn it so it's definitely more than an SO answer's worth!)
2Q means the set of all subsets of Q. For each state q and each letter x from sigma, there is a subset of Q states to which you can go from q with letter x. So yeah, if there are three states abc the set 2Q consists of 8 elements {{}, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}}. It doesn't map to 8 states, it maps to one of these 8 sets. HTH