Combinations using FOREACH in Pig - apache-pig

I would like to generate combinations in Pig with the help of FOREACH. Is there any way to do this?
My Input:
A
B
C
Objective:
A,B
A,C
B,C
Here's what I have tried. It fails with "Syntax error, unexpected symbol at or near '$0'".
A = load '/test';
B = foreach A generate $0;
Combination = Cross A, B;
Combination_Filter = foreach Combination generate $0 < $1;
Please help me resolve this. Thanks in advance.

Can you try one of the options below?
Input:
A
B
C
Option1:
A = LOAD 'input' AS(f1:chararray);
B = LOAD 'input' AS(f2:chararray);
C = CROSS A,B;
D = FILTER C BY A::f1 < B::f2;
DUMP D;
Option2:
A = LOAD 'input' AS (f1:chararray);
B = FOREACH A GENERATE f1 AS (f2:chararray);
C = CROSS A,B;
D = FILTER C BY A::f1 < B::f2;
DUMP D;
Output:
(A,B)
(A,C)
(B,C)
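To see why both options produce exactly these three pairs, here is a rough Python sketch (an illustration of the logic, not Pig itself) that mimics CROSS followed by the `<` filter. The function name `cross_then_filter` is hypothetical.

```python
from itertools import product


def cross_then_filter(values):
    """Mimic Pig's CROSS A, B followed by FILTER ... BY A::f1 < B::f2."""
    # CROSS: every (left, right) pair, including (x, x) and both orderings.
    crossed = product(values, repeat=2)
    # FILTER with strict <: drops self-pairs and the mirrored duplicate of each pair.
    return [(left, right) for left, right in crossed if left < right]


print(cross_then_filter(["A", "B", "C"]))
```

Note that Pig does not guarantee output order, and because the filter uses a strict `<` on the values, two identical input values never pair with each other.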

It is not possible to do this with FOREACH alone. The only ways to achieve something like it are Sivasakthi's CROSS/FILTER answer or a custom UDF. With a UDF, you can put all the records into a single bag with GROUP ... ALL and then pass that bag to the UDF.
The UDF is in this other question: How to turn (A, B, C) into (AB, AC, BC) with Pig?
The code would be something like:
A = load '/test';
A_grouped = group A all;
A_combinations = foreach A_grouped generate CombinationsUDF(A);
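The core logic such a UDF needs is small. The sketch below is a plain Python function, assuming the bag arrives as a list of one-field tuples; the name `pair_combinations` is hypothetical and not necessarily what the linked UDF does. To call it from Pig as a Jython UDF you would also have to register the script and declare an output schema (check the Pig UDF documentation for your version).

```python
from itertools import combinations


def pair_combinations(bag):
    """Return every unordered pair of first fields from a bag of 1-field tuples."""
    values = [t[0] for t in bag]
    # combinations() works by position, so each pair appears once, in input order.
    return list(combinations(values, 2))


print(pair_combinations([("A",), ("B",), ("C",)]))
```

Unlike the CROSS/FILTER approach, this pairs records by position rather than by comparing values, but GROUP ... ALL sends the whole relation to a single reducer, so it only suits inputs that fit in memory.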

Power BI while loop

I'm trying to write a while loop in the Power BI M language, but the logic is going over my head!
How would you translate a very simple loop like this into M language?
while X == True:
    do abcdef
    if Y == True:
        end
Thanks very much!
Loops in M are probably best handled with the List.Generate function.
This article does a pretty good job at explaining how it works:
https://potyarkin.ml/posts/2017/loops-in-power-query-m-language/
Using this function, let's look at a more specific implementation of a while loop, say to find Fibonacci numbers less than 1000.
a = 1
b = 1
while b < 1000
    b = a + b
    a = b - a
would translate to M something like this:
let
    data =
        List.Generate(
            () => [ a = 1, b = 1 ],
            each [b] < 1000,
            each [ b = [a] + [b], a = [b] ]
        ),
    output = Table.FromRecords(data)[a]
in
    output
I'm not sure of the best way to handle your break condition Y. It might depend on the specific problem.
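To make the List.Generate semantics concrete, here is a rough Python analogue (a sketch, not M): it keeps generating states while the condition holds and collects each state before computing the next. Dicts stand in for M records, and `list_generate` is a hypothetical name.

```python
def list_generate(initial, condition, next_value):
    """Rough Python analogue of M's List.Generate(initial, condition, next)."""
    values = []
    current = initial()
    # The condition is checked before each state (including the first) is kept.
    while condition(current):
        values.append(current)
        current = next_value(current)
    return values


# Same Fibonacci example as the M query above.
data = list_generate(
    lambda: {"a": 1, "b": 1},
    lambda s: s["b"] < 1000,
    lambda s: {"b": s["a"] + s["b"], "a": s["b"]},
)
output = [s["a"] for s in data]  # like Table.FromRecords(data)[a]
print(output)
```

One option for a break condition like Y, assuming it can be evaluated from the current state, is to fold it into the condition (for example `X(s) and not Y(s)`). The caveat is that List.Generate then drops the state that triggers Y, whereas the pseudocode runs `abcdef` once more before ending.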

Name the output of an expression in Tensorflow

I wonder whether it's possible to give a name to the output of a certain expression so that I can retrieve it from the graph in another part of the code.
For example
def A(a, b):
    c = a + b
    d = c * a
    return d
For debug purposes it would be nice if I could pull out c at another position without returning it through the entire function hierarchy.
Any ideas?
Thanks in advance!
I'm assuming a and b are tensors.
Either give c a name using tf.identity:
def A(a, b):
    c = a + b
    c = tf.identity(c, name="c")
    d = c * a
    return d
Or use the tf.add operation instead of +:
def A(a, b):
    c = tf.add(a, b, name="c")
    d = c * a
    return d
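Either way, you can retrieve c with tf.get_default_graph().get_tensor_by_name('c:0') (you may need to prefix the name scope, if any, e.g. 'scope/c:0'). tf.get_variable is not the right call here: it looks up or creates variables, and c is an ordinary tensor.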
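A minimal end-to-end sketch of the tf.identity option, assuming TensorFlow 2.x run in graph mode through tf.compat.v1 (the constants 2.0 and 3.0 are made up for illustration). Eager execution does not keep a graph you can query by name, so the example builds an explicit tf.Graph.

```python
import tensorflow as tf


def A(a, b):
    c = tf.identity(a + b, name="c")  # name the intermediate result "c"
    d = c * a
    return d


graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    d = A(a, b)
    # Elsewhere in the code: look the tensor up as "<op name>:<output index>".
    c = graph.get_tensor_by_name("c:0")

with tf.compat.v1.Session(graph=graph) as sess:
    c_val, d_val = sess.run([c, d])  # c is 2 + 3, d is c * 2
```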

Pig hadoop returning 0's in division

I am trying to divide H by AB for each row. The H / AB expression in the code below runs, but the output is all zeros. I am really confused.
sum_of_scores = FOREACH final_group GENERATE group AS id,
SUM(s.AB) AS AB,
SUM(s.H) AS H;
final_final = FOREACH sum_of_scores GENERATE $0 AS month_state, $1 AS AB, $2 AS H;
dump final_final;
out_put = FOREACH final_final GENERATE month_state, (H / AB) AS score;
dump out_put;
The expression (H / AB) is doing integer division: if H and AB are int or long columns (SUM of an int column returns a long), the quotient is truncated, which gives 0 whenever H < AB. Cast the operands to float before dividing.
To expand upon the answer above, each column should explicitly be cast to a float:
out_put = FOREACH final_final GENERATE month_state, (FLOAT)((FLOAT)H/(FLOAT)AB) AS score;
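A small Python illustration of why the scores collapse to zero. The rows are made-up sample totals, not data from the question, and `//` stands in for Pig's long division, which behaves the same for non-negative operands.

```python
def scores(rows):
    """Compare truncating and floating-point division of H by AB for each row."""
    result = []
    for month_state, ab, h in rows:
        integer_score = h // ab  # like long / long in Pig: 0 whenever h < ab
        float_score = h / ab     # like (FLOAT)H / (FLOAT)AB
        result.append((month_state, integer_score, float_score))
    return result


sample = [("01_CA", 25, 7), ("01_NY", 40, 12)]  # hypothetical (month_state, AB, H)
for row in scores(sample):
    print(row)
```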

xlrd dynamic variables python

I can make it work like this:
book = xlrd.open_workbook(Path+'infile')
sheet = book.sheet_by_index(0)
A, B, C, D = ([] for i in range (4))
A = sheet.col_values(0)
B = sheet.col_values(1)
C = sheet.col_values(2)
D = sheet.col_values(3)
but what I want is to make it work like this:
dyn_var_list = [A, B, C, D]
assert(len(sheet.row_values(0))==len(dyn_var_list))
for index, col in enumerate(sheet.row_values(0)):
    dyn_var_list[index].append(col)
However, with the code above I only get one value in each list. I guess this is due to the (0) after row_values, but I don't know how to fix it yet.
Try:
for c in range(sheet.ncols):
    for r in range(sheet.nrows):
        dyn_var_list[c].append(sheet.cell(r, c).value)
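Here sheet.ncols gives you the number of columns and sheet.nrows the number of rows in the sheet, so every cell of each column is appended, not just the cells of row 0.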
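Since xlrd needs a real workbook file, the sketch below runs the answer's loop against a hypothetical `FakeSheet` stand-in that mimics only the Sheet members used here (`nrows`, `ncols`, `cell(r, c).value`, `col_values(c)`); the cell values are made up. It also shows that one `col_values` call per column gives the same lists.

```python
from types import SimpleNamespace


class FakeSheet:
    """Hypothetical stand-in for an xlrd Sheet, built from a list of rows."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, r, c):
        return SimpleNamespace(value=self._rows[r][c])

    def col_values(self, c):
        return [row[c] for row in self._rows]


sheet = FakeSheet([["A", "B", "C", "D"], [1, 2, 3, 4], [5, 6, 7, 8]])

# The answer's loop: for every column, append every row's cell value.
dyn_var_list = [[] for _ in range(sheet.ncols)]
for c in range(sheet.ncols):
    for r in range(sheet.nrows):
        dyn_var_list[c].append(sheet.cell(r, c).value)

# Equivalent shortcut: one col_values call per column.
by_column = [sheet.col_values(c) for c in range(sheet.ncols)]

A, B, C, D = dyn_var_list
```

The lists are created fresh with a comprehension, so the loop does not depend on A, B, C and D existing beforehand.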

Multiple Condition Coverage Testing

When using the white-box testing method called Multiple Condition Coverage, do we consider all conditional statements or only the ones with multiple conditions? Maybe the clue is in the name, but I'm not sure.
So if I have the following method:
void someMethod()
{
    if(a && b && (c || (d && e)) ) //Conditional A
    {
    }
    if(z && q) // Conditional B
    {
    }
}
Do I generate the truth table for just "Conditional A", or do I also do Conditional B?
Thanks,
I might be missing something here, but the way you wrote the code in your question, conditionals A and B are completely independent of each other, so you won't cover all of the code unless you test both.
I found the following on Multiple Condition Coverage. It seems to indicate that Multiple Condition Coverage, as the name suggests, only applies to conditionals made up of multiple conditions.
So for the following conditional:
if ((a>0)&&(b<=4)&&(c>0))
we create the following truth table:
Test case | a > 0 | b <= 4 | c > 0
MCC1      |   F   |   F    |   F
MCC2      |   F   |   F    |   T
MCC3      |   F   |   T    |   F
MCC4      |   F   |   T    |   T
MCC5      |   T   |   F    |   F
MCC6      |   T   |   F    |   T
MCC7      |   T   |   T    |   F
MCC8      |   T   |   T    |   T
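The table is simply every true/false combination of the atomic conditions, so a conditional with n conditions needs 2**n rows. The Python sketch below (the helper name `mcc_rows` is hypothetical) regenerates the table in the same order; applied to the question's code, Conditional A (a, b, c, d, e) would need 32 rows and Conditional B (z, q) 4.

```python
from itertools import product


def mcc_rows(condition_names):
    """Yield (test case label, {condition: value}) for every T/F combination."""
    combos = product([False, True], repeat=len(condition_names))
    for i, values in enumerate(combos, start=1):
        yield f"MCC{i}", dict(zip(condition_names, values))


for label, values in mcc_rows(["a > 0", "b <= 4", "c > 0"]):
    flags = " ".join("T" if v else "F" for v in values.values())
    print(label, flags)
```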