Pig hadoop returning 0's in division - apache-pig

I am trying to divide H by AB for each line. The H / AB expression further below does divide, but it produces an output of all zeros. I am really confused.
sum_of_scores = FOREACH final_group GENERATE group AS id,
SUM(s.AB) AS AB,
SUM(s.H) AS H;
final_final = FOREACH sum_of_scores GENERATE $0 AS month_state, $1 AS AB, $2 AS H;
dump final_final;
out_put = FOREACH final_final GENERATE month_state, (H / AB) AS score;
dump out_put;

It appears that the expression (H / AB) is using integer division, so the operands should first be converted with the cast operators, for example to float.

To expand upon the answer above, each column should explicitly be cast to a float:
out_put = FOREACH final_final GENERATE month_state, (FLOAT)((FLOAT)H/(FLOAT)AB) AS score;
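The same effect can be reproduced outside Pig. Below is a minimal Python sketch, assuming H and AB are integer sums with H smaller than AB. The values are made up for illustration, and Python's `//` operator stands in for Pig's integer division.

```python
# Hypothetical sums for one group; the real values come from SUM(s.H) and SUM(s.AB).
h = 27
ab = 100

truncated = h // ab            # integer division drops the fractional part
score = float(h) / float(ab)   # casting the operands first keeps the fraction

print(truncated)  # 0
print(score)      # 0.27
```

Whenever H is smaller than AB, the truncated quotient is 0, which would explain a column of all zeros.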


Just want to check whether the way I have used the assignment expression in a Python 3.8 program is correct

According to Heron's formula, the area of a triangle with sides a, b and c is:
s = (a+b+c) / 2
area = sqrt(s * (s-a) * (s-b) * (s-c)) # sqrt means square root
So, to find the area of a triangle using Heron's formula in Python, would it be valid practice to write the code like this? I have used an assignment expression while calculating the area.
a = int(input("Enter value of first side")) # Assuming value is integer
b = int(input("Enter value of second side")) # Assuming value is integer
c = int(input("Enter value of third side")) # Assuming value is integer
area = ((s := (a+b+c) /2) *(s -a)*(s-b)*(s-c))**0.5
print("Area of the triangle is", area)
Yes, the program computes the area correctly. There is one drawback in the input: I recommend adding '\n' to each prompt, because it is uncomfortable to enter values right next to the prompt text. I fixed that in the code below. You still need to add a check that the sides can actually form a triangle.
a = int(input("Enter value of first side\n"))  # Assuming value is integer
b = int(input("Enter value of second side\n"))  # Assuming value is integer
c = int(input("Enter value of third side\n"))  # Assuming value is integer
area = ((s := (a+b+c) /2) *(s -a)*(s-b)*(s-c))**0.5
print("Area of the triangle is", area)
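The triangle check mentioned in the answer could look like the following minimal sketch. The function name heron_area is hypothetical, and raising ValueError is only one possible way to report invalid sides.

```python
def heron_area(a: float, b: float, c: float) -> float:
    """Return the area of a triangle with sides a, b, c using Heron's formula."""
    # Each side must be positive and shorter than the sum of the other two.
    if min(a, b, c) <= 0 or a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("sides do not form a triangle")
    # The assignment expression binds s and reuses it in the same expression.
    return ((s := (a + b + c) / 2) * (s - a) * (s - b) * (s - c)) ** 0.5


print(heron_area(3, 4, 5))  # 6.0
```

Wrapping the formula in a function keeps the validation next to the calculation and avoids calling input() inside the computation itself.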

Define the function for distance matrix in ampl. Keep getting "i is not defined"

I'm trying to set up an AMPL model that clusters given points in a 2-dimensional space according to the model of Saglam et al. (2005). For testing purposes I want to randomly generate some data points and then calculate the Euclidean distance matrix for them (since I need it). I'm aware that I could generate the distance matrix directly without the data points, but in a later step the data points will be given and then I will need to calculate the distances between each pair of points.
Below you'll find the code I've written so far. While loading the model I keep getting the error message "i is not defined". Since i is a subscript that should run over x1, and x1 is a parameter defined over the set D with one subscript, I cannot figure out why this code should be invalid. As far as I understand, I don't have to define variables if I use them only as subscripts?
reset;
# parameters to define clustered
param m; # numbers of data points
param n; # numbers of clusters
# sets
set D := 1..m; #points to be clustered
set L := 1..n; #clusters
# randomly generate datapoints
param x1 {D} = Uniform(1,m);
param x2 {D} = Uniform(1,m);
param d {D,D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
# variables
var x {D, L} binary;
var D_l {L} >=0;
var D_max >= 0;
#minimization function
minimize max_clus_dis: D_max;
# constraints
subject to C1 {i in D, j in D, l in L}: D_l[l] >= d[i,j] * (x[i,l] + x[j,l] - 1);
subject to C2 {i in D}: sum{l in L} x[i,l] = 1;
subject to C3 {l in L}: D_max >= D_l[l];
So far I have tried changing the line from param x1 to
param x1 {i in D, j in D} = ...
as well as
param d {x1, x2} = ...
Alas, none of this helped. So any help is deeply appreciated. I searched the web but found nothing useful for my task.
I eventually found what was missing. The line in which I calculated the parameter d should be
param d {i in D, j in D} = sqrt((x1[i]-x1[j])^2 + (x2[i]-x2[j])^2);
In retrospect it's clear that the subscripts i and j have to be declared in the indexing expression on that line; I don't know how I could have missed that.
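The same rule shows up in other languages: an index only exists once something introduces it. Here is a minimal Python sketch of the distance matrix, assuming m = 4 data points chosen only for illustration. The two for clauses play the role of {i in D, j in D}.

```python
import math
import random

m = 4                      # hypothetical number of data points
points = range(1, m + 1)   # plays the role of the set D

# Randomly generated coordinates, like x1 {D} and x2 {D} in the model.
x1 = {i: random.uniform(1, m) for i in points}
x2 = {i: random.uniform(1, m) for i in points}

# i and j are introduced by the two "for" clauses, just as
# {i in D, j in D} introduces them in the AMPL declaration.
d = {
    (i, j): math.sqrt((x1[i] - x1[j]) ** 2 + (x2[i] - x2[j]) ** 2)
    for i in points
    for j in points
}
```

Dropping the for clauses would give a NameError for i, which is the Python counterpart of "i is not defined".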

Destructuring a list with equations in maxima

Say that I have the following list of equations:
list: [x=1, y=2, z=3];
I use this pattern often to have multiple return values from a function, kind of like how you would use an object in, for example, JavaScript. However, in JavaScript I can do things like this. Say that myFunction() returns the object {x:1, y:2, z:3}, then I can destructure it with this syntax:
let {x,y,z} = myFunction();
And now x,y,z are assigned the values 1,2,3 in the current scope.
Is there anything like this in maxima? Now I use this:
x: subst(list, x);
y: subst(list, y);
z: subst(list, z);
How about this. Let l be a list of equations of the form somesymbol = somevalue. I think all you need is:
map (lhs, l) :: map (rhs, l);
Here map(lhs, l) yields the list of symbols, and map(rhs, l) yields the list of values. The operator :: means evaluate the left-hand side and assign the right-hand side to it. When the left-hand side is a list, then Maxima assigns each value on the right-hand side to the corresponding element on the left.
E.g.:
(%i1) l : [a = 12, b = 34, d = 56] $
(%i2) map (lhs, l) :: map (rhs, l);
(%o2) [12, 34, 56]
(%i3) values;
(%o3) [l, a, b, d]
(%i4) a;
(%o4) 12
(%i5) b;
(%o5) 34
(%i6) d;
(%o6) 56
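For readers more used to other languages, the pairing that :: performs can be sketched in Python. This is only an analogy: the list of (name, value) tuples stands in for the Maxima list of equations, and the bindings go into a dictionary rather than into the current scope.

```python
# Stand-in for the Maxima list [a = 12, b = 34, d = 56]: (lhs, rhs) pairs.
equations = [("a", 12), ("b", 34), ("d", 56)]

names = [lhs for lhs, _ in equations]    # like map(lhs, l)
values = [rhs for _, rhs in equations]   # like map(rhs, l)

# Bind each name to the value at the same position, as :: does for lists.
bindings = dict(zip(names, values))

print(bindings["a"], bindings["b"], bindings["d"])  # 12 34 56
```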
You can probably achieve this by writing a function that could be called as f(['x, 'y, 'z], list); but you will need a way to make assignments between symbols and values. This can be done by writing a tiny ad hoc Lisp function:
(defun $assign (symb val) (set symb val))
You can see how it works (as a first test) by first typing (from within Maxima):
:lisp (defun $assign (symb val) (set symb val))
Then, use it as: assign('x, 42) which should assign the value 42 to the Maxima variable x.
If you want to go with that idea, you should write a tiny Lisp file in your ~/.maxima directory (this is a directory where you can put your most used functions); call it for instance myfuncs.lisp and put the function above (without the :lisp prefix); then edit (in the very same directory) your maxima-init.mac file, which is read at startup and add the two following things:
add a line containing load("myfuncs.lisp"); before the following part;
define your own Maxima function (in plain Maxima syntax with no need to care about Lisp). Your function should contain some kind of loop for performing all assignments; now you could use the assign(symbol, value) function for each variable.
Your function could be something like:
f(vars, l) := for i:1 thru length(l) do assign(vars[i], l[i]) $
which merely assigns each value from the second argument to the corresponding symbol in the first argument.
Thus, f(['x, 'y], [1, 2]) will perform the expected assignments; of course, you can start from that to do more precisely what you need.

BigQuery UDF using BYTES datatype

I am currently trying to calculate the Hamming distance between two binary strings in BigQuery using user-defined functions in JavaScript. My schema is quite simple:
row_id STRING
descriptors BYTES REPEATED
phash BYTES
What I find a bit confusing is that BigQuery apparently handles BYTES as a Base64 string. I imported both atob() and btoa() so that I could work with the binary form of the byte strings instead of the Base64 representation.
My Query currently looks like this:
CREATE TEMP FUNCTION f_PHASH_distance(ph1 BYTES, ph2 BYTES)
RETURNS INT64
LANGUAGE js AS
"""
return HammingDistance(ph1, ph2);
"""
OPTIONS (
library=["gs://test.appspot.com/HammingDistance.js",
"gs://test.appspot.com/btoa_atob.js"]
);
SELECT f_PHASH_distance(phash, CAST("9Slp3g9OgVI=" AS BYTES))
FROM ims.images WHERE row_id = "2333USX"
The phash of the row with row_id = "2333USX" is equal to "9Slp3g9OgVI=" in base64, which means that the Hamming distance should be 0. But instead of 0 I am currently getting 35 on BigQuery.
HammingDistance.js has the following content:
function HammingDistance(a, b){
var count = 0;
for(var i = 0; i < a.length; i++){
// calculate XOR between the two chars
var xor = a.charCodeAt(i) ^ b.charCodeAt(i);
// count number of 1's on the result
for(var j = 0; j < 16; j++){
//add if LSB is 1
count += xor % 2;
//right shift the variable
xor = xor >> 1;
}
}
return count;
}
/**
* Calculates the distance between two Perceptual hashes of two images encoded
* in base 64
*/
function PHASHDistance(a, b){
return HammingDistance(atob(a), atob(b));
}
Testing it in the JS console of my browser, I do get the expected result. So I assume that I am doing something wrong with the casts, but the documentation on UDFs with BYTES parameters is very scarce.
Any help would be much appreciated.
It looks like the problem is that you are casting "9Slp3g9OgVI=" to bytes rather than converting it to bytes from base64. I think you want this instead:
SELECT f_PHASH_distance(phash, FROM_BASE64("9Slp3g9OgVI="))
FROM ims.images WHERE row_id = "2333USX"
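The difference between the two conversions can be illustrated outside BigQuery. This is a minimal Python sketch, assuming that casting a string to bytes keeps the UTF-8 bytes of its characters, while the base64 conversion decodes them.

```python
import base64

text = "9Slp3g9OgVI="

as_cast = text.encode("utf-8")        # the bytes of the twelve characters themselves
as_decoded = base64.b64decode(text)   # the eight bytes that the base64 text encodes

print(len(as_cast))           # 12
print(len(as_decoded))        # 8
print(as_cast == as_decoded)  # False
```

The two values differ in both length and content, so comparing one against the stored hash cannot give a distance of 0.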
You might be better off using SQL functions rather than JavaScript functions, though, since JavaScript normally isn't as fast. Here's a Hamming distance implementation in SQL, assuming that the bytes have equal lengths:
#standardSQL
CREATE TEMP FUNCTION HammingDistance(b1 BYTES, b2 BYTES) AS (
BIT_COUNT(b1 ^ b2)
);
WITH Input AS (
SELECT b'defdef' AS bytes UNION ALL
SELECT b'123de4' UNION ALL
SELECT b'abc123'
)
SELECT HammingDistance(b'abcdef', bytes)
FROM Input;
It takes the bitwise XOR of the two byte values, then checks how many bits are not the same.
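The XOR-then-count idea can be sketched in Python as follows. The function name hamming_distance is hypothetical, and like the SQL version it assumes both inputs have the same length.

```python
def hamming_distance(b1: bytes, b2: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(b1) != len(b2):
        raise ValueError("inputs must have equal lengths")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits per byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(b1, b2))


print(hamming_distance(b"abcdef", b"abcdef"))  # 0
print(hamming_distance(b"\x00", b"\xff"))      # 8
```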
In case someone is looking for a solution in the case of comparing regular strings (not binary ones as this question), look at my answer here

Combinations using FOREACH in Pig

I would like to generate combinations in Pig with the help of FOREACH. Is there any way to do this?
My Input:
A
B
C
Objective:
A,B
A,C
B,C
Here's the sample I have tried. It fails with "Syntax error, unexpected symbol at or near '$0'".
A = load '/test';
B = foreach A generate $0;
Combination = Cross A, B;
Combination_Filter = foreach Combination generate $0 < $1;
Please help me in resolving this. Thanks in Advance
Can you try the below options?
input:
A
B
C
Option1:
A = LOAD 'input' AS(f1:chararray);
B = LOAD 'input' AS(f2:chararray);
C = CROSS A,B;
D = FILTER C BY A::f1 < B::f2;
DUMP D;
Option2:
A = LOAD 'input' AS (f1:chararray);
B = FOREACH A GENERATE f1 AS (f2:chararray);
C = CROSS A,B;
D = FILTER C BY A::f1 < B::f2;
DUMP D;
Output:
(A,B)
(A,C)
(B,C)
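The cross-then-filter logic of both options can be checked with a small Python sketch. It is only an analogy for the Pig statements, with the three input rows hard-coded.

```python
from itertools import product

rows = ["A", "B", "C"]  # the three input records

# CROSS: every ordered pair of records, including (A, A) and (B, A).
crossed = product(rows, rows)

# FILTER ... BY f1 < f2: keep each unordered pair once and drop self-pairs.
pairs = [(f1, f2) for f1, f2 in crossed if f1 < f2]

print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The strict comparison is what removes both the duplicates such as (B, A) and the self-pairs such as (A, A).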
It is not possible to do that using only FOREACH. The only way to achieve something like that would be with Sivasakthi's answer or with a custom UDF. You can put all the records in a bag with a GROUP ALL and then run the UDF.
The UDF is in this other question: How to turn (A, B, C) into (AB, AC, BC) with Pig?
The code would be something like:
A = load '/test';
A_grouped = group A all;
A_combinations = foreach A_grouped generate CombinationsUDF(A);