Is it possible to cast a long to string in a GENERATE conditional using pig? - apache-pig

I have a bunch of longs that are sometimes string "INFINITY" or "NaN".
Assuming A is a record and B is a long:
I've tried doing...
FOREACH A GENERATE (B is not null?B:-1)
Though the above is not accurate as sometimes "B" apparently is a string.
Is there some conditional or compound conditional to check that B is not null and either 1) is not a string, or 2) cast B so that I can make sure it is not null and does not start with "NaN"?
My goal is to convert each value to a number: -1 if it is "NaN", otherwise unchanged.
Describing A shows the following if the value exists (or NaN if it does not):
{
"B":28.2524232
}

Try this:
First load the data as chararray, apply the conditions to it, then convert it back to long. For example:
A = load 'data_file' as (B:chararray);
result1 = FOREACH A GENERATE (B matches '(.*)NaN(.*)'?'-1':(B matches '(.*)INFINITY(.*)'?'-1':B)) as B;
result2 = FOREACH result1 GENERATE (long)B;
It worked for me; hopefully it works for you too.
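To make the two-step logic above easier to reason about outside Pig, here is a minimal Python model of what it does to a single value of B. It is a sketch, not Pig code: `to_long` and its `sentinel` parameter are hypothetical names, and truncating a value like 28.2524232 toward zero is an assumption about the final `(long)` cast that you should check against your Pig version.

```python
import math


def to_long(raw, sentinel=-1):
    """Model of the two FOREACH steps above for one chararray value of B."""
    if raw is None:
        return None  # a null B stays null through both steps
    # Step 1 (result1): anything containing NaN or INFINITY becomes the sentinel.
    if "NaN" in raw or "INFINITY" in raw:
        return sentinel
    # Step 2 (result2): cast the remaining text to an integer.
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        value = float(raw)
    except ValueError:
        return None  # unparseable text; a failed Pig cast also yields null
    if math.isnan(value) or math.isinf(value):
        return sentinel
    return int(value)  # assumption: fractional values are truncated toward zero
```

For example, `to_long("NaN")` gives -1 and `to_long("42")` gives 42.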

Maybe something like this :
FOREACH A GENERATE (B is not null ? (B matches 'NaN' OR B matches 'INFINITY' ? -1 : (long) B) : -1)
Because of the "NaN" and "INFINITY" values, Pig may infer a bytearray or chararray; check the DESCRIBE output as GoBrewers14 recommended.
Alternatively, you can LOAD with a schema that declares B as chararray and then convert it, as I did :)
NB: the "B is not null" check shouldn't be necessary, but it's there just in case ;)

Related

Kotlin. How to convert string to Int without losing precision?

I'm trying to convert a String to an Int. The String can contain a number as an Int or as a Double, but I need to convert it to an Int either way.
Here is my code:
val str = "999.13"
val number = str.toDoubleOrNull()?.roundToInt() ?: 0 // number will be 999
It works, but there is one problem. If the source string contains a very large number, for example 99999999999, I get an incorrect number. After casting the string to a double, I lose precision.
What is the best way to perform such a manipulation without loss of precision? I would like to refrain from using BigDecimal, BigInteger, etc.
Perhaps there is a more elegant solution in Kotlin; please help me.
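A quick Python check suggests two separate limits are involved here (Python's `float` is an IEEE 754 double, like Kotlin's `Double`). Doubles represent every integer exactly only up to 2^53, while Kotlin's `Int` tops out at 2147483647, so 99999999999 survives the trip through a double but cannot fit in an `Int` at all; per the Kotlin docs, `roundToInt()` clamps such values to `Int.MAX_VALUE`. The helper names below are hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # Kotlin Int.MIN_VALUE / Int.MAX_VALUE


def survives_double_round_trip(text):
    """True if an integer string comes back unchanged after passing through a 64-bit float."""
    return int(float(text)) == int(text)


def fits_in_kotlin_int(value):
    """True if value lies within Kotlin's 32-bit Int range."""
    return INT_MIN <= value <= INT_MAX
```

For the question's example, `survives_double_round_trip("99999999999")` is true but `fits_in_kotlin_int(99999999999)` is false, so the result type, not the double, is what loses the value.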
There's no way for Double and Long to hold bigger values than their largest possible values, so of course you will lose precision. That's why BigDecimal/BigInteger exist. They are the only ways to handle numbers that are bigger than the largest values Double and Long can handle, unless you want to handle parsing of the String yourself (note, you are parsing with toDoubleOrNull(), not casting).
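I'm not sure why you'd want to avoid BigDecimal, but you could split the number at the decimal point, use toIntOrNull() or toLongOrNull() on the first part of the String, and use toFloatOrNull() on the second part so you can round it to either 0 or 1 and add that to the first part (or subtract it, for negative numbers) to do the rounding:
import kotlin.math.roundToInt
val result = if ("." !in input)
    input.toIntOrNull()
else {
    val (firstPart, secondPart) = input.split(".", limit = 2)
    val integerPart = firstPart.toIntOrNull()
    // ".13" rounds to 0, ".5" and above round to 1
    val carry = ".$secondPart".toFloatOrNull()?.roundToInt() ?: 0
    // for negative input such as "-2.7", the carry must move away from zero
    integerPart?.let { if (firstPart.startsWith("-")) it - carry else it + carry }
}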
It would be a bit easier to use BigDecimal.
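import java.math.BigDecimal
import java.math.RoundingMode
val result = runCatching {
    // intValueExact() throws (so runCatching yields null) if the value doesn't fit in an Int
    BigDecimal(input).setScale(0, RoundingMode.HALF_UP).intValueExact()
}.getOrNull()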
Both of the above would be simpler if you already know your input is valid.
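For comparison, here are Python sketches of both approaches, each returning None where the Kotlin versions would produce null, including when the rounded value falls outside Kotlin's Int range. The names are hypothetical. The splitting version looks at the first fractional digit instead of parsing the fraction as a float, which avoids float rounding entirely; `ROUND_HALF_UP` in Python's `decimal` module rounds ties away from zero, like Java's `RoundingMode.HALF_UP`.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # Kotlin Int range


def _as_int_or_none(value):
    """Mimic toIntOrNull()/intValueExact(): reject values outside the Int range."""
    return value if INT_MIN <= value <= INT_MAX else None


def round_by_splitting(text):
    """Split at '.', keep the whole part, and round half away from zero
    using only the first fractional digit, so no float is involved."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text[:1] in ("+", "-") else text
    whole, _, frac = body.partition(".")
    if not (whole + frac).isdigit():
        return None
    value = int(whole or "0") + (1 if frac[:1] >= "5" else 0)
    return _as_int_or_none(sign * value)


def round_with_decimal(text):
    """Decimal analogue of BigDecimal(input).setScale(0, HALF_UP).intValueExact()."""
    try:
        value = int(Decimal(text).quantize(Decimal(1), rounding=ROUND_HALF_UP))
    except (InvalidOperation, ValueError):
        return None
    return _as_int_or_none(value)
```

Both return 999 for "999.13", -3 for "-2.5", and None for "99999999999".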

Given no modulus or is-even/is-odd function, how would one check whether a number is odd or even?

I recently sat a computing exam at university in which we had never been taught about the modulus function or any other odd/even check, and we had no access to external documentation except our previous lecture notes. Is it possible to do this without these, and how?
Bitwise AND (&)
Extract the last bit of the number using the bitwise AND operator. If the last bit is 1, then it's odd, else it's even. This is the simplest and most efficient way of testing it. Examples in some languages:
C / C++ / C#
bool is_even(int value) {
return (value & 1) == 0;
}
Java
public static boolean is_even(int value) {
return (value & 1) == 0;
}
Python
def is_even(value):
return (value & 1) == 0
I assume this is only for integer numbers as the concept of odd/even eludes me for floating point values.
For these integer numbers, the check of the Least Significant Bit (LSB) as proposed by Rotem is the most straightforward method, but there are many other ways to accomplish that.
For example, you could use integer division as a test. It is one of the most basic operations and is implemented on virtually every platform. The result of an integer division is always another integer. For example:
>> x = int64( 13 ) ;
>> x / 2
ans =
7
Here I cast the value 13 as an int64 to make sure MATLAB treats the number as an integer instead of a double.
Also note that the result is rounded to the nearest integer (6.5 becomes 7). This is MATLAB-specific behavior; other platforms may truncate instead, but that does not matter here, because the only thing we rely on is that rounding happens, whichever way it goes. The rounding allows us to define the following behavior:
If a number is even: Dividing it by 2 will produce an exact result, such that if we multiply this result by 2, we obtain the original number.
If a number is odd: Dividing it by 2 will result in a rounded result, such that multiplying it by 2 will yield a different number than the original input.
Now you have the logic worked out, the code is pretty straightforward:
%% sample input
x = int64(42) ;
y = int64(43) ;
%% define the checking function
% uses only multiplication and division operators, no high-level function
is_even = @(x) int64(x) == (int64(x)/2)*2 ;
And obviously, this will yield:
>> is_even(x)
ans =
1
>> is_even(y)
ans =
0
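The same idea carries over to Python, where `//` floors the quotient instead of rounding it as MATLAB does; either way, halving an odd number loses information, so doubling the quotient does not give the original number back. A minimal sketch (the function name is hypothetical):

```python
def is_even_by_division(value: int) -> bool:
    # For odd values, value // 2 drops the remainder, so doubling it
    # produces a different number than the input.
    return (value // 2) * 2 == value
```

Because `//` floors, this also works for negative integers: -3 // 2 is -2, and -2 * 2 = -4, which differs from -3.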
I found out from a fellow student how to solve this simplistically with maths instead of functions.
Using (-1)^n :
If n is odd then the outcome is -1
If n is even then the outcome is 1
This is some pretty out-of-the-box thinking, and it is one way to solve this without prior knowledge of functions such as mod.
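In code, the trick amounts to checking the sign of (-1)^n. A Python sketch, assuming n is an integer (for negative n Python returns the float 1.0 or -1.0, which still compares correctly); the function name is hypothetical:

```python
def is_even_by_power(n: int) -> bool:
    # (-1) ** n is 1 when n is even and -1 when n is odd.
    return (-1) ** n == 1
```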

Possible to break out of a reduce operator in presto?

Wondering if it's possible to break out of a reduce operator in presto. Example use case:
I have a table where one column is an array of bigints, and I want to return all rows where the magnitude of the array is less than, say, 1000. So I could write
select
*
from table
where reduce(array_col, 0, (s,x) -> s + power(x,2), s -> if(s < power(1000,2), TRUE, FALSE))
but if there are a lot of rows and the arrays are big, this can take a while. I would like the operator to break and return FALSE as soon as the running sum of squares exceeds power(1000,2). Currently I have:
select
*
from table
where reduce(array_col, 0, (s, x) -> if(s >= power(1000,2), power(1000,2), s + power(x,2)), s -> if(s < power(1000,2), TRUE, FALSE))
which at least saves some computation once the sum exceeds the target value, but still has to iterate through each array element.
There is no support for "break" from array reduction.
Note: technically, you could hack this by generating a failure (e.g. 1/0) where you want to break and catching it with try(). I doubt it's worth it, though.
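Outside SQL, the early exit the question asks for is just a loop with a `return`. The Python sketch below contrasts it with the question's saturating reduce; `functools.reduce`, like Presto's `reduce`, has no way to stop early, so it still calls the lambda for every remaining element after the cap is reached. The limit constant and function names are illustrative.

```python
from functools import reduce

LIMIT_SQ = 1000 ** 2  # the question's power(1000,2)


def under_limit_saturating(values):
    """Like the question's reduce: once the running sum reaches LIMIT_SQ it is
    pinned there, but reduce still visits every remaining element."""
    total = reduce(lambda s, x: LIMIT_SQ if s >= LIMIT_SQ else s + x * x, values, 0)
    return total < LIMIT_SQ


def under_limit_early_exit(values):
    """What a 'break' would buy: stop scanning as soon as the sum of squares reaches LIMIT_SQ."""
    total = 0
    for x in values:
        total += x * x
        if total >= LIMIT_SQ:
            return False
    return True
```

Both functions give the same answers; only the amount of work differs.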

What does comparable mean in Elm?

I'm having trouble understanding what exactly a comparable is in Elm. Elm seems as confused as I am.
On the REPL:
> f1 = (<)
<function> : comparable -> comparable -> Bool
So f1 accepts comparables.
> "a"
"a" : String
> f1 "a" "b"
True : Bool
So it seems String is comparable.
> f2 = (<) 1
<function> : comparable -> Bool
So f2 accepts a comparable.
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
comparable
String
So String is and is not comparable?
Why is the type of f2 not number -> Bool? What other comparables can f2 accept?
Normally when you see a type variable in a type in Elm, this variable is unconstrained. When you then supply something of a specific type, the variable gets replaced by that specific type:
-- says you have a function:
foo : a -> a -> a -> Int
-- then once you give a value with an actual type to foo, all occurrences of `a` are replaced by that type:
value : Float
foo value : Float -> Float -> Int
comparable is a type variable with a built-in special meaning. That meaning is that it will only match against "comparable" types, like Int, String and a few others. But otherwise it should behave the same. So I think there is a little bug in the type system, given that you get:
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
comparable
String
If the bug weren't there, you would get:
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
Int
String
EDIT: I opened an issue for this bug
Compare any two comparable values. Comparable values include String, Char, Int, Float, Time, or a list or tuple containing comparable values. These are also the only values that work as Dict keys or Set members.
Taken from the Elm docs.
In older Elm versions:
Comparable types include numbers, characters, strings, lists of comparable things, and tuples of comparable things. Note that tuples with 7 or more elements are not comparable; why are your tuples so big?
This means that:
[(1,"string"), (2, "another string")] : List (Int, String) -- is comparable
But having
(1, "string", True) : (Int, String, Bool) -- or...
[(1,True), (2, False)] : List (Int, Bool) -- are not comparable yet.
This issue is discussed here
Note: Usually people encounter problems with the comparable type when they try to use a union type as a Key in a Dict.
Tags and Constructors of union types are not comparable. So the following doesn't even compile.
type SomeUnion = One | Two | Three
Dict.fromList [ (One, "one related"), (Two, "two related") ] : Dict SomeUnion String
Usually when you try to do this, there is a better approach to your data structure. But until this is decided, an AllDict can be used.
I think this question can be related to this one. Int and String are both comparable in the sense that strings can be compared to strings and ints can be compared to ints. A function that can take any two comparables would have a signature comparable -> comparable -> ... but within any one evaluation of the function both of the comparables must be of the same type.
I believe the reason f2 is confusing above is that 1 is a number instead of a concrete type (which seems to stop the compiler from recognizing that the comparable must be of a certain type, probably should be fixed). If you were to do:
i = 4 // 2
f1 = (<) i -- type Int -> Bool
f2 = (<) "a" -- type String -> Bool
you would see it actually does collapse comparable to the correct type when it can.

What is one of the more efficient ways to search for a string in an array and get its index?

Given an enum similar to this:
Friend Enum TestValue As Int32
tstNotSet = -1
tstA = 0
tstB = 1
tstC = 2
tstD = 3
tstE = 4
tstF = 5
tstG = 6
tstH = 7
tstI = 8
tstJ = 9
tstK = 10
tstL = 11
tstM = 12
End Enum
And an Array similar to this:
Dim TestValues() As String = {"A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M"}
And a string is fed in as input in some form similar to this (assume it is already stored in a variable):
Dim tmpInput As String = "ADFGHJLM"
And a Sub/Method in another arbitrary class takes as input an array of one or more of the Enums from TestValue, based on the input string tmpInput. Basically, I want to walk the tmpInput variable and, for each character, find its equivalent member in the Enum, so that I can pass it to this Sub in the other object. The string array TestValues and the Enum TestValue (yes, the names could be better, but don't let that bother you too much) are laid out to match each other explicitly.
So I basically want to search the array for the matching letter, and use its index offset to know which Enum I want to map to that letter. My current code uses a large Select Case statement, but that's just ugly (although, performance tests show it to be rather speedy, even in the debug build).
The purpose of this test case is to provide an example of a mechanism I use in a project I'm working on. In this project, objects have a ReadOnly property that returns a string of letters composed from TestValues. It also has a Sub that accepts an array of one or more Enums from TestValue that sets a private member in the object that is used by the aforementioned ReadOnly property. The purpose was to store an array of smaller integer values (the Enums) rather than an array of strings for the object's internal functionality. So I had to create a way to map back and forth between the string array and the enum.
Yes, it's easily doable with the many different collections available in .NET, but I feel those are too heavyweight for my needs, as many of my objects have enums as small as two values, hence, arrays. I borrowed the trick from a similar example used in C to be able to select a string from a const array based on an index offset.
Anyways, as I've discovered, searching arrays in VB.NET is not trivial. There is apparently no simple command like TestValues.GetIndex(tmp(i)), where tmp(i) is a variable holding a single character (String, not Char), that would return say, '8' if tmp(i) was set to 'I'. Instead, methods like Array.FindIndex require using delegates/predicates, something I haven't fully wrapped my head around just yet (they seem like function pointers from C).
So what's the best way, other than constantly looping over the array for every input character, to locate the index offset based on the stored value? Is the method I highlight sane or insane (hint: it's a hold-over from VBA code)? Is there a more efficient way?
Oh, and yes, the ReadOnly property does check that the internal members are NOT set to tstNotSet before attempting to read from the TestValues array. That's why that Enum member exists.
Thanks!
EDIT: Hopefully this doesn't muddle the explanation up too much, but here's an example, as simplified as I can get it, of how the look up currently operates using the array, enum, and input string as defined above:
Dim n As Int32 = 0
Dim Foobar(0 to 12) As TestValue
For Each s As String In tmpInput
Select Case Char.ToUpper(CChar(s))
Case CChar(TestValues(tstA))
Foobar(n) = tstA
n += 1
Case CChar(TestValues(tstB))
Foobar(n) = tstB
n += 1
Case CChar(TestValues(tstC))
Foobar(n) = tstC
n += 1
Case CChar(TestValues(tstD))
Foobar(n) = tstD
n += 1
Case CChar(TestValues(tstE))
Foobar(n) = tstE
n += 1
Case CChar(TestValues(tstF))
Foobar(n) = tstF
n += 1
Case CChar(TestValues(tstG))
Foobar(n) = tstG
n += 1
Case CChar(TestValues(tstH))
Foobar(n) = tstH
n += 1
Case CChar(TestValues(tstI))
Foobar(n) = tstI
n += 1
Case CChar(TestValues(tstJ))
Foobar(n) = tstJ
n += 1
Case CChar(TestValues(tstK))
Foobar(n) = tstK
n += 1
Case CChar(TestValues(tstL))
Foobar(n) = tstL
n += 1
Case CChar(TestValues(tstM))
Foobar(n) = tstM
n += 1
End Select
Next
As noted in my comment to Jon Skeet, this construct, along with the rest of the Object's components, executes 100,000 times in a profiling loop in ~570ms (rough average of 3-5 runs).
Replacing the above construct with a smaller Array.IndexOf construct, the same 100,000-iteration loop takes ~630ms (again a rough average of 3-5 runs, for the whole Object). The new construct looks like this:
Dim p As Int32
p = Array.IndexOf(TestValues, s)
If p <> tstNotSet Then
Foobar(n) = DirectCast(p, TestValue)
n += 1
End If
I'm afraid I found your question extremely hard to understand, but is Array.IndexOf what you're looking for?
Dim index = Array.IndexOf(TestValues, tmp(i))
I've got trouble tying a rope to this question. But in a lookup scenario like this you generally want a Dictionary: you get O(1) average lookup time instead of the O(n) scan of an array. Something like this:
Dim lookup As New Dictionary(Of Char, TestValue)
lookup.Add("A"c, TestValue.tstA)
lookup.Add("B"c, TestValue.tstB)
'' etc
You can make the initialization cleaner in many ways. Then:
Dim value As TestValue = lookup(letter)
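A Python sketch of the same Dictionary idea, building the table once from the parallel letter array so the two cannot drift apart. The `TestValue` enum here mirrors the question's, and `LETTERS` and `parse_letters` are hypothetical names; as in the question's Select Case, characters without a match are simply skipped.

```python
from enum import IntEnum

LETTERS = "ABCDEFGHIJKLM"  # plays the role of the TestValues array

# Mirrors the question's enum: tstNotSet = -1, then tstA = 0 through tstM = 12.
TestValue = IntEnum(
    "TestValue",
    {"tstNotSet": -1, **{f"tst{letter}": i for i, letter in enumerate(LETTERS)}},
)

# Built once: letter -> enum member, derived from the letter array.
LOOKUP = {letter: TestValue(i) for i, letter in enumerate(LETTERS)}


def parse_letters(text):
    """Map each letter of text (case-insensitively) to its TestValue; unknown characters are skipped."""
    return [LOOKUP[ch] for ch in text.upper() if ch in LOOKUP]
```

With the question's input, `parse_letters("ADFGHJLM")` yields tstA, tstD, tstF, tstG, tstH, tstJ, tstL, tstM.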
I would say the solution by @Hans Passant is the way to go, but since you are dealing with chars, and chars are numbers, there is an alternative that doesn't need a Dictionary.
You could store all the TestValue enum values in an array and do something like testValueResult = testValueArray(charCode - 65), i.e. map 'A' to index 0, 'B' to 1, and so on. Or you could even cast that number directly to the Enum, since you define it as an integer, and include a simple bounds check that falls back to tstNotSet.
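The character-code arithmetic can be sketched in Python as well, again with an enum mirroring the question's. `letter_to_value` is a hypothetical name, and the input is assumed to be a single ASCII character.

```python
from enum import IntEnum

# Mirrors the question's enum: tstNotSet = -1, then tstA = 0 through tstM = 12.
TestValue = IntEnum(
    "TestValue",
    {"tstNotSet": -1, **{f"tst{letter}": i for i, letter in enumerate("ABCDEFGHIJKLM")}},
)


def letter_to_value(ch):
    """Map a single ASCII letter 'A'..'M' (either case) to tstA..tstM by code arithmetic."""
    offset = ord(ch.upper()) - ord("A")  # same as the answer's charCode - 65
    if 0 <= offset <= TestValue.tstM:  # bounds check; anything else is tstNotSet
        return TestValue(offset)
    return TestValue.tstNotSet
```

This matches the question's example: `letter_to_value("I")` is tstI, whose value is 8.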