Can't invoke Java UDF which accepts Tuple input

I can't figure out how to invoke a Java UDF that accepts a Tuple as input.
gsmCell = LOAD '$gsmCell' using PigStorage('\t') as
(branchId,
cellId: int,
lac: int,
lon: double,
lat: double
);
gsmCellFiltered = FILTER gsmCell BY cellId is not null and
lac is not null and
lon is not null and
lat is not null;
gsmCellFixed = FOREACH gsmCellFiltered GENERATE FLATTEN (pig.parser.GSMCellParser(* ) ) as
(cellId: int,
lac: int,
lon: double,
lat: double,
);
When I wrap the input for GSMCellParser in (), the UDF receives:
Tuple(Tuple).
Pig wraps all the fields into a tuple and puts that tuple inside one more tuple.
When I try to pass a list of fields, or use * or $0.., I get an exception:
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045:
<line 28, column 57> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:761)
at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)
What am I doing wrong?
My aim is to feed my UDF a tuple that contains the list of fields (i.e. the size of the tuple should be 4: cellId, lac, lon, lat).
UPD:
I've tried GROUP ALL:
--filter non valid records
gsmCellFiltered = FILTER gsmCell BY cellId is not null and
lac is not null and
lon is not null and
lat is not null and
azimuth is not null and
angWidth is not null;
gsmCellFilteredGrouped = GROUP gsmCellFiltered ALL;
--fix records
gsmCellFixed = FOREACH gsmCellFilteredGrouped GENERATE FLATTEN (pig.parser.GSMCellParser($1)) as
(cellId: int,
lac: int,
lon: double,
lat: double,
azimuth: double,
ppw,
midDist: double,
maxDist,
cellType: chararray,
angWidth: double,
gen: chararray,
startAngle: double
);
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045:
<line 27, column 64> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
The input schema for this UDF is: Tuple
I don't get the idea.
A tuple is an ordered set of fields. The LOAD function returns a tuple to me.
I want to pass the whole tuple to my UDF.

From the signature of the T EvalFunc<T>.eval(Tuple) method, you can see that every EvalFunc UDF is passed a single Tuple, and that tuple contains all the arguments passed to the UDF.
In your case, calling GSMCellParser(*) means that the first field of that Tuple will be the current tuple being processed (hence the tuple in a tuple).
Conceptually, if you want the tuple to contain just the fields, you should invoke the UDF as GSMCellParser(cellId, lac, lat, lon); the Tuple passed to eval would then have a schema of (int, int, double, double). This also makes your UDF code simpler: you don't have to fish the fields out of the passed 'tuple in a tuple', because you know that field 0 is the cellId, field 1 is the lac, etc.
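The difference between passing the fields and passing one wrapped tuple can be modelled in plain Java, without a Pig dependency. This is only a sketch: a List<Object> stands in for Pig's Tuple, and the TupleArgsDemo class, the packArgs helper, and the sample values are made up for illustration; none of them are part of Pig's API.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model (assumption: List<Object> stands in for Pig's Tuple).
class TupleArgsDemo {

    // Models how the arguments of a UDF call are packed into the single
    // tuple that eval() receives: one field per argument.
    static List<Object> packArgs(Object... args) {
        return Arrays.asList(args);
    }

    public static void main(String[] args) {
        // Hypothetical row: cellId, lac, lon, lat
        List<Object> row = Arrays.<Object>asList(101, 7, 37.61, 55.75);

        // Passing the four fields as four arguments: eval() sees four fields.
        List<Object> flat = packArgs(row.toArray());

        // Passing the row as ONE argument: eval() sees one field,
        // and that field is itself a tuple (the "tuple in a tuple").
        List<Object> nested = packArgs(row);

        System.out.println(flat.size());   // 4
        System.out.println(nested.size()); // 1
    }
}
```

In this model the UDF reads the cellId with get(0) in the flat case, whereas in the nested case it first has to unwrap the inner tuple with get(0) and only then read its fields.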

Related

Create Table variable datatype that would allow to save integer/floats [SQL]

As the title states: when creating a table and defining a column with its data type, like:
CREATE TABLE ExampleTable (
ID INTEGER,
NAME VARCHAR(200),
Integerandfloat
);
Question: you can define a column as an integer, a float, etc. However, is there a data type that can hold both an integer and a floating-point number?

Some databases support variant data types that can have an arbitrary type. For instance, SQL Server has sql_variant.
Most databases also allow you to create your own data type (using create type). However, the power of that functionality depends on the database.
For the choice between a float and an integer, there isn't much choice. An 8-byte floating point representation covers all 4-byte integers, so you can just use a float. However, float is generally not very useful in relational databases. Fixed-point representations (numeric/decimal) are more common and might also do what you want.

Just store it using float.
Think of it this way: you have two variables, one of integer type (let's call it i) and another of float type (let's call it f).
If you do:
i = 0.55
RESULT -> i = 0
But if you have:
f = 0.55
RESULT -> f = 0.55
This way you can also store an integer value in f:
f = 1
RESULT -> f = 1
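The same behaviour can be checked in Java, as a rough sketch rather than a statement about any particular database. It assumes that Java's int, float, and double stand in for a 4-byte SQL integer and for 4-byte and 8-byte floating-point columns; the actual width of a SQL float depends on the database. The NumericStorageDemo class and its helper methods are hypothetical names.

```java
// Sketch: which integers survive being stored in a floating-point variable.
class NumericStorageDemo {

    // True if the int comes back unchanged after a round trip through an 8-byte double.
    static boolean fitsInDouble(int value) {
        return (int) (double) value == value;
    }

    // True if the int comes back unchanged after a round trip through a 4-byte float.
    static boolean fitsInFloat(int value) {
        return (int) (float) value == value;
    }

    public static void main(String[] args) {
        int i = (int) 0.55;  // an integer variable drops the fractional part
        double f = 0.55;     // a floating-point variable keeps it
        double whole = 1;    // and it can hold a whole number as well

        System.out.println(i);     // 0
        System.out.println(f);     // 0.55
        System.out.println(whole); // 1.0

        // An 8-byte double represents every 4-byte integer exactly.
        System.out.println(fitsInDouble(Integer.MAX_VALUE)); // true
        // A 4-byte float does not: 2^24 + 1 is rounded to 2^24.
        System.out.println(fitsInFloat(16_777_217));         // false
    }
}
```

The last two lines illustrate why the size matters: an 8-byte floating-point column is enough for 4-byte integers, while a 4-byte one starts losing whole numbers above 2^24.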

Return tuple of lists from function

I'd like to write a function that returns two lists in Elm but I'm running into issues. It seems like the compiler cannot match the types of the empty list [].
import Html exposing (text)
main =
let
(a, b) = genList
in
text "Hello"
genList: List Float List Float
genList =
([], [])
The compiler errors are as follows:
Detected errors in 1 module.
-- TYPE MISMATCH ---------------------------------------------------------------
`genList` is being used in an unexpected way.
6| (a, b) = genList
^^^^^^^
Based on its definition, `genList` has this type:
List Float List Float
But you are trying to use it as:
( a, b )
-- TYPE MISMATCH ---------------------------------------------------------------
The definition of `genList` does not match its type annotation.
11| genList: List Float List Float
12| genList =
13| ([], [])
The type annotation for `genList` says it is a:
List Float List Float
But the definition (shown above) is a:
( List a, List b )
I haven't found any way of giving a type hint for the empty list. Checking the documentation, it doesn't go that deep:
https://guide.elm-lang.org/core_language.html
http://elm-lang.org/docs/syntax#functions

The type signature also needs the (.., ..) tuple syntax like:
genList: (List Float, List Float)
genList =
([], [])
[] is the correct syntax for generating an empty list though. If you want to know more about the List type, it's probably better to look at the docs on package.elm-lang.org. The two links you shared are more "intro guides" than comprehensive docs.

Cast sql.ColumnName to Double

I need to call scala.math.pow to calculate a number, but I'm having trouble casting a column created in Spark SQL to a Double.
This is the line I use to call the power function.
scala.math.pow(pr, $"numinLinks")
I have a Spark SQL data frame with a column that I attempted to cast to a double using this UDF.
val toDouble = udf[Double, Int](_.toDouble)
Then I called this on my data frame.
val joinDFAdjusted = join.withColumn("numInLinks", toDouble(joinDF("numInLinks")))
In the schema, it shows that my column is of StructField(numInLinks, Double, true). This is the error I receive.
found: org.apache.spark.sql.ColumnName
required: Double

Just use the pow function from org.apache.spark.sql.functions, which operates on columns:
import org.apache.spark.sql.functions.pow
join.withColumn("numInLinksExp", pow($"pr", $"numinLinks"))

Tagged Unions in Elm

I'm reading http://elm-lang.org/guide/model-the-problem and want to better understand Tagged Unions in Elm. Specifically I came across this example:
type Scale = Normal | Logarithmic
type Widget
= ScatterPlot (List (Int, Int))
| LogData (List String)
| TimePlot Scale (List (Time, Int))
The way I think it's interpreted is as follows:
Scale is a type with 2 possible values: Normal or Logarithmic
Widget is a type with 3 possible values: ScatterPlot, LogData, or TimePlot
However, how do I interpret the (List (Int, Int)) part in ScatterPlot? Similarly, how do I interpret the Scale (List (Time, Int)) part in TimePlot?

List is a built-in type, taking one parameter (another type) and meaning "a list containing values of this type as its elements". So List (Int, Int) is a list of (Int, Int). So what's (Int, Int)?
In general any (a, b) is a tuple with members of type a and type b. A tuple is a bit like a record without field names, so you can only distinguish elements by their position - however unlike a list the elements can be of different types. So (Int, Int) is a tuple containing two Ints, where Int is just an integer.
Thus, List (Int, Int) is a list of tuples of two integers.
With TimePlot you've actually got two different type parameters - Scale and List (Time, Int). The latter should now make sense given the explanation of List (Int, Int) - just the tuple has Time as its first type instead of Int.
So TimePlot takes two types as parameters, and it becomes a TimePlot Scale (List (Time, Int)).
In Elm and related languages, type notation (and function application) are defined such that any expression a b c d means a with parameters b, c, and d. If c d is meant to be one parameter it is put in parentheses.
As Andreas says, think of the union 'tags' as functions. They really are functions, and they are usually called constructors (strictly speaking data constructors; a "type constructor" is something like List, which builds a type). TimePlot is a function taking a Scale and a List (Time, Int) and returning a Widget. Normal is a constructor with no parameters, so it is simply a value of type Scale, and so on.

Just think about them as function signatures. So a ScatterPlot must be created like this:
ScatterPlot [(1,1), (2,2)]
and when you pattern match on it in a case expression:
case widget of
ScatterPlot l -> l -- l is of type List (Int, Int)
LogData l -> l -- l is of type List String
TimePlot s l -> l -- s is of type Scale, l is of type List (Time, Int)

What does comparable mean in Elm?

I'm having trouble understanding what exactly a comparable is in Elm. Elm seems as confused as I am.
On the REPL:
> f1 = (<)
<function> : comparable -> comparable -> Bool
So f1 accepts comparables.
> "a"
"a" : String
> f1 "a" "b"
True : Bool
So it seems String is comparable.
> f2 = (<) 1
<function> : comparable -> Bool
So f2 accepts a comparable.
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
comparable
String
So String is and is not comparable?
Why is the type of f2 not number -> Bool? What other comparables can f2 accept?

Normally when you see a type variable in a type in Elm, this variable is unconstrained. When you then supply something of a specific type, the variable gets replaced by that specific type:
-- say you have a function:
foo : a -> a -> a -> Int
-- then once you give a value with an actual type to foo, all occurrences of `a` are replaced by that type:
value : Float
foo value : Float -> Float -> Int
comparable is a type variable with a built-in special meaning. That meaning is that it will only match against "comparable" types, like Int, String and a few others. But otherwise it should behave the same. So I think there is a little bug in the type system, given that you get:
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
comparable
String
If the bug weren't there, you would get:
> f2 "a"
As I infer the type of values flowing through your program, I see a conflict
between these two types:
Int
String
EDIT: I opened an issue for this bug

Compare any two comparable values. Comparable values include String, Char, Int, Float, Time, or a list or tuple containing comparable values. These are also the only values that work as Dict keys or Set members.
taken from the elm docs here.
In older Elm versions:
Comparable types includes numbers, characters, strings,
lists of comparable things, and tuples of comparable things. Note that
tuples with 7 or more elements are not comparable; why are your tuples
so big?
This means that:
[(1,"string"), (2, "another string")] : List (Int, String) -- is comparable
But having
(1, "string", True) : (Int, String, Bool) -- or...
[(1,True), (2, False)] : List (Int, Bool ) -- are ***not comparable yet***.
This issue is discussed here
Note: Usually people encounter problems with the comparable type when they try to use a union type as a Key in a Dict.
Tags and Constructors of union types are not comparable. So the following doesn't even compile.
type SomeUnion = One | Two | Three
Dict.fromList [ (One, "one related"), (Two, "two related") ] : Dict SomeUnion String
Usually when you try to do this, there is a better approach to your data structure. But until this gets decided - an AllDict can be used.

I think this question can be related to this one. Int and String are both comparable in the sense that strings can be compared to strings and ints can be compared to ints. A function that can take any two comparables would have a signature comparable -> comparable -> ... but within any one evaluation of the function both of the comparables must be of the same type.
I believe the reason f2 is confusing above is that 1 is a number instead of a concrete type (which seems to stop the compiler from recognizing that the comparable must be of a certain type, probably should be fixed). If you were to do:
i = 4 // 2
f1 = (<) i -- type Int -> Bool
f2 = (<) "a" -- type String -> Bool
you would see it actually does collapse comparable to the correct type when it can.
