How can I use `apply` with a function that takes multiple inputs - graphlab

I have a function that has multiple inputs, and would like to use SFrame.apply to create a new column. I can't find a way to pass two arguments into SFrame.apply.
Ideally, it would take the entry in the column as the first argument, and I would pass in a second argument. Intuitively something like...
def f(arg_1,arg_2):
return arg_1 + arg_2
sf['new_col'] = sf.apply(f,arg_2)

suppose the first argument of function f is one of the column.
Say argcolumn1 in sf, then
sf['new_col'] = sf['argcolumn1'].apply(lambda x:f(x,arg_2))
should work

Try this.
sf['new_col'] = sf.apply(lambda x : f(arg_1, arg_2))

The way i understand your question (and because none of the previous answers are marked as accepted), it seems to me that you are trying to apply a transformation using two different columns of a single SFrame, so:
As specified in the online documentation, the function you pass to the SFrame.apply method will be called for every row in the SFrame.
So you should rewrite your function to receive a single argument representing the current row, as follow:
def f(row):
return row['column_1'] + row['column_2']
sf['new_col'] = sf.apply(f)

Related

Can anyone tell me what's wrong with my code (I am a newbie in programming, pls do cooperate )

I am trying to write a code which calculates the HCF of two numbers but I am either getting a error or an empty list as my answer
I was expecting the HCF, My idea was to get the factors of the 2 given numbers and then find the common amongst them then take the max out of that
For future reference, do not attach screenshots. Instead, copy your code and put it into a code block because stack overflow supports code blocks. To start a code block, write three tildes like ``` and to end it write three more tildes to close. If you add a language name like python, or javascript after the first three tildes, syntax highlighting will be enabled. I would also create a more descriptive title that more accurately describes the problem at hand. It would look like so:
Title: How to print from 1-99 in python?
for i in range(1,100):
print(i)
To answer your question, it seems that your HCF list is empty, and the python max function expects the argument to the function to not to be empty (the 'arg' is the HCF list). From inspection of your code, this is because the two if conditions that need to be satisfied before anything is added to HCF is never satisfied.
So it could be that hcf2[x] is not in hcf and hcf[x] is not in hcf[x] 2.
What I would do is extract the logic for the finding of the factors of each number to a function, then use built in python functions to find the common elements between the lists. Like so:
num1 = int(input("Num 1:")) # inputs
num2 = int(input("Num 2:")) # inputs
numberOneFactors = []
numberTwoFactors = []
commonFactors = []
# defining a function that finds the factors and returns it as a list
def findFactors(number):
temp = []
for i in range(1, number+1):
if number%i==0:
temp.append(i)
return temp
numberOneFactors = findFactors(num1) # populating factors 1 list
numberTwoFactors = findFactors(num2) # populating factors 2 list
# to find common factors we can use the inbuilt python set functions.
commonFactors = list(set(numberOneFactors).intersection(numberTwoFactors))
# the intersection method finds the common elements in a set.

How do I access dataframe column value within udf via scala

I am attempting to add a column to a dataframe, using a value from a specific column—-let’s assume it’s an id—-to look up its actual value from another df.
So I set up a lookup def
def lookup(id:String): String {
return lookupdf.select(“value”)
.where(s”id = ‘$id’”).as[String].first
}
The lookup def works if I test it on its own by passing an id string, it returns the corresponding value.
But I’m having a hard time finding a way to use it within the “withColumn” function.
dataDf
.withColumn(“lookupVal”, lit(lookup(col(“someId”))))
It properly complains that I’m passing in a column, instead of the expected string, the question is how do I give it the actual value from that column?
You cannot access another dataframe from withColumn . Think of withColumn can only access data at a single record level of the dataDf
Please use a join like
val resultDf = lookupDf.select(“value”,"id")
.join(dataDf, lookupDf("id") == dataDf("id"), "right")

Reading in non-consecutive columns using XLSX.gettable?

Is there a way to read in a selection of non-consecutive columns of Excel data using XLSX.gettable? I’ve read the documentation here XLSX.jl Tutorial, but it’s not clear whether it’s possible to do this. For example,
df = DataFrame(XLSX.gettable(sheet,"A:B")...)
selects the data in columns “A” and “B” of a worksheet called sheet. But what if I want columns A and C, for example? I tried
df = DataFrame(XLSX.gettable(sheet,["A","C"])...)
and similar variations of this, but it throws the following error: MethodError: no method matching gettable(::XLSX.Worksheet, ::Array{String,1}).
Is there a way to make this work with gettable, or is there a similar function which can accomplish this?
I don't think this is possible with the current version of XLSX.jl:
If you look at the definition of gettable here you'll see that it calls
eachtablerow(sheet, cols;...)
which is defined here as accepting Union{ColumnRange, AbstractString} as input for the cols argument. The cols argument itself is converted to a ColumnRange object in the eachtablerow function, which is defined here as:
struct ColumnRange
start::Int # column number
stop::Int # column number
function ColumnRange(a::Int, b::Int)
#assert a <= b "Invalid ColumnRange. Start column must be located before end column."
return new(a, b)
end
end
So it looks to me like only consecutive columns are working.
To get around this you should be able to just broadcast the gettable function over your column ranges and then concatenate the resulting DataFrames:
df = reduce(hcat, DataFrame.(XLSX.gettable.(sheet, ["A:B", "D:E"])))
I found that to get #Nils Gudat's answer to work you need to add the ... operator to give
reduce(hcat, [DataFrame(XLSX.gettable(sheet, x)...) for x in ["A:B", "D:E"]])

Pandas - Apply function and generate more than one row with lambda function

This apply function works but I don't think its efficient;
xyz = data.apply(lambda row: pd.Series({"z":getNVC(row)[0],"y":getNVC(row)[1],"x":getNVC(row)[2]}),axis=1)
So I basically want to apply the NVC function once per row and return an np.array which has 3 elements. I then map these 3 elements to new columns x,y and z. However, I think at the moment I am calling the function 3 times?
Ideally I would like to just call it once, save in the output in a variable, say output and unpack the three elements into the columns. The allocation would probably be something like;
pd.Series({"z":output[0],"y":output[1],"x":output[2]})
Going purely by creating the dict (that is input to Series), while calling getNVC only once, the following may work:
pd.Series( dict(zip("zyx", getNVC(row))) )

How to get Elemwise{tanh,no_inplace}.0 value

I am using Deep learning Theano. How can I see the content of a variable like this: Elemwise{tanh,no_inplace}.0. It is the input data of logistic layer.
Suppose your variable is called t. Then you can evaluate it by calling t.eval(). This may fail if input data are needed. In that case you need to supply them by providing a dictionary like this t.eval({input_var1: value1, input_var2: value2}). This is the ad-hoc way of evaluating a theano-expression.
The way it works in real programs is to create a function taking the necessary input, for example: f = theano.function([input_var1, input_var2], t), will yield a function that takes two input variables, calculates t from them and outputs the result.
Right now, you don't seem to print values but operations. The output Elemwise{tanh,no_inplace}.0 means, that you have an element wise operation of tanh, that is not done in place. You still need to create a function that takes input and executes your operation. Then you need to call that function and print the result. You can read more about that in the graph-structure part of their tutorial.