Saving state of closure in Groovy - serialization

I would like to use a Groovy closure to process data coming from a SQL table. For each new row, the computation would depend on what has been computed previously. However, new rows may become available on further runs of the application, so I would like to be able to reload the closure, initialised with the intermediate state it had when the closure was last executed in the previous run of the application.
For example, a closure intending to compute the moving average over 3 rows would be implemented like this:
def prev2Val = null
def prevVal = null
def prevId = null
Closure c = { row ->
    println([prev2Val, prevVal, prevId])
    def latestVal = row['val']
    if (prev2Val != null) {
        def movMean = (prev2Val + prevVal + latestVal) / 3
        sql.execute("INSERT INTO output(id, val) VALUES (?, ?)", [prevId, movMean])
    }
    sql.execute("UPDATE test_data SET processed=TRUE WHERE id=?", [row['id']])
    prev2Val = prevVal
    prevVal = latestVal
    prevId = row['id']
}
test_data has 3 columns: id (auto-incremented primary key), val and processed. A moving mean is calculated from the current value and the two previous values, and inserted into the output table against the id of the previous row. Processed rows are flagged with processed=TRUE.
If all the data was available from the start, this could be called like this:
sql.eachRow("SELECT id, val FROM test_data WHERE processed=FALSE ORDER BY id", c)
The problem comes when new rows become available after the application has already been run. This can be simulated by processing a small batch each time (e.g. using LIMIT 5 in the previous statement).
I would like to be able to dump the full state of the closure at the end of the execution of eachRow (saving the intermediate data somewhere in the database, for example) and re-initialise it when I re-run the whole application (by loading those intermediate variables from the database).
In this particular example, I can do this manually by storing the values of prev2Val, prevVal and prevId, but I'm looking for a generic solution where knowing exactly which variables are used wouldn't be necessary.
Perhaps something like c.getState() which would return [ prev2Val: 1, prevVal: 2, prevId: 6] (for example), and where I could use c.setState([ prev2Val: 1, prevVal: 2, prevId: 6]) next time the application is executed (if there is a state stored).
I would also need to exclude sql from the list. It seems this can be done using c.@sql=null.
I realise this is unlikely to work in the general case, but I'm looking for something sufficiently generic for most cases. I've tried to dehydrate, serialize and rehydrate the closure, as described in this Groovy issue, but I'm not sure how to save and store all the @ fields in a single operation.
Is this possible? Is there a better way to remember state between executions, assuming the list of variables used by the closure isn't necessarily known in advance?

I'm not sure this will work in the long run, and you might be better off returning a list containing the values to pass to the closure to get the next set of data, but you can interrogate the binding of the closure.
Given:
def closure = { row ->
a = 1
b = 2
c = 4
}
If you execute it:
closure( 1 )
You can then compose a function like:
def extractVarsFromClosure( Closure cl ) {
    cl.binding.variables.findAll {
        !it.key.startsWith( '_' ) && it.key != 'args'
    }
}
Which when executed:
println extractVarsFromClosure( closure )
prints:
['a':1, 'b':2, 'c':4]
However, any 'free' variables defined in the local binding (without a def) will be in the closure's binding too, so:
fish = 42
println extractVarsFromClosure( closure )
will print:
['a':1, 'b':2, 'c':4, 'fish':42]
But
def fish = 42
println extractVarsFromClosure( closure )
will not print fish, because a variable declared with def is local to the script and never enters the binding.
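A different route from inspecting the binding is to keep the running state in an object that can export and restore it explicitly. The sketch below is in Java rather than Groovy and is only an illustration: the class MovingMeanState and its getState/setState methods are hypothetical names echoing the question, and persisting the returned map to the database is left out. Unlike the generic solution asked for, it requires listing the state variables once, inside the class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the moving-mean state from the question.
class MovingMeanState {
    private Double prev2Val;
    private Double prevVal;
    private Integer prevId;

    // Feeds one row in; returns the 3-row mean once two earlier values
    // are known, or null while the window is still filling up.
    Double accept(int id, double val) {
        Double mean = null;
        if (prev2Val != null) {
            mean = (prev2Val + prevVal + val) / 3;
        }
        prev2Val = prevVal;
        prevVal = val;
        prevId = id;
        return mean;
    }

    // Snapshot of the intermediate state, ready to be stored somewhere.
    Map<String, Object> getState() {
        Map<String, Object> state = new HashMap<>();
        state.put("prev2Val", prev2Val);
        state.put("prevVal", prevVal);
        state.put("prevId", prevId);
        return state;
    }

    // Restores a snapshot produced by getState() in an earlier run.
    void setState(Map<String, Object> state) {
        prev2Val = (Double) state.get("prev2Val");
        prevVal = (Double) state.get("prevVal");
        prevId = (Integer) state.get("prevId");
    }

    public static void main(String[] args) {
        MovingMeanState firstRun = new MovingMeanState();
        firstRun.accept(1, 1.0);
        firstRun.accept(2, 2.0);
        Map<String, Object> saved = firstRun.getState();

        MovingMeanState secondRun = new MovingMeanState();
        secondRun.setState(saved);
        System.out.println(secondRun.accept(3, 6.0)); // 3.0
    }
}
```

The second instance picks up exactly where the first one stopped, which is the behaviour wanted when a new batch of rows arrives in a later run.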


Meaning of "Lua does not perform the primitive assignment." in 2.4 (concerning __newindex)

From the Lua 5.3 reference manual (https://www.lua.org/manual/5.3/manual.html), section 2.4. The description of the __newindex metamethod reads:
__newindex: The indexing assignment table[key] = value. Like the index event, this event happens when table is not a table or when key is not
present in table. The metamethod is looked up in table.
Like with indexing, the metamethod for this event can be either a
function or a table. If it is a function, it is called with table,
key, and value as arguments. If it is a table, Lua does an indexing
assignment to this table with the same key and value. (This assignment
is regular, not raw, and therefore can trigger another metamethod.)
Whenever there is a __newindex metamethod, Lua does not perform the
primitive assignment. (If necessary, the metamethod itself can call
rawset to do the assignment.)
My question is what the following part specifically means:
"Lua does not perform the primitive assignment. (If necessary, the metamethod itself can call rawset to do the assignment.)"
Does this mean that if the value is a number, which is a primitive, it will not be assigned to the provided table through the metamethod event and we have to use rawget or something? This is very confusing and contradictory to me.
Some examples should help clear up this confusion.
The primitive assignment example:
local test = {}
test['x'] = 1 -- equal to rawset(test, 'x', 1)
print(test['x']) -- 1
print(rawget(test,'x')) -- 1
The primitive assignment test['x'] = 1 is equivalent to rawset(test, 'x', 1) when the table test has no __newindex metamethod.
Next, the __newindex metamethod example:
local test = {}
setmetatable(test, {__newindex = function(t,key,value) end})
test['x'] = 1
print(test['x']) -- nil
print(rawget(test,'x')) -- nil
Here the assignment test['x'] = 1 calls the __newindex function instead of storing the value.
Since this __newindex does nothing, nothing is stored, and test['x'] is nil.
If the __newindex function calls rawset:
local test = {}
setmetatable(test, {
    __newindex = function(t, key, value)
        rawset(t, key, value) -- t: test, key: 'x', value: 1
    end
})
test['x'] = 1
print(test['x']) -- 1
print(rawget(test,'x')) -- 1
This code has the same effect as the first example.
That is what the manual means by:
"Lua does not perform the primitive assignment. (If necessary, the metamethod itself can call rawset to do the assignment.)"
"Primitive" here refers to the raw assignment itself, not to the type of the value being assigned.
So what is __newindex useful for?
One use is to keep newly assigned keys separate from the keys the table already had.
local test = {y = 1}
local newtest = {}
setmetatable(test, {
    __newindex = function(t, key, value)
        newtest[key] = value
    end,
    __index = newtest
})
test["x"] = 1
print(test['x']) -- 1
print(test['y']) -- 1
print(rawget(test, 'x')) -- nil
print(rawget(test, 'y')) -- 1
The old index 'y' and the new index 'x' can both be accessed through test[key], and can be told apart with rawget(test, key): only the old one is stored in test itself.

Using an associative array to count values in Lua

I want to count how many Redis keys there are of each data type. I wrote the following code, but it fails with an error. How can I fix it?
local detail = {}
detail.hash = 0
detail.set = 0
detail.string = 0
local match = redis.call('KEYS','*')
for i,v in ipairs(match) do
    local val = redis.call('TYPE',v)
    detail.val = detail.val + 1
end
return detail
(error) ERR Error running script (call to f_29ae9e57b4b82e2ae1d5020e418f04fcc98ebef4): #user_script:10: user_script:10: attempt to perform arithmetic on field 'val' (a nil value)
The error tells you that detail.val is nil. That means that there is no table value for key "val". Hence you are not allowed to do any arithmetic operations on it.
Problem a)
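detail.val is syntactic sugar for detail["val"], so it always uses the literal key "val". If you expect val to be a string, the correct way to use its value as a table key is detail[val].
Possible problem b)
From a quick search, this Redis call might return a table rather than a string. Redis converts a status reply such as the one from TYPE into a Lua table with a single ok field, in which case the type name would be val.ok. So if detail[val] doesn't work, check val's type.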

Using a table for variable name in a table is not found when called for

I am building something fairly complex and I am trying to use tables as variable names (keys) in another table, because I have found that Lua accepts it, that is:
{[{1,2}]="Meep"}
The issue is looking the value up: when I index with the same kind of table, it is not found.
I have tried searching for an explanation, but I have no clue why it won't do this.
local c = {[{1,2}]="Meep"}
print(c[{1,2}],c)
I expect to get
"Meep",{[{1,2}]="Meep"}
but what I get is
nil,{[{1,2}]="Meep"}
However, if I try
local m={1,2}
local c = {[m]="Meep"}
print(c[m],c)
I get the correct result. Is there a way to avoid that middleman? After all, m=={1,2} should return true.
The problem is that tables in Lua are handled by reference. If you compare two different tables, you are comparing those references, so the comparison is only true when both operands are the very same table. In particular, m=={1,2} is false, because {1,2} creates a new table.
t = { 1, 2, 3 }
t2 = { 1, 2, 3 }
print(t == t) -- true
print(t2 == t) -- false
print(t2 == t2) -- true
For the same reason, a table passed to a function is passed by reference, so the function can modify the caller's table.
function f(t)
    t[1] = 5
end
t2 = { 1 }
f(t2)
print(t2[1]) -- 5
To work around this behavior, you could (as suggested in the comments) serialize the table before using it as a key.
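The serialize-the-key idea is not specific to Lua. As an analogy only, here is a minimal Java sketch: Java arrays, like Lua tables, are compared by reference when used as map keys, so a fresh array with the same contents is not found, while a string built from the contents is. The class SerializedKeyDemo and the helper keyOf are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SerializedKeyDemo {
    // Hypothetical helper: turns the key contents into a canonical string.
    static String keyOf(int... parts) {
        return Arrays.toString(parts);
    }

    // Stores under one array and looks up with another array that has the
    // same contents. Arrays are hashed and compared by identity, so it misses.
    static String lookupByIdentity() {
        Map<int[], String> byReference = new HashMap<>();
        byReference.put(new int[] {1, 2}, "Meep");
        return byReference.get(new int[] {1, 2});
    }

    // Stores and looks up under the serialized form of the contents,
    // so two separately built keys with equal contents match.
    static String lookupBySerializedKey() {
        Map<String, String> bySerializedKey = new HashMap<>();
        bySerializedKey.put(keyOf(1, 2), "Meep");
        return bySerializedKey.get(keyOf(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(lookupByIdentity());      // null
        System.out.println(lookupBySerializedKey()); // Meep
    }
}
```

In Lua the equivalent would be to build a string from the table's contents and index with that string instead of the table itself.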

Hive access previous row value

I have the same issue as the one mentioned here. However, in my case the database is Hive. When I try the solution on my table, which looks like
Id  Date        Column1  Column2
1   01/01/2011  5        5    => Same as Column1
2   02/01/2011  2        18   => (1 + (value of Column2 from the previous row)) * (1 + (value of Column1 from the current row)), i.e. (1+5)*(1+2)
3   03/01/2011  3        76   => (1+18)*(1+3) = 19*4
I get the error
FAILED: SemanticException Recursive cte cteCalculation detected (cycle: ctecalculation -> cteCalculation).
What workaround is possible in this case?
You will have to write a UDF for this.
Below you can see a very (!!) simplified UDF for what you need.
The idea is to store the value from the previous execution in a variable inside the UDF and each time return (stored_value+1)*(current_value+1) and then store it for the next line.
You need to take care of the first value to get, so there is a special case for that.
Also, you have to pass the data to the function already ordered, because it simply goes line by line and applies the calculation in whatever order the rows arrive.
You have to add your jar and create a function; let's call it cum_mul.
The SQL will be :
select id,date,column1,cum_mul(column1) as column2
from
(select id,date,column1 from myTable order by id) a
The code for the UDF :
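import org.apache.hadoop.hive.ql.exec.UDF;

public class cum_mul extends UDF {
    // Result returned for the previous row; carried over between calls.
    private int prevValue;
    // True until the first row has been seen.
    private boolean first = true;

    public int evaluate(int value) {
        if (first) {
            // First row: Column2 is simply Column1.
            this.prevValue = value;
            first = false;
            return value;
        } else {
            // Later rows: (1 + previous result) * (1 + current value).
            this.prevValue = (this.prevValue + 1) * (value + 1);
            return this.prevValue;
        }
    }
}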
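To sanity-check the recurrence without a Hive installation, the same logic can be run as plain Java against the Column1 values from the question. This is only a sketch: the class CumMulCheck and its run helper are hypothetical names, and the Hive base class is deliberately left out so the file compiles on its own.

```java
import java.util.Arrays;

class CumMulCheck {
    private int prevValue;
    private boolean first = true;

    // Same recurrence as the UDF's evaluate(), minus the Hive base class.
    int evaluate(int value) {
        if (first) {
            prevValue = value;
            first = false;
        } else {
            prevValue = (prevValue + 1) * (value + 1);
        }
        return prevValue;
    }

    // Applies evaluate() to the rows in the order given, as Hive would
    // when it feeds the ordered subquery to the function row by row.
    static int[] run(int[] column1) {
        CumMulCheck f = new CumMulCheck();
        int[] out = new int[column1.length];
        for (int i = 0; i < column1.length; i++) {
            out[i] = f.evaluate(column1[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Column1 values 5, 2, 3 from the sample table.
        System.out.println(Arrays.toString(run(new int[] {5, 2, 3}))); // [5, 18, 76]
    }
}
```

The output matches the expected Column2 values 5, 18 and 76, and it also shows why the input order matters: the result for each row depends on every row fed in before it.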
A Hive common table expression (CTE) works as a query-level temporary table (syntactic sugar) that is accessible within the whole SQL statement.
Recursive queries are not supported because they introduce multiple stages with massive I/O, which is something the underlying execution and storage engine is not good at. In fact, Hive strictly prohibits recursive references for CTEs and views, hence the error you got.

Inserting default values if column value is 'None' using slick

My problem is simple.
I have a column seqNum: Double which is declared NOT NULL DEFAULT 1 in the CREATE TABLE statement, as follows:
CREATE TABLE some_table
(
...
seq_num DECIMAL(18,10) NOT NULL DEFAULT 1,
...
);
The user may or may not enter a value for seqNum in the UI, so the Play form that accepts it looks like this:
case class SomeCaseClass(..., seqNum: Option[Double], ...)
val secForm = Form(mapping(
...
"seqNum" -> optional(of[Double]),
...
)(SomeCaseClass.apply)(SomeCaseClass.unapply))
The Slick table schema and objects look like this:
case class SomeSection (
...
seqNum: Option[Double],
...
)
class SomeSections(tag: Tag) extends Table[SomeSection](tag, "some_table") {
def * = (
...
seqNum.?,
...
) <> (SomeSection.tupled, SomeSection.unapply _)
...
def seqNum = column[Double]("seq_num", O.NotNull, O.Default(1))
...
}
object SomeSections {
  val someSections = TableQuery[SomeSections]
  val autoInc = someSections returning someSections.map(_.sectionId)
  def insert(s: SomeSection)(implicit session: Session) = {
    autoInc.insert(s)
  }
}
When I send seqNum from the UI, everything works fine, but when it is None, the insert fails with an error saying that NULL cannot be inserted into a NOT NULL column, which is correct. This question explains why.
But how can I solve this problem using Slick? I don't understand where I should check for None. I'm creating a SomeSection object and passing it to the insert method of the SomeSections object.
I'm using sql-server, if it matters.
Using the default requires not inserting the value at all rather than inserting NULL. This means you will need a custom projection to insert to.
For example, people.map(_.name).insert("Chris") will use defaults for all other fields. The limitations of Scala's native tuple transformations and case class transformations can make this a bit of a hassle. Things like Slick's HLists, Shapeless, Scala Records or Scala XR can help, but are not trivial or are very experimental at the moment.
Alternatively, either resolve the Option before it is passed to Slick by suffixing it with .getOrElse(theDefault), or make the DB accept NULL (from a None value) and replace it with the default using a trigger.