Best way to pass array of objects to Redis Lua script - redis

Question
What is the best practice for passing an array of objects into a Lua script? Is there any better way than converting the objects to JSON and parsing them with cjson within the script?
More context
I have a streaming application which keeps its state in Redis. Every second we get 5-100 events, and all operations are done within a single transaction to boost performance, like the following:
RedisCommands<String, String> cmd = getOrCreateRedisClient();
cmd.multi();
for (Event event : listOfEvents) {
    cmd.sadd("users", event.getUserId());
    cmd.sadd("actions", event.getActionId());
    cmd.incrbyfloat("users:" + event.getUserId(), event.getImpact());
}
cmd.exec();
Now I have to move this logic to Lua scripts. I suppose it will also be faster to pass an array of events to the Lua script instead of making up to 100 script invocations (one per event). Am I right? What is the best way to pass a list of events to a Lua script?

It depends...
If your logic won't change in the future, i.e. you'll only ever use the user id, action id, and impact of an event, you can just pass these three fields of each event to Lua:
redis-cli --eval your-script.lua , userid1 actionid1 impact1 userid2 actionid2 impact2 userid3 actionid3 impact3
In this case you don't need to convert each event object to a JSON string, and the Lua script doesn't need to parse JSON, so it should be faster.
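For illustration, a minimal Lua sketch of that flattened-argument form (key names are hardcoded to mirror the question's Java code; in a real script you may prefer to pass them via KEYS):

-- your-script.lua: ARGV holds userid1, actionid1, impact1, userid2, actionid2, impact2, ...
for i = 1, #ARGV, 3 do
    local userId, actionId, impact = ARGV[i], ARGV[i + 1], ARGV[i + 2]
    redis.call('SADD', 'users', userId)
    redis.call('SADD', 'actions', actionId)
    redis.call('INCRBYFLOAT', 'users:' .. userId, impact)
end
return redis.status_reply('OK')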
However, if your logic might change in the future, i.e. you might need to use other members of an event, you'd better convert each event object to a JSON string and pass an array of JSON strings to the Lua script:
redis-cli --eval your-script.lua , {json1} {json2} {json3}
That way such changes are transparent to your calling code, and you only need to change the Lua script.
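For comparison, a sketch of the JSON variant; the field names userId, actionId, and impact are assumptions here, so use whatever your serializer actually produces:

-- your-script.lua: each ARGV entry is one event serialized as a JSON object
for i = 1, #ARGV do
    local event = cjson.decode(ARGV[i])
    redis.call('SADD', 'users', event.userId)
    redis.call('SADD', 'actions', event.actionId)
    redis.call('INCRBYFLOAT', 'users:' .. event.userId, event.impact)
end
return redis.status_reply('OK')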

Related

Array of objects

Let's say I want to connect to two package repositories, make a query for a package name, combine the results from the repos, and process them (filter, unique, prioritize, ...). What is a good way to do that?
What I thought about is creating an Array of two Cro::HTTP::Client objects (with a base-uri specific to each repo), and when I need to make an HTTP request I call @a>>.get, then process the results from the repos together.
I have attached a snippet of what I'm trying to do, but I would like to see if there is a better way to do that, or if the approach mentioned in the following link is suitable for this use case: https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
use Cro::HTTP::Client;

class Repo {
    has $.name;
    has Cro::HTTP::Client $!client;
    has Cro::Uri $.uri;
    has Bool $.disable = False;

    submethod TWEAK () {
        $!client = Cro::HTTP::Client.new(base-uri => $!uri, :json);
    }

    method get (:$package) {
        my $path = <x86_64?>;
        my $resp = await $!client.get($path ~ $package);
        my $json = await $resp.body;
        return $json;
    }
}

class AllRepos {
    has Repo @.repo;

    method get (:$package) {
        # check if some repos are disabled
        my @candidate = @!repo>>.get(:$package).unique(:with(&[eqv])).flat;
        # do further processing of the data, then return it
        return @candidate;
    }
}

my $repo1 = Repo.new: name => 'repo1', uri => Cro::Uri.new(:uri<http://localhost:80>);
my $repo2 = Repo.new: name => 'repo2', uri => Cro::Uri.new(:uri<http://localhost:77>);
my @repo = $repo1, $repo2;
my $repos = AllRepos.new: :@repo;
#my @packages = $repos.get: package => 'rakudo';
Let's say I want to connect to two package repositories, make a query for a package name, combine the results from the repos, and process them (filter, unique, prioritize, ...). What is a good way to do that?
The code you showed looks like one good way in principle but not, currently, in practice.
The hyperoperators such as >>:
Distribute an operation (in your case, connect and make a query) ...
... to the leaves of one or two input composite data structures (in your case the elements of one array @!repo) ...
... with logically parallel semantics (by using a hyperoperator you are declaring that you are taking responsibility for thinking that the parallel invocations of the operation will not interfere with each other, which sounds reasonable for connecting and querying) ...
... and then return a resulting composite data structure with the same shape as the original structure if the hyperoperator is a unary operator (which applies in your case, because you applied >>, a unary operator that takes a single argument on its left, so the result of the >>.get is just a new array, just like the input @!repo), or whose shape is the hyper'd combination of the shapes of the pair of structures if the hyperoperator is a binary operator, such as >>op<< ...
... which can then be further processed (in your case it is, with .unique, which will produce a resulting Seq) ...
... whose elements you then assign back into another array (@candidate).
So your choice is a decent fit in principle, but the commitment to parallelism is only semantic and right now the Rakudo compiler never takes advantage of it, so it will actually run your code sequentially, which presumably isn't a good fit in practice.
Instead I suggest you consider:
Using map to distribute an operation over multiple elements (in a shallow manner; map doesn't recursively descend into a deep structure like the hyperoperators, deepmap etc., but that's OK for your use case) ...
... in combination with the hyper (or race) method, which parallelizes the method call that follows it.
So you might write:
my @candidate =
    @!repo.hyper.map(*.get: :$package).unique(:with(&[eqv])).flat;
Alternatively, check out task 94 in Using Perl 6.
if the approach mentioned in the following link is suitable for this use case: https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
I don't think so. That's about constructing a general purpose container that's like an array but with some differences to the built in Array that are worth baking into a new type.
I can just about imagine such things that are vaguely related to your use case -- e.g. an array type that automatically hyper-distributes method calls invoked on it, if they're defined on Any or Mu (rather than Array or List), i.e. does what I described above but with the code @!repo.get... instead of @!repo.hyper.map: *.get .... But would it be worth it (assuming it would work -- I haven't thought about it beyond inventing the idea for this answer)? I doubt it.
More generally...
It seems like what you are looking for is cookbook-like material. Perhaps a question posted at the reddit sub /r/perl6 is in order?

CEP rule to update fragments within a managed object

I need to be able to create an event processing rule such that when a new device is added, a string value is taken from one fragment (e.g. c8y_Hardware.imei) and used to populate another fragment (e.g. c8y_Mobile.imei). The new device would then have the same value in both c8y_Hardware.imei and c8y_Mobile.imei.
We have attempted setting up the appropriate CEP rules, but they are not working (they do compile and save).
insert into UpdateManagedObject
select
    m.id as id,
    {
        "c8y_Mobile.imei", getString(m, "c8y_Hardware.imei")
    } as fragments
from
    ManagedObjectCreated as m
where
    getString(m, "c8y_Hardware.imei") != "";
Any guidance on where we are messing up our syntax would be greatly appreciated.
It should be: m.managedObject.id as id.
Usually you would also get an error on compile, but it may be that the streams also have an id, so that it technically works in CEP. You should be able to check whether it triggers on the debug stream and see the id that has been set.
The same applies to all other Cumulocity streams. The streams themselves, e.g. ManagedObjectCreated or AlarmUpdated, are not the objects directly; they always have a property, in this case managedObject (for AlarmUpdated it is alarm), and this property is the actual payload.
Helper methods like getString are written so that you can pass either the payload or the full stream object, so there it does not matter.
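Applied to the statement from the question, the only change needed is in the id selection; as noted, the getString calls can keep taking the stream object:

insert into UpdateManagedObject
select
    m.managedObject.id as id,
    {
        "c8y_Mobile.imei", getString(m, "c8y_Hardware.imei")
    } as fragments
from
    ManagedObjectCreated as m
where
    getString(m, "c8y_Hardware.imei") != "";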

Snakemake: how to maintain a Snakemake instance value across multiple instances of the same invocation

I want to save some information within the Python code that is part of my Snakefile, and have this information available to the Python code in every instance that Snakemake creates when it is running the workflow. But a separate run of the workflow should have its own separate instance of the information.
For example, say I were to create a UUID in my python code, and then later use it in the python code. But I want the UUID to be the same one in all running instances of the workflow. Instead, a new UUID gets created each time an instance is started.
If I start snakemake twice at the same time, I would want each of the two runs to create their own UUID, but within each run, all instances created by the run would use the same UUID. How to do this? Is there an identifier somewhere in the snakemake object that remains the same within one run across all instances, but changes from run to run?
Here's an example that fails with a 'No rule to produce' error:
import uuid

ID = str(uuid.uuid4())
print("ID:", ID)

rule all:
    output: ID
    run:
        print("Hello world")
If instead of 'run' it uses 'shell', it works fine, so I assume that Snakemake is rerunning the snakefile code when it executes the "run" portion of the rule. How could this be modified to work, to retain the first UUID value instead of generating a second one? Also, why isn't the ID specified for output in the rule captured when the rule is first processed, without requiring a second invocation of the python code? Since it works with 'shell', the second invocation is not needed specifically for processing the "output" statement.
Indeed, when you use a run block, Snakemake will invoke itself to execute that job, meaning that it also reparses the Snakefile, generating a new UUID. The same will happen on the cluster. There are good technical reasons for doing it like this (performance, the Python GIL, restrictions with pickling, simplicity and robustness of the implementation).
I am not sure what exactly you want to achieve, but it might help to look at this: http://snakemake.readthedocs.io/en/stable/project_info/faq.html#i-want-to-pass-variables-between-rules-is-that-possible
I've found a method that seems to work: use the process group ID:
ID = str(os.getpgrp())
Multiple instances of the same pipeline have the same group ID. However, I'm not sure if this remains true on a cluster; probably not. In my case that didn't matter.
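For example, a minimal sketch of that approach, adapted from the failing example above (the marker file name and the write in the run block are just illustrative):

import os

# All re-invocations within one snakemake run share the parent's process group,
# so this value stays constant across reparses of the Snakefile, but differs
# between separately started runs. Caveat: this may not hold on a cluster.
ID = "run_{}.marker".format(os.getpgrp())
print("ID:", ID)

rule all:
    output: ID
    run:
        print("Hello world from", ID)
        with open(output[0], "w") as fh:
            fh.write(ID + "\n")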

In VB.Net, how can I create a method that waits for a variable number of asynchronous calls to complete, and then returns a result?

How can I code a method in VB.Net 2012 that waits for a variable number of asynchronous calls to complete, and only when all calls finish will then return a result?
I'm writing an app that retrieves a value from various web pages, and then returns the sum of those values. The number of values to retrieve will be determined by the user at runtime. As web retrieval is asynchronous by nature, I'm trying to make the app more efficient by coding it as such. I've just read about the keywords Async and Await, which seem perfect for the job. I also found this example of how to do it in C#: Run two async tasks in parallel and collect results in .NET 4.5.
But there are two issues with this example: 1) At first glance, I don't know how to make the same thing happen in VB.Net, and 2) I don't know how it could be redesigned to handle a variable number of called tasks.
Here's a pseudo-translation from the example, of what I hope to achieve:
Function GetSumOfValues(n as Integer)
    For i = 1 To n
        GetValueAsync<i>.Start()
    Next i
    Dim result = Await Task.WhenAll(GetValueAsync<?*>)
    Return result.Sum()
End Function
Note the question mark, as I'm not sure if it's possible to give WhenAll a "wildcarded" group of tasks. Perhaps with an object collection?
You can use this example of using tasks with Task.WaitAll. To collect the data asynchronously, you can use a static method with SyncLock, or one of the synchronized collections.
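For illustration, a hedged sketch of what this could look like in VB.Net 2012 with Async/Await and Task.WhenAll; GetValueAsync here stands in for your own method that fetches one value from a web page and returns Task(Of Double):

Imports System.Collections.Generic
Imports System.Linq
Imports System.Threading.Tasks

' Inside your class:
Private Async Function GetSumOfValuesAsync(n As Integer) As Task(Of Double)
    ' Start all retrievals without awaiting them one by one
    Dim tasks As New List(Of Task(Of Double))()
    For i As Integer = 1 To n
        tasks.Add(GetValueAsync(i)) ' your hypothetical per-page retrieval method
    Next
    ' Await the whole collection at once; WhenAll returns the results as an array
    Dim results As Double() = Await Task.WhenAll(tasks)
    Return results.Sum()
End Function

Task.WhenAll accepts any IEnumerable(Of Task(Of TResult)), so a List built in a loop covers the variable number of calls; no "wildcard" is needed.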

Execute command block in primitive in NetLogo extension

I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit, popGraphContext: api.World => GraphContext)
  extends api.DefaultCommand {

  override def getSyntax = commandSyntax(
    Array(AgentsetType, AgentsetType, CommandBlockType))

  def perform(args: Array[Argument], context: Context) {
    val turtleSet = args(0).getAgentSet.requireTurtleSet
    val linkSet = args(1).getAgentSet.requireLinkSet
    val world = linkSet.world
    val gc = new GraphContext(world, turtleSet, linkSet)
    val extContext = context.asInstanceOf[ExtensionContext]
    val nvmContext = extContext.nvmContext
    pushGraphContext(gc)
    // execute command block here
    popGraphContext(world)
  }
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in an agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
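A rough, untested sketch of what the middle of perform might look like with this approach; the AgentSetBuilder constructor arguments and the agent.kind accessor are assumptions here, and exact signatures differ between NetLogo versions:

// inside perform, in place of the "execute command block here" placeholder:
pushGraphContext(gc)
// wrap the current agent (turtle, patch, link, or observer) in a one-element agentset
val builder = new agent.AgentSetBuilder(nvmContext.agent.kind, 1) // assumed constructor
builder.add(nvmContext.agent.asInstanceOf[agent.Agent])
// run the command block (the block argument of this command); ip + 1 is where it starts
nvmContext.runExclusiveJob(builder.build(), nvmContext.ip + 1)
popGraphContext(world)

// the command also needs to mix in nvm.CustomAssembled and assemble its block:
def assemble(a: nvm.AssemblerAssistant) {
  a.block() // compile the command block argument into the current procedure
  a.done()
}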
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.