Optimizing Redis Lua script calls to redis.call - redis

According to this Lua optimization doc: https://www.lua.org/gems/sample.pdf
Access to external locals (that is, variables that are local to an
enclosing function) is not as fast as access to local variables, but
it is still faster than access to globals. Consider the next fragment:
function foo (x)
  for i = 1, 1000000 do
    x = x + math.sin(i)
  end
  return x
end
print(foo(10))

We can optimize it by declaring sin once, outside function foo:

local sin = math.sin
function foo (x)
  for i = 1, 1000000 do
    x = x + sin(i)
  end
  return x
end
print(foo(10))
Is redis.call a global? Can we optimize it by declaring a local variable pointing to it, especially for tight loops that call redis.call many times?
And as a follow up are KEYS and ARGV also globals?

Is redis.call a global?
And as a follow up are KEYS and ARGV also globals?
Yes, redis, KEYS and ARGV are globals.
Can we optimize it by declaring a local variable pointing to it? Especially for tight loops that call redis.call many times.
According to your reference, yes, you can optimize the Lua code with local variables. However, you normally should not run too many operations in a Lua script, e.g. a large loop that executes many Redis commands: because Redis is single-threaded, a script that takes too long blocks Redis.
Normally I like to assign elements of the KEYS and ARGV arrays to local variables, not for efficiency, but for code clarity.
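As a minimal sketch of both points (the counter key in KEYS[1] and the iteration count in ARGV[1] are hypothetical, chosen just for illustration):

local call = redis.call          -- cache the global lookup in a local
local key  = KEYS[1]             -- hypothetical counter key
local n    = tonumber(ARGV[1])   -- hypothetical iteration count
for i = 1, n do
  call('INCR', key)              -- local access inside the tight loop
end
return call('GET', key)

Whether caching redis.call measurably helps depends on the workload; the bigger win is usually keeping the loop small in the first place.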

So the following three Redis commands all execute successfully.
eval "return _G.redis.call('set', 'x', 1)" 0
eval "return _G.KEYS" 1 1
eval "return _G.ARGV" 0 1
So I'm guessing they're all globals. Please someone correct me if I'm wrong...

Related

PL_strtab/SHAREKEYS and copy-on-write leak

Perl internally uses the dedicated hash PL_strtab as shared storage for hash keys, but in a fork environment like apache/mod_perl this creates a big issue. Best practice says to preload modules in the parent process, but nobody mentions that this eventually allocates memory for PL_strtab, and those pages of memory tend to be implicitly modified in child processes. There seem to be two major reasons for the modification:
Reason 1: reallocation (hsplit()) may happen when PL_strtab grows in a child process.
Reason 2: the reference count (REFCNT) is updated every time a new reference is created.
The example below shows a 16 MB copy-on-write leak on an attempt to use a hash. An attempt to recompile Perl with -DNODEFAULT_SHAREKEYS fails (https://rt.perl.org/SelfService/Display.html?id=133384). I was able to get access to PL_strtab via an XS module.
Ideally I'm looking for a way to downgrade all hashes created in the parent so that hash keys are kept within the hash itself (the HE object) rather than in PL_strtab, i.e. to turn off the SHAREKEYS flag. This should allow PL_strtab to shrink to the minimum possible size; ideally it would have 0 keys in the parent.
Please let me know whether you think this is theoretically possible via XS.
#!/usr/bin/env perl
use strict;
use warnings;
use Linux::Smaps;

$SIG{CHLD} = sub { waitpid(-1, 1) };

# comment this block
{
    my %h;
    # pre-grow the PL_strtab hash, kind of: keys %$PL_strtab = 2_000_000;
    foreach my $x (1 .. 2_000_000) {
        $h{$x} = undef;
    }
}

my $pid = fork // die "Cannot fork: $!";
unless ($pid) {
    # child
    my $s      = Linux::Smaps->new($$)->all;
    my $before = $s->shared_clean + $s->shared_dirty;
    {
        my %h;
        foreach my $x (1 .. 2_000_000) {
            $h{$x} = undef;
        }
    }
    my $s2    = Linux::Smaps->new($$)->all;
    my $after = $s2->shared_clean + $s2->shared_dirty;
    warn 'COPY-ON-WRITE: ' . ($before - $after) . ' KB';
    exit 0;
}

sleep 1000;
print "DONE\n";
Note that the sample %h in the parent gets destroyed and is not accessible in the child. Its only purpose is to preallocate more memory for PL_strtab and make the copy-on-write issue more noticeable.
The problem is that PL_strtab is a shared data structure (not %h). It is controlled solely by Perl, and there is no way to manage it with IPC::Shareable or any other CPAN module known to me.
Real-life example:
In apache/mod_perl, Starman or any other prefork environment, everybody tries to preload as many modules as possible in the parent process, right?
If any of the preloaded modules creates a hash (even a temporary one) with a big number of keys, Perl silently allocates more and more memory for the internal PL_strtab hash.
PL_strtab silently gets touched in children on any attempt to use hashes.
The problem is even worse because a huge percentage of the modules we preload are CPAN modules, so there is no way to know which of them overuse hashes, resulting in an increased memory footprint of the parent process.

Lua Spaghetti Modules

I am currently developing my own programming language. The codebase (in Lua) is composed of several modules, as follows:
The first, error.lua, has no dependencies;
lexer.lua depends only on error.lua;
prototypes.lua also has no dependencies;
parser.lua, instead, depends on all the modules above;
interpreter.lua is the fulcrum of the whole codebase. It depends on error.lua, parser.lua, and memory.lua;
memory.lua depends on functions.lua;
finally, functions.lua depends on memory.lua and interpreter.lua. It is required from inside memory.lua, so we can say that memory.lua also depends on interpreter.lua.
With "A depends on B" I mean that the functions declared in A need those declared in B.
The real problem, though, is when A depends on B which depends on A, which, as you can understand from the list above, happens quite frequently in my code.
To give a concrete example of my problem, here's what interpreter.lua looks like:
--first, I require the modules that DON'T depend on interpreter.lua
local parser, Error = table.unpack(require("parser"))
--(since error.lua is needed both in the lexer, parser and interpreter module,
--I only actually require it once in lexer.lua and then pass its result around)

--Then, I should require memory.lua. But since memory.lua and
--functions.lua need some functions from interpreter.lua to work, I just
--forward declare the variables needed from those functions and then those functions themselves:

--forward declaration
local globals, new_memory, my_nil, interpret_statement

--functions I need to declare before requiring memory.lua
local function interpret_block()
  --uses interpret_statement and new_memory
end

local function interpret_expression()
  --uses new_memory, Error and my_nil
end

--Now I can safely require memory.lua:
globals, new_memory, my_nil = require("memory")(interpret_block, interpret_expression)
--(I'll explain why it returns a function to call later)

--Then I have to fulfill the forward declaration of interpret_statement:
function interpret_statement()
  --uses interpret_expression, new_memory and Error
end

--finally, the result is a function
return function()
  --uses parser, new_function and globals
end
The memory.lua module returns a function so that it can receive interpret_block and interpret_expression as arguments, like this:
--memory.lua
return function(interpret_block, interpret_expression)
  --declaration of globals, new_memory, my_nil
  return globals, new_memory, my_nil
end
Now, I got the idea of the forward declarations here and that of the functions-as-modules (like in memory.lua, to pass some functions from the requiring module to the required module) here. They're all great ideas, and I must say they work well. But you pay in readability.
In fact, breaking the code into smaller pieces this time made my work harder than it would have been if I had coded everything in a single file, which is impossible for me because it's more than 1000 lines of code and I'm coding from a smartphone.
The feeling I have is that of working with spaghetti code, only on a larger scale.
So how could I solve the problem of my code being hard to understand because some modules need each other to work (without making all the variables global, of course)? How would programmers in other languages solve this problem? How should I reorganize my modules? Are there any standard rules for using Lua modules that could also help me with this problem?
If we look at your Lua files as a directed graph, where an edge points from a dependency to its usage, the goal is to turn the graph into a tree or forest, since you intend to get rid of the cycles.
A cycle is a set of nodes which, traversed in the direction of the edges, leads back to the starting node.
Now, the question is how to get rid of cycles?
The answer looks like this:
Let's consider a node N and let {D1, D2, ..., Dm} be its direct dependencies. If there is no Di in that set that depends on N either directly or indirectly, then you can leave N as it is. In that case, the set of problematic dependencies is empty: {}
However, what if you have a non-empty set, like this: {PD1, ..., PDk}?
You then need to analyze each PDi (for i between 1 and k) along with N, and see which subset of each PDi does not depend on N and which subset of N does not depend on any PDi. This way you can split each module into a base part and a remaining part: N_base and N, PDi_base and PDi. N depends on N_base, as do all the PDi, and each PDi depends on PDi_base along with N_base.
This approach minimizes cycles in the dependency graph. However, it is quite possible that a set of functions {f1, ..., fl} exists which cannot be migrated into a _base module as described, due to their dependencies, so some cycles remain. In that case you need to give the group in question a name, create a module for it, and migrate all of those functions into that module (see the sketch below).
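To make the _base idea concrete, here is a hedged Lua sketch; the module and function names (interpreter_base, runtime_error, new_memory) are hypothetical and not taken from your code:

-- interpreter_base.lua: the part of the interpreter that memory.lua actually needs
local M = {}
function M.runtime_error(msg)
  error("runtime error: " .. msg, 0)
end
return M

-- memory.lua: now depends only on interpreter_base.lua, not on interpreter.lua
local base = require("interpreter_base")
local M = {}
function M.new_memory()
  return { globals = {} }
end
function M.lookup(mem, name)
  return mem.globals[name] or base.runtime_error("undefined variable " .. name)
end
return M

-- interpreter.lua: depends on memory.lua and interpreter_base.lua; the cycle is gone
local base   = require("interpreter_base")
local memory = require("memory")
return function(ast)
  local mem = memory.new_memory()
  -- ...evaluate ast against mem...
end

Each file only requires modules beneath it in the hierarchy, so no require ever runs before its dependency is fully loaded.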

Make interpreter execute faster

I've created an interpreter for a simple language. It is AST based (to be more exact, an irregular heterogeneous AST) with visitors executing and evaluating nodes. However, I've noticed that it is extremely slow compared to "real" interpreters. For testing I've run this code:
i = 3
j = 3
has = false
while i < 10000
  j = 3
  has = false
  while j <= i / 2
    if i % j == 0 then
      has = true
    end
    j = j+2
  end
  if has == false then
    puts i
  end
  i = i+2
end
in both Ruby and my interpreter (it just finds primes primitively). Ruby finished in under 0.63 seconds, while my interpreter took over 15 seconds.
I develop the interpreter in C++ and in Visual Studio, so I've used the profiler to see what takes the most time: the evaluation methods.
50% of the execution time was spent calling the abstract evaluation method, which casts the passed expression and calls the proper eval method. Something like this:
Value * eval (Exp * exp)
{
    switch (exp->type)
    {
    case EXP_ADDITION:
        return eval ((AdditionExp*) exp);
    ...
    }
}
I could put the eval methods into the Exp nodes themselves, but I want to keep the nodes clean (Terence Parr said something about reusability in his book).
Also, at evaluation I always reconstruct the Value object, which stores the result of the evaluated expression. Value is actually abstract and has derived value classes for the different types (that's why I work with pointers, to avoid object slicing when returning). I think this could be another reason for the slowness.
How could I make my interpreter as optimized as possible? Should I create bytecodes out of the AST and then interpret bytecodes instead? (As far as I know, they could be much faster)
Here is the source if it helps understanding my problem: src
Note: I haven't done any error handling yet, so an illegal statement or an error will simply freeze the program. (Also sorry for the stupid "error messages" :))
The syntax is pretty simple, the currently executed file is in OTZ1core/testfiles/test.txt (which is the prime finder).
I appreciate any help I can get; I'm really a beginner at compilers and interpreters.
One possibility for a speed-up would be to use a function table instead of the switch with dynamic retyping. Your call to the typed-eval is going through at least one, and possibly several, levels of indirection. If you distinguish the typed functions instead by name and give them identical signatures, then pointers to the various functions can be packed into an array and indexed by the type member.
Value * (*evaltab[])(Exp *) = {   // the order of the functions must match
    Exp_Add,                      // the order of the type values
    //...
};
Then the whole switch becomes:
evaltab[exp->type](exp);
1 indirection, 1 function call. Fast.
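For readers following along from the Lua sections above, the same dispatch-table idea looks like this in Lua; the node shapes (type, left, right, value) are hypothetical, chosen only for illustration:

local eval  -- forward declaration so the table entries can recurse

local evaltab = {
  number = function(node) return node.value end,
  add    = function(node) return eval(node.left) + eval(node.right) end,
  mul    = function(node) return eval(node.left) * eval(node.right) end,
}

function eval(node)
  return evaltab[node.type](node)   -- one table lookup, one call, no switch
end

-- usage: (2 + 3) * 4
print(eval({ type = "mul",
             left  = { type = "add",
                       left  = { type = "number", value = 2 },
                       right = { type = "number", value = 3 } },
             right = { type = "number", value = 4 } }))   --> 20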

Any language: recycling Variables

This seems like a simple question to ask, but is it generally a good idea to re-use variables in any scripting language?
I'm particularly interested in the reasons why it is or isn't good practice, as I can't decide whether I should or shouldn't be doing it.
For example, $i is one of the most common variables used in a loop to iterate through things.
E.g. (in PHP):
//This script will iterate each entry in $array
//and output it with a comma if it isn't the last item.
$i=0; //Recycling $i
foreach ($array as $v) {
    $i++;
    if ($i<count($array))
    {
        echo $v.', ';
    }
    else {
        echo $v;
    }
}
Say I had several loops in my script; would it be better to re-use $i or just to use another variable such as $a, and for any other loops go from $b to $z?
Obviously to re-use $i, I'd have to set it to 0 or null before the loop, or else it would give some very weird behaviour later on in the script, which makes me wonder whether reusing it is worth the hassle.
Would the following take up more space than just using another variable?
$i=0; //Recycling $i
Of course this is the simplest example of its use, and I would like to know whether, if the script were far more complicated, it would cause more trouble than it's worth.
I know that re-using a variable could save a minuscule amount of memory but when those variables stack up, it gets important, doesn't it?
Thank you in advance for any straight answers. If this question is too vague, I apologize and would like to know that too, so that I know to ask questions such as these elsewhere.
I think you have "string" confused with "variable". A string is a collection of ASCII or Unicode characters, depending on the programming language. A variable is a named location in memory. So you are talking about recycling a variable, not a string.
Unfortunately, the way that variables are dealt with is highly language and compiler specific. You will get much better information if you give us more details.
Most languages have the concept of scope, with variable space being allocated at declaration and deallocated when the scope ends. If you keep a variable in its proper scope, you won't have to worry about wasting memory, as variables will be deallocated when no longer needed. Again, the specifics of scope are language dependent.
Generally, it is a bad idea to reuse variable names like that, as when (not if) you forget to reset the variable to some initial value within the same scope, your program will break at runtime.
You should be initializing $i to 0 before entering your foreach loop, regardless of whether you used it earlier in the function, and regardless of the language you're using. It's quite simply a language-neutral best practice.
Many languages help you enforce this by allowing you to bind the counting variable to the scope of the for loop. PHP's foreach provides a similar mechanism, provided your array is 0-based and contains only numeric keys:
foreach ($array as $i => $v) {
    echo $v;
    if ($i + 1 < count($array))
        echo ', ';
}
Also, in regards to your statement:
I know that re-using a variable could save a minuscule amount of memory but when those variables stack up, it gets important, doesn't it?
No, it really doesn't. Variables will be garbage-collected when they fall out of scope. It's virtually impossible (without programmatically generating code) to use so many counting/temporary variables in the course of a single function that the memory usage is at all significant. Reusing a variable instead of declaring a new one will have no measurable impact on your program's performance, and virtually no impact on its memory footprint.
Resetting your variable ($i=0) is not going to take more space.
When you declare a variable (say of type integer), a predefined amount of memory is allocated for that integer regardless of what its value is. By changing it back to zero, you only modify the value.
If you declare a new variable, memory will have to be allocated for it and typically it will also be initialized to zero, so resetting your existing variable probably isn't taking any extra CPU time either.
Reusing an index variable for mutually exclusive loops is a good idea; it makes your code easier to understand. If you're declaring a lot of extra variables, other people reading your code may wonder if "i" is still being used for something even after your loop has completed.
Reuse it! :)
Personally, I don't have any problem with reusing index/counter variables, although in some languages (e.g. .NET) using a variable in a for loop specifically scopes it to that loop, so it's cleaned up afterwards no matter what.
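Lua, the language of the question at the top of this page, behaves the same way: the control variable of a numeric for loop is implicitly a fresh local scoped to that loop, so reusing the name costs nothing. A tiny sketch:

local i = "outer"
for i = 1, 3 do
  -- this i is a new local, visible only inside the loop body
end
print(i)  --> "outer": the loop never touched the outer variable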
You shouldn't ever re-use variables which contain any significant data (This shouldn't be an issue as all your variables should have descriptive names)
I once had to try to clean up some excel macro code written by a student - all the variable names were Greek gods and the usage swapped repeatedly throughout the script. In the end, I just re-wrote it.
You should also be aware that, depending on the type of variable and the language you're using, re-allocating contents can be as expensive as creating a new variable. For example, again in .NET, strings are immutable, so this...
Dim MyString as String = "Something"
MyString = "SomethingElse"
Will actually allocate a new area of memory to store the 2nd usage and change the MyString pointer to point at the new location.
Incidentally, this is why the following is really inefficient (at least in .Net):
Dim SomeString = SomeVariable & " " & SomeOtherVariable
This allocates memory for SomeVariable, then more memory for SomeVariable + " ", then yet again for SomeVariable + " " + SomeOtherVariable, meaning SomeVariable is actually written to memory three times.
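The same caveat applies to Lua, whose strings are also immutable, so repeated concatenation builds a new string each time; the usual workaround (my sketch, not part of this answer) is to collect pieces in a table and join them once:

local parts = {}
for i = 1, 1000 do
  parts[#parts + 1] = tostring(i)        -- no intermediate concatenations here
end
local joined = table.concat(parts, " ")  -- a single join at the end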
In summary, I'd re-use very simple variables for loops. For anything else, I'd just assign a new variable, especially since memory allocation is usually only a tiny fraction of the time/processing of most applications.
Remember: Premature optimization is the root of all evil
This depends on the problem you are solving.
When using PHP (and other scripting languages) you should consider unset() on sizeable variables such as arrays or long strings. You shouldn't bother to re-use or unset small variables (integer/boolean/etc). It just isn't worth it when using a (slow) scripting language.
If you are developing a performance critical application in a low (lower) level language such as C then variable re-use is more important.

What is the standard way to optimise mutual recursion in F#/Scala?

These languages do not support mutually recursive function optimization 'natively', so I guess it must be a trampoline or... heh... rewriting as a loop. Am I missing something?
UPDATE: It seems I was wrong about F#, but I just hadn't seen an example of mutual tail calls while googling.
First of all, F# supports mutually recursive functions natively, because it can benefit from the tailcall instruction that's available in the .NET IL (MSDN). However, this is a bit tricky and may not work on some alternative implementations of .NET (e.g. the Compact Framework), so you may sometimes need to deal with this by hand.
In general, I think there are a couple of ways to deal with it:
Trampoline - throw an exception when the recursion depth is too high and implement a top-level loop that handles the exception (the exception would carry the information needed to resume the call). Instead of an exception, you can also simply return a value specifying that the function should be called again.
Unwind using timer - when the recursion depth is too high, you create a timer and give it a callback that will be called by the timer after some very short time (the timer will continue the recursion, but the used stack will be dropped).
The same thing could be done using a global stack that stores the work that needs to be done. Instead of scheduling a timer, you would add function to the stack. At the top-level, the program would pick functions from the stack and run them.
To give a specific example of the first technique, in F# you could write this:
type Result<'T> =
  | Done of 'T
  | Call of (unit -> Result<'T>)

let rec factorial acc n =
  if n = 0 then Done acc
  else Call(fun () -> factorial (acc * n) (n - 1))
This can be used for mutually recursive functions as well. The imperative loop would simply call the f function stored in Call(f) until it produces Done with the final result. I think this is probably the cleanest way to implement this.
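Purely as an illustration, the same Done/Call idea can be rendered in Lua (the language of the question at the top of this page); the table-based encoding below is my own assumption, not part of this answer:

-- Results are encoded as tables: { done = true, value = v } or { done = false, thunk = f }.
local function done(v) return { done = true,  value = v } end
local function call(f) return { done = false, thunk = f } end

local is_even, is_odd  -- forward declarations for the mutual recursion

function is_even(n)
  if n == 0 then return done(true) end
  return call(function() return is_odd(n - 1) end)
end

function is_odd(n)
  if n == 0 then return done(false) end
  return call(function() return is_even(n - 1) end)
end

-- The top-level driver loop: keep running thunks until a Done result appears.
local function run(step)
  while not step.done do
    step = step.thunk()
  end
  return step.value
end

print(run(is_even(1000001)))  --> false, with bounded stack depth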
I'm sure there are other sophisticated techniques for dealing with this problem, but those are the two I know about (and that I used).
On Scala 2.8, scala.util.control.TailCalls:
import scala.util.control.TailCalls._

def isEven(xs: List[Int]): TailRec[Boolean] =
  if (xs.isEmpty) done(true)
  else tailcall(isOdd(xs.tail))

def isOdd(xs: List[Int]): TailRec[Boolean] =
  if (xs.isEmpty) done(false)
  else tailcall(isEven(xs.tail))

isEven((1 to 100000).toList).result
Just to have the code handy for when you Bing for F# mutual recursion:
let rec isOdd x =
    if x = 0 then false else isEven (x-1)
and isEven x =
    if x = 0 then true else isOdd (x-1)

printfn "%A" (isEven 10000000)
This will overflow the stack if you compile without tail calls (the default in "Debug" mode, which preserves stacks for easier debugging), but runs just fine when compiled with tail calls (the default in "Release" mode). The compiler emits tail calls by default (see the --tailcalls option), and .NET implementations on most platforms honor it.