Aerospike AQL count(*) SQL analogue script

Ok, so the problem is that I need to do aggregation queries on Aerospike's aql console. Specifically, I would like to take the average of a bin across the records in a set, and to count all the records in a set. I am not sure how to even begin...

aql> SHOW SETS will give you the number of objects in your sets, in the n_objects column.
You can then use the n_objects value to calculate your average.

SQL-like aggregation functions are implemented in Aerospike using stream UDFs, which are written in Lua. A stream UDF is a map-reduce operation that is applied on a stream of records returned from a scan or secondary index query.
The stream UDF module (let's assume it's contained in the file aggr_funcs.lua) would implement COUNT(*) by returning 1 for each record it sees, and reducing to an aggregated integer value.
local function one(record)
    return 1
end

local function sum(v1, v2)
    return v1 + v2
end

function count_star(stream)
    return stream : map(one) : reduce(sum)
end
You would register the UDF module with the server, then invoke it. Here's an example of how you'd do that in Python using aerospike.Query.apply:
import aerospike
from aerospike import predicates as p

config = {'hosts': [('127.0.0.1', 3000)],
          'lua': {'system_path': '/usr/local/aerospike/lua/',
                  'user_path': '/usr/local/aerospike/usr-lua/'}}
client = aerospike.client(config).connect()
query = client.query('test', 'demo')
# query.where(p.between('my_val', 1, 9))  # optionally use a WHERE predicate
query.apply('aggr_funcs', 'count_star')
num_records = query.results()  # a list holding the aggregated value, e.g. [23771]
client.close()
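Before invoking the UDF you need to register the module with the server, either from aql (REGISTER MODULE 'aggr_funcs.lua') or from the client. Here's a minimal sketch using the Python client's udf_put; the file path is illustrative and should point at your own aggr_funcs.lua:
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Register the Lua module with the server so it can be applied to queries.
# The path below is just an example location.
client.udf_put('/usr/local/aerospike/usr-lua/aggr_funcs.lua')
client.close()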
However, for metrics such as the number of objects you should use an info command instead. Aerospike has an info subsystem that is used by the command-line tools such as asinfo, the AMC dashboard, and the info methods of the language clients.
To get the number of objects in the cluster:
asinfo -h 33.33.33.91 -v 'objects'
23773
You can also get the number of objects in a specific namespace. I have a two-node cluster, and I'll query each one:
asinfo -h 33.33.33.91 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=12274;master-sub-objects=0;prole-objects=11499;prole-sub-objects=0;expired-objects=0;evicted-objects=0;...
asinfo -h 33.33.33.92 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=11499;master-sub-objects=0;prole-objects=12274;prole-sub-objects=0;expired-objects=0;evicted-objects=0;...
Notice that the master-objects values of the two nodes add up to the cluster-wide objects value.
To get the number of objects in a set:
asinfo -h 33.33.33.91 -v 'sets/test/demo'
n_objects=23771:n-bytes-memory=618046:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false:set-delete=false;
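The same info commands are exposed through the language clients. For example, here's a minimal sketch using the Python client's info_all method (an assumption on my part that you're on a recent client version that provides it) to pull the set statistics from every node:
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()

# Ask every node in the cluster for its statistics on the test/demo set.
# Per the client docs, info_all returns a dict keyed by node, with
# (error, response) tuples as values.
for node, (error, response) in client.info_all('sets/test/demo').items():
    if error is None:
        print(node, response)  # e.g. n_objects=23771:n-bytes-memory=618046:...

client.close()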


attempt to call field 'replicate_commands' (a nil value)

I use Jedis + Lua to eval a script; here is my Lua script:
redis.replicate_commands()
local second = redis.call('TIME')[1]
local currentKey = KEYS[1]..second
if redis.call('EXISTS', currentKey) == 0 then
    redis.call('SETEX', currentKey, 1, 1)
    return 1
else
    return redis.call('INCR', currentKey)
end
Because I use TIME, it reports the error: Write commands not allowed after non deterministic commands.
After searching on the internet, I added redis.replicate_commands() as the first line of the Lua script, but it still reports an error: ERR Error running script (call to f_c89a6ee8ad732a325e530f4a69226851cde302e2): #user_script:1: user_script:1: attempt to call field 'replicate_commands' (a nil value)
Does replicate_commands need arguments, or is there another way to solve my problem?
Redis version: 3.0
Jedis version: 2.9
Lua version: I don't know where to find it
The error attempt to call field 'replicate_commands' (a nil value) means replicate_commands() doesn't exist in the redis object. It is a Lua-side error message.
replicate_commands() wasn't introduced until Redis 3.2. See EVAL - Replicating commands instead of scripts. Consider upgrading.
The first error message (Write commands not allowed after non deterministic commands) is a Redis-side message: you cannot call write commands (like SET, SETEX, INCR, etc.) after calling non-deterministic commands (like SPOP, SCAN, RANDOMKEY, TIME, etc.).
A very important part of scripting is writing scripts that are pure functions.
Scripts executed in a Redis instance are, by default, propagated to
replicas and to the AOF file by sending the script itself -- not the
resulting commands.
This is so that if the Redis server is restarted, replaying the AOF log, or if the script is replicated to a slave, the script delivers the same dataset.
This is why in Redis 3.2 replicate_commands() was introduced. And starting with Redis 5 scripts are always replicated as effects -- as if replicate_commands() was called when the script started. But for versions before 3.2, you simply cannot do this.
Therefore, either upgrade to 3.2 or later, or pass currentKey already calculated to the script from the client instead.
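For example, here is a minimal sketch of the second option using redis-py (the same idea applies to Jedis): the client computes the time-dependent key, so the script itself stays deterministic. The rate: key prefix is just an illustrative choice:
import time
import redis

r = redis.Redis()

# The script only writes; all non-determinism stays on the client side.
RATE_SCRIPT = """
if redis.call('EXISTS', KEYS[1]) == 0 then
    redis.call('SETEX', KEYS[1], 1, 1)
    return 1
else
    return redis.call('INCR', KEYS[1])
end
"""
rate = r.register_script(RATE_SCRIPT)

# Compute the per-second key on the client instead of calling TIME in Lua.
current_key = 'rate:%d' % int(time.time())
count = rate(keys=[current_key])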
Note that creating currentKey dynamically inside the script (as in your original code) makes your script single-instance-only.
All Redis commands must be analyzed before execution to determine
which keys the command will operate on. In order for this to be true
for EVAL, keys must be passed explicitly. This is useful in many ways,
but especially to make sure Redis Cluster can forward your request to
the appropriate cluster node.
Note this rule is not enforced in order to provide the user with
opportunities to abuse the Redis single instance configuration, at the
cost of writing scripts not compatible with Redis Cluster.
Finally, the Lua version in Redis 3.0.0 is Lua 5.1.5, the same as all the way up to Redis 6 RC1.

redis from python display in cli

I am pushing data into Redis from Python like this:
ts = datetime.datetime.now().timestamp()
if msg.field == 2:
    seq = [ts, 'ask', msg.price]
    r.rpush(contractTuple[0], *seq)
I expect the inserted data (seq) to be one object in Redis. However, when I look at the data from redis-cli, the fields of the Python list are on separate lines:
127.0.0.1:6379> lrange ES 0 -13
406) "1523994426.496158"
407) "ask"
408) "2699.5"
127.0.0.1:6379>
Is this the way redis-cli displays data (strange if true imo), or am I pushing data into redis incorrectly?
See: http://redis-py.readthedocs.io/en/latest/index.html#redis.StrictRedis.rpush:
rpush(name, *values)
Push values onto the tail of the list name
Redis doesn't have a concept of "objects". If you want these values to be grouped, you'll have to implement your own methods to (de)serialize them into strings.
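For instance, here's a minimal sketch using JSON as the serialization format (the ES key and field names mirror the question; any encoding that round-trips to a string works):
import json
import redis

r = redis.Redis()

entry = {'ts': 1523994426.496158, 'side': 'ask', 'price': 2699.5}

# Serialize the whole record into one string so it lands in a single list element.
r.rpush('ES', json.dumps(entry))

# Read the last element back and deserialize it.
record = json.loads(r.lrange('ES', -1, -1)[0])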

Apache Ignite sql query returns only cache contents, not complete results from database

My Ignite nodes (2 server nodes - let's call them A and B) are configured as follows:
ccfg.setCacheMode(CacheMode.PARTITIONED);
ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
ccfg.setReadThrough(true);
ccfg.setWriteThrough(true);
ccfg.setWriteBehindEnabled(true);
ccfg.setWriteBehindBatchSize(10000);
Node A is started first, from command line as follows:
apache-ignite-fabric-2.2.0-bin>bin/ignite.bat config/default-config.xml
Node B is started from java code by running
public static void main(String[] args) throws Exception {
    Ignite ignite = Ignition.start(ServerConfigurationFactory.createConfiguration());
    ignite.cache("MyCache").loadCache(null);
    ...
}
(the jar containing ServerConfigurationFactory is put in the apache-ignite-fabric-2.2.0-bin\libs directory so that nodes A and B join the same cluster; otherwise there is an error)
I have a query that is supposed to return 9061 results from the database. After the cache loading process on node B, I went to the Web Console and ran a simple count SQL statement against the caches. There is a button "Execute on selected node" that lets you choose a specific node to query. Querying node A I got a count of 2341, and on node B I got a count of 2064. If I just use the "Execute" button I get 4405, which is the total of nodes A and B. Obviously 4656 records are missing (9061 total records in the db - 4405 in nodes A and B). I also ran the same count query in Java code using SqlFieldsQuery and I also get 4405.
Since readThrough is set to true I expected Ignite to also return results that are not in memory. But this is not the case because it just returns whatever is on the cache. Am I doing something wrong here? Thank you.
Read-through works only for key-value APIs, so the SQL engine assumes that all required data is preloaded from the database prior to running a query.
If your data set doesn't fit in memory and you can't preload all the data, you can use native Ignite persistence storage: https://apacheignite.readme.io/docs/distributed-persistent-store

How redis pipe-lining works in pyredis?

I am trying to understand how pipelining in Redis works. According to one blog I read, for this code:
Pipeline pipeline = jedis.pipelined();
long start = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    pipeline.set("" + i, "" + i);
}
List<Object> results = pipeline.execute();
Every call to pipeline.set() effectively sends the SET command to Redis (you can easily see this by setting a breakpoint inside the loop and querying Redis with redis-cli). The call to pipeline.execute() is when the reading of all the pending responses happens.
So basically, when we use pipelining and execute any command like set above, the command gets executed on the server, but we don't collect the response until we call pipeline.execute().
However, according to the documentation of pyredis,
Pipelines are a subclass of the base Redis class that provide support for buffering multiple commands to the server in a single request.
I think this implies that when we use pipelining, all the commands are buffered and sent to the server only when we call pipe.execute(), so this behaviour is different from the behaviour described above.
Could someone please tell me what the right behaviour is when using pyredis?
This is not just a redis-py thing. In Redis, pipelining always means buffering a set of commands and then sending them to the server all at once. The main point of pipelining is to avoid extraneous network round trips, which are frequently the bottleneck when running commands against Redis. If each command were sent to Redis before the pipeline was run, this would not be the case.
You can test this in practice. Open up python and:
import redis
r = redis.Redis()
p = r.pipeline()
p.set('blah', 'foo') # this buffers the command. it is not yet run.
r.get('blah') # pipeline hasn't been run, so this returns nothing.
p.execute()
r.get('blah') # now that we've run the pipeline, this returns "foo".
I did run the test that you described from the blog, and I could not reproduce the behaviour.
Setting breakpoints in the for loop, and running
redis-cli info | grep keys
does not show the size increasing after every set command.
Speaking of which, the code you pasted seems to be Java using Jedis (which I also used).
And in the test I ran, consistent with the documentation, there is no execute() method in Jedis, only exec() and sync().
I did see the values being set in redis after the sync() command.
Besides, this behaviour agrees with the pyredis documentation.
Finally, the redis documentation itself focuses on networking optimization (Quoting the example)
This time we are not paying the cost of RTT for every call, but just one time for the three commands.
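To see the round-trip savings concretely, here is a rough benchmark sketch in redis-py (the bench: key names and iteration count are arbitrary; absolute timings depend on your network):
import time
import redis

r = redis.Redis()

# One round trip per command.
start = time.time()
for i in range(10000):
    r.set('bench:%d' % i, i)
print('individual SETs: %.2fs' % (time.time() - start))

# All commands buffered client-side, then sent in one batch.
start = time.time()
pipe = r.pipeline()
for i in range(10000):
    pipe.set('bench:%d' % i, i)
pipe.execute()
print('pipelined SETs:  %.2fs' % (time.time() - start))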
P.S. Could you share the link to the blog you read?

Use Multiple DBs With One Redis Lua Script?

Is it possible to have one Redis Lua script hit more than one database? I currently have information of one type in DB 0 and information of another type in DB 1. My normal workflow is doing updates on DB 1 based on an API call along with meta information from DB 0. I'd love to do everything in one Lua script, but can't figure out how to hit multiple dbs. I'm doing this in Python using redis-py:
lua_script(keys=some_keys,
           args=some_args,
           client=some_client)
Since the client implies a specific db, I'm stuck. Ideas?
It is usually a wrong idea to put related data in different Redis databases. There is almost no benefit compared to defining namespaces by key naming conventions (no extra granularity regarding security, persistence, expiration management, etc ...). And a major drawback is the clients have to manually handle the selection of the correct database, which is error prone for clients targeting multiple databases at the same time.
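As an illustration of the key-naming alternative, here is a sketch that keeps both kinds of data in a single database under distinct prefixes (the meta: and data: prefixes are just examples), so one Lua script can reach both without any SELECT:
import redis

r = redis.Redis()  # one logical database for everything

# Key prefixes take the place of separate databases.
r.set('meta:user:42', 'meta info')   # what used to live in DB 0
r.set('data:user:42', 'payload')     # what used to live in DB 1

# A single script can touch both "namespaces" through plain KEYS.
script = r.register_script("""
local meta = redis.call('GET', KEYS[1])
return redis.call('SET', KEYS[2], 'updated using ' .. meta)
""")
script(keys=['meta:user:42', 'data:user:42'])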
Now, if you still want to use multiple databases, there is a way to make it work with redis-py and Lua scripting.
redis-py does not define a wrapper for the SELECT command (normally used to switch the current database), because of the underlying thread-safe connection pool implementation. But nothing prevents you to call SELECT from a Lua script.
Consider the following example:
$ redis-cli
SELECT 0
SET mykey db0
SELECT 1
SET mykey db1
The following script displays the value of mykey in the 2 databases from the same client connection.
import redis

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

# Naive: selects the target database but never restores the original one.
lua1 = """
redis.call("select", ARGV[1])
return redis.call("get", KEYS[1])
"""
script1 = r.register_script(lua1)

# Safer: takes the target and current databases, and switches back at the end.
lua2 = """
redis.call("select", ARGV[1])
local ret = redis.call("get", KEYS[1])
redis.call("select", ARGV[2])
return ret
"""
script2 = r.register_script(lua2)

print(r.get("mykey"))                        # db 0
print(script2(keys=["mykey"], args=[1, 0]))  # db 1, then switches back to db 0
print(r.get("mykey"), "ok")                  # still db 0

print(r.get("mykey"))                        # db 0
print(script1(keys=["mykey"], args=[1]))     # db 1, connection left on db 1!
print(r.get("mykey"), "misleading !!!")      # now reads db 1
Script lua1 is naive: it just selects a given database before returning the value. Its usage is misleading, because after its execution the current database associated with the connection has changed. Don't do this.
Script lua2 is much better. It takes the target database and the current database as parameters. It makes sure that the current database is reactivated before the end of the script, so that the next commands applied on the connection still run against the correct database.
Unfortunately, there is no command to discover the current database from within a Lua script, so the client has to provide it systematically. Please note the Lua script must reset the current database at the end whatever happens (even in case of an earlier error), which makes complex scripts cumbersome and awkward.