Access to Mongoid 3 master node

How do you get access to the master node in Mongoid > 3.0?
In Mongoid < 3.0 you could use:
Mongoid::Config.master.eval('...')
The closest I can find in 3 seems to be:
klass.collection.database.command(eval: '...') #=> failed with error "not master"
Is there a better way to get access to master? Or a way to ensure the command is evaluated by the master node?

Mongoid 3.0 uses Moped rather than the 10gen driver, so see Moped::Cluster#with_primary:
http://rubydoc.info/github/mongoid/moped/master/Moped/Cluster:with_primary
For example:
User.collection.database.session.cluster.with_primary do
  p User.collection.database.command(eval: 'function() { return 3+3; }')
  p User.collection.database.command(ping: 1)
end
output:
{"retval"=>6.0, "ok"=>1.0}
{"ok"=>1.0}
Note that alternatives such as group, the aggregation framework, and map-reduce are generally recommended over eval.

getting running job id from BigQueryOperator using xcom

I want to get BigQuery's job id from BigQueryOperator.
I saw the following line in the bigquery_operator.py file:
context['task_instance'].xcom_push(key='job_id', value=job_id)
I don't know whether this is Airflow's job id or the BigQuery job id. If it is the BigQuery job id, how can I get it from a downstream task using XCom?
I tried to do the following in a downstream PythonOperator:
def write_statistics(**kwargs):
    job_id = kwargs['templates_dict']['job_id']
    print('tamir')
    print(kwargs['ti'].xcom_pull(task_ids='create_tmp_big_query_table', key='job_id'))
    print(kwargs['ti'])
    print(job_id)
t3 = BigQueryOperator(
    task_id='create_tmp_big_query_table',
    bigquery_conn_id='bigquery_default',
    destination_dataset_table=DATASET_TABLE_NAME,
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    sql="""
    #standardSQL...
The UI is great for checking whether an XCom was written, and I'd recommend doing that before you try to reference it in a separate task, so you don't have to worry about whether you're fetching it correctly. Click your create_tmp_big_query_table task -> Task Instance Details -> XCom to see any values the task pushed.
In your case, the code looks right to me, but I'm guessing your version of Airflow doesn't have the change that saves the job id into an XCom. This feature was added in https://github.com/apache/airflow/pull/5195, which is currently only on master and not yet part of the most recent stable release (1.10.3). See for yourself in the 1.10.3 version of the BigQueryOperator.
Your options are to wait for it to land in a release (which sometimes takes a while), to run off a version of master with that change, or to temporarily copy the newer version of the operator into your project as a custom operator. In the last case, I'd suggest naming it something like BigQueryOperatorWithXcom, with a note to replace it with the built-in operator once it's released.
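A minimal sketch of such a custom operator, assuming Airflow 1.10.x contrib import paths; the BigQueryOperatorWithXcom name is the hypothetical one suggested above, only a few of run_query's keyword arguments are passed through, and the hook wiring may differ from the actual upstream change:
# Hedged sketch only: assumes Airflow 1.10.x contrib paths and a plain string `sql`.
from airflow.contrib.hooks.bigquery_hook import BigQueryHook
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

class BigQueryOperatorWithXcom(BigQueryOperator):
    """Runs the query like BigQueryOperator and pushes the BigQuery job id to XCom."""

    def execute(self, context):
        hook = BigQueryHook(
            bigquery_conn_id=self.bigquery_conn_id,
            use_legacy_sql=self.use_legacy_sql,
            delegate_to=self.delegate_to,
        )
        cursor = hook.get_conn().cursor()
        job_id = cursor.run_query(
            sql=self.sql,
            destination_dataset_table=self.destination_dataset_table,
            write_disposition=self.write_disposition,
        )
        # Mirror the upstream change: expose the BigQuery job id to downstream tasks.
        context['task_instance'].xcom_push(key='job_id', value=job_id)
        return job_id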
The job id within bigquery_operator.py is the BigQuery job id. You can see this by looking at the preceding lines:
if isinstance(self.sql, str):
    job_id = self.bq_cursor.run_query(
        sql=self.sql,
        destination_dataset_table=self.destination_dataset_table,
        write_disposition=self.write_disposition,
        allow_large_results=self.allow_large_results,
        flatten_results=self.flatten_results,
        udf_config=self.udf_config,
        maximum_billing_tier=self.maximum_billing_tier,
        maximum_bytes_billed=self.maximum_bytes_billed,
        create_disposition=self.create_disposition,
        query_params=self.query_params,
        labels=self.labels,
        schema_update_options=self.schema_update_options,
        priority=self.priority,
        time_partitioning=self.time_partitioning,
        api_resource_configs=self.api_resource_configs,
        cluster_fields=self.cluster_fields,
        encryption_configuration=self.encryption_configuration
    )
elif isinstance(self.sql, Iterable):
    job_id = [
        self.bq_cursor.run_query(
            sql=s,
            destination_dataset_table=self.destination_dataset_table,
            write_disposition=self.write_disposition,
            allow_large_results=self.allow_large_results,
            flatten_results=self.flatten_results,
            udf_config=self.udf_config,
            maximum_billing_tier=self.maximum_billing_tier,
            maximum_bytes_billed=self.maximum_bytes_billed,
            create_disposition=self.create_disposition,
            query_params=self.query_params,
            labels=self.labels,
            schema_update_options=self.schema_update_options,
            priority=self.priority,
            time_partitioning=self.time_partitioning,
            api_resource_configs=self.api_resource_configs,
            cluster_fields=self.cluster_fields,
            encryption_configuration=self.encryption_configuration
        )
        for s in self.sql]
Eventually, the run_with_configuration method returns self.running_job_id from BigQuery.
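For completeness, here is a hedged sketch of wiring the pushed job id into the downstream write_statistics PythonOperator from the question, assuming Airflow 1.10.x, provide_context=True, and the task names used above:
# Hedged sketch: templates_dict is templated, so Jinja pulls the XCom at runtime
# and hands it to write_statistics via kwargs['templates_dict']['job_id'].
from airflow.operators.python_operator import PythonOperator

t4 = PythonOperator(
    task_id='write_statistics',
    python_callable=write_statistics,
    provide_context=True,
    templates_dict={
        'job_id': "{{ task_instance.xcom_pull(task_ids='create_tmp_big_query_table', key='job_id') }}",
    },
)
t3 >> t4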

Aerospike AQL count(*) SQL analogue script

Ok, so the problem is that I need to do aggregation queries in Aerospike's aql console. Specifically, I would like to take the average of a bin across the records in a set, and to count all the records in a set. I am not sure how to even begin...
aql> SHOW SETS will give you the number of objects in your sets, in the n_objects column.
You can then use the n_objects value to calculate your average.
SQL-like aggregation functions are implemented in Aerospike using stream UDFs, which are written in Lua. A stream UDF is a map-reduce operation that is applied on a stream of records returned from a scan or secondary index query.
The stream UDF module (let's assume it's contained in the file aggr_funcs.lua) would implement COUNT(*) by returning 1 for each record it sees, and reducing to an aggregated integer value.
local function one(record)
    return 1
end

local function sum(v1, v2)
    return v1 + v2
end

function count_star(stream)
    return stream : map(one) : reduce(sum)
end
You would register the UDF module with the server, then invoke it. Here's an example of how you'd do that in Python using aerospike.Query.apply:
import aerospike
from aerospike import predicates as p

config = {'hosts': [('127.0.0.1', 3000)],
          'lua': {'system_path': '/usr/local/aerospike/lua/',
                  'user_path': '/usr/local/aerospike/usr-lua/'}}
client = aerospike.client(config).connect()

# The aggr_funcs module must already be registered with the server,
# for example via client.udf_put('aggr_funcs.lua') or through aql.
query = client.query('test', 'demo')
# query.where(p.between('my_val', 1, 9))  # optionally use a WHERE predicate
query.apply('aggr_funcs', 'count_star')
num_records = query.results()  # results() returns a list holding the aggregation result
client.close()
However, for metrics such as the number of objects you should use an info command. Aerospike has an info subsystem that is used by the command-line tools such as asinfo, the AMC dashboard, and the info methods of the language clients.
To get the number of objects in the cluster:
asinfo -h 33.33.33.91 -v 'objects'
23773
You can also get the number of objects in a specific namespace. I have a two-node cluster, and I'll query each node:
asinfo -h 33.33.33.91 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=12274;master-sub-objects=0;prole-objects=11499;prole-sub-objects=0;expired-objects=0;evicted-objects=0;set-deleted-objects=0;nsup-cycle-duration=0;nsup-cycle-sleep-pct=0;used-bytes-memory=2139672;data-used-bytes-memory=618200;index-used-bytes-memory=1521472;sindex-used-bytes-memory=0;free-pct-memory=99;max-void-time=202176396;non-expirable-objects=0;current-time=201744558;stop-writes=false;hwm-breached=false;available-bin-names=32765;used-bytes-disk=6085888;free-pct-disk=99;available_pct=99;memory-size=2147483648;high-water-disk-pct=50;high-water-memory-pct=60;evict-tenths-pct=5;evict-hist-buckets=10000;stop-writes-pct=90;cold-start-evict-ttl=4294967295;repl-factor=2;default-ttl=432000;max-ttl=0;conflict-resolution-policy=generation;single-bin=false;ldt-enabled=false;ldt-page-size=8192;enable-xdr=false;sets-enable-xdr=true;ns-forward-xdr-writes=false;allow-nonxdr-writes=true;allow-xdr-writes=true;disallow-null-setname=false;total-bytes-memory=2147483648;read-consistency-level-override=off;write-commit-level-override=off;migrate-order=5;migrate-sleep=1;migrate-tx-partitions-initial=4096;migrate-tx-partitions-remaining=0;migrate-rx-partitions-initial=4096;migrate-rx-partitions-remaining=0;migrate-tx-partitions-imbalance=0;total-bytes-disk=8589934592;defrag-lwm-pct=50;defrag-queue-min=0;defrag-sleep=1000;defrag-startup-minimum=10;flush-max-ms=1000;fsync-max-sec=0;max-write-cache=67108864;min-avail-pct=5;post-write-queue=0;data-in-memory=true;file=/opt/aerospike/data/test.dat;filesize=8589934592;writethreads=1;writecache=67108864;obj-size-hist-max=100
asinfo -h 33.33.33.92 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=11499;master-sub-objects=0;prole-objects=12274;prole-sub-objects=0;expired-objects=0;evicted-objects=0;set-deleted-objects=0;nsup-cycle-duration=0;nsup-cycle-sleep-pct=0;used-bytes-memory=2139672;data-used-bytes-memory=618200;index-used-bytes-memory=1521472;sindex-used-bytes-memory=0;free-pct-memory=99;max-void-time=202176396;non-expirable-objects=0;current-time=201744578;stop-writes=false;hwm-breached=false;available-bin-names=32765;used-bytes-disk=6085888;free-pct-disk=99;available_pct=99;memory-size=2147483648;high-water-disk-pct=50;high-water-memory-pct=60;evict-tenths-pct=5;evict-hist-buckets=10000;stop-writes-pct=90;cold-start-evict-ttl=4294967295;repl-factor=2;default-ttl=432000;max-ttl=0;conflict-resolution-policy=generation;single-bin=false;ldt-enabled=false;ldt-page-size=8192;enable-xdr=false;sets-enable-xdr=true;ns-forward-xdr-writes=false;allow-nonxdr-writes=true;allow-xdr-writes=true;disallow-null-setname=false;total-bytes-memory=2147483648;read-consistency-level-override=off;write-commit-level-override=off;migrate-order=5;migrate-sleep=1;migrate-tx-partitions-initial=4096;migrate-tx-partitions-remaining=0;migrate-rx-partitions-initial=4096;migrate-rx-partitions-remaining=0;migrate-tx-partitions-imbalance=0;total-bytes-disk=8589934592;defrag-lwm-pct=50;defrag-queue-min=0;defrag-sleep=1000;defrag-startup-minimum=10;flush-max-ms=1000;fsync-max-sec=0;max-write-cache=67108864;min-avail-pct=5;post-write-queue=0;data-in-memory=true;file=/opt/aerospike/data/test.dat;filesize=8589934592;writethreads=1;writecache=67108864;obj-size-hist-max=100
Notice that the master-objects values of the two nodes add up to the cluster-wide objects value.
To get the number of objects in a set:
asinfo -h 33.33.33.91 -v 'sets/test/demo'
n_objects=23771:n-bytes-memory=618046:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false:set-delete=false;
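If you would rather gather these metrics from Python, here is a minimal hedged sketch using the client's info call; the method name and the shape of the returned value vary between client versions (newer clients expose info_all/info_node instead), so treat it as illustrative:
import aerospike

# Hedged sketch: assumes the older client.info() API; the return value is
# typically a dict keyed by node, but the exact shape depends on the version.
client = aerospike.client({'hosts': [('33.33.33.91', 3000)]}).connect()

print(client.info('objects'))          # cluster-wide object counts, per node
print(client.info('namespace/test'))   # per-namespace statistics
print(client.info('sets/test/demo'))   # per-set statistics, including n_objects

client.close()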

Neo4j - cypher query optimization

I am pretty new to Neo4j, and we are trying to use it with our PHP / SQL Server based application. I am using Neo4j 2.0 milestone 6. Some of the relevant configuration variables:
wrapper.java.initmemory=5000
wrapper.java.maxmemory=5000
Now coming to the question: I am trying to write a Cypher query that traverses the graph and calculates allocation amounts. The structure of the graph is roughly as follows: I have a department as a starting point which has a certain amount, which is then allocated out to a product, which in turn allocates out to other products, and so on. The allocation can be n levels deep. Now I need to calculate the amounts for all the products.
The cypher query that I am using is as below:
MATCH (f:financial) WHERE f.amount <> 0
WITH f
MATCH f-[r_allocates:Allocates*]->(n_allocation)
RETURN
n_allocation.cost_pool_hierarchy_id,
SUM(reduce(totalamt=f.amount, r IN r_allocates| (r.allocate)*totalamt/100 )) as amt
ORDER BY n_allocation.cost_pool_hierarchy_id
LIMIT 1000
This query takes 30+ seconds with warmed-up caches. I have tried going back to Neo4j 1.9, as I found in some posts that Neo4j 2.0 is not yet optimized, but the similar query in 1.9 takes 40+ seconds.
Here is the output from profiler:
==> ColumnFilter(symKeys=["n_allocation.cost_pool_hierarchy_id", " INTERNAL_AGGREGATE2db722c4-6400-4803-90a2-b883b0076e8b"], returnItemNames=["n_allocation.cost_pool_hierarchy_id", "amt"], _rows=746, _db_hits=0)
==> Top(orderBy=["SortItem(Cached(n_allocation.cost_pool_hierarchy_id of type Any),true)"], limit="Literal(1000)", _rows=746, _db_hits=0)
==> EagerAggregation(keys=["Cached(n_allocation.cost_pool_hierarchy_id of type Any)"], aggregates=["( INTERNAL_AGGREGATE2db722c4-6400-4803-90a2-b883b0076e8b,Sum(ReduceFunction(r_allocates,r,Divide(Multiply(Product(r,allocate(11),true),totalamt),Literal(100)),totalamt,Product(f,amount(9),true))))"], _rows=746, _db_hits=11680622)
==> Extract(symKeys=["f", "n_allocation", " UNNAMED55", "r_allocates"], exprKeys=["n_allocation.cost_pool_hierarchy_id"], _rows=3906768, _db_hits=3906768)
==> PatternMatch(g="(f)-[' UNNAMED55']-(n_allocation)", _rows=3906768, _db_hits=0)
==> Filter(pred="(NOT(Product(f,amount(9),true) == Literal(0)) AND hasLabel(f:financial(6)))", _rows=9959, _db_hits=34272)
==> NodeByLabel(label="financial", identifier="f", _rows=34272, _db_hits=0)
I would appreciate any help in optimizing this query. Do I need to update some configuration settings? Or do I need to change the structure of graph?
Update:
Just to add further: even the relatively simple query below takes 26 seconds:
MATCH p=(f:financial)-[r*2]->(n) RETURN COUNT(p)

neo4j V2.0.0 M5 "java.lang.NullPointerException"

After upgrading to Neo4j V2.0.0 M5, I ran into the subject error when running a Cypher query in my web app. To isolate the issue, I tried the following similar queries in the basic Neo4j console (http://console.neo4j.org/):
START n=node(*)
WHERE n.name ='Neo'
RETURN n
Result: (6 {name:"Neo"})
Next, I tested matching on a regular expression using "=~":
START n=node(*)
WHERE n.name =~'Neo.*'
RETURN n
Result: Error: java.lang.NullPointerException
Next, I tested case-insensitive matching by prepending the regular expression with (?i):
START n=node(*)
WHERE n.name =~'(?i)Neo'
RETURN n
Result: Error: java.lang.NullPointerException
And finally, I tested both regex matching and case insensitivity with =~ '(?i)neo.*':
MATCH n
WHERE n.name =~ '(?i)neo.*'
RETURN n
Result: Error: java.lang.NullPointerException
I believe the issue is with "=~". Can anyone else recreate these errors? Shouldn't all of these queries have returned the "Neo" node? If not, please let me know why.
Thank you,
Jeff
The great people at Neo4j tell me that this is a milestone-release bug. As with all milestone releases, we were warned. I am reverting to the stable release, Neo4j 1.9.3, and will test again with M6.

Use Multiple DBs With One Redis Lua Script?

Is it possible to have one Redis Lua script hit more than one database? I currently have information of one type in DB 0 and information of another type in DB 1. My normal workflow is to update DB 1 based on an API call, along with meta information from DB 0. I'd love to do everything in one Lua script, but I can't figure out how to hit multiple DBs. I'm doing this in Python using redis-py:
lua_script(keys=some_keys,
           args=some_args,
           client=some_client)
Since the client implies a specific DB, I'm stuck. Any ideas?
It is usually a bad idea to put related data in different Redis databases. There is almost no benefit compared to defining namespaces by key naming conventions (no extra granularity regarding security, persistence, expiration management, etc.), and a major drawback is that clients have to handle database selection manually, which is error-prone for clients targeting multiple databases at the same time.
Now, if you still want to use multiple databases, there is a way to make it work with redis-py and Lua scripting.
redis-py does not define a wrapper for the SELECT command (normally used to switch the current database) because of its thread-safe connection pool implementation. But nothing prevents you from calling SELECT from a Lua script.
Consider the following example:
$ redis-cli
SELECT 0
SET mykey db0
SELECT 1
SET mykey db1
The following script displays the value of mykey in the 2 databases from the same client connection.
import redis
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)
lua1 = """
redis.call("select", ARGV[1])
return redis.call("get",KEYS[1])
"""
script1 = r.register_script(lua1)
lua2 = """
redis.call("select", ARGV[1])
local ret = redis.call("get",KEYS[1])
redis.call("select", ARGV[2])
return ret
"""
script2 = r.register_script(lua2)
print r.get("mykey")
print script2( keys=["mykey"], args = [1,0] )
print r.get("mykey"), "ok"
print
print r.get("mykey")
print script1( keys=["mykey"], args = [1] )
print r.get("mykey"), "misleading !!!"
Script lua1 is naive: it just selects a given database before returning the value. Its usage is misleading because, after it executes, the current database associated with the connection has changed. Don't do this.
Script lua2 is much better: it takes the target database and the current database as parameters, and it makes sure the current database is reactivated before the end of the script, so that subsequent commands on the connection still run against the correct database.
Unfortunately, there is no command to discover the current database from within the Lua script, so the client has to provide it systematically. Note also that the Lua script must reset the current database at the end no matter what happens (even after an error), which makes complex scripts cumbersome and awkward.
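As a hedged sketch of that "reset even on error" case with redis-py, the inner command can go through redis.pcall so a failure does not skip the SELECT that restores the caller's database (the key and database numbers are illustrative; newer Redis versions scope a script's SELECT to the script itself, which makes the restore harmless but unnecessary):
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# redis.pcall returns an error as a table instead of aborting the script,
# so the second SELECT always runs and the caller's database is restored.
lua_safe = """
redis.call("select", ARGV[1])
local ret = redis.pcall("get", KEYS[1])
redis.call("select", ARGV[2])
if type(ret) == "table" and ret.err then
    return redis.error_reply(ret.err)
end
return ret
"""
safe_script = r.register_script(lua_safe)

# Read mykey from db 1 while keeping the connection on db 0.
print(safe_script(keys=["mykey"], args=[1, 0]))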