ComputeTaskFuture map is returned even when a cluster group of invalid node IDs is specified - Ignite

ClusterGroup clusterGroup = ignite.cluster().forNodeIds(invalidNodeIds);
final Map<IgniteUuid, ComputeTaskFuture<Object>> computeTaskFutures = ignite.compute(clusterGroup).activeTaskFutures();
When fictitious invalidNodeIds are passed in, I see valid futures being returned, even though the backing ClusterGroupAdapter has an empty node ID set (Set ids). Isn't this wrong?
Thanks

IgniteCompute.activeTaskFutures() is a local operation, i.e. it returns futures for tasks that were started on the current (local) node. Therefore, the cluster group is not applicable to this method.

Related

PromQL: How to get CPU usage of all ReplicaSets and containers given the cluster, namespace, and deployment?

I am trying to write a PromQL query in the Prometheus UI to get the CPU usage of all ReplicaSets and their containers for a fixed cluster, namespace, and deployment. My desired outcome is to graph the CPU usage of each {replicaset, container} pair on the same graph. Since container_cpu_usage_seconds_total has no label that lets me group by ReplicaSet name, I am sure I have to retrieve the ReplicaSet name from the kube_pod_info metric, somehow aggregate by container, and then join the different metrics to get what I want.
Below is what I have come up with so far:
avg by (replicaset, container) (
container_cpu_usage_seconds_total
* on (replicaset) group_left (created_by_kind, created_by_name)
(kube_pod_info {app_kubernetes_io_name="kube-state-metrics", cluster=~"${my_cluster}", namespace=~"${my_namespace}", created_by_kind=~"ReplicaSet"} * 0)
)
I got an error saying "many-to-many matching not allowed: matching labels must be unique on one side".
My desired output is:
*{r, c} denotes a particular {replicaset, container} pair
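One way to see where the error comes from is a small model of PromQL's one-to-many (group_left) matching. This is a simplified sketch, not Prometheus code, and the label sets in it are hypothetical and heavily trimmed. The rule it encodes is that every series on the "one" (right-hand) side must have a unique value for the on(...) labels. As far as I know, neither container_cpu_usage_seconds_total (cAdvisor) nor kube_pod_info (kube-state-metrics) carries a replicaset label, so under on (replicaset) every series gets the same empty signature and the right-hand side is full of duplicates.

```python
from collections import Counter


def signature(labels, on):
    # A label missing from a series matches as the empty string, so
    # on(replicaset) over series without that label gives every series
    # the same (empty) signature.
    return tuple(labels.get(name, "") for name in on)


def one_side_is_unique(right, on):
    """True if every right-hand ("one" side) series has a distinct signature."""
    counts = Counter(signature(s, on) for s in right)
    return all(c == 1 for c in counts.values())


def matched_count(left, right, on):
    """Simplified model of `left * on(...) group_left(...) right`: raise the
    many-to-many error if the one side has duplicates, otherwise return how
    many left-hand series found a partner."""
    if not one_side_is_unique(right, on):
        raise ValueError("many-to-many matching not allowed: "
                         "matching labels must be unique on one side")
    right_sigs = {signature(s, on) for s in right}
    return sum(1 for s in left if signature(s, on) in right_sigs)


# Hypothetical label sets for two pods of one ReplicaSet
cpu_usage = [
    {"pod": "web-7d9f-abc", "container": "app"},
    {"pod": "web-7d9f-abc", "container": "sidecar"},
    {"pod": "web-7d9f-xyz", "container": "app"},
]
pod_info = [
    {"pod": "web-7d9f-abc", "created_by_kind": "ReplicaSet", "created_by_name": "web-7d9f"},
    {"pod": "web-7d9f-xyz", "created_by_kind": "ReplicaSet", "created_by_name": "web-7d9f"},
]
```

In this model on(replicaset) fails the uniqueness check while on(pod) passes and matches all three container series, which suggests joining on a label both metrics really share. Whether your cAdvisor series use pod or an older name such as pod_name depends on your Kubernetes version, so check the actual labels first.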

Redis - Count-distinct problem (without HyperLogLog)

I need to solve a count-distinct problem in Redis without using HyperLogLog (because of its known 0.81% standard error).
I receive different requests, each with a list of objects [O1, O2, ... On] for a specific key A.
For each list of objects received, Redis should store the objects not yet saved and return the number of new objects saved.
For Example:
Request 1 : Key: A - Objects: [O1, O2, O3] -> Response 1: Number of new objects : 3
Request 2 : Key: A - Objects: [O1, O2, O4] -> Response 2: Number of new objects : 1
Request 3 : Key: A - Objects: [O1, O2, O4] -> Response 3: Number of new objects : 0
I have tried to solve this problem with HyperLogLog and it works perfectly, but as the dataset of objects grows, the reported number of new objects becomes less accurate.
With sets and hashes, memory usage grows too much.
I have read some material about Bitmaps, but it is not very clear to me. Do you know of any projects that have already faced this problem?
Thanks in advance
You might want to consider using a Bloom filter. This is available as a module: https://redis.com/redis-best-practices/bloom-filter-pattern/.
Bloom filters allow quick membership tests with zero false negatives and a very low false-positive rate, provided you know the maximum number of elements in advance. You would need to write code of this sort:
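result = r.bf().exists(key, item)  # r is a redis-py client; assumes the RedisBloom module is loaded
if result == 0:
    r.bf().add(key, item)
    r.incr(key_count)  # there is no "bf.inc"; a plain Redis counter does the job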
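The check-then-add logic can be tried out without Redis using a small in-process Bloom filter. This is a sketch under stated assumptions: the bit-array size, the number of hashes, and the SHA-256-derived bit positions are arbitrary choices here, not RedisBloom's internals. Instead of calling exists and then add, it relies on the return value of add, which is also how RedisBloom's BF.ADD reports whether an item was newly added (1) or may already have existed (0); that saves a round trip and avoids a race between the two calls.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, m_bits=1 << 16, k=7):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        """Set the item's bits; return True if any bit was previously unset,
        meaning the item was definitely not in the filter before."""
        added = False
        for p in self._positions(item):
            byte, mask = p // 8, 1 << (p % 8)
            if not self.bits[byte] & mask:
                self.bits[byte] |= mask
                added = True
        return added


def count_new(bf, objects):
    # A false positive makes a new object look "already seen",
    # so this count can only err on the low side, never the high side.
    return sum(1 for obj in objects if bf.add(obj))


# The three requests for key A from the question
bf_a = BloomFilter()
responses = [count_new(bf_a, req) for req in (["O1", "O2", "O3"],
                                              ["O1", "O2", "O4"],
                                              ["O1", "O2", "O4"])]
```

With only a handful of items spread over 65,536 bits the false-positive chance is negligible, so the responses match the question's 3, 1 and 0. In RedisBloom you would size the filter up front with BF.RESERVE (error rate and capacity) so the false-positive rate stays low as the dataset grows.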

Neo4j: How to pass a variable to Neo4j Apoc (apoc.path.subgraphAll) Property

I am new to Neo4j and am doing a POC that implements a graph DB for an Enterprise Reference / Integration Architecture (all enterprise applications as nodes, underlying tables / APIs logically grouped as nodes, and integrations between apps as relationships).
The objective is seamless 'Impact Analysis' using the strengths of a graph DB. (Note: I understand this may be the wrong approach for what I am trying to achieve, so suggestions are welcome.)
Let me briefly describe my question now.
There are four apps: A1, A2, A3, A4. A1 has a set of tables (represented by the node A1TS1) that is updated by Integration 1 (a relationship in this case), and the same set of tables is read by Integration 2. So the data model looks like this:
(A1TS1)<-[:INT1]-(A1)<-[:INT1]-(A2)
(A1TS1)-[:INT2]->(A1)-[:INT2]->(A4)
I have the underlying application table names captured as a list property in the A1TS1 node.
Let's say one of the app tables is altered (a new column or a data type change) and I want to find all impacted integrations and applications. I am trying to write a query like the ones below to retrieve all nodes and relationships associated with or impacted by this table alteration, but I cannot get it to work.
Expected result: all impacted nodes (A1TS1, A1, A2, A4) and relationships (INT1, INT2)
Option 1 (Using APOC)
MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b)
WITH a as STRTND, Collect(type(r)) as allr
CALL apoc.path.subgraphAll(STRTND, {relationshipFilter:allr}) YIELD nodes, relationships
RETURN nodes, relationships
This fails with the error: Failed to invoke procedure 'apoc.path.subgraphAll': Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
Option 2 (Using with, unwind, collect clause)
MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b)
WITH a as STRTND, Collect(r) as allr
UNWIND allr as rels
MATCH p=()-[rels]-()-[rels]-()
RETURN p
This fails with the error "Cannot use the same relationship variable 'rels' for multiple patterns". If I use [rels] only once, like p=()-[rels]-(), it works but does not return all the nodes.
Any help/suggestion/lead is appreciated. Thanks in advance
Update
Trying to give more context
Showing the Underlying Data
MATCH (TC:TBLCON) RETURN TC
"TC"
{"Tables":["TBL1","TBL2","TBL3"],"TCName":"A1TS1","AppName":"A1"}
{"Tables":["TBL4","TBL1"],"TCName":"A2TS1","AppName":"A2"}
MATCH (A:App) RETURN A
"A"
{"Sponsor":"XY","Platform":"Oracle","TechOwnr":"VV","Version":"12","Tags":["ERP","OracleEBS","FinanceSystem"],"AppName":"A1"}
{"Sponsor":"CC","Platform":"Teradata","TechOwnr":"RZ","Tags":["EDW","DataWarehouse"],"AppName":"A2"}
MATCH ()-[r]-() RETURN distinct r.relname
"r.relname"
"FINREP"  (runs between A1 and other apps)
"UPFRNT"  (runs between A2 and a different Salesforce app)
"INVOICE" (runs between A1 and other apps)
With this, here is what I am trying to achieve:
Assume "TBL3" is being altered in app A1. I want to write a query that specifies the table "TBL3" in the match pattern and gets all associated relationships and connected (upstream) nodes.
Maybe I need to do this in 3 steps:
Step 1 - Write a match pattern to find the start node and the associated relationship(s)
Step 2 - Store the relationship(s) from step 1 in an array variable / parameter
Step 3 - Pass the start node from step 1 and the parameter from step 2 to apoc.path.subgraphAll to see all the impacted nodes
This may sound conceptually valid, but the question is how to do it technically in a Neo4j Cypher query.
Hope this helps
This query may do what you want:
MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
MATCH p=(tc)-[:Foo*]-()
WITH tc,
REDUCE(s = [], x IN COLLECT(NODES(p)) | s + x) AS ns,
REDUCE(t = [], y IN COLLECT(RELATIONSHIPS(p)) | t + y) AS rs
UNWIND ns AS n
WITH tc, rs, COLLECT(DISTINCT n) AS nodes
UNWIND rs AS rel
RETURN tc, nodes, COLLECT(DISTINCT rel) AS rels;
It assumes that you provide the name of the table of interest (e.g., "TBL3") as the value of a table parameter. It also assumes that the relationships of interest all have the Foo type.
It first finds tc, the TBLCON node(s) containing that table name. It then uses a variable-length non-directional search for all paths (with non-repeating relationships) that include tc. It then uses COLLECT twice: to aggregate the list of nodes in each path, and to aggregate the list of relationships in each path. Each aggregation result would be a list of lists, so it uses REDUCE on each outer list to merge the inner lists. It then uses UNWIND and COLLECT(DISTINCT x) on each list to produce a list with unique elements.
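The REDUCE-then-UNWIND/COLLECT(DISTINCT) steps amount to "flatten a list of lists, then de-duplicate". A Python sketch of the same idea, using hypothetical path contents built from the question's node names:

```python
from itertools import chain


def merge_unique(list_of_lists):
    """Flatten, like REDUCE(s = [], x IN lists | s + x), then drop repeats,
    like UNWIND ... COLLECT(DISTINCT ...). Keeps first-seen order, which
    Cypher's COLLECT(DISTINCT) does not promise."""
    flat = chain.from_iterable(list_of_lists)
    return list(dict.fromkeys(flat))


# Node lists of three hypothetical paths that all start at A1TS1
path_nodes = [
    ["A1TS1", "A1"],
    ["A1TS1", "A1", "A2"],
    ["A1TS1", "A1", "A4"],
]
unique_nodes = merge_unique(path_nodes)
```

The same function applied to the per-path relationship lists gives the de-duplicated relationship list.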
[UPDATE]
If you differentiate between your relationships by type (rather than by property value), your Cypher code can be a lot simpler by taking advantage of APOC functions. The following query assumes that the desired relationship types are passed via a types parameter:
MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
CALL apoc.path.subgraphAll(
tc, {relationshipFilter: apoc.text.join($types, '|')}) YIELD nodes, relationships
RETURN nodes, relationships;
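To see what the relationshipFilter string does, here is a simplified Python model of an undirected, type-filtered expansion over the A1/A2/A4 example from the question. It is an assumption-laden sketch, not APOC's implementation: APOC's uniqueness rules and exactly which relationships it returns may differ. The filter string is built the way apoc.text.join($types, '|') would build it, and a type name with no < or > is treated as "either direction".

```python
from collections import deque


def subgraph_all(rels, start, relationship_filter):
    """Breadth-first expansion from `start`, following relationships in either
    direction whose type appears in the '|'-separated filter string.
    Returns the reachable nodes and the relationships traversed."""
    allowed = set(relationship_filter.split("|"))
    adjacency = {}
    for rel in rels:
        src, rel_type, dst = rel
        if rel_type in allowed:
            adjacency.setdefault(src, []).append(rel)
            adjacency.setdefault(dst, []).append(rel)

    nodes, found = {start}, set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel in adjacency.get(node, []):
            found.add(rel)
            src, _, dst = rel
            other = dst if src == node else src
            if other not in nodes:
                nodes.add(other)
                queue.append(other)
    return nodes, found


# (source, type, target) triples for the question's data model
example_rels = [
    ("A1", "INT1", "A1TS1"),  # (A1TS1)<-[:INT1]-(A1)
    ("A2", "INT1", "A1"),     # (A1)<-[:INT1]-(A2)
    ("A1TS1", "INT2", "A1"),  # (A1TS1)-[:INT2]->(A1)
    ("A1", "INT2", "A4"),     # (A1)-[:INT2]->(A4)
]
types = ["INT1", "INT2"]
rel_filter = "|".join(types)  # like apoc.text.join($types, '|')
impacted_nodes, impacted_rels = subgraph_all(example_rels, "A1TS1", rel_filter)
```

This reproduces the question's expected result: A2 and A4 are both reachable from A1TS1 through A1 once INT1 and INT2 are allowed, while a filter of just "INT1" stops at A2.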
With some leads from cybersam's response, the query below gets me what I want. The only constraint is that the result is limited to 3 layers (the 3rd layer comes through the OPTIONAL MATCH).
MATCH (TC:TBLCON) WHERE 'TBL3' IN TC.Tables
CALL apoc.path.subgraphAll(TC, {maxLevel:1}) YIELD nodes AS invN, relationships AS invR
WITH TC, REDUCE (tmpL=[], tmpr IN invR | tmpL+type(tmpr)) AS impR
MATCH FLP=(TC)-[]-()-[FLR]-(SL) WHERE type(FLR) IN impR
WITH FLP, TC, SL,impR
OPTIONAL MATCH SLP=(SL)-[SLR]-() WHERE type(SLR) IN impR RETURN FLP,SLP
This works for my needs, hope this might also help someone.
Thanks everyone for the responses and suggestions
****Update****
I enhanced the query to get rid of the OPTIONAL MATCH and the other limitations mentioned above:
MATCH (initTC:TBLCON) WHERE $TL IN initTC.Tables
WITH Reduce(O="",OO in Reduce (I=[], II in collect(apoc.node.relationship.types(initTC)) | I+II) | O+OO+"|") as RF
MATCH (TC:TBLCON) WHERE $TL IN TC.Tables
CALL apoc.path.subgraphAll(TC,{relationshipFilter:RF}) YIELD nodes, relationships
RETURN nodes, relationships
Thanks all (especially cybersam)

Neo4j: "ghost" node in label index throws error

I have a neo4j database with a set of nodes with label :EXAMPLE.
There are two operations. First I delete one node, and then I look for another one. They are executed separately via the Neo4j API.
MATCH (n:EXAMPLE {Name: { name1 }}) DELETE n;
and
MATCH (n:EXAMPLE {Name: { name2 }}) RETURN n;
Sometimes, when I execute the second query, it throws the error "Node with id 123". Node 123 is the same node that was deleted by the first query.
It happens when a lot of requests are hitting the database simultaneously.
My guess is that this happens when the node has been deleted but the EXAMPLE label index has not been updated yet. Two facts support this theory:
1) The error is intermittent.
2) If I change the second query like this (removing the label), I don't get the error:
MATCH (n {Name: { name2 }}) RETURN n;
Neo4j version is 2.1.5, Java - OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2~deb7u1) and operation system is Debian. There are no other indexes in the database except the label.
The question is: how can I fix this while still using labels?
What happens (simplified) is that the operations interleave like this:
Q1: MATCH (n)
Q2: DELETE (n), COMMIT
Q1: RETURN n # Error, n no longer exists
For implementation reasons, this is much more likely to happen if Cypher goes through an index. The database will eventually handle this for you, but for now you'll need to wrap that read query in a retry block: if it fails with this type of error, simply run it again.
On that note, there are other errors that are easily recoverable from by retrying, such as deadlock errors, so wrapping your statements and/or transactions in retry-blocks is a useful thing to do in general.
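A retry block of the kind suggested above might look like this. It is a generic sketch: TransientError is a hypothetical stand-in for whatever exception your Neo4j driver raises for this case or for deadlocks, and the attempt count and backoff delays are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are safe to retry, such as the
    'Node with id 123' error above or a deadlock."""


def with_retry(fn, attempts=3, base_delay=0.05, retry_on=(TransientError,)):
    """Call fn(); on a retryable error, back off exponentially and try again.
    Re-raises the last error once all attempts are used up."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


def make_flaky_query(failures):
    """Hypothetical read query that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def query():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise TransientError("Node with id 123")
        return "row"

    return query
```

Only errors that really are transient belong in retry_on; retrying a genuine bug just repeats the failure.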
This is a possible workaround:
Mark nodes as deleted instead of deleting them. Ignore nodes that are marked as deleted. Delete all such nodes at once with a garbage collector.
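The mark-and-sweep workaround can be modeled in a few lines. This in-memory sketch only illustrates the idea; the store, the deleted flag and collect() are hypothetical names. In Neo4j the flag would be a node property that every read query filters on, and the garbage collector a periodic batch delete.

```python
class SoftDeleteStore:
    """In-memory model of the workaround: 'delete' only sets a flag, reads skip
    flagged nodes, and collect() removes all flagged nodes in one pass."""

    def __init__(self):
        self.nodes = {}  # node id -> properties

    def create(self, node_id, name):
        self.nodes[node_id] = {"Name": name, "deleted": False}

    def delete(self, node_id):
        # Mark instead of removing; readers filter on the flag
        self.nodes[node_id]["deleted"] = True

    def find_by_name(self, name):
        return [node_id for node_id, props in self.nodes.items()
                if props["Name"] == name and not props["deleted"]]

    def collect(self):
        """Garbage collector: physically delete every marked node at once."""
        doomed = [node_id for node_id, props in self.nodes.items() if props["deleted"]]
        for node_id in doomed:
            del self.nodes[node_id]
        return len(doomed)


# Usage mirroring the two queries in the question
store = SoftDeleteStore()
store.create(123, "name1")
store.create(124, "name2")
store.delete(123)
```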

NHibernate not finding named query result sets in 2nd level cache

I have a simple unit test where I execute the same NHibernate named query twice (a different session each time) with an identical parameter. It's a simple int parameter, and since my query is a named query, I assume these 2 calls are identical and the results should be cached.
In fact, I can see in my log that the results ARE being cached, but with different keys. So, my 2nd query results are never found in cache.
Here's a snippet from my log (note how the keys are different):
(first query)
DEBUG NHibernate.Caches.SysCache2.SysCacheRegion [(null)] <(null)> - adding new data: key= [snipped]... parameters: ['809']; named parameters: {}#743460424 & value=System.Collections.Generic.List`1[System.Object]
(second query)
DEBUG NHibernate.Caches.SysCache2.SysCacheRegion [(null)] <(null)> - adding new data: key=[snipped]... parameters: ['809']; named parameters: {}#704749285 & value=System.Collections.Generic.List`1[System.Object]
I have NHibernate set up to use the query cache, and these queries are set to cacheable=true. I don't know where else to look. Does anyone have any suggestions?
Thanks
-Mike
Okay, I figured this out. I was executing my named query using the following syntax:
IQuery q = session.GetNamedQuery("MyQuery")
.SetResultTransformer(Transformers.AliasToBean(typeof(MyDTO)))
.SetCacheable(true)
.SetCacheRegion("MyCacheRegion");
(which, I might add, is EXACTLY how the NHibernate docs tell you to do it... but I digress ;) )
If you create a new AliasToBean transformer for every query, then each query's cache key (which includes the transformer) will be unique, and you will never get a cache hit. So, in short, if you do it the way the NHibernate docs say, caching won't work.
Instead, create your transformer once in a static member variable and use it for your query, and caching will work, like this:
private static readonly IResultTransformer myTransformer = Transformers.AliasToBean(typeof(MyDTO));
...
IQuery q = session.GetNamedQuery("MyQuery")
.SetResultTransformer(myTransformer)
.SetCacheable(true)
.SetCacheRegion("MyCacheRegion");
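Why a fresh transformer defeats the cache can be shown with a small Python model. It is an assumption-based sketch, not NHibernate code: the cache key is modeled as (query name, parameters, transformer), and the transformer class defines no equality of its own, so two instances compare by identity, much like a .NET class that does not override Equals/GetHashCode.

```python
class TransformerLike:
    """Stand-in for a result transformer without custom Equals/GetHashCode:
    equality and hashing fall back to object identity."""

    def __init__(self, dto_name):
        self.dto_name = dto_name


query_cache = {}


def run_cached(query_name, params, transformer, execute):
    """Return (result, cache_hit). The transformer is part of this
    hypothetical cache key, as the answer describes."""
    key = (query_name, tuple(params), transformer)
    if key in query_cache:
        return query_cache[key], True
    result = execute()
    query_cache[key] = result
    return result, False


# One transformer shared by every call, like the static member above
shared_transformer = TransformerLike("MyDTO")
```

Two calls with freshly created transformers both miss, while two calls with shared_transformer miss once and then hit. Whether this still bites depends on whether your NHibernate version's built-in transformers implement equality.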