How to set up a pr.job_type.Murnaghan job that, for each volume, reads the output structure of another Murnaghan job? - pyiron

For energy-volume (E-V) calculations of non-cubic structures, one has to relax the structure at every volume.
Suppose I start with a pr.job_type.Murnaghan job whose reference job, ref_job_relax, relaxes the cell shape and the internal coordinates. Let's call this Murnaghan job R1; it has 7 volumes, i.e. R1-V1, ..., R1-V7.
After one or more rounds of relaxation (R1...RN), one has to perform a static calculation to obtain a precise energy. Let's call the final static round S.
For the final round, I want to create a pr.job_type.Murnaghan job that reads all the required settings from ref_job_static, except the input structures.
Then, for each volume, S-Vn should read the corresponding output structure of RN-Vn, e.g. R1-V1-->S-V1, ..., R1-V7-->S-V7 if there was only one round of relaxation.
I am looking for an implementation like the one below:
murn_relax = pr.create_job(pr.job_type.Murnaghan, 'R1')
murn_relax.ref_job = ref_job_relax
murn_relax.run()
murn_static = pr.create_job(pr.job_type.Murnaghan, 'S', continuation=True)
murn_static.ref_job = ref_job_static
murn_static.structures_from(prev_job='R1')
murn_static.run()

The Murnaghan object has two relevant functions:
get_structure() https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L829
list_structures() https://github.com/pyiron/pyiron_atomistics/blob/master/pyiron_atomistics/atomistics/master/murnaghan.py#L727
The first returns the predicted equilibrium structure and the second returns the structures at the different volumes.
In addition you can get the IDs of the children and iterate over those:
structure_lst = [
    pr.load(job_id).get_structure()
    for job_id in murn_relax.child_ids
]
to get a list of converged structures.
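Building on the loop above, a minimal sketch of the R1-Vn --> S-Vn mapping could look like the following. It does not create a Murnaghan master job; it only creates one static job per relaxed child. The helper name static_jobs_from_relaxed and the job-name prefix are hypothetical, and the use of copy_template() to clone ref_job_static is an assumption about the job API that should be checked against your pyiron version. The order of child_ids is also assumed to follow the volumes.

```python
def static_jobs_from_relaxed(pr, murn_relax, ref_job_static, prefix="S_V"):
    """Create one static job per relaxed child job (hypothetical helper).

    pr             : the pyiron Project that holds the jobs
    murn_relax     : the finished relaxation Murnaghan job (e.g. R1)
    ref_job_static : a configured, not yet executed static reference job
    """
    static_jobs = []
    for n, job_id in enumerate(murn_relax.child_ids, start=1):
        relaxed_child = pr.load(job_id)  # child job RN-Vn
        # Assumption: copy_template() clones the input of the reference job
        job = ref_job_static.copy_template(
            project=pr, new_job_name="%s%d" % (prefix, n)
        )
        # Use the final (relaxed) structure of RN-Vn as input of S-Vn
        job.structure = relaxed_child.get_structure()
        job.run()
        static_jobs.append(job)
    return static_jobs
```

The energies and volumes of the returned jobs would then still have to be collected and fitted separately, since no Murnaghan master job is involved in this sketch.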

Related

PromQL: How to get cpu usage of all replicasets and containers given the cluster, namespace, and deployment?

I am trying to write a PromQL query in the Prometheus UI to get the CPU usage of all replicasets and their containers for a fixed cluster, namespace, and deployment. My desired outcome is to graph the CPU usage of each {replicaset, container} pair on the same graph. Since container_cpu_usage_seconds_total has no label that lets me group by replicaset name, I believe I have to retrieve the replicaset name from the kube_pod_info metric, aggregate by container, and then join the metrics to get what I want.
Below is what I came up with right now:
avg by (replicaset, container) (
container_cpu_usage_seconds_total
* on (replicaset) group_left (created_by_kind, created_by_name)
(kube_pod_info {app_kubernetes_io_name="kube-state-metrics", cluster=~"${my_cluster}", namespace=~"${my_namespace}", created_by_kind=~"ReplicaSet"} * 0)
)
I got an error saying "many-to-many matching not allowed: matching labels must be unique on one side".
My desired output is:
*{r, c} means certain {replicaset, container} pair

Create subgraph query in Gremlin around single node with outgoing and incoming edges

I have a large JanusGraph database and I'd like to create a subgraph centered around one node type, including incoming and outgoing nodes of specific types.
In Cypher, the query would look like this:
MATCH (a:Journal)<-[:PublishedIn]-(b:Paper{paperTitle:'My Paper Title'})<-[:AuthorOf]-(c:Author)
RETURN a,b,c
This is what I tried in Gremlin:
sg = g.V().outE('PublishedIn').subgraph('j_p_a').has('Paper','paperTitle', 'My Paper Title')
.inE('AuthorOf').subgraph('j_p_a')
.cap('j_p_a').next()
But I get a syntax error. 'AuthorOf' and 'PublishedIn' are not the only edge types ending at 'Paper' nodes.
Can someone show me how to correctly execute this query in Gremlin?
As written in your query, the outE step yields edges, so the has step checks properties on those edges. After that, the query processor expects an inV, not another inE. Without your data model it is hard to know exactly what you need; however, looking at the Cypher, I think this is what you want.
sg = g.V().outE('PublishedIn').
       subgraph('j_p_a').
       inV().
       has('Paper','paperTitle', 'My Paper Title').
       inE('AuthorOf').
       subgraph('j_p_a').
       cap('j_p_a').
       next()
Edited to add:
As I do not have your data I used my air-routes graph. I modeled this query on yours and used some select steps to limit the data size processed. This seems to work in my testing. Hopefully you can see the changes I made and try those in your query.
sg = g.V().outE('route').as('a').
       inV().
       has('code','AUS').as('b').
       select('a').
       subgraph('sg').
       select('b').
       inE('contains').
       subgraph('sg').
       cap('sg').
       next()

Neo4j: How to pass a variable to Neo4j Apoc (apoc.path.subgraphAll) Property

I am new to Neo4j and am doing a POC that implements a graph DB for an Enterprise Reference / Integration Architecture (an architecture showing all enterprise applications as nodes, the underlying tables / APIs logically grouped as nodes, and the integrations between apps as relationships).
The objective is to do 'Impact Analysis' seamlessly using the strengths of a graph DB. (Note: I understand this may be the wrong approach for what I am trying to achieve, so suggestions are welcome.)
Let me briefly state my question.
There are four apps: A1, A2, A3, A4. A1 has a set of tables (represented by a node A1TS1) that is updated by Integration 1 (a relationship in this case), and the same set of tables is read by Integration 2. So the data model looks like this:
(A1TS1)<-[:INT1]-(A1)<-[:INT1]-(A2)
(A1TS1)-[:INT2]->(A1)-[:INT2]->(A4)
I have the underlying application table names captured as a list property on the A1TS1 node.
Let's say one of the app tables is altered (a new column or a changed data type) and I want to understand all impacted integrations and applications. I am trying to write a query like the ones below to retrieve all nodes and relationships that are impacted by this table alteration, but I have not been able to get it to work.
The expected result is all impacted nodes (A1TS1, A1, A2, A4) and relationships (INT1, INT2).
Option 1 (Using APOC)
MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b)
WITH a as STRTND, Collect(type(r)) as allr
CALL apoc.path.subgraphAll(STRTND, {relationshipFilter:allr}) YIELD nodes, relationships
RETURN nodes, relationships
This fails with the error: Failed to invoke procedure 'apoc.path.subgraphAll': Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
Option 2 (Using with, unwind, collect clause)
MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b)
WITH a as STRTND, Collect(r) as allr
UNWIND allr as rels
MATCH p=()-[rels]-()-[rels]-()
RETURN p
This fails with the error "Cannot use the same relationship variable 'rels' for multiple patterns". If I use [rels] only once, as in p=()-[rels]-(), it works but does not return all the nodes.
Any help/suggestion/lead is appreciated. Thanks in advance
Update
Trying to give more context
Showing the Underlying Data
MATCH (TC:TBLCON) RETURN TC
"TC"
{"Tables":["TBL1","TBL2","TBL3"],"TCName":"A1TS1","AppName":"A1"}
{"Tables":["TBL4","TBL1"],"TCName":"A2TS1","AppName":"A2"}
MATCH (A:App) RETURN A
"A"
{"Sponsor":"XY","Platform":"Oracle","TechOwnr":"VV","Version":"12","Tags":["ERP","OracleEBS","FinanceSystem"],"AppName":"A1"}
{"Sponsor":"CC","Platform":"Teradata","TechOwnr":"RZ","Tags":["EDW","DataWarehouse"],"AppName":"A2"}
MATCH ()-[r]-() RETURN distinct r.relname
"r.relname"
"FINREP" │ (runs between A1 to other apps)
"UPFRNT" │ (runs between A2 to different Salesforce App)
"INVOICE" │ (runs between A1 to other apps)
With this, here is what I am trying to achieve:
Assume "TBL3" is being altered in app A1. I want to write a query that specifies the table "TBL3" in the match pattern and gets all associated relationships and connected nodes (upstream).
Maybe I need to do this in 3 steps:
Step 1 - Write a match pattern to find the start node and its associated relationship(s)
Step 2 - Store the relationship(s) from step 1 in an array variable / parameter
Step 3 - Pass the start node from step 1 and the parameter from step 2 to apoc.path.subgraphAll to see all the impacted nodes
This may sound conceptually valid, but the question is how to do it technically in a Neo4j Cypher query.
Hope this helps
This query may do what you want:
MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
MATCH p=(tc)-[:Foo*]-()
WITH tc,
REDUCE(s = [], x IN COLLECT(NODES(p)) | s + x) AS ns,
REDUCE(t = [], y IN COLLECT(RELATIONSHIPS(p)) | t + y) AS rs
UNWIND ns AS n
WITH tc, rs, COLLECT(DISTINCT n) AS nodes
UNWIND rs AS rel
RETURN tc, nodes, COLLECT(DISTINCT rel) AS rels;
It assumes that you provide the name of the table of interest (e.g., "TBL3") as the value of a table parameter. It also assumes that the relationships of interest all have the Foo type.
It first finds tc, the TBLCON node(s) containing that table name. It then uses a variable-length non-directional search for all paths (with non-repeating relationships) that include tc. It then uses COLLECT twice: to aggregate the list of nodes in each path, and to aggregate the list of relationships in each path. Each aggregation result would be a list of lists, so it uses REDUCE on each outer list to merge the inner lists. It then uses UNWIND and COLLECT(DISTINCT x) on each list to produce a list with unique elements.
[UPDATE]
If you differentiate between your relationships by type (rather than by property value), your Cypher code can be a lot simpler by taking advantage of APOC functions. The following query assumes that the desired relationship types are passed via a types parameter:
MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
CALL apoc.path.subgraphAll(
tc, {relationshipFilter: apoc.text.join($types, '|')}) YIELD nodes, relationships
RETURN nodes, relationships;
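The ClassCastException from Option 1 in the question comes from handing relationshipFilter a list, whereas the query above passes a single string with the types joined by '|'. As a rough client-side sketch (no database call is made here), the same string could be built in Python before the query is sent. The helper name relationship_filter and the parameter name rel_filter are hypothetical; the type names INT1 and INT2 are taken from the question.

```python
def relationship_filter(types):
    """Join relationship type names into the 'A|B|C' form (hypothetical helper)."""
    # dict.fromkeys drops duplicates while keeping the first-seen order
    unique_types = list(dict.fromkeys(types))
    return "|".join(unique_types)


# Same query as above, but with the filter string passed as one parameter
QUERY = """
MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
CALL apoc.path.subgraphAll(tc, {relationshipFilter: $rel_filter})
YIELD nodes, relationships
RETURN nodes, relationships
"""

params = {
    "table": "TBL3",
    "rel_filter": relationship_filter(["INT1", "INT2", "INT1"]),
}
print(params["rel_filter"])
```

QUERY and params would then be passed to whichever driver or client you use to run Cypher.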
With some lead from cybersam's response, the query below gets me what I want. The only constraint is that the result is limited to 3 layers (the 3rd layer comes through the OPTIONAL MATCH).
MATCH (TC:TBLCON) WHERE 'TBL3' IN TC.Tables
CALL apoc.path.subgraphAll(TC, {maxLevel:1}) YIELD nodes AS invN, relationships AS invR
WITH TC, REDUCE (tmpL=[], tmpr IN invR | tmpL+type(tmpr)) AS impR
MATCH FLP=(TC)-[]-()-[FLR]-(SL) WHERE type(FLR) IN impR
WITH FLP, TC, SL,impR
OPTIONAL MATCH SLP=(SL)-[SLR]-() WHERE type(SLR) IN impR RETURN FLP,SLP
This works for my needs, hope this might also help someone.
Thanks everyone for the responses and suggestions
****Update****
Enhanced the query to get rid of Optional Match criteria and other given limitations
MATCH (initTC:TBLCON) WHERE $TL IN initTC.Tables
WITH Reduce(O="",OO in Reduce (I=[], II in collect(apoc.node.relationship.types(initTC)) | I+II) | O+OO+"|") as RF
MATCH (TC:TBLCON) WHERE $TL IN TC.Tables
CALL apoc.path.subgraphAll(TC,{relationshipFilter:RF}) YIELD nodes, relationships
RETURN nodes, relationships
Thanks all (especially cybersam)

pushgateway or node exporter - how to add string data?

I have a cron job that runs an SQL query every day and gives me an important integer.
I have to expose that integer to the Prometheus server.
As far as I can see, I have two options: use the pushgateway or the node exporter.
But the metric (integer) that I get from the SQL query also needs some extra information (like the company name and the database it came from).
What would be the better way?
For instance this is what I made for my metric:
count = some number
registry = CollectorRegistry()
g = Gauge('machine_number', 'machfoobarine_stat', registry=registry).set(count)
push_to_gateway('localhost:9091', job='batchA', registry=registry)
So how do I add key-value pairs to my metric above?
Do I have to change the job name ('batchA') for every single sql count that I get and expose as a metric to the pushgateway, because I can only see the last one?
Tnx,
Tom
The best way is to give your metric a general name, for example animal_count, and then specialize it with labels. Here is some pseudocode:
g = Gauge.build("animal_count", "Number of animals in the zoo")
    .labelNames("sex", "classes")
    .create();
g.labels("male", "mammals")
    .set(count);

Redis sorted sets and best way to store uids

I have data consisting of user_ids and tags of these user ids.
The user_ids occur multiple times and have a pre-specified number of tags (500); however, that might change in the future. What must be stored is the user_id, its tags, and their counts.
Later I want to easily find the tags with the top scores, etc. Every time a tag appears, its count is incremented.
My implementation in redis is done using sorted sets
every user_id is a sorted set
key is user_id and is a hex number
works like this:
zincrby user_id:x 1 "tag0"
zincrby user_id:x 1 "tag499"
zincrby user_id:y 1 "tag3"
and so on
Keeping in mind that I want to get the tags with the highest scores, is there a better way?
The second issue is that right now I'm using "keys *" to retrieve these keys for client-side manipulation, which I know is not meant for production systems.
It would also help with memory problems to iterate through a specified number of keys at a time (in the range of 10000). I know that the keys have to be stored in memory; however, they don't follow a specific pattern that would allow partial retrieval and let me avoid the "zmalloc" error (4GB 64 bit debian server).
The keys number around 20 million.
Any thoughts?
My first point would be to note that 4 GB is tight for storing 20M sorted sets. A quick try shows that 20M users, each with 20 tags, would take about 8 GB on a 64-bit box (and this accounts for the sorted set ziplist memory optimizations provided with Redis 2.4; don't even try this with earlier versions).
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
As you pointed out, KEYS cannot be used to iterate on keys. It is rather meant as a debug command. To support key iteration, you need to add a data structure to provide this access path. The only structures in Redis which can support iteration are the list and the sorted set (through the range methods). However, they tend to transform O(n) iteration algorithms into O(n^2) (for list), or O(nlogn) (for zset). A list is also a poor choice to store keys since it will be difficult to maintain it as keys are added/removed.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a specific user with a bucket, and add the user id to the set corresponding to this bucket. If the user ids are numeric values, a simple modulo function will be enough. If they are not, a simple string hashing function will do the trick.
So to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 function. user:1000 and user:2000 will be put in bucket index:0 while user:1001 will be put in bucket index:1.
So on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
In the sets, the prefix of the keys is not needed, and it allows Redis to optimize the memory consumption by serializing the sets provided they are kept small enough (integer sets optimization proposed by Sripathi Krishnan).
The global iteration consists in a simple loop on the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command is applied to retrieve the corresponding set, and the client can then iterate on the individual items.
Here is an example in Python:
#!/usr/bin/env python3
# ----------------------------------------------------
import random

import redis

POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
NUSERS = 10000
NTAGS = 500
NBUCKETS = 1000

# ----------------------------------------------------
# Fill redis with some random data
def fill(r):
    p = r.pipeline()
    # Create only 10000 users for this example
    for uid in range(NUSERS):
        user = "user:%d" % uid
        # Add the user in the index: a simple modulo is used to hash the user id
        # and put it in the correct bucket
        p.sadd("index:%d" % (uid % NBUCKETS), uid)
        # Add random tags to the user
        for _ in range(20):
            tag = "tag:%d" % random.randint(0, NTAGS)
            # Argument order for redis-py 3.0 and later: name, amount, value
            p.zincrby(user, 1, tag)
        # Flush the pipeline every 1000 users
        if uid % 1000 == 0:
            p.execute()
            print(uid)
    # Flush one last time
    p.execute()

# ----------------------------------------------------
# Iterate on all the users and display their 5 highest ranked tags
def iterate(r):
    # Iterate on the buckets of the key index
    # The range depends on the function used to hash the user id
    for x in range(NBUCKETS):
        # Iterate on the users in this bucket
        for uid in r.smembers("index:%d" % x):
            user = "user:%d" % int(uid)
            print(user, r.zrevrangebyscore(user, "+inf", "-inf", start=0, num=5, withscores=True))

# ----------------------------------------------------
# Main function
def main():
    r = redis.Redis(connection_pool=POOL)
    # WARNING: flushall() wipes every database of this Redis instance
    r.flushall()
    m = r.info()["used_memory"]
    fill(r)
    info = r.info()
    print("Keys: ", info["db0"]["keys"])
    print("Memory: ", info["used_memory"] - m)
    iterate(r)

# ----------------------------------------------------
main()
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add/remove users, and true O(n) complexity to iterate on all items. The only downside is the key iteration order is random.
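Since the user ids in the question are hex strings rather than plain integers, the modulo step needs a string hash first. Below is a minimal sketch of such a bucket function, using zlib.crc32 because Python's built-in hash() of a string changes between processes. A plain dict of sets stands in for the Redis index:N sets so that the sketch runs without a server; the sample ids and the helper names are made up.

```python
import zlib

NBUCKETS = 1000  # same number of buckets as in the example above


def bucket_for(user_id, nbuckets=NBUCKETS):
    """Map an arbitrary string id to a stable bucket number."""
    # crc32 is deterministic across runs, unlike the built-in hash() for str
    return zlib.crc32(user_id.encode("utf-8")) % nbuckets


def index_key(user_id):
    return "index:%d" % bucket_for(user_id)


# In-memory stand-in for the Redis sets index:0 ... index:999
index = {}
for uid in ["a3f9", "00ff", "beef", "a3f9"]:
    # With a real server this would be: r.sadd(index_key(uid), uid)
    index.setdefault(index_key(uid), set()).add(uid)


def iterate_users(index, nbuckets=NBUCKETS):
    """Walk the buckets in order and yield every user id exactly once."""
    for bucket in range(nbuckets):
        # With a real server this would be: r.smembers("index:%d" % bucket)
        for uid in sorted(index.get("index:%d" % bucket, ())):
            yield uid


print(list(iterate_users(index)))
```

With string ids the sets can no longer use the integer set encoding mentioned above, so the memory per bucket will likely be higher than in the numeric case.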