Returning a map object in cypher - cypher

I need to create edges between a set of nodes but it is not guaranteed that the edge is not exists already, I need to know which edges has been created so I can increment the edges counter for the two connected nodes.
I want to know the edges count for every node without querying the graph each time.
Example:
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
MERGE (g2)-[m2:member]->(u)
Sometimes the user is already a member of the group so I don't want to increment the counter in this case.
I tried to use the result statistics but it returns the created relationships number only, I thought also about using a map and then fill the content using ON CREATE SET after MERGE:
WITH {g1:0, g2:0} as res
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
ON CREATE SET res.g1 = 1
MERGE (g2)-[m2:member]->(u)
ON CREATE SET res.g2 = 1
RETURN res
But it does not works; the server crashes immediately after executing the query.
Exception:
------ FAST MEMORY TEST ------
17235:M 28 Feb 2022 16:56:50.016 # main thread terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #0 terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #1 terminated
17235:M 28 Feb 2022 16:56:50.018 # Bio thread for job type #2 terminated
Fast memory test PASSED, however your memory can still be broken.
Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7fbfe3dcc000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/redis/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
Segmentation fault
Any ideas?
Thanks in advance

Neo4j stores already a counter inside each node to count the number of relationships and to provide a fast count access. When you want to get the number of members in a group, you can simply do:
MATCH (g:group)
return size((g)<-[:member]-())

Related

GitLab API - get the overall # of lines of code

I'm able to get the stats (additions, deletions, total) for each commit, however how can I get the overall #?
For example, if one MR has 30 commits, I need the net # of lines of code added\deleted which you can see in the top corner.
This # IS NOT the sum of all #'s per commit.
So, I would need an API that returns the net # of lines of code added\removed at MR level (no matter how many commits are).
For example, if I have 2 commits: 1st one adds 10 lines, and the 2nd one removes the exact same 10 lines, then the net # is 0.
Here is the scenario:
I have an MR with 30 commits.
GitLab API provides support to get the stats (lines of code added\deleted) per Commit (individually).
If I go in GitLab UI, go to the MR \ Changes, I see the # of lines added\deleted that is not the SUM of all the Commits stats that I'm getting thru API.
That's my issue.
A simpler example: let's say I have 2 commits, one adds 10 lines of code, while the 2nd commit removes the exact same 10 lines of code. Using the API, I'm getting the sum, which is 20 LOCs added. However, if I go in the GitLab UI \ Changes, it's showing me 0 (zero), which is correct; that's the net # of chgs overall. This is the inconsistency I noticed.
To do this for an MR, you would use the MR changes API and count the occurrences of lines starting with + and - in the changes[].diff fields to get the additions and deletions respectively.
Using bash with gitlab-org/gitlab-runner!3195 as an example:
GITLAB_HOST="https://gitlab.com"
PROJECT_ID="250833"
MR_ID="3195"
URL="${GITLAB_HOST}/api/v4/projects/${PROJECT_ID}/merge_requests/${MR_ID}/changes"
DIFF=$(curl ${URL} | jq -r ".changes[].diff")
ADDITIONS=$(grep -E "^\+" <<< "$DIFF")
DELETIONS=$(grep -E "^\-" <<< "$DIFF")
NUM_ADDITIONS=$(wc -l <<< "$ADDITIONS")
NUM_DELETIONS=$(wc -l <<< "$DELETIONS")
echo "${MR_ID} has ${NUM_ADDITIONS} additions and ${NUM_DELETIONS} deletions"
The output is
3195 has 9 additions and 2 deletions
This matches the UI, which also shows 9 additions and 2 deletions
This, as you can see is a representative example of your described scenario since the combined total of the individual commits in this MR are 13 additions and 6 deletions.

SLURM releasing resources using scontrol update results in unknown endtime

I have a program that will dynamically release resources during job execution, using the command:
scontrol update JobId=$SLURM_JOB_ID NodeList=${remaininghosts}
However, this results in some very weird behavior sometimes. Where the job is re-queued. Below is the output of sacct
sacct -j 1448590
JobID NNodes State Start End NodeList
1448590 4 RESIZING 20:47:28 01:04:22 [0812,0827],[0663-0664]
1448590.0 4 COMPLETED 20:47:30 20:47:30 [0812,0827],[0663-0664]
1448590.1 4 RESIZING 20:47:30 01:04:22 [0812,0827],[0663-0664]
1448590 3 RESIZING 01:04:22 01:06:42 [0812,0827],0663
1448590 2 RESIZING 01:06:42 1:12:42 0827,tnxt-0663
1448590 4 COMPLETED 05:33:15 Unknown 0805-0807,0809]
The first lines show everything works fine, nodes are getting released but in the last line, it shows a completely different set of nodes with an unknown end time. The slurm logs show the job got requeued:
requeue JobID=1448590 State=0x8000 NodeCnt=1 due to node failure.
I suspect this might happen because the head node is killed, but the slurm documentation doesn't say anything about that.
Does anybody had an idea or suggestion?
Thanks
In this post there was a discussion about resizing jobs.
In your particular case, for shrinking I would use:
Assuming that j1 has been submitted with:
$ salloc -N4 bash
Update j1 to the new size:
$ scontrol update jobid=$SLURM_JOBID NumNodes=2
$ scontrol update jobid=$SLURM_JOBID NumNodes=ALL
And update the environmental variables of j1 (the script is created by the previous commands):
$ ./slurm_job_$SLURM_JOBID_resize.sh
Now, j1 has 2 nodes.
In your example, your "remaininghost" list, as you say, may exclude the head node that is needed by Slurm to shrink the job. If you provide a quantity instead of a list, the resize should work.

How to get information on latest successful pod deployment in OpenShift 3.6

I am currently working on making a CICD script to deploy a complex environment into another environment. We have multiple technology involved and I currently want to optimize this script because it's taking too much time to fetch information on each environment.
In the OpenShift 3.6 section, I need to get the last successful deployment for each application for a specific project. I try to find a quick way to do so, but right now I only found this solution :
oc rollout history dc -n <Project_name>
This will give me the following output
deploymentconfigs "<Application_name>"
REVISION STATUS CAUSE
1 Complete config change
2 Complete config change
3 Failed manual change
4 Running config change
deploymentconfigs "<Application_name2>"
REVISION STATUS CAUSE
18 Complete config change
19 Complete config change
20 Complete manual change
21 Failed config change
....
I then take this output and parse each line to know which is the latest revision that have the status "Complete".
In the above example, I would get this list :
<Application_name> : 2
<Application_name2> : 20
Then for each application and each revision I do :
oc rollout history dc/<Application_name> -n <Project_name> --revision=<Latest_Revision>
In the above example the Latest_Revision for Application_name is 2 which is the latest complete revision not building and not failed.
This will give me the output with the information I need which is the version of the ear and the version of the configuration that was used in the creation of the image use for this successful deployment.
But since I have multiple application, this process can take up to 2 minutes per environment.
Would anybody have a better way of fetching the information I required?
Unless I am mistaken, it looks like there are no "one liner" with the possibility to get the information on the currently running and accessible application.
Thanks
Assuming that the currently active deployment is the latest successful one, you may try the following:
oc get dc -a --no-headers | awk '{print "oc rollout history dc "$1" --revision="$2}' | . /dev/stdin
It gets a list of deployments, feeds it to awk to extract the name $1 and revision $2, then compiles your command to extract the details, finally sends it to standard input to execute. It may be frowned upon for not using xargs or the like, but I found it easier for debugging (just drop the last part and see the commands printed out).
UPDATE:
On second thoughts, you might actually like this one better:
oc get dc -a -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.spec.template.spec.containers[0].env}{"\n\t"}{.spec.template.spec.containers[0].image}{"\n-------\n"}{end}'
The example output:
daily-checks
[map[name:SQL_QUERIES_DIR value:daily-checks/]]
docker-registry.default.svc:5000/ptrk-testing/daily-checks#sha256:b299434622b5f9e9958ae753b7211f1928318e57848e992bbf33a6e9ee0f6d94
-------
jboss-webserver31-tomcat
registry.access.redhat.com/jboss-webserver-3/webserver31-tomcat7-openshift#sha256:b5fac47d43939b82ce1e7ef864a7c2ee79db7920df5764b631f2783c4b73f044
-------
jtask
172.30.31.183:5000/ptrk-testing/app-txeq:build
-------
lifebicycle
docker-registry.default.svc:5000/ptrk-testing/lifebicycle#sha256:a93cfaf9efd9b806b0d4d3f0c087b369a9963ea05404c2c7445cc01f07344a35
You get the idea, with expressions like .spec.template.spec.containers[0].env you can reach for specific variables, labels, etc. Unfortunately the jsonpath output is not available with oc rollout history.
UPDATE 2:
You could also use post-deployment hooks to collect the data, if you can set up a listener for the hooks. Hopefully the information you need is inherited by the PODs. More info here: https://docs.openshift.com/container-platform/3.10/dev_guide/deployments/deployment_strategies.html#lifecycle-hooks

Why received ZFS dataset uses less space than original?

I have a dataset on the server1 that I want to back up to the second server2.
Server1 (original):
zfs list -o name,used,avail,refer,creation,usedds,usedsnap,origin,compression,compressratio,refcompressratio,mounted,atime,lused storage/iscsi/webhost-old produces:
NAME USED AVAIL REFER CREATION USEDDS USEDSNAP ORIGIN COMPRESS RATIO REFRATIO MOUNTED ATIME LUSED
storage/iscsi/webhost-old 67,8G 1,87T 67,8G Út kvě 31 6:54 2016 67,8G 16K - lz4 1.00x 1.00x - - 67,4G
Sending volume to the 2nd server:
zfs send storage/iscsi/webhost-old | pv | ssh -c arcfour,aes128-gcm#openssh.com root#10.0.0.2 zfs receive -Fduv pool/bkp-storage
received 69,6GB stream in 378 seconds (189MB/sec)
Server2 zfs list produces:
NAME USED AVAIL REFER CREATION USEDDS USEDSNAP ORIGIN COMPRESS RATIO REFRATIO MOUNTED ATIME LUSED
pool/bkp-storage/iscsi/webhost-old 36,1G 3,01T 36,1G Pá pro 29 10:25 2017 36,1G 0 - lz4 1.15x 1.15x - - 28,4G
Why is there such a difference in sizes? Thanks.
From what you posted, I noticed 3 things that seemed odd:
the compressratio is 1.15x on system 2, but 1.00x on system 1
on system 2, used is 1.27x higher than logicalused
the logicalused and the number zfs receive report are ~2.3x higher on system 1 than system 2
These terms are all defined in the man page, but are still confusing to reverse-engineer explanations for in practice.
(1) could happen if you enabled compression on the source dataset after you wrote all the data to it, since ZFS doesn't rewrite the data to compress it when you enable that setting. The data sent by zfs send is uncompressed unless you use -c, but system 2 will try to compress it as it runs zfs receive if the setting is enabled on the destination dataset. If both system 1 and system 2 had the same compression settings before the data was written, they would have the same compressratio as well.
(2) can happen due to metadata written along with your data, but in this case it's too high for "normal" metadata, which accounts for 1-2% of most pools. It's probably caused by a pool-wide setting, like configuring RAID-Z, or a weird combination of striping and mirroring (like 4 stripes, but with one of them being a mirror).
For (3), I re-read the man page to try to figure it out:
logicalused
The amount of space that is "logically" consumed by this dataset and
all its descendents. See the used property. The logical space
ignores the effect of the compression and copies properties, giving a
quantity closer to the amount of data that applications see.
If you were sending a dataset (instead of a single iSCSI volume) and the send size matched system 2's logicalused value (instead of system 1's), I would guess you forgot to send some child datasets (i.e. by using zfs send -R). However, neither of those are true in this case.
I had to do some additional digging -- this blog post from 2005 might contain the explanation. If system 1 didn't have compression enabled when the data was written (like I guessed above for (1)), the function responsible for not writing zeroed-out blocks (zio_compress_data) would not be run, so you probably have a bunch of empty blocks written to disk, and accounted for in the logicalused size. However, since lz4 is configured on system 2, it would run there, and those blocks would not be counted.

How to parse bhist log

I am using IBM LSF and trying to get usage statistics during a certain period. I found that bhist does the job, but the short form bhist output does not show all of the fields I need.
What I want to know is:
Is bhist's output field customizable? The fields I need are:
<jobid>
<user>
<queue>
<job_name>
<project_name>
<job_description>
<submission_time>
<pending_time>
<run_time>
If 1 is not possible, the long form (bhist -l) output shows everything I need, but the format is hard to manipulate. I've pasted an example of the format below.
For example, the number of line between records is not fixed, and the word wrap in each event may break the line in the middle of a word I'm trying to scan for. How do I parse this format with sed and awk?
JobId <1531>, User <user1>, Project <default>, Command< example200>
Fri Dec 27 13:04:14: Submitted from host <hostA> to Queue <priority>, CWD <$H
OME>, Specified Hosts <hostD>;
Fri Dec 27 13:04:19: Dispatched to <hostD>;
Fri Dec 27 13:04:19: Starting (Pid 8920);
Fri Dec 27 13:04:20: Running with execution home </home/user1>, Execution CWD
</home/user1>, Execution Pid <8920>;
Fri Dec 27 13:05:49: Suspended by the user or administrator;
Fri Dec 27 13:05:56: Suspended: Waiting for re-scheduling after being resumed
by user;
Fri Dec 27 13:05:57: Running;
Fri Dec 27 13:07:52: Done successfully. The CPU time used is 28.3 seconds.
Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
5 0 205 7 1 0 218
------------------------------------------------------------
.... repeat
I'm adding a second answer because it might help you with your problem without actually having to write your own solution (depending on the usage statistics you're after).
LSF already has a utility called bacct that computes and prints out various usage statistics about historical LSF jobs filtered by various criteria.
For example, to get summary usage statistics about jobs that were dispatched/completed/submitted between time0 and time1, you can use (respectively):
bacct -D time0,time1
bacct -C time0,time1
bacct -S time0,time1
Statistics about jobs submitted by a particular user:
bacct -u <username>
Statistics about jobs submitted to a particular queue:
bacct -q <queuename>
These options can be combined as well, so for example if you wanted statistics about jobs that were submitted and completed within a particular time window for a particular project, you can use:
bacct -S time0,time1 -C time0,time1 -P <projectname>
The output provides some summary information about all jobs that match the provided criteria like so:
$ bacct -u bobbafett -q normal
Accounting information about jobs that are:
- submitted by users bobbafett,
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to queues normal,
- accounted on all service classes.
------------------------------------------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 0 Total number of exited jobs: 32
Total CPU time consumed: 46.8 Average CPU time consumed: 1.5
Maximum CPU time of a job: 9.0 Minimum CPU time of a job: 0.0
Total wait time in queues: 18680.0
Average wait time in queue: 583.8
Maximum wait time in queue: 5507.0 Minimum wait time in queue: 0.0
Average turnaround time: 11568 (seconds/job)
Maximum turnaround time: 43294 Minimum turnaround time: 40
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.02 Minimum hog factor of a job: 0.00
Total Run time consumed: 351504 Average Run time consumed: 10984
Maximum Run time of a job: 1844674 Minimum Run time of a job: 0
Total throughput: 0.24 (jobs/hour) during 160.32 hours
Beginning time: Nov 11 17:55 Ending time: Nov 18 10:14
This command also has a long form output that provides some bhist -l-like information about each job that might be a bit easier to parse (although still not all that easy):
$ bacct -l -u bobbafett -q normal
Accounting information about jobs that are:
- submitted by users bobbafett,
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to queues normal,
- accounted on all service classes.
------------------------------------------------------------------------------
Job <101>, User <bobbafett>, Project <default>, Status <EXIT>, Queue <normal>,
Command <sleep 100000000>
Wed Nov 11 17:37:45: Submitted from host <endor>, CWD <$HOME>;
Wed Nov 11 17:55:05: Completed <exit>; TERM_OWNER: job killed by owner.
Accounting information about this job:
CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP
0.00 1040 1040 exit 0.0000 0M 0M
------------------------------------------------------------------------------
...
Long form output is pretty hard to parse. I know bjobs has an option for unformatted output (-UF) in older LSF versions which makes it a bit easier, and the most recent version of LSF allows you to customize which columns get printed in short form output with -o.
Unfortunately, neither of these options are available with bhist. The only real possibilities for historical information are:
Figure out some way to parse bhist -l -- impractical and maybe not even possible due to inconsistent formatting as you've discovered.
Write a C program to do what you want using the LSF API, which exposes the functions that bhist itself uses to parse the lsb.events file. This is the file that stores all the historical information about the LSF cluster, and is what bhist reads to generate its ouptut.
If C is not an option for you, then you could try writing a script to parse the lsb.events file directly -- the format is documented in the configuration reference. This is hard, but not impossible. Here is the relevant document for LSF 9.1.3.
My personal recommendation would be #2 -- the function you're looking for is lsb_geteventrec(). You'd basically read each line in lsb.events one at a time and pull out the information you need.