List Jenkins job build details for the last year along with the user who triggered the build - api

Is there a simple way, using the API or scripting, to get a report of all builds performed on all jobs over the last year, along with the user who triggered each build?

This should do. Run it from <JENKINS_URL>/script or in a Jenkins job with an "Execute System Groovy Script" build step (not an "Execute Groovy script").
Updated to include the details requested in the subject line.
def jobNamePattern = '.*'          // adjust to folder/job regex as needed
def daysBack = 365                 // how many days back to report on
def msecPerDay = 24L*60*60*1000    // milliseconds per day (long, to avoid int overflow for large daysBack)

println "Job Name: ( # builds: last ${daysBack} days / overall ) Last Status\n Number | Trigger | Status | Date | Duration\n"

Jenkins.instance.allItems.findAll {
    it instanceof Job && it.fullName.matches(jobNamePattern)
}.each { job ->
    def builds = job.getBuilds().byTimestamp(System.currentTimeMillis() - daysBack*msecPerDay, System.currentTimeMillis())
    println job.fullName + ' ( ' + builds.size() + ' / ' + job.builds.size() + ' ) ' + job.getLastBuild()?.result
    // individual build details
    builds.each { build ->
        println ' ' + build.number + ' | ' + build.getCauses()[0].getShortDescription() + ' | ' + build.result + ' | ' + build.getTimestampString2() + ' | ' + build.getDurationString()
    }
}
return
Sample Output
ITSuppt/sampleApplication ( 4 / 11 ) SUCCESS
13 | Started by user Ian W | SUCCESS | 2020-10-22T01:57:58Z | 30 sec
12 | Started by user Ian W | FAILURE | 2020-10-22T01:51:36Z | 45 sec
11 | Started by user Ian W | SUCCESS | 2020-10-15T18:26:22Z | 29 sec
10 | Started by user Ian W | FAILURE | 2020-10-15T18:14:13Z | 55 sec
It could take a long time if you have a lot of jobs and builds, so you might want to skip the per-build details at first or narrow the job name pattern. See the Build Javadoc for additional information.
Alternatively, per this S/O answer, you can get build details for all builds of all jobs from the Jenkins REST API (additional examples elsewhere).
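For reference, here is a rough Python sketch of that REST approach using the tree filter. It is an illustration only: the URL and credentials are placeholders, the field names (fullName, allBuilds, actions[causes[...]]) come from the Jenkins remote API tree parameter and should be checked against your instance's /api/ page, and jobs nested inside folders would need an extra level of jobs[...] nesting.

import time
import requests

JENKINS_URL = "https://jenkins.example.com"  # assumption: your Jenkins base URL
AUTH = ("user", "api-token")                 # assumption: a user name and API token

# The "tree" filter asks Jenkins to return only these fields; "allBuilds" is used
# because the plain "builds" field is capped to the most recent builds.
tree = ("jobs[fullName,allBuilds[number,result,timestamp,duration,"
        "actions[causes[shortDescription]]]]")

resp = requests.get(JENKINS_URL + "/api/json", params={"tree": tree}, auth=AUTH)
resp.raise_for_status()

one_year_ago_ms = (time.time() - 365 * 24 * 60 * 60) * 1000
for job in resp.json().get("jobs", []):
    for build in job.get("allBuilds", []):
        if build["timestamp"] < one_year_ago_ms:
            continue  # keep only the last year
        causes = [c.get("shortDescription", "")
                  for action in build.get("actions", []) if action
                  for c in action.get("causes", [])]
        print(job["fullName"], build["number"], build.get("result"),
              build["timestamp"], build.get("duration"), "; ".join(causes))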


Input function for technical / biological replicates in snakemake

I'm currently trying to write a Snakemake workflow that can check automatically, via a sample.tsv file, whether a given sample is a biological or technical replicate, and then, at some point in the workflow, use a rule to merge technical/biological replicates.
My tsv file looks like this:
|sample | unit_bio | unit_tech | fq1 | fq2 |
|----------|----------|-----------|-----|-----|
| bCalAnn1 | 1 | 1 | /home/assembly_downstream/data/arima_HiC/bCalAnn1_1_1_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn1_1_1_R2.fastq.gz |
| bCalAnn1 | 1 | 2 | /home/assembly_downstream/data/arima_HiC/bCalAnn1_1_2_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn1_1_2_R2.fastq.gz |
| bCalAnn2 | 1 | 1 | /home/assembly_downstream/data/arima_HiC/bCalAnn2_1_1_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn2_1_1_R2.fastq.gz |
| bCalAnn2 | 1 | 2 | /home/assembly_downstream/data/arima_HiC/bCalAnn2_1_2_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn2_1_2_R2.fastq.gz |
| bCalAnn2 | 2 | 1 | /home/assembly_downstream/data/arima_HiC/bCalAnn2_2_1_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn2_2_1_R2.fastq.gz |
| bCalAnn2 | 3 | 1 | /home/assembly_downstream/data/arima_HiC/bCalAnn2_3_1_R1.fastq.gz | /home/assembly_downstream/data/arima_HiC/bCalAnn2_3_1_R2.fastq.gz |
My Pipeline looks like this:
import pandas as pd
import os
import yaml

configfile: "config.yaml"

samples = pd.read_table(config["samples"], dtype=str)

rule all:
    input:
        expand(config["arima_mapping"] + "final/{sample}_{unit_bio}_{unit_tech}.bam", zip,
               sample=samples["sample"], unit_bio=samples["unit_bio"], unit_tech=samples["unit_tech"])

..
some rules
..

rule add_read_groups:
    input:
        config["arima_mapping"] + "paired/{sample}_{unit_bio}_{unit_tech}.bam"
    output:
        config["arima_mapping"] + "paired_read_groups/{sample}_{unit_bio}_{unit_tech}.bam"
    params:
        platform = "ILLUMINA",
        sampleName = "{sample}",
        library = "{sample}",
        platform_unit = "None"
    conda:
        "../envs/arima_mapping.yaml"
    log:
        config["logs"] + "arima_mapping/paired_read_groups/{sample}_{unit_bio}_{unit_tech}.log"
    shell:
        "picard AddOrReplaceReadGroups I={input} O={output} SM={params.sampleName} LB={params.library} PU={params.platform_unit} PL={params.platform} 2> {log}"

rule merge_tech_repl:
    input:
        config["arima_mapping"] + "paired_read_groups/{sample}_{unit_bio}_{unit_tech}.bam"
    output:
        config["arima_mapping"] + "merge_tech_repl/{sample}_{unit_bio}_{unit_tech}.bam"
    params:
        val_string = "SILENT"
    conda:
        "../envs/arima_mapping.yaml"
    log:
        config["logs"] + "arima_mapping/merged_tech_repl/{sample}_{unit_bio}_{unit_tech}.log"
    threads:
        2  # uses at most 2
    shell:
        "picard MergeSamFiles -I {input} -O {output} --ASSUME_SORTED true --USE_THREADING true --VALIDATION_STRINGENCY {params.val_string} 2> {log}"

rule mark_duplicates:
    input:
        config["arima_mapping"] + "merge_tech_repl/{sample}_{unit_bio}_{unit_tech}.bam" if config["tech_repl"] else config["arima_mapping"] + "paired_read_groups/{sample}_{unit_bio}_{unit_tech}.bam"
    output:
        bam = config["arima_mapping"] + "final/{sample}_{unit_bio}_{unit_tech}.bam",
        metric = config["arima_mapping"] + "final/metric_{sample}_{unit_bio}_{unit_tech}.txt"
    #params:
    conda:
        "../envs/arima_mapping.yaml"
    log:
        config["logs"] + "arima_mapping/mark_duplicates/{sample}_{unit_bio}_{unit_tech}.log"
    shell:
        "picard MarkDuplicates I={input} O={output.bam} M={output.metric} 2> {log}"
At the moment I have set a boolean in a config file that tells the mark_duplicates rule whether to take its input from the add_read_groups or the merge_tech_repl rule. This is of course not optimal, since some samples may have replicates (in any number) while others don't. Therefore I want to create a syntax that checks the tsv table for rows where the sample name and unit_bio number are identical while the unit_tech number differs (and later, analogously, for biological replicates), merging those specific samples while samples without replicates skip the merging rule.
EDIT
For clarification, since I think I explained my goal confusingly:
My first attempt looks like this. I want "i" to be flexible, in case the number of replicates changes. I don't think my input function returns all matching duplicates together; it gives them one by one, which is not what I want. I'm also unsure how to handle samples that do not have duplicates, since they would have to skip this rule somehow.
def input_function(wildcards):
    return expand("{sample}_{unit_bio}_{i}.bam",
                  sample=wildcards.sample,
                  unit_bio=wildcards.unit_bio,
                  i=samples["sample"].str.count(wildcards.sample))

rule tech_duplicate_check:
    input:
        input_function  # (returns a list of 2-n duplicates, where n could be different for each sample)
    output:
        "{sample}_{unit_bio}.bam"
    shell:
        "MergeTechDupl_tool {input}"  # input is a list
rule gather_techdups_of_a_biodup:
    output: "{sample}/{unit_bio}"
    input: gather_techdups_of_a_biodup_input_fn
    shell: "true"  # Fill this in

rule gather_biodips_of_a_techdup:
    output: "{sample}/{unit_tech}"
    input: gather_biodips_of_a_techdup_input_fn
    shell: "true"  # Fill this in
After some attempts, my main problem is the table checking. As far as I know, Snakemake takes templates as input and matches all samples against them. But I would need to check the table for every sample that shares (e.g. for technical replicates) the sample name and the unit_bio number, take all of those samples, and give them as input to a single rule run. Then I would have to take the next sample that was not already part of a previous run, to prevent merging the same samples multiple times.
The logic you describe here can be implemented in the gather_techdups_of_a_biodup_input_fn and gather_biodips_of_a_techdup_input_fn functions above. For example, read your sample TSV file with pandas, filter for wildcards.sample and wildcards.unit_bio (or wildcards.unit_tech), then extract columns fq1 and fq2 from the filtered data frame.
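As a rough sketch only (not the poster's actual code): assuming the samples table from the question is loaded into a pandas DataFrame called samples, and assuming the upstream rule writes BAMs named paired_read_groups/{sample}_{unit_bio}_{unit_tech}.bam (that path pattern is illustrative), the technical-replicate input function could look like this:

import pandas as pd

# Assumption: the samples.tsv from the question, loaded once at the top of the Snakefile.
samples = pd.read_table("samples.tsv", dtype=str)

def gather_techdups_of_a_biodup_input_fn(wildcards):
    # All technical replicates that share this sample name and unit_bio number.
    rows = samples[(samples["sample"] == wildcards.sample) &
                   (samples["unit_bio"] == wildcards.unit_bio)]
    # One BAM per technical replicate; a sample without technical replicates
    # simply yields a one-element list, so the downstream merge still works.
    return ["paired_read_groups/{}_{}_{}.bam".format(wildcards.sample,
                                                     wildcards.unit_bio,
                                                     unit_tech)
            for unit_tech in rows["unit_tech"]]

The biological-replicate version would filter on wildcards.sample and wildcards.unit_tech instead and expand over unit_bio.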

Quartz job scheduler in .NET Core at 12 AM and 12 PM every day

In a .NET Core 5 Web API project I have a job scheduler, which is updating something in the database.
I want to run that job scheduler twice a day at 12 AM and 12 PM. What will be the cron expression for that?
How am I able to run the Quartz job scheduler twice in a day?
Here is the code of scheduler start:
public async Task StartAsync(CancellationToken cancellationToken)
{
    Scheduler = await _schedulerFactory.GetScheduler(cancellationToken);
    Scheduler.JobFactory = _jobFactory;

    var job2 = new JobSchedule(jobType: typeof(MCBJob),
                               cronExpression: "0 0 0/12 * * ");

    var mcbJob = CreateJob(job2);
    var mcbTrigger = CreateTrigger(job2);

    await Scheduler.ScheduleJob(mcbJob, mcbTrigger, cancellationToken);
    await Scheduler.Start(cancellationToken);
}
You can separate values with , to specify individual values.
https://en.wikipedia.org/wiki/Cron#CRON_expression
4 -> 4
0-4 -> 0,1,2,3,4
*/4 -> 0,4,8,12,...,52,56
0,4 -> 0,4
We can build the schedule now. Note that Quartz cron expressions have six or seven fields, starting with seconds, and either the day-of-month or the day-of-week field must be ? (the expression in the question, 0 0 0/12 * *, has only five fields):
0 0 0,12 * * ?
| | |    | | no specific day of the week (Quartz requires ? here when day-of-month is *)
| | |    | every month
| | |    every day of the month
| | at hours 0 and 12
| at minute 0
at second 0
You can use https://crontab.guru/ to build a cron expression interactively (note that it uses the standard five-field cron format rather than the Quartz format).
Maybe this is helpful in your case.
Visit http://www.cronmaker.com/
CronMaker is a simple website which helps you build cron expressions. CronMaker uses the Quartz open source scheduler, and the generated expressions are based on the Quartz cron format.

Karate - How to construct two tables, using lines from each to validate against the other [duplicate]

I want to use a single row under Examples in Cucumber, like below:
Examples:
| data1 | data2 | paymentOp  |
| MySql | uk1   | ?????????? |
where paymentOp is a number that I get from a Java method which takes a List as an argument. The method returns the numbers that I want to pass under paymentOp.
I could copy the row and paste it again in the table, but I don't want to do that, because the method's result is dynamic and may return 2 or 5 sets of numbers.
Is it possible to achieve this using Karate?
How should I proceed? Any lead here would be much appreciated!
You can combine Examples: with dynamic behavior. Please read this example (especially the second one): https://github.com/intuit/karate/blob/master/karate-demo/src/test/java/demo/outline/examples.feature
Since you have difficulties reading the docs and examples (:P) here is a simple example. Take some time to understand it carefully.
Background:
* def data = { one: 1, two: 2, three: 3 }

Scenario Outline:
* match data.<key> == <value>

Examples:
| key   | value |
| one   | 1     |
| two   | 2     |
| three | 3     |

Executing all assertions in the same Spock test, even if one of them fails

I am trying to verify two different outputs in the context of a single Spock method that runs multiple test cases of the form when-then-where. For this reason I use two assertions at the then block, as can be seen in the following example:
import spock.lang.*

@Unroll
class ExampleSpec extends Specification {

    def "Authentication test with empty credentials"() {
        when:
        def reportedErrorMessage, reportedErrorCode
        (reportedErrorMessage, reportedErrorCode) = userAuthentication(name, password)

        then:
        reportedErrorMessage == expectedErrorMessage
        reportedErrorCode == expectedErrorCode

        where:
        name | password || expectedErrorMessage | expectedErrorCode
        ' '  | null     || 'Empty credentials!' | 10003
        ' '  | ' '      || 'Empty credentials!' | 10003
    }
}
The code is an example where the design requirement is that if name and password are ' ' or null, then I should always expect exactly expectedErrorMessage = 'Empty credentials!' and expectedErrorCode = 10003. If for some reason (presumably because of bugs in the source code) I get expectedErrorMessage = 'Empty!' (or anything else other than 'Empty credentials!') and expectedErrorCode = 10001 (or anything else other than 10003), this would not satisfy the above requirement.
The problem is that if both assertions fail in the same test, I only get a failure message for the first assertion (here, for reportedErrorMessage). Is it possible to be informed about all failed assertions in the same test?
Here is a piece of code that demonstrates the same problem without other external code dependencies. I understand that in this particular case it is not a good practice to bundle two very different tests together, but I think it still demonstrates the problem.
import spock.lang.*
#Unroll
class ExampleSpec extends Specification {
def "minimum of #a and #b is #c and maximum of #a and #b is #d"() {
expect:
Math.min(a, b) == c
Math.max(a, b) == d
where:
a | b || c | d
3 | 7 || 3 | 7
5 | 4 || 5 | 4 // <--- both c and d fail here
9 | 9 || 9 | 9
}
}
Based on the latest comment by OP, it looks like a solution different from my previous answer would be helpful. I'm leaving the previous answer in-place, as I feel it still provides useful information related to the question (specifically separating positive and negative tests).
Given that you want to see all failures, and not just have it fail at the first assert that fails, I would suggest combining everything into a single boolean AND operation. Don't use the && short-circuit operator, because it stops evaluating at the first check that fails the whole expression; use & instead, so that all checks are evaluated regardless of any previously failing checks.
Given the max and min example above, I would change the expect block to this:
Math.min(a, b) == c & Math.max(a, b) == d
When the failure occurs, it gives you the following information:
Math.min(a, b) == c & Math.max(a, b) == d
     |   |  |  |  | |      |   |  |  |  |
     4   5  4  |  5 false  5   5  4  |  4
               false                 false
This shows you every portion of the failing assert. By contrast, if you used the &&, it would only show you the first failure, which would look like this:
Math.min(a, b) == c && Math.max(a, b) == d
     |   |  |  |  | |
     4   5  4  |  5 false
               false
This could obviously get messy pretty fast if you have more than two checks on a single line - but that is a tradeoff you can make between all failing information on one line, versus having to re-run the test after fixing each individual component.
Hope this helps!
I think there are two different things at play here.
Having a failing assert in your code throws an error, which halts execution. This is why you can't have two failing assertions in a single test. Any line of code in an expect or then block in Spock has an implicit assert before it.
You are mixing positive and negative unit tests in the same test. I ran into this before myself, and I read/watched something about this and Spock (I believe from the creator, Peter Niederwieser), and learned that these should be separated into different tests. Unfortunately I couldn't find that reference. So basically, you'll need one test for failing use cases, and one test for passing/successful use cases.
Given that information, here is your second example code, with the tests separated out, with the failing scenario in the second test.
@Unroll
class ExampleSpec extends Specification {

    def "minimum of #a and #b is #c and maximum of #a and #b is #d - successes"() {
        expect:
        Math.min(a, b) == c
        Math.max(a, b) == d

        where:
        a | b || c | d
        3 | 7 || 3 | 7
        9 | 9 || 9 | 9
    }

    def "minimum of #a and #b is #c and maximum of #a and #b is #d - failures"() {
        expect:
        Math.min(a, b) != c
        Math.max(a, b) != d

        where:
        a | b || c | d
        5 | 4 || 5 | 4
    }
}
As far as your comment about the MongoDB test case - I'm not sure what the intent is there, but I'm guessing they are making several assertions that are all passing, rather than validating that something is failing.

How can I efficiently create unique relationships in Neo4j?

Following up on my question here, I would like to create a constraint on relationships. That is, I would like there to be multiple nodes that share the same "neighborhood" name, but with each uniquely pointing to the particular city in which it resides.
As encouraged in user2194039's answer, I am using the following index:
CREATE INDEX ON :Neighborhood(name)
Also, I have the following constraint:
CREATE CONSTRAINT ON (c:City) ASSERT c.name IS UNIQUE;
The following code fails to create unique relationships and takes an excessively long time:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
WITH line
WHERE line.Neighborhood IS NOT NULL
WITH line
MATCH (c:City { name : line.City})
MERGE (c)<-[:IN]-(n:Neighborhood {name : toInt(line.Neighborhood)});
Note that there is a uniqueness constraint on City, but NOT on Neighborhood (because there should be multiple ones).
Profile with Limit 10,000:
+--------------+-------+--------+----------------------------------+-------------------------+
| Operator     | Rows  | DbHits | Identifiers                      | Other                   |
+--------------+-------+--------+----------------------------------+-------------------------+
| EmptyResult  |     0 |      0 |                                  |                         |
| UpdateGraph  |  9750 |   3360 | anon[307], b, neighborhood, line | MergePattern            |
| SchemaIndex  |  9750 |  19500 | b, line                          | line.City; :City(name)  |
| ColumnFilter |  9750 |      0 | line                             | keep columns line       |
| Filter       |  9750 |      0 | anon[220], line                  | anon[220]               |
| Extract      | 10000 |      0 | anon[220], line                  | anon[220]               |
| Slice        | 10000 |      0 | line                             | { AUTOINT0}             |
| LoadCSV      | 10000 |      0 | line                             |                         |
+--------------+-------+--------+----------------------------------+-------------------------+
Total database accesses: 22860
Following Guilherme's recommendation below, I implemented the helper, yet it raises the error py2neo.error.Finished. I've searched the documentation and wasn't able to determine a workaround for this. It looks like there's an open SO post about this exception.
def run_batch_query(queries, timeout=None):
    if timeout:
        http.socket_timeout = timeout
    try:
        graph = Graph()
        authenticate("localhost:7474", "account", "password")
        tx = graph.cypher.begin()
        for query in queries:
            statement, params = query
            tx.append(statement, params)
        results = tx.process()
        tx.commit()
    except http.SocketError as err:
        raise err
    except error.Finished as err:
        raise err
    collection = []
    for result in results:
        records = []
        for record in result:
            records.append(record)
        collection.append(records)
    return collection
main:
queries = []
template = ["MERGE (city:City {Name:{city}})",
            "MERGE (city)<-[:IN]-(n:Neighborhood {Name : {neighborhood}})"]
statement = '\n'.join(template)
batch = 5000
c = 1
start = time.time()

# city_neighborhood_map is a defaultdict that maps city -> set of neighborhoods
for city, neighborhoods in city_neighborhood_map.iteritems():
    for neighborhood in neighborhoods:
        params = dict(city=city, neighborhood=neighborhood)
        queries.append((statement, params))
        c += 1
        if c % batch == 0:
            print "running batch"
            print c
            s = time.time()*1000
            r = run_batch_query(queries, 10)
            e = time.time()*1000
            print("\t{0}, {1:.00f}ms".format(c, e-s))
            del queries[:]

print c
if queries:
    s = time.time()*1000
    r = run_batch_query(queries, 300)
    e = time.time()*1000
    print("\t{0} {1:.00f}ms".format(c, e-s))

end = time.time()
print("End. {0}s".format(end-start))
If you want to create unique relationships you have 2 options:
1. Prevent the path from being duplicated using MERGE, just like @user2194039 suggested. I think this is the simplest and best approach you can take.
2. Turn your relationship into a node and create a unique constraint on it. But that is hardly necessary for most cases.
If you're having trouble with speed, try using the transactional endpoint. I tried importing your data (random cities and neighbourhoods) through LOAD CSV in 2.2.1, and it was slow as well, though I am not sure why. If you send your queries with parameters to the transactional endpoint in batches of 1000-5000, you can monitor the process, and probably gain a performance boost.
I managed to import 1M rows in just under 11 minutes.
I used an INDEX for Neighbourhood(name) and a unique constraint for City(name).
Give it a try and see if it works for you.
Edit:
The transactional endpoint is a RESTful endpoint that allows you to execute transactions in batches. You can read about it here.
Basically, it allows you to stream a bunch of queries to the server at once.
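For illustration only, here is a minimal sketch of posting a batch of parameterized statements straight to the transactional endpoint with the requests library. The URL is the Neo4j 2.x default, and the credentials and example parameters are placeholders:

import requests

# Assumption: default Neo4j 2.x transactional Cypher endpoint and placeholder credentials.
NEO4J_TX_URL = "http://localhost:7474/db/data/transaction/commit"
AUTH = ("neo4j", "password")

statement = ("MERGE (c:City {name: {city}}) "
             "MERGE (c)<-[:IN]-(n:Neighborhood {name: {neighborhood}})")

# One request can carry many statements; they are all committed in a single transaction.
payload = {"statements": [
    {"statement": statement, "parameters": {"city": "Boston", "neighborhood": 3}},
    {"statement": statement, "parameters": {"city": "Seattle", "neighborhood": 3}},
]}

resp = requests.post(NEO4J_TX_URL, json=payload, auth=AUTH)
resp.raise_for_status()
print(resp.json().get("errors"))  # an empty list means the whole batch committed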
I don't know what programming language/stack you're using, but in python, using a package like py2neo, it would be something like this:
with open("city.csv", "r") as fp:
reader = csv.reader(fp)
queries = []
template = ["MERGE (c :`City` {name: {city}})",
"MERGE (c)<-[:IN]-(n :`Neighborhood` {name: {neighborhood}})"]
statement = '\n'.join(template)
batch = 5000
c = 1
start = time.time()
for row in reader:
city, neighborhood = row
params = dict(city=city, neighborhood=neighborhood)
queries.append((statement, params))
if c % batch == 0:
s = time.time()*1000
r = neo4j.run_batch_query(queries, 10)
e = time.time()*1000
print("\t{0}, {1:.00f}ms".format(c, e-s))
del queries[:]
c += 1
if queries:
s = time.time()*1000
r = neo4j.run_batch_query(queries, 300)
e = time.time()*1000
print("\t{0} {1:.00f}ms".format(c, e-s))
end = time.time()
print("End. {0}s".format(end-start))
Helper functions:
def run_batch_query(queries, timeout=None):
    if timeout:
        http.socket_timeout = timeout
    try:
        graph = Graph(uri)  # "{protocol}://{host}:{port}/db/data/"
        tx = graph.cypher.begin()
        for query in queries:
            statement, params = query
            tx.append(statement, params)
        results = tx.process()
        tx.commit()
    except http.SocketError as err:
        raise err
    collection = []
    for result in results:
        records = []
        for record in result:
            records.append(record)
        collection.append(records)
    return collection
You can monitor how long each transaction takes, and tweak the number of queries per transaction as well as the timeout.
To be sure we're on the same page, this is how I understand your model: Each city is unique and should have some number of neighborhoods pointing to it. The neighborhoods are unique within the context of a city, but not globally. So if you have a neighborhood 3 [IN] city Boston, you could also have a neighborhood 3 [IN] city Seattle, and both of those neighborhoods are represented by different nodes, even though they have the same name property. Is that correct?
Before importing, I would recommend adding an index to your neighborhood nodes. You can add the index without enforcing uniqueness. I have found that this greatly increases speeds on even small databases.
CREATE INDEX ON :Neighborhood(name)
And for the import:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
MERGE (c:City {name: line.City})
MERGE (c)<-[:IN]-(n:Neighborhood {name: toInt(line.Neighborhood)})
If you are importing a large amount of data, it may be best to use the USING PERIODIC COMMIT command to commit periodically while importing. This will reduce the memory used in the process, and if your server is memory-constrained, I could see it helping performance. In your case, with almost a million records, this is recommended by Neo4j. You can even adjust how often the commit happens by doing USING PERIODIC COMMIT 10000 or such. The docs say 1000 is the default. Just understand that this will break the import into several transactions.
Best of luck!