NextFlow: How to fail if channel is empty ( .ifEmpty() )

I'd like for my NextFlow pipeline to fail if a specific channel is empty because, as is, the pipeline will continue as though nothing is wrong, but the process depending on the channel never starts. The answer to a related post states that we generally shouldn't check if a channel is empty, but I'm not sure how else to handle this.
The issue I'm having in the below example is that it always fails, but the process is called if I comment out the .ifEmpty() statement.
Here's a basic example:
/*
* There are .cram files in this folder
*/
params.input_sample_folder = 'path/to/folder/*'
samples = Channel.fromPath(params.input_sample_folder, checkIfExists: true)
    .filter( ~/.*(\.sam|\.bam|\.cram)/ )
    .ifEmpty( exit 1,
              "ERROR: Did not find any samples in ${params.input_sample_folder}" )

workflow {
    PROCESS_SAMPLES( samples )
}
Ultimate questions:
My guess is that the channel does not fill immediately. Is that true? If so, when does it fill?
How should I handle this situation? I want to fail if the channel doesn't get populated. e.g., I was surprised to learn that the channel remains empty if I only provide a folder path without a glob/wildcard character (/path/to/folder/; no * or *.cram, etc.). I don't think I can handle it in the process itself, because the process never gets called if the channel is legitimately empty.
Really appreciate your help.

Setting checkIfExists: true will actually throw an exception for you if the specified files do not exist on your file system. The trick is to specify the files you need when you create the channel, rather than filtering for them downstream. For example, all you need is:
params.input_sample_folder = 'path/to/folder'
samples = Channel.fromPath(
    "${params.input_sample_folder}/*.{sam,bam,cram}",
    checkIfExists: true,
)
Or, arguably better since this gives the user full control over the input files:
params.input_sample_files = 'path/to/folder/*.{sam,bam,cram}'
samples = Channel.fromPath( params.input_sample_files, checkIfExists: true )
Either way, your pipeline will fail with exit status 1 and print the following message in red when no matching files exist:
No files match pattern `*.{sam,bam,cram}` at path: path/to/folder/
As per the docs, the ifEmpty operator is really just intended to emit a default value when a channel becomes empty. To avoid having to check if a channel is empty, the general solution is to just avoid creating an empty channel in the first place. There are lots of ways to do this, but one way might look like:
import org.apache.log4j.Logger
nextflow.enable.dsl=2
def find_sample_files( input_dir ) {

    def pattern = ~/.*(\.sam|\.bam|\.cram)/
    def results = []

    input_dir.eachFileMatch(pattern) { item ->
        results.add( item )
    }

    return results
}

params.input_sample_folder = 'path/to/folder'

workflow {

    input_sample_folder = file( params.input_sample_folder )
    input_sample_files = find_sample_files( input_sample_folder )

    if ( !input_sample_files ) {
        log.error("ERROR: Did not find any samples in ${params.input_sample_folder}")
        System.exit(1)
    }

    sample_files = Channel.of( input_sample_files )
    sample_files.view()
}

Related

Scrapy spidermon exceptions

I'm trying to set up the basic suite of spidermon monitors as described here. I did a quick Google search and also found this. So I made a quick monitors.py, then copied and pasted the code in there.
I then proceeded to do this:
SPIDERMON_ENABLED = True

SPIDERMON_SPIDER_CLOSE_MONITORS = (
    'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
)
in my settings.py in the scrapy project.
It keeps raising this error:
spidermon.exceptions.NotConfigured: You should specify a minimum number of items to check against
Which I believe I've done (SPIDERMON_MIN_ITEMS = 10 at the top of the file).
What am I doing wrong? I just want to set up the pre-defined monitors and then optimize them later.
Spidermon couldn't find a valid value for SPIDERMON_MIN_ITEMS in the settings. This must be an integer value bigger than zero, otherwise it'll throw the error described. Setting SPIDERMON_ADD_FIELD_COVERAGE is also mandatory in order to use all the monitors available in this MonitorSuite.
In order to run the built-in close MonitorSuite SpiderCloseMonitorSuite from the Spidermon project, please confirm that the settings.py file - located in the root directory of your scrapy project - has the variables below:
EXTENSIONS = {
    'spidermon.contrib.scrapy.extensions.Spidermon': 500,
}

SPIDERMON_ENABLED = True
SPIDERMON_MIN_ITEMS = 10
SPIDERMON_ADD_FIELD_COVERAGE = True

SPIDERMON_SPIDER_CLOSE_MONITORS = (
    'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
)

2nd call to MemoryCache.Set() with the same key erases entry if cache is full

This is a bit of an edge case, and I would submit this as a bug in the repo if I could find it...
Consider the following LINQPad snippet:
void Main()
{
    var memoryCache = new MemoryCache(new MemoryCacheOptions
    {
        SizeLimit = 1 // <-- Setting to 2 fixes the issue
    });

    Set(memoryCache);
    memoryCache.Get("A").Dump(); // Yields 1

    Set(memoryCache);
    memoryCache.Get("A").Dump(); // Yields null
}

private void Set(MemoryCache memoryCache)
{
    //memoryCache.Remove("A"); // <-- Also fixes the issue
    memoryCache.Set("A", 1, new MemoryCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1),
        SlidingExpiration = TimeSpan.FromDays(1),
        Size = 1
    });
}
My question is, when using .Set(), is a new entry added, then the old removed, thus requiring extra space allocated in the cache?
This seems to relate to a bug I logged (which was rejected). You can see the source for this here, and from what I read the logic in your (and my) case works like this:
Add item requested.
Size would be exceeded if the item was to be added (UpdateCacheSizeExceedsCapacity()) therefore reject request (silently, which is what I objected to).
Also, because this condition was detected, kick off OvercapacityCompaction(), which will remove your item.
There looks to be a race-condition, as the compaction work is queued to a background thread; perhaps occasionally your test finds the item still there?
To answer your specific question - no, it's not first added then the excess removed.
Edit:
Re the second Get() always returning null... I missed some extra handling of the existing entry, if found (it doesn't improve the outcome though). Prior to checking whether the size would be exceeded by adding the item, there is this:
if (_entries.TryGetValue(entry.Key, out CacheEntry priorEntry))
{
    priorEntry.SetExpired(EvictionReason.Replaced);
}
I.e. if it finds your existing entry it marks it as evicted. It then applies the UpdateCacheSizeExceedsCapacity() test, without factoring in that it's just evicted the existing entry (which arguably it could).
Later on, still in Set()/SetEntry(), in the exceeds-capacity case, it does this:
if (priorEntry != null)
{
    RemoveEntry(priorEntry);
}
...which immediately removes the previous entry. So whether and when OvercapacityCompaction() (or ScanForExpiredItems()) would have got it doesn't matter; it's gone before returning from the Set()/SetEntry().
(I've also updated the source link above to the current value; doesn't change the logic).
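For what it's worth, a minimal workaround sketch (mine, not from the linked source, and assuming Microsoft.Extensions.Caching.Memory): since the question notes that calling Remove("A") before Set() avoids the problem, you can wrap the two calls so the old entry's size reservation is released before the replacement is added.
private void Replace(MemoryCache memoryCache, object key, object value)
{
    // Removing the existing entry first releases its Size reservation,
    // so the subsequent Set() is not rejected by the capacity check.
    memoryCache.Remove(key); // no-op if the key isn't present

    memoryCache.Set(key, value, new MemoryCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromDays(1),
        SlidingExpiration = TimeSpan.FromDays(1),
        Size = 1
    });
}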

Adding a dependency to a batch task at runtime

I have a batch job in AX 2012 R2 that runs, essentially iterating over a table and creating an instance of a class (that extends RunBaseBatch) that gets added as a task.
I also have some post processing items I need to do, after all the tasks have completed.
So far, the following is working:
while select stagingTable where stagingTable.OperationNo == params.paramOperationNo()
{
    batchHeader = this.getCurrentBatchHeader();
    batchTask = OperationTask::construct();
    batchHeader.addRuntimeTask(batchTask, this.getCurrentBatchTask().RecId);
}
batchHeader.save();

postTask = PostProcessingTask::construct();
batchHeader.addRuntimeTask(postTask, this.getCurrentBatchTask().RecId);
batchHeader.addDependency(postTask, batchTask, BatchDependencyStatus::FinishedOrError);
batchHeader.save();
My thought is that this will add a dependency on the post process task to not start until we get Finished or Error on the last task added in the loop. What I get instead is an exception "The dependency could not be created because task '' does not exist."
I'm uncertain what I'm missing: the tasks all get added and executed successfully; it seems that just the dependency doesn't want to work.
Several things: where this code is being called from matters. Is the code already in batch? Is the code called in doBatch() before/after the super? etc.
You have a while-select; does this create multiple batch tasks? If it does, then you need to create a dependency on each batch task object. This is one problem I see. If your while-select statement only selects 1 record and adds one task, then the problem is something else, but you shouldn't do a while-select to select one record.
Also, you call batchHeader.save(); two times. I'd probably remove the first call. I'd need to see what is instantiating your code.
Where you have this.getCurrentBatchTask().RecId, depending on whether your code is in batch or not, try replacing that with BatchHeader::getCurrentBatchTask().RecId
And where you have batchHeader = this.getCurrentBatchHeader(); replace that with batchHeader = BatchHeader::getCurrentBatchHeader();
EDIT: Try this code (fix whatever is needed to make it compile):
BatchHeader batchHeader = BatchHeader::getCurrentBatchHeader();
Set set = new Set(Types::Class);
SetEnumerator se;
BatchTask batchTask;
PostTask postTask;

while select stagingTable where stagingTable.OperationNo == params.paramOperationNo()
{
    batchTask = OperationTask::construct();
    set.add(batchTask);
    batchHeader.addRuntimeTask(batchTask, BatchHeader::getCurrentBatchTask().RecId);
}

// Create post task
postTask = PostProcessingTask::construct();
batchHeader.addRuntimeTask(postTask, BatchHeader::getCurrentBatchTask().RecId);

// Create dependencies
se = set.getEnumerator();
while (se.moveNext())
{
    batchTask = se.current(); // Task to make dependent on
    batchHeader.addDependency(postTask, batchTask, BatchDependencyStatus::FinishedOrError);
}

batchHeader.save();

Knowledge & Connect PHP API, Found object(Account or Answer) but contains only null fields

I'm facing some strange issues when I try to fetch (Connect PHP API) / searchContent (Knowledge Foundation API) following the tutorials/documentation.
Behaviour and output
Following the documentation, we initialize the API. The function error_get_last() (called after the fetch) states that the core read-only file (we are not allowed to modify it) contains an error:
Array ( [type] => 8 [message] => Undefined index: REDIRECT_URL [file] => /cgi-bin/${interface_name}.cfg/scripts/cp/core/framework/3.2.4/init.php [line] => 246 )
After initialization, we call the fetch function to retrieve an account. If we give a wrong ID, it returns an error:
Invalid ID: No such Account with ID = 32
Otherwise, furnishing a correct ID returns an Account object with all fields populated as NULL:
object(RightNow\Connect\v1_2\Account)#22 (25) {
["ID"]=>
NULL
["LookupName"]=>
NULL
["CreatedTime"]=>
NULL
["UpdatedTime"]=>
NULL
["AccountHierarchy"]=>
NULL
["Attributes"]=>
NULL
["Country"]=>
NULL
["CustomFields"]=>
NULL
["DisplayName"]=>
NULL
["DisplayOrder"]=>
NULL
["EmailNotification"]=>
NULL
["Emails"]=>
NULL
["Login"]=>
NULL
/* [...] */
["StaffGroup"]=>
NULL
}
Attempts, workaround and troubleshooting information
Configuration: the account used by InitConnectAPI() has the required permissions
Initialization: the call to InitConnectAPI() does not throw any exception (added a try-catch block)
Call to the fetch function: As said above, the call to RNCPHP\Account::fetch($act_id) finds the account (invalid_id => error) but doesn't manage to populate the fields
No exception is thrown on the RNCPHP::fetch($correct_id) call
The behaviour is the same when I try to retrieve an answer following a sample from the Knowledge Foundation API: $token = \RNCK::StartInteraction(...); \RNCK::searchContent($token, 'lorem ipsum');
Using PHP's SoapClient, I manage to retrieve populated objects. However, it's not part of the standard, and having the site call its own web service locally is not good practice.
Code reproducing the issue
error_reporting(E_ALL);
require_once(get_cfg_var('doc_root') . '/include/ConnectPHP/Connect_init.phph');
InitConnectAPI();
use RightNow\Connect\v1_2 as RNCPHP;
/* [...] */
try
{
    $fetched_acct = RNCPHP\Account::fetch($correct_usr_id);
} catch ( \Exception $e)
{
    echo ($e->getMessage());
}
// Dump part
echo ("<pre>");
var_dump($fetched_acct);
echo ("</pre>");
// The core's error on which I have no control
print_r(error_get_last());
Questions:
Have any of you faced the same issue? What is the workaround/fix that would help me solve it?
According to the RNCPHP\Account::fetch($correct_usr_id) function behaviour, we can surmise that the issue comes from the 'fields populating' step, which might be part of the core (over which I have no control). How am I supposed to deal with this (fetch is static and Account doesn't seem abstract)?
I tried to use the debug_backtrace() function in order to have some visibility into what may be going wrong, but it doesn't output relevant information. Is there any way I can get more debug information?
Thanks in advance,
Oracle Service Cloud uses lazy loading to populate the object variables from queried data when using the Connect for PHP APIs. When you output the result of an object, it will appear as though each variable is empty, per your example. However, if you access the parameter, then it becomes available. This is only an issue when you try to print your object, like in this example. Accessing the data should be immediate.
To print your object, like in your example, you would need to iterate through the object variables and access each one first. You could build a helper class to do that through reflection. But, to illustrate with a single field, do the following:
$acct = RNCPHP\Account::fetch($correctId);
$acct->ID;
print_r($acct); // Will now "show" ID, but none of the other fields have been loaded.
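To expand on the helper idea (a rough, hypothetical sketch, not an official Connect for PHP API): touch a list of field names so the lazy load is triggered for each one before you dump the object.
// Hypothetical helper: accessing each field forces its lazy load.
// The field list is illustrative; use whichever fields you need.
function touchFields($obj, array $fieldNames)
{
    foreach ($fieldNames as $name) {
        $value = $obj->$name; // property access triggers the lazy load
    }
    return $obj;
}

$acct = RNCPHP\Account::fetch($correctId);
touchFields($acct, array('ID', 'LookupName', 'DisplayName', 'UpdatedTime'));
print_r($acct); // These fields should now show their values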
In the real world, you probably just want to operate on the data. So, even though you cannot "see" the data in the object, it's there. In the example below, we're accessing the updated time of the account and then performing an action on the object if it meets a condition.
// Set to disabled if last updated more than 90 days ago
$acct = RNCPHP\Account::fetch($correctId);
$chkDate = time() - 7776000;
if($acct->UpdatedTime < $chkDate){
    $acct->Attributes->PermanentlyDisabled = true;
    $acct->save(RNCPHP\RNObject::SuppressAll);
}
If you were to print_r the object after the if condition, then you would see the UpdatedTime variable data because it was loaded at the condition check.

How to recursively parse xsd files to generate a list of included schemas for incremental build in Maven?

I have a Maven project that uses the jaxb2-maven-plugin to compile some xsd files. It uses the staleFile to determine whether or not any of the referenced schemaFiles have been changed. Unfortunately, the xsd files in question use <xs:include schemaLocation="../relative/path.xsd"/> tags to include other schema files that are not listed in the schemaFile argument so the staleFile calculation in the plugin doesn't accurately detect when things need to be actually recompiled. This winds up breaking incremental builds as the included schemas evolve.
Obviously, one solution would be to list all the recursively referenced files in the execution's schemaFile. However, there are going to be cases where developers don't do this and break the build. I'd like instead to automate the generation of this list in some way.
One approach that comes to mind would be to somehow parse the top-level XSD files and then either set a property or output a file that I can then pass into the schemaFile or schemaFiles parameter. The Groovy gmaven plugin seems like it might be a natural way to embed that functionality right into the POM. But I'm not familiar enough with Groovy to get started.
Can anyone provide some sample code? Or offer an alternative implementation/solution?
Thanks!
Not sure how you'd integrate it into your Maven build -- Maven isn't really my thing :-(
However, if you have the path to an xsd file, you should be able to get the files it references by doing something like:
def rootXsd = new File( 'path/to/xsd' )
def refs = new XmlSlurper().parse( rootXsd ).depthFirst().findAll { it.name()=='include' }.@schemaLocation*.text()
println "$rootXsd references $refs"
So refs is a list of Strings which should be the paths to the included xsds
Based on tim_yates's answer, the following is a workable solution, which you may have to customize based on how you are configuring the jaxb2 plugin.
Configure a gmaven-plugin execution early in the lifecycle (e.g., in the initialize phase) that runs with the following configuration...
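(Only a rough sketch of that wiring, since the exact coordinates depend on which gmaven variant you use; with org.codehaus.gmaven's groovy-maven-plugin and its execute goal it would look something like the following, with the script assembled in the steps below placed inside the source element.)
<plugin>
  <groupId>org.codehaus.gmaven</groupId>
  <artifactId>groovy-maven-plugin</artifactId>
  <executions>
    <execution>
      <phase>initialize</phase>
      <goals>
        <goal>execute</goal>
      </goals>
      <configuration>
        <source>
          // Groovy script from the steps below goes here
        </source>
      </configuration>
    </execution>
  </executions>
</plugin>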
Start with a function to collect File objects of referenced schemas (this is a refinement of Tim's answer):
def findRefs = { f ->
    def relPaths = new XmlSlurper().parse(f).depthFirst().findAll {
        it.name()=='include'
    }*.@schemaLocation*.text()
    relPaths.collect { new File(f.absoluteFile.parent + "/" + it).canonicalFile }
}
Wrap that in a function that iterates on the results until all children are found:
def recursiveFindRefs = { schemaFiles ->
    def outputs = [] as Set
    def inputs = schemaFiles as Queue
    // Breadth-first examine all refs in all schema files
    while (xsd = inputs.poll()) {
        outputs << xsd
        findRefs(xsd).each {
            if (!outputs.contains(it)) inputs.add(it)
        }
    }
    outputs
}
The real magic then comes in when you parse the Maven project to determine what to do.
First, find the JAXB plugin:
jaxb = project.build.plugins.find { it.artifactId == 'jaxb2-maven-plugin' }
Then, parse each execution of that plugin (if you have multiple). The code assumes that each execution sets schemaDirectory, schemaFiles and staleFile (i.e., does not use the defaults!) and that you are not using schemaListFileName:
jaxb.executions.each { ex ->
    log.info("Processing jaxb execution $ex")

    // Extract the schema locations; the configuration is an Xpp3Dom
    ex.configuration.children.each { conf ->
        switch (conf.name) {
            case "schemaDirectory":
                schemaDirectory = conf.value
                break
            case "schemaFiles":
                schemaFiles = conf.value.split(/,\s*/)
                break
            case "staleFile":
                staleFile = conf.value
                break
        }
    }
Finally, we can open the schemaFiles, parse them using the functions we've defined earlier:
    def schemaHandles = schemaFiles.collect { new File("${project.basedir}/${schemaDirectory}", it) }
    def allSchemaHandles = recursiveFindRefs(schemaHandles)
...and compare their last modified times against the stale file's modification time,
unlinking the stale file if necessary.
    def maxLastModified = allSchemaHandles.collect {
        it.lastModified()
    }.max()

    def staleHandle = new File(staleFile)
    if (staleHandle.lastModified() < maxLastModified) {
        log.info("  New schemas detected; unlinking $staleFile.")
        staleHandle.delete()
    }
}