temp() not working on PBS Torque cluster with snakemake

When I define an output/input file with temp(), the file does not seem to be removed when running on a PBS Torque cluster.
Is this a known problem? If so, is there a workaround, or perhaps a way to write a rule that removes a file once it is no longer required by other rules?
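To illustrate, this is the kind of rule I mean; the rule, file, and tool names here are just an example, not my actual workflow:

rule sort_bam:
    input:
        "mapped/{sample}.bam"
    output:
        # temp() marks this file for deletion once no downstream rule needs it
        temp("sorted/{sample}.bam")
    shell:
        "samtools sort -o {output} {input}"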

Related

Checkpoint s3p flink on EMR

I have a problem with s3p checkpointing in Flink on EMR.
When creating the EMR cluster, I ticked Presto and added the jar file as instructed at https://ci.apache.org/projects/flink/flink-docs-stable/ops/plugins.html.
But when checkpointing with s3p in Flink, it still reports:
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3p'. The scheme is directly supported by Flink through the following plugin: flink-s3-fs-presto. Please ensure that each plugin resides within its own subfolder within the plugins directory. See https://ci.apache.org/projects/flink/flink-docs-stable/ops/plugins.html for more information. If you want to use a Hadoop file system for that scheme, please add the scheme to the configuration fs.allowed-fallback-filesystems. For a full list of supported file systems, please see https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
Can you help me get s3p checkpointing working in Flink on EMR?
Thanks.
Presto in EMR has nothing to do with the flink-s3-fs-presto plugin in Flink. You can leave it unticked in the future (leaving it ticked doesn't hurt either, beyond bloating the cluster).
The most likely reason is that you forgot to create a subfolder in the plugins folder. Could you give me an ls of your Flink distribution?
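For reference, the plugin jar has to sit in its own subfolder under plugins/, roughly like this (assuming the jar ships in the opt/ folder of your distribution; the exact version number will differ):

cd $FLINK_HOME   # wherever your Flink distribution lives on the cluster nodes
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto/

After that, the Flink processes need to be restarted so the plugin is picked up.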

Snakemake - Tibanna config support

I am trying to run snakemake --tibanna to deploy Snakemake on AWS using the "Unicorn" Step Functions that Tibanna creates.
I can't seem to find a way to change the different arguments Tibanna accepts, such as which subnet, AZ, or security group will be used for the EC2 instance that actually gets deployed.
Argument example (when running Tibanna without Snakemake):
https://github.com/4dn-dcic/tibanna/blob/master/test_json/unicorn/shelltest4.json#L32
Thanks!
Did you notice this option?
snakemake --help
--tibanna-config TIBANNA_CONFIG [TIBANNA_CONFIG ...]
    Additional tibanna config, e.g. --tibanna-config spot_instance=true
    subnet=<subnet id> security group=<security group id>
I think it was added recently.
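So something along these lines should work; the values below are placeholders, and the key names (subnet, security_group, spot_instance) follow the Tibanna config schema you linked, so double-check the exact spelling there:

snakemake --tibanna --default-remote-prefix=<bucket>/<subdir> \
    --tibanna-config subnet=<subnet-id> security_group=<security-group-id> spot_instance=true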
-jk

Icache and Dcache in Simple.py configuration of gem5

I am trying to understand the models generated using gem5. I simulated build/X86/gem5.opt with the gem5/configs/learning_gem5/part1/simple.py configuration file provided in the gem5 repo.
In the output directory, gem5 generates a .dot graph of the simulated system.
I have the following questions:
Does this design not have any instruction and data cache? I checked the config.ini file and there were no configuration statistics such as ICache/DCache size.
What is the purpose of adding the icache_port and dcache_port?
system.cpu.icache_port = system.membus.slave
system.cpu.dcache_port = system.membus.slave
Does this design not have any instruction and data cache? I checked the config.ini file and there were no configuration statistics such as ICache/DCache size.
I'm not very familiar with that config, but unless caches were added explicitly somewhere, there aren't any caches.
Just compare it to an se.py run e.g.:
build/ARM/gem5.opt configs/example/se.py --cmd hello.out \
    --caches --l2cache --l1d_size=64kB --l1i_size=64kB --l2_size=256kB
which definitely has caches, e.g. the config.ini at gem5 commit 211869ea950f3cc3116655f06b1d46d3fa39fb3a contains:
[system.cpu.dcache]
size=65536
What is the purpose of adding the icache_port and dcache_port?
I'm not very familiar with the port system.
I think ports are the way components communicate, often in master/slave pairs, e.g. the CPU is a master and the cache is a slave. So here I think the CPU ports are connected directly to the memory bus with nothing in between, which means there are no caches.
For example, in the above se.py run the config.ini shows this clearly: the CPU ports attach to the caches instead of directly to the memory bus.
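In Python config terms, the difference looks roughly like this; this is a sketch in the style of the learning_gem5 caches tutorial with made-up cache names (l1_icache, l1_dcache), not the exact se.py wiring:

# Without caches (simple.py): the CPU ports attach straight to the memory bus
system.cpu.icache_port = system.membus.slave
system.cpu.dcache_port = system.membus.slave

# With caches: the CPU ports attach to the L1 caches, which then attach to the bus
system.cpu.icache_port = system.l1_icache.cpu_side
system.l1_icache.mem_side = system.membus.slave
system.cpu.dcache_port = system.l1_dcache.cpu_side
system.l1_dcache.mem_side = system.membus.slave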

Getting exception while running Pig Script

I am getting the below error while running a Pig script on a dataset of approximately 300 GB.
Error: Exceeded limits on number of counters - Counters=120 Limit=120
Does anybody have any ideas on how to resolve this without modifying the counter config in the Pig properties file?
This can't really be qualified as a proper answer, since you still need to modify configuration files. I don't think there is currently any way of doing this without modifying some configuration file.
Now this is pure nitpicking, but you can actually do it without modifying the Pig properties file. All you need to do is configure the counter limit in the Hadoop configuration instead.
Add mapreduce.job.counters.max or mapreduce.job.counters.limit, depending on your Hadoop version, to your mapred-site.xml file, e.g.:
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>256</value>
</property>
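If I remember correctly, the newer (Hadoop 2.x / YARN) name is mapreduce.job.counters.max:

<property>
  <name>mapreduce.job.counters.max</name>
  <value>256</value>
</property>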
Remember to restart all node managers and also the history server.

Map Reduce job on Amazon: argument for custom jar

This is one of my first tries with MapReduce on AWS using its Management Console.
I have uploaded to AWS S3 my runnable jar, developed on Hadoop 0.18, which works on my local machine.
As described in the documentation, I have passed the S3 paths for input and output as arguments to the jar: that works, but the problem is the third argument, which is another path (as a string) to a file that I need to load while the job is running. That file resides in an S3 bucket too, but my jar doesn't seem to recognize the path and I get a FileNotFoundException when it tries to load it. That is strange, because this path looks exactly like the other two...
Anyone have any idea?
Thank you
Luca
This is a problem with AWS; please check Lesson 2 at http://meghsoft.com/blog/. See if you can use FileSystem.get(uri, conf) to obtain a file system supporting your path.
Hope this helps.
Sonal
Sonal,
thank you for your suggestion.
In the end the solution was to use the DistributedCache.
By loading the file into the cache before running the job, I can access everything I need inside the Map class by overriding the configure method and reading the file from the distributed cache.
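Roughly, the pattern is this (old org.apache.hadoop.mapred API, as used by Hadoop 0.18); the class names and the way the S3 path is passed are placeholders, not my actual job:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheExample {

    // The mapper reads the cached copy of the file from the task node's local disk
    public static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private Path lookupFile;

        @Override
        public void configure(JobConf job) {
            try {
                Path[] cached = DistributedCache.getLocalCacheFiles(job);
                if (cached != null && cached.length > 0) {
                    lookupFile = cached[0]; // local path, no S3 access needed here
                }
            } catch (IOException e) {
                throw new RuntimeException("Could not read distributed cache", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            // ... parse 'value' and use 'lookupFile' as needed ...
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheExample.class);
        conf.setMapperClass(MyMapper.class);
        // Register the third argument (an s3:// or s3n:// URI) before submitting;
        // Hadoop copies the file to every task node.
        DistributedCache.addCacheFile(new URI(args[2]), conf);
        // ... set input/output paths from args[0] and args[1], output types, etc. ...
        JobClient.runJob(conf);
    }
}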
Thank you,
Luca