So, I have this piece of code:
for (z in 0 until texture.extent.z) {
    println(z)
    for (y in 0 until texture.extent.y)
        for (x in 0 until texture.extent.x) {
            val v = Vec3(x, y, z) / texture.extent
            var n = when {
                FRACTAL -> FractalNoise().noise(v * noiseScale)
                else -> 20f * glm.perlin(v)
            }
            n -= glm.floor(n)
            data[x + y * texture.extent.x + z * texture.extent.x * texture.extent.y] = glm.floor(n * 255).b
        }
}
That takes over four minutes on the JVM. The original C++ sample uses OpenMP to accelerate the calculation.
I've heard about coroutines and was hoping I could take advantage of them in this case.
I first tried wrapping the whole set of loops in runBlocking, because I do want all the coroutines to have finished before I move on:
runBlocking {
    for (z in 0 until texture.extent.z) {
        println(z)
        for (y in 0 until texture.extent.y)
            for (x in 0 until texture.extent.x) {
                launch {
                    val v = Vec3(x, y, z) / texture.extent
                    var n = when {
                        FRACTAL -> FractalNoise().noise(v * noiseScale)
                        else -> 20f * glm.perlin(v)
                    }
                    n -= glm.floor(n)
                    data[x + y * texture.extent.x + z * texture.extent.x * texture.extent.y] = glm.floor(n * 255).b
                }
            }
    }
}
But this throws various thread errors and finally crashes the JVM:
[thread 27624 also had an error]
[thread 23784 also had an error]
[thread 14004 also had an error]
[thread 32652 also had an error]
[thread 32616 also had an error]
[thread 21264 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002d2fd50, pid=23452, tid=0x0000000000007b68
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# J 1431 C2 java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Ljava/util/concurrent/ForkJoinTask;)V (86 bytes) @ 0x0000000002d2fd50 [0x0000000002d2f100+0xc50]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\gBarbieri\IdeaProjects\Vulkan\hs_err_pid23452.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Process finished with exit code 1
I also tried collecting all the jobs into an ArrayList and calling join() on them at the end, but without success.
Can coroutines be used for a parallel task like this one?
If yes, what am I doing wrong?
Instead of coroutines you should consider the parallel computation engine built into the JDK: java.util.stream. What you have here is an embarrassingly parallelizable task, a perfect use case for it.
I'd use something along these lines:
IntStream.range(0, extent.x)
    .boxed()
    .parallel()
    .flatMap { x ->
        IntStream.range(0, extent.y).boxed().flatMap { y ->
            IntStream.range(0, extent.z).mapToObj { z ->
                Vec(x, y, z)
            }
        }
    }
    .forEach { vec ->
        data[vecToArrayIndex(vec)] = computeValue(vec)
    }
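For reference, here is a rough sketch of what the two helpers used above could look like, simply reusing the per-voxel computation from the question. The Vec type, texture, data, noiseScale, FRACTAL, FractalNoise and glm are assumed to be the same symbols as in your code, so treat this as an illustration rather than drop-in code:
// Hypothetical helpers for the pipeline above; the bodies just reuse the
// computation from the question (data is assumed to be a ByteArray).
fun vecToArrayIndex(v: Vec): Int =
    v.x + v.y * texture.extent.x + v.z * texture.extent.x * texture.extent.y

fun computeValue(v: Vec): Byte {
    val p = Vec3(v.x, v.y, v.z) / texture.extent
    var n = if (FRACTAL) FractalNoise().noise(p * noiseScale) else 20f * glm.perlin(p)
    n -= glm.floor(n)
    return glm.floor(n * 255).b
}
Since every voxel writes to a distinct index of data, the parallel forEach never has two threads writing the same element, which is what makes this embarrassingly parallel and safe without any synchronization.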
Suppose I have the following Nextflow channels:
Channel.fromFilePairs( "test/read*_R{1,2}.fa" )
    .set { reads }
reads.view()

Channel.fromPath( ['test/lib_R1.fa', 'test/lib_R2.fa'] )
    .set { libs }
libs.view()
Which results in:
// reads channel
[read_b, [<path>/test/read_b_R1.fa, <path>/test/read_b_R2.fa]]
[read_a, [<path>/test/read_a_R1.fa, <path>/test/read_a_R2.fa]]
// libs channel
<path>/test/lib_R1.fa
<path>/test/lib_R2.fa
How do I run a process foo on each matching read-lib pair, where the same lib is used for all read pairs? So basically I want to execute foo four times:
foo(test/read_b_R1.fa, test/lib_R1.fa)
foo(test/read_b_R2.fa, test/lib_R2.fa)
foo(test/read_a_R1.fa, test/lib_R1.fa)
foo(test/read_a_R2.fa, test/lib_R2.fa)
If you want to use the same library for all read pairs, what you really want is a value channel, which can be read an unlimited number of times without being consumed. Note that a value channel is implicitly created by a process when it is invoked with a simple value. This could indeed be a list of files, but it looks like what you want is one library file corresponding to each of the R1 or R2 reads. I think the simplest solution here is to include your process using an alias, so that you can pass in the required channels/files without too much effort:
params.reads = 'test/read*_R{1,2}.fa'

include { foo as foo_r1 } from './modules/foo.nf'
include { foo as foo_r2 } from './modules/foo.nf'

workflow {
    Channel
        .fromFilePairs( params.reads )
        .multiMap { sample, reads ->
            def (r1, r2) = reads
            read1: tuple(sample, r1)
            read2: tuple(sample, r2)
        }
        .set { reads }

    lib_r1 = file('test/lib_R1.fa')
    lib_r2 = file('test/lib_R2.fa')

    foo_r1(reads.read1, lib_r1)
    foo_r2(reads.read2, lib_r2)
}
Contents of ./modules/foo.nf:
process foo {
    debug true

    input:
    tuple val(sample), path(fasta)
    path(lib)

    """
    echo $sample, $fasta, $lib
    """
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [confident_boyd] DSL2 - revision: 8c81e2d743
executor > local (6)
[a8/e8a752] process > foo_r1 (2) [100%] 3 of 3 ✔
[75/2b32f5] process > foo_r2 (3) [100%] 3 of 3 ✔
readC, readC_R2.fa, lib_R2.fa
readA, readA_R1.fa, lib_R1.fa
readC, readC_R1.fa, lib_R1.fa
readB, readB_R2.fa, lib_R2.fa
readA, readA_R2.fa, lib_R2.fa
readB, readB_R1.fa, lib_R1.fa
process FOO {
    debug true

    input:
    tuple val(files), path(lib)

    output:
    stdout

    script:
    file_a = files[0]
    file_b = files[1]
    """
    echo $file_a with $lib
    echo $file_b with $lib
    """
}

workflow {
    Channel
        .of(['read_b', [file('/test/read_b_R1.fa'), file('/test/read_b_R2.fa')]],
            ['read_a', [file('/test/read_a_R1.fa'), file('/test/read_a_R2.fa')]])
        .set { reads }

    Channel
        .of(file('/test/lib_R1.fa'),
            file('/test/lib_R2.fa'))
        .set { libs }

    reads
        .map { sample, files -> files }
        .flatten()
        .map { file -> [file.name.split('_')[2].split('.fa')[0], file] }
        .groupTuple()
        .set { reads }

    libs
        .map { file -> [file.name.split('_')[1].split('.fa')[0], file] }
        .set { libs }

    reads
        .join(libs)
        .map { Rx, path, lib -> [path, lib] }
        | FOO
}
The output of the script above is:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [elegant_wiles] DSL2 - revision: 00862286fd
executor > local (2)
[58/9b3cf1] process > FOO (2) [100%] 2 of 2 ✔
/test/read_b_R1.fa with lib_R1.fa
/test/read_a_R1.fa with lib_R1.fa
/test/read_b_R2.fa with lib_R2.fa
/test/read_a_R2.fa with lib_R2.fa
EDIT as a reply to the comment below.
If you want the process to run once per element in the channel, check the modified version below:
process FOO {
    debug true

    input:
    tuple val(file), path(lib)

    output:
    stdout

    script:
    """
    echo $file with $lib
    """
}

workflow {
    Channel
        .of(['read_b', [file('/test/read_b_R1.fa'), file('/test/read_b_R2.fa')]],
            ['read_a', [file('/test/read_a_R1.fa'), file('/test/read_a_R2.fa')]])
        .set { reads }

    Channel
        .of(file('/test/lib_R1.fa'),
            file('/test/lib_R2.fa'))
        .set { libs }

    reads
        .map { sample, files -> files }
        .flatten()
        .map { file -> [file.name.split('_')[2].split('.fa')[0], file] }
        .groupTuple()
        .set { reads }

    libs
        .map { file -> [file.name.split('_')[1].split('.fa')[0], file] }
        .set { libs }

    reads
        .join(libs)
        .map { Rx, path, lib -> [path, lib] }
        .map { x, y -> [[x[0], y], [x[1], y]] }
        .flatMap()
        | FOO
}
Output:
N E X T F L O W ~ version 22.10.4
Launching `ex.nf` [sharp_ekeblad] DSL2 - revision: 1412af632e
executor > local (4)
[a0/416f59] process > FOO (1) [100%] 4 of 4 ✔
/test/read_b_R2.fa with lib_R2.fa
/test/read_a_R2.fa with lib_R2.fa
/test/read_a_R1.fa with lib_R1.fa
/test/read_b_R1.fa with lib_R1.fa
In the Chisel documentation there is an example of a rising-edge detection method, defined as follows:
def risingedge(x: Bool) = x && !RegNext(x)
All the example code is available in my GitHub project blp.
If I use it on an Input signal declared as follows:
class RisingEdge extends Module {
  val io = IO(new Bundle{
    val sclk = Input(Bool())
    val redge = Output(Bool())
    val fedge = Output(Bool())
  })

  // seems to not work with icarus + cocotb
  def risingedge(x: Bool) = x && !RegNext(x)
  def fallingedge(x: Bool) = !x && RegNext(x)

  // works with icarus + cocotb
  //def risingedge(x: Bool) = x && !RegNext(RegNext(x))
  //def fallingedge(x: Bool) = !x && RegNext(RegNext(x))

  io.redge := risingedge(io.sclk)
  io.fedge := fallingedge(io.sclk)
}
With this Icarus/cocotb testbench:
class RisingEdge(object):
    def __init__(self, dut, clock):
        self._dut = dut
        self._clock_thread = cocotb.fork(clock.start())

    @cocotb.coroutine
    def reset(self):
        short_per = Timer(100, units="ns")
        self._dut.reset <= 1
        self._dut.io_sclk <= 0
        yield short_per
        self._dut.reset <= 0
        yield short_per


@cocotb.test()
def test_rising_edge(dut):
    dut._log.info("Launching RisingEdge test")
    redge = RisingEdge(dut, Clock(dut.clock, 1, "ns"))
    yield redge.reset()
    cwait = Timer(10, "ns")
    for i in range(100):
        dut.io_sclk <= 1
        yield cwait
        dut.io_sclk <= 0
        yield cwait
I never get rising pulses on io.redge and io.fedge. To get the pulse I have to change the definition of risingedge as follows:
def risingedge(x: Bool) = x && !RegNext(RegNext(x))
With dual RegNext():
With a single RegNext():
Is this normal behavior?
[Edit: I modified the source example to match the GitHub example given above]
I'm not sure about Icarus, but using the default Treadle simulator with a test like this:
class RisingEdgeTest extends FreeSpec {
  "debug should toggle" in {
    iotesters.Driver.execute(Array("-tiwv"), () => new SlaveSpi) { c =>
      new PeekPokeTester(c) {
        for (i <- 0 until 10) {
          poke(c.io.csn, i % 2)
          println(s"debug is ${peek(c.io.debug)}")
          step(1)
        }
      }
    }
  }
}
I see this output:
[info] [0.002] debug is 0
[info] [0.002] debug is 1
[info] [0.002] debug is 0
[info] [0.003] debug is 1
[info] [0.003] debug is 0
[info] [0.003] debug is 1
[info] [0.004] debug is 0
[info] [0.004] debug is 1
[info] [0.005] debug is 0
[info] [0.005] debug is 1
And the waveform looks like this:
Can you explain what you think this should look like?
Do not change module input values on the rising edge of the clock.
OK, I found my bug. In the cocotb testbench I toggled the input values on the same edge as the synchronous clock. If we do that, the input changes right inside the register's setup window, so the behavior is undefined!
So the problem was a cocotb testbench bug and not a Chisel bug. To solve it, we just have to toggle the values on the other clock edge, like this:
@cocotb.test()
def test_rising_edge(dut):
    dut._log.info("Launching RisingEdge test")
    redge = RisingEdge(dut, Clock(dut.clock, 1, "ns"))
    yield redge.reset()
    cwait = Timer(4, "ns")
    yield FallingEdge(dut.clock)  # <--- 'synchronize' on the falling edge
    for i in range(5):
        dut.io_sclk <= 1
        yield cwait
        dut.io_sclk <= 0
        yield cwait
I am using win32, macx and unix:!macx (i.e. Linux) scopes in my .pro file to specify OS-specific settings, e.g.:
win32 {
    TARGET = myapp
    RC_FILE = myapp.rc
}

macx {
    TARGET = MyApp
    ICON = myapp.icns
    QMAKE_INFO_PLIST = Info.plist
}

unix:!macx { # linux
    CONFIG(debug, debug|release) {
        TARGET = myapp-debug
    }
    CONFIG(release, debug|release) {
        TARGET = myapp
    }
}
This works fine for if/else, if/else-if/else, and if-not constructs, where the condition is an OS specifier.
Is there a way to tell qmake it must compile a block for os1 or os2?
You can use the | operator for a logical OR. For example:
win32|macx {
    HEADERS += debugging.h
}
http://doc.qt.io/qt-4.8/qmake-advanced-usage.html
I'd like to use CGAL convex partitioning in an application that is based on the Epeck kernel, but trying to compile it throws the following error:
error:
no matching constructor for initialization of 'CGAL::Partition_vertex<CGAL::Partition_traits_2<CGAL::Epeck> >'
A simple test case is to take, for example, the greene_approx_convex_partition_2.cpp example from the distribution and change the kernel parameterization to Epeck.
Are the 2D convex partitioning routines supported on an Epeck kernel? Any pointers or advice would be much appreciated!
Here is a workaround:
--- a/include/CGAL/Partition_2/Indirect_edge_compare.h
+++ b/include/CGAL/Partition_2/Indirect_edge_compare.h
@@ -69,7 +69,7 @@ class Indirect_edge_compare
       else
       {
          // construct supporting line for edge
-         Line_2 line = _construct_line_2(*edge_vtx_1, *edge_vtx_2);
+         Line_2 line = _construct_line_2((Point_2)*edge_vtx_1, (Point_2)*edge_vtx_2);
          return _compare_x_at_y_2(*vertex, line) == SMALLER;
       }
    }
@@ -98,10 +98,10 @@ class Indirect_edge_compare
       // else neither endpoint is shared
       // construct supporting line
-      Line_2 l_p = _construct_line_2(*p, *after_p);
+      Line_2 l_p = _construct_line_2((Point_2)*p, (Point_2)*after_p);
       if (_is_horizontal_2(l_p))
       {
-         Line_2 l_q = _construct_line_2(*q, *after_q);
+         Line_2 l_q = _construct_line_2((Point_2)*q, (Point_2)*after_q);
          if (_is_horizontal_2(l_q))
          {
@@ -130,7 +130,7 @@ class Indirect_edge_compare
          return q_larger_x;
       // else one smaller and one larger
       // construct the other line
-      Line_2 l_q = _construct_line_2(*q, *after_q);
+      Line_2 l_q = _construct_line_2((Point_2)*q, (Point_2)*after_q);
       if (_is_horizontal_2(l_q))  // p is not horizontal
       {
          return _compare_x_at_y_2((*q), l_p) == LARGER;
I have also noticed that, while greene_approx_convex_partition_2 with Epeck results in the compiler error mentioned above, the alternative approx_convex_partition_2 compiles just fine with Epeck right out of the box.
I wrote a matrix multiplication algorithm that uses parallel collections to speed up the multiplication.
It goes like this:
(0 until M1_ROWS).grouped(PARTITION_ROWS).toList.par.map( i =>
  singleThreadedMultiplicationFAST(i.toArray.map(m1(_)), m2)
).reduce(_ ++ _)
Now I would like to do the same in Akka, so what I did is:
val multiplyer = actorOf[Pool]
multiplyer start

val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
  multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
futures.map(_.get match { case res: Array[Array[Double]] => res }).reduce(_ ++ _)

class Multiplyer extends akka.actor.Actor {
  protected def receive = {
    case MultiplyMatrix(m1, m2) => self reply singleThreadedMultiplicationFAST(m1, m2)
  }
}

class Pool extends Actor with DefaultActorPool
    with FixedCapacityStrategy with RoundRobinSelector {

  def receive = _route
  def partialFill = false
  def selectionCount = 1
  def instance = actorOf[Multiplyer]
  def limit = 32 // I tried 256 with no effect either
}
It turned out that the actor-based version of this algorithm uses only 200% CPU on my i7 Sandy Bridge, while the parallel-collections version uses 600% and is 4-5x faster.
I thought it might be the dispatcher and tried this:
self.dispatcher = Dispatchers.newThreadBasedDispatcher(self, mailboxCapacity = 100)
and this (I shared this one between the actors):
val messageDispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher("d1")
  .withNewBoundedThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity(100)
  .setCorePoolSize(16)
  .setMaxPoolSize(128)
  .setKeepAliveTimeInMillis(60000)
  .build
But I didn't observe any change: still only 200% processor usage, and the algorithm is 4-5 times slower than the parallel-collections version.
I am sure I am doing something silly, so please help! :)
This expression:
val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
  multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
creates a lazy collection: grouped on a Range returns an Iterator, and map on an Iterator is lazy, so each _.get blocks on one result before the next message is even sent. That makes your entire program serial.
The solution is to make that expression strict by adding toList or similar (just as your parallel-collections version already does), so that all the asks are dispatched up front and the workers run concurrently.
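A minimal sketch of that change against your code, with only the toList added:
// Force the grouped row ranges into a strict List before mapping, so every
// `?` (ask) is sent immediately and the futures run concurrently; only then
// do we block on the results.
val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).toList.map( i =>
  multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
val result = futures.map(_.get match { case res: Array[Array[Double]] => res }).reduce(_ ++ _)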