RisingEdge example doesn't work for module input signal in Chisel3 - hdl

In the Chisel documentation there is an example of a rising-edge detection method, defined as follows:
def risingedge(x: Bool) = x && !RegNext(x)
All example code is available on my github project blp.
If I use it on an Input signal declared as follows:
class RisingEdge extends Module {
  val io = IO(new Bundle{
    val sclk = Input(Bool())
    val redge = Output(Bool())
    val fedge = Output(Bool())
  })

  // seems to not work with icarus + cocotb
  def risingedge(x: Bool) = x && !RegNext(x)
  def fallingedge(x: Bool) = !x && RegNext(x)

  // works with icarus + cocotb
  //def risingedge(x: Bool) = x && !RegNext(RegNext(x))
  //def fallingedge(x: Bool) = !x && RegNext(RegNext(x))

  io.redge := risingedge(io.sclk)
  io.fedge := fallingedge(io.sclk)
}
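For reference, the intended behavior of the one-register detector can be modeled cycle by cycle; a rough Python sketch (illustration only, not Chisel):

```python
def edge_detector(samples):
    """Cycle-accurate model of the one-register edge detector:
    RegNext(x) is the value of x captured on the previous clock edge."""
    prev = False  # the register comes out of reset as 0
    pulses = []
    for x in samples:
        x = bool(x)
        rising = x and not prev     # x && !RegNext(x)
        falling = (not x) and prev  # !x && RegNext(x)
        pulses.append((rising, falling))
        prev = x  # the register captures x on this clock edge
    return pulses

# A 0,1,1,0 input gives one single-cycle rising pulse, then one falling pulse
print(edge_detector([0, 1, 1, 0]))
# [(False, False), (True, False), (False, False), (False, True)]
```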
With this Icarus/cocotb testbench:
class RisingEdge(object):
    def __init__(self, dut, clock):
        self._dut = dut
        self._clock_thread = cocotb.fork(clock.start())

    @cocotb.coroutine
    def reset(self):
        short_per = Timer(100, units="ns")
        self._dut.reset <= 1
        self._dut.io_sclk <= 0
        yield short_per
        self._dut.reset <= 0
        yield short_per

@cocotb.test()
def test_rising_edge(dut):
    dut._log.info("Launching RisingEdge test")
    redge = RisingEdge(dut, Clock(dut.clock, 1, "ns"))
    yield redge.reset()
    cwait = Timer(10, "ns")
    for i in range(100):
        dut.io_sclk <= 1
        yield cwait
        dut.io_sclk <= 0
        yield cwait
I never get rising pulses on io.redge and io.fedge. To get the pulses I have to change the definition of risingedge as follows:
def risingedge(x: Bool) = x && !RegNext(RegNext(x))
With dual RegNext():
With a single RegNext():
Is this normal behavior?
[Edit: I replaced the source example with the GitHub example given above.]

I'm not sure about Icarus, but here is the default Treadle simulator running a test like this:
class RisingEdgeTest extends FreeSpec {
  "debug should toggle" in {
    iotesters.Driver.execute(Array("-tiwv"), () => new SlaveSpi) { c =>
      new PeekPokeTester(c) {
        for (i <- 0 until 10) {
          poke(c.io.csn, i % 2)
          println(s"debug is ${peek(c.io.debug)}")
          step(1)
        }
      }
    }
  }
}
I see the output
[info] [0.002] debug is 0
[info] [0.002] debug is 1
[info] [0.002] debug is 0
[info] [0.003] debug is 1
[info] [0.003] debug is 0
[info] [0.003] debug is 1
[info] [0.004] debug is 0
[info] [0.004] debug is 1
[info] [0.005] debug is 0
[info] [0.005] debug is 1
And the waveform looks like this:
Can you explain what you think this should look like?

Do not change a module input value on the rising edge of the clock.
OK, I found my bug. In the cocotb testbench I toggled input values on the same edge as the synchronous clock. If you do that, the input changes exactly inside the setup window of the flip-flop, so the behavior is undefined!
So the problem was a cocotb testbench bug and not a Chisel bug. To solve it, just toggle the values on the opposite clock edge, like this:
@cocotb.test()
def test_rising_edge(dut):
    dut._log.info("Launching RisingEdge test")
    redge = RisingEdge(dut, Clock(dut.clock, 1, "ns"))
    yield redge.reset()
    cwait = Timer(4, "ns")
    yield FallingEdge(dut.clock)  # <--- 'synchronize' on falling edge
    for i in range(5):
        dut.io_sclk <= 1
        yield cwait
        dut.io_sclk <= 0
        yield cwait
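Why toggling on the sampling edge is dangerous can be shown with a toy event model (plain Python, hypothetical names): when the input drive and the clock edge land on the same simulation timestamp, the value the register captures depends purely on event ordering.

```python
def register_sample(events, d=0):
    """Toy delta-cycle model: a register captures D on the 'clock' event.
    'events' is the order in which same-timestamp events are processed."""
    q = None
    for ev in events:
        if ev[0] == 'drive':
            d = ev[1]           # testbench writes the input
        elif ev[0] == 'clock':  # rising edge: register samples current D
            q = d
    return q

# Same timestamp, two legal orderings, two different captured values:
print(register_sample([('clock',), ('drive', 1)]))  # 0  (old value wins)
print(register_sample([('drive', 1), ('clock',)]))  # 1  (new value wins)
```

Driving the input on the falling edge keeps the drive and the sampling edge at different timestamps, so the ordering ambiguity disappears.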

Related

Spurious 'variable is never used' warning?

I'm trying to clean up my code to get rid of Kotlin warnings, but I don't understand what's causing a couple of them. I have this code:
var slowestMps: Float? = null
for (r in this.sections.subList(i + 1, this.sections.size - 1)) {
    if (!r.locs.any { l -> l.speed <= this.maxReversePointMps }) {
        continue
    }
    val mps = r.m / r.s
    if (slowest == null || (slowestMps != null && mps < slowestMps)) {
        slowest = r
        slowestMps = mps
    }
}
And I get the warnings:
Variable 'slowestMps' is never used
The value 'mps' assigned to 'var slowestMps: Float? defined in ... is never used
Why are these warnings getting triggered - aren't these variables being used?

How can I implement coroutines for a parallel task

So, I have this piece of code:
for (z in 0 until texture.extent.z) {
    println(z)
    for (y in 0 until texture.extent.y)
        for (x in 0 until texture.extent.x) {
            val v = Vec3(x, y, z) / texture.extent
            var n = when {
                FRACTAL -> FractalNoise().noise(v * noiseScale)
                else -> 20f * glm.perlin(v)
            }
            n -= glm.floor(n)
            data[x + y * texture.extent.x + z * texture.extent.x * texture.extent.y] = glm.floor(n * 255).b
        }
}
That takes over 4 minutes on the JVM. The original sample in C++ uses OpenMP to accelerate the calculation.
I've heard about coroutines and I hope I could take advantage of them in this case.
I first tried wrapping all the loops in a runBlocking, because I want all the coroutines to have finished before I move on.
runBlocking {
    for (z in 0 until texture.extent.z) {
        println(z)
        for (y in 0 until texture.extent.y)
            for (x in 0 until texture.extent.x) {
                launch {
                    val v = Vec3(x, y, z) / texture.extent
                    var n = when {
                        FRACTAL -> FractalNoise().noise(v * noiseScale)
                        else -> 20f * glm.perlin(v)
                    }
                    n -= glm.floor(n)
                    data[x + y * texture.extent.x + z * texture.extent.x * texture.extent.y] = glm.floor(n * 255).b
                }
            }
    }
}
But this throws various thread errors plus a final JVM crash:
[thread 27624 also had an error][thread 23784 also had an error]# A fatal error has been detected by the Java Runtime Environment:
#
[thread 27624 also had an error][thread 23784 also had an error]# A fatal error has been detected by the Java Runtime Environment:
#
# [thread 14004 also had an error]EXCEPTION_ACCESS_VIOLATION
[thread 32652 also had an error] (0xc0000005)[thread 32616 also had an error]
at pc=0x0000000002d2fd50
, pid=23452[thread 21264 also had an error], tid=0x0000000000007b68
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# J 1431 C2 java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Ljava/util/concurrent/ForkJoinTask;)V (86 bytes) # 0x0000000002d2fd50 [0x0000000002d2f100+0xc50]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\gBarbieri\IdeaProjects\Vulkan\hs_err_pid23452.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Process finished with exit code 1
I also tried collecting all the jobs into an arrayList and join()ing them at the end, but without success.
May coroutine be used for a parallel task like this one?
If yes, what am I doing wrong?
Instead of coroutines you should consider the parallel computation engine built into the JDK: java.util.stream. What you have here is an embarrassingly parallelizable task, a perfect use case for it.
I'd use something along these lines:
IntStream.range(0, extent.x)
    .boxed()
    .parallel()
    .flatMap { x ->
        IntStream.range(0, extent.y).boxed().flatMap { y ->
            IntStream.range(0, extent.z).mapToObj { z ->
                Vec(x, y, z)
            }
        }
    }
    .forEach { vec ->
        data[vecToArrayIndex(vec)] = computeValue(vec)
    }
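The same embarrassingly parallel shape can be sketched in Python with the standard library. Names like compute_value and fill_volume are placeholders, not the original code, and the noise formula is a stand-in:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def compute_value(x, y, z):
    # stand-in for the per-voxel noise computation (hypothetical)
    return (x * 31 + y * 7 + z) % 256

def fill_volume(ex, ey, ez, workers=4):
    """Map every (x, y, z) coordinate to a byte, in parallel."""
    data = bytearray(ex * ey * ez)
    coords = list(product(range(ex), range(ey), range(ez)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(lambda c: compute_value(*c), coords)
        for (x, y, z), v in zip(coords, values):
            # same flattened index scheme as the Kotlin code
            data[x + y * ex + z * ex * ey] = v
    return data
```

Note the parallelism caveat: on the JVM a parallel stream spreads CPU-bound work across cores, but in CPython threads share the GIL, so for real speedup you would switch to ProcessPoolExecutor.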

What is the Kotlin idiom for an equivalent to this Python iterator

The question is: how do you create a Python-like iterator in Kotlin?
Consider this Python code that parses a string into substrings:
def parse(strng, idx=1):
    lst = []
    for i, c in itermarks(strng, idx):
        if c == '}':
            lst.append(strng[idx:i-1])
            break
        elif c == '{':
            sublst, idx = parse(strng, i+1)
            lst.append(sublst)
        else:
            lst.append(strng[idx:i-1])
            idx = i+1
    return lst, i
>>>res,resl = parse('{ a=50 , b=75 , { e=70, f=80 } }')
>>>print(resl)
>>>[' a=50', ' b=75', [' e=7', ' f=80'], '', ' f=80']
This is a play example just to illustrate a python iterator:
def findany(strng, idx, chars):
    """ to emulate 'findany' in kotlin """
    while idx < len(strng) and strng[idx] not in chars:
        idx += 1
    return idx

def itermarks(strng, idx=0):
    while True:
        idx = findany(strng, idx, ',{}"')
        if idx >= len(strng):
            break
        yield idx, strng[idx]
        if strng[idx] == '}':
            break
        idx += 1
Kotlin has iterators and generators, and as I understand it there can only be one per type. My idea is to define a type with a generator and instantiate that type, so that the for loop from 'parse' (above) would look like this:
for ((i, c) in IterMarks(strng)) {
    ......
}
But how do I define the generator, and what is the best idiom?
Kotlin uses two interfaces: Iterable<T> (from JDK) and Sequence<T>. They are identical, with the exception that the first one is eager, while the second one is lazy by convention.
To work with iterators or sequences all you have to do is implement one of those interfaces. Kotlin stdlib has a bunch of helper functions that may help.
In particular, a couple of functions for creating sequences with yield was added in 1.1. They and some other functions are called generators. Use them if you like them, or implement the interfaces manually.
OK, after some work, here is the Kotlin for the iterator:
import kotlin.coroutines.experimental.*

fun iterMarks(strng: String, idx: Int = 0) = buildSequence {
    val specials = listOf("\"", "{", "}", ",")
    var found: Pair<Int, String>?
    var index = idx
    while (true) {
        found = strng.findAnyOf(specials, index)
        if (found == null) break
        yield(found)
        index = found.first + 1
    }
}
The main discoveries were that an iterator can be returned by any function, so there is no need to add the iterator methods to an existing object. The JetBrains documentation is solid but lacks examples, so hopefully the example above helps. You can also work from the basics; again, the notes are good but lack examples. I will post more on other approaches if there is interest.
This code for 'parse' then works:
fun parse(strng: String, idxIn: Int = 1): Pair<Any, Int> {
    val lst: MutableList<Any> = mutableListOf()
    var idx = idxIn
    loop@ for (mark in iterMarks(strng, idx)) {
        if (mark == null || mark.first <= idx) {
            // nothing needed
        } else {
            when (mark.second) {
                "}" -> {
                    lst.add(strng.slice(idx..mark.first - 1))
                    idx = mark.first + 1
                    break@loop
                }
                "{" -> {
                    val res: Pair<Any, Int> = parse(strng, mark.first + 1)
                    lst.add(res.first)
                    idx = res.second
                }
                "," -> {
                    lst.add(strng.slice(idx..mark.first - 1))
                    idx = mark.first + 1
                }
            }
        }
    }
    return Pair(lst, idx)
}
Hopefully this example will make it less work for the next person new to Kotlin, by providing an example of implementing an iterator. Specifically, if you know how to make an iterator in Python, then this example should be useful.

How to step over objc_msgSend function in lldb?

In Objective-C, a method call is translated to objc_msgSend, e.g.
[foo doSomething:@"keke"]
is translated to
objc_msgSend(foo, "doSomething:", @"keke")
How could I step directly to foo.doSomething: while debugging in lldb?
lldb provides thread plans, which can control the stepping logic.
class GotoUser:
    def __init__(self, thread_plan, dict):
        self.start_time = time.time()
        self.thread_plan = thread_plan
        target = self.thread_plan.GetThread().GetProcess().GetTarget()
        module = target.GetModuleAtIndex(0)
        sbaddr = lldb.SBAddress(module.GetObjectFileHeaderAddress())
        self.start_address = sbaddr.GetLoadAddress(target)
        module = target.GetModuleAtIndex(1)
        sbaddr = lldb.SBAddress(module.GetObjectFileHeaderAddress())
        self.end_address = sbaddr.GetLoadAddress(target)
        print "start addr: ", hex(self.start_address), " end addr: ", hex(self.end_address)

    def explains_stop(self, event):
        if self.thread_plan.GetThread().GetStopReason() == lldb.eStopReasonTrace:
            return True
        else:
            return False

    def should_stop(self, event):
        cur_pc = self.thread_plan.GetThread().GetFrameAtIndex(0).GetPC()
        if cur_pc >= self.start_address and cur_pc <= self.end_address:
            self.thread_plan.SetPlanComplete(True)
            print 'time used ', (time.time() - self.start_time)
            return True
        else:
            return False

    def should_step(self):
        return True
Create a Python script and load it into lldb.
Run thread step-scripted gotouser.GotoUser.
Mission completed.
full source code:
https://github.com/Jichao/lldb-scripts/blob/master/gotouser.py
version built into lldb:
https://github.com/Jichao/lldb
Using the Python thread plan is clever! But you should not have to do this for ObjC messages. lldb knows that objc_msgSend & a few others are the dispatch functions for ObjC messages. So if a step in ends up at objc_msgSend, lldb will figure out the method implementation from the object/selector pair passed in, set a breakpoint there, and continue.
For instance:
(lldb) run
Process 55502 launched: '/private/tmp/trivial' (x86_64)
Process 55502 stopped
* thread #1: tid = 0x32619ba, function: main , stop reason = breakpoint 1.1
frame #0: 0x0000000100000f28 trivial`main at trivial.m:18
15 main()
16 {
17 Trivial *my_foo = [[Trivial alloc] init];
-> 18 [my_foo doSomething];
19 return 0;
20 }
(lldb) s
Process 55502 stopped
* thread #1: tid = 0x32619ba, function: -[Trivial doSomething] , stop reason = step in
frame #0: 0x0000000100000ed7 trivial`-[Trivial doSomething] at trivial.m:10
7 @implementation Trivial
8 - (void) doSomething
9 {
-> 10 NSLog(@"%@ called doSomething.", self);
11 }
12 @end
13
So the step in stopped at the actual message receiver in this case. If that is not what is happening for you, most likely something is fooling the part of lldb that does the object/selector -> implementation lookup. I'd have to know more about your code to figure out why that might be.
How could I step directly to foo.doSomething: while debugging in lldb?
It seems: not possible.
Workaround: add a breakpoint on the target method:
-[foo doSomething:]

Akka - load balancing and full use of the processor

I wrote a matrix multiplication algorithm that uses parallel collections to speed up the multiplication.
It goes like this:
(0 until M1_ROWS).grouped(PARTITION_ROWS).toList.par.map( i =>
  singleThreadedMultiplicationFAST(i.toArray.map(m1(_)), m2)
).reduce(_ ++ _)
Now I would like to do the same in Akka, so what I did is:
val multiplyer = actorOf[Pool]
multiplyer start

val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
  multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
futures.map(_.get match { case res: Array[Array[Double]] => res }).reduce(_ ++ _)

class Multiplyer extends akka.actor.Actor {
  protected def receive = {
    case MultiplyMatrix(m1, m2) => self reply singleThreadedMultiplicationFAST(m1, m2)
  }
}

class Pool extends Actor with DefaultActorPool
    with FixedCapacityStrategy with RoundRobinSelector {
  def receive = _route
  def partialFill = false
  def selectionCount = 1
  def instance = actorOf[Multiplyer]
  def limit = 32 // I tried 256 with no effect either
}
It turned out that the actor-based version of this algorithm uses only 200% CPU on my i7 Sandy Bridge, while the parallel-collections version uses 600% and is 4-5x faster.
I thought it might be the dispatcher and tried this:
self.dispatcher = Dispatchers.newThreadBasedDispatcher(self, mailboxCapacity = 100)
and this(I shared this one between actors):
val messageDispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher("d1")
  .withNewBoundedThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity(100)
  .setCorePoolSize(16)
  .setMaxPoolSize(128)
  .setKeepAliveTimeInMillis(60000).build
But I didn't observe any change: still only 200% processor usage, and the algorithm is 4-5 times slower than the parallel-collections version.
I am sure I am doing something silly, so please help! :)
This expression:
val futures = (0 until M1_ROWS).grouped(PARTITION_ROWS).map( i =>
  multiplyer ? MultiplyMatrix(i.toArray.map(m1(_)), m2)
)
creates a lazy collection, so your _.get makes the entire program serial: each future is awaited before the next request is even sent.
The solution is to make that expression strict by adding toList or similar.
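The lazy-collection trap is easy to reproduce in any language. A small Python analogue (illustrative only, not Akka): a generator of futures consumed with a blocking get submits one task, waits for it, then submits the next, so nothing ever overlaps.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(x):
    time.sleep(0.1)  # stand-in for one matrix-block multiplication
    return x * x

pool = ThreadPoolExecutor(max_workers=4)

# Lazy: each submit happens only when the consumer reaches it,
# so submit/wait alternate and the tasks run one after another.
lazy = (pool.submit(slow_task, i) for i in range(4))
t0 = time.time()
results_serial = [f.result() for f in lazy]
serial_time = time.time() - t0

# Strict (the 'toList' fix): all tasks are submitted up front,
# so they run in parallel before we start waiting.
strict = [pool.submit(slow_task, i) for i in range(4)]
t0 = time.time()
results_parallel = [f.result() for f in strict]
parallel_time = time.time() - t0

print(results_serial, results_parallel)  # [0, 1, 4, 9] twice
print(serial_time > parallel_time)       # True: laziness serialized the work
```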