I am developing a simple NAND module in SystemC. By specification, it should have a 4 ns delay so I tried to describe it with a process with a "wait" statement and SC_THREAD, as follows:
//file: nand.h
#include "systemc.h"
SC_MODULE(nand2){
sc_in<bool> A, B;
sc_out<bool> F;
void do_nand2(){
bool a, b, f;
a = A.read();
b = B.read();
f = !(a && b);
wait(4, SC_NS);
F.write(f);
}
SC_CTOR(nand2){
SC_THREAD(do_nand2);
sensitive << A << B;
}
};
To simulate it, I created another module that outputs the stimulus for the NAND, as follows:
//file: stim.h
#include "systemc.h"
SC_MODULE(stim){
sc_out<bool> A, B;
sc_in<bool> Clk;
void stimGen(){
wait();
A.write(false);
B.write(false);
wait();
A.write(false);
B.write(true);
wait();
A.write(true);
B.write(true);
wait();
A.write(true);
B.write(false);
}
SC_CTOR(stim){
SC_THREAD(stimGen);
sensitive << Clk.pos();
}
};
Having these two modules described, the top module (where sc_main is) looks like this:
//file: top.cpp
#include "systemc.h"
#include "nand.h"
#include "stim.h"
int sc_main(int argc, char* argv[]){
sc_signal<bool> ASig, BSig, FSig;
sc_clock Clk("Clock", 100, SC_NS, 0.5);
stim Stim("Stimulus");
Stim.A(ASig); Stim.B(BSig); Stim.Clk(Clk);
nand2 nand2("nand2");
nand2.A(ASig); nand2.B(BSig); nand2.F(FSig);
sc_trace_file *wf = sc_create_vcd_trace_file("sim");
sc_trace(wf, Stim.Clk, "Clock");
sc_trace(wf, nand2.A, "A");
sc_trace(wf, nand2.B, "B");
sc_trace(wf, nand2.F, "F");
sc_start(400, SC_NS);
sc_close_vcd_trace_file(wf);
return 0;
}
The code compiled and simulated with no errors. However, when viewing the .vcd file in gtkwave, the output (F) is stuck at 1 and only shows the delay at the beginning of the simulation.
To test if there were any errors in the code I removed the "wait" statements and changed SC_THREAD to SC_METHOD in the nand.h file and simulated again, now getting the correct results, but without the delays of course.
What am I doing wrong?
It's best to use an SC_METHOD that is sensitive to the inputs for the do_nand2 process. A thread usually contains an infinite loop and runs for the entire length of the simulation, whereas a method runs once from beginning to end each time it is triggered. Your do_nand2 thread has no loop, so it runs once, writes F after 4 ns, and then terminates, which is why F never changes again. Threads are mostly used for stimulus or concurrent processes, and they may or may not be sensitive to any events.
Just solved the problem:
instead of using
wait(4, SC_NS);
with SC_THREAD I used
next_trigger(4, SC_NS);
with SC_METHOD and it worked just fine.
I'm trying to read the value of the ccount register on an ESP8266. The first read after reset seems sensible, but the remaining values look suspicious.
Here is a complete code snippet I'm using
/* Hello World Example
This example code is in the Public Domain (or CC0 licensed, at your option.)
Unless required by applicable law or agreed to in writing, this
software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied.
*/
#include <stdio.h>
#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_spi_flash.h"
static inline uint32_t get_ccount(void)
{
volatile uint32_t r;
__asm__ __volatile__("rsr %0,ccount":"=a" (r));
return r;
}
static void print_ccount()
{
uint32_t c = get_ccount();
printf("ccount: %u\n", c);
}
void app_main()
{
for (int i = 10; i >= 0; i--) {
print_ccount();
vTaskDelay(1000 / portTICK_PERIOD_MS);
}
printf("Restarting now.\n");
fflush(stdout);
esp_restart();
}
The first read after reset is usually something like 659430 or 110466, but the remaining reads usually return a recurring value of '1981' or something similar. Even after a reset, all reads after the first return '1981'.
Sometimes the output looks like this:
�ccount: 110466
I'm afraid that there is some garbage on the stack but I can't figure out what is the cause.
I encountered a problem when using OpenMP to parallelize my code. I have attached the simplest code that can reproduce my problem.
#include <iostream>
#include <vector>
using namespace std;
int main()
{
int n = 10;
int size = 1;
vector<double> vec(1, double(1.0));
double sum = 0.0;
#pragma omp parallel for private(vec) reduction(+: sum)
for (int i = 0; i != n; ++i)
{
/* in real case, complex operations applied on vec here */
sum += vec[0];
}
cout << "sum: " << sum << endl;
return 0;
}
I compile with g++ using the -fopenmp flag, and the program fails with "Segmentation fault (core dumped)". I am wondering what's wrong with the code.
Note that vec should be set to private since in the real code a complex operation is applied on vec in the for-loop.
The problem indeed comes from the private(vec) clause. There are two issues with this code.
First, from a semantics perspective, the private(vec) should be shared(vec), as the intent seems to be to work on the same std::vector instance in parallel. So, the code should look like this:
#pragma omp parallel for shared(vec), reduction(+: sum)
for (int i = 0; i != n; ++i)
{
sum += vec[0];
}
In the original code, private(vec) creates a private instance of std::vector for each thread and initializes these instances by calling the default constructor of std::vector.
Second, the segmentation fault arises because those default-constructed private instances are empty, so there is no vec[0] element in any of them. This can be confirmed by calling vec.size() from the threads.
PS: shared(vec) would have been the default sharing for vec as per the OpenMP specification anyway.
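If each thread really does need its own copy of vec, as the question suggests, one option is firstprivate(vec), which copy-constructs each thread's instance from the original instead of default-constructing it. The following is a minimal sketch under that assumption; the function name sum_with_firstprivate is hypothetical, and the per-iteration work on vec is only a placeholder comment:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds vec[0] to sum n times.
// firstprivate(vec) gives every thread a copy of vec that is
// copy-constructed from the original, so vec[0] exists in each copy.
double sum_with_firstprivate(int n) {
    assert(n >= 0);
    std::vector<double> vec(1, 1.0);
    double sum = 0.0;
    #pragma omp parallel for firstprivate(vec) reduction(+: sum)
    for (int i = 0; i < n; ++i) {
        /* per-thread operations on the private copy of vec would go here */
        sum += vec[0];
    }
    return sum;
}
```

Calling sum_with_firstprivate(10) returns 10.0. If the code is compiled without -fopenmp, the pragma is ignored and the loop simply runs serially with the same result.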
I have been reading this upvoted answer on Stack Overflow: https://stackoverflow.com/a/26129960/12311164
It says that replacing wait(delay, units); in an SC_THREAD with next_trigger(delay, units) in an SC_METHOD works.
But when I tried it, it did not work. I am trying to build an adder module with a 2 ns output delay. Instead of having a 2 ns output delay, the adder output is updated every 2 ns.
Design:
#include "systemc.h"
#define WIDTH 4
SC_MODULE(adder) {
sc_in<sc_uint<WIDTH> > A, B;
sc_out<sc_uint<WIDTH> > OUT;
void add(){
sc_time t1 = sc_time_stamp();
int current_time = t1.value();
int intermediate = A.read() + B.read();
next_trigger(2, SC_NS);
OUT.write(intermediate);
cout << " SC_METHOD add triggered at "<<sc_time_stamp() <<endl;
}
SC_CTOR(adder){
SC_METHOD(add);
sensitive << A << B;
}
};
I know how to simulate the delay using two techniques: sc_event with SC_METHOD, and the wait statement in SC_THREAD. However, I would like to simulate the delay using next_trigger(). I have read the Language Reference Manual, but could not figure out how to do it.
Simulated on EDA Playground here: https://edaplayground.com/x/dFzc
I think I need to trigger the method 2 ns after the inputs change. How can I do that?
You will have to track state manually:
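// Holds the sum between the input change and the delayed output write.
sc_uint<WIDTH> intermediate;
void add(){
// An input changed, or this is the initial invocation (delta count 0):
// latch the sum and ask to be invoked again in 2 ns.
if (A->event() || B->event() || sc_delta_count() == 0) {
intermediate = A.read() + B.read();
next_trigger(2, SC_NS);
} else {
// Invoked by the 2 ns timeout: drive the delayed result.
OUT->write(intermediate);
}
}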
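The problem is that next_trigger doesn't magically transform your SC_METHOD into an SC_THREAD. It does not suspend the method; it only determines when the method is invoked next, so the rest of the body, including OUT.write, still runs immediately. In general, I find any use of next_trigger inconvenient, and there are better ways of doing this using sc_event.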
I've read that SC_CTHREAD works only with bool, like:
SC_MODULE(my_module){
sc_in<bool> clk;
// ...
void foo();
// ...
SC_CTOR(my_module){
SC_CTHREAD(foo, clk.pos());
}
};
But what if I have sc_in_clk clk in my module, as in this example: http://www.asic-world.com/systemc/process3.html? When I simulate it that way, the result of the function is never calculated, so for now I'm using SC_METHOD(foo); sensitive << clk.pos(); instead.
My question is: how can I use sc_in_clk type and SC_CTHREAD both at the same time? Do I need to cast clk to bool somehow?
Yes, you can use both at the same time, because sc_in_clk is merely a typedef of sc_in<bool>. That means it doesn't matter whether you use sc_in_clk or sc_in<bool> with SC_CTHREAD.
From the documentation:
typedef sc_in<bool> sc_in_clk;
The typedef sc_in_clk is provided for convenience when adding clock inputs to a module and for backward compatibility with earlier versions of SystemC. An application may use sc_in_clk or sc_in< bool > interchangeably.
I tried to reproduce your problem in my own environment (SystemC 2.3.2). Based on the snippet you posted, I created this small SystemC program:
#include <systemc.h>
SC_MODULE(my_module)
{
sc_in_clk clk;
void foo();
SC_CTOR(my_module)
{
SC_CTHREAD(foo, clk.pos());
}
};
void my_module::foo()
{
while(1)
{
cout << sc_time_stamp() << endl;
wait();
}
}
my_module *DUT;
int sc_main(int argc, char** argv){
sc_clock clk("clk", 10, SC_NS);
DUT = new my_module("my_module");
DUT->clk(clk);
sc_start(50, SC_NS);
return 0;
}
This code works as expected and the output is:
0 s
10 ns
20 ns
30 ns
40 ns
You can try to match the structure of your code to the structure of the program above to find potential other bugs in your code.
What is the structure of your void foo()? Does it contain any form of the wait function, other than void wait(); or void wait(int);? Because a clocked thread process may only call these two forms of wait.
I'm making an XOR gate in SystemC by connecting four NAND gates. I want the module to receive a vector of N bits, where N is passed as a parameter. I should be able to perform bitwise & and NOT operations (for the NAND gate).
The best solution may be using sc_bv_base type, but I don't know how to initialize it in the constructor.
How can I create a bit vector using a custom length?
A way to parameterise the module is to create a new C++ template for the module.
In this example, the width of the input vector can be set when the module is instantiated:
#ifndef MY_XOR_H_
#define MY_XOR_H_
#include <systemc.h>
template<int depth>
struct my_xor: sc_module {
sc_in<bool > clk;
sc_in<sc_uint<depth> > din;
sc_out<bool > dout;
void p1() {
dout.write(xor_reduce(din.read()));
}
SC_CTOR(my_xor) {
SC_METHOD(p1);
sensitive << clk.pos();
}
};
#endif /* MY_XOR_H_ */
Note that struct my_xor: sc_module is used instead of the SC_MODULE macro (see page 40, section 5.2.5 SC_MODULE, of IEEE Std 1666-2011).
You can test this with the following testbench:
//------------------------------------------------------------------
// Simple Testbench for xor file
//------------------------------------------------------------------
#include <systemc.h>
#include "my_xor.h"
int sc_main(int argc, char* argv[]) {
const int WIDTH = 8;
sc_signal<sc_uint<WIDTH> > din;
sc_signal<bool> dout;
sc_clock clk("clk", 10, SC_NS, 0.5); // Create a clock signal
my_xor<WIDTH> DUT("my_xor"); // Instantiate Device Under Test
DUT.din(din); // Connect ports
DUT.dout(dout);
DUT.clk(clk);
sc_trace_file *fp; // Create VCD file
fp = sc_create_vcd_trace_file("wave"); // open(fp), create wave.vcd file
fp->set_time_unit(100, SC_PS); // set trace time unit to 100 ps
sc_trace(fp, clk, "clk"); // Add signals to trace file
sc_trace(fp, din, "din");
sc_trace(fp, dout, "dout");
sc_start(31, SC_NS); // Run simulation
din = 0x00;
sc_start(31, SC_NS); // Run simulation
din = 0x01;
sc_start(31, SC_NS); // Run simulation
din = 0xFF;
sc_start(31, SC_NS); // Run simulation
sc_close_vcd_trace_file(fp); // close(fp)
return 0;
}
Note that I'm using a struct and not a class. A class is also possible.
class my_xor: public sc_module{
public:
The XOR in this code is just xor_reduce. You can find more about it in IEEE Std 1666-2011 on page 197 (7.2.8 Reduction operators). But I assume this is not the solution you wanted.
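If the goal is the bitwise XOR of two N-bit vectors built from four NAND gates, the gate structure itself can be sketched in plain C++ with the width as a template parameter. This is not SystemC: std::bitset stands in for a fixed-width bit vector, and nand_gate, xor_from_nands and check_xor_from_nands are hypothetical names chosen for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Bitwise NAND of two N-bit vectors.
template <std::size_t N>
std::bitset<N> nand_gate(const std::bitset<N>& a, const std::bitset<N>& b) {
    return ~(a & b);
}

// XOR built from four NAND gates, applied to every bit position.
template <std::size_t N>
std::bitset<N> xor_from_nands(const std::bitset<N>& a, const std::bitset<N>& b) {
    const std::bitset<N> n1 = nand_gate(a, b);   // gate 1
    const std::bitset<N> n2 = nand_gate(a, n1);  // gate 2
    const std::bitset<N> n3 = nand_gate(b, n1);  // gate 3
    return nand_gate(n2, n3);                    // gate 4
}

// Compares the NAND construction with operator^ for all pairs of
// N-bit inputs. Intended for small N only (the loop is 4^N iterations).
template <std::size_t N>
void check_xor_from_nands() {
    for (unsigned long a = 0; a < (1UL << N); ++a) {
        for (unsigned long b = 0; b < (1UL << N); ++b) {
            const std::bitset<N> x(a), y(b);
            assert(xor_from_nands(x, y) == (x ^ y));
        }
    }
}
```

For example, xor_from_nands(std::bitset<8>(0xA5), std::bitset<8>(0x0F)) yields the bit pattern 0xAA. In a SystemC module the same four-gate structure could be expressed with the library's own vector types, but that mapping is not shown here.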