Clock type in SC_CTHREAD - systemc

I've read that SC_CTHREAD works only with bool, like:
SC_MODULE(my_module){
sc_in<bool> clk;
// ...
void foo();
// ...
SC_CTOR(my_module){
SC_CTHREAD(foo, clk.pos());
}
};
But what if my module declares sc_in_clk clk, as in this example: http://www.asic-world.com/systemc/process3.html? When I simulate that, the result of the function is never calculated, so I'm using SC_METHOD(foo); sensitive << clk.pos(); instead.
My question is: how can I use the sc_in_clk type together with SC_CTHREAD? Do I need to cast clk to bool somehow?

Yes, you can use both together, because sc_in_clk is merely a typedef of sc_in<bool>. It therefore makes no difference whether you use sc_in_clk or sc_in<bool> with SC_CTHREAD.
From the documentation:
typedef sc_in<bool> sc_in_clk;
The typedef sc_in_clk is provided for convenience when adding clock inputs to a module and for backward compatibility with earlier versions of SystemC. An application may use sc_in_clk or sc_in< bool > interchangeably.
I tried to reproduce your problem in my own environment (SystemC 2.3.2). Based on the snippet you posted, I created this small SystemC program:
#include <systemc.h>
SC_MODULE(my_module)
{
sc_in_clk clk;
void foo();
SC_CTOR(my_module)
{
SC_CTHREAD(foo, clk.pos());
}
};
void my_module::foo()
{
while(1)
{
cout << sc_time_stamp() << endl;
wait();
}
}
my_module *DUT;
int sc_main(int argc, char** argv){
sc_clock clk("clk", 10, SC_NS);
DUT = new my_module("my_module");
DUT->clk(clk);
sc_start(50, SC_NS);
return 0;
}
This code works as expected and the output is:
0 s
10 ns
20 ns
30 ns
40 ns
You can try to match the structure of your code to the program above to find other potential bugs in your code.
What does your void foo() look like? Does it call any form of the wait function other than void wait(); or void wait(int);? A clocked thread process may only call these two forms of wait.


OpenMP with Segmentation fault (core dumped)

I encountered a problem when using OpenMP to parallelize my code. I have attached the simplest code that can reproduce my problem.
#include <iostream>
#include <vector>
using namespace std;
int main()
{
int n = 10;
int size = 1;
vector<double> vec(1, double(1.0));
double sum = 0.0;
#pragma omp parallel for private(vec) reduction(+: sum)
for (int i = 0; i != n; ++i)
{
/* in real case, complex operations applied on vec here */
sum += vec[0];
}
cout << "sum: " << sum << endl;
return 0;
}
I compile with g++ using the -fopenmp flag, and I get the error message "Segmentation fault (core dumped)". I am wondering what's wrong with the code.
Note that vec should be set to private since in the real code a complex operation is applied on vec in the for-loop.
The problem indeed comes from the private(vec) clause. There are two issues with this code.
First, from a semantics perspective, the private(vec) should be shared(vec), as the intent seems to be to work on the same std::vector instance in parallel. So, the code should look like this:
#pragma omp parallel for shared(vec), reduction(+: sum)
for (int i = 0; i != n; ++i)
{
sum += vec[0];
}
In the previous code, the private(vec) made a private instance of std::vector for each thread and was supposed to initialize these instances by calling the default constructor of std::vector.
Second, the segmentation fault arises from the fact that there is no vec[0] element in any of the private instances. This can be confirmed by calling vec.size() from the threads.
PS: shared(vec) would have been the default sharing for vec as per the OpenMP specification anyway.
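If each thread really does need its own copy of vec, as the question says, the firstprivate clause may be a closer fit than private: it copy-constructs every thread's instance from the original, so vec[0] exists in each copy. A minimal sketch of that idea follows; the function name sum_with_firstprivate and the assignment standing in for the "complex operations" are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical helper: adds vec[0] to sum once per iteration.
double sum_with_firstprivate(int n)
{
    std::vector<double> vec(1, 1.0);
    double sum = 0.0;

    // firstprivate gives every thread its own copy of vec,
    // copy-constructed from the original, so each copy has one element.
    #pragma omp parallel for firstprivate(vec) reduction(+: sum)
    for (int i = 0; i < n; ++i)
    {
        vec[0] = 1.0;   // stand-in for the per-thread work on vec
        sum += vec[0];
    }
    return sum;
}
```

Calling sum_with_firstprivate(10) adds 1.0 ten times. Without -fopenmp the pragma is ignored and the loop simply runs serially.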

corefine_and_compute_difference CGAL error: precondition violation

Problem description
I read the mesh from the file "blank.off" and load it into a Surface_mesh variable named blank. A second file, "hepoints49.txt", stores a point cloud. I use CGAL::advancing_front_surface_reconstruction() to convert this point cloud into the Surface_mesh sv, and then call corefine_and_compute_difference(blank,sv,res) to perform the Boolean subtraction between blank and sv. But the program throws an exception and terminates. The following is displayed on the terminal:
Using context 4 . 3 GL
load sv...
Using context 4 . 3 GL
start difference...
CGAL error: precondition violation!
Expression : CGAL::is_valid_polygon_mesh(tm)
File : D:\dev\vcpkg\installed\x64-windows\include\CGAL/Polygon_mesh_processing/orientation.h
Line : 190
Could you please help me solve this problem?
code
#include<iostream>
#include<io.h>
#include<fstream>
#include<algorithm>
#include<array>
#include<CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include<CGAL/Advancing_front_surface_reconstruction.h>
#include<CGAL/Surface_mesh.h>
#include<CGAL/disable_warnings.h>
#include<CGAL/draw_surface_mesh.h>
#include<ctime>
#include<string>
#include<CGAL/Polygon_mesh_processing/corefinement.h>
#include<CGAL/Polygon_mesh_processing/remesh.h>
#include<CGAL/boost/graph/selection.h>
#include<CGAL/Polygon_mesh_processing/repair_self_intersections.h>
using std::cin;
using std::cout;
using std::endl;
using std::string;
namespace PMP = CGAL::Polygon_mesh_processing;
typedef std::array<std::size_t, 3> Facet;
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef Kernel::Point_3 Point_3;
typedef CGAL::Surface_mesh<Point_3> Mesh;
struct Construct {
Mesh& mesh;
template <typename PointIterator>
Construct(Mesh& mesh, PointIterator b, PointIterator e):mesh(mesh) {
for (; b != e; ++b) {
boost::graph_traits<Mesh>::vertex_descriptor v;
v = add_vertex(mesh);
mesh.point(v) = *b;
}
}
Construct& operator=(const Facet f) {
typedef boost::graph_traits<Mesh>::vertex_descriptor vertex_descriptor;
typedef boost::graph_traits<Mesh>::vertices_size_type size_type;
mesh.add_face(vertex_descriptor(static_cast<size_type>(f[0])),
vertex_descriptor(static_cast<size_type>(f[1])),
vertex_descriptor(static_cast<size_type>(f[2])));
return *this;
}
Construct& operator*() { return *this; }
Construct& operator++() { return *this; }
Construct& operator++(int) { return *this; }
};
int main() {
//load blank
Mesh blank, sv,res;
std::ifstream fin("blank.off");
fin>>blank;
fin.close();
CGAL::draw(blank);
//load sv
string filename = "hepoints49.txt" ;
std::cout << "load sv..."<< std::endl;
fin.open(filename);
std::vector<Point_3> points;
std::vector<Facet> facets;
std::copy(std::istream_iterator<Point_3>(fin),
std::istream_iterator<Point_3>(),
std::back_inserter(points));//load points
fin.close();
Construct construct(sv, points.begin(), points.end());
CGAL::advancing_front_surface_reconstruction(points.begin(), points.end(), construct);//convert sv to surface_mesh
CGAL::draw(sv);
std::cout << "start difference..." << std::endl;
bool valid_difference = PMP::corefine_and_compute_difference(blank,sv,res);
if (valid_difference) {
std::cout << "difference was successfully computed. " << std::endl;
CGAL::draw(res);
}
else {
std::cout << "difference could not be completed. Skip. " << endl << endl;
}
//CGAL::draw(res);
return 0;
}
Runtime environment
CGAL version: 5.3
IDE: VS2017
Solution Configuration: Debug x64
I also tried running this program in Release mode, where no exception is thrown. But the result I get is the opposite of what I want.
Files
The files that appear in the code are provided below:
https://github.com/wenzaifou/for-stack-overflow-question3.git
A GitHub link is provided because the files are relatively large.
The way the mesh is constructed from the advancing front output does not filter out isolated vertices, which is what causes the exception to be raised. Adding a call to CGAL::Polygon_mesh_processing::remove_isolated_vertices(sv) will solve the problem.
You might then run into the issue that your meshes are not outward oriented (meaning they represent an infinite portion of space). Adding the following calls will solve that:
if (!CGAL::Polygon_mesh_processing::is_outward_oriented(blank))
CGAL::Polygon_mesh_processing::reverse_face_orientations(blank);
if (!CGAL::Polygon_mesh_processing::is_outward_oriented(sv))
CGAL::Polygon_mesh_processing::reverse_face_orientations(sv);

Why do I need separable compilation?

I have the code shown below. As far as I understand, separable compilation must be turned on when:
CUDA device code is split into .h and .cu files
ObjectA's device code is used in ObjectB's device code
However, my main function involves neither of these cases. Could you tell me why I have to enable separable compilation for this sample project?
BitHelper.h
#pragma once
#include <cuda_runtime.h>
#define COMPILE_TARGET __host__ __device__
class BitHelper
{
public:
COMPILE_TARGET BitHelper();
COMPILE_TARGET ~BitHelper();
COMPILE_TARGET static void clear(unsigned int& val0);
};
BitHelper.cu
#include "bithelper.h"
BitHelper::BitHelper()
{}
BitHelper::~BitHelper()
{}
void BitHelper::clear(unsigned int& val0)
{
val0 = 0x0000;
}
Consume_BitHelper.h
#pragma once
class Consume_BitHelper
{
public:
void apply();
private:
bool test_cpu();
bool test_gpu();
};
Consume_BitHelper.cu
#include "consume_bithelper.h"
#include <cuda_runtime.h>
#include <iostream>
#include "bithelper.h"
__global__
void myKernel()
{
unsigned int FLAG_VALUE = 0x2222;
printf("GPU before: %d\n", FLAG_VALUE);
BitHelper::clear(FLAG_VALUE);
printf("GPU after: %d\n", FLAG_VALUE);
}
void Consume_BitHelper::apply()
{
test_cpu();
test_gpu();
cudaDeviceSynchronize();
}
bool Consume_BitHelper::test_cpu()
{
std::cout << "TEST CPU" << std::endl;
unsigned int FLAG_VALUE = 0x1111;
std::cout << "CPU before: " << FLAG_VALUE << std::endl;
BitHelper::clear(FLAG_VALUE);
std::cout << "CPU after : " << FLAG_VALUE << std::endl;
return true;
}
bool Consume_BitHelper::test_gpu()
{
std::cout << "TEST GPU" << std::endl;
myKernel<<<1, 1>>>();
return true;
}
main.cu
#include "consume_bithelper.h"
#include "bithelper.h"
#include <iostream>
int main(int argc, char** argv)
{
Consume_BitHelper cbh;
cbh.apply();
std::cout << "\nPress any key to continue...";
std::cin.get();
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(cuda_class LANGUAGES CXX CUDA)
#BitHelper needs separable compilation because we have separated declaration from definition
add_library(bithelper_lib STATIC bithelper.cu)
set_property(TARGET bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
#Consume_BitHelper needs separable compilation because we call BitHelper's device code
#from Consume_BitHelper's kernel
add_library(consume_bithelper_lib STATIC consume_bithelper.cu)
set_property(TARGET consume_bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(consume_bithelper_lib bithelper_lib)
#We only call CPU code so no need of separable compilation?
add_executable(${PROJECT_NAME} main.cu)
target_link_libraries(${PROJECT_NAME} bithelper_lib consume_bithelper_lib)
The errors I'm getting are these
EDIT
According to Robert Crovella's post Consume_BitHelper.cu uses BitHelper::clear defined in a separate compilation unit.
Does it mean I have to activate only separate compilation for BitHelper?
Since separate compilation has to do only with device code called from device code.
Why am I getting the mentioned errors when separate compilation is NOT on for cuda_class? (which is the executable created from CMake and is not calling any device code)
Separable compilation has to do with how the compiler handles function calls. In exchange for a little bit of overhead, you get the ability to make true function calls and thus access code from other "compilation units" (i.e. .cu source files).
Because GPU programmers are obsessed with performance (particularly the extra registers that get used when separable compilation is enabled), NVIDIA made it an option rather than the default.
You should only need separable compilation for .cu files that access functions/globals defined in other .cu files.
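Following that rule, one way to sidestep the cross-file device call is to define the device-callable function inline in the header, so that each .cu file including it gets its own definition. The sketch below assumes it is acceptable to move BitHelper::clear out of BitHelper.cu; the __CUDACC__ guard is an addition that lets the same header build with a plain C++ compiler as well.

```cpp
// bithelper.h (sketch): device-callable code defined inline in the header.

// Only nvcc understands __host__ __device__; expand to nothing elsewhere
// so the header also compiles as ordinary C++.
#ifdef __CUDACC__
#define COMPILE_TARGET __host__ __device__
#else
#define COMPILE_TARGET
#endif

class BitHelper
{
public:
    // Defined in the class body, so it is implicitly inline: a kernel in
    // another .cu file no longer calls into a separate compilation unit.
    COMPILE_TARGET static void clear(unsigned int& val0)
    {
        val0 = 0x0000;
    }
};
```

With this layout a kernel in Consume_BitHelper.cu calls BitHelper::clear without reaching into another compilation unit. Whether that is preferable to enabling separable compilation depends on how much device code you want to keep in headers.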

SC_THREAD does not get triggered by its sensitivity list

I am developing a simple NAND module in SystemC. By specification, it should have a 4 ns delay so I tried to describe it with a process with a "wait" statement and SC_THREAD, as follows:
//file: nand.h
#include "systemc.h"
SC_MODULE(nand2){
sc_in<bool> A, B;
sc_out<bool> F;
void do_nand2(){
bool a, b, f;
a = A.read();
b = B.read();
f = !(a && b);
wait(4, SC_NS);
F.write(f);
}
SC_CTOR(nand2){
SC_THREAD(do_nand2);
sensitive << A << B;
}
};
To simulate it, I've created another module that outputs the stimulus for the NAND, as follows:
//file: stim.h
#include "systemc.h"
SC_MODULE(stim){
sc_out<bool> A, B;
sc_in<bool> Clk;
void stimGen(){
wait();
A.write(false);
B.write(false);
wait();
A.write(false);
B.write(true);
wait();
A.write(true);
B.write(true);
wait();
A.write(true);
B.write(false);
}
SC_CTOR(stim){
SC_THREAD(stimGen);
sensitive << Clk.pos();
}
};
Having these two modules described, the top module (where sc_main is) looks like this:
//file: top.cpp
#include "systemc.h"
#include "nand.h"
#include "stim.h"
int sc_main(int argc, char* argv[]){
sc_signal<bool> ASig, BSig, FSig;
sc_clock Clk("Clock", 100, SC_NS, 0.5);
stim Stim("Stimulus");
Stim.A(ASig); Stim.B(BSig); Stim.Clk(Clk);
nand2 nand2("nand2");
nand2.A(ASig); nand2.B(BSig); nand2.F(FSig);
sc_trace_file *wf = sc_create_vcd_trace_file("sim");
sc_trace(wf, Stim.Clk, "Clock");
sc_trace(wf, nand2.A, "A");
sc_trace(wf, nand2.B, "B");
sc_trace(wf, nand2.F, "F");
sc_start(400, SC_NS);
sc_close_vcd_trace_file(wf);
return 0;
}
The code compiled and simulated with no errors. However, when I view the .vcd file in gtkwave, the output (F) is stuck at 1 and shows the delay only at the beginning of the simulation.
To test if there were any errors in the code I removed the "wait" statements and changed SC_THREAD to SC_METHOD in the nand.h file and simulated again, now getting the correct results, but without the delays of course.
What am I doing wrong?
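It's best if you use an SC_METHOD for the process do_nand2, which is sensitive to the inputs. A thread usually has an infinite loop inside it and runs for the entire length of the simulation. Your do_nand2 has no loop, so the thread runs once, returns, and is never resumed by its sensitivity list again; that is why F changes only at the start of the simulation. A method, by contrast, runs once from beginning to end every time it is triggered. You use threads mostly for stimulus or concurrent processes, and a thread may or may not be sensitive to any events.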
Just solved the problem:
instead of using
wait(4, SC_NS);
with SC_THREAD I used
next_trigger(4, SC_NS);
with SC_METHOD and it worked just fine.

Setting the vector length in SystemC with a received parameter

I'm making an XOR gate in SystemC by connecting four NAND gates. I want the module to receive a vector of N bits, where N is passed as a parameter. I should be able to perform bitwise & and NOT operations (for the NAND gate).
The best solution may be using sc_bv_base type, but I don't know how to initialize it in the constructor.
How can I create a bit vector using a custom length?
One way to parameterise the module is to make it a C++ template.
In this example, the width of the input vector is set where the module is instantiated:
#ifndef MY_XOR_H_
#define MY_XOR_H_
#include <systemc.h>
template<int depth>
struct my_xor: sc_module {
sc_in<bool > clk;
sc_in<sc_uint<depth> > din;
sc_out<bool > dout;
void p1() {
dout.write(xor_reduce(din.read()));
}
SC_CTOR(my_xor) {
SC_METHOD(p1);
sensitive << clk.pos();
}
};
#endif /* MY_XOR_H_ */
Note that struct my_xor: sc_module is used instead of the SC_MODULE macro (see page 40, 5.2.5 SC_MODULE, of IEEE Std 1666-2011).
You can test this with the following testbench:
//------------------------------------------------------------------
// Simple Testbench for xor file
//------------------------------------------------------------------
#include <systemc.h>
#include "my_xor.h"
int sc_main(int argc, char* argv[]) {
const int WIDTH = 8;
sc_signal<sc_uint<WIDTH> > din;
sc_signal<bool> dout;
sc_clock clk("clk", 10, SC_NS, 0.5); // Create a clock signal
my_xor<WIDTH> DUT("my_xor"); // Instantiate Device Under Test
DUT.din(din); // Connect ports
DUT.dout(dout);
DUT.clk(clk);
sc_trace_file *fp; // Create VCD file
fp = sc_create_vcd_trace_file("wave"); // open(fp), create wave.vcd file
fp->set_time_unit(100, SC_PS); // set trace time unit to 100 ps
sc_trace(fp, clk, "clk"); // Add signals to trace file
sc_trace(fp, din, "din");
sc_trace(fp, dout, "dout");
sc_start(31, SC_NS); // Run simulation
din = 0x00;
sc_start(31, SC_NS); // Run simulation
din = 0x01;
sc_start(31, SC_NS); // Run simulation
din = 0xFF;
sc_start(31, SC_NS); // Run simulation
sc_close_vcd_trace_file(fp); // close(fp)
return 0;
}
Note that I'm using a struct and not a class. A class is also possible.
class my_xor: public sc_module{
public:
The XOR in this code is just xor_reduce. You can find more about it in IEEE Std 1666-2011 on page 197 (7.2.8 Reduction operators). But I assume this is not the solution you were looking for.
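If the goal is the four-NAND structure from the question, the same template idea applies to a bitwise XOR. Below is a rough sketch in plain C++ that uses std::bitset<N> as a stand-in for a fixed-width SystemC vector such as sc_bv<N>, which also provides & and ~. The names nand_gate and xor_from_nands are made up for illustration, and wiring this into ports and processes is left out.

```cpp
#include <bitset>
#include <cstddef>

// Bitwise NAND of two N-bit vectors.
template <std::size_t N>
std::bitset<N> nand_gate(const std::bitset<N>& a, const std::bitset<N>& b)
{
    return ~(a & b);
}

// XOR built from four NAND gates:
//   n1 = NAND(a, b), n2 = NAND(a, n1), n3 = NAND(b, n1), out = NAND(n2, n3)
template <std::size_t N>
std::bitset<N> xor_from_nands(const std::bitset<N>& a, const std::bitset<N>& b)
{
    const std::bitset<N> n1 = nand_gate(a, b);
    const std::bitset<N> n2 = nand_gate(a, n1);
    const std::bitset<N> n3 = nand_gate(b, n1);
    return nand_gate(n2, n3);
}
```

The width is fixed at compile time through the template argument, just like depth in my_xor above. For example, xor_from_nands(std::bitset<8>(0xCC), std::bitset<8>(0xAA)) gives the same bits as the built-in ^ operator on the two inputs.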