One clock cycle delay in communication between one SC_CTHREAD and another SC_CTHREAD - systemc

I am trying to model a simple direct-mapped cache with a main memory module, which is an SC_CTHREAD, and a main memory state machine, which is also an SC_CTHREAD. I am observing a one-clock-cycle delay between writing to a signal from my main memory module and receiving it in the state machine.
How can I do it in only one clock cycle?

You cannot avoid the latency between threads when using an SC_CTHREAD. When writing to an sc_signal from one CTHREAD, the value change will only be visible to another CTHREAD at the next clock edge.
If you must use a CTHREAD (e.g. because you are targeting high-level synthesis), then the only way to avoid the cross-thread latency is to place both pieces of functionality within a single CTHREAD.
If you only need a behavioral model for simulation, then you could use SC_THREADs and sc_events. One thread can generate an sc_event that is being waited on by the second thread. When the second thread wakes on that event, it can observe sc_signal changes done by the first thread, and then produce an output (aligned with the clock edge if desired). Using sc_events gives the opportunity to sample and update signals "between" clock edges.
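The latency described above can be reproduced without SystemC at all. The following is a minimal sketch in plain C++, not SystemC code: `Signal` is a hypothetical stand-in for `sc_signal<int>` that assumes the usual evaluate/update behaviour, where a written value only becomes readable after the update step that follows all clocked processes for that edge. The function names are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for sc_signal<int> (an assumption, not the real class):
// writes land in `next` and become readable only after update().
struct Signal {
    int current = 0;
    int next = 0;
    int read() const { return current; }
    void write(int v) { next = v; }
    void update() { current = next; }
};

// Producer and consumer modeled as two separate clocked processes.
// Returns the value the consumer observed at each clock edge.
std::vector<int> run_two_processes(int edges) {
    Signal data;
    std::vector<int> seen;
    for (int edge = 1; edge <= edges; ++edge) {
        data.write(edge * 10);        // producer process writes the signal
        seen.push_back(data.read());  // consumer process still reads the old value
        data.update();                // update phase, after both processes ran
    }
    return seen;
}

// Same work merged into one process: the value is passed in a local
// variable, so no signal update sits between producing and consuming.
std::vector<int> run_single_process(int edges) {
    std::vector<int> seen;
    for (int edge = 1; edge <= edges; ++edge) {
        int value = edge * 10;  // producer part
        seen.push_back(value);  // consumer part, same edge
    }
    return seen;
}
```

With three edges, the two-process version observes 0, 10, 20 (always one edge behind), while the merged version observes 10, 20, 30.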

How to synchronize between queues on different CPU threads?

It is said semaphores are designed for this but how? It looks like I need to submit the semaphore before waiting for it to signal. Then what's the point of multithreading?
I'm using Skia (which has its own VkQueue) to draw the UI. I don't have access to its command buffer; I can only provide semaphores for it. It first waits for the scene-complete semaphore, then draws the UI and signals the present-ready semaphore.
It works fine when everything happens in a single thread. But after I moved the UI part to a second thread, it stopped working and I got validation errors like: VkQueue is waiting on semaphore that has no way to be signaled. Of course, since it's on a different thread, the semaphore might not have been submitted to a queue yet.
The spec for vkQueuePresentKHR says:
"All elements of the pWaitSemaphores member of pPresentInfo must be semaphores that are signaled, or have semaphore signal operations previously submitted for execution"
You can't submit work that waits on a semaphore that you plan to submit later. If you have this kind of dependency in your code you need to externally synchronize the submissions so the command buffers that will signal will be sent BEFORE you submit the dependent command buffers, regardless of the queue.
If you're using multiple threads it sounds like you need to rely on some CPU side synchronization primitives, like a CPU semaphore to properly order the work between them. Pure Vulkan sync primitives won't help you there.
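One way to picture the CPU-side ordering suggested above is a small gate built from a mutex and a condition variable. This is only a sketch: `SubmitGate` and `run_frame` are hypothetical names, and the two log entries are placeholders for the real vkQueueSubmit call and the Skia flush, neither of which is shown here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical CPU-side gate: the UI thread blocks until the render thread
// reports that the submission signaling the Vulkan semaphore has been queued.
class SubmitGate {
public:
    void mark_submitted() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            submitted_ = true;
        }
        cv_.notify_all();
    }
    void wait_until_submitted() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return submitted_; });
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool submitted_ = false;
};

// Runs one "frame" on two threads and returns the order of the submissions.
std::vector<std::string> run_frame() {
    SubmitGate scene_submitted;
    std::vector<std::string> log;
    std::mutex log_mutex;
    auto record = [&](const std::string& entry) {
        std::lock_guard<std::mutex> lock(log_mutex);
        log.push_back(entry);
    };

    std::thread ui([&] {
        scene_submitted.wait_until_submitted();
        // Placeholder for the UI submission that waits on the scene semaphore.
        record("ui: submit work that waits on the scene semaphore");
    });
    std::thread render([&] {
        // Placeholder for the submission that signals the scene semaphore.
        record("render: submit work that signals the scene semaphore");
        scene_submitted.mark_submitted();
    });

    render.join();
    ui.join();
    return log;
}
```

The gate here is one-shot. A per-frame version would need a frame counter or a fresh gate for each frame, which is left out to keep the sketch short.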

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750kHz), but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll on the level of the GPIO pin using the single cycle skip if bit set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock signal transitions, then you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternately, if you want to use the internal pin state machine logic and you can live without interrupts, then you can use the INTFLAGS flags to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG then you can spin poll on the appropriate INTFLAG bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to access the appropriate PORT through its VPORT, since the VPORT registers are in I/O memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear the bit in a single cycle (these instructions only work on I/O memory, and the normal PORT registers are not in I/O memory).
Finally one more complication, if you need to leave the interrupts on when doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin change to be on level 0, and then make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking that there is already a level 0 running so these interrupts do not actually fire. There are also other restrictions to this strategy so avoid it if at all possible.
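The poll-then-acknowledge pattern from the INTFLAGS approach above relies on write-1-to-clear semantics, which are easy to get backwards. Below is a host-side toy model in C++, not code for the ATtiny: `FlagRegisterModel` and `wait_and_clear` are hypothetical names, and on the real part the flag byte would be the port's INTFLAGS register, tested and cleared with SBIS/SBI as described above.

```cpp
#include <cassert>
#include <cstdint>

// Host-side toy model of an interrupt flag register with
// write-1-to-clear semantics. This is not the real peripheral.
struct FlagRegisterModel {
    std::uint8_t flags = 0;

    // What the pin-sense hardware would do when the configured edge occurs.
    void hardware_sets(std::uint8_t mask) { flags |= mask; }

    std::uint8_t read() const { return flags; }

    // Writing a 1 to a bit clears it; writing 0 leaves the bit unchanged.
    void write(std::uint8_t mask) {
        flags = static_cast<std::uint8_t>(flags & ~mask);
    }
};

// Polls until the masked flag is set, then acknowledges it.
// Returns the number of polls taken, or -1 if max_polls is exhausted
// (the limit exists only so the host model cannot spin forever).
int wait_and_clear(FlagRegisterModel& reg, std::uint8_t mask, int max_polls) {
    for (int polls = 1; polls <= max_polls; ++polls) {
        if (reg.read() & mask) {
            reg.write(mask);  // acknowledge by writing 1 to the flag bit
            return polls;
        }
    }
    return -1;
}
```

Because only the acknowledged bit is written as 1, flags for other pins on the same port stay pending.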
Programmable logic way
If you want to get really esoteric, you could probably route the input value of a pin to a configurable custom logic LUT in the chip and then route the output of that module to a bit that you test using a 1-cycle bit test (maybe an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and then use the LUT to create a strobe on the edge you are looking for. This is very complex. Also, because the strobe has no acknowledgement, an edge that arrives while you are not watching for it (in a spin check) is lost, and you will have to wait for the next edge (probably fatal in your application).

Timed simulation with SystemC

With reference to question SystemC module not working with SC_THREAD, a timed simulation is imitated using next_trigger(). As I understood from this article, this restarts the thread after the specified time:
next_trigger(double, sc_time_unit): The process shall be triggered when specified time has elapsed.
I.e. it effectively executes the operations after the occurrence of this instruction after the time specified, but also executes the operations found before that instruction. I have the feeling that the repeated utilization of next_trigger within an SC_THREAD may result in 'glitches' in the simulation.
Q1: Is my feeling correct?
Q2: Is there another way to delay execution (something that suspends the thread for the given time, rather than restarting it)?
First of all, next_trigger() can only be used with SC_METHODs, as mentioned here:
next_trigger() is used with process methods, ones which are not threads.
Here are a few pointers on SystemC processes:
SC_METHODs are processes which must complete their execution in one pass (e.g. a simple function call).
Note: Do not use while(1) loops in SC_METHODs.
SC_THREADs are processes with a separate thread of execution; one must explicitly use wait() statements here to synchronize with the SystemC simulation kernel. This is where you will mostly find while(1) (infinite) loops in use.
To suspend the thread for some simulation time, you can use the wait() statement with a time argument, e.g. wait(10, SC_NS), to introduce the desired delay.
For a better understanding, you also need to understand the difference between static and dynamic sensitivity in SystemC; refer here for more information.

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process that is executing is in the running state, and it is in the waiting state if it is waiting for the completion of an I/O request. I don't understand what is meant by an I/O request. Can you please give an example to elaborate?
Another doubt: let's say that a process is executing and suddenly an interrupt occurs. The process then stops its execution and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is still being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components. CPU, Hard Drive, network card, sound card, gpu, etc. All those work in parallel and independent of each other. They are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point for the OS to be stuck waiting for the result since the time it takes for that operation to complete is probably an eternity (in the CPU view point of things).
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go, then resume whichever process it decides is appropriate once the interrupt handling is done. So in this case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think about an interrupt lifetime as just another task for the CPU (From when the interrupt starts to the point the OS considers that handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as notification for the OS to start another task (that interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.
An I/O request generally just means a request to do input, output, or both. The exact meaning varies depending on your context, like HTTP, networks, console operations, or maybe some process in the CPU.
A process is waiting for I/O: say, for example, you were writing a program in C to accept a user's name on the command line and then print 'Hello User' back. Your code will go into the waiting state until the user enters their name and hits Enter. This is a higher-level example, but even at a very low level, a process executing in your computer's processor works on the same basic principle.
Can the processor work on other processes when the current one is interrupted and waiting on something? Yes! You'd better hope it does. That's what scheduling algorithms and stacks are for. However, the real answer depends on what architecture you are on, whether it supports parallel or serial processing, etc.
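The 'Hello User' example above might look roughly like this. It is a sketch in C++ rather than C, and `greet` is a hypothetical helper that takes its streams as parameters so it can be exercised without a terminal. When it is called with std::cin attached to a console, the std::getline call is the point where the process blocks waiting for I/O, until the user presses Enter.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Reads one line from `in` and writes a greeting to `out`.
// Returns false if no line could be read (e.g. end of input).
bool greet(std::istream& in, std::ostream& out) {
    out << "Enter your name: ";
    std::string name;
    // With a terminal as input, the process waits here for the I/O request
    // to complete; the OS is free to run other processes in the meantime.
    if (!std::getline(in, name)) {
        return false;
    }
    out << "Hello " << name << '\n';
    return true;
}
```

Calling `greet(std::cin, std::cout)` from `main` gives the interactive version described in the answer.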

Driving input signals combinatorially (in the same cycle) in UVM

If I want to combinatorially drive a design input signal based on certain output from the design in UVM driver, what is the best way? If I implement it in run phase and look at the design output signal, I will see it on next positive edge of clock, right? This will waste a cycle.
E.g. rd input signal is asserted randomly to design; except when empty is high, it should de-assert in the same cycle.
Implementing anything in the run phase does not automatically mean that you will synchronize on the posedge of the clock. You can always fork out a method from the run phase that waits for a change in a specific signal and then does something at that point:
task run_phase(uvm_phase phase);
  // Run the combinational monitor in the background for the whole phase.
  fork
    monitor_comb_sig();
  join_none
endtask

task monitor_comb_sig();
  forever begin
    @(some_signal); // waits until some_signal changes (event control uses @, not #)
    // drive some other signal based on this change
  end
endtask