Driving input signals combinationally (in the same cycle) in UVM - verification

If I want to combinationally drive a design input signal based on a certain output of the design in a UVM driver, what is the best way? If I implement it in the run phase and look at the design output signal, I will see it on the next positive edge of the clock, right? That wastes a cycle.
E.g., the rd input signal is asserted randomly to the design, except when empty is high, in which case rd should de-assert in the same cycle.

Implementing anything in the run phase does not automatically mean that you will synchronize on the posedge of the clock. You can always fork out a method from the run phase that waits for a change in a specific signal and then does something at that point:
task run_phase(uvm_phase phase);
  fork
    monitor_comb_sig();
  join_none
endtask

task monitor_comb_sig();
  forever begin
    @(some_signal); // waits until some_signal changes value
    // drive some other signal based on this change
  end
endtask

Related

How can I have Variable Frequency PWM with STM32?

I am working on an LLC converter project, so I need PWM signals with variable frequency; I mean I need to change the frequency in real time, for example sweeping between 40 kHz and 80 kHz. Can anyone give me an idea? Which timer mode should I use? Thanks.
It's a little tricky to answer your question when you don't state the exact hardware you're working with. Seeing your tags, I will assume it's a member of the STM32 family.
The standard STM32 timers have registers you usually don't need to interface with directly; HAL does that for you. However, as far as I am aware, HAL does not support this functionality. The standard STM32 timer has a TIMx_ARR and a TIMx_CCRn register, which hold some of the configuration necessary for PWM generation. You should be able to change the frequency by adjusting the ARR register and the duty cycle by adjusting the CCRn register.
Be careful with that approach, though, as it usually has no built-in protection. You will not damage your device, but it is very easy to produce unintended behavior.
You also need to consider the prescaler value and the general configuration of your timer.
For detailed information, refer to the general-purpose timer (GPTIM) chapter in the reference manual of your device, as I cannot give you a more detailed description with the little information you have provided.
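As a minimal sketch of that register-level adjustment (assumptions: CMSIS device headers, TIM3 channel 1 already configured for PWM by HAL/CubeMX, a prescaler of 0, and a 72 MHz timer clock; adapt all of these to your part):

#include "stm32f1xx.h"               /* assumption: use the header for your device family */

#define TIMER_CLK_HZ 72000000UL      /* assumption: timer kernel clock with PSC = 0 */

void pwm_set_frequency(uint32_t freq_hz)
{
    uint32_t arr = (TIMER_CLK_HZ / freq_hz) - 1U;  /* period in timer ticks */
    TIM3->ARR  = arr;                 /* new period; applied at the update event if
                                         ARR preload (ARPE) is enabled */
    TIM3->CCR1 = (arr + 1U) / 2U;     /* track the period to keep ~50% duty cycle */
}

/* e.g. modulating between 40 kHz and 80 kHz at run time:
   pwm_set_frequency(40000); ... pwm_set_frequency(80000); */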
As far as I understood from your question and follow-up comments, you want a constant duty cycle (~50%) but a variable frequency, as well as a phase shift. That is totally doable, and you can change the values on the fly, but for the phase shift I would suggest using two timers: one master, one slave.
Idea:
The master controls the phase shift. The period of the master is equal to the period of the final waveform. It counts from 0 to its ARR, and somewhere in between sits the phase-shift value in its compare register, which flips the master's output from LOW to HIGH on the way from 0 to ARR.
The slave is activated by the master's output change from LOW to HIGH and runs for one period, which is equal to the period of the master (ARR). It outputs PWM on some pin. Once it reaches ARR, it stops (only for the master to start it again). Obviously, you need to adjust the slave's compare register to keep the PWM duty cycle constant.
I made some crude illustration of what I mean, because discussing timers with text only can be a little (very) tricky. Paint skills 10/10:
How to adjust stuff:
Adjust the frequency (period length) by changing the ARR of both timers (it is always the same for both). If you want to keep the duty cycle, you need to immediately adjust the slave timer's compare value to ARR/2 (for ~50% duty cycle). Make sure the compare value of the phase-shifting master stays below the ARR if you reduce the ARR, otherwise the slave will never get triggered.
Adjust the phase shift by changing the compare value of the master timer between 0 and ARR.
Additional notes:
The master timer is configured to generate TRGO (trigger output, a master feature) when its compare output switches from LOW to HIGH.
The slave timer is in one-pulse mode (OPM), meaning it disables itself after a single period. It will be reactivated by the master's next phase-shift pulse (compare HIGH).
The master's signal is supposed to both reset and activate the slave timer (resetting CNT); there is a list of modes for what the TRGI trigger input does to the slave timer. Resetting the timer loads new values into ARR (see the next point).
Both master and slave have ARR buffering enabled. This allows you to change the ARR values, but the changes take effect only when the current cycle ends, which prevents jitter while changing the period length and/or phase shift.
The slave timer is in PWM1 or PWM2 mode, depending on whether you want the first part of the output waveform to be LOW or HIGH; that is the only difference.
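A rough register-level outline of the notes above (a sketch, not a drop-in implementation: the CMSIS symbols are real, but the choice of TIM2 as master and TIM3 as slave and the ITR1 trigger routing are assumptions, so check the internal trigger connection table in your reference manual; the PWM output-compare setup itself is omitted):

#include "stm32f1xx.h"                /* assumption: header for your device family */

void master_slave_init(void)
{
    /* Master (TIM2): OC1REF drives TRGO at the phase-shift compare match */
    TIM2->CR2 = (TIM2->CR2 & ~TIM_CR2_MMS) | TIM_CR2_MMS_2;  /* MMS=100: OC1REF -> TRGO */
    TIM2->CR1 |= TIM_CR1_ARPE;        /* buffered ARR: period changes land at the update event */

    /* Slave (TIM3): one period per trigger from the master */
    TIM3->SMCR = (TIM3->SMCR & ~(TIM_SMCR_TS | TIM_SMCR_SMS))
               | TIM_SMCR_TS_0                     /* TS=001: ITR1 (assumed to be TIM2) */
               | TIM_SMCR_SMS_2 | TIM_SMCR_SMS_1;  /* SMS=110: trigger mode (start on TRGI);
                                                      newer timers also offer a combined
                                                      reset + trigger mode */
    TIM3->CR1 |= TIM_CR1_OPM | TIM_CR1_ARPE;       /* one-pulse mode, buffered ARR */

    TIM2->CR1 |= TIM_CR1_CEN;         /* start the master; the slave waits for its trigger */
}

void set_period(uint32_t arr)         /* same period for both timers */
{
    TIM2->ARR  = arr;
    TIM3->ARR  = arr;
    TIM3->CCR1 = (arr + 1U) / 2U;     /* keep ~50% duty on the slave's PWM output */
}

void set_phase(uint32_t shift)        /* 0..ARR, in timer ticks */
{
    TIM2->CCR1 = shift;
}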
Helpful example from me:
I have written an implementation of master/slave timers activating each other in different ways, purely on registers, with every line of code commented. I was a little new to it all (which shows in the structure of the project); it was literally my first experiment with timers after studying them in the reference manual for days, but I tried my best. There is a description of what I do in main.c. You may find it helpful. Note that timers with the same numbers are similar or even identical across various STM32 devices, so my code is likely portable, down to copy-paste, into yours (which I'm totally OK with if you or anyone does that). Here is a link to main.c on my GitHub. I also have oscilloscope screenshots there.

TinyAVR 0-Series: Can I use pin-change sensing without entering interrupt handler?

I am evaluating the ATtiny806 running at 20 MHz to build a cycle-accurate Intel 4004 microprocessor emulator. (I know it will be a bit too slow, but AVRs have a huge community.)
I need to synchronize to the external, two-phase non-overlapping clocks. These are not fast clocks (the original 4004 ran at 750 kHz), but if I spin-wait for every clock edge, I risk wasting most of my time budget.
The TinyAVR 0-series has a very nice pin-change interrupt facility that can be configured to trigger only on rising edges.
But, an interrupt routine round-trip is 8 cycles (3 in, 5 out).
My question is:
Can I leverage the pin-change sensing mechanism while never visiting an ISR?
(Other processor families let you poll for interruptible conditions without enabling interrupts from that peripheral). Can polling be done with a tight skip-on-bit/jump-back loop, followed by a set-bit instruction?
Straightforward way
You can always just poll the level of the GPIO pin using the single-cycle skip-if-bit-set/clear instruction on the appropriate PORT register and bit.
But as you mention, polling does burn cycles so I'm not sure exactly what you want here - either a poll (that burns cycles but has low latency) or an interrupt (that has higher latency but allows processing to continue until the condition is true).
Note that if things get really tight and you are looking for, say, power savings by sleeping between clock transitions, you can do tricks like having an ISR that never returns (saving the RETI cycles), but that requires some careful coding, probably with something like a state machine.
INTFLAG way
Alternatively, if you want to use the internal pin state-machine logic and you can live without interrupts, then you can use the INTFLAGS register to check for the pin change configured in the ISC bits of the PINxCTRL register. As long as global interrupts are not enabled in SREG, you can spin-poll on the appropriate INTFLAGS bit to check/wait for the desired condition, and then write a 1 to that bit to clear the flag.
Note that if you want to make this fast, you will probably want to map the appropriate PORT to a VPORT, since the VPORT registers are in I/O memory. This lets you use SBIS to test the INTFLAGS bit in a single cycle and SBI to clear it in a single cycle (these instructions only work on I/O memory, and the normal PORT registers are not in I/O memory).
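A spin-poll along those lines might look like this (a sketch assuming avr-gcc/avr-libc, a tinyAVR 0-series target, and the clock input on PA2; the pin choice is an assumption, and PORTA is reachable through VPORTA by default on this family):

#include <avr/io.h>

/* Wait for the next rising edge on PA2 using only the pin-change flag,
   never entering an ISR (global interrupts stay disabled throughout). */
static void wait_rising_edge_pa2(void)
{
    while (!(VPORTA.INTFLAGS & PIN2_bm))   /* can compile to an SBIS/RJMP spin loop */
        ;
    VPORTA.INTFLAGS = PIN2_bm;             /* write a 1 to clear the flag (single-cycle SBI) */
}

int main(void)
{
    PORTA.PIN2CTRL = PORT_ISC_RISING_gc;   /* flag on rising edges; no sei(), so no ISR runs */
    for (;;) {
        wait_rising_edge_pa2();
        /* ... per-edge emulator work goes here ... */
    }
}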
Finally, one more complication: if you need to leave interrupts enabled while doing this, it is probably possible by hacking the interrupt priority registers. You'd set the pin-change interrupt to level 0, make sure the interrupts you care about are level 1 or higher, and then trick the interrupt controller into thinking a level-0 interrupt is already being serviced so the pin-change interrupt never actually fires. There are other restrictions to this strategy, so avoid it if at all possible.
Programmable logic way
If you want to get really esoteric, you could likely route the input value of a pin to one of the chip's configurable custom logic (CCL) LUTs and then route the output of that module to a bit you can test with a 1-cycle bit-test instruction (maybe an unused I/O pin). To do this, you'd feed the output of the LUT back into one of its inputs and use the LUT to create a strobe on the edge you are looking for. This is very complex, and since the strobe carries no acknowledgement, if the signal changes while you are not looking (between spin checks), the event will be lost and you will have to wait for the next edge (probably fatal in your application).

One clock cycle delay in communication between one SC_CTHREAD and another SC_CTHREAD

I am trying to model a simple direct-mapped cache with a main memory module, which is an SC_CTHREAD, and a main memory state machine, which is also an SC_CTHREAD. I am observing a one-clock-cycle delay between writing to a signal from my main memory module and receiving it in the state machine.
How can I do it in only one clock cycle?
You cannot avoid the latency between threads when using an SC_CTHREAD. When writing to an sc_signal from one CTHREAD, the value change will only be visible to another CTHREAD at the next clock edge.
If you must use a CTHREAD (i.e. using high-level synthesis), then the only way to avoid the cross-thread latency is to place both functionalities within a single CTHREAD.
If you only need a behavioral model for simulation, then you could use SC_THREADs and sc_events. One thread can generate an sc_event that is being waited on by the second thread. When the second thread wakes on that event, it can observe sc_signal changes done by the first thread, and then produce an output (aligned with the clock edge if desired). Using sc_events gives the opportunity to sample and update signals "between" clock edges.

How to keep interrupts short?

The most heard advice in embedded programming is "keep your interrupts short".
Now my situation is that I have a very long-running task in my main() loop (writing large blocks of data to an SD card), which can sometimes take 100 ms. To keep my system responsive, I moved all the other stuff into interrupt handlers.
For example, normally one would handle the incoming UART data in an interrupt, then process the incoming command in the main() loop, and then send back the response. But in my case, the whole processing/handling of the commands also takes place in the interrupts, because my main() loop can be blocked for (relatively) long periods.
The optimal solution would be to switch to an RTOS, but I don't have the RAM for it. Are there alternatives for my design where the interrupts can be short?
The traditional approach for this is for interrupts to schedule a deferred procedure call (DPC) and end the interrupt as soon as possible.
Once the interrupt has finished, the list of deferred procedures is walked from most important to least important.
Consider the case where you have your main (lowest-priority) action and two interrupts I1 and I2, where I2 is more important than main but less important than I1.
In this case, let's suppose you're running main and I1 fires. I1 schedules a deferred procedure, signals to the hardware that I1 is done, and I1's DPC begins running. Suddenly I2 comes in from the hardware. I2's interrupt takes over from I1's DPC, schedules I2's DPC, and signals to the hardware that it's done.
The scheduler then returns to I1's DPC (because it is more important), and when I1's DPC completes, I2's DPC begins (because it is more important than main), and then eventually returns execution to main.
This design allows you to rank the importance of different interrupts, encourages you to keep your interrupts small, and lets you complete DPCs in prioritized order.
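As a cooperative simplification of this scheme (the full scheme resumes DPCs immediately after the interrupt returns, which needs a small task switcher), a sketch might look like this; the vector names and the acknowledge step are hypothetical:

#include <stdbool.h>

static volatile bool i1_dpc_pending;   /* set by I1's ISR, consumed by the dispatcher */
static volatile bool i2_dpc_pending;   /* set by I2's ISR */

void i1_isr(void)                      /* hypothetical vector name */
{
    /* acknowledge the hardware and capture any urgent data here */
    i1_dpc_pending = true;             /* schedule the deferred procedure, return fast */
}

void i2_isr(void)                      /* hypothetical vector name */
{
    i2_dpc_pending = true;
}

void dispatcher(void)
{
    for (;;) {
        /* walk the deferred procedures from most to least important */
        if (i1_dpc_pending) { i1_dpc_pending = false; /* run I1's DPC */ continue; }
        if (i2_dpc_pending) { i2_dpc_pending = false; /* run I2's DPC */ continue; }
        /* ... main (lowest-priority) work: SD card writes, etc. ... */
    }
}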
There are 100 different ways to skin this cat, depending on CPU architecture (interrupt nesting & prioritization, software interrupt support, etc.) but let's take a pretty straightforward approach that is relatively simple to understand and free from the race conditions and resource-sharing hazards of a preemptive kernel.
(Disclaimer: my first choice is typically a preemptive real-time kernel; many of them can run in extremely resource-constrained systems. SecurityMatt's suggestion is good, but if you're not comfortable implementing your own preemptible kernel/task switcher, particularly one that handles asynchronous (interrupt-triggered) preemption, you can get wrapped around the axle pretty quickly. So what I'm proposing below is not as responsive as a preemption-based kernel, but it's much simpler and often adequate.)
Create 3 event/work queues:
Q1 is the lowest priority and handles your slow, background SD card writes
Q2 holds requests to process incoming UART packets
Q3 (highest priority) holds UART RX FIFO read requests.
I split up the UART RX FIFO reading and the processing of the read packet so that the FIFO reading is always serviced ahead of the packet processing; maybe you want to keep them together, your choice.
For this to work, you break your large (~100 ms) SD card write process into a bunch of smaller, discrete, run-to-completion steps.
So for example, to write 5 blocks of 20 ms each, you write the first block, then enqueue "write next block" to Q1. You go back to your scheduler at the end of each step and scan the queues in priority order, starting with Q3. If Q2 and Q3 are empty, you pull the next event off Q1 ("write next block") and run that command for another 20 ms before returning and scanning the queues again. If 20 ms is not responsive enough, you break each 20 ms block write into a more fine-grained set of steps, continually posting the next work step to Q1.
Now for the incoming UART stuff: in the UART RX ISR, you simply enqueue a "read UART FIFO" command in Q3 and return from the interrupt back into the 20 ms "write block" step that was interrupted. As soon as the CPU finishes the write, it goes back and scans the queues in priority order (the worst-case response will be 20 ms, if the block write had just begun at the time of the interrupt). The queue scanner (scheduler) will see that Q3 now has work to do and will run that command before going back and scanning again.
The responsiveness of your system, worst case, will be determined by the longest run-to-completion step, regardless of priority. You keep your system very responsive by doing work in small, discrete, run-to-completion steps.
Note that I have to speak in generalities here. Maybe you want to read the UART RX FIFO in the ISR, put the data into a buffer, and only defer the packet processing, not the actual reading of the FIFO (then you'd only have 2 queues). You have to work this out for yourself. But I hope the approach makes sense.
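A minimal sketch of that scheduler loop, under a few assumptions (the ring-buffer queue below is single-producer/single-consumer, the handler functions are whatever your ISRs and work steps post, and enqueueing from an ISR must be interrupt-safe):

#include <stdbool.h>
#include <stddef.h>

typedef void (*work_fn)(void);

#define Q_LEN 8
typedef struct {
    work_fn items[Q_LEN];
    volatile size_t head, tail;    /* single-producer/single-consumer ring buffer */
} queue_t;

static void q_push(queue_t *q, work_fn fn)   /* mask interrupts if called from an ISR */
{
    q->items[q->tail] = fn;
    q->tail = (q->tail + 1) % Q_LEN;         /* no overflow check in this sketch */
}

static bool q_pop(queue_t *q, work_fn *out)
{
    if (q->head == q->tail)
        return false;                        /* queue empty */
    *out = q->items[q->head];
    q->head = (q->head + 1) % Q_LEN;
    return true;
}

static queue_t q1, q2, q3;   /* Q1: SD block writes, Q2: packet processing, Q3: FIFO reads */

void scheduler(void)
{
    work_fn w;
    for (;;) {
        if (q_pop(&q3, &w)) { w(); continue; }  /* always drain highest priority first */
        if (q_pop(&q2, &w)) { w(); continue; }
        if (q_pop(&q1, &w)) { w(); continue; }  /* e.g. the next 20 ms "write block" step */
        /* all queues empty: idle or sleep until an ISR posts new work */
    }
}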
This event-driven approach with prioritized queues is exactly the approach used by the Quantum Platform (QP) event-driven framework. QP supports either an underlying non-preemptive (cooperative) scheduler, such as the one described here, or a preemptive scheduler which runs the scheduler each time an event is queued (similar to the approach suggested by SecurityMatt). You can see the code/implementation of QP's cooperative scheduler at the QP website.
An alternative solution would be as follows:
Anywhere the FAT library can monopolize the processor for a long time, you insert a call to a new function which is normally very fast, returning to the caller after a few machine cycles. Such a fast function does not impact the real-time performance of your time-consuming operation, such as reading/writing SD flash. You would insert such a call in any loop that waits for a flash sector to be erased, and between every 512 bytes written or read.
The goal of that function is to perform most of the tasks that would normally sit inside the while(1) loop of a typical main() on an embedded device. It first increments an integer and performs a fast modulo on the new value, then returns if the modulo is not equal to an arbitrary constant. The code is as follows:
void preemption_check(void)
{
    static int fast_modulo = 0;
    // divide down the number of calls
    fast_modulo++;
    if ((fast_modulo & 0x003F) != 3)
    {
        return;
    }
    // the processor continues past this point only once
    // every 64 calls to preemption_check()
    // ... the deferred main-loop work goes here (see below) ...
}
Next, inside that function, you call the routines that extract RS-232 characters/strings from the serial port interrupts, process any command once a complete string has been received, and so on.
The binary mask 0x3F used above means that we look only at the 6 least significant bits of the counter. When these 6 bits happen to equal the arbitrary value 3, we go ahead with the calls to functions that may take some microseconds or even milliseconds to execute. You may want to try a smaller or larger binary mask depending on how quickly you want to service the serial port and other operations. You may even use more than one mask simultaneously to service some operations more often than others.
The FAT library and the SD card should not experience any problem when a sporadic delay happens between two flash erase operations, for example.
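For illustration, the call sites might look like this; sd_write_block and the 512-byte granularity are hypothetical stand-ins for your FAT library's internals:

#include <stddef.h>
#include <stdint.h>

void sd_write_block(const uint8_t *block);   /* hypothetical 512-byte block write */
void preemption_check(void);                 /* the function shown above */

void write_file_chunk(const uint8_t *buf, size_t n_bytes)
{
    for (size_t i = 0; i + 512 <= n_bytes; i += 512) {
        sd_write_block(&buf[i]);     /* slow flash work */
        preemption_check();          /* a few cycles on most calls; every 64th call
                                        services the serial port and pending commands */
    }
}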
The solution given here works even on a microcontroller with only 2 KB of RAM, like many variants of the 8051. As incredible as it may seem, the pinball machines of the 1980s and 1990s had a few KB of RAM and slow processors (around 10 MHz), yet they were able to scan a hundred switches (fully debounced), update an X/Y matrix display, produce sound effects, and so on. The solutions developed by those engineers can still be used to boost the performance of large systems. Even on the best servers, with 64 GB of RAM and many terabytes of disk, I presume every byte counts when a company wants to index billions of web pages.
As no-one has suggested coming at it from this end yet I'll throw it in the hat:
It's possible that sticking the SD card service routine in a low-priority interrupt, maybe throwing in some DMA if you can, would free up your main loop and other interrupts to be more responsive, rather than being stuck in a main() loop waiting a long time for something to finish.
The caveat is that I don't know whether the hardware has any way of triggering the interrupt when the SD card is ready for more; you might have to cheat by running a polling timer to check and force the interrupt. I'm not above that sort of thing, though; if you have spare hardware timers and interrupts, it can be done with very little overhead.
Resorting to an RTOS for something like this would seem overkill & an admission of failure to me... ;)

VHDL - When does a process() run for the first time?

Consider: process(a)
According to the text I have:
"A process is first entered at the time of simulation, at which time it is executed until it suspends itself due to a wait statement or a sensitivity list."
Am I right in inferring that a process WILL have to run once even without any events on its sensitivity list? Also, what if there are multiple processes inside an architecture; are they all executed once?
AFAIK, the sensitivity list (e.g., process (x, y)) is just shorthand for a wait on x, y; placed just before the end process of the process (p. 152, "A Designer's Guide to VHDL", 3rd edition). So all processes will run at least once.
There are 3 stages involved in running a VHDL simulation: elaboration, initialisation and simulation.
At the beginning of the initialisation phase, the current time is set to 0. The simulation kernel then places all of the simulation processes in the active processes queue. Each simulation process is then taken from this queue and executed until it suspends. The order of execution of simulation processes during initialisation is not important. The initial execution of each simulation process ensures that all initial transactions are scheduled so that the simulation can continue.
A simulation process is suspended either implicitly or explicitly. A process with a sensitivity list is suspended implicitly after its sequential statements have been executed to the end of the process. A process with one or more wait statements is suspended explicitly when its first wait statement is executed.
When the active processes queue is empty, the initialisation phase is complete.
So to answer your question, all processes will run once during the initialisation phase.