I'm studying the 8085 microprocessor and found that it runs at 3 MHz and takes, for instance, 4 clock cycles for an opcode fetch and 3 clock cycles for an I/O or memory read or write.
So now, if I somehow overclock the 8085 to run at 10 MHz, will the number of clock cycles be reduced? Or will the number of cycles stay the same, with just the effective time for those 3 or 4 cycles getting shorter?
Overclocking reduces the length of a cycle. Instructions still take the same number of cycles.
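To put numbers on it: at 3 MHz a clock cycle lasts about 333 ns, so a 4-cycle opcode fetch takes roughly 1.33 µs. At 10 MHz the fetch is still 4 cycles, but each cycle is only 100 ns, so the same fetch completes in about 0.4 µs.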
The clock cycle is a heartbeat that is used to synchronise actions across a circuit. Different parts act simultaneously, passing data at appropriate moments based on the clock. For example, a four-cycle instruction on a fictional architecture might be performed as:
load accumulator with value of register while issuing memory read cycle;
add low four bits of fetched value to accumulator;
add high four bits of fetched value to accumulator;
move value from accumulator back to register.
You can't cut a cycle from that without changing the architecture somewhere — simply providing a faster clock can't make the ALU suddenly work in 8-bit quantities and, even if it did, the register to which its final value goes wouldn't expect to receive a value earlier than the fourth cycle.
Simply changing the clock rate doesn't change the layout or underlying logic of the circuit, so it can't change the number of cycles it takes different parts to do things, or the relative times at which they expect other parts to have things done.
I'm writing a Game Boy emulator and I've come to implementing the graphics. However I can't quite figure out how it works with the CPU as far as timing/clock cycles go. Does the CPU execute a certain number of cycles (if so, how many) and then hand off to the GPU? Or is the Game Boy always in an hblank/vblank state, with the GPU using the CPU in between them? I can't find any information that helps me with this, only how to use the control registers.
This has been answered at https://forums.nesdev.com/viewtopic.php?f=20&t=17754&p=225009#p225009
It turns out I had it completely wrong; the real behaviour is completely different.
Here is the post:
The Game Boy CPU and PPU run in parallel. The 4.2 MHz master clock is also the dot clock. It's divided by 2 to form the PPU's 2.1 MHz memory access clock, and divided by 4 to form a multi-phase 1.05 MHz clock used by the CPU.

Each scanline is 456 dots (114 CPU cycles) long and consists of mode 2 (OAM search), mode 3 (active picture), and mode 0 (horizontal blanking). Mode 2 is 80 dots long (2 for each OAM entry), mode 3 is about 168 plus about 10 more for each sprite on a given line, and mode 0 is the rest. After 144 scanlines are drawn, there are 10 lines of mode 1 (vertical blanking), for a total of 154 lines or 70224 dots per screen. The CPU can't see VRAM (writes are ignored and reads are $FF) during mode 3, but it can during other modes. The CPU can't see OAM during modes 2 and 3, but it can during blanking modes (0 and 1).
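If it helps to see those numbers in code, here is a rough sketch (the names are made up, not from any particular emulator, and the sprite-dependent part of mode 3 is ignored) of how a PPU mode could be derived from a per-frame dot counter:

    /* Sketch: derive the PPU mode from a per-frame dot counter (0..70223).
       Mode 3 is treated as a fixed 168 dots here, ignoring the extra
       ~10 dots per sprite mentioned above. */
    enum ppu_mode { MODE_HBLANK = 0, MODE_VBLANK = 1, MODE_OAM = 2, MODE_DRAW = 3 };

    enum ppu_mode ppu_mode_at(unsigned frame_dot)
    {
        unsigned line = frame_dot / 456;           /* 154 lines per frame */
        unsigned dot  = frame_dot % 456;           /* 456 dots per line   */

        if (line >= 144)     return MODE_VBLANK;   /* lines 144-153          */
        if (dot < 80)        return MODE_OAM;      /* mode 2: OAM search     */
        if (dot < 80 + 168)  return MODE_DRAW;     /* mode 3: active picture */
        return MODE_HBLANK;                        /* mode 0: rest of line   */
    }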
The link gives more of a general answer instead of implementation specifics, so I want to give my 2 cents.
The CPU is usually the main part of your emulator and the thing that actually counts cycles. Each time your CPU does something that takes some number of cycles, you pass that cycle count to the other components of your emulator so that they can synchronise themselves.
For example, some CPU instructions read and write memory as part of a single instruction. That means it would take the Game Boy CPU 4 (read) + 4 (write) cycles to complete the instruction. So in the emulator you do the read, pass 4 cycles to the GPU, do the write, pass 4 more cycles to the GPU. You do the same for the other components that run in parallel with the CPU, like the timers and sound.
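As a rough sketch in C (all the names here are hypothetical, not from any real emulator), that pattern looks something like this:

    #include <stdint.h>

    struct gb;                                   /* full definition lives elsewhere */

    /* Each unit that runs in parallel with the CPU exposes a
       "run for N cycles" entry point. */
    void ppu_step(struct gb *gb, int cycles);
    void timer_step(struct gb *gb, int cycles);
    void apu_step(struct gb *gb, int cycles);
    uint8_t bus_read(struct gb *gb, uint16_t addr);
    void bus_write(struct gb *gb, uint16_t addr, uint8_t value);

    /* Hand cycles to the other units the moment the CPU spends them,
       so they stay in step even in the middle of an instruction. */
    static void tick(struct gb *gb, int cycles)
    {
        ppu_step(gb, cycles);
        timer_step(gb, cycles);
        apu_step(gb, cycles);
    }

    static uint8_t cpu_read(struct gb *gb, uint16_t addr)
    {
        uint8_t value = bus_read(gb, addr);
        tick(gb, 4);                             /* the read costs 4 cycles  */
        return value;
    }

    static void cpu_write(struct gb *gb, uint16_t addr, uint8_t value)
    {
        bus_write(gb, addr, value);
        tick(gb, 4);                             /* the write costs 4 cycles */
    }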
It's actually important to do it that way instead of emulating the whole instruction and then synchronising everything else. I don't know about real ROMs, but there are test ROMs that verify this exact behaviour. 8 cycles is a long time, and in the middle of multiple memory accesses some other Game Boy component might make a change.
I am using the STM32F4 microcontroller with a microSD card. I am capturing analogue data via DMA.
I am using a double buffer, taking 1280 samples at a time (10 × 128, i.e. enough for 10 FFTs).
When one buffer is full I am setting a flag and I then look at 128 samples at a time and run an FFT calculation on it. All of this is running well.
The data is being sampled at the rate I want and FFT calculation is as I would expect. If I just let the program run for one second, I see that it runs the FFT approximately 343 times (44000/128).
But the problem is I would like to save 64 values from this FFT to the SD card.
I am using the HCC fat file system library.
On each loop of the FFT calculation I copy the 64 values into an array.
After every 10 calculations I write the contents of this array to file and start again.
The array stores 640 float_32 values (10*64).
This works perfectly for a one-second test run. I get 22,000 values stored to the SD card.
But as I increase the time I start losing samples, as it takes the SD card longer to write. I need the SD card to store over 87 kB/s (4 bytes × 64 × 343 = 87,808 bytes per second) consistently. I have tried increasing the DMA buffer sample size and adjusting how often I write, but that didn't seem to help.
I am using an 8G microSD card, class 4. I formatted the SD card to the default FAT32 allocation unit size 2048.
How should I organize the buffering of data to allow for this? I thought using fewer writes might help. Would a queue help? How would I implement this and would anyone have an example?
I saw that Clifford had a similar problem and was using a queue: How can I use an SD card for logging 16-bit data at 48 ksamples/s?
In my case I got it to work by trying a large number of different cards - they vary a great deal. If I had enough RAM available for a longer buffer that would have worked too.
If you are not using an RTOS, the queue buffering option may not be available to you, or at least would be non-trivial to implement.
Using an RTOS queue, I suggest that you create a queue of messages each of length 64*sizeof(float_32); the number of messages in the queue will be determined by the amount of card latency you need to deal with. A queue length of 343, for example, will sustain a card stall of 1 second and will require about 88 KB of RAM. The application then has a high-priority thread performing the FFT and placing data in the queue, while a low-priority thread takes data from the queue and writes it to the file.
You might improve performance further by accumulating multiple message blocks in your DMA buffer before initiating a write, and there may be some benefit in carefully selecting an optimum DMA buffer length.
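As a very rough sketch of that structure (FreeRTOS calls are used here only for concreteness; sd_write_block() is a placeholder rather than the HCC file-system API, and the queue depth follows the 343-message example above):

    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "queue.h"
    #include "task.h"

    #define FLOATS_PER_MSG   64
    #define QUEUE_DEPTH      343               /* ~1 s of card stall at 343 msgs/s */

    typedef struct { float bins[FLOATS_PER_MSG]; } fft_msg_t;

    static QueueHandle_t fft_queue;

    /* Placeholder for the actual file-system write call. */
    extern int sd_write_block(const void *data, uint32_t len);

    /* High-priority thread: run the FFT and push 64 bins per pass. */
    void fft_task(void *arg)
    {
        (void)arg;
        fft_msg_t msg;
        for (;;) {
            /* ...wait for the DMA half/full-buffer flag, run the FFT,
               and fill msg.bins with the 64 values to log... */
            xQueueSendToBack(fft_queue, &msg, portMAX_DELAY);
        }
    }

    /* Low-priority thread: drain the queue and write to the card. */
    void sd_task(void *arg)
    {
        (void)arg;
        fft_msg_t msg;
        for (;;) {
            if (xQueueReceive(fft_queue, &msg, portMAX_DELAY) == pdPASS)
                sd_write_block(msg.bins, sizeof msg.bins);
        }
    }

    void logger_init(void)
    {
        fft_queue = xQueueCreate(QUEUE_DEPTH, sizeof(fft_msg_t));
        /* xTaskCreate(fft_task, ...) at high priority,
           xTaskCreate(sd_task,  ...) at low priority */
    }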
Flash is very, very sensitive to overwrites. Writing 3 kB and then a further 3 kB may count as an overwrite of the first 4 kB. In your case, there's no good reason why you'd want such small writes anyway. I'd advise 16 kB writes (64 frames/write * 64 samples/frame * 4 bytes/sample). You'd need 5 or 6 writes per second, which should be well within spec for any old SD card.
Now it's quite likely that another 1280 samples will arrive while you're writing; you'll have to deal with that on another thread. It should be no problem, as the write should block without using CPU (it's a low-level flash delay).
The most probable cause of the problem is the way you are interfacing the card through the library.
SD cards over the SPI protocol (which I assume is being used here) are read and written in 512-byte sector units, with some SD commands making it possible to stream (to perform sequential sector accesses faster). An important element of the SD card SPI protocol is its various delays, during which you have to poll the card to find out whether you can start an operation (such as writing data to a sector).
You should read the library's API to discover how its writing process works. You will need to perform some regular action which ultimately polls the card to find out whether the writing process can continue. Some cards might require a set number of accesses before becoming ready for an operation; others might use timeouts for state transitions. It might not work well to call that function relatively rarely (such as once every 2-3 milliseconds) in the hope that the card becomes ready in the meantime; you have to keep nagging it to ask whether it has completed.
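For reference, a bare-bones sketch of that busy-polling step at the SPI level (this is not the HCC library's internals; spi_transfer() is a placeholder for whatever byte-exchange routine you have):

    #include <stdint.h>

    /* Placeholder: clock one byte out on MOSI and return the byte read on MISO. */
    extern uint8_t spi_transfer(uint8_t out);

    /* After a data block has been sent, the card holds its data-out line low
       while the internal write is in progress; the host must keep clocking it
       until it reads 0xFF again before issuing the next command. */
    int sd_wait_ready(uint32_t max_polls)
    {
        while (max_polls--) {
            if (spi_transfer(0xFF) == 0xFF)
                return 0;                     /* card is ready again          */
        }
        return -1;                            /* still busy: report a timeout */
    }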
This is just from my own experience with SD interfacing.
I am unable to understand the difference between Bus Cycle, Instruction Cycle and Machine Cycle. Please help me out. Thanks
First off, computers use a clock. The frequency of this clock indicates how many (giga/mega/kilo) cycles per second the clock signal goes through. This is the basis of any cycle for the computer.
The bus cycle is the cycle, or time, required to make a single read or write transaction between the CPU and an external device such as external memory.
The machine cycle is the number of clock cycles needed to do a fetch, read or write operation. A read or write may take more than a single bus cycle if the data being transferred is wider than the bus. For example, on an 8080 machine the data width is 8 bits; if the CPU needs to fetch or write 16 bits of data, that requires two bus cycles.
The instruction cycle is how many of these machine cycles are needed to complete an instruction. This varies depending on the instruction. For instance, some instructions, after being fetched from memory, need to fetch more data to complete; some need to write data at the end of the instruction cycle; and some don't do much at all, like NOP, which is basically fetched and then does nothing for one machine cycle.
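For a concrete example, on the 8085 the instruction LDA addr consists of 4 machine cycles: an opcode fetch (4 clock cycles), two reads to fetch the 16-bit operand address (3 clock cycles each), and one read of the addressed byte (3 clock cycles), so the whole instruction cycle is 4 + 3 + 3 + 3 = 13 clock cycles.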
I hope this helps a bit. If not, maybe microprocessor timing diagrams will help clear things up a bit more.
I need to drive a 32 kHz square wave on pin 19 of a Renesas R8C/36C microcontroller. The pin is non-negotiable (the circuit design is already complete).
The software design uses a 250 µs interrupt for simulating multi-tasking, but that's only good for a 2 kHz full wave.
Do I need to create another, higher-priority interrupt for driving the 32 kHz, or is there some other trick that I'm not aware of?
R8C/36C Hardware Manual
R8C/36C Software Manual
I am not familiar with the R8C and Renesas don't say much on the subject of performance, but it is a CISC processor with typically 4 cycles per instruction, so let's estimate about 4 MIPS. Some instructions are much longer, with division up to 30 cycles.
So if you create a 64 kHz timer interrupt and flip the output on each interrupt, you have about 63 instructions between interrupts, into which you have to fit the interrupt latency plus the code to flip the bit. If it works at all, it is likely to constitute a significant CPU load and may affect the timeliness of other operations.
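The per-interrupt work itself is tiny; as a hypothetical sketch (the port register name, bit position and ISR hookup are placeholders, not the real R8C definitions):

    #include <stdint.h>

    /* Placeholder for the memory-mapped port register that drives pin 19. */
    extern volatile uint8_t PORT1;
    #define PIN19_MASK 0x01                 /* assumed bit position, for illustration */

    /* Hypothetical 64 kHz timer ISR: toggling the pin on every interrupt
       gives two edges per 31.25 us, i.e. a 32 kHz square wave. */
    void timer_64k_isr(void)                /* hook up with the compiler's ISR pragma */
    {
        PORT1 ^= PIN19_MASK;
    }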
Be realistic: without a redesign, the project may not be viable. You are already stressing it with the 4 kHz OS tick in my opinion - the software overhead at that rate is likely to be a significant chunk of your CPU load.
[ADDED]
I previously suggested 6 instructions between interrupts - finger trouble on the calculator. I have changed that estimate to 63, and moderated my conclusion to "barely feasible".
However, I looked again at the data sheet: interrupt latency is variable because instruction execution time is variable, and the current instruction must complete before the interrupt is serviced. The worst case is when the DIVX instruction is executing, when it takes up to 51 cycles before the first instruction of the interrupt routine. That's 2.55 µs when you need the interrupt to trigger every 15.625 µs, so the variable latency will impose significant jitter and constitutes 6 to 16% of your total CPU time, without even considering the time used by the ISR itself. Plus, if the interrupt itself is pre-empted, or a higher-priority interrupt is running when this one becomes due, further jitter will be imposed.
Whether it works will depend on the accuracy and jitter constraints of the 32 kHz output, and whatever else your code needs to get done.
As many people have pointed out, this design doesn't seem very good from a hardware standpoint if the 32 kHz clock is meant to be generated from a GPIO.
However, I don't know how desperate your situation is, nor do I know the volumes involved. But if it is a prototype or a very short run, and pin 20 is free, you can short-circuit pins 19 and 20, set up pin 19 as an input and pin 20 as an output. Since pin 20 can be used as an output from timer RD, you could set up that timer to output the 32 kHz without using any interrupts.
I am not a Renesas micro expert, but I'm speaking from what I've seen in the data sheet you attached and from previous experience with other MCUs.
I hope this helps.
Looking at the datasheet for that chip, it looks like your only real option is to use the pin as a generic output port; that seems to be the only usable output mode for pin 19.
Could you strap pin 19 to another pin that has the hardware to generate 32 kHz, and just make pin 19 an input? Not a proud moment, but it was easy on a DIL package.
Could you take an interrupt every 15.6 µs, toggle pin 19, and then on every sixteenth interrupt do the multi-tasking stuff? That is likely to be wasteful, though. Alternatively, with an interrupt rate of 32 kHz you could set pin 19 on entry, spend one interrupt in eight doing the multi-tasking decisions, and on the other seven wait until the point where you can reset pin 19 and then run some background code, keeping it to less than half the CPU time.
How does the clock control various events (operations) so that they occur in the desired sequence? What is the significance of the clock cycle time? (I've heard that many operations can be issued in a single clock cycle.)
Or, put simply, how does the CPU control operation ordering?
CPUs have various processing units (float, vector, integer), and pipelines of different lengths for each unit.
The clock determines the speed at which work moves through a pipeline, with each operation advancing one stage per tick. Once it gets to the end, the result is sent back to cache/memory.
Multiple pipelines can be active at the same time.
That's all I can tell you.
Ars Technica used to have great articles about this, such as this one:
Understanding the Microprocessor
The clock does not control the sequence of instructions. The clock controls the number of times per second that the CPU "ticks". Each tick is referred to as a cycle, and consequently each cycle takes a certain amount of time to complete.
The sequence of instructions is dictated by the running program. Modern CPUs also include optimisations that influence the exact sequence.
These optimisations also make the clock speed (= the number of cycles per second) less significant. For example, a dual-core CPU is able to execute two instructions in the same cycle (one on each core).
Yes, usually instructions complete in a couple of cycles, and compilers optimise programs to use costly instructions less often.