I'm working on the GIC-600 with the 1-of-N distribution model for SPIs. I have three questions:
Suppose we have 4 PEs. The first core is busy with a higher-priority task and the others are asleep. Which PE does the GIC choose?
Under what conditions is an interrupt diverted from one core to another?
Can we route an interrupt to a specific PE in the 1-of-N model?
Thank you in advance
I've explored the GIC-600 spec, but it isn't clear to me how the GIC behaves in these different scenarios.
I have installed Drake and am trying to integrate a 6-degree-of-freedom (DOF) arm from Kinova with it.
This work would provide financial support.
I have committed my code to my fork and am waiting for it to be accepted.
I used the 7-DOF arm as a starting point and created the 6-DOF arm/hand.
I successfully implemented a solver that fits my needs. However, I need to run it on 1500+ different "problems" at precisely 0:00 every day. Because my web app is written in Ruby, I built a Quarkus "microservice" that takes the data, calculates a solution, and returns it to my main app.
In my application.properties, I set:
quarkus.optaplanner.solver.termination.spent-limit=5s
which means each request takes ~5 s to solve. But sending 1500 requests at once will saturate the CPU on my machine.
Is there a way to tell OptaPlanner to stop when the solution is good enough (for example, when the score is stable)? That way I could perhaps reduce the time from 5 s to 1-2 s, depending on the problem.
What are your recommendations for my specific scenario?
The SolverManager will automatically queue solver jobs if too many come in, based on its parallelSolverCount configuration:
quarkus.optaplanner.solver-manager.parallel-solver-count=3
In this case, it will run 3 solvers in parallel. So if 7 datasets come in, it will solve 3 of them and the other 4 later, as the earlier solvers terminate. However, if you use moveThreadCount=2, each solver uses at least 2 CPU cores, so you're using at least 6 CPU cores.
By default, parallelSolverCount is currently set to half your CPU cores (it currently ignores moveThreadCount). In containers, it's important to use JDK 11+: the CPU count of the container often differs from that of the bare-metal machine.
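To see what this queueing means for 1500 nightly problems, a back-of-the-envelope simulation helps. This is a plain Python sketch, not OptaPlanner's API: `schedule`, `cores_used` and `default_parallel_solver_count` are hypothetical helpers, and the "at least 1" rounding of the half-the-cores default is an assumption.

```python
import heapq
import os


def schedule(durations, parallel_solver_count):
    """Simulate a FIFO queue of solver jobs served by a fixed number of slots.

    Returns one (start, end) tuple per job, in submission order.
    Mimics the queueing described above; this is NOT OptaPlanner code.
    """
    free_at = [0.0] * parallel_solver_count  # time at which each slot becomes free
    heapq.heapify(free_at)
    result = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        heapq.heappush(free_at, end)
        result.append((start, end))
    return result


def cores_used(parallel_solver_count, move_thread_count=1):
    # Each solver keeps at least move_thread_count cores busy.
    return parallel_solver_count * move_thread_count


def default_parallel_solver_count(cpu_count=None):
    # Assumption: "half your CPU cores", rounded down, but never below 1.
    cpu_count = cpu_count or os.cpu_count() or 1
    return max(1, cpu_count // 2)
```

With 1500 jobs at a 5 s spent limit and 3 parallel solvers, the last job finishes after about 2500 s (roughly 42 minutes), which is why lowering the per-job time matters so much in this scenario.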
You can indeed tell OptaPlanner solvers to stop when the solution is good enough, for example when a certain score is reached, when the score hasn't improved for some amount of time, or a combination of these. See these OptaPlanner docs. Quarkus already exposes some of these (the rest currently still require a solverConfig.xml file). Some Quarkus examples:
quarkus.optaplanner.solver.termination.spent-limit=5s
quarkus.optaplanner.solver.termination.unimproved-spent-limit=2s
quarkus.optaplanner.solver.termination.best-score-limit=0hard/-1000soft
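To make "stop when good enough" concrete, here is a minimal Python sketch of how the three settings above could interact inside a solver loop. It is not OptaPlanner's implementation: `solve_with_termination` and the `step` callback are hypothetical, scores are single numbers (higher is better) rather than `0hard/-1000soft`, and stopping as soon as any one condition fires reflects my understanding of OptaPlanner's default OR composition of terminations.

```python
import time


def solve_with_termination(step, initial_score, spent_limit,
                           unimproved_spent_limit, best_score_limit,
                           clock=time.monotonic):
    """Call the hypothetical step() until any termination condition fires.

    Returns (best_score, name_of_condition_that_stopped_the_solver).
    """
    start = clock()
    best = initial_score
    last_improvement = start
    while True:
        now = clock()
        if now - start >= spent_limit:                     # spent-limit
            return best, "spent-limit"
        if now - last_improvement >= unimproved_spent_limit:  # score is stable
            return best, "unimproved-spent-limit"
        if best >= best_score_limit:                       # good enough already
            return best, "best-score-limit"
        score = step()
        if score > best:
            best = score
            last_improvement = clock()
```

An unimproved limit shorter than the spent limit lets easy problems finish early while hard ones still get the full budget.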
I'm writing a Game Boy emulator and have reached the point of implementing the graphics. However, I can't quite figure out how it works with the CPU as far as timing/clock cycles go. Does the CPU execute a certain number of cycles (if so, how many) and then hand off to the GPU? Or is the Game Boy always in an HBlank/VBlank state, with the GPU using the CPU in between them? I can't find any information that helps me with this, only how to use the control registers.
This has been answered at https://forums.nesdev.com/viewtopic.php?f=20&t=17754&p=225009#p225009
It turns out I had it completely wrong: the two work completely differently from what I assumed.
Here is the post:
The Game Boy CPU and PPU run in parallel. The 4.2 MHz master clock is also the dot clock. It's divided by 2 to form the PPU's 2.1 MHz memory access clock, and divided by 4 to form a multi-phase 1.05 MHz clock used by the CPU.

Each scanline is 456 dots (114 CPU cycles) long and consists of mode 2 (OAM search), mode 3 (active picture), and mode 0 (horizontal blanking). Mode 2 is 80 dots long (2 for each OAM entry), mode 3 is about 168 dots plus about 10 more for each sprite on a given line, and mode 0 is the rest. After the 144 scanlines are drawn, there are 10 lines of mode 1 (vertical blanking), for a total of 154 lines or 70224 dots per screen. The CPU can't see VRAM (writes are ignored and reads return $FF) during mode 3, but it can during the other modes. The CPU can't see OAM during modes 2 and 3, but it can during the blanking modes (0 and 1).
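The numbers in the quoted post can be checked and turned into a mode lookup. This Python sketch is an approximation: `ppu_mode`, `cpu_can_access_vram` and `cpu_can_access_oam` are hypothetical names, mode 3 is modeled as exactly 168 + 10 dots per sprite as the post estimates, and real hardware has further timing quirks.

```python
DOTS_PER_LINE = 456
VISIBLE_LINES = 144
VBLANK_LINES = 10
LINES_PER_FRAME = VISIBLE_LINES + VBLANK_LINES       # 154
DOTS_PER_FRAME = DOTS_PER_LINE * LINES_PER_FRAME     # 70224
DOTS_PER_CPU_CYCLE = 4                               # 4.2 MHz dot clock / 1.05 MHz CPU clock
OAM_SEARCH_DOTS = 80                                 # mode 2: 2 dots x 40 OAM entries


def ppu_mode(dot_in_frame, sprites_on_line=0):
    """Return the PPU mode (0-3) at a given dot of the frame (approximate mode 3 length)."""
    line, dot = divmod(dot_in_frame % DOTS_PER_FRAME, DOTS_PER_LINE)
    if line >= VISIBLE_LINES:
        return 1  # vertical blanking
    if dot < OAM_SEARCH_DOTS:
        return 2  # OAM search
    if dot < OAM_SEARCH_DOTS + 168 + 10 * sprites_on_line:
        return 3  # active picture
    return 0      # horizontal blanking


def cpu_can_access_vram(mode):
    return mode != 3


def cpu_can_access_oam(mode):
    return mode in (0, 1)
```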
The link gives a general answer rather than implementation specifics, so here are my 2 cents.
The CPU is usually the main part of your emulator and is what actually counts cycles. Each time your CPU does something that takes some number of cycles, you pass that number of cycles to the other components of your emulator so they can synchronize themselves.
For example, some CPU instructions read and write memory as part of a single instruction. That means it would take the Game Boy CPU 4 (read) + 4 (write) cycles to complete the instruction. So in your emulator you do the read, pass 4 cycles to the GPU, do the write, and pass 4 more cycles to the GPU. You do the same for the other components that run in parallel with the CPU, such as the timers and sound.
It's actually important to do it this way instead of emulating the whole instruction and then synchronizing everything else. I don't know about real ROMs, but there are test ROMs that verify this exact behavior. 8 cycles is a long time, and in the middle of multiple memory accesses some other Game Boy component might change something.
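A minimal Python sketch of this per-access synchronization. `Component`, `Bus` and `increment_at` are hypothetical names, each component is only a cycle counter, the opcode fetch is omitted, and the fixed 4 cycles per memory access is a simplification; a real emulator would advance the PPU, timer and APU state machines inside `tick`.

```python
class Component:
    """Stand-in for the PPU, timer or APU: it only counts the cycles it was given."""

    def __init__(self, name):
        self.name = name
        self.cycles = 0

    def tick(self, cycles):
        # A real component would advance its own state machine here.
        self.cycles += cycles


class Bus:
    CYCLES_PER_ACCESS = 4  # one memory access = 4 clock cycles (1 machine cycle)

    def __init__(self, components):
        self.memory = bytearray(0x10000)  # flat 64 KiB address space, no banking
        self.components = components
        self.log = []  # (access, address, component cycle counts at that moment)

    def _snapshot(self):
        return tuple(c.cycles for c in self.components)

    def _sync(self):
        # Let every other component catch up right after this access.
        for component in self.components:
            component.tick(self.CYCLES_PER_ACCESS)

    def read(self, address):
        self.log.append(("read", address, self._snapshot()))
        value = self.memory[address]
        self._sync()
        return value

    def write(self, address, value):
        self.log.append(("write", address, self._snapshot()))
        self.memory[address] = value & 0xFF
        self._sync()


def increment_at(bus, address):
    """Read-modify-write instruction body: components are synced between the two accesses."""
    value = bus.read(address)
    bus.write(address, value + 1)
```

The log shows that when the write happens, the PPU and timer have already advanced 4 cycles; ticking them 8 cycles only after the whole instruction would hide anything they did in between.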
I saw a few graphs and tables showing how well CPU instructions are interleaved. For example:
time →                    total: 7
1  B = a + b   ● ● ●
2  C = c + d     ● ● ●
3  A = B * C       ○ ○ ● ● ●
which I got from "Playing with the CPU pipeline".
My question is twofold: how do you find the stalls in the first place, and how do you visualize them in a readable way? In other words, what software is used to inspect and optimize code at this level?
Short answer: in most cases, no software is used to "look at the stalls". Stalls are predictable and can be found without even touching a computer. You know when they will happen, and you can draw them however you like.
Full story:
First of all, you have to understand pipelining.
Each "action" needed to process an instruction is executed on separate hardware (separate parts of your CPU). If there are 7 steps, you could be handling 7 instructions at the same time.
[Figure: instructions moving through the stages of a pipelined CPU. Source: http://www.phatcode.net/res/260/files/html/CPUArchitecturea3.html]
This image visualizes pipelining. You can see multiple instructions shifting through the CPU. As soon as instruction 1's opcode has been fetched, it no longer needs the opcode-fetch hardware, so instruction 2's opcode can be fetched. The same goes for all the other blocks.
The important thing to notice here is that the values for instruction 2 are loaded before instruction 1 has finished. This is possible only if instruction 2's values do not depend on instruction 1. If they do, instruction 2 has to be stalled: it waits in place. Instead of at T5, its values will be retrieved at T6. By then instruction 1 has stored its result, so instruction 2 can proceed.
This is what you see with 1 and 2: they're independent, so the next instruction can execute without any stalls. However, 3 depends on both 1 and 2, so it has to wait until both results are stored.
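Because stalls follow from data dependencies, they can be computed mechanically. Here is a toy Python model, assuming in-order issue of one instruction per cycle and a fixed 3-cycle latency with no other hazards (assumptions chosen to reproduce the table in the question, not a model of any real CPU). `pipeline_timeline` is a hypothetical helper.

```python
def pipeline_timeline(instructions, latency=3):
    """Compute stalls for a toy in-order pipeline.

    instructions: list of (dest, sources). One instruction issues per cycle;
    each needs `latency` busy cycles and cannot start until every source
    written by an earlier instruction is ready.
    Returns (rows, total_cycles); in rows ' ' = not issued, '○' = stall, '●' = busy.
    """
    ready_at = {}  # register -> last busy cycle of the instruction writing it
    rows = []
    total = 0
    for issue, (dest, sources) in enumerate(instructions, start=1):
        deps_done = max((ready_at[s] for s in sources if s in ready_at), default=0)
        start = max(issue, deps_done + 1)
        end = start + latency - 1
        ready_at[dest] = end
        rows.append(" " * (issue - 1) + "○" * (start - issue) + "●" * latency)
        total = max(total, end)
    return rows, total
```

Running it on the three instructions from the question (`B = a + b`, `C = c + d`, `A = B * C`) gives a two-cycle stall for instruction 3 and a total of 7 cycles, matching the table.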
To answer your question now:
How did we know? We saw it, without using any tool. How did we visualize it? The same way we would visualize any other data: you can choose any format, as long as it's easy to understand.
Please note that this is a simplified answer, meant to be easy to understand. Pipelining and processor optimizations are far more advanced in modern computers. For example, there are (conditional) jumps, which can cause instructions 2, 3 and 4 to be skipped, so that suddenly a different instruction has to be loaded into the pipeline because of the jump. You can find a lot about this (both simplified and advanced) by searching for pipelining.
More detailed information on this topic:
http://www.phatcode.net/res/260/files/html/CPUArchitecturea3.html, section 4.8.2. (This is what I found while googling to refresh my memory, but it looks like pretty good information.)
I have:
A I/O devices
B Processors
C Processes
My main memory is large enough to hold C processes.
A is smaller than B, and B is smaller than C.
What is the maximum number of processes that can be in either the blocked-suspended state or the ready-suspended state at one time?
In other words, how many processes can the hard drive hold at one time, given the data above?
A, B and C are numbers.
The maximum number of blocked processes can be C, in which case you could be deadlocked. The maximum number of blocked processes that won't result in a deadlock is C - 1: someone has to be doing work somewhere to advance the system.
The maximum number of ready processes is C - B: everything is ready to run, and B processes are currently running.
The number of I/O devices doesn't matter. Either everyone is fighting over a single resource, or everyone is fighting over many resources. In the end, the amount of contention is a function of resource utilization.
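The bounds in this answer can be written as a small Python function. `process_state_bounds` is a hypothetical helper that only encodes the answer's reasoning; it accepts A but, as argued above, A does not affect the result.

```python
def process_state_bounds(a_io_devices, b_processors, c_processes):
    """Return the answer's bounds for A < B < C (A is validated but otherwise unused)."""
    if not (0 < a_io_devices < b_processors < c_processes):
        raise ValueError("expected 0 < A < B < C")
    return {
        "max_blocked": c_processes,                       # possible, but may be a deadlock
        "max_blocked_without_deadlock": c_processes - 1,  # someone must still be working
        "max_ready": c_processes - b_processors,          # B processes are running
    }
```

For example, with A = 2, B = 4 and C = 10 this gives up to 10 blocked processes (possibly deadlocked), 9 blocked without a deadlock, and 6 ready.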