When to use Simulink in an embedded processor - embedded

We are developing a motor controller on a dsPIC. We intend to use Simulink to model the motor control algorithm with Real Time Embedded Workshop to convert the Simulink model into C code.
Our firmware will have some other minor logic operations, but its main function is motor control. We are wondering if we should try to do all the firmware in Simulink or seperate the logic operations into C code, while the motor control algorithm stays in Simulink?
Does anyone have a recommendation on which path we should start down?
thanks,
Brent

FYI I just built a system like yours but with a TI DSP.
I'm assuming you're doing something complex like vector control. If so, here's what you do: In your model, make one block for each task / each time period you need. This may just be the PWM interrupt with control in it. Define all the IO each task will need - try to keep each signal to 16 bits which are atomic on the DsPIC (this eliminates most rate transitions). Get simulink to make each top-level block a function call. Do only the control inside this/these blocks and leave all hardware configuration, task scheduling, other logic to C code. Simulink can generate a C and H file that you just include in the project with the other code. You'll fill out a structure of inputs, call the function, and get back a structure with the outputs. Keep the model clean of all hardware dependencies.
Do not believe the Mathworks marketing folks. They want you to do everything in Simulink. Don't get me wrong, it's a great tool for certain types of things. But for stuff you can't do in the model (like hello world) they suggest using the "legacy code tool" as if anything that's not a model is "like totally old-school". Restrict your model to control loops and signal flows - which it's good for - and it'll be fine.

Do the logic operations interact with the motor control or are they just unrelated operations? The degree of interaction could help make the decision.
If they are unrelated then for maintainability it might be best to keep them out of the model. Then you can update the logic without having to regenerate the C for the entire SimuLink model. There would be less chance of a regression problem.
If they are related to or interact with the model, then of course it is a case to keep them in the model so that you don't get incompatible versions linked into a build.

Related

How can I use functions in another program

I want to create a simulation program, for BUS related simulations.
These simulations use lot of calculations, and methods, and there are more and more.
First I want to write the Core of the program, which can use separately coded and compiled program blocks. What kind of programming technology is good for this?
For example: I want to send a CAN frame with checksum calculation. In the core program I choose from a directory that which checksum calculation is good for me.

Input live data into AnyLogic

I'm currently a Mechanical Engineering student that is looking into a project on Intelligent Manufacturing.
I have been using AnyLogic to explore manufacturing simulation. I have created a basic Jobshop that involves that transportation of material pallets from delivery to storage to processing. My next step is to transition this static scheduling system to a dynamic scheduling system.
I would like to know if there is any way to actively manipulate the simulation whilst it is running? For example, controlling the availability of processing machines in real time or triggering a delivery. So far I have been unable to find any way of manipulating the simulation after it has been run.
Does anybody have experience with real time data input into simulation software?
In your model, you can always add control elements (buttons, check boxes, sliders, etc). By adding these in the models you can control your model on runtime. For instance... if you have a variable X equal to 3 in your model, if you use a button, you can add the code X=4; and the variable X will change its value.
My suggestion is for you to explore the different options in the controls palette and refer yourself to the anylogic help to learn how to use each of them.
These controls must be placed in "main" in order to make changes when the simulation is running. If you place them in the simulation experiment window, then you won't be able to use them on runtime.
Your model will look like this:

disambiguating HPCT artificial intelligence architecture

I am trying to construct a small application that will run on a robot with very limited sensory capabilities (NXT with gyroscope/ultrasonic/touch) and the actual AI implementation will be based on hierarchical perceptual control theory. I'm just looking for some guidance regarding the implementation as I'm confused when it comes to moving from theory to implementation.
The scenario
My candidate scenario will have 2 behaviors, one is to avoid obstacles, second is to drive in circular motion based on given diameter.
The problem
I've read several papers but could not determine how I should classify my virtual machines (layers of behavior?) and how they should communicating to lower levels and solving internal conflicts.
These are the list of papers I've went through to find my answers but sadly could not
pct book
paper on multi-legged robot using hpct
pct alternative perspective
and the following ideas are the results of my brainstorming:
The avoidance layer would be part of my 'sensation layer' and that is because it only identifies certain values like close objects e.g. ultrasonic sensor specific range of values. The other second layer would be part of the 'configuration layer' as it would try to detect the pattern in which the robot is driving like straight line, random, circle, or even not moving at all, this is using the gyroscope and motor readings. 'Intensity layer' represents all sensor values so it's not something to consider as part of the design.
Second idea is to have both of the layers as 'configuration' because they would be responding to direct sensor values from 'intensity layer' and they would be represented in a mesh-like design where each layer can send it's reference values to the lower layer that interface with actuators.
My problem here is how conflicting behavior would be handled (maneuvering around objects and keep running in circles)? should be similar to Subsumption where certain layers get suppressed/inhibited and have some sort of priority system? forgive my short explanation as I did not want to make this a lengthy question.
/Y
Here is an example of a robot which implements HPCT and addresses some of the issues relevant to your project, http://www.youtube.com/watch?v=xtYu53dKz2Q.
It is interesting to see a comparison of these two paradigms, as they both approach the field of AI at a similar level, that of embodied agents exhibiting simple behaviors. However, there are some fundamental differences between the two which means that any comparison will be biased towards one or the other depending upon the criteria chosen.
The main difference is of biological plausibility. Subsumption architecture, although inspired by some aspects of biological systems, is not intended to theoretically represent such systems. PCT, on the hand, is exactly that; a theory of how living systems work.
As far as PCT is concerned then, the most important criterion is whether or not the paradigm is biologically plausible, and criteria such as accuracy and complexity are irrelevant.
The other main difference is that Subsumption concerns action selection whereas PCT concerns control of perceptions (control of output versus control of input), which makes any comparison on other criteria problematic.
I had a few specific comments about your dissertation on points that may need
clarification or may be typos.
"creatures will attempt to reach their ultimate goals through
alternating their behaviour" - do you mean altering?
"Each virtual machine's output or error signal is the reference signal of the machine below it" - A reference signal can be a function of one or more output signals from higher-level systems, so more strictly this would be, "Each virtual machine's output or error signal contributes to the reference signal of a machine at a lower level".
"The major difference here is that Subsumption does not incorporate the ideas of 'conflict' " - Well, it does as the purpose of prioritising the different layers, and sub-systems, is to avoid conflict. Conflict is implicit, as there is not a dedicated system to handle conflicts.
"'reorganization' which require considering the goals of other layers." This doesn't quite capture the meaning of reorganisation. Reorganisation happens when there is prolonged error in perceptual control systems, and is a process whereby the structure of the systems changes. So rather than just the reference signals changing the connections between systems or the gain of the systems will change.
"Design complexity: this is an essential property for both theories." Rather than an essential property, in the sense of being required, it is a characteristic, though it is an important property to consider with respect to the implementation or usability of a theory. Complexity, though, has no bearing on the validity of the theory. I would say that PCT is a very simple theory, though complexity arises in defining the transfer functions, but this applies to any theory of living systems.
"The following step was used to create avoidance behaviour:" Having multiple nodes for different speeds seem unnecessarily complex. With PCT it should only be necessary to have one such node, where the distance is controlled by varying the speed (which could be negative).
Section 4.2.1 "For example, the avoidance VM tries to respond directly to certain intensity values with specific error values." This doesn't sound like PCT at all. With PCT, systems never respond with specific error (or output) values, but change the output in order to bring the intensity (in this case) input in to line with the reference.
"Therefore, reorganisation is required to handle that conflicting behaviour. I". If there is conflict reorganisation may be necessary if the current systems are not able to resolve that conflict. However, the result of reorganisation may be a set of systems that are able to resolve conflict. So, it can be possible to design systems that resolve conflict but do not require reorganisation. That is usually done with a higher-level control system, or set of systems; and should be possible in this case.
In this section there is no description of what the controlled variables are, which is of concern. I would suggest being clear about what are goal (variables) of each of the systems.
"Therefore, the designed behaviour is based on controlling reference values." If it is only reference values that are altered then I don't think it is accurate to describe this as 'reorganisation'. Such a node would better be described as a "conflict resolution" node, which should be a higher-level control system.
Figure 4.1. The links annotated as "error signals" are actually output signals. The error signals are the links between the comparator and the output.
"the robot never managed to recover from that state of trying to reorganise the reference values back and forth." I'd suggest the way to resolve this would be to have a system at a level above the conflicted systems, and takes inputs from one or both of them. The variable that it controls could simply be something like, 'circular-motion-while-in-open-space', and the input a function of the avoidance system perception and then a function of the output used as the reference for the circular motion system, which may result in a low, or zero, reference value, essentially switching off the system, thus avoiding conflict, or interference. Remember that a reference signal may be a weighted function of a number of output signals. Those weights, or signals, could be negative so inhibiting the effect of a signal resulting in suppression in a similar way to the Subsumption architecture.
"In reality, HPCT cannot be implemented without the concept of reorganisation because conflict will occur regardless". As described above HPCT can be implemented without reorganisation.
"Looking back at the accuracy of this design, it is difficult to say that it can adapt." Provided the PCT system is designed with clear controlled variables in mind PCT is highly adaptive, or resistant to the effects of disturbances, which is the PCT way of describing adaption in the present context.
In general, it may just require clarification in the text, but as there is a lack of description of controlled variables in the model of the PCT implementation and that, it seems, some 'behavioural' modules used were common to both implementations it makes me wonder whether PCT feedback systems were actually used or whether it was just the concept of the hierarchical architecture that was being contrasted with that of the Subsumption paradigm.
I am happy to provide more detail of HPCT implementation though it looks like this response is somewhat overdue and you've gone beyond that stage.
Partial answer from RM of the CSGnet list:
https://listserv.illinois.edu/wa.cgi?A2=ind1312d&L=csgnet&T=0&P=1261
Forget about the levels. They are just suggestions and are of no use in building a working robot.
A far better reference for the kind of robot you want to develop is the CROWD program, which is documented at http://www.livingcontrolsystems.com/demos/tutor_pct.html.
The agents in the CROWD program do most of what you want your robot to do. So one way to approach the design is to try to implement the control systems in the CROWD programs using the sensors and outputs available for the NXT robot.
Approach the design of the robot by thinking about what perceptions should be controlled in order to produce the behavior you want to see the robot perform. So, for example, if one behavior you want to see is "avoidance" then think about what avoidance behavior is (I presume it is maintaining a goal distance from obstacles) and then think about what perception, if kept under control, would result in you seeing the robot maintain a fixed distance from objects. I suspect it would be the perception of the time delay between sending and receiving of the ultrasound pulses.Since the robot is moving in two-space (I presume) there might have to be two pulse sensors in order to sense the two D location of objects.
There are potential conflicts between the control systems that you will need to build; for example, I think there could be conflicts between the system controlling for moving in a circular path and the system controlling for avoiding obstacles. The agents in the CROWD program have the same problem and sometimes get into dead end conflicts. There are various ways to deal with conflicts of this kind;for example, you could have a higher level system monitoring the error in the two potentially conflicting systems and have it make reduce the the gain in one system or the other if the conflict (error) persists for some time.

Creating simple waveforms with CoreAudio

I am new to CoreAudio, and I would like to output a simple sine wave and square wave with a given frequency and amplitude through the speakers using CA. I don't want to use sound files as I want to synthesize the sound.
What do I need to do this? And can you give me an example or tutorial? Thanks.
There are a number of errors in the previous answer. I, the legendary :-) James McCartney, not James Harkins wrote the sinewavedemo, I also wrote SuperCollider which is what the audiosynth.com website is about. I also now work at Apple on CoreAudio. The sinewavedemo DOES use CoreAudio, since it uses AudioHardware.h from CoreAudio.framework as its way to play the sound.
You should not use the sinewavedemo. It is very old code and it makes dangerous assumptions about the buffer layout of the audio hardware. The easiest way nowadays to play a sound that you are generating is to use the AudioQueue, or to use an output audio unit with a render callback set.
The best and easiest way to do that without files is to prepare a single cycle buffer, containing one cycle of the wave (this is called technically a wavetable)
In the playback function called by CoreAudio thread, fill the output buffer with samples read from the wave buffer.
Note however that you will face two problems very quickly :
- for the sine wave, if the playback frequency is not an integer multiple of the desired sine frequency, you will probably need to implement an interpolator if you want to have a good quality. Using only integer pointers will generate a significant level of harmonic noise.
for the square wave, avoid to just program an array with +1 / -1 values. Such a signal is not bandlimited and will alias a lot. Do not forget that the spectrum of a square wave is virtually infinite!
To get good algorithms for signal generation, take a look to musicdsp.org, that's probably one of the best resource for that
Are you new to audio programming in general? As a starting point i would check out
http://www.audiosynth.com/sinewavedemo.html
This is a minimum osx sinewave implementation by the legendary James Harkins. Note, it doesn't use CoreAudio at all.
If you specifically want to use CoreAudio for your sinewave you need to create an output unit (RemoteIO on the iphone, AUHAL on osx) and supply an input callback, where you can pretty much use the code from the above example. Check out
http://developer.apple.com/mac/library/technotes/tn2002/tn2091.html
The benefits of CoreAudio are chiefly, chain other effects with your sinewave, write plugins for hosts like Logic & provide the interfaces for them, write a host (like Logic) for plugins that can be chained together.
If you don't wont to write a plugin, or host plugins then CoreAudio might not actually be for you. But one of the best things about using CoreAudio is that once you get your sinewave callback working it is easy to add effects, or mix multiple sines together
To do this you need to put your output unit in a graph, to which you can effects, mixers, etc.
Here is some help on setting up graphs http://timbolstad.com/2010/03/16/core-audio-getting-started-pt2/
It isn't as difficult as it looks. Apple provides C++ helper classes for many things (/Developer/Examples/CoreAudio/PublicUtility) and even if you don't want to use C++ (you don't have to!) they can be a useful guide to the CoreAudio API.
If you are not doing this realtime, using the sin() function from math.h is not a bad idea. Just fill however many samples you need with sin() beforehand when it is time to play it, just send it to the audio buffer. sin() can be quite slow to call once every sample if you are doing this realtime, using an interpolated wavetable lookup method is much faster, but the resulting sound will not be as spectrally pure.
There is a good and well documented sine wave player code example in Chapter 7 of the Adamson/Avila "Learning Core Audio" book, published by Addison-Wesley Professional (ISBN-10: 0-321-63684-8 ):
http://www.informit.com/store/learning-core-audio-a-hands-on-guide-to-audio-programming-9780321636843
It is a rather new publication (2012) and addresses precisely the issue of this question. It's only a starting point, but it's a valuable starting point.
BTW. Don't jump to graphs before having this basic lesson (which involves some math) behind.
Concerning example code, a quick and efficient method I often use deals with a pre-filled sinewave lookup table which has as many members as sample rate, for 44100 Hz the table has size of 44100. In other words, cycle length equals sample rate. This gives an acceptable trade-off between speed and quality in many cases. You can initialize it with the program.
If you generate floating point samples (which is default in OSX), and use math functions, use sinf() rather than (float)sin(). Promotions in inner loop cycles of a render callback are always resource-expensive. So are repetitive multiplications of constants, such as 2.0*M_PI, which can too often be found in code examples.

Can you program a pure GPU game?

I'm a CS master student, and next semester I will have to start working on my thesis. I've had trouble coming up with a thesis idea, but I decided it will be related to Computer Graphics as I'm passionate about game development and wish to work as a professional game programmer one day.
Unfortunately I'm kinda new to the field of 3D Computer Graphics, I took an undergraduate course on the subject and hope to take an advanced course next semester, and I'm already reading a variety of books and articles to learn more. Still, my supervisor thinks its better if I come up with a general thesis idea now and then spend time learning about it in preparation for doing my thesis proposal. My supervisor has supplied me with some good ideas but I'd rather do something more interesting on my own, which hopefully has to do with games and gives me more opportunities to learn more about the field. I don't care if it's already been done, for me the thesis is more of an opportunity to learn about things in depth and to do substantial work on my own.
I don't know much about GPU programming and I'm still learning about shaders and languages like CUDA. One idea I had is to program an entire game (or as much as possible) on the GPU, including all the game logic, AI, and tests. This is inspired by reading papers on GPGPU and questions like this one I don't know how feasible that is with my knowledge, and my supervisor doesn't know a lot about recent GPUs. I'm sure with time I will be able to answer this question on my own, but it'd be handy if I could know the answer in advance so I could also consider other ideas.
So, if you've got this far, my question: Using only shaders or something like CUDA, can you make a full, simple 3D game that exploits the raw power and parallelism of GPUs? Or am I missing some limitation or difference between GPUs and CPUs that will always make a large portion of my code bound to CPU? I've read about physics engines running on the GPU, so why not everything else?
DISCLAIMER: I've done a PhD, but have never supervised a student of my own, so take all of what I'm about to say with a grain of salt!
I think trying to force as much of a game as possible onto a GPU is a great way to start off your project, but eventually the point of your work should be: "There's this thing that's an important part of many games, but in it's present state doesn't fit well on a GPU: here is how I modified it so it would fit well".
For instance, fortran mentioned that AI algorithms are a problem because they tend to rely on recursion. True, but, this is not necessarily a deal-breaker: the art of converting recursive algorithms into an iterative form is looked upon favorably by the academic community, and would form a nice center-piece for your thesis.
However, as a masters student, you haven't got much time so you would really need to identify the kernel of interest very quickly. I would not bother trying to get the whole game to actually fit onto the GPU as part of the outcome of your masters: I would treat it as an exercise just to see which part won't fit, and then focus on that part alone.
But be careful with your choice of supervisor. If your supervisor doesn't have any relevant experience, you should pick someone else who does.
I'm still waiting for a Gameboy Emulator that runs entirely on the GPU, which is just fed the game ROM itself and current user input and results in a texture displaying the game - maybe a second texture for sound output :)
The main problem is that you can't access persistent storage, user input or audio output from a GPU. These parts have to be on the CPU, by definition (even though cards with HDMI have audio output, but I think you can't control it from the GPU). Apart from that, you can already push large parts of the game code into the GPU, but I think it's not enough for a 3D game, since someone has to feed the 3D data into the GPU and tell it which shaders should apply to which part. You can't really randomly access data on the GPU or run arbitrary code, someone has to do the setup.
Some time ago, you would just setup a texture with the source data, a render target for the result data, and a pixel shader that would do the transformation. Then you rendered a quad with the shader to the render target, which would perform the calculations, and then read the texture back (or use it for further rendering). Today, things have been made simpler by the fourth and fifth generation of shaders (Shader Model 4.0 and whatever is in DirectX 11), so you can have larger shaders and access memory more easily. But still they have to be setup from the outside, and I don't know how things are today regarding keeping data between frames. In worst case, the CPU has to read back from the GPU and push again to retain game data, which is always a slow thing to do. But if you can really get to a point where a single generic setup/rendering cycle would be sufficient for your game to run, you could say that the game runs on the GPU. The code would be quite different from normal game code, though. Most of the performance of GPUs comes from the fact that they execute the same program in hundreds or even thousands of parallel shading units, and you can't just write a shader that can draw an image to a certain position. A pixel shader always runs, by definition, on one pixel, and the other shaders can do things on arbitrary coordinates, but they don't deal with pixels. It won't be easy, I guess.
I'd suggest just trying out the points I said. The most important is retaining state between frames, in my opinion, because if you can't retain all data, all is impossible.
First, Im not a computer engineer so my assumptions cannot even be a grain of salt, maybe nano scale.
Artificial intelligence? No problem.There are countless neural network examples running in parallel in google. Example: http://www.heatonresearch.com/encog
Pathfinding? You just try some parallel pathfinding algorithms that are already on internet. Just one of them: https://graphics.tudelft.nl/Publications-new/2012/BB12a/BB12a.pdf
Drawing? Use interoperability of dx or gl with cuda or cl so drawing doesnt cross pci-e lane. Can even do raytracing at corners so no z-fighting anymore, even going pure raytraced screen is doable with mainstream gpu using a low depth limit.
Physics? The easiest part, just iterate a simple Euler or Verlet integration and frequently stability checks if order of error is big.
Map/terrain generation? You just need a Mersenne-twister and a triangulator.
Save game? Sure, you can compress the data parallelly before writing to a buffer. Then a scheduler writes that data piece by piece to HDD through DMA so no lag.
Recursion? Write your own stack algorithm using main vram, not local memory so other kernels can run in wavefronts and GPU occupation is better.
Too much integer needed? You can cast to a float then do 50-100 calcs using all cores then cast the result back to integer.
Too much branching? Compute both cases if they are simple, so every core is in line and finish in sync. If not, then you can just put a branch predictor of yourself so the next time, it predicts better than the hardware(could it be?) with your own genuine algorithm.
Too much memory needed? You can add another GPU to system and open DMA channel or a CF/SLI for faster communication.
Hardest part in my opinion is the object oriented design since it is very weird and hardware dependent to build pseudo objects in gpu. Objects should be represented in host(cpu) memory but they must be separated over many arrays in gpu to be efficient. Example objects in host memory: orc1xy_orc2xy_orc3xy. Example objects in gpu memory: orc1_x__orc2_x__ ... orc1_y__orc2_y__ ...
The answer has already been chosen 6 years ago but for those interested to the actual question, Shadertoy, a live-coding WebGL platform, recently added the "multipass" feature allowing preservation of state.
Here's a live demo of the Bricks game running on Gpu.
I don't care if it's already been
done, for me the thesis is more of an
opportunity to learn about things in
depth and to do substantial work on my
own.
Then your idea of what a thesis is is completely wrong. A thesis must be an original research. --> edit: I was thinking about a PhD thesis, not a master thesis ^_^
About your question, the GPU's instruction sets and capabilities are very specific to vector floating point operations. The game logic usually does little floating point, and much logic (branches and decision trees).
If you take a look to the CUDA wikipedia page you will see:
It uses a recursion-free,
function-pointer-free subset of the C
language
So forget about implementing any AI algorithms there, that are essentially recursive (like A* for pathfinding). Maybe you could simulate the recursion with stacks, but if it's not allowed explicitly it should be for a reason. Not having function pointers also limits somewhat the ability to use dispatch tables for handling the different actions depending on state of the game (you could use again chained if-else constructions, but something smells bad there).
Those limitations in the language reflect that the underlying HW is mostly thought to do streaming processing tasks. Of course there are workarounds (stacks, chained if-else), and you could theoretically implement almost any algorithm there, but they will probably make the performance suck a lot.
The other point is about handling the IO, as already mentioned up there, this is a task for the main CPU (because it is the one that executes the OS).
It is viable to do a masters thesis on a subject and with tools that you are, when you begin, unfamiliar. However, its a big chance to take!
Of course a masters thesis should be fun. But ultimately, its imperative that you pass with distinction and that might mean tackling a difficult subject that you have already mastered.
Equally important is your supervisor. Its imperative that you tackle some problem they show an interest in - that they are themselves familiar with - so that they can become interested in helping you get a great grade.
You've had lots of hobby time for scratching itches, you'll have lots more hobby time in the future too no doubt. But master thesis time is not the time for hobbies unfortunately.
Whilst GPUs today have got some immense computational power, they are, regardless of things like CUDA and OpenCL limited to a restricted set of uses, whereas the CPU is more suited towards computing general things, with extensions like SSE to speed up specific common tasks. If I'm not mistaken, some GPUs have the inability to do a division of two floating point integers in hardware. Certainly things have improved greatly compared to 5 years ago.
It'd be impossible to develop a game to run entirely in a GPU - it would need the CPU at some stage to execute something, however making a GPU perform more than just the graphics (and physics even) of a game would certainly be interesting, with the catch that game developers for PC have the biggest issue of having to contend with a variety of machine specification, and thus have to restrict themselves to incorporating backwards compatibility, complicating things. The architecture of a system will be a crucial issue - for example the Playstation 3 has the ability to do multi gigabytes a second of throughput between the CPU and RAM, GPU and Video RAM, however the CPU accessing GPU memory peaks out just past 12MiB/s.
The approach you may be looking for is called "GPGPU" for "General Purpose GPU". Good starting points may be:
http://en.wikipedia.org/wiki/GPGPU
http://gpgpu.org/
Rumors about spectacular successes in this approach have been around for a few years now, but I suspect that this will become everyday practice in a few years (unless CPU architectures change a lot, and make it obsolete).
The key here is parallelism: if you have a problem where you need a large number of parallel processing units. Thus, maybe neural networks or genetic algorithms may be a good range of problems to attack with the power of a GPU. Maybe also looking for vulnerabilities in cryptographic hashes (cracking the DES on a GPU would make a nice thesis, I imagine :)). But problems requiring high-speed serial processing don't seem so much suited for the GPU. So emulating a GameBoy may be out of scope. (But emulating a cluster of low-power machines might be considered.)
I would think a project dealing with a game architecture that targets multiple core CPUs and GPUs would be interesting. I think this is still an area where a lot of work is being done. In order to take advantage of current and future computer hardware, new game architectures are going to be needed. I went to GDC 2008 and there were ome talks related to this. Gamebryo had an interesting approach where they create threads for processing computations. You can designate the number of cores you want to use so that if you don't starve out other libraries that might be multi-core. I imagine the computations could be targeted to GPUs as well.
Other approaches included targeting different systems for different cores so that computations could be done in parallel. For instance, the first split a talk suggested was to put the renderer on its own core and the rest of the game on another. There are other more complex techniques but it all basically boils down to how do you get the data around to the different cores.