Direct X 11 and GPU acceleration (Simulations etc) - where to start? - gpu

I am using SharpDX and am fairly comfortable with it at the moment, but for an assessment for university, I need to create a demo which utilizes some sort of GPU acceleration. This is an 'independent research' task - what that means is, I assigned myself this task. I am legitimately interested in GPU acceleration in games, but right now it feels like I threw myself way too far into the deep.
I am planning to do a particle system (it will be a very BASIC system, with particles firing/falling/dieing) but I need a starting point.
Can someone point me in the right direction? Articles to read? things to consider? I have googled my heart out on things like "GPU acceleration DirectX", but I can't find any solid results! I wish I had a sort of 'hello world' for GPU acceleration..!

If you want to understand how to build a GPU particle system, I suggest you to read the book "Practical Rendering And Compution with Direct3D11", where you will find an entire chapter dedicated on how to implement a simple GPU particle system.
Then the most trickiest part is probably the sorting algorithm which is not detailed in the previous book, but you can have a look at ComputeShaderSort11 sample from the old DirectX June 2010 that could help you a lot (the implementation is quite efficient).
Also, I did a full particle engine on the GPU with SharpDX at my work, so It is perfectly achievable with SharpDX.

Related

Which SoC for making a Rubik's Cube Solver?

Is it wiser to use a Raspberry Pi over an Intel Galileo for making a Rubik's Cube Solver? Programming language isn't a major issue, Although Python would be slightly more preferred.
The major constraint is that there is only one PWM pin on the Raspberry Pi, we're thinking about using servo motors to rotate the Cube. What do you people think?
Major differences:
PWM pins
Processor
RAM
While this is hardly the kind of question to ask on this forum; I will attempt to NOT confuse you with my answer.
Before trying to answer which is best there are a few other things you need to ask yourself:
what does your design involve other than the processor? Do you want
to use 6 servos, one for each face of the cube? Do you have a more
cost efficient design that involves fewer servos? How many I/O pins
do you actually need?
RAM and Processor type are factors to consider when it comes to how fast your algorithm will run. Are you trying to make the fastest Rubik's Cube Solver in the world? Or just one that can actually solve the problem.
is cost a factor? Both platforms are decently priced but there is a difference between them which may matter when you are on a budget
The Galileo platform is newer than the Pi. You are more likely to find answers to your question when going with the more popular platform.
Is the programming language important? This comes back to how fast you want the algorithm to run. A c implementation will run faster than a python implementation, but ultimately I think it's better to stick to what you are more comfortable with.
On a personal note, I would probably go with the Pi because there's a huge community built around it and you can find plug-in expansion boards for almost anything you can think of, which will allow you to focus on software without worrying too much about the hardware side of things.

How Do You Profile & Optimize CUDA Kernels?

I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizing CUDA code is not like profiling & optimizing code that runs on a CPU. So I am hoping to learn from your experiences about how to get the most out of my code.
There was a post recently looking for the fastest possible code to identify self numbers, and I provided a CUDA implementation. I'm not satisfied that this code is as fast as it can be, but I'm at a loss as to figure out both what the right questions are and what tool I can get the answers from.
How do you identify ways to make your CUDA kernels perform faster?
If you're developing on Linux then the CUDA Visual Profiler gives you a whole load of information, knowing what to do with it can be a little tricky. On Windows you can also use the CUDA Visual Profiler, or (on Vista/7/2008) you can use Nexus which integrates nicely with Visual Studio and gives you combined host and GPU profile information.
Once you've got the data, you need to know how to interpret it. The Advanced CUDA C presentation from GTC has some useful tips. The main things to look out for are:
Optimal memory accesses: you need to know what you expect your code to do and then look for exceptions. So if you are always loading floats, and each thread loads a different float from an array, then you would expect to see only 64-byte loads (on current h/w). Any other loads are inefficient. The profiling information will probably improve in future h/w.
Minimise serialization: the "warp serialize" counter indicates that you have shared memory bank conflicts or constant serialization, the presentation goes into more detail and what to do about this as does the SDK (e.g. the reduction sample)
Overlap I/O and compute: this is where Nexus really shines (you can get the same info manually using cudaEvents), if you have a large amount of data transfer you want to overlap the compute and the I/O
Execution configuration: the occupancy calculator can help with this, but simple methods like commenting the compute to measure expected vs. measured bandwidth is really useful (and vice versa for compute throughput)
This is just a start, check out the GTC presentation and the other webinars on the NVIDIA website.
If you are using Windows... Check Nexus:
http://developer.nvidia.com/object/nexus.html
The CUDA profiler is rather crude and doesn't provide a lot of useful information. The only way to seriously micro-optimize your code (assuming you have already chosen the best possible algorithm) is to have a deep understanding of the GPU architecture, particularly with regard to using shared memory, external memory access patterns, register usage, thread occupancy, warps, etc.
Maybe you could post your kernel code here and get some feedback ?
The nVidia CUDA developer forum forum is also a good place to go for help with this kind of problem.
I hung back because I'm no CUDA expert, and the other answers are pretty good IF the code is already pretty near optimal. In my experience, that's a big IF, and there's no harm in verifying it.
To verify it, you need to find out if the code is for sure not doing anything it doesn't really have to do. Here are ways I can see to verify that:
Run the same code on the vanilla processor, and either take stackshots of it, or use a profiler such as Oprofile or RotateRight/Zoom that can give you equivalent information.
Running it on a CUDA processor, and doing the same thing, if possible.
What you're looking for are lines of code that have high occupancy on the call stack, as shown by the fraction of stack samples containing them. Those are your "bottlenecks". It does not take a very large number of samples to locate them.

What are some ideas for an embedded and/or robotics project?

I'd like to start messing around programming and building something with an Arduino board, but I can't think of any great ideas on what to build. Do you have any suggestions?
I show kids, who have never programmed, or done any electronics before, to make a simple 'Phototrope', a light sensitive robot, in about a day. It costs under £30 (GBP) including Arduino, electronics and off-the-shelf mechanics. If folks really get into mobile robots, the initial project can grow and grow (which I feel is part of the fun).
There are international robot competitions which require relatively simple mechanics to get started, e.g. in the UK http://www.tic.ac.uk/micromouse/toh.asp
Ultimate performance require specially built machines (for lightness) , but folks would get creditable results with an Arduino Nano, the right electronics, and a couple of good motors.
A line following robot is the classic mobile robot project. The track can be as simple as electrical tape. Pololu have some fun videos about their near-Arduino 3PI robot. The sensors are about £1, and there are a bunch of simple motor+gearbox kits from lots of places for under £10. Add a few £ for motor control, and you have autonomous robot mechanics, in need of programming! Add an Infrared Remote receiver (about £1), and you can drive it around using your TV remote. Add a small solar cell, use an Arduino analogue input to measure voltage, and it can find the sun. With a bit more electronics, it can 'feed' itself. And so it gets more sophisticated. Each step might be no more than a few hours to a few days effort, and you'll find new problems to solve and learn from.
IMHO, the most interesting (low-cost) competitions are maze solving robots. The international competition rule require the robot to explore a walled maze, usually using Infrared sensors, and calculate their optimal route. The challenges include keeping track of current position to near-millimeter accuracy, dealing with real world's unpredictably noisy environment and optimising straight-line speed with shortest distance cornering.
All that in 16K of program, and 1K RAM, with real-time interrupt handling (as much as 100K interrupts/second for some motor systems), sensor sampling, motor speed control, and maze solving is an interesting programming challenge. (You might make it 'easy' with 32K of program, and 2K RAM :-)
I'm working on a 'constrained' robot challenge (based on Arduino) so that robot performance is mainly about programming rather than having a big budget.
Start small and build up to something more complex. Control servos. Blink LEDs. Debounce inputs. Read analog sensors. Display text on an LCD. Then put it together.
Despite the name, I like the "Evil Genius" book for PIC microcontrollers because of the small, easily digestible projects that tend to build on one another. It is, of course, aimed at PIC programmers rather than the Arduino, but the material covered will be useful no matter what you're developing on.
I know Arduino is trendy right now, but I also like the Teensy++ development board because of its low price-point ($24), breadboard-compatible PCB, relatively high pin count, Linux development environment, USB connectivity, and not needing a programmer. Worth considering for smaller projects.
If you come up with something cool, let me know. I need an excuse to do something fun :)
Bicycle-related ideas:
theft alarm (perhaps with radio link to a base station which is connected to a PC by Ethernet)
fancy trip computer (with reed switch or opto sensor on wheel)
integrate with a GPS telematics unit (trip logging) with Ethernet/USB download of logged data to PC. Also has an interesting PC programming component--integrate with Google Maps.
Other ideas:
Clock with automatic time sync from:
GPS receiver
FM radio signal with embedded RDS data with CT code
Digital radio (DAB+)
Mobile phone tower (would it require a subscription and SIM card for this receive-only operation?)
NTP server via:
Ethernet
WiFi
ZigBee (with a ZigBee coordinator that gets its time from e.g. Ethernet or GPS)
Mains electricity smart meter via ZigBee (I'm interested now that smart meters are being introduced in Victoria, Australia; not sure if the smart meters broadcast the time info though, and whether it requires authentication)
Metronome
Instrument tuner
This reverse-geocache puzzle box was an awesome Arduino project. You could take this to the next step, e.g. have a reverse-geocache box that gives out a clue only at a specific location, and then using physical clues found at that location coupled with the next clue from the box, determine where to go for the next step.
You could do one of the firefighting robot competitions. We built a robot in university for my bachelor's final project, but didn't have time to enter the competition. Plus the robot needed some polish anyway... :)
Video here.
Mind you, this was done with a Motorola HC12 and a C compiler, and most components outside the microcontroller board were made from scratch, so it took longer than it should. Should be much easier with prefab components.
Path finding/obstacle navigation is typically a good project to start with. If you want something practical, take a look at how iRobot vacuums the floor and come up with a better scheme.
Depends on your background and if you want practical or cool. On the practical side, a remote control could be a simple starting point. It's got buttons and lights but isn't too demanding.
For a cool project maybe a Simon-style memory game or anything with lights & noises (thinking theremin-style).
I don't have suggestions or perhaps something like a line follower robot. I could help you with some links for inspiration
Arduino tutorials
Top 40 Arduino Projects of the Web
20 Unbelievable Arduino Projects
I'm currently developing plans to automate my 30 year old model train layout.
A POV device could be fun to build (just google for POV Arduino). POV means persistence of vision.

How can I make my own microcontroller?

How can I make my own microcontroller? I've done some work using GAL chips and programmed a chip to do simple commands such as add, load, move, xor, and output, but I'd like to do something more like a real microcontroller.
How can I go about doing this? I've read a little bit about FPGA and CPLD, but not very much, and so was looking for some advice on what to get and how to start developing on it.
Look here for a good wiki book. I had some coursework I wrote when I was teaching Electronic Eng, but I couldn't find it around. When I was teaching, most of the students were happy to use the schematic capture tools in the Xilinx Foundation package. They've moved onto ISE and WebPACK now. You can download the WebPack for free, which is useful, and it has schematic capture and simulation in it.
If you really want to shine, learn VHDL or Verilog (VHDL seems to be more common where I've worked, but that is only a small smattering of places) and code the design rather than enter it through the GUI.
If you know ANYTHING at all about digital logic design (and some HDL) I rekon you can have a somewhat functional 8-bit microprocessor simulating in VHDL in about 2 days. You're not going to build anything blazingly fast or enormously powerful in that time but it's a good starting point to grow from. If you have to learn about digital design, factor in a couple of days to learn how the tools work and simulate some basic logic circuits before moving onto the uP design.
Start learning the basics of digital systems, and how to build a binary adder. Move on to building an ALU to handle addition, subtraction, and, or, xor, etc and then a sequencer to read opcodes from RAM and supply them to the execution unit.
You can get fancy with instruction set design, but I'd recommend starting out REALLY simple until you have your head around whats going on, then throw it out and start again with something more complex.
Once you have the design simulating nicely you can gauge its complexity and purchase a device to suit. You should look at a development system for the device family you've chosen. Pick a device bigger than what you need for development because it's nice to be able to add extra instrumentation to debug it when it's running, and you almost certainly won't have optimized your design in the early stages of getting it on the device.
EDIT: Colin Mackenzie has a good tutorial about uC design and some FPGA boards as well as a bit of other stuff.
You may want to have a look around OpenCores.org, a "forge" site for open source IP core development. Also, consider getting yourself a development board like one of these to play around with.
Much of the tools ecosystem revolves around VHDL, although Avalda is working on tools to compile F# for FPGAs.
I saw a textbook once that stepped through building a machine from TTL chips. This had the same instruction set as a PDP-8, which is very - and I mean very - simple, so the actual machine architecture is easy to implement in this way.
The PDP-8 FAQ mentions a book: "The Art of Digital Design," second edition, by Franklin Prosser and David Winkel (Prentice-Hall, 1987, ISBN 0-13-046780-4). It also mentions people implementing it in FPGA's.
Given the extreme simplicity of this CPU architecture and availability of PDP-8 code or reference implementations it might be a good starting point to warm up with.
Alternatively, an acquaintance of mine implemented a thumb (cut down ARM) on a FPGA as a university project run by one Steve Furber (a prominent Acorn alumnus). Given that this could be compressed into a format small enough for a university project it might also be a good start.
To play with soft-core microprocessors, I like the Spartan 3 Starter Board from Digilent just because it has 1M of static RAM. SDRAM and DDR RAM are harder to get going, you know.
The leds, switches and a simple serial interface are a plus to debug and communicate.
As someone already pointed out, OpenCores.org is a good place to find working examples. I used the Plasma uC to write some papers while on university.
A microcontroller can be as simple as a ROM (instruction*2^x + (clock phase) is the address, outputs are the control signals, and you're good to go). Or it can be a complex harry beast with three arms and branch prediction support hardware.
Can you give more details about your aspirations?
After searching some very helpful links by all of you, I came across this Wikiversity course.
One of the first sentences is, "Have you ever thought to build your own microprocessor?"
Xilinx has a MicroBlaze and a PicoBlaze soft controller for its FPGAs. The latter is free, while, IIRC, the Microblaze is to be paid for.
As its name suggests the PicoBlaze is a small processor, which has its limitations, but OTOH is compact enough to run on a CPLD. Anyway a nice processor to get you started.
Pablo Bleyer has a PicoBlaze-compatible PacoBlaze. PacoBlaze was written in Verilog (which, like Adam said, less common than VHDL).
You need a big fpga for a little mcu.
You need a fpga with the correct hardware blocks if you need things like AD.
You need a soft core to put into the fpga.
But how about to just play around with a normal MCU before this project,
so you kind of know where you are going? How about some AVR:s from Atmel.
You can get free samples of pic micro controllers at this site. Last I knew, you don't even have to pay shipping.
http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=64

Best platform for learning embedded programming? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm looking to learn about embedded programming (in C mainly, but I hope to brush up on my ASM as well) and I was wondering what the best platform would be. I have some experience in using Atmel AVR's and programming them with the stk500 and found that to be relatively easy. I especially like AVR Studio and the debugger that lets you view that state of registers.
However, If I was to take the time to learn, I would rather learn about something that is prevalent in industry. I am thinking ARM, that is unless someone has a better suggestion.
I would also be looking for some reference material, I have found the books section on the ARM website and if one is a technically better book than another I would appreciate a heads up.
The last thing I would be looking for is a prototyping/programming board like the STK500 that has some buttons and so forth.
Thanks =]
"embedded programming" is a very broad term. AVR is pretty well in that category, but it's a step below ARM, in that it's both simpler to use, as well as less powerful.
If you just want to play around with ARM, buy a Nintendo DS or a Gameboy Advance. These are very cheap compared to the hardware inside (wonders of mass production), and they both have free development toolchains based off of gcc which can compile to them.
If you want to play around with embedded linux, BeagleBoard is looking to be a good option, only $150 and it has a ton of features.
Personally I think AVR is best for the smaller-sized 8-bit platforms, and ARM is best for the larger, more powerful 32-bit based platforms. Like many AVR fans, I don't like PIC. It just seems worse in pretty much every way. Also avoid anything that requires you to write any type of BASIC.
If you just want to play around with it, I'd suggest the Arduino platform (http://www.arduino.cc). It's based on the ATmega168 or ATmega8, depending on the version. It uses a C-like language and has its own IDE.
Myself I've worked in embedded programming for 9 years now and have experience on TI MSP430, Atmel AVR (a couple of flavours) and will be using an ARM soon.
My suggestion is to pickup something that has some extra features in the processor like ethernet controller and CAN controller, even get two or three if you can. Embedded devices are nice to work with, but once they can talk to other similar devices via CAN or get onto a network, they can become much more fun to play with.
ADI's Blackfin is another option since it's quite a straight forward architecture to program, yet can also do some fairly hefty DSP stuff should you choose to go down that route. It helps that the assembly language is quite sane too.
The Blackfin STAMP boards are an inexpensive (~$100 last I checked) way in, and they support the free GCC tools and uClinux.
Whatever architecture you choose I'd definitely recommend first downloading the toolchain\SDK and looking through the sample projects and tutorials - generally having a bit of a play about. You can often get quite acquainted with the architecture through simulation without even touching any hardware.
ARM has the nicest instruction set of the widely used embedded platforms, leaving you free to pick up the general principles of writing software for embedded platforms without getting bogged down in weird details like non-orthogonal registers or branch delay slots. There are plenty of emulators - ARM's own, while not free, is cycle-accurate; and a huge variety of programmable ARM-based hardware is cheap and easy to come by as well.
The TI MSP430 is a great platform for learning how to program microcontrollers. TI has a variety of FREE Tools and some cheap evaluation boards (starting at $20). Plus, it's a low-power, modern microcontroller.
A nice choice would be PIC18 by Microchip
It has quite alot of material, documentation, tutorials and projects on the internet
Free IDE and compiler.
you can pull your own flash writer in a few minutes.
(Although for a debugger to work you'll need to work harder)
If you're a student (or has a student email address) Microchip will send you free sample chips. So basically you can have a full development environment for close to nothing.
PICs are quite prevalent in the industry. Specifically as controllers for robots for some reason although they can do so much more.
Arduino seems to be the platform of choice these days for beginners although there are lots of others. I like the Olimex boards personally but they are not really for beginners.
Microchip's PIC range of CPUs are also excellent for beginners, especially if you want to program in assembler.
BTW, Assembler is not used as much as it used to. The general rule with embedded is if you've got 4k of memory or more, use C. You get portability and you can develop code faster.
I suppose it depends on your skill level and what you want to do with the chip. I usually choose which embedded chip to use by the available peripherals. If you want a USB port, find one with USB built in, if you want analogue-to-digital, find one with an ADC etc. If you've got a simple application, use an 8-bit but if you need serious number crunching, go 32 bits.
I'd like to suggest the beagleboard from TI. It has a Omap3 on it. That's a Cortex-A8 ARM11 CPU, a C64x+ DSP and a video accelerator as well.
The board does not need an expensive jtag device. A serial cable an an SD-Card is all you need to get started. Board costs only $150 and there is a very active community.
www.beagleboard.org
Your question sort of has been answered in this question.
To add to that, the embedded processor industry is very segmented, it doesn't have a major player like Intel/x86 is for the "desktop" processor industry. The ARM processor does have a large share, so does MIPS I believe, and there are many smaller more specific microcontroller like chips available (like the MSP430 etc from TI).
As for documentation, I do embedded development for a day job, and the documentation we have access to (as software developers) is rather sparse. Your best bet is to use the documentation available on the processor manufacturers site.
Take a look at Processing and the associated Arduino and Wiring boards.
If you just want to have fun, then try the Parallax Propeller chip. The HYDRA game platform looks like a blast. There's a $100 C compiler for it now.
I started on BASIC stamps, moved up through SX chips and PICs into 8051s, then 68332s, various DSPs, FPGA soft processors, etc.
8051s are more useful in the real world... the things won't go away. There's TONS of derivatives and crazy stuff for them. (Just stay away from the DS80C400) The energy industry is absolutely full of them.
Start with something tiny. If you have external RAM and plenty of registers... what's the difference between that and a SBC?
Many moons ago I've worked with 8-bitters like 68HC05 and Z80, later AVR and MSP430 (16-bit). However most recent projects were on ARM7. Several manufacturers offer ARM controllers, in all colors and sizes (well, not really color).
ARM(7) is replacing 8-bit architecture: it's more performant (32-bit RISC at faster instruction cycles than most 8-bitters), has more memory and is available with several IO-configurations.
I worked with NXP LPC2000 controllers, which are also inexpensive (< 1 USD for a 32-bitter!).
If you're in Europe http://www.olimex.com/dev/index.html has some nice low-cost development boards. Works in the rest of the world too :-)
For a fun project to test, have a look at xgamestation
But for a more industrial used one chip solution programming, look at PIC
For my Computer Architecture course I had to work with both a PIC and an AVR; in my opinion the PIC was easier to work with, but that's maybe because that's what we worked with the most and we had the most time to get used to. We used the AVR maybe only a couple of times so I couldn't get the hang of it perfectly but it also was nothing overly complicated, or at least not more frustrating than the other.
I think you can also order microprocessor samples from Microchip's website so you could also get started with that?
Second that:
Arduino platform http://www.arduino.cc
HTH
For learning, you can't go past the AVR. The chips are cheap and they'll run with zero external components - they also supply enough current to drive an LED straight from the port.
You can start with a cheap programmer such as lady-ada's USBTinyISP (USD$22 for a kit) which can power your board with 5V from the USB port. Get the free tools WinAVR (GCC based) and AVRStudio and get a small project working in no time.
Yes the AVRs have limitations - but developing software for microcontrollers is largely about managing resources and coping with those problems. It's unlikely that you'll experience problems such as running out of stack space, RAM or ROM when you're making hobbist projects for powerful ARM platforms.
That said, ARM is also a great platform which is widely used in the industry, however, for learning I highly recommend AVRs.
I would suggest Microchip's PIC18F series. I just started developing for them with the RealICE in-circuit emulator, but the pickit2 is a decent debugger for the price. You could say this for the AVR's also, but there is a large following for the device all over the web. I was able to have a - buggy, yet functional - embedded USB device running within days due to all the PIC related chatter.
The only thing I don't like about the PICs is that a lot of the sample code is VERY entwined into the demo boards. That can make it hard to tear out sections that you need and still have an application that will build and run for your application.
Texas Instruments has released a very interesting development kit at a very low price: The eZ430-Chronos Development Tool contains an MSP430 with display and various sensors in a sports watch, including a usb debug programmer and a usb radio access point for 50$
There is also a wiki containing lots and lots of information.
I have already created a stackexchange proposal for the eZ430-Chronos Kit.
You should try and learn from developpers kits provided by Embedded Artists. After you get the kit, check their instructional videos and videos provided by NXP, which are not as detailed as they could be, but they cover a lot of things. Problems with learning ARM as your first architecture and try to do something practicall are:
You need to buy dev. kit.
You need a good book to learn ARM assembly, because sooner or later you will come across ARM startup code, which is quite a deal for a beginner. The book i mentioned allso covers some C programming.
Combine book mentioned above with a user guide for your speciffic processor like this one. Make sure you get this as studying this in combination with above book is the only way to learn your ARM proc. in detail.
If you want to make a transfer from ARM assembly to C programming you will need to read this book, which covers a different ARM processor but is easier for C beginner. The down side is that it doesn't explain any ARM assembly, but this is why you need the first book.
There is no easy way.
mikroElektronika has nice ARM boards and C, Pascal and Basic compilers that might suite your demands.