Any path-finding tricks and strategies for a mobile game (e.g. RTS) on less powerful devices? - optimization

I'm developing a 2D RTS game, sort of like COC (Clash of Clans). Cool mobile game, huh? But I've run into a problem with path-finding. As usual, I run the path-finding algorithm once for every agent that is placed somewhere on screen by a finger touch, but in some cases this incurs a performance penalty, and the phone gets very hot when the number of agents increases suddenly and simultaneously.
Actually, no matter which path-finding algorithm I use, e.g. A*, Dijkstra, or something special (maybe optimal), it is always a time-consuming process over the whole game loop, especially with many agents on a less powerful mobile CPU. As far as I know, in games like this the shortest path is not the focus (will people really pay attention to the exact path an agent walks?); efficient and natural-looking path-finding matters more. So I came up with some solutions, maybe impractical ones.
Solution 1: use a cheaper path-finding algorithm, possibly graph-based or something else, because the shortest path doesn't matter.
Solution 2: limit how much work the AI module does per agent, e.g. an upper limit on path-finding calls per interval; that is, only one or two agents get planned each frame and the rest are planned a few game frames later. As you know, its drawback is obvious.
That's what I've thought of so far. I hope you game-dev folks can give me some good ideas and tricks; I'd appreciate it. Thank you very much.
EDIT:
Here is my related pseudo code; the procedure corresponds to my game logic.
//inside the logic thread
procedure put_agent_on_world
    if (need to put an agent into world space)
        //do standard A* path-finding for this agent
        path_list = do_aStar_path_finding(attack_target_pos, start_pos);
        enqueue(path_list);
        ......
    end
The path_list queue is eventually consumed by the visual agents as they step forward. Any hints?

Look up "Hierarchical pathfinding" Say you're driving to a city far away, you don't plan the entire path before you get in the car!
Pathfinding is usually done in steps; it's not one function call. After N iterations it returns (and indicates it isn't complete) so it can be resumed at the next available time. Basically, rather than a function with locals, think of an operator() with the state variables as members of a class.
To make A* fast you can deliberately make the heuristic inaccurate. Suppose I use a heuristic of 10× the distance as the crow flies: it may not find the shortest path, but it will have a strong preference for heading towards the target rather than "fanning out" and exploring further around the closed region.
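Putting the last two ideas together, here is a rough sketch of a time-sliced A* with an inflated heuristic; the grid format, class name and per-frame expansion budget are illustrative assumptions, not anything from the question:

    from heapq import heappush, heappop

    class IncrementalAStar:
        def __init__(self, grid, start, goal, weight=3.0):
            self.grid, self.goal, self.weight = grid, goal, weight
            self.came_from = {}
            self.g = {start: 0}
            self.open = [(self._h(start), 0, start, None)]
            self.path = None                    # filled in once the search finishes

        def _h(self, n):                        # weighted Manhattan distance
            return self.weight * (abs(n[0] - self.goal[0]) + abs(n[1] - self.goal[1]))

        def _neighbours(self, n):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                x, y = n[0] + dx, n[1] + dy
                if 0 <= y < len(self.grid) and 0 <= x < len(self.grid[0]) and self.grid[y][x]:
                    yield (x, y)

        def _reconstruct(self, node):
            path = []
            while node is not None:
                path.append(node)
                node = self.came_from[node]
            return path[::-1]

        def step(self, max_expansions=50):
            """Expand at most max_expansions nodes; return True when finished."""
            for _ in range(max_expansions):
                if not self.open:
                    self.path = []              # goal unreachable
                    return True
                _, g, node, parent = heappop(self.open)
                if node in self.came_from:
                    continue
                self.came_from[node] = parent
                if node == self.goal:
                    self.path = self._reconstruct(node)
                    return True
                for nxt in self._neighbours(node):
                    ng = g + 1
                    if ng < self.g.get(nxt, float("inf")):
                        self.g[nxt] = ng
                        heappush(self.open, (ng + self._h(nxt), ng, nxt, node))
            return False                        # budget used up, resume next frame

    # in the game loop, spread the work over frames:
    #   if agent.search and agent.search.step():
    #       agent.path_list = agent.search.path
    #       agent.search = None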

Related

Arcade Fighting Game AI

I need to build the AI for an opponent in an arcade style fighting game, very similar to Mortal Kombat.
I don't want to use random moves for the computer, but I would like to have an AI that is harder to beat.
Where can I start looking for resources? Do you know of any implementations of this sort of project?
Think about how you play the game.
Ask yourself: under what conditions would I perform certain attacks? When would I block? What do I do when I have low health? When my opponent has low health? Do I become more aggressive in one situation over the other? When is it best to use long range versus short range?
Etc.
An AI like this usually just follows a bunch of if/else/then statements, with some randomness added in.
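Something along these lines (the state fields, thresholds and move names are made up for illustration):

    import random
    from dataclasses import dataclass

    @dataclass
    class Fighter:
        x: float
        health: int
        is_attacking: bool = False

    def choose_move(me, foe):
        distance = abs(me.x - foe.x)
        if me.health < 20 and random.random() < 0.6:
            return "block"                              # play defensively when hurt
        if foe.is_attacking and random.random() < 0.4:
            return "block"
        if distance > 3:
            return "projectile" if random.random() < 0.7 else "advance"
        # close range: mostly attack, occasionally throw to stay unpredictable
        return random.choice(["punch", "punch", "kick", "throw"])

    print(choose_move(Fighter(x=0, health=15), Fighter(x=2, health=80, is_attacking=True)))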
You want it to react quickly, so heavier machinery (A*, alpha-beta, etc.) won't be as useful.
There is also an approach based on statistics: count the attacks in each direction. You can add logic that tracks how many times the enemy hit you from each direction and use it to predict future attacks.
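A rough sketch of that counting idea (direction names are made up):

    from collections import Counter

    attack_counts = Counter()

    def record_attack(direction):                # call whenever the opponent attacks
        attack_counts[direction] += 1

    def predicted_block():
        if not attack_counts:
            return "mid"                         # default guess before any data
        return attack_counts.most_common(1)[0][0]

    for d in ["high", "low", "high", "high"]:
        record_attack(d)
    print(predicted_block())                     # "high"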

How can I estimate if a feature is going to take up too many resources on an FPGA?

I'm starting on my first commercial sized application, and I often find myself making a design, but stopping myself from coding and implementing it, because it seems like a huge use of resources. This is especially true when it's on a piece that is peripheral (for example an enable for the output taps of a shift register). It gets even worse when I think about how large the generic implementation can get (4k bits for the taps example). The cleanest implementation would have these, but in my head it adds a great amount of overhead.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluating? In general I worry less about the number of flip-flops, and more when it comes to the width of signals. This may just come from a CS background, where all application boundaries should be as small as feasibly possible to prevent overhead.
Point 1. We learn by playing, so play! Try a couple of things. See what the tools do. Get a feel for the problem. You won't get past this if you don't try something. Often the problems aren't where you think they're going to be.
Point 2. You need to get some context for these decisions. How big is adding an enable to a shift register compared to the capacity of the FPGA / your design?
Point 3. There are two major types of 'resource' to consider: cells and time.
Cells are relatively easy to estimate in broad terms. How many flops? How much logic in identifiable blocks (e.g. in an ALU: multipliers, adders, etc.)? Often this is defined by the design you're trying to do. You can't build an ALU without registers, a multiplier, an adder, etc.
Time is more subtle, and is invariably traded off against cells. You'll be trying to hit some performance target, and recognising the structures that will make that hard is where the experience from point 1 comes in.
Things to look out for include:
A single net driving a large number of things. Large fan-outs cause a heavy load on a single driver which slows it down. The tool will then have to use cells to buffer that signal. Classic time vs cells trade off.
Deep clumps of logic between register stages. Again the tool will have to spend more cells to make logic meet timing if it's close to the edge. Simple logic is fast and small. Sometimes introducing a pipeline stage can decrease the size of a design if it makes the logic on either side far simpler.
Don't worry so much about large buses, if each bit is low fanout and you've budgeted for the registers. Large buses are often inherent in fast designs because you need high bandwidth. It can be easier to go wide than to go to a higher clock speed. On the other hand, think about the control logic for a wide bus, because it's likely to have a large fan-out.
Different tools and target devices have different characteristics, so you have to play and learn the rules for your set-up. There's always a size vs speed (and these days 'vs power') compromise. You need to understand what moves you along that curve in each direction. That comes with experience.
Is there any kind of rule I can use to make a quick decision on whether a design option is worth coding and evaluating?
The only rule I can come up with is 'Have I got time or not?'
If I have, I'll explore. If not I better just make something work.
Ahhh, the life of doing design to a deadline!
It's something that comes with experience. Here's some pointers:
adding numbers is fairly cheap
choosing between them (multiplexing) gets big quite quickly if you have a lot of inputs to the multiplexer (the width of each input is a secondary issue also).
Multiplications are free if you have spare multipliers in your chip; they suddenly become expensive when you run out of hard DSP blocks.
memory is also cheap, until you run out. For example, your 4Kbit shift register easily fits within a single Xilinx block RAM, which is fine if you have one to spare. If not, it'll take a large number of LUTs (depending on the device - an older Spartan 3 can fit 17 bits into a LUT (including the in-CLB register), so it will require ~235 LUTs). And not all LUTs can be shift registers. If you are only worried about the enable for the register, don't be. Unless you are pushing the performance of the device, routing that sort of signal to a few hundred LUTs is unlikely to cause major timing issues.

Can this online highscore scheme be abused?

Background
One problem with games using online highscore lists is that they often can be abused. The game sends the current score to the server and a cunning user can analyze the protocol/scheme and send bogus scores. That is why some highscore lists are topped with 999999 scores.
A common solution to this problem is to encrypt the score in some way, and on top of that put other mechanisms to recognize false scores. But even if you do this, it's the client that sends the score and the client is living in the user's computer and can be reverse-engineered.
My idea
I am designing/thinking about a game (that I will complete, yeah right :) ) where you configure your player/robot with instructions on how to perform a task (and when these instructions are to be carried out). When a "Go" button is pressed, the game runs the instructions. Finally a result and, if successful, a score are obtained.
So, how about this: Instead of submitting the score, the actual instructions are sent to the server, where they are run, using the same implementation. Then the server calculates the score and places the user on the highscore list.
The question
Are there ways this idea can be abused to get a false score?
I understand that this probably is not a new idea. But if it works, it wouldn't be impossible to extend it to other games too, where it is possible to record all user actions.
People will always find a way to cheat, but this seems like a reasonable countermeasure. You'll have to consider your intended traffic levels, as your scheme will require more resources than if it were just recording the high score sent by the client.
But, as an aside - this game sounds an awful lot like my job (giving instructions to a machine so it performs some task). No high-score board though (although, that would be awesome).
As long as the robot program's behavior doesn't depend on the speed of the computer, it'll be fine, and if the programs are quite small (at most a few kilobytes) this would work well. The only ways I can see to cheat it are if someone cloned the workspace and ran a program to find the optimal robot program and then submitted it, or if someone posted the solutions and people used those; but both of those issues can be solved with randomization.
(A note about the issue of speed-dependent games: it's fine for the game to slow down uniformly if the computer can't run it at full speed, but if the physics time step depends on the frame rate, you can get problems like the jump height varying with the frame rate.)
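Here is a rough sketch of what a fixed-timestep replay looks like; the toy "world" and scoring rule are made up for illustration, the point being that the client and the server run exactly the same deterministic loop over the submitted instructions:

    DT = 1.0 / 60.0                      # fixed physics step (seconds), independent of rendering

    def run_program(instructions, total_time=10.0):
        """instructions: list of (time, velocity_change) pairs, sorted by time."""
        x, vx, score = 0.0, 0.0, 0
        pending = list(instructions)
        steps = int(total_time / DT)
        for step in range(steps):
            t = step * DT
            while pending and pending[0][0] <= t:    # apply instructions due now
                vx += pending.pop(0)[1]
            x += vx * DT                             # deterministic integration
            if x >= 100.0:                           # reached the target
                score = steps - step                 # faster finish, higher score
                break
        return score

    # the server replays exactly the same call with the client's instruction list
    print(run_program([(0.0, 20.0)]))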

Implementation of achievement systems in modern, complex games

Many games that are created these days come with their own achievement system that rewards players/users for accomplishing certain tasks. The badges system here on stackoverflow is exactly the same.
There are some problems though for which I couldn't figure out good solutions.
Achievement systems have to watch out for certain events all the time; think of a game that offers 20 to 30 achievements for, e.g., combat alone. The server would have to check for these events (e.g. the player avoided x attacks of the opponent in this battle, or the player walked x miles) all the time.
How can a server handle this large amount of operations without slowing down and maybe even crashing?
Achievement systems usually need data that is only used in the core engine of the game and wouldn't be needed outside of it if there weren't those nasty achievements (think of, e.g., how often the player jumped during each fight; you don't want to store all this information in a database). What I mean is that in some cases the only way of adding an achievement would be adding the code that checks its current state to the game core, and that's usually a very bad idea.
How do achievement systems interact with the core of the game that holds the later unnecessary information? (see examples above)
How are they separated from the core of the game?
My examples may seem "harmless" but think of the 1000+ achievements currently available in World of Warcraft and the many, many players online at the same time, for example.
Achievement systems are really just a form of logging. For a system like this, publish/subscribe is a good approach. In this case, players publish information about themselves, and interested software components (that handle individual achievements) can subscribe. This allows you to watch public values with specialised logging code, without affecting any core game logic.
Take your 'player walked x miles' example. I would implement the distance walked as a field in the player object, since this is a simple value to increment and does not require increasing space over time. An achievement that rewards players that walk 10 miles is then a subscriber of that field. If there were many players then it would make sense to aggregate this value with one or more intermediate broker levels. For example, if 1 million players exist in the game, then you might aggregate the values with 1000 brokers, each responsible for tracking 1000 individual players. The achievement then subscribes to these brokers, rather than to all the players directly. Of course, the optimal hierarchy and number of subscribers is implementation-specific.
In the case of your fight example, players could publish details of their last fight in exactly the same way. An achievement that monitors jumping in fights would subscribe to this info, and check the number of jumps. Since no historical state is required, this does not grow with time either. Again, no core code need be modified; you only need to be able to access some values.
Note also that most rewards do not need to be instantaneous. This allows you some leeway in managing your traffic. In the previous example, you might not update the broker's published distance travelled until a player has walked a total of one more mile, or a day has passed since last update (incrementing internally until then). This is really just a form of caching; the exact parameters will depend on your problem.
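Here is a rough sketch of that publish/subscribe shape; the broker and achievement classes are illustrative, and in practice you would batch the publishes as described above (e.g. only publish every extra mile):

    class Broker:
        def __init__(self):
            self.subscribers = []
        def subscribe(self, callback):
            self.subscribers.append(callback)
        def publish(self, player_id, field, value):
            for cb in self.subscribers:
                cb(player_id, field, value)

    class MilesWalkedAchievement:
        def __init__(self, broker, threshold=10):
            self.threshold = threshold
            self.awarded = set()
            broker.subscribe(self.on_update)
        def on_update(self, player_id, field, value):
            if field == "miles_walked" and value >= self.threshold and player_id not in self.awarded:
                self.awarded.add(player_id)
                print(f"player {player_id} earned 'Walk 10 miles'")

    broker = Broker()
    MilesWalkedAchievement(broker)
    broker.publish(42, "miles_walked", 10.3)   # triggers the award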
You can even do this if you don't have access to source, for example in videogame emulators. A simple memory-scan tool can be written to find the displayed score for example. Once you have that your achievement system is as easy as polling that memory location every frame and seeing if their current "score" or whatever is higher than their highest score. The cool thing about videogame emulators is that memory locations are deterministic (no operating system).
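A rough sketch of that polling, assuming the emulator exposes its RAM as a byte buffer and you have already scanned for the score's offset (the address and layout below are made up):

    SCORE_ADDR = 0x1FC0                    # illustrative offset found by memory scanning

    def read_score(ram):
        # assumes the score is a 16-bit little-endian value at SCORE_ADDR
        return int.from_bytes(ram[SCORE_ADDR:SCORE_ADDR + 2], "little")

    # poll once per emulated frame and keep the best value seen
    ram = bytearray(0x10000)
    ram[SCORE_ADDR:SCORE_ADDR + 2] = (1250).to_bytes(2, "little")
    best = 0
    best = max(best, read_score(ram))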
There are two ways this is done in normal games.
Offline games: nothing as complex as pub/sub - that's massive overkill. Instead you just use a big map / dictionary, and log named "events". Then every X frames, or Y seconds (or, usually: "every time something dies, and 1x at end of level"), you iterate across achievements and do a quick check. When the designers want a new event logged, it's trivial for a programmer to add a line of code to record it.
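Here is a rough sketch of that map/dictionary shape; the event and achievement names are made up for illustration:

    from collections import defaultdict

    event_counts = defaultdict(int)

    def log_event(name, amount=1):            # the one-liner designers ask programmers to add
        event_counts[name] += amount

    ACHIEVEMENTS = {
        "Orc Slayer": lambda c: c["orc_killed"] >= 100,
        "Pacifist":   lambda c: c["level_finished"] >= 1 and c["enemy_killed"] == 0,
    }

    def check_achievements(award):            # call every X frames / at end of level
        for name, condition in ACHIEVEMENTS.items():
            if condition(event_counts):
                award(name)                   # caller decides how to grant and persist it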
NB: pub/sub is a poor fit for this IME because the designers never want "when player.distance = 50". What they actually want is "when player's distance as perceived by someone watching the screen seems to have travelled past the first village, or at least 4 screen widths to the right" -- i.e. far more vague and abstract than a simple counter.
In practice, that means that the logic goes at the point where the change happens (before the event is even published), which is a poor way to use pub/sub. There are some game engines that make it easier to do a "logic goes at the point of receipt" (the "sub" part), but they're not the majority, IME.
Online games: almost identical, except you store "counters" (ints that only go up), and usually also "deltas" (circular buffers of what's happened frame to frame) and "events" (complex things that happened in game that can be hard-coded into a single ID plus a fixed-size array of parameters). These are then exposed via e.g. SNMP for other servers to collect at low CPU cost and asynchronously.
i.e. almost the same as 1 above, except that you're careful to do two things:
Fixed-size memory usage; and if the "reading" servers go offline for a while, achievements won in that time will need to be re-won (although you usually can have a customer support person manually go through the main system logs and work out that the achievement "probably" was won, and manually award it)
Very low overhead; SNMP is a good standard for this, and most teams I know end up using it
If your game architecture is event-driven, then you can implement the achievement system using finite-state machines.
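For example, a rough sketch of one achievement as a small state machine fed by game events (the event names are made up):

    class WinStreakAchievement:
        """"Win three fights in a row" as a tiny state machine."""
        def __init__(self, target=3):
            self.target, self.streak, self.done = target, 0, False
        def on_event(self, event):             # fed from the game's event system
            if self.done:
                return
            if event == "fight_won":
                self.streak += 1
                if self.streak >= self.target:
                    self.done = True           # award the achievement here
            elif event == "fight_lost":
                self.streak = 0

    ach = WinStreakAchievement()
    for e in ["fight_won", "fight_won", "fight_won"]:
        ach.on_event(e)
    print(ach.done)                            # True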

Can you program a pure GPU game?

I'm a CS master student, and next semester I will have to start working on my thesis. I've had trouble coming up with a thesis idea, but I decided it will be related to Computer Graphics as I'm passionate about game development and wish to work as a professional game programmer one day.
Unfortunately I'm kind of new to the field of 3D computer graphics. I took an undergraduate course on the subject and hope to take an advanced course next semester, and I'm already reading a variety of books and articles to learn more. Still, my supervisor thinks it's better if I come up with a general thesis idea now and then spend time learning about it in preparation for doing my thesis proposal. My supervisor has supplied me with some good ideas, but I'd rather do something more interesting on my own, which hopefully has to do with games and gives me more opportunities to learn more about the field. I don't care if it's already been done; for me the thesis is more of an opportunity to learn about things in depth and to do substantial work on my own.
I don't know much about GPU programming and I'm still learning about shaders and languages like CUDA. One idea I had is to program an entire game (or as much as possible) on the GPU, including all the game logic, AI, and tests. This is inspired by reading papers on GPGPU and questions like this one. I don't know how feasible that is with my knowledge, and my supervisor doesn't know a lot about recent GPUs. I'm sure with time I will be able to answer this question on my own, but it'd be handy if I could know the answer in advance so I could also consider other ideas.
So, if you've got this far, my question: Using only shaders or something like CUDA, can you make a full, simple 3D game that exploits the raw power and parallelism of GPUs? Or am I missing some limitation or difference between GPUs and CPUs that will always make a large portion of my code bound to CPU? I've read about physics engines running on the GPU, so why not everything else?
DISCLAIMER: I've done a PhD, but have never supervised a student of my own, so take all of what I'm about to say with a grain of salt!
I think trying to force as much of a game as possible onto a GPU is a great way to start off your project, but eventually the point of your work should be: "There's this thing that's an important part of many games, but in its present state doesn't fit well on a GPU: here is how I modified it so it would fit well."
For instance, fortran mentioned that AI algorithms are a problem because they tend to rely on recursion. True, but this is not necessarily a deal-breaker: the art of converting recursive algorithms into an iterative form is looked upon favorably by the academic community, and would form a nice centerpiece for your thesis.
However, as a masters student, you haven't got much time so you would really need to identify the kernel of interest very quickly. I would not bother trying to get the whole game to actually fit onto the GPU as part of the outcome of your masters: I would treat it as an exercise just to see which part won't fit, and then focus on that part alone.
But be careful with your choice of supervisor. If your supervisor doesn't have any relevant experience, you should pick someone else who does.
I'm still waiting for a Gameboy Emulator that runs entirely on the GPU, which is just fed the game ROM itself and current user input and results in a texture displaying the game - maybe a second texture for sound output :)
The main problem is that you can't access persistent storage, user input or audio output from a GPU. These parts have to be on the CPU, by definition (cards with HDMI do have audio output, but I think you can't control it from the GPU). Apart from that, you can already push large parts of the game code onto the GPU, but I think it's not enough for a 3D game, since someone has to feed the 3D data to the GPU and tell it which shaders apply to which part. You can't really randomly access data on the GPU or run arbitrary code; someone has to do the setup.
Some time ago, you would just setup a texture with the source data, a render target for the result data, and a pixel shader that would do the transformation. Then you rendered a quad with the shader to the render target, which would perform the calculations, and then read the texture back (or use it for further rendering). Today, things have been made simpler by the fourth and fifth generation of shaders (Shader Model 4.0 and whatever is in DirectX 11), so you can have larger shaders and access memory more easily. But still they have to be setup from the outside, and I don't know how things are today regarding keeping data between frames. In worst case, the CPU has to read back from the GPU and push again to retain game data, which is always a slow thing to do. But if you can really get to a point where a single generic setup/rendering cycle would be sufficient for your game to run, you could say that the game runs on the GPU. The code would be quite different from normal game code, though. Most of the performance of GPUs comes from the fact that they execute the same program in hundreds or even thousands of parallel shading units, and you can't just write a shader that can draw an image to a certain position. A pixel shader always runs, by definition, on one pixel, and the other shaders can do things on arbitrary coordinates, but they don't deal with pixels. It won't be easy, I guess.
I'd suggest just trying out the points I mentioned. The most important one is retaining state between frames, in my opinion, because if you can't retain all the data, everything else is impossible.
First, I'm not a computer engineer, so my assumptions aren't even worth a grain of salt; maybe a nano-scale one.
Artificial intelligence? No problem. There are countless examples of neural networks running in parallel; Google will turn them up. Example: http://www.heatonresearch.com/encog
Pathfinding? Just try some of the parallel pathfinding algorithms that are already on the internet. Here is one: https://graphics.tudelft.nl/Publications-new/2012/BB12a/BB12a.pdf
Drawing? Use the interoperability of DX or GL with CUDA or CL so drawing doesn't cross the PCI-e bus. You can even do ray tracing at corners so there's no z-fighting anymore; even a fully ray-traced screen is doable on a mainstream GPU using a low depth limit.
Physics? The easiest part: just iterate a simple Euler or Verlet integration, with frequent stability checks if the order of error gets big.
Map/terrain generation? You just need a Mersenne Twister and a triangulator.
Save games? Sure, you can compress the data in parallel before writing it to a buffer. Then a scheduler writes that data piece by piece to the HDD through DMA, so there's no lag.
Recursion? Write your own stack using main VRAM rather than local memory, so other kernels can still run in wavefronts and GPU occupancy stays better.
Too much integer work needed? You can cast to float, do 50-100 calculations using all cores, then cast the result back to integer.
Too much branching? Compute both cases if they are simple, so every core stays in step and finishes in sync. If not, you can put in a branch predictor of your own so that next time it predicts better than the hardware (could it?) with your own algorithm.
Too much memory needed? You can add another GPU to the system and open a DMA channel, or use CF/SLI for faster communication.
The hardest part, in my opinion, is the object-oriented design, since building pseudo-objects on a GPU is awkward and hardware dependent. Objects can be represented in host (CPU) memory, but they must be split across many arrays on the GPU to be efficient. Example objects in host memory: orc1xy_orc2xy_orc3xy. Example objects in GPU memory: orc1_x__orc2_x__ ... orc1_y__orc2_y__ ...
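Here is a tiny sketch of that split-into-arrays ("structure of arrays") layout on the CPU side, using numpy; the orc attributes are made up for illustration, and the point is just that each attribute becomes its own contiguous array that maps naturally onto a per-element GPU kernel:

    import numpy as np

    NUM_ORCS = 1024
    orc_x  = np.zeros(NUM_ORCS, dtype=np.float32)   # all x coordinates together
    orc_y  = np.zeros(NUM_ORCS, dtype=np.float32)   # all y coordinates together
    orc_hp = np.full(NUM_ORCS, 100, dtype=np.int32)

    # an update touching only positions streams through two dense arrays,
    # which is exactly what a per-element GPU kernel wants
    orc_x += 0.5
    orc_y += 0.25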
An answer was already accepted 6 years ago, but for those interested in the actual question: Shadertoy, a live-coding WebGL platform, recently added a "multipass" feature allowing preservation of state.
Here's a live demo of the Bricks game running on the GPU.
I don't care if it's already been done, for me the thesis is more of an opportunity to learn about things in depth and to do substantial work on my own.
Then your idea of what a thesis is is completely wrong. A thesis must be original research. --> edit: I was thinking about a PhD thesis, not a master's thesis ^_^
About your question, the GPU's instruction sets and capabilities are very specific to vector floating point operations. The game logic usually does little floating point, and much logic (branches and decision trees).
If you take a look at the CUDA Wikipedia page you will see:
It uses a recursion-free, function-pointer-free subset of the C language
So forget about implementing AI algorithms there that are essentially recursive (like A* for pathfinding). Maybe you could simulate the recursion with stacks, but if it's not explicitly allowed, there is probably a reason. Not having function pointers also somewhat limits the ability to use dispatch tables for handling the different actions depending on the state of the game (again, you could use chained if-else constructions, but something smells bad there).
Those limitations in the language reflect that the underlying hardware is mostly designed for stream-processing tasks. Of course there are workarounds (stacks, chained if-else), and you could theoretically implement almost any algorithm there, but they will probably make performance suffer a lot.
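For what it's worth, here is what "simulate the recursion with stacks" looks like in ordinary code, shown on a simple tree walk; the Node layout is made up for illustration, and on a GPU the stack would live in an array in memory:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        value: int
        children: list = field(default_factory=list)

    def visit_iterative(root):
        stack = [root]                     # explicit stack replaces the call stack
        visited = []
        while stack:
            node = stack.pop()
            visited.append(node.value)
            stack.extend(node.children)    # what the recursive calls would have done
        return visited

    tree = Node(1, [Node(2), Node(3, [Node(4)])])
    print(visit_iterative(tree))           # [1, 3, 4, 2] (depth-first order)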
The other point is handling the I/O; as already mentioned above, this is a task for the main CPU (because it is the one that executes the OS).
It is viable to do a master's thesis on a subject, and with tools, that you are unfamiliar with when you begin. However, it's a big chance to take!
Of course a master's thesis should be fun. But ultimately it's imperative that you pass with distinction, and that might mean tackling a difficult subject that you have already mastered.
Equally important is your supervisor. It's imperative that you tackle some problem they show an interest in - one that they are themselves familiar with - so that they can become interested in helping you get a great grade.
You've had lots of hobby time for scratching itches, and you'll have lots more hobby time in the future too, no doubt. But master's-thesis time is not the time for hobbies, unfortunately.
Whilst GPUs today have immense computational power, they are, regardless of things like CUDA and OpenCL, limited to a restricted set of uses, whereas the CPU is more suited to computing general things, with extensions like SSE to speed up specific common tasks. If I'm not mistaken, some GPUs cannot divide two floating-point numbers in hardware. Certainly things have improved greatly compared to 5 years ago.
It'd be impossible to develop a game that runs entirely on a GPU - it would need the CPU at some stage to execute something. However, making a GPU perform more than just the graphics (and even the physics) of a game would certainly be interesting, with the catch that PC game developers have the big issue of contending with a variety of machine specifications, and thus have to build in backwards compatibility, which complicates things. The architecture of the system will be a crucial issue - for example, the PlayStation 3 can do multiple gigabytes per second of throughput between the CPU and RAM and between the GPU and video RAM, yet the CPU accessing GPU memory peaks out just past 12 MiB/s.
The approach you may be looking for is called "GPGPU" for "General Purpose GPU". Good starting points may be:
http://en.wikipedia.org/wiki/GPGPU
http://gpgpu.org/
Rumors about spectacular successes in this approach have been around for a few years now, but I suspect that this will become everyday practice in a few years (unless CPU architectures change a lot, and make it obsolete).
The key here is parallelism: you need a problem that calls for a large number of parallel processing units. Thus neural networks or genetic algorithms may be a good range of problems to attack with the power of a GPU. Maybe also looking for vulnerabilities in cryptographic hashes (cracking DES on a GPU would make a nice thesis, I imagine :)). But problems requiring high-speed serial processing don't seem well suited to the GPU. So emulating a GameBoy may be out of scope. (But emulating a cluster of low-power machines might be considered.)
I would think a project dealing with a game architecture that targets multi-core CPUs and GPUs would be interesting. I think this is still an area where a lot of work is being done. In order to take advantage of current and future computer hardware, new game architectures are going to be needed. I went to GDC 2008 and there were some talks related to this. Gamebryo had an interesting approach where they create threads for processing computations. You can designate the number of cores you want to use, so that you don't starve out other libraries that might be multi-core. I imagine the computations could be targeted at GPUs as well.
Other approaches included targeting different systems for different cores so that computations could be done in parallel. For instance, the first split a talk suggested was to put the renderer on its own core and the rest of the game on another. There are other more complex techniques, but it all basically boils down to how you get the data around to the different cores.