I keep hearing from associates about grid computing, which, from what I can gather, is highly distributed stuff along the lines of SETI@home.
Is anyone working on these sorts of systems for business use? My interest is in figuring out whether there's a commercial reason for starting software development in this field.
Rendering farms, such as Pixar's
Model evaluation, e.g. weather, financials, military
Architectural engineering, e.g. earthquake simulation.
To list a few.
Grid computing is really only needed if you have a lot of WORK that needs to be done, like folding proteins; otherwise a simple server farm will likely be plenty.
Obviously Google is a major user of grid computing; its entire search service relies on it, as do many of its other services.
Engines such as BigTable are based on using lots of nodes for storage and computation. These are commercially very useful because they're a good alternative to a small number of big servers, providing better redundancy and cost-effective scaling.
The downside is that the software is fiendishly difficult to write, but Google seems to manage that one OK :)
So: anything that requires big storage and/or lots of computation.
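Just to make the "lots of nodes for storage" idea concrete, here is a toy sketch of my own (not anything from Google; real BigTable splits data into range-based tablets rather than hashing) of the simplest way to spread keys across many storage nodes: hash the key and take it modulo the node count.

```c
/* Toy key-to-node mapping: hash the key, pick a node by modulo.
 * NUM_NODES and the sample keys are made up for the illustration. */
#include <stdio.h>

#define NUM_NODES 16   /* pretend we have 16 storage servers */

/* djb2-style string hash. */
static unsigned long hash_key(const char *key)
{
    unsigned long h = 5381;
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h;
}

int main(void)
{
    const char *keys[] = { "user:1001", "user:1002", "page:/index", "img:logo" };
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        printf("%-12s -> node %lu\n", keys[i], hash_key(keys[i]) % NUM_NODES);
    return 0;
}
```

Anything this naive reshuffles almost every key when a node is added or removed, which is why real systems use consistent hashing or range assignment, but it shows the basic shape of spreading data over many machines.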
I used to work for these guys. Grid computing is used all over. Anyone who makes computer chips uses them to test designs before getting physical silicon cut. Financial websites use grids to calculate if you qualify for that loan. These days they are starting to replace big iron in a lot of places, as they tend to be cheaper to maintain over the long term.
I am aware that game engines like Unity, Unreal and CryEngine provide almost all the tools necessary to build a AAA title. They're also the best choice if the game has a tight release date or if you're new to game development. But since they are generalist game engines (meaning they are made to fit multiple genres of games; correct me if I'm wrong), for some games (next-gen games, or games which require a lot of performance) they might leave some performance on the table, something which could be achieved by developing a custom engine.
This brings me to my question,
Do game developers (indie game developers, large teams or even companies) still build game engines from scratch to tailor fit a game or a game franchise?
Thank You!
When we talk about big companies like Ubisoft or Rockstar, they built their own engines and didn't use Unity or Unreal:
Rockstar uses the Rockstar Advanced Game Engine (RAGE)
and Ubisoft uses AnvilNext.
But why?
There are plenty of reasons they do such a thing; I'm going to mention just two, from scremyCat:
support
and licensing.
Support: the highest degree of support and understanding. Since they built it all, they understand all of its internals and can offer complete support. For example, if a game needs feature X, they'll easily know whether they can implement it or not. Another benefit of this is not having to wait on external entities: if there's a game-breaking bug in the engine, they can get right on it, whereas with a third-party engine, depending on the licensing agreement, this might not be possible (though they would typically license the source code anyway).
Licensing: as an indie developer, accepting that you might have to pay a small percentage of your revenue to license the engine might not be much of an issue: the amount you need to break even is unlikely to be very high, chances are you're already in profit by the time your revenue reaches the level where you start paying a percentage, and your total revenue from a game isn't likely to be huge anyway, so the licensing fees you need to pay may seem very reasonable. Meanwhile, a AAA game will have a much higher break-even target and its expected revenue is most definitely in the tens to hundreds of millions, which means they'd be paying a large amount in licensing fees. It should be said that they usually get much better licensing deals to begin with than an indie dev gets, but they're still paying huge amounts.
As for timeframe, it can take years to fully develop an engine of their scale. That's often why you'll see them using the same version of the engine for a good cycle of games while working on the next version. And as for what's involved: a LOT. They need to handle every platform they'll be targeting, the rendering, the physics, the AI, the audio, the input, the file system access, the asset management pipeline, the tools, etc.
How are they better than current popular engines? They aren't necessarily better for other developers, but given their own reasons for doing it, they are better for them. The simplest answer for how they can be better is that when you're creating your own engine from scratch, you can do whatever you want.
It should also be said that developing your own engine isn't limited to large game companies; a number of smaller developers do this too. The more popular reasons are typically that they enjoy it, or that they need some functionality that isn't available in existing options. E.g. while you can create many games with Unity or Unreal, there are plenty of things which just aren't feasible, or which would take considerable work to even make possible. That can be reason enough for a smaller dev to build their own engine.
Yes, they absolutely do. Nintendo is a good example.
I am currently working on an existing infrastructure where I have about 1,000 customer sites connected to about 5 different hubs. A customer site may connect to one or two hubs to ensure reliability, but each customer site is connected to at least one hub. I want to check whether the current system is the best possible, or whether it can be optimised to give better connections from customer sites to hubs, to help improve connectivity and reliability. Can you suggest good optimisation algorithms to look into? Thank you.
Sounds like you're doing some variation of the Facility Location Problem.
This is a well-known problem, and while there are algorithms that can solve for the global optimum (Dijkstra's algorithm, or other variants of dynamic programming), they do not scale well (i.e. you run into the curse of dimensionality). You could try this, but 1,000 sites already sounds pretty big (it depends on your exact problem formulation, though).
I'd recommend taking a look at the Coursera MOOC Discrete Optimization. You don't have to take the whole course, but in the "Assignments" section of the video lectures the instructor also explains a variant of the facility location problem and some possible approaches to think about; once you've decided which one you want to use, you can look deeper into that particular approach.
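To make that concrete, here is a minimal greedy sketch (my own illustration, not from the course or the answer above) of the assignment flavour of the problem: each site is attached to its cheapest hub that still has spare capacity. The cost matrix, the CAPACITY limit and the sizes are all made up for the example; a serious attempt would use a MIP solver or local search as the course describes.

```c
/* Greedy site-to-hub assignment sketch.  cost[s][h] is a hypothetical
 * connection cost (distance, latency, ...) and CAPACITY is a made-up
 * per-hub limit.  Purely illustrative data and scale. */
#include <stdio.h>

#define NUM_SITES 8      /* ~1,000 in the real problem */
#define NUM_HUBS  3      /* ~5 in the real problem     */
#define CAPACITY  4      /* assumed per-hub limit      */

int main(void)
{
    double cost[NUM_SITES][NUM_HUBS] = {
        {1, 5, 9}, {2, 4, 8}, {7, 1, 3}, {6, 2, 5},
        {9, 8, 1}, {4, 6, 2}, {3, 7, 6}, {8, 3, 4},
    };
    int load[NUM_HUBS] = {0};

    for (int s = 0; s < NUM_SITES; s++) {
        int best = -1;
        for (int h = 0; h < NUM_HUBS; h++) {
            /* Pick the cheapest hub that still has spare capacity. */
            if (load[h] < CAPACITY && (best < 0 || cost[s][h] < cost[s][best]))
                best = h;
        }
        load[best]++;
        printf("site %d -> hub %d (cost %.1f)\n", s, best, cost[s][best]);
    }
    return 0;
}
```

A greedy pass like this gives you a feasible baseline quickly; comparing its total cost against your current layout is a cheap sanity check before investing in heavier optimisation.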
I am looking for a project idea in distributed processing on Unix based systems. I wish to use only the C programming language. I have to finish the project in 4 months and it's a part of my course work. Can someone help me with an idea?
Cryptography problems
Distributed Ray Tracer
Chess AI (really, AI for any game)
Large Prime Number Search
Web crawler or other search mechanism
Generic Problem Solver (push out problem definition on the fly, followed by problem data).
Note on the last one:
An example would be if you have a gaming website with lots of board games that you were bringing out all the time. You don't want to have to install new clients on all your servers every time you write a new AI for a board game, so you have a program to which you can send new AIs; after that you can just send the game data, and the pushed AI will be used to solve the problem. This is best suited to problems which can be broken into smaller chunks.
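A minimal way to prototype that "push the AI, then push the data" idea in C on Unix is a plugin-loading worker built on dlopen/dlsym. The sketch below is my own illustration: the solve() entry point and the convention of passing the plugin path and data on the command line are made-up, and a real version would receive both over a socket.

```c
/* Worker that loads a freshly pushed solver as a shared object,
 * then feeds it problem data. */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*solve_fn)(const char *problem_data);

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <plugin.so> <problem-data>\n", argv[0]);
        return 1;
    }

    /* In a real system the .so and the data would arrive over the network;
     * here they are just command-line arguments. */
    void *plugin = dlopen(argv[1], RTLD_NOW);
    if (!plugin) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    solve_fn solve = (solve_fn)dlsym(plugin, "solve");
    if (!solve) {
        fprintf(stderr, "plugin has no solve(): %s\n", dlerror());
        dlclose(plugin);
        return 1;
    }

    int result = solve(argv[2]);
    printf("solver returned %d\n", result);

    dlclose(plugin);
    return 0;
}
```

A plugin would just be built with something like `cc -shared -fPIC -o tictactoe.so tictactoe.c` and define `int solve(const char *problem_data)` (link the worker with `-ldl` on Linux).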
It is hard to answer without knowing anything about performance, the scale of the project, what you are trying to accomplish, etc. For example, is it one task or multiple tasks? Is the project just totally open?
4 months is pretty short, but maybe some kind of physics problem or math problem. Sorting or some kind of database work might be dull but beneficial.
Check out mapreduce for ideas! I was really motivated by this work, personally.
We used distributed processing here at work, but it's such a broad field..
Yeah.
Why not write a distributed compiler? You could then present an interface for people to compile things on the fly, and the work would be passed to your distributed compile network. Java is probably well-suited, and you'd get to do fun things, like being very mindful of security and so on.
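Although that answer suggests Java, the same worker idea can be sketched in C to match the C-only requirement in the question. This is my own toy illustration of a single compile-farm node: a real version would receive the source over the network and ship the object file back, whereas here the "job" is just a local file named on the command line and `cc` is assumed to be on the PATH.

```c
/* One node of a toy compile farm: fork/exec the system compiler on a job. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Compile one translation unit; return the compiler's exit status. */
static int compile_job(const char *source, const char *object)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: exec the system compiler. */
        execlp("cc", "cc", "-c", source, "-o", object, (char *)NULL);
        perror("execlp");            /* only reached if exec failed */
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);        /* parent: wait for the compile to end */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file.c> <file.o>\n", argv[0]);
        return 1;
    }
    int rc = compile_job(argv[1], argv[2]);
    printf("compile of %s finished with status %d\n", argv[1], rc);
    return rc == 0 ? 0 : 1;
}
```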
The BOINC project is always looking for help and is very interesting:
http://boinc.berkeley.edu/
If you want to leave your mark and change the way we search the web, look into B-Trees.
B-Trees and their variants are the workhorse of the internet.
Google uses them extensively to index the web.
Database indexes are B-Tree variants.
Every LAMP system uses a database and indexes.
They are also used extensively in distributed VLDBs (Very Large Databases).
Perhaps you can improve existing distributed databases such as Cassandra and HBase.
These are lofty goals, but for me, this would leave a lasting mark on the way web data is processed, indexed and stored.
Write a distributed, fault-tolerant, redundant network B+Tree or B*Tree.
Read Drozdek's book Data Structures and Algorithms in C++.
It's a good survey of B-Trees.
Read about skip trees
http://www.cs.huji.ac.il/~ittaia/papers/AAY-OPODIS05.pdf
Read about Efficient B-tree Based Indexing for Cloud Data Processing
http://www.comp.nus.edu.sg/~ooibc/vldb10-cgindex.pdf
Google search "Network B+Tree"
https://www.google.com/search?rlz=1C1CHKZ_enUS431US431&sourceid=chrome&ie=UTF-8&q=Network+B%2BTree
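For a sense of what the data structure in that answer looks like before any of the distributed or fault-tolerant parts are added, here is a bare-bones in-memory B-Tree node and search in C. This is my own toy sketch; a real network B+Tree would add node splitting, disk/network layout, replication and so on.

```c
/* Minimal in-memory B-Tree node plus key search. */
#include <stdio.h>

#define MAX_KEYS 4              /* toy order; real trees use hundreds */

typedef struct BTreeNode {
    int  nkeys;                            /* number of keys in use  */
    int  keys[MAX_KEYS];                   /* sorted keys            */
    struct BTreeNode *child[MAX_KEYS + 1]; /* NULL in leaf nodes     */
} BTreeNode;

/* Return 1 if key is present in the subtree rooted at node. */
static int btree_search(const BTreeNode *node, int key)
{
    if (node == NULL)
        return 0;

    int i = 0;
    while (i < node->nkeys && key > node->keys[i])
        i++;                              /* find first key >= search key */

    if (i < node->nkeys && node->keys[i] == key)
        return 1;                         /* found in this node           */

    return btree_search(node->child[i], key);  /* descend                 */
}

int main(void)
{
    /* Hand-built two-level tree:  root [20 | 40] over three leaves. */
    BTreeNode leaf1 = { 2, {5, 10},  {NULL} };
    BTreeNode leaf2 = { 2, {25, 30}, {NULL} };
    BTreeNode leaf3 = { 2, {45, 50}, {NULL} };
    BTreeNode root  = { 2, {20, 40}, {&leaf1, &leaf2, &leaf3} };

    printf("30 present? %d\n", btree_search(&root, 30));   /* prints 1 */
    printf("31 present? %d\n", btree_search(&root, 31));   /* prints 0 */
    return 0;
}
```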
We've got a fairly large application running on VxWorks 5.5.1 that's been developed and modified for around 10 years now. We have some simple home-grown tools to show that we are not using too much memory or too much processor, but we don't have a good feel for how much headroom we actually have. It's starting to make it difficult to do estimates for future enhancements.
Does anybody have any suggestions on how to profile such a system? We've never had much luck getting the Wind River tools to work.
For bonus points: the other complication is that our system has very different behaviors at different times; during start-up it does a lot of work, then it sits relatively idle except for brief bursts of activity. If there is a profiler with some programmatic way to have it record state information, I think that'd be very useful too.
FWIW, this is compiled with GCC and written entirely in C.
I've done a lot of performance tuning of various kinds of software, including embedded applications. I won't discuss memory profiling - I think that is a different issue.
I can only guess where the "well-known" idea originated that to find performance problems you need to measure performance of various parts. That is a top-down approach, similar to the way governments try to control budget waste, by subdividing. IMHO, it doesn't work very well.
Measurement is OK for seeing if what you did made a difference, but it is poor at telling you what to fix.
What is good at telling you what to fix is a bottom-up approach, in which you examine a representative sample of the microscopic units of what is being spent and find out the full explanation of why each one is being spent. This works for a simple statistical reason: if there is a reason why some percentage (for example 40%) of samples could be saved, then on average 40% of samples will show it, and it doesn't require a huge number of samples. It does require that you examine each sample carefully, and not just aggregate them into bigger bunches.
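To put rough numbers on that (my own back-of-envelope illustration, not the answerer's): if an avoidable activity accounts for a fraction p of total time, each random sample lands on it with probability p, so n samples yield about n × p hits on average. With p = 0.4, taking just 10 samples should show the same avoidable activity around 4 times, which is already plenty to identify it.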
As a historical example, this is what Harry Truman did at the outbreak of the U.S. involvement in WW II. There was terrific waste in the defense industry. He just got in his car, drove out to the factories, and interviewed the people standing around. Then he went back to the U.S. Senate, explained what the problems were exactly, and got them fixed.
Maybe this is more of an answer than you wanted. Specifically, this is the method I use, and this is a blow-by-blow example of it.
ADDED: I guess the idea of finding-by-measuring is simply natural. Around '82 I was working on an embedded system, and I needed to do some performance tuning. The hardware engineer offered to put a timer on the board that I could read (providing from his plenty). IOW he assumed that finding performance problems required timing. I thanked him and declined, because by that time I knew and trusted the random-halt technique (done with an in-circuit-emulator).
If you have the auxiliary clock available, you could use the SPY utility (configurable via the config.h file), which gives you a very rough approximation of which tasks are using the CPU.
The nice thing about it is that it does not require being attached to the Tornado environment and you can use it from the Kernel shell.
Otherwise, btpierre's suggestion of using taskHookAdd has been used successfully in the past.
I've worked on systems that have had luck using locally-built monitoring utilities based on taskSwitchHookAdd and related functions (delete hook, etc).
"Simply" use this to track the number of ticks a given task runs. I realize that this is fairly gross scale information for profiling, but it can be useful depending on your needs.
To see how much cpu% each task is using, calculate the percentage of ticks assigned to each task.
To see how much headroom you have, add a lowest priority "idle" task that just does "while(1){}", and see how much cpu% it is assigned to it. Roughly speaking, that's your headroom.
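Here is a rough sketch of what that taskSwitchHookAdd bookkeeping might look like, based on my reading of the VxWorks 5.x taskHookLib/tickLib APIs; treat the exact headers and hook signature as assumptions to check against your BSP, not a verified recipe. The idea is to charge the ticks elapsed since the previous context switch to the task being switched out, then dump the table from the kernel shell.

```c
/* Per-task tick accounting via a task switch hook (sketch). */
#include <vxWorks.h>
#include <taskLib.h>
#include <taskHookLib.h>
#include <tickLib.h>
#include <stdio.h>

#define MAX_TRACKED 64              /* arbitrary table size for the sketch */

static struct { int tid; ULONG ticks; } tickTable[MAX_TRACKED];
static ULONG lastSwitchTick;

static void chargeTicks(int tid, ULONG ticks)
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tickTable[i].tid == tid || tickTable[i].tid == 0) {
            tickTable[i].tid    = tid;
            tickTable[i].ticks += ticks;
            return;
        }
    }
}

/* Called by the kernel on every task switch (assumed old/new TCB args). */
static void switchHook(WIND_TCB *pOldTcb, WIND_TCB *pNewTcb)
{
    ULONG now = tickGet();
    (void)pNewTcb;                           /* unused in this sketch */
    chargeTicks((int)pOldTcb, now - lastSwitchTick);
    lastSwitchTick = now;
}

/* Call from the shell: start accounting. */
void profStart(void)
{
    lastSwitchTick = tickGet();
    taskSwitchHookAdd((FUNCPTR)switchHook);
}

/* Call from the shell: dump per-task tick counts (compute %s offline). */
void profReport(void)
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++)
        if (tickTable[i].tid != 0)
            printf("%-20s %lu ticks\n",
                   taskName(tickTable[i].tid), tickTable[i].ticks);
}
```

With the lowest-priority while(1) idle task from the answer above also running, its share of the ticks in this table is, roughly speaking, your headroom.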
I have custom coded several enterprise applications for mid to large organizations to use internally (some with a minimal external footprint). I now have plans for a web project that may (hopefully) see a large userbase with more daily traffic than my previous projects have ever attained. Obviously I want my design to be scalable and maintainable. The problem is that from a physical layout perspective (servers/VMs) I do not know what to expect.
The question: What are some good resources for this? Books? Websites? I have found plenty on scalable application design, but nothing on scalable physical design.
It's hard to give an exact answer without knowing something about which technologies you plan to use. The approach to the application can't be completely unaware of the planned physical infrastructure if scaling is a major driver.
Caching would have to be a big concern, as would ways to expand the hardware where your data lives.
A very interesting and instructive read is the real-world story of LiveJournal: a history of scaling, and of how they grew their physical presence along with massive growth in their website. One major offshoot of their work was a new caching technology, memcached, which is now used by Facebook among others. It is surprisingly honest.
The High Scalability blog is good. You can look at some of their examples that go over the physical parts of large sites. I would say the most common first-level physical scaling technique is a load balancer. That is pretty easy, but even at its simplest you still have a database that is a potential bottleneck. Most of the physical parts of scaling just require you to add more of something; the real issues come in where you are forced to use just one of something.