Protection against automation - automation

One of our next projects is supposed to be a MS Windows based game (written in C#, with a winform GUI and an integrated DirectX display-control) for a customer who wants to give away prizes to the best players. This project is meant to run for a couple of years, with championships, ladders, tournaments, player vs. player-action and so on.
One of the main concerns here is cheating, as a player would benefit dramatically if he was able to - for instance - let a custom made bot play the game for him (more in terms of strategy-decisions than in terms of playing many hours).
So my question is: what technical possibilites do we have to detect bot activity? We can of course track the number of hours played, analyze strategies to detect anomalies and so on, but as far as this question is concerned, I would be more interested in knowing details like
how to detect if another application makes periodical screenshots?
how to detect if another application scans our process memory?
what are good ways to determine whether user input (mouse movement, keyboard input) is human-generated and not automated?
is it possible to detect if another application requests informations about controls in our application (position of controls etc)?
what other ways exist in which a cheater could gather informations about the current game state, feed those to a bot and send the determined actions back to the client?
Your feedback is highly appreciated!

I wrote d2botnet, a .net diablo 2 automation engine a while back, and something you can add to your list of things to watch out for are malformed /invalid/forged packets. I assume this game will communicate over TCP. Packet sniffing and forging are usually the first way games (online anyways) are automated. I know blizzard would detect malformed packets, somehting i tried to stay away from doing in d2botnet.
So make sure you detect invalid packets. Encrypt them. Hash them. do somethign to make sure they are valid. If you think about it, if someone can know exactly what every packet means that is sent back and forth they dont even need to run the client software, which then makes any process based detection a moot point. So you can also add in some sort of packet based challenge response that your cleint must know how to respond to.

Just an idea what if the 'cheater' runs your software in a virtual machine (like vmware) and makes screenshots of that window? I doubt you can defend against that.
You obviously can't defend against the 'analog gap', e.g. the cheater's system makes external screenshots with a high quality camera - I guess it's only a theoretical issue.
Maybe you should investigate chess sites. There is a lot of money in chess, they don't like bots either - maybe they have come up with a solution already.

The best protection against automation is to not have tasks that require grinding.
That being said, the best way to detect automation is to actively engage the user and require periodic CAPTCHA-like tests (except without the image and so forth). I'd recommend utilizing a database of several thousand simple one-off questions that get posed to the user every so often.
However, based on your question, I'd say your best bet is to not implement the anti-automation features in C#. You stand very little chance of detecting well-written hacks/bots from within managed code, especially when all the hacker has to do is simply go into ring0 to avoid detection via any standard method. I'd recommend a Warden-like approach (download-able module that you can update whenever you feel like) combined with a Kernel-Mode Driver that hooks all of the windows API functions and watches them for "inappropriate" calls. Note, however, that you're going to run into a lot of false positives, so you need to not base your banning system on your automated data. Always have a human look over it before banning.

A common method of listening to keyboard and mouse input in an application is setting a windows hook using SetWindowsHookEx.
Vendors usually try to protect their software during installation so that hacker won't automate and crack/find a serial for their application.
Google the term: "Key Loggers"...
Here's an article that describes the problem and methods to prevent it.

I have no deeper understanding on how PunkBuster and such softwar works, but this is the way I'd go:
Iintercept calls to the API functions that handle the memory stuff like ReadProcessMemory, WriteProcessMemory and so on.
You'd detect if your process is involved in the call, log it, and trampoline the call back to the original function.
This should work for the screenshot taking too, but you might want to intercept the BitBlt function.
Here's a basic tutorial concerning the function interception:
Intercepting System API Calls

You should look into what goes into Punkbuster, Valve Anti-Cheat, and some other anti-cheat stuff for some pointers.
Edit: What I mean is, look into how they do it; how they detect that stuff.

I don't know the technical details, but Intenet Chess Club's BlitzIn program seems to have integrated program switching detection. That's of course for detecting people running a chess engine on the side and not directly applicable to your case, but you may be able to extrapolate the apporach to something like if process X takes more than Z% CPU time the next Y cycles, it's probably a bot running.
That in addition to a "you must not run anything else while playing the game to be eligible for prizes" as part of the contest rules might work.
Also, a draconian "we might decide in any time for any reason that you have been using a bot and disqualify you" rule also helps with the heuristic approach above (used in prized ICC chess tournaments).
All these questions are easily solved by the rule 1 above:
* how to detect if another application makes periodical screenshots?
* how to detect if another application scans our process memory?
* what are good ways to determine whether user input (mouse movement, keyboard input) is human-generated and not automated?
* is it possible to detect if another application requests informations about controls in our application (position of controls etc)?
I think a good way to make harder the problem to the crackers is to have the only authoritative copies of the game state in your servers, only sending to and receiving updates from the clients, that way you can embed in the communication protocol itself client validation (that it hasn't been cracked and thus the detection rules are still in place). That, and actively monitoring for new weird behavior found might get you close to where you want to be.

Related

A Conceptual Understanding of APIs

I have been learning coding for about a month now. I have some good experience with Python, and additionally I have completed this web development course on Udacity.
Now, I have a good foundation for programming, but one thing that confused me a lot is how to interact with various websites and APIs. The course I did briefly touched on this in terms of XML and JSON files and how some webpages offer their pages in these formats for easier reading by other machines.
But there are still a bunch of tasks which I have no idea how to approach whatsoever, but want to eventually do. I have constructed some hypothetical examples for the purpose of this question. I will post my current rough understanding of how I would do them below each one, and would appreciate feedback (on the API interaction, not on the front-end or on any back-end algorithms/AI/parsing):
Creating a phone application (disregarding the front-end part) which can then communicate with and perform rudimentary tasks on my computer.
I have no idea how to do this, and my guess would be that I would have to look into some external application/API meant for this process and implement this on both-ends of the system.
Being able to write a bot which goes on to a game website and controls the object via script. (e.g going onto a pacman game website written in flash and automatically controlling the character to avoid the ghosts)
I don't even know if this is possible, or how browser flash games interact handle the user-server interaction. Would I have to post some data via HTTP manually in the same way that playing in the keyboard would do? Or is everything done client side, in which case how would I fake user input? How would I get information on the ghost's position to work the AI?
Creating a mobile app for my school by allowing users to put their username and password into the app and then having the app automatically log in to the school and fetch certain data (e.g timetable) and return back in a readable form.
I'm guessing that I would take the input from the user on my mobile-app, and then navigate to the school's login page and POST this data in the relevant forms to log in. And then that I would (somehow, not sure), navigate to the timetable URL through my code while still managing to stay logged in, and then parse the html there?
I would appreciate some detail on how these kind of things are done, preferably with reference to these examples, so that I can get a better conceptual understanding.
Thanks!
Note: I have asked all those various questions mostly rhetorically, just so that those reading can get a better understanding of what my current programming level and understanding of APIs is at. I do not expect nor require specific answers for each and every question (so I hope this doesn't get flagged as being too vague or requiring too much detail!), I just would appreciate some responses telling me roughly how each of these APIs work approximately and how I would even start at looking at how to do these things.
You asked too many questions and honestly speaking I am not able to read and grasp entire text posted by you.
So, I am focusing only the title of your question:
"A conceptual understanding of API"
API (Application Programming Interface) means a set of functions which you can directly use by simply passing parameters to them.
Actually, in application development there are many common functions which every application programmer have to use. So, instead of coding them every time by every programmer, they are already coded in functions which you can use simply by passing parameters to them (if they need any external parameter).
Example:
I am offering you a maths API, set of functions {add, sub, mul, div}. You can pass two numbers to any of these four functions and get desired result instead of coding every time for ever operation like add, sub, mul and div.
Hope it helps...

Hardware Serial Number Discussion. Licence protection

I am working on some application wich will get HDD serial number and then i will use that HDD serial number for licence (cd-key) registration with product. Now the problems wich i can come to:
User have 2 HDD's and once my application gets its serial from first HDD it will register with it so what if user later changes order of HDD's? if the seccond HDD becomes a Master and the first one becomes slave? could be solved with getting both and combine them togather but what if later he removes one then? :D
What if user's HDD dies and he buys new one? Is still same pc only another HDD. So the licence wont be walid anymore just because is another HDD.
Is it possible to fake it? Example i am using VB.net 2010 and application is working on framework(.net) so there is some "dll" wich is responsible to get the serial of HDD so would be possible to replace this "dll" (crack it) so it returns some hardcoded serial of hdd?!?
Could be possible to get processor serial? that would be much batter but could it be done? and does the processor have serial, i mean probably have but is it possible to get it? and same question as abowe could it be faked through changing "dll" or something?
anny other suggestions or experiances?
I seen there are more questions like this but couldnt find some answers so now i ask here!
------ EDITED/ADDED: -------
As talked below i forgot all .net can be decompiled in few secconds! so...
Making own installer. Why?
if i make an installer in wich you enter serial and only if serial is ok to use then install software so what it does? it extracts my software to your computer and again you have ".net" exe wich you can easely decompile and make a crack for it so where is point in making installation with serial!? or if my software is "protected" with some obfuscator so then installation with serial is unneded here i could then simply include serial registration in my software and using some booleans store registered=1||0
i got email from one person here, btw. duno where you got my mail :) and he says some smart things and why some of you people dont respond to my question and this discussion and what he says is this: "scared that others will see my code and how bad it is." so then people just dont want to spent time on this. well thats not problem i know my code is big "minestrone", big mess much words(variables) some on english some on croatian so on well my software is working thats important and i know i suc* we all suc* everyone knows something(more or batter) that the other one. anyway, thats not problem, problem is that i dont want that the software is open source lets say my software is "photoshop" and now someone downloads click there and there and have the whole code and can easely copy paste change few things and no problem he made good application :)
custom compiler? anyone have experiances? would it be ok for some time? :)
what other solution or language would be good to use in future to avoid this "open source" .net! i been looking around so for vb.net, c#, c++ is all based on .net so is all same. vb6 wich i love again same thing. they all can easely be decompiled! what language could not be so easy to decompile? should i switch to assembler? :D i joke, i hope! :p
maybe i just too much stressed up, much work! duno you decide :)
PLEASE READ MY QUESTION AND PLEASE DONT ANSWER ME SOMETHING LIKE "PIRACY CANOT BE STOPED BLA BLA" AND THINGS LIKE THAT. THAT WASNT MY QUESTION! THANK YOU!
Sorry on bold big latters but some people read just title and then answer stupidities! If you want talk about it then read question and write otherwise dont post some stupidities please
Let me first answer your questions:
If the order of the HDDs changes, your application could still find that serial number within the system. However, in either case I would resort to a scheme where I use the device of the system partition or so.
If the HDD dies, the user will be in trouble. There is no good solution to that as long as you insist on your source for the uniqueness of the user's system: i.e. the HDD serial.
It's absolutely possible, yes. At different levels, though. A cracker would always choose the simplest method.
Yes. I'm afraid that will only work with unmanaged code, though. See Wikipedia. And yes, this could be circumvented again by DLL placement (see my comment on the question).
Now let me give you an advice that worked fine for me. Use the SID of the machine account (not to be confused with SYSTEM, which has a well-known SID). And before you counter with NewSid (which, by the way has been retired by MS), this is much more effort to change, especially in domain environments and can have very nasty and unforeseen effects. Therefore if you want to tie your application to a Windows installation, the SID will be sufficient. The SID has the same advantages as a UUID you could create, but it's not as easy to manipulate as a UUID that you store in the registry or a file.
Oh, and before I forget to mention it. Yes, even using the SID can be "cracked" in various ways. But it balances convenience for the user with your demand for security.
Yes, you have to be aware of that. You'll need several fall back methods to take care of this
You have to be aware of that as well.
Everything is fakeable with some energy behind it. However, why fake such an id if you simply can manipulate the program itself? All .net code can be disassembled and manipulated
I think this is possible as well, but would have the same problem behind it.
Other suggestion:
Just because there is piracy, don't make the experience bad for your customer. Use something that is reuseable (like a serial number or keyfile), invest in a good obfuscator to make it harder for somebody to inspect your code, but beyond all: Make your application stand out so people buy it. And even though you didn't ask for it, I have to say it - you can't stop piracy by enforcing orwellian-like surveillance of your program. This will drive customers away as it is a pain in the *ss to work with your application. With a serial or keyfile you still have some sort of protection, the customer likes it because it is easy to use, he doesn't have to call you/write a support ticket if his computer fails or the stars align unfavourable. Pirates will break it eventually, but your customer is happy, and that is what counts.
Anything you rely on which is in userland can and will be spoofed if it is worthwhile to the end user/attacker. So locking the licence to an HDD serial number will not put of attackers, but it will seriously upset your customers.
The same goes for processor serial numbers - it is too easy to pop some code inline to change what your application will read.
Your only reasonable bet will be dongles - ie specific hardware, or a way to get them to register and run with an online connection, so you can validate them using elements you control (although in saying that, if your app is high enough value, expect the dongle to be hacked/replicated too!)
Your biggest problem may be overdoing the security - if you get it wrong in any way you will alienate your customer base.
People regularly upgrade failed hard drives, or those which are too small, as well as most other components in their computers. If you stop them using your product, even for a couple of days, they are likely to look elsewhere!
You can do what you are suggesting, but there are issues. What you are suggesting is called "machine binding" in the licensing world. There are commercial tools that do this for you (disclaimer: I work for one such provider Wibu-Systems). What YOU are proposing has some pros and cons:
Pros: requires no separate hardware (dongle), you can roll your own, easy solution to implement at a basic level.
Cons: can be cracked in a matter of minutes, will create problems for users when they change the HW config or move the app to a new PC, rolling your own will introduce the oppty for new bugs in an area you apparently have no prior experience with.
Why not use a commercial solution? Would you write your own setup program, too? How about your own compiler, linker, and debugger?

Why use AMQP/ZeroMQ/RabbitMQ

as opposed to writing your own library.
We're working on a project here that will be a self-dividing server pool, if one section grows too heavy, the manager would divide it and put it on another machine as a separate process. It would also alert all connected clients this affects to connect to the new server.
I am curious about using ZeroMQ for inter-server and inter-process communication. My partner would prefer to roll his own. I'm looking to the community to answer this question.
I'm a fairly novice programmer myself and just learned about messaging queues. As i've googled and read, it seems everyone is using messaging queues for all sorts of things, but why? What makes them better than writing your own library? Why are they so common and why are there so many?
what makes them better than writing your own library?
When rolling out the first version of your app, probably nothing: your needs are well defined and you will develop a messaging system that will fit your needs: small feature list, small source code etc.
Those tools are very useful after the first release, when you actually have to extend your application and add more features to it.
Let me give you a few use cases:
your app will have to talk to a big endian machine (sparc/powerpc) from a little endian machine (x86, intel/amd). Your messaging system had some endian ordering assumption: go and fix it
you designed your app so it is not a binary protocol/messaging system and now it is very slow because you spend most of your time parsing it (the number of messages increased and parsing became a bottleneck): adapt it so it can transport binary/fixed encoding
at the beginning you had 3 machine inside a lan, no noticeable delays everything gets to every machine. your client/boss/pointy-haired-devil-boss shows up and tell you that you will install the app on WAN you do not manage - and then you start having connection failures, bad latency etc. you need to store message and retry sending them later on: go back to the code and plug this stuff in (and enjoy)
messages sent need to have replies, but not all of them: you send some parameters in and expect a spreadsheet as a result instead of just sending and acknowledges, go back to code and plug this stuff in (and enjoy.)
some messages are critical and there reception/sending needs proper backup/persistence/. Why you ask ? auditing purposes
And many other use cases that I forgot ...
You can implement it yourself, but do not spend much time doing so: you will probably replace it later on anyway.
That's very much like asking: why use a database when you can write your own?
The answer is that using a tool that has been around for a while and is well understood in lots of different use cases, pays off more and more over time and as your requirements evolve. This is especially true if more than one developer is involved in a project. Do you want to become support staff for a queueing system if you change to a new project? Using a tool prevents that from happening. It becomes someone else's problem.
Case in point: persistence. Writing a tool to store one message on disk is easy. Writing a persistor that scales and performs well and stably, in many different use cases, and is manageable, and cheap to support, is hard. If you want to see someone complaining about how hard it is then look at this: http://www.lshift.net/blog/2009/12/07/rabbitmq-at-the-skills-matter-functional-programming-exchange
Anyway, I hope this helps. By all means write your own tool. Many many people have done so. Whatever solves your problem, is good.
I'm considering using ZeroMQ myself - hence I stumbled across this question.
Let's assume for the moment that you have the ability to implement a message queuing system that meets all of your requirements. Why would you adopt ZeroMQ (or other third party library) over the roll-your-own approach? Simple - cost.
Let's assume for a moment that ZeroMQ already meets all of your requirements. All that needs to be done is integrating it into your build, read some doco and then start using it. That's got to be far less effort than rolling your own. Plus, the maintenance burden has been shifted to another company. Since ZeroMQ is free, it's like you've just grown your development team to include (part of) the ZeroMQ team.
If you ran a Software Development business, then I think that you would balance the cost/risk of using third party libraries against rolling your own, and in this case, using ZeroMQ would win hands down.
Perhaps you (or rather, your partner) suffer, as so many developers do, from the "Not Invented Here" syndrome? If so, adjust your attitude and reassess the use of ZeroMQ. Personally, I much prefer the benefits of Proudly Found Elsewhere attitude. I'm hoping I can proud of finding ZeroMQ... time will tell.
EDIT: I came across this video from the ZeroMQ developers that talks about why you should use ZeroMQ.
what makes them better than writing your own library?
Message queuing systems are transactional, which is conceptually easy to use as a client, but hard to get right as an implementor, especially considering persistent queues. You might think you can get away with writing a quick messaging library, but without transactions and persistence, you'd not have the full benefits of a messaging system.
Persistence in this context means that the messaging middleware keeps unhandled messages in permanent storage (on disk) in case the server goes down; after a restart, the messages can be handled and no retransmit is necessary (the sender does not even know there was a problem). Transactional means that you can read messages from different queues and write messages to different queues in a transactional manner, meaning that either all reads and writes succeed or (if one or more fail) none succeeds. This is not really much different from the transactionality known from interfacing with databases and has the same benefits (it simplifies error handling; without transactions, you would have to assure that each individual read/write succeeds, and if one or more fail, you have to roll back those changes that did succeed).
Before writing your own library, read the 0MQ Guide here: http://zguide.zeromq.org/page:all
Chances are that you will either decide to install RabbitMQ, or else you will make your library on top of ZeroMQ since they have already done all the hard parts.
If you have a little time give it a try and roll out your own implemntation! The learnings of this excercise will convince you about the wisdom of using an already tested library.

What methods do you use to test for scalability in web applications?

Our testing system is pretty rudimentary; fire up a browser, see if it works. Recently we ran into problems, found by our client, with our application where the number of users created a slow-down in the application. The application is basically a huge Word document with people editing their own versions all at the same time. Part of the problem came from not knowing how to test multiple instances at the same time. My partner and I thought about how to test this; one idea was to hire out an internet cafe and hire students for an hour to bang on the app.
What are other ways that people have tried to emulate concurrency in testing their web-based application? Most of the advice here is for specific methodology; I'm asking, how do you test it to make sure that it works?
If you have never checked out Selenium, then you need to. It will allow you to do automated web testing through the browser. Ok, so first problem solved.
Now ideally you could use that same script and load it up on a bunch of boxes and run them all at once to get some sort of load testing right? Luckily for you someone has already figured this out, although it is a paid service: Browser Mob. But, it looks like you were willing to spend a little money to do this anyway, and would probably net you better, more repeatable results.
We usually answer the question "can the web application do more than one thing at a time" by using JMeter to produce a simulated HTTP load on the web server.
I find that it helps to consider distinguish several different types of testing; concurrency (what happens when two events in the system collide), capacity (what happens when there are many overlapping requests), volume (what happens as data accumulates in the system)...
Huge general slow down, evidenced by response times that fall outside of the SLA, are usually related to capacity problems (with contention as a common cause) or volume (many users, much data, and the system gets slower over time). The former usually requires some sort of multi-threaded request stream; the latter you can usually manage by preloading the volume, and then measuring the response times experienced by a single user.
I generally find that separating the load generator from the actual measurement/instrumentation is a good idea. That can be as simple as having a black box over there to generate a typical load, and sitting here with a stop watch measuring the responsiveness of a typical use case.
JMeter http://jmeter.apache.org/

How to save a program's progress, and resume later?

You may know a lot of programs, e.g some password cracking programs, we can stop them while they're running, and when we run the program again (with or without entering a same input), they will be able to continue from where they have left. I wonder what kind of technique those programs are using?
[Edit] I am writing a program mainly based on recursion functions. Within my knowledge, I think it is incredibly difficult to save such states in my program. Is there any technique, somehow, saves the stack contents, function calls, and data involved in my program, and then when it is restarted, it can run as if it hasn't been stopped? This is just some concepts I got in my mind, so please forgive me if it doesn't make sense...
It's going to be different for every program. For something as simple as, say, a brute force password cracker all that would really need to be saved was the last password tried. For other apps you may need to store several data points, but that's really all there is too it: saving and loading the minimum amount of information needed to reconstruct where you were.
Another common technique is to save an image of the entire program state. If you've ever played with a game console emulator with the ability to save state, this is how they do it. A similar technique exists in Python with pickling. If the environment is stable enough (ie: no varying pointers) you simply copy the entire apps memory state into a binary file. When you want to resume, you copy it back into memory and begin running again. This gives you near perfect state recovery, but whether or not it's at all possible is highly environment/language dependent. (For example: most C++ apps couldn't do this without help from the OS or if they were built VERY carefully with this in mind.)
Use Persistence.
Persistence is a mechanism through which the life of an object is beyond programs execution lifetime.
Store the state of the objects involved in the process on the local hard drive using serialization.
Implement Persistent Objects with Java Serialization
To achieve this, you need to continually save state (i.e. where you are in your calculation). This way, if you interrupt the probram, when it restarts, it will know it is in the middle of calculation, and where it was in that calculation.
You also probably want to have your main calculation in a separate thread from your user interface - this way you can respond to "close / interrupt" requests from your user interface and handle them appropriately by stopping / pausing the thread.
For linux, there is a project named CRIU, which supports process-level save and resume. It is quite like hibernation and resuming of the OS, but the granularity is broken down to processes. It also supports container technologies, specifically Docker. Refer to http://criu.org/ for more information.