WHY do we use AQAB (2^16+1) as the public exponent in RSA?

So I have been learning RSA and really trying to understand it from a math perspective. I think I have a pretty good handle on it now except for one thing.
In all the explanations and tutorials I have studied, when it comes time to choose a public exponent, it's usually generated as a number coprime to (p-1)(q-1). Usually this is a small number because it's easier to learn that way.
However, in the real world I understand it's pretty common practice to use AQAB (the Base64 encoding of 65537) as the public exponent. Upon googling this I find plenty of people telling me WHAT AQAB is, but I'm not finding sufficient answers as to WHY it is used.
My assumption is that randomizing the public exponent is not very important - as opposed to randomizing p and q. And I know I read that 65537 is kind of a sweet spot between security and speed. But with that said I’m still having trouble grasping the WHY.
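To make the speed half of that "sweet spot" concrete, here is a small illustration in Python - a sketch of the standard square-and-multiply accounting, not an authoritative answer: with square-and-multiply, computing m^e mod n costs roughly one squaring per bit of e plus one extra multiply per set bit, and 65537 has only two bits set.

e = 65537
print(bin(e))                        # 0b10000000000000001: 17 bits, only two set
squarings = e.bit_length() - 1       # 16 squarings
multiplies = bin(e).count("1") - 1   # 1 extra multiplication
print(squarings, multiplies)
# So encryption costs about 17 modular operations, while e is still large
# enough to avoid the classic low-exponent pitfalls that e = 3 invites
# when padding is absent or botched.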

Related

Where can I find good explanations of Computability and Complexity?

I have a repeat coming up in Computability and Complexity and I was wondering if anybody has good resources for this sort of study.
Things like regular languages, context-free and context-sensitive languages, and all that sort of stuff.
For example: [example question omitted from the original post]
As you can see, it is a horribly phrased question. The notes our lecturer gave us are equally as bad. I really need to pass this module so if anybody has a good resource for studying these topics it would be much appreciated.
I think the problem you're having is not the fault of the phrasing, but the fact that you're not yet comfortable dealing with the mathematical notation involved.
Wikipedia has a lot of articles on automata and other computer science theory topics. Also, a google search on 'NFA to DFA' turns up many helpful results. Automata are used heavily in compilers, so you might find a more "practical" explanation of things in material from a compilers course.
Your class is going to be heavily mathematical, though, so you would do best for yourself by putting aside the attitude that the material you've been given is poor and spend the time learning to understand it. Mathematical formulations give you precise and concise descriptions without as much room for misinterpretation as informal language has.
You may want to look at the class notes made available by Avi Kak at
https://engineering.purdue.edu/kak/courses-i-teach/ECE664/Index.html
See the handwritten notes on Lecture 17 that explain the notation in your question.

How to get better at optimization?

Apologies in advance if the question seems somewhat broad or strange; I don't mean to offend anyone, but maybe someone can actually make a recommendation. I tried looking for similar questions, but could not find any.
Which are the better resources (books, blogs etc.) that can teach about optimizing code?
There are quite a few resources on making code more human-readable (Code Complete probably being the number one choice). But what about making it run faster and more memory-efficient?
Of course there are lots of books on each particular language, but I wonder if there are some that cover the problems of memory / speed of operations and are somewhat language-independent?
Here are some links that might be helpful in general on the subject of memory optimizations:
What Every Programmer Should Know About Memory by Ulrich Drepper
Herb Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Slides: Herb Sutter: Machine Architecture (Things Your Programming Language Never Told You)
Video: Herb Sutter @ NWCPP: Machine Architecture: Things Your Programming Language Never Told You
The microarchitecture of Intel, AMD and VIA CPUs
An optimization guide for assembly programmers and compiler makers, by Agner Fog
Read Structured Programming with go to Statements. While it's the source of the "premature optimisation is the root of all evil" quote that comes up the moment somebody wants to make anything faster or smaller - no matter how desperately important or late in the process they are - it's actually about the importance of making things efficient when you can.
Learn about time complexity, space complexity and the analysis of algorithms.
Come up with examples where you would want to sacrifice having worse space complexity for better time complexity, and vice versa.
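A stock example of one direction of that trade, as a Python sketch (the function is the usual textbook one, nothing author-specific): memoising a recursive function spends O(n) memory to collapse exponential time to linear, and dropping the cache reverses the trade.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each value is computed once and cached: O(n) space buys O(n) time.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))  # instant; without the cache this recursion is exponential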
Know the time and space complexities of the algorithms and data structures your languages and frameworks of choice offer, especially those you use most often.
Read the answers on this site on questions about creating a good hash code.
Study the approach HTTP took to having the advantage of caching, without the disadvantage of using stale data inappropriately. Consider how easy or difficult that is to apply to in-memory caches. Consider when you would say "screw it, I can live with being stale for the speed boost it gives me". Consider when you would say "screw it, I can live with being slow for the guarantee of freshness it gives me".
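As a toy model of that trade-off, here is a minimal time-to-live cache in Python (a sketch; the class and parameter names are illustrative, not any real API), where the ttl parameter is exactly the dial between "stale but fast" and "fresh but slow":

import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry time)

    def get(self, key, compute):
        value, expiry = self.store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value                  # possibly stale, but cheap
        value = compute(key)              # guaranteed fresh, but costly
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

cache = TTLCache(ttl_seconds=30)
# Stand-in for a slow lookup; a longer ttl means tolerating more staleness.
print(cache.get("AAPL", compute=lambda symbol: "quote for " + symbol))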
Learn how to multithread. Learn when it improves performance. Learn why it often doesn't or even makes things worse.
Look at Joe Duffy's blog, where performance is a regular concern of his writing.
Learn how to process items as streams or iterations rather than building and rebuilding data-structures full of each item, each time. Learn when you're actually better off not doing that.
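A minimal sketch of the difference in Python (the function names are illustrative):

def sum_of_squares_materialised(numbers):
    squares = [x * x for x in numbers]   # builds the whole list first
    return sum(squares)

def sum_of_squares_streamed(numbers):
    return sum(x * x for x in numbers)   # one item in flight at a time

# The streamed version handles inputs too big to hold in memory...
print(sum_of_squares_streamed(range(10_000_000)))
# ...but if you need two passes over the data, materialising once can
# easily be the better choice - the "better off not doing that" case.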
Know what things cost. You can't reasonably decide "I'll work so this is in the CPU cache rather than main-memory/main-memory rather than disk/disk rather than over a network" unless you've a good idea what actually causes each to be hit, and what the cost differences are. Worse, you can't dismiss something as premature optimisation if you don't know what they cost - not bothering to optimise something is often the best choice, but if you don't even consider it in passing you aren't "avoiding premature optimisation", you're muddling through and hoping it works.
Learn a bit about what optimisations are done for you by the script engine/jitter/compiler/etc you use. Learn how to work with them rather than against them. Learn not to re-do work it'll do for you anyway. In one or two cases, you may also be able to apply the same general principle to your work.
Search for cases on this site where something is dismissed as an implementation detail - yes, all of those are cases where the detail in question isn't the most important thing at the time, but all of those implementation details were chosen for a reason. Learn what they were. Learn the counter-arguments.
Edit (I'll keep adding a few more to this as I go):
Different books of course differ in the emphasis they put on efficiency concerns, but I remember Stroustrup's The C++ Programming Language as one where, a good few times, he explains a choice between a few different options in terms of efficiency, and also shows how to keep decisions made for efficiency's sake from hurting the usability of the classes "from the outside".
Which brings me to another point: concentrate on the efficiency of the library code you reuse in different projects. You don't ever want to be thinking "maybe I should hand-roll a new one here to be more efficient"; unless it's a very specialised case, you want to be confident that lots of work went into making that heavily used class efficient over a lot of cases, and to concentrate on identifying hot-spots.
As for specialised cases, some of the more obscure data structures are worth knowing for the cases they serve. For example, a DAWG is a very compact structure for storing strings with a lot of common prefixes and suffixes (which would be most words in most natural languages) where you just want to find those in the list that match a pattern. If you need a "payload" then a tree where each letter has a list of nodes for each subsequent letter (a generalisation of a DAWG but ending in that "payload" rather than the terminal node) has some but not all of the advantages. They also find the result in O(n) time where n is the length of the string sought.
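For a flavour of the structure being described, here is a minimal Python sketch of the payload-carrying tree (a trie; a true DAWG additionally shares common suffixes, which this simplification does not - all names here are illustrative):

class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> next TrieNode
        self.payload = None  # set only where a stored string ends

def insert(root, word, payload):
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.payload = payload

def lookup(root, word):
    # O(n) in the length of the word, regardless of how many words are stored.
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return None
    return node.payload

root = TrieNode()
insert(root, "optimise", "verb")
print(lookup(root, "optimise"))  # verb
print(lookup(root, "optimal"))   # None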
How often will that come up? Not often. It came up for me once (a few times really, but they were variants of the same case), and as such it would not have been worth learning all there was to know about DAWGs before then. But I knew enough to know it was what I needed to research later, and it saved me gigabytes (really: from way too much for a machine with 16GB RAM to cope with, down to less than 1.5GB). Going straight for a hand-rolled DAWG rather than putting the strings in a hashset would totally have been premature optimisation, but flicking through the NIST data-structure site meant I knew what to reach for when it came up.
Consider: "Finding a string in a DAWG is O(n)" "Finding a string in a Hashset is O(1)" Both of these statements is true, but the speed of the two tends to be comparable. Why? Because the DAWG is O(n) in terms of the length of the string, and effectively O(1) in terms of the size of the DAWG. The Hashset is O(1) in terms of the size of the hashset, but working out the hash is typically O(n) in terms of the length of the string, and equality checks are also O(n) in terms of that length. Both statements were correct, but they were thinking about a different n! You always need to know what n means in any discussion of time and space complexity - most often it'll be the size of the structure, but not always.
Don't forget constant effects: O(n²) is the same as O(1) for sufficiently low values of n! Remember that the likes of O(n²) translates as n²·k + n·k₁ + k₂, with the assumption that k₁ and k₂ are low enough, and that k is close enough to the k of whatever algorithm or structure we're comparing against, that they don't really matter and only the n² term does. This isn't true all the time, and we can sometimes find that k, k₁ or k₂ is high enough that we end up in trouble. It's also not true when n is going to be so small that the difference in the constant costs of different approaches matters. Of course, normally when n is small we don't have a big efficiency concern, but what if we are doing m operations on structures averaging n in size, and m is large? If we are choosing between an O(1) and an O(n²) approach, we are choosing between an O(m) and an O(n²m) approach overall. It still seems like a no-brainer in favour of the former, but with a low n it essentially becomes a choice between two different O(m) approaches, and the constant factors are much more important.
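A quick way to see those constants in action (a Python sketch; exact numbers vary by machine and runtime, and the variable names are illustrative):

import timeit

small_list = list(range(5))
small_set = set(small_list)

# For n = 5, the O(n) scan and the O(1) hash lookup are neck and neck,
# because hashing has a higher constant cost per operation.
print(timeit.timeit("4 in small_list", globals=globals(), number=1_000_000))
print(timeit.timeit("4 in small_set", globals=globals(), number=1_000_000))
# Rebuild both with range(5_000) and search for a missing element, and
# the set wins decisively: the asymptotic term has swamped the constants.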
Learn about lock-free multi-threading. Or perhaps don't. Personally, I've two pieces of my own code I use professionally that use all but the simplest lock-free techniques. One is based on well-known approaches and I wouldn't bother now (it's .NET code first written for .NET2.0 and the .NET4.0 library supplies a class that does the same thing). The other I first wrote for fun, and only used after that just-for-fun period had given me something reliable (and it still gets beaten by something in the 4.0 library for a lot of cases, but not for some others that I care about). I would hate to have to write something like it with a deadline and a client in mind.
All that said, if you're coding out of interest, the challenges involved are interesting and it's an enjoyable thing to work on when you have the freedom (which you don't get when doing something for a paying client) to give up on a failed plan, and you'll certainly learn a lot about efficiency concerns generally. (Take a look at https://github.com/hackcraft/Ariadne if you want to see some of what I've done with this).
A Case Study
Actually, that contains a relatively good example of some of the above principles. Take a look at the method that's currently at line 511 at https://github.com/hackcraft/Ariadne/blob/master/Collections/ThreadSafeDictionary.cs (where I joke in the comments about it being flame-bait for people quoting Dijkstra). Let's use it as a case study:
This method was first written to use recursion, because it's a naturally recursive problem - after doing the operation on the current table, if there's a "next" table we want to do the exact same operation on that, and so on until there's no further table.
Recursion is almost always slower than iteration, for a few different reasons. Should we make all recursive calls iterative? No, it's often not worth it, and recursion is a wonderful way to write code that is clear about what it's doing. Here, though, I apply the principle above: since this is a library that might be called where performance is crucial, particular effort should be expended on it.
The decision to try to improve its speed being made, the next thing I did was make measurements. I don't depend on "I know that iteration is faster than recursion, so it must be faster when changed to avoid recursion". That's just not true - a poorly written iterative version may not be as good as a well-written recursive version.
The next question is, just how to re-write it. I've a tested method that I know works and I'm going to replace it with a different version. I don't want to replace it with a version that doesn't work, obviously, so how to re-write while taking the most advantage out of what's already there?
Well, I know about tail-call elimination; an optimisation normally done by compilers that changes the way the stack is managed so that recursive functions end up with properties closer to those of iterative ones (it's still recursive from the perspective of the source code, but it's iterative in terms of how the compiled code actually uses the stack).
This gives me two things to think about: 1. Maybe the compiler is already doing this, in which case my extra work isn't going to do anything to help. 2. If the compiler isn't already doing this, I can take the same basic approach manually.
That decision made, I replaced all of the points where the method called itself with a change to the one parameter that would be different for that next call, followed by a jump back to the beginning. I.e. instead of having:
CurrentMethod(param0.next, param1, param2, /*...*/);
We have:
param0 = param0.next;
goto startOfMethod;
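In outline, the transformation looks like this - a Python sketch with hypothetical names (the real method is the C# linked above):

class Table:
    def __init__(self, next=None):
        self.next = next

def do_work(table):
    pass  # stand-in for the per-table operation

def process_chain_recursive(table):
    do_work(table)
    if table.next is not None:
        process_chain_recursive(table.next)  # the tail call

def process_chain_iterative(table):
    # Manual tail-call elimination: "param0 = param0.next; goto start"
    # is just a loop that advances the parameter.
    while table is not None:
        do_work(table)
        table = table.next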
That being done, I measure again. Running through the entire unit tests for the class is now consistently 13% faster than before. If it were closer I'd have tried more detailed measurements, but a consistent 13% on runs that include code that doesn't even call this method is something I'm pretty happy with. (It also tells me that the compiler wasn't doing the same optimisation, or I wouldn't have gained anything).
Then I clean up the method to make more changes that make sense with the new code. Most of them let me take out the goto because goto is indeed nasty (and there's other places the same optimisation was done that aren't as obvious because the goto was refactored entirely). In some, I left it in, because 13% is worth breaking the no-goto rule to my mind!
So the above gives an example of:
Deciding where to concentrate optimisation effort (based on how often it might be hit and my inability to predict all uses of the library)
Using knowledge of general costs (recursion costs more than iteration, most of the time).
Measuring rather than depending on assuming the above always applies.
Learning from what compilers do.
Understanding that because of that I may not gain anything - maybe the compiler already did it for me.
Avoiding optimisations leading to unreadable code (refactoring out most of the gotos the first pass introduced).
Some of these are matters of opinion and style (the decision to leave in some goto would not be without controversy), and it's certainly okay to disagree with my decisions, but knowledge of the points raised so far in this post would make it an informed disagreement, rather than a knee-jerk one.
In addition to the resources mentioned in other answers, Michael Abrash's Graphics Programming Black Book is a great read for learning about optimization. While the specifics are a bit dated in places, it is still a great resource for learning about how to approach optimization.
Any time you want to optimize code it is absolutely essential to measure, measure, measure. One of the best ways to learn about optimization is by doing - take some code you want to optimize, learn how to use a profiler to measure its performance and then make changes and measure the results.

How can I decide when to use linear programming?

When I look at optimization problems I see a lot of options. One is linear programming. I understand in abstract terms how LP works, but I find it difficult to see whether a particular problem is suitable for LP or not. Are there any heuristics that can help guide this decision?
For example, the work described in Is there a good way to do this type of mining? took weeks before I saw how to structure the problem correctly. Is it possible to know "in advance" that a problem can be solved by LP, without first seeing "how to phrase it"?
Is there a checklist I can use to decide whether a problem is suitable for LP? Is there a standard (readable) reference for this topic?
Heuristics (and/or checklists) to decide if the problem at hand is really a Linear Program.
Here's my attempt at answering, and I have also tried to outline how I'd approach this problem.
Questions that indicate that a given problem is suitable to be formulated as an LP/IP:
Are there decisions that need to be taken regularly, at different time intervals?
Are there a number of resources (workers, machines, vehicles) that need to be assigned tasks? (hours, jobs, destinations)
Is this a routing problem, where different "points" have to be visited?
Is this a location or a "layout" problem? (Whole class of Stock-cutting problems fall into this group)
Answering yes to these questions means that an LP formulation might work.
Commonly encountered LPs include: resource allocation (assignment, transportation, trans-shipment, knapsack), portfolio allocation, job scheduling, and network-flow problems.
Here's a good list of LP Applications for anyone new to LPs or IPs.
That said, there are literally 1000s of different types of problems that can be formulated as LP/IP. The people I've worked with (researchers, colleagues) develop an intuition. They are good at recognizing that a problem is a certain type of an Integer Program, even if they don't remember the details, which they can then look up.
Why this question is tricky to answer:
There are many reasons why it is not always straightforward to know if an LP formulation will cut it.
There is a lot of "art" (subjectivity) in the approach to modeling/formulation.
Experience helps a lot. People get good at recognizing that this problem can be "likened" to another known formulation.
Even if a problem is not a straight LP, there are many clever master-slave techniques (sub-problems), or nesting techniques that make the overall formulation work.
What looks like multiple objectives can be combined into one objective function, with an appropriate set of weights attached.
Experienced modelers employ decomposition and constraint-relaxation techniques and later compensate for it.
How to Proceed to get the basic formulation done?
The following has always steered me in the right direction. I typically start by listing the Decision Variables, Constraints, and the Objective Function, and then iterate among these three to make sure that everything "fits." (A small worked example follows the checklist below.)
So, if you have a problem at hand, ask yourself:
What are the Decision Variables (DV)? I find that this is always a good place to start the process of formulation. How many types of DV's are there? (Which resource gets which task, and when should it start?)
What are the Constraints?
Some constraints are very readily visible. Others take a little bit of teasing out. The constraints have to be written in terms of your decision variables, and any constants/limits that are imposed.
What is the Objective Function?
What are the quantities that need to be maximized or minimized? Note: Sometimes, it is not clear what the objective function is. That is okay, because it could well be a constraint-satisfaction problem.
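Putting the three pieces together, here is a hedged, minimal worked example - Python with SciPy's linprog, assuming SciPy is available, and the numbers invented for illustration: two products competing for two scarce resources.

from scipy.optimize import linprog

# Decision variables: x = units of product A, y = units of product B.
# Objective: maximize profit 3x + 2y. linprog minimizes, so negate.
c = [-3, -2]

# Constraints, written in terms of the decision variables:
#   x +  y <= 4   (machine hours available)
#   x + 3y <= 6   (labour hours available)
A_ub = [[1, 1],
        [1, 3]]
b_ub = [4, 6]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal production plan and the profit it earns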
A couple of quick Sanity Checks once you think your LP formulation is done:
I always try to check that a trivial solution (all 0s or all big numbers) is not part of the solution set. If it is, the formulation is most probably not correct: some constraint is missing.
Make sure that each and every constraint is "related" to the decision variables. (I occasionally find constraints that are just "hanging out there." This means that a "bookkeeping constraint" has been missed.)
In my experience, people who keep at it almost always develop the needed intuition. Hope this helps.

Cocoa app - security issue

I've a question about a good way to protect my Cocoa app a bit from piracy. I know that complete protection is impossible!
So, in my app I've an isRegistered() method that runs every time the user launches the app.
This is called from the app delegate's applicationDidFinishLaunching: method. If this method returns true, the app continues to execute; otherwise an alert appears saying that the app is not registered and that there are xx days left to buy a license.
Is this a good way to do it? I have no experience with this.
Thank you in advance for your help!
SOLVED
First of all, thanks to everybody! I think the same thing: no copy protection can stop piracy. I'm only trying to solve this little bug, even though I know that someone will crack my app again.
However, it's true - the best thing is to improve the app and not waste time trying to make the piracy protection more effective.
The solution you describe requires almost no expertise whatsoever to crack. It is trivial to change your isRegistered() function to always return true. Thus, the effort required to circumvent your protection is a tiny fraction of the effort you would have to spend implementing all the infrastructure to support users purchasing registration codes.
In other words, you're not getting a good return on investment. There is some debate over whether the return on investment of implementing piracy protection (rather than improving your product) is ever good enough, because you pit yourself against people who have nothing better to do than prove they're cleverer than you.
One good way to redress the balance of return on investment is to use pre-existing code such as AquaticPrime. That way, at least you won't have spent so much time chasing rainbows :)
I am not in the shrink-wrapped software business, but my friend is, and his observation after 10 years of selling his product was that it makes no sense to create overly sophisticated protection, because someone will always hack it. You are alone and the world is infinite. It is better to invest time/money in improving your software than in working on copy protection.
Also keep in mind that around 10% of users will never steal and another 10% will always try. Just make sure that the remaining 80% are able to buy your product without any major obstacle; then you can ignore those nasty 10%. Actually, it is a quote from Joel Spolsky, IMHO.
So your solution seems completely OK from a technical point of view; just stay with it.
It's almost never worth implementing your own anti-piracy system, because you'll almost always spend a lot of effort on something which can then be broken very easily. Rely on a shared implementation - in this case a framework like AquaticPrime (lots of people on the macsb mailing list recommend that one) - and you're effectively relying on the framework being good enough to protect your own app as well as all the others.
The code signing framework on Leopard and later allows you to sign your code such that if it's ever tampered with, it will refuse to run - see the documentation of the kill option in the manpage.
This is a good question. Having read the answers, I think what BitDrink was really getting at was this: we know that an isRegistered() function is dead simple to hack. With the understanding that any protection system eventually will be hacked, what are some strategies for writing a function that's harder to hack than an isRegistered function that returns a boolean?
Fundamentally, any copy protection system will eventually have something that looks like this:
if (program is registered)
    let the program continue
else
    nagging message
end
Any hacker with a copy of GDB will eventually find that first line and write a tiny little patch to strip it out. Most copy protection systems focus on security through obscurity, i.e. making that line hard to find. You can also make this system more robust by signing the binary and checking the signature, but you'll just add another hoop for the hackers to jump through. They'll eventually find your public key and change it to their own public key so they can replace your signature. However, I believe this will significantly slow them down. Leopard offers a code signing utility, but I don't know if it can be used to prevent incorrectly signed applications from running at all.
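To make the signing idea concrete, here is a toy sketch in Python - textbook RSA with a throwaway key and no padding, deliberately insecure, purely to show the shape of the check and why swapping the embedded public key defeats it (all names are illustrative):

import hashlib

# Tiny textbook key pair (p = 61, q = 53): n = 3233, e = 17, d = 2753.
N, E, D = 3233, 17, 2753

def digest(data):
    # Reduce mod N so the toy key can handle it; real schemes pad instead.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data):  # done once, by the vendor, with the private key
    return pow(digest(data), D, N)

def looks_genuine(data, signature):  # done at every launch, with the public key
    return pow(signature, E, N) == digest(data)

binary = b"the application code"
sig = sign(binary)
print(looks_genuine(binary, sig))            # True
print(looks_genuine(b"tampered code", sig))  # False
# The catch described above: a cracker who patches (N, E) inside the app
# to their own key can re-sign the tampered binary and pass this check.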
There's no perfect solution to this problem, but there are two main things to remember:
your registration system will be broken. There is absolutely no way around this.
your registration system is a barrier between the user and your program. You should optimize for the (hopefully the majority) legitimate users and make this as painless as possible.

Really Big Numbers and Objective-C

I've been toying around with some Project Euler problems and naturally am running into a lot that require handling numbers bigger than the long long type can hold. I am committed to using Cocoa and Objective-C (I need to stay sharp for work) but can't find an elegant way (read: library) to handle these really big numbers.
I'd love to use GMP but it sounds like using it with Xcode is a complete world of hurt.
Does anyone know of any other options?
If I were you I would compile GMP outside Xcode and use just gmp.h and libgmp.a (or libgmp.dylib) in my Xcode project.
Try storing the digits in arrays.
You will have to write some new functions for all your arithmetic operations, but that's how we were told to do it in college.
Plus, the speed of calculations was considerably improved, since the "big numbers" weren't really big after all once split into digits - indeed they weren't really single numbers any more at all.
See if it helps.
vBigNum in vecLib implements 1024 bit integers (signed or unsigned). Is that big enough?
If you wanted to use MATLAB (or anything close) you could look at my implementation of a big integer class (vpi) on the File Exchange.
It is rather simple. Store each digit separately. Adds and subtracts are simple, just implement a carry operation. Multiplies are best done using convolution, then a carry. Implement divide and mod operators, then a powermod operation, useful for many of the PE problems. Powers are easy - just repeated squaring and multiplication, based on the binary representation of the exponent.
This will let you solve many PE problems.
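For illustration, a minimal Python sketch of the digit-array scheme described above (digits stored least-significant first, base 10 for clarity; a real implementation would use a much larger base):

def add(a, b):
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return result

def multiply(a, b):
    # "Convolution, then a carry": digit i of a times digit j of b
    # lands at position i + j, then one pass propagates the carries.
    acc = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            acc[i + j] += da * db
    carry = 0
    for k in range(len(acc)):
        carry, acc[k] = divmod(acc[k] + carry, 10)
    while len(acc) > 1 and acc[-1] == 0:
        acc.pop()  # trim leading zeros
    return acc

print(add([9, 9, 9], [1]))             # [0, 0, 0, 1] i.e. 1000
print(multiply([9, 9, 9], [9, 9, 9]))  # [1, 0, 0, 8, 9, 9] i.e. 998001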
I too got the bright idea to attempt some Euler Project problems with Cocoa/Objective-C and have found it frustrating. I previously used Java and perhaps some PHP. I posted my exact problem in this thread.
I always considered using a library cheating for this project. Just write a class with the things you need. And don't be afraid to use malloc and uint64_t and so on. NSNumber is not a good idea in many cases.
On the other hand, there are many problems where the obvious solution would require huge to enormously huge numbers, and the trick is to find a way to solve the problem without using these huge numbers. (For example, what is the sum of the last thousand digits of 1,000,000 factorial?)
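A minimal sketch of that trick (in Python, which has native bignums, but the idea carries over to any language; the function name is illustrative): when only the last k digits matter, work modulo 10^k and nothing ever gets huge.

def factorial_mod(n, mod):
    acc = 1
    for i in range(2, n + 1):
        acc = (acc * i) % mod  # the running product never exceeds mod squared
    return acc

# Last 10 digits of 100,000! - all zero, since large factorials
# accumulate trailing zeros from their factors of 10 very quickly.
print(factorial_mod(100_000, 10**10))

# Last 10 digits of 7**(10**8), via built-in modular exponentiation:
print(pow(7, 10**8, 10**10))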