I've read some articles saying you should set the cost to at least 16 (i.e. 2^16 key-expansion iterations), yet others say 8 or so is fine.
Is there any official standard for how high the cost should be set?
The cost you should use depends on how fast your hardware (and implementation) is.
Generally speaking a cost of 8 or 10 is fine -- there isn't any noticeable delay. It's still a huge level of protection and far better than any home-grown solution using SHAs and salts. Once you upgrade your hardware you could increase the cost to 16. I would say that 16 is a little high at this time, and will probably result in noticeable (and annoying) delays. But if 16 works for you, by all means go for it!
You must set the number of iterations to the maximum value that is still "tolerable", depending on the hardware you use and the patience of your users. Higher is better.
The whole point of the iteration count is to make the password processing slow -- that is, to make it slow for the attacker who "tries" potential passwords. The slower the better. Unfortunately, raising the iteration count makes it slow for you too...
As a rule of thumb, consider that an attacker will break passwords by trying, on average, about 10 million (10^7) potential passwords. If you set the iteration count so that password hashing takes 1 second for you, and you consider that the attacker can muster ten times more computing power than you, then it will take him 10^7 * 1 / 10 seconds, i.e. about 10^6 seconds, or roughly 12 days. If you set the iteration count so that password hashing takes only 0.01 second on your PC, then the attacker is done in about three hours.
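To make that arithmetic explicit, here is a quick back-of-the-envelope sketch (the 10^7 guesses and the tenfold attacker speedup are the assumptions used above, not hard numbers):

```python
# Rough attack-time estimate based on the figures above.
guesses = 10**7          # average number of passwords the attacker tries
hash_time_s = 1.0        # seconds per hash on *your* hardware
attacker_speedup = 10    # attacker has ~10x your computing power

attack_s = guesses * hash_time_s / attacker_speedup
print(f"~{attack_s / 86400:.1f} days")                            # ~11.6 days

# With a 0.01 s hash, the same attack takes about three hours:
print(f"~{guesses * 0.01 / attacker_speedup / 3600:.1f} hours")   # ~2.8 hours
```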
The cost should depend on your hardware.
You should test your cost settings and aim for the 100-500 ms range. Of course, if you are working with highly sensitive information, the time could be 1000 ms or even more.
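As an illustration, a small benchmark like the following sketch (assuming the third-party Python bcrypt package) lets you measure the actual hashing time on your own hardware and pick the highest cost that stays inside that window:

```python
# Measure bcrypt hashing time at several cost factors (pip install bcrypt).
import time
import bcrypt

password = b"correct horse battery staple"

for cost in range(8, 15):
    start = time.perf_counter()
    bcrypt.hashpw(password, bcrypt.gensalt(rounds=cost))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"cost={cost:2d}: {elapsed_ms:7.1f} ms")

# Pick the highest cost that stays within your tolerable range,
# e.g. the 100-500 ms window suggested above.
```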
If I have a large number of functions in my application, do they affect its execution speed?
For example: I have 10,000 functions in my application, but each time I run it only 1 or 2 of them will actually be called. It is not known beforehand which function(s) will be called; it depends on the user's input.
Does the execution speed change if I have a large number of functions?
The speed shouldn't be significantly affected in your case. The number of procedures defined is much less important than the computational complexity of each procedure called.
Think about it: a 2.5 GHz processor can theoretically perform more than 10 billion floating-point operations per second (FLOPS). The time required to load a fixed number of procedures into memory, even a million lines of code, remains constant and fairly trivial, but if one of your procedures is complex enough, the number of operations can increase massively over comparatively few iterations.
The 9,998 functions that are never used but are still included because they are referenced do not affect performance, unless all of the code has to be parsed on every run.
I'm thinking the size of the case analysis might affect performance. If you have 10,000 functions and only use about 2 each time, then you have about 5,000 possible outcomes, and that means a lot of tests if the analysis is linear, or about 13 if it is binary.
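To illustrate the difference, here is a hypothetical sketch (the handler names are made up): a linear if/elif chain pays one comparison per alternative, while a dictionary-based dispatch table finds the right function with a single lookup, no matter how many functions exist.

```python
# Hypothetical example: dispatching one of many functions based on user input.

def handle_add(x, y): return x + y
def handle_sub(x, y): return x - y
# ... imagine thousands more handlers, defined but rarely called ...

# Linear case analysis: cost grows with the number of alternatives tested.
def dispatch_linear(name, x, y):
    if name == "add":
        return handle_add(x, y)
    elif name == "sub":
        return handle_sub(x, y)
    # ... one comparison per possible function ...
    raise ValueError(f"unknown function: {name}")

# Table-based dispatch: a single hash lookup, independent of table size.
HANDLERS = {"add": handle_add, "sub": handle_sub}

def dispatch_table(name, x, y):
    return HANDLERS[name](x, y)

print(dispatch_table("add", 2, 3))  # 5
```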
I'd start with profiling the code to find the bottlenecks.
I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?
As an example let's take the code from the accepted answer:
SELECT * FROM Cars;
/* for each car */
SELECT * FROM Wheel WHERE CarId = ?
With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:
1 lookup to reach the "Wheel" table (one hashmap get())
1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
k lookups to get the wheel rows for each matching wheel (k pointer dereferences)
Even if we multiply that by a small constant factor for an additional overhead because of the internal memory structure, it still should be unnoticeably fast. Is the interprocess communication the bottleneck?
Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals. - HN discussion thread.
Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.
You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, communication latencies of > 1 ms are common, i.e. the CPU would spend millions of clock cycles waiting for the network.
If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch, which commonly costs > 0.01 ms (source), which is tens of thousands of clock cycles.
In addition, both the ORM tool and the database will have some overhead per query.
To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.
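To get a feel for how the per-query cost adds up even with no network involved, here is a small timing sketch using Python's built-in sqlite3 module (the table and column names mirror the example above; since SQLite runs in-process, the measured gap understates what you'd see with a client/server database):

```python
# Compare the N+1 pattern with a single query on an in-memory SQLite database.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER)")
conn.executemany("INSERT INTO Cars (Id) VALUES (?)", [(i,) for i in range(10_000)])
conn.executemany("INSERT INTO Wheel (CarId) VALUES (?)",
                 [(i,) for i in range(10_000) for _ in range(4)])
conn.execute("CREATE INDEX idx_wheel_car ON Wheel (CarId)")

car_ids = [row[0] for row in conn.execute("SELECT Id FROM Cars")]

# N+1 pattern: one query per car.
start = time.perf_counter()
for car_id in car_ids:
    conn.execute("SELECT * FROM Wheel WHERE CarId = ?", (car_id,)).fetchall()
print("N+1 queries:", time.perf_counter() - start, "s")

# Single query: fetch all wheels at once.
start = time.perf_counter()
conn.execute("SELECT * FROM Wheel").fetchall()
print("1 query:    ", time.perf_counter() - start, "s")
```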
Assuming the database resides on the same machine as my program
Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or, for security, you might not want your app server on the same box as the DB.
why is the n+1 query pattern slow?
You don't think it's slow because your mental model of performance is probably all wrong.
1) RAM is horribly slow. Your CPU wastes around 200-400 CPU cycles each time it needs to read something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading).
2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster.
See this article about how accessing RAM in the right order is 76.6% faster http://lwn.net/Articles/255364/ (Read the whole article if you want to know how horrifyingly complex RAM actually is.)
CPU cache
In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dumps the caches anyway).
The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.
RAM sequential access
The "1 query" case will read everything from the DB to one linear buffer, send it to the client who will read it linearly. No random accesses during data transfer.
The "N+1 query" case will be allocating and de-allocating RAM N times, which (for various reasons) may not be the same physical bit of RAM.
Various other reasons
The networking subsystem only needs to read one or two TCP headers, instead of N.
Your DB only needs to parse one query instead of N.
When you throw in multi-users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.
Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.
See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx
Having the database on a local machine reduces the problem; however, most applications and databases will be on different machines, where each round trip takes at least a couple of milliseconds.
A database will also need a lot of locking and latching checks for each individual query. Context switches have already been mentioned by meriton. If you don't use a surrounding transaction, it also has to build implicit transactions for each query. Some query parsing overhead is still there, even with a parameterized, prepared query or one remembered by string equality (with parameters).
If the database gets filled up, query times may increase, compared to an almost empty database in the beginning.
If your database is to be used by other applications, you will likely hammer it: even if your application works, others may slow down or even get an increasing number of failures, such as timeouts and deadlocks.
Also, consider having more than two levels of data. Imagine three levels: Blogs, Entries, Comments, with 100 blogs, each with 10 entries and 10 comments per entry (on average). That's a 1 + N + (N x M) SELECT situation: 1 query for the blogs, 100 queries to retrieve the blog entries, and another 1000 to get all the comments. With somewhat more complex data, you'll quickly run into the tens or even hundreds of thousands of queries.
Of course, bad programming may work in some cases and to some extent. If the database will always be on the same machine, nobody else uses it, and the number of cars is never much more than 100, even a very sub-optimal program might be sufficient. But beware of the day any of these preconditions changes: refactoring the whole thing will take you much more time than doing it correctly in the beginning. And likely, you'll try some other workarounds first: a few more IF clauses, a memory cache and the like, which help in the beginning but mess up your code even more. In the end, you may be stuck in a "never touch a running system" position, where the system's performance is becoming less and less acceptable, but refactoring it is too risky and far more complex than changing correctly written code would have been.
Also, a good ORM offers you ways around N+1: (N)Hibernate, for example, allows you to specify a batch-size (merging many SELECT * FROM Wheels WHERE CarId=? queries into one SELECT * FROM Wheels WHERE CarId IN (?, ?, ..., ?) ) or use a subselect (like: SELECT * FROM Wheels WHERE CarId IN (SELECT Id FROM Cars)).
The simplest option to avoid N+1 is a join, with the disadvantage that each car row is multiplied by the number of wheels, and multiple child/grandchild collections can end up in a huge Cartesian product of join results.
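For what it's worth, here is a rough sketch of the batched IN (...) idea using plain DB-API calls rather than an ORM (the table and column names follow the example query; the batch size of 100 is an arbitrary choice):

```python
# Replace N per-car queries with one query per batch of car ids.
from collections import defaultdict

def load_wheels_batched(conn, car_ids, batch_size=100):
    """Return a mapping of CarId -> list of wheel Ids, one query per batch."""
    wheels_by_car = defaultdict(list)
    for i in range(0, len(car_ids), batch_size):
        batch = car_ids[i:i + batch_size]
        placeholders = ", ".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT Id, CarId FROM Wheel WHERE CarId IN ({placeholders})",
            batch,
        ).fetchall()
        for wheel_id, car_id in rows:
            wheels_by_car[car_id].append(wheel_id)
    return wheels_by_car
```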
There is still overhead, even if the database is on the same machine, cached in RAM and properly indexed. The size of this overhead will depend on what DBMS you're using, the machine it's running on, the amount of users, the configuration of the DBMS (isolation level, ...) and so on.
When retrieving N rows, you can choose to pay this cost once or N times. Even a small cost can become noticeable if N is large enough.
One day someone might want to put the database on a separate machine or use a different DBMS. This happens frequently in the business world (to be compliant with some ISO standard, to reduce costs, to change vendors, ...).
So, sometimes it's good to plan for situations where the database isn't lightning fast.
All of this depends very much on what the software is for. Avoiding the "select n+1 problem" isn't always necessary, it's just a rule of thumb, to avoid a commonly encountered pitfall.
We were wondering if anyone has experience with a large number of bans in Varnish. We are considering a ban strategy that could result in a couple of hundred (smart) bans each night (on X million cache objects).
Although I am aware that this is highly dependent on the environment, we were wondering whether this would have a significant performance impact.
Bans are quite CPU intensive, so care should be taken not to overuse them. If you do, CPU usage will rise and you'll notice a huge number of regular expression matches being executed each second.
In general one ban will match against every object in memory at the point it is entered, so with a million objects each ban will result in a million ban evaluations. This might sound like a lot, but modern servers are fast; today a modern server is capable of doing tens of millions of regular expression matches each second. My four year old laptop does something like 15 million regex matches a second running on a single core, just to give you an idea of the scale.
In addition there is another feature of Varnish that comes into play: the ban lurker. The ban lurker is a thread that walks the cache and evaluates bans, trying to kill off objects before they are requested, thereby reducing the size of the ban list. If your bans don't use the req object, they are candidates for evaluation by the lurker. If you plan on using bans regularly, you should take care to write them in a lurker-friendly fashion -- the so-called "smart bans", which you seem to be familiar with.
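As a hypothetical sketch of what issuing such bans from a nightly job could look like (it assumes your VCL copies the request URL into a response header, e.g. set beresp.http.x-url = bereq.url;, so the ban only references obj.* and stays lurker-friendly):

```python
# Issue a lurker-friendly ("smart") ban via the varnishadm CLI.
import subprocess

def smart_ban(url_pattern):
    # Equivalent to: varnishadm ban "obj.http.x-url ~ <pattern>"
    # Matching on obj.http.x-url (set in VCL) avoids req.*, so the
    # ban lurker can evaluate and retire the ban in the background.
    subprocess.run(
        ["varnishadm", "ban", "obj.http.x-url", "~", url_pattern],
        check=True,
    )

# Example: invalidate everything under /products/ after the nightly import.
smart_ban("^/products/")
```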
All in all I think your setup sounds sane. Issuing a couple of hundred smart bans with a few million objects in cache will probably work just fine. There will of course be a bit of CPU load when the bans are deployed and the TTFB will increase somewhat, but I think you'll be fine. You might want to play somewhat with the parameters that control how the ban lurker works, but try the defaults first, they are pretty sane.
What the system should do: store and manage centralized, large (100-400 MB) text files.
What to store: lines from the text files (for some files the lines must be unique), metadata about each file (filename, comment, last update, etc.), and a position within the file (the same file may have different positions for different applications).
Operations: concurrently get lines from a file (100-400 lines per query) and add lines (also 100-400 lines); exporting is not critical and can be scheduled.
So which storage should I use? An SQL DBMS seems too slow, I think -- maybe a NoSQL solution?
NoSQL: Cassandra is an option (you can store it line by line or in groups of lines, I guess), Voldemort is not too bad, and you might even get away with using MongoDB, though I'm not sure it fits the "large files" requirement.
400 MiB will be served completely from cache on any non-ridiculous database server. As such, the choice of database does not really matter much; any database will be able to deliver it fast (though there are different kinds of "fast", it depends on what you need).
If you are really desperate for raw speed, you can go with something like redis. Again, 400 MiB is no challenge for that.
SQL might be slightly slower (but not by much) but has the huge advantage of being flexible. Flexibility, generality, and the presence of a "built-in programming language" are not free, but they should not have too bad an impact, because either way returning data from the buffer cache works more or less at the speed of RAM.
If you ever figure that you need a different database at a later time, SQL will let you do it with a few commands, or if you ever want something else you've not planned for, SQL will do. There is no guarantee that doing something different will be feasible with a simple key-value store.
Personally, I wouldn't worry about performance for such rather "small" datasets. Really, every kind of DB will serve that well, worry not. Come again when your datasets are several dozens of gigabytes in size.
If you are 100% sure that you will definitely never need the extras that a full-blown SQL database system offers, go with NoSQL to shave off a few microseconds. Otherwise, just stick with SQL to be on the safe side.
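As a minimal sketch of the SQL option, using Python's built-in sqlite3 (the schema below -- a files table plus a lines table with a uniqueness constraint -- is just an assumption derived from the requirements in the question, not a recommended design):

```python
import sqlite3

conn = sqlite3.connect("lines.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS files (
        id       INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        comment  TEXT,
        updated  TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE IF NOT EXISTS lines (
        file_id INTEGER NOT NULL REFERENCES files(id),
        lineno  INTEGER NOT NULL,
        content TEXT NOT NULL,
        UNIQUE (file_id, content)  -- "lines must be unique" within a file
    );
    CREATE INDEX IF NOT EXISTS idx_lines_file ON lines (file_id, lineno);
""")

def add_lines(file_id, start_lineno, lines):
    # INSERT OR IGNORE silently skips duplicate lines within a file.
    conn.executemany(
        "INSERT OR IGNORE INTO lines (file_id, lineno, content) VALUES (?, ?, ?)",
        [(file_id, start_lineno + i, line) for i, line in enumerate(lines)],
    )
    conn.commit()

def get_lines(file_id, start_lineno, count):
    # Fetch a block of 100-400 lines starting at a stored position.
    return [row[0] for row in conn.execute(
        "SELECT content FROM lines WHERE file_id = ? AND lineno >= ? "
        "ORDER BY lineno LIMIT ?",
        (file_id, start_lineno, count),
    )]
```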
EDIT:
To elaborate, consider that a "somewhat lower class" desktop has upwards of 2 GiB (usually rather 4 GiB) of RAM nowadays, and a typical "no big deal" server has something like 32 GiB. In that light, 400 MiB is nothing. The typical network uplink on a server (unless you are willing to pay extra) is 100 Mbit/s.
A 400 MiB text file might have somewhere around a million lines. That boils down to 6-7 memory accesses for a "typical SQL server", and to 2 memory accesses plus the time needed to calculate a hash for a "typical NoSQL server". Which is, give or take a few dozen cycles, the same in either case -- something around half a microsecond on a relatively slow system.
Add to that a few dozen microseconds the first time a query is executed, because it must be parsed, validated, and optimized, if you use SQL.
Network latency is somewhere around 2 to 3 milliseconds if you're lucky. That's 3 to 4 orders of magnitude more for establishing a connection, sending a request to the server, and receiving an answer. Compared to that, it seems ridiculous to worry whether the query takes 517 or 519 microseconds. If there are 1-2 routers in between, it becomes even more pronounced.
The same is true for bandwidth. You can in theory push around 119 MiB/s over a 1 Gbit/s link, assuming maximum-sized frames, no ACKs, absolutely no other traffic, and zero packet loss. RAM delivers in the tens of GiB per second without trouble.
In a GPU implementation we need to estimate its performance in terms of GFLOPS. The code is very basic, but my problem is how many FLOPs I should count for operations like "sqrt" or "mad": 1 or more?
Besides, I obtain 50 GFLOPS for my code if I count 1 FLOP for these operations, while the theoretical maximum for this GPU is 500 GFLOPS. Expressed as a percentage, that is 10%. In terms of speedup I get 100 times. So I think it is great, but 10% seems to be a rather low yield -- what do you think?
Thanks
The right answer is probably "it depends".
For pure comparative performance between code run on different platforms, I usually count transcendentals, sqrt, and mad each as one operation. In that sort of situation, the key performance metric is how long the code takes to run. It is almost impossible to do the comparison any other way -- how would you go about comparing the "FLOP" count of a hardware instruction for a transcendental which takes 25 cycles to retire, versus a math-library-generated stanza of fmad instructions which also takes 25 cycles to complete? Counting instructions or FLOPs becomes meaningless in such a case: both performed the desired operation in the same number of clock cycles, despite a different apparent FLOP count.
On the other hand, for profiling and performance tuning of a piece of code on given hardware, the FLOP count might be a useful metric to have. In GPUs, it is normal to look at FLOP or IOP count and memory bandwidth utilization to determine where the performance bottleneck of a given code lies. Having those numbers might point you in the direction of useful optimizations.
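If you do go the profiling route, the bookkeeping is simple arithmetic; here is a hypothetical sketch (the operation counts, the per-operation FLOP weights, and the 500 GFLOPS peak are made-up numbers chosen to reproduce the 50 GFLOPS / 10% figures from the question):

```python
# Compute achieved GFLOPS and fraction of theoretical peak from op counts.
def achieved_gflops(op_counts, flop_weights, elapsed_s):
    flops = sum(count * flop_weights[op] for op, count in op_counts.items())
    return flops / elapsed_s / 1e9

op_counts = {"mad": 4_000_000_000, "sqrt": 1_000_000_000}  # hypothetical kernel
flop_weights = {"mad": 2, "sqrt": 1}                       # one possible convention
peak_gflops = 500                                          # device's theoretical peak

gflops = achieved_gflops(op_counts, flop_weights, elapsed_s=0.18)
print(f"{gflops:.1f} GFLOPS = {100 * gflops / peak_gflops:.1f}% of peak")
```

Whether mad counts as 1 or 2 FLOPs (and what weight you give sqrt) is exactly the convention question you raise, so state the convention alongside the number.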