I want to allow yarn to update to all newer minor + patch versions, so I am using the caret.
If the latest version of "my-package" is "3.3.3", what are the behavioral differences between "my-package": "^3" and "my-package": "^3.2.1"?
If there is none, then is there a reasonable argument in favor of one or the other?
Related
Can you give the list of conditional instructions available in AVX2?
So far I've found the following:
_mm256_blendv_* for selection from a and b based on mask c
Are there something like conditional multiply and conditional add, etc.?
Also if instructions taking imm8 count (like _mm256_blend_*), could you explain how to get that imm8 after a vector comparision?
Intel Intrinsics Guide suggests gather, load and store operating with a mask. The immediate imm8 in blend_epi16 is not programmable unless self-modifying code or a jump table is considered an option. It's still possible to derive using pext from BMI2 to compact half of odd positioned bits from the result of movemask -- one gets 32 independent mask bits from movemask in AVX2, but blend_epi16 uses each bit to control four bytes--or one 16-bit variable in each bank.
AVX512 introduces optional zero-masking and merge-masking for almost all instructions.
Before that, to do a conditional add, mask one operand (with vandps or vandnps for the inverse) before the add (instead of vblendvps on the result). This is why packed-compare instructions/intrinsics produce all-zero or all-one elements.
0.0 is the additive identity element, so adding it is a no-op. (Except for IEEE semantics of -0.0 and +0.0, I forget how that works exactly).
Masking a constant input instead of blending the result avoids making the critical path longer, for something like conditionally adding 1.0.
Conditional multiply is more cumbersome because 0.0 is not the multiplicative identity. You need to multiply by 1.0 to keep a value unchanged, and you can't easily produce that with an AND or ANDN with a compare result. You can blendv an input, or you can do the multiply and blendv the output.
The alternative to blendv is at least 3 booleans, like AND/ANDN/OR, but that's usually not worth it. Although note that Haswell runs vblendvps and vpblendvb as 2 uops for port 5, so it's a potential bottleneck compared to using integer booleans that can run on any port. Skylake runs them vblendvps as 2 uops for any port. It could make sense to do something to avoid having a blendv on the critical path, though.
Masking an input operand or blending the result is generally how you do branchless SIMD conditionals.
BLENDV is usually at least 2 uops, so it's slower than an AND.
Immediate blends are much more efficient, but you can't use them, because the imm8 blend control has to be a compile-time constant embedded into the instruction's machine code. That's what immediate means in an assembly-language context.
Came across a following difference:
Example 1
var s = [];
var anArray = [1,2,3];
var iterateeFunction = function(n){
s.push(n);
}
_( anArray ).forEach( iterateeFunction )
Output:
v2.4.1 : s = [1,2,3]
v3.10.1: s = []
Example 2
_.first([1, 2, 3], 2);
Output:
v2.4.1 : [ 1, 2 ]
v3.10.1: 1
Not sure if this is really an incompatibility! #John-David Dalton: Please Advise: What was the need to change this? Compatibility issues just for this! I dont think its a sufficient cause.
This may be incorrect way of using lodash but on older versions it anyways worked.
I think there may be more such instances. So My question is
Are there anymore such instances? How big a risk it is, to change my project's current version to the latest version?
how to tackle such issues such that older functionalities remain intact + I also get to use the newer utility functions in v3.10.1?
I really like v3.10.1 and I would love to upgrade!
This difference has to do with the new lazy sequences added in version 3.
The result is that each/forEach processing, which are used for side-effects, has lead to much confusion and "bug" reports - see "Lazy each should cause immediate evaluation", "Each doesn't execute callbacks when wrapped", etc.
The change to lazy sequence processing may break any sequence-operation callback that expected immediate execution (ie. performed a side-effect) that is not otherwise immediately forced or materialized. The problem will most often be encountered in each/forEach usage - where the pass-through sequence is generally ignored - but it can also manifest in other 'unexpectedly delayed' executions.
To make the v3 code act as before, force or "unwrap" the sequence with value.
_( anArray ).forEach( iterateeFunction ).value()
// ^-- creates lazy sequence
// ^-- so this doesn't "run" the callback
// ^-- until unwrapped
or, better, use the non-wrapped form:
_( anArray, iterateeFunction )
or, better, with ES5-compatible functions (where it is an array as the object):
anArray.forEach(iterateFunction)
"foo": "~0.2.1"
"foo": ">= 0.2.1"
What's the difference?
>= means any version equal or bigger to the mentioned release. For example 42.42.42 would be fine with >= 0.2.1 requirement (no matter how incompatible it would be in practice). Also, it means that 0.2.1-beta is not fine, as beta was before final release.
~ means reasonably close to the specified version (as in, compatible). It takes the semantic versioning definition, so any major version jumps aren't considered to be compatible (higher than the last number in specified version). For example, 42.42.42 or 0.3.0 is not fine with ~0.2.1 requirement. However, 0.2.1-beta or 0.2.42 is allowed, as it's reasonably close to the final version.
Tilde means next significant release. In your case, it is equivalent to >= 2.0, < 3.0.
A simple rule-of-thumb way is that the ~ allows the last digit to go up. e.g. ~2.2 means 2.2 and any 2.x where x is 2 or above. ~2.1.3 on the is also any 2.1.x where x is 3 or above.
http://getcomposer.org/doc/01-basic-usage.md#package-versions
On a modern Pentium it is no longer possible to give branching hints to the processor it seems. Assuming that a profiling compiler such as gcc with profile-guided optimization gains information about likely branching behavior, what can it do to produce code that will execute more quickly?
The only option I know of is to move unlikely branches to the end of a function. Is there anything else?
Update.
http://download.intel.com/products/processor/manual/325462.pdf volume 2a, section 2.1.1 says
"Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for
a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes
and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable
behavior."
I don't know if these actually have any effect however.
On the other hand section 3.4.1. of http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf says
"
Compilers generate code that improves the efficiency of branch prediction in Intel processors. The Intel
C++ Compiler accomplishes this by:
keeping code and data on separate pages
using conditional move instructions to eliminate branches
generating code consistent with the static branch prediction algorithm
inlining where appropriate
unrolling if the number of iterations is predictable
With profile-guided optimization, the compiler can lay out basic blocks to eliminate branches for the most
frequently executed paths of a function or at least improve their predictability. Branch prediction need
not be a concern at the source level. For more information, see Intel C++ Compiler documentation.
"
http://cache-www.intel.com/cd/00/00/40/60/406096_406096.pdf says in "Performance Improvements with PGO "
"
PGO works best for code with many frequently executed branches that are difficult to
predict at compile time. An example is the code with intensive error-checking in which
the error conditions are false most of the time.
The infrequently executed (cold) errorhandling code can be relocated so the branch is rarely predicted incorrectly. Minimizing
cold code interleaved into the frequently executed (hot) code improves instruction cache
behavior."
There are two possible sources for the information you want:
There's Intel 64 and IA-32 Architectures Software Developer's Manual (3 volumes). This is a huge work which has evolved for decades. It's the best reference I know on a lot of subjects, including floating-point. In this case, you want to check volume 2, the instruction set reference.
There's Intel 64 and IA-32 Architectures Optmization Reference Manual. This will tell you in somewhat brief terms what to expect from each microarchitecture.
Now, I don't know what you mean by a "modern Pentium" processor, this is 2013, right? There aren't any Pentiums anymore...
The instruction set does support telling the processor if the branch is expected to be taken or not taken by a prefix to the conditional branch instructions (such as JC, JZ, etc). See volume 2A of (1), section 2.1.1 (of the version I have) Instruction Prefixes. There is the 2E and 3E prefixes for not taken and taken respectively.
As to whether these prefixes actually have any effect, if we can get that information, it will be on Optimization Reference Manual, the section for the microarchitecture you want (and I'm sure it won't be the Pentium).
Apart from using those, there is an entire section on the Optimization Reference Manual on that subject, that's section 3.4.1 (of the version I have).
It makes no sense to reproduce that here, since you can download the manual for free.
Briefly:
Eliminate branches by using conditional instructions (CMOV, SETcc),
Consider the static prediction algorithm (3.4.1.3),
Inlining
Loop unrolling
Also, some compilers, GCC, for instance, even when CMOV is not possible, often perform bitwise arithmetic to select one of two distinct things computed, thus avoiding branches. It does this particularly with SSE instructions when vectorizing loops.
Basically, the static conditions are:
Unconditional branches are predicted to be taken (... kind of expectable...)
Indirect branches are predicted not to be taken (because of a data dependency)
Backward conditionals are predicted to be taken (good for loops)
Forward conditionals are predicted not to be taken
You probably want to read the entire section 3.4.1.
If it's clear that a loop is rarely entered, or that it normally iterates very few times, then the compiler might avoid unrolling the loop, as doing so can add a lot of harmful complexity to handle edge conditions (an odd-number iterations, etc.). Vectorisation, in particular, should be avoided in such cases.
The compiler might rearrange nested tests, so that the one that most frequently results in a short-cut can be used to avoid performing a test on something with a 50% pass rate.
Register allocation can be optimised to avoid having a rarely-used block force register spill in the common case.
These are just some examples. I'm sure there are others I haven't thought of.
Off the top of my head, you have two options.
Option #1: Inform the compiler of the hints and let the compiler organize the code appropriately. For example, GCC supports the following ...
__builtin_expect((long)!!(x), 1L) /* GNU C to indicate that <x> will likely be TRUE */
__builtin_expect((long)!!(x), 0L) /* GNU C to indicate that <x> will likely be FALSE */
If you put them in macro form such as ...
#if <some condition to indicate support>
#define LIKELY(x) __builtin_expect((long)!!(x), 1L)
#define UNLIKELY(x) __builtin_expect((long)!!(x), 0L)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif
... you can now use them as ...
if (LIKELY (x != 0)) {
/* DO SOMETHING */
} else {
/* DO SOMETHING ELSE */
}
This leaves the compiler free to organize the branches according to static branch prediction algorithms, and/or if the processor and compiler support it, to use instructions that indicate which branch is more likely to be taken.
Option #2: Use math to avoid branching.
if (a < b)
y = C;
else
y = D;
This could be re-written as ...
x = -(a < b); /* x = -1 if a < b, x = 0 if a >= b */
x &= (C - D); /* x = C - D if a < b, x = 0 if a >= b */
x += D; /* x = C if a < b, x = D if a >= b */
Hope this helps.
It can make the fall-through (ie the case where a branch is not taken) the most used path. That has two big effects:
only 1 branch can be taken per clock, or on some processors even per 2 clocks, so if there are any other branches (there usually are, most code that matters is in a loop), a taken branch is bad news, a non-taken branch less so.
when the branch predictor is wrong, the code that it does have to execute is more likely to be in the code cache (or µop cache, where applicable). If it wasn't, that would have been a double-whammy of restarting the pipeline and waiting for a cache miss. This is less of an issue in most loops, since both sides of the branch are likely to be in the cache, but it comes into play in big loops and other code.
It can also decide whether to do if-conversion based on better data than a heuristic guess. If-conversions may seem like "always a good idea", but they're not, they're only "often a good idea". If the branch in the branching implementation is very well-predicted, the if-converted code can well be slower.
I need a compile time check for what version of glibc will be used.
The only compile time checks (ie #defines) I can find return the glibc date (__GLIBCXX__) and correspondence between the date and version seems iffy. How do you check at compile time for the version of glibc that will be used?
My code will compile and run on several systems, including a very old one. In particular I am interested in using malloc_info (see http://man7.org/linux/man-pages/man3/malloc_info.3.html). This was added to glibc in version 2.10. The program will be used on the same (or an identical system) it was built on.
I think what you're looking for is __GLIBC__ and __GLIBC_MINOR__, which represent an int of the major and minor version numbers of the GNU C Library. Have a look at this(archive link) for more details.
So if __GLIBC__ is greater than 2, or __GLIBC__ is equal to 2 and __GLIBC_MINOR__ is greater than or equal to 10, then malloc_info() should work.