In Javascriptcore, question about clobberWorld(), read(World)/write(Heap) and some terminologies - webkit

I currently analyze JavaScriptCore's code base.
I know that clobberWorld() stand for notifying current operation is effectful.
But, some article says that read(World) and write(Heap) do same thing as clobberWorld().
What read() and write() means? And what World and Heap means?
Last question is about terminologies.
In DFG, what is the full name of AI and CSE??

AI means AbstractInterpreter and CSE means Common SubExpression Elimination. As for read and write, it models a DFG IR node and representing side effect, and in LICM phase you can see it more.

Related

Writing a computer program that will analyse the quality of another computer program?

I'm interested in knowing the possibilities of this. I'm working on a project that validates the skills of a software engineer, currently we validate skills based on code reviews by credentialed developers.
I know the answer if far more completed that the question, I couldn't imagine how complex the program would have to be able to analyse complex code but I am starting with basic programming interview questions.
For example, the classic FizzBuzz question:
Write a program that prints the numbers from 1 to 20. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
and below is the solution in python:
for num in range(1,21):
string = ""
if num % 3 == 0:
string = string + "Fizz"
if num % 5 == 0:
string = string + "Buzz"
if num % 5 != 0 and num % 3 != 0:
string = string + str(num)
print(string)
Question is, can we programatically analyse the validity of this solution?
I would like to know if anyone has attempted this, and if there are current implementations I can take a look at. Also if anyone has used z3, and if it is something I can use to solve this problem.
As Vilx- mentioned, correctness of programs (including whether or not they terminate) is in general known to be undecidable. However, tools such as Z3 show that relevant concrete cases can still be reasoned about, despite the general undecidability of the problem.
Static analysers typically look for "simple" problems (e.g. null dereferences, out-of-bounds accesses, numerical overflows), but are comparably fast and require little user guidance (think of guidance in the spirit of adding type annotations to your code).
A non-exhaustive (and biased) list of keywords to search for: "static analysers", "abstract interpretation"; "facebook infer", "airbus absint", "juliasoft".
Verifiers attempt to prove much richer properties, in particular functional correctness, e.g. "does this sort-implementation really sort my array (and not do anything else, e.g. deallocate some global memory or update an element reachable from the array)?" or "does that crypto-implementation really implement the crypto protocol it promises to implement?". This is a much harder task and tools from that line of research are typically rather slow, require expert users with a background in formal verification and significant user guidance.
A non-exhaustive (and biased) list of keywords to search for: "verification", "hoare logic", "separation logic"; "eth viper", "microsoft dafny", "kuleuven verifast", "microsoft f*".
Other formal methods exist, e.g. refinement (or correct-by-construction), but with even less tool support and, as far as I know, industry acceptance.
Let's put it this way: it's been mathematically proven that you CANNOT determine if a program will ever terminate. So if you want a mathematically perfect answer of if the target program is correct, you're doomed.
That said, you can still do unit tests and "linting" which will give you plenty of intetesting insights.
But for simple pieces of code like the FizzBuzz, I think that eyeballing by an experienced dev will probably bring the best results.

Evaluate Formal tools

What are all the factors one should consider in order to compare 3 formal verification tools?
Eg: Jaspergold, Onespin, Incisive.
From my little research, Jaspergold comes on top. But i want to do it myself on a project.
I have noted down some points such as
1.Supported languages(vhdl, sv, verilog, sva, psl,etc)
2.GUI
3.Capability(how much big design can they handle)
4.Number of Evaluation cycles
5.Performance(How fast they find proof or counter example)
With what other features can i extend this list?
Thanks!

Prolog game programming board evaluation

I have created a game, (4 in a row), in prolog. My heuristic function requires me to know how many Player's and Opponent's chips are in each possible 4-row combination on the board. The method I am using is as follows (in psuedocodish):
I have 1 list of all possible fours of the board (ComboList) =of the form==> [[A,B,C,D]|Rest].
I have 1 list of all the moves of the 1st player (List1) =of the form==> [[1],[7],[14]]
And 1 for opponent's moves (List2).
Step 1: obtain the first combo from ComboList, 2:
Check all of List1 to see how many are in this combo, 3:
Check all of List2 to see how many are in this combo,
Move onto the next combo from ComboList and start over...
This PROCESS takes waay too much runtime for what is required.
Please can someone suggest something better and more efficient! Much thanks in advance!
The following code uses member/3, which has also to become
known as nth1/3. See here:
http://storage.developerzen.com/fourrow.pro.txt
The predicate is nowadays found in library(lists) and has
possibly native support or a fast implementation:
http://www.swi-prolog.org/pldoc/man?predicate=nth1/3
But I guess asserting some facts and relying on argument
indexing might get you an even better result. See for
example here:
http://www.mxro.de/applications/four-in-a-row
Hope this helps.
Bye

Custom EQ AudioUnit on iOS

The only effect AudioUnit on iOS is the "iTunes EQ", which only lets you use EQ pre-sets. I would like to use a customized eq in my audio graph
I came across this question on the subject and saw an answer suggesting using this DSP code in the render callback. This looks promising and people seem to be using this effectively on various platforms. However, my implementation has a ton of noise even with a flat eq.
Here's my 20 line integration into the "MixerHostAudio" class of Apple's "MixerHost" example application (all in one commit):
https://github.com/tassock/mixerhost/commit/4b8b87028bfffe352ed67609f747858059a3e89b
Any ideas on how I could get this working? Any other strategies for integrating an EQ?
Edit: Here's an example of the distortion I'm experiencing (with the eq flat):
http://www.youtube.com/watch?v=W_6JaNUvUjA
In the code in EQ3Band.c, the filter coefficients are used without being initialized. The init_3band_state method initialize just the gains and frequencies, but the coefficients themselves - es->f1p0 etc. are not initialized, and therefore contain some garbage values. That might be the reason for the bad output.
This code seems wrong in more then one way.
A digital filter is normally represented by the filter coefficients, which are constant, the filter inner state history (since in most cases the output depends on history) and the filter topology, which is the arithmetic used to calculate the output given the input and the filter (coeffs + state history). In most cases, and of course when filtering audio data, you expect to get 0's at the output if you feed 0's to the input.
The problems in the code you linked to:
The filter coefficients are changed in each call to the processing method:
es->f1p0 += (es->lf * (sample - es->f1p0)) + vsa;
The input sample is usually multiplied by the filter coefficients, not added to them. It doesn't make any physical sense - the sample and the filter coeffs don't even have the same physical units.
If you feed in 0's, you do not get 0's at the output, just some values which do not make any sense.
I suggest you look for another code - the other option is debugging it, and it would be harder.
In addition, you'd benefit from reading about digital filters:
http://en.wikipedia.org/wiki/Digital_filter
https://ccrma.stanford.edu/~jos/filters/

What are the most hardcore optimisations you've seen?

I'm not talking about algorithmic stuff (eg use quicksort instead of bubblesort), and I'm not talking about simple things like loop unrolling.
I'm talking about the hardcore stuff. Like Tiny Teensy ELF, The Story of Mel; practically everything in the demoscene, and so on.
I once wrote a brute force RC5 key search that processed two keys at a time, the first key used the integer pipeline, the second key used the SSE pipelines and the two were interleaved at the instruction level. This was then coupled with a supervisor program that ran an instance of the code on each core in the system. In total, the code ran about 25 times faster than a naive C version.
In one (here unnamed) video game engine I worked with, they had rewritten the model-export tool (the thing that turns a Maya mesh into something the game loads) so that instead of just emitting data, it would actually emit the exact stream of microinstructions that would be necessary to render that particular model. It used a genetic algorithm to find the one that would run in the minimum number of cycles. That is to say, the data format for a given model was actually a perfectly-optimized subroutine for rendering just that model. So, drawing a mesh to the screen meant loading it into memory and branching into it.
(This wasn't for a PC, but for a console that had a vector unit separate and parallel to the CPU.)
In the early days of DOS when we used floppy discs for all data transport there were viruses as well. One common way for viruses to infect different computers was to copy a virus bootloader into the bootsector of an inserted floppydisc. When the user inserted the floppydisc into another computer and rebooted without remembering to remove the floppy, the virus was run and infected the harddrive bootsector, thus permanently infecting the host PC. A particulary annoying virus I was infected by was called "Form", to battle this I wrote a custom floppy bootsector that had the following features:
Validate the bootsector of the host harddrive and make sure it was not infected.
Validate the floppy bootsector and
make sure that it was not infected.
Code to remove the virus from the
harddrive if it was infected.
Code to duplicate the antivirus
bootsector to another floppy if a
special key was pressed.
Code to boot the harddrive if all was
well, and no infections was found.
This was done in the program space of a bootsector, about 440 bytes :)
The biggest problem for my mates was the very cryptic messages displayed because I needed all the space for code. It was like "FFVD RM?", which meant "FindForm Virus Detected, Remove?"
I was quite happy with that piece of code. The optimization was program size, not speed. Two quite different optimizations in assembly.
My favorite is the floating point inverse square root via integer operations. This is a cool little hack on how floating point values are stored and can execute faster (even doing a 1/result is faster than the stock-standard square root function) or produce more accurate results than the standard methods.
In c/c++ the code is: (sourced from Wikipedia)
float InvSqrt (float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1); // Now this is what you call a real magic number
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
A Very Biological Optimisation
Quick background: Triplets of DNA nucleotides (A, C, G and T) encode amino acids, which are joined into proteins, which are what make up most of most living things.
Ordinarily, each different protein requires a separate sequence of DNA triplets (its "gene") to encode its amino acids -- so e.g. 3 proteins of lengths 30, 40, and 50 would require 90 + 120 + 150 = 360 nucleotides in total. However, in viruses, space is at a premium -- so some viruses overlap the DNA sequences for different genes, using the fact that there are 6 possible "reading frames" to use for DNA-to-protein translation (namely starting from a position that is divisible by 3; from a position that divides 3 with remainder 1; or from a position that divides 3 with remainder 2; and the same again, but reading the sequence in reverse.)
For comparison: Try writing an x86 assembly language program where the 300-byte function doFoo() begins at offset 0x1000... and another 200-byte function doBar() starts at offset 0x1001! (I propose a name for this competition: Are you smarter than Hepatitis B?)
That's hardcore space optimisation!
UPDATE: Links to further info:
Reading Frames on Wikipedia suggests Hepatitis B and "Barley Yellow Dwarf" virus (a plant virus) both overlap reading frames.
Hepatitis B genome info on Wikipedia. Seems that different reading-frame subunits produce different variations of a surface protein.
Or you could google for "overlapping reading frames"
Seems this can even happen in mammals! Extensively overlapping reading frames in a second mammalian gene is a 2001 scientific paper by Marilyn Kozak that talks about a "second" gene in rat with "extensive overlapping reading frames". (This is quite surprising as mammals have a genome structure that provides ample room for separate genes for separate proteins.) Haven't read beyond the abstract myself.
I wrote a tile-based game engine for the Apple IIgs in 65816 assembly language a few years ago. This was a fairly slow machine and programming "on the metal" is a virtual requirement for coaxing out acceptable performance.
In order to quickly update the graphics screen one has to map the stack to the screen in order to use some special instructions that allow one to update 4 screen pixels in only 5 machine cycles. This is nothing particularly fantastic and is described in detail in IIgs Tech Note #70. The hard-core bit was how I had to organize the code to make it flexible enough to be a general-purpose library while still maintaining maximum speed.
I decomposed the graphics screen into scan lines and created a 246 byte code buffer to insert the specialized 65816 opcodes. The 246 bytes are needed because each scan line of the graphics screen is 80 words wide and 1 additional word is required on each end for smooth scrolling. The Push Effective Address (PEA) instruction takes up 3 bytes, so 3 * (80 + 1 + 1) = 246 bytes.
The graphics screen is rendered by jumping to an address within the 246 byte code buffer that corresponds to the right edge of the screen and patching in a BRanch Always (BRA) instruction into the code at the word immediately following the left-most word. The BRA instruction takes a signed 8-bit offset as its argument, so it just barely has the range to jump out of the code buffer.
Even this isn't too terribly difficult, but the real hard-core optimization comes in here. My graphics engine actually supported two independent background layers and animated tiles by using different 3-byte code sequences depending on the mode:
Background 1 uses a Push Effective Address (PEA) instruction
Background 2 uses a Load Indirect Indexed (LDA ($00),y) instruction followed by a push (PHA)
Animated tiles use a Load Direct Page Indexed (LDA $00,x) instruction followed by a push (PHA)
The critical restriction is that both of the 65816 registers (X and Y) are used to reference data and cannot be modified. Further the direct page register (D) is set based on the origin of the second background and cannot be changed; the data bank register is set to the data bank that holds pixel data for the second background and cannot be changed; the stack pointer (S) is mapped to graphics screen, so there is no possibility of jumping to a subroutine and returning.
Given these restrictions, I had the need to quickly handle cases where a word that is about to be pushed onto the stack is mixed, i.e. half comes from Background 1 and half from Background 2. My solution was to trade memory for speed. Because all of the normal registers were in use, I only had the Program Counter (PC) register to work with. My solution was the following:
Define a code fragment to do the blend in the same 64K program bank as the code buffer
Create a copy of this code for each of the 82 words
There is a 1-1 correspondence, so the return from the code fragment can be a hard-coded address
Done! We have a hard-coded subroutine that does not affect the CPU registers.
Here is the actual code fragments
code_buff: PEA $0000 ; rightmost word (16-bits = 4 pixels)
PEA $0000 ; background 1
PEA $0000 ; background 1
PEA $0000 ; background 1
LDA (72),y ; background 2
PHA
LDA (70),y ; background 2
PHA
JMP word_68 ; mix the data
word_68_rtn: PEA $0000 ; more background 1
...
PEA $0000
BRA *+40 ; patched exit code
...
word_68: LDA (68),y ; load data for background 2
AND #$00FF ; mask
ORA #$AB00 ; blend with data from background 1
PHA
JMP word_68_rtn ; jump back
word_66: LDA (66),y
...
The end result was a near-optimal blitter that has minimal overhead and cranks out more than 15 frames per second at 320x200 on a 2.5 MHz CPU with a 1 MB/s memory bus.
Michael Abrash's "Zen of Assembly Language" had some nifty stuff, though I admit I don't recall specifics off the top of my head.
Actually it seems like everything Abrash wrote had some nifty optimization stuff in it.
The Stalin Scheme compiler is pretty crazy in that aspect.
I once saw a switch statement with a lot of empty cases, a comment at the head of the switch said something along the lines of:
Added case statements that are never hit because the compiler only turns the switch into a jump-table if there are more than N cases
I forget what N was. This was in the source code for Windows that was leaked in 2004.
I've gone to the Intel (or AMD) architecture references to see what instructions there are. movsx - move with sign extension is awesome for moving little signed values into big spaces, for example, in one instruction.
Likewise, if you know you only use 16-bit values, but you can access all of EAX, EBX, ECX, EDX , etc- then you have 8 very fast locations for values - just rotate the registers by 16 bits to access the other values.
The EFF DES cracker, which used custom-built hardware to generate candidate keys (the hardware they made could prove a key isn't the solution, but could not prove a key was the solution) which were then tested with a more conventional code.
The FSG 2.0 packer made by a Polish team, specifically made for packing executables made with assembly. If packing assembly isn't impressive enough (what's supposed to be almost as low as possible) the loader it comes with is 158 bytes and fully functional. If you try packing any assembly made .exe with something like UPX, it will throw a NotCompressableException at you ;)