What is the meaning of compile time abstraction?

I came across the sentence "Compile time abstraction of runtime behaviour". What is compile-time abstraction here? My guess would be: a language tries to do everything it can at compile time, leaving room only for the things that can only happen at run time. For example:
int a;   // can be done at compile time, since you know the type right away
a = 5;   // the 5 reaches a only at run time (unless a is a const), because the
         // programmer might just as well have read the value from the command
         // line, stdin, a file, etc.

It seems that you are confused by "compile-time abstraction" in
Static type checking is a compile-time abstraction of the runtime behaviour of your program, ...
(quote from the paper "Static Typing Where Possible, Dynamic Typing When Needed" that you've linked to in your comment)
If the word "abstraction" were replaced by "approximation", would that make more sense to you?
Given an expression E of type T, we can say that T approximates, at compile time, the sort of values computed at run time (when evaluating E). For example, say you have an expression [2+2*3] of type [integer]; you could say that "this expression will evaluate to an integer".
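In C terms (a minimal sketch reusing the [2+2*3] example):

int n  = 2 + 2 * 3;   /* accepted: the type checker knows n holds an int,
                         without ever computing the value 8 */
int *p = 2 + 2 * 3;   /* rejected at compile time: an int is not a pointer */

The checker never runs the program; it only reasons about what sort of value each expression could produce.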


In a dependently typed programming language is Type-in-Type practical for programming?

In a language with dependent types you can have Type-in-Type, which simplifies the language and gives it a lot of power. This makes the language logically inconsistent, but that might not be a problem if you are interested only in programming and not in theorem proving.
The Cayenne paper (a dependently typed language for programming) says about Type-in-Type that "the unstratified type system would make it impossible during type checking to determine if an expression corresponds to a type or a real value and it would be impossible to remove the types at runtime" (section 2.4).
I have two questions about this:
In some dependently typed languages (like Agda) you can explicitly say which variables should be erased. In that case does Type-in-Type still cause problems?
We could extend the hierarchy one extra step with Kind where Type : Kind and Kind : Kind. This is still inconsistent but it seems that now you can know if a term is a type or a value. Is this correct?
the unstratified type system would make it impossible during type checking to determine if an expression corresponds to a type or a real value and it would be impossible to remove the types at runtime
This is not correct. Type-in-type prevents erasure of proofs, but it does not prevent erasure of types, assuming that we have parametric polymorphism with no typecase operation. Recent GHC Haskell is an example of a system which supports type-in-type, type erasure and type-level computation at the same time, but which does not support proof erasure. In dependently typed settings, we always know whether a term is a type; we just check whether its type is Type.
Type erasure is just erasure of all things with type Type.
Proof erasure is more complicated. Let's assume that we have a Prop universe like in Coq, which is intended to be a universe of computationally irrelevant types. Here, we can use some p : Bool = Int proof to coerce Bool-s to Int. If the language is consistent, there is no closed proof of Bool = Int so closed program execution never encounters such coercion. Thus, closed program execution is safe even if we erase all coercions.
If the language is inconsistent, and the only way of proving contradiction is by an infinite loop, there is a diverging closed proof of Bool = Int. Now, closed program execution can actually hit a proof of falsehood; but we can still have type safety, by requiring that coercion must evaluate the proof argument. Then, the program loops whenever we coerce by falsehood, so execution never reaches the unsound parts of the program.
Probably the key point here is that A = B : Prop supports coercion, which eliminates into a computationally relevant universe, while a parametric Type universe has no elimination principle at all and cannot influence computation.
Erasure can be generalized in several ways. For example, we may have any inductive data type with a single constructor (and no stored data which is not available from elsewhere, e.g. type indices), and try to erase every matching on that constructor. This is again sound if the language is total, and not otherwise. If we don't have a Prop universe, we can still do erasure like this. IIRC Idris does this a lot.
I just want to add a note that I believe is related to the question. Formality, a minimal proof language based on self-types, is non-terminating. I was involved in a Reddit discussion about whether Formality can segfault. One way that could happen is if you could prove Nat == String, then cast 42 :: Nat to 42 :: String and print it as if it were a string, for example. But that's not the case. While you can prove Nat == String in Formality:
nat_is_string: Nat == String
  nat_is_string // defined as itself: this "proof" loops forever and never reduces to refl
And you can use it to cast a Nat to a String:
nat_str: String
  42 :: rewrite x in x with nat_is_string
If you attempt to print nat_str, your program will not segfault; it will just hang. That's because you can't erase the equality evidence in Formality. To understand why, let's look at the definition of Equal.rewrite (which is used to cast 42 to String):
Equal.rewrite<A: Type, a: A, b: A>(e: Equal(A,a,b))<P: A -> Type>(x: P(a)): P(b)
  case e {
    refl: x
  } : P(e.b)
Once we erase the types, the normal form of rewrite becomes λe. λx. e(x). The e there is the equality evidence. In the example above, the normal form of nat_str is not 42, but nat_is_string(42). Since nat_is_string is an equality proof, it has two options: either it halts and becomes the identity function, in which case it just returns 42, or it hangs forever. In this case it doesn't halt, so nat_is_string(42) will never return 42. As such, it can't be printed, and any attempt to use it will cause your entire program to hang, but not segfault.
So, in short, the insight is that self types allow us to encode Equal and rewrite/subst, and to erase all the type information, but not the equality evidence itself.

Defining Rules for implicit C++ constexpr Style Compile Time Evaluation

After reading and watching a talk about constexpr in C++ I thought:
That's so cool!
Being a language nerd, I thought about defining a new programming language with the constexpr idea baked in implicitly, meaning no constexpr keyword. As far as I understand, the general rules for evaluating constexpr at compile time are:
The function or variable must be declared with the constexpr keyword.
If it is a function, its arguments must all be known at compile time. If it is a variable, it must be initialized when declared with a value known at compile time.
The value of the constexpr function/variable must be needed at compile time; otherwise the compiler is allowed to defer evaluation to run time.
But let's say that we remove conditions 1 and 3. Then the compiler would evaluate anything it could at compile time. But you might get the wrong result in some cases. Here is an example:
var file = new File("Data.txt");
...
...
In this case the path is known at compile time, so the file would be searched for on the machine the program is compiled on, not on the user's machine. In some cases that might be what the programmer needs, but often it isn't.
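In C terms, the distinction at stake looks roughly like this (a sketch; fopen stands in for the File constructor):

#include <stdio.h>

int example(void) {
    int seconds = 2 * 3600;            /* safe to fold at compile time: depends
                                          only on the source text */
    FILE *f = fopen("Data.txt", "r");  /* not safe to fold: the result depends on
                                          whichever machine actually runs the program */
    if (f) fclose(f);
    return seconds;
}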
So the question is: Could a set of additional rules be defined so that the compiler can make the distinction?

Constant vs Unchanged Variable?

Is there any advantage to using a constant (unchangeable) over just not changing a variable?
Depending on your language and compiler, a constant may get inlined and optimized when built. A variable will likely eat up stack space even if it never changes.
By making the value constant, the compiler can just substitute it. If you have x / 2, for example, the compiler can compute the value and use that instead of having to emit code to retrieve the value of x and then divide it by 2.
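For instance, in C (a sketch; whether the fold actually happens is up to the compiler):

const int x = 10;

int half(void) {
    return x / 2;   /* the compiler can fold this to 'return 5;' at compile time,
                       so no load of x and no division happen at run time */
}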
Also, you don't have to worry about accidentally changing the value. For example, in C-like languages you might accidentally type if (x = 2) when you meant if (x == 2) which will change the value of x if it's a variable.
Anyone maintaining your code in the future (including you) won't have to look around to see where (if anywhere) a constant is changed when finding a bug or adding a feature - they'll know right off the bat that it can't be changed.
In some programming languages, declaring something to be constant will allow a compiler to make optimizations which would not otherwise be possible. Further, declaring something to be constant can be a useful way of documenting that there are places in the code which might be broken should the value change.
Unfortunately, some programming languages sometimes do evil things with things that are declared constant. For example, in some .NET languages, if a value type which is declared read-only is passed by modifiable reference, the compiler will, rather than refusing to allow such an action, instead make a copy and pass that. Such implicit copying will impair efficiency, and may result in unexpected semantics.

What are the steps I need to do to complete this programming assignment?

I'm having a hard time understanding what I'm supposed to do. The only thing I've figured out is I need to use yacc on the cminus.y file. I'm totally confused about everything after that. Can someone explain this to me differently so that I can understand what I need to do?
INTRODUCTION:
We will use lex/flex and yacc/Bison to generate an LALR parser. I will give you a file called cminus.y. This is a yacc format grammar file for a simple C-like language called C-minus, from the book Compiler Construction by Kenneth C. Louden. I think the grammar should be fairly obvious.
The Yahoo group has links to several descriptions of how to use yacc. Now that you know flex it should be fairly easy to learn yacc. The only base type is int. An int is 4 bytes. Booleans are handled as ints, as in C. (Actually the grammar allows you to declare a variable as a type void, but let's not do that.) You can have one-dimensional arrays.
There are no pointers, but references to array elements should be treated as pointers (as in C).
The language provides for assignment, IF-ELSE, WHILE, and function calls and returns.
We want our compiler to output MIPS assembly code, and then we will be able to run it on SPIM. For a simple compiler like this with no optimization, an IR should not be necessary. We can output assembly code directly in one pass. However, our first step is to generate a symbol table.
SYMBOL TABLE:
I like Dr. Barrett’s approach here, which uses a lot of pointers to handle objects of different types. In essence the elements of the symbol table are identifier, type and pointer to an attribute object. The structure of the attribute object will differ according to the type. We only have a small number of types to deal with. I suggest using a linear search to find symbols in the table, at least to start. You can change it to hashing later if you want better performance. (If you want to keep in C, you can do dynamic allocation of objects using malloc.)
First you need to make a list of all the different types of symbols that there are (there are not many) and what attributes would be necessary for each. Be sure to allow for new attributes to be added, because we have not covered all the issues yet. Looking at the grammar, the question of parameter lists for functions is a place where some thought needs to be put into the design. I suggest more symbol table entries and pointers.
TESTING:
The grammar is correct, so if you take the existing grammar as it is and generate a parser, the parser will accept a correct C-minus program but it won't produce any output, because there are no code snippets associated with the rules.
We want to add code snippets to build the symbol table and print information as it is built.
When an identifier is declared, you should print the information being entered into the symbol table. If a previous declaration of the same symbol in the same scope is found, an error message should be printed.
When an identifier is referenced, you should look it up in the table to make sure it is there. An error message should be printed if it has not been declared in the current scope.
When closing a scope, warnings should be generated for unreferenced identifiers.
Your test input should be a correctly formed C-minus program, but at this point nothing much will happen on most of the production rules.
SCOPING:
The most basic approach has a global scope and a scope for each function declared.
The language allows declarations within any compound statement, i.e. scope nesting. Implementing this will require some kind of scope numbering or stacking scheme. (Stacking works best for a one-pass compiler, which is what we are building.)
(disclaimer) I don't have much experience with compiler classes (as in school courses on compilers) but here's what I understand:
1) You need to use the tools mentioned to create a parser which, when given an input, will tell the user whether the input is a correct program according to the grammar defined in cminus.y. I've never used yacc/bison so I don't know how it is done, but this is what seems to be done:
(input) file-of-some-sort containing the program to be parsed
(output) reply-of-some-sort which tells if the (input) is correct with respect to the provided grammar.
2) It also seems that the output needs to check for variable consistency (i.e., you can't use a variable you haven't declared, same as in any programming language), which is done via a symbol table. In short, every time something is declared, you add it to the symbol table. When you encounter an identifier, if it is not one of the language's keywords (like if, while or for), you look it up in the symbol table to determine whether it has been declared. If it is there, go on; if it's not, print some sort of error.
Note: point (2) is a simplified take on a symbol table; in reality there's more to them than I just wrote, but that should get you started.
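In C, the table and scoped lookup the assignment describes might be sketched like this (all names here are illustrative, not part of the assignment; malloc-based insertion is omitted):

#include <string.h>

/* one entry per identifier: name, kind, and a pointer to a
   kind-specific attribute object, as the assignment suggests */
typedef enum { SYM_INT, SYM_ARRAY, SYM_FUNC } SymKind;

typedef struct Symbol {
    char          *name;
    SymKind        kind;
    void          *attr;     /* points at an ArrayAttr, a FuncAttr, or NULL */
    struct Symbol *next;     /* linked list per scope: linear search to start */
} Symbol;

typedef struct { int size; } ArrayAttr;                    /* e.g. int a[10] */
typedef struct { int nparams; SymKind *ptypes; } FuncAttr; /* parameter list */

/* scopes stack up: push on entering a compound statement, pop on leaving */
typedef struct Scope {
    Symbol       *symbols;   /* this scope's declarations */
    struct Scope *parent;    /* enclosing scope; NULL at global scope */
} Scope;

static Scope *current;       /* innermost open scope */

/* referencing an identifier: search inner scopes outward */
Symbol *lookup(const char *name) {
    for (Scope *s = current; s != NULL; s = s->parent)
        for (Symbol *sym = s->symbols; sym != NULL; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;
    return NULL;             /* undeclared: report an error */
}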
I'd start with yacc examples - see what yacc can do and how it does it. I guess there must be some big example-complete-with-symbol-table out there which you can read to understand further.
Example:
Let's take input A:
int main()
{
    int a;
    a = 5;
    return 0;
}
And input B:
int main()
{
    int a;
    b = 5;
    return 0;
}
and assume we're using C syntax for parsing. Your parser should deem Input A all right, but should yell "b is undeclared" for Input B.
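Where would that "b is undeclared" check live? Roughly in an action on the rule that uses the identifier. A hedged sketch (the rule and helper names here are made up; the real rule names come from cminus.y, and lookup is the function from the sketch above):

assignment : ID '=' expression
        {
            /* assumes the lexer hands back the identifier's text in $1 */
            if (lookup($1) == NULL)
                yyerror("undeclared identifier");
        }
        ;

A declaration rule would do the mirror image: search only the current scope, report a redeclaration if found, and otherwise insert the new Symbol and print it.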

const vs enum in D

Check out this quote from here, towards the bottom of the page. (I believe the quoted comment about consts applies to invariants as well.)
Enumerations differ from consts in that they do not consume any space in the final outputted object/library/executable, whereas consts do.
So apparently value1 will bloat the executable, while value2 is treated as a literal and doesn't appear in the object file.
const int value1 = 0xBAD;
enum int value2 = 42;
Back in C++ I always assumed this was for legacy reasons, and old compilers that couldn't optimize away constants. But if this is still true in D, there must be a deeper reason behind this. Anyone know why?
Just like in C++, an enum in D seems to be a "conserved integer literal" (edit: amazing, D2 even supports floats and strings). Its enumerators have no location. They are immaterial: values without identity.
Placing enum like this is new in D2. At first glance it looks like it defines a new variable, but it is not an lvalue (so you also cannot take its address). An
enum int a = 10; // new in D2
is like
enum : int { a = 10 }
if I can trust my poor D knowledge. So, a in here is not an lvalue (no location, and you can't take its address). A const, however, has an address. If you have a global (not sure whether this is the right D terminology) const variable, the compiler usually can't optimize it away, because it doesn't know which modules can access that variable or could take its address. So it has to allocate storage for it.
I think if you have a local const, the compiler can still optimize it away just as in C++, because the compiler knows by looking at its scope whether or not anyone is interested in its address or whether everyone just takes its value.
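The same has-an-address/has-no-address split exists in C, which may make it concrete (a sketch; C's enum is more limited than D's, but the storage point is the same):

enum { VALUE = 42 };               /* pure compile-time constant: no storage in the
                                      object file, and it has no address */
static const int value1 = 0xBAD;   /* an object: it occupies storage and is an lvalue */

const int *p = &value1;            /* fine: a const object has an address   */
/* const int *q = &VALUE; */       /* error: an enumerator is not an lvalue */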
Your actual question, why enum/const works the same in D as in C++, seems to be unanswered. Sadly there exists no good reason for this choice whatsoever. I believe that this was just an unintentional side effect in C++ that became a de facto pattern. In D the same pattern was needed, and Walter Bright decided that it should be done as in C++ so that those coming from C++ would recognize what to do... In fact, before this rather (IMHO) silly decision, the keyword manifest was used instead of enum for this use case.
I think a good compiler/linker should still remove the constant. It's just that with the enum, it's actually guaranteed in the spec. The difference is primarily a matter of semantics. (Also keep in mind that 2.0 isn't complete yet)
The real purpose of enum being expanded syntactically to support single manifest constants, from what I understand, is that Don Clugston, a D template guru, was doing some crazy stuff with templates. He kept running into long build times, ridiculous compiler memory usage, etc., because the compiler kept creating internal data structures for const variables. One key thing about const/immutable variables compared to enums is that const/immutable variables are lvalues and can have their address taken. This means there is some extra overhead for the compiler. This usually doesn't matter, but when you're executing really complicated compile-time metaprograms, even if const variables are optimized away, this is still significant overhead at compile time.
It sounds like the enum value will be used "inline" in expressions, whereas the const will actually take storage and any expression referencing it will load the value from that storage.
This sounds similar to the difference between const and readonly in C#. The former is a compile-time constant and the latter is a run-time constant. This definitely affects versioning of assemblies (since an assembly referencing a const receives a copy of the value at compile time, and would not pick up a change to the value if the referenced assembly were rebuilt with a different value).