I was looking at the source of Clojure's memoize.
Coming from languages like C++/Python, this part hit me hard:
(let [mem (atom {})] (fn [& args] (if-let [e (find @mem args)] ...
I realize that memoize returns a function, but for storing state it uses a local "variable" mem. But after memoize returns the function, shouldn't that outer let vanish from scope? How can the function still refer to mem?
Why doesn't Clojure delete that outer variable, and how does it manage the variable names? Suppose I make another memoized function; then memoize uses another mem. Doesn't that name clash with the earlier mem?
P.S.: I figured there must be something special happening in there that prevents that, so I wrote myself an easier version, which goes like http://ideone.com/VZLsJp , but it still works just like memoize.
Objects are garbage collectable if no thread can access them, as per usual for JVM languages. If a thread has a reference to the function returned by memoize and the function has a reference to the atom in mem then transitively the atom is still accessible.
But after memoize returns the function, shouldn't that outer let vanish from scope? How can the function still refer to mem?
This is what is called a closure. If a function is defined using a name from its environment, it keeps a reference to that value afterwards, even if the defining environment is gone and the function is the only thing that still has access to it.
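A tiny Clojure sketch of a closure (make-adder and add5 are made-up names, not from memoize itself):

;; make-adder returns a fn that still refers to n after make-adder has returned.
(defn make-adder [n]
  (fn [x] (+ x n)))

(def add5 (make-adder 5))
(add5 10) ;; => 15  -- n is kept alive by the returned closure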
Suppose I make another memoized function; then memoize uses another mem. Doesn't that name clash with the earlier mem?
No, except possibly by confusing programmers. Having multiple scopes each declare their own name mem is very much possible and the usual rules of lexical scoping are used to determine which is meant when mem is read. There are some trickier edge cases such as
(let [foo 2]
  (let [foo (fn [] foo)] ;; in the function definition, foo has the value from the outer scope,
                         ;; because the second let has not yet bound the name
    (foo)))              ;; => 2
but generally the idea is pretty simple - the value of a name is the one given in the definition closest in the program text to the place it is used - either in the local scope or in the closest outer scope.
Different invocations of memoize create different closures so that the name mem refers to different atoms in each returned function.
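To make that concrete, here is memoize written out in full (essentially the source quoted above, renamed my-memoize so it doesn't shadow clojure.core/memoize), plus two independent memoized functions:

;; Roughly the memoize source from the question, completed:
(defn my-memoize [f]
  (let [mem (atom {})]                 ; a fresh atom for each call to my-memoize
    (fn [& args]
      (if-let [e (find @mem args)]     ; cache hit?
        (val e)
        (let [ret (apply f args)]
          (swap! mem assoc args ret)   ; remember the result
          ret)))))

;; Two memoized functions, two independent closures, two independent mem atoms:
(def f (my-memoize inc))
(def g (my-memoize dec))
(f 1) ;; => 2, cached in f's mem
(g 1) ;; => 0, cached in g's mem; no clash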
Related
I have recently encountered a confusing dichotomy regarding structures in Lisp.
When creating a structure with (defstruct), we specify the slots by keyword (:slotname). But when accessing it, we use local symbols ('slotname).
Why? This makes no sense to me.
Also, doesn't this pollute the keyword package every time you declare a structure?
If I try to access the slots by keyword, I get confusing errors like:
When attempting to read the slot's value (slot-value), the slot :BALANCE is missing from the object #S(ACCOUNT :BALANCE 1000 :CUSTOMER-NAME "John Doe").
I don't understand this message. It seems to be telling me that something right under my nose doesn't exist.
I have tried declaring the structure using local symbols, and also with uninterned symbols (#:balance), and these don't work.
DEFSTRUCT is designed in the language standard in this way:
slot-names are not exposed
there is no specified way to get a list of slot-names of a structure class
there is no specified way to access a slot via a slot-name
thus at runtime there might be no slot-names
access to slots is optimized with accessor functions: static structure layout, inlined accessor functions, ...
Also explicitly:
slot names are not allowed to be duplicates under string=. Thus slots foo::a and bar::a in the same structure class are not allowed
the effects of redefining a structure are undefined
The goal of structures is to provide fast record-like objects without costly features like redefinition, multiple inheritance, etc.
Thus using SLOT-VALUE to access structure slots is an extension of implementations, not a part of the defined language. SLOT-VALUE was introduced when CLOS was added to Common Lisp. Several implementations provide a way to access a structure slot via SLOT-VALUE. This then also requires that the implementation has kept track of slot names of that structure.
SLOT-VALUE is simply a newer API function, coming from CLOS for CLOS. Structures are an older feature, which was defined already in the first version of Common Lisp defined by the book CLtL1.
You used make-instance to create a class instance, and then you are showing a struct; I am confused.
Structs automatically build their accessor functions. You create one with make-account, and then you'd use account-balance instead of slot-value.
I don't know what the expected behavior of using make-instance with a struct is. While it seemed to work on my SBCL, you are not using structs right.
(defstruct account
  (balance))
(make-account :balance 100)
#S(ACCOUNT :BALANCE 100)
(account-balance *)
100
With classes, you are free to name your accessor functions as you want.
;; (pseudocode)
(defclass bank-account ()
  ((balance :initform nil       ;; otherwise it's unbound
            :initarg :balance   ;; to use with make-instance :balance
            :accessor balance   ;; or account-balance, as you wish
            )))
(make-instance 'bank-account :balance 200)
#<BANK-ACCOUNT {1009302A33}>
(balance *)
200
https://lispcookbook.github.io/cl-cookbook/data-structures.html#structures
http://www.lispworks.com/documentation/HyperSpec/Body/m_defstr.htm
the slot :BALANCE is missing from the object #S(ACCOUNT :BALANCE 1000 :CUSTOMER-NAME "John Doe").
The slot name is actually balance and the representation uses the generated initargs. With the class object, the error message might be less confusing:
When attempting to read the slot's value (slot-value), the slot :BALANCE is missing from the object #<BANK-ACCOUNT {1009302A33}>.
First of all, see Rainer's excellent answer on structures. In summary:
Objects defined with defstruct have named accessor functions, not named slots. Further, the field names of these objects mentioned in the defstruct form must be distinct as strings, and so keywords are completely appropriate for use in constructor functions. Any use of slot-value on such objects is implementation-dependent, and indeed whether or not named slots exist at all is entirely implementation-dependent.
You generally want keyword arguments for the constructors for the reasons you want keyword arguments elsewhere: you don't want to have to painfully provide 49 optional arguments so you can specify the 50th. So it's reasonable that keyword constructors are what defstruct gives you by default. But you can completely override this if you want to, using a BOA constructor, which defstruct allows. You can even have no constructor at all! As an example, here is a rather perverse structure constructor: it does use keyword arguments, but not the ones you might naively expect.
(defstruct (foo
            (:constructor
             make-foo (&key ((:y x) 1) ((:x y) 2))))
  y
  x)
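For example, given the definition above, the keyword :y fills the x slot and :x fills the y slot (output shown as a typical Lisp would print it):

(make-foo :y 10)        ; => #S(FOO :Y 2 :X 10)
(make-foo :x 10 :y 20)  ; => #S(FOO :Y 10 :X 20)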
So the real question revolves around classes defined with defclass, which usually do have named slots and where slot-value does work.
So in this case there are really two parts to the answer.
Firstly, as before, keyword arguments are really useful for constructors because no-one wants to have to remember 932 optional argument defaults. But defclass provides complete control over the mapping between keyword arguments and the slots they initialise, or whether they initialise slots at all or instead are passed to some initialize-instance method. You can just do anything you want here.
Secondly, you really want slot names for objects of classes defined with defclass to be symbols which live in packages. You definitely do not want this to happen:
(in-package "MY-PACKAGE")
(use-package "SOMEONE-ELSES-PACKAGE")

(defclass my-class (someone-elses-class)
  ((internal-implementation-slot ...)))
only to discover that you have just modified the definition of the someone-elses-package::internal-implementation-slot slot in someone-elses-class. That would be bad. So slot names are symbols which live in packages and the normal namespace control around packages works for them too: my-package::internal-implementation-slot and someone-elses-package::internal-implementation-slot are not (usually) the same thing.
Additionally, the whole keyword-symbol-argument / non-keyword-symbol-variable thing is, well, let's just say well-established:
(defun foo (&key (x 1))
  ... x ...)
Finally note, of course, that keyword arguments don't actually have to be keywords: it's generally convenient that they are because you need quotes otherwise, but:
(defclass silly ()
  ((foo :initarg foo
        :accessor silly-foo)
   (bar :initarg bar
        :accessor silly-bar)))
And now
> (silly-foo (make-instance 'silly 'bar 3 'foo 9))
9
In Rust, variables are immutable by default, i.e., they don't vary, yet they are not constants (as noted here).
Do they retain the name "variable" just by convention, or is there another reason why the term "variable" is maintained?
It should be noted that the term mut in Rust was hotly debated before stabilization, with some arguing that it should be called excl or uniq. The issue is that the mut in let mut x and the mut in &mut x are two completely different things.
let mut x declares that x is mutable, in the sense that it can be re-assigned, but also that one can take a &mut reference to it, which is best called an exclusive or unique reference. It is quite possible in Rust to mutate through a shared reference, with std::cell::Cell for instance, and not all operations that require an exclusive reference involve mutation. An operation that requires an exclusive reference is simply one that would be unsafe with a shared one; Cell is designed so that this is not unsafe, by strictly controlling the conditions under which mutation can occur.
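A minimal sketch of that Cell point (nothing here beyond std::cell::Cell): c is never declared mut and we only hold a shared reference, yet mutation succeeds:

use std::cell::Cell;

fn main() {
    let c = Cell::new(1);
    let shared: &Cell<i32> = &c; // a shared reference, not &mut
    shared.set(2);               // interior mutation through the shared reference
    assert_eq!(c.get(), 2);
}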
In theory, the two roles of let mut x could have been given different keywords, but they are compressed into one for simplicity. Rust could have been designed with mut and excl as separate keywords, allowing let excl x: a variable from which one could take an exclusive reference, but which one could not mutate.
One can also have variables that are not declared with mut, in particular function parameters. In a signature like fn func(x: u32), x is not mutable, but it is a variable, because a different x can be passed on every call.
The let mut x kind of "mutable" is purely a lint and, in theory, unnecessary for Rust to work: any currently working Rust program will continue to work if all non-mutable variables are made mutable. It's simply considered bad practice, and the compiler will warn the programmer whenever they make a variable mutable that doesn't need to be; this helps catch unintended bugs. This is absolutely not the case with exclusive and shared references, which must be distinguished and are more than just a lint.
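For example, the compiler's unused_mut lint fires when a variable is marked mut but never needs to be; the program still compiles and runs:

fn main() {
    let mut x = 5;      // warning: variable does not need to be mutable (unused_mut)
    println!("{}", x);  // x is never reassigned, so `mut` is just noise
}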
Here "variable" means "factor involved in computation" not "varying". This is from the mathematical principle where expressions like f(x) include x, a variable, as a part of the equation.
In Rust, like with other languages, you'll need variables (e.g. input) that affects how the program runs, otherwise your program would only ever behave in a singular, specific way, producing the same output each time.
You'll need to think of what variables change during processing and which do not. Those that do not need to change do not need to be declared mutable.
Regardless of if or when they change, they're still considered variables.
In C++ you'll have things like const int x which is a constant (read-only) variable, so the term can take on all sorts of specific meanings.
Is the term immutable variable just a convention?
By definition, every definition of a word is a convention. Language and the meanings of words change over time and differ from person to person; ask 100 people and you may get 100 different definitions of one word. That is why scientific papers often start by defining the terms that could be misunderstood, trying to clarify as much as possible. Rust is no different, which is why we have The Reference.
It has a specific section on variables:
A variable is a component of a stack frame, either a named function parameter, an anonymous temporary, or a named local variable.
A local variable (or stack-local allocation) holds a value directly, allocated within the stack's memory. The value is a part of the stack frame.
Local variables are immutable unless declared otherwise. For example: let mut x = ....
Function parameters are immutable unless declared with mut. The mut keyword applies only to the following parameter. For example: |mut x, y| and fn f(mut x: Box<i32>, y: Box<i32>) declare one mutable variable x and one immutable variable y.
Local variables are not initialized when allocated. Instead, the entire frame worth of local variables are allocated, on frame-entry, in an uninitialized state. Subsequent statements within a function may or may not initialize the local variables. Local variables can be used only after they have been initialized through all reachable control flow paths.
So there is not much to add: "variable" in Rust is clearly defined, and it doesn't matter if your own definition doesn't match it, or if you find another definition of "variable" that doesn't match Rust's. In the context of Rust, a variable is exactly that. Asking for opinions about this choice would be off-topic as opinion-based. That said, the Wikipedia definitions make Rust's usage quite standard, from both the mathematical and the computer-science points of view:
Variable (computer science), a symbolic name associated with a value and whose associated value may be changed
Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many sciences
The C language resolves scope bindings at compile time (a variable reference gets a fixed address that never changes); that is an example of static scoping.
Elisp resolves bindings at run time (a variable points to the top of its own personal stack of bindings; special forms like let/defun/... push a binding onto that stack on entry and pop it on exit, so a capture made in between sees the modified binding); that is an example of dynamic scoping.
What type of binding is used in lexical scoping?
Languages such as Common Lisp, Python, R, and JavaScript state that they implement lexical scoping.
What techniques are used to implement it in those languages?
I have heard about environments that are carried along with functions. If I understand correctly, when are those environments created?
Is it possible, or usual, for the developer to construct an environment and bind it to a function manually? Something like call(bind_env(make_env({buf: [...], len: 5}), myfunc))
In short, lexical scoping is resolved at compile time (or more precisely, at the time when the function definition is evaluated). Lexical scoping can also be fully static: this is how the ML languages (SML, OCaml, Haskell) do it.
Environments
Every function is defined in some environment, and each function creates its own local environment nested in the enclosing one. The top-level environment is where all the usual variables, functions (+, -, sin, map, etc.) and syntax (relevant for languages that can extend syntax, like Common Lisp, Scheme, Clojure) are defined.
A function's arguments and all of its local variables live in its local environment, which is nested in the environment enclosing the definition (the top level, or another function's environment). If the function references a variable or a function that is not defined in the local environment (one that is free in this environment), it is looked up in the enclosing environment of the function definition (and, if it is not found there, in the enclosing environment of that environment, and so on). This is different from dynamic scoping, where the value would be found in the environment from which the function is called.
I am going to illustrate this using Scheme:
y is free in this definition
(define (foo x)
  (+ x y))
Here y is defined in the top-level environment:
(define y 1)
Introduce a local y; foo will still use the y from the enclosing (top-level) environment of its definition. Hence, the result is 2 and not 11:
(let ((y 10))
  (foo 1))
=> 2
You can also define a function (or a procedure in Scheme's terms) with a local environment enclosing it:
(define bar
  (let ((y 100))
    (lambda (x) (+ x y))))
(bar 1)
=> 101
Here bar is defined to be a procedure. The variable y is again free in the procedure body, but the enclosing environment is created by the let form, in which y is defined to be 100. So when bar is called, it is that value of y that is fetched, not the top-level one (in a dynamically scoped language the call would have returned 2).
Answering your last question: it is possible to create your own environment manually, but it would be too much work and probably wouldn't be very efficient. When a language is implemented (for example, a Scheme interpreter), that is exactly what the language implementer is doing.
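For illustration only, you can fake the bind_env idea from the question with an ordinary closure; make-env and with-env below are made-up names, not a real API:

(define (make-env buf len)
  (lambda (my-func) (my-func buf len)))   ; the closure itself is the "environment"

(define with-env (make-env (list 1 2 3 4 5) 5))

(with-env (lambda (buf len)
            (list 'len len 'first (car buf))))
=> (len 5 first 1)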
A good explanation of environments is given in SICP, Chapter 3.
Other comments
AFAIK, since Emacs 24, Elisp supports lexical scoping as well as dynamic scoping (similarly to Common Lisp, see below).
Common Lisp uses lexical scoping for local variables (introduced by the let form) and dynamic scoping for global variables (they are also called special; it is possible to declare a local variable special, but this is rarely used) defined with defvar and defparameter. To distinguish them from lexically scoped variables, their names usually have "earmuffs", for example *standard-input*. Top-level functions are also special in CL, which can be rather dangerous: one can unknowingly alter behavior by shadowing a top-level function. This is why the CL standard forbids redefining standard-library functions (and implementations often enforce this with package locks).
Scheme, in contrast, always uses lexical scoping. Dynamic scoping, however, is useful sometimes (Richard Stallman makes a good point on it). To overcome this, many Scheme implementations introduced so-called parameters (implemented using lexical scoping).
Languages like Common Lisp, Scheme, Clojure, Python keep a dynamic reference to a variable: you can construct the variable name from a string (intern a symbol in Lisp's terms) and find its value. More static languages, like C, OCaml or Haskell, cannot do that (unless some form of reflection is used). But this has a weak connection to what kind of scoping they use.
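For example, in Common Lisp (a tiny sketch; *answer* is just a made-up variable, and we assume the current package is the one it was defined in):

(defvar *answer* 42)
(symbol-value (intern "*ANSWER*"))  ; => 42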
How can I attach an arbitrary tag to a closure in Scheme?
Here are a couple things I'd like to use this for:
(1) To mark closures that provide an interface to produce a string for what they represent, like what #kud0h asked for here. A general ->string procedure could include code something like this:
(display (if (stringable? x)
             (x 'string)
             x)
         str-port)
(2) More generally, to determine if a closure is an "object" that obeys the rules of a general object interface, or maybe to tell the class of an object (something like what #KPatnode was asking about here).
I can't query a procedure to see if it supports a certain interface by calling it, because if it doesn't support a known interface, calling the procedure will produce unpredictable results, most likely a run-time error.
Chez Scheme has putprop and getprop procedures that allow you to add keys and values to symbols. However, closures can be anonymous, or bound to different symbols, so I'd prefer to attach a calling-convention tag to the closure itself, not a symbol that it's bound to.
The only idea I have right now is to maintain a global hash table of all "stringable" or "object" closures in the system. That seems a little clunky. Is there a simpler, more elegant, or more efficient way?
Racket has applicable structures: you can give a structure type an apply hook to be called if an instance is used as a function.
If you want a more portable solution, you can use a hash table to associate your data with certain procedures. Unless your Scheme provides weak hashtables, though, keep in mind that the hashtable will prevent the procedures from being garbage-collected.
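Here is a rough sketch of that hash-table approach, assuming R6RS hashtables (which Chez provides); tag! and tag-ref are made-up helpers, and the table is not weak:

(define *tags* (make-eq-hashtable))

(define (tag! proc key value)
  (hashtable-set! *tags* proc
                  (cons (cons key value)
                        (hashtable-ref *tags* proc '()))))

(define (tag-ref proc key)
  (cond ((assq key (hashtable-ref *tags* proc '())) => cdr)
        (else #f)))

(define greet (lambda () "hello"))
(tag! greet 'stringable? #t)
(tag-ref greet 'stringable?)
=> #t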
I think you might, instead of tagging procedures per se, want to look at Racket's object system, which has a concept of interfaces. It sounds quite similar to what you're after.
You could go extreme and redefine lambda syntax. Something like this (but untested by me):
(define *properties* '()) ;; example only

(define-syntax lambda
  (let-syntax ((sys-lambda
                (syntax-rules ()
                  ((_ args body ...)
                   (lambda args body ...)))))
    (syntax-rules ()
      ((_ args body ...)
       (let ((func (sys-lambda args body ...)))
         (set! *properties*
               (cons (cons func '(NO-PROPERTIES))
                     *properties*))
         func)))))
I am working through Write Yourself a Scheme in 48 Hours (I'm up to about 85hrs) and I've gotten to the part about Adding Variables and Assignments. There is a big conceptual jump in this chapter, and I wish it had been done in two steps with a good refactoring in between rather than jumping straight to the final solution. Anyway…
I've gotten lost with a number of different types that seem to serve the same purpose: State, ST, IORef, and MVar. The first three are mentioned in the text, while the last seems to be the favored answer to a lot of Stack Overflow questions about the first three. They all seem to carry a state between consecutive invocations.
What are each of these and how do they differ from one another?
In particular these sentences don't make sense:
Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables.
and
The IORef module lets you use stateful variables within the IO monad.
All this makes the line type ENV = IORef [(String, IORef LispVal)] confusing: why the second IORef? What will break if I write type ENV = State [(String, LispVal)] instead?
The State Monad : a model of mutable state
The State monad is a purely functional environment for programs with state, with a simple API:
get
put
Documentation in the mtl package.
The State monad is commonly used when needing state in a single thread of control. It doesn't actually use mutable state in its implementation. Instead, the program is parameterized by the state value (i.e. the state is an additional parameter to all computations). The state only appears to be mutated in a single thread (and cannot be shared between threads).
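A minimal sketch of State in use (the names here are made up, and Control.Monad.State comes from mtl): the state is threaded through the computation rather than mutated in place:

import Control.Monad.State

-- Thread a counter through a pure computation; no real mutation happens.
tickTwice :: State Int (Int, Int)
tickTwice = do
  a <- get
  put (a + 1)     -- "set" the state for the rest of the computation
  b <- get
  return (a, b)

main :: IO ()
main = print (runState tickTwice 0)   -- ((0,1),1)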
The ST monad and STRefs
The ST monad is the restricted cousin of the IO monad.
It allows arbitrary mutable state, implemented as actual mutable memory on the machine. The API is made safe in side-effect-free programs, as the rank-2 type parameter prevents values that depend on mutable state from escaping local scope.
It thus allows for controlled mutability in otherwise pure programs.
Commonly used for mutable arrays and other data structures that are mutated, then frozen. It is also very efficient, since the mutable state is "hardware accelerated".
Primary API:
Control.Monad.ST
runST -- start a new memory-effect computation.
And STRefs: pointers to (local) mutable cells.
ST-based arrays (such as vector) are also common.
Think of it as the less dangerous sibling of the IO monad. Or IO, where you can only read and write to memory.
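A minimal sketch of ST in use: the STRef is mutated for real, but runST hands back a pure result and the reference cannot leak out:

import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                        -- a local mutable cell
  mapM_ (\x -> modifySTRef' acc (+ x)) xs  -- mutate it in place
  readSTRef acc                            -- only the value escapes

main :: IO ()
main = print (sumST [1 .. 10])             -- 55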
IORef : STRefs in IO
These are STRefs (see above) in the IO monad. They don't have the same safety guarantees as STRefs about locality.
MVars : IORefs with locks
Like STRefs or IORefs, but with a lock attached, for safe concurrent access from multiple threads. IORefs and STRefs are only safe in a multi-threaded setting when using atomicModifyIORef (a compare-and-swap atomic operation). MVars are a more general mechanism for safely sharing mutable state.
Generally, in Haskell, use MVars or TVars (STM-based mutable cells), over STRef or IORef.
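A small sketch of an MVar used as a lock-protected counter shared by two threads (modifyMVar_ takes the MVar, i.e. holds the "lock", while updating):

import Control.Concurrent
import Control.Monad (replicateM_)

main :: IO ()
main = do
  counter <- newMVar (0 :: Int)
  done    <- newEmptyMVar
  let bump = modifyMVar_ counter (return . (+ 1))
  _ <- forkIO (replicateM_ 1000 bump >> putMVar done ())
  replicateM_ 1000 bump
  takeMVar done                 -- wait for the forked thread
  readMVar counter >>= print    -- 2000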
Ok, I'll start with IORef. IORef provides a value which is mutable in the IO monad. It's just a reference to some data, and like any reference, there are functions which allow you to change the data it refers to. In Haskell, all of those functions operate in IO. You can think of it like a database, file, or other external data store - you can get and set the data in it, but doing so requires going through IO. The reason IO is necessary at all is because Haskell is pure; the compiler needs a way to know which data the reference points to at any given time (read sigfpe's "You could have invented monads" blogpost).
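A bare-bones IORef example in that spirit (just Data.IORef, nothing from the tutorial):

import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)   -- create the reference
  writeIORef ref 41            -- set the data it refers to
  modifyIORef' ref (+ 1)       -- update it in place
  readIORef ref >>= print      -- 42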
MVars are basically the same thing as an IORef, except for two very important differences. MVar is a concurrency primitive, so it's designed for access from multiple threads. The second difference is that an MVar is a box which can be full or empty. So where an IORef Int always has an Int (or is bottom), an MVar Int may have an Int or it may be empty. If a thread tries to read a value from an empty MVar, it will block until the MVar gets filled (by another thread). Basically an MVar a is equivalent to an IORef (Maybe a) with extra semantics that are useful for concurrency.
State is a monad which provides mutable state, not necessarily with IO. In fact, it's particularly useful for pure computations. If you have an algorithm that uses state but not IO, a State monad is often an elegant solution.
There is also a monad transformer version of State, StateT. This frequently gets used to hold program configuration data, or "game-world-state" types of state in applications.
ST is something slightly different. The main data structure in ST is the STRef, which is like an IORef but with a different monad. The ST monad uses type system trickery (the "state threads" the docs mention) to ensure that mutable data can't escape the monad; that is, when you run an ST computation you get a pure result. The reason ST is interesting is that it's a primitive monad like IO, allowing computations to perform low-level manipulations on bytearrays and pointers. This means that ST can provide a pure interface while using low-level operations on mutable data, meaning it's very fast. From the perspective of the program, it's as if the ST computation runs in a separate thread with thread-local storage.
Others have done the core things, but to answer the direct question:
All this makes the line type ENV = IORef [(String, IORef LispVal)] confusing. Why the second IORef? What will break if I do type ENV = State [(String, LispVal)] instead?
Lisp is a functional language with mutable state and lexical scope. Imagine you've closed over a mutable variable. Now you've got a reference to this variable hanging around inside some other function -- say (in Haskell-style pseudocode) (printIt, setIt) = let x = 5 in (\ () -> print x, \y -> set x y). You now have two functions -- one prints x, and one sets its value. When you evaluate printIt, you want to look up the name of x in the initial environment in which printIt was defined, but you want to look up the value that name is bound to in the environment in which printIt is called (after setIt may have been called any number of times).
There are ways besides the two IORefs to do this, but you certainly need more than the latter type you've proposed, which doesn't allow you to alter the values that names are bound to in a lexically-scoped fashion. Google the "funargs problem" for a whole lot of interesting prehistory.
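Here is one concrete way to render that pseudocode in real Haskell, using an IORef for the mutable x (makePair is a made-up name; the point is that both closures share one mutable cell, which is what the inner IORef in the tutorial's ENV provides per binding):

import Data.IORef

makePair :: IO (IO (), Int -> IO ())
makePair = do
  x <- newIORef (5 :: Int)
  let printIt = readIORef x >>= print   -- closes over x
      setIt   = writeIORef x            -- so does this one
  return (printIt, setIt)

main :: IO ()
main = do
  (printIt, setIt) <- makePair
  printIt      -- prints 5
  setIt 42
  printIt      -- prints 42: both closures see the same mutable cell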