How is the `*var-name*` naming-convention used in clojure? - naming-conventions

As a non-lisper coming to clojure how should I best understand the naming convention where vars get a name like *var-name*?
This appears to be a lisp convention indicating a global variable. But in clojure such vars appear in namespaces as far as I can tell.
I would really appreciate a brief explanation of what I should expect when an author has used such vars in their code, ideally with a example of how and why such a var would be used and changed in a clojure library.

It's a convention used in other Lisps, such as Common Lisp, to distinguish between special variables, as distinct from lexical variables. A special or dynamic variable has its binding stored in a dynamic environment, meaning that its current value as visible to any point in the code depends upon how it may have been bound higher up the call stack, as opposed to being dependent only on the most local lexical binding form (such as let or defn).
Note that in his book Let Over Lambda, Doug Hoyte argues against the "earmuffs" asterix convention for naming special variables. He uses an unusual macro style that makes reference to free variables, and he prefers not to commit to or distinguish whether those symbols will eventually refer to lexical or dynamic variables.
Though targeted specifically at Common Lisp, you might enjoy Ron Garret's essay The Idiot's Guide to Special Variables. Much of it can still apply to Clojure.

Functional programming is all about safe predictable functions. Infact some of us are afraid of that spooky "action at a distance" thing. When people call a function they get a warm fuzzy satisfaction that the function will always give them the same result if they call the function or read the value again. the *un-warm-and-fuzzy* bristly things exist to warn programmers that this variable is less cuddly than some of the others.

Some references I found in the Clojure newsgroups:
Re: making code readable
John D. Hume
Tue, 30 Dec 2008 08:30:57 -0800
On Mon, Dec 29, 2008 at 4:10 PM, Chouser wrote:
I believe the idiom for global values like this is to place asterisks
around the name.
I thought the asterisk convention was for variables intended for
dynamic binding. It took me a minute to figure out where I got that
idea. "Programming Clojure" suggests it (without quite saying it) in
chapter 6, section 3.
"Vars intended for dynamic binding are sometimes called special vari-
ables. It is good style to name them
with leading and trailing asterisks."
Obviously the book's a work in progress, but that does sound
reasonable. A special convention for variables whose values change (or
that my code's welcome to rebind) seems more useful to me than one for
"globals" (though I'm not sure I'd consider something like grid-size
for a given application a global). Based on ants.clj it appears Rich
doesn't feel there needs to be a special naming convention for that
sort of value.
and...
I believe the idiom for global values like this is to place asterisks
around the name. Underscores (and CamelCase) should only be used when
required for Java interop:
(def *grid-size* 10)
(def *height* 600)
(def *margin* 50)
(def *x-index* 0)
(def *y-index* 1)

Related

Why do people put "my" in their variable names?

I have seen an incredibly large number of variables in our codebase at work that are along the lines of myCounter or myClassVariable. Why?
I don't just see this at work, either. I see this in tutorials online, github, blogs, etc. I would understand if it's just a placeholder for an example, but otherwise I can't imagine it being a standard of any kind.
Is this a holdover from some old standard or is it just a bad practice that snuck in before code reviews were common place? Was it an ancestor to people's usage of the underscore to indicate a _ClassName?
Starting a variable name with my indicates that it is defined by the programmer rather than a predefined variable. In a statically typed language (with explicit types) it also prevents a name clash with the type name, for instance
Collection myCollection;
A my variable name should only be used in generic examples where the variable has no other interpretation.
It may also be worth mentioning that in the programming language Perl (since version 5 from 1994) the keyword my is used to declare a local variable, for instance
my $message = "hello there";
https://perldoc.perl.org/functions/my

Constant vs Unchanged Variable?

Is there any advantage to using a constant (unchangable) than just not changing a variable?
Depending on your language and compiler, a constant may get inlined & optimized when built. Variables will likely eat up stack space even if it never changes.
By making the value constant, the compiler can just substitute it. If you have x / 2, for example, the compiler can compute the value and use that instead of having to emit code to retrieve the value of x and then divide it by 2.
Also, you don't have to worry about accidentally changing the value. For example, in C-like languages you might accidentally type if (x = 2) when you meant if (x == 2) which will change the value of x if it's a variable.
Anyone maintaining your code in the future (including you) won't have to look around to see where (if anywhere) a constant is changed when finding a bug or adding a feature - they'll know right off the bat that it can't be changed.
In some program languages, declaring something to be constant will allow a compiler to make optimizations which would not otherwise be possible. Further, declaring something to be constant can be a useful way of documenting that there are places in the code which might be broken should the value change.
Unfortunately, some programming languages sometimes do evil things with things that are declared constant. For example, in some .net languages, if a value type which is declared read-only is passed by modifiable reference, the compiler will, rather than refusing to allow such an action, instead make a copy and pass that. Such implicit copying will impair efficiency, and may result in unexpected semantics.

Ocaml naming convention

I am wondering if there exists already some naming conventions for Ocaml, especially for names of constructors, names of variables, names of functions, and names for labels of record.
For instance, if I want to define a type condition, do you suggest to annote its constructors explicitly (for example Condition_None) so as to know directly it is a constructor of condition?
Also how would you name a variable of this type? c or a_condition? I always hesitate to use a, an or the.
To declare a function, is it necessary to give it a name which allows to infer the types of arguments from its name, for example remove_condition_from_list: condition -> condition list -> condition list?
In addition, I use record a lot in my programs. How do you name a record so that it looks different from a normal variable?
There are really thousands of ways to name something, I would like to find a conventional one with a good taste, stick to it, so that I do not need to think before naming. This is an open discussion, any suggestion will be welcome. Thank you!
You may be interested in the Caml programming guidelines. They cover variable naming, but do not answer your precise questions.
Regarding constructor namespacing : in theory, you should be able to use modules as namespaces rather than adding prefixes to your constructor names. You could have, say, a Constructor module and use Constructor.None to avoid confusion with the standard None constructor of the option type. You could then use open or the local open syntax of ocaml 3.12, or use module aliasing module C = Constructor then C.None when useful, to avoid long names.
In practice, people still tend to use a short prefix, such as the first letter of the type name capitalized, CNone, to avoid any confusion when you manipulate two modules with the same constructor names; this often happen, for example, when you are writing a compiler and have several passes manipulating different AST types with similar types: after-parsing Let form, after-typing Let form, etc.
Regarding your second question, I would favor concision. Inference mean the type information can most of the time stay implicit, you don't need to enforce explicit annotation in your naming conventions. It will often be obvious from the context -- or unimportant -- what types are manipulated, eg. remove cond (l1 # l2). It's even less useful if your remove value is defined inside a Condition submodule.
Edit: record labels have the same scoping behavior than sum type constructors. If you have defined a {x: int; y : int} record in a Coord submodule, you access fields with foo.Coord.x outside the module, or with an alias foo.C.x, or Coord.(foo.x) using the "local open" feature of 3.12. That's basically the same thing as sum constructors.
Before 3.12, you had to write that module on each field of a record, eg. {Coord.x = 2; Coord.y = 3}. Since 3.12 you can just qualify the first field: {Coord.x = 2; y = 3}. This also works in pattern position.
If you want naming convention suggestions, look at the standard library. Beyond that you'll find many people with their own naming conventions, and it's up to you to decide who to trust (just be consistent, i.e. pick one, not many). The standard library is the only thing that's shared by all Ocaml programmers.
Often you would define a single type, or a single bunch of closely related types, in a module. So rather than having a type called condition, you'd have a module called Condition with a type t. (You should give your module some other name though, because there is already a module called Condition in the standard library!). A function to remove a condition from a list would be Condition.remove_from_list or ConditionList.remove. See for example the modules List, Array, Hashtbl,Map.Make`, etc. in the standard library.
For an example of a module that defines many types, look at Unix. This is a bit of a special case because the names are mostly taken from the preexisting C API. Many constructors have a short prefix, e.g. O_ for open_flag, SEEK_ for seek_command, etc.; this is a reasonable convention.
There's no reason to encode the type of a variable in its name. The compiler won't use the name to deduce the type. If the type of a variable isn't clear to a casual reader from the context, put a type annotation when you define it; that way the information provided to the reader is validated by the compiler.

Separate Namespaces for Functions and Variables in Common Lisp versus Scheme

Scheme uses a single namespace for all variables, regardless of whether they are bound to functions or other types of values. Common Lisp separates the two, such that the identifier "hello" may refer to a function in one context, and a string in another.
(Note 1: This question needs an example of the above; feel free to edit it and add one, or e-mail the original author with it and I will do so.)
However, in some contexts, such as passing functions as parameters to other functions, the programmer must explicitly distinguish that he's specifying a function variable, rather than a non-function variable, by using #', as in:
(sort (list '(9 A) '(3 B) '(4 C)) #'< :key #'first)
I have always considered this to be a bit of a wart, but I've recently run across an argument that this is actually a feature:
...the
important distinction actually lies in the syntax of forms, not in the
type of objects. Without knowing anything about the runtime values
involved, it is quite clear that the first element of a function form
must be a function. CL takes this fact and makes it a part of the
language, along with macro and special forms which also can (and must)
be determined statically. So my question is: why would you want the
names of functions and the names of variables to be in the same
namespace, when the primary use of function names is to appear where a
variable name would rarely want to appear?
Consider the case of class names: why should a class named FOO prevent
the use of variables named FOO? The only time I would be referring the
class by the name FOO is in contexts which expect a class name. If, on
the rare occasion I need to get the class object which is bound to the
class name FOO, there is FIND-CLASS.
This argument does make some sense to me from experience; there is a similar case in Haskell with field names, which are also functions used to access the fields. This is a bit awkward:
data Point = Point { x, y :: Double {- lots of other fields as well --} }
isOrigin p = (x p == 0) && (y p == 0)
This is solved by a bit of extra syntax, made especially nice by the NamedFieldPuns extension:
isOrigin2 Point{x,y} = (x == 0) && (y == 0)
So, to the question, beyond consistency, what are the advantages and disadvantages, both for Common Lisp vs. Scheme and in general, of a single namespace for all values versus separate ones for functions and non-function values?
The two different approaches have names: Lisp-1 and Lisp-2. A Lisp-1 has a single namespace for both variables and functions (as in Scheme) while a Lisp-2 has separate namespaces for variables and functions (as in Common Lisp). I mention this because you may not be aware of the terminology since you didn't refer to it in your question.
Wikipedia refers to this debate:
Whether a separate namespace for functions is an advantage is a source of contention in the Lisp community. It is usually referred to as the Lisp-1 vs. Lisp-2 debate. Lisp-1 refers to Scheme's model and Lisp-2 refers to Common Lisp's model. These names were coined in a 1988 paper by Richard P. Gabriel and Kent Pitman, which extensively compares the two approaches.
Gabriel and Pitman's paper titled Technical Issues of Separation in Function Cells and Value Cells addresses this very issue.
Actually, as outlined in the paper by Richard Gabriel and Kent Pitman, the debate is about Lisp-5 against Lisp-6, since there are several other namespaces already there, in the paper are mentioned type names, tag names, block names, and declaration names. edit: this seems to be incorrect, as Rainer points out in the comment: Scheme actually seems to be a Lisp-1. The following is largely unaffected by this error, though.
Whether a symbol denotes something to be executed or something to be referred to is always clear from the context. Throwing functions and variables into the same namespace is primarily a restriction: the programmer cannot use the same name for a thing and an action. What a Lisp-5 gets out of this is just that some syntactic overhead for referencing something from a different namespace than what the current context implies is avoided. edit: this is not the whole picture, just the surface.
I know that Lisp-5 proponents like the fact that functions are data, and that this is expressed in the language core. I like the fact that I can call a list "list" and a car "car" without confusing my compiler, and functions are a fundamentally special kind of data anyway. edit: this is my main point: separate namespaces are not a wart at all.
I also liked what Pascal Constanza had to say about this.
I've met a similar distinction in Python (unified namespace) vs Ruby (distinct namespaces for methods vs non-methods). In that context, I prefer Python's approach -- for example, with that approach, if I want to make a list of things, some of which are functions while others aren't, I don't have to do anything different with their names, depending on their "function-ness", for example. Similar considerations apply to all cases in which function objects are to be bandied around rather than called (arguments to, and return values from, higher-order functions, etc, etc).
Non-functions can be called, too (if their classes define __call__, in the case of Python -- a special case of "operator overloading") so the "contextual distinction" isn't necessarily clear, either.
However, my "lisp-oid" experience is/was mostly with Scheme rather than Common Lisp, so I may be subconsciously biased by the familiarity with the uniform namespace that in the end comes from that experience.
The name of a function in Scheme is just a variable with the function as its value. Whether I do (define x (y) (z y)) or (let ((x (lambda (y) (z y)))), I'm defining a function that I can call. So the idea that "a variable name would rarely want to appear there" is kind of specious as far as Scheme is concerned.
Scheme is a characteristically functional language, so treating functions as data is one of its tenets. Having functions be a type of their own that's stored like all other data is a way of carrying on the idea.
The biggest downside I see, at least for Common Lisp, is understandability. We can all agree that it uses different namespaces for variables and functions, but how many does it have? In PAIP, Norvig showed that it has "at least seven" namespaces.
When one of the language's classic books, written by a highly respected programmer, can't even say for certain in a published book, I think there's a problem. I don't have a problem with multiple namespaces, but I wish the language was, at the least, simple enough that somebody could understand this aspect of it entirely.
I'm comfortable using the same symbol for a variable and for a function, but in the more obscure areas I resort to using different names out of fear (colliding namespaces can be really hard to debug!), and that really should never be the case.
There's good things to both approaches. However, I find that when it matters, I prefer having both a function LIST and a a variable LIST than having to spell one of them incorrectly.

Is there a concise catalog of variable naming-conventions?

There are many different styles of variable names that I've come across over the years.
The current wikipedia entry on naming conventions is fairly light...
I'd love to see a concise catalog of variable naming-conventions, identifying it by a name/description, and some examples.
If a convention is particularly favored by a certain platform community, that would be worth noting, too.
I'm turning this into a community wiki, so please create an answer for each convention, and edit as needed.
The best naming convention set that I've seen is in the book "Code Complete" Steve McConnell has a great section in there about naming conventions and lots of examples. His examples run through a number of "best practices" for different languages, but ultimately leave it up to the developer, dev manager, or architect to decide the specific action.
PEP8 is the style guide for Python code that is part of the standard library which is relatively influential in the Python community. For bonus points, in addition to covering the naming conventions used by the Python standard library, it provides an overview of naming conventions in general.
According to PEP8: variables, instance variables, functions and methods in the standard library are supposed to use lowercase words separated by underscores for readability:
foo
parse_foo
a_long_descriptive_name
To distinguish a name from a reserved word, append an underscore:
in_
and_
or_
Names that are weakly private (not imported by 'from M import *') start with a single underscore. Names whose privacy is enforced with name mangling start with double underscores.:
_some_what_private
__a_slightly_more_private_name
Names that are Python 'special' methods start and end with double underscores:
__hash__ # hash(o) = o.__hash__()
__str__ # str(o) = o.__str__()
Sun published a list of variable name conventions for Java here. The list includes conventions for naming packages, classes, interfaces, methods, variables, and constants.
The section on variables states
Except for variables, all instance, class, and class constants are in mixed case with a lowercase first letter. Internal words start with capital letters. Variable names should not start with underscore _ or dollar sign $ characters, even though both are allowed.
Variable names should be short yet meaningful. The choice of a variable name should be mnemonic- that is, designed to indicate to the casual observer the intent of its use. One-character variable names should be avoided except for temporary "throwaway" variables. Common names for temporary variables are i, j, k, m, and n for integers; c, d, and e for characters.
Some examples include
int i;
char c;
float myWidth;
Whatever your naming convention is, it usually do not allow you to easily distinguish:
instance (usually private) variable
local variables (used as parameters or local to a function)
I use a naming convention based on a the usage of indefinite article (a, an, some) for local variables, and no article for instance variables, as explained in this question about variable prefix.
So a prefix (either my way, or any other prefix listed in that question) can be one way of refining variable naming convention.
I know your question is not limited on variables, I just wanted to point out that part of naming convention.
There is, the "Java Programmers Phrasebook" by Einar Hoest.
He analysed the grammatical structure of method names together with the structure of the method bodies, and collected information such as:
add-[noun]-* These methods often have parameters and create objects.
They very rarely return field values.
The phrase appears in practically all
applications.
etcetera ... collected from hundreds of open-source projects.
For more background information, please refer to his SLE08 paper.