Stata: Can I use "by" for a block of code?

I know and love by. But I wonder if it is somehow possible to "by" an entire block of code instead of just a single command.
This would be useful if one would first like to use a command that stores values (e.g. count) and then do something with those results.

Yes, in the sense that you can write a program and make it byable, and Stata programmers do this routinely.
http://www.stata.com/manuals14/pbyable.pdf
No, in the sense that by: is a prefix to a single command (so, as above, whatever you want repeated for each group must be expressible as one command, such as a call to your own byable program).

Related

Custom, user-definable "wildcard" constants in SQL database search -- possible?

My client is making database searches using a Django web app that I've written. The query sends a regex search to the database and outputs the results.
Because the regex searches can be pretty long and unintuitive, the client has asked for certain custom "wildcards" to be created for them. For example:
Ω := [^aeiou] (all non-vowels)
etc.
This could be achieved with a simple permanent string substitution in the query, something like
query = query.replace("Ω", "[^aeiou]")
for all the elements in the substitution list. This seems like it should be safe, but I'm not really sure.
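One risk with calling replace once per symbol is that a later replacement can act on text produced by an earlier one, if a replacement ever contains another symbol. A minimal sketch, assuming the substitution list is a plain Python dict (`WILDCARDS` and `expand_wildcards` are hypothetical names), does all substitutions in a single pass with `re.sub`:

```python
import re

# Hypothetical substitution table; only the Ω entry comes from the question.
WILDCARDS = {
    "Ω": "[^aeiou]",
}


def expand_wildcards(query: str, wildcards: dict) -> str:
    """Replace every wildcard symbol in `query` in one left-to-right pass."""
    if not wildcards:
        return query
    # Try longer symbols first so a multi-character symbol beats its own prefix.
    symbols = sorted(wildcards, key=len, reverse=True)
    # re.escape keeps symbols such as "." or "*" from being read as regex syntax.
    finder = re.compile("|".join(re.escape(s) for s in symbols))
    # Text inserted by a replacement is never rescanned, so substitutions cannot chain.
    return finder.sub(lambda m: wildcards[m.group(0)], query)


print(expand_wildcards("^Ω+$", WILDCARDS))  # ^[^aeiou]+$
```

With a table such as {"A": "B", "B": "C"}, this turns "AB" into "BC", whereas two successive str.replace calls would give "CC".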
He has also asked that it be possible for users to define custom wildcards for their searches on the fly, so that there would be another input box where a user could define
∫ := some other regex
To store them, you might create a model:
class RegexWildcard(models.Model):
    symbol = ...
    replacement = ...
I'm personally a bit wary of this, because it does not seem to add a whole lot of functionality, but it does seem to add a lot of complexity and potential problems to the code. Users would now be writing their own definitions to the database. Could they overwrite each other's symbols?
That I haven't seen this done anywhere before also makes me kind of wary of the idea.
Is this possible? Desirable? A great idea? A terrible idea? Resources and any guidance appreciated.
Well, you're getting paid by the hour....
I don't see how involving the Greek alphabet is to anyone's advantage. If the queries are stored anywhere, everyone approaching the system would have to learn the new syntax to understand them. Plus, there's the problem of how to type the special symbols.
If the client creates complex regular expressions they'd like to be able to reuse, that's understandable. Your application could maintain a list of such expressions that the user could add to and choose from. Notionally, the user would "click on" an expression, and it would be inserted into the query.
The saved expressions could have user-defined names, to make them easier to remember and refer to. You could also define a syntax for referencing them, using something that would otherwise be invalid in a query, such as ::name (though note that PostgreSQL already uses :: as its type-cast operator). Before submitting the query to the DBMS, you substitute the regex for the name.
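A minimal sketch of that substitution step, assuming the saved expressions are available as a Python dict (in the Django app they might be loaded from a model like the one in the question) and using the ::name marker suggested above. `SAVED`, `REFERENCE` and `expand_references` are hypothetical names:

```python
import re

# Hypothetical saved expressions; names and patterns are illustrative only.
SAVED = {
    "consonant": "[^aeiou]",
    "digit": "[0-9]",
}

# A reference is "::" followed by an identifier-like name.
REFERENCE = re.compile(r"::([A-Za-z_][A-Za-z0-9_]*)")


def expand_references(pattern: str, saved: dict) -> str:
    """Substitute each ::name in `pattern` with the saved regex of that name."""

    def lookup(match):
        name = match.group(1)
        if name not in saved:
            # Fail loudly instead of sending an unexpanded reference to the DBMS.
            raise KeyError(f"unknown saved expression: {name}")
        return saved[name]

    return REFERENCE.sub(lookup, pattern)


print(expand_references("^::consonant+::digit$", SAVED))  # ^[^aeiou]+[0-9]$
```

Raising on an unknown name is a design choice: silently leaving "::typo" in the pattern would produce a search that quietly matches the wrong rows.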
You still have the problem of choosing good names, and training.
To prevent malformed SQL, I imagine you'll want to ensure the regex is valid. You wouldn't want your system to store a ; drop table CUSTOMERS; as a "regular expression"! You'll either have to validate the expression or, if you can, treat the regex as data in a parameterized query.
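The parameterized-query option can be sketched with Python's built-in `sqlite3` module. This is a stand-in under stated assumptions: SQLite has no regex engine of its own, so the sketch registers Python's `re` as the `REGEXP` function, whereas the client's actual database and its regex dialect are unknown. `is_valid_regex`, `search_words` and the `words` table are hypothetical:

```python
import re
import sqlite3


def is_valid_regex(pattern: str) -> bool:
    """True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


def search_words(conn: sqlite3.Connection, pattern: str) -> list:
    if not is_valid_regex(pattern):
        raise ValueError(f"not a valid regular expression: {pattern!r}")
    # The pattern is bound as a parameter, so it is always data, never SQL text.
    rows = conn.execute(
        "SELECT w FROM words WHERE w REGEXP ? ORDER BY w", (pattern,)
    ).fetchall()
    return [w for (w,) in rows]


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)", [("cat",), ("sky",), ("dog",)])

print(search_words(conn, "^[^aeiou]+$"))          # ['sky']
print(search_words(conn, "; drop table words;"))  # [] and the table survives
```

The injection string from the answer is itself a perfectly valid regex, so regex validation alone would not stop it; what keeps it harmless here is passing it as a bound parameter.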
The real question to me, though, is why you're in the vicinity of standardized regex queries. That need suggests a database design issue: the column being queried probably contains composite data, which should be represented as multiple columns that can be queried directly, without using regular expressions.

Many instances of a terminal symbol in a BNF grammar

Given a grammar like
<term> ::= x[<i>]+exp(x[<i>]) | x[<i>]
<i>::= 1|2|3
Is there a way to force the same "i" to be used throughout one expansion of the non-terminal? That is, I want to avoid solutions like x[1]+exp(x[2]) or x[3]+exp(x[1]).
Conversely, is there a way to prevent the same "i" from being used twice in one expansion? That is, I want to avoid solutions like x[1]+exp(x[1]).
No, that's not possible with a context-free grammar.
This is essentially what "context-free" means. Every non-terminal in a production can be expanded independently without regard to the context in which it appears.
Of course, if i really only has three possible values, you can enumerate the finite number of legal productions, according to any definition of "legal" which you find convenient. But that gets really messy when the number of possibilities increases.
The most convenient solution is generally to accept the base syntax and check for concordance (or difference) in the associated semantic rule. That also allows for better error messages.
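To make this concrete, a small Python sketch (a string generator, not a real parser; `context_free_terms` and `indices_ok` are hypothetical names) shows that the grammar as written derives all nine index combinations for the two-index form, and then applies the concordance or difference check as a separate semantic step:

```python
import itertools
import re

INDICES = ("1", "2", "3")  # the alternatives of <i> from the grammar


def context_free_terms() -> list:
    """Every string the grammar derives: each <i> is expanded independently."""
    pairs = [
        f"x[{i}]+exp(x[{j}])" for i, j in itertools.product(INDICES, repeat=2)
    ]
    singles = [f"x[{i}]" for i in INDICES]
    return pairs + singles


PAIR = re.compile(r"x\[(\d+)\]\+exp\(x\[(\d+)\]\)")


def indices_ok(term: str, want_equal: bool = True) -> bool:
    """Semantic check applied after parsing: compare the two indices."""
    match = PAIR.fullmatch(term)
    if match is None:
        return True  # a lone x[i] has nothing to compare
    return (match.group(1) == match.group(2)) == want_equal


terms = context_free_terms()
print(len(terms))  # 12: 3 * 3 pairs plus 3 singles
print([t for t in terms if indices_ok(t)])
```

Because the check lives outside the grammar, switching between "same index" and "different index" is just the `want_equal` flag, and a failed check can report exactly which indices disagreed.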

Report Earliest Item in List

I am using Snap! to try to find the earliest item in a list. For instance, in list [3,1,2], I would like to report "1." I would like the solution to work for words as well (for instance, given list [Bob, George, Ari] report "Ari").
I tried to use recursion to solve the problem
and the solution works. However, I cannot find a way to do so recursively without the second "if else" statement. Is there a way to use recursion to solve this problem without the "if 0= length of..." statement?
I don't see a way to do this without two if...else statements. You need two checks:
Is the list exhausted?
Is the first element less than all the following elements?
In some languages, you can use the conditional ternary operator ?:, but I don't think Snap! supports that. It's really just syntactic sugar for an if...else anyway.
You can do some clean-up on this function, though.
I recommend explicitly handling the case of a zero-length list.
"Earliest" is confusing. I recommend the term "least", since you're checking with the "less than" operator.
Don't call keep items such that [] from [] multiple times. This is inefficient and potentially a bug if someone modifies one line but forgets to modify the other. Instead, save the result in a script variable.
Don't compare the current first element to every element in the list. This gives the function an O(n^2) run time. Instead, compare it only to the least element so far. This reduces the run time to O(n).
Some of these changes are implemented here:
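Snap! programs are built from graphical blocks, so as an illustration only, here is the same recursive idea written in Python. The name `least` follows the naming suggestion above; the `start` index is an assumption used instead of copying the tail of the list, which keeps the run time linear. It handles the empty list explicitly, computes the recursive result once, and compares each element only with the least of the remaining ones:

```python
def least(items, start=0):
    """Recursively return the least element of items[start:]."""
    if start >= len(items):
        # Only reachable when the caller passes an empty list.
        raise ValueError("least() of an empty list")
    if start == len(items) - 1:
        # Base case ("is the list exhausted?"): one element left, so it is the least.
        return items[start]
    # Recurse once and keep the result instead of recomputing it.
    rest = least(items, start + 1)
    # Compare the first element only with the least of the rest: O(n) overall.
    return items[start] if items[start] < rest else rest


print(least([3, 1, 2]))                 # 1
print(least(["Bob", "George", "Ari"]))  # Ari
```

Python compares strings by Unicode code point, so uppercase letters sort before lowercase ones, and the recursion depth equals the list length, so very long lists would hit Python's default recursion limit.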

Long variable names

Let's say I have a variable that contains the number of search engine names in a file. What would you name it?
number_of_search_engine_names
search_engine_name_count
num_search_engines
engines
engine_names
other name?
The first name describes precisely what the variable contains, but isn't it too long? Any advice for choosing variable names, especially how to shorten a name that is too long or what kind of abbreviations to use?
How about numEngineNames?
Choosing variable names is more art than science. You want something that doesn't take an epoch to type, but long enough to be expressive. It's a subjective balance.
Ask yourself, if someone were looking at the variable name for the first time, is it reasonably likely that person will understand its purpose?
A name is too long when there exists a shorter name that equally conveys the purpose of the variable.
I think engineCount would be fine here. The number of engine names is presumably equal to the number of engines.
See JaredPar's post.
It depends on the scope of the variable. A local variable in a short function is usually not worth a 'perfect name'; just call it engine_count or something like that. Usually the meaning will be easy to spot; if not, a comment might be better than a two-line variable name.
Variables of wider scope, i.e. global variables (if they are really necessary!) and member variables, deserve IMHO a name that is almost self-documenting. Of course, looking up the original declaration is not difficult and most IDEs do it automatically, but the identifier of the variable should not be meaningless (e.g. number or count).
Of course, all this depends a lot on your personal coding style and the conventions at your work place.
It depends on the context. If it is a local variable, e.g.
int num = text.scan(SEARCH_ENGINE_NAME).size();
the more explicit the right-hand side of the expression, the shorter the name I'd pick. The rationale is that we are in a limited scope of maybe 4-5 lines and can thus assume that the reader will be able to make the connection between the short name and the right-hand-side expression. If, however, it is a field of a class, I'd rather be as verbose as possible.
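The same rule of thumb in Python, as a sketch only: the regex standing in for SEARCH_ENGINE_NAME and the `SearchReport` class are hypothetical, chosen to contrast a short local name next to an explicit right-hand side with a descriptive attribute name:

```python
import re

# Hypothetical pattern standing in for SEARCH_ENGINE_NAME in the answer above.
SEARCH_ENGINE_NAME = re.compile(r"\b(?:Google|Bing|DuckDuckGo)\b")


def count_engine_names(text: str) -> int:
    # Local scope of two lines: the right-hand side already says what n is.
    n = len(SEARCH_ENGINE_NAME.findall(text))
    return n


class SearchReport:
    def __init__(self, text: str) -> None:
        # An attribute outlives this method and is read far from here,
        # so it gets the descriptive name.
        self.search_engine_name_count = count_engine_names(text)


print(count_engine_names("Google and Bing, not Yahoo"))  # 2
```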
See similar question
The primary technical imperative is to reduce complexity. Variables should be named to reduce complexity. Sometimes this results in shorter names, sometimes longer names. It usually corresponds to how difficult it is for a maintainer to understand the complexity of the code.
On one end of the spectrum, you have for-loop iterators and indexes. These can have names like i or j, because they are so common and simple. Giving them longer names would only cause more confusion.
If a variable is used frequently but represents something more complex, then you have to give it a clear name so that the reader doesn't have to relearn what it means every time they encounter it.
On the other end of the spectrum are variables that are used very rarely. You still want to reduce confusion here, but giving it a short name is less important, because the penalty for relearning the purpose of the variable is not paid very often.
When thinking about your code, try to look at it from the perspective of someone else. This will help not only with picking names, but with keeping your code readable as a whole.
Having really long variable names will muddle up your code's readability, so you want to avoid those. But on the other end of the spectrum, you want to avoid ultra-short names or acronyms like "n" or "ne." Short, cryptic names like these will cause someone trying to read your code to tear their hair out. One- or two-letter names are usually reserved for small tasks, such as a counter incremented in a for loop.
So what you're left with is a balance between these two extremes. "Num" is a commonly used abbreviation, and any semi-experienced programmer will know what you mean immediately. So something like "numEngines" or "numEngineNames" would work well. In addition to this, you can also put a comment in your code next to the variable the very first time it's used. This will let the reader know exactly what you're doing and helps to avoid any possible confusion.
I'd name it "search_engine_count", because it holds a count of search engines.
Use Esc+_+Esc to write:
this_is_a_long_variable = 42
Esc+_+Esc and _ are not identical characters in Mathematica. That's why you are allowed to use the former but not the latter.
If it is a local variable in a function, I would probably call it n, or perhaps ne. Most functions only contain two or three variables, so a long name is unnecessary.

What's the difference between a command and a statement?

Often when reading about Tcl (e.g. http://antirez.com/articoli/tclmisunderstood.html) you read that "everything is a command". Sometimes you also hear that other languages are, like Tcl, "command languages."
Coming from other languages, I just view these "commands" as statements. What precisely is the difference between a command and a statement?
Traditionally, in Tcl, the phrase "everything is a command" means that there's no such thing as a "reserved word" command, or one that is defined by the system that you can't change. Every single executable piece of Tcl code is of the format:
command ?arg1? ... ?argN?
There's no such thing as a command that's part of the syntax and can't be overwritten (like "if" and other control structures in most languages). It's entirely possible to redefine the "if" command to do something slightly different.
For example, you could redefine "while" as:
proc while {expression body} {
    # Evaluate both the condition and the body in the caller's scope.
    for {} {[uplevel 1 [list expr $expression]]} {} {
        uplevel 1 $body
    }
}
The above code is untested, but it shows the general idea.
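For contrast, a short Python sketch (Python is used only as an example of a language with reserved words; `can_rebind` is a hypothetical helper) checks which names the parser lets you assign to. Keywords such as `if` and `while` are part of the syntax and are rejected, while built-in functions such as `print` are ordinary names that can be rebound, which is roughly the status every Tcl command has:

```python
import keyword


def can_rebind(name: str) -> bool:
    """True if Python's parser accepts `name = None` as an assignment."""
    try:
        compile(f"{name} = None", "<check>", "exec")
    except SyntaxError:
        return False
    return True


for name in ("if", "while", "print", "len"):
    # Columns: the name, whether it is a keyword, whether it can be rebound.
    print(name, keyword.iskeyword(name), can_rebind(name))
```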
A command is what other languages call a function, routine or reserved word, and can be defined by the "proc" command or in C or whatever. A statement is an invocation of a command. Using traditional definitions, a statement in Tcl is a command followed by zero or more arguments.
Consider the following code:
1 proc hello {string} {
2     puts "hello, $string"
3 }
4 hello "world"
Line 1 defines a command named "hello", line 4 is a statement that calls the "hello" command.
True enough, some articles on Tcl use the term "command" to mean both the name of a command and the invocation of the command. I wouldn't get too hung up on that. Think of statements as the invocation of a command and you'll be fine.
When you see the phrase "everything is a command", it means that there are no reserved words. Things that look like language syntax or keywords (if, return, exit, break, while and so on) are all commands. They can all be treated alike: they can be renamed, removed, re-implemented, etc.
I'd say that a command is what you execute in a statement; different statements may execute the same command (with different arguments, typically). The command is the operation; a statement is a specific invocation of the operation.
I guess this is mainly a question of semantics, so there may be some variation in the understanding of these concepts. That said, this Wikipedia article provides some guidance that is in keeping with my intuition on the topic.
A statement is a unit of an imperative program. Specifically, it is the unit that is terminated by the statement terminator. In C that's a semi-colon. Or, in Pascal and its descendants, it's the unit that's separated by the statement separator. I think in most flavours of Pascal that's also a semi-colon.
A command is an action, such as a function call or a keyword that performs an action. The Wikipedia article likens them to verbs, and I think that's a good description.
So, a variable declaration is a statement, but probably not a command. And a variable assignment via an assignment operator might be considered a command by some and not by others. One sometimes hears it referred to as an assignment command, but not often. If it looks like a function call, as in TCL, then it's more 'command-like', since there's an explicit verb set.
Also, statements may consist of several commands. For example, think about several function calls in C joined with commas. I would consider that one statement, since it has one semi-colon and returns one value. However, each of the separate calls could be considered a command.
The other thing to bear in mind when considering the statement/command terminology is that statements typically refer to imperative programs, while commands typically refer to shells and scripts. So, when using a command shell like bash, one speaks of commands. If I write a bash script, then it is usually thought of as a script of commands rather than a program of statements, even though the difference is largely academic.
TCL, as one of the early scripting languages, probably wanted to draw a distinction between itself as an interpreted scripting language running in a shell (tclsh, or wish for Tk) and compiled languages like C and Pascal. Perl, for example, having come to popularity somewhat later, typically doesn't harp on the distinction.
However, you still often hear people refer to Perl scripts, rather than Perl programs, and likewise for TCL.
I fear that my answer may have done nothing to clarify the distinction, but I hope it helps.
I often wondered why the term 'statement' is used in programming since it does not seem to match with the meaning of the word in natural language, where a command is an imperative and a statement is not. Intuitively I would prefer the word command.
In Pascal, a variable declaration is not considered a statement, contrary to what Jeremy Bourque suggests (though it may be true for other languages), since each block is divided into a declaration section containing all declarations and a statement section containing all statements.
Statements are separated by semicolons, as Jeremy Bourque says above. I don't think it is possible to have more than one command in one statement (as in C, apparently). Except for compound statements of course, which have one or more sub-statements. So I guess that one could consider command and statement synonyms in Pascal.
However, in implementations of Pascal, the word command is often used for commands given to the compiler and development environment (IDE). It might be useful to have a different word for commands (statements) in the source code. Perhaps the people who developed Pascal and other early languages considered a command to be something that is executed immediately. Therefore, Bourque's remarks about the difference between compiled and scripted languages sound logical to me.
I would think that a command is an instruction in code, and a statement may run several commands but at the end evaluates to true or false.