Is there an APL idiom to get a vector of all alphabetical characters? - cross-platform

I know you can get a character vector of all numbers with ∊⍕¨⍳10, but is there a platform independent idiom for getting a vector of all alphabetical characters, aside from manually typing 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'? I know that I can do ⎕AV[(⍳26)+(⎕AV⍳'a')-1] to get all lowercase characters (and uppercase by changing the 'a' to 'A') in Dyalog APL, but I presume the system variable ⎕AV isn't available in other environments.

Not really.
In Dyalog APL, what I generally do is use ⎕A for the uppercase characters and ⎕UCS 96+⍳26 for the lowercase characters. (And ⎕A,⎕UCS 96+⍳26 for the whole alphabet.)
⎕AV is usually present, but its contents are not standard. (For example, NARS2000's ⎕AV is different from Dyalog's.) By the way, in Dyalog ⎕AV is considered deprecated in favour of ⎕UCS. Any APL that implements ⎕UCS will do it the same way, because Unicode is a set standard.
If you want a guaranteed implementation-independent, readable way to define the alphabet, I would indeed recommend simply storing abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ in your workspace.
However, I would not recommend trying to write implementation-independent APL code to begin with. APL dialects are rather divergent, so this is decidedly nontrivial (if possible at all for complex code), and will be difficult to maintain.
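The reason ⎕UCS 96+⍳26 is portable is that it relies only on Unicode code points, not on any interpreter's character table. As a rough illustration outside APL, here is a minimal Python sketch of the same arithmetic, assuming an index origin of 1 (so ⍳26 is 1..26). The variable names are made up for this example.

```python
import string

# Mirror of APL's ⎕UCS 96+⍳26 with index origin 1: code points 97..122.
lowercase = "".join(chr(96 + i) for i in range(1, 27))

# Same idea for the uppercase letters: code points 65..90.
uppercase = "".join(chr(64 + i) for i in range(1, 27))

# Counterpart of ⎕A,⎕UCS 96+⍳26.
alphabet = uppercase + lowercase

# Sanity check against Python's own constants.
assert lowercase == string.ascii_lowercase
print(alphabet)
```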

Even though Quad names (⎕xxx) are usually case insensitive, MicroAPL distinguishes between ⎕A and ⎕a, so ⎕a,⎕A gives 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.

Yes, in the latest versions* of Dyalog APL, write ⎕A,819⌶⎕A to get ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
Try it online!
Documentation.
* Latest builds of 14.1, and all versions of 15.0 and up.


Kotlin, why is function deprecated?

Recently, I wanted to use the capitalize() function.
When I did so, a warning occurred:
'capitalize(): String' is deprecated. Use replaceFirstChar instead.
The suggested replaceFirstChar feels quite long and complicated:
input.replaceFirstChar { if (it.isLowerCase()) it.titlecase(Locale.getDefault()) else it.toString() }
Although everything works fine with capitalize(), I've been wondering what the reason for the deprecation warning is, whether there is any problem with it, and how I can solve this.
There is no problem with the code per se. The "issue" is what capitalize means for all of the possible languages/characters/platforms/locales.
The doc even says why it is deprecated.
The title case of a character is USUALLY the same as its upper case
with several exceptions. The particular list of characters with the
special title case form depends on the underlying platform.
If you follow the reasoning behind the deprecation, you reach isTitleCase, which gives examples of characters in languages where the character should not simply be "upper cased".
Capitalization is writing a word with its first letter as a capital letter (uppercase
letter) and the remaining letters in lower case, in writing systems
with a case distinction.
There are a few good examples and explanations for different languages on Letter case Wiki
e.g.
The German letter "ß" formerly existed only in lower case.
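The difference between title case and upper case is easy to see on a concrete character. The following is a small sketch in Python rather than Kotlin (chosen only for illustration, and not what the Kotlin standard library does internally), using Python's built-in Unicode case mappings. The variable names are invented for the example.

```python
# U+01C6 (ǆ) is a single code point that has three distinct case forms.
dz_lower = "\u01c6"

# Upper case maps to U+01C4 (Ǆ): both letters of the digraph are capital.
dz_upper = dz_lower.upper()

# Title case maps to U+01C5 (ǅ): only the first letter is capital.
dz_title = dz_lower.title()

# The title-case form is not the upper-case form.
print(dz_upper == dz_title)

# The German ß upper-cases to two characters, so "capitalizing" it is not a
# simple one-character swap either.
eszett_upper = "ß".upper()
print(eszett_upper)
```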

Should a number after a word in ALL_CAPS style be preceded by an underscore?

I tried to find literature on this but couldn't seem to find any. Examples in PEP8 do not include digits (I'm using Python, but this question should be language agnostic).
In snake_case, I would write variable1, variable2, and this is fine to me as the numbers stand out.
However, in ALL_CAPS, I instinctively write VARIABLE_1, VARIABLE_2 instead of VARIABLE1, VARIABLE2, I suppose because it feels like the digits blend into the words without an underscore separating them. For a more realistic example, compare NUM2WORDS with NUM_2_WORDS: the latter seems far clearer, at least to me.
Is this "wrong" (as far as the definition of that word can stretch)? What is the prevailing style and why?
In prose (ordinary writing) you would write "variable 1" not "variable1", so for sake of consistency I think you should add an underscore when snake case is used, i.e. variable_1.

Does the triple equal sign (===) behave differently in AssemblyScript?

A vendor I use packages their software with AssemblyScript. They provide some infrastructure and I build on top of it.
Accidentally, I changed my double equal signs ("==") to triple equal signs ("===") in a function that performs equality checks on hexadecimal strings. I spent hours ensuring that the values checked are indeed equal and have the same case sensitivity, but nothing could make the if statement enter the branch I was expecting it to enter, except for going back to "==".
And so I ended up here, asking for help. How is "===" different to "==" in AssemblyScript? Is it some quirk of the language itself or the vendor's parser?
Yes. In AssemblyScript, triple equals ("===") compares raw references and skips the overloaded "==" operator. See the docs.
There is a proposal to remove this behaviour, which is non-standard compared to TypeScript. You can check and upvote this issue.

Why not have operators as both keywords and functions?

I saw this question and it got me wondering.
Ignoring the fact that pretty much all languages have to be backwards compatible, is there any reason we cannot use operators as both keywords and functions, depending on if it's immediately followed by a parenthesis? Would it make the grammar harder?
I'm thinking mostly of Python, but also C-like languages.
Perl does something very similar to this, and the results are sometimes surprising. You'll find warnings about this in many Perl texts; for example, this one comes from the standard distributed Perl documentation (man perlfunc):
Any function in the list below may be used either with or without parentheses around its arguments. (The syntax descriptions omit the parentheses.) If you use parentheses, the simple but occasionally surprising rule is this: It looks like a function, therefore it is a function, and precedence doesn't matter. Otherwise it's a list operator or unary operator, and precedence does matter. Whitespace between the function and left parenthesis doesn't count, so sometimes you need to be careful:
print 1+2+4; # Prints 7.
print(1+2) + 4; # Prints 3.
print (1+2)+4; # Also prints 3!
print +(1+2)+4; # Prints 7.
print ((1+2)+4); # Prints 7.
An even more surprising case, which often bites newcomers:
print
($a % 7 == 0 || $a % 7 == 1) ? "good" : "bad";
will print the value of the condition (1, or an empty string when it is false), never "good" or "bad".
In short, it depends on your theory of parsing. Many people believe that parsing should be precise and predictable, even when that results in surprising parses (as in the Python example in the linked question, or even more famously, C++'s most vexing parse). Others lean towards Perl's "Do What I Mean" philosophy, even though the result -- as above -- is sometimes rather different from what the programmer actually meant.
C, C++ and Python all tend towards the "precise and predictable" philosophy, and they are unlikely to change now.
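To make the Python side of this concrete, here is a minimal sketch contrasting the not keyword written with parentheses against a real function call. It uses operator.not_ from the standard library as the function form. The value of x and the variable names are arbitrary choices for the example.

```python
import operator

x = 1

# "not" is a keyword with low precedence, so the parentheses do not make it a
# call: this parses as not ((x) + 1), i.e. not 2.
keyword_form = not(x) + 1

# operator.not_ is an ordinary function, so the call binds first:
# operator.not_(1) is False, and False + 1 is 1.
function_form = operator.not_(x) + 1

print(keyword_form, function_form)
```

The two lines look alike but produce different values, which is the kind of surprise a "keyword or function depending on the parenthesis" rule would have to resolve one way or the other.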
Whether not() is defined depends on the language. If a language does not define not(), you cannot use it. Why is it not defined in some languages? Probably because the creator of that language saw no need for this kind of construct, and because it is better to keep things simple.

What is The Turkey Test?

I came across the term 'The Turkey Test' while learning about code testing. I don't really know what it means.
What is the Turkey Test? Why is it called that?
The Turkey problem is related to software internationalization or simply to its misbehavior in various language cultures.
Different countries have different standards, for example for writing dates (14.04.2008 in Turkey and 4/14/2008 in the US), numbers (e.g. 123,45 in Poland and 123.45 in the USA) and rules about character uppercasing (as in Turkey with the letters i, I and ı).
As Jeff Moser pointed out below, one such problem was reported by a Turkish user who found a bug in the ToUpper() function. There are more details in the comments below.
However the problem is not limited to Turkey and to string conversions.
For example, in Poland and many other countries, dates and numbers are also written in a different manner.
Some links from a Google search for the Turkey Test :
Does Your Code Pass The Turkey Test?
by Jeff Moser
What's Wrong With Turkey?
by Jeff Atwood
The Turkey Test is described here:
Forget about Turkey, this won't even pass in the USA. You need a case insensitive compare. So you try:
String.Compare(string,string,bool ignoreCase):
....
Do any of these pass "The Turkey Test?"
Not a chance!
Reason: You've been hit with the "Turkish I" problem.
As discussed by lots and lots of people, the "I" in Turkish behaves differently than in most languages. Per the Unicode standard, our lowercase "i" becomes "İ" (U+0130 "Latin Capital Letter I With Dot Above") when it moves to uppercase. Similarly, our uppercase "I" becomes "ı" (U+0131 "Latin Small Letter Dotless I") when it moves to lowercase.
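The mapping described above can be sketched in a few lines. This example is in Python, not the .NET code the quoted article discusses, and turkish_upper and turkish_lower are hypothetical helpers written for illustration. Python's built-in str.upper() and str.lower() are not locale-aware, so the Turkish rules have to be applied explicitly.

```python
def turkish_upper(text: str) -> str:
    """Upper-case text using the Turkish rules for i and ı (hypothetical helper)."""
    # Dotted i becomes dotted capital İ; dotless ı becomes plain capital I.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case text using the Turkish rules for I and İ (hypothetical helper)."""
    # Dotted capital İ becomes plain i; plain capital I becomes dotless ı.
    return text.replace("İ", "i").replace("I", "ı").lower()


# The default, locale-independent mappings treat i/I as a pair.
print("title".upper(), "TITLE".lower())

# The Turkish-aware helpers keep the dot (or its absence) across case changes.
print(turkish_upper("title"), turkish_lower("TITLE"))
```

A case-insensitive comparison built on the default mappings would therefore treat "title" and "TITLE" as equal, while a comparison following Turkish rules would not.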
We write dates smaller to bigger like dd.MM.yyyy: 28.10.2010
We use '.'(dot) for thousands separator, and ','(comma) for decimal separator: 4.567,9
We have ö=>Ö, ç=>Ç, ş=>Ş, ğ=>Ğ, ü=>Ü, and most importantly ı=>I and i => İ; in other words, lower case of upper I is dotless and upper case of lower i is dotted.
People may have very stressful times because of meaningless errors caused by the above rules.
If your code properly runs in Turkey, it'll probably work anywhere.
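The date and number conventions listed above can be reproduced without depending on which locales are installed. The following is a minimal Python sketch. format_tr_number and format_tr_date are hypothetical helper names, and real applications would normally rely on a proper localization library instead.

```python
from datetime import date


def format_tr_number(value: float, decimals: int = 1) -> str:
    """Format a number with '.' for thousands and ',' for decimals (hypothetical helper)."""
    # Start from the US-style grouping, e.g. '4,567.9' ...
    us_style = f"{value:,.{decimals}f}"
    # ... then swap the two separators in a single pass.
    return us_style.translate(str.maketrans({",": ".", ".": ","}))


def format_tr_date(d: date) -> str:
    """Format a date as dd.MM.yyyy (hypothetical helper)."""
    return d.strftime("%d.%m.%Y")


print(format_tr_number(4567.9))
print(format_tr_date(date(2010, 10, 28)))
```

Code that parses such strings back with US assumptions (comma as thousands separator, month first) is exactly what the Turkey Test is meant to catch.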
The so-called "Turkey Test" is related to software internationalization. One problem of globalization/internationalization is that date and time formats in different cultures can differ on many levels (day/month/year order, date separator etc).
Also, Turkey has some special rules for capitalization, which can lead to problems. For example, the Turkish "i" character is a common problem for many programs which capitalize it in a wrong way.
The link provided by @Luixv gives a comprehensive description of the issue.
The summary is that if you're going to test your code on only one non-English locale, test it on Turkish.
This is because the Turkish locale has instances of most edge cases you are likely to encounter with localization, including "unusual" format strings and non-standard characters (such as different capitalization rules for i).
Jeff Atwood has a blog article on the same topic, which is the first place I came across it myself.
In summary, attempting to run your application under a Turkish locale is an excellent test of your I18n.
Here's Jeff's article.