Opening curly brace at a newline in Kotlin

According to the Kotlin coding conventions, putting an opening curly brace on its own line is discouraged because semicolons are optional, and doing so might cause surprising behavior. What would the surprising behavior be if we put the opening brace on its own line?
Formatting
In most cases, Kotlin follows the Java coding conventions.
Use 4 spaces for indentation. Do not use tabs.
For curly braces, put the opening brace in the end of the line where
the construct begins, and the closing brace on a separate line aligned
horizontally with the opening construct.
if (elements != null) {
    for (element in elements) {
        // ...
    }
}
(Note: In Kotlin, semicolons are optional, and therefore line breaks are significant. The language design assumes Java-style braces,
and you may encounter surprising behavior if you try to use a
different formatting style.)

The comment about surprising behavior is not about the curly brace, but more general. Consider this code:
val result = 1
+ 2
println(result)
You might expect this to print "3", but it prints "1" because these are two statements: val result = 1 and + 2 (a unary plus whose result is discarded).
You would write it like this in Kotlin if you wanted to break the line:
val result = 1 +
    2
This is a simple example, but it highlights the difference that not having semicolons to mark the end of a statement makes.
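The two line-break placements above can be compared side by side in a small runnable sketch. The function names are hypothetical and exist only for illustration.

```kotlin
// Line break BEFORE the operator: the parser sees two statements,
// `val result = 1` and the unused expression `+ 2` (unary plus).
fun splitBeforeOperator(): Int {
    val result = 1
        + 2
    return result
}

// Line break AFTER the operator: the trailing `+` signals that the
// expression continues on the next line, so this is one statement.
fun splitAfterOperator(): Int {
    val result = 1 +
        2
    return result
}

fun main() {
    println(splitBeforeOperator()) // 1
    println(splitAfterOperator())  // 3
}
```

The compiler accepts both versions; the first one typically only produces an "unused expression" warning, which is why the mistake is easy to miss.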

Escape hex like \u... in kotlin strings

I have a string "\ufffd\ufffd hello\n".
I have code like this:
fun main() {
    val bs = "\ufffd\ufffd hello\n"
    println(bs) // �� hello
}
and I want to see "\ufffd\ufffd hello". How can I escape \u for every hex value?
UPD:
val s = """\uffcd"""
val unicodeRegex = """(?<!\\\\)(\\\\\\\\)*(\\u)([A-Fa-f\\d]{4})""".toRegex()
return s.replace(unicodeRegex, """$1\\\\u$3""")
(I'm interpreting the question as asking how to clearly display a string that contains non-printable characters.  The Kotlin compiler converts sequences of a \u followed by 4 hex digits in string literals into single characters, so the question is effectively asking how to convert them back again.)
Unfortunately, there's no built-in way of doing this.  It's fairly easy to write one, but it's a bit subjective, as there's no single definition of what's ‘printable’…
Here's an extension function that probably does roughly what you want:
fun String.printable() = map {
    when (Character.getType(it).toByte()) {
        Character.CONTROL, Character.FORMAT, Character.PRIVATE_USE,
        Character.SURROGATE, Character.UNASSIGNED, Character.OTHER_SYMBOL
            // Char.code replaces the deprecated Char.toInt() (Kotlin 1.5+)
            -> "\\u%04x".format(it.code)
        else -> it.toString()
    }
}.joinToString("")
println("\ufffd\ufffd hello\n".printable()) // prints ‘\ufffd\ufffd hello\u000a’
The sample string in the question is a bad example, because \uFFFD is the replacement character — a black diamond with a question mark, usually shown in place of any non-displayable characters.  So the replacement character itself is displayable!
The code above treats it as non-displayable by including the Character.OTHER_SYMBOL type in the list of escaped types, but that will also escape many other symbols.  So you'll probably want to remove it, leaving just the other 5 types.  (I got those from this answer.)
Because the trailing newline is non-displayable, that gets converted to a hex code too.  You could extend the code to handle the escape codes \t, \b, \n, \r and maybe \\ too if needed.  (You could also make it more efficient… this was done for brevity!)
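One way the suggested extension might look is sketched below. It is a JVM-only sketch under stated assumptions: the name escaped() is hypothetical (not a standard-library function), Character.OTHER_SYMBOL is left out as recommended above, and the named escapes are limited to the five mentioned.

```kotlin
// Hypothetical helper: renders common control characters as their named
// escapes and other non-displayable characters as \uXXXX.
fun String.escaped(): String = buildString {
    for (ch in this@escaped) {
        when (ch) {
            '\t' -> append("\\t")
            '\b' -> append("\\b")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\\' -> append("\\\\")
            else -> when (Character.getType(ch).toByte()) {
                Character.CONTROL, Character.FORMAT, Character.PRIVATE_USE,
                Character.SURROGATE, Character.UNASSIGNED
                    -> append("\\u%04x".format(ch.code))
                else -> append(ch)
            }
        }
    }
}

fun main() {
    println("a\tb\\c\n".escaped()) // a\tb\\c\n
    println("\u0007".escaped())    // \u0007
}
```

Because OTHER_SYMBOL is not in the list, the replacement character \uFFFD passes through unchanged here; add the type back if it should be escaped as well.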
Simply escape the \ in your strings by adding another backslash in front of it:
val bs = "\\ufffd\\ufffd hello\n"
You can also use raw strings with """ so you don't have to escape the backslashes (which is useful for regex):
val bs = """\ufffd\ufffd hello\n"""
Note that in that case the \n would also NOT be counted as an LF character, and will be literally printed as the 2 characters "\n".
You can add literal line breaks in your raw string if you want an actual line feed, though:
val bs = """\ufffd\ufffd hello
"""

Brace Delimiters with qq Don't Interpolate Code in Raku

Sorry if this is documented somewhere, but I haven't been able to find it. When using brace delimiters with qq, code is not interpolated:
qq.raku
#!/usr/bin/env raku
say qq{"Two plus two": { 2 + 2 }};
say qq["Two plus two": { 2 + 2 }];
$ ./qq.raku
"Two plus two": { 2 + 2 }
"Two plus two": 4
Obviously, this isn't a big deal since I can use a different set of delimiters, but I ran across it and thought I'd ask.
Update
As @raiph pointed out, I forgot to put the actual question: Is this the way it's supposed to work?
The quote language "nibbler" (the bit of the grammar that eats its way through a quoted string) looks like this:
[
    <!stopper>
    [
    || <starter> <nibbler> <stopper>
    || <escape>
    || .
    ]
]*
That is, until we see a stopper, eat whichever comes first of:
A starter (the opening { in your case), followed by some internal stuff, followed by a stopper (the }); this allows for nesting of the construct inside of the string
An escape (and closure interpolation is considered a kind of escape)
Any other character
This ordering in the grammar means that a nesting of the chosen quote starter/stopper will always win over an escape. This issue was discussed during the language design; we could, after all, have reordered the alternation in the grammar to have escapes win. On balance, however, it was felt that the choice of starter/stopper was the more local decision than the general properties of the quoting language, and so should take precedence. (This is also consistent with how quote languages are constructed: we take the base quoted string grammar and mix starter/stopper methods into it.)
Obviously, this isn't a big deal since I can use a different set of delimiters, but I ran across it and thought I'd ask.
You didn't ask anything. :)
Let's say you've got some text. And you want to use double quote processing to get interpolation, except you don't want braced text to be interpolated as code. You could write, say, qq:!c '...'. But don't you think it's a lot easier to remember, write, and read qq{ ... }?
Nice little touch, right?
Which is why it's the way it is -- it's a very nice touch.
And, perhaps, why it's not documented -- it's little, and, once you encounter it, obvious what you need to do.
That said, the Q lang escapes include ones to recursively re-enter the Q lang:
say qq{"Two plus two": \qq[{ 2 + 2 }] }; # "Two plus two": 4
Does that answer your question? :)

Keep block style code spacing in IntelliJ

How can I keep IntelliJ from removing spaces in Python (or any language, for that matter) in places where the spaces serve readability, such as repetitive assignments of many values?
is how I like it and I think many vim users agree that this is the way to go.
However, this is what IntelliJ makes out of it
The issue is specifically interesting with languages such as Python, where spaces can (but do not have to) impact the program's flow.
I am also aware that it is rather difficult to define when spaces should be compacted (i.e. when only one of the 4 lines above is present) and when they should be kept.
I guess some heuristic approaches would work, this however wouldn't really be a 100% on-spot lintable situation.
I like your idea, but don't see how to achieve that within the Editor Settings.
An ugly alternative that does work, but "pollutes" your source, is to Enable formatter markers in comments on this screen: File -> Settings -> Editor -> Code Style:
After choosing that option you can selectively create blocks of code that will be ignored by IDEA when it formats the code:
// @formatter:off
String s1 = "Arkansas"      + ".";
String s2 = "Maine"         + ".";
String s3 = "Massachusetts" + ".";
String s4 = "Ohio"          + ".";
// @formatter:on
You could also raise a bug report with JetBrains: "Provide an option to allow multiple embedded spaces in source code". That should be fairly straightforward for them to implement: just don't replace multiple embedded spaces by a single space when reformatting.

IntelliJ code style only break on whitespace

I'm trying to create a code style in IntelliJ to keep formatting consistent. I want to ensure the right margin is not exceeded, but when I reformat and IntelliJ tries to enforce that, it often results in some quite strange looking lines.
For example, it often likes to break the line in the middle of a string resulting in something like
public static final String SOME_CONSTANT_STRING = "A long string that is cut" +
        " off in the middle.";
What I would like instead is for IntelliJ to only break the line at whitespace, so the line would look like
public static final String SOME_CONSTANT_STRING =
        "A long string that is cut off in the middle.";
Is this behavior possible?
From Preferences > Editor > Code Style > Java > Wrapping and Braces ...
Assignment statement > Wrap if long
Assignment statement > Align when multiline
Here's a screenshot showing the configuration:
Here's a screenshot showing the outcome after applying this formatter to the declaration of a very long string constant:

Does .parse anchor or :sigspace first in a Perl 6 rule?

I have two questions. Is the behavior I show correct, and if so, is it documented somewhere?
I was playing with the grammar TOP method. Declared as a rule, it implies beginning- and end-of-string anchors along with :sigspace:
grammar Number {
    rule TOP { \d+ }
}
my @strings = '137', '137 ', ' 137 ';
for @strings -> $string {
    my $result = Number.parse( $string );
    given $result {
        when Match { put "<$string> worked!" }
        when Any { put "<$string> failed!" }
    }
}
With no whitespace or trailing whitespace only, the string parses. With leading whitespace, it fails:
<137> worked!
<137 > worked!
< 137 > failed!
I figure this means that rule is applying :sigspace first and the anchors afterward:
grammar Foo {
    regex TOP { ^ :sigspace \d+ $ }
}
I expected a rule to allow leading whitespace, which would happen if you switched the order:
grammar Foo {
    regex TOP { :sigspace ^ \d+ $ }
}
I could add an explicit token in rule for the beginning of the string:
grammar Number {
    rule TOP { ^ \d+ }
}
Now everything works:
<137> worked!
<137 > worked!
< 137 > worked!
I don't have any reason to think it should be one way or the other. The Grammars docs say two things happen, but the docs do not say which order these effects apply:
Note that if you're parsing with .parse method, token TOP is automatically anchored
and
When rule instead of token is used, any whitespace after an atom is turned into a non-capturing call to ws.
I think the answer is that the rule isn't actually anchored in the pattern sense. It's the way .parse works. The cursor has to start at position 0 and end at the last position in the string. That's something outside of the pattern.
The behavior is intended, and is a culmination of these language features:
Sigspace ignores whitespace before the first atom.
From the design docs¹ (S05: Regexes and Rules, line 348, emphasis added):
The new :s (:sigspace) modifier causes certain whitespace sequences to be considered "significant"; they are replaced by a whitespace matching rule, <.ws>. Only whitespace sequences immediately following a matching construct (atom, quantified atom, or assertion) are eligible. Initial whitespace is ignored at the front of any regex, to make it easy to write rules that can participate in longest-token-matching alternations. Trailing space inside the regex delimiters is significant.
This means:
rule TOP { \d+ }
              ^-------- <.ws> automatically inserted
rule TOP { ^ \d+ $ }
            ^---^-^---- <.ws> automatically inserted
Regexes are first-class compiled code with lexical scoping.
A regex/rule is not a string that may have characters concatenated to it later to change its behavior. It is a self-contained routine, which is parsed and has its behavior nailed down at compile time.
Regex modifiers like :sigspace, including the one implicitly added by the rule keyword, apply only to their lexical scope, i.e. to the fragment of source code they appear in at compile time. S05, line 629¹:
The :i, :m, :r, :s, :dba, :Perl5, and Unicode-level modifiers can be placed inside the regex (and are lexically scoped)
The anchoring of rule TOP is done at run time by .parse.
S05, line 4423¹:
The .parse and .parsefile methods anchor to the beginning and ending of the text, and fail if the end of text is not reached. (The TOP rule can check against $ itself if it wishes to produce its own error message.)
I.e. the anchoring to the beginning of the string is not intrinsic to the rule TOP, and doesn't affect how the lexical scope of TOP is parsed and compiled. It is done when method .parse is called.
It has to be this way, because the same grammar can be used with different starting rules instead of TOP, using .parse(..., rule => ...).
So when you write
rule TOP { \d+ }
it is compiled as
regex TOP { :r \d+ <.ws> }
And when you .parse that grammar, it effectively invokes the regex code ^ <TOP> $, with the anchors not being part of TOP's lexical scope but rather of a scope that merely calls the routine TOP. The combined behavior is as if the rule TOP had been written as:
regex TOP { ^ [:r :s \d+] $ }
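The same split between "what the pattern says" and "how the caller anchors it" can be illustrated with a rough analogy in Kotlin's regex API. This is only a sketch, not Raku semantics: <.ws> is approximated as \s*, the names ruleTop, ruleTopWithCaret and parses are hypothetical, and Regex.matches plays the role of .parse by requiring the whole input to match.

```kotlin
// Rough stand-ins for the Raku rules; <.ws> is approximated as \s*.
val ruleTop = Regex("""\d+\s*""")              // rule TOP { \d+ }   ~  \d+ <.ws>
val ruleTopWithCaret = Regex("""^\s*\d+\s*""") // rule TOP { ^ \d+ } ~  ^ <.ws> \d+ <.ws>

// Like .parse, the caller (not the pattern) demands that the match
// covers the entire input from the first to the last character.
fun parses(pattern: Regex, input: String): Boolean = pattern.matches(input)

fun main() {
    for (s in listOf("137", "137 ", " 137 ")) {
        println("<$s> plain=${parses(ruleTop, s)} caret=${parses(ruleTopWithCaret, s)}")
    }
}
```

Only " 137 " fails with the plain pattern, because nothing in it consumes the leading space; the explicit ^ gives the whitespace matcher an atom to follow, which mirrors the behavior described above.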
1) The design docs are in general not to be taken as gospel for what is or isn't part of the Perl 6 language, but S05 is pretty accurate in that regard, except that it mentions some features that haven't been implemented yet but are planned. Anyone who wants to truly grok the intricacies of Perl 6 regexes/grammars, is IMO well served by reading the full S05 from top to bottom at least once.
There aren't two regex effects going on. The rule applies :sigspace. After that, the grammar is defined. When you call .parse, it starts at the beginning of the string and goes to the end (or fails). That anchoring isn't part of the grammar. It's part of how .parse applies the grammar.
My main issue was the odd way some of the things are worded in the docs. They aren't technically wrong, but they also tend to assume knowledge about things the reader might not know. In this case, the casual comment about anchoring TOP isn't as special as it seems. Any rule passed to .parse is anchored in the same way. There's no special behavior for that rule name other than it's the default value for :rule in a call to .parse.