Finding classes/files/symbols with umlauts (ä, ö, ü) by their transliteration (ae, oe, ue) - intellij-idea

I am working with code that uses German names for classes, symbols and files. All German special characters like ä, ü, ö and ß are used in their transliterated form, i.e. "ae" for "ä", "oe" for "ö", etc.
Since there is no technical reason for doing that anymore, I am evaluating whether allowing umlauts and the like in their natural form is feasible. The biggest problem here is that there will be inconsistent naming once umlauts are allowed. E.g. a class may be named "ÖffentlicheAuftragsübernahme" (new form) or "OeffentlicheAuftragsuebernahme" (old form). This will make searching for classes, symbols and files more difficult.
Is there a way to extend the search (code navigation, to be exact) of IntelliJ IDEA so that it ignores whether a name is written with umlauts or their transliteration?
I suppose this would require modifying the way IDEA indexes files. Would that be possible with a plugin? Or is there a different way to accomplish the desired result?
Example
Given the classes "KlasseÄ", "KlasseAe", "KlasseOe", "KlasseÜ":
IDEA "navigate to class" (Ctrl+N) --> find result
"KlasseÄ" --> ["KlasseÄ", "KlasseAe"]
"KlasseAe" --> ["KlasseÄ", "KlasseAe"]
"KlasseUe" --> ["KlasseÜ"]
"KlasseÖ" --> ["KlasseOe"]

This is actually fairly easy to implement and does not require indexing changes. All you need to do is implement the ChooseByNameContributor interface and register it as an extension for the gotoClassContributor extension point. In your implementation, you can use IDEA's existing indices to find classes with the alternatively spelled names and return them from getItemsByName.
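For reference, registering such a contributor in plugin.xml would look roughly like this (the implementation class name is a placeholder, not something from the answer):
<extensions defaultExtensionNs="com.intellij">
    <gotoClassContributor implementation="com.example.TransliterationClassContributor"/>
</extensions>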

You could provide your own instance of com.intellij.navigation.GotoClassContributor and in getItemsByName() look for the different variants of the input name in the PSI class index.
For example, if you extend com.intellij.ide.util.gotoByName.DefaultClassNavigationContributor, you can implement the method like this:
@Override
@NotNull
public NavigationItem[] getItemsByName(String name, final String pattern, Project project, boolean includeNonProjectItems) {
    List<NavigationItem> result = new ArrayList<>();
    Processor<NavigationItem> processor = Processors.cancelableCollectProcessor(result);
    List<String> variants = substituteUmlauts(name);
    for (String variant : variants) {
        processElementsWithName(variant, processor, FindSymbolParameters.wrap(pattern, project, includeNonProjectItems));
    }
    return result.isEmpty() ? NavigationItem.EMPTY_NAVIGATION_ITEM_ARRAY
                            : result.toArray(new NavigationItem[result.size()]);
}
substituteUmlauts() will compute all the different versions of what you type in the search box, and all the results will be aggregated in result.
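The answer leaves substituteUmlauts() undefined; here is a minimal sketch of what it could do (this implementation is my assumption, not part of the original answer). It expands in both directions until no new spelling appears, so "KlasseAe" also finds "KlasseÄ":
// Rewrite rules applied in both directions; "ß" -> "ss" is deliberately
// one-way because "ss" is far too common to map back.
private static final String[][] RULES = {
        {"Ä", "Ae"}, {"Ae", "Ä"}, {"ä", "ae"}, {"ae", "ä"},
        {"Ö", "Oe"}, {"Oe", "Ö"}, {"ö", "oe"}, {"oe", "ö"},
        {"Ü", "Ue"}, {"Ue", "Ü"}, {"ü", "ue"}, {"ue", "ü"},
        {"ß", "ss"}
};

// Breadth-first expansion: apply every rule to every variant seen so far
// until a fixpoint is reached. Uses java.util collections only.
private static List<String> substituteUmlauts(String name) {
    Set<String> variants = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(name);
    while (!queue.isEmpty()) {
        String current = queue.poll();
        if (!variants.add(current)) continue; // already processed
        for (String[] rule : RULES) {
            if (current.contains(rule[0])) {
                queue.add(current.replace(rule[0], rule[1]));
            }
        }
    }
    return new ArrayList<>(variants);
}
Spurious variants (e.g. turning the "ue" in a name that never had an umlaut into "ü") are harmless here: they just produce index lookups that find nothing.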

Related

How to replace all `public void` "Test"-methods with just "void" (with SSR and IntelliJ)

I've recently joined a codebase that no one seemed to ever run SonarLint against, and now every test class I open in IntelliJ is highlighted like a Christmas tree - all test methods are public, even though JUnit5 is used.
Tired of removing those public modifiers manually, I decided to implement an SSR template to do that for me. However, I can't seem to make it work with method parameter annotations! (which seem to be a rather usual thing with JMockit)
The best I can have thus far is this:
Open Edit->Find->Replace structurally
Paste this into Search field:
@$MethodAnnotation$
public void $MethodName$(@mockit.Injectable $ParameterType$ $Parameter$) {
$statement$;
}
Choose file type: Java - Class Member
Add filter for MethodAnnotation variable: Text=org.junit.jupiter.api.Test
Add Min=0 for statement and Parameter variables
Paste this into Replace field:
@$MethodAnnotation$
void $MethodName$(@Injectable $ParameterType$ $Parameter$) {
$statement$;
}
(mind the indentation, otherwise it will be problematic!)
Select Complete match value for Search target
Find
As you can see, the annotation is hard-coded, which of course limits the applicability of this snippet seriously.
Is this a problem with IntelliJ or rather my bad knowledge of SSR? What would be a proposed solution?

Should I give up grammatical correctness when naming my functions to offer regularity?

I implement several global functions in our library that look something like this:
void init_time();
void init_random();
void init_shapes();
I would like to add functions to provide a check whether those have been called:
bool is_time_initialized();
bool is_random_initialized();
bool are_shapes_initialized();
However, as you can see, are_shapes_initialized breaks the pattern, because shapes is plural and the function name therefore has to start with are instead of is. This could be a problem, as the library is rather large, and not having a uniform naming convention to group similar functions might be confusing / upsetting.
E.g., picture a user quickly looking up function names via IntelliSense to see if the library offers a way to check whether their initialization call happened: they won't find are_shapes_initialized() among the is_ entries unless they scroll through hundreds of additional function / class names. Just going with is_shapes_initialized() could offer clarity, as all of these functions would then be displayed together.
But how can using wrong grammar be a good approach? Shouldn't I just assume that the user should also ask IntelliSense for "are_initialized" or just look into the documentation in the first place? Probably not, right? Should I just give up on grammatical correctness?
The way I see it, a variable is a single entity. Maybe that entity is an aggregate of other entities, such as an array or a collection, in which case it would make sense to give it a plural name e.g. a set of Shape objects could be called shapes. Even so, it is still a single object. Looking at it that way, it is grammatically acceptable to refer to it as singular. After all, is_shapes_initialized actually means "Is the variable 'shapes' initialized?"
It's the same reason we say "The Bahamas is" or "The Netherlands is", because we are referring to the singular country, not whatever plural entity it is comprised of. So yes, is_shapes_initialized can be considered grammatically correct.
It's more a matter of personal taste. I would recommend putting "is" before functions that return Boolean. This would look more like:
bool is_time_initialized();
bool is_random_initialized();
bool is_shapes_initialized();
This makes them easier to find and search for, even if they aren't grammatically correct.
You can find functions that use "are" to show the subject is plural in places such as the DuckDuckGo app:
areItemsTheSame(...)
areContentsTheSame(...)
The DuckDuckGo app also uses "is" for functions that return boolean, and for boolean variables:
val isFullScreen: Boolean = false
isAssignableFrom(...)
In OpenTK, a C# Graphics Library, I also found usage of "are":
AreTexturesResident(...)
AreProgramsResident(...)
In the same OpenTK Library, they use "is" in the singular for functions that return boolean, and for boolean variables:
IsEnabledGenlock(...)
bool isControl = false;
Either usage could work. Using "are" for plurals would make more sense grammatically, and using "is" throughout could make more sense for efficiency or for simplifying Boolean functions.
Here's what I would do, assuming you are trying to avoid calling this function on each shape.
void init_each_shape();
bool is_each_shape_initialized();
Also assuming that you need these functions, it seems like it would make more sense to have the functions throw an exception if they do not succeed.

.bind vs string interpolation in aurelia

In our code base we have a mixture of the following:
attribute="${something}", attribute="${something | converter}", etc.
attribute.bind="something", attribute.bind="something | converter"
I find the latter easier to read.
The examples I'm referring to are exactly like the above; i.e., they do not add any additional string content.
I think that it's easier on Aurelia too. Am I correct?
Also, for these specific cases where no actual interpolation is involved, is there any benefit to the first form? (other than it being two characters less to type)
Given the examples you have shown, I would recommend using option 2. It really isn't "easier on Aurelia," but it is more explicit that you are binding the value of that attribute to the property listed.
Original Answer Below
The benefit of the first option is when you have, for example, an attribute that accepts many values but as a single string. The most common example of this is the class attribute. The class attribute accepts multiple classes in a space-separated list:
<div class="foo bar baz"></div>
Imagine we only want to add or remove the class baz from this list, based on a property someProp on our VM, while leaving the other classes alone. To do this using the .bind syntax, we would have to create a property on our VM that contains the full list but adds or removes baz depending on the value of someProp. But using the string-interpolated binding, this becomes much simpler:
<div class="foo bar ${someProp ? 'baz' : ''}"></div>
You can imagine how this could be extended with multiple classes being added or removed. You could maybe create a value converter to do this using the .bind syntax, but it might end up with something that wasn't as readable.
I could imagine a value converter being created that might look something like this in use:
<div class.bind="someProp | toggleClass:'baz':'foo':'bar'"></div>
I really think this is much less readable than using the string interpolation syntax.
By the way, the value converter I imagined above would look like this:
export class ToggleClassValueConverter {
  toView(value, toggledClass, ...otherProps) {
    return `${otherProps.join(' ')} ${value ? toggledClass : ''}`;
  }
}
The best part is that I'm still using string interpolation in the value converter :-)
After wading through the tabs I'd already opened, I found this. Although it's not quite the same thing, and it's a bit old, a similar thing is discussed at https://github.com/aurelia/templating-binding/issues/24#issuecomment-168112829 by Mr Danyow (emphasis mine):
yep, the symbol for binding behaviors is & (as opposed to | for value converters).
<input type="text" data-original="${name & oneTime}" value.bind="name" />
Here's the standard way to write a one-time binding. This will be a bit more light-weight in terms of parsing and binding:
<input type="text" data-original.one-time="name" value.bind="name" />
I don't know if it applies to the .bind/${name} case as well as the oneTime one in the example, but perhaps if it comes to his attention he can say either way.
Given this isn't a cut-and-dried answer, I'll be marking Ashley's as the answer, as it confirms the legibility question and provides useful information on other use cases, should anyone else search on similar terms.

Write a compiler for a language that looks ahead and multiple files?

In my language I can use a class variable in my method even when the definition appears below the method. It can also call methods defined below my method, and so on. There are no 'headers'. Take this C# example:
class A
{
    public void callMethods() { print(); B b = new B(); b.notYetSeen(); }
    public void print() { Console.Write("v = {0}", v); }
    int v = 9;
}
class B
{
    public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}
How should I compile that? What I was thinking is:
pass 1: convert everything to an AST
pass 2: go through all classes and build a list of defined classes/variables/etc.
pass 3: go through the code, check for errors such as undefined variables or wrong usage, and create my output
But it seems like for this to work I have to do passes 1 and 2 for ALL files before doing pass 3. Also it feels like a lot of work to do before I find an error (other than the obvious ones that can be caught at parse time, such as forgetting to close a brace or writing 0xLETTERS instead of a hex value). My gut says there is some other way.
Note: I am using bison/flex to generate my compiler.
My understanding of languages that handle forward references is that they typically just use the first pass to build a list of valid names. Something along the lines of just putting an entry in a table (without filling out the definition) so you have something to point to later when you do your real pass to generate the definitions.
If you try to actually build full definitions as you go, you would end up having to rescan repeatedly, each time saving any references to undefined things until the next pass. Even that would fail if there are circular references.
I would go through on pass one and collect all of your class/method/field names and types, ignoring the method bodies. Then in pass two check the method bodies only.
I don't know that there can be any other way than traversing all the files in the source.
I think that you can get it down to two passes - on the first pass, build the AST, and whenever you find a variable name, add it to a list that contains that block's symbols (it would probably be useful to add that list to the corresponding scope in the tree). Step two is to linearly traverse the tree and make sure that each symbol used references a symbol in that scope or a scope above it.
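A minimal sketch of that two-pass idea, in C# to match the question's example (the type and member names here are mine, not the poster's):
using System.Collections.Generic;

class Scope
{
    readonly Scope parent;
    readonly HashSet<string> names = new HashSet<string>();

    public Scope(Scope parent) { this.parent = parent; }

    // Pass 1 calls this for every class/field/method declaration it sees.
    public void Declare(string name) { names.Add(name); }

    // Pass 2 calls this for every identifier use; a name is valid if it is
    // declared in this scope or in any enclosing one.
    public bool IsVisible(string name) =>
        names.Contains(name) || (parent != null && parent.IsVisible(name));
}
Because pass 1 has already visited every file, pass 2 resolves forward references like b.notYetSeen() without any rescanning.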
My description is oversimplified but the basic answer is -- lookahead requires at least two passes.
The usual approach is to save B as "unknown". It's probably some kind of type (because of the place where you encountered it). So you can just reserve the memory (a pointer) for it even though you have no idea what it really is.
For the method call, you can't do much. In a dynamic language, you'd just save the name of the method somewhere and check whether it exists at runtime. In a static language, you can save it under "unknown methods" somewhere in your compiler, along with the unknown type B. Since method calls eventually translate to a memory address, you can again reserve the memory.
Then, when you encounter B and the method, you can clear up your unknowns. Since you know a bit about them, you can say whether they behave like they should or if the first usage is now a syntax error.
So you don't have to read all the files twice, but doing so surely makes things simpler.
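A sketch of that bookkeeping (the types and names are made up for illustration): every use of a still-unknown name registers a callback, and the callback fires as soon as the declaration turns up.
using System;
using System.Collections.Generic;

class Symbol { public string Name; /* type info, reserved address, ... */ }

class Resolver
{
    readonly Dictionary<string, Symbol> known = new Dictionary<string, Symbol>();
    readonly Dictionary<string, List<Action<Symbol>>> pending = new Dictionary<string, List<Action<Symbol>>>();

    // Called at every use site; patches immediately if the name is known,
    // otherwise defers until Declare() sees the definition.
    public void Use(string name, Action<Symbol> patch)
    {
        if (known.TryGetValue(name, out var sym)) { patch(sym); return; }
        if (!pending.TryGetValue(name, out var list)) pending[name] = list = new List<Action<Symbol>>();
        list.Add(patch);
    }

    // Called at every declaration; flushes any deferred uses of that name.
    public void Declare(Symbol sym)
    {
        known[sym.Name] = sym;
        if (pending.TryGetValue(sym.Name, out var waiting))
        {
            pending.Remove(sym.Name);
            waiting.ForEach(patch => patch(sym));
        }
    }
}
Whatever is still sitting in pending once all files have been read is a genuine "undefined name" error.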
Alternatively, you can generate these header files as you encounter the sources and save them somewhere where you can find them again. This way, you can speed up the compilation (since you won't have to consider unchanged files in the next compilation run).
Lastly, if you write a new language, you shouldn't use bison and flex anymore. There are much better tools by now. ANTLR, for example, can produce a parser that can recover after an error, so you can still parse the whole file. Or check this Wikipedia article for more options.

Writing a TemplateLanguage/VewEngine

Aside from getting any real work done, I have an itch. My itch is to write a view engine that closely mimics a template system from another language (Template Toolkit/Perl). This is one of those "if I had time" / "do it to learn something new" kinds of projects.
I've spent time looking at Coco/R and ANTLR, and honestly, it makes my brain hurt, but some of Coco/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, but none seem to cover how to create a processor for templates.
Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.
Are there any good beginner resources out there for this kind of thing? I've taken a gander at Spark, which didn't appear to have the grammar in the repo.
Maybe that is overkill, and one could just text-replace the template syntax with C# in the file and compile it. http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2
If you were in my shoes and weren't a language creating expert, where would you start?
The Spark grammar is implemented with a kind-of-fluent domain specific language.
It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.
The markup rules refer to a limited subset of C# syntax rules declared in CodeGrammar.cs - those are a subset because Spark only needs to recognize enough C# to adjust single-quotes around strings to double-quotes, match curly braces, etc.
The individual rules are of type ParseAction<TValue>, a delegate which accepts a Position and returns a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.
That isn't very useful on its own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
The technique of using a delegate as a parse action came from Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.
It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.
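As a rough illustration of the combinator idea (simplified types of my own, not Spark's actual API):
using System;

// A parse action consumes input at a position and either fails (null)
// or yields a value plus the position just past what it consumed.
delegate ParseResult<T> ParseAction<T>(string input, int pos);

class ParseResult<T>
{
    public T Value;
    public int NewPos;
}

static class Combinators
{
    // Primitive action: match one literal character.
    public static ParseAction<char> Ch(char c) =>
        (input, pos) => pos < input.Length && input[pos] == c
            ? new ParseResult<char> { Value = c, NewPos = pos + 1 }
            : null;

    // Operator: run 'a', then run 'b' where 'a' stopped; fail if either fails.
    public static ParseAction<(A, B)> Then<A, B>(this ParseAction<A> a, ParseAction<B> b) =>
        (input, pos) =>
        {
            var ra = a(input, pos);
            if (ra == null) return null;
            var rb = b(input, ra.NewPos);
            if (rb == null) return null;
            return new ParseResult<(A, B)> { Value = (ra.Value, rb.Value), NewPos = rb.NewPos };
        };
}
A handful of such operators (choice, repetition, etc.) is enough to compose grammar rules the way the Spark sources do.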
Step 1. Use regular expressions (regexp substitution) to split your input template string into a token list, for example, split
hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!
to
write('hel<b>lo')
if('foo')
write('bar is')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
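A rough C# version of that splitting step (Tokenize is a hypothetical helper assuming the [..] tag syntax from the example; Regex.Split returns the captured directive texts because of the capturing parentheses):
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<(string kind, string text)> Tokenize(string template)
{
    var tokens = new List<(string, string)>();
    // Even indices are literal text, odd indices are what was inside [...].
    string[] parts = Regex.Split(template, @"\[(.*?)\]");
    for (int i = 0; i < parts.Length; i++)
    {
        if (i % 2 == 0) { if (parts[i].Length > 0) tokens.Add(("write", parts[i])); }
        else tokens.Add(("directive", parts[i])); // "if foo", "bar", "else", "end"
    }
    return tokens;
}
The "directive" tokens then only need a trivial split on the first space to separate if/else/end keywords from plain variable substitutions.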
Step 2. Convert your token list to a syntax tree:
Sequence
  Write('hel<b>lo')
  If('foo')
    then: Sequence
      Write('bar is')
      Substitute('bar')
      Write('.')
    else: Write('baz')
  Write('world</b>!')
class Instruction {
}
class Write : Instruction {
    public string text;
}
class Substitute : Instruction {
    public string varname;
}
class Sequence : Instruction {
    public Instruction[] items;
}
class If : Instruction {
    public string condition;
    public Instruction then;
    public Instruction elseBranch;  // "else" itself is a reserved word in C#
}
Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
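For instance, a sketch of such an interpreter over the classes above (the truthiness rule for conditions is my assumption):
using System;
using System.Collections.Generic;
using System.Text;

static class Interpreter
{
    // Walk the tree recursively, appending rendered output as we go.
    public static void Execute(Instruction ins, IDictionary<string, string> vars, StringBuilder output)
    {
        switch (ins)
        {
            case Write w: output.Append(w.text); break;
            case Substitute s: output.Append(vars[s.varname]); break;
            case Sequence q: foreach (var item in q.items) Execute(item, vars, output); break;
            case If f:
                // Assumed rule: a condition holds if the variable exists and is non-empty.
                bool truthy = vars.TryGetValue(f.condition, out var v) && v.Length > 0;
                var branch = truthy ? f.then : f.elseBranch;
                if (branch != null) Execute(branch, vars, output);
                break;
            default: throw new InvalidOperationException("unknown instruction");
        }
    }
}
Running Execute on the tree from step 2 renders the then branch (substituting the value of "bar") when "foo" is present and non-empty, and renders "baz" otherwise.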
Another, alternative approach (instead of steps 1--3) if your language supports eval() (such as Perl, Python, Ruby): use a regexp substitution to convert the template to an eval()-able string in the host language, and run eval() to instantiate the template.
There are sooo many things to do, but it does work on a simple GET statement plus a test. That's a start.
http://github.com/claco/tt.net/
In the end, I already had too much time in ANTLR to give loudejs' method a go. I wanted to spend a little more time on the whole process rather than the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.
Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.
If it's not what you're looking for, then you may get some ideas by looking at the source code.