Can you modify character escaping in spring-data-solr? - spring-data-solr

We store a path in our schema. It is slash-delimited and also starts with a slash. According to this post, Solr has interpreted a slash at the beginning of a query term as the start of a regular expression since version 4.0, which means that we need to escape the slash.
But SolrTemplate.queryForPage(Query, Class) does not escape the slash, so the natural solution is to use QueryParser.escape(searchTerm), as suggested in the link above.
However, this adds a backslash to escape the slash, and that backslash is then escaped by SolrTemplate, resulting in an escaped backslash in the query, which gives no results.
I'll add a couple of examples for clarity:
Query without any escaping:
q=paths:/myrootpath&start=0&rows=10
Query with manual escaping of the slash (QueryParser.escape(String)):
q=paths:\\/myrootpath&start=0&rows=10
The query I need:
q=paths:\/myrootpath&start=0&rows=10
I'm not sure if this is a bug or intended since, as far as I know, Solr versions before 4.0 didn't need slashes escaped.
So is there a way in spring-data-solr to either disable character escaping for a query or change which characters are escaped?
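To make the double escaping concrete, here is a minimal Java sketch. Both helpers are hypothetical stand-ins, not the real Lucene or spring-data-solr code: manualEscape mimics an escaper that includes the slash, and templateEscape mimics one that escapes the backslash but not the slash, which is the behaviour the example queries above suggest. The character lists are illustrative only.

```java
public class DoubleEscapeDemo {
    // Illustrative subset of query metacharacters (an assumption, not Solr's real list).
    private static final String TEMPLATE_SPECIALS = "\\+-!():^[]\"{}~*?|&";
    private static final String MANUAL_SPECIALS = TEMPLATE_SPECIALS + "/";

    // Prefix every character found in `specials` with a backslash.
    private static String escape(String s, String specials) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (specials.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Stand-in for manual escaping that also covers the slash.
    public static String manualEscape(String s) {
        return escape(s, MANUAL_SPECIALS);
    }

    // Stand-in for the template's own escaping, which leaves the slash alone.
    public static String templateEscape(String s) {
        return escape(s, TEMPLATE_SPECIALS);
    }

    public static void main(String[] args) {
        String manual = manualEscape("/myrootpath");
        System.out.println(manual);                 // prints \/myrootpath
        System.out.println(templateEscape(manual)); // prints \\/myrootpath
    }
}
```

The second line of output corresponds to the query that returns no results: the backslash added by hand has itself been escaped.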

I like the idea of using a converter. We can add a new class, ContentPath:
@AllArgsConstructor
@ToString
@Getter
public class ContentPath {
private String path;
}
Then the new converter needs to be registered when the SolrTemplate is created:
SolrTemplate solrTemplate = ...
DefaultQueryParser defaultQueryParser = new DefaultQueryParser();
defaultQueryParser.registerConverter(new Converter<ContentPath, String>() {
@Override
public String convert(ContentPath o) {
return escape(o.getPath());
}
});
solrTemplate.registerQueryParser(Query.class, defaultQueryParser);
The last change is made where the query is defined:
new SimpleQuery((new Criteria(IDENTIFIER)).is(new ContentPath(id)));


How to ignore unregistered namespaces in i18next (colons in strings)

We use i18next in a CMS to power its internationalization features. Since developers can build however they want with the CMS there is opportunity for them to add l10n keys that include colons, including as part of HTML, such as Find more info here.
As has been documented, with default namespace separator settings i18next will think the colon is identifying a namespace/key pair. Since the CMS uses its own namespace (so devs won't accidentally overwrite UI strings), we don't have the option to turn off namespacing completely (with nsSeparator: false).
What I'm looking for is a way for i18next to only recognize registered namespaces as namespaces. So if we tell i18next that the valid namespaces are ['ns1', 'ns2'] and it receives Title: Subtitle, that string will be treated as a key, not a namespace/key pair.
I saw the loadNamespaces method, but that looks to simply register them to the ns option on the instance. Is there a way for i18next to essentially disallow any unregistered namespace?
This is definitely a workaround, but it doesn't feel too hacky, so I'm comfortable using it indefinitely. It's been several weeks, so I don't remember whether I got this from somewhere else; it's possible, and I mention it so as not to take undeserved credit.
First, I added the appendNamespaceToMissingKey: true option to the init function. This adds the default namespace to any keys that don't already have a namespace. It also includes an unknown namespace (or something it thinks is a NS) with the "missing key" for parsing.
Then I added the parseMissingKeyHandler option assigned to the following function:
function (key) {
if (key.startsWith(`${this.defaultNS[0]}:`)) {
return key.slice(this.defaultNS[0].length + 1);
} else {
return key;
}
}
Since I didn't have to hard-code the namespace I felt okay with that.
So if a "key" comes in here with the default namespace, which will include all unlocalized strings without colons (e.g., 'Some text'), that namespace is removed and the string continues on normally. Since i18next doesn't have a value for that string, the string is printed as-is.
If an unlocalized string comes in containing a colon (e.g., '🏛'), i18next thinks the part before the colon is a namespace, so the default is not applied. Therefore this string is returned from the function exactly as it entered. Again, since i18next doesn't have a value for this string, the string is printed as-is. In this case, that includes the part before the colon as well as the colon separator. We end up with the full link HTML, for example.
So in addition to the other options in place, it looks like:
i18next.init({
...otherOptions,
appendNamespaceToMissingKey: true,
parseMissingKeyHandler (key) {
// We include namespaces with unrecognized l10n keys using
// `appendNamespaceToMissingKey: true`. This passes strings containing
// colons that were never meant to be localized through to the UI.
//
// Strings that do not include colons ("Content area") are given the
// default namespace by i18next ("translation," by default). Here we
// check if the key starts with that default namespace, meaning it
// belongs to no other registered namespace, then remove that default
// namespace before passing this through to be processed and displayed.
if (key.startsWith(`${this.defaultNS[0]}:`)) {
return key.slice(this.defaultNS[0].length + 1);
} else {
return key;
}
}
});
I'm including my code comment since you may also want to include something like this to remind yourself later why you include this convoluted handler.
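For readers who want to see the handler's effect on concrete inputs, here is a language-neutral restatement of the same logic as a small Java sketch. The class and method names are hypothetical, and "translation" is used because it is i18next's default namespace as noted in the comment above; it only mirrors the string handling and does not involve i18next itself.

```java
public class MissingKeyHandler {
    // Strip the default namespace prefix (for example "translation:") from a
    // missing key; keys carrying any other prefix are returned unchanged.
    public static String parseMissingKey(String key, String defaultNamespace) {
        String prefix = defaultNamespace + ":";
        return key.startsWith(prefix) ? key.substring(prefix.length()) : key;
    }

    public static void main(String[] args) {
        // A plain string had the default namespace prepended, so it is stripped again.
        System.out.println(parseMissingKey("translation:Some text", "translation"));
        // A string with its own colon is passed through untouched.
        System.out.println(parseMissingKey("Title: Subtitle", "translation"));
    }
}
```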

WebStorm Live Template, separate a string of inputs

I want to create a Live Template for createSelector:
export const someSelector = createSelector(getThis, getThat, getSomethingElse, (this, that, somethingElse) =>
$END$
)
I can get it to work pretty well with a single argument (e.g., only getThis which then results in (this) in the arrow function args).
// template text
createSelector($someSelector$, ($variable$) => $END$)
// variables
// expression for "variable":
decapitalize(regularExpression(someSelector, "get", ""))
This works correctly with a single argument as mentioned above, and almost works correctly with multiple arguments, except for the capitalization:
createSelector(getThis, getThat, getSomethingElse, (this, That, SomethingElse) => /* $end$ */)
I tried wrapping that whole thing in camelCase but then of course the commas and spaces are gone.
The issue is clearly that I'm processing the whole string at once so the whole string is run through whatever string formatting function. There doesn't appear to be any way to treat individual instances of "get" separately.
I tried capture groups which I really thought would work:
decapitalize(regularExpression(someSelector, "get(\w+)", "$1"))
But that doesn't replace anything, it just copies the whole thing:
createSelector(getThis, getThat, (getThis, getThat) => )
Is there any way to accomplish this?
UPDATE:
I even learned Groovy and wrote the following script, which works in a Groovy playground but produces the same result in WebStorm as my final example above!
groovyScript("return _1.replaceAll(/get(\w+)/) { it[1].uncapitalize() };", someSelector)
This could be done with a regex, but Java does not seem to support the \l replacement modifier (which would be used as \l$1 instead of $1 in your initial regularExpression() code).
Live example (works in PCRE2, e.g. in PHP): https://regex101.com/r/6faVqC/1
Docs on replacement modifiers: https://www.regular-expressions.info/refreplacecase.html
In any case, this whole thing is handled by Java, and you are passing the regex pattern or Groovy script code inside double quotes. Therefore any \ characters need to be escaped.
You need to replace get(\w+) by get(\\w+).
The following seems to work just fine for me here (where someSelector is the Live Template variable):
groovyScript("return _1.replaceAll(/get(\\w+)/) { it[1].uncapitalize() };", someSelector)
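The same transformation can be sketched in plain Java, which also shows why the doubled backslash is needed inside a quoted string. This is only an illustration of the regex logic, not something WebStorm runs; the class and method names are made up, and Matcher.replaceAll with a function argument requires Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectorArgs {
    // Inside a Java string literal the regex \w has to be written as \\w.
    private static final Pattern GETTER = Pattern.compile("get(\\w+)");

    // Turn "getThis, getThat" into "this, that" by lower-casing the first
    // letter of each captured name individually.
    public static String toArgs(String selectors) {
        return GETTER.matcher(selectors).replaceAll(match -> {
            String name = match.group(1);
            String lowered = Character.toLowerCase(name.charAt(0)) + name.substring(1);
            return Matcher.quoteReplacement(lowered);
        });
    }

    public static void main(String[] args) {
        // prints: this, that, somethingElse
        System.out.println(toArgs("getThis, getThat, getSomethingElse"));
    }
}
```

Because the replacement function is called once per match, each name is decapitalized on its own, which is what the single-pass decapitalize(...) call could not do.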

Which rule does the string match?

I'm using the Java syntax defined at https://github.com/antlr/grammars-v4/tree/master/java/java
My users are free to input anything, for example
assert image != null;
,
public Color[][] smooth(Color[][] image, int neighberhoodSize)
{
...
}
,
package myapplication.mylibrary;
, and
import static java.lang.System.out; //'out' is a static field in java.lang.System
import static screen.ColorName.*;
My program should tell which syntax the input matches.
What I have up to now is
var stream = CharStreams.fromString(input);
ITokenSource lexer = new JavaLexer(stream);
ITokenStream tokens = new CommonTokenStream(lexer);
Parser parser = new JavaParser(tokens);
parser.ErrorHandler = new BailErrorStrategy();
try
{
var tree = parser.statement();
Console.WriteLine("The input is a statement");
}
catch (Exception e)
{
Console.WriteLine("The input is not a statement");
}
Is there a better way to check whether the input matches any of the 100 rules?
No, there's no other way than trial-and-error. Note that your generated parser has the property:
public static final String[] ruleNames
which you can use in combination with reflection to call all parser rules automatically instead of trying them manually.
Also, trying parser.statement() might not be enough: the input String s = "mu"; FUBAR could be properly parsed by parser.statement() and leave the trailing Identifier (FUBAR) in the token stream. After all, the statement rule probably does not end with an EOF token forcing the parser to consume all tokens. You'll probably have to manually check if all tokens are consumed before determining the input was successfully parsed by a certain parser rule. Also see this Q&A: How to test ANTLR translation without adding EOF to every rule
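The trial-and-error loop over ruleNames can be sketched as follows. To keep the example self-contained, ToyParser is a hypothetical stand-in for a generated ANTLR parser: it exposes a ruleNames array and one public no-argument method per rule, and each toy rule uses a full-string regex match, which plays the role of the "all tokens consumed" check described above. With a real generated parser you would construct the parser from a token stream and verify that the next token is EOF after each attempt.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class RuleProbe {
    // Hypothetical stand-in for a generated parser; not ANTLR code.
    public static class ToyParser {
        public static final String[] ruleNames =
                {"packageDeclaration", "importDeclaration", "assertStatement"};
        private final String input;

        public ToyParser(String input) { this.input = input.trim(); }

        // Each toy rule throws when the whole input does not match.
        public void packageDeclaration() { require(input.matches("package\\s+[\\w.]+;")); }
        public void importDeclaration() { require(input.matches("import\\s+(static\\s+)?[\\w.*]+;")); }
        public void assertStatement() { require(input.matches("assert\\s+.+;")); }

        private static void require(boolean ok) {
            if (!ok) throw new IllegalStateException("no match");
        }
    }

    // Try every rule by name via reflection and collect the ones that succeed.
    public static List<String> matchingRules(String input) {
        List<String> matches = new ArrayList<>();
        for (String rule : ToyParser.ruleNames) {
            try {
                Method method = ToyParser.class.getMethod(rule);
                method.invoke(new ToyParser(input));
                matches.add(rule);
            } catch (InvocationTargetException e) {
                // The rule itself threw, so this rule does not match the input.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot invoke rule " + rule, e);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(matchingRules("assert image != null;"));
        System.out.println(matchingRules("package myapplication.mylibrary;"));
    }
}
```

A fresh parser instance is created for each attempt so that a failed rule cannot leave state behind for the next one.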
Unless you really mean that your users can enter anything (and I suspect that, with some thought, that's not really the case), you could add a parser rule that includes an alternative for each construct your users could enter. You might have to take a little care with the order of the alternatives.
Since parser rules are evaluated by recursive descent, a new rule that isn't referenced by any other rule has no impact on the rest of the grammar.
Could be worth a shot.

Jackson YAML Parser Deleted Special characters

I have an issue when using ObjectMapper with YAMLFactory to parse a YAML file.
The YAML file I’m trying to parse : https://drive.google.com/open?id=1Q85OmjH-IAIkordikLTsC1oQVTg8ggc8
I parse the file using readValue, as shown here:
ObjectMapper mapper = new ObjectMapper(new YAMLFactory().enable(Feature.MINIMIZE_QUOTES)//
.disable(Feature.WRITE_DOC_START_MARKER)//
.disable(YAMLGenerator.Feature.SPLIT_LINES));
TypeReference<HashMap<String, Object>> typeRef = new TypeReference<HashMap<String, Object>>() {};
HashMap<String, Object> obj = mapper.readValue(responseBuffer.toString(), typeRef);
I then convert the object to JSON and back to YAML:
JsonElement jsonElem = wrapJacksonObject(obj);
String cloudTemplateJsonString = new GsonBuilder().disableHtmlEscaping().setPrettyPrinting()//
.create()//
.toJson(jsonElem);
JsonNode jsonNode = mapper.readTree(cloudTemplateJsonString);
String yaml = new YAMLMapper().enable(Feature.MINIMIZE_QUOTES)//
.disable(Feature.WRITE_DOC_START_MARKER)//
.writeValueAsString(jsonNode);
After checking the final string, I see that these special characters are changed or deleted (the change happens right after the second step):
a. Single quotes (') are converted to double quotes (") or deleted.
b. The exclamation mark (!): it and the whole string after it, up to the first space, are deleted.
Examples :
Version: !Join ['-', [!Ref GatewayVersion, GW]]
After Parsing
Version:
- '-'
- - GatewayVersion
- GW
Single quotes are also sometimes deleted or converted to double quotes:
AllowedPattern: '^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})$'
After parsing, the single quotes are deleted:
AllowedPattern: ^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/(\d{1,2})$
I tried customizing character escaping by implementing a custom CharacterEscapes class, but it didn't help.
In YAML, a value such as a string literal can be preceded by tokens carrying metadata about the node, known as node properties. Tokens beginning with a bang (!) are 'node tags', and tokens beginning with an ampersand (&) are 'node anchors'.
https://yaml.org/spec/1.2/spec.html#id2783797
JSON does not have an equivalent capability. Because Jackson is primarily a JSON parsing library, its internal representation of structured data nodes does not have fields for this metadata, and so its YAMLFactory parser implementation simply discards it.
Looking at your YAML file, I expect the intended consumer of the file (AWS CloudFormation, judging by the !Join and !Ref tags) would know how to use those node tags to construct an internal representation of the Version field.
Similarly, single or double quotes surrounding a text value are considered part of the markup (i.e., used by the parser) rather than part of the value. The parser therefore discards these characters after using them as guides on how to consume the value. Quotes (double or single) may or may not be added, as necessary, when an internal representation is serialized back into YAML or JSON.

How do I replace multiple characters in a String?

Like Java's replaceAll(regex:replacement:) function.
str.replaceAll("[$,.]", "") //java code
This answer is very close but I want to change more than one character at the same time.
[$,.] is regex, which is the expected input for Java's replaceAll() method. Kotlin, however, has a class called Regex, and string.replace() is overloaded to take either a String or a Regex argument.
So you have to call .toRegex() explicitly, otherwise it thinks you want to replace the String literal [$,.]. It's also worth mentioning that $ in Kotlin is used with String templates, meaning in regular strings you have to escape it using a backslash. Kotlin supports raw Strings (marked by three " instead of one) which don't need to have these escaped, meaning you can do this:
str = str.replace("""[$,.]""".toRegex(), "")
In general, you need a Regex object. Aside from using toRegex() (which may or may not be syntactic sugar), you can also create a Regex object by using the constructor for the class:
str = str.replace(Regex("""[$,.]"""), "")
Both of these signal that your string is a regex and make sure the right replace() overload is used.
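The same literal-versus-regex distinction exists on the Java side, where it is expressed through two differently named methods rather than overloads. A minimal sketch for comparison (the class and method names are made up for this example):

```java
public class ReplaceDemo {
    // replaceAll treats its first argument as a regex: the character class
    // [$,.] matches any single '$', ',' or '.'.
    public static String stripWithRegex(String s) {
        return s.replaceAll("[$,.]", "");
    }

    // replace treats its first argument as a literal string, so only the
    // exact five-character text "[$,.]" would be removed.
    public static String stripLiteral(String s) {
        return s.replace("[$,.]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWithRegex("$1,234.50")); // prints 123450
        System.out.println(stripLiteral("$1,234.50"));   // prints $1,234.50
    }
}
```

Kotlin's replace() given a plain String behaves like the second method, which is why the explicit Regex is required.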
If you're happy to work with regular expressions, then refer to the accepted answer here. If you're curious as to how you can achieve this without regular expressions, continue reading.
You can use the String.filterNot(predicate:) and Set.contains(element:) functions to define a String.removeAll extension function as follows:
/**
* @param charactersToRemove The characters to remove from the receiving String.
* @return A copy of the receiving String with the characters in `charactersToRemove` removed.
*/
fun String.removeAll(charactersToRemove: Set<Char>): String {
return filterNot { charactersToRemove.contains(it) }
}
You would call on this function as follows: myString.removeAll(setOf('$', '.'))