i18next: is there a maximum key length, or are special characters not allowed in keys?

I am trying to use the default-language value itself as the key in i18next, to keep things simple.
Sometimes the messages are long, and sometimes they contain special characters such as single quotes, double quotes or apostrophes.
Even though the resource bundle contains this key/value pair, i18next still logs the key as missing.
This happens only for long text or for keys containing the special characters mentioned above.

I verified that only the key separator and the namespace (ns) separator have to be replaced in the keys. Everything else looks fine.
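The effect of the separators can be illustrated with a small sketch. This is not i18next's code: the lookup function and the sample resources are hypothetical, and Python is used only to model how a key containing the key separator (a dot by default in i18next) is split into path segments and then fails to resolve.

```python
def lookup(resources, key, key_separator="."):
    """Resolve key in a nested dict, splitting on key_separator unless it is None."""
    parts = key.split(key_separator) if key_separator else [key]
    node = resources
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return None  # this is the case that would be logged as a missing key
        node = node[part]
    return node


# Hypothetical resource bundle that uses the English text itself as the key.
resources = {
    "Can't save. Please try again.": "Speichern nicht möglich. Bitte erneut versuchen."
}
key = "Can't save. Please try again."

with_separator = lookup(resources, key)           # key is split at each "."
without_separator = lookup(resources, key, None)  # key is treated as one flat string
```

In i18next itself the corresponding init options are keySeparator and nsSeparator; setting both to false makes it treat keys as flat strings, so the sentence punctuation no longer matters.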

Related

REGEXP_REPLACE explanation

Hi, may I know what the query below means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function call is doing:
It is a function call that analyses an input string 'number' with a regex (2nd argument) and replaces every part of the string that matches with a given string (3rd argument). As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless SQL uses a different syntax, this would appear to be a non-functional regex. There are some red flags, e.g.:
The entire regex is wrapped in square brackets, indicating a set of characters, but it seems to predominantly hold an expression
There is a range indicator between a single quote and a character, which is an invalid range. If a literal dash was required in the match, it should be escaped with a backslash ('\')
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but it does not match anything I can think of. It is important to know what string is being examined, and the context of the program, in order to identify what the regex might be attempting to do.
It seems to be meant to replace every ASCII control character in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, hence all characters match, that are not in the class.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seems that characters enclosed in ' are meant to be escaped (the second ' escapes the ' within the SQL string literal itself). At least this would make some sense. However, for none of the DBMSs I found that have a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL) did I find anything in the docs that would indicate this escape mechanism. They all use \, but maybe I missed something. Unfortunately you didn't tag which DBMS you're actually using!
Now if you take an ASCII table you'll see that the ranges in the expression make up all printable characters (counting space as printable), in groups from space to /, 0 to 9, : to #, etc. Actually it might have been shorter to express it as '' ''-~, i.e. space to ~.
Given the negation, all these don't match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.
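A quick way to check this reading is to rebuild the character class outside SQL. The sketch below uses Python's re module rather than any DBMS, drops the doubled single quotes, and assumes the third range is meant to run from : to @ (an assumption: a range that ends before it starts would be rejected by most regex engines). The names NON_PRINTABLE and mark_control_chars are made up for the illustration.

```python
import re

# Negated class of the printable ASCII groups: space to /, 0-9, : to @ (assumed),
# A-Z, [ to `, a-z, { to ~. Anything outside these groups matches.
NON_PRINTABLE = re.compile(r"[^ -/0-9:-@A-Z\[-`a-z{-~]")


def mark_control_chars(value, replacement="xy"):
    """Replace every character outside printable ASCII, one by one."""
    return NON_PRINTABLE.sub(replacement, value)


cleaned = mark_control_chars("12\t34\x7f")  # tab and DEL are each replaced
```

Note that in this form the class also matches any character above ~, so non-ASCII input would be replaced as well; how a particular DBMS treats such characters is not covered here.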

Character-set of SSH keys (safe delimiter for using sed with public keys)

I am using sed to replace a placeholder in a script with my public SSH key. The character / is definitely present in some SSH keys, so how can I find out which character I can use as a delimiter for sed?
I am looking for an answer of either the set of all characters that can be part of the string generated by ssh-keygen, or which characters are guaranteed not to.
The public key in OpenSSH format is base64-encoded, as mentioned for example in the manual page for sshd. Therefore you can use any character that is not in the list of base64 characters. The / is in that list, but | for example can be used safely (though the comment field can contain anything).
For information, from the info sed, section 3.5:
The '/' characters may be uniformly replaced by any other single character within any given 's' command.
The '/' character (or whatever other character is used in its stead) can
appear in the REGEXP or REPLACEMENT only if it is preceded by a '\'
character.
So you can choose any suitable character that doesn't appear in your input data.
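Because the comment field is free-form, it may be safer to check the actual input than to rely on the base64 alphabet alone. The following is a minimal sketch of that check; the function names, the candidate list and the sample key line are all hypothetical placeholders.

```python
def pick_sed_delimiter(text, candidates="|#,:%@"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("every candidate delimiter occurs in the input")


def sed_expression(placeholder, replacement):
    """Build an s command using a delimiter absent from both arguments.

    Assumption: the replacement contains no '&' or backslash, which sed
    would still treat specially regardless of the delimiter.
    """
    d = pick_sed_delimiter(placeholder + replacement)
    return "s{0}{1}{0}{2}{0}".format(d, placeholder, replacement)


# Placeholder line, not a real key: base64-like body plus a comment field.
sample_key = "ssh-ed25519 AAAAexample/placeholder+body== user@example"
expression = sed_expression("__PUBKEY__", sample_key)
```

The resulting string can be passed to sed as a single argument, for example through a subprocess call with an argument list, so that no shell quoting is involved.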

Redis: acceptable characters for values

I am aware of the naming conventions for Redis keys (this is a great link here Naming Convention and Valid Characters for a Redis Key ) but what of the values? Will I have an issue if my values include characters such as &^*$#+{ ?
From http://redis.io/topics/data-types:
Redis Strings are binary safe, this means that a Redis string can contain any kind of data, for instance a JPEG image or a serialized Ruby object.
A String value can be at max 512 Megabytes in length.
So those chars you've specified will be fine, as will any other data.
@Ruan is not exactly covering the whole story. I have looked closely at that section of the Redis docs and it doesn't cover special characters.
For example, when typing a command in redis-cli you will need to escape double quotes " with a preceding backslash \" in your key.
Also, if you do have special characters in your key, e.g. spaces or single or double quotes, you will need to wrap your whole key in double quotes.
The following keys are valid and you can use them to start understanding how special characters are handled.
The following allows spaces in your key.
set "users:100:John Doe" 1234
The following allows special characters by escaping them.
set "metadata:2:moniker\"#\"" 1234

Which Unicode characters are "composing" characters (whose sole purpose is to add an accent or tilde)?

This is related to
What are the characters that count as the same character under collation of UTF8 Unicode? And what VB.net function can be used to merge them?
This is how I plan to do this:
Use http://msdn.microsoft.com/en-us/library/dd374126%28v=vs.85%29.aspx to turn the string into KD form (compatibility decomposition).
Basically it will turn most variations, such as superscript digits, into the normal digit. It also decomposes characters with a tilde or accent into two characters: the base character and the mark.
The next step would be to remove all characters whose sole purpose is to add a tilde or accent to another character.
How do I know which characters are like that? Which characters are just "composing characters"?
How do I find such characters? After I find them, how do I get rid of them? Should I scan character by character and remove all such "combining characters"?
For example:
Characters from 300 to 362 can be gotten rid of.
Then what?
Combining characters are listed in UnicodeData.txt as having a nonzero Canonical_Combining_Class, and a General_Category of Mn (Mark, nonspacing).
For each character in the string, call GetUnicodeCategory and check the UnicodeCategory for NonSpacingMark, SpacingCombiningMark or EnclosingMark.
You may be able to do it more efficiently using a regex, e.g. Regex.Replace(str, "\p{M}", "").
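The same two steps (decompose to KD form, then drop every mark) can be sketched in Python with the standard unicodedata module. This mirrors the approach rather than the .NET API mentioned above; strip_combining_marks is a name made up for the example.

```python
import unicodedata


def strip_combining_marks(text):
    """Normalize to NFKD, then drop characters whose general category is a mark.

    Categories Mn, Mc and Me correspond to NonSpacingMark,
    SpacingCombiningMark and EnclosingMark.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


plain_name = strip_combining_marks("Pérez")   # accent removed
plain_power = strip_combining_marks("x²")     # superscript folded to a digit
```

Filtering by category rather than by a fixed code point range avoids having to know in advance which blocks contain combining marks.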

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results, or at least nothing I can use: while the results contain both Perez and Pérez occurrences, I also get some results that apparently have nothing to do with the query.
In Spanish this happens quite a lot. Be it laziness or whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents, because it's assumed that all these searches work with both options (I guess since Google allows it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
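As a rough sketch of the second-value idea, the stored values and the OR filter could be generated as follows. Everything here is illustrative: the function names are hypothetical, the {stripped} prefix is simply the marker from the sample object, no LDAP library is involved, and escaping of filter metacharacters is deliberately left out.

```python
import unicodedata

STRIPPED_PREFIX = "{stripped}"  # marker taken from the sample object


def strip_accents(name):
    """Decompose the name and drop nonspacing marks such as accents."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def cn_values(name):
    """Values to store on the multi-valued cn attribute for one person."""
    return [name, STRIPPED_PREFIX + strip_accents(name)]


def build_filter(user_input):
    """OR filter matching the input as typed or its stripped form."""
    stripped = strip_accents(user_input)
    return "(|(cn=%s)(cn=%s%s))" % (user_input, STRIPPED_PREFIX, stripped)


stored = cn_values("Pérez")
query = build_filter("Pérez")
```

Since the input is stripped as well, a user typing Perez ends up with a filter whose second branch still matches the stored {stripped}Perez value. One design caveat: this also maps ñ to n, which may or may not be acceptable for Spanish names.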
Search filters ("queries") are specified by RFC2254.
Encoding: RFC2254 actually requires filters (indirectly defined) to be an OCTET STRING, i.e. a string of 8-bit bytes: AttributeValue is OCTET STRING, MatchingRuleId and AttributeDescription are LDAPString, and LDAPString is an OCTET STRING.
The standard on escaping: use "\<ASCII HEX NUMBER>" to replace special characters
(https://www.rfc-editor.org/rfc/rfc4515#page-4, examples at https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".
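The escaping rule quoted from RFC 4515 can be sketched in a few lines. This is a minimal illustration, not a library API: escape_filter_value is a hypothetical helper that handles only the five characters named in the quote and leaves everything else, including non-ASCII characters such as é, untouched.

```python
def escape_filter_value(value):
    """Escape *, (, ), backslash and NUL as a backslash plus two hex digits."""
    escaped = []
    for ch in value:
        if ch in "*()\\\x00":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "".join(escaped)


# User input that would otherwise change the structure of the filter.
safe_value = escape_filter_value("Pérez (Jr.)*")
search_filter = "(cn=%s)" % safe_value
```

Most LDAP client libraries ship an equivalent helper, which is usually preferable to a hand-written one; the sketch only shows what such a helper has to do.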