Redis: acceptable characters for values

I am aware of the naming conventions for Redis keys (the question Naming Convention and Valid Characters for a Redis Key is a great reference), but what about the values? Will I have an issue if my values include characters such as &^*$#+{ ?

From http://redis.io/topics/data-types:
Redis Strings are binary safe, this means that a Redis string can contain any kind of data, for instance a JPEG image or a serialized Ruby object.
A String value can be at max 512 Megabytes in length.
So those chars you've specified will be fine, as will any other data.
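One way to see why arbitrary bytes are safe: clients talk to the server using the Redis serialization protocol (RESP), in which every command argument is sent as a bulk string prefixed by its byte length, so nothing inside a value has to be escaped on the wire. The following is a minimal Python sketch of that framing. encode_command is a hypothetical helper written for this illustration, not part of any Redis client library.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings.

    encode_command is a hypothetical helper for illustration only;
    real client libraries do this framing for you.
    """
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Each argument is length-prefixed, so its content is never scanned
        # for delimiters: quotes, spaces, or even CRLF inside it are fine.
        out.append(b"$%d\r\n" % len(data))
        out.append(data + b"\r\n")
    return b"".join(out)


if __name__ == "__main__":
    print(encode_command("SET", "users:100:John Doe", '&^*$#+{ "quoted"'))
```

Quoting and escaping only come into play in tools that parse a typed command line, such as redis-cli.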

@Ruan's answer does not cover the whole story. I have looked closely at that section of the Redis docs and it doesn't cover special characters.
For example, when you type a command in redis-cli you will need to escape double quotes " with a preceding backslash \" in your key.
Also, if you have special characters in your key, e.g. spaces or single or double quotes, you will need to wrap the whole key in double quotes.
The following keys are valid and you can use them to start understanding how special characters are handled.
The following allows spaces in your key.
set "users:100:John Doe" 1234
The following allows special characters by escaping them.
set "metadata:2:moniker\"#\"" 1234

Related

i18next: is there a maximum key length? or special characters not allowed in key?

I am trying to use the default language value itself as key in my usage of i18next to keep it simple.
Sometimes the messages are long, and sometimes they have special characters such as single quotes, double quotes, or apostrophes.
Even though the resourceBundle has this key-value pair, i18next still logs it as a missing key.
This happens only for long text or for keys with the special characters mentioned above.
I verified that only the key separator and the namespace (ns) separator have to be replaced. Otherwise everything else looks good.
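To illustrate why the separators matter, here is a toy Python model of a separator-based lookup. It is not i18next's actual code, and the real behaviour depends on version and options; it only assumes i18next's documented defaults of ":" as the namespace separator and "." as the key separator. The names lookup and resources are made up for this sketch.

```python
def lookup(resources, key, ns_separator=":", key_separator="."):
    """Toy model of a separator-based translation lookup (not i18next's code)."""
    namespace = "translation"
    if ns_separator and ns_separator in key:
        # Everything before the first namespace separator is read as a namespace.
        namespace, key = key.split(ns_separator, 1)
    node = resources.get(namespace, {})
    # The remainder is split into a path of nested keys.
    path = key.split(key_separator) if key_separator else [key]
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return None  # reported as a missing key
        node = node[part]
    return node


if __name__ == "__main__":
    resources = {"translation": {"Hello. How are you?": "Hola. ¿Cómo estás?"}}
    # The "." in the sentence is treated as a path separator, so this misses.
    print(lookup(resources, "Hello. How are you?"))
    # With both separators disabled, the whole sentence is one flat key.
    print(lookup(resources, "Hello. How are you?", ns_separator=None, key_separator=None))
```

In i18next itself, the corresponding options are keySeparator and nsSeparator, which can be set to false or to characters that never occur in the text.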

Character-set of SSH keys (safe delimiter for using sed with public keys)

I am using sed to replace a placeholder in a script with my public ssh key. The character / is definitely present in some SSH keys, how can I find out which character I can use as delimiter for sed?
I am looking for an answer of either the set of all characters that can be part of the string generated by ssh-keygen, or which characters are guaranteed not to.
The public key in OpenSSH format is base64-encoded, as mentioned for example in the manual page for sshd. Therefore you can use any character that is not in the base64 alphabet. The / is in that alphabet, but | for example can be used safely (though the comment field can contain anything).
For reference, from info sed, section 3.5:
The '/' characters may be uniformly replaced by any other single character within any given 's' command.
The '/' character (or whatever other character is used in its stead) can
appear in the REGEXP or REPLACEMENT only if it is preceded by a '\'
character.
So you can choose any suitable character that doesn't appear in your input data.
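As a rough sketch of that idea, the Python snippet below picks the first candidate delimiter that does not occur in the input and builds the corresponding 's' command. The function names, the candidate list, and the key material are all invented for this example.

```python
def pick_delimiter(text, candidates="|#,;:!%~"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("every candidate delimiter occurs in the input")


def sed_substitution(placeholder, replacement):
    """Build an 's' command using a delimiter absent from both operands.

    Note: '&' and backslash are still special in a sed replacement, so this
    sketch is only safe when the replacement contains neither.
    """
    d = pick_delimiter(placeholder + replacement)
    return f"s{d}{placeholder}{d}{replacement}{d}"


if __name__ == "__main__":
    # Made-up key material, for illustration only.
    key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFakeKey/With+Slash user@host"
    print(sed_substitution("__PUBKEY__", key))
```

Checking the actual input, rather than relying on the base64 alphabet alone, also covers the free-form comment field.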

handling strings with \n in plain text e-mail

I have a column in my database that contains a string like this:
"Warning set for 7 days.\nCritical Notice - Last Time Machine backup was 118 days ago at 2012-11-16 20:40:52\nLast Time Machine Destination was FreeAgent GoFlex Drive\n\nDefined Destinations:\nDestination Name: FreeAgent GoFlex Drive\nBackup Path: Not Specified\nLatest Backup: 2012-11-17"
I am displaying this data in an e-mail to users. I have been able to format the field in my HTML e-mails perfectly by doing the following:
simple_format(@servicedata.service_exit_details.gsub('\n', '<br>'))
The above code replaces the "\n" with "<br>" tags and simple_format handles the rest.
My issue arises with how to format it properly in the plain text template. Initially I thought I could just output the column; seeing as it has "\n", I assumed the plain text would interpret it and all would be well. However, this simply spits out the string with "\n" intact, just as displayed above, rather than creating line breaks as desired.
In an attempt to find a way to parse the string so that the line breaks are acknowledged, I have tried:
@servicedata.service_exit_details.gsub('\n', '"\r\n"')
@servicedata.service_exit_details.gsub('\n', '\r\n')
raw @servicedata.service_exit_details
markdown(@servicedata.service_exit_details, autolinks: false) # with all the necessary markdown setup
simple_format(@servicedata.service_exit_details.html_safe)
none of which worked.
Can anyone tell me what I'm doing wrong or how I can make this work?
What I want is for the plain text to acknowledge the line breaks and format the string as follows:
Warning set for 7 days.
Critical Notice - Last Time Machine backup was 118 days ago at 2012-11-16 20:40:52
Last Time Machine Destination was FreeAgent GoFlex Drive
Defined Destinations:
Destination Name: FreeAgent GoFlex Drive
Backup Path: Not Specified
Latest Backup: 2012-11-17
I see.
You need to differentiate a literal backslash followed by a letter n as a sequence of two characters, and a LF character (a.k.a. newline) that is usually represented as \n.
You also need to distinguish two different kinds of quoting you're using in Ruby: singles and doubles. Single quotes are literal: the only thing that is interpreted in single quotes specially is the sequence \', to escape a single quote, and the sequence \\, which produces a single backslash. Thus, '\n' is a two-character string of a backslash and a letter n.
Double quotes allow for all kinds of weird things in it: you can use interpolation with #{}, and you can insert special characters by escape sequences: so "\n" is a string containing the LF control character.
Now, in your database you seem to have the former (backslash and n), as hinted by two pieces of evidence: the fact that you're seeing literal backslash and n when you print it, and the fact that gsub finds a '\n'. What you need to do is replace the useless backslash-and-n with the actual line separator characters.
@servicedata.service_exit_details.gsub('\n', "\r\n")
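The distinction between the two-character sequence and a real line feed is not specific to Ruby. As a small illustration in Python, where "\\n" (or r"\n") corresponds to Ruby's single-quoted '\n' and "\n" to Ruby's double-quoted "\n", the same replacement might look like this. expand_escaped_newlines is a name chosen for this sketch.

```python
def expand_escaped_newlines(text, line_ending="\r\n"):
    """Replace the two-character sequence backslash + 'n' with a real line ending."""
    return text.replace("\\n", line_ending)


if __name__ == "__main__":
    # A raw string keeps the backslash and the letter n as two characters,
    # like the value stored in the database column.
    stored = r"Warning set for 7 days.\nDefined Destinations:\nBackup Path: Not Specified"
    assert "\n" not in stored  # no real line feed yet
    print(expand_escaped_newlines(stored))
```

A string that already contains real line feeds passes through unchanged, since it has no backslash-and-n pairs to replace.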

Which Unicode characters are "composing" characters (whose sole purpose is to add an accent or tilde)?

This is related to
What are the characters that count as the same character under collation of UTF8 Unicode? And what VB.net function can be used to merge them?
This is how I plan to do this:
Use http://msdn.microsoft.com/en-us/library/dd374126%28v=vs.85%29.aspx to turn the string into normalization form KD.
Basically it will turn most variations, such as superscripts, into the normal number. It also decomposes tildes and accents into two characters.
The next step would be to remove all characters whose sole purpose is to add a tilde or an accent to another character.
How do I know which characters are like that? Which characters are just "composing characters"?
How do I find such characters? After I find them, how do I get rid of them? Should I scan character by character and remove all such "combining characters"?
For example:
Characters from 300 to 362 can be gotten rid of.
Then what?
Combining characters are listed in UnicodeData.txt as having a nonzero Canonical_Combining_Class, and a General_Category of Mn (Mark, nonspacing).
For each character in the string, call GetUnicodeCategory and check the UnicodeCategory for NonSpacingMark, SpacingCombiningMark or EnclosingMark.
You may be able to do it more efficiently using a regex, e.g. Regex.Replace(str, "\p{M}", "").
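The two steps (decompose to form KD, then drop the mark characters) can be sketched in a few lines. The example below uses Python's standard unicodedata module rather than the .NET calls named above, so treat it as an illustration of the technique, not a drop-in for VB.net; strip_combining_marks is a name chosen for this sketch.

```python
import unicodedata


def strip_combining_marks(text):
    """Decompose to NFKD, then drop every character in a Mark category (Mn, Mc, Me)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    print(strip_combining_marks("Pérez, año, x²"))
```

Filtering on the general category avoids hard-coding a code point range, which would miss combining marks outside the basic diacritics block.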

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results (or at least nothing I can use): while the results contain both Perez and Pérez occurrences, I also get some results that apparently have nothing to do with the query.
In Spanish this happens quite a lot. Be it laziness or whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents, because it's assumed that all these searches work with both options (I guess since Google allows it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an OR expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
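A possible way to generate the second cn value and the matching OR filter is sketched below in Python. The "{stripped}" prefix comes from the sample object above; the function names are invented for this example, and escaping of filter metacharacters is deliberately left out.

```python
import unicodedata

PREFIX = "{stripped}"  # marker used in the sample object above


def strip_accents(text):
    """Decompose accented letters and drop the nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def stripped_value(name):
    """Second cn value to store alongside the original one."""
    return PREFIX + strip_accents(name)


def search_filter(term):
    """OR filter matching either the original or the stripped cn value.

    Escaping of filter metacharacters is omitted here; term is assumed
    to contain none of them.
    """
    return "(|(cn={})(cn={}{}))".format(term, PREFIX, strip_accents(term))


if __name__ == "__main__":
    print(stripped_value("Pérez"))
    print(search_filter("Pérez"))
```

Because the search term is stripped the same way as the stored value, a user typing either Perez or Pérez ends up matching the same stripped entry.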
Search filters ("queries") are specified by RFC2254 (since obsoleted by RFC4515).
Encoding: RFC2254 actually requires filters (indirectly defined) to be an OCTET STRING, i.e. a string of 8-bit bytes: AttributeValue is OCTET STRING, MatchingRuleId and AttributeDescription are LDAPString, and LDAPString is an OCTET STRING.
The standard on escaping: use "\<ASCII HEX NUMBER>" to replace special characters (https://www.rfc-editor.org/rfc/rfc4515#page-4, examples at https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".
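A minimal Python sketch of both suggestions follows: escaping the characters listed in the RFC 4515 quote, and replacing runs of non-ASCII characters with a wildcard. The function names are made up for this example; LDAP client libraries often ship their own escaping helper, which would be preferable where available.

```python
import re

# Characters that RFC 4515 requires to be written as backslash + two hex digits.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value):
    """Escape a user-supplied value for use inside an LDAP search filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def non_ascii_to_wildcards(value):
    """Escape value first, then replace each run of non-ASCII characters with '*'."""
    return re.sub(r"[^\x00-\x7f]+", "*", escape_filter_value(value))


if __name__ == "__main__":
    print(escape_filter_value("Smith (Jr.)*"))
    print("(cn={})".format(non_ascii_to_wildcards("Pérez")))
```

Escaping happens before the wildcard substitution so that a literal * typed by the user stays escaped while the inserted wildcards remain active.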