Hyphenated terms in KDB WHERE clause list - where-clause

I'm trying to list hyphenated criteria in a KDB WHERE IN list. The single (non-hyphenated) terms work just fine but when I need to have a hyphen in the literal, KDB doesn't like it. I've tried quoting the strings in a comma delimited list but that doesn't seem to work either.
This works just fine:
where product in (`CD`MUNICIPAL)
This gives me an error:
where product in (`TREASURY-NOTE`TREASURY-BOND`TREASURY-TIPS)
Error:
'TIPS
This is what I'm trying but with no luck:
where product in ("TREASURY-NOTE","TREASURY-BOND","TREASURY-TIPS")

Because "-" is a special character you need to declare these as strings before casting to symbols.
where product in `$("TREASURY-NOTE";"TREASURY-BOND";"TREASURY-TIPS")
You could also use "like", which supports basic wildcard pattern matching (not full regex):
where product like "TREASURY*"
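If the list of products is assembled by a client program, the string-to-symbol cast above can be generated rather than typed by hand. This is only a sketch in Python: the helper name q_symbol_list is made up for illustration, and it assumes q string literals escape backslashes and double quotes with a backslash.

```python
def q_symbol_list(values):
    """Build a q expression that casts a list of strings to symbols."""
    if len(values) < 2:
        # A single value needs different q syntax (enlist); not covered here.
        raise ValueError("this sketch only handles two or more values")

    def q_string(s):
        # Assumption: q string literals escape \ and " with a backslash.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return "`$(" + ";".join(q_string(v) for v in values) + ")"


products = ["TREASURY-NOTE", "TREASURY-BOND", "TREASURY-TIPS"]
clause = "where product in " + q_symbol_list(products)
print(clause)
```

The printed clause has the same shape as the hand-written one in the answer, so hyphens never reach the q parser as bare symbol text.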

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
          ^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.
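The range behaviour can be reproduced outside IBM i. The sketch below uses Python's re module, which is a different engine from the one behind REGEXP_SUBSTR, so treat it as an illustration of the character-class rule only, not of the IBM i function itself.

```python
import re

# Hyphen last in the class: it is a literal hyphen, and \x27 is an apostrophe.
ok = re.compile(r"^[A-Z0-9_+\x27-]+$")
matches_apostrophe = bool(ok.match("O'LEARY"))
matches_hyphen = bool(ok.match("SMITH-JONES"))

# Hyphen between + (0x2B) and \x27 (0x27): parsed as a range that runs
# backwards, which Python's engine also rejects at compile time.
try:
    re.compile(r"^[A-Z0-9_\+-\x27]+$")
    range_error = None
except re.error as exc:
    range_error = str(exc)

print(matches_apostrophe, matches_hyphen, range_error is not None)
```

Moving the hyphen to the end (or escaping it) is what stops it from being read as a range operator.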

Trim a full string, not characters - Redshift

This is the same question as here, but the answers there were very specific to PHP (and I'm using Redshift SQL, not PHP).
I'm trying to remove specific suffixes from strings. I tried using RTRIM, but that will remove any of the listed characters, not just the full string. I only want the string changed if the exact suffix is there, and I only want it replaced once.
For example, RTRIM("name",' Inc') will convert "XYZ Company Corporation" into "XYZ Company Corporatio". (Removed final 'n' since that's part of 'Inc')
Next, I tried using a CASE statement to limit the incorrect replacements, but that still didn't fix the problem, since it will continue making replacements past the original suffix.
For example, when I run this:
CASE WHEN "name" LIKE '% Inc' THEN RTRIM("name",' Inc')
I get the following results:
"XYZ Association Inc" becomes "XYZ Associatio". (It trimmed Inc but also the final 'n')
I'm aware I can use the REPLACE function, but my understanding is that this will replace values from anywhere in the string, and I only want to replace when it exists at the end of the string.
How can I do this with Redshift? (I don't have the ability to use any other languages or tools here).
You can use REGEXP_REPLACE to remove the trailing Inc by using a regex that anchors the Inc to the end of the string:
CASE WHEN "name" LIKE '% Inc' THEN REGEXP_REPLACE("name", ' Inc$', '') ELSE "name" END
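The difference between trimming a set of characters and removing an anchored suffix can be seen in a few lines of Python, whose str.rstrip treats its argument as a character set in the same way the question describes for RTRIM. The helper name strip_suffix is hypothetical, and this is an illustration of the idea rather than Redshift code.

```python
import re


def strip_suffix(name, suffix=" Inc"):
    """Remove suffix only when it is the exact end of the string."""
    # re.escape keeps the suffix literal; $ anchors it to the end.
    return re.sub(re.escape(suffix) + r"$", "", name)


# Character-set trimming eats every trailing ' ', 'I', 'n' or 'c'.
trimmed_corp = "XYZ Company Corporation".rstrip(" Inc")
trimmed_assoc = "XYZ Association Inc".rstrip(" Inc")

# The anchored regex removes the suffix once, or nothing at all.
fixed_corp = strip_suffix("XYZ Company Corporation")
fixed_assoc = strip_suffix("XYZ Association Inc")

print(trimmed_corp, "|", trimmed_assoc)
print(fixed_corp, "|", fixed_assoc)
```

Because the pattern is anchored, a string without the suffix passes through unchanged, so the regex alone already does the job of the LIKE guard.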

amazon simpledb attribute naming restrictions?

I want to know if there are any restrictions on attribute names in Amazon SimpleDB.
I tried the following attribute name
my.attribute.name
Running the following query
select * from mydomain where my.attribute.name is not null
results in an error: "The specified query expression syntax is not valid.".
Surrounding the name with quotes ('my.attribute.name') also results in an error because it is invalid select syntax.
Changing the dots to underscores makes everything work:
my_attribute_name
and the query runs fine
select * from mydomain where my_attribute_name is not null
Now my question: What are the allowed characters for attributes?
The Amazon developer manual says names are restricted to characters that are valid in XML documents. What exactly does this mean? The linked W3C documents don't seem to answer this. In domain names the dot "." is allowed.
Currently I use the sdbTool. I hope this doesn't affect the behaviour.
Inserting some other characters in attribute names is working, like this one: 'my:attribute-name.with other%20chars'.
Any ideas?
Can you please enclose your attribute name in back-tick quotes and try again?
Domain names and attribute names need to be enclosed in back-tick quotes if they contain any special characters. Attribute and domain names may appear without quotes if they contain only letters, numbers, underscores (_), or dollar symbols ($). You must quote all other attribute and domain names with the back-tick (`).
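The quoting rule can be applied mechanically when queries are built in code. The following Python sketch uses a hypothetical helper, quote_name. Two details are assumptions that go beyond the answer: names starting with a digit are also quoted (quoting is harmless either way), and a back-tick inside a name is escaped by doubling it, as I understand the SimpleDB documentation.

```python
import re

# Letters, digits, underscores and dollar signs only.
# Assumption: a leading digit also requires quoting.
_PLAIN_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def quote_name(name):
    """Return name as-is if plain, otherwise wrapped in back-ticks."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    # Assumption: an embedded back-tick is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


query = "select * from mydomain where {} is not null".format(
    quote_name("my.attribute.name")
)
print(query)
```

With this in place the dotted attribute name from the question is emitted as `my.attribute.name`, while my_attribute_name stays unquoted.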

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results (or at least nothing I can use): while the results contain both Perez and Pérez occurrences, I also get some results that apparently have nothing to do with the query.
In Spanish this happens quite a lot. Be it laziness or whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents because it's assumed all these searches work with both options (I guess since Google allows it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
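To make the second-value idea concrete, here is a rough Python sketch of how the stripped value and the OR filter could be produced. The function names are hypothetical, the {stripped} prefix is the one from the sample object above, and the search term is assumed to contain no LDAP filter metacharacters.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks so that 'Pérez' becomes 'Perez'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_filter(term):
    """Match either the original value or the stored stripped value."""
    # Assumption: term has no *, (, ) or \ that would need escaping.
    return "(|(cn={0})(cn={{stripped}}{1}))".format(term, strip_accents(term))


stored_value = "{stripped}" + strip_accents("Pérez")
print(stored_value)
print(name_filter("Pérez"))
print(name_filter("Perez"))
```

Note that this decomposition also turns ñ into n, which may or may not be what you want for Spanish names.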
Search filters ("queries") are specified by RFC2254.
Encoding: RFC2254 actually requires filters (indirectly defined) to be an OCTET STRING, i.e. a string of 8-bit bytes: AttributeValue is OCTET STRING, MatchingRuleId and AttributeDescription are LDAPString, and LDAPString is an OCTET STRING.
The standard on escaping: use "\<ASCII HEX NUMBER>" to replace special characters (https://www.rfc-editor.org/rfc/rfc4515#page-4, examples at https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".
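The escaping rule quoted above is small enough to sketch directly. This Python helper (the name escape_filter_value is made up here) covers only the five characters the quoted RFC 4515 passage lists, and leaves everything else, including accented letters, untouched.

```python
# The characters named in the quoted passage: * ( ) \ and NUL.
_SPECIAL = "*()\\\0"


def escape_filter_value(value):
    """Replace each special character with a backslash and two hex digits."""
    out = []
    for ch in value:
        if ch in _SPECIAL:
            out.append("\\{:02x}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


escaped = escape_filter_value("a*(b)\\c")
print(escaped)
print("(cn=*{}*)".format(escape_filter_value("Perez (Jr)")))
```

Escaping the user's input first and adding the wildcards afterwards keeps a typed * or parenthesis from changing the meaning of the filter.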

How do I include a single-quote in MSBuild item transformation separator?

I need to include a single quote in an item transformation, like so:
<DatabaseFileNames>@(DatabaseFiles->'%(PhysicalName)', '','')</DatabaseFileNames>
This, however, spits out a rather cryptic error:
error MSB4095: The item metadata %(PhysicalName) is being referenced without an item name. Specify the item name by using %(itemname.PhysicalName).
I'm basically trying to create a comma-separated list of single-quoted values.
How do I get single-quotes into the transformation separator?
I tried using HTML-entities (the entity for single quote is &apos;), like so:
<DatabaseFileNames>@(DatabaseFiles->'%(PhysicalName)', '&apos;,&apos;')</DatabaseFileNames>
But I get the same error.
It looks like you have to use URL-encoding style escapes, that is, %CharacterHexNumber. In this case, the single quote is ASCII character 39, which is 27 in hex, so the correct escape sequence is:
<DatabaseFileNames>@(DatabaseFiles->'%(PhysicalName)', '%27,%27')</DatabaseFileNames>
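The escape sequence is just a percent sign followed by the character's code in two hex digits, so it can be computed rather than looked up. A small Python sketch, where the helper name msbuild_escape is hypothetical and only ASCII characters are assumed:

```python
def msbuild_escape(text, chars="'"):
    """Replace each character listed in chars with %XX (hex character code)."""
    # Assumption: the characters being escaped are ASCII (two hex digits).
    return "".join(
        "%{:02X}".format(ord(c)) if c in chars else c for c in text
    )


separator = msbuild_escape("','")
transform = "@(DatabaseFiles->'%(PhysicalName)', '{}')".format(separator)
print(separator)
print(transform)
```

The separator comes out as %27,%27, which matches the escape used in the answer.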