invalid escape character "\\" was specified in a LIKE predicate - sql

i'm getting this error message with my query.. i'm using openjpa and sql server for my database...
this is my query:
public static List<ProcesosGeneralEntity> getALLbyProductor(String productor){
Query q = entityManager.createQuery("select a from ProcesosGeneralEntity a where a.productor like :productor ");
q.setParameter("productor", '%'+productor+'%');
List<ProcesosGeneralEntity>resultado=q.getResultList();
List<ProcesosGeneralEntity>result2=new ArrayList<ProcesosGeneralEntity>(resultado);
return result2;
}

Just in case my comment above is correct, I am submitting this as a tentative or possible answer.
Different database software behaves slightly differently, the ANSI SQL Standard does not cover all behavioral quirks of SQL, and so things like escape characters in strings differ between implementations. In SQL Server, escaping a quote mark is done with another quote mark, so to print the string "Alice's dog", one needs to use use, in SQL, the string 'Alice''s dog'. In Oracle Database, the escape character is a backslash, so to print that same string, you instead use 'Alice\'s dog'. Escape characters themselves need to be able to be printed, so in Oracle Database, to print the string "R2\D2", you need to enter the string 'R2\\D2'.
The problem you are having appears to be that it thinks it is talking to an Oracle Database, and thus defaulted to the latter behavior, and used \\ to quote a single \, instead of leaving it be. SQL Server then had a hiccup on this option or some-such. I'm not sure why it threw it back, to be honest.
Regardless, according to the OpenJPA manual's section 4. Database Support - Chapter 4. JDBC you need to specify the correct DBDictionary. The DBDictionary specifies settings like which escape characters to use in which cases, and other non-standard options that are not uniform across all database systems supported.
The solution appears to be that in the configuration file for your software, you must specify something like:
<property name="openjpa.jdbc.DBDictionary" value="sqlserver"/>

Related

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.

What regular expression characters have to be escaped in SQL?

To prevent SQL injection attack, the book "Building Scalable Web Sites" has a function to replace regular expression characters with escaped version:
function db_escape_str_rlike($string) {
preg_replace("/([().\[\]*^\$])/", '\\\$1', $string);
}
Does this function escape ( ) . [ ] * ^ $? Why are only those characters escaped in SQL?
I found an excerpt from the book you mention, and found that the function is not for escaping to protect against SQL injection vulnerabilities. I assumed it was, and temporarily answered your question with that in mind. I think other commenters are making the same assumption.
The function is actually about escaping characters that you want to use in regular expressions. There are several characters that have special meaning in regular expressions, so if you want to search for those literal characters, you need to escape them (precede with a backslash).
This has little to do with SQL. You would need to escape the same characters if you wanted to search for them literally using grep, sed, perl, vim, or any other program that uses regular expression searches.
Unfortunately, active characters in sql databases is an open issue. Each database vendor uses their own (mainly oracle's mysql, that uses \ escape sequences)
The official SQL way to escape a ', which is the string delimiter used for values is to double the ', as in ''.
That should be the only way to ensure transparency in SQL statements, and the only way to introduce a proper ' into a string. As soon as any vendor admits \' as a synonim of a quote, you are open to support all the extra escape sequences to delimit strings. Suppose you have:
'Mac O''Connor' (should go into "Mac O'Connor" string)
and assume the only way to escape a ' is that... then you have to check the next char when you see a ' for a '' sequence and:
you get '' that you change into '.
you get another, and you terminate the string literal and process the char as the first of the next token.
But if you admit \ as escape also, then you have to check for \' and for \\', and \\\' (this last one should be converted to \' on input) etc. You can run into trouble if you don't detect special cases as
\'' (should the '' be processed as SQL mandates, or the first \' is escaping the first ' and the second is the string end quote?)
\\'' (should the \\ be converted into a single \ then the ' should be the string terminator, or do we have to switch to SQL way of encoding and consider '' as a single quote?)
etc.
You have to check your database documentation to see if \ as escape characters affect only the encoding of special characters (like control characters or the like) and also affects the interpretation of the quote character or simply doesn't, and you have to escape ' the other way.
That is the reason for the vendors to include functions to do the escape/unescape of character literals into values to be embedded in a SQL statement. The idea of the attackers is to include (if you don't properly do) escape sequences into the data they post to you to see if that allows them to modify the text of the sql command to simply add a semicolon ; and write a complete sql statement that allows them to access freely your database.

How do I escape the # character in Cognos?

I have created a pass-through Query Item in Cognos 8 Framework Manager that requires the # character as part of the query. Unfortunately this gets interpreted by Cognos as the opening of a macro.
How do I escape the # (number sign/sharp) character in a Query Item?
There does't seem to be any "official" way in the documentation, but this seems to work.
#"#"#
#'#'# works too, unless it appears in a literal SQL string in your query, so it's safer to use #"#"#.
A single backslash \ should delimit it, I know it delimits square brackets [].

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú), because first and last names are written in Spanish, so while the proper way is Pérez it can be written for the sake of the search as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results (or at least nothing I can use, because while the results contain both Perez and Pérez ocurrences, I also get some results that apparently have nothing to do with the query...
In Spanish this happens quite a lot... be it lazyness, be it whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents because it's assumend all these searches work with both options (I guess since Google allowes it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above. It should be (cn~=Perez). I still don't know how well that will work. Soundex has always been strange. Since many attributes are multi-valued including cn you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least have the original value to still go off of when you needed it. You could also get real fancy and prefix the converted value with something and use the valuesReturnFilter to filter it out from your results.
#Sample object
dn:cn=Pérez,ou=x,dc=y
cn:Pérez
cn:{stripped}Perez
sn:Pérez
#etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
Search filters ("queries") are specified by RFC2254.
Encoding:
RFC2254
actually requires filters (indirectly defined) to be an
OCTET STRING, i.e. ASCII 8-byte String:
AttributeValue is OCTET STRING,
MatchingRuleId
and AttributeDescription
are LDAPString, LDAPString is an OCTET STRING.
The standard on escaping: Use "<ASCII HEX NUMBER>" to replace special characters
(https://www.rfc-editor.org/rfc/rfc4515#page-4, examples https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a
valid UTF-8 string and provides that the octets that represent the
ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII
0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are
represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits
representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".

Replace character in SQL results

This is from a Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we would could paste screen shots in here)
It looks like this when I copy and paste the results.
spouse�s
is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes, I can't change the Database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that is the backquote character. I can't get the DB updated so I'm trying this code
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I"m getting ORA-00904:"Regex_Replace":invalid identifier
This seems a problem with your charset configuracion. Check your NLS_LANG and others NLS_xxx enviroment/regedit values. You have to check the oracle server, your client and the client of the inserter of that data.
Try to DUMP the value. you can do it with a select as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before try to replace, look for the root of the problem. If this happens because a charset trouble you can get big problems with bad data.
UPDATE 2: Answering the comments. The problem may be is not on the database server side, may be is in the client side. The problem (if this is the problem) can be a translation on server to/from client comunication. It's for a server-client bad configuracion-coordination. For instance if the server has defined UTF8 charset and your client uses US7ASCII, then all acutes will appear as ?.
Another approach can be that if the server has defined UTF8 charset and your client also UTF8 but the application is not able to show UTF8 chars, then the problem is in the application side.
UPDATE 3: On your examples:
select translate('What. It works because the � is exactly the same char: You have pasted on both sides.
select translate(Fieldname. It does not work because the � is not stored on database, it's the char that the client receives may be because some translation occurs from the data table until it's showed to you.
Next step: Look in DUMP syntax and try to extract the codes for the mysterious char (from the table not pasting �!).
I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 91-94 (using a Windows encoding), or Unicode U+2018, U+2019, U+201C, and U+201D.
I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have the glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its' column. It's more up to you, and your application, to interpret what it will look like given the font you are trying to display your data with in your application, so I suggest investigating as to what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.
Once you've found the code for the character, you could just replace the character by using the regex_replace() built-in function by determining the raw hex code of the character and then supplying the ASCII / C0 Controls and Basic Latin character 0x0027 ('), using code similar to this:
UPDATE <table>
set <column with offending character>
= REGEX_REPLACE(<column with offending character>,
"<character code of �>",
"'")
WHERE regex_like(<column with offending character>,"<character code of �>");
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If your'e seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (default Windows code page) is a backquote character, which looks kinda like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in IS0-8859-1 either. So probably either the database is in CP-1252 encoding (don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your db client connection isn't expecting CP-1252. Hence, the wierd question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:
INSERT (<normal table columns>,...,<column with offending character>) INTO <table>
SELECT <all normal columns>, REGEX_REPLACE(<column with offending character>,
"\0092",
"\0027") -- for ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE regex_like(<column with offending character>,"\0092");
DELETE FROM <table> WHERE regex_like(<column with offending character>,"\0092");
Before you do this you need to understand what actually happened. It looks to me that someone inserted non-ascii strings in the database. For example Unicode or UTF-8. Before you fix this, be very sure that this is actually a bug. The apostrophe comes in many forms, not just the "'".
TRANSLATE() is a useful function for replacing or eliminating known single character codes.