DB2 complex LIKE - SQL

I have to write a SELECT statement that matches the following pattern:
[A-Z][0-9][0-9][0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]
The only thing I'm sure of is that the first A-Z WILL be there. All the rest is optional and the optional part is the problem. I don't really know how I could do that.
Some example data:
B/0765/E 3
B/0765/E3
B/0764/A /02
B/0749/K
B/0768/
B/0784//02
B/0807/
My guess is that it's best to remove all the whitespace and the / from the data and then execute the SELECT statement. But I'm actually having some problems writing the LIKE pattern. Could anyone help me out?
The underlying reason for this is that I'm migrating a database. In the old database the values are all in one field, but in the new one they are split into several fields, so I first have to write a "control script" to find out which records in the old database are not correct.
Even the following isn't working:
where someColumn LIKE '[a-zA-Z]%';

You can use regular expressions via XQuery to define this pattern. There are many questions on Stack Overflow about patterns in DB2, and they have been solved with regular expressions.
DB2: find field value where first character is a lower case letter
Emulate REGEXP like behaviour in SQL
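DB2's LIKE only supports the % and _ wildcards, so a bracket character class such as [a-zA-Z] is not recognized (it is a SQL Server extension); you need regular expressions for this. A minimal sketch of the check, assuming DB2 11.1 or later (which provides REGEXP_LIKE) and placeholder table/column names old_table and old_value; on older versions the linked answers show the same test via XMLEXISTS and fn:matches:
SELECT old_value
FROM old_table
WHERE NOT REGEXP_LIKE(
          -- strip the slashes and spaces first, as the question suggests
          REPLACE(REPLACE(old_value, '/', ''), ' ', ''),
          -- the leading letter is mandatory, everything after it is optional
          '^[A-Z][0-9]{0,4}([A-Z][0-9]{0,5})?$');
The rows returned are the ones that do not fit the expected pattern after normalization, which is exactly what the control script needs to flag.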


How to search for string in SQL treating apostrophe and single quote as equal

We have a database where our customer has typed "Bob's" one time and "Bob’s" another time. (Note the slight difference between the single-quote and apostrophe.)
When someone searches for "Bob's" or "Bob’s", I want to find all cases regardless of what they used for the apostrophe.
The only thing I can come up with is looking at people's queries and replacing every occurrence of one or the other with (’|'') (Note the escaped single quote) and using SIMILAR TO.
SELECT * from users WHERE last_name SIMILAR TO 'O(’|'')Dell'
Is there a better way, ideally some kind of setting that allows these to be interchangeable?
You can use regexp matching
with a_table(str) as (
    values
        ('Bob''s'),
        ('Bob’s'),
        ('Bobs')
)
select *
from a_table
where str ~ 'Bob[''’]s';
str
-------
Bob's
Bob’s
(2 rows)
Personally I would replace all apostrophes in a table with one query (I had the same problem in one of my projects).
If you find that both of the cases above are valid and represent the same information, then you might consider taking care of your data before it arrives in the database. That means you could replace one character with the other in your application code or in a BEFORE INSERT trigger (see the sketch below, which also covers the one-query cleanup).
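A minimal sketch of both ideas, assuming PostgreSQL (11+ for EXECUTE FUNCTION) and the users/last_name names from the question; the function and trigger names are made up for illustration:
-- one-off cleanup of existing rows: replace the typographic apostrophe with the plain one
UPDATE users
SET last_name = replace(last_name, '’', '''')
WHERE last_name LIKE '%’%';

-- keep new rows normalized with a BEFORE INSERT/UPDATE trigger
CREATE OR REPLACE FUNCTION normalize_apostrophes() RETURNS trigger AS $$
BEGIN
    NEW.last_name := replace(NEW.last_name, '’', '''');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_normalize_apostrophes
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION normalize_apostrophes();
Once the stored data uses a single form of the apostrophe, the search only has to normalize the user's search term in the same way.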
If you have more cases like the one you've mentioned, then writing explicit LIKE (or regexp) queries would unfortunately be the way to go.
You could also consider offering your customer hints: a separate lookup that fetches records from the database and returns the closest matches, if there are any, to avoid such problems.
I'm afraid there is no setting in Postgres that makes these two symbols interchangeable in queries. At least I'm not familiar with one.

Using regular expressions in SQL

I have the following rows in my table
COL1                 | EXTRA | DOUBLE | TEST
---------------------+-------+--------+-----
12 TEST              |       |        |
123 EXTRA            |       |        |
125 EXTRA 95 DOUBLE  |       |        |
EXTRA 45 99 DOUBLE   |       |        |
I am using regular expressions to filter out the rows and move them appropriately to different columns. So:
For the first row, I want 12 to be extracted and put in column TEST.
For the second row, I want 123 to be extracted and put in column EXTRA.
For the 3rd row, I want 125 to be extracted and put in column EXTRA.
I want to ignore 95.
For the last row, I want 45 to be extracted and put in column EXTRA.
I can extract the values and put them in the appropriate columns through my query. I am using this regular expression for extracting the values:
'%[0-9]%[^A-Z]%[0-9]%'
the problem with this regular expression is that it extracts 12, but it does not extract 123 from the second row. If I change the regular expression to:
'%[0-9]*%[^A-Z]%[0-9]%'
then it extracts 123, but for the third row, it concatenates 125 with 95 so I get 12595. Is there any way I can avoid 95 and just get the value 125? If I remove the star then it does not do any concatenation.
Any help will be appreciated. I posted this question before, but some of you were asking for more explanation so I posted a new question for that.
I believe the regex you are looking for is below. It matches a group of digits, followed by a non-letter character, and then ignores any further digit groups. However, I believe that when you use %pattern%pattern%... SQL evaluates each piece separately, so I am not sure about the nuances of regex in SQL. If you run this against rubular.com it seems to solve the problem you are asking about. Hopefully it can be of some use in your regex search :)
([0-9]*)([^A-Z])(?>[0-9]*)
However, I did just look at your other examples of the letters coming first, and that would not work here. But, maybe this can still be of use to you
SQL Server does not natively support regex. It does support some limited pattern matching through LIKE and PATINDEX.
If you really want to use regex inside of SQL Server you can use a .NET language like C# to create a CLR assembly and import that into SQL Server, but that comes with a number of drawbacks. If you want to use regex, the better way is to have an application that runs on top of SQL Server. That application could be written in any language that can interface with ODBC, like C# or Python; in fact, in an intro article on Simple-Talk I talk about interfacing Python with SQL Server to use regex.
But the patterns you provide use SQL Server's more limited pattern-matching capabilities rather than regex, so that seems to be what you want. There is a full description at Pattern Matching in Search Conditions.
As for solving your particular problem, you don't seem to have one particular pattern but several possible patterns anyway. That type of situation is almost impossible to handle with a single SQL Server pattern, and the regex logic gets unnecessarily complicated too. So, if I were in your position, I would not try to create a single pattern but a series of cases, and then extract the number you need based on each case.
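A rough sketch of that "series of cases" idea, assuming SQL Server 2005+ and a placeholder table name YourTable; it is written as a SELECT so the extracted values can be inspected, but the same expressions can drive one UPDATE per pattern:
SELECT COL1,
       CASE WHEN COL1 LIKE '[0-9]% TEST%'
            THEN LEFT(COL1, PATINDEX('% %', COL1) - 1)      -- '12 TEST' -> 12
       END AS test_value,
       CASE WHEN COL1 LIKE '[0-9]% EXTRA%'
            THEN LEFT(COL1, PATINDEX('% %', COL1) - 1)      -- '125 EXTRA 95 DOUBLE' -> 125, the 95 is ignored
            WHEN COL1 LIKE 'EXTRA [0-9]%'
            THEN SUBSTRING(COL1, 7,
                 PATINDEX('% %', SUBSTRING(COL1, 7, 100) + ' ') - 1)   -- 'EXTRA 45 99 DOUBLE' -> 45
       END AS extra_value
FROM YourTable;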
Assuming this is SQL 2005 (or later I guess... I can only speak for 2005), and all different permutations of COL1 data are in your question:
UPDATE NameOfYourTable
SET TEST = SUBSTRING(Col1, 0, LEN(Col1) - (LEN(Col1) - PATINDEX('%[0-9] TEST%', Col1) - 1))
WHERE COL1 LIKE '%[0-9] TEST%'

UPDATE NameOfYourTable
SET EXTRA = SUBSTRING(Col1, 0, LEN(Col1) - (LEN(Col1) - PATINDEX('%[0-9] EXTRA%', Col1) - 1))
WHERE COL1 LIKE '%[0-9] EXTRA%'

UPDATE NameOfYourTable
SET EXTRA = SUBSTRING(Col1, PATINDEX('%[0-9]%', Col1), LEN(Col1) - (LEN(Col1) - PATINDEX('%[0-9] [0-9]%', Col1) + LEN('EXTRA ')))
WHERE COL1 LIKE 'EXTRA [0-9]%'
Somehow though, I really don't think this is going to resolve your problem. I would strongly advise you to make sure this will catch all the cases you need to handle by running this on some test data.
If you have a lot of different cases to handle, then the better alternative I think would be to make a small console program in something like C# (that has much better RegExp support) to sift through your data and apply updates that way. Trying to handle numerous permutations of COL1 data is going to be a nightmare in SQL.
Also read these on LIKE, PATINDEX and their (limited) pattern-matching abilities:
LIKE: http://msdn.microsoft.com/en-us/library/ms179859(v=sql.90).aspx
PATINDEX: http://msdn.microsoft.com/en-us/library/ms188395(v=sql.90).aspx

Search literal within a word

Is there a way to perform a FULLTEXT search which returns literals found within words?
I have been using MATCH(col) AGAINST('+literal*' IN BOOLEAN MODE) but it fails if the text is like:
blah,blah,literal,blah
blahliteralblah
blah,blah,literal
Please note that there is no space after the commas.
I want all three cases above to be returned.
I think it would be better to fetch the array of entries and then perform the text manipulation (in this case a search) over the fetched data!
Any text manipulation or complex query takes more resources, and if your database contains a lot of data, the query becomes too slow! Moreover, if you are running your query on a shared server, that makes the performance issues worse!
You can easily accomplish what you are trying to do with regex once you have fetched the data from the database!
UPDATE: My suggestion is the same even if you are running your script on a dedicated server! However, if you want to perform a full-text search for the word "literal" in BOOLEAN MODE as you have described, you can remove the + operator (because you are searching for only one word) and construct the query as follows:
SELECT listOfColumnNames
FROM tableName
WHERE MATCH (colName)
AGAINST ('literal*' IN BOOLEAN MODE);
However, even if you keep the + (AND) operator, the query works fine: tested on an Apache server with MySQL 5.1!
I suggest you read the documentation about full-text search in boolean mode.
The only problem with this query is that it doesn't match the word "literal" when it is a substring inside another word, for example: "textliteraltext".
As you noticed, you can't use the * operator at the beginning of the word!
So, to accomplish what you are trying to do, the fastest and easiest way is to follow the suggestion of Paul, using the % placeholder:
SELECT listOfColumnNames
FROM tableName
WHERE colName LIKE '%literal%';

Openbase SQL case-sensitivity oddities ('=' vs. LIKE) - porting to MySQL

We are porting an app which formerly used Openbase 7 to now use MySQL 5.0.
OB 7 did have quite badly defined (i.e. undocumented) behavior regarding case-sensitivity. We only found this out now when trying the same queries with MySQL.
It appears that OB 7 treats lookups using "=" differently from those using "LIKE": If you have two values "a" and "A", and make a query with WHERE f="a", then it finds only the "a" field, not the "A" field. However, if you use LIKE instead of "=", then it finds both.
Our tests with MySQL showed that if we're using a non-binary collation (e.g. latin1), then both "=" and "LIKE" compare case-insensitively. However, to simulate OB's behavior, we need to get only "=" to be case-sensitive.
We're now trying to figure out how to deal with this in MySQL without having to add a lot of LOWER() function calls to all our queries (there are a lot!).
We have full control over the MySQL DB, meaning we can choose its collation mode as we like (our table names and unique indexes are not affected by the case sensitivity issues, fortunately).
Any suggestions how to simulate the OpenBase behaviour on MySQL with the least amount of code changes?
(I realize that a few smart regex replacements in our source code to add the LOWER calls might do the trick, but we'd rather find a different way)
Another idea: does MySQL offer something like user-defined functions? You could then write a UDF version of LIKE that is case-insensitive (ci_like or so) and change all LIKEs to ci_like. Probably easier to do than regexing a call to LOWER() into every query.
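A sketch of that ci_like idea as a plain stored function rather than a compiled UDF, assuming MySQL 5.0+ (which supports stored functions); the table and column names in the usage line are placeholders:
DELIMITER //
CREATE FUNCTION ci_like(haystack TEXT, pattern TEXT)
RETURNS TINYINT(1) DETERMINISTIC
BEGIN
    -- lower-case both sides so the comparison ignores case
    RETURN LOWER(haystack) LIKE LOWER(pattern);
END//
DELIMITER ;

-- usage: SELECT * FROM someTable WHERE ci_like(someColumn, 'a%');
This pairs naturally with a binary or case-sensitive column collation, which keeps plain = comparisons case-sensitive the way OpenBase 7 behaved, while ci_like provides the case-insensitive LIKE behaviour.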
These two articles talk about case sensitivity in mysql:
Case Sensitive mysql
mySql docs "Case Sensitivity"
Both were early hits in this Google search:
case sensitive mysql
I know that this is not the answer you are looking for .. but given that you want to keep this behaviour, shouldn't you explicitly code it (rather than changing some magic 'config' somewhere)?
It's probably quite some work, but at least you'd know which areas of your code are affected.
A quick look at the MySQL docs seems to indicate that this is exactly how MySQL does it:
This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a.
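A quick way to verify that behaviour yourself, assuming a case-insensitive collation such as MySQL's default latin1_swedish_ci:
-- both comparisons return 1 under a _ci collation,
-- i.e. = and LIKE are equally case-insensitive here
SELECT 'a' = 'A'    AS eq_result,
       'a' LIKE 'A' AS like_result;
This is why simulating OpenBase's split (case-sensitive =, case-insensitive LIKE) needs explicit handling, such as the ci_like function suggested above.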

Regular expression to match common SQL syntax?

I was writing some Unit tests last week for a piece of code that generated some SQL statements.
I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.
I managed to get partial matches but because a section in quotes can contain any characters it quickly expands to match the whole statement.
Any help would be appreciated, I'm not very good with regular expressions but I'd like to learn more about them.
By the way it's C# RegEx that I'm after.
Clarification
I don't want to need access to a database, as this is part of a unit test and I don't want to have to maintain a database just to test my code, which may live longer than the project.
Regular expressions can only match languages that a finite state automaton can recognize, which is very limited, whereas SQL is a full grammar. It can be demonstrated that you can't validate SQL with a regex. So, you can stop trying.
SQL is a type-2 (context-free) grammar; it is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine in general is too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.
As far as I know this is beyond regex, and you're getting close to the dark arts of BNF and compilers.
http://savage.net.au/SQL/
Same things happens to people who want to do correct syntax highlighting. You start cramming things into regex and then you end up writing a compiler...
I had the same problem. An approach that would work for all the more standard SQL statements would be to spin up an in-memory SQLite database and issue the query against it; if you get back a "table does not exist" error, then your query parsed properly.
Off the top of my head: Couldn't you pass the generated SQL to a database, use EXPLAIN on it, and catch any exceptions, which would indicate poorly formed SQL?
Have you tried lazy quantifiers? Rather than matching as much as possible, they match as little as possible, which is probably what you need for quotes.
To validate the queries, just run them with SET NOEXEC ON; that is how Enterprise Manager does it when you parse a query without executing it.
Besides, if you are using regex to validate SQL queries, you can be almost certain that you will miss some corner cases, or that the query will be invalid for other reasons even though it is syntactically correct.
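A minimal illustration of the SET NOEXEC ON idea, as a sketch assuming SQL Server and a client such as sqlcmd or SSMS that understands the GO batch separator:
SET NOEXEC ON;
GO
-- compiled but NOT executed: no rows are returned and no data is touched
SELECT name FROM sys.objects;
GO
-- the typo (frm) is reported as a compile-time syntax error
SELECT * frm sys.objects;
GO
SET NOEXEC OFF;
GO
Because nothing is executed, this can be run against a real server without side effects.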
I suggest creating a database with the same schema, possibly using an embedded sql engine, and passing the sql to that.
I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object_name etc until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into syntax (parsing) errors that indicate incorrect syntax, and errors relating to table names, permissions, etc.
Add to that the problem of different RDBMSs and even different versions allowing different syntaxes and I think you really have to go to the db engine for this task.
There are ANTLR grammars to parse SQL. It's really a better idea to use an in-memory database or a very lightweight database such as SQLite. It seems wasteful to me to test only whether the SQL is valid from a parsing standpoint; it is much more useful to check the table and column names and the specifics of your query.
The best way is to validate the parameters used to create the query, rather than the query itself. A function that receives the variables can check the length of the strings, valid numbers, valid emails or whatever. You can use regular expressions to do these validations.
public bool IsValid(string sql)
{
    // requires: using System.Text.RegularExpressions;
    string pattern = @"SELECT\s.*FROM\s.*WHERE\s.*";
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    return rgx.IsMatch(sql);
}
I am assuming you did something like .* for the quoted sections; try [^"]* instead, which will keep you from eating the whole line. It will still give false positives in cases where you have a \" escape inside your strings.