What's the best approach to embed RegEx in Oracle or SQL Server 2005 SQL? - sql

This is a three-part question about embedding regex in SQL statements.
1. How do you embed a regex into an Oracle PL/SQL select statement that will parse out the "DELINQUENT" string in the text shown below?
2. What is the performance impact if it is used within a mission-critical business transaction?
3. Since embedding regex into SQL was introduced in Oracle 10g and SQL Server 2005, is it considered a recommended practice?
Dear Larry :
Thank you for using ABC's alert service.
ABC has detected a change in the status of one of your products in the state of KS. Please review the
information below to determine if this status change was intended.
ENTITY NAME: Oracle Systems, LLC
PREVIOUS STATUS: --
CURRENT STATUS: DELINQUENT
As a reminder, you may contact your the ABC Team for assistance in correcting any delinquencies or, if needed, reinstating
the service. Alternatively, if the system does not intend to continue to engage this state, please notify ABC
so that we can discontinue our services.
Kind regards,
Service Team 1
ABC
--PLEASE DO NOT REPLY TO THIS EMAIL. IT IS NOT A MONITORED EMAIL ACCOUNT.--
Notice: ABC Corporation cannot independently verify the timeliness, accuracy, or completeness of the public information
maintained by the responsible government agency or other sources of data upon which these alerts are based.

Why would you need regular expressions here?
INSTR and SUBSTR will do the job perfectly.
But if you are convinced you need regexes, you can use:
REGEXP_INSTR
REGEXP_REPLACE
REGEXP_SUBSTR
(only available in Oracle 10g and up)
SELECT emp_id, text
FROM employee_comment
WHERE REGEXP_LIKE(text,'...-....');
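To make the regex idea concrete, here is a minimal sketch in Python (not Oracle SQL) that pulls the status out of the alert text from the question. The pattern, the `ALERT` sample and the `current_status` helper are illustrative assumptions. The Oracle functions listed above accept a similar kind of pattern, but their exact syntax and flags differ and are not shown here.

```python
import re

# A few lines taken from the alert email in the question.
ALERT = """\
ENTITY NAME: Oracle Systems, LLC
PREVIOUS STATUS: --
CURRENT STATUS: DELINQUENT
Kind regards,
"""

# Anchor on the label at the start of a line and capture the rest of that line,
# ignoring spaces or tabs around the value.
STATUS_RE = re.compile(r"^CURRENT STATUS:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def current_status(text):
    """Return the value after 'CURRENT STATUS:', or None if the label is absent."""
    match = STATUS_RE.search(text)
    return match.group(1) if match else None


print(current_status(ALERT))
```

Anchoring on the label rather than on the word DELINQUENT means the same pattern keeps working when the status changes to some other value.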

If I recall correctly, it is possible to write a UDF in C# or VB for SQL Server.
Here's a link, though possibly not the best: http://www.novicksoftware.com/coding-in-sql/Vol3/cis-v3-N13-dot-net-clr-in-sql-server.htm

Why not just use INSTR (for Oracle) or CHARINDEX (for SQL Server) combined with SUBSTRING? Seems a bit more straightforward (and portable, since it's supported in older versions).
http://www.techonthenet.com/oracle/functions/instr.php and http://www.adp-gmbh.ch/ora/sql/substr.html
http://www.databasejournal.com/features/mssql/article.php/3071531 and http://msdn.microsoft.com/en-us/library/ms187748.aspx
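The find-then-slice approach suggested here can be sketched without any regex. The example below is Python rather than SQL, and `status_after_label` and `SAMPLE` are hypothetical names chosen for illustration. Note that Python's `str.find` is 0-based and returns -1 when nothing is found, whereas INSTR and CHARINDEX are 1-based and return 0.

```python
SAMPLE = "PREVIOUS STATUS: --\nCURRENT STATUS: DELINQUENT\nAs a reminder, contact the team."


def status_after_label(text, label="CURRENT STATUS:"):
    """Locate the label, then slice out the rest of that line."""
    start = text.find(label)          # plays the role of INSTR / CHARINDEX
    if start == -1:
        return None
    start += len(label)
    end = text.find("\n", start)      # the value ends at the next line break
    if end == -1:
        end = len(text)
    return text[start:end].strip()    # plays the role of SUBSTR / SUBSTRING


print(status_after_label(SAMPLE))
```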

INSTR and CHARINDEX are great alternative approaches but I'd like to explore the benefits of embedding Regex.

In MS SQL you can use LIKE, which has some "pattern matching" in it. I would guess Oracle has something similar. It's not regex, but it has some of the matching capabilities. (Note: it's not particularly fast.) Full-text searching could also be an option (again MS SQL), and is probably much faster on a good-sized database.

Related

best search in mdb for big data

I have created an English-to-Kurdish dictionary and saved the data in an Access .mdb file; it holds more than 78,000 words.
Can anyone help me make the search fast?
I'm using this query for search
"SELECT english FROM table WHERE English LIKE '" +text Searchlight. Text+"%'";
If your query is:
SELECT english
FROM table
WHERE English LIKE '" +text Searchlight. Text+"%'"
Then I'm a little confused. Access generally uses * as the wildcard for searching rather than % (which is the SQL standard). Because the LIKE pattern does not start with a wildcard, many databases will use an index (if available) for this query. I don't know if MS Access has this optimization.
In any case, you seem to be going down a path where full text search is beneficial. If so, I think you have the wrong tool for the job. MS Access doesn't support full text search. I would suggest that you use a database that does (obvious choices are SQL Server Express, Postgres, and MySQL, all of which are free). By the way, all three of these do use an index for LIKE, when the pattern does not start with a wildcard character.
If you decide to use SQL Server Express, this answer should be helpful for the installation.
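As a rough illustration of a prefix search whose pattern does not start with a wildcard, here is a sketch using Python's built-in sqlite3 module instead of Access. The `words` table, its index and the `prefix_search` helper are assumptions made for the example. It also passes the search text as a parameter rather than concatenating it into the SQL string as the question does.

```python
import sqlite3


def prefix_search(conn, prefix):
    """Return words starting with `prefix`, treating % and _ in it literally."""
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT english FROM words WHERE english LIKE ? ESCAPE '\\' ORDER BY english",
        (escaped + "%",),  # the wildcard goes only at the end of the pattern
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (english TEXT NOT NULL)")
# Whether LIKE can use this index depends on the engine and its collation settings.
conn.execute("CREATE INDEX idx_words_english ON words (english)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("apple",), ("apply",), ("application",), ("apricot",), ("banana",)],
)

print(prefix_search(conn, "app"))
```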

Does the 'S' in SQL stand for "standard" or "structured"?

I'm thinking "structured", but my dad claims that when he taught a class that involved SQL (decades ago), they used "standard". I was wondering if this changed over time, or is he mistaken? I googled it with "standard" and did see some pages that said that's what it stands for. Any old timers willing to give a history lesson?
Wikipedia says first:
SQL often referred to as Structured Query Language.
And then further down:
SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language).
There's no mention of "standard query language" on the whole page.
The other test of course is to search Google for "structured query language" vs "standard query language". For which I currently get 913,000 results compared to 124,000. So clearly "structured" wins, however interestingly there was apparently a divided preference at one time. This site says:
In the early days of the system there was divided preference between Standard Query Language and Structured Query Language but it did not make a whole lot of difference since most people most of the time called it by the acronym SQL. Now the overwhelming but not complete preference is for Structured Query Language.
It stands for Structured.
The precursor was called SEQUEL, standing for Structured English QUEry Language.
From wikipedia:
The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark
In the early days of Oracle databases, for instance, it was called "Standard Query Language". Yes, it is a structured language, but it is known to us old-schoolers as Standard Query Language (SQL).
Now if you want to call it Structured, that is entirely up to you. Maybe we should use sSQL or lower-case sql as the acronym instead of SQL in upper case. There is always someone trying to put a different twist on things. Whatever acronym you use, get the sequel script right.
The original wasn't even SQL acronym, it was SEQUEL: (from wiki)
SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed during the 1970s.[8] The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company
It's certainly "structured", e.g. although logically the SELECT clause is evaluated after the FROM clause, you cannot write a SQL statement that way because you would be violating its "structure".
It is Structured Query Language, but I think one reason people took it to be "Standard Query Language" is that SQL is an ANSI and ISO standard. I thought the same at first, and it cost me a mark in an exam.

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? For example:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use the DIFFERENCE function (on the same link as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow article also details possible sources for implementations of fuzzy logic in T-SQL. One respondent also outlined full-text indexing as an option that you might want to investigate.
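For the multi-word case raised in the question's edit, one possible approach is to compare Soundex codes word by word instead of running LIKE on the whole phrase. The sketch below is Python, uses a simplified American Soundex that may differ from SQL Server's SOUNDEX in edge cases, and all names in it (`soundex`, `soundex_search`, `ARTICLES`) are hypothetical.

```python
import re

# Consonant groups for American Soundex; vowels and y carry no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character Soundex code such as 'D526'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code               # a vowel resets prev, so repeats count again
    return (result + "000")[:4]


def soundex_search(query, records):
    """Return records containing a word that sounds like any query term."""
    wanted = {soundex(t) for t in re.findall(r"[A-Za-z]+", query)}
    return [
        r for r in records
        if wanted & {soundex(w) for w in re.findall(r"[A-Za-z]+", r)}
    ]


ARTICLES = [
    "Hello my name is Tom I like dinosaurs to talk about SQL.",
    "SQL is amazing. I really like SQL.",
]
print(soundex_search("Dinosores wrocks", ARTICLES))
```

This matches a record when any one term sounds like any one word (OR semantics). In a database the per-word codes would normally be precomputed and indexed rather than calculated at query time.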
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus corrects for phonetic misspellings, and because it ignores vowels it also tolerates slips such as "Dinosars" (missing the final U) or "Dinosayrs". But if the user truly "fat-fingered" a consonant and entered, say, "Dibosaurs", SoundEx would not return a match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
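The "closest dictionary word" step described above can be sketched with an edit distance that counts substitutions, insertions, deletions and adjacent transpositions (the optimal string alignment variant of Damerau-Levenshtein). This is a minimal Python illustration; the function names, the tiny dictionary and the `max_distance` cutoff are assumptions, and a real search engine would use a far larger dictionary and a faster lookup structure.

```python
def osa_distance(a, b):
    """Edit distance allowing substitution, insertion, deletion, transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i               # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j               # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def suggest(word, dictionary, max_distance=2):
    """Return the word itself if known, else the closest entry within the cutoff."""
    word = word.lower()
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda entry: osa_distance(word, entry))
    return best if osa_distance(word, best) <= max_distance else None


KNOWN_TERMS = ["dinosaurs", "database", "dictionary"]
print(suggest("Dinosayrs", KNOWN_TERMS))
```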
You can also try SUBSTR() to compare only the first 3 or so characters of each name. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE SUBSTR(Table1.Fname, 1, 3) || SUBSTR(Table1.Lname, 1, 3) = SUBSTR(Table2.Fname, 1, 3) || SUBSTR(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;

Good SQL search tool?

FreeTextTable is really great for searching, as it actually returns a relevancy score for each item it finds.
The problem is, it doesn't support the logical operator AND, so if I have 10 items with the word 'ice' in it, but not 'cream', and vice versa, then 20 results will be returned, when in this scenario 0 should've been returned.
Are there any alternative tools to search a SQL Server database? Or should I just write my own code to provide 'AND' functionality (i.e. doing two separate searches in the scenario 'Ice Cream', splitting each search by spaces)?
You can try SQL Search from RedGate.
It is a free tool (though not open source) - I have used it before and it is very powerful.
There is also a free SQL Search tool from ApexSQL you can try. It integrates into SSMS and can also show relationship diagrams and help with safely removing/renaming objects in your database. They do require you to leave email but the product itself is completely free. ApexSQL Search
Since you have full text search enabled to use FREETEXTTABLE perhaps you could make use of CONTAINS instead? (I have to be honest, I've not used full text search myself).
It would appear you can query like this:
SELECT Name, Price FROM Product
WHERE CONTAINS(Name, 'ice')
AND CONTAINS(Name, 'cream')
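If the search text comes from a user, the split-by-spaces idea from the question can be combined with CONTAINS by building a single search condition in application code. The helper below is a hypothetical Python sketch: `contains_condition` is an invented name, and it assumes the resulting string is passed as a parameter to a query of the form `WHERE CONTAINS(Name, @condition)`.

```python
def contains_condition(user_input):
    """Turn 'ice cream' into '"ice" AND "cream"' for use as a CONTAINS condition."""
    # Drop double quotes so a term cannot break out of its quoted form.
    terms = [term.replace('"', "") for term in user_input.split()]
    terms = [term for term in terms if term]
    return " AND ".join(f'"{term}"' for term in terms)


print(contains_condition("ice cream"))
```

An empty result means there is nothing to search for, so the caller would need to skip the full-text predicate in that case rather than pass an empty string.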

Can you perform an AND search of keywords using FREETEXT() on SQL Server 2005?

There is a request to make the SO search default to an AND style functionality over the current OR when multiple terms are used.
The official response was:
not as simple as it sounds; we use SQL Server 2005's FREETEXT() function, and I can't find a way to specify AND vs. OR -- can you?
So, is there a way?
There are a number of resources on it I can find, but I am not an expert.
As far as I've seen, it is not possible to do AND when using FREETEXT() under SQL 2005 (nor 2008, afaik).
A FREETEXT query ignores Boolean, proximity, and wildcard operators by design. However, FREETEXT is itself a predicate, so you could try combining two of them (body is a placeholder column name):
WHERE FREETEXT(body, 'You gotta love MS-SQL')
AND FREETEXT(body, 'You gotta love MySQL too...')
Or that's what I think :)
-- The idea is that each call evaluates to a Boolean, so you can use Boolean operators between them. I haven't tested this. The words inside each call are still matched loosely, and the reference material points to the fact that AND within a single FREETEXT is not possible by design.
The use of CONTAINS() instead of FREETEXT() could help.
OK, this change is in -- we now use CONTAINS() with implicit AND instead of FREETEXT() and its implicit OR.
I just started reading about freetext so bear with me. If what you are trying to do is allow searches for a tag, say VB, also find things tagged as VB6, Visual Basic, VisualBasic and VB.Net, wouldn't those values be set as synonyms in the DB's Thesaurus rather than query parameters?
If that is indeed the case, this link on MSDN explains how to add items to the Thesaurus.