Using Regex to determine what kind of SQL statement a row is from a list? - sql

I have a large list of SQL commands such as
SELECT * FROM TEST_TABLE
INSERT .....
UPDATE .....
SELECT * FROM ....
etc. My goal is to parse this list into a set of results so that I can easily determine a good count of how many of these statements are SELECT statements, how many are UPDATES, etc.
so I would be looking at a result set such as
SELECT 2
INSERT 1
UPDATE 1
...
I figured I could do this with regex, but I'm a bit lost beyond simply checking whether each string starts with 'SELECT', and that approach can run into multiple issues. Is there a better way to do this using regex?

You can add the SQL statements to a table and run them through a SQL query. If the SQL text is in a column called SQL_TEXT, you can get the SQL command type using this:
upper(regexp_substr(trim(regexp_replace(SQL_TEXT, '\\s', ' ')),
'^([\\w\\-]+)')) as COMMAND_TYPE

You'll need to do some clean up to create a column that indicates the type of statement you have. The rest is just basic aggregation
with cte as
(select *, lower(split_part(trim(regexp_replace(col, '\\s', ' ')), ' ', 1)) as statement
from t)
select statement, count(*) as freq
from cte
group by statement;

SQL is a language and needs a parser to turn it from text into a structure. Regular expressions can only do part of the work (such as lexing).
Regular Expression Vs. String Parsing
You will have to limit your ambition if you want to restrict yourself to using regular expressions.
Still you can get some distance if you so want. A quick search found this random example of tokenizing MySQL SQL statements using regex https://swanhart.livejournal.com/130191.html
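As a rough illustration of how far a regex alone can go for the counting task in the question, here is a minimal Python sketch. The function names and the sample list are made up for this example. It only skips leading whitespace and comments and then reads the first word, so a statement that starts with WITH is counted as WITH, and a string holding several statements is counted once.

```python
import re
from collections import Counter

# Skip leading whitespace, "--" line comments and "/* ... */" block comments,
# then capture the first word of the statement.
LEADING_KEYWORD = re.compile(
    r"\A(?:\s|--[^\n]*(?:\n|\Z)|/\*.*?\*/)*([A-Za-z]+)",
    re.DOTALL,
)


def statement_type(sql):
    """Return the upper-cased leading keyword, or 'UNKNOWN' if none is found."""
    match = LEADING_KEYWORD.match(sql)
    return match.group(1).upper() if match else "UNKNOWN"


def count_statement_types(statements):
    """Tally statements by their leading keyword."""
    return Counter(statement_type(s) for s in statements)


# Hypothetical sample input, not taken from the question.
sample = [
    "SELECT * FROM TEST_TABLE",
    "  insert into t values (1)",
    "-- fix names\nUPDATE t SET a = 1",
    "/* report */ select 1",
]
counts = count_statement_types(sample)
for keyword, freq in counts.most_common():
    print(keyword, freq)
```

Anything beyond the first keyword (for example, telling a CTE that feeds an INSERT from one that feeds a SELECT) is where a real parser becomes necessary.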

Related

How to manipulate multi-value string parameters for a SQL Command in Crystal Reports

I have a Crystal Report based on a SQL Command that, in part, consists of procedure names. What I'm trying to do is add a multi-value string parameter to the SQL Command such that the end users can enter partial procedure names and the report will return only those relevant procedures that string match.
For example, a user should be able to enter "%KNEE%" and "%HIP%" into the parameter and return all procedures that contain the words "KNEE" and "HIP". The problem is that I can't figure out how to manipulate the parameter value in the SQL to accomplish this. I've done this before with a report parameter (as opposed to a SQL Command parameter) by simply adding the line {table.procedure_name} like {?name match parameter} to the record selection formula, but taking the same approach in the SQL Command gets me an "ORA-00907: Missing right parenthesis" error.
Any suggestions on how I can manipulate the multi-value string parameter to accomplish this?
I don't like to post this as an answer because I don't care for the solution; however, it is the only way I have found to work around this.
I have had to instruct users to enter '%KNEE%','%HIP%','%ETC%' at the parameter prompt. Then the {table.procedure_name} like {?name match parameter} should work in your SQL. Not optimal, especially for your scenario with the %. I would love to hear someone provide a better solution because I have wrestled with this many times.
Here's an approach:
SELECT column0
FROM table0
INNER JOIN (
SELECT trim('%' || x.column_value.extract('e/text()') || '%') SEARCH
FROM ( SELECT 'arm,knee' options FROM dual ) t,
TABLE (xmlsequence(xmltype('<e><e>' || replace(t.options,',','</e><e>')|| '</e></e>').extract('e/e'))) x
) v ON column0 LIKE v.search
Use Oracle's XML functionality to convert a comma-delimited string to an equivalent number of rows, wrapping each clause with %%. Then join those rows to the desired table.
To use with CR, create a single-value, string parameter and add it to the code:
...
FROM ( SELECT '{?search_param}' options FROM dual ) t,
...

Sub-Queries in Sybase SQL

We have an application which indexes data using user-written SQL statements. We place those statements within parentheses so we can limit that query to certain criteria. For example:
select * from (select F_Name from table_1)q where ID > 25
However, we have discovered that this format does not work with a Sybase database: it reports a syntax error around the parentheses. I've tried playing around on a test instance but haven't been able to find a way to achieve this result. I'm not directly involved in the development and my SQL knowledge is limited. I'm assuming the 'q' is there to give the subquery result an alias for the application to use.
Does Sybase have a specific syntax? If so, how could this query be adapted for it?
Thanks in advance.
Sybase ASE is case sensitive with respect to all identifiers, so the query should work once the case matches.
As per @HannoBinder's comment:
select id from ... is not the same as select ID from ..., so make sure the case is right.
Also make sure that the column ID is returned by the Q subquery so that it can be used in the where clause.
If the table and column names are in upper case, the following query should work:
select * from (select F_NAME, ID from TABLE_1) Q where ID > 25

Replacement for 'OR' in SphinxQL

I'm currently trying to integrate the Sphinx search engine into a Python application. The problem is that SphinxQL doesn't support the OR clause as standard SQL does. There are some hacks to use, like writing expressions in SELECT like this:
SELECT id,(field1 = val1 OR field2 = val2) as expr FROM foo_bar WHERE expr = 1;
However, it doesn't work with strings, because they should be handled using MATCH function. So I decided to divide query into separate subqueries and combine results obtained. Yet there's still a problem of getting a proper META information, especially the total_found field. Sphinx counts it for separate queries, but rows obtained from these queries may intersect and I have no ability to check it (database is large).
I believe there must be a solution. I'm using Sphinxit (SphinxAlchemy has a version conflict with SQLAlchemy I'm using).
Repost from SphinxSearch forum:
I have a table I need to search in with text and numerical columns as well. I need to
write a query with OR condition; found out that there's a way to do it using SELECT
expressions like:
SELECT *, quantity>=50 OR quantity=0 AS mycond FROM table1 WHERE mycond = 1;
Unfortunately, it doesn't work with string attributes. This query isn't parsed:
SELECT *, category='foo' OR category='bar' AS mycond FROM table1 WHERE mycond = 1;
Yet this is working in Beta 2.2.3:
SELECT * FROM table1 WHERE category='foo';
What should I do to find the count of rows that fit at least one of the conditions, not all of them?
I can run a few queries and merge the obtained items into one list, but I need to know how
many of these rows are in the database.
For attribute / facet OR'ing, I think you're correct that the only way is to put an expression in the SELECT clause.
For strings, though, check out the documentation on the fulltext query syntax. You can't exactly use the OR keyword, but something like this should work:
SELECT id, name
FROM recipes
WHERE MATCH('(@ingredients chocolate) | (@name cake)')
LIMIT 10;

Use LIKE in T-SQL to search for words separated by an unknown number of spaces

I have this query:
select * from table where column like '%firstword[something]secondword[something]thirdword%'
What do I replace [something] with to match an unknown number of spaces?
Edited to add: % will not work as it matches any character, not just spaces.
Perhaps somewhat optimistically assuming "unknown number" includes zero.
select *
from table where
REPLACE(column_name,' ','') like '%firstwordsecondwordthirdword%'
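Here is a small runnable check of the REPLACE approach, using Python's built-in sqlite3 module as a stand-in for SQL Server (REPLACE and LIKE behave the same way for this case). The table name notes, the column name body, and the sample rows are made up for the illustration. The last row shows a side effect worth knowing about: because every space is removed, a space inside one of the words is ignored too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?)",
    [
        ("firstword secondword   thirdword",),   # varying spaces: matches
        ("firstwordsecondwordthirdword",),       # zero spaces: matches
        ("firstword x secondword thirdword",),   # extra token: no match
        ("first word secondword thirdword",),    # space inside a word: also matches
    ],
)

# Strip all spaces from the column, then compare against the words run together.
rows = conn.execute(
    "SELECT body FROM notes "
    "WHERE REPLACE(body, ' ', '') LIKE '%firstwordsecondwordthirdword%' "
    "ORDER BY rowid"
).fetchall()
matched = [row[0] for row in rows]
for body in matched:
    print(repr(body))
```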
The following may help: http://blogs.msdn.com/b/sqlclr/archive/2005/06/29/regex.aspx
as it describes using regular expressions in SQL queries in SQL Server 2005
I would definitely suggest cleaning the input data instead, but this example may work when you call it as a function from the SELECT statement. Note that this will potentially be very expensive.
http://www.bigresource.com/MS_SQL-Replacing-multiple-spaces-with-a-single-space-9llmmF81.html

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
Just by removing redundant stuff and using a consistent naming convention your SQL looks way cooler:
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You talk about regular expressions. Some DBMS do have it: MySQL, Oracle... However, the choice of either syntax should take into account the execution plan of the query: "how quick it is" rather than "how nice it looks".
With MySQL, you're able to use regular expression where-clause parameters:
SELECT something FROM table WHERE column REGEXP 'regexp'
So if that's what you're using, you could write a regular expression string that is possibly a bit more compact than your four LIKE criteria. It may not be as easy for other people to see what the query is doing, however.
It looks like SQL Server offers a similar feature.
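For comparison, this is what the "one big alternation" idea from the question looks like outside the database, as a minimal Python sketch. The function name keep and the sample phrases are invented here; the same alternation string could be tried with a database's own regex operator, subject to that engine's syntax.

```python
import re

# One pattern for the required text, one alternation for everything to reject.
INCLUDE = re.compile(r"ing aids")
EXCLUDE = re.compile(r"getting|contracting|preventing")


def keep(phrase):
    """True if the phrase has the required text and none of the excluded words."""
    return bool(INCLUDE.search(phrase)) and not EXCLUDE.search(phrase)


# Hypothetical sample data.
phrases = [
    "hearing aids for seniors",
    "getting hearing aids",
    "preventing aids",
    "first aid kit",
]
kept = [p for p in phrases if keep(p)]
print(kept)
```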
Since it sounds like you're building this as you go to mine your data, here's something that you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
*
FROM
Phrases P
WHERE
EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
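The table-driven idea can be tried end to end with Python's built-in sqlite3 module, used here only as a convenient stand-in for whichever database you are on. The sample phrases are made up; the table and column names follow the statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phrases  (phrase VARCHAR(50) NOT NULL);
    CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL);
    CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL);

    INSERT INTO Includes VALUES ('%ing aids%');
    INSERT INTO Excludes VALUES ('%getting%');
    INSERT INTO Excludes VALUES ('%contracting%');
    INSERT INTO Excludes VALUES ('%preventing%');

    -- Hypothetical sample data for the demonstration.
    INSERT INTO Phrases VALUES ('hearing aids for seniors');
    INSERT INTO Phrases VALUES ('getting hearing aids');
    INSERT INTO Phrases VALUES ('preventing aids');
    INSERT INTO Phrases VALUES ('first aid kit');
""")

# The query never changes; only the rows in Includes and Excludes do.
QUERY = """
    SELECT P.phrase
    FROM Phrases P
    WHERE EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase)
      AND NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
    ORDER BY P.rowid
"""
result = [row[0] for row in conn.execute(QUERY)]
print(result)

# Refining the search is a data change, not a query change.
conn.execute("INSERT INTO Excludes VALUES ('%seniors%')")
refined = [row[0] for row in conn.execute(QUERY)]
print(refined)
```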
Depending on which database server you are using, it may support regex itself. For example, Google searches show that SQL Server, Oracle, and MySQL all support regex.
You could push all your negative criteria into a short-circuiting CASE expression (works in SQL Server; not sure about MS Access).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
WHEN phrase LIKE '%getting%' THEN 2
WHEN phrase LIKE '%contracting%' THEN 2
WHEN phrase LIKE '%preventing%' THEN 2
ELSE 1
END = 1
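On the "more efficient" side, you need to find some criterion that lets you avoid reading the entire Phrases column. A pattern with wildcards on both sides ('%abc%') is bad because the leading wildcard rules out an index seek. A pattern with a wildcard only on the right ('abc%') is good because an index on the column can be used.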