Remove parameters from SQL query with Regex / Logstash - sql

A third-party system I use logs all SQL queries along with row count and response time, which I then send to Logstash/Elastic to calculate metrics. As this system doesn't use bind variables, and there are tens of millions of queries a day, I need to be able to roll up the data, which I can't do if the majority of queries are unique. I need a way to replace the SQL query parameters with '?', as Oracle would do via cursor sharing.
i.e.
replace
'SELECT * FROM table_name WHERE id = 123'
with
'SELECT * FROM table_name WHERE id = ?'
I have access to Ruby scripting in Logstash, but unfortunately all of the Google results for 'sql regex' or similar explain how to use regular expressions in SQL, not the other way round. Before I go crafting a regular expression parser, I thought I would check here to see if others have tried to solve a similar problem.
FYI, I have looked at implementing a solution using a Ruby SQL AST library such as https://github.com/lfittl/pg_query, but plugging Ruby libraries into Logstash means writing a custom filter plugin to do the work. That may be the answer, but I'm hoping I'm missing something obvious.

I am not a logstash/ruby developer/user, but in terms of regular expression you may try this one:
(=\s\W\w+\W|=\s\d+)
You can test this here
SELECT * FROM Table1 WHERE Column1 = 1
SELECT * FROM Table1 WHERE Column1 = 'abc'
SELECT * FROM Table1 WHERE (Column1 = 'abc' OR Column2 = 1)
SELECT * FROM Table1 WHERE (Column1 = 'abc' AND Column2 = 1) OR Column2 = 'zxy'
SELECT * FROM Table1 WHERE (Column1 = 'abc' AND Column2 = 1) OR Column2 = 'zxy' AND
Column3 = 2
SELECT * FROM Table1 WHERE Column1 = 1 AND Column2 = 2
Expected Results:
Match 1
Full match = 1
Group 1. = 1
Match 2
Full match = 'abc'
Group 1. = 'abc'
Match 3
Full match = 'abc'
Group 1. = 'abc'
Match 4
Full match = 1
Group 1. = 1
Match 5
Full match = 'abc'
Group 1. = 'abc'
Match 6
Full match = 1
Group 1. = 1
Match 7
Full match = 'zxy'
Group 1. = 'zxy'
Match 8
Full match = 'abc'
Group 1. = 'abc'
Match 9
Full match = 1
Group 1. = 1
Match 10
Full match = 'zxy'
Group 1. = 'zxy'
Match 11
Full match = 2
Group 1. = 2
Match 12
Full match = 1
Group 1. = 1
Match 13
Full match = 2
Group 1. = 2
Based on these results you can create a function that replaces a matched value such as '= 2' with '= ?'.
Hope that it at least gives you a starting point.
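As a rough illustration of the replacement step, here is a minimal Python sketch. The names `LITERAL` and `normalize_sql` are made up for this example, and the pattern is a different one from the regex above: it assumes literals are either single-quoted strings (with '' as an escaped quote) or unsigned numbers, and it matches them anywhere in the query, not only after '='. The same pattern would probably port to Ruby's gsub inside a Logstash ruby filter, but that is untested here.

```python
import re

# Assumed literal syntax: single-quoted strings (with '' as an escaped quote)
# or unsigned integer/decimal numbers. Digits inside identifiers such as
# Table1 are left alone because of the leading \b word boundary.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize_sql(query: str) -> str:
    """Replace every string or numeric literal with '?'."""
    return LITERAL.sub("?", query)


if __name__ == "__main__":
    print(normalize_sql("SELECT * FROM Table1 WHERE (Column1 = 'abc' OR Column2 = 1)"))
```

A regex like this will not understand comments, quoted identifiers or dialect-specific literals, so it is only a starting point for rolling up similar queries.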

Related

Conditional for a field generated by a SELECT statement

SELECT
Req_PK,
Req_PostDate,
Req_code,
Req_CreateDate,
Req_FillDate,
Req_Canceldate,
Req_Hold,
(
Select Convert(varchar(50),Count(CanReq_PK))
From CanReq
Where CanReq_ReqFK = Req_PK) AS Applications,
Req_PublishstatusFK
FROM Req
WHERE Req_Filled <> 1
AND Req_Cancelled <> 1
AND Req_Template <> 1
AND Req_PublishstatusFK = 1
AND Req_publishstatusfk = 1
How do I modify this query so that it returns only the rows where the Applications field/alias is not '0', instead of every row returned by the SELECT statement?
Help is very much appreciated!
Use an explicit JOIN instead:
SELECT r.Req_PK, r.Req_PostDate, r.Req_code, r.Req_CreateDate, r.Req_FillDate,
r.Req_Canceldate, r.Req_Hold,
cr.num_Applications,
r.Req_PublishstatusFK
FROM Req r JOIN
(SELECT cr.CanReq_ReqFK, Count(*) as num_applications
FROM CanReq cr
GROUP BY cr.CanReq_ReqFK
) cr
ON cr.CanReq_ReqFK = r.Req_PK
WHERE r.Req_Filled <> 1 AND r.Req_Cancelled <> 1 AND r.Req_Template <> 1 AND
r.Req_PublishstatusFK = 1 AND r.Req_publishstatusfk = 1;
This will filter out any rows that don't have a record in CanReq -- the ones that would have a count of 0 in your version of the query.
I don't know why you would want the count to be a string. Of course, you can include the conversion if you need it, but it doesn't seem necessary to me.
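To see the filtering effect of the inner join, here is a small sketch using Python's sqlite3 module with made-up rows. SQLite stands in for the original database, and the tables are cut down to the columns needed for the join, so treat it as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Req (Req_PK INTEGER PRIMARY KEY, Req_code TEXT);
    CREATE TABLE CanReq (CanReq_PK INTEGER PRIMARY KEY, CanReq_ReqFK INTEGER);
    INSERT INTO Req VALUES (1, 'A'), (2, 'B');
    -- Only requisition 1 has applications; requisition 2 has none.
    INSERT INTO CanReq VALUES (10, 1), (11, 1);
""")

# The inner join drops requisition 2 because the grouped subquery
# produces no row for it.
rows = conn.execute("""
    SELECT r.Req_PK, cr.num_applications
    FROM Req r
    JOIN (SELECT CanReq_ReqFK, COUNT(*) AS num_applications
          FROM CanReq
          GROUP BY CanReq_ReqFK) cr
      ON cr.CanReq_ReqFK = r.Req_PK
    ORDER BY r.Req_PK
""").fetchall()
print(rows)
```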

SQL Implementing 'IN' operator with 'OR'

I am working on a legacy system which has a custom Java implementation for generating SQL queries, and it doesn't support the 'IN' operator.
To implement 'IN' I have written something like
SELECT * from Q
WHERE IS_HIDDEN = 0 AND ID = 1
OR ID = 2 OR ID = 3 AND IS_DELETED = 0;
I know that the one like below would have been fine.
SELECT * from Q
WHERE IS_HIDDEN = 0 AND (ID = 1
OR ID = 2 OR ID = 3) AND IS_DELETED = 0 ;
Both of these return the same result, but I'm not too confident about SQL operator precedence. I had read that AND takes precedence.
Is it safe to assume that both SQL statements are equivalent?
The actual query that I wanted to write is
SELECT * from Q
WHERE IS_HIDDEN = 0 AND ID IN(1, 2, 3) AND IS_DELETED = 0;
The DB in question is oracle 10g.
Update: The reason that this was working is because the oracle CBO rearranges the subclauses in the where clause.
No, your queries are not the same.
SELECT * from Q
WHERE IS_HIDDEN = 0 AND ID = 1
OR ID = 2 OR ID = 3 AND IS_DELETED = 0;
is like
SELECT * FROM Q WHERE IS_HIDDEN = 0 AND ID = 1
UNION
SELECT * FROM Q WHERE ID = 2
UNION
SELECT * FROM Q WHERE ID = 3 AND IS_DELETED = 0
When you put parentheses around your ORs, you get the same result as with the IN clause.
You can try it: SQLFiddle
Your second query (the one with parentheses) is equivalent to the IN version. You should use that.
Your first query is evaluated like this:
SELECT * from Q
WHERE (IS_HIDDEN = 0 AND ID = 1) OR ID = 2 OR (ID = 3 AND IS_DELETED = 0);
If IS_HIDDEN is 1 or IS_DELETED is 1, but ID is 2, your query will still return records. Try it.
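The difference is easy to reproduce with a few rows. The sketch below uses Python's sqlite3 module instead of Oracle 10g, with made-up data and a hypothetical helper `ids`. AND binds more tightly than OR in standard SQL, so the outcome should carry over, but that is an assumption about your system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Q (ID INTEGER, IS_HIDDEN INTEGER, IS_DELETED INTEGER);
    -- Row 2 is both hidden and deleted, so the IN version must not return it.
    INSERT INTO Q VALUES (1, 0, 0), (2, 1, 1), (3, 0, 0), (4, 0, 0);
""")


def ids(where):
    """Return the IDs matched by the given WHERE clause (trusted input only)."""
    return [r[0] for r in conn.execute(f"SELECT ID FROM Q WHERE {where} ORDER BY ID")]


no_parens = ids("IS_HIDDEN = 0 AND ID = 1 OR ID = 2 OR ID = 3 AND IS_DELETED = 0")
with_parens = ids("IS_HIDDEN = 0 AND (ID = 1 OR ID = 2 OR ID = 3) AND IS_DELETED = 0")
with_in = ids("IS_HIDDEN = 0 AND ID IN (1, 2, 3) AND IS_DELETED = 0")

print(no_parens, with_parens, with_in)
```

With this data the unparenthesized query also returns the hidden, deleted row 2, while the parenthesized query and the IN query agree.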

SQL query to retrieve value or catch all

I'm dealing with a table in the following form:
A B
------ -----------
1 value1
2 value2
3 value3
-1 value4
In this table, the value -1 indicates a catch all, if there's no other match for A in the column. This means, a query for A = 2 should return a single record for which value2 is the value of column B. If the table is queried for, let's say, A = 6, then the value for B should be value4 (because it's the value associated with the catch all).
What's the "best" query to achieve this? Is there a better solution? I've scripted a small setup example in SQLFiddle, if that helps.
The database is SQL Server.
Can you help? Many thanks.
select top (1) A, B
from (
select A, B, 0 as priority from t where A = @value
union
select A, B, 1 from t where A = -1
) foo
order by priority
DECLARE @param int
SET @param = 6
SELECT *
FROM test
WHERE a = @param OR (
a = -1 AND @param NOT IN (SELECT a FROM test)
)
Replace @param = 6 with @param = 2 to test again
Here's how I would write the query:
SELECT TOP 1 B
FROM mytable
WHERE A IN (2,-1)
ORDER BY CASE WHEN A = -1 THEN 1 ELSE 0 END, B;
This statement has a hardcoded search argument of "2" (as in your example).
You would substitute your search argument in place of the hardcoded "2", obviously.
The CASE expression in the ORDER BY clause makes sure that the "catch all" value of -1 is a "last resort": the -1 will be the LAST value in the list, and any other value will come BEFORE the -1 in the sort.
A expr
-- ----
-2 0
-1 1
0 0
1 0
2 0
So, when I ORDER BY expr we are guaranteed that a value of -1 will be LAST in the list. After the resultset is sorted, the TOP 1 will return no more than 1 row. So, the value associated with the -1 "catch all" value will only be returned if no other matching value is found.
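Here is a small runnable sketch of the same idea, using Python's sqlite3 module with the sample rows from the question. SQLite has no TOP, so LIMIT 1 is used in its place, and the helper name `lookup` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (A INTEGER, B TEXT);
    INSERT INTO mytable VALUES
        (1, 'value1'), (2, 'value2'), (3, 'value3'), (-1, 'value4');
""")


def lookup(a):
    """Return B for the given A, falling back to the catch-all row (A = -1)."""
    row = conn.execute("""
        SELECT B FROM mytable
        WHERE A IN (?, -1)
        ORDER BY CASE WHEN A = -1 THEN 1 ELSE 0 END, B
        LIMIT 1
    """, (a,)).fetchone()
    return row[0] if row else None


print(lookup(2), lookup(6))
```

Passing the search argument as a bound parameter avoids editing the hardcoded value in the SQL text.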
In MySQL, this would be trivial:
SELECT b FROM test WHERE a = @var or a = -1 ORDER BY a DESC LIMIT 1;
However, MSSQL doesn't have such a trick built in--you would need to add a stored procedure to limit it properly.
EDIT
It seems that in 2005, they added some paging functions:
SELECT TOP 1 b FROM test WHERE a = @var or a = -1 ORDER BY a DESC;
should work.
All that said, this sounds like an issue of poor design; I would look at the application that needs this and see if there wasn't a cleaner way to achieve a default value.
if exists(select * from YourTable WHERE A=6)
select * from YourTable WHERE A=6
else
select * from YourTable WHERE A<0

In SQL, what does using parentheses with an OR mean?

Example:
select count(*) from my table
where
column1 is not null
and
(column1 = 4 OR column1 = 5)
Example 2:
select count(*) from my table
where
column1 is not null
and
column1 = 4 OR column1 = 5
In my database with the real column names, I get two different results. The one with the parentheses is right because if I do:
select count(*) from my table
where
column1 is not null
and
column1 = 4
and then
select count(*) from my table
where
column1 is not null
and
column1 = 5
and add them together, I get the right answer...I think. Same as the first example with the parentheses above.
Why do I get different results by changing precedence with the OR test?
It's not Oracle or SQL. It's basic boolean logic. The AND condition is "stronger" (has precedence) than OR, meaning it will be evaluated first:
column1 is not null
and
column1 = 4 OR column1 = 5
Means
column1 is not null
and
column1 = 4
is evaluated first, then OR is applied between this and column1 = 5
Adding parentheses ensures OR is evaluated first and then the AND.
Pretty much like in maths:
2 * 3 + 5 = 6 + 5 = 11
but
2 * (3 + 5) = 2 * 8 = 16
More reading here: http://msdn.microsoft.com/en-us/library/ms190276.aspx
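The precedence rule can be checked directly by evaluating a bare boolean expression. This sketch uses Python's sqlite3 module, where 1 and 0 stand in for true and false. The helper name `evaluate` is made up, and other databases may need a different way to select a bare boolean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a constant SQL expression and return its value (trusted input only)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


implicit = evaluate("1 OR 0 AND 0")      # no parentheses
and_first = evaluate("1 OR (0 AND 0)")   # AND grouped explicitly
or_first = evaluate("(1 OR 0) AND 0")    # OR grouped explicitly

print(implicit, and_first, or_first)
```

The unparenthesized expression gives the same value as the version that groups the AND first, which matches the explanation above.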
This comes down to whether your expression is parsed as:
(column1 is not null and column1 = 4) OR column1 = 5
or
column1 is not null and (column1 = 4 OR column1 = 5)
See the difference?
Parentheses matter: (A AND B) OR C ≠ A AND (B OR C)
just like in math: (0 * 1) + 2 ≠ 0 * (1 + 2)
However, you cannot drop the parentheses and rely on left-to-right evaluation: SQL does have operator precedence rules, and AND binds more tightly than OR. For instance:
true OR false AND false
is true, just like
true OR (false AND false)
while
(true OR false) AND false
is false.

Can I check that two sub query return the same result without using stored procedure?

I have a table like this:
ObjId OtherObjId active (bool)
1 5 0
1 7 1
1 9 0
2 6 0
...
...
...
54 5 0
54 7 1
54 9 0
These two queries return an identical result:
select OtherObj2,active from MyTable where ObjId1 = 1;
select OtherObj2,active from MyTable where ObjId1 = 54;
I would like to run a single query that returns true if the two query results are identical, or false if they are not.
The table is a configuration table and I want an easy way to test whether two objects have the same configuration. I can perform the check using a stored procedure, but I would like to avoid that.
I cannot think of a way to check it using a query, and I am wondering if it is possible.
Any hints?
Thanks for your help.
Don't know if your DBMS supports except but if it does you can use this.
select OtherObjId, active
from MyTable
where ObjId = 1
except
select OtherObjId, active
from MyTable
where ObjId = 54
You could use MINUS:
(select OtherObj2,active from MyTable where ObjId1 = 1
minus
select OtherObj2,active from MyTable where ObjId1 = 54
)
union
(select OtherObj2,active from MyTable where ObjId1 = 54
minus
select OtherObj2,active from MyTable where ObjId1 = 1
)
returns no rows if the result is the same.
(You need both minuses to see if query1 has more rows than query2 or if query2 has more rows than query1)
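To get a single true/false answer out of the two-way difference, one option is to wrap it in NOT EXISTS. Below is a minimal sketch using Python's sqlite3 module with the sample rows from the question. SQLite spells the operator EXCEPT and does not accept parenthesized compound selects, so each difference is wrapped in a subquery. The names `SAME_CONFIG` and `same_config` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ObjId INTEGER, OtherObjId INTEGER, active INTEGER);
    INSERT INTO MyTable VALUES
        (1, 5, 0), (1, 7, 1), (1, 9, 0),
        (2, 6, 0),
        (54, 5, 0), (54, 7, 1), (54, 9, 0);
""")

# True when neither "a minus b" nor "b minus a" yields a row.
SAME_CONFIG = """
    SELECT NOT EXISTS (
        SELECT * FROM (
            SELECT OtherObjId, active FROM MyTable WHERE ObjId = :a
            EXCEPT
            SELECT OtherObjId, active FROM MyTable WHERE ObjId = :b
        )
        UNION ALL
        SELECT * FROM (
            SELECT OtherObjId, active FROM MyTable WHERE ObjId = :b
            EXCEPT
            SELECT OtherObjId, active FROM MyTable WHERE ObjId = :a
        )
    )
"""


def same_config(a, b):
    """Return True if objects a and b have the same (OtherObjId, active) rows."""
    return bool(conn.execute(SAME_CONFIG, {"a": a, "b": b}).fetchone()[0])


print(same_config(1, 54), same_config(1, 2))
```

Note that EXCEPT compares sets of rows, so duplicate rows within one object's configuration would not be detected by this check.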
If your database supports set operations, use EXCEPT (AKA MINUS):
select OtherObj2,active from MyTable where ObjId1 = 1
EXCEPT
select OtherObj2,active from MyTable where ObjId1 = 54
This will result in an empty table if both are identical.
You could try something like this. If it returns any records, the query results were not identical. If it returns 0 records, there were no mismatches.
select
OtherObjId,
Active
from
MyTable
where
ObjId in (1, 54)
group by
OtherObjId,
Active
having COUNT(*) < 2