SQL purpose of colon - sql

My colleague wrote this SQL statement and I had a hard time understanding it. What exactly is the purpose of using a colon in the where clause?
WHERE MGM_YYMM like :AS_YYMM
Full Query:
SELECT A.MGM_YYMM,
A.MGM_DATE,
A.MGM_GB,
A.INDATE,
A.SUDATE,
A.EMPNUM,
FROM SE_MAGAM A(NOLOCK)
WHERE MGM_YYMM like :AS_YYMM
ORDER BY MGM_YYMM DESC

It is a bind variable.
The program (or whatever else is issuing the query) will assign a value to :AS_YYMM, in this case the pattern to match against the MGM_YYMM column.
These kind of parameterized queries are useful because they can be prepared/parsed/compiled/analyzed once and then be run multiple times for varying inputs with reduced overhead (compared to a new query each time). Also helps against SQL injection (compared to building a dynamic SQL statement from user input).

Related

how to separate the parameters in the sql query and push it in to array to avoid sql injection

SELECT * FROM table1 WHERE year_month BETWEEN '2021-08' AND '2022-01';
update table2 set note_description = 'test #8:57am', patient_id = '5840', note_updated_by = '10000019', note_update_date = '2022-07-13 09:45:49' where note_id = '639'
now my backend queries can be attacked by sql injection so i want to avoid the sql injection
in the above queries I want to separate the parameters from queries and replace it with special characters so that I can avoid sql injection is there any package or anything to do it.
If you have received the SQL statement with the parameters already concatenated in, then this is the wrong place to fix your issue - there’s no way to safely parse the statement and separate out the parameters from the query.
You should find the place in the code where the parameters are concatenated into the Statement and leveraging Prepared Statements/Parameterized Queries to safely pass/bind the parameters.
If that’s not possible (for example because the code is structured to only pass along the statement) a less desirable alternative is to encode/enquote the parameters before concatenating them in, while ensuring they are all quoted in the statement. How you do that part will depend on the database / language being used.
I've seen one product that does this: pt-query-digest. It's a free tool that parses the MySQL query log, and produces reports of aggregate time spent running each query. To do this, it must establish a query "fingerprint" which allows it to group queries that are the same except for constant values. Like SELECT * FROM mytable WHERE id = 123 has the same fingerprint as SELECT * FROM mytable WHERE id = 456.
This means it must parse the queries and replace each constant value, like a numeric or string literal, with a placeholder ?. In cases of IN() predicates, it replaces the list of values with ?+. Also it reduces whitespace and removes comments.
It's a non-trivial amount of code, about 100 lines of Perl: https://github.com/percona/percona-toolkit/blob/3.x/lib/QueryRewriter.pm#L139-L248
In spite of this, the function is preceded by a comment that the developers acknowledge it is not perfect, and may miss some cases. Implementing a recursive-descent parser using regular expressions is not efficient or correct.
But this is probably not what you want to do anyway. You shouldn't be starting from a query with constant values and making them into a parameterized query. You should design parameterized queries yourself, as needed.
Not every constant value in an SQL query necessarily must be parameterized. Only the ones that aren't fixed values. That is, if you need to combine a variable from your client code into the SQL query string, and you can't guarantee that the variable is safe, then use a parameter. If a query has a constant value that is fixed (not interpolated from a variable), then it can remain in the query. If a query has a value that comes from a variable, but that variable is known to be safe, and never can be tainted by untrusted input, then it can remain in the query.
It's more reliable and economical for you to make these judgments. You know the code and the context much better than any automated system can.

why would you use WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something, when no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
TO improve performance, use column indexes on fields listen in "WHERE"
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection since you know it's only four parameters is use a stored procedure where you pass values for the fields or NULL. I am not sure of mySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(#Field1, Field1)
AND Field2 = ISNULL(#Field2, Field2)
...
ORDRE BY ordering_fld
We've been doing something similiar not too long ago and there're a few things that we observed:
Setting up the indexes on the columns we were (possibly) filtering, improved performance
The WHERE 1 part can be left out completely if the filters're not used. (not sure if it applies to your case) Doesn't make a difference, but 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, i didn't know this snippet to get rid of the 'is it the first filter 3' question.
Tho you should be ashamed of your code ( ^^ ), it doesn't do anything to performance as any DB Engine will optimize it.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (#param IS NULL OR t.column = #param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (#param IS NULL OR t.column = #param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for(int i=0; i<numConditions; i++) {
sql += (i == 0 ? "WHERE " : "AND ");
sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative i sometimes use is to build the where clause an an array and then join them together:
my #wherefields;
foreach $c (#conditionfields) {
push #wherefields, "$c = ?",
}
my $sql = "select * from table";
if(#wherefields) { $sql.=" WHERE " . join (" AND ", #wherefields); }
The above is written in perl, but most languages have some kind of join funciton.

find match between 2 sql queries

I have two queries from v$sqlarea. For example
query 1: select * from employee emp where emp.eid = 5
query 2: select * from employee v where v.eid = 15
Both are exactly the same in structure. but they will be compiled separately each time..
I need to match such queries that vary only by alias names or bind variables.
The inbuilt function utl_match.jaro_winkler_similarity() in oracle provides a pretty good string match algorithm. But sql perspective match is not provided. Is there any other solution?
there is a script on Asktom that will find exactly this kind of statements (statements NOT using binds).
The output of that last query will show you statements that are identical in the shared
pool after all numbers and character string constants have been removed. These
statements -- and more importantly their counts -- are the potential bottlenecks. In
addition to causing the contention, they will be HUGE cpu consumers.
Oracle actually does this internally, to support cursor_sharing = similar. I am not aware that they expose this functionality anywhere.
The alias names is a tricky one. You could look for SQLs with the same PLAN_HASH_VALUE as a starting point.