How to search for matches with optional special characters in SQL? - sql

I have a problem with a search query I am using my data contains names and information that has apostrophes in various forms (HTML encoded and actual).
So for example I would like an alternative to this:
SELECT * FROM Customers WHERE REPLACE(LastName,'''','')
LIKE Replace('O''Brien,'''','')
This is just an example, what I want is a way where if someone types OBrien or O'Brien this will still work, I need to replace three versions of the character and the data is feed sourced and cannot be changed - what can be done to a query to allow for this kind of search to work.
I have Items with names which work this way which currently have many nested REPLACE functions and cannot seem to find something that will work this way, which is more efficient.
I am using MS SQL 2000 with ASP if that helps.
Edit
Here is the query that needs to match O'Brien or OBrien, this query does this but is too inefficient - it is joined by another for Item Names and FirstName (optional) for matching.
SELECT * FROM Customers
WHERE
REPLACE(REPLACE(REPLACE(LastName,''',''),''',''),'''','')
LIKE
REPLACE(REPLACE(REPLACE('%O'Brien%',''',''),''',''),'''','')

If you want to stay correct and do this in SQL this is probably the best you can do
SELECT * FROM Customers WHERE
LastName LIKE 'O%Brien' AND
REPLACE(LastName,'''','') LIKE 'O''Brien'
You will still get table scans sometimes, due to poor selectivity.
The reason for the first where is to try to use an existing index.
The reason for the second match is to ensure that last names like ObbBrien do not match.
Of course the best thing to do would be not to need the ugly replace. This could be achieved in the app by storing an additional clean lastname column. Or in a trigger. Or in an indexed view.

You could try this:
SELECT *
FROM Customers
WHERE LastName LIKE Replace('O''Brien,'''','%')
This should allow it to use an index as you are not modifying the original column.

For pure SQL, the escaping is entirely unnecessary.
SELECT * FROM Customers WHERE LastName = 'O''Brien'

Use parameters instead of building the queries in code.
If you are using ADO you can use a syntax like this:
Dim cmd, rs, connect, intNumber
Set cmd = Server.CreateObject("ADODB.Command")
cmd.ActiveConnection = "your connectionstring"
cmd.CommandText = "SELECT * FROM Customers WHERE LastName LIKE #LastName"
cmd.Parameters.Append cmd.CreateParameter("#LastName",,,,"O'Brien")
Set rs = cmd.Execute
This should perform the query and insert the string O'Brien properly formatted for your database.
Using parameters ensures that all values are properly formatted and it also protects you against sql injection attacks.

Related

Using SilverStripe's SQLQuery object without selecting * fields

Looking at the documentation, there are fairly good instructions on how to build up a SQL query.
My code looks like this:
$sqlQuery = new SQLQuery();
$sqlQuery->setFrom('PropertyPage')->selectField('PropertyType')->setDistinct(true);
I'm aiming to get the following SQL:
SELECT DISTINCT PropertyType FROM PropertyPage
but instead I'm getting back this:
SELECT DISTINCT *, PropertyType FROM PropertyPage
Even their own example seems to give back a 'SELECT DISTINCT *. How can I avoid this?
Why do you use SQLQuery directly?
With pure ORM it should go like this:
$result = PropertyPage::get()->setQueriedColumns(array('PropertyType'))->distinct();
which returns a DataList you can loop over.
See setQueriedColumns() and distinct()
What version of SS framework are you using? Distinct was added in 3.1.7 iirc.
Just to add to #wmk's answer as well as directly address how to do it with SQLQuery, your call to selectField is the reason why that query happened. The documentation to selectField shows that it adds an additional field to select, not restrict the list to just that field.
The reason why their examples (A and B) also had the issue is that the first parameter for the constructor for SQLQuery is a default value of "*" for select statement.
To still use your code as a base, replace your use of selectField with setSelect. Your query will then look like this:
SELECT DISTINCT PropertyType FROM PropertyPage
It isn't bad to directly query the database using SQLQuery especially if you really just want the raw data of a particular column or the result itself cannot be expressed against a DataObject (eg. using column aliases). You would also have a small performance improvement that PHP would not have to instantiate a DataObject.
While saying that, it can be far more useful having a DataObject and the various functions that it exposes per record.

SQL wildcard issue

I have a database which can be modified by our users through an interface. For one field (companyID) they should have the ability to place an asterisk in the string as a wildcard character.
For example, they can put in G378* to stand for any companyID starting with G378.
Now on my client program I'm providing a "full" companyID as a parameter:
SELECT * FROM table WHERE companyID = '" + myCompanyID + "'
But I have to check for the wildcard, is there anything I can add to my query to check for this. I'm not sure how to explain it but it's kinda backwards from what I'm used to. Can I modify the value I provide (the full companyID) to match the wildcard value from in the query itself??
I hope this maked sense.
Thanks!
EDIT: The user is not using SELECT. The user is only using INSERT or UPDATE and THEY are the ones placing the * in the field. My program is using SELECT and I only have the full companyID (no asterisk).
This is a classic SQL Injection target! You should be glad that you found it now.
Back to your problem, when users enter '*', replace it with '%', and use LIKE instead of = in your query.
For example, when end-users enter "US*123", run this query:
SELECT * FROM table WHERE companyID LIKE #companyIdTemplate
set #companyIdTemplate parameter to "US%123", and run the query.
I used .NET's # in the example, but query parameters are denoted in ways specific to your hosting language. For example, they become ? in Java. Check any DB programming tutorial on use of parameterized queries to find out how it's done in your system.
EDIT : If you would like to perform an insert based on a wildcard that specifies records in another table, you can do an insert-from-select, like this:
INSERT INTO CompanyNotes (CompanyId, Note)
SELECT c.companyId, #NoteText
FROM Company c
WHERE c.companyId LIKE 'G378%'
This will insert a record with the value of the #NoteText parameter into CompanyNotes table for each company with the ID matching "G378%".
in TSQL I would use replace and like. ie:
select * from table where companyid like replace(mycompanyid,'*','%');
This is somewhat implementation dependant and you did not mention which type of SQL you are dealing with. However, looking at MS SQL Server wildcards include % (for any number of characters) or _ (for a single character). Wildcards are only evaluated as wildcards when used with "like" and not an = comparison. But you can pass in a paramater that includes a wildcard and have it evaluated as a wildcard as long as you are using "like"

why would you use WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection

VB.NET - Having some trouble with a parametized queries

I built a prototype system with some database queries. Since it was just a prototype and I was very new to databases, I just used a direct string. This is the string I used and it works fine:
command = New OleDbCommand("SELECT * FROM " + prefix + "CanonicForms WHERE Type=1 AND Canonic_Form='" + item + "'", dictionary_connection)
Now, in putting it in a real system, I wanted to use the more secure parametized method, so after some googling I came up with this:
command = New OleDbCommand("SELECT * FROM #prefix WHERE Type=1 AND Canonic_Form=#form", dictionary_connection)
command.Parameters.AddWithValue("#prefix", prefix + "CanonicForms")
command.Parameters.AddWithValue("#form", item)
But all I get is an error for an incomplete query clause. What have I done differently between the two?
Your table name can't be a parameter. You might have to do some form of concatenation. It's not really a parameter in the formal sense of the word.
As ek_ny states, your table name can not be a parameter.
If you are really paranoid about injection then check the passed in table name against a white list of allowable values prior to buidling up the SQL string.
#cost, I agree that it would be nice but its just not a feature that exists. Generally its assumed that your WHERE clauses are the dynamic portion of the query and everything else, including the SELECT list and the table name, are static. In fact, most of the WHERE clause isn't even dynamic, only the search values themselves. You can't dynamically add column names this way either. This could definitely be built but it would require that the query engine be more aware of the database engine which is a can of worms that Microsoft didn't feel like opening.
Imagine a drop down menu with table names such as 'Jobs', 'People', 'Employers' that some naive developer built. On postback that value is used to SELECT * FROM. A simple SQL injection would be all that it takes to jump to another table not listed in that menu. And passing something really weird could very easily throw an error that could reveal something about the database. Value-only parametrized queries can still be broken but the surface area is much smaller.
Some people with multiple table prefixes use schemas, is that an option for you?

How can I make MS Access Query Parameters Optional?

I have a query that I would like to filter in different ways at different times. The way I have done this right now by placing parameters in the criteria field of the relevant query fields, however there are many cases in which I do not want to filter on a given field but only on the other fields. Is there any way in which a wildcard of some sort can be passed to the criteria parameter so that I can bypass the filtering for that particular call of the query?
If you construct your query like so:
PARAMETERS ParamA Text ( 255 );
SELECT t.id, t.topic_id
FROM SomeTable t
WHERE t.id Like IIf(IsNull([ParamA]),"*",[ParamA])
All records will be selected if the parameter is not filled in.
Note the * wildcard with the LIKE keyword will only have the desired effect in ANSI-89 Query Mode.
Many people mistakenly assume the wildcard character in Access/Jet is always *. Not so. Jet has two wildcards: % in ANSI-92 Query Mode and * in ANSI-89 Query Mode.
ADO is always ANSI-92 and DAO is always ANSI-89 but the Access interface can be either.
When using the LIKE keyword in a database object (i.e. something that will be persisted in the mdb file), you should to think to yourself: what would happen if someone used this database using a Query Mode other than the one I usually use myself? Say you wanted to restrict a text field to numeric characters only and you'd written your Validation Rule like this:
NOT LIKE "*[!0-9]*"
If someone unwittingly (or otherwise) connected to your .mdb via ADO then the validation rule above would allow them to add data with non-numeric characters and your data integrity would be shot. Not good.
Better IMO to always code for both ANSI Query Modes. Perhaps this is best achieved by explicitly coding for both Modes e.g.
NOT LIKE "*[!0-9]*" AND NOT LIKE "%[!0-9]%"
But with more involved Jet SQL DML/DDL, this can become very hard to achieve concisely. That is why I recommend using the ALIKE keyword, which uses the ANSI-92 Query Mode wildcard character regardless of Query Mode e.g.
NOT ALIKE "%[!0-9]%"
Note ALIKE is undocumented (and I assume this is why my original post got marked down). I've tested this in Jet 3.51 (Access97), Jet 4.0 (Access2000 to 2003) and ACE (Access2007) and it works fine. I've previously posted this in the newsgroups and had the approval of Access MVPs. Normally I would steer clear of undocumented features myself but make an exception in this case because Jet has been deprecated for nearly a decade and the Access team who keep it alive don't seem interested in making deep changes to the engines (or bug fixes!), which has the effect of making the Jet engine a very stable product.
For more details on Jet's ANSI Query modes, see About ANSI SQL query mode.
Back to my previous exampe in your previous question. Your parameterized query is a string looking like that:
qr = "Select Tbl_Country.* From Tbl_Country WHERE id_Country = [fid_country]"
depending on the nature of fid_Country (number, text, guid, date, etc), you'll have to replace it with a joker value and specific delimitation characters:
qr = replace(qr,"[fid_country]","""*""")
In order to fully allow wild cards, your original query could also be:
qr = "Select Tbl_Country.* From Tbl_Country _
WHERE id_Country LIKE [fid_country]"
You can then get wild card values for fid_Country such as
qr = replace(qr,"[fid_country]","G*")
Once you're done with that, you can use the string to open a recordset
set rs = currentDb.openRecordset(qr)
I don't think you can. How are you running the query?
I'd say if you need a query that has that many open variables, put it in a vba module or class, and call it, letting it build the string every time.
I'm not sure this helps, because I suspect you want to do this with a saved query rather than in VBA; however, the easiest thing you can do is build up a query line by line in VBA, and then creating a recordset from it.
A quite hackish way would be to re-write the saved query on the fly and then access that; however, if you have multiple people using the same DB you might run into conflicts, and you'll confuse the next developer down the line.
You could also programatically pass default value to the query (as discussed in you r previous question)
Well, you can return non-null values by passing * as the parameter for fields you don't wish to use in the current filter. In Access 2003 (and possibly earlier and later versions), if you are using like [paramName] as your criterion for a numeric, Text, Date, or Boolean field, an asterisk will display all records (that match the other criteria you specify). If you want to return null values as well, then you can use like [paramName] or Is Null as the criterion so that it returns all records. (This works best if you are building the query in code. If you are using an existing query, and you don't want to return null values when you do have a value for filtering, this won't work.)
If you're filtering a Memo field, you'll have to try another approach.