Should I quote numbers in SQL? - sql

I remember reading about quoting stuff when doing a SQL query and that when you quote something, it becomes a string. I also read that numbers should not be quoted. Now, I can't find that quotation and I need to refresh my memory to see if I should quote numbers.

You should not quote numbers if you want to treat them as numbers.
You're correct by remembering that it makes it a string.
SELECT 10 AS x
is perfectly legal and will return (in most database engines) a column of datatype int (or a variation thereof.)
If you do this:
SELECT '10' AS x
instead you'll get a textual data type. This might too be suitable in some cases, but you need to decide whether you want the result as text or as a number.

Here's an example, where quoting would produce inconsistent results (in MySQL):
select 1 < 1.0; // returns 0
select '1' < '1.0'; // returns 1
That's because the second comparison is performed using the current string collation rather than numerically.
It's better not to quote numbers, since that would just be an extra unnecessary step for the database to convert the string literal into a numeric value for comparison and could alter the meaning of comparisons.

This answer is for Microsoft SQL Server, in particular MSSQL 2008 R2.
In handwritten SQL I would never quote numbers (unless inserting a value into a varchar column where the string inserted just happens to be a number). But sometimes when generating SQL programmatically it can make life simpler to just quote everything. (This is in database maintenance scripts or library routines that work on any table, without knowing the column types in advance.)
I wondered whether doing so would impose a performance penalty. If I've used a quoted value in my SQL statement, the server must parse it as a string and then have to convert it to integer. But then, parsing SQL would involve converting strings to integers anyway. And the query parse time is usually only a small fraction of the total.
I ran some test statements looking like
insert into #t values (123, 123, 123), (123, 123, 123)
insert into #t values ('123', '123', '123'), ('123', '123', '123')
but with a larger number of columns in #t, a larger number of value tuples inserted in one go, and each statement repeated many times. I happened to use Perl for that:
$dbh = my_database_connection(); # using DBD::Sybase
$n = 20; # this many value tuples, and also repeated this many times
$v = "'123'";
# $v = 123; # uncomment this to insert without quoting
#cols = 'aa' .. 'zz';
#ds = map { "[$_] int not null" } #cols;
#vs = map { $v } #cols;
$" = ", ";
$dbh->do("create table #t (#ds)");
foreach (1 .. $n) {
$sql = 'insert into #t values ';
$sql .= "(#vs), " foreach 1 .. $n;
$sql =~ s/, \z//;
$dbh->do($sql);
}
but you can trivially write the same benchmark in any language. I ran this several times for the quoted case and for the unquoted, and observed no significant difference in speed. (On my setup a single run took about ten seconds; you can obviously change $n to make it faster or slower.)
Of course, the generated SQL is bigger if it has the redundant quote characters in it. It surely can't be any faster, but it does not appear to be noticeably slower.
Given this result, I will simplify my SQL-generation code to add single quotes around all values, without needing to know the data type of the column to be inserted into. (The code does still need to defend against SQL injection by making sure the input value doesn't itself contain the ' character, or otherwise quoting cleverly.)
I still don't recommend quoting numbers in SQL in the normal course of things, but if you find you have to do it, it doesn't appear to cause any harm. Similarly, in generated code I put [] around column names whether needed or not, but I consider that unnecessary cruft in handwritten SQL.

I don't know what you may have read, but don't quote numbers.

Obviously, don't forget to check to make sure any value you passed is really a number.

Well, at the risk of igniting a flame, may I respectfully disagree a little here that it is NEVER ok to use single quotes around a numeric value? It seems to me that it SOMETIMES makes sense to use single quotes around a numeric value. If col1 is an INT column, then (using vbscript as an example)
sql = "INSERT " & foo & " INTO col1 WHERE ID = 1"
and
sql = "INSERT '" & foo & "' INTO col1 WHERE ID = 1"
will BOTH, when sql is executed, insert any integer value of foo correctly. But what if you want 0 to be inserted when foo is not initialized? Using a quoted numeric expression like this prevents an error and handles the null case. Whether or not you think this is good practice, it is certainly true.

eh... no you shouldn't?
I assume you mean by quoting enclosing in ' like 'this'
INSERT INTO table (foo) VALUES (999); is perfectly legal as long as foo is an INT type collumn
INSERT INTO table (foo) VALUES('foo'); Inserts the string foo into the table. You can't do this on INT type tables of course.

Related

WHERE clause returning no results

I have the following query:
SELECT *
FROM public."Matches"
WHERE 'Matches.Id' = '24e81894-2f1e-4654-bf50-b75e584ed3eb'
I'm certain there is an existing match with this Id (tried it on other Ids as well), but it returns 0 rows. I'm new to querying with PgAdmin so it's probably just a simple error, but I've read the docs up and down and can't seem to find why this is returning nothing.
Single quotes are only used for strings in SQL. So 'Matches.Id' is a string constant and obviously not the same as '24e81894-2f1e-4654-bf50-b75e584ed3eb' thus the WHERE condition is always false (it's like writing where 1 = 0)
You need to use double quotes for identifiers, the same way you did in the FROM clause.
WHERE "Matches"."Id" = ...
In general the use of quoted identifiers is strongly discouraged.

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

Conditional update query to pad alphanumeric values in MS Access

I have an Access table with a field that contains alphanumeric values (1234, 123A, 12A34, ABC3, etc). I am trying to create a conditional update query to add leading zeros to bring all values that contain at least 1 letter up to five characters but none to the only numeric values (eg 123, 00A12, 0000X).
My current code looks like:
UPDATE MyTable SET MyTable!Field = Format(Field, String(5, "0")) WHERE MyTable!Field LIKE '*[A-Z]*'
When I run the query, I don't get any error messages but it also fails to add any leading zeros.
I've also tried Format(Field, "00000") using Not Like and '*[0-9]*' or '*[0123456789]*' etc.
Interestingly, when I run a query by itself to select any of the values containing a letter (Like '*[A-Z]*'), it correctly pulls all 1000 values that need to be updated but when I add the conditional, it fails. Similarly, I've been successful in the past with adding leading zeros the entire field using Format(Field), String(5, "0") but it also fails when I add a conditional.
I'm pretty new to Access and SQL, so I feel like I've probably misunderstood the syntax somewhere. Or is there something else I should be doing?
Format() is wrong function to use.
If every value in field is 5 characters or less, consider:
UPDATE MyTable SET Field = String(5-Len(Field), "0")) & Field WHERE Not IsNumeric(Nz(Field,0))

Coldfusion Query of Queries with Empty Strings

The query I start out with has 40,000 lines of empty rows, which stems from a problem with the original spreadsheet from which it was taken.
Using CF16 server
I would like to do a Query of Queries on a variably named 'key column'.
In my query:
var keyColumn = "Permit No."
var newQuery = "select * from source where (cast('#keyColumn#' as varchar) <> '')";
Note: the casting comes from this suggestion
I still get all those empty fields in there.
But when I use "City" as the keyColumn, it works. How do the values in both those columns differ when they both say [empty string] on the query dump?
Is it a problem with column names? What kind of data are in those cells?
where ( cast('Permit No.' as varchar) <> '' )
The problem is the SQL, not the values. By enclosing the column name in quotes, you are actually comparing the literal string "P-e-r-m-i-t N-o-.", not the values inside that column. Since the string "Permit No." can never equal an empty string, the comparison always returns true. That is why the resulting query still includes all rows.
Unless it was fixed in ColdFusion 2016, QoQ's do not support column names containing invalid characters like spaces. One workaround is to use the "columnNames" attribute to specify valid column names when reading the spreadsheet. Failing that, another option is to take advantage of the fact that query columns are arrays and duplicate the data under a valid column name: queryAddColumn(yourQuery, "PermitNo", yourQuery["Permit No."]) (Though the latter option is less ideal because it may require copying the underlying data internally):

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now if I directly query server like below can they somehow make SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something). When your SQL code executes, it doesn't realize that there are more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
Two characters in this case are sufficient to make an error in query and possibly reveal some information about it. For example try to use string ')480 and watch how your application will behave.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare #SearchString as VarChar(10) = 'Voot';
select Buffer, case
when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
when PatIndex( '%' + Buffer + '%', #SearchString ) != 0 then 1
else 0 end as Match
from ( values
( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of #SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from #CharacterPairs ).
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment] will be my query still vulnerable?" questions.
First of all you should ask yourself - why you are looking to buy yourself such an indulgence? What is the reason? Why do you want add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is "safe", I won't process it properly and that data is unsafe, I'll have to do something?
The only reason why such a question could even appear is your application architecture. Or, rather, lack of architecture. Because only in spaghetti code, where user input is added directly to the query, such a question can be ever occur. Otherwise, your database layer should be able to process any kind of data, being totally ignorant of its nature, origin or alleged "safety".