Puzzling SQL Server behaviour: different result formats when a 1<>2 expression is in the WHERE clause

I have two almost identical SELECT statements. I am running them on a SQL Server 2012 with server collation Danish_Norwegian_CI_AS, and database collation Danish_Norwegian_CI_AS. The database runs in compatibility level set to SQL Server 2005 (90).
I run both of the queries on the same client via a SQL Server 2012 Management Studio. The client is a Windows 8.1 laptop.
The puzzling part is that although the statements are almost identical, the result sets differ, as shown below (one returns the time in 24-hour format, the other with AM/PM, which gets truncated to 'P' in this case). The only difference is the 'and 1<>2' expression in the WHERE clause. I looked up and down, searched on Google, dug as deep as I could, and cannot find an explanation. I tried COLLATE to force conversion; it did not help. If I use style 108 to force the formatting in the CONVERT call, then the result sets are alike. But not knowing why this happens is eating me alive.
Issue recreated on SqlFiddle, SQL Server 2008:
http://sqlfiddle.com/#!3/a97f8/1
Does anyone have an explanation for this?
The SQL DDL and statements after the results can be used to recreate the issue. The script creates a table with two columns, inserts some rows, and runs two SELECTs.
On my machine, the SQL without the 1<>2 expression returns:
Id StartTime
----------- ---------
2 2:00P
2 2:14P
The SQL with the 1<>2 expression returns:
Id StartTime
----------- ---------
2 14:00
2 14:14
if NOT EXISTS (Select * from sysobjects where name = 'timeVarchar')
begin
create table timeVarchar (
Id int not null,
timeTest datetime not null
)
end
if not exists (select * from timeVarchar)
begin
-- delete from timeVarchar
print 'inserting'
insert into timeVarchar (Id, timeTest) values (1, '2014-04-09 11:37:00')
insert into timeVarchar (Id, timeTest) values (2, '1901-01-01 14:00:00')
insert into timeVarchar (Id, timeTest) values (3, '2014-04-05 15:00:00')
insert into timeVarchar (Id, timeTest) values (2, '1901-01-01 14:14:14')
end
select
Id,
convert ( varchar(5), convert ( time, timeTest)) as 'StartTime'
from
timeVarchar
where
Id = 2
select
Id,
convert ( varchar(5), convert ( time, timeTest)) as 'StartTime'
from
timeVarchar
where
Id = 2 and
1 <> 2

I can't answer why this is happening (at least not at the moment), but setting the conversion format explicitly does solve the issue:
select Id,
convert (varchar(5), convert (time, timeTest), 14) as "StartTime"
from timeVarchar
where Id = 2;
select Id,
convert (varchar(5), convert (time, timeTest), 14) as "StartTime"
from timeVarchar
where Id = 2
and 1 <> 2;
Going through the execution plans, the two queries end up very different indeed.
The first one passes 2 as a parameter and (!) does a CONVERT_IMPLICIT of the value. The second one passes it as part of the query itself!
In the end, the query that actually runs in the first case explicitly does CONVERT(x, y, 0). For the US locale this is not a problem, since 0 is the invariant (~US) culture. But outside the US, you're suddenly using style 0 instead of e.g. 4 (for Germany).
So, definitely, one thing to take away from this is that queries that look very much alike can execute completely differently.
The second thing: always use CONVERT with an explicit format. The defaults don't seem to be entirely reliable.
EDIT: Ah, finally fished the thing out of the MSDN:
http://msdn.microsoft.com/en-us/library/ms187928.aspx
In earlier versions of SQL Server, the default style for CAST and
CONVERT operations on time and datetime2 data types is 121 except when
either type is used in a computed column expression. For computed
columns, the default style is 0. This behavior impacts computed
columns when they are created, used in queries involving
auto-parameterization, or used in constraint definitions.
Since the first query is invoked as a parametrized query, it gets the default style 0 rather than 121. This behaviour is fixed in compatibility level 110+ (i.e. SQL Server 2012+): on those servers, the default is always 121.
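You can see the two default styles side by side with an explicit CONVERT (a quick sketch; the outputs in the comments are approximate and shown only to illustrate the shape of each style):

```sql
DECLARE @t time = '14:00';

-- Style 0 formats a time as h:miAM/PM; style 121 is the ODBC canonical format
SELECT CONVERT(varchar(10), @t, 0)   AS style_0,    -- e.g. 2:00PM
       CONVERT(varchar(10), @t, 121) AS style_121;  -- e.g. 14:00:00.0 (truncated to 10 chars)
```

With a varchar(5) target, as in the question, style 0 truncates to '2:00P' while style 121 truncates to '14:00', which matches the two result sets shown above.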

It seems the problem is solved in SQL Server 2012; see this link:
http://sqlfiddle.com/#!6/a97f8/4
P.S. The SqlFiddle URL you mentioned runs on SQL Server 2008.

Related

Concatenate different variables with number values in SQL Server

I have different variables with string values that I simply want to concatenate into one number.
For example, I have the variables pupil_id, class, term, and id, which should result in Pupil_class_id:
pupil_id  class  term  id    Pupil_class_id
23        8      3     23    8323
I tried:
Select
pupil_id, class, term, id,
concat(class + term + id) as Pupil_class_id
from
school
and I tried
concat(class, term, id) as Pupil_class_id
from school
and I tried
concat('class', 'term' 'id') as Pupil_class_id
from school
I would have thought that solution 1 or 2 would work, but they don't.
Any suggestions?
No errors, but also no result, just 0
The approach of using a comma (,) as the separator and specifying the column names without single quotes is the way to do it. I cannot reproduce any problems with that; not sure why you say this didn't work for you....
Try this (I'm using SQL Server 2016, but this should work from 2012 on):
DECLARE @Input TABLE (pupil_id INT, class INT, term INT, id INT)
INSERT INTO @Input (pupil_id, class, term, id)
VALUES (23, 8, 3, 23)
SELECT
CONCAT(class, term, id)
FROM
@Input
I get this output:
(No column name)
8323
which - if I understand correctly - is what you're looking for.
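One side note, in case a numeric result is needed (an assumption about the goal, since "one number" could mean either): CONCAT always returns a character string, so cast the result if Pupil_class_id should actually be an int:

```sql
DECLARE @Input TABLE (class INT, term INT, id INT);
INSERT INTO @Input VALUES (8, 3, 23);

-- CONCAT converts each argument to a string; CAST turns the result back into a number
SELECT CAST(CONCAT(class, term, id) AS int) AS Pupil_class_id
FROM @Input;
```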
The only reason this might not work is if you're using an "old" database, e.g. if your database compatibility level is set to an earlier version of SQL Server. Was this database "upgraded" from a previous version of SQL Server?
Try this:
SELECT compatibility_level
FROM sys.databases
WHERE database_id = DB_ID()
What value do you get? SQL Server 2012 should be "110"; do you have a lower value?

SQL Server subquery behaviour

I have a case where I want to check whether an integer value is found in a varchar column of a table that contains a mix of values: some can be converted to integers, some are just strings. My first thought was to use a subquery to select only the rows with numeric-looking values. The setup looks like:
CREATE TABLE #tmp (
EmployeeID varchar(50) NOT NULL
)
INSERT INTO #tmp VALUES ('aa1234')
INSERT INTO #tmp VALUES ('1234')
INSERT INTO #tmp VALUES ('5678')
DECLARE @eid int
SET @eid = 5678
SELECT *
FROM (
SELECT EmployeeID
FROM #tmp
WHERE IsNumeric(EmployeeID) = 1) AS UED
WHERE UED.EmployeeID = @eid
DROP TABLE #tmp
However, this fails with: "Conversion failed when converting the varchar value 'aa1234' to data type int."
I don't understand why it is still trying to compare @eid to 'aa1234' when I've selected only the rows '1234' and '5678' in the subquery.
(I realize I can just cast @eid to varchar, but I'm curious about SQL Server's behaviour in this case.)
You can't easily control the order things will happen when SQL Server looks at the query you wrote and then determines the optimal execution plan. It won't always produce a plan that follows the same logic you typed, in the same order.
In this case, in order to find the rows you're looking for, SQL Server has to perform two filters:
identify only the rows that match your variable
identify only the rows that are numeric
It can do this in either order, so this is also valid:
identify only the rows that are numeric
identify only the rows that match your variable
If you look at the properties of this execution plan, you see that the predicate for the match to your variable is listed first (which still doesn't guarantee order of operation). In any case, due to data type precedence, SQL Server has to try to convert the column data to the type of the variable.
Subqueries, CTEs, or writing the query a different way - especially in simple cases like this - are unlikely to change the order SQL Server uses to perform those operations.
You can force evaluation order in most scenarios by using a CASE expression (you also don't need the subquery):
SELECT EmployeeID
FROM #tmp
WHERE EmployeeID = CASE IsNumeric(EmployeeID) WHEN 1 THEN @eid END;
In modern versions of SQL Server (you forgot to tell us which version you use), you can also use TRY_CONVERT() instead:
SELECT EmployeeID
FROM #tmp
WHERE TRY_CONVERT(int, EmployeeID) = @eid;
This is essentially shorthand for the CASE expression, but with the added bonus that it allows you to specify an explicit type, which is one of the downsides of ISNUMERIC(). All ISNUMERIC() tells you is if the value can be converted to any numeric type. The string '1e2' passes the ISNUMERIC() check, because it can be converted to float, but try converting that to an int...
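To make the '1e2' point concrete, here is a small check (TRY_CONVERT requires SQL Server 2012 or later):

```sql
SELECT ISNUMERIC('1e2')           AS isnumeric_says,  -- 1: '1e2' is numeric (as a float)
       TRY_CONVERT(float, '1e2')  AS as_float,        -- 100
       TRY_CONVERT(int,   '1e2')  AS as_int;          -- NULL: not a valid int
```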
For completeness, the best solution - if there is an index on EmployeeID - is to just use a variable that matches the column data type, as you suggested.
But even better would be to use a data type that prevents junk data like 'aa1234' from getting into the table in the first place.

What is SQL Server 2005 expected behavior of insert into table select query where one of the columns attempts to convert a null value

We have a statement in some legacy SQL Server 2005 code like
insert into myTable
select distinct
wherefield1,
wherefield2,
anotherfield,
convert(numeric(10,2), varcharfield1),
convert(numeric(10,2), varcharfield2),
convert(numeric(10,2), varcharfield3),
convert(datetime, varcharfield4),
otherfields
from myStagingTable
where insertflag='true'
and wherefield1 = @wherevalue1
and wherefield2 = @wherevalue2
Earlier in the code, a variable is set to determine whether varcharfield1 or varcharfield2 is null, and the insert is programmed to execute as long as one of them is not null.
We know that if varcharfield1, varcharfield2, or varcharfield3 is a nonnumeric character string, an exception will be thrown and the insert will not occur. But I am perplexed by the behavior when one of these variables is null, as it often is. Actually, it is always the case that one of these values is null. But it seems that the insertion does take place. It looks like the legacy code relies on this to prevent only insertion of nonnumeric character data, while allowing insertion of null or empty values (in an earlier step, all empty strings in these fields of myStagingTable are replaced with null values).
This has been running on a Production SQL Server 2005 instance with all default settings for a number of years. Is this behavior we can rely on if we upgrade to a newer version of SQL Server?
Thanks,
Rebeccah
Conversion of NULL to anything is still NULL. If the column allows NULLs, that's what you'll get. If the column is not nullable, the insert will fail.
You can see this yourself without even doing an INSERT. Just run this:
SELECT CONVERT(numeric(10,2), NULL)
and note how it produces a NULL result. Then run this:
SELECT CONVERT(numeric(10,2), 'x')
and note how it throws an error message instead of returning anything.

T-SQL - SELECT query in another SELECT query takes a long time

I have a procedure with arguments, but calling it takes a very long time. I decided to check what was wrong with my query and came to the conclusion that the problem is the Column IN (SELECT [...]) construct.
Both queries return 1500 rows.
First query: 45 seconds
Second query: 0 seconds
1.
declare @FILTER_OPTION int
declare @ID_DISTRIBUTOR type_int_value
declare @ID_DATA_TYPE type_bigint_value
declare @ID_AGGREGATION_TYPE type_int_value
set @FILTER_OPTION = 8
insert into @ID_DISTRIBUTOR values (19)
insert into @ID_DATA_TYPE values (30025)
insert into @ID_AGGREGATION_TYPE values (10)
SELECT * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (select [VALUE] from @ID_DISTRIBUTOR)
AND [ID_DATA_TYPE] IN (select [VALUE] from @ID_DATA_TYPE)
AND [ID_AGGREGATION_TYPE] IN (select [VALUE] from @ID_AGGREGATION_TYPE)
2.
select * FROM dbo.[DATA] WHERE
[ID_DISTRIBUTOR] IN (19)
AND [ID_DATA_TYPE] IN (30025)
AND [ID_AGGREGATION_TYPE] IN (10)
Why is this happening?
How should I create a stored procedure that takes an array of arguments and still runs quickly?
Edit:
Maybe it's a problem with indexes? Indexes are created on these three columns.
For such a large performance difference, I would guess that you have one or more indexes. In particular, if you have an index on (ID_DISTRIBUTOR, ID_DATA_TYPE, ID_AGGREGATION_TYPE), then the second query can make use of the index. SQL Server can recognize that the IN is really = and the query is a simple lookup.
In the first case, SQL Server doesn't "know" that the subqueries really have only one row in them. That requires a different set of optimizations. In particular, the above index cannot be used, because the IN generally optimizes differently from =.
As for what to do: first, look at the execution plans so you can see the difference between the two versions. Then test the second version with more than one value in the IN lists.
If you can live with just one value for each comparison, then use = rather than IN.
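If you need to keep passing lists, one common mitigation is OPTION (RECOMPILE), which lets the optimizer see the actual row counts in the table variables at execution time. This is a sketch assuming the table variables from the question; whether it helps depends on your data and indexes:

```sql
SELECT d.*
FROM dbo.[DATA] AS d
WHERE d.[ID_DISTRIBUTOR]      IN (SELECT [VALUE] FROM @ID_DISTRIBUTOR)
  AND d.[ID_DATA_TYPE]        IN (SELECT [VALUE] FROM @ID_DATA_TYPE)
  AND d.[ID_AGGREGATION_TYPE] IN (SELECT [VALUE] FROM @ID_AGGREGATION_TYPE)
-- Recompile on each execution so the plan reflects the real contents of the table variables
OPTION (RECOMPILE);
```

The trade-off is a compilation cost on every execution, which is usually acceptable for a query this expensive.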

SQL Server with an UPDATE, 2 datetime fields, and getdate()

The requirement is that both fields must be equal. What would you do:
declare #var datetime
set #var = getdate()
update table set f1=#var,f2=#var
or simply
update table set f1=getdate(),f2=getdate()
Definitely the first way, because 2 calls to getdate() will most likely return different values.
Original Answer: getdate() seems to be like rand() and only evaluated once in a query. This query took more than a minute to return and all getdate()s are the same.
select getdate()
from sys.objects s1, sys.objects s2, sys.objects s3
Updated: But when I looked at the query plan for an update of 2 different columns, I could see the compute scalar operator was calling getdate() twice.
I tested doing an update with rand()
CREATE TABLE #t(
[f1] [float] NULL,
[f2] [float] NULL
)
insert into #t values (1,1)
insert into #t values (2,2)
insert into #t values (3,3)
update #t set f1=rand(),f2=rand()
select * from #t
That gives:
f1 f2
---------------------- ----------------------
0.54168308978257 0.574235819564939
0.54168308978257 0.574235819564939
0.54168308978257 0.574235819564939
Actually, this depends on the version of SQL Server.
GetDate() was a deterministic function prior to SQL 2005: the value returned was the same for the duration of the statement.
In SQL 2005 (and onwards), GetDate() is non-deterministic, which means every time you call it you may get a different value.
Since both GetDate() functions will be evaluated before the update starts, IMO they will come back with the same value.
Not knowing the size of your table and partitions and the load on your server, I would go with option #1
I'm going to go with something other than performance: readability / communication of intent.
Along those lines, option one is probably better. You are, in effect, telling future developers "I am explicitly setting f1 and f2 to the same DateTime." If the requirements change in the future, and (for some reason) f1 and f2 have to be updated at separate times (or something changes and they get evaluated at different times), you still have the same datetime for both.
In option two, all you're saying is that f1 and f2 have to be updated with the current time of whenever their update operations run. Again, if something changes in your requirements and they have to be evaluated in separate statements for some reason, now they won't necessarily be the same value any more.