I experienced a strange issue today. One of my projects runs on .NET + SQL Server 2005 Express.
There is one query I use for some filtering:
SELECT *
FROM [myTable]
where UI = 2011040773395012950010370
GO
SELECT *
FROM [myTable]
where UI = '2011040773395012950010370'
GO
The UI column is nvarchar(256) and the UI value passed to the filter is always 25 digits.
In my DEV environment both queries return the same row and no errors. However, at my customer's site, after a few months of running fine, the first version started to return a type conversion error.
Any idea why?
I'm not looking for a solution - I'm looking for an explanation of why it works in one environment and not the other, and why it suddenly started returning errors instead of results. I'm using the same tools on both (SQL Server Management Studio Express and 2 different .NET clients).
The environments are more or less the same (W2k3 + SQL Server 2005 Express).
This is completely predictable and expected because of datatype precedence.
For this, the UI column will be converted to decimal(25,0):
where UI = 2011040773395012950010370
This one is almost correct. The right-hand side is varchar and is changed to nvarchar:
where UI = '2011040773395012950010370'
This is the really correct version, where both types are the same:
where UI = N'2011040773395012950010370'
Errors will have started because the UI column now contains a value that won't CAST to decimal(25,0).
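A minimal repro sketch of the failure mode (table shape assumed from the question, and the "bicycle" value borrowed from the chat below):
CREATE TABLE [myTable] (UI nvarchar(256))
INSERT [myTable] (UI) VALUES (N'2011040773395012950010370')
-- Works: every value in UI still CASTs to decimal(25,0)
SELECT * FROM [myTable] WHERE UI = 2011040773395012950010370
INSERT [myTable] (UI) VALUES (N'bicycle')
-- May now fail with "Error converting data type nvarchar to numeric",
-- depending on whether row access reaches the offending row
SELECT * FROM [myTable] WHERE UI = 2011040773395012950010370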
Some unrelated notes:
if you have an index on the UI column it would be ignored in the first version because of the implicit CAST required
do you need unicode to store numeric digits? There is a serious overhead with unicode data types in storage and performance
why not use char(25) or nchar(25) if values are always fixed length? Your queries use too much memory, as the optimiser assumes an average length of 128 characters based on nvarchar(256)
Edit, after comment
Don't assume "why does it works sometimes" when you don't know that it does work
Examples:
The value could have been deleted then added later
A TOP clause or SET ROWCOUNT could mean the offending value is not reached
The query was never run so it couldn't fail
The error is silently ignored by some other code?
Edit 2 for hopefully more clarity
Chat
gbn:
When you run WHERE UI = 2011040773395012950010370, you do not know the order of row access. So if one row does have "bicycle" you may or may not hit that row.
Random:
So the problem may not be in the row I was trying to access, but in another one with a corrupted value?
gbn:
different machines will have different order of reads based on service pack level, index and table fragmentation, number of CPUs, parallelism maybe
correct
and TOP even. That kind of stuff
As Tao mentions, it's important to understand that another, unrelated row can break the query even if this one is OK.
data type precedence can cause ALL the data in that column to be converted before the where clause is evaluated
I've just come across some SQL syntax that I thought was invalid, but actually works fine (in SQL Server at least).
Given this table:
create table SomeTable (FirstColumn int, SecondColumn int)
The following insert statement executes with no error:
insert SomeTable(Any.Text.Here.FirstColumn, It.Does.Not.Matter.What.SecondColumn)
values (1,2);
The insert statement completes without error, and checking select * from SomeTable shows that it did indeed execute properly. See fiddle: http://sqlfiddle.com/#!6/18de0/2
SQL Server seems to just ignore anything except the last part of the column name given in the insert list.
Actual question:
Can this be relied upon as documented behaviour?
Any explanation about why this is so would also be appreciated.
It's unlikely to be part of the SQL standard, given its dubious utility (though I haven't checked specifically (a)).
What's most likely happening is that it's throwing away the non-final part of the column specification because it's superfluous. You have explicitly stated what table you're inserting into, with the insert into SomeTable part of the command, and that's the table that will be used.
What you appear to have done here is to find a way to execute SQL commands that are less readable but have no real advantage. In that vein, it appears similar to the C code:
int nine = 9;
int eight = 8;
xyzzy = xyzzy + nine - eight;
which could perhaps be better written as xyzzy++; :-)
I wouldn't rely on it at all, possibly because it's not standard but mostly because it makes maintenance harder rather than easier, and because I know DBAs all over the world would track me down and beat me to death with IBM DB2 manuals, their choice of weapon due to the voluminous size and skull-crushing abilities :-)
(a) I have checked non-specifically, at least for ISO 9075-2:2003, which defines the SQL:2003 language.
Section 14.8 of that standard covers the insert statement and it appears that the following clause may be relevant:
Each column-name in the insert-column-list shall identify an updatable column of T.
Without spending a huge amount of time (that document is 1,332 pages long and would take several days to digest properly), I suspect you could argue that the column could be identified just using the final part of the column name (by removing all the owner/user/schema specifications from it).
Especially since it appears only one target table is possible (updatable views crossing table boundaries notwithstanding):
<insertion target> ::= <table name>
Fair warning: I haven't checked later iterations of the standard so things may have changed. But I'd consider that unlikely since there seems to be no real use case for having this feature.
This was reported as a bug on Connect and despite initially encouraging comments the current status of the item is closed as "won't fix".
The ORDER BY clause used to behave in a similar fashion, but that one was fixed in SQL Server 2005.
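For comparison, a sketch of what that old ORDER BY analogue presumably looked like (behaviour assumed from the bug reports; I haven't verified it on a SQL Server 2000 instance):
SELECT FirstColumn
FROM SomeTable
ORDER BY Any.Text.Here.FirstColumn; -- reportedly parsed before SQL Server 2005; later versions reject it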
I get errors when I try to run the script on SQL Server 2012 as well as SQL Server 2014 and SQL Server 2008 R2. So you can certainly not rely on the behavior you see with sqlfiddle.
Even if this were to work, I would never rely on undocumented behavior in production code. Microsoft will include notice of breaking changes for documented features, but not for undocumented ones. So if this were an actual T-SQL parsing bug that was later fixed, the fix would break the malformed code.
In the image above, you can see that I have a table where, when I query the max value of a field, I get different results based on a where clause that the rest of the queries seem to rule out as irrelevant.
Back end is MSDE 2000, front end is application written in VB.NET 2008, verification performed using SSMS 2008R2 attached to MSDE instance over VPN.
It is a closed system from an application-development standpoint; however, if I could correct whatever is causing this, I believe both the DB and the application would resume operation.
The problem this is causing: when the application requests Max([record_index]) + 1 where [station_id] = 10, the value comes up as a record that already exists in that table, and the insert fails because of a unique constraint.
Reindexing the PK index solved the problem and makes the queries above for Max([record_index]) and Max([record_index]) WHERE... return the same numbers, as they should. So at this point index corruption is the only logical answer. The DB engine is 12 years old and this is the only time it has ever happened to us, so I guess I will just have to accept it.
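For reference, a sketch of the kind of commands involved, using SQL Server 2000 / MSDE syntax (the table name is a placeholder):
-- check the table and its indexes for corruption
DBCC CHECKTABLE ('myTable')
-- rebuild all indexes on the table, including the PK index ('' = all indexes)
DBCC DBREINDEX ('myTable', '', 0)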
Is it possible to find the line or column where an error is occurring when executing SQL code in Oracle SQL developer?
For example, imagine you are running a very simple line of code
SELECT * FROM employeesTbl WHERE active = 1
But for some reason active is a VARCHAR, and someone has entered ";!/asd02" into this field.
You will only get an ORA- error, but it does not tell you which row caused it.
Does anyone know why this is?
The reason behind this is that, in general, developer support in SQL, PL/SQL and the like is really abysmal. One result is a really broken exception concept in PL/SQL, almost useless exceptions in (Oracle) SQL, and little hope that it is better in any RDBMS.
I think the reason behind all that is that databases are persistent beasts (pun intended). Many companies and developers change their preferred main development language from time to time (C, C++, VB, Java, C#, Groovy, Scala ...). But they rarely change the database, possibly because you will still have the old databases around with no chance to migrate them.
This in turn means most DB devs know only a single database system reasonably well, so they don't see what is possible in other systems. Therefore there is little to no pressure to make database systems more usable for developers.
Multiple rows may contain errors. For the system to be consistent (as a "set-based" language), it ought to return you all rows which contain errors - and not all row errors may be caused by the same error.
However, it could be computationally expensive to compute this entire error set - and the system "knows" that any further computation on this query is going to result in failure anyway - so it represents wasted resources when other queries could be running successfully.
I agree that it would be nice to turn on this type of reporting as an option (especially in non-production environments), but no database vendor seems to have done so.
You get an error because the field is a character field and you're assuming it's a number, which you shouldn't be doing. If you want the field to be numeric then you have to have a numeric field! As a general rule, anything that isn't character data should be stored in a column of the correct data type to avoid this kind of problem.
I'm not certain why Oracle doesn't tell you which row caused the error; it may be physically possible using the rowid in a simple select like the one you have here. If you're joining tables or using conversion functions such as to_number, it would become a lot more difficult, if possible at all.
I would imagine that Oracle did not want to implement something only partially, especially when this is not an Oracle error but a coding error.
To sort out the problem create the following function:
create or replace function is_number( Pvalue varchar2
) return number is
/* Test whether a value is a number. Return a number
rather than a plain boolean so it can be used in
SQL statements as well as PL/SQL.
*/
l_number number;
begin
-- Explicitly convert.
l_number := to_number(Pvalue);
return 1;
exception when others then
return 0;
end;
/
Run the following to find your problem rows:
SELECT * FROM employeesTbl WHERE is_number(active) = 0
Or this to ignore them:
SELECT *
FROM ( SELECT *
       FROM employeesTbl
       WHERE is_number(active) = 1 )
WHERE active = 1
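One caveat with the ignore-them version: Oracle may merge the inline view and evaluate active = 1 before is_number(active), bringing the error right back. A NO_MERGE hint (or a ROWNUM reference) in the inline view should pin the evaluation order; this is a sketch and worth verifying against the actual execution plan:
SELECT *
FROM ( SELECT /*+ NO_MERGE */ *
       FROM employeesTbl
       WHERE is_number(active) = 1 )
WHERE active = 1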
We're currently investigating the load against our SQL server and looking at ways to alleviate it. During my post-secondary education, I was always told that, from a performance standpoint, it was cheaper to make SQL Server do the work. But is this true?
Here's an example:
SELECT ord_no FROM oelinhst_sql
This returns 783119 records in 14 seconds. The field is a char(8), but all of our order numbers are six digits long, so each has two leading blank characters. We typically trim this field, so I ran the following test:
SELECT LTRIM(ord_no) FROM oelinhst_sql
This returned the 783119 records in 13 seconds. I also tried one more test:
SELECT LTRIM(RTRIM(ord_no)) FROM oelinhst_sql
There is nothing to trim on the right, but I was trying to see if there was any overhead in the mere act of calling the function; it still returned in 13 seconds.
My manager was talking about moving things like string trimming out of the SQL and into the source code, but the test results suggest otherwise. My manager also says he heard somewhere that using SQL functions meant that indexes would not be used. Is there any truth to this either?
Only optimize code that you have proven to be the slowest part of your system. Your data so far indicates that SQL string manipulation functions are not affecting performance at all. Take this data to your manager.
If you use a function or type cast in the WHERE clause it can often prevent the SQL server from using indexes. This does not apply to transforming returned columns with functions.
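To illustrate with the column from the question (a sketch; the literal values are made up): wrapping the indexed column in a function in the WHERE clause hides it from an index, while transforming a returned column is harmless.
-- non-sargable: the function on ord_no prevents an index seek
SELECT ord_no FROM oelinhst_sql WHERE LTRIM(ord_no) = '123456'
-- sargable: compare against a value in the stored format instead
SELECT ord_no FROM oelinhst_sql WHERE ord_no = '  123456'
-- harmless: trimming a returned column doesn't affect index usage
SELECT LTRIM(ord_no) FROM oelinhst_sql WHERE ord_no = '  123456'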
It's typically user defined functions (UDFs) that get a bad rap with regards to SQL performance and might be the source of the advice you're getting.
The reason for this is you can build some pretty hairy functions that cause massive overhead with exponential effect.
As you've found with RTRIM and LTRIM, this isn't a blanket reason to stop using all functions on the SQL side.
It somewhat depends on what all is encompassed by "things like string trimming", but for string trimming at least I'd definitely let the database do that (there will be less network traffic as well). As for the indexes, they will still be used if your where clause uses just the column itself (as opposed to a function of the column). Use of the indexes won't be affected at all by functions on the columns you're retrieving, only by how you're selecting the rows.
You may want to have a look at this for performance improvement suggestions: http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
As I said in my comment, reduce the data read per query and you will get a speed increase.
You said:
our order numbers are six-digits long so each has two blank characters leading
That makes me think you are storing numbers in a string; if so, why are you not using a numeric data type? The smallest numeric type which will hold 6 digits is an INT (I'm assuming SQL Server), and that already saves you 4 bytes per order number. Over the number of rows you mention, that's quite a lot less data to read off disk and send over the network.
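If every order number really is numeric, a migration along these lines would capture that saving (a sketch; the ord_no_int column name is made up, and it assumes no non-numeric values exist):
ALTER TABLE oelinhst_sql ADD ord_no_int INT
GO
-- backfill from the trimmed char(8) values
UPDATE oelinhst_sql SET ord_no_int = CAST(LTRIM(ord_no) AS INT)
GO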
Fully optimise your database before looking to deal with the data outside of it; it's what a database server is designed to do, serve data.
As you found, it often pays to measure, but what I think your manager may have been referring to is something like this.
This is typically much faster
SELECT SomeFields FROM oelinhst_sql
WHERE
datetimeField > '1/1/2011'
and
datetimeField < '2/1/2011'
than this
SELECT SomeFields FROM oelinhst_sql
WHERE
Month(datetimeField) = 1
and
year(datetimeField) = 2011
even though the rows that are returned are the same
For performance and simplicity reasons I would like to get the contents of a DATETIME column in my MySQL 3.x server as seconds (or any numeric type really - I just want to avoid all the apparent nastiness of timezones when using UNIX_TIMESTAMP() [the dates in my table are indeed from different locales so I'd rather not have any doubt as to whether or not some timezone-compensation weirdness is going on behind my back]).
The TO_SECONDS() function seemed ideal, until I found out that it only works on newer MySQL installations (upgrading is not an option)...
I thought about doing something like this:
SELECT (TO_DAYS(Timestamp)-730486)*86400+TIME_TO_SEC(Timestamp)
This manually calculates the number of seconds elapsed since 2000-01-01, but it seems like it might put unnecessary load on the server by forcing it to build a temporary table or something?
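If you go this route, it may be worth sanity-checking the magic constant on your own server first, since it should equal TO_DAYS() of whatever epoch you intend:
SELECT TO_DAYS('2000-01-01')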
I could also just do:
SELECT TO_DAYS(Timestamp), TIME_TO_SEC(Timestamp)
and combine the results myself, but that makes it less simple on the code side.
What's the best compromise? I'll be fetching a large number of rows (on the order of 10^6) per query, so neither client-side nor server-side performance is entirely unimportant.
Thanks.
(post edited to reduce confusion)
Firstly, just to make sure, the new field will be a BIGINT... correct?
Can you use explicit casting to prevent the overflow?
SELECT CAST(TO_DAYS(Timestamp)*86400 + TIME_TO_SEC(Timestamp) AS UNSIGNED INTEGER)
Or perhaps use an intermediate string before populating the new BIGINT field?
SELECT CAST(TO_DAYS(Timestamp)*86400 + TIME_TO_SEC(Timestamp) AS CHAR(11))