SQL Server join question - sql

This is on Microsoft SQL Server. We have a query where we are trying to join two tables on fields containing numeric data.
One table has the field defined as numeric(18,2) and the other table has it defined as decimal(24,4). When joining on the native data types, the query hangs and we run out of patience before it finishes (we left it running 6 min…). So we tried casting both fields to numeric(18,2) and the query finished in under 10 seconds. Then we tried casting both fields to decimal(18,2) and again the query hangs. Does anyone know of a difference between the decimal and numeric data types that would make them perform so differently?
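For reference, a stripped-down sketch of the kind of join we are running (table and column names are made up for this post):
-- Hypothetical names: orders.amount is numeric(18,2), invoices.amount is decimal(24,4).
-- This is the variant that finished quickly, with both sides cast to numeric(18,2).
SELECT o.order_id, i.invoice_id
FROM dbo.orders AS o
JOIN dbo.invoices AS i
    ON CAST(o.amount AS numeric(18,2)) = CAST(i.amount AS numeric(18,2));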

The DECIMAL and NUMERIC datatypes are one and the same thing in SQL Server.
Quote from BOL:
Numeric data types that have fixed precision and scale.
decimal[ (p[ ,s] )] and numeric[ (p[ ,s] )]
Fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1. The ISO synonyms for decimal are dec and dec(p, s). numeric is functionally equivalent to decimal.
From that, I'm surprised to hear of a difference. I'd expect the execution plans to be the same between the two routes; can you check?

Why are you using two datatypes to begin with? If they contain the same type of data (and joining on them implies they do), they should be the same datatype. Fix this and all your problems go away. Why waste server resources continually casting to match two fields that should be defined the same?
You may of course need to adjust the input variables for any insert or update queries to match what you chose as the datatype.
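As an illustrative sketch only (assuming the decimal(24,4) definition is the one worth keeping, and using a made-up table and column name), aligning the two columns might look like this:
-- Hypothetical names; check dependent indexes, constraints, and views
-- before altering a real column, and keep the column's existing nullability.
ALTER TABLE dbo.orders
    ALTER COLUMN amount decimal(24,4) NULL;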

My guess is that it's not a matter of a specific difference between the two data types, but simply the fact that SQL Server needs to implicitly convert them to match for the join operation.
I don't know why there would be a difference between your first query and the second, where you explicitly convert, but I can see why there might be a problem when you convert to a datatype that doesn't match and SQL Server then has to implicitly convert them anyway (as in your third case). Maybe in the first case SQL Server is implicitly converting both to decimal(24,4) so as not to lose data, and that operation takes longer than converting the other way. Have you tried explicitly converting the numeric(18,2) to a decimal(24,4)?
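Something along these lines (a sketch with hypothetical table and column names), widening only the narrower side in the join predicate:
-- Hypothetical names: orders.amount is numeric(18,2), invoices.amount is decimal(24,4).
SELECT o.order_id, i.invoice_id
FROM dbo.orders AS o
JOIN dbo.invoices AS i
    ON CAST(o.amount AS decimal(24,4)) = i.amount;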

Related

How does SQL compare 2 different integer types

If I have a column_1 whose column type is short (-32 768 to 32 767), and I write a SQL query:
Select *
from some_table
where column_1 < 2147483647
where 2147483647 is Java's INT_MAX. How does SQL compare these types?
The SQL standard appears to state only that the behaviour is implementation-defined. (Meaning: RTFM of your particular DBMS. And I'm open to standing corrected.)
However, for arithmetic operations the standard does seem to mandate quite a few things in the direction of "avoid data truncation at all costs". So it is not unreasonable to assume that when it comes to numeric comparisons, the implementation will likewise do whatever it can to avoid data truncation, i.e. in this particular case, upcast the SMALLINT to INTEGER. But that's an assumption, not legislation.
The difference between the 2 datatypes is one of storage size and possible value range.
SQL Server will handle implicit conversions (when necessary) when performing comparisons. In your case the short value will be converted to int as a temporary value, the two values will be compared, and the temporary value will then be discarded.
So you don't have to worry about it.
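As a rough sketch (SQL Server syntax, hypothetical table name): the comparison below works without any explicit cast because the smallint column is implicitly promoted to int for the comparison.
-- smallint is SQL Server's counterpart to Java's short.
CREATE TABLE dbo.SmallIntDemo (column_1 smallint);
INSERT INTO dbo.SmallIntDemo (column_1) VALUES (100), (32767);

SELECT *
FROM dbo.SmallIntDemo
WHERE column_1 < 2147483647;  -- int literal; the smallint side is widened to int

DROP TABLE dbo.SmallIntDemo;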

SQL Server Data Types

I am writing a script that will generate a column of every data type within SQL Server. I know that SQL Server considers things like decimal(5,1) and decimal(5,0) to be different data types; however, the various resources I have looked at (w3schools and the Microsoft reference page) do not make it clear whether this is also the case for things like binary(length), datetimeoffset(0-7), etc. Any insight would be greatly appreciated.
I'd say the answer is a bit messy.
On the one hand the SQL Server datatypes page clearly makes a distinction between "data types, collations, precision, scale, or length".
So that would indicate that collations, precision, scale, or length are all attributes distinct from datatype.
However the same page also mentions
Large value data types: varchar(max), nvarchar(max), and varbinary(max)
So that would imply that adding the max does change the datatype in some sense. Then you've also got cases like float(24), which SQL Server treats as the distinct datatype real.
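For the float(24) case, a quick way to see this for yourself (a throwaway sketch; the table name is made up) is to declare a float(24) column and check what the catalog reports:
-- Hypothetical throwaway table, created only to inspect the stored type names.
CREATE TABLE dbo.FloatCheck (f24 float(24), f53 float(53));

SELECT c.name AS column_name, t.name AS type_name
FROM sys.columns AS c
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.FloatCheck');
-- f24 comes back as real, f53 as float

DROP TABLE dbo.FloatCheck;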
What is the purpose of this script? Simply generating a column of every data type doesn't sound useful on its own. What is it for? What makes sense for the script?
If you are interested only in the data types within SQL Server, you can get this information from INFORMATION_SCHEMA.COLUMNS.
You can run this script:
use my_database
go
SELECT DISTINCT
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH,
    CHARACTER_OCTET_LENGTH,
    NUMERIC_PRECISION,
    NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY 1, 2, 3
Edit:
The table sys.types contains a row for each system and user-defined type.
The following query gets all the data types in SQL Server:
select *
from sys.types
order by name
For the decimal/numeric data types, storage is based on the following table:
Precision    Storage bytes
1 - 9        5
10 - 19      9
20 - 28      13
29 - 38      17
reference: decimal and numeric
Precision is the maximum total number of decimal digits that will be stored, both to the left and to the right of the decimal point.
For example,
decimal(5,1) and decimal(5,0) both have precision 5, so they have the same storage: 5 bytes each.
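If you want to check that yourself, DATALENGTH reports the number of bytes a value occupies (a quick sketch using literal casts):
SELECT
    DATALENGTH(CAST(1234.5 AS decimal(5,1))) AS bytes_5_1,  -- 5
    DATALENGTH(CAST(12345  AS decimal(5,0))) AS bytes_5_0;  -- 5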
I'm not sure where that information came from.
SQL Server specifies decimal(x,y) as a data type: an exact numeric data type. The difference is the number of bytes it uses, and that is what makes it "consider" numeric(5,1) and numeric(5,0) different data types. The key difference shows up when an operation on two numerics is required (addition or any other): SQL Server adapts the result's precision and scale so that they can hold the correct result.
Almost the same thing should happen with binary(x): when an operation is performed between values of different lengths, you should get a result type wide enough to contain the correct result.
datetimeoffset(0-7) should follow the same logic.
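As a sketch of that precision adaptation, SQL_VARIANT_PROPERTY can report the type of an expression's result; under the documented rules, adding a decimal(5,1) to a decimal(5,0) yields precision 7, scale 1:
DECLARE @a decimal(5,1) = 1234.5,
        @b decimal(5,0) = 99999;

SELECT
    SQL_VARIANT_PROPERTY(@a + @b, 'BaseType')  AS result_type,
    SQL_VARIANT_PROPERTY(@a + @b, 'Precision') AS result_precision,  -- 7 for these inputs
    SQL_VARIANT_PROPERTY(@a + @b, 'Scale')     AS result_scale;      -- 1 for these inputs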

Implicit casting when joining fields of different types

I am joining a field that has single-digit numbers formatted with a leading 0 to another that does not have leading 0's. When I realized this I tested my query out, only to find that it was actually working perfectly. Then I realized what I'd done... I had joined an nvarchar field to an int field. I would have thought SQL would have given me an error for this, but apparently it converts the character field to an int field for me.
I realize this is probably not a good practice and I plan to explicitly cast it myself now, but I'm just curious if there are rules for how SQL decides which field to cast in these situations. What's to keep it from casting the int field to a character type instead (in which case my query would no longer work properly)?
There are rules indeed.
See CAST and CONVERT (Transact-SQL) to learn what can be converted to what (the "Implicit Conversions" section).
See Data Type Precedence (Transact-SQL) to learn what will be converted to what unless you specifically ask otherwise.
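As a minimal sketch of the precedence rule in your case: int has a higher precedence than nvarchar, so it is the string side that gets converted, which is why the leading 0 does not break the comparison.
-- int outranks nvarchar in data type precedence, so the string is
-- converted to int before the comparison.
SELECT CASE WHEN N'07' = 7 THEN 'match' ELSE 'no match' END AS result;  -- 'match'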

MySQL Type Conversion: Why is float the lowest common denominator type?

I recently ran into an issue where a query was causing a full table scan, and it came down to a column having a different definition than I thought: it was a VARCHAR, not an INT. When queried with "string_column = 17" the query ran, it just couldn't use the index. That really threw me for a loop.
So I went searching and found what happened, the behavior I was seeing is consistent with what MySQL's documentation says:
In all other cases, the arguments are compared as floating-point (real) numbers.
So my question is... why a float?
I could see trying to convert numbers to strings (although the points in the MySQL page linked above are good reasons not to). I could also understand throwing some sort of error, or generating a warning (my preference). Instead it happily runs.
So why convert everything to a float? Is that from the SQL standard, or based on some other reason? Can anyone shed some light on this choice for me?
I feel your pain. We have a column in our DB that holds what is well known in the company as an "order number". But it's not always a number; in certain circumstances it can contain other characters too, so we keep it in a varchar. With SQL Server 2000, this means that selecting on "order_number = 123456" is bad. SQL Server effectively rewrites the predicate as "CAST(order_number AS INT) = 123456", which has two undesirable effects:
the index is on order_number as a varchar, so it starts a full scan
those non-numeric order numbers eventually cause a conversion error to be thrown to the user, with a rather unhelpful message.
In a way it's good that we do have those non-numeric "numbers", since at least badly-written queries that pass the parameter as a number get trapped rather than just sucking up resources.
I don't think there is a standard. I seem to remember PostgreSQL 8.3 dropped some of the default casts between number and text types so that this kind of situation would throw an error when the query was being planned.
Presumably "float" is considered to be the widest-ranging numeric type and therefore the one that all numbers can be silently promoted to?
Oh, and similar problems (but no conversion errors) for when you have varchar columns and a Java application that passes all string literals as nvarchar... suddenly your varchar indices are no longer used, good luck finding the occurrences of that happening. Of course you can tell the Java app to send strings as varchar, but now we're stuck with only using characters in windows-1252 because that's what the DB was created as 5-6 years ago when it was just a "stopgap solution", ah-ha.
Well, it's easily understandable: float is able to hold the greatest range of numbers.
If the underlying datatype is datetime, for instance, it can be simply converted to a float number that has the same intrinsic value.
If the datatype is a string, it is easy to parse it to a float, the degraded performance notwithstanding.
So float is the better datatype to fall back on.
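For illustration only, here is a tiny sketch of that float comparison in MySQL; note that a string that is not textually equal to the number can still compare equal once both sides are treated as floats:
-- MySQL converts both sides to floating point for the comparison.
SELECT '17'  = 17 AS plain_match,   -- 1
       '1e1' = 10 AS float_match;   -- 1, because '1e1' parses as the float 10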

Difference between DECIMAL and NUMERIC

What's the difference between the SQL datatypes NUMERIC and DECIMAL?
If databases treat these differently, I'd like to know how for at least:
SQL Server
Oracle
Db/2
MySQL
PostgreSQL
Furthermore, are there any differences in how database drivers interpret these types?
They are the same for almost all purposes.
At one time different vendors used different names (NUMERIC/DECIMAL) for almost the same thing. SQL-92 made them the same with one minor difference which can be vendor specific:
NUMERIC must be exactly as precise as it is defined: if you define 4 digits to the left of the decimal point and 4 digits to the right of it, the DB must always store exactly 4 + 4 digits, no more, no less.
DECIMAL is free to allow higher precision if that's easier to implement, meaning the database can actually store more digits than specified (because the behind-the-scenes storage has space for extra digits). In the 4 + 4 example above, the database might allow storing 12345.0000, but storing 1.00005 is still not allowed if doing so could affect any future calculations.
Most current database systems treat DECIMAL and NUMERIC either as perfect synonyms or as two distinct types with exactly the same behavior. If the types are considered distinct at all, you might not be able to define a foreign key constraint on a DECIMAL column referencing a NUMERIC column, or vice versa.
They are synonyms, no difference at all, at least in SQL Server.
This SO answer shows some difference in the ANSI standard, but I suspect that in implementation they are the same.
Postgres: No difference
In the documentation, the description in table 8.1 looks the same, yet it is not explained why the type is mentioned separately. So,
according to a Tom Lane post:
There isn't any difference, in Postgres. There are two type names because the SQL standard requires us to accept both names. In a quick look in the standard it appears that the only difference is this:
17) NUMERIC specifies the data type exact numeric, with the decimal precision and scale specified by the <precision> and <scale>.
18) DECIMAL specifies the data type exact numeric, with the decimal scale specified by the <scale> and the implementation-defined decimal precision equal to or greater than the value of the specified <precision>.
ie, for DECIMAL the implementation is allowed to allow more digits than requested to the left of the decimal point. Postgres doesn't exercise that freedom so there's no difference between these types for us.
regards, tom lane
Also, a page lower, the docs state clearly that
The types decimal and numeric are equivalent. Both types are part of
the SQL standard.
and also, in the aliases table, decimal [ (p, s) ] is listed as an alias for numeric [ (p, s) ].
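A quick illustrative check in Postgres (just a one-off query) shows the alias in action:
-- Postgres reports the same underlying type for both casts.
SELECT pg_typeof(1.00::decimal(5,2)) AS decimal_type,
       pg_typeof(1.00::numeric(5,2)) AS numeric_type;
-- both columns show "numeric"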
They are actually equivalent, but they are independent types, and not technically synonyms, like ROWVERSION and TIMESTAMP - though they may have been referred to as synonyms in the documentation at one time. That is a slightly different meaning of synonym (e.g. they are indistinguishable except in name, not one is an alias for the other). Ironic, right?
What I interpret from the wording in MSDN is actually:
These types are identical, they just have different names.
Other than the type_id values, everything here is identical:
SELECT * FROM sys.types WHERE name IN (N'numeric', N'decimal');
I have absolutely no knowledge of any behavioral differences between the two, and going back to SQL Server 6.5, have always treated them as 100% interchangeable.
What about DECIMAL(18,2) and NUMERIC(18,2)? Is assigning one to the other technically a "conversion"?
Only if you do so explicitly. You can prove this easily by creating a table and then inspecting the query plan for queries that perform explicit or - you might expect - implicit conversions. Here's a simple table:
CREATE TABLE [dbo].[NumDec]
(
[num] [numeric](18, 0) NULL,
[dec] [decimal](18, 0) NULL
);
Now run these queries and capture the plan:
DECLARE @num NUMERIC(18,0);
DECLARE @dec DECIMAL(18,0);

SELECT
    CONVERT(DECIMAL(18,0), [num]), -- conversion
    CONVERT(NUMERIC(18,0), [dec])  -- conversion
FROM dbo.NumDec
UNION ALL
SELECT [num], [dec]
FROM dbo.NumDec WHERE [num] = @dec -- no conversion
UNION ALL
SELECT [num], [dec]
FROM dbo.NumDec WHERE [dec] = @num; -- no conversion
We have explicit conversions where we asked for them, but no implicit conversions where we might have expected them. It seems the optimizer is treating them as interchangeable, too.
Personally, I prefer to use the term DECIMAL just because it's much more accurate and descriptive. BIT is "numeric" too.