I am writing a script that will generate a column of every data type within SQL Server. I know that SQL Server considers things like decimal(5,1) and decimal(5,0) as different data types; however, the various resources that I have looked at (w3schools and the Microsoft reference page) do not make it clear whether this is also the case for things like binary(length), datetimeoffset(0-7), etc. Any insight would be greatly appreciated.
I'd say the answer is a bit messy.
On the one hand the SQL Server datatypes page clearly makes a distinction between "data types, collations, precision, scale, or length".
So that would indicate that collations, precision, scale, or length are all attributes distinct from datatype.
However the same page also mentions
Large value data types: varchar(max), nvarchar(max), and varbinary(max)
So that would imply that adding the max does change the datatype in some sense. Then also you've got cases like float(24) which is treated in SQL Server as a distinct datatype of real.
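To make the float(24) case concrete, here is a minimal Python sketch of how SQL Server collapses float(n) into just two underlying types. The function name normalize_float is made up for illustration; the 1-24 and 25-53 ranges are the ones documented for float(n).

```python
from typing import Tuple


def normalize_float(n: int) -> Tuple[str, int]:
    """Return the (type name, storage bytes) SQL Server effectively uses for float(n)."""
    if not 1 <= n <= 53:
        raise ValueError("float(n) requires 1 <= n <= 53")
    if n <= 24:
        # float(1) .. float(24) are all treated as float(24), which is real
        return ("real", 4)
    # float(25) .. float(53) are all treated as float(53)
    return ("float", 8)


print(normalize_float(24))
print(normalize_float(53))
```

So for float the length argument does not produce a separate type per value; it only selects between the two storage sizes.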
What is the purpose of this script? Simply generating a column of every data type doesn't sound useful on its own. What is it for? What makes sense for the script?
If you are only interested in the data types used within SQL Server, you can get the types of existing columns from INFORMATION_SCHEMA.COLUMNS.
You can run this script:
use my_database
go
SELECT DISTINCT
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH,
    CHARACTER_OCTET_LENGTH,
    NUMERIC_PRECISION,
    NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY 1, 2, 3
Edit:
The sys.types catalog view contains a row for each system and user-defined type.
The following query returns all data types available in the current database:
select *
from sys.types
order by name
For the decimal and numeric data types, the storage size depends on the precision, as in the following table:
Precision    Storage bytes
1 - 9        5
10 - 19      9
20 - 28      13
29 - 38      17
reference: decimal and numeric
Precision is the maximum total number of decimal digits that will be stored, both to the left and to the right of the decimal point.
For example,
decimal(5,1) (precision 5, scale 1) and decimal(5,0) (precision 5, scale 0) have the same storage size: 5 bytes each.
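The precision-to-storage mapping can be written as a small lookup. This is only a Python sketch of the documented table for decimal and numeric; the function name decimal_storage_bytes is hypothetical.

```python
def decimal_storage_bytes(precision: int) -> int:
    """Storage size in bytes for decimal(p, s) / numeric(p, s); scale does not matter."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17


# decimal(5,1) and decimal(5,0) both have precision 5, so both need 5 bytes.
print(decimal_storage_bytes(5))
print(decimal_storage_bytes(18))
```

Note that only the precision is an input here: two columns that differ only in scale occupy the same number of bytes.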
I'm not sure where that information came from.
SQL Server treats decimal(p,s) as an exact numeric data type, and it considers each combination of precision and scale a different data type. That is why numeric(5,1) and numeric(5,0) count as different data types, even though both occupy 5 bytes. The key practical difference shows up when an operation on two numerics is required (addition or any other): SQL Server derives a result precision and scale that should account for the correct result.
Almost the same thing should happen with binary(x). When operation is performed between different precisions, you should get the precision that should contain the correct result.
Datetimeoffset(0-7) should follow the same logic.
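As a rough illustration of how a result type is derived for decimals, the sketch below applies the documented rule for addition: the result scale is max(s1, s2), and the result precision is max(p1 - s1, p2 - s2) + max(s1, s2) + 1. The function name add_result_type is hypothetical, and the extra reduction rules SQL Server applies once the precision would exceed 38 are deliberately not modeled.

```python
from typing import Tuple


def add_result_type(p1: int, s1: int, p2: int, s2: int) -> Tuple[int, int]:
    """Return (precision, scale) of decimal(p1,s1) + decimal(p2,s2)."""
    scale = max(s1, s2)
    # Widest integer part of the two operands, plus the scale, plus one carry digit.
    precision = max(p1 - s1, p2 - s2) + scale + 1
    if precision > 38:
        # Assumption: this sketch stops here instead of imitating the reduction rules.
        raise ValueError("result precision above 38 is not modeled in this sketch")
    return precision, scale


# decimal(5,1) + decimal(5,0) needs room for 5 integer digits, 1 fractional digit and a carry.
print(add_result_type(5, 1, 5, 0))
```

Whether binary(x) and datetimeoffset(n) follow comparable rules is the answerer's expectation, not something this sketch verifies.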
I'm pulling in some external data into my MSSQL server. Several columns of incoming data are marked as 'number' (it's a json file). It's millions of rows in size and many of the columns appear to be decimal (18,2) like 23.33. But I can't be sure that it will always be like that, in fact a few have been 23.333 or longer numbers like 23.35555555 which will mess up my import.
So my question is given a column is going to have some kind of number imported into it, but I can't be sure really how big or how many decimal places it's going to have... do I have to resort to making my column a varchar or is there a very generic number kind of column I'm not thinking of?
Is there a max size decimal, sort of like using VARCHAR(8000) or VARCHAR(MAX) ?
update
This is the 'data type' of number that I'm pulling in:
https://dev.socrata.com/docs/datatypes/number.html#
Looks like it can be pretty much any number, as per their writing:
"Numbers are arbitrary precision, arbitrary scale numbers."
The way I handle things like this is to import the raw data into a staging table in a varchar(max) column.
Then I use TRY_PARSE() or TRY_CONVERT() when moving it to the desired datatype in my final destination table.
The point here is that the shape of the incoming data shouldn't determine the datatype you use. The datatype should be determined by the usage of the data once it's in your table. And if the incoming data doesn't fit, there are ways of making it fit.
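The staging idea can be sketched outside the database as well. The Python below is only an analogue of what TRY_CONVERT does in the load step: it returns None instead of raising when a raw string is not a usable number. The function name try_convert_decimal, the target scale, and the half-up rounding mode are all assumptions made for this example.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def try_convert_decimal(raw: str, scale: int) -> Optional[Decimal]:
    """Parse a staged string into a Decimal with a fixed scale, or None if it is not a number."""
    try:
        value = Decimal(raw)
        if not value.is_finite():
            # Reject NaN and Infinity, which Decimal would otherwise accept.
            return None
        quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
        return value.quantize(quantum, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None


staged = ["23.33", "23.333", "23.35555555", "n/a"]
for raw in staged:
    print(raw, "->", try_convert_decimal(raw, 2))
```

Rows that come back as None can be routed to an error table for inspection instead of failing the whole import.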
What do those numbers represent? If they are just values to show you could just set float as datatype and you're good to go.
But if they are coordinates, currencies, or anything else you need for absolutely precise calculations, float can sometimes cause rounding problems. In that case, set the minimum precision you need with decimal and accept that any digits beyond it are dropped.
For instance, if most of the numbers have two decimals, you could go with 3 or 4 decimal places to be safe, but anything beyond that will be cut off.
If I have a column_1 whose type is short (range -32,768 to 32,767) and I write a SQL query:
Select *
from some_table
where column_1 < 2147483647
where 2147483647 is Java's int maximum (Integer.MAX_VALUE), how does SQL compare these types?
The SQL standard appears to state only that the behaviour is implementation-defined. (Meaning : RTFM of your particular DBMS.) (And I'm open to standing corrected.)
However, for operations of arithmetic the standard seems to mandate quite some things that go in the direction of "avoid data truncation at all cost". So it is not so unreasonable to assume that when it comes to numeric comparisons, the implementation will likewise do whatever it can to avoid any data truncation, i.e. in this particular case, "upcasting the SMALLINT to INTEGER". But that's assumption not legislation.
The difference between the 2 datatypes is one of storage size and possible value range.
SQL server will handle implicit conversions (when necessary) for performing comparisons. In your case short will be converted to int as 'temp', then two values will be compared and 'temp' will be deleted.
So you don't have to worry about it.
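For SQL Server specifically, the choice of which side gets converted follows the data type precedence list: the operand whose type has lower precedence is converted to the type with higher precedence. Here is a small Python sketch covering only the numeric part of that list; the function name comparison_type is made up for illustration.

```python
# Numeric subset of SQL Server's data type precedence list, highest precedence first.
PRECEDENCE = [
    "float", "real", "decimal", "money", "smallmoney",
    "bigint", "int", "smallint", "tinyint", "bit",
]


def comparison_type(left: str, right: str) -> str:
    """Return the type both operands are compared as (the one with higher precedence)."""
    return min(left, right, key=PRECEDENCE.index)


# A smallint column compared with the int literal 2147483647 is compared as int.
print(comparison_type("smallint", "int"))
```

Because every smallint value fits in an int, the widening is lossless, and the predicate in the question is simply true for every non-NULL row.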
I am having to create a second header line and am using the first record of the Query to do this. I am using a UNION All to create this header record and the second part of the UNION to extract the Data required.
I have one issue on one column.
,'Active Energy kWh'
UNION ALL
,SUM(cast(invc.UNITS as Decimal (15,0)))
Each side are 11 lines before and after the Union and I have tried all sorts of combinations but it always results in an error message.
The above gives me "Error converting data type varchar to numeric."
Any help would be much appreciated.
The error message indicates that one of your values in the INVC table UNITS column is non-numeric. I would hazard a guess that it's either a string (VARCHAR or similar) column or something else - and one of the values has ended up in a state where it cannot be parsed.
Unfortunately, without a conversion function that tolerates failure, there is no way other than checking small ranges of the table to gradually locate the 'bad' row (i.e. try running the query for a few million rows at a time, then reducing the number until you home in on the bad data). SQL Server 2012 and later have the TRY_CONVERT function, which returns NULL instead of raising an error when a conversion fails, enabling a more direct check. If your server is older, you would need to restore the database to a newer instance and experiment with it on that system.
(I'm assuming that an upgrade for this feature is out of the question - your best bet is likely just looking for the bad row).
The problem is that you are trying to mix header information with data information in a single query.
Obviously, all your header columns will be strings. But not all your data columns will be strings, and SQL Server is unhappy when you mix data types this way.
What you are doing is equivalent to this:
select 'header1' as col1 -- string
union all
select 123.5 -- decimal
The above query produces the following error:
Error converting data type varchar to numeric.
...which makes sense, because you are trying to mix both a string (the header) with a decimal field.
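The same failure can be imitated in a few lines of Python. This is only a model of the behavior, not how SQL Server is implemented: when a column mixes strings and numbers, the sketch converts every value to the numeric type (as the higher-precedence type wins in SQL Server), and the header text cannot be converted. The function name union_column is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def union_column(values):
    """Combine one column of a UNION ALL; numbers win over strings, as in SQL Server."""
    if not any(isinstance(v, Decimal) for v in values):
        return list(values)  # all strings: nothing to convert
    # At least one numeric value: every value must become numeric.
    return [Decimal(v) for v in values]


try:
    union_column(["header1", Decimal("123.5")])
except InvalidOperation:
    print("cannot convert 'header1' to a number")

# Converting the number to text first avoids the problem.
print(union_column(["header1", str(Decimal("123.5"))]))
```

The second call mirrors the usual fix: make the data side a string before the two halves are combined.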
So you have 2 options:
Remove the header columns from your query, and deal with header information outside your query.
Accept the fact that you'll need to convert the data type of every column to a string type. So when you have numeric data, you'll need to cast the column to varchar(n) explicitly.
In your case, it would mean adding the cast like this:
,'Active Energy kWh'
UNION ALL
,CAST(SUM(cast(invc.UNITS as Decimal (15,0))) AS VARCHAR(50)) -- Change 50 to appropriate value for your case
EDIT: Based on comment feedback, changed the cast to varchar to have an explicit length (varchar(n)) to avoid relying on the default length, which may or may not be long enough. OP knows the data, so OP needs to pick the right length.
This is on Microsoft SQL Server. We have a query where we are trying to join two tables on fields containing numeric data.
One table has the field defined as numeric(18,2) and the other table has the field defined as decimal(24,4). When joining with the native data types, the query hangs and we run out of patience before it will finish (left it running 6 min…). So we tried casting the two fields to be both numeric(18,2) and the query finished in under 10 seconds. So we tried casting the two fields to be both decimal(18,2) and again the query hangs. Does anyone know the difference between the decimal and numeric data types that would make them perform so differently?
DECIMAL and NUMERIC datatypes are the one and the same thing in SQL Server.
Quote from BOL:
Numeric data types that have fixed precision and scale.
decimal[ (p[ ,s] )] and numeric[ (p[ ,s] )]
Fixed precision and scale numbers. When maximum precision is used, valid values are from - 10^38 +1 through 10^38 - 1. The ISO synonyms for decimal are dec and dec(p, s). numeric is functionally equivalent to decimal.
From that, I'm surprised to hear of a difference. I'd expect the execution plans to be the same between the 2 routes, can you check?
Why are you using two datatypes to begin with? If they contain the same type of data (and joining on them implies they do), they should be the same datatype. Fix this and all your problems go away. Why waste server resources continually casting to match two fields that should be defined the same?
You may of course need to adjust the input variables for any insert or update queries to match what you chose as the datatype.
My guess is that it's not a matter of a specific difference between the two data types, but simply the fact that SQL Server needs to implicitly convert them to match for the join operation.
I don't know why there would be a difference from your first query and the second, where you explicitly convert, but I can see why there might be a problem when you convert to a datatype that doesn't match and then SQL Server has to implicitly convert them anyway (as in your third case). Maybe in the first case, SQL Server is implicitly converting both to decimal(24,4) so as not to lose data and that operation takes longer than converting the other way. Have you tried explicitly converting the numeric(18,2) to a decimal(24,4)?
What's the difference between the SQL datatype NUMERIC and DECIMAL ?
If databases treat these differently, I'd like to know how for at least:
SQL Server
Oracle
DB2
MySQL
PostgreSQL
Furthermore, are there any differences in how database drivers interpret these types?
They are the same for almost all purposes.
At one time different vendors used different names (NUMERIC/DECIMAL) for almost the same thing. SQL-92 made them the same with one minor difference which can be vendor specific:
NUMERIC must be exactly as precise as it is defined; so if you define 4 digits to the left of the decimal point and 4 digits to the right of it, the DB must always store 4 + 4 digits, no more, no less.
DECIMAL is free to allow higher numbers if that's easier to implement. This means that the database can actually store more digits than specified (due to the behind-the-scenes storage having space for extra digits). This means the database might allow storing 12345.0000 in the above example of 4 + 4 decimal places, but storing 1.00005 is still not allowed if doing so could affect any future calculations.
Most current database systems treat DECIMAL and NUMERIC either as perfect synonyms, or as two distinct types with exactly the same behavior. If the types are considered distinct at all, you might not be able to define a foreign key constraint on a DECIMAL column referencing a NUMERIC column or vice versa.
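To make the "4 + 4" example concrete, here is a minimal Python sketch of the strict NUMERIC reading: a value fits only if its integer and fractional digits both stay within the declared precision and scale. The function name fits_exactly is hypothetical, and the 4 + 4 case is expressed as precision 8, scale 4.

```python
from decimal import Decimal


def fits_exactly(value: Decimal, precision: int, scale: int) -> bool:
    """True if value needs at most (precision - scale) integer digits and scale fractional digits."""
    if value == 0:
        return True
    parts = value.normalize().as_tuple()  # normalize() drops trailing zeros
    frac_digits = max(0, -parts.exponent)
    int_digits = max(0, len(parts.digits) + parts.exponent)
    return frac_digits <= scale and int_digits <= precision - scale


print(fits_exactly(Decimal("1234.5678"), 8, 4))   # 4 + 4 digits
print(fits_exactly(Decimal("12345.0000"), 8, 4))  # 5 integer digits
print(fits_exactly(Decimal("1.00005"), 8, 4))     # 5 fractional digits
```

Under the standard, a DECIMAL implementation would be allowed to accept the second value anyway; the third is outside the declared scale for both types.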
They are synonyms, no difference at all.
At least in SQL Server and in the ANSI SQL standards.
This SO answer shows some difference in ANSI but I suspect in implementation they are the same
Postgres: No difference
In the documentation, the descriptions in table 8.1 look the same, yet it is not explained why the two are listed separately. According to Tom Lane's post:
There isn't any difference, in
Postgres. There are two type names because the SQL standard requires
us to accept both names. In a quick look in the standard it appears
that the only difference is this:
17)NUMERIC specifies the data type exact numeric, with the decimal
precision and scale specified by the <precision> and <scale>.
18)DECIMAL specifies the data type exact numeric, with the decimal
scale specified by the <scale> and the implementation-defined
decimal precision equal to or greater than the value of the
specified <precision>.
ie, for DECIMAL the implementation is allowed to allow more digits
than requested to the left of the decimal point. Postgres doesn't
exercise that freedom so there's no difference between these types for
us.
regards, tom lane
Also, a page further down, the docs state clearly that
The types decimal and numeric are equivalent. Both types are part of
the SQL standard.
and in the aliases table, decimal [ (p, s) ] is also listed as an alias for numeric [ (p, s) ].
They are actually equivalent, but they are independent types, and not technically synonyms in the way ROWVERSION and TIMESTAMP are - though they may have been referred to as synonyms in the documentation at one time. That is a slightly different meaning of synonym (i.e. they are indistinguishable except in name, rather than one being an alias for the other). Ironic, right?
What I interpret from the wording in MSDN is actually:
These types are identical, they just have different names.
Other than the type_id values, everything here is identical:
SELECT * FROM sys.types WHERE name IN (N'numeric', N'decimal');
I have absolutely no knowledge of any behavioral differences between the two, and going back to SQL Server 6.5, have always treated them as 100% interchangeable.
for DECIMAL(18,2) and NUMERIC(18,2)? Assigning one to the other is technically a "conversion"?
Only if you do so explicitly. You can prove this easily by creating a table and then inspecting the query plan for queries that perform explicit or - you might expect - implicit conversions. Here's a simple table:
CREATE TABLE [dbo].[NumDec]
(
[num] [numeric](18, 0) NULL,
[dec] [decimal](18, 0) NULL
);
Now run these queries and capture the plan:
DECLARE @num NUMERIC(18,0);
DECLARE @dec DECIMAL(18,0);
SELECT
CONVERT(DECIMAL(18,0), [num]), -- conversion
CONVERT(NUMERIC(18,0), [dec]) -- conversion
FROM dbo.NumDec
UNION ALL SELECT [num],[dec]
FROM dbo.NumDec WHERE [num] = @dec -- no conversion
UNION ALL SELECT [num],[dec]
FROM dbo.NumDec WHERE [dec] = @num; -- no conversion
We have explicit conversions where we asked for them, but no implicit conversions where we might have expected them. It seems the optimizer is treating them as interchangeable, too.
Personally, I prefer to use the term DECIMAL just because it's much more accurate and descriptive. BIT is "numeric" too.