PostgreSQL: how to create a table for big numbers like 1.33E+09 and -1.8E+09?

I am new to SQL and I need to create a table to accommodate a bunch of data that is in this format:
1.33E+09 -1.8E+09 58 -1.9E+09 2.35E+10 2.49E+10 2.49E+10 3.35E+08 etc.
How do I deal with it? I am not sure whether I can populate the table with the data as-is, or whether I need to convert it first in order to work with it... Any suggestions?
is that a BIGINT?

The correct data type for data like this is double precision or float.
It is obvious from the data that high precision is not necessary, so numeric would not be appropriate (it takes more storage and makes computations much slower).
real (a single-precision float) is only a good choice if you really need as few significant digits as your examples suggest and storage space is at a premium.
Your data are already in a format that PostgreSQL can accept for these data types, so there is no need for conversion.
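For example, here is a minimal sketch (the table and column names are made up for illustration); PostgreSQL accepts the scientific-notation literals directly:
CREATE TABLE readings (value double precision);    -- hypothetical table/column names
INSERT INTO readings (value)
VALUES (1.33E+09), (-1.8E+09), (58), (-1.9E+09), (2.35E+10), (3.35E+08);
SELECT value FROM readings;                         -- values come back as double precision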

The answer depends on whether you know the maximum range of data values your application might create.
A PostgreSQL numeric column will certainly hold any of the values you list above. Per the docs: "up to 131072 digits before the decimal point; up to 16383 digits after the decimal point".
See the data type reference here.
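If exact decimal storage turns out to matter more than speed, a hedged sketch with numeric (again with made-up names) looks almost the same; the same scientific-notation literals are accepted:
CREATE TABLE readings_exact (value numeric);        -- numeric stores the values exactly, at some cost in speed
INSERT INTO readings_exact (value) VALUES (1.33E+09), (2.49E+10);
SELECT value FROM readings_exact;                   -- stored exactly as 1330000000 and 24900000000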

Related

SQL data type - recommendation for 'unknown' number

I'm pulling in some external data into my MSSQL server. Several columns of the incoming data are marked as 'number' (it's a JSON file). It's millions of rows in size, and many of the columns appear to be decimal(18,2), like 23.33. But I can't be sure it will always be like that; in fact, a few have been 23.333 or longer numbers like 23.35555555, which will mess up my import.
So my question is: given that a column is going to have some kind of number imported into it, but I can't really be sure how big it will be or how many decimal places it will have... do I have to resort to making my column a varchar, or is there a very generic kind of number column I'm not thinking of?
Is there a max size decimal, sort of like using VARCHAR(8000) or VARCHAR(MAX) ?
update
This is the 'data type' of number that I'm pulling in:
https://dev.socrata.com/docs/datatypes/number.html#
Looks like it can be pretty much any number, as per their writing:
"Numbers are arbitrary precision, arbitrary scale numbers."
The way I handle things like this is to import the raw data into a staging table in a varchar(max) column.
Then I use TRY_PARSE() or TRY_CONVERT() when moving it to the desired datatype in my final destination table.
The point here is that the shape of the incoming data shouldn't determine the datatype you use. The datatype should be determined by the usage of the data once it's in your table. And if the incoming data doesn't fit, there are ways of making it fit.
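A hedged T-SQL sketch of that staging approach (the table and column names are made up, and the target precision and scale are just a guess you would adjust to your usage):
-- Stage the raw text, then convert; TRY_CONVERT returns NULL instead of failing on bad values.
CREATE TABLE dbo.ImportStaging (RawAmount VARCHAR(MAX));
CREATE TABLE dbo.ImportFinal   (Amount DECIMAL(38, 10) NULL);

INSERT INTO dbo.ImportFinal (Amount)
SELECT TRY_CONVERT(DECIMAL(38, 10), RawAmount)
FROM dbo.ImportStaging;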
What do those numbers represent? If they are just values to display, you could set float as the data type and you're good to go.
But if they are coordinates, currencies, or anything you need for absolutely precise calculations, float can give rounding problems. In that case, set your desired minimal precision with decimal and simply truncate whatever goes beyond it.
For instance, if most of the numbers have two decimal places, you could go with 3 or 4 decimal places to be safe; anything beyond that will be cut off.
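For instance, a hypothetical column declared with a couple of extra decimal places (note that SQL Server rounds, rather than truncates, when converting to a lower scale):
CREATE TABLE dbo.Prices (Price DECIMAL(18, 4));
INSERT INTO dbo.Prices (Price) VALUES (23.33), (23.333), (23.35555555);
SELECT Price FROM dbo.Prices;   -- 23.3300, 23.3330, 23.3556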

Float type storing values in format "2.46237846387469E+15"

I have a table ProductAmount with columns
Id [BIGINT]
Amount [FLOAT]
Now when I pass a value from my form to the table, it gets stored in the format 2.46237846387469E+15, whereas the actual value was 2462378463874687. Any ideas why this value is being converted and how to stop it?
It is not being converted; that is just how the floating point value is displayed. What you are seeing is scientific (exponential) notation.
I am guessing that you don't want to store the data that way. You can alter the column to use a fixed format representation:
alter table ProductAmount alter column Amount decimal(20, 0);
This assumes that you do not want any decimal places. You can read more about decimal formats in the documentation.
I would strongly discourage you from using float unless:
You have a real floating point number (say an expected value from a statistical calculation).
You have a wide range of values (say, 0.00000001 to 1,000,000,000,000,000).
You only need a fixed number of digits of precision over a wide range of magnitudes.
Floating point numbers are generally not needed for general-purpose and business applications.
The value gets stored in a binary format, because this is what you specified by requesting FLOAT as the data type for the column.
The value that you store in the field is represented exactly, because 64-bit FLOAT uses 52 bits to represent the mantissa*. Even though you see 2.46237846387469E+15 when selecting the value back, it's only the presentation that is slightly off: the actual value stored in the database matches the data that you inserted.
But I want to store 2462378463874687 as a value in my db
You are already doing that. This is the exact value stored in the field. You just cannot see it, because the query tool in SQL Server Management Studio formats it using scientific notation. When you do any computations on the value, or read it back into a double field in your program, you will get 2462378463874687 back.
If you would like to see the exact number in your select query in SQL Management Studio, use CONVERT:
CONVERT (VARCHAR(50), float_field, 128) -- See note below
Note 1: 128 is a deprecated style. It will work with SQL Server 2008, which is one of the tags of your question, but on SQL Server 2016 and above you need to use 3 instead.
Note 2: Since the name of the column is Amount, chances are good that you are looking for a different data type. Look into the decimal data types, which provide a much better fit for representing monetary amounts.
* 2462378463874687 needs 52 significant bits, which still fits: with the 52 stored mantissa bits plus the implicit leading bit, a 64-bit float represents every integer up to 2^53 exactly.
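Putting that together with the question's table (ProductAmount and Amount come from the question; the style number depends on your SQL Server version, as noted above):
SELECT CONVERT(VARCHAR(50), Amount, 3) AS AmountText   -- use style 128 on SQL Server 2008
FROM ProductAmount;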

Losing decimal places in hive tables

I created float columns for a Hive table. I then uploaded some lat/lng data:
-74.223827433599993,40.698842158200002
-117.57601661229999,34.044852414099999
-81.055917609600002,29.239413820500001
-80.369884327199998,25.789129988199999
When I query the data out or into another table, the rounding is significant:
-74.22383,40.69884
-117.57602,34.044853
-81.055916,29.239414
-80.36988,25.78913
I think you want to use the DOUBLE data type rather than FLOAT.
BTW the 6th decimal place corresponds to 0.1m at the equator which is possibly much less than the margin of error on your data! (See the answer to this question.)
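A minimal Hive sketch (table and column names are made up): DOUBLE keeps roughly 15-16 significant digits, which is plenty for these coordinates, whereas FLOAT is single precision and only holds about 7.
-- Hive 0.14+ syntax for the INSERT ... VALUES form
CREATE TABLE coordinates (lng DOUBLE, lat DOUBLE);
INSERT INTO TABLE coordinates VALUES (-74.223827433599993, 40.698842158200002);
SELECT lng, lat FROM coordinates;   -- no rounding to 6-7 digits as with FLOAT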

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 digits. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero-padded numbers that need to be treated as numeric values in some scenarios (e.g. sorting) and as textual values in others (e.g. addresses), is always a pain, and there is no one answer that is best for all users. At my company we have a database that stores codes as text (not Medicare APC codes), and we must pad them with zeros so they sort properly when used in an ORDER BY.
Do not use a numeric data type for this, because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes, so the only benefit to storing them as a number would be to ensure proper sorting, and that can be done with the code stored as text by padding it with zeros where needed. If you use a numeric data type, then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR, or let SQL Server do it implicitly; since implicit conversions should always be avoided, that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type, the question then is whether to use VARCHAR or CHAR. While many who have posted say VARCHAR, I would suggest you go with CHAR set to a length of 4. Why?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be 4, and probably will stay 4 for the foreseeable future. SQL Server handles the storage of CHAR and VARCHAR data differently. SQL Server's Books Online (BOL) says:
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can't say for certain whether this is true for SQL Server 2008 and up, but in earlier versions the use of a VARCHAR data type carries an extra overhead of 2 bytes per row for each VARCHAR column, to record its length. If the data stored is always the same size, and in your scenario it sounds like it is, then those extra bytes are wasted.
In the end it's up to you whether you like CHAR or VARCHAR better, but definitely don't use a numeric data type to store a fixed-length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree. Use
CHAR(4)
and for the check constraint use
check( APC_CODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force only 4-digit codes to be accepted...
varchar(4)
Optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw an exception in Oracle when a non-numeric value is inserted; in other RDBMSs, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is, if this is unlikely to change in the near future), a better way would be to leave the database column as is and handle the formatting (to include leading zeros) in the places where you use this field's values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, either using LIKE with a digit pattern as shown above, or in T-SQL:
check( isnumeric(APC_CODE) = 1)
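A hedged T-SQL sketch pulling those suggestions together (the table name is made up; the digit pattern avoids ISNUMERIC's quirks with values like '$1' or '1e4'):
CREATE TABLE dbo.MedicareAPC (
    APC_CODE CHAR(4) NOT NULL,
    CONSTRAINT CK_APC_CODE_Digits CHECK (APC_CODE LIKE '[0-9][0-9][0-9][0-9]')
);
INSERT INTO dbo.MedicareAPC (APC_CODE) VALUES ('0012');   -- leading zero preserved
-- INSERT INTO dbo.MedicareAPC (APC_CODE) VALUES ('12');  -- rejected by the check constraint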

Is the CHAR datatype in SQL obsolete? When do you use it?

The title pretty much frames the question. I have not used CHAR in years. Right now, I am reverse-engineering a database that has CHAR all over it, for primary keys, codes, etc.
How about a CHAR(30) column?
Edit:
So the general opinion seems to be that CHAR is perfectly fine for certain things. I, however, think that you can design a database schema that does not need "these certain things", and thus does not require fixed-length strings. With the bit, uniqueidentifier, varchar, and text types, it seems that in a well-normalized schema you get a certain elegance that you don't get when you use encoded string values. Thinking in fixed lengths, no offense meant, seems to be a relic of the mainframe days (I learned RPG II once myself). I believe it is obsolete, and I have not heard a convincing argument from you claiming otherwise.
I use char(n) for codes and varchar(m) for descriptions. Char(n) seems to result in better performance because the data doesn't need to move around when the size of the contents changes.
Where the nature of the data dictates the length of the field, I use CHAR. Otherwise VARCHAR.
CHARs are still faster for processing than VARCHARs in the DBMS I know well. Their fixed size allow for optimizations that aren't possible with VARCHARs. In addition, the storage requirements are slightly less for CHARS since no length has to be stored, assuming most of the rows need to fully, or near-fully, populate the CHAR column.
This is less of an impact (in terms of percentage) with a CHAR(30) than a CHAR(4).
As to usage, I tend to use CHARs when either:
the fields will generally always be close to or at their maximum length (stock codes, employee IDs, etc); or
the lengths are short (less than 10).
Anywhere else, I use VARCHARs.
I use CHAR when the length of the value is fixed. For example, we might generate a code using some algorithm that always returns a code of a specific fixed length, let's say 13.
Otherwise, I find VARCHAR better. One more reason to use VARCHAR is that when you get the value back in your application you don't need to trim it. With CHAR you always get the full length of the column whether the value fills it or not; it gets padded with spaces, so you end up trimming every value, and forgetting to do that leads to errors.
For PostgreSQL, the documentation states that char() has no advantage in storage space over varchar(); the only difference is that it's blank-padded to the specified length.
Having said that, I still use char(1) or char(3) for one-character or three-character codes. I think that the clarity due to the type specifying what the column should contain provides value, even if there are no storage or performance advantages. And yes, I typically use check constraints or foreign key constraints as well. Apart from those cases, I generally just stick with text rather than using varchar(). Again, this is informed by the database implementation, which automatically switches from inline to out-of-line storage if the value is large enough, which some other database implementations don't do.
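For example, a hypothetical PostgreSQL lookup table with a fixed-length code column and a check constraint might look like this:
CREATE TABLE order_status (
    status_code char(3) PRIMARY KEY,            -- hypothetical three-letter code
    description text NOT NULL,
    CHECK (status_code ~ '^[A-Z]{3}$')          -- exactly three uppercase letters
);
INSERT INTO order_status VALUES ('NEW', 'Newly created order');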
CHAR isn't obsolete; it just should only be used if the length of the field never varies. In the average database this applies to very few fields, mostly code fields like state abbreviations, which are a standard 2-character field if you use the postal codes. Using CHAR where the field length is variable means there will be a lot of trimming going on, which is extra, unnecessary work, and the database should be refactored.