Minimum precision SQL numbers - sql

What is the best way to check for minimum precision with numbers in SQL with Oracle database?
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber LIKE '________________')
or
CreditCardNumber NUMBER(16) NOT NULL
CHECK (REGEXP_LIKE(CreditCardNumber, '^\d{16}'))
I understand this is presentation level checking but I heard it's still good practice to avoid illogical data on the database level.

None of your function really work as expected as you will end-up having an implicit number to string conversion. Losing leading zeros. Maybe this is not a problem in that particular case, assuming a credit card number never start with a leading zero (and never will ? — according to ISO/IEC 7812 the leading number could be a 0 in some corner cases).
However, notice you don't have any benefit here in using the NUMBER type, as you will never perform calculation on the credit card "number". So, for that kind of data (credit card "numbers, telephone "numbers", zip codes, ...), I would strongly suggest you to use a character type (VARCHAR2 or CHAR if you prefer) instead, and at the very least check using an appropriate regexp than only digits are part of the string. Would be better to validate the checksum as suggested by #Allan in his answer though.
In addition, even if 16 digits is the most common case, bank card numbers are variable length -- from 12 to 19 digits (according to http://www.watersprings.org/pub/id/draft-eastlake-card-map-08.txt as I don't have access to the ISO official document).
Finally, concerning credit card numbers, you have to remember that depending your local regulation you are not necessarily allowed to store them unencrypted...

NUMBER(16) will only allow integers (i.e. if you try to insert '10.1', it will round to '10').
Keep in mind too that credit card numbers aren't always 16 digits - American Express uses 15.
The only benefit you'll gain from storing credit cards numbers in the NUMBER type is storage space. Since the digits are packed at a ratio of 9 digits to 4 bytes, 8 bytes will store 16 digits. However, every interaction with the data will require a type conversion to or from text, so you'll have to weigh the costs of storage, processing, and ease of coding.

The obvious way to validate that the value is wide enough is to check the value numerically:
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber >= 1000000000000000)
However, as #BenGrimm points out, this may not be valid for all credit card numbers.
One way to validate the card lenght per providers is to have a lookup table with each provider that you accept and the length of their card numbers. Again, you'd have to use a trigger to check against that, but it would allow you to verify that the length is appropriate is precisely correct.
A better validation might be to implement the Luhn algorithm in a function and use it to validate the column value via a trigger.
Finally, to reiterate what Sylvain Leroux pointed out, this should all be academic. You shouldn't be storing credit card numbers in plain text and may even be legally or contractually prevented from doing so.

You could potentially use ceiling(log10(Number)) = 16. I think for a credit card number there are better ways to check though.

Related

Which datatype should be used for currency?

Seems like Money type is discouraged as described here.
My application needs to store currency, which datatype shall I be using? Numeric, Money or FLOAT?
Your source is in no way official. It dates to 2011 and I don't even recognize the authors. If the money type was officially "discouraged" PostgreSQL would say so in the manual - which it doesn't.
For a more official source, read this thread in pgsql-general (from just this week!), with statements from core developers including D'Arcy J.M. Cain (original author of the money type) and Tom Lane:
Related answer (and comments!) about improvements in recent releases:
Jasper Report: unable to get value for field 'x' of class 'org.postgresql.util.PGmoney'
Basically, money has its (very limited) uses. The Postgres Wiki suggests to largely avoid it, except for those narrowly defined cases. The advantage over numeric is performance.
decimal is just an alias for numeric in Postgres, and widely used for monetary data, being an "arbitrary precision" type. The manual:
The type numeric can store numbers with a very large number of digits.
It is especially recommended for storing monetary amounts and other
quantities where exactness is required.
Personally, I like to store currency as integer representing Cents if fractional Cents never occur (basically where money makes sense). That's more efficient than any other of the mentioned options.
Numeric with forced 2 units precision. Never use float or float like datatype to represent currency because if you do, people are going to be unhappy when the financial report's bottom line figure is incorrect by + or - a few dollars.
The money type is just left in for historical reasons as far as I can tell.
Take this as an example: 1 Iranian Rial equals 0.000030 United States Dollars. If you use fewer than 5 fractional digits then 1 IRR will be rounded to 0 USD after conversion. I know we're splitting rials here, but I think that when dealing with money you can never be too safe.
Your choices are:
bigint : store the amount in cents. This is what EFTPOS transactions use.
decimal(12,2) : store the amount with exactly two decimal places. This what most general ledger software uses.
float : terrible idea - inadequate accuracy. This is what naive developers use.
Option 2 is the most common and easiest to work with. Make the precision (12 in my example, meaning 12 digits in all) as large or small as works best for you.
Note that if you are aggregating multiple transactions that were the result of a calculation (eg involving an exchange rate) into a single value that has business meaning, the precision should be higher to provide a accurate macro value; consider using something like decimal(18, 8) so the sum is accurate and the individual values can be rounded to cent precision for display.
Use a 64-bit integer stored as bigint
Store in the small currency unit (cents) or use a big multiplier to create larger integers if cents are not granular enough. I recommend something like micro-dollars where dollars are divided by 1 million.
For example: $5,123.56 can be stored as 5123560000 microdollars.
Simple to use and compatible with every language.
Enough precision to handle fractions of a cent.
Works for very small per-unit pricing (like ad impressions or API charges).
Smaller data size for storage than strings or numerics.
Easy to maintain accuracy through calculations and apply rounding at the final output.
I keep all of my monetary fields as:
numeric(15,6)
It seems excessive to have that many decimal places, but if there's even the slightest chance you will have to deal with multiple currencies you'll need that much precision for converting. No matter what I'm presenting a user, I always store to US Dollar. In that way I can readily convert to any other currency, given the conversion rate for the day involved.
If you never do anything but one currency, the worst thing here is that you wasted a bit of space to store some zeroes.
Use BigInt to store currency as a positive integer representing the monetary value in the smallest currency unit (e.g., 100 cents to store $1.00 or 100 to store ¥100 (Japanese yen, a zero-decimal currency). This is what Stripe does--one the most important financial service companies for global ecommerce.
Source: see "Zero-decimal currencies" at https://stripe.com/docs/currencies
This is not a direct answer, but an example of why float is not the best data type for currency.
Because of the way floating point is represented internally, it is more susceptible to round off errors.
In our own decimal system, you’ll get round off errors whenever you divide by anything other than 2 or 5, which are the factors of 10. In binary, it’s only 2 and not 5, so even “clean” decimals, such as 0.2 (1/5) are at risk.
You can see this if you try the following:
select
0.1::float + 0.2::float as floats, -- 0.30000000000000004
0.1::numeric + 0.2::numeric as numerics --- 0.3
;
That’s the sort of thing that drives auditors round the bend.
My personal recommendation is decimal with the precision according to your needs. Decimal with precision = 0 can be the option if you want to store the integer number of currency minor units (e.g. cents) and you have troubles handling decimals in your programming language.
To find out the needed precision you need to consider the following:
Types of currencies you support (they can have different number of decimals). Cryptocurrencies have up to 18 decimals (ETH). The number of decimals can change over time due to inflation.
Storing prices of small units of goods (probably as a result of conversion from another currency) or having accumulators (accumulate 10% fee from 1 cent transactions until the sum reaches 1 cent) can require using more decimals than are defined for a currency
Storing integer number of minimal units can lead to the need of rescaling values in the future if you need to change the precision. If you use decimals, it's much easier.
Note, that you also need to find the corresponding data type in the programming language you use.
More details and caveats in the article.

Swedish "personnummer" (personal identity number) in SQL

This is a specific instance of an old problem: How to store "numbers" (e.g. phone numbers, IP addresses, social security numbers) in SQL databases?
Background: In Sweden, Personal Identity Numbers ("personnummer") are extremely common: You use them when communicating with the government, the bank, your employer, etc. People born in Sweden are assigned them when born. My immigrant friends lament the dark couple of weeks before they got a personnummer and could finally get a debit card and start looking for jobs.
My organization needs to store personnummer of our members. We have a SQL database for this. How should I store the data?
From Wikipedia, regarding the format of a personnummer:
The personal identity number consists of 10 digits and a hyphen. The first six correspond to the person's birthday, in YYMMDD form. They are followed by a hyphen. People over the age of 100 replace the hyphen with a plus sign. The seventh through ninth are a serial number. An odd ninth number is assigned to males and an even ninth number is assigned to females. Some county authorities, such as Stockholm, and some banks, have started using 12 digit numbers to allow YYYYMMDD. This format is also used on some Swedish ID-cards[clarification needed] and on the Swedish European Health Insurance Cards but not on state-issued identity documents.
The tenth digit is a checksum which was introduced in 1967 when the system was computerized.
So, a personnummer could be "120101-3842" for a person born this year. This is also commonly formatted as "20120101-3842" because of Y2K and "replacing the hyphen with a plus sign" is not well-known.
In a database column, I imagine I can:
Store it as a VARCHAR, formatted as "120101-3842", "20120101-3842" or "201201013842" (shaving of a byte by getting of the superfluous hyphen in the YYYYMMDD-format).
Store the full YYYYMMDDXXXX as an INTEGER, which is too big for 32 bits but fits without problems in 64 bits.
There won't be any issues with leading zeroes in this case, and using a VARCHAR is almost twice the size. Unlike IP addresses, storing this number as an INTEGER does not make it harder to read for a human (i.e. "127.0.0.1" compared to 2130706433).
I appreciate the "strictness" of an INTEGER column but also feel that this might run into unseen issues.
EDIT: We have a real need to validate this input with the checksum et cetera, which requires doing math on the indivdual digits (multiplying, summing etc). Since digits aren't really ... uh... part of a quantity, but of decimal formatting, it might make sense to consider it a varchar after all.
Use VARCHAR with a fixed length because it is the most simple approach. And I don't think that your organisation will store the number of all 9.5 million inhabitants so that saving space is a real design goal? :)
So, as I understand it, the hyphen / plus signs are only required for the format with 2 digit year.
If I were you, I would on the application side convert to the 4 digit year format (And drop the hyphen). Then store the resulting value as an integer. As you have stated, this will save space, and will allow you to mathematically transform the values (Although I imagine that on personal numbers this may be irrelevant).
I think the key here is that you should choose a single format rather than trying to manage two different formats in the database. This will also help to lead to application consistency. When it comes to external applications that require one or another format, you can place a transform into the transfer code.
On a side note, it should be fairly trivial to create a trigger that would automatically assign the 2 digit year format (As long as you replace the hyphen / plus with a digit) To the 4 year format.
I would store the canonical form 201201013842 as a CHAR (rather than a VARCHAR).
The bottom line is that you do not control the semantics of the number (Swedish authorities do). If at some point they decide to add non numeric characters to the number (as the number already does in the older format), you will be better equipped to deal with the change.
We have the same problem and we currently store it as yyyyMMdd-xxxx, but if i where to redesign this today i would store the yyyyMMdd in a date field as that would handle the validation of the date, then i would store the 4 other values in a nchar(4) and add a constraint to ensure its only numbers.

List of Best Practice MySQL Data Types

Is there a list of best practice MySQL data types for common applications. For example, the list would contain the best data type and size for id, ip address, email, subject, summary, description content, url, date (timestamp and human readable), geo points, media height, media width, media duration, etc
Thank you!!!
i don't know of any, so let's start one!
numeric ID/auto_increment primary keys: use an unsigned integer. do not use 0 as a value. and keep in mind the maximum value of of the various sizes, i.e. don't use int if you don't need 4 billion values when the 16 million offered by mediumint will suffice.
dates: unless you specifically need dates/times that are outside the supported range of mysql's DATE and TIME types, use them! if you instead use unix timestamps, you have to convert them to use the built-in date and time functions. if your app needs unix timestamps, you can always convert the standard date and time data types on the way out using unix_timestamp().
ip addresses: use inet_aton() and inet_ntoa() since it easily compacts an ip address in to 4 bytes and gives you the ability to do range searches that utilize indexes.
Integer Display Width You likely define your integers something like this "INT(4)" but have been baffled by the fact that (4) has no real effect on the stored numbers. In other words, you can store numbers like 999999 just fine. The reason is that for integers, (4) is the display width, and only has an effect if used with the ZEROFILL modifier. Further, this is for display purposes only, so you could define a column as "INT(4) ZEROFILL" and store 99999. If you stored 999, the mysql REPL (console) would output 0999 when you've selected this column.
In other words, if you don't need the ZEROFILL stuff, you can leave off the display width.
Money: Use the Decimal data type. Based on real-world production scenarios I recommend (19,8).
EDIT: My original recommendation was (19,4); however, I've recently run into a production issue where the client reported that they absolutely needed decimal with a "scale" of "8"; thus "4" wasn't enough and was causing improper tax calculations. I now recommend (19,8) based on a real-world scenario. I would love to hear stories needing a more granular scale.

Is there any reason for numeric rather than int in T-SQL?

Why would someone use numeric(12, 0) datatype for a simple integer ID column? If you have a reason why this is better than int or bigint I would like to hear it.
We are not doing any math on this column, it is simply an ID used for foreign key linking.
I am compiling a list of programming errors and performance issues about a product, and I want to be sure they didn't do this for some logical reason. If you follow this link:
http://msdn.microsoft.com/en-us/library/ms187746.aspx
... you can see that the numeric(12, 0) uses 9 bytes of storage and being limited to 12 digits, theres a total of 2 trillion numbers if you include negatives. WHY would a person use this when they could use a bigint and get 10 million times as many numbers with one byte less storage. Furthermore, since this is being used as a product ID, the 4 billion numbers of a standard int would have been more than enough.
So before I grab the torches and pitch forks - tell me what they are going to say in their defense?
And no, I'm not making a huge deal out of nothing, there are hundreds of issues like this in the software, and it's all causing a huge performance problem and using too much space in the database. And we paid over a million bucks for this crap... so I take it kinda seriously.
Perhaps they're used to working with Oracle?
All numeric types including ints are normalized to a standard single representation among all platforms.
There are many reasons to use numeric - for example - financial data and other stuffs which need to be accurate to certain decimal places. However for the example you cited above, a simple int would have done.
Perhaps sloppy programmers working who didn't know how to to design a database ?
Before you take things too seriously, what is the data storage requirement for each row or set of rows for this item?
Your observation is correct, but you probably don't want to present it too strongly if you're reducing storage from 5000 bytes to 4090 bytes, for example.
You don't want to blow your credibility by bringing this up and having them point out that any measurable savings are negligible. ("Of course, many of our lesser-experienced staff also make the same mistake.")
Can you fill in these blanks?
with the data type change, we use
____ bytes of disk space instead of ____
____ ms per query instead of ____
____ network bandwidth instead of ____
____ network latency instead of ____
That's the kind of thing which will give you credibility.
How old is this application that you are looking into?
Previous to SQL Server 2000 there was no bigint. Maybe its just something that has made it from release to release for many years without being changed or the database schema was copied from an application that was this old?!?
In your example I can't think of any logical reason why you wouldn't use INT. I know there are probably reasons for other uses of numeric, but not in this instance.
According to: http://doc.ddart.net/mssql/sql70/da-db_1.htm
decimal
Fixed precision and scale numeric data from -10^38 -1 through 10^38 -1.
numeric
A synonym for decimal.
int
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
It is impossible to know if there is a reason for them using decimal, since we have no code to look at though.
In some databases, using a decimal(10,0) creates a packed field which takes up less space. I know there are many tables around my work that use that. They probably had the same kind of thought here, but you have gone to the documentation and proven that to be incorrect. More than likely, I would say it will boil down to a case of "that's the way we have always done it, because someone one time said it was better".
It is possible they spend a LOT of time in MS Access and see 'Number' often and just figured, its a number, why not use numeric?
Based on your findings, it doesn't sound like they are the optimization experts, and just didn't know. I'm wondering if they used schema generation tools and just relied on them too much.
I wonder how efficient an index on a decimal value (even if 0 scale is set) for a primary key compares to a pure integer value.
Like Mark H. said, other than the indexing factor, this particular scenario likely isn't growing the database THAT much, but if you're looking for ammo, I think you did find some to belittle them with.
In your citation, the decimal shows precision of 1-9 as using 5 bytes. Your column apparently has 12,0 - using 4 bytes of storage - same as integer.
Moreover, INT, datatype can go to a power of 31:
-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
While decimal is much larger to 38:
- 10^38 +1 through 10^38 - 1
So the software creator was actually providing more while using the same amount of storage space.
Now, with the basics out of the way, the software creator actually limited themselves to just 12 numbers or 123,456,789,012 (just an example for place holders not a maximum number). If they used INT they could not scale this column - it would go up to the full 31 digits. Perhaps there is a business reason to limit this column and associated columns to 12 digits.
An INT is an INT, while a DECIMAL is scalar.
Hope this helps.
PS:
The whole number argument is:
A) Whole numbers are 0..infinity
B) Counting (Natural) numbers are 1..infinity
C) Integers are infinity (negative) .. infinity (positive)
D) I would not cite WikiANYTHING for anything. Come on, use a real source! May as well be http://MyPersonalMathCite.com

Storing money in a decimal column - what precision and scale?

I'm using a decimal column to store money values on a database, and today I was wondering what precision and scale to use.
Since supposedly char columns of a fixed width are more efficient, I was thinking the same could be true for decimal columns. Is it?
And what precision and scale should I use? I was thinking precision 24/8. Is that overkill, not enough or ok?
This is what I've decided to do:
Store the conversion rates (when applicable) in the transaction table itself, as a float
Store the currency in the account table
The transaction amount will be a DECIMAL(19,4)
All calculations using a conversion rate will be handled by my application so I keep control of rounding issues
I don't think a float for the conversion rate is an issue, since it's mostly for reference, and I'll be casting it to a decimal anyway.
Thank you all for your valuable input.
If you are looking for a one-size-fits-all, I'd suggest DECIMAL(19, 4) is a popular choice (a quick Google bears this out). I think this originates from the old VBA/Access/Jet Currency data type, being the first fixed point decimal type in the language; Decimal only came in 'version 1.0' style (i.e. not fully implemented) in VB6/VBA6/Jet 4.0.
The rule of thumb for storage of fixed point decimal values is to store at least one more decimal place than you actually require to allow for rounding. One of the reasons for mapping the old Currency type in the front end to DECIMAL(19, 4) type in the back end was that Currency exhibited bankers' rounding by nature, whereas DECIMAL(p, s) rounded by truncation.
An extra decimal place in storage for DECIMAL allows a custom rounding algorithm to be implemented rather than taking the vendor's default (and bankers' rounding is alarming, to say the least, for a designer expecting all values ending in .5 to round away from zero).
Yes, DECIMAL(24, 8) sounds like overkill to me. Most currencies are quoted to four or five decimal places. I know of situations where a decimal scale of 8 (or more) is required but this is where a 'normal' monetary amount (say four decimal places) has been pro rata'd, implying the decimal precision should be reduced accordingly (also consider a floating point type in such circumstances). And no one has that much money nowadays to require a decimal precision of 24 :)
However, rather than a one-size-fits-all approach, some research may be in order. Ask your designer or domain expert about accounting rules which may be applicable: GAAP, EU, etc. I vaguely recall some EU intra-state transfers with explicit rules for rounding to five decimal places, therefore using DECIMAL(p, 6) for storage. Accountants generally seem to favour four decimal places.
PS Avoid SQL Server's MONEY data type because it has serious issues with accuracy when rounding, among other considerations such as portability etc. See Aaron Bertrand's blog.
Microsoft and language designers chose banker's rounding because hardware designers chose it [citation?]. It is enshrined in the Institute of Electrical and Electronics Engineers (IEEE) standards, for example. And hardware designers chose it because mathematicians prefer it. See Wikipedia; to paraphrase: The 1906 edition of Probability and Theory of Errors called this 'the computer's rule' ("computers" meaning humans who perform computations).
We recently implemented a system that needs to handle values in multiple currencies and convert between them, and figured out a few things the hard way.
NEVER USE FLOATING POINT NUMBERS FOR MONEY
Floating point arithmetic introduces inaccuracies that may not be noticed until they've screwed something up. All values should be stored as either integers or fixed-decimal types, and if you choose to use a fixed-decimal type then make sure you understand exactly what that type does under the hood (ie, does it internally use an integer or floating point type).
When you do need to do calculations or conversions:
Convert values to floating point
Calculate new value
Round the number and convert it back to an integer
When converting a floating point number back to an integer in step 3, don't just cast it - use a math function to round it first. This will usually be round, though in special cases it could be floor or ceil. Know the difference and choose carefully.
Store the type of a number alongside the value
This may not be as important for you if you're only handling one currency, but it was important for us in handling multiple currencies. We used the 3-character code for a currency, such as USD, GBP, JPY, EUR, etc.
Depending on the situation, it may also be helpful to store:
Whether the number is before or after tax (and what the tax rate was)
Whether the number is the result of a conversion (and what it was converted from)
Know the accuracy bounds of the numbers you're dealing with
For real values, you want to be as precise as the smallest unit of the currency. This means you have no values smaller than a cent, a penny, a yen, a fen, etc. Don't store values with higher accuracy than that for no reason.
Internally, you may choose to deal with smaller values, in which case that's a different type of currency value. Make sure your code knows which is which and doesn't get them mixed up. Avoid using floating point values even here.
Adding all those rules together, we decided on the following rules. In running code, currencies are stored using an integer for the smallest unit.
class Currency {
String code; // eg "USD"
int value; // eg 2500
boolean converted;
}
class Price {
Currency grossValue;
Currency netValue;
Tax taxRate;
}
In the database, the values are stored as a string in the following format:
USD:2500
That stores the value of $25.00. We were able to do that only because the code that deals with currencies doesn't need to be within the database layer itself, so all values can be converted into memory first. Other situations will no doubt lend themselves to other solutions.
And in case I didn't make it clear earlier, don't use float!
When handling money in MySQL, use DECIMAL(13,2) if you know the precision of your money values or use DOUBLE if you just want a quick good-enough approximate value.
So if your application needs to handle money values up to a trillion dollars (or euros or pounds), then this should work:
DECIMAL(13, 2)
Or, if you need to comply with GAAP then use:
DECIMAL(13, 4)
The money datatype on SQL Server has four digits after the decimal.
From SQL Server 2000 Books Online:
Monetary data represents positive or negative amounts of money. In Microsoft® SQL Server™ 2000, monetary data is stored using the money and smallmoney data types. Monetary data can be stored to an accuracy of four decimal places. Use the money data type to store values in the range from -922,337,203,685,477.5808 through +922,337,203,685,477.5807 (requires 8 bytes to store a value). Use the smallmoney data type to store values in the range from -214,748.3648 through 214,748.3647 (requires 4 bytes to store a value). If a greater number of decimal places are required, use the decimal data type instead.
4 decimal places would give you the accuracy to store the world's smallest currency sub-units. You can take it down further if you need micropayment (nanopayment?!) accuracy.
I too prefer DECIMAL to DBMS-specific money types, you're safer keeping that kind of logic in the application IMO. Another approach along the same lines is simply to use a [long] integer, with formatting into ¤unit.subunit for human readability (¤ = currency symbol) done at the application level.
If you were using IBM Informix Dynamic Server, you would have a MONEY type which is a minor variant on the DECIMAL or NUMERIC type. It is always a fixed-point type (whereas DECIMAL can be a floating point type). You can specify a scale from 1 to 32, and a precision from 0 to 32 (defaulting to a scale of 16 and a precision of 2). So, depending on what you need to store, you might use DECIMAL(16,2) - still big enough to hold the US Federal Deficit, to the nearest cent - or you might use a smaller range, or more decimal places.
Sometimes you will need to go to less than a cent and there are international currencies that use very large demoniations. For example, you might charge your customers 0.088 cents per transaction. In my Oracle database the columns are defined as NUMBER(20,4)
If you're going to be doing any sort of arithmetic operations in the DB (multiplying out billing rates and so on), you'll probably want a lot more precision than people here are suggesting, for the same reasons that you'd never want to use anything less than a double-precision floating point value in application code.
I would think that for a large part your or your client's requirements should dictate what precision and scale to use. For example, for the e-commerce website I am working on that deals with money in GBP only, I have been required to keep it to Decimal( 6, 2 ).
A late answer here, but I've used
DECIMAL(13,2)
which I'm right in thinking should allow upto 99,999,999,999.99.