Strings or integers for Steam IDs - sql

What is the preferred data type for storing Steam IDs? These IDs are very similar to credit card numbers, but the use cases differ. So far I have been using an unsigned big integer, but I'm not 100% sure about it. Could it cause issues if an ID starts with a zero? Example ID: 76561197960287930

In general, numbers take less space to store on disk and to transfer from the database to the application than strings do. For the same reason they are faster to compare, e.g. in the WHERE clause of a query.
See your database's documentation for the number of bytes needed to store numeric types and string types.
In the database, numbers are stored without leading zeros. If the numbers always have a fixed width, you could pad them with leading zeros in your application after loading them from the database.
But if the numbers can have leading zeros, strings are easier to handle, because you do not have to implement additional logic for edge cases such as leading zeros.
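As a rough illustration of the two points above (written in Python purely as a sketch; the fixed width of 17 digits is an assumption taken from the example ID in the question), this checks that the ID fits a signed 64-bit column and shows how leading zeros could be restored on the application side:

```python
# Illustrative sketch only. ID_WIDTH = 17 is an assumption based on the
# example ID in the question, not a documented property of Steam IDs.
SIGNED_BIGINT_MAX = 2**63 - 1
ID_WIDTH = 17


def fits_signed_bigint(value: int) -> bool:
    """True if the value can be stored in a signed 64-bit integer column."""
    return 0 <= value <= SIGNED_BIGINT_MAX


def to_fixed_width(value: int, width: int = ID_WIDTH) -> str:
    """Re-create the fixed-width text form, padding with leading zeros."""
    return str(value).zfill(width)


steam_id = 76561197960287930
print(fits_signed_bigint(steam_id))     # True
print(to_fixed_width(steam_id))         # 76561197960287930
print(to_fixed_width(int("00042"), 5))  # 00042
```

The padding step only works when every ID has the same width; if the width can vary, the leading zeros are lost for good once the value is stored as a number.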

Related

Minimum precision SQL numbers

What is the best way to check for minimum precision with numbers in SQL with Oracle database?
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber LIKE '________________')
or
CreditCardNumber NUMBER(16) NOT NULL
CHECK (REGEXP_LIKE(CreditCardNumber, '^\d{16}'))
I understand this is presentation level checking but I heard it's still good practice to avoid illogical data on the database level.
Neither of your checks really works as expected, because you end up with an implicit number-to-string conversion, which loses leading zeros. Maybe this is not a problem in this particular case, assuming a credit card number never starts with a zero (and never will? According to ISO/IEC 7812, the leading digit could be a 0 in some corner cases).
However, notice that you gain nothing by using the NUMBER type here, as you will never perform calculations on the credit card "number". So, for that kind of data (credit card "numbers", telephone "numbers", zip codes, ...), I would strongly suggest using a character type (VARCHAR2, or CHAR if you prefer) instead, and at the very least checking with an appropriate regexp that only digits are part of the string. It would be better still to validate the checksum, as suggested by @Allan in his answer.
In addition, even if 16 digits is the most common case, bank card numbers are variable length, from 12 to 19 digits (according to http://www.watersprings.org/pub/id/draft-eastlake-card-map-08.txt, as I don't have access to the official ISO document).
Finally, concerning credit card numbers, remember that depending on your local regulations you are not necessarily allowed to store them unencrypted...
NUMBER(16) will only allow integers (i.e. if you try to insert '10.1', it will round to '10').
Keep in mind too that credit card numbers aren't always 16 digits - American Express uses 15.
The only benefit you'll gain from storing credit card numbers in the NUMBER type is storage space. Oracle packs NUMBER values at roughly two decimal digits per byte plus a byte for the exponent, so a 16-digit number takes about 9 bytes rather than the 16 bytes it needs as text. However, every interaction with the data will require a type conversion to or from text, so you'll have to weigh the costs of storage, processing, and ease of coding.
The obvious way to validate that the value is wide enough is to check the value numerically:
CreditCardNumber NUMBER(16) NOT NULL
CHECK (CreditCardNumber >= 1000000000000000)
However, as @BenGrimm points out, this may not be valid for all credit card numbers.
One way to validate the card length per provider is to have a lookup table listing each provider that you accept and the length of its card numbers. You'd have to use a trigger to check against that table, but it would allow you to verify that the length is precisely correct.
A better validation might be to implement the Luhn algorithm in a function and use it to validate the column value via a trigger.
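For reference, here is a minimal sketch of the Luhn check in Python (not PL/SQL, so it would have to be ported before it could be called from a trigger). The function name is made up for illustration, and the input is deliberately a string so that leading zeros survive:

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111111111111111"))  # True (a widely used test number)
print(luhn_is_valid("4111111111111112"))  # False
```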
Finally, to reiterate what Sylvain Leroux pointed out, this should all be academic. You shouldn't be storing credit card numbers in plain text and may even be legally or contractually prevented from doing so.
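You could potentially use floor(log10(Number)) + 1 = 16. (Note that ceiling(log10(Number)) = 16 fails for an exact power of ten: for 1000000000000000 the logarithm is exactly 15.) I think for a credit card number there are better ways to check though.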
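The same length check can also be expressed with integer arithmetic only, which sidesteps any floating-point rounding in the logarithm. A small Python sketch (the function names are hypothetical):

```python
def digit_count(value: int) -> int:
    """Count decimal digits of a positive integer using integer division only."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    count = 0
    while value:
        value //= 10
        count += 1
    return count


def has_exactly_16_digits(value: int) -> bool:
    """Equivalent range check: 10^15 <= value < 10^16."""
    return 10**15 <= value < 10**16


print(digit_count(10**15))                      # 16
print(has_exactly_16_digits(4111111111111111))  # True
```

In SQL the range form translates directly into a CHECK constraint with a lower and an upper bound.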

Swedish "personnummer" (personal identity number) in SQL

This is a specific instance of an old problem: How to store "numbers" (e.g. phone numbers, IP addresses, social security numbers) in SQL databases?
Background: In Sweden, Personal Identity Numbers ("personnummer") are extremely common: You use them when communicating with the government, the bank, your employer, etc. People born in Sweden are assigned them when born. My immigrant friends lament the dark couple of weeks before they got a personnummer and could finally get a debit card and start looking for jobs.
My organization needs to store personnummer of our members. We have a SQL database for this. How should I store the data?
From Wikipedia, regarding the format of a personnummer:
The personal identity number consists of 10 digits and a hyphen. The first six correspond to the person's birthday, in YYMMDD form. They are followed by a hyphen. People over the age of 100 replace the hyphen with a plus sign. The seventh through ninth are a serial number. An odd ninth number is assigned to males and an even ninth number is assigned to females. Some county authorities, such as Stockholm, and some banks, have started using 12 digit numbers to allow YYYYMMDD. This format is also used on some Swedish ID-cards and on the Swedish European Health Insurance Cards but not on state-issued identity documents.
The tenth digit is a checksum which was introduced in 1967 when the system was computerized.
So, a personnummer could be "120101-3842" for a person born this year. This is also commonly formatted as "20120101-3842" because of Y2K and "replacing the hyphen with a plus sign" is not well-known.
In a database column, I imagine I can:
Store it as a VARCHAR, formatted as "120101-3842", "20120101-3842" or "201201013842" (shaving off a byte by getting rid of the superfluous hyphen in the YYYYMMDD format).
Store the full YYYYMMDDXXXX as an INTEGER, which is too big for 32 bits but fits without problems in 64 bits.
There won't be any issues with leading zeroes in this case, and using a VARCHAR is almost twice the size. Unlike IP addresses, storing this number as an INTEGER does not make it harder to read for a human (i.e. "127.0.0.1" compared to 2130706433).
I appreciate the "strictness" of an INTEGER column but also feel that this might run into unseen issues.
EDIT: We have a real need to validate this input with the checksum et cetera, which requires doing math on the individual digits (multiplying, summing etc). Since digits aren't really ... uh... part of a quantity, but of decimal formatting, it might make sense to consider it a varchar after all.
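To make the "math on the individual digits" concrete, here is a hedged Python sketch. It assumes the check digit is computed with the Luhn (mod 10) scheme over the nine digits YYMMDDNNN, which the question does not spell out; the function names are made up for illustration:

```python
import re


def luhn_check_digit(nine_digits: str) -> int:
    """Check digit over YYMMDDNNN, weighting the digits 2,1,2,1,... from the left."""
    total = 0
    for index, char in enumerate(nine_digits):
        product = int(char) * (2 if index % 2 == 0 else 1)
        total += product // 10 + product % 10  # add the digits of the product
    return (10 - total % 10) % 10


def personnummer_checksum_ok(value: str) -> bool:
    """Validate the 10th digit; accepts the 10- and 12-digit forms, with or without -/+."""
    digits = re.sub(r"[-+]", "", value)
    if len(digits) == 12:
        digits = digits[2:]  # the century is not part of the checksum (assumption)
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    return luhn_check_digit(digits[:9]) == int(digits[9])


print(personnummer_checksum_ok("120101-3842"))  # True
print(personnummer_checksum_ok("120101-3843"))  # False
```

Note that the function works on characters throughout, which is one practical argument for keeping the value as text.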
Use VARCHAR with a fixed length because it is the simplest approach. And I don't think your organisation will store the numbers of all 9.5 million inhabitants, so is saving space really a design goal? :)
So, as I understand it, the hyphen / plus signs are only required for the format with 2 digit year.
If I were you, I would convert to the 4-digit year format on the application side (and drop the hyphen), then store the resulting value as an integer. As you have stated, this will save space and will allow you to mathematically transform the values (although I imagine that for personal numbers this may be irrelevant).
I think the key here is that you should choose a single format rather than trying to manage two different formats in the database. This will also help with application consistency. When it comes to external applications that require one format or the other, you can place a transform into the transfer code.
On a side note, it should be fairly trivial to create a trigger that automatically converts the 2-digit year format to the 4-digit year format (as long as you translate the hyphen / plus sign into the right century digits).
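A rough sketch of that conversion in Python, under simplifying assumptions: the century is derived from the year only (month and day are ignored), a hyphen is taken to mean "most recent matching year" and a plus sign "one century earlier". The function name and the current_year parameter are hypothetical:

```python
import re

# Optional century, YYMMDD, optional separator, 4-digit suffix.
_PATTERN = re.compile(r"([0-9]{2})?([0-9]{6})([-+]?)([0-9]{4})")


def to_canonical(value: str, current_year: int) -> str:
    """Convert any accepted form to the 12-digit YYYYMMDDNNNN form."""
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised format: {value!r}")
    century, yymmdd, separator, suffix = match.groups()
    if century:
        return century + yymmdd + suffix
    # Most recent year not after current_year with these last two digits.
    year = current_year - (current_year - int(yymmdd[:2])) % 100
    if separator == "+":
        year -= 100  # assumption: "+" means one century earlier
    return f"{year:04d}{yymmdd[2:]}{suffix}"


print(to_canonical("120101-3842", 2012))  # 201201013842
print(to_canonical("120101+3842", 2012))  # 191201013842
```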
I would store the canonical form 201201013842 as a CHAR (rather than a VARCHAR).
The bottom line is that you do not control the semantics of the number (Swedish authorities do). If at some point they decide to add non numeric characters to the number (as the number already does in the older format), you will be better equipped to deal with the change.
We have the same problem and currently store it as yyyyMMdd-xxxx, but if I were to redesign this today I would store the yyyyMMdd part in a date field, as that would handle the validation of the date, and store the other 4 digits in an nchar(4) with a constraint to ensure it contains only digits.
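The split described above could look roughly like this on the application side (a Python sketch assuming the yyyyMMdd-xxxx input format; the function name is made up). Parsing the first part as a real date rejects impossible dates, much as a date column would:

```python
from datetime import date, datetime
from typing import Tuple


def split_personnummer(value: str) -> Tuple[date, str]:
    """Split 'yyyyMMdd-xxxx' into a validated date and a 4-digit suffix."""
    date_part, separator, suffix = value.partition("-")
    parts_are_digits = all(p.isascii() and p.isdigit() for p in (date_part, suffix))
    if separator != "-" or len(date_part) != 8 or len(suffix) != 4 or not parts_are_digits:
        raise ValueError(f"expected yyyyMMdd-xxxx, got {value!r}")
    # strptime raises ValueError for dates that do not exist, e.g. 30 February.
    birth_date = datetime.strptime(date_part, "%Y%m%d").date()
    return birth_date, suffix


print(split_personnummer("20120101-3842"))  # (datetime.date(2012, 1, 1), '3842')
```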

Is varchar(128) better than varchar(100)

Quick question. From the point of view of storing data, does it matter whether I use round decimal field limits or powers of two (say 16, 32, 64 instead of 10, 20, 50)?
I ask because I wonder whether this has anything to do with clusters on the HDD.
Thanks!
VARCHAR(128) is better than VARCHAR(100) if you need to store strings longer than 100 bytes.
Otherwise, there is very little to choose between them; you should choose the one that better fits the maximum length of the data you might need to store. You won't be able to measure the performance difference between them. All else apart, the DBMS probably only stores the data you send, so if your average string is, say, 16 bytes, it will only use 16 (or, more likely, 17 - allowing 1 byte for storing the length) bytes on disk. The bigger size might affect the calculation of how many rows can fit on a page - detrimentally. So choosing the smallest size that is adequate makes sense - waste not, want not.
So, in summary, there is precious little difference between the two in terms of performance or disk usage, and aligning to convenient binary boundaries doesn't really make a difference.
If it were a C program I'd spend some time thinking about that, too. But with a database I'd leave it to the DB engine.
DB programmers have spent a lot of time thinking about the best memory layout, so just tell the database what you need and it will store the data in a way that suits the DB engine best (usually).
If you want to align your data, you'll need exact knowledge of the internal data organization: How is the string stored? One, two or four bytes to store the length? Is it stored as a plain byte sequence or encoded in UTF-8, UTF-16 or UTF-32? Does the DB need extra bytes to identify NULL or > MAXINT values? Maybe the string is stored as a NUL-terminated byte sequence, in which case one more byte is needed internally.
Also, with VARCHAR it is not necessarily true that the DB will always allocate 100 (or 128) bytes for your string. Maybe it stores just a pointer to where the actual data lives.
So I'd strongly suggest using VARCHAR(100) if that is your requirement. If the DB decides to align it somehow, there's room for extra internal data, too.
Other way around: Let's assume you use VARCHAR(128) and all things come together: The DB allocates 128 bytes for your data. Additionally it needs 2 bytes more to store the actual string length - makes 130 bytes - and then it could be that the DB aligns the data to the next (let's say 32 byte) boundary: The actual data needed on the disk is now 160 bytes 8-}
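The arithmetic in that thought experiment can be sketched in a few lines of Python. The 2-byte length prefix and the 32-byte alignment are the hypothetical values from the paragraph above, not the behaviour of any particular DBMS:

```python
def padded_size(payload_bytes: int, length_prefix_bytes: int, alignment: int) -> int:
    """Payload plus length prefix, rounded up to the next multiple of alignment."""
    raw = payload_bytes + length_prefix_bytes
    return -(-raw // alignment) * alignment  # ceiling division, then scale back up


print(padded_size(128, 2, 32))  # 160
print(padded_size(100, 2, 32))  # 128
```

Under these assumed numbers, the "unaligned" 100-byte field is the one that ends up on a tidy boundary.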
Yes but it's not that simple. Sometimes 128 can be better than 100 and sometimes, it's the other way around.
So what is going on? varchar only allocates space as necessary so if you store hello world in a varchar(100) it will take exactly the same amount of space as in a varchar(128).
The question is: If you fill up the rows, will you hit a "block" limit/boundary or not?
Databases store their data in blocks. These have a fixed size, for example 512 bytes (this value can be configured for some databases). So the question is: How many blocks does the DB have to read to fetch each row? Rows that span several blocks will need more I/O, so this will slow you down.
But again: This doesn't depend on the theoretical maximum size of the columns but on a) how many columns you have (each column needs a little bit of space even when it's empty or null), b) how many fixed width columns you have (number/decimal, char), and finally c) how much data you have in variable columns.

[My]SQL VARCHAR Size and Null-Termination

Disclaimer: I'm very new to SQL and databases in general.
I need to create a field that will store a maximum of 32 characters of text data. Does "VARCHAR(32)" mean that I have exactly 32 characters for my data? Do I need to reserve an extra character for null-termination?
I conducted a simple test and it seems that this is a WYSIWYG buffer. However, I wanted to get a concrete answer from people who actually know what they're doing.
I have a C[++] background, so this question is raising alarm bells in my head.
Yes, you have 32 characters at your disposal. SQL does not concern itself with NUL-terminated strings like some programming languages do.
Your VARCHAR size specification is the maximum size of your data, so in this case, 32 characters. However, VARCHAR is a variable-length type, so the actual physical storage used is only the size of your data, plus one or two bytes.
If you put a 10-character string into a VARCHAR(32), the physical storage will be 11 or 12 bytes (the manual will tell you the exact formula).
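As a hedged sketch of that formula (my reading of the MySQL manual: one length byte if the column can hold at most 255 bytes, otherwise two), here it is in Python. Python's "utf-8" codec stands in for MySQL's utf8mb4, and the function name is made up:

```python
# Worst-case bytes per character for the two character sets used here.
MAX_BYTES_PER_CHAR = {"latin-1": 1, "utf-8": 4}


def varchar_storage_bytes(value: str, max_chars: int, encoding: str = "latin-1") -> int:
    """Approximate on-disk size of a value in a VARCHAR(max_chars) column."""
    if len(value) > max_chars:
        raise ValueError("value does not fit in the column")
    column_max_bytes = max_chars * MAX_BYTES_PER_CHAR[encoding]
    length_prefix = 1 if column_max_bytes <= 255 else 2
    return len(value.encode(encoding)) + length_prefix


print(varchar_storage_bytes("helloworld", 32))            # 11
print(varchar_storage_bytes("helloworld", 100, "utf-8"))  # 12
```

Either way there is no terminator byte: the length prefix plays the role that the trailing NUL plays in C.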
However, when MySQL is dealing with result sets (i.e. after a SELECT), 32 bytes will be allocated in memory for that field for every record.

In SQL, how do fixed-length data types take up space in memory?

I want to know how fixed-length data types take up space in memory in SQL. What I know is that for VARCHAR, if we specify a length of 20 and the user input has length 15, it takes 20 by padding with spaces; for VARCHAR2, if we specify a length of 20 and the user input has length 15, it takes only 15. So how do fixed-length data types take up space? I searched Google but did not find an explanation with an example. Please explain with an example. Thanks in advance.
A fixed length data field always consumes its full size.
In the old days (FORTRAN), it was padded at the end with space characters. Modern databases might do that too, but they either trim the trailing blanks implicitly or leave it to the query to do so explicitly.
Variable-length fields are a relative newcomer to databases; they probably became widespread in the 1970s or 1980s.
It is considerably easier to manage fixed-length record offsets and sizes than to compute the offset of each data item in a record that has variable-length fields. Furthermore, a fixed-length data record is easily addressed in a data file: the byte offset of its beginning is the record size multiplied by the record number (plus the length of whatever fixed header data is at the beginning of the file).
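Since the question asked for an example, here is a small Python sketch of space-padded fixed-length records and the offset computation just described. The 16-byte header and the 20-byte record size are arbitrary values chosen for illustration, and an in-memory buffer stands in for a data file:

```python
import io

HEADER_SIZE = 16  # assumed size of the fixed file header
RECORD_SIZE = 20  # every record occupies exactly this many bytes


def record_offset(record_number: int) -> int:
    """Byte offset of a record: header length + record size * record number."""
    return HEADER_SIZE + RECORD_SIZE * record_number


def pack_record(text: str) -> bytes:
    """Pad the value with trailing spaces to the full fixed length."""
    data = text.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("value too long for a fixed-length record")
    return data.ljust(RECORD_SIZE, b" ")


def read_record(data_file, record_number: int) -> str:
    """Jump straight to the record and trim the padding."""
    data_file.seek(record_offset(record_number))
    return data_file.read(RECORD_SIZE).rstrip(b" ").decode("ascii")


data_file = io.BytesIO()
data_file.write(b"H" * HEADER_SIZE)
for name in ["alice", "bob", "carol"]:
    data_file.write(pack_record(name))

print(len(pack_record("bob")))     # 20, although the value has only 3 characters
print(record_offset(2))            # 56
print(read_record(data_file, 2))   # carol
```

With variable-length records the offset of record 2 could not be computed like this; the reader would have to scan or consult an index.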