MySQL - how to front pad zip code with "0"? - sql

In my MySQL InnoDB database, I have dirty zip code data that I want to clean up.
The clean zip code data is when I have all 5 digits for a zip code (e.g. "90210").
But for some reason, I noticed in my database that for zipcodes that start with a "0", the 0 has been dropped.
So "Holtsville, New York" with zipcode "00544" is stored in my database as "544"
and
"Dedham, MA" with zipcode "02026" is stored in my database as "2026".
What SQL can I run to front pad "0" to any zipcode that is not 5 digits in length? Meaning, if the zipcode is 3 digits in length, front pad "00". If the zipcode is 4 digits in length, front pad just "0".
UPDATE:
I just changed the zipcode to be datatype VARCHAR(5)

Store your zipcodes as CHAR(5) instead of a numeric type, or have your application pad it with zeroes when you load it from the DB. A way to do it with PHP using sprintf():
echo sprintf("%05d", 205); // prints 00205
echo sprintf("%05d", 1492); // prints 01492
Or you could have MySQL pad it for you with LPAD():
SELECT LPAD(zip, 5, '0') as zipcode FROM `table`;
Here's a way to update and pad all rows:
ALTER TABLE `table` CHANGE `zip` `zip` CHAR(5); #changes type
UPDATE `table` SET `zip`=LPAD(`zip`, 5, '0'); #pads everything
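To make the effect of LPAD(zip, 5, '0') concrete, here is a minimal Python sketch of what happens to each stored value. The helper name lpad is hypothetical; it only imitates MySQL's documented behavior for a one-character pad string, including the fact that longer inputs are shortened to the target length.

```python
def lpad(value: str, length: int, pad: str = "0") -> str:
    """Imitate MySQL LPAD for a single-character pad string (hypothetical helper)."""
    if len(value) >= length:
        # MySQL shortens inputs that are longer than `length` to their first `length` characters.
        return value[:length]
    # Otherwise prepend as many pad characters as are missing.
    return pad * (length - len(value)) + value


# Values from the question, as they are stored after the leading zeros were dropped.
for stored in ["544", "2026", "90210"]:
    print(stored, "->", lpad(stored, 5))
```

Because longer values are cut rather than rejected, it may be worth checking for zip values longer than 5 characters before running the UPDATE.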

You need to decide the length of the zip code (which I believe should be 5 characters long). Then you need to tell MySQL to zero-fill the numbers.
Let's suppose your table is called mytable and the field in question is zipcode, type smallint. You need to issue the following query:
ALTER TABLE mytable CHANGE `zipcode` `zipcode`
MEDIUMINT( 5 ) UNSIGNED ZEROFILL NOT NULL;
The advantage of this method is that it leaves your data intact, there's no need to use triggers during data insertion / updates, there's no need to use functions when you SELECT the data and that you can always remove the extra zeros or increase the field length should you change your mind.

Ok, so you've switched the column from Number to VARCHAR(5). Now you need to update the zipcode field to be left-padded. The SQL to do that would be:
UPDATE MyTable
SET ZipCode = LPAD( ZipCode, 5, '0' );
This will pad all values in the ZipCode column to 5 characters, adding '0's on the left.
Of course, now that you've got all of your old data fixed, you need to make sure that any new data is also zero-padded. There are several schools of thought on the correct way to do that:
Handle it in the application's business logic. Advantages: database-independent solution, doesn't involve learning more about the database. Disadvantages: needs to be handled everywhere that writes to the database, in all applications.
Handle it with a stored procedure. Advantages: Stored procedures enforce business rules for all clients. Disadvantages: Stored procedures are more complicated than simple INSERT/UPDATE statements, and not as portable across databases. A bare INSERT/UPDATE can still insert non-zero-padded data.
Handle it with a trigger. Advantages: Will work for Stored Procedures and bare INSERT/UPDATE statements. Disadvantages: Least portable solution. Slowest solution. Triggers can be hard to get right.
In this case, I would handle it at the application level (if at all), and not the database level. After all, not all countries use a 5-digit Zipcode (not even the US -- our zipcodes are actually Zip+4+2: nnnnn-nnnn-nn) and some allow letters as well as digits. Better NOT to try and force a data format and to accept the occasional data error, than to prevent someone from entering the correct value, even though its format isn't quite what you expected.
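A minimal sketch of the application-level approach, in Python. The function name normalize_zip and the rule it applies (pad only short, all-digit values and leave everything else untouched) are assumptions made for illustration, not something the answer prescribes.

```python
def normalize_zip(raw: str) -> str:
    """Pad short all-digit values to 5 digits; leave other formats alone (hypothetical helper)."""
    value = raw.strip()
    # Only short, purely numeric values are assumed to be US ZIP codes that lost leading zeros.
    if value.isascii() and value.isdigit() and len(value) < 5:
        return value.zfill(5)
    # ZIP+4, foreign postal codes, and anything unexpected are passed through unchanged.
    return value


for raw in ["544", "2026", "90210-1234", "SW1A 1AA"]:
    print(repr(raw), "->", repr(normalize_zip(raw)))
```

This keeps the permissive stance described above: values that do not look like a truncated 5-digit code are stored as entered rather than rejected.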

I know this is well after the OP. One way you can go with that keeps the table storing the zipcode data as an unsigned INT but displayed with zeros is as follows.
select LPAD(cast(zipcode_int as char), 5, '0') as zipcode from `table`;
While this preserves the original data as INT and can save some space in storage you will be having the server perform the INT to CHAR conversion for you. This can be thrown into a view and the person who needs this data can be directed there vs the table itself.

It would still make sense to create your zip code field as a zerofilled unsigned integer field.
CREATE TABLE xxx (
zipcode INT(5) UNSIGNED ZEROFILL,
...
)
That way mysql takes care of the padding for you.

CHAR(5)
or
MEDIUMINT (5) UNSIGNED ZEROFILL
The first takes 5 bytes per zip code.
The second takes only 3 bytes per zip code. The ZEROFILL option is necessary for zip codes with leading zeros.

you should use UNSIGNED ZEROFILL in your table structure.

LPAD works with VARCHAR2, as that type does not pad values with spaces for the leftover bytes.
LPAD fills the leftover positions on the left-hand side with zeros.
So the datatype should be VARCHAR2.

Related

Oracle SQL to change column type from varchar2(10) to char(10) while it contains data

How to change column type from varchar2(10) to char(10) without losing existing data using oracle developer?
ALTER TABLE TBL_NAME
MODIFY (CRTE CHAR(10));
Will it impact existing data?
It should not impact existing data.
See this SQL Fiddle.
I would reconsider changing varchar2 to char. Char will take up more space on disk, as a char value always takes N bytes. And if your strings are not all exactly 10 characters long, char will be a pain for searches.
There is really no reason to do this. As this "Ask Tom" answer explains, the two are stored equivalently.
If you want to ensure that the value has exactly 10 characters, then use a check constraint:
ALTER TABLE TBL_NAME ADD CHECK (LENGTH(CRTE) = 10);
The difference is padding the string in result sets, and that is often better handled on an ad-hoc basis. Trailing spaces can be quite tricky to deal with.
If you really want to change the column type, you can use:
alter table t modify ( x char(10) );
This should be safe with existing data, because you are not reducing the length of the column.

MS SQL Server Zero Padding

EDIT:
I'm changing the column datatype to varchar; if the suggestion works, the answer will be upvoted.
Full Story:
I receive data for a person with an associated temporary number for every person that is 5 digits long, I process this information and then send variables to a stored procedure that handles the inserting of this data. When sending the variables to the stored procedure I appear to be losing any prefixed 0's.
For example:
Number sent to stored Proc - Number actually inserted column
12345 - 12345
01234 - 1234
12340 - 12340
This only appears to be happening for numbers with a 0 in front. Meaning if I received:
00012 it would insert as 12
Is there a way where I could either update the column to always 0 pad to the left by a fixed number, meaning if we got 12 it would automatically make the value 00012.
OR
Is there a way to do this with the variable when its received by the stored procedure before the variable is inserted into the table.
Something along the lines of:
SET @zeroPaddedFixedNum = LeftPad(@numberRecieved, '0', 5);
Additionally, I now need to stop any more numbers from inserting and update all current incorrectly lengthed numbers. Any suggestions?
Perhaps it's just my Google ability that has failed but I have tried searching numerous pages.
For this, the column should be of varchar datatype. You can then do this
Insert into table(col)
select right('00000'+cast(@var as varchar(5)),5)
EDIT : To update existing data
Update table
set col=right('00000'+cast(col as varchar(5)),5)
where len(col)<5
As pointed out, you'll have to use VARCHAR(5) for your needs. But I would not change the column's type if the values stored are actually numbers. Rather, use one of the following whenever you pass these values to your SP (you might use a computed column or a VIEW, though).
Try
SELECT REPLACE(STR(YourNumber,5),' ','0');
The big advantage: In cases, where your number exceeds 5 digits, this would return *****. It is better to get an error than to get wrong numbers... Other approaches with RIGHT() might truncate your result unpredictably.
With SQL Server 2012 or later you can use FORMAT():
SELECT FORMAT(YourNumber,'00000')
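The difference between the two padding styles can be sketched in Python. Both function names are hypothetical; pad_with_right imitates RIGHT('00000' + digits, 5) under the assumption that the digits are concatenated without a narrowing cast, and pad_or_flag imitates REPLACE(STR(n, 5), ' ', '0') for non-negative integers.

```python
def pad_with_right(number: int, width: int = 5) -> str:
    """Imitate RIGHT('00000' + digits, width): keep only the last `width` characters."""
    return ("0" * width + str(number))[-width:]


def pad_or_flag(number: int, width: int = 5) -> str:
    """Imitate REPLACE(STR(n, width), ' ', '0'): too-long numbers become asterisks."""
    digits = str(number)
    if len(digits) > width:
        # STR() signals that the number does not fit instead of dropping digits.
        return "*" * width
    return digits.rjust(width, "0")


for n in [12, 1234, 123456]:
    print(n, pad_with_right(n), pad_or_flag(n))
```

For a six-digit input, the first style silently returns a different five-digit number, while the second makes the overflow visible.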

How to restrict the length of INTEGER when creating a table in SQL Server?

When creating a table in SQL Server, I want to restrict the length of an INTEGER column so that it can only be equal to 10.
e.g. the PhoneNumber is an INTEGER, and it must be a 10-digit number.
How can I do this when creating a table?
If you want to limit the range of an integer column you can use a check constraint:
create table some_table
(
phone_number integer not null check (phone_number between 0 and 9999999999)
);
But as R.T. and huMpty duMpty have pointed out: a phone number is usually better stored in a varchar column.
If I understand correctly, you want to make sure the entries are exactly 10 digits in length.
If you insist on an integer data type, I would recommend bigint because of the range limitation of int: -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647).
CREATE TABLE dbo.Table_Name(
Phone_Number BIGINT CONSTRAINT TenDigits CHECK (Phone_Number BETWEEN 1000000000 and 9999999999)
);
Another option would be to have a Varchar Field of length 10, then you should check only numbers are being entered and the length is not less than 10.
I would recommend using varchar for a phone number (only for phone numbers, as some phone numbers may contain a hyphen or a plus sign) and restricting the length to 10, i.e. varchar(10).
As correctly pointed by a_horse_with_no_name in comments you can put constraint on the numbers to be of specified range like this:
check (phone_number between 0 and 9999999999)
Also on a side note:-
You will receive an error message like this if you use numbers outside the range of int, -2147483648 through 2147483647:
Arithmetic overflow error converting expression to data type int.
So you will not be able to use all the int of length 10 in your case.
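The arithmetic behind that remark can be checked with a few lines of Python. This is only a sketch of the range comparison, using the 32-bit int and 64-bit bigint limits quoted in these answers; the variable names are illustrative.

```python
INT_MAX = 2**31 - 1       # upper limit of a 32-bit signed int
BIGINT_MAX = 2**63 - 1    # upper limit of a 64-bit signed bigint

SMALLEST_TEN_DIGIT = 1_000_000_000
LARGEST_TEN_DIGIT = 9_999_999_999

# How many ten-digit numbers exist, and how many of them fit into an int column.
total_ten_digit = LARGEST_TEN_DIGIT - SMALLEST_TEN_DIGIT + 1
fit_in_int = INT_MAX - SMALLEST_TEN_DIGIT + 1

print("largest ten-digit number fits in int:   ", LARGEST_TEN_DIGIT <= INT_MAX)
print("largest ten-digit number fits in bigint:", LARGEST_TEN_DIGIT <= BIGINT_MAX)
print(f"ten-digit numbers that fit in int: {fit_in_int} of {total_ten_digit}")
```

Only the ten-digit numbers up to 2147483647 fit into int, which is why bigint (or a string type) is suggested instead.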
In my view, a phone number should not be stored as an integer, because we are not going to perform any numeric operation on it, such as adding or dividing. We are going to treat it as a string, e.g. when finding all numbers with ISD code '91' or STD code '022'. Secondly, if you make it an integer, you have to handle overflow.
I don't think there is a way to limit the length if you use numeric fields like int, bigint, smallint, and tinyint.
Make a varchar(10) field and validate before insert.
If you still need to use an int field to store the phone number, you will need to restrict the value in your application beforehand.
Make the column varchar and create a check that it must have exactly 10 characters:
create table some_table
(
phone_number varchar(10) check (len(phone_number)=10)
);
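The same idea can be tried out quickly from Python with an in-memory SQLite database. Note the assumptions: this uses SQLite rather than SQL Server, so the function is length() instead of len(), and the extra GLOB condition that rejects non-digits is an addition for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE some_table (
        phone_number VARCHAR(10)
            CHECK (length(phone_number) = 10
                   AND phone_number NOT GLOB '*[^0-9]*')
    )
    """
)


def try_insert(value: str) -> bool:
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO some_table VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


for candidate in ["0123456789", "12345", "12345abcde"]:
    print(candidate, "accepted" if try_insert(candidate) else "rejected")
```

Because the column is a string type, a number with a leading zero is stored exactly as entered.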
First consider internal and external format:
Yes, a telephone number can be stored as an integer. You would have to ensure, however, that all numbers are stored in the same format, e.g. as the international number without the plus sign. 4940123456 would then be a German number, for instance, as 49 is the German country code. To analyze the number later, however, would be difficult; country codes can be 1 to 4 digits, followed by a presumably unknown number of area code digits. But just knowing a number without knowing its structure may be sufficient for your purposes. With check constraints you could ensure that the number is positive and not longer than, well, how long is the longest number allowed? Be aware: every time you show the number, you may have to format the output (in the example given: add a leading plus sign to the number).
The other way would be to store phone numbers as strings. That would make it possible to store numbers such as '+49-40-123456'. Then the internal format is the same as the external. Advantage: you wouldn't have to think about formatting the output every time you show the number. But you could still change the format on output if you wanted (remove dashes, replace the plus sign with the actual country dial code, remove country and area code for local calls, etc.). You would have to decide whether to enforce a certain format or not. If not, then numbers could look very different: '123456', '004940123456', '040/123456', ... To enforce a certain format, you would write a function (because of the complexity of such a format) and use that in a check constraint. Or write an insert trigger (this should be a BEFORE INSERT trigger, because you want to change a value; as T-SQL doesn't provide this, you would use an INSTEAD OF INSERT trigger instead) to have the field formatted as you desire.
My recommendation is:
CREATE TABLE trial_table (phone_number VARCHAR(13));
The column can be used for international numbers too.

Entering special characters fails in Oracle table

I need to test whether my application is reading special characters from the database and displaying them in exactly the same way. For this, I need to populate the database table with all special characters available. However, I am not sure how I can specify the special characters in the SQL insert query. Can anyone please guide me to an example where I can insert a special character in the query? For simplicity's sake, suppose the table is a City table with Area and Avg_Temperature being the 2 columns. If I need to insert the degree (Celsius/Fahrenheit) symbol in the Avg_Temperature column, how should I write the query?
[Edit on 1/9/2012 at 2:50PM EST] As per Justin Cave's suggestion below, I did the following analysis:
Table: create table city(area number, avg_temperature nvarchar2(10));
Data: insert into city values (1100, '10◦C');
Query:
select dump(avg_temperature, 1010) from city where area = 1100;
O/P
DUMP(AVG_TEMPERATURE,1010)
----------------------------------------------------------
Typ=1 Len=8 CharacterSet=AL16UTF16: 0,49,0,48,0,191,0,67
Query
select value$ from sys.props$ where name='NLS_CHARACTERSET';
O/P
VALUE$
----------------
WE8MSWIN1252
Query:
select value$ from sys.props$ where name='NLS_NCHAR_CHARACTERSET';
O/P
----------------
AL16UTF16
It seems that the insert does mess up the special characters, as Justin Cave suggested. But I am not able to understand why this is happening. Can anyone please provide a related suggestion?
First, you should not store the symbol as part of your column. That requires you to declare the column as VARCHAR, which will give you lots of problems in the long run (e.g. you cannot sum() on it, you cannot avg() on it, and so on).
You should store the unit in which the temperature was taken in a second column (e.g. 1 = Celsius and 2 = Fahrenheit) and translate this when displaying the data in the frontend. If you really want to store the symbol, declare a separate units column as VARCHAR(2):
CREATE TABLE readings
(
area number(22),
avg_temperature number(10,3),
units varchar(2)
)
Then you can insert it as follows:
INSERT INTO readings
(area, avg_temperature, units)
VALUES
(1000, 12.3, '°C');
But again: I would not recommend to store the actual symbol. Store only the code!
First you need to know what the database character set is. Then you need to know what character set your "client" connection is using. Life is always easier if these are the same.
If your database is UTF-8 and your client is UTF-8, then you don't need to do any character escaping; you can just use the UTF-8 encoding for the desired character.
In your example, the degree character is Unicode code point U+00B0.
In utf-8 this is a two-byte sequence: x'c2', x'b0'.
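These byte values can be verified with a short Python sketch. It also checks one assumption: the character in the question's INSERT appears to be U+25E6 (WHITE BULLET) rather than the degree sign, and that character has no mapping in Windows-1252 (Python's cp1252 codec, used here as a stand-in for WE8MSWIN1252). That would be consistent with the 191 in the DUMP output, since byte 191 (0xBF) is the inverted question mark in that character set.

```python
DEGREE_SIGN = "\u00b0"    # the real degree sign
WHITE_BULLET = "\u25e6"   # look-alike character, assumed to be what was typed


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in the given character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print("degree sign in UTF-8: ", DEGREE_SIGN.encode("utf-8").hex(" "))
print("degree sign in cp1252:", DEGREE_SIGN.encode("cp1252").hex(" "))
print("white bullet fits cp1252:", can_encode(WHITE_BULLET, "cp1252"))
print("byte 191 in cp1252 is:", bytes([191]).decode("cp1252"))
```

If the look-alike character is indeed what was inserted, retyping the literal with the real degree sign (U+00B0) would be the first thing to try.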

SQL row is only varchar(10) but has 30+ characters in it

A table on an external database (when I click modify) states that column A is a varchar(10), but when I look at the data there are obviously many more characters in it.
This concerns me because when I pull data from that column, I only get 10 characters, and the rest is cut off. I am not allowed to modify the external database tables.
How is this possible?
The column was probably originally a varchar(30) and was subsequently altered to varchar(10). I assume data has been written since the change to varchar(10), which makes this a true mess. If altering the column back to a length of 30 is not possible, I would investigate the implications of truncating the old data to 10 characters.
Update
run the following statement to confirm the column length:
select character_maximum_length
from information_schema.columns
where table_name='tablename' and COLUMN_NAME='columnname'
Update 2:
select max(len(column_name))
from tablename