Sql function to turn character field into number field - sql

I'm importing data from one system to another. The former keys off an alphanumeric field whereas the latter requires a numeric integer field. I'd like to find or write a function that I can feed the alphanumeric value to and have it return a number that would be unique to the value passed in.
My first thought was to do a hash, but of course the result of any built in hashes are going to contains letters and plus it's technically possible (however unlikely) that a hash may not be unique.
My first question is whether there is anything built in to sql that I'm overlooking, and short of that I'd like to hear suggestions on the easiest way to implement such a function.

Here is a function which will probably convert from base 10 (integer) to base 36 (alphanumeric) and back again:
https://www.simple-talk.com/sql/t-sql-programming/numeral-systems-and-numbers-conversion-in-sql/
You might find the resultant number is too big to be held in an integer though.

You could concatenate the ascii values of each character of your string and cast the result as a bigint.

If the original data is known to be integers you can use cast:
SELECT CAST(varcharcol AS INT) FROM Table

Related

Like operator for integer

I have a column of type bigint (ProductSerial) in my table. I need to filter the table by the Product serial using like operator. But I found that, like operator can't be used for integer type.
Is there any other method for this (I don't want to use the = operator).
If you must use LIKE, you can cast your number to char/varchar, and perform the LIKE on the result. This is quite inefficient, but since LIKE has a high potential of killing indexes anyway, it may work in your scenario:
... AND CAST(phone AS VARCHAR(9)) LIKE '%0203'
If you are looking to use LIKE to match the beginning or the end of the number, you could use integer division and modulus operators to extract the digits. For example, if you want all nine-digit numbers starting in 407, search for
phone / 1000000 = 407
Although I'm a bit late to the party, I'd like to add the method I'm using to match the first N given numbers (in the example, 123) in any numeric-type column:
SELECT * FROM MyTable WHERE MyColumn / POWER(10, LEN(MyColumn) - LEN(123)) = 123
The technique is similar to #dasblinkenlight's one, but it works regardless of the number of digits of the target column values. This is a viable workaround if your column contain numbers with different length and you don't want to use the CAST+LIKE method (or a calculated column).
For additional details on that (and other LIKE workarounds) check out this blog post that I wrote on this topic.
If you have control over the database you could add a calculated column to copy the integer value to a string:
ALTER TABLE MyTable
ADD CalcCol AS (CAST(ProductSerial AS VARCHAR)) PERSISTED
And query like:
SELECT *
FROM MyTable
WHERE ProductSerial LIKE '%2548%'
This will move the calculation to the insert/update and only on rows inserted/updated rather then converting every row for each query.
This may be a problem if there are a lot of updated to columns as it will add a very small overhead to these.
There may be a way to do it mathematically using modulus but this would take a lot of working out and testing.
You can change your Field PhoneNumbers and store as String and then use the Like You can alter your table so that you can use the LIKE statement, if you still want to use BIGint for your phone numbers, you cannot get the exact Phone Number without using = the method you can use is Between method that looks for the Numbers that are inside the range.
For the edited question: I think you should use = sign for their ID, or convert the Int to String and then Use Like.
The original question related to a phone number. OP has since edited it to refer to serial numbers. This answer refers to the original question only.
My suggestion is to avoid storing your phone numbers as integers in the first place, and thus the problem does not occur. My phone number is in the form, internationally, of:
+44 7844 51515
Storing it as an integer makes no sense here, as you will never need to do any mathematical operation on it, and you would lose the leading plus. Within the UK, it is:
07844 51515
and thus storing it as an integer would lose its leading zero. Unless you have a very very specific requirement to store it as an integer, you would fare significantly better storing it as a string instead.
[Note: Not actually my phone number]

Can Long Integer store letters?

I ran 'Analyze Performance' feature in Access and it had an "idea" to improve performance; Access said I should convert items that are alphanumeric mixes that look like this 12BB1-DF740§ from text data type into long integer (the specific name from the idea). Whether Access is right that this would improve performance is secondary to whether long integer can store letters at all.
[§ About the data - the hyphen in the data provided to me is always present at that location; the letters are always A-F]
From what I can tell, w3schools is indicating that Long will only store numbers
Long - Allows whole numbers between -2,147,483,648 and 2,147,483,647
Am I conflating data types? (Further, when I pull up the design view, it only offers number as a data type; there is no long or long integer)
Can Long Integer store letters?
If my column is already populated, and I convert the data type, will I lose data?
You could store those values by splitting them into 2 Long Integer columns. Then when you need the original text form, concatenate their Hex() values with a dash between.
? Hex(76721) & "-" & Hex(915264)
12BB1-DF740
However I don't see why that would be worth doing. Occasionally a performance analyzer suggestion just doesn't make sense to me; this is such a case.
I've never run into this but it looks like it thinks your strings are hexadecimal numbers.
If you never have letters other than A-F then you could store them as longs and then convert back using the Hex() function but that seems mighty kludgy and something I'd avoid unless you're really desperate to eek out some performance.
If it is in fact hexadecimal data, and it always has the same format so that the dash could just be added at the same place, then it would be possible to store the data numeric, and convert it into the hexadecimal notation when needed.
Ten hexadecimal digits represent 40 bits of data, so the Long type described at the w3schools page wouldn't do, as it's only 32 bits. You would need a data type that is a 64 bits, like a double or bigint. (The latter one might not be available in Access.)
However, that would only be any real gain if you actually do any processing of the data in the numeric form. Otherwise you would only save a few bytes per record, and you would need extra processing to convert to and from the numeric format.
If your table is already populated, you would have to read out the values, convert them, and store them back in the numeric form.

SQL - Numeric data type with leading zeros

I need to store Medicare APC codes. I believe the format requires 4 numbers. Leading zeros are relevant. Is there any way to store this data type with verification? How should I store this data (varchar(4), int)?
This kind of issue, storing zero leading numbers that need to be treated as Numeric values on some scenarios (i.e. sorting) and as textual values in others (i.e. addresses) is always a pain and there is no one answer that is best for all users. At my company we have a database that stores numbers as text for codes (not Medicare APC codes) and we must pad them with zero’s so they will sort properly when used in an order operation.
Do not use a numeric data type for this because the item is not a true number but textual data that uses numeric characters. You will not be performing any calculations or aggregates on the codes and so the only benefit to storing them as a number would be to ensure proper sorting of the codes and that can be done with the code stored as text by padding it with zeros where needed. If you sue a numeric data type then any time the code is combined with other textual values you will have to explicitly convert it to CHAR/VARCHAR or let SQL Server do it since implicit conversions should always be avoided that means a lot of extra work for you and the query processor any time the code is used.
Assuming you decide to go with a textual data type the question then is should you use VARCHAR or CHAR and while many who have posted say VARCHAR I would suggest you go with CHAR set to a length of 4. WHY?
The VARCHAR data type is for textual data in which the size (the length or number of characters) is unknown in advance. For this Medicare code we know the length will always be at least 4 and possibly no more than 4 for the foreseeable future. SQL Server handles the storage of the data differently between CHAR and VARCHAR. SQL Server’s BOL (Books On Line) says :
Use CHAR when the size of the column data entries are consistent
Use VARCHAR when the size of the column data varies considerably.
I can’t say for certain this is true for SQL Server 2008 and up but for earlier versions, the use of a VARCHAR data type carries an extra overhead of 1 byte per row of data per column in a table that has a VARCHAR data type. If the data stored is always the same size and in your scenario it sounds like it is then this extra byte is a waste.
In the end it’s up to you as to whether you like CHAR or VARCHAR better but definitely don’t use a numeric data type to store a fixed length code.
That's not numeric data; it's textual data that happens to contain digits.
Use a VARCHAR.
I agree, using
CHAR(4)
for the check constraint use
check( APC_ODE LIKE '[0-9][0-9][0-9][0-9]' )
This will force a 4 digit number only to be accepted...
varchar(4)
optionally, you can still add a check constraint to ensure the data is numeric with leading zeros. This example will throw exceptions in Oracle. In other RDBMS, you could use regular expression checks:
alter table X add constraint C
check (cast(APC_CODE as int) = cast(APC_CODE as int))
If you are certain that the APC codes will always be numeric (that is if it wouldn't change in the near future), a better way would be to leave the database column as is, and handle the formatting (to include leading zeros) at places where you use this field values.
If you need leading 0s, then you must use a varchar or other string data type.
There are ways to format the output for leading 0s without compromising your actual data.
See this blog entry for an easy method.
CHAR(4) seems more appropriate to me (if I understood you right, and the code is always 4 digits).
What you want to use is a VARCHAR data type with a CHECK constraint, using LIKE with a pattern to check for numeric values.
in TSQL
check( isnumeric(APC_ODE) = 1)

Theory of storing a number and text in same SQL field

I have a three tables
Results:
TestID
TestCode
Value
Tests:
TestID
TestType
SysCodeID
SystemCodes
SysCodeID
ParentSysCodeID
Description
The question I have is for when the user is entering data into the results table.
The formatting code when the row gets the focus changes the value field to a dropdown combobox if the testCode is of type SystemList. The drop down has a list of all the system codes that have a parentsyscodeID of the test.SysCodeID. When the user chooses a value in the list it translates into a number which goes into the value field.
The datatype of the Results.Value field is integer. I made it an integer instead of a string because when reporting it is easier to do calculations and sorting if it is a number. There are issues if you are putting integer/decimal value into a string field. As well, when the system was being designed they only wanted numbers in there.
The users now want to put strings into the value field as well as numbers/values from a list and I'm wondering what the best way of doing that would be.
Would it be bad practice to convert the field over to a string and then store both strings and integers in the same field? There are different issues related to this one but i'm not sure if any are a really big deal.
Should I add another column into the table of string datatype and if the test is a string type then put the data the user enters into the different field.
Another option would be to create a 1-1 relationship to another table and if the user types in a string into the value field it adds it into the new table with a key of a number.
Anyone have any interesting ideas?
What about treating Results.Value as if it were a numeric ValueCode that becomes an foreign key referencing another table that contains a ValueCode and a string that matches it.
CREATE TABLE ValueCodes
(
Value INTEGER NOT NULL PRIMARY KEY,
Meaning VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE Results
(
TestID ...,
TestCode ...,
Value INTEGER NOT NULL FOREIGN KEY REFERENCES ValueCodes
);
You continue storing integers as now, but they are references to a limited set of values in the ValueCodes table. Most of the existing values appear as an integer such as 100 with a string representing the same value "100". New codes can be added as needed.
Are you saying that they want to do free-form text entry? If that's the case, they will ruin the ability to do meaningful reporting on the field, because I can guarantee that they will not consistently enter the strings.
If they are going to be entering one of several preset strings (for example, grades of A, B, C, etc.) then make a lookup table for those strings which maps to numeric values for sorting, evaluating, averaging, etc.
If they really want to be able to start entering in free-form text and you can't dissuade them from it, add another column along the lines of other_entry. Have a predefined value that means "other" to put in your value column. That way, when you're doing reporting you can either roll up all of those random "other" values or you can simply ignore them. Make sure that you add the "other" into your SystemCodes table so that you can keep a foreign key between that and the Results table. If you don't already have one, then you should definitely consider adding one.
Good luck!
The users now want to put strings into
the value field as well as
numbers/values from a list and I'm
wondering what the best way of doing
that would be.
It sounds like the users want to add new 'testCodes'. If that is the case why not just add them to your existing testcode table and keep your existing format.
Would it be bad practice to convert
the field over to a string and then
store both strings and integers in
the same field? There are different
issues related to this one but i'm not
sure if any are a really big deal.
No it's not a big deal. Often PO numbers or Invoice numbers have numbers or a combination of letters and numbers. You are right however about the performance of the database on a number field as opposed to a string, but if you index the string field you end up with the database doing it's scans on numeric indexes anyway.
The problems you may have had with your decimals as strings probably have to do with the floating point data types in which the server essentially estimates the value of the field and only retains accuracy to a certain number of digits. This can lead to a whole host of rounding errors if you are concerned about the digits. You can avoid that issue by using currency fields or the like that have static accuracy of the decimals. lol I learned this the hard way.
Tom H. did a great job addressing everything else.
I think the easiest way to do it would be to convert Results.Value to a "string" (char, varchar, whatever). Yes, this ruins the ability to do numeric sorting (and you won't be able to do a cast or convert on the column any longer since text will be intermingled with integer values), but I think any other method would be too complex to maintain properly. (For example, in the 1-1 case you mentioned, is that integer value the actual value or a foreign key to the string table? Now we need another column to determine that.)
I would create the extra column for string values. It's not true normalization but it's the easiest to implement and to work with.
Using the same field for both numbers and strings would work to as long as you don't plan on doing anything with the numbers like summing or sorting.
The extra table approach while good from a normalization standpoint is probably overly complex.
I'd convert the value field to string and add a column indicating what the datatype should be treated as for post processing and reporting.
Sql Server at least has an IsNumeric function you can use:
ORDER BY IsNumeric(Results.Value) DESC,
CASE WHEN IsNumeric(Results.Value) = 1 THEN Len(Results.Value) ELSE 99 END,
Results.Value
One of two solutions comes to mind. It kind of depends on what you're doing with the numbers. If they just represent a choice of some kind, then pick one. If you need to do math on it (sorting, conversion, etc..) then pick another.
Change the column to be a varchar, and then either put numbers or text in it. Sorting numerically will suck, but hey, it's one column.
Have both a varchar column for the text, and an int column for the number. Use a view to hide the differences, and to control the sorting if necessary. You can coalesce the two columns together if you don't care about whether you're looking at numbers or text.

Force numerical order on a SQL Server 2005 varchar column, containing letters and numbers?

I have a column containing the strings 'Operator (1)' and so on until 'Operator (600)' so far.
I want to get them numerically ordered and I've come up with
select colname from table order by
cast(replace(replace(colname,'Operator (',''),')','') as int)
which is very very ugly.
Better suggestions?
It's that, InStr()/SubString(), changing Operator(1) to Operator(001), storing the n in Operator(n) separately, or creating a computed column that hides the ugly string manipulation. What you have seems fine.
If you really have to leave the data in the format you have - and adding a numeric sort order column is the better solution - then consider wrapping the text manipulation up in a user defined function.
select colname from table order by dbo.udfSortOperator(colname)
It's less ugly and gives you some abstraction. There's an additional overhead of the function call but on a table containing low thousands of rows in a not-too-heavily hit database server it's not a major concern. Make notes in the function to optomise later as required.
My answer would be to change the problem. I would add an operatorNumber field to the table if that is possible. Change the update/insert routines to extract the number and store it. That way the string conversion hit is only once per record.
The ordering logic would require the string conversion every time the query is run.
Well, first define the meaning of that column. Is operator a name so you can justify using chars? Or is it a number?
If the field is a name then you will use chars, and then you would want to determine the fixed length. Pad all operator names with zeros on the left. Define naming rules for operators (I.E. No leters. Or the codes you would use in a series like "A001")
An index will sort the physical data in the server. And a properly define text naming will sort them on a query. You would want both.
If the operator is a number, then you got the data type for that column wrong and needs to be changed.
Indexed computed column
If you find yourself ordering on or otherwise querying operator column often, consider creating a computed column for its numeric value and adding an index for it. This will give you a computed/persistent column (which sounds like oxymoron, but isn't).