Convert any string to an integer - SQL

Simply put, I'd like to be able to convert any string to an integer, preferably being able to restrict the size of the integer and ensure that the result is always identical. In other words, is there a hashing function, supported by Oracle, that returns a numeric value, and can that value have a maximum?
To provide some context if needed, I have two tables that have the following, simplified, format:
Table 1                  Table 2
id | sequence_number     id | sequence_number
---------------------    ---------------------
 1 | 1                    1 | 2QD44561
 1 | 2                    1 | 6HH00244
 2 | 1                    2 | 5DH08133
 3 | 1                    3 | 7RD03098
 4 | 2                    4 | 8BF02466
The column sequence_number is number(3) in Table 1 and varchar2(11) in Table 2; it is part of the primary key in both tables.
The data is externally provided and cannot be changed; in Table 1 it is, I believe, created by a simple sequence but in Table 2 has a meaning. The data is made up but representative.
Someone has promised that we would output a number(3) field. While this is fine for the column in the first table, it causes problems for the second.
I would like to convert sequence_number to an integer (easy) that is less than 1000 (harder) and, if at all possible, constant (seemingly impossible). This means that I would like '2QD44561' to always return 586. It does not matter much if two strings return the same number.
For simply converting to a number, I can use utl_raw.cast_to_number():
select utl_raw.cast_to_number((utl_raw.cast_to_raw('2QD44561'))) from dual;
UTL_RAW.CAST_TO_NUMBER((UTL_RAW.CAST_TO_RAW('2QD44561')))
---------------------------------------------------------
-2.033E+25
But as you can see, this isn't less than 1000.
I've also been playing around with dbms_crypto and utl_encode to see if I could come up with something, but I've not managed to get a small integer. Is there a way?

How about ora_hash?
select ora_hash(sequence_number, 999) from table_2;
... will produce a maximum of 3 digits. You could also seed it with the id I suppose, but not sure that adds much with so few values, and I'm not sure you'd want that anyway.
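Since ora_hash is deterministic for a given expression and max_bucket, the same string always maps to the same bucket, which also covers the "constant" requirement. A quick sanity check from dual:
-- repeated runs always return the same value, between 0 and 999
select ora_hash('2QD44561', 999) from dual;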

You are talking about using a hash function. There are lots of solutions out there - sha1 is very common.
But just FYI, when you say "restrict the size of the integer", understand that you will then be mapping an infinite set of strings onto a limited set of values. So while a given string will always map to the same value, it will not be the only string that maps to that value.
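If you do want a cryptographic hash, one way to get the result under 1000 is to take a few leading hex digits of the digest and reduce them modulo 1000. A sketch, assuming Oracle 12c or later where standard_hash is available (dbms_crypto.hash can produce the same digest on older versions):
-- take the first 8 hex digits of the SHA-1 digest, convert them to a
-- number, and reduce modulo 1000 so the result is always 0..999
select mod(to_number(substr(rawtohex(standard_hash('2QD44561', 'SHA1')), 1, 8), 'XXXXXXXX'), 1000)
from dual;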

Related

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
|KEY      | DATA   |
--------------------
|V9864653 | 180288 |
|V9864653 | 22189  |
|V9864811 | 11464  |
|V9864811 | 12688  |
What I am having trouble with is when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
|KEY      | DATA   |
--------------------
|V9864811 | 11464  |
|V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not for V9864653. If I invert that and put MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.
Is your DATA column numeric or a VARCHAR?
If it's character data, MIN and MAX compare the values as strings, which is why '180288' sorts below '22189' ('1' comes before '2'). If you can, it's better to change the column to a numeric type, for example an integer if the values are always whole numbers.
If not, you can cast the values to integers in the query, but if there are lots of transactions or it's a big table this will be slow and not ideal. It's bad practice to cast on every query when you could just change the data type:
SELECT KEY, MIN(CAST(DATA AS INT))
FROM dbo.MileageImport
GROUP BY KEY
HAVING COUNT(KEY) > 1;
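Since the end goal is a DELETE that removes the lower mileage entries, here is a sketch of one way to do it (DB2 syntax; a correlated subquery keeping the per-KEY maximum, untested against your data):
-- delete every row whose numeric DATA is below the maximum for its KEY;
-- rows with a unique KEY survive because they equal their own maximum
DELETE FROM dbo.MileageImport m
WHERE CAST(m.DATA AS INT) < (
    SELECT MAX(CAST(d.DATA AS INT))
    FROM dbo.MileageImport d
    WHERE d.KEY = m.KEY
);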

How would an SQL table store map legends?

I am not asking which map legend classification to choose. But assuming I have picked one, how would I store it properly in SQL tables? The reason for asking is that, in the end, I need to store more info than I expected... Hope someone could verify.
Consider 3 cases below:
Numeric, Range
Numeric, Single Value
Alphabet, Single Value
To be able to store it properly, so that I can do the classification logic in real time (do the coloring), I need to store the operators as well (less than [lt], greater than [gt], equal [eq], etc.).
I ended up with 2 tables:
LegendSetup
Column:
1. LegendKey (int)
2. Type (varchar)
3. Min (decimal)
4. Max (decimal)
LegendValueSetup
Column:
1. ValueKey (int) //AutoIncrement PK
2. LegendKey (int) //FK
3. RangeNumeric (decimal) //numeric
4. RangeAlpha (varchar) //alphabet
5. RangeOperator (varchar) //eq, lt, gt
6. RangeShow (varchar) //for display purpose
7. HexColor (varchar)
Is that how it normally works?
As you only need ranges and equalities I suggest you don't use explicit operators, but ranges like this:
CREATE TABLE LegendValueSetup (
    ValueAlpha varchar,
    ValueNumeric decimal,
    LowerNumeric decimal,
    UpperNumeric decimal,
    ... -- other columns
)
For your examples 2 and 3 you would just store the explicit values in the ValueAlpha and ValueNumeric column.
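For instance (hypothetical values, and assuming a HexColor column among the "other columns"):
-- a single-value numeric entry and a single-value alphabetic entry
INSERT INTO LegendValueSetup (ValueNumeric, HexColor) VALUES (100, '#FF0000');
INSERT INTO LegendValueSetup (ValueAlpha, HexColor) VALUES ('A', '#00FF00');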
For example 1 you would have to decide whether LowerNumeric is an inclusive and UpperNumeric an exclusive bound, or vice versa. Then you can store the legend like this:
LowerNumeric | UpperNumeric | Color
-----------------------------------
NULL         | 100          | Black
100          | 200          | Red
200          | 400          | Orange
400          | 600          | Yellow
600          | NULL         | Green
If you now have a value and want to get the color, you only have to query (the IS NULL checks are needed so the open-ended rows with NULL bounds still match):
SELECT color FROM LegendValueSetup
WHERE (LowerNumeric <= $value OR LowerNumeric IS NULL)
  AND (UpperNumeric > $value OR UpperNumeric IS NULL)
You can of course go further and split the table into three different ones, one for each legend type.

SQL composite key value vs string

I have a list of integers with 1 to N elements (N < 24).
At the moment, there are two solutions to manage these values in an SQL database (I think it is the same for MySQL and Microsoft SQL Server).
Solution 1: use a VARCHAR and ',' to separate the integer values:
aaa | 40,50,50,10,600,200
aab | 40,50,600,200
aac | 40,50,50,10,600,200,500,1
Solution 2: create a new table with composite primary key (key, id) (id = index of element in list) and value:
aaa | 0 | 40
aaa | 1 | 50
aaa | 2 | 50
....
aab | 0 | 40
aab | 1 | 50
aab | 2 | 600
....
Which is the better solution, considering that I have many items of data to load and I need to refresh this data many times?
Thanks
Edit:
My use case is: I need to refresh/read all the data (the list for a key) in a single call, and I never access the values one by one; this is why I think the first approach is better. And all the math, like AVG or MAX, I want to do on the client.
Usually the second approach is preferable. One advantage is ease of access:
-- Value at position 3 of aaa
select value from mytable where key = 'aaa' and pos = 3;
-- Average value of aaa
select avg(value) from mytable where key = 'aaa';
-- Average number of values per key
select avg(cnt) from (select count(*) as cnt from mytable group by key) counted;
Another is data consistency. You can add simple constraints to your columns, such as to allow only integers from, say, 1 to 700 and positions only up to 23.
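A sketch of such constraints (hypothetical table and column names matching the queries above; note that key is a reserved word in MySQL and would need quoting there, and MySQL only enforces CHECK constraints from 8.0.16):
-- composite primary key plus range checks on position and value
CREATE TABLE mytable (
    key varchar(3) NOT NULL,
    pos int NOT NULL CHECK (pos BETWEEN 0 AND 23),
    value int NOT NULL CHECK (value BETWEEN 1 AND 700),
    PRIMARY KEY (key, pos)
);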
There is an exception to the above, though. If you use the database only to store the list as is and you don't want to select separate values or even aggregate them, i.e. if this is just a string to the DBMS and your queries don't care about its content, then store it as a simple string. Why not?
The second solution that you propose is the classic way of doing this; I would recommend it.
The first solution scales terribly and causes a hundred other problems.

SQL: Find highest number if it's in nvarchar format containing special characters

I need to pull the record containing the highest value, specifically I only need the value from that field. The problem is that the column is nvarchar format that contains a mix of numbers and special characters. The following is just an example:
PK | Column 2 (nvarchar)
------------------------
 1 | .1.1.
 2 | .10.1.1
 3 | .5.1.7
 4 | .4.1.
 9 | .10.1.2
15 | .5.1.4
Basically, the items in column 2 are sorted as strings, so instead of returning the PK for the row containing ".10.1.2" as the highest value I get the PK for the row that contains ".5.1.7" instead.
I attempted to write some functions to do this, but what I've written looks way more complicated than it should be. Does anyone have something simple, or are complicated functions the only way?
I want to make clear that I'm trying to grab the PK of the record that contains the highest Column 2 value.
This query might return what you desire
SELECT MAX(CAST(REPLACE(Column2, '.', '') as INT)) FROM table
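Since you actually want the PK of that row rather than the value itself, here is a variation on the same dot-stripping idea (SQL Server syntax; the table name is a placeholder):
-- order by the numeric interpretation and keep only the top row's PK
SELECT TOP 1 PK
FROM myTable
ORDER BY CAST(REPLACE(Column2, '.', '') AS INT) DESC;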

Query Performance with NULL

I would like to know about how NULL values affect query performance in SQL Server 2005.
I have a table similar to this (simplified):
ID | ImportantData | QuickPickOrder
-----------------------------------
 1 | 'Some Text'   | NULL
 2 | 'Other Text'  | 3
 3 | 'abcdefg'     | NULL
 4 | 'whatever'    | 4
 5 | 'it is'       | 2
 6 | 'technically' | NULL
 7 | 'a varchar'   | NULL
 8 | 'of course'   | 1
 9 | 'but that'    | NULL
10 | 'is not'      | NULL
11 | 'important'   | 5
And I'm doing a query on it like this:
SELECT *
FROM MyTable
WHERE QuickPickOrder IS NOT NULL
ORDER BY QuickPickOrder
So the QuickPickOrder is basically a column used to single out some commonly chosen items from a larger list. It also provides the order in which they will appear to the user. NULL values mean that it doesn't show up in the quick pick list.
I've always been told that NULL values in a database are somehow evil, at least from a normalization perspective, but is it an acceptable way to filter out unwanted rows in a WHERE constraint?
Would it be better to use specific number value, like -1 or 0, to indicate items that aren't wanted? Are there other alternatives?
EDIT:
The example does not accurately represent the ratio of real values to NULLs. A better example might show at least 10 NULLs for every non-NULL. The table size might be 100 to 200 rows. It is a reference table, so updates are rare.
SQL Server indexes NULL values, so this will most probably just use an index seek over an index on QuickPickOrder, both for filtering and for ordering.
Another alternative would be two tables:
MyTable:
ID | ImportantData
------------------
1 | 'Some Text'
2 | 'Other Text'
3 | 'abcdefg'
4 | 'whatever'
5 | 'it is'
6 | 'technically'
7 | 'a varchar'
8 | 'of course'
9 | 'but that'
10 | 'is not'
11 | 'important'
QuickPicks:
MyTableID | QuickPickOrder
--------------------------
2 | 3
4 | 4
5 | 2
8 | 1
11 | 5
SELECT MyTable.*
FROM MyTable JOIN QuickPicks ON QuickPicks.MyTableID = MyTable.ID
ORDER BY QuickPicks.QuickPickOrder
This would allow updating QuickPickOrder without locking anything in MyTable or logging a full row transaction for that table. So depending how big MyTable is, and how often you are updating QuickPickOrder, there may be a scalability advantage.
Also, having a separate table will allow you to add a unique index on QuickPickOrder to ensure no duplication, and could be more easily scaled later to allow different kinds of QuickPicks, having them specific to certain contexts or users, etc.
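A sketch of that unique index (SQL Server syntax, using the table and column names from the example above):
-- guarantees no two quick picks share the same position
CREATE UNIQUE INDEX UX_QuickPicks_QuickPickOrder ON QuickPicks (QuickPickOrder);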
They do not have a negative performance hit on the database. Remember, NULL is more of a state than a value. Checking for NOT NULL vs. setting that value to -1 makes no difference, other than the -1 probably breaking your data integrity, imo.
SQL Server's performance can be affected by using NULLS in your database. There are several reasons for this.
First, NULLS that appear in fixed length columns (CHAR) take up the entire size of the column. So if you have a column that is 25 characters wide, and a NULL is stored in it, then SQL Server must store 25 characters to represent the NULL value. This added space increases the size of your database, which in turn means that it takes more I/O overhead to find the data you are looking for. Of course, one way around this is to use variable length fields instead. When NULLs are added to a variable length column, space is not unnecessarily wasted as it is with fixed length columns.
Second, use of the IS NULL clause in your WHERE clause can mean that an index is not used for the query and a table scan is performed instead. This can greatly reduce performance.
Third, the use of NULLS can lead to convoluted Transact-SQL code, which can mean code that doesn't run efficiently or that is buggy.
Ideally, NULLs should be avoided in your SQL Server databases.
Instead of using NULLs, use a coding scheme similar to this in your databases:
NA: Not applicable
NYN: Not yet known
TUN: Truly unknown
Such a scheme provides the benefits of using NULLs, but without the drawbacks.
NULL looks fine to me for this purpose. Performance is likely to be basically the same as with a non-null column and constant value, or maybe even better for filtering out all NULLs.
The alternative is to normalize QuickPickOrder into a table with a foreign key, and then perform an inner join to filter the nulls out (or a left join with a where clause to filter the non-nulls out).
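A sketch of those two queries, borrowing the MyTable/QuickPicks names from the earlier answer (hypothetical here):
-- inner join: only the rows that have a quick pick order
SELECT m.* FROM MyTable m JOIN QuickPicks q ON q.MyTableID = m.ID;
-- left join plus WHERE: only the rows that do not have one
SELECT m.* FROM MyTable m LEFT JOIN QuickPicks q ON q.MyTableID = m.ID WHERE q.MyTableID IS NULL;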
NULL looks good to me as well. SQL Server has many kinds of indices to choose from. I forget which ones do this, but some only index values in a given range. If you had that kind of index on the column being tested, the NULL valued records would not be in the index, and the index scan would be fast.
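The feature being recalled sounds like a filtered index, which arrived in SQL Server 2008 (so not on the 2005 target of the question). A sketch of what it would look like:
-- index only the rows that actually appear in the quick pick list
CREATE INDEX IX_MyTable_QuickPickOrder
ON MyTable (QuickPickOrder)
WHERE QuickPickOrder IS NOT NULL;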
Having a lot of NULLs in a column which has an index on it (or starting with it) is generally beneficial to this kind of query.
In databases that do not store NULLs in the index (Oracle, for example; SQL Server does index them), inserting / updating rows with NULL in there doesn't take the performance hit of having to update another secondary index. Either way, if, say, only 0.001% of your rows have a non-null value in that column, the IS NOT NULL query becomes pretty efficient as it just scans a relatively small index.
Of course all of this is relative, if your table is tiny anyway, it makes no appreciable difference.