PL/SQL Comparing bit strings, I need an AND operation?

I currently have a script that calculates the Tanimoto coefficient on the fingerprints of a chemical library. However, during testing I found that my implementation cannot feasibly be scaled up because of the way I compare the bit strings (it takes far too long). See below: this is the loop I need to improve. I've simplified it so that it looks at just two structures; the real script works through permutations of the whole dataset of structures, but that would overcomplicate the issue here.
LOOP
-- Find the NA bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_na FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 1;
-- FIND the NB bit
SELECT SUBSTR(qsar_kb.fingerprint.fingerprint, var_fragment_id ,1) INTO var_nb FROM qsar_kb.fingerprint where project_id = 1 AND structure_id = 2;
-- Test for both bits the same
IF var_na > 0 AND var_nb > 0 then
var_tally := var_tally + 1;
END IF;
-- Test for bit in A on and B off
IF var_na > var_nb then
var_tna := var_tna + 1;
END IF;
-- Test for bit in B on and A off.
IF var_nb > var_na then
var_tnb := var_tnb + 1;
END IF;
var_fragment_id := var_fragment_id + 1;
EXIT WHEN var_fragment_id > var_maxfragment_id;
END LOOP;
For a simple example
Structure A = '101010'
Structure B = '011001'
In my real data set the length of the binary is 500 bits and up.
I need to know:
1)The number of bits ON common to Both
2)The number of bits ON in A but off in B
3)The number of bits ON in B but off in A
So in this case
1) = 1
2) = 2
3) = 2
Ideally I want to change how I'm doing this. I don't want to be stepping through each bit in each string; it's just too expensive once I scale the whole system up to thousands of structures, each with a fingerprint bit string 500-1000 bits long.
My logic to fix this would be to:
Take the total number of bits ON in both
A) = 3
B) = 3
Then perform an AND operation and find how many bits are on in both
= 1
Then just subtract this from the totals to find the number of bits on in one but not the other.
So how can I perform an AND like operation on two strings of 0's and 1's to find the number of common 1's?
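The count-then-subtract logic described above can be sketched outside the database. This is only an illustration in Python, not PL/SQL; the function names `bit_counts` and `tanimoto` are made up for the example, and it assumes both fingerprints are equal-length strings of '0' and '1'.

```python
def bit_counts(a: str, b: str) -> tuple[int, int, int]:
    """Return (common, only_a, only_b) for two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    on_a = a.count("1")  # total bits ON in A
    on_b = b.count("1")  # total bits ON in B
    # AND the two strings as arbitrary-precision integers, then count the 1s
    common = bin(int(a, 2) & int(b, 2)).count("1")
    return common, on_a - common, on_b - common


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient: common / (bits ON in A or B)."""
    common, only_a, only_b = bit_counts(a, b)
    union = common + only_a + only_b
    return common / union if union else 0.0


print(bit_counts("101010", "011001"))  # (1, 2, 2)
print(tanimoto("101010", "011001"))    # 0.2
```

The point of the sketch is that the per-bit loop disappears: one AND plus three counts gives all three numbers.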

Check out the BITAND function.
The BITAND function treats its inputs and its output as vectors of bits; the output is the bitwise AND of the inputs.
However, according to the documentation, BITAND only works on numbers up to about 2^128, so a fingerprint of 500 bits or more will not fit in a single call.
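If that size limit is the obstacle, one possible workaround is to AND the fingerprints one fixed-width chunk at a time and add up the counts. The sketch below shows the idea in Python rather than PL/SQL; the 64-bit chunk width and the name `chunked_and_count` are assumptions chosen for the example.

```python
CHUNK_BITS = 64  # assumed chunk width, small enough for a 128-bit-limited AND


def chunked_and_count(a: str, b: str, chunk_bits: int = CHUNK_BITS) -> int:
    """Count the bits ON in both strings, ANDing one chunk at a time."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    total = 0
    for start in range(0, len(a), chunk_bits):
        # Both slices cover the same positions, so the bits stay aligned
        chunk_a = int(a[start:start + chunk_bits], 2)
        chunk_b = int(b[start:start + chunk_bits], 2)
        total += bin(chunk_a & chunk_b).count("1")
    return total


# 600-bit fingerprints built by repeating the question's 6-bit example
print(chunked_and_count("101010" * 100, "011001" * 100))  # 100
```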

You should move the SELECT out of the loop. I'm pretty sure you're spending 99% of the time selecting 1 bit 500 times where you could select 500 bits in one go and then loop through the string:
DECLARE
l_structure_a LONG;
l_structure_b LONG;
var_na VARCHAR2(1);
var_nb VARCHAR2(1);
BEGIN
SELECT MAX(decode(structure_id, 1, fingerprint)),
MAX(decode(structure_id, 2, fingerprint))
INTO l_structure_a, l_structure_b
FROM qsar_kb.fingerprint
WHERE project_id = 1
AND structure_id IN (1,2);
LOOP
var_na := substr(l_structure_a, var_fragment_id, 1);
var_nb := substr(l_structure_b, var_fragment_id, 1);
-- Test for both bits the same
IF var_na > 0 AND var_nb > 0 THEN
var_tally := var_tally + 1;
END IF;
-- Test for bit in A on and B off
IF var_na > var_nb THEN
var_tna := var_tna + 1;
END IF;
-- Test for bit in B on and A off.
IF var_nb > var_na THEN
var_tnb := var_tnb + 1;
END IF;
var_fragment_id := var_fragment_id + 1;
EXIT WHEN var_fragment_id > var_maxfragment_id;
END LOOP;
END;
Edit:
You could also do it in a single SQL statement:
WITH DATA AS (
SELECT '101010' fingerprint,1 project_id, 1 structure_id FROM dual
UNION ALL SELECT '011001', 1, 2 FROM dual),
transpose AS (SELECT ROWNUM fragment_id FROM dual CONNECT BY LEVEL <= 1000)
SELECT COUNT(CASE WHEN var_na = 1 AND var_nb = 1 THEN 1 END) nb_1,
COUNT(CASE WHEN var_na > var_nb THEN 1 END) nb_2,
COUNT(CASE WHEN var_na < var_nb THEN 1 END) nb_3
FROM (SELECT to_number(substr(struct_a, fragment_id, 1)) var_na,
to_number(substr(struct_b, fragment_id, 1)) var_nb
FROM (SELECT MAX(decode(structure_id, 1, fingerprint)) struct_a,
MAX(decode(structure_id, 2, fingerprint)) struct_b
FROM DATA
WHERE project_id = 1
AND structure_id IN (1, 2))
CROSS JOIN transpose);
NB_1 NB_2 NB_3
---------- ---------- ----------
1 2 2

I'll sort of expand on the answer from Lukas with a little bit more information.
A little bit of an internet search revealed code from Tom Kyte (via Jonathan Lewis) to convert between bases. There is a function to_dec which will take a string and convert it to a number. I have reproduced the code below:
Convert base number to decimal:
create or replace function to_dec(
p_str in varchar2,
p_from_base in number default 16) return number
is
l_num number default 0;
l_hex varchar2(16) default '0123456789ABCDEF';
begin
for i in 1 .. length(p_str) loop
l_num := l_num * p_from_base + instr(l_hex,upper(substr(p_str,i,1)))-1;
end loop;
return l_num;
end to_dec;
Convert decimal to base number:
create or replace function to_base( p_dec in number, p_base in number )
return varchar2
is
l_str varchar2(255) default NULL;
l_num number default p_dec;
l_hex varchar2(16) default '0123456789ABCDEF';
begin
if ( trunc(p_dec) <> p_dec OR p_dec < 0 ) then
raise PROGRAM_ERROR;
end if;
loop
l_str := substr( l_hex, mod(l_num,p_base)+1, 1 ) || l_str;
l_num := trunc( l_num/p_base );
exit when ( l_num = 0 );
end loop;
return l_str;
end to_base;
This function can be called to convert the string bitmap into a number which can then be used with bitand. An example of this would be:
select to_dec('101010', 2) from dual
Oracle only really provides BITAND (and BIN_TO_NUM, which isn't really relevant here) as a way of doing logical operations, but the operations required here are (A AND B), (A AND NOT B) and (NOT A AND B). So we need a way of converting A to NOT A. A simple way of doing this is to use translate.
So.... the final outcome is:
select
length(translate(to_base(bitand(data_A, data_B),2),'10','1')) as nb_1,
length(translate(to_base(bitand(data_A, data_NOT_B),2),'10','1')) as nb_2,
length(translate(to_base(bitand(data_NOT_A, data_B),2),'10','1')) as nb_3
from (
select
to_dec(data_A,2) as data_A,
to_dec(data_b,2) as data_B,
to_dec(translate(data_A, '01', '10'),2) as data_NOT_A,
to_dec(translate(data_B, '01', '10'),2) as data_NOT_B
from (
select '101010' as data_A, '011001' as data_B from dual
)
)
This is somewhat more complicated than I was hoping when I started writing this answer but it does seem to work.
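For readers who want to check the logic without an Oracle instance, the same three operations can be sketched in Python. `str.translate` plays the role of Oracle's translate for building NOT A and NOT B; the name `and_not_counts` is made up for this example.

```python
# Swap every '0' and '1', the same idea as translate(x, '01', '10')
FLIP = str.maketrans("01", "10")


def ones_in_and(x: str, y: str) -> int:
    """Number of 1 bits in (x AND y) for two bit strings."""
    return bin(int(x, 2) & int(y, 2)).count("1")


def and_not_counts(a: str, b: str) -> tuple[int, int, int]:
    """Return the counts for (A AND B), (A AND NOT B), (NOT A AND B)."""
    not_a = a.translate(FLIP)
    not_b = b.translate(FLIP)
    return ones_in_and(a, b), ones_in_and(a, not_b), ones_in_and(not_a, b)


print(and_not_counts("101010", "011001"))  # (1, 2, 2)
```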

Can be done pretty simply with something like this:
SELECT utl_raw.BIT_AND( t.A, t.B ) SET_IN_A_AND_B,
length(replace(utl_raw.BIT_AND( t.A, t.B ), '0', '')) SET_IN_A_AND_B_COUNT,
utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) ) ONLY_SET_IN_A,
length(replace(utl_raw.BIT_AND( t.A, utl_raw.bit_complement(t.B) ),'0','')) ONLY_SET_IN_A_COUNT,
utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) ) ONLY_SET_IN_B,
length(replace(utl_raw.BIT_AND( t.B, utl_raw.bit_complement(t.A) ),'0','')) ONLY_SET_IN_B_COUNT
FROM (SELECT '1111100000111110101010' A, '1101011010101010100101' B FROM dual) t
Make sure your bit string has an even length (just pad it with a zero if it has an odd length).
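As far as I can tell, this works because the '0'/'1' string is implicitly read as hexadecimal, so every character becomes one hex digit (half a byte), which would also explain the even-length requirement. The Python sketch below mimics that interpretation; it illustrates the idea and is not a description of UTL_RAW internals, and `hex_digit_and_count` is a made-up name.

```python
def hex_digit_and_count(a: str, b: str) -> int:
    """Read each '0'/'1' character as a hex digit, AND byte-wise, count the 1s."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    if len(a) % 2:
        # Hex needs whole bytes, so pad with a leading zero as suggested above
        a, b = "0" + a, "0" + b
    raw_a = bytes.fromhex(a)
    raw_b = bytes.fromhex(b)
    anded = bytes(x & y for x, y in zip(raw_a, raw_b))
    # Hex digit 1 AND 1 is 1, anything involving 0 is 0, so the result
    # is again a string of 0s and 1s
    return anded.hex().count("1")


print(hex_digit_and_count("1111100000111110101010", "1101011010101010100101"))  # 7
```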

Related

decimal to binary 2's complement oracle sql

Convert a number to binary two's complement:
I have a sample number in a column of an Oracle table: 1647795600
I want to convert this to binary two's complement.
Expected output-01100010001101110101110110010000
Reference link - https://www.rapidtables.com/convert/number/decimal-to-binary.html
You can create the function:
CREATE FUNCTION dec_to_2c_bin(
value IN PLS_INTEGER,
width IN PLS_INTEGER := 32
) RETURN VARCHAR2 DETERMINISTIC
IS
v_unsigned PLS_INTEGER;
v_binary VARCHAR2(201);
BEGIN
IF value < 0 THEN
v_unsigned := -1 - value;
ELSE
v_unsigned := value;
END IF;
WHILE ( v_unsigned > 0 ) LOOP
v_binary := MOD(v_unsigned, 2) || v_binary;
v_unsigned := TRUNC( v_unsigned / 2 );
END LOOP;
IF LENGTH(v_binary) > width - 1 THEN
RAISE_APPLICATION_ERROR(-20000, 'The value is too large.');
END IF;
v_binary := LPAD(v_binary, width, '0');
IF value < 0 THEN
RETURN TRANSLATE(v_binary, '01', '10');
ELSE
RETURN v_binary;
END IF;
END;
/
Then for the sample data:
CREATE TABLE table_name (value) AS
SELECT +1647795600 FROM DUAL UNION ALL
SELECT -1647795600 FROM DUAL UNION ALL
SELECT 25143 FROM DUAL UNION ALL
SELECT +3142 FROM DUAL UNION ALL
SELECT -3142 FROM DUAL;
The query:
SELECT value, dec_to_2c_bin(value, 64) AS binary2c
FROM table_name;
Outputs:
| VALUE | BINARY2C |
|-------------|------------------------------------------------------------------|
| 1647795600 | 0000000000000000000000000000000001100010001101110101110110010000 |
| -1647795600 | 1111111111111111111111111111111110011101110010001010001001110000 |
| 25143 | 0000000000000000000000000000000000000000000000000110001000110111 |
| 3142 | 0000000000000000000000000000000000000000000000000000110001000110 |
| -3142 | 1111111111111111111111111111111111111111111111111111001110111010 |
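As a cross-check of the output above, the same conversion can be sketched in a few lines of Python. Masking with 2^width - 1 maps a negative number onto its two's-complement bit pattern; the function name `to_twos_complement` is made up for this example.

```python
def to_twos_complement(value: int, width: int = 32) -> str:
    """Return value as a two's-complement bit string of the given width."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("The value is too large.")
    # The mask keeps the low `width` bits; for negatives this is 2**width + value
    return format(value & ((1 << width) - 1), f"0{width}b")


print(to_twos_complement(1647795600))  # 01100010001101110101110110010000
print(to_twos_complement(-3142, 16))   # 1111001110111010
```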
I found a statement that worked for the sample number I tested:
select reverse(max(replace(sys_connect_by_path(mod(trunc(&N/power(2,level-1)),2),' '),' ',''))) bin
from dual
connect by level <= 32
;
It is not completely usable as it stands, though: it still needs to be modified to read the value from a table.

Oracle function to compare strings in a not ordered way

I need a function to compare two strings in Oracle without considering the order of their characters.
i.e. "asd" and "sad" should be considered equal.
Is there a built-in function for this, or do I need to write my own?
This can be done with a simple java function to sort the characters of a string alphabetically:
CREATE AND COMPILE JAVA SOURCE NAMED SORTSTRING AS
public class SortString {
public static String sort( final String value )
{
final char[] chars = value.toCharArray();
java.util.Arrays.sort( chars );
return new String( chars );
}
};
/
Which you can then create a PL/SQL function to invoke:
CREATE FUNCTION SORTSTRING( in_value IN VARCHAR2 ) RETURN VARCHAR2
AS LANGUAGE JAVA NAME 'SortString.sort( java.lang.String ) return java.lang.String';
/
Then you can do a simple comparison on the sorted strings:
SELECT CASE
WHEN SORTSTRING( 'ads' ) = SORTSTRING( 'das' )
THEN 'Equal'
ELSE 'Not Equal'
END
FROM DUAL;
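The sort-then-compare idea is easy to check in isolation. Here is a small Python sketch of it, together with a counting variant that avoids the sort; both function names are made up for the example.

```python
from collections import Counter


def same_letters(a: str, b: str) -> bool:
    """Compare two strings after sorting their characters."""
    return sorted(a) == sorted(b)


def same_letters_counted(a: str, b: str) -> bool:
    """Same test without sorting: compare per-character counts."""
    return Counter(a) == Counter(b)


print(same_letters("asd", "sad"))           # True
print(same_letters("asd", "sadx"))          # False
print(same_letters_counted("ads", "das"))   # True
```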
Not exactly rocket science, but it works (kind of, at least in simple cases).
What does it do? It sorts the letters in each string alphabetically and compares the results.
with test (col1, col2) as
(select 'asd', 'sad' from dual),
inter as
(select
col1, regexp_substr(col1, '[^.]', 1, level) c1,
col2, regexp_substr(col2, '[^.]', 1, level) c2
from test
connect by level <= greatest(length(col1), length(col2))
),
agg as
(select listagg(c1, '') within group (order by c1) col1_new,
listagg(c2, '') within group (order by c2) col2_new
from inter
)
select case when col1_new = col2_new then 'Equal'
else 'Different'
end result
from agg;
RESULT
---------
Equal
with test (col1, col2) as
(select 'asd', 'sadx' from dual),
<snip>
RESULT
---------
Different
Yet another solution, using the SUBSTR function and CONNECT BY loop.
Query 1:
WITH a
AS (SELECT ROWNUM rn, a1.*
FROM ( SELECT SUBSTR ('2asd', LEVEL, 1) s1
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('2asd')
ORDER BY s1) a1),
b
AS (SELECT ROWNUM rn, a2.*
FROM ( SELECT SUBSTR ('asd2', LEVEL, 1) s2
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('asd2')
ORDER BY s2) a2)
SELECT CASE COUNT (NULLIF (s1, s2)) WHEN 0 THEN 'EQUAL' ELSE 'NOT EQUAL' END
res
FROM a INNER JOIN b ON a.rn = b.rn
Results:
| RES |
|-------|
| EQUAL |
EDIT : A PL/SQL Sort function for alphanumeric strings.
CREATE OR replace FUNCTION fn_sort(str VARCHAR2)
RETURN VARCHAR2 DETERMINISTIC AS
v_s VARCHAR2(4000);
BEGIN
SELECT LISTAGG(substr(str, LEVEL, 1), '')
within GROUP ( ORDER BY substr(str, LEVEL, 1) )
INTO v_s
FROM dual
CONNECT BY LEVEL < = length(str);
RETURN v_s;
END;
/
select fn_sort('shSdf3213Js') as s
from dual;
| S |
|-------------|
| 1233JSdfhss |
In case you want to create your own sort function, you can use the code below:
CREATE OR REPLACE FUNCTION sort_text (p_text_to_sort VARCHAR2) RETURN VARCHAR2
IS
v_sorted_text VARCHAR2(1000);
BEGIN
v_sorted_text := p_text_to_sort;
FOR i IN 1..LENGTH(p_text_to_sort)
LOOP
FOR j IN 1..LENGTH(p_text_to_sort)
LOOP
IF SUBSTR(v_sorted_text, j, 1)||'' > SUBSTR(v_sorted_text, j+1, 1)||'' THEN
v_sorted_text := SUBSTR(v_sorted_text, 1, j-1)||
SUBSTR(v_sorted_text, j+1, 1)||
SUBSTR(v_sorted_text, j, 1)||
SUBSTR(v_sorted_text, j+2);
END IF;
END LOOP;
END LOOP;
RETURN v_sorted_text;
END;
/
SELECT SORT_TEXT('zlkdsadfsdfasdf') SORTED_TEXT
FROM dual;
SORTED_TEXT
---------------
aaddddfffklsssz

Number System Conversion - Base 10 to base x using SQL statements only

I want to stop making the same performance mistakes and need someone way steadier on SQL statements than me on this one.
Basically I want my function:
create or replace FUNCTION SEQGEN(vinp in varchar2, iSeq in INTEGER)
RETURN VARCHAR2 is vResult VARCHAR2(32);
iBas INTEGER; iRem INTEGER; iQuo INTEGER; lLen CONSTANT INTEGER := 2;
BEGIN
iBas := length(vInp);
iQuo := iSeq;
WHILE iQuo > 0 LOOP
iRem := iQuo mod iBas;
--dbms_output.put_line('Now we divide ' || lpad(iQuo,lLen,'0') || ' by ' || lpad(iBas,lLen,'0') || ', yielding a quotient of ' || lpad( TRUNC(iQuo / iBas) ,lLen,'0') || ' and a remainder of ' || lpad(iRem,lLen,'0') || ' giving the char: ' || substr(vInp, iRem, 1)); end if;
iQuo := TRUNC(iQuo / iBas);
If iRem < 1 Then iRem := iBas; iQuo := iQuo - 1; End If;
vResult := substr(vInp, iRem, 1) || vResult;
END LOOP;
RETURN vResult;
END SEQGEN;
to be written with SQL statements only.
Something like:
WITH sequence ( vResult, lSeq ) AS
(
SELECT str, length(str) base FROM (SELECT 'abc' str FROM DUAL)
)
SELECT vResult FROM sequence WHERE lSeq < 13
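| seq | output if str = 'aB' | output if str = 'abC' |
|-----|----------------------|-----------------------|
| 1 | a | a |
| 2 | B | b |
| 3 | aa | C |
| 4 | aB | aa |
| 5 | Ba | ab |
| 6 | BB | aC |
| 7 | aaa | ba |
| 8 | aaB | bb |
| 9 | aBa | bC |
| 10 | aBB | Ca |
| 11 | Baa | Cb |
| 12 | BaB | CC |
| 13 | BBa | aaa |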
and if str = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' then:
SELECT vResult FROM sequence
WHERE vResult in ('0001','0002','0009','000A','000Z',
'0010','0011','001A','ZZZZ') --you get the idea...
I have found some questions at Stackoverflow that are fairly related.
Base 36 to Base 10 conversion using SQL only and
PL/SQL base conversion without functions.
But with my current knowledge of SQL I am not quite able to hack this one...
EDITED:
Or well, it is almost like this one:
select sum(position_value) pos_val from (
select power(base,position-1) * instr('abc', digit) as position_value from (
select substr(input_string,length(input_string)+1-level,1) digit, level position, length(input_string) base
from (select 'cc' input_string from dual)
connect by level <= length(input_string)
)
)
Except that I want to give pos_val as a parameter and get the input_string out...
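To pin down the behaviour being asked for, here is the SEQGEN loop and its inverse (the pos_val query above) as a Python sketch. It is a reference for checking results, not the SQL-only solution the question wants; `seqgen` and `seqval` are names chosen for this example.

```python
def seqgen(alphabet: str, n: int) -> str:
    """Mirror of the SEQGEN loop: digits run from 1 to base, there is no zero."""
    base = len(alphabet)
    out = []
    while n > 0:
        n, rem = divmod(n, base)
        if rem == 0:
            # A remainder of 0 means "the last character", borrowing one from n
            rem = base
            n -= 1
        out.append(alphabet[rem - 1])
    return "".join(reversed(out))


def seqval(alphabet: str, text: str) -> int:
    """Inverse direction, as in the pos_val query: string back to number."""
    base = len(alphabet)
    value = 0
    for ch in text:
        value = value * base + alphabet.index(ch) + 1
    return value


for i in range(1, 14):
    print(i, seqgen("aB", i), seqgen("abC", i))
```

The round trip `seqval(alphabet, seqgen(alphabet, n)) == n` is a handy way to test any SQL rewrite against known values.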
Here is one way. Another cool usage of the model function in Oracle SQL.

Incrementing values in table using PL/SQL?

In my query I'm using a for loop that inserts 1000 three times. I need to increment the value on each iteration (1001, 1002, ...) while still inserting each number three times, i.e. I want to add 1000, 1000, 1000, 1001, 1001, 1001 and 1002, 1002, 1002 to my table.
declare
CPName varchar(20) :=1000;
a number;
begin
for a in 1 .. 3 loop
insert into clients values (CPName,null,null);
end loop;
end;
How can I do this?
CPName is a VARCHAR; I assume you want this to be a number, in which case you just add it on.
There's no need to define the variable a either, it's implicitly declared by the LOOP. I would call this i as it's a more common name for an index variable.
declare
CPName integer := 1000;
begin
for i in 1 .. 3 loop
insert into clients values (CPName + i, null, null);
end loop;
end;
You can do this all in a single SQL statement though; there's no need to use PL/SQL.
insert into clients
select 1000 + i, null, null
from dual
cross join ( select level as i
from dual
connect by level <= 3 )
Based on your comments you actually want something like this:
insert into clients
with multiply as (
select level - 1 as i
from dual
connect by level <= 3
)
select 1000 + m.i, null, null
from dual
cross join multiply m
cross join multiply
This will only work if you want the same number of records as you want to increase so maybe you'd prefer to do it this way, which will give you a lot more flexibility:
insert into clients
with increments as (
select level - 1 as i
from dual
connect by level <= 5
)
, iterations as (
select level as j
from dual
connect by level <= 3
)
select 1000 + m.i, null, null
from dual
cross join increments m
cross join iterations
Using your LOOP methodology this would involve a second, interior loop:
declare
CPName integer := 1000;
begin
for i in 0 .. 2 loop
for j in 1 .. 3 loop
insert into clients values (CPName + i, null, null);
end loop;
end loop;
end;
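The nested-loop shape (outer loop picks the value, inner loop repeats it) can be checked quickly outside the database. A small Python sketch, with `repeated_sequence` as a made-up name and the increments starting at 0 so that the first value is 1000 as in the question:

```python
def repeated_sequence(start: int, increments: int, repeats: int) -> list[int]:
    """Each of `increments` consecutive values, emitted `repeats` times in a row."""
    return [start + i for i in range(increments) for _ in range(repeats)]


print(repeated_sequence(1000, 3, 3))
# [1000, 1000, 1000, 1001, 1001, 1001, 1002, 1002, 1002]
```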

SQL sort by version "number", a string of varying length

I'm trying to create an SQL query that will order the results by a version number (e.g. 1.1, 4.5.10, etc.)
Here's what I tried:
SELECT * FROM Requirements
WHERE Requirements.Release NOT LIKE '%Obsolete%'
ORDER BY Requirements.ReqNum
Now, the ReqNum field is a string field and unfortunately I can't change it to a float or something like that because I have requirement numbers like 162.1.11.
When I get the results back, I'll get ordering like this:
1.1
1.10
1.11
1.3
How can I write a query that will sort by lexicographic order?
... or,
How can I correctly sort the data?
Thanks for the input in advance!
In PostgreSQL you can do:
SELECT * FROM Requirements
ORDER BY string_to_array(version, '.')::int[];
This last ::int[] makes it convert string values into integers and then compare as such.
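The reason the integer-array cast works is that arrays compare element by element, so 10 is compared with 3 as a number rather than '1' with '3' as a character. The same effect can be seen in this Python sketch, where lists of ints compare the same way; `version_key` is a made-up name.

```python
def version_key(version: str) -> list[int]:
    """Split on dots and convert every segment to an integer."""
    return [int(part) for part in version.split(".")]


versions = ["1.1", "1.10", "1.11", "1.3", "162.1.11", "4.5.10"]
print(sorted(versions, key=version_key))
# ['1.1', '1.3', '1.10', '1.11', '4.5.10', '162.1.11']
```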
For best results, refactor version number storage so that each section has its own column: MajorVersion, MinorVersion, Revision, Build. Then the ordering problem suddenly becomes trivial. You can also build a computed column for easy retrieval of the full string.
SELECT * FROM Requirements
WHERE Requirements.Release NOT LIKE '%Obsolete%'
ORDER BY cast('/' + replace(Requirements.ReqNum , '.', '/') + '/' as hierarchyid);
A slight variation on @vuttipong-l's answer (T-SQL)
SELECT VersionNumber
FROM (
SELECT '6.1.3' VersionNumber UNION
SELECT '6.11.3' UNION
SELECT '6.2.3' UNION
SELECT '6.1.12'
) AS q
ORDER BY cast('/' + VersionNumber + '/' as hierarchyid)
Works in SQL Server starting with 2008, dots are OK in a string representation of a hierarchyid column, so we don't need to replace them with slashes.
A quote from the doc:
Comparison is performed by comparing the integer sequences separated
by dots in dictionary order.
There's one caveat though: the version segments must not be prefixed with zeroes.
If you are in SQL Server land...
DECLARE @string varchar(40)
SET @string = '1.2.3.4'
SELECT PARSENAME(@string, 1), PARSENAME(@string, 2), PARSENAME(@string, 3), PARSENAME(@string, 4)
Results:
4, 3, 2, 1
Useful for parsing IP Addresses and other dotted items, such as a version number. (You can use REPLACE() to convert items into dotted notation too... e.g. 1-2-3-4 -> 1.2.3.4)
If you don't re-design the table as Joel Coehoorn sensibly suggests, then you need to re-format the version numbers to a string that sorts as you require, e.g.
1.1 -> 0001.0001.0000
162.1.11 -> 0162.0001.0011
This could be done by a function, or using a computed/virtual column if your DBMS has these. Then you can use that function or column in the ORDER BY clause.
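The re-formatting rule itself is small. A Python sketch of it follows, assuming 4-digit segments and three levels as in the two examples above; `pad_version` and its parameters are hypothetical names, not part of any DBMS.

```python
def pad_version(version: str, width: int = 4, levels: int = 3) -> str:
    """Zero-pad every segment and fill missing trailing levels with zeros."""
    parts = version.split(".")
    parts += ["0"] * (levels - len(parts))  # a negative count adds nothing
    return ".".join(part.zfill(width) for part in parts)


print(pad_version("1.1"))       # 0001.0001.0000
print(pad_version("162.1.11"))  # 0162.0001.0011
```

Once every segment has the same width, plain string ordering and version ordering agree.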
The following function will take a version number and format each level out to 3 digits:
Usage:
select * from TableX order by dbo.fn_VersionPad(VersionCol1)
Function:
CREATE FUNCTION [dbo].[fn_VersionPad]
(
@version varchar(20)
)
RETURNS varchar(20)
AS
BEGIN
/*
Purpose: Pads multi-level Version Number sections to 3 digits
Example: 1.2.3.4
Returns: 001.002.003.004
*/
declare @verPad varchar(20)
declare @i int
declare @digits int
set @verPad = ''
set @i = len(@version)
set @digits = 0
while @i > 0
begin
if (substring(@version, @i, 1) = '.')
begin
while (@digits < 3)
begin
-- Pad version level to 3 digits
set @verPad = '0' + @verPad
set @digits = @digits + 1
end
set @digits = -1
end
set @verPad = substring(@version, @i, 1) + @verPad
set @i = @i - 1
set @digits = @digits + 1
end
while (@digits < 3)
begin
-- Pad version level to 3 digits
set @verPad = '0' + @verPad
set @digits = @digits + 1
end
return @verPad
END
You could split up the string (you already know the delimiters: ".") with CHARINDEX / SUBSTR and ORDER BY the different parts. Do it in a function or do it part by part.
It won't be pretty and it won't be fast: so if you need fast queries, follow Tony or Joel.
NOT USING CODE
Insert into @table
Select 'A1' union all
Select 'A3' union all
Select 'A5' union all
Select 'A15' union all
Select 'A11' union all
Select 'A10' union all
Select 'A2' union all
Select 'B2' union all
Select 'C2' union all
Select 'C22' union all
Select 'C221' union all
Select 'A7'
Select cod from @table
Order by LEN(cod),cod
Result :
A1
A2
A3
A5
A7
B2
C2
A10
A11
A15
C22
C221
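Ordering by length first and by text second is easy to reproduce. The Python sketch below uses the same sample codes and gives the same order as the result listing; `length_then_text` is a made-up name.

```python
def length_then_text(code: str) -> tuple[int, str]:
    """Sort key equivalent to ORDER BY LEN(cod), cod."""
    return (len(code), code)


codes = ["A1", "A3", "A5", "A15", "A11", "A10", "A2", "B2", "C2", "C22", "C221", "A7"]
print(sorted(codes, key=length_then_text))
# ['A1', 'A2', 'A3', 'A5', 'A7', 'B2', 'C2', 'A10', 'A11', 'A15', 'C22', 'C221']
```

Note that length wins over the prefix, which is why B2 and C2 come before A10 in the result.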
It's simple as:
Declare @table table(id_ int identity(1,1), cod varchar(10))
Insert into @table
Select 'A1' union all
Select 'A3' union all
Select 'A5' union all
Select 'A15' union all
Select 'A11' union all
Select 'A10' union all
Select 'A2' union all
Select 'A7'
Select cod from @table
Order by LEN(cod),cod
On PostgreSQL, it couldn't be easier:
SELECT ver_no FROM version ORDER BY string_to_array(ver_no, '.', '')::int[]
This would work if you're using Microsoft SQL Server:
create function fnGetVersion (@v AS varchar(50)) returns bigint as
begin
declare @n as bigint;
declare @i as int;
select @n = 0;
select @i = charindex('.',@v);
while(@i > 0)
begin
select @n = @n * 1000;
select @n = @n + cast(substring(@v,1,@i-1) as bigint);
select @v = substring(@v,@i+1,len(@v)-@i);
select @i = charindex('.',@v);
end
return @n * 1000 + cast(@v as bigint);
end
end
Test by running this command:
select dbo.fnGetVersion('1.2.3.4')
That would return the number 1002003004, which is perfectly sortable. If you need 9.0.1 to be bigger than 2.1.2.3, then you would need to change the logic slightly. In my example 9.0.1 would be sorted before 2.1.2.3.
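One possible way to make the slight change mentioned above is to always encode a fixed number of levels, filling the missing ones with zeros, so that 9.0.1 is treated as 9.0.1.0. A Python sketch of that variant follows; the four-level limit, the factor of 1000 per level, and the name `version_number` are assumptions for the example.

```python
def version_number(version: str, levels: int = 4, scale: int = 1000) -> int:
    """Encode a dotted version as one integer, `scale` per level, fixed depth."""
    parts = [int(part) for part in version.split(".")]
    if len(parts) > levels or any(p < 0 or p >= scale for p in parts):
        raise ValueError("version does not fit the assumed levels/scale")
    parts += [0] * (levels - len(parts))  # 9.0.1 becomes 9.0.1.0
    number = 0
    for part in parts:
        number = number * scale + part
    return number


print(version_number("1.2.3.4"))  # 1002003004
print(version_number("9.0.1") > version_number("2.1.2.3"))  # True
```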
Function for PostgreSQL
Simply use
select *
from sample_table
order by _sort_version(column_version);
CREATE FUNCTION _sort_version (
p_version text
)
RETURNS text AS
$body$
declare
v_tab text[];
begin
v_tab := string_to_array(p_version, '.');
for i in 1 .. array_length(v_tab, 1) loop
v_tab[i] := lpad(v_tab[i], 4, '0');
end loop;
return array_to_string(v_tab, '.');
end;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY DEFINER
COST 1;
I've had the same problem, though mine was with apartment numbers like A1, A2, A3, A10, A11, etc., which they wanted sorted "right". If splitting up the version number into separate columns doesn't work, try this PL/SQL. It takes a string like A1 or A10 and expands it into A0000001, A0000010, etc., so it sorts nicely. Just call it in the ORDER BY clause, like
select apt_num
from apartment
order by PAD(apt_num)
function pad(inString IN VARCHAR2)
return VARCHAR2
--This function pads the numbers in a alphanumeric string.
--It is particularly useful in sorting, things like "A1, A2, A10"
--which would sort like "A1, A10, A2" in a standard "ORDER BY name" clause
--but by calling "ORDER BY pkg_sort.pad(name)" it will sort as "A1, A2, A10" because this
--function will convert it to "A00000000000000000001, A00000000000000000002, A00000000000000000010"
--(but since this is in the order by clause, it will
--not be displayed.
--currently, the charTemplate variable pads the number to 20 digits, so anything up to 99999999999999999999
--will work correctly.
--to increase the size, just change the charTemplate variable. If the number is larger than 20 digits, it will just
--appear without padding.
is
outString VARCHAR2(255);
numBeginIndex NUMBER;
numLength NUMBER;
stringLength NUMBER;
i NUMBER;
thisChar VARCHAR2(6);
charTemplate VARCHAR2(20) := '00000000000000000000';
charTemplateLength NUMBER := 20;
BEGIN
outString := null;
numBeginIndex := -1;
numLength := 0;
stringLength := length(inString);
--loop through each character, get that character
FOR i IN 1..(stringLength) LOOP
thisChar := substr(inString, i, 1);
--if this character is a number
IF (FcnIsNumber(thisChar)) THEN
--if we haven't started a number yet
IF (numBeginIndex = -1) THEN
numBeginIndex := i;
numLength := 1;
--else if we're in a number, increase the length
ELSE
numLength := numLength + 1;
END IF;
--if this is the last character, we have to append the number
IF (i = stringLength) THEN
outString:= FcnConcatNumber(inString, outString, numBeginIndex, numLength, charTemplate, charTemplateLength);
END IF;
--else this is a character
ELSE
--if we were previously in a number, concat that and reset the numBeginIndex
IF (numBeginIndex != -1) THEN
outString:= FcnConcatNumber(inString, outString, numBeginIndex, numLength, charTemplate, charTemplateLength);
numBeginIndex := -1;
numLength := 0;
END IF;
--concat the character
outString := outString || thisChar;
END IF;
END LOOP;
RETURN outString;
--any exception, just return the original string
EXCEPTION WHEN OTHERS THEN
RETURN inString;
END;
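The idea behind pad (widen every run of digits so that string order matches numeric order) fits in a few lines when a regular expression is available. This Python sketch is an illustration of the technique, not a translation of the function above; `pad_numbers` is a made-up name and the default width of 20 mirrors charTemplate.

```python
import re


def pad_numbers(text: str, width: int = 20) -> str:
    """Left-pad every run of digits in text with zeros up to `width` characters."""
    return re.sub(r"[0-9]+", lambda match: match.group().zfill(width), text)


print(pad_numbers("A1", 7))   # A0000001
print(pad_numbers("A10", 7))  # A0000010
print(sorted(["A1", "A10", "A2"], key=pad_numbers))  # ['A1', 'A2', 'A10']
```

As in the original, a number longer than the width is left unpadded, because `zfill` never truncates.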
Here is an example query that extracts the string. You should be able to use this in either the UPDATE refactoring of the database, or simply in your query as-is. However, I'm not sure how it is on time; just something to watch out and test for.
SELECT SUBSTRING_INDEX("1.5.32",'.',1) AS MajorVersion,
SUBSTRING_INDEX(SUBSTRING_INDEX("1.5.32",'.',-2),'.',1) AS MinorVersion,
SUBSTRING_INDEX("1.5.32",'.',-1) AS Revision;
this will return:
MajorVersion | MinorVersion | Revision
1 | 5 | 32
Ok, if high performance is an issue then your only option is to change your values into something numeric.
However, if this is a low usage query then you can just split your numbers and order by those.
This query assumes just major and minor version numbers and that they contain just numbers.
SELECT
*
FROM
Requirements
WHERE
Requirements.Release NOT LIKE '%Obsolete%'
ORDER BY
CONVERT(int, RIGHT(REPLICATE('0', 10) + LEFT(Requirements.ReqNum, CHARINDEX('.', Requirements.ReqNum)-1), 10)),
CONVERT(int, SUBSTRING(Requirements.ReqNum, CHARINDEX('.', Requirements.ReqNum )+1, LEN(Requirements.ReqNum) - CHARINDEX('.', Requirements.ReqNum )))
For the all-in-one-query purists, assuming Oracle, some instr/substr/decode/to_number voodoo can solve it:
SELECT *
FROM Requirements
WHERE Release NOT LIKE '%Obsolete%'
ORDER BY
to_number(
substr( reqnum, 1, instr( reqnum, '.' ) - 1 )
)
, to_number(
substr(
reqnum
, instr( reqnum, '.' ) + 1 -- start: after first occurrence
, decode(
instr( reqnum, '.', 1, 2 )
, 0, length( reqnum )
, instr( reqnum, '.', 1, 2 ) - 1
) -- second occurrence (or end)
- instr( reqnum, '.', 1, 1) -- length: second occurrence (or end) less first
)
)
, to_number(
decode(
instr( reqnum, '.', 1, 2 )
, 0, null
, substr(
reqnum
, instr( reqnum, '.', 1, 2 ) + 1 -- start: after second occurrence
, decode(
instr( reqnum, '.', 1, 3 )
, 0, length( reqnum )
, instr( reqnum, '.', 1, 3 ) - 1
) -- third occurrence (or end)
- instr( reqnum, '.', 1, 2) -- length: third occurrence (or end) less second
)
)
)
, to_number(
decode(
instr( reqnum, '.', 1, 3 )
, 0, null
, substr(
reqnum
, instr( reqnum, '.', 1, 3 ) + 1 -- start: after third occurrence
, decode(
instr( reqnum, '.', 1, 4 )
, 0, length( reqnum )
, instr( reqnum, '.', 1, 4 ) - 1
) -- fourth occurrence (or end)
- instr( reqnum, '.', 1, 3) -- length: fourth occurrence (or end) less third
)
)
)
;
I suspect there are plenty of caveats including:
assumption of the presence of minor version (second)
limited to four versions as specified in question's comments
Here's a comparison function for PostgreSQL that will compare arbitrary strings such that sequences of digits are compared numerically. In other words, "ABC123" > "ABC2", but "AB123" < "ABC2". It returns -1, 0, or +1 as such comparison functions usually do.
CREATE FUNCTION vercmp(a text, b text) RETURNS integer AS $$
DECLARE
ar text[];
br text[];
n integer := 1;
BEGIN
SELECT array_agg(y) INTO ar FROM (SELECT array_to_string(regexp_matches(a, E'\\d+|\\D+|^$', 'g'),'') y) x;
SELECT array_agg(y) INTO br FROM (SELECT array_to_string(regexp_matches(b, E'\\d+|\\D+|^$', 'g'),'') y) x;
WHILE n <= array_length(ar, 1) AND n <= array_length(br, 1) LOOP
IF ar[n] ~ E'^\\d+$' AND br[n] ~ E'^\\d+$' THEN
IF ar[n]::integer < br[n]::integer THEN
RETURN -1;
ELSIF ar[n]::integer > br[n]::integer THEN
RETURN 1;
END IF;
ELSE
IF ar[n] < br[n] THEN
RETURN -1;
ELSIF ar[n] > br[n] THEN
RETURN 1;
END IF;
END IF;
n := n + 1;
END LOOP;
IF n > array_length(ar, 1) AND n > array_length(br, 1) THEN
RETURN 0;
ELSIF n > array_length(ar, 1) THEN
RETURN -1; -- a ran out first, so a sorts before b
ELSE
RETURN 1; -- b ran out first, so a sorts after b
END IF;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
You can then create an operator class so that sorting can be done by using the comparison function with ORDER BY field USING <#:
CREATE OR REPLACE FUNCTION vernum_lt(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN vercmp(a, b) < 0;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION vernum_lte(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN vercmp(a, b) <= 0;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION vernum_eq(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN vercmp(a, b) = 0;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION vernum_gt(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN vercmp(a, b) > 0;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION vernum_gte(a text, b text) RETURNS boolean AS $$
BEGIN
RETURN vercmp(a, b) >= 0;
END;
$$ IMMUTABLE LANGUAGE plpgsql;
CREATE OPERATOR <# ( PROCEDURE = vernum_lt, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR ># ( PROCEDURE = vernum_gt, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR =# ( PROCEDURE = vernum_eq, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR <=# ( PROCEDURE = vernum_lte, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >=# ( PROCEDURE = vernum_gte, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR CLASS vernum_ops FOR TYPE varchar USING btree AS
OPERATOR 1 <# (text, text),
OPERATOR 2 <=# (text, text),
OPERATOR 3 =#(text, text),
OPERATOR 4 >=# (text, text),
OPERATOR 5 ># (text, text),
FUNCTION 1 vercmp(text, text)
;
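The comparison rule used by vercmp (split each string into runs of digits and runs of non-digits, compare digit runs as numbers and everything else as text) can be tried out in Python. This is a sketch of the technique rather than a port; it assumes that a string which is a prefix of the other sorts first.

```python
import re
from functools import cmp_to_key

RUNS = re.compile(r"[0-9]+|[^0-9]+")  # alternating digit and non-digit runs


def vercmp(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing digit runs numerically and the rest as text."""
    ar, br = RUNS.findall(a), RUNS.findall(b)
    for x, y in zip(ar, br):
        if x.isdecimal() and y.isdecimal():
            x, y = int(x), int(y)  # both runs are numbers: compare by value
        if x < y:
            return -1
        if x > y:
            return 1
    # All shared runs are equal: the string with fewer runs sorts first (assumed)
    return (len(ar) > len(br)) - (len(ar) < len(br))


print(vercmp("ABC123", "ABC2"))  # 1
print(vercmp("AB123", "ABC2"))   # -1
print(sorted(["1.10", "1.3", "1.1", "1.11"], key=cmp_to_key(vercmp)))
# ['1.1', '1.3', '1.10', '1.11']
```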
Fixed this way.
<pre>
00000001 1
00000001.00000001 1.1
00000001.00000001.00000001 1.1.1
00000001.00000002 1.2
00000001.00000009 1.9
00000001.00000010 1.10
00000001.00000011 1.11
00000001.00000012 1.12
00000002 2
00000002.00000001 2.1
00000002.00000001.00000001 2.1.1
00000002.00000002 2.2
00000002.00000009 2.9
00000002.00000010 2.10
00000002.00000011 2.11
00000002.00000012 2.12
select * from (select '000000001' as tCode,'1' as Code union
select '000000001.000000001' as tCode,'1.1'as Code union
select '000000001.000000001.000000001' as tCode,'1.1.1'as Code union
select '000000001.000000002' as tCode,'1.2' union
select '000000001.000000010' as tCode,'1.10'as Code union
select '000000001.000000011' as tCode,'1.11'as Code union
select '000000001.000000012' as tCode,'1.12'as Code union
select '000000001.000000009' as tCode,'1.9' as Code
union
select '00000002' as tCode,'2'as Code union
select '00000002.00000001' as tCode,'2.1'as Code union
select '00000002.00000001.00000001' as tCode,'2.1.1'as Code union
select '00000002.00000002' as tCode,'2.2'as Code union
select '00000002.00000010' as tCode,'2.10'as Code union
select '00000002.00000011' as tCode,'2.11'as Code union
select '00000002.00000012' as tCode,'2.12'as Code union
select '00000002.00000009' as tCode,'2.9'as Code ) as t
order by t.tCode
</pre>
<pre>
public static string GenerateToCodeOrder(this string code)
{
var splits = code.Split('.');
var codes = new List<string>();
foreach (var str in splits)
{
var newStr = "";
var zeroLength = 10 - str.Length;
for (int i = 1; i < zeroLength; i++)
{
newStr += "0";
}
newStr += str;
codes.Add(newStr);
}
return string.Join(".", codes);
}
</pre>
In MS SQL I had issues with hierarchyid with some data...
select Convert(hierarchyid, '/' + '8.3.0000.1088' + '/')
To get around this I used PARSENAME (relies on '.' being the separator)...
Order by
convert(int, reverse (Parsename( reverse(tblSoftware.softwareVersion) , 1))),
convert(int, reverse (Parsename( reverse(tblSoftware.softwareVersion) , 2))),
convert(int, reverse (Parsename( reverse(tblSoftware.softwareVersion) , 3))),
convert(int, reverse (Parsename( reverse(tblSoftware.softwareVersion) , 4))),
convert(int, reverse (Parsename( reverse(tblSoftware.softwareVersion) , 5)))
If the column type for version is varchar the sorting is done as expected.
This is because varchar is not padded by spaces.
Here is an ORACLE expression you can use in an ORDER BY:
select listagg(substr('0000000000' || column_value,-9), '.') within group(order by rownum) from xmltable(replace(version, '.',','))
assuming your version column has only dot as separator (any number of levels).
(if not, up to you to change the replace by e.g. translate(version, '.-', ',,'))
I would do as Joel Coehoorn said. Then to re-arrange your data structure you don't have to manually do it. You can write a simple script that will do the job for all 600 records.
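Just remove the dots (inline, replace with an empty string), cast the result as int and order by the result. This works as long as every version has the same number of digits in each segment; otherwise different versions can collapse to the same integer (1.10.1 and 11.0.1 both become 1101):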
a.Version = 1.4.18.14
select...
Order by cast( replace (a.Version,'.','') as int)