Sort alphanumeric column - sql

I have a column in a database:
Serial Number
-------------
S1
S10
...
S2
S11
..
S13
I want to sort the values and return the following result for serial numbers <= 10:
S1
S2
S10
One way I tried was:
select Serial_number from table where Serial_Number IN ('S1', 'S2',... 'S10');
This serves the purpose, but I am looking for a better way.

Here is an easy way for this format:
order by length(Serial_Number),
Serial_Number
This works because the prefix ('S') is the same length on all the values.
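To see the ordering in action, here is a minimal sketch that runs the same ORDER BY against an in-memory SQLite database from Python. SQLite only stands in for the real database here, and the table name the_table and the helper sort_by_length_then_value are made up for the illustration.

```python
import sqlite3

def sort_by_length_then_value(values):
    """Sort serial numbers by string length first, then by the string itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (serial_number TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        "SELECT serial_number FROM the_table "
        "ORDER BY length(serial_number), serial_number"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

# Shorter strings come first, so 'S2' sorts before 'S10'.
print(sort_by_length_then_value(["S1", "S10", "S2", "S11", "S13"]))
```

The trick breaks as soon as the prefix length varies, because the length then no longer tracks the size of the number.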

For Postgres you can use something like this:
select serial_number
from the_table
order by regexp_replace(serial_number, '[^0-9]', '', 'g')::integer;
The regexp_replace will remove all non-numeric characters and the result is treated as a number which is suited for a "proper" sorting.
Edit 1:
You can use the new "number" to limit the result of the query:
select serial_number
from (
select serial_number,
regexp_replace(serial_number, '[^0-9]', '', 'g')::integer as snum
from the_table
) t
where snum <= 10
order by snum;
Edit 2
If you receive the error ERROR: invalid input syntax for integer: "" then apparently you have values in the serial_number column which do not follow the format you posted in your question. It means that regexp_replace() removed all characters from the string, so a value like S would cause that.
To prevent that, you need to either exclude those rows from the result using:
where length(regexp_replace(serial_number, '[^0-9]', '', 'g')) > 0
in the inner select. Or, if you need those rows for some reason, deal with that in the select list:
select serial_number
from (
select serial_number,
case
when length(regexp_replace(serial_number, '[^0-9]', '', 'g')) > 0 then regexp_replace(serial_number, '[^0-9]', '', 'g')::integer
else null -- or 0 whatever you need
end as snum
from the_table
) t
where snum <= 10
order by snum;
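The logic of the query above can also be sketched outside the database. The following Python snippet is only an illustration of the idea (strip the non-digits, treat an empty remainder as missing, then filter and sort by the number); the function names numeric_part and serials_up_to are hypothetical.

```python
import re

def numeric_part(serial_number):
    """Strip every non-digit; return the rest as an int, or None if nothing is left."""
    digits = re.sub(r"[^0-9]", "", serial_number)
    return int(digits) if digits else None

def serials_up_to(serial_numbers, limit=10):
    """Keep serials whose numeric part is <= limit, sorted by that number."""
    numbered = [(numeric_part(s), s) for s in serial_numbers]
    kept = [(n, s) for n, s in numbered if n is not None and n <= limit]
    return [s for n, s in sorted(kept)]

# 'S' has no digits at all, so it is skipped instead of raising an error.
print(serials_up_to(["S1", "S10", "S2", "S11", "S", "S13"]))
```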
This is a really nice example on why you should never mix two different things in a single column. If all your serial numbers have a prefix S you shouldn't store it and put the real number in a real integer (or bigint) column.
Using something like NOT_SET to indicate a missing value is also a bad choice. The NULL value was precisely invented for that reason: to indicate the absence of data.

Since only the first character spoils your numeric fun, just trim it with right() and sort by the numeric value:
SELECT *
FROM tbl
WHERE right(serial_number, -1)::int < 11
ORDER BY right(serial_number, -1)::int;
Requires Postgres 9.1 or later. In older versions substitute substring(x, 2).

Related

PL SQL replace conditionally suggestion

I need to replace the entire word with 0 if the word has any non-digit character. For example, if digital_word='22B4' then replace with 0, else if digital_word='224' then do not replace.
SELECT replace_funtion(digital_word,'has non numeric character pattern',0,digital_word)
FROM dual;
I tried decode, regexp_instr, regexp_replace but could not come up with the right solution.
Please advise.
Thank you.
The idea is simple: you need to check whether the value is numeric or not.
Script:
with nums as
(
select '123' as num from dual union all
select '456' as num from dual union all
select '7A9' as num from dual union all
select '098' as num from dual
)
select n.*
,nvl2(LENGTH(TRIM(TRANSLATE(num, ' +-.0123456789', ' '))),'0',num)
from nums n
result
1 123 123
2 456 456
3 7A9 0
4 098 098
See the articles below to decide which approach suits you best:
How can I determine if a string is numeric in SQL?
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:15321803936685
How to tell if a value is not numeric in Oracle?
You might try the following:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '\D') THEN '0' ELSE digital_word END
FROM dual;
The regular expression class \D matches any non-digit character. You could also use [^0-9] to the same effect:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '[^0-9]') THEN '0' ELSE digital_word END
FROM dual;
Alternately you could see if the value of digital_word is made up of nothing but digits:
SELECT CASE WHEN REGEXP_LIKE(digital_word, '^\d+$') THEN digital_word ELSE '0' END
FROM dual;
Hope this helps.
The fastest way is to replace all digits with null (to simply delete them) and see if anything is left. You don't need regular expressions (slow!) for this, you just need the standard string function TRANSLATE().
Unfortunately, Oracle has to work around their own inconsistent treatment of NULL - sometimes as empty string, sometimes not. In the case of the TRANSLATE() function, you can't simply translate every digit to nothing; you must also translate a non-digit character to itself, so that the third argument is not an empty string (which is treated as a real NULL, as in relational theory). See the Oracle documentation for the TRANSLATE() function. https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions216.htm#SQLRF06145
Then, the result can be obtained with a CASE expression (or various forms of NULL handling functions; I prefer CASE, which is SQL Standard):
with
nums ( num ) as (
select '123' from dual union all
select '-56' from dual union all
select '7A9' from dual union all
select '0.9' from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your own table and column names.
select num,
case when translate(num, 'z0123456789', 'z') is null
then num
else '0'
end as result
from nums
;
NUM RESULT
--- ------
123 123
-56 0
7A9 0
0.9 0
Note: everything here is in varchar2 data type (or some other kind of string data type). If the results should be converted to number, wrap the entire case expression within TO_NUMBER(). Note also that the strings '-56' and '0.9' are not all-digits (they contain non-digits), so the result is '0' for both. If this is not what you needed, you must correct the problem statement in the original post.
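For readers who want to try the "delete the digits and see what is left" idea without an Oracle instance, here is a rough Python equivalent. It is a sketch only: Python distinguishes the empty string from None, so the 'z' placeholder workaround is not needed, and the helper name zero_unless_all_digits is made up.

```python
DIGITS = "0123456789"

def zero_unless_all_digits(word):
    """Return the word unchanged if it contains only digits, otherwise '0'."""
    # Delete every digit; anything that remains is a non-digit character.
    leftover = word.translate(str.maketrans("", "", DIGITS))
    return word if leftover == "" else "0"

for word in ["123", "-56", "7A9", "0.9"]:
    print(word, zero_unless_all_digits(word))
```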
Something like the following update query will help you:
update [table] set [col] = '0'
where REGEXP_LIKE([col], '.*\D.*', 'i')

Get group maxima from combined strings

I have a table with a column code containing multiple pieces of data like this:
001/2017/TT/000001
001/2017/TT/000002
001/2017/TN/000003
001/2017/TN/000001
001/2017/TN/000002
001/2016/TT/000001
001/2016/TT/000002
001/2016/TT/000001
002/2016/TT/000002
There are 4 items in 001/2016/TT/000001: 001, 2016, TT and 000001.
How can I extract the max for every group formed by the first 3 items? The result I want is this:
001/2017/TT/000003
001/2017/TN/000002
001/2016/TT/000002
002/2016/TT/000002
Edit
The subfield separator is /, and the length of subfields can vary.
I use PostgreSQL 9.3.
Obviously, you should normalize the table and split the combined string into 4 columns with proper data types. The function split_part() is the tool of choice if the separator '/' is constant in your string and the length of the subfields can vary.
CREATE TABLE tbl_better AS
SELECT split_part(code, '/', 1)::int AS col_1 -- better names?
, split_part(code, '/', 2)::int AS col_2
, split_part(code, '/', 3) AS col_3 -- text?
, split_part(code, '/', 4)::int AS col_4
FROM tbl_bad
ORDER BY 1,2,3,4 -- optionally cluster data.
Then the task is trivial:
SELECT col_1, col_2, col_3, max(col_4) AS max_nr
FROM tbl_better
GROUP BY 1, 2, 3;
Related:
Split comma separated column data into additional columns
Of course, you can do it on the fly, too. For varying subfield length you could use substring() with a regular expression like this:
SELECT max(substring(code, '([^/]*)$')) AS max_nr
FROM tbl_bad
GROUP BY substring(code, '^(.*)/');
Related (with basic explanation for regexp pattern):
Filter strings with regex before casting to numeric
Or to get only the complete string as result:
SELECT DISTINCT ON (substring(code, '^(.*)/'))
code
FROM tbl_bad
ORDER BY substring(code, '^(.*)/'), code DESC;
About DISTINCT ON:
Select first row in each GROUP BY group?
Be aware that data items cast to a suitable type may behave differently from their string representation. The max of 900001 and 1000001 is 900001 for text and 1000001 for integer ...
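The difference between comparing as text and comparing as integer is easy to demonstrate. Below is a small Python sketch of the grouping idea (split off the last subfield, compare it as a number, keep the best code per prefix). The sample codes and the function name group_maxima are invented for the illustration and are not the data from the question.

```python
def group_maxima(codes):
    """For each prefix (everything before the last '/'), keep the code
    whose last subfield is the largest when compared as an integer."""
    best = {}
    for code in codes:
        prefix, _, last = code.rpartition("/")
        number = int(last)
        if prefix not in best or number > best[prefix][0]:
            best[prefix] = (number, code)
    return sorted(code for _, code in best.values())

codes = ["1/2017/TT/900001", "1/2017/TT/1000001", "22/2016/TN/5", "22/2016/TN/12"]
print(group_maxima(codes))

# Compared as text, '900001' beats '1000001' because '9' sorts after '1'.
print(max("900001", "1000001"))
```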
Use the LEFT and RIGHT functions.
SELECT MAX(RIGHT(code,6)) AS MAX_CODE
FROM yourtable
GROUP BY LEFT(code,12)
Check this out; it may be helpful:
select
distinct on (tab[4],tab[2]) tab[4],tab[3],tab[2],tab[1]
from
(
select
string_to_array(exe.x,'/') as tab,
exe.x
from
(
select
unnest
(
array
['001/2017/TT/000001',
'001/2017/TT/000002',
'001/2017/TN/000003',
'001/2017/TN/000001',
'001/2017/TN/000002',
'001/2016/TT/000001',
'001/2016/TT/000002',
'001/2016/TT/000001',
'002/2016/TT/000002']
) as x
) exe
) exe2
order by tab[4] desc,tab[2] desc,tab[3] desc;

Select statement with column contains '%'

I want to select names from a table where the 'name' column contains '%' anywhere in the value. For example, I want to retrieve the name 'Approval for 20 % discount for parts'.
SELECT NAME FROM TABLE WHERE NAME ... ?
You can use like with escape. The default is a backslash in some databases (but not in Oracle), so:
select name
from table
where name like '%\%%' ESCAPE '\'
This is standard, and works in most databases. The Oracle documentation is here.
Of course, you could also use instr():
where instr(name, '%') > 0
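Both variants can be tried quickly against an in-memory SQLite database from Python. This is just a sketch: SQLite stands in for the real database, it has no default escape character (so the ESCAPE clause is required there too), and the table name items and the helper names_with_percent are made up.

```python
import sqlite3

def names_with_percent(names):
    """Return the names containing '%', once via LIKE ... ESCAPE and once via instr()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [(n,) for n in names])
    # Raw string so the backslashes reach SQLite unchanged.
    like_sql = r"SELECT name FROM items WHERE name LIKE '%\%%' ESCAPE '\' ORDER BY name"
    instr_sql = "SELECT name FROM items WHERE instr(name, '%') > 0 ORDER BY name"
    by_like = [row[0] for row in conn.execute(like_sql)]
    by_instr = [row[0] for row in conn.execute(instr_sql)]
    conn.close()
    return by_like, by_instr

by_like, by_instr = names_with_percent(
    ["Approval for 20 % discount for parts", "ABC", "100%"]
)
print(by_like)
print(by_instr)
```

Both queries pick out the same two rows here; 'ABC' is left out by either one.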
One way to do it is using replace with an empty string and checking to see if the difference in length of the original string and modified string is > 0.
select name
from table
where length(name) - length(replace(name,'%','')) > 0
Make life easy on yourselves and just use REGEXP_LIKE( )!
SQL> with tbl(name) as (
select 'ABC' from dual
union
select 'E%FS' from dual
)
select name
from tbl
where regexp_like(name, '%');
NAME
----
E%FS
SQL>
I read the documentation mentioned by Gordon. The relevant sentence is:
An underscore (_) in the pattern matches exactly one character (as opposed to one byte in a multibyte character set) in the value
Here was my test:
select c
from (
select 'a%be' c
from dual) d
where c like '_%'
The value a%be was returned.
While the suggestions of using instr() or length() in the other two answers will lead to the correct answer, they will do so slowly. Filtering on function results simply takes longer than filtering on fields.

When is a virtual column using AS rendered in SQL?

I wasn't sure how to search for the answer:
select orderid, REGEXP_REPLACE (
orderid,
'^0+(.)',
'\1'
) as new_order_id
from orders
where length('new_order_id') < 6
This returns nothing. But I know the data is there. If I do:
select orderid, REGEXP_REPLACE (
orderid,
'^0+(.)',
'\1'
) as new_order_id
from orders
order by order_id asc
I get order ids like 1, 2, 3...
So how can I get back the ones that are shorter than six characters? Does the WHERE not operate on my returned regexp_replace data after the dataset is returned? This is Oracle, if it matters.
Also, I believe my query is knocking out all leading zeros and replacing it with nothing. Not sure what the \1 means. Yes, I copied it. I thought it was putting nothing there, which is what I want. Just truncate leading zeros.
Thanks.
In your query,
where length('new_order_id') < 6
compares the length of the literal string 'new_order_id', not the value of the field new_order_id.
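Removing the quotes alone is not enough in Oracle, because a column alias cannot be referenced in the WHERE clause of the same query level. Repeat the expression instead (or wrap the query in a subquery):
where length(regexp_replace(orderid, '^0+(.)', '\1')) < 6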
Try this:
select * from
(select orderid
, regexp_replace(orderid,'^0+(.)','\1') new_order_id
from orders)
where length(new_order_id) < 6;
You can avoid using regexp:
select orderid
, ltrim(orderid,'0') new_order_id
from orders
where length(ltrim(orderid,'0'))<6
order by 1;
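Two of the points made in these answers can be checked with a quick experiment. The sketch below uses an in-memory SQLite database from Python as a stand-in for Oracle (an assumption; only length() and the two-argument ltrim() are exercised, which SQLite also provides). The sample order ids and the helper name check_order_filters are invented.

```python
import sqlite3

def check_order_filters(order_ids):
    """Return the length of the literal 'new_order_id' and the rows whose
    zero-stripped id is shorter than 6 characters."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (orderid TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(o,) for o in order_ids])
    # The quoted version measures the literal text, not the column value.
    literal_length = conn.execute("SELECT length('new_order_id')").fetchone()[0]
    short_ids = conn.execute(
        "SELECT orderid, ltrim(orderid, '0') AS new_order_id "
        "FROM orders WHERE length(ltrim(orderid, '0')) < 6 ORDER BY 1"
    ).fetchall()
    conn.close()
    return literal_length, short_ids

literal_length, short_ids = check_order_filters(["000123", "0001234567"])
print(literal_length)  # 12, so "length('new_order_id') < 6" can never be true
print(short_ids)
```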
The length of the string 'new_order_id' is never less than 6. You will probably have to use length(regexp_replace(...)) < 6 instead if Oracle doesn't support using the output column name without quotes (I have no idea).

Finding rows that don't contain numeric data in Oracle

I am trying to locate some problematic records in a very large Oracle table. The column should contain all numeric data even though it is a varchar2 column. I need to find the records which don't contain numeric data (The to_number(col_name) function throws an error when I try to call it on this column).
I was thinking you could use a regexp_like condition and use the regular expression to find any non-numerics. I hope this might help?!
SELECT * FROM table_with_column_to_search WHERE REGEXP_LIKE(varchar_col_with_non_numerics, '[^0-9]+');
To get an indicator:
DECODE(TRANSLATE(your_number, ' 0123456789', ' '), NULL, 'number', 'contains char')
e.g.
SQL> select DECODE( TRANSLATE('12345zzz_not_numberee',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"contains char"
and
SQL> select DECODE( TRANSLATE('12345',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
and
SQL> select DECODE( TRANSLATE('123405',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
Oracle 11g has regular expressions, so you could use this to get only the values that are actual numbers:
SQL> SELECT colA
2 FROM t1
3 WHERE REGEXP_LIKE(colA, '^[[:digit:]]+$');
COLA
----------
47845
48543
12
...
If there is a non-numeric value like '23g' it will just be ignored.
In contrast to SGB's answer, I prefer writing a regexp that defines the actual format of my data and negating that. This allows me to define values like $DDD,DDD,DDD.DD
In the OP's simple scenario, it would look like
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^[0-9]+$');
which finds all values that are not non-negative integers. If you want to accept negative integers as well, it's an easy change: just add an optional leading minus.
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+$');
Accepting decimal numbers as well:
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+(\.[0-9]+)?$');
The same goes for any other format. You will generally already have the formats used to validate input data, so when you want to find data that does not match a format, it's simpler to negate that format than to come up with another one; with SGB's approach that would be a bit tricky if you want more than just positive integers.
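The "describe the valid format, then negate it" approach can be tried out in a few lines of Python. This is a sketch under the assumption that the same pattern is acceptable to Python's re module (it is plain enough to be); the sample values and the name non_matching are made up.

```python
import re

# Optional minus, digits, optionally a dot followed by more digits.
NUMBER_FORMAT = re.compile(r"-?[0-9]+(\.[0-9]+)?")

def non_matching(values):
    """Return the values that do NOT match the expected number format."""
    return [v for v in values if NUMBER_FORMAT.fullmatch(v) is None]

# fullmatch plays the role of the ^...$ anchors in the SQL pattern.
print(non_matching(["47845", "-12", "3.14", "23g", "1,23", "12."]))
```

Changing what counts as valid only means editing the one pattern; the negation stays the same.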
Use this
SELECT *
FROM TableToSearch
WHERE NOT REGEXP_LIKE(ColumnToSearch, '^-?[0-9]+(\.[0-9]+)?$');
After doing some testing, I came up with this solution; let me know whether it helps.
Add the two conditions below to your query and it will find the records which don't contain numeric data:
and REGEXP_LIKE(<column_name>, '\D') -- this selects non numeric data
and not REGEXP_LIKE(column_name,'^[-]{1}\d{1}') -- this filters out negative(-) values
Starting with Oracle 12.2, the function to_number has an optional ON CONVERSION ERROR clause that can catch the exception and provide a default value.
This can be used to test for numeric values. Simply return NULL when the conversion fails and filter for all non-NULL results.
Example
with num as (
select '123' vc_col from dual union all
select '1,23' from dual union all
select 'RV12P2000' from dual union all
select null from dual)
select
vc_col
from num
where /* filter numbers */
vc_col is not null and
to_number(vc_col DEFAULT NULL ON CONVERSION ERROR) is not null
;
VC_COL
---------
123
1,23
From http://www.dba-oracle.com/t_isnumeric.htm
LENGTH(TRIM(TRANSLATE(col_name, ' +-.0123456789', ' '))) is null
If there is anything left in the string after the TRIM it must be non-numeric characters.
I've found this useful:
select translate('your string','_0123456789','_') from dual
If the result is NULL, it's numeric (ignoring floating point numbers.)
However, I'm a bit baffled why the underscore is needed. Without it the following also returns null:
select translate('s123','0123456789', '') from dual
There is also one of my favorite tricks - not perfect if the string contains stuff like "*" or "#":
SELECT 'is a number' FROM dual WHERE UPPER('123') = LOWER('123')
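The weakness mentioned above is easy to show. Here is the same comparison as a tiny Python sketch (the function name looks_numeric_by_case is hypothetical): anything without letters passes, not just digits.

```python
def looks_numeric_by_case(value):
    """True when upper- and lower-casing give the same string, i.e. no letters."""
    return value.upper() == value.lower()

print(looks_numeric_by_case("123"))  # True
print(looks_numeric_by_case("12a"))  # False
print(looks_numeric_by_case("1*2"))  # True, although '1*2' is not a number
```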
After doing some testing, building upon the suggestions in the previous answers, there seem to be two usable solutions.
Method 1 is fastest, but less powerful in terms of matching more complex patterns.
Method 2 is more flexible, but slower.
Method 1 - fastest
I've tested this method on a table with 1 million rows.
It seems to be 3.8 times faster than the regex solutions.
The 0-replacement solves the issue that 0 is mapped to a space, and does not seem to slow down the query.
SELECT *
FROM <table>
WHERE TRANSLATE(replace(<char_column>,'0',''),'0123456789',' ') IS NOT NULL;
Method 2 - slower, but more flexible
I've compared the speed of putting the negation inside or outside the regex statement. Both are equally slower than the translate-solution. As a result, #ciuly's approach seems most sensible when using regex.
SELECT *
FROM <table>
WHERE NOT REGEXP_LIKE(<char_column>, '^[0-9]+$');
You can use this one check:
create or replace function to_n(c varchar2) return number is
begin return to_number(c);
exception when others then return -123456;
end;
select id, n from t where to_n(n) = -123456;
I tried an ORDER BY on the problematic column, and that is how I found the rows in question.
SELECT
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND,
D.COL1 AS COL1
FROM
VW_DATA_ALL_GC D
WHERE
(D.PERIOADA IN (:pPERIOADA)) AND
(D.FORM = 62)
AND D.COL1 IS NOT NULL
-- AND REGEXP_LIKE (D.COL1, '[[:alpha:]]')
-- AND REGEXP_LIKE(D.COL1, '[[:digit:]]')
--AND REGEXP_LIKE(TO_CHAR(D.COL1), '[^0-9]+')
GROUP BY
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND ,
D.COL1
ORDER BY
D.COL1