Order varchar string as numeric - sql

Is it possible to order result rows by a varchar column cast to integer in Postgres 8.3?

It's absolutely possible.
ORDER BY varchar_column::int
Be sure to have valid integer literals in your varchar column for each entry, or you get the exception "invalid input syntax for integer". Leading and trailing white space is OK; that is trimmed automatically.
If that's the case, though, then why not convert the column to integer to begin with? Smaller, faster, cleaner, simpler.
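To see why the cast matters for ordering, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for Postgres, so the cast is spelled CAST(... AS INTEGER) rather than ::int, and the table tbl and its rows are made up for illustration. One difference to keep in mind: SQLite does not raise an error for invalid input the way Postgres does.

```python
import sqlite3

# SQLite is only a stand-in for Postgres; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (varchar_column TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("10",), ("9",), ("100",), ("2",)])

# Plain ORDER BY compares the strings character by character.
as_text = [r[0] for r in conn.execute(
    "SELECT varchar_column FROM tbl ORDER BY varchar_column")]

# Casting in ORDER BY compares the numeric values instead.
as_int = [r[0] for r in conn.execute(
    "SELECT varchar_column FROM tbl ORDER BY CAST(varchar_column AS INTEGER)")]

print(as_text)  # ['10', '100', '2', '9']
print(as_int)   # ['2', '9', '10', '100']
```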
How to avoid exceptions?
To remove non-digit characters before the cast and thereby avoid possible exceptions:
ORDER BY NULLIF(regexp_replace(varchar_column, '\D', '', 'g'), '')::int
The regexp_replace() expression effectively removes all non-digits, so only digits remain or an empty string. (See below.)
\D is shorthand for the character class [^[:digit:]], meaning all non-digits ([^0-9]).
In old Postgres versions with the outdated setting standard_conforming_strings = off, you have to use Posix escape string syntax E'\\D' to escape the backslash \. This was default in Postgres 8.3, so you'll need that for your outdated version.
The 4th parameter g is for "globally", instructing to replace all occurrences, not just the first.
You may want to allow a leading dash (-) for negative numbers.
If the string has no digits at all, the result is an empty string, which is not valid for a cast to integer. Convert empty strings to NULL with NULLIF. (You might consider 0 instead.)
The result is guaranteed to be valid. This procedure is for a cast to integer as requested in the body of the question, not for numeric as the title mentions.
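The logic of the NULLIF(regexp_replace(...), '')::int expression can be sketched outside the database. The Python function below is only an emulation for illustration; the name digits_to_int is hypothetical, and None plays the role of SQL NULL.

```python
import re
from typing import Optional

def digits_to_int(value: str) -> Optional[int]:
    # Like regexp_replace(value, '\D', '', 'g'): remove every non-digit.
    digits = re.sub(r"[^0-9]", "", value)
    # Like NULLIF(..., ''): an empty result becomes NULL (None) instead of failing the cast.
    return int(digits) if digits else None

print(digits_to_int("abc123def"))  # 123
print(digits_to_int("12-34"))      # 1234
print(digits_to_int("no digits"))  # None
```

Note that "12-34" becomes 1234: all digits are kept and glued together, which may or may not be what you want for your data.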
How to make it fast?
One way is an index on an expression.
CREATE INDEX tbl_varchar_col2int_idx ON tbl
(cast(NULLIF(regexp_replace(varchar_column, '\D', '', 'g'), '') AS integer));
Then use the same expression in the ORDER BY clause:
ORDER BY
cast(NULLIF(regexp_replace(varchar_column, '\D', '', 'g'), '') AS integer)
Test with EXPLAIN ANALYZE whether the functional index actually gets used.
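The same idea, an index on the expression plus an identical expression in ORDER BY, can be tried out quickly with Python's sqlite3 module. This is a sketch under assumptions: SQLite replaces Postgres, the expression is simplified to a plain cast (SQLite has no built-in regexp_replace), and EXPLAIN QUERY PLAN takes the place of EXPLAIN ANALYZE. Whether the planner actually picks the index depends on the engine and the data, so the plan is printed rather than assumed.

```python
import sqlite3

# SQLite stand-in; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (varchar_column TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("10",), ("9",), ("100",), ("2",)])

# Index on the expression, not on the raw column.
conn.execute(
    "CREATE INDEX tbl_varchar_col2int_idx "
    "ON tbl (CAST(varchar_column AS INTEGER))")

# The ORDER BY must repeat the indexed expression exactly.
query = "SELECT varchar_column FROM tbl ORDER BY CAST(varchar_column AS INTEGER)"

# Inspect the plan to check whether the index is mentioned.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print(plan)

ordered = [r[0] for r in conn.execute(query)]
print(ordered)  # ['2', '9', '10', '100']
```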

Also in case you want to order by a text column that has something convertible to float, then this does it:
select *
from your_table
order by cast(your_text_column as double precision) desc;

Related

Remove template text on regexp_replace in Oracle's SQL

I am trying to remove template text like &#x; or &#xx; or &#xxx; from a long string.
Note: x / xx / xxx is a number whose length is unknown. The cell type is CLOB.
for example:
SELECT 'H&#39;ello wor&#177;ld' FROM dual
A desirable result:
Hello world
I know that regexp_replace should be used, but how do you use this function to remove this text?
You can use
SELECT REGEXP_REPLACE(col,'&&#\d+;')
FROM t
where
& is put twice to provide escaping for the substitution character
\d represents digits and the following + provides the multiple occurrences of them
ending the pattern with ;
Alternatively, use a single ampersand ('&#\d+;') for the pattern, as in the demo. Because the ampersand has a special meaning for Oracle, using it needs some care.
In case you wanted to remove the entities because you don't know how to replace them by their character values, here is a solution:
UTL_I18N.UNESCAPE_REFERENCE( xmlquery( 'the_double_quoted_original_string' RETURNING content).getStringVal() )
In other words, the original 'H&#39;ello wor&#177;ld' should be passed to XMLQUERY as '"H&#39;ello wor&#177;ld"'.
And the result will be 'H'ello wor±ld'
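The two approaches above, removing the entities versus decoding them, can be compared in a few lines of Python. This is an illustration outside Oracle, assuming the sample string contains the numeric entities &#39; and &#177;; re.sub uses the same pattern as the REGEXP_REPLACE answer, and html.unescape stands in for the decoding that UTL_I18N.UNESCAPE_REFERENCE performs.

```python
import html
import re

raw = "H&#39;ello wor&#177;ld"

# Remove every numeric entity of the form &#<digits>;
removed = re.sub(r"&#\d+;", "", raw)

# Decode the entities into the characters they stand for.
decoded = html.unescape(raw)

print(removed)  # Hello world
print(decoded)  # H'ello wor±ld
```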

How to search by SQL while doing "a cut of trailing zeros" on a number field?

I have a db table in oracle where I have a column defined as a number.
The columns contains numbers like:
MyColumn
12540000000
78590000000
I want to find the records by searching MyColumn=12540000000 as well as MyColumn=1254 (without trailing zeros).
What could I try? TO_CHAR and a cutting logic or is there something more simple?
rtrim(MyColumn, '0') = '1254'
Note that on the right I enclosed the string within quotes (so it is really seen as a string, not a number). Apparently you are treating these as strings, right? Even if MyColumn is a number, it will be implicitly converted to a string before applying rtrim.
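A quick way to try the rtrim comparison is Python's sqlite3 module, since SQLite also has a two-argument rtrim and also converts the number to text implicitly. This is a sketch with SQLite standing in for Oracle; the table t and the two extra rows are made up to show a side effect.

```python
import sqlite3

# SQLite stand-in for Oracle; table name and extra rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (MyColumn INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(12540000000,), (78590000000,), (12540,), (125400001,)])

# The number is converted to a string, trailing zeros are stripped,
# and the remainder is compared with the string '1254'.
rows = [r[0] for r in conn.execute(
    "SELECT MyColumn FROM t WHERE rtrim(MyColumn, '0') = '1254' ORDER BY MyColumn")]

print(rows)  # [12540, 12540000000]
```

Every value that differs from 1254 only by trailing zeros matches, so 12540 is found as well; zeros inside the number (125400001) are left alone.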

try to concatenate 2 strings, result ends in a lot of spaces

select CONCAT(convert(char, 123), 'sda');
Or
select convert(char, 123) + 'sda'
Or
select ltrim(convert(char, 123) + 'sda')
Output is:
How can I get the output without those spaces?
The problem here is twofold. Firstly, you are converting to a char, which is a fixed-width data type, and secondly, you aren't defining the length of your char, so the default length is used. For CAST and CONVERT that's a char(30).
So what you start with is convert(char, 123). This converts the int 123 to the fixed-width string '123                           ' (123 followed by 27 spaces). Then you concatenate the varchar(3) value 'sda' to that, resulting in '123                           sda'. This is working exactly as written, but clearly not as you intend.
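The padding effect is easy to reproduce outside SQL Server. The Python sketch below only emulates the behavior described above: the helper to_char is hypothetical, it pads on the right to a default width of 30, and it ignores how SQL Server handles values that are too long for the target length.

```python
def to_char(value, n=30):
    # Emulates CONVERT(char(n), value): right-pad with spaces to exactly n characters.
    return str(value).ljust(n)

padded = to_char(123) + "sda"
print(repr(padded))   # '123' followed by 27 spaces, then 'sda'
print(len(padded))    # 33

# The spaces sit in the middle of the result, so trimming the left side does nothing.
print(padded.lstrip() == padded)  # True

# A variable-length conversion adds no padding at all.
print(str(123) + "sda")  # 123sda
```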
The obvious fix would be to use a varchar and define a length, such as CONCAT(CONVERT(varchar(5),123),'sda') which would return '123sda', however, all of the CONCAT function's parameters are a string type:
string_value
A string value to concatenate to the other values. The CONCAT function requires at least two string_value arguments, and no more than 254 string_value arguments.
This means you can simply just pass the value 123 and it'll be implicitly cast to a string type: CONCAT(123,'sda').
To reiterate my comment's link too: Bad Habits to Kick : Declaring VARCHAR without (length)
You are using char while you probably want [n]varchar(...): the former pads the string with white spaces, while the latter does not:
concat(convert(varchar(10), 123), 'sda');
But simpler yet: concat() forces the conversion of its arguments to the correct datatype by default, so this should do it:
concat(123, 'sda')
First, in SQL Server, never use char or related string definitions without a length. SQL Server requires a length and the default depends on the context. If you depend on the default length, your code has a bug just waiting to happen.
Second, char is almost never what you want. It is a fixed length string, with shorter strings padded with spaces.
If you want an explicit conversion use varchar, variable length strings:
select convert(varchar(255), 123) + 'sda'
Or dispense with the explicit conversion and use concat():
select concat(123, 'sda')
The others have already pointed out the root cause of the issue. If you cannot change the data type, you can always use SELECT CONCAT(TRIM(CONVERT(char,123)),'sda'). However, it is highly recommended to either use varchar(n) or give char an explicit length, as it is rather pointless to create a fixed-length string and then shorten it with TRIM. varchar(30) would fit perfectly here: the length still cannot exceed 30 characters, but shorter strings do not use all of it.
Let's refer to the Microsoft docs:
When n isn't specified in a data definition or variable declaration statement, the default length is 1. If n isn't specified when using the CAST and CONVERT functions, the default length is 30.
Reference: https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver15#remarks
So, You have Convert(char, 123), and you did not specify the n for char, so your code is equal to Convert(char(30), 123).
Now it is clear why you have many space characters. To resolve the problem, simply use a variable-length character data type such as varchar instead. However, I recommend that you always declare character data types with a length. (Same as what @GordonLinoff posted: https://stackoverflow.com/a/63467483/1666800)
select convert(varchar, 123) + 'sda'

Get only Number

How can I ignore special characters and get only the number from the input string below?
Input: '33-01-616-000'
Output should be 3301616000
Use the REPLACE() function to remove the - characters.
REPLACE(columnname, '-', '')
Or if there can be other non-numeric characters, you can use REGEXP_REPLACE() to remove anything that isn't a number.
REGEXP_REPLACE(columnname, '\D', '')
Standard string functions (like REPLACE, TRANSLATE etc.) are often much faster (one order of magnitude faster) than their regular expression counterparts. Of course, this is only important if you have a lot of data to process, and/or if you don't have that much data but you must process it very frequently.
Here is one way to use TRANSLATE for this problem even if you don't know ahead of time what other characters there may be in the string - besides digits:
TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
This will map 0 to 0, 1 to 1, etc. - and all other characters in the input string columnname to nothing (so they will be simply removed). Note that in the TRANSLATE mapping, only the first occurrence of a character in the second argument matters - any additional mapping (due to the appearance of the same character in the second argument more than once) is ignored.
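To make the mapping rules concrete, here is a small Python emulation of the TRANSLATE behavior described above. It is a sketch, not Oracle's implementation: the function names oracle_translate and digits_only are hypothetical, and only the rules stated in the answer are modeled (first occurrence wins, characters without a counterpart are removed).

```python
def oracle_translate(expr: str, from_chars: str, to_chars: str) -> str:
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch not in mapping:  # only the first occurrence of a character counts
            # Characters beyond the end of to_chars have no counterpart: mark for removal.
            mapping[ch] = to_chars[i] if i < len(to_chars) else None
    out = []
    for ch in expr:
        if ch not in mapping:
            out.append(ch)           # unmapped characters pass through unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # mapped characters are replaced
        # characters mapped to None are dropped
    return "".join(out)

def digits_only(s: str) -> str:
    # Same trick as TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
    return oracle_translate(s, "0123456789" + s, "0123456789")

print(digits_only("33-01-616-000"))  # 3301616000
```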
You can also use the REGEXP_REPLACE function. Try the code below:
SELECT REGEXP_REPLACE('33-01-61ASDF6-0**(98)00[],./123', '([^[:digit:]])', NULL)
FROM DUAL;
SELECT regexp_replace('33-01-616-000','[^0-9]') digits_only FROM dual;
/

CAST TEXT as INTEGER

I have a CHAR column that contains messy OCR'd scan of printed integers.
I need to do SUM() operators on that column. But I'm unable to cast properly.
;Good
sqlite> select CAST("123" as integer);
123
;No Good, should be '323999'
sqlite> select CAST("323,999" as integer);
323
I believe SQLite interprets the comma as marking the end of "the longest possible prefix of the value that can be interpreted as an integer number".
I prefer to avoid the agony of writing Python scripts to do data cleaning on this column. Is there any clever way to do it strictly with SQL?
If you are trying to ignore commas, then remove them before the conversion:
select cast(replace('323,999', ',', '') as integer)
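Applied to the SUM() from the question, the fix might look like the following sketch. It runs the SQL through Python's sqlite3 module only to make the example reproducible; the table scans and its rows are made up.

```python
import sqlite3

# Hypothetical table holding OCR'd integers as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (amount TEXT)")
conn.executemany(
    "INSERT INTO scans VALUES (?)",
    [("123",), ("323,999",), ("1,000,000",)])

# Without cleaning, each cast stops at the first comma: 123 + 323 + 1.
naive = conn.execute(
    "SELECT SUM(CAST(amount AS INTEGER)) FROM scans").fetchone()[0]

# Removing the commas first gives the intended values: 123 + 323999 + 1000000.
cleaned = conn.execute(
    "SELECT SUM(CAST(replace(amount, ',', '') AS INTEGER)) FROM scans").fetchone()[0]

print(naive)    # 447
print(cleaned)  # 1324122
```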