Procedure to apply formatting to all rows in a table - sql

I had a SQL procedure that increments through each row and and pads some trailing zeros on values depending on the length of the value after a decimal point. Trying to carry this over to a PSQL environment I realized there was a lot of syntax differences between SQL and PSQL. I managed to make the conversion over time but I am still getting a syntax error and cant figure out why. Can someone help me figure out why this wont run? I am currently running it in PGadmin if that makes any difference.
DO $$
DECLARE
counter integer;
before decimal;
after decimal;
BEGIN
counter := 1;
WHILE counter <> 2 LOOP
before = (select code from table where ID = counter);
after = (SELECT SUBSTRING(code, CHARINDEX('.', code) + 1, LEN(code)) as Afterward from table where ID = counter);
IF before = after
THEN
update table set code = before + '.0000' where ID = counter;
ELSE
IF length(after) = 1 THEN
update table set code = before + '000' where ID = counter;
ELSE IF length(after) = 2 THEN
update table set code = before + '00' where ID = counter;
ELSE IF length(after) = 3 THEN
update table set code = before + '0' where ID = counter;
ELSE
select before;
END IF;
END IF;
counter := counter + 1;
END LOOP
END $$;
Some examples of the input/output of the intended result:
Input 55.5 > Output 55.5000
Input 55 > Output 55.0000
Thanks for your help,
Justin

There is no need for a function or even an update on the table to format values when displaying them.
Assuming the values are in fact numbers stored in a decimal or float column, all you need to do is to apply the to_char() function when retrieving them:
select to_char(code, 'FM999999990.0000')
from data;
This will output 55.5000 or 55.0000
The drawback of the to_char() function is that you need to anticipate the maximum number of digits of that can occur. If you have not enough 9 in the format mask, the output will be something like #.###. But as too many digits in the format mask don't hurt, I usually throw a lot into the format mask.
For more information on formatting functions, please see the manual: https://www.postgresql.org/docs/current/static/functions-formatting.html#FUNCTIONS-FORMATTING-NUMERIC-TABLE
If you insist on storing formatted data, you can use to_char() to update the table:
update the_table
set code = to_char(code::numeric, 'FM999999990.0000');
Casting the value to a number will of course fail if there a non-numeric values in the column.
But again: I strong recommend to store numbers as numbers, not as strings.
If you want to compare this to a user input, it's better to convert the user input to a proper number and compare that to the (number) values stored in the database.
The string matching that you are after doesn't actually require a function either. Using substring() with a regex will do that:
update the_table
set code = code || case length(coalesce(substring(code from '\.[0-9]*$'), ''))
when 4 then '0'
when 3 then '00'
when 2 then '000'
when 1 then '0000'
when 0 then '.0000'
else ''
end
where length(coalesce(substring(code from '\.[0-9]*$'), '')) < 5;
substring(code from '\.[0-9]*$') extracts everything the . followed by numbers that is at the end of the string. So for 55.0 it returns .0 for 55.50 it returns .50 if there is no . in the value, then it returns null that's why the coalesce is needed.
The length of that substring tells us how many digits are present. Depending on that we can then append the necessary number of zeros. The case can be shortened so that not all possible length have to be listed (but it's not simpler):
update the_table
set code = code || case length(coalesce(substring(code from '\.[0-9]*$'), ''))
when 0 then '.0000'
else lpad('0', 5- length(coalesce(substring(code from '\.[0-9]*$'), '')), '0')
end
where length(coalesce(substring(code from '\.[0-9]*$'), '')) < 5;
Another option is to use the position of the . inside the string to calculate the number of 0 that need to be added:
update the_table
set code =
code || case
when strpos(code, '.') = 0 then '0000'
else rpad('0', 4 - (length(code) - strpos(code, '.')), '0')
end
where length(code) - strpos(code, '.') < 4;
Regular expressions are quite expensive not using them will make this faster. The above will however only work if there is always at most one . in the value.
But if you can be sure that every value can be cast to a number, the to_char() method with a cast is definitely the most robust one.
To only process rows where the code columns contains correct numbers, you can use a where clause in the SQL statement:
where code ~ '^[0-9]+(\.[0-9][0-9]?)?$'

To change the column type to numeric:
alter table t alter column code type numeric

Related

How to ignore specific string value when using pattern and patindex function in SQL Server Query?

I have this query here.
WITH Cte_Reverse
AS (
SELECT CASE PATINDEX('%[^0-9.- ]%', REVERSE(EmailName))
WHEN 0
THEN REVERSE(EmailName)
ELSE left(REVERSE(EmailName), PATINDEX('%[^0-9.- ]%', REVERSE(EmailName)) - 1)
END AS Platform_Campaign_ID,
EmailName
FROM [Arrakis].[xtemp].[Stage_SendJobs_Marketing]
)
SELECT REVERSE(Platform_Campaign_ID) AS Platform_Campaign_ID, EmailName
FROM Cte_Reverse
WHERE REVERSE(Platform_Campaign_ID) <> '2020'
AND REVERSE(Platform_Campaign_ID) <> ''
AND LEN(REVERSE(Platform_Campaign_ID)) = 4;
It is working for the most part, below is a screenshot of the result set.
The query I posted above extracts the 4 numbers to the right out of the initial value that is set for the column I am extracting out of. But I am unable to figure out how I can also have the query ignore cases when the right most value is -v2, -v1, etc. essentially anything with -v and whatever number version it is.
If you want four digits, then one method is:
select substring(emailname, patindex('%[0-9][0-9][0-9][0-9]%', emailname), 4)

How can I generate ID with Prefix, Numeric Number and suffix?

I want to generate an ID in MSSQL Server 2008. Which will be Prefix + Numeric Number + suffix Like 'PV#000001#SV'. Which will be user defined (depends on configuration ) prefix, numeric length, suffix and starting number. Numeric number will be increased every time.
I tied to write this :
Blockquote
ALTER PROCEDURE [dbo].[spACC_SELECT_VOUCHER_NUMBER]
#COMPANY_ID uniqueidentifier,
#VOUCHER_TYPE INT
AS BEGIN
DECLARE #IS_AUTOMETIC BIT = (SELECT VOUCHER_CONFIG_NUMBERING_METHOD
FROM ACC_VOUCHER_CONFIG WHERE
ACC_VOUCHER_CONFIG.VOUCHER_CONFIG_VALUE=#VOUCHER_TYPE )
IF(#IS_AUTOMETIC=1)
BEGIN
SELECT CASE WHEN SUBSTRING(V.VOUCHER_CODE, 7, 23) IS NULL
THEN CASE WHEN VC.VOUCHER_CONFIG_PREFIX IS NULL THEN '' ELSE VC.VOUCHER_CONFIG_PREFIX END +
RIGHT ('0000000000000'+ CAST( VC.VOUCHER_CONFIG_BEGINING_NUMBER AS VARCHAR), VC.VOUCHER_CONFIG_NUMERIC_WIDTH) +
CASE WHEN VC.VOUCHER_CONFIG_SUFFIX IS NULL THEN '' ELSE VC.VOUCHER_CONFIG_SUFFIX END
ELSE CASE WHEN VC.VOUCHER_CONFIG_PREFIX IS NULL THEN '' ELSE VC.VOUCHER_CONFIG_PREFIX END +
RIGHT ('0000000000000'+ CAST((CAST( SUBSTRING(V.VOUCHER_CODE, 7, 23) AS INT)+1) AS VARCHAR), VC.VOUCHER_CONFIG_NUMERIC_WIDTH) +
CASE WHEN VC.VOUCHER_CONFIG_SUFFIX IS NULL THEN '' ELSE VC.VOUCHER_CONFIG_SUFFIX END
END AS VOUCHER_CODE FROM ACC_VOUCHER_CONFIG VC
LEFT OUTER JOIN ACC_VOUCHER V ON VC.VOUCHER_CONFIG_VALUE = V.VOUCHER_TYPE
WHERE VC.COMPANY_ID=#COMPANY_ID AND VC.VOUCHER_CONFIG_VALUE=#VOUCHER_TYPE
END
END
When I change the numeric length / suffix its not working.
Thanks
Nahid
For the six-digit number you're struggling with, add leading zeroes like this:
SELECT RIGHT('00000'+ CONVERT(VARCHAR,Num),6) AS NUM FROM your_table
Where Num is your sequential number.
This prepends 5 zeroes and then takes the right 6 characters from the resulting string.
A more detailed writeup of custom ID generation is here:
http://www.sqlteam.com/article/custom-auto-generated-sequences-with-sql-server
My suggestion would be to store just a number in the database (i.e. an int) and format the ID client side with tools that are better suited for it (i.e. a programming language that has sprintf or equivalent string formatting).

How can I determine if a string is numeric in SQL?

In a SQL query on Oracle 10g, I need to determine whether a string is numeric or not. How can I do this?
You can use REGEXP_LIKE:
SELECT 1 FROM DUAL
WHERE REGEXP_LIKE('23.9', '^\d+(\.\d+)?$', '')
You ca try this:
SELECT LENGTH(TRIM(TRANSLATE(string1, ' +-.0123456789', ' '))) FROM DUAL
where string1 is what you're evaluating. It will return null if numeric. Look here for further clarification
I don't have access to a 10G instance for testing, but this works in 9i:
CREATE OR REPLACE FUNCTION is_numeric (p_val VARCHAR2)
RETURN NUMBER
IS
v_val NUMBER;
BEGIN
BEGIN
IF p_val IS NULL OR TRIM (p_val) = ''
THEN
RETURN 0;
END IF;
SELECT TO_NUMBER (p_val)
INTO v_val
FROM DUAL;
RETURN 1;
EXCEPTION
WHEN OTHERS
THEN
RETURN 0;
END;
END;
SELECT is_numeric ('333.5') is_numeric
FROM DUAL;
I have assumed you want nulls/empties treated as FALSE.
As pointed out by Tom Kyte in http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:7466996200346537833, if you're using the built-in TO_NUMBER in a user defined function, you may need a bit of extra trickery to make it work.
FUNCTION is_number(x IN VARCHAR2)
RETURN NUMBER
IS
PROCEDURE check_number (y IN NUMBER)
IS
BEGIN
NULL;
END;
BEGIN
PRAGMA INLINE(check_number, 'No');
check_number(TO_NUMBER(x);
RETURN 1;
EXCEPTION
WHEN INVALID_NUMBER
THEN RETURN 0;
END is_number;
The problem is that the optimizing compiler may recognize that the result of the TO_NUMBER is not used anywhere and optimize it away.
Says Tom (his example was about dates rather then numbers):
the disabling of function inlining will make it do the call to
check_date HAS to be made as a function call - making it so that the
DATE has to be pushed onto the call stack. There is no chance for the
optimizing compiler to remove the call to to_date in this case. If the
call to to_date needed for the call to check_date fails for any
reason, we know that the string input was not convertible by that date
format.
Here is a method to determine numeric that can be part of a simple query, without creating a function. Accounts for embedded spaces, +- not the first character, or a second decimal point.
var v_test varchar2(20);
EXEC :v_test := ' -24.9 ';
select
(case when trim(:v_test) is null then 'N' ELSE -- only banks, or null
(case when instr(trim(:v_test),'+',2,1) > 0 then 'N' ELSE -- + sign not first char
(case when instr(trim(:v_test),'-',2,1) > 0 then 'N' ELSE -- - sign not first char
(case when instr(trim(:v_test),' ',1,1) > 0 then 'N' ELSE -- internal spaces
(case when instr(trim(:v_test),'.',1,2) > 0 then 'N' ELSE -- second decimal point
(case when LENGTH(TRIM(TRANSLATE(:v_test, ' +-.0123456789',' '))) is not null then 'N' ELSE -- only valid numeric charcters.
'Y'
END)END)END)END)END)END) as is_numeric
from dual;
I found that the solution
LENGTH(TRIM(TRANSLATE(string1, ' +-.0123456789', ' '))) is null
allows embedded blanks ... it accepts "123 45 6789" which for my purpose is not a number.
Another level of trim/translate corrects this. The following will detect a string field containing consecutive digits with leading or trailing blanks such that to_number(trim(string1)) will not fail
LENGTH(TRIM(TRANSLATE(translate(trim(string1),' ','X'), '0123456789', ' '))) is null
For integers you can use the below. The first translate changes spaces to be a character and the second changes numbers to be spaces. The Trim will then return null if only numbers exist.
TRIM(TRANSLATE(TRANSLATE(TRIM('1 2 3d 4'), ' ','#'),'0123456789',' ')) is null

SQL Server, where field is int?

how can I accomplish:
select * from table where column_value is int
I know I can probably inner join to the system tables and type tables but I'm wondering if there's a more elegant way.
Note that column_value is a varchar that "could" have an int, but not necessarily.
Maybe I can just cast it and trap the error? But again, that seems like a hack.
select * from table
where column_value not like '[^0-9]'
If negative ints are allowed, you need something like
where column_value like '[+-]%'
and substring(column_value,patindex('[+-]',substring(column_value,1))+1,len(column_value))
not like '[^0-9]'
You need more code if column_value can be an integer that exceeds the limits of the "int" type, and you want to exclude such cases.
Here if you want to implement your custom function
CREATE Function dbo.IsInteger(#Value VARCHAR(18))
RETURNS BIT
AS
BEGIN
RETURN ISNULL(
(SELECT CASE WHEN CHARINDEX('.', #Value) > 0 THEN
CASE WHEN CONVERT(int, PARSENAME(#Value, 1)) <> 0 THEN 0 ELSE 1 END
ELSE 1
END
WHERE ISNUMERIC(#Value + 'e0') = 1), 0)
END
ISNUMERIC returns 1 when the input
expression evaluates to a valid
integer, floating point number, money
or decimal type; otherwise it returns
0. A return value of 1 guarantees that expression can be converted to one of
these numeric types.
I would do a UDF as Svetlozar Angelov suggests, but I would check for ISNUMERIC first (and return 0 if not), and then check for column_value % 1 = 0 to see if it's an integer.
Here's what the body might look like. You have to put the modulo logic in a separate branch because it will throw an exception if the value isn't numeric.
DECLARE #RV BIT
IF ISNUMERIC(#value) BEGIN
IF CAST(#value AS NUMERIC) % 1 = 0 SET #RV = 1
ELSE SET #RV = 0
END
ELSE SET #RV = 0
RETURN #RV
This should handle all cases without throwing any exceptions:
--This handles dollar-signs, commas, decimal-points, and values too big or small,
-- all while safely returning an int.
DECLARE #IntString as VarChar(50) = '$1,000.'
SELECT CAST((CASE WHEN --This IsNumeric check here does most of the heavy lifting. The rest is Integer-Specific
ISNUMERIC(#IntString) = 1
--Only allow Int-related characters. This will exclude things like 'e' and other foreign currency characters.
AND #IntString NOT LIKE '%[^ $,.\-+0-9]%' ESCAPE '\'--'
--Checks that the value is not out of bounds for an Integer.
AND CAST(REPLACE(REPLACE(#IntString,'$',''),',','') as Decimal(38)) BETWEEN -2147483648 AND 2147483647
--This allows values with decimal-points for count as an Int, so long as there it is not a fractional value.
AND CAST(REPLACE(REPLACE(#IntString,'$',''),',','') as Decimal(38)) = CAST(REPLACE(REPLACE(#IntString,'$',''),',','') as Decimal(38,2))
--This will safely convert values with decimal points to casting later as an Int.
THEN CAST(REPLACE(REPLACE(#IntString,'$',''),',','') as Decimal(10))
END) as Int)[Integer]
Throw this into a Scalar UDF and call it ReturnInt().
If the value comes back as NULL, then it's not an int (so there's your IsInteger() requirement)
If you don't like typing "WHERE ReturnInt(SomeValue) IS NOT NULL", you could throw it into another scalar UDF called IsInt() to call this function and simply return "ReturnInt(SomeValue) IS NOT NULL".
The cool thing is, the UDF can serve double duty by returning the "safely" converted int value.
Just because something can be an int doesn't mean casting it as an int won't throw a huge exception. This takes care of that for you.
Also, I'd avoid the other solutions because this universal approach will handle commas, decimals, dollar signs, and checks the acceptable Int value's range while the other solutions do not - or they require multiple SET operations that prevent you from using the logic in a Scalar-Function for maximum performance.
See the examples below and test them against my code and others:
--Proves that appending "e0" or ".0e0" is NOT a good idea.
select ISNUMERIC('$1' + 'e0')--Returns: 0.
select ISNUMERIC('1,000' + 'e0')--Returns: 0.
select ISNUMERIC('1.0' + '.0e0')--Returns: 0.
--While these are numeric, they WILL break your code
-- if you try to cast them directly as int.
select ISNUMERIC('1,000')--Returns: 1.
select CAST('1,000' as Int)--Will throw exception.
select ISNUMERIC('$1')--Returns: 1.
select CAST('$1' as Int)--Will throw exception.
select ISNUMERIC('10.0')--Returns: 1.
select CAST('10.0' as Int)--Will throw exception.
select ISNUMERIC('9999999999223372036854775807')--Returns: 1. This is why I use Decimal(38) as Decimal defaults to Decimal(18).
select CAST('9999999999223372036854775807' as Int)--Will throw exception.
Update:
I read a comment here that you want to be able to parse a value like '123.' into an Integer. I have updated my code to handle this as well.
Note: This converts "1.0", but returns null on "1.9".
If you want to allow for rounding, then tweak the logic in the "THEN" clause to add Round() like so:
ROUND(CAST(REPLACE(REPLACE(#IntString,'$',''),',','') as Decimal(10)), 0)
You must also remove the "AND" that checks for "decimal-points" to allow for Rounding or Truncation.
Why not use the following and test for 1?
DECLARE #TestValue nvarchar(MAX)
SET #TestValue = '1.04343234e5'
SELECT CASE WHEN ISNUMERIC(#TestValue) = 1
THEN CASE WHEN ROUND(#TestValue,0,1) = #TestValue
THEN 1
ELSE 0
END
ELSE null
END AS Analysis
If you are purely looking to verify a string is all digits and not just CAST-able to INT you can do this terrible, terrible thing:
select LEN(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
'-1.223344556677889900e-1'
,'0','') ,'1','') ,'2','') ,'3','') ,'4','') ,'5','') ,'6','') ,'7','') ,'8','') ,'9','')
)
It returns 0 when the string was empty or pure digits.
To make it a useful check for "poor-man's" Integer you'd have to deal with empty string, and an initial negative sign. And manually make sure it isn't too long for your variety of INTEGER.

Natural Sort in MySQL

Is there an elegant way to have performant, natural sorting in a MySQL database?
For example if I have this data set:
Final Fantasy
Final Fantasy 4
Final Fantasy 10
Final Fantasy 12
Final Fantasy 12: Chains of Promathia
Final Fantasy Adventure
Final Fantasy Origins
Final Fantasy Tactics
Any other elegant solution than to split up the games' names into their components
Title: "Final Fantasy"
Number: "12"
Subtitle: "Chains of Promathia"
to make sure that they come out in the right order? (10 after 4, not before 2).
Doing so is a pain in the a** because every now and then there's another game that breaks that mechanism of parsing the game title (e.g. "Warhammer 40,000", "James Bond 007")
Here is a quick solution:
SELECT alphanumeric,
integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
Just found this:
SELECT names FROM your_table ORDER BY games + 0 ASC
Does a natural sort when the numbers are at the front, might work for middle as well.
Same function as posted by #plalx, but rewritten to MySQL:
DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000))
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE position int;
DECLARE tmp_position int;
SET position = 5000;
SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
IF (position = 5000) THEN RETURN 0; END IF;
RETURN position;
END
;;
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE sortString varchar(4000);
DECLARE numStartIndex int;
DECLARE numEndIndex int;
DECLARE padLength int;
DECLARE totalPadLength int;
DECLARE i int;
DECLARE sameOrderCharsLen int;
SET totalPadLength = 0;
SET instring = TRIM(instring);
SET sortString = instring;
SET numStartIndex = udf_FirstNumberPos(instring);
SET numEndIndex = 0;
SET i = 1;
SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);
WHILE (i <= sameOrderCharsLen) DO
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
SET i = i + 1;
END WHILE;
WHILE (numStartIndex <> 0) DO
SET numStartIndex = numStartIndex + numEndIndex;
SET numEndIndex = numStartIndex;
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
SET numEndIndex = numEndIndex + 1;
END WHILE;
SET numEndIndex = numEndIndex - 1;
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
IF padLength < 0 THEN
SET padLength = 0;
END IF;
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
SET totalPadLength = totalPadLength + padLength;
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
END WHILE;
RETURN sortString;
END
;;
Usage:
SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
I think this is why a lot of things are sorted by release date.
A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title which conforms to a pattern you create for easy sorting or a counter.
I've written this function for MSSQL 2000 a while ago:
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* #author Alexandre Potvin Latreille (plalx)
* #param {nvarchar(4000)} string The formatted string.
* #param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* #param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* #return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
CREATE FUNCTION dbo.udf_NaturalSortFormat(
#string nvarchar(4000),
#numberLength int = 10,
#sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE #sortString varchar(4000),
#numStartIndex int,
#numEndIndex int,
#padLength int,
#totalPadLength int,
#i int,
#sameOrderCharsLen int;
SELECT
#totalPadLength = 0,
#string = RTRIM(LTRIM(#string)),
#sortString = #string,
#numStartIndex = PATINDEX('%[0-9]%', #string),
#numEndIndex = 0,
#i = 1,
#sameOrderCharsLen = LEN(#sameOrderChars);
-- Replace all char that has to have the same order by a space.
WHILE (#i <= #sameOrderCharsLen)
BEGIN
SET #sortString = REPLACE(#sortString, SUBSTRING(#sameOrderChars, #i, 1), ' ');
SET #i = #i + 1;
END
-- Pad numbers with zeros.
WHILE (#numStartIndex <> 0)
BEGIN
SET #numStartIndex = #numStartIndex + #numEndIndex;
SET #numEndIndex = #numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(#string, #numEndIndex, 1)) = 1)
BEGIN
SET #numEndIndex = #numEndIndex + 1;
END
SET #numEndIndex = #numEndIndex - 1;
SET #padLength = #numberLength - (#numEndIndex + 1 - #numStartIndex);
IF #padLength < 0
BEGIN
SET #padLength = 0;
END
SET #sortString = STUFF(
#sortString,
#numStartIndex + #totalPadLength,
0,
REPLICATE('0', #padLength)
);
SET #totalPadLength = #totalPadLength + #padLength;
SET #numStartIndex = PATINDEX('%[0-9]%', RIGHT(#string, LEN(#string) - #numEndIndex));
END
RETURN #sortString;
END
GO
MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data set up as you've described above (separate id field, etc), or failing that, perform a sort based on a non-title element, indexed element in your db (date, inserted id in the db, etc).
Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting it there, so if you've any control at all over the db schema here, then look at adding easily-sorted fields as described above, it'll save you a lot of hassle and maintenance in the long run.
Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.
SELECT * FROM table ORDER BY CAST(mid(name, 6, LENGTH(c) -5) AS unsigned)
This sort of solution could just about be made to work on your Final Fantasy example above, but isn't particularly flexible and unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007" I'm afraid.
So, while I know that you have found a satisfactory answer, I was struggling with this problem for awhile, and we'd previously determined that it could not be done reasonably well in SQL and we were going to have to use javascript on a JSON array.
Here's how I solved it just using SQL. Hopefully this is helpful for others:
I had data such as:
Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2
I actually didn't "cast" things though I suppose that may also have worked.
I first replaced the parts that were unchanging in the data, in this case "Scene ", and then did a LPAD to line things up. This seems to allow pretty well for the alpha strings to sort properly as well as the numbered ones.
My ORDER BY clause looks like:
ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')
Obviously this doesn't help with the original problem which was not so uniform - but I imagine this would probably work for many other related problems, so putting it out there.
Add a Sort Key (Rank) in your table. ORDER BY rank
Utilise the "Release Date" column. ORDER BY release_date
When extracting the data from SQL, make your object do the sorting, e.g., if extracting into a Set, make it a TreeSet, and make your data model implement Comparable and enact the natural sort algorithm here (insertion sort will suffice if you are using a language without collections) as you'll be reading the rows from SQL one by one as you create your model and insert it into the collection)
Regarding the best response from Richard Toth https://stackoverflow.com/a/12257917/4052357
Watch out for UTF8 encoded strings that contain 2byte (or more) characters and numbers e.g.
12 南新宿
Using MySQL's LENGTH() in udf_NaturalSortFormat function will return the byte length of the string and be incorrect, instead use CHAR_LENGTH() which will return the correct character length.
In my case using LENGTH() caused queries to never complete and result in 100% CPU usage for MySQL
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE sortString varchar(4000);
DECLARE numStartIndex int;
DECLARE numEndIndex int;
DECLARE padLength int;
DECLARE totalPadLength int;
DECLARE i int;
DECLARE sameOrderCharsLen int;
SET totalPadLength = 0;
SET instring = TRIM(instring);
SET sortString = instring;
SET numStartIndex = udf_FirstNumberPos(instring);
SET numEndIndex = 0;
SET i = 1;
SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);
WHILE (i <= sameOrderCharsLen) DO
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
SET i = i + 1;
END WHILE;
WHILE (numStartIndex <> 0) DO
SET numStartIndex = numStartIndex + numEndIndex;
SET numEndIndex = numStartIndex;
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
SET numEndIndex = numEndIndex + 1;
END WHILE;
SET numEndIndex = numEndIndex - 1;
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
IF padLength < 0 THEN
SET padLength = 0;
END IF;
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
SET totalPadLength = totalPadLength + padLength;
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
END WHILE;
RETURN sortString;
END
;;
p.s. I would have added this as a comment to the original but I don't have enough reputation (yet)
Add a field for "sort key" that has all strings of digits zero-padded to a fixed length and then sort on that field instead.
If you might have long strings of digits, another method is to prepend the number of digits (fixed-width, zero-padded) to each string of digits. For example, if you won't have more than 99 digits in a row, then for "Super Blast 10 Ultra" the sort key would be "Super Blast 0210 Ultra".
To order:
0
1
2
10
23
101
205
1000
a
aac
b
casdsadsa
css
Use this query:
SELECT
column_name
FROM
table_name
ORDER BY
column_name REGEXP '^\d*[^\da-z&\.\' \-\"\!\#\#\$\%\^\*\(\)\;\:\\,\?\/\~\`\|\_\-]' DESC,
column_name + 0,
column_name;
If you do not want to reinvent the wheel or have a headache with lot of code that does not work, just use Drupal Natural Sort ... Just run the SQL that comes zipped (MySQL or Postgre), and that's it. When making a query, simply order using:
... ORDER BY natsort_canon(column_name, 'natural')
Another option is to do the sorting in memory after pulling the data from mysql. While it won't be the best option from a performance standpoint, if you are not sorting huge lists you should be fine.
If you take a look at Jeff's post, you can find plenty of algorithms for what ever language you might be working with.
Sorting for Humans : Natural Sort Order
You can also create in a dynamic way the "sort column" :
SELECT name, (name = '-') boolDash, (name = '0') boolZero, (name+0 > 0) boolNum
FROM table
ORDER BY boolDash DESC, boolZero DESC, boolNum DESC, (name+0), name
That way, you can create groups to sort.
In my query, I wanted the '-' in front of everything, then the numbers, then the text. Which could result in something like :
-
0
1
2
3
4
5
10
13
19
99
102
Chair
Dog
Table
Windows
That way you don't have to maintain the sort column in the correct order as you add data. You can also change your sort order depending on what you need.
A lot of other answers I see here (and in the duplicate questions) basically only work for very specifically formatted data, e.g. a string that's entirely a number, or for which there's a fixed-length alphabetic prefix. This isn't going to work in the general case.
It's true that there's not really any way to implement a 100% general nat-sort in MySQL, because to do it what you really need is a modified comparison function, that switches between lexicographic sorting of the strings and numeric sort if/when it encounters a number. Such code could implement any algorithm you could desire for recognising and comparing the numeric portions within two strings. Unfortunately, though, the comparison function in MySQL is internal to its code, and cannot be changed by the user.
This leaves a hack of some kind, where you try to create a sort key for your string in which the numeric parts are re-formatted so that the standard lexicographic sort actually sorts them the way you want.
For plain integers up to some maximum number of digits, the obvious solution is to simply left-pad them with zeros so that they're all fixed width. This is the approach taken by the Drupal plugin, and the solutions of #plalx / #RichardToth. (#Christian has a different and much more complex solution, but it offers no advantages that I can see).
As #tye points out, you can improve on this by prepending a fixed-digit length to each number, rather than simply left-padding it. There's much, much more you can improve on, though, even given the limitations of what is essentially an awkward hack. Yet, there doesn't seem to be any pre-built solutions out there!
For example, what about:
Plus and minus signs? +10 vs 10 vs -10
Decimals? 8.2, 8.5, 1.006, .75
Leading zeros? 020, 030, 00000922
Thousand separators? "1,001 Dalmations" vs "1001 Dalmations"
Version numbers? MariaDB v10.3.18 vs MariaDB v10.3.3
Very long numbers? 103,768,276,592,092,364,859,236,487,687,870,234,598.55
Extending on #tye's method, I've created a fairly compact NatSortKey() stored function that will convert an arbitrary string into a nat-sort key, and that handles all of the above cases, is reasonably efficient, and preserves a total sort-order (no two different strings have sort keys that compare equal). A second parameter can be used to limit the number of numbers processed in each string (e.g. to the first 10 numbers, say), which can be used to ensure the output fits within a given length.
NOTE: Sort-key string generated with a given value of this 2nd parameter should only be sorted against other strings generated with the same value for the parameter, or else they might not sort correctly!
You can use it directly in ordering, e.g.
SELECT myString FROM myTable ORDER BY NatSortKey(myString,0); ### 0 means process all numbers - resulting sort key might be quite long for certain inputs
But for efficient sorting of large tables, it's better to pre-store the sort key in another column (possibly with an index on it):
INSERT INTO myTable (myString,myStringNSK) VALUES (#theStringValue,NatSortKey(#theStringValue,10)), ...
...
SELECT myString FROM myTable ORDER BY myStringNSK;
[Ideally, you'd make this happen automatically by creating the key column as a computed stored column, using something like:
CREATE TABLE myTable (
...
myString varchar(100),
myStringNSK varchar(150) AS (NatSortKey(myString,10)) STORED,
...
KEY (myStringNSK),
...);
But for now neither MySQL nor MariaDB allow stored functions in computed columns, so unfortunately you can't yet do this.]
My function affects sorting of numbers only. If you want to do other sort-normalization things, such as removing all punctuation, or trimming whitespace off each end, or replacing multi-whitespace sequences with single spaces, you could either extend the function, or it could be done before or after NatSortKey() is applied to your data. (I'd recommend using REGEXP_REPLACE() for this purpose).
It's also somewhat Anglo-centric in that I assume '.' for a decimal point and ',' for the thousands-separator, but it should be easy enough to modify if you want the reverse, or if you want that to be switchable as a parameter.
It might be amenable to further improvement in other ways; for example it currently sorts negative numbers by absolute value, so -1 comes before -2, rather than the other way around. There's also no way to specify a DESC sort order for numbers while retaining ASC lexicographical sort for text. Both of these issues can be fixed with a little more work; I will updated the code if/when I get the time.
There are lots of other details to be aware of - including some critical dependencies on the chaset and collation that you're using - but I've put them all into a comment block within the SQL code. Please read this carefully before using the function for yourself!
So, here's the code. If you find a bug, or have an improvement I haven't mentioned, please let me know in the comments!
delimiter $$
CREATE DEFINER=CURRENT_USER FUNCTION NatSortKey (s varchar(100), n int) RETURNS varchar(350) DETERMINISTIC
BEGIN
/****
Converts numbers in the input string s into a format such that sorting results in a nat-sort.
Numbers of up to 359 digits (before the decimal point, if one is present) are supported. Sort results are undefined if the input string contains numbers longer than this.
For n>0, only the first n numbers in the input string will be converted for nat-sort (so strings that differ only after the first n numbers will not nat-sort amongst themselves).
Total sort-ordering is preserved, i.e. if s1!=s2, then NatSortKey(s1,n)!=NatSortKey(s2,n), for any given n.
Numbers may contain ',' as a thousands separator, and '.' as a decimal point. To reverse these (as appropriate for some European locales), the code would require modification.
Numbers preceded by '+' sort with numbers not preceded with either a '+' or '-' sign.
Negative numbers (preceded with '-') sort before positive numbers, but are sorted in order of ascending absolute value (so -7 sorts BEFORE -1001).
Numbers with leading zeros sort after the same number with no (or fewer) leading zeros.
Decimal-part-only numbers (like .75) are recognised, provided the decimal point is not immediately preceded by either another '.', or by a letter-type character.
Numbers with thousand separators sort after the same number without them.
Thousand separators are only recognised in numbers with no leading zeros that don't immediately follow a ',', and when they format the number correctly.
(When not recognised as a thousand separator, a ',' will instead be treated as separating two distinct numbers).
Version-number-like sequences consisting of 3 or more numbers separated by '.' are treated as distinct entities, and each component number will be nat-sorted.
The entire entity will sort after any number beginning with the first component (so e.g. 10.2.1 sorts after both 10 and 10.995, but before 11)
Note that The first number component in an entity like this is also permitted to contain thousand separators.
To achieve this, numbers within the input string are prefixed and suffixed according to the following format:
- The number is prefixed by a 2-digit base-36 number representing its length, excluding leading zeros. If there is a decimal point, this length only includes the integer part of the number.
- A 3-character suffix is appended after the number (after the decimals if present).
- The first character is a space, or a '+' sign if the number was preceded by '+'. Any preceding '+' sign is also removed from the front of the number.
- This is followed by a 2-digit base-36 number that encodes the number of leading zeros and whether the number was expressed in comma-separated form (e.g. 1,000,000.25 vs 1000000.25)
- The value of this 2-digit number is: (number of leading zeros)*2 + (1 if comma-separated, 0 otherwise)
- For version number sequences, each component number has the prefix in front of it, and the separating dots are removed.
Then there is a single suffix that consists of a ' ' or '+' character, followed by a pair base-36 digits for each number component in the sequence.
e.g. here is how some simple sample strings get converted:
'Foo055' --> 'Foo0255 02'
'Absolute zero is around -273 centigrade' --> 'Absolute zero is around -03273 00 centigrade'
'The $1,000,000 prize' --> 'The $071000000 01 prize'
'+99.74 degrees' --> '0299.74+00 degrees'
'I have 0 apples' --> 'I have 00 02 apples'
'.5 is the same value as 0000.5000' --> '00.5 00 is the same value as 00.5000 08'
'MariaDB v10.3.0018' --> 'MariaDB v02100130218 000004'
The restriction to numbers of up to 359 digits comes from the fact that the first character of the base-36 prefix MUST be a decimal digit, and so the highest permitted prefix value is '9Z' or 359 decimal.
The code could be modified to handle longer numbers by increasing the size of (both) the prefix and suffix.
A higher base could also be used (by replacing CONV() with a custom function), provided that the collation you are using sorts the "digits" of the base in the correct order, starting with 0123456789.
However, while the maximum number length may be increased this way, note that the technique this function uses is NOT applicable where strings may contain numbers of unlimited length.
The function definition does not specify the charset or collation to be used for string-type parameters or variables: The default database charset & collation at the time the function is defined will be used.
This is to make the function code more portable. However, there are some important restrictions:
- Collation is important here only when comparing (or storing) the output value from this function, but it MUST order the characters " +0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" in that order for the natural sort to work.
This is true for most collations, but not all of them, e.g. in Lithuanian 'Y' comes before 'J' (according to Wikipedia).
To adapt the function to work with such collations, replace CONV() in the function code with a custom function that emits "digits" above 9 that are characters ordered according to the collation in use.
- For efficiency, the function code uses LENGTH() rather than CHAR_LENGTH() to measure the length of strings that consist only of digits 0-9, '.', and ',' characters.
This works for any single-byte charset, as well as any charset that maps standard ASCII characters to single bytes (such as utf8 or utf8mb4).
If using a charset that maps these characters to multiple bytes (such as, e.g. utf16 or utf32), you MUST replace all instances of LENGTH() in the function definition with CHAR_LENGTH()
Length of the output:
Each number converted adds 5 characters (2 prefix + 3 suffix) to the length of the string. n is the maximum count of numbers to convert;
This parameter is provided as a means to limit the maximum output length (to input length + 5*n).
If you do not require the total-ordering property, you could edit the code to use suffixes of 1 character (space or plus) only; this would reduce the maximum output length for any given n.
Since a string of length L has at most ((L+1) DIV 2) individual numbers in it (every 2nd character a digit), for n<=0 the maximum output length is (inputlength + 5*((inputlength+1) DIV 2))
So for the current input length of 100, the maximum output length is 350.
If changing the input length, the output length must be modified according to the above formula. The DECLARE statements for x,y,r, and suf must also be modified, as the code comments indicate.
****/
DECLARE x,y varchar(100); # need to be same length as input s
DECLARE r varchar(350) DEFAULT ''; # return value: needs to be same length as return type
DECLARE suf varchar(101); # suffix for a number or version string. Must be (((inputlength+1) DIV 2)*2 + 1) chars to support version strings (e.g. '1.2.33.5'), though it's usually just 3 chars. (Max version string e.g. 1.2. ... .5 has ((length of input + 1) DIV 2) numeric components)
DECLARE i,j,k int UNSIGNED;
IF n<=0 THEN SET n := -1; END IF; # n<=0 means "process all numbers"
LOOP
SET i := REGEXP_INSTR(s,'\\d'); # find position of next digit
IF i=0 OR n=0 THEN RETURN CONCAT(r,s); END IF; # no more numbers to process -> we're done
SET n := n-1, suf := ' ';
IF i>1 THEN
IF SUBSTRING(s,i-1,1)='.' AND (i=2 OR SUBSTRING(s,i-2,1) RLIKE '[^.\\p{L}\\p{N}\\p{M}\\x{608}\\x{200C}\\x{200D}\\x{2100}-\\x{214F}\\x{24B6}-\\x{24E9}\\x{1F130}-\\x{1F149}\\x{1F150}-\\x{1F169}\\x{1F170}-\\x{1F189}]') AND (SUBSTRING(s,i) NOT RLIKE '^\\d++\\.\\d') THEN SET i:=i-1; END IF; # Allow decimal number (but not version string) to begin with a '.', provided preceding char is neither another '.', nor a member of the unicode character classes: "Alphabetic", "Letter", "Block=Letterlike Symbols" "Number", "Mark", "Join_Control"
IF i>1 AND SUBSTRING(s,i-1,1)='+' THEN SET suf := '+', j := i-1; ELSE SET j := i; END IF; # move any preceding '+' into the suffix, so equal numbers with and without preceding "+" signs sort together
SET r := CONCAT(r,SUBSTRING(s,1,j-1)); SET s = SUBSTRING(s,i); # add everything before the number to r and strip it from the start of s; preceding '+' is dropped (not included in either r or s)
END IF;
SET x := REGEXP_SUBSTR(s,IF(SUBSTRING(s,1,1) IN ('0','.') OR (SUBSTRING(r,-1)=',' AND suf=' '),'^\\d*+(?:\\.\\d++)*','^(?:[1-9]\\d{0,2}(?:,\\d{3}(?!\\d))++|\\d++)(?:\\.\\d++)*+')); # capture the number + following decimals (including multiple consecutive '.<digits>' sequences)
SET s := SUBSTRING(s,LENGTH(x)+1); # NOTE: LENGTH() can be safely used instead of CHAR_LENGTH() here & below PROVIDED we're using a charset that represents digits, ',' and '.' characters using single bytes (e.g. latin1, utf8)
SET i := INSTR(x,'.');
IF i=0 THEN SET y := ''; ELSE SET y := SUBSTRING(x,i); SET x := SUBSTRING(x,1,i-1); END IF; # move any following decimals into y
SET i := LENGTH(x);
SET x := REPLACE(x,',','');
SET j := LENGTH(x);
SET x := TRIM(LEADING '0' FROM x); # strip leading zeros
SET k := LENGTH(x);
SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294) + IF(i=j,0,1),10,36),2,'0')); # (j-k)*2 + IF(i=j,0,1) = (count of leading zeros)*2 + (1 if there are thousands-separators, 0 otherwise) Note the first term is bounded to <= base-36 'ZY' as it must fit within 2 characters
SET i := LOCATE('.',y,2);
IF i=0 THEN
SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x,y,suf); # k = count of digits in number, bounded to be <= '9Z' base-36
ELSE # encode a version number (like 3.12.707, etc)
SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x); # k = count of digits in number, bounded to be <= '9Z' base-36
WHILE LENGTH(y)>0 AND n!=0 DO
IF i=0 THEN SET x := SUBSTRING(y,2); SET y := ''; ELSE SET x := SUBSTRING(y,2,i-2); SET y := SUBSTRING(y,i); SET i := LOCATE('.',y,2); END IF;
SET j := LENGTH(x);
SET x := TRIM(LEADING '0' FROM x); # strip leading zeros
SET k := LENGTH(x);
SET r := CONCAT(r,LPAD(CONV(LEAST(k,359),10,36),2,'0'),x); # k = count of digits in number, bounded to be <= '9Z' base-36
SET suf := CONCAT(suf,LPAD(CONV(LEAST((j-k)*2,1294),10,36),2,'0')); # (j-k)*2 = (count of leading zeros)*2, bounded to fit within 2 base-36 digits
SET n := n-1;
END WHILE;
SET r := CONCAT(r,y,suf);
END IF;
END LOOP;
END
$$
delimiter ;
Other answers are correct, but you may want to know that MariaDB 10.11 LTS has a natural_sort_key() function. The function is documented here.
If you're using PHP you can do the the natural sort in php.
$keys = array();
$values = array();
foreach ($results as $index => $row) {
$key = $row['name'].'__'.$index; // Add the index to create an unique key.
$keys[] = $key;
$values[$key] = $row;
}
natsort($keys);
$sortedValues = array();
foreach($keys as $index) {
$sortedValues[] = $values[$index];
}
I hope MySQL will implement natural sorting in a future version, but the feature request (#1588) is open since 2003, So I wouldn't hold my breath.
A simplified non-udf version of the best response of #plaix/Richard Toth/Luke Hoggett, which works only for the first integer in the field, is
SELECT name,
LEAST(
IFNULL(NULLIF(LOCATE('0', name), 0), ~0),
IFNULL(NULLIF(LOCATE('1', name), 0), ~0),
IFNULL(NULLIF(LOCATE('2', name), 0), ~0),
IFNULL(NULLIF(LOCATE('3', name), 0), ~0),
IFNULL(NULLIF(LOCATE('4', name), 0), ~0),
IFNULL(NULLIF(LOCATE('5', name), 0), ~0),
IFNULL(NULLIF(LOCATE('6', name), 0), ~0),
IFNULL(NULLIF(LOCATE('7', name), 0), ~0),
IFNULL(NULLIF(LOCATE('8', name), 0), ~0),
IFNULL(NULLIF(LOCATE('9', name), 0), ~0)
) AS first_int
FROM table
ORDER BY IF(first_int = ~0, name, CONCAT(
SUBSTR(name, 1, first_int - 1),
LPAD(CAST(SUBSTR(name, first_int) AS UNSIGNED), LENGTH(~0), '0'),
SUBSTR(name, first_int + LENGTH(CAST(SUBSTR(name, first_int) AS UNSIGNED)))
)) ASC
I have tried several solutions but the actually it is very simple:
SELECT test_column FROM test_table ORDER BY LENGTH(test_column) DESC, test_column DESC
/*
Result
--------
value_1
value_2
value_3
value_4
value_5
value_6
value_7
value_8
value_9
value_10
value_11
value_12
value_13
value_14
value_15
...
*/
Also there is natsort. It is intended to be a part of a drupal plugin, but it works fine stand-alone.
Here is a simple one if titles only have the version as a number:
ORDER BY CAST(REGEXP_REPLACE(title, "[a-zA-Z]+", "") AS INT)';
Otherwise you can use simple SQL if you use a pattern (this pattern uses a # before the version):
create table titles(title);
insert into titles (title) values
('Final Fantasy'),
('Final Fantasy #03'),
('Final Fantasy #11'),
('Final Fantasy #10'),
('Final Fantasy #2'),
('Bond 007 ##2'),
('Final Fantasy #01'),
('Bond 007'),
('Final Fantasy #11}');
select REGEXP_REPLACE(title, "#([0-9]+)", "\\1") as title from titles
ORDER BY REGEXP_REPLACE(title, "#[0-9]+", ""),
CAST(REGEXP_REPLACE(title, ".*#([0-9]+).*", "\\1") AS INT);
+-------------------+
| title |
+-------------------+
| Bond 007 |
| Bond 007 #2 |
| Final Fantasy |
| Final Fantasy 01 |
| Final Fantasy 2 |
| Final Fantasy 03 |
| Final Fantasy 10 |
| Final Fantasy 11 |
| Final Fantasy 11} |
+-------------------+
8 rows in set, 2 warnings (0.001 sec)
You can use other patterns if needed.
For example if you have a movie "I'm #1" and "I'm #1 part 2" then maybe wrap the version e.g. "Final Fantasy {11}"
I know this topic is ancient but I think I've found a way to do this:
SELECT * FROM `table` ORDER BY
CONCAT(
GREATEST(
LOCATE('1', name),
LOCATE('2', name),
LOCATE('3', name),
LOCATE('4', name),
LOCATE('5', name),
LOCATE('6', name),
LOCATE('7', name),
LOCATE('8', name),
LOCATE('9', name)
),
name
) ASC
Scrap that, it sorted the following set incorrectly (It's useless lol):
Final Fantasy 1
Final Fantasy 2
Final Fantasy 5
Final Fantasy 7
Final Fantasy 7: Advent Children
Final Fantasy 12
Final Fantasy 112
FF1
FF2