Sorting '£' (pound symbol) in SQL

I am trying to sort £ along with other special characters, but it's not sorting properly.
I want that string to be sorted along with other strings starting with special characters. For example I have four strings:
&!##
££$$
abcd
&#$%.
Right now it sorts in the order: &!##, &#$%, abcd, ££$$.
I want it in the order: &!##, &#$%, ££$$, abcd.
I have used order by replace(column,'£','*') so that it sorts along with strings starting with *. Although this seems to work when querying the DB directly, once the code is deployed the £ gets replaced by �, i.e. the query becomes replace(column,'�','*'), and it no longer sorts as expected.
How can I resolve this issue? Is there any other way to sort the pound symbol/£? Any help would be greatly appreciated.

You seem to have two problems: performing the actual sort, and (possibly) how the £ symbol appears in the results in your code. Without knowing anything about your code, client or environment it's rather hard to guess what you might need to change, but I'd start by looking at your NLS_LANG and other NLS settings at the client end. @amccausl's link might be useful, but it depends what you're doing. I suspect you'll find different values in nls_session_parameters when queried from SQL*Plus and from your code, which may give you some pointers.
The sorting itself is slightly clearer now. Have a look at the docs for Linguistic Sorting and String Searching and NLSSORT.
You can do something like this (with a CTE to generate your data):
with tmp_tab as (
select '&!##' as value from dual
union all select '££$$' from dual
union all select 'abcd' from dual
union all select '&#$%' from dual
)
select * from tmp_tab
order by nlssort(value, 'NLS_SORT = WEST_EUROPEAN')
VALUE
------
&!##
&#$%
££$$
abcd
4 rows selected.
You can get the sort values supported by your configuration with select value from v$nls_valid_values where parameter = 'SORT', but WEST_EUROPEAN seems to do what you want, for this sample data anyway.
You can see the default sorting in your current session with select value from nls_session_parameters where parameter = 'NLS_SORT'. (You can change that with an ALTER SESSION, but it's only letting me do that with some values, so that may not be helpful here).
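For reference, this is roughly what checking and (where permitted) changing the session-level sort looks like; a minimal sketch rather than a fix for your specific environment:
-- check the current session default
select value from nls_session_parameters where parameter = 'NLS_SORT';
-- switch the session to a linguistic sort (only if the value is accepted in your environment)
alter session set nls_sort = WEST_EUROPEAN;
alter session set nls_comp = LINGUISTIC; -- makes ORDER BY and comparisons honour NLS_SORT without an explicit NLSSORT()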

You need to make sure your application code is all proper UTF-8 (see http://htmlpurifier.org/docs/enduser-utf8.html for more details)

Seems like your issue is with the db character set, or a difference in character sets between the app and db. On the Oracle side, you can check by doing:
select value from sys.nls_database_parameters where parameter='NLS_CHARACTERSET';
If this comes up as ASCII (like US7ASCII), then you may have issues storing the data properly. Even if this is the charset, you should be able to insert and retrieve sorted (binary sort) by using nvarchar2 and unistr (assuming the values conform to your NLS_NCHAR_CHARACTERSET; see the above query but change the parameter), like:
create table test1(val nvarchar2(100));
insert into test1(val) values (unistr('\00a3')); -- pound currency
insert into test1(val) values (unistr('\00a5')); -- yen currency
insert into test1(val) values ('$'); -- dollar currency
commit;
select * from test1
order by val asc;
-- will give symbols in order: dollar('\0024'), pound ('\00a3'), yen ('\00a5')
I will say that I would not resort to using the national character set; I would probably change the db character set to fit the needs of my data, as supporting two different character sets isn't ideal, but it's available anyway.
If you have no issues storing/retrieving on the data side, then your app/client character set is probably different from your db's.
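As a convenience, both database character sets mentioned above can be checked in one query (a sketch):
select parameter, value
from sys.nls_database_parameters
where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');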

Use nchar(168). It will work.
select nchar(168)

Related

SQL Decode format numbers only

I want to format amounts to salary format, e.g. 10000 becomes 10,000, so I use to_char(amount, '99,999,99')
SELECT SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0)) Salary,
SUM(DECODE(e.element_name,'Transportation Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Transportation,
SUM(DECODE(e.element_name,'GOSI Processing',to_char(v.screen_entry_value,'99,999,99'),0)) GOSI,
SUM(DECODE(e.element_name,'Housing Allowance',to_char(v.screen_entry_value,'99,999,99'),0)) Housing
FROM values v,
values_types vt,
elements e
WHERE vt.value_type = 'Amount'
This gives the error "invalid number" because not all values are numbers unless value_type is equal to 'Amount', but I guess DECODE checks all values anyway. What I know is that execution begins with FROM, then WHERE, then SELECT - so what's going wrong here?
You said you added decode(...), but it looks like you might have actually added sum(decode(...)).
You are converting your values to strings with to_char(v.screen_entry_value,'99,999,99'), so your decode() generates a string - the default 0 will be converted to '0' - giving you a value like '1,234,56'. Then you are aggregating those, so sum() has to implicitly convert those strings to numbers - and it is throwing the error when it tries to do that:
select to_number('1,234,56') from dual
will also get "ORA-01722: invalid number", unless you supply a similar format mask so it knows how to interpret it. You could do that, e.g.:
SUM(to_number(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0),'99,999,99'))
... but it's maybe more obvious that something is strange, and even if you did, you would end up with a number, not a formatted string.
So instead of doing:
SUM(DECODE(e.element_name,'Basic Salary',to_char(v.screen_entry_value,'99,999,99'),0))
you should format the result after aggregating:
to_char(SUM(DECODE(e.element_name,'Basic Salary',v.screen_entry_value,0)),'99,999,99')
fiddle with dummy tables, data and joins.
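Applied to the original query, the same pattern would look roughly like this (a sketch; the joins and any remaining conditions stay as in the original fragment):
SELECT to_char(SUM(DECODE(e.element_name, 'Basic Salary', v.screen_entry_value, 0)), '99,999,99') Salary,
       to_char(SUM(DECODE(e.element_name, 'Transportation Allowance', v.screen_entry_value, 0)), '99,999,99') Transportation,
       to_char(SUM(DECODE(e.element_name, 'GOSI Processing', v.screen_entry_value, 0)), '99,999,99') GOSI,
       to_char(SUM(DECODE(e.element_name, 'Housing Allowance', v.screen_entry_value, 0)), '99,999,99') Housing
FROM   values v,
       values_types vt,
       elements e
WHERE  vt.value_type = 'Amount'
-- ... plus the join conditions from the original query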

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (approx. 0.7 million records) where an nvarchar field "MediaID" mostly contains media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the one before; this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations, and to filter on these calculated values in the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these media IDs contain non-hex characters - most probably because of typing errors by the people who added them, or import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) when the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no media / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the SQL instance to SQL 2016 (it was below 2012 before) I could use try_convert with the same syntax as the original convert function, as donPablo has pointed out. With that, the query runs all the way through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution from Alice... didn't work out for me, as it was (strangely) also returning records which had the "+" character in them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
A great addition for people who still have to work with older SQL instances.
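Put together, the variant from that comment might look roughly like this (a sketch using the table and column names from the question; the flag name isHex is just illustrative):
SELECT *
FROM (
    SELECT *,
           CASE WHEN mediaID LIKE '%[^0-9A-Fa-f]%' THEN 0 ELSE 1 END AS isHex
    FROM M_T_CONTRACT_CUSTOMERS
    WHERE mediaID IS NOT NULL
) AS inner1
WHERE isHex = 1;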
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a CASE expression and a LIKE character pattern.
Hexadecimal values cannot contain characters other than A-F and the digits 0 to 9.
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that evaluates MediaID first and checks whether it is hexadecimal, and then running the query for the conversion.
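A sketch of what such a helper might look like (the name dbo.fnIsHex is just illustrative; the negated character class rejects any character outside 0-9/A-F/a-f):
CREATE FUNCTION dbo.fnIsHex (@value nvarchar(100))
RETURNS bit
AS
BEGIN
    RETURN CASE WHEN @value NOT LIKE '%[^0-9A-Fa-f]%' THEN 1 ELSE 0 END;
END;
It could then be used in the WHERE clause, e.g. where dbo.fnIsHex(mediaID) = 1.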

Why do I get different results depending on the function I use? (SQL Server)

I've been tasked with creating a report for my company. The report is generated from the results returned by the Stored Procedure spGenerateReport, which has multiple filters.
Inside the SP, this is how the filter is expected to work:
SELECT * FROM MyTable WHERE column1 IN (
'filters', 'for', 'this', 'report'
)
Entering the code above yields ~30000 rows in 9s. However, I want to be able to change my SP's filter by passing it a single argument (since I may use 1 or 2 or n filters), like so:
spGenerateReport 'Filters,for,this,report'
For this I have the user-defined function fnSplitString (yes, I do know that there is a STRING_SPLIT function, but I can't use it due to the lower compatibility level of my database), which splits a single string into a table, like so:
SELECT splitData FROM fnSplitString('Filters,for,this,report')
Returns:
splitData
------
Filters
for
this
report
Thus the final code in my SP is:
SELECT * FROM MyTable WHERE column1 IN (
SELECT * FROM fnSplitString('Filters,for,this,report')
)
However, this instead yields ~10000 rows in 60s. The time taken to complete this SP is weird but isn't too much of a problem; nearly a quarter of my rows disappearing into the void certainly is. The results only have rows from the first couple of filters (for example, 'Filters' and 'for'); if I change the order of the arguments (e.g. fnSplitString('report,for,Filters,this')), I get a different number of rows, and only from the filters 'report', 'for' and 'Filters'! I don't understand why using the function returns different results than those obtained when using the literal strings. Is there some inside gimmick that I'm not aware of?
PS - I'm sorry in advance for being bad at explaining myself, and for any grammar mistakes
You should definitely be getting the same results with both techniques, so something is wrong.
You haven't posted the fnSplitString code, but I suspect fnSplitString is not outputting the last string in the list, or the last string in the list is being truncated before it reaches fnSplitString, so that no matches are found.
e.g. if the parameter going into your spGenerateReport stored procedure is varchar(20), then what reaches the function is 'Filters,for,this,rep' with the last bit truncated.
SSRS, for example, will truncate strings that are being passed into an SP instead of warning you with an error message.
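A quick way to see that effect (a minimal sketch, not the actual procedure):
-- if the parameter (or a variable it is assigned to) is declared too short,
-- the filter list is silently cut off before fnSplitString ever sees it
DECLARE @filters varchar(20) = 'Filters,for,this,report';
SELECT @filters AS received_value;  -- 'Filters,for,this,rep'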

PostgreSQL ORDER BY issue - natural sort

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record into the table, I:
select the last record with SELECT * FROM employees ORDER BY em_code DESC
strip the alphabetic prefix from em_code using a regexp and store it in ec_alpha
cast the remaining part to an integer, ec_num
increment it by one, ec_num++
pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step returns EM999 instead of EM1000, so it again generates EM1000 as the new em_code, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 10, it is possible to specify a collation which will sort columns containing numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
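Putting that together with the original goal, selecting the record with the numerically highest code might look like this (a sketch, assuming every em_code contains digits):
SELECT *
FROM employees
ORDER BY regexp_replace(em_code, E'\\D', '', 'g')::int DESC
LIMIT 1;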
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics within strings (prepending zeros to them) so that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought of another way of doing this that uses less db storage than padding and saves time compared to calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
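For example, usage and an expression index might look like this (a sketch; the index name is just illustrative):
-- natural ordering via the helper
SELECT * FROM employees ORDER BY natsort(em_code);
-- since the function is IMMUTABLE, it can back an expression index
CREATE INDEX employees_em_code_natsort_idx ON employees (natsort(em_code));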

How to find MAX() value of character column?

We have a legacy table where one of the columns, part of a composite key, was manually filled with values:
code
------
'001'
'002'
'099'
etc.
Now, we have a feature request in which we must know MAX(code) in order to give the user the next possible value; in the example case from above, the next value is '100'.
We tried to experiment with this but we still can't find any reasonable explanation of how the DB2 engine calculates that
MAX('001', '099', '576') is '576'
MAX('099', '99', 'www') is '99' and so on.
Any help or suggestion would be much appreciated!
You already have the answer to getting the maximum numeric value, but to answer the other part with regard to 'www', '099', '99':
The AS/400 uses EBCDIC to store values. This differs from ASCII in several ways; the most important for your purposes is that alphabetic characters come before numbers, which is the opposite of ASCII.
So for your MAX() the three strings will be sorted and the highest EBCDIC value used:
'www'
'099'
'99 '
As you can see, your '99' string is really '99 ', so it sorts higher than the one with the leading zero.
Cast it to int before applying max()
For the numeric maximum -- filter out the non-numeric values and cast to a numeric for aggregation:
SELECT MAX(INT(FLD1))
WHERE FLD1 <> ' '
AND TRANSLATE(FLD1, '0123456789', '0123456789') = FLD1
SQL Reference: TRANSLATE
And the reasonable explanation:
SQL Reference: MAX
MAX is working correctly for your type definition. When you want the max of integer values, convert the values to integers before calling MAX - but I see you are mixing MAX with the string 'www'; how do you imagine that should work?
Filter to integer-only values, cast them to int and call MAX. It's not a well-designed solution, but looking at your problem I think it is enough.
Sharing the solution for PostgreSQL which worked for me.
Suppose temporary_id is of type character in the database. Then the query below will directly convert the char type to int when it returns the response.
SELECT MAX(CAST (temporary_id AS Integer)) FROM temporary
WHERE temporary_id IS NOT NULL
As per my requirement I've applied the MAX() aggregate function. One can remove it and the cast will work the same way.