Oracle varchar2 and unicode texts - sql

According to the spec, VARCHAR2(max_size CHAR) should store max_size characters. I observe different, strange behavior for Unicode texts.
Consider this example:
create table test (id varchar2(3 char) not null, primary key (id));
insert into test(id) values('abc');
insert into test(id) values('ффф');
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 6, maximum: 3)
So varchar2 3 chars actually mean the same as byte? NO :)
create table test (id varchar2(3 byte) not null, primary key (id))
insert into test(id) values('abc')
insert into test(id) values('ффф')
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 18, maximum: 3)
And my question remains how to tell Oracle that varchar2 length is for Unicode text (UTF8 to be more precise)?
Update: Is it possible to write a SQL query that shows all tables/columns whose length is defined in bytes?
Actually, my issue split into two parts: incorrect query encoding in TablePlus, and length in bytes (w/o the char suffix) for random columns :)
Update 2: Thanks to @Wernfried Domscheit!
This query shows the tables and columns with varchar2 whose length is specified in bytes:
SELECT TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_USED
FROM USER_TAB_COLUMNS WHERE DATA_TYPE = 'VARCHAR2' AND CHAR_USED = 'B'
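Once those columns are listed, a possible follow-up is to generate the statements that switch them to character semantics. This is only a sketch: it assumes every listed column should keep the same numeric length, just counted in characters, and the generated DDL should be reviewed before running it. The overall VARCHAR2 byte limit still applies regardless of the semantics.

```sql
-- Generate one ALTER TABLE per byte-semantics VARCHAR2 column in the current schema.
-- CHAR_LENGTH holds the declared length; the join to USER_TABLES filters out views.
SELECT 'ALTER TABLE "' || c.table_name || '" MODIFY ("' || c.column_name ||
       '" VARCHAR2(' || c.char_length || ' CHAR));' AS ddl
FROM   user_tab_columns c
JOIN   user_tables t ON t.table_name = c.table_name
WHERE  c.data_type = 'VARCHAR2'
AND    c.char_used = 'B';

-- For tables created later in this session, make CHAR the default semantics,
-- so that a plain VARCHAR2(3) means three characters.
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
```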

Your example is working for me:
SELECT *
FROM V$NLS_PARAMETERS
WHERE PARAMETER = 'NLS_CHARACTERSET';
PARAMETER VALUE
------------------------------
NLS_CHARACTERSET AL32UTF8
1 row selected.
CREATE TABLE TEST (ID VARCHAR2(3 CHAR));
Table created.
INSERT INTO TEST(ID) VALUES('abc');
1 row created.
INSERT INTO TEST(ID) VALUES('ффф');
1 row created.
Maybe a typo on your side?
Update:
Looks like your client uses wrong character settings.
ф (U+0444: Cyrillic Small Letter Ef) has these byte values:
+--------+-----------+-----------+----------+-----------------------------------+
|Encoding|hex        |dec (bytes)|dec       |binary                             |
+--------+-----------+-----------+----------+-----------------------------------+
|UTF-8   |D1 84      |209 132    |53636     |11010001 10000100                  |
|UTF-16BE|04 44      |4 68       |1092      |00000100 01000100                  |
|UTF-16LE|44 04      |68 4       |17412     |01000100 00000100                  |
|UTF-32BE|00 00 04 44|0 0 4 68   |1092      |00000000 00000000 00000100 01000100|
|UTF-32LE|44 04 00 00|68 4 0 0   |1141112832|01000100 00000100 00000000 00000000|
+--------+-----------+-----------+----------+-----------------------------------+
DUMP should return Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84, but you get ef,bf,bd, which is U+FFFD: Replacement Character.
You don't insert ффф, it is converted to ���.
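I guess your client actually uses UTF-8 but you did not tell the database, so most likely the database assumes the client uses the default US7ASCII (or something else). The client sends 6 bytes (d1,84,d1,84,d1,84) but the Oracle database interprets them as 6 single-byte characters. None of them is valid, so each one is stored as U+FFFD, which takes 3 bytes in UTF-8; that is where "actual: 6" (characters) and "actual: 18" (bytes) in the two error messages come from.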
Typically you use the NLS_LANG environment variable to define this. However, dbeaver is Java based and Java/JDBC does not use the NLS_LANG settings - at least not by default.
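One way to take the client encoding out of the picture is to build the string on the server from code points. The sketch below assumes the TEST table from the question (ID VARCHAR2(3 CHAR)) and an AL32UTF8 database; UNISTR and DUMP are standard Oracle functions.

```sql
-- UNISTR takes code points (U+0444 is ф), so no non-ASCII byte travels from the client.
INSERT INTO test (id) VALUES (UNISTR('\0444\0444\0444'));

-- Compare character count, byte count and the bytes that were actually stored.
-- Format 1016 asks DUMP for hex output plus the character set name.
SELECT id,
       LENGTH(id)     AS chars,
       LENGTHB(id)    AS bytes,
       DUMP(id, 1016) AS stored_bytes
FROM   test;
```

If this row dumps as d1,84,d1,84,d1,84 while a row typed directly in the client dumps as ef,bf,bd, the column definition is fine and the client character set is the likely culprit.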

Related

Same string but different byte length

I'm having some issues trying to select some data.
I have a trigger that inserts into another table, but only if the string value doesn't already exist in that table, so I validate before the insert by counting:
SELECT COUNT(*) INTO v_puesto_x_empresa FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND PUE_NOMBRE = v_puesto_nombre;
If the counter above is 0, the process is allowed to insert the data into the corresponding table.
As it turns out, it is duplicating the data for some strange reason, so I checked the source.
I use a cursor that prepares the data I need to insert, and I noticed that some strings, even though they look the same, are treated as different strings.
select
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci')))) PIN_DESCRIPCION,
LENGTHB(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) LEGTHB,
LENGTH(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "NORMAL LENGTH",
LENGTHC(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH C",
LENGTH2(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH 2"
FROM PES_PUESTOS_INTERNOS
where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by PIN_DESCRIPCION
order by PIN_DESCRIPCION asc
;
These are the results:
Results but in text :
PIN_DESCRIPCION
----------------------------------------------------------------------
LEGTHB NORMAL LENGTH LENGTH C LENGTH 2
---------- ------------- ---------- ----------
ADMINISTRADOR DE PROCESOS
27 27 27 27
ADMINISTRADOR DE PROCESOS Y CALIDAD
36 36 36 36
AFORADOR
9 9 9 9
AFORADOR
10 10 10 10
ASISTENTE ADMINISTRATIVO
25 25 25 25
ASISTENTE ADMINISTRATIVO
26 26 26 26
So my guess is that, for some reason, even though they look the same, they are somehow treated as different internally.
Note: this table loads user input data, so the values, although they are meant to be the 'same word', may contain linguistic character differences, such as:
user input 1: aforador
user input 2: Aforador
For that reason, I applied the following piece of code, so that I process only one word (string):
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))
So, if for example, I query the same data without that, I would get the following result:
PIN_DESCRIPCION
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
administrador de procesos
administrador de procesos
Administrador de Procesos y Calidad
Aforador
aforador
aforador
I'll appreciate any help with this issue.
Thanks in advance.
My best regards.
These strings are not the same. The second one has an extra character that is (obviously) not being displayed.
If you do:
SELECT PIN_DESCRIPCION,
DUMP(PIN_DESCRIPCION)
FROM PES_PUESTOS_INTERNOS
WHERE pin_codemp = '8F90CF5D287E2419E0530200000AA716'
GROUP BY PIN_DESCRIPCION
ORDER BY PIN_DESCRIPCION asc;
Then you will see the binary values that comprise the data and should see that there is an extra trailing character. This could be:
Whitespace (which you can remove with RTRIM);
A zero-width character;
A NUL (ASCII 0) character;
Or something else that is not being displayed.
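To narrow the search down to the suspicious rows, a query along these lines might help. It is a sketch that only catches a trailing ASCII control character (code below 32, which includes NUL); zero-width Unicode characters would need a different test.

```sql
-- List values whose last character is an ASCII control character,
-- together with its code and the full byte dump.
SELECT pin_descripcion,
       ASCII(SUBSTR(pin_descripcion, -1)) AS last_char_code,
       DUMP(pin_descripcion)              AS dmp
FROM   pes_puestos_internos
WHERE  ASCII(SUBSTR(pin_descripcion, -1)) < 32;
```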
For example, if you have a function to create a string from bytes (passed as hexadecimal values):
CREATE FUNCTION createString( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then we can create the sample data:
CREATE TABLE table_name ( value VARCHAR2(50) )
/
INSERT INTO table_name ( value )
SELECT 'AFORADOR' FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '00' ) FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '01' ) FROM DUAL
/
Then, if we use:
SELECT value,
DUMP( value ) AS dump,
LENGTH(TRIM(value)) AS length
FROM table_name
This outputs:
| VALUE | DUMP | LENGTH |
|-----------|----------------------------------------|--------|
| AFORADOR | Typ=1 Len=8: 65,70,79,82,65,68,79,82 | 8 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,0 | 9 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,1 | 9 |
(Yes, the alignment of the table is messed up ... that's because of the unprintable characters in the string.)
Update:
From comments:
What is the output of SELECT DUMP(pin_descripcion) FROM <rest of your query> for those rows?
These are the results:
Aforador Typ=1 Len=8: 65,102,111,114,97,100,111,114
aforador Typ=1 Len=9: 97,102,111,114,97,100,111,114,0
Look at the difference in the byte values.
The first characters are 65 (or A) and 97 (or a); however, by using UPPER, you are masking that difference in your output. If you use UPPER in the GROUP BY then the difference in case will not matter.
Additionally, the second one has an extra byte at the end with the value of 0 (ASCII NUL) so, regardless of case, they are different strings. This extra last character is not printable so you won't see the difference in the output (and is used as a string terminator so may have unexpected side-effects; for example, trying to display a NUL character in a db<>fiddle hangs the output) but it means that the strings are different so that GROUP BY will not aggregate them together.
You can do:
CREATE FUNCTION createStringFromHex( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then, to remove the NUL string terminators:
UPDATE PES_PUESTOS_INTERNOS
SET PIN_DESCRIPCION = RTRIM( PIN_DESCRIPCION, createStringFromHex( '00' ) )
WHERE SUBSTR( PIN_DESCRIPCION, -1 ) = createStringFromHex( '00' );
Then you should be able to do:
SELECT UPPER(PIN_DESCRIPCION) AS PIN_DESCRIPCION,
LENGTH(UPPER(PIN_DESCRIPCION)) AS LENGTH
FROM PES_PUESTOS_INTERNOS
--where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by UPPER(PIN_DESCRIPCION)
order by UPPER(PIN_DESCRIPCION) asc;
and only see single rows.
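The same cleanup can probably be written without the helper function, since CHR(0) also yields the NUL character. This is a sketch under that assumption, followed by a check that no affected rows remain.

```sql
-- Strip trailing NUL characters, using CHR(0) instead of a helper function.
UPDATE PES_PUESTOS_INTERNOS
SET    PIN_DESCRIPCION = RTRIM(PIN_DESCRIPCION, CHR(0))
WHERE  SUBSTR(PIN_DESCRIPCION, -1) = CHR(0);

-- Verify: this count should be 0 after the update.
SELECT COUNT(*) AS rows_still_ending_in_nul
FROM   PES_PUESTOS_INTERNOS
WHERE  SUBSTR(PIN_DESCRIPCION, -1) = CHR(0);
```

Cleaning the data once does not stop new NUL-terminated values from arriving, so the application that writes the column is worth checking as well.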

Oracle returns wrong values with LENGTH and INSTR

I'm aiming to retrieve the position of chars in a string, plus the length of a string.
The value of the notes field in the internal_notes table, for the row with ticket_id equal to 1679467247 is literally 'this is a test note'.
When I use the functions against the literal strings they work, but when I retrieve the info directly from the table column, the values are just wrong.
Any ideas as to what might be happening?
select notes,
LENGTH(notes),
INSTR(notes,' ')
FROM internal_notes
where ticket_id = 1679467247
union
select 'this is a test note',
LENGTH('this is a test note'),
INSTR('this is a test note',' ')
from dual
This returns the following:
NOTES LENGTH(NOTES) INSTR(NOTES,' ')
------------------- ------------- ----------------
this is a test note 32 11
this is a test note 19 5
You can get this apparent inconsistency if you have zero-width characters in the value; for example:
create table internal_notes(ticket_id number, notes varchar2(32 char));
insert into internal_notes(ticket_id, notes)
values (1679467247, unistr('\200c\200cthis is a test note\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c'));
insert into internal_notes(ticket_id, notes)
values (1679467248, unistr('\200c\200cthis is a test note'));
insert into internal_notes(ticket_id, notes)
values (1679467249, 'this is a test note');
select notes,
LENGTH(notes),
INSTR(notes,' ')
FROM internal_notes
where ticket_id = 1679467247;
NOTES LENGTH(NOTES) INSTR(NOTES,'')
-------------------------------- ------------- ---------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌ 32 7
I said 'apparent inconsistency' because those numbers are correct; they just don't look it if you can't see some of the characters. Invisible characters still count.
As @MTO suggested, you can use the dump() function to see exactly what is stored in the table, in decimal or hex representation, or mixed with 'normal' characters, which can be a bit easier to interpret:
select notes,
LENGTH(notes),
INSTR(notes,' '),
dump(notes, 1000) as dmp
FROM internal_notes;
NOTES LENGTH(NOTES) INSTR(NOTES,'')
-------------------------------- ------------- ---------------
DMP
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌ 32 7
Typ=1 Len=58: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,
8c
‌‌this is a test note 21 7
Typ=1 Len=25: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e
this is a test note 19 5
Typ=1 Len=19: t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e
db<>fiddle - though that is showing the zero-width characters as question marks, unlike SQL Developer and SQL*Plus.
Other zero-width characters are available (space, non-joiner, joiner), and you might see something different in your dump - it just has to be something your client doesn't display at all. Whatever is in there, if it affects all rows and not just that single ticket, then how and why is probably down to whatever front-end/application populates the table - possibly from a character set mismatch, but it could be intentional. If it is just that ticket then that note is an interesting test...
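If the invisible characters turn out to be unwanted, they can be stripped before measuring. The sketch below assumes the sample internal_notes table from this answer and handles only U+200C (zero-width non-joiner); other zero-width code points would need their own REPLACE.

```sql
-- TO_CHAR(UNISTR(...)) builds the zero-width non-joiner in the database character set.
-- REPLACE with no third argument removes every occurrence of it.
SELECT ticket_id,
       LENGTH(notes)                                        AS stored_length,
       LENGTH(REPLACE(notes, TO_CHAR(UNISTR('\200C'))))     AS visible_length,
       INSTR(REPLACE(notes, TO_CHAR(UNISTR('\200C'))), ' ') AS first_space
FROM   internal_notes
ORDER  BY ticket_id;
```

With the three sample rows, the visible length and first-space position should come out the same for every row, matching what the literal string gives.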

Oracle convert column which is varchar / hex to numeric format after loading data from xlsx file

I loaded an Excel file into a table and found out that some data in my varchar2 field is in HEX format.
When I execute my query, I have no problem, but when I try to insert my data into another table with a number format it does not work.
This query shows which column is in HEX format :
SELECT qty, TO_NUMBER(REPLACE(qty, CHR(32), '')) as nbkg, RAWTOHEX(qty) as Graphics
FROM (
SELECT nvl(qty, 0) AS qty,
case
when pkg_tools.f_is_number(qty) = 1 then 'OK'
else 'NOK'
end kg
FROM table
)
WHERE kg = 'NOK';
*qty is a varchar2(50)
My output :
qty nbkg Graphics
--- ---- --------
10 009,000 10009,000 3130203030392C303030 -- work
3 250,00 3250,00 33203235302C3030 -- work
1 000,00 1000,00 31203030302C3030 -- work
1 230,00 1 230,00 31A03233302C3030 -- Not work
1 750,00 1 750,00 31A03735302C3030 -- Not work
4 000,00 4 000,00 34A03030302C3030 -- Not work
1 980,00 1 980,00 31A03938302C3030 -- Not work
1 050,00 1 050,00 31A03035302C3030 -- Not work
1 050,00 1 050,00 31A03035302C3030 -- Not work
1 000,00 1 000,00 31A03030302C3030 -- Not work
39 950,00 39 950,00 3339A03935302C3030 -- Not work
3 000,00 3 000,00 33A03030302C3030
...
...
I am trying to convert it into a number before inserting my data :
SELECT TO_NUMBER(REPLACE(qty, CHR(32), ''))
FROM table;
SELECT TO_NUMBER(REGEXP_REPLACE(qty, '\s'))
FROM table;
and I am getting an error :
ORA-01722: invalid number
How can I convert this column, which is varchar / hex, to numeric format?
Thank you.
Add a format mask and NLS information to the TO_NUMBER call.
It could look like this:
select to_number('1 250,000','999G999G999D99999','NLS_NUMERIC_CHARACTERS='', ''') as n
from dual
If you check the hex value, e.g. 31A03030302C3030, you can see A0 in the second position. It is displayed as a blank, but it is not a regular space, which has hex code 20 in the ASCII table; A0 (decimal 160) is most likely a non-breaking space. So just replace CHR(160) with CHR(32):
to_number(replace('1 250,000',chr(160),chr(32)),'999G999G999D99999','NLS_NUMERIC_CHARACTERS='', ''') as n
result:
n
------
1250
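Applied to the column itself, the idea might look like the sketch below. The table name qty_staging is hypothetical (the question only calls it "table"), and CHR(160) matching the A0 byte assumes a single-byte database character set, as the hex dump suggests. The second query relies on the DEFAULT ... ON CONVERSION ERROR clause, which is available from Oracle 12.2.

```sql
-- Convert every qty value, treating both A0 and regular spaces as group separators.
SELECT qty,
       TO_NUMBER(REPLACE(qty, CHR(160), CHR(32)),
                 '999G999G999D99999',
                 'NLS_NUMERIC_CHARACTERS='', ''') AS nbkg
FROM   qty_staging;

-- List values that still fail to convert, with their bytes, instead of raising ORA-01722.
SELECT qty, RAWTOHEX(qty) AS hex
FROM   qty_staging
WHERE  qty IS NOT NULL
AND    TO_NUMBER(REPLACE(qty, CHR(160), CHR(32)) DEFAULT NULL ON CONVERSION ERROR,
                 '999G999G999D99999',
                 'NLS_NUMERIC_CHARACTERS='', ''') IS NULL;
```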

Insert character to a number datatype column

Note: I can't change the datatype of the column.
I want to store a character string in a table that has a column with the number datatype.
The workaround I found is to convert the character values to ASCII and, when retrieving them from the database, convert them back to characters.
I tried a couple of functions, ASCII and ASCIISTR, but their limitation is that they convert only the first character of the string.
So I used the dump function:
select dump('Puneet_kushwah1') from dual;
Result: Typ=96 Len=15: 80,117,110,101,101,116,95,107,117,115,104,119,97,104,49
This function gives the ASCII value of all the characters. Then I executed the query below:
select replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',',' ') from dual;
Result: 80 117 110 101 101 116 95 107 117 115 104 119 97 104 49
Then I used a special character to fill the space, so that I can replace it when retrieving from the database.
select replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',','040') from dual;
Result: 80040117040110040101040101040116040950401070401170401150401040401190409704010404049
Table definition:
create table test (no number);
Then I inserted it into the table:
INSERT into test SELECT replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',','040') from dual;
Problem 1:
When I execute
select * from test;
I get
Result: 8.004011704011E82
I want to get it back as a plain number, exactly as I inserted it.
Problem 2:
When I execute the select, I want it to return the exact character string.
Please help; I have tried many functions.
Thanks in advance.
You can't get the exact string back because Oracle numbers are only stored up to 38 digits of precision.
So if you run this:
select cast(no as varchar2(100))
from test;
You'll get:
80040117040110040101040101040116040950400000000000000000000000000000000000000000000
While I advise against proceeding like this, as it is ripe for errors and a possible maintenance nightmare, I do like a challenge, and I have been forced to do some screwy things myself to make some vendor's bizarre way of doing things work for us, so I sympathize with you if that is the case. So, for the fun of it, check this out.
Convert to hex, then to a decimal and insert into the database (x_test has one NUMBER column), then select, converting back:
SQL> insert into x_test
2 select to_number(rawtohex('Puneet_kushwah1'), rpad('X', length(rawtohex('Puneet_kushwah1')), 'X')) from dual;
1 row created.
SQL> select * from x_test;
col1
----------
4.1777E+35
SQL> SELECT utl_raw.cast_to_varchar2(hextoraw(trim(to_char(col1, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))))
2 FROM x_test;
UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW(TRIM(TO_CHAR(col1,'XXXXXXXXXXXXXXXXXXXXXXXXXXXX
--------------------------------------------------------------------------------
Puneet_kushwah1
SQL>
While it's a horrible idea and a horrible data model, you could convert some strings into numbers by converting their raw representation into a number:
create or replace function string_to_number(p_string varchar2)
return number as
l_raw raw(40);
l_number number;
begin
l_raw := utl_i18n.string_to_raw(data => p_string, dst_charset => 'AL32UTF8');
l_number := to_number(rawtohex(l_raw), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
return l_number;
end;
/
And back again:
create or replace function number_to_string(p_number number)
return varchar2 as
l_raw raw(40);
l_string varchar2(20);
begin
l_raw := hextoraw(to_char(p_number, 'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'));
l_string := utl_i18n.raw_to_char(data => l_raw, src_charset => 'AL32UTF8');
return l_string;
end;
/
Which you could use as:
insert into test (no) values (string_to_number('Puneet_kushwah1'));
1 row inserted.
select * from test;
NO
---------------------------------------
417765537084927079232028220523112497
select number_to_string(no) from test;
NUMBER_TO_STRING(NO)
--------------------------------------------------------------------------------
Puneet_kushwah1
You don't really need functions, you could do the conversions in-line; but this makes what's happening a bit clearer.
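But you're restricted by the precision of the number type. A NUMBER holds roughly 38 significant decimal digits and every byte needs two hex digits, so I think you're limited to about 15 or 16 single-byte characters, but it'll depend a bit on the actual string and its hex representation.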
(I am not endorsing this approach, it's just a mildly interesting problem).
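To make the precision limit fail loudly rather than silently corrupt data, the conversion could refuse strings that are too long. The function below is a hypothetical variant of string_to_number, not part of the original answer. The 15-byte cut-off is an assumption based on arithmetic: 15 bytes are 30 hex digits, at most about 1.3e36, which fits within 38 significant digits.

```sql
create or replace function string_to_number_checked(p_string varchar2)
return number as
  l_raw raw(2000);
begin
  l_raw := utl_i18n.string_to_raw(data => p_string, dst_charset => 'AL32UTF8');
  -- Reject anything longer than 15 bytes: it may not survive the round trip.
  if utl_raw.length(l_raw) > 15 then
    raise_application_error(-20001, 'string too long to round-trip through NUMBER');
  end if;
  -- 30 hex digits are enough for 15 bytes.
  return to_number(rawtohex(l_raw), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
end;
/
```

Note that the limit is in bytes, so a string with multi-byte characters reaches it with fewer than 15 characters.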

Sequences with PL SQL

I know how to create a sequence in PL/SQL. However, how would I make all the values have, say, 3 digits? Is there another SQL statement to do this when I create the sequence?
An example would be:
000
001
012
003
Thanks guys!
First, just to be clear, you do not create sequences in PL/SQL. You can only create sequences in SQL.
Second, if you want a column to store exactly three digits, you would need the data type to be VARCHAR2 (or some other string type) rather than the more common NUMBER since a NUMBER by definition does not store leading zeroes. You can, of course, do that, but it would be unusual.
That said, you can use the "fm009" format mask to generate a string with exactly 3 characters from a numeric sequence (the "fm" bit is required to ensure that you don't get additional spaces-- you could TRIM the result of the TO_CHAR call as well and dispense with the "fm" bit of the mask).
SQL> create table t( col1 varchar2(3) );
Table created.
SQL> create sequence t_seq;
Sequence created.
SQL> ed
Wrote file afiedt.buf
1 insert into t
2 select to_char( t_seq.nextval, 'fm009' )
3 from dual
4* connect by level <= 10
SQL> /
10 rows created.
SQL> select * from t;
COL
---
004
005
006
007
008
009
010
011
012
013
10 rows selected.
haven't used plsql in a while, but here goes:
given an integer sequence myseq,
to_char(myseq.nextval, '009')
You can also use the lpad function.
In Oracle/PLSQL, the lpad function pads the left-side of a string with a specific set of characters.
For example:
lpad('tech', 7); would return ' tech'
lpad('tech', 2); would return 'te'
lpad('tech', 8, '0'); would return '0000tech'
lpad('tech on the net', 15, 'z'); would return 'tech on the net'
lpad('tech on the net', 16, 'z'); would return 'ztech on the net'
In your example you would use something like
lpad('tech', 8, '0'); which returns '0000tech'
i.e. if the string is less than 8 characters long, 0s are added to the start of the string until it is 8 characters long.
Ref: http://www.techonthenet.com/oracle/functions/lpad.php
Also, to add the 0s to the right you can use the rpad function.
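Applied to a sequence, the lpad approach might look like the sketch below. It assumes the t_seq sequence from the earlier answer. The second query illustrates how the two approaches differ once a value needs more than three digits.

```sql
-- Pad the next sequence value with leading zeros to a width of 3.
SELECT LPAD(TO_CHAR(t_seq.NEXTVAL), 3, '0') AS padded
FROM   DUAL;

-- Overflow behaviour: LPAD silently truncates to the first 3 characters,
-- while TO_CHAR with a too-short mask returns a string of # characters.
SELECT LPAD(TO_CHAR(1000), 3, '0') AS lpad_result,     -- '100'
       TO_CHAR(1000, 'fm009')      AS to_char_result
FROM   DUAL;
```

Because the truncation is silent, the TO_CHAR mask is arguably the safer choice if the sequence can ever exceed 999.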