Same string but different byte length - sql

I'm having some issues here trying to select some data.
I have a trigger that inserts into another table, but only if the string value doesn't already exist there, so I validate before the insert by counting:
SELECT COUNT(*) INTO v_puesto_x_empresa FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND PUE_NOMBRE = v_puesto_nombre;
If the count above is 0, the process is allowed to insert the data into the corresponding table.
As it turns out, it is duplicating the data for some strange reason, so I checked the source.
I use a cursor that prepares the data I need to insert, and I noticed that for some strings, even though they are the same, it treats them as different strings.
select
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci')))) PIN_DESCRIPCION,
LENGTHB(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) LEGTHB,
LENGTH(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "NORMAL LENGTH",
LENGTHC(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH C",
LENGTH2(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH 2"
FROM PES_PUESTOS_INTERNOS
where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by PIN_DESCRIPCION
order by PIN_DESCRIPCION asc
;
These are the results, as text:
PIN_DESCRIPCION
----------------------------------------------------------------------
LEGTHB NORMAL LENGTH LENGTH C LENGTH 2
---------- ------------- ---------- ----------
ADMINISTRADOR DE PROCESOS
27 27 27 27
ADMINISTRADOR DE PROCESOS Y CALIDAD
36 36 36 36
AFORADOR
9 9 9 9
AFORADOR
10 10 10 10
ASISTENTE ADMINISTRATIVO
25 25 25 25
ASISTENTE ADMINISTRATIVO
26 26 26 26
So my guess is that, even though they look the same, they are somehow treated as different internally.
Note: this table loads user input data, so values that are meant to be the 'same word' may differ in case or similar details, such as:
user input 1: aforador
user input 2: Aforador
For that reason, I applied the following piece of code, so that each word (string) is processed in a single form:
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))
So, if for example, I query the same data without that, I would get the following result:
PIN_DESCRIPCION
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
administrador de procesos
administrador de procesos
Administrador de Procesos y Calidad
Aforador
aforador
aforador
I'll appreciate any help with this issue.
Thanks in advance.
My best regards.

These strings are not the same. The second one has an extra character that is not being displayed.
If you do:
SELECT PIN_DESCRIPCION,
DUMP(PIN_DESCRIPCION)
FROM PES_PUESTOS_INTERNOS
WHERE pin_codemp = '8F90CF5D287E2419E0530200000AA716'
GROUP BY PIN_DESCRIPCION
ORDER BY PIN_DESCRIPCION asc;
Then you will see the binary values that comprise the data and should see that there is an extra trailing character. This could be:
Whitespace (which you can remove with RTRIM);
A zero-width character;
A NUL (ASCII 0) character;
Or something else that is not being displayed.
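One way to find the affected rows before deciding how to clean them is to compare each value's length with its length after stripping a set of suspect trailing characters. This is only a minimal sketch: the set of characters passed to RTRIM (NUL, tab, line feed, carriage return, space) is an assumption, and anything outside that set, such as a zero-width character, would not be caught.

```sql
-- List rows whose value ends in at least one suspect, non-displayed character.
-- The character set given to RTRIM is an assumption; extend it as needed.
SELECT pin_descripcion,
       DUMP(pin_descripcion) AS dumped
FROM   pes_puestos_internos
WHERE  LENGTH(pin_descripcion) >
       NVL(LENGTH(RTRIM(pin_descripcion,
                        CHR(0) || CHR(9) || CHR(10) || CHR(13) || ' ')), 0);
```

The NVL covers values made up entirely of such characters, for which RTRIM returns NULL.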
For example, if you have a function to create a string from bytes (passed as hexadecimal values):
CREATE FUNCTION createString( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then we can create the sample data:
CREATE TABLE table_name ( value VARCHAR2(50) )
/
INSERT INTO table_name ( value )
SELECT 'AFORADOR' FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '00' ) FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '01' ) FROM DUAL
/
Then, if we use:
SELECT value,
DUMP( value ) AS dump,
LENGTH(TRIM(value)) AS length
FROM table_name
This outputs:
| VALUE | DUMP | LENGTH |
|-----------|----------------------------------------|--------|
| AFORADOR | Typ=1 Len=8: 65,70,79,82,65,68,79,82 | 8 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,0 | 9 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,1 | 9 |
(Yes, the alignment of the table is messed up ... that's because of the unprintable characters in the string.)
Update:
From comments:
What is the output of SELECT DUMP(pin_descripcion) FROM <rest of your query> for those rows?
These are the results:
Aforador Typ=1 Len=8: 65,102,111,114,97,100,111,114
aforador Typ=1 Len=9: 97,102,111,114,97,100,111,114,0
Look at the difference in the byte values.
The first characters are 65 (or A) and 97 (or a); however, by using UPPER, you are masking that difference in your output. If you use UPPER in the GROUP BY then the difference in case will not matter.
Additionally, the second one has an extra byte at the end with the value 0 (ASCII NUL) so, regardless of case, they are different strings. This last character is not printable, so you won't see the difference in the output, but it means that GROUP BY will not aggregate the two values together. (NUL is also used as a string terminator, so it may have unexpected side effects; for example, trying to display a NUL character in a db<>fiddle hangs the output.)
You can do:
CREATE FUNCTION createStringFromHex( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then, to remove the NUL string terminators:
UPDATE PES_PUESTOS_INTERNOS
SET PIN_DESCRIPCION = RTRIM( PIN_DESCRIPCION, createStringFromHex( '00' ) )
WHERE SUBSTR( PIN_DESCRIPCION, -1 ) = createStringFromHex( '00' );
Then you should be able to do:
SELECT UPPER(PIN_DESCRIPCION) AS PIN_DESCRIPCION,
LENGTH(UPPER(PIN_DESCRIPCION)) AS LENGTH
FROM PES_PUESTOS_INTERNOS
--where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by UPPER(PIN_DESCRIPCION)
order by UPPER(PIN_DESCRIPCION) asc;
and only see single rows.
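If updating the stored data is not an option, the same normalization could be applied at comparison time instead. The sketch below uses inline sample rows (an assumption, standing in for the real table) and treats trailing NUL characters and spaces as insignificant; with that normalization both sample values should fall into a single group.

```sql
-- Sample rows are hypothetical stand-ins for PES_PUESTOS_INTERNOS values.
WITH samples (val) AS (
  SELECT 'Aforador'           FROM dual UNION ALL
  SELECT 'aforador' || CHR(0) FROM dual
)
SELECT UPPER(RTRIM(val, CHR(0) || ' ')) AS normalized,
       COUNT(*)                         AS cnt
FROM   samples
GROUP  BY UPPER(RTRIM(val, CHR(0) || ' '));
```

In the trigger, the same expression would be applied to both sides of the PUE_NOMBRE = v_puesto_nombre comparison.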

Related

How to include commas in Oracle external table

I have a pipe delimited file that contains commas on the very last field like so:
COLOR|CAT|CODES
Red|Pass|tiger, 12#fol, letmein
Blue|Pass|jkd#332, forpw, wonton
Gray|Pass|rochester, tommy, 23$ai,
I terminate the last column by whitespace, and everything works with no errors, except that it only reads the first value and first comma in the last column (e.g. tiger, or jkd#332,), obviously because of the whitespace after the comma.
How do I include the commas without getting any errors? I have tried " ", /r, /n, /r/n and even excluding the "terminated by" in the last column, and while those will work to include the commas, I will get the ORA-29913 and ORA-30653 reject error every time I select all from the external table (contains thousands of records).
I have the reject limit to 10, but I don't want to change it to UNLIMITED because I don't want to ignore those errors, also I cannot change the file.
My code:
--etc..
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
--etc..
CODES CHAR TERMINATED BY WHITESPACE
Here's how:
SQL> create table color (
2 color varchar2(5),
3 cat varchar2(5),
4 codes varchar2(50)
5 )
6 organization external (
7 type oracle_loader
8 default directory ext_dir
9 access parameters (
10 records delimited by newline
11 skip 1
12 fields terminated by '|'
13 missing field values are null
14 (
15 color char(5),
16 cat char(5),
17 codes char(50)
18 )
19 )
20 location ('color.txt')
21 )
22 parallel 5
23 reject limit unlimited;
SQL>
SQL> select * From color;
COLOR CAT CODES
----- ----- --------------------------------------------------
Red Pass tiger, 12#fol, letmein
Blue Pass jkd#332, forpw, wonton
Gray Pass rochester, tommy, 23$ai,
SQL>

How to count length of line break as 2 characters in oracle sql

I have data stored in table's column and it has a line break in the data. When I count the length of the string it returns me the count just fine. I want to make some changes and take the line break as 2 characters so if the data in table is something like this.
This
That
This should return a length of 10; instead it currently returns 9, which is understandable, but I want to count each line break as 2 characters. So if there are 2 line breaks in the data, they will count as 4 characters.
How can I achieve this ?
I want to use this in SUBSTR(COL, 1, 7)
By counting line break as 2 character it should return data like this
This
T
Hope someone can help
Just replace new line in the string with 2 characters, for example 'xx', before counting string length. More info on how to replace new lines in Oracle: Oracle REPLACE() function isn't handling carriage-returns & line-feeds
Update your value to have a carriage return character (CR, CHR(13)) before the line feed character (LF, CHR(10)).
So if you have the table:
CREATE TABLE test_data ( value VARCHAR2(20) );
INSERT INTO test_data ( value ) VALUES ( 'This
That' );
Then you can insert the CR before the LF:
UPDATE test_data
SET value = REPLACE( value, CHR(10), CHR(13) || CHR(10) )
WHERE INSTR( value, CHR(10) ) > 0
Then your query:
SELECT SUBSTR( value, 1, 7 ) FROM test_data;
Outputs:
| SUBSTR(VALUE,1,7) |
| :---------------- |
| This |
| T |
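If the stored data should stay unchanged, the same replacement could be done on the fly in the query instead of with an UPDATE. A minimal sketch, assuming the line breaks are stored as a bare line feed (CHR(10)); if some rows already contain CR followed by LF, this would add a second CR, so those rows would need to be excluded or handled first.

```sql
-- Inline sample row mirroring the test_data example; not the real table.
WITH test_data (value) AS (
  SELECT 'This' || CHR(10) || 'That' FROM dual
)
SELECT LENGTH(value)                                             AS stored_length,
       LENGTH(REPLACE(value, CHR(10), CHR(13) || CHR(10)))       AS counted_length,
       SUBSTR(REPLACE(value, CHR(10), CHR(13) || CHR(10)), 1, 7) AS first_seven
FROM   test_data;
```

Here stored_length counts the line break once and counted_length counts it twice.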

Oracle convert column which is varchar / hex to numeric format after loading data from xlsx file

I loaded an Excel file into a table and found out that some data in my varchar2 field is in HEX format.
When I execute my query, I have no problem, but when I try to insert my data into another table with a number format it does not work.
This query shows which column is in HEX format :
SELECT qty, TO_NUMBER(REPLACE(qty, CHR(32), '')) as nbkg, RAWTOHEX(qty) as Graphics
FROM (
SELECT nvl(qty, 0) AS qty,
case
when pkg_tools.f_is_number(qty) = 1 then 'OK'
else 'NOK'
end kg
FROM table
)
WHERE kg = 'NOK';
*qty is a varchar2(50)
My output :
qty nbkg Graphics
--- ---- --------
10 009,000 10009,000 3130203030392C303030 -- work
3 250,00 3250,00 33203235302C3030 -- work
1 000,00 1000,00 31203030302C3030 -- work
1 230,00 1 230,00 31A03233302C3030 -- Not work
1 750,00 1 750,00 31A03735302C3030 -- Not work
4 000,00 4 000,00 34A03030302C3030 -- Not work
1 980,00 1 980,00 31A03938302C3030 -- Not work
1 050,00 1 050,00 31A03035302C3030 -- Not work
1 050,00 1 050,00 31A03035302C3030 -- Not work
1 000,00 1 000,00 31A03030302C3030 -- Not work
39 950,00 39 950,00 3339A03935302C3030 -- Not work
3 000,00 3 000,00 33A03030302C3030
...
...
I am trying to convert it into a number before inserting my data :
SELECT TO_NUMBER(REPLACE(qty, CHR(32), ''))
FROM table;
SELECT TO_NUMBER(REGEXP_REPLACE(qty, '\s'))
FROM table;
and I am getting an error :
ORA-01722: invalid number
How can i convert this column which is varchar / hex to numeric format?
Thank you.
Add a format mask and NLS information when calling the to_number function.
So it could look like:
select to_number('1 250,000','999G999G999D99999','NLS_NUMERIC_CHARACTERS='', ''') as n
from dual
If you check a hex value such as 31A03030302C3030, you can see A0 in the second position. That byte (decimal 160) is displayed like a blank but is not an ordinary space, which is hex 20 (decimal 32) in the ASCII table. So just replace CHR(160) with CHR(32):
to_number(replace('1 250,000',chr(160),chr(32)),'999G999G999D99999','NLS_NUMERIC_CHARACTERS='', ''') as n
result:
n
------
1250
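Since the separators carry no numeric meaning, another option is to strip both kinds of space before converting, so the format mask no longer needs group separators. This is a sketch under assumptions: the sample rows are made up to mirror the question's data, and CHR(160) is assumed to yield the A0 byte seen in the hex dump, which holds for a single-byte database character set.

```sql
-- Hypothetical sample values: one with the A0 byte, one with an ordinary space.
WITH samples (qty) AS (
  SELECT '1' || CHR(160) || '230,00' FROM dual UNION ALL
  SELECT '10 009,000'                FROM dual
)
SELECT TO_NUMBER(
         REPLACE(REPLACE(qty, CHR(160)), CHR(32)),  -- drop both space variants
         '999999999D99999',
         'NLS_NUMERIC_CHARACTERS='',.'''            -- comma as decimal separator
       ) AS n
FROM   samples;
```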

oracle sql statement to parse string

I have a database field that stores a password with junk characters between each letter in the password: 3 junk chars, then 2 junk chars, then 3 junk chars, etc. There will be 3 junk chars at the start of the password and 2 or 3 junk chars at the end.
So if password is BOB, the db value will be xxxBxxOxxxBxx where x is a random character.
Is there a way to return BOB in an oracle select statement using substrings,etc?
Thanks for anyone up for this challenge
You can use the regular expression ...((.)..(.)?)? and just keep the 2nd and 3rd capture groups:
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( password ) AS
SELECT 'xxx' FROM DUAL UNION ALL
SELECT 'xxxBxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxxOxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxxOxxBxxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxxOxxBxxxBxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxxOxxBxxxBxxOxxx' FROM DUAL UNION ALL
SELECT 'xxxBxxOxxxBxxBxxxOxxBxxxBxxOxxxBxx' FROM DUAL;
Query 1:
SELECT REGEXP_REPLACE(
password,
'...((.)..(.)?)?',
'\2\3'
) As password
FROM table_name
Results:
| PASSWORD |
|-----------|
| (null) |
| B |
| BO |
| BOB |
| BOBB |
| BOBBO |
| BOBBOB |
| BOBBOBB |
| BOBBOBBO |
| BOBBOBBOB |
If you have alphanumeric characters as junk as well and you know your maximum password length then you could do it the dirty way using substr() function. I've generated numbers with 2 and 3 letters gap from 4 to 98 first and cross joined it to table which stores passwords to avoid typing each number by hand. This will cover passwords up to 28 characters. Feel free to play with that.
Test data
create table t(pw varchar(255));
insert into t values ('xxxBxxOxxxBxxFxxxIxxVxxxExx!xxx');
insert into t values ('xxxPxxAxxxSxxSxxx');
Solution
Uses internal table to generate values used as input for substring function, cross join to apply each substring and then listagg to combine it again
with lookup as (
select column_value as nr
from table(sys.odcinumberlist(4,7,11,14,18,21,25,28,32,35,39,42,46,49,53,56,60,63,67,70,74,77,81,84,88,91,95,98))
)
select listagg(substr(t.pw, l.nr, 1), '') within group(ORDER BY l.nr) as password
from lookup l
cross join t
group by t.pw;
Output
password
--------
BOBFIVE!
PASS
This solution may take a bit of time to process for many rows.
Here is a PL/SQL solution that will take a password of any size:
CREATE OR REPLACE FUNCTION decode_password (p_password IN VARCHAR2)
RETURN VARCHAR2
DETERMINISTIC
AS
l_ret VARCHAR2 (100);
l_pos INT := 4;
BEGIN
WHILE LENGTH (p_password) >= l_pos
LOOP
l_ret := l_ret || SUBSTR (p_password, l_pos, 1);
l_pos := l_pos + 3;
l_ret := l_ret || SUBSTR (p_password, l_pos, 1);
l_pos := l_pos + 4;
END LOOP;
RETURN l_ret;
END decode_password;
To test it:
WITH
aset
AS
(SELECT 'BBBBRRRRIIIAAAANNN' pwd
FROM DUAL)
SELECT pwd, decode_password (pwd) decoded
FROM aset;
The result is
BBBBRRRRIIIAAAANNN BRIAN
Reusing the idea from here, though in this case the REGEXP_REPLACE() approach from MTO is definitely the best one.
First we generate CTE as a number table with integer from 0 to the maximum of plaintext characters of a password minus 1.
We can get the number of cleartext characters in an obfuscated string as follows:
We can ignore the first 3 characters, those will always be garbage.
If the number of plaintext characters is even, there will be 7 characters for each 2 plaintext characters, because the total of the garbage characters is 5 for 2 plaintext characters.
So we get the number of plaintext characters with FLOOR((LENGTH(PASSWORDS.PASSWORD) - 3) / 7) * 2 if it is even.
If it is odd, it is one more than an even number of plaintext characters and the string's length minus 3 is no longer divisible by 7, because the last plaintext character will be followed by only 2 garbage characters.
So we can check for the string length minus 3 modulo 7. If it is 0 the number of plaintext characters is even, we don't need to add anything. If it isn't 0, we add 1 and get the total (odd) number of plaintext characters.
Together that's FLOOR((LENGTH(PASSWORDS.PASSWORD) - 3) / 7) * 2 + DECODE(MOD(LENGTH(PASSWORDS.PASSWORD) - 3, 7), 0, 0, 1).
We left join that CTE to the passwords table, so that for each cleartext character of a password there is a row with the password and a number from 0 to the number of cleartext characters of the password minus 1.
We can now use SUBSTR() to get each cleartext character. The base offset is 4, as we can ignore the first three characters. The joined number from the CTE let us calculate the additional offset. We always advance by at least 3 characters, which gives us CTE.I * 3. Additionally every 2 cleartext characters we need to further advance 1 time, so we add FLOOR(CTE.I / 2) giving us SUBSTR(PASSWORDS.PASSWORD, 4 + CTE.I * 3 + FLOOR(CTE.I / 2), 1).
Now we have every single cleartext character, but in different rows. To concatenate them back together we group by the obfuscated password (and possibly by an ID too, should there be more than one row in the base table with the same password) and use LISTAGG. Ordering by the number from the CTE makes sure every plaintext character gets the right position.
WITH CTE(I)
AS
(
SELECT 0 I
FROM DUAL
UNION ALL
SELECT CTE.I + 1
FROM CTE
WHERE CTE.I + 1 < (SELECT MAX(FLOOR((LENGTH(PASSWORDS.PASSWORD) - 3) / 7) * 2
+ DECODE(MOD(LENGTH(PASSWORDS.PASSWORD) - 3, 7),
0, 0,
1))
FROM PASSWORDS)
)
SELECT PASSWORDS.ID,
PASSWORDS.PASSWORD PASSWORD_OBFUSCATED,
LISTAGG(SUBSTR(PASSWORDS.PASSWORD, 4 + CTE.I * 3 + FLOOR(CTE.I / 2), 1))
WITHIN GROUP (ORDER BY CTE.I) PASSWORD_CLEARTEXT
FROM PASSWORDS
LEFT JOIN CTE
ON CTE.I < FLOOR((LENGTH(PASSWORDS.PASSWORD) - 3) / 7) * 2
+ DECODE(MOD(LENGTH(PASSWORDS.PASSWORD) - 3, 7),
0, 0,
1)
GROUP BY PASSWORDS.ID,
PASSWORDS.PASSWORD;
Note: This demonstrates that an attacker does not even need to guess the random character (or characters; that wouldn't make a difference) to get the cleartext password. Ergo this is an unsafe method to store passwords! Use hashing instead, which is (most likely, depending on the algorithm) irreversible.

Oracle SQL My input should be the format mask in to_char(:n,'fm000')

If my input has more than 4 digits, the output is displayed as #######
For that the query is:
select to_char(:n,'fm0000') from dual;
It should take the number of zeros based on the input bind variables length after fm.
INPUT : 123
OUTPUT: 0123
INPUT : 123456789
OUTPUT: 0123456789
Zero should come before the number of input.
Any Suggestions!!
Edit - the OP changed the requirement completely.
The solution to the edited problem is trivial:
... '0' || to_char(:n)
End of edit - Original answer (to the question as originally posted) below.
It seems what you are trying to do is best achieved like this. Never mind the with clause, that is only for testing. Adapt as needed, and use enough nines in the format model to cover your longest inputs.
with
test_data ( n ) as (
select 3 from dual union all
select 19923 from dual
)
select n, to_char(n, 'fm99990000') as n_str
from test_data
;
N N_STR
------ --------
3 0003
19923 19923
2 rows selected.
According to documentation
If you omit fmt, then n is converted to a VARCHAR2 value exactly long
enough to hold its significant digits
so perhaps all you need is to_char(:n)
Based on your updated requirement, you can do this with a slight modification of @mathguy's original format mask manipulation; or with default conversion to a string and left-padding; or even more simply by just prepending a single zero:
with t (n) as (
select 1 from dual
union all select 123 from dual
union all select 123456789 from dual
)
select n,
to_char(n, 'fm' || rpad('0', length(n) + 1, '0')) as n_str_1,
lpad(to_char(n), length(n) + 1, '0') as n_str_2,
'0' || to_char(n) as n_str_3
from t;
N N_STR_1 N_STR_2 N_STR_3
---------- ---------- ---------- ----------
1 01 01 01
123 0123 0123 0123
123456789 0123456789 0123456789 0123456789
If the field you are checking is a string:
case when substr(field, 1, 1) != '0' then '0' || field else field end
If it's a number you can always add a 0 as proposed by others:
'0'||to_char(field)