How to include commas in Oracle external table - sql

I have a pipe delimited file that contains commas on the very last field like so:
COLOR|CAT|CODES
Red|Pass|tiger, 12#fol, letmein
Blue|Pass|jkd#332, forpw, wonton
Gray|Pass|rochester, tommy, 23$ai,
I terminate the last column by whitespace, and everything works out fine with no errors, except that it only reads up to the first value and first comma in the last column, e.g. tiger, or jkd#332, - obviously because of the whitespace after the comma.
How do I include the commas without getting any errors? I have tried ' ', '\r', '\n', '\r\n' and even leaving out the "terminated by" clause on the last column, and while those do include the commas, I get the ORA-29913 and ORA-30653 reject errors every time I select everything from the external table (it contains thousands of records).
I have the reject limit set to 10, but I don't want to change it to UNLIMITED because I don't want to ignore those errors; also, I cannot change the file.
My code:
--etc..
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
--etc..
CODES CHAR TERMINATED BY WHITESPACE

Here's how - terminate the fields by '|' only, and don't terminate the last column by whitespace, so it runs to the end of the record:
SQL> create table color (
2 color varchar2(5),
3 cat varchar2(5),
4 codes varchar2(50)
5 )
6 organization external (
7 type oracle_loader
8 default directory ext_dir
9 access parameters (
10 records delimited by newline
11 skip 1
12 fields terminated by '|'
13 missing field values are null
14 (
15 color char(5),
16 cat char(5),
17 codes char(50)
18 )
19 )
20 location ('color.txt')
21 )
22 parallel 5
23 reject limit unlimited;
SQL>
SQL> select * From color;
COLOR CAT CODES
----- ----- --------------------------------------------------
Red Pass tiger, 12#fol, letmein
Blue Pass jkd#332, forpw, wonton
Gray Pass rochester, tommy, 23$ai,
SQL>

OPTIONALLY ENCLOSED BY - can we use it in a SQL query, or what is an alternative solution?

oracle 12.2.9
db version 18c
We are getting a .csv (comma-separated) file from an external source, and need to split_string it into a TABLE type array and then insert it into an interface table.
But as I can see in the .csv file, the amount fields have "," inside the amount, i.e. "71,007,498.00".
I have this value: "71,007,498.00",0.00,0.00,"71,007,498.00",
So while splitting this value, it should be like:
lv_data_tbl := split_string('"71,007,498.00",0.00,0.00,"71,007,498.00",' , ',');
Expected output:
lv_data_tbl(1)=71,007,498.00
lv_data_tbl(2)=0.00
lv_data_tbl(3)=0.00
lv_data_tbl(4)=71,007,498.00
But I am getting this output:
lv_data_tbl(1)=71
lv_data_tbl(2)=007
lv_data_tbl(3)=498.00
lv_data_tbl(4)=0.00
lv_data_tbl(5)=0.00
lv_data_tbl(6)=71
lv_data_tbl(7)=007
lv_data_tbl(8)=498.00
I guess it can be done in SQL by parsing that string, but - why wouldn't you use the external tables feature? As it uses SQL*Loader in the background, it lets you use the OPTIONALLY ENCLOSED BY parameter.
For example, these are the contents of my source data (the filename is test_quotes.csv):
1,"71,007,498.00",0.00,0.00,"71,007,498.00"
2,15.00,"12,332.08","8.13",2.82
Let's create the external table. It requires you to have access to a directory (line #11) - an Oracle object that points to a filesystem directory containing the file. If you're not sure how to get one, talk to your DBA; if there's none, say so:
SQL> CREATE TABLE test_ext
2 (
3 id NUMBER,
4 val1 VARCHAR2 (15),
5 val2 VARCHAR2 (15),
6 val3 VARCHAR2 (15),
7 val4 VARCHAR2 (15)
8 )
9 ORGANIZATION EXTERNAL
10 (TYPE oracle_loader
11 DEFAULT DIRECTORY kcdba_dpdir
12 ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
13 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
14 (id,
15 val1,
16 val2,
17 val3,
18 val4))
19 LOCATION ('test_quotes.csv'))
20 REJECT LIMIT UNLIMITED;
Table created.
Any data?
SQL> SELECT * FROM test_ext;
ID VAL1 VAL2 VAL3 VAL4
---------- --------------- --------------- --------------- ---------------
1 71,007,498.00 0.00 0.00 71,007,498.00
2 15.00 12,332.08 8.13 2.82
SQL>
Fine; it works with no effort at all. OK, with a little bit of effort.
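For completeness, the "parse it in SQL" route mentioned earlier could be sketched roughly like this (just an illustration, not the recommended approach; REGEXP_SUBSTR keeps a quoted token as one field, and this simple pattern skips genuinely empty fields):
-- split one quoted CSV line; "..." is kept as a single token, then the quotes are trimmed off
WITH data AS (
  SELECT '"71,007,498.00",0.00,0.00,"71,007,498.00",' AS str FROM dual
)
SELECT LEVEL AS field_no,
       TRIM(BOTH '"' FROM REGEXP_SUBSTR(str, '("[^"]*"|[^,]+)', 1, LEVEL)) AS field
  FROM data
CONNECT BY LEVEL <= REGEXP_COUNT(str, '("[^"]*"|[^,]+)');
That should return the four expected values, with the enclosing quotes trimmed off.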
On the other hand, you could use SQL*Loader itself - write a control file and load the data. It is really, really fast. And you don't have to have access to any database directory - the source file can reside on your own hard disk.
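A minimal control file for the same test_quotes.csv might look something like this (a sketch only - the target table name test_loaded and the connect string below are placeholders, not from the example above):
-- test_quotes.ctl: same data, loaded with SQL*Loader instead of an external table
LOAD DATA
INFILE 'test_quotes.csv'
APPEND
INTO TABLE test_loaded
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(id, val1, val2, val3, val4)
and you'd run it from the client machine with something like: sqlldr your_user/your_password control=test_quotes.ctl log=test_quotes.log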

Same string but different byte length

I'm having some issues here trying to select some data.
I have a trigger that inserts into another table, but only if the string value doesn't exist in that other table, so I validate before the insert by counting:
SELECT COUNT(*) INTO v_puesto_x_empresa FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND PUE_NOMBRE = v_puesto_nombre;
If the count above is 0, then the process allows inserting the data into the corresponding table.
As it turns out, it is duplicating the data for some strange reason, so I checked the source.
I use a cursor that prepares the data I need to insert, and I noticed that for some strings, even though they are the same, it treats them as different strings.
select
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci')))) PIN_DESCRIPCION,
LENGTHB(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) LENGTHB,
LENGTH(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "NORMAL LENGTH",
LENGTHC(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH C",
LENGTH2(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH 2"
FROM PES_PUESTOS_INTERNOS
where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by PIN_DESCRIPCION
order by PIN_DESCRIPCION asc
;
These are the results, as text:
PIN_DESCRIPCION
----------------------------------------------------------------------
LENGTHB NORMAL LENGTH LENGTH C LENGTH 2
---------- ------------- ---------- ----------
ADMINISTRADOR DE PROCESOS
27 27 27 27
ADMINISTRADOR DE PROCESOS Y CALIDAD
36 36 36 36
AFORADOR
9 9 9 9
AFORADOR
10 10 10 10
ASISTENTE ADMINISTRATIVO
25 25 25 25
ASISTENTE ADMINISTRATIVO
26 26 26 26
So my guess is that, even though they look the same, somehow they are treated as different internally.
Note: this table loads user-input data, so values that are meant to be the 'same word' may have some character differences, such as:
user input 1: aforador
user input 2: Aforador
For that reason, I applied the following piece of code, so I can process them as a single word (string):
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))
So if, for example, I query the same data without that, I get the following result:
PIN_DESCRIPCION
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
administrador de procesos
administrador de procesos
Administrador de Procesos y Calidad
Aforador
aforador
aforador
I'll appreciate any help with this issue.
Thanks in advance.
My best regards.
These strings are not the same. The second one has another character that is not (obviously) being displayed.
If you do:
SELECT PIN_DESCRIPCION,
DUMP(PIN_DESCRIPCION)
FROM PES_PUESTOS_INTERNOS
WHERE pin_codemp = '8F90CF5D287E2419E0530200000AA716'
GROUP BY PIN_DESCRIPCION
ORDER BY PIN_DESCRIPCION asc;
Then you will see the binary values that comprise the data and should see that there is an extra trailing character. This could be:
Whitespace (which you can remove with RTRIM);
A zero-width character;
A NUL (ASCII 0) character;
Or something else that is not being displayed.
For example, if you have a function to create a string from bytes (passed as hexadecimal values):
CREATE FUNCTION createString( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then we can create the sample data:
CREATE TABLE table_name ( value VARCHAR2(50) )
/
INSERT INTO table_name ( value )
SELECT 'AFORADOR' FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '00' ) FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '01' ) FROM DUAL
/
Then, if we use:
SELECT value,
DUMP( value ) AS dump,
LENGTH(TRIM(value)) AS length
FROM table_name
This outputs:
| VALUE | DUMP | LENGTH |
|-----------|----------------------------------------|--------|
| AFORADOR | Typ=1 Len=8: 65,70,79,82,65,68,79,82 | 8 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,0 | 9 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,1 | 9 |
(Yes, the alignment of the table is messed up ... that's because of the unprintable characters in the string.)
Update:
From comments:
What is the output of SELECT DUMP(pin_descripcion) FROM <rest of your query> for those rows?
These are the results:
Aforador Typ=1 Len=8: 65,102,111,114,97,100,111,114
aforador Typ=1 Len=9: 97,102,111,114,97,100,111,114,0
Look at the difference in the byte values.
The first characters are 65 (or A) and 97 (or a); however, by using UPPER, you are masking that difference in your output. If you use UPPER in the GROUP BY then the difference in case will not matter.
Additionally, the second one has an extra byte at the end with the value of 0 (ASCII NUL) so, regardless of case, they are different strings. This extra last character is not printable so you won't see the difference in the output (and is used as a string terminator so may have unexpected side-effects; for example, trying to display a NUL character in a db<>fiddle hangs the output) but it means that the strings are different so that GROUP BY will not aggregate them together.
You can do:
CREATE FUNCTION createStringFromHex( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then, to remove the NUL string terminators:
UPDATE PES_PUESTOS_INTERNOS
SET PIN_DESCRIPCION = RTRIM( PIN_DESCRIPCION, createStringFromHex( '00' ) )
WHERE SUBSTR( PIN_DESCRIPCION, -1 ) = createStringFromHex( '00' );
Then you should be able to do:
SELECT UPPER(PIN_DESCRIPCION) AS PIN_DESCRIPCION,
LENGTH(UPPER(PIN_DESCRIPCION)) AS LENGTH
FROM PES_PUESTOS_INTERNOS
--where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by UPPER(PIN_DESCRIPCION)
order by UPPER(PIN_DESCRIPCION) asc;
and only see single rows.

How to count the length of a line break as 2 characters in Oracle SQL

I have data stored in a table's column and it has a line break in the data. When I count the length of the string it returns the count just fine, but I want to count the line break as 2 characters. So if the data in the table is something like this:
This
That
This should return a length of 10; instead it returns 9, which is understandable, but I want to count the length of a line break as 2 characters. So if there are 2 line breaks in the data they should count as 4 characters.
How can I achieve this?
I want to use this in SUBSTR(COL, 1, 7).
By counting the line break as 2 characters it should return data like this:
This
T
Hope someone can help
Just replace each new line in the string with 2 characters, for example 'xx', before counting the string length. For more on how to replace new lines in Oracle, see: Oracle REPLACE() function isn't handling carriage-returns & line-feeds
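A minimal sketch of that idea (the table and column names below are placeholders, and CHR(13)||CHR(10) is used as the two replacement characters so the SUBSTR result still renders as a line break, assuming the stored breaks are plain LF):
-- count each LF as two characters by padding it before LENGTH/SUBSTR
SELECT LENGTH(REPLACE(col, CHR(10), CHR(13) || CHR(10))) AS padded_length, -- 10 for 'This'||CHR(10)||'That'
       SUBSTR(REPLACE(col, CHR(10), CHR(13) || CHR(10)), 1, 7) AS first_7  -- 'This', CR, LF, 'T'
  FROM my_table;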
Update your value to have a carriage return character before the line feed character.
So if you have the table:
CREATE TABLE test_data ( value VARCHAR2(20) );
INSERT INTO test_data ( value ) VALUES ( 'This
That' );
Then you can insert the CR before the LF:
UPDATE test_data
SET value = REPLACE( value, CHR(10), CHR(13) || CHR(10) )
WHERE INSTR( value, CHR(10) ) > 0
Then your query:
SELECT SUBSTR( value, 1, 7 ) FROM test_data;
Outputs:
| SUBSTR(VALUE,1,7) |
| :---------------- |
| This |
| T |

Oracle returns wrong values with LENGTH and INSTR

I'm aiming to retrieve the position of chars in a string, plus the length of a string.
The value of the notes field in the internal_notes table, for the row with ticket_id equal to 1679467247 is literally 'this is a test note'.
When I use the functions against the literal strings they work, but when I retrieve the info directly from the table column, the values are just wrong.
Any ideas as to what might be happening?
select notes,
LENGTH(notes),
INSTR(notes,' ')
FROM internal_notes
where ticket_id = 1679467247
union
select 'this is a test note',
LENGTH('this is a test note'),
INSTR('this is a test note',' ')
from dual
This returns the following:
NOTES LENGTH(NOTES) INSTR(NOTES,' ')
------------------- ------------- ----------------
this is a test note 32 11
this is a test note 19 5
You can get this apparent inconsistency if you have zero-width characters in the value; for example:
create table internal_notes(ticket_id number, notes varchar2(32 char));
insert into internal_notes(ticket_id, notes)
values (1679467247, unistr('\200c\200cthis is a test note\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c'));
insert into internal_notes(ticket_id, notes)
values (1679467248, unistr('\200c\200cthis is a test note'));
insert into internal_notes(ticket_id, notes)
values (1679467249, 'this is a test note');
select notes,
LENGTH(notes),
INSTR(notes,' ')
FROM internal_notes
where ticket_id = 1679467247;
NOTES LENGTH(NOTES) INSTR(NOTES,' ')
-------------------------------- ------------- ---------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌ 32 7
I said 'apparent inconsistency' because those numbers are correct; they just don't look it if you can't see some of the characters. Invisible characters still count.
As @MTO suggested, you can use the dump() function to see exactly what is stored in the table, in decimal or hex representation, or mixed with 'normal' characters, which can be a bit easier to interpret:
select notes,
LENGTH(notes),
INSTR(notes,' '),
dump(notes, 1000) as dmp
FROM internal_notes;
NOTES LENGTH(NOTES) INSTR(NOTES,' ')
-------------------------------- ------------- ---------------
DMP
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌ 32 7
Typ=1 Len=58: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,
8c
‌‌this is a test note 21 7
Typ=1 Len=25: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e
this is a test note 19 5
Typ=1 Len=19: t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e
(A db<>fiddle version of this shows the zero-width characters as question marks, unlike SQL Developer and SQL*Plus.)
Other zero-width characters are available (space, non-joiner, joiner), and you might see something different in your dump - it just has to be something your client doesn't display at all. Whatever is in there, if it affects all rows and not just that single ticket, then how and why is probably down to whatever front-end/application populates the table - possibly from a character set mismatch, but it could be intentional. If it is just that ticket then that note is an interesting test...
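If it does turn out to be the U+200C zero-width non-joiner, as in the dump above, the cleanup could be sketched like this (only a sketch - check what your own dump actually shows first):
-- strip U+200C characters; REPLACE with no third argument simply removes the matches
UPDATE internal_notes
   SET notes = REPLACE(notes, UNISTR('\200c'))
 WHERE INSTR(notes, UNISTR('\200c')) > 0;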

How can I set parameters in an external table?

I'm trying to create an external table from a csv file with SQL.
The csv file has this structure:
I001234
I012344
I000234
...
I wrote this code for the upload:
create table Arranger_check(
matriid char(8))
organization external (
type oracle_loader default directory ext_tab_data access parameters
(
records delimited by newline
)
location('file.csv')) reject limit unlimited;
If I query the table, the result is wrong: I get eight characters and the last one is a space (ASCII 32). The result is that a query with IN or NOT IN doesn't work.
matriid
--------
I001234
I012344
I000234
...
I've tried changing matriid char(8) to char(7), but then querying the external table returns 0 rows.
If you define your column as char(8) then it will always be padded with spaces, whether the value in the file has trailing spaces or not; if I make the file have a mix of values with and without trailing spaces:
create table Arranger_check(
matriid char(8))
...
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234 8 Typ=96 Len=8: 73,48,48,49,50,51,52,32
I012344 8 Typ=96 Len=8: 73,48,49,50,51,52,52,32
I000234 8 Typ=96 Len=8: 73,48,48,48,50,51,52,32
With varchar2 the column value will only have spaces if the file has trailing spaces, so with the same file with a mix you get varying lengths in the table:
create table Arranger_check(
matriid varchar2(8))
...
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234 8 Typ=1 Len=8: 73,48,48,49,50,51,52,32
I012344 7 Typ=1 Len=7: 73,48,49,50,51,52,52
I000234 7 Typ=1 Len=7: 73,48,48,48,50,51,52
If you want the column values to not have trailing spaces even if the file values do, you need to trim them off if they exist:
create table Arranger_check(
matriid varchar2(8))
organization external (
type oracle_loader default directory ext_tab_data access parameters
(
records delimited by newline
fields
(
matriid char(8) rtrim
)
)
location('file.csv')) reject limit unlimited;
Then with the same file with a mix of values with and without spaces:
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234 7 Typ=1 Len=7: 73,48,48,49,50,51,52
I012344 7 Typ=1 Len=7: 73,48,49,50,51,52,52
I000234 7 Typ=1 Len=7: 73,48,48,48,50,51,52
Note that rtrim will have no real effect if you stick with char(8), as it's that data type that causes all the values to be re-padded with spaces to the full size of the column. You need to use varchar2(8).