How can I set parameters in an external table? - sql

I'm trying to create an external table from a CSV file with SQL.
The csv file has this structure:
I001234
I012344
I000234
...
I wrote this code for the upload:
create table Arranger_check(
matriid char(8))
organization external (
type oracle_loader default directory ext_tab_data access parameters
(
records delimited by newline
)
location('file.csv')) reject limit unlimited;
If I query the table, the result is wrong: each value has eight characters and the last one is a space (ASCII 32). As a result, queries with IN or NOT IN don't work.
matriid
--------
I001234
I012344
I000234
...
I've tried changing matriid char(8) to char(7), but then the external table loads 0 rows.

If you define your column as char(8) then it will always be padded with spaces, whether the value in the file has trailing spaces or not. If I make the file have a mix of values with and without trailing spaces:
create table Arranger_check(
matriid char(8))
...
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID  LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234                8 Typ=96 Len=8: 73,48,48,49,50,51,52,32
I012344                8 Typ=96 Len=8: 73,48,49,50,51,52,52,32
I000234                8 Typ=96 Len=8: 73,48,48,48,50,51,52,32
With varchar2, the column value will only have trailing spaces if the file value does, so with the same mixed file you get varying lengths in the table:
create table Arranger_check(
matriid varchar2(8))
...
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID  LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234                8 Typ=1 Len=8: 73,48,48,49,50,51,52,32
I012344                7 Typ=1 Len=7: 73,48,49,50,51,52,52
I000234                7 Typ=1 Len=7: 73,48,48,48,50,51,52
If you want the column values to not have trailing spaces even if the file values do, you need to trim them off if they exist:
create table Arranger_check(
matriid varchar2(8))
organization external (
type oracle_loader default directory ext_tab_data access parameters
(
records delimited by newline
fields
(
matriid char(8) rtrim
)
)
location('file.csv')) reject limit unlimited;
Then with the same file with a mix of values with and without spaces:
select matriid, length(matriid), dump(matriid) as dumped
from Arranger_check;
MATRIID  LENGTH(MATRIID) DUMPED
-------- --------------- ----------------------------------------
I001234                7 Typ=1 Len=7: 73,48,48,49,50,51,52
I012344                7 Typ=1 Len=7: 73,48,49,50,51,52,52
I000234                7 Typ=1 Len=7: 73,48,48,48,50,51,52
Note that rtrim will have no real effect if you stick with char(8), as it's that data type that causes all the values to be re-padded with spaces to the full size of the column. You need to use varchar2(8).
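If you can't recreate the table with varchar2, a query-time workaround is to strip the padding when comparing. A minimal sketch, assuming a hypothetical employees table whose emp_id values have no trailing spaces:
select e.*
from employees e
where e.emp_id in (select rtrim(matriid) from Arranger_check);
Here rtrim removes the space that char(8) re-pads onto every value, so IN and NOT IN comparisons behave as expected.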

Related

OPTIONALLY ENCLOSED BY: can we use it in a SQL query, or what is an alternative solution?

oracle 12.2.9
db version 18c
We are getting a .csv (comma-separated) file from an external source, need to split each line into a TABLE-type array field, and then insert it into an interface table.
But as I can see, the amount fields in the .csv file have "," inside the amount, i.e. "71,007,498.00".
i have this value "71,007,498.00",0.00,0.00,"71,007,498.00",
so while splitting this value, it should be like
lv_data_tbl := split_string('"71,007,498.00",0.00,0.00,"71,007,498.00",' , ',');
expected output
lv_data_tbl(1)=71,007,498.00
lv_data_tbl(2)=0.00
lv_data_tbl(3)=0.00
lv_data_tbl(4)=71,007,498.00
but I am getting this output:
lv_data_tbl(1)=71
lv_data_tbl(2)=007
lv_data_tbl(3)=498.00
lv_data_tbl(4)=0.00
lv_data_tbl(5)=0.00
lv_data_tbl(6)=71
lv_data_tbl(7)=007
lv_data_tbl(8)=498.00
I guess it can be done in SQL by parsing that string, but why wouldn't you use the external tables feature? As it uses SQL*Loader in the background, it lets you use the OPTIONALLY ENCLOSED BY parameter.
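For reference, such parsing could look like the following sketch. It assumes Oracle 11g or later (for the six-argument REGEXP_SUBSTR) and relies on the trailing comma shown in the question's string; anything inside double quotes is treated as a single field:
WITH src AS (
  SELECT '"71,007,498.00",0.00,0.00,"71,007,498.00",' AS line FROM dual
)
SELECT LEVEL AS pos,
       -- capture either a quoted field or an unquoted run up to the next comma,
       -- then strip the enclosing double quotes
       TRIM(BOTH '"' FROM
            REGEXP_SUBSTR(line, '("[^"]*"|[^,]*),', 1, LEVEL, NULL, 1)) AS field
FROM src
CONNECT BY LEVEL <= REGEXP_COUNT(line, '("[^"]*"|[^,]*),');
The external table approach is simpler, though.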
For example, these are the contents of my source data (filename is test_quotes.csv):
1,"71,007,498.00",0.00,0.00,"71,007,498.00"
2,15.00,"12,332.08","8.13",2.82
Let's create the external table. It requires access to a directory (line #11 below): an Oracle object that points to a filesystem directory containing the file. If you're not sure how to get it, talk to your DBA; if there's none, say so:
SQL> CREATE TABLE test_ext
2 (
3 id NUMBER,
4 val1 VARCHAR2 (15),
5 val2 VARCHAR2 (15),
6 val3 VARCHAR2 (15),
7 val4 VARCHAR2 (15)
8 )
9 ORGANIZATION EXTERNAL
10 (TYPE oracle_loader
11 DEFAULT DIRECTORY kcdba_dpdir
12 ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE
13 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
14 (id,
15 val1,
16 val2,
17 val3,
18 val4))
19 LOCATION ('test_quotes.csv'))
20 REJECT LIMIT UNLIMITED;
Table created.
Any data?
SQL> SELECT * FROM test_ext;
        ID VAL1            VAL2            VAL3            VAL4
---------- --------------- --------------- --------------- ---------------
         1 71,007,498.00   0.00            0.00            71,007,498.00
         2 15.00           12,332.08       8.13            2.82
SQL>
Fine; it works with no effort at all. OK, with a little bit of effort.
On the other hand, you could use SQL*Loader itself: write a control file and load the data. It is really, really fast. And you don't need access to any database directory; the source file can reside on your own hard disk.
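A minimal control-file sketch for that route (the credentials are placeholders; the file and table names match the example above):
-- test_quotes.ctl; run with: sqlldr scott/tiger control=test_quotes.ctl
LOAD DATA
INFILE 'test_quotes.csv'
APPEND
INTO TABLE test_ext
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(id, val1, val2, val3, val4)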

Oracle varchar2 and unicode texts

According to the spec, VARCHAR2(max_size CHAR) should store max_size characters. I observe strange behavior for Unicode text.
Let's consider that example:
create table test (id varchar2(3 char) not null, primary key (id));
insert into test(id) values('abc');
insert into test(id) values('ффф');
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 6, maximum: 3)
So does varchar2(3 char) actually mean the same as 3 bytes? No :)
create table test (id varchar2(3 byte) not null, primary key (id));
insert into test(id) values('abc');
insert into test(id) values('ффф');
Query 1 ERROR: ORA-12899: value too large for column "TEST"."TEST"."ID" (actual: 18, maximum: 3)
And my question remains: how do I tell Oracle that the varchar2 length is for Unicode text (UTF-8, to be more precise)?
Update: is it possible to write a SQL query that shows all tables/columns whose length was defined in bytes?
Actually, my issue splits into two parts: incorrect query encoding in TablePlus, and length in bytes (without the CHAR suffix) for some columns :)
Update 2: thanks to @Wernfried Domscheit!
This query shows tables and columns with VARCHAR2 whose length is specified in bytes:
SELECT TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_USED
FROM USER_TAB_COLUMNS
WHERE DATA_TYPE = 'VARCHAR2' AND CHAR_USED = 'B';
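A follow-up sketch, if the goal is then to switch such a column to character semantics (table and column names are placeholders, not from the question):
ALTER TABLE some_table MODIFY (some_column VARCHAR2(3 CHAR));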
Your example is working for me:
SELECT *
FROM V$NLS_PARAMETERS
WHERE PARAMETER = 'NLS_CHARACTERSET';
PARAMETER            VALUE
-------------------- ------------------------------
NLS_CHARACTERSET     AL32UTF8
1 row selected.
CREATE TABLE TEST (ID VARCHAR2(3 CHAR));
Table created.
INSERT INTO TEST(ID) VALUES('abc');
1 row created.
INSERT INTO TEST(ID) VALUES('ффф');
1 row created.
Maybe a typo on your side?
Update:
It looks like your client uses the wrong character settings.
ф (U+0444: Cyrillic Small Letter Ef) has these byte values:
+--------+-----------+-----------+----------+-----------------------------------+
|Encoding|hex        |dec (bytes)|dec       |binary                             |
+--------+-----------+-----------+----------+-----------------------------------+
|UTF-8   |D1 84      |209 132    |53636     |11010001 10000100                  |
|UTF-16BE|04 44      |4 68       |1092      |00000100 01000100                  |
|UTF-16LE|44 04      |68 4       |17412     |01000100 00000100                  |
|UTF-32BE|00 00 04 44|0 0 4 68   |1092      |00000000 00000000 00000100 01000100|
|UTF-32LE|44 04 00 00|68 4 0 0   |1141112832|01000100 00000100 00000000 00000000|
+--------+-----------+-----------+----------+-----------------------------------+
DUMP should return Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84, but you get ef,bf,bd, which is U+FFFD (Replacement Character).
You don't insert ффф; it is converted to ���.
I guess your client actually uses UTF-8 but you did not tell the database, so most likely the database assumes the client uses the default US7ASCII (or something else). The client sends 6 bytes (d1,84,d1,84,d1,84) but the Oracle database interprets them as 6 single-byte characters.
Typically you would use the NLS_LANG environment variable to define this. However, DBeaver is Java based, and Java/JDBC does not use the NLS_LANG settings, at least not by default.
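To verify what the database actually stored, you can use the DUMP check mentioned above; format 1016 prints hex byte values together with the character set. A sketch against the TEST table from the example:
SELECT ID, DUMP(ID, 1016) AS dumped FROM TEST;
-- for a correctly stored 'ффф' you would expect:
-- Typ=1 Len=6 CharacterSet=AL32UTF8: d1,84,d1,84,d1,84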

Same string but different byte length

I'm having some issues here trying to select some data.
I have a trigger that inserts into another table, but only if the string value doesn't already exist in that table, so I validate before inserting by counting:
SELECT COUNT(*) INTO v_puesto_x_empresa FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND PUE_NOMBRE = v_puesto_nombre;
If the count above is 0, the process inserts the data into the corresponding table.
As it turns out, it is duplicating the data for some strange reason, so I checked the source.
I use a cursor that prepares the data I need to insert, and I noticed that some strings, even though they look the same, are treated as different strings.
select
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci')))) PIN_DESCRIPCION,
LENGTHB(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) LENGTHB,
LENGTH(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "NORMAL LENGTH",
LENGTHC(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH C",
LENGTH2(UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))) AS "LENGTH 2"
FROM PES_PUESTOS_INTERNOS
where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by PIN_DESCRIPCION
order by PIN_DESCRIPCION asc
;
These are the results (as text):
PIN_DESCRIPCION                          LENGTHB NORMAL LENGTH   LENGTH C   LENGTH 2
---------------------------------------- ------- ------------- ---------- ----------
ADMINISTRADOR DE PROCESOS                     27            27         27         27
ADMINISTRADOR DE PROCESOS Y CALIDAD           36            36         36         36
AFORADOR                                       9             9          9          9
AFORADOR                                      10            10         10         10
ASISTENTE ADMINISTRATIVO                      25            25         25         25
ASISTENTE ADMINISTRATIVO                      26            26         26         26
So my guess, is that for some reason, even though they are the same, somehow they are treated as different internally.
Note: this table holds user input data, so the results, although meant to be the 'same word', may contain linguistic character differences, such as:
user input 1: aforador
user input 2: Aforador
For that reason, I applied the following piece of code, so I can process only one word (string):
UPPER(trim(utl_raw.cast_to_varchar2(nlssort(PIN_DESCRIPCION, 'nls_sort=binary_ci'))))
So, if for example, I query the same data without that, I would get the following result:
PIN_DESCRIPCION
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
administrador de procesos
administrador de procesos
Administrador de Procesos y Calidad
Aforador
aforador
aforador
I'd appreciate any help with this issue. Thanks in advance.
Best regards.
These strings are not the same. The second one has an extra character that is not (obviously) being displayed.
If you do:
SELECT PIN_DESCRIPCION,
DUMP(PIN_DESCRIPCION)
FROM PES_PUESTOS_INTERNOS
WHERE pin_codemp = '8F90CF5D287E2419E0530200000AA716'
GROUP BY PIN_DESCRIPCION
ORDER BY PIN_DESCRIPCION asc;
Then you will see the binary values that comprise the data and should see that there is an extra trailing character. This could be:
Whitespace (which you can remove with RTRIM);
A zero-width character;
A NUL (ASCII 0) character;
Or something else that is not being displayed.
For example, if you have a function to create a string from bytes (passed as hexadecimal values):
CREATE FUNCTION createString( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then we can create the sample data:
CREATE TABLE table_name ( value VARCHAR2(50) )
/
INSERT INTO table_name ( value )
SELECT 'AFORADOR' FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '00' ) FROM DUAL UNION ALL
SELECT 'AFORADOR' || createString( '01' ) FROM DUAL
/
Then, if we use:
SELECT value,
DUMP( value ) AS dump,
LENGTH(TRIM(value)) AS length
FROM table_name
This outputs:
| VALUE | DUMP | LENGTH |
|-----------|----------------------------------------|--------|
| AFORADOR | Typ=1 Len=8: 65,70,79,82,65,68,79,82 | 8 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,0 | 9 |
| AFORADOR | Typ=1 Len=9: 65,70,79,82,65,68,79,82,1 | 9 |
(Yes, the alignment of the table is messed up ... that's because of the unprintable characters in the string.)
Update:
From comments:
What is the output of SELECT DUMP(pin_descripcion) FROM <rest of your query> for those rows?
These are the results:
Aforador Typ=1 Len=8: 65,102,111,114,97,100,111,114
aforador Typ=1 Len=9: 97,102,111,114,97,100,111,114,0
Look at the difference in the byte values.
The first characters are 65 (or A) and 97 (or a); however, by using UPPER, you are masking that difference in your output. If you use UPPER in the GROUP BY then the difference in case will not matter.
Additionally, the second one has an extra byte at the end with the value 0 (ASCII NUL), so regardless of case they are different strings. This last character is not printable, so you won't see the difference in the output (and since NUL is used as a string terminator it may have unexpected side effects; for example, trying to display a NUL character in a db<>fiddle hangs the output), but it means the strings are different, so GROUP BY will not aggregate them together.
You can do:
CREATE FUNCTION createStringFromHex( hex VARCHAR2 ) RETURN VARCHAR2 DETERMINISTIC
IS
value VARCHAR2(50);
BEGIN
DBMS_STATS.CONVERT_RAW_VALUE( HEXTORAW( hex ), value );
RETURN value;
END;
/
Then, to remove the NUL string terminators:
UPDATE PES_PUESTOS_INTERNOS
SET PIN_DESCRIPCION = RTRIM( PIN_DESCRIPCION, createStringFromHex( '00' ) )
WHERE SUBSTR( PIN_DESCRIPCION, -1 ) = createStringFromHex( '00' );
Then you should be able to do:
SELECT UPPER(PIN_DESCRIPCION) AS PIN_DESCRIPCION,
LENGTH(UPPER(PIN_DESCRIPCION)) AS LENGTH
FROM PES_PUESTOS_INTERNOS
--where pin_codemp = '8F90CF5D287E2419E0530200000AA716'
group by UPPER(PIN_DESCRIPCION)
order by UPPER(PIN_DESCRIPCION) asc;
and only see single rows.
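Tying this back to the trigger in the question, the existence check can be made robust to both case and stray NUL terminators. A hedged sketch using the names from the question (CHR(0) is the NUL character):
SELECT COUNT(*) INTO v_puesto_x_empresa
FROM p_table_1
WHERE PUE_EMP_ID = Z.PIN_CODEMP
AND UPPER(RTRIM(PUE_NOMBRE, CHR(0))) = UPPER(RTRIM(v_puesto_nombre, CHR(0)));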

How to include commas in Oracle external table

I have a pipe delimited file that contains commas on the very last field like so:
COLOR|CAT|CODES
Red|Pass|tiger, 12#fol, letmein
Blue|Pass|jkd#332, forpw, wonton
Gray|Pass|rochester, tommy, 23$ai,
I terminate the last column by whitespace, and everything works out fine with no errors, except that it only includes/reads the first value and the first comma of the last column, e.g. tiger, jkd#332, etc. Obviously that is because of the whitespace after the comma.
How do I include the commas without getting any errors? I have tried " ", \r, \n, \r\n, and even excluding the "terminated by" clause on the last column, and while those do include the commas, I get the ORA-29913 and ORA-30653 reject errors every time I select all from the external table (it contains thousands of records).
I have the reject limit set to 10, but I don't want to change it to UNLIMITED because I don't want to ignore those errors, and I cannot change the file.
My code:
--etc..
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
--etc..
CODES CHAR TERMINATED BY WHITESPACE
Here's how:
SQL> create table color (
2 color varchar2(5),
3 cat varchar2(5),
4 codes varchar2(50)
5 )
6 organization external (
7 type oracle_loader
8 default directory ext_dir
9 access parameters (
10 records delimited by newline
11 skip 1
12 fields terminated by '|'
13 missing field values are null
14 (
15 color char(5),
16 cat char(5),
17 codes char(50)
18 )
19 )
20 location ('color.txt')
21 )
22 parallel 5
23 reject limit unlimited;
SQL>
SQL> select * From color;
COLOR CAT   CODES
----- ----- --------------------------------------------------
Red   Pass  tiger, 12#fol, letmein
Blue  Pass  jkd#332, forpw, wonton
Gray  Pass  rochester, tommy, 23$ai,
SQL>

ÿ is not displayed while concatenating with another column in Oracle

I have a problem regarding a special character. I have a table test as below:
SQL> desc test
Name           Type              Nullable Default Comments
-------------- ----------------- -------- ------- --------
DOT            DATE              Y
LOWVAL_CHAR    VARCHAR2(3)       Y
I am inserting data into this table using sqlldr. The data file is as below:
1984/01/10ÿ:-1 0 -99999+99999Sourav Bhattacharya
The ctl file is as below:
LOAD DATA
CHARACTERSET AL32UTF8
APPEND
PRESERVE BLANKS
INTO TABLE TEST
(dot POSITION(1) CHAR(10) "TO_DATE(:DOT,'YYYY/MM/DD')",
lowval_char POSITION(11) CHAR(1),
highval_char POSITION(12) CHAR(1),
lowval_un_num POSITION(13) INTEGER EXTERNAL(6),
highval_un_num POSITION(19) INTEGER EXTERNAL(6),
lowval_si_num POSITION(25) INTEGER EXTERNAL(6),
highval_si_num POSITION(31) INTEGER EXTERNAL(6),
fn POSITION(37) CHAR(10),
mn POSITION(47) CHAR(5),
ln POSITION(52) CHAR(20)
)
Data got inserted correctly.
Now if I execute the query
select dot,lowval_char from test;
It gives me the correct result, but if I concatenate, i.e.
select dot||lowval_char from test;
ÿ is not displayed; only the dot value is visible.
If I do not use SQL*Loader and insert the rows with plain INSERT INTO statements, both queries give the correct result.
Settings:
character set is AL32UTF8
nls_language='AMERICAN'
NLS_LENGTH_SEMANTICS='BYTE'
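In the spirit of the DUMP checks used in the answers above, a diagnostic sketch to compare the bytes of the column on its own and inside the concatenation (this should reveal whether ÿ was stored as a valid AL32UTF8 sequence or was mangled during the load):
SELECT DUMP(lowval_char, 1016) AS col_bytes,
       DUMP(dot || lowval_char, 1016) AS concat_bytes
FROM test;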