We are trying to load a file created by FastExport into an oracle database.
However the Float column is being exported like this: 1.47654345670000000000 E010.
How do you configure SQL*Loader to import it like that.
Expecting Control Script to look like:
OPTIONS(DIRECT=TRUE, ROWS=20000, BINDSIZE=8388608, READSIZE=8388608)
UNRECOVERABLE LOAD DATA
infile 'data/SOME_FILE.csv'
append
INTO TABLE SOME_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols (
FLOAT_VALUE CHAR(38) "???????????????????",
FILED02 CHAR(5) "TRIM(:FILED02)",
FILED03 TIMESTAMP "YYYY-MM-DD HH24:MI:SS.FF6",
FILED04 CHAR(38)
)
I tried to_number('1.47654345670000000000 E010', '9.99999999999999999999 EEEE')
Error: ORA-01481: invalid number format model error.
I tried to_number('1.47654345670000000000 E010', '9.99999999999999999999EEEE')
Error: ORA-01722: invalid number
These are the solutions I came up with in order of preference:
to_number(replace('1.47654345670000000000 E010', ' ', ''))
to_number(TRANSLATE('1.47654345670000000000 E010', '1 ', '1'))
I would like to know if there are any better performing solutions.
As far as I'm aware there is no way to have to_number ignore the space, and nothing you can do in SQL*Loader to prepare it. If you can't remove it by pre-processing the file, which you've suggested isn't an option, then you'll have to use a string function at some point. I wouldn't expect it to add a huge amount of processing, above what to_number will do anyway, but I'd always try it and see rather than assuming anything - avoiding the string functions sounds a little like premature optimisation. Anyway, the simplest is possibly replace:
select to_number(replace('1.47654345670000000000 E010',' ',''),
'9.99999999999999999999EEEE') from dual;
or just for display purposes:
column num format 99999999999
select to_number(replace('1.47654345670000000000 E010',' ',''),
'9.99999999999999999999EEEE') as num from dual
NUM
------------
14765434567
You could define your own function to simplify the control file slightly, but not sure it'd be worth it.
Two other options come to mind. (a) Load into a temporary table as a varchar, and then populate the real table using the to_number(replace()); but I doubt that will be any improvement in performance and might be substantially worse. Or (b) if you're running 11g, load into a varchar column in the real table, and make your number column a virtual column that applies the functions.
Actually, a third option... don't use SQLLoader at all, but use the CSV file as an external table, and populate your real table from that. You'll still have to do the to_number(replace()) but you might see a difference in performance over doing it in SQLLoader. The difference could be that it's worse, of course, but might be worth trying.
Change number width with "set numw"
select num from blabla >
result >> 1,0293E+15
set numw 20;
select num from blabla >
result >> 1029301200000021
Here is the solution I went with:
OPTIONS(DIRECT=TRUE, ROWS=20000, BINDSIZE=8388608, READSIZE=8388608)
UNRECOVERABLE LOAD DATA
infile 'data/SOME_FILE.csv'
append
INTO TABLE SOME_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols (
FLOAT_VALUE CHAR(38) "REPLACE(:FLOAT_VALUE,' ','')",
FILED02 CHAR(5) "TRIM(:FILED02)",
FILED03 TIMESTAMP "YYYY-MM-DD HH24:MI:SS.FF6",
FILED04 CHAR(38)
)
In my solution the conversion to a number is implicit:
"REPLACE(:FLOAT_VALUE,' ','')"
In Oracle 11g, it's not needed to convert numbers specially.
Just use integer external in the .ctl-file:
I tried the following in my Oracle DB:
field MYNUMBER has type NUMBER.
Inside .ctl-file I used the following definition:
MYNUMBER integer external
In the datafile the value is: MYNUMBER: -1.61290E-03
As for the result: sqlldr loaded the notation correctly: MYNUMBER field: -0.00161290
I am not sure if it's a bug or a feature; but it works in Oracle 11g.
Related
CONCAT('ERROR - Portfolio '', ''' not found in br_portfolio and composite_member table')
My issue is that ''' is creating a huge string that goes on for many lines after it is, so I guess the problem is how do I create ' char as a string that is wrapped in ''?
Thanks
Maybe this is more of a comment, but for what it's worth some additional options for you.
If you have a long string that includes return characters and single quotes, you can always use string literals using $$:
select $ONE$I'm a dog$ONE$, $FOOT$Hello,
Hambone's Dog$FOOT$
Hopefully you can see what goes inside the $$ doesn't matter -- it just has to match the beginning and end of the literal.
Second comment: I really like the format function in Pg. It's cleaner, in my opinion, than doing concat or multiple concat operators:
select format ('Eating too much %s is dangerous', 'cake')
So, combining these two, I wonder if this will help make your code cleaner/easier:
create table test (portfolio varchar(20));
insert into test values ('foot'), ('ball');
select
format ($EE$ERROR - Portfolio '%s' not found in br_portfolio$EE$, portfolio)
from test;
I have a Cast Procedure for a table with "raw" data. Any time a record comes from any of our locations into the raw table, my procedure "cleans" the data and loads it into a new table. The original raw table is all varchars and my procedure converts date and number fields to the proper data types. From the clean table, a Java program selects any new records on a daily basis and FTPs them off in a file to another dept. Have just learned that a few of the fields accept input from users and on a rare occasion, someone uses a pipe in what they input. A pipe symbol happens to be the delimiter that the other dept is using and whenever a pipe shows up in the middle of a field, it throws a wrench on their end.
I've never used REGEX or REGEXP_REPLACE in Oracle before. There are only three fields where the users can input data - MISTINTCOMMENT, PALETTE, COLORID. How do I use REGEX or REGEXP_REPLACE to replace any pipes with a space? Do I want to do it on each field? Or is this something I should "wrap around" the entire statement (in case there's a field I missed where someone might be able to input a pipe)?
Here is the portion of the procedure where the Values are cleaned and inserted into new table. How to best use RegEx with this?
VALUES (CASE
WHEN THECOSTCENTER IS NOT NULL
THEN THECOSTCENTER
ELSE (SUBSTR(TRIM(THESENDING_QMGR), -6))
END,
CASE
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 5) = 'PDPOS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 8, 4))
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 8) = 'PROD_POS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 9, 4))
ELSE TO_NUMBER(NVL(THESTORENBR,'0'))
END,
TO_NUMBER(NVL(THECONTROLNBR,'0')), TO_NUMBER(NVL(THELINENBR,'0')), THESALESNBR, TO_NUMBER(NVL(THEQTYMISTINT,'0')), THEREASONCODE, THEMISTINTCOMMENT,
THESIZECODE, THETINTERMODEL, THETINTERSERIALNBR, TO_NUMBER(NVL(THEEMPNBR,'0')), TO_DATE(THETRANDATE,'YYYY-MM-DD'), THETRANTIME, THECDSADLFLD,
THEPRODNBR, THEPALETTE, THECOLORID, TO_DATE(THEINITTRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONSMISTINTED,'0'),'999999999.99'), THEUPDATEEMPNBR,
TO_DATE(THEUPDATETRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONS,'0'),'999999999.99'), THEFORMSOURCE, THEUPDATETRANTIME, THESOURCEIND,
TO_DATE(THECANCELDATE,'YYYY-MM-DD'), THECOLORTYPE, TO_NUMBER(NVL(THECANCELEMPNBR,'0')), TO_BOOLEAN(THENEEDEXTRACTED), TO_BOOLEAN(THEMISTINTMQXTR),
THEDATASOURCE, THETRANGUID, TO_NUMBER(NVL(THETERMNBR,'0')), TO_NUMBER(NVL(THETRANNBR,'0')), TO_NUMBER(NVL(THETRANID,'0')), THEID, THETINTABLESALESNBR,
TO_NUMBER(NVL(THERETURNQTY,'0')), THECREATED_TS, THEXMIT_GUID, THESENDING_QMGR, THEMSG_ID, THEPUT_TS,
THEBROKER_NAME, THECHECKSUM);
If you have to use a REGEXP_REPLACE to replace pipes, escape them:
REGEXP_REPLACE(x, '\|', ' ')
This is useful to know when your more complex expressions include a pipe.
In this case, REPLACE that performs literal text search and replace will suffice:
REPLACE(x, '|', ' ')
Unfortunately I don't have the possibility to change field type.
I would like to REPLACE a , to . in a Typ=1 type of field (e.g.: 4,37 so in the end it should be 4.37), and I've tried CAST() and TO_NUMBER and TO_CHAR and I don't even know what else also, but I keep getting the ORA-01722 and it drives me crazy already. Why does it have to be a number for replacing ???
SELECT REPLACE(fmm, ',', '.') fmm FROM ...
Or do you have a better idea how can I do it without REPLACE maybe ?
UPDATE: it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
So it seems he is taking the replaced version, so with . of fmm, but why ????
Try to remove the commas by replace(nvl(nr,0),',',''), and then formatting by
with tab as
(
select '1,234,567' as nr
from dual
)
select to_char(
replace(nvl(nr,0),',','')
,'fm999G999G990','NLS_NUMERIC_CHARACTERS = '',.''')
as "Number"
from tab;
Number
----------
1.234.567
Demo
Passing a string (varchar2) value into the replace function cannot throw an ORA-01722.
it seems he has a problem with:
ORDER BY TO_NUMBER(fmm, '99D99')
If that's complaining when fnm is '4,37' then you could add a replace() call inside the to_number(), but it's simpler/clearer to specify the NLS_NUMERIC_CHARACTERS as part of the conversion, so it knows that D is represented by a comma, and doesn't rely on the session settings:
order by to_number(fnm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
If your table has a mix of values with period and comma decimal separators then you need to fix the data - this is the main reason you should not be storing numbers as strings in the first place. If you can't fix the data then you can workaround it with replace(), but it isn't ideal; you can then use a fixed period as the decimal character:
order by to_number(replace(fnm, ',', '.'), '99.99');
or still specify NLS_NUMERIC_CHARACTERS:
order by to_number(replace(fnm, ',', '.'), '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
Either way that is 'normalising' all the string to only have periods, with no commas; and that allows them all to be converted.
db<>fiddle
what I don't understand, if I do some changes in the SELECT to a field, how can it affect the ORDER BY section? fmm should still remain 4,37 and not 4.37 in the ORDER BY section, shouldn't it?
No, because you gave the column expression REPLACE(fmm, ',', '.') the alias fnm, which is the same as the original column name; and the order-by clause is the only place column aliases are allowed, where it masks the original table column. When you do:
ORDER BY TO_NUMBER(fmm, '99D99')
the fnm in that conversion is the value of the column expression aliased as fnm, and not the original table column.
You can still access the table column, but to do so you have to prefix it with table name or alias, as the column from expression from the select list takes precedence (which is implied but not stated clearly in the docs:
expr orders rows based on their value for expr. The expression is based on columns in the select list or columns in the tables, views, or materialized views in the FROM clause.
So you can either explicitly refer to the table column via the table name or, here, an alias:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99')
though you still shouldn't rely on the session NLS settings really, so can/should still specify the NLS option to match the table column format:
SELECT REPLACE(t.fmm, ',', '.') fmm
FROM your_table t
ORDER BY TO_NUMBER(t.fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=,.')
or use the replaced value and specify the NLS option for that (notice the option itself is different):
SELECT REPLACE(fmm, ',', '.') fmm
FROM your_table
ORDER BY TO_NUMBER(fmm, '99D99', 'NLS_NUMERIC_CHARACTERS=.,')
db<>fiddle
If your table has a mix of period and comma values then you need to use the column-alias version so it is consistent when it tries to convert. If you you only have commas then you can use either. (But again, you shouldn't be storing numbers as strings in the first place...)
I have a Teradata table that I've inherited that was not formatted in a great manner.
ID
123456789.
234567890.
I've tried:
TRIM(new_card_srgt_id (FORMAT 'Z(17)9'))
but my version of Teradata gave a funny error:
Format string has combination of numeric, character, and GRAPHIC values.
Any suggestions welcomed.
UPDATE: The suggestion to use TRIM(trailing '.' from ID) results in a numeric overflow when I went to cast it. Any other way to fix it.
When you load data from a decimal field into a character field without specifying the format, teradata does the type conversion. It inherently adds '.' to the trailing value.
If the ID field is char and it already contains '.' and you want to extract it as decimal without the '.' then the syntax is as follows
SELECT cast(id as decimal(15,0)) form table;
If the ID field is decimal and you want to extract it as character without the '.' then the syntax is as follows
SELECT TRIM(ID (FORMAT 'ZZZZZZZZZZZZZ9')) from table;
Solved it with:
cast(TRIM(trailing '.' from ID) as decimal(18,0))
I have a CHAR column that contains messy OCR'd scan of printed integers.
I need to do SUM() operators on that column. But I'm unable to cast properly.
;Good
sqlite> select CAST("123" as integer);
123
;No Good, should be '323999'
sqlite> select CAST("323,999" as integer);
323
I believe SQLite interprets the comma as marking the end of the "the longest possible prefix of the value that can be interpreted as an integer number"
I prefer to avoid the agony of writing python scripts to do data cleaning on this column. Is there any clever way to do it strictly with SQL?
If you are trying to ignore commas, then remove them before the conversion:
select cast(replace('323,999', ',', '') as integer)