Vertica UTF-8 character - SQL

I have a table with a column of type varchar(32) whose value is 'Movies & TV'. This data is loaded with the COPY command. When I query the table like this (typing the value by hand):
select * from activity where name='Movies & TV'
it doesn't return any records. This seems to be mainly because of the & character; something is going on with it.
When I tried
Select ISUTF8(name) from activity
it returned true, which means the data is actually stored in UTF-8 format.
Select length(name) and select length('Movies & TV') also return the same value. However, when I pasted both values into the vi editor, I saw an extra space in the DB string. In addition, the name field in the activity table can contain Chinese characters too, and these are currently stored correctly in the DB.
Any idea what is going on here? Should I explicitly specify UTF-8 when loading the data?
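Since the lengths match but vi shows an extra space, one way to narrow this down is to compare the typed value and the stored value character by character. The Python sketch below is only a diagnostic illustration: the stored value is hypothetical, assuming for the sake of example that one of the spaces is really a U+00A0 NO-BREAK SPACE (same length, but not equal). The helper first_difference is made up for this example.

```python
import unicodedata


def first_difference(a, b):
    """Return (index, char name in a, char name in b) for the first mismatch, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return (i,
                    unicodedata.name(x, hex(ord(x))),
                    unicodedata.name(y, hex(ord(y))))
    return None


typed = "Movies & TV"
stored = "Movies\u00a0& TV"  # hypothetical: first space is a no-break space

print(len(typed) == len(stored))        # True
print(first_difference(typed, stored))  # (6, 'SPACE', 'NO-BREAK SPACE')
```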

If you are doing it from SQL*Plus, use
SET DEFINE OFF
to stop it treating & as a special (substitution) character.
An alternative solution is to use concatenation and the chr function:
SELECT * FROM activity WHERE name= 'Movies ' || chr(38) || ' TV';
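To see that the concatenation builds exactly the same string, here is a sketch using Python's built-in sqlite3, chosen only so it runs anywhere; it is neither SQL*Plus nor Vertica. SQLite's char() plays the role of chr(), and the table is a throwaway copy of the question's activity table. SQLite has no substitution variables, so this shows only the string-building half of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (name VARCHAR(32))")
conn.execute("INSERT INTO activity VALUES ('Movies & TV')")

# SQLite spells chr() as char(); 38 is the character code of '&'.
rows = conn.execute(
    "SELECT name FROM activity WHERE name = 'Movies ' || char(38) || ' TV'"
).fetchall()
print(rows)  # [('Movies & TV',)]
```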
Solution from:
How to insert a string which contains an "&"

Related

Oracle RegEx in a Cast Procedure

I have a Cast Procedure for a table with "raw" data. Any time a record comes from any of our locations into the raw table, my procedure "cleans" the data and loads it into a new table. The original raw table is all varchars, and my procedure converts date and number fields to the proper data types. From the clean table, a Java program selects any new records on a daily basis and FTPs them off in a file to another department. I have just learned that a few of the fields accept input from users, and on rare occasions someone types a pipe into one of them. The pipe symbol happens to be the delimiter the other department uses, so whenever a pipe shows up in the middle of a field, it throws a wrench into things on their end.
I've never used REGEX or REGEXP_REPLACE in Oracle before. There are only three fields where users can input data: MISTINTCOMMENT, PALETTE and COLORID. How do I use REGEX or REGEXP_REPLACE to replace any pipes with a space? Should I do it on each field, or is this something I should "wrap around" the entire statement (in case there's a field I missed where someone might be able to enter a pipe)?
Here is the portion of the procedure where the values are cleaned and inserted into the new table. How should I best use RegEx with this?
VALUES (CASE
WHEN THECOSTCENTER IS NOT NULL
THEN THECOSTCENTER
ELSE (SUBSTR(TRIM(THESENDING_QMGR), -6))
END,
CASE
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 5) = 'PDPOS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 8, 4))
WHEN THESTORENBR = '0' AND (SUBSTR(THESENDING_QMGR, 1, 8) = 'PROD_POS')
THEN TO_NUMBER(SUBSTR(THESENDING_QMGR, 9, 4))
ELSE TO_NUMBER(NVL(THESTORENBR,'0'))
END,
TO_NUMBER(NVL(THECONTROLNBR,'0')), TO_NUMBER(NVL(THELINENBR,'0')), THESALESNBR, TO_NUMBER(NVL(THEQTYMISTINT,'0')), THEREASONCODE, THEMISTINTCOMMENT,
THESIZECODE, THETINTERMODEL, THETINTERSERIALNBR, TO_NUMBER(NVL(THEEMPNBR,'0')), TO_DATE(THETRANDATE,'YYYY-MM-DD'), THETRANTIME, THECDSADLFLD,
THEPRODNBR, THEPALETTE, THECOLORID, TO_DATE(THEINITTRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONSMISTINTED,'0'),'999999999.99'), THEUPDATEEMPNBR,
TO_DATE(THEUPDATETRANDATE,'YYYY-MM-DD'), TO_NUMBER(NVL(THEGALLONS,'0'),'999999999.99'), THEFORMSOURCE, THEUPDATETRANTIME, THESOURCEIND,
TO_DATE(THECANCELDATE,'YYYY-MM-DD'), THECOLORTYPE, TO_NUMBER(NVL(THECANCELEMPNBR,'0')), TO_BOOLEAN(THENEEDEXTRACTED), TO_BOOLEAN(THEMISTINTMQXTR),
THEDATASOURCE, THETRANGUID, TO_NUMBER(NVL(THETERMNBR,'0')), TO_NUMBER(NVL(THETRANNBR,'0')), TO_NUMBER(NVL(THETRANID,'0')), THEID, THETINTABLESALESNBR,
TO_NUMBER(NVL(THERETURNQTY,'0')), THECREATED_TS, THEXMIT_GUID, THESENDING_QMGR, THEMSG_ID, THEPUT_TS,
THEBROKER_NAME, THECHECKSUM);
If you have to use a REGEXP_REPLACE to replace pipes, escape them:
REGEXP_REPLACE(x, '\|', ' ')
This is useful to know when your more complex expressions include a literal pipe.
In this case, though, REPLACE, which performs a literal text search and replace, will suffice:
REPLACE(x, '|', ' ')
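The same two approaches, sketched in Python for illustration: str.replace corresponds to REPLACE, and re.sub to REGEXP_REPLACE. The last line shows why the escape matters in Python's re module: a bare | is alternation between two empty patterns, so it matches the empty string at every position. Oracle's own handling of an unescaped pipe is not demonstrated here. The record and the strip_pipes helper are hypothetical; cleaning only the three user-entered fields named in the question is one option, and running it over every text column would be the more defensive one.

```python
import re

# The three user-entered fields named in the question.
USER_FIELDS = ("MISTINTCOMMENT", "PALETTE", "COLORID")


def strip_pipes(value):
    """Replace every '|' with a space; None (a NULL) passes through unchanged."""
    return None if value is None else value.replace("|", " ")


record = {"MISTINTCOMMENT": "too dark|redo", "PALETTE": "A1", "COLORID": None}
cleaned = {k: strip_pipes(v) if k in USER_FIELDS else v for k, v in record.items()}
print(cleaned["MISTINTCOMMENT"])  # too dark redo

# Regex route: the pipe must be escaped, because a bare '|' is alternation.
print(re.sub(r"\|", " ", "a|b"))  # a b
print(repr(re.sub("|", " ", "a|b")))  # ' a | b ' (empty string matched everywhere)
```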

How to concatenate a tab in a DB2 view field?

I've been attempting to create a Db2 view (the database is hosted on an IBM i running 7.3) in which one of the fields (a character/CHAR field) is constructed by concatenating several different pieces of data together. The catch is that between each of these pieces of data there needs to be a tab, which is used to delimit the fields in a DataMatrix barcode.
I'm using an ASCII and EBCDIC character set table as a reference, and I'm using the hexadecimal code for a horizontal tab to try to concatenate tabs into the character field I'm constructing, e.g.:
select 'data1' || X'09' || 'data2' from
sysibm.sysdummy1;
Unfortunately, the only thing that results from the hexadecimal code (X'09') appears to be a single space, as follows:
Result set:
data1 data2
When I use the resulting field in the view to generate a 2D barcode, there are actually no spaces at all delimiting the fields (as seen after scanning the barcode). What's the trick to actually getting a tab rendered in a Db2 view field? Is there a different code or function I should be using? I've also tried char(05) and char(09), but to no avail. In addition, I've tried casting the hexadecimal code as a character, as follows, but with no success:
select 'data1' || cast(X'09' as CHAR) || 'data2' from
sysibm.sysdummy1;
Any thoughts or ideas would be much appreciated!
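Try x'05' instead. On IBM i, character data is normally EBCDIC, where the horizontal tab is X'05' rather than the ASCII value X'09'.
If you copy and paste the following character sequence ("a" + tab + "b") from a text editor, you get the result below (in EBCDIC, X'81' is 'a', X'05' is the tab and X'82' is 'b'):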
values hex('a b');
|00001 |
|------|
|810582|
You can use CHR() on both Db2 for LUW https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000778.html
Returns the character that has the ASCII code value specified by the argument.
and Db2 for i https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_72/db2/rbafzscachr.htm
The CHR function returns the EBCDIC character that has the ASCII code value specified by the argument.
values 'A' || CHR(9) || 'B'
returns
1
---
A B
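A quick way to check these byte values without an IBM i at hand is Python's cp037 codec, which implements EBCDIC code page 37. That the system uses CCSID 37 is an assumption made for this sketch; if your job or column CCSID differs, verify against that code page instead.

```python
# cp037 = EBCDIC code page 37 (assumed CCSID for this illustration).
print("a\tb".encode("cp037").hex())           # 810582, matching hex('a<tab>b') above
print(chr(9).encode("cp037"))                 # b'\x05': ASCII code 9 is EBCDIC X'05'
print(bytes([0x09]).decode("cp037") == "\t")  # False: X'09' is not a tab in EBCDIC
```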

Search an Oracle clob for special characters that are not escaped

Is it possible to run a query that searches an Oracle CLOB for any record containing an ampersand that is not part of one of the following escape codes (or possibly any escape code):
& - &amp;
< - &lt;
> - &gt;
" - &quot;
' - &apos;
I want to extract 5 characters before the ampersand and 5 characters after it so I can see the actual value.
Basically, I want to find any record that contains these characters unescaped and replace them with the escape code.
At the moment I am doing something like this:
Select * from articles
where dbms_lob.instr(article_summary, '&amp') = 0 and dbms_lob.instr(article_summary, '&') > 0
Update
If I were to use a regular expression, how would I write it to retrieve all records where an & is followed by any character other than 'a'?
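The two requirements in the question (an & that does not start a known entity, plus 5 characters of context on either side) can be sketched with Python's re module. This only illustrates the pattern logic: it relies on a negative lookahead, which Oracle's POSIX-style REGEXP functions may not support, so in Oracle a simpler pattern along the lines of '&[^a]' may be needed instead. The helper name unescaped_contexts is made up.

```python
import re

# An '&' that does NOT begin one of the five predefined XML entities.
UNESCAPED_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)")


def unescaped_contexts(text, width=5):
    """Return up to `width` characters either side of each unescaped '&'."""
    return [text[max(m.start() - width, 0):m.end() + width]
            for m in UNESCAPED_AMP.finditer(text)]


print(unescaped_contexts("Salt &amp; Pepper & Co"))  # ['pper & Co']
print(unescaped_contexts("&lt;ok&gt;"))              # []
```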
You can use DBMS_XMLGEN.CONVERT for this. The second parameter is optional; if it is left out, the function escapes the XML special characters.
select DBMS_XMLGEN.CONVERT(article_summary)
from articles;
However, if article_summary contains a mixture of escaped and unescaped characters, this will give the wrong result, because entities that are already escaped get escaped a second time. The easiest way to solve this is to unescape the characters first and then escape them again.
select DBMS_XMLGEN.CONVERT(
DBMS_XMLGEN.CONVERT(article_summary,1) --1 as parameter does unescaping
)
from articles;
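The unescape-then-escape idea can be illustrated outside Oracle with Python's xml.sax.saxutils. Its escape and unescape handle only &, < and > unless you pass extra entities, so the quote mappings below are added to roughly mirror the five characters listed in the question. This is an analogue, not DBMS_XMLGEN itself, and escape_once is a hypothetical helper.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings so quotes are handled too (the defaults cover only &, <, >).
QUOTES = {'"': "&quot;", "'": "&apos;"}
QUOTES_BACK = {v: k for k, v in QUOTES.items()}


def escape_once(text):
    """Undo any escaping already present, then escape everything exactly once."""
    return escape(unescape(text, QUOTES_BACK), QUOTES)


mixed = "Tom &amp; Jerry & friends <3"
print(escape(mixed, QUOTES))  # Tom &amp;amp; Jerry &amp; friends &lt;3  (double-escaped)
print(escape_once(mixed))     # Tom &amp; Jerry &amp; friends &lt;3
```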

SQL -- SELECT statement -- concatenate strings to

I have an SQL question. Everything works fine in the SELECT statement below except the portion marked with a comment. What I'm trying to do is allow the user to search for a specific Rule within the database. Unfortunately, I don't actually have a Rule column, so I need to concatenate certain field values to create a string to compare with the user's searchtext.
Any idea why the marked part doesn't work? In theory, I would like this statement to check whether the string "Rule " + part_num (where part_num is the value in the part_num field) equals the value of searchtext (which is obtained from my PHP script).
I did some research on concatenating strings in SQL, but none of it seems to fit the bill. Does anyone out there have any suggestions?
SELECT id,
part_num,
part_title,
rule_num,
rule_title,
sub_heading_num,
sub_heading,
contents
FROM rules
WHERE part_title LIKE "%'.$searchtext.'%"
OR rule_title LIKE "%'.$searchtext.'%"
OR sub_heading LIKE "%'.$searchtext.'%"
OR contents LIKE "%'.$searchtext.'%"
OR "rule" + part_num LIKE "%'.$searchtext.'%" --RULE PLUS PART_NUM DOESN'T WORK
ORDER BY id;
Since you didn't specify which DB you're using, I'm going to assume SQL Server.
Strings are specified in SQL Server with single quotes ('I''m a string'), not double quotes.
See + (String Concatenation) on MSDN for examples.
Another possibility is that part_num is numeric. If so, cast the number to a string (varchar) before concatenating.
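A minimal runnable sketch of the cast-then-concatenate idea, using Python's built-in sqlite3 rather than SQL Server (an assumption made only so the example runs anywhere). SQLite concatenates with || where SQL Server uses +, so the SQL Server form would be roughly 'Rule ' + CAST(part_num AS varchar(10)). The table contents are invented, and the search text is passed as a bound parameter instead of being pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (id INTEGER, part_num INTEGER, part_title TEXT)")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?)",
                 [(1, 5, "General"), (2, 12, "Scope")])

searchtext = "Rule 12"
# Cast the numeric column to text, then build 'Rule <n>' and compare with LIKE.
rows = conn.execute(
    "SELECT id FROM rules "
    "WHERE 'Rule ' || CAST(part_num AS TEXT) LIKE '%' || ? || '%' "
    "ORDER BY id",
    (searchtext,),
).fetchall()
print(rows)  # [(2,)]
```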

How do you convert from scientific notation in Oracle SQL?

We are trying to load a file created by FastExport into an Oracle database.
However, the FLOAT column is being exported like this: 1.47654345670000000000 E010.
How do you configure SQL*Loader to import values in that format?
I expect the control file to look something like this:
OPTIONS(DIRECT=TRUE, ROWS=20000, BINDSIZE=8388608, READSIZE=8388608)
UNRECOVERABLE LOAD DATA
infile 'data/SOME_FILE.csv'
append
INTO TABLE SOME_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols (
FLOAT_VALUE CHAR(38) "???????????????????",
FILED02 CHAR(5) "TRIM(:FILED02)",
FILED03 TIMESTAMP "YYYY-MM-DD HH24:MI:SS.FF6",
FILED04 CHAR(38)
)
I tried to_number('1.47654345670000000000 E010', '9.99999999999999999999 EEEE')
Error: ORA-01481: invalid number format model error.
I tried to_number('1.47654345670000000000 E010', '9.99999999999999999999EEEE')
Error: ORA-01722: invalid number
These are the solutions I came up with in order of preference:
to_number(replace('1.47654345670000000000 E010', ' ', ''))
to_number(TRANSLATE('1.47654345670000000000 E010', '1 ', '1'))
I would like to know if there are any better performing solutions.
As far as I'm aware, there is no way to make to_number ignore the space, and nothing you can do in SQL*Loader to prepare it. If you can't remove it by pre-processing the file, which you've suggested isn't an option, then you'll have to use a string function at some point. I wouldn't expect it to add a huge amount of processing beyond what to_number does anyway, but I'd always try it and see rather than assume anything; avoiding the string functions sounds a little like premature optimisation. Anyway, the simplest is possibly replace:
select to_number(replace('1.47654345670000000000 E010',' ',''),
'9.99999999999999999999EEEE') from dual;
or just for display purposes:
column num format 99999999999
select to_number(replace('1.47654345670000000000 E010',' ',''),
'9.99999999999999999999EEEE') as num from dual
NUM
------------
14765434567
You could define your own function to simplify the control file slightly, but not sure it'd be worth it.
Two other options come to mind. (a) Load into a temporary table as a varchar, and then populate the real table using the to_number(replace()); but I doubt that will be any improvement in performance and might be substantially worse. Or (b) if you're running 11g, load into a varchar column in the real table, and make your number column a virtual column that applies the functions.
Actually, a third option: don't use SQL*Loader at all, but use the CSV file as an external table and populate your real table from that. You'll still have to do the to_number(replace()), but you might see a difference in performance compared with doing it in SQL*Loader. The difference could be that it's worse, of course, but it might be worth trying.
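The underlying string fix is the same whichever tool does it: remove the embedded space, then parse. A Python sketch using the standard decimal module (chosen here for exact parsing; it is not what SQL*Loader or to_number uses internally) shows both the REPLACE-style and the TRANSLATE-style removal. parse_sci is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation

raw = "1.47654345670000000000 E010"  # format produced by the FastExport file


def parse_sci(text):
    """Strip embedded spaces (like REPLACE(x, ' ', '')) and parse exactly."""
    return Decimal(text.replace(" ", ""))


# Like to_number, Decimal rejects the space before the exponent.
try:
    Decimal(raw)
except InvalidOperation:
    print("rejected:", raw)

print(parse_sci(raw) == 14765434567)  # True

# TRANSLATE-style alternative: a translation table that deletes spaces.
no_spaces = raw.translate(str.maketrans("", "", " "))
print(Decimal(no_spaces) == 14765434567)  # True
```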
Change the number width with "set numw":
select num from blabla;
result: 1,0293E+15
set numw 20;
select num from blabla;
result: 1029301200000021
Here is the solution I went with:
OPTIONS(DIRECT=TRUE, ROWS=20000, BINDSIZE=8388608, READSIZE=8388608)
UNRECOVERABLE LOAD DATA
infile 'data/SOME_FILE.csv'
append
INTO TABLE SOME_TABLE
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols (
FLOAT_VALUE CHAR(38) "REPLACE(:FLOAT_VALUE,' ','')",
FILED02 CHAR(5) "TRIM(:FILED02)",
FILED03 TIMESTAMP "YYYY-MM-DD HH24:MI:SS.FF6",
FILED04 CHAR(38)
)
In my solution the conversion to a number is implicit:
"REPLACE(:FLOAT_VALUE,' ','')"
In Oracle 11g, there's no need to convert such numbers specially.
Just use integer external in the .ctl file:
I tried the following in my Oracle DB:
field MYNUMBER has type NUMBER.
Inside .ctl-file I used the following definition:
MYNUMBER integer external
In the datafile the value is: MYNUMBER: -1.61290E-03
As for the result: sqlldr loaded the notation correctly: MYNUMBER field: -0.00161290
I'm not sure whether it's a bug or a feature, but it works in Oracle 11g.
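The arithmetic behind that result: -1.61290E-03 means -1.61290 × 10^-3. A quick check with Python's standard decimal module, used here only to illustrate the conversion and not how sqlldr performs it:

```python
from decimal import Decimal

# The data-file value from above, parsed exactly.
value = Decimal("-1.61290E-03")
print(value)                           # -0.00161290
print(value == Decimal("-0.0016129"))  # True
```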