DB2 SQL: interpret a field as another CCSID

So I have a file on my AS400 produced by DSPJRN, and I want to look at some data in the JOESD field, which is the after image from the journal of a file. It is defined as char with CCSID 65535, I guess because it holds the whole record: a mixture of character fields in various CCSIDs plus numeric fields.
I can use substr() to get the actual field from the original file. In the original file the column is defined graphic(10) ccsid 13488; that's UCS-2. If I do hex(substr(joesd,522,20)) I get a result of 004100530044... and so on, so I know it's the correct data, but I can't get it to display as 'ASD...'.
I tried graphic(substr(joesd,522,20),10,13488), but it gives an error that the conversion from CCSID 65535 to 13488 isn't valid. I don't want to convert the data, just interpret it as the other CCSID.

GRAPHIC() doesn't take a CCSID as a parameter; the third parameter is length, according to my 7.1 reference.
What version are you using?
I thought CAST() might be a solution, but it doesn't appear to work.
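Something like this is what I had in mind (a sketch using the column position from the question; on my system it does not get around the CCSID 65535 restriction, which is why I say it doesn't appear to work):
select cast(substr(joesd,522,20) as graphic(10) ccsid 13488)
from mylib/myjrnout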
As I see it, one option would be to build a user-defined function (UDF) that does the conversion you need, possibly with the iconv() API; see the sketch below.
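A minimal sketch of the SQL registration for such a UDF, assuming you have wrapped iconv() in a C function and compiled it into a service program (the names AS_UCS2, CVTSRV and CVT65535 here are all hypothetical):
CREATE FUNCTION MYLIB/AS_UCS2 (VARCHAR(32000) FOR BIT DATA)
  RETURNS VARGRAPHIC(16000) CCSID 13488
  LANGUAGE C
  PARAMETER STYLE SQL
  DETERMINISTIC
  NO SQL
  EXTERNAL NAME 'MYLIB/CVTSRV(CVT65535)'
With that in place, AS_UCS2(substr(joesd,522,20)) should give you the readable value directly.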
The other option would be to dump the data into a properly formatted file. I use the DBUJRN utility from DBU. There are other similar options, including an open-source one (sorry that the description is in German, but Google Translate does a good enough job to let you figure out where to download the source).
The utilities basically work the same way; in fact, you can run through the same process manually. Try the following:
Step 1 (the DSPJRN you've been doing)
DSPJRN <...> OUTFILE(MYLIB/MYJRNOUT)
Step 2 - Create a new file with the journal header fields followed by all the fields from your journaled file (MYFILE)
CREATE TABLE mylib/mytbl as
( select JOENTL, JOSEQN, JOCODE, JOENTT, JODATE,
JOTIME, JOJOB, JOUSER, JONBR, JOPGM, JOOBJ,
JOLIB, JOMBR, JOCTRR, JOFLAG, JOCCID,
JOINCDAT, JOMINESD, JORES,
m.*
from MYLIB/MYJRNOUT , MYLIB/MYFILE m
) with no data
Step 3 - Copy the data without regard to the format differences:
CPYF FROMFILE(MYLIB/MYJRNOUT) TOFILE(MYLIB/MYTBL) MBROPT(*ADD) FMTOPT(*NOCHK)
You should end up with the data originally in JOESD split into its appropriate fields.
Note of course that this technique only works for one file at a time. Also, make sure you're only dumping *RCD entries, and you'll probably want to skip the delete entries.

Related

Access VBA: importing a CSV file via TransferText with comma as decimal separator and semicolon as delimiter

I'm having some problems importing double numbers from CSV files. The files have a semicolon delimiter and a comma as decimal separator.
I can't set up import specs, since the order of the fields in the CSV often changes and it would be a disaster if the data went into the wrong field.
Also, the CSV files have to be written to a temporary table first. Don't hate me for it, but since I have to process the data and set some information fields for later processing, this is by far the easiest, fastest and safest way to achieve it.
Here is the problem itself:
When using TransferText it will import, but of course interpret the comma as a delimiter. Not good...
When replacing comma by full stop and semicolon by comma, it works. But it will ignore the full stops, so 1.2 becomes 12 and 1.333 becomes 1333. The field will be of type Double.
I've tested numerous things. Besides TransferText I've tried:
DoCmd.RunSQL ("INSERT INTO Tabelle1 SELECT CDbl(a1) AS aa FROM [TEXT;FMT=Delimited;HDR=YES;CharacterSet=437;DATABASE=C:\SPOT].[test.csv]")
But nothing seems to work; even when I create a new table with field type Double before using TransferText, decimals are still ignored.
So, I would be happy if you could tell me either how to use TransferText with or without replacing semicolon and comma in a first step, or how to use the INSERT INTO approach.
Thank you very much!
Ok, I think I got it!
The problem was the regional settings: my Access uses a comma as the decimal separator. I was also not able to create an import spec via a manual import, since that requires defining up front which fields are to be imported.
What I did now was this:
Open the table MSysIMEXSpecs that contains the import specs via a query:
select * from MSysIMEXSpecs
Then add a new row and set SpecName = "Whatever", DecimalPoint = "," and FieldSeparator = ";", plus whatever other settings have to be made; see the sketch below.
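For example, the row can be added with a plain SQL INSERT along these lines (a sketch; I'm assuming SpecType = 1 denotes a delimited spec, your Access version may want additional columns filled in, and you need write access to the system tables; if MSysIMEXSpecs doesn't exist yet, saving any import spec manually once will create it):
INSERT INTO MSysIMEXSpecs (SpecName, SpecType, FieldSeparator, DecimalPoint)
VALUES ('Whatever', 1, ';', ',');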
Since there is this workaround, isn't there an easier way to do this?

IDL: strange symbols in file

I've written several IDL programs to analyse some data. To keep it simple, the programs read in some time-varying data and calculate the Fourier spectrum. This spectrum is written to file using this code:
openw,3,filename
printf,3,[transpose(freq),transpose(power)],format='(e,e)'
close,3
The file is then read by another program using this code:
rdfloat,filename,freq,power,/double
The rdfloat procedure can be found here: http://idlastro.gsfc.nasa.gov/
The error I get when trying to read a file is: "Input conversion error. Unit: 101"
When I delve into the file being read, I notice several types of unrecognised characters. I don't know if these are a result of the writing to the file or something else related to the number of files being created (over 300 files).
These symbols/characters appear in the place of a single number:
<dle> <dc1> <dc2> <dc3> <dc4> <can> <nak> <em> <soh> <syn>
Example of what appears in the file being read (note these are not consecutive lines):
7.7346<dle>18165493007e+01 8.4796811549010105e+00
7.7354408697119453e+01 1.04459538071<dc2>1749e+01
7.7360701595839<can>28e+01 3.0447318983094189e+00
Whenever I run the procedures that write the files, there is always at least one file that has some or all of these characters, and the file(s) containing them are different each time.
Can anyone explain what these symbols are and what I might be doing to create them as well as how to ensure they are not written to file?
I see two things that may be causing a problem. But first, I want to suggest a few tips.
When you open a file, it is useful to use the /GET_LUN keyword because it allows IDL to find and use a logical unit number (LUN) that is available (e.g., in case you left LUN 3 open somewhere else). When you print formatted data, you should specify the total width and number of decimal places. It will make things easier because then you need not worry about changing spacings between numbers in a file.
So I would change your first set of code to the following (or some variant of the following):
OPENW,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
PRINTF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
So the first potential issue I see is that if your variable power was calculated using something like FFT.pro, then it will be a complex float or complex double, depending on the input and keywords used. A complex element occupies two slots in formatted output, which will throw off the column layout, so take the magnitude (e.g., ABS(power)) or the real part before printing.
The second potential issue may be an incomplete format statement. You did not tell PRINTF how many columns or rows to expect, so it has to guess, and that may produce the characters you show. They may also be spacing characters introduced by the vague format statement, or by the software you are using to look at the files (e.g., I would not recommend using Word to open text files; use a text editor).
Side Note: You can open and read the file you just wrote in a similar fashion to what I showed above, but changed to the following:
n = FILE_LINES(filename[0])
freq = DBLARR(n)
power = DBLARR(n)
OPENR,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
READF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine

Cannot upload CSV that starts with an integer

I'm stuck with what seems like a weird BigQuery bug: I cannot upload a CSV file that starts (first line, first column) with an integer.
Here's my schema: COL1:INTEGER,COL2:INTEGER,COL3:STRING
Here's my CSV file content:
100,4,XXX
100,4,XXX
If I put the STRING column first, the upload is OK.
If I add a header and tell BigQuery to skip it during the import, the upload is OK too.
But with the CSV and schema above, BigQuery always complains: Line:1 / Field:1, Value cannot be converted to expected type.
Does anyone know what the problem is?
Thank you in advance,
David
I could not reproduce this problem--I copied and pasted the content into a file and uploaded it with no problems.
Perhaps the uploaded file format is corrupted somehow? If there are extra bytes at the beginning of the file (a UTF-8 byte-order mark, for example), those would be ignored in a header row but might result in this error if the first value of the first field is expected to be an integer. I'd recommend examining the actual binary data in the file to make sure there's nothing funny going on.
Also, are you doing this import via web UI, command-line tool, or API? Have you tried one of the other methods?

Pentaho Spoon - Validate Fixed Width Input File Format

I'm trying to process a fixed-width input file in Pentaho and validate the format. The file will be a mixture of strings, numbers and dates. However, when attempting to process a number field that has an incorrect character present (which I had expected would throw an error), it just reads the first part of the number and ignores the bad char.
I can recreate this issue with a very simple input file containing a single field. I specify the expected number format, along with start position and length.
On running the transformation I would have expected the 'Q' to cause an error; instead, it just reads the first two digits, "67", and pads the rest to match the specified format.
If the input file is formatted correctly it runs perfectly well, but I need it to throw an error otherwise. Any suggestions would be awesome. Thanks!
Just an FYI in case someone stumbles across this question after hitting the same issue as myself.
I was able to construct a workaround by reading all values in the "Text File Input" step as strings, and then using a "Data Validator" step equipped with regex evaluation to ensure numbers were correctly formatted, before parsing to a number type with a following "Select Values" step.
It takes a bit longer to do this for every field, but it was the most robust solution I could come up with.
Thanks

Replace character in SQL results

This is from an Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we could paste screenshots in here.)
It looks like this when I copy and paste the results.
spouse�s
Is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes, I can't change the Database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that it is the backquote character. I can't get the DB updated, so I'm trying this code:
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I"m getting ORA-00904:"Regex_Replace":invalid identifier
This seems to be a problem with your charset configuration. Check your NLS_LANG and other NLS_xxx environment/registry values. You have to check the Oracle server, your client, and the client of whoever inserted that data.
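To see what the server side is set to, a query like this can help (NLS_DATABASE_PARAMETERS is a standard Oracle dictionary view):
SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
Compare the result with the NLS_LANG setting on each client involved.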
Try to DUMP the value. You can do it with a SELECT as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before trying to replace, you should look for the root of the problem. If this happens because of a charset issue, you can end up with badly corrupted data.
UPDATE 2: Answering the comments. The problem may not be on the database server side; it may be on the client side. The problem (if this is the problem) can be a translation in the server-to/from-client communication, caused by a bad server-client configuration. For instance, if the server is defined with the UTF8 charset and your client uses US7ASCII, then all accented characters will appear as ?.
Another possibility is that the server is defined with the UTF8 charset and your client is also UTF8, but the application is not able to show UTF8 chars; then the problem is on the application side.
UPDATE 3: On your examples:
select translate('What... works because the � is exactly the same char: you pasted it on both sides.
select translate(Fieldname... does not work because the � is not what is stored in the database; it's the char that the client receives, maybe because some translation occurs between the data table and what is shown to you.
Next step: look at the DUMP syntax and try to extract the codes for the mysterious char (from the table, not by pasting �!).
I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 0x91-0x94 in the Windows CP-1252 encoding, or Unicode U+2018, U+2019, U+201C, and U+201D.
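If that's what they are, and the database character set is a single-byte one where those characters sit at 0x91-0x94 (decimal 145-148), a sketch that maps all four to straight quotes in the results, using the names from the question, is:
SELECT TRANSLATE(FN,
                 CHR(145) || CHR(146) || CHR(147) || CHR(148),
                 '''''""')
FROM TN
The second argument lists the four smart-quote bytes; the third replaces them, in order, with two straight apostrophes and two straight double quotes.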
I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have a glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its columns. It's up to you and your application to interpret what they will look like given the font you are displaying your data with, so I suggest investigating what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.
Once you've found the code for the character, you can replace it using the REGEXP_REPLACE() built-in function (note the spelling): determine the raw code of the character, then supply the ASCII / Basic Latin apostrophe 0x0027 (') as the replacement, using code similar to this:
UPDATE <table>
SET <column with offending character>
    = REGEXP_REPLACE(<column with offending character>,
                     '<character code of �>',
                     '''')  -- '''' is a string containing one apostrophe
WHERE REGEXP_LIKE(<column with offending character>, '<character code of �>');
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If you're seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (the default Windows code page) is a right single quotation mark, which looks kind of like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in ISO-8859-1 either. So probably either the database is in CP-1252 encoding (I don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your DB client connection isn't expecting CP-1252. Hence the weird question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the UPDATE above (it seems you can't), or doing this:
INSERT INTO <table> (<normal table columns>, ..., <column with offending character>)
SELECT <all normal columns>,
       REGEXP_REPLACE(<column with offending character>,
                      CHR(146),  -- 0x92, the offending byte
                      CHR(39))   -- 0x27, the ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
DELETE FROM <table> WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
Before you do this you need to understand what actually happened. It looks to me like someone inserted non-ASCII strings into the database, for example Unicode or UTF-8 encoded text. Before you fix this, be very sure that this is actually a bug; the apostrophe comes in many forms, not just "'".
TRANSLATE() is a useful function for replacing or eliminating known single-character codes; see the sketch below.
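For the names in this question, a sketch that cleans the results without touching the stored data (0x92 is CHR(146), the apostrophe is CHR(39)):
SELECT TRANSLATE(FN, CHR(146), CHR(39)) AS FN_CLEAN FROM TN
Unlike REGEXP_REPLACE, TRANSLATE does a simple character-for-character substitution, which is all that's needed here.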