Progress codepage export data to SQL via DTSX - sql

In our Progress (V9.1B) database the codepage used is ibm8858-1, on a SCO Unix machine.
In Progress I dump the data as follows:
output stream strA to /u/usr/ppd/tesk.txt convert target "1252".
for each tab1 no-lock:
    put stream strA tab1.field1 ";" tab1.field2 skip.
end.
When I use vi to open the file I see the French words as follows: Apr\212s instead of après.
Normally I FTP that file to a PC and then load it into SQL Server via a DTSX package.
But in SQL Server the French characters are also not in the correct format.
Does somebody know how I should export the data (text) and how I need to import it into SQL Server 2005? I currently also use codepage 1252 in my DTSX package.
Tkx,
Jac

Are you 100% sure the field doesn't contain "Apr\212s"? I'm also not sure there is an ibm8858-1 codepage; I'm guessing you might be mixing it up with iso8859-1 (Western European letters).
Whenever I encounter codepage problems the characters are garbled, not converted to something that looks like an ASCII escape code.
You can always try (assuming it's really iso8859-1):
convert source "iso8859-1" target "1252".
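For completeness, a sketch of the full export with the explicit source codepage (assuming the database really is iso8859-1; stream and field names are taken from the question):
define stream strA.
output stream strA to /u/usr/ppd/tesk.txt convert source "iso8859-1" target "1252".
for each tab1 no-lock:
    put stream strA tab1.field1 ";" tab1.field2 skip.
end.
output stream strA close.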

Related

SQL Server adding extra special characters in query result

I am trying to extract some records into a file using the BCP command in SQL Server. However, when the file is generated, there are extra spaces between the values for each column.
To test, I wrote a SQL query as simple as this:
select 'ABC', 40, 'TEST','NOTWORKING'
When I copy the output of the above query and paste it into Notepad, the output comes out as:
ABC 40 TEST NOTWORKING
Notice the space between each value? The file that the system generates using the BCP command has the same spaces in the output file, which is incorrect. What I want to see in the output file is:
ABC40TESTNOTWORKING
What could be causing this issue? I am simply amazed to see such a weird issue and hope that it can be fixed by some change or setting. Please help.
Sample BCP command
EXEC xp_cmdshell 'bcp "select ''ABC'', 40, ''TEST'',''NOTWORKING''" queryout "E:\Testfile.txt" -c -T -S""'
Output in the File - Testfile.txt
ABC 40 TEST NOTWORKING
There are probably tabs between the values. If you want a single value, use concat():
select CONCAT('ABC', 40, 'TEST', 'NOTWORKING')
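Applied to the sample command from the question, that would look like this (note that CONCAT() requires SQL Server 2012 or later):
EXEC xp_cmdshell 'bcp "select CONCAT(''ABC'', 40, ''TEST'', ''NOTWORKING'')" queryout "E:\Testfile.txt" -c -T -S""'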
There's no issue. The command line has no field terminator argument, so the default is used: a tab. That's described in the docs:
-t field_term
Specifies the field terminator. The default is \t (tab character). Use this parameter to override the default field terminator. For more information, see Specify Field and Row Terminators (SQL Server).
If you specify the field terminator in hexadecimal notation in a bcp.exe command, the value will be truncated at 0x00. For example, if you specify 0x410041, 0x41 will be used.
If field_term begins with a hyphen (-) or a forward slash (/), do not include a space between -t and the field_term value.
The link points to an entire article that explains how to use terminators, for each of the bulk operations.
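For example, to use a semicolon instead of the default tab, based on the sample command from the question:
EXEC xp_cmdshell 'bcp "select ''ABC'', 40, ''TEST'',''NOTWORKING''" queryout "E:\Testfile.txt" -c -t";" -T -S""'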
As for the copy/paste operation, it has nothing to do with SQL Server. SQL Server has no UI; it's a service. I suspect what was pasted into Notepad was copied from an SSMS grid.
SSMS is a client tool just like any other. When you copy data from it into the clipboard, it decides what to put there and what format to use. That format can be plain text (using spaces and tabs for layout), RTF, HTML, etc.
Plain text with tabs as field separators is probably the best choice for any tool, as it preserves the visual layout up to a point and uses only a single character as a separator. A fixed-length layout using spaces could also be used, but that would add characters that may well be part of a field.
Encodings and codepages
-c exports the data using the user's default codepage. This means that text stored in varchar fields using a different codepage (collation) may get mangled. Unicode characters that have no equivalent in that codepage will also get mangled and appear as something else, or as ?.
-c
Performs the operation using a character data type. This option does not prompt for each field; it uses char as the storage type, without prefixes and with \t (tab character) as the field separator and \r\n (newline character) as the row terminator. -c is not compatible with -w.
It's better to export the file as UTF-16 using -w.
-w
Performs the bulk copy operation using Unicode characters. This option does not prompt for each field; it uses nchar as the storage type, no prefixes, \t (tab character) as the field separator, and \n (newline character) as the row terminator. -w is not compatible with -c.
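For example (hypothetical database and table names, using a trusted connection):
bcp "select Name from MyDb.dbo.MyTable" queryout "E:\Unicode.txt" -w -T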
The codepage can be specified using the -C parameter. -C 1252 for example will export the data using Windows' Latin-1 codepage; -C 1253 will export it using the Greek codepage.
-C { ACP | OEM | RAW | code_page }
Specifies the code page of the data in the data file. code_page is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.
SQL Server 2016 and later can also export text as UTF-8 with -C 65001. Earlier versions don't support UTF-8.
Versions prior to version 13 (SQL Server 2016 (13.x)) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.
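For example, on SQL Server 2016 or later (same hypothetical table):
bcp MyDb.dbo.MyTable out "E:\Utf8.txt" -c -C 65001 -T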
All this is described in bcp's online documentation.
This subject is so important for any database that it has an entire section in the docs that describes data formats and considerations, using format files to specify different settings per column, and guidelines for ensuring compatibility with other applications.

UTL_FILE operations produce special characters in the output file

We have some string values stored in an Oracle DB. We're writing these values to a .DAT file. The relevant snippet in our package looks like this:
Opening file :
l_file := UTL_FILE.fopen(l_dir, l_file_name, 'W');
Writing to file :
UTL_FILE.putf(l_file, ('E' ||';'|| TO_CHAR(l_rownum) ||';'|| TO_CHAR(v_cdr_session_id_seq) ||';' || l_record_data));
String value in the DB: "Owner’s address"
String value in the .DAT file: "Ownerâ€™s address"
The question is: how do we avoid those special characters when writing to an output file?
I assume your database uses the character set AL32UTF8 (which is the default nowadays). In that case, try this:
l_file := UTL_FILE.fopen_nchar(l_dir, l_file_name, 'W');
UTL_FILE.put_nchar(l_file, 'E;' || l_rownum || ';' || v_cdr_session_id_seq || ';' || l_record_data);
Note for function FOPEN_NCHAR: Even though the contents of an NVARCHAR2 buffer may be AL16UTF16 or UTF8 (depending on the national character set of the database), the contents of the file are always read and written in UTF8.
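Putting the answer together, a minimal runnable sketch (the directory object MY_DIR, file name, and sample values are placeholders, not from the original package):
DECLARE
  l_file UTL_FILE.file_type;
BEGIN
  -- the NCHAR variants always write the file contents as UTF-8
  l_file := UTL_FILE.fopen_nchar('MY_DIR', 'output.dat', 'W');
  UTL_FILE.put_line_nchar(l_file, N'E;1;42;Owner''s address');
  UTL_FILE.fclose(l_file);
END;
/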
Summarising from the comments: your Linux session and Vim configuration are both using UTF-8, but your terminal emulator software is using the Windows-1252 codepage. That renders the 'curly' right quote mark you have, ’, which is Unicode codepoint U+2019, as â€™.
You need to change your emulator's configuration from Windows-1252 to UTF-8. Exactly how depends on which emulator you are using. For example, in PuTTY you can change the current session by right-clicking on the window title bar and choosing "Change settings...", then going to the "Window" category and its "Translation" subcategory, and changing the value from the default "Win1252 (Western)" to "UTF-8".
If you have the file open in Vim you can press Ctrl-L to redraw it. That will only affect the current session, though; to make the change permanent you'll need to make the same change to the stored session settings from the "New session..." dialog, loading your current settings and remembering to save the changes before actually opening the new session.
Other emulators have similar settings but in different places. For XShell, according to their web site:
You can easily switch to UTF8 encoding by selecting Unicode (UTF8) in the Encoding list on the Standard toolbar.
It looks like you can also set it for a session, or as the default.

DB2 SQL Interpret a field as another CCSID

So I have a file on my AS400 produced by DSPJRN, and I want to look at some data in the JOESD field, which is the after-image from the journal of a file. It is defined as char with CCSID 65535. I guess this is because it is the whole record, with a mixture of character fields (with various CCSIDs) and numeric fields.
I can use substr() to get the actual field from the original file. In the original file the column is defined graphic(10) ccsid 13488, which is UCS-2. If I do hex(substr(joesd,522,20)) I get a result of 004100530044... and so on, so I know it's the correct data, but I can't get it to display as 'ASD...'.
I tried graphic(substr(joesd,522,20),10,13488) but it gives an error that the conversion from CCSID 65535 to 13488 isn't valid. I don't want to convert it, just interpret it as the other CCSID.
GRAPHIC() doesn't take a CCSID as a parameter; the third parameter is length, according to my 7.1 reference.
What version are you using?
I thought CAST() might be a solution, but it doesn't appear to work.
As I see it, one option would be to build a user-defined function (UDF) that does the conversion you need, possibly with the iconv() API.
The other option would be to dump the data into a properly formatted file. I use the DBUJRN utility from DBU. There are other similar options, including an open source one (sorry that the description is in German, but Google Translate does a good enough job of figuring out the source to download).
The utilities basically work the same way; you can in fact run through the same process manually. Try the following:
Step 1 (the DSPJRN you've been doing)
DSPJRN <...> OUTFILE(MYLIB/MYJRNOUT)
Step 2 - Create a new file with the journal header fields followed by all the fields from your journaled file (MYFILE)
CREATE TABLE mylib/mytbl as
( select JOENTL, JOSEQN, JOCODE, JOENTT, JODATE,
JOTIME, JOJOB, JOUSER, JONBR, JOPGM, JOOBJ,
JOLIB, JOMBR, JOCTRR, JOFLAG, JOCCID,
JOINCDAT, JOMINESD, JORES,
m.*
from MYLIB/MYJRNOUT , MYLIB/MYFILE m
) with no data
Step 3 - Copy the data without regard to the format differences:
CPYF FROMFILE(MYLIB/MYJRNOUT) TOFILE(MYLIB/MYTBL) MBROPT(*ADD) FMTOPT(*NOCHK)
You should end up with the data originally in JOESD split into its appropriate fields.
Note of course that this technique only works for one file at a time. Also, make sure you're only dumping *RCD entries, and you'll probably want to skip the DELETE entries.
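For example, the DSPJRN step might select only record-level entries (the journal and file names here are placeholders; check the ENTTYP values against your release's reference):
DSPJRN JRN(MYLIB/MYJRN) FILE((MYLIB/MYFILE)) ENTTYP(*RCD) OUTPUT(*OUTFILE) OUTFILE(MYLIB/MYJRNOUT)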

Encoding issue in I/O with Jena

I'm generating some RDF files with Jena. The whole application works with UTF-8 text, and the source code is stored in UTF-8 as well.
When I print a string containing non-English characters on the console, I get the right output, e.g. Est un lieu généralement officielle assis....
Then, I use the RDF writer to output the file:
Model m = loadMyModelWithMultipleLanguages();
log.info( getSomeStringFromModel(m) );        // log4j, correct output
RDFWriter w = m.getWriter( "RDF/XML" );       // default encoding: utf-8
w.setProperty("showXmlDeclaration", "true");  // optional
OutputStream out = new FileOutputStream(pathToFile);
w.write( m, out, "http://someurl.org/base/" );
// file contains garbled text
The RDF file starts with: <?xml version="1.0"?>. If I add encoding="utf-8" to the declaration, nothing changes.
By default the text should be encoded to utf-8.
The resulting RDF file validates OK, but when I open it with any editor/visualiser (vim, Firefox, etc.), non-English text is all messed up: Est un lieu gÃ©nÃ©ralement officielle assis... or Est un lieu g\u221A\u00A9n\u221A\u00A9ralement officielle assis....
(Either way, this is obviously not acceptable from the user's viewpoint).
The same issue happens with any output format supported by Jena (RDF, NT, etc.).
I can't really find a logical explanation to this.
The official documentation doesn't seem to address this issue.
Any hint or tests I can run to figure it out?
My guess would be that your strings are messed up, and your getSomeStringFromModel() method just happens to output them in a way that accidentally makes them display correctly, but it's rather hard to say without more information.
You're instructing Jena to include an XML declaration in the RDF/XML file, but don't say what encoding (if any) Jena declares in the XML declaration. This would be helpful to know.
You're also not showing how you're producing the strings in the getSomeStringFromModel() method.
Also, in Firefox, go to the View menu and then to Character Encoding. What encoding is selected? If it's not UTF-8, then what happens when you select UTF-8? Do you get it to show things correctly when selecting some other encoding?
Edit: The snippet you show in your post looks fine and should work. My best guess is that the code that reads your source strings into a Jena model is broken and reads the UTF-8 source as ISO-8859-1 or something similar. You should be able to confirm or rule that out by checking the length() of one of the offending strings: if each of the troublesome characters like é is counted as two, then the error is on reading; if it's correctly counted as one, then it's on writing.
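A quick way to run that check (a sketch; getSomeStringFromModel() and m are the names from the question):
String s = getSomeStringFromModel(m);    // e.g. "généralement"
System.out.println(s.length());          // 12 if read correctly; 14 if each é was mis-read as two chars
for (char c : s.toCharArray()) {
    System.out.printf("%04x ", (int) c); // é should appear as 00e9, not as the pair 00c3 00a9
}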
My hint/answer would be to inspect the byte sequence in 3 places:
The data source. Using a hex editor, confirm that the é character in your source data is represented by the expected UTF-8 hex sequence 0xc3a9.
In memory. Right after your call to getSomeStringFromModel(), put a breakpoint and inspect the bytes in the string (or convert them to hex and print them out).
The output file. Again, use a hex editor to check that the byte sequence is 0xc3a9.
This will tell you exactly what is happening to the bytes as they travel along the path of your program, and also where they deviate from the expected 0xc3a9.
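For the output file, a small hex-dump sketch in Java (pathToFile is the variable from the question's snippet):
byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(pathToFile));
for (int i = 0; i < Math.min(bytes.length, 200); i++) {
    System.out.printf("%02x ", bytes[i] & 0xff); // é should show up as the byte pair c3 a9
}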
The best way to address this would be to package up the smallest unit of your code that you can that demonstrates the issue, and submit a complete, runnable test case as a ticket on the Jena Jira.

SQL Server to CSV character encoding

I'm doing an extract from a SQL Server database.
At the beginning of my program, I have:
ini_set('mssql.charset','cp1250');
My database calls do not do anything special.
I'm only calling the following methods:
mssql_connect, mssql_select_db, mssql_query, mssql_fetch_object,
mssql_next_result and mssql_close.
When I print the output of my export on screen, all the characters look fine. When I export with fputcsv() into a CSV file, I get a ton of <92> and <93> characters (this is the way they look when I read the file in a terminal). When I open the file in Excel, they look like ì, í and î.
This is causing major problems. Do you have any ideas?
Try converting the encoding to UTF-8:
iconv('cp1250', 'utf-8', $text);
also print this:
var_dump(iconv_get_encoding('all'));
Thanks, but it turns out that the problem isn't with the encoding so much as with the fact that my fputcsv() call was not specifying a delimiter. I chose "\t" for the delimiter and everything worked perfectly.
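For reference, a minimal sketch of that fix ($rows is a hypothetical array of result rows):
$fh = fopen('export.csv', 'w');
foreach ($rows as $row) {
    // pass the delimiter explicitly instead of relying on the default comma
    fputcsv($fh, $row, "\t");
}
fclose($fh);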