SQL Server adding extra special characters in query result - sql

I am trying to extract some records into a file using the BCP command in SQL Server. However, when the file is generated, there are extra spaces between the values for each column.
To test, I wrote a basic SQL query, as simple as this:
select 'ABC', 40, 'TEST','NOTWORKING'
When I copy the output of the above query and paste it into Notepad, the output comes out as
ABC 40 TEST NOTWORKING
Notice the space between each value? The file that the system generates using the BCP command has the same spaces in the output file, which is incorrect. What I want to see in the output file is
ABC40TESTNOTWORKING
What could be causing this issue? I am simply amazed to see such a weird issue and am hoping it can be fixed by some change or setting. Please help.
Sample BCP command
EXEC xp_cmdshell 'bcp "select ''ABC'', 40, ''TEST'',''NOTWORKING''" queryout "E:\Testfile.txt" -c -T -S""'
Output in the File - Testfile.txt
ABC 40 TEST NOTWORKING

There are probably tabs between the values. If you want a single value, use concat():
select CONCAT('ABC', 40, 'TEST', 'NOTWORKING')
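A hedged sketch of the same BCP call with the values concatenated into a single column (assuming SQL Server 2012 or later, where CONCAT() is available); with only one output column, no field terminator ever appears in the file:
EXEC xp_cmdshell 'bcp "select CONCAT(''ABC'', 40, ''TEST'', ''NOTWORKING'')" queryout "E:\Testfile.txt" -c -T -S""'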

There's no issue. The command line has no field terminator argument, so the default, a tab, is used. That's described in the docs:
-t field_term
Specifies the field terminator. The default is \t (tab character). Use this parameter to override the default field terminator. For more information, see Specify Field and Row Terminators (SQL Server).
If you specify the field terminator in hexadecimal notation in a bcp.exe command, the value will be truncated at 0x00. For example, if you specify 0x410041, 0x41 will be used.
If field_term begins with a hyphen (-) or a forward slash (/), do not include a space between -t and the field_term value.
The link points to an entire article that explains how to use terminators for each of the bulk operations.
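If a separator is wanted, just not a tab, -t sets one explicitly. A hedged variant of the question's command that writes a comma-separated file:
EXEC xp_cmdshell 'bcp "select ''ABC'', 40, ''TEST'',''NOTWORKING''" queryout "E:\Testfile.txt" -c -t, -T -S""'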
As for the copy/paste operation, it has nothing to do with SQL Server. SQL Server has no UI; it's a service. I suspect what was pasted in Notepad was copied from an SSMS grid.
SSMS is a client tool just like any other. When you copy data from it into the clipboard, it decides what to put there and what format to use. That format can be plain text, using spaces and tabs for layout, RTF, HTML etc.
Plain text with tabs as field separators is probably the best choice for any tool, as it preserves the visual layout up to a point and uses only a single character as a separator. A fixed-length layout using spaces could also be used but that would add characters that may well be part of a field.
Encodings and codepages
-c exports the data using the user's default codepage. This means that text stored in varchar fields using a different codepage (collation) may get mangled. Unicode characters that can't be represented in that codepage will also get mangled and appear as something else, or as ?.
-c
Performs the operation using a character data type. This option does not prompt for each field; it uses char as the storage type, without prefixes and with \t (tab character) as the field separator and \r\n (newline character) as the row terminator. -c is not compatible with -w.
It's better to export the file as UTF-16 using -w.
-w
Performs the bulk copy operation using Unicode characters. This option does not prompt for each field; it uses nchar as the storage type, no prefixes, \t (tab character) as the field separator, and \n (newline character) as the row terminator. -w is not compatible with -c.
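For example, a minimal sketch exporting a whole table as UTF-16 with a trusted connection (the server, database, and table names are placeholders):
bcp MyDb.dbo.MyTable out "E:\Testfile.txt" -w -T -S myserver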
The codepage can be specified using the -C parameter. -C 1252, for example, will export the data using Windows' Latin-1 codepage, and -C 1253 will export it using the Greek codepage.
-C { ACP | OEM | RAW | code_page }
Specifies the code page of the data in the data file. code_page is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.
SQL Server 2016 and later can also export text as UTF8 with -C 65001. Earlier versions don't support UTF8.
Versions prior to version 13 (SQL Server 2016 (13.x)) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.
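A hedged example for SQL Server 2016 or later, exporting the question's query as UTF-8 (the server name is a placeholder):
bcp "select 'ABC', 40, 'TEST', 'NOTWORKING'" queryout "E:\Testfile.txt" -c -C 65001 -T -S myserver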
All this is described in bcp's online documentation.
This subject is so important for any database that it has an entire section in the docs, which describes data formats and considerations, using format files to specify different settings per column, and guidelines for ensuring compatibility with other applications.

Related

invalid byte sequence for encoding “UTF8”

I am trying to load a 3 GB (24 million row) CSV file into a Greenplum database using the gpload functionality, but I keep getting the below error.
Error -
invalid byte sequence for encoding "UTF8": 0x8d
I have tried the solution provided by Mike, but in my case the client_encoding and the file encoding are already the same. Both are UNICODE.
Database -
show client_encoding;
"UNICODE"
File -
file my_file_name.csv
my_file_name.csv: UTF-8 Unicode (with BOM) text
I have browsed through Greenplum's documentation as well, which says the encoding of the external file and the database should match. It matches in my case, yet somehow it is failing.
I have loaded similar, smaller files as well (same UTF-8 Unicode (with BOM) text).
Any help is appreciated!
As posted in another thread: use the iconv command to strip these characters out of your file. Greenplum is instantiated with a character set, UTF-8 by default, and requires that all characters belong to the designated character set. You can also choose to log these errors with the LOG ERRORS clause of the EXTERNAL TABLE. This will trap the bad data and allow you to continue up to the LIMIT that you specify during create.
iconv -f utf-8 -t utf-8 -c file.txt
will clean up your UTF-8 file, skipping all the invalid characters.
-f is the source format
-t the target format
-c skips any invalid sequence
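In practice, write the cleaned stream to a new file and load that instead (the input name is taken from the question; the _clean suffix is made up):
iconv -f utf-8 -t utf-8 -c my_file_name.csv > my_file_name_clean.csv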

Update/insert/retrieve accented character in DB?

I am using Oracle 12c.
When I run @F:\update.sql from SQL*Plus, the accented character é displays as a junk character when I retrieve it from either SQL*Plus or SQL Developer.
But when I run the individual statement directly in SQL*Plus and then retrieve the value from SQL*Plus, it displays the correct character; when I retrieve it from SQL Developer, it again displays the junk character.
The content of update.sql is this:
update employee set name = 'é' where id = 1;
What I want is that when I run @F:\update.sql, it inserts/updates/retrieves the value in the correct format, whether from SQL*Plus or any other tool.
For information: when I run
SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET%'
I get the below information:
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16
When I run @.[%NLS_LANG%] from the command prompt, I see
SP2-0310: unable to open file ".[AMERICAN_AMERICA.WE8MSWIN1252]"
I am not familiar with SQL Developer, but I can give a solution for SQL*Plus.
I presume you'd like to work in Windows CP1252.
First of all, ensure that the file F:\update.sql is saved in CP1252 encoding. Many editors call this encoding ANSI, which is the same (let's skip the details about the difference between the term ANSI and Windows-1252).
Then, before you run the script, enter
chcp 1252
in order to switch the encoding of your cmd.exe to CP1252. By default the encoding of cmd.exe is most likely CP850 or CP437, which are different.
Then set the NLS_LANG environment variable to character set WE8MSWIN1252, e.g.
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
After that, your script should work fine with SQL*Plus. SQL*Plus inherits the encoding (or "character set", if you prefer that term) from the parent cmd.exe. NLS_LANG tells the Oracle driver which character set you are using.
Example Summary:
chcp 1252
set NLS_LANG=.WE8MSWIN1252
sqlplus username/password@db @F:\update.sql
Some notes: in order to set the encoding of cmd.exe permanently, see this answer: Unicode characters in Windows command line - how?
NLS_LANG can be set either as an environment variable or in your registry, at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for a 32-bit Oracle Client) or HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for a 64-bit Oracle Client).
For SQL Developer, check your options; somewhere it should be possible to define the encoding of SQL files.
You are not forced to use Windows-1252. The same also works for other encodings, for example WE8ISO8859P1 (i.e. ISO-8859-1, chcp 28591) or UTF-8. However, in the case of UTF-8, your SQL script may contain characters which are not supported by the database character set WE8MSWIN1252. Such characters would be replaced by a placeholder (e.g. ¿).
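For example, a hedged UTF-8 variant of the summary above (AL32UTF8 is Oracle's character set name for UTF-8, and codepage 65001 is UTF-8 in cmd.exe; the script itself would then have to be saved as UTF-8 too):
chcp 65001
set NLS_LANG=.AL32UTF8
sqlplus username/password@db @F:\update.sql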

UTL_FILE operations produces special characters in the output file

We have some string values stored in an Oracle DB. We're writing these values to a .DAT file. The code snippet in our package looks like below:
Opening file :
l_file := UTL_FILE.fopen(l_dir, l_file_name, 'W');
Writing to file :
UTL_FILE.putf(l_file, ('E' ||';'|| TO_CHAR(l_rownum) ||';'|| TO_CHAR(v_cdr_session_id_seq) ||';' || l_record_data));
String value in the DB: "Owner’s address"
String value in the .DAT file: "Ownerâ€™s address"
The question is: how do we avoid those special characters while writing to an output file?
I assume your database uses the character set AL32UTF8 (which is the default nowadays). In that case, try this:
l_file := UTL_FILE.FOPEN_NCHAR(l_dir, l_file_name, 'W');
UTL_FILE.PUT_NCHAR(l_file, 'E;' || l_rownum || ';' || v_cdr_session_id_seq || ';' || l_record_data);
Note for function FOPEN_NCHAR: Even though the contents of an NVARCHAR2 buffer may be AL16UTF16 or UTF8 (depending on the national character set of the database), the contents of the file are always read and written in UTF8.
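A minimal self-contained sketch of that NCHAR variant (the directory object, file name, and field values are placeholders):
DECLARE
  l_file UTL_FILE.FILE_TYPE;
BEGIN
  -- Open in NCHAR mode; file contents are always written as UTF-8
  l_file := UTL_FILE.FOPEN_NCHAR('MY_DIR', 'output.dat', 'W');
  UTL_FILE.PUT_LINE_NCHAR(l_file, N'E;1;100;Owner''s address');
  UTL_FILE.FCLOSE(l_file);
END;
/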
Summarising from the comments: your Linux session and Vim configuration are both using UTF-8, but your terminal emulator software is using the Windows-1252 codepage. That renders the 'curly' right-quote mark you have, ’, which is Unicode codepoint U+2019, as â€™.
You need to change your emulator's configuration from Windows-1252 to UTF-8. Exactly how depends on which emulator you are using. For example, in PuTTY you can change the current session by right-clicking on the window title bar and choosing "Change settings...", then going to the "Window" category and its "Translation" subcategory, and changing the value from the default "Win1252 (Western)" to "UTF-8".
If you have the file open in Vim, you can press Ctrl-L to redraw it. That will only affect the current session, though; to make the change permanent you'll need to make the same change to the stored session settings from the "New session..." dialog, loading your current settings and remembering to save the changes before actually opening the new session.
Other emulators have similar settings but in different places. For XShell, according to their web site:
You can easily switch to UTF8 encoding by selecting Unicode (UTF8) in the Encoding list on the Standard toolbar.
It looks like you can also set it for a session, or as the default.

Postgres 9.3 end-of-copy marker corrupt - Any way to change this setting?

I am trying to stream data through an AWK program into a Postgres COPY command. This usually works great. However, in my data recently I have been getting long text strings containing '\.' sequences.
The Postgres documentation mentions that this combination of characters represents the end-of-data marker (http://www.postgresql.org/docs/9.2/static/sql-copy.html), and I am getting the associated errors when trying to insert with COPY.
My question is, is there a way to turn this off? Perhaps change the end-of-data marker to a different combination of characters? Or do I have to alter/remove these strings before trying to insert using the COPY command?
You can try to filter your data through sed 's:\\:\\\\:g'; this changes every \ in your data to \\, which is the correct escape sequence for a single backslash in COPY data.
But I think the backslash is not the only problematic character: newlines should also be encoded as \n, carriage returns as \r, and tabs as \t (tab is the default field delimiter in COPY).
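A hedged sketch of the whole pipeline (the AWK script, database, and table names are made up; COPY's text format reads \\ as a literal backslash):
awk -f transform.awk raw_data.txt \
  | sed 's:\\:\\\\:g' \
  | psql -d mydb -c "COPY my_table FROM STDIN"
This only escapes backslashes; embedded newlines or tabs inside field values would still need to be handled in the AWK program itself.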

Sqlcmd trailing spaces in output file

Here is my simplified scenario:
I have a table in SQL Server 2005 with a single column of type varchar(500). The data in the column is always 350 characters in length.
When I run a select on it in the SSMS query editor and copy and paste the result set into a text file, the line length in the file is 350, which matches the actual data length.
But when I use sqlcmd with the -o parameter, the resulting file has a line length of 500, which matches the max length of varchar(500).
So the question is: without using any string functions in the select, is there a way to let sqlcmd know not to treat it like char(500)?
You can use the sqlcmd formatting option -W to remove trailing spaces from the output file.
Read more at this MSDN article.
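For example, a hedged sqlcmd invocation with -W (the server, database, and table names are placeholders; -E uses a trusted connection):
sqlcmd -S myserver -d mydb -E -Q "SELECT col1 FROM dbo.mytable" -o E:\out.txt -W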
-W only works with the default size of 256 for variable-size columns. If you want more than that, you have to use the -y modifier, which sqlcmd will tell you is mutually exclusive with -W. Basically you are out of luck, and in my case the file grew from 0.5 MB to 172 MB. You have to strip the whitespace after the file is generated, with a PowerShell command or something similar.
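For example, a hedged PowerShell one-liner that trims trailing spaces after the fact (the file name is a placeholder; Get-Content reads the whole file into memory, so a streaming approach would suit very large files better):
(Get-Content E:\out.txt) | ForEach-Object { $_.TrimEnd() } | Set-Content E:\out.txt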