SQL Server to CSV character encoding

I have a SQL Server database extract I'm doing.
At the beginning of my program, I have:
ini_set('mssql.charset','cp1250');
My database calls do not do anything special.
I only call the following functions:
mssql_connect, mssql_select_db, mssql_query, mssql_fetch_object,
mssql_next_result and mssql_close.
When I print the output of my export on screen, all the characters look fine. When I export with fputcsv() into a CSV file, I get a ton of <92> and <93> characters (that is how they look when I read the file in a terminal). When I open the file in Excel, they look like ì, í and î.
This is causing major problems. Do you have any ideas?

Try converting the encoding to UTF-8:
iconv('cp1250', 'utf-8', $text);
Also print this to inspect your iconv settings:
var_dump(iconv_get_encoding('all'));

Thanks, but it turns out the problem wasn't with the encoding so much as with the fact that my fputcsv() call wasn't specifying a delimiter. I chose "\t" for the delimiter and everything worked perfectly.
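For anyone who lands here with the same pair of symptoms, here is a minimal sketch of both fixes in Python, not taken from the thread: decode the database bytes explicitly and give the CSV writer an explicit delimiter, so neither step relies on a default. The sample bytes and file name are made up for illustration.

import csv

# Hypothetical row as it might come back from a driver: cp1250-encoded bytes.
raw_rows = [(b"P\xf8\xedli\x9a", b"ok")]

with open("export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")  # explicit delimiter, like fputcsv's delimiter argument
    for row in raw_rows:
        # Decode from the database codepage before writing UTF-8 output.
        writer.writerow(cell.decode("cp1250") for cell in row)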

Related

Multi-line text in a .env file

In Vue, is there a way to have a value span multiple lines in a .env file? Ex:
Instead of:
someValue=[{"someValue":"Here is a really really long piece which should be split into multiple lines"}]
I want to do something like:
someValue=`[{"someValue":"Here is a really
really long piece which
should be split into multiple lines"}]`
Doing the latter gives me a JSON parsing error if I try to do JSON.parse(someValue) in my code.
I don't know if this will work, but I can't format a comment well enough to get the point across, so try this:
someValue=[{"someValue":"Here is a really\
really long piece which\
should be split into multiple lines"}]
Where "\" should escape the newline similar to how you can write long bash commands while escaping the newline. I'm not certain the .env interpreter will support it though.
EDIT
Looks like this won't work. This syntax was actually proposed, but I don't think it was incorporated. See motdotla/dotenv#333 (which is what Vue uses to parse .env).
Like #zero298 said, this isn't possible. However, you could delimit the entry with a character that wouldn't normally show up in the text (^ is a good candidate), then convert it back within the application using something like string.replace(/\^/g, '\n');
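For what it's worth, the same placeholder idea sketched in Python (the value, the ^ delimiter, and the key name are assumptions carried over from this answer). Note that the newline has to stay escaped until after parsing, otherwise the JSON parser rejects the raw control character:

import json

# Value as it would be read from the .env file: one physical line, with ^
# marking where the author wanted line breaks.
raw = '[{"someValue":"Here is a really^really long piece which^should be split into multiple lines"}]'

# Replace the placeholder with an *escaped* newline so the text stays valid JSON;
# the parser then decodes \n into a real newline inside the string.
data = json.loads(raw.replace("^", "\\n"))
print(data[0]["someValue"])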

Combining SQL files with the `copy` command in a batch file introduces a syntax error because it adds an invisible character `U+FEFF`

In a pre-build event, a batch file is executed to combine multiple SQL files into a single one.
It is done using this command:
COPY %#ProjectDir%\Migrations\*.sql %#ProjectDir%ContinuousDeployment\AllFilesMergedTogether.sql
Everything appears to work fine, but somehow the resulting file gives a syntax error.
After two hours of investigation, it turned out the issue is caused by an invisible character that remains invisible even in Notepad++.
Using an online website, the character was spotted: it is U+FEFF, as shown in the following image.
Here are the two input scripts.
PRINT 'Script1'
PRINT 'Script2'
Here is the output given by the copy command (the extra U+FEFF is invisible here):
PRINT 'Script1'
PRINT 'Script2'
Additional info:
Batch file is encoded with UTF-8
Input files are encoded with UTF-8-BOM
Output file is encoded with UTF-8-BOM.
I'm not sure it is possible to change the output encoding of the `copy` command.
I've tried and failed.
What should be done to eradicate this extremely frustrating parasitic character?
It turned out that changing the encoding of the input files to ANSI fixes the issue.
No more pesky character(s).
Also, doing so changes the encoding of the result file to UTF-8 instead of UTF-8-BOM, which I believe is preferable.
The encoding can be changed using Notepad++, as shown in the following picture.
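If re-encoding every input file by hand is not practical, an alternative not covered in this thread is to strip the BOMs programmatically while concatenating. A rough sketch in Python, where the directory and output names mirror the batch command above but are otherwise assumptions:

from pathlib import Path

parts = []
for sql_file in sorted(Path("Migrations").glob("*.sql")):
    # utf-8-sig silently drops a leading BOM if one is present.
    text = sql_file.read_text(encoding="utf-8-sig")
    parts.append(text.replace("\ufeff", ""))  # also drop any stray BOM mid-file

# Write the merged file as plain UTF-8 so the SQL parser never sees U+FEFF.
Path("ContinuousDeployment/AllFilesMergedTogether.sql").write_text("\n".join(parts), encoding="utf-8")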

Progress codepage export data to SQL via dtsx

In our Progress (V9.1B) the codepage used is ibm8858-1, on a Unix SCO machine.
In Progress I dump data as follows:
export stream strA to /u/usr/ppd/tesk.txt convert target "1252".
for each tab1 no-lock :
put tab1.field1 ";" tab1.field2 skip.
end.
When I use vi to open the file, I see the French words as follows: Apr\212s instead of après.
Normally I ftp that file to a PC and then via a dtsx I load it in sql.
But in sql the french characters are also not in the correct format.
Does somebody know how I have to export the data (text) and how I need to import it into SQL (2005)? Currently I use codepage 1252 in my dtsx.
Tkx,
Jac
Are you 100% sure the field doesn't contain "Apr\212s"? I'm also unsure whether there is an ibm8858-1 codepage; I'm guessing you might be mixing it up with iso8859-1 (Scandinavian letters).
Whenever I encounter codepage problems the characters are garbled, not converted to something that looks like an ASCII escape code.
You can always try (assuming it's really iso8859-1):
convert source "iso8859-1" target "1252".

Script consecutive Replace-All operations in Notepad++

Is there a way to script consecutive Replace-All operations in Notepad++?
For example, I want to be able to first replace all “ characters with &ldquo;, then replace all ” characters with &rdquo;, and then replace all "string1" with "string2", etc.
Nevermind,
I finally figured it out, and it seems like a great solution.
I used PythonScript for Notepad++ (which is what I started with, but it kept giving me errors until I finally fixed a few things).
So, here is the code for those who may be interested:
# This Python file uses the following encoding: utf-8
import os, sys
# 'editor' is provided globally by the Python Script plugin.
editor.replace(r"“", r"&ldquo;")
editor.replace(r"”", r"&rdquo;")
editor.replace(r"’", r"&rsquo;")
The r before the quotation marks makes these raw strings, so special characters are taken literally; this is what was so difficult for me to get working.
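If there are many pairs, a variation on the same idea is to drive the replacements from a table. This still assumes the editor object provided by the Python Script plugin; the last pair is just a placeholder:

# Table-driven version of the consecutive Replace-All operations.
replacements = [
    ("“", "&ldquo;"),
    ("”", "&rdquo;"),
    ("’", "&rsquo;"),
    ("string1", "string2"),
]

for old, new in replacements:
    editor.replace(old, new)  # replace-all helper from the Python Script plugin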

Bulk insert, SQL Server 2000, unix linebreaks

I am trying to bulk insert a .csv file that has Unix linebreaks into a database. The command I am running is:
BULK INSERT table_name
FROM 'C:\file.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
If I convert the file into Windows format the load works, but I don't want to do this extra step if it can be avoided. Any ideas?
I felt compelled to contribute as I was having the same issue, and I need to read 2 UNIX files from SAP at least a couple of times a day. Instead of using unix2dos, I needed something with less manual intervention that could be done programmatically.
As noted, CHAR(10) works within the dynamic SQL string. I didn't want to use a dynamic SQL string, so I tried ''''+CHAR(10)+'''', but for some reason this didn't compile.
What did work very slick was: with (ROWTERMINATOR = '0x0a')
Problem solved with Hex!
Thanks to all who have answered, but I found my preferred solution.
When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows, which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC (@bulk_cmd)
Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.
What the above does is build the command as a string and execute that, neatly sidestepping the need to create an additional file or go through extra steps.
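If the command is issued from client code rather than from within T-SQL, the same trick is simpler still, because the client language can embed a real LF in the command text. A hedged sketch in Python with pyodbc; the DSN, table, and path are assumptions:

import pyodbc

# Build the BULK INSERT with an actual LF (chr(10)) as the row terminator.
bulk_cmd = (
    "BULK INSERT table_name FROM 'C:\\file.csv' "
    "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '" + chr(10) + "')"
)

conn = pyodbc.connect("DSN=mydsn", autocommit=True)  # BULK INSERT needs autocommit or an explicit commit
conn.cursor().execute(bulk_cmd)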
I confirm that the syntax
ROWTERMINATOR = '''+CHAR(10)+'''
works when used with an EXEC command.
If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:
ROWTERMINATOR = '''+CHAR(124)+''+CHAR(10)+'''
It's a bit more complicated than that! As explained above, SQL Server interprets ROWTERMINATOR='\n' as the default Windows row terminator, which is actually "\r\n" (in C/C++ notation); if your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!
One option would be to use bcp, and set up a control file with '\n' as the line break character.
Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n' line breaks.
Finally, you can use the FORMATFILE option on BULK INSERT. This will use a bcp control file to specify the import format.
Looks to me like there are two general avenues that can be taken: some alternate way to read the CSV in the SQL script, or converting the CSV beforehand in any of the numerous ways you can do that (bcp, unix2dos; if it is a one-time kind of thing, you can probably even use your code editor to fix the file for you).
But you will have to have an extra step!
If this SQL is launched from a program, you might want to convert the line endings in that program. If you decide to code the conversion yourself (a sketch follows the list below), here is what you need to watch out for:
1. The line ending might be \n
2. or \r\n
3. or even \r (Mac!)
4. good grief, it could be that some lines have \r\n and others \n, any combination is possible unless you control where the CSV came from
OK, OK. Possibility 4 is farfetched. It happens in email, but that is another story.
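A minimal sketch of that conversion in Python, normalizing all of the above cases to Windows line endings before handing the file to BULK INSERT (file names are assumptions):

# Normalize any mix of \r\n, \n and \r to \r\n so BULK INSERT's default
# Windows row terminator matches every line.
with open("file.csv", "rb") as f:
    data = f.read()

normalized = (
    data.replace(b"\r\n", b"\n")  # collapse Windows endings first
        .replace(b"\r", b"\n")    # then bare carriage returns (old Mac)
        .replace(b"\n", b"\r\n")  # finally expand everything to \r\n
)

with open("file_windows.csv", "wb") as f:
    f.write(normalized)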
I would think "ROWTERMINATOR = '\n'" would work. I would suggest opening the file in a tool that shows "hidden characters" to make sure the line is being terminated like you think. I use notepad++ for things like this.
It comes down to this: Unix uses LF (Ctrl-J), MS-DOS/Windows uses CR/LF (Ctrl-M/Ctrl-J).
When you use '\n' on Unix, it gets translated to a LF character. On MS-DOS/Windows it gets translated to CR/LF. When your import runs on the Unix-formatted file, it sees only a LF. Hence, it's often easier to run the file through unix2dos first. But as you said in your original question, you don't want to do this (I'll assume there is a good reason why you can't).
Why can't you do:
(ROWTERMINATOR = CHAR(10))
Probably because when the SQL code is being parsed, it is not replacing the CHAR(10) with the LF character (because it's already encased in single quotes). Or perhaps it's being interpreted as:
(ROWTERMINATOR =
)
What happens when you echo out the contents of @bulk_cmd?