SSIS: convert escaped characters to ASCII (non-printable characters)

I use SSIS to migrate some CSV files (from UNIX) into SQL Server. One CSV file contains data like:
ID,Name,Desc
1,12345\t,12345\177
If the schema for Name or Desc must be varchar(6), the migration fails because the non-printable characters are represented as the escape sequences \t, \177, ..., which makes each value longer than six characters.
How can I convert the escaped characters to non printable characters in SSIS?
\t to Char(9)
\177 to Char(127)
Is there a better solution?

You should be able to use the .NET regular-expression method Regex.Unescape to do this. I would suggest loading the assembly and doing it in SQL Server using a temporary table; see http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
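The same conversion can also be sketched outside SQL Server, for example as a pre-processing step or as logic to port into an SSIS Script Component. This is a minimal Python sketch, not the .NET Regex.Unescape route the answer describes: it swaps in an explicit regular expression so the mapping is visible. It assumes the UNIX export only uses \t, \n, \r, \\ and octal escapes of up to three digits such as \177; unescape_field is a hypothetical helper, not an SSIS or .NET API.

```python
import csv
import io
import re

# Assumption: the export uses C-style escapes only (\t, \n, \r, \\ and octal \ooo).
_ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|(.))", re.DOTALL)
_SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}


def unescape_field(text: str) -> str:
    """Hypothetical helper: turn literal escape sequences into the real characters."""
    def repl(match: re.Match) -> str:
        octal, char = match.group(1), match.group(2)
        if octal is not None:
            return chr(int(octal, 8))             # \177 -> chr(127)
        return _SIMPLE.get(char, match.group(0))  # unknown escapes are left as-is
    return _ESCAPE.sub(repl, text)


sample = "ID,Name,Desc\n1,12345\\t,12345\\177\n"
rows = [[unescape_field(f) for f in row] for row in csv.reader(io.StringIO(sample))]
print(rows[1])                    # ['1', '12345\t', '12345\x7f']
print([len(f) for f in rows[1]])  # [1, 6, 6]
```

After unescaping, both values are six characters long and fit the varchar(6) columns.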

Related

SQL: How can I copy a CSV file into a table with this delimiter problem?

I'm trying to copy a CSV file into a table. The delimiter is ',', but the file has a field named 'Description' whose text also contains commas that are not meant as delimiters.
How could I copy the CSV file into the Import table?
If the comma is always within double quotes, it shouldn't be a problem.
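To illustrate why quoting matters: a CSV-aware parser treats a comma inside double quotes as part of the value rather than as a delimiter. This minimal Python sketch uses the standard csv module; the column layout (ID, Description, Price) is a hypothetical example, not the asker's actual file.

```python
import csv
import io

# Hypothetical sample: the Description value is quoted, so its commas are data.
sample = 'ID,Description,Price\n1,"Red, large, cotton",9.99\n2,Plain,4.50\n'

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
# ['ID', 'Description', 'Price']
# ['1', 'Red, large, cotton', '9.99']
# ['2', 'Plain', '4.50']
```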
If not, you have a corrupt CSV file. The simplest fix is probably to parse the file and repair it before importing.
The details of how exactly to parse it will depend on the dataset. Which fields are optional? Which fields are compulsory? How many commas can occur at most? That kind of information is crucial for writing a parsing script.

Exporting SQL Server table containing a large text column

I have to export a table from SQL Server. The table contains a column with large text content; the longest values are up to 100,000 characters.
When I use Excel as the export destination, I find that this text is capped and truncated to 32,765 characters.
Is there an export format that preserves the length?
Note:
I will eventually be importing this data into another SQL Server
The destination SQL Server is in another network, so linked servers and other local options are not feasible
I don't have access to the actual server, so generating a backup is difficult
As documented in Excel's specifications and limits, the maximum number of characters that can be stored in a single Excel cell is 32,767, which is why your data is being truncated.
You might be better off exporting to a CSV file. Note, however, that quote-identified CSV files aren't supported by bcp/BULK INSERT until SQL Server 2019 (currently in preview). You can use a character sequence like || as the field delimiter; however, if your data contains line breaks, you'll need to choose a different row delimiter too. SSIS and other ETL tools, on the other hand, do support quote-identified CSV files, so you can use something like that instead.
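The custom-delimiter idea can be checked with a round trip before exporting anything. This minimal Python sketch writes and reads back a file that uses || between fields and a separate row terminator, so values containing commas and line breaks survive. It assumes the character | never appears in the data; the terminator strings and the dump/load helpers are hypothetical choices, and the same strings would then have to be given to the import tool as its field and row terminators.

```python
# Hypothetical terminators, assuming "|" never appears in the data itself.
FIELD_SEP = "||"
ROW_SEP = "|~|\n"


def dump(rows: list[list[str]]) -> str:
    """Serialise rows with custom field/row terminators instead of CSV quoting."""
    for row in rows:
        for value in row:
            if "|" in value:
                raise ValueError(f"delimiter character found in data: {value[:20]!r}")
    return "".join(FIELD_SEP.join(row) + ROW_SEP for row in rows)


def load(text: str) -> list[list[str]]:
    """Inverse of dump(): every row ends with ROW_SEP, so the trailing empty piece is dropped."""
    return [chunk.split(FIELD_SEP) for chunk in text.split(ROW_SEP)[:-1]]


rows = [["1", "first line\nsecond line, with a comma"], ["2", "x" * 100_000]]
# Line breaks, commas and 100,000-character values survive the round trip.
assert load(dump(rows)) == rows
```

Checking for the delimiter up front is the important design choice: if it ever occurs in a value, the import silently splits that row in the wrong place.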
Otherwise, if you need to export such long values and want to use Excel as much as you can (which I personally don't recommend, due to those awful ACE drivers), I would suggest exporting the (n)varchar(MAX) values to something else, such as text files, and naming each file after the row's primary key value. Then, when you import the data, you can read each (n)varchar(MAX) value back from its file.
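The one-file-per-row approach is easy to sketch. This minimal Python example writes each (primary key, long text) pair to a file named after the key and reads it back. The helper names and the <key>.txt naming scheme are hypothetical, and it assumes the keys are simple values (such as integers) that are safe to use as file names.

```python
import tempfile
from pathlib import Path


def export_long_values(rows, folder: Path) -> None:
    """Write each (primary_key, text) pair to <folder>/<primary_key>.txt as UTF-8."""
    for key, text in rows:
        # Write bytes rather than text so line endings inside the value are kept exactly.
        (folder / f"{key}.txt").write_bytes(text.encode("utf-8"))


def import_long_value(key, folder: Path) -> str:
    """Read the value for one primary key back from its file."""
    return (folder / f"{key}.txt").read_bytes().decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    export_long_values([(1, "x" * 100_000), (2, "line 1\r\nline 2")], folder)
    print(len(import_long_value(1, folder)))  # 100000
```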
A .sql script is the best format for a SQL table. It is the native format for a SQL table, so you don't have to convert the export.
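A .sql export typically means a script of INSERT statements. The main detail to get right when producing one is quoting: in T-SQL, a single quote inside a string literal is escaped by doubling it, and the N prefix marks the literal as Unicode. This minimal Python sketch is hypothetical (sql_literal, insert_statements and the table dbo.Documents are illustrative names). It handles only None, bool, int and strings, and renders anything else as a Unicode string literal.

```python
def sql_literal(value) -> str:
    """Render a Python value as a T-SQL literal (sketch: None, bool, int, str)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, int):
        return str(value)
    # Double embedded single quotes; N'...' keeps the literal as nvarchar.
    return "N'" + str(value).replace("'", "''") + "'"


def insert_statements(table: str, columns, rows):
    """Yield one INSERT statement per row; column names are bracket-quoted."""
    column_list = ", ".join(f"[{c.replace(']', ']]')}]" for c in columns)
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        yield f"INSERT INTO {table} ({column_list}) VALUES ({values});"


for statement in insert_statements("dbo.Documents", ["Id", "Body"], [(1, "It's long")]):
    print(statement)
# INSERT INTO dbo.Documents ([Id], [Body]) VALUES (1, N'It''s long');
```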

Japanese ANSI character in CSV file

I have a CSV file generated by a Japanese source system. The Japanese characters are shown as below: ¬¼ˆã—Ê튔Ž®‰ïŽÐ ‘åã‰c‹ÆŠ. I have changed the file type to UTF-8 and also changed the ETL settings accordingly, but that only works for new data.
How can I fix the existing data in my table, which shows characters like ‘åã‰c‹ÆŠ?
Is it possible to get the original Japanese characters back using SQL functions? I am using SQL Server as the database.

Registered Symbol not getting inserted as-is in table

I am working on Oracle 10gR2.
The database character sets are:
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET AL32UTF8
I receive the data to be processed in TXT files. The first processing step is creating external tables based on these flat files. One of the fields in the flat file (and the corresponding column in the database) contains string data that includes ® (the registered symbol). The character is visible in the TXT file, but when I query the external table, it shows up as �.
I have changed the encoding of the IDE in which I view the query output to UTF-8.
The data type of the column is: COL NVARCHAR2(1000)
What could be causing this?
Generally, this is caused by an incorrect setting of the NLS_LANG environment variable. NLS_LANG must tell Oracle which encoding your data uses. If NLS_LANG is unset, Oracle assumes ASCII text (and your symbol is non-ASCII).
If your data is UTF-8, try:
NLS_LANG=.AL32UTF8
For Windows/ISO-8859 data, try:
NLS_LANG=.WE8ISO8859P15
You NEED to determine the encoding of your text file first. Use a hex editor to check whether the ® symbol is encoded as UTF-8.
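The hex-editor check can also be scripted. In UTF-8 the ® sign (U+00AE) is the two bytes C2 AE, while in ISO-8859-1, ISO-8859-15 and Windows-1252 it is the single byte AE. This minimal Python sketch applies that heuristic. registered_sign_encoding is a hypothetical helper, and in practice the bytes would come from reading the flat file in binary mode.

```python
def registered_sign_encoding(data: bytes) -> str:
    """Heuristic: report how the ® sign (U+00AE) appears to be stored in raw bytes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and "\u00ae" in text:
        return "utf-8"        # ® is the two bytes C2 AE
    if text is None and b"\xae" in data:
        return "single-byte"  # ® is the byte AE (ISO-8859-1/-15, Windows-1252)
    return "unknown"


sample = "Acme® Widget".encode("utf-8")
print(sample.hex(" "))                   # the hex-editor view of the same bytes
print(registered_sign_encoding(sample))  # utf-8
```

If the result is "single-byte", the file is not UTF-8, and NLS_LANG (or the external table's character set) should describe the single-byte encoding instead of AL32UTF8.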

bcp and backspace (^H) delimiter

I need to parse a flat file that uses the backspace (^H) character as the delimiter between fields, and insert the data into SQL Server 2005 tables. I tried to use the bcp utility with a format file, but I wasn't able to specify backspace as the delimiter.
The default is tab (\t). Several other delimiters can be specified as well, but none of them is backspace. Does anyone have any ideas?
I also need to export data from a SQL Server table to a fixed-length flat file. I tried to use a non-XML format file, but it always asks for a delimiter. How can I create a flat file using bcp without any delimiter between the fields?
Both files are character (text) files.
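For the fixed-length export, one fallback is to export with bcp using any delimiter and pad the fields afterwards. This minimal Python sketch shows the padding step. The column widths are hypothetical, and left-aligning with trailing spaces and raising an error on overflow are assumptions about the target format (numeric fields are often right-aligned or zero-padded instead).

```python
# Hypothetical layout: column widths for ID, Name and Price.
WIDTHS = [5, 20, 10]


def to_fixed_width(row, widths=WIDTHS) -> str:
    """Pad every field to its width with trailing spaces; no delimiter is written."""
    if len(row) != len(widths):
        raise ValueError("row and widths must have the same number of columns")
    cells = []
    for value, width in zip(row, widths):
        text = "" if value is None else str(value)
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} characters")
        cells.append(text.ljust(width))
    return "".join(cells)


print(repr(to_fixed_width([1, "Widget", "9.99"])))
```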
This is an ugly workaround, but you could always find a character or string that never occurs in the flat file, replace every backspace in the file with it, and then use it as the column terminator (bcp -t <terminator>).
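The replacement step of that workaround can be sketched in a few lines of Python. The terminator string |~| is a hypothetical choice, and the function refuses to run if that string already occurs in the data, because the split would then be wrong.

```python
BACKSPACE = "\x08"  # ^H, the field delimiter in the source file


def swap_delimiter(text: str, replacement: str = "|~|") -> str:
    """Replace every backspace with a terminator string that bcp can be given via -t."""
    if replacement in text:
        raise ValueError(f"{replacement!r} already occurs in the data; choose another")
    return text.replace(BACKSPACE, replacement)


print(swap_delimiter("1\x08Widget\x089.99\n"))  # 1|~|Widget|~|9.99
```

When processing the real file, open it with newline="" so its line endings are preserved. Then load the converted file with something like bcp dbo.Target in converted.txt -c -t "|~|" (the table and file names are hypothetical).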
Sorry that I'm almost 11 years late on this; hopefully you've already solved your problem. You can use the hexadecimal representation of the backspace character, 0x08, to parse your input file and correctly delimit fields that are separated by a backspace character.