BULK INSERT is not working correctly - sql

I used BULK INSERT in SQL Server Management Studio 2008 R2 to load 10 words from a UTF-8 text file into a single column.
However, the words do not appear correctly; I get an extra space in front of some words.
Note: None of the answers have solved my problem, so far. :(
[Screenshot of the problem]

This issue may occur if you are not using the correct collation (language settings). You need to use the appropriate collation in order to display your data in the correct format.
See the link http://technet.microsoft.com/en-us/library/ms187582(v=sql.105).aspx for more details.
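For example, a collation can be set per column. A minimal sketch, assuming an existing dbo.Words table with a Word column (the names and the collation are illustrative):

ALTER TABLE dbo.Words
ALTER COLUMN Word nvarchar(100) COLLATE French_CI_AS;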

You can also try using a different row terminator:
BULK INSERT table_name
FROM 'filename.txt'
WITH (ROWTERMINATOR = '\n')

Look at this post How to write UTF-8 characters using bulk insert in SQL Server?
Quote: You can't. You should first use an N type data field, convert
your file to UTF-16 and then import it. The database does not support
UTF-8.
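For example, after re-saving the file as UTF-16 LE ("Unicode" in Notepad), the import could look like the sketch below; the table and file names are illustrative:

BULK INSERT dbo.Words
FROM 'C:\data\words_utf16.txt'
WITH (DATAFILETYPE = 'widechar', -- read the file as UTF-16 LE
      ROWTERMINATOR = '\n');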
Original answers
Look at the encoding of your text file; it should be UTF-8. If not, this could cause problems.
Open the file with Notepad, choose File > Save As, and pick the encoding.
After this, try the bulk import again.
Secondly, make sure the column datatype is nvarchar and not varchar. Also see here.
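For example (table and column names are illustrative):

CREATE TABLE dbo.Words
(
    Word nvarchar(100) -- nvarchar, not varchar, so Unicode text survives
);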

Related

DB2 to SQL LinkedServer OpenQuery NonAscii Character Issue

So I've been scouring SO for an answer, and I've seen some great SQL functions to help remove non-ASCII characters from my db, but I wanted to post the entire question/process here first to see if maybe there is a fix upstream, on my select from DB2 into SQL.
What I'm doing: Getting data from a DB2 database into SQL
Issue: Non-ASCII characters causing problems
Process: It's pretty simple. I have a SQL INSERT statement that selects a bunch of columns from a DB2 linked server using OPENQUERY:
INSERT INTO [table] (stuff)
SELECT (stuff)
FROM OPENQUERY(SSF400, 'select stuff from table')
However, in my SQL db, when editing the landed table, I'm getting weird trailing characters that appear as a space in a SQL select statement but are actually artifacts in SQL Edit mode.
I've tried using a few functions I found here on SO to strip these characters, but after these functions I'm left with a combination of Greek/English characters.
I'm thinking there must be a better way for me to do the initial insert, other than using OPENQUERY, so that the junk characters don't come over. I know SQL pretty well, but DB2 not so much... any advice?
Update: There does seem to be a junk character or two in the source system, discovered using iNavigator. Also, the source system is DB2 V7R3M0.
Update: Here is a screenshot of the regexp expression mentioned in the comments, used in a query in iNavigator: although several characters were removed, some do remain. The original column is on the left, the cleansed column on the right.
Cheers,
MD
I would try REGEXP_REPLACE(stuff,'[^\u0020-\u007E\u0009\u000A\u000D]+',''), which will remove everything that is not a character from the 7-bit ASCII set, but also removes any 7-bit ASCII control characters apart from Tab, New Line, and Carriage Return. It also removes DEL.
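Applied to the insert from the question, it could look like the sketch below; note the doubled single quotes inside the OPENQUERY string, and that REGEXP_REPLACE availability depends on the DB2 version:

INSERT INTO [table] (stuff)
SELECT stuff
FROM OPENQUERY(SSF400,
    'SELECT REGEXP_REPLACE(stuff, ''[^\u0020-\u007E\u0009\u000A\u000D]+'', '''') AS stuff
     FROM table');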

SQL Server Bulk Insert - 0 row(s) affected

I am trying to do a bulk insert of a .CSV from a remote location.
My SQL statement is:
BULK INSERT dbo.tblMaster 
FROM '\\ZAJOHVAPFL20\20ZA0004\E\EDData\testbcp.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n')
My .CSV looks like this:
john,smith
jane,doe
The CSV is saved with UTF-8 encoding, and there is no blank line at the bottom of the file. The table that I am bulk inserting into is also empty.
The table has two columns; firstname (nvarchar(max)) and secondname (nvarchar(max)).
I have sysadmin rights on the server so have permission to perform bulk inserts.
When running the SQL, it runs without error and simply shows
0 row(s) affected
and doesn't insert any information.
Any help is greatly appreciated.
I know this may be too late to answer, but I thought this might help anyone looking for the fix. I had a similar issue with BULK INSERT but didn't find any fix online. Most probably the flat/CSV file was generated in a non-Windows format. If you can open the file in Notepad++, go to the Edit tab and change the EOL Conversion to "Windows Format". This fixed the problem for me.
Notepad++>> Edit >> EOL Conversion >> Windows Format
When you specify \n as a row terminator for bulk export, or implicitly use the default row terminator, it outputs a carriage return-line feed combination (CRLF) as the row terminator. If you want to output a line feed character only (LF) as the row terminator - as is typical on Unix and Linux computers - use hexadecimal notation to specify the LF row terminator. For example:
ROWTERMINATOR='0x0A'
In that case there is no need to do the following:
Notepad++>> Edit >> EOL Conversion >> Windows Format
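Applied to the statement from the question, that becomes:

BULK INSERT dbo.tblMaster
FROM '\\ZAJOHVAPFL20\20ZA0004\E\EDData\testbcp.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '0x0a'); -- LF only, for files with Unix-style line endings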
I opened the CSV with Excel and hit Control + S to resave it. That fixed the issue for me.
Try inserting the file using bcp.exe first and see if you get any rows or any errors. The problem with
BULK INSERT ...
FROM '\\REMOTE\SHARE\...'
is that you're now bringing impersonation and delegation security into the picture, which makes the issue more difficult to diagnose. When you access a remote share like this you are actually doing a 'double-hop' Kerberos impersonation (aka delegation) and you need special security set up. Read Bulk Insert and Kerberos for the details.
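For example, a first test against a local copy of the file might look like this; the database and server names are placeholders (-T uses a trusted connection, -c uses character mode, -t, sets the comma field terminator):

bcp MyDatabase.dbo.tblMaster in "C:\temp\testbcp.csv" -S MyServer -T -c -t,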
The problem is, at least in part, the UTF-8 encoding. That is not supported by default. If you are using SQL Server 2016 then you can specify Code Page 65001 (i.e. add , CODEPAGE = '65001' to the WITH clause). If using an earlier version of SQL Server, then you need to first convert the file encoding to UTF-16 Little Endian (known as "Unicode" in the Microsoft universe). That can be done either when saving the file or by some command line utility.
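A sketch of the SQL Server 2016 variant, reusing the statement from the question:

BULK INSERT dbo.tblMaster
FROM '\\ZAJOHVAPFL20\20ZA0004\E\EDData\testbcp.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      CODEPAGE = '65001'); -- UTF-8, supported from SQL Server 2016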

Encoding in Oracle database

I have a problem when inserting values into my Oracle database. I have to insert French characters like à or è, and when I try to insert them through an INSERT statement they get converted to ¿ or ?.
Is there any way to set the encoding for that specific script, or what can I do in this situation?
Thank you
Usually you would set the character set when you install your database. You can, however, change it post-setup if required (Look up CSALTER). If your database needs to support multiple languages, then you should take a look at this: Supporting Multilingual Databases with Unicode
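To see what character sets the database currently uses, you can query the data dictionary:

SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');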
I fixed this problem by adding an environment variable called NLS_LANG with the value .AL32UTF8. This worked even though the database has American as the language and America as the territory. The problem I then faced was that once I changed the NLS_LANG variable, it started to encode my characters in the application as well.
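On Windows, for example, the variable can be set for a single session before launching the client:

set NLS_LANG=.AL32UTF8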
You can also try changing the encoding of the script that you are running. For example, I used ANSI encoding (open the script in Notepad++ and, from the Encoding menu, select Convert to ANSI) and it worked properly.
Thank you guys for your help :)

sql server conditional replace of csv data

I have a csv file that I am trying to import using BULK INSERT. The problem is that there is a field in the file that will be quoted (with double quotes) if a comma exists within the text (not quoted if no comma exists). The existence of the extra comma is causing SQL Server to throw errors because of an incorrect number of columns during the insert.
Here is a sample data set:
928 Riata Dr,Magnolia,TX,77354,4/15/2014
22 Roberts Ave.,McKinney,TX,75069,4/15/2014
"5531 Trinity Place, #22",San Antonio,TX,78212,4/15/2014
As you can see, the third row contains a comma within the address field, so the address field is quoted. Since the BULK INSERT command is throwing errors because of this, I'm assuming I will need to scrub the file contents before attempting to load it, unless someone has a better solution.
To scrub the file contents I will need to open the file (with SQL), read in the contents, and do a conditional replacement of the internal comma (found within the quotes). Since that comma doesn't really need to exist, I can just replace it with '' (blank).
Then, I can handle the quotes separately after the data gets imported with an update statement to replace any other characters I don't want.
I think the logic is sound; the problem is the syntax. I can't seem to find any syntax related to REGEX in SQL Server (Booo Microsoft), which means I need some other way to determine whether the comma appears within quotes, and replace it if so.
Any thoughts, Suggestions, Code, etc.?
Thanks in advance.
This sounds too simple on the face of it, but if you can just replace the commas, can you open the csv in, say, Excel or OpenOffice Calc, and then do a find replace (commas with nothing)? I just tried with a csv of mine and it worked fine. The csv remains properly delimited.
Maybe I am missing something that prevents this, such as Excel opening this with extra cells due to the comma, in which case my answer is stupid. But it would make more sense to handle this in a spreadsheet app rather than after opening with SQL.
You may have to try delimiting with something other than commas, such as tabs. I've had to do this with SQL imports before. In many cases you can save as a tab-delimited txt file and upload to SQL.
Note that using Excel for this type of thing can be its own problem. For help with Excel and tab delimited SQL imports, see my answer here.
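As a footnote to the question above: on SQL Server 2017 and later, BULK INSERT can parse quoted CSV fields natively, which avoids the scrub entirely. A minimal sketch, assuming the sample rows are saved as addresses.csv and the table name is illustrative:

BULK INSERT dbo.Addresses
FROM 'C:\data\addresses.csv'
WITH (FORMAT = 'CSV',        -- RFC 4180 parsing (SQL Server 2017+)
      FIELDQUOTE = '"',      -- honors double-quoted fields containing commas
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n');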

How to insert Arabic characters into SQL database?

How can I insert Arabic characters into a SQL Server database? I tried to insert Arabic data into a table and the Arabic characters in the insert script were inserted as '??????' in the table.
I tried to directly paste the data into the table through SQL Server Management Studio and the Arabic characters were successfully and accurately inserted.
I looked around for resolutions to this problem and some threads suggested changing the datatype to nvarchar instead of varchar. I tried this as well but without any luck.
How can we insert Arabic characters into a SQL Server database?
For the field to be able to store unicode characters, you have to use the type nvarchar (or other similar like ntext, nchar).
To insert the unicode characters in the database you have to send the text as unicode by using a parameter type like nvarchar / SqlDbType.NVarChar.
(For completeness: if you are creating SQL dynamically (against common advice), you put an N before a string literal to make it unicode. For example: insert into table (name) values (N'Pavan').)
I guess the solution is to first change the field to ntext, then write N before the value. For example:
insert into eng(Name) values(N'حسن')
If you are trying to load data directly into the database like me, I found a great way to do so by creating a table using Excel and then exporting it as CSV. Then I used the database browser SQLite to import the data correctly into the SQL database. You can then adjust the table properties if needed. Hope this helps.