How to retrieve German characters from a large CSV file into a SQL Server 2017 script

I have a CSV file containing a list of employees, some of whom have German characters like 'ö' in their names. I need to create a temp table in my SQL Server 2017 script and fill it with the contents of the CSV file. My script is:
CREATE TABLE #AllAdUsers(
[PhysicalDeliveryOfficeName] [NVARCHAR](255) NULL,
[Name] [NVARCHAR](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DisplayName] [NVARCHAR](255) NULL,
[Company] [NVARCHAR](255) NULL,
[SAMAccountName] [NVARCHAR](255) NULL
)
--import AD users
BULK INSERT #AllAdUsers
FROM 'C:\Employees.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
However, even though I use the NVARCHAR type with the collation SQL_Latin1_General_CP1_CI_AS, the German characters do not come through correctly; for instance, "Kösker" appears as:
"K├╢sker"
I've tried many other collations but couldn't find a fix for it. Any help would be very much appreciated.
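Collation only controls how strings compare and sort; it does not tell BULK INSERT how to decode the bytes in the file. "K├╢sker" is what the UTF-8 byte sequence for "ö" looks like when decoded under the default OEM code page, which suggests the file is saved as UTF-8. A hedged sketch of a likely fix, assuming the file really is UTF-8 (BULK INSERT accepts CODEPAGE = '65001' from SQL Server 2016 onward):
BULK INSERT #AllAdUsers
FROM 'C:\Employees.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
CODEPAGE = '65001', --decode the file as UTF-8 instead of the OEM code page
TABLOCK
)
If the file is saved as UTF-16 instead, use DATAFILETYPE = 'widechar' in place of CODEPAGE.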

Related

SQL Insert Out of Sync

I have a bit of SQL here which is throwing an error:
DROP TABLE HACP_TEMP_PIC_HCV_Imported;
CREATE TABLE HACP_TEMP_PIC_HCV_Imported
(
HeadSSN varchar(255) NOT NULL,
HeadFName varchar(255) NOT NULL,
HeadMName varchar(255),
HeadLName varchar(255) NOT NULL,
ModifiedDate varchar(255) NOT NULL,
ActionType varchar(255) NOT NULL,
EffectiveDate varchar(255) NOT NULL
);
BULK INSERT HACP_TEMP_PIC_HCV_Imported
FROM 'C:\Work\MTWAdhocReport.csv'
WITH
(
FIRSTROW = 11,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\Work\Import_ErrorRows_HCV.csv',
TABLOCK
);
UPDATE HACP_TEMP_PIC_HCV_Imported
SET HeadSSN = REPLACE(HeadSSN, '"', ''),
HeadFName = REPLACE(HeadFName, '"', ''),
HeadMName = REPLACE(HeadMName, '"', ''),
HeadLName = REPLACE(HeadLName, '"', ''),
ModifiedDate = REPLACE(ModifiedDate, '"', ''),
ActionType = REPLACE(ActionType, '"', ''),
EffectiveDate = REPLACE(REPLACE(EffectiveDate, '"', ''),',','');
DROP TABLE HACP_PIC_HCV_Imported;
CREATE TABLE HACP_PIC_HCV_Imported
(
HeadSSN varchar(255) NOT NULL,
HeadFName varchar(255) NOT NULL,
HeadMName varchar(255),
HeadLName varchar(255) NOT NULL,
ModifiedDate varchar(255) NOT NULL,
ActionType int NOT NULL,
EffectiveDate varchar(255) NOT NULL
);
INSERT INTO HACP_PIC_HCV_Imported(HeadSSN, HeadFName, HeadMName, HeadLName, ModifiedDate, ActionType, EffectiveDate)
SELECT
LTRIM(HeadSSN),
LTRIM(HeadFName),
LTRIM(HeadMName),
LTRIM(HeadLName),
LTRIM(ModifiedDate),
CONVERT(int, LTRIM(ActionType)),
LTRIM(EffectiveDate)
FROM
HACP_TEMP_PIC_HCV_Imported;
Stepping through this, creating the temp table and importing the CSV into it works fine. Updating the table to remove quotes and a trailing comma from the EffectiveDate column works. Creating the new table proper works.
When trying to copy the data into the second table (and converting ActionType into an INT), I get this error message:
Conversion failed when converting the varchar value '4/07/2016' to data type int.
That data is the second row value in ModifiedDate, so the columns are apparently getting out of sync after importing the first row. I have double-checked that all of the data is in the proper columns after being imported into the temp table initially.
Any thoughts? I feel like I'm missing something obvious.
Your code suggests that you are using "proper" CSV format, which allows fields to be enclosed in double quotes. These quoted fields can contain commas. This is the format produced and read by Excel.
My guess is that you have a comma in such a quoted field and this is throwing off the import.
But this format is not read properly by BULK INSERT. Ironically, at least one other database does import CSV files with commas embedded in quoted fields.
In the past when I've had this problem, it has only been on smallish files. I simply loaded the data into Excel and then saved it out using tabs or vertical bars as delimiters. This solved the problem in my case.
I'm not sure if there is a more advanced solution now, but I'm pretty sure your problem is that some of the text fields have embedded commas.
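If a newer SQL Server is an option, BULK INSERT gained native CSV support in SQL Server 2017: FORMAT = 'CSV' together with FIELDQUOTE makes it honor quoted fields with embedded commas, which would also make the quote-stripping UPDATE unnecessary. A sketch, untested against this particular file:
BULK INSERT HACP_TEMP_PIC_HCV_Imported
FROM 'C:\Work\MTWAdhocReport.csv'
WITH
(
FORMAT = 'CSV', --parse quoted fields properly (SQL Server 2017+)
FIELDQUOTE = '"', --the quote character; double quote is the default
FIRSTROW = 11,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
);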

Unexpected EOF encountered in BCP

Trying to import data into Azure.
Created a text file in Management Studio 2005.
I have tried both comma- and tab-delimited text files.
BCP IN -c -t, -r\n -U -S -P
I get the error: [SQL Server Native Client 11.0]Unexpected EOF encountered in BCP data file
Here is the script I used to create the file:
SELECT top 10 [Id]
,[RecordId]
,[PracticeId]
,[MonthEndId]
,ISNULL(CAST(InvoiceItemId AS VARCHAR(50)),'') AS InvoiceItemId
,[Date]
,[Number]
,[RecordTypeId]
,[LedgerTypeId]
,[TargetLedgerTypeId]
,ISNULL(CAST(Tax1Id as varchar(50)),'')AS Tax1Id
,[Tax1Exempt]
,[Tax1Total]
,[Tax1Exemption]
,ISNULL(CAST([Tax2Id] AS VARCHAR(50)),'') AS Tax2Id
,[Tax2Exempt]
,[Tax2Total]
,[Tax2Exemption]
,[TotalTaxable]
,[TotalTax]
,[TotalWithTax]
,[Unassigned]
,ISNULL(CAST([ReversingTypeId] AS VARCHAR(50)),'') AS ReversingTypeId
,[IncludeAccrualDoctor]
,12 AS InstanceId
FROM <table>
Here is the table it is inserted into
CREATE TABLE [WS].[ARFinancialRecord](
[Id] [uniqueidentifier] NOT NULL,
[RecordId] [uniqueidentifier] NOT NULL,
[PracticeId] [uniqueidentifier] NOT NULL,
[MonthEndId] [uniqueidentifier] NOT NULL,
[InvoiceItemId] [uniqueidentifier] NULL,
[Date] [smalldatetime] NOT NULL,
[Number] [varchar](17) NOT NULL,
[RecordTypeId] [tinyint] NOT NULL,
[LedgerTypeId] [tinyint] NOT NULL,
[TargetLedgerTypeId] [tinyint] NOT NULL,
[Tax1Id] [uniqueidentifier] NULL,
[Tax1Exempt] [bit] NOT NULL,
[Tax1Total] [decimal](30, 8) NOT NULL,
[Tax1Exemption] [decimal](30, 8) NOT NULL,
[Tax2Id] [uniqueidentifier] NULL,
[Tax2Exempt] [bit] NOT NULL,
[Tax2Total] [decimal](30, 8) NOT NULL,
[Tax2Exemption] [decimal](30, 8) NOT NULL,
[TotalTaxable] [decimal](30, 8) NOT NULL,
[TotalTax] [decimal](30, 8) NOT NULL,
[TotalWithTax] [decimal](30, 8) NOT NULL,
[Unassigned] [decimal](30, 8) NOT NULL,
[ReversingTypeId] [tinyint] NULL,
[IncludeAccrualDoctor] [bit] NOT NULL,
[InstanceId] [tinyint] NOT NULL,
CONSTRAINT [PK_ARFinancialRecord] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
There are several hundred thousand records, and I have done this from a different server, the only difference being the version of Management Studio.
If the file is tab-delimited, then the command-line flag for the column separator should be -t\t rather than -t,
Just an FYI that I encountered this same exact error, and it turned out that my destination table contained one more column than the DAT file!
"Unexpected EOF" normally means the column or row terminator is not what you expect.
That is, your command-line arguments for these do not match the file.
Typical causes:
Unix vs Windows line endings
Text data containing your column delimiter (comma in actual data)
Or a mix of the two.
SSMS should have nothing to do with it: it's the format (expected vs actual) that matters
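One concrete illustration of the line-ending cause, as a hedged BULK INSERT sketch (the table name and path are made up): bcp and BULK INSERT generally treat '\n' as an implicit \r\n on import, so a file with Unix (LF-only) line endings needs the terminator spelled out in hex:
BULK INSERT dbo.StagingTable --hypothetical target table
FROM 'C:\Work\unix_data.txt' --hypothetical path
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '0x0a' --bare LF, i.e. Unix line endings
);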
I think most of us prefer real-world examples to syntax hints, so here's what I did:
bcp LoadDB.dbo.test in C:\temp\test.txt -S 123.66.108.207 -U testuser -P testpass -c -r /r
My data was an extract from a Unix-based Oracle DB which was tab-delimited and had an LF end-of-line character.
Because my data was tab-delimited, I did not specify a -t parameter; the bcp default is tab.
Because my row terminator was a linefeed (LF) character, I used -r /r
Because my data was all being loaded into char fields, I used the -c parameter
In every case where I have encountered this error, it has turned out to be an issue where the number of columns in the table does not match the number of delimited columns in the text file. The easy way to confirm this is to load the text file into Excel and compare its column count to that of the table.
I will share my experience with this issue. My users were sending me files in UTF-8 encoding and everything was working fine. My load started to fail when they changed the encoding to "UCS-2 LE BOM" (use Notepad++ to check this setting).
Reverting to UTF-8 fixed my problem.
This link helped me resolve my issue.
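If re-encoding the file isn't possible, the other direction is to tell the tool the file contains wide (UTF-16) data: bcp has the -w switch for that, and the BULK INSERT counterpart is sketched below (the staging table and path are hypothetical):
BULK INSERT dbo.WideStaging
FROM 'C:\Work\data_ucs2.txt'
WITH
(
DATAFILETYPE = 'widechar' --expect UTF-16 LE (UCS-2 LE) data
);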
Open the CSV file in Excel and "Save As" a new CSV file.
I was facing the same error while trying to bcp records from a data file into a table. A workaround that works is to open the file in Notepad++ or a similar editor and add an extra line at the end of the file. This worked for my case: field separator |^|, row separator newline (CRLF).
Command used: bcp in -T -c -t"|^|"
In my case the issue was that the record I was trying to import had an invalid foreign key
The answer to this puzzle is insidious.
I spent time that I can never get back...
If you're on Windows, use Notepad++ and, under the "Encoding" menu, change it to:
UCS-2 LE BOM
LE = Little Endian...
Such a hateful error! And I just installed SQL Server 2019 and the latest SQLCMD/BCP tools. Seems this error has been around for a while.
This guy saved my life: https://shades-of-orange.com/post/Unexpected-EOF-encountered-in-BCP-data-file

SQL Server 2005 - Bulk Insert failing

I have a txt file that contains 1600 rows and 82 columns of comma-delimited data that I am trying to import into a table. I get the following error on every row, on the very last field:
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 81 (DB252D20C8).
The import statement is
BULK
INSERT [ENERGY].[dbo].[READINGS1]
from 'c:\readings2.txt'
with
(
DATAFILETYPE='widechar',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
The table structure is as follows, the top and bottom of the script:
USE [ENERGY]
GO
/****** Object: Table [dbo].[READINGS1] Script Date: 05/13/2013 20:00:30 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[READINGS1](
[DateAndTime] [datetime] NOT NULL,
[DB240D4C7] [float] NULL,
[DB240D8C7] [float] NULL,
[DB240D12C7] [float] NULL,
[DB240D16C7] [float] NULL,
[DB252D12C8] [float] NULL,
[DB252D16C8] [float] NULL,
[DB252D20C8] [float] NULL,
CONSTRAINT [READINGS1DataTimeStamp] PRIMARY KEY CLUSTERED
(
[DateAndTime] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
The text file is as follows:
2013-02-19 00:00:00.000,6,945,1886,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,22,2040,6299,0,0,6,567,1248,0,0,251,8859,8655,0,0,10,316,1786,0,0,7,180,1206,0,0,1,16,56,0,0,368,18953,36949,0,0,NULL,NULL
2013-02-19 01:00:00.000,6,147,1886,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,22,1516,6299,0,0,3,115,1248,0,0,250,5077,8655,0,0,9,219,1786,0,0,5,147,1206,0,0,1,15,56,0,0,362,8907,36949,0,0,NULL,NULL
Alright, so what you need to do is alter your statement so that the WITH options include KEEPNULLS. This informs SQL Server that you wish to keep your null values. Currently it is trying to convert the string 'NULL' into your FLOAT column. Alter your statement to look like this:
BULK
INSERT [ENERGY].[dbo].[READINGS1]
from 'c:\readings2.txt'
with
(
DATAFILETYPE='widechar',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
KEEPNULLS
)
GO
There is an article on this in Books Online (BOL).
Otherwise you can always build an Integration Services package to handle this. That is an easy, fast way to import information from flat-file sources.
It turns out that there were too many fields in the input text file for the table.

Sql Server - Insufficient result space to convert uniqueidentifier value to char

I am getting the below error when I run a SQL query copying data from one table to another:
Msg 8170, Level 16, State 2, Line 2
Insufficient result space to convert
uniqueidentifier value to char.
My SQL query is:
INSERT INTO dbo.cust_info (
uid,
first_name,
last_name
)
SELECT
NEWID(),
first_name,
last_name
FROM dbo.tmp_cust_info
My CREATE TABLE scripts are:
CREATE TABLE [dbo].[cust_info](
[uid] [varchar](32) NOT NULL,
[first_name] [varchar](100) NULL,
[last_name] [varchar](100) NULL)
CREATE TABLE [dbo].[tmp_cust_info](
[first_name] [varchar](100) NULL,
[last_name] [varchar](100) NULL)
I am sure there is some problem with NEWID(); if I take it out and replace it with some string, it works.
I appreciate any help. Thanks in advance.
A GUID needs 36 characters (because of the dashes). You only provide a 32-character column. Not enough, hence the error.
You need to use one of three alternatives:
1. A uniqueidentifier column, which stores the value internally as 16 bytes. When you select from this column, it is automatically rendered for display in the 8-4-4-4-12 format.
CREATE TABLE [dbo].[cust_info](
[uid] uniqueidentifier NOT NULL,
[first_name] [varchar](100) NULL,
[last_name] [varchar](100) NULL)
2. (Not recommended) Change the field to char(36) so that it fits the formatted value, including dashes.
CREATE TABLE [dbo].[cust_info](
[uid] char(36) NOT NULL,
[first_name] [varchar](100) NULL,
[last_name] [varchar](100) NULL)
3. (Not recommended) Store it without the dashes, as just the 32 hex characters:
INSERT INTO dbo.cust_info (
uid,
first_name,
last_name
)
SELECT
replace(NEWID(),'-',''),
first_name,
last_name
FROM dbo.tmp_cust_info
I received this error when I was trying to perform simple string concatenation on the GUID. Apparently a VARCHAR is not big enough.
I had to change:
SET @foo = 'Old GUID: {' + CONVERT(VARCHAR, @guid) + '}';
to:
SET @foo = 'Old GUID: {' + CONVERT(NVARCHAR(36), @guid) + '}';
...and all was good. Huge thanks to the prior answers on this one!
Increase the length of your uid column from varchar(32) to varchar(36), because a GUID takes 36 characters.
Guid.NewGuid().ToString() produces 36 characters, for example:
12345678-1234-1234-1234-123456789abc
You can try this; it worked for me.
Specify a length for VARCHAR when you cast/convert a value. For a uniqueidentifier, use VARCHAR(36), as below:
SELECT CONVERT(varchar(36), NEWID()) AS NEWID
The default length for the VARCHAR data type, if we don't specify one during CAST/CONVERT, is 30.
Credit: Krishnakumar S
Reference : https://social.msdn.microsoft.com/Forums/en-US/fb24a153-f468-4e18-afb8-60ce90b55234/insufficient-result-space-to-convert-uniqueidentifier-value-to-char?forum=transactsql
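A quick throwaway demo of that default-length behavior (not from the original thread):
--CONVERT to varchar with no length defaults to varchar(30),
--which cannot hold the 36-character text form of a GUID.
SELECT CONVERT(varchar(36), NEWID()); --succeeds
SELECT CONVERT(varchar, NEWID()); --fails: Msg 8170, insufficient result space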

Regex to extract fields and data types from sql statement

I have this sql statement:
CREATE TABLE [dbo].[User]( [UserId] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [MiddleName]
[varchar](50) COLLATE SQL_Latin1_General_CP1_CI_A
What I want is a regex I can use to get all fields and data types.
So it will return something like this:
FirstName varchar
MiddleName varchar
Notes:
The SQL statement will always have this format.
I am using .NET to run this regex.
You didn't mention whether the SQL statement is in a string on one line or if it's spanning multiple lines.
Assuming it's on one line, this may fit your request:
Dim input As String = "CREATE TABLE [dbo].[User]( [UserId] [int] IDENTITY(1,1) NOT NULL, " & _
"[FirstName] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, [MiddleName] " & _
"[varchar](50) COLLATE SQL_Latin1_General_CP1_CI_A"
For Each m As Match In Regex.Matches(input, "\[(?<Field>\w+)\]\s*\[(?<Type>\w+)\]")
Console.WriteLine("{0} : {1}", m.Groups("Field").Value, m.Groups("Type").Value)
Next
I don't know anything about .NET. In some other regex worlds, the following could handle the search portion of the operation:
\[(.*?)\][\s\n\r]+\[(.*?)\]\((\d\d)\)
Insert that into the "search" format for a .NET regex (whatever that might be) and write your output accordingly. If line breaks can occur mid-word, then this could have problems. Note that the above also pulls the type's length, so it would produce
MiddleName varchar 50
To do without the third backreference, just leave it out of the replacement (wasted) or use
\[(.*?)\][\s\n\r]+\[(.*?)\]\(\d\d\)
There are lots of fine ways to do it. As usual, just make sure you understand the potential variability of the input.