Bulk insert issue - SQL

During a bulk insert from a CSV file, a row in the file has the value 00000100008; both the source (from which the CSV file is created) and the destination temptable have the same field type (char(11)).
When I try to insert, I get the following error:
Bulk load data conversion error (truncation) for row 1, column 1 (fieldname)
If I remove the leading 0s and change this value to 100008 in the CSV file and then bulk insert, the destination table temptable shows '++ 100008 as the inserted value. Why is that? How can I get the value in without the leading double plus signs?
Here is the script:
BULK
INSERT temptable
FROM 'c:\TestFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
Edit: Some sample records from the CSV file:
100008,111122223333,Mr,ForeName1,SurName1,1 Test Lane,London,NULL,NULL,NULL,wd25 123,test#email.com.com,NULL
322,910315715845,Ms,G,Dally,17 Elsie Street,NULL,NULL,GOOLE,NULL,DN146DU,test1#email1.com,
323,910517288401,Mrs,G,Tom,2 White Mead,NULL,NULL,YEOVIL,NULL,BA213RS,test3#tmail2.com,

My first thought is that the file was saved on a Unix system and that you may have incompatibilities with the different styles of line breaks.
My first advice would be to analyze the text file using a hex editor to try to determine what character is getting put there.
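If the hex editor shows LF-only (0x0a) line endings, one thing worth trying is spelling the row terminator out explicitly. This is only a sketch of the statement from the question with that single change, not a confirmed fix:
BULK INSERT temptable
FROM 'c:\TestFile.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '0x0a' -- LF only; use '\r\n' instead if the hex dump shows CR LF pairs
)
GO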
++ 100008 basically means that the row format is inconsistent with the page header. To solve this problem, run DBCC CHECKTABLE.
I hope that this is going to help you.
Regards,

Related

Invalid digits on Redshift

I'm trying to load some data from a staging environment to a relational environment, and something is happening that I can't figure out.
I'm trying to run the following query:
SELECT
CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
public.some_table;
some_field is a column that holds two numbers joined by an underscore, like this:
some_field -> 38972691802309_48937927428392
And I'm trying to get the second part.
That said, here is the error I'm getting:
[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0,
Type: Long
Details:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Long
code: 1207
context:
query: 1097254
location: :0
process: query0_99 [pid=0]
-----------------------------------------------;
Execution time: 2.61s
Statement 1 of 1 finished
1 statement failed.
It's literally saying some numbers are not valid digits. I've already tried to retrieve the exact data that is throwing the error, and it looks like a normal value, just as I was expecting. It happens even if I throw out NULL fields.
I thought it might be an encoding error, but I haven't found any references on how to solve that.
Does anyone have any idea?
Thanks everybody.
I just ran into this problem and did some digging. It seems like the Value '1' part of the error is the misleading bit; the problem is actually that these fields are simply not valid as numerics.
In my case they were empty strings. I found the solution to my problem in this blog post, which is essentially to find any fields that aren't numeric and fill them with null before casting.
select cast(colname as integer) from
(select
case when colname ~ '^[0-9]+$' then colname
else null
end as colname
from tablename) t; -- the derived table needs an alias in Redshift
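Applied to the query from the question, the same guard might look like this; just a sketch reusing public.some_table and some_field from above:
SELECT CAST(clean_part AS BIGINT) AS cmt_par
FROM (
    SELECT CASE WHEN SPLIT_PART(some_field, '_', 2) ~ '^[0-9]+$'
                THEN SPLIT_PART(some_field, '_', 2)
                ELSE NULL
           END AS clean_part
    FROM public.some_table
) t; -- non-numeric second parts become NULL instead of breaking the cast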
Bottom line: this Redshift error is completely confusing and really needs to be fixed.
When you are using a Glue job to upsert data from any data source to Redshift:
Glue will rearrange the data and then copy it, which can cause this issue. This happened to me even after using apply-mapping.
In my case, the datatype was not an issue at all. In the source the columns were typecast to exactly match the fields in Redshift.
Glue was rearranging the columns into alphabetical order of column names and then copying the data into the Redshift table (which will obviously throw an error, because my first column is an ID key, not a string column like the others).
To fix the issue, I used a SQL query within Glue to run a SELECT with the correct order of the columns in the table.
It's odd that Glue did that even after using apply-mapping, but the work-around I used helped.
For example: the source table has fields ID|EMAIL|NAME with values 1|abcd#gmail.com|abcd, and the target table has fields ID|EMAIL|NAME. But when Glue is upserting the data, it rearranges the data by column name before writing, so Glue tries to write abcd#gmail.com|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I did a SQL query transform using the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing the data.
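For reference, the reordering in that SQL query transform is nothing more than a plain projection in the order the target table declares; data is the name the transform exposes for the incoming data set, as quoted above:
-- Re-project the columns in the order the Redshift table expects
SELECT ID, EMAIL, NAME
FROM data;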
Hmmm. I would start by investigating the problem. Are there any non-digit characters?
SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';
Is the value too long for a bigint?
SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;
If you need more than 18 digits of precision, consider a decimal rather than a bigint.
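If the second part really can exceed what a BIGINT holds (roughly 19 digits), a hedged alternative is to cast to a wide DECIMAL instead; a sketch against the same table:
SELECT CAST(SPLIT_PART(some_field, '_', 2) AS DECIMAL(38, 0)) AS cmt_par
FROM public.some_table;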
If you get an error message like “Invalid digit, Value ‘O’, Pos 0, Type: Integer”, try executing your COPY command with the header row excluded. Use the IGNOREHEADER parameter in your COPY command to skip the first line of the data file.
So the COPY command will look like below:
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter '\t' IGNOREHEADER 1;
For my Redshift SQL, I had to wrap my columns with Cast(col As Datatype) to make this error go away.
For example, casting my columns to Char with a specific length worked:
Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))

CAST of all columns of a table

I am having trouble exporting a file as proper CSV in UTF-8 from Excel. I got the error: "the code page on input column is 1252 and is required to be 65001".
I looked into the error, and one suggested solution was to cast the columns to varchar.
I am trying to get my query to cast every column (I have 191) at the same time and export the result from SQL Server to a CSV file, but I can't figure out how to make it work.
My query is:
SELECT top 100000 CAST( * AS varchar(200)) from dbo.AccountingData ORDER BY NEWID();
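Since CAST(* AS ...) isn't valid T-SQL, the column list has to be spelled out; one way to avoid typing 191 casts by hand is to generate the statement from the catalog. This is only a sketch, assuming SQL Server 2017+ for STRING_AGG and the dbo.AccountingData table from the question:
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build "CAST(col AS varchar(200)) AS col" for every column of the table
SELECT @cols = STRING_AGG(
    CONVERT(nvarchar(max), 'CAST(' + QUOTENAME(name) + ' AS varchar(200)) AS ' + QUOTENAME(name)),
    ', ')
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.AccountingData');

SET @sql = 'SELECT TOP 100000 ' + @cols + ' FROM dbo.AccountingData ORDER BY NEWID();';

EXEC sys.sp_executesql @sql;
On versions older than 2017, the same list can be built with FOR XML PATH instead of STRING_AGG.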

SQL Bulk Insert skipping last 849 Lines from Text File

Good day all! For some reason my BULK INSERT statement is skipping the last 849 lines of the text file I am reading. I know this because when I manually add my own last line, I don't see it in the table after the insert is done; when debugging I see the message (133758 row(s) affected), while the text file has 134607 lines, excluding the first 2.
My query looks like this:
BULK INSERT #TEMP FROM 'C:\Test\Test.txt'
WITH (FIELDTERMINATOR ='\t', ROWTERMINATOR = '0x0a', FIRSTROW = 2, MAXERRORS = 50, KEEPNULLS)
I have checked whether there are more columns than the table has, and that's not the case. I have changed MAXERRORS from 10 to 20 to 30 to 40 to 50 to see if there was any change, but the rows affected stays the same. Is there maybe something I haven't handled or am missing?
Thanks awesome people.
P.S. I am using this insert for another text file and table, with different column headers and fewer columns in the text file, and it works perfectly.
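One way to see what is happening to the missing rows is to capture rejected rows with the ERRORFILE option. This is only a diagnostic sketch of the statement from the question, and the error-file path is made up for the example:
BULK INSERT #TEMP FROM 'C:\Test\Test.txt'
WITH (FIELDTERMINATOR ='\t', ROWTERMINATOR = '0x0a', FIRSTROW = 2, MAXERRORS = 50, KEEPNULLS,
      ERRORFILE = 'C:\Test\Test_errors.log') -- rows that could not be loaded are copied here; a companion .Error.Txt file describes why
If the error file stays empty while rows are still missing, the tail of the file is probably never being read at all (for example because of a stray terminator or EOF character), which is worth checking in a hex editor.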

Error uploading CSV to PostgreSQL

I make table:
CREATE TABLE data (
date date,
time time,
val3 float,
val4 float);
And I am trying to load a CSV into it with the following command:
copy data from 'G:\test\1.txt' DELIMITER ' ' CSV;
The CSV has the following structure:
date time val3 val4
2012.08.10 06:53:18 695.417 773.29
But I am getting the following error:
ERROR: invalid input syntax for type date: "date"
Could you help me find the reason for the error?
I'm not very used to Postgres, but I think you should set your datestyle before importing your file:
set datestyle = German, YMD;
Look at these links: How do I alter the date format in Postgres? and DateTime Output
Sorry if I'm wrong, but I think you have to correctly set your datestyle (you can also do it in the postgresql.conf file).
Do you have a header line in your CSV? If so, try
copy data from 'G:\test\1.txt' DELIMITER ' ' CSV HEADER;
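For completeness, the same import written in the option-list syntax used by current PostgreSQL versions would be something along these lines (a sketch, assuming a space-delimited file with a header row):
COPY data FROM 'G:\test\1.txt' WITH (FORMAT csv, DELIMITER ' ', HEADER true);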

Import fixed width text to SQL

We have records in this format:
99 0882300 25 YATES ANTHONY V MAY 01 12 04 123456 12345678
The width is fixed and we need to import it into SQL. We tried bulk import, but it didn't work because the file isn't ',' or '\t' separated. It's separated by individual spaces of various lengths, which is where our dilemma lies.
Any suggestions on how to handle this? Thanks!
The question is pretty old but might still be relevant.
I had exactly the same problem as you.
My solution was to use BULK INSERT, together with a FORMAT file.
This would allow you to:
keep the code much leaner
have the mapping for the text file to upload in a separate file that you can easily tweak
skip columns if you fancy
To cut to the chase, here is my data format (that is one line)
608054000500SS001 ST00BP0000276AC024 19980530G10379 00048134501283404051N02912WAC 0024 04527N05580WAC 0024 1998062520011228E04ST 04856 -94.769323 26.954832
-94.761114 26.953626G10379 183 1
And here is my SQL code:
BULK INSERT dbo.TARGET_TABLE
FROM 'file_to_upload.dat'
WITH (
BATCHSIZE = 2000,
FIRSTROW = 1,
DATAFILETYPE = 'char',
ROWTERMINATOR = '\r\n',
FORMATFILE = 'formatfile.Fmt'
);
Please note the ROWTERMINATOR parameter set there, and the DATAFILETYPE.
And here is the format file
11.0
6
1 SQLCHAR 0 12 "" 1 WELL_API SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 19 "" 2 SPACER1 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 8 "" 3 FIELD_CODE SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 95 "" 4 SPACER2 SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 5 "" 5 WATER_DEPTH SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 93 "" 6 SPACER3 SQL_Latin1_General_CP1_CI_AS
I put documentation links below, but what you must note is the following:
the ""s in the 5th column, which indicates the separator (for a .csv would be obviously ","), which in our case is set to just "";
column 2 is fully "SQLCHAR", as it's a text file. This must stay so even if the destination field in the data table is for example an integer (it is my case)
Bonus note: in my case I only needed three fields, so the stuff in the middle I just called "spacer", and in my format file gets ignored (you change numbers in column 6, see documentation).
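For instance, to make BULK INSERT actually skip SPACER1 rather than load it, my reading of the documentation is that you set its server column order (the 6th column of the format file) to 0 and renumber the remaining fields to match the target table's columns, roughly like this:
2 SQLCHAR 0 19 "" 0 SPACER1 SQL_Latin1_General_CP1_CI_AS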
Hope it answers your needs, works fine for me.
Cheers
Documentation here:
https://msdn.microsoft.com/en-us/library/ms178129.aspx
https://msdn.microsoft.com/en-us/library/ms187908.aspx
If you feel more at home with SQL than with import tools, you could bulk import the file into a single VARCHAR(255) column in a staging table, then process the records with SQL and transform them into your destination table:
CREATE TABLE #DaTable(MyString VARCHAR(255))
INSERT INTO #DaTable(MyString) VALUES ('99 0882300 25 YATES ANTHONY V MAY 01 12 04 123456 12345678')
INSERT INTO FinalTable(Col1, Col2, Col3, Name)
SELECT CAST(SUBSTRING(MyString, 1, 3) AS INT) AS Col1,
       CAST(SUBSTRING(MyString, 4, 7) AS INT) AS Col2,
       CAST(SUBSTRING(MyString, 12, 3) AS INT) AS Col3,
       SUBSTRING(MyString, 15, 6) AS Name
FROM #DaTable
result: 99 882300 25 YATES
To import from TXT to SQL:
CREATE TABLE #DaTable (MyString VARCHAR(MAX));
And to import from a file:
BULK INSERT #DaTable
FROM 'C:\Users\usu...IDA_S.txt'
WITH
(
CODEPAGE = 'RAW'
)
3rd party edit
The sqlite docs on importing files have an example of inserting records into a pre-existing temporary table from a file that has column names in its first row:
sqlite> .import --csv --skip 1 --schema temp C:/work/somedata.csv tab1
My advice is to import the whole file into a new table (TestImport) with one column, like this:
sqlite> .import C:/yourFolder/text_flat.txt TestImport
and save it to a db file
sqlite> .save C:/yourFolder/text_flat_out.db
And now you can do all sorts of ETL with it.
I did this for a client a while back and, sad as it may seem, Microsoft Access was the best tool for the job for his needs. It's got support for fixed width files baked in.
Beyond that, you're looking at writing a script that translates the file's rows into something SQL can understand in an insert/update statement.
In Ruby, you could use the String#slice method, which takes an index and a length, just as fixed-width file definitions are usually expressed. Read the file in, parse the lines, and write them back out as SQL statements.
Use SSIS instead.
It is much clearer and has various options for importing (text) files.