I have a number of datasets that I am trying to load into Amazon Redshift from S3 buckets. Here is my command:
"copy tablename from 'my_s3_bucket' iam_role 'my_role' delimiter ',' IGNOREHEADER 1 null as ''
This works, but for some files it throws an error:
Invalid digit, Value 'i', Pos 0, Type: Decimal...
On inspection, the data has 'inf' in some positions, which is causing the error. Is there a way to handle infinite values with this type of command, or simply to load them as NULL? I already have '' specified as the null string, so I'm not sure whether I can specify another.
Maybe change the table's schema to load the data as VARCHAR, then create a view with a CASE statement that handles the 'inf' values and casts everything else to the proper data type.
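A minimal sketch of that approach, assuming a hypothetical staging table stage_tablename and a DECIMAL target value in a column called amount (the names and precision are placeholders, not taken from the original table):

-- Staging table: load the problem column as plain text so COPY accepts 'inf' and ''
CREATE TABLE stage_tablename (
    id     BIGINT,
    amount VARCHAR(32)
);

-- View that maps 'inf' values (and empty strings) to NULL and casts the rest
CREATE VIEW v_tablename AS
SELECT
    id,
    CASE
        WHEN amount IN ('inf', '-inf', '') THEN NULL
        ELSE CAST(amount AS DECIMAL(18,2))
    END AS amount
FROM stage_tablename;

The original COPY command can then point at stage_tablename, and downstream queries read from v_tablename.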
I am creating a table that loads values from a CSV file. I have several numeric columns, and when the text is converted to a number during the insert, I get an error.
The CSV contains NULL values. Is there a way to handle them in the database? Right now I remove them manually:
NULL -> '0'.
I tried DEFAULT and ISNULL, but I don't know whether they can be used when creating a table.
What I want is to avoid opening the CSV at all, so that when the file is imported into the database, the NULLs become zero.
Currently I receive the files as CSV, open them, and strip out the NULL values by hand. The data looks like this:
7939102772 2401679 108271 0 3000062862 174529 8129
7939102772 2401679 108271 0 3000062862 174529 8129
7939102772 2401679 108271 0 3000062862 174529 8129
1. NULL NULL NULL NULL NULL NULL NULL
And I use:
BULK INSERT [dbo]
FROM 'C:csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR='\n');
You can allow NULLs while importing to avoid the errors caused by NULL insertion. If you are importing the CSV using a script or SQL query, you can update your query like this:
INSERT INTO table (Col1, Col2, Col3, ...) VALUES (ISNULL(value1, 0), ISNULL(value2, 0), ISNULL(value3, 0))
Or you can update your table after successfully importing the CSV with the following script:
UPDATE [Table]
SET [Col1] = IIF([Col1] IS NULL, 0, [Col1]),
    [Col2] = IIF([Col2] IS NULL, 0, [Col2]),
    [Col3] = IIF([Col3] IS NULL, 0, [Col3])
WHERE [Col1] IS NULL OR [Col2] IS NULL OR [Col3] IS NULL
The best approach, though, is to process and clean your data before importing it into the database, using Python, a Jupyter notebook, or any other language you are comfortable with.
The SQL Server Import Wizard treats NULL as the literal string 'NULL'.
Another idea is to load the imported data into a staging table and then apply an UPDATE for the string 'NULL' that results from the import.
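A rough sketch of that staging approach, assuming BIGINT target columns and hypothetical names for the staging table, target table, and file path:

-- Staging table: take everything in as text so BULK INSERT never fails on 'NULL'
CREATE TABLE stg_import (Col1 VARCHAR(50), Col2 VARCHAR(50), Col3 VARCHAR(50));

BULK INSERT stg_import
FROM 'C:\data\file.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Turn the literal string 'NULL' into 0 while moving rows into the real table
INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT
    CAST(IIF(Col1 = 'NULL', '0', Col1) AS BIGINT),
    CAST(IIF(Col2 = 'NULL', '0', Col2) AS BIGINT),
    CAST(IIF(Col3 = 'NULL', '0', Col3) AS BIGINT)
FROM stg_import;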
I have a bigint column named mycolumn. I execute SQL scripts using the psql command.
Using the COPY command:
COPY public.mytable (myothercol, mycolumn) FROM stdin;
1 \N
\.
This works. But the following does not work:
EXECUTE 'insert into public.mytable (myothercol, mycolumn) values ($1,$2);'
USING 1, NULL;
This gives me error:
column "mycolumn" is of type bigint but expression is of type text
Why does insert not work for null value, whereas COPY works?
It's best to tell PostgreSQL to convert the parameter to bigint explicitly:
EXECUTE 'insert into public.mytable (myothercol, mycolumn) values ($1,$2::bigint);'
USING 1,NULL;
The problem is that PostgreSQL does not automatically know what data type a NULL is, so it guesses text. COPY does not have to guess a data type.
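Alternatively, assuming this EXECUTE runs inside a PL/pgSQL block, you can attach the type to the parameter value itself instead of casting inside the statement text:

EXECUTE 'insert into public.mytable (myothercol, mycolumn) values ($1,$2);'
USING 1, NULL::bigint;

Either way the parameter arrives as a typed bigint NULL rather than an untyped (text) one.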
I've got a front-end table that essentially matches our SQL Server database table t_myTable. Some columns I'm having problems with are those with numeric data types in the database. They are set to allow NULL, but when the user deletes the numeric value on the front end and tries to send a blank value, it doesn't post to the database. I suspect that's because the value is sent back as an empty string "", which doesn't convert to the nullable numeric data type.
Is there a trigger I can create to convert these empty strings into null on insert and update to the database? Or, perhaps a trigger would already happen too late in the process and I need to handle this on the front end or API portion instead?
We'll call my table t_myTable and the column myNumericColumn.
I could also be wrong and perhaps this 'empty string' issue is not the source of my problem. But I suspect that it is.
As @DaleBurrell noted, the proper place to handle data validation is the application layer. That said, you can wrap each of the potentially problematic values in a NULLIF function, which converts the value to NULL when an empty string is passed to it.
The syntax would be along these lines:
SELECT
...
,NULLIF(ColumnName, '') AS ColumnName
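In the insert path the same idea might look like this (the table and column names follow the question; the varchar variable standing in for the value from the front end is hypothetical):

DECLARE @incomingValue VARCHAR(20) = '';        -- value as it arrives from the front end

INSERT INTO t_myTable (myNumericColumn)
VALUES (NULLIF(@incomingValue, ''));            -- '' becomes NULL; non-empty text converts to numeric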
select nullif(Column1, '') from tablename
SQL Server doesn't allow converting an empty string to the numeric data type, so a trigger is useless in this case, even an INSTEAD OF one: SQL Server checks the conversion before inserting.
SELECT CAST('' AS numeric(18,2)) -- Error converting data type varchar to numeric
CREATE TABLE tab1 (col1 numeric(18,2) NULL);
INSERT INTO tab1 (col1) VALUES(''); -- Error converting data type varchar to numeric
Since you didn't mention this error, the client must be passing something other than ''. The problem can be tracked down with SQL Profiler: run it and see exactly which SQL statement is being executed to insert the data into the table.
I have a few processes where I use the copy command to copy data from S3 into Redshift.
I have a new CSV file where I can't figure out how to bring in the "note" field, which is a free-hand field a salesperson can write anything into. It can contain ";", ",", ".", spaces, new lines - anything.
Any common suggestions for copying this type of field? It is varchar(max) in table_name.
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
ignoreheader 1
escape
removequotes
acceptinvchars
I get the error "Delimiter not found".
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
fillrecord
ignoreheader 1
escape
removequotes
acceptinvchars
I get the error "String length exceeds DDL length".
The second COPY command fixed your initial issue, namely getting COPY to parse the CSV file. But now a row can't be inserted because one of the input values exceeds the maximum length of its column in the database. Try increasing the size of the column:
Alter column data type in Amazon Redshift
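For example, assuming the free-text column is named note and is not already at Redshift's 65,535-byte VARCHAR ceiling, something along these lines raises the limit in place:

-- Redshift allows increasing the size of a VARCHAR column without recreating the table
ALTER TABLE table_name ALTER COLUMN note TYPE VARCHAR(65535);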
I am having trouble loading decimal data into a database - specifically, my negative numbers are getting truncated, and I can't figure it out.
Here is what my query looks like:
CREATE TABLE IF NOT EXISTS mytable (
    id INT(12) NOT NULL AUTO_INCREMENT,
    mydecimal DECIMAL(13,2),
    PRIMARY KEY (id)
);

LOAD DATA INFILE 'data.dat' INTO TABLE mytable FIELDS TERMINATED BY ';';
And the data.dat that I'm loading:
;000000019.50 ;
;000000029.50-;
;000000049.50 ;
When it completes, it gives me a warning: "Data truncated for column 'mydecimal' at row 2." And when I look at the data, the value is stored as a positive number. Any ideas how to fix this?
The best way to handle data abnormalities like this in the input file is to load each field into a user variable, then set the actual column value based on a transformation of that variable.
In your case, you can load the string into a user variable, then either leave it alone or multiply it by negative one, depending on whether it ends with a minus sign.
Something like this should work for you:
LOAD DATA INFILE 'data.dat'
INTO TABLE mytable FIELDS TERMINATED BY ';'
(id, @mydecimal)
SET mydecimal = IF(@mydecimal LIKE '%-', @mydecimal * -1, @mydecimal);
I'm not sure why you're putting the minus sign after the number rather than before it. Does it work when you place the '-' sign at the start of the number?
You can also consider this approach:
CREATE TABLE IF NOT EXISTS mytable (
    id INT(12) NOT NULL AUTO_INCREMENT,
    mydecimal VARCHAR(255),
    PRIMARY KEY (id)
);

LOAD DATA INFILE 'data.dat' INTO TABLE mytable FIELDS TERMINATED BY ';';

UPDATE mytable
SET mydecimal = CAST(mydecimal AS DECIMAL(13,2)) * IF(SUBSTRING(mydecimal, -1) = '-', -1, 1);

ALTER TABLE mytable MODIFY COLUMN mydecimal DECIMAL(13,2) SIGNED;