I have a Redshift COPY command which is executed as SQL:
COPY some_schema.some_table FROM 's3://a-bucket/home/a_file.csv' CREDENTIALS 'aws_access_key_id=SOMEKEY;aws_secret_access_key=SOMESECRETKEY' IGNOREHEADER 1 CSV DATEFORMAT 'YYYY-MM-DD' NULL 'NOT-CAPTURED'
The data I need to import has a date column with occasional occurrences of 'NOT-CAPTURED'. The addition of the NULL option allows these to be treated as null and prevents a load error. This apparently worked.
Can this statement be extended to treat multiple distinct values as null? I have 'N.A' in a date column in a similar file and would like to use a common statement.
I have tried obvious variations to provide more than one value to treat as null, such as NULL 'NOT-CAPTURED','N.A', but couldn't find any documentation covering it.
Thanks!
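As far as I can tell, the NULL option only accepts a single string, so one possible fallback is to load the date column as text into a staging table and convert both markers afterwards. A rough sketch, with the staging table and column names invented for illustration (only the problem date column is shown):
-- Staging table and column names are placeholders, not from the actual schema
CREATE TABLE some_schema.some_table_staging (some_date_raw VARCHAR(20));
COPY some_schema.some_table_staging FROM 's3://a-bucket/home/a_file.csv'
CREDENTIALS 'aws_access_key_id=SOMEKEY;aws_secret_access_key=SOMESECRETKEY'
IGNOREHEADER 1 CSV;
-- Convert both markers to NULL while moving the data into the real table
INSERT INTO some_schema.some_table
SELECT CASE WHEN some_date_raw IN ('NOT-CAPTURED', 'N.A') THEN NULL
            ELSE TO_DATE(some_date_raw, 'YYYY-MM-DD') END
FROM some_schema.some_table_staging;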
I have a column in a flat file containing values like 2021-12-15T02:40:39+01:00.
When I try to insert it into a table whose column data type is datetime2, it throws this error:
The data conversion for column "Mycol" returned status value 2 and status text
"The value could not be converted because of a potential loss of data.".
What would be the best data type for such values?
It seems the problem is two-fold here. First, the destination column for your value should be a datetimeoffset(0), and second, SSIS doesn't support the format yyyy-MM-ddThh:mm:ss for a DT_DBTIMESTAMPOFFSET; the T causes it problems.
Therefore I suggest that you define the column, MyCol, in your Flat File Connection as a DT_STR. Then, in your data flow task, use a derived column transformation which replaces MyCol and uses the following expression to replace the T with a space ( ):
(DT_DBTIMESTAMPOFFSET,0) (REPLACE(Mycol,"T"," "))
This will then cause the correct data type and value to be inserted into the database.
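To make the destination side concrete, here is a minimal T-SQL sketch (the table name is assumed; the literal shows the value after the T has been replaced):
-- dbo.MyDestination is a placeholder table; datetimeoffset(0) keeps the +01:00 offset
CREATE TABLE dbo.MyDestination (Mycol datetimeoffset(0));
-- Value as it looks after the derived column replaces the T with a space
INSERT INTO dbo.MyDestination (Mycol) VALUES ('2021-12-15 02:40:39 +01:00');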
I haven't been able to find anything that describes this issue I am having, although I am sure many have had this problem. It may be as simple as forcing pre-processing in Python before loading the data in.
I am trying to load data from S3 into Snowflake tables. I am seeing errors such as:
Numeric value '' is not recognized
Timestamp '' is not recognized
In the table definitions, these columns are set to DEFAULT NULL, so if there are NULL values here it should be able to handle them. I opened the files in Python to check on these columns and, sure enough, some of the rows (the exact number throwing an error in Snowflake) are NaNs.
Is there a way to correct for this in Snowflake?
Good chance you need to add something to your COPY INTO statement to get this to execute correctly. Try this parameter in your format options:
NULL_IF = ('NaN')
If you have more than just NaN values (like actual strings of 'NULL'), then you can add those to the list in the () above.
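For example, with inline format options the COPY INTO might look like this (the stage and table names are placeholders, and the extra markers are just examples):
COPY INTO my_schema.my_table
FROM @my_s3_stage/path/to/file.csv
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 NULL_IF = ('NaN', 'NULL', ''));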
If you are having issues loading data into tables (from any source) and are experiencing a similar issue to the one described above, where the error tells you *datatype* '' is not recognized, then you will need to follow these instructions:
Go into the FILE_FORMAT you are using through the DATABASES tab
Select the FILE_FORMAT and click EDIT in the tool bar
Click on Show SQL on the bottom left window that appears, copy the statement
Paste the statement into a worksheet and alter the NULL_IF statement as follows
NULL_IF = ('\\N','');
Snowflake doesn't seem to recognize a completely empty value by default, so you need to add it as an option!
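The same change can also be made in SQL rather than through the UI; a sketch, assuming a named file format (here called MY_CSV_FORMAT):
-- MY_CSV_FORMAT is a placeholder for whatever file format your COPY uses
ALTER FILE FORMAT my_db.my_schema.MY_CSV_FORMAT
  SET NULL_IF = ('\\N', '');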
My work is testing out Postgres. I usually write SQL in SAS using Oracle and Teradata syntax. Our test database has a really sloppy table in which every column was created as character(255). I have a very simple thing I'm trying to do, but it's not working. I want to create a new table and reformat the column from character(255) to character(10). I also want to remove all the trailing blanks. Also, "IS NULL" is not working, even when there is nothing visible for a value.
PROC SQL;
CONNECT TO POSTGRES(&connectstuff);
EXECUTE(CREATE TABLE common.UNQ_NUM_LIST AS
SELECT DISTINCT UNIQUE_NUM,
BTRIM(PAT_ACCT) AS PAT_ACCT
FROM ACCT_DATA.ACCNTS
) by postgres;
DISCONNECT FROM POSTGRES;
QUIT;
I want to create PAT_ACCT as a character(10) column but I'm not sure how. Can I specify a new format when creating a table? Everything I've tried didn't work. Even the BTRIM doesn't actually seem to get rid of the trailing spaces on that value. And again, null values aren't being recognized with "IS NULL". I feel like this should be very simple!
To influence the data type, cast the result column in the query appropriately:
CAST (btrim(pat_acct) AS character(10)) AS pat_acct
There is, however, no way to set a column NOT NULL that way.
I recommend that you execute two statements: one that creates the table the way you want it, and another one like
INSERT INTO ...
SELECT ... FROM ...
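Putting that together with the names from the question, it could look roughly like this (the column list and lengths are assumptions):
-- A sketch of the two-statement approach; only the two columns from the question are shown
CREATE TABLE common.unq_num_list (
    unique_num varchar(255),
    pat_acct   character(10)
);
INSERT INTO common.unq_num_list (unique_num, pat_acct)
SELECT DISTINCT unique_num,
       CAST(btrim(pat_acct) AS character(10))
FROM acct_data.accnts;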
I have a data flow task which picks up data from a non-unicode flat file to a SQL Server table destination.
I'm using a Derived Column task to replace NULL values in a date column with the string "1900-01-01". The destination table column is a varchar data type.
I'm using this SSIS expression: (DT_STR,10,1252)REPLACENULL(dateColumn,"1900-01-01"). The task executes successfully, but I still see NULLs instead of the "1900-01-01" string at the destination.
Why is this? I've tried replacing the column and adding a new column, but whatever I do I still see NULLs and not the replacement string. I can see my new derived column in the Advanced Editor, so I can see no reason why this isn't working. Any help would be most welcome.
If your source is non-unicode, why are you using DT_STR? varchar is already a non-unicode data type. You should just be able to do it with
REPLACENULL(dateColumn,"1900-01-01")
Also, did you put in a lookup transformation to update the column? Have you made sure the right keys are being looked up and updated?
When I export records using a query from the SQL Server 2005 Import/Export Wizard to a comma-delimited file, the integer data type column with a NULL value is exported as ,, (i.e. the null is replaced by no character), while I want the output to be ,NULL,.
How do I do it?
Use the bcp tool. It can export query results. Use the native format and use the same bcp tool to import the data at destination.
Among other advantages (speed, minimal logging, correct code page and unicode handling, batching etc etc) you will also get correct NULL handling in the process. bcp allows fine tune control over NULLs, see Keep Nulls or Use Default Values During Bulk Import (SQL Server).
If you insist on CSV files then there is fundamentally no way of representing NULLs; see BCP and NULL Values:
The only non-ambiguous way to represent NULL in character-mode BCP is by using two adjacent delimiters in a character-delimited file. Manipulation of the source file to achieve this state is possible via several techniques. Alternatively, if 0 or spaces in the destination SQL column has no valid meaning for the particular database, you can change it to NULL via a bulk UPDATE statement following the BCP in.
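The bulk UPDATE the quote mentions would be something along these lines (the table and column names are placeholders):
-- After the BCP in, turn the sentinel value back into NULL
UPDATE dbo.ImportedTable
SET IntegerColumn = NULL
WHERE IntegerColumn = 0;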
Null means no value, but you want a string value of 'NULL' to be exported. Perhaps you should use a CASE statement in your query to substitute a string for nulls:
case
when yourColumn is null then 'NULL'
else cast(yourColumn as varchar(100))
end as IntegerColumn
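Used in the wizard's source query, that might look like this (the table name is a placeholder):
SELECT CASE
           WHEN yourColumn IS NULL THEN 'NULL'
           ELSE CAST(yourColumn AS varchar(100))
       END AS IntegerColumn
FROM dbo.YourTable;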