Netezza - nzload data with default values - sql

I am trying to load integer data into a table on a Netezza server, but I do not know how to make a column fall back to its default value when the incoming field is missing or null.
Right now, the table consists of two columns, each with its own default value.
 Attribute |  Type   | Modifier | Default Value
-----------+---------+----------+---------------
 number1   | integer |          | 0
 number2   | integer |          | 100
I am currently running the following nzload command: nzload -cf test.log
The test.log file looks like this:
DATAFILE /usr/me/test.dat
{
Database test
TableName numberTest
Delimiter '|'
}
The test.dat file looks like this:
1|2
3|4
5|6
7|
|8
The issue I am faced with is that while the command runs fine, the missing integer values come through as NULL instead of the column defaults. Using INSERT from within nzsql does produce the correct default values, but I was wondering whether there is a way to do this with nzload.
Any help would be much appreciated.

The default value constraint will be enforced when performing inserts where the column with the default value is not referenced in the column list for the insert.
For example:
TESTDB.ADMIN(ADMIN)=> create table default_test (col1 varchar(10),
TESTDB.ADMIN(ADMIN)(> col2 varchar(10) default 'myDefault', col3 varchar(10));
CREATE TABLE
TESTDB.ADMIN(ADMIN)=> insert into default_test (col1, col3) values ('A','C');
INSERT 0 1
TESTDB.ADMIN(ADMIN)=> select * from default_test;
COL1 | COL2 | COL3
------+-----------+------
A | myDefault | C
(1 row)
However, when you are performing an nzload, Netezza is actually performing an insert into the target table with a select from an external table defined on your load datafile. In doing so it is including each column in the column list, and therefore the default value will not be triggered, even if the value in the external table's data file is NULL or an empty string.
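Conceptually, the load is roughly equivalent to the statement below, sketched here with a transient external table against the table from the question; the statement nzload actually generates differs in its details and options.
-- A sketch of what nzload effectively does (not the literal generated statement)
insert into numberTest
select * from external '/usr/me/test.dat'
using (delimiter '|');
Here is a demonstration of the behaviour: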
[nz#netezza test]$ cat test.txt
A,B,C
D,,F
G,NULL,I
TESTDB.ADMIN(ADMIN)=> create external table default_test_ext
TESTDB.ADMIN(ADMIN)-> sameas default_test using (
TESTDB.ADMIN(ADMIN)(> dataobject '/export/home/nz/test/test.txt' delimiter ','
TESTDB.ADMIN(ADMIN)(> );
CREATE EXTERNAL TABLE
TESTDB.ADMIN(ADMIN)=> select * from default_test_ext;
COL1 | COL2 | COL3
------+------+------
A | B | C
D | | F
G | | I
(3 rows)
TESTDB.ADMIN(ADMIN)=> select * from default_test_ext where
TESTDB.ADMIN(ADMIN)-> (col2 is null or col2 = '');
COL1 | COL2 | COL3
------+------+------
D | | F
G | | I
(2 rows)
Since NULL and empty strings are valid values, and nzload references the column in its insert, the default value cannot (and should not) be used. It's working as I would expect it to; however, it would definitely be useful if you could tell nzload to transform NULLs or empty strings into a column's default value. Unfortunately, that functionality doesn't currently exist (at least not to my knowledge).
While this is hyper-kludgey, I have gotten around it for data loads by defining the external table manually and loading in two steps.
TESTDB.ADMIN(ADMIN)=> truncate table default_test;
TRUNCATE TABLE
TESTDB.ADMIN(ADMIN)=> insert into default_test (col1, col3)
TESTDB.ADMIN(ADMIN)-> select col1, col3 from default_test_ext
TESTDB.ADMIN(ADMIN)-> where (col2 is null or col2 = '');
INSERT 0 2
TESTDB.ADMIN(ADMIN)=> select * from default_test;
COL1 | COL2 | COL3
------+-----------+------
D | myDefault | F
G | myDefault | I
(2 rows)
TESTDB.ADMIN(ADMIN)=> insert into default_test
TESTDB.ADMIN(ADMIN)-> select * from default_test_ext
TESTDB.ADMIN(ADMIN)-> where (col2 is not null and col2 <> '');
INSERT 0 1
TESTDB.ADMIN(ADMIN)=> select * from default_test;
COL1 | COL2 | COL3
------+-----------+------
A | B | C
D | myDefault | F
G | myDefault | I
(3 rows)

Netezza does not apply the default value during a load; in that context it exists only as metadata (see the IBM documentation). To fix your table after the load, you must run update statements.
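For example, with the table from the question, the clean-up after the load would look something like this:
-- backfill the defaults after the load (default values taken from the question's table definition)
update numberTest set number1 = 0   where number1 is null;
update numberTest set number2 = 100 where number2 is null;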

Related

Split a string in BigQuery (split on character positions instead of a delimiter)?

I have the following dataset:
| Column1              |
| 100BB7832036 B120501 |
I would like the output to look like:
| column 1     | column 2 | column 3 | column 4 |
| 100BB7832036 | B        | 1205     | 01       |
I am having trouble splitting this string because the only delimiter is ' ', and I am not sure if it is possible to split it based on character positions (e.g. positions 0-11 would give 100BB7832036, position 13 would give B, positions 14-17 would give 1205, and positions 18-19 would give 01).
So far I have tried:
split(column, ' ')[offset(0)] as Column1
split(column, ' ')[offset(1)] as Column2
however this results in
| Column 1     | Column 2 |
| 100BB7832036 |          |
where column 2 is blank
Any help or suggestions would be greatly appreciated!
Thanks!
You can use the SUBSTR function to split the string into columns. The syntax is
SUBSTR(value, position, [length])
where position is 1-based in BigQuery. Feel free to adjust the position and length for your use case.
with example as (
  select "100BB7832036 B120501" as column1
)
select
  substr(column1,  1, 12) as col1,
  substr(column1, 14,  1) as col2,
  substr(column1, 15,  4) as col3,
  substr(column1, 19)     as col4
from example
Output:
col1 col2 col3 col4
100BB7832036 B 1205 01

AUTOINCREMENT primary key for snowflake bulk loading

I would like to upload data into a Snowflake table. The table has a primary key field with AUTOINCREMENT.
When I tried to upload the data without the primary key field, I received the following error message:
The COPY failed with error: Number of columns in file (2) does not
match that of the corresponding table (3), use file format option
error_on_column_count_mismatch=false to ignore this error
Does anyone know if I can bulk load data into a table that has an AUTOINCREMENT primary key?
You can query the staged file using a file format to load your data. I created a sample table like the one below, with the first column set to autoincrement:
-- Create the target table
create or replace table Employee (
  empidnumber number autoincrement start 1 increment 1,
  name varchar,
  salary varchar
);
I staged a sample file into a Snowflake internal stage, queried the staged file, and then executed the following COPY command:
copy into Employee (name, salary) from (select $1, $2 from @test/test.csv.gz);
And it loaded the table with incremented values.
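For reference, the staging step assumed above would look roughly like this (the stage name test and the file path are illustrative; PUT is run from SnowSQL or another client rather than the web UI):
-- create a named internal stage and upload the file to it (illustrative names)
create or replace stage test;
put file:///tmp/test.csv @test auto_compress=true;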
The docs have the following example which suggests this can be done:
https://docs.snowflake.net/manuals/user-guide/data-load-transform.html#include-autoincrement-identity-columns-in-loaded-data
-- Omit the sequence column in the COPY statement
copy into mytable (col2, col3)
from (
  select $1, $2
  from @~/myfile.csv.gz t
)
;
Could you please try this syntax and see if it works for you?
Create the target table
create or replace table mytable (
col1 number autoincrement start 1 increment 1,
col2 varchar,
col3 varchar
);
Stage a data file in the internal user stage
put file:///tmp/myfile.csv @~;
Query the staged data file
select $1, $2 from @~/myfile.csv.gz t;
+-----+-----+
| $1 | $2 |
|-----+-----|
| abc | def |
| ghi | jkl |
| mno | pqr |
| stu | vwx |
+-----+-----+
Omit the sequence column in the COPY statement
copy into mytable (col2, col3)
from (
  select $1, $2
  from @~/myfile.csv.gz t
)
;
select * from mytable;
+------+------+------+
| COL1 | COL2 | COL3 |
|------+------+------|
| 1 | abc | def |
| 2 | ghi | jkl |
| 3 | mno | pqr |
| 4 | stu | vwx |
+------+------+------+
Adding a PRIMARY KEY in Snowflake is different from other SQL databases. The syntax for adding a primary key with auto increment is:
CREATE OR REPLACE TABLE EMPLOYEES (
    NAME VARCHAR(100),
    SALARY VARCHAR(100),
    EMPLOYEE_ID NUMBER AUTOINCREMENT START 1 INCREMENT 1
);
START 1 starts the key at 1 (we can start at any number we want), and INCREMENT 1 adds 1 to the previous value for each new row (we can use any step we want).
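If you also want the column declared as the primary key, the inline constraint can go on the same column definition. This is a sketch; note that Snowflake does not enforce primary key constraints, they are informational only.
CREATE OR REPLACE TABLE EMPLOYEES (
    EMPLOYEE_ID NUMBER AUTOINCREMENT START 1 INCREMENT 1 PRIMARY KEY,
    NAME VARCHAR(100),
    SALARY VARCHAR(100)
);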

Oracle PL/SQL: validate a row and insert each cell value of the row, with validation results, as a new row in another table

I am very new to the DB world, so I wanted to review whether I am following the right approach.
I have two tables:
table A --> a table with 40 columns
table B --> a table with 3 columns (each column of table A is represented as a row in this table)
Example:
A:
column_1 | column_2 | column_3 | ... | column_40
---------+----------+----------+-----+-----------
value1_1 | value1_2 | value1_3 | ... | value1_40
B:
column_name | column_value | column_errorKey
------------+--------------+-------------------
column_1    | value1_1     | value1_1_errorKey
column_2    | value1_2     | value1_2_errorKey
What am I doing?
Validating each value of a row from table A and inserting it into table B along with its value, error key, and corresponding column name.
My SQL code is below (note: it covers only two columns to keep the example short):
INSERT ALL
  WHEN (LENGTH(column_1) <= 7) THEN
    INTO table_B VALUES ('column_1', column_1, 'NoError')
  WHEN (LENGTH(column_1) > 7) THEN
    INTO table_B VALUES ('column_1', column_1, 'invalidLength')
  WHEN (LENGTH(column_2) <= 75) THEN
    INTO table_B VALUES ('column_2', column_2, 'NoError')
  WHEN (LENGTH(column_2) > 75) THEN
    INTO table_B VALUES ('column_2', column_2, 'invalidLength')
  SELECT column_1, column_2, ..., column_40
  FROM table_A;
Each WHEN condition here has only one validation, but we have more validations like this for the value of each cell. I wanted to know whether I am on the right track or whether there is a better way.
As suggested by APC, the best approach is to change your DB design.
You could probably use UNPIVOT and a single INSERT INTO ... SELECT.
The SELECT statement would look something like the one below.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TableA(
column_1 VARCHAR(13)
,column_2 VARCHAR(25)
,column_3 VARCHAR(22)
,column_4 VARCHAR(11)
);
INSERT INTO TableA(column_1,column_2,column_3,column_4) VALUES ('value1_1','value1_2','value1_3','value1_40');
Query 1:
SELECT column_name
,column_value
,CASE
WHEN LENGTH(COLUMN_VALUE) <= 7
THEN 'NoError'
ELSE 'invalidLength'
END AS column_errorKey
FROM TableA
UNPIVOT(column_value FOR column_name IN (
COLUMN_1
,COLUMN_2
,COLUMN_3
,COLUMN_4
))
Results:
| COLUMN_NAME | COLUMN_VALUE | COLUMN_ERRORKEY |
|-------------|--------------|-----------------|
| COLUMN_1 | value1_1 | invalidLength |
| COLUMN_2 | value1_2 | invalidLength |
| COLUMN_3 | value1_3 | invalidLength |
| COLUMN_4 | value1_40 | invalidLength |
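Putting it together, the full load into table B would look something like the sketch below (table and column names are taken from the question; it assumes the unpivoted columns share a compatible character type, otherwise CAST them first, and per-column length limits can be handled by branching on COLUMN_NAME in the CASE):
INSERT INTO table_B (column_name, column_value, column_errorKey)
SELECT column_name
      ,column_value
      ,CASE
          WHEN LENGTH(column_value) <= 7
          THEN 'NoError'
          ELSE 'invalidLength'
       END AS column_errorKey
FROM table_A
UNPIVOT(column_value FOR column_name IN (
       column_1
      ,column_2
      -- ..., column_40
      ));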

Hive: How to check if values of one array are present in another?

I have two arrays like this, which are being returned from a UDF I created:
array A - [P908,S57,A65]
array B - [P908,S57]
I need to check if elements of array A are present in array B, or elements of array B are present in array A using hive queries.
I am stuck here. Could anyone suggest a way?
Can I also return some other data type from the UDF in place of array to make the comparison easier?
select concat(',',concat_ws(',',A),',') regexp
concat(',(',concat_ws('|',B),'),') as are_common_elements
from mytable
;
Demo
create table mytable (id int,A array<string>,B array<string>);
insert into table mytable
select 1,array('P908','S57','A65'),array('P908','S57')
union all select 2,array('P908','S57','A65'),array('P9','S5777')
;
select * from mytable;
+------------+----------------------+----------------+
| mytable.id | mytable.a | mytable.b |
+------------+----------------------+----------------+
| 1 | ["P908","S57","A65"] | ["P908","S57"] |
| 2 | ["P908","S57","A65"] | ["P9","S5777"] |
+------------+----------------------+----------------+
select id
,concat(',',concat_ws(',',A),',') as left_side_of_regexp
,concat(',(',concat_ws('|',B),'),') as right_side_of_regexp
,concat(',',concat_ws(',',A),',') regexp
concat(',(',concat_ws('|',B),'),') as are_common_elements
from mytable
;
+----+---------------------+----------------------+---------------------+
| id | left_side_of_regexp | right_side_of_regexp | are_common_elements |
+----+---------------------+----------------------+---------------------+
| 1 | ,P908,S57,A65, | ,(P908|S57), | true |
| 2 | ,P908,S57,A65, | ,(P9|S5777), | false |
+----+---------------------+----------------------+---------------------+
We can also do this using a lateral view.
Say we have two tables, Table1 and Table2, with array columns col1 and col2 respectively.
Use something like the following:
select collect_set(array_contains(col1, r.tab2))
from table1,
     (select exp1 as tab2
      from table2 t2 lateral view explode(col2) exploded_table as exp1) r
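Applied to the single-table layout used in the demo above, the same lateral view idea looks roughly like this (a sketch; any_common comes back as 1 for a row if at least one element of A appears in B, otherwise 0):
select  id
       ,max(if(array_contains(B, a_elem), 1, 0)) as any_common
from mytable
lateral view explode(A) e as a_elem
group by id
;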
You can also use array_intersection or other array functions.

How to add a column using SQL where, in every record, the value is the result of a computation on other columns?

I am using Netezza to manipulate some data. I am trying to add a column to a table whose values are the result of a computation on other columns.
First of all, I ran this SQL to create a table that rearranges the row order of another table:
CREATE TABLE SEQ_6_3_FNN_CID218_ORDERED AS
SELECT A.* FROM SEQ_6_3_FNN_CID218 A
ORDER BY TIMESTAMP
What I need is something like this, where columns TMP and ATT1 already exist and I need to add ATT2:
TMP | ATT1 | ATT2
----+------+--------
  1 |    1 | NULL
  2 |    4 | 4-1=3
  3 |    5 | 5-4=1
  4 |    8 | 8-5=3
  5 |    9 | 9-8=1
  6 |   12 | 12-9=3
What SQL can achieve this? Or is there a way to do this by running SQL directly on SEQ_6_3_FNN_CID218, without first creating the ordered table?
Thanks very much for your help.
HELP STILL NEEDED!
What you are looking for here is often referred to as a "calculated column". Netezza does not implement this feature, nor does it implement triggers (another method by which you might achieve the same result). Since Netezza is focused on data warehousing, the sorts of calculations you're talking about are usually done in the ETL process, by the ETL tool.
The good news is that you can do this purely through SQL with the LAG function, which is designed to do exactly this. Then, if you like, you can encode that in a view.
TESTDB.ADMIN(ADMIN)=> insert into base_table select * from base_ext;
INSERT 0 6
TESTDB.ADMIN(ADMIN)=> select * from base_table order by col1;
COL1 | COL2
------+------
1 | 1
2 | 4
3 | 5
4 | 8
5 | 9
6 | 12
(6 rows)
TESTDB.ADMIN(ADMIN)=> select col1, col2, col2 - lag(col2,1,NULL) over (
TESTDB.ADMIN(ADMIN)(> order by col1 asc) col3 from base_table;
COL1 | COL2 | COL3
------+------+------
1 | 1 |
2 | 4 | 3
3 | 5 | 1
4 | 8 | 3
5 | 9 | 1
6 | 12 | 3
(6 rows)
For clarity, the SQL again is:
select col1, col2, col2 - lag(col2,1,NULL) over ( order by col1 asc) col3 from base_table;
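To encode that in a view against the question's table, it would look something like this (a sketch; the view name is made up, and the ordering uses TMP to match the sample data, substitute TIMESTAMP if that is the real sequence):
-- hypothetical view name; ordered by TMP to match the sample rows
create view SEQ_6_3_FNN_CID218_ATT2 as
select tmp, att1, att1 - lag(att1,1,NULL) over (order by tmp asc) as att2
from SEQ_6_3_FNN_CID218;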
SQL Server can't do this "natively", but you could accomplish this with an insert and update trigger that responds to changes to the two columns and updates the third column.
Edit -- I stand corrected: SQL Server can do this natively. See Amirreza Keshavarz's answer.
Alter table SEQ_6_3_FNN_CID218_ORDERED
Add att2 as (att1-tmp)