AUTOINCREMENT primary key for Snowflake bulk loading

I would like to upload data into a Snowflake table. The table has a primary key field with AUTOINCREMENT.
When I tried to upload data without a value for the primary key field, I received the following error message:
The COPY failed with error: Number of columns in file (2) does not
match that of the corresponding table (3), use file format option
error_on_column_count_mismatch=false to ignore this error
Does anyone know if I can bulk load data into a table that has an AUTOINCREMENT primary key?
knozawa

You can query the staged file using a file format to load your data. I created a sample table like the one below, with the first column set to autoincrement:
-- Create the target table
create or replace table Employee (
    empidnumber number autoincrement start 1 increment 1,
    name varchar,
    salary varchar
);
I staged a sample file in a Snowflake internal stage, queried the staged file, and then executed the following COPY command:
copy into Employee (name, salary) from (select $1, $2 from @test/test.csv.gz);
And it loaded the table with incremented values.
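Querying the table afterwards confirms the autoincrement column was filled in (a quick sketch; the actual rows depend on your file):
select * from Employee;
-- empidnumber comes back as 1, 2, 3, ... even though the staged
-- file only contained the name and salary columns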

The docs have the following example which suggests this can be done:
https://docs.snowflake.net/manuals/user-guide/data-load-transform.html#include-autoincrement-identity-columns-in-loaded-data
-- Omit the sequence column in the COPY statement
copy into mytable (col2, col3)
from (
  select $1, $2
  from @~/myfile.csv.gz t
)
;
Could you please try this syntax and see if it works for you?

Create the target table
create or replace table mytable (
col1 number autoincrement start 1 increment 1,
col2 varchar,
col3 varchar
);
Stage a data file in the internal user stage
put file:///tmp/myfile.csv @~;
Query the staged data file
select $1, $2 from @~/myfile.csv.gz t;
+-----+-----+
| $1  | $2  |
|-----+-----|
| abc | def |
| ghi | jkl |
| mno | pqr |
| stu | vwx |
+-----+-----+
Omit the sequence column in the COPY statement
copy into mytable (col2, col3)
from (
  select $1, $2
  from @~/myfile.csv.gz t
)
;
select * from mytable;
+------+------+------+
| COL1 | COL2 | COL3 |
|------+------+------|
|    1 | abc  | def  |
|    2 | ghi  | jkl  |
|    3 | mno  | pqr  |
|    4 | stu  | vwx  |
+------+------+------+

Adding a PRIMARY KEY in Snowflake differs from the usual SQL syntax.
Syntax for adding a primary key with auto increment:
CREATE OR REPLACE TABLE EMPLOYEES (
    NAME VARCHAR(100),
    SALARY VARCHAR(100),
    EMPLOYEE_ID NUMBER AUTOINCREMENT START 1 INCREMENT 1 PRIMARY KEY
);
START 1 starts the primary key at 1 (you can start at any number you want).
INCREMENT 1 adds 1 to the previous value for each new ID (you can use any increment you want).
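As a quick sanity check (a sketch; the names and salaries are made up):
INSERT INTO EMPLOYEES (NAME, SALARY) VALUES ('Alice', '50000'), ('Bob', '60000');
SELECT NAME, EMPLOYEE_ID FROM EMPLOYEES;
-- Alice gets EMPLOYEE_ID 1, Bob gets 2, and so on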

Related

PostgreSQL add new not null column and fill with ids from insert statement

I've got 2 tables.
CREATE TABLE content (
id bigserial NOT NULL,
name text
);
CREATE TABLE data (
id bigserial NOT NULL,
...
);
The tables are already filled with a lot of data.
Now I want to add a new column, content_id (NOT NULL), to the data table.
It should be a foreign key to the content table.
Is it possible to automatically create an entry in the content table and set the content_id in the data table?
For example:
content
| id | name |
|  1 | abc  |
|  2 | cde  |
data
| id | ... |
|  1 | ... |
|  2 | ... |
|  3 | ... |
Now I need an update statement that creates 3 content entries (in this example) and adds their ids to the data table, to get this result:
content
| id | name |
|  1 | abc  |
|  2 | cde  |
|  3 | ...  |
|  4 | ...  |
|  5 | ...  |
data
| id | ... | content_id |
|  1 | ... |          3 |
|  2 | ... |          4 |
|  3 | ... |          5 |
demo:db<>fiddle
According to the answers presented here: How can I add a column that doesn't allow nulls in a Postgresql database?, there are several ways of adding a new NOT NULL column and filling it directly.
Basically there are 3 steps. Choose the best fit (with or without a transaction, setting a default value first and removing it after, leaving out the NOT NULL constraint first and adding it afterwards, ...):
Step 1: Add the new column (without the NOT NULL constraint, because its values are not available at this point)
ALTER TABLE data ADD COLUMN content_id integer;
Step 2: Insert the data into both tables in one statement:
WITH inserted AS (        -- 1
    INSERT INTO content
    SELECT
        generate_series(
            (SELECT MAX(id) + 1 FROM content),
            (SELECT MAX(id) FROM content) + (SELECT COUNT(*) FROM data)
        ),
        'dummy text'
    RETURNING id
), matched AS (           -- 2
    SELECT
        d.id AS data_id,
        i.id AS content_id
    FROM (
        SELECT
            id,
            row_number() OVER ()
        FROM data
    ) d
    JOIN (
        SELECT
            id,
            row_number() OVER ()
        FROM inserted
    ) i ON i.row_number = d.row_number
)                         -- 3
UPDATE data d
SET content_id = s.content_id
FROM (
    SELECT * FROM matched
) s
WHERE d.id = s.data_id;
Executing several statements one after another, using the results of the previous one, can be achieved with WITH clauses (CTEs):
1. Insert data into the content table: this generates an integer series starting at the current MAX(id) + 1 of content, with as many records as the data table has. Afterwards the new ids are returned.
2. Now we need to match the current records of the data table with the new ids. For both sides we use the row_number() window function to generate a consecutive row count for each record. Because the insert result and the data table have the same number of records, this count can be used as the join criterion, matching the id column of the data table with the new content ids.
3. This matched data can then be used in the final update of the new content_id column.
Step 3: Add the NOT NULL constraint
ALTER TABLE data ALTER COLUMN content_id SET NOT NULL;
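If the migration should be atomic, the three steps can also run inside a single transaction (a sketch; the comment in the middle stands for the WITH ... UPDATE statement shown above):
BEGIN;
-- Step 1: add the column without the constraint
ALTER TABLE data ADD COLUMN content_id integer;
-- Step 2: run the WITH inserted AS (...) UPDATE statement from above here
-- Step 3: enforce the constraint once every row has a value
ALTER TABLE data ALTER COLUMN content_id SET NOT NULL;
COMMIT;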

Hive insert overwrite directory stored as parquet produces NULL values

I'm trying to write some data into a directory, and then add that directory as a partition to a table.
create table test (key int, value int) partitioned by (dt int) stored as parquet location '/user/me/test';
insert overwrite directory '/user/me/test/dt=1' stored as parquet select 123, 456, 1;
alter table test add partition (dt=1);
select * from test;
This code sample is simple... but doesn't work. With the select statement, the output is NULL, NULL, 1. But I need 123, 456, 1.
When I read the data with Impala, I receive 123, 456, 1... which is what I expect.
Why? What is wrong?
If I remove the two "stored as parquet" clauses, everything is OK... but I want my data in Parquet!
PS: I want this construct for a partition switch, so that the data don't reach users while they are still being calculated...
Identifying the issue
hive
create table test (key int, value int)
partitioned by (dt int)
stored as parquet location '/user/me/test'
;
insert overwrite directory '/user/me/test/dt=1'
stored as parquet
select 123, 456
;
alter table test add partition (dt=1)
;
select * from test
;
+----------+------------+---------+
| test.key | test.value | test.dt |
+----------+------------+---------+
| NULL     | NULL       | 1       |
+----------+------------+---------+
bash
parquet-tools cat hdfs://{fs.defaultFS}/user/me/test/dt=1/000000_0
_col0 = 123
_col1 = 456
The files written by insert overwrite directory carry generic Parquet column names (_col0, _col1). Hive resolves Parquet columns by name, so key and value find no match and come back NULL, while Impala resolves columns by position and reads the values fine.
Verifying the issue
hive
alter table test change column `key` `_col0` int cascade;
alter table test change column `value` `_col1` int cascade;
select * from test
;
+------------+------------+---------+
| test._col0 | test._col1 | test.dt |
+------------+------------+---------+
| 123        | 456        | 1       |
+------------+------------+---------+
Suggested Solution
Create an additional table test_admin and do the insert through it. Writing through a table (rather than directly into a directory) makes Hive record the real column names in the Parquet files, so the external table can resolve them correctly.
create table test_admin (key int, value int)
partitioned by (dt int)
stored as parquet location '/user/me/test'
;
create external table test (key int, value int)
partitioned by (dt int)
stored as parquet
location '/user/me/test'
;
insert into test_admin partition (dt=1) select 123, 456
;
select * from test_admin
;
+----------------+------------------+---------------+
| test_admin.key | test_admin.value | test_admin.dt |
+----------------+------------------+---------------+
| 123            | 456              | 1             |
+----------------+------------------+---------------+
select * from test
;
(empty result set)
alter table test add partition (dt=1)
;
select * from test
;
+----------+------------+---------+
| test.key | test.value | test.dt |
+----------+------------+---------+
| 123      | 456        | 1       |
+----------+------------+---------+

Inserting a row at a specific place in an SQLite database

I was creating the database in SQLite Manager and by mistake I left out a row.
Now I want to add the row in the middle manually, and the auto-increment keys of the rows below it should automatically be increased by 1. I hope my problem is clear.
Thanks.
You shouldn't care about key values; just append your row at the end.
If you really need to do this, you can update the keys with something like the following. Say you want to insert the new row at key 87.
Make room for the key
update mytable
set key = key + 1
where key >= 87
Insert your row
insert into mytable ...
And finally update the key for the new row
update mytable
set key = 87
where key = NEW_ROW_KEY
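Putting the steps together in one transaction, and inserting directly with the freed key so the final UPDATE is unnecessary (a sketch; mytable, key, and the name column are placeholders for your schema):
BEGIN TRANSACTION;
-- make room at key 87
UPDATE mytable SET key = key + 1 WHERE key >= 87;
-- insert the forgotten row straight into the gap
INSERT INTO mytable (key, name) VALUES (87, 'forgotten row');
COMMIT;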
I would just update the IDs, incrementing them, then insert the record, setting its ID manually:
CREATE TABLE cats (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name VARCHAR
);
INSERT INTO cats (name) VALUES ('John');
INSERT INTO cats (name) VALUES ('Mark');
SELECT * FROM cats;
| 1 | John |
| 2 | Mark |
UPDATE cats SET ID = ID + 1 WHERE ID >= 2; -- "2" is the ID of forgotten record.
SELECT * FROM cats;
| 1 | John |
| 3 | Mark |
INSERT INTO cats (id, name) VALUES (2, 'SlowCat'); -- "2" is the ID of forgotten record.
SELECT * FROM cats;
| 1 | John |
| 2 | SlowCat |
| 3 | Mark |
The next record, inserted using the AUTOINCREMENT functionality, will get the next unused ID (4 in our case).
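For example (continuing the cats table above; the new row should get id 4):
INSERT INTO cats (name) VALUES ('NewCat');
SELECT * FROM cats;
| 1 | John    |
| 2 | SlowCat |
| 3 | Mark    |
| 4 | NewCat  |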

Write SQL script to insert data

In a database that contains many tables, I need to write a SQL script to insert data if it does not exist.
Table currency
| id | Code | lastupdate | rate |
+----+------+------------+------+
|  1 | USD  | 05-11-2012 |    2 |
|  2 | EUR  | 05-11-2012 |    3 |
Table client
| id | name | createdate | currencyId |
+----+------+------------+------------+
|  4 | tony | 11-24-2010 |          1 |
|  5 | john | 09-14-2010 |          2 |
Table account
| id | number | createdate | clientId |
+----+--------+------------+----------+
|  7 | 1234   | 12-24-2010 |        4 |
|  8 | 5648   | 12-14-2010 |        5 |
I need to insert to:
currency (id=3, Code=JPY, lastupdate=today, rate=4)
client (id=6, name=Joe, createdate=today, currencyId=Currency with Code 'USD')
account (id=9, number=0910, createdate=today, clientId=Client with name 'Joe')
Problem:
the script must check whether a row exists before inserting new data
the script must allow us to add a foreign key to the new row, where this foreign key refers to a row already in the database (such as currencyId in the client table)
the script must allow us to add the current datetime to a column in the insert statement (such as createdate in the client table)
the script must allow us to add a foreign key to the new row, where this foreign key refers to a row inserted by the same script (such as clientId in the account table)
Note: I tried the following SQL statement but it solved only the first problem
INSERT INTO Client (id, name, createdate, currencyId)
SELECT 6, 'Joe', '05-11-2012', 1
WHERE not exists (SELECT * FROM Client where id=6);
This query runs without any error, but as you can see I wrote createdate and currencyId manually; I need to take the currency id from a select statement with a where clause (I tried to substitute the 1 with a select statement, but the query failed).
This is an example of what I need; in my database, this script has to insert more than 30 rows into more than 10 tables.
Any help?
You wrote:
I tried to substitute the 1 with a select statement, but the query failed
But I wonder why it failed. What did you try? This should work:
INSERT INTO Client (id, name, createdate, currencyId)
SELECT
6,
'Joe',
current_date,
(select c.id from currency as c where c.code = 'USD') as currencyId
WHERE not exists (SELECT * FROM Client where id=6);
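The same pattern handles your fourth requirement too: a later insert can look up a row created earlier in the same script by a known value (a sketch reusing your table and column names):
INSERT INTO account (id, number, createdate, clientId)
SELECT
    9,
    '0910',
    current_date,
    (select cl.id from client as cl where cl.name = 'Joe')
WHERE not exists (SELECT * FROM account where id=9);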
It looks like you can already work out whether the data exists.
Here is a quick bit of code written in SQL Server / Sybase that I think answers your basic questions:
create table currency(
    id numeric(16,0) identity primary key,
    code varchar(3) not null,
    lastupdated datetime not null,
    rate smallint
);
create table client(
    id numeric(16,0) identity primary key,
    createddate datetime not null,
    currencyid numeric(16,0) foreign key references currency(id)
);
insert into currency (code, lastupdated, rate)
values('EUR', GETDATE(), 3)
--inserts the date and last allocated identity into client
insert into client(createddate, currencyid)
values(GETDATE(), @@IDENTITY)
go
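One caveat: on SQL Server specifically, SCOPE_IDENTITY() is usually preferred over @@IDENTITY, since @@IDENTITY can also pick up identity values generated by triggers. A variant of the last insert (same tables as above):
insert into client(createddate, currencyid)
values(GETDATE(), SCOPE_IDENTITY())
go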

Is it possible to update an "order" column from within a trigger in MySQL?

We have a table in our system that would benefit from a numeric column so we can easily grab the 1st, 2nd, 3rd records for a job. We could, of course, update this column from the application itself, but I was hoping to do it in the database.
The final method must handle cases where users insert data that belongs in the "middle" of the results, as they may receive information out of order. They may also edit or delete records, so there will be corresponding update and delete triggers.
The table:
CREATE TABLE `test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`seq` int(11) unsigned NOT NULL,
`job_no` varchar(20) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=latin1
And some example data:
mysql> SELECT * FROM test ORDER BY job_no, seq;
+----+-----+--------+------------+
| id | seq | job_no | date       |
+----+-----+--------+------------+
|  5 |   1 | 123    | 2009-10-05 |
|  6 |   2 | 123    | 2009-10-01 |
|  4 |   1 | 123456 | 2009-11-02 |
|  3 |   2 | 123456 | 2009-11-10 |
|  2 |   3 | 123456 | 2009-11-19 |
+----+-----+--------+------------+
I was hoping to update the "seq" column from a trigger, but this isn't allowed by MySQL, which gives the error "Can't update table 'test' in stored function/trigger because it is already used by statement which invoked this stored function/trigger".
My test trigger is as follows:
CREATE TRIGGER `test_after_ins_tr` AFTER INSERT ON `test`
FOR EACH ROW
BEGIN
    SET @seq = 0;
    UPDATE
        `test` t
    SET
        t.`seq` = @seq := (SELECT @seq + 1)
    WHERE
        t.`job_no` = NEW.`job_no`
    ORDER BY
        t.`date`;
END;
Is there any way to achieve what I'm after other than remembering to call a function after each update to this table?
What about this?
CREATE TRIGGER `test_after_ins_tr` BEFORE INSERT ON `test`
FOR EACH ROW
BEGIN
    SET @seq = (SELECT COALESCE(MAX(seq), 0) + 1 FROM test t WHERE t.job_no = NEW.job_no);
    SET NEW.seq = @seq;
END;
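A quick check against the sample data above (the new row here is made up): job 123 already has seq 1 and 2, so the trigger should assign seq 3.
INSERT INTO test (job_no, `date`) VALUES ('123', '2009-10-07');
SELECT id, seq, job_no, `date` FROM test WHERE job_no = '123' ORDER BY seq;
Note that this always appends at the end of a job's sequence; rows arriving out of date order, plus updates and deletes, would still need renumbering outside the trigger, given the restriction quoted below.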
From Sergi's comment above:
http://dev.mysql.com/doc/refman/5.1/en/stored-program-restrictions.html - "Within a stored function or trigger, it is not permitted to modify a table that is already being used (for reading or writing) by the statement that invoked the function or trigger."