Missing data for column error with SQL COPY

This is my code:
CREATE TABLE data
(
event_time DATE,
event_type1 TEXT,
event_type2 TEXT,
event_type3 TEXT,
event_type4 TEXT,
continent TEXT
);
COPY data
FROM '/home/data/data.csv' WITH CSV DELIMITER ';' ;
I get this error:
ERROR: missing data for column "event_type3"
Where: COPY dilans_data, line 11: "2018-01-01 00:07:41;subscribe;2458151268"
1 statement failed.
I do have missing values in some columns, but I would like to import the data first and deal with them afterwards.
I tried to add NULL 'null_string' (with NaN, N, 0) but it did not work.
Can you help me?

The error occurs because that CSV line contains only three fields while the table defines six columns. You can create the table with only the columns for which you have data and then add the other columns after the import.
CREATE TABLE data
(
event_time DATE,
event_type1 TEXT,
event_type2 TEXT
);
insert into data values ('2018-01-01 00:07:41','subscribe','2458151268');
1 rows affected
ALTER TABLE data ADD COLUMN event_type3 TEXT;
ALTER TABLE data ADD COLUMN event_type4 TEXT ;
ALTER TABLE data ADD COLUMN continent TEXT ;
SELECT * FROM data;
event_time | event_type1 | event_type2 | event_type3 | event_type4 | continent
:--------- | :---------- | :---------- | :---------- | :---------- | :--------
2018-01-01 | subscribe | 2458151268 | null | null | null
db<>fiddle here
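An alternative not shown in the fiddle is to keep all six columns and name only the populated ones in the COPY column list; the remaining columns are then filled with NULL. This sketch assumes every line of the CSV has exactly those three fields:

```sql
-- Keep the original six-column table and load only the columns
-- that actually appear in the file; the rest default to NULL.
COPY data (event_time, event_type1, event_type2)
FROM '/home/data/data.csv' WITH (FORMAT csv, DELIMITER ';');
```

If the file mixes rows with different field counts, COPY will still fail; in that case pre-processing the file (or loading it into a staging table of text columns) is the more robust route.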


How to use selected value as table name for another select in PostgreSQL

I have table with two coloms source_name and output_name:
CREATE TABLE all_table(
source_name text,
target_name text
);
The source name is the name of some external data. The target name is an auto-generated table name in my DB. There is a one-to-one relationship between them: each source name has exactly one target name.
I have additional table in DB:
CREATE TABLE output_table_1(
first_name text,
second_name text,
birthday timestamp
);
CREATE TABLE output_table_2(
login text,
money int
);
In table "all_table" I have some rows:
| source_name | target_name |
|---------------|----------------|
| personal data | output_table_1 |
| login data | output_table_2 |
I want select information from correct table by source name. So I tried
WITH selected_table AS (
SELECT target_name FROM all_table WHERE source_name='personal data'
)
SELECT * FROM selected_table;
And also
SELECT first_name FROM
(SELECT target_name FROM all_table WHERE source_name='personal data') AS out_table;
But Postgres only prints me the correct target name:
| target_name |
|----------------|
| output_table_1 |
I want my query to return something like this:
| first_name | second_name | birthday |
|------------|-------------|----------|
| FName1 | SName1 | Date1 |
| FName2 | SName2 | Date2 |
| FName3 | SName3 | Date3 |
| FName4 | SName4 | Date4 |
| ... | ... | ... |
I've also tried this query
DO
$$
BEGIN
EXECUTE format('SELECT * FROM %s LIMIT 10', (SELECT target_name FROM all_table WHERE source_name='personal data'));
END;
$$ LANGUAGE plpgsql;
The query executed but nothing happened. Searching Google hasn't turned up anything useful, but maybe I'm bad at this.
¯\_(ツ)_/¯
If you want to return data from a DO block, you need to define a cursor for the query.
do
$$
declare
    _query text;
    _cursor constant refcursor := '_cursor';
begin
    _query := 'SELECT * FROM ' || (SELECT target_name FROM all_table WHERE source_name = 'personal data');
    open _cursor for execute _query;
end;
$$;
fetch all from _cursor;
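One caveat worth adding to this answer: plain string concatenation of an identifier is open to SQL injection if all_table can hold arbitrary names. A variant of the same cursor trick using format() with the %I placeholder, which quote-protects identifiers, might look like this:

```sql
do
$$
declare
    _cursor constant refcursor := '_cursor';
begin
    -- %I quotes the value as an identifier, so unusual table names are safe
    open _cursor for execute format(
        'SELECT * FROM %I',
        (SELECT target_name FROM all_table WHERE source_name = 'personal data'));
end;
$$;
fetch all from _cursor;
```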

PostgreSQL add new not null column and fill with ids from insert statement

I've got 2 tables.
CREATE TABLE content (
id bigserial NOT NULL,
name text
);
CREATE TABLE data (
id bigserial NOT NULL,
...
);
The tables are already filled with a lot of data.
Now I want to add a new column content_id (NOT NULL) to the data table.
It should be a foreign key to the content table.
Is it possible to automatically create an entry in the content table for each row of data and set the resulting content_id in the data table?
For example
content
| id | name |
| 1 | abc |
| 2 | cde |
data
| id |... |
| 1 |... |
| 2 |... |
| 3 |... |
Now I need an update statement that creates 3 (in this example) content entries and add the ids to the data table to get this result:
content
| id | name |
| 1 | abc |
| 2 | cde |
| 3 | ... |
| 4 | ... |
| 5 | ... |
data
| id |... | content_id |
| 1 |... | 3 |
| 2 |... | 4 |
| 3 |... | 5 |
demo:db<>fiddle
According to the answers presented here: How can I add a column that doesn't allow nulls in a Postgresql database?, there are several ways of adding a new NOT NULL column and fill this directly.
Basically there are 3 steps. Choose the best fit (with or without a transaction; setting a default value first and removing it after; leaving out the NOT NULL constraint first and adding it afterwards; ...)
Step 1: Adding new column (without NOT NULL constraint, because the values of the new column values are not available at this point)
ALTER TABLE data ADD COLUMN content_id integer;
Step 2: Inserting the data into both tables in a row:
WITH inserted AS ( -- 1
INSERT INTO content
SELECT
generate_series(
(SELECT MAX(id) + 1 FROM content),
(SELECT MAX(id) FROM content) + (SELECT COUNT(*) FROM data)
),
'dummy text'
RETURNING id
), matched AS ( -- 2
SELECT
d.id AS data_id,
i.id AS content_id
FROM (
SELECT
id,
row_number() OVER ()
FROM data
) d
JOIN (
SELECT
id,
row_number() OVER ()
FROM inserted
) i ON i.row_number = d.row_number
) -- 3
UPDATE data d
SET content_id = s.content_id
FROM (
SELECT * FROM matched
) s
WHERE d.id = s.data_id;
Executing several statements one after another by using the results of the previous one can be achieved using WITH clauses (CTEs):
Insert data into the content table: this generates an integer series starting at MAX(id) + 1 of the current content ids, with as many records as the data table has. Afterwards the new ids are returned.
Now we need to match the current records of the data table with the new ids. On both sides we use the row_number() window function to generate a consecutive row count for each record. Because the insert result and the data table have the same number of records, this can be used as the join criterion, so we can match the id column of the data table with the new content ids.
This matched data can be used in the final update of the new content_id column.
Step 3: Add the NOT NULL constraint
ALTER TABLE data ALTER COLUMN content_id SET NOT NULL;
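The question also asked for the column to be a foreign key to the content table; the steps above stop at NOT NULL. Assuming the data from Step 2 is in place, the reference could be added as a fourth step (the constraint name here is made up):

```sql
ALTER TABLE data
    ADD CONSTRAINT data_content_id_fkey
    FOREIGN KEY (content_id) REFERENCES content (id);
```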

postgres insert data from an other table inside array type columns

I have two tables on Postgres 11, like so, with some ARRAY-typed columns.
CREATE TABLE test (
id INT UNIQUE,
category TEXT NOT NULL,
quantitie NUMERIC,
quantities INT[],
dates INT[]
);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (1, 'cat1', 33, ARRAY[66], ARRAY[123678]);
INSERT INTO test (id, category, quantitie, quantities, dates) VALUES (2, 'cat2', 99, ARRAY[22], ARRAY[879889]);
CREATE TABLE test2 (
idweb INT UNIQUE,
quantities INT[],
dates INT[]
);
INSERT INTO test2 (idweb, quantities, dates) VALUES (1, ARRAY[34], ARRAY[8776]);
INSERT INTO test2 (idweb, quantities, dates) VALUES (3, ARRAY[67], ARRAY[5443]);
I'm trying to update data from table test2 into table test, only on rows with the same id, appending to the arrays of table test while keeping the original values.
I use INSERT ... ON CONFLICT;
how do I update only the 2 columns quantities and dates?
Running the SQL below, I also get an error whose origin I don't understand.
Schema Error: error: column "quantitie" is of type numeric but expression is of type integer[]
INSERT INTO test (SELECT * FROM test2 WHERE idweb IN (SELECT id FROM test))
ON CONFLICT (id)
DO UPDATE
SET
quantities = array_cat(EXCLUDED.quantities, test.quantities),
dates = array_cat(EXCLUDED.dates, test.dates);
https://www.db-fiddle.com/f/rs8BpjDUCciyZVwu5efNJE/0
Is there a better way to update table test from table test2, or what am I missing in the SQL?
Update: this is the result needed in table test:
**Schema (PostgreSQL v11)**
| id | quantitie | quantities | dates | category |
| --- | --------- | ---------- | ----------- | --------- |
| 2 | 99 | 22 | 879889 | cat2 |
| 1 | 33 | 34,66 | 8776,123678 | cat1 |
Basically, your query fails because the structures of the tables do not match - so you cannot insert into test select * from test2.
You could work around this by adding "fake" columns to the select list, like so:
insert into test
select idweb, 'foo', 0, quantities, dates from test2 where idweb in (select id from test)
on conflict (id)
do update set
quantities = array_cat(excluded.quantities, test.quantities),
dates = array_cat(excluded.dates, test.dates);
But this looks much more convoluted than needed. Essentially, you want an update statement, so I would just recommend:
update test
set
dates = test2.dates || test.dates,
quantities = test2.quantities || test.quantities
from test2
where test.id = test2.idweb;
Note that this uses the || concatenation operator instead of array_cat() - it is shorter to write.
Demo on DB Fiddle:
id | category | quantitie | quantities | dates
-: | :------- | --------: | :--------- | :------------
2 | cat2 | 99 | {22} | {879889}
1 | cat1 | 33 | {34,66} | {8776,123678}
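If test2 may also contain ids that do not yet exist in test, an upsert with an explicit column list covers both cases in one statement. Note that the 'unknown' and 0 placeholders for category and quantitie are assumptions, since test2 has no such columns:

```sql
insert into test (id, category, quantitie, quantities, dates)
select idweb, 'unknown', 0, quantities, dates
from test2
on conflict (id)
do update set
    -- for existing rows, prepend the incoming arrays
    quantities = excluded.quantities || test.quantities,
    dates = excluded.dates || test.dates;
```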

Creating a trigger to copy data from 1 table to another

I am trying to create a trigger that will copy data from table 1 into table 2 when a new entry is inserted into table 1:
Table 1
id | first_name | last_name | email | uid | pwd
----+------------+-----------+-------+-----+-----
Table 2
user_id | user_first_name | user_last_name | user_uid
---------+-----------------+----------------+---------
The code I am using is this:
DROP TRIGGER IF EXISTS usersetup_identifier ON users;
CREATE OR REPLACE FUNCTION usersetup_identifier_insert_update() RETURNS trigger AS $BODY$
BEGIN
if NEW.identifier is null then
NEW.identifier := "INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
SELECT id, first_name, last_name, uid
FROM users";
end if;
RETURN NEW;
end
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER usersetup_identifier
AFTER INSERT OR UPDATE ON users FOR EACH ROW
EXECUTE PROCEDURE usersetup_identifier_insert_update();
But when I insert data into table 1 I get this error message:
NOTICE: identifier "INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
SELECT id, first_name, last_name, uid
FROM users" will be truncated to "INSERT INTO users_setup (user_id, user_first_name, user_last_na"
ERROR: record "new" has no field "identifier"
CONTEXT: SQL statement "SELECT NEW.identifier is null"
PL/pgSQL function usersetup_identifier_insert_update() line 3 at IF
the table descriptions are:
Table "public.users"
Column | Type | Collation | Nullable | Default
------------+---------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('users_id_seq'::regclass)
first_name | character(20) | | not null |
last_name | character(20) | | not null |
email | character(60) | | not null |
uid | character(20) | | not null |
pwd | character(20) | | not null |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_email_key" UNIQUE CONSTRAINT, btree (email)
"users_pwd_key" UNIQUE CONSTRAINT, btree (pwd)
"users_uid_key" UNIQUE CONSTRAINT, btree (uid)
Triggers:
usersetup_identifier AFTER INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE usersetup_identifier_insert_update()
All the columns match their corresponding columns.
Table "public.users_setup"
Column | Type | Collation | Nullable | Default
-----------------+------------------------+-----------+----------+-----------------------------------------
id | integer | | not null | nextval('users_setup_id_seq'::regclass)
user_id | integer | | |
user_first_name | character(20) | | |
user_last_name | character(20) | | |
user_uid | character(20) | | |
Can anyone help me with where I am going wrong?
There are multiple errors in your code:
The table users has no column named identifier, so the expression NEW.identifier is invalid.
You are assigning a value to a (non-existing) column with the expression new.identifier := ... - but you want to run an INSERT statement, not assign a value.
String values need to be enclosed in single quotes, e.g. 'Arthur', double quotes denote identifiers (e.g. a table or column name). But there is no column named "INSERT INTO use ..."
To access the values of the row being inserted you need to use the new record and the column names. No need to select from the table:
As far as I can tell, this is what you want:
CREATE OR REPLACE FUNCTION usersetup_identifier_insert_update()
RETURNS trigger
AS
$BODY$
BEGIN
INSERT INTO users_setup (user_id, user_first_name, user_last_name, user_uid)
values (new.id, new.first_name, new.last_name, new.uid);
RETURN NEW;
end
$BODY$
LANGUAGE plpgsql;
Unrelated, but:
copying data around like that is bad database design. What happens if you change the user's name? Then you would need to UPDATE the user_setup table as well. It is better to only store a (foreign key) reference in the user_setup table that references the users table.
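A minimal sketch of that foreign-key design, assuming users_setup keeps only the reference (the constraint name is made up):

```sql
ALTER TABLE users_setup
    ADD CONSTRAINT users_setup_user_id_fkey
    FOREIGN KEY (user_id) REFERENCES users (id);

-- Read the user's details through a join instead of storing copies:
SELECT us.id, u.first_name, u.last_name, u.uid
FROM users_setup us
JOIN users u ON u.id = us.user_id;
```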

Hive insert overwrite directory stored as parquet made NULL values

I'm trying to write some data into a directory, and afterwards add that directory as a partition of a table.
create table test (key int, value int) partitioned by (dt int) stored as parquet location '/user/me/test';
insert overwrite directory '/user/me/test/dt=1' stored as parquet select 123, 456, 1;
alter table test add partition (dt=1);
select * from test;
This code sample is simple... but doesn't work. The select statement outputs NULL, NULL, 1, but I need 123, 456, 1.
When I read the data with Impala, I get 123, 456, 1... which is what I expected.
Why? What is wrong?
If I remove the two "stored as parquet" clauses, everything is OK... but I want my data in Parquet!
PS: I want this construct in order to switch partitions, so that the data is not visible to users while it is being calculated...
Identifying the issue
hive
create table test (key int, value int)
partitioned by (dt int)
stored as parquet location '/user/me/test'
;
insert overwrite directory '/user/me/test/dt=1'
stored as parquet
select 123, 456
;
alter table test add partition (dt=1)
;
select * from test
;
+----------+------------+---------+
| test.key | test.value | test.dt |
+----------+------------+---------+
| NULL | NULL | 1 |
+----------+------------+---------+
bash
parquet-tools cat hdfs://{fs.defaultFS}/user/me/test/dt=1/000000_0
_col0 = 123
_col1 = 456
As the parquet-tools output shows, insert overwrite directory wrote the file with the default column names _col0 and _col1. Hive resolves Parquet columns by name, so key and value find no match and come back as NULL; Impala resolves them by position, which is why it returns the expected values.
Verifying the issue
hive
alter table test change column `key` `_col0` int cascade;
alter table test change column `value` `_col1` int cascade;
select * from test
;
+------------+------------+---------+
| test._col0 | test._col1 | test.dt |
+------------+------------+---------+
| 123 | 456 | 1 |
+------------+------------+---------+
Suggested Solution
Create an additional table test_admin and do the insert through it, with test as an external table over the same location:
create table test_admin (key int, value int)
partitioned by (dt int)
stored as parquet location '/user/me/test'
;
create external table test (key int, value int)
partitioned by (dt int)
stored as parquet
location '/user/me/test'
;
insert into test_admin partition (dt=1) select 123, 456
;
select * from test_admin
;
+----------+------------+---------+
| test.key | test.value | test.dt |
+----------+------------+---------+
| 123 | 456 | 1 |
+----------+------------+---------+
select * from test
;
(empty result set)
alter table test add partition (dt=1)
;
select * from test
;
+----------+------------+---------+
| test.key | test.value | test.dt |
+----------+------------+---------+
| 123 | 456 | 1 |
+----------+------------+---------+