How to import daily CSV data into a table with generated columns in PostgreSQL

I'm new to PostgreSQL and am looking for some guidance and best practices.
I have created a table by importing data from a csv file. I then altered the table by creating multiple generated columns like this:
ALTER TABLE master
ADD office VARCHAR(50)
GENERATED ALWAYS AS (CASE WHEN LEFT(location,4)='Chic' THEN 'CHI'
ELSE LEFT(location,strpos(location,'_')-1) END) STORED;
But when I try to import new data into the table I get the following error:
ERROR: column "office" is a generated column
DETAIL: Generated columns cannot be used in COPY.
My goal is to be able to import new data each day to the table and have the generated columns automatically populate in order to transform the data as I would like. How can I do so?

CREATE TEMP TABLE master (location VARCHAR);
ALTER TABLE master
ADD office VARCHAR
GENERATED ALWAYS AS (
CASE
WHEN LEFT(location, 4) = 'Chic' THEN 'CHI'
ELSE LEFT(location, strpos(location, '_') - 1)
END
) STORED;
--INSERT INTO master (location) VALUES ('Chicago');
--INSERT INTO master (location) VALUES ('New_York');
COPY master (location) FROM $$d:\cities.csv$$ CSV;
SELECT * FROM master;
Is this the structure and the behaviour you are expecting? If not, please provide more details regarding your table structure, your importable data and your importing commands.
Also, maybe when you try to import the csv file, the columns are not linked properly, or maybe the delimiter is not set correctly. Try to specify each column in the exact order in which it appears in your csv file.
https://www.postgresql.org/docs/12/sql-copy.html
Note: d:\cities.csv contains:
Chicago
New_York
EDIT:
If column positions are mixed up between the table and the csv, the following approach may come in handy:
1. create temporary table tmp (csv_column1 <data_type>, csv_column_2 <data_type>, ...); (including ALL csv columns)
2. copy tmp from '/path/to/file.csv';
3. insert into master (location, other_info, ...) select csv_column_3 as location, csv_column_7 as other_info, ... from tmp;
Importing data using an intermediate table may slow things down a little, but gives you a lot of flexibility.
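For example, a minimal sketch of that staging approach, assuming the daily csv has a header row and the columns (city_name, location) in that order; the names and the path are illustrative, not taken from your setup:
CREATE TEMP TABLE staging (city_name VARCHAR(50), location VARCHAR(50));
-- Load the raw csv as-is into the staging table.
COPY staging (city_name, location) FROM '/path/to/daily.csv' WITH (FORMAT csv, HEADER true);
-- List only the non-generated columns; "office" is populated by its GENERATED expression.
INSERT INTO master (location)
SELECT location FROM staging;
DROP TABLE staging;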

I was getting the same error when importing into PG from a csv. I found that even though my column was generated, I still had to include it in the imported data; I just left it empty. The import worked fine once the column name was present and mapped to my DB column name.


How can I copy data from CSV to a destination table based on column names?

Context
I am receiving CSV files in S3, which do not always follow the same schema and/or order. For example, sometimes files look like:
foo, bar, bla
hi , 007, 42
bye, 008, 44
But other times, they can look like (bar can be missing):
foo, bla
hi , 42
bye, 44
Now let's say I'm only interested in getting the foo column regardless of what else is there. But I can't really count on the order of the columns in the CSV, so on some days foo could be the first column, while on other days foo could be the third column. By the way, I am using Snowflake as a database.
What I have tried to do
I created a destination table like:
CREATE TABLE woof.meow (foo TEXT);
Then I tried to use Snowflake's COPY INTO command to copy data from the CSV into the table I created. The catch here is that I tried to do it the same way I normally do for Parquet files (matching by column names!), like:
COPY INTO woof.meow
FROM '@STAGES.MY_S3_BUCKET_STAGE/'
file_format = (
TYPE=CSV,
COMPRESSION=GZIP,
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
But sadly I always got: error: Insert value list does not match column list expecting 1 but got 0
Some research led me to this section of the docs (about MATCH_BY_COLUMN_NAME), where I discovered that CSV is not supported:
This copy option is supported for the following data formats:
- JSON
- Avro
- ORC
- Parquet
Desired objective
How can I copy data from the stage (containing a csv file on S3) to a pre-created table based on column names?
I am happy to provide any further information if needed.
You are trying to insert CSV (comma-separated values) data into a single text column. To my knowledge, the column order in your source files has to match the column order of the target table in Snowflake: if the source csv has foo, bar and bla as columns, the target table should be created with the same separate columns, in the same order as in the source csv files.
If you are unsure which columns will arrive in your source files, I would recommend transforming the files to JSON (that is my choice; you can pick another option such as Avro) and loading that content into a VARIANT column in Snowflake.
This way you would not have to worry about the column order in the source files: you store the data as JSON/Avro in the target table and use Snowflake's JSON handling to turn JSON values into columns (flatten the JSON to convert it into a relational table).
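For illustration, a minimal sketch of that VARIANT approach, assuming the files have already been converted to JSON and staged at @STAGES.MY_S3_BUCKET_STAGE; the raw table name is an illustrative assumption:
CREATE TABLE IF NOT EXISTS woof.meow_raw (v VARIANT);
COPY INTO woof.meow_raw
FROM @STAGES.MY_S3_BUCKET_STAGE/
file_format = (TYPE=JSON, COMPRESSION=GZIP);
-- Pull out the column of interest by name, regardless of where it appeared in the original file.
INSERT INTO woof.meow (foo)
SELECT v:foo::TEXT FROM woof.meow_raw;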

Adding column while loading csv in SQL

I have an existing table (NameList) into which I would like to load the contents of multiple csv files (fileA.csv, fileB.csv, ...). The columns of the table are identical to those of the csv, except that I also want to record for each row the id of the csv file it came from. The id would be taken from another table which contains the properties of each file.
The table with the list of files would look like this:
CREATE TABLE files
(
id serial,
fileName varchar(128),
path varchar(256),
PRIMARY KEY (id)
)
The table to insert the csv contents in to would look like:
CREATE TABLE NameList
(
FirstName varchar(40),
LastName varchar(40),
SourceFile_ID int,
FOREIGN KEY (SourceFile_ID) REFERENCES files(id)
)
The csv files would look as follows:
Name of file:
fileA.csv
Contents:
FirstName,LastName
John,Smith
.
.
.
The only thing relating to this I could find so far is this:
Add extra column while importing csv data in table in SQL server table
However, they suggest using a default value for the additional column, which would not solve my problem since I need a different value for each file I add.
You could insert the data into a temporary table (https://www.postgresqltutorial.com/postgresql-temporary-table), update the column, then move the data to the main table.
This would avoid problems with two CSVs being loaded at once, because they would use different temp tables (as long as two different db sessions are used for the inserts). Even if only one session is used, you could use a different temp table name for each CSV.
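A minimal sketch of that approach, assuming the files row for fileA.csv already exists and PostgreSQL's COPY is used; the path and the staging table name are illustrative:
CREATE TEMP TABLE namelist_staging (FirstName varchar(40), LastName varchar(40));
COPY namelist_staging FROM '/path/to/fileA.csv' WITH (FORMAT csv, HEADER true);
-- Attach the id of the source file while moving the rows into the main table.
INSERT INTO NameList (FirstName, LastName, SourceFile_ID)
SELECT s.FirstName, s.LastName, f.id
FROM namelist_staging s
JOIN files f ON f.fileName = 'fileA.csv';
DROP TABLE namelist_staging;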

How to use 'COPY FROM VERTICA' on same database to copy data from one table to another

I want to copy data from one table to another in Vertica using the COPY FROM VERTICA command. I have a table with a large amount of data in it, and I want to select a subset of rows (where field1 = 'some val', etc.) and copy them to another table.
The source table has columns of type LONG VARCHAR, and I want to copy these values into another table whose columns have different types, such as VARCHAR, DATE and BOOLEAN. What I want is that only valid values are copied into the destination table; rows with invalid data should be rejected.
I tried to move the data using an INSERT command like the one below, but the problem is that even a single row with invalid data terminates the whole process (and nothing gets copied into the destination table).
INSERT INTO cb.destTable(field1, field2, field3)
Select cast(field1 as varchar), cast(field2 as varchar), cast(field3 as int)
FROM sourceTable Where Id = 2;
How can this be done?
COPY FROM VERTICA and EXPORT TO VERTICA are intended to copy data between clusters. Even if you did loop back the connection, you would not be able to use rejections, as they are not supported by COPY FROM VERTICA. The mappings are strict, so if a value cannot be coerced, the statement will fail.
You'll have to do one of the following:
- INSERT ... SELECT ... WHERE <conditions to filter out data that won't coerce> (see the sketch below)
- INSERT ... SELECT <expressions that massage data that won't coerce>
- Export the data to a file using vsql (you can turn off headers/footers, turn off padding, set the delimiter to something that doesn't exist in your data, etc.), then use COPY to load it back in.
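A minimal sketch of the first option, assuming field3 is the value that needs to become an integer; the regular expression is an illustrative assumption about what "valid" means for your data:
INSERT INTO cb.destTable (field1, field2, field3)
SELECT field1::VARCHAR, field2::VARCHAR, field3::INT
FROM sourceTable
WHERE Id = 2
  AND REGEXP_LIKE(field3, '^[0-9]+$');  -- keep only rows whose field3 can be cast to INT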
Try exporting it into a csv file from vsql:
=> \o output.csv
=> SELECT CAST(field1 AS VARCHAR), CAST(field2 AS VARCHAR), CAST(field3 AS INT) FROM sourceTable WHERE Id = 2;
=> \o
Then use the COPY command to load it back into the desired table:
COPY cb.destTable FROM '(csv_directory)' DELIMITER '(comma or your configured delimiter)' NO ESCAPE NULL '(NULL indicator)' SKIP 1;
Are they both in the same Vertica database? If so, an alternative is:
DROP TABLE IF EXISTS cb.destTable;
CREATE TABLE cb.destTable AS
SELECT field1::VARCHAR, field2::VARCHAR, field3::VARCHAR
FROM sourceTable WHERE Id = 2;

How to add lines from text file to sqlite db rows that already exist?

I have 12 columns with +/- 2000 rows in a sqlite DB.
Now I want to add a 13th column with the same amount of rows.
If I import the text from a csv file, it is appended after the existing rows (so I end up with a 4000-row table).
How can I avoid adding it underneath these rows?
Do I need to create a script that runs through each row of the table and adds the text from the csv file for each row?
If you have the code that imported the original data, and if the data has not changed in the meantime, you could just drop the table and reimport it.
Otherwise, you indeed have to create a script that looks up the corresponding record in the table and updates it.
You could also import the new data into a temporary table, and then copy the values over with a command like this:
UPDATE MyTable
SET NewColumn = (SELECT NewColumn
FROM TempTable
WHERE ID = MyTable.ID)
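For completeness, a sketch of the full flow in the sqlite3 shell; the table, column and file names are illustrative assumptions, and the csv is expected to contain matching ID values plus the new column:
-- In the sqlite3 shell:
-- .mode csv
-- .import new_column.csv TempTable   (creates TempTable from the header row if it does not exist)
ALTER TABLE MyTable ADD COLUMN NewColumn TEXT;
UPDATE MyTable
SET NewColumn = (SELECT NewColumn
                 FROM TempTable
                 WHERE ID = MyTable.ID);
DROP TABLE TempTable;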
I ended up using RazorSQL, a great program.
http://www.razorsql.com/

SQL dump of data based on selection criteria

When extracting data from a table (schema and data), I can right-click the database and go to Tasks -> Generate Scripts, and it gives me all the data from the table, including the CREATE script, which is good.
This, though, gives me all the data from the table. Can this be changed to give me only some of the data, e.g. only rows added after a certain dtmTimeStamp?
I would recommend extracting your data into a separate table using a query and then using Generate Scripts on that table. Alternatively, you can extract the data separately into a flat file using the Export Data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1, Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = @Condition
(Thanks to @MarkD for adding that)
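Applied to the question's scenario, a rough sketch of the same idea with a timestamp filter; the cutoff value is an illustrative assumption:
SELECT *
INTO CloneTable
FROM MyTable
WHERE dtmTimeStamp > '2021-01-01';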