How can I copy data from CSV to a destination table based on column names? - sql

Context
I am receiving CSV files in S3, which do not always follow the same schema and/or order. For example, sometimes files look like:
foo, bar, bla
hi , 007, 42
bye, 008, 44
But other times, they can look like (bar can be missing):
foo, bla
hi , 42
bye, 44
Now let's say I'm only interested in getting the foo column, regardless of what else is there. But I can't count on the order of the columns in the CSV, so on some days foo could be the first column, while on other days it could be the third. By the way, I am using Snowflake as the database.
What I have tried to do
I created a destination table like:
CREATE TABLE woof.meow (foo TEXT);
Then I tried to use Snowflake's COPY INTO command to copy data from the CSV into the table I created. The catch here is that I tried to do it the same way I normally do for Parquet files (matching by column names!), like:
COPY INTO woof.meow
FROM '@STAGES.MY_S3_BUCKET_STAGE/'
FILE_FORMAT = (
  TYPE = CSV,
  COMPRESSION = GZIP
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
But sadly I always got: error: Insert value list does not match column list expecting 1 but got 0
Some research led me to this section of the docs (about MATCH_BY_COLUMN_NAME), where I discovered CSV is not supported:
This copy option is supported for the following data formats:
- JSON
- Avro
- ORC
- Parquet
Desired objective
How can I copy data from the stage (containing CSV files on S3) to a pre-created table based on column names?
I am happy to provide any further information if needed.

You are trying to insert CSV (comma-separated values) data into a single text column. To my knowledge, the column order in your source data files should be the same as the column order of the target table you created in Snowflake, which means if you have foo, bar, and bla as columns in the source CSV file, then your target table should also have those as separate columns, in the same order as the source CSV files.
If you are unsure which columns could appear in your source file, I would recommend transforming the file to JSON (that is my choice; you could pick another option, such as Avro) and loading that content into a VARIANT column in Snowflake.
That way you would not need to worry about the order of columns in the source files: you would store the data as JSON/Avro in the target table and use Snowflake's JSON-handling mechanisms to convert the JSON values into columns (flatten the JSON to turn it into a relational table).
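A minimal sketch of that approach, assuming the stage name from the question, a hypothetical staging table woof.meow_raw, and that the files have already been converted to JSON upstream:

```sql
-- Land the semi-structured data in a single VARIANT column.
CREATE TABLE woof.meow_raw (v VARIANT);

COPY INTO woof.meow_raw
FROM '@STAGES.MY_S3_BUCKET_STAGE/'
FILE_FORMAT = (TYPE = JSON, COMPRESSION = GZIP);

-- Pull out foo by key, regardless of where it sat in the original file.
INSERT INTO woof.meow (foo)
SELECT v:foo::TEXT FROM woof.meow_raw;
```

Because the extraction is by key (v:foo) rather than by position, missing or reordered columns in the source no longer matter.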

Related

How to import daily csv data into table with generated columns postgres

I'm new to PostgreSQL and am looking for some guidance and best practice.
I have created a table by importing data from a csv file. I then altered the table by creating multiple generated columns like this:
ALTER TABLE master
ADD office VARCHAR(50)
GENERATED ALWAYS AS (
  CASE
    WHEN LEFT(location, 4) = 'Chic' THEN 'CHI'
    ELSE LEFT(location, strpos(location, '_') - 1)
  END
) STORED;
But when I try to import new data into the table I get the following error:
ERROR: column "office" is a generated column
DETAIL: Generated columns cannot be used in COPY.
My goal is to be able to import new data each day to the table and have the generated columns automatically populate in order to transform the data as I would like. How can I do so?
CREATE TEMP TABLE master (location VARCHAR);
ALTER TABLE master
ADD office VARCHAR
GENERATED ALWAYS AS (
CASE
WHEN LEFT(location, 4) = 'Chic' THEN 'CHI'
ELSE LEFT(location, strpos(location, '_') - 1)
END
) STORED;
--INSERT INTO master (location) VALUES ('Chicago');
--INSERT INTO master (location) VALUES ('New_York');
COPY master (location) FROM $$d:\cities.csv$$ CSV;
SELECT * FROM master;
Is this the structure and the behaviour you are expecting? If not, please provide more details regarding your table structure, your importable data and your importing commands.
Also, maybe when you try to import the csv file, the columns are not linked properly, or maybe the delimiter is not set properly. Try to specify each column in the exact order they appear in your csv file.
https://www.postgresql.org/docs/12/sql-copy.html
Note: d:\cities.csv contains:
Chicago
New_York
EDIT:
If columns positions are mixed up between table and csv, the following operation may come in handy:
1. create temporary table tmp (csv_column1 <data_type>, csv_column_2 <data_type>, ...); (including ALL csv columns)
2. copy tmp from '/path/to/file.csv';
3. insert into master (location, other_info, ...) select csv_column_3 as location, csv_column_7 as other_info, ... from tmp;
Importing data using an intermediate table may slow things down a little, but gives you a lot of flexibility.
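Concretely, with hypothetical column names (csv_col1 through csv_col3 stand in for whatever the file actually contains), the staging pattern looks like:

```sql
-- 1. Staging table mirroring the CSV layout exactly.
CREATE TEMPORARY TABLE tmp (
  csv_col1 TEXT,
  csv_col2 TEXT,
  csv_col3 TEXT
);

-- 2. Load the raw file into the staging table.
COPY tmp FROM '/path/to/file.csv' CSV;

-- 3. Map staging columns to the real table by name; the
--    generated "office" column populates itself.
INSERT INTO master (location)
SELECT csv_col3 FROM tmp;
```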
I was getting the same error when importing into PG from a csv. I found that even though my column was generated, I still had to have it in the imported data; I just left it empty. It worked fine once the column name was in there and mapped to my DB column name.

HIVE Query - Loading data into HIVE Table

I have a dataset (txt file) with 10 columns, of which the last column has string data separated by tabs, for example: abcdef lkjhj pqrst...wxyz
I created a new table defining column 10 as STRING, but after loading the data into this table, when I verify the data it shows only abcdef populated in the last column and the rest is ignored.
Please can someone help: how do I load the entire string into the Hive table? Do I need to write a UDF?
Thanks in advance
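One way to keep tabs inside the last column, without a UDF, is to define the table with Hive's built-in RegexSerDe and a pattern that splits on the first nine tabs only; everything after that (tabs included) falls into column 10. This is a sketch, and the table and column names are made up:

```sql
CREATE TABLE my_table (
  col1 STRING, col2 STRING, col3 STRING, col4 STRING, col5 STRING,
  col6 STRING, col7 STRING, col8 STRING, col9 STRING, col10 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- nine tab-delimited fields, then everything that remains
  "input.regex" = "([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t(.*)"
);
```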

Dynamically export SQL table to CSV file everysec using SSIS

I am new to SQL and SSIS work. I'd like to dynamically export a SQL data table to a CSV file every second using SSIS. One column, RECIPE STATUS, has the value 1 or 2 (int): 1 is a new recipe, 2 is an old recipe that already exists. Another column, RECIPE NAME, has a value like AAABBB (varchar), and there are some more columns with values. The database table only ever holds one row of data, which changes every second, so we are trying to export it to a CSV file to log the different/unique RECIPE NAMEs and data for analysis.
Table Schema is
SELECT TOP 5 [RowID]
,[RowInsertTime]
,[TransId]
,[RecipeStatus]
,[RecipeName]
,[RecipeTagName]
,[Value]
,[ReadStatus]
,[sData1]
,[sData2]
,[sData3]
,[nData1]
,[nData2]
,[nData3]
FROM [MES].[dbo].[MESWrite_RecipeData]
While exporting, if the RECIPE STATUS value is 1, then insert/create a new row in the CSV and export it (a simple insert, like an INSERT T-SQL statement: insert into the csv file the values from the database table).
If the value is 2, the RECIPE NAME value AAABBB already exists in the CSV, so find it and update the other columns' values; in this case don't create a new row in the csv (i.e. update all other columns with values from the database table where RECIPENAME = 'AAABBB', like an UPDATE query in T-SQL).
Finally, if we have to send this SSIS package to a customer, can it be an executable file, and how can it be made secure? Please help me.
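A CSV file cannot easily be updated in place row by row, so one common pattern is to keep the insert-or-update logic inside SQL Server with MERGE against a log table, and export that table to CSV on a schedule. A sketch, where the log table dbo.RecipeLog is a made-up name and the column list is trimmed for brevity:

```sql
MERGE dbo.RecipeLog AS tgt
USING (
    SELECT RecipeStatus, RecipeName, RecipeTagName, [Value]
    FROM [MES].[dbo].[MESWrite_RecipeData]
) AS src
ON tgt.RecipeName = src.RecipeName
WHEN MATCHED THEN                       -- RecipeStatus = 2: existing recipe
    UPDATE SET tgt.RecipeTagName = src.RecipeTagName,
               tgt.[Value]       = src.[Value]
WHEN NOT MATCHED THEN                   -- RecipeStatus = 1: new recipe
    INSERT (RecipeName, RecipeTagName, [Value])
    VALUES (src.RecipeName, src.RecipeTagName, src.[Value]);
```

The SSIS data flow (or a bcp/sqlcmd step) then simply dumps dbo.RecipeLog to the CSV file each run, instead of trying to edit the CSV directly.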

add multi rows to coupons sql with csv(one field only)

I have a table with the structure:
I also have a csv containing all the coupon codes only.
I want to use the same values for the other fields (except id of course).
What would be the SQL required to insert these rows?
Using phpMyAdmin, you can insert data from CSV only if it contains all of the required (NOT NULL) column values; the others, when missing, will be filled with defaults, NULL, or auto-incremented values. But using this tool it is not possible to insert only the codes from the CSV and use the same values for the other fields. So here you have two options: either set the default values of the columns not being filled from the CSV to the desired ones, or create a PHP import script that will do it for you (if you do not want to change the DB table schema).
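If you have direct MySQL access rather than only phpMyAdmin, LOAD DATA offers a third option: its SET clause can fill the constant fields while the file supplies only the codes. A sketch, where the table and column names are assumptions since the actual schema is not shown:

```sql
LOAD DATA LOCAL INFILE '/path/to/codes.csv'
INTO TABLE coupons
(code)                       -- the CSV supplies only the coupon code
SET discount   = 10,         -- same value for every imported row
    expires_at = '2024-12-31',
    active     = 1;
```

The id column is simply omitted so it auto-increments as usual.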

sql dump of data based on selection criteria

When extracting data from a table (schema and data) I can do this by right clicking on the database and by going to tasks->Generate Scripts and it gives me all the data from the table including the create script, which is good.
This, though, gives me all the data from the table - can this be changed to give me only some of the data? E.g. only rows from the table after a certain dtmTimeStamp?
Thanks,
I would recommend extracting your data into a separate table using a query and then using Generate Scripts on that table. Alternatively, you can extract the data separately into a flat file using the Export Data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1, Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = @Condition
(Thanks to @MarkD for adding that)