Is it possible to prefilter data when copying it from CSV to a SQL table?

I have a large .csv file that I want to insert into a Postgres DB. I don't need all the rows, so is it possible to filter them with SQL before the data is loaded into the database? Or is the only option to delete the unwanted rows afterward?
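One way to do this, assuming PostgreSQL 12 or later, is the `WHERE` clause that `COPY ... FROM` accepts; on older versions a staging table works. The table, column, and file names below are illustrative, not from the question:

```sql
-- PostgreSQL 12+: filter rows while loading.
COPY measurements (city, reading, taken_at)
FROM '/tmp/data.csv' WITH (FORMAT csv, HEADER true)
WHERE reading > 0;

-- Older versions: load everything into a staging table, then filter.
CREATE TEMP TABLE staging (LIKE measurements);
COPY staging FROM '/tmp/data.csv' WITH (FORMAT csv, HEADER true);
INSERT INTO measurements SELECT * FROM staging WHERE reading > 0;
```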

Related

CSV deletes the data from the table

I have created a new table in SQLPro for Postgres, and I want to upload multiple CSVs into that table.
Each CSV has about 5K records. Whenever I upload another one, it deletes/overwrites the existing information in the table.
Can you help? :)
Basically:
1. Merge all the CSVs, headers included, and insert them into your table.
2. Delete all the rows that were created by the header lines.

Might be obvious, but remember this only works with CSVs whose columns are mapped the same way.
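Step 2 above is a simple `DELETE` once the merged file is loaded. A minimal sketch, with an assumed table and column name since the question doesn't give them:

```sql
-- Remove the rows produced by the repeated CSV header lines,
-- identified by a column that still contains its own header text.
DELETE FROM imported_data
WHERE customer_id = 'customer_id';
```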

How to do bulk upsert operation in snowflake?

I am syncing my MongoDB data to Snowflake on a daily basis using a Node.js script. If a row already exists in Snowflake, I want to replace it with the new data; if it doesn't exist, I want to insert a new row.
Also, I want to do this for a lot of data.
So is there any way to do a bulk upsert in Snowflake? If not, what would be the optimal way to achieve this?
The table may have millions of rows and could grow to billions in the future.
This is a typical use case for a MERGE statement. You can see the documentation for MERGE here: https://docs.snowflake.com/en/sql-reference/sql/merge.html
Using a MERGE statement on billions of rows can produce a high-churn table, so it isn't ideal. It can be better to only append to the table and determine the latest record with a SELECT statement.
You can bulk copy your data into a staging table and then use Snowflake's MERGE feature.
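The staged bulk upsert described above might look like this; the stage, table, and key names are assumptions for illustration:

```sql
-- Bulk load the new data into a staging table from an external/internal stage.
COPY INTO staging_events
FROM @my_stage/events/
FILE_FORMAT = (TYPE = 'CSV');

-- Upsert into the target in one set-based statement.
MERGE INTO events AS t
USING staging_events AS s
  ON t.event_id = s.event_id
WHEN MATCHED THEN
  UPDATE SET t.payload = s.payload, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (event_id, payload, updated_at)
  VALUES (s.event_id, s.payload, s.updated_at);
```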

How to ETL a table that contains Blob columns from one Oracle table to another using SSIS

We have a table that contains 50 rows of data. The table includes BLOB columns, and we are trying to see whether we can use SSIS to copy the data from table1 to table2, including the BLOB columns; we have tried other methods without success.
The BLOB columns contain Excel documents.
Is this possible? Is there an easier way to do it in SSIS?
If it is not possible, is there an easier way to do it in Oracle?
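Since both tables live in the same Oracle database, one possible shortcut that avoids SSIS entirely is a plain `INSERT ... SELECT`, which copies BLOB columns along with everything else. This assumes table2 already exists with the same column structure as table1:

```sql
-- Copy all 50 rows, BLOB (Excel document) columns included,
-- entirely inside Oracle without any ETL tool.
INSERT INTO table2 SELECT * FROM table1;
COMMIT;
```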

How to find distinct records in a CSV file using SQL*Loader

Is it possible to find distinct records?
SQL*Loader is a utility that inserts data from data files into database tables. You can't find distinct records in the file using it.
If the table is defined with primary keys, referential integrity, and check constraints, duplicate records from your CSV file won't be inserted into the table; SQL*Loader will reject them as bad records.
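If the goal is deduplication rather than rejection, an alternative to SQL*Loader is to expose the CSV as an Oracle external table and select distinct rows from it. The directory object, file name, and columns below are assumptions for illustration:

```sql
-- External table reading the raw CSV (uses the same ORACLE_LOADER
-- driver that SQL*Loader is built on).
CREATE TABLE csv_ext (
  id   NUMBER,
  name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('input.csv')
);

-- Keep only the distinct records.
INSERT INTO target_table SELECT DISTINCT id, name FROM csv_ext;
```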

Is it possible to overwrite with a SSIS Insert or similar?

I have a .csv file that gets pivoted into 6 million rows during an SSIS package. I have a table in SQL Server 2005 with 25 million+ rows. The .csv file contains data that duplicates data in the table. Is it possible to update rows that already exist, and what would be the most efficient way to achieve this?
Comparing 6M rows against 25M rows will not be very efficient with a lookup, or with a SQL command data flow component called once per row to do the upsert. In cases like this, it is often most efficient to bulk load the data into a staging table and run a single set-based SQL command to do the upsert.
Even if you do decide to use the lookup, split the flow into two streams: one that inserts directly and one that inserts into a staging table for an update operation.
If you don't mind losing the old data (i.e. the latest file is all that matters, not what's in the table), you could delete all the records in the table and insert them again.
You could also load into a temporary table and determine what needs to be updated and what needs to be inserted from there.
You can use the Lookup task to identify any matching rows in the CSV and the table, then pass the output of this to another table or data flow and use a SQL task to perform the required Update.
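The set-based staging approach suggested above can be sketched in T-SQL. Note that SQL Server 2005 does not have MERGE (it was added in 2008), so the upsert is an UPDATE followed by an INSERT; the table and column names are illustrative:

```sql
-- Update rows that already exist in the target.
UPDATE t
SET    t.value = s.value
FROM   dbo.target AS t
JOIN   dbo.staging AS s ON s.id = t.id;

-- Insert rows that are new.
INSERT INTO dbo.target (id, value)
SELECT s.id, s.value
FROM   dbo.staging AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.target AS t WHERE t.id = s.id);
```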