Migrate data using custom query for a table with EDB MTK - enterprisedb

I want to migrate data from an Oracle table to EDB Postgres using MTK, but I need a custom query for this specific table because I want to apply a data transformation along the way. Is it possible to run one?
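As far as I know, MTK does not expose a per-table custom query option, so one possible workaround is to materialize the transformed rows on the Oracle side first and then migrate that staging object (for example with MTK's -tables option). A minimal sketch, with purely illustrative table and column names:

    -- Hypothetical Oracle-side staging table that applies the transformation up front;
    -- EMPLOYEES, EMP_STAGE and the expressions below are placeholders.
    CREATE TABLE emp_stage AS
    SELECT employee_id,
           UPPER(last_name)     AS last_name,        -- example transformation
           ROUND(salary * 1.10) AS adjusted_salary   -- example transformation
    FROM   employees;

Once the staging table is populated, MTK can migrate it like any ordinary table, and the target table can be renamed on the Postgres side afterwards if needed.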

Related

How to do a bulk upsert operation in Snowflake?

I am syncing my MongoDB data to Snowflake on a daily basis using a Node.js script. If a row already exists in Snowflake, I want to replace that row with the new data; if the row doesn't exist in Snowflake, I want to insert a new row.
Also, I want to do this for a lot of data.
So is there any way to do a bulk upsert in Snowflake? If not, what would be the optimal way to achieve this?
The table may have millions of rows and possibly grow to billions in the future.
This is a typical use case for a merge statement. You can see the documentation for merge here: https://docs.snowflake.com/en/sql-reference/sql/merge.html
Using a MERGE statement for billions of rows can lead to high-churn tables, so it isn't ideal. It could be better if you can only append to the table and figure out the latest record with a SELECT statement.
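A minimal sketch of that append-only pattern; events, id and updated_at are placeholder names:

    -- Keep only the most recent version of each id from an append-only table.
    SELECT *
    FROM events
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1;

This can also live in a view, so downstream queries always see the latest record per key without any rewriting of the base table.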
You can bulk copy your data into a staging table and then use Snowflake's MERGE feature.
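A minimal sketch of the staging-table-plus-MERGE approach; target_table, staging_table, the id match key and the other column names are all placeholders:

    -- Upsert rows from a bulk-loaded staging table into the target table.
    -- Rows whose id already exists are updated; new ids are inserted.
    MERGE INTO target_table t
    USING staging_table s
        ON t.id = s.id
    WHEN MATCHED THEN
        UPDATE SET t.payload = s.payload,
                   t.updated_at = s.updated_at
    WHEN NOT MATCHED THEN
        INSERT (id, payload, updated_at)
        VALUES (s.id, s.payload, s.updated_at);

If the same id can appear more than once in a single staging batch, deduplicate the staging table first so the MERGE has exactly one source row per key.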

Can I use the BigQuery EXPORT DATA statement and schedule the query?

I have a question similar to the one asked in this link: BigQuery - Export query results to local file/Google storage.
I need to extract data from two BigQuery tables using joins and WHERE conditions. The extracted data has to be placed in a file on Cloud Storage, most likely a CSV file. I want to go with a simple solution. Can I use the BigQuery EXPORT DATA statement in standard SQL and schedule it? Does it have a limitation of 1 GB per export? If so, what is the best possible way to implement this? Creating another temp table to save the query results and using a Dataflow job to extract the data from the temp table? Please advise.
Basically, Google Cloud now supports the statement below.
Please see code snippet in cloud documentation
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#exporting_data_to_csv_format
I'm thinking I can use the above statement to export data into a file, where the SELECT query joins the two tables and applies the other conditions.
This query will be a scheduled query in big query.
Any input, please?
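A minimal sketch of what such an export could look like; the project, dataset, table and bucket names are placeholders:

    -- Export the result of a two-table join straight to Cloud Storage as CSV.
    EXPORT DATA OPTIONS (
      uri = 'gs://my-bucket/exports/result-*.csv',
      format = 'CSV',
      overwrite = true,
      header = true,
      field_delimiter = ','
    ) AS
    SELECT a.id, a.name, b.amount
    FROM `my_project.my_dataset.table_a` AS a
    JOIN `my_project.my_dataset.table_b` AS b
      ON a.id = b.a_id
    WHERE b.amount > 0;

As far as I recall, each exported file is capped at around 1 GB, but because the URI contains a wildcard, BigQuery shards larger results across multiple files, and the statement itself can be run as a scheduled query.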

Deleting records in a table with billions of records using Spark or Scala

We have a table in Azure Data Warehouse with 17 billion records. Now we have a scenario where we have to delete records from this table based on a WHERE condition. We are writing Spark in Scala in Azure Databricks notebooks.
We searched for different options to do this in Spark, but all of them suggested first reading the entire table, deleting the records from it, and then overwriting the entire table in the Data Warehouse. However, this approach will not work in our case due to the huge number of records in our table.
Can you please suggest how we can achieve this functionality using Spark/Scala?
1) We checked whether we can call a stored procedure through Spark/Scala code in Azure Databricks, but Spark does not support stored procedures.
2) We tried reading the entire table first to delete the records, but it goes into a never-ending loop.
It is possible to create a view with a SELECT clause that matches your requirement, and then use that view.
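If the requirement really is to physically remove the rows, it may also be worth pushing the DELETE down to the warehouse itself (for example over a JDBC connection opened from the Databricks notebook) instead of reading and rewriting 17 billion rows through Spark. A minimal sketch, with a hypothetical table name and predicate:

    -- Hypothetical table and predicate; this statement would be executed directly
    -- against Azure Data Warehouse, e.g. via a JDBC connection from the notebook,
    -- so Spark never has to materialize the full table.
    DELETE FROM dbo.fact_events
    WHERE  event_date < '2019-01-01';

For very large deletions it can also help to batch the statement (delete by date range or partition) so each transaction stays manageable.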

U-SQL job to query multiple tables with dynamic names

Our challenge is the following:
In an Azure SQL database, we have multiple tables named table_num, where num is just an integer. These tables are created dynamically, so the number of tables can vary (from table_1 and table_2 up to table_N). All tables have the same columns.
As part of a U-SQL script file, we would like to execute the same query on all of these tables and generate an output csv file with the combined results of all these queries.
We tried several things:
U-SQL does not allow looping, so we were thinking of creating a View in our Azure SQL database that would combine all the tables using a cursor of some sort. The U-SQL file would then query this View (as an external source). However, a View in an Azure SQL database can only be created via a function, and a function cannot execute dynamic SQL or even call a stored procedure...
We did not find a way to call a stored procedure of the external data source directly from U-SQL.
We don't want to update our U-SQL job each time a new table is added...
Is there a way to do that in U-SQL through a custom extractor for instance? Any other ideas?
One solution I can think of is to use Azure Data Factory (v2) to assist in this.
You could create a pipeline with the following activities:
1) A Lookup activity configured to execute the stored procedure
2) A For Each activity that uses the output of the Lookup activity as a source
2.1) As a child item, a U-SQL Activity that executes your U-SQL script, which writes the output of a single table (the item of the For Each activity) to blob or Data Lake storage
3) A Copy Activity that merges the blobs from step 2.1 into one final blob
If you have little or no experience working with ADF v2, keep in mind that it takes some time to get to know it, but once you do, you won't regret it. Having a GUI to create the pipeline is a nice bonus.
Edit: as @wBob mentions, another (far easier) solution is to somehow create a single table with all the rows, since all the dynamically generated tables have the same schema. You can create a stored procedure for populating this table, for example.
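A minimal sketch of such a stored procedure, assuming SQL Server / Azure SQL; dbo.combined and the table_ naming pattern are placeholders, and the combined table must already exist with the shared schema:

    -- Rebuild a single combined table from every dynamically created table_N,
    -- using dynamic SQL so new tables are picked up automatically.
    CREATE PROCEDURE dbo.populate_combined
    AS
    BEGIN
        DECLARE @sql nvarchar(max) = N'';

        -- Build one INSERT ... SELECT per table matching the naming pattern.
        SELECT @sql = @sql
            + N'INSERT INTO dbo.combined SELECT * FROM ' + QUOTENAME(name) + N';'
        FROM sys.tables
        WHERE name LIKE N'table[_]%';

        TRUNCATE TABLE dbo.combined;
        EXEC sp_executesql @sql;
    END;

The U-SQL job (or the ADF pipeline above) then only ever has to read dbo.combined, regardless of how many table_N tables exist.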

SSRS Complex Query Report

I am developing a report in SSRS.
My report has around 50 row headers. The data for each row header is the result of a complex query against the database.
Two row headers may or may not have data that relates to one another.
In this case what would be the best way to create the report?
-- Do we create a procedure that gets all the data into a temporary table and then generate the report using this temp table?
-- Do we create multiple datasets for this report?
Please advise on what would be the best way to proceed.
I read somewhere about using a link whereby data is retrieved from the Postgres database (the project uses a PostgreSQL DB) into the local SQL Server that SSRS provides.
The report then retrieves data from the local SQL Server to generate the report.
Thoughts?
You are best off using a stored procedure as the source. It is easy to optimize the stored procedure for the best performance so that the report runs fast.
Assembling all your data so that you can use a single dataset to represent it would be the way to go.
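A minimal sketch of that idea, assuming the PostgreSQL backend mentioned in the question; the function name, the row-header labels and the orders/payments tables are purely illustrative:

    -- One function returns a single combined result set, one row per report row header,
    -- so the SSRS report needs only one dataset.
    CREATE OR REPLACE FUNCTION rpt_row_headers()
    RETURNS TABLE (row_header text, metric_value numeric) AS $$
        SELECT 'Header A'::text, COUNT(*)::numeric FROM orders
        UNION ALL
        SELECT 'Header B'::text, SUM(amount)::numeric FROM payments
        -- ... one UNION ALL branch per row header
    $$ LANGUAGE sql;

In the report, SELECT * FROM rpt_row_headers() becomes the single dataset, and the 50 row headers are just rows of that dataset rather than 50 separate queries.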