Delete rows before doing any transformation in datastage designer? - designer

I want to delete some rows before doing any processing. But I don't know how to do it in datastage designer? Is there any stage that holds sql requests??

Not knowing the job you will do to do the "processing" you can do one job that has 2 stages: a row_generator and a database link (depending on you db).
Configure the row generator to generate 1 row only. On the db stage you can do whatever delete ou want. This will be your job that deletes the rows you want.
After that, do a sequence job and add the job you've done to delete rows and link it to whatever jobs you want to run next - don't forget to configure the triggers on the these links.
Hope this helps.

Yes it can be done via writing a SQL in the BEFORE SQL part of the source database connector stage.
So whenever your job runs first that delete query would be executed and other things later.
Thankyou

Related

SQL Server: copy newly added rows from one table and insert into another automatically

I need to perform some calculations using few columns from a table. This database table that gets updated every couple of hours generates duplicates on couple of columns every other day. There is no way tell which one is inserted first which affects my calculations.
Is there a way to copy these rows into a new table automatically as data gets added every couple of hours and perform calculations on the fly? This way whatever comes first will be captured into a new table for a dashboard and for other business use cases.
I thought of creating a stored procedure and using a job scheduler to perform this. But I do not have admin access and can not schedule jobs. Is there another way of doing this efficiently? Much appreciated!
Edit: My request for admin access is being approved.
Another way as to stated in the answers, what you can do is:
Make a temp table.
Make a prod table.
Use stored procedure to copy everything from the temp table into prod table after any load have been done.
Use the same stored procedure to clean the temp table after the load is done.
Don't know if this will work, but this is in general how we are dealing with huge amount of load on a daily basis.

MuleSoft ESB Batch Transactions

Thanks is advance for any help. Here is the scenario that I am trying to recreate in Mulesoft.
1,500,000 records in a table. Here is the current process that we use.
Start a transaction.
delete all records from the table.
reload the table from a flat file.
commit the transaction.
in the end we need the file in a good state, thus the use of the transaction. If there is any failure, the data in the table will be rolled back to the initial valid state.
I was able to get the speed that we needed by using the Batch element < 10 minutes, but it appears that transactions are not supported around the whole batch flow.
Any ideas how I could get this to work in Mulesoft?
Thanks again.
A little different workflow but how about:
Load temp table from flat file
If successful drop original table
Rename temp table to original table name
You can keep your Mule batch processing workflow to load the temp table and forget about rolling back.
For this you might try the following:
Use XA transactions (since more than one connector will be used,
regardless of the use of the same transport or not)
Enlist in the transaction the resource used in the custom Java code.
This also can be applied within the same transport (e.g. JDBC on the Mule configuration and also on the Java component), so it's not restricted to the case demonstrated in the PoC, which is only given as a reference.
Please refer to this article https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
From temp table poll records.You can contruct array with any number of records. With 100K size will only involve 15 round trips in total.
To determine error records you can insert records in an error table but that has to be implemented in database procedure.

Scheduled job to copy data

I need help with setting up a scheduled job.
I have two SQL Server databases on two different servers. The job would do SELECT on database A and INSERT on database B. When something changes in database A, the job would compare what had changed and did an update on database B.
Is this possible if I have SQL Server 2008 R2 Management Studio?
Thank you very much in advance.
I would suggest to make a Replication if possible. Read more about it here.
Otherwise if you really need a own job you have two ways.
Execute the Job with your SQL Agent every X minutes/hours. Check your new data and execute the INSERT-statement.
You can create a trigger on the source table which sets a flag in a table or on the sourcetable itself after an insert is executed. Your job on the target server executes every x minutes or even seconds and check the source table. After that he can evaluate if a changed happen and just copy the flagged rows to your target.
You could set up one-way replication between the two servers and let it take care of everything for you.
Or, you could add server B as a linked server then take responsibility for the record inspection and crafting the insert/update/delete statements yourself.
Have you tried either?

MSSQL Automatic Merge Database

I have a PC that has a MSSQL database with 800 variables being populated every second. I need that database to merge/backup to a second database on another server PC at least every 10 minutes. Additionally, the first database needs to be wiped clean once per week, in order to save local drive space, so that only 1 week's worth of data is stored on that first database at any given time; meanwhile, the second database keeps everything intact and never gets cleared, only being added upon by the merges that occur every 10 minutes.
To my knowledge, this means I cannot rely on database mirroring, since the first one will be wiped every week. So from what I have gathered, this means I have to have scheduled merges going on every 10 minutes.
I will readily admit I know next to nothing about SQL. So my two questions are:
How do I set up scheduled merges to occur from one database to another in 10 minute frequencies?
How do I set a database to be scheduled/scripted so that it gets cleared every week?
(Note: both databases are running on MS SQL Server 2012 Standard.)
Assuming you can create a linked server on server A that connects to server B (Here's a guide)
Then create a trigger on your table, for example table1:
CREATE TRIGGER trigger1
ON table1
AFTER INSERT
AS
INSERT INTO ServerB.databaseB.dbo.table1
select *
from inserted
More on triggers here.
For part 2, you can schedule a job to truncate the table on whatever schedule you would like. How to create a scheduled job.
The trigger only fires on Inserts so deleting rows does nothing to the table on server B.
How is the purging/deleting of the data happening, via a stored proc? If so, you could also try transactional replication, and replicate the execution of that particular stored proc, but dummy the proc on the subscriber, so when the proc gets replicated and executed on the subscriber, nothing will get deleted/purged.

BigQuery Schema error despite updating schema

I'm trying to run multiple simultaneous jobs in order to load around 700K record to a single BigQuery table. My code (Java) creates the schema from the records of is job, and updates the BigQuery schema, if needed.
Workflow is as follows:
A single job creates the table and sets the (initial) schema.
For each load job we create the schema from the records of the job. Then we pull the existing table schema from BigQuery, and if it's not a superset of the schema associated with the job, we update the schema with the new merged schema. The last part (starting from pulling the existing schema) is synced (using a lock) - only one job performs it at a time. The update of the schema is using the UPDATE method, and the lock is released only after the client update method returns.
I was expecting to avoid encountering schema update errors using this workflow. I'm assuming that once the client returns from the update job, then the table is updated, and that jobs that are in process can't be hurt from the schema update.
Nevertheless, I still get schema update errors from time to time. Is the update method atomic? How do I know when a schema was actually updated?
Updates in BigQuery are atomic, but they are applied at the end of the job. When a job completes, it makes sure that the schemas are equivalent. If there was a schema update while the job was running, this check will fail.
We should probably make sure that the schemas are compatible instead of equivalent. If you do an append with a compatible schema (i.e. you have a subset of the table schema) that should succeed, but currently BigQuery doesn't allow this. I'll file a bug.