Merge rows from multiple tables SQL (incremental) - sql

I'm consolidating the information of 7 SQL databases into one.
I've made a SSIS package and used Lookup transformation and I managed to get the result as expected.
The problem: 30 million rows. And I want to perform a daily task to add to the destination table the new rows in the source tables.
So it takes like 4 hours to execute the package...
Any suggestion ?
Thanks!
I have only tried full cache mode...

Related

SSIS Incremental Load-15 mins

I have 2 tables. The source table being from a linked server and destination table being from the other server.
I want my data load to happen in the following manner:
Everyday at night I have scheduled a job to do a full dump i.e. truncate the table and load all the data from the source to the destination.
Every 15 minutes to do incremental load as data gets ingested into the source on second basis. I need to replicate the same on the destination too.
For incremental load as of now I have created scripts which are stored in a stored procedure but for future purposes we would like to implement SSIS for this case.
The scripts run in the below manner:
I have an Inserted_Date column, on the basis of this column I take the max of that column and delete all the rows that are greater than or equal to the Max(Inserted_Date) and insert all the similar values from the source to the destination. This job runs evert 15 minutes.
How to implement similar scenario in SSIS?
I have worked on SSIS using the lookup and conditional split using ID columns, but these tables I am working with have a lot of rows so lookup takes up a lot of the time and this is not the right solution to be implemented for my scenario.
Is there any way I can get Max(Inserted_Date) logic into SSIS solution too. My end goal is to remove the approach using scripts and replicate the same approach using SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL and so on.

SSIS Alternatives to one-by-one update from RecordSet

I'm looking for a way to speed up the following process: I have a SSIS package that loads data from Excel files on a weekly basis to SQL Server. There are 3 fields: Brand, Date, Value.
In the dataflow, I check for existing combinations of Brand+Date, and new combinations go to the table directly, the existing ones go to a RecordSet destination for updates:
The next step is to update the Value of the existing combinations:
As you can see, there are thousands of records to update, and it takes too long. The number of records tend to grow week by week. Please suggest.
The fastest way will be do this inside a Stored procedure using ELT (Extract Load Transform) approach.
Push all data from excel as is into a table(called load to a staging table in theory). Since you do not seem to be concerned with data validation steps, this table can be a replica of final destination table columns.
Next step is to call a stored procedure using Execute SQL task. Inside this procedure you can put all your business logic. Since this steps with native data manipulation on SQL server entities, it is the fastest alternative.
As a last part, please delete all entries from the staging table.
You can use indexes on staging table to make the SP part even faster.

Is there a MAX size for a table in MSSQL

I have 150Million records with 300 columns (nchar),I am running a script to import data to the database database but it is always stopping when it gets to 10Million..
Is there a MSSQL setting that controls how many records can be on a table? What can it be making it stop at 10Million?
Edit:
I have run the script multiple times and it has been able to create multiple tables, but they all max at the same 10million records
depends on available storage. The more available storage you have, the more rows can you have in a table

Using SSIS to create new Database from two separate databases

I am new to SSIS.I got the task have according to the scenario as explained.
Scenario:
I have two databases A and B on different machines and have around 25 tables and 20 columns with relationships and dependencies. My task is to create a database C with selected no of tables and in each table I don't require all the columns but selected some. Conditions to be met are that the relationships should be intact and created automatically in new database.
What I have done:
I have created a package using the transfer SQL Server object task to transfer the tables and relationships.
then I have manually edited the columns that are not required
and then I transferred the data using the data source and destination
My question is: can I achieve all these things in one package? Also after I have transferred the data how can I schedule the package to just transfer the recently inserted rows in the database to the new database?
Please help me
thanks in advance
You can schedule the package by using a SQL Server Agent Job - one of the options for a job step is run SSIS package.
With regard to transferring new rows, I would either:
Track your current "position" in another table, assumes you have either an ascending key or a time stamp column - load the current position into an SSIS variable, use this variable in the WHERE statement of your data source queries.
Transfer all data across into "dump" copies of each table (no relationships/keys etc required just the same schema) & use a T-SQL MERGE statement to load new rows in, then truncate "dump" tables.
Hope this makes sense - its a bit difficult to get across in writing.

Is it possible to overwrite with a SSIS Insert or similar?

I have a .csv file that gets pivoted into 6 million rows during a SSIS package. I have a table in SQLServer 2005 of 25 million + rows. The .csv file has data that duplicates data in the table, is it possible for rows to get updated if it already exists or what would be the best method to achieve this efficiently?
Comparing 6m rows against 25m rows is not going to be too efficient with a lookup or a SQL command data flow component being called for each row to do an upsert. In these cases, sometimes it is most efficient to load them quickly into a staging table and use a single set-based SQL command to do the upsert.
Even if you do decide to do the lookup - split the flow into two streams, one which inserts and the other which inserts into a staging table for an update operation.
If you don't mind losing the old data (ie. the latest file is all that matters, not what's in the table) you could erase all the records in the table and insert them again.
You could also load into a temporary table and determine what needs to be updated and what needs to be inserted from there.
You can use the Lookup task to identify any matching rows in the CSV and the table, then pass the output of this to another table or data flow and use a SQL task to perform the required Update.