How does the SSIS Data Flow really work?

I have an SSIS package which transfers the data from one database to another.
The SSIS package runs on an application server.
I am thinking of moving one of the two databases to another data server. Will there be an impact on performance? How does the data flow in SSIS, i.e. does all the data pass through the application server where SSIS runs and then go on to the destination database?

SSIS is a client-side process, so if it is running on a server other than the machine running the DBMS the traffic will be going over the network. Your question is not very clearly worded, but I think you want to know whether moving a DB will affect performance given that the SSIS package is already running on a separate machine.
If the SSIS job is already running on an application server that is a physically separate machine from the DB server, then moving one of the databases will probably not affect performance, unless the new server has a radically slower network connection than the other one.

I recently came across the same situation and we upgraded our source system to a better-configured box. I didn't have to do anything on my part, but the data load times from the source to the SQL box were cut down from approximately 40 minutes to under 12 minutes on average. To answer your question, you will only see a performance variance depending on 1) your new system's resources and 2) whether you make changes to the box hosting your SQL Server.

Related

One-Time Synchronization with SQL Server database

We are currently migrating our whole IT systems (application server, web server, DBMS, etc.) to a new datacenter. The systems used before were spread out across several datacenters in Europe, and one of the goals for this migration was to have everything in one place, in one private cloud. After an initial migration of the production systems on a particular day, we are now reconfiguring the whole setup and will then test it thoroughly. Afterwards we will have to bring the newly migrated, configured and tested systems up to the latest state of the data and then switch over to the new system.
The systems so far have been using several large SQL Server 2008 databases; transferring them to the new datacenter has been quite painful and had to be done via hard drive, since the outgoing network connection from the old location is only 10 Mbit. I don't want to repeat that brute-force approach and transport the whole databases on a hard drive again; I'd rather use database tools to get the job done.
I've heard of several technologies applicable for this, ranging from SQL Server Replication to the Sync Framework, but I would like to know which of them would be best for a one-time synchronization of such large databases, especially given the rather slow outgoing network connection and a forced disconnect every 24 hours at the old location.
I'd still go for the Sync Framework, but the more I read about it, the more it seems targeted at scenarios where you want to implement periodic updates between several databases, so I'm unsure whether I've overlooked a simpler, easier solution for this one-time synchronization.

MSSQL Automated Jobs

I have been recently reading about configuring jobs within SQL Server and that they can be configured to do specific tasks.
I recently had issues whereby all the DB indexes were > 75% fragmented, and I wondered if there was a way to have SQL Server automatically manage itself.
Now when reading about setting up and configuring jobs it mentions the SQL Server Agent.
In the DB Server I was looking at the SQL Server Agent was switched off.
This made me think that having a "job" to handle the rebuilding/reorganising of indexes may not be great if this agent can simply be disabled...
Is there anything at a DB level which can be configured to do this, or is this still really in the hands of a "DBA"?
So to summarise, my question is, what is the best way to handle rebuilding/reorganising indexes?
A job calling some stored procedures could be your answer.
Automation of this task depends on your DB: volume of data, degree of fragmentation, batch updates, etc.
I recommend checking your index fragmentation regularly before applying an automated solution.
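For illustration, a minimal T-SQL sketch of such a check using the sys.dm_db_index_physical_stats DMV (the 5%/30% thresholds below are the commonly cited rules of thumb, not hard requirements):

    -- Fragmentation of every index in the current database,
    -- with a suggested action based on the usual thresholds
    SELECT
        OBJECT_NAME(ips.object_id)        AS table_name,
        i.name                            AS index_name,
        ips.avg_fragmentation_in_percent,
        CASE
            WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
            WHEN ips.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
            ELSE 'OK'
        END AS suggested_action
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.index_id > 0    -- skip heaps
    ORDER BY ips.avg_fragmentation_in_percent DESC;

    -- A job step would then issue, per affected index, something like:
    -- ALTER INDEX [IX_SomeIndex] ON [dbo].[SomeTable] REBUILD;
    -- ALTER INDEX [IX_SomeIndex] ON [dbo].[SomeTable] REORGANIZE;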
Also, you can programmatically check if SQL Server Agent is running.
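A minimal sketch of that check (sys.dm_server_services is available in newer versions of SQL Server; on older ones you would have to query the service state by other means):

    -- Shows the state of the SQL Server and SQL Server Agent services
    SELECT servicename, status_desc, startup_type_desc
    FROM sys.dm_server_services
    WHERE servicename LIKE 'SQL Server Agent%';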

SQL Server full copy of database for read operations

Please advise what suits my problem better. I have a high-load web app hosted on the same server where SQL Server is hosted. I also have SQL Server Reporting Services running on the same server, generating user reports.
So my server basically lives or dies by disk read/write speed. I'm going to get another server and install another SQL Server instance there in order to host SSRS on it. My criterion is to get data that is as fresh as possible.
I've looked at a couple of solutions. Currently I make a backup via jobs, copy it to the second server and restore it there, also via jobs. But that's not the best solution.
All the replication mechanisms (transactional, merge, snapshot) affect the publisher database by locking its tables, which is unacceptable for me.
So I wonder: is there any possibility to create a read-only replica that would be synced periodically without affecting the main DB? I would put all the reporting load on that replica and have my primary DB used only by the web app.
What solution might suit my problem? I'm not a DBA, so I just need a direction to start investigating. Thanks.
Transactional Replication is typically used to off-load reporting to another server/instance and can be near real-time in a best case scenario. The benefit of Transactional Replication is you can place different indexes on the subscriber(s) to optimize reporting. You can also choose to replicate only a portion of the data if only a subset is needed for reporting.
The only time locking occurs with Transactional Replication is when you generate a snapshot. With concurrent snapshot processing, which is the default for Transactional Replication, the shared locks are only held for a short period of time, so users are able to continue working uninterrupted. Either way, this shouldn't be an issue since you'll likely be generating the snapshot during a period of low user activity anyway.
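For reference, a publication like that is created with the replication stored procedures. A minimal sketch, with made-up database/publication/table/server names, and assuming a Distributor and the snapshot/log reader agents are already configured:

    -- On the publisher: enable the database for transactional publishing
    EXEC sp_replicationdboption
        @dbname  = N'WebAppDB',
        @optname = N'publish',
        @value   = N'true';

    -- Create the publication; 'concurrent' is the concurrent snapshot mode
    EXEC sp_addpublication
        @publication = N'ReportingPub',
        @sync_method = N'concurrent',
        @repl_freq   = N'continuous',
        @status      = N'active';

    -- Publish only the tables the reports actually need
    EXEC sp_addarticle
        @publication   = N'ReportingPub',
        @article       = N'Orders',
        @source_owner  = N'dbo',
        @source_object = N'Orders';

    -- Push subscription to the reporting server
    EXEC sp_addsubscription
        @publication       = N'ReportingPub',
        @subscriber        = N'REPORTSRV',
        @destination_db    = N'ReportingDB',
        @subscription_type = N'Push';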

Where is the bottleneck / what are the gotchas when selecting records from a remote (linked) SQL server?

I'm in a satellite office that needs to pull some data from our main office for display on our intranet. We use MS SQL Server in both locations and we're planning to create a linked server in our satellite office pointing to the main office. The connection between the two is a VPN tunnel I believe (does that sound right? What do I know, I'm a programmer!)
I'm concerned about generating a lot of traffic across a potentially slow connection. We will be getting access to a SQL view on the main office's server. It's not a lot of data (~500 records) once the select query has run, but the view is huge (~30000 records) without a query.
I assume running a query on a linked server will bring back only the results over the wire (and not the entire view to be queried locally). In that case the major bottleneck is most likely the connection itself assuming the view is indexed, etc. Are there any other gotchas or potential bottlenecks (maybe based on the way I structure queries) that I should be aware of?
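For example, the two ways a query like that might be written (server/view names below are made up): with a four-part name the optimizer decides how much of the work is remoted, while OPENQUERY sends the query text to the linked server so only the matching rows come back over the VPN.

    -- Four-part name: filtering may or may not be pushed to the remote server
    SELECT *
    FROM MAINOFFICE.SalesDB.dbo.vOrders
    WHERE Region = 'Satellite';

    -- OPENQUERY: the statement runs on the linked server,
    -- so only the ~500 matching rows cross the wire
    SELECT *
    FROM OPENQUERY(MAINOFFICE,
        'SELECT * FROM SalesDB.dbo.vOrders WHERE Region = ''Satellite''');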
From what you explained, your connection is likely to be the bottleneck.
You might also consider caching data at the satellite location.
The decision will depend on the following:
- how many rows and how often data are updated in the main database
- how often you need to load the same data set at satellite location
Two edge examples:
Data is static or relatively static - inserts only in the main DB. In the satellite location users often query the same data again and again. In this case it would make sense to cache the data locally at the satellite location.
Data is volatile - a lot of updates and/or deletes. Users in the satellite location rarely query data, and when they do, it is always with a different WHERE condition. In this case it doesn't make sense to cache; if the connection is slow and there are frequent changes, you might end up never being in sync with the main DB.
Another advantage of caching is that you can implement data compression, which will alleviate the bad effect of the slow connection.
If you choose to cache at the local location there are a lot of options, but that, I believe, would be another topic.
[Edit]
About compression: you can use compressed transaction log shipping. In SQL Server 2008, backup compression is supported in the Enterprise edition only; in SQL Server 2008 R2 it is available starting with the Standard edition. http://msdn.microsoft.com/en-us/library/bb964719.aspx
Alternatively, you can implement custom compression before you ship the transaction logs, using any compression library you like.
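A minimal sketch of the built-in option, assuming an edition that supports backup compression (database name and path are placeholders):

    -- Take a compressed log backup on the main server, ship the file,
    -- then restore it on the satellite server WITH NORECOVERY or STANDBY
    BACKUP LOG MainOfficeDB
    TO DISK = N'D:\LogShip\MainOfficeDB_log.trn'
    WITH COMPRESSION, INIT;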

should i advocate migrating from access to (my)sql

We have a windows MFC app that is written against an access database on a company server. The db is not that big: 19 MB. There are at most 2-3 users accessing it at any one time. It is used in a factory environment where access speed (or lack thereof) over the intranet becomes noticeable as it is part of the manufacturing time for our widgets.
The scenario is this: as each widget is completed, it gets a record in the db. By the end of the year the db is larger and searching for a record takes longer and longer. The solution so far has been to manually move older records to an archival table about once a year.
We are reworking other portions of this app right now, and it would be a good time to move to another db if we are going to do it.
It is my understanding that if we were using sql, the search time would not go up as the table gets bigger because the entire .mdb does not have to be sent over the network each time. Is this correct? Does anyone have any insight about whether it could be worth it to go to the trouble (time and money) of migrating to a new db, or should I just add more functionality to the application we have now, and maybe automatically purge the older records from time to time, and add additional facilities to the app to get at the older records when needed?
Thanks for any wisdom you can share..
Since your database is small and has very few users, I could not make a solid case for migration. I would definitely set up a script to archive old records on a more frequent basis (don't archive into the same db; that would somewhat defeat the purpose).
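A rough sketch of such an archive step in Access SQL (table, field and date values are made up; the cutoff would normally be a parameter, and the whole thing can be run from VBA or as two saved queries):

    -- Move completed widgets older than the cutoff into the archive table
    INSERT INTO WidgetArchive
    SELECT *
    FROM Widgets
    WHERE CompletedDate < #1/1/2010#;

    -- Then remove them from the live table
    DELETE FROM Widgets
    WHERE CompletedDate < #1/1/2010#;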
But make sure two things are in order as well.
INDEXES. If your queries start slowing down, make sure you have proper indexes.
http://support.microsoft.com/kb/304272
Your network connection between computers is fast. Maybe upgrade to gigabit cards and router? Possibly put the db on a SCSI drive (RAID 10 for speed and redundancy).
Throwing advanced technology at simple problems is an expensive way to go and not always the answer!
First of all, the information that the whole table or the whole database is transferred across the network is simply incorrect. If the tables are properly indexed, then the search times should not go up that much over time.
As others have mentioned, spending the time + money to set up, and then have someone maintain, manage and support, a database server is certainly a possibility here. However, keep in mind that simply migrating a JET-based application to SQL Server will in many cases run slower; in fact SQL Server is slower than JET when no network is involved.
So, I would take some time to ascertain why things slow down so much, and also check into how indexing is setup.
So, just keep in mind that it is pure folklore and myth that whole tables or the whole database are transferred over the network. This concept persists ONLY because most people have no real training in, and don't understand, how the JET data engine works.
I would probably move to either Microsoft SQL Server 2008 R2 Express Edition (free) or MySQL (free) if there is both funding and time to put in a data access layer. Because you will be making requests of a remote server and not operating on data at the local workstation this move is very involved from the development standpoint.
However, you should analyze whether or not it's more cost-effective to perform your archival process quarterly or monthly, and just move the archive database to SQL Server 2008 R2 Express Edition. (You can install the Microsoft SQL Server Management Studio client tools on workstations and query the archival database for faster reports on historical data without rewriting your entire production application; similar solutions exist for using MySQL or other OSS/free RDBMSs.)
I have clients with 300 MB databases, although they should be upsizing to SQL Server for other reasons. 19 MB is relatively small. If performance is bad enough that archiving speeds things up, then check the indexes on the tables for all your sorting and selection fields. Albert gave you a good URL there to check.
Entire MDB files do not go down the wire. Unless you are missing indexes.
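For illustration, the kind of index that matters here, with made-up table/field names (this CREATE INDEX syntax works both in Access/JET SQL and in T-SQL):

    -- Index the fields used for searching and sorting, so JET can read
    -- index pages over the network instead of scanning the whole table
    CREATE INDEX idx_Widgets_CompletedDate ON Widgets (CompletedDate);
    CREATE INDEX idx_Widgets_SerialNumber ON Widgets (SerialNumber);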
Instead of shipping the DB over the network to the client and then performing queries, you could instead write a small wrapper on the server that handles requests, looks up the result in the Access DB (using SQL + the Access ODBC driver), and returns the result. This avoids the overhead of a large migration you might not need and still gets rid of the basic problem the users are experiencing.
Moving to a "proper" database solution is the best long term solution, but if your needs scale linearly and slowly over the next 30 years, it's hard to justify an expensive migration. That said, if you expect to really ramp up, or want to be more "future-proof", migrating now will likely save money/time.
"It is my understanding that if we were using sql, the search time would not go up as the table gets bigger because the entire .mdb does not have to be sent over the network each time. Is this correct?"
This general idea is true for almost all databases. The idea of a database is to separate your application from the actual data. The data resides in a database server. Your application doesn't.
"Does anyone have any insight about whether it could be worth it to go to the trouble (time and money) of migrating to a new db"
Yes. I have proposed this many times. It's expensive. It's complicated. And your MS Access database will never get better or faster.
Other database servers will (and can) get faster and more sophisticated. After all, you're not sending .MDB files across the network anymore. The limitations are reduced. You're working with standard SQL through ODBC, and any database will work at the other end of ODBC. You can fire vendors and find better, faster, cheaper products. Once you stop using Access you have choices.
Either stop using Access now or plan to suffer with it forever. And remake this decision every year until the end of time.