I read it somewhere:
"A separate database should be created
if SSIS logging is required. (Do not
use the Sysdtslog90 table in either
master or msdb. This is not a security
related concern but could be a
performance issue since SSIS can
generate a lot of logging data.
Microsoft recommends creating a
separate database for logging."
Why? Just to keep things separate and be more organized, or is there a deeper meaning to this?
Regards
Manjot
"a performance issue since SSIS can generate a lot of logging data"
To be able to finely control the I/O path of the logging data: which LUN it goes to, how many spindles it gets, etc.
If it's generating a lot of log information, it may compete with the other database functions if they are sharing the same physical disks. This means both your logging and your database could take a performance hit.
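As a rough illustration (the database name, file paths, and sizes below are placeholders), the logging database can be created on its own drive/LUN and the package's SQL Server log provider pointed at it; SSIS then creates the sysssislog table (sysdtslog90 on SQL 2005) there on first run:

```sql
-- Dedicated SSIS logging database on its own drives (names/paths/sizes are illustrative)
CREATE DATABASE SSISLogging
ON PRIMARY
( NAME = SSISLogging_Data,
  FILENAME = 'E:\SSISLogs\SSISLogging.mdf',
  SIZE = 512MB,
  FILEGROWTH = 256MB )
LOG ON
( NAME = SSISLogging_Log,
  FILENAME = 'F:\SSISLogs\SSISLogging.ldf',
  SIZE = 256MB,
  FILEGROWTH = 128MB );
GO

-- Once packages log to this database, the log table can be queried directly:
SELECT TOP (100) event, source, starttime, message
FROM SSISLogging.dbo.sysssislog
ORDER BY starttime DESC;
```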
Here is the use case: we need to back up some of the tables from a client server, copy them to our servers, restore them, and then run some queries using ODBC.
I managed to do this process for the entire database by using probkup for the backup, prorest for the restore, and proserve to make it accessible to SQL queries.
However, some of the databases are big (> 8 GB), so we are looking for a way to back up only the tables we need. I couldn't find anything in the probkup documentation about how this can be done.
Progress only supports full database backups.
To get the effect that you are looking for you could dump (export) the tables that you want and then load them into an empty database.
"proutil dump" and "proutil load" are where you want to start digging.
The details will vary depending on exactly what you want to do and what resources and capabilities you have available to you.
Another option would be to replicate the tables in question to a partial database. Progress has a product called "pro2" that can help with that. It is usually pointed at SQL targets but you could also point it at a Progress database.
Or, if you have programming skills, you could put together a solution using replication triggers (under the covers that's what pro2 does...)
probkup and prorest are block-level programs and can't do a backup or restore by table.
To do what you're asking for, you'll need to dump the data from the source DB's tables and then load it into the target DB.
If your objective is simply to maintain a copy of the DB, you might also try incremental backups. Depending upon your situation, that might speed things up a bit.
Other options include various forms of DB replication, which allow you to keep real- or near-real-time copies of your database.
OpenEdge Replication. With the correct license, you can do query-only access on the replication target, which is good for reporting and analysis.
Third-party replication products. These can be more flexible in terms of both target DBs and limiting the tables to be replicated.
Home-grown replication (by copying and applying AI files). This is not terribly complicated, but you have to factor the cost of doing the work and maintaining the system. There are some scripts out there that can get you started.
Or, as Tom said, you can get clever with replication via triggers.
When I want to move data between two databases, which is the better choice?
A) Linked Servers
local database -> Linked Server -> Azure database
B) ETL - SSIS
local database: procedure creates XML -> Integration Services -> serialize XML to a C# object -> call a WCF service asynchronously (queue / Service Bus) -> persist to the Azure database
The following link addresses the pros and cons of Linked Servers vs. SSIS, with a recommendation that Linked Servers are best applied in moderation for queries.
https://dba.stackexchange.com/questions/5712/whats-the-difference-between-linked-server-solution-and-ssis-solution
It really boils down to how much data you are looking at moving from one database to another, and for what purpose. That is, are you dealing with real-time data that must be acquired for an interface? It must be considered on a case-by-case basis. In my development environment, real-time data is not required when pulling information from other sources into the database. In this case, SSIS works best, and it provides a useful log of package executions throughout the day.
Additional observations:
- SSIS is typically faster because it uses bulk inserts, and it has security benefits.
- Linked Servers can create disaster-recovery issues and can pose a problem when moving code between environments where one or more servers may not be available.
Lastly, I recommend that you speak with your DBA about applying Linked Servers. The DBAs I've worked with in the past have mostly been apprehensive about the responsibility of maintaining them. This is one of those "could" vs. "should" issues in development, where you must focus on the impact to the system as a whole.
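For reference, here is a minimal sketch of the Linked Server route (the server, database, and table names are made up; the credential mapping step is only noted in a comment):

```sql
-- Register the remote server (provider and options vary by target; this is illustrative)
EXEC sp_addlinkedserver
     @server = N'AZUREDB',
     @srvproduct = N'',
     @provider = N'SQLNCLI',
     @datasrc = N'myserver.database.windows.net';
-- (sp_addlinkedsrvlogin would normally follow to map credentials)

-- Four-part name query: filtering may happen locally after pulling remote rows
SELECT CustomerID, Name
FROM AZUREDB.SalesDb.dbo.Customer
WHERE ModifiedDate >= DATEADD(DAY, -1, GETDATE());

-- OPENQUERY ships the statement to the remote server, which is often cheaper
SELECT CustomerID, Name
FROM OPENQUERY(AZUREDB,
    'SELECT CustomerID, Name FROM dbo.Customer
     WHERE ModifiedDate >= DATEADD(DAY, -1, GETDATE())');
```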
When we use Linked Servers, there are also options to use BULK INSERT. In this case, SSIS won't be faster (in many cases it's even slower).
SSIS has some limitations in certain implementations:
- cross-domain issues when the domains are not trusted (SSIS does not work with SQL authentication when the packages are called)
- not easy to automate when the schema changes
- if transformations are required, T-SQL is generally faster
- SSIS with integrated CDC data sources works incorrectly and slowly in certain scenarios; the issues are confirmed by Microsoft but not yet fixed (SQL 2014/2016)
As mentioned above, it must be considered on a case-by-case basis. There is no blanket "yes" or "no" here.
I have two very busy tables in an email dispatch system. One is for batching mail for dispatch; the other is used for logging. Expensive queries that use both of these tables are run to produce stats for a UI. I would like to remove the reporting overhead on these tables, as I am seeing timeouts during report generation.
My question is: what are my options for reducing the query overhead on these two tables while generating the report data?
I've considered using triggers to create exact copies of the tables. Is there any built-in functionality in SQL Server for mirroring data within a database? If I can avoid growing the database unnecessarily, though, that would be an advantage. It doesn't matter if the stats are not real time.
There is built-in functionality for this scenario: a Database Snapshot.
If you run a query against a table in a DB snapshot, no shared locks should be taken on the original database.
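A minimal sketch, assuming a source database named MailDispatch with a single data file (all names and paths below are placeholders):

```sql
-- Create a read-only, point-in-time snapshot of the source database
CREATE DATABASE MailDispatch_Reporting
ON
( NAME = MailDispatch_Data,                         -- logical name of the source data file
  FILENAME = 'D:\Snapshots\MailDispatch_Reporting.ss' )
AS SNAPSHOT OF MailDispatch;
GO

-- Point the expensive reporting queries at the snapshot instead of the live tables
SELECT COUNT(*) AS batched_mails
FROM MailDispatch_Reporting.dbo.MailBatch;           -- hypothetical table name
```

Keep in mind the snapshot is static, so it would need to be dropped and recreated on whatever schedule the stats have to be refreshed.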
You can use Resource Governor for SQL Server. Unfortunately, I have only read about it and haven't used it yet. It is used to isolate workloads on SQL Server.
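From what I've read, the setup would look roughly like this (the pool, group, and login names are invented; the classifier function is created in master):

```sql
USE master;
GO
-- Cap the resources available to reporting sessions (limits are illustrative)
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
GO
-- Classifier routes sessions from the reporting login into the reporting group
CREATE FUNCTION dbo.rg_classifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname = N'default';
    IF SUSER_SNAME() = N'report_user'       -- hypothetical reporting login
        SET @grp = N'ReportingGroup';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```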
Please try and let us know if it helps.
Some helpful links: MSDN, SQLBlog, TechNet
Kind Regards,
Sumit
Please advise what suits my problem better. I have a high-load web app hosted on the same server where SQL Server is hosted. I also have SQL Server Reporting Services running on the same server, generating user reports.
So my server is basically limited by disk read/write speed. I'm going to get another server and install another SQL Server instance on it in order to host SSRS there. My main criterion is to get data that is as fresh as possible.
I've looked at a couple of solutions. Currently I take a backup via jobs, copy it to the second server, and restore it there, also via jobs. But that's not the best solution.
All the replication mechanisms (transactional, merge, snapshot) affect the publisher database by locking its tables, which is unacceptable for me.
So I wonder: is there any way to create a read-only replica that would be synced periodically without affecting the main DB? I would put all the report load on that replica and have my primary DB used only by the web app.
What solution might suit my problem? As I'm not a DBA, I'd appreciate a direction to start investigating. Thanks.
Transactional Replication is typically used to off-load reporting to another server/instance and can be near real-time in a best case scenario. The benefit of Transactional Replication is you can place different indexes on the subscriber(s) to optimize reporting. You can also choose to replicate only a portion of the data if only a subset is needed for reporting.
The only time locking occurs with Transactional Replication is when you generate a snapshot. With concurrent snapshot processing, which is the default for Transactional Replication, the shared locks are only held for a short period of time, so users are able to continue working uninterrupted. Either way, this shouldn't be an issue since you'll likely be generating the snapshot during a period of low user activity anyway.
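At a high level, scripting a publication on the publisher looks something like this (the database, publication, and table names are placeholders; the snapshot agent, subscription setup, and agent jobs are omitted):

```sql
USE SalesDb;  -- hypothetical publisher database
-- Enable the database for publishing
EXEC sp_replicationdboption
     @dbname = N'SalesDb', @optname = N'publish', @value = N'true';

-- Create the transactional publication with concurrent snapshot processing
EXEC sp_addpublication
     @publication = N'SalesDb_Reporting',
     @sync_method = N'concurrent',      -- shared locks held only briefly during the snapshot
     @repl_freq = N'continuous',
     @status = N'active';

-- Publish only the tables the reports actually need
EXEC sp_addarticle
     @publication = N'SalesDb_Reporting',
     @article = N'Orders',
     @source_object = N'Orders',
     @source_owner = N'dbo';
-- (sp_addsubscription would then point the publication at the reporting server)
```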
I have a database server with a few main databases and a few dozen small ones.
These small databases are intermediary/staging databases for data import from various sources into the main database. Data import is a daily task. They are all quite similar in structure, as the implementations of these data imports are similar, so basically they have configuration tables, which define mappings, conversions, etc., and data tables, which contain the results of the import.
Some time ago there were only a handful of small ones, but now I have more than 20 of them, and the number will grow further with the number of supported data feeds.
I have just migrated the whole server environment to SQL Server 2008, and having some time now for clean-up/refactoring, I am thinking of merging all the data-import databases into just one database and using database schemas to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from a separate database to a separate schema?
Question-2 (!!!): Any tricky things to be careful about in a database-schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your situation, I would probably merge the databases into one. I don't really see a reason to have them separated, and merging them will reduce the amount of work you have to do to support backups, etc. If you were importing data from a data source once and then never using the staging tables again, I could see the reason to bring up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together, so that I only have to go to one place to find the full end-to-end state of the production data and the data-load states.
SQL Server 2008 is also really good at handling partitioning. If the DB gets too large, or you need to separate data for security reasons, you get the benefit of a single DB with many of the advantages of several smaller ones, which you won't get with multiple smaller DBs.
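If table partitioning (an Enterprise feature in 2008) is what you have in mind, a minimal sketch with made-up names and boundaries would be:

```sql
-- Range-partition staging rows by import date (column, boundaries, and filegroup are illustrative)
CREATE PARTITION FUNCTION pfImportDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-01-01', '2011-01-01');

CREATE PARTITION SCHEME psImportDate
    AS PARTITION pfImportDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.ImportRows
(
    ImportDate date          NOT NULL,
    Payload    nvarchar(max) NULL
) ON psImportDate (ImportDate);
```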
When we migrated, we had a very similar situation, and I ended up moving everything into one somewhat large importing database, as you have hinted at. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple file groups and different backups, but it will require a lot more design.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns and amount of effort to design and maintain your file group design.
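A rough sketch of the file-group route (the database, file, and path names are placeholders, and it assumes the full recovery model):

```sql
-- Put staging data on its own filegroup so it can follow a different backup schedule
ALTER DATABASE ImportHub ADD FILEGROUP Staging;
ALTER DATABASE ImportHub
    ADD FILE ( NAME = ImportHub_Staging,
               FILENAME = 'E:\Data\ImportHub_Staging.ndf',
               SIZE = 1GB )
    TO FILEGROUP Staging;
GO
-- Back up only the primary (configuration) filegroup on the frequent schedule
BACKUP DATABASE ImportHub
    FILEGROUP = 'PRIMARY'
    TO DISK = 'G:\Backups\ImportHub_primary.bak';
```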
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
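To make that concrete, here is a small sketch with invented schema and table names:

```sql
-- One schema per data feed/client keeps objects grouped inside the merged database
CREATE SCHEMA FeedAcme AUTHORIZATION dbo;
GO
CREATE TABLE FeedAcme.ImportConfig
(
    ConfigID   int IDENTITY(1,1) PRIMARY KEY,
    SourceName sysname       NOT NULL,
    Mapping    nvarchar(max) NULL
);
GO
-- Always schema-qualify references rather than relying on a login's default schema
SELECT ConfigID, SourceName
FROM FeedAcme.ImportConfig;
```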
I would put all your import staging tables in one database, separate from your regular production database, as the backup needs may be very different. This database should also contain things like your configuration management for SSIS packages, any logging tables, and any import metadata tables (we keep track of every run of the imports and the status of that run, as well as a bazillion other things about the import, like the filename, the normal file size, etc.). This comes in handy for researching problems and for adding checks to the processing. We use a schema per client, and then an additional schema for objects related to the importing/exporting process (logs, metadata, etc.).