Advantage of backing up to multiple files? - db2-luw

I come from a Sybase background, and with it, if a backup to one file took 20 minutes, a backup to two files would take 10 minutes (plus a bit of overhead), four files would take 5 minutes (plus a bit more overhead), etc. I expected to see the same results with DB2 but it doesn't seem to be reducing the overall backup time at all. While not optimal, in both the Sybase and DB2 tests the files were all being written to the same filesystem. Am I misunderstanding what the multi-file backup achieves in DB2? Thanks.

When you take a look at the BACKUP DATABASE syntax and options you will notice that Db2 supports several storage targets (with respective options) as well as options on how the database data is read. The backup process consists of reading the relevant data from the database and writing it to the backup device.
For the reading part, there are options like BUFFER and PARALLELISM that impact performance and throughput. By default, if not specified by the user, Db2 tries to come up with good values. This is something you could look into.
Are you compressing or encrypting the backup? Are you writing the backup to the same file system your database lives on? Those are additional factors to consider.
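For illustration, here is a minimal sketch of a backup command that spreads the image over several targets while also raising the read-side parallelism and buffer settings. The database name, target paths and values are hypothetical starting points to experiment with, not recommendations:

    -- Hypothetical example: database name, target paths and values are placeholders
    BACKUP DATABASE sample
      TO /backup/path1, /backup/path2, /backup/path3, /backup/path4
      WITH 8 BUFFERS BUFFER 4096
      PARALLELISM 4
      COMPRESS

Keep in mind that targets on the same filesystem still share the same underlying disk bandwidth; separate targets mainly pay off when they point at different physical devices.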

Related

What is the difference between physical and logical backup?

I was reading about backups. I understand what a physical backup is, but I am not able to understand what a logical backup is and how it works.
A pictorial representation of how it works would help.
Thanks in advance.
Logical vs. Physical (basic difference):
A logical backup uses SQL statements. An export using the exp tool is a logical backup.
A physical backup copies the data files, either while the database is up and running (hot backup) or while it is shut down (cold backup).
In other words:
A physical backup copies all the physical files that belong to the database (data files, control files, log files, executables, etc.).
A logical backup does not copy any physical files; it only extracts the data from the data files into dump files (for example, using export).
Read This Article
Physical Backup
The operating system saves the database files onto tape or some other media. This is useful for restoring the system to an earlier point in time whenever needed.
Logical Backup
With the logical backup technique, the IMPORT/EXPORT utilities are used to create the backup of the database. A logical backup backs up the contents of the database and can be used to restore it to the state of the last backup. However, unlike a physical backup, it does not copy the operating-system files, so it cannot be used to repair damaged data files; in those situations a physical backup should be preferred.
More types, such as cold and hot backups under physical backup, are also explained there.
Logical vs. Physical Database Backups :
Once you’ve made a decision about your uptime requirements, you need to make decisions about what kind of data you will be backing up from your databases.
Physical Files, such as text files, are stored as a single document on your hard drive. Although databases consist of many complex elements, these are usually aggregated into simple files on your server’s hard drive. These files can easily be backed up just like any other files on your system.
Logical Data Elements such as tables, records and their associated meta data are stored across many different locations. Backups for tables and other logical database elements usually require special tools that are designed to work with your specific database platforms. Although these types of backups are more complex, they offer more granular recovery capabilities. This is especially true when doing point-in-time recovery of tables that involve complex transactions and inter-dependencies.
Logical database backups are critical for granular recovery of specific components, while physical backups are useful for full disaster recovery scenarios.
The choice between logical and physical database backups should be covered as part of your Recovery Point Objectives (RPOs).
In my understanding, a logical backup is just an export of one or more tables from the database. If it contains all tables of the database, one can use it to restore the state of the database at the time when the logical backup was made. One can also use it to import the tables into a different database. For instance, a script with CREATE TABLE and INSERT statements would be a possible file format for a logical backup (as used by MySQL - Oracle has its own file format for export files).
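As a purely illustrative sketch (the table and values are invented), the dump file produced by such a logical backup might contain nothing more than plain SQL statements:

    -- Hypothetical contents of a logical backup (dump) file
    CREATE TABLE employees (
        id    INTEGER PRIMARY KEY,
        name  VARCHAR(100),
        hired DATE
    );
    INSERT INTO employees (id, name, hired) VALUES (1, 'Alice', '2020-01-15');
    INSERT INTO employees (id, name, hired) VALUES (2, 'Bob', '2021-06-01');

Restoring then simply means replaying these statements against an empty (or different) database.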
A physical backup is a copy of the internal database files. Only a physical backup permits using the log files to restore the database to the last second before a media failure (i.e., to a much later time than the time of the backup, as long as one has a copy of all log files written since the backup). In other words, only this is the "real backup" one usually expects from a database that is constantly updated.
(Just for safety: Note that a copy of the internal database files while the database is running will be of no help, unless special precautions are taken: Since the copying takes some time, it will give an inconsistent view of the files. Check the manual of your DBMS for "hot backups" if you cannot shut down the DBMS before copying the files. It is also essential to protect the log files, e.g. by duplicating them on two independent disks. In Oracle, you must switch to ARCHIVELOG mode to make sure that the log files are not overwritten after some time. In general, being really prepared for a media failure needs a lot of knowledge and also practical tests on a different computer. A logical backup is probably simpler and there is less risk that it turns out to be completely unusable when one needs it, because the file format is simpler. However, long ago, I destroyed German national characters in an Oracle export, because at that time ASCII was the default character encoding.)
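For reference, switching an Oracle database to ARCHIVELOG mode looks roughly like the following, run in SQL*Plus as SYSDBA. Treat this as a sketch and check the documentation for your version before relying on it:

    SHUTDOWN IMMEDIATE
    STARTUP MOUNT
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;
    -- Verify the new mode; it should now report ARCHIVELOG
    SELECT log_mode FROM v$database;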

How does SQL Server handle large physical database files

Coming from MySQL and PostgreSQL, I would very much like to know how SQL Server stores and handles large physical database files.
According to this article here
http://msdn.microsoft.com/en-us/library/aa174545%28SQL.80%29.aspx
SQL Server has three types of files: .mdf, .ndf and .ldf.
Due to the nature of how data grows, a database can eventually contain hundreds of thousands of records, which affects the size of these .mdf files.
So the question is, how does SQL Server handle large physical database files?
I might seem to be asking a lot of questions, but I would like an answer that also covers the sub-questions below:
Theoretically, the .mdf file size could grow to gigabytes or perhaps terabytes. Is this common in real-world scenarios?
Since SQL Server deals with a single file, it would perform considerably large read/write operations on that same file. How would this impact performance?
Is it possible (has there been any case) to split the .mdf into parts? Instead of having one uber-large .mdf file, would it be better to split it into chunks?
Note: I am new to SQL Server; basic queries appear to be similar to MySQL, but I would like to know a bit about what is going on "under the hood".
1. Theoretically, mdf filesize could grow to GB or perhaps TB. Is this common in real world scenario?
Yes, it is common. It depends on the number of read/write operations per second and on your disk subsystem. Nowadays, a database hundreds of gigabytes in size is considered small.
2. Since MSSQL deals with single file, it would have a considerably large read/write operation performed on the same file. How would this impact the performance?
This is one of the most common performance bottlenecks. You need to choose an appropriate disk subsystem, and perhaps divide your database into several filegroups and place them on different disk subsystems.
3. Is it possible (has there been any case) to split mdf into parts. Instead of having 1 uber large mdf file, would it be better to split it into chunks?
Yes, you can. These "chunks" are called filegroups. You can create different tables, indexes, and other objects, or even parts of tables, in different filegroups (if your version and edition of SQL Server allow it). But this only gives you an advantage if you create the filegroups across multiple disks, RAID arrays, and so on. For more information, read Using Files and Filegroups.
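As a hedged sketch (database name, logical file name, path and size are all made up), adding a second filegroup on another disk and placing a table on it might look like this:

    -- Hypothetical example: names, path and size are placeholders
    ALTER DATABASE MyDb ADD FILEGROUP FG_Data2;
    ALTER DATABASE MyDb ADD FILE
        (NAME = MyDb_Data2, FILENAME = 'E:\SQLData\MyDb_Data2.ndf', SIZE = 10GB)
        TO FILEGROUP FG_Data2;
    -- New tables or indexes can then be placed on the second filegroup
    CREATE TABLE dbo.BigTable
        (Id INT IDENTITY PRIMARY KEY, Payload NVARCHAR(MAX))
    ON FG_Data2;

The gain only materializes when FG_Data2 actually sits on a separate physical disk or RAID set, as noted above.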

What about the performance of cursors, reindexing and shrinking?

I recently learned that when I delete or modify a column, SQL Server still holds on to the space behind the scenes, so I need to reindex and shrink the database. I did this, and my database size went down from
2.82 GB to 1.62 GB,
which is good. But now I am confused, and several questions about this come to mind. Please help me with the following:
1. Is it necessary to recreate (refresh) the indexes at regular intervals?
2. Is it necessary to shrink the database from time to time so that performance stays up to date?
3. If so, how often should I refresh (shrink) my database?
I also have no idea what to do about the disk space problem: I have 77,000 records that take up 2.82 GB of data space, which is not acceptable. I have two tables, and only one of them has an nvarchar(max) column, so the database should need minimal space. Can anyone help me with this? Thanks in advance.
I am going to simplify things a little for you so you might want to read up about the things I talk about in my answer.
Two concepts you must understand: allocated space vs. free space. A database might be 2 GB in size but only be using 1 GB, so it has 2 GB allocated with 1 GB of free space. When you shrink a database it removes that free space, so free space ends up at about 0. Don't think a smaller file size is faster. As your database grows it has to allocate space again, and when you shrink the file and it then grows every so often, it cannot allocate the space in a contiguous fashion. This creates fragmentation of the files, which slows you down even more.
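If you want to see the allocated vs. free split for yourself, a minimal sketch (run in the database in question) is the query below; the arithmetic assumes the sizes are reported in 8 KB pages, which is how sys.database_files reports them:

    -- Allocated vs. used space per database file, in MB
    SELECT name,
           size * 8 / 1024                            AS allocated_mb,
           FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
    FROM sys.database_files;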
With data files (.mdf) this is not so bad, but with the transaction log, shrinking can lead to virtual log file fragmentation issues which slow you down. So in a nutshell, there is very little reason to shrink your database on a schedule. Go read about virtual log files in SQL Server; there are a lot of articles about it, and this is a good article about shrinking log files and why it is bad. Use it as a starting point.
Secondly, indexes get fragmented over time. This mainly leads to bad performance of SELECT queries but will also affect other queries. Thus you need to perform some index maintenance on the database. See this answer on how to defragment your indexes.
Update:
Well, the right time to rebuild indexes is not clear-cut. Index rebuilds lock the index for the duration of the rebuild; essentially the index is offline. In your case it would be fast, as 77,000 rows is nothing for SQL Server, but rebuilding the indexes will still consume server resources. If you have Enterprise edition you can do online index rebuilds, which will not lock the indexes but will consume more space.
So what you need to do is find a maintenance window. For example, if your system is used from 8:00 till 17:00, you can schedule the rebuilds to run after hours with SQL Server Agent. The script in the link can be automated to run.
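A minimal sketch of such a maintenance step (the table name dbo.MyTable is hypothetical, and the script in the link is far more complete) could be:

    -- Check fragmentation of the indexes on one table
    SELECT i.name, s.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'),
                                         NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id AND i.index_id = s.index_id;

    -- Offline rebuild (locks the indexes for the duration)
    ALTER INDEX ALL ON dbo.MyTable REBUILD;

    -- Enterprise edition only: rebuild without locking, at the cost of extra space
    -- ALTER INDEX ALL ON dbo.MyTable REBUILD WITH (ONLINE = ON);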
Your database is not big. I have seen SQL Server handle tables of 750 GB without taking strain when the IO is split over several disks. The slowest part of any database server is not the CPU or the RAM but the IO path to the disks, although that is a huge topic. Back to your point: you are storing data in NVARCHAR(MAX) fields, which I assume is large text. After you shrink the database you see the size at 1.62 GB, which means each row in your database is about 1.62 GB / 77,000, or roughly 22 KB. That seems reasonable. Export the table to a text file and check the size; you will be surprised, as it will probably be larger than 1.62 GB.
Feel free to ask more detail if required.

Writing small data to file and reading from it or query the database

I have a situation where I have to query the database for some data that is the same for all users and changes daily, so I figured I could write this data to a file (once per day) and then load it from that file each time a user visits my site.
Now, I know that this is a common practice (caching) when the requests to the database are big, but the data I'm about to write to the file is a simple 3-digit number, so my question is: would this still be faster, or is it overkill and should I just stick with the database query?
Caching, when done right, is always faster.
It depends on how long storing and retrieving the data from the file takes and how long requests to the database take.
If the database query to get the number takes long, then caching may be a good idea, since the data is small.
If you were to do a search (e.g. sequential) in a file with lots of cached data (which doesn't seem to be the case), it would take long.
Disk I/O could be slower than database I/O (which is unlikely, unless it's a local DB).
Bottom line - benchmark.
For your scenario, caching is probably a good idea, but if it's only a single 3-digit number for all users then I'd just try to stick it in RAM rather than in a file.

Is there a reverse-incremental backup solution with built-in redundancy (e.g. par2)?

I'm setting up a home server primarily for backup use. I have about 90 GB of personal data that must be backed up in the most reliable manner possible, while still preserving disk space. I want to have full file history so I can go back to any file at any particular date.
Full weekly backups are not an option because of the size of the data. Instead, I'm looking along the lines of an incremental backup solution. However, I'm aware that a single corruption in a set of incremental backups makes the entire series (beyond a point) unrecoverable. Thus simple incremental backups are not an option.
I've researched a number of solutions to the problem. First, I would use reverse-incremental backups so that the latest version of the files would have the least chance of loss (older files are not as important). Second, I want to protect both the increments and backup with some sort of redundancy. Par2 parity data seems perfect for the job. In short, I'm looking for a backup solution with the following requirements:
Reverse incremental (to save on disk space and prioritize the most recent backup)
File history (kind of a broader category including reverse incremental)
Par2 parity data on increments and backup data
Preserve metadata
Efficient with bandwidth (no copying of the entire directory over for each increment). Most incremental backup solutions should work this way.
This would (I believe) ensure file integrity and relatively small backup sizes. I've looked at a number of backup solutions already but they have a number of problems:
Bacula - Simple normal incremental backups
bup - incremental and implements par2 but isn't reverse incremental and doesn't preserve metadata
duplicity - incremental, compressed, and encrypted but isn't reverse incremental
dar - incremental and par2 is easy to add, but isn't reverse incremental and no file history?
rdiff-backup - almost perfect for what I need but it doesn't have par2 support
So far I think that rdiff-backup seems like the best compromise but it doesn't support par2. I think I can add par2 support to backup increments easily enough since they aren't modified each backup but what about the rest of the files? I could generate par2 files recursively for all files in the backup but this would be slow and inefficient, and I'd have to worry about corruption during a backup and old par2 files. In particular, I couldn't tell the difference between a changed file and a corrupt file, and I don't know how to check for such errors or how they would affect the backup history. Does anyone know of any better solution? Is there a better approach to the issue?
Thanks for reading through my difficulties and for any input you can give me. Any help would be greatly appreciated.
http://www.timedicer.co.uk/index
Uses rdiff-backup as the engine. I've been looking at it, but it requires me to set up a "server" using Linux or a virtual machine.
Personally, I use WinRAR to make pseudo-incremental backups (it actually makes a full backup of recent files), run daily by a scheduled task. It is similarly a "push" backup.
It's not a true incremental (or reverse-incremental) backup, but it saves different versions of files based on when they were last updated. I mean, it saves the version for today, yesterday and the previous days, even if the file is identical. You can set the archive bit to save space, but I don't bother anymore, as all I back up are small spreadsheets and documents.
RAR has its own parity or recovery record that you can set in size or percentage. I use 1% (one percent).
It can preserve metadata, I personally skip the high resolution times.
It can be efficient since it compresses the files.
Then all I have to do is send the file to my backup. I have it copied to a different drive and to another computer in the network. No need for a true server, just a share. You can't do this for too many computers though as Windows workstations have a 10 connection limit.
So my setup, which may fit your purpose, backs up files daily that have been updated in the last 7 days. Then I have another scheduled backup, run once a month (every 30 days), that backs up files updated in the last 90 days.
But I use Windows, so if you're actually setting up a Linux server, you might check out the Time Dicer.
Since nobody was able to answer my question, I'll write a few possible solutions I found while researching the topic. In short, I believe the best solution is rdiff-backup to a ZFS filesystem. Here's why:
ZFS checksums all blocks stored and can easily detect errors.
If you have ZFS set to mirror your data, it can recover the errors by copying from the good copy.
This takes up less space than full backups, even though the data is copied twice.
The odds of an error in both the original and the mirror are tiny.
Personally I am not using this solution, as ZFS is a little tricky to get working on Linux. Btrfs looks promising but hasn't yet been proven stable by years of use. Instead, I'm going with the cheaper option of simply monitoring hard drive SMART data. Hard drives do some error checking/correcting themselves, and by monitoring this data I can see whether that process is working properly. It's not as good as additional filesystem parity, but better than nothing.
A few more notes that might be interesting to people looking into reliable backup development:
par2 seems to be dated and buggy software. zfec seems like a much faster, modern alternative. A discussion on the bup list took place a while ago: https://groups.google.com/group/bup-list/browse_thread/thread/a61748557087ca07
It's safer to calculate parity data before even writing to disk; i.e., don't write to disk, read it back, and then calculate the parity data. Do it from RAM, and check against the original for additional reliability. This might only be possible with zfec, since par2 is too slow.