What is the difference between physical and logical backup? [closed] - backup

I was reading about backups. I understood what a physical backup is, but I am not able to understand what a logical backup is. How does it work?
A pictorial representation of how it works would help.
Thanks in advance

Logical vs. Physical (Basic difference):
A logical backup is made using SQL statements; an export taken with the exp tool is a logical backup.
A physical backup copies the data files, either while the database is up and running (HOT BACKUP) or while the database is shut down (COLD BACKUP).
In other words,
a physical backup copies all of the physical files that belong to the database (data files, control files, log files, executables, etc.).
A logical backup does not copy any physical files; it only extracts the data from the data files into dump files (for example, using export).
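As a rough illustration, the two approaches might look like this on an Oracle system (the ORCL connect string, credentials, and paths below are made-up placeholders, not part of the original answer):

    # Logical backup: extract the data into a dump file with the exp utility
    exp system/password@ORCL full=y file=/backup/full_export.dmp log=/backup/full_export.log

    # Physical (cold) backup: shut the database down, then copy the physical files
    cp /u01/oradata/ORCL/*.dbf /backup/cold/   # data files
    cp /u01/oradata/ORCL/*.ctl /backup/cold/   # control files
    cp /u01/oradata/ORCL/*.log /backup/cold/   # online redo logs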
Read This Article
Physical Backup
In a physical backup, the operating system saves the database files onto tape or some other media. This is useful for restoring the system to an earlier point whenever needed.
Logical Backup
In the logical backup technique, the IMPORT/EXPORT utilities are used to create a backup of the database. A logical backup backs up the contents of the database and can be used to restore the database to the state of the last backup. However, unlike a physical backup, it cannot be used to recover damaged data files; in those situations a physical backup should be preferred.
More types, such as cold and hot backups under physical backup, are also explained there.
Logical vs. Physical Database Backups :
Once you’ve made a decision about your uptime requirements, you need to make decisions about what kind of data you will be backing up from your databases.
Physical Files, such as text files, are stored as a single document on your hard drive. Although databases consist of many complex elements, these are usually aggregated into simple files on your server’s hard drive. These files can easily be backed up just like any other files on your system.
Logical Data Elements such as tables, records, and their associated metadata are stored across many different locations. Backups of tables and other logical database elements usually require special tools designed to work with your specific database platform. Although these backups are more complex, they offer more granular recovery capabilities. This is especially true when doing point-in-time recovery of tables that involve complex transactions and inter-dependencies.
Logical database backups are critical for granular recovery of specific components, while physical backups are useful for full disaster recovery scenarios.
The choice between logical and physical database backups should be covered as part of your Recovery Point Objectives (RPOs).

In my understanding, a logical backup is just an export of one or more tables from the database. If it contains all tables of the database, one can use it to restore the state of the database at the time when the logical backup was made. One can also use it to import the tables into a different database. For instance, a script with CREATE TABLE and INSERT statements would be a possible file format for a logical backup (as used by MySQL - Oracle has its own file format for export files).
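For example, with MySQL such a logical backup could be taken and restored roughly like this (the database name mydb and the file name are placeholders):

    # Dump schema and data as CREATE TABLE / INSERT statements
    mysqldump --single-transaction mydb > mydb_dump.sql

    # Restore by replaying the statements into an (empty) database
    mysql mydb < mydb_dump.sql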
A physical backup is a copy of the internal database files. Only a physical backup permits using the log files to restore the database to the last second before a media failure (i.e. to a much later point than the time of the backup, as long as one has a copy of all log files written since the backup). That is, only this is the "real backup" one usually expects from a database that is constantly updated.
(Just for safety: Note that a copy of the internal database files while the database is running will be of no help, unless special precautions are taken: Since the copying takes some time, it will give an inconsistent view of the files. Check the manual of your DBMS for "hot backups" if you cannot shut down the DBMS before copying the files. It is also essential to protect the log files, e.g. by duplicating them on two independent disks. In Oracle, you must switch to ARCHIVELOG mode to make sure that the log files are not overwritten after some time. In general, being really prepared for a media failure needs a lot of knowledge and also practical tests on a different computer. A logical backup is probably simpler and there is less risk that it turns out to be completely unusable when one needs it, because the file format is simpler. However, long ago, I destroyed German national characters in an Oracle export, because at that time ASCII was the default character encoding.)
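For Oracle specifically, switching to ARCHIVELOG mode looks roughly like this; this is only a sketch of the relevant statements, not a complete procedure, and it assumes OS authentication as SYSDBA:

    sqlplus / as sysdba <<'EOF'
    SELECT log_mode FROM v$database;   -- check the current mode
    SHUTDOWN IMMEDIATE;
    STARTUP MOUNT;
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;
    EOF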

Related

Advantage of backing up to multiple files?

I come from a Sybase background, and with it, if a backup to one file took 20 minutes, a backup to two files would take 10 minutes (plus a bit of overhead), four files would take 5 minutes (plus a bit more overhead), etc. I expected to see the same results with DB2 but it doesn't seem to be reducing the overall backup time at all. While not optimal, in both the Sybase and DB2 tests the files were all being written to the same filesystem. Am I misunderstanding what the multi-file backup achieves in DB2? Thanks.
When you take a look at the BACKUP DATABASE syntax and options you will notice that Db2 supports several storage targets (with respective options) as well as options on how the database data is read. The backup process consists of reading the relevant data from the database and writing it to the backup device.
For the reading part, there are options like BUFFER and PARALLELISM that impact performance and throughput. By default, if not specified by the user, Db2 tries to come up with good values. This is something you could look into.
Are you compressing or encrypting the backup file? Are you writing the backup to the same file system that your database lives on? Those are more things to consider.
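As a sketch of those knobs (the database name, target paths, and the particular values are assumptions to adapt to your system):

    # Write the backup to two targets, with explicit buffers and parallelism
    db2 "BACKUP DATABASE mydb TO /backup1, /backup2
         WITH 4 BUFFERS BUFFER 4096 PARALLELISM 4 COMPRESS"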

Is it wrong to store the bytes of images in the database?

When should I store images directly in the database?
In which situations?
I know I can store just the path of the image in the database instead.
In addition to the cost being higher as mentioned, one must take into account several factors:
Data Volume: For a low volume of data there may be no problem. On the other hand, for mass storage of data the database is practically unfeasible.
Clustering: One advantage of the database is if your system runs on multiple servers, everyone will have uniform access to the files.
Scalability: If demand for volume or availability increases, can you add more capacity to the system? It is much easier to split files between different servers than to distribute records from one table to more servers.
Flexibility: Backing up, moving files from one server to another, doing some processing on the stored files, all this is easier if the files are in a directory.
There are several strategies for scaling a system in terms of both availability and volume. Basically, these strategies consist of distributing the data across several different servers and redirecting each user to one of them according to some criterion. The implementation details vary: data update strategy, redundancy, distribution criteria, etc.
One of the great difficulties in managing files outside the database is that we now have two distinct data sources that need to be kept in sync.
From the safety point of view, there is actually little difference. If a hacker can compromise a server, they can read both the files your system writes to disk and the files of the database system. If this concern is critical, an alternative is to store the data encrypted.
I also convert my images into byte arrays and store them in a SQL Server database, but in the long run I am sure someone will tell you that you should only save the (server) path of the image.
The biggest disadvantage of storing images as binary, I think, is that
retrieving images from the database is significantly more expensive compared to using the file system.

How does SQL Server handle large physical database file

Coming from MySQL and PostgreSQL, I would very much like to know how SQL Server stores and handles large physical database files.
According to this article here
http://msdn.microsoft.com/en-us/library/aa174545%28SQL.80%29.aspx
SQL Server has 3 types of file, .mdf, .ndf and .ldf
Due to the nature of how data grows, a database can come to contain hundreds of thousands of records. This would eventually affect the size of these .mdf files.
So the question is, how does SQL Server handle large physical database files?
I might seem to be asking a lot of questions, but I would like an answer that also covers the sub-questions below:
Theoretically, the .mdf file size could grow to gigabytes or perhaps terabytes. Is this common in real-world scenarios?
Since SQL Server deals with a single file, it would have considerably large read/write operations performed on the same file. How would this impact performance?
Is it possible (has there been any case) to split the .mdf into parts? Instead of having one huge .mdf file, would it be better to split it into chunks?
Note: I am new to SQL Server; basic querying in SQL Server appears to be similar to MySQL, but I would like to know a bit about what is going on "under the hood".
1 Theoretically, mdf filesize could grow to GB or perhaps TB. Is this
common in real world scenario?
Yes, it is common. It depends on amount of read-write operations per second and your disk subsystem. Nowadays, a database with size of hundreds GB is considered to be small.
2. Since SQL Server deals with a single file, it would have considerably large read/write operations performed on the same file. How would this impact performance?
This is one of the most common performance bottlenecks. You need to choose an appropriate disk subsystem, and perhaps divide your database into several filegroups and place them on different disk subsystems.
3. Is it possible (has there been any case) to split the .mdf into parts? Instead of having one huge .mdf file, would it be better to split it into chunks?
Yes, you can. These "chunks" are called filegroups. You can create different tables, indexes, and other objects, or even parts of tables, in different filegroups (if your version and edition of SQL Server allow it). But this will only give you an advantage if you create the filegroups across multiple disks, RAID arrays, and so on. For more information you can read Using Files and Filegroups.
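A minimal sketch of adding a filegroup on a separate disk, run here through sqlcmd (the database name, filegroup name, and file path are made up for the example):

    sqlcmd -S localhost -Q "
    ALTER DATABASE MyDb ADD FILEGROUP FG_SecondDisk;
    ALTER DATABASE MyDb ADD FILE
        (NAME = MyDb_Data2,
         FILENAME = 'E:\SQLData\MyDb_Data2.ndf',
         SIZE = 1GB, FILEGROWTH = 512MB)
    TO FILEGROUP FG_SecondDisk;
    "

Tables created with ON FG_SecondDisk (or moved there by rebuilding their clustered index on that filegroup) will then have their pages stored in the new file.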

Possible differences between two databases with different sizes but same set of data

I have updated two databases in MSSQL Server 2008R2 using liquibase.
Both of them start from the same database, but one ran through several liquibase updates incrementally until the final one, while the other went straight to the final update.
So I have checked they have the same schema, same set of data, but their .mdf file sizes are 10GB apart.
What areas can I look into (ideally with the SQL commands to use) to investigate what might account for this 10GB difference (e.g. indexes? unused empty space? etc.)?
I am not trying to make them the same (so no shrinking); I just want to find out what contributes to this 10GB size difference. I will even accept answers like using a hex editor to open up the .mdf files and compare them byte by byte, but then I need to know what I am looking at.
Thank you
The internal structure (physical organization, not logical data) of databases is opaque both by design and due to the real-world scenarios that affect how data is created, updated and accessed.
In most cases there is literally no telling why two logically equivalent databases are different on a physical level. It is some combination of deleted objects, unbalanced pages, disk-based temporary tables, history of garbage collection, and many other potential causes.
In short, you would never expect a physical database to be 1:1 with the logical data it contains.
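That said, if you want a rough breakdown of where the space sits, a couple of standard queries help; run them against each database (MyDb is a placeholder) and compare the results:

    sqlcmd -S localhost -d MyDb -Q "
    -- allocated vs. actually used space per database file
    SELECT name,
           size/128                            AS size_mb,
           FILEPROPERTY(name, 'SpaceUsed')/128 AS used_mb
    FROM sys.database_files;

    -- reserved/used/unused space per table, including its indexes
    EXEC sp_MSforeachtable 'EXEC sp_spaceused ''?''';
    "

sp_MSforeachtable is undocumented but widely available; comparing its output between the two databases should show whether the extra 10GB is in particular tables and indexes or just unused space inside the files.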

Is there a reverse-incremental backup solution with built-in redundancy (e.g. par2)?

I'm setting up a home server primarily for backup use. I have about 90GB of personal data that must be backed up in the most reliable manner, while still preserving disk space. I want to have full file history so I can go back to any file at any particular date.
Full weekly backups are not an option because of the size of the data. Instead, I'm looking along the lines of an incremental backup solution. However, I'm aware that a single corruption in a set of incremental backups makes the entire series (beyond a point) unrecoverable. Thus simple incremental backups are not an option.
I've researched a number of solutions to the problem. First, I would use reverse-incremental backups so that the latest version of the files would have the least chance of loss (older files are not as important). Second, I want to protect both the increments and backup with some sort of redundancy. Par2 parity data seems perfect for the job. In short, I'm looking for a backup solution with the following requirements:
Reverse incremental (to save on disk space and prioritize the most recent backup)
File history (kind of a broader category including reverse incremental)
Par2 parity data on increments and backup data
Preserve metadata
Efficient with bandwidth (bandwidth saving; no copying the entire directory over for each increment). Most incremental backup solutions should work this way.
This would (I believe) ensure file integrity and relatively small backup sizes. I've looked at a number of backup solutions already but they have a number of problems:
Bacula - Simple normal incremental backups
bup - incremental and implements par2 but isn't reverse incremental and doesn't preserve metadata
duplicity - incremental, compressed, and encrypted but isn't reverse incremental
dar - incremental and par2 is easy to add, but isn't reverse incremental and no file history?
rdiff-backup - almost perfect for what I need but it doesn't have par2 support
So far I think that rdiff-backup seems like the best compromise but it doesn't support par2. I think I can add par2 support to backup increments easily enough since they aren't modified each backup but what about the rest of the files? I could generate par2 files recursively for all files in the backup but this would be slow and inefficient, and I'd have to worry about corruption during a backup and old par2 files. In particular, I couldn't tell the difference between a changed file and a corrupt file, and I don't know how to check for such errors or how they would affect the backup history. Does anyone know of any better solution? Is there a better approach to the issue?
Thanks for reading through my difficulties and for any input you can give me. Any help would be greatly appreciated.
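For what it's worth, the rdiff-backup-plus-par2 combination described in the question would look something like this; the paths and the 10% redundancy level are arbitrary, and this is only a sketch of the idea, not a tested setup:

    # Reverse-incremental backup: the mirror is always the latest state,
    # older versions are kept as reverse diffs under rdiff-backup-data/increments
    rdiff-backup /home/user /mnt/backup/user

    # Add par2 recovery data for increment files that don't have it yet
    find /mnt/backup/user/rdiff-backup-data/increments -type f -name '*.gz' \
      -exec sh -c '[ -e "$1.par2" ] || par2 create -r10 "$1.par2" "$1"' _ {} \;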
http://www.timedicer.co.uk/index
It uses rdiff-backup as the engine. I've been looking at it, but it requires me to set up a "server" using Linux or a virtual machine.
Personally, I use WinRAR to make pseudo-incremental backups (it actually makes a full backup of recent files) run daily by a scheduled task. It is similarly a "push" backup.
It's not a true incremental (or reverse-incremental) backup, but it saves different versions of files based on when they were last updated. That is, it saves the version for today, yesterday, and the previous days, even if the file is identical. You can use the archive bit to save space, but I don't bother anymore as all I back up are small spreadsheets and documents.
RAR has its own parity or recovery record that you can set as a size or a percentage. I use 1% (one percent).
It can preserve metadata; I personally skip the high-resolution timestamps.
It can be efficient, since it compresses the files.
Then all I have to do is send the file to my backup location. I have it copied to a different drive and to another computer on the network. No need for a true server, just a share. You can't do this for too many computers, though, as Windows workstations have a 10-connection limit.
So my setup, which may fit your purpose, backs up daily the files that have been updated in the last 7 days. Then I have another scheduled backup, run once a month (every 30 days), that backs up files updated in the last 90 days.
But I use Windows, so if you're actually setting up a Linux server, you might check out the Time Dicer.
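For reference, a command along the lines of what's described above might look like the following; the archive name, path, and exact switches are illustrative, so check the documentation of your rar version:

    # archive files changed in the last 7 days, with a 1% recovery record
    rar a -rr1% -tn7d recent_backup.rar "C:\Users\me\Documents"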
Since nobody was able to answer my question, I'll write a few possible solutions I found while researching the topic. In short, I believe the best solution is rdiff-backup to a ZFS filesystem. Here's why:
ZFS checksums all blocks stored and can easily detect errors.
If you have ZFS set to mirror your data, it can recover the errors by copying from the good copy.
This takes up less space than full backups, even though the data is copied twice.
The odds of an error occurring in both the original and the mirror are tiny.
Personally I am not using this solution, as ZFS is a little tricky to get working on Linux. Btrfs looks promising but hasn't been proven stable by years of use. Instead, I'm going with the cheaper option of simply monitoring hard drive SMART data. Hard drives should do some error checking/correcting themselves, and by monitoring this data I can see whether that process is working properly. It's not as good as additional filesystem parity, but better than nothing.
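For anyone who does want to try the rdiff-backup-on-ZFS route, the setup is roughly as follows (pool name, device names, and paths are placeholders):

    # Mirrored pool: every block is checksummed and stored on both disks
    zpool create backuppool mirror /dev/sdb /dev/sdc
    zfs create backuppool/home

    # Reverse-incremental backups onto the ZFS dataset
    rdiff-backup /home/user /backuppool/home/user

    # Periodically verify checksums and repair from the good copy
    zpool scrub backuppool
    zpool status backuppool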
A few more notes that might be interesting to people looking into reliable backup development:
par2 seems to be dated and buggy software. zfec seems like a much faster modern alternative. Discussion in bup occurred a while ago: https://groups.google.com/group/bup-list/browse_thread/thread/a61748557087ca07
It's safer to calculate parity data before even writing to disk. That is, don't write to disk, read it back, and then calculate parity data; do it from RAM, and check against the original for additional reliability. This might only be possible with zfec, since par2 is too slow.