What is sql-dump for?

I know that a SQL dump is a series of insert SQL statements which reflect all the records inside the database. But what is it used for? Why should we dump the database records? Does every database support a dumping function?

Somewhat strangely, this is actually the usual way to back up a database. Copying the underlying data files themselves is not the usual backup method: the files may be in an inconsistent state while the server is running, and they are tied to a particular database version, platform, and storage engine.
All relational databases work this way, or at least I've never heard of one that doesn't: they all have a facility to export a bunch of SQL code that, when executed, will recreate the database in the same state it was in when the dump was started.
However these various formats are generally incompatible, due to subtle differences between the various dialects of SQL used by the different database systems. There are utilities that can convert between some of them, but I'm not aware of any 'Rosetta Stone' that handles every possible case.
As well as being the primary method of backing up a database, this technique is also useful for staging the data of database apps between different servers, i.e. from development to testing to production.

mysqldump produces an SQL representation of the data for one or more tables or databases. As the format is SQL, it will run on any other MySQL server, regardless of architecture or major/minor version (obviously, views won't work on 4.x etc. but it is mostly forwards compatible).
There is another tool, mysqlhotcopy, but as this tool produces binary files, they are tied to the machine they have been generated on, and cannot be used elsewhere. SQL has the advantage of running on any MySQL server, and being independent of the underlying file storage mechanism of the database(s).
The two main use cases for dumping SQL are:
Backing up the database data. The SQL can be read in ("played back") to an empty database server and it will re-create the tables and populate them with rows.
Migrating the data to another server. Say you are upgrading from MySQL 5.0 to 5.1. You have two machines. You use mysqldump to produce an SQL dump on the 5.0 machine, and feed it into the 5.1 machine, as sketched below.
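A minimal sketch of both use cases, assuming a database named 'mydb' (a placeholder) and the stock command-line clients:
$ mysqldump --user=root --password --databases mydb > mydb.sql
$ mysql --user=root --password < mydb.sql
The first command writes the CREATE DATABASE, CREATE TABLE and INSERT statements to a file; the second plays them back on the target server.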
There are some less common uses. For example, an SQL snapshot of your application's database can be taken for unit testing against a known state. It is also possible to transform the SQL into another dialect, e.g. PostgreSQL or SQLite, to port your data to another database.
You asked if other databases provide SQL dump functionality. The answer is yes in almost all cases. PostgreSQL provides pg_dump, SQLite has a .dump command, etc.
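The equivalent one-liners, again with placeholder database names:
$ pg_dump mydb > mydb.sql
$ sqlite3 app.db .dump > app.sql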


If I'm using PostgreSQL, do I need a server too? Like AWS RDS?

In my CS program, I was told I should learn SQL for my databases.
If I'm using PostgreSQL, do I also need a SQL server to go along with it? Is PostgreSQL a language, a server, or both? Is there even a SQL language or is it only servers?
Background: I downloaded Postgres because hey, that has SQL in the name, it works and I'm under the impression it's a pretty good choice anyway. But I couldn't figure out through their website if it needs a companion server, so I went looking for one and found AWS RDS.
The impression I have is that Postgres is the language and AWS RDS is the server, and they serve different functions. But I'm not sure about any of that.
Seems you're learning too many new topics at the same time.
Ok. I'll try to answer.
SQL stands for 'Structured Query Language' and serves as a 'standard' that many vendors follow, each respecting its fundamentals in most ways. Oracle, MySQL (now owned by Oracle), MariaDB and PostgreSQL are some of those vendors.
The main thing I would recommend you identify every time you look at SQL code is whether it belongs to DML or DDL. DML stands for 'Data Manipulation Language' and refers to SQL instructions that modify data. DDL stands for 'Data Definition Language' and defines or alters the structure in which data will be stored.
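A minimal illustration of the split (the 'users' table is invented for the example):
CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100));  -- DDL: defines structure
ALTER TABLE users ADD COLUMN email VARCHAR(255);                 -- DDL: alters structure
INSERT INTO users (id, name) VALUES (1, 'Alice');                -- DML: adds data
UPDATE users SET name = 'Bob' WHERE id = 1;                      -- DML: modifies data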
Another important concept is the atomicity of data manipulation: you can confirm a change, or roll it back before it is persisted, by issuing a 'commit' or a 'rollback'. It's a somewhat advanced concept, but it generally happens "automatically" with standard client configurations (autocommit). Later, you will need to know about it when programming a system module that interacts with databases.
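For example, a sketch of an explicit transaction (the 'accounts' table is hypothetical):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- persists both changes; ROLLBACK; instead would discard both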
When you think of the SQL 'server', it refers to the installed/configured software that has the responsibility of managing the persistence of data within some kind of 'instance', allocated on a system with data storage capabilities. AWS implements this service in the cloud, and RDS is the product that supports many SQL flavors to choose from (Oracle, PostgreSQL, etc.)
If you are comfortable with Docker, I recommend you learn the basics, which will help you set up and tear down databases many times; this is useful for developing and testing locally. The following command lets you start a PostgreSQL database with port 5432 open. You can watch the server log through Docker and use an SQL client to connect. When you press Ctrl+C everything will be deleted. Of course there are other ways to keep data persistent, but this command is an easy starting point.
$ docker run --rm -p 5432:5432 --name some-postgres-container-name -e POSTGRES_PASSWORD=mysecretpassword postgres:13.3
Side note: it's better to get used to working with specific Docker image versions (not 'latest').
More details on its usage here: https://hub.docker.com/_/postgres/
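Once the container is up, you can connect from another terminal with psql, the standard PostgreSQL client ('postgres' is the default superuser of this image; it will prompt for the password set above):
$ psql -h localhost -p 5432 -U postgres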
"If I'm using PostgreSQL, do I also need a SQL server to go along with it? Is PostgreSQL a language, a server, or both? Is there even a SQL language or is it only servers lol? I'm genuinely trying to figure this out myself, but basically everything I read is beyond my scope of competence and confuses me more. I'm learning the syntax of SQL well enough, but I'm so confused about everything on the most fundamental level."
By the way, 'SQL Server' is Microsoft's SQL flavor, just another one. Don't confuse it with the general concept of having an SQL server configured.
Yes, you can think of PostgreSQL as a language too, one that shares most of its syntax and semantics with other SQL vendors. And yes, there is a 'basic' SQL language shared by and compatible between all vendors; some share more aspects than others. In terms of Venn diagrams, you can picture one circle per product, Microsoft's SQL Server, Oracle, PostgreSQL, MySQL, etc., sharing the great majority of their elements, where each element is an SQL instruction.
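To make that concrete, here is a query in the shared core next to one feature (row limiting) whose syntax sits in different circles; the 'employees' table is invented for the example:
SELECT name FROM employees WHERE salary > 50000;        -- core SQL: runs on every vendor
SELECT TOP 10 name FROM employees;                      -- SQL Server
SELECT name FROM employees LIMIT 10;                    -- MySQL, PostgreSQL, SQLite
SELECT name FROM employees FETCH FIRST 10 ROWS ONLY;    -- SQL standard, Oracle 12c+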
When dealing with databases in general, keep in mind that they help to model 'real world' scenarios or software systems. SQL lets you 'talk' to implementations of "Relational Databases", which is one kind of database model, but there are others too. ER diagrams help to represent the 'structure' of a database in a conceptual manner. I like DBeaver because it has an integrated ER diagram generator which helps in understanding the structure of a given database instance.
I have used Postgres and it is an excellent product (and free).
I would install it standalone first. It does come with its own client tools, which you use to communicate with the database server, which runs independently as a service. However, you might be better off installing something like SqlWorkbench as a client tool (which I use). In the config you specify the machine Postgres is running on (which can be your local computer for testing purposes) and the port to connect on. Essentially, the client sends your instructions to Postgres server and the server returns the resultsets associated with your instructions. The client also formats the resultsets into a nice readable "spreadsheet" format with rows and columns.
First I'll try to answer the questions you asked. There is a SQL language, but in practice it is not strictly standardized. There are many offerings for databases and database servers. Many of these are discussed below.
Any database you pick will give you the chance to learn basics of SQL queries and this knowledge will serve you well even if you switch to a different database later.
Specifically, when it comes to PostgreSQL, it is a Relational Database Management System: software that operates as a server. You can install it on your personal computer running Windows, Linux, or macOS. You can also install it on a dedicated server computer, where you'll get better performance and uptime. Further, there are many companies that offer PostgreSQL hosting, including Amazon RDS and Google Cloud, but they're not free.
For a CS student, PostgreSQL installed on your personal computer might be a reasonable choice. But you have lots of options. Read on...
For a CS program, your choice of database will depend on:
what degree of portability you need
how much data you have
how many users will connect to database
what kinds of jobs you might pursue after graduation
Portability
If you think you want to ship your database with your application, then your best bet is probably SQLite. By some accounts it can handle several million rows worth of data and still be performant. However, it's not great if you need multiple users to connect to the same database; your data can get corrupted in many multi-user scenarios.
How Much Data and How Many Users
For large data and large users, you'll want to consider the client/server heavy hitters:
PostgreSQL
MySQL/MariaDB
Oracle
SQL Server
These databases will support large quantities of data and many simultaneous connections. But if you want to distribute the database with your application, they're not a good fit, and if you want to demonstrate your app, you need to ensure that a connection to a server will be available. All of these databases come with a free version, but the last two have the most restrictions.
After Graduation
Now you're looking to the future and possibly what kind of skills you want to put on your resume. If you think you'll end up in a corporate environment that is already well established, they will likely already have a preferred database and it could be any of the ones listed here (SQLite or the "heavy hitters"). If you want to position yourself as developing apps with low overhead cost, you'll gravitate towards SQLite/PostgreSQL/MySQL. If you think you're going to be some kind of database administrator working in a buttoned-up corporate environment, those companies tend to favor SQL Server and Oracle.
Good luck. Any choice you make will probably be fine. Knowing some flavor of SQL is useful for your future endeavors.
SQL is a language like any other, except that it works on databases. It is called SQL because it works on structured data such as tables (i.e. rows and columns). After reading the PostgreSQL documentation, I think you do not need any separate server installation. You can download it from the PostgreSQL website. If you face any issues with it, I suggest using MySQL Workbench; although installation may take longer, it is easy to understand.

Is there a good reason to split related datasets into different SQL databases rather than just tables?

This question is motivated by a recent update to some business software a friend of mine is using. Their architecture was based on Access databases until now, which was awfully slow. They had split their datasets into multiple mdb files (sales.mdb, products.mdb, stock.mdb, ...). They are now moving on to SQL Server Express and keeping this structure. Instead of using a table for each of these datasets, they created different databases on the same instance of SQL Server 2008 Express.
From my (admittedly limited) understanding of SQL Server, this does not seem sensible, as it prevents JOINs between tables in different databases and requires a program that needs both sales and stock data to maintain two DB connections instead of just one.
One of the software vendor's consultants claimed that this would circumvent SQL Express' limit of 1GB of physical memory - he says the limit is per-database, but from what I gather from MSDN, it's actually per-instance, so they gain nothing here.
Is there a good reason for splitting data in the same business domain into databases rather than tables?
(One that I can think of is that you can restrict access per database, but not per table - but this is irrelevant in this particular case, all modules of the program can access all databases.)
You can write joins across databases, so that's not a major issue. Generally, I would suggest keeping everything in one database unless there was a very good reason to split it, and those reasons might have something to do with compatibility, say where you had an application that had to run against a database in compatibility mode 80 or something - you might choose to separate some data into a separate database at that compatibility level. Or if you had a major chunk of functionality that you wanted to be able to easily move to another server - say data migration, or ETL.
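For reference, a cross-database join on one SQL Server instance just uses three-part names (database.schema.table); the Sales/Stock databases mirror the question, and the tables are hypothetical:
SELECT o.OrderID, i.QuantityOnHand
FROM Sales.dbo.Orders AS o
JOIN Stock.dbo.Inventory AS i ON i.ProductID = o.ProductID;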
It sounds like the limitation is in the application.
Depending on your version of SQL Server Express, splitting the databases up could allow for more data storage (2005 and 2008 have a 4GB per-database limit, raised to 10GB in 2008 R2).
In addition to the 1GB RAM per instance limit, I believe SQL Server Express is also limited to 1 CPU per instance.
I would agree with HABO's comment. You cannot enforce referential integrity across databases, that will all have to be managed within the application.
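To illustrate: on SQL Server a foreign key may only reference a table in the same database, so a constraint like this hypothetical one is rejected (cross-database foreign key references are not supported):
ALTER TABLE Sales.dbo.Orders
ADD CONSTRAINT FK_Orders_Product
FOREIGN KEY (ProductID) REFERENCES Stock.dbo.Product (ProductID);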

Database Meta tools

I'm now working with a legacy database which is missing, among almost everything you'd expect from a decent SQL relational DB, any documentation or metadata. I can't make changes to the DB schema, except on my local test copy, as it exists at many client sites and there's no upgrade procedure. Are there any tools that I can use to build and keep my own metadata about the database? I'm looking to keep track of relationships, basic documentation about tables and columns, and references in stored procedures. There are 200+ tables and 3300+ SPs. A base autogeneration would be very helpful, particularly with the SPs. Preferably FOSS and Linux, but I will settle for Windows just to have something.
Not sure what you mean by "metadata", but I'm pretty happy with Liquibase.
It manages the schema in one (or more) XML files and can reverse engineer an existing database (all major ones supported).
It's Java/JDBC based and runs fine on Linux.
The main purpose of Liquibase is to handle upgrades (schema migration) smoothly, so I'm not sure if this is exactly what you are looking for.
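The reverse-engineering step is a single command; a sketch, assuming a PostgreSQL database named 'mydb' and Liquibase's classic CLI flags:
$ liquibase --url=jdbc:postgresql://localhost:5432/mydb --username=me --password=secret --changeLogFile=db.changelog.xml generateChangeLog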

How to make a DB schema such that it is supported by all DB management systems

Is there a Windows XP utility to make a database such that it's supported by SQL Server, Oracle, and other DB management systems?
The database schema is very large, so I would like to know what to use to make it portable from SQL Server to Oracle, if the future demands that change.
In short, what you seek is nearly impossible to do successfully. Every database product has enough quirks that such a database would not perform well and would be too limiting in terms of the features you could use. That is, you have to play the game of lowest common denominator, restricting yourself to the features that all the products you want to support implement. A far better solution is to abstract the data layer into its own library accessed via interfaces so that you can swap out your data layer. ORMs, as Rafael E. Belliard suggested, make this simpler, but it can also be done manually.
I would recommend building your database using an ORM like Hibernate for Java (or NHibernate for .NET). This would allow you to seamlessly transition from one database type to the other with little to no issues. They would allow you to logically create the database schema without a specific database in mind, which you could then move from one database to the other.
I have created applications which change from SQL Server to MySQL to Oracle to MS Access to SQLite easily (clients love that flexibility).
However, you would need to know your way around programming...

How is Database Migration done?

I remember in my previous job, I needed to do data migration. In that case, I needed to migrate to a new system that I was to develop, so it had a different table schema. I think, first, I should know:
In general, how is data migrated (with the same schema) to a different DB engine, e.g. MySQL -> MSSQL? In my case, my destination DB was MySQL and I used MySQL Migration Toolkit.
I am thinking that in an enterprise app, there may be stored procedures and triggers that also need to be imported.
If the table schema is different, how will I then go about doing this? In my previous job, what I did was import the data (in my case, from Access) into my destination (MySQL), keeping the table structures, and then use SQL to select the data and manipulate it as required into the final destination tables.
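(A minimal sketch of that staging approach in MySQL; the 'raw_orders' staging table, the 'orders' target, and the field layout are invented for the example:)
INSERT INTO orders (customer_id, order_date, total)
SELECT CAST(field1 AS UNSIGNED), STR_TO_DATE(field2, '%d/%m/%Y'), field3
FROM raw_orders;  -- raw_orders holds the imported Access data unchanged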
In my case, where I didn't have documentation for the old DB and the columns were not named meaningfully (e.g. 'field1', 'field2', etc.), I needed to trace through the application code to find out what the columns mean. Is there any better way? And sometimes columns contain multiple values in delimited data: is reading code the only way?
It really depends, but from your question I assume you want to hear what other people do.
So here is what I do in my current project.
I have to migrate from Oracle to Oracle but to a completely different schema.
The old system was 2-tier (old client, old database) the new system is 3-tier (new client, business logic, new database). We have more than 600 tables in the new schema.
After much pondering we scrapped the idea of doing a migration from the old database to the new database in SQL. We decided that in our case it would be much easier to go:
old database -> old client -> business logic -> new database
In the old database much of the data is stored in strange ways and the old client mangles it in complex ways. We have access to the source code of the old client, but it is a very large system.
We wrote a migration tool that sits above the old client and the business logic. We have some SQL before and some SQL after that, but the bulk of the data is migrated via the old client and the business logic.
The downside is that it is slow, a complete migration taking more than 190 hours in our case but otherwise it works well.
UPDATE
As far as stored procedures and triggers are concerned:
Even though we use the same DBMS in the old and new systems (both Oracle), the procedures and triggers were written from scratch for the new system.
When I've performed database migrations, I've used the application instead of a general tool to migrate the database. The application connects to two databases and copies objects from one to the other. You don't have to worry about schema or permissions or whatnot, since all of that is handled in the application, just like what happens when you set up the application in the first place.
Of course, this may not help you if your application doesn't support this. But if you're writing an application, I strongly recommend doing it this way.
I recommend the wikipedia article for a good overview and links to the main commercial tools (and some non-commercial ones). Stored procedures (and kin, e.g. user-defined functions), if abundant, are going to be the "hot spots" in the migration, requiring rare and costly human skills -- as soon as you get away from the "declarative" mood of mainstream SQL, and into procedural code, you cannot expect automated tools to do a decent job (Turing's Theorem says that they actually can't, in a sufficiently general case;-). So, you need engineers with a good understanding of the procedural trappings of BOTH engines -- the one you're migrating from and the one you're migrating to. You can buy that -- it's one of the niches where consultants make REALLY good money!-)
If you are using MS SQL Server, you can use SSMS to script out the schema and all data in one go: SQL Server 2008: Script Data as Inserts.
If you are not using any/many non-standard SQL constructs, then you might be able to manually edit this script without too much effort.