I have a database server with few main databases, and few dozens of small ones.
These small databases are kind of intermediary/staging databases for data import from various sources into main database. Data import is a daily task. They are all quite similar in structure as the implementation of these data imports are similar, so basically they have a configuration tables, which define mapping, conversions etc, and the data tables, which contain the results of the import.
Some time ago there have been only the handful of small ones, but now I have more then 20 of them will grow further with the number of supported data feeds.
I have just migrated all the server environment to SQL Server 2008, and having some time now for clean-up/refactoring, I am thinking to merge all of data-import databases into just one database, and use database schema to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from a separate database to a separate schema?
Question-2: !!!: Any tricky thing to be careful about in database schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your instance, I would probably put merge the databases into one. I don't really see a reason to have them separated, and merging them will reduce the amount of work you have to do to support backups etc. If you were importing data from a data source once and then never using the staging tables again, I could see the reason to bring up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together so that I only have to go to one place to find the full end to end state of the production data and the data load states.
2008 is really good at handling database partitioning too, if the db gets too large, or you need to separate data for security reasons you get the benefit of having a single db with the advantages like having several smaller ones. You won't get that with multiple smaller dbs.
When we migrated we had a very similar situation and I ended up moving everything into one some-what large Importing database like you have hinted towards. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple file groups and different backups, but it will require a lot more design.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns and amount of effort to design and maintain your file group design.
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
I would put all your import staging tables in one database separate from your regular production databse as the backup needs may be very different. This database should also contains things like your configuration management for SSIS packages, any logging tables, any import metadata tables (we keep track of every run of the imports and the status of that run as well as a bazillion other things about the import like the filename, the normal file size, etc. Comes in handy for researching problems and for adding checks to the processing. We usea a schema that is by client and then an additional schema for objects realted to the importing/exporting process (logs, meta data etc.)
Related
Currently our team is having a major database management/data management issue where hundreds of databases are being built and used for minor/one off applications where the app should really be pulling from an already existing database.
Since our security is so tight, the owners of these Systems of authority will not allow others to pull data from them at a consistent (App Necessary) rate, rather they allow a single app to do a weekly pull and that data is then given to the org.
I am being asked to compile all of those publicly available (weekly snapshots) into a single data warehouse for end users to go to. We realistically are talking 30-40 databases each with hundreds of thousands of records.
What is the best way to turn this into a data warehouse? Create a SQL server and treat each one as its own DB on the server? As far as the individual app connections I am less worried, I really want to know what is the best practice to house all of the data for consumption.
What you're describing is more of a simple data lake. If all you're being asked for is a single place for the existing data to live as-is, then sure, directly pulling all 30-40 databases to a new server will get that done. One thing to note is that if they're creating Database Snapshots, those wouldn't be helpful here. With actual database backups, it would be easy to build a process that would copy and restore those to your new server. This is assuming all of the sources are on SQL Server.
"Data warehouse" implies a certain level of organization beyond that, to facilitate reporting on an aggregate of the data across the multiple sources. Generally you'd identify any concepts that are shared between the databases and create a unified table for each concept, then create an ETL (extract, transform, load) process to standardize the data from each source and move it into those unified tables. This would be a large lift for one person to build. There's plenty of resources that you could read to get you started--Ralph Kimball's The Data Warehouse Toolkit is a comprehensive guide.
In either case, a tool you might want to look into is SSIS. It's good for copying data across servers and has drivers for multiple different RDBMS platforms. You can schedule SSIS packages from SQL Agent. It has other features that could help for data warehousing as well.
Here is the use-case: we need to backup some of the tables from a client server, copy it to our servers, restore it, then running some queries using ODBC.
I managed to do this process for the entire database by using probkup for backup, prorest for restore and proserve to make it accessible for SQL queries.
However, some of the databases are big (> 8GB), so we are looking for a solution to do the backup for only the tables we need. I didn't find anything with the documentation of probkup how this can be done.
Progress only supports full database backups.
To get the effect that you are looking for you could dump (export) the tables that you want and then load them into an empty database.
"proutil dump" and "proutil load" are where you want to start digging.
The details will vary depending on exactly what you want to do and what resources and capabilities you have available to you.
Another option would be to replicate the tables in question to a partial database. Progress has a product called "pro2" that can help with that. It is usually pointed at SQL targets but you could also point it at a Progress database.
Or, if you have programming skills, you could put together a solution using replication triggers (under the covers that's what pro2 does...)
probkup and prorest are block-level programs and can't do a backup or restore by table.
To do what you're asking for, you'll need to do a dump the data from the source db's tables and then load it into the target db.
If your object is simply to maintain a copy of the DB, you might also try incremental backups. Depending upon your situation, that might speed things up a bit.
Other options include various forms of DB replication, which allow you to keep real- or near-real-time copies of your database.
OpenEdge Replication. With the correct license, you can do query-only access on the replication target, which is good for reporting and analysis.
Third-party replication products. These can be more flexible in terms of both target DBs and limiting the tables to be replicated.
Home-grown replication (by copying and applying AI files). This is not terribly complicated, but you have to factor the cost of doing the work and maintaining the system. There are some scripts out there that can get you started.
Or, as Tom said, you can get clever with replication via triggers.
Currently I have 3 (same code base apps) with it's own databases and own unique data. Were moving towards doing multi tenancy in rails, after a couple of prototype testing we've decided to go for a shared tenancy. My only biggest problem is that, each databases have their own data with unique ids and etc. How would it be possible to merge them either via sql command/dump or rails script that way they will have their own account_id + keep all data integrity?
Absolutely doable. It depends on a lot of details.
Basically I would
Make a full backup of all three.
Prep each database to hold compatible data (no duplicates).
Select one to be the new master.
Dump the other two (data only).
Hack the dump, to make sure. Typical COPY statements in dumps are just fine.
Restore data from the two additional database on top of existing data in the master.
Make sure all sequences are set properly.
Run vaccumdb -fz master.
SQL Server and Oracle terminology -
In SQL Server If I have two applications and want to keep the database completely separate, I could simply create 1 database for each application therefore I end up with 2 databases.
If I wanted to do the same thing in oracle, what do I need to create?
- create a new "Databases"? "Instance", "Schema", or "Tablespace" per application?
(Note, these two applications is the same application used by two different companies, that do not share data!)
Reference: http://www.codeproject.com/Tips/492342/Concept-mapping-between-SQL-Server-and-Oracle
Having worked with SQL Server a lot in the past, I have sympathy with trying to figure out how Oracle organizes things as I struggled with the same thing. My comments below are from SQL Server 2000 and 2003 so forgive me if things have changed since then.
Previous responders have been helpful. I think one problematic assumption here is that there is an exact "level" equivalency between SQL Server and Oracle. What I mean by "level" is something that occupies the same space in the hierarchies that you have diagrammed above (and which, btw, I think are a good place to start but might need a bit of editing in a couple of places, for example how you have diagrammed "user" and "schema" in the Oracle hierarchy, I might put them side-by-side.) I do not think these concept "levels" match exactly between the DB platforms.
A schema in Oracle is somewhat equivalent to a separate database in SQL Server but not entirely.
I would say that the "walls" -- not an exact technical term but oh well -- between databases in SQL server are a bit higher than the "walls" between schemas in Oracle. Others might disagree but here is my reasoning:
a. A schema in Oracle is a purely logical construct. It denotes who has ownership of objects. It has nothing to do with the physical location or layout of the objects. A tablespace (orthagonal concept, as noted by a previous poster) indicates the physical location of objects. A tablespace can hold objects that are in multiple schemas and vice versa. In SQL Server these two concepts are sort of merged into one -- a database is both tablespace and schema, more or less, although in some respects within a DB in SQL Server you then have multiple owners with various object ownership. This can get a bit confusing because as I remember (it's been a couple of years) if not using NT Authentication the users are defined at the server level and then have to "link" to the users in the individual DBs.
b. I remember finding it easier, or at least a bit simpler, to assure myself that users to two separate DBs in SQL Server had no access to the relative other user's DB than I have found it in Oracle.
c. Because a DB in SQL server represents both physical storage and logical ownership, you can detach the DB and move it to another SQL Server Instance and attach it. You can't do this with a schema in Oracle. I mean, you can datapump the data out or back it up or whatever to another server and another schema, but that all takes at least some scripting and such or at least a fair amount of clicking in Enterprise Manager. It doesn't give you the one-click "Detach DB" option that you have in SQL Server which makes it a lot easier to get the idea that SQL Server DBs are units that you can more-or-less move back and forth between databases.
To sum things up, I think either option would work. That is, 1) Create two separate instances of Oracle with one schema in each instance for each application, or 2) Create two separate schemas in one Oracle instance.
There are pros and cons for each option. Option 1 is probably going to be more work to set up and configure but will also give you more separation, independence, ability to have separate hardware, etc., for each DB. Option 2 will be quite a bit simpler but gives you less separation between the data and greater risk of configuration screw-ups or other things allowing users of one schema to access the other. It also means you have to be a bit more careful that someone writing a query accessing data in one schema doesn't use all the CPU and IO resources and starve a user on the other schema.
Also, yes, you could use pluggable databases in 12c. However, given the fact that you need to ask these questions (no shame, just pointing out where you're at) makes me hesitant about recommending what can easily be a more complex setup.
TL;DR -- SQL Server isn't Oracle and Oracle isn't SQL Server. Either option works and there are pros and cons to each.
If you're using 12.1 or later with the multitenant option, you could create separate pluggable databases in a single container database. The other option, which works in any version of Oracle, would be to create a separate schema. It would be possible, as well, to create a separate database, though that is generally not the preferred approach unless you have a particular need to do things like upgrade the database that one application is using without affecting the other.
Creating a Database
If you create a separate database, you'd end up with complete separate memory structures (i.e. the SGA and PGA for each database would be separate) as well as a completely separate set of background processes (each database would have its own log writer process(es) for example). That is a very heavyweight option-- you can't have too many databases on a single server before you start having a lot of contention for RAM, for scheduling all the background processes, etc. It does provide for the maximum separation between different applications-- each database can be running a different version of Oracle with a different set of initialization parameters-- but this also tends to increase the complexity of managing the environment. This generally only makes sense when you have third party applications that require a specific version of the database or a specific set of initialization parameters.
Creating a Schema
If you create a separate schema, you still have a single database so the two schemas are sharing the same memory structures (competing with each other for space in the SGA's buffer cache, for example), initialization parameters, etc. You have to exercise a modicum of planning to ensure that that the two don't interfere with each other-- you'd probably want to make sure that nether application creates public synonyms or at least that they won't wan to create the same public synonym as the other application-- but this is generally pretty trivial.
Creating a Pluggable Database
This only works in 12.1 and only if you have the multitenant option. This is the most similar to the SQL Server concept of creating a new database for each application.
You should create a new instance (schema) on the same database, where the schema in oracle is the same as the SQL server database
Here's the scenario.
Two identical databases:
One Live database, one Archive database, they're suppose have the exact same schema (table, view, indexes, SPs, functions), the only difference is the data in the databases. The data in Live DB will be archived with some business rules and apparently the data in Archive DB will be different from in Live DB.
The challenge is that we keep on patching changes (SP change, function change, data change, or even table schema change) to the Live DB in each release. Unfortunately, the changes required on Archive DB are forgotten for a long time and the issues have just not been addressed yet. It will happen one day that the out-of-sync DBs come back and bite us.
Here's what I want to do: I want to synchronize non-data related changes from Live DB to Archive DB. Either automated or manually.
Any idea is welcome. Here are some ideas that have come to my mind:
replication? I find replication does not fit this scenario quite well.
scripting the SP/function/view changes? I can manually pull out the scripts and combine them together. What about the table schema changes? It's difficult for me to track back to find out what's happened on table schema changes.
I know there's Redgate and other product can do the job but I'd like to explore the full potential.
If anybody can point out some feasible way that'd be great.
If you are using SQL Server, Visual Studio Team System Database Edition has a schema comparison and patching tool. Have a look at this article