Backup database and remove sensitive data


I'm looking for a backup routine that backs up our production database with sensitive data stripped out of certain columns, so that the result can be exported to our testing server.
The routine should require as little human intervention as possible and ideally be a simple, customisable SQL script that does not take the production database offline.
Database server is SQL Server 2008.

I've run into similar requirements before, and the only sure solution I know of is to use a copy of your production database. You can mask/delete data on the copy and run backups from there. Yes it's ugly and a waste of resources, but to date I haven't found a solid alternative for this particular problem.
As for the copy method, you do have some options:
Replication
Scheduled DB copy
Backup/restore from production
So while I admit this solution is pretty cringe-worthy, it can be automated and serve your purposes. If you can find productive uses for the database copy that don't require your deleted information (e.g. reports, testing, development) then this can actually be a less-than-terrible solution. It can be a nice security boon to have a slightly out-of-date version of your production database with sensitive data removed.
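A minimal sketch of the backup/restore variant, assuming T-SQL on SQL Server 2008. Every name here (ProdDb, ProdDb_Scrubbed, dbo.Customers and its columns, the logical file names, the D:\ paths) is hypothetical and would need to be replaced with your own; it is one way to automate the idea, not the only one.

```sql
-- All database, table, column, file and path names below are assumptions.

-- 1. Back up production online, without disturbing the regular backup chain.
BACKUP DATABASE ProdDb
TO DISK = N'D:\Backups\ProdDb_copy.bak'
WITH COPY_ONLY, INIT;
GO

-- 2. Restore the backup under a different name (same or another server).
--    The logical file names must match those of the source database.
RESTORE DATABASE ProdDb_Scrubbed
FROM DISK = N'D:\Backups\ProdDb_copy.bak'
WITH MOVE N'ProdDb'     TO N'D:\Data\ProdDb_Scrubbed.mdf',
     MOVE N'ProdDb_log' TO N'D:\Data\ProdDb_Scrubbed_log.ldf',
     REPLACE;
GO

-- 3. Overwrite the sensitive columns in the copy only.
UPDATE ProdDb_Scrubbed.dbo.Customers
SET Email      = N'user' + CAST(CustomerId AS nvarchar(20)) + N'@example.com',
    CardNumber = NULL;
GO

-- 4. Back up the scrubbed copy; this is the file that goes to the test server.
BACKUP DATABASE ProdDb_Scrubbed
TO DISK = N'D:\Backups\ProdDb_Scrubbed.bak'
WITH INIT;
GO
```

Saved as a script, this could be scheduled as a SQL Server Agent job so that no manual step is needed.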

If you just want to take a backup, run:
BACKUP DATABASE Dbname TO DISK = N'Dbname.bak'
If you need other options, you can add them to the statement.
Because the file name is given without a path, the backup file is written to the default backup directory of SQL Server 2008.

Related

How to manage/ track changes to SQL Server database without compare tool

I'm working on a project as an outsourced developer, and I don't have access to the testing and production servers, only the development environment.
To deploy changes I have to create SQL scripts containing the changes to make on each server for the feature I wish to deploy.
Examples:
When I make a change to the database, I save the script to a folder, but sometimes this is not enough: I once sent a script to alter a view but forgot to include the new tables that I had created for another feature.
Another situation is changing a table via the SSMS GUI, forgetting to create a script with the changed or new columns, and later having to send a script to update the table in testing.
Since some features are sent for testing and others go straight to production (for example, queries that feed Excel files), it's hard to keep track of what I have to send to each environment.
Since the deployment team just executes the scripts I send them to update the database, how can I manage and keep track of changes to a SQL Server database without a compare tool?
[Edit]
The tools I currently use are SSMS, VS 2008 Professional and TFS 2008.

I can tell you how we at xSQL Software do this using our tools:
The deployment team has an automated process that takes a schema snapshot of the staging and production databases and dumps the snapshots nightly on a share that the development team has access to.
Every morning the developers have up-to-date schema snapshots of the production and staging databases available. They use our Schema Compare tool to compare the dev database with the staging/production snapshot and generate the change scripts.
Note: to take the schema snapshot you can either use the Schema Compare tool or our Schema Compare SDK.

I'd say you can keep structural copies of the test and production databases as additional development databases, and make a point of applying each change to them whenever you send something out.
On these databases you can create triggers that capture all DDL events and write them to a table with getdate() attached. With that you should be able to track changes fairly easily, and a simple compare also becomes easier to apply.
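A rough sketch of such a trigger, assuming SQL Server 2005 or later. The table name dbo.DdlChangeLog and the trigger name trg_LogDdlChanges are made up for illustration.

```sql
-- Hypothetical audit table; adjust names and column sizes as needed.
CREATE TABLE dbo.DdlChangeLog (
    LogId       int IDENTITY(1,1) PRIMARY KEY,
    EventTime   datetime      NOT NULL DEFAULT GETDATE(),
    LoginName   nvarchar(128) NOT NULL,
    EventType   nvarchar(128) NULL,
    ObjectName  nvarchar(256) NULL,
    CommandText nvarchar(max) NULL
);
GO

-- Database-level DDL trigger: fires for CREATE/ALTER/DROP of tables,
-- views, procedures and other database-scoped objects.
CREATE TRIGGER trg_LogDdlChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;

    -- EVENTDATA() returns an XML description of the statement that fired the trigger.
    DECLARE @e xml;
    SET @e = EVENTDATA();

    INSERT INTO dbo.DdlChangeLog (LoginName, EventType, ObjectName, CommandText)
    VALUES (
        ORIGINAL_LOGIN(),
        @e.value('(/EVENT_INSTANCE/EventType)[1]',  'nvarchar(128)'),
        @e.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(256)'),
        @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)')
    );
END;
GO
```

Before sending a deployment, something like SELECT EventTime, EventType, ObjectName, CommandText FROM dbo.DdlChangeLog ORDER BY EventTime lists everything that changed, including edits made through the SSMS GUI. Note that anyone making DDL changes needs permission to insert into the log table.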

Look into Liquibase, especially its SQL format, and see if that gives you what you want. I use it for our database and it's great.
You can store all your objects in separate scripts, but when you do a Liquibase "build" it will generate one SQL script with all your changes in it. The really important part is getting your Liquibase configuration to put the objects in the correct dependency order. For example, tables have to be created before the foreign key constraints that reference them.
http://www.liquibase.org/
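For a feel of the SQL format, here is a small changelog sketch. The author name "dev", the changeset ids and the tables are all hypothetical; the special comments are what Liquibase reads, and the changesets are applied in file order, which is how the table-before-foreign-key dependency is kept.

```sql
--liquibase formatted sql

--changeset dev:1
CREATE TABLE customer (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
--rollback DROP TABLE customer;

--changeset dev:2
CREATE TABLE customer_order (
    id          INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL
);
--rollback DROP TABLE customer_order;

--changeset dev:3
ALTER TABLE customer_order
    ADD CONSTRAINT fk_order_customer
    FOREIGN KEY (customer_id) REFERENCES customer (id);
--rollback ALTER TABLE customer_order DROP CONSTRAINT fk_order_customer;
```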

What's your process for dealing with database schema changes in a dev team?

Here's a more general question on how you handle database schema changes in a development team.
We are a team of developers and the databases used during development are running locally on everyone's box as we want to avoid the requirement to have web access all the time. So running a single central database instance somewhere is not a real option.
Whenever one of us decides that it is time to extend/change the db schema, we mail database files (MYI/MYD) or SQL files to execute around, or give others instructions on the phone what they need to do to get the changed code running on their local DBs. That's not the perfect approach for sure. The same problem arises when we need to adjust the DB schema on staging or production once a new release is ready.
I was wondering ... how do you guys handle this kind of stuff? For source code, we use SVN.
Really appreciate your input!
Thanks,
Michael

One approach we've used in the past is to script the entire DDL for the database, along with any test/setup data needed. Store that in SVN, then when there's a change, any developer can pull down the changes, drop the database, and rebuild it from the script files.

At the very least you should have the scripts of all the objects in the database (tables, stored procedures, etc) under source control.
I don't think mailing schema changes is a real option for a professional development team.

We had a system on one of my previous teams that was the best I've encountered for dealing with this situation.
The nightly build of the application included a build of a database (SQL Server). The database got built to the Test DB server. Each developer then had a DTS package (this was a while ago, and I'm sure they upgraded to SSIS packages) to pull down that nightly DB build to their local DB environment.
This kept the master copy in one location and put the onus on the developers to keep their local dev databases fresh.

At my work, we deal with pretty large databases that are time-consuming to generate, so for us, starting from scratch with a new DB isn't ideal. Like Harper, we have our DDL in SVN. Additionally, we store a version number in a database table. Every check-in that changes the DB must be accompanied by a script that:
Will upgrade the database schema and modify any existing data appropriately, and
Will update the version number in the database.
Further, we number the scripts and database versions such that a script we've written knows how to upgrade further along a branch or from an older branch to a newer one without any input from the developer (apart from the database name and the directory to the upgrade scripts).
Thus, if I've got a copy of a customer's 4GB DB that's from a year old version and I want to test how their data will work with the version we cut yesterday, I can just run our script and let it handle the upgrades rather than having to start from scratch and redo every INSERT, UPDATE and DELETE performed since the database was created.
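As an illustration only, one such upgrade step might look like the following in T-SQL. The single-row table dbo.SchemaVersion, the version numbers 41 and 42, and the dbo.Orders change are invented for the example; the answer does not say how its scripts are actually written.

```sql
-- Upgrade step 41 -> 42 (all object names and numbers are hypothetical).
SET XACT_ABORT ON;  -- roll the whole step back if any statement fails

IF (SELECT VersionNumber FROM dbo.SchemaVersion) = 41
BEGIN
    BEGIN TRANSACTION;

    -- Schema change.
    ALTER TABLE dbo.Orders ADD ShippedDate datetime NULL;

    -- Data fix-up. Run as dynamic SQL so the new column is only resolved
    -- after it has been added; referencing it directly in this batch
    -- would fail at compile time.
    EXEC (N'UPDATE dbo.Orders SET ShippedDate = OrderDate WHERE Status = ''Shipped'';');

    -- Record the new version so the step is never applied twice.
    UPDATE dbo.SchemaVersion SET VersionNumber = 42;

    COMMIT TRANSACTION;
END;
GO
```

A driver script can then run the numbered files in order; each step is a no-op unless the database is at exactly the version it expects.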

We have a non-SQL description of the database schema. When the application starts, it compares the desired database schema with the actual database schema, and performs whatever ADD TABLE, ADD COLUMN, ADD INDEX, etc. statements it needs to do to get the database to look right.
This doesn't handle every case; sometimes you have to delete the database and recreate it if you've changed something that the schema resolver can't handle, but most of the time we don't need to worry about it.
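The answer's resolver lives in application code, but the same "add whatever is missing" idea can be approximated in plain SQL. This is a sketch in T-SQL with made-up table, column and index names, not the author's implementation:

```sql
-- Each block checks the actual schema and only adds what is missing,
-- so the script can be run repeatedly. All names are hypothetical.

IF OBJECT_ID(N'dbo.Customers', N'U') IS NULL
    CREATE TABLE dbo.Customers (
        CustomerId int NOT NULL PRIMARY KEY,
        Name       nvarchar(100) NOT NULL
    );
GO

-- COL_LENGTH returns NULL when the column does not exist.
IF COL_LENGTH(N'dbo.Customers', N'Email') IS NULL
    ALTER TABLE dbo.Customers ADD Email nvarchar(256) NULL;
GO

IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = N'IX_Customers_Name'
                 AND object_id = OBJECT_ID(N'dbo.Customers'))
    CREATE INDEX IX_Customers_Name ON dbo.Customers (Name);
GO
```

Like the resolver described above, this only covers additive changes; renames and type changes still need a hand-written step.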

I'd certainly keep the database schema in source code control.
At my present job, every time there's a schema change, we write the SQL for the change (alter table xyz add column ...) and put it in SVN. Then developers can update test databases by running this script. It's pretty clumsy but it works.
At a previous job I wrote some code that at application start-up would automatically compare the actual database schema to what it expected, and if it was not up to date perform the updates. Mostly this was done for deployment reasons: When we shipped new copies of the software, it would then automatically update the user's database. But it was also handy for developers.
I think there should be some generic SQL tool to do this. Maybe there is, but I've never seen one.

How to Sql Backup or Mirror database?

We are not hosting our databases. Right now, one person manually creates a .bak file from the production server. The .bak file is then copied to each developer's PC. Is there a better approach that would make this process easier? I am working on a build project for our team right now, and I am thinking about adding the .bak file to SVN so that each person has the correct local version. I tried generating a SQL script, but it contains only the schema and no data.

Developers can't share a single dev database?
Adding the .bak file to SVN sounds bad. That's going to keep every version of it forever - you'd be better off (in most cases) leaving it on a network share visible by all developers and letting them copy it down.
You might want to use SSIS packages to let developers make ad hoc copies of production.
You might also be interested in the Data Publishing Wizard, an open source project that lets you script databases with their data. But I'd lean towards SSIS if developers need their own copy of the database.

If the production server has online connectivity to your site you can try the method called "log shipping".
This entails creating a baseline copy of your production database, then taking chunks of the transaction log written on the production server and applying the (actions contained in) the log chunks to your copy. This ensures that after a certain delay your backup database will be in the same state as the production database.
Detailed information can be found here: http://msdn.microsoft.com/en-us/library/ms187103.aspx
Since you mentioned SQL 2008 among the tags: as far as I remember, SQL Server 2008 has some built-in support for setting this up.
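Done by hand, the mechanics look roughly like this. It is a sketch with hypothetical database names, share paths and file names, and it assumes the production database uses the full (or bulk-logged) recovery model; the built-in log shipping feature automates the same steps with scheduled jobs.

```sql
-- On the production server: take the baseline once.
BACKUP DATABASE ProdDb
TO DISK = N'\\fileshare\logship\ProdDb_full.bak'
WITH INIT;
GO

-- On the copy server: restore the baseline but leave it able to accept logs.
RESTORE DATABASE ProdDb
FROM DISK = N'\\fileshare\logship\ProdDb_full.bak'
WITH NORECOVERY;
GO

-- Repeatedly, on production: back up the next chunk of the transaction log.
BACKUP LOG ProdDb
TO DISK = N'\\fileshare\logship\ProdDb_0001.trn'
WITH INIT;
GO

-- Repeatedly, on the copy: apply each log backup in sequence.
-- STANDBY keeps the copy readable (read-only) between restores.
RESTORE LOG ProdDb
FROM DISK = N'\\fileshare\logship\ProdDb_0001.trn'
WITH STANDBY = N'D:\Data\ProdDb_undo.dat';
GO
```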

You can create a scheduled backup and restore.
You don't have to use a developer PC for the backup, because SQL Server has its own backup folder that you can use.
You can also keep a restore script in one location and run it for each PC, if the developers want to hold the database on their local systems.
RESTORE DATABASE [xxxdb] FROM
DISK = N'\xxxx\xxx\xxx\xxxx.bak'
WITH FILE = 1, NOUNLOAD, REPLACE, STATS = 10
GO

Check out SQL Source Control from RedGate; it can be used to keep schema and data in sync with a source control repository (the docs say it supports SVN). It supports a database on a centrally deployed server as well as on many developer machines.
Scripting out the data probably won't be a fun time for everyone depending on how much data there is, but you can also select which tables you're going to do (like lookups) and populate any larger business entity tables using SSIS (or data generator for testing).

How do you upload SQL Server databases to shared hosting environments?

We have a common problem of moving our development SQL 2005 database onto shared web servers at website hosting companies.
Ideally we would like a system that transfers the database structure and data as an exact replica.
This would be commonly achieved by restoring a backup. But because they are shared SQL servers, we cannot restore backups – we are not given access to the actual machine.
We could generate a script to create the database structure, but then we could not do a data transfer through the menu item Tasks/Import Data because we might violate foreign key constraints as tables are imported in an order that conflicts with the database schema. Also, indexes might not be replicated if they are set to auto generate.
Thus we are left with a messy operation:
Create a script in SQL 2005 that generates the database in SQL 2000 format.
Run the script to create a SQL 2000 database in SQL 2000.
Create a script in SQL 2000 that generates the database structure WITHOUT indexes and foreign keys.
Run this script on the production server. You now have a database structure to upload data to.
Use SQL 2005 to transfer the data to the production server with Tasks/Import data.
Use SQL 2000 to generate a script that creates the database with indexes and keys.
Copy the commands that generate the indexes and foreign keys only. These are located after the table creation commands. Note: In SQL 2005, the indexes and foreign keys are generated as one and cannot be easily separated.
Run this script on the production database.
Voila! The database is uploaded with all data and keys/constraints in place. What a messy and error prone system.
Is there something better?

Scott Gu has written a few posts on this topic:
SQL Server Database Publishing Toolkit for Web Hosting

Generation scripts are fine for creating the database objects, but not for transporting database information. For example, client-specific databases where the developer is required to pre-populate some data.
One of the issues I've run into with this is the new MAX types in SQL Server 2005+. (nvarchar(max), varchar(max), etc.) Of course, this is worse when you are actually using Sql Server Express, which doesn't allow for exporting other than creating your own scripts to create the data.
I would recommend switching to a hosting company that allows you to have the ability to FTP backup files and does NOT require you to use your own scripts. That's the whole point of SQL Server, right? To provide more tools that are friendlier to use. If the hosting company takes that away, you may as well move to MySql for its ease in dumping information.
WebHost4Life is a life saver in this category. They offer FTP to the database server to upload your backup file or MDF and LDF files for attachment! I was so upset when I saw GoDaddy had the similar restriction you mentioned. Their tool didn't tell me it was a bad import, and I couldn't figure out why my site was coming back with 500 errors.
One other note: I'm not sure which is considered more secure. I enabled external connections in GoDaddy and connected with Management Studio, and I was able to see every database on that server! I couldn't access them, but I now have that info. A double whammy is that GoDaddy requires that the user name for the DB be the same as the DB name! Now all an attacker needs to do is spam passwords against those hundreds of DBs!
Webhost4life, on the other hand, has only your specific database shown in Management Studio. And they let you pick your own DB name and user name, independent of each other. They only append the same unique id on the end of the user & db names in order to keep them from conflicting with others.

You should not rely on restoring backups for copying / transferring databases. You need to use scripts - trust me you will get better at it.

I have used the RedGate Compare tools with shared hosting and it works well.

Database-generation scripts are messy, but they also have several advantages that ... well, make the pain more tolerable.
First, if you treat the DB scripts as real programming tasks in and of themselves, you can encapsulate the messiness. If you generate a script once (using a database tool), you can split the table structure aspects from the constraint aspects (keys, indices, etc.). Similarly, you can export the data once, but split it into "system" data that's not frequently changed but is necessary for correct operation (stuff like tax or shipping rates, etc.), "test" data that's easily identifiable, and "operational" data that needs to be moved from DB version Old to DB version New (last week's Orders).
The first 3 minutes after you've accomplished that, things are wonderful: you can regenerate a new database with or without test data in a few minutes. Unfortunately, after 3 minutes, the databases are out of synch, at least in terms of data, if not quite as frequently in terms of structure.
I personally like to have each table's structure as a separate SQL file (and its constraints as a separate file in a separate directory, and its test data in one file, its system data in another, etc.). On the one hand, this means that several different files have to be touched when making a change, but on the other hand, it makes it much easier to see the granularity of what's been changed: it's all right there in the version control logs. (I could probably be convinced that many-files is a mistaken strategy...)
All of this is predicated on the assumption that you have some facility for actually running a complex script involving many files and are not just constrained to some Web-based control panel, which may be what you're describing when you say "we are not given access to the actual machine." I feel that you can't do custom software development and not have some kind of shell access on the server; the hosting business is competitive enough that you can certainly find a script-friendly host easily enough.
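If sqlcmd (or SSMS in SQLCMD mode) is available, one possible way to drive a many-files layout like this is a small master script. The directory layout and file names below are assumptions made for the example:

```sql
-- build.sql: master script that pulls in one file per concern.
-- Paths are resolved relative to the directory sqlcmd is started from.
:on error exit

-- 1. Table structures, without keys or indexes.
:r .\tables\Customers.sql
:r .\tables\Orders.sql

-- 2. System data that the application needs in order to run.
:r .\data\system\ShippingRates.sql

-- 3. Optional, easily identifiable test data.
:r .\data\test\Customers.sql

-- 4. Keys, indexes and foreign keys last, once the data is in place.
:r .\constraints\Customers.sql
:r .\constraints\Orders.sql
```

It could be run with something like sqlcmd -S myserver -d MyDb -E -i build.sql, where the server and database names are placeholders. Loading the data before the constraints avoids the import-order problem described in the question.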

Check whether the web hosting company provides myLittleBackup.
This is definitely the easiest solution to "install" a db from the development server to the shared sql server.

Answer for SQL Server 2008 users.
I had the same exact issue as OP but I was using SQL Server 2008 and my shared hosting company is GoDaddy. Here's the solution to copy DB + the data to GoDaddy database...
In Visual Studio 2010, go to Server Explorer (in VS Express, I think it's called database explorer). Right click on database and select Publish to Provider ... this opens the Database Publishing Wizard ... go thru the wizard and it'll create a xxx.sql file on your local computer ...
Open SQL Server Management Studio and connect to the GoDaddy database (you should have already created this via the GoDaddy control panel within their website) ...
Open windows explorer and find the xxx.sql file and double click it. The script should open up in SSMS. Execute the script "within the proper database" ... voila, done.

How can I maintain consistent DB schema across 18 databases (sql server)?

We have 18 databases that should have identical schemas, but don't. In certain scenarios, a table was added to one, but not the rest. Or, certain stored procedures were required in a handful of databases, but not the others. Or, our DBA forgot to run a script to add views on all of the databases.
What is the best way to keep database schemas in sync?

For legacy fixes/cleanup, there are tools, like SQLCompare, that can generate scripts to sync databases.
For .NET shops running SQL Server, there is also the Visual Studio Database Edition, which can create change scripts for schema changes that can be checked into source control, and automatically built using your CI/build process.

SQL Compare by Red Gate is a great tool for this.

SQLCompare is the best tool that I have used for finding differences between databases and getting them synced.
To keep the databases synced up, you need to have several things in place:
1) You need policies about who can make changes to production. Generally this should only be the DBA (DBA team for larger orgs) and 1 or 2 backups. The backups should only make changes when the DBA is out, or in an emergency. The backups should NOT be deploying on a regular basis. Set database rights according to this policy.
2) A process and tools to manage deployment requests. Ideally you will have a development environment, a test environment, and a production environment. Developers should do initial development in the dev environment, and have changes pushed to test and production as appropriate. You will need some way of letting the DBA know when to push changes. I would NOT recommend a process where you holler to the next cube. Large orgs may have a change control committee and changes only get made once a month. Smaller companies may just have the developer request testing, and after testing is passed a request for deployment to production. One smaller company I worked for used Problem Tracker for these requests.
Use whatever works in your situation and budget, just have a process, and have tools that work for that process.
3) You said that sometimes objects only need to go to a handful of databases. With only 18 databases, probably on one server, I would recommend making the objects in each database match exactly. Only 5 DBs need usp_DoSomething? So what? Put it in every database. This will be much easier to manage. We did it this way on a 6 server system with around 250-300 DBs. There were exceptions, but they were grouped. Databases on server C got this extra set of objects. Databases on Server L got this other set.
4) You said that sometimes the DBA forgets to deploy change scripts to all the DBs. This tells me that s/he needs tools for deploying changes. S/he is probably taking a SQL script, opening it in Query Analyzer or Management Studio (or whatever you use) and manually going to each database and executing the SQL. This is not a good long-term (or short-term) solution. Red Gate (makers of SQLCompare above) have many great tools. MultiScript looks like it may work for deployment purposes. I worked with a DBA who wrote his own tool in SQL Server 2000 using osql. It would take a SQL file and execute it on each database on the server. He had to execute it on each server, but it beat executing on each DB. I also helped write a VB.NET tool that would do the same thing, except it would also go through a list of servers, so it only had to be executed once.
5) Source Control. My current team doesn't use source control, and I don't have enough time to tell you how many problems this causes. If you don't have some kind of source control system, get one.
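For point 4, a home-grown "run it everywhere" loop can be sketched in plain T-SQL. This is not the tool described above; the Client_ naming pattern and the dbo.Customers change are invented for the example.

```sql
-- Apply one change to every matching database on this server.
DECLARE @sql nvarchar(max), @db sysname, @exec nvarchar(300);

-- The change to deploy (hypothetical). Written so that re-running it is harmless.
SET @sql = N'IF COL_LENGTH(N''dbo.Customers'', N''Region'') IS NULL
    ALTER TABLE dbo.Customers ADD Region nvarchar(50) NULL;';

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE N'Client[_]%'      -- assumed naming convention
      AND state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Calling sp_executesql through a three-part name runs the
    -- statement in the context of that database.
    SET @exec = QUOTENAME(@db) + N'.sys.sp_executesql';
    PRINT N'Applying change to ' + @db;
    EXEC @exec @sql;

    FETCH NEXT FROM db_cursor INTO @db;
END;

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

Because the list comes from sys.databases, a newly added database cannot be forgotten as long as it follows the naming convention.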

I haven't got enough reputation to comment on the above answer but the pro version of SQL Compare has a scriptable API. Given that you have to replicate stuff to all of these databases you could use this to make an automated job to either generate the change scripts or to validate that the databases are all in sync. It's also not much more expensive than the standard version.
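If the tool isn't available, a very rough "are they in sync" check can also be done in plain T-SQL. To be clear, this is not the SQL Compare API, just a cross-database query; MasterDb and Client01 are hypothetical database names, with MasterDb standing in for whichever database you treat as the reference.

```sql
-- Columns that exist in the reference database but are missing
-- (or have a different data type) in the other database.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM   MasterDb.INFORMATION_SCHEMA.COLUMNS
EXCEPT
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM   Client01.INFORMATION_SCHEMA.COLUMNS;

-- Tables, views, procedures and functions present in the reference
-- database but missing from the other one.
SELECT name, type_desc
FROM   MasterDb.sys.objects
WHERE  type IN ('U', 'V', 'P', 'FN', 'IF', 'TF')
EXCEPT
SELECT name, type_desc
FROM   Client01.sys.objects
WHERE  type IN ('U', 'V', 'P', 'FN', 'IF', 'TF');
```

Empty result sets mean nothing from the reference is missing; swapping the two halves shows extra objects on the other side. It does not compare object definitions, so a stored procedure with the same name but different code goes unnoticed.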

Aside from using database comparison tools, with 18 databases you should have a DBA, so enforce a policy that only the DBA can change tables at the database level by restricting access to CREATE and ALTER to the DBA only. On both your test and live databases. The dev database shouldn't have this, of course! Make the developers who have been creating or altering the schemas willy-nilly go via the DBA.
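One way such a restriction might be expressed on the test and live databases, as a sketch only. The role name developers, the user some_dev_user and the choice of the dbo schema are assumptions, and the user must already exist in the database.

```sql
-- Run in each test and live database (all names are hypothetical).
CREATE ROLE developers;
GO

-- Block creation of new objects...
DENY CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE FUNCTION TO developers;

-- ...and changes to existing objects in the dbo schema.
DENY ALTER ON SCHEMA::dbo TO developers;
GO

-- sp_addrolemember is the SQL Server 2005/2008 way to add a user to a role.
EXEC sp_addrolemember N'developers', N'some_dev_user';
GO
```

Members of the sysadmin server role bypass these checks, so the DBA can still deploy.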

Create a single source-controlled DDL/SQL script for each release and only use it to update the databases. The diff tools can be useful but mainly for checking that you haven't made a mistake and getting out of trouble when the policies fail. Combine the DDL, SQL, and stored procedure scripts into a single script so that it's not easy to "forget" to run one of the scripts.

We have got a tool called DB Schema Difftective that can compare and sync database schemas. With our other tool, DB MultiRun you can easily deploy generated (sync) scripts to multiple db servers (project based).

I realize this post is old, but TurnKey is correct. If you are a developer working in a team environment, the best way to maintain a database schema for a large application is to make updates to a Master Schema in whatever source control system you use. Simply write your own scripting class and your database will be perfect every time.
