Copy Azure SQL Database and Change Scale - sql

We are using the command CREATE DATABASE X AS COPY OF Y to copy an Azure SQL database so that we can take a transactionally consistent backup to our local network. The database is running as a P2 so the copy is also created as a P2 and we are therefore incurring double charges due to daily rate charging of the new database sizes.
Is there any way to copy a database with a different scale setting? Or, are there other ways to take the transactionally consistent backup?

As far as I know, the current ways to get a transactionally consistent backup are either to use the COPY command, which you are doing, or to rely on the point-in-time backup/recovery provided by Microsoft. If your goal is simply to have the backup somewhere else, you may look at the Geo-Replication options (standard and active), which get the data into another Azure region. If your requirement is definitely to get a local copy, COPY + Export is pretty much your only option.
There is currently no way to perform a COPY from one database tier to another; however, you can change the tier of a database in code, so in theory you could move the copy to a lower tier immediately after the COPY completes (there is a sample on how to do this with PowerShell on MSDN using Set-AzureSqlDatabase). However, SQL Database is billed by the day, so even if you change the tier immediately, you would still be charged for the P2 instance of the copy for that day. If you are doing these COPY-Export operations daily and deleting the copy as soon as you get the export down, then you won't be saving any money. They have announced that hourly billing is coming to SQL Database along with pricing changes and some other things. It looks like the new pricing will go into effect Nov 1st, and while it's not explicit, I'm assuming that means hourly billing then as well. With hourly billing, once the copy has completed you can reduce the tier on the copy and pay the P2 rate for only that one hour; then, after you pull down the export, you can delete the copy and save money.

You can set the service objective (size) of the new database as part of the copy:
CREATE DATABASE db_copy
AS COPY OF ozabzw7545.db_original ( SERVICE_OBJECTIVE = 'P2' ) ;
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-transact-sql?view=azuresqldb-current

CockroachDB how to restore a dropped column?

I accidentally dropped a column. I have no backup set up for this single node setup. Does cockroach have any auto backup mechanism or am I screwed?
We can use time-travel queries to restore deleted data within the garbage collection window, before the data is deleted forever.
The garbage collection window is determined by the gc.ttlseconds field in the replication zone configuration.
Examples are:
SELECT name, balance
FROM accounts
AS OF SYSTEM TIME '2016-10-03 12:45:00'
WHERE name = 'Edna Barath';
SELECT * FROM accounts AS OF SYSTEM TIME '-4h';
SELECT * FROM accounts AS OF SYSTEM TIME '-20m';
I noticed that managed CockroachDB runs database backups (incremental or full) hourly and keeps them for up to 30 days. You may be able to restore the whole database from one of them.
Please note that the restore will make your cluster unavailable for its duration, and all current data is deleted.
We can also manage our own backups, including incremental, database-level and table-level backups. We need to configure a userfile location or a cloud storage location. This requires billing information.
CockroachDB stores old versions of data at least through its configured gc.ttlseconds window (default one day). There's no simple way that I know of to instantly restore, but you can do
SELECT * FROM <tablename> AS OF SYSTEM TIME <timestamp before dropping the column>
And then manually reinsert the data from there.
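Whether the time-travel approach can still work depends on whether a timestamp from before the drop falls inside the garbage collection window. A minimal Python sketch of that check, assuming you know roughly when the column was dropped and the gc.ttlseconds value of the zone; the function name and the example values are illustrative and not part of CockroachDB:

```python
from datetime import datetime, timedelta, timezone

def snapshot_still_readable(snapshot_time, now, gc_ttl_seconds):
    """Return True if snapshot_time lies inside the GC window that ends at now."""
    oldest_readable = now - timedelta(seconds=gc_ttl_seconds)
    return oldest_readable <= snapshot_time <= now

# Hypothetical timestamps: the column was dropped shortly after 12:45 UTC.
now = datetime(2016, 10, 3, 18, 0, tzinfo=timezone.utc)
just_before_drop = datetime(2016, 10, 3, 12, 45, tzinfo=timezone.utc)

print(snapshot_still_readable(just_before_drop, now, gc_ttl_seconds=86400))  # True
print(snapshot_still_readable(just_before_drop, now, gc_ttl_seconds=3600))   # False
```

This is a conservative check: old versions may survive somewhat longer than the window, but they are only retained "at least" that long, so it is safer to act before the window closes.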

Database copy limit per database reached. The database X cannot have more than 10 concurrent database copies (Azure SQL)

In our application, we have a master database 'X'. For each new client, we will create a new database copy of master database 'X'.
I am using the following SQL command which will be executed against Azure SQL server.
CREATE DATABASE [NEW NAME] AS COPY OF [MASTER DB]
We are using a custom queue tier so that we can create more than one client at a time, in parallel.
I am facing an issue in the following scenario.
I am trying to create 70 clients. Once 25 clients have been created, I get the error below.
Database copy limit per database reached. The database 'BlankDBClient' cannot have more than 10 concurrent database copies
Can you please share your thoughts on this?
SQL Azure has logic to do various operations online/automatically for you (backups, upgrades, etc). Each copy requires IO, so there are limits in place because the machine does not have infinite IOPS. (Those limits may change a bit over time as we work to improve the service, get newer hardware, etc).
In terms of what options you have, you could:
Restore N databases from a database backup (which would still have IO limits but they may be higher for you depending on your reservation size)
Consider models to copy in parallel using a single source to hierarchically create what you need (copy 2 from one, then copy 2 from each of the ones you just copied, etc)
Stage out the copies over time based on the limits you get back from the system.
Try a larger reservation size for the source and target during the copy to get more IOPS and lower the time to perform the operations.
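To get a feel for the hierarchical option in the list above, here is a small Python sketch that counts how many rounds of copying are needed if every database that already exists is used as a source in each round. The function name and the fan-out values are assumptions for illustration; the real per-source limit is whatever the service reports.

```python
def rounds_needed(target_copies, fan_out):
    """Rounds of copying needed when every existing database is copied
    fan_out times per round (the original source counts as existing)."""
    databases = 1  # the original source database
    rounds = 0
    while databases - 1 < target_copies:
        databases += databases * fan_out
        rounds += 1
    return rounds

print(rounds_needed(70, fan_out=2))   # 4 (3, 9, 27, then 81 databases)
print(rounds_needed(70, fan_out=10))  # 2 (11, then 121 databases)
```

In the last round you would only start as many copies as are still missing; the sketch just shows how quickly the number of available sources grows.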
In addition to Connor's answer, you can consider keeping a dacpac or bacpac of the master database in Azure Storage; once you have submitted 25 concurrent database copies, you can start restoring the dacpac or bacpac from Azure Storage.
You can also monitor how many database copies are showing COPYING on the state_desc column of the following queries, after sending the first batch of 25 copies, and when those queries return less than 25 rows, start sending more copies until reaching the 25 limit. Keep doing this until finishing the queue of copies required.
SELECT
    d.[name],
    d.[state_desc],
    c.[start_date],
    c.[modify_date],
    c.[percent_complete],
    c.[error_code],
    c.[error_desc],
    c.[error_severity],
    c.[error_state]
FROM sys.databases AS d
LEFT OUTER JOIN sys.dm_database_copies AS c
    ON d.[database_id] = c.[database_id]
WHERE d.[state_desc] = 'COPYING';

SELECT state_desc, *
FROM sys.databases
WHERE [state_desc] = 'COPYING';
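The "send a batch, poll, send more" loop described above can be sketched in Python as follows. This is only an outline under stated assumptions: start_copy and count_copying are hypothetical callables that you would implement by running CREATE DATABASE ... AS COPY OF and by counting the rows returned by one of the COPYING queries; here a small in-memory fake stands in for the server so the sketch runs on its own.

```python
import time
from collections import deque

def run_copy_queue(names, start_copy, count_copying, limit,
                   poll_seconds=30, sleep=time.sleep):
    """Start a copy for every name while keeping at most `limit` in flight."""
    pending = deque(names)
    while pending:
        free_slots = max(limit - count_copying(), 0)
        for _ in range(min(free_slots, len(pending))):
            start_copy(pending.popleft())
        if pending:
            sleep(poll_seconds)  # wait for some running copies to finish

class FakeServer:
    """In-memory stand-in so the sketch runs without a database."""
    def __init__(self):
        self.in_flight = 0
        self.peak = 0
        self.started = []

    def start_copy(self, name):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        self.started.append(name)

    def count_copying(self):
        return self.in_flight

    def finish_some(self, _seconds):
        self.in_flight = max(0, self.in_flight - 3)  # pretend 3 copies completed

server = FakeServer()
clients = [f"client_{i:02d}" for i in range(70)]
run_copy_queue(clients, server.start_copy, server.count_copying,
               limit=10, sleep=server.finish_some)
print(len(server.started), server.peak)  # 70 10
```

The limit is a parameter because the error message mentions 10 concurrent copies per source while the answers above work with 25; use whichever value the service actually enforces for you.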

SSIS : Huge Data Transfer from Source (SQL Server) to Destination (SQL Server)

Requirement :
Transfer millions of records from source (SQL Server) to destination (SQL Server).
Structure of source tables is different from destination tables.
Refresh data once per week in destination server.
Minimum amount of time for the processing.
I am looking for optimized approach using SSIS.
Was thinking these options :
Create Sql dump from source server and import that dump in destination server.
Directly copy the tables from source server to destination server.
Lots of issues to consider here. Such as are the servers in the same domain, on same network, etc.
Most of the time you will not want to move the data as a single large chunk of millions of records but in smaller amounts. An SSIS package handles that logic for you; you can always recreate it yourself, but SSIS makes iterating over the changes easier. This is sometimes a reason to push changes more often rather than wait an entire week, as smaller syncs are easier to manage with less downtime.
Another consideration is to be sure you understand your deltas and to ensure that you have ALL of the changes. For this reason I would generally suggest using a staging table at the destination server. By moving changes to staging and then loading to the final table, you can more easily ensure that changes are applied correctly. Think of the scenario of an increment being out of order (identity insert), datetimes ordered incorrectly, or one chunk failing. When using a staging table you don't have to rely solely on the id/date and can actually do joins on primary keys to look for changes.
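As a rough illustration of joining staging against the final table on the primary key, here is a self-contained Python sketch. It uses SQLite from the standard library purely as a stand-in for SQL Server, and the table and column names are made up; on SQL Server the same idea would normally be written as an UPDATE ... FROM plus INSERT, or a MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_orders   (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE staging_orders (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO final_orders   VALUES (1, 'new'), (2, 'shipped');
    INSERT INTO staging_orders VALUES (2, 'returned'), (3, 'new');
""")

with conn:  # apply the whole staged batch in one transaction
    # Update rows whose key already exists but whose data changed.
    conn.execute("""
        UPDATE final_orders
        SET status = (SELECT s.status FROM staging_orders s
                      WHERE s.id = final_orders.id)
        WHERE id IN (SELECT s.id
                     FROM staging_orders s
                     JOIN final_orders f ON f.id = s.id
                     WHERE f.status <> s.status)
    """)
    # Insert rows whose key is not in the final table yet.
    conn.execute("""
        INSERT INTO final_orders (id, status)
        SELECT s.id, s.status
        FROM staging_orders s
        LEFT JOIN final_orders f ON f.id = s.id
        WHERE f.id IS NULL
    """)
    conn.execute("DELETE FROM staging_orders")  # staging is empty for the next batch

rows = conn.execute("SELECT id, status FROM final_orders ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'returned'), (3, 'new')]
```

Because the comparison is done on keys rather than on an id or date watermark, a chunk that arrives late or out of order is still applied correctly.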
Linked servers, proposed by Alex K., can be a great fit, but you will need to pay close attention to a couple of things. Always do it from the destination server so that it is a pull, not a push. Linked servers are fast at querying data but horrible at updating/inserting in bulk. Not even one XML column can be in the table. You may need to set some specific properties for distributed transactions.
I have done this task both ways, and I would say that SSIS has a bit of an advantage over linked servers because of its robust error handling, threading logic, and ability to use different adapters (OLEDB, ODBC, etc.; they perform differently, and a search will turn up some results). But the key to your #4 (minimum processing time) is to do it in smaller chunks and from a staging table, and if you can do it more often it is less likely to have an impact. E.g. a daily sync would already be ~1/7th of the size of a weekly one, assuming an even daily distribution of changes.
Take 10,000,000 records changed a week.
Once weekly = 10mill
once daily = 1.4 mill
Once hourly = 59K records
Once Every 5 minutes = less than 5K records
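The figures above are simply the weekly volume divided by the number of syncs per week. A short Python sketch of that arithmetic, assuming changes are spread evenly over the week (the names are illustrative):

```python
WEEKLY_CHANGES = 10_000_000
MINUTES_PER_WEEK = 7 * 24 * 60

def rows_per_sync(interval_minutes, weekly_changes=WEEKLY_CHANGES):
    """Average rows per sync, assuming changes arrive evenly over the week."""
    syncs_per_week = MINUTES_PER_WEEK / interval_minutes
    return round(weekly_changes / syncs_per_week)

for label, minutes in [("weekly", MINUTES_PER_WEEK), ("daily", 24 * 60),
                       ("hourly", 60), ("every 5 minutes", 5)]:
    print(f"{label:>16}: {rows_per_sync(minutes):>10,} rows")
```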
And if it has to be once a week, still think about doing it in small chunks so that each insert has a smaller effect on your transaction logs, lock time on the production table, etc. Be sure that you never allow loading of partially staged/transferred data; otherwise identifying deltas could get messed up and you could end up missing changes.
One other thought, if this is a scenario like a reporting instance and you have enough server resources: you could bring over your entire table from production into a staging table (or update a copy of the table at the destination) and then simply drop the current table and rename the staging table. This is an extreme scenario and not one I generally like, but it is possible, and the actual impact on the user would be minimal.
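The drop-and-rename swap can be sketched as below. Again this uses SQLite from the Python standard library as a stand-in, with made-up table names; on SQL Server the rename step would be done with sp_rename rather than ALTER TABLE ... RENAME TO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report_data (id INTEGER PRIMARY KEY, total INTEGER);
    INSERT INTO report_data VALUES (1, 10);
    -- Fully loaded replacement table, built while users still read report_data.
    CREATE TABLE report_data_staging (id INTEGER PRIMARY KEY, total INTEGER);
    INSERT INTO report_data_staging VALUES (1, 15), (2, 20);
""")

conn.isolation_level = None  # manage the transaction explicitly
conn.execute("BEGIN")
conn.execute("DROP TABLE report_data")
conn.execute("ALTER TABLE report_data_staging RENAME TO report_data")
conn.execute("COMMIT")  # readers see either the old table or the new one

swapped = conn.execute("SELECT id, total FROM report_data ORDER BY id").fetchall()
print(swapped)  # [(1, 15), (2, 20)]
```

The expensive part (loading the staging table) happens before the transaction, so the window in which the table is unavailable is limited to the drop and rename.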
I think SSIS is good at transferring data. My approach here:
1. Create a package with one Data Flow Task to transfer the data. If the structure of the two tables is different, that's okay, just map the columns.
2. Create a SQL Server Agent job to run your package every weekend.
Also, the Track Data Changes feature (SQL Server) is worth a look. You can configure when you want to sync data, and it performs well too.
With SQL Server versions >2005, it has been my experience that a dump to a file with an export is equal to or slower than transferring data directly from table to table with SSIS.
That said, and in addition to the excellent points @Matt makes, this is the usual pattern I follow for this sort of transfer.
1. Create a set of tables in your destination database that have the same table schemas as the tables in your source system.
   - I typically put these into their own database schema so their purpose is clear.
   - I also typically use the SSIS OLE DB Destination package's "New" button to create the tables.
   - Mind the square brackets on [Schema].[TableName] when editing the CREATE TABLE statement it provides.
2. Use SSIS Data Flow tasks to pull the data from the source to the replica tables in the destination.
   - This can be one package or many, depending on how many tables you're pulling over.
3. Create stored procedures in your destination database to transform the data into the shape it needs to be in the final tables.
   - Using SSIS data transformations is, almost without exception, less efficient than using server side SQL processing.
4. Use SSIS Execute SQL tasks to call the stored procedures.
   - Use parallel processing via Sequence Containers where possible to save time.
   - This can be one package or many, depending on how many tables you're transforming.
   - (Optional) If the transformations are complex, requiring intermediate data sets, you may want to create a separate Staging database schema for this step.
5. You will have to decide whether you want to use the stored procedures to land the data in your ultimate destination tables, or if you want to have the procedures write to intermediate tables, and then move the transformed data directly into the final tables. Using intermediate tables minimizes down time on the final tables, but if your transformations are simple or very fast, this may not be an issue for you.
   - If you use intermediate tables, you will need a package or packages to manage the final data load into the destination tables.
6. Depending on the number of packages all of this takes, you may want to create a Master SSIS package that will call the extraction package(s), then the transformation package(s), and then, if you use intermediate processing tables, the final load package(s).

SSIS data import with resume

I need to push a large SQL table from my local instance to SQL Azure. The transfer is a simple, 'clean' upload - simply push the data into a new, empty table.
The table is extremely large (~100 million rows) and consists only of GUIDs and other simple types (no timestamp or anything).
I created an SSIS package using the Data Import / Export Wizard in SSMS. The package works great.
The problem is when the package is run over a slow or intermittent connection. If the internet connection goes down halfway through, then there is no way to 'resume' the transfer.
What is the best approach to engineering an SSIS package to upload this data, in a resumable fashion? i.e. in case of connection failure, or to allow the job to be run only between specific time windows.
Normally, in a situation like that, I'd design the package to enumerate through batches of size N (1k rows, 10M rows, whatever) and log to a processing table what the last successfully transmitted batch was. However, with GUIDs you can't quite partition them out into buckets.
In this particular case, I would modify your data flow to look like Source -> Lookup -> Destination. In your lookup transformation, query the Azure side and only retrieve the keys (SELECT myGuid FROM myTable). Here, we're only going to be interested in rows that don't have a match in the lookup recordset as those are the ones pending transmission.
A full cache is going to cost about 1.5GB (100M * 16bytes) of memory assuming the Azure side was fully populated plus the associated data transfer costs. That cost will be less than truncating and re-transferring all the data but just want to make sure I called it out.
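The Lookup approach above boils down to "load the keys that already arrived, then forward only the rows without a match". A minimal Python sketch of that filter and of the raw key-size estimate; the function and variable names are illustrative, and this is not how SSIS implements its Lookup transformation internally:

```python
import uuid

def pending_rows(source_rows, destination_keys):
    """Yield source rows whose key is not already present at the destination."""
    cache = set(destination_keys)  # plays the role of the full lookup cache
    for row in source_rows:
        if row[0] not in cache:
            yield row

source = [(uuid.UUID(int=i), f"payload {i}") for i in range(5)]
already_uploaded = [row[0] for row in source[:3]]  # pretend the first 3 rows arrived

remaining = [payload for _, payload in pending_rows(source, already_uploaded)]
print(remaining)  # ['payload 3', 'payload 4']

# Raw size of 100 million 16-byte GUIDs, ignoring any per-entry cache overhead.
raw_key_bytes = 100_000_000 * 16
print(f"{raw_key_bytes / 2**30:.2f} GiB")  # 1.49 GiB
```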
Just order by your GUID when uploading. And make sure you use the max(guid) from Azure as your starting point when recovering from a failure or restart.
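A rough sketch of the ordered-key approach, using two in-memory SQLite databases from the Python standard library as stand-ins for the local instance and SQL Azure. The table name, the batch size and the simulated failure are all assumptions for illustration. The approach only works if the upload order and the MAX() at the destination use the same sort order for the key.

```python
import sqlite3

def upload(source, dest, batch_size, fail_after_batches=None):
    """Copy rows in key order, resuming after the highest key already at dest."""
    last_key = dest.execute("SELECT COALESCE(MAX(id), '') FROM items").fetchone()[0]
    batches = 0
    while True:
        rows = source.execute(
            "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_key, batch_size)).fetchall()
        if not rows:
            return
        if fail_after_batches is not None and batches == fail_after_batches:
            raise ConnectionError("simulated dropped connection")
        with dest:  # each batch is committed atomically
            dest.executemany("INSERT INTO items VALUES (?, ?)", rows)
        last_key = rows[-1][0]
        batches += 1

source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
for db in (source, dest):
    db.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
with source:
    source.executemany("INSERT INTO items VALUES (?, ?)",
                       [(f"{i:032x}", f"row {i}") for i in range(10)])

try:
    upload(source, dest, batch_size=3, fail_after_batches=2)
except ConnectionError:
    pass  # two batches (6 rows) were committed before the failure

upload(source, dest, batch_size=3)  # the second run resumes after the 6th row
total = dest.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(total)  # 10
```

Committing per batch is what makes the restart safe: a batch is either fully present at the destination or not present at all, so the highest key is always a valid resume point.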

Reducing Size Of SQL Backup?

I am using SQL Express 2005 and do a backup of all DBs every night. I noticed one DB getting larger and larger. I looked at the DB and cannot see why it's getting so big! I was wondering if it's something to do with the log file?
Looking for tips on how to find out why it's getting so big when it hasn't got that much data in it. Also, how can I optimise / reduce the size?
Several things to check:
Is your database using the "Simple" recovery model? If so, it'll produce a lot fewer transaction log entries, and the backup will be smaller. Recommended for development, but not for production.
If it's using the "Full" recovery model, do you take regular transaction log backups? That should limit the growth of the transaction log and thus reduce the overall backup size.
Have you run DBCC SHRINKDATABASE(yourdatabasename) on it lately? That may help.
Do you have any log / logging tables in your database that are just filling up over time? Can you remove some of those entries?
You can find the database's recovery model by going to the Object Explorer, right-clicking your database, selecting "Properties", and then selecting the "Options" tab in the dialog.
Marc
If it is the backup that keeps growing and growing, I had the same problem. It is not a 'problem' of course, this is happening by design - you are just making a backup 'set' that will simply expand until all available space is taken.
To avoid this, you've got to change the overwrite options. In the SQL management studio, right-click your DB, TASKS - BACKUP, then in the window for the backup you'll see it defaults to the 'General' page. Change this to 'Options' and you'll get a different set of choices.
The default option at the top is 'Append to the existing media set'. This is what makes your backup increase in size indefinitely. Change this to 'Overwrite all existing backup sets' and the backup will always be only as big as one entire backup, the latest one.
(If you have a SQL script doing this, turn 'NOINIT' to 'INIT')
CAUTION: This means the backup file will only ever contain the latest backup - if you made a mistake three days ago but you only have last night's backup, you're stuffed. Only use this method if you have a backup regime that copies your .bak file daily to another location, so you can go back to any one of those files from previous days.
It sounds like you are running with the FULL recovery model and the Transaction Log is growing continuously as the result of no Transaction Log backups being taken.
In order to rectify this you need to:
1. Take a transaction log backup. (See: BACKUP (Transact-SQL))
2. Shrink the transaction log file down to an appropriate size for your needs. (See: How to use DBCC SHRINKFILE.......)
3. Schedule regular transaction log backups according to data recovery requirements.
I suggest reading the following Microsoft reference in order to ensure that you are managing your database environment appropriately.
Recovery Models and Transaction Log Management
Further Reading: How to stop the transaction log of a SQL Server database from growing unexpectedly
One tip for keeping databases small: at design time, use the smallest data type that will do.
For example, you may have a status table; do you really need the index to be an int when a smallint or tinyint will do?
Darknight
Since you take a daily FULL backup of your database, of course the backups will get big over time.
So set up a plan for yourself, such as this:
1st day: FULL
2nd day: DIFFERENTIAL
3rd day: DIFFERENTIAL
4th day: DIFFERENTIAL
5th day: DIFFERENTIAL
and then start over.
When you restore your database, restoring the FULL backup is straightforward. To restore a DIFFERENTIAL version, first restore the preceding FULL backup WITH NORECOVERY, then restore the DIFFERENTIAL you need, and you will have your data back safely.
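To make the restore order concrete, here is a small Python sketch that picks which backups to restore for a given day. It relies on the fact that a SQL Server differential backup contains all changes since the last full backup, so only the most recent differential is needed. The dates and the function name are hypothetical.

```python
from datetime import date

def restore_chain(backups, restore_to):
    """Return the backups to restore, in order: the latest FULL on or before
    restore_to, then the latest DIFFERENTIAL taken after that FULL (if any)."""
    usable = [b for b in backups if b[0] <= restore_to]
    fulls = [b for b in usable if b[1] == "FULL"]
    if not fulls:
        raise ValueError("no full backup available on or before the requested date")
    base = max(fulls)  # tuples sort by date first
    diffs = [b for b in usable if b[1] == "DIFFERENTIAL" and b[0] > base[0]]
    return [base] + ([max(diffs)] if diffs else [])

# Hypothetical five-day cycle: one FULL followed by four DIFFERENTIALs.
backups = [(date(2024, 1, 1), "FULL")] + [
    (date(2024, 1, day), "DIFFERENTIAL") for day in range(2, 6)
]

chain = restore_chain(backups, date(2024, 1, 4))
print(chain)
```

The first entry in the chain is restored WITH NORECOVERY and the last one WITH RECOVERY.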
7zip your backup file for archiving. I recently backed up a database to a 178MB .bak file. After archiving it to a .7z file it was only 16MB.
http://www.7-zip.org/
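If you would rather script the compression step than run an archiver by hand, a similar effect can be had with Python's standard lzma module. Note the assumptions: this writes an .xz file (LZMA2), not a .7z archive, and the file below is a synthetic stand-in for a real .bak, so the ratio it achieves says nothing about your backups.

```python
import lzma
import os
import shutil
import tempfile

def compress_backup(src_path, dst_path):
    """Compress src_path into an .xz file at dst_path, streaming the data."""
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

with tempfile.TemporaryDirectory() as tmp:
    bak = os.path.join(tmp, "demo.bak")
    with open(bak, "wb") as f:
        f.write(b"mostly empty database page " * 50_000)  # stand-in for a real .bak
    xz = bak + ".xz"
    compress_backup(bak, xz)

    original_size = os.path.getsize(bak)
    compressed_size = os.path.getsize(xz)
    with open(bak, "rb") as raw, lzma.open(xz, "rb") as packed:
        round_trip_ok = raw.read() == packed.read()

print(compressed_size < original_size, round_trip_ok)  # True True
```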
If you need an archive tool that handles larger file sizes more efficiently and faster than 7zip does, I'd recommend taking a look at LZ4 archiving. I have used it for archiving file backups for years with no issues:
http://lz4.github.io/lz4/