Service Fabric backup, partition ID

If I have a stateful Service Fabric service split into three partitions and want to back up to an external source, I can incorporate the partition ID of the service into the folder name of the external source containing the backup. When a restore is required, the service instance can pass its partition ID to the backup provider and pull down the data for that particular partition.
My concern is that if there is a catastrophic failure and the Service Fabric cluster needs to be rebuilt, the partitions will no longer have the same partition IDs (they appear to be GUIDs), in which case the restore process will not be able to find the backup for the new partition ID.
What is the recommended way to deal with this?
Instead of the partition ID, I am currently using the partition key; is this OK?

Yes that makes sense and you're doing it right. You can take a backup from one partition and restore it on a different partition in another service.
The target service must use the same partition count and type.
This way you can also use backup and restore to make a copy of an existing service, for example for debugging purposes.
Code from this project can help you create and restore SF backups, and help you store the data in an external store.

Clean up and prevent excessive data accumulation in a MobileFirst Analytics 8.0 environment

Our analytics data is taking up almost 100% of the disk space on the file system. How do we remove the older data and prevent such a situation from occurring again?
You can follow this URL, https://mobilefirstplatform.ibmcloud.com/tutorials/en/foundation/8.0/installation-configuration/production/server-configuration/#setting-up-jndi-properties-for-mobilefirst-server-web-applications, to set up JNDI properties in MobileFirst. You need to
set the TTL values based on your business requirements, and keep the values as short as possible, so that a huge accumulation of data does not occur again. To clean up the existing data, you can perform the following:
Set up the Analytics server with the JNDI properties for TTL and any other configuration.
Stop the Analytics server.
Delete the contents of the /analyticsData directory to discard the existing data, so that no directories remain inside analyticsData. Note:
/analyticsData is the default location; please refer to
http://mobilefirstplatform.ibmcloud.com/tutorials/en/foundation/8.0/installation-configuration/production/analytics/configuration/ to verify the actual value in your environment.
Restart the Analytics server. (The index will now be created from scratch with the TTL in effect, so data is purged properly going forward.)

Delete records from SQL Server backup file

Deleting records from a backup sounds like an insane idea, since the whole point of a backup is to help in a disaster. But in our case, data deletion is a valid use case.
Requirement: in brief, we need a system that can delete a specific record from an active database instance and from all of its backups.
We have a fully functional internal system that can delete data from the active database, as required. What we don't know is how to do the same against all of the database backups.
Question:
Is it possible to find a specific record in a backup?
Is there any predefined schema or data allocation style within a SQL Server backup file that would allow us to isolate a specific record?
Can you share any thoughts or experience you have on such style of deletion?
Note: we take two full backups daily and keep a week's worth (14 in total) at any point in time.
I do understand the business concept of "deleted everywhere".
I do not know of any way to do this. I do not believe the format of the backup is even published. That doesn't mean that someone hasn't hacked it, but it certainly isn't a broadly known capability.
I think that, in order to do this, you will need to securely wipe all copies of backups and take new backups. You then lose the point in time recovery capability.
Solution: The way I would address this business requirement is to restore each backup, delete the desired record(s), securely wipe the backup media (or destroy the old media and use new media), and then take a new backup of THAT recovered version. That gives you a point-in-time recovery of that data without the specific record(s).
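A minimal T-SQL sketch of that restore, delete, re-backup cycle might look like the following; database, table, key, and file names are placeholders, and the secure wipe of the old media has to happen outside SQL Server:

```sql
-- Restore the old backup under a new name (placeholder paths and logical names;
-- check the real logical names with RESTORE FILELISTONLY).
RESTORE DATABASE SalesDB_Restored
    FROM DISK = N'D:\Backups\SalesDB_20240101.bak'
    WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB_Restored.mdf',
         MOVE N'SalesDB_log' TO N'D:\Data\SalesDB_Restored.ldf',
         RECOVERY;

-- Remove the record(s) that must be "deleted everywhere" (hypothetical key).
DELETE FROM SalesDB_Restored.dbo.Customer
WHERE CustomerId = 12345;

-- Take a fresh backup of the cleaned copy; this replaces the old .bak,
-- which then needs to be securely wiped or destroyed.
BACKUP DATABASE SalesDB_Restored
    TO DISK = N'D:\Backups\SalesDB_20240101_clean.bak'
    WITH INIT;
```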
You can't modify the contents of a .bak file. You shouldn't want to do that either. If you want to restore to a specific point in time you should use the Full recovery model and take differential and log backups instead of just full backups.
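For the point-in-time side of this, a rough sketch of the full-recovery backup routine referred to above (database names, paths, and schedules are illustrative):

```sql
-- Switch to the full recovery model so log backups become possible.
ALTER DATABASE SalesDB SET RECOVERY FULL;

-- Periodic full backup.
BACKUP DATABASE SalesDB TO DISK = N'D:\Backups\SalesDB_full.bak' WITH INIT;

-- More frequent differential backup.
BACKUP DATABASE SalesDB TO DISK = N'D:\Backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL, INIT;

-- Frequent log backups enable point-in-time restore between the above.
BACKUP LOG SalesDB TO DISK = N'D:\Backups\SalesDB_log.trn' WITH INIT;
```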

Master data services deployment

What is the best approach to keep Production, Dev, and Test environments in sync?
We have a Master Data Services database in our Development, Test, and Production environments. Data is being entered into Production, and we need to keep our Test and Development servers in sync. I couldn't find documentation on how to handle this.
I am not sure if this process is correct.
For moving updated data from Development, we follow this process:
create a second version of the model, make the changes in it, and then deploy the second version to Test and Production.
Can we use this same process from Production to Test and Development to keep them in sync?
Thanks
Two options come to mind:
Snapshot replication
Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. When synchronization occurs, the entire snapshot is generated and sent to Subscribers.
Log shipping
SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances. The transaction log backups are applied to each of the secondary databases individually.
MDS has a tool called MDSModelDeploy. You can create a package with all business rules, schema, and data, ship it over to some other machine, and then (example commands below):
clone the model (preserving keys, etc.)
update the model
More information here
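For reference, the MDSModelDeploy calls involved look roughly like this (run from the MDS Configuration folder; service, model, version, and package names are placeholders, and the exact switches can vary by SQL Server version):

```
:: On the source server: package the model including its data.
MDSModelDeploy createpackage -service MDS1 -model Customer -version VERSION_1 -package Customer.pkg -includedata

:: On the target server: the first deployment creates a clone that preserves keys.
MDSModelDeploy deployclone -service MDS1 -package Customer.pkg

:: Subsequent refreshes update the existing clone with the packaged data.
MDSModelDeploy deployupdate -service MDS1 -package Customer.pkg -version VERSION_1
```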

Staging in ETL: Best Practices?

Currently, the architecture I work with uses a few data sources, one of which is staged locally because it's hosted in the cloud. The others are hosted locally anyway, so my ETL reads them directly from the source. I don't really see the point in creating a stage for the other sources.
1) Is there a distinct benefit to duplicating the locally hosted source into a local stage?
2) Is it a better idea to host the stage on a separate machine or the same one as the Warehouse?
3) If I'm trying to reduce my ETL time, what's a good way to do so? I was considering partitioning my data so that the important information is pulled more frequently than the "archived data". Is this a good approach, and what are my alternatives?
#omgitsdev There are a few concepts I would like to clarify.
Your files can be hosted anywhere, locally or in the cloud.
The files are first loaded into temporary tables before being loaded into your data warehouse. This process is called staging.
Conceptually you can have your staging area anywhere; however, to reduce connectivity issues, we create a separate schema in the same database and stage the data there (see the sketch after this list). This ensures that your performance is not hampered by connectivity issues.
You generally partition your fact table by the column that holds the date; this is simpler, and the most recent partitions hold the latest data.
Based on the volume, you make it a monthly, quarterly, or yearly partition; there are situations where we also create daily or hourly partitions.
Your performance can also be accelerated by ensuring that the staging tables are on a separate disk from the data warehouse tables.
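A hedged T-SQL sketch of the two ideas above, a separate staging schema/filegroup and a date-partitioned fact table, could look like this (all names, file paths, and boundary dates are illustrative):

```sql
-- Separate schema and filegroup so staging I/O lands on its own disk.
ALTER DATABASE DW ADD FILEGROUP STAGING;
ALTER DATABASE DW ADD FILE
    (NAME = N'DW_staging', FILENAME = N'E:\StagingDisk\DW_staging.ndf')
    TO FILEGROUP STAGING;
GO
CREATE SCHEMA stg;
GO
CREATE TABLE stg.SalesLoad
(
    SaleDate   date           NOT NULL,
    CustomerId int            NOT NULL,
    Amount     decimal(18, 2) NOT NULL
) ON [STAGING];
GO

-- Monthly partitioning of the fact table by its date column.
CREATE PARTITION FUNCTION pfMonthly (date)
    AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');
CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);
CREATE TABLE dbo.FactSales
(
    SaleDate   date           NOT NULL,
    CustomerId int            NOT NULL,
    Amount     decimal(18, 2) NOT NULL
) ON psMonthly (SaleDate);
```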

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each files size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future)
The main reason is that master is not calibrated for additional load: it is not installed on an I/O system with proper capacity planning, it is hard to move to a new I/O location, its maintenance plan takes backups and log backups only as frequently as needed for a very low volume of activity, and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but you would have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for and, ultimately, easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master; it could be overwritten or recreated on an upgrade.
I have put a DBA-only ServerInfo database on each server for uses like this, as well as for any application-specific environmental things (things that differ between prod, test, and dev).
You should add a separate database for the logging. It is not guaranteed that the next SQL Server patch will not break the master database if you leave your objects in there.
And Microsoft itself advises you not to do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx
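For what it's worth, a minimal sketch of that kind of logging in its own utility database might look like this (database, table, and column names are placeholders; the insert would run weekly, e.g. from a SQL Agent job):

```sql
-- Dedicated DBA/monitoring database instead of master.
CREATE DATABASE DBAMonitor;
GO
USE DBAMonitor;
GO
CREATE TABLE dbo.DatabaseFileSizeHistory
(
    CapturedAt    datetime       NOT NULL DEFAULT GETDATE(),
    DatabaseName  sysname        NOT NULL,
    LogicalName   sysname        NOT NULL,
    FileType      nvarchar(60)   NOT NULL,
    SizeMB        decimal(18, 2) NOT NULL
);
GO
-- Capture current file sizes; week-over-week growth is then a
-- self-join on DatabaseName/LogicalName across CapturedAt values.
INSERT INTO dbo.DatabaseFileSizeHistory (DatabaseName, LogicalName, FileType, SizeMB)
SELECT DB_NAME(database_id),
       name,
       type_desc,
       CAST(size * 8 / 1024.0 AS decimal(18, 2))   -- size is reported in 8 KB pages
FROM sys.master_files;
```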