How to handle the database backup and restore in different environments?

How to handle the database backup and restore in different environments? - directus

I am using Directus in prod and multiple non-prod environments. In prod, users will only be updating content in collections. In non-prod, admins will be testing all other updates regarding new collections, user roles, permissions, etc. I plan on backing up tables related to collection content in prod and restoring them to lower environments in order to keep data up to date. Once testing of new collections or other admin changes are done in non prod environments I plan on backing them up and then restoring the tables related to these changes up to prod. Right now, the plan is to write shell scripts using mysqldump and mysqlimport to perform these operations.
My question is what tables relating to collection content do I need to backup in prod and move to lower environments? Also, could I use the method described here to not overwrite any activity records and avoid losing data?

Mission critical tables would be:
directus_collections (what collections are there)
directus_fields (what fields are in those collections)
directus_relations (which collections are connected to each other)
I'd personally copy over all the directus_* tables. Even if you don't necessarily have to move over things like activity, I'd play it safe and make sure you don't run into any weird issues later on.
Also, could I use the method described here to not overwrite any activity records and avoid losing data?
If you're not going to edit any actual content in the non-prod environments, you should be able to ignore directus_activity and directus_revisions, as those two are directly related to the raw content.

Related

Should I create separate SQL Server database for each user?

I am working on Asp.Net MVC web application, back-end is SQL Server 2012.
This application will provide billing, accounting, and inventory management. The user will create an account by signup. just like http://www.quickbooks.in. Each user will create some masters and various transactions. There is no limit, user can make unlimited records in the database.
I want to keep stable database performance, after heavy data load. I am maintaining proper indexing and primary keys in it, but there would be a heavy load on the database, per user.
So, should I create a separate database for each user, or should maintain one database with UserID. Add UserID in each table and making a partition based on UserID?
I am not an expert in SQL Server, so please provide suggestions with clear specifications.
Please inform me if there is any lack of information.

A DB per user is what happens when customers need to be able pack up and leave taking the actual database with them. Think of a self hosted wordpress website. Or if there are incredible risks to one user accidentally seeing another user's data, so it's safer to rely on the servers security model than to rely on remembering to add the UserId filter to all your queries. I can't imagine a scenario like that, but who knows-- maybe if the privacy laws allowed for jail time, I would rather data partitioned by security rules rather than carefully writing WHERE clauses.
If you did do user-per-database, creating a new user will be 10x more effort. While INSERT, UPDATE and so on stay the same from version to version, with each upgrade the syntax for database, user creation, permission granting and so on will evolve enough to break those scripts each SQL version upgrade.
Also, this will multiply your migration headaches by the number of users. Let's say you have 5000 users and you need to add some new columns, change a columns data type, update a trigger, and so on. Instead of needing to run that change script 1x, you need to run it 5000 times.
Per user Dbs also probably wastes disk space. Each of those databases is going to have a transaction log, sitting idle taking up the minimum log space.
As for load, if collectively your 5000 users are doing 1 billion inserts, updates and so on per day, my intuition tells me that it's going to be faster on one database, unless there is some sort of contension issue (everyone reading and writing to the same table at the same time and the same pages of the same table). Each database has machine resources (probably threads and memory) per database doing housekeeping, so these extra DBs can't be free.
Anyhow, the best thing to do is to simulate the two architectures and use a random data generator to simulate load and see how they perform.

It's not an easy answer to give.
First, there is logical design to be considered. Then you have integrity, security, management and performance (in this very order).
A database is a logical unit of data, self contained. Ideally, you should be able to take a database, move it to another instance, probably change the connection strings and be running again.
All the constraints are database-level. No foreign keys can exist referencing some object outside the database.
So, try thinking in these terms first.
How would you reliably prevent one user messing up the other user's data? Keep in mind that it's just a matter of time before someone opens an excel sheet and fire up queries on the database bypassing your application. Row level security in SQL Server is something you don't want to deal with.
Multiple databases mean that all management tasks should be scripted out and executed on all databases. Yes, there is some overhead to it, but once you set it up it's just the matter of monitoring. If a database goes suspect, it's a single customer down, not all of them. You can even have different versions for different customes if each customer have it's own database. Additionally, if you roll an upgrade, you can do it per customer, so the inpact will be much less.
Performance is the least relevant factor here. Of course, it really depends on how many customers and how much data, but proper indexing will solve these issues. Scale-out is much easier with multiple databases.
BTW, partitioning, as you mentioned it, is never a performance booster, it's simply a management feature, allowing for faster loading and evicting of data from a table.
I'd probably put each customer in separate database, but it's up to you eventually to make a decision for yourself. Hope I've helped some with this.

Database of images and text

background:
I'm in the design phase of building an app.
I want the app to display text and images, the problem is that I will have A LOT of them. hundreds to thousands.
This is my largest app so far, and I am unsure on how to handle all the data.
The question???????:
What would be the best way to store and access these images and text?
Would I use a formal database approach like SQL?
Or would it be better to navigate files/folders e.g. dropping all the files in res/drawable?
potentially useful facts:
The database will be stored and accessed natively so it can be accessed off-line.
The user will not be adding to the database in anyway, only accessing the data.
the database will be updated every 6 months.
The application 'page' will display 1-5 images along with several blocks of text.
Concept:
the app will be like a recipe app...the user will pick some parameters e.g. ingredients, type, diet.. then select a recipe. And then several images and blocks of text will be displayed showing and detailing the process of some recipe.
I apologize if this is repeated but I didn't see a specific answer for my purposes.

The "Best" approach will depend on the functionality of the database server in question.
Generally, you should store the images "In" the database until that becomes a performance issue. Once you start storing images "Outside" of the database you will have to handle all the issue that are normally taken care of by the database. Disk space management, orphan records, file name conflicts, folder file limits, to name just a few. Depending on your situation these may be big issues or thay may be nothing to worry about.
I've seen several application where images (or attachements) were kept "Outside" the database, and in each case it was done poorly. There are just so many issues to handle, and most developers don't even think of half of them. In many cases the performance of storing the images "In" the databse was acceptable, but the developers decided against it because they just knew it would not perform well.

If your using SQL server 2008 the Filestream data type is ideal for your case. It stores the binary files outside of the database but behaves as a normal field. Also you are able to read/write the files using a stream instead of getting/setting the whole file as a byte array (like when using varbin(max))
If you don't have this functionality in your database, I would recommend storing the images outside of the DB

Its probably a better idea to use a file based approach for deployed static resources.
At the very least because taking a dependency on file system is typically easier to manage then taking a dependency on a DB.
Also this line indicates some sort of non-web client
The database will be stored and accessed natively so it can be accessed off-line."
This means if you go with the DB approach you'll have a couple of other interesting problems
Deployment
Depending on the platform deploying a DB can be a real bear depending on your target platform. What happens if they if already have the engine but its a different version.
Resources
Is your DB going to be client/server based (like MySQL/SQL Server etc)? If so then your app has to now manage the current state of its process. If not then you'll be using a file-based db SQL Lite/MS Access, at which point I would question why using a static DB is worth doing at all.
One final note. There's nothing stopping your Content Production environment from using a DB. Its quite common for Content producers to maintain a database for their content that will you will later use to produce the files for publishing/deployment.

Redis databases on a dev machine with multiple projects

How do you manage multiple projects on your development and/or testing machine, when some of those projects use Redis databases?
There are 2 major problems:
Redis doesn't have named databases (only numbers 0-16)
Tests are likely to execute FLUSHDB on each run
Right now, I think we have three options:
Assign different databases for each project, each dev and test environment
Prefix keys with a project name using something like redis-namespace
Nuke and seed the databases anytime you switch between projects
The first one is problematic if multiple projects assign "0" for the main use and "1" for the test and such. Even if Project B decided to change to "2" and "3", another member in the project might have a conflict in another projects for him. In other words, that approach is not SCM friendly.
For the second one, it's a bad idea simply because it adds needless overhead on runtime performance and memory efficiency. And no matter what you do, another project might be already using the same key coincidentally when you joined the project.
The third option is rather a product of compromise, but sometimes I want to keep my local data untouched while I deploy small patches for another projects.
I know this could be a feature request for Redis, but I need a solution now.
Any ideas, practices?

If the projects are independent and so do not need to share data, it is much better to use multiple redis instances - each project configuration has a port number rather than a database name/id. Create an appropriately named config file and startup script for each one so that you can get whichever instance you need running with a single click.
Make sure you update the save settings in each config file as well as setting the ports - Multiple instances using the same dump.rdb file will work, but lead to some rather confusing bugs.
I also use separate instances for development and testing so that the test instance never writes anything to disk and can be flushed at the start of each test.

Redis is moving away from multiple databases, so I would recommend you start migrating put of that mechanism sooner rather than later. This means one instance per db. Given the very low overhead of running Redis, this isn't a problem from a resources standpoint.
That said, you can specify the number of databases, and providing A naming standard would work. For example, configure redis to have say, 60 DBS and you add 10 for the test db. For example db3 uses db13 for testing.
It sounds like your dev, test, and prod environments are pretty tied together. If so, I'd suggest moving away from that. Using separate instances is the easiest route to that, and provides protection against cross purpose contamination. Between this and the future of redis being single-db per instance, separate instances is the best route.

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each files size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future)

The main reason is that master is not calibrated for additional load: it is not installed on IO system with proper capacity planning, is hard to move around to new IO location, it's maintenance plan takes backups and log backups are as frequent as needed for a very low volume of activity, its initial size and growth rate are planned as if no changes are expected. Another reason against it is that many troubleshooting scenarios you would want a copy of the database to inspect, but you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also many admins understandably prefer an application to use it's own database so it can be properly accounted for, and ultimately easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master since the former is an ordinary database (despite widespread believe that is system, is actually not).

The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.

I would refrain from putting anything in master, it could be overwritten/recreated on an upgrade.
I have put a DBA only ServerInfo database on each server for uses like this, as well as any application specific environmental things (things that differ between prod and test and dev).

You should add a separat database for the logging. It is not garanteed that the master database is not breaking the next patch of sql server if you leave your objects in there.
And microsoft itself does advise you to not do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Separate Schemas. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?

Depends on the type of application and scale of data. Each one has downfalls.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is if your application is data intensive you have to worry about data scalability. For example if every customer is expected to have tens/hundreds millions rows of data. Then you will start to run into issues and query performance for individual customers due to total customer base volumes. Clients are more forgiving for slowdowns caused by their own data volume. Being told its slow because the other 99 clients data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.

We also have a SaaS product and we use solution #2 (Shared DB/Shared Schema with TenandId). Some things to consider for Share DB / Same schema for all:
As mention above, high volume of data for one tenant may affect performance of the other tenants if you're not careful; for starters index your tables properly/carefully and never ever do queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on based some criteria that makes sense for your domain.
Data separation is very very important, you don't want to end up showing a piece of data to some tenant that belongs to other tenant. every query must have a WHERE TenandId = ... in it and you should be able to verify/enforce this during dev.
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can go around it by designing a way to extend the fields that are associated with the documents/tables in your domain that make sense (ie. Metadata for tables as the msdn article mentions)

What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.

I've similar case, but my solution is take both advantage.
Where data and how data being placed is the question from tenant. Being a tenant of course I don't want my data to be shared, I want my data isolated, secure and I can get at anytime I want.
Certain data it possibly share eg: company list. So database should be global and tenant database, just make sure to locked in operation tenant database schema, and procedure to update all tenant database at once.
Anyway SaaS model everything delivered as server / web service, so no matter where the database should come to client as service, then only render by client GUI.
Thanks

Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas