We have an old repair database that has alot of relational tables and it works as it should but i need to update it to be able to handle different clients ( areas ) - currenty this is done as a single client only.
So i need to extend the tables and the sql statements so ex i can login as user A and he will see his own system only and user B will have his own system too.
Is it correctly understood that you wouldnt create new tables for each client but just add a clientID to every record in every ( base ) table and then just filter with a clientid in all sql statements to be able to achieve multiple clients ?
Is this also something that would work ( how is it done ) on hosted solutions ? Am worried about performance if thats an issue lets say i had 500 clients ( i wont but from a theoretic viewpoint ) ?
The normal situation is to add a client key to each table where appropriate. Many tables don't need them -- such as reference tables.
This is preferred for many reasons:
You have the data for all clients in one place, so you can readily answers a question such as "what is the average X for each client".
If you change the data structure, then it affects all clients at the same time.
Your backup and restore strategy is only implemented once.
Your optimization is only implemented once.
This is not always the best solution. You might have requirements that specify that data must be separated -- in which case, each client should be in a separate database. However, indexes on the additional keys are probably a minor consideration and you shouldn't worry about it.
This question has been asked before. The problem with adding the key to every table is that you say you have a working system, and this means every query needs to be updated.
Probably the easiest is to create a new database for each client, so that the only thing you need to change is the connection string. This also means you can get automated query tools for example to work without worrying about cross-client data leakage.
And it also allows you to backup, transfer, delete a single client easily as well.
There are of course pros and cons to this approach, but it will simplify the development effort. Also remember that if you plan to spin it up in a cloud environment then spinning up databases like this is also very easy.
Related
I am working on Asp.Net MVC web application, back-end is SQL Server 2012.
This application will provide billing, accounting, and inventory management. The user will create an account by signup. just like http://www.quickbooks.in. Each user will create some masters and various transactions. There is no limit, user can make unlimited records in the database.
I want to keep stable database performance, after heavy data load. I am maintaining proper indexing and primary keys in it, but there would be a heavy load on the database, per user.
So, should I create a separate database for each user, or should maintain one database with UserID. Add UserID in each table and making a partition based on UserID?
I am not an expert in SQL Server, so please provide suggestions with clear specifications.
Please inform me if there is any lack of information.
A DB per user is what happens when customers need to be able pack up and leave taking the actual database with them. Think of a self hosted wordpress website. Or if there are incredible risks to one user accidentally seeing another user's data, so it's safer to rely on the servers security model than to rely on remembering to add the UserId filter to all your queries. I can't imagine a scenario like that, but who knows-- maybe if the privacy laws allowed for jail time, I would rather data partitioned by security rules rather than carefully writing WHERE clauses.
If you did do user-per-database, creating a new user will be 10x more effort. While INSERT, UPDATE and so on stay the same from version to version, with each upgrade the syntax for database, user creation, permission granting and so on will evolve enough to break those scripts each SQL version upgrade.
Also, this will multiply your migration headaches by the number of users. Let's say you have 5000 users and you need to add some new columns, change a columns data type, update a trigger, and so on. Instead of needing to run that change script 1x, you need to run it 5000 times.
Per user Dbs also probably wastes disk space. Each of those databases is going to have a transaction log, sitting idle taking up the minimum log space.
As for load, if collectively your 5000 users are doing 1 billion inserts, updates and so on per day, my intuition tells me that it's going to be faster on one database, unless there is some sort of contension issue (everyone reading and writing to the same table at the same time and the same pages of the same table). Each database has machine resources (probably threads and memory) per database doing housekeeping, so these extra DBs can't be free.
Anyhow, the best thing to do is to simulate the two architectures and use a random data generator to simulate load and see how they perform.
It's not an easy answer to give.
First, there is logical design to be considered. Then you have integrity, security, management and performance (in this very order).
A database is a logical unit of data, self contained. Ideally, you should be able to take a database, move it to another instance, probably change the connection strings and be running again.
All the constraints are database-level. No foreign keys can exist referencing some object outside the database.
So, try thinking in these terms first.
How would you reliably prevent one user messing up the other user's data? Keep in mind that it's just a matter of time before someone opens an excel sheet and fire up queries on the database bypassing your application. Row level security in SQL Server is something you don't want to deal with.
Multiple databases mean that all management tasks should be scripted out and executed on all databases. Yes, there is some overhead to it, but once you set it up it's just the matter of monitoring. If a database goes suspect, it's a single customer down, not all of them. You can even have different versions for different customes if each customer have it's own database. Additionally, if you roll an upgrade, you can do it per customer, so the inpact will be much less.
Performance is the least relevant factor here. Of course, it really depends on how many customers and how much data, but proper indexing will solve these issues. Scale-out is much easier with multiple databases.
BTW, partitioning, as you mentioned it, is never a performance booster, it's simply a management feature, allowing for faster loading and evicting of data from a table.
I'd probably put each customer in separate database, but it's up to you eventually to make a decision for yourself. Hope I've helped some with this.
I'm working on creating an application best described as a CRM. There is a relatively complex table structure, and I'm thinking about allowing users to do a fair bit of customization (adding fields and the like). One concern is that I will be reaching a certain level of scale almost immediately. We have about 50,000 individual users who will be coming online within about nine months of launch. So I want to build to last.
I'm thinking about two and maybe even three options.
One table set with a userID column on everything and with a custom attributes table created by creating a table which indexes custom attributes, then another table which has their values, which can then be joined to the existing contact records for the user. -- From what I've read, this seems like the right option, but I keep feeling like it's not. It seems like once these tables start reaching the millions of records searching for just one users records in every query is going to become a database hog.
For each user account recreate the table set, preened with a unique identifier (the userID for example.) Then rather than using a WHERE userID=? everywhere I can use a FROM ?_contacts. For attributes I could then have a custom attributes table where users could add additional columns for custom attributes. -- This feels like the simplest way to go, though, of course when I decide to change the database structure there would be a migration from hell.
The third option, which I'm pretty confident is wrong, but for that reason alone I can not rule out, is that a new database should be created for each user with all the requisite tables.
Am I crazy? Is option one really the best?
The first method is the best. Create individual userId's and then you can assign specific roles to them. A database retrieval time indeed depends on the number of records too. But, there is a trade-off where you can write efficient sql queries to fetch data. Well, according to this site, you will probably won't run out of memory or run into concurrency issues, because with a good server, the performance ought to be good, provided that you are efficient in writing queries.
If you recreate table sets, you will just end up creating lots of tables and can make the indexing slow which is a bad practice. Whereas if you opt of relational database scheme rather than an ordinary database scheme, and normalize the database and datatables for improving efficiency.
Creating a new database for each and every user, just sums up the complexity from both the above statements resulting in a shabby and disorganized database access. Because, if you decide to run individual instances of databases for every single user, you would just end up consuming your servers physical resources like RAM and CPU usage which will affect the service quality of all the other users.
Take up option 1. Assign separate userIds and assign them roles and privileges where needed. That is more efficient than the other two methods.
I am developing a VB.Net application. That application might be working on a LAN. MS Access as a back end will be used. I have developed many single user applications, but don't know of multi user , LAN, manage DB etc. How do I make the program as Multi user on LAN. Data will be accessed at the same time. How to manage such things.
Please give me some help and Guidance.
Thanks
Your VB application does not care how many people run it.
Your database, with MS Access, has some serious issues with multiple users. Get away from it if you can. SQL Server has a free version called SQL Express. If you only plan on 2 people, you might be OK with Access for a while but be prepared to support it more.
That was all the easy stuff, now you have to think about how you are going to handle multiple users trying to access and update the same data (concurrency).
Imagine this, you are a user looking at employee record 1 and so is someone else. You change the birthday and save. The the other user changes thier suppervisor and saves. How do you know something changed? What do you do if something changed? These are questions I cannot answer for you, you must decide based on your situation.
There are 2 main types of concurrency, optimistic and pessimistic. See this link for a great explaination and discussion on them: optimistic-vs-pessimistic-locking
You can look at this on a table-by-table basis.
If a table is never updated, you dont have to worry about concurrency
If a table is rarely updated, like a table of states, you can decide if it is worth the extra effort to add concurrency.
Everything else, pretty much should have some type of concurrency.
Now, the million dollar question, how?
You will find as many ways to handle concurrency as you will find colors in the rainbow. Here are some of the ones I like:
Simple number that you increment with each save. Small and easy.
DateTime stamp - As long as you dont expect to ever have 2 people save the same record during the same second, this is easy. (I personally dont like it by it's self)
User Name - Pretty simple gives a little bit of an audit by knowing who last inserted/edited the record but doesn't handle an issue I have seen to often. Imagine the same senerio as above but you had 2 instances of record 1. Now you change the data again, maybe supervisor, and when you save, you overwrite the changes from your first save with those of the second save.
Guid - VB can create a guid, SQL Server can create a guid and so can Access. It is nice an unique and most important, you can create it on the client so you dont have to requery the database after you save the record to get a refreshed record.
Combination of these. I like 2 and 3 myself. Gives a mini audit and is unique to the user.
If you use a DataAdapter, by default, MS will assume concurrency checking means to compare EVERY field to make sure it did not change. This works, but is completely un-scaleable and should not be done.
All of this depends on the size of your application and how you see it being used. Definately do some more research before you settle on a decision.
There are a number of solutions here.
If I may suggest a drastic alternative, have you considered pairing the client running on the user's computer with a server component (through a web service)? A simpler alternative would be for the client to talk directly to a SQL Server (or other database) instance through the network?*
*I'm not a fan of having client side apps talk directly to the database. It will mean maintenance headaches in the future, but I
included it to give you options
.
I found this random example via Google so YMMV.
For an application I am writing, I need to be able to identify when new data is inserted into several tables of a database.
The problem is two fold, this data will be been inserted many times per minute into sometimes very large databases (and I need to be sensitive to demand / database polling issues) and I have no control of the application creating this data (so as far as I know, I can't use the notify / listen functionality available within postgres for exactly this kind of task*).
Any suggestion regarding a good strategy would be much appreciated.
*I believe the application controlling this data is using the notify / listen functionality itself, but I haven't a clue how (if at all possible) to know what the "channel" it uses externally and if it is ever able to latch on to that.
Generally, you need something in the table that you can use to determine newness, and there are a few approaches.
A timestamp column would let you use the date but you'd still have the application issue of storing a date outside of your database, and data that isn't in the database means another realm of data to manage. Yuck.
A tracking table that stored last update/insert timestamps on a per-table basis could give you what you want. You'd want to use a trigger to maintain the last-DML timestamp.
A solution you don't want to use is a serial (integer) id that comes from nextval, for any purpose than uniqueness. The standard/common mistake is to presume serial keys will be contiguous (they're not) or monotonic (they're not).
We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Separate Schemas. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
Depends on the type of application and scale of data. Each one has downfalls.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is if your application is data intensive you have to worry about data scalability. For example if every customer is expected to have tens/hundreds millions rows of data. Then you will start to run into issues and query performance for individual customers due to total customer base volumes. Clients are more forgiving for slowdowns caused by their own data volume. Being told its slow because the other 99 clients data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB/Shared Schema with TenandId). Some things to consider for Share DB / Same schema for all:
As mention above, high volume of data for one tenant may affect performance of the other tenants if you're not careful; for starters index your tables properly/carefully and never ever do queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on based some criteria that makes sense for your domain.
Data separation is very very important, you don't want to end up showing a piece of data to some tenant that belongs to other tenant. every query must have a WHERE TenandId = ... in it and you should be able to verify/enforce this during dev.
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can go around it by designing a way to extend the fields that are associated with the documents/tables in your domain that make sense (ie. Metadata for tables as the msdn article mentions)
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I've similar case, but my solution is take both advantage.
Where data and how data being placed is the question from tenant. Being a tenant of course I don't want my data to be shared, I want my data isolated, secure and I can get at anytime I want.
Certain data it possibly share eg: company list. So database should be global and tenant database, just make sure to locked in operation tenant database schema, and procedure to update all tenant database at once.
Anyway SaaS model everything delivered as server / web service, so no matter where the database should come to client as service, then only render by client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.