many-to-many database - sql

I'm trying to create a database to analyze the configurations of my servers.
But I have many services that can run on many servers (for failover/load balancing). Also, the configuration of the same service can change from one server to another, which is why I can't just have a single service table.
I tried to link the different tables using a single table that ties them all together. I think I'm in 3NF, but I'm not 100% sure.
Is that a "valid" database design?
I fear that the queries to find things in the database are going to be a bit complicated.
Thank you

It would help more if you showed the actual db design. But...
If you have two tables associated in a many-to-many relationship, you will need a table in between them that represents the relationship. Tables normally represent entities in the real world, and foreign keys represent the relationships. But in a many-to-many relationship you need a table to handle that complexity.
That table represents the relationship and could be called ServiceRunningOnServer; its primary key should be the combination of a ServiceId (with a foreign key to Service.Id) and a ServerId (with a foreign key to Server.Id).
Any setting for a service that applies across the board (not server-specific) is an attribute of the service entity and so belongs in the Service table. But any setting that is specific to the server it is running on is an attribute of the relationship between that service and that server, and so it belongs in the ServiceRunningOnServer table.
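For example, a minimal sketch of that schema (table and column names here are illustrative, not taken from your post):

CREATE TABLE Service (
    Id   INT PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
    -- settings that apply across the board go here
);

CREATE TABLE Server (
    Id   INT PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

CREATE TABLE ServiceRunningOnServer (
    ServiceId   INT NOT NULL REFERENCES Service (Id),
    ServerId    INT NOT NULL REFERENCES Server (Id),
    ConfigValue NVARCHAR(255) NULL,  -- a server-specific setting (illustrative)
    PRIMARY KEY (ServiceId, ServerId)
);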
Yes, this is a perfectly normalized db design, and it is the design with the best overall complexity. Other designs might make some things easier, but they will also make other things harder; in the total sum of things, they will over-complicate matters. This design keeps the total complexity of adding, updating, reading and deleting data in your database to a minimum.

Related

Key relationships(Primary key, Foreign Key) in SQL Server Management Studio

Why do we define key relationships in SQL Server Management Studio? I am a student and my teacher wanted this. In fact, I think it is not necessary, because I can connect tables with SQL queries.
For example: We have EMPLOYEE and MANAGER tables.
EMPLOYEE has ID, Name, SuperID (the supervisor's ID in MANAGER)
MANAGER has ID, Name
In fact, my teacher said that "you should connect the two tables in SQL Server Management Studio, so that EMPLOYEE's SuperID is a foreign key to MANAGER's ID".
I can connect two tables (i.e., I can see the connection between them) with an SQL query like:
SELECT e.*, m.Name AS [Manager's Name]
FROM EMPLOYEE e
JOIN MANAGER m ON e.SuperID = m.ID
It works, but why do we connect the two tables in SQL Server Management Studio (the primary key / foreign key section)?
If we don't connect them, what will happen? Is it necessary when we work in a professional company?
To enforce referential integrity and prevent orphaned records. It tells the database what it needs to know to make it impossible to store data that is not valid according to your business rules. If you make it impossible to store invalid data, then your data will be valid. Valid data makes everybody happy.
If each EMPLOYEE must have a MANAGER, then tell the database that and it won't let you create an EMPLOYEE record without a MANAGER reference.
If a MANAGER needs to be deleted but you still have an EMPLOYEE that lists that id as a MANAGER, the database won't let you delete the MANAGER (or can be told to also delete the EMPLOYEE with cascaded deletes).
If the key of a MANAGER needs to change, a cascaded update rule on the foreign key will automatically update any associated EMPLOYEE records.
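As a hedged sketch, here is how those rules could be declared for the tables in the question (the constraint name is illustrative):

CREATE TABLE MANAGER (
    ID   INT PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

CREATE TABLE EMPLOYEE (
    ID      INT PRIMARY KEY,
    Name    NVARCHAR(100) NOT NULL,
    SuperID INT NOT NULL,  -- NOT NULL: every EMPLOYEE must have a MANAGER
    CONSTRAINT FK_Employee_Manager
        FOREIGN KEY (SuperID) REFERENCES MANAGER (ID)
        ON UPDATE CASCADE  -- key changes propagate to EMPLOYEE
        -- add ON DELETE CASCADE to delete EMPLOYEEs along with their MANAGER;
        -- the default (NO ACTION) blocks the delete instead
);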
If we don't connect them, what will happen?
@DeanKuga raises an important point. Most large customers of major applications will want or need direct database access. This may be to use third-party reporting applications, to do mass data manipulation your front end doesn't provide, or to run imports or exports to other systems they own.
Here's something a lot of developers don't think about: even though you're only licensing the application to your customers and you will still own the application, the customer owns the data. It's their data in every sense. Your rights as an independent vendor end with copyrights and patents on the program code. If the customer wants to connect to the RDBMS they have a license for, on the server they own, to read the data they own, that is their right, and they absolutely will do that. That is not reverse engineering. Indeed, this capability is one of the primary and express purposes of using a general RDBMS, and it's one reason why customers love them: they can always get their data out. Customers do not want giant information silos that prevent them from moving data to where they need it. Whatever the application is that you sold them, it almost certainly will not do everything the business needs. They will have multiple applications, and they all need to communicate to work together.
Yes, it is necessary when going pro. Note that when you specify a foreign key field in a table referring to the primary key of another one, SQL Server does not index that field automatically (only primary keys get an index by default), so you should add an index on it yourself. Indexes in SQL Server bring a large gain in performance: they improve fetching performance for the SQL Server engine. Adding an index is like putting your field values into a dictionary, and we all know that finding a word in a dictionary is a lot easier than finding it in a list of unordered words. The gain is not noticeable for small data, but it becomes essential for big data; that's why, when going pro, foreign keys (and indexes on their columns) are necessary, since we are dealing with large-scale databases. Referential integrity is the second reason: your database tables are the concrete representation of a design that is meant to preserve your data integrity. If you create a design and do not put it in place, why design databases at all? With foreign keys, you are sure that your design conforms to your tables' implementation.
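For example, a minimal sketch using the EMPLOYEE table from the question above (the index name is illustrative):

-- SQL Server creates an index for a PRIMARY KEY automatically, but not
-- for a FOREIGN KEY column; add one by hand:
CREATE NONCLUSTERED INDEX IX_EMPLOYEE_SuperID
    ON EMPLOYEE (SuperID);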

What is the best structure to separate all tables per client

For instance, I have these entities:
Client : table
TransactionA : table
TransactionB : table
..
TransactionZ : table
TransactionA through TransactionZ are each referenced to Client.
For the database structure, I've been thinking of creating a new TransactionA table for every new Client registered, under a schema named after the Client.Code, so it looks like clientA.tbl_TransactionA.
With this structure, I think my database would end up with thousands of tables, depending on how many clients register, which I think would be hard to maintain if there's a modification in the core.
I would like to ask for your opinion on the best approach to this matter, with advantages and disadvantages.
PS:
I am using Entity Framework (code first), MSSQL
Thanks in advance.
Creating a table per client would not be a good idea on many levels. To pick one of the more obvious ones: using Entity Framework, you would have to alter and recompile your code each time you wanted to add a client. You'd probably have to use reflection to figure out which client's DbSet to reference when seeking a transaction.
It isn't clear what has driven you to this design consideration, but it would seem obvious that the more reasonable model would be to have a Transactions table that had a foreign key / navigation property to the Client table. I assume there's some good but unstated reason why this would not suffice, though.
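As a hedged sketch of that single-table model in SQL (EF code-first would generate something roughly similar from a navigation property; all names are assumptions based on the question):

CREATE TABLE Client (
    Id   INT IDENTITY PRIMARY KEY,
    Code NVARCHAR(20) NOT NULL UNIQUE
);

CREATE TABLE TransactionA (
    Id       INT IDENTITY PRIMARY KEY,
    ClientId INT NOT NULL REFERENCES Client (Id),  -- the foreign key / navigation property
    Amount   DECIMAL(18, 2) NOT NULL               -- illustrative payload column
);

-- One ordinary query replaces all the per-client tables:
SELECT t.*
FROM TransactionA t
JOIN Client c ON c.Id = t.ClientId
WHERE c.Code = 'clientA';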

Fetch an entity's read-only collection from a separate database

I'm building a new NHibernate 3.3 application that must connect to a legacy system in order to look up some information about my users. There's a separate, read-only database that holds course enrollments that I'd like to use to populate a collection on my Student entity. These would be components in NHibernate-speak, consisting of a department code and course and section numbers, like "MTH101 sec. 2".
The external database has a surrogate key, the student number, which corresponds to a property in my User entity, but it's not the primary key of a Student.
These databases are on separate servers, and I can't change the legacy database.
Do I have a hope of mapping the enrollments collection as NHibernate components?
Two Options
When you have multiple databases or multiple database servers that you're trying to link together in a single domain model using NHibernate, you basically have two options.
Leverage the database server's capabilities (linked servers, etc.) to join the data so that NHibernate only has to worry about connecting to one database (see the sketch after these two options). In your NHibernate mappings, you fully specify the table attribute so that the database server knows to query against the other database server. For your "surrogate key, ... not the primary key", you could map this using <many-to-one property-ref="...">.
Use multiple NHibernate session factories, one for each database. You would be responsible for coordinating what gets loaded from which database. You configure each session factory for just the tables that exist in that database and with the appropriate connection string. Then, to load the data, you execute two queries, one against one database, and another against the other database.
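For illustration, here is roughly what option #1 amounts to at the SQL level; the linked server name LegacyServer and the table and column names are assumptions:

DECLARE @StudentNumber INT = 12345;  -- illustrative student number

-- Four-part name: linked server, database, schema, table. A fully
-- specified table attribute in the NHibernate mapping resolves to a
-- name like this.
SELECT e.DeptCode, e.CourseNumber, e.SectionNumber
FROM LegacyServer.LegacyDb.dbo.Enrollment AS e
WHERE e.StudentNumber = @StudentNumber;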
Which one?
Which is the right choice? It depends...
Available features
If your database server doesn't have any features to support #1, or if there are other things preventing you from using those features, then you obviously have to use #2.
Cross-DB where Clauses
#1 gives you more flexibility when writing queries - you could specify where clauses that span both databases if you needed to, though you need to be careful that the query you write doesn't require database A to fetch tons of data from database B. With method #2 you execute a second query to get what you need from database B, which forces you to be more conscious about exactly what data you have to fetch from each database to get the job done.
Unenforced relationship
There won't be any foreign keys enforcing the relationship because the data lives in two different databases. NHibernate (very reasonably) assumes that database relationships are enforced by foreign keys. Since there's a chance these two databases could be out of sync, #1 will require you to resort to things like not-found="ignore", which has performance implications.
Complexity of Deployment
Inter-database relationships make deploying to various environments (DEV, QA, PROD) difficult. You can't just deploy the application and database, and make sure the application's connection strings are pointing at the correct databases; instead you also have to make sure that any references inside the databases to other databases are pointing to the correct places.
Given all of the above factors, I usually lean towards option #2, but there are some situations where #1 is just so much more convenient.

Database Client Specific Tables v/s Relational Tables

I have a scenario: my application is a SaaS-based app catering to multiple clients, and data integrity for clients is essential.
Is it better to keep my Tables
Client specific
OR
Relational Tables
For example: I have a mapping table with fields MapField1, MapField2. I need this kind of data for each client.
Should I have tables like MappingData_<ClientCode>, one per client,
or a single table with a mapping to the ClientId:
MappingData with fields MapField1, MapField2, ClientId
I would have a separate database for each customer. (Multiple databases in a single SQL Server instance.)
This would allow you to design it once, with a single schema.
No dynamically named tables compromising test & development
Upgrades and maintenance can be designed and tested in one DB, then rolled out to all
A single customer's data can be backed-up, restored or dropped exceedingly simply
Bugs discovered/exploited in one DB won't compromise the integrity of other DBs
Data access (read and write) can be managed using SQL Logins (No re-inventing the wheel)
If there is a need for globally shared data, that would go in another database, with its own set of permissions for the different SQL Logins.
The use of a single database with all users in it is my next best choice. You still have a single schema, but you don't get to partition the customers' data, you need to manage access rights and permissions yourself, and you take on a whole host of additional design and testing work.
I would never go near dynamically creating new tables for additional customers. A new table name means all your queries need to be updated with the new table name, plus a whole host of other maintenance headaches.
I'm pretty much of the opinion that if you want to create tables dynamically during the Business As Usual use of an application/service, you've designed it badly.
SO has a tag for the thing you're describing: "multi-tenant".
Visualize the architecture for supporting a multi-tenant database application as a spectrum. At one extreme of the spectrum is "shared nothing", which means each tenant has its own database. At the other extreme of the spectrum is "shared everything", which means tenants share tables, and each row in each table belongs to one tenant. (Each row contains a tenant identifier.)
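A minimal sketch of the "shared everything" end of the spectrum (table and column names are illustrative):

CREATE TABLE Invoice (
    Id       INT IDENTITY PRIMARY KEY,
    TenantId INT NOT NULL,             -- the tenant identifier carried on every row
    Total    DECIMAL(18, 2) NOT NULL
);

-- Every query must filter on the tenant:
SELECT Id, Total
FROM Invoice
WHERE TenantId = 42;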
Terminology seems to overlap, so read carefully. What one writer means by shared schema might be identical to what another writer means by shared everything.
This SO answer, also written by me, describes the differences and the tradeoffs in terms of cost, data isolation and protection, maintenance, and disaster recovery. It also links to a fairly good introductory article.

Ideas for Combining Thousand Databases into One Database

We have a SQL server that has a database for each client, and we have hundreds of clients. So imagine the following: database001, database002, database003, ..., database999. We want to combine all of these databases into one database.
Our thoughts are to add a siteId column, 001, 002, 003, ..., 999.
We are exploring options to make this transition as smoothly as possible. And we would LOVE to hear any ideas you have. It's proving to be a VERY challenging problem.
I've heard of a technique that would create a view that would match and then filter.
Any ideas guys?
Create a client database id for each of the client databases. You will use this id to keep the data logically separated. This is the "site id" concept, but you can use a derived key (identity field) instead of manually creating these numbers. Create a table that has database name and id, with any other metadata you need.
The next step would be to create an SSIS package that gets the ID for the database in question and adds it to the tables that have to have their data separated out logically. You can then run that same package over each database, looking up the ID for the database in question.
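In SQL terms, the metadata table and the per-database work the package performs might look like this sketch (all names are assumptions; GO is the batch separator for SSMS/sqlcmd scripts):

CREATE TABLE ClientDatabase (
    SiteId       INT IDENTITY PRIMARY KEY,  -- the derived key (identity field)
    DatabaseName SYSNAME NOT NULL UNIQUE
);
GO

-- Inside one client database (dbo.Orders is an illustrative table):
ALTER TABLE dbo.Orders ADD SiteId INT NULL;
GO

UPDATE dbo.Orders SET SiteId = 7;  -- the id looked up in ClientDatabase for this database
GO

ALTER TABLE dbo.Orders ALTER COLUMN SiteId INT NOT NULL;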
After you have a unique id for the data and have imported it, you will have to alter your apps to fit the new schema (actually before, or you are pretty much screwed).
If you want to do this in steps, you can create views or functions in the different "databases" so the old client can still hit the client's data, even though it has been moved. This step may not be necessary if you deploy with some downtime.
The method I propose is fairly flexible and can be applied to one client at a time, depending on your client application deployment methodology.
Why do you want to do that?
You can read about Multi-Tenant Data Architecture and also listen to SO #19 (around 40-50 min) about this design.
The "site-id" solution is what's done.
Another possibility that may not work out as well (but is still appealing) is multiple schemas within a single database. You can pull common tables into a "common" schema and leave the customer-specific stuff in customer-specific schemas. In some database products, however, each schema is effectively a separate database. In other products (Oracle and DB2, for example) you can easily write queries that span multiple schemas.
Also note that, as an optimization, you may not need to add a siteId column to EVERY table.
Sometimes you have a "contains" relationship. It's a master-detail FK, often defined with a cascade delete so that the detail cannot exist without the parent. In this case, the children don't need a siteId because they don't have an independent existence.
Your first step will be to determine if these databases even have the same structure. Even if you think they do, you need to compare them to make sure they do. Chances are there will be some that are customized or missed an upgrade cycle or two.
Now, depending on the number of clients and the number of records per client, your tables may get huge. Are you sure this will not create a performance problem? At any rate, you may need to take a fresh look at indexing. You may need a much more powerful set of servers, and may also need to partition by client anyway for performance.
Next, yes, each table will need a site id of some sort. Further, depending on your design, you may have primary keys that are no longer unique, and you may need to redefine all primary keys to include the siteid. Always index this field when you add it.
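Sketched for one table (table, column and constraint names are illustrative), the key redefinition and the index might be:

ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY (OrderId, SiteId);  -- old key widened with the site id

-- Index the new field, as advised above:
CREATE NONCLUSTERED INDEX IX_Orders_SiteId ON dbo.Orders (SiteId);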
Now all your queries, stored procs, views and UDFs will need to be rewritten to ensure that the siteid is part of them. Pay particular attention to any dynamic SQL. Otherwise you could be showing client A's information to client B, and clients don't tend to like that.
We brought a client from a separate database into the main application one time (when they decided they didn't want to keep paying for a separate server). The developer missed just one place where client_id had to be added. Unfortunately, that sent emails concerning this client's proprietary information to every client, and to make matters worse, it was a nightly process that ran in the middle of the night, so it wasn't discovered until the next day. (The developer was very lucky not to get fired.) The point is: be very, very careful when you do this, and test, test, test, and test some more. Make sure to test all the automated behind-the-scenes stuff as well as the UI.
What I was explaining in Florence towards the end of last year is what to do if you have to keep the database names and the logical layer of the database the same for the application. In that case you'd do the following:
Collapse all the data into consolidated tables in one master, consolidated database (hereafter referred to as the consolidated DB).
Those tables would have to have an identifier like SiteID.
Create the new databases with the existing names.
Create views with the old table names that query the tables in the consolidated DB, using the SiteID to filter (row-level security).
Set up the databases for cross-database ownership chaining so that the service accounts can't "accidentally" query the base tables in the consolidated DB. Access must happen through the views or through stored procedures and other constructs that will enforce row-level security. Now, if it's the same service account for all sites, you can avoid the cross DB ownership chaining and assign the rights on the objects in the consolidated DB.
Rewrite the stored procedures to either handle the change (since they are now referring to views and don't know to hit the base tables and include SiteID) or use INSTEAD OF triggers on the views to intercept update requests and put the appropriate site-specific information into the base tables.
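The view step as a sketch (all names and the literal site id are illustrative): inside a per-site database that kept the old name, a view with the old table name filters the consolidated table:

CREATE VIEW dbo.Orders
AS
SELECT OrderId, CustomerId, Total  -- SiteID deliberately omitted to preserve the old table shape
FROM ConsolidatedDB.dbo.Orders
WHERE SiteID = 7;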
If the data is large, you could look at using a partitioned view. This would simplify your access code, as all you'd have to maintain is the view; however, if the data is not large, just add a column to identify the customer.
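A minimal partitioned-view sketch, under assumed names; the CHECK constraints on the partitioning column are what let SQL Server touch only the relevant member table when a query filters on CustomerId:

CREATE TABLE dbo.Orders_Client1 (
    OrderId    INT NOT NULL,
    CustomerId INT NOT NULL CHECK (CustomerId = 1),
    Total      DECIMAL(18, 2) NOT NULL,
    PRIMARY KEY (CustomerId, OrderId)
);

CREATE TABLE dbo.Orders_Client2 (
    OrderId    INT NOT NULL,
    CustomerId INT NOT NULL CHECK (CustomerId = 2),
    Total      DECIMAL(18, 2) NOT NULL,
    PRIMARY KEY (CustomerId, OrderId)
);
GO

CREATE VIEW dbo.Orders
AS
SELECT OrderId, CustomerId, Total FROM dbo.Orders_Client1
UNION ALL
SELECT OrderId, CustomerId, Total FROM dbo.Orders_Client2;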
Depending on what the data is and your security requirements the threat of cross contamination may be a show stopper.
Assuming you have considered this and deem it "safe enough", you may need or want to create VIEWs or impose some other access control to prevent customers from seeing each other's data.
IIRC a product called "Trusted Oracle" had the ability to partition data based on such a key (about the time Oracle 7 or 8 was out). The idea was that any given query would automagically have "and sourceKey = #userSecurityKey" (or some such) appended. The feature may have been rolled into later versions of the popular commercial product.
To expand on Gregory's answer, you can also make a parent SSIS package that calls the package doing the actual moving, inside a Foreach Loop container.
The parent package queries a config table and puts the result in an object variable. The Foreach Loop then uses this recordset to pass variables to the child package, such as the database name and any other details the package might need.
Your table could list all of your client databases and have a flag to mark when you are ready to move them. This way you are not sitting around running the SSIS package on 32,767 databases. I'm hooked on the Foreach Loop in SSIS.
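That control table could be as simple as this sketch (the table name and flag column are assumptions):

CREATE TABLE dbo.MigrationQueue (
    DatabaseName SYSNAME NOT NULL PRIMARY KEY,
    ReadyToMove  BIT NOT NULL DEFAULT 0  -- flip to 1 when the client is ready to migrate
);

-- The parent package's query for the Foreach Loop's recordset:
SELECT DatabaseName
FROM dbo.MigrationQueue
WHERE ReadyToMove = 1;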