Database Design Problem - sql-server-2005

I'm trying to design a database for a server and database inventory and I'm looking for a good database design. We have tables for server clusters, stand alone servers and databases. I would like to represent the following relationships in the database:
A one to many relationship from cluster to servers.
A one to many relationship from database to cluster/server.
The difficulty is in the second relationship, because the clusters and servers are in separate tables and a cluster is made up of servers. What is the best way to represent this relationship?

It sounds like your relational view of the situation looks like this:
Cluster : name, other attributes of cluster
Server : name, optional FK to cluster, other attributes of a server
Database : name, (FK to cluster OR FK to server)
The issue is that the real-world situation is somewhat more complex, and relational technology doesn't reflect it cleanly. What you actually have is closer to this:
Host -- an abstract superclass for places a database can run.
Cluster (extends Host) : name, etc.
Server (extends Host) : name, optional FK to cluster.
Database : FK to Host
You have several choices for handling this kind of "subentity" problem.
Collapse Host, Cluster and Server into a single table. This leads to a recursive relationship among Host (as Cluster) and Host (as Server). This is kind of annoying, but it does create a single table for Host, Cluster and Server. The resulting table has a lot of nulls (Cluster rows use one bunch of columns, Server rows use a different set of columns.) You have to add a column to discriminate among the subentities of Host.
Push Host information down into Cluster and Server. This is useful when you have a lot of common information in the Host table, and very little subclass-specific information in the Cluster or Server tables. The Cluster and Server tables look very similar (essentially clones of Host) with a few columns that are different.
Use a Join between (Host and Cluster) or (Host and Server) based on a discriminator in Host. While fairly complex, this scales well because all Databases are joined to a Host, and the complete list of Hosts is a union of Hosts which join to Server plus Hosts which join to Cluster.
Use optional FK fields in Database. This requires a union of Database joined to Cluster and Database joined to Server to get a full list of databases. Each Database might need a discriminator so that you can distinguish among the combinations of NULL values in the two FK fields. There are four possible combinations, of which two are sensible and two might be prohibited. Simply using two nullable FKs doesn't usually work out well, so you often need a status flag to separate Database on Cluster, Database on Server, Database not assigned to anything, Database with unknown hosting, and any other status that might be relevant.
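As a rough illustration of the join-based choice (Host joined to Cluster or Server through a discriminator), here is a minimal sketch. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and every table name, column name, and sample row is hypothetical rather than taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE host (
    host_id   INTEGER PRIMARY KEY,
    host_type TEXT NOT NULL CHECK (host_type IN ('cluster', 'server')),  -- discriminator
    name      TEXT NOT NULL
);
CREATE TABLE cluster (
    host_id INTEGER PRIMARY KEY REFERENCES host(host_id)
);
CREATE TABLE server (
    host_id    INTEGER PRIMARY KEY REFERENCES host(host_id),
    cluster_id INTEGER REFERENCES cluster(host_id)  -- NULL for a stand-alone server
);
CREATE TABLE db (
    db_id   INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    host_id INTEGER NOT NULL REFERENCES host(host_id)  -- single FK, cluster or server
);
""")

# One cluster with two member servers, plus one stand-alone server.
conn.executemany("INSERT INTO host VALUES (?, ?, ?)", [
    (1, "cluster", "CL01"),
    (2, "server", "SRV-A"),
    (3, "server", "SRV-B"),
    (4, "server", "SRV-C"),
])
conn.execute("INSERT INTO cluster VALUES (1)")
conn.executemany("INSERT INTO server VALUES (?, ?)", [(2, 1), (3, 1), (4, None)])
conn.executemany("INSERT INTO db VALUES (?, ?, ?)", [
    (1, "Sales", 1),      # hosted on the cluster
    (2, "Inventory", 4),  # hosted on the stand-alone server
])

# Every database joins to host the same way, whatever kind of host it is.
rows = conn.execute("""
    SELECT d.name, h.host_type, h.name
    FROM db d
    JOIN host h ON h.host_id = d.host_id
    ORDER BY d.name
""").fetchall()
print(rows)
```

Note that this sketch does not stop a row in cluster from pointing at a host whose host_type is 'server'; enforcing that would need an extra constraint or trigger.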

Option 1: Have two fields in the database table. One refers to server, the other to cluster. Keep one of them null.
Option 2: Another approach is to add an entry in cluster for each stand-alone server also and link only to that table.
Option 1 is really not the cleanest solution (I do agree with the comments), so go for option 2 :)
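If you do go with option 1 anyway, a CHECK constraint can at least enforce the "keep one of them null" rule. The following is only a sketch under assumptions: it uses Python's sqlite3 with an in-memory SQLite database instead of SQL Server, and the table and column names are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cluster (
    cluster_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE server (
    server_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    cluster_id INTEGER REFERENCES cluster(cluster_id)  -- NULL when stand-alone
);
CREATE TABLE db (
    db_id      INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    cluster_id INTEGER REFERENCES cluster(cluster_id),
    server_id  INTEGER REFERENCES server(server_id),
    -- exactly one of the two foreign keys must be set
    CHECK ((cluster_id IS NOT NULL AND server_id IS NULL)
        OR (cluster_id IS NULL AND server_id IS NOT NULL))
);
""")

conn.execute("INSERT INTO cluster VALUES (1, 'CL01')")
conn.execute("INSERT INTO server VALUES (1, 'SRV-C', NULL)")
conn.execute("INSERT INTO db VALUES (1, 'Sales', 1, NULL)")      # on the cluster
conn.execute("INSERT INTO db VALUES (2, 'Inventory', NULL, 1)")  # on the server


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO db VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


both_set = try_insert((3, "Bad", 1, 1))               # both FKs filled in
neither_set = try_insert((4, "Orphan", None, None))   # no host at all
print(both_set, neither_set)
```

Listing all databases with their host still needs a union (or two outer joins) over the two foreign keys, which is the main cost of this option.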

Related

SQL Data modeling -Querying Records that have tags across multiple categories

I have a table that stores the different software services a company offers. Each service is tagged by the Industry it serves, the LoB it belongs to, and the technology involved in the service. A service can have multiple tags for each of Industry, LoB, and Technology.
For example, the following could be the master data:
And the transaction data could look like this:
I need to create a view that can be used to query data by Industry, LoB, and Technology tags. For the time being I have left outer joined all the tag-to-service relation tables (service-technology, service-LoB, and service-industry) to the services transaction table, but this produces a huge number of records, since one service can typically be tagged with up to 10-15 industries and technologies.
I just wanted to know the optimal way to model this data so that I can query for services by all three tags from within one view.
I am not a data modeling expert and this is my first venture into the data modeling side, so please pardon the 'noob'ness of my question :). I use SAP HANA as the database and expose data via an OData service, for which I want to use this view as a data source.

If you're asking about modeling the data: normally in your transaction table you keep the foreign keys, not the text columns that can be obtained from the master tables via those foreign keys. I bet that's what you meant as well, but the example shows text values in the transaction tables.
Other than that, I think what you have is sound and reasonable. These "tag" tables represent a different level of granularity from the "services" table, and it can be counterproductive to combine them into a single table (examples: a single column with comma-separated tags, XML / JSON columns, multiple columns [LOBTag1, LOBTag2, ...]) because that makes these columns non-indexable and/or hard to query. You may be able to optimize XML and JSON columns, but those should not be considered unless the columns are too many and sparse.
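To make the row-multiplication issue concrete, here is a small sketch with separate junction tables per tag type. It is an assumption-laden example: it runs on SQLite through Python's sqlite3 rather than SAP HANA, covers only two of the three tag types (LoB would follow the same pattern), and all table names and sample rows are invented. Filtering with EXISTS instead of joining every junction table is one possible way to keep one row per service; it is not something the question or answer prescribes.

```python
import sqlite3

# In-memory SQLite stands in for SAP HANA; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service    (service_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE industry   (industry_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE technology (technology_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE service_industry (
    service_id  INTEGER NOT NULL REFERENCES service(service_id),
    industry_id INTEGER NOT NULL REFERENCES industry(industry_id),
    PRIMARY KEY (service_id, industry_id)
);
CREATE TABLE service_technology (
    service_id    INTEGER NOT NULL REFERENCES service(service_id),
    technology_id INTEGER NOT NULL REFERENCES technology(technology_id),
    PRIMARY KEY (service_id, technology_id)
);
INSERT INTO service    VALUES (1, 'Cloud Migration'), (2, 'ERP Rollout');
INSERT INTO industry   VALUES (1, 'Banking'), (2, 'Retail');
INSERT INTO technology VALUES (1, 'AWS'), (2, 'Azure'), (3, 'SAP');
INSERT INTO service_industry   VALUES (1, 1), (1, 2), (2, 2);
INSERT INTO service_technology VALUES (1, 1), (1, 2), (2, 3);
""")

# Joining every junction table multiplies rows: 2 industries x 2 technologies.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM service s
    LEFT JOIN service_industry   si ON si.service_id = s.service_id
    LEFT JOIN service_technology st ON st.service_id = s.service_id
    WHERE s.service_id = 1
""").fetchone()[0]


def services_tagged(industry, technology):
    """Return each matching service once, filtering on both tag types."""
    rows = conn.execute("""
        SELECT s.name
        FROM service s
        WHERE EXISTS (SELECT 1
                      FROM service_industry si
                      JOIN industry i ON i.industry_id = si.industry_id
                      WHERE si.service_id = s.service_id AND i.name = ?)
          AND EXISTS (SELECT 1
                      FROM service_technology st
                      JOIN technology t ON t.technology_id = st.technology_id
                      WHERE st.service_id = s.service_id AND t.name = ?)
        ORDER BY s.name
    """, (industry, technology)).fetchall()
    return [name for (name,) in rows]


matches = services_tagged("Retail", "AWS")
print(joined_rows, matches)
```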

Good practices between SQL and elasticsearch

Imagine you have a SQL database like MySQL or PostgreSQL. You have two tables: user and car. One user can drive N cars and a car can be driven by N users, so you have a third "drive" table with two foreign keys.
Now you want your user table to go into Elasticsearch, because you want to search users by name, email, etc. Maybe you also need to do some searching on the car table.
I see three ways to achieve this, and I'd like to know which is best:
1) Abandon the SQL database. All your tables are now in Elasticsearch. You can search on whatever you want, but you must handle all your constraints manually.
2) Keep the structure in the SQL database: you keep your three tables, the primary keys and the foreign keys, but your tables contain only the Elasticsearch ID of the associated row in Elasticsearch. For example, in the user table you keep user_id and add a user_elasticsearch_id that points to the Elasticsearch row where you find the name, the email, etc. So you have your SQL constraints and you can search, but you must maintain two tables.
3) Duplicate. You don't touch your SQL database; you duplicate all the rows in Elasticsearch. You have your constraints and you can search, but again you must maintain two tables, and you have twice the data and twice the storage.
Now, brave fellows of Stack Overflow, what would you do in this case?
Thank you.

The most common setup for critical business data is having e.g. a SQL database as your primary datastore and Elasticsearch as additional search index. (= your solution 3).
An alternative for non business-critical data like logs etc. is having Elasticsearch standalone.
Solution 2 seems weird and is not an option for me.

Because you may have a lot of business rules mixed into your database and the application using it, I would be conservative and keep the DB, and use ES to index the user attributes I want to search on. ES would return scored results. When a result is selected, I would switch to the DB to retrieve all the information and relations.
So I would choose 2b: keep the DB and store the PK in ES (not the ES ID in the DB).
Keep in mind that you can force the ID in ES. It could be "user_PK" or something similar.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP.NET web page for the place I work for. I want to display data on the page depending on which Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, one column containing the AD group and a second column containing the list of computers that belong to that group.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has the computer Thomas in it and Group B has the computer Jones in it, then if you belong to only one of the groups you would see just one entry. However, if you belong to both groups you should see both the Thomas and Jones computers.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?

You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.

There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.

It is not really a bad idea to store multiple values in one column, but it depends on the searches you want to run.
If you only want to know which persons are part of a group, you can store the persons in one column with the group ID as the key. To update, you just rewrite the entire list for the group.
But if you want to search for a specific person who belongs to a group, then storing multiple persons in one column is not recommended. In that case it is better to use an intermediate table that stores the person ID and group ID.

Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.

A list has some columns, like name, family name, phone number, etc.,
and rows like name=john familyName=lee number=12321321
name=... familyName=... number=...
A SQL database works the same way. Every row in a SQL table is a record, so you just add the records of your list to your database using an INSERT query.
A complete explanation is here:
http://www.w3schools.com/sql/sql_insert.asp

This sounds like a typical many-to-many problem. You have many groups and many computers, and they are related to each other. In this situation, it is often recommended to use a mapping table, a.k.a. a "junction table" or "cross-reference table". This table consists solely of the two foreign keys to your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
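The Computer / Group / GroupComputer layout above could be sketched as follows. This is a minimal example under assumptions: it uses Python's sqlite3 with an in-memory SQLite database rather than SQL Server, the group table is renamed ad_group because GROUP is a reserved word, and the sample rows (Thomas, Jones, Group A, Group B) are borrowed from the question only as illustration. The visible_computers helper is hypothetical; in the real page the group names would come from Active Directory.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE computer (
    computer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE ad_group (
    group_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
-- Mapping table: one row per (group, computer) relationship.
CREATE TABLE group_computer (
    group_id    INTEGER NOT NULL REFERENCES ad_group(group_id),
    computer_id INTEGER NOT NULL REFERENCES computer(computer_id),
    PRIMARY KEY (group_id, computer_id)
);
INSERT INTO computer VALUES (1, 'Thomas'), (2, 'Jones');
INSERT INTO ad_group VALUES (1, 'Group A'), (2, 'Group B');
INSERT INTO group_computer VALUES (1, 1), (2, 2);
""")


def visible_computers(group_names):
    """Return the computers owned by any of the given groups, without duplicates."""
    if not group_names:
        return []
    placeholders = ", ".join("?" for _ in group_names)
    rows = conn.execute(f"""
        SELECT DISTINCT c.name
        FROM computer c
        JOIN group_computer gc ON gc.computer_id = c.computer_id
        JOIN ad_group g        ON g.group_id = gc.group_id
        WHERE g.name IN ({placeholders})
        ORDER BY c.name
    """, list(group_names)).fetchall()
    return [name for (name,) in rows]


print(visible_computers(["Group A"]))
print(visible_computers(["Group A", "Group B"]))
```

Adding a new group of computers is then just a matter of inserting rows, not creating a new table.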

You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

Is uniqueidentifier unique across databases?

I have a number of databases, each of which has an audit table. One of these databases is a 'common' database that has tables such as countries, states, etc.
The tables of this common database are referenced in the other databases as views.
Currently my audit table has an identity column. For listing audit records, I use a UNION to union the audit table in the normal database and audit table in the common database.
However the identity column in audit tables might be the same.
If I add a uniqueidentifier column in the audit table and use that as my unique id, will this column be unique across databases?

If you use the built-in methods of uniqueidentifier generation (like newId() or C#'s Guid.NewGuid()), yes, it will be unique across databases, servers, countries, whatever.
In fact, that's one of the big uses of GUIDs - replication. If you have the same GUID in two databases, it's guaranteed that it was put there on purpose.
However, do note that GUIDs do have their shortcomings - they might make your indices perform worse (or at least require more maintenance), and they are bigger in general.
Also, GUIDs aren't entirely random - there's a few different GUID generating algorithms, some of which are inherently unique (e.g. using a MAC address as part of the GUID - MACs are unique by default, although you can override them manually) - so one part is unique per physical server, and the machine makes sure it doesn't use the same timestamp for two GUIDs. There's also sequential GUIDs (newSequentialId()), which are handy in avoiding index fragmentation (very useful for clustered indices of course) - do note that those depend on the MAC address of the computer, and they are predictable, so if you're making the GUID public, and you depend on them being "secret", you might not want to use those. Some GUID algorithms are more predictable than others.

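It is universally unique for practical purposes. A GUID is a 128-bit value, usually written as 32 hexadecimal characters. For a randomly generated (version 4) GUID, 122 of those bits are random, so the chance that two given GUIDs are identical is about 1 / 2^122.
There is no mechanism that guarantees uniqueness; the extremely low probability of a collision does it all.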
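To see where the numbers come from, here is a tiny sketch using Python's standard uuid module as a stand-in for a random GUID generator (an assumption for illustration; it is not SQL Server's own generator). A random, version 4 GUID is 128 bits wide, and 6 of those bits are fixed by the format (4 version bits and 2 variant bits), which leaves 122 random bits.

```python
import uuid

guid = uuid.uuid4()                # random (version 4) GUID
hex_digits = guid.hex              # the GUID as hexadecimal characters, no dashes
total_bits = len(hex_digits) * 4   # each hex character encodes 4 bits
random_bits = total_bits - 6       # 4 version bits + 2 variant bits are fixed

print(guid, guid.version, total_bits, random_bits)
```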

relational database design for a routing table

I am trying to build a relational database to store IPv4 routing tables (unicast for now). Can anyone suggest how to go about doing that following best practices?
Requirements: This database will store routing tables for multiple routers/devices (1000+).
I am thinking this...
have a routers table that stores only routerid, hostname, etc.
have an interfaces table that stores only interface names (along with interfaceid) for each routerid
have a routingTable table that stores columns: IP prefix (subnet/route with mask); router (as routerid); outgoing interfaces as a list of interfaceids (in case of load balancing)
My question is basically: how do I store the outgoing interfaces, as a list or in multiple tables?
Similar concept applies to multicast routing table too.

I think I might have figured it out myself. I realized that storing the outgoing interface IDs (aka the OIL, outgoing interface list) in a list is not the best way. Instead, I would store the OIL in a table with columns oil_id, route_id, and out_interface_id (with the pair route_id, out_interface_id unique), assuming that out_interface_id is globally unique.
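A minimal sketch of that OIL table, assuming SQLite through Python's sqlite3 as the engine. The oil columns follow the description above; the router, interface, and route tables, their column names, and the sample rows are hypothetical fill-ins for the example.

```python
import sqlite3

# In-memory SQLite for illustration; everything except the oil columns is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE router (
    router_id INTEGER PRIMARY KEY,
    hostname  TEXT NOT NULL
);
CREATE TABLE interface (
    interface_id INTEGER PRIMARY KEY,   -- unique across all routers
    router_id    INTEGER NOT NULL REFERENCES router(router_id),
    name         TEXT NOT NULL
);
CREATE TABLE route (
    route_id  INTEGER PRIMARY KEY,
    router_id INTEGER NOT NULL REFERENCES router(router_id),
    prefix    TEXT NOT NULL,            -- e.g. '10.0.0.0/24'
    UNIQUE (router_id, prefix)
);
CREATE TABLE oil (
    oil_id           INTEGER PRIMARY KEY,
    route_id         INTEGER NOT NULL REFERENCES route(route_id),
    out_interface_id INTEGER NOT NULL REFERENCES interface(interface_id),
    UNIQUE (route_id, out_interface_id)
);
INSERT INTO router    VALUES (1, 'r1');
INSERT INTO interface VALUES (1, 1, 'eth0'), (2, 1, 'eth1');
INSERT INTO route     VALUES (1, 1, '10.0.0.0/24');
-- One load-balanced route with two outgoing interfaces.
INSERT INTO oil VALUES (1, 1, 1), (2, 1, 2);
""")

out_interfaces = [name for (name,) in conn.execute("""
    SELECT i.name
    FROM oil o
    JOIN interface i ON i.interface_id = o.out_interface_id
    WHERE o.route_id = ?
    ORDER BY i.name
""", (1,))]

# The unique pair rejects listing the same interface twice for one route.
try:
    conn.execute("INSERT INTO oil VALUES (3, 1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(out_interfaces, duplicate_rejected)
```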
