What should be the table (database) structure for this situation? - sql

I have some student data, I need to save in database. Refer pic here:
I want to generate report (Pi chart, Bar graph) using these data to show that how much understanding they keep of these subject.
What would be the the table structure like to save these data, considering there would be more than 100 students in a class.
I am using SQL database.
Your suggestion will be highly appreciated.

Have one table for your students. That will have the student name, roll number, etc, and an IDENTITY or AUTOINCREMENT column to create an INTEGER "id" column. This is a "surrogate key" that's much more Database friendly than using external identifiers (Such as Roll Number, which due to Sod's Law will sometimes be non-numeric, or contain a typo that needs changing after you've started recording the data).
Then have a similar table for Subjects.
Having the student and subject tables (sometimes known as Dimensions) separate from your data is called normalisation and is the basis of Relational Database Design.
Then you'd have a "Fact Table" to hold the "scores" for each student on each subject. It should have a key pointing to the Student table, a key pointing to the Subject table and two other columns, Theory and Numerical, in which to store your data.
You can then look at UNIQUE KEY constraints, PRIMARY KEYs, FOREIGN KEY constraints, INDEXes, etc, etc. Ideally from SQL Tutorials that are widely available on the interwebs.

Related

Adding an artificial primary key versus using a unique field [duplicate]

This question already has answers here:
Surrogate vs. natural/business keys [closed]
(19 answers)
Why would one consider using Surrogate keys vs Natural with ON UPDATE CASCADE?
(1 answer)
Closed 7 months ago.
Recently I Inherited a huge app from somebody who left the company.
This app used a SQL server DB .
Now the developer always defines an int base primary key on tables. for example even if Users table has a unique UserName field , he always added an integer identity primary key.
This is done for every table no matter if other fields could be unique and define primary key.
Do you see any benefits whatsoever on this? using UserName as primary key vs adding UserID(identify column) and set that as primary key?
I feel like I have to add add another element to my comments, which started to produce an essay of comments, so I think it is better that I post it all as an answer instead.
Sometimes there are domain specific reasons why a candidate key is not a good candidate for joins (maybe people change user names so often that the required cascades start causing performance problems). But another reason to add an ever-increasing surrogate is to make it the clustered index. A static and ever-increasing clustered index alleviates a high-cost IO operation known as a page split. So even with a good natural candidate key, it can be useful to add a surrogate and cluster on that. Read this for further details.
But if you add such a surrogate, recognise that the surrogate is purely internal, it is there for performance reasons only. It does not guarantee the integrity of your data. It has no meaning in the model, unless it becomes part of the model. For example, if you are generating invoice numbers as an identity column, and sending those values out into the real world (on invoice documents/emails/etc), then it's not a surrogate, it's part of the model. It can be meaningfully referenced by the customer who received the invoice, for example.
One final thing that is typically left out of this discussion is one particular aspect of join performance. It is often said that the primary key should also be narrow, because it can make joins more performant, as well as reducing the size of non-clustered indexes. And that's true.
But a natural primary key can eliminate the need for a join in the first place.
Let's put all this together with an example:
create table Countries
(
countryCode char(2) not null primary key clustered,
countryName varchar(64) not null
);
insert Countries values
('AU', 'Australia'),
('FR', 'France');
create table TourLocations
(
tourLocationName varchar(64) not null,
tourLocationId int identity(1,1) unique clustered,
countryCode char(2) not null foreign key references Countries(countryCode),
primary key (countryCode, tourLocationName)
);
insert TourLocations (TourLocationName, countryCode) values
('Bondi Beach', 'AU'),
('Eiffel Tower', 'FR')
I did not add a surrogate key to Countries, because there aren't many rows and we're not going to be constantly inserting new rows. I already know what all the countries are, and they don't change very often.
On the TourLocations table I have added an identity and clustered on it. There could be very many tour locations, changing all the time.
But I still must have a natural key on TourLocations. Otherwise I could insert the same tour location name with the same country twice. Sure, the Id's will be different. But the Id's don't mean anything. As far as any real human is concerned, two tour locations with the same name and country code are completely indistinguishable. Do you intend to have actual users using the system? Then you've got a problem.
By putting the same country and location name in twice I haven't created two facts in my database. I have created the same fact twice! No good. The natural key is necessary. In this sense The Impaler's answer is strictly, necessarily, wrong. You cannot not have a natural key. If the natural key can't be defined as anything other than "every meaningful column in the table" (that is to say, excluding the surrogate), so be it.
OK, now let's investigate the claim that an int identity key is advantageous because it helps with joins. Well, in this case my char(2) country code is narrower than an int would have been.
But even if it wasn't (maybe we think we can get away with a tinyint), those country codes are meaningful to real people, which means a lot of the time I don't have to do the join at all.
Suppose I gave the results of this query to my users:
select countryCode, tourLocationName
from TourLocations
order by 1, 2;
Very many people will not need me to provide the countries.countryName column for them to know which country is represented by the code in each of those rows. I don't have to do the join.
When you're dealing with a specific business domain that becomes even more likely. Meaningful codes are understood by the domain users. They often don't need to see the long description columns from the key table. So in many cases no join is required to give the users all of the information they need.
If I had foreign keyed to an identity surrogate I would have to do the join, because the identity surrogate doesn't mean anything to anyone.
You are talking about the difference between synthetic and natural keys.
In my [very] personal opinion, I would recommend to always use synthetic keys (and always call it id). The main problem is that natural keys are never unique; they are unique in theory, yes, but in the real world there are a myriad of unexpected and inexorable events that will make this false.
In database design:
Natural keys correspond to values present in the domain model. For example, UserName, SSN, VIN can be considered natural keys.
Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as a unique identifiers for the rows.
I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".
Yes having a dedicated int is a good thing for PK use.
you may have multiple alternate keys, that's ok too.
two great reasons for it:
it is performant
it protects against key mutation ( editing a name etc. )
A username or any such unique field that holds meaningful data is subject to changes. A name may have been misspelled or you might want to edit a name to choose a better one, etc. etc.
Primary keys are used to identify records and, in conjunction with foreign keys, to connect records in different tables. They should never change. Therefore, it is better to use a meaningless int field as primary key.
By meaningless I mean that apart from being the primary key it has no meaning to the users.
An int identity column has other advantages over a text field as primary key.
It is generated by the database engine and is guaranteed to be unique in multi-user scenarios.
it is faster than a text column.
Text can have leading spaces, hidden characters and other oddities.
There are multiple kinds of text data types, multiple character sets and culture dependent behaviors resulting in text comparisons not always working as expected.
int primary keys generated in ascending order have a superior performance in conjunction with clustered primary keys (which is a SQL-Server specialty).
Note that I am talking from a database point of view. In the user interface, users will prefer identifying entries by name or e-mail address, etc.
But commands like SELECT, INSERT, UPDATE or DELETE will always identify records by the primary key.
This subject - quite much like gulivar travels and wars being fought over which end of the egg you supposed to crack open to eat.
However, using the SAME "id" name for all tables, and autonumber? Yes, it is LONG establihsed choice.
There are of course MANY different views on this subject, and many advantages and disavantages.
Regardless of which choice one perfers (or even needs), this is a long established concept in our industry. In fact SharePoint tables use "ID" and autonumber by defualt. So does ms-access, and there probably more that do this.
The simple concpet?
You can build your tables with the PK and child tables with forighen keys.
At that point you setup your relationships between the tables.
Now, you might decide to add say some invoice number or whatever. Rules might mean that such invoice number is not duplicated.
But, WHY do we care of you have some "user" name, or some "invoice" number or whatever. Why should that fact effect your relational database model?
You mean I don't have a user name, or don't have a invoice number, and the whole database and relatonships don't work anymore? We don't care!!!!
The concept of data, even required fields, or even a column having to be unique ?
That has ZERO to do with a working relational data model.
And maybe you decide that invoice number is not generated until say sent to the customer. So, the fact of some user name, invoice number or whatever? Don't care - you can have all kinds of business rules for those numbers, but they have ZERO do to do with the fact that you designed a working relational data model based on so called "surrogate" or sometime called synthetic keys.
So, once you build that data model - even with JUST the PK "id" and FK (forighen keys), you are NOW free to start adding columns and define what type of data you going to put in each table. but, what you shove into each table has ZERO to do with that working related data model. They are to be thought as seperate concpets.
So, if you have a user name - add that column to the table. If you don't want users name, remove the column. As such data you store in the table has ZERO to do with the automatic PK ID you using - it not really any different then say what area of memory the computer going to allocate to load that data. Basic data operations of the system is has nothing to do with having build database with relationships that simple exist. And the data columns you add after having built those relationships is up to you - but will not, and should not effect the operation of the database and relationships you built and setup. Not only are these two concepts separate, but they free the developer from having to worry about the part that maintains the relationships as opposed to data column you add to such tables to store user data.
I mean, in json data, xml? We often have a master + child table relationship. We don't care how that relationship is maintained - but only that it exists.
Thus yes, all tables have that pk "ID". Even better? in code, you NEVER have to guess what the PK id is - it always the same!!!
So, data and columns you put and toss into a table? Those columns and data have zero to do with the PK id, and while it is the database generating that PK? It could be a web service call to some monkeys living in a far away jungle eating banana's and they give you a PK value based on how many bananas they eaten. We just really don't' care about that number - it is just internal house keeping numbers - one that we don't see or even care about in most code. And thus the number one rule to such auto matic PK values?
You NEVER give that auto PK number any meaning from a user and applcation point of view.
In summary:
Yes, using a PK called "id" for all tables? Common, and in fact in SharePoint and many systems, it not only the default, but is in fact required for such systems to operate.
Its better to use userid. User table is referenced by many other tables.
The referenced table would contain the primary key of the user table as foreign key.
Its better to use userid since its integer value,
it takes less space than string values of username and
the searches by the database engine would be faster
user(userid, username, name)
comments(commentid, comment, userid) would be better than
comments(commentid, comment, username)

A different way to Model a portion of this ERD

I have a very simple table diagram from modeling my application. The problem is I am second guessing my relation between Vendor and VendorOrder. The VendorOrders table should store all vendororders in the system. To get all orders for a certain apartment, you would just use the PK and FK relationship to gather that data. Is there anything I should improve with the overall design?
Diagram:
There's three things I see that you could improve this by doing.
Create an intersection table between your Apartment and Resident tables called ApartmentResidents, where each table references the intersection table with a one to many relationship. In this ERD, it only allows for one resident to be registered to an apartment. If a resident lives in more than one apartment for the lifetime of this database, you'll need to register them as an entirely new resident.
Intersection table example
In your Vendor table, instead of using a name as your primary key I would create an id instead. Using things that have a real-world value as your primary key can get messy for a number of reasons:
If two vendors have the same name, like "Johnson's Repair", you'll need to misspell one of them for it to be a valid key.
If you typo a vendor's name, you're also going to contain a reference to that typo in the foreign key tables (Which also might make it not show in results if you do a select query for the correct spelling).
Placing an index on a string is less performant than if you put it on an auto incrementing integer key.
(Optional) I usually name my database tables pluralized, like "Apartments", or "Vendors". It makes the SQL syntax read more like a sentence inside the query. If I remember right that's also one of the things that SQL's creator was going for too with the syntax design.

SQL Server database design with foreign keys

I have the following partial database design:
All the tables are dependent on each other so the table bvd_docflow_subdocuments is dependent on the table bdd_docflow_subsets
and the table bvd_docflow_subdocuments is dependent on bvd_docflow_subsets. So I thought I could me smart and use foreign keys on every table (and ON DELETE CASCADE). However the FK are being drilldown how further I go in to the tables.
The problem is the table bvd_docflow_documents has no point having a reference to the 1docflow_documentset_id` PK / FK. Is there a way (and maybe my design is crappy) that only the table standing above it has an FK relationship between the tables and not all the tables above it.
Edit:
More explanation:
In the bvd_docflow_subsets table information is stored about objects to create documents. There is an relation between that table and bvd_docflow_subdocuments table (This table stores master data about all the documents for an subset. (docflow_subset_id is in both tables). This is the link between those to tables.
Going further down we also got the table bvd_docflow_documents this table contains the actual document data. The link between bvd_docflow_documents and bvd_docflow_subdocuments is bvd_docflow_subdocument_id.
On every table I got an foreign key defined so when data is removed on a table all the data linked to that data is also removed.
However when we look to the bvd_docflow_documents table it has all the foreign keys from the other tables (docflow_subset_id and docflow_documentset_id) and there is the problem. The only foreign key needed for that bvd_docflow_documents table is docflow_subdocument_id and no other.
Edit 2
I have changed my design further and removed information that I don't need after initial import of the data.
See the following link for the (total) databse design:
https://sqldbm.com/Project/SQLServer/Share/_AUedvNutCEV2DGLJleUWA
The tables subsets, subdocuments and documents have a many to many relationship so I thought a table in between those 3 documents_subdocuments is the way to go were I define all the different keys for those tables.
I am not used to the database design first and then build it. But, for everything there is a first time, and I try to do make a database that is using standards and is using the power of SQL Server the correct way.
I'll address the bottom-most table and ignore the rest for the most part.
But first some comments. Your schema is simply a model of a system. To provide feedback, one must understand this "system" and how it actually works to evaluate your model. In addition, it is important to understand your entities and your reasons for choosing them and modelling them in the specified manner. Without that understanding all of this guessing based on experience.
And another comment. Slapping an identity column into every table is just lazy modelling IMO. Others will disagree, but you need to also enforce all natural keys. Do you have natural keys? It is rare not to have any. Enforce those that do exist.
And one last comment. Stop the ridiculous pattern of prepending the column names with the table names. And you should really think long and hard about using very long table names. Given what you have, I sense you need a schema for your docflow stuff.
For the documents table, your current PK makes no sense. Again, you've slapped an identity column into the table. By itself, this column is a key for the table. The inclusion of any other columns does not make the key any more "unique" - that inclusion is logical nonsense. Following your pattern, you would designate the identity column as the primary key. But ...
According to your image, the documents table is related to one and only one subdocument. You added a foreign key to that table - which matches the image. You also added additional columns and foreign keys to the "higher" tables. So now a document "points" to a specific subdocument. It also points to a specific subset - which may have no relationship to the subdocument. The same thought applies to the other FK. I have a doubt that this is logically correct. So why do these columns (and related FKs) exist? Perhaps this is the result of premature optimization - which everyone knows is the root of all evil coding. Again, it is impossible to know if this is "right" or even "useful" for your model.
To answer your question "... is there a way", the answer is obviously yes. You remove the columns of which you complain. You added them - Why? Is this perhaps a problem with the tool you are using?
And some last comments. There is nothing special about "varchar(50)". Perhaps this is a place holder that will be updated later. It may also be another sign of laziness. And generally speaking, columns with names like "type" and "code" tend to be foreign keys to "lookup" tables - because people like to add, modify, or remove these sorts categorization values over time. I'm also concerned about the column name overlap among the tables. "Location" exists in multiple tables, as do action_code and action_id. And a column named "id" (action_id) suggests a lookup to another table - is it? Should it be? Is there a relationship between action_id and action_code? From a distance it is impossible to answer any of these questions.
But designing a database is more art than science. Sometimes you just need to create something, populate it with some sample data, and then determine if it works for your needs. Everyone will get something wrong in the first try. That is expected; that is how you learn. The most difficult part is actually completing your first attempt.

Database design when having either foreign key or string

In my application I'm creating a case having a supervisor, but as the supervisor can be either an employee or an external supervisor, I'd like to be able to save either an employee id for the internal reference or a string for the name of the external supervisors name.
How should I implement this? Is having a table "case" and the sub-tables "case_internal_sv" and "case_external_sv" the way to go?
There's not a lot of information in your question on which to base an answer.
If there is common data and functionality across all types of supervisors then you will probably want one table to hold that common data. That table would establish the primary key values for supervisors and the case table would have a foreign key into this table. Information that is unique to either internal or external supervisors would go into separate tables and those tables would also have a foreign key back to the common supervisor level data.
This design is superior because you only have one place to go in order to find a list of all supervisors and because you can enforce the supervisor / case relationship directly in the database without a lot of code or additional constraints to ensure that "one and only one" of two columns is populated.
It's sufficiently superior, from a database point of view, that I'd consider using this design even if the data for internal and external supervisors is completely disjoint (which it's unlikely to be).
If your database allows you to define multiple primary keys you could have field employee and field employee_type combine to form the unique primary key. If not you could have an autogenerated primary key for the table and have a field for employee type and a field for employee_id.
What database are you using?
Having tables too normalized can do more harm than help. You should evaluate the benefits/drawbacks of having sub-tables based on your requirements.
A simple solution can be to have a 'sv_type' column in 'case' table. And have two columns
'internal_sv_id', a nullable foreign key to the employee table
'external_sv_name', a nullable string to save external name.
Then check for supervisor in one of these two columns based on the 'sv_type'
This design might not fully conform to 3rd normal form, but it can save lot of costly joins and allow integrity with employee table.
As for me, I'd go for the simplest solution in case of doubt.

how to design a schema where the columns of a table are not fixed

I am trying to design a schema where the columns of a table are not fixed. Ex: I have an Employee table where the columns of the table are not fixed and vary (attributes of Employee are not fixed and vary). Frequent addition of a new attribute / column is requirement.
Nullable columns in the Employee table itself i.e. no normalization
Instead of adding nullable columns, separate those columns out in their individual tables ex: if Address is a column to be added then create table Address[EmployeeId, AddressValue].
Create tables ExtensionColumnName [EmployeeId, ColumnName] and ExtensionColumnValue [EmployeeId, ColumnValue]. ExtensionColumnName would have ColumnName as "Address" and ExtensionColumnValue would have ColumnValue as address value.
Employee table
EmployeeId
Name
ExtensionColumnName table
ColumnNameId
EmployeeId
ColumnName
ExtensionColumnValue table
EmployeeId
ColumnNameId
ColumnValue
There is a drawback is the first two ways as the schema changes with every new attribute. Note that adding a new attribute is frequent and a requirement.
I am not sure if this is the good or bad design. If someone had a similar decision to make, please give an insight on things like foreign keys / data integrity, indexing, performance, reporting etc.
It might be useful to look at the current crop of NoSQL databases which allow you to store arbitrary sets of key-value pairs per record.
I would recommend you look at couchdb, mongodb, lucene, etc ...
If the schema changes often in an SQL database this ends up in a nightmare, especially with reporting.
Putting everything in (rowId, key, value) triads is flexible, but slower because of the huge number of records.
The way the ERP vendors do it is just make their schema of the fields they're sure of and add a largisch number of "flexfields" (i.e. 20 numbers, 20 strings, etc) in fixed named columns and use a lookup table to see which flexcolumn corresponds to what. This allows some flexibility for the future while essentially having a static schema.
I recommend using a combination of numbers two and three. Where possible, model tables for standard associations like addresses. This is the most ideal approach...
But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the EMPLOYEES table:
EMPLOYEE_ATTRIBUTE_TYPE_CODES (two columns, employee_attribute_type_code and DESCRIPTION)
EMPLOYEE_ATTRIBUTES (three columns: employee_id foreign key to EMPLOYEES, employee_attribute_type_code foreign key to EMPLOYEE_ATTRIBUTE_TYPE_CODES, and VALUE)
In EMPLOYEE_ATTRIBUTES, set the primary key to be made of:
employee_id
employee_attribute_type_code
This will stop duplicate attributes to the same employee.
If, as you say, new attributes will be added frequently, an EAV data model may work well for you.
There is a pattern, called observation pattern.
For explanation, see these questions/answers: one, two, three.
In general, looks like this:
For example, subjects employee, company and animal can all have observation Name (trait), subjects employee and animal can have observation Weight (measurement) and subject beer bottle can have observations Label (trait) and Volume (measurement). It all fits in the model.
Combine your ExtensionColumn tables into one
Property:
EmployeeID foreign key
PropertyName string
PropertyValue string
If you use a monotonic sequence for assigning primary keys in all your object tables then a single property table can hold properties for all objects.
I would use a combination of 1 and 2. If you are adding attributes frequently, I don't think you have a handle on the data requirements.
I supect some of the attributes being added belong in a another table. If you keep adding attribututes like java certified, asp certified, ..., then you need a certification table. This can be relationship to a certifications code table listing available certifications.
Attributes like manager may be either an attribute or relationship table. If you have multiple relationships between employees, then consider a relationship table with a releation type. Organizations with a matrix management structure will require a releationship table.
Addresses and phone numbers often go in separate tables. An address key like employee_id, address_type would be appropriate. If history is desired add a start_date column to the key.
If you are keeping history I recommend using start_date and end_date columns on the appropriate columns. I try to use a relationship where the record is active when 'start_date <= date-being-considered < end_date' is true.
Attributes like weight, eye color, etc.