I need to render a school schedule in a very detailed way (see screenshot). Every Day, Hour of the day, Room, Group and Teacher are separate entities that are related to one another in a certain way.
My issue is that I pull the data for every single cell with a separate query, which adds up to over 700 queries to render a week's schedule.
The question is: what is the best approach to store, manipulate and pull data for such demands?
I was thinking about making a separate 'static' table to store the actual values rather than the related IDs, but then I lose flexibility.
Here's my best attempt so far:
id
date
room_id
group_id
teacher_id
If you're sure that the data is very static, it might be more manageable to put teacher_name, group_name etc. as columns in your schedule table. The trade-offs are yours to weigh.
For teacher data, you most likely want it in a separate table to allow for future changes; imagine a name change. The same goes for Group and Room.
If you're concerned about query performance, know that the database will cache your results :).
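For the 700-queries problem itself, a single joined query can return the whole week at once and let the application place the rows into the grid in memory. A minimal sketch, assuming the schedule table above plus a hypothetical hour_id column and lookup tables named room, study_group and teacher:

SELECT s.date, s.hour_id,
       r.name AS room_name,
       g.name AS group_name,
       t.name AS teacher_name
FROM schedule s
JOIN room        r ON r.id = s.room_id
JOIN study_group g ON g.id = s.group_id
JOIN teacher     t ON t.id = s.teacher_id
WHERE s.date BETWEEN @week_start AND @week_end   -- parameters for the week being rendered
ORDER BY s.date, s.hour_id, r.name;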
I am trying to figure out how I can relate records from one table to each other.
I have a table with individual cases (e.g. disciplinary, grievance, etc.).
However, several of these cases could relate to each other.
For example, if a group of people get into a fight, every person involved gets an individual case, and then all of those cases need to be related/linked.
I'm struggling to figure out how best to store this data, whether in the same table or a new table.
This is the backend to a WinForms app. So now I need to be able to link records; in the example data the user would select that case IDs 1-4 are linked.
So the question is: how do I store the fact that these cases are related, given that other cases might be linked at a later date?
It sounds like you need some sort of "master case" or "incident".
I think I would say that an "incident" comprises one or more "cases", and keep both an incidents table and a cases table. In fact, cases might be a bad name; it might be more like IncidentPerson.
An alternative approach would be to have a "master case" id on each case. This could be NULL or the same case. I'm not as fond of this approach because it will likely lead to confusion down the road. One analyst will count cases per month using "cases" and another using "master cases" and you'll spend a lot of time trying to figure out why the numbers are different.
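A minimal sketch of the incident/case shape, with hypothetical names and T-SQL syntax:

CREATE TABLE Incident (
    IncidentID  INT IDENTITY PRIMARY KEY,
    Description NVARCHAR(500) NULL,
    CreatedOn   DATETIME NOT NULL DEFAULT GETDATE()
);

CREATE TABLE IncidentPerson (
    CaseID     INT IDENTITY PRIMARY KEY,
    IncidentID INT NOT NULL REFERENCES Incident (IncidentID),
    PersonID   INT NOT NULL,
    CaseType   NVARCHAR(50) NULL   -- disciplinary, grievance, ...
);

-- linking cases later just means pointing them at the same incident
UPDATE IncidentPerson SET IncidentID = @incidentId WHERE CaseID IN (1, 2, 3, 4);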
I am building software for academic institutions, and I wanted to ask a few questions:
Some new data will be generated each year (for new admissions) and some will be updated. Should I store all the data in one single table with an academic-year column (e.g. ac_year), or should I make separate tables for each year? There are also different tables to store information about the students, such as classes, marks, fees, hostel, etc. So should each kind of information, like Fee, be stored in separate tables like
Fee-2010
Fee-2011
Fee-2012...
or in one single Fee table with the year as a column?
One more point: after 1-2 years the database will get heavier, so would backing up the data for a single year still be possible with a single table (like Fee with the year as a column)?
Please answer keeping SQL Server 2005 in mind.
Thanks
As you phrase the question, the answer is clearly to store the data in one table, with the year (or date or other information) as a column. This is simply the right thing to do. You are dealing with the same entity over time.
The one exception would be when the fields are changing significantly from one year to the next. I doubt that is the case for your table.
If your database is really getting big, then you can look into partitioning the table by time. Each partition would be one year's worth of data. This would speed up queries that only need to access one year's worth. It also helps with backing up and restoring data. This may be the solution you are ultimately looking for.
"Really getting big" means at least millions of rows in the table. Even with a couple million rows in the table, most queries will probably run fine on decent hardware with appropriate indexes.
It's not typical to store the data in multiple tables based on time constraints. I would prefer to store it all in one table. In the future you may look at archiving old data, but it will still be a significant amount of time before performance becomes an issue.
It is always a better option to add a new property to an entity than to create a new entity for every different property value. That way, maintenance and querying will be much easier for you.
On the query-performance side, you don't have to worry about the internal workings of the data and the database. If a real performance issue ever arises, there are many solutions, such as creating an index on the year column in your situation.
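For example, assuming the single Fee table with an ac_year column discussed above:

CREATE INDEX IX_Fee_AcYear ON Fee (ac_year);

-- reporting on or exporting a single year is then just a filter
SELECT * FROM Fee WHERE ac_year = 2011;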
I have a database program for a store. As you know, there are two types of invoice in it: one for the things I buy and the other for the things I sell. The two tables are almost identical:
invoice table
Id
customerName
date
invoiceType
and invoiceDetails, which has
id
invoiceId
item
price
amount
My question is simple: is it best to keep the design like that, or to split each table into two separate tables?
A couple of my friends suggest splitting the tables, one for saleInvoice and the other for buyInvoice, to speed up querying.
So what are the pros and cons of each approach? I feel that if I split them, I'm not following the DRY rule.
I am using NHibernate, BTW, so it's kind of weird to have two identical classes with different names.
Both approaches would work. If you use the single-table approach, then the invoiceType column would be your discriminator field. In your NHibernate mapping, this discriminator field would be used by NHibernate to decide which type (i.e. a purchase or a sale) to instantiate for a given row in the table (see section 5.1.6 of the NHibernate mapping guide). For ad hoc SQL queries or reporting queries, you could create two views: one to return only rows where invoiceType = purchase and one to return only rows where invoiceType = sale.
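For example, the two views could be as simple as this (assuming the single invoice table from the question and 'purchase'/'sale' as the discriminator values):

CREATE VIEW PurchaseInvoice AS
SELECT id, customerName, date, invoiceType
FROM invoice
WHERE invoiceType = 'purchase';

CREATE VIEW SaleInvoice AS
SELECT id, customerName, date, invoiceType
FROM invoice
WHERE invoiceType = 'sale';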
Alternatively, you could create two separate tables, one for purchase and one for sales. As you point out, these two tables would have nearly identical schemas and nhibernate mapping files.
If you are anticipating very high transaction volumes, you would want to put purchases and sales on two different physical discs. With two different tables, this can be accomplished by putting them into different file groups. With a single table, you still could accomplish this by creating a SQL Server Partitioned Table. Before you go to this trouble, you might want to evaluate if this really is necessary and that disc access to the table is really going to be the performance bottleneck. You don't want to spend a lot of time doing premature optimization if it is not necessary.
My preference would be to have a single table with a discriminator column, to better follow DRY principles. Unless I had solid numbers that indicated it was necessary, I would hold off implementing a partitioned table until if and when it became necessary.
I'd ask myself, how do I intend to use this information? Will I need sales and buy invoices in the same queries? Am I likely to need specialized information eventually (highly likely in my experience) for each type? And if I do, will I need child tables for only one type? How would that affect referential integrity? Would a change to one automatically mean I needed a change to the other? How large is the table likely to be (it would have to be in the multi-millions before I would consider that it might need to be split out due to size alone)? How likely is it that I would mix up the information by accident if they are in the same table and include both when I didn't want to? The answers would determine whether I needed to split it out. I tend to see these as two separate functions, and it would take a lot to convince me to put them in one table.
I'm attempting to understand denormalization in databases, but almost all the articles Google has spat out are aimed at advanced DB administrators. I have a fair amount of knowledge about MySQL and MSSQL, but I can't really grasp this.
The only example I can think of when speed was an issue was when doing calculations on about 2,500,000 rows in two tables at a place I used to intern at. As you can guess, calculating that much on demand took forever and froze the dev server I was on for a few minutes. So right before I left, my supervisor wanted me to write a calculation table that would hold all the precalculated values and would be updated about every hour or so (this was an internal site that wasn't used often). However, I never got to finish it because I left.
Would this be an example of denormalization? If so, is this a good example of it or does it go much farther? If not, then what is it in simple terms?
Say you had an Excel file with 2 worksheets you want to use to store family contact details. On the first worksheet, you have names of your contacts with their cell phone numbers. On the second worksheet, you have mailing addresses for each family with their landline phone numbers.
Now you want to print Christmas card labels to all of your family contacts listing all of the names but only one label per mailing address.
You need a way to link the two normalized sets. All the data in the 2 sets you have is normalized. It's 'atomic,' representing one 'atom,' or piece of information that can't be broken down. None of it is repeated.
In a denormalized view of the 2 sets, you'd have one list of all contacts with the mailing addresses repeated multiple times (cousin Alan lives with Uncle Bob at the same address, so it's listed on both Alan and Bob's rows.)
At this point, you want to introduce a Household ID in both sets to link them. Each mailing address has one householdID, each contact has a householdID value that can be repeated (cousin Alan and Uncle Bob, living in the same household, have the same householdID.)
Now say we're at work and we need to track zillions of contacts and households. Keeping the data normalized is great for maintenance purposes, because we only want to store contact and household details in one place. When we update an address, we're updating it for all the related contacts. Unfortunately, for performance reasons, when we ask the server to join the two related sets, it takes forever.
Therefore, some developer comes along and creates one denormalized table with all the zillions of rows, one for each contact, with the household details repeated. Performance improves, and space considerations are tossed right out the window, as we now need space for 3 zillion rows instead of just 2.
Make sense?
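In table form, the two shapes might look like this (hypothetical names and column sizes):

-- normalized: each household fact is stored exactly once
CREATE TABLE Household (
    HouseholdID    INT PRIMARY KEY,
    MailingAddress VARCHAR(200) NOT NULL,
    LandlinePhone  VARCHAR(20)  NULL
);

CREATE TABLE Contact (
    ContactID   INT PRIMARY KEY,
    Name        VARCHAR(100) NOT NULL,
    CellPhone   VARCHAR(20)  NULL,
    HouseholdID INT NOT NULL REFERENCES Household (HouseholdID)
);

-- denormalized: one wide row per contact, household details repeated
CREATE TABLE ContactDenormalized (
    ContactID      INT PRIMARY KEY,
    Name           VARCHAR(100) NOT NULL,
    CellPhone      VARCHAR(20)  NULL,
    MailingAddress VARCHAR(200) NOT NULL,   -- same value on Alan's and Bob's rows
    LandlinePhone  VARCHAR(20)  NULL
);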
I would call that aggregation, not denormalization (if it is the quantity of orders, for example: SUM(Orders) per day). This is what OLAP is used for. Denormalization would be, for example, instead of having a PhoneType table and a PhoneTypeID in the Contact table, you would just have the PhoneType in the Contact table, thus eliminating one join.
You could also, of course, use indexed/materialized views to hold the aggregation values... but then you will slow down your updates, deletes and inserts.
Triggers are another way to accomplish this.
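A sketch of the indexed-view flavour in SQL Server, assuming a hypothetical Orders table with a non-nullable Amount column (Oracle's materialized views are the rough equivalent):

CREATE VIEW dbo.vDailyOrderTotals
WITH SCHEMABINDING
AS
SELECT OrderDate,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt        -- required in an indexed view with GROUP BY
FROM dbo.Orders
GROUP BY OrderDate;

-- the unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vDailyOrderTotals
ON dbo.vDailyOrderTotals (OrderDate);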
In an overly simplified form I would describe de-normalisation as reducing the number of tables used to represent the same data.
Customers and addresses are often kept in different tables to allow the concept of one customer having multiple addresses. (Work, Home, Current Address, Previous Address, etc)
The same could be said to apply to surnames and other properties, but usually only the current surname is ever of concern. As such, one might normalise all the way to having a Customer table and a Surname table, with foreign key relationships, etc., but then denormalise this by merging the two tables together.
The benefit of "normalise until it hurts" is that it forces one to consider a pure and (hopefully) complete representation of the data and possible behaviours and relationships.
The benefit of "de-normalise until it works" is to reduce certain maintenance and/or processing overheads, but sticking to the same basic model as derived by working out a normalised model.
In the "Surname" example, by denormalising one is able to add an index to the customers based on their Surname and Date of Birth. Without de-normalising the Surname and DoB are in different tables and the composite index is not possible.
Denormalizing can be beneficial; the example you provided is an instance of this. It is not ideal to calculate those values dynamically, as the cost is expensive, so you create a table with an id referencing the other table along with the calculated value.
The data is redundant, as it can be derived from another table, but due to production requirements this is the better design functionally.
I'm curious to see what others have to say on this topic, because I know my SQL professor would cringe at the term denormalize, but it has its practical uses.
Normal form would reject this table, as it is fully derivable from existing data. However, for performance reasons, data of this type is commonly found. For example, inventory counts are typically carried, even though they are derivable from the transactions that created them.
For smaller, faster sets, a view can be used to derive the aggregate. This provides the user the data they need (the aggregated value) rather than forcing them to aggregate it themselves. Oracle (and others?) have introduced materialized views to do what your supervisor was suggesting. These can be refreshed on various schedules.
If update volumes permit, triggers could be used to emulate a materialized view using a table. This may reduce the cost of maintaining the aggregated value; if not, it would at least spread the overhead over a greater period of time. It does, however, add the risk of creating a deadlock condition.
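A rough sketch of the trigger approach in T-SQL, assuming hypothetical Orders and DailyTotals tables (a real implementation would also need UPDATE and DELETE triggers):

CREATE TRIGGER trg_Orders_MaintainDailyTotals
ON Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- add the new rows' amounts to any totals that already exist
    UPDATE dt
    SET dt.TotalAmount = dt.TotalAmount + i.Amount
    FROM DailyTotals dt
    JOIN (SELECT OrderDate, SUM(Amount) AS Amount
          FROM inserted
          GROUP BY OrderDate) i ON i.OrderDate = dt.OrderDate;

    -- create totals for dates not seen before
    INSERT INTO DailyTotals (OrderDate, TotalAmount)
    SELECT i.OrderDate, SUM(i.Amount)
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM DailyTotals dt WHERE dt.OrderDate = i.OrderDate)
    GROUP BY i.OrderDate;
END;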
OLAP takes this simple case to an extreme interest in aggregates. Analysts are interested in aggregated values, not the details. However, if an aggregated value is interesting, they may drill into the details. Starting from normal form is still a good practice.
In the project where I work, I saw this structure in the database, and I ask all of you: what the hell kind of modeling is this?
TableX
Columns: isMonday, BeginingHourMonday, EndHourMonday, isTuesday, BeginingHourTuesday, EndHourTuesday and so on...
Is this NoSQL? I did not ask the person who created it, because I'm ashamed :$
Bye.
This is totally de-normalized data, NoSQL kind of. I just wonder why the month is not included; it could increase the de-normalization factor.
This is called a calendar table.
It is a very common and incredibly useful approach to dealing with and solving a lot of date and time related queries. It allows you to search, sort, group, or otherwise mine for data in interesting and clever ways.
@Brian Gideon is right. So is @iamgopal. And I am too, when I say "it depends on the nature of the data being modeled and stored in the database".
If it is a list of days with certain attributes/properties for each day, then yes, I would call it denormalized -- and 9 times out of 10 (or more) this will probably be the case. (I recall a database with 13 columns, one for each month in the year and one for total, and at the end of the year the user added 13 more columns for the next year. "Mr. Database", we called him.)
If this is a description of, say, work hours within a week, where each and every time the data is queried you always require the information for each day in the week, then the row would represent one "unit" of data (each column dependent upon the primary key of the table and all that), and it would be counter-productive to split the data into smaller pieces.
And, of course, it might be a combination of the two -- data that was initially normalized down to one row per day, and then intentionally denormalized for performance reasons. Perhaps 9 times out of 10 they do need a week's worth of information, and analysis showed massive performance gains by concatenating that data into one row?
As it is, without further information on usage and rationale, I'm siding with @iamgopal and upvoting him.
Looks like a structure of a timesheet for a given week.
If normalized, it might look like
columns: day, startHour, endHour
When this is converted to a pivot table in Excel, you get a timesheet kind of structure, which is good for input screens/views (as opposed to creating a view with the normalized structure).
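A sketch of that normalized shape in SQL, with a query that rebuilds the wide layout only when it is needed for display (table and column names are assumptions):

CREATE TABLE Timesheet (
    employee_id INT         NOT NULL,
    work_day    VARCHAR(10) NOT NULL,   -- 'Monday', 'Tuesday', ...
    start_hour  TIME        NOT NULL,
    end_hour    TIME        NOT NULL
);

SELECT employee_id,
       MAX(CASE WHEN work_day = 'Monday'  THEN start_hour END) AS BeginingHourMonday,
       MAX(CASE WHEN work_day = 'Monday'  THEN end_hour   END) AS EndHourMonday,
       MAX(CASE WHEN work_day = 'Tuesday' THEN start_hour END) AS BeginingHourTuesday,
       MAX(CASE WHEN work_day = 'Tuesday' THEN end_hour   END) AS EndHourTuesday
       -- ...and so on for the remaining days
FROM Timesheet
GROUP BY employee_id;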
Looking at that table, I don't see any good reason to do it that way, even for performance.
Let's see: if I change isMonday, isTuesday, etc. to ID_Day, I still get the same speed and logic. And if I change BeginingHourMonday to StartHour and EndHourMonday to EndHour, I still get the same effect.
I still have the day of the week and the start and end times, and that is basically the idea I get from the table structure. Maybe there is something I'm not seeing.
Regards