How to normalise a database for dates - sql

I'm creating a database with many records that have many dates, and many records within these tables can share the same date. The dates will range from 3 years prior to about 3 years in the future. Would an efficient system use the date datatype built into SQL, or separate tables for the date, month and year? Sorry if this seems like an amateur question; I've only learnt SQL recently for this project.
Thanks

Yes, as you already guessed, the best solution here is to use the date datatype built into SQL.

From the way you have asked the question, it sounds like you want to record aggregated data for each day/month/year. As @edward said, you will definitely want to use the built-in data type for the raw records (your "fact" table), and then you might also build up aggregated data in separate tables for the year or month.
Depending on the volume of data these might be stored physically, or just done through views on the fact table.
In general, you never want to remove information as you never know how it might be used in the future, which is why storing with the raw date is the correct option.
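A minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in for whatever engine you use. The `events` table, its columns and the `monthly_totals` view are hypothetical names. SQLite has no dedicated DATE type, so the dates are ISO-8601 text here; on an engine with a real DATE type you would use that for `event_date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "fact" table: one row per record, keeping the full raw date.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " event_date TEXT NOT NULL,"  # ISO-8601 text, e.g. '2023-01-05'
    " amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO events (event_date, amount) VALUES (?, ?)",
    [("2023-01-05", 10.0), ("2023-01-20", 5.0), ("2023-02-01", 7.5)],
)

# Monthly aggregates are derived from the raw dates through a view,
# so no information is thrown away in the fact table.
conn.execute(
    """
    CREATE VIEW monthly_totals AS
    SELECT strftime('%Y-%m', event_date) AS month,
           COUNT(*)                      AS n_events,
           SUM(amount)                   AS total
    FROM events
    GROUP BY strftime('%Y-%m', event_date)
    """
)

monthly = conn.execute(
    "SELECT month, n_events, total FROM monthly_totals ORDER BY month"
).fetchall()
print(monthly)
```

If the volume grows to the point where the view is too slow, the same SELECT can be used to fill a physical summary table instead.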

Related

What is the best practice for aggregating data in SQL Server?

I have a source table that is way too huge and queries take way too long for it to be usable directly for on-demand reporting.
The charts we generate are time-based, usually the resolution is in months or days, so my first idea was to create a "Months" table and a "Days" table, and to filter / sum / count into these tables, essentially running all possible queries in advance.
The question is how. My first idea was to write a C# console app to load all data for a month, and then somehow filter it (DataSet? DataView?) and then aggregate it (load into List<>? LinqToSQL?) and then update the Month and Day tables.
Is there a better way to do this? I apologize in advance for the lack of code in this post. I am writing this for advice BEFORE I start coding.
My 2 cents would be to create a calendar table (with a surrogate key) and use SSIS to aggregate the data.
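Whatever tool drives it, the aggregation itself can usually be done inside the database with a single INSERT ... SELECT rather than by pulling rows into an application. Here is a rough sketch of that, using Python's sqlite3 module only so that it is runnable; the `source_rows` and `days_summary` tables and the `rebuild_daily_summary` helper are made-up names, not anything from the question.

```python
import sqlite3


def rebuild_daily_summary(conn):
    """Recompute the hypothetical days_summary table from source_rows."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM days_summary")
        conn.execute(
            """
            INSERT INTO days_summary (day, row_count, total_value)
            SELECT date(created_at), COUNT(*), SUM(value)
            FROM source_rows
            GROUP BY date(created_at)
            """
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_rows (created_at TEXT NOT NULL, value INTEGER NOT NULL);
    CREATE TABLE days_summary (
        day         TEXT PRIMARY KEY,
        row_count   INTEGER NOT NULL,
        total_value INTEGER NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO source_rows VALUES (?, ?)",
    [
        ("2024-03-01 09:00:00", 4),
        ("2024-03-01 17:30:00", 6),
        ("2024-03-02 08:15:00", 1),
    ],
)

rebuild_daily_summary(conn)
daily = conn.execute(
    "SELECT day, row_count, total_value FROM days_summary ORDER BY day"
).fetchall()
print(daily)
```

A "Months" table would follow the same pattern with a month expression in the GROUP BY. For a very large source table you would probably refresh only the days that changed instead of rebuilding everything.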

What is the advantage of using a date dimension table over directly storing a date?

I have a need to store a fairly large history of data. I have been researching the best ways to store such an archive. It seems that a data warehouse approach is what I need to tackle. It seems highly recommended to use a date dimension table rather than a date itself. Can anyone please explain to me why a separate table would be better? I don't have a need to summarize any of the data, just access it quickly and efficiently for any given day in the past. I'm sure I'm missing something, but I just can't see how storing the dates in a separate table is any better than just storing a date in my archive.
I have found these enlightening posts, but nothing that quite answers my question.
What should I have in mind when building OLAP solution from scratch?
Date Table/Dimension Querying and Indexes
What is the best way to store historical data in SQL Server 2005/2008?
How to create history fact table?
Well, one advantage is that as a dimension you can store many other attributes of the date in that other table - is it a holiday, is it a weekday, what fiscal quarter is it in, what is the UTC offset for a specific (or multiple) time zone(s), etc. etc. Some of those you could calculate at runtime, but in a lot of cases it's better (or only possible) to pre-calculate.
Another is that if you just store the DATE in the table, you only have one option for indicating a missing date (NULL) or you need to start making up meaningless token dates like 1900-01-01 to mean one thing (missing because you don't know) and 1899-12-31 to mean another (missing because the task is still running, the person is still alive, etc). If you use a dimension, you can have multiple rows that represent specific reasons why the DATE is unknown/missing, without any "magic" values.
Personally, I would prefer to just store a DATE, because it is smaller than an INT (!) and it keeps all kinds of date-related properties, the ability to perform date math etc. If the reason the date is missing is important, I could always add a column to the table to indicate that. But I am answering with someone else's data warehousing hat on.
Let's say you've got a thousand entries per day for the last year. If you have a date dimension, your query grabs the date in the date dimension and then uses the join to collect the one thousand entries you're interested in. If there's no date dimension (and no index on the date column), your query reads all 365 thousand rows to find the one thousand you want. The dimension lookup is quicker and more efficient.
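To make the two arguments for a dimension concrete, here is a small sketch, again with Python's sqlite3 purely for illustration. The `dim_date` and `fact_task` tables, the negative keys for "no date" rows and the yyyymmdd surrogate key are all assumptions of this example, not a standard.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240106
        full_date  TEXT,                 -- NULL for the special rows
        is_weekend INTEGER NOT NULL,
        label      TEXT NOT NULL
    );
    CREATE TABLE fact_task (
        task         TEXT NOT NULL,
        finished_key INTEGER NOT NULL REFERENCES dim_date (date_key)
    );
    """
)

# Special rows: each reason for a missing date gets its own key,
# so no "magic" dates such as 1900-01-01 are needed.
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
    [(-1, None, 0, "unknown"), (-2, None, 0, "still running")],
)

# One row per calendar day, with attributes pre-calculated once.
day = date(2024, 1, 1)
while day <= date(2024, 1, 7):
    key = day.year * 10000 + day.month * 100 + day.day
    conn.execute(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
        (key, day.isoformat(), int(day.weekday() >= 5), DAY_NAMES[day.weekday()]),
    )
    day += timedelta(days=1)

conn.executemany(
    "INSERT INTO fact_task VALUES (?, ?)",
    [("import", 20240105), ("cleanup", 20240106), ("report", -2)],
)

query = """
    SELECT f.task, d.label
    FROM fact_task AS f
    JOIN dim_date AS d ON d.date_key = f.finished_key
    WHERE {condition}
    ORDER BY f.task
"""
weekend_tasks = conn.execute(query.format(condition="d.is_weekend = 1")).fetchall()
unfinished = conn.execute(query.format(condition="d.full_date IS NULL")).fetchall()
print(weekend_tasks, unfinished)
```

Holidays, fiscal quarters and so on would just be more columns on `dim_date`. If you only ever look rows up by day, a plain indexed DATE column gives you the same access path with less machinery.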

More tables or more rows: which is better in SQL?

I am building software for academic institutions, and I would like answers to a few questions:
As you know, some new data will be generated each year (for new admissions) and some existing data will be updated. Should I store all the data in one single table, separated by academic year (as a column like ac_year), or should I make separate tables for each year? There are also different tables to store information about the students, like classes, marks, fee, hostel, etc. So should each kind of info, like Fee, be stored in different tables like
Fee-2010
Fee-2011
Fee-2012...
Or in one single Fee table with year as a column?
One more point: after 1-2 years the database will get heavier, so would it still be possible to back up the data for a single year if it is all in a single table (like Fee with year as a column)?
And
Please answer keeping in mind SQL Server 2005.
Thanks
As you phrase the question, the answer is clearly to store the data in one table, with the year (or date or other information) as a column. This is simply the right thing to do. You are dealing with the same entity over time.
The one exception would be when the fields are changing significantly from one year to the next. I doubt that is the case for your table.
If your database is really getting big, then you can look into partitioning the table by time. Each partition would be one year's worth of data. This would speed up queries that only need to access one year's worth. It also helps with backing up and restoring data. This may be the solution you are ultimately looking for.
"Really getting big" means at least millions of rows in the table. Even with a couple million rows in the table, most queries will probably run fine on decent hardware with appropriate indexes.
It's not typical to store the data in multiple tables based on time constraints. I would prefer to store it all in one table. In the future, you may look at archiving old data, but it will still be a significant time before performance becomes an issue.
It is always a better option to add a new property to an entity than to create a new entity for every different value of that property. This way maintenance and querying will be much easier for you.
On the performance side of querying, you don't have to worry about the internals of the data and the database. If there is a real performance issue, there are many solutions, such as creating an index on the year column in your situation.
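As a rough illustration of the single-table layout, here is a sketch using Python's sqlite3 module (the question is about SQL Server 2005, so treat the engine as a stand-in). The `fee` table and `ac_year` column come from the question; `student_id`, `amount`, the index name and the `fees_for_year` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One Fee table for all years; the academic year is just a column.
    CREATE TABLE fee (
        student_id INTEGER NOT NULL,
        ac_year    INTEGER NOT NULL,
        amount     INTEGER NOT NULL,
        PRIMARY KEY (student_id, ac_year)
    );
    -- Index so that "one year only" queries do not scan every row.
    CREATE INDEX idx_fee_ac_year ON fee (ac_year);
    """
)
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?)",
    [(1, 2010, 500), (1, 2011, 550), (2, 2011, 550), (2, 2012, 600)],
)


def fees_for_year(conn, year):
    """Return (student_id, amount) rows for one academic year."""
    return conn.execute(
        "SELECT student_id, amount FROM fee WHERE ac_year = ? ORDER BY student_id",
        (year,),
    ).fetchall()


fees_2011 = fees_for_year(conn, 2011)

# Cross-year reporting is a plain GROUP BY; with Fee-2010, Fee-2011, ...
# tables the same report would need one UNION branch per year.
totals_by_year = conn.execute(
    "SELECT ac_year, SUM(amount) FROM fee GROUP BY ac_year ORDER BY ac_year"
).fetchall()
print(fees_2011, totals_by_year)
```

A new academic year needs no schema change at all, only new rows.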

projection of the entire database tables according to a specific column

Is there any database server that offers the possibility of a global projection of the entire database? For example, suppose we have 30 tables that each have a 'Year' column, the database holds data for the last 5 years, and we are interested in one year of data at a time. Is there any way to do a global projection so that we get a view of the database that includes only one year of data at a time?
If you really must not alter existing code, then make a view for every table that shows only the 'current year'; to see anything other than the current year you can then query the source table. To do this you rename each table and give the view the table's old name (though this is generally a sloppy practice).
Otherwise you're going to have to use a WHERE clause in all your queries.
Realistically this is something that your ORM should be dealing with, NOT your RDBMS, unless you're doing raw SQL queries in your code (in which case see the start of my answer for the VIEW option).
A UNION query with a WHERE clause to filter by a year date range should solve what you are describing.
All the major RDBMS support this functionality.
If the tables all have the same schema then it's easy; if not, you will probably have to introduce 'dummy' columns for some portions of the UNION.
[SGBD is the French term for an RDBMS: What does SGBD mean?]
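One way to sketch the view-per-table idea is to generate the views in a loop and keep the year of interest in a one-row settings table, so switching years is a single UPDATE. This is only an illustration with Python's sqlite3; the `sales` and `refunds` tables, the `selected_year` table and the `_selected` view suffix are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales   (id INTEGER PRIMARY KEY, Year INTEGER, amount INTEGER);
    CREATE TABLE refunds (id INTEGER PRIMARY KEY, Year INTEGER, amount INTEGER);

    -- One-row table holding the year every view should show.
    CREATE TABLE selected_year (Year INTEGER NOT NULL);
    INSERT INTO selected_year VALUES (2023);

    INSERT INTO sales   (Year, amount) VALUES (2022, 10), (2023, 20), (2023, 30);
    INSERT INTO refunds (Year, amount) VALUES (2022, 1), (2023, 2);
    """
)

# Generate one filtered view per table that has a Year column.
# Table names come from a fixed list here, never from user input.
for table in ("sales", "refunds"):
    conn.execute(
        f"CREATE VIEW {table}_selected AS "
        f"SELECT * FROM {table} "
        f"WHERE Year = (SELECT Year FROM selected_year)"
    )


def amounts(conn, view):
    """Return the amount column of a view, ordered by id."""
    return [row[0] for row in conn.execute(f"SELECT amount FROM {view} ORDER BY id")]


sales_2023 = amounts(conn, "sales_selected")

# Switching the whole "projection" to another year is one UPDATE.
conn.execute("UPDATE selected_year SET Year = 2022")
sales_2022 = amounts(conn, "sales_selected")
refunds_2022 = amounts(conn, "refunds_selected")
print(sales_2023, sales_2022, refunds_2022)
```

Note that the selected year is shared by every connection to the database, which may or may not be what you want when several users look at different years.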

What kind of structure is that?

In the project where I work I saw this structure in the database, and I ask all of you: what the hell kind of modeling is this?
TableX
Columns: isMonday, BeginingHourMonday, EndHourMonday, isTuesday, BeginingHourTuesday, EndHourTuesday and so on...
Is this NoSQL? I did not ask the person who created it because I'm ashamed :$
Bye.
This is totally denormalized data, NoSQL kind of. I just wonder why the month is not included; it could increase the denormalization factor.
This is called a calendar table.
It is a very common and incredibly useful approach to dealing with and solving a lot of date and time related queries. It allows you to search, sort, group, or otherwise mine for data in interesting and clever ways.
@Brian Gideon is right. So is @iamgopal. And I am too, when I say "it depends on the nature of the data being modeled and stored in the database".
If it is a list of days with certain attributes/properties for each day, then yes, I would call it denormalized -- and 9 times out of 10 (or more) this will probably be the case. (I recall a database with 13 columns, one for each month in the year and one for total, and at the end of the year the user added 13 more columns for the next year. "Mr. Database", we called him.)
If this is a description of, say, work hours within a week, where each and every time the data is queried you always require the information for each day in the week, then the row would represent one "unit" of data (each column dependent upon the primary key of the table and all that), and it would be counter-productive to split the data into smaller pieces.
And, of course, it might be a combination of the two -- data that was initially normalized down to one row per day, and then intentionally denormalized for performance reasons. Perhaps 9 times out of 10 they do need a week's worth of information, and analysis showed massive performance gains by concatenating that data into one row?
As it is, without further information on use and rationale I'm siding with @iamgopal, and upvoting him.
Looks like a structure of a timesheet for a given week.
If normalized, it might look like
columns: day, startHour, endHour
When this is converted to a pivot table in Excel, you will have a timesheet kind of structure, which is good for input screens/views (as opposed to building the screen directly on the normalized structure).
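A small Python sketch of the two shapes and how to go between them. The column names follow the question's spelling (isMonday, BeginingHourMonday, EndHourMonday, ...); the `normalise` and `pivot` helpers and the sample values are made up for illustration.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def normalise(wide_row):
    """Turn one wide row into (day, startHour, endHour) tuples."""
    rows = []
    for day in DAYS:
        if wide_row.get(f"is{day}"):
            rows.append(
                (day, wide_row[f"BeginingHour{day}"], wide_row[f"EndHour{day}"])
            )
    return rows


def pivot(rows):
    """Rebuild the wide, timesheet-style row from normalised tuples."""
    wide = {}
    for day in DAYS:
        wide[f"is{day}"] = False
        wide[f"BeginingHour{day}"] = None
        wide[f"EndHour{day}"] = None
    for day, start, end in rows:
        wide[f"is{day}"] = True
        wide[f"BeginingHour{day}"] = start
        wide[f"EndHour{day}"] = end
    return wide


# Sample wide row: works Monday 9-17 and Wednesday 10-14 only.
sample = pivot([])
sample.update({
    "isMonday": True, "BeginingHourMonday": 9, "EndHourMonday": 17,
    "isWednesday": True, "BeginingHourWednesday": 10, "EndHourWednesday": 14,
})

normalised = normalise(sample)
round_trip = pivot(normalised)
print(normalised)
```

The normalised form stores only the days that are actually worked, and the wide form can always be rebuilt from it for an input screen.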
Looking at that table, I don't see any good reason to do that, even for performance.
Let's see: if I change isMonday, isTuesday, etc. to ID_Day, I still get the same speed and logic. And if I change BeginingHourMonday to StartHour and EndHourMonday to EndHour, I still get the same effect.
I still have the day of the week and the start and end time, and that is basically the idea I get from the table structure. Maybe there is something I'm not seeing.
Regards