What kind of structure is that? - sql

In the project where I work I saw this structure in the database, and I ask all of you: what the hell kind of modeling is this?
TableX
Columns: isMonday, BeginingHourMonday, EndHourMonday, isTuesday, BeginingHourTuesday, EndHourTuesday and so on...
Is this NoSQL? I did not ask the person who created it because I'm ashamed :$
Bye.

This is totally denormalized data, NoSQL of a kind. I just wonder why the month is not included; it could increase the denormalization factor.

This is called a calendar table.
It is a very common and incredibly useful approach to dealing with and solving a lot of date and time related queries. It allows you to search, sort, group, or otherwise mine for data in interesting and clever ways.

#Brian Gideon is right. So is #iamgopal. And I am too, when I say "it depends on the nature of the data being modeled and stored in the database".
If it is a list of days with certain attributes/properties for each day, then yes, I would call it denormalized -- and 9 times out of 10 (or more) this will probably be the case. (I recall a database with 13 columns, one for each month in the year and one for total, and at the end of the year the user added 13 more columns for the next year. "Mr. Database", we called him.)
If this is a description of, say, work hours within a week, where each and every time the data is queried you always require the information for each day in the week, then the row would represent one "unit" of data (each column dependent upon the primary key of the table and all that), and it would be counter-productive to split the data into smaller pieces.
And, of course, it might be a combination of the two -- data that was initially normalized down to one row per day, and then intentionally denormalized for performance reasons. Perhaps 9 times out of 10 they do need a week's worth of information, and analysis showed massive performance gains by concatenating that data into one row?
As it is, without further information on use and rationale, I'm siding with #iamgopal, and upvoting him.

Looks like a structure of a timesheet for a given week.
If normalized, it might look like
columns: day, startHour, endHour
When this is converted to a pivot table in Excel, you get a timesheet-like structure, which is good for input screens/views (as opposed to building a view directly on the normalized structure).
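A minimal sketch of the normalized layout described above, and of pivoting it back into the wide, timesheet-like shape. It uses SQLite through Python's sqlite3 module purely so the example is runnable; the table name `schedule`, the `person_id` column, and the convention 1 = Monday are assumptions, not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (              -- hypothetical normalized table
    person_id  INTEGER NOT NULL,     -- assumed owner of the schedule
    day        INTEGER NOT NULL CHECK (day BETWEEN 1 AND 7),  -- 1 = Monday (assumed)
    start_hour INTEGER NOT NULL,
    end_hour   INTEGER NOT NULL,
    PRIMARY KEY (person_id, day)
);
INSERT INTO schedule VALUES (1, 1, 9, 17), (1, 2, 10, 18);
""")

# Pivot the per-day rows back into one wide row per person.
# Only Monday and Tuesday are shown; the other days follow the same pattern.
wide = conn.execute("""
SELECT person_id,
       MAX(CASE WHEN day = 1 THEN 1 ELSE 0 END)   AS isMonday,
       MAX(CASE WHEN day = 1 THEN start_hour END) AS BeginingHourMonday,
       MAX(CASE WHEN day = 1 THEN end_hour END)   AS EndHourMonday,
       MAX(CASE WHEN day = 2 THEN 1 ELSE 0 END)   AS isTuesday,
       MAX(CASE WHEN day = 2 THEN start_hour END) AS BeginingHourTuesday,
       MAX(CASE WHEN day = 2 THEN end_hour END)   AS EndHourTuesday
FROM schedule
GROUP BY person_id
""").fetchall()

print(wide)
```

The wide shape can therefore be produced on demand for an input screen while the stored data stays one row per day.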

Looking at that table, I don't see any good reason to do that, even for performance.
Let's see: if I change isMonday, isTuesday, etc. to ID_Day, I still get the same speed and logic. And if I change BeginingHourMonday to StartHour and EndHourMonday to EndHour, I still get the same effect.
I still have the day of the week and the start and end times, and that is basically the idea I get from the table structure. Maybe there is something I'm not seeing.
Regards

Related

How to normalise a database for dates

I'm creating a database which involves many records which have many dates. Many records within these tables can have the same date. These will range from 3 years prior to about 3 years in the future. Would an efficient system use the date datatype built into SQL, or individual tables for the Date, Month and Year? Sorry if this seems like an amateur question, I've only learnt SQL recently for this project.
Thanks
Yes, as you already guessed, the best solution here is to use the date datatype built into SQL.
From the way you have asked the question, it sounds like you want to record aggregated data for each day/month/year. As #edward said, you will definitely want to use the built in data type for the raw records - your "fact" table, and then you might also build up aggregated data in separate tables for the year or month.
Depending on the volume of data these might be stored physically, or just done through views on the fact table.
In general, you never want to remove information as you never know how it might be used in the future, which is why storing with the raw date is the correct option.
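A rough sketch of that advice: keep the raw date on the fact table and derive month/year aggregates through a view. SQLite (via Python's sqlite3) is used only to keep the example runnable; it has no dedicated DATE type, so the date is stored as ISO 8601 text here, whereas other engines would use their native date type. The names `fact`, `event_date`, `amount`, and `monthly_totals` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (                       -- hypothetical "fact" table
    id         INTEGER PRIMARY KEY,
    event_date TEXT NOT NULL,             -- ISO 8601 'YYYY-MM-DD' (SQLite stand-in for DATE)
    amount     INTEGER NOT NULL
);
INSERT INTO fact (event_date, amount) VALUES
    ('2012-01-15', 10), ('2012-01-20', 5), ('2012-02-01', 7), ('2013-01-03', 1);

-- Aggregates are derived from the raw dates; nothing is lost in the fact table.
CREATE VIEW monthly_totals AS
SELECT strftime('%Y', event_date) AS year,
       strftime('%m', event_date) AS month,
       SUM(amount)                AS total
FROM fact
GROUP BY strftime('%Y', event_date), strftime('%m', event_date);
""")

monthly = conn.execute(
    "SELECT year, month, total FROM monthly_totals ORDER BY year, month"
).fetchall()
print(monthly)
```

If the volume grows, the same aggregate could be materialized into a physical table instead of a view, as the answer suggests.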

What is the advantage of using a date dimension table over directly storing a date?

I have a need to store a fairly large history of data. I have been researching the best ways to store such an archive. It seems that a data warehouse approach is what I need. It seems highly recommended to use a date dimension table rather than a date itself. Can anyone please explain to me why a separate table would be better? I don't have a need to summarize any of the data, just access it quickly and efficiently for any given day in the past. I'm sure I'm missing something, but I just can't see how storing the dates in a separate table is any better than just storing a date in my archive.
I have found these enlightening posts, but nothing that quite answers my question.
What should I have in mind when building OLAP solution from scratch?
Date Table/Dimension Querying and Indexes
What is the best way to store historical data in SQL Server 2005/2008?
How to create history fact table?
Well, one advantage is that as a dimension you can store many other attributes of the date in that other table - is it a holiday, is it a weekday, what fiscal quarter is it in, what is the UTC offset for a specific (or multiple) time zone(s), etc. etc. Some of those you could calculate at runtime, but in a lot of cases it's better (or only possible) to pre-calculate.
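To illustrate the pre-calculated attributes, here is a small sketch of a date dimension joined to an archive table, run against SQLite through Python's sqlite3 for convenience. All names (`dim_date`, `archive`, `is_weekday`, `quarter`, `is_holiday`) are hypothetical, the quarter is the calendar quarter rather than a fiscal one, and the holiday on 2012-01-02 is made up for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (                    -- hypothetical date dimension
    date_key   TEXT PRIMARY KEY,           -- ISO 8601 date
    is_weekday INTEGER NOT NULL,
    quarter    INTEGER NOT NULL,           -- calendar quarter (assumption)
    is_holiday INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE archive (                     -- hypothetical archived data
    id       INTEGER PRIMARY KEY,
    date_key TEXT NOT NULL REFERENCES dim_date(date_key),
    payload  TEXT NOT NULL
);
""")

# Pre-calculate the attributes once, for January 2012.
d = date(2012, 1, 1)
while d <= date(2012, 1, 31):
    conn.execute(
        "INSERT INTO dim_date (date_key, is_weekday, quarter) VALUES (?, ?, ?)",
        (d.isoformat(), int(d.weekday() < 5), (d.month - 1) // 3 + 1),
    )
    d += timedelta(days=1)

# A made-up holiday, which could not be derived by date arithmetic alone.
conn.execute("UPDATE dim_date SET is_holiday = 1 WHERE date_key = '2012-01-02'")

conn.executemany(
    "INSERT INTO archive (date_key, payload) VALUES (?, ?)",
    [("2012-01-02", "a"), ("2012-01-03", "b"), ("2012-01-07", "c")],
)

# Rows that fall on a working day: weekday and not a holiday.
working = conn.execute("""
SELECT a.payload
FROM archive a
JOIN dim_date d ON d.date_key = a.date_key
WHERE d.is_weekday = 1 AND d.is_holiday = 0
ORDER BY a.payload
""").fetchall()
print(working)
```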
Another is that if you just store the DATE in the table, you only have one option for indicating a missing date (NULL) or you need to start making up meaningless token dates like 1900-01-01 to mean one thing (missing because you don't know) and 1899-12-31 to mean another (missing because the task is still running, the person is still alive, etc). If you use a dimension, you can have multiple rows that represent specific reasons why the DATE is unknown/missing, without any "magic" values.
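A sketch of the "multiple reasons for a missing date" idea, again on SQLite through Python's sqlite3. The negative surrogate keys, the `description` column, and the `task` table are assumptions chosen for illustration, not a standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (                -- hypothetical date dimension
    date_id     INTEGER PRIMARY KEY,
    full_date   TEXT,                  -- NULL on the special rows
    description TEXT NOT NULL
);
-- Special rows: one per reason the date is missing (keys are arbitrary).
INSERT INTO dim_date VALUES
    (-2, NULL, 'unknown'),
    (-1, NULL, 'still running'),
    (20120105, '2012-01-05', 'calendar date');

CREATE TABLE task (                    -- hypothetical fact table
    task_id     INTEGER PRIMARY KEY,
    end_date_id INTEGER NOT NULL REFERENCES dim_date(date_id)
);
INSERT INTO task VALUES (1, 20120105), (2, -1), (3, -2);
""")

rows = conn.execute("""
SELECT t.task_id, d.full_date, d.description
FROM task t
JOIN dim_date d ON d.date_id = t.end_date_id
ORDER BY t.task_id
""").fetchall()
print(rows)
```

The fact table never holds a NULL or a token date; the reason for the missing value lives in the dimension row.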
Personally, I would prefer to just store a DATE, because it is smaller than an INT (!) and it keeps all kinds of date-related properties, the ability to perform date math etc. If the reason the date is missing is important, I could always add a column to the table to indicate that. But I am answering with someone else's data warehousing hat on.
Let's say you've got a thousand entries per day for the last year. If you have a date dimension, your query grabs the date in the date dimension and then uses the join to collect the one thousand entries you're interested in. If there's no date dimension, your query reads all 365 thousand rows to find the one thousand you want. So the dimension is quicker and more efficient.

more table or more rows better in sql?

I am building software for academic institutions, and I would like answers to a few questions:
As you know, some new data will be generated each year (for new admissions) and some will be updated. Should I store all the data in one single table with an academic-year column (like ac_year), or should I make separate tables for each year? There are also different tables to store information about the students, like classes, marks, fee, hostel, etc. So each kind of info, like Fee, would be stored in different tables like
Fee-2010
Fee-2011
Fee-2012...
Or in one single Fee table with year as a column.
One more point: after 1-2 years the database will get heavier, so would backing up the data for a single year be possible with a single table (like Fee with year as a column)?
Please answer keeping in mind SQL Server 2005.
Thanks
As you phrase the question, the answer is clearly to store the data in one table, with the year (or date or other information) as a column. This is simply the right thing to do. You are dealing with the same entity over time.
The one exception would be when the fields are changing significantly from one year to the next. I doubt that is the case for your table.
If your database is really getting big, then you can look into partitioning the table by time. Each partition would be one year's worth of data. This would speed up queries that only need to access one year's worth. It also helps with backing up and restoring data. This may be the solution you are ultimately looking for.
"Really getting big" means at least millions of rows in the table. Even with a couple million rows in the table, most queries will probably run fine on decent hardware with appropriate indexes.
It's not typical to store the data in multiple tables based on time constraints. I would prefer to store it all in one table. In the future, you may look into archiving old data, but it will still be a significant time before performance becomes an issue.
It is always a better option to add a new property to an entity than to create a new entity for every different property. This way maintenance and querying will be much easier for you.
As for query performance, you don't have to worry about the internals of the data and database. If a real performance issue arises, there are many solutions, such as creating an index on the year column in your situation.
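A minimal sketch of the single-table design with the year as a column plus an index on it. It runs on SQLite through Python's sqlite3 rather than SQL Server 2005, so it does not show partitioning; the column names other than `ac_year` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fee (                         -- one table for all years
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,           -- hypothetical column
    ac_year    INTEGER NOT NULL,           -- academic year, as in the question
    amount     INTEGER NOT NULL            -- hypothetical column
);
CREATE INDEX idx_fee_ac_year ON fee (ac_year);

INSERT INTO fee (student_id, ac_year, amount) VALUES
    (1, 2010, 500), (1, 2011, 550), (2, 2011, 550), (2, 2012, 600);
""")

# One year's data is just a filter, not a different table name.
fees_2011 = conn.execute(
    "SELECT student_id, amount FROM fee WHERE ac_year = ? ORDER BY student_id",
    (2011,),
).fetchall()

# Cross-year reporting needs no UNION over Fee-2010, Fee-2011, ...
totals = conn.execute(
    "SELECT ac_year, SUM(amount) FROM fee GROUP BY ac_year ORDER BY ac_year"
).fetchall()

print(fees_2011, totals)
```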

Same data, two different ways to store it

The two tables below can both hold the same data - a full year, including some arbitrary info about each month
table1 (one row = one month)
------
id
month
year
info
table2 (one row = one year)
------
id
year
jan_info
feb_info
mar_info
apr_info
may_info
jun_info
jul_info
aug_info
sep_info
oct_info
nov_info
dec_info
Table A (table1)
Seems more intuitive because the month is numeric, but it's
12x more rows for a full year of data. Also, the
rows are smaller (fewer columns)
Table B (table2)
12x fewer rows for a full year of data, but
single rows are much larger
Possibly more difficult to add more arbitrary info for a month
In a real-world test scenario I set up, there were 12,000 rows in table1 for 10 years of data, where table2 had 150. I realize less is better, generally speaking, but ALWAYS? I'm afraid that I'm overlooking some caveat that I'll find later if I commit to one way. I haven't even considered disk usage or which query might be faster. What does MySQL prefer? Is there a "correct" way? Or is there a "better" way?
Thanks for your input!
Don't think about how to store it, think about how you use it. And also think about how it might change in the future. The storage structure should reflect use.
The first option is more normalized than the second, so I would tend to prefer it. It has the benefit of being easy to change, for example if every month suddenly needed a second piece of information stored about it. Usually this kind of structure is easier to populate, but not always. Think about where the data is coming from.
If you're only using this data for reports and you don't need to aggregate data across months, use the second option.
It really depends on what the data is for and where it comes from. Generally, though, the first option is better.
12000 rows for 10 years of data? I'd say that scales pretty well, since 12000 rows is next to nothing for a decent DBMS.
How are you using the database? Are you sure you really need to worry about optimizations?
If you need to store data that is specific to a month then you should absolutely store a row for each month. It's a much cleaner approach than the one with a column for each month.
"In a real-world test scenario I set up, there were 12,000 rows in table1 for 10 years of data, where table2 had 150."
How? There would have to be 80 months in a year for that to be the case.
Since this is an optimising problem the optimising answer applies: It depends.
What do you want to do with your data?
Table A is the normal form in which one would store this kind of data.
For special cases Table B might come in handy, but I'd need to think hard to find a good example.
So either go with A or give us some details about what you want to do with the data.
A note on disk space: total disk space is a non-issue, except for extremely huge tables. If anything, it is disk space per select that matters, and that should be less for the Table A design in most cases.
A note on math: if you divide 12000 by 12 and get 150 as a result, something is wrong.
How are you using the data? If you are often doing a report that splits the data out by month, the second is easier (and probably faster, but you need to test for yourself) to query. It is less normalized, but honestly, when was the last time we added a new month to the year?
In general I'd say one record per month as the more general solution.
One important issue is whether "info" is and must logically always be a single field. If there are really several pieces of data per month, or if it's at all likely that in the future there will be, then putting them all in one table gets to be a major pain.
Another question is what you will do with this data. You don't say what "info" is, so just for purposes of discussion let's suppose it's "sales for the month". Will you ever want to say, "In what months did we have over $1,000,000 in sales?" With one record per month, this is an easy query: "select year, month from sales where month_sales>1000000". Now try doing that with the year table: "select year, 'Jan' from year_sales where jan_sales>1000000 union select year, 'Feb' from year_sales where feb_sales>1000000 union select year, 'Mar' from year_sales where mar_sales>1000000 union ..." etc. Or maybe you'd prefer "select year, case when jan_sales>1000000 then 'Jan=yes' else 'Jan=no' end, case when feb_sales>1000000 then 'Feb=yes' else 'Feb=no' end, ... for the remaining months ... from year_sales where jan_sales>1000000 or feb_sales>1000000 or mar_sales>1000000 ..." Yuck.
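The two queries can be compared side by side in a small runnable sketch. It uses SQLite through Python's sqlite3 (the question is about MySQL, but the SQL shown is plain), only three month columns are created to keep it short, and the sales figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per month.
CREATE TABLE sales (
    year        INTEGER NOT NULL,
    month       INTEGER NOT NULL,
    month_sales INTEGER NOT NULL,
    PRIMARY KEY (year, month)
);
-- One row per year (only Jan-Mar shown for brevity).
CREATE TABLE year_sales (
    year      INTEGER PRIMARY KEY,
    jan_sales INTEGER,
    feb_sales INTEGER,
    mar_sales INTEGER
);
INSERT INTO sales VALUES (2010, 1, 1500000), (2010, 2, 900000), (2010, 3, 2000000);
INSERT INTO year_sales VALUES (2010, 1500000, 900000, 2000000);
""")

# One row per month: a single predicate answers the question.
per_month = conn.execute("""
SELECT year, month FROM sales
WHERE month_sales > 1000000
ORDER BY year, month
""").fetchall()

# One row per year: one SELECT per month column, glued together.
per_year = conn.execute("""
SELECT year, 1 AS month FROM year_sales WHERE jan_sales > 1000000
UNION ALL
SELECT year, 2 FROM year_sales WHERE feb_sales > 1000000
UNION ALL
SELECT year, 3 FROM year_sales WHERE mar_sales > 1000000
ORDER BY year, month
""").fetchall()

print(per_month, per_year)
```

Both queries return the same rows; the difference is how much SQL it takes, and the second form grows by one branch for every month column.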
Having many small records is not that much more of a resource burden than having fewer but bigger records. Yes, the total disk space requirement will surely be more because of per-record overhead, and index searches will be somewhat slower because the index will be larger. But the difference is likely to be minor, and frankly there are so many factors in database performance that this sort of thing is hard to predict.
But I have to admit that I just faced a very similar problem and went the other way: I needed a set of flags for each day of the week, saying "are you working on this day". I wrestled with whether to create a separate table with one record per day, but I ended up putting seven fields into a single record. My thinking is that there will never be additional data for each day without some radical change in the design, and I have no reason to ever want to look at just one day. The days are used for calculating a schedule and assigning due dates, so I can't imagine, in the context of this application, ever wanting to say "give me all the people who are working on Tuesday". But I can readily imagine the same data in a different application being used with precisely that question.

Using a smalldatetime or int for storing a month in database

I'm currently developing a monthly checklist system for our organization. A user may login, select a month, then submit a list of yes/no questions relevant to that month for our organization's purposes. Some of the questions are used in more than 1 month's checklist, so I'm creating an intersection table to facilitate this one-to-many relationship. The fields are ChecklistMonth and ChecklistQuestionID.
I'm unsure of how to store the ChecklistMonth field, however. If I use a smalldatetime, it seems a bit overkill, as I am only interested in the month. It will also look a bit dated in future years. On the other hand, it seems a bit wasteful to create a table with the fields MonthID and Month in order to identify only the month.
What is everyone's opinion on this? Thanks in advance.
If it is a month without regard to year, I would just use a TINYINT. I don't think that you need a separate lookup table since the numbers of the months are pretty distinct and universal (without getting into Chinese or Jewish calendars, etc...).
If you use any sort of datetime then you always need to remember the exact rules around it. Are you storing it as the first day of the month? Middle day of the month? What year? Plus, it's extra, unnecessary room being used in the DB.
EDIT: I thought that I had already added this to my response, but apparently not... remember to add a check constraint to the column:
CHECK (month BETWEEN 1 AND 12)
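A short sketch of that constraint in action on the intersection table, using SQLite through Python's sqlite3 (the question mentions smalldatetime, so the real target is presumably SQL Server, where the column would be a TINYINT). The table name `checklist_month_question` is an assumption; the two column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE checklist_month_question (   -- hypothetical table name
    ChecklistMonth      INTEGER NOT NULL CHECK (ChecklistMonth BETWEEN 1 AND 12),
    ChecklistQuestionID INTEGER NOT NULL,
    PRIMARY KEY (ChecklistMonth, ChecklistQuestionID)
)
""")

# A valid month is accepted.
conn.execute("INSERT INTO checklist_month_question VALUES (3, 101)")

# An out-of-range month violates the CHECK constraint.
try:
    conn.execute("INSERT INTO checklist_month_question VALUES (13, 101)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM checklist_month_question"
).fetchone()[0]
print(rejected, row_count)
```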
Bite the bullet, use the MonthID. It's the better decision in the long term, as it's clearer, and the waste involved in having an enumeration table for Month is trivial.
(And by the way, the decision to use an enumeration table for month, while some may consider it unnecessary, is, I think, the right one.)
I'd store it as an int. I agree with you that smalldatetime is overkill, and could be confusing in the future. Not to mention you'd still have to pull the month out to check if it's the month you're querying for.
Do you need a cross-reference table? Your MonthId should be 1=Jan, 2=Feb, up to 12. I think a field holding the month number is pretty self-documenting and doesn't require additional lookup tables, assuming of course that you're only dealing with a single calendar here.