I have a database that keeps track of attendance for students in a school. There's one table (SpecificClasses) with dates of all of the classes, and another table (Attendance) with a list of all the students in each class and their attendance on that day.
The school wants to be able to view that data in many different ways and to filter it by many different parameters. (I won't paste the entire query here because it is quite complicated and the details are not important for my question.) One of the views they want is the attendance of a specific student on a certain day of the week, so they can notice if, say, a student is missing every Tuesday.
To support that, the query uses DatePart("w",[SpecificClasses]![Day]). However, evaluating this for every class (and one student may take hundreds of classes in a semester) is quite time-consuming. So I was thinking of storing the day of the week manually in the SpecificClasses table, or perhaps even in the Attendance table to avoid a join, and being very careful in my events to keep this data up to date (that is, filling in the value whenever the secretaries insert a new SpecificClass or correct the Day field).
Then I was wondering whether I could just make a calculated field that would store this value. (The school has Access 2010 so I don't have to worry about compatibility). If I create a calculated field, does Access actually store that field and remember it for the future and not have to recalculate it each time?
As HansUp mentions in his answer, a calculated field cannot be indexed, so it might not give you much of a performance boost. However, since you are using Access 2010 you could create a "real" Integer field named [WeekdayNumber], put an index on it, and then use a Before Change data macro to insert the Weekday() value for you.
(The Weekday() function gives the same result as DatePart("w", ...).)
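A rough illustration of the idea, sketched with SQLite from Python (Access itself would use the data macro above; the table and index names here are just for the sketch). Note that Access's Weekday() numbers days 1 (Sunday) through 7 (Saturday) by default, while Python's date.weekday() is 0 (Monday) through 6 (Sunday), so the helper converts between them:

```python
import sqlite3
from datetime import date

def access_weekday(d: date) -> int:
    # Convert Python's 0=Monday..6=Sunday to Access's 1=Sunday..7=Saturday.
    return (d.weekday() + 1) % 7 + 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SpecificClasses (Day TEXT, WeekdayNumber INTEGER)")
conn.execute("CREATE INDEX idx_weekday ON SpecificClasses (WeekdayNumber)")

# Mon, Tue, Tue -- the weekday number is stored once, at insert time.
for d in (date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 9)):
    conn.execute("INSERT INTO SpecificClasses VALUES (?, ?)",
                 (d.isoformat(), access_weekday(d)))

# Every Tuesday class (Weekday() = 3 in Access numbering), via the index:
tuesdays = conn.execute(
    "SELECT Day FROM SpecificClasses WHERE WeekdayNumber = 3 ORDER BY Day"
).fetchall()
print(tuesdays)  # [('2024-01-02',), ('2024-01-09',)]
```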
I was wondering whether I could just make a calculated field that
would store this value.
No, not for a calculated field expression which uses DatePart(). Access supports a limited set of functions for calculated fields, and DatePart() is not one of those.
If I create a calculated field, does Access actually store that field
and remember it for the future and not have to recalculate it each
time?
Doesn't apply to your current case. But for a calculated field which Access would accept, yes, that is the way it works.
However, a calculated field cannot be indexed, so that limits how much improvement it can offer in terms of data retrieval speed. If you encounter another situation where you can create a valid calculated field, test the performance to see whether you notice any improvement (vs. calculating the value in a query).
For your DatePart() query problem, consider creating a calendar table with a row for each date and include the weekday number as a separate indexed field. Then you could join the calendar table into your query, avoid the need to compute DatePart() again, and allow Access to use the indexed weekday number to quickly identify which rows match the weekday of interest.
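The calendar-table approach can be sketched like this in SQLite (table and column names are invented for the example): one row per date with an indexed weekday number, joined to the classes so no date function runs per query row.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar (TheDate TEXT PRIMARY KEY, WeekdayNumber INTEGER);
CREATE INDEX idx_cal_wd ON Calendar (WeekdayNumber);
CREATE TABLE SpecificClasses (Day TEXT);
""")

# Populate the calendar once; 1 = Sunday to match Access's Weekday().
d = date(2024, 1, 1)
while d <= date(2024, 1, 31):
    conn.execute("INSERT INTO Calendar VALUES (?, ?)",
                 (d.isoformat(), (d.weekday() + 1) % 7 + 1))
    d += timedelta(days=1)

conn.executemany("INSERT INTO SpecificClasses VALUES (?)",
                 [("2024-01-02",), ("2024-01-03",), ("2024-01-09",)])

# All classes that fall on a Tuesday (weekday 3), found via the join:
rows = conn.execute("""
    SELECT sc.Day FROM SpecificClasses sc
    JOIN Calendar c ON c.TheDate = sc.Day
    WHERE c.WeekdayNumber = 3
    ORDER BY sc.Day""").fetchall()
print(rows)  # [('2024-01-02',), ('2024-01-09',)]
```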
In my database, I have a table that has to get info from two adjacent rows of another table.
Allow me to demonstrate. There's a bill that calculates the difference between two adjacent meter values and calculates the cost accordingly (i.e., I have a water meter and if I want to calculate the amount I should pay in December, I take the value I measured in November and subtract it from the December one).
My question is, how to implement the references the best way? I was thinking about:
Making each meter value an entity on its own. The bill will then have two foreign keys, one for each meter value. That way I can include other useful data, like measurement date and so on. However, implementing and validating adjacency becomes icky.
Making a pair of meter values an entity (or a meter value and a diff). The bill will reference that pair. However, that leads to data duplication.
Is there a better way? Thank you very much.
First, there is no such thing as "adjacent" rows in a relational database. Tables represent unordered sets. If you have a concept of ordering, it needs to be implemented using data in the rows. Let me assume that you have some sort of "id" or "creation date" that specifies the ordering.
Because you don't specify the database, I'll assume you are using one that supports the ANSI-standard window functions. In that case, you can get what you want using the LAG() function. The syntax to get the previous meter reading is something like:
select lag(value) over (partition by meterid order by readdatetime)
There is no need to have data duplication or some arcane data structure. LAG() should also be able to take advantage of appropriate indexes.
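Here is a runnable sketch of the LAG() approach using SQLite (which needs a version with window function support; the table and column names are invented for illustration). Each row is paired with the previous reading for the same meter, so the consumed amount is a simple difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (meterid INTEGER, readdatetime TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    (1, "2023-11-01", 100.0),
    (1, "2023-12-01", 130.0),
    (1, "2024-01-01", 155.0),
])

# The December bill is the December reading minus the November one, and so on.
rows = conn.execute("""
    SELECT readdatetime,
           value - LAG(value) OVER (PARTITION BY meterid
                                    ORDER BY readdatetime) AS consumed
    FROM readings
    ORDER BY readdatetime""").fetchall()
print(rows)
# [('2023-11-01', None), ('2023-12-01', 30.0), ('2024-01-01', 25.0)]
```

The first reading has no predecessor, so LAG() returns NULL for it, which a billing query would simply skip.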
I have a need to store a fairly large history of data. I have been researching the best ways to store such an archive. It seems that a data warehouse approach is what I need to tackle. It seems highly recommended to use a date dimension table rather than a date itself. Can anyone please explain to me why a separate table would be better? I don't have a need to summarize any of the data, just access it quickly and efficiently for any given day in the past. I'm sure I'm missing something, but I just can't see how storing the dates in a separate table is any better than just storing a date in my archive.
I have found these enlightening posts, but nothing that quite answers my question.
What should I have in mind when building OLAP solution from scratch?
Date Table/Dimension Querying and Indexes
What is the best way to store historical data in SQL Server 2005/2008?
How to create history fact table?
Well, one advantage is that as a dimension you can store many other attributes of the date in that other table - is it a holiday, is it a weekday, what fiscal quarter is it in, what is the UTC offset for a specific (or multiple) time zone(s), etc. etc. Some of those you could calculate at runtime, but in a lot of cases it's better (or only possible) to pre-calculate.
Another is that if you just store the DATE in the table, you only have one option for indicating a missing date (NULL) or you need to start making up meaningless token dates like 1900-01-01 to mean one thing (missing because you don't know) and 1899-12-31 to mean another (missing because the task is still running, the person is still alive, etc). If you use a dimension, you can have multiple rows that represent specific reasons why the DATE is unknown/missing, without any "magic" values.
Personally, I would prefer to just store a DATE, because it is smaller than an INT (!) and it keeps all kinds of date-related properties, the ability to perform date math etc. If the reason the date is missing is important, I could always add a column to the table to indicate that. But I am answering with someone else's data warehousing hat on.
Let's say you've got a thousand entries per day for the last year. If you have a date dimension, your query grabs the date in the date dimension and then uses the join to collect the one thousand entries you're interested in. If there's no date dimension, your query reads all 365 thousand rows to find the one thousand you want. Quicker, more efficient.
I'm working on a database, and can see that the table was set up with multiple columns (day,month,year) as opposed to one date column.
I'm thinking I should convert that to one, but wanted to check if there's much point to it.
I'm rewriting the site, so I'm updating the code that deals with it anyway, but I'm curious if there is any advantage to having it that way?
The only thing it gets used for is to compare data, where all columns get compared, and I think that an integer comparison might be faster than a date comparison.
Consolidate them to a single column - an index on a single date will be more compact (and therefore more efficient) than the compound index on 3 ints. You'll also benefit from type safety and date-related functions provided by the DBMS.
Even if you want to query on month of year or day of month (which doesn't seem to be the case, judging by your description), there is no need to keep them separate - simply create the appropriate computed columns and index them.
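As a sketch of that last point: even with a single date column you can index a derived part of it. SQLite allows an index directly on an expression (SQL Server would use a persisted computed column instead); the names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (d TEXT)")  # ISO dates: YYYY-MM-DD
# Index on the month part of the date, extracted with substr().
conn.execute("CREATE INDEX idx_month ON events (substr(d, 6, 2))")
conn.executemany("INSERT INTO events VALUES (?)",
                 [("2023-05-01",), ("2023-06-15",), ("2024-05-20",)])

# "Every May, regardless of year" can now use the expression index:
may = conn.execute(
    "SELECT d FROM events WHERE substr(d, 6, 2) = '05' ORDER BY d").fetchall()
print(may)  # [('2023-05-01',), ('2024-05-20',)]
```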
The date column makes sense for temporal data because it is fit for purpose.
However, if you have a specific use-case where you are more often comparing month-to-month data instead of using the full date, then there is a little bit of advantage - as you mentioned - int columns are much leaner to store into index pages and faster to match.
The downsides are that with 3 separate int columns, validation of dates is pretty much a front-end affair without resorting to additional coding on the SQL Server side.
Normally, a single date field is ideal, as it allows for more efficient comparison, validity-checks at a low level, and database-side date-math functions.
The only significant advantage of separating the components is when a day or month first search (comparison) is frequently needed. Maybe an "other events that happened on this day" sort of thing. Or a monthly budgeting application or something.
(Even then, a proper date field could probably be made to work efficiently with proper indexing.)
Yes, I would suggest you replace the 3 columns with a single column that contains the date as a Julian day number, which is a floating-point value. The part before the dot gives the day, the part after the dot gives the time within the day. Calculations will be easy and you can also easily convert Julian back into month/day/year etc. I believe that MS Excel stores dates internally as a floating-point number, so you will be in good company.
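SQLite's built-in julianday() and date() functions illustrate the round trip (one caveat: astronomical Julian days begin at noon, so a midnight date carries a .5 fraction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A single float replaces the (year, month, day) triple...
(jd,) = conn.execute("SELECT julianday('2024-03-10')").fetchone()

# ...and converts back to a calendar date on demand.
(back,) = conn.execute("SELECT date(?)", (jd,)).fetchone()

# Date arithmetic becomes plain subtraction:
(days,) = conn.execute(
    "SELECT julianday('2024-03-10') - julianday('2024-03-01')").fetchone()
print(back, days)  # 2024-03-10 9.0
```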
What I want to achieve:
I have a table denoting ID and Credits of each individual ID.
I want to rate each ID as rate(ID) = Credit(ID) / sum(Credit(ID)), where the sum is over all IDs.
I will be updating the table quite frequently and want to keep the sum(Credit(ID)) handy by creating another column and storing this sum in the table (say sigmaID), which should always have the exact same value in every row.
Whenever I change Credit for an ID (say add 100), I can simply do the same operation on this column value (add 100)
Do I have to update sigmaID for all rows? Will it be efficient?
I would like to periodically check that sigmaID is indeed sum(Credit(ID)) for consistency. Am I overdoing it? Will it be inefficient?
Is there any other approach to this (I am worried about efficiency)?
Kindly provide pure SQL queries as I need to put all of this in an UPDATE trigger which will calculate this rating (and loads of other formulas with other parameters of ID). I may have access to scripting language (PHP/python) but I don't know for sure. Hence the pure SQL request.
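A variation on the trigger idea the question describes, sketched in SQLite (names are invented): rather than copying the sum into every row, keep it in a one-row summary table, so each credit change is an O(1) adjustment instead of an update of all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credits (id INTEGER PRIMARY KEY, credit REAL);
CREATE TABLE totals (sigma REAL);
INSERT INTO totals VALUES (0);

-- Keep the running total in step with every insert and update.
CREATE TRIGGER credits_ins AFTER INSERT ON credits
BEGIN UPDATE totals SET sigma = sigma + NEW.credit; END;

CREATE TRIGGER credits_upd AFTER UPDATE OF credit ON credits
BEGIN UPDATE totals SET sigma = sigma + NEW.credit - OLD.credit; END;
""")
conn.executemany("INSERT INTO credits (credit) VALUES (?)", [(100,), (300,)])
conn.execute("UPDATE credits SET credit = credit + 100 WHERE id = 1")

# rate(ID) = Credit(ID) / sum(Credit(ID)), using the maintained total:
rows = conn.execute(
    "SELECT id, credit / (SELECT sigma FROM totals) FROM credits ORDER BY id"
).fetchall()
print(rows)  # [(1, 0.4), (2, 0.6)]
```

The periodic consistency check then reduces to comparing totals.sigma against SELECT SUM(credit) FROM credits.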
Unfortunately my English is very poor.
As I understand it, you want to keep all the information about your column values, past and present? That is, you want to log it? If so, you can create a journal table and log everything you want into it with a trigger.
best regards,
tato mumladze
Sorry for the long question title.
I guess I'm on to a loser on this one, but I'll ask on the off chance.
Is it possible to make the calculation of a calculated field in a table the result of an aggregate function applied to a field in another table?
i.e.
You have a table called 'mug', this has a child called 'color' (which makes my UK head hurt but the vendor is from the US, what you going to do?) and this, in turn, has a child called 'size'. Each table has a field called sold.
The size.sold increments by 1 for every mug of a particular colour and size sold.
You want color.sold to be an aggregate of SUM size.sold WHERE size.colorid = color.colorid
You want mug.sold to be an aggregate of SUM color.sold WHERE color.mugid = mug.mugid
Is there any way to make mug.sold and color.sold just work themselves out, or am I going to have to go mucking about with triggers?
You can't have a computed column directly reference a different table, but you can have it reference a user-defined function. Here's a link to an example of implementing a solution like this.
http://www.sqlservercentral.com/articles/User-Defined+functions/complexcomputedcolumns/2397/
No, it is not possible to do this. A computed column can only be derived from the values of other fields on the same row. To calculate an aggregate off another table you need to create a view.
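The view approach can be sketched like this in SQLite (table and column names follow the question): color.sold and mug.sold are not stored at all but derived on demand by aggregating the child tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mug   (mugid INTEGER PRIMARY KEY);
CREATE TABLE color (colorid INTEGER PRIMARY KEY, mugid INTEGER);
CREATE TABLE size  (sizeid INTEGER PRIMARY KEY, colorid INTEGER, sold INTEGER);

INSERT INTO mug VALUES (1);
INSERT INTO color VALUES (10, 1), (11, 1);
INSERT INTO size VALUES (100, 10, 3), (101, 10, 2), (102, 11, 5);

-- Per-color totals: SUM of size.sold grouped by colorid.
CREATE VIEW color_sold AS
  SELECT colorid, SUM(sold) AS sold FROM size GROUP BY colorid;

-- Per-mug totals: roll the sizes up through color to the mug.
CREATE VIEW mug_sold AS
  SELECT c.mugid, SUM(s.sold) AS sold
  FROM size s JOIN color c ON c.colorid = s.colorid
  GROUP BY c.mugid;
""")
color_rows = conn.execute(
    "SELECT * FROM color_sold ORDER BY colorid").fetchall()
mug_rows = conn.execute("SELECT * FROM mug_sold").fetchall()
print(color_rows)  # [(10, 5), (11, 5)]
print(mug_rows)    # [(1, 10)]
```

Because the views aggregate on demand, only size.sold ever needs to be written, and the totals can never drift out of sync.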
If your application needs to show the statistics ask the following questions:
Is it really necessary to show this in real time? If so, why? If it really is necessary, then you would have to use triggers to update a table (see the short Wikipedia article on denormalisation). Triggers will affect write performance on table updates, and the approach relies on the triggers being active.
If it is only necessary for reporting purposes, you could do the calculation in a view or a report.
If it is necessary to support frequent ad-hoc reports you may be into the realms of a data mart and overnight ETL process.