Can you please let me know the best approach for designing a data warehouse and dimensional model (SSAS cube) based on the requirement below?
The requirement: when the user selects a year (2015) from the drop-down shown in the image, I have to get the count of students who are active as of each month. The catch is that there is no option to select enrollstartdate and enrollenddate as two different dates (no role-playing dimension); there is only one filter, i.e. Year.
Requirement to get the active student count as of that month
There are a couple of possible approaches that come to mind. The first is a periodic snapshot fact table and the second is a timespan accumulating snapshot fact table.
In my opinion, the first is easier to implement, so I've provided some detail below that I hope you will find useful.
CREATE TABLE FactEnrollmentSnapshot
(
DateKey INT NOT NULL -- Reference to Date dimension table
, StudentKey INT NOT NULL -- Reference to Student dimension table
);
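CREATE TABLE DimStudent
(
StudentKey INT NOT NULL -- Surrogate key referenced by the fact table
-- , StudentId ? -- Business key from the source system; use the source column's type
-- , ...Other Student Attributes...
);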
CREATE TABLE DimDate
(
DateKey INT NOT NULL
, FullDate DATETIME NOT NULL
, Year SMALLINT
);
Assuming your date dimension is at the day grain, you could either store daily snapshots, or just store snapshots on the 15th of each month.
Depending on whether you need to get a count of unique students during 2015 or the most recent count of students in 2015 you could use the DISTINCT COUNT aggregation or the LastChild aggregation in SSAS. If you use LastChild, make sure your Date dimension is marked as a Time type.
Note that a snapshot style fact table results in semi-additive facts.
You could get the raw data to populate the fact table from your example source data by joining your source data to the Date dimension on the enrollment date range:
SELECT
StudentTable.StudentID
, DimDate.FullDate
FROM
StudentTable
INNER JOIN DimDate ON (DimDate.FullDate BETWEEN StudentTable.EnrollDate AND ISNULL(StudentTable.DisenrollDate,'9999-12-31'));
I didn't include the lookups for surrogate keys, for simplicity.
You can then get the answer for your business users by filtering on the Year attribute in the Date dimension.
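To make the snapshot idea concrete, here is a minimal sketch of the range join and the per-month count. It uses Python's sqlite3 module as a stand-in for SQL Server (so COALESCE replaces ISNULL), the three students are made-up sample rows, and the helper name active_counts is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentTable (StudentID INTEGER, EnrollDate TEXT, DisenrollDate TEXT);
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT NOT NULL, Year INTEGER);
INSERT INTO StudentTable VALUES
  (1, '2015-01-10', NULL),
  (2, '2015-01-20', '2015-02-20'),
  (3, '2015-03-01', '2015-03-10');
""")

# One snapshot date per month (the 15th), as suggested above.
for month in range(1, 13):
    conn.execute(
        "INSERT INTO DimDate VALUES (?, ?, ?)",
        (20150000 + month * 100 + 15, f"2015-{month:02d}-15", 2015),
    )


def active_counts(year):
    """Return {snapshot date: number of active students} for one year."""
    rows = conn.execute(
        """
        SELECT d.FullDate, COUNT(DISTINCT s.StudentID)
        FROM DimDate AS d
        LEFT JOIN StudentTable AS s
          ON d.FullDate BETWEEN s.EnrollDate
                            AND COALESCE(s.DisenrollDate, '9999-12-31')
        WHERE d.Year = ?
        GROUP BY d.FullDate
        ORDER BY d.FullDate
        """,
        (year,),
    ).fetchall()
    return dict(rows)


counts = active_counts(2015)
print(counts["2015-01-15"], counts["2015-02-15"], counts["2015-03-15"])
```

Note that student 3 enrolls and leaves between two snapshot dates, so a snapshot taken only on the 15th never counts them. If that matters, daily snapshots are the safer grain.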
I hope this is useful in getting you started with a possible approach.
Regards,
Jesse Dyson
I'm trying to make an attendance management system for my college project.
I'm planning to create one table for each month.
Each table will have
OCT(Roll_no int ,Name varchar, (dates...) bool)
Here dates will be from 1 to 30 and store boolean for present or absent.
Is this a good way to do it?
Is there a way to dynamically add a column for each day when the data is filled in?
Also, how can I populate data according to the current day?
Edit : I'm planning to make a UI which will have only two options (Present, absent) corresponding to each fetched roll no.
So, roll nos. and names are already going to be in the table. I'll just add status (present or absent) corresponding to each row in table for each date.
I would use Firebase. Make a node with a list of users. Then inside each user make an attendance node with timestamps for attended days. That way it's easier to parse. You would also leave room to bind data from other tables to users, as well as to add additional properties to each user.
Or do the SQL equivalent: make a table of users (names and user properties) with primary keys, and an attendance table whose foreign keys reference the user table and whose attendance column holds an array of timestamps representing attended days.
Either way, your UI would then only have to process timestamps and parse them into dates.
Though you might add additional columns as the years go by so that it isn't such a bulk download.
Edit: In your case you'd want the SQL columns to be by month, letting you select whichever month you'd like. For your UI, when inserting new attendance you'd simply add a column to the table if it does not already exist and then continue with the submission. On search/view you'd handle null results (say there were 2 months where no one attended at all): you'd catch any exceptions and continue with your display.
Ex:
User
Primary Key - Name
1 - Joe
2 - Don
3 - Rob
Attendance
Foreign Key - Dates Array (Oct 2017)
1 - 1508198400, 1508284800, 1508371200
2 - 1508284800
3 - 1508198400, 1508371200
I'd agree with Gordon. This is not a good way to store the data. (It might be a good way to present it). If you have a table with the following columns, you will be able to store the data you want:
roll_no (int)
Name (varchar)
Date (Date)
Present (bool)
If you want to then pull out the data for a particular month, you could just add this into your WHERE clause:
WHERE DATEPART(mm, [Date]) = 10 -- for October, or pass in a parameter
Dynamically adding columns is going to be a pain in the neck and is also quite messy.
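As a rough illustration of the one-row-per-student-per-day layout, here is a small sketch using Python's sqlite3 module. The table name Attendance, the column name att_date and the helpers mark and days_present are my own assumptions, and the Name column is left out for brevity. It filters on a date range rather than on the month number alone, so October of one year is not mixed with October of another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attendance (
  roll_no  INTEGER NOT NULL,
  att_date TEXT    NOT NULL,   -- ISO date, e.g. '2017-10-17'
  present  INTEGER NOT NULL,   -- 1 = present, 0 = absent
  PRIMARY KEY (roll_no, att_date)
);
""")


def mark(roll_no, att_date, present):
    """Record (or overwrite) one student's status for one day."""
    conn.execute(
        "INSERT OR REPLACE INTO Attendance VALUES (?, ?, ?)",
        (roll_no, att_date, int(present)),
    )


def days_present(year, month):
    """Return {roll_no: days present} for one calendar month."""
    start = f"{year:04d}-{month:02d}-01"
    # First day of the following month, used as an exclusive upper bound.
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = f"{next_year:04d}-{next_month:02d}-01"
    rows = conn.execute(
        """
        SELECT roll_no, SUM(present)
        FROM Attendance
        WHERE att_date >= ? AND att_date < ?
        GROUP BY roll_no
        """,
        (start, end),
    ).fetchall()
    return dict(rows)


# Made-up sample data: no new column is needed when a new day arrives.
mark(1, "2017-10-17", True)
mark(1, "2017-10-18", True)
mark(2, "2017-10-17", False)
mark(2, "2017-10-18", True)
mark(1, "2017-11-01", True)

october = days_present(2017, 10)
print(october)
```

A new day is just new rows, so the UI only ever inserts (roll_no, date, status) and never alters the table.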
I know there are a lot of solutions to this but I am looking for a simple query to get all the dates between two dates.
I cannot declare variables.
As per the comment above, it's just guesswork without your table structures and further detail. Also, are you using a 3NF database or star schema structures? Is this a transaction system or a data warehouse?
As a general answer, I would recommend creating a Calendar table, that way you can create multiple columns for Working Day, Weekend Day, Business Day, etc. and add a date key value, starting at 1 and incrementing each day.
Your query then is a very simple sub-select or join to the table to do something like
SELECT date FROM Calendar WHERE date BETWEEN <x> AND <y>
How to create a Calender table for 100 years in Sql
There are other options like creating the calendar table using iterations (eg, as a CTE table) and linking to that.
SQL - Create a temp table or CTE of first day of the month and month names
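For the CTE route, something along these lines should work without declaring any variables. This is only a sketch run through Python's sqlite3 module; the function name dates_between is hypothetical, and the date() function is SQLite syntax. As far as I know, SQL Server caps recursive CTEs at 100 levels by default, so long ranges there would need a MAXRECURSION hint or a real calendar table.

```python
import sqlite3


def dates_between(start, end):
    """Return every date from start to end (inclusive) as ISO strings."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(
        """
        WITH RECURSIVE calendar(d) AS (
          SELECT date(?)                      -- anchor row: the start date
          UNION ALL
          SELECT date(d, '+1 day')            -- step forward one day
          FROM calendar
          WHERE d < date(?)
        )
        SELECT d FROM calendar WHERE d <= date(?)
        """,
        (start, end, end),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(dates_between("2017-10-30", "2017-11-02"))
```

The final WHERE clause only matters when the start date is after the end date, in which case the result is empty.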
I have a fact table that has 4 date columns CreatedDate, LoginDate, ActiveDate and EngagedDate. I have a dimension table called DimDate whose primary key can be used as foreign key for all the 4 date columns in fact table. So the model looks like this.
But the problem arises when I want to sub-filter the measures by date column. For example: count all users who were created last month and are engaged this month. This is not possible with this design, because when I filter the measure by created date, I can't further filter on a different time window for engaged date. Since all the columns are connected to the same dimension, they do not work independently.
However, If I create a separate date dimension table for each of the columns, and join them like this then it works.
But this looks very cumbersome when I have 20 different date columns in the fact table in a real-world scenario, where I would have to create 20 different dimensions and connect them one by one. Is there any other way to achieve my scenario without creating multiple duplicated date dimensions?
This concept is called a role-playing dimension. You don't have to add the table to the DSV or the actual dimensions one time for each date. Instead add the date once, then go to the dimension usage tab. Click Add Cube Dimension, and then choose the date dim. Right-click and rename it. Then update the relationship to use the correct fields.
There's a good article on MSSQLTips.com that covers this topic.
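This is not SSAS itself, but the same idea in plain SQL may help show why one physical date table is enough: the table is joined once per role under a different alias. The sketch below runs through Python's sqlite3 module; the FactUser rows, the Month column on DimDate and the helper created_then_engaged are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE FactUser (UserId INTEGER, CreatedDateKey INTEGER, EngagedDateKey INTEGER);
INSERT INTO DimDate VALUES
  (20170915, 2017, 9), (20171002, 2017, 10), (20171020, 2017, 10);
INSERT INTO FactUser VALUES
  (1, 20170915, 20171002),   -- created in September, engaged in October
  (2, 20170915, 20170915),   -- created and engaged in September
  (3, 20171002, 20171020);   -- created and engaged in October
""")


def created_then_engaged(created_ym, engaged_ym):
    """Count users created in one (year, month) and engaged in another."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f.UserId)
        FROM FactUser AS f
        JOIN DimDate AS created ON created.DateKey = f.CreatedDateKey
        JOIN DimDate AS engaged ON engaged.DateKey = f.EngagedDateKey
        WHERE created.Year = ? AND created.Month = ?
          AND engaged.Year = ? AND engaged.Month = ?
        """,
        (*created_ym, *engaged_ym),
    ).fetchone()
    return row[0]


print(created_then_engaged((2017, 9), (2017, 10)))
```

Each alias filters independently, which is the behaviour the renamed cube dimensions give you in the Dimension Usage tab.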
I'm creating a timesheet using Infopath. The data will be stored in the database, so for that I have to create a table. This timesheet will be used for the whole year.
I need help in creating a SQL table. The table structure I want for this timesheet is:
Project_Category Mon Tue Wed Thu Fri Sat Sun Total
Project 1
Project 2
Project 3
Project 4
Project 5
Other
Total
The days should be with dates (Like, Monday 01/01/2013) or please suggest me if you have a better way to do this.
I would not store this data in a single table. Consider creating this using multiple tables instead of the single table.
For example, you could have a Projects table with ProjectId and ProjectName. Then you could easily link your ProjectId field to a ProjectSummary table which stores ProjectId, DateField and Total. I have no clue what your Total row is supposed to be, but if it's a calculation over a date range, use SQL to calculate those values and do not store them in the table.
Good luck -- there are lots of resources online to get started with SQL -- do a little searching.
As sgeddes has suggested, multiple tables will probably be a much better way to approach this.
Personally, I would avoid having more than one day per row and, to keep it flexible, allow more than one entry per day.
The structure I would create is as follows:
Entry_ID INT IDENTITY(1,1) PRIMARY KEY,
Timesheet_ID INT,
Project_ID INT,
DateTimeFrom DATETIME,
DateTimeTo DATETIME
This then allows date based calculations to be much simpler.
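e.g. the number of hours on project X between 20th June and 25th June would be a query like:
SELECT SUM(DATEDIFF(MINUTE, DateTimeFrom, DateTimeTo)) / 60.0 AS [HOURS] -- sum minutes first, then convert, so partial hours are not truncated
FROM MyTable
WHERE Project_ID = 1 -- the ID of project X
AND DateTimeFrom >= '2012-06-20' AND DateTimeTo < '2012-06-26'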
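Here is a small runnable sketch of the same kind of calculation, using Python's sqlite3 module instead of SQL Server. The table name TimesheetEntry, the sample rows and the helper project_hours are assumptions of mine, and strftime('%s', ...) is SQLite's way of getting seconds, standing in for DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimesheetEntry (
  Entry_ID     INTEGER PRIMARY KEY,
  Timesheet_ID INTEGER,
  Project_ID   INTEGER,
  DateTimeFrom TEXT,
  DateTimeTo   TEXT
);
INSERT INTO TimesheetEntry (Timesheet_ID, Project_ID, DateTimeFrom, DateTimeTo) VALUES
  (1, 1, '2012-06-20 09:00:00', '2012-06-20 12:30:00'),
  (1, 1, '2012-06-21 13:00:00', '2012-06-21 17:00:00'),
  (1, 2, '2012-06-21 09:00:00', '2012-06-21 12:00:00'),
  (1, 1, '2012-06-27 09:00:00', '2012-06-27 10:00:00');
""")


def project_hours(project_id, start, end_exclusive):
    """Total hours logged on one project within [start, end_exclusive)."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(
                 CAST(strftime('%s', DateTimeTo) AS INTEGER)
               - CAST(strftime('%s', DateTimeFrom) AS INTEGER)), 0) / 3600.0
        FROM TimesheetEntry
        WHERE Project_ID = ?
          AND DateTimeFrom >= ?
          AND DateTimeTo < ?
        """,
        (project_id, start, end_exclusive),
    ).fetchone()
    return row[0]


print(project_hours(1, "2012-06-20", "2012-06-26"))
```

Because each entry is its own row, two entries on the same day simply add up, and the weekly grid from the question becomes a reporting query rather than a table layout.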
I have a task to design a SQL database that will record values for the company's commodity products (sku numbers). The company has 7600 unique product items that need to be tracked, and each product will have approximately 200 values over the period of a year (one value per product per day, over the period of a year).
My first guess is that the sku numbers go top to bottom (each sku has a row) and each date is a column.
The data will be used to view in chart / graph format and additional calculations will be displayed against those (such as percentage profit margin etc)
My question is:
- is this layout advisable?
- do I have to be cautious of anything, if this type of data goes back about 15 yrs (each table will represent a year)
Any suggestions?
It is better to have only 3 columns, instead of the many you are suggesting:
sku date value
-------------------------
1 2011-01-01 10
1 2011-01-02 12
2 2011-01-01 5
This way you can easily add another column if you want to record something else about a given product per date.
I would suggest a table for your products, and a table for the historical values. Maybe create an index for the historical values based on date if you plan to select for specific time periods.
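create table products (
id integer primary key,
sku integer not null,
name varchar(100),
description varchar(400)); -- "desc" is a reserved word, so it is spelled out
create table product_values ( -- "values" is a reserved word, so the table is renamed
id integer primary key,
product_id integer not null,
value_date date not null, -- "timestamp" is a type name in several databases
value decimal(18, 4),
constraint fk_prod_price foreign key (product_id) references products (id));
create index idx_price on product_values (value_date);
NOTE: this is generic SQL; adjust the data types and lengths for your database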
If you do as @fiver wrote, you don't need a table for each year either; everything goes in one table. Add an index on sku and date for faster searching.
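A quick sketch of the single three-column table with a combined key on sku and date, run through Python's sqlite3 module. The table name sku_value and the helper series are hypothetical, the 2011 rows are the ones from the example above, and the 2012 row is made up to show that several years can share one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sku_value (
  sku        INTEGER NOT NULL,
  value_date TEXT    NOT NULL,        -- ISO date, e.g. '2011-01-01'
  value      REAL    NOT NULL,
  PRIMARY KEY (sku, value_date)       -- also serves as the sku/date index
);
INSERT INTO sku_value VALUES
  (1, '2011-01-01', 10),
  (1, '2011-01-02', 12),
  (2, '2011-01-01', 5),
  (1, '2012-01-01', 11);
""")


def series(sku, start, end):
    """Return (date, value) pairs for one sku within [start, end]."""
    return conn.execute(
        """
        SELECT value_date, value
        FROM sku_value
        WHERE sku = ? AND value_date BETWEEN ? AND ?
        ORDER BY value_date
        """,
        (sku, start, end),
    ).fetchall()


print(series(1, "2011-01-01", "2011-12-31"))
```

The result is already in the shape a chart needs (one point per date), and a year is just a filter on the date column.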