How to design a star schema - sql

I am confused about where to start when designing a star schema.
For example, I have the following tables in my database:
Branch(branchNo, bStreetAddress, bCity)
LoanManager(empNo, empName, phone, branchNo)
Customer(custNo, custName, profession, streetAddress, city, state)
Account(accNo, accType, balance, accDate, custNo)
LoanContract(contractNo, loanType, amount, loanDate, empNo, custNo)
I want to design a data warehouse to analyze the loans, answering questions such as:
The total amount of loans in 2008.
For each loan type with more than 10 loan contracts, the loan type and the number of contracts.
When creating a star schema, where should I start?
From what I understand, every star schema must have a center, and the central fact table contains "measures" and "relations to other fact tables".
So, when designing a star schema, do we always start from the center, decide on the measures first, and then choose the proper relations to the other tables?
I also have another question: what should we choose as measures?
When choosing measures, what questions should I ask myself?

The design of a star schema is always driven by the client's business needs. What are the questions asked? How fine-grained should the answers be?
In your example, interesting questions might be "Number of Contracts by Branch or LoanManager" or "Managed sum of Loans by Branch or LoanManager". In this case, Branch and LoanManager would become your dimensions, while Count(LoanContract) and Sum(LoanContract.amount) would be your measures. A common additional dimension is time, usually week or quarter.
The schema for answering those questions could look like this:
DimBranch ( branchNo )
DimLoanManager ( empNo )
DimQuarter ( year, qNo ) -- qNo in (1,2,3,4)
DimWeek ( year, weekNo ) -- weekNo in (0..53), depending on business rules
Measures ( branchNo, empNo, year, qNo, weekNo, numContracts, sumLoans )
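As a rough sketch of how such a Measures table could be populated from the source tables, the snippet below aggregates LoanContract by branch, loan manager, year and quarter. It uses SQLite through Python's sqlite3 module only so that it can be run as-is; the sample rows are invented, the week dimension is left out for brevity, and the quarter expression assumes loanDate is stored as an ISO 'YYYY-MM-DD' string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LoanManager (empNo INTEGER PRIMARY KEY, empName TEXT, branchNo INTEGER);
CREATE TABLE LoanContract (contractNo INTEGER PRIMARY KEY, loanType TEXT,
                           amount REAL, loanDate TEXT, empNo INTEGER);
-- Invented sample data, for illustration only
INSERT INTO LoanManager VALUES (1, 'Ann', 10), (2, 'Bob', 20);
INSERT INTO LoanContract VALUES
  (100, 'home', 1000.0, '2008-01-15', 1),
  (101, 'car',   500.0, '2008-02-20', 1),
  (102, 'home', 2000.0, '2008-05-03', 2);
""")

# One fact row per (branch, manager, year, quarter) with the two measures.
fact_rows = conn.execute("""
SELECT m.branchNo,
       c.empNo,
       CAST(strftime('%Y', c.loanDate) AS INTEGER) AS year,
       (CAST(strftime('%m', c.loanDate) AS INTEGER) + 2) / 3 AS qNo,
       COUNT(*)      AS numContracts,
       SUM(c.amount) AS sumLoans
FROM LoanContract c
JOIN LoanManager m ON m.empNo = c.empNo
GROUP BY m.branchNo, c.empNo, year, qNo
ORDER BY m.branchNo, c.empNo, year, qNo
""").fetchall()

for row in fact_rows:
    print(row)
```

The grain here is one row per manager per quarter; a finer grain (for example per week) would just add the corresponding column to the SELECT and GROUP BY.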
For the business questions you already posed in your question, the dimensions and measures would be as follows:
dimension: year, measure: Sum(LoanContract.amount)
dimension: loanType, measure: Count(LoanContract)
Putting those two into the same star schema doesn't make much sense, since they share neither dimensions nor measures.
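For reference, the two business questions can also be answered directly against LoanContract. This is only a sketch: it runs on SQLite via Python's sqlite3 module, the rows are made up, and it assumes loanDate is an ISO 'YYYY-MM-DD' string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LoanContract ("
    "contractNo INTEGER PRIMARY KEY, loanType TEXT, amount REAL, loanDate TEXT)"
)

# Invented sample data: 11 'home' loans in 2008, plus two 'car' loans.
rows = [(i, "home", 100.0, "2008-03-01") for i in range(1, 12)]
rows += [(12, "car", 50.0, "2008-06-01"), (13, "car", 70.0, "2009-01-10")]
conn.executemany("INSERT INTO LoanContract VALUES (?, ?, ?, ?)", rows)

# Question 1: dimension = year, measure = SUM(amount)
total_2008 = conn.execute(
    "SELECT SUM(amount) FROM LoanContract WHERE strftime('%Y', loanDate) = '2008'"
).fetchone()[0]

# Question 2: dimension = loanType, measure = COUNT(*), filtered with HAVING
popular_types = conn.execute(
    "SELECT loanType, COUNT(*) FROM LoanContract "
    "GROUP BY loanType HAVING COUNT(*) > 10"
).fetchall()

print(total_2008)      # 1150.0
print(popular_types)   # [('home', 11)]
```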

Related

Dynamic mapping with SQL

I have an SQL db which contains a table of license plate numbers (plates), a table of parking lots (places), and a table corresponding one to the other over time (parking), each row placing a specific vehicle plate in a specific place at a particular time.
create table parking (
plateid integer,
placeid integer,
time_period integer
);
This means each row as a whole is unique but the plate/place combinations are not.
I need to count how many times each plate appears in each place. There are an indeterminate number of both, so I cannot maintain a table per place and simply count the entries.
This is easy enough using a general purpose programming language applied to the list. Is there a way to do it with straight SQL?
Are you just looking for aggregation?
select plateid, placeid, count(*)
from parking
group by plateid, placeid;
You can group by both plate and place and count occurrences:
SELECT plateid, placeid, count(*)
FROM parking
GROUP BY plateid, placeid;
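To see the aggregation on concrete data, here is a small sketch using the parking table from the question (columns plateid, placeid, time_period). It runs on SQLite via Python's sqlite3 module, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parking (plateid INTEGER, placeid INTEGER, time_period INTEGER)"
)
# Invented sample data: plate 1 parks in place 10 twice and in place 20 once.
conn.executemany(
    "INSERT INTO parking VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 10, 2), (1, 20, 3), (2, 10, 1)],
)

# One output row per distinct (plate, place) pair, with its number of occurrences.
counts = conn.execute(
    "SELECT plateid, placeid, COUNT(*) AS visits "
    "FROM parking GROUP BY plateid, placeid ORDER BY plateid, placeid"
).fetchall()

print(counts)  # [(1, 10, 2), (1, 20, 1), (2, 10, 1)]
```

No per-place table is needed: GROUP BY handles however many plates and places appear in the data.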

Displaying count of duplicates and removing duplicates at the same time?

Apologies if the title doesn't make full sense, I'll try to explain as best I can.
I have a table containing information about vehicles. It has around 5000 rows overall, with many duplicates. Here's a snippet as an example:
As you can see the model '159 TI TBI' repeats twice, this essentially means there are two of these cars stored in London.
I am looking for something like below, where there is a count of how many times a particular vehicle in a particular location repeats, as well as removing duplicates so each vehicle only appears once for each location.
I am able to do a fairly simple select command for a particular vehicle and location, such as
SELECT COUNT(model), model, loc_name, vehicle_type
FROM vehicles
WHERE loc_name='London' AND model='159 TI TBI'
GROUP BY model, loc_name, vehicle_type
The issue is that I'd be repeating this command for every combination of vehicle model and location, which is not very efficient.
Hopefully this makes some sense, I haven't had a huge amount of experience with SQL so apologies if anything is badly wrong. Thanks.
This query will give you the required results
SELECT COUNT(model) cnt, model, loc_name, vehicle_type
FROM vehicles
GROUP BY model, loc_name, vehicle_type
Your question is a bit unclear, but let me try. It seems you think that to get the count for each group, you would have to re-run the query with a different vehicle in the WHERE clause each time. However, aggregation gives you the count for every group in a single query. If you are just looking for each unique combination of model, location and type together with its number of occurrences, you already have the right query: just remove your WHERE clause and GROUP BY will take care of the rest.
SELECT COUNT(*) as quantity, model, loc_name, vehicle_type
FROM vehicles
GROUP BY model, loc_name, vehicle_type
If you want only the rows with more than one occurrence, you can use HAVING to filter the aggregated result:
SELECT COUNT(*) as quantity, model, loc_name, vehicle_type
FROM vehicles
GROUP BY model, loc_name, vehicle_type
having count(*) > 1
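A quick way to try both queries is the sketch below, which runs them on SQLite via Python's sqlite3 module. The table contents are invented (including the vehicle_type values), since the original snippet of the data is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (model TEXT, loc_name TEXT, vehicle_type TEXT)")
# Invented sample data: the '159 TI TBI' appears twice in London.
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?, ?)",
    [
        ("159 TI TBI", "London", "car"),
        ("159 TI TBI", "London", "car"),
        ("159 TI TBI", "Leeds", "car"),
        ("Transit", "London", "van"),
    ],
)

# Every distinct (model, location, type) once, with its count.
all_counts = conn.execute(
    "SELECT COUNT(*) AS quantity, model, loc_name, vehicle_type "
    "FROM vehicles GROUP BY model, loc_name, vehicle_type "
    "ORDER BY model, loc_name"
).fetchall()

# Only the combinations that occur more than once.
duplicates = conn.execute(
    "SELECT COUNT(*) AS quantity, model, loc_name, vehicle_type "
    "FROM vehicles GROUP BY model, loc_name, vehicle_type "
    "HAVING COUNT(*) > 1"
).fetchall()

print(all_counts)
print(duplicates)  # [(2, '159 TI TBI', 'London', 'car')]
```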

Create dynamic Metric, which will be based on various dimensions/ filters applied

I have a table "Trans" which contains the account numbers and other dimensions like Facility, Status, etc.
I need to create a calculated member in SSAS where the logic would be
Count of Accounts in a group / Total accounts.
Count of Accounts in a group would be based on the Dimension filter I provide.
For example, if I provide the facility, then I need the count of accounts (in the numerator) to be grouped by the different facilities.
Likewise, if I provide the Status, I need the numerator to be grouped as per the data in the Status table.
Table Name
Trans (AccountNumber, facility,Status) -- This is fact table
Dimension tables
Facility( Id, Facility_name)
Status (Id, Status)
Not sure how to go about it.
EXISTING is a useful function, so maybe something like:
COUNT(
EXISTING [AccountNumber].[AccountNumber].MEMBERS
)
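To make the intended arithmetic concrete (count of accounts in a group divided by total accounts), here is a sketch in plain SQL rather than MDX, run on SQLite via Python's sqlite3 module. It is not an SSAS calculated member. The share_by helper, the whitelist and the sample rows are all hypothetical; only the Trans column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trans (AccountNumber INTEGER, facility INTEGER, Status INTEGER);
-- Invented sample data: facility and Status hold dimension ids
INSERT INTO Trans VALUES (1001, 1, 1), (1002, 1, 2), (1003, 1, 1), (1004, 2, 2);
""")

# Hypothetical whitelist so the column name can be interpolated safely.
ALLOWED_DIMENSIONS = {"facility", "Status"}

def share_by(connection, dimension):
    """Return (dimension value, accounts in group / total accounts) pairs."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    sql = f"""
        SELECT {dimension},
               COUNT(DISTINCT AccountNumber) * 1.0
                 / (SELECT COUNT(DISTINCT AccountNumber) FROM Trans) AS share
        FROM Trans
        GROUP BY {dimension}
        ORDER BY {dimension}
    """
    return connection.execute(sql).fetchall()

by_facility = share_by(conn, "facility")
by_status = share_by(conn, "Status")
print(by_facility)  # [(1, 0.75), (2, 0.25)]
print(by_status)    # [(1, 0.5), (2, 0.5)]
```

The numerator changes with whichever dimension is supplied, while the denominator stays the count over all accounts, which is the behaviour the question describes.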

Grouping Across a Hierarchy

End Goal: I need to create a financial report that can follow an organization hierarchy but also separate by account type on each level. Ultimately the report will be displayed in crystal reports.
Problem: There are several types of accounts (revenue, expense, liability, etc...). There are also several organizational structures, with a maximum depth of 8 levels. Users need to be able to drill down to each level of the structure, and see sub totals for each account type.
Example
Highest Level
    Expense
        CEO
        Community & Customer Service
        Corporate
        Engineering
    Revenue
        CEO
        Community & Customer Service
        Corporate
        Engineering
Level 1 (CEO)
    Revenue
        WHS
        OD
    Expense
        WHS
        OD
The structure continues until it reaches the account level, where the accounts are grouped by account type.
The structure is stored in an adjacency model.
Structure Columns: Class_id - Class_Name - Parent_ID
Account Columns:
Account_id, Class_id, Account_Type, Budget, Actual
I have no problem navigating and grouping the hierarchy, but my difficulty comes in the grouping of account types within the hierarchy. Any suggestions for solutions in SQL or CrystalReports would be greatly appreciated!
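One possible approach in SQL, sketched here under assumptions: expand the adjacency list into (ancestor, descendant) pairs with a recursive CTE, then group by ancestor and Account_Type so that every level gets a subtotal per account type. The example runs on SQLite via Python's sqlite3 module; the table name Structure, the closure CTE name and the sample rows are invented, and the recursive CTE syntax may differ in the database behind the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Structure (Class_id INTEGER PRIMARY KEY, Class_Name TEXT, Parent_ID INTEGER);
CREATE TABLE Account (Account_id INTEGER PRIMARY KEY, Class_id INTEGER,
                      Account_Type TEXT, Budget REAL, Actual REAL);
-- Invented sample data: CEO is the parent of WHS and OD
INSERT INTO Structure VALUES (1, 'CEO', NULL), (2, 'WHS', 1), (3, 'OD', 1);
INSERT INTO Account VALUES
  (1, 2, 'Revenue', 100.0,  90.0),
  (2, 2, 'Expense',  40.0,  50.0),
  (3, 3, 'Revenue', 200.0, 210.0),
  (4, 1, 'Expense',  10.0,   5.0);
""")

subtotals = conn.execute("""
WITH RECURSIVE closure(ancestor, descendant) AS (
    -- every class is its own ancestor
    SELECT Class_id, Class_id FROM Structure
    UNION ALL
    -- walk down one level at a time
    SELECT c.ancestor, s.Class_id
    FROM closure c
    JOIN Structure s ON s.Parent_ID = c.descendant
)
SELECT st.Class_Name, a.Account_Type, SUM(a.Budget), SUM(a.Actual)
FROM closure c
JOIN Structure st ON st.Class_id = c.ancestor
JOIN Account a    ON a.Class_id  = c.descendant
GROUP BY st.Class_id, st.Class_Name, a.Account_Type
ORDER BY st.Class_id, a.Account_Type
""").fetchall()

for row in subtotals:
    print(row)
```

Each output row is a (level, account type) subtotal that includes all accounts below that level, so the report can group on Account_Type first and on the class second, or the other way round.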

Dimensional Model for Employee Turnover

I am trying to determine the best way to model employee turnover in a Dimensional Model. I am not sure if it's best to include the Termination_Count and Headcount in the same measure.
I currently have a headcount measure with both terminated and active counts:
**Headcount Measure:**
Employee_id
Department
Employee_count
Termed_count
Month
So each individual employee will have a row created for them if they are active during the month or if they are terminated during the month.
How have other people dealt with the employee turnover issue?
Don't track Headcount and Turnover in the same table, they have different grains.
Headcount: being a semi-additive measure, it needs a snapshot fact table counting employees per department, salary level, bureau and whatever other dimensions you need. It should store these values once per day;
Turnover: have a Hire/Fire transaction table with three measures: employees_hired (0/1), employees_fired (0/1) and net_employee_variation (-1/0/+1). On the employee dimension you can have "date_hired" and "date_left" as attributes to allow, for example, counting the time between the two events.
But you shouldn't mix what is a transaction fact table with a snapshot fact table.
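A minimal sketch of the two separate fact tables, run on SQLite via Python's sqlite3 module. The table names, dates and rows are invented for illustration, and the turnover rate definition (terminations divided by average headcount over the period) is one common choice, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snapshot fact: one row per department per day (semi-additive headcount)
CREATE TABLE fact_headcount_snapshot (snapshot_date TEXT, department TEXT, headcount INTEGER);
-- Transaction fact: one row per hire or termination event
CREATE TABLE fact_hire_fire (event_date TEXT, employee_id INTEGER, department TEXT,
                             employees_hired INTEGER, employees_fired INTEGER,
                             net_employee_variation INTEGER);
-- Invented sample data
INSERT INTO fact_headcount_snapshot VALUES
  ('2024-01-01', 'Sales', 10), ('2024-01-02', 'Sales', 11), ('2024-01-03', 'Sales', 9);
INSERT INTO fact_hire_fire VALUES
  ('2024-01-02', 1, 'Sales', 1, 0,  1),
  ('2024-01-03', 2, 'Sales', 0, 1, -1),
  ('2024-01-03', 3, 'Sales', 0, 1, -1);
""")

# Headcount is averaged over time (never summed); event counts are summed.
turnover = conn.execute("""
SELECT h.department, h.avg_headcount, t.fired,
       t.fired * 1.0 / h.avg_headcount AS turnover_rate
FROM (SELECT department, AVG(headcount) AS avg_headcount
      FROM fact_headcount_snapshot
      WHERE snapshot_date LIKE '2024-01-%'
      GROUP BY department) h
JOIN (SELECT department, SUM(employees_fired) AS fired
      FROM fact_hire_fire
      WHERE event_date LIKE '2024-01-%'
      GROUP BY department) t ON t.department = h.department
""").fetchall()

print(turnover)  # [('Sales', 10.0, 2, 0.2)]
```

Each table is aggregated at its own grain first and the results are joined afterwards, which is what keeping the snapshot and transaction facts apart makes possible.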