I have a couple of tables that act as both dimensions and facts. For example, I have an Applications table (20 million rows, growing by about 100 thousand every day) and a Contracts table (5 million rows, growing by 10-20 thousand every day). Both share common dictionaries (Customer, Bank, RiskResult, etc.). One contract can have more than one application, so for a contract the application is a dimension, and I need to analyze contracts using Application attributes. But I also need to analyze the applications themselves: for example, how many applications were created today, how many were deleted, the difference between the requested sum and the actual sum in an Application, etc. There is also a Transaction table, where one contract has many rows, so for Transaction the contract is a dimension. What I did: in the views for SSAS Tabular I created two tables from one. For the Application table I created FactApplication, which holds the SKs of all dimensions and all sums, then I created DimApplication, where I placed all attributes, and then I linked them. But they have a 1-to-1 relationship, and I don't know how correct that is. For contracts I did the same.
It'd be nice to have a more visual representation (ERD) of your raw tables and proposed Fact/Dim tables.
My understanding is that you have a "Credit Card Application" business process.
I would have something like this:
DimApplicant (ApplicantSK (PK), FirstName, ...)
FactApplication (ApplicationSK (PK), ApplicantSK(FK), ApplicationDateSK, ProcessingDuration, ApprovalStatus, CustomerSK (NULLABLE)(FK) (1 to N)..)
DimContract (ContractSK, CustomerSK, ...) (Assuming there can be multiple contracts per Customer)
DimCustomer (CustomerSK, FirstName...)
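As a rough illustration of the layout above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (it is not SSAS Tabular). Only the columns named above are used; the sample rows, the yyyymmdd integer date keys, and the status values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimApplicant (ApplicantSK INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE DimCustomer  (CustomerSK  INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE DimContract  (
    ContractSK INTEGER PRIMARY KEY,
    CustomerSK INTEGER NOT NULL REFERENCES DimCustomer(CustomerSK)
);
CREATE TABLE FactApplication (
    ApplicationSK      INTEGER PRIMARY KEY,
    ApplicantSK        INTEGER NOT NULL REFERENCES DimApplicant(ApplicantSK),
    ApplicationDateSK  INTEGER NOT NULL,  -- assumed yyyymmdd integer key
    ProcessingDuration INTEGER,           -- assumed to be in days
    ApprovalStatus     TEXT,
    CustomerSK         INTEGER REFERENCES DimCustomer(CustomerSK)  -- nullable
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO DimApplicant VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO DimCustomer VALUES (10, 'Ann')")
conn.executemany("INSERT INTO DimContract VALUES (?, ?)", [(100, 10), (101, 10)])
conn.executemany(
    "INSERT INTO FactApplication VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 20240101, 3, "Approved", 10),
        (2, 2, 20240101, 1, "Rejected", None),
        (3, 1, 20240102, 2, "Approved", 10),
    ],
)

# Applications are counted directly from the fact table.
applications_per_day = conn.execute(
    "SELECT ApplicationDateSK, COUNT(*) FROM FactApplication "
    "GROUP BY ApplicationDateSK ORDER BY ApplicationDateSK"
).fetchall()

# One customer can hold several contracts.
contracts_per_customer = conn.execute(
    "SELECT c.FirstName, COUNT(*) FROM DimContract k "
    "JOIN DimCustomer c ON c.CustomerSK = k.CustomerSK "
    "GROUP BY c.CustomerSK, c.FirstName"
).fetchall()
```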
I am setting up a database where I'd like to have many-to-many relationships between some tables. There's no user interface for this database; we will be putting data into the tables using R scripts and retrieving it using Python scripts.
The entities involved are projects and cost forecasts. Multiple projects may use the same forecast. For each forecast, there are costs to develop a project in each of several future years. I need to be able to retrieve the cost forecast for each future year for each individual project.
I think the tables below would be a fairly standard way to represent these relationships. Note that "pk" means "primary key" and "fk" means "foreign key".
PROJECT
    name
    forecast_id (fk)
FORECAST
    forecast_id (pk)
COST
    forecast_id (fk)
    year
    cost
To retrieve the forecast for a particular project, I would just retrieve all the rows from COST that have a matching forecast_id. I don't need the FORECAST table for anything, except as a home for the forecast_id that establishes the many-to-many relationship between PROJECT and COST.
So my main question is, can I just drop the FORECAST table and have a direct many-to-many relationship between PROJECT and COST, using the forecast_id? I know this is physically possible, but many discussions use language along the lines that "many-to-many relationships aren't possible without a bridge table." But why would I want to add the bridge table, if I can do all my queries without it and it is one more table I would have to maintain?
Going further, many discussions of many-to-many relationships (including @mike-organek's comment below) suggest a structure similar to this:
PROJECT
    project_id (pk)
    name
PROJECT_COST
    project_id (fk)
    cost_id (fk)
COST
    cost_id (pk)
    year
    cost
While this seems like a commonly preferred approach, it suits my needs even less well. Now every time I add a new project, instead of just assigning the forecast_id corresponding to a particular forecast, I have to add a bunch of link records to the PROJECT_COST table, one for each future year. This will also require a lot of management, and allows potential creation of relationships I don't want (e.g., one project uses costs from one forecast for the first two years, then costs from a different forecast for the next two years).
So my second question is, is there anything preferable about the second approach over the first approach, or over my simplified approach (using just the PROJECT and COST tables)?
Update
There seems to be some confusion about what I'm asking here. So I've revised the question significantly to try to make it clearer. Note that I renamed cost_group to forecast as part of this.
The second approach (with the project_cost table containing two foreign keys) is the correct way to model a many-to-many relationship.
But your idea with the shared forecast_id (with or without the forecast table) shows that you are not thinking of a many-to-many relationship in the ordinary sense: if one project is associated with a certain set of costs, every other project must be associated with either the same set or a disjoint set of costs.
If that is what you want, I see no problem with removing the forecast table. There is no referential integrity you are losing that way.
If you have additional requirements, for example that there has to be at least a cost and a project for each existing forecast_id, things may change. That could be guaranteed with foreign keys from the forecast table, but not without that table.
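A minimal sketch of the simplified two-table layout (no forecast table), written with Python's sqlite3 since the question mentions retrieving data from Python. The project names, years, and cost figures are invented, and the composite primary key on cost is an assumption rather than something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    name        TEXT PRIMARY KEY,
    forecast_id INTEGER NOT NULL      -- plain shared value, no forecast table
);
CREATE TABLE cost (
    forecast_id INTEGER NOT NULL,
    year        INTEGER NOT NULL,
    cost        REAL    NOT NULL,
    PRIMARY KEY (forecast_id, year)   -- assumed: one cost per forecast and year
);
""")

# Invented sample data: alpha and beta share forecast 1, gamma uses forecast 2.
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [("alpha", 1), ("beta", 1), ("gamma", 2)],
)
conn.executemany(
    "INSERT INTO cost VALUES (?, ?, ?)",
    [(1, 2025, 100.0), (1, 2026, 110.0), (2, 2025, 90.0)],
)


def costs_for(project_name):
    """Return (year, cost) rows for one project via the shared forecast_id."""
    return conn.execute(
        "SELECT c.year, c.cost FROM project p "
        "JOIN cost c ON c.forecast_id = p.forecast_id "
        "WHERE p.name = ? ORDER BY c.year",
        (project_name,),
    ).fetchall()


alpha_costs = costs_for("alpha")
gamma_costs = costs_for("gamma")
```

Assigning a new project to a forecast is a single row in project; no per-year link rows are needed.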
I’ve been asked to create our analysis cube and have a design question.
We sell ‘widgets’ and ‘parts’ to go with those widgets. Each order has many widgets and sometimes a few parts.
What I’m stuck on is this: to me, an order is a fact in a measure. But what are the widgets? Are they a dimension, with each fact in the measure being an entry for every part and widget in the order?
So, if order 123 had widget 1 and widget 2 and part 5, then there will be 3 facts in the measure for the same order? Is that correct?
At its basic level you can consider most facts to be transactions or transaction line items. So, for example, you may have a 'sales' fact table in which each record represents one line item from that sale. Each fact record would have numeric columns representing metrics and other columns joining to dimension tables. The combination of those dimensions would describe that line item. So, in your case, you likely have something like:
1) A 'date' dimension detailing the date of the transaction
2) A 'widget' dimension detailing the widget sold on that transaction
3) A 'customer' dimension detailing the customer who bought that item (almost certainly the same customer would appear on every line item for this transaction)
4) ... determined by what information you have and what business problem you're trying to solve.
Now, the dimension tables contain further details. For example, your widget dimension table likely contains things like the name of the widget, the color, the manufacturer, etc. Every time your company sells one of these widgets, the record in the fact table links to that same dimension record for that name, color, manufacturer, etc. combination (i.e. you don't create a new dimension record every time you sell the same item - this is a one-to-many relationship - each dimension record may have many related fact records).
Your other dimension tables would similarly describe their dimensions. For example, the customer dimension might give the customer's name, their address, ...
So, the short answer to your question is that widget likely is a dimension, parts and widgets may (or may not) actually be the same dimension (in a school class I suspect that they are), and you would have 3 fact records for that one transaction.
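To make the "3 fact records for one order" idea concrete, here is a small sketch using Python's sqlite3. It assumes widgets and parts share a single item dimension, which may not hold in your case; the table names, column names, and amounts are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item (
    item_key  INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    item_type TEXT NOT NULL            -- 'widget' or 'part'
);
CREATE TABLE fact_order_line (
    order_number INTEGER NOT NULL,
    item_key     INTEGER NOT NULL REFERENCES dim_item(item_key),
    quantity     INTEGER NOT NULL,
    amount       REAL    NOT NULL
);
""")

conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?)",
    [(1, "widget 1", "widget"), (2, "widget 2", "widget"), (5, "part 5", "part")],
)
# Order 123 contains widget 1, widget 2 and part 5: one fact row per line item.
conn.executemany(
    "INSERT INTO fact_order_line VALUES (?, ?, ?, ?)",
    [(123, 1, 1, 10.0), (123, 2, 1, 12.0), (123, 5, 2, 3.0)],
)

fact_rows_for_order = conn.execute(
    "SELECT COUNT(*) FROM fact_order_line WHERE order_number = 123"
).fetchone()[0]

# The dimension lets you slice the same facts by item type.
sales_by_type = conn.execute(
    "SELECT d.item_type, SUM(f.amount) FROM fact_order_line f "
    "JOIN dim_item d ON d.item_key = f.item_key "
    "GROUP BY d.item_type ORDER BY d.item_type"
).fetchall()
```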
This is probably along the same lines as the prior answer but....
If you try to model "many widgets per order" you'll have issues, because you end up with a many (order fact) to many (widgets) relationship. In a cube / star schema design, many-to-many relationships usually need to be modelled out to be many-to-one in some way.
So what you do is try to identify what special thing identifies an "order" (as opposed to a bunch of widgets in an order). Usually that is simply stuff like order date, customer, order number, tax.
One way to model this is:
If you have a single order with five widgets, you model that as a fact table with five records that happens to have a repeating widget, customer, date etc. in it.
Then you have to work out how you spread an order header tax amount over five records. The two obvious solutions are:
1) Create a widget that represents tax and add that as another record
2) Spread the tax over five records, either evenly or weighted by something
Modelling "parts" just takes these concepts further.
It is important to understand what the end user wants to see and why they want to see parts. What do they want to measure by parts? How do you assign higher-level values (like tax) down to lower levels like parts?
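The "spread the tax over the records" option can be sketched as follows. This is only one possible allocation rule: it works in whole cents, weights by whatever numbers you pass in (line amounts, or all ones for an even split), and hands leftover cents to the first lines. The function name and the sample figures are made up.

```python
def allocate_tax(tax_cents, weights):
    """Split tax_cents across lines in proportion to weights, in whole cents."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Floor each share, then hand out the leftover cents one at a time.
    shares = [tax_cents * w // total_weight for w in weights]
    leftover = tax_cents - sum(shares)
    for i in range(leftover):
        shares[i] += 1
    return shares


# Five order lines (amounts in cents) sharing 5.01 of header-level tax.
line_amounts = [1000, 1200, 300, 500, 2000]
weighted = allocate_tax(501, line_amounts)
even = allocate_tax(501, [1] * len(line_amounts))
```

Whichever rule you pick, the allocated values should add back up to the header amount so that totals in the cube still match the source system.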
I am going to be designing an app in VB.NET to work with an Access back-end database. I have been trying to think of ways to reduce data redundancy, and I have an example scenario below.
Let's imagine, for example purposes, that I have a customers table and need to highlight all customers in WI and send them a letter. The customers table would contain all the customers and the properties associated with them (Name, Address, etc.), so we would query for rows where the state is "WI". Then we would take the results and append them into a table with a "completion" indicator (so from 'CUSTOMERS' to, say, a 'WI_LETTERS' table).
Let's assume some processing needs to be done, so when it's completed, we mark a field in that table as 'complete' and then allow the letters to be printed with a mail merge (SELECT * FROM WI_LETTERS WHERE INDICATOR = 'COMPLETE').
That item is now completed and done. But let's say that every odd year (e.g. 2013) we also send a notice to everyone in the table with a state of "WI". We now query the customers table when the year is odd and the customer's state is "WI", then append that data into a table called 'notices' with a completion indicator, and it is marked complete.
This seems to keep the data "task-based", as the data is based solely around the task at hand. However, isn't this considered redundant data? This setup means there can be one transaction type to many accounts (even multiple times to the same account year after year), but shouldn't it be one account to many transactions?
How can this design be improved?
You certainly don't want to start creating new tables for each individual task you perform. You may want to create several different tables for different types of tasks if the information you need to track (and hence the columns in those tables) will be quite different between the different types of tasks, but those tables should be used for all tasks of that particular type. You can maintain a field in those tables to identify the individual task to which each record applies (e.g., [campaign_id] for Marketing campaign mailouts, or [mail_batch_id], or similar).
You definitely don't want to start creating new tables like [WI_letters] that are segregated by State (or any client attribute). You already have the customers' State in the [Customers] table so the only customer-related attribute you need in your [Letters] table is the [CustomerID]. If you frequently want to see a list of Letters for Customers in Wisconsin then you can always create a saved Query (often called a View in other database systems) named [WI_Letters] that looks like
SELECT * FROM Letters INNER JOIN Customers ON Customers.CustomerID=Letters.CustomerID
WHERE Customers.State="WI"
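Here is a runnable sketch of that layout, using Python's sqlite3 as a stand-in for Access (so a view plays the role of the saved query). The campaign_id values, the Completed flag, and the sample customers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, State TEXT);
CREATE TABLE Letters (
    LetterID    INTEGER PRIMARY KEY,
    CustomerID  INTEGER NOT NULL REFERENCES Customers(CustomerID),
    campaign_id TEXT    NOT NULL,           -- identifies the task/batch
    Completed   INTEGER NOT NULL DEFAULT 0  -- 0 = pending, 1 = complete
);
CREATE VIEW WI_Letters AS
    SELECT Letters.*, Customers.Name, Customers.State
    FROM Letters INNER JOIN Customers ON Customers.CustomerID = Letters.CustomerID
    WHERE Customers.State = 'WI';
""")

conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Ann", "WI"), (2, "Bob", "MN"), (3, "Cy", "WI")],
)


def queue_letters(campaign_id, state):
    """Add one Letters row per customer in the given state; only the ID is copied."""
    conn.execute(
        "INSERT INTO Letters (CustomerID, campaign_id) "
        "SELECT CustomerID, ? FROM Customers WHERE State = ?",
        (campaign_id, state),
    )


queue_letters("wi-letter", "WI")
queue_letters("2013-notice", "WI")
conn.execute("UPDATE Letters SET Completed = 1 WHERE campaign_id = 'wi-letter'")

# Rows ready for the mail merge for one task.
ready = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM WI_Letters "
        "WHERE campaign_id = 'wi-letter' AND Completed = 1 ORDER BY Name"
    )
]
total_letters = conn.execute("SELECT COUNT(*) FROM Letters").fetchone()[0]
```

Both tasks live in the same Letters table, and customer details are never copied, so a change of address only has to be made in Customers.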
I am working on a VB.NET (VS-2010, Win XP Pro 2 SP3) Employee Management project. I need to keep track of employee leave and attendance, and also of each piece of equipment assigned to an employee. How can I achieve this using SQLite?
It would be very useful if you could provide me with examples, as I am completely new to the field of SQL and VB.NET.
I think this can be done with two tables where one has the primary key while the other has a foreign key, but I am not sure. Also, how many tables will I need for storing the data from the Leave and Equipment forms?
I went through other questions but I was unable to figure out a solution for my problem.
(Sorry, I cannot provide images as this site prevents me from posting images without 10 reps)
Most problems are only as complex, or as simple, as you make them. Out of habit, nearly all tables end up with a unique ID field. There are exceptions, which I will call "link" tables, i.e. ones that provide connection details between two data tables.
Now, in your scenario:
You would need a "holiday" table, where each row contains the employee's unique ID and either a start/finish date (so that, for example, a half day is visible) or just a year and a value. For example, in 2011 I booked 2 lots of 35 hours and 1 lot of 4 hours, i.e. I've taken 2 weeks and half a day.
For the equipment, you would need a data table. Since an item can only go to 1 employee, it depends on whether you're going to use this for booking or not. If it's just like a library, e.g. I currently have a loaner laptop, then you can just have an employee field in the equipment table. If you need a booking system, then you would require link tables and things get more complex.
The best way to work out your tables is to try to group your data, then write the items on pieces of paper and see how you as a human do it. After a while you end up able to do so in your head.
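A minimal sketch of those tables in SQLite, driven from Python's sqlite3 here rather than VB.NET (the SQL itself is the same either way). The table and column names and the sample rows are hypothetical; the equipment table uses the simple "library" variant with an employee field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE holiday (
    holiday_id  INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    start_date  TEXT NOT NULL,   -- ISO dates stored as text
    end_date    TEXT NOT NULL,
    hours       REAL NOT NULL    -- lets a half day show up as 4 hours
);
CREATE TABLE equipment (
    equipment_id INTEGER PRIMARY KEY,
    description  TEXT NOT NULL,
    employee_id  INTEGER REFERENCES employee(employee_id)  -- NULL = unassigned
);
""")

conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# Two weeks and half a day of leave in 2011 for employee 1.
conn.executemany(
    "INSERT INTO holiday (employee_id, start_date, end_date, hours) VALUES (?, ?, ?, ?)",
    [
        (1, "2011-03-07", "2011-03-11", 35),
        (1, "2011-08-01", "2011-08-05", 35),
        (1, "2011-10-14", "2011-10-14", 4),
    ],
)
conn.executemany(
    "INSERT INTO equipment VALUES (?, ?, ?)",
    [(1, "loaner laptop", 1), (2, "projector", None)],
)

hours_taken_2011 = conn.execute(
    "SELECT SUM(hours) FROM holiday "
    "WHERE employee_id = 1 AND strftime('%Y', start_date) = '2011'"
).fetchone()[0]

ann_equipment = [
    row[0]
    for row in conn.execute(
        "SELECT description FROM equipment WHERE employee_id = 1 ORDER BY description"
    )
]
```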
I am designing a system for a client, where he is able to create data forms for the various products he sells himself.
The number of fields he will be using will not be more than 600-700 (worst case scenario). As it looks, he will probably be in the range of 400-500 (max).
I had 2 methods in mind for creating the database (using meta data):
a) Create a table for each product, which will hold only the fields necessary for that product. This will result in hundreds of tables, but each with only the necessary fields for its product
or
b) Use one single table with all available form fields (anywhere from the current 300 to a max of 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usually not use more than 50-80 fields)
Which solution is best? Keep in mind that table maintenance (creation, updates and changes) will be done using meta data, so I will not need to make changes to the table(s) manually.
Thank you!
/**** UPDATE *****/
Just an update: even after this long time (and a lot of additional experience gathered), I need to mention that not normalizing your database is a terrible idea. What is more, a non-normalized database almost always (just always, in my experience) indicates a flawed application design as well.
I would have 3 tables:
product
    id
    name
    whatever else you need
field
    id
    field name
    anything else you might need
product_field
    id
    product_id
    field_id
    field value
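A small sketch of these three tables, using Python's sqlite3 purely for illustration. The column names follow the outline above (with underscores), while the sample products, field names, and the UNIQUE constraints are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE field   (id INTEGER PRIMARY KEY, field_name TEXT NOT NULL UNIQUE);
CREATE TABLE product_field (
    id          INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product(id),
    field_id    INTEGER NOT NULL REFERENCES field(id),
    field_value TEXT,
    UNIQUE (product_id, field_id)   -- assumed: one value per product and field
);
""")

conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "hammer"), (2, "nail")])
conn.executemany(
    "INSERT INTO field VALUES (?, ?)",
    [(1, "weight_g"), (2, "length_mm"), (3, "handle_material")],
)
# Each product only gets rows for the fields it actually uses.
conn.executemany(
    "INSERT INTO product_field (product_id, field_id, field_value) VALUES (?, ?, ?)",
    [(1, 1, "450"), (1, 3, "ash"), (2, 1, "3"), (2, 2, "50")],
)


def form_for(product_name):
    """Return {field_name: field_value} for one product."""
    rows = conn.execute(
        "SELECT f.field_name, pf.field_value FROM product_field pf "
        "JOIN product p ON p.id = pf.product_id "
        "JOIN field f ON f.id = pf.field_id "
        "WHERE p.name = ?",
        (product_name,),
    ).fetchall()
    return dict(rows)


hammer_form = form_for("hammer")
nail_form = form_for("nail")
```

Adding a product or a new form field is then just an INSERT; no table has to be created or altered.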
Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes, and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you'll have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!
Pulegium's solution is a good way to go.
You do not want to go with the one-table-for-each-product solution, because the structure of your database should not have to change when you insert or delete a product. Only the rows of one or many tables should be inserted or deleted, not the tables themselves.
While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.