Data warehouse - Star Schema Explained

Data warehouse - Star Schema Explained - schema

I am Mapping a star schema from the start, and I have a question that I can't find an answer.
Picture that I have a model which involve Client, Store , Address, Time (Dimensions) and Sale (fact). Ok, that is easy to model, but I get a "conceptual" question. I have to add a Newsletter Dimension to the star, and this newsletter can lead the customer to buy something.
So, in my report/cubes I need to know (in a period):
How many Newsletter lead to a Sale,
How many Newsletter have been generated.
Where should I place the generate_date of Newsletter? If i place it in FACT table, but if this Newsletter did not lead to a Sale, there will be no date in FACT table.
If i place it on Newsletter and join it to Time, I would be breaking the Star schema?
How do I solve this :S
I have many other cases that is the same question, like client join_date.

As far as I understood! Why your making Sale as a Dimension and Store as a Fact? Normally Fact table contains Transactional data and I think Sale is Transactional. But this is my assumption as you haven't described what type of data it contain.
As far as Newsletter is concerned! Yes you can add just like other dimensions you have added but to get desired result! You also have to add Customer Dimension.
Place generate_date in Date dimension and place Date ID in Fact table.
You cannot join Newsletter with Time as it will break Start Schema rules.
This link might be helpful to you.

Related

Relational Database design: handling sales receipts

I am making an online market for a learning project. Pardon me for not having a diagram.
I have the tables Seller and Product,which contains data about the seller and products, respectively. A Seller can have multiple products. There is also a Receipt table that stores information regarding purchases made by a customer. This is an important record and must persist. The receipt should be able to have information on the item purchased.
However, products are dynamic, products may be added and removed. But since the Receipt should reference the Product, it means that I should not delete a Product row even if it is no longer on sale.
Is this the right way to do it? Are there any better design pattern I can use?

Yes, that is the right way to do it. If you set the referential integrity right, the system will not allow you to delete a product or seller if it has receipts. The next thing to do is to use a flag to mark the product or seller as deleted or archived. It could be either a boolean or a date that indicates when it became inactive. Using a 'From' AND a 'To' date to indicate valid time intervals, as Hellmar Becker suggests, is very powerfull, but it opens a whole new can of worms: you can have more than one 'valid' period, so you have to extend your primary key.
Modern databases like HANA (from SAP) just don't allow deletes any more, and have inbuilt 'deleted' flags.

This isn't a proper answer. I just want to give you the gift of a diagram since you didn't have one! :)
(Disclaimer: QuickDatabaseDiagrams is my project)

Schema for storing payment attempt records in SQL

I tried looking at similar StackOverlow posts and it seems as those questions for input about schema is valid. Also, I'm a software developer and not a DB expert by trade. So hopefully this is met well.
I'm using SQL Server, though I think this question is generic enough that it might be applicable to pretty much any SQL product as it pertains to what's the best schema for my scenario.
I'm writing a referral payment system whereby stores may credit and pay back individuals who refer customers. The entities are -
Referrer: the one to be paid for referring customers,
Referral: the customer that was referred
Referral Purchase: The amount and date of the referral's purchase.
Admin: the one doing the paying.
When determining what to pay the referrer I need to tally up all of the referral purchases that have not been credited. The sum at the time of the pay out attempt is what gets paid.
The confounding part of this whole thing is that when an Admin makes a payment, it may fail for any number of reasons (insufficient funds, the referrer gave bad PayPal information, etc.). All of this needs to be stored so that I can not only look back over past payment attempts and determine the failures and what referral purchases were involved in the failure, but also to determine which referral purchases have yet to be credited to the referrer.
The best schema I've been able to devise is the following:
The point here is that each PaymentAttempt holds the status of the payment attempt (success/failure) and each Referral Purchase that was credited in the payment attempt has a link table which associates it with the payment attempt. One referral purchase may, then, be involved in any number of attempts to credit the referrer, with the last one being the successful attempt.
Ultimately my question comes down to this: when I need to go back and then determine how much the referrer needs to be paid at a later date, is it going to be a pain in years to come if I need to query ALL of the ReferralPurchases associated with the referrer, then join ALL of the ReferralPurchase/PaymentAttempt link tables, then join the associated PaymentAttempt status tables to find out which of the referral purchases have yet to be credited? I could see myself needing to create pretty weird queries just to find those five purchases that have yet to be credited.
Alternatively I could update the ReferralPurchase itself with a status flag, but is this considered "asking for it" in terms of data integrity (I think I could see some saying this is poor design since the state could be queried in other ways, and perhaps a bug might result in the bit being set without proper records to warrant it)? Is that bad design?
Or is there some better way to lay things out?

Will try my best to help you out, hopefully I understand your question correctly. If I were designing the system, there would be two tables that stand out for me. The tables and their columns are.
ReferralPurchase
• ReferralPurchase_Id (PK)
• Referrer_Id (Pointing to a person table)
• Referral _Id (Pointing to a person table)
Payment
• Payment_Id (PK)
• ReferralPurchase_Id (FK)
• AmountToBePaid
• StatusOfPayment
• DateLogged
• DatePaymentMade (Null if status is not successful)
• Admin_Id (Pointing to a person table)

Ben, not sure what you mean by status field. I would steer away from lifecycle status fields, but would consider a boolean. For example:
An isPaid flag on ReferralPurchase would seem like a reasonable approach. It should only be updated on a confirmed payment, and if there is a query on why it has been set, the evidence will exist in the form of history from the PaymentAttempt and link tables. This would simplify queries of outstanding payments, and pending payments would just be incomplete PaymentAttempts. There is the theoretical possibility that the history could contradict the value of the flag.
Alternatively, you could have an isSuccessful flag on the link table, which is "closer to source", if I can put it that way, in that it cannot as easily be in conflict, as it is the history itself (as long as the coder does not allow more than one row to be marked isSuccessful for a given ReferralPayment for example). Finding outstanding payments is just those ReferralPayments where not exists an isSuccessful link record.
Others will have different views on this. Let us know which way you go.

Database design for coupon usuage restriction

While working on implementing voucher feature for an eCommerce application, I need to implement Voucher usage restriction, some of restriction I am planning to have
Products
Exclude products
Product categories
Exclude categories
Email /Customer restrictions
Currently We are supporting following 2 type of Vouchers with an option to create Custom voucher type and all those Vouchers types are being maintained in a single table with help of discriminator (Hibernate use).
Serial Vouchers
Promotion Vouchers.
these are only few which I am targeting at initial stage.My main confusion is about database design and restriction of these voucher usage with Voucher.I am not able to decide which is best way to Map these restrictions in database.
Should I go for a single table for all these restriction and have a relation with Voucher table or is it good to group all similar type of restriction in a single table and have their relation with Voucher table.
As an additional information , we are using hibernate to map our entities with the DB table.

This seems like a very wide-open and freeform requirement. Some questions:
How complex will the business rules you are attempting to model be? If you’re allowing (business) users to define their own vouchers, odds are good they’ll come up with some pretty byzantine rules and combinations. If you have to support anything they come up with, you will have problems.
What will the database be tasked to do with this data? Store the “voucher definition”, sure, but then what? Run tallies or reports on them? Analyze how many are used, by who/when/how/for what? Or just list out what was used/generated over the past year?
What kind of data volumes are you going to have? One entry per voucher definition, or per voucher printed/issued? (If the latter, can you use one entry per voucher, with a count of how many issued?) Are we talking dozens, hundreds, or millions of vouchers?
If it’s totally free-form, if they just want a listing without serious analysis, if the overall volume is small, consider using blob fields rather than minutiae-oriented columns. Something like a big text field and a data-entry box wherein the user will “Enter any other criteria defining the voucher”. (You might even do this using XML.) Ugly, you can’t readily analyze the data, but if the goals are too great or diffuse and you're not going to use all that detailed data, it might be necessary.
A final note: a voucher that is good for only selected products cannot be used on products that are added after the voucher is created. A voucher that is good for all but selected products can be used for subsequently created products. This logic may apply to any voucher-limiting criteria. Both methodologies have merit, make sure the users are clear on what they’re doing.

If I understand what your your are doing, you will have a problem with only one table for all restrictions, because it means 1 row per Voucher and multiple values in your different restrictions columns.
It will be harder for you to UPDATE, extract and cast restrictions values.
In my opinion, you should have one table for each restrictions type and map them with Voucher table. However It will be easier for you to add new restrictions.

As a suggestion:
Isn't it more rational to have valid-products and valid-categories instead of Exclude-products and Exclude-categories?
Having a Customer-Creditgroup table will lead us to have valid-customer-group table.
BTW in the current design we can have a voucher definition table, I will call it voucher-type table.
About the restrictions:
In RDBMS level you can state only two types of table constraints decoratively:
uniquely identifying attributes (keys)
Subsets requirements referencing back to the same or other table
(foreign key)
Implementing all other types of table constraints (like a multi-tuple constraints or transition constraints) requires you to develop procedural data integrity code.
When a voucher is going to sold to a specific customer for a specific product we will need to check validity of excluded elements, that could be done by triggers in data base level or business logic of your application.

I would personally go with your second proposal... grouping all similar types of restrictions in a single table, which refers the Voucher table.
I'll add to that, that you can handle includes and excludes on the same table.
So the structure I'd use is some along the lines of:
Voucher (id, type, etc...)
VoucherProductRestriction (id,voucher_id,product_id,include)
VoucherProductCategoryRestriction (id,voucher_id,product_category_id,include)
VoucherCustomerRestriction (id,voucher_id,customer_id)
VoucherEmailRestriction (id,voucher_id,email)
...where the include column could be a boolean that is true in case you want to restrict the voucher to that product or category, or false if you want to restrict it to any product or category other than those specificied.
If I understand your context correctly, it makes no sense to have both include and exclude restrictions on the same voucher (although it could make sense to have more than one of the same type). You can probably handle and check this better if you use a single table for both types of restrictions.

SQL/MySQL structure (Denormalize or keep relational)

I have a question about best practices related to de-normalization or table hierarchy relationships.
For a simple example, let's say I have an app that allows a user to make a payment for an order. I save the order information in the orders table, and I have another table for the payment called payments. Payments has a foreign key to the orders table.
Let's assume that I can pay with a credit card, check, or paypal, and I want to save the information about the payment.
My question is what is the best way to handle this relationship between the different payment data and the payment table. The types of payment all have different data associated with them. So do I denormalize the payments table, putting credit card, check, and paypal information fields in there and then just use the fields as necessary. Alternately I could specify a payment type, and store the information in their own tables, but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...

I would choose to keep the database normalized.
but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...
You have to use logic (or at least mapping) in either case. Whether its what table to pull the data from or what fields in the table to access.

What about keeping it denormalized and then making a view to put the data back together again. You get the best of both worlds. IIRC, MySQL introduced views in version 5.

So do I denormalize the payments
table, putting credit card, check, and
paypal information fields in there and
then just use the fields as necessary.
yes. but this is not "denormalizing". if you stored order information in the client table, that would be denormalizing. adding nullable columns to accurately describe a payment in the payments table is not.

You can use the idea of table per subclass as the ORM tools do. This would require a join for each query against the payment table but...
Create tables for each payment type so you will have a creditcardpayment and a checkpayment table. The common fields go in the payment table, the specific fields go in the sub tables. The sub tables primary keys are foreign keys to the payment table's id.
To add a new payment you have to first insert the common fields into the payment table, get the id generated, then insert the specific fields into the specific sub table.
To query you have to join the subtables with the payment table. You could use a view to make that easier.
This way the database is still normalized and you have no null columns.

It partially depends on the framework (if any) that you are using. For instance: the Ruby on Rails way would generally be to store the type of the payment in the payments table and then have different, separate tables for each payment type (PayPal, Credit Card, etc).
Alternatively, if you notice that you are repeating the same data in many of the tables, Rails has a way to store all of the data in the same table, using only the fields you need, but still allowing you to have separate objects. For instance, you would have an AbstractPayment object with an abstract_payments table, but you would also have PayPalPayment and CreditCardPayment objects that both inherit from AbstractPayment and use the abstract_payments table. All you need to determine the payment type is a column in abstract_payments that tells you which type it is (probably a string, but could be an integer if you so choose). This is called STI.
No matter what framework/language you use, the same ideas can definitely apply and I think the right solution will depend on how many different types of payments you have, compared with how simple you want your database to be.

Keep it as normalized as possible. Only de-normalize when the performance of a fully normalized schema requires denormalization to improve response time, and do that only on a case by case basis to deal with specific performance issues associated with individual querys within your application.
These are complex problems. Database Normalization requires intimate domain knowledge, and a skilled analysis of how that domain model will be manipulated and utilized within your application. Denormalizing for performance requires that you understand your application's usage patterns well enough to predict performance issues before they occur (waiting till they actually occur in production is too late - by then making fundemental schema changes in the database is very expensive) and know what denormalization techniques to use to address them.

You need to weight the following factors:
How much space will you waste if you put all data into a single table
How complex the SQL queries will become in either case.
If you use different tables, you'll have to use joins. If you put everything into a single table, you'll need to find some magic to "ignore" the rows which don't matter (say when you want to find all credit card payments: Your query must then ignore everything that's something else).
The latter part gets more easy when you move the special data into special tables at the cost of more complex joins.

Circular database relationships. Good, Bad, Exceptions?

I have been putting off developing this part of my app for sometime purely because I want to do this in a circular way but get the feeling its a bad idea from what I remember my lecturers telling me back in school.
I have a design for an order system, ignoring the everything that doesn't pertain to this example I'm left with:
CreditCard
Customer
Order
I want it so that,
Customers can have credit cards (0-n)
Customers have orders (1-n)
Orders have one customer(1-1)
Orders have one credit card(1-1)
Credit cards can have one customer(1-1) (unique ids so we can ignore uniqueness of cc number, husband/wife may share cc instances ect)
Basically the last part is where the issue shows up, sometimes credit cards are declined and they wish to use a different one, this needs to update which their 'current' card is but this can only change the current card used for that order, not the other orders the customer may have on disk.
Effectively this creates a circular design between the three tables.
Possible solutions:
Either
Create the circular design, give references:
cc ref to order,
customer ref to cc
customer ref to order
or
customer ref to cc
customer ref to order
create new table that references all three table ids and put unique on the order so that only one cc may be current to that order at any time
Essentially both model the same design but translate differently, I am liking the latter option best at this point in time because it seems less circular and more central. (If that even makes sense)
My questions are,
What if any are the pros and cons of each?
What is the pitfalls of circular relationships/dependancies?
Is this a valid exception to the rule?
Is there any reason I should pick the former over the latter?
Thanks and let me know if there is anything you need clarified/explained.
--Update/Edit--
I have noticed an error in the requirements I stated. Basically dropped the ball when trying to simplify things for SO. There is another table there for Payments which adds another layer. The catch, Orders can have multiple payments, with the possibility of using different credit cards. (if you really want to know even other forms of payment).
Stating this here because I think the underlying issue is still the same and this only really adds another layer of complexity.

A customer can have 0 or more credit cards associated, but the association is dynamic - it can come and go. And as you point out a credit card can be associated with more than one customer. So this ends up being an n:m table, maybe with a flag column for "active".
An order has a static relationship to 0 or 1 credit card, and after a sale is complete, you can't mess with the cc value, no matter what happens to the relationship between the cc and the customer. The order table should independently store all the associated info about the cc at the time of the sale. There's no reason to associate the sale with any other credit card column in any other table (which might change - but it wouldn't affect the sale).

I think the problem is with the modeling of the Order. Instead of one Order has one credit card, an order should be able to be associated with more than one credit card of which only one is active at any time. Essentially, Order and Credit is many-to-many. In order to model this in DB, you need to introduce an association table, let's say PaymentHistory. Now when an order requires a new credit card, you can simply create a new credit card, and associate it with the order and mark the associating PaymentHistory as active.

Hmm?
A customer has several credit cards, but only a current one. An order has a single assigned card. When a customer puchases something, his default card is tried first, otherwise, he may change his main card?
I see no circular references here; when a user's credit card changes, his orders' stay the same. Your tables would end up as:
Customer: id, Current Card
Credit Cards: id, number, customer_id
Order: id, Card_id, Customer_id
Edit: Oops, forgot a field, thanks.

No matter the reason your data has circular relationships, you'll be a lot happier if you "forget" to declare one of them so that your tables have a bulk-load order.
That comes in handy when you least expect it.

This is a year old but there's some points worth making.
NB For on-line NON-ACCOUNT processes: The Customer would be better defined as Buyer and there would also probably be another type of customer - the Beneficiary/Recipient. You can buy/purchase airline tickets and flowers etc. for other people so these two roles need to be clearly separated as they involve different business processes (one to pay and the other to be sent the goods).
If it is a non-account process then you shouldn't be retaining credit card details. It's a security risk - and you're putting the buyer at risk by keeping this information. Credit cards are processed in real-time and then the information should be thrown away.
ACCOUNT CUSTOMERS: The only exception would be when someone has opened an account and provided their credit card information for use in subsequent purchases. In such a case changes to the credit card information would take place outside of the transaction - as part of the Account Management process.
The main point is to make sure that you fully understand the business processes before you start modelling and coding.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas