Relational Database design: handling sales receipts - sql

I am making an online market for a learning project. Pardon me for not having a diagram.
I have the tables Seller and Product,which contains data about the seller and products, respectively. A Seller can have multiple products. There is also a Receipt table that stores information regarding purchases made by a customer. This is an important record and must persist. The receipt should be able to have information on the item purchased.
However, products are dynamic, products may be added and removed. But since the Receipt should reference the Product, it means that I should not delete a Product row even if it is no longer on sale.
Is this the right way to do it? Are there any better design pattern I can use?

Yes, that is the right way to do it. If you set the referential integrity right, the system will not allow you to delete a product or seller if it has receipts. The next thing to do is to use a flag to mark the product or seller as deleted or archived. It could be either a boolean or a date that indicates when it became inactive. Using a 'From' AND a 'To' date to indicate valid time intervals, as Hellmar Becker suggests, is very powerfull, but it opens a whole new can of worms: you can have more than one 'valid' period, so you have to extend your primary key.
Modern databases like HANA (from SAP) just don't allow deletes any more, and have inbuilt 'deleted' flags.

This isn't a proper answer. I just want to give you the gift of a diagram since you didn't have one! :)
(Disclaimer: QuickDatabaseDiagrams is my project)

Related

Schema for storing payment attempt records in SQL

I tried looking at similar StackOverlow posts and it seems as those questions for input about schema is valid. Also, I'm a software developer and not a DB expert by trade. So hopefully this is met well.
I'm using SQL Server, though I think this question is generic enough that it might be applicable to pretty much any SQL product as it pertains to what's the best schema for my scenario.
I'm writing a referral payment system whereby stores may credit and pay back individuals who refer customers. The entities are -
Referrer: the one to be paid for referring customers,
Referral: the customer that was referred
Referral Purchase: The amount and date of the referral's purchase.
Admin: the one doing the paying.
When determining what to pay the referrer I need to tally up all of the referral purchases that have not been credited. The sum at the time of the pay out attempt is what gets paid.
The confounding part of this whole thing is that when an Admin makes a payment, it may fail for any number of reasons (insufficient funds, the referrer gave bad PayPal information, etc.). All of this needs to be stored so that I can not only look back over past payment attempts and determine the failures and what referral purchases were involved in the failure, but also to determine which referral purchases have yet to be credited to the referrer.
The best schema I've been able to devise is the following:
The point here is that each PaymentAttempt holds the status of the payment attempt (success/failure) and each Referral Purchase that was credited in the payment attempt has a link table which associates it with the payment attempt. One referral purchase may, then, be involved in any number of attempts to credit the referrer, with the last one being the successful attempt.
Ultimately my question comes down to this: when I need to go back and then determine how much the referrer needs to be paid at a later date, is it going to be a pain in years to come if I need to query ALL of the ReferralPurchases associated with the referrer, then join ALL of the ReferralPurchase/PaymentAttempt link tables, then join the associated PaymentAttempt status tables to find out which of the referral purchases have yet to be credited? I could see myself needing to create pretty weird queries just to find those five purchases that have yet to be credited.
Alternatively I could update the ReferralPurchase itself with a status flag, but is this considered "asking for it" in terms of data integrity (I think I could see some saying this is poor design since the state could be queried in other ways, and perhaps a bug might result in the bit being set without proper records to warrant it)? Is that bad design?
Or is there some better way to lay things out?
Will try my best to help you out, hopefully I understand your question correctly. If I were designing the system, there would be two tables that stand out for me. The tables and their columns are.
ReferralPurchase
• ReferralPurchase_Id (PK)
• Referrer_Id (Pointing to a person table)
• Referral _Id (Pointing to a person table)
Payment
• Payment_Id (PK)
• ReferralPurchase_Id (FK)
• AmountToBePaid
• StatusOfPayment
• DateLogged
• DatePaymentMade (Null if status is not successful)
• Admin_Id (Pointing to a person table)
Ben, not sure what you mean by status field. I would steer away from lifecycle status fields, but would consider a boolean. For example:
An isPaid flag on ReferralPurchase would seem like a reasonable approach. It should only be updated on a confirmed payment, and if there is a query on why it has been set, the evidence will exist in the form of history from the PaymentAttempt and link tables. This would simplify queries of outstanding payments, and pending payments would just be incomplete PaymentAttempts. There is the theoretical possibility that the history could contradict the value of the flag.
Alternatively, you could have an isSuccessful flag on the link table, which is "closer to source", if I can put it that way, in that it cannot as easily be in conflict, as it is the history itself (as long as the coder does not allow more than one row to be marked isSuccessful for a given ReferralPayment for example). Finding outstanding payments is just those ReferralPayments where not exists an isSuccessful link record.
Others will have different views on this. Let us know which way you go.

Composite key turns out to be not unique....trying to build a fix

OK...I am hoping this is a classic problem that everyone knows the answer to already. I have been building a mysql database (my first one) where the main purpose was to load line-item data from an invoice and related data from the matching remittance and reconcile the two. Basically, everything has been going along fine until I discovered a problem.
Details: I have thus far identified individual invoice line items with a client (to be billed) id, service date, and service type and matching that transaction against the remittance transaction with the same client ID, service date and service type. Unfortunately, there are times (I just discovered) when one client (ID) gets multiple instances of a particular service on the same day and thus my invoices are not unique based on the three components I just mentioned.
There is another piece of info on the invoice (service time) that could be used to make invoice items unique, but the remittance does not include service times (thus I cannot match directly against it using service time). Likewise, the remittance has another piece of info (claims ref number) that uniquely identifies remittance items. But of course, the claims ref number is not on the invoice.
Is there some way to use an intermediate table perhaps that can bridge this gap? Any help, answers or helpful links would be most appreciated. Thanks in advance.
This is perhaps more a business problem then a technical one-- it sounds like there is in fact no reliable way to match up remittances and invoices, unless something like matching on the dollar amount works. If you use an artificial key on the invoice you kind of solve the technical problem but not the business one.
If you can't change the business process at all and there is no technical way to match remittances and invoices, you might be forced to treat all invoices for a customer/service date/service type as a unit; make each invoice a part of that unit, and then group all the remittances and all the invoices that match that unit together.
You can make life easy on yourself and create an Invoice ID and remove the composite key all together.
Any type of fix is going to have an impact on the calling code, as increasing the field count on the composite key implies that this new field needs to be supplied, so I suggest just creating an invoice ID.
Many IT professionals that work with RDBMS will suggest to never use natral keys. Always use a surrogate key (like an auto-increment column)
I agree with #antlersoft (+1), this sounds mostly like a business problem: how to “match up” items within two separate sets of data that cannot be clearly and cleanly matched up with the data provided.
If the “powers that be” (aka your manager/supervisor/project owner) cannot or will not make this decision, and if you have to do something, based on the information provided I’d recommend matching same-day items like so:
lowest invoice-item service time with lowest remittance claims ref number
next-lowest invoice item service time with next-lowest remittance claims ref number
etc.
(So when you have such multiple-per-day items, do you always have the same number of invoice items and remittances? Or is that going to be your next hurdle?)
Once you know how to implement “matching up” items, you then have to implement it by storing the data that supports/defines the assocaition within the database. Assuming tables InvoiceItem and Remittance, you could add (and populate) ServiceTime in the Remittance table, or ClaimsRefNumber in the InvoiceItem table (the latter seems more sensible to me). Alternatively, as most people suggest, you could add a surrogate key to either (or both) tables, and store one’s surrogate key in the other’s table. (Again, I’d store, say, RemittanceId in table InvoiceItem, as presumably you couldn’t have a Remittance without an InvoiceItem – but it depends strongly upon your buseinss logic.)

SQL Server Business Logic: Deleting Referenced Data

I'm curious on how some other people have handled this.
Imagine a system that simply has Products, Purchase Orders, and Purchase Order Lines. Purchase Orders are the parent in a parent-child relationship to Purchase Order Lines. Purchase Order Lines, reference a single Product.
This works happily until you delete a Product that is referenced by a Purchase Order Line. Suddenly, the Line knows its selling 30 of something...but it doesn't know what.
What's a good way to anticipate the deletion of a referenced piece of data like this? I suppose you could just disallow a product to be deleted if any Purchase Order Lines reference it but that sounds...clunky. I imagine its likely that you would keep the Purchase Order in the database for years, essentially welding the product you want to delete into your database.
The parent entity should NEVER be deleted or the dependent rows cease to make sense, unless you delete them too. While it is "clunky" to display old records to users as valid selections, it is not clunky to have your database continue to make sense.
To address the clunkiness in the UI, some people create an Inactive column that is set to True when an item is no longer active, so that it can be kept out of dropdown lists in the user interface.
If the value is used in a display field (e.g. a readonly field) the inactive value can be styled in a different way (e.g. strike-through) to reflect its no-longer-active status.
I have StartDate and ExpiryDate columns in all entity tables where the entity can become inactive or where the entity will become active at some point in the future (e.g. a promotional discount).
Enforce referential integrity. This basically means creating foreign keys between the tables and making sure that nothing "disappears"
You can also use this to cause referenced items to be deleted when the parent is deleted (cascading deletes).
For example you can create a SQL Server table in such a way that if a PurchaseOrder is deleted it's child PurchaseOrderLines are also deleted.
Here is a good article that goes into that.
It doesn't seem clunky to keep this data (to me at least). If you remove it then your purchase order no longer has the meaning that it did when you created it, which is a bad thing. If you are worried about having old data in there you can always create an archive or warehouse database that contains stuff over a year old or something...
For data like this where parts of it have to be kept for an unknown amount of time while other parts will not, you need to take a different approach.
Your Purchase Order Lines (POL) table needs to have all of the columns that the product table has. When a line item is added to the purchase order, copy all of product data into the POL. This includes the name, price, etc. If the product has options, then you'll have to create a corresponding PurchaseOrderLineOptions table.
This is the only real way of insuring that you can recreate the purchase order on demand at any point. It also means that someone can change the pricing, name, description, and other information about the product at anytime without impacting previous orders.
Yes, you end up with a LOT of duplicate information in your line item table..; but that's okay.
For kicks, you might keep the product id in the POL table for referencing back, but you cannot depend on the product table to have any bearing on the paid for product...

Should I denormalize Loans, Purchases and Sales tables into one table?

Based on the information I have provided below, can you give me your opinion on whether its a good idea to denormalize separate tables into one table which holds different types of contracts?.. What are the pro's/con's?.. Has anyone attempted this before?.. Banking systems use a CIF (Customer Information File) [master] where customers may have different types of accounts, CD's, mortgages, etc. and use transaction codes[types] but do they store them in one table?
I have separate tables for Loans, Purchases & Sales transactions. Rows from each of these tables are joined to their corresponding customer in the following manner:
customer.pk_id SERIAL = loan.fk_id INTEGER;
= purchase.fk_id INTEGER;
= sale.fk_id INTEGER;
Since there are so many common properties among these tables, which revolves around the same merchandise being: pawned, bought and sold, I experimented by consolidating them into one table called "Contracts" and added the following column:
Contracts.Type char(1) {L=Loan, P=Purchase, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic table where for example:
Contracts.Capital DECIMAL(7,2)
in a loan contract it holds the pawn principal amount, in a purchase holds the purchase price, in a sale holds sale price.
Is this design a good idea or should I keep them separate?
Your table second design is better, and, is "normalised".
You first design was the denormalised one!
You are basiclly following a database design/modelling pattern known as "subtype/supertype"
for dealing with things like transactions where there is a lot of common data and some data specific to each tranasaction type.
There are two accepted ways to model this. If the the variable data is minimal you hold everthing in a single table with the transaction type specfic attributes held in "nullable" columns. (Which is essentially your case and you have done the correct thing!).
The other case is where the "uncommon" data varies wildly depending on transaction type in which case you have a table which holds all the "common" attributes, and a table for each type which holds the "uncommon" attributes for that "type".
However while "LOAN", "PURCHASE" and "SALE" are valid as transactions. I think Inventory is a different entity and should have a table on its own. Essentially "LOAN" wll add to inventory transaction, "PURCHASE" wll change of inventory status to Saleable and "SALE" wll remove the item from inventory (or change status to sold). Once an item is added to inventory only its status should change (its still a widget, violin or whatever).
I would argue that it's not denormalized. I see no repeating groups; all attributes depending on a unique primary key. Sounds like a good design to me.
Is there a question here? Are you just looking for confirmation that it's acceptable?
Hypothetically, what would you do if the overwhelming consensus was that you should revert back to separate tables for each type? Would you ignore the evidence from your own experience (better performance, simpler programming) and follow the advice of Stackoverflow.com?

Circular database relationships. Good, Bad, Exceptions?

I have been putting off developing this part of my app for sometime purely because I want to do this in a circular way but get the feeling its a bad idea from what I remember my lecturers telling me back in school.
I have a design for an order system, ignoring the everything that doesn't pertain to this example I'm left with:
CreditCard
Customer
Order
I want it so that,
Customers can have credit cards (0-n)
Customers have orders (1-n)
Orders have one customer(1-1)
Orders have one credit card(1-1)
Credit cards can have one customer(1-1) (unique ids so we can ignore uniqueness of cc number, husband/wife may share cc instances ect)
Basically the last part is where the issue shows up, sometimes credit cards are declined and they wish to use a different one, this needs to update which their 'current' card is but this can only change the current card used for that order, not the other orders the customer may have on disk.
Effectively this creates a circular design between the three tables.
Possible solutions:
Either
Create the circular design, give references:
cc ref to order,
customer ref to cc
customer ref to order
or
customer ref to cc
customer ref to order
create new table that references all three table ids and put unique on the order so that only one cc may be current to that order at any time
Essentially both model the same design but translate differently, I am liking the latter option best at this point in time because it seems less circular and more central. (If that even makes sense)
My questions are,
What if any are the pros and cons of each?
What is the pitfalls of circular relationships/dependancies?
Is this a valid exception to the rule?
Is there any reason I should pick the former over the latter?
Thanks and let me know if there is anything you need clarified/explained.
--Update/Edit--
I have noticed an error in the requirements I stated. Basically dropped the ball when trying to simplify things for SO. There is another table there for Payments which adds another layer. The catch, Orders can have multiple payments, with the possibility of using different credit cards. (if you really want to know even other forms of payment).
Stating this here because I think the underlying issue is still the same and this only really adds another layer of complexity.
A customer can have 0 or more credit cards associated, but the association is dynamic - it can come and go. And as you point out a credit card can be associated with more than one customer. So this ends up being an n:m table, maybe with a flag column for "active".
An order has a static relationship to 0 or 1 credit card, and after a sale is complete, you can't mess with the cc value, no matter what happens to the relationship between the cc and the customer. The order table should independently store all the associated info about the cc at the time of the sale. There's no reason to associate the sale with any other credit card column in any other table (which might change - but it wouldn't affect the sale).
I think the problem is with the modeling of the Order. Instead of one Order has one credit card, an order should be able to be associated with more than one credit card of which only one is active at any time. Essentially, Order and Credit is many-to-many. In order to model this in DB, you need to introduce an association table, let's say PaymentHistory. Now when an order requires a new credit card, you can simply create a new credit card, and associate it with the order and mark the associating PaymentHistory as active.
Hmm?
A customer has several credit cards, but only a current one. An order has a single assigned card. When a customer puchases something, his default card is tried first, otherwise, he may change his main card?
I see no circular references here; when a user's credit card changes, his orders' stay the same. Your tables would end up as:
Customer: id, Current Card
Credit Cards: id, number, customer_id
Order: id, Card_id, Customer_id
Edit: Oops, forgot a field, thanks.
No matter the reason your data has circular relationships, you'll be a lot happier if you "forget" to declare one of them so that your tables have a bulk-load order.
That comes in handy when you least expect it.
This is a year old but there's some points worth making.
NB For on-line NON-ACCOUNT processes: The Customer would be better defined as Buyer and there would also probably be another type of customer - the Beneficiary/Recipient. You can buy/purchase airline tickets and flowers etc. for other people so these two roles need to be clearly separated as they involve different business processes (one to pay and the other to be sent the goods).
If it is a non-account process then you shouldn't be retaining credit card details. It's a security risk - and you're putting the buyer at risk by keeping this information. Credit cards are processed in real-time and then the information should be thrown away.
ACCOUNT CUSTOMERS: The only exception would be when someone has opened an account and provided their credit card information for use in subsequent purchases. In such a case changes to the credit card information would take place outside of the transaction - as part of the Account Management process.
The main point is to make sure that you fully understand the business processes before you start modelling and coding.