Composite key turns out to be not unique....trying to build a fix

Composite key turns out to be not unique....trying to build a fix - sql

OK...I am hoping this is a classic problem that everyone knows the answer to already. I have been building a mysql database (my first one) where the main purpose was to load line-item data from an invoice and related data from the matching remittance and reconcile the two. Basically, everything has been going along fine until I discovered a problem.
Details: I have thus far identified individual invoice line items with a client (to be billed) id, service date, and service type and matching that transaction against the remittance transaction with the same client ID, service date and service type. Unfortunately, there are times (I just discovered) when one client (ID) gets multiple instances of a particular service on the same day and thus my invoices are not unique based on the three components I just mentioned.
There is another piece of info on the invoice (service time) that could be used to make invoice items unique, but the remittance does not include service times (thus I cannot match directly against it using service time). Likewise, the remittance has another piece of info (claims ref number) that uniquely identifies remittance items. But of course, the claims ref number is not on the invoice.
Is there some way to use an intermediate table perhaps that can bridge this gap? Any help, answers or helpful links would be most appreciated. Thanks in advance.

This is perhaps more a business problem then a technical one-- it sounds like there is in fact no reliable way to match up remittances and invoices, unless something like matching on the dollar amount works. If you use an artificial key on the invoice you kind of solve the technical problem but not the business one.
If you can't change the business process at all and there is no technical way to match remittances and invoices, you might be forced to treat all invoices for a customer/service date/service type as a unit; make each invoice a part of that unit, and then group all the remittances and all the invoices that match that unit together.

You can make life easy on yourself and create an Invoice ID and remove the composite key all together.
Any type of fix is going to have an impact on the calling code, as increasing the field count on the composite key implies that this new field needs to be supplied, so I suggest just creating an invoice ID.

Many IT professionals that work with RDBMS will suggest to never use natral keys. Always use a surrogate key (like an auto-increment column)

I agree with #antlersoft (+1), this sounds mostly like a business problem: how to “match up” items within two separate sets of data that cannot be clearly and cleanly matched up with the data provided.
If the “powers that be” (aka your manager/supervisor/project owner) cannot or will not make this decision, and if you have to do something, based on the information provided I’d recommend matching same-day items like so:
lowest invoice-item service time with lowest remittance claims ref number
next-lowest invoice item service time with next-lowest remittance claims ref number
etc.
(So when you have such multiple-per-day items, do you always have the same number of invoice items and remittances? Or is that going to be your next hurdle?)
Once you know how to implement “matching up” items, you then have to implement it by storing the data that supports/defines the assocaition within the database. Assuming tables InvoiceItem and Remittance, you could add (and populate) ServiceTime in the Remittance table, or ClaimsRefNumber in the InvoiceItem table (the latter seems more sensible to me). Alternatively, as most people suggest, you could add a surrogate key to either (or both) tables, and store one’s surrogate key in the other’s table. (Again, I’d store, say, RemittanceId in table InvoiceItem, as presumably you couldn’t have a Remittance without an InvoiceItem – but it depends strongly upon your buseinss logic.)

Related

Database design for coupon usuage restriction

While working on implementing voucher feature for an eCommerce application, I need to implement Voucher usage restriction, some of restriction I am planning to have
Products
Exclude products
Product categories
Exclude categories
Email /Customer restrictions
Currently We are supporting following 2 type of Vouchers with an option to create Custom voucher type and all those Vouchers types are being maintained in a single table with help of discriminator (Hibernate use).
Serial Vouchers
Promotion Vouchers.
these are only few which I am targeting at initial stage.My main confusion is about database design and restriction of these voucher usage with Voucher.I am not able to decide which is best way to Map these restrictions in database.
Should I go for a single table for all these restriction and have a relation with Voucher table or is it good to group all similar type of restriction in a single table and have their relation with Voucher table.
As an additional information , we are using hibernate to map our entities with the DB table.

This seems like a very wide-open and freeform requirement. Some questions:
How complex will the business rules you are attempting to model be? If you’re allowing (business) users to define their own vouchers, odds are good they’ll come up with some pretty byzantine rules and combinations. If you have to support anything they come up with, you will have problems.
What will the database be tasked to do with this data? Store the “voucher definition”, sure, but then what? Run tallies or reports on them? Analyze how many are used, by who/when/how/for what? Or just list out what was used/generated over the past year?
What kind of data volumes are you going to have? One entry per voucher definition, or per voucher printed/issued? (If the latter, can you use one entry per voucher, with a count of how many issued?) Are we talking dozens, hundreds, or millions of vouchers?
If it’s totally free-form, if they just want a listing without serious analysis, if the overall volume is small, consider using blob fields rather than minutiae-oriented columns. Something like a big text field and a data-entry box wherein the user will “Enter any other criteria defining the voucher”. (You might even do this using XML.) Ugly, you can’t readily analyze the data, but if the goals are too great or diffuse and you're not going to use all that detailed data, it might be necessary.
A final note: a voucher that is good for only selected products cannot be used on products that are added after the voucher is created. A voucher that is good for all but selected products can be used for subsequently created products. This logic may apply to any voucher-limiting criteria. Both methodologies have merit, make sure the users are clear on what they’re doing.

If I understand what your your are doing, you will have a problem with only one table for all restrictions, because it means 1 row per Voucher and multiple values in your different restrictions columns.
It will be harder for you to UPDATE, extract and cast restrictions values.
In my opinion, you should have one table for each restrictions type and map them with Voucher table. However It will be easier for you to add new restrictions.

As a suggestion:
Isn't it more rational to have valid-products and valid-categories instead of Exclude-products and Exclude-categories?
Having a Customer-Creditgroup table will lead us to have valid-customer-group table.
BTW in the current design we can have a voucher definition table, I will call it voucher-type table.
About the restrictions:
In RDBMS level you can state only two types of table constraints decoratively:
uniquely identifying attributes (keys)
Subsets requirements referencing back to the same or other table
(foreign key)
Implementing all other types of table constraints (like a multi-tuple constraints or transition constraints) requires you to develop procedural data integrity code.
When a voucher is going to sold to a specific customer for a specific product we will need to check validity of excluded elements, that could be done by triggers in data base level or business logic of your application.

I would personally go with your second proposal... grouping all similar types of restrictions in a single table, which refers the Voucher table.
I'll add to that, that you can handle includes and excludes on the same table.
So the structure I'd use is some along the lines of:
Voucher (id, type, etc...)
VoucherProductRestriction (id,voucher_id,product_id,include)
VoucherProductCategoryRestriction (id,voucher_id,product_category_id,include)
VoucherCustomerRestriction (id,voucher_id,customer_id)
VoucherEmailRestriction (id,voucher_id,email)
...where the include column could be a boolean that is true in case you want to restrict the voucher to that product or category, or false if you want to restrict it to any product or category other than those specificied.
If I understand your context correctly, it makes no sense to have both include and exclude restrictions on the same voucher (although it could make sense to have more than one of the same type). You can probably handle and check this better if you use a single table for both types of restrictions.

Modelling the Domain from Two perspectives

I'm trying to model the domain of my system but I've come across and issue and could do with some help.
My issue is one of perspective. I'm modeling a system where I have a Customer entity which will have a number of Order entities and the system will be required to list all the Orders for a selected Customer (perspective 1). I therefore modeled a Customer class which contains a collection of Orders... simple. However I've just realised that the system will also need to list all Orders with the details of the Customer (perspective 2) which would mean that I had a single Customer reference from each Order.
The problem is that from each perspective I will be taking time to create object which I will not be interested in E.g. When I will display a list of Orders a Customer instance will be created for each order; in turn the Customer instance will then hold a collection of Orders they have made (which from this perspective I'm not interested in!!).
Could anybody help with suggestions? I've come across this issue before but I've never taken the time to design a proper solution.
Regards,
JLove

I have seen this before. The trick is to differentiate between Customer-Identity and Customer-Details (e.g. Orders). You can then link from all Order-Objects to the Customer-Identity-Object, and in the other view link from the Customer-Identity-Object to the Customer-Details-Object which further links to Order-Objects (you probably want this ordered chronologically).
The implementation can be held as on Object-System or as a relational Database (in which case you would have a table "Customers" with CustomerID as Key, their addresses etc; and a table "Orders" with OrderID as key, and CustomerID as another column.

SQL Server Business Logic: Deleting Referenced Data

I'm curious on how some other people have handled this.
Imagine a system that simply has Products, Purchase Orders, and Purchase Order Lines. Purchase Orders are the parent in a parent-child relationship to Purchase Order Lines. Purchase Order Lines, reference a single Product.
This works happily until you delete a Product that is referenced by a Purchase Order Line. Suddenly, the Line knows its selling 30 of something...but it doesn't know what.
What's a good way to anticipate the deletion of a referenced piece of data like this? I suppose you could just disallow a product to be deleted if any Purchase Order Lines reference it but that sounds...clunky. I imagine its likely that you would keep the Purchase Order in the database for years, essentially welding the product you want to delete into your database.

The parent entity should NEVER be deleted or the dependent rows cease to make sense, unless you delete them too. While it is "clunky" to display old records to users as valid selections, it is not clunky to have your database continue to make sense.
To address the clunkiness in the UI, some people create an Inactive column that is set to True when an item is no longer active, so that it can be kept out of dropdown lists in the user interface.
If the value is used in a display field (e.g. a readonly field) the inactive value can be styled in a different way (e.g. strike-through) to reflect its no-longer-active status.
I have StartDate and ExpiryDate columns in all entity tables where the entity can become inactive or where the entity will become active at some point in the future (e.g. a promotional discount).

Enforce referential integrity. This basically means creating foreign keys between the tables and making sure that nothing "disappears"
You can also use this to cause referenced items to be deleted when the parent is deleted (cascading deletes).
For example you can create a SQL Server table in such a way that if a PurchaseOrder is deleted it's child PurchaseOrderLines are also deleted.
Here is a good article that goes into that.
It doesn't seem clunky to keep this data (to me at least). If you remove it then your purchase order no longer has the meaning that it did when you created it, which is a bad thing. If you are worried about having old data in there you can always create an archive or warehouse database that contains stuff over a year old or something...

For data like this where parts of it have to be kept for an unknown amount of time while other parts will not, you need to take a different approach.
Your Purchase Order Lines (POL) table needs to have all of the columns that the product table has. When a line item is added to the purchase order, copy all of product data into the POL. This includes the name, price, etc. If the product has options, then you'll have to create a corresponding PurchaseOrderLineOptions table.
This is the only real way of insuring that you can recreate the purchase order on demand at any point. It also means that someone can change the pricing, name, description, and other information about the product at anytime without impacting previous orders.
Yes, you end up with a LOT of duplicate information in your line item table..; but that's okay.
For kicks, you might keep the product id in the POL table for referencing back, but you cannot depend on the product table to have any bearing on the paid for product...

One mysql table with many fields or many (hundreds of) tables with fewer fields?

I am designing a system for a client, where he is able to create data forms for various products he sales him self.
The number of fields he will be using will not be more than 600-700 (worst case scenario). As it looks like he will probably be in the range of 400 - 500 (max).
I had 2 methods in mind for creating the database (using meta data):
a) Create a table for each product, which will hold only fields necessary for this product, which will result to hundreds of tables but with only the neccessary fields for each product
or
b) use one single table with all availabe form fields (any range from current 300 to max 700), resulting in one table that will have MANY fields, of which only about 10% will be used for each product entry (a product should usualy not use more than 50-80 fields)
Which solution is best? keeping in mind that table maintenance (creation, updates and changes) to the table(s) will be done using meta data, so I will not need to do changes to the table(s) manually.
Thank you!
/**** UPDATE *****/
Just an update, even after this long time (and allot of additional experience gathered) I needed to mention that not normalizing your database is a terrible idea. What is more, a not normalized database almost always (just always from my experience) indicates a flawed application design as well.

i would have 3 tables:
product
id
name
whatever else you need
field
id
field name
anything else you might need
product_field
id
product_id
field_id
field value

Your key deciding factor is whether normalization is required. Even though you are only adding data using an application, you'll still need to cater for anomalies, e.g. what happens if someone's phone number changes, and they insert multiple rows over the lifetime of the application? Which row contains the correct phone number?
As an example, you may find that you'll have repeating groups in your data, like one person with several phone numbers; rather than have three columns called "Phone1", "Phone2", "Phone3", you'd break that data into its own table.
There are other issues in normalisation, such as transitive or non-key dependencies. These concepts will hopefully lead you to a database table design without modification anomalies, as you should hope for!

Pulegiums solution is a good way to go.
You do not want to go with the one-table-for-each-product solution, because the structure of your database should not have to change when you insert or delete a product. Only the rows of one or many tables should be inserted or deleted, not the tables themselves.

While it's possible that it may be necessary, having that many fields for something as simple as a product list sounds to me like you probably have a flawed design.
You need to analyze your potential table structures to ensure that each field contains no more than one piece of information (e.g., "2 hammers, 500 nails" in a single field is bad) and that each piece of information has no more than one field where it belongs (e.g., having phone1, phone2, phone3 fields is bad). Either of these situations indicates that you should move that information out into a separate, related table with a foreign key connecting it back to the original table. As pulegium has demonstrated, this technique can quickly break things down to three tables with only about a dozen fields total.

SQL/MySQL structure (Denormalize or keep relational)

I have a question about best practices related to de-normalization or table hierarchy relationships.
For a simple example, let's say I have an app that allows a user to make a payment for an order. I save the order information in the orders table, and I have another table for the payment called payments. Payments has a foreign key to the orders table.
Let's assume that I can pay with a credit card, check, or paypal, and I want to save the information about the payment.
My question is what is the best way to handle this relationship between the different payment data and the payment table. The types of payment all have different data associated with them. So do I denormalize the payments table, putting credit card, check, and paypal information fields in there and then just use the fields as necessary. Alternately I could specify a payment type, and store the information in their own tables, but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...

I would choose to keep the database normalized.
but then I would have to use logic on an application level to get the data out of the correct credit card, check or paypal information tables...
You have to use logic (or at least mapping) in either case. Whether its what table to pull the data from or what fields in the table to access.

What about keeping it denormalized and then making a view to put the data back together again. You get the best of both worlds. IIRC, MySQL introduced views in version 5.

So do I denormalize the payments
table, putting credit card, check, and
paypal information fields in there and
then just use the fields as necessary.
yes. but this is not "denormalizing". if you stored order information in the client table, that would be denormalizing. adding nullable columns to accurately describe a payment in the payments table is not.

You can use the idea of table per subclass as the ORM tools do. This would require a join for each query against the payment table but...
Create tables for each payment type so you will have a creditcardpayment and a checkpayment table. The common fields go in the payment table, the specific fields go in the sub tables. The sub tables primary keys are foreign keys to the payment table's id.
To add a new payment you have to first insert the common fields into the payment table, get the id generated, then insert the specific fields into the specific sub table.
To query you have to join the subtables with the payment table. You could use a view to make that easier.
This way the database is still normalized and you have no null columns.

It partially depends on the framework (if any) that you are using. For instance: the Ruby on Rails way would generally be to store the type of the payment in the payments table and then have different, separate tables for each payment type (PayPal, Credit Card, etc).
Alternatively, if you notice that you are repeating the same data in many of the tables, Rails has a way to store all of the data in the same table, using only the fields you need, but still allowing you to have separate objects. For instance, you would have an AbstractPayment object with an abstract_payments table, but you would also have PayPalPayment and CreditCardPayment objects that both inherit from AbstractPayment and use the abstract_payments table. All you need to determine the payment type is a column in abstract_payments that tells you which type it is (probably a string, but could be an integer if you so choose). This is called STI.
No matter what framework/language you use, the same ideas can definitely apply and I think the right solution will depend on how many different types of payments you have, compared with how simple you want your database to be.

Keep it as normalized as possible. Only de-normalize when the performance of a fully normalized schema requires denormalization to improve response time, and do that only on a case by case basis to deal with specific performance issues associated with individual querys within your application.
These are complex problems. Database Normalization requires intimate domain knowledge, and a skilled analysis of how that domain model will be manipulated and utilized within your application. Denormalizing for performance requires that you understand your application's usage patterns well enough to predict performance issues before they occur (waiting till they actually occur in production is too late - by then making fundemental schema changes in the database is very expensive) and know what denormalization techniques to use to address them.

You need to weight the following factors:
How much space will you waste if you put all data into a single table
How complex the SQL queries will become in either case.
If you use different tables, you'll have to use joins. If you put everything into a single table, you'll need to find some magic to "ignore" the rows which don't matter (say when you want to find all credit card payments: Your query must then ignore everything that's something else).
The latter part gets more easy when you move the special data into special tables at the cost of more complex joins.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas