SQL DB structure - Draft orders and consideration of Order ID as identifier


I am upgrading a system for a client which I developed around 10 years ago.
It is a standard (if there can be such a thing, of course) sales / inventory / accounting system.
One of the additions they have asked me about was the ability to create draft orders. As the company has grown, so have the sizes of the orders. They want the ability to begin entering an order for a client and have the option of saving and coming back to it later.
My initial thought would be to have an orders table which includes drafts and a field which signifies the status (draft / posted). This would prevent duplicating data across an Orders table and a DraftOrders table.
This seems correct to me but of course the OrderId field (auto-increment int) would no longer be a solid identifier for the Order (since a lot of the numbers in between orders may be missing).
The client would ideally like to keep the OrderId as an identifier so is there any solution which would enable this, rather than creating a draft order table?
Many thanks in advance for your help.
Kind regards

If you need to ensure that the identifier has no gaps for taxation purposes, you cannot use the auto-increment PK in the first place. This is because the sequence may have gaps, too. For example, if an INSERT fails due to some constraint violation you lose the reserved sequence number.
In case you do not want to create a separate table, I would suggest adding a new column to store the tax order ID. It will remain NULL for drafts and will be filled programmatically when the order is placed. On the UI you will show this new column and will possibly allow some searching on it (hint: good candidate for an index), yet internally you will still use the same FKs as before (for both orders and drafts).
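A minimal sketch of the nullable-column idea, using Python's built-in sqlite3 purely as a stand-in for the real database. The names orders, order_number, create_draft and post_order are made up for illustration, and the MAX + 1 assignment assumes postings are serialized (SQLite's single writer does that here; another engine would likely need a lock or a counter table).

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE orders (
            order_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key, gaps are fine
            status       TEXT NOT NULL DEFAULT 'draft'
                         CHECK (status IN ('draft', 'posted')),
            order_number INTEGER UNIQUE                      -- stays NULL while a draft
        )
    """)
    return conn

def create_draft(conn):
    """Insert a draft and return its internal order_id."""
    cur = conn.execute("INSERT INTO orders DEFAULT VALUES")
    return cur.lastrowid

def post_order(conn, order_id):
    """Mark a draft as posted and give it the next user-facing number."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            """
            UPDATE orders
               SET status = 'posted',
                   order_number = (SELECT COALESCE(MAX(order_number), 0) + 1
                                     FROM orders)
             WHERE order_id = ? AND status = 'draft'
            """,
            (order_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"order {order_id} is not a draft")
    row = conn.execute(
        "SELECT order_number FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]
```

Posting the third draft first gives it order_number 1, so the visible numbers follow posting order rather than the order in which the drafts were started.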

Related

Forward Planning for SQL Invoice Auto-increment

I am developing a database system for my employer and part of this involves creating invoices. I've been thinking about the auto-increment ids on my tables, and to what extent I need to make allowances for growth of the business. I am utilising InnoDB because the system will be very comprehensive, and many records will get updated.
Simplified, here is what I have currently:
Office (An office/store of the business. Currently 2.)
office_id (PK) INT, AI, UN
Invoice
invoice_no (PK) INT, AI, UN
office_id (FK) (Where the invoice originated from.)
Products
product_id (PK) INT, AI, UN
InvoiceLine (Ties products to an invoice to make the lines.)
invoice_line_id (PK) INT, AI, UN
invoice_no (FK)
product_id (FK)
quantity
Firstly, while I'll probably never run out of invoice numbers, I wonder if there may be a better way to approach this, just in case the business does have an unanticipated expansion of offices and increase in sales. How would a large company with say 50+ stores tackle this? Would each store likely have its own set of invoice numbers starting from 1?
This is what I've considered...
Option 1 - Should I make the invoice_no bigger than the standard 10 precision? Regardless of difficulty, could this be changed after deployment if we saw the current limit would be insufficient, or is this impossible/highly problematic?
Option 2 - Pardon my ignorance but is it possible/wise to have a database made up of tables with different engine types? It is my understanding that with MyISAM, the invoice table could have a composite key of office_id and invoice_no, where the auto-incrementing number would increase separately for each office. Is this true and viable?
Option 3 - Could I have new tables created upon the insert of new office? Create table InvoiceX & InvoiceXLine, where X is the office_id?
Is there a better, simpler method that I'm just not thinking of?
Secondly, if the business expands and we were averaging 30+ lines per invoice, it is conceivable that the invoice_line_ids would run out in the long term. So I probably need a similar solution for this, except Option 3 above (creating an InvoiceLineX table for every invoice_no) would be completely impractical in this case.
Could I simply make the primary key for the InvoiceLine table a composite of invoice_no and product_id?

It's kind of a business question. Until you know how they intend to send invoices, why would you guess? That said, if I had to keep an eye on the future I'd keep a few separate IDs.
A master, magic number that is just the sequential unique ID that's as big as you need (maybe an INT, maybe bigger depending on your business size),
an "invoice originator" column being the store (or whatever) that generated it,
another column for "invoice processing entity ID" being the store/accounts office that issued/needs to deal with it throughout its lifecycle.
That gives you more flex if you have, say, the largest store in a state processing all invoices in that state. Of course this is guesswork!
The point of all this is that you've collected a lot of data that will likely be useful in its own right and then your actual invoice number will be some combination of those things.
Use your imagination (or business analyst) about what else you might want to keep & use.
Can't help you with the DB types.
Do not have one table per location/invoice line. That would suck big time.
A side note - you will always get gaps in your IDs. They are unavoidable, so try not to get distracted by that or insist on gap-free numbering. You can't get that with any level of performance and you probably don't even need it.
If you think you might need it gap-free or broken down by location, put in a batch job that allocates an office/store/whatever specific number at the end of each day. That way you can allocate some nice numbers as you see fit, using the basic sequence from the underlying IDs.
I think the short answer is go with more or less what you have unless it proves to be wrong. All your suggestions are to do with problems you either don't know the answer to or that won't happen.
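For what it's worth, the end-of-day batch idea could look roughly like this. It is only a sketch in Python with the built-in sqlite3 module standing in for MySQL; the office_seq column and the allocate_office_numbers function are hypothetical names, not part of the schema in the question.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE invoice (
            invoice_id INTEGER PRIMARY KEY,  -- underlying sequence, may have gaps
            office_id  INTEGER NOT NULL,
            office_seq INTEGER,              -- per-office number, NULL until the batch runs
            UNIQUE (office_id, office_seq)
        )
    """)
    return conn

def allocate_office_numbers(conn):
    """Give each unnumbered invoice the next number for its office, in invoice_id order."""
    with conn:  # all allocations commit together or not at all
        pending = conn.execute(
            "SELECT invoice_id, office_id FROM invoice "
            "WHERE office_seq IS NULL ORDER BY invoice_id"
        ).fetchall()
        next_seq = {}  # office_id -> next number to hand out
        for invoice_id, office_id in pending:
            if office_id not in next_seq:
                highest = conn.execute(
                    "SELECT COALESCE(MAX(office_seq), 0) FROM invoice WHERE office_id = ?",
                    (office_id,),
                ).fetchone()[0]
                next_seq[office_id] = highest + 1
            conn.execute(
                "UPDATE invoice SET office_seq = ? WHERE invoice_id = ?",
                (next_seq[office_id], invoice_id),
            )
            next_seq[office_id] += 1
```

Because the batch is the only thing that writes office_seq, the per-office numbers come out consecutive even when invoice_id has gaps.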

Counting rows with condition

My Table looks like something below
Id | Customer_number | Customer_Name | Customer_owner
I want to insert Customer_Number as a sequence specific to Customer_owner
that is 1,2,3,.... for Customer_owner X and 1,2,3,... Customer_owner Y.
To get the Customer_number I can use following SQL
SELECT COUNT(*) FROM Customer where Customer_owner='X'
My question is: is there any performance impact, especially for a table with 100,000 records?
Are there any better alternatives?

In terms of performance, I would suggest not adding another column to Customers, for various reasons:
Every customer row belonging to owner A would need updating whenever a customer is added to, or removed from, owner A.
The number of customers is repeated on many rows, taking up more space and thus (generally) slowing execution.
There is no real use in attaching an owner's customer count to a record that describes a single customer.
There are other reasons besides these.
The correct normal form would be having 2 tables:
1. Customers (Cust_id, Cust_name, Cust_Owner_id)
2.a. Owners (Owner_id, Owner_name, NumberOfCustomers)
OR
2.b. Owners (Owner_id, Owner_name), with NumberOfCustomers calculated automatically when querying.
Edit:
Since you want to display all the customers for a single owner, which I assume is your main usage, you should add a clustered index on Cust_Owner_id. Then, when querying, performance should be good, since the rows are stored clustered by the column you filter on.
Read more about clustering here: Clustered Index
Edit 2:
I've just realized your intent from the latest comments, but the solution still stands. Specific to your issue, I would add that I don't recommend storing the number for each of an owner's customers. Instead, keep a SUBSCRIBED DATE column in the Customers table and decide on the customer number at display time when querying.
If, however, you want that number to be permanent (and by that the order 1,2,3,..n will probably break, since customers can be removed), simply use the Customer_Id, since it is already unique.

You can calculate Customer_Number on the fly when you need it:
select c.*, row_number() over (partition by Customer_Owner order by id) as CustomerNumber
from Customer c
This is a much safer approach than trying to store and maintain the number, which can be affected by all sorts of updates into the system. Imagine the fun of changing the numbering when an existing record changes it ownership, for instance.
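Here is a small runnable sketch of that query, using Python's built-in sqlite3 as a stand-in database (it assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived). The sample rows and the numbered_customers helper are made up for illustration.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Customer_Name TEXT, Customer_owner TEXT)"
    )
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?)",
        [(1, "Ann", "X"), (2, "Bob", "Y"), (3, "Cid", "X"), (4, "Dee", "Y"), (5, "Eve", "X")],
    )
    return conn

def numbered_customers(conn):
    """Return (owner, name, per-owner number); the number is computed, never stored."""
    return conn.execute("""
        SELECT Customer_owner,
               Customer_Name,
               ROW_NUMBER() OVER (PARTITION BY Customer_owner ORDER BY Id) AS CustomerNumber
          FROM Customer
         ORDER BY Customer_owner, CustomerNumber
    """).fetchall()

conn = make_db()
before = numbered_customers(conn)

# Moving a customer to another owner renumbers both owners with no extra work.
conn.execute("UPDATE Customer SET Customer_owner = 'Y' WHERE Customer_Name = 'Cid'")
after = numbered_customers(conn)
```

After the ownership change, Eve becomes customer 2 of owner X and Cid slots in as customer 2 of owner Y, which also shows why these numbers should not be treated as permanent identifiers.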

If you only need unique numbering in the UI you could just assign the numbers in the UI. If you go that route you need to make sure you always retrieve customers in the same order, so add an ORDER BY Id. Or do what Gordon Linoff suggests.

Composite key turns out to be not unique....trying to build a fix

OK...I am hoping this is a classic problem that everyone knows the answer to already. I have been building a mysql database (my first one) where the main purpose was to load line-item data from an invoice and related data from the matching remittance and reconcile the two. Basically, everything has been going along fine until I discovered a problem.
Details: I have thus far identified individual invoice line items with a client (to be billed) id, service date, and service type and matching that transaction against the remittance transaction with the same client ID, service date and service type. Unfortunately, there are times (I just discovered) when one client (ID) gets multiple instances of a particular service on the same day and thus my invoices are not unique based on the three components I just mentioned.
There is another piece of info on the invoice (service time) that could be used to make invoice items unique, but the remittance does not include service times (thus I cannot match directly against it using service time). Likewise, the remittance has another piece of info (claims ref number) that uniquely identifies remittance items. But of course, the claims ref number is not on the invoice.
Is there some way to use an intermediate table perhaps that can bridge this gap? Any help, answers or helpful links would be most appreciated. Thanks in advance.

This is perhaps more a business problem than a technical one: it sounds like there is in fact no reliable way to match up remittances and invoices, unless something like matching on the dollar amount works. If you use an artificial key on the invoice you kind of solve the technical problem but not the business one.
If you can't change the business process at all and there is no technical way to match remittances and invoices, you might be forced to treat all invoices for a customer/service date/service type as a unit; make each invoice a part of that unit, and then group all the remittances and all the invoices that match that unit together.

You can make life easy on yourself and create an Invoice ID and remove the composite key altogether.
Any type of fix is going to have an impact on the calling code, as increasing the field count on the composite key implies that this new field needs to be supplied, so I suggest just creating an invoice ID.

Many IT professionals who work with RDBMSs will suggest never using natural keys. Always use a surrogate key (like an auto-increment column).

I agree with @antlersoft (+1), this sounds mostly like a business problem: how to “match up” items within two separate sets of data that cannot be clearly and cleanly matched up with the data provided.
If the “powers that be” (aka your manager/supervisor/project owner) cannot or will not make this decision, and if you have to do something, based on the information provided I’d recommend matching same-day items like so:
lowest invoice-item service time with lowest remittance claims ref number
next-lowest invoice item service time with next-lowest remittance claims ref number
etc.
(So when you have such multiple-per-day items, do you always have the same number of invoice items and remittances? Or is that going to be your next hurdle?)
Once you know how to “match up” items, you then have to implement it by storing the data that supports/defines the association within the database. Assuming tables InvoiceItem and Remittance, you could add (and populate) ServiceTime in the Remittance table, or ClaimsRefNumber in the InvoiceItem table (the latter seems more sensible to me). Alternatively, as most people suggest, you could add a surrogate key to either (or both) tables, and store one’s surrogate key in the other’s table. (Again, I’d store, say, RemittanceId in table InvoiceItem, as presumably you couldn’t have a Remittance without an InvoiceItem, but it depends strongly upon your business logic.)
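The lowest-with-lowest pairing described above can be sketched by ranking both sides within each client/date/type group and joining on the rank. This uses Python's built-in sqlite3 (3.25+ for window functions) rather than MySQL, and the column names and sample rows are assumptions invented for the example.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE InvoiceItem (
            InvoiceItemId INTEGER PRIMARY KEY,
            ClientId INTEGER, ServiceDate TEXT, ServiceType TEXT, ServiceTime TEXT
        );
        CREATE TABLE Remittance (
            RemittanceId INTEGER PRIMARY KEY,
            ClientId INTEGER, ServiceDate TEXT, ServiceType TEXT, ClaimsRefNumber TEXT
        );
    """)
    conn.executemany("INSERT INTO InvoiceItem VALUES (?, ?, ?, ?, ?)", [
        (1, 7, "2024-01-05", "PT", "09:00"),
        (2, 7, "2024-01-05", "PT", "14:00"),   # same client, day and type as item 1
        (3, 7, "2024-01-06", "PT", "10:00"),
        (4, 8, "2024-01-05", "OT", "11:00"),   # no remittance received yet
    ])
    conn.executemany("INSERT INTO Remittance VALUES (?, ?, ?, ?, ?)", [
        (101, 7, "2024-01-05", "PT", "C-0002"),
        (102, 7, "2024-01-05", "PT", "C-0001"),
        (103, 7, "2024-01-06", "PT", "C-0003"),
    ])
    return conn

def match_items(conn):
    """Pair the n-th service time with the n-th claims ref number inside each group."""
    return conn.execute("""
        WITH inv AS (
            SELECT InvoiceItemId, ClientId, ServiceDate, ServiceType,
                   ROW_NUMBER() OVER (PARTITION BY ClientId, ServiceDate, ServiceType
                                      ORDER BY ServiceTime) AS rn
              FROM InvoiceItem
        ), rem AS (
            SELECT RemittanceId, ClientId, ServiceDate, ServiceType,
                   ROW_NUMBER() OVER (PARTITION BY ClientId, ServiceDate, ServiceType
                                      ORDER BY ClaimsRefNumber) AS rn
              FROM Remittance
        )
        SELECT inv.InvoiceItemId, rem.RemittanceId
          FROM inv
          LEFT JOIN rem
            ON rem.ClientId = inv.ClientId
           AND rem.ServiceDate = inv.ServiceDate
           AND rem.ServiceType = inv.ServiceType
           AND rem.rn = inv.rn
         ORDER BY inv.InvoiceItemId
    """).fetchall()

pairs = match_items(make_db())
```

The LEFT JOIN leaves RemittanceId empty for invoice items with no counterpart, which is one way to spot the "different number of items on each side" case. The resulting pairs could then be stored, for example as RemittanceId on InvoiceItem.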

SQL Server Business Logic: Deleting Referenced Data

I'm curious on how some other people have handled this.
Imagine a system that simply has Products, Purchase Orders, and Purchase Order Lines. Purchase Orders are the parent in a parent-child relationship to Purchase Order Lines. Purchase Order Lines, reference a single Product.
This works happily until you delete a Product that is referenced by a Purchase Order Line. Suddenly, the Line knows it's selling 30 of something...but it doesn't know what.
What's a good way to anticipate the deletion of a referenced piece of data like this? I suppose you could just disallow a product to be deleted if any Purchase Order Lines reference it but that sounds...clunky. I imagine it's likely that you would keep the Purchase Order in the database for years, essentially welding the product you want to delete into your database.

The parent entity should NEVER be deleted or the dependent rows cease to make sense, unless you delete them too. While it is "clunky" to display old records to users as valid selections, it is not clunky to have your database continue to make sense.
To address the clunkiness in the UI, some people create an Inactive column that is set to True when an item is no longer active, so that it can be kept out of dropdown lists in the user interface.
If the value is used in a display field (e.g. a readonly field) the inactive value can be styled in a different way (e.g. strike-through) to reflect its no-longer-active status.
I have StartDate and ExpiryDate columns in all entity tables where the entity can become inactive or where the entity will become active at some point in the future (e.g. a promotional discount).
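A rough sketch of the Inactive-column approach, written against Python's built-in sqlite3 rather than SQL Server. The table layout and the helper names (retire_product, selectable_products, line_description) are invented for the example.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (
            ProductId INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL,
            Inactive  INTEGER NOT NULL DEFAULT 0   -- 1 = hidden from pick lists
        );
        CREATE TABLE PurchaseOrderLine (
            LineId    INTEGER PRIMARY KEY,
            ProductId INTEGER NOT NULL REFERENCES Product(ProductId),
            Quantity  INTEGER NOT NULL
        );
    """)
    return conn

def retire_product(conn, product_id):
    """'Delete' a product by flagging it; the row stays for old order lines."""
    with conn:
        conn.execute("UPDATE Product SET Inactive = 1 WHERE ProductId = ?", (product_id,))

def selectable_products(conn):
    """Products that may appear in a dropdown for new orders."""
    return conn.execute(
        "SELECT ProductId, Name FROM Product WHERE Inactive = 0 ORDER BY Name"
    ).fetchall()

def line_description(conn, line_id):
    """Old lines still resolve their product; Inactive lets the UI style it differently."""
    return conn.execute("""
        SELECT l.Quantity, p.Name, p.Inactive
          FROM PurchaseOrderLine AS l
          JOIN Product AS p ON p.ProductId = l.ProductId
         WHERE l.LineId = ?
    """, (line_id,)).fetchone()
```

A StartDate/ExpiryDate pair would work the same way, with the dropdown query filtering on the current date instead of a flag.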

Enforce referential integrity. This basically means creating foreign keys between the tables and making sure that nothing "disappears"
You can also use this to cause referenced items to be deleted when the parent is deleted (cascading deletes).
For example, you can create a SQL Server table in such a way that if a PurchaseOrder is deleted its child PurchaseOrderLines are also deleted.
Here is a good article that goes into that.
It doesn't seem clunky to keep this data (to me at least). If you remove it then your purchase order no longer has the meaning that it did when you created it, which is a bad thing. If you are worried about having old data in there you can always create an archive or warehouse database that contains stuff over a year old or something...
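To make the two behaviours concrete, here is a sketch using Python's built-in sqlite3 in place of SQL Server. The table and column names are assumed, and the PRAGMA line is a SQLite-specific switch needed to turn foreign key enforcement on for the connection.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript("""
        CREATE TABLE Product (
            ProductId INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL
        );
        CREATE TABLE PurchaseOrder (
            PurchaseOrderId INTEGER PRIMARY KEY
        );
        CREATE TABLE PurchaseOrderLine (
            LineId          INTEGER PRIMARY KEY,
            -- deleting an order removes its lines
            PurchaseOrderId INTEGER NOT NULL
                            REFERENCES PurchaseOrder(PurchaseOrderId) ON DELETE CASCADE,
            -- no ON DELETE action: a referenced product cannot be deleted
            ProductId       INTEGER NOT NULL REFERENCES Product(ProductId),
            Quantity        INTEGER NOT NULL
        );
    """)
    return conn

def try_delete_product(conn, product_id):
    """Return True if the product was deleted, False if a line still references it."""
    try:
        conn.execute("DELETE FROM Product WHERE ProductId = ?", (product_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With one order line pointing at a product, try_delete_product returns False. Deleting the parent PurchaseOrder removes the line through the cascade, after which the product can be deleted.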

For data like this where parts of it have to be kept for an unknown amount of time while other parts will not, you need to take a different approach.
Your Purchase Order Lines (POL) table needs to have all of the columns that the product table has. When a line item is added to the purchase order, copy all of product data into the POL. This includes the name, price, etc. If the product has options, then you'll have to create a corresponding PurchaseOrderLineOptions table.
This is the only real way of ensuring that you can recreate the purchase order on demand at any point. It also means that someone can change the pricing, name, description, and other information about the product at any time without impacting previous orders.
Yes, you end up with a LOT of duplicate information in your line item table, but that's okay.
For kicks, you might keep the product id in the POL table for referencing back, but you cannot depend on the product table to have any bearing on the paid for product...
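A minimal sketch of the copy-on-insert idea, again with Python's built-in sqlite3 standing in for SQL Server. The columns (ProductName, UnitPrice as integer cents) and the add_line helper are assumptions chosen for the example, and ProductId on the line is deliberately not a foreign key so the product can change or disappear.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (
            ProductId INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL,
            Price     INTEGER NOT NULL          -- price in cents
        );
        CREATE TABLE PurchaseOrderLine (
            LineId          INTEGER PRIMARY KEY,
            PurchaseOrderId INTEGER NOT NULL,
            ProductId       INTEGER,            -- back-reference only, not relied upon
            ProductName     TEXT NOT NULL,      -- snapshot taken when the line is added
            UnitPrice       INTEGER NOT NULL,   -- snapshot taken when the line is added
            Quantity        INTEGER NOT NULL
        );
    """)
    return conn

def add_line(conn, purchase_order_id, product_id, quantity):
    """Copy the product's current name and price into the new order line."""
    with conn:
        conn.execute("""
            INSERT INTO PurchaseOrderLine
                   (PurchaseOrderId, ProductId, ProductName, UnitPrice, Quantity)
            SELECT ?, ProductId, Name, Price, ?
              FROM Product
             WHERE ProductId = ?
        """, (purchase_order_id, quantity, product_id))

def order_lines(conn, purchase_order_id):
    """Rebuild the order from the line table alone, without touching Product."""
    return conn.execute(
        "SELECT ProductName, UnitPrice, Quantity FROM PurchaseOrderLine "
        "WHERE PurchaseOrderId = ? ORDER BY LineId",
        (purchase_order_id,),
    ).fetchall()
```

Renaming, repricing or even deleting the product afterwards leaves order_lines unchanged, which is the whole point of the duplication.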

Doubt regarding a database design

I have a doubt regarding a database design, suppose a finance/stock software
in the software, the user will be able to create orders,
those orders may contain company products or third-party products
typical product table:
PRIMARY KEY INT productId
KEY INT productcatId
KEY INT supplierId
VARCHAR(20) name
TEXT description
...
but i also need some more details in the company products like:
INT instock
DATETIME laststockupdate
...
The question is, how should i store the data?
I'm thinking in 2 options:
1 -
Have both company and third-party, products in a single table,
some columns will not be used by third-party products
company products are identified by a supplier id
2 -
Have the company products and third-party products in separate tables
3 - [new, thanks RibaldEddie]
Have a single product table,
company products have additional info in a separate table
Thanks in advance!

You didn't mention anything about needing to store separate bits of Vendor information, just that a type of product has extra information. So, you could have one products table and an InHouseProductDetails table that has a productId foreign key back to the products table that stores the company specific information. Then when you run your queries you can join the products table to the details table.
The benefit is that you don't have to have NULLable columns in the products table, so your data is safer from corruption and you don't have to store the products themselves in two separate tables.
Oooo go with 3! 3 is the best!

To be honest, I think the choice of #1 or #2 is completely dependent upon some other factors (I can only think of 2 at the moment):
How much data is expected (affecting speed of queries)
Is scalability going to be a concern anywhere in the near future (I'd guess within 5 years)
If you did go with a single table for all inventory, then later decided to split them, you can. You suggested a supplier identifier of some sort. List suppliers in a table (your company included) with keys to your inventory. Then it really won't matter.
As far as UNION goes, it's been a while since I've written raw Sql - so I'm not sure if UNION is the correct syntax. However, I do know that you can pull data from multiple tables. Actually just found this: Retrieving Data from Multiple Tables with Sql Joins

I agree with RibaldEddie. Just one thing to add: put a unique constraint on that foreign key in your InHouseProductDetails table. That'll enforce a one-to-one relationship between the two tables, so you don't accidentally end up with two InHouseProductDetails records for one product (maybe from some data load gone awry or something).
Constraints are like defensive driving; they help prevent the unexpected...
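As a sketch of what that looks like in practice, here is the products/details split with the unique foreign key, using Python's built-in sqlite3 instead of the asker's database. Only the column names from the question are reused; the Products table name and the products_with_stock helper are invented for the example.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
    conn.executescript("""
        CREATE TABLE Products (
            productId INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE InHouseProductDetails (
            -- UNIQUE on the FK makes this at most one details row per product
            productId       INTEGER NOT NULL UNIQUE REFERENCES Products(productId),
            instock         INTEGER NOT NULL,
            laststockupdate TEXT NOT NULL
        );
    """)
    return conn

def products_with_stock(conn):
    """All products; instock is None for third-party products with no details row."""
    return conn.execute("""
        SELECT p.productId, p.name, d.instock
          FROM Products AS p
          LEFT JOIN InHouseProductDetails AS d ON d.productId = p.productId
         ORDER BY p.productId
    """).fetchall()
```

A second details row for the same productId is rejected with an IntegrityError, so a bad data load fails loudly instead of silently creating duplicates.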

I would advise using option #1. What happens when another supplier comes along? It's also easier to extend a single product table/product class.

Take into account the testing of your application also. Having all data in one table raises the possible requirement of testing both the 3rd Party & Company elements of your app for any change to either.
If you're happy that your unit tests would cover this off, it's not so much of a worry... if you're relying on a human tester then it becomes more of an issue when sizing the impact of changes.
Personally I'd go for the one products table with common details and separate tables for the 3rd party & Company specifics.

one table for products with a foreign key to the Vendor table; include your own company in the Vendor table
the Stock table can then be used to store information about stock levels for any product, not just yours
Note that you need the Stock table anyway, this just makes the DB model more company-agnostic - so if you ever need to store stock level information about third-party products, there's no DB change required
