I am developing a database system for my employer and part of this involves creating invoices. I've been thinking about the auto-increment ids on my tables, and to what extent I need to make allowances for growth of the business. I am utilising InnoDB because the system will be very comprehensive, and many records will get updated.
Simplified, here is what I have currently:
Office (An office/store of the business. Currently 2.)
office_id (PK) INT, AI, UN
Invoice
invoice_no (PK) INT, AI, UN
office_id (FK) (Where the invoice originated from.)
Products
product_id (PK) INT, AI, UN
InvoiceLine (Ties products to an invoice to make the lines.)
invoice_line_id (PK) INT, AI, UN
invoice_no (FK)
product_id (FK)
quantity
Firstly, while I'll probably never run out of invoice numbers, I wonder if there may be a better way to approach this, just in case the business does have an unanticipated expansion of offices and increase in sales. How would a large company with, say, 50+ stores tackle this? Would each store likely have its own set of invoice numbers starting from 1?
This is what I've considered...
Option 1 - Should I make the invoice_no bigger than the standard 10 precision? Regardless of difficulty, could this be changed after deployment if we saw the current limit would be insufficient, or is this impossible/highly problematic?
Option 2 - Pardon my ignorance but is it possible/wise to have a database made up of tables with different engine types? It is my understanding that with MyISAM, the invoice table could have a composite key of office_id and invoice_no, where the auto-incrementing number would increase separately for each office. Is this true and viable?
Option 3 - Could I have new tables created upon the insert of new office? Create table InvoiceX & InvoiceXLine, where X is the office_id?
Is there a better, simpler method that I'm just not thinking of?
Secondly, if the business expands and we were averaging 30+ lines per invoice, it is conceivable that the invoice_line_ids would run out in the long term. So I probably need a similar solution for this, except Option 3 above (creating an InvoiceLineX table for every invoice_no) would be completely impractical in this case.
Could I simply make the primary key for the InvoiceLine table a composite of invoice_no and product_id?
It's kind of a business question. Until you know how they intend to send invoices, why would you guess? That said, if I had to keep an eye on the future I'd keep a few separate IDs:
A master, magic number that is just the sequential unique ID that's as big as you need (maybe an INT, maybe bigger depending on your business size),
an "invoice originator" column being the store (or whatever) that generated it,
another column for "invoice processing entity ID" being the store/accounts office that issued/needs to deal with it throughout its lifecycle.
That gives you more flexibility if you have, say, the largest store in a state processing all invoices for that state. Of course this is guesswork!
The point of all this is that you've collected a lot of data that will likely be useful in its own right and then your actual invoice number will be some combination of those things.
Use your imagination (or business analyst) about what else you might want to keep & use.
Can't help you with the DB types.
Do not have one table per location/invoice line. That would suck big time.
A side note - you will always get gaps in your IDs. They are unavoidable, so try not to get distracted by that or insist on gap-free numbering. You can't get that with any level of performance and you probably don't even need it.
If you think you might need it gap-free or broken down by location, put in a batch job that allocates an office/store/whatever specific number at the end of each day. That way you can allocate some nice numbers as you see fit, using the basic sequence from the underlying IDs.
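A minimal sketch of such a batch job, using Python's built-in sqlite3 purely for illustration. The `invoice` table, the `office_invoice_no` column and the `allocate_office_numbers` function are hypothetical names, not part of the original schema.

```python
import sqlite3

def allocate_office_numbers(conn):
    """Give every not-yet-numbered invoice the next number for its office."""
    with conn:  # one transaction: commit on success, roll back on error
        pending = conn.execute(
            "SELECT invoice_id, office_id FROM invoice "
            "WHERE office_invoice_no IS NULL ORDER BY invoice_id"
        ).fetchall()
        next_no = {}  # office_id -> next number to hand out
        for invoice_id, office_id in pending:
            if office_id not in next_no:
                (highest,) = conn.execute(
                    "SELECT COALESCE(MAX(office_invoice_no), 0) FROM invoice "
                    "WHERE office_id = ?", (office_id,)
                ).fetchone()
                next_no[office_id] = highest + 1
            conn.execute(
                "UPDATE invoice SET office_invoice_no = ? WHERE invoice_id = ?",
                (next_no[office_id], invoice_id),
            )
            next_no[office_id] += 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_id INTEGER PRIMARY KEY,"   # underlying ID; gaps are fine here
    " office_id INTEGER NOT NULL,"
    " office_invoice_no INTEGER,"        # NULL until the batch job runs
    " UNIQUE (office_id, office_invoice_no))"
)
# The underlying IDs deliberately contain gaps (1, 2, 5, 9).
conn.executemany(
    "INSERT INTO invoice (invoice_id, office_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (5, 1), (9, 1)],
)
allocate_office_numbers(conn)
numbered = conn.execute(
    "SELECT invoice_id, office_id, office_invoice_no FROM invoice ORDER BY invoice_id"
).fetchall()
```

Each office ends up with its own 1, 2, 3... series in `office_invoice_no`, while `invoice_id` stays the key that foreign keys point at.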
I think the short answer is to go with more or less what you have unless it proves to be wrong. All your suggestions are to do with problems you either don't know the answer to or that won't happen.
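To put a rough number on "won't happen", here is a back-of-the-envelope calculation. The only hard fact in it is the INT UNSIGNED ceiling of 2^32 - 1; the office count and daily volumes are made-up assumptions, so substitute your own.

```python
INT_UNSIGNED_MAX = 2**32 - 1  # 4,294,967,295, the largest MySQL INT UNSIGNED value

def years_until_exhausted(rows_per_day, max_id=INT_UNSIGNED_MAX):
    """Years of inserts at a constant daily rate before the ID range runs out."""
    return max_id / (rows_per_day * 365)

# Hypothetical volumes: 50 offices, 200 invoices per office per day,
# 30 lines per invoice.
invoices_per_day = 50 * 200
lines_per_day = invoices_per_day * 30

invoice_years = years_until_exhausted(invoices_per_day)
line_years = years_until_exhausted(lines_per_day)
```

Under those assumptions the invoice numbers last well over a thousand years and the invoice line IDs roughly 39 years, so only the line table is even worth a second look.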
Related
I am upgrading a system for a client which was developed by myself around 10 years ago.
It is a standard (if there can be such a thing, of course) sales / inventory / accounting system.
One of the additions they have asked me about was the ability to create draft orders. As the company has grown, so have the sizes of the orders. They want the ability to begin entering an order for a client and have the option of saving and coming back to it later.
My initial thoughts would be to have an orders table which includes drafts and a field which signified the status (draft / posted). This would prevent duplicating data across an Orders table and a DraftOrders table.
This seems correct to me but of course the OrderId field (auto-increment int) would no longer be a solid identifier for the Order (since a lot of the numbers in between orders may be missing).
The client would ideally like to keep the OrderId as an identifier so is there any solution which would enable this, rather than creating a draft order table?
Many thanks in advance for your help.
Kind regards
If you need to ensure that the identifier has no gaps for taxation purposes, you cannot use the PK in the first place, because the auto-increment sequence may have gaps too. For example, if an INSERT fails due to a constraint violation, you lose the reserved sequence number.
If you do not want to create a separate table, I suggest adding a new column to store the tax order ID. It will remain NULL for drafts and will be filled in programmatically when the order is placed. In the UI you will show this new column and will possibly allow searching on it (hint: a good candidate for an index), yet internally you will still use the same FKs as before (for both orders and drafts).
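A minimal sketch of that extra column, using Python's built-in sqlite3 for illustration only. The names `tax_order_no`, `status` and `post_order` are hypothetical; the UNIQUE constraint also gives the column an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"          # internal key, used by all FKs
    " status TEXT NOT NULL DEFAULT 'draft',"
    " tax_order_no INTEGER UNIQUE)"           # stays NULL while the order is a draft
)

def post_order(conn, order_id):
    """Mark a draft as posted and give it the next tax_order_no."""
    with conn:  # commit on success, roll back if anything raises
        (highest,) = conn.execute(
            "SELECT COALESCE(MAX(tax_order_no), 0) FROM orders"
        ).fetchone()
        cur = conn.execute(
            "UPDATE orders SET status = 'posted', tax_order_no = ? "
            "WHERE order_id = ? AND status = 'draft'",
            (highest + 1, order_id),
        )
        if cur.rowcount != 1:
            raise ValueError(f"order {order_id} is not a draft")
    return highest + 1

# Three drafts are saved; the third is posted first, then the first.
drafts = [conn.execute("INSERT INTO orders DEFAULT VALUES").lastrowid for _ in range(3)]
first_no = post_order(conn, drafts[2])
second_no = post_order(conn, drafts[0])
rows = conn.execute(
    "SELECT order_id, status, tax_order_no FROM orders ORDER BY order_id"
).fetchall()
```

The posted orders get consecutive numbers in posting order, regardless of which `order_id` values the drafts consumed. On a multi-user server, reading the current maximum and writing the update would have to be serialized (for example with a lock), which this single-connection sketch does not show.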
I'm about to design a Hotel Booking system.
Each Hotel has some RoomTypes assigned.
RoomType: id | name | hotel_id
Each Hotel offers some quantities of RoomTypes for specific period (given date_from and date_to).
Also, each Client is able to make a Reservation of some quantities of some RoomTypes for a specified period (date_from and date_to).
I need to be able to find & display available Offers for given Hotel, to know number of free (Offered - Booked) rooms of each RoomType for each day, query against minimum number of free rooms etc.
I'd like to ask for advice, how should I keep the data. What solution is optimal? I know some queries (like, display number of free rooms of given type for each day in given range) can't be achieved with simple SQL query unless I use stored procedures. However I'd like to make it as fast and easy to implement as possible.
so far I've considered:
keep RoomOffer: hotel_id | date_from | date_to | quantity | room_type_id and the same with Reservation
have RoomOffer: hotel_id | date | quantity | room_type_id and the same with Reservation - i.e. when creating RoomOffer / Reservation, create single record for every day in given range.
Any advice?
I assume that RoomType refers to a single room, and also that the primary key for each room is the tuple (hotel_id, room_type_id), since you use both fields for RoomOffer.
However, I do not recommend the RoomOffer and Reservation approach. First of all, you would be storing a lot of redundant information: when a room is not yet booked, you need a room offer to say that it is available (or even worse, plenty of them, because you divide it by time ranges), something that you already know.
Instead of that, I'd suggest a design more similar to this one:
From your question I know you are concerned about the performance of the system, but this kind of optimization in the design phase is usually not a good idea. It tends to lead to a lot of redundant data and coupled classes. However, you can improve the performance of the DB queries using indices, or even with NoSQL approaches. That is something you can evaluate better in a later phase of the project.
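On the question's "free rooms per day" requirement: whichever table layout is chosen, per-day availability can be derived from range records in application code. The sketch below assumes the (date_from, date_to, quantity) shape from the question, a single room type, and a `date_to` that is exclusive (the checkout day); all function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def days(date_from, date_to):
    """Yield each day in [date_from, date_to); date_to is treated as exclusive."""
    d = date_from
    while d < date_to:
        yield d
        d += timedelta(days=1)

def free_rooms_per_day(offers, reservations):
    """offers and reservations are iterables of (date_from, date_to, quantity)."""
    free = Counter()
    for start, end, qty in offers:
        for d in days(start, end):
            free[d] += qty          # rooms offered on that day
    for start, end, qty in reservations:
        for d in days(start, end):
            free[d] -= qty          # rooms already booked on that day
    return dict(sorted(free.items()))

offers = [(date(2024, 5, 1), date(2024, 5, 4), 10)]
reservations = [
    (date(2024, 5, 2), date(2024, 5, 4), 3),
    (date(2024, 5, 3), date(2024, 5, 4), 2),
]
free = free_rooms_per_day(offers, reservations)
min_free = min(free.values())  # the "minimum number of free rooms" query
```

With one row per range, the expansion to days happens at query time; the one-row-per-day alternative from the question moves the same expansion to insert time.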
I'm working on a database to hold information for an on-call schedule. Currently I have a structure that looks about like this:
Table - Person: (key)ID, LName, FName, Phone, Email
Table - PersonTeam: (from Person)ID, (from Team)ID
Table - Team: (key)ID, TeamName
Table - Calendar: (key dateTime)dt, year, month, day, etc...
Table - Schedule: (from Calendar)dt, (id of Person)OnCall_NY, (id of Person)OnCall_MA, (id of Person)OnCall_CA
My question is: With the Schedule table, should I leave it structured as is, where the dt is a unique key, or should I rearrange it so that dt is non-unique and the table looks like this:
Table - Schedule: (from Calendar)dt, (from Team)ID, (from Person)ID
and have multiple entries for each day, OR would it make sense to just use:
Table - Schedule: (from Calendar)dt, (from PersonTeam)KeyID - [make a key ID on each of the person/team pairings]
A team will always have someone on call, but a person can be on call for more than one team at a time (if they are on multiple teams).
If a completely different setup would work better let me know too!
Thanks for any help! I apologize if my question is unclear. I'm learning fast but nevertheless still fairly new to using SQL daily, so I want to make sure I'm using best practices when I learn so I don't develop bad habits.
The current version, one column per team, is probably not a good idea. Since you're representing teams as a table (and not as an enum or equivalent), it means you expect to add/remove teams over time. That would force you to add/remove columns to the table, which is always a much larger task than adding/removing a few rows.
The 2nd option is the usual solution to a problem like this. A safe choice. You can always define an additional foreign key constraint from Schedule(teamID, personID) to PersonTeam to ensure you don't mistakenly assign schedule duty to a person not belonging to the team.
The 3rd option is pretty much equivalent to the 2nd, only you're swapping a composite natural key for PersonTeam for a surrogate simple key. Since the two components of said composite key are already surrogate, there is no advantage (in terms of immutability, etc.) to adding this additional one. Plus it would turn a very simple N-M relationship (PersonTeam) which most DB managers / ORMs will handle nicely into a more complex object which will need management on its own.
By Occam's razor, I'd do away with the additional surrogate key and use your 2nd option.
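A sketch of the 2nd option with the composite foreign key described above, using Python's built-in sqlite3 for illustration. Column names such as `PersonID` and `TeamID`, the sample rows, and the `try_assign` helper are assumptions; the Calendar table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Person (ID INTEGER PRIMARY KEY, LName TEXT, FName TEXT);
CREATE TABLE Team (ID INTEGER PRIMARY KEY, TeamName TEXT);
CREATE TABLE PersonTeam (
    PersonID INTEGER NOT NULL REFERENCES Person(ID),
    TeamID INTEGER NOT NULL REFERENCES Team(ID),
    PRIMARY KEY (TeamID, PersonID)
);
CREATE TABLE Schedule (
    dt TEXT NOT NULL,
    TeamID INTEGER NOT NULL,
    PersonID INTEGER NOT NULL,
    PRIMARY KEY (dt, TeamID),
    FOREIGN KEY (TeamID, PersonID) REFERENCES PersonTeam (TeamID, PersonID)
);
INSERT INTO Person VALUES (1, 'Lee', 'Ann'), (2, 'Diaz', 'Sam');
INSERT INTO Team VALUES (1, 'NY'), (2, 'MA');
INSERT INTO PersonTeam (PersonID, TeamID) VALUES (1, 1), (1, 2), (2, 2);
""")

def try_assign(dt, team_id, person_id):
    """Return True if the on-call row was accepted, False if a constraint refused it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Schedule (dt, TeamID, PersonID) VALUES (?, ?, ?)",
                (dt, team_id, person_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ny_ok = try_assign("2024-05-01", 1, 1)        # person 1 is on team NY
ma_ok = try_assign("2024-05-01", 2, 1)        # same person, second team, same day
not_member = try_assign("2024-05-02", 1, 2)   # person 2 is not on team NY
```

The same person can cover two teams on one day, while an assignment to a team the person does not belong to is refused by the database itself.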
In my view, the answer may depend on whether the number of teams is fixed and fairly small. Of course, whether the names of the teams are fixed or not, may also matter, but that would probably have more to do with column naming.
More specifically, my view is this:
If the business requirement is to always have a small and fixed number of people (say, three) on call, then it may well be more convenient to allocate three columns in Schedule, one for every team to hold the ID of the appointed person, i.e. like your current structure:
dt OnCall_NY OnCall_MA OnCall_CA
--- --------- --------- ---------
with dt as the primary key.
If the number of teams (in the Team table) is fixed too, you could include teams' names/designators in the column names like you are doing now, but if the number of teams is more than three and it's just the number of teams in Schedule that is limited to three, then you could just use names like OnCallID1, OnCallID2, OnCallID3.
But even if that requirement is fixed, it may only turn out fixed today, and tomorrow your boss says, "We no longer work with a fixed number of teams (on call)", or "We need to extend the number of teams supported to four, and we may need to extend it further in the future". So, a more universal approach would be the one you are considering switching to in your question, that is
dt Team Person
--- ---- ------
where the primary key would now be dt, Team.
That way you could easily extend/reduce the number of people on call on the database level without having to change anything in the schema.
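To illustrate the "no schema change" point, here is a small sketch in Python's built-in sqlite3. The team 'TX', the person IDs and the `on_call_wide` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Team (ID INTEGER PRIMARY KEY, TeamName TEXT NOT NULL);
CREATE TABLE Schedule (
    dt TEXT NOT NULL,
    Team INTEGER NOT NULL REFERENCES Team(ID),
    Person INTEGER NOT NULL,
    PRIMARY KEY (dt, Team)
);
INSERT INTO Team (ID, TeamName) VALUES (1, 'NY'), (2, 'MA'), (3, 'CA');
INSERT INTO Schedule VALUES
    ('2024-05-01', 1, 10), ('2024-05-01', 2, 11), ('2024-05-01', 3, 10);
""")

# A fourth team is two new rows, not a new column.
conn.execute("INSERT INTO Team (ID, TeamName) VALUES (4, 'TX')")
conn.execute("INSERT INTO Schedule VALUES ('2024-05-01', 4, 12)")

def on_call_wide(conn, dt):
    """Pivot one day's rows back into the wide {team name: person} shape."""
    rows = conn.execute(
        "SELECT t.TeamName, s.Person FROM Schedule s "
        "JOIN Team t ON t.ID = s.Team WHERE s.dt = ? ORDER BY t.ID",
        (dt,),
    )
    return dict(rows)

wide = on_call_wide(conn, "2024-05-01")

# The (dt, Team) key still refuses a second person for the same team and day.
try:
    conn.execute("INSERT INTO Schedule VALUES ('2024-05-01', 4, 13)")
    second_person_allowed = True
except sqlite3.IntegrityError:
    second_person_allowed = False
```

If a report still needs the one-column-per-team layout, it can be produced at query time as `on_call_wide` does, rather than being baked into the table.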
UPDATE
I forgot to address your third option in my original answer (sorry). Here goes.
Your first option (the one actually implemented at the moment) seems to imply that every team can be represented by (no more than) one person only. If you assign surrogate IDs to the Person/Team pairs and use those keys in Schedule instead of separate IDs for Person and Team, you will probably be unable to enforce the mentioned "one person per team in Schedule" requirement at the database level (or, at least, that might prove somewhat troublesome). Using separate keys, it would be enough to make Team part of a composite key (dt, Team) and you are done: no more than one person per team per day.
Also, you may have difficulties letting a person change the team over time if their presence in the team was fixated in this way, i.e. with a Schedule reference to the Person/Team pair. You would probably have to change the Team reference in the PersonTeam table, which would result in misrepresentation of historical info: when looking at the people on call back on certain day, the person's Team shown would be the one they belong to now, not the one they did then.
Using separate IDs for people and teams in Schedule, on the other hand, would allow you to let people change teams freely, provided you do not make (Schedule.Team, Schedule.Person) a reference to (PersonTeam.Team, PersonTeam.Person), of course.
I'm building a web application for a printing company and am trying to determine the best design for my database.
I have an orders table. Each order has many proofs. However, there are two different kinds of proofs: electronic proofs and physical proofs. Furthermore, if it's an electronic proof, I need to store the image source, and if it's a physical proof, I need to store a tracking number.
Every proof has many comments associated with it.
It would be nice if I could just have one proofs table and one comments table, but what is the best way to keep track of the image source and tracking number based on the proof type?
I've considered creating separate tables for electronic_proofs and physical_proofs, but that would require making a separate electronic_proof_comments table and physical_proof_comments table.
The other option would be to have a single proofs table, in which case I could then have a single comments table. However, as I see it, that would require three support tables, proof_types, image_sources, and tracking_numbers.
Any thoughts on the best way, or an even better way of solving this problem?
As mentioned in another answer, you only need one table and would simply leave one of the fields blank, depending on the type. However, the difference in my answer would be to relate the order details to the proof rather than the proof to the order details (or order item in their example). Doing it this way allows you to have multiple proofs per order detail.
This is how I would set it up if I were you:
ORDERS
OrderID (PK)
CustomerID (FK)
Etc...
ORDERDETAILS
OrderDetailsID (PK)
OrderID (FK)
ProductID? (FK)
Etc...
PROOFS
ProofID (PK)
OrderDetailsID (FK)
ProofType
ImagePath (just path, not image)
TrackingNumber
Etc...
COMMENTS
CommentID (PK)
ProofID (FK)
Comment
Etc...
It would probably also be wise to break ProofType into its own table, but it will work without. If you were to do that, you'd create a ProofType table and change the "ProofType" field in the "Proofs" table to a foreign key referencing the "ProofTypeID" field in the new "ProofType" table.
Hope this helps! Good Luck!
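A sketch of the single PROOFS table in Python's built-in sqlite3. The CHECK constraints are an addition that the answer above does not mention: they are one possible way to make sure the field that is "left blank" really matches the proof type. The type values 'electronic' and 'physical' and the `add_proof` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Proofs (
    ProofID INTEGER PRIMARY KEY,
    OrderDetailsID INTEGER NOT NULL,
    ProofType TEXT NOT NULL CHECK (ProofType IN ('electronic', 'physical')),
    ImagePath TEXT,        -- just the path, not the image
    TrackingNumber TEXT,
    CHECK (
        (ProofType = 'electronic' AND ImagePath IS NOT NULL AND TrackingNumber IS NULL)
     OR (ProofType = 'physical' AND TrackingNumber IS NOT NULL AND ImagePath IS NULL)
    )
);
CREATE TABLE Comments (
    CommentID INTEGER PRIMARY KEY,
    ProofID INTEGER NOT NULL REFERENCES Proofs(ProofID),
    Comment TEXT NOT NULL
);
""")

def add_proof(order_details_id, proof_type, image_path=None, tracking_number=None):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Proofs (OrderDetailsID, ProofType, ImagePath, TrackingNumber) "
                "VALUES (?, ?, ?, ?)",
                (order_details_id, proof_type, image_path, tracking_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False

electronic_ok = add_proof(1, "electronic", image_path="proofs/1.png")
physical_ok = add_proof(1, "physical", tracking_number="TRACK-001")
mismatch_ok = add_proof(1, "physical", image_path="proofs/2.png")  # wrong field filled
```

Both kinds of proof share one table and one Comments table, and a row whose populated field does not match its type is refused.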
I agree with @Ricketts (+1). The tricky part is whether to have the columns ImagePath and TrackingNumber in the Proof table, or to normalize them out into separate tables. (Normalize, because they don't depend on the primary key alone; they depend on the primary key plus the proof type column.) If these are the only two columns that are proof-type-specific, then you're probably OK storing them in the single table... but that ImagePath makes me nervous, particularly if it's not an image but an actual sizable chunk of binary data. It might make sense for a number of reasons to store that data separately in its own table, but not move TrackingNumber out as well.
Other considerations: what's the ratio between proof types? How is performance likely to work (particularly if there's a BLOB involved in your data requests?) You have to weigh and perhaps test these considerations before making your final decision.
I would expect that proof type is just an attribute of the item, as is the tracking number and image source, correct?
It's similar to a situation where some items may come in sizes, but some don't - you just don't populate the attributes that don't matter for that specific type of item.
Also, note that with two different "proof" tables, your order line item table now has to deal with two different entities.
Something like this should be doable for basic use...
ORDERS
Order ID (PK)
ORDER ITEM
Order Item ID (PK)
Order ID (FK)
Proof ID (FK)
PROOF
Proof ID (PK)
Proof Name
Proof Type
Tracking Number
Image Source
COMMENTS
Comment ID (PK)
Proof ID (FK)
Comment Text
You can create lookup tables for proof type, tracking number, and image source if necessary. It really depends on how far you want to go to match reality to relational theory.
I have a question about database design. Suppose a finance/stock application.
In the software, the user will be able to create orders,
and those orders may contain company products or third-party products.
typical product table:
PRIMARY KEY INT productId
KEY INT productcatId
KEY INT supplierId
VARCHAR(20) name
TEXT description
...
But I also need some more details for the company products, like:
INT instock
DATETIME laststockupdate
...
The question is, how should I store the data?
I'm thinking of 2 options:
1 -
Have both company and third-party, products in a single table,
some columns will not be used by third-party products
the company products are identified by a supplier id
2 -
Have the company products and third-party in separated tables
3 - [new, thanks RibaldEddie]
Have a single product table,
company products have additional info in a separated table
Thanks in advance!
You didn't mention anything about needing to store separate bits of Vendor information, just that a type of product has extra information. So, you could have one products table and an InHouseProductDetails table that has a productId foreign key back to the products table that stores the company specific information. Then when you run your queries you can join the products table to the details table.
The benefit is that you don't have to have NULLable columns in the products table, so your data is safer from corruption and you don't have to store the products themselves in two separate tables.
Oooo go with 3! 3 is the best!
To be honest, I think the choice of #1 or #2 is completely dependent upon some other factors (I can only think of 2 at the moment):
How much data is expected (affecting speed of queries)
Is scalability going to be a concern anywhere in the near future (I'd guess within 5 years)
If you did go with a single table for all inventory, then later decided to split them, you can. You suggested a supplier identifier of some sort. List suppliers in a table (your company included) with keys to your inventory. Then it really won't matter.
As far as UNION goes, it's been a while since I've written raw Sql - so I'm not sure if UNION is the correct syntax. However, I do know that you can pull data from multiple tables. Actually just found this: Retrieving Data from Multiple Tables with Sql Joins
I agree with RibaldEddie. Just one thing to add: put a unique constraint on that foreign key in your InHouseProductDetails table. That'll enforce a one-to-one relationship between the two tables, so you don't accidentally end up with two InHouseProductDetails records for one product (maybe from some data load gone awry or something).
Constraints are like defensive driving; they help prevent the unexpected...
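A sketch of the products / InHouseProductDetails split with the unique foreign key, in Python's built-in sqlite3. Column names follow the question; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE products (
    productId INTEGER PRIMARY KEY,
    supplierId INTEGER NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE InHouseProductDetails (
    -- UNIQUE on the foreign key makes the relationship one-to-one
    productId INTEGER NOT NULL UNIQUE REFERENCES products(productId),
    instock INTEGER NOT NULL,
    laststockupdate TEXT NOT NULL
);
INSERT INTO products VALUES (1, 100, 'own widget'), (2, 200, 'third-party gadget');
INSERT INTO InHouseProductDetails VALUES (1, 25, '2024-05-01 09:00:00');
""")

# All products, with stock details where they exist.
rows = conn.execute(
    "SELECT p.name, d.instock FROM products p "
    "LEFT JOIN InHouseProductDetails d ON d.productId = p.productId "
    "ORDER BY p.productId"
).fetchall()

# A second details row for the same product is refused.
try:
    conn.execute("INSERT INTO InHouseProductDetails VALUES (1, 30, '2024-05-02 09:00:00')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

The LEFT JOIN returns every product and leaves `instock` empty for third-party items, so the products table itself needs no nullable stock columns.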
I would advise using option #1. What happens when another supplier comes along? It's also easier to extend one product table/product class.
Take into account the testing of your application also. Having all data in one table raises the possible requirement of testing both the 3rd Party & Company elements of your app for any change to either.
If you're happy that your unit tests would cover this off, it's not so much of a worry... if you're relying on a human tester, then it becomes more of an issue when sizing the impact of changes.
Personally I'd go for the one products table with common details and separate tables for the 3rd party & Company specifics.
one table for products with a foreign key to the Vendor table; include your own company in the Vendor table
the Stock table can then be used to store information about stock levels for any product, not just yours
Note that you need the Stock table anyway; this just makes the DB model more company-agnostic - so if you ever need to store stock level information about third-party products, there's no DB change required
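A sketch of the Vendor / Product / Stock layout in Python's built-in sqlite3. Table and column names beyond those in the question, the vendor names and the `stock_by_vendor` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendor (vendorId INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Product (
    productId INTEGER PRIMARY KEY,
    vendorId INTEGER NOT NULL REFERENCES Vendor(vendorId),
    name TEXT NOT NULL
);
CREATE TABLE Stock (
    productId INTEGER PRIMARY KEY REFERENCES Product(productId),
    instock INTEGER NOT NULL,
    laststockupdate TEXT NOT NULL
);
-- Your own company is just another vendor row.
INSERT INTO Vendor VALUES (1, 'Our Company'), (2, 'Acme Supplies');
INSERT INTO Product VALUES (1, 1, 'own widget'), (2, 2, 'third-party gadget');
INSERT INTO Stock VALUES (1, 25, '2024-05-01 09:00:00');
""")

def stock_by_vendor(conn):
    """List (vendor name, product name, instock) for every product with a stock row."""
    return conn.execute(
        "SELECT v.name, p.name, s.instock FROM Stock s "
        "JOIN Product p ON p.productId = s.productId "
        "JOIN Vendor v ON v.vendorId = p.vendorId ORDER BY p.productId"
    ).fetchall()

before = stock_by_vendor(conn)
# Starting to track a third-party product later is one more row, no schema change.
conn.execute("INSERT INTO Stock VALUES (2, 4, '2024-05-02 09:00:00')")
after = stock_by_vendor(conn)
```

Stock tracking starts out covering only the company's own products and extends to third-party products by adding rows.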