database design - when to split tables?


Sometimes creating a separate table would produce much more work. Should I split it anyway?
For example, in my project I have a table of customers. Each customer has his own special price for each product (there are only 5 products and no more are planned in the future), and each customer also has specific days of the week on which the company delivers the products to him.
Many operations, like changing the days/prices for a customer or displaying the days and prices of all customers, would be much easier if the days and product prices were columns in the customers table rather than separate tables. So is it wrong to create just one big customers table in such a case? What are the drawbacks?
UPDATE: They just informed me that after a year or so there's a chance they will add more products, though they say their business won't exceed 20-30 products in any event.
I still can't understand why, in such a case, when product prices have no relation to each other (each customer has his own special price), adding rows to a Products table is better than adding columns to the Customers table.
The only benefit I can think of is that a customer who has only 5 products won't have to 'carry' 20 nullable product columns (saving space on the server). I don't have much experience, so maybe I'm missing the obvious?

Clearly, just saying that one should always normalize is not pragmatic. No advice is always true.
If you can say with certainty that 5 "items" will be enough for a long time I think it is perfectly fine to just store them as columns if it saves you work.
If your prediction fails and a sixth item needs to be stored, you can add a new column. As long as the number of columns is very unlikely to get out of hand, this should not be a problem.
Just be careful with such tactics, as the ability of many programmers to predict the future turns out to be very limited.
In the end only one thing counts: Delivering the requested solution at the lowest cost. Purity of code is not a goal.

Normalization is all about data integrity (consistency), nothing else; it is not about hard, easy, fast, slow, efficient or other murky attributes. The current design almost certainly allows for data anomalies. If not right now, then the moment you try to track price changes, invoices, orders, etc., it becomes a dead end.
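To make the normalization point concrete, here is a minimal sketch using Python's built-in sqlite3 module (the question doesn't name a DBMS, so SQLite is an assumption, as are all table and column names). Prices and delivery days become rows keyed by customer, so a sixth (or thirtieth) product is an ordinary INSERT, and the primary keys stop a customer from getting two prices for the same product.

```python
import sqlite3

# Hypothetical normalized schema: prices and delivery days are rows, not columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One price per (customer, product); the primary key forbids duplicates.
CREATE TABLE customer_price (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    price       REAL    NOT NULL,
    PRIMARY KEY (customer_id, product_id)
);
-- One row per delivery weekday (assumed convention: 0 = Monday, 6 = Sunday).
CREATE TABLE delivery_day (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    weekday     INTEGER NOT NULL CHECK (weekday BETWEEN 0 AND 6),
    PRIMARY KEY (customer_id, weekday)
);
""")

conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(i, f"Product {i}") for i in range(1, 6)])
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.executemany("INSERT INTO customer_price VALUES (1, ?, ?)", [(1, 9.5), (2, 12.0)])
conn.executemany("INSERT INTO delivery_day VALUES (1, ?)", [(0,), (3,)])

# A sixth product is an INSERT; the customer table's shape never changes.
conn.execute("INSERT INTO product VALUES (6, 'Product 6')")
conn.execute("INSERT INTO customer_price VALUES (1, 6, 20.0)")

# Listing a customer's prices is still a single query.
prices = conn.execute("""
    SELECT c.name, p.name, cp.price
    FROM customer_price AS cp
    JOIN customer AS c ON c.customer_id = cp.customer_id
    JOIN product  AS p ON p.product_id  = cp.product_id
    WHERE cp.customer_id = 1
    ORDER BY cp.product_id
""").fetchall()
```

Displaying "one row per customer with all prices" then becomes a pivot done in the query or the UI, rather than something baked into the table shape.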

Related

Keeping track of total inventory, sales, etc; stored procedure or something like elastic search

I'm creating an inventory management and sales tool for an e-commerce site. I'm somewhat new to programming, and I'm curious what the best way to keep track of totals is. For example, this company sells roughly 200 products a day, and I would like to keep track of the total amount sold in dollars and units, and eventually graph this data. I would like to be able to graph a month's worth of these numbers (May 14: 145 units sold, $14,545, $2,000 profit; May 15: etc.). What is the best way of doing this?
I thought about creating a totals table where every new order adds its value to the running total, but this seems like it could get murky quickly if an order doesn't get logged.
Selecting all the orders and adding up the totals for each day of the month seems like it would be bad performance-wise.
What options do I have and what do you recommend as the best solution?

I recommend against creating a totals table. While building a report that summarizes the totals from the transactional data may seem likely to cause a performance problem, in practice it might not be nearly as bad as you think. Two hundred orders per day over thirty days really isn't that many records for most modern relational database systems.
If you did run into a significant performance issue with this one report, one thing you could do is run the report during any off-hours the business may have and then cache the results of that run in a table for use when someone wants to view the report. However, before going to that trouble, I recommend simply trying the approach above and seeing whether performance is really an issue.

Quickly compute millions of values for a search

Let's say I have a database of millions of widgets with a price attribute. Widgets belong to suppliers, and I sell widgets to customers by first buying them from suppliers and then selling them to the customer. With this basic setup, if a customer asks me for every widget less than $50, it's trivial to list them.
However, I mark up the price of widgets from individual suppliers differently. So I may mark up widgets from Supplier A by 10%, and I may mark up widgets from Supplier B by a flat rate of $5. In a database, these markups would be stored in a join table with my ID, the supplier ID, a markup type (flat, percentage), and a markup rate. On top of this, suppliers may add their own markups when they sell to me (these markups would be in the same join table with the supplier's ID, my ID, and a markup type/rate).
So if I want to sell a $45 widget from Supplier A, it might get marked up by the supplier's 10% markup (to $49.50), and then my own $10 flat markup (to $59.50). This widget would not show up in the client's search for widgets costing less than $50. However, it's possible that an $80 widget could get marked down to $45 by the time it reaches the client, and should be returned in results. These markups are subject to change, and let's assume I'm one of hundreds of people in this system selling widgets to customers through suppliers, all with their own markup relationships in that markup table.
Is there any precedent for performing calculations like this quickly across millions of objects? I realize this is a huge, non-trivial problem, but I'm curious how one would start addressing a problem like this.

Add columns to your database and store the computed results, updating them whenever the related records change. You cannot calculate these values on the fly for millions of records.
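A minimal sketch of precomputing, assuming markups are applied in chain order (supplier first, then reseller) and that negative rates can model markdowns; the widget_price table, function names and data are hypothetical. When a markup row changes, only the widgets it touches need their stored final_price recomputed, and the customer's "under $50" search becomes an indexed range query.

```python
import sqlite3

def apply_markup(price, kind, rate):
    """Apply one markup row: 'percentage' (rate 0.10 means +10%) or 'flat' (rate in dollars)."""
    if kind == "percentage":
        return price * (1 + rate)
    if kind == "flat":
        return price + rate
    raise ValueError(f"unknown markup type: {kind!r}")

def final_price(base_price, markups):
    """Apply (kind, rate) markups in chain order and round to cents."""
    price = base_price
    for kind, rate in markups:
        price = apply_markup(price, kind, rate)
    return round(price, 2)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE widget_price (
    widget_id   INTEGER NOT NULL,
    seller_id   INTEGER NOT NULL,
    final_price REAL    NOT NULL,   -- precomputed; refreshed when a markup row changes
    PRIMARY KEY (widget_id, seller_id))""")
# The search becomes an index range scan instead of millions of calculations.
conn.execute("CREATE INDEX ix_seller_price ON widget_price (seller_id, final_price)")

# Hypothetical data for seller 1: widget_id -> (base price, markup chain).
widgets = {
    1: (45.0, [("percentage", 0.10), ("flat", 10.0)]),    # 45 -> 49.50 -> 59.50
    2: (80.0, [("percentage", -0.25), ("flat", -15.0)]),  # 80 -> 60.00 -> 45.00
    3: (30.0, [("flat", 5.0)]),                           # 30 -> 35.00
}
conn.executemany("INSERT INTO widget_price VALUES (?, 1, ?)",
                 [(wid, final_price(base, chain)) for wid, (base, chain) in widgets.items()])

under_50 = conn.execute("""SELECT widget_id FROM widget_price
    WHERE seller_id = 1 AND final_price < 50 ORDER BY widget_id""").fetchall()
```

A real system would likely store money as integer cents or a decimal type rather than floats; floats are used here only to keep the sketch short.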

"Is there any precedent for performing calculations like this quickly across millions of objects?"
Standard. Seriously. Data warehousing, risk projections, stuff like that; your problem is small. Precalculate all combinations, store them in a proper higher-level database server, finished.
It is not huge, seriously. It is only huge for a small server, but once you get a calculation grid going, it is quite trivial. Millions of objects? If one machine calculates 100,000 objects a minute, 10 million objects take 100 machine-minutes. And you don't have THAT many changes.

Best way to calculate sum depending on dates with SQL

I don't know a good way to maintain sums depending on dates in a SQL database.
Take a database with two tables:
Client: clientID, name, overdueAmount
Invoice: clientID, invoiceID, amount, dueDate, paymentDate
I need to list the clients ordered by overdue amount (the sum of the client's unpaid, past-due invoices). On a big database it isn't possible to calculate this in real time.
The problem is maintaining an overdue amount field on the client: its value can change at midnight from one day to the next even if nothing changed on the client's invoices.
This sum changes when an invoice is paid, when a new invoice is created with a due date already past, when a due date that wasn't past yesterday is past today...
The only solution I've found is to recalculate this field for every client every night by summing the invoices that meet the conditions. But that's not efficient on very big databases.
I think it's a common problem, and I would like to know whether a best practice exists.

You should read about data warehousing; it will help you solve this problem. It is similar to what you described:
"The only solution I found is to recalculate every night this field on every client by summing the invoices respecting the conditions. But it's not efficient on very big databases."
But it goes further than that. When you read about it, try to forget about normalization. Its main purpose is to 'show' data, not to 'manage' data. So it will feel weird at the beginning, but once you understand why we need data warehousing, it becomes very interesting.
A classic book that makes a good start is http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247 .

Firstly, I'd like to understand what you mean by "very big databases" - most RDBMS systems running on decent hardware should be able to calculate this in real time for anything less than hundreds of millions of invoices. I speak from experience here.
Secondly, "best practice" is one of those expressions that mean very little - it's often used to present someone's opinion as being more meaningful than simply an opinion.
In my opinion, by far the best option is to calculate it on the fly.
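A sketch of the on-the-fly query, using the Client/Invoice columns from the question (the sample data and the partial index are assumptions; SQLite via Python's sqlite3). Because "overdue" is evaluated against a `today` parameter, the midnight shift the question worries about needs no maintenance at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (clientID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    invoiceID   INTEGER PRIMARY KEY,
    clientID    INTEGER REFERENCES client(clientID),
    amount      REAL,
    dueDate     TEXT,     -- ISO dates compare correctly as text
    paymentDate TEXT);    -- NULL while unpaid
-- Partial index covering only unpaid invoices, usually a small slice of the table.
CREATE INDEX ix_unpaid ON invoice (clientID, dueDate) WHERE paymentDate IS NULL;
""")
conn.executemany("INSERT INTO client VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO invoice (clientID, amount, dueDate, paymentDate) VALUES (?, ?, ?, ?)", [
        (1, 100.0, "2024-01-10", None),          # unpaid
        (1, 50.0,  "2024-01-20", "2024-01-15"),  # paid
        (2, 300.0, "2024-01-05", None),          # unpaid
        (2, 80.0,  "2024-03-01", None),          # unpaid, due later
        (3, 40.0,  "2024-03-01", None),          # unpaid, due later
    ])

def clients_by_overdue(conn, today):
    """Overdue amount = sum of unpaid invoices whose due date is before `today`."""
    return conn.execute("""
        SELECT c.clientID, c.name, COALESCE(SUM(i.amount), 0) AS overdue
        FROM client AS c
        LEFT JOIN invoice AS i
          ON i.clientID = c.clientID
         AND i.paymentDate IS NULL
         AND i.dueDate < ?
        GROUP BY c.clientID, c.name
        ORDER BY overdue DESC, c.clientID""", (today,)).fetchall()

ranking = clients_by_overdue(conn, "2024-02-01")
```

Running the same function with a later date (say "2024-03-02") picks up the invoices that became overdue overnight without any stored field being touched.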
If your database is so big that you really can't do this, I'd consider a nightly batch (as you describe). Nightly batch runs are a pain - especially for systems that need to be available 24/7, but they have the benefit of keeping all the logic in a single place.
If you want to avoid nightly batches, you can use triggers to populate an "unpaid_invoices" table. When you create a new invoice record, a trigger copies that invoice to the "unpaid_invoices" table; when you update the invoice with a payment, and the payment amount equals the outstanding amount, you delete it from the unpaid_invoices table. By definition, the unpaid_invoices table should be far smaller than the full invoice table, so calculating the outstanding amount for a given customer on the fly should be okay.
However, triggers are nasty, evil things, with exotic failure modes that can stump the unsuspecting developer, so only consider this if you have a ninja SQL developer on hand. Absolutely make sure you have a SQL query which checks the validity of your unpaid_invoices table, and ideally schedule it as a regular task.
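A minimal sketch of the trigger approach plus the validity check recommended above, in SQLite via Python's sqlite3. It simplifies "the payment amount equals the outstanding amount" to "paymentDate is set", which is an assumption; trigger names and CHECK_SQL are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    invoiceID INTEGER PRIMARY KEY, clientID INTEGER,
    amount REAL, dueDate TEXT, paymentDate TEXT);
CREATE TABLE unpaid_invoices (
    invoiceID INTEGER PRIMARY KEY, clientID INTEGER,
    amount REAL, dueDate TEXT);

-- Copy each new unpaid invoice into unpaid_invoices.
CREATE TRIGGER trg_invoice_insert AFTER INSERT ON invoice
WHEN NEW.paymentDate IS NULL
BEGIN
    INSERT INTO unpaid_invoices VALUES (NEW.invoiceID, NEW.clientID, NEW.amount, NEW.dueDate);
END;

-- Simplification: setting paymentDate is treated as "paid in full".
CREATE TRIGGER trg_invoice_paid AFTER UPDATE OF paymentDate ON invoice
WHEN NEW.paymentDate IS NOT NULL
BEGIN
    DELETE FROM unpaid_invoices WHERE invoiceID = NEW.invoiceID;
END;
""")

# Validity check: unpaid invoices missing from unpaid_invoices, plus
# unpaid_invoices rows whose invoice is gone or already paid.
CHECK_SQL = """
SELECT invoiceID FROM invoice
 WHERE paymentDate IS NULL
   AND invoiceID NOT IN (SELECT invoiceID FROM unpaid_invoices)
UNION
SELECT u.invoiceID FROM unpaid_invoices AS u
  LEFT JOIN invoice AS i ON i.invoiceID = u.invoiceID
 WHERE i.invoiceID IS NULL OR i.paymentDate IS NOT NULL
"""

conn.executemany("INSERT INTO invoice VALUES (?, ?, ?, ?, ?)",
                 [(1, 10, 100.0, "2024-01-10", None), (2, 10, 50.0, "2024-01-20", None)])
conn.execute("UPDATE invoice SET paymentDate = '2024-01-18' WHERE invoiceID = 2")

unpaid = conn.execute("SELECT invoiceID FROM unpaid_invoices").fetchall()
problems = conn.execute(CHECK_SQL).fetchall()

# Simulate drift (e.g. a careless manual fix) and confirm the check catches it.
conn.execute("DELETE FROM unpaid_invoices")
drift = conn.execute(CHECK_SQL).fetchall()
```

The sketch ignores invoices that become unpaid again (paymentDate reset to NULL); that is exactly the kind of gap the scheduled check query is there to catch.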

Should a SQL database be able to handle the creation of thousands of tables on a daily basis or do I have to change my code?

This is my first time designing tables in a SQL database, and I have no idea how much server CPU this would use or whether this is a viable approach.
I have to create a bidding site where, every time someone bids (bids have to be bought separately at 50 cents each), the final price goes up by 1, 2, or 5 cents.
The trouble I'm facing is that I have to make a database table to keep track of each item's bid history, and it seems like I have to create an individual table for each item (3 things need to be tracked apart from the item ID: bidder, bid time, and the price in cents at which the bid was made).
I'm fairly inexperienced at this and am willing to go back to the drawing board to brainstorm another table design, but I was wondering whether creating thousands of tables a day (assuming the site is somewhat successful), one for each newly listed item, is all right. I'm probably overestimating site traffic, and it might be more in the range of a few hundred tables per day, but I want to prepare for the worst.

I would go back to the drawing board. Creating new tables for what is essentially the same thing is poor design. Have you heard of the DRY (Don't Repeat Yourself) principle?

Why do you think you need one table per item?
You could design a table structure that holds your items and their bid history with 2-3 tables for all items together. Depending on the metadata, it could be useful to have another 1-2 tables, but always per "information type" (like "item history", "item metadata"), NOT per item.

Recommendations for best SQL Performance updating and/or calculating stockonhand totals

Apologies for the length of this question.
I have a section of our database design which I am worried may begin to cause problems. It is not at that stage yet, but I obviously don't want to wait until it is before resolving the issue. Before I start testing various scenarios, I would appreciate input from anyone who has experience with such a problem.
The situation is stock control and maintaining the StockOnHand value.
It would be possible to maintain a table holding the stock control figures, updated whenever an order is entered, either manually or by using a database trigger.
Alternatively you can get SQL to calculate the quantities by reading and summing the actual sales values.
The program is installed on several sites some of which are using MS-SQL 2005 and some 2008.
My problem is complicated because the same design needs to cope with several scenarios, such as:
1) Cash/Sale Point Of Sale Environment. Sale is entered and stock is reduced in one transaction. No amendments can be made to this transaction.
2) Order/Routing/Confirmation
In this environment, the order is created and can be placed on hold, released, routed, amended, delivered, and invoiced. At any stage until it is invoiced, the order can be amended. (I mention this because any database triggers may be invoked many times and have to determine whether the changes should affect the stock on hand figures.)
3) Different businesses have different ideas of when their StockOnHand should be reduced. For example, some consider the stock sold once they approve an order (as they have committed to selling the goods, so they should not be sold to anyone else). Others do not consider the stock sold until they have routed it, and some only when it has been delivered or collected.
4) There can be a large variance in the number of transactions per product. For example, one system has four or five products which are sold several thousand times per month, so asking SQL to sum those transactions means reading tens of thousands of transactions per year, whereas on the same system there are several thousand other products with fewer than a thousand sales transactions per year each.
5) Historical information is important. For that reason, our system does not delete or archive transactions and holds several years' worth of them.
6) The system must be able to warn operators if they do not have the required stock when the order is entered (which quite often happens in real time, e.g. a telephone order).
Note that this is only required for some products. (But I don't think it would be practical to sum the quantity across tens of thousands of transactions in real time.)
7) Average Cost Price. Some products can be priced based on the average cost of the items in stock. This is implemented by recalculating the average cost price for every goods-in transaction, something like newAverageCostPrice = ((oldAverageCostPrice * oldStockOnHand) + newCostValue) / newStockOnHand. This means the stock on hand must be known for every goods-in if the product uses average cost.
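The average cost formula from point 7 as a small function, assuming newCostValue is the received quantity times its unit cost and newStockOnHand is the old stock plus the received quantity (the question doesn't define them precisely). It shows why the stock on hand has to be known at every goods-in: it weights the old average.

```python
def new_average_cost(old_avg, old_soh, qty_in, unit_cost_in):
    """Weighted average cost after a goods-in.

    newAverageCostPrice = ((oldAverageCostPrice * oldStockOnHand) + newCostValue) / newStockOnHand
    Assumes newCostValue = qty_in * unit_cost_in and newStockOnHand = old_soh + qty_in.
    Returns (new average cost, new stock on hand).
    """
    new_soh = old_soh + qty_in
    if new_soh <= 0:
        raise ValueError("stock on hand must be positive after the goods-in")
    new_cost_value = qty_in * unit_cost_in
    return (old_avg * old_soh + new_cost_value) / new_soh, new_soh

# 100 units at an average of 10.00, then 50 more arrive at 12.00 each:
# (10.00 * 100 + 600.00) / 150 = 10.666...
avg, soh = new_average_cost(10.0, 100, 50, 12.0)
```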
The system is currently implemented in two parts.
We have a table which holds the StockOnHand for each product and location. Whenever a sale is updated, this table is updated via the business layer of our application (C#).
This only provides the current stock on hand figures.
If you need to run a stock valuation for a particular date, the figure is calculated by summing the quantities on the lines involved. This also requires a join between the sales line and sale header tables, as the quantity and product are stored in the line table while the date and status are held only in the header table.
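A sketch of the as-at-date valuation query described above: quantity and product on the line, date and status on the header. The table names, the signed-quantity convention (goods-in positive, sales negative) and the status values are assumptions; the status list parameter stands in for the per-business rule from point 3. SQLite via Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale_header (sale_id INTEGER PRIMARY KEY, sale_date TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE sale_line (sale_id    INTEGER NOT NULL REFERENCES sale_header(sale_id),
                        product_id INTEGER NOT NULL,
                        quantity   INTEGER NOT NULL);
-- Date/status live on the header, product/quantity on the line: index both sides of the join.
CREATE INDEX ix_header_date_status ON sale_header (sale_date, status);
CREATE INDEX ix_line_sale_product ON sale_line (sale_id, product_id);
""")

def stock_as_at(conn, as_at, counted_statuses):
    """Net quantity per product up to and including `as_at`.

    Assumes signed quantities (goods-in positive, sales negative).
    `counted_statuses` is the business's rule for which statuses affect stock.
    """
    placeholders = ", ".join("?" for _ in counted_statuses)
    sql = f"""
        SELECT l.product_id, SUM(l.quantity)
        FROM sale_line AS l
        JOIN sale_header AS h ON h.sale_id = l.sale_id
        WHERE h.sale_date <= ? AND h.status IN ({placeholders})
        GROUP BY l.product_id
        ORDER BY l.product_id"""
    return conn.execute(sql, (as_at, *counted_statuses)).fetchall()

conn.executemany("INSERT INTO sale_header VALUES (?, ?, ?)", [
    (1, "2024-01-01", "received"), (2, "2024-01-05", "delivered"),
    (3, "2024-01-06", "approved"), (4, "2024-02-01", "delivered")])
conn.executemany("INSERT INTO sale_line VALUES (?, ?, ?)", [
    (1, 10, 100), (1, 20, 40), (2, 10, -30), (3, 10, -5), (4, 20, -10)])

on_delivery = stock_as_at(conn, "2024-01-31", ("received", "delivered"))
on_approval = stock_as_at(conn, "2024-01-31", ("received", "approved", "delivered"))
```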
However, there are downsides to this method, such as:
Running the stock valuation report is slow (though not unacceptably slow), and I am not happy with it. (It works, and monitoring the server does not show it being overloaded, but it has the potential to cause problems and hence requires regular monitoring.)
The logic of the code updating the StockOnHand table is complicated.
This table is updated frequently. In a lot of cases this is unnecessary, as the information does not need to be checked. For example, if 90% of your business is selling 4 or 5 products, you don't really need a computer to tell you that you are out of stock.
Database triggers.
I have never implemented complicated triggers before, so I am wary of them.
For example, as stated before, we need configuration options to determine the conditions under which the stock figures should be updated. This is currently read once and cached in our program. Doing this inside a trigger would presumably mean reading this information every time the trigger fires. Does this have a big impact on performance?
Also, we may need a trigger on both the sale header and the sale line. (This could mean that an amendment to the sale header would be forced to read the lines and update the StockOnHand for the relevant products, and then later, when the lines are saved, another database trigger would amend the StockOnHand table again, which may be inefficient.)
Another alternative would be to update the StockOnHand table only when the transaction is invoiced (after which no further amendments can be made) and to provide a function that calculates the stock on hand from a union of this table and the un-invoiced transactions that affect stock.
Any advice would be greatly appreciated.

First of all, I would strongly recommend you add "StockOnHand", "ReservedStock" and "SoldStock" columns to your table.
A cash sale would immediately subtract the sale from "StockOnHand" and add it to "SoldStock". For an order, you would leave "StockOnHand" alone and merely add the sale to "ReservedStock"; when the stock is finally invoiced, you subtract the sale from "StockOnHand" and "ReservedStock" and add it to "SoldStock".
The business users can then choose whether StockOnHand is just that or StockOnHand - ReservedStock.
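A minimal in-memory sketch of the three-figure scheme. The StockRow class and the amend_order helper (for the order amendments in point 2) are hypothetical; in the real system each method would be an UPDATE against the stock table inside the same transaction as the sale.

```python
from dataclasses import dataclass

@dataclass
class StockRow:
    """Per product/location figures: StockOnHand, ReservedStock, SoldStock."""
    stock_on_hand: int = 0
    reserved_stock: int = 0
    sold_stock: int = 0

    def cash_sale(self, qty):
        # Point of sale: stock leaves immediately.
        self.stock_on_hand -= qty
        self.sold_stock += qty

    def order(self, qty):
        # Order entered: physical stock untouched, but it is reserved.
        self.reserved_stock += qty

    def amend_order(self, old_qty, new_qty):
        # Hypothetical helper: an amendment only adjusts the reservation.
        self.reserved_stock += new_qty - old_qty

    def invoice(self, qty):
        # Invoiced: move the quantity from reserved to sold.
        self.stock_on_hand -= qty
        self.reserved_stock -= qty
        self.sold_stock += qty

    def available(self, subtract_reserved=True):
        """Business setting: StockOnHand alone, or StockOnHand - ReservedStock."""
        return self.stock_on_hand - (self.reserved_stock if subtract_reserved else 0)

row = StockRow(stock_on_hand=100)
row.cash_sale(5)
row.order(10)
available_strict, available_loose = row.available(), row.available(subtract_reserved=False)
row.amend_order(10, 8)
row.invoice(8)
```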
Using a maintained StockOnHand figure will reduce your query times massively, at the cost of a small risk that the figure goes out of kilter if you mess up your program logic.
If your customers are lucky enough to experience update contention when maintaining the StockOnHand figure (i.e. are they likely to process more than five sales a second at peak times?), you can consider the following scheme:
Overnight, calculate the StockOnHand figure by counting deliveries minus sales, or whatever applies.
When a sale is confirmed, insert a row into a "Todays Sales" table.
When you need to query stock on hand, total up today's sales and subtract them from the start-of-day figure.
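A sketch of the start-of-day plus "Todays Sales" scheme, in SQLite via Python's sqlite3 (table and function names are hypothetical; the overnight recount itself is not shown). Confirming a sale only appends a row, so in a row-locking DBMS such as SQL Server concurrent sales don't fight over one hot StockOnHand row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rebuilt by the overnight job (deliveries minus sales, not shown here).
CREATE TABLE start_of_day_stock (product_id INTEGER PRIMARY KEY, qty INTEGER NOT NULL);
-- Append-only: one row per confirmed sale today.
CREATE TABLE todays_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    qty        INTEGER NOT NULL);
CREATE INDEX ix_todays_sales_product ON todays_sales (product_id);
""")

def confirm_sale(conn, product_id, qty):
    """Record a confirmed sale; no shared counter row is updated."""
    with conn:
        conn.execute("INSERT INTO todays_sales (product_id, qty) VALUES (?, ?)",
                     (product_id, qty))

def stock_on_hand(conn, product_id):
    """Start-of-day figure minus today's sales; None if the product is unknown."""
    row = conn.execute("""
        SELECT s.qty - COALESCE((SELECT SUM(t.qty) FROM todays_sales AS t
                                 WHERE t.product_id = s.product_id), 0)
        FROM start_of_day_stock AS s
        WHERE s.product_id = ?""", (product_id,)).fetchone()
    return row[0] if row else None

with conn:
    conn.execute("INSERT INTO start_of_day_stock VALUES (1, 10000)")
confirm_sale(conn, 1, 30)
confirm_sale(conn, 1, 20)
```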
You could also place a "Stock Check Threshold" on each product: if you start the day with 10,000 widgets, you can set the CheckThreshold to 100, and if someone orders fewer than 100, don't bother checking the stock. If someone orders more than 100, check the stock and calculate a new, lower threshold.

Could you create a view (or views) to represent your stock on hand? This would take the responsibility for doing the calculations out of synchronous triggers, which slow down your transactions. Using multiple views could satisfy the requirement "Different businesses have different ideas of when their StockOnHand should be reduced." Assuming you can meet the stringent requirements, creating an indexed view could further improve your performance.
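A sketch of the multiple-views idea: one view per business rule for when stock counts as gone. The status_stage table, status names and data are assumptions, and it runs in SQLite via Python's sqlite3. SQLite has no indexed views, and SQL Server's indexed views have their own restrictions (no subqueries or outer joins, for example), so this exact shape would need reworking to be indexable there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goods_in    (product_id INTEGER NOT NULL, qty INTEGER NOT NULL);
CREATE TABLE sale_header (sale_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE sale_line   (sale_id INTEGER NOT NULL, product_id INTEGER NOT NULL, qty INTEGER NOT NULL);
-- Hypothetical ordering of the stages an order passes through.
CREATE TABLE status_stage (status TEXT PRIMARY KEY, stage INTEGER NOT NULL);
INSERT INTO status_stage VALUES
    ('entered', 0), ('approved', 1), ('routed', 2), ('delivered', 3), ('invoiced', 4);

-- Shared building blocks.
CREATE VIEW sale_qty AS
    SELECT l.product_id, l.qty, s.stage
    FROM sale_line AS l
    JOIN sale_header  AS h ON h.sale_id = l.sale_id
    JOIN status_stage AS s ON s.status  = h.status;
CREATE VIEW stock_in AS
    SELECT product_id, SUM(qty) AS qty_in FROM goods_in GROUP BY product_id;

-- Business rule A: stock is gone once the order is approved.
CREATE VIEW soh_on_approval AS
    SELECT i.product_id,
           i.qty_in - COALESCE((SELECT SUM(q.qty) FROM sale_qty AS q
                                WHERE q.product_id = i.product_id AND q.stage >= 1), 0)
           AS stock_on_hand
    FROM stock_in AS i;

-- Business rule B: stock is gone only once it has been delivered.
CREATE VIEW soh_on_delivery AS
    SELECT i.product_id,
           i.qty_in - COALESCE((SELECT SUM(q.qty) FROM sale_qty AS q
                                WHERE q.product_id = i.product_id AND q.stage >= 3), 0)
           AS stock_on_hand
    FROM stock_in AS i;
""")

conn.execute("INSERT INTO goods_in VALUES (1, 100)")
conn.executemany("INSERT INTO sale_header VALUES (?, ?)",
                 [(1, "approved"), (2, "delivered"), (3, "entered")])
conn.executemany("INSERT INTO sale_line VALUES (?, ?, ?)", [(1, 1, 10), (2, 1, 5), (3, 1, 7)])

approval = conn.execute("SELECT * FROM soh_on_approval").fetchall()
delivery = conn.execute("SELECT * FROM soh_on_delivery").fetchall()
```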

Just some ideas off the top of my head:
Instead of a trigger (and persistent SOH data), you could use a computed column (e.g. SOH per product per store). However, the performance impact of evaluating this would likely be abysmal unless there are far more writes to your source tables than reads from your computed column. (This assumes the only reason you calculate the SOH is so that you can read it again. If you update the source data for the calculation much more often than you actually need to read it, then the computed column might make sense, since it is evaluated just in time, only when needed. This would be unusual, though: reads are usually more frequent than writes in most systems.)
I'm guessing the reason you are looking at triggers is that the source tables for the SOH figures are updated from a large number of procs / code paths, and you want to prevent oversights (as opposed to calling a recalc SPROC from every point where the source data is modified)?
IMHO, placing complicated logic in DB triggers is not advised, as it will adversely affect the performance of high-volume inserts / updates, and triggers aren't great for maintainability.
Does the SOH calculation need to be real time? If not, you could implement a mechanism to queue requests for recalculation (e.g. by using a trigger to flag a product / location balance as dirty) and then run a recalculation service every few minutes for near real-time figures. Mission-critical calculations (e.g. financial ones, like your #6) could, however, still detect that an SOH calc is dirty and force a recalc before doing a transaction.
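A sketch of the dirty-flag queue in SQLite via Python's sqlite3: the trigger only marks a product/location balance as dirty, a periodic pass recomputes dirty rows, and critical reads force a recalc first. Table, trigger and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_movement (product_id INTEGER, location_id INTEGER, qty INTEGER);
CREATE TABLE soh (
    product_id  INTEGER,
    location_id INTEGER,
    qty         INTEGER NOT NULL DEFAULT 0,
    dirty       INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (product_id, location_id));

-- The trigger only flags the balance as dirty; it does no arithmetic itself.
CREATE TRIGGER trg_movement_dirty AFTER INSERT ON stock_movement
BEGIN
    INSERT OR IGNORE INTO soh (product_id, location_id) VALUES (NEW.product_id, NEW.location_id);
    UPDATE soh SET dirty = 1
     WHERE product_id = NEW.product_id AND location_id = NEW.location_id;
END;
""")

def recalc(conn, product_id=None, location_id=None):
    """Recompute dirty balances: all of them, or a single product/location."""
    where, args = "dirty = 1", ()
    if product_id is not None:
        where += " AND product_id = ? AND location_id = ?"
        args = (product_id, location_id)
    with conn:
        # `where` is built only from the fixed strings above, never user input.
        conn.execute(f"""UPDATE soh SET dirty = 0, qty = COALESCE(
            (SELECT SUM(m.qty) FROM stock_movement AS m
              WHERE m.product_id = soh.product_id AND m.location_id = soh.location_id), 0)
            WHERE {where}""", args)

def read_soh(conn, product_id, location_id):
    """Critical read: force a recalc of this balance if it is dirty, then read it."""
    recalc(conn, product_id, location_id)
    row = conn.execute("SELECT qty FROM soh WHERE product_id = ? AND location_id = ?",
                       (product_id, location_id)).fetchone()
    return row[0] if row else 0

conn.executemany("INSERT INTO stock_movement VALUES (?, ?, ?)", [(1, 1, 100), (1, 1, -30)])
flag_before = conn.execute(
    "SELECT dirty FROM soh WHERE product_id = 1 AND location_id = 1").fetchone()[0]
current = read_soh(conn, 1, 1)          # forced recalc for a critical read

conn.execute("INSERT INTO stock_movement VALUES (1, 1, -10)")
recalc(conn)                            # what the periodic background pass would do
after_service = read_soh(conn, 1, 1)
```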
Re: 3 - Ouch. I would recommend that internally you agree on a consistent (and industry-acceptable) set of terminology (Stock In Hand, Stock Committed, Stock in Transit, Shrinkage, etc.) and then try to convince your customers to conform to that standard. But that is in an ideal world, of course!

We Keep Coding
