soft delete or hard delete good for eCommerce - sql

I have already read this post, but I am unsure which solution is best for an eCommerce site.
Our scenario:
Product table
ProductID Name Price
OrderDetails table
OrderID ProductID
The OrderDetails table has a foreign key ProductID that references ProductID in the Product table.
Once a product has been deleted, how are you going to display the historical order report?
Options:
soft delete disadvantage - it affects db storage performance
hard delete disadvantage - need extra join query while taking report
Any help would be great.

I would definitely go with soft delete, especially in an e-commerce context.
How about storing deleted products in an ArchivedProduct table and then doing the following:
SELECT od.OrderID, p.ProductID, p.Name, p.Price
FROM OrderDetails od
    INNER JOIN Product p ON od.ProductID = p.ProductID
UNION ALL
SELECT od.OrderID, ap.ProductID, ap.Name, ap.Price
FROM OrderDetails od
    INNER JOIN ArchivedProduct ap ON od.ProductID = ap.ProductID
When you say
it affects db storage performance
Yes, there is an overhead in terms of performance which is entirely dependent upon the size of the 3 tables.
If at a later stage you wanted to increase the performance of the query, you could either wipe out some of the previously "deleted" products from the ArchivedProduct table based on your own considerations (for example, all products inserted prior to ...) or add some constraints to the second SELECT statement. You'd still be in a safer position than with a hard delete.
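For comparison, a soft delete can also be done with a flag column on Product instead of a separate archive table. The sketch below is only an illustration, run in SQLite through Python so that it is self-contained; the IsDeleted column and the sample rows are assumptions, not part of the schema in the question.

```python
import sqlite3

def build_shop():
    """Create an in-memory database with a soft-delete flag on Product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (
            ProductID INTEGER PRIMARY KEY,
            Name      TEXT NOT NULL,
            Price     REAL NOT NULL,
            IsDeleted INTEGER NOT NULL DEFAULT 0  -- hypothetical soft-delete flag
        );
        CREATE TABLE OrderDetails (
            OrderID   INTEGER NOT NULL,
            ProductID INTEGER NOT NULL REFERENCES Product(ProductID)
        );
        INSERT INTO Product (ProductID, Name, Price) VALUES (1, 'Mug', 5.0), (2, 'Lamp', 20.0);
        INSERT INTO OrderDetails VALUES (100, 1), (101, 2);
    """)
    return conn

def soft_delete(conn, product_id):
    """Mark a product as deleted instead of removing the row."""
    conn.execute("UPDATE Product SET IsDeleted = 1 WHERE ProductID = ?", (product_id,))

def storefront(conn):
    """Products visible to customers: soft-deleted rows are filtered out."""
    return conn.execute(
        "SELECT Name FROM Product WHERE IsDeleted = 0 ORDER BY ProductID"
    ).fetchall()

def order_history(conn):
    """Historical report: ignores the flag, so deleted products still show up."""
    return conn.execute("""
        SELECT od.OrderID, p.Name, p.Price
        FROM OrderDetails od
        JOIN Product p ON p.ProductID = od.ProductID
        ORDER BY od.OrderID
    """).fetchall()

conn = build_shop()
soft_delete(conn, 1)
print(storefront(conn))
print(order_history(conn))
```

The foreign key from OrderDetails stays valid because the Product row is never physically removed; only the customer-facing queries need the extra IsDeleted = 0 filter.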

Related

Multi update query

I've created two temp tables: one with Orders, which contains Article and Quantity, and one with availability, which also has Article and Quantity. I would like to write a multi-update query that subtracts the order quantity from the stock, and from the order itself, for all articles in the temporary Orders table. As far as I know, it is not possible to alter two fields from different tables in one update query.
I've tried something like this, but of course it doesn't work.
UPDATE #Stocks as s
INNER JOIN #Orders as o on o.ArticleId=s.ArticleId
SET
s.Quantity = (s.Quantity - o.Quanity)
FROM
#Stocks s
JOIN #Orders o on o.ArticleId=s.ArticleId
WHERE
#Stocks.ArticleId IN (SELECT ArticleId FROM #Orders)
When you do an update using a join with multiple matches, only one arbitrary row is chosen for the update. The key idea is to aggregate the data before the update:
UPDATE s
    SET Quantity = (s.Quantity - o.quantity)
    FROM #Stocks s JOIN
         (SELECT o.ArticleId, SUM(o.Quantity) as quantity
          FROM #Orders o
          GROUP BY o.ArticleId
         ) o
         ON o.ArticleId = s.ArticleId;
Your statement is way over-complicated, mixing update syntax from SQL Server, MySQL, and Postgres. In addition, the WHERE clause is unnecessary because the JOIN does the filtering. However, even once the syntax errors are fixed, you will still have the problem of calculating incorrect results, unless you pre-aggregate the data.
Unfortunately, the description of this behavior is buried deep in the documentation of the first example on the update page:
The previous example assumes that only one sale is recorded for a
specified salesperson on a specific date and that updates are current.
If more than one sale for a specified salesperson can be recorded on
the same day, the example shown does not work correctly. The example
runs without error, but each SalesYTD value is updated with only one
sale, regardless of how many sales actually occurred on that day. This
is because a single UPDATE statement never updates the same row two
times. [emphasis added]
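To see the effect of pre-aggregating on a small data set, here is a rough sketch that runs in SQLite through Python. It is not the T-SQL above: it uses a correlated subquery with SUM, which is a portable way to express the same idea, and the table contents are made up for illustration.

```python
import sqlite3

def apply_orders(stocks, orders):
    """Subtract the total ordered quantity per article from the stock.

    stocks and orders are lists of (ArticleId, Quantity) tuples (sample data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Stocks (ArticleId INTEGER PRIMARY KEY, Quantity INTEGER)")
    conn.execute("CREATE TABLE Orders (ArticleId INTEGER, Quantity INTEGER)")
    conn.executemany("INSERT INTO Stocks VALUES (?, ?)", stocks)
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", orders)

    # The subquery aggregates all order lines of an article first,
    # so each stock row is updated once with the full total.
    conn.execute("""
        UPDATE Stocks
        SET Quantity = Quantity - (SELECT SUM(o.Quantity)
                                   FROM Orders o
                                   WHERE o.ArticleId = Stocks.ArticleId)
        WHERE ArticleId IN (SELECT ArticleId FROM Orders)
    """)
    return conn.execute(
        "SELECT ArticleId, Quantity FROM Stocks ORDER BY ArticleId"
    ).fetchall()

# Article 1 is ordered twice (3 + 4); article 3 is not ordered at all.
result = apply_orders([(1, 10), (2, 5), (3, 7)], [(1, 3), (1, 4), (2, 1)])
print(result)
```

Article 1 loses 7 units in total, which is the result a plain join-based update would miss if it applied only one of the two order lines.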
How about this?
UPDATE s
SET s.Quantity = (s.Quantity - o.Quantity)
FROM #Stocks as s
INNER JOIN #Orders as o on o.ArticleId=s.ArticleId
To update two tables using a single query, you could create a view that contains the columns of both tables and then update that view. Note that in SQL Server a single UPDATE through a view can modify columns of only one base table, so this needs an INSTEAD OF trigger on the view.
Your question is about a multi-update, but here the update is performed on one table based on another table, so a join is enough.
Thanks

Update some table when changes are made in a specific table

This is the database diagram of my sample database:
I want to update the Ingredient table when changes are made to the OrderDetails table, using a table trigger.
The problem is that the Ingredients table may have one or more entries for an Item. I want to update all entries in the Ingredient table which are associated with an Item. For more detail, see the algorithm below:
For each Ingredient in Ingredients for the current Item:
    Update the Quantity in the Ingredient table using the formula:
    Ingredient.Quantity = Ingredient.Quantity - (Item.Quantity * Ingredients.Quantity)
(The whole idea is that whenever Item(s) for an OrderID are added/updated/deleted, the Ingredient quantity should be decreased/increased by the quantity mentioned.)
We don't iterate or loop in SQL (unless we can't avoid it). We describe what we want to happen, and let SQL work out how.
We also, generally, don't store data that we can compute. We can always compute the amount of each ingredient that has been ordered. (If there's another table tracking deliveries from our suppliers, that could also be computed over). If performance was an issue, we'd then consider creating an indexed view - which does include the calculated values, but SQL Server takes care of maintaining it automatically.
In that way, you avoid any discrepancies from creeping in (e.g. if your trigger is disabled).
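As a rough sketch of the "compute it, don't store it" idea, the query below derives the remaining quantity of each ingredient from the order lines. It runs in SQLite through Python; the column layout is a guess based on the names in the question, and it assumes Ingredient.Quantity holds the opening stock rather than a running balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: Ingredient.Quantity is treated as the opening stock.
    CREATE TABLE Ingredient   (IngID INTEGER PRIMARY KEY, Quantity INTEGER);
    CREATE TABLE Ingredients  (ItemID INTEGER, IngID INTEGER, Quantity INTEGER);  -- amount per item
    CREATE TABLE OrderDetails (OrderID INTEGER, ItemID INTEGER, Quantity INTEGER);

    INSERT INTO Ingredient   VALUES (1, 100), (2, 50), (3, 10);
    INSERT INTO Ingredients  VALUES (1, 1, 2), (1, 2, 1), (2, 1, 3);
    INSERT INTO OrderDetails VALUES (1, 1, 4), (1, 2, 2), (2, 1, 1);
""")

# Remaining = opening stock - sum(ordered items * amount of ingredient per item).
# LEFT JOINs keep ingredients that no ordered item uses; COALESCE turns their NULL into 0.
REMAINING = """
    SELECT ing.IngID,
           ing.Quantity - COALESCE(SUM(od.Quantity * i.Quantity), 0) AS Remaining
    FROM Ingredient ing
    LEFT JOIN Ingredients  i  ON i.IngID  = ing.IngID
    LEFT JOIN OrderDetails od ON od.ItemID = i.ItemID
    GROUP BY ing.IngID, ing.Quantity
    ORDER BY ing.IngID
"""
remaining = conn.execute(REMAINING).fetchall()
print(remaining)
```

Because the figure is derived on every read, adding, changing or deleting an order line never leaves a stale balance behind.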
All that being said, I think the trigger you want is:
create trigger T_OrderDetails
on OrderDetails
after insert, update, delete
as
begin
    -- Net change per item = quantity in inserted minus quantity in deleted
    update ing
    set Quantity = ing.Quantity - ((COALESCE(iod.Quantity,0) - COALESCE(dod.Quantity,0)) * i.Quantity)
    from
        inserted iod
            full outer join
        deleted dod
            on
                iod.ItemID = dod.ItemID
            inner join
        Ingredients i
            on
                i.ItemID = iod.ItemID or
                i.ItemID = dod.ItemID --Cope with OUTER join above
            inner join
        Ingredient ing
            on
                i.IngID = ing.IngID
end

MS Access 2010 query pulls same records multiple times, sql challenge

I'm currently working on a program that keeps track of my company's stock inventory, using MS Access 2010. I'm having a hard time getting the query that is intended to show inventory to display the information I want. The problem seems to be that the query pulls the same record multiple times, inflating the sums of reserved and sold product.
Background:
My company stocks steel bars. We offer to cut the bars into pieces. From an inventory side, we want to track the length of each bar from the moment it comes into the warehouse, through its time in the warehouse (where it might get cut into smaller pieces), until the entire bar is sold and gone.
Database:
The query giving problems consults 3 tables:
Barstock (with the following fields)
    BatchNumber (all the bars received that belong to the same production heat)
    BarNo (the individual bar)
    Original Length (the length of the bar when received into stock)
    (BatchNumber and BarNo combined are the primary key)
Sales
    ID (primary key)
    BatchNumber
    BarNo
    Quantity Sold
Reservation (a seller can reserve some material when a customer signals interest but needs time to decide)
    ID (primary key)
    BatchNumber
    BarNo
    Quantity Reserved
I'd like to pull information from the three tables into one list, that displays:
-Barstock.orginial length As Received
- Sales.Quantity sold As Sold
- Recieved - Sold As On Stock
- reservation.Quantity Reserved As Reserved
- On Stock - Reserved As Available.
The problem is that I suck at sql. I've looked into union and inner join to the best of my ability, but my efforts have been in vain. I usually rely on the design view to produce the Sql statements I need. With design view, I've come up with the following Sql:
SELECT
BarStock.BatchNo
, BarStock.BarNo
, First(BarStock.OrgLength) AS Recieved
, Sum(Sales.QtySold) AS SumAvQtySold
, [Recieved]-[SumAvQtySold] AS [On Stock]
, Sum(Reservation.QtyReserved) AS Reserved
, ([On Stock]-[Reserved])*[Skjemaer]![Inventory]![unitvalg] AS Available
FROM
(BarStock
INNER JOIN Reservation ON (BarStock.BarNo = Reservation.BarNo) AND (BarStock.BatchNo = Reservation.BatchNo)
)
INNER JOIN Sales ON (BarStock.BarNo = Sales.BarNo) AND (BarStock.BatchNo = Sales.BatchNo)
GROUP BY
BarStock.BatchNo
, BarStock.BarNo
I know that the query is pulling the same record multiple times because:
- When I remove the GROUP BY term, I get several records that are exactly the same.
- There is, however, only one instance of these records in the corresponding tables.
I hope I've been able to explain myself properly, please ask if I need to elaborate on anything.
Thank you for taking the time to look at my problem!
Checking some assumptions
From your database schema, it seems that:
There could be multiple Sales records for a given BatchNumber/BarNo (for instance, I can imagine that multiple customers may have bought subsections of the same bar).
There could be multiple Reservation records for a given BatchNumber/BarNo (for instance, multiple sections of the same bar could be 'reserved')
To check if you do indeed have multiple records in those tables, try something like:
SELECT CountOfDuplicates
FROM (SELECT COUNT(*) AS CountOfDuplicates
      FROM Sales
      GROUP BY BatchNumber & "," & BarNo)
WHERE CountOfDuplicates > 1
If the query returns some records, then there are duplicates and it's probably why your query is returning incorrect values.
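To make the inflation concrete, here is a small sketch in SQLite through Python (not Access SQL). The sample figures are invented: one bar with two sales and two reservations. Joining both child tables to the bar at once pairs every sale with every reservation, so each quantity is counted twice.

```python
import sqlite3

def make_db():
    """One bar with two Sales rows and two Reservation rows (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BarStock (BatchNumber TEXT, BarNo INTEGER, OrgLength INTEGER,
                               PRIMARY KEY (BatchNumber, BarNo));
        CREATE TABLE Sales (ID INTEGER PRIMARY KEY, BatchNumber TEXT, BarNo INTEGER,
                            QtySold INTEGER);
        CREATE TABLE Reservation (ID INTEGER PRIMARY KEY, BatchNumber TEXT, BarNo INTEGER,
                                  QtyReserved INTEGER);
        INSERT INTO BarStock VALUES ('A', 1, 1000);
        INSERT INTO Sales (BatchNumber, BarNo, QtySold) VALUES ('A', 1, 200), ('A', 1, 300);
        INSERT INTO Reservation (BatchNumber, BarNo, QtyReserved) VALUES ('A', 1, 100), ('A', 1, 50);
    """)
    return conn

conn = make_db()

# Same shape as the query in the question: both child tables joined directly.
naive = conn.execute("""
    SELECT COUNT(*), SUM(s.QtySold), SUM(r.QtyReserved)
    FROM BarStock b
    JOIN Reservation r ON r.BatchNumber = b.BatchNumber AND r.BarNo = b.BarNo
    JOIN Sales s       ON s.BatchNumber = b.BatchNumber AND s.BarNo = b.BarNo
    GROUP BY b.BatchNumber, b.BarNo
""").fetchone()

# The real totals, taken from each table on its own.
sold = conn.execute("SELECT SUM(QtySold) FROM Sales").fetchone()[0]
reserved = conn.execute("SELECT SUM(QtyReserved) FROM Reservation").fetchone()[0]

print(naive)            # joined rows, inflated sold total, inflated reserved total
print(sold, reserved)   # actual totals
```

With 2 sales and 2 reservations the join yields 2 x 2 = 4 rows, so both sums come out at twice their real value.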
Starting from scratch
Now, the trick to making your query work is to think carefully about what the main data you want to show is, and start from that:
You basically want a list of all bars in the stock.
Some of these bars may have been sold, or they may be reserved, but if they are not, you should show the Quantity available in Stock. Your current query would never show you that.
For each bar in stock, you want to list the quantity sold and the quantity reserved, and combine them to find out the quantity remaining available.
So it's clear, your central data is the list of bars in stock.
Rather than try to pull everything into a single large query straight away, it's best to create simple queries for each of those goals and make sure we get the proper data in each case.
Just the Bars
From what you explain, each individual bar is recorded in the BarStock table.
As I said in my comment, from what I understand, all bars that are delivered have a single record in the BarStock table, without duplicates. So your main list against which your inventory should be measured is the BarStock table:
SELECT BatchNumber,
BarNo,
OrgLength
FROM BarStock
Just the Sales
Again, this should be pretty straightforward: we just need to find out how much total length was sold for each BatchNumber/BarNo pair:
SELECT BatchNumber,
BarNo,
Sum(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber, BarNo
Just the Reservations
Same as for Sales:
SELECT BatchNumber,
BarNo,
SUM(QtyReserved) AS Reserved
FROM Reservation
GROUP BY BatchNumber, BarNo
Original Stock against Sales
Now, we should be able to combine the first 2 queries into one. I'm not trying to optimise, just to make the data work together:
SELECT BarStock.BatchNumber,
       BarStock.BarNo,
       BarStock.OrgLength,
       S.SumAvQtySold,
       (BarStock.OrgLength - Nz(S.SumAvQtySold, 0)) AS OnStock
FROM BarStock
LEFT JOIN (SELECT BatchNumber,
                  BarNo,
                  Sum(QtySold) AS SumAvQtySold
           FROM Sales
           GROUP BY BatchNumber, BarNo) AS S
  ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)
We do a LEFT JOIN because there might be bars in stock that have not yet been sold.
If we did an INNER JOIN, we would have missed these in the final report, leading us to believe that these bars were never there in the first place.
All together
We can now wrap the whole query in another LEFT JOIN against the reserved bars to get our final result:
SELECT BS.BatchNumber,
       BS.BarNo,
       BS.OrgLength,
       BS.SumAvQtySold,
       BS.OnStock,
       R.Reserved,
       (OnStock - Nz(Reserved, 0)) AS Available
FROM (SELECT BarStock.BatchNumber,
             BarStock.BarNo,
             BarStock.OrgLength,
             S.SumAvQtySold,
             (BarStock.OrgLength - Nz(S.SumAvQtySold, 0)) AS OnStock
      FROM BarStock
      LEFT JOIN (SELECT BatchNumber,
                        BarNo,
                        SUM(QtySold) AS SumAvQtySold
                 FROM Sales
                 GROUP BY BatchNumber,
                          BarNo) AS S
        ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)) AS BS
LEFT JOIN (SELECT BatchNumber,
                  BarNo,
                  SUM(QtyReserved) AS Reserved
           FROM Reservation
           GROUP BY BatchNumber,
                    BarNo) AS R
  ON (BS.BatchNumber = R.BatchNumber) AND (BS.BarNo = R.BarNo)
Note the use of Nz() for items that are on the right side of the join: if there is no Sales or Reservation data for a given BatchNumber/BarNo pair, the values for SumAvQtySold and Reserved will be Null and will render OnStock and Available null as well, regardless of the actual quantity in stock, which would not be the result we expect.
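The Null behaviour is easy to reproduce outside Access. The sketch below uses SQLite through Python, with COALESCE standing in for Nz() (an assumption of equivalence for this purpose only) and invented sample data: bar 1 has sales and reservations, bar 2 has neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BarStock (BatchNumber TEXT, BarNo INTEGER, OrgLength INTEGER);
    CREATE TABLE Sales (BatchNumber TEXT, BarNo INTEGER, QtySold INTEGER);
    CREATE TABLE Reservation (BatchNumber TEXT, BarNo INTEGER, QtyReserved INTEGER);
    INSERT INTO BarStock VALUES ('A', 1, 1000), ('A', 2, 800);
    INSERT INTO Sales VALUES ('A', 1, 200), ('A', 1, 300);
    INSERT INTO Reservation VALUES ('A', 1, 100), ('A', 1, 50);
""")

# Each child table is aggregated first, then LEFT JOINed to the list of bars.
# RawOnStock shows what happens without the Null guard.
INVENTORY = """
    SELECT b.BatchNumber,
           b.BarNo,
           b.OrgLength - s.Sold                                     AS RawOnStock,
           b.OrgLength - COALESCE(s.Sold, 0)                        AS OnStock,
           b.OrgLength - COALESCE(s.Sold, 0) - COALESCE(r.Reserved, 0) AS Available
    FROM BarStock b
    LEFT JOIN (SELECT BatchNumber, BarNo, SUM(QtySold) AS Sold
               FROM Sales GROUP BY BatchNumber, BarNo) s
           ON s.BatchNumber = b.BatchNumber AND s.BarNo = b.BarNo
    LEFT JOIN (SELECT BatchNumber, BarNo, SUM(QtyReserved) AS Reserved
               FROM Reservation GROUP BY BatchNumber, BarNo) r
           ON r.BatchNumber = b.BatchNumber AND r.BarNo = b.BarNo
    ORDER BY b.BatchNumber, b.BarNo
"""
inventory = conn.execute(INVENTORY).fetchall()
for row in inventory:
    print(row)
```

For bar 2 the unguarded RawOnStock column comes back as NULL (None in Python) even though the full 800 is still in stock, while the guarded columns report 800.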
Using the Query designer in Access, you would have had to create the 3 queries separately and then combine them.
Note though that the Query Designer isn't very good at dealing with multiple LEFT and RIGHT joins, so I don't think you could have written the whole thing in one go.
Some comments
I believe you should read the information that @Remou gave you in his comments.
To me, there are some unfortunate design choices for this database: getting basic stock data should be as easy as a simple SUM() on the column that holds inventory records.
Usually, a simple way to track inventory is to keep track of each stock transaction:
Incoming stock records have a + Quantity
Outgoing stock records have a - Quantity
The record should also keep track of the part/item/bar reference (or ID), the date and time of the transaction, and -if you want to manage multiple warehouses- which warehouse ID is involved.
So if you need to know the complete stock at hand for all items, all you need to do is something like:
SELECT BarID,
Sum(Quantity)
FROM StockTransaction
GROUP BY BarID
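A minimal sketch of that transaction-ledger approach, run in SQLite through Python; the StockTransaction rows are invented sample data, and only the BarID and Quantity columns from the query above are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StockTransaction (
        ID       INTEGER PRIMARY KEY,
        BarID    INTEGER NOT NULL,
        Quantity INTEGER NOT NULL   -- positive = incoming, negative = outgoing
    );
    INSERT INTO StockTransaction (BarID, Quantity) VALUES
        (1, 1000),   -- bar 1 received
        (1, -200),   -- piece sold
        (1, -300),   -- piece sold
        (2, 800);    -- bar 2 received, untouched so far
""")

# Stock on hand is just the signed sum per bar.
on_hand = conn.execute("""
    SELECT BarID, SUM(Quantity) AS OnHand
    FROM StockTransaction
    GROUP BY BarID
    ORDER BY BarID
""").fetchall()
print(on_hand)
```

Every movement is a new row, so there is only one table to sum and no join that could multiply rows.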
In your case, while BatchNumber/BarNo is your natural key, keeping them in a separate Bar table would have some advantages:
You can use Bar.ID to get back the Bar.BatchNumber and Bar.BarNo anywhere you need it.
You can use BarID as a foreign key in your BarStock, Sales and Reservation tables. It makes joins easier without having to mess with the complexities of compound keys.
There are things that Access allows that are not really good practice, such as spaces in table names and fields, which end up making things less readable (at least because you need to keep them between []), less consistent with VBA variable names that represent these fields, and incompatible with other databases that don't accept anything other than alphanumerical characters for table and field names (should you wish to up-size later or interface your database with other apps).
Also, help yourself by sticking to a single naming convention, and keep it consistent:
Do not mix upper and lower case inconsistently: either use CamelCase, or lower case or UPPER case for everything, but always keep to that rule.
Name tables in the singular -or the plural-, but stay consistent. I prefer to use the singular, like table Part instead of Parts, but it's just a convention (that has its own reasons).
Spell correctly: it's Received not Recieved. That mistake alone may cost you when debugging why some query or VBA code doesn't work, just because someone made a typo.
Each table should/must have an ID column. Usually, this will be an auto-increment that guarantees uniqueness of each record in the table. If you keep that convention, then foreign keys become easy to guess and to read and you never have to worry about some business requirement changing the fact that you could suddenly find yourself with 2 identical BatchNumbers, for some reason you can't fathom right now.
There are lots of debates about database design, but there are certain 'rules' that everyone agrees with, so my recommendation would be to strive for:
Simplicity: make sure that each table records one kind of data, and does not contain redundant data from other tables (normalisation).
Consistency: naming conventions are important. Whatever you choose, stick to it throughout your project.
Clarity: make sure that you-in-3-years and other people can easily read the table names and fields and understand what they are without having to read a 300 page specification. It's not always possible to be that clear, but it's something to strive for.

performance: joining tables vs. large table with redundant data

Let's say I have a bunch of products. Each product has an id, price, and long description made up of multiple paragraphs. Each product would also have multiple sku numbers that would represent different sizes and colors.
To clarify: product_id 1 has 3 skus, product_id 2 has 5 skus. All of the skus in product 1 share the same price and description. product 2 has a different price and description than product 1. All of product 2's skus share product 2's price and description.
I could have a large table with different records for each sku. The records would have redundant fields like the long description and price.
Or I could have two tables. One named "products" with product_id, price, and description. And one named "skus" with product_id, sku, color, and size. I would then join the tables on the product_id column.
$query = "SELECT * FROM skus LEFT OUTER JOIN products ON skus.product_id=products.product_id WHERE color='green'";
or
$query = "SELECT * FROM master_table WHERE color='green'";
This is a dumbed down version of my setup. In the end there will be a lot more columns and a lot of products. Which method would have better performance?
So to be more specific: Let's say I want to LIKE search on the long_description column for all of the skus. I am trying to compare having one table that has 5000 long_description and 5000 skus vs OUTER JOINing two tables, one has 1000 long_description records and the other has 5000 skus.
It depends on the usage of those tables - in order to get a definitive answer you should do both and compare using representative data sets / system usage.
The normal approach is to only denormalise data in order to combat specific performance problems that you are having, so in this case my advice would be to default to joining across two tables and only denormalise to using a single table if you have a performance problem and find that denormalisation fixes it.
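A rough way to "do both and compare" is sketched below, using SQLite through Python. The row counts, the description text and the timed() helper are all made up for illustration; timings from an in-memory toy database say little about a real server, so treat this only as the shape of such a test.

```python
import sqlite3
import time

def build(n_products=1000, skus_per_product=5):
    """Load the same sample data into a normalised pair and a single wide table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, price REAL, description TEXT);
        CREATE TABLE skus (sku TEXT PRIMARY KEY, product_id INTEGER, color TEXT, size TEXT);
        CREATE TABLE master_table (sku TEXT PRIMARY KEY, product_id INTEGER, color TEXT,
                                   size TEXT, price REAL, description TEXT);
    """)
    for pid in range(1, n_products + 1):
        fabric = "cotton" if pid % 10 == 0 else "wool"
        desc = f"Long description of product {pid}, made of {fabric}. " * 20
        conn.execute("INSERT INTO products VALUES (?, ?, ?)", (pid, 9.99, desc))
        for n in range(skus_per_product):
            sku = f"{pid}-{n}"
            conn.execute("INSERT INTO skus VALUES (?, ?, ?, ?)", (sku, pid, "green", "M"))
            conn.execute("INSERT INTO master_table VALUES (?, ?, ?, ?, ?, ?)",
                         (sku, pid, "green", "M", 9.99, desc))
    return conn

def timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

JOINED = """
    SELECT s.sku FROM skus s
    JOIN products p ON p.product_id = s.product_id
    WHERE p.description LIKE '%cotton%'
    ORDER BY s.sku
"""
SINGLE = """
    SELECT sku FROM master_table
    WHERE description LIKE '%cotton%'
    ORDER BY sku
"""

conn = build()
joined_rows, joined_secs = timed(conn, JOINED)
single_rows, single_secs = timed(conn, SINGLE)
print(len(joined_rows), len(single_rows))
print(f"joined: {joined_secs:.4f}s  single table: {single_secs:.4f}s")
```

Both queries return the same skus; the normalised version has to scan 1000 descriptions for the LIKE, the wide table 5000 copies of them, which is the trade-off the question asks about.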
For OLTP workloads, normalized tables are better: you join them at query time, data manipulation is easier, and short queries respond well.
For OLAP workloads, denormalized tables are better: the tables mostly don't change, and they suit long-running queries.

Slow but simple Query, how to make it quicker?

I have a database which is 6GB in size, with a multitude of tables. However, the smaller queries seem to have the most problems, and I want to know what can be done to optimise them. For example, there are Stock, Items and Order tables.
The Stock table holds the items in stock; it has around 100,000 records with 25 fields storing ProductCode, Price and other stock-specific data.
The Items table stores the information about the items; there are over 2,000,000 of these, with over 50 fields storing item names and other details about the item or product in question.
The Orders table stores the orders of stock items, that is, when the order was placed plus the price it sold for, and has around 50,000 records.
Here is a query from this Database:
SELECT Stock.SKU, Items.Name, Stock.ProductCode FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
WHERE (Stock.Status = 1 OR Stock.Status = 2) AND Order.Customer = 12345
ORDER BY Order.OrderDate DESC;
Given the information here, what could be done to improve this query? There are others like it, so what alternatives are there? The nature of the data and the database cannot be detailed further, however, so general optimisation tricks and methods, or anything that applies generally to databases, will be fine.
The Database is MS SQL 2000 on Windows Server 2003 with the latest service packs for each.
DB Upgrade / OS Upgrade are not options for now.
Edit
Indices are Stock.SKU, Items.ProductCode and Orders.OrderID on the tables mentioned.
Execution time is 13-16 seconds for a query like this, with 75% of the time spent in Stock.
Thanks for all the responses so far. Indexing seems to be the problem, and all the different examples given have been helpful, despite a few mistakes with the query. Some of these queries have run quicker, and combined with the index suggestions I think I might be on the right path now. Thanks for the quick responses: they have really helped me and made me consider things I did not think or know about before!
Indexes ARE my problem: I added one to the foreign key with Orders (Customer) and this has improved performance by halving execution time!
Looks like I got tunnel vision and focused on the query. I have been working with DBs for a couple of years now, but this has been very helpful. Thanks, too, for all the query examples: they use combinations and features I had not considered, which may be useful as well!
Is your code correct? I'm sure you're missing something:
INNER JOIN Batch ON Order.OrderID = Orders.OrderID
and you have a stray ) in the code ...
you can always test some variants against the execution plan tool, like
SELECT
s.SKU, i.Name, s.ProductCode
FROM
Stock s, Orders o, Batch b, Items i
WHERE
b.OrderID = o.OrderID AND
s.ProductCode = i.ProductCode AND
s.Status IN (1, 2) AND
o.Customer = 12345
ORDER BY
o.OrderDate DESC;
and you should return just a fraction, like TOP 10... it will take some milliseconds to just choose the TOP 10 but you will save plenty of time when binding it to your application.
The most important thing, if not already done: define primary keys for the tables and add indexes for the foreign keys and for the columns you are using in the joins.
Did you specify indexes? On
Items.ProductCode
Stock.ProductCode
Orders.OrderID
Orders.Customer
Sometimes, IN could be faster than OR, but this is not as important as having indexes.
See balexandre's answer; your query looks wrong.
Some general pointers
Are all of the fields that you are joining on indexed?
Is the ORDER BY necessary?
What does the execution plan look like?
BTW, you don't seem to be referencing the Order table in the question query example.
A table index will certainly help, as Cătălin Pitiș suggested.
Another trick is to reduce the number of rows being joined, either by using a subselect or, to be more extreme, by using temp tables. For example, rather than joining on the whole Orders table, join on
(SELECT * FROM Orders WHERE Customer = 12345)
Also, don't join directly on the Stock table; join on
(SELECT * FROM Stock WHERE Status = 1 OR Status = 2)
Setting the correct indexes on the tables is usually what makes the biggest difference for performance.
In Management Studio (or Query Analyzer for earlier versions) you can choose to view the execution plan of the query when you run it. In the execution plan you can see what the database is really doing to get the result, and which parts take the most work. There are some things to look for there, like table scans, which are usually the most costly part of a query.
The primary key of a table normally has an index, but you should verify that it's actually so. Then you probably need indexes on the fields that you use to look up records, and fields that you use for sorting.
Once you have added an index, you can rerun the query and see in the execution plan if it's actually using the index. (You may need to wait a while after creating the index for the database to build the index before it can use it.)
Could you give it a go?
SELECT Stock.SKU, Items.Name, Stock.ProductCode FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID AND (Order.Customer = 12345) AND (Stock.Status = 1 OR Stock.Status = 2)
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
ORDER BY Order.OrderDate DESC;
Elaborating on what Cătălin Pitiș said already: in your query
SELECT Stock.SKU, Items.Name, Stock.ProductCode
FROM Stock
INNER JOIN Order ON Order.OrderID = Stock.OrderID
INNER JOIN Items ON Stock.ProductCode = Items.ProductCode
WHERE (Stock.Status = 1 OR Stock.Status = 2) AND Order.Customer = 12345
ORDER BY Order.OrderDate DESC;
the criterion Order.Customer = 12345 looks very specific, whereas (Stock.Status = 1 OR Stock.Status = 2) sounds unspecific. If this is correct, an efficient query consists of
1) first finding the orders belonging to a specific customer,
2) then finding the corresponding rows of Stock (with the same OrderID), keeping only those with Status in (1, 2),
3) and finally finding the items with the same ProductCode as the rows of Stock in 2)
For 1) you need an index on Customer for the table Order, for 2) an index on OrderID for the table Stock and for 3) an index on ProductCode for the table Items.
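The three indexes can be tried out on a toy copy of the schema. The sketch below uses SQLite through Python rather than SQL Server 2000, so the plan output will look different from Query Analyzer; the table is called Orders to avoid the reserved word, and the index names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer INTEGER, OrderDate TEXT);
    CREATE TABLE Stock  (SKU TEXT PRIMARY KEY, ProductCode TEXT, OrderID INTEGER, Status INTEGER);
    CREATE TABLE Items  (ProductCode TEXT, Name TEXT);

    -- One index per step: 1) customer lookup, 2) Stock by order, 3) Items by product code.
    CREATE INDEX idx_orders_customer   ON Orders (Customer);
    CREATE INDEX idx_stock_orderid     ON Stock (OrderID);
    CREATE INDEX idx_items_productcode ON Items (ProductCode);

    INSERT INTO Orders VALUES (1, 12345, '2009-01-01'), (2, 12345, '2009-02-01'), (3, 999, '2009-03-01');
    INSERT INTO Stock  VALUES ('S1', 'P1', 1, 1), ('S2', 'P2', 2, 2), ('S3', 'P1', 2, 3), ('S4', 'P2', 3, 1);
    INSERT INTO Items  VALUES ('P1', 'Widget'), ('P2', 'Gadget');
""")

QUERY = """
    SELECT Stock.SKU, Items.Name, Stock.ProductCode
    FROM Stock
    INNER JOIN Orders ON Orders.OrderID = Stock.OrderID
    INNER JOIN Items  ON Stock.ProductCode = Items.ProductCode
    WHERE Stock.Status IN (1, 2) AND Orders.Customer = ?
    ORDER BY Orders.OrderDate DESC
"""

rows = conn.execute(QUERY, (12345,)).fetchall()
print(rows)

# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan;
# look for the index names to check whether the indexes are actually used.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (12345,))]
for step in plan:
    print(step)
```

Dropping one of the indexes and rerunning the EXPLAIN QUERY PLAN line is a quick way to see which step falls back to a scan.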
As long as your query does not become much more complicated (for example, by being a subquery in a bigger query, or because Stock, Order and Items are only views, not tables), the query optimizer should be able to find this plan from your query already. Otherwise, you'll have to do what kuoson is suggesting (but the 2nd suggestion does not help if Status in (1, 2) is not very specific and/or Status is not indexed on the table Stock). Also remember that keeping indexes up to date costs performance if you do many inserts/updates on the table.
To shorten the answer I gave 2 hours ago (when my cookies were switched off):
You need three indexes: Customer for table Order, OrderID for Stock and ProductCode for Items.
If you miss any of these, you'll have to wait for a complete table scan on the corresponding table.