MS Access 2010 query pulls same records multiple times, sql challenge - sql

I'm currently working on a program that keeps track of my company's stock inventory, using ms Access 2010. I'm having a hard time getting the query, intended to show inventory, to display the information I want. The problem seems to be that the query pulls the same record multiple times, inflating the sums of reserved and sold product.
Background:
My company stocks steel bars. We offer to cut the bars into pieces. From an inventory side, We want to track the length of each bar, from the moment it comes in to the warehouse, through it's time in the warehouse (where it might get cut into smaller pieces), until the entire bar is sold and gone.
Database:
The query giving problems, is consulting 3 tables;
Barstock (with the following fields)
BatchNumber (all the bars recieved, beloning to the same production heat)
BarNo (the individual bar)
Orginial Length (the length of the bar when recieved at the stock
(BatchNumber and BarNo combined, is the primary key)
Sales
ID (primary key)
BatchNumber
BarNo
Quantity Sold
Reservation (a seller kan reserve some material, when a customer signals interest, but needs time to decide)
ID (Primary key)
BatchNumber
BarNo
Quantity reserved
I'd like to pull information from the three tables into one list, that displays:
-Barstock.orginial length As Received
- Sales.Quantity sold As Sold
- Recieved - Sold As On Stock
- reservation.Quantity Reserved As Reserved
- On Stock - Reserved As Available.
The problem is that I suck at sql. I've looked into union and inner join to the best of my ability, but my efforts have been in vain. I usually rely on the design view to produce the Sql statements I need. With design view, I've come up with the following Sql:
SELECT
BarStock.BatchNo
, BarStock.BarNo
, First(BarStock.OrgLength) AS Recieved
, Sum(Sales.QtySold) AS SumAvQtySold
, [Recieved]-[SumAvQtySold] AS [On Stock]
, Sum(Reservation.QtyReserved) AS Reserved
, ([On Stock]-[Reserved])*[Skjemaer]![Inventory]![unitvalg] AS Available
FROM
(BarStock
INNER JOIN Reservation ON (BarStock.BarNo = Reservation.BarNo) AND (BarStock.BatchNo = Reservation.BatchNo)
)
INNER JOIN Sales ON (BarStock.BarNo = Sales.BarNo) AND (BarStock.BatchNo = Sales.BatchNo)
GROUP BY
BarStock.BatchNo
, BarStock.BarNo
I know that the query is pulling the same record multiple times because;
- when I remove the GROUP BY term, I get several records that are exactley the same.
- There are however, only one instance of these records in the corresponding tables.
I hope I've been able to explain myself properly, please ask if I need to elaborate on anything.
Thank you for taking the time to look at my problem!

!!! Checking some assumptions
From your database schema, it seems that:
There could be multiple Sales records for a given BatchNumber/BarNo (for instance, I can imagine that multiple customers may have bought subsections of the same bar).
There could be multiple Reservation records for a given BatchNumber/BarNo (for instance, multiple sections of the same bar could be 'reserved')
To check if you do indeed have multiple records in those tables, try something like:
SELECT CountOfDuplicates
FROM (SELECT COUNT(*) AS CountOfDuplicates
FROM Sales
GROUP BY BatchNumber & "," & BarNo)
WHERE CountOfDuplicates > 1
If the query returns some records, then there are duplicates and it's probably why your query is returning incorrect values.
Starting from scratch
Now, the trick to your make your query work is to really think about what is the main data you want to show, and start from that:
You basically want a list of all bars in the stock.
Some of these bars may have been sold, or they may be reserved, but if they are not, you should show the Quantity available in Stock. Your current query would never show you that.
For each bar in stock, you want to list the quantity sold and the quantity reserved, and combined them to find out the quantity remaining available.
So it's clear, your central data is the list of bars in stock.
Rather than try to pull everything into a single large query straight away, it's best to create simple queries for each of those goals and make sure we get the proper data in each case.
Just the Bars
From what you explain, each individual bar is recorded in the BarStock table.
As I said in my comment, from what I understand, all bars that are delivered have a single record in the BarStock table, without duplicates. So your main list against which your inventory should be measured is the BarStock table:
SELECT BatchNumber,
BarNo,
OrgLength
FROM BarStock
Just the Sales
Again, this should be pretty straightforward: we just need to find out how much total length was sold for each BatchNumber/BarNo pair:
SELECT BatchNumber,
BarNo,
Sum(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber, BarNo
Just the Reservations
Same as for Sales:
SELECT BatchNumber,
BarNo,
SUM(QtyReserved) AS Reserved
FROM Reservation
GROUP BY BatchNumber, BarNo
Original Stock against Sales
Now, we should be able to combine the first 2 queries into one. I'm not trying to optimise, just to make the data work together:
SELECT BarStock.BatchNumber,
BarStock.BarNo,
BarStock.OrgLength,
S.SumAvQtySold,
(BarStock.OrgLength - Nz(S.SumAvQtySold)) AS OnStock
FROM BarStock
LEFT JOIN (SELECT BatchNumber,
BarNo,
Sum(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber, BarNo) AS S
ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)
We do a LEFT JOIN because there might be bars in stock that have not yet been sold.
If we did an INNER JOIN, we wold have missed these in the final report, leading us to believe that these bars were never there in the first place.
All together
We can now wrap the whole query in another LEFT JOIN against the reserved bars to get our final result:
SELECT BS.BatchNumber,
BS.BarNo,
BS.OrgLength,
BS.SumAvQtySold,
BS.OnStock,
R.Reserved,
(OnStock - Nz(Reserved)) AS Available
FROM (SELECT BarStock.BatchNumber,
BarStock.BarNo,
BarStock.OrgLength,
S.SumAvQtySold,
(BarStock.OrgLength - Nz(S.SumAvQtySold)) AS OnStock
FROM BarStock
LEFT JOIN (SELECT BatchNumber,
BarNo,
SUM(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber,
BarNo) AS S
ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)) AS BS
LEFT JOIN (SELECT BatchNumber,
BarNo,
SUM(QtyReserved) AS Reserved
FROM Reservation
GROUP BY BatchNumber,
BarNo) AS R
ON (BS.BatchNumber = R.BatchNumber) AND (BS.BarNo = R.BarNo)
Note the use of Nz() for items that are on the right side of the join: if there is no Sales or Reservation data for a given BatchNumber/BarNo pair, the values for SumAvQtySold and Reserved will be Null and will render OnStock and Available null as well, regardless of the actual quantity in stock, which would not be the result we expect.
Using the Query designer in Access, you would have had to create the 3 queries separately and then combine them.
Note though that the Query Designed isn't very good at dealing with multiple LEFT and RIGHT joins, so I don't think you could have written the whole thing in one go.
Some comments
I believe you should read the information that #Remou gave you in his comments.
To me, there are some unfortunate design choices for this database: getting basic stock data should be as easy as s simple SUM() on the column that hold inventory records.
Usually, a simple way to track inventory is to keep track of each stock transaction:
Incoming stock records have a + Quantity
Outgoing stock records have a - Quantity
The record should also keep track of the part/item/bar reference (or ID), the date and time of the transaction, and -if you want to manage multiple warehouses- which warehouse ID is involved.
So if you need to know the complete stock at hand for all items, all you need to do is something like:
SELECT BarID,
Sum(Quantity)
FROM StockTransaction
GROUP BY BarID
In your case, while BatchNumber/BarNo is your natural key, keeping them in a separate Bar table would have some advantages:
You can use Bar.ID to get back the Bar.BatchNumber and Bar.BarNo anywhere you need it.
You can use BarID as a foreign key in your BarStock, Sales and Reservation tables. It makes joins easier without having to mess with the complexities of compound keys.
There are things that Access allows that are not really good practice, such as spaces in table names and fields, which end up making things less readable (at least because you need to keep them between []), less consistent with VBA variable names that represent these fields, and incompatible with other database that don't accept anything other than alphanumerical characters for table and field names (should you wish to up-size later or interface your database with other apps).
Also, help yourself by sticking to a single naming convention, and keep it consistent:
Do not mix upper and lower case inconsistently: either use CamelCase, or lower case or UPPER case for everything, but always keep to that rule.
Name tables in the singular -or the plural-, but stay consistent. I prefer to use the singular, like table Part instead of Parts, but it's just a convention (that has its own reasons).
Spell correctly: it's Received not Recieved. That mistake alone may cost you when debugging why some query or VBA code doesn't work, just because someone made a typo.
Each table should/must have an ID column. Usually, this will be an auto-increment that guarantees uniqueness of each record in the table. If you keep that convention, then foreign keys become easy to guess and to read and you never have to worry about some business requirement changing the fact that you could suddenly find yourself with 2 identical BatchNumbers, for some reason you can't fathom right now.
There are lots of debates about database design, but there are certain 'rules' that everyone agrees with, so my recommendation should be to strive for:
Simplicity: make sure that each table records one kind of data, and does not contain redundant data from other tables (normalisation).
Consistency: naming conventions are important. Whatever you choose, stick to it throughout your project.
Clarity: make sure that you-in-3-years and other people can easily read the table names and fields and understand what they are without having to read a 300 page specification. It's not always possible to be that clear, but it's something to strive for.

Related

Display correct results in many to many relatonship tables

I currently have three tables.
master_tradesmen
trades
master_tradesmen_trades (joins the previous two together in a many-to-many relationship). The 'trade_id' and 'master_tradesman_id' are the foreign keys.
Here is what I need to happen. A user performs a search and types in a trade. I need a query that displays all of the information from the master_tradesmen table whose trade in the master_tradesmen_trade table matches the search. For example, if 'plumbing' is typed in the search bar (trade_id 1), all of the columns for Steve Albertsen (master_tradesman_id 6) and Brian Terry (master_tradesman_id 8) would be displayed from the master_tradesmen table. As a beginner to SQL, trying to grasp the logic of this is about to make my head explode. I'm hoping that someone with more advanced SQL knowledge can wrap their head around this much easier than I can. Note: the 'trades' column in master_tradesmen is for display purposes only, not for querying. Thank you so much in advance!
You have a catalog for the tradesmen, & another catalog for the trades.
The trades should only appear once in the trades catalog in order to make your DB more consistent
Then you have your many-to-many table which connects the trades & master tradesmen tables.
If we want to get the tradesmen according to the given trade in the input, we should first
know the id of that trade which has to be unique, so in your DB you would have something
like the img. below :
Now we can make a query to select the id of trade :
DECLARE #id_trade int = SELECT trade_id FROM trades WHERE trade_name LIKE '%plumbing%'
Once we know the trading id, we can redirect to the 'master_tradesmen_trades' table to know the name of the people how work that trade :
SELECT * FROM master_tradesmen_trades WHERE trade_id = #id_trade
You will get the following result :
You may say, 'But there is still something wrong with it, as i am still not able to see the tradesmen', this is the moment when we make an inner join:
SELECT * FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesman tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesmen_id
WHERE trade_id = #id_trade
IF you need to see specific columns, you can do :
SELECT first_name, last_name, city, state FROM master_tradesmen_trades trades_and_tradesmen
INNER JOIN master_tradesman tradesmen
ON tradesmen.id = trades_and_tradesmen.master_tradesmen_id
WHERE trade_id = #id_trade

Need help wrapping head around joins

I have a database of a service that helps people sell things. If they fail a delivery of a sale, they get penalised. I am trying to extract the number of active listings each user had when a particular penalty was applied.
I have the equivalent to the following tables(and relevant fields):
user (id)
listing (id, user_id, status)
transaction (listing_id, seller_id)
listing_history (id, listing_status, date_created)
penalty (id, transaction_id, user_id, date_created)
The listing_history table saves an entry every time a listing is modified, saving a record of what the new state of the listing is.
My goal is to end with a result table with the field: penalty_id, and number of active listings the penalised user had when the penalty was applied.
So far I have the following:
SELECT s1.penalty_id,
COUNT(s1.record_id) 'active_listings'
FROM (
SELECT penalty.id AS 'penalty_id',
listing_history.id AS 'record_id',
FROM user
JOIN penalty ON penalty.user_id = user.id
JOIN transaction ON transaction.id = penalty.transaction_id
JOIN listing_history ON listing_history.listing_id = listing.id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
) s1
GROUP BY s1.penalty_id
Status = 0 means that the listing is active (or that the listing was active at the time the record was created). I got results similar to what I expected, but I fear I may be missing something or may be doing the JOINs wrong. Would this have your approval? (apart from the obvious non-use of aliases, for clarity problems).
UPDATE - As the comments on this answer indicate that changing the table structure isn't an option, here are more details on some queries you could use with the existing structure.
Note that I made a couple changes to the query before even modifying the logic.
As viki888 pointed out, there was a problem reference to listing.id; I've replaced it.
There was no real need for a subquery in the original query; I've simplified it out.
So the original query is rewritten as
SELECT penalty.id AS 'penalty_id'
, COUNT(listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
GROUP BY penalty.id
Now the most natural way, in my opinion, to write the corrected timeline constraint is with a NOT EXISTS condition that filters out all but the most recent listing_history record for a given id. This does require thinking about some edge cases:
Could two listing history records have the same create date? If so, how do you decide which happened first?
If a listing history record is created on the same day as the penalty, which is treated as happening first?
If the created_date is really a timestamp, then this may not matter much (if at all); if it's really a date, it might be a bigger issue. Since your original query required that the listing history be created before the penalty, I'll continue in that style; but it's still ambiguous how to handle the case where two history records with matching status have the same date. You may need to adjust the date comparisons to get the desired behavior.
SELECT penalty.id AS 'penalty_id'
, COUNT(DISTINCT listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
AND NOT EXISTS (SELECT 1
FROM listing_history h2
WHERE listing_history.date_created < h2.date_created
AND h2.date_created < penalty.date_created
AND h2.id = listing_history.id)
GROUP BY penalty.id
Note that I switched from COUNT(...) to COUNT(DISTINCT ...); this helps with some edge cases where two active records for the same listing might be counted.
If you change the date comparisons to use <= instead of < - or, equivalently, if you use BETWEEN to combine the date comparisons - then you'd want to add AND h2.status != 0 (or AND h2.status <> 0, depending on your database) to the subquery so that two concurrent ACTIVE records don't cancel each other out.
There are several equivalent ways to write this, and unfortunately its the kind of query that doesn't always cooperate with a database query optimizer so some trial and error may be necessary to make it run well with large data volumes. Hopefully that gives enough insight into the intended logic that you could work out some equivalents if need be. You could consider using NOT IN instead of NOT EXISTS; or you could use an outer join to a second instance of LISTING_HISTORY... There are probably others I'm not thinking of off hand.
I don't know that we're in a position to sign off on a general statement that the query is, or is not, "correct". If there's a specific question about whether a query will include/exclude a record in a specific situation (or why it does/doesn't, or how to modify it so it won't/will), those might get more complete answers.
I can say that there are a couple likely issues:
The only glaring logic issue has to do with timeline management, which is something that causes a lot of trouble with SQL. The issue is, while your query demonstrates that the listing was active at some point before the penalty creation date, it doesn't demonstrate that the listing was still active on the penalty creation date. Consider
PENALTY
id transaction date
1 10 2016-02-01
TRANSACTION
id listing_id
10 100
LISTING_HISTORY
listing_id status date
100 0 2016-01-01
100 1 2016-01-15
The joins would create a single record, and the count for penalty 1 would include listing 100 even though its status had changed to something other than 0 before the penalty was created.
This is hard - but not impossible - to fix with your existing table structure. You could add a NOT EXISTS condition looking for another LISTING_HISTORY record matching the ID with a date between the first LISTING_HISTORY date and the PENALTY date, for one.
It would be more efficient to add an end date to the LISTING_HISTORY date, but that may not be so easy depending on how the data is maintained.
The second potential issue is the COUNT(RECORD_ID). This may not do what you mean - what COUNT(x) may intuitively seem like it should do, is what COUNT(DISTINCT RECORD_ID) actually does. As written, if the join produces two matches with the same LISTING_HISTORY.ID value - i.e. the listing became active at two different times before the penalty - the listing would be counted twice.

Oracle SQL: Show Individual Count Even if There are Duplicates

this is my first question on here, so please forgive me if I break any rules.
Here is what I need to know:
How do I create an Oracle SQL query that will display a unique count of something even if there are duplicates in the results?
Example: a customer table has a list of purchases made by various customers. The table lists the customer ID, name, category of purchase (ie Hardware, Tools, Seasonal) ect. The outcome of the query needs to show each customer id, customer name and the category of the purchase, and a count of the individual customer. SO customer ID 1 for John Smith has made a purchase in each department. If I do a count of the customer, he will appear three times as he has made three purchases, but I also need a column to count the customer only once. The count in the other rows returned for the other departments should show a 0 or Null.
I normally achieve this by pulling everything and exporting to excel. I add a column that uses an IF formula on the ID to only show a 1 on the first occurrence of the customer IE: IF(A3=A2,0,1) (if a3 is the same as A2, show a 0, if it's not the same as A2 then show a 1). This will give me a unique count of customers for one part of the report and will still show me how many purchase the customer made in another part of the report.
I want to do this directly in the SQL query as I have a large set of data this needs to be done on, and adding any formulas in excel will make the sheet huge. This will also make it easier to host the query results in ACCESS so excel can pull it from there.
I have tried to find a solution to this for a while, but any searching on Google will usually return results on how to remove duplicates form a table or how to count the duplicates in a table.
I am sorry if this is long question, but I wanted to be through so I do not waste anyone's time on back an fourth comments (I have seen this many times on here and else where when the OP asks a very cryptic question and expects everyone to understand them without further expiation).
Using distinct can be used in a count to only count the unique values of a field.
SELECT
cust.customer_id, cust.customer_name, p.category,
count(distinct p.department_id) as total_departments,
count(*) as total_purchases
FROM customers cust
LEFT JOIN purchase_table p on (cust.customer_id = p.customer_id)
GROUP BY cust.customer_id, cust.customer_name, p.category
ORDER BY cust.customer_id;
Such method is not limited to the Oracle RDBMS.

Rolling Sum SQL

I have a pretty simple SQL requirement but wanting to know what is 'best practice' for the below scenario as I am running into a performance issue.
I have a list of teams, each week/round these teams pay a game fee. If a team doesn't pay then then they will have an outstanding balance. All team payments go into a payments table which is getting bigger and bigger. What is the best practice to return a list of teams with their current balance?
What I have at the moment:
Select teams.*, (Select SUM(amount) from payments p where p.TeamID=teams.TeamID) as teambalance
from (select TeamID, TeamName from Teams) teams
I have thought about this a lot and think that the classical advice of "don't store the same information twice" is mistaken here, or at least misinterpreted.
Think about how banks must do it. Obviously, when you want to know your current balance and you've been a customer for 20 years, they don't add up 20 years of account activity to find your current balance. In light of that, I see two ways to handle it:
Choose periods to "close" and always calculate from the last closed period. This keeps the summing relatively short. The monthly statement is probably a good such anchor. Do you have a similar natural time period or business life cycle to track with?
Work backwards, by anchoring your account history in the present. Instead of starting at 0 and adding, start at current balance and go back. This is just as valid, in my opinion, and has the added benefit that you don't have to do a thing when you want to trim old history. Store the current balance, and forget the supposed denormalization. The current balance is as true an empirical fact as the starting balance, and there is no harm in anchoring your accounts this way.
You can continue to add if you like, so long as performance is okay. But it may not be optimal.
Your current query is fine, but there is no need for the teams derived table. Unless you're using MySQL, the DBMS doesn't need this kind of "help"--though MySQL could actually be harmed by it.
select teamId,teamName,sum(amount)
from teams t join payments p on t.teamId = p.teamId
group by t.teamId, t.teamName
I have used two methods to accomplish this task - one being the method that you are currently using. The other is using cross apply.
I prefer your current method -
This may be faster than using a subquery in the SELECT clause (or a join):
select teams.TeamID, teams.teamName, team_balances.teambalance
from teams
join ( select TeamID, sum(amount) teambalance
from payments
group by TeamID
) team_balances
on team_balances.TeamID = teams.TeamID;
This will sequentially scan the payments table once, rather than doing N index scans (one per team).
denormalizing
Another option is to create add a "outstanding_balance" column to the teams table.
Create a trigger on the payments table. In the trigger, increment or decrement the outstanding_balance column in teams based on the TeamID and the invoice/payment amount, respectively.
Depending on your RDBMS you could also use a materialized view. This is similar to the trigger method, except the balance for each team would be stored in a different table.

Query accross a circle of tables

I'm curious about how I could go about getting the data I need out of a "circle" of tables.
I have 5 tables(and a few supporting ones): 3 entities joined by junction tables. So the general model is like this:
Cards have many Budgets, and Accounts have many Budgets, and Accounts have many Cards.
So my relationships make a circle, through the junction tables, form Card to Budget to Account back to Card, This structure works all fine and dandy until today when I tried to construct a query using all 5 tables, and noticed that I know of no way to avoid abiguous joins which this structure in place. I'm thinking it might have been a better idea to create AccountBudget and CardBudget tables, but since they will both define exactly the same type of data, one table seemed more efficient.
The information I'm trying to get is basically the total budget limit for all cards of a certain type, and the total budget limit for all accounts of that same type. Am I just looking at this problem wrong?
// Card Budget_Card Budget Budget_Account Account
// ------- --------- -------- -------------- ---------
// cardId------\ budgetId<---------budgetId------>budgetId -----accountId--(to Card)->
// accountId --->cardId limit accountId<------/ typeId
// (etc) typeId (etc)
// (typeId in Budget is either 1 for an account budget or 2 for a card budget.)
As you can see, it's a circle. What I'm trying to accomplish is return one row with two columns: the sum of Budget.limit for the record in Account where typeId = 1, and the sum of Budget.limit for all rows in Card belonging to Accounts of the same type.
As per suggestion, I can in fact get the data I need from a union, but it's no use to me if the data is not in two separate columns:
SELECT DISTINCTROW Sum(Budget.limit) AS SumOfLimit
FROM (Account RIGHT JOIN Card ON Account.accountId = Card.accountId)
RIGHT JOIN (Budget LEFT JOIN Budget_Card ON Budget.budgetID = Budget_Card.budgetId) ON Card.cardId = Budget_Card.cardId
GROUP BY Budget.typeId, Budget.quarterId, Account.typeId
HAVING (((Budget.typeId)=2) AND ((Budget.quarterId)=[#quarterId]) AND ((Account.typeId)=[#accountType]))
UNION SELECT DISTINCTROW Sum(Budget.limit) AS SumOfLimit
FROM Budget LEFT JOIN (Account RIGHT JOIN Budget_Account ON Account.accountId = Budget_Account.accountId) ON Budget.budgetID = Budget_Account.budgetId
GROUP BY Budget.typeId, Budget.quarterId, Account.typeId
HAVING (((Budget.typeId)=1) AND ((Budget.quarterId)=[#quarterId]) AND ((Account.typeId)=[#accountType]));
So, if I understand you correctly, you've made separate column headers with the same name, and so your data becomes skewed because the information needs to be separated? If this is the case I would suggest changing the column headers as you've proposed, or in linking two queries together. To connect the data by querying the same tagged name will combine results. If you want to designate something, it's always a good idea to create separate names for column headers.
Here is an explanation of using SQL to query multiple tables: http://www.techrepublic.com/article/sql-basics-query-multiple-tables/1050307
First make the query for the Cards, then union with the query for the Accounts
Although it would be easier to relate cards to accounts and then only have budgets for accounts, however i don't know if that would work with your schema