Rolling Sum SQL - sql

I have a pretty simple SQL requirement, but I want to know what the best practice is for the scenario below, as I am running into a performance issue.
I have a list of teams. Each week/round these teams pay a game fee. If a team doesn't pay, it will have an outstanding balance. All team payments go into a payments table, which is getting bigger and bigger. What is the best practice for returning a list of teams with their current balances?
What I have at the moment:
Select teams.*, (Select SUM(amount) from payments p where p.TeamID=teams.TeamID) as teambalance
from (select TeamID, TeamName from Teams) teams

I have thought about this a lot and think that the classical advice of "don't store the same information twice" is mistaken here, or at least misinterpreted.
Think about how banks must do it. Obviously, when you want to know your current balance and you've been a customer for 20 years, they don't add up 20 years of account activity to find your current balance. In light of that, I see two ways to handle it:
Choose periods to "close" and always calculate from the last closed period. This keeps the summing relatively short. The monthly statement is probably a good such anchor. Do you have a similar natural time period or business life cycle to track with?
Work backwards, by anchoring your account history in the present. Instead of starting at 0 and adding, start at current balance and go back. This is just as valid, in my opinion, and has the added benefit that you don't have to do a thing when you want to trim old history. Store the current balance, and forget the supposed denormalization. The current balance is as true an empirical fact as the starting balance, and there is no harm in anchoring your accounts this way.
You can continue to add if you like, so long as performance is okay. But it may not be optimal.
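To make the "closed period" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The period_close table, the paid_on column, and the sign convention (fees negative, payments positive) are all assumptions made for illustration; they are not part of the schema in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (TeamID INTEGER, amount NUMERIC, paid_on TEXT);
-- Hypothetical table: one row per team per closed period.
CREATE TABLE period_close (TeamID INTEGER, closed_on TEXT, closing_balance NUMERIC);

INSERT INTO payments VALUES
  (1, -50, '2023-01-10'), (1, 50, '2023-01-12'),   -- January nets to 0
  (1, -50, '2023-02-03'), (1, 20, '2023-02-05');   -- activity after the close
INSERT INTO period_close VALUES (1, '2023-01-31', 0);
""")

# Current balance = last closing balance + only the rows after that close.
current_balance = conn.execute("""
SELECT c.closing_balance + COALESCE(SUM(p.amount), 0)
FROM period_close c
LEFT JOIN payments p
       ON p.TeamID = c.TeamID AND p.paid_on > c.closed_on
WHERE c.TeamID = 1
  AND c.closed_on = (SELECT MAX(closed_on) FROM period_close WHERE TeamID = 1)
GROUP BY c.TeamID, c.closing_balance
""").fetchone()[0]

# For comparison: summing the whole history gives the same answer.
full_sum = conn.execute(
    "SELECT SUM(amount) FROM payments WHERE TeamID = 1").fetchone()[0]
print(current_balance, full_sum)  # -30 -30
```

The point is only that the query touches rows after the last close, so the amount of work stays bounded no matter how long the history gets.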
Your current query is fine, but there is no need for the teams derived table. Unless you're using MySQL, the DBMS doesn't need this kind of "help"--though MySQL could actually be harmed by it.

select t.teamId, t.teamName, sum(p.amount) as teambalance
from teams t join payments p on t.teamId = p.teamId
group by t.teamId, t.teamName

I have used two methods to accomplish this task - one being the method that you are currently using. The other is using cross apply.
I prefer your current method -

This may be faster than using a subquery in the SELECT clause (or a join):
select teams.TeamID, teams.teamName, team_balances.teambalance
from teams
join ( select TeamID, sum(amount) teambalance
from payments
group by TeamID
) team_balances
on team_balances.TeamID = teams.TeamID;
This will sequentially scan the payments table once, rather than doing N index scans (one per team).
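One thing to watch with the join form: an inner join drops teams that have no rows in payments at all, while the correlated subquery in the question keeps them (with a NULL balance). A rough sketch with made-up data, run through Python's sqlite3 module, shows the difference and a LEFT JOIN variant that keeps every team. The sample teams and amounts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (TeamID INTEGER PRIMARY KEY, TeamName TEXT);
CREATE TABLE payments (TeamID INTEGER, amount NUMERIC);
INSERT INTO teams VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
-- Team 3 has no payment rows yet.
INSERT INTO payments VALUES (1, -50), (1, 50), (2, -50), (2, 20);
""")

# Aggregate payments once, then join: team 3 disappears.
inner_rows = conn.execute("""
SELECT t.TeamID, t.TeamName, b.teambalance
FROM teams t
JOIN (SELECT TeamID, SUM(amount) AS teambalance
      FROM payments
      GROUP BY TeamID) b ON b.TeamID = t.TeamID
ORDER BY t.TeamID
""").fetchall()

# Same aggregate with a LEFT JOIN: every team is listed, missing sums become 0.
left_rows = conn.execute("""
SELECT t.TeamID, t.TeamName, COALESCE(b.teambalance, 0) AS teambalance
FROM teams t
LEFT JOIN (SELECT TeamID, SUM(amount) AS teambalance
           FROM payments
           GROUP BY TeamID) b ON b.TeamID = t.TeamID
ORDER BY t.TeamID
""").fetchall()

print(inner_rows)  # [(1, 'Red', 0), (2, 'Blue', -30)]
print(left_rows)   # [(1, 'Red', 0), (2, 'Blue', -30), (3, 'Green', 0)]
```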
denormalizing
Another option is to add an "outstanding_balance" column to the teams table.
Create a trigger on the payments table. In the trigger, increment or decrement the outstanding_balance column in teams based on the TeamID and the invoice/payment amount, respectively.
Depending on your RDBMS you could also use a materialized view. This is similar to the trigger method, except the balance for each team would be stored in a different table.
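As a rough illustration of the trigger approach, here is a sketch in SQLite syntax (driven from Python's sqlite3 module). Trigger syntax differs between database systems, and the PaymentID column, the trigger name, and the convention that charges are positive and payments negative are assumptions for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (
    TeamID INTEGER PRIMARY KEY,
    TeamName TEXT,
    outstanding_balance NUMERIC NOT NULL DEFAULT 0
);
CREATE TABLE payments (
    PaymentID INTEGER PRIMARY KEY,   -- assumed surrogate key
    TeamID INTEGER,
    amount NUMERIC                   -- assumed: charge > 0, payment < 0
);

-- Keep the stored balance in step with every new payments row.
CREATE TRIGGER payments_after_insert AFTER INSERT ON payments
BEGIN
    UPDATE teams
    SET outstanding_balance = outstanding_balance + NEW.amount
    WHERE TeamID = NEW.TeamID;
END;

INSERT INTO teams (TeamID, TeamName) VALUES (1, 'Red');
INSERT INTO payments (TeamID, amount) VALUES (1, 50);    -- weekly fee charged
INSERT INTO payments (TeamID, amount) VALUES (1, -30);   -- partial payment
""")

stored = conn.execute(
    "SELECT outstanding_balance FROM teams WHERE TeamID = 1").fetchone()[0]
recomputed = conn.execute(
    "SELECT SUM(amount) FROM payments WHERE TeamID = 1").fetchone()[0]
print(stored, recomputed)  # 20 20
```

If rows in payments can also be updated or deleted, matching AFTER UPDATE and AFTER DELETE triggers would be needed as well, otherwise the stored balance drifts away from the sum.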

Related

Need help wrapping head around joins

I have a database of a service that helps people sell things. If they fail a delivery of a sale, they get penalised. I am trying to extract the number of active listings each user had when a particular penalty was applied.
I have the equivalent of the following tables (and relevant fields):
user (id)
listing (id, user_id, status)
transaction (listing_id, seller_id)
listing_history (id, listing_status, date_created)
penalty (id, transaction_id, user_id, date_created)
The listing_history table saves an entry every time a listing is modified, saving a record of what the new state of the listing is.
My goal is to end with a result table with the field: penalty_id, and number of active listings the penalised user had when the penalty was applied.
So far I have the following:
SELECT s1.penalty_id,
COUNT(s1.record_id) 'active_listings'
FROM (
SELECT penalty.id AS 'penalty_id',
listing_history.id AS 'record_id'
FROM user
JOIN penalty ON penalty.user_id = user.id
JOIN transaction ON transaction.id = penalty.transaction_id
JOIN listing_history ON listing_history.listing_id = listing.id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
) s1
GROUP BY s1.penalty_id
Status = 0 means that the listing is active (or that the listing was active at the time the record was created). I got results similar to what I expected, but I fear I may be missing something or may be doing the JOINs wrong. Would this have your approval? (apart from the obvious non-use of aliases, for clarity problems).
UPDATE - As the comments on this answer indicate that changing the table structure isn't an option, here are more details on some queries you could use with the existing structure.
Note that I made a couple changes to the query before even modifying the logic.
As viki888 pointed out, there was a problem reference to listing.id; I've replaced it.
There was no real need for a subquery in the original query; I've simplified it out.
So the original query is rewritten as
SELECT penalty.id AS 'penalty_id'
, COUNT(listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
GROUP BY penalty.id
Now the most natural way, in my opinion, to write the corrected timeline constraint is with a NOT EXISTS condition that filters out all but the most recent listing_history record for a given id. This does require thinking about some edge cases:
Could two listing history records have the same create date? If so, how do you decide which happened first?
If a listing history record is created on the same day as the penalty, which is treated as happening first?
If the date_created is really a timestamp, then this may not matter much (if at all); if it's really a date, it might be a bigger issue. Since your original query required that the listing history be created before the penalty, I'll continue in that style; but it's still ambiguous how to handle the case where two history records with matching status have the same date. You may need to adjust the date comparisons to get the desired behavior.
SELECT penalty.id AS 'penalty_id'
, COUNT(DISTINCT listing_history.id) 'active_listings'
FROM user
JOIN penalty
ON penalty.user_id = user.id
JOIN transaction
ON transaction.id = penalty.transaction_id
JOIN listing_history
ON listing_history.listing_id = transaction.listing_id
WHERE listing_history.date_created < penalty.date_created
AND listing_history.status = 0
AND NOT EXISTS (SELECT 1
FROM listing_history h2
WHERE listing_history.date_created < h2.date_created
AND h2.date_created < penalty.date_created
AND h2.listing_id = listing_history.listing_id)
GROUP BY penalty.id
Note that I switched from COUNT(...) to COUNT(DISTINCT ...); this helps with some edge cases where two active records for the same listing might be counted.
If you change the date comparisons to use <= instead of < - or, equivalently, if you use BETWEEN to combine the date comparisons - then you'd want to add AND h2.status != 0 (or AND h2.status <> 0, depending on your database) to the subquery so that two concurrent ACTIVE records don't cancel each other out.
There are several equivalent ways to write this, and unfortunately it's the kind of query that doesn't always cooperate with a database query optimizer, so some trial and error may be necessary to make it run well with large data volumes. Hopefully that gives enough insight into the intended logic that you could work out some equivalents if need be. You could consider using NOT IN instead of NOT EXISTS; or you could use an outer join to a second instance of LISTING_HISTORY... There are probably others I'm not thinking of offhand.
I don't know that we're in a position to sign off on a general statement that the query is, or is not, "correct". If there's a specific question about whether a query will include/exclude a record in a specific situation (or why it does/doesn't, or how to modify it so it won't/will), those might get more complete answers.
I can say that there are a couple likely issues:
The only glaring logic issue has to do with timeline management, which is something that causes a lot of trouble with SQL. The issue is, while your query demonstrates that the listing was active at some point before the penalty creation date, it doesn't demonstrate that the listing was still active on the penalty creation date. Consider
PENALTY
id   transaction   date
1    10            2016-02-01
TRANSACTION
id   listing_id
10   100
LISTING_HISTORY
listing_id   status   date
100          0        2016-01-01
100          1        2016-01-15
The joins would create a single record, and the count for penalty 1 would include listing 100 even though its status had changed to something other than 0 before the penalty was created.
This is hard - but not impossible - to fix with your existing table structure. You could add a NOT EXISTS condition looking for another LISTING_HISTORY record matching the ID with a date between the first LISTING_HISTORY date and the PENALTY date, for one.
It would be more efficient to add an end date to the LISTING_HISTORY table, but that may not be so easy depending on how the data is maintained.
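To see the timeline problem and the NOT EXISTS fix side by side, here is a small sketch that loads the sample rows above into an in-memory SQLite database via Python. A second penalty on a listing that stays active (listing 200) is added as made-up data for contrast, the table is named txn because TRANSACTION is a keyword in SQLite, and the history rows are assumed to carry a listing_id column as in the joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'txn' stands in for the question's transaction table (assumed rename).
CREATE TABLE txn (id INTEGER PRIMARY KEY, listing_id INTEGER);
CREATE TABLE penalty (id INTEGER PRIMARY KEY, transaction_id INTEGER,
                      date_created TEXT);
CREATE TABLE listing_history (id INTEGER PRIMARY KEY, listing_id INTEGER,
                              status INTEGER, date_created TEXT);

INSERT INTO txn VALUES (10, 100), (20, 200);
INSERT INTO penalty VALUES (1, 10, '2016-02-01'), (2, 20, '2016-02-01');
INSERT INTO listing_history VALUES
  (1, 100, 0, '2016-01-01'),   -- listing 100 becomes active
  (2, 100, 1, '2016-01-15'),   -- and stops being active before the penalty
  (3, 200, 0, '2016-01-05');   -- listing 200 is still active at penalty time
""")

BASE = """
SELECT p.id, COUNT(DISTINCT h.listing_id)
FROM penalty p
JOIN txn t ON t.id = p.transaction_id
JOIN listing_history h ON h.listing_id = t.listing_id
WHERE h.date_created < p.date_created
  AND h.status = 0
  {extra}
GROUP BY p.id
ORDER BY p.id
"""

# Keep a history row only if no later row for the same listing
# exists before the penalty date.
LATEST_ONLY = """
  AND NOT EXISTS (SELECT 1
                  FROM listing_history h2
                  WHERE h2.listing_id = h.listing_id
                    AND h2.date_created > h.date_created
                    AND h2.date_created < p.date_created)
"""

naive = conn.execute(BASE.format(extra="")).fetchall()
fixed = conn.execute(BASE.format(extra=LATEST_ONLY)).fetchall()
print(naive)  # [(1, 1), (2, 1)]
print(fixed)  # [(2, 1)]
```

Penalty 1 drops out of the second result because, with inner joins, a penalty with no qualifying history rows produces no group at all. If a row with a count of 0 is wanted for such penalties, the history join would have to become an outer join with the conditions moved into the ON clause.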
The second potential issue is the COUNT(RECORD_ID). This may not do what you mean - what COUNT(x) may intuitively seem like it should do, is what COUNT(DISTINCT RECORD_ID) actually does. As written, if the join produces two matches with the same LISTING_HISTORY.ID value - i.e. the listing became active at two different times before the penalty - the listing would be counted twice.
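A tiny illustration of the COUNT difference, again using SQLite from Python. The joined_rows table is a hypothetical stand-in for the rows the joins produce for one penalty, where listing 100 matched twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical join output: listing 100 matched twice for penalty 1.
CREATE TABLE joined_rows (penalty_id INTEGER, listing_id INTEGER);
INSERT INTO joined_rows VALUES (1, 100), (1, 100), (1, 200);
""")

plain_count, distinct_count = conn.execute("""
SELECT COUNT(listing_id), COUNT(DISTINCT listing_id)
FROM joined_rows
WHERE penalty_id = 1
""").fetchone()
print(plain_count, distinct_count)  # 3 2
```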

SQL Payment Duplication Detection

I am dealing with a very large database for a firm; it holds a record of all payments between the firm and its suppliers. I want to find out whether duplicate or triplicate payments have been made to suppliers over a certain period of time. The only way I can interact with the database is by using a plugin via Excel and some SQL coding. Any suggestions on how I may go about doing this?
Completion of this task isn't urgent. Just want to start somewhere and hopefully develop this question over time.
SELECT suppliername,date, COUNT(*) AS Nb
FROM yourtable
where date >= 'put your date'
GROUP BY suppliername,date
HAVING ( COUNT(*) > 1 )
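Here is the same idea as a runnable sketch against SQLite from Python. The table name supplier_payments and the columns paid_on and amount are assumptions; in particular, adding amount to the GROUP BY is a design choice that treats two payments as duplicates only when supplier, date, and amount all match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_payments (suppliername TEXT, paid_on TEXT, amount NUMERIC);
INSERT INTO supplier_payments VALUES
  ('Acme', '2020-03-01', 100), ('Acme', '2020-03-01', 100),  -- paid twice
  ('Acme', '2020-03-01', 250),                               -- different amount
  ('Bolt', '2020-03-02', 75),  ('Bolt', '2020-03-02', 75),
  ('Bolt', '2020-03-02', 75),                                -- paid three times
  ('Cole', '2020-03-05', 10);
""")

# Groups with more than one row are candidate duplicate payments.
duplicates = conn.execute("""
SELECT suppliername, paid_on, amount, COUNT(*) AS Nb
FROM supplier_payments
WHERE paid_on >= ?
GROUP BY suppliername, paid_on, amount
HAVING COUNT(*) > 1
ORDER BY suppliername
""", ("2020-01-01",)).fetchall()

print(duplicates)
# [('Acme', '2020-03-01', 100, 2), ('Bolt', '2020-03-02', 75, 3)]
```

Grouping on supplier and date alone, as in the query above, would also flag the legitimate 250 payment to Acme, so which columns define "the same payment" is something to settle against the real data.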

MS Access 2010 query pulls same records multiple times, sql challenge

I'm currently working on a program that keeps track of my company's stock inventory, using MS Access 2010. I'm having a hard time getting the query that is intended to show inventory to display the information I want. The problem seems to be that the query pulls the same record multiple times, inflating the sums of reserved and sold product.
Background:
My company stocks steel bars. We offer to cut the bars into pieces. From an inventory side, we want to track the length of each bar from the moment it comes into the warehouse, through its time in the warehouse (where it might get cut into smaller pieces), until the entire bar is sold and gone.
Database:
The query giving problems consults three tables:
Barstock (with the following fields)
    BatchNumber (all the bars received belonging to the same production heat)
    BarNo (the individual bar)
    Orginial Length (the length of the bar when received at the stock)
    (BatchNumber and BarNo combined form the primary key)
Sales
    ID (primary key)
    BatchNumber
    BarNo
    Quantity Sold
Reservation (a seller can reserve some material when a customer signals interest but needs time to decide)
    ID (primary key)
    BatchNumber
    BarNo
    Quantity reserved
I'd like to pull information from the three tables into one list, that displays:
-Barstock.orginial length As Received
- Sales.Quantity sold As Sold
- Recieved - Sold As On Stock
- reservation.Quantity Reserved As Reserved
- On Stock - Reserved As Available.
The problem is that I suck at sql. I've looked into union and inner join to the best of my ability, but my efforts have been in vain. I usually rely on the design view to produce the Sql statements I need. With design view, I've come up with the following Sql:
SELECT
BarStock.BatchNo
, BarStock.BarNo
, First(BarStock.OrgLength) AS Recieved
, Sum(Sales.QtySold) AS SumAvQtySold
, [Recieved]-[SumAvQtySold] AS [On Stock]
, Sum(Reservation.QtyReserved) AS Reserved
, ([On Stock]-[Reserved])*[Skjemaer]![Inventory]![unitvalg] AS Available
FROM
(BarStock
INNER JOIN Reservation ON (BarStock.BarNo = Reservation.BarNo) AND (BarStock.BatchNo = Reservation.BatchNo)
)
INNER JOIN Sales ON (BarStock.BarNo = Sales.BarNo) AND (BarStock.BatchNo = Sales.BatchNo)
GROUP BY
BarStock.BatchNo
, BarStock.BarNo
I know that the query is pulling the same record multiple times because:
- when I remove the GROUP BY term, I get several records that are exactly the same.
- There is, however, only one instance of these records in the corresponding tables.
I hope I've been able to explain myself properly, please ask if I need to elaborate on anything.
Thank you for taking the time to look at my problem!
Checking some assumptions
From your database schema, it seems that:
There could be multiple Sales records for a given BatchNumber/BarNo (for instance, I can imagine that multiple customers may have bought subsections of the same bar).
There could be multiple Reservation records for a given BatchNumber/BarNo (for instance, multiple sections of the same bar could be 'reserved')
To check if you do indeed have multiple records in those tables, try something like:
SELECT CountOfDuplicates
FROM (SELECT COUNT(*) AS CountOfDuplicates
FROM Sales
GROUP BY BatchNumber & "," & BarNo)
WHERE CountOfDuplicates > 1
If the query returns some records, then there are duplicates and it's probably why your query is returning incorrect values.
Starting from scratch
Now, the trick to making your query work is to really think about what main data you want to show, and start from that:
You basically want a list of all bars in the stock.
Some of these bars may have been sold, or they may be reserved, but if they are not, you should show the Quantity available in Stock. Your current query would never show you that.
For each bar in stock, you want to list the quantity sold and the quantity reserved, and combine them to find out the quantity remaining available.
So it's clear, your central data is the list of bars in stock.
Rather than try to pull everything into a single large query straight away, it's best to create simple queries for each of those goals and make sure we get the proper data in each case.
Just the Bars
From what you explain, each individual bar is recorded in the BarStock table.
As I said in my comment, from what I understand, all bars that are delivered have a single record in the BarStock table, without duplicates. So your main list against which your inventory should be measured is the BarStock table:
SELECT BatchNumber,
BarNo,
OrgLength
FROM BarStock
Just the Sales
Again, this should be pretty straightforward: we just need to find out how much total length was sold for each BatchNumber/BarNo pair:
SELECT BatchNumber,
BarNo,
Sum(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber, BarNo
Just the Reservations
Same as for Sales:
SELECT BatchNumber,
BarNo,
SUM(QtyReserved) AS Reserved
FROM Reservation
GROUP BY BatchNumber, BarNo
Original Stock against Sales
Now, we should be able to combine the first 2 queries into one. I'm not trying to optimise, just to make the data work together:
SELECT BarStock.BatchNumber,
BarStock.BarNo,
BarStock.OrgLength,
S.SumAvQtySold,
(BarStock.OrgLength - Nz(S.SumAvQtySold)) AS OnStock
FROM BarStock
LEFT JOIN (SELECT BatchNumber,
BarNo,
Sum(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber, BarNo) AS S
ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)
We do a LEFT JOIN because there might be bars in stock that have not yet been sold.
If we did an INNER JOIN, we would have missed these in the final report, leading us to believe that these bars were never there in the first place.
All together
We can now wrap the whole query in another LEFT JOIN against the reserved bars to get our final result:
SELECT BS.BatchNumber,
BS.BarNo,
BS.OrgLength,
BS.SumAvQtySold,
BS.OnStock,
R.Reserved,
(OnStock - Nz(Reserved)) AS Available
FROM (SELECT BarStock.BatchNumber,
BarStock.BarNo,
BarStock.OrgLength,
S.SumAvQtySold,
(BarStock.OrgLength - Nz(S.SumAvQtySold)) AS OnStock
FROM BarStock
LEFT JOIN (SELECT BatchNumber,
BarNo,
SUM(QtySold) AS SumAvQtySold
FROM Sales
GROUP BY BatchNumber,
BarNo) AS S
ON (BarStock.BatchNumber = S.BatchNumber) AND (BarStock.BarNo = S.BarNo)) AS BS
LEFT JOIN (SELECT BatchNumber,
BarNo,
SUM(QtyReserved) AS Reserved
FROM Reservation
GROUP BY BatchNumber,
BarNo) AS R
ON (BS.BatchNumber = R.BatchNumber) AND (BS.BarNo = R.BarNo)
Note the use of Nz() for items that are on the right side of the join: if there is no Sales or Reservation data for a given BatchNumber/BarNo pair, the values for SumAvQtySold and Reserved will be Null and will render OnStock and Available null as well, regardless of the actual quantity in stock, which would not be the result we expect.
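To show why the original query inflated the sums, and that aggregating before joining avoids it, here is a sketch using SQLite from Python with made-up quantities. SQLite has no Nz(), so COALESCE(x, 0) is swapped in for it; the column names follow the queries above and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BarStock (BatchNumber TEXT, BarNo INTEGER, OrgLength INTEGER,
                       PRIMARY KEY (BatchNumber, BarNo));
CREATE TABLE Sales (ID INTEGER PRIMARY KEY, BatchNumber TEXT, BarNo INTEGER,
                    QtySold INTEGER);
CREATE TABLE Reservation (ID INTEGER PRIMARY KEY, BatchNumber TEXT, BarNo INTEGER,
                          QtyReserved INTEGER);

INSERT INTO BarStock VALUES ('H1', 1, 6000), ('H1', 2, 6000);
-- Bar 1 has two sales and two reservations; bar 2 has neither.
INSERT INTO Sales (BatchNumber, BarNo, QtySold)
  VALUES ('H1', 1, 1000), ('H1', 1, 500);
INSERT INTO Reservation (BatchNumber, BarNo, QtyReserved)
  VALUES ('H1', 1, 200), ('H1', 1, 300);
""")

# Joining both child tables directly pairs every sale with every reservation
# (2 x 2 = 4 rows for bar 1), so both sums are doubled and bar 2 is lost.
naive = conn.execute("""
SELECT b.BatchNumber, b.BarNo, SUM(s.QtySold), SUM(r.QtyReserved)
FROM BarStock b
JOIN Reservation r ON r.BatchNumber = b.BatchNumber AND r.BarNo = b.BarNo
JOIN Sales s ON s.BatchNumber = b.BatchNumber AND s.BarNo = b.BarNo
GROUP BY b.BatchNumber, b.BarNo
""").fetchall()

# Aggregate each child table first, then LEFT JOIN the one-row-per-bar results.
fixed = conn.execute("""
SELECT b.BatchNumber, b.BarNo, b.OrgLength,
       COALESCE(s.Sold, 0) AS Sold,
       COALESCE(r.Reserved, 0) AS Reserved,
       b.OrgLength - COALESCE(s.Sold, 0) AS OnStock,
       b.OrgLength - COALESCE(s.Sold, 0) - COALESCE(r.Reserved, 0) AS Available
FROM BarStock b
LEFT JOIN (SELECT BatchNumber, BarNo, SUM(QtySold) AS Sold
           FROM Sales GROUP BY BatchNumber, BarNo) s
       ON s.BatchNumber = b.BatchNumber AND s.BarNo = b.BarNo
LEFT JOIN (SELECT BatchNumber, BarNo, SUM(QtyReserved) AS Reserved
           FROM Reservation GROUP BY BatchNumber, BarNo) r
       ON r.BatchNumber = b.BatchNumber AND r.BarNo = b.BarNo
ORDER BY b.BatchNumber, b.BarNo
""").fetchall()

print(naive)  # [('H1', 1, 3000, 1000)]
print(fixed)  # [('H1', 1, 6000, 1500, 500, 4500, 4000), ('H1', 2, 6000, 0, 0, 6000, 6000)]
```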
Using the Query designer in Access, you would have had to create the 3 queries separately and then combine them.
Note though that the Query Designer isn't very good at dealing with multiple LEFT and RIGHT joins, so I don't think you could have written the whole thing in one go.
Some comments
I believe you should read the information that @Remou gave you in his comments.
To me, there are some unfortunate design choices for this database: getting basic stock data should be as easy as a simple SUM() on the column that holds inventory records.
Usually, a simple way to track inventory is to keep track of each stock transaction:
Incoming stock records have a + Quantity
Outgoing stock records have a - Quantity
The record should also keep track of the part/item/bar reference (or ID), the date and time of the transaction, and -if you want to manage multiple warehouses- which warehouse ID is involved.
So if you need to know the complete stock at hand for all items, all you need to do is something like:
SELECT BarID,
Sum(Quantity)
FROM StockTransaction
GROUP BY BarID
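A minimal sketch of that single-ledger idea, using SQLite from Python. The TransactedAt column and the sample rows are assumptions for illustration; only BarID, Quantity, and StockTransaction come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StockTransaction (
    ID INTEGER PRIMARY KEY,
    BarID INTEGER,
    Quantity INTEGER,      -- positive = incoming, negative = outgoing
    TransactedAt TEXT      -- assumed timestamp column
);
INSERT INTO StockTransaction (BarID, Quantity, TransactedAt) VALUES
  (1,  6000, '2020-01-02 08:00'),   -- bar 1 received
  (1, -1000, '2020-01-10 09:30'),   -- piece cut off and sold
  (1,  -500, '2020-01-12 14:00'),   -- another piece sold
  (2,  6000, '2020-01-03 08:00');   -- bar 2 received, untouched
""")

# Stock on hand is simply the signed sum per bar.
on_hand = conn.execute("""
SELECT BarID, SUM(Quantity)
FROM StockTransaction
GROUP BY BarID
ORDER BY BarID
""").fetchall()
print(on_hand)  # [(1, 4500), (2, 6000)]
```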
In your case, while BatchNumber/BarNo is your natural key, keeping them in a separate Bar table would have some advantages:
You can use Bar.ID to get back the Bar.BatchNumber and Bar.BarNo anywhere you need it.
You can use BarID as a foreign key in your BarStock, Sales and Reservation tables. It makes joins easier without having to mess with the complexities of compound keys.
There are things that Access allows that are not really good practice, such as spaces in table names and fields, which end up making things less readable (at least because you need to keep them between []), less consistent with VBA variable names that represent these fields, and incompatible with other databases that don't accept anything other than alphanumerical characters for table and field names (should you wish to up-size later or interface your database with other apps).
Also, help yourself by sticking to a single naming convention, and keep it consistent:
Do not mix upper and lower case inconsistently: either use CamelCase, or lower case or UPPER case for everything, but always keep to that rule.
Name tables in the singular -or the plural-, but stay consistent. I prefer to use the singular, like table Part instead of Parts, but it's just a convention (that has its own reasons).
Spell correctly: it's Received not Recieved. That mistake alone may cost you when debugging why some query or VBA code doesn't work, just because someone made a typo.
Each table should/must have an ID column. Usually, this will be an auto-increment that guarantees uniqueness of each record in the table. If you keep that convention, then foreign keys become easy to guess and to read and you never have to worry about some business requirement changing the fact that you could suddenly find yourself with 2 identical BatchNumbers, for some reason you can't fathom right now.
There are lots of debates about database design, but there are certain 'rules' that everyone agrees with, so my recommendation would be to strive for:
Simplicity: make sure that each table records one kind of data, and does not contain redundant data from other tables (normalisation).
Consistency: naming conventions are important. Whatever you choose, stick to it throughout your project.
Clarity: make sure that you-in-3-years and other people can easily read the table names and fields and understand what they are without having to read a 300 page specification. It's not always possible to be that clear, but it's something to strive for.

Using VBA to get the sum of values based on criteria from other tables?

I need to find the sum of the prices of a number of products, however the prices are stored in a different table to products that need pricing.
But, there is a catch, it needs to select these items based on criteria from a third table too.
So, I need the sum of the price of all products in Table 1 where CutID in Table 2 = 001.
Table 1 and Table 2 are linked on SCID, one to many respectively.
If this makes no sense tell me and I will try to clarify?
Thanks,
Bob P
Based on your question, I don't think there's a need for VBA. Excel formulas should be sufficient.
Add a few columns to your primary table. In these columns, use vlookup() to get all your information in one place, including the criteria.
If you only need to sum based on one criteria, use sumif(). If there's multiple criteria, use sumproduct().
Generally, with Access, I initially try to work with something as close as possible to a standard SQL query for ease of maintenance and portability. This ran for me in Access 2010:
SELECT Products.ProductID, Sum(Prices.Price) AS PriceSum
FROM Prices INNER JOIN (Critera INNER JOIN Products ON Critera.SCID = Products.SCID) ON Prices.ProductID = Products.ProductID
WHERE Critera.CutID="001"
GROUP BY Products.ProductID;
Please let us know if that works with your data (I'm not sure of your column names, either).
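For anyone who wants to try the shape of that query without Access, here is a rough equivalent against SQLite from Python. The table and column names are the guessed ones from the query above (including the Critera spelling), and the sample rows are made up. Since the question asks for one overall sum, a variant without the GROUP BY is included too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, SCID INTEGER);
CREATE TABLE Prices (ProductID INTEGER, Price REAL);
CREATE TABLE Critera (SCID INTEGER, CutID TEXT);

INSERT INTO Products VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO Prices VALUES (1, 5.0), (2, 7.5), (3, 100.0);
INSERT INTO Critera VALUES (10, '001'), (20, '002');
""")

# Per-product sums, as in the Access query.
per_product = conn.execute("""
SELECT Products.ProductID, SUM(Prices.Price) AS PriceSum
FROM Prices
JOIN Products ON Prices.ProductID = Products.ProductID
JOIN Critera ON Critera.SCID = Products.SCID
WHERE Critera.CutID = '001'
GROUP BY Products.ProductID
ORDER BY Products.ProductID
""").fetchall()

# One grand total for everything matching the criteria.
total = conn.execute("""
SELECT SUM(Prices.Price)
FROM Prices
JOIN Products ON Prices.ProductID = Products.ProductID
JOIN Critera ON Critera.SCID = Products.SCID
WHERE Critera.CutID = '001'
""").fetchone()[0]

print(per_product)  # [(1, 5.0), (2, 7.5)]
print(total)        # 12.5
```

If the criteria table can hold several matching rows for the same SCID, each product's price would be counted once per match, so the criteria side may need to be de-duplicated first.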

Query across a circle of tables

I'm curious about how I could go about getting the data I need out of a "circle" of tables.
I have 5 tables (and a few supporting ones): 3 entities joined by junction tables. So the general model is like this:
Cards have many Budgets, and Accounts have many Budgets, and Accounts have many Cards.
So my relationships make a circle, through the junction tables, from Card to Budget to Account and back to Card. This structure worked all fine and dandy until today, when I tried to construct a query using all 5 tables and noticed that I know of no way to avoid ambiguous joins with this structure in place. I'm thinking it might have been a better idea to create AccountBudget and CardBudget tables, but since they would both define exactly the same type of data, one table seemed more efficient.
The information I'm trying to get is basically the total budget limit for all cards of a certain type, and the total budget limit for all accounts of that same type. Am I just looking at this problem wrong?
// Card Budget_Card Budget Budget_Account Account
// ------- --------- -------- -------------- ---------
// cardId------\ budgetId<---------budgetId------>budgetId -----accountId--(to Card)->
// accountId --->cardId limit accountId<------/ typeId
// (etc) typeId (etc)
// (typeId in Budget is either 1 for an account budget or 2 for a card budget.)
As you can see, it's a circle. What I'm trying to accomplish is return one row with two columns: the sum of Budget.limit for the record in Account where typeId = 1, and the sum of Budget.limit for all rows in Card belonging to Accounts of the same type.
As per suggestion, I can in fact get the data I need from a union, but it's no use to me if the data is not in two separate columns:
SELECT DISTINCTROW Sum(Budget.limit) AS SumOfLimit
FROM (Account RIGHT JOIN Card ON Account.accountId = Card.accountId)
RIGHT JOIN (Budget LEFT JOIN Budget_Card ON Budget.budgetID = Budget_Card.budgetId) ON Card.cardId = Budget_Card.cardId
GROUP BY Budget.typeId, Budget.quarterId, Account.typeId
HAVING (((Budget.typeId)=2) AND ((Budget.quarterId)=[#quarterId]) AND ((Account.typeId)=[#accountType]))
UNION SELECT DISTINCTROW Sum(Budget.limit) AS SumOfLimit
FROM Budget LEFT JOIN (Account RIGHT JOIN Budget_Account ON Account.accountId = Budget_Account.accountId) ON Budget.budgetID = Budget_Account.budgetId
GROUP BY Budget.typeId, Budget.quarterId, Account.typeId
HAVING (((Budget.typeId)=1) AND ((Budget.quarterId)=[#quarterId]) AND ((Account.typeId)=[#accountType]));
So, if I understand you correctly, you've made separate column headers with the same name, and so your data becomes skewed because the information needs to be separated? If this is the case I would suggest changing the column headers as you've proposed, or in linking two queries together. To connect the data by querying the same tagged name will combine results. If you want to designate something, it's always a good idea to create separate names for column headers.
Here is an explanation of using SQL to query multiple tables: http://www.techrepublic.com/article/sql-basics-query-multiple-tables/1050307
First make the query for the Cards, then union with the query for the Accounts
It would be easier to relate cards to accounts and then only have budgets for accounts, but I don't know if that would work with your schema.
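Since the goal is one row with two columns rather than two rows, one option is to compute each total in its own scalar subquery, which also sidesteps the ambiguous-join problem because each subquery walks only one side of the circle. The sketch below uses SQLite from Python, not Access, so the syntax would need adapting; the table layout follows the diagram in the question, the quarterId column is taken from the posted query, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accountId INTEGER PRIMARY KEY, typeId INTEGER);
CREATE TABLE Card (cardId INTEGER PRIMARY KEY, accountId INTEGER);
-- "limit" is quoted because LIMIT is a keyword in SQLite.
CREATE TABLE Budget (budgetId INTEGER PRIMARY KEY, "limit" INTEGER,
                     typeId INTEGER, quarterId INTEGER);
CREATE TABLE Budget_Account (budgetId INTEGER, accountId INTEGER);
CREATE TABLE Budget_Card (budgetId INTEGER, cardId INTEGER);

INSERT INTO Account VALUES (1, 1), (2, 2);
INSERT INTO Card VALUES (11, 1), (12, 1), (21, 2);
INSERT INTO Budget VALUES
  (100, 1000, 1, 1),   -- account budget for account 1
  (104, 5000, 1, 1),   -- account budget for account 2
  (101,  300, 2, 1),   -- card budget for card 11
  (102,  200, 2, 1),   -- card budget for card 12
  (103,  999, 2, 1);   -- card budget for card 21 (other account type)
INSERT INTO Budget_Account VALUES (100, 1), (104, 2);
INSERT INTO Budget_Card VALUES (101, 11), (102, 12), (103, 21);
""")

row = conn.execute("""
SELECT
  (SELECT COALESCE(SUM(b."limit"), 0)
   FROM Budget b
   JOIN Budget_Account ba ON ba.budgetId = b.budgetId
   JOIN Account a ON a.accountId = ba.accountId
   WHERE b.typeId = 1
     AND b.quarterId = :quarter
     AND a.typeId = :account_type) AS AccountLimit,
  (SELECT COALESCE(SUM(b."limit"), 0)
   FROM Budget b
   JOIN Budget_Card bc ON bc.budgetId = b.budgetId
   JOIN Card c ON c.cardId = bc.cardId
   JOIN Account a ON a.accountId = c.accountId
   WHERE b.typeId = 2
     AND b.quarterId = :quarter
     AND a.typeId = :account_type) AS CardLimit
""", {"quarter": 1, "account_type": 1}).fetchone()

print(row)  # (1000, 500)
```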