transactions and balance - sql

I work on a contracting company's database (SQL Server). I'm not sure what the best solution is for calculating their customers' account balances.
1. Balance table: create one table for balances and another for transactions. My application adds every transaction to the transactions table and updates the stored value in the balance table.
2. Calculate the balance with a query: I create only the transactions table.
Note: there may be up to 2 million records per year, so I think they will need to back it up every year or something like that.
Any new ideas or comments?

If I were you, I would have both a transactions table and a balances table. Suppose, for example, that you have 1,000,000 users. If a user has 20 transactions on average, then getting a balance from the transactions table reads about 20 rows where a lookup in a balances table reads one, so it would be roughly 20x slower. Also, it is better to have the stored balance and not need it than to need it and not have it.
So I would create a balances table without thinking twice.

Comments on your two options:
1. Balance table: a good solution if you have many more reads than updates (100 times or more). You add the new transaction, recalculate the balance and store it. You can do this in one database transaction, but it can take a long time and block the user's action. Alternatively, you can do it later (for example, update balances once a minute/hour/day). Pros: fast reads. Cons: the stored balance may differ from the sum of transactions, or the user's action takes longer.
2. Query only: a good solution if you have many more updates than reads (for example, a trading system with a lot of transactions). Updating the current balance takes time and may be wasted work, because another transaction has already arrived :) So you calculate the balance at runtime, on demand. Pros: the balance is always up to date. Cons: calculating the balance can take time.
As you can see, it depends on your workload profile (reads vs. writes). I'd advise you to begin with the second option: it's easy to implement, and good DB indexes can help you get the sum very fast (2 million rows per year is not as much as it looks). But it's up to you.
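To make the second option concrete, here is a minimal sketch of computing a balance on demand with a supporting index. It uses Python's built-in sqlite3 as a stand-in for SQL Server; the table, column and index names are hypothetical, and amounts are whole currency units to keep the arithmetic exact.

```python
import sqlite3

# SQLite stands in for SQL Server; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      INTEGER NOT NULL   -- positive = credit, negative = debit
    );
    -- Index on (customer_id, amount) so the SUM can be served from the index.
    CREATE INDEX ix_transactions_customer ON transactions (customer_id, amount);
""")

conn.executemany(
    "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
    [(1, 500), (1, -120), (2, 75), (1, 30)],
)


def balance_on_demand(conn, customer_id):
    """Sum the customer's transactions at read time; nothing is stored."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


print(balance_on_demand(conn, 1))  # 410
print(balance_on_demand(conn, 3))  # 0 (customer with no transactions)
```

Whether this stays fast enough at 2 million rows a year depends on how many rows a single customer accumulates, so it is worth measuring on realistic data before adding a balance table.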

You should definitely have a separate Balance table beside the transaction table. Otherwise reading a balance will get slower day by day as the number of transactions grows, and writes will become costly because other users may lock the transaction table while reading balances at the same time.

This question would seem to have a lot of opinion, and I was tempted to close it.
But, in any environment where I've been where customers have "balances", a critical part of the business is knowing the current balance for each customer. This means having a historical transaction table, a current balance amount, and an auditing process to ensure that the two are aligned.
The current balance would be maintained whenever the database is changed. The "standard" method is to use triggers. My preferred method is to encapsulate data changes in stored procedures, and have the logic for the summarization in the same procedures used to modify the transaction data.
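As a rough illustration of the three pieces described above (transaction history, a maintained current balance, and an audit that checks they agree), here is a sketch using a trigger. It runs on Python's built-in sqlite3 rather than SQL Server, so the trigger syntax differs from T-SQL, and every table, column and trigger name is an assumption made for the example.

```python
import sqlite3

# SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (
        customer_id INTEGER PRIMARY KEY,
        balance     INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE transactions (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES balances (customer_id),
        amount      INTEGER NOT NULL
    );
    -- Keep the stored balance in step with every inserted transaction.
    CREATE TRIGGER trg_transactions_insert AFTER INSERT ON transactions
    BEGIN
        UPDATE balances SET balance = balance + NEW.amount
        WHERE customer_id = NEW.customer_id;
    END;
""")
conn.execute("INSERT INTO balances (customer_id) VALUES (1), (2)")
conn.executemany(
    "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
    [(1, 500), (1, -120), (2, 75)],
)

# Audit: list every customer whose stored balance differs from the history.
AUDIT_SQL = """
    SELECT b.customer_id, b.balance, COALESCE(SUM(t.amount), 0) AS computed
    FROM balances b
    LEFT JOIN transactions t ON t.customer_id = b.customer_id
    GROUP BY b.customer_id, b.balance
    HAVING b.balance <> COALESCE(SUM(t.amount), 0)
"""


def audit(conn):
    return conn.execute(AUDIT_SQL).fetchall()


print(audit(conn))  # [] while the trigger has handled every change

# Simulate someone editing the stored balance directly, bypassing the trigger.
conn.execute("UPDATE balances SET balance = 999 WHERE customer_id = 2")
print(audit(conn))  # [(2, 999, 75)]
```

The same audit query works unchanged if the balance is maintained from a stored procedure instead of a trigger; only the maintenance step moves.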


Calculating running balance from join table [SQL Database Design]

Let's say I have three tables
TRANSACTIONS
amount
date
RECORDS
amount
date
CUSTOM_RECORDS
amount
date
(Let's just say there are many other fields to justify splitting of these tables)
To calculate running balance I have two methods
-------------METHOD 1 -------------
Heavy on READ, Light on WRITE
Whenever we read, just join the table, sort by date and calculate the running balance.
PRO
Write is easy, just write into each table
CON
Reading is very heavy, the calculation needs to be done on each read.
It is very strange to be querying (from let's say a span of 1 week) and to have the calculation done for ALL the records. If I query for 10 records then calculation needs to be done for 1 million records to be able to know the 10 record balance.
-------------METHOD 2 -------------
Heavy on WRITE, Light on READ
I have another table
FINAL_TABLE
date
amount
running balance
Whenever I write, I refresh this table and calculate all the running balance again.
PRO
Read is easy, running balance already computed.
Querying between time period is as easy as extracting the date between the time span from the FINAL_TABLE
CON
Write is really slow: each write to any of the three tables means refreshing the whole FINAL_TABLE!
Why don't I just reuse the latest running balance? That would work if entries were guaranteed to arrive in chronological order. However, an entry is sometimes added late.
Currently I am using Method 2, and every time a client saves or updates a row in any of the three tables, the server freezes while it refreshes and recomputes FINAL_TABLE. Obviously, this is not very scalable.
Method 1 is also not very scalable in terms of querying. I would have to calculate the running balance from the beginning of time in order to know the running balance for last week.
Neither method is very scalable. What is a good design to ensure scalability and relatively fast performance on both READ and WRITE? What method do banks use to keep track of running balances?
It depends.
Suppose you have a report, such as a transaction report, that shows each account's running balance. If you want to show real-time data, then Method 1 is preferable. I suggest using a Quirky Update for this rather than cursors, loops, sub-queries or recursion.
On the other hand, if you don't need a real-time running total, you could use Method 2 with a little customization. I would not update the final table as part of each transaction. Instead, I suggest updating it on a schedule; depending on your traffic or load, you can refresh the running total at a suitable interval.
For real-time data I discourage Method 2, as it makes each transaction costly.
To make Method 1 faster, here are some links:
Calculating Running Total
Quirky Update
Quirky Update Performance
Halloween Protection
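For comparison, Method 1 can also be written with a windowed SUM instead of the Quirky Update recommended above. This is a different technique, shown only as a sketch: SUM() OVER (ORDER BY ...) is available in SQL Server 2012 and later, and here it runs on Python's built-in sqlite3 (SQLite 3.25 or newer). The three tables follow the question; the sample rows are made up.

```python
import sqlite3

# SQLite stands in for SQL Server; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions   (amount INTEGER NOT NULL, date TEXT NOT NULL);
    CREATE TABLE records        (amount INTEGER NOT NULL, date TEXT NOT NULL);
    CREATE TABLE custom_records (amount INTEGER NOT NULL, date TEXT NOT NULL);
    INSERT INTO transactions   VALUES (100, '2024-01-01'), (-30, '2024-01-03');
    INSERT INTO records        VALUES (50,  '2024-01-02');
    INSERT INTO custom_records VALUES (-20, '2024-01-04');
""")

# Combine the three tables, then accumulate amounts in date order.
# Dates are unique in this sample; with ties, add a tie-breaker column
# to the ORDER BY so the running balance is deterministic.
RUNNING_SQL = """
    SELECT date, amount,
           SUM(amount) OVER (
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_balance
    FROM (
        SELECT amount, date FROM transactions
        UNION ALL SELECT amount, date FROM records
        UNION ALL SELECT amount, date FROM custom_records
    ) AS ledger
    ORDER BY date
"""

rows = conn.execute(RUNNING_SQL).fetchall()
for row in rows:
    print(row)
# ('2024-01-01', 100, 100)
# ('2024-01-02', 50, 150)
# ('2024-01-03', -30, 120)
# ('2024-01-04', -20, 100)
```

This still reads every row from the beginning of time, so it addresses the complexity of the calculation but not the cost of querying a one-week span.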
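-- Column types are assumptions; the original listed only names and keys.
CREATE TABLE AccBalance
(
    AccountNO INT NOT NULL PRIMARY KEY,
    Balance DECIMAL(19, 4) NOT NULL
);
CREATE TABLE AccDateWiseCumBalance
(
    AccountNO INT NOT NULL,
    SystemDate DATE NOT NULL,
    CumulativeBalance DECIMAL(19, 4) NOT NULL,
    PRIMARY KEY (AccountNO, SystemDate)
);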
The first table is updated by every transaction and keeps the real-time balance, but no history.
The second table keeps the cumulative balance per account and date, and is updated at the end of each day.
So if you need the cumulative balance up to a previous date, you retrieve it from the second table.
If you need the cumulative balance up to the current date, you retrieve the data up to the day before the current date from the second table and the current date's data from the first table.

What is the best practice database design for transactions aggregation?

I am designing a database which will hold transaction level data. It will work the same way as a bank account - debits/credits to an Account Number.
What is the best / most efficient way of obtaining the aggregation of these transactions.
I was thinking about using a summary table and then adding these to a list of today's transactions in order to derive how much each account has (i.e their balance).
I want this to be scalable (ie. 1 billion transactions) so don't want to have to perform database hits to the main fact table as it will need to find all the debits/credits associated with a desired account number scanning potentially a billion rows.
Thanks, any help or resources would be awesome.
(I have been working in banks for almost 10 years. Here is how it is actually done.)
TLDR: your idea is good.
Every now and then you store the balance somewhere else (a "carry forward balance"), e.g. every month or so, or after a given number of transactions. To calculate the current balance (or any balance in the past), you accumulate all relevant transactions going back in time until the most recent carry forward balance you kept, which you then add, of course.
The "current" balance is not kept anywhere, if only because of the locking problems you would have if you updated it all the time. (In real banks you hit some bank-internal accounts with almost every single transaction. There are plenty of bank-internal accounts, which exist to produce the figures required by law. These accounts are hit very often and would therefore cause locking issues if you updated a balance on them with every transaction. Instead, every transaction is just an insert; even the carry forward balances are just inserts.)
Also in real banks you have many use cases which make this approach more favourable:
Being able to get back-dated balances at any time.
Being able to get balances based on different dates for any time (e.g. value date vs. transaction date).
Reversals/cancellations are fun of their own. Imagine reversing a transaction from two weeks ago while still keeping all of the above going.
You see, this is a long story. However, the answer to your question is: yes, you cannot keep accumulating an ever-increasing number of transactions; you need to keep intermediate balances to limit how many you accumulate when needed. Hitting the main table for a limited number of rows should be no issue.
Make sure your main query uses an Index-Only Scan.
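The carry forward scheme described above can be sketched roughly as follows: the balance for any date is the most recent stored snapshot on or before that date, plus the transactions booked after the snapshot. This uses Python's built-in sqlite3 as a stand-in database; the table, column and function names are hypothetical, and it assumes a snapshot covers every transaction booked on or before its as_of date (it does not handle entries back-dated behind an existing snapshot).

```python
import sqlite3

# SQLite stands in for the real database; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id         INTEGER PRIMARY KEY,
        account_no INTEGER NOT NULL,
        booked_on  TEXT    NOT NULL,   -- ISO date, so string order = date order
        amount     INTEGER NOT NULL
    );
    CREATE TABLE carry_forward (
        account_no INTEGER NOT NULL,
        as_of      TEXT    NOT NULL,   -- includes all transactions up to this date
        balance    INTEGER NOT NULL,
        PRIMARY KEY (account_no, as_of)
    );
    CREATE INDEX ix_tx_account_date ON transactions (account_no, booked_on, amount);

    INSERT INTO transactions (account_no, booked_on, amount) VALUES
        (1, '2024-01-10', 1000), (1, '2024-01-20', -200),
        (1, '2024-02-05', -50),  (1, '2024-02-15', 300);
    INSERT INTO carry_forward VALUES (1, '2024-01-31', 800);
""")


def balance_as_of(conn, account_no, day):
    """Latest snapshot on or before `day`, plus the transactions after it."""
    snapshot = conn.execute(
        "SELECT as_of, balance FROM carry_forward "
        "WHERE account_no = ? AND as_of <= ? ORDER BY as_of DESC LIMIT 1",
        (account_no, day),
    ).fetchone()
    # No snapshot yet: start from zero and read from the beginning.
    as_of, base = snapshot if snapshot else ("", 0)
    delta = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions "
        "WHERE account_no = ? AND booked_on > ? AND booked_on <= ?",
        (account_no, as_of, day),
    ).fetchone()[0]
    return base + delta


print(balance_as_of(conn, 1, "2024-02-10"))  # 750  (800 - 50)
print(balance_as_of(conn, 1, "2024-02-29"))  # 1050 (800 - 50 + 300)
print(balance_as_of(conn, 1, "2024-01-15"))  # 1000 (no snapshot yet)
```

Because snapshots are only ever inserted, no row is updated on the hot path, which is the locking advantage the answer points out.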
Do an object-oriented design: create a table for each object, for example Account, Transaction, etc. Here's a good website for your reference. There is a lot more on the web discussing OODBMS; the reference I gave is just what I based my work on when I started with OODBMS.

Select sum or updating a field for Total Balance?

I'm using Entity Framework and Azure SQL.
I have users, and they have records in a balance table. Some users may have 1 million records. I need a user's total balance before every HTTP request.
I have two approaches for getting total balance of user:
First:
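Insert the balance row and update the Totalbalance field in a single transaction:
-- Names other than CustomerSummary and Totalbalance are assumed.
BEGIN TRANSACTION;
INSERT INTO Balance (CustomerId, Amount) VALUES (@CustomerId, @Amount);
UPDATE CustomerSummary SET Totalbalance = Totalbalance + @Amount WHERE CustomerId = @CustomerId;
COMMIT TRANSACTION;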
If I need total balance I'll just select this from CustomerSummary table.
Second: insert the balance row directly, without using any transaction.
If I need the total balance, I have to get the sum with a query.
Is the first approach reliable for the total balance?
Can the sum in the second approach be as fast as the first approach?
The second approach is guaranteed to be accurate -- if you want the sum of a particular column, there is nothing more accurate than a query that calculates the sum.
The reason for maintaining a summary table is performance. Typically, such a table is maintained in one of two ways:
Triggers
Stored procedures that wrap all data modification operations
Your example with the insert is an "application-side" solution. The danger is that someone might come along and say that a balance is incorrect and then have the value changed directly in the database. The total doesn't get changed.
To make this work correctly, you need to have the right controls over access to the database to ensure that whenever amount changes, then all its dependencies change. Note: this is not an issue if you calculate the balance when you need it.
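As a sketch of the "wrap all data modification operations" idea, the function below is the only place that inserts a balance row, and it updates the summary in the same transaction so that either both changes happen or neither does. It uses Python's built-in sqlite3 instead of Entity Framework and Azure SQL, and the table, column and function names are hypothetical.

```python
import sqlite3

# SQLite stands in for Azure SQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance_entries (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      INTEGER NOT NULL
    );
    CREATE TABLE customer_summary (
        customer_id   INTEGER PRIMARY KEY,
        total_balance INTEGER NOT NULL
    );
    INSERT INTO customer_summary VALUES (1, 0);
""")


def add_entry(conn, customer_id, amount):
    """Insert the entry and update the summary atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO balance_entries (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        cur = conn.execute(
            "UPDATE customer_summary SET total_balance = total_balance + ? "
            "WHERE customer_id = ?",
            (amount, customer_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no summary row for customer {customer_id}")


add_entry(conn, 1, 100)
add_entry(conn, 1, -40)
try:
    add_entry(conn, 2, 50)  # customer 2 has no summary row
except LookupError:
    pass  # the insert above was rolled back together with the failed update

total = conn.execute(
    "SELECT total_balance FROM customer_summary WHERE customer_id = 1"
).fetchone()[0]
entries = conn.execute("SELECT COUNT(*) FROM balance_entries").fetchone()[0]
print(total, entries)  # 60 2
```

This only protects changes that go through add_entry; a direct edit in the database still bypasses it, which is why the access controls mentioned above matter.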

Best way to calculate sum depending on dates with SQL

I don't know a good way to maintain sums depending on dates in a SQL database.
Take a database with two tables:
Client
clientID
name
overdueAmount
Invoice
clientID
invoiceID
amount
dueDate
paymentDate
I need to show a list of clients ordered by overdue amount (the sum of the client's unpaid invoices whose due date has passed). On a big database it isn't possible to calculate this in real time.
The problem is maintaining the overdueAmount field on the client. Its value can change at midnight from one day to the next even if nothing changed on the client's invoices.
The sum changes when an invoice is paid, when a new invoice is created with a due date already past, when a due date passes that had not passed yesterday...
The only solution I found is to recalculate this field every night for every client by summing the invoices that meet the conditions. But that is not efficient on very big databases.
I think this is a common problem, and I would like to know whether a best practice exists.
You should read about data warehousing. It will help you solve this problem. It looks similar to what you just said:
"The only solution I found is to recalculate every night this field
on every client by summing the invoices respecting the conditions. But
it's not efficient on very big databases."
But there is more to it than that. When you read about it, try to forget about normalization: a data warehouse is meant for showing data, not for managing it. It may feel strange at first, but once you understand why data warehousing is needed, it becomes very interesting.
This classic book is a good start: http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
Firstly, I'd like to understand what you mean by "very big databases" - most RDBMS systems running on decent hardware should be able to calculate this in real time for anything less than hundreds of millions of invoices. I speak from experience here.
Secondly, "best practice" is one of those expressions that mean very little - it's often used to present someone's opinion as being more meaningful than simply an opinion.
In my opinion, by far the best option is to calculate it on the fly.
If your database is so big that you really can't do this, I'd consider a nightly batch (as you describe). Nightly batch runs are a pain - especially for systems that need to be available 24/7, but they have the benefit of keeping all the logic in a single place.
If you want to avoid nightly batches, you can use triggers to populate an "unpaid_invoices" table. When you create a new invoice record, a trigger copies that invoice to the "unpaid_invoices" table; when you update the invoice with a payment, and the payment amount equals the outstanding amount, you delete from the unpaid_invoices table. By definition, the unpaid_invoices table should be far smaller than the total number of invoices; calculating the outstanding amount for a given customer on the fly should be okay.
However, triggers are nasty, evil things, with exotic failure modes that can stump the unsuspecting developer, so only consider this if you have a ninja SQL developer on hand. Absolutely make sure you have a SQL query which checks the validity of your unpaid_invoices table, and ideally schedule it as a regular task.
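To show what "calculate it on the fly" could look like, here is a sketch of the overdue query using the Client and Invoice tables from the question. Instead of a trigger-maintained unpaid_invoices table, it uses an index restricted to unpaid invoices (a partial index in SQLite, a filtered index in SQL Server 2008 and later), which is a different technique with a similar aim. It runs on Python's built-in sqlite3; the index name, the sample rows and the function name are made up.

```python
import sqlite3

# SQLite stands in for the real database; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (
        clientID INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoiceID   INTEGER PRIMARY KEY,
        clientID    INTEGER NOT NULL REFERENCES client (clientID),
        amount      INTEGER NOT NULL,
        dueDate     TEXT    NOT NULL,  -- ISO date
        paymentDate TEXT               -- NULL while unpaid
    );
    -- Index only the unpaid invoices, which should be a small subset.
    CREATE INDEX ix_invoice_unpaid ON invoice (clientID, dueDate, amount)
        WHERE paymentDate IS NULL;

    INSERT INTO client VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO invoice VALUES
        (1, 1, 100, '2024-01-10', NULL),
        (2, 1, 50,  '2024-03-01', NULL),
        (3, 2, 300, '2024-01-05', '2024-01-04'),
        (4, 2, 70,  '2024-01-20', NULL);
""")

OVERDUE_SQL = """
    SELECT c.clientID, c.name, SUM(i.amount) AS overdue
    FROM client c
    JOIN invoice i ON i.clientID = c.clientID
    WHERE i.paymentDate IS NULL AND i.dueDate < ?
    GROUP BY c.clientID, c.name
    ORDER BY overdue DESC
"""


def overdue_clients(conn, today):
    """Clients with unpaid invoices past their due date, largest first."""
    return conn.execute(OVERDUE_SQL, (today,)).fetchall()


print(overdue_clients(conn, "2024-02-01"))  # [(1, 'Acme', 100), (2, 'Bolt', 70)]
print(overdue_clients(conn, "2024-03-02"))  # [(1, 'Acme', 150), (2, 'Bolt', 70)]
```

Because the date is a query parameter, the result changes at midnight without any stored field having to be recalculated.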

Recommendations for best SQL Performance updating and/or calculating stockonhand totals

Apologies for the length of this question.
I have a section of our database design which I am worried may begin to cause problems. It is not at that stage yet, but obviously don't want to wait until it is to resolve the issue. But before I start testing various scenarios, I would appreciate input from anyone who has experience with such a problem.
Situation is Stock Control and maintaining the StockOnHand value.
It would be possible to maintain a table holding the stock control figures, which can be updated whenever an order is entered, either manually or by using a database trigger.
Alternatively you can get SQL to calculate the quantities by reading and summing the actual sales values.
The program is installed on several sites some of which are using MS-SQL 2005 and some 2008.
My problem is complicated because the same design needs to cope with several scenarios,
such as :
1) Cash/Sale Point Of Sale Environment. Sale is entered and stock is reduced in one transaction. No amendments can be made to this transaction.
2) Order/Routing/Confirmation
In this environment, the order is created and can be placed on hold, released, routed, amended, delivered, and invoiced. At any stage until it is invoiced, the order can be amended. (I mention this because any database trigger may be invoked many times and has to determine whether the changes should affect the stock on hand figures.)
3) Different businesses have different ideas of when their StockOnHand should be reduced. For example, some consider the stock as sold once they approve an order (as they have committed to sell the goods and hence it should not be sold to another person). Others do not consider the stock as sold until they have routed it, and some others only when it has been delivered or collected.
4) There can be a large variance in the number of transactions per product. For example, one system has four or five products which are sold several thousand times per month, so asking SQL to sum those transactions means reading tens of thousands of transactions per year, whereas on the same system there are several thousand other products with fewer than a thousand transactions per year per product.
5) Historical information is important. For that reason, our system does not delete or archive transactions and has several years worth of transactions.
6) The system must have the ability to warn operators if they do not have the required stock when the order is entered (which quite often is in real time, e.g. a telephone order).
Note that this is only required for some products. (But I don't think it would be practical to sum the quantity across tens of thousands of transactions in real time.)
7) Average Cost Price. Some products can be priced based on the average cost of the items in stock. The way this is implemented is that the Average Cost Price is re-calculated for every goods-in transaction, something like newAverageCostPrice = (((oldAverageCostPrice * oldStockOnHand) + newCostValue) / newStockOnHand). This means the StockOnHand must be known for every goods-in transaction if the product is using AverageCost.
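A worked example of the average cost formula in point 7, as a small sketch. It assumes newCostValue is the total cost of the goods received (quantity times unit cost) and that newStockOnHand is the old stock plus the received quantity; the function and parameter names are made up for illustration.

```python
from decimal import Decimal


def receive_goods(old_avg_cost, old_stock_on_hand, qty_in, unit_cost):
    """Return (new average cost price, new stock on hand) after a goods-in.

    Assumes newCostValue = qty_in * unit_cost and
    newStockOnHand = old_stock_on_hand + qty_in.
    """
    new_stock_on_hand = old_stock_on_hand + qty_in
    if new_stock_on_hand <= 0:
        raise ValueError("average cost is undefined without stock on hand")
    new_cost_value = qty_in * unit_cost
    new_avg_cost = (
        old_avg_cost * old_stock_on_hand + new_cost_value
    ) / new_stock_on_hand
    return new_avg_cost, new_stock_on_hand


# 100 units at an average of 10.00, then 50 units received at 13.00 each:
# (10.00 * 100 + 50 * 13.00) / 150 = 1650 / 150 = 11
new_avg, new_soh = receive_goods(Decimal("10.00"), 100, 50, Decimal("13.00"))
print(new_avg == Decimal("11"), new_soh)  # True 150
```

The example makes the dependency visible: without a trustworthy oldStockOnHand at the moment of the goods-in, the new average cannot be computed.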
The way the system is currently implemented is two fold.
We have a table which holds the StockOnHand for each product and location. Whenever a sale is updated, this table is updated via the business layer of our application (C#)
This only provides the current stock on hand figures.
If you need to run a Stock Valuation for a particular date, this figure is calculated by summing the quantities on the lines involved. This also requires a join between the sale line and sale header tables, as the quantity and product are stored in the line table and the date and status are only held in the header table.
However, there are downsides to this method, such as.
Running the stock valuation report is slow, (but not unacceptably slow), but I am not happy with it. (It works and monitoring the server does not show it overloading it, but it has the potential to cause problems and hence requires regular monitoring)
The logic of the code updating the StockOnHand table is complicated.
This table is being updated frequently. In a lot of cases this is unnecessary, as the information does not need to be checked. For example, if 90% of your business is selling 4 or 5 products, you don't really need a computer to tell you that you are out of stock.
Database triggers.
I have never implemented complicated triggers before, so I am wary of this.
For example, as stated before, we need configuration options to determine the conditions under which the stock figures should be updated. This is currently read once and cached in our program. To do this inside a trigger would presumably mean reading this information on every trigger invocation. Does this have a big impact on performance?
Also, we may need a trigger on both the sale header and the sale line. This could mean that an amendment to the sale header would be forced to read the lines and update the StockOnHand for the relevant products, and then later on the lines are saved and another database trigger would amend the StockOnHand table again, which may be inefficient.
Another alternative would be to only update the StockOnHand table whenever the transaction is invoiced (which means no further amendments can be done) and to provide a function to calculate the stockonhand figure based on a union of this table and the un-invoiced transactions which affect stock.
Any advice would be greatly appreciated.
First off, I would strongly recommend you add "StockOnHand", "ReservedStock" and "SoldStock" to your table.
A cash sale would immediately subtract the sale from "StockOnHand" and add it to "SoldStock". For an order, you would leave "StockOnHand" alone and merely add the sale to "ReservedStock"; when the stock is finally invoiced, you subtract the sale from "StockOnHand" and "ReservedStock" and add it to "SoldStock".
The business users can then choose whether the stock on hand they see is just StockOnHand or StockOnHand - ReservedStock.
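The three figures and the movements between them can be sketched as a small state model. This is only an illustration of the bookkeeping described above, not of how it would be stored; the class and method names are hypothetical, and validation (such as refusing to reserve more than is on hand) is left out.

```python
from dataclasses import dataclass


@dataclass
class StockLevel:
    """Hypothetical per-product figures: on hand, reserved, sold."""
    on_hand: int
    reserved: int = 0
    sold: int = 0

    def cash_sale(self, qty: int) -> None:
        # Sold immediately: leaves on_hand, goes straight to sold.
        self.on_hand -= qty
        self.sold += qty

    def place_order(self, qty: int) -> None:
        # Order only reserves stock; on_hand is left alone.
        self.reserved += qty

    def invoice_order(self, qty: int) -> None:
        # Invoicing releases the reservation and records the sale.
        self.on_hand -= qty
        self.reserved -= qty
        self.sold += qty

    @property
    def available(self) -> int:
        # The stricter view: on hand minus what is already promised.
        return self.on_hand - self.reserved


stock = StockLevel(on_hand=100)
stock.cash_sale(10)        # on_hand 90, reserved 0,  sold 10
stock.place_order(20)      # on_hand 90, reserved 20, sold 10
print(stock.available)     # 70
stock.invoice_order(20)    # on_hand 70, reserved 0,  sold 30
print(stock.on_hand, stock.reserved, stock.sold, stock.available)  # 70 0 30 70
```

Note that invoicing an order does not change the available figure, because the quantity was already excluded when it was reserved.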
Using a maintained StockOnHand figure will reduce your query times massively, at the small risk that the figure can go out of kilter if you mess up your program logic.
If your customers are lucky enough to experience update contention when maintaining the StockOnHand figure (i.e. if they are likely to process more than five sales a second at peak times), then you can consider the following scheme:
Overnight, calculate the StockOnHand figure by counting deliveries minus sales, or whatever applies.
When a sale is confirmed, insert a row into a "Today's Sales" table.
When you need to query stock on hand, total up today's sales and subtract them from the start-of-day figure.
You could also place a "Stock Check Threshold" on each product: if you start the day with 10,000 widgets, you can set the CheckThreshold to 100. If someone orders fewer than 100, don't bother checking the stock. If someone orders over 100, check the stock and recalculate a new, lower threshold.
Could you create a view (or views) to represent your stock on hand? This would take the responsibility for doing the calculations out of synchronous triggers, which slow down your transactions. Using multiple views could satisfy the requirement that different businesses reduce their StockOnHand at different points. Assuming you can meet the stringent requirements, creating an indexed view could further improve your performance.
Just some ideas off the top of my head:
Instead of a trigger (and persistent SOH data), you could use a computed column (e.g. SOH per product per store). However, the performance impact of evaluating this would likely be abysmal unless there are far more writes to your source tables than reads from your computed column. (The trade-off assumes that the only reason you calculate the SOH is so that you can read it again. If you update the source data for the calculation much more often than you actually need to read it, then the computed column might make sense, since it is evaluated just in time, only when needed. This would be unusual though: reads are usually more frequent than writes in most systems.)
I'm guessing that the reason you are looking at triggers is that the source tables for the SOH figures are updated from a large number of procs / pieces of code, and a trigger prevents oversights (as opposed to calling a recalc SPROC from every applicable point where the source data has been modified)?
IMHO placing complicated logic in DB triggers is not advised, as this will adversely affect the performance of high-volume inserts / updates, and triggers aren't great for maintainability.
Does the SOH calculation need to be real time? If not, you could implement a mechanism to queue requests for recalculation (e.g. by using a trigger to indicate that a product / location balance is dirty) and then run a recalculation service every few minutes for near real-time. Mission critical calculations (e.g. financial - like your #6) could still however detect that a SOH calc is dirty and then force a recalc before doing a transaction.
Re : 3 - Ouch. Would recommend that internally you agree on a consistent (and industry acceptable) set of terminology (Stock In Hand, Stock Committed, Stock in Transit, Shrinkage etc etc) and then try to convince your customers to conform to a standard. But that is in the ideal world of course!
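The "mark as dirty, recalculate later" idea above can be sketched as follows: a trigger only flags the affected product, a periodic job recomputes the flagged rows, and a critical check forces a recalculation when it sees the flag. It runs on Python's built-in sqlite3 as a stand-in for SQL Server, and all table, column, trigger and function names are hypothetical. A production version would also need to handle movements that arrive while a recalculation is running.

```python
import sqlite3

# SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_movement (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        qty        INTEGER NOT NULL      -- positive = in, negative = out
    );
    CREATE TABLE stock_on_hand (
        product_id INTEGER PRIMARY KEY,
        qty        INTEGER NOT NULL DEFAULT 0,
        dirty      INTEGER NOT NULL DEFAULT 0
    );
    -- The trigger does the cheapest possible thing: flag the product.
    CREATE TRIGGER trg_mark_dirty AFTER INSERT ON stock_movement
    BEGIN
        UPDATE stock_on_hand SET dirty = 1 WHERE product_id = NEW.product_id;
    END;
    INSERT INTO stock_on_hand (product_id) VALUES (1), (2);
""")


def recalc_dirty(conn):
    """Recompute SOH for flagged products; meant to run every few minutes."""
    with conn:
        cur = conn.execute("""
            UPDATE stock_on_hand
            SET qty = (SELECT COALESCE(SUM(m.qty), 0)
                       FROM stock_movement m
                       WHERE m.product_id = stock_on_hand.product_id),
                dirty = 0
            WHERE dirty = 1
        """)
    return cur.rowcount


def stock_for_order(conn, product_id):
    """Critical read: force a recalculation first if the figure is stale."""
    qty, dirty = conn.execute(
        "SELECT qty, dirty FROM stock_on_hand WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    if dirty:
        recalc_dirty(conn)
        qty = conn.execute(
            "SELECT qty FROM stock_on_hand WHERE product_id = ?",
            (product_id,),
        ).fetchone()[0]
    return qty


conn.executemany(
    "INSERT INTO stock_movement (product_id, qty) VALUES (?, ?)",
    [(1, 100), (1, -30), (2, 40)],
)
stale = conn.execute(
    "SELECT qty, dirty FROM stock_on_hand WHERE product_id = 1"
).fetchone()
print(stale)                     # (0, 1): not yet recalculated, but flagged
print(stock_for_order(conn, 1))  # 70
print(recalc_dirty(conn))        # 0: nothing left to recalculate
```

Non-critical reports can read the stored figure as it is and accept that it may be a few minutes old.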