Calculating running balance from join table [SQL Database Design] - sql

Let's say I have three tables
TRANSACTIONS
amount
date
RECORDS
amount
date
CUSTOM_RECORDS
amount
date
(Let's just say there are many other fields to justify splitting of these tables)
To calculate running balance I have two methods
-------------METHOD 1 -------------
Heavy on READ, Light on WRITE
Whenever we read, just join the table, sort by date and calculate the running balance.
PRO
Write is easy, just write into each table
CON
Reading is very heavy, the calculation needs to be done on each read.
It seems wasteful to query a short span (say, one week) and still have the calculation run over ALL the records. If I query for 10 records, the calculation must cover 1 million records just to know the balance for those 10.
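Method 1 could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module with an in-memory database (window functions need SQLite 3.25 or newer), amounts are stored as integers, and the function name running_balance and the sample rows are made up for the example.

```python
import sqlite3

# In-memory database with the three tables from the question (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions   (amount INTEGER, date TEXT);
    CREATE TABLE records        (amount INTEGER, date TEXT);
    CREATE TABLE custom_records (amount INTEGER, date TEXT);
    INSERT INTO transactions   VALUES (100, '2024-01-01'), (-30, '2024-01-03');
    INSERT INTO records        VALUES (50,  '2024-01-02');
    INSERT INTO custom_records VALUES (-20, '2024-01-04');
""")

def running_balance(conn, start, end):
    """Return (date, amount, balance) for rows dated within [start, end].

    The balance is still computed over every row from the beginning of time;
    only the final result is filtered. That is exactly the cost of Method 1.
    """
    query = """
        WITH all_rows AS (
            SELECT amount, date FROM transactions
            UNION ALL SELECT amount, date FROM records
            UNION ALL SELECT amount, date FROM custom_records
        ),
        balanced AS (
            SELECT date, amount,
                   SUM(amount) OVER (
                       ORDER BY date
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS balance
            FROM all_rows
        )
        SELECT date, amount, balance
        FROM balanced
        WHERE date BETWEEN ? AND ?
        ORDER BY date
    """
    return conn.execute(query, (start, end)).fetchall()

rows = running_balance(conn, "2024-01-03", "2024-01-04")
for row in rows:
    print(row)
```

Note that rows sharing the same date would be ordered arbitrarily here; a real schema would need a tie-breaking column.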
-------------METHOD 2 -------------
Heavy on WRITE, Light on READ
I have another table
FINAL_TABLE
date
amount
running balance
Whenever I write, I refresh this table and recalculate every running balance.
PRO
Read is easy, running balance already computed.
Querying a time period is as easy as selecting the rows of FINAL_TABLE whose date falls within that span.
CON
Write is really slow: each write to any of the three tables means refreshing the whole FINAL_TABLE!
Why don't I just reuse the latest running balance? That would work if entries were guaranteed to arrive in chronological order. In real life, however, an entry is sometimes added late.
Currently I am using Method 2, and every time a client saves or updates a row in any of the three tables, the server freezes while it refreshes and recomputes FINAL_TABLE. Obviously, this is not very scalable.
Method 1 is also not very scalable in terms of querying: I would have to calculate the running balance from the beginning of time in order to know the running balance for last week.
Neither method is very scalable. What is a good design that ensures scalability and relatively fast performance on both READ and WRITE? What method do banks use to keep track of running balances?

It depends.
Suppose you have a report, such as a transaction report, that shows each account's running balance. If you want to show real-time data, then Method 1 is always preferable. I suggest using a Quirky Update for this rather than cursors, loops, sub-queries or recursion.
On the other hand, if you don't need a real-time running total, you could use Method 2 with a little customization. I would not update the final table while a transaction is being made; instead, I suggest updating it on a schedule. Depending on your traffic or load, you can update the running total after each interval.
For real-time data I would discourage Method 2, as it makes each transaction costly.
Here are some links to help make Method 1 faster:
Calculating Running Total
Quirky Update
Quirky Update Performance
Halloween Protection
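-- Column types are assumed; the original sketch omitted them.
CREATE TABLE AccBalance
(
    AccountNO INT PRIMARY KEY,
    Balance DECIMAL(18, 2)
);
CREATE TABLE AccDateWiseCumBalance
(
    AccountNO INT,
    SystemDate DATE,
    CumulativeBalance DECIMAL(18, 2),
    PRIMARY KEY (AccountNO, SystemDate)
);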
The first table is updated by each transaction, so it holds the real-time balance but no history.
The second table holds the cumulative balance per account and date, and is updated at the end of each day.
So if you need the cumulative balance up to a previous date, retrieve it from the second table.
And if you need the cumulative balance up to the current date, take the second table's data up to the day before the current date and the current date's data from the first table.
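Here is a small sketch of how those two tables might work together. It is an assumption-laden illustration, not a definitive implementation: it uses Python's sqlite3 module instead of SQL Server, stores amounts as integers, and the helper names post, close_day, balance_as_of and current_balance are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccBalance (
        AccountNO INTEGER PRIMARY KEY,
        Balance   INTEGER NOT NULL
    );
    CREATE TABLE AccDateWiseCumBalance (
        AccountNO         INTEGER,
        SystemDate        TEXT,
        CumulativeBalance INTEGER NOT NULL,
        PRIMARY KEY (AccountNO, SystemDate)
    );
""")

def post(conn, account, amount):
    """Apply one transaction to the real-time balance (no history kept)."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO AccBalance VALUES (?, 0)", (account,))
        conn.execute(
            "UPDATE AccBalance SET Balance = Balance + ? WHERE AccountNO = ?",
            (amount, account),
        )

def close_day(conn, system_date):
    """Day-end job: snapshot every account's balance under system_date."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO AccDateWiseCumBalance "
            "SELECT AccountNO, ?, Balance FROM AccBalance",
            (system_date,),
        )

def balance_as_of(conn, account, date):
    """Cumulative balance at the latest closed day on or before date."""
    row = conn.execute(
        "SELECT CumulativeBalance FROM AccDateWiseCumBalance "
        "WHERE AccountNO = ? AND SystemDate <= ? "
        "ORDER BY SystemDate DESC LIMIT 1",
        (account, date),
    ).fetchone()
    return row[0] if row else 0

def current_balance(conn, account):
    """Real-time balance, including transactions made since the last day end."""
    row = conn.execute(
        "SELECT Balance FROM AccBalance WHERE AccountNO = ?", (account,)
    ).fetchone()
    return row[0] if row else 0

post(conn, 1, 100)
close_day(conn, "2024-01-01")
post(conn, 1, -30)
close_day(conn, "2024-01-02")
post(conn, 1, 5)  # today's activity, not yet closed

print(balance_as_of(conn, 1, "2024-01-01"))
print(balance_as_of(conn, 1, "2024-01-02"))
print(current_balance(conn, 1))
```

Reads never touch the transaction history: a past date is one indexed lookup in the second table, and today's figure is one row in the first.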

Related

Creating a SQL totals table

What I am trying to accomplish is a SQL table that contains several different totals based off of 5 other tables. This would be so that when my application needs those totals, it would not need to perform the sum since it is a rather large query.
I would like to know if there is a recommended method to have a totals table that constantly updates based on changes made in other tables. I have thought of replacing it with an indexed view or having triggers on each of the tables that are being summed, but it seems inefficient to rerun the sum query every time a field is updated. One other thing I thought of would be to have a trigger on update and every time the data changes, I would just add or remove the difference from the stored total. My end goal is to have some totals that are constantly up to date.
The table is showing totals per product. (e.g. total qty from table1 + total qty from table2)
If this is too general, I can give more specifics about table structure.
Add a trigger to each of the tables in question, and have it check whether the relevant value actually changed, rather than rerunning the sum each time a field that is irrelevant to the total in the computed table is modified.
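A trigger-based version might look something like this. It is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, the table and column names (table1, product_totals, qty, note) are hypothetical, only one of the source tables is shown, and it assumes a row's product never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY, product TEXT, qty INTEGER, note TEXT
    );
    CREATE TABLE product_totals (
        product TEXT PRIMARY KEY, total_qty INTEGER NOT NULL
    );

    -- New row: make sure a totals row exists, then add the quantity.
    CREATE TRIGGER table1_ins AFTER INSERT ON table1 BEGIN
        INSERT OR IGNORE INTO product_totals VALUES (NEW.product, 0);
        UPDATE product_totals SET total_qty = total_qty + NEW.qty
        WHERE product = NEW.product;
    END;

    -- Fires only for UPDATE statements that assign qty, and applies just
    -- the difference instead of recomputing the whole sum.
    CREATE TRIGGER table1_upd AFTER UPDATE OF qty ON table1 BEGIN
        UPDATE product_totals SET total_qty = total_qty + NEW.qty - OLD.qty
        WHERE product = NEW.product;
    END;

    -- Deleted row: subtract its quantity.
    CREATE TRIGGER table1_del AFTER DELETE ON table1 BEGIN
        UPDATE product_totals SET total_qty = total_qty - OLD.qty
        WHERE product = OLD.product;
    END;
""")

def total(conn, product):
    """Read the stored total without summing the source table."""
    row = conn.execute(
        "SELECT total_qty FROM product_totals WHERE product = ?", (product,)
    ).fetchone()
    return row[0] if row else 0

conn.execute("INSERT INTO table1 (product, qty) VALUES ('widget', 5)")
conn.execute("INSERT INTO table1 (product, qty) VALUES ('widget', 3)")
conn.execute("UPDATE table1 SET qty = 10 WHERE id = 1")       # total changes by +5
conn.execute("UPDATE table1 SET note = 'rush' WHERE id = 1")  # update trigger does not fire
conn.execute("DELETE FROM table1 WHERE id = 2")               # total changes by -3
conn.commit()

print(total(conn, "widget"))
```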
I ended up putting these in a queue when the underlying data was changed, and using a scheduled task to update the totals at a regular interval. We decided the tradeoff in data freshness was worth not having to recalculate the total with every transaction.
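The queue-plus-scheduled-task idea could be sketched like this. The details are assumptions, since the answer does not describe its implementation: the dirty_products queue table, the add_qty and refresh_totals helpers, and the use of SQLite via Python's sqlite3 module are all invented for illustration. In practice refresh_totals would be called by a scheduler at whatever interval is acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (product TEXT, qty INTEGER);
    CREATE TABLE table2 (product TEXT, qty INTEGER);
    CREATE TABLE product_totals (
        product TEXT PRIMARY KEY, total_qty INTEGER NOT NULL
    );
    -- Queue of products whose totals are stale.
    CREATE TABLE dirty_products (product TEXT PRIMARY KEY);
""")

def add_qty(conn, table, product, qty):
    """Write the data row and queue the product; no summing happens here."""
    if table not in ("table1", "table2"):
        raise ValueError("unknown table: " + table)
    with conn:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (product, qty))
        conn.execute("INSERT OR IGNORE INTO dirty_products VALUES (?)", (product,))

def refresh_totals(conn):
    """Scheduled task: recompute totals for queued products only.

    Returns the number of products that were refreshed.
    """
    with conn:
        dirty = [r[0] for r in conn.execute("SELECT product FROM dirty_products")]
        for product in dirty:
            new_total = conn.execute(
                "SELECT (SELECT COALESCE(SUM(qty), 0) FROM table1 WHERE product = ?)"
                "     + (SELECT COALESCE(SUM(qty), 0) FROM table2 WHERE product = ?)",
                (product, product),
            ).fetchone()[0]
            conn.execute(
                "INSERT OR REPLACE INTO product_totals VALUES (?, ?)",
                (product, new_total),
            )
        conn.execute("DELETE FROM dirty_products")
    return len(dirty)

add_qty(conn, "table1", "widget", 5)
add_qty(conn, "table2", "widget", 7)
add_qty(conn, "table1", "gadget", 1)

stale = conn.execute("SELECT COUNT(*) FROM product_totals").fetchone()[0]
refreshed = refresh_totals(conn)
totals = dict(conn.execute("SELECT product, total_qty FROM product_totals"))
print(stale, refreshed, totals)
```

Until the task runs, the totals table lags behind the data. That is the freshness tradeoff the answer mentions.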

transactions and balance

I work on a contracting company's database (SQL Server). I'm not sure what the best solution is for calculating their customers' account balances.
Balance table: create one table for balances and another for transactions. My application adds every transaction to the transactions table and calculates the balance from the value in the balance table.
Calculate the balance using a query: in that case I create only the transactions table.
Note: there may be up to 2 million records per year, so I think they will need to back it up every year or something like that.
Any new ideas or comments?
I would have a transactions table and a balances table as well, if I were you. Let's consider for example that you have 1 000 000 users. If a user has 20 transactions on average, then getting balance from a transaction table would be roughly 20x slower than getting balance from a balances table. Also, it is better to have something than not having that something.
So, I would choose to create a balances table without thinking twice.
Comments on your two options:
First option: a good solution if you have many more reads than updates (100 times or more). You add a new transaction, recalculate the balance and store it. You can do this in one transaction, but it can take a long time and block the user's action. Alternatively, you can do it later (for example, update balances once a minute/hour/day). Pros: fast reads. Cons: either a possible difference between the stored balance and the sum of transactions, or a longer user action.
Second option: a good solution if you have many more updates than reads (for example, a trading system with a lot of transactions). Updating the current balance takes time and may be wasted effort, because another transaction has already arrived :) So you calculate the balance at runtime, on demand. Pros: the balance is always current. Cons: calculating the balance can take time.
As you see, it depends on your workload profile (reads or writes). I'd advise you to begin with the second variant: it's easy to implement, and good DB indexes can help you get the sum very fast (2 million rows per year is not as much as it looks). But it's up to you.
You definitely need a separate balance table beside the transaction table. Otherwise, reading the balance will get slower day by day as the number of transactions grows, and transactions will become costly because other users may lock the transaction table to read a balance at the same time.
This question would seem to have a lot of opinion, and I was tempted to close it.
But, in any environment where I've been where customers have "balances", a critical part of the business is knowing the current balance for each customer. This means having a historical transaction table, a current balance amount, and an auditing process to ensure that the two are aligned.
The current balance would be maintained whenever the database is changed. The "standard" method is to use triggers. My preferred method is to encapsulate data changes in stored procedures, and have the logic for the summarization in the same procedures used to modify the transaction data.
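The auditing process mentioned above might be as simple as a query that compares each stored balance with the sum of that customer's transactions. The sketch below is illustrative only: it uses SQLite via Python's sqlite3 module, the table and column names are assumed, and the add_transaction function merely stands in for a stored procedure (SQLite has none).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, amount INTEGER NOT NULL
    );
    CREATE TABLE balances (
        customer_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL
    );
""")

def add_transaction(conn, customer_id, amount):
    """Single entry point for changes: history row and balance move together."""
    with conn:
        conn.execute(
            "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        conn.execute("INSERT OR IGNORE INTO balances VALUES (?, 0)", (customer_id,))
        conn.execute(
            "UPDATE balances SET balance = balance + ? WHERE customer_id = ?",
            (amount, customer_id),
        )

def audit(conn):
    """Return (customer_id, stored, computed) for every balance that is off."""
    return conn.execute("""
        SELECT b.customer_id, b.balance, COALESCE(SUM(t.amount), 0)
        FROM balances AS b
        LEFT JOIN transactions AS t ON t.customer_id = b.customer_id
        GROUP BY b.customer_id, b.balance
        HAVING b.balance <> COALESCE(SUM(t.amount), 0)
        ORDER BY b.customer_id
    """).fetchall()

add_transaction(conn, 1, 100)
add_transaction(conn, 1, -40)
add_transaction(conn, 2, 10)
clean = audit(conn)

# Simulate someone editing a balance directly, bypassing add_transaction.
conn.execute("UPDATE balances SET balance = 999 WHERE customer_id = 2")
conn.commit()
mismatches = audit(conn)
print(clean, mismatches)
```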

Select sum or updating a field for Total Balance?

I'm using Entity Framework and Azure Sql.
I have users, and they have records in a balance table. Some users may have 1 million records. I need the user's total balance before every HTTP request.
I have two approaches for getting a user's total balance:
First:
Insert the balance row and increment the TotalBalance field in a single transaction.
transaction(
InsertBalance(amount)
Update CustomerSummary Set Totalbalance=Totalbalance+Amount
)
If I need the total balance, I just select it from the CustomerSummary table.
Second: insert the balance row directly, without using any transaction.
If I need the total balance, I have to compute the sum with a query.
Is the first approach reliable for the total balance?
Can the sum in the second approach be as fast as the first approach?
The second approach is guaranteed to be accurate -- if you want the sum of a particular column, there is nothing more accurate than a query that calculates the sum.
The reason for maintaining a summary table is performance. Typically, such a table is maintained in one of two ways:
Triggers
Stored procedures that wrap all data modification operations
Your example with the insert is an "application-side" solution. The danger is that someone might come along and say that a balance is incorrect and then have the value changed directly in the database. The total doesn't get changed.
To make this work correctly, you need to have the right controls over access to the database to ensure that whenever amount changes, then all its dependencies change. Note: this is not an issue if you calculate the balance when you need it.
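To show why the first approach needs the transaction, here is a rough sketch in which the insert and the summary update either both succeed or both roll back. Everything beyond the CustomerSummary and TotalBalance names from the question is assumed: it uses SQLite via Python's sqlite3 module rather than Entity Framework and Azure SQL, the Balance table layout is invented, and the CHECK constraint exists only to force a failure for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Balance (
        Id INTEGER PRIMARY KEY, CustomerId INTEGER NOT NULL, Amount INTEGER NOT NULL
    );
    -- The CHECK is an assumption, used here only to make the second step fail.
    CREATE TABLE CustomerSummary (
        CustomerId   INTEGER PRIMARY KEY,
        TotalBalance INTEGER NOT NULL CHECK (TotalBalance >= 0)
    );
    INSERT INTO CustomerSummary VALUES (1, 0);
""")

def insert_balance(conn, customer_id, amount):
    """Insert the detail row and bump the summary as one atomic unit."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO Balance (CustomerId, Amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        conn.execute(
            "UPDATE CustomerSummary SET TotalBalance = TotalBalance + ? "
            "WHERE CustomerId = ?",
            (amount, customer_id),
        )

insert_balance(conn, 1, 50)

failed = False
try:
    insert_balance(conn, 1, -80)  # summary update violates the CHECK
except sqlite3.IntegrityError:
    failed = True

stored = conn.execute(
    "SELECT TotalBalance FROM CustomerSummary WHERE CustomerId = 1"
).fetchone()[0]
summed = conn.execute(
    "SELECT COALESCE(SUM(Amount), 0) FROM Balance WHERE CustomerId = 1"
).fetchone()[0]
print(failed, stored, summed)
```

Because the failed insert is rolled back together with the failed update, the stored total and the computed sum still agree. This only holds while every write goes through insert_balance, which is the access-control point made above.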

SQL Server : Update reporting table in real time

For one of our applications we have a huge amount of data in multiple tables, and every time a user does something a new record is inserted into these tables. There is a reporting screen where we have to do calculations on these tables and show the totals.
For example, assume two parent tables, Employee and Attendance.
The Employee table has 100,000 records, and the Attendance table has a row each time an employee enters or leaves their building. The Attendance table holds more than 2 million records for one year. I need to calculate the total attendance for each employee and display it on screen for all 100,000 records, paginated by employee name. The calculation takes too much time and spikes the DB CPU.
To avoid calculating the total at runtime, I'm planning to have a separate table with precalculated totals for each employee, and just query that table whenever needed. The problem is that data for previous years is not going to change, but data for the current year is generated day to day whenever an employee records attendance. What is the best option for keeping the table updated in real time with the total for every employee whenever new attendance is recorded for the current year?
I thought of using triggers, but triggers are synchronous, so they would affect either the performance of my reporting application whenever I query, or the performance of inserts into the parent table.
Please let me know if there are better ways to update my totals table in real time without impacting the performance of inserts or updates to the parent tables.
This is a perfect case for indexed views. Certainly, the core of your query is a group by such as:
select EmployeeID, count(*)
from AttendanceRecords
group by EmployeeID
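Index that view. Its contents will then be available cheaply and updated in real time. There is zero potential for out-of-sync data. (Note that SQL Server only allows an index on a view created WITH SCHEMABINDING, and a grouped view must use COUNT_BIG(*) rather than count(*).)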
One option would be to use SQL Change Tracking:
https://msdn.microsoft.com/en-us/bb933875.aspx
This is not change data capture (which can be quite heavy) - change tracking just lets you know which keys changed so you can act on it. With that information, you could have a regular job that collects those changes and updates your summaries.
...or, if you can use SQL 2014, you could get into Updatable Column Stores and dispense with the summaries.
Would you consider exporting data from previous years and using it to create the total attendance counts for employees in earlier years?
You say you're moving towards essentially having a table acting as a counter at the moment, so by ensuring your old data conforms to this model as well it'll be much easier to write and maintain the code that interacts with it and server load from any individual query should be minimal.

thousands of db tables VS one huge table

I am trying to develop an application that keeps track of daily stock data. (Each day a new record is created for every stock.) There will be around 5000-10000 stocks tracked. I then need to analyze some stock data every day, month or other period, and keep the results.
My question is this: is it better to have an activity table for each stock that keeps its daily activity (a new row each day), or is it smarter to have one huge table that receives 10,000 new records every day for all the stocks? Keep in mind that I need to do batch calculations every day for every stock (calculating moving averages and stuff).
One table. You might want to partition it by stock ID.
Automatic table creation is almost always a bad idea.
Generally, you can query a single table faster than you can run joins and multiple queries.
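As a rough illustration of the single-table design, the sketch below keeps every stock in one table and computes a 3-day moving average for all stocks in a single query. It is built on assumptions: it uses SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer), the daily_activity table and its columns are made up, and since SQLite has no table partitioning, a composite primary key on (stock_id, trade_date) stands in for partitioning by stock ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_activity (
        stock_id   INTEGER NOT NULL,
        trade_date TEXT    NOT NULL,
        close      REAL    NOT NULL,
        PRIMARY KEY (stock_id, trade_date)
    );
""")

# Invented sample data: one row per stock per day.
sample = [
    (1, "2024-01-01", 10.0),
    (1, "2024-01-02", 20.0),
    (1, "2024-01-03", 30.0),
    (1, "2024-01-04", 40.0),
    (2, "2024-01-01", 5.0),
    (2, "2024-01-02", 7.0),
]
with conn:
    conn.executemany("INSERT INTO daily_activity VALUES (?, ?, ?)", sample)

# One batch query covers every stock; the window restarts for each stock_id
# and averages the current row with up to two preceding rows.
moving = conn.execute("""
    SELECT stock_id, trade_date,
           AVG(close) OVER (
               PARTITION BY stock_id
               ORDER BY trade_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS ma3
    FROM daily_activity
    ORDER BY stock_id, trade_date
""").fetchall()

for row in moving:
    print(row)
```

With one table per stock, the same batch job would need thousands of separate queries or a very large UNION.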