Select SUM or update a field for the total balance? (SQL)

I'm using Entity Framework and Azure SQL.
I have users, and they have records in a balance table. Some users may have 1 million records. I need the user's total balance on every HTTP request.
I have two approaches for getting a user's total balance:
First:
Insert the balance row and update the TotalBalance field (incrementing it by the amount) in the same transaction.
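BEGIN TRANSACTION
    INSERT INTO Balance (CustomerId, Amount) VALUES (@CustomerId, @Amount)  -- column names are illustrative
    UPDATE CustomerSummary SET TotalBalance = TotalBalance + @Amount WHERE CustomerId = @CustomerId
COMMIT TRANSACTION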
If I need the total balance, I just select it from the CustomerSummary table.
Second: Insert the balance row directly, without a transaction.
If I need the total balance, I have to compute the sum with a query.
Is the first approach reliable for the total balance?
Can I get the sum with the second approach as fast as with the first approach?

The second approach is guaranteed to be accurate -- if you want the sum of a particular column, there is nothing more accurate than a query that calculates the sum.
The reason for maintaining a summary table is performance. Typically, such a table is maintained in one of two ways:
Triggers
Stored procedures that wrap all data modification operations
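A minimal sketch of the trigger approach, runnable with Python's built-in sqlite3 module. The table and column names (Balance, CustomerSummary, CustomerId, Amount) are assumptions modeled on the question, and SQL Server trigger syntax differs: its triggers fire once per statement and read the `inserted`/`deleted` pseudo-tables, while SQLite triggers fire per row with `NEW`/`OLD`. The final update shows a direct edit to a balance row still keeping the total in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerSummary (
    CustomerId   INTEGER PRIMARY KEY,
    TotalBalance INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Balance (
    Id         INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES CustomerSummary(CustomerId),
    Amount     INTEGER NOT NULL  -- stored in cents to avoid float rounding
);

-- Every change to Balance adjusts the summary, whoever makes it.
CREATE TRIGGER balance_ins AFTER INSERT ON Balance
BEGIN
    UPDATE CustomerSummary SET TotalBalance = TotalBalance + NEW.Amount
    WHERE CustomerId = NEW.CustomerId;
END;
CREATE TRIGGER balance_del AFTER DELETE ON Balance
BEGIN
    UPDATE CustomerSummary SET TotalBalance = TotalBalance - OLD.Amount
    WHERE CustomerId = OLD.CustomerId;
END;
CREATE TRIGGER balance_upd AFTER UPDATE ON Balance
BEGIN
    UPDATE CustomerSummary SET TotalBalance = TotalBalance - OLD.Amount
    WHERE CustomerId = OLD.CustomerId;
    UPDATE CustomerSummary SET TotalBalance = TotalBalance + NEW.Amount
    WHERE CustomerId = NEW.CustomerId;
END;
""")

conn.execute("INSERT INTO CustomerSummary (CustomerId) VALUES (1)")
with conn:  # the inserts and the trigger updates commit (or roll back) together
    conn.execute("INSERT INTO Balance (CustomerId, Amount) VALUES (1, 500)")
    conn.execute("INSERT INTO Balance (CustomerId, Amount) VALUES (1, -200)")

# A "direct" correction made in the database still updates the total.
conn.execute("UPDATE Balance SET Amount = -150 WHERE Amount = -200")
conn.commit()

total = conn.execute("SELECT TotalBalance FROM CustomerSummary WHERE CustomerId = 1").fetchone()[0]
check = conn.execute("SELECT SUM(Amount) FROM Balance WHERE CustomerId = 1").fetchone()[0]
print(total, check)  # 350 350
```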
Your example with the insert is an "application-side" solution. The danger is that someone may later decide that a balance row is incorrect and change its value directly in the database, bypassing your code; the total is then not updated.
To make this work correctly, you need the right controls over access to the database, so that whenever an amount changes, everything that depends on it changes too. Note: this is not an issue if you calculate the balance when you need it.

Related

Creating a SQL totals table

What I am trying to accomplish is a SQL table that contains several different totals based on 5 other tables. That way, when my application needs those totals, it would not need to compute the sums, since that is a rather large query.
I would like to know if there is a recommended way to keep a totals table constantly up to date with changes made in other tables. I have thought of replacing it with an indexed view, or putting triggers on each of the tables being summed, but it seems inefficient to rerun the sum query every time a field is updated. Another idea is a trigger on update that, every time the data changes, just adds or subtracts the difference to or from the stored total. My end goal is to have some totals that are always up to date.
The table is showing totals per product. (e.g. total qty from table1 + total qty from table2)
If this is too general, I can give more specifics about table structure.
Add a trigger to the tables in question, and have it act only when the relevant value changes, rather than rerunning the sum each time a field irrelevant to the computed total is modified.
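One way to act only on the relevant value, sketched with Python's sqlite3 module: `AFTER UPDATE OF qty` makes the trigger fire only when `qty` is in the UPDATE's SET list, and the `WHEN` clause skips it if the value did not actually change, so the stored total is adjusted by the difference instead of re-summing. The table names, the `note` column, and the demo-only `writes` counter are hypothetical, and the sketch assumes `product_id` never changes. In SQL Server the equivalent test would be `UPDATE(qty)` inside the trigger, joining `inserted` to `deleted`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductTotals (
    product_id INTEGER PRIMARY KEY,
    total_qty  INTEGER NOT NULL DEFAULT 0,
    writes     INTEGER NOT NULL DEFAULT 0  -- demo only: counts trigger updates
);
CREATE TABLE table1 (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,           -- assumed never to change
    qty        INTEGER NOT NULL,
    note       TEXT                        -- irrelevant to the totals
);

CREATE TRIGGER t1_ins AFTER INSERT ON table1
BEGIN
    UPDATE ProductTotals SET total_qty = total_qty + NEW.qty, writes = writes + 1
    WHERE product_id = NEW.product_id;
END;

CREATE TRIGGER t1_del AFTER DELETE ON table1
BEGIN
    UPDATE ProductTotals SET total_qty = total_qty - OLD.qty, writes = writes + 1
    WHERE product_id = OLD.product_id;
END;

-- Fires only if qty is in the SET list AND its value changed; applies the delta.
CREATE TRIGGER t1_qty AFTER UPDATE OF qty ON table1
WHEN NEW.qty IS NOT OLD.qty
BEGIN
    UPDATE ProductTotals
    SET total_qty = total_qty + (NEW.qty - OLD.qty), writes = writes + 1
    WHERE product_id = NEW.product_id;
END;
""")

conn.execute("INSERT INTO ProductTotals (product_id) VALUES (7)")
conn.execute("INSERT INTO table1 (product_id, qty, note) VALUES (7, 10, 'a')")
conn.execute("UPDATE table1 SET note = 'edited' WHERE product_id = 7")  # no trigger work
conn.execute("UPDATE table1 SET qty = 12 WHERE product_id = 7")         # applies +2
conn.commit()

row = conn.execute("SELECT total_qty, writes FROM ProductTotals").fetchone()
print(row)  # (12, 2)
```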
I ended up putting these in a queue when the underlying data was changed, and using a scheduled task to update the totals at a regular interval. We decided the tradeoff in data freshness was worth not having to recalculate the total with every transaction.
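A minimal sketch of that queue-plus-schedule design, using Python's sqlite3 module: a trigger appends a small delta row on each insert, and a job meant to run at an interval folds the queued deltas into the totals in one transaction. All table and function names are hypothetical, only inserts are queued here (updates and deletes would enqueue `NEW.qty - OLD.qty` and `-OLD.qty` the same way), and the scheduler itself is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductTotals (product_id INTEGER PRIMARY KEY, total_qty INTEGER NOT NULL DEFAULT 0);
CREATE TABLE table1 (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, qty INTEGER NOT NULL);

-- Writers only append a small delta row; totals are not touched at write time.
CREATE TABLE TotalsQueue (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, delta INTEGER NOT NULL);
CREATE TRIGGER t1_enqueue AFTER INSERT ON table1
BEGIN
    INSERT INTO TotalsQueue (product_id, delta) VALUES (NEW.product_id, NEW.qty);
END;
""")

def apply_queued_deltas(conn):
    """Run periodically: fold queued deltas into ProductTotals, then drain them."""
    with conn:  # one transaction, so totals and queue stay consistent on failure
        # Only process rows queued so far; later arrivals wait for the next run.
        (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM TotalsQueue").fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO ProductTotals (product_id) "
            "SELECT DISTINCT product_id FROM TotalsQueue WHERE id <= ?", (max_id,))
        conn.execute("""
            UPDATE ProductTotals
            SET total_qty = total_qty + (SELECT SUM(q.delta) FROM TotalsQueue q
                                         WHERE q.product_id = ProductTotals.product_id
                                           AND q.id <= ?)
            WHERE product_id IN (SELECT product_id FROM TotalsQueue WHERE id <= ?)
        """, (max_id, max_id))
        conn.execute("DELETE FROM TotalsQueue WHERE id <= ?", (max_id,))

conn.execute("INSERT INTO table1 (product_id, qty) VALUES (1, 5)")
conn.execute("INSERT INTO table1 (product_id, qty) VALUES (1, 3)")
conn.execute("INSERT INTO table1 (product_id, qty) VALUES (2, 4)")
conn.commit()

apply_queued_deltas(conn)
totals = conn.execute("SELECT product_id, total_qty FROM ProductTotals ORDER BY product_id").fetchall()
print(totals)  # [(1, 8), (2, 4)]
```

The freshness tradeoff is the interval between runs: between runs, the totals lag by whatever is still in the queue.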

Calculating running balance from join table [SQL Database Design]

Let's say I have three tables:
TRANSACTIONS (amount, date)
RECORDS (amount, date)
CUSTOM_RECORDS (amount, date)
(Let's just say there are many other fields that justify splitting these tables.)
To calculate the running balance, I have two methods:
-------------METHOD 1 -------------
Heavy on READ, Light on WRITE
Whenever we read, we join the tables, sort by date, and calculate the running balance.
PRO
Writing is easy: just write into each table.
CON
Reading is very heavy; the calculation must be done on every read.
It is strange to query, say, a one-week span and have the calculation done over ALL the records. If I query 10 records, the calculation still has to run over 1 million records to know the balances of those 10.
-------------METHOD 2 -------------
Heavy on WRITE, Light on READ
I have another table:
FINAL_TABLE (date, amount, running balance)
Whenever I write, I refresh this table and recalculate all the running balances.
PRO
Reading is easy: the running balance is already computed.
Querying a time period is as easy as selecting the rows in that date range from FINAL_TABLE.
CON
Writing is really slow: each write to any of the three tables means refreshing the whole FINAL_TABLE!
Why not just reuse the latest running balance? That would work if entries were guaranteed to be chronological in real life. However, entries are sometimes added late.
Currently I am using Method 2, and every time a client saves or updates a row in any of the three tables, the server freezes while it refreshes and recomputes FINAL_TABLE. Obviously, this is not very scalable.
Method 1 is also not very scalable in terms of querying: I would have to calculate the running balance from the beginning of time to know last week's running balance.
Neither method is very scalable. What is a good design that gives scalability and reasonably fast reads and writes? What method do banks use to keep track of a running balance?
It depends.
Suppose you have a report, such as a transaction report, that shows accounts' running balances. If you want to show real-time data, method 1 is always preferable. I would suggest using the Quirky Update for this rather than cursors, loops, subqueries, or recursion.
On the other hand, if you don't need a real-time running total, you could use method 2 with a little customization. I would not update FINAL_TABLE inside each transaction; instead, I suggest updating it on a schedule. Depending on your traffic or load, you can refresh the running total at a suitable interval.
For real-time data I would discourage method 2, as it makes every transaction costly.
To make method 1 faster, here are some links:
Calculating Running Total
Quirky Update
Quirky Update Performance
Halloween Protection
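The Quirky Update is specific to SQL Server and relies on undocumented update-order behavior. On engines with window functions (SQL Server 2012 and later, SQLite 3.25 and later), `SUM(...) OVER (ORDER BY ...)` is a documented way to compute method 1's running balance; this sketch swaps that technique in, using Python's sqlite3 module and the three tables from the question reduced to `(amount, date)` with hypothetical sample rows. It also targets the "last week" problem: rows before the requested range collapse into a single opening SUM, and the window runs only over the rows actually returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TRANSACTIONS   (amount INTEGER NOT NULL, date TEXT NOT NULL);
CREATE TABLE RECORDS        (amount INTEGER NOT NULL, date TEXT NOT NULL);
CREATE TABLE CUSTOM_RECORDS (amount INTEGER NOT NULL, date TEXT NOT NULL);
INSERT INTO TRANSACTIONS   VALUES (100, '2024-01-01'), (-30, '2024-01-09');
INSERT INTO RECORDS        VALUES (50, '2024-01-03'), (20, '2024-01-10');
INSERT INTO CUSTOM_RECORDS VALUES (-10, '2024-01-08');
""")

RUNNING_BALANCE_SQL = """
WITH all_rows AS (
    SELECT amount, date FROM TRANSACTIONS
    UNION ALL SELECT amount, date FROM RECORDS
    UNION ALL SELECT amount, date FROM CUSTOM_RECORDS
),
opening AS (
    -- Everything before the requested range collapses into one number.
    SELECT COALESCE(SUM(amount), 0) AS bal FROM all_rows WHERE date < :start_date
)
SELECT date, amount,
       (SELECT bal FROM opening)
         + SUM(amount) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS running_balance
FROM all_rows
WHERE date >= :start_date AND date < :end_date  -- window runs after this filter
ORDER BY date
"""

rows = conn.execute(RUNNING_BALANCE_SQL,
                    {"start_date": "2024-01-08", "end_date": "2024-01-15"}).fetchall()
print(rows)  # [('2024-01-08', -10, 140), ('2024-01-09', -30, 110), ('2024-01-10', 20, 130)]
```

Rows sharing a date have no defined order here; a real schema would add a unique tiebreaker (such as an id) to the window's ORDER BY.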
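Create Table AccBalance
(
    AccountNO INT PRIMARY KEY,           -- data types are illustrative
    Balance DECIMAL(19, 4) NOT NULL
)
Create Table AccDateWiseCumBalance
(
    AccountNO INT NOT NULL,
    SystemDate DATE NOT NULL,
    CumulativeBalance DECIMAL(19, 4) NOT NULL,
    PRIMARY KEY (AccountNO, SystemDate)  -- composite key: one row per account per day
)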
The first table is updated by every transaction and keeps the real-time balance, but no history.
The second table keeps the cumulative balance per account and date, and is updated at each day end.
So if you need cumulative balances up to the previous date, retrieve them from the second table.
If you need them up to the current date, retrieve the second table's data up to the day before the current date, and take the current date's figure from the first table.
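A minimal sketch of this two-table scheme with Python's sqlite3 module. The table and column names follow the answer (with `Cumulative Balance` written as `CumulativeBalance`); `post_transaction`, `end_of_day`, and `balance_history` are hypothetical helpers, and amounts are integers (for example cents) for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AccBalance (AccountNO INTEGER PRIMARY KEY, Balance INTEGER NOT NULL);
CREATE TABLE AccDateWiseCumBalance (
    AccountNO         INTEGER NOT NULL,
    SystemDate        TEXT NOT NULL,
    CumulativeBalance INTEGER NOT NULL,
    PRIMARY KEY (AccountNO, SystemDate)
);
""")

def post_transaction(conn, account_no, amount):
    """Each transaction touches only the real-time balance row."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO AccBalance VALUES (?, 0)", (account_no,))
        conn.execute("UPDATE AccBalance SET Balance = Balance + ? WHERE AccountNO = ?",
                     (amount, account_no))

def end_of_day(conn, system_date):
    """Day-end job: snapshot every account's balance for the closing date."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO AccDateWiseCumBalance "
                     "SELECT AccountNO, ?, Balance FROM AccBalance", (system_date,))

def balance_history(conn, account_no, today):
    """Closed days come from the snapshot table; today comes from AccBalance."""
    rows = conn.execute(
        "SELECT SystemDate, CumulativeBalance FROM AccDateWiseCumBalance "
        "WHERE AccountNO = ? AND SystemDate < ? ORDER BY SystemDate",
        (account_no, today)).fetchall()
    (current,) = conn.execute("SELECT Balance FROM AccBalance WHERE AccountNO = ?",
                              (account_no,)).fetchone()
    return rows + [(today, current)]

post_transaction(conn, 1, 100)
end_of_day(conn, "2024-01-01")
post_transaction(conn, 1, -40)
end_of_day(conn, "2024-01-02")
post_transaction(conn, 1, 25)
history = balance_history(conn, 1, "2024-01-03")
print(history)  # [('2024-01-01', 100), ('2024-01-02', 60), ('2024-01-03', 85)]
```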

transactions and balance

I work on a contracting company's database (SQL Server). I'm not sure what the best solution is for calculating their customers' account balances.
1. Balance table: create one table for balances and another for transactions. My application adds every transaction to the transactions table and calculates the balance from the balance table's value.
2. Calculate the balance with a query: I create only the transactions table.
Note: there may be up to 2 million records per year, so I think they will need to back them up every year or something like that.
Any new ideas or comments?
If I were you, I would have a balances table as well as a transactions table. Consider, for example, that you have 1,000,000 users. If a user has 20 transactions on average, getting a balance from the transactions table would be roughly 20x slower than reading it from a balances table. Also, it is better to have the stored balance than not to have it.
So, I would choose to create a balances table without thinking twice.
Comments on your two approaches:
1. Good if you have many more reads than updates (100 times or more). You add a new transaction, recalculate the balance, and store it. You can do this in one transaction, but it can take a lot of time and block the user's action, so you can do it later instead (for example, update balances once a minute/hour/day). Pros: fast reads. Cons: a possible mismatch between the stored balance and the sum of transactions, or longer user action time.
2. Good if you have many more updates than reads (for example, a trading system with a lot of transactions). Updating the current balance takes time and may be pointless, because another transaction has already arrived :) So you can calculate the balance at runtime, on demand. Pros: the balance is always current. Cons: calculating the balance can take time.
As you can see, it depends on your workload profile (reads or writes). I'd advise you to begin with the second variant: it's easy to implement, and good DB indexes can help you get the sum very fast (2 million rows per year is not as much as it looks). But it's up to you.
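A small sketch of the "good indexes" point, using Python's sqlite3 module and hypothetical table and column names: an index on `(customer_id, amount)` covers the balance query, so SUM reads only one customer's index entries rather than scanning the whole table. The SQL Server counterpart would be a nonclustered index on `CustomerId` with `INCLUDE (Amount)`. The data volume is toy-sized; real timings depend on your engine and hardware.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      INTEGER NOT NULL,  -- cents
    memo        TEXT
);
-- Covering index: the balance query can be answered from the index alone.
CREATE INDEX ix_tx_customer_amount ON Transactions (customer_id, amount);
""")
conn.executemany(
    "INSERT INTO Transactions (customer_id, amount) VALUES (?, ?)",
    [(c, 100) for c in range(1000) for _ in range(20)])  # 20 transactions per customer
conn.commit()

query = "SELECT COALESCE(SUM(amount), 0) FROM Transactions WHERE customer_id = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
print(plan)  # the detail column names the index SQLite chose

total = conn.execute(query, (42,)).fetchone()[0]
print(total)  # 2000
```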
You definitely need a separate Balance table alongside the transaction table. Otherwise, reading balances gets slower day by day as transactions accumulate, and writes become costly because other users reading balances may lock the transaction table at the same time.
This question seems to invite a lot of opinion, and I was tempted to close it.
But in every environment I've worked in where customers have "balances", a critical part of the business is knowing each customer's current balance. This means having a historical transaction table, a current balance amount, and an auditing process to ensure that the two are aligned.
The current balance would be maintained whenever the database is changed. The "standard" method is to use triggers. My preferred method is to encapsulate data changes in stored procedures, and have the logic for the summarization in the same procedures used to modify the transaction data.
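A minimal sketch of encapsulating the write path and auditing it, using Python's sqlite3 module (SQLite has no stored procedures, so a Python function stands in for one). `add_transaction`, `audit`, and the table names are hypothetical; the audit query reports any customer whose stored balance no longer matches the sum of their transactions, as happens here after an out-of-band edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Balances (customer_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Transactions (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Balances(customer_id),
    amount      INTEGER NOT NULL
);
""")

def add_transaction(conn, customer_id, amount):
    """The only write path: history row and current balance change atomically."""
    with conn:  # commits all three statements, or rolls them all back on error
        conn.execute("INSERT OR IGNORE INTO Balances (customer_id) VALUES (?)", (customer_id,))
        conn.execute("INSERT INTO Transactions (customer_id, amount) VALUES (?, ?)",
                     (customer_id, amount))
        conn.execute("UPDATE Balances SET balance = balance + ? WHERE customer_id = ?",
                     (amount, customer_id))

def audit(conn):
    """Return (customer_id, stored, computed) for every mismatched customer."""
    return conn.execute("""
        SELECT b.customer_id, b.balance, COALESCE(SUM(t.amount), 0) AS computed
        FROM Balances b LEFT JOIN Transactions t ON t.customer_id = b.customer_id
        GROUP BY b.customer_id, b.balance
        HAVING b.balance <> COALESCE(SUM(t.amount), 0)
    """).fetchall()

add_transaction(conn, 1, 500)
add_transaction(conn, 1, -120)
add_transaction(conn, 2, 75)
before = audit(conn)
print(before)  # []

# Simulate an out-of-band edit that bypasses add_transaction.
conn.execute("UPDATE Transactions SET amount = -100 WHERE amount = -120")
conn.commit()
after = audit(conn)
print(after)  # [(1, 380, 400)]
```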

Azure Table Storage - Calculate or Persist Totals

I'm looking into using Table Storage to store some transactional data; however, I need to support some very high-level reporting over it, basically totals per day/month.
A couple of options I have thought of:
1. Use a partition/row key structure and compute the sum dynamically,
e.g. 20101101_ITEMID_XXXXXXXX (X = GUID or time, to make it unique),
then query a month's data using a portion of the row key (ITEMID_201011) and total the "Cost" property of the type.
How would the query limit of 1000 records be handled, though? (i.e. if there are more than 1000 transactions in a day, totaling would be hard)
2. Use another record to store the total for the day, and update it as new records are added,
e.g. row key "20101101_ITEMID_TOTAL",
then query it for the day's totals, or the month's or year's totals.
What is the best way to do this? Is there a 'best practice' for this type of requirement using table storage?
I'm not sure what the best practice is, but I can say that we have a similar situation with AzureWatch and are definitely using pre-aggregated values in tables.
Mostly for performance reasons: table storage is not instantaneous, even if you query by a single partition key and a row-key range. The time it takes to download the records is significant, and depending on the records it might spike the CPU, because the data has to be deserialized into objects. If you have to make multiple trips to table storage because of the 1000-record limit, you'll be paying more as well.
Some other thoughts to consider:
Will your aggregated totals ever change? If not, then this is another nudge toward pre-aggregation.
Will you need to keep aggregated values after the raw data is gone, or will you ever need to purge raw data? If yes, then this is another nudge toward pre-aggregation.
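A minimal sketch of option 2, pre-aggregated daily rows, using a plain Python dict as a hypothetical stand-in for one table-storage partition (no Azure SDK calls are shown). Keys follow the question's `20101101_ITEMID_...` scheme; a month's total then needs at most one point read per day of the month instead of paging through every raw record. With real table storage, the read-modify-write on the total row would need optimistic concurrency (ETags) so concurrent writers don't lose updates.

```python
import calendar
import uuid

# Hypothetical in-memory stand-in for one partition: RowKey -> entity properties.
table = {}

def record_cost(day: str, item_id: str, cost: int) -> None:
    """Store the raw record and bump the pre-aggregated daily TOTAL row."""
    table[f"{day}_{item_id}_{uuid.uuid4().hex}"] = {"Cost": cost}
    total_row = table.setdefault(f"{day}_{item_id}_TOTAL", {"Cost": 0})
    total_row["Cost"] += cost  # real storage: needs an ETag-guarded update

def month_total(year: int, month: int, item_id: str) -> int:
    """Sum the daily TOTAL rows with one point lookup per day of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    total = 0
    for d in range(1, days_in_month + 1):
        row = table.get(f"{year:04d}{month:02d}{d:02d}_{item_id}_TOTAL")
        if row is not None:
            total += row["Cost"]
    return total

record_cost("20101101", "ITEM1", 5)
record_cost("20101101", "ITEM1", 7)
record_cost("20101102", "ITEM1", 3)
record_cost("20101201", "ITEM1", 100)
print(month_total(2010, 11, "ITEM1"))  # 15
```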

SQL: Is it possible to set up a column that contains a value dependent on another column?

I have a table (A) that lists all bundles created by a machine in a day. It lists the date created and the weight of each bundle: I have an ID column, a date column, and a weight column. I also have a table (B) that holds the details for that machine for the day. In table (B), I want a column holding the sum of the weights from table (A) whose dates match. So if the machine runs 30 bundles in a day, I'll have 30 rows in table (A), all with the same date. In table (B) I'll have 1 row with other information about the machine for that day, plus the column that holds the total bundle weight for the day.
Is there a way to make the total column in table (B) adjust itself automatically whenever a row is added to table (A)? Can this be done in the table schema itself, rather than in a SQL statement each time a bundle is added? If not, what sort of SQL statement do I need?
It would be a mistake to do so unless you have performance problems that require it.
A better approach is to define a view in the database that will aggregate the daily bundles by machine:
CREATE VIEW MachineDailyTotals
(MachineID, RunDate, BundleCount, TotalWeight)
AS SELECT MachineID, RunDate, COUNT(*), SUM(WeightCol)
FROM BundleListTable
GROUP BY MachineID, RunDate
This will allow you to always see the correct, updated total weight per machine per day without imposing any load on the database until you actually look at the data. You can perform a simple OUTER JOIN with the machine table to get information about the machine, including the daily total info, without having to actually store the totals anywhere.
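The view above can be tried end to end with Python's sqlite3 module (SQLite 3.9 or later accepts the column list after the view name). The `Machine` table and the sample rows are hypothetical; the point is that the total changes as soon as a bundle row is inserted, with nothing stored or maintained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Machine (MachineID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE BundleListTable (
    ID        INTEGER PRIMARY KEY,
    MachineID INTEGER NOT NULL,
    RunDate   TEXT NOT NULL,
    WeightCol REAL NOT NULL
);
CREATE VIEW MachineDailyTotals (MachineID, RunDate, BundleCount, TotalWeight)
AS SELECT MachineID, RunDate, COUNT(*), SUM(WeightCol)
FROM BundleListTable
GROUP BY MachineID, RunDate;

INSERT INTO Machine VALUES (1, 'Baler A');
INSERT INTO BundleListTable (MachineID, RunDate, WeightCol) VALUES
    (1, '2024-03-01', 10.5), (1, '2024-03-01', 12.0), (1, '2024-03-02', 9.5);
""")

# Machine details joined to the computed daily totals.
rows = conn.execute("""
    SELECT m.Name, t.RunDate, t.BundleCount, t.TotalWeight
    FROM Machine m LEFT OUTER JOIN MachineDailyTotals t ON t.MachineID = m.MachineID
    ORDER BY t.RunDate
""").fetchall()
print(rows)  # [('Baler A', '2024-03-01', 2, 22.5), ('Baler A', '2024-03-02', 1, 9.5)]

# A new bundle is reflected immediately; no trigger or job involved.
conn.execute("INSERT INTO BundleListTable (MachineID, RunDate, WeightCol) VALUES (1, '2024-03-02', 11.0)")
updated = conn.execute(
    "SELECT TotalWeight FROM MachineDailyTotals WHERE RunDate = '2024-03-02'").fetchone()[0]
print(updated)  # 20.5
```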
If you need the sum (or another aggregate) in real time, add a trigger on table A for INSERT, UPDATE, and DELETE that recalculates the sum stored in B.
Otherwise, add a daily job that calculates the sums.
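A minimal sketch of the daily-job option with Python's sqlite3 module: the job recomputes table (B)'s total from table (A) for one date with a correlated subquery, so rerunning it is harmless. `MachineDay`, its `Operator` column, and `daily_totals_job` are hypothetical names; scheduling the job (SQL Server Agent, cron, and so on) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BundleListTable (          -- table (A): one row per bundle
    ID INTEGER PRIMARY KEY, MachineID INTEGER NOT NULL,
    RunDate TEXT NOT NULL, WeightCol REAL NOT NULL
);
CREATE TABLE MachineDay (               -- table (B): one row per machine per day
    MachineID INTEGER NOT NULL, RunDate TEXT NOT NULL,
    Operator TEXT, TotalWeight REAL,
    PRIMARY KEY (MachineID, RunDate)
);
INSERT INTO MachineDay (MachineID, RunDate, Operator) VALUES (1, '2024-03-01', 'day shift');
INSERT INTO BundleListTable (MachineID, RunDate, WeightCol) VALUES
    (1, '2024-03-01', 10.5), (1, '2024-03-01', 12.0);
""")

def daily_totals_job(conn, run_date):
    """Recompute table (B)'s TotalWeight from table (A) for one date; safe to rerun."""
    with conn:
        conn.execute("""
            UPDATE MachineDay
            SET TotalWeight = (SELECT COALESCE(SUM(a.WeightCol), 0)
                               FROM BundleListTable a
                               WHERE a.MachineID = MachineDay.MachineID
                                 AND a.RunDate = MachineDay.RunDate)
            WHERE RunDate = ?
        """, (run_date,))

daily_totals_job(conn, "2024-03-01")
daily_totals_job(conn, "2024-03-01")  # recomputing rather than adding makes reruns harmless
total = conn.execute("SELECT TotalWeight FROM MachineDay WHERE MachineID = 1").fetchone()[0]
print(total)  # 22.5
```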
Please specify which database you are using.
Are you sure you don't want to pull this information dynamically rather than storing it in a separate table? This seems like an indirect violation of normalization rules, in that you'll be storing the same information in two different places. With a dynamic query, you can always be sure the derived information is correct, without having to worry about coding and maintaining triggers.
Of course, if you are dealing with large amounts of data and query times are becoming an issue, you may want the shortcut of a summary table. But, in general, I'd advise against it.
This can be accomplished with triggers, which are little bits of code that execute whenever a certain action (insert/update/delete) happens on a table. The syntax varies by vendor (MySQL vs. Oracle), but the trigger body is typically written in the same language you would write a stored procedure in.
If you mention the DB type, I can help with the actual syntax.