SQL: Complex calculation and updating bulk records

Consider the scenario below.
I have
Table A,
Table B, and
Table C.
The records in these tables need to be updated according to a calculation formula in their respective columns on a regular interval
(for example, once every 24 hours).
So I wrote the update queries in a stored procedure and scheduled it as a job.
When I execute the procedure, it takes 15 to 20 minutes to complete, and SQL Server Management Studio hangs because of the 5,000 records in each table.
So I split the update process into 3 separate procedures; each one then executed in about 1 minute, in line with its record count.
Even after optimizing the procedures:
What kind of transaction should I use for these updates to avoid exceptions?
Is there any other way to keep the server from timing out?
My DB plan information:
I am developing a rental systems product.
Room information is stored in table 1.
Rented customer (tenant) information is stored in table 2.
Rented rooms and their tenant invoices are stored in table 3.
Rented rooms and their tenant receipts are stored in table 4.
Step 1: Calculate the total due amount, dues in days, and dues in months for each tenant's invoice in a while loop, updating the table records.
Step 2: Calculate the total late fee for each invoice on a regular basis.
(Note: The late fee varies on a per-day calculation basis for the same invoice.)
So I store the calculated values for all leases into table 5, a common table, on a daily basis.
For that, I created a common stored procedure that calculates all the data and, via a user-defined function, updates it into table 5.
All my reports and grids fetch from table 5.
Problem statement:
Even though I have optimized the procedure, it takes around 150 seconds to execute.
Question: How can I wrap this action in a transaction, and how do I do it properly so that deadlocks are avoided while other data in the same table can still be accessed at the same time?

You can follow the link below to change the query timeout for your SQL Server:
https://learn.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-remote-query-timeout-server-configuration-option
And I suggest you keep as much drive space free as you can, for example by compressing your .ldf and .mdf files.
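Per the linked documentation, the server-side option is changed with sp_configure; a minimal sketch (600 seconds here is an arbitrary example, and 0 disables the timeout entirely):

EXEC sp_configure 'remote query timeout', 600;
RECONFIGURE;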

If you are using a cursor, I have these suggestions:
Try changing it to a set-based join (if possible).
If you are using a transaction and it sits outside your cursor, and if moving it will not damage your data, try bringing it into the cursor and committing inside the loop rather than after the whole query ends, as in the sketch below.
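As a rough illustration of committing in smaller units, here is a minimal sketch of a batched update with short transactions; the table and column names (TenantInvoice, DueAmount, LastCalculated, and the rest) and the 500-row batch size are assumptions, not taken from the question:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;
    -- Recalculate at most 500 rows per transaction so locks are held only briefly.
    UPDATE TOP (500) inv
    SET inv.DueAmount = inv.InvoiceAmount - inv.PaidAmount,
        inv.LastCalculated = CAST(SYSDATETIME() AS DATE)
    FROM dbo.TenantInvoice AS inv
    WHERE inv.LastCalculated < CAST(SYSDATETIME() AS DATE);
    SET @rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END;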

Related

Oracle: Data consistency across multiple tables to be displayed

I have 3 reports based on 3 different tables, which ideally should match each other in an audit.
They are updated sequentially once a day.
The problem is that when one of the tables has been updated and the second is still in progress, customers see a data discrepancy between the reports for some time.
We tried committing only after all 3 tables are updated, but we started having issues with the undo tablespace; the application has many other things running as well.
I am looking for a solution where we can restrict the user to data as of a specific point, so that they see updated data only after all 3 tables are refreshed/updated.
I think you can use SELECT ... FOR UPDATE on all 3 tables before starting the update procedure.
In that case users can still select data, and will see only the unchanged data until the update session finishes and commits.
You can use a flashback query to show data as-of a point in time:
select * from table1 as of timestamp timestamp '2021-12-10 12:00:00';
The application would need to determine the latest time when the tables were synchronized - perhaps with a log table that records when the update process last started. However, the flashback query also uses the UNDO tablespace. But it should at least use less UNDO, since some of the committed transactions will now free up space.
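A sketch of that idea, assuming a hypothetical refresh_log table that the update job writes a row to once all three tables are consistent:

-- The application first reads the last consistent point in time:
SELECT MAX(completed_at) FROM refresh_log;
-- ...and then runs each report as of that time, for example:
SELECT * FROM table1
AS OF TIMESTAMP TO_TIMESTAMP('2021-12-10 12:00:00', 'YYYY-MM-DD HH24:MI:SS');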

Optimization and performance issue

I have a dashboard that displays data from a stored procedure.
The stored procedure contains the calculations for the data to be displayed in the dashboard, and I am getting a performance issue while executing it. So I decided to run the SP in the background and dump the data into a physical table, so that afterwards I can fetch the data directly from that table. But with millions of rows coming in, I will get the performance issue again. I cannot find a way to solve this; kindly help me with it.
The problem lies in the amount of data the dashboard is trying to process.
Since it's okay for you to dump the output into a physical table, simply create an aggregate version of that table. For example, instead of having millions of records, you can group by country, department, employee, etc., and then dump that output into the physical table instead. Usually we group the transactions per day, in other words 1 row per transaction day, e.g. GROUP BY CAST(transaction_date AS VARCHAR(12)).
Better yet, if it is possible, modify the stored procedure to return only a few rows of data that are already aggregated.
At least in the place we work in, we call these "reporting tables", and they contain only the few thousand rows that drive the dashboards. So we have an SP, let's say "usp_Report", that is used by the dashboard. It does two things: (1) update the "reporting table" in aggregate form, (2) return the data found in the "reporting table". The update portion only happens once per day/hour, and we program this refresh-frequency control within the stored procedure, as sketched below.
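A minimal sketch of that pattern; the reporting and log table names (ReportingTable, ReportRefreshLog), the source table, and the one-hour refresh window are all assumptions:

CREATE PROCEDURE usp_Report
AS
BEGIN
    SET NOCOUNT ON;
    -- (1) Refresh the reporting table, but at most once per hour.
    IF NOT EXISTS (SELECT 1 FROM dbo.ReportRefreshLog
                   WHERE RefreshedAt > DATEADD(HOUR, -1, SYSDATETIME()))
    BEGIN
        TRUNCATE TABLE dbo.ReportingTable;
        INSERT INTO dbo.ReportingTable (TransactionDay, Country, Total)
        SELECT CAST(transaction_date AS DATE), country, SUM(amount)
        FROM dbo.Transactions
        GROUP BY CAST(transaction_date AS DATE), country;
        INSERT INTO dbo.ReportRefreshLog (RefreshedAt) VALUES (SYSDATETIME());
    END;
    -- (2) Return the aggregated rows that drive the dashboard.
    SELECT TransactionDay, Country, Total FROM dbo.ReportingTable;
END;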

Calculating running balance from join table [SQL Database Design]

Let's say I have three tables
TRANSACTIONS
amount
date
RECORDS
amount
date
CUSTOM_RECORDS
amount
date
(Let's just say there are many other fields that justify splitting these tables.)
To calculate running balance I have two methods
-------------METHOD 1 -------------
Heavy on READ, Light on WRITE
Whenever we read, just join the tables, sort by date, and calculate the running balance.
PRO
Write is easy, just write into each table
CON
Reading is very heavy: the calculation needs to be done on every read.
It is very strange to query (let's say a span of 1 week) and have the calculation done over ALL the records. If I query 10 records, the calculation still needs to cover 1 million records just to know the balance of those 10.
-------------METHOD 2 -------------
Heavy on WRITE, Light on READ
I have another table
FINAL_TABLE
date
amount
running balance
Whenever I write, I refresh this table and recalculate all the running balances.
PRO
Read is easy, running balance already computed.
Querying between time period is as easy as extracting the date between the time span from the FINAL_TABLE
CON
Write is really slow; each write to any of the three tables means refreshing the whole FINAL_TABLE!
Why don't I just reuse the latest running balance? That would work if entries were guaranteed to arrive in chronological order. However, sometimes an entry is added late.
Currently, I am using method 2, and every time a client saves/updates a row in any of the three tables, the server freezes as it tries to refresh and recompute the FINAL_TABLE. Obviously, this is not very scalable.
Method 1 is also not very scalable in terms of querying: I would have to calculate the running balance from the beginning of time in order to know the running balance of last week.
Neither method is very scalable. What is a good design to ensure scalability and relatively fast performance on READ and WRITE? What method do banks use to keep track of running balances?
It depends.
Suppose you have a report, like a transaction report, where accounts' running balances are shown. If you want to show real-time data, method 1 is always preferable. And I suggest using a Quirky Update for this rather than cursors, loops, sub-queries or recursion.
On the other hand, if you don't need a real-time running total, you could use method 2 with a little customization. I would not update the final table as part of each transaction; rather, I suggest updating it on an interval schedule. Depending on your traffic or load, you may update the running total after an interval.
And for real time, I discourage method 2, as it will make your transactions costly.
To make your method 1 faster, here are some links, followed by a windowed-SUM sketch:
Calculating Running Total
Quirky Update
Quirky Update Performance
Halloween Protection
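On databases that support window functions, a plain windowed SUM is a simpler alternative to the quirky update. A minimal sketch of method 1 over the three tables from the question (in practice ORDER BY would need a tie-breaking column such as an id):

-- Combine the three source tables into one stream of entries,
-- then accumulate in date order.
SELECT entry_date, amount,
       SUM(amount) OVER (ORDER BY entry_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_balance
FROM (
    SELECT date AS entry_date, amount FROM TRANSACTIONS
    UNION ALL
    SELECT date, amount FROM RECORDS
    UNION ALL
    SELECT date, amount FROM CUSTOM_RECORDS
) AS combined
ORDER BY entry_date;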
CREATE TABLE AccBalance
(
    AccountNO INT PRIMARY KEY,          -- example data types
    Balance   DECIMAL(18, 2) NOT NULL
);

CREATE TABLE AccDateWiseCumBalance
(
    AccountNO         INT,
    SystemDate        DATE,
    CumulativeBalance DECIMAL(18, 2) NOT NULL,
    PRIMARY KEY (AccountNO, SystemDate)
);
The first table is updated by each transaction; it keeps the real-time balance but no history.
The second table keeps the account- and date-wise cumulative balance, updated at each day's end.
So if you need the cumulative balance up to a previous date, retrieve it from the second table.
And if you need the cumulative balance up to the current date, retrieve the data up to the day before the current date from the second table, and the current date's data from the first table, as sketched below.
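A sketch of that retrieval for one account, using the schema above (with @AccountNO as a parameter):

-- History up to yesterday from the day-end table, plus today's
-- real-time figure from the per-transaction table.
SELECT AccountNO, SystemDate, CumulativeBalance
FROM AccDateWiseCumBalance
WHERE AccountNO = @AccountNO
  AND SystemDate < CAST(SYSDATETIME() AS DATE)
UNION ALL
SELECT AccountNO, CAST(SYSDATETIME() AS DATE), Balance
FROM AccBalance
WHERE AccountNO = @AccountNO;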

SQL Server : Update reporting table in real time

For one of our applications we have huge data in multiple tables, and every time a user does something, a new record is inserted into these tables. There is a reporting screen where we have to do calculations over these tables and show the totals.
For example: assume two parent tables, Employee and Attendance.
The Employee table has 100,000 records, and the Attendance table gains data each day, whenever an employee enters or leaves their building. The Attendance table holds more than 2 million records for one year. I need to calculate the total attendance for each employee and display it on screen for all 100,000 records, paginated by employee name. The calculation takes too much time and spikes the DB CPU.
To avoid runtime calculation of the totals, I'm planning to have a separate table with the calculated total for each employee and just query that table whenever needed. But the problem is that, while previous years' data is not going to change, the current year's data keeps growing as employees record attendance day to day. What is the best option for keeping this table updated in real time with the total for every employee whenever new attendance is recorded for the current year?
I thought of using triggers, but triggers are synchronous, so they would either affect the performance of my reporting application whenever I query, or affect the performance of inserts into the parent tables.
Please let me know if there are better ways to update my totals table in real time without impacting the performance of inserts or updates to the parent tables.
This is a perfect case for indexed views. Certainly, the core of your query is a GROUP BY such as:
select EmployeeID, count(*)
from AttendanceRecords
group by EmployeeID
Put that query in a view and index it (see the sketch below). Its contents will then be available cheaply and updated in real time. There is zero potential for out-of-sync data.
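A minimal sketch of such an indexed view, assuming SQL Server and the table name above; indexed views require WITH SCHEMABINDING, COUNT_BIG(*) instead of COUNT(*), and a unique clustered index to materialize them:

CREATE VIEW dbo.EmployeeAttendanceTotals
WITH SCHEMABINDING
AS
SELECT EmployeeID, COUNT_BIG(*) AS AttendanceCount
FROM dbo.AttendanceRecords
GROUP BY EmployeeID;
GO
-- The unique clustered index is what materializes (and auto-maintains) the view.
CREATE UNIQUE CLUSTERED INDEX IX_EmployeeAttendanceTotals
ON dbo.EmployeeAttendanceTotals (EmployeeID);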
One option would be to use SQL Change Tracking:
https://msdn.microsoft.com/en-us/bb933875.aspx
This is not change data capture (which can be quite heavy) - change tracking just lets you know which keys changed so you can act on it. With that information, you could have a regular job that collects those changes and updates your summaries.
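For illustration, enabling change tracking looks roughly like this; the database and table names are assumptions, and @last_sync_version stands for the version number your job stored after its previous run:

ALTER DATABASE AttendanceDB
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Attendance ENABLE CHANGE_TRACKING;

-- A scheduled job can then collect the keys that changed since its last run:
SELECT ct.AttendanceID   -- the table's assumed primary key
FROM CHANGETABLE(CHANGES dbo.Attendance, @last_sync_version) AS ct;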
...or, if you can use SQL 2014, you could get into Updatable Column Stores and dispense with the summaries.
Would you consider exporting data from previous years and using it to create the total attendance counts for employees in earlier years?
You say you're moving towards essentially having a table acting as a counter, so by ensuring your old data conforms to this model as well, it'll be much easier to write and maintain the code that interacts with it, and the server load from any individual query should be minimal.

SQL: Is it possible to set up a column that will contain a value dependent on another column?

I have a table (A) that lists all bundles created off a machine in a day. It lists the date created and the weight of the bundle. I have an ID column, a date column, and a weight column. I also have a table (B) that holds the details related to that machine for the day. In that table (B), I want a column that lists a sum of weights from the other table (A) that the dates match on. So if the machine runs 30 bundles in a day, I'll have 30 rows in table (A) all dated the same day. In table (B) I'll have 1 row detailing other information about the machine for the day plus the column that holds the total bundle weight created for the day.
Is there a way to make the total column in table (B) automatically adjust itself whenever a row is added to table (A)? Is this possible to do in the table schema itself rather than in an SQL statement each time a bundle is added? If it's not, what sort of SQL statement do I need?
Wes
It would be a mistake to do so unless you have performance problems that require it.
A better approach is to define a view in the database that will aggregate the daily bundles by machine:
CREATE VIEW MachineDailyTotals
(MachineID, RunDate, BundleCount, TotalWeight)
AS SELECT MachineID, RunDate, COUNT(*), SUM(WeightCol)
FROM BundleListTable
GROUP BY MachineID, RunDate
This will allow you to always see the correct, updated total weight per machine per day without imposing any load on the database until you actually look at the data. You can perform a simple OUTER JOIN with the machine table to get information about the machine, including the daily total info, without having to actually store the totals anywhere.
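For example (the name of the machine-per-day table, MachineDayTable here, is an assumption):

-- Table (B)'s daily machine details enriched with the computed totals.
SELECT b.*, t.BundleCount, t.TotalWeight
FROM MachineDayTable AS b
LEFT OUTER JOIN MachineDailyTotals AS t
    ON t.MachineID = b.MachineID
   AND t.RunDate = b.RunDate;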
If you need the sum (or another aggregate) in real time, add a trigger on table A for INSERT, UPDATE, DELETE which calculates the sum to be stored in B (a sketch follows below).
Otherwise, add a daily job which calculates the sums.
Please specify which database you are using.
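For illustration, assuming SQL Server, the BundleListTable from the view above, and a hypothetical MachineDayTable standing in for table B, a synchronous trigger might look like:

CREATE TRIGGER trg_BundleTotals
ON dbo.BundleListTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute daily totals only for the machine/date pairs this statement touched.
    UPDATE b
    SET b.TotalWeight = t.TotalWeight
    FROM dbo.MachineDayTable AS b
    JOIN (SELECT MachineID, RunDate, SUM(WeightCol) AS TotalWeight
          FROM dbo.BundleListTable
          GROUP BY MachineID, RunDate) AS t
        ON t.MachineID = b.MachineID AND t.RunDate = b.RunDate
    WHERE EXISTS (SELECT 1 FROM inserted AS i
                  WHERE i.MachineID = b.MachineID AND i.RunDate = b.RunDate)
       OR EXISTS (SELECT 1 FROM deleted AS d
                  WHERE d.MachineID = b.MachineID AND d.RunDate = b.RunDate);
END;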
Are you sure that you don't want to pull this information dynamically rather than storing it in a separate table? This seems like an indirect violation of Normalization rules in that you'll be storing the same information in two different places. With a dynamic query, you'll always be sure that the derived information will be correct without having to worry about the coding and maintenance of triggers.
Of course, if you are dealing with large amounts of data and query times are becoming an issue, you may want the shortcut of a summary table. But, in general, I'd advise against it.
This can be accomplished via triggers, which are little bits of code that execute whenever a certain action (insert/update/delete) happens on a table. The syntax varies by vendor (MySQL vs. Oracle), but the language is typically the same language you would write a stored procedure in.
If you mention the DB type I can help with the actual syntax