thousands of db tables VS one huge table - sql

I am trying to develop an application that keeps track of daily stock data. (Each day a new record is created for every stock). There will be around 5000-10000 stock tracked. Then I need to analyze every day, month or other period some stock data, and keep it.
My question is this: Is it better to have an activity table for each stock that will keep the daily activity (each day a new row) or is it smarter to have one huge table that is inserted with 10,000 records everyday for all the stocks? Keep in mind that I need to do batch calculations every day for every stock (calculating moving averages and stuff).

One table. You might want to partition it by stock ID.
Automatic table creation is almost always a bad idea.

Generally you can query single table faster in comparison to joins and multiple queries.

Related

Creating a SQL totals table

What I am trying to accomplish is a SQL table that contains several different totals based off of 5 other tables. This would be so that when my application needs those totals, it would not need to perform the sum since it is a rather large query.
I would like to know if there is a recommended method to have a totals table that constantly updates based on changes made in other tables. I have thought of replacing it with an indexed view or having triggers on each of the tables that are being summed, but it seems inefficient to rerun the sum query every time a field is updated. One other thing I thought of would be to have a trigger on update and every time the data changes, I would just add or remove the difference from the stored total. My end goal is to have some totals that are constantly up to date.
The table is showing totals per product. (e.g. total qty from table1 + total qty from table2)
If this is too general, I can give more specifics about table structure.
Add a trigger to tables in question and check for the only the relevant value changing rather than run the sum each time a field that is irrelevant to the total on the computed table is modified.
I ended up putting these in a queue when the underlying data was changed, and using a scheduled task to update the totals at a regular interval. We decided the tradeoff in data freshness was worth not having to recalculate the total with every transaction.

Calculating running balance from join table [SQL Database Design]

Let's say I have three tables
TRANSACTIONS
amount
date
RECORDS
amount
date
CUSTOM_RECORDS
amount
date
(Let's just say there are many other fields to justify splitting of these tables)
To calculate running balance I have two methods
-------------METHOD 1 -------------
Heavy on READ, Light on WRITE
Whenever we read, just join the table, sort by date and calculate the running balance.
PRO
Write is easy, just write into each table
CON
Reading is very heavy, the calculation needs to be done on each read.
It is very strange to be querying (from let's say a span of 1 week) and to have the calculation done for ALL the records. If I query for 10 records then calculation needs to be done for 1 million records to be able to know the 10 record balance.
-------------METHOD 2 -------------
Heavy on WRITE, Light on READ
I have another table
FINAL_TABLE
date
amount
running balance
Whenever I write, I refresh this table and calculate all the running balance again.
PRO
Read is easy, running balance already computed.
Querying between time period is as easy as extracting the date between the time span from the FINAL_TABLE
CON
Write is really slow, each write on any of the Three tables mean refreshing a whole FINAL_TABLE table!
Why didn't I just reuse the latest running balance? This can occur if the entry is a guarantee to be chronological in real life. However, sometimes entry might be added late.
Currently, I am using Method 2 and every time a client save/update a row into any of the three tables, the server freeze as it tries to refresh and compute the FINAL_TABLE. Obviously, this is not very scalable.
Method 1 is also not very scalable in term of querying. I would have to calculate running balance from the beginning of time in order to know the running balance of last week.
Both Method is not very scalable. What is a good design to ensure scalability and relatively fast performance on READ and WRITE? What method does the bank use to keep track of running balance?
It depends.
Suppose you have a report like transaction report where accounts' running balance will be shown. If you want to show real time data then always method 1 will be preferable. And I will suggest to use Quirky Update for this rather than using cursors, loops, sub-queries or recursions.
On the other hand, if you don't need real time running total then you could have use method 2 with a little customization. I will not support updating Final Table while you made a transaction. Rather than I will suggest to update it with interval schedule. Depending on your traffic or load you may update the running total after a interval.
And for real time I will discourage using method 2 as it will make your transaction costly.
To make your method 1 faster here is some link.
Calculating Running Total
Quirky Update
Quirky Update Performance
Halloween Protection
Create Table AccBalance
(
AccountNO PK,
Balance
)
Create Table AccDateWiseCumBalance
(
AccountNO PK,
SystemDate PK,
Cumulative Balance
)
First table will be updated by each transaction will keep real time balance but not any history.
Second table keep account and date wise cumulative balance which will be updated at each day end.
So if you need up to previous date cumulative balance you will retrieve data from second table.
And if you need up to current date cumulative balance you will retrieve data from second table up to day before current date and retrieve current date data from first table.

SQL Server : Update reporting table in real time

For one of our applications we have huge data in multiple tables and every time a user does something new record is inserted in to these tables. There is a reporting screen where we have to do calculations from these tables and show the total from these tables
For example: Assume two parent tables Employee and Attendance
Employee table has 100,000 records and Attendance table has data for each day whenever a employee goes and comes out of their building. The records in Attendance table is more 2 million for one year. I need to calculate the attendance for each employee (Total) and display it on screen for all 100,000 records and it is paginated based on employee name. The caluclation takes too much time and it spikes the DB CPU.
To avoid runtime calculation for the total Im planning to have a separate table with total calculated values for each employee and just query the table and show it whenever needed. But the problem is for previous years the data is not going to change but for the current year the data will be generated whenever the employee records attendance day to day. What is the best option for me to keep the table updated in real time with Total for every employee whenever new attendance is recorded for the current year.
I thought of using triggers but triggers are synchronous and it should affect the performance of my reporting application when ever I query or it will affect the performance of inserts into parent table.
Please let me know if there are any better ways to update my Total value table in real time without impacting the performance of insert or update to parent tables
This is a perfect case for indexed views. Certainly, the core of your query is a group by such as:
select EmployeeID, count(*)
from AttendanceRecords
group by EmployeeID
Index that view. It's contents will then be available cheaply and updated in real time. There is zero potential for out-of-sync data.
One option would be to use SQL Change Tracking:
https://msdn.microsoft.com/en-us/bb933875.aspx
This is not change data capture (which can be quite heavy) - change tracking just lets you know which keys changed so you can act on it. With that information, you could have a regular job that collects those changes and updates your summaries.
...or, if you can use SQL 2014, you could get into Updatable Column Stores and dispense with the summaries.
Would you consider exporting data from previous years and using it to create the total attendance counts for employees in earlier years?
You say you're moving towards essentially having a table acting as a counter at the moment, so by ensuring your old data conforms to this model as well it'll be much easier to write and maintain the code that interacts with it and server load from any individual query should be minimal.

How to implement the history by weekly, monthly, yearly

I wrote an system to record every trade,including amount,customer,date,...
but now I need to implement a function to show the history of recent 1 month, recent 3 month,...,
what's the best practice to implement on RoR
Should I create another table to record an monthly data, weekly data ?
or just re-calculate all the history, when the user do the select query ? But I thought it may has bad performance using this method.
If your data is only ever inserted, then aggregation tables are pretty easy to manage. Deletions and updates get a bit trickier.
Don't forget that calendar weekly (ie. Mon-Sun) data doesn't aggregate to months.
One great advantage of maintaining summary tables is that you can effectively index on summary data, so finding all stocks with a monthly trade volume greater than a particular quantity becomes practically instantaneous, and if you have that kind of requirement then I'd definitely go down that route.
Better use act_as_versioned for history functionality.It's easy track the history of the record using act_as_versioned.

Azure Table Storage - Calculate or Persist Totals

I'm looking into using Table Storage for storing some transactional data, however, I need to support some very high level reporting over it, basically totals per day / month.
Couple of options I have though of:
Use a partition / row key structure and dynamically perform sum
e.g. 20101101_ITEMID_XXXXXXXX (x = guid or time, to make unique)
then I would query for a months data using a portion of the row key (ITEMID_201011), and to a total on the "Cost" property in the type.
How would the query limit of 1000 records be managed by this though? (i.e. if there are more than 1000 transactions for the day, totaling would be hard)
Use another record to store the total for the day, and update this as new records are added
e.g. row key "20101101_ITEMID_TOTAL"
then query off this for the days totals, or months, or years totals.
What is the best way to do this? Is there a 'best practice' for this type of requirement using table storage?
I'm not sure what is the best practice but I can comment that we have a similar situation with AzureWatch and are definitely using pre-aggregated values in tables.
Mostly for performance reasons -- table storage is not instantaneous even if you query by single partition-key and a range in row-key. The time it takes to download the records is somewhat significant and depending on the records might spike the CPU up, because it needs to de-serialize the data into objects. If you get to travel to the table storage multiple times because of the 1000 record limit, you'll be paying more as well.
Some other thoughts to consider:
Will your aggregated totals ever change? If no, the this is another nudge toward pre-aggregation
Will you need to keep aggregated values after raw data is gone or will you ever need to purge raw data? If yes, then it is another nudge toward pre-aggregation