I'm experimenting with a personal finance application, and I'm thinking about what approach to take to update running balances when entering a transaction in an account.
The approach I'm currently using involves retrieving all records more recent than the inserted/modified one and incrementing their running balances one by one.
For example, given the following transactions:
t1 date = 2008-10-21, amount = 500, running balance = 1000
t2 date = 2008-10-22, amount = 300, running balance = 1300
t3 date = 2008-10-23, amount = 100, running balance = 1400
...
Now suppose I insert a transaction between t1 and t2, then t2 and all subsequent transactions would need their running balances adjusted.
Hehe, now that I've written this question, I think I know the answer... so I'll leave it here in case it helps someone else (or maybe there's an even better approach?).
First, I get the running balance from the previous transaction, in this case, t1. Then I update all following transactions (which would include the new one):
UPDATE transactions
SET running_balance = running_balance + <AMOUNT>
WHERE date > <t1.date>
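Putting the pieces together, the whole flow might look something like this (just a sketch: :new_date and :new_amount stand in for the new transaction's values, and the table/column names follow the example above):

BEGIN;

-- Insert the new transaction, carrying over the previous transaction's running balance.
INSERT INTO transactions (date, amount, running_balance)
VALUES (:new_date, :new_amount,
        COALESCE((SELECT running_balance
                  FROM transactions
                  WHERE date < :new_date
                  ORDER BY date DESC
                  LIMIT 1), 0));

-- Add the new amount to the new row and every row that follows it.
-- (Rows sharing the exact same date are ambiguous here; see the tie-break discussion below.)
UPDATE transactions
SET running_balance = running_balance + :new_amount
WHERE date >= :new_date;

COMMIT;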
The only issue I see is that instead of storing only a date, I'll now have to store a time too. But what would happen if two transactions had the exact same date/time?
PS: I'd prefer solutions not involving proprietary features, as I'm using both PostgreSQL and SQLite... although a PostgreSQL-only solution would be helpful too.
Some sort of Identity / Auto-increment column in there would be wise as well, purely for the transaction order if nothing else.
In addition to the date of the transaction itself, storing the date it was inserted into the database (not always the same thing) would be helpful as well.
These sorts of things simply help you keep the data organized and make it easier to change transactions at a later time.
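As a sketch of what that might look like (PostgreSQL-style syntax; SQLite would use INTEGER PRIMARY KEY for the auto-increment column, and the column names are only illustrative):

CREATE TABLE transactions (
    id              BIGSERIAL PRIMARY KEY,  -- auto-increment: a stable tie-breaker for ordering
    date            DATE NOT NULL,          -- when the transaction happened
    inserted_at     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when it entered the system
    amount          NUMERIC(12,2) NOT NULL,
    running_balance NUMERIC(12,2) NOT NULL
);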
I think this might work:
I was using both the date and the id to order the transactions, but now I'm going to store both the date and the id in one column and use that for ordering. That way comparisons (like >) should always work as expected, right? (as opposed to the situation I described earlier, where two rows have the exact same datetime, however unlikely that would be).
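Whether you pack them into a single sortable column or keep them as two columns, the comparison can stay simple. A sketch of the two-column variant using a row-value comparison (supported by PostgreSQL, and by SQLite since 3.15; :prev_date, :prev_id and :new_amount are placeholders):

-- Bump everything that sorts strictly after the previous transaction.
-- The id acts as the tie-break when two rows share the same date.
UPDATE transactions
SET running_balance = running_balance + :new_amount
WHERE (date, id) > (:prev_date, :prev_id);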
If you have a large volume of transactions, then you are better off storing the running balance date-wise or even week/month-wise in a separate table.
This way, if you are inserting rows for the same date, you only need to change the running balance in one row.
Querying and reporting will be trickier, though: to arrive at the balance after each individual transaction, you would take the previous day's running balance and add or subtract the transaction values from there.
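As a sketch of that idea (the table and column names are only illustrative, and an account_id column plus the id column suggested above are assumed on transactions), a per-day balance table and the "balance after a given transaction" lookup might look like this:

CREATE TABLE daily_balances (
    account_id      INTEGER NOT NULL,
    balance_date    DATE    NOT NULL,
    closing_balance NUMERIC(12,2) NOT NULL,
    PRIMARY KEY (account_id, balance_date)
);

-- Balance just after transaction :txn_id on :txn_date:
-- previous day's closing balance plus that day's transactions up to and including it.
SELECT COALESCE((SELECT closing_balance
                 FROM daily_balances
                 WHERE account_id = :account_id
                   AND balance_date < :txn_date
                 ORDER BY balance_date DESC
                 LIMIT 1), 0)
     + (SELECT COALESCE(SUM(amount), 0)
        FROM transactions
        WHERE account_id = :account_id
          AND date = :txn_date
          AND id <= :txn_id) AS balance_after_txn;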
For a metering project I use a simple SQL table in the following format:
ID
Timestamp: dat_Time
Metervalue: int_Counts
Meterpoint: fk_MetPoint
While this works nicely in general I have not found an efficient solution for one specific problem: There is one Meterpoint which is a submeter of another Meterpoint. I'd be interested in the Delta of those two Meterpoints to get the remaining consumption. As the registration of counts is done by one device I get datapoints for the various Meterpoints at the same Timestamp.
I found a solution using a subquery, but it does not appear to be very efficient:
SELECT
A.dat_Time,
(A.int_Counts- (SELECT B.int_Counts FROM tbl_Metering AS B WHERE B.fk_MetPoint=2 AND B.dat_Time=A.dat_Time)) AS Delta
FROM tbl_Metering AS A
WHERE fk_MetPoint=1
How could I improve this query?
Thanks in advance
You can try using a window function instead:
SELECT m.dat_Time,
(m.int_counts - m.int_counts_2) as delta
FROM (SELECT m.*,
             MAX(CASE WHEN fk_MetPoint = 2 THEN int_counts END) OVER (PARTITION BY dat_time) as int_counts_2
FROM tbl_Metering m
) m
WHERE fk_MetPoint = 1
From a query point of view, you should at a minimum change to a set-based approach (e.g. a GROUP BY) instead of an inline sub-query for each row, but this is also a good candidate for a windowing query, just as suggested by the "Great" Gordon Linoff.
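For completeness, the GROUP BY version of the set-based approach might look something like this (assuming, as described in the question, that each dat_Time has at most one row per Meterpoint):

-- Conditional aggregation: one pass over both meter points, one row per timestamp.
SELECT dat_Time,
       MAX(CASE WHEN fk_MetPoint = 1 THEN int_Counts END)
     - MAX(CASE WHEN fk_MetPoint = 2 THEN int_Counts END) AS Delta
FROM tbl_Metering
WHERE fk_MetPoint IN (1, 2)
GROUP BY dat_Time;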
However if this is a metering project, then we are going to expect a high volume of records, if not now, certainly over time.
I would recommend you look into altering the input side so that the delta is stored as its own first-class column. This moves much of the performance hit to the write process, which presumably only occurs once for each record, whereas your select will be executed many times.
This can be done with an INSTEAD OF trigger, or you could write it into the business logic. In a recent IoT project we computed and stored these additional properties with each inserted reading, which greatly simplified many types of aggregate and analysis queries:
Id of the Previous sequential reading
Timestamp of the Previous sequential reading
Value Delta
Time Delta
Number of readings between this and the previous reading
The last one sounds close to your scenario: we were deliberately batching multiple sequential readings into a single record.
You could also process the received data into a separate table that includes this level of aggregation information, so as not to pollute the raw feed and to allow you to re-process it on demand.
You could redirect your analysis queries to this second table, which is now effectively a data warehouse of sorts.
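As a sketch of that second table (SQL Server-style syntax assumed since an INSTEAD OF trigger was mentioned; tbl_Metering_Processed and its columns are hypothetical names), the processing step could be as simple as:

-- Precompute the previous reading and the deltas, one row per raw reading.
INSERT INTO tbl_Metering_Processed
    (fk_MetPoint, dat_Time, int_Counts,
     prev_dat_Time, prev_int_Counts, value_delta, time_delta_s)
SELECT fk_MetPoint,
       dat_Time,
       int_Counts,
       LAG(dat_Time)   OVER (PARTITION BY fk_MetPoint ORDER BY dat_Time),
       LAG(int_Counts) OVER (PARTITION BY fk_MetPoint ORDER BY dat_Time),
       int_Counts - LAG(int_Counts) OVER (PARTITION BY fk_MetPoint ORDER BY dat_Time),
       DATEDIFF(second, LAG(dat_Time) OVER (PARTITION BY fk_MetPoint ORDER BY dat_Time), dat_Time)
FROM tbl_Metering;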
I am trying to maintain a ticketing system to keep track of Work Order numbers and every so often the count of the WO_NUM jumps. The WO_NUM should be the same as the WOID. But for some reason, after using this system for years, the WO_NUM started to be 1 greater than the WOID.
The count for WO_NUM jumps by 2 instead of 1 from WO_NUM 229912 to WO_NUM 229914
Then a few months later (128 days, to be exact), it jumps again, this time to 2 greater than the WOID.
The count for WO_NUM jumps from 239946 to 239948
This happens again 18 days later, but this time to 3 greater than the WOID, with WO_NUM jumping from 241283 to 241285 while WOID increments normally from 241281 to 241822.
And again 7 days later, to 4 greater than the WOID, with WO_NUM jumping from 241897 to 241899 while WOID increments normally from 241894 to 241895.
This seems to keep getting further and further off, and it is happening faster and faster. Any idea why this might be/how I might go about fixing it?
If you are using an IDENTITY field in SQL Server or a similar auto-increment mechanism in another system, there is no guarantee that your IDs will be consecutive. If you try to insert a new row and the insert fails, the ID that would have been used is skipped. This is to allow another insert to begin while the other is in process without an ID collision.
If you need (not want) IDs to be consecutive, then you'll have to do something like:
Create a locking mechanism so that inserts are atomic.
Use a key table that will store the next available ID for your table
Only increment the key table if the insert succeeds.
That said, obviously this adds a lot of risk to your system, and doesn't address what happens if you delete a record. I would reconsider whether you need consecutive IDs and whether that feature is worth the extra development and overhead.
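A minimal sketch of that key-table approach (generic SQL; the table and column names here are made up for illustration, and locking details vary by database):

CREATE TABLE work_order_keys (next_wo_num INT NOT NULL);
CREATE TABLE work_orders (WO_NUM INT PRIMARY KEY, description VARCHAR(200));

-- Increment and insert inside one transaction, so a failed or cancelled
-- insert rolls the counter back as well (unlike an IDENTITY column).
BEGIN TRANSACTION;

UPDATE work_order_keys
SET next_wo_num = next_wo_num + 1;   -- the row lock serializes concurrent callers

INSERT INTO work_orders (WO_NUM, description)
SELECT next_wo_num - 1, 'New work order'
FROM work_order_keys;

COMMIT;   -- a ROLLBACK here would release the number again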
Figured it out myself. It turns out that if a work order was started and then cancelled without being saved, WO_NUM was being incremented and not rolled back. Thanks to all who tried to help, and sorry I didn't provide the requisite information. I'll make sure to do better next time!
I'm just getting started with SQL Server's Change Data Capture functionality. I'd like to be able to pick out entries from the change table based on which transactions they were part of. Is there a way to do this? The transaction ID field in those tables doesn't seem to correspond to anything meaningful.
I realize that there are the LSN's and you can look for general start and end times, but is it possible to account for multiple transactions which run at the same time, each affecting different records? (with potentially interleaving start and end times for different operations)
I ran into this same problem a few minutes ago. I did some Googling and it appears there's a cdc.lsn_time_mapping table that may do this for you.
Here's how I used it (not sure how reliable this is, what would happen if you truncate the transaction log, etc.):
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('Application_Person');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT m.*, c.*
FROM cdc.fn_cdc_get_all_changes_Application_Person(@from_lsn, @to_lsn, N'all') c
INNER JOIN cdc.lsn_time_mapping m ON c.__$start_lsn = m.start_lsn
Other useful links for future readers:
https://msdn.microsoft.com/en-us/library/bb510494.aspx (How to set up CDC)
https://msdn.microsoft.com/en-us/library/bb510627.aspx (How to query changes)
I have a simple query:
Select Count(p.Group_ID)
From Player_Source P
Inner Join Feature_Group_Xref X On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
which spits out the current number of people in a specific test group at the current moment in time.
If I wanted to see what this number was on, say, 9/10/12 instead, could I add something to the query to see this information as the database had it 2 days ago?
No. If you want to store historical information, you will need to incorporate that into your schema. For example, you might extend Feature_Group_Xref with the columns Effective_Start_Timestamp and Effective_End_Timestamp. To find which groups currently have a given feature, you would write AND Effective_End_Timestamp > CURRENT_TIMESTAMP() (or AND Effective_End_Timestamp IS NULL, depending on how you define the column). To find which groups had a given feature at a specific time, you would write AND ... BETWEEN Effective_Start_Timestamp AND Effective_End_Timestamp (or AND Effective_Start_Timestamp < ... AND (Effective_End_Timestamp > ... OR Effective_End_Timestamp IS NULL)).
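To make that concrete, a point-in-time version of the original count might look like this (just a sketch assuming those two columns exist on Feature_Group_Xref, with a NULL end timestamp meaning "still effective"; ANSI timestamp literals shown):

-- How many players were in the group as of midnight on 2012-09-10?
SELECT COUNT(p.Group_ID)
FROM Player_Source p
INNER JOIN Feature_Group_Xref x ON p.Group_Id = x.Group_Id
WHERE x.feature_name = 'Try this site'
  AND x.Effective_Start_Timestamp <= TIMESTAMP '2012-09-10 00:00:00'
  AND (x.Effective_End_Timestamp > TIMESTAMP '2012-09-10 00:00:00'
       OR x.Effective_End_Timestamp IS NULL);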
Wikipedia has a good article on various schema designs that people use to tackle this sort of problem: see http://en.wikipedia.org/wiki/Slowly_changing_dimension.
It depends...
It is at least theoretically possible that you could use flashback query
Select Count(p.Group_ID)
From Player_Source as of timestamp( date '2012-09-10' ) P
Join Feature_Group_Xref as of timestamp( date '2012-09-10' ) X
On P.Group_Id=X.Group_Id
where x.feature_name ='Try this site'
This requires, though, that you have the privileges necessary to do a flashback query and that there is enough UNDO for Oracle to apply to be able to get back to the state those tables were in at midnight two days ago. It is unlikely that the database is configured to retain that much UNDO though it is generally possible. This query would also work if you happen to be using Oracle Total Recall.
More likely, though, you will need to modify your schema definition so that you are storing historical information that you can then query as of a point in time. There are a variety of ways to accomplish this; adding effective and expiration date columns to the table as @ruakh suggests is one of the more popular options. Which option(s) are appropriate in your particular case will depend on a variety of factors including how much history you want to retain, how frequently data changes, etc.
Rails 2.3.4
I have searched google, and have not found an answer to my dilemma.
For this discussion, I have two models. Users and Entries. Users can have many Entries (one for each day).
Entries have values and sent_at dates.
I want to query and display the average value of entries for a user BY DAY OF WEEK. So if a user has entered values for, say, the past 3 weeks, I want to show the average value for Sundays, Mondays, etc. In MySQL, it is simple:
SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = ? GROUP BY 1
That query will return between 0 and 7 records, depending upon how many days a user has had at least one entry.
I've looked at find_by_sql, but while I am searching Entry, I don't want to return an Entry object; instead, I need an array of up to 7 days and averages...
Also, I am concerned a bit about the performance of this, as we would like to load this to the user model when a user logs in, so that it can be displayed on their dashboard. Any advice/pointers are welcome. I am relatively new to Rails.
You can query the database directly, no need to use an actual ActiveRecord object. For example:
ActiveRecord::Base.connection.execute "SELECT DAYOFWEEK(sent_at) as day, AVG(value) as average FROM entries WHERE user_id = #{user.id} GROUP BY DAYOFWEEK(sent_at);"
This will give you a MySql::Result or MySql2::Result, an enumerable you can then call each or all on to view your results.
As for caching, I would recommend using memcached, but any other rails caching strategy will work as well. The nice benefit of memcached is that you can have your cache expire after a certain amount of time. For example:
result = Rails.cache.fetch("user/#{user.id}/averages", :expires_in => 1.day) do
# Your sql query and results go here
end
This would put your results into memcached for one day under the key "user/#{user.id}/averages". For example, if you were the user with id 10, your averages would be in memcached under 'user/10/averages', and the next time you went to perform this query (within the same day) the cached version would be used instead of actually hitting the database.
Untested, but something like this should work:
@user.entries.select('DAYOFWEEK(sent_at) as day, AVG(value) as average').group('1').all
NOTE: When you use select to specify columns explicitly, the returned objects are read-only. Rails can't reliably determine which columns can and can't be modified. In this case, you probably wouldn't try to modify the selected columns, but you can't modify your sent_at or value columns through the resulting objects either.
Check out the ActiveRecord Querying Guide for a breakdown of what's going on here in a fairly newb-friendly format. Oh, and if that query doesn't work, please post back so others that may stumble across this can see that (and I can possibly update).
Since that won't work due to entries returning an array, we can try using a join instead:
User.where(:user_id => params[:id]).joins(:entries).select('...').group('1').all
Again, I don't know if this will work. Usually you can specify where after joins, but I haven't seen select combined in there. A tricky bit here is that the select is probably going to eliminate returning any data about the user at all. It might make more sense just to eschew find_by_* methods in favor of writing a method in the Entry model that just calls your query with select_all (docs) and skips the association mapping.