Running balance and database normalization paradigm - SQL

Maybe I'm not good at googling, but I'm looking for a set of golden rules and recommendations for designing databases related to billing.
Let's say I have an SQL table with transactions:
transactions(id int, credit float, debit float, billable_account_id int)
Based on the rules of database normalization, I abandoned the idea of storing and updating, on every transaction, a pre-calculated running balance for each *billable_account_id*, whether in the same table or elsewhere, regardless of the size of the transactions table.
I'm using Postgres, if that matters (though the subject is a common one), and I'm not an SQL ninja at all, but I am trying to be pedantic in my design.
Questions:
Am I right to go with this approach?
If yes, what methods would you suggest for maintaining such a table and for composing a query to get running totals?
Any references are greatly appreciated!

You can use analytic functions to generate a running total in most databases. In Oracle, something like
SELECT billable_account_id,
       SUM(CASE WHEN credit IS NOT NULL THEN credit
                WHEN debit IS NOT NULL THEN -1 * debit
                ELSE 0
           END) OVER (PARTITION BY billable_account_id
                      ORDER BY transaction_date) running_total
FROM transactions
If you don't have a TRANSACTION_DATE, you could use ID, assuming that you can guarantee the generated IDs are monotonically increasing.
However, from a performance standpoint, you are likely going to want to bend, if not break, the third normal form rules for OLAP/DSS-type reporting, because people are going to want to report on totals pretty frequently and some accounts are likely to have large numbers of transactions. You may, for example, want to create a separate table that has an ending balance for each BILLABLE_ACCOUNT_ID at each month end, and then use the analytic function to just add the current month's transactions to last month's ending balance. In Oracle, you may want to create a materialized view that will automatically maintain the running total.
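For example, a sketch of that month-end snapshot idea in PostgreSQL (the snapshot table and its column names are illustrative, and transaction_date is assumed, as in the query above):
-- snapshot of each account's balance as of each month end
CREATE TABLE monthly_balances (
    billable_account_id int NOT NULL,
    month_end           date NOT NULL,
    ending_balance      numeric NOT NULL,  -- numeric avoids float rounding for money
    PRIMARY KEY (billable_account_id, month_end)
);
-- current balance = last month-end snapshot + this month's transactions
SELECT mb.billable_account_id,
       mb.ending_balance
         + COALESCE(SUM(COALESCE(t.credit, 0) - COALESCE(t.debit, 0)), 0) AS balance
FROM monthly_balances mb
LEFT JOIN transactions t
       ON t.billable_account_id = mb.billable_account_id
      AND t.transaction_date > mb.month_end
WHERE mb.month_end = date_trunc('month', current_date)::date - 1
GROUP BY mb.billable_account_id, mb.ending_balance;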

Related

Are multiple nullable foreign keys a bad design?

We have a table to store user financial transactions.
A transaction can be incremental or decremental. We have three types of transactions: an increase from a payment, an increase from receiving a gift, and a decrease from purchasing a product. So our transaction table contains three foreign keys:
[PaymentId] [GiftId] [RequestId]
Is this a bad design? What better alternative is there?
I think it is complicated to join the [Transaction] table with three other tables to get the details of each transaction when displaying the list of a user's transactions.
It seems like you'd be better off with a column called something along the lines of TransactionType, and then just storing the TransactionTypeId in a single column as well. Using these two columns you can represent any number of types of transactions. You actually find this pattern to be common in the financial data world, with dimension tables that represent things like Cost Type, etc.
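A minimal sketch of that shape (all names here are placeholders, not from the question):
-- one row per transaction; TransactionType says which detail table
-- (Payment, Gift, Request) TransactionTypeId points into
CREATE TABLE [Transaction] (
    TransactionId     int IDENTITY(1,1) PRIMARY KEY,
    TransactionType   varchar(20) NOT NULL,   -- 'PAYMENT', 'GIFT', 'PURCHASE'
    TransactionTypeId int NOT NULL,           -- id in the table named by TransactionType
    Amount            decimal(18,2) NOT NULL
);
One trade-off to note: because the table TransactionTypeId points into varies by row, it cannot be enforced with an ordinary foreign key.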

How to create an aggregate table (data mart) that will improve chart performance?

I created a table named user_preferences where user preferences have been grouped by user_id and month.
Table: (screenshot not included)
Each month I collect all user_ids and assign all preferences:
city
district
number of rooms
the maximum price they can spend
The plan assumes displaying a graph showing users' shopping intentions (chart screenshot not included):
The blue line is the number of interested users for the selected values in the filters.
The graph should enable filtering by parameters marked in red.
What you see above is a simplified form to clarify the subject. In fact, there are many more users. Every month, the table grows by several hundred thousand records. The SQL query that retrieves the data feeding the chart takes up to 50 seconds. That's far too long - I can't afford it.
So, I need to create a table (table/aggregation/data mart) into which I can insert the previously calculated number of interested users for all combinations. Thanks to this, the end user will not have to wait for the data to be counted.
Now the question is - how to create such a table in PostgreSQL?
I know how to write a SQL query that will calculate a specific example:
SELECT
    month,
    count(DISTINCT user_id) AS interested_users
FROM
    user_preferences
WHERE
    month BETWEEN '2020-01' AND '2020-03'
    AND city = 'Madrid'
    AND district = 'Latina'
    AND rooms IN (1, 2)
    AND price_max BETWEEN 400001 AND 500000
GROUP BY
    1
The question is - how do I calculate all possible combinations? Can I write multiple nested loops in SQL?
The topic is extremely important to me, and I think it will also be useful to others in the future.
I will be extremely grateful for any tips.
Well, based on your query, you have the following filters:
month
city
district
rooms
price_max
You can try creating a view with the following structure:
SELECT month
     , city
     , district
     , rooms
     , price_max
     , count(DISTINCT user_id) AS interested_users
FROM user_preferences
GROUP BY month
       , city
       , district
       , rooms
       , price_max
You can make this view materialized, so the query behind the view will not be executed each time it is queried. It will behave like a table.
When you are adding new records to the base table, you will need to refresh the view (unfortunately, PostgreSQL does not support auto-refresh like some other databases):
REFRESH MATERIALIZED VIEW my_view;
or you can schedule a task.
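A sketch of that setup, using the column names from your query (the view name user_preference_counts is arbitrary):
CREATE MATERIALIZED VIEW user_preference_counts AS
SELECT month,
       city,
       district,
       rooms,
       price_max,
       count(DISTINCT user_id) AS interested_users
FROM user_preferences
GROUP BY month, city, district, rooms, price_max;
-- a unique index over the grouping columns allows
-- REFRESH MATERIALIZED VIEW CONCURRENTLY
CREATE UNIQUE INDEX ON user_preference_counts (month, city, district, rooms, price_max);
-- after each monthly load (e.g. from cron or the pg_cron extension):
REFRESH MATERIALIZED VIEW CONCURRENTLY user_preference_counts;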
If you are only doing exact matches on each field, this will work. But in your example, you have criteria like:
month BETWEEN '2020-01' AND '2020-03'
AND rooms IN (1,2)
AND price_max BETWEEN 400001 AND 500000
In such cases, I usually write the same query but SUM the data from the materialized view. In your case, however, you are using DISTINCT, and this may lead to counting a user multiple times.
If this is an issue, you would need to precalculate too many combinations, and I doubt that is the answer. Alternatively, you can try to normalize your data - this will improve the performance of the aggregations.
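For completeness, here is what the roll-up over the sketched view might look like for an additive measure; as noted above, with DISTINCT users this can count the same user once per matching bucket:
SELECT month,
       SUM(interested_users) AS interested_users
FROM user_preference_counts
WHERE month BETWEEN '2020-01' AND '2020-03'
  AND city = 'Madrid'
  AND district = 'Latina'
  AND rooms IN (1, 2)
  AND price_max BETWEEN 400001 AND 500000
GROUP BY month;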

stock table for inventory database design architecture

I am currently building a management system. Is it good practice to create a balance table for inventory that stores the stock on hand and is constantly updated when there are changes, or should one just directly query the total from the inventory-ordered table minus the total from the inventory-used table? Which would be the most efficient and fastest way to do this?
It is likely a bad idea to use two separate tables, since you would have to perform an unnecessary join. Simply have one table with an 'ordered' column and a 'used' column. In your query you can then very efficiently calculate the net value, e.g.:
SELECT ordered, used, (ordered - used) as net FROM inventory

Efficient sliding window sum over a database table

A database has a transactions table with columns: account_id, date, transaction_value (signed integer). Another table (account_value) stores the current total value of each account, which is the sum of all transaction_values per account. It is updated with a trigger on the transactions table (i.e., INSERTs, UPDATEs and DELETEs to transactions fire the trigger to change the account_value.)
A new requirement is to calculate the account's total transaction value only over the last 365 days. Only the current running total is required, not previous totals. This value will be requested often, almost as often as the account_value.
How would you implement this "sliding window sum" efficiently? A new table is ok. Is there a way to avoid summing over a year's range every time?
This can be done with standard windowing functions:
SELECT account_id,
sum(transaction_value) over (partition by account_id order by date)
FROM transactions
The order by inside the over() clause makes the sum a running ("sliding") sum.
For the "only the last 365 days" part, you'd need a second query that limits the rows in the WHERE clause.
The above works in PostgreSQL, Oracle, DB2 and (I think) Teradata. SQL Server does not support the order by in the window definition (the upcoming Denali version will, AFAIK).
As simple as this?
SELECT
    SUM(transaction_value), account_id
FROM
    transactions t
WHERE
    t.DATE >= DATEADD(year, -1, GETDATE())            -- SQL Server, Sybase
    -- t.DATE >= DATE_SUB(NOW(), INTERVAL 12 MONTH)   -- MySQL
GROUP BY
    account_id;
You may want to remove the time component from the date expressions, using DATE() in MySQL or the equivalent in SQL Server.
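In PostgreSQL (used elsewhere in this thread, though not named in this answer), the equivalent filter can compare the date column directly, which avoids the time-component issue; a sketch:
SELECT
    account_id,
    SUM(transaction_value) AS yearly_value
FROM
    transactions t
WHERE
    t.date >= current_date - 365   -- date arithmetic, no time component
GROUP BY
    account_id;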
If queries of the transactions table are more frequent than inserts to the transactions table, then perhaps a view is the way to go?
You are going to need a one-off script to populate the existing table with values for the preceding year for each existing record - it will need to sum over the whole of the previous year for each record generated.
Once the rolling-year column is populated, one alternative to summing the previous year each time would be to derive each new record's value as: the previous record's rolling-year value, plus the transaction value(s) since the last update, minus the transaction values between one year before the last update and one year ago from now.
I suggest trying both approaches against realistic test data to see which performs better - I would expect summing the whole year to perform at least as well where data is relatively sparse, while the difference method may work better if data is frequently updated on each account.
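A rough sketch of that difference method in PostgreSQL, assuming account_value carries illustrative columns rolling_year_value and last_updated (neither is given in the question):
-- new rolling value = old value
--   + transactions since the last update
--   - transactions that have aged out of the 365-day window
UPDATE account_value AS av
SET rolling_year_value = av.rolling_year_value
      + COALESCE((SELECT SUM(t.transaction_value)
                  FROM transactions t
                  WHERE t.account_id = av.account_id
                    AND t.date > av.last_updated), 0)
      - COALESCE((SELECT SUM(t.transaction_value)
                  FROM transactions t
                  WHERE t.account_id = av.account_id
                    AND t.date > av.last_updated - 365
                    AND t.date <= current_date - 365), 0),
    last_updated = current_date;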
I'll avoid any actual SQL here as it varies a lot depending on the variety of SQL that you are using.
You say that you have a trigger to maintain the existing running total.
I presume that it also (or perhaps a nightly process) creates new daily records in the account_value table. Then INSERTs, UPDATEs and DELETEs fire the trigger to add or subtract from the existing running total?
The only changes you need to make are:
- add a new field, "yearly_value" or something
- have the existing trigger update that in the same way as the existing field
- use gbn's type of answer to create today's records (or however far you backdate)
- but initialise each new daily record in a slightly different way...
When you insert a new row for a new day, it should be initialised to yesterday's value minus the value from 365 days ago. After that, the behavior should be identical to what you're already used to.
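A sketch of that initialisation step, assuming account_value keeps one row per account per day with illustrative columns day, running_total and yearly_value:
-- create today's row from yesterday's, dropping the day that just
-- aged out of the 365-day window
INSERT INTO account_value (account_id, day, running_total, yearly_value)
SELECT y.account_id,
       current_date,
       y.running_total,
       y.yearly_value
       - COALESCE((SELECT SUM(t.transaction_value)
                   FROM transactions t
                   WHERE t.account_id = y.account_id
                     AND t.date = current_date - 365), 0)
FROM account_value y
WHERE y.day = current_date - 1;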

MS Visio Category relationships to SQL Server

I'm using MS Visio to model a database and part of the model contains Transaction categories - the parent table has a transactionId, timestamp, amount, and transactionType. There are three child tables - cheque, bank transfer, and credit card, all related to the parent by transactionId.
Is there a specific way this kind of relationship is implemented in SQL Server, or is it just a conceptual model that leaves the implementation up to me? If the latter, why have a transactionType column in the parent table if the tables are all related by transactionId - is it just to narrow my queries? That is, if a row in the parent table specifies "cheque" as the transactionType, I know that I only have to query/join the cheque child table?
It just occurred to me - is this just an ISA hierarchy, in which case I'd create three distinct tables, each containing the columns identified in the ISA parent entity?
This is essentially multiple-table inheritance, although you can model it in the domain as a simple reference relationship if you want.
There are many good reasons to have the selector field/property. The obvious one is so an application or service gets a hint as to how to load the details, so it doesn't have to load every conceivable row from every conceivable table (try this when you have 20 different types of transactions).
Another reason is that much of the time the end user doesn't necessarily need to know the details of a transaction, but does need to know the type. If you're looking at an A/R report from some financial or billing system, most of the time all you need to know for a basic report is the previous balance, amount, subsequent balance, and the transaction type. Without that information, it's very hard to read. The ledger doesn't necessarily show the details for every transaction, and some systems may not even track the details at all.
The most common alternative to this type of model is a single table with a whole bunch of nullable columns for each different transaction type. Although I personally despise this model, it's a requirement for many Object-Relational Mappers that only support single-table inheritance. That's the only other way you'd want (or not want) to model this in a database.
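For contrast, a sketch of that single-table shape with nullable per-type columns (all names invented for illustration):
CREATE TABLE transactions (
    transactionId   int PRIMARY KEY,
    transactionType varchar(20) NOT NULL,
    amount          decimal(18,2) NOT NULL,
    chequeNumber    varchar(20) NULL,   -- cheque only
    sourceAccount   varchar(34) NULL,   -- bank transfer only
    targetAccount   varchar(34) NULL,   -- bank transfer only
    cardLastFour    char(4) NULL        -- credit card only
);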
The transactionType in the parent table is useful if you'd like to query over all transactions, for example to sum the amounts per transaction type:
select transactionType, sum(amount)
from transactions
group by transactionType
Without the column, you could still do that by querying on the child tables:
select
    case when c.transactionId is not null then 'CHEQUE'
         when cc.transactionId is not null then 'CREDIT CARD'
         ...
    end
    , sum(amount)
from transactions t
left join cheque c on t.transactionId = c.transactionId
left join creditcard cc on t.transactionId = cc.transactionId
...
group by
    case when c.transactionId is not null then 'CHEQUE'
         when cc.transactionId is not null then 'CREDIT CARD'
         ...
    end
As you can see, that's much harder, and requires extending the query for each type of transaction you add.