Trying double entry system for accounting, but getting transaction list is a problem - sql

It's been asked a few times, but more about design. I'm trying to make a little home finance application to learn some new technologies. I've gone with a single double row double entry system design.
So, a Journal is the root transaction, which has 2 or more Transactions. An Account can be either a Third Party (I paid my mate some cash, I got paid by my company, I paid a restaurant for dinner) or a bank account (I paid money from my credit card)
So, I can Transfer between two accounts.
(I transferred $200 from my Current Account, to my Credit Card account)
I can receive money.
(I got paid by My Company, into my Current Account.)
I can pay someone.
(I paid the restaurant $20 for dinner)
So, let's use the last example. I use my Current Account to pay KFC.
I have two accounts to deal with: KFC, and my current account.
I create a Journal for this transaction.
And then for that Journal, I create two transactions.
First one, is an amount of -20, and the Account is my bank account.
Second one is an amount of +20, and the account is KFC.
I can quickly get the balance of my bank account.
SELECT SUM(Amount) FROM Journal WHERE AccountID = MyBankAccountId
Perfect.
But I have an issue.
How do I show a transaction list for my bank account?
So I want to get a 'Statement' for AccountId 1
But from this, I can't really tell to whom I paid, or was paid. It only shows the lines related to Account 1.
The data above shows the transactions against the account I care about, but .. I need to show the OTHER account details too. So I need to somehow get 'SourceAccountId' and 'DestinationAccountId'.
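For what it's worth, the "other side" of each line can be reached by joining the line table to itself on the journal id. Below is a minimal sketch in Python with an in-memory SQLite database. The table name JournalLine (standing in for the Transaction table, since TRANSACTION is a reserved word), the column names, and the sample rows are all assumptions made for illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one Journal has two or more JournalLine rows.
conn.executescript("""
CREATE TABLE Account (AccountId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Journal (JournalId INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE JournalLine (
    LineId    INTEGER PRIMARY KEY,
    JournalId INTEGER REFERENCES Journal(JournalId),
    AccountId INTEGER REFERENCES Account(AccountId),
    Amount    NUMERIC
);
INSERT INTO Account VALUES (1, 'Current Account'), (2, 'KFC'), (3, 'My Company');
INSERT INTO Journal VALUES (1, 'Dinner'), (2, 'Salary');
INSERT INTO JournalLine (JournalId, AccountId, Amount) VALUES
    (1, 1, -20), (1, 2, 20),
    (2, 3, -1000), (2, 1, 1000);
""")

# Statement for one account: its own lines, plus the name of the account
# on the other line(s) of the same journal.
STATEMENT_SQL = """
SELECT mine.JournalId, other_acct.Name AS Counterparty, mine.Amount
FROM JournalLine AS mine
JOIN JournalLine AS other
  ON other.JournalId = mine.JournalId AND other.AccountId <> mine.AccountId
JOIN Account AS other_acct ON other_acct.AccountId = other.AccountId
WHERE mine.AccountId = ?
ORDER BY mine.JournalId
"""
statement = conn.execute(STATEMENT_SQL, (1,)).fetchall()

# The balance query is unchanged by the join approach.
balance = conn.execute(
    "SELECT SUM(Amount) FROM JournalLine WHERE AccountId = ?", (1,)
).fetchone()[0]

print(statement)
print(balance)
```

Note that a journal with more than two lines returns one row per pairing, so split journals would need grouping on JournalId before display.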
Is my design wrong, and I need to go to one line transactions, with a Source and Destination account id in the same row?
The issue I have with that is - I want to be able to assign budgets to my transactions. For example, I was going to add a BudgetId to the Transaction table, allowing me apportion the amounts to different budgets or categories.
(I went to the hardware store and spent $25. Of that $25, $10 was for my "garden" budget, and $15 was for my "internal home decoration" budget.)
So my transaction line might have 3 rows.
1 for the credit to the hardware store account for $25.
1 for a debit to my bank account for $10, with a budget Id of my 'Garden Budget'.
1 for a debit to my bank account for $15 with a budget Id of my 'Internal home decoration' budget.
This may be an issue too, as I will then get two 'lines' on my statement for my bank account... One for $15 and one for $10. Maybe I can go one line transaction, but then ... an extra table for the budget apportioning?
That would make getting a balance for an account more tricky. I'd need to check SourceAccountId and DestinationAccountId.
Any ideas would be great.

I've worked on and built some A/R systems before; it's... interesting, to say the least. Typically the amount is recorded as a positive number regardless of the kind of payment ($100 moved somewhere at the transaction level; the direction is irrelevant there). So apart from storing +/- in the amount, I think your transaction table is fine as it is. You might want to add a memo field, though, so you can record what the transaction was for (e.g. if you pay the electric bill, you can record that this transaction was for the "June '19 bill").
The journal table is where the magic happens.
There's two models I've seen, both work:
ID - PKey - Integer - IDENTITY
TransactionID - FKey - Integer - Indicates the parent Transaction
Amount - Decimal - The amount of the journal entry
AccountID - FKey - integer - Indicates the Account the journal entry is for
NetEffect - Bit - Flag which indicates if the item is a Credit (True) or Debit (False)
OR...
ID - PKey - Integer - IDENTITY
TransactionID - FKey - Integer - Indicates the parent Transaction
DebitAmount - Decimal - Amount of Debit
CreditAmount - Decimal - Amount of Credit
DebitAccountID - FKey - Integer - Indicates the Debit account for the item
CreditAccountID - FKey - Integer - Indicates the Credit account for the item
Both would then use Sum(Debits) - Sum(Credits) to get the balance.
The first one should be self-explanatory. The second one is closer to a double-entry form. You would use either the Credit OR the Debit fields for an entry, but not both. But it also makes your budget calculations easier.
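As a rough illustration of the first model, here is a sketch in Python with an in-memory SQLite database. The table name JournalEntry, the account ids (10 for the store, 20 and 30 for the two budgets) and the sample rows are assumptions for the example only; NetEffect follows the convention above (1 = Credit, 0 = Debit).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for the first model: one amount plus a credit/debit flag.
conn.executescript("""
CREATE TABLE JournalEntry (
    ID            INTEGER PRIMARY KEY,
    TransactionID INTEGER,
    Amount        NUMERIC,
    AccountID     INTEGER,
    NetEffect     INTEGER  -- 1 = Credit, 0 = Debit
);
-- A $25 purchase: credit the store, debit two budget accounts.
INSERT INTO JournalEntry (TransactionID, Amount, AccountID, NetEffect) VALUES
    (1, 25, 10, 1),
    (1, 10, 20, 0),
    (1, 15, 30, 0);
""")

# Balance per account as Sum(Debits) - Sum(Credits).
balances = conn.execute("""
SELECT AccountID,
       SUM(CASE WHEN NetEffect = 0 THEN Amount ELSE 0 END)
     - SUM(CASE WHEN NetEffect = 1 THEN Amount ELSE 0 END) AS Balance
FROM JournalEntry
GROUP BY AccountID
ORDER BY AccountID
""").fetchall()

# Sanity check: within a transaction, debits and credits should match.
debits, credits = conn.execute("""
SELECT SUM(CASE WHEN NetEffect = 0 THEN Amount ELSE 0 END),
       SUM(CASE WHEN NetEffect = 1 THEN Amount ELSE 0 END)
FROM JournalEntry
WHERE TransactionID = 1
""").fetchone()

print(balances)
print(debits, credits)
```

The same query shape should carry over to the second model by summing DebitAmount and CreditAmount directly instead of switching on the flag.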
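The Transaction would record $25. The journal entries then record like this (columns in the order of the second model: ID, TransactionID, DebitAmount, CreditAmount, DebitAccountID, CreditAccountID):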
1 -- 1 -- 00 -- 25 -- null -- {account number for garden store}
2 -- 1 -- 10 -- 00 -- {Account for Garden Budget} -- null
3 -- 1 -- 15 -- 00 -- {Internal home Budget} -- null
To allocate something to the budget:
4 -- 2 -- 25 -- 00 -- null -- {Account for Garden Budget}
5 -- 2 -- 25 -- 00 -- null -- {Internal home Budget}
6 -- 2 -- 00 -- 50 -- {Your checking account} -- null
That moves $50 into your two budgets, 25 for one, 25 for the other, and deducts it from your account.
Then you get paid:
7 -- 3 -- 1000 -- 00 -- null -- {Checking account number}
8 -- 3 -- 00 -- 1000 -- {Account for employer} -- null
Hopefully that all made sense.

Related

Only Show unique Customers per date cohort for repeat purchase rate

Scenario:
I have a table that has all of the customer purchases by Month and each month has a period. Within that table I am showing the customers that have made purchases in each Month/Period. What I am trying to figure out is how to exclude any customer that made a purchase in the previous month so that the repeat purchases are only for unique customers. The data looks like the following:
customer_email  cohortMonth  month_number  orders_for_period
abc@gmail.com   10/2019      0             2
def@gmail.com   10/2019      0             1
ghi@gmail.com   10/2019      0             1
def@gmail.com   10/2019      1             1
abc@gmail.com   10/2019      1             1
def@gmail.com   10/2019      2             1
In the table above, for month_number=0 we have 3 customers in total, and within this period abc@gmail.com was the only repeat customer because they have 2 orders. This shows as a 33% repeat purchase rate for month_number 0. For month_number=1, 2 customers purchased again in the period, but only def@gmail.com is new to the repeat count, as abc@gmail.com was already counted. This brings the repeat rate to 66%, since 2 of the 3 customers that originally purchased have now come back and purchased again. The expected output is:
cohortMonth  month_number  repeat_purchase_rate
10/2019      0             33%
10/2019      1             66%
10/2019      2             66%
With every unique customer that purchases in the subsequent periods we want to add that to the total to understand the repeat rate at a cumulative level.
I have tried a ton of different ways to figure this out, but backing out the customers that made purchases in a previous period and only counting the unique customers is where I am struggling. Any help is greatly appreciated!
Side Note: Whenever I format a table it looks like how I want it to look in the preview but then when I review I get the error :"Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon."
I then indent and it breaks the way the table looks. Any help on that would be great as well. Thank you
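One possible reading of the rule described above: a customer counts as a repeat customer from the first period in which their cumulative order count reaches 2, and stays counted afterwards. A minimal sketch of that reading in Python with an in-memory SQLite database follows. The table name purchases is an assumption, the cohort size is taken as the number of distinct customers in the cohort, and percentages are truncated to match the 33% / 66% figures above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (
    customer_email    TEXT,
    cohortMonth       TEXT,
    month_number      INTEGER,
    orders_for_period INTEGER
);
INSERT INTO purchases VALUES
    ('abc@gmail.com', '10/2019', 0, 2),
    ('def@gmail.com', '10/2019', 0, 1),
    ('ghi@gmail.com', '10/2019', 0, 1),
    ('def@gmail.com', '10/2019', 1, 1),
    ('abc@gmail.com', '10/2019', 1, 1),
    ('def@gmail.com', '10/2019', 2, 1);
""")

rows = conn.execute("""
WITH months AS (
    SELECT DISTINCT cohortMonth, month_number FROM purchases
),
cohort AS (
    SELECT cohortMonth, COUNT(DISTINCT customer_email) AS cohort_size
    FROM purchases
    GROUP BY cohortMonth
),
cum AS (
    -- Orders per customer accumulated up to and including each period.
    SELECT m.cohortMonth, m.month_number, p.customer_email,
           SUM(p.orders_for_period) AS orders_to_date
    FROM months m
    JOIN purchases p
      ON p.cohortMonth = m.cohortMonth AND p.month_number <= m.month_number
    GROUP BY m.cohortMonth, m.month_number, p.customer_email
)
SELECT c.cohortMonth, c.month_number,
       SUM(CASE WHEN c.orders_to_date >= 2 THEN 1 ELSE 0 END) AS repeaters,
       k.cohort_size
FROM cum c
JOIN cohort k ON k.cohortMonth = c.cohortMonth
GROUP BY c.cohortMonth, c.month_number, k.cohort_size
ORDER BY c.cohortMonth, c.month_number
""").fetchall()

# Truncated integer percentage per period.
rates = [100 * repeaters // size for _, _, repeaters, size in rows]
print(rows)
print(rates)
```

Because each customer contributes at most one row per period to the cumulative step, nobody is counted twice, which is the "unique customers" requirement.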

SQL-sum over dynamic period

I have 2 tables: Customers and Actions, where each customer has a unique ID (which can be found in each table).
Some of the customers became club members on a specific date (which differs between customers). I'm trying to sum their purchases up to that date, and to get those who purchased more than (for example) 200 before they became club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member on 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (note that I want this customer returned because 250>200).
I tried the following:
SELECT cust.custID, SUM(purchAmount) AS totAmount
FROM customers cust
JOIN actions act
  ON cust.custID = act.custID
WHERE act.clubMember = 1
  AND cust.purchDate < act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount > 200;
Is this the right way to "attack" this question, or should I use something like a while loop over the clubMemberDate (which, to tell the truth, I don't know how to do)?
I'm working with Teradata.
Your help will be appreciated.
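A set-based join with the membership date in the filter should be enough here, with no loop needed. The sketch below uses Python with an in-memory SQLite database as a stand-in for Teradata. It assumes that clubMember and clubMemberDate live in Customers and that purchDate and purchAmount live in Actions (the question does not say which table holds which column), and customers 2 and 3 are made-up rows added for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column placement; dates are ISO strings so they compare correctly as text.
conn.executescript("""
CREATE TABLE customers (custID INTEGER PRIMARY KEY, clubMember INTEGER, clubMemberDate TEXT);
CREATE TABLE actions (custID INTEGER, purchDate TEXT, purchAmount INTEGER);
INSERT INTO customers VALUES
    (1, 1, '2015-12-25'),
    (2, 1, '2015-06-01'),
    (3, 0, NULL);
INSERT INTO actions VALUES
    (1, '2015-05-12', 100),
    (1, '2015-07-12', 150),
    (1, '2015-12-29', 320),
    (2, '2015-03-01', 50),
    (2, '2015-07-01', 500),
    (3, '2015-01-01', 1000);
""")

# Sum only the purchases made before each member's own join date.
big_spenders = conn.execute("""
SELECT c.custID, SUM(a.purchAmount) AS totAmount
FROM customers c
JOIN actions a ON a.custID = c.custID
WHERE c.clubMember = 1
  AND a.purchDate < c.clubMemberDate
GROUP BY c.custID
HAVING SUM(a.purchAmount) > 200
ORDER BY c.custID
""").fetchall()

print(big_spenders)
```

The cutoff date is compared row by row, so each customer gets their own "dynamic" period without any procedural code.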

sum not calculating correct no. of units in SQL command

I have the following SQL script(of which the result is displayed under the script). The issue I am having is that I need to add up the quantity on the invoice. The quantity works fine when all the products on the invoice are different. When there is a product that appears twice on the invoice, the result is incorrect. Any help appreciated.
The DISTINCT keyword acts on all columns you select.
A new product introduces a difference which makes it no longer distinct. Hence the extra row(s).
Where you had:
Order Product Total
1 Toaster $10
2 Chair $20
And another item is added to order 1:
Order Product Total
1 Toaster $99
1 Balloon $99 -- Yes that's a $89 balloon!
2 Chair $20
The new row (balloon) is distinct and isn't reduced into the previous row (toaster).
To make it distinct again, don't select the product name:
Order Total
1 $99
2 $20
Uniqueness kicks in and everyone's happy!
If you can remove the column from the select list that's "different", you should get the results you need.
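The effect can be reproduced with the toaster and balloon rows from above. This is only a sketch in Python with an in-memory SQLite database; the table name order_lines and the column order_id (used because ORDER is a reserved word) are assumptions, since the original script is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_lines (order_id INTEGER, product TEXT, total INTEGER);
INSERT INTO order_lines VALUES
    (1, 'Toaster', 99),
    (1, 'Balloon', 99),
    (2, 'Chair',   20);
""")

# DISTINCT looks at every selected column, so the product name keeps both rows.
with_product = conn.execute(
    "SELECT DISTINCT order_id, product, total FROM order_lines ORDER BY order_id, product"
).fetchall()

# Dropping the product column lets the two rows of order 1 collapse into one.
without_product = conn.execute(
    "SELECT DISTINCT order_id, total FROM order_lines ORDER BY order_id"
).fetchall()

print(with_product)
print(without_product)
```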

sql - dynamically populate columns in a result set based on another column

We want to determine the amount of assets that customers have insured or uninsured.
Basically the insured amount is £85K, and we apportion that across several account types held by the customer in order of the account's liquidity. So, for example, a current account is more liquid than a 95 day notice account, therefore the current account would receive the money first...
We have the below code that generates the priorities based on the account type and of course we could now loop through the table and determine the insured amount for each account type for each customer, and anything non-insured would become the uninsured amount for that customer.
But is there a way to do this in the main select statement?
declare @Priorities TABLE
(
    [refID] [int] IDENTITY(1,1) NOT NULL,
    acno nvarchar(10),
    Suffix nvarchar(6) null,
    Unit nvarchar(5) null,
    CT nvarchar(2) null,
    LOB nvarchar(2) null,
    Participants int null,
    VAL float null,
    xPriority int null,
    insuredAmt float null,
    uninsuredAmt float null
)
declare @Count int=0
declare @I int=0
insert into @Priorities
select acno, suffix, unit, CT, LOB, Participants, Val, xPriority =
    case
        when LOB BETWEEN 'GA' AND 'GN' then 1 -- Current accounts
        when LOB IN ( 'HY' , 'H1') then 2 -- Call
        when CHARINDEX(LOB, 'HA; HC; HM; HQ; IA; IB; IC; IE;IG; IU;')>0 then 3 -- Instant Access
        when LOB='HO' then 4 -- 14 Day notice
        when CHARINDEX(LOB, 'DN; HG; HK; HL; HP; HS; HU; IV; IW;')>0 then 5 -- 35-Day notice
        when LOB IN ('DN', 'HV') then 6 -- 95-Day notice (note: 'DN' is already matched by priority 5 above)
        when (LOB BETWEEN 'IS' AND 'IT') or (LOB between 'JA' and 'JZ' ) then 7 -- Residual Deals
    end,
    insuredAmt=null,
    uninsuredAmt=null
from actdata
where datadate = '20131212' /*@processdate*/
    and CHARINDEX(CT, 'AA;AC;AE;AG;BA;BC;BE;BG;CA;DE;DG;ZA;')=0
    and LOB BETWEEN 'GA' AND 'KZ' And val>0
    and participants is not null
order by acno
What I'd like to do is calculate the insured amount for each record based on the value of the column xPriority.
Also, we multiply the total insured amount by the number of participants in the account, so if there are 5 participants in the account, then the total insured amount for them all would be 5*85000 = £425000.
The alternative would be a loop or a cursor.
This would be run on a daily basis when the data is brought into the reporting database from the iSeries system by another process.
Thanks
Philip
I believe the answer is 'No', but that's based on a particular interpretation of:
"Also, we multiply the total insured amount by the number of participants in the account, so if there are 5 participants in the account, then the total insured amount for them all would be 5*85000 = £425000."
Without multiple participants in an account, I think it would be possible to order the SELECT statement by the customer ID and the CASE statement for xPriority, keep a decreasing running total (down from 85K and resetting for each customer) and calculate insuredAmt and uninsuredAmt accordingly. This would tell you how each individual's insured amount was apportioned across their accounts.
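To make the single-participant case concrete, here is a small sketch of the decreasing running total in plain Python rather than SQL. The function name apportion, the tuple layout (customer, priority, value) and the sample figures are all assumptions for illustration; only the £85K limit and the "most liquid first" ordering come from the question.

```python
from itertools import groupby

LIMIT = 85_000  # insured amount per customer, from the question


def apportion(accounts, limit=LIMIT):
    """Spread each customer's limit over their accounts, lowest priority number first.

    accounts: iterable of (customer, priority, value) tuples (hypothetical layout).
    Returns a list of (customer, priority, value, insured, uninsured).
    """
    result = []
    ordered = sorted(accounts, key=lambda a: (a[0], a[1]))
    for customer, rows in groupby(ordered, key=lambda a: a[0]):
        remaining = limit  # the running total resets for each customer
        for _, priority, value in rows:
            insured = min(value, remaining)
            remaining -= insured
            result.append((customer, priority, value, insured, value - insured))
    return result


rows = apportion([
    ("C1", 4, 60_000),   # 14 day notice account
    ("C1", 1, 40_000),   # current account, covered first
    ("C2", 1, 100_000),  # single account above the limit
])
for row in rows:
    print(row)
```

A windowed running sum in SQL could express the same idea; the loop form is shown only because it is easier to test against hand-worked cases.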
However, with multiple participants you have to resolve the following problem:
Mary has a current account of £40k and a 30 day notice account with £20k.
Bob has a current account of £50k and a 30 day notice account with £20k.
Mary and Bob also have a joint, 14 day notice account of £60k.
The two current accounts are fully insured, as is the 14 day notice account (Mary has £45k and Bob £35k allowance left). The problem is that the coverage of the two 30 day accounts depends on which individual's insurance is apportioned to the joint account (all of Mary's plus a bit of Bob's; all of Bob's plus a bit of Mary's; £30k of each; 9:7 ratio from Mary:Bob?). I don't think this problem is even resolvable in logical terms without reference to the full terms and conditions of the accounts! When it comes to programming it, I'd advise extreme caution. Start by writing inefficient code you can test thoroughly on such cases, then use that as a benchmark for checking code with enhanced efficiency.
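The ambiguity can be shown with a few lines of arithmetic. The sketch below is plain Python; the function name thirty_day_cover and the idea of passing in "how much of the joint account is charged to Mary's allowance" are assumptions used only to compare the splits listed above.

```python
LIMIT = 85_000   # per-person insured amount
JOINT = 60_000   # joint 14 day notice account


def thirty_day_cover(mary_share_of_joint):
    """Return (Mary's, Bob's) insured amount on their 30 day accounts.

    mary_share_of_joint is the part of the joint balance charged to Mary's
    allowance; the rest is charged to Bob's. Hypothetical helper.
    """
    bob_share = JOINT - mary_share_of_joint
    mary_left = LIMIT - 40_000 - mary_share_of_joint  # after her current account
    bob_left = LIMIT - 50_000 - bob_share              # after his current account
    if mary_left < 0 or bob_left < 0:
        raise ValueError("split exceeds one person's remaining allowance")
    return min(20_000, mary_left), min(20_000, bob_left)


all_of_mary = thirty_day_cover(45_000)  # all of Mary's plus a bit of Bob's
all_of_bob = thirty_day_cover(25_000)   # all of Bob's plus a bit of Mary's
even_split = thirty_day_cover(30_000)   # £30k of each
print(all_of_mary, all_of_bob, even_split)
```

In all three cases £20k ends up uninsured in total, but which person carries it changes with the split, which is why the rule has to come from the account terms rather than from the code.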
Good luck!

Optimal selection for ordering multiple items (parts) from multiple suppliers (vendors)

The task here is to define the optimal (as detailed below) way of ordering items (parts) from suppliers.
The relevant parts of the table schema (with some sample data) are
Items
ID NUMBER
1 Item0001
2 Item0002
3 Item0003
Suppliers
ID NAME DELIVERY DISCOUNT
1 Supplier0001 0 0
2 Supplier0002 0 0.025
3 Supplier0003 20 0
DELIVERY is the delivery charge (in dollars) levied by that supplier on each delivery. DISCOUNT is the settlement discount (as a percentage i.e. 2.5% for ID=2 above) allowed by that supplier for on time payment.
SupplierItems
SUPPLIER_ID ITEM_ID PRICE
1 2 21.67
1 5 45.54
1 7 32.97
This is the many-to-many join between suppliers and items with the price that supplier charges for that item (in dollars). Every item has at least 1 supplier but some have more than one. A supplier may have no items.
PartsRequests
ID ITEM_ID QUANTITY LOCATION_ID ORDER_ID
1 59 4 2 (null)
2 89 5 2 (null)
3 42 4 2 (null)
This table is a request from a field site for parts to be ordered and delivered by the supplier to that site. A delivery of any number of items to a site attracts a delivery charge. When the parts are ordered, the ORDER_ID is inserted into the table so we are only concerned with those where ORDER_ID IS NULL
The question is: what is the optimal way to order these parts for each LOCATION? There are 3 optimal solutions that need to be presented to the user for selection:
1. The combination of orders with the least number of suppliers.
2. The combination of orders with the lowest total cost, i.e. the sum of QUANTITY*PRICE for each item plus the DELIVERY for each order, summed over all orders, ignoring DISCOUNT.
3. As item 2, but accounting for DISCOUNT.
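To pin down the cost definition in items 2 and 3, here is a sketch of how one candidate combination could be priced, in plain Python. The function name plan_cost and the dictionary layouts are assumptions, the prices are made up, and applying DISCOUNT to the whole order including delivery is a guess; the supplier delivery and discount values and the request quantities are taken from the sample data above.

```python
from collections import defaultdict


def plan_cost(assignment, requests, prices, suppliers, use_discount=False):
    """Total cost of one candidate plan for a single location.

    assignment: item_id -> supplier_id chosen for that item
    requests:   item_id -> quantity requested
    prices:     (supplier_id, item_id) -> unit price
    suppliers:  supplier_id -> (delivery charge, discount as a fraction)
    """
    goods = defaultdict(float)
    for item_id, quantity in requests.items():
        supplier_id = assignment[item_id]
        goods[supplier_id] += quantity * prices[(supplier_id, item_id)]

    total = 0.0
    for supplier_id, subtotal in goods.items():
        delivery, discount = suppliers[supplier_id]
        order_total = subtotal + delivery  # one delivery charge per order
        if use_discount:
            # Assumption: the settlement discount applies to the whole order.
            order_total *= 1 - discount
        total += order_total
    return total


suppliers = {1: (0, 0.0), 2: (0, 0.025), 3: (20, 0.0)}
requests = {59: 4, 89: 5}
prices = {(1, 59): 10.0, (2, 59): 9.0, (2, 89): 4.0, (3, 89): 3.0}  # invented prices

one_supplier = plan_cost({59: 2, 89: 2}, requests, prices, suppliers)
two_suppliers = plan_cost({59: 2, 89: 3}, requests, prices, suppliers)
discounted = plan_cost({59: 2, 89: 2}, requests, prices, suppliers, use_discount=True)
print(one_supplier, two_suppliers, discounted)
```

With these invented prices, the cheaper unit price from supplier 3 is outweighed by its delivery charge, which is exactly the interaction that makes the combinations matter.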
Clearly I need to determine the combinations of orders that are available and then determining the optimal ones becomes trivial but I am a bit stuck on an efficient way to deal with building the combinations.
I have built some SQL fiddles in SQL Server 2008 with random data. This one has 100 items, 10 suppliers and 100 requests. This one has 1000 items, 50 suppliers and 250 requests. The table schema is the same.
Update
I reasoned that the solution had to be recursive and I built a nice table-valued function to get the combinations, but I ran into the hard limit of 32 on recursion in SQL Server. I was uncomfortable with it anyway because it hinted more at a procedural-language solution than an RDBMS one.
So I am now playing with CTE recursion.
The root query is:
SELECT DISTINCT
'' SOLUTION_ID
,LOCATION_ID
,SUPPLIER_ID
,(subquery I haven't quite worked out) SOLE_SUPPLIER
FROM PartsRequests pr
INNER JOIN
SupplierItems si ON pr.ITEM_ID=si.ITEM_ID
WHERE pr.ORDER_ID IS NULL
This gets all the suppliers that can supply the required items and is certainly a solution, probably not optimal. The subquery sets a flag if the supplier is the sole supplier of any product required for that location; if so they must be part of any solution.
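One way the SOLE_SUPPLIER flag might be written is with a correlated EXISTS / NOT EXISTS pair. The sketch below runs it in Python against an in-memory SQLite database rather than SQL Server 2008, so treat it as an untested idea for the real schema. The sample rows are invented: item 59 has two suppliers, item 89 has only supplier 2, and item 42 is already ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SupplierItems (SUPPLIER_ID INTEGER, ITEM_ID INTEGER, PRICE REAL);
CREATE TABLE PartsRequests (
    ID INTEGER PRIMARY KEY, ITEM_ID INTEGER, QUANTITY INTEGER,
    LOCATION_ID INTEGER, ORDER_ID INTEGER
);
INSERT INTO SupplierItems VALUES (1, 59, 10.0), (2, 59, 9.0), (2, 89, 4.0), (3, 42, 1.0);
INSERT INTO PartsRequests VALUES
    (1, 59, 4, 2, NULL),
    (2, 89, 5, 2, NULL),
    (3, 42, 4, 2, 7);
""")

root_rows = conn.execute("""
SELECT DISTINCT
    pr.LOCATION_ID,
    si.SUPPLIER_ID,
    CASE WHEN EXISTS (
        -- an open request at this location for an item this supplier sells ...
        SELECT 1
        FROM PartsRequests pr2
        JOIN SupplierItems si2 ON si2.ITEM_ID = pr2.ITEM_ID
        WHERE pr2.ORDER_ID IS NULL
          AND pr2.LOCATION_ID = pr.LOCATION_ID
          AND si2.SUPPLIER_ID = si.SUPPLIER_ID
          -- ... that nobody else sells
          AND NOT EXISTS (
              SELECT 1
              FROM SupplierItems si3
              WHERE si3.ITEM_ID = pr2.ITEM_ID
                AND si3.SUPPLIER_ID <> si2.SUPPLIER_ID
          )
    ) THEN 1 ELSE 0 END AS SOLE_SUPPLIER
FROM PartsRequests pr
INNER JOIN SupplierItems si ON pr.ITEM_ID = si.ITEM_ID
WHERE pr.ORDER_ID IS NULL
ORDER BY pr.LOCATION_ID, si.SUPPLIER_ID
""").fetchall()

print(root_rows)
```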
The recursive part is to remove suppliers one by one by means of CTE.SUPPLIER_ID<>CTE.SUPPLIER_ID and add them if they still cover all the items. The SOLUTION_ID will be a CSV list of the suppliers removed, partly to uniquely identify each solution and partly to check against so I get combinations instead of permutations.
Still working on the details, the purpose of this update was to allow the Community to say "Yay, looks like that will work" or, alternatively "You moron, that won't work because ..."
Thanks
This is a more general answer (as in, not SQL), as I think solving this problem will require something more powerful. Your first scenario is to select a minimum number of suppliers. This can be seen as a set cover problem, as you are trying to cover all demands per site with the suppliers. That problem is already NP-hard (its decision version is NP-complete).
Your third scenario seems to be basically the same as the second. You just have to take the discount into account in the prices, assuming you pay on time for every order.
The second scenario is at least NP-hard as I see a lot of resemblance with the facility location problem. You are trying to decide which suppliers (facilities) to use (open) to cover your orders (demands) based on their prices and delivery costs (opening costs).
Enumerating your possible solutions seems infeasible as with 10 suppliers, you have 2^10 possibilities of using them, further complicated by the distribution of demands internally.
I would suggest some dynamic programming to first select the suppliers that you have to use (=they are the only ones that deliver a specific thing), eliminating some possibilities (if the cost for supplier A +delivery cost A< cost for supplier B) and then trying to expand your set of possible solutions. Linear programming is also a valid train of thought.
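For the first scenario, the "fix the unavoidable suppliers, then expand" idea can be sketched as below in plain Python. Note that this is a plain exhaustive search over the remaining suppliers, not dynamic or linear programming, so it is only reasonable for a handful of suppliers per location. The function name fewest_suppliers, the catalogue layout, and the sample data are assumptions for illustration.

```python
from itertools import combinations


def fewest_suppliers(needed_items, catalogue):
    """Return every smallest set of suppliers that covers all needed items.

    catalogue: supplier_id -> set of item_ids that supplier can deliver.
    """
    needed = set(needed_items)

    # Step 1: suppliers that are the only source of some item are mandatory.
    mandatory = set()
    for item in needed:
        sellers = [s for s, items in catalogue.items() if item in items]
        if not sellers:
            raise ValueError(f"no supplier for item {item}")
        if len(sellers) == 1:
            mandatory.add(sellers[0])

    covered = set().union(*(catalogue[s] for s in mandatory))
    remaining = needed - covered

    # Step 2: add optional suppliers, smallest combinations first.
    optional = sorted(s for s in catalogue if s not in mandatory)
    for size in range(len(optional) + 1):
        hits = [
            sorted(mandatory | set(extra))
            for extra in combinations(optional, size)
            if remaining <= set().union(*(catalogue[s] for s in extra))
        ]
        if hits:
            return hits
    return []


catalogue = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}, 4: {4}}
best = fewest_suppliers({1, 2, 3, 4}, catalogue)
print(best)
```

Here supplier 1 is forced because only it sells item 1, and adding supplier 3 covers the rest, so the search stops at two suppliers.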