For years, I have used the humble spreadsheet as a way of keeping track of finances, but as this spreadsheet grew in size, and as I required more types of data analyses, it eventually became clear that a spreadsheet was no longer going to cut it. Problem is, going from Calc to Base has been a slow learning process in understanding SQL (HSQLDB, specifically), particularly its syntax.
Below is an example table with some arbitrary values. We'll call it Table A. My original table has many other columns, but they are either irrelevant or I have already figured out what to do with them.
Quantity  Account Paid From  Recipient A Percentage  Recipient B Percentage
100       A                  100                     0
200       B                  0                       100
500       A                  0                       100
50        B                  100                     0
10        A                  40                      60
The idea here is that in row 1, person A paid for something intended solely for person A. Therefore, this transaction does not result in anyone owing anything to another person. The same goes for row 2, where person B paid for something intended solely for person B.
In row 3, person A paid $500 on behalf of person B. Person B now owes person A $500.
In row 4, B paid $50 for A. Subtract 50 from 500, and B now only owes A $450.
In row 5, A paid for something that is 40% theirs and 60% B's. In other words, A paid $6 out of the $10 for B. B now owes A 500 - 50 + 6 = $456.
I'm looking for something along the lines of the following:
Select all entries where Account Paid From = A
Of those entries, take the total sum of Quantity * Recipient B Percentage / 100
The result is how much B owes A
Select all entries where Account Paid From = B
Of those entries, take the total sum of Quantity * Recipient A Percentage / 100
The result is how much A owes B
Subtract A owes B from B owes A to find out who is in debt to the other (if the value is + or -), and by how much.
I guess something along the lines of:
SELECT SUM("Account Paid From" * "Recipient B Percentage / 100)
WHERE "Account Paid From" = "A" - SUM("Account Paid From" * "Recipient B Percentage / 100)
WHERE "Account Paid From" = "B" AS "Owed"
FROM "Table A"
But...you know, without syntax errors screaming at me.
For simplicity, I will use the table name "T" and the following abbreviated column names:
"Quantity" = "Q", "Account Paid From" = "APF", "Recipient A Percentage" = "RAP", "Recipient B Percentage" = "RBP"
SELECT * FROM "T" WHERE "APF" = 'A'
SELECT SUM("Q" * "RBP" / 100) AS "Sum RBP" FROM "T" WHERE "APF" = 'A'
SELECT SUM("Q" * "RAP" / 100) AS "Sum RAP" FROM "T" WHERE "APF" = 'B'
SELECT "Sum RBP" - "Sum RAP" FROM (SELECT SUM("Q" * "RBP" / 100) AS "Sum RBP" FROM "T" WHERE "APF" = 'A'), (SELECT SUM("Q" * "RAP" / 100) AS "Sum RAP" FROM "T" WHERE "APF" = 'B')
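As a cross-check of the arithmetic in the question, here is a minimal sketch that loads the five sample rows and computes the net balance in a single pass with a CASE expression instead of two subqueries. It runs on Python's built-in SQLite rather than HSQLDB, which is an assumption made purely for illustration; CASE and SUM are standard SQL, so the query should carry over, but that is untested here.

```python
import sqlite3

# In-memory SQLite stands in for HSQLDB here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "T" ("Q" INTEGER, "APF" TEXT, "RAP" INTEGER, "RBP" INTEGER)')
conn.executemany(
    'INSERT INTO "T" VALUES (?, ?, ?, ?)',
    [
        (100, "A", 100, 0),
        (200, "B", 0, 100),
        (500, "A", 0, 100),
        (50, "B", 100, 0),
        (10, "A", 40, 60),
    ],
)

# Add B's share of what A paid, subtract A's share of what B paid.
# Dividing by 100.0 avoids integer truncation.
query = """
SELECT SUM(CASE "APF"
             WHEN 'A' THEN  "Q" * "RBP" / 100.0
             WHEN 'B' THEN -"Q" * "RAP" / 100.0
             ELSE 0
           END) AS "Owed"
FROM "T"
"""
owed = conn.execute(query).fetchone()[0]
print(owed)  # positive: B owes A; negative: A owes B
```

On the sample data this gives 456, matching the hand calculation in the question.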
I have a large data set that I'm trying to export in a way I've never done before. There are dozens of columns with flags (0 or 1) to indicate whether a person has that trait. At the end each record has a total cost which sums up all money associated with that person. Sample below
ID  Visit  Stay  Treatment  Total Cost
1   0      1     1          $50
2   1      0     1          $100
I'm trying to get it into a format like so:
Visit  Stay  Treatment
1      1     2
$100   $50   $75
So the number of flags is summed up per column and the average cost is below that. Hence, there are two treatments with an average cost of $75, and there is one stay with an average cost of $50.
I've tried GROUP BY and a few other functions, but haven't been successful. Any help would be greatly appreciated!
We add up all we need and then use union all to unpivot.
select sum(visit) as visit
,sum(stay) as stay
,sum(treatment) as treatment
from t
union all
select sum(visit*total_cost)/sum(visit)
,sum(stay*total_cost)/sum(stay)
,sum(treatment*total_cost)/sum(treatment)
from t
visit  stay  treatment
1      1     2
100    50    75
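To try the union all approach locally, here is a small sketch using Python's built-in SQLite with the two sample rows from the question. The `metric` label column and the `NULLIF` guard against dividing by zero when a flag never occurs are additions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, visit INTEGER, stay INTEGER, "
    "treatment INTEGER, total_cost REAL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 0, 1, 1, 50.0), (2, 1, 0, 1, 100.0)],
)

rows = conn.execute(
    """
    SELECT 'count' AS metric, sum(visit), sum(stay), sum(treatment)
    FROM t
    UNION ALL
    SELECT 'avg_cost',
           sum(visit * total_cost)     / NULLIF(sum(visit), 0),
           sum(stay * total_cost)      / NULLIF(sum(stay), 0),
           sum(treatment * total_cost) / NULLIF(sum(treatment), 0)
    FROM t
    """
).fetchall()

# Key each output row by its label so row order does not matter.
result = {metric: (v, s, tr) for metric, v, s, tr in rows}
print(result)
```

Labelling each row makes the output self-describing, which helps once there are dozens of flag columns.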
I need to come up with a solution for the following task. The column "Loan_Term" measures the duration of the loan applied for, in months.
Applicants asking for loans with loan terms less than 120 months are given a rating of "Short" under loan tenure. Those with loan terms at least 120 months, but less than 300 months will be rated as "Medium". Applicants asking for loans with loan terms of 300 or more months, will be rated as "Long".
Write a function that takes an applicant's numerical value of months of Loan_Term as an input parameter and returns the respective customer's rating. Create a new attribute "Loan_Tenure" for every applicant in df_loans.
Display df_loans with only the "Loan_Term" and "Loan_Tenure" attributes.
My code is as follows:
df_loans = df
df_loans.loc[(df_loans.Loan_Term < 120 return "Short" ) | (df_loans.Loan_Term > 120 & < 300 return "Medium") | (df_loans.Loan_Term > 300 return "Long")]
It is wrong, and I was wondering whether there is a way to display only this criterion in the table through loc, or must I use something else?
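For this you can use NumPy's select function. Note the boundaries: less than 120 is "Short", and 120 itself is already "Medium".
import numpy as np
# A string default covers rows that match no condition (e.g. a missing Loan_Term) and keeps the result dtype consistent.
df_loans['Loan_Tenure'] = np.select([df_loans.Loan_Term < 120, (df_loans.Loan_Term >= 120) & (df_loans.Loan_Term < 300), df_loans.Loan_Term >= 300], ['Short', 'Medium', 'Long'], default='Unknown')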
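Since the task asks for a function that takes one applicant's Loan_Term and returns the rating, here is a minimal pure-Python sketch of such a function. The name `loan_tenure` and the sample values are hypothetical.

```python
def loan_tenure(months: float) -> str:
    """Map a loan term in months to the rating described in the task."""
    if months < 120:
        return "Short"
    if months < 300:   # reached only when months >= 120
        return "Medium"
    return "Long"      # 300 months or more


# Hypothetical sample terms covering both boundaries.
sample_terms = [60, 119, 120, 299, 300, 360]
ratings = [loan_tenure(m) for m in sample_terms]
print(list(zip(sample_terms, ratings)))
```

With pandas, this could be applied as `df_loans["Loan_Tenure"] = df_loans["Loan_Term"].apply(loan_tenure)` and displayed with `df_loans[["Loan_Term", "Loan_Tenure"]]`. Missing values would fall through to "Long" in this sketch, so they would need handling first.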
I have a table called Customer which contains two columns called opening_amt and receive_amt. I wish to display all customer details where the sum of opening_amt and receive_amt is greater than 15000.
select *
from Customer
where opening_amt > 15000;
works for just opening_amt; however, this query does not work:
select *
from Customer
where opening_amt and receive_amt > 15000 ;
Thanks in advance for any help.
You need to specify the amount for both columns!
select * from Customer where opening_amt > 15000 and receive_amt > 15000
or, to match "sum of opening_amt and receive_amt is greater than 15000":
select * from Customer where opening_amt + receive_amt > 15000
The two examples above do quite different things. The first returns only customers where each amount on its own is greater than 15000. The second returns customers where the sum of the two amounts is greater than 15000.
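The difference between the two conditions is easy to see on a few rows. The sketch below uses Python's built-in SQLite; the `cust_id` column and all amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (cust_id INTEGER, opening_amt INTEGER, receive_amt INTEGER)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, 16000, 16000), (2, 10000, 6000), (3, 16000, 0), (4, 5000, 5000)],
)

# Each column must exceed 15000 on its own.
both_over = [row[0] for row in conn.execute(
    "SELECT cust_id FROM Customer "
    "WHERE opening_amt > 15000 AND receive_amt > 15000 ORDER BY cust_id"
)]

# The two columns added together must exceed 15000.
sum_over = [row[0] for row in conn.execute(
    "SELECT cust_id FROM Customer "
    "WHERE opening_amt + receive_amt > 15000 ORDER BY cust_id"
)]

print(both_over, sum_over)
```

Only customer 1 passes the first filter, while customers 1, 2, and 3 pass the second.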
SQL novice here. I would like to return records from a table for an account where at least one of those records meets certain criteria.
We've got a program that should allocate cash to old invoices before new ones. It isn't doing so, and I need to find the records that have been affected.
I want to return all records for an account if there is
an open amount > 0 for a record that is older than another record for the same account whose Open Amount is zero or less than the gross amount.
So in the example below, account A1 has allocated correctly. I want my query to return all the records for accounts B2 and C3.
I think I need to use some combination of HAVING and possibly a subquery, but it's got me confused! Any help is greatly appreciated! Thank you.
UKID  ACCOUNT  DATE      OPEN_AMOUNT  GROSS_AMOUNT
1     A1       12/03/14  100          100
2     A1       12/02/14  0            150
3     B2       21/03/14  0            100
4     B2       21/02/14  100          100
5     C3       01/03/14  50           100
6     C3       01/02/14  50           100
You should be able to do this:
SELECT *
FROM YourTable
WHERE ACCOUNT IN (
SELECT DISTINCT ACCOUNT
FROM YourTable
WHERE OPEN_AMOUNT <> 0
)
You need that second query to determine all accounts that have a nonzero open amount. Then, with that list, you can get all of the records associated with those accounts.
Something like
select *
from mytable
where account in (select ext.account
                  from mytable ext
                  inner join (SELECT account, date
                              FROM mytable ext
                              WHERE OPEN_AMOUNT = 0
                                 or OPEN_AMOUNT < gross_amount) sub
                          on ext.account = sub.account
                         and ext.date < sub.date
                  where open_amount > 0)
should do.
If you need something more like a chain control, it'll be more difficult.
Can you please specify the database you are using? Some db specific feature can actually help a lot.
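To check the self-join idea against the sample data, here is a sketch on Python's built-in SQLite. The table name `invoices` and the column `txn_date` are placeholders. The dates are assumed to be DD/MM/YY and are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (ukid INTEGER, account TEXT, txn_date TEXT, "
    "open_amount INTEGER, gross_amount INTEGER)"
)
# Sample rows from the question, dates rewritten as YYYY-MM-DD (assumed DD/MM/YY).
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A1", "2014-03-12", 100, 100),
        (2, "A1", "2014-02-12", 0, 150),
        (3, "B2", "2014-03-21", 0, 100),
        (4, "B2", "2014-02-21", 100, 100),
        (5, "C3", "2014-03-01", 50, 100),
        (6, "C3", "2014-02-01", 50, 100),
    ],
)

# An account is flagged when an older record still has an open amount while a
# newer record of the same account is fully or partly allocated.
flagged = conn.execute(
    """
    SELECT ukid, account
    FROM invoices
    WHERE account IN (
        SELECT older.account
        FROM invoices AS older
        JOIN invoices AS newer
          ON newer.account = older.account
         AND older.txn_date < newer.txn_date
        WHERE older.open_amount > 0
          AND (newer.open_amount = 0
               OR newer.open_amount < newer.gross_amount)
    )
    ORDER BY ukid
    """
).fetchall()

flagged_ids = [ukid for ukid, _ in flagged]
flagged_accounts = sorted({account for _, account in flagged})
print(flagged_ids, flagged_accounts)
```

On the sample data this returns every record for B2 and C3 and none for A1, which is the result the question asks for.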
In my schema, I have a table Projects, and a table Tasks. Each project is comprised of tasks. Each task has Hours and PercentComplete.
Example table:
ProjectID TaskID Hours PercentComplete
1 1 100 50
1 2 120 80
I am trying to get the weighted percentage complete for the project. I am doing this using the following SQL statement:
SELECT P.ProjectID, P.ProjectName, SUM(T.Hours) AS Hours,
SUM(T.PercentComplete * T.Hours) / 100 AS CompleteHours,
SUM(T.PercentComplete * T.Hours) / SUM(T.Hours) AS PercentComplete
FROM Projects AS P INNER JOIN
Tasks AS T ON T.ProjectID = P.ProjectID
WHERE (P.ProjectID = 1)
My question is about this part of that statement:
SUM(T.PercentComplete * T.Hours) / SUM(T.Hours) AS PercentComplete
This gives me the correct weighted percentage for this project (in the case of the sample data above, 66%). But I cannot seem to wrap my head around why it does this.
Why does this query work?
SUM(T.PercentComplete * T.Hours) / 100 is the number of complete hours.
SUM(T.Hours) is the total number of hours.
The ratio of these two amounts, i.e.:
(SUM(T.PercentComplete * T.Hours) / 100) / SUM(T.Hours)
is the proportion of hours complete (it should be between 0 and 1).
Multiplying this by 100 gives the percentage.
I prefer to keep percentages like this out of the database and move them to the presentation layer. It would be much easier if the database stored "hours completed" and "hours total" and did not store the percentages at all. The extra factors of 100 in the calculations confuse the issue.
Basically you are finding the number of hours completed over the number of hours total.
SUM(T.PercentComplete * T.Hours) computes the total number of hours that you have completed, scaled by 100. (50 * 100) + (80 * 120) = 5,000 + 9,600 = 14,600 = 146 * 100 is the numerator. 146 hours have been completed on this job, and we keep the factor of 100 for the percent (because it is [0-100] instead of [0-1]).
Then we find the total number of hours worked, SUM(T.Hours), which is 100 + 120 = 220.
Then dividing, we find the weighted average. (146 * 100) / 220 = 0.663636364 * 100 = 66.4%
Is this what you were wondering about?
It calculates the two sums individually by adding up the value for each row, then divides them at the end.
SUM(T.PercentComplete * T.Hours)
50* 100 +
80 * 120
-------
14,600
SUM(T.Hours)
100 +
120
---
220
Then the division at the end
14,600 / 220
------------
66.3636
Edit: As per HLGEM's comment, it will actually return 66 due to integer division.
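The integer-division point can be seen in a small sketch. It uses Python's built-in SQLite, which also truncates when both operands are integers; whether the asker's database behaves the same way depends on the engine and column types, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tasks (ProjectID INTEGER, TaskID INTEGER, "
    "Hours INTEGER, PercentComplete INTEGER)"
)
conn.executemany(
    "INSERT INTO Tasks VALUES (?, ?, ?, ?)",
    [(1, 1, 100, 50), (1, 2, 120, 80)],
)

int_pct, float_pct = conn.execute(
    """
    SELECT SUM(PercentComplete * Hours) / SUM(Hours),        -- integer / integer
           SUM(PercentComplete * Hours) * 1.0 / SUM(Hours)   -- forced to floating point
    FROM Tasks
    WHERE ProjectID = 1
    """
).fetchone()

print(int_pct, round(float_pct, 2))
```

Multiplying by 1.0 before dividing is one common way to keep the fractional part.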
Aggregate functions, such as SUM(), work against the set of data defined by the GROUP BY clause. So if you group by ProjectID, ProjectName, the functions will break things down by that.
The SUM operation first multiplies the columns row by row, then adds the products:
( 100* 50+ 120* 80) / (100+ 120)