I'm working with some payments balances for different clients and I want to use a calculated column in a sub-query. Breakdown of what I'm working with:
Step 1: I'm basically pulling clients id's, names, deposit balances and payment outflows.
Step 2: Sum(deposit balance)/count(distinct business accounts) - average deposits
Step 3: Sum(payment outflows)/count(distinct business accounts) - average outflows
Step 4: ??? Use the average outflows calculated in the previous column with a set of conditions to assign a multiplication factor ??? i.e.:
when avg outflow > x then multiplication_factor=2
when avg outflow > y then multiplication_factor=5
etc.
Would appreciate any suggestions on how to tackle step 4 and proceed.
I haven't used SQL in a while and here is what I've come up with so far:
Select parent_id, parent_name, deposits, payments
, sum(deposits)/count(distinct business_accts) avg_deposits
, sum(payments)/count(distinct business_accts) avg_payments
, ??? multiplication_factor (
case
when avg_payments > x then multiplication_factor=1
when avg_payments > y then multiplication_factor=2
etc.
end
)
FROM Table_pyb
WHERE something something;
I doubt if I can access the avg_payments columns in a sub-query... suggestions would be much appreciated :)
Related
My following SQLite database:
Explaining brazilian tax income:
a) if I have loss, I don't need to pay any tax (e.g.: January)
b) negative results can be subtracted from the next month positive outcome (e.g.: in February, instead of paying the full tax for $ 5942, the tax can be applied only to (5942 - 3200) = 2742.
c) if previous negative results are not sufficient to cover the next positive outcome, I got pay tax (e.g.: in September, I could compensate from June and July, but I had to aggregate from August (e.g.: total tax = -5000 -2185 +5000 +3000 = 815)
My goal would be build the following table:
I couldn't figure out a way to solve this problem. Any help?
Tks
You need to use recursive CTEs here. If you are not familiar with this feature you might check out my tutorial, the official documentation referenced in that tutorial, as well as any number of other tutorials available on the Internet.
First, I generate temporary row numbers using the row_number Window function in the source CTE block below (replace "RESULTS" with your table name).
Then I use recursive CTE (losses) to calculate residual loss from the previous months, which can be used to reduce your taxes. (This part might be tricky to understand if you are not familiar with recursive CTEs.) Finally, I calculate the total taxable amount adjusted for previous remaining loss if necessary.
WITH RECURSIVE
source AS (
SELECT row_number() OVER (ORDER BY ym) AS rid, *
FROM RESULTS
),
losses AS (
SELECT s.*, 0 AS res_loss
FROM source AS s
WHERE rid = 1
UNION ALL
SELECT s.*, iif(l.res_loss + l.profitloss < 0, l.res_loss + l.profitloss, 0) AS res_loss
FROM source AS s, losses AS l
WHERE s.rid = l.rid + 1
)
SELECT ym, profitloss, iif(profitloss + res_loss > 0, profitloss + res_loss, 0) AS tax
FROM losses
ORDER BY ym;
I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note:The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth". Hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231)First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331)First National bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits"?
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoansRatio) as a percentage.
This code runs correctly, however, previously I was getting a "division by zero" error when attempting to run 50,000 rows (1992 to the present).
Addressing each of your question individually.
First) Retrieving SFR2TotalLoanRatio as percentage, I assume you want to see 9.988% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field into a STRING then, concatenating the % sign. Below there is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
Second) Regarding your question about the division by zero error. I am assuming IEEE_DIVIDE(arg1,arg2) is a function to perform the division, in which arg1 is the divisor and arg2 is the dividend. Therefore, I would adivse your to explore your data in order to figured out which records have divisor equals to zero. After gathering these results, you can determine what to do with them. In case you decide to discard them you can simply add within your WHERE statement in each of your JOINs: AL.lnlsnet = 0. On the other hand, you can also modify the records where lnlsnet = 0 using a CASE WHEN or IF statements.
UPDATE:
In order to add this piece of code your query, you u have to wrap your code within a temporary table. Then, I will make two adjustments, first a temporary function in order to calculate the percentage and format it with the % sign. Second, retrieving the previous number of deposits to calculate the desired percentage. I am also assuming that cert is the individual id for each of the bank's clients. The modifications will be as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
Concat(Cast((dep-prev_dep)/prev_dep*100 AS STRING), "%")
);
#followed by the query you have created so far as a temporary table, notice the the comma I added after the last parentheses
WITH data AS(
#your query
),
#within this second part you need to select all the columns from data, and LAG function will be used to retrieve the previous number of deposits for each client
data_2 as (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio,
CASE WHEN cert = lag(cert) OVER (PARTITION BY id ORDER BY d) THEN lag(Deposits) OVER (PARTITION BY id ORDER BY id) ELSE NULL END AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits,prev_dep) as dept_growth_rate FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per client.
I'm supposed to find the percentage of people having received aid.
I'm assuming the best way to do this is find the number rows who received 0 aid, and the number of rows that have a greater than 0 value, create two variables for those and divide accordingly to find the percentage. It's been a while since I've worked with sql so this is challenging me.
select
rprawrd_aidy_code as year,
sum(rprawrd_accept_amt)
from
rprawrd
where
rprawrd_aidy_code = '1819'
group by
rprawrd_aidy_code
This only gives me a total of the amount of aid provided for the year in question. I need to figure out the total rows that received vs the total that didnt.
If the only output you need from your script is that ratio, there are a few ways to go about this one:
WITH cte (awrd) AS(
SELECT
CASE WHEN rprawrd_accept_amt > 0 THEN 1.0
ELSE 0.0
END awrd
FROM rprawrd
WHERE rprawrd_ady_code = '1819'
)
SELECT SUM(awrd)/COUNT(awrd)
FROM cte
This will get you the percentage of people who received an award, but if you need to know the amounts as well you'll have to approach it differently.
I am trying to get a total hours from a dataset and because you can have the same asset with the same company (company_B) twice at two different times I have this join issue. I know I want the min for company_B gone and the Max for company_B gone because they represent wrong dates being matched. The negative is easy but what about the Max?
I have:
AssetID------StartDate-------FinishDate-------CompanyName----HoursOnSite
22222-------2016-02-12-------2016-02-20-------Company_A--------192
22222-------2016-02-01-------2016-02-09-------Company_B--------208 (keep)
22222-------2016-02-12-------2016-02-09-------Company_B-------(-56) (remove)
22222-------2016-02-01-------2016-02-21-------Company_B--------480 (remove)
22222-------2016-02-12-------2016-02-21-------Company_B--------216 (keep)
55555-------2016-02-18-------2016-02-22-------Company_C--------96
99584-------2016-02-22-------2016-02-25-------Company_D--------63
I think you can do the query for the records with max and min HoursOnSite for company B, and use (not in) or not equal to exclude those records.
If you still have concern, please paste your query.
I'm assuming that there has to be atleast 3 instances of unique assetid - companyname combination for the Max, Min filters to work. You can change it in the final where statement tO suit your requirement
WITH CTE
AS (
SELECT *
,count(CompanyName) OVER (PARTITION BY AssetID,CompanyName) AS a
FROM <TABLE_NAME>
)
SELECT *
FROM CTE
WHERE HoursOnSite NOT IN (
SELECT MAX(HoursOnSite)
FROM <TABLE_NAME>
)
AND gdp NOT IN (
SELECT min(HoursOnSite)
FROM <TABLE_NAME>
)
AND a > 2 --MODIFY AS PER YOUR REQUIREMENT
I found it hard to describe what I wanted to do in the title, but I will be more specific here.
I have a reasonably long query:
SELECT
/*Amount earned with validation to remove outlying figures*/
Case When SUM(t2.[ActualSalesValue])>=0.01 OR SUM(t2.[ActualSalesValue])<0 Then SUM(t2.[ActualSalesValue]) ELSE 0 END AS 'Amount',
/*Profit earned (is already calculated then input into db, this just pulls that figure*/
SUM(t2.[Profit]) AS 'Profit',
/*Product Type - pulls the product type so that we can sort by product*/
t1.[ucIIProductType] AS 'Product Type',
/*Profit Percentage - This is to calculate the percentage of profit based on the sales price which uses 2 different columns - Case ensures that there are no wild values appearing in the reports as previously experienced*/
Case When SUM(t2.[ActualSalesValue])>=0.01 OR SUM(t2.[ActualSalesValue])<0 THEN (SUM(t2.[Profit])/SUM(t2.[ActualSalesValue])) ELSE 0 END AS 'Profit Percentage',
/*Percentage of Turnover*/
*SUM(t2.[ActualSalesValue])/(Select SUM(t2.[ActualSalesValue]) OVER() FROM [_bvSTTransactionsFull]) AS 'PoT'
/*The join is connect the product type with the profit and the amount*/
FROM [dbo].[StkItem] AS t1
INNER JOIN [dbo].[_bvSTTransactionsFull] AS t2
/*There attirbutes are the links between the tables*/
ON t1.[StockLink]=t2.[AccountLink]
WHERE t2.[TxDate] BETWEEN '1/Aug/2014' AND '31/Aug/2014' AND ISNUMERIC(t2.[Account]) = 1
Group By t1.[ucIIProductType]
The 'Percentage of Turnover' part I am having trouble with - I am trying to calculate the percentage of the Amount based on the total amount - using the same column. So eg: I want to take the Amount value in row 1, then divide it by the total amount of the entire column and then have that value listed in a new column. But I keep getting errors or I Keep getting 1 (because it wants to divide the value by the same value. CAN anyone please advise me on proper syntax for solving this:
/*Percentage of Turnover*/
*SUM(t2.[ActualSalesValue])/(Select SUM(t2.[ActualSalesValue]) OVER() FROM [_bvSTTransactionsFull]) AS 'PoT'
I think you want one of the following:
SUM(t2.[ActualSalesValue])/(Select SUM(t.[ActualSalesValue]) FROM [_bvSTTransactionsFull] t) AS PoT
or:
SUM(t2.[ActualSalesValue])/(SUM(SUM(t2.[ActualSalesValue])) OVER() ) AS PoT
Note: you should use single quotes only for string and date constants, not for column and table names. If you need to escape names, use square braces.