My following SQLite database:
Explaining brazilian tax income:
a) if I have loss, I don't need to pay any tax (e.g.: January)
b) negative results can be subtracted from the next month positive outcome (e.g.: in February, instead of paying the full tax for $ 5942, the tax can be applied only to (5942 - 3200) = 2742.
c) if previous negative results are not sufficient to cover the next positive outcome, I got pay tax (e.g.: in September, I could compensate from June and July, but I had to aggregate from August (e.g.: total tax = -5000 -2185 +5000 +3000 = 815)
My goal would be build the following table:
I couldn't figure out a way to solve this problem. Any help?
Tks
You need to use recursive CTEs here. If you are not familiar with this feature you might check out my tutorial, the official documentation referenced in that tutorial, as well as any number of other tutorials available on the Internet.
First, I generate temporary row numbers using the row_number Window function in the source CTE block below (replace "RESULTS" with your table name).
Then I use recursive CTE (losses) to calculate residual loss from the previous months, which can be used to reduce your taxes. (This part might be tricky to understand if you are not familiar with recursive CTEs.) Finally, I calculate the total taxable amount adjusted for previous remaining loss if necessary.
WITH RECURSIVE
source AS (
SELECT row_number() OVER (ORDER BY ym) AS rid, *
FROM RESULTS
),
losses AS (
SELECT s.*, 0 AS res_loss
FROM source AS s
WHERE rid = 1
UNION ALL
SELECT s.*, iif(l.res_loss + l.profitloss < 0, l.res_loss + l.profitloss, 0) AS res_loss
FROM source AS s, losses AS l
WHERE s.rid = l.rid + 1
)
SELECT ym, profitloss, iif(profitloss + res_loss > 0, profitloss + res_loss, 0) AS tax
FROM losses
ORDER BY ym;
Related
I'm working with some payments balances for different clients and I want to use a calculated column in a sub-query. Breakdown of what I'm working with:
Step 1: I'm basically pulling clients id's, names, deposit balances and payment outflows.
Step 2: Sum(deposit balance)/count(distinct business accounts) - average deposits
Step 3: Sum(payment outflows)/count(distinct business accounts) - average outflows
Step 4: ??? Use the average outflows calculated in the previous column with a set of conditions to assign a multiplication factor ??? i.e.:
when avg outflow > x then multiplication_factor=2
when avg outflow > y then multiplication_factor=5
etc.
Would appreciate any suggestions on how to tackle step 4 and proceed.
I haven't used SQL in a while and here is what I've come up with so far:
Select parent_id, parent_name, deposits, payments
, sum(deposits)/count(distinct business_accts) avg_deposits
, sum(payments)/count(distinct business_accts) avg_payments
, ??? multiplication_factor (
case
when avg_payments > x then multiplication_factor=1
when avg_payments > y then multiplication_factor=2
etc.
end
)
FROM Table_pyb
WHERE something something;
I doubt if I can access the avg_payments columns in a sub-query... suggestions would be much appreciated :)
I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note:The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth". Hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231)First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331)First National bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits"?
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoansRatio) as a percentage.
This code runs correctly, however, previously I was getting a "division by zero" error when attempting to run 50,000 rows (1992 to the present).
Addressing each of your question individually.
First) Retrieving SFR2TotalLoanRatio as percentage, I assume you want to see 9.988% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field into a STRING then, concatenating the % sign. Below there is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
Second) Regarding your question about the division by zero error. I am assuming IEEE_DIVIDE(arg1,arg2) is a function to perform the division, in which arg1 is the divisor and arg2 is the dividend. Therefore, I would adivse your to explore your data in order to figured out which records have divisor equals to zero. After gathering these results, you can determine what to do with them. In case you decide to discard them you can simply add within your WHERE statement in each of your JOINs: AL.lnlsnet = 0. On the other hand, you can also modify the records where lnlsnet = 0 using a CASE WHEN or IF statements.
UPDATE:
In order to add this piece of code your query, you u have to wrap your code within a temporary table. Then, I will make two adjustments, first a temporary function in order to calculate the percentage and format it with the % sign. Second, retrieving the previous number of deposits to calculate the desired percentage. I am also assuming that cert is the individual id for each of the bank's clients. The modifications will be as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
Concat(Cast((dep-prev_dep)/prev_dep*100 AS STRING), "%")
);
#followed by the query you have created so far as a temporary table, notice the the comma I added after the last parentheses
WITH data AS(
#your query
),
#within this second part you need to select all the columns from data, and LAG function will be used to retrieve the previous number of deposits for each client
data_2 as (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio,
CASE WHEN cert = lag(cert) OVER (PARTITION BY id ORDER BY d) THEN lag(Deposits) OVER (PARTITION BY id ORDER BY id) ELSE NULL END AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits,prev_dep) as dept_growth_rate FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per client.
I'm supposed to find the percentage of people having received aid.
I'm assuming the best way to do this is find the number rows who received 0 aid, and the number of rows that have a greater than 0 value, create two variables for those and divide accordingly to find the percentage. It's been a while since I've worked with sql so this is challenging me.
select
rprawrd_aidy_code as year,
sum(rprawrd_accept_amt)
from
rprawrd
where
rprawrd_aidy_code = '1819'
group by
rprawrd_aidy_code
This only gives me a total of the amount of aid provided for the year in question. I need to figure out the total rows that received vs the total that didnt.
If the only output you need from your script is that ratio, there are a few ways to go about this one:
WITH cte (awrd) AS(
SELECT
CASE WHEN rprawrd_accept_amt > 0 THEN 1.0
ELSE 0.0
END awrd
FROM rprawrd
WHERE rprawrd_ady_code = '1819'
)
SELECT SUM(awrd)/COUNT(awrd)
FROM cte
This will get you the percentage of people who received an award, but if you need to know the amounts as well you'll have to approach it differently.
Please help me to optimize my SQL query.
I have a table with the fields: date, commodity_id, exp_month_id, exp_year, price, where the first 4 fields are the primary key. The months are designated with the alphabet-ordered letters: e.g. F (for Jan), G (for Feb.), H (for March), etc. Thus the letter of more distant from Jan. month will be larger than the letter of the less distant month (F < G < H < ...). Some commodity_ids have all 12 months in the table, some only 5 or 3, which are constant for all years.
I need to calculate the difference between prices (gradient) of the neighboring records in terms of exp_month_id, exp_year. As the first step, I want to define for every couple (exp_month_id, exp_year) the valid couple (next_month_id, next_year). The main problem here, that if the current exp_month_id is the last in the year, then next_year = exp_year + 1 and next_month_id should be the first one in the year.
I have written the following query to do the job:
WITH trading_months AS (
SELECT DISTINCT commodity_id,
exp_month_id
FROM futures
ORDER BY exp_month_id
)
SELECT DISTINCT f.commodity_id,
f.exp_month_id,
f.exp_year,
(
WITH [temp] AS (
SELECT exp_month_id
FROM trading_months
WHERE commodity_id = f.commodity_id
)
SELECT exp_month_id
FROM [temp]
WHERE exp_month_id > f.exp_month_id
UNION ALL
SELECT exp_month_id
FROM [temp]
LIMIT 1
)
AS next_month_id,
(
SELECT CASE WHEN EXISTS (
SELECT commodity_id,
exp_month_id
FROM trading_months
WHERE commodity_id = f.commodity_id AND
exp_month_id > f.exp_month_id
LIMIT 1
)
THEN f.exp_year ELSE f.exp_year + 1 END
)
AS next_year
FROM futures AS f
This query serves as a base for a dynamic table (view) which is subsequently used for calculating the gradient. However, the execution of this query takes more than one second and thus the whole process takes minutes. I wonder if you could help me optimizing the query.
Note: The following requires Sqlite 3.25 or newer for window function support:
Lack of sample data (Preferably as a CREATE TABLE and INSERT statements for easy importing) and expected results makes it hard to test, but if your end goal is computing the difference in prices between expiration dates (Making your question a bit of an XY problem, maybe something like:
SELECT date, commodity_id, price, exp_year, exp_month_id
, price - lag(price, 1) OVER (PARTITION BY commodity_id ORDER BY exp_year, exp_month_id) AS "change from last price"
FROM futures;
Thanks to the hint of #Shawn to use window functions I could rewrite the query in much shorter form:
CREATE VIEW "futures_nextmonths_win" AS
WITH trading_months AS (
SELECT DISTINCT commodity_id,
exp_month_id,
exp_year
FROM futures)
SELECT commodity_id,
exp_month_id,
exp_year,
lead(exp_month_id) OVER w AS next_month_id,
lead(exp_year) OVER w AS next_year
FROM trading_months
WINDOW w AS (PARTITION BY commodity_id ORDER BY exp_year, exp_month_id);
which is also slightly faster then the original one.
I have the following query, which even without a ton of data (~3k rows) is still a bit slow to execute, and the logic is a bit over my head - was hoping to get some help optimizing the query or even an alternate methodology:
Select companypartnumber, (PartTotal + IsNull(Cum_Lower_Ranks, 0) ) / Sum(PartTotal) over() * 100 as Cum_PC_Of_Total
FROM PartSalesRankings PSRMain
Left join
(
Select PSRTop.Item_Rank, Sum(PSRBelow.PartTotal) as Cum_Lower_Ranks
from partSalesRankings PSRTop
Left join PartSalesRankings PSRBelow on PSRBelow.Item_Rank < PSRTop.Item_Rank
Group by PSRTop.Item_Rank
) as PSRLowerCums on PSRLowerCums.Item_Rank = PSRMain.Item_Rank
The PartSalesRankings table simply consists of CompanyPartNumber(bigint) which is a part number designation, PartTotal(decimal 38,5) which is the total sales, and Item_Rank(bigint) which is the rank of the item based on total sales.
I'm trying to end up with my parts into categories based on their percentile - so an "A" item would be top 5%, a "B" item would be the next 15%, and "C" items would be the lower 80th percentile. The view I created works fine, it just takes almost three seconds to execute, which for my purposes is quite slow. I narrowed the bottle neck to the above query - any help would be greatly appreciated.
The problem you are having is the calculation of the cumulative sum of PartTotal. If you are using SQL Server 2012, you can do something like:
select (case when ratio <= 0.05 then 'A'
when ratio <= 0.20 then 'B'
else 'C'
end),
t.*
from (select psr.companypartnumber,
(sum(PartTotal) over (order by PartTotal) * 1.0 / sum(PartTotal) over ()) as ratio
FROM PartSalesRankings psr
) t
SQL Server 2012 also have percentile functions and other functions not in earlier versions.
In earlier versions, the question is how to get the cumulative sum efficiently. Your query is probably as good as anything that can be done in one query. Can the cumulative sum be calculated when partSalesRankings is created? Can you use temporary tables?