I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth" value; hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231)First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331)First National bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits"?
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
SELECT
  SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp,
  SFRNLL.specgrp AS `Loan_Specialization`,
  SFRNLL.lnreres AS `_1_to_4_Residential_Loans`,
  AL.dep AS `Deposits`,
  AL.lnlsnet AS `loans_and_leases`,
  IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) AS SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases AS SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities AS AL
  ON SFRNLL.cert = AL.cert
WHERE SFRNLL.specgrp = 4 AND IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10

UNION ALL

SELECT
  SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp,
  SFRNLL.specgrp AS `Loan_Specialization`,
  SFRNLL.lnreres AS `_1_to_4_Residential_Loans`,
  AL.dep AS `Deposits`,
  AL.lnlsnet AS `loans_and_leases`,
  IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) AS SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases AS SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities AS AL
  ON SFRNLL.cert = AL.cert
WHERE SFRNLL.specgrp = 4 AND IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoanRatio) as a percentage.
This code runs correctly; however, I was previously getting a "division by zero" error when attempting to run it over 50,000 rows (1992 to the present).
Addressing each of your questions individually.
First) Retrieving SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field to a STRING and then concatenating the % sign. Below is an example with sample data:
WITH data AS (
  SELECT 0.0123 AS percentage UNION ALL
  SELECT 0.0999 AS percentage UNION ALL
  SELECT 0.3456 AS percentage
)
SELECT CONCAT(CAST(percentage * 100 AS STRING), "%") AS formatted_percentage FROM data
And the output,
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
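If you also want to fix the number of decimal places, a small variant of the sample above is BigQuery's FORMAT function with a printf-style specifier (reusing the same data subquery):
SELECT FORMAT("%.2f%%", CAST(percentage * 100 AS FLOAT64)) AS formatted_percentage FROM data  -- 0.0123 becomes 1.23%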
Second) Regarding your question about the division by zero error: IEEE_DIVIDE(arg1, arg2) performs the division arg1 / arg2, so arg1 is the dividend and arg2 is the divisor (note that IEEE_DIVIDE itself does not raise that error; it returns Infinity or NaN for a zero divisor, whereas the plain / operator raises "division by zero"). I would advise you to explore your data to figure out which records have a divisor (AL.lnlsnet) equal to zero. After gathering those results, you can decide what to do with them. If you decide to discard them, you can simply add AL.lnlsnet != 0 to the WHERE clause of each SELECT in your UNION. On the other hand, you can also handle the records where lnlsnet = 0 using a CASE WHEN or IF statement.
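As a minimal sketch, reusing the aliases from your query (these are fragments to drop into each SELECT of the UNION, not a complete statement):
-- option 1: discard the rows whose denominator is zero
WHERE SFRNLL.specgrp = 4
  AND AL.lnlsnet != 0
  AND IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
-- option 2: keep the rows, but return NULL instead of dividing by zero
IF(AL.lnlsnet = 0, NULL, SFRNLL.lnreres / AL.lnlsnet) AS SFR2TotalLoanRatio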
UPDATE:
In order to add this piece of code to your query, you have to wrap the query you have written so far in a WITH clause (a named subquery that acts like a temporary table). Then I will make two adjustments: first, a temporary function to calculate the percentage and format it with the % sign; second, retrieving the previous number of deposits in order to calculate the desired percentage. I am also assuming that cert is the unique id for each bank. The modifications are as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
  CONCAT(CAST((dep - prev_dep) / prev_dep * 100 AS STRING), "%")
);
#followed by the query you have created so far, wrapped as a named subquery; notice the comma I added after the last parenthesis
WITH data AS (
  #your query
),
#within this second part you need to select all the columns from data; the LAG function is used to retrieve the previous number of deposits for each bank
data_2 AS (
  SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
    CASE WHEN cert = LAG(cert) OVER (ORDER BY cert, repdte)
         THEN LAG(Deposits) OVER (ORDER BY cert, repdte)
         ELSE NULL
    END AS prev_dep
  FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
  percent(Deposits, prev_dep) AS dep_growth_rate
FROM data_2
Note that the built-in function LAG is used together with CASE WHEN in order to retrieve the previous amount of deposits per bank; on each bank's first call report there is no previous row for that cert, so prev_dep (and therefore the growth rate) is NULL, which gives you the empty cells you described.
I am trying to normalize End-of-Day stock prices in PostgreSql.
Let's say I have a stock table defined as such:
create table eod (
  date date not null,
  stock_id int not null,
  eod_split decimal(16,8) not null,
  close decimal(12,6) not null,
  constraint pk_eod primary key (date, stock_id)
);
Data in this table might look like this:
"date","stock_id","eod_split","close"
"2014-06-13",14010920,"1.00000000","182.560000"
"2014-06-13",14010911,"1.00000000","91.280000"
"2014-06-13",14010923,"1.00000000","41.230000"
"2014-06-12",14010911,"1.00000000","92.290000"
"2014-06-12",14010920,"1.00000000","181.220000"
"2014-06-12",14010923,"1.00000000","40.580000"
"2014-06-11",14010920,"1.00000000","182.250000"
"2014-06-11",14010911,"1.00000000","93.860000"
"2014-06-11",14010923,"1.00000000","40.860000"
"2014-06-10",14010911,"1.00000000","94.250000"
"2014-06-10",14010923,"1.00000000","41.110000"
"2014-06-10",14010920,"1.00000000","184.290000"
"2014-06-09",14010920,"1.00000000","186.220000"
"2014-06-09",14010911,"7.00000000","93.700000"
"2014-06-09",14010923,"1.00000000","41.270000"
"2014-06-06",14010923,"1.00000000","41.480000"
"2014-06-06",14010911,"1.00000000","645.570000"
"2014-06-06",14010920,"1.00000000","186.370000"
"2014-06-05",14010920,"1.00000000","185.980000"
"2014-06-05",14010911,"1.00000000","647.350000"
"2014-06-05",14010923,"1.00000000","41.210000"
...
"2005-03-04",14010920,"1.00000000","92.370000"
"2005-03-04",14010911,"1.00000000","42.810000"
"2005-03-04",14010923,"1.00000000","25.170000"
"2005-03-03",14010923,"1.00000000","25.170000"
"2005-03-03",14010911,"1.00000000","41.790000"
"2005-03-03",14010920,"1.00000000","92.410000"
"2005-03-02",14010920,"1.00000000","92.920000"
"2005-03-02",14010923,"1.00000000","25.260000"
"2005-03-02",14010911,"1.00000000","44.121000"
"2005-03-01",14010920,"1.00000000","93.300000"
"2005-03-01",14010923,"1.00000000","25.280000"
"2005-03-01",14010911,"1.00000000","44.500000"
"2005-02-28",14010923,"1.00000000","25.160000"
"2005-02-28",14010911,"2.00000000","44.860000"
"2005-02-28",14010920,"1.00000000","92.580000"
"2005-02-25",14010923,"1.00000000","25.250000"
"2005-02-25",14010920,"1.00000000","92.800000"
"2005-02-25",14010911,"1.00000000","88.990000"
"2005-02-24",14010923,"1.00000000","25.370000"
"2005-02-24",14010920,"1.00000000","92.640000"
"2005-02-24",14010911,"1.00000000","88.930000"
"2005-02-23",14010923,"1.00000000","25.200000"
"2005-02-23",14010911,"1.00000000","88.230000"
"2005-02-23",14010920,"1.00000000","92.100000"
...
"2003-02-24",14010920,"1.00000000","78.560000"
"2003-02-24",14010911,"1.00000000","14.740000"
"2003-02-24",14010923,"1.00000000","24.070000"
"2003-02-21",14010920,"1.00000000","79.950000"
"2003-02-21",14010923,"1.00000000","24.630000"
"2003-02-21",14010911,"1.00000000","15.000000"
"2003-02-20",14010911,"1.00000000","14.770000"
"2003-02-20",14010920,"1.00000000","79.150000"
"2003-02-20",14010923,"1.00000000","24.140000"
"2003-02-19",14010920,"1.00000000","79.510000"
"2003-02-19",14010911,"1.00000000","14.850000"
"2003-02-19",14010923,"1.00000000","24.530000"
"2003-02-18",14010923,"2.00000000","24.960000"
"2003-02-18",14010911,"1.00000000","15.270000"
"2003-02-18",14010920,"1.00000000","79.330000"
"2003-02-14",14010911,"1.00000000","14.670000"
"2003-02-14",14010920,"1.00000000","77.450000"
"2003-02-14",14010923,"1.00000000","48.300000"
"2003-02-13",14010920,"1.00000000","75.860000"
"2003-02-13",14010911,"1.00000000","14.540000"
"2003-02-13",14010923,"1.00000000","46.990000"
Note the "split" column. When a split value other than 1 is recorded, it basically means that the stock shares split by that factor. IOW, when the split is 2.0, the number of the outstanding shares doubled, but the value of each individual share is halved from that point on. If the stock was worth $100 per share, it's now worth $50 per share.
If you graph this with raw numbers, the result is truly ugly. Sharp cliffs show up even though the overall value of the company did not significantly change, and when you have multiple splits you end up with a graph that does not properly reflect the company's trend, often by a large margin. In the above example, where there was a 2:1 split, your close prices for a stock would look something like 100, 100, 100, 50, 50, 50.
I want to use this table to create a "normalized" price in a reasonably efficient manner (there are quite a few records to chunk through). Continuing the sample, this would show the stock prices as 50, 50, 50, 50, 50, 50. If there were multiple splits, the data should still be consistent and smooth, ignoring actual market value changes.
My idea is that if I can create a CTE with a "running product" aggregate of the split value, going back in time, I can define date ranges per stock and the modifier value to apply to the closing price, then join that back to the eod table and select the adjusted close value for each stock into a new table.
...the problem is, I cannot wrap my head around how to do that in anything other than a whole bunch of temp tables and multi-step processes. I do not know of any built-in functionality to make this easier, either.
Can someone show me how I can generate the normalized data?
You don't need a CTE. You just need a cumulative product. Postgres doesn't have one built in. But arithmetic to the rescue!
select eod.*,
exp(sum(ln(eod_split)) over (partition by stock_id order by date)) as cume_split,
(close *
exp(sum(ln(eod_split)) over (partition by stock_id order by date))
) as normalized_price
from eod;
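If you would rather normalize onto the most recent (post-split) basis, so the example series comes out as 50, 50, 50, 50, 50, 50 as described in the question, a variant sketch is to divide each close by the product of the splits recorded after that date (this assumes each row's close is already quoted on that day's post-split basis):
select eod.*,
       close / exp(coalesce(
                 sum(ln(eod_split)) over (partition by stock_id
                                          order by date desc
                                          rows between unbounded preceding and 1 preceding),
                 0)) as normalized_price
from eod;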
Hilarious, looking for this solution, I find that an associate already asked about it. Here is the basic algebra behind this ingenious solution: https://blog.prepscholar.com/natural-log-rules
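The underlying identity is simply exp(ln(a) + ln(b) + ln(c)) = a * b * c, so a running sum of logarithms turns back into a running product once you exponentiate. A quick sanity check in Postgres (illustrative only):
select exp(ln(2.0) + ln(3.0) + ln(5.0));  -- ≈ 30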
I have a query to pull clickthrough for a funnel, where if a user hit a page it is recorded as "1", else NULL --
SELECT datestamp
      ,COUNT(visits) as Visits
      ,COUNT([QE001]) as firstcount
      ,COUNT([QE002]) as secondcount
      ,COUNT([QE004]) as thirdcount
      ,COUNT([QE006]) as finalcount
      ,user_type
      ,user_loc
FROM dbname.dbo.loggingtable
GROUP BY datestamp, user_type, user_loc
I want to have a column for each ratio, e.g. firstcount/Visits, secondcount/firstcount, etc. as well as a total (finalcount/Visits).
I know this can be done
in an Excel PivotTable by adding a "calculated field"
in SQL by grouping (see the sketch after this list)
in PowerPivot by adding a CalculatedColumn, e.g.
=IFERROR(QueryName[finalcount]/QueryName[Visits],0)
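For the SQL-by-grouping option, a rough sketch (using the same table and columns as above; the ratio column names are just illustrative, and NULLIF guards against dividing by zero):
SELECT datestamp
      ,user_type
      ,user_loc
      ,COUNT(visits) as Visits
      ,COUNT([QE001]) as firstcount
      ,COUNT([QE006]) as finalcount
      ,1.0 * COUNT([QE001]) / NULLIF(COUNT(visits), 0) as first_to_visits
      ,1.0 * COUNT([QE006]) / NULLIF(COUNT(visits), 0) as final_to_visits
FROM dbname.dbo.loggingtable
GROUP BY datestamp, user_type, user_loc
The catch is that the ratios are then fixed at this grouping level, which leads straight to the slicing problem described below.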
BUT I need to give the report consumer the option of slicing by just user_type or just user_loc, etc., and Excel will tend to ADD the proportions, which won't work because
SUM(A/B) != SUM(A)/SUM(B)
Is there a way in DAX/MDX/PowerPivot to add a calculated column/measure, so that it will be calculated as SUM(finalcount)/SUM(Visits), for any user-defined subset of the data (daterange, user type, location, etc.)?
Yes, via calculated measures. Calculated columns are for creating values that you want to see on rows/columns/report headers; calculated measures are for creating values that you want to see in the values section of a pivot table and can slice/dice by the columns in the model.
The easiest way would be to create 3 calculated measures in the calculation area of the PowerPivot sheet.
TotalVisits:=SUM(QueryName[visits])
TotalFinalCount:=SUM(QueryName[finalcount])
TotalFinalCount2VisitsRatio:=[TotalFinalCount]/[TotalVisits]
You can then slice the calculated measure [TotalFinalCount2VisitsRatio] by user_type or just user_loc (or whatever) and the value will be calculated correctly. The difference here is that you are explicitly telling the xVelocity engine to SUM-then-DIVIDE. If you create the calculated column, then the engine thinks you want to DIVIDE-then-SUM.
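To see why the order of operations matters, here's a tiny numeric illustration (plain SQL just to show the arithmetic; nothing PowerPivot-specific):
WITH t AS (
    SELECT 1 AS finalcount, 2 AS Visits
    UNION ALL
    SELECT 3, 4
)
SELECT SUM(1.0 * finalcount / Visits) AS sum_of_ratios,      -- 0.50 + 0.75 = 1.25
       1.0 * SUM(finalcount) / SUM(Visits) AS ratio_of_sums  -- 4 / 6 = 0.666...
FROM t;
DIVIDE-then-SUM gives 1.25, while SUM-then-DIVIDE gives 0.666..., which is why the measure has to do the division last.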
Also, you don't have to break the measure down into 3 separate measures; it's just good practice. If you're interested in learning more, I'd recommend this book; the author is the PowerPivot/DAX guru and the book is very straightforward.