How are you doing?
I'm trying to get an average from a count. The metric is a string, so I get an error.
I've tried several approaches, but none of them worked. Thanks for your help.
This is the code:
SELECT
user_type, -- works fine
newsletter, -- works fine
COUNT(newsletter) AS total, -- works fine
AVG(newsletter) AS percentage, -- Error: No matching signature for aggregate function AVG. Supported signatures: AVG(INT64), AVG(NUMERIC), AVG(FLOAT64)
This is what I've tried, without success:
AVG (newsletter) as percentage
AVG (CAST (newsletter as INT64)) as percentage
COUNT(newsletter) / SUM(newsletter)
I would like to get a table like this:
user_type | newsletter | total | percentage
free      | yes        | 4     | x%
premium   | yes        | 7     | x%
To get the ratio of the current row to the whole table:
you already have the value for each individual row,
use a window function to get the total for the whole table,
then divide the two.
(An empty window, OVER (), represents the whole table.)
x * 1.0 / SUM(x) OVER ()
In your case, x is COUNT(newsletter) which gives...
COUNT(newsletter) * 1.0 / SUM(COUNT(newsletter)) OVER ()
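A minimal, runnable sketch of this pattern, using Python's built-in sqlite3 module as a stand-in for BigQuery (it assumes SQLite 3.25 or later, which added window functions). The table name `users` and its rows are hypothetical, chosen to match the 4 free / 7 premium example in the question.

```python
import sqlite3

# Hypothetical in-memory table standing in for the BigQuery table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_type TEXT, newsletter TEXT)")
rows = [("free", "yes")] * 4 + [("premium", "yes")] * 7
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

# Each group's COUNT divided by the sum of all group counts; OVER () spans the whole result.
query = """
SELECT user_type,
       newsletter,
       COUNT(newsletter) AS total,
       COUNT(newsletter) * 1.0 / SUM(COUNT(newsletter)) OVER () AS percentage
FROM users
GROUP BY user_type, newsletter
ORDER BY user_type
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Multiply by 100 if you want 36.4 rather than 0.364. In BigQuery the `* 1.0` is harmless but not strictly needed, because `/` there always returns FLOAT64.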
If you want the count of rows where newsletter is 'yes', you can use a CASE WHEN expression:
SELECT
user_type,
newsletter,
COUNT(newsletter) AS total,
SUM(CASE WHEN newsletter = 'yes' THEN 1 ELSE 0 END) AS yes_count
FROM yourtable
GROUP BY user_type, newsletter
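Averaging the same kind of CASE expression (1 for 'yes', 0 otherwise) also gives what the question was originally after: the share of 'yes' rows. AVG accepts it because the argument is now numeric rather than a string. A sketch with Python's sqlite3 module; the 'no' rows, and grouping by user_type alone, are hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_type TEXT, newsletter TEXT)")
# Hypothetical rows: the "no" answers are invented so the share is not trivially 100%.
rows = ([("free", "yes")] * 4 + [("free", "no")] * 2
        + [("premium", "yes")] * 7 + [("premium", "no")] * 1)
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

query = """
SELECT user_type,
       COUNT(newsletter) AS total,
       -- number of rows whose newsletter value is 'yes'
       SUM(CASE WHEN newsletter = 'yes' THEN 1 ELSE 0 END) AS yes_count,
       -- average of a 0/1 indicator = share of 'yes' rows in the group
       AVG(CASE WHEN newsletter = 'yes' THEN 1.0 ELSE 0.0 END) AS yes_share
FROM users
GROUP BY user_type
ORDER BY user_type
"""
result = conn.execute(query).fetchall()
print(result)
```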
Using the function below, when you first rate a title and afterwards give the title a different rating, the averagerating for that title is incorrect.
The number of votes for title 'tt9910206' is 4, and averagerating is 8 before the function is called.
When I call the function the first time with a rating of 7, numvotes becomes 5 and the expected average is 7.8, which the function does return. But when I change the rating from 7 to 8, the expected result is 8, yet the function returns 7.84.
I suspect it's because the function doesn't take into account that the user has undone their previous rating when it recalculates the average.
How do I fix this, so that when a user changes their rating, the function recalculates the average using the averagerating value from before the user first rated the title?
Edit: Found the answer here
Finding an average after replacing a current value
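The idea behind replacing a value in an average can be sketched as plain arithmetic: when a user changes an existing rating, subtract the old rating from the running total and add the new one, without counting a new vote. A minimal Python sketch using the numbers from the question; the function names are hypothetical, and the real fix would live in the asker's Postgres function, which is not shown here.

```python
def add_rating(avg, numvotes, rating):
    """Fold a brand-new rating into the stored average; the vote count grows by one."""
    return (avg * numvotes + rating) / (numvotes + 1), numvotes + 1


def change_rating(avg, numvotes, old, new):
    """Swap one existing rating for another; the vote count does not change."""
    # Undo the old rating's contribution to the total, then add the new one.
    return (avg * numvotes - old + new) / numvotes


avg, n = add_rating(8.0, 4, 7)      # first rating of 7: (4 * 8 + 7) / 5 = 7.8
avg = change_rating(avg, n, 7, 8)   # user changes 7 -> 8: (39 - 7 + 8) / 5 = 8.0
print(round(avg, 2), n)             # prints 8.0 5
```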
You may have other issues in your function. But you seem to be forgetting that Postgres does integer division. So, 1/2 is 0, not 0.5.
I notice calculations like this:
count( user_titlerate.tconst ) / count( user_titlerate.tconst )
COUNT() returns an integer (a bigint in Postgres). I usually modify this to:
count( user_titlerate.tconst ) * 1.0 / count( user_titlerate.tconst )
But you can use other constructs such as:
count( user_titlerate.tconst )::numeric / count( user_titlerate.tconst )
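SQLite shares this integer-division behavior, so Python's built-in sqlite3 module is a quick way to see it without a Postgres server. A small sketch; Postgres's `::numeric` shorthand has no SQLite equivalent, so the standard `CAST(... AS REAL)` form is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integer / integer truncates, as in Postgres.
int_div, = conn.execute("SELECT 1 / 2").fetchone()
# Multiplying by 1.0 first turns the expression into a floating-point value.
scaled, = conn.execute("SELECT 1 * 1.0 / 2").fetchone()
# An explicit CAST on one operand works too.
cast_div, = conn.execute("SELECT CAST(1 AS REAL) / 2").fetchone()
# Converting after the division is too late: the truncation has already happened.
too_late, = conn.execute("SELECT (1 / 2) * 1.0").fetchone()
print(int_div, scaled, cast_div, too_late)  # prints 0 0.5 0.5 0.0
```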
I downloaded the entire FDIC bank call reports dataset, and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth" value, hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
For example: "On their last call report (19921231), First National Bank had deposits of 456789 (in 1000s). On their next call report (19930331), First National Bank had deposits of 567890 (in 1000s). What is the percentage increase (or decrease) in deposits?"
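The percentage change asked for here is the usual relative difference between consecutive call reports, where D_t is the deposits on the current report and D_{t-1} the deposits on the previous one. Worked with the two figures from the example (both in thousands):

```latex
\%\,\Delta = \frac{D_t - D_{t-1}}{D_{t-1}} \times 100
           = \frac{567890 - 456789}{456789} \times 100
           = \frac{111101}{456789} \times 100 \approx 24.32\%
```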
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoanRatio) as a percentage.
This code runs correctly; however, I was previously getting a "division by zero" error when running it over 50,000 rows (1992 to the present).
Addressing each of your questions individually:
First) Retrieving SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. In BigQuery you can achieve this by multiplying the field by 100, casting it to a STRING, and then concatenating the % sign. Below is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output:
Row | formatted_percentage
1   | 1.23%
2   | 9.99%
3   | 34.56%
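If you also want to control the number of decimals (a plain CAST of a floating-point value may print more digits than you expect), you can format the number explicitly. A sketch in SQLite through Python's sqlite3 module, reusing the BigQuery example's sample values; SQLite's printf() is swapped in here for illustration and is not BigQuery syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
WITH data AS (
    SELECT 0.0123 AS percentage UNION ALL
    SELECT 0.0999 UNION ALL
    SELECT 0.3456
)
-- %.2f rounds to two decimals; %% emits a literal percent sign
SELECT printf('%.2f%%', percentage * 100) AS formatted_percentage
FROM data
ORDER BY percentage
"""
formatted = [row[0] for row in conn.execute(query)]
print(formatted)
```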
Second) Regarding the division by zero error: IEEE_DIVIDE(X, Y) is a BigQuery function that divides X (the dividend) by Y (the divisor). Unlike the / operator, it does not raise an error when Y is 0; it returns +inf, -inf, or NaN instead. I would therefore advise you to explore your data to find out which records have a divisor equal to zero. Once you have these results, you can decide what to do with them. If you decide to discard them, you can add AL.lnlsnet <> 0 to the WHERE clause of each SELECT in your UNION ALL. Alternatively, you can handle the records where lnlsnet = 0 with a CASE WHEN or IF expression.
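A sketch of the three options (find, discard, or guard the zero divisors), run in SQLite through Python's sqlite3 module. The table `al` and its rows are hypothetical. Note that SQLite itself returns NULL for a division by zero instead of raising an error, so the explicit guard mainly matters for portability; in BigQuery, SAFE_DIVIDE(X, Y) is a built-in shorthand for option 3 that returns NULL when Y is 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: lnreres = 1-4 family residential loans, lnlsnet = net loans and leases.
conn.execute("CREATE TABLE al (cert INTEGER, lnreres REAL, lnlsnet REAL)")
conn.executemany("INSERT INTO al VALUES (?, ?, ?)",
                 [(1, 50.0, 1000.0), (2, 0.0, 0.0), (3, 30.0, 600.0)])

# Option 1: find the rows whose divisor is zero.
zero_rows = conn.execute("SELECT cert FROM al WHERE lnlsnet = 0").fetchall()

# Option 2: discard them by keeping only non-zero divisors.
kept = conn.execute(
    "SELECT cert, lnreres / lnlsnet FROM al WHERE lnlsnet <> 0 ORDER BY cert"
).fetchall()

# Option 3: keep every row, but make the ratio NULL when the divisor is zero.
guarded = conn.execute(
    "SELECT cert, CASE WHEN lnlsnet = 0 THEN NULL ELSE lnreres / lnlsnet END "
    "FROM al ORDER BY cert"
).fetchall()
print(zero_rows, kept, guarded)
```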
UPDATE:
To add this to your query, you have to wrap your code in a WITH clause (a common table expression) so that it acts as a temporary table. Then I make two adjustments: first, a temporary function that calculates the percentage and formats it with the % sign; second, retrieving the previous deposit amount to calculate the desired percentage. I am also assuming that cert uniquely identifies each bank. The modifications are as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
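# SAFE_DIVIDE returns NULL instead of failing when prev_dep is 0; ROUND keeps two decimals
CONCAT(CAST(ROUND(SAFE_DIVIDE(dep - prev_dep, prev_dep) * 100, 2) AS STRING), "%")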
);
#followed by the query you have created so far, wrapped in a WITH clause; notice the comma after the closing parenthesis
WITH data AS(
#your query
),
#in this second part, select all the columns from data; the LAG function retrieves the previous deposit amount for each bank
data_2 as (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans,Deposits, loans_and_leases, SFR2TotalLoanRatio,
LAG(Deposits) OVER (PARTITION BY cert ORDER BY repdte) AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits, prev_dep) AS dep_growth_rate FROM data_2
Note that the built-in LAG function, partitioned by cert and ordered by repdte, retrieves the previous deposit amount for each bank. On a bank's first call report LAG returns NULL, so the growth rate is NULL there as well, which matches the empty cells you described.
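A runnable sketch of the LAG step using Python's sqlite3 module (SQLite 3.25 or later, for window functions). The First National Bank figures come from the question's example; the second bank and its deposits are hypothetical. SQLite has no CREATE TEMP FUNCTION, so the growth calculation is written inline. repdte is stored as 'YYYYMMDD' text, so sorting it as a string is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (repdte TEXT, cert INTEGER, name TEXT, Deposits INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", [
    ("19921231", 1, "First National Bank", 456789),
    ("19930331", 1, "First National Bank", 567890),
    ("19921231", 2, "Hypothetical Savings Bank", 200000),
    ("19930331", 2, "Hypothetical Savings Bank", 180000),
])

query = """
WITH data_2 AS (
    SELECT repdte, cert, name, Deposits,
           -- previous call report's deposits for the same bank; NULL on its first report
           LAG(Deposits) OVER (PARTITION BY cert ORDER BY repdte) AS prev_dep
    FROM data
)
SELECT repdte, cert, name, Deposits,
       -- percentage change; NULL when there is no usable previous value
       CASE WHEN prev_dep IS NULL OR prev_dep = 0 THEN NULL
            ELSE (Deposits - prev_dep) * 100.0 / prev_dep END AS dep_growth_pct
FROM data_2
ORDER BY cert, repdte
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```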
I have a table with the columns Date, Gender, Age Group, and Number of Fans. I need to show the split of page fans across age groups in %.
So far, I have been able to limit the data to the newest entries (the most recent entry is 2018-10-06), but I have been unable to perform what I assume is needed, a window function, to group the genders (M, F, U) together and then find the percentage per age group. I greatly appreciate any help. Here is as far as I have gotten successfully:
SELECT *
FROM fanspergenderage
WHERE fanspergenderage.date >= '2018-10-16'
GROUP BY fanspergenderage.gender, fanspergenderage.agegroup;
You wrote: "I need to show the split of page fans across age groups in %."
I interpret this as the proportion of all fans in each age group. You seem to be asking for something like this:
SELECT f.agegroup,
COUNT(*) as num_fans,
COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () as ratio
FROM fanspergenderage f
WHERE f.date >= '2018-10-16'
GROUP BY f.agegroup;
The * 1.0 is because some databases do integer division.
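COUNT(*) counts rows. If each row already stores a number of fans (the question mentions a Number of Fans column), summing that column is probably what you want instead. A sketch in SQLite via Python's sqlite3 module; the column name `num_fans` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: num_fans stands in for the "Number of Fans" column.
conn.execute("CREATE TABLE fanspergenderage (date TEXT, gender TEXT, agegroup TEXT, num_fans INTEGER)")
conn.executemany("INSERT INTO fanspergenderage VALUES (?, ?, ?, ?)", [
    ("2018-10-01", "M", "18-24", 999),   # older snapshot, excluded by the WHERE clause
    ("2018-10-16", "M", "18-24", 10),
    ("2018-10-16", "F", "18-24", 20),
    ("2018-10-16", "U", "25-34", 5),
    ("2018-10-16", "M", "25-34", 30),
    ("2018-10-16", "F", "25-34", 35),
])

query = """
SELECT f.agegroup,
       SUM(f.num_fans) AS total_fans,
       -- fans in this age group divided by fans across all age groups
       SUM(f.num_fans) * 1.0 / SUM(SUM(f.num_fans)) OVER () AS ratio
FROM fanspergenderage f
WHERE f.date >= '2018-10-16'
GROUP BY f.agegroup
ORDER BY f.agegroup
"""
split = conn.execute(query).fetchall()
print(split)
```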
I'm supposed to find the percentage of people having received aid.
I'm assuming the best way to do this is to find the number of rows that received 0 aid and the number of rows with a value greater than 0, store those in two variables, and divide accordingly to get the percentage. It's been a while since I've worked with SQL, so this is challenging me.
select
rprawrd_aidy_code as year,
sum(rprawrd_accept_amt)
from
rprawrd
where
rprawrd_aidy_code = '1819'
group by
rprawrd_aidy_code
This only gives me the total amount of aid provided for the year in question. I need to figure out the total number of rows that received aid vs. the total that didn't.
If the only output you need from your script is that ratio, there are a few ways to go about this one:
WITH cte (awrd) AS(
SELECT
CASE WHEN rprawrd_accept_amt > 0 THEN 1.0
ELSE 0.0
END awrd
FROM rprawrd
WHERE rprawrd_aidy_code = '1819'
)
SELECT SUM(awrd)/COUNT(awrd)
FROM cte
This gives you the proportion (between 0 and 1) of people who received an award; multiply by 100 for a percentage. If you need to know the amounts as well, you'll have to approach it differently.
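A sketch that runs this CTE in SQLite via Python's sqlite3 module, using rprawrd_aidy_code (the column name from the question's own query), plus an equivalent AVG form scaled to a percentage. The rows are hypothetical; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; only the table and column names come from the question.
conn.execute("CREATE TABLE rprawrd (rprawrd_aidy_code TEXT, rprawrd_accept_amt REAL)")
conn.executemany("INSERT INTO rprawrd VALUES (?, ?)", [
    ("1819", 0), ("1819", 500), ("1819", 0), ("1819", 1200),
    ("1718", 900),   # different aid year, excluded by the WHERE clause
])

query = """
WITH cte (awrd) AS (
    -- 1.0 for rows with a positive accepted amount, 0.0 otherwise
    SELECT CASE WHEN rprawrd_accept_amt > 0 THEN 1.0 ELSE 0.0 END
    FROM rprawrd
    WHERE rprawrd_aidy_code = '1819'
)
SELECT SUM(awrd) / COUNT(awrd) AS ratio,
       AVG(awrd) * 100 AS pct
FROM cte
"""
ratio, pct = conn.execute(query).fetchone()
print(ratio, pct)  # prints 0.5 50.0
```

Rows where rprawrd_accept_amt is NULL fall into the ELSE branch, so they are counted as not having received aid.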
I'm working on some SQL code as part of my university work (the data is fictitious, just to be clear). I'm trying to count the occurrences of 1 and 0 in the SQL table Fact_Stream; the value is stored in the Free_Stream column as a Boolean/bit value.
As calculations can't be made on bit values (at least not in the way I'm trying), I've converted the value to an integer. The table contains information on a streaming company's streams: a 1 indicates the stream was free of charge, and a 0 indicates the stream was paid for. My code:
SELECT Fact_Stream.Free_Stream, ((CAST(Free_Stream AS INT)) / COUNT(*) * 100) As 'Percentage of Streams'
FROM Fact_Stream
GROUP BY Free_Stream
The result/output is nearly where I want it to be, but it doesn't display the percentage correctly.
Output:
Using SQL Server Management Studio with SQL Server 2012 (I believe).
The percentage should be based on all rows, so you need to divide the count per 1/0 by a count of all rows. The easiest way to get this is utilizing a Windowed Aggregate Function:
SELECT Fact_Stream.Free_Stream,
100.0 * COUNT(*) -- count per bit
/ SUM(COUNT(*)) OVER () -- sum of those counts = count of all rows
As "Percentage of Streams"
FROM Fact_Stream
GROUP BY Free_Stream
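To see the difference, here is a sketch that runs the question's expression and the windowed one side by side in SQLite via Python's sqlite3 module. The data is hypothetical (3 free and 2 paid streams), and an INTEGER column stands in for SQL Server's BIT type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 3 free streams (1) and 2 paid streams (0).
conn.execute("CREATE TABLE Fact_Stream (Free_Stream INTEGER)")
conn.executemany("INSERT INTO Fact_Stream VALUES (?)", [(1,), (1,), (1,), (0,), (0,)])

# The question's expression: divides the 0/1 flag by the group count, in integer arithmetic.
original = conn.execute(
    "SELECT Free_Stream, (CAST(Free_Stream AS INT)) / COUNT(*) * 100 "
    "FROM Fact_Stream GROUP BY Free_Stream ORDER BY Free_Stream"
).fetchall()

# The windowed expression: group count over the sum of all group counts.
fixed = conn.execute(
    "SELECT Free_Stream, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER () "
    "FROM Fact_Stream GROUP BY Free_Stream ORDER BY Free_Stream"
).fetchall()
print(original)  # prints [(0, 0), (1, 0)]
print(fixed)     # prints [(0, 40.0), (1, 60.0)]
```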
Both your dividend and your divisor are INTs, so the result is also an INT. Make one of them a decimal (note how I use 100.0 and multiply before dividing, so the integer division never happens). You should also divide the count of rows in each group by the total count of rows in the table:
select Free_Stream,
count(*) * 100.0 / (select count(*) from Fact_Stream) as 'Percentage of Streams'
from Fact_Stream
group by Free_Stream
Your equation is dividing the identifier (1 or 0) by the number of streams for each one, instead of dividing the count of free or paid by the total count. One way to do this is to get the total count first, then use it in your query:
declare @totalcount real;
select @totalcount = count(*) from Fact_Stream;
SELECT Fact_Stream.Free_Stream,
(Cast(Count(*) as real) / @totalcount)*100 AS 'Percentage of Streams'
FROM Fact_Stream
group by Fact_Stream.Free_Stream
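The same two-step idea (compute the total first, then divide by it) can be sketched with Python's sqlite3 module, where a bound query parameter takes the place of the T-SQL variable. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact_Stream (Free_Stream INTEGER)")
conn.executemany("INSERT INTO Fact_Stream VALUES (?)", [(1,), (1,), (1,), (0,), (0,)])

# Step 1: fetch the total once (plays the role of the variable holding the total count).
(total_count,) = conn.execute("SELECT COUNT(*) FROM Fact_Stream").fetchone()

# Step 2: reuse it as a bound parameter; CAST to REAL avoids integer division.
shares = conn.execute(
    "SELECT Free_Stream, CAST(COUNT(*) AS REAL) / ? * 100 "
    "FROM Fact_Stream GROUP BY Free_Stream ORDER BY Free_Stream",
    (total_count,),
).fetchall()
print(shares)
```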