Is there a way to find percentage of non-zero vs zero values in one column? - sql

I'm supposed to find the percentage of people having received aid.
I'm assuming the best way to do this is find the number rows who received 0 aid, and the number of rows that have a greater than 0 value, create two variables for those and divide accordingly to find the percentage. It's been a while since I've worked with sql so this is challenging me.
select
rprawrd_aidy_code as year,
sum(rprawrd_accept_amt)
from
rprawrd
where
rprawrd_aidy_code = '1819'
group by
rprawrd_aidy_code
This only gives me a total of the amount of aid provided for the year in question. I need to figure out the total rows that received vs the total that didnt.

If the only output you need from your script is that ratio, there are a few ways to go about this one:
WITH cte (awrd) AS(
SELECT
CASE WHEN rprawrd_accept_amt > 0 THEN 1.0
ELSE 0.0
END awrd
FROM rprawrd
WHERE rprawrd_ady_code = '1819'
)
SELECT SUM(awrd)/COUNT(awrd)
FROM cte
This will get you the percentage of people who received an award, but if you need to know the amounts as well you'll have to approach it differently.

Related

How to fix this column doesn't exist error in SQL?

In the sales table, three columns are btl_price, bottle_qty, and total. The total for a transaction should be the product of btl_price and bottle_qty. How many transactions have a value of total that is not equal to btl_price times bottle_qty?
Here is the table:
Here are my codes:
sql = """
Select (btl_price*bottle_qty) As total_sale, CAST(total AS money)
From sales
Where total != total_sale
"""
It keeps telling me "column "total_sale" does not exist".
Please help me to identify my mistakes.
PS: I code this in Jupyter Notebook. This is a practice of mine not in any DBMS.
You cannot use columns computed in the SELECT clause in the WHERE clause (in SQL, the matter is evaluated before the former).
Also, you need proper type casting to compare money and numbers.
Finally, you need to turn on aggregation to compute the number of sales that satisfy the condition.
Assuming that you are using Postgres, that would be:
select count(*)
from sales
where total::numeric <> btl_price::numeric * btl_quantity
Try this:
SELECT *
FROM sales
WHERE total !=(btl_price * bottle_qty)
Good luck

how to calculate prevalence using sql code

I am trying to calculate prevalence in sql.
kind of stuck in writing the code.
I want to make automative code.
I have check that I have 1453477 of sample size and number of people who has disease is 851451 using count.
The formula of calculating prevalence is no.of person who has disease/no.sample size.
select (COUNT(condition_id)/COUNT(person_id)) as prevalence
from disease
where condition_id=12345;
when I run above code, I get 1 as a output where I am suppose to get 0.5858.
Can some one please help me out?
Thanks!
In your current query you count the number of rows in the disease table, once using the column condition_id, once using the column person_id. But the number of rows is the same - this is why you get 1 as a result.
I think you need to find the number of different values for these columns. This can be done using count distinct:
select (COUNT(DISTINCT condition_id)/COUNT(DISTINCT person_id)) as prevalence
from disease
where condition_id=12345;
You can cast by
count(...)/count(...)::numeric(6,4) or
count(...)/count(...)::decimal
as two options.
Important point is apply cast to denominator or numerator part(in this case denominator), Do not apply to division as
(count(...)/count(...))::numeric(6,4) which again results an integer.
I am pretty sure that the logic that you want is something like this:
select avg( (condition_id = 12345)::int )
from disease;
Your version doesn't have the sample size, because you are filtering out people without the condition.
If you have duplicate people in the data, then this is a little more complicated. One method is:
select (count(distinct person_id) filter (where condition_id = 12345)::numeric /
count(distinct person_id
)
from disease;

Query for grouping of successful attempts when order matters

Let's say, for example, I have a db table Jumper for tracking high jumpers. It has three columns of interest: attempt_id, athlete, and result (a boolean for whether the jumper cleared the bar or not).
I want to write a query that will compare all athletes' performance across different attempts yielding a table with this information: attempt number, number of cleared attempts, total attempts. In other words, what is the chance that an athlete will clear the bar on x attempt.
What is the best way of writing this query? It is trickier than it would seem at first because you need to determine the attempt number for each athlete to be able to total the final totals.
I would prefer answers be written with Django ORM, but SQL will also be accepted.
Edit: To be clear, I need it to be grouped by attempt, not by athlete. So it would be all athletes' combined x attempt.
You could solve it using SQL:
SELECT t.attempt_id,
SUM(CASE t.result WHEN TRUE THEN 1 ELSE 0 END) AS cleared,
COUNT(*) AS total
FROM Jumper t
GROUP BY t.attempt_id
EDIT: If attempt_id is just a sequence, and you want to use it to calculate the attempt number for each jumper, you could use this query instead:
SELECT t.attempt_number,
SUM(CASE t.result WHEN TRUE THEN 1 ELSE 0 END) AS cleared,
COUNT(*) AS total
FROM (SELECT s.*,
ROW_NUMBER() OVER(PARTITION BY athlete
ORDER BY attempt_id) AS attempt_number
FROM Jumper s) t
GROUP BY t.attempt_number
This way, you group every first attempt from all athletes, every second attempt from all athletes, and so on...

SQL Percentage of Occurrences

I'm working on some SQL code as part of my University work. The data is factitious just to be clear. I'm trying to count the occurances of 1 & 0 in the SQL table Fact_Stream, this is stored in the Free_Stream column/attribute as a Boolean/bit value.
As calculations cant be made on bit values (at least in the way I'm trying) I've converted the value to an integer -- Just to be clear on that. The table contains information on a streaming companies streams, a 1 indicates the stream was free of charge, a 0 indicates the stream was paid for. My code:
SELECT Fact_Stream.Free_Stream, ((CAST(Free_Stream AS INT)) / COUNT(*) * 100) As 'Percentage of Streams'
FROM Fact_Stream
GROUP BY Free_Stream
The result/output is nearly where I want it to be, but it doesn't display the percentage correctly.
Output:
Using MS SQL Management Studio | MS SQL Server 2012 (I believe)
The percentage should be based on all rows, so you need to divide the count per 1/0 by a count of all rows. The easiest way to get this is utilizing a Windowed Aggregate Function:
SELECT Fact_Stream.Free_Stream,
100.0 * COUNT(*) -- count per bit
/ SUM(COUNT(*)) OVER () -- sum of those counts = count of all rows
As "Percentage of Streams"
FROM Fact_Stream
GROUP BY Free_Stream
You have INTs as a devisor and devidened(not sure I am correct with namings). So the result is also INT. Just cast one of those to decimal(notice how did I change to 100.0). Also you should debide count of elements in group to total count of rows in the table:
select Free_Stream,
(count(*) / (select count(*) from Free_Stream)) * 100.0 as 'Percentage of Streams'
from Fact_Stream
group by Free_Stream
Your equation is dividing the identifier (1 or 0) by the number of streams for each one, instead of dividing the count of free or paid by the total count. One way to do this is to get the total count first, then use it in your query:
declare #totalcount real;
select #totalcount = count(*) from Fact_Stream;
SELECT Fact_Stream.Free_Stream,
(Cast(Count(*) as real) / #totalcount)*100 AS 'Percentage of Streams'
FROM Fact_Stream
group by Fact_Stream.Free_Stream

How to accurately figure percentage of a finite total

I’m in the process of creating a report that will tell end users what percentage of a gridview (the total number of records is a finite number) has been completed in a given month. I have a gridview with records that I’ve imported and users have to go into each record and update a couple of fields. I’m attempting to create a report tell me what percentage of the grand total of records was completed in a given month. All I need is the percentage. The grand total is (for this example) is 2000.
I’m not sure if the actual gridview information/code is needed here but if it does, let me know and I’ll add it.
The problem is that I have been able to calculate the percentage total but when its displayed the percentage total is repeated for every single line in the table. I’m scratching my head on how to make this result appear only once.
Right now here’s what I have for my SQL code (I use nvarchar because we import from many non windows systems and get all sorts of extra characters and added spaces to our information):
Declare #DateCount nvarchar(max);
Declare #DivNumber decimal(5,1);
SET #DivNumber = (.01 * 2541);
SET #DateCount = (SELECT (Count(date_record_entered) FROM dbo.tablename WHERE date_record_entered IS NOT NULL and date_record_entered >= 20131201 AND date_record_entered <= 20131231);
SELECT CAST(ROUND(#DivNumber / #DateCount, 1) AS decimal(5,1) FROM dbo.tablename
Let’s say for this example the total number of records in the date_record_entered for the month of December is 500.
I’ve tried the smaller pieces of code separately with no success. This is the most recent thing I’ve tried.
I know I'm missing something simple here but I'm not sure what.
::edit::
What I'm looking for as the expected result of my query is to have a percentage represented of records modified in a given month. If 500 records were done that would be 25%. I just want to have the 25 (and trainling decimal(s) when it applies) showing once and not 25 showing for every row in this table.
The following query should provide what you are looking for:
Declare #DivNumber decimal(5,1);
SET #DivNumber = (.01 * 2541);
SELECT
CAST(ROUND(#DivNumber / Count(date_record_entered), 1) AS decimal(5,1))
FROM dbo.tablename
WHERE date_record_entered IS NOT NULL
and date_record_entered >= 20131201
AND date_record_entered <= 20131231
Why do you select the constant value cast(round(#divNumber / #DateCount, 1) as decimal(5,1)from the table? That's the cause of your problem.
I'm not too familiar with sql server, but you might try to just select without a from clause.
select emp.Natinality_Id,mstn.Title as Nationality,count(emp.Employee_Id) as Employee_Count,
count(emp.Employee_Id)* 100.0 /nullif(sum(count(*)) over(),0) as Nationality_Percentage
FROM Employee as emp
left join Mst_Natinality as mstn on mstn.Natinality_Id = emp.Natinality_Id
where
emp.Is_Deleted='false'
group by emp.Natinality_Id,mstn.Title