As the title states, I want a WHERE clause to affect only a single column, and I am having trouble. My query is as follows:
Select exp, COUNT(grade), COUNT(exp)
FROM table
WHERE grade = 100
GROUP BY exp;
Essentially, I want output that groups by exp and gives a full count of everyone with that exp, but with a second column showing only how many of those people got perfect scores. The problem is that the current WHERE also affects COUNT(exp). I'm a beginner to SQL, so sorry if this is simple, and thanks for any help.
You want conditional aggregation, which in Postgres uses filter:
SELECT exp, COUNT(*),
COUNT(*) filter (where grade = 100)
FROM table
GROUP BY exp;
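To see conditional aggregation at work without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. The table name results and the sample rows are made up for illustration, and it uses SUM(CASE ...) as a portable stand-in for filter, since not every engine or version supports the filter clause.

```python
import sqlite3

# Hypothetical sample data: (exp, grade) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (exp INTEGER, grade INTEGER)")
conn.executemany(
    "INSERT INTO results (exp, grade) VALUES (?, ?)",
    [(1, 100), (1, 80), (1, 100), (2, 90), (2, 100), (3, 70)],
)

# No WHERE clause: every row is counted in "total", while the CASE
# expression restricts only the "perfect" column to grade = 100.
rows = conn.execute(
    """
    SELECT exp,
           COUNT(*) AS total,
           SUM(CASE WHEN grade = 100 THEN 1 ELSE 0 END) AS perfect
    FROM results
    GROUP BY exp
    ORDER BY exp
    """
).fetchall()

for row in rows:
    print(row)
```

The point is that the condition moves out of WHERE and into the aggregate itself, so groups with no perfect scores still show up with a count of 0.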
I'm stuck on an (apparently) trivial task that I can't get to work, and I see no option but to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records and want to do some math on the fetched columns. I recall (and have re-read some documentation and examples) that the keyword "AS" conveniently renames (aliases) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X and writing "used_qty*X AS To_Order" ... and the query failed. No worries, I just put the original table.field name back and it worked.
Going further, I have a complex calculation whose result I want to use in the next field, but the same issue popped up: if I alias "ROUND(...)" AS Nearest5 and then try to multiply that value in the next field, the query fails.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid rewriting the whole ROUND(...) expression for the TotPrice field.
What am I missing or doing wrong? Either SQLite does not support aliases within the same query, or I am using the wrong syntax and am just too stuck/confused to see the mistake (which I'm sure is a really stupid one).
Column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. There are two reasons:
The logical order in which the clauses of a query are processed (i.e. how they are compiled). This determines the scope of identifiers.
The order in which the expressions in the SELECT are evaluated. This is indeterminate: SQL does not require the expressions in the SELECT list to be evaluated in any particular order.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
By not allowing aliases to be referenced there, SQL engines do not have to make a choice. In the WHERE, x is always t.x.
You can try nested queries.
A SELECT query can be nested inside another SELECT query, in its FROM clause.
Several queries can be nested this way, for example with the following pattern:
SELECT *,[your last Expression] AS LastExp From (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
The nesting order matters: an expression defined in an inner query can be used by every query that encloses it.
So the first (innermost) expressions can be used by all the other queries, while an intermediate expression can only be used by the queries outside it.
For your case, your query may be:
SELECT *, PRC*Nearest5 AS TotPrice
FROM (SELECT *, ROUND((used_qty*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5
FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price AS PRC, UsedItems.qty_used AS used_qty,
UsedItems.qty_used*X AS To_Order
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'))
ORDER BY RefDes, value ASC
Note that the middle query can only see the columns the innermost query returns, so qty_used has to be selected there (here as used_qty), and the ORDER BY belongs on the outermost query.
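Here is a small, hedged sketch of the same idea that can be run with Python's built-in sqlite3 module. The table used_items, its columns and the value of x are invented for illustration and are much simpler than the real Inventory/UsedItems schema. It first shows the alias failing inside the same SELECT, then shows the nested query making it available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the Inventory/UsedItems join.
conn.execute("CREATE TABLE used_items (ref_des TEXT, qty_used INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO used_items VALUES (?, ?, ?)",
    [("R1", 3, 0.5), ("C1", 7, 0.25)],
)

x = 2  # hypothetical multiplier, playing the role of "X" in the question

# 1) Referencing the alias to_order inside the same SELECT list fails.
try:
    conn.execute(
        "SELECT qty_used * ? AS to_order, price * to_order AS tot_price "
        "FROM used_items",
        (x,),
    )
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)
print("same-SELECT alias:", alias_error)

# 2) Defining the alias in an inner query makes it an ordinary column
#    of the derived table, so the outer query can use it freely.
rows = conn.execute(
    """
    SELECT ref_des, to_order, price * to_order AS tot_price
    FROM (SELECT ref_des, price, qty_used * ? AS to_order
          FROM used_items)
    ORDER BY ref_des
    """,
    (x,),
).fetchall()
print(rows)
```

The expression is written once, in the inner query, and the outer query just treats to_order as a column.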
I am trying to calculate prevalence in SQL.
I'm kind of stuck writing the code.
I want the calculation to be automated.
Using count, I have checked that the sample size is 1453477 and that the number of people who have the disease is 851451.
The formula for prevalence is (number of people who have the disease) / (sample size).
select (COUNT(condition_id)/COUNT(person_id)) as prevalence
from disease
where condition_id=12345;
When I run the above code, I get 1 as the output, where I am supposed to get 0.5858.
Can someone please help me out?
Thanks!
In your current query you count the number of rows in the disease table, once using the column condition_id, once using the column person_id. But the number of rows is the same - this is why you get 1 as a result.
I think you need to find the number of different values for these columns. This can be done using count distinct:
select (COUNT(DISTINCT condition_id)/COUNT(DISTINCT person_id)) as prevalence
from disease
where condition_id=12345;
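You can cast, using either
count(...)/count(...)::numeric(6,4) or
count(...)/count(...)::decimal
as two options.
The important point is to apply the cast to the numerator or the denominator (in this case the denominator), so that the division itself is no longer an integer division. Do not apply it to the result of the division, as in
(count(...)/count(...))::numeric(6,4), because the integer division has already truncated the value by then.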
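To make the difference concrete, here is a quick sketch using the two counts from the question. It runs on SQLite through Python's sqlite3 module rather than Postgres, so it uses CAST(... AS REAL) instead of the :: syntax; the integer-division behaviour it shows is the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer divided by integer: the fractional part is thrown away.
(truncated,) = conn.execute("SELECT 851451 / 1453477").fetchone()

# Casting the result is too late, the value is already truncated.
(cast_late,) = conn.execute(
    "SELECT CAST(851451 / 1453477 AS REAL)"
).fetchone()

# Casting one operand first makes it a non-integer division.
(cast_early,) = conn.execute(
    "SELECT CAST(851451 AS REAL) / 1453477"
).fetchone()

print(truncated, cast_late, round(cast_early, 4))
```

Only the last form gives the 0.5858 the question expects.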
I am pretty sure that the logic that you want is something like this:
select avg( (condition_id = 12345)::int )
from disease;
Your version doesn't have the sample size, because you are filtering out people without the condition.
If you have duplicate people in the data, then this is a little more complicated. One method is:
select (count(distinct person_id) filter (where condition_id = 12345))::numeric /
count(distinct person_id)
from disease;
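A small sketch of why duplicates matter, again on SQLite via Python's sqlite3 module. The rows below are invented, and COUNT(DISTINCT CASE ...) stands in for the Postgres filter clause, so treat it as an illustration of the logic rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disease (person_id INTEGER, condition_id INTEGER)")
# Hypothetical data: 4 people, person 1 has condition 12345 recorded twice.
conn.executemany(
    "INSERT INTO disease VALUES (?, ?)",
    [(1, 12345), (1, 12345), (1, 500), (2, 999), (2, 888), (3, 12345), (4, 777)],
)

# Row-level average: counts rows, so duplicates and other conditions skew it.
(row_level,) = conn.execute(
    "SELECT AVG(condition_id = 12345) FROM disease"
).fetchone()

# Person-level prevalence: distinct people with the condition
# divided by distinct people overall (no WHERE, so nobody is filtered out).
(prevalence,) = conn.execute(
    """
    SELECT COUNT(DISTINCT CASE WHEN condition_id = 12345 THEN person_id END) * 1.0
           / COUNT(DISTINCT person_id)
    FROM disease
    """
).fetchone()

print(row_level, prevalence)
```

With this data the row-level figure is 3 out of 7 rows, while the person-level prevalence is 2 out of 4 people.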
I am new to SQL and have had pretty good luck figuring things out thus far but I am missing something in this query:
The question is how to return a distinct count from two columns using another column and the criteria if the value is greater than 0.
I have tried IF and AND operators (My current query returns a 0 not an error, and it works when only using one .shp criteria)
select count (distinct ti.TO_ADDRESS)
from ti
where ti.input_id = 'xxx_029_01z_c_zzzzbab_ecrm.shp'
and ti.input_id = 'xxx_030_01z_c_zzzzbab_ecrm.shp'
and ti.OPENED>0;
Thanks so much!!
I think you want two levels of aggregation:
select count(*)
from (select ti.TO_ADDRESS
from ti
where ti.input_id in ('xxx_029_01z_c_zzzzbab_ecrm.shp', 'xxx_030_01z_c_zzzzbab_ecrm.shp') and
ti.OPENED > 0
group by ti.TO_ADDRESS
having count(distinct ti.input_id) = 2 -- has both of them
) ti;
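Here is a hedged sketch, with invented addresses, that contrasts the three readings of the question on SQLite via Python's sqlite3 module. The table layout is a guess based only on the column names in the query.

```python
import sqlite3

F029 = "xxx_029_01z_c_zzzzbab_ecrm.shp"
F030 = "xxx_030_01z_c_zzzzbab_ecrm.shp"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ti (TO_ADDRESS TEXT, input_id TEXT, OPENED INTEGER)")
# Hypothetical rows.
conn.executemany(
    "INSERT INTO ti VALUES (?, ?, ?)",
    [
        ("a@x", F029, 1), ("a@x", F030, 2),  # opened mail from both files
        ("b@x", F029, 1),                    # only the first file
        ("c@x", F029, 1), ("c@x", F030, 0),  # second file never opened
        ("d@x", F030, 3),                    # only the second file
    ],
)

# Original query: one row can never have two different input_id values,
# so the AND condition matches nothing.
(with_and,) = conn.execute(
    "SELECT COUNT(DISTINCT TO_ADDRESS) FROM ti "
    "WHERE input_id = ? AND input_id = ? AND OPENED > 0",
    (F029, F030),
).fetchone()

# IN: addresses that opened mail from either file.
(either,) = conn.execute(
    "SELECT COUNT(DISTINCT TO_ADDRESS) FROM ti "
    "WHERE input_id IN (?, ?) AND OPENED > 0",
    (F029, F030),
).fetchone()

# Two levels of aggregation: addresses that opened mail from both files.
(both,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT TO_ADDRESS
          FROM ti
          WHERE input_id IN (?, ?) AND OPENED > 0
          GROUP BY TO_ADDRESS
          HAVING COUNT(DISTINCT input_id) = 2)
    """,
    (F029, F030),
).fetchone()

print(with_and, either, both)
```

If "either file" is what you actually want, the plain IN version is enough and the inner GROUP BY/HAVING is not needed.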
I'm trying to write an (Oracle) SQL query that, given an "agent_id", would give me a list of questions that agent has answered during an assessment, as well as an average score over all of the times that agent has answered those questions.
Note: I tried to design the query such that it would support multiple employees (so we can query at the store level), hence the "IN" condition in the where clause.
Here's what I have so far:
select question.question_id as questionId,
((sum(answer.answer) / count(answer.answer)) * 100) as avgScore
from SPMADMIN.SPM_QC_ASSESSMENT_ANSWER answer
join SPMADMIN.SPM_QC_QUESTION question
on answer.question_id = question.question_id
join SPMADMIN.SPM_QC_ASSESSMENT assessment
on answer.assessment_id = assessment.assessment_id
join SPMADMIN.SPM_QC_SUB_GROUP_TYPE sub_group
on question.sub_group_type_id = sub_group.sub_group_id
join SPMADMIN.SPM_QC_GROUP_TYPE theGroup
on sub_group.group_id = theGroup.group_id
where question.question_id in (select distinct question2.question_id
from SPMADMIN.SPM_QC_QUESTION question2
)
and question.bool_yn_active_flag = 'Y'
and assessment.agent_id in (?)
and answer.answer is not null
order by theGroup.page_order asc,
sub_group.group_order asc,
question.sub_group_order asc
Basically I would want to see:
|questionId|avgScore|
| 1 | 100 |
| 2 | 50 |
| 3 | 75 |
Such that every question that employee has ever answered is in the list of question indexes with their average score over all of the times they've answered it.
When I run it as is, I get an "ORA-00937: not a single-group group function" error. No combination of "group by" clauses I've tried has helped in the least.
When I run it removing the question.question_id as questionId, part of the select, it runs fine, but it shows their average score over all questions. I need it broken down by question.
Any help or pointers would be greatly appreciated.
When you have an aggregate function in the SELECT list (SUM and COUNT are aggregate functions), then any other columns in the SELECT list need to be in a GROUP BY clause. For example:
SELECT fi, COUNT(fo)
FROM fum
GROUP BY fi
The COUNT(fo) expression is an aggregate, the fi column is a non-aggregate. If you were to add another non-aggregate to the SELECT list, it would also need to be included in the GROUP BY. For example
SELECT TRUNC(fee), fi, COUNT(fo)
FROM fum
GROUP BY TRUNC(fee), fi
To be a little more precise, rather than say "columns in the SELECT list", we should actually say "all non-aggregate expressions in the SELECT list" will need to be included in the GROUP BY clause.
It's not your joins but your use of GROUP BY.
When you use a GROUP BY in SQL, the things you GROUP BY are the things which define the groups. Everything else you have in your SELECT has to be inside an aggregate that operates over the group.
You can also do aggregates over the entire set without a GROUP BY, but then every column will need to be within an aggregate function.
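As a rough illustration of the rule, here is a sketch on SQLite via Python's sqlite3 module, with a made-up answers table instead of the real SPMADMIN schema. One caveat: SQLite is lenient and will not raise the equivalent of ORA-00937 for a bare column, so the sketch only shows the correctly grouped query, not the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (question_id INTEGER, answer INTEGER)")
# Hypothetical data: answer is 1 (correct) or 0 (incorrect).
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (2, 0), (3, 1), (3, 1), (3, 1), (3, 0)],
)

# question_id is the only non-aggregate expression in the SELECT list,
# so it is exactly what goes into GROUP BY.
rows = conn.execute(
    """
    SELECT question_id AS questionId,
           AVG(answer) * 100 AS avgScore
    FROM answers
    GROUP BY question_id
    ORDER BY question_id
    """
).fetchall()

for row in rows:
    print(row)
```

Dropping question_id from the SELECT and the GROUP BY would collapse this to a single overall average, which matches what the question describes seeing.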
So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, it gives me an error saying:
"Column 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.".
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?
Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
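FROM BIOLOGICAL y
WHERE y.patient_id = ? -- same patient as the outer filter, so another patient's later date cannot hide a row
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name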
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?
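Here is a runnable sketch of this join-back pattern on SQLite via Python's sqlite3 module. The sample rows are invented, and dates are stored as ISO text so MAX orders them correctly. As an assumption about the intent, the sketch filters by patient inside the subquery as well as outside, so that a later date belonging to a different patient cannot knock out a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biological ("
    "patient_id INTEGER, medication_name TEXT, "
    "medication_dose INTEGER, patient_history_date_bio TEXT)"
)
# Hypothetical history rows.
conn.executemany(
    "INSERT INTO biological VALUES (?, ?, ?, ?)",
    [
        (12, "A", 20, "2020-01-01"),
        (12, "A", 10, "2020-03-01"),  # latest A for patient 12
        (12, "B", 5, "2020-02-01"),   # latest B for patient 12
        (13, "A", 50, "2020-06-01"),  # another patient, later date
    ],
)

patient = 12
rows = conn.execute(
    """
    SELECT b.medication_name,
           b.patient_history_date_bio AS med_date,
           b.medication_dose
    FROM biological b
    JOIN (SELECT y.medication_name,
                 MAX(y.patient_history_date_bio) AS max_date
          FROM biological y
          WHERE y.patient_id = ?
          GROUP BY y.medication_name) x
      ON x.medication_name = b.medication_name
     AND x.max_date = b.patient_history_date_bio
    WHERE b.patient_id = ?
    ORDER BY b.medication_name
    """,
    (patient, patient),
).fetchall()

print(rows)
```

The subquery finds the latest date per medication, and the join picks up the whole row for that date, dose included.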
If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However, note that this is normally an indication that you are either building the query incorrectly or need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should be the one suggested by OMG Ponies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?
You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)
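One thing to keep in mind with this approach, shown in the hedged sketch below (SQLite via Python's sqlite3 module, invented rows): MAX(medication_dose) is computed independently of MAX(patient_history_date_bio), so the dose you get back is the highest dose ever recorded for that medication, which is not necessarily the dose on the latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biological ("
    "patient_id INTEGER, medication_name TEXT, "
    "medication_dose INTEGER, patient_history_date_bio TEXT)"
)
# Hypothetical rows: the dose was lowered from 20 to 10 on the later date.
conn.executemany(
    "INSERT INTO biological VALUES (?, ?, ?, ?)",
    [(12, "A", 20, "2020-01-01"), (12, "A", 10, "2020-03-01")],
)

# Aggregating both columns: each MAX is taken on its own.
aggregated = conn.execute(
    """
    SELECT medication_name,
           MAX(patient_history_date_bio) AS med_date,
           MAX(medication_dose) AS dose
    FROM biological
    WHERE patient_id = 12
    GROUP BY medication_name
    """
).fetchall()

# The dose actually recorded on the latest date.
(latest_dose,) = conn.execute(
    """
    SELECT medication_dose
    FROM biological
    WHERE patient_id = 12 AND medication_name = 'A'
    ORDER BY patient_history_date_bio DESC
    LIMIT 1
    """
).fetchone()

print(aggregated, latest_dose)
```

Here the aggregated row pairs the March date with a dose of 20, while the row for that date actually has a dose of 10, so this shortcut is only safe when the dose never decreases.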