Aggregate function calls cannot be nested? - sql

In a PostgreSQL database I have a table called answers. This table stores information about how users answered questions. There are only 4 questions in the table. However, the number of users who answered the questions can vary, and a user can answer only some of the questions.
Table answers:
| EMPLOYEE | QUESTION_ID | QUESTION_TEXT | OPTION_ID | OPTION_TEXT |
|----------|-------------|------------------------|-----------|--------------|
| Bob | 1 | Do you like soup? | 1 | Yes |
| Alex | 1 | Do you like soup? | 2 | No |
| Kate | 1 | Do you like soup? | 3 | I don't know |
| Bob | 2 | Do you like ice cream? | 1 | Yes |
| Alex | 2 | Do you like ice cream? | 3 | I don't know |
| Oliver | 2 | Do you like ice cream? | 1 | Yes |
| Bob | 3 | Do you like summer? | 2 | No |
| Alex | 3 | Do you like summer? | 1 | Yes |
| Jack | 3 | Do you like summer? | 2 | No |
| Bob | 4 | Do you like winter? | 3 | I don't know |
| Alex | 4 | Do you like winter? | 1 | Yes |
| Oliver | 4 | Do you like winter? | 3 | I don't know |
For example, with the following code I can find the average of the answers to questions 1 and 2 for each person who answered them:
select
employee,
avg(
case when question_id in (1, 2) then option_id else null end
) as average_score
from
answers
group by
employee
Result:
| EMPLOYEE | AVERAGE_SCORE |
|----------|---------------|
| Bob | 1 |
| Alex | 2.5 |
| Kate | 3 |
| Oliver | 1 |
| Jack | NULL |
Now I want to know the number of users whose average answer to questions 1 and 2 is >= 2. I tried the following code, but it raises an error:
select
count(
avg(
case when question_id in (1, 2) then option_id else null end
)
) as average_score
from
answers
where
average_score >= 2
group by
answers.employee
ERROR:
SQL Error [42803]: ERROR: aggregate function calls cannot be nested

You need to filter after aggregation, which is what a having clause does. In Postgres, you can also use filter:
select employee,
avg(option_id) filter (where question_id in (1, 2)) as average_score
from answers
group by employee
having avg(option_id) filter (where question_id in (1, 2)) >= 2;
If you want the count, then use this as a subquery: select count(*) from (<the above query>) t. Postgres versions before 16 require the alias (t here) on a subquery in from.
It is strange that you are equating "option_id" with "score", but that is how your question is phrased.
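A minimal, self-contained sketch of the count query for PostgreSQL. The column types are assumptions (the question does not give them), and QUESTION_TEXT and OPTION_TEXT are left out because the query does not use them; the rows are the sample data from the question.

```sql
-- Assumed schema: only the columns the query needs; the types are a guess.
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (
    employee    text    NOT NULL,
    question_id integer NOT NULL,
    option_id   integer NOT NULL
);

-- Sample data from the question.
INSERT INTO answers (employee, question_id, option_id) VALUES
    ('Bob', 1, 1), ('Alex', 1, 2), ('Kate', 1, 3),
    ('Bob', 2, 1), ('Alex', 2, 3), ('Oliver', 2, 1),
    ('Bob', 3, 2), ('Alex', 3, 1), ('Jack', 3, 2),
    ('Bob', 4, 3), ('Alex', 4, 1), ('Oliver', 4, 3);

-- Inner query: HAVING filters after aggregation, keeping one row per employee
-- whose average over questions 1 and 2 is >= 2. Jack answered neither question,
-- so his filtered average is NULL and the comparison drops him.
-- Outer query: count the surviving rows.
SELECT count(*) AS employees_with_avg_ge_2
FROM (
    SELECT employee
    FROM answers
    GROUP BY employee
    HAVING avg(option_id) FILTER (WHERE question_id IN (1, 2)) >= 2
) AS t;
```

With this data the averages are 1 (Bob), 2.5 (Alex), 3 (Kate) and 1 (Oliver), so the query returns 2.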

You have to use a having clause. Postgres does not let you refer to the output alias average_score inside having, so repeat the aggregate expression there:
select employee,
avg(case when question_id in (1, 2)
then option_id else null
end
) as average_score
from answers
group by employee
having avg(case when question_id in (1, 2)
then option_id else null
end
) >= 2;

Related

What Clause would most optimally create this query?

So I don't have much experience with SQL and am trying to learn. I came across this interview question. Maybe I'm missing a piece of information needed to solve it, or maybe I'm approaching the problem wrong.
This is the question:
We have the following two tables:
POLICY (id as int, policy_content as varchar2)
POLICY_VOTES (vote as boolean, policy_id as int)
Write a single query that returns the policy_id, the number of yes (true) votes and the number of no (false) votes, with a row for each policy up for a vote.
My first thought was to use a WITH clause to get the policy_ids and an inner join to get the yes and no votes, but I can't find a way to make it work. That leads me to believe there's another SQL clause I'm not aware of that would make this easier, or that I'm thinking about the problem the wrong way.
Good question.
I cannot answer too specifically, since you did not specify a DBMS, but what you want to do is conditionally count or sum based on a criterion. When you use an aggregate function like that, you also need GROUP BY.
Here are two example tables I made with test data:
policy
| id | policy_content |
|----|----------------|
| 1 | foo |
| 2 | foo |
| 3 | foo |
| 4 | foo |
| 5 | foo |
policy_votes
| vote | policy_id |
|------|-----------|
| yes | 1 |
| no | 1 |
| yes | 2 |
| yes | 2 |
| no | 3 |
| no | 3 |
| no | 4 |
| yes | 4 |
| yes | 5 |
| yes | 5 |
Using the below query:
SELECT
policy_votes.policy_id,
SUM(CASE WHEN vote = 'yes' THEN 1 ELSE 0 END) AS yes_votes,
SUM(CASE WHEN vote = 'no' THEN 1 ELSE 0 END) AS no_votes
FROM
policy_votes
GROUP BY
policy_votes.policy_id
You get:
| POLICY_ID | YES_VOTES | NO_VOTES |
|-----------|-----------|----------|
| 1 | 1 | 1 |
| 2 | 2 | 0 |
| 4 | 1 | 1 |
| 5 | 2 | 0 |
| 3 | 0 | 2 |
Here is an SQL Fiddle for you to try it out.
Try this:
select p.id, p.policy_content,
Count(case when pv.vote = true then 1 end) as number_of_yes,
Count(case when pv.vote = false then 1 end) as number_of_no
From policy p join policy_votes pv
On (p.id = pv.policy_id)
Group by p.id, p.policy_content
Cheers!!
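The interview question declares vote as a boolean, while the test data above stores 'yes'/'no' text. A self-contained sketch with a real boolean column, assuming PostgreSQL (or another database with a boolean type) and made-up column sizes; the votes mirror the test data above with true = yes and false = no.

```sql
-- Assumed schema; drop the child table first because of the foreign key.
DROP TABLE IF EXISTS policy_votes;
DROP TABLE IF EXISTS policy;
CREATE TABLE policy (
    id             integer PRIMARY KEY,
    policy_content varchar(100)
);
CREATE TABLE policy_votes (
    vote      boolean NOT NULL,
    policy_id integer NOT NULL REFERENCES policy (id)
);

INSERT INTO policy (id, policy_content) VALUES
    (1, 'foo'), (2, 'foo'), (3, 'foo'), (4, 'foo'), (5, 'foo');
INSERT INTO policy_votes (vote, policy_id) VALUES
    (true, 1), (false, 1), (true, 2), (true, 2), (false, 3),
    (false, 3), (false, 4), (true, 4), (true, 5), (true, 5);

-- COUNT ignores NULLs, and a CASE without ELSE yields NULL for non-matching
-- rows, so each COUNT only counts the votes its condition selects.
SELECT p.id,
       COUNT(CASE WHEN pv.vote THEN 1 END)     AS number_of_yes,
       COUNT(CASE WHEN NOT pv.vote THEN 1 END) AS number_of_no
FROM policy p
JOIN policy_votes pv ON p.id = pv.policy_id
GROUP BY p.id
ORDER BY p.id;
```

This returns one row per policy: 1 yes and 1 no for policies 1 and 4, 2 and 0 for policies 2 and 5, and 0 and 2 for policy 3. Because it is an inner join, a policy with no votes at all would not appear; a LEFT JOIN from policy would list it with zero counts.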

How correctly use AVG in query?

In a PostgreSQL database I have a table called answers, which looks like this:
| EMPLOYEE | QUESTION_ID | QUESTION_TEXT | OPTION_ID | OPTION_TEXT |
|----------|-------------|------------------------|-----------|--------------|
| Bob | 1 | Do you like soup? | 1 | 1 |
| Alex | 1 | Do you like soup? | 9 | 9 |
| Oliver | 1 | Do you like soup? | 6 | 6 |
| Bob | 2 | Do you like ice cream? | 3 | 3 |
| Alex | 2 | Do you like ice cream? | 9 | 9 |
| Oliver | 2 | Do you like ice cream? | 8 | 8 |
| Bob | 3 | Do you like summer? | 2 | 2 |
| Alex | 3 | Do you like summer? | 9 | 9 |
| Oliver | 3 | Do you like summer? | 8 | 8 |
In this table you can see that I have 3 questions and the users' answers to them. Users answer questions on a scale of one to ten. I'm trying to find the number of users whose average answer to questions 1, 2 and 3 is greater than 5, without a deeply nested subquery. For example, only 2 users have an average answer (avg(option_text)) above 5 across the three questions: Alex and Oliver.
I tried this script, but it doesn't work as I expected:
SELECT
SUM(CASE WHEN (AVG(OPTION_ID) FILTER(WHERE QUESTION_ID IN(61, 62))) > 5 THEN 1 ELSE 0 END) AS COUNT
FROM
ANSWERS;
ERROR:
SQL Error [42803]: ERROR: aggregate function calls cannot be nested
You can select all employees that have an average response greater than 5 for questions 1, 2, 3 with a group by query:
select employee, avg(option_id)
from answers
where question_id in (1,2,3)
group by employee
having avg(option_id) > 5
and count(distinct question_id) = 3
-- the last part is only needed if you only want employees that answered all questions
To count the number of users whose average is greater than 5:
select count(*) from (
select employee
from answers
where question_id in (1,2,3)
group by employee
having avg(option_id) > 5
and count(distinct question_id) = 3
) as t;
The following query should also work:
SELECT
DISTINCT COUNT(*) OVER () AS CNT
FROM ANSWERS
WHERE QUESTION_ID NOT IN(61, 62)
GROUP BY EMPLOYEE
HAVING AVG(OPTION_ID) > 5
Check demo Here
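A self-contained sketch for PostgreSQL of both approaches above, using the question's sample data with assumed column types (QUESTION_TEXT and OPTION_TEXT omitted, and the filter written as question_id IN (1, 2, 3) to match this data). Both queries should return 2 (Alex and Oliver).

```sql
-- Assumed schema: only the columns the queries need.
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (
    employee    text    NOT NULL,
    question_id integer NOT NULL,
    option_id   integer NOT NULL
);

-- Sample data from the question.
INSERT INTO answers (employee, question_id, option_id) VALUES
    ('Bob', 1, 1), ('Alex', 1, 9), ('Oliver', 1, 6),
    ('Bob', 2, 3), ('Alex', 2, 9), ('Oliver', 2, 8),
    ('Bob', 3, 2), ('Alex', 3, 9), ('Oliver', 3, 8);

-- Approach 1: group, filter with HAVING, then count the groups in an outer query.
SELECT count(*) AS cnt
FROM (
    SELECT employee
    FROM answers
    WHERE question_id IN (1, 2, 3)
    GROUP BY employee
    HAVING avg(option_id) > 5
       AND count(DISTINCT question_id) = 3   -- only employees who answered all three
) AS t;

-- Approach 2: the window function runs after GROUP BY and HAVING, so
-- COUNT(*) OVER () counts the remaining groups; DISTINCT collapses the
-- identical per-group rows into a single row.
SELECT DISTINCT COUNT(*) OVER () AS cnt
FROM answers
WHERE question_id IN (1, 2, 3)
GROUP BY employee
HAVING AVG(option_id) > 5;
```

One difference: if no employee passes the HAVING filter, the subquery version returns a single row containing 0, while the window-function version returns no rows at all.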

SQL Select Rows where a value of a multiple value field occurs only once

I have the following problem:
I have a table with different columns describing objects. Let's assume one of these columns can contain the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Within this table, an object can have all of these values or only some of them, for example 1, 3, 5 (so 0 to n values).
Now I want to find all the objects that have only the values 1 and 2. I do not want them in my result set if they have 1, 2, 3 or any combination other than (1, 2).
How do I write this SQL statement?
Sample data (expected result: Mark and Michael):
| OBJ | OBJ_CHARACTERISTIC | CHARACTERISTIC_DATE_ADDED |
|---------|--------------------|---------------------------|
| Mark | 1 | 15.01.2018 |
| Mark | 2 | 15.02.2018 |
| Jimmy | 1 | 31.01.2018 |
| Jimmy | 2 | 11.02.2018 |
| Jimmy | 4 | 15.03.2018 |
| Jimmy | 5 | 15.04.2018 |
| Jimmy | 6 | 15.04.2018 |
| Harry | 1 | 08.01.2018 |
| Harry | 2 | 11.01.2018 |
| Harry | 3 | 15.02.2018 |
| Michael | 1 | 15.06.2018 |
| Michael | 2 | 15.07.2018 |
| Dwayne | 4 | 15.01.2018 |
| Dwayne | 5 | 15.01.2018 |
| Dwayne | 6 | 15.01.2018 |
You could use analytic counts to see how many characteristics each object has, and how many of the ones you are looking for; and then compare those counts:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
count(distinct obj_characteristic) over (partition by obj) as c1,
count(distinct case when obj_characteristic in (1,2) then obj_characteristic end)
over (partition by obj) as c2
from your_table
)
where c1 = c2;
With your sample data that gives:
OBJ OBJ_CHARACTERISTIC CHARACTERI
------- ------------------ ----------
Mark 1 2018-01-15
Mark 2 2018-02-15
Michael 1 2018-06-15
Michael 2 2018-07-15
From the way the question is worded, it sounds like you want the complete rows, as above; from a comment, you may only want the names. If so, you can just change the outer select to:
select distinct obj
from ...
OBJ
-------
Mark
Michael
or use aggregates instead via a having clause:
select obj
from your_table
group by obj
having count(distinct obj_characteristic)
= count(distinct case when obj_characteristic in (1,2) then obj_characteristic end);
OBJ
-------
Mark
Michael
db<>fiddle demo of all three.
In this case, as 1 and 2 are contiguous, you could also do this with min/max, as an aggregate to just get the names:
select obj
from your_table
group by obj
having min (obj_characteristic) = 1
and max(obj_characteristic) = 2;
or analytically to get the complete rows:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
min(obj_characteristic) over (partition by obj) as min_char,
max(obj_characteristic) over (partition by obj) as max_char
from your_table
)
where min_char = 1
and max_char = 2;
but the earlier versions are more generic.
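The analytic version above relies on count(distinct ...) over (...), which PostgreSQL does not support. A sketch that returns the complete rows there instead joins the table back to the grouped having query. The table name your_table comes from the answer; the column types are assumptions, and the dates from the sample data are rewritten in ISO format.

```sql
-- Assumed schema for the sample data.
DROP TABLE IF EXISTS your_table;
CREATE TABLE your_table (
    obj                       varchar(20) NOT NULL,
    obj_characteristic        integer     NOT NULL,
    characteristic_date_added date        NOT NULL
);

INSERT INTO your_table (obj, obj_characteristic, characteristic_date_added) VALUES
    ('Mark', 1, '2018-01-15'), ('Mark', 2, '2018-02-15'),
    ('Jimmy', 1, '2018-01-31'), ('Jimmy', 2, '2018-02-11'), ('Jimmy', 4, '2018-03-15'),
    ('Jimmy', 5, '2018-04-15'), ('Jimmy', 6, '2018-04-15'),
    ('Harry', 1, '2018-01-08'), ('Harry', 2, '2018-01-11'), ('Harry', 3, '2018-02-15'),
    ('Michael', 1, '2018-06-15'), ('Michael', 2, '2018-07-15'),
    ('Dwayne', 4, '2018-01-15'), ('Dwayne', 5, '2018-01-15'), ('Dwayne', 6, '2018-01-15');

-- The derived table keeps objects whose distinct characteristics are all in (1, 2)
-- (same comparison as the having version above); the join brings back their rows.
SELECT t.obj, t.obj_characteristic, t.characteristic_date_added
FROM your_table t
JOIN (
    SELECT obj
    FROM your_table
    GROUP BY obj
    HAVING count(DISTINCT obj_characteristic)
         = count(DISTINCT CASE WHEN obj_characteristic IN (1, 2) THEN obj_characteristic END)
) AS only_1_2 ON only_1_2.obj = t.obj
ORDER BY t.obj, t.obj_characteristic;
```

With the sample data this returns the two Mark rows and the two Michael rows.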
If you are just looking for SQL that returns rows whose value is exactly '1,2' and nothing else, use:
select * from your_table where your_column = '1,2'
Post an example of the data; it would help to understand the problem.
@dwin90 You could try:
SELECT obj
FROM your_table
GROUP BY obj
HAVING COUNT(CASE WHEN obj_characteristic NOT IN (1, 2) THEN 1 END) = 0

SQL select values sum by same ID

Here is my table called "Employee":
eID | name |
==============
1 | Mike |
2 | Josh |
3 | Mike |
And table called "Sells"
sID | eID | price |
====================
1 | 1 | 8 |
2 | 3 | 9 |
3 | 3 | 5 |
4 | 1 | 4 |
5 | 1 | 3 |
This should be my expected result, the total income per employee:
name | Income |
==================
Mike | 15 |
Josh | 0 |
Mike | 14 |
I know how to use "SUM ... GROUP BY ..." to get the incomes of 15 and 14, but I don't know how to get the income of 0 for the employee who does not appear in the "Sells" table.
Could someone give me some help? Thanks a lot.
You just need to use a left outer join, so that employees without any sales still get a row. You could use a case expression to deal with the resulting null values:
SELECT e.name,
COALESCE(SUM(price), 0) as Income
FROM Employee e
LEFT OUTER JOIN Sells s
ON e.eid = s.eid
GROUP BY e.eid, e.name
Edited: a case expression is not needed. I applied COALESCE to the result of the SUM function to deal with missing values (SUM returns NULL when there are no non-null values to add).
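A self-contained sketch for trying the answer against the question's data; the column types are assumptions. The LEFT OUTER JOIN keeps Josh (eID 2), who has no rows in Sells, COALESCE turns his NULL sum into 0, and grouping by eID as well as name keeps the two employees called Mike apart.

```sql
-- Assumed schema; drop the child table first because of the foreign key.
DROP TABLE IF EXISTS Sells;
DROP TABLE IF EXISTS Employee;
CREATE TABLE Employee (
    eID  integer PRIMARY KEY,
    name varchar(50) NOT NULL
);
CREATE TABLE Sells (
    sID   integer PRIMARY KEY,
    eID   integer NOT NULL REFERENCES Employee (eID),
    price integer NOT NULL
);

INSERT INTO Employee (eID, name) VALUES (1, 'Mike'), (2, 'Josh'), (3, 'Mike');
INSERT INTO Sells (sID, eID, price) VALUES
    (1, 1, 8), (2, 3, 9), (3, 3, 5), (4, 1, 4), (5, 1, 3);

-- One row per employee, including those without sales.
SELECT e.name,
       COALESCE(SUM(s.price), 0) AS income
FROM Employee e
LEFT OUTER JOIN Sells s ON e.eID = s.eID
GROUP BY e.eID, e.name
ORDER BY e.eID;
```

Ordered by eID, this returns Mike 15, Josh 0 and Mike 14, matching the expected result in the question.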

Select rows where one column is within a day of another column

I have two tables from a site similar to SO: one with posts, and one with up/down votes for each post. I would like to select all votes cast on the day that a post was modified.
My table layout is as follows:
Posts:
-----------------------------------------------
| post_id | post_author | modification_date |
-----------------------------------------------
| 0 | David | 2012-02-25 05:37:34 |
| 1 | David | 2012-02-20 10:13:24 |
| 2 | Matt | 2012-03-27 09:34:33 |
| 3 | Peter | 2012-04-11 19:56:17 |
| ... | ... | ... |
-----------------------------------------------
Votes (each vote is only counted at the end of the day for anonymity):
-------------------------------------------
| vote_id | post_id | vote_date |
-------------------------------------------
| 0 | 0 | 2012-01-13 00:00:00 |
| 1 | 0 | 2012-02-26 00:00:00 |
| 2 | 0 | 2012-02-26 00:00:00 |
| 3 | 0 | 2012-04-12 00:00:00 |
| 4 | 1 | 2012-02-21 00:00:00 |
| ... | ... | ... |
-------------------------------------------
What I want to achieve:
-----------------------------------
| post_id | post_author | vote_id |
-----------------------------------
| 0 | David | 1 |
| 0 | David | 2 |
| 1 | David | 4 |
| ... | ... | ... |
-----------------------------------
I have been able to write the following, but it selects all votes on the day before the post modification, not on the same day (so, in this example, an empty table):
SELECT Posts.post_id, Posts.post_author, Votes.vote_id
FROM Posts
LEFT JOIN Votes ON Posts.post_id = Votes.post_id
WHERE CAST(Posts.modification_date AS DATE) = Votes.vote_date;
How can I fix it so the WHERE clause takes the day before Votes.vote_date? Or, if not possible, is there another way?
Depending on which database you are using (SQL Server, Oracle, etc.), you can usually just subtract 1 from a datetime to subtract exactly 1 day. To match each post with the votes stamped the day after its modification, compare the modification date with the day before vote_date:
Where Cast(Posts.modification_date as Date) = Votes.vote_date - 1
or, if modification_date is already a date:
Where Posts.modification_date = Votes.vote_date - 1
If you have a site similar to Stack Overflow, then perhaps you also use SQL Server:
SELECT p.post_id, p.post_author, v.vote_id
FROM Posts p LEFT JOIN
Votes v
ON p.post_id = v.post_id
WHERE CAST(DATEADD(day, 1, p.modification_date) AS DATE) = v.vote_date;
Different databases have different ways of subtracting one day. If this doesn't work, then your database has something similar.
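For PostgreSQL, a minimal sketch assuming both date columns are timestamp: cast both sides to date and add 1 to the modification day, since in PostgreSQL date + integer adds that many days. The table definitions and types are assumptions; the rows are the sample data from the question. This date arithmetic is PostgreSQL-specific.

```sql
-- Assumed schema; drop the child table first because of the foreign key.
DROP TABLE IF EXISTS votes;
DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
    post_id           integer PRIMARY KEY,
    post_author       text      NOT NULL,
    modification_date timestamp NOT NULL
);
CREATE TABLE votes (
    vote_id   integer PRIMARY KEY,
    post_id   integer   NOT NULL REFERENCES posts (post_id),
    vote_date timestamp NOT NULL   -- stamped at midnight after the vote was cast
);

INSERT INTO posts (post_id, post_author, modification_date) VALUES
    (0, 'David', '2012-02-25 05:37:34'),
    (1, 'David', '2012-02-20 10:13:24'),
    (2, 'Matt',  '2012-03-27 09:34:33'),
    (3, 'Peter', '2012-04-11 19:56:17');
INSERT INTO votes (vote_id, post_id, vote_date) VALUES
    (0, 0, '2012-01-13 00:00:00'),
    (1, 0, '2012-02-26 00:00:00'),
    (2, 0, '2012-02-26 00:00:00'),
    (3, 0, '2012-04-12 00:00:00'),
    (4, 1, '2012-02-21 00:00:00');

-- A vote stamped on day D + 1 was cast on day D, so match it to posts
-- modified on day D.
SELECT p.post_id, p.post_author, v.vote_id
FROM posts p
JOIN votes v ON v.post_id = p.post_id
WHERE CAST(p.modification_date AS date) + 1 = CAST(v.vote_date AS date)
ORDER BY v.vote_id;
```

With this data the query returns (0, David, 1), (0, David, 2) and (1, David, 4), the result the question asks for. An inner join is used because a WHERE condition on a votes column would discard the unmatched rows of a LEFT JOIN anyway.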
I found another solution, which is to add a day to Posts.modification_date:
...
WHERE CAST(CEILING(CAST(p.modification_date AS FLOAT)) AS datetime) = v.vote_date