SQL Select Rows where a value of a multiple value field occurs only once - sql

I have the following problem:
I have a table with several columns describing objects. Let's assume one of these columns can contain the values 1,2,3,4,5,6,7,8,9,10. Within this table, an object can have all of these values or only some of them, for example 1,3,5 (so 0 to n values).
Now I want to find all the objects containing only the values 1 and 2; I do not want them in my result set if they contain 1,2,3 or any combination other than (1,2).
How do I write this SQL statement?
Sample data (Result set to be expected --> Mark and Michael):
+---------+--------------------+---------------------------+--+
| OBJ | OBJ_CHARACTERISTIC | CHARACTERISTIC_DATE_ADDED | |
+---------+--------------------+---------------------------+--+
| Mark | 1 | 15.01.2018 | |
| Mark | 2 | 15.02.2018 | |
| Jimmy | 1 | 31.01.2018 | |
| Jimmy | 2 | 11.02.2018 | |
| Jimmy | 4 | 15.03.2018 | |
| Jimmy | 5 | 15.04.2018 | |
| Jimmy | 6 | 15.04.2018 | |
| Harry | 1 | 08.01.2018 | |
| Harry | 2 | 11.01.2018 | |
| Harry | 3 | 15.02.2018 | |
| Michael | 1 | 15.06.2018 | |
| Michael | 2 | 15.07.2018 | |
| Dwayne | 4 | 15.01.2018 | |
| Dwayne | 5 | 15.01.2018 | |
| Dwayne | 6 | 15.01.2018 | |
+---------+--------------------+---------------------------+--+

You could use analytic counts to see how many characteristics each object has, and how many of the ones you are looking for; and then compare those counts:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
count(distinct obj_characteristic) over (partition by obj) as c1,
count(distinct case when obj_characteristic in (1,2) then obj_characteristic end)
over (partition by obj) as c2
from your_table
)
where c1 = c2;
With your sample data that gives:
OBJ OBJ_CHARACTERISTIC CHARACTERI
------- ------------------ ----------
Mark 1 2018-01-15
Mark 2 2018-02-15
Michael 1 2018-06-15
Michael 2 2018-07-15
From the way the question is worded it sounds like you want the complete rows, as above; from a comment you may only want the names. If so, you can just change the outer select to:
select distinct obj
from ...
OBJ
-------
Mark
Michael
or use aggregates instead via a having clause:
select obj
from your_table
group by obj
having count(distinct obj_characteristic)
= count(distinct case when obj_characteristic in (1,2) then obj_characteristic end);
OBJ
-------
Mark
Michael
db<>fiddle demo of all three.
In this case, as 1 and 2 are contiguous, you could also do this with min/max, as an aggregate to just get the names:
select obj
from your_table
group by obj
having min (obj_characteristic) = 1
and max(obj_characteristic) = 2;
or analytically to get the complete rows:
select obj, obj_characteristic, characteristic_date_added
from (
select obj, obj_characteristic, characteristic_date_added,
min(obj_characteristic) over (partition by obj) as min_char,
max(obj_characteristic) over (partition by obj) as max_char
from your_table
)
where min_char = 1
and max_char = 2;
but the earlier versions are more generic.
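The having-clause version above can be tried out with a minimal sketch like the following. It assumes Python's built-in sqlite3 module as a stand-in for the asker's database, loads only the obj and obj_characteristic columns of the sample data, and uses a hypothetical helper name (objects_with_only) that is not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (obj, obj_characteristic).
ROWS = [
    ("Mark", 1), ("Mark", 2),
    ("Jimmy", 1), ("Jimmy", 2), ("Jimmy", 4), ("Jimmy", 5), ("Jimmy", 6),
    ("Harry", 1), ("Harry", 2), ("Harry", 3),
    ("Michael", 1), ("Michael", 2),
    ("Dwayne", 4), ("Dwayne", 5), ("Dwayne", 6),
]


def objects_with_only(wanted):
    """Return objects whose characteristics all fall inside `wanted` (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_table (obj text, obj_characteristic integer)")
    conn.executemany("insert into your_table values (?, ?)", ROWS)
    placeholders = ", ".join("?" for _ in wanted)
    # Compare the count of all distinct values with the count of wanted distinct values.
    sql = f"""
        select obj
        from your_table
        group by obj
        having count(distinct obj_characteristic)
             = count(distinct case when obj_characteristic in ({placeholders})
                                   then obj_characteristic end)
        order by obj
    """
    result = [row[0] for row in conn.execute(sql, list(wanted))]
    conn.close()
    return result


only_1_2 = objects_with_only([1, 2])
print(only_1_2)  # ['Mark', 'Michael']
```

Note that this comparison also accepts an object that has only one of the two values; if both must be present, adding a condition such as count(distinct obj_characteristic) = 2 to the having clause would enforce that.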

If you are just looking for SQL that returns rows containing the value '1,2' and nothing else, use:
select * from table where column like '%1,2'
Post an example of the data, it may be more helpful to understand.

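@dwin90 You could try:
SELECT obj
FROM your_table
WHERE (OBJ_CHARACTERISTIC = 1 OR OBJ_CHARACTERISTIC = 2)
AND OBJ_CHARACTERISTIC <= 2
GROUP BY obj
-- Caveat: this filters individual rows, so objects that also have other
-- values (Jimmy, Harry) are still returned.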

Related

More efficient way to SELECT rows from PARTITION BY

Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
*
FROM
(SELECT
id,
step_number,
MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
employee_id
FROM
table_name) AS t
WHERE
t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.
In Postgres, I would recommend using distinct on to address this greatest-n-per-group problem:
select distinct on (id) t.*
from table_name t
order by id, step_number
This Postgres extension to the SQL standard usually performs better than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes that (id, step_number) tuples are unique: otherwise, the results might differ from those of your query (which allows ties, while distinct on does not).
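The difference in tie handling can be illustrated with a small sketch. SQLite has no distinct on, so this example swaps in row_number() = 1 as a stand-in that likewise returns exactly one row per id, and compares it with the min() over approach from the question. The extra tied row and the employee_id tie-breaker are assumptions added purely for illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table_name (id integer, step_number integer, employee_id integer)"
)
conn.executemany(
    "insert into table_name values (?, ?, ?)",
    [
        (1, 1, 3), (1, 2, 3), (1, 3, 4),
        (2, 2, 3), (2, 3, 4), (2, 4, 5),
        (2, 2, 9),  # hypothetical extra row that ties on (id=2, step_number=2)
    ],
)

# The question's approach: every row matching the minimum is kept, ties included.
ties_kept = conn.execute("""
    select id, step_number, employee_id
    from (select t.*, min(step_number) over (partition by id) as min_step
          from table_name t)
    where step_number = min_step
    order by id, employee_id
""").fetchall()

# Stand-in for distinct on: exactly one row per id, ties broken by employee_id.
one_per_id = conn.execute("""
    select id, step_number, employee_id
    from (select t.*,
                 row_number() over (partition by id
                                    order by step_number, employee_id) as rn
          from table_name t)
    where rn = 1
    order by id
""").fetchall()
conn.close()

print(ties_kept)   # [(1, 1, 3), (2, 2, 3), (2, 2, 9)]
print(one_per_id)  # [(1, 1, 3), (2, 2, 3)]
```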

Aggregate function calls cannot be nested?

In a PostgreSQL database I have a table called answers. This table stores information about how users answered questions. There are only 4 questions in the table. At the same time, the number of users who answered the questions can vary, and a user may answer only some of the questions.
Table answers:
| EMPLOYEE | QUESTION_ID | QUESTION_TEXT | OPTION_ID | OPTION_TEXT |
|----------|-------------|------------------------|-----------|--------------|
| Bob | 1 | Do you like soup? | 1 | Yes |
| Alex | 1 | Do you like soup? | 2 | No |
| Kate | 1 | Do you like soup? | 3 | I don't know |
| Bob | 2 | Do you like ice cream? | 1 | Yes |
| Alex | 2 | Do you like ice cream? | 3 | I don't know |
| Oliver | 2 | Do you like ice cream? | 1 | Yes |
| Bob | 3 | Do you like summer? | 2 | No |
| Alex | 3 | Do you like summer? | 1 | Yes |
| Jack | 3 | Do you like summer? | 2 | No |
| Bob | 4 | Do you like winter? | 3 | I don't know |
| Alex | 4 | Do you like winter? | 1 | Yes |
| Oliver | 4 | Do you like winter? | 3 | I don't know |
For example, with the following code I can find the average of the answers to questions 1 and 2 for each person who answered them.
select
employee,
avg(
case when question_id in (1, 2) then option_id else null end
) as average_score
from
answers
group by
employee
Result:
| EMPLOYEE | AVERAGE_SCORE |
|----------|---------------|
| Bob | 1 |
| Alex | 2,5 |
| Kate | 3 |
| Oliver | 1 |
Now, I want to know the number of users whose average of the answers to questions 1 and 2 is >= 2. I tried the following code, but it raises an error:
select
count(
avg(
case when question_id in (1, 2) then option_id else null end
)
) as average_score
from
answers
where
average_score >= 2
group by
answers.employee
ERROR:
SQL Error [42803]: ERROR: aggregate function calls cannot be nested
You need to filter after aggregation. That uses a having clause. In Postgres, you can also use filter:
select employee,
avg(option_id) filter (where question_id in (1, 2)) as average_score
from answers
group by employee
having avg(option_id) filter (where question_id in (1, 2)) >= 2;
If you want the count, then use this as a subquery: select count(*) from <the above query>.
It is strange that you are equating "option_id" with "score", but that is how your question is phrased.
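The "filter after aggregation, then count in an outer query" idea can be sketched as follows. This assumes Python's sqlite3 in place of PostgreSQL, uses a case expression instead of filter so it runs on older SQLite versions, and loads only the employee, question_id and option_id columns of the sample table.

```python
import sqlite3

# (employee, question_id, option_id) taken from the sample table.
ANSWERS = [
    ("Bob", 1, 1), ("Alex", 1, 2), ("Kate", 1, 3),
    ("Bob", 2, 1), ("Alex", 2, 3), ("Oliver", 2, 1),
    ("Bob", 3, 2), ("Alex", 3, 1), ("Jack", 3, 2),
    ("Bob", 4, 3), ("Alex", 4, 1), ("Oliver", 4, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table answers (employee text, question_id integer, option_id integer)"
)
conn.executemany("insert into answers values (?, ?, ?)", ANSWERS)

# Inner query: one row per employee that passes the having filter.
# Outer query: count those rows, which avoids nesting aggregates.
(user_count,) = conn.execute("""
    select count(*)
    from (select employee
          from answers
          group by employee
          having avg(case when question_id in (1, 2) then option_id end) >= 2)
""").fetchone()
conn.close()

print(user_count)
```

Employees who answered neither question 1 nor question 2 get a NULL average, so the having condition drops them without any extra handling.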
You have to use a having clause. Note that Postgres does not allow the column alias in having, so the expression has to be repeated:
select employee,
avg(case when question_id in (1, 2) then option_id end) as average_score
from answers
group by employee
having avg(case when question_id in (1, 2) then option_id end) >= 2;

How to select values, where each one depends on a previously aggregated state?

I have the following table:
|-----|-----|
| id | val |
|-----|-----|
| 1 | 1 |
|-----|-----|
| 2 | 4 |
|-----|-----|
| 3 | 3 |
|-----|-----|
| 4 | 7 |
|-----|-----|
Can I get the following output:
|-----|
| sum |
|-----|
| 1 |
|-----|
| 5 |
|-----|
| 8 |
|-----|
| 15 |
|-----|
using a single SQLite3 SELECT-query? I know it could be easily achieved using variables, but SQLite3 lacks those. Maybe some recursive query? Thanks.
No.
In a relational database table rows do not have any order. If you specify an order for the rows, then it's possible to write a query.
Now, you could add an extra column to sort the rows. For example:
| val | sort
|-----|-----
| 1 | 10
| 4 | 20
| 3 | 30
| 7 | 40
The query could be:
select
sum(val) over(order by sort)
from my_table
For the updated question, you can write:
select
sum(val) over(order by id)
from my_table
If you want only the sum column, you can also use a correlated subquery that relies on the order of the id column:
select (select sum(val) from tablename where id <= t.id) sum
from tablename t
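Both running-sum variants can be checked against the sample data with a short sketch, assuming Python's sqlite3 module and the table name my_table. The window-function form needs SQLite 3.25 or later; the correlated subquery works on older versions too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (id integer primary key, val integer)")
conn.executemany(
    "insert into my_table values (?, ?)", [(1, 1), (2, 4), (3, 3), (4, 7)]
)

# Window function: cumulative sum in id order.
window_sums = [row[0] for row in conn.execute(
    "select sum(val) over (order by id) from my_table order by id"
)]

# Correlated subquery: for each row, sum every val with an id up to this one.
subquery_sums = [row[0] for row in conn.execute("""
    select (select sum(val) from my_table where id <= t.id) as running_sum
    from my_table t
    order by t.id
""")]
conn.close()

print(window_sums)    # [1, 5, 8, 15]
print(subquery_sums)  # [1, 5, 8, 15]
```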

Need PIVOT or CASE solution for converting columns to rows (NON-Dynamic if possible)

So I have a table that has data such as this:
| SCHD_ID | INST_ID |
|---------|---------|
| 1001 | Mike |
| 1001 | Ted |
| 1001 | Chris |
| 1002 | Jill |
| 1002 | Jamie |
| 1003 | Brad |
| 1003 | Carl |
| 1003 | Drew |
| 1003 | Nick |
I need to come up with a query to display the data like below:
|SCHD_ID | INST 1 | INST 2 | INST 3 |
|---------|--------|--------|--------|
| 1001 | Mike | Ted | Chris |
| 1002 | Jill | Jamie | Null |
| 1003 | Brad | Carl | Drew |
I have tried looking into all the pivot descriptions and some case examples, but everything seems to use a common repeated value to pivot around. This is one of those cases where the columns need to be dynamic, but only to a point: I can drop any data after the third instructor. In the example above I did not include a column for INST 4 for SCHD_ID 1003, even though it exists in my example data set. Does adding a constraint like this make it possible to come up with a non-dynamic solution using pivot or case?
Thanks for the help,
Dwayne
You can do this using row_number() and conditional aggregation. However, your data doesn't have an ordering column, so you cannot guarantee which three instructors you will get:
select schd_id,
max(case when seqnum = 1 then inst_id end) as inst1,
max(case when seqnum = 2 then inst_id end) as inst2,
max(case when seqnum = 3 then inst_id end) as inst3
from (select t.*,
row_number() over (partition by schd_id order by schd_id) as seqnum
from table t
) t
group by SCHD_ID ;
If you have a priority ordering for choosing the instructors, then put the logic in the order by clause.
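As a rough sketch of the row_number() plus conditional aggregation approach, the following runs the same query shape in SQLite through Python's sqlite3 module. The table name schedule is hypothetical, and SQLite's implicit rowid (insertion order) is used here only as a stand-in for a real priority column, which the original data does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table schedule (schd_id integer, inst_id text)")  # hypothetical name
conn.executemany(
    "insert into schedule values (?, ?)",
    [
        (1001, "Mike"), (1001, "Ted"), (1001, "Chris"),
        (1002, "Jill"), (1002, "Jamie"),
        (1003, "Brad"), (1003, "Carl"), (1003, "Drew"), (1003, "Nick"),
    ],
)

# Number the instructors per schedule, then spread positions 1..3 into columns.
# Ordering by rowid is an assumption standing in for a real priority column.
pivoted = conn.execute("""
    select schd_id,
           max(case when seqnum = 1 then inst_id end) as inst1,
           max(case when seqnum = 2 then inst_id end) as inst2,
           max(case when seqnum = 3 then inst_id end) as inst3
    from (select s.schd_id, s.inst_id,
                 row_number() over (partition by s.schd_id order by s.rowid) as seqnum
          from schedule s)
    group by schd_id
    order by schd_id
""").fetchall()
conn.close()

for row in pivoted:
    print(row)
```

Any instructor numbered 4 or higher (Nick in the sample) matches none of the case expressions and is simply dropped, which is what makes the fixed three-column layout possible.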

selecting data with highest field value in a field

I have a table, and I'd like to select rows with the highest value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How may I do so? I assume it can be done by some oracle function I am not aware of?
Thanks in advance :-)
You can use the MAX() function for that, grouping by the user column, like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
Result:
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
See this SQLFiddle
If you have more than one column to return, you can use row_number() instead:
select "user", "index"
from (
select u.*, row_number() over (partition by "user" order by "index" desc) as rnk
from some_table u)
where rnk = 1
user is a reserved word - you should use a different name for the column.
select "user", max("index") as "index" from tbl
group by "user";
Alternatively, you can use analytic functions:
select "user", "index", max("index") over (partition by "user") as highest from YOURTABLE
Note: Try not to use words like user, index or date as column names, as they are reserved words in Oracle. If you do use them, put them in double quotation marks, e.g. "index", "date".
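The group by and row_number() variants can be compared on the sample data with a minimal sketch. It assumes Python's sqlite3 module rather than Oracle, reuses the Table1 name from the first answer, and keeps the column names quoted as recommended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Quoted identifiers because "index" is a keyword (and "user" is reserved in Oracle).
conn.execute('create table Table1 ("user" integer, "index" integer)')
conn.executemany(
    "insert into Table1 values (?, ?)",
    [(1, 1), (2, 1), (2, 2), (3, 4), (3, 7), (4, 1), (5, 1)],
)

# Aggregate version: one row per user with the highest index.
grouped = conn.execute("""
    select "user", max("index") as "index"
    from Table1
    group by "user"
    order by "user"
""").fetchall()

# Analytic version: keeps the whole row that ranks first within each user.
ranked = conn.execute("""
    select "user", "index"
    from (select t.*,
                 row_number() over (partition by "user" order by "index" desc) as rnk
          from Table1 t)
    where rnk = 1
    order by "user"
""").fetchall()
conn.close()

print(grouped)  # [(1, 1), (2, 2), (3, 7), (4, 1), (5, 1)]
print(ranked)   # [(1, 1), (2, 2), (3, 7), (4, 1), (5, 1)]
```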