Oracle SQL -- What's wrong with this grouping?

I am trying to grab a row that has the max of some column. Normally I'd use Rank for this and just select rank = 1 but that seems pointless when I know I just need the max of a column. Here is my SQL:
SELECT
name,
value,
MAX(version)
FROM
my_table t
WHERE
person_type = 'STUDENT'
GROUP by NAME,VALUE
HAVING version = max(version)
This returns the "you've done something wrong involving grouping" error, i.e. "not a GROUP BY expression", when I try to run it. If I add version to the GROUP BY clause, the SQL runs, but it obviously returns all rows instead of just the max version of each.
So my question is mostly "Why doesn't this work?" I am selecting the max of version so I don't see why I need to group by it. I know there are other solutions (partition over, rank ...) but I am more interested in why this in particular is flawed syntactically.
EDIT: More explicit about the use of this having clause.
Let's say there are these two rows in table t:
NAME VALUE VERSION
JEREMY C 1
JEREMY A 2
What is returned from this query should be:
JEREMY A 2
But if I remove having then I would get:
JEREMY A 2
JEREMY C 2

The HAVING clause, in general, can only reference what the GROUP BY produces: the grouping columns and aggregated values. In fact, you can think of the HAVING clause as a WHERE applied to the result of the GROUP BY.
That is, the query:
select <whatever>
from t
group by <whatever>
having <some condition>
is equivalent to:
select <whatever>
from (select <whatever>
from t
group by <whatever>
) t
where <some condition>
If you think about it this way, you'll realize that max(version) makes sense because it is an aggregated value. However, "version" does not make sense, since it is neither a calculated value nor a group by column.
You seem to know how to fix this. One other comment: some databases (notably MySQL) would accept your syntax. They treat "HAVING version = max(version)" as "HAVING any(version) = max(version)".
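To make the "HAVING is a WHERE on the grouped result" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Oracle, purely so it can be run anywhere, and the extra SARAH row and the `version > 1` threshold are made up for illustration.

```python
import sqlite3

# Hypothetical sample data, loosely based on the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("JEREMY", "C", 1), ("JEREMY", "A", 2), ("SARAH", "D", 1)],
)

# Filter on an aggregate with HAVING.
with_having = conn.execute(
    """
    SELECT name, MAX(version) AS max_version
    FROM t
    GROUP BY name
    HAVING MAX(version) > 1
    ORDER BY name
    """
).fetchall()

# The same filter written as a WHERE over the grouped subquery.
with_where = conn.execute(
    """
    SELECT name, max_version
    FROM (SELECT name, MAX(version) AS max_version
          FROM t
          GROUP BY name) g
    WHERE max_version > 1
    ORDER BY name
    """
).fetchall()

print(with_having)
print(with_where)
```

In the subquery form the outer WHERE can only see name and max_version. A bare version column simply is not there any more, which is the same reason Oracle rejects it in HAVING.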

This SQL statement fails because the HAVING clause runs after the GROUP BY, so it can only operate on aggregates or on columns that are listed in the GROUP BY clause. If you have only grouped by NAME and VALUE, VERSION alone has no meaning at that point: it may have many values for each combination of NAME and VALUE, so it doesn't make sense to compare it to MAX(version) or any other aggregate, which has exactly one value for every NAME and VALUE pair.

You're trying to use version in your HAVING clause, but it's not being grouped by.
If all you want is the name, value and max version, you don't need the HAVING clause at all.
SELECT
name,
value,
MAX(version)
FROM
my_table t
WHERE
person_type = 'STUDENT'
GROUP by NAME,VALUE
The HAVING clause is for when you want to have a "Where" clause after aggregation, like
HAVING max(version) > 5
EDIT:
Based on your sample data, you're grouping by VALUE but what you really want to do is identify the VALUE that has the MAX(VERSION) for each NAME.
To do this, you need to use a WHERE EXISTS or self join, like so:
select name, value, version from t
where exists
(
select 1 from
(select name, max(version) version
from t
group by name) s
where s.name = t.name and s.version = t.version
)
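As a quick check of the EXISTS approach, here is a small sketch run against SQLite through Python's sqlite3 module (an assumption made only so the example is runnable; the query text itself is the one above). The SARAH rows are borrowed from the sample execution in another answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("JEREMY", "C", 1),
        ("JEREMY", "A", 2),
        ("SARAH", "D", 2),
        ("SARAH", "X", 1),
    ],
)

# Keep only the rows whose version equals the max version for that name.
latest = conn.execute(
    """
    SELECT name, value, version
    FROM t
    WHERE EXISTS (
        SELECT 1
        FROM (SELECT name, MAX(version) AS version
              FROM t
              GROUP BY name) s
        WHERE s.name = t.name AND s.version = t.version
    )
    ORDER BY name
    """
).fetchall()

print(latest)
```

Each name comes back once, with the value that belongs to its highest version.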

Another way of getting what you want:
select *
from (select name
, value
, version
, max(version) over
(partition by name) as max_version
from t)
where version = max_version;
Sample execution:
SQL> create table t (name varchar2(30)
2 , value varchar2(1)
3 , version number not null
4 , constraint t_pk primary key (name, version));
Table created.
SQL> insert into t select 'JEREMY', 'C', 1 from dual
2 union all select 'JEREMY', 'A', 2 from dual
3 union all select 'SARAH', 'D', 2 from dual
4 union all select 'SARAH', 'X', 1 from dual;
4 rows created.
SQL> commit;
Commit complete.
SQL> select name, value, version
2 from (select name
3 , value
4 , version
5 , max(version) over
6 (partition by name) as max_version
7 from t)
8 where version = max_version;
NAME                           V    VERSION
------------------------------ - ----------
JEREMY                         A          2
SARAH                          D          2

Related

Remove duplicate id with different description in sql

Hi, I have data where the id is duplicated but the description is different:
id name
1 A
1 B
How can I remove the duplicates? Using DISTINCT still returns all the rows.
It's not really clear from your question how you want to remove the duplicates, given that each id has different metadata attached to it.
Do you just want to de-duplicate the id as a single column, or do you want to merge its metadata so that only one row per id remains?
The simplest option is:
select distinct id from ...
But it looks like that is not what you meant.
So the second option is to merge the metadata into an array with array_agg:
select id, array_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This will remove the id duplication and get all metadata into an array.
If you are okay with string_agg you can go with that too:
select id, string_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This gives you the names as comma-separated values.
If you want something fancier than this, you can create a struct for your metadata (assuming you have more metadata in your real project):
select id, array_agg(struct(name, info)) as metadata
from (select 1 as id, 'A' as name, 'X' as info union all select 1 as id, 'B' as name, 'Y' as info)
group by 1
This gives you one row per id, with an array of structs holding its metadata.
All other options lose some data, for example using min(name) or max(name) to keep only one row per id.
If you could clarify your question a bit better, the community can help you more. For now, I see the above options for you.
Consider the simple approach below:
select any_value(t).*
from your_table t
group by t.id
If you have an extra column that identifies the order of entries, for example ts (a timestamp), you could use the following:
select any_value(t having min ts).*
from your_table t
group by t.id
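For readers who want to try the two main ideas above (merge the descriptions, or keep one row per id) without the engine used in these answers, here is a rough sketch in SQLite via Python's sqlite3 module. Note that it swaps in SQLite's group_concat for string_agg and MIN(name) for any_value, and the extra row with id 2 is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "C")],
)

# Option 1: keep every description, merged into one string per id.
merged = conn.execute(
    """
    SELECT id, group_concat(name) AS names
    FROM your_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

# Option 2: keep a single row per id and accept that the other names are lost.
one_per_id = conn.execute(
    """
    SELECT id, MIN(name) AS name
    FROM your_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(merged)
print(one_per_id)
```

The order of the names inside the merged string is not guaranteed, so do not rely on it.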

Issue using not exist in SQL

NOT EXISTS is not working.
I have a query which fetches 10k rows. There are 237 rows which I do not want in my final result, but when I use NOT EXISTS it still fetches the same number of rows, that is 10k. I have used the following query:
Select bu_name,
person_num,
name,
f_config_id,
ass_the
from x_asig_table
where not exists (select 1
from
(select XXH.x_asig_table.*,
count(*) over (partition by bu_name, person_num, name) as c
from XXH.x_asig_table) t
where c > 1);
The sub-query is not correlated with the main query, i.e. it doesn't matter what row you look at in the main query, the subquery will always give you the same result. So either you get all rows or none. It is not possible with this query to get some rows and others not.
Add criteria to your subquery that relates it to the main query to solve the problem.
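The difference between an uncorrelated and a correlated NOT EXISTS can be seen on a tiny data set. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than Oracle, the three rows are made up, and the ass_the column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x_asig_table "
    "(bu_name TEXT, person_num INTEGER, name TEXT, f_config_id INTEGER)"
)
conn.executemany(
    "INSERT INTO x_asig_table VALUES (?, ?, ?, ?)",
    [("BU1", 1, "ANN", 10), ("BU1", 1, "ANN", 11), ("BU2", 2, "BOB", 20)],
)

# Uncorrelated: the subquery gives the same answer for every outer row,
# so the outer query returns either all rows or none.
uncorrelated = conn.execute(
    """
    SELECT bu_name, person_num, name, f_config_id
    FROM x_asig_table
    WHERE NOT EXISTS (
        SELECT 1
        FROM x_asig_table
        GROUP BY bu_name, person_num, name
        HAVING COUNT(*) > 1
    )
    """
).fetchall()

# Correlated: the subquery is evaluated against the current outer row.
correlated = conn.execute(
    """
    SELECT bu_name, person_num, name, f_config_id
    FROM x_asig_table x
    WHERE NOT EXISTS (
        SELECT 1
        FROM x_asig_table y
        WHERE y.bu_name = x.bu_name
          AND y.person_num = x.person_num
          AND y.name = x.name
        GROUP BY y.bu_name, y.person_num, y.name
        HAVING COUNT(*) > 1
    )
    """
).fetchall()

print(uncorrelated)
print(correlated)
```

With this data the uncorrelated form returns nothing at all, because a duplicated group exists somewhere in the table, while the correlated form drops only the duplicated rows.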
You need to connect back to the outer query. Something like this (also simplified your query, untested, but should work):
Select bu_name,
person_num,
name,
f_config_id,
ass_the
from x_asig_table X
where not exists (
SELECT NULL
FROM x_asig_table Y
WHERE X.bu_name = Y.bu_name
AND X.person_num = Y.person_num
AND X.name = Y.name
GROUP BY Y.bu_name, Y.person_num, Y.name
HAVING COUNT(1) > 1
)
)
You appear to be trying to find only those rows where there is a single row per combination of bu_name, person_num and name (although the question does not make your intent very clear). If so, then you can do it without using EXISTS like this:
SELECT bu_name,
person_num,
name,
f_config_id,
ass_the
FROM (
SELECT bu_name,
person_num,
name,
f_config_id,
ass_the,
COUNT(1) OVER ( PARTITION BY bu_name, person_num, name ) AS cnt
FROM x_asig_table
)
WHERE cnt = 1;

duplicate values in a row using pl/sql

I have this query:
SELECT distinct
num as number,
name as name
from my_table_name
where number = '12345';
And this is the results:
number - name
1. 12345 - mike
2. 12345 - charlie
3. 12345 - jose
I need a new query that, when this happens (a number is duplicated or triplicated), shows me only one of the rows. Example:
number - name
12345 - mike
I only need one of them; the position doesn't matter. If it finds one, it should print it and close the procedure, function or cursor.
Distinct is going to return results that are distinct, relative to all of the data you are querying for. If you only want one of the results returned and you know that the result used is arbitrary, you can just add a filter based on the row number (how specifically this is done depends on what DBMS you are using.)
Oracle example:
select num as "number",
name as "name"
from my_table_name
where num = '12345'
and rownum = 1; -- just gets the first row.
SELECT * from
(SELECT rownum rnum,
num as "number",
name as "name"
FROM my_table_name
WHERE num = '12345' )
WHERE rnum = 1
Use the ROW_NUMBER analytic function:
SELECT *
FROM
(
select num, name, ROW_NUMBER() OVER ( partition by num order by name asc) as seq
from my_table_name
where num = '12345'
) T
where T.seq = 1
If you don't care which one is being returned, why are you even asking for it to be returned?
However, to get a single line of results regardless of the number of matching rows, you should probably be using GROUP BY and a summary function:
Select
num as "number",
max(name) as name --or min(), or any other summary function that works on this data type
from my_table_name
where num = '12345'
group by num
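Here is a small runnable sketch of the "one row per number" idea. It uses SQLite through Python's sqlite3 module instead of Oracle, so LIMIT 1 stands in for the rownum = 1 filter shown earlier; the table contents are the three rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table_name (num TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO my_table_name VALUES (?, ?)",
    [("12345", "mike"), ("12345", "charlie"), ("12345", "jose")],
)

# One row per num, picking a deterministic name with an aggregate.
grouped = conn.execute(
    """
    SELECT num, MAX(name) AS name
    FROM my_table_name
    WHERE num = '12345'
    GROUP BY num
    """
).fetchall()

# One arbitrary row: LIMIT 1 is SQLite's counterpart of Oracle's rownum = 1.
limited = conn.execute(
    """
    SELECT num, name
    FROM my_table_name
    WHERE num = '12345'
    LIMIT 1
    """
).fetchall()

print(grouped)
print(limited)
```

The aggregate version always returns the same name for the same data, while the LIMIT version returns whichever row the engine reaches first.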

SQL Separating Distinct Values using single column

Does anyone happen to know a way of basically taking the 'Distinct' command but only using it on a single column. For lack of example, something similar to this:
Select (Distinct ID), Name, Term from Table
So it would get rid of rows with duplicate IDs but still use the other columns' information. I would use DISTINCT on the full query, but the rows are all different because of the data in certain columns. And I would need to output only the topmost term between the two duplicates:
ID Name Term
1 Suzy A
1 Suzy B
2 John A
2 John B
3 Pete A
4 Carl A
5 Sally B
Any suggestions would be helpful.
select ID, min(Name) as Name, min(Term) as Term
from Table
group by ID
You can use ROW_NUMBER for this:
Select ID, Name, Term from (
Select ID, Name, Term, ROW_NUMBER ( )
OVER ( PARTITION BY ID order by Term) as rn from Table
) as tbl
Where rn = 1
Order by determines the order from which the first row will be picked.
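The ROW_NUMBER pattern can be tried on the sample data from the question with the sketch below. It assumes SQLite 3.25 or newer (for window function support) via Python's sqlite3 module, and it uses a hypothetical table name, students, because Table is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (ID INTEGER, Name TEXT, Term TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (1, "Suzy", "A"), (1, "Suzy", "B"),
        (2, "John", "A"), (2, "John", "B"),
        (3, "Pete", "A"), (4, "Carl", "A"), (5, "Sally", "B"),
    ],
)

# Number the rows within each ID by Term, then keep only the first one.
# The rn = 1 filter has to sit outside the subquery that computes rn.
first_term = conn.execute(
    """
    SELECT ID, Name, Term
    FROM (
        SELECT ID, Name, Term,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Term) AS rn
        FROM students
    ) AS tbl
    WHERE rn = 1
    ORDER BY ID
    """
).fetchall()

for row in first_term:
    print(row)
```

Changing the ORDER BY inside the OVER clause (for example to Term DESC) changes which row of each ID is kept.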

Number of times one row column equals another row's other column in SQL

The confusing question is best asked through an example. Say we have the following result set:
What I want to do is count how many times one number appears from both columns.
So the returning data set might look like:
ID Counted
0 4
1 2
9 1
13 1
My original thought was to do some sort of addition between the counts on both IDs, but I'm not exactly sure how to GROUP them in SQL in a way that is working.
You can do this with a subquery, GROUP BY, and a UNION ALL, like this:
SELECT ID, COUNT(*) AS Counted
FROM(
SELECT ID1 AS ID FROM MyTable
UNION ALL
SELECT ID2 AS ID FROM MyTable
) source
GROUP BY ID
ORDER BY ID ASC
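A runnable sketch of the UNION ALL approach follows, using SQLite through Python's sqlite3 module. The original result set is not shown in the question, so the four rows below are invented; they were chosen only so that the counts match the expected output listed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID1 INTEGER, ID2 INTEGER)")
# Made-up rows: the question's actual data is not available.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(0, 1), (0, 1), (0, 9), (0, 13)],
)

# Stack both columns into one, then count how often each value appears.
counts = conn.execute(
    """
    SELECT ID, COUNT(*) AS Counted
    FROM (
        SELECT ID1 AS ID FROM MyTable
        UNION ALL
        SELECT ID2 AS ID FROM MyTable
    ) source
    GROUP BY ID
    ORDER BY ID ASC
    """
).fetchall()

for row in counts:
    print(row)
```

UNION ALL matters here: a plain UNION would remove duplicates before counting, and every count would come out as 1.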