DB2 Query - eliminate maxvalues - sql

I have the following problem (simplified):
I have a table that contains animals, e.g:
ID Type Birthday
1 Dog 1.1.2011
2 Cat 2.1.2009
3 Horse 5.1.2009
4 Cat 10.6.1999
5 Horse 9.3.2006
I know that all the animals belong to one "family". From each family I now want to see all the offspring, but I do not want to see the entry for the "founder of the family".
So for the simple sample above I just want to see this:
ID Type Birthday
2 Cat 2.1.2009
3 Horse 5.1.2009
So far I haven't been able to find a way of grouping the entries and then removing the first entry from each group. I was only able to find how to remove specific lines.
Is it even possible to solve this problem?
Thank you very much for your help. It is much appreciated.

A simple SQL query (not necessarily the most efficient) could be:
select
animals.id, animals.type, animals.birthday
from animals
left join
(select type, min(birthday) min_birthday
from animals
group by type) a
on a.type=animals.type and a.min_birthday = animals.birthday
where a.type is null;
For best efficiency you can use an analytical function:
select id, type, birthday
from(
select
id,
type,
birthday,
row_number() over (partition by type order by birthday) as rnk
from animals
) a
where rnk >=2
For more examples with analytical functions, you can read this article
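To check the two queries above against the sample data, here is a minimal sketch using Python's built-in sqlite3 module instead of DB2. The in-memory table, the ISO date strings (assuming the question's dates are D.M.YYYY) and the added order by id are assumptions for illustration, and the window function needs SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory copy of the sample table. Birthdays are stored as
# ISO strings so that plain string comparison sorts them chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("create table animals (id integer, type text, birthday text)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [
        (1, "Dog", "2011-01-01"),
        (2, "Cat", "2009-01-02"),
        (3, "Horse", "2009-01-05"),
        (4, "Cat", "1999-06-10"),
        (5, "Horse", "2006-03-09"),
    ],
)

# Anti-join: keep rows that do NOT match the earliest birthday of their type.
anti_join = conn.execute(
    """
    select animals.id, animals.type, animals.birthday
    from animals
    left join (select type, min(birthday) as min_birthday
               from animals
               group by type) a
      on a.type = animals.type and a.min_birthday = animals.birthday
    where a.type is null
    order by animals.id
    """
).fetchall()

# Window function: number the rows per type by birthday, drop number 1.
ranked = conn.execute(
    """
    select id, type, birthday
    from (select id, type, birthday,
                 row_number() over (partition by type order by birthday) as rnk
          from animals) a
    where rnk >= 2
    order by id
    """
).fetchall()

print(anti_join)
print(ranked)
```

Both lists contain only the rows for ids 2 and 3, which matches the expected output in the question.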

In SQL Server you can do:
select
id, type, birthday
from (
select
id, type, birthday,
row_number() over (partition by type order by birthday asc) r
from
animals
) q
where r > 1
The row_number() function is rumoured to work in DB2 as well, but I don't know under which circumstances/versions.

The exists variant:
select id, type, birthday
from animals a
where exists (select null from animals e
where e.type = a.type and e.birthday < a.birthday)
(Edited, following comments.)
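One thing worth checking is how the exists variant behaves when two animals share the earliest birthday. The sketch below uses Python's sqlite3 module with made-up tie data (not from the question). The row_number() query has an extra id tie-breaker added so its result is deterministic, and it needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table animals (id integer, type text, birthday text)")
conn.executemany(
    "insert into animals values (?, ?, ?)",
    [
        (1, "Cat", "1999-06-10"),
        (2, "Cat", "1999-06-10"),  # hypothetical tie for "founder"
        (3, "Cat", "2009-01-02"),
    ],
)

# exists variant: a row survives only if a strictly older animal of the
# same type exists, so BOTH tied founders are removed.
exists_ids = [
    row[0]
    for row in conn.execute(
        """
        select id from animals a
        where exists (select null from animals e
                      where e.type = a.type and e.birthday < a.birthday)
        order by id
        """
    )
]

# row_number variant: exactly one row per type gets number 1, so only one
# of the tied founders is removed (id is an added tie-breaker).
row_number_ids = [
    row[0]
    for row in conn.execute(
        """
        select id from (
            select id,
                   row_number() over (partition by type
                                      order by birthday, id) as r
            from animals) q
        where r > 1
        order by id
        """
    )
]

print(exists_ids, row_number_ids)
```

Which behaviour is right depends on whether tied animals should all count as founders.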

Related

Remove duplicate id with different description in sql

Hi, I have data with duplicated ids but different descriptions:
id name
1 A
1 B
How do I remove the duplicates? Using DISTINCT still returns all the rows.
It's not really clear from your question how you want to remove the duplicates, given that each id has different metadata attached to it.
Do you just want to de-duplicate the id as a single column, or do you wish to merge its metadata together so that only one row per id remains?
The simplest one is:
select distinct id from ...
But it looks like that's not what you meant.
So the second option is, you merge the metadata into an array_agg:
select id, array_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This will remove the id duplication and get all metadata into an array.
If you are okay with string_agg you can go with that too:
select id, string_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This gives you the names as a single comma-separated string per id.
If you want something fancier, you can create a struct for your metadata (assuming you have more metadata in your real project):
select id, array_agg(struct(name, info)) as metadata
from (select 1 as id, 'A' as name, 'X' as info union all select 1 as id, 'B' as name, 'Y' as info)
group by 1
This gives one row per id with an array of (name, info) structs.
All other options lose some data, for example using min(name) or max(name) to keep only one row per id.
If you can clarify your question, the community can help you more. For now, these are the options I see.
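To make the "merge the metadata" option concrete, here is a small sketch with Python's sqlite3 module. SQLite has no array_agg or string_agg in older versions, so group_concat is used as a stand-in. The table name items and the extra row for id 2 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer, name text, info text)")
conn.executemany(
    "insert into items values (?, ?, ?)",
    [(1, "A", "X"), (1, "B", "Y"), (2, "C", "Z")],  # id 2 is hypothetical
)

# group_concat plays the role of string_agg: one row per id, with all
# names joined by commas.
rows = conn.execute(
    "select id, group_concat(name) as names from items group by id order by id"
).fetchall()

# The order of names inside each group is not guaranteed, so split and
# sort them before comparing or printing.
merged = {row_id: sorted(names.split(",")) for row_id, names in rows}
print(merged)
```

Each id appears once and no name is lost, which is the point of aggregating instead of picking min(name) or max(name).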
Consider the simple approach below:
select any_value(t).*
from your_table t
group by t.id
If you have an extra column that identifies the order of the entries, for example ts (timestamp), you could use the following:
select any_value(t having min ts).*
from your_table t
group by t.id
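The any_value(... having min ts) syntax is specific to the SQL dialect used above. As a rough portable equivalent, a correlated subquery can keep the row with the smallest ts per id. The sketch below runs it with Python's sqlite3 module. The column values are made up, and unlike any_value it returns several rows for an id if the minimum ts is tied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (id integer, name text, ts integer)")
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [(1, "A", 20), (1, "B", 10), (2, "C", 5)],  # hypothetical sample rows
)

# Keep, for every id, the row whose ts equals the minimum ts of that id.
kept = conn.execute(
    """
    select t.id, t.name, t.ts
    from your_table t
    where t.ts = (select min(t2.ts) from your_table t2 where t2.id = t.id)
    order by t.id
    """
).fetchall()

print(kept)
```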

sql query against redshift data

I have the following table in redshift:
ID, Category
flinch, cat
flinch, cat
mara, dog
mara, cat
The aim here is to get, for each ID, the number of occurrences of each category. Hence the expected result would be:
ID,Category,size
flinch, cat, 2
mara, dog, 1
mara, cat, 1
I tried several queries but got this exception: ERROR: could not identify an ordering operator for type record. Hint: Use an explicit ordering operator or modify the query.
You seem to want group by:
select id, category, count(*) as size
from t
group by id, category;
Did you try this one?
select id, category, count(*)
from mytable
group by id, category
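A quick way to confirm that a plain group by produces the expected result is to run it on the sample rows. This sketch uses Python's sqlite3 module rather than Redshift. The table name t follows the first answer, and the order by is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, category text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("flinch", "cat"), ("flinch", "cat"), ("mara", "dog"), ("mara", "cat")],
)

# One output row per distinct (id, category) pair, with its row count.
counts = conn.execute(
    """
    select id, category, count(*) as size
    from t
    group by id, category
    order by id, category
    """
).fetchall()

print(counts)
```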

Select entry of each group having exactly 1 entry

I am looking for an optimized query.
Let me show you a small example.
Let's suppose I have a table with three fields: studentId, teacherId and subject.
Now I want the rows in which a physics teacher is teaching only one student, i.e.
teacher 300 is only teaching student 3, and so on.
What I have tried so far:
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query works fine, but I want a different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number(), but without success.
FYI:
This is only an example, not the actual table I am working with. My table contains a huge number of rows and columns, and the where clause is also very complex (date comparisons etc.), so I don't want to repeat the same where clause in the subquery and the outer query.
You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by tid) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use an exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)
Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1
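The question's sample table was not preserved, so the sketch below uses made-up rows (apart from teacher 300 teaching only student 3) to try the window-function idea with Python's sqlite3 module. It puts the physics filter once in the inner query, which is an assumption about what the asker wants, and compares the result with a single group by ... having pass. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tabletesting (sid integer, tid integer, subject text)")
conn.executemany(
    "insert into tabletesting values (?, ?, ?)",
    [
        (1, 100, "physics"),
        (2, 100, "physics"),    # teacher 100 has two physics students
        (3, 300, "physics"),    # teacher 300 teaches only student 3
        (4, 400, "chemistry"),  # not a physics row at all
        (5, 500, "physics"),
        (6, 500, "chemistry"),  # teacher 500: one physics student
    ],
)

# Window version: the where clause appears only once, in the inner query.
windowed = conn.execute(
    """
    select sid, tid
    from (select sid, tid, count(*) over (partition by tid) as scnt
          from tabletesting
          where subject = 'physics') tt
    where scnt = 1
    order by tid
    """
).fetchall()

# Aggregate version: a group with exactly one row has min(sid) = its sid.
grouped = conn.execute(
    """
    select min(sid) as sid, tid
    from tabletesting
    where subject = 'physics'
    group by tid
    having count(*) = 1
    order by tid
    """
).fetchall()

print(windowed)
print(grouped)
```

Both queries read the table once and state the filter once, which seems to be what the question asks for.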

plsql/sql is it possible to limit the rows returned from a join, or do some kind of subquery on a per-row basis?

I am doing a query that finds all cats born between two dates. Each cat has a name, or multiple names.
The initial query is something like
SELECT Id, Color FROM Cat WHERE Cat.BirthDate > dat_min AND Cat.BirthDate < dat_max;
I also have a table called CatName which for each Cat Id, has one or more names that this cat has been given by its different owners. I only want to return the first name that matches the Id in the CatName table, as part of the query. So something like:
SELECT Id, Color, Name FROM Cat JOIN CatName on .....
for a cat that has 5 names, will return 5 rows. I only want one row, the first one. If I was only retrieving data for one cat, then I would just use ROWNUM to limit it to 1 query, but I am trying to get a list of all cats including their name, so I can't do this.
Can anyone offer some guidance? I guess it doesn't have to be plsql specific, the technique will be the same I imagine.
There are several methods.
You could use an inline query:
SELECT id, color,
(SELECT name FROM CatName cn WHERE cn.id = c.id AND ROWNUM = 1) Name
FROM cat c
WHERE ...
You could use a join then analytics:
SELECT id, color, Name
FROM (SELECT Id, Color, Name,
row_number() OVER (PARTITION BY id ORDER BY 1) rn
FROM Cat JOIN CatName on .....)
WHERE rn = 1
You can use an aggregate:
SELECT id, color, MAX(name) name
FROM Cat JOIN CatName on .....
GROUP BY id, color
From a performance point of view, assuming that CatName is indexed by CatId:
if the number of cats returned is smallish, or you only want the very first few cats among many, solution 1 can be really fast,
if the dataset returned is large and you want all cats, then solutions 2 and 3 can make good use of the efficient HASH JOIN.
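As a rough illustration of solutions 1 and 3, here is a sketch using Python's sqlite3 module instead of Oracle. The CatName(CatId, Name) layout, the sample rows and the date bounds are assumptions, and limit 1 with an explicit order by stands in for ROWNUM = 1 so that both queries pick the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Cat (Id integer, Color text, BirthDate text)")
conn.execute("create table CatName (CatId integer, Name text)")
conn.executemany(
    "insert into Cat values (?, ?, ?)",
    [(1, "black", "2010-05-01"), (2, "white", "2010-06-01"),
     (3, "grey", "2015-01-01")],
)
conn.executemany(
    "insert into CatName values (?, ?)",
    [(1, "Tom"), (1, "Felix"), (1, "Max"), (2, "Snow"), (3, "Ash")],
)
bounds = ("2010-01-01", "2011-01-01")  # hypothetical dat_min, dat_max

# Solution 1: scalar subquery that returns a single name per cat.
inline = conn.execute(
    """
    select c.Id, c.Color,
           (select cn.Name from CatName cn
            where cn.CatId = c.Id
            order by cn.Name desc limit 1) as Name
    from Cat c
    where c.BirthDate > ? and c.BirthDate < ?
    order by c.Id
    """,
    bounds,
).fetchall()

# Solution 3: join, then collapse the names with an aggregate.
aggregated = conn.execute(
    """
    select c.Id, c.Color, max(cn.Name) as Name
    from Cat c join CatName cn on cn.CatId = c.Id
    where c.BirthDate > ? and c.BirthDate < ?
    group by c.Id, c.Color
    order by c.Id
    """,
    bounds,
).fetchall()

print(inline)
print(aggregated)
```

Either way each cat in the date range comes back exactly once, even though cat 1 has three names.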

How to find the highest populated instance in a column in SQL

So I have a table (person), that contains columns such as persons name, age, eye-color, favorite movie.
How do I find the most popular eye color(s), returning just the eye color (not the count), using SQL (Microsoft Access), without using TOP, as there might be multiple colours with the same count?
Thank you
SELECT
EyeColor
FROM
Person
GROUP BY
EyeColor
HAVING
COUNT(*) = (
SELECT MAX(i.EyeColorCount) FROM (
SELECT COUNT(*) AS EyeColorCount FROM Person GROUP BY EyeColor
) AS i
)
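To see how the query above handles ties, here is a sketch that runs it with Python's sqlite3 module rather than Access. The sample people are made up, with two eye colors sharing the highest count, and an order by is added to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Person (Name text, EyeColor text)")
conn.executemany(
    "insert into Person values (?, ?)",
    [
        ("Ann", "blue"), ("Bob", "blue"),
        ("Cid", "brown"), ("Dee", "brown"),
        ("Eve", "green"),
    ],  # hypothetical rows: blue and brown are tied at 2
)

# Keep every eye color whose count equals the highest count of any color.
most_popular = [
    row[0]
    for row in conn.execute(
        """
        select EyeColor
        from Person
        group by EyeColor
        having count(*) = (
            select max(i.EyeColorCount) from (
                select count(*) as EyeColorCount
                from Person
                group by EyeColor
            ) as i
        )
        order by EyeColor
        """
    )
]

print(most_popular)
```

Both tied colors are returned and the count itself is not part of the output.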
In Access, I think you need something along the lines of:
SELECT First(t.Eyecolor) AS FirstOfEyeColor
FROM (SELECT p.EyeColor, Count(p.EyeColor) AS C
FROM Person p
GROUP BY p.EyeColor
ORDER BY Count(p.EyeColor) DESC) AS t;