SQL query against Redshift data

I have the following table in Redshift:
ID, Category
flinch, cat
flinch, cat
mara, dog
mara, cat
The aim here is to get, for each ID, the number of occurrences of each category, so the expected result would be:
ID, Category, size
flinch, cat, 2
mara, dog, 1
mara, cat, 1
I tried several queries but got this exception:
ERROR: could not identify an ordering operator for type record
Hint: Use an explicit ordering operator or modify the query.

You seem to want group by:
select id, category, count(*) as size
from t
group by id, category;
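You can check the GROUP BY answer without a Redshift cluster. Below is a small sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for Redshift here, and the ORDER BY is added so the output order is deterministic:

```python
import sqlite3

# In-memory SQLite database standing in for the Redshift table `t`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("flinch", "cat"), ("flinch", "cat"), ("mara", "dog"), ("mara", "cat")],
)

# One output row per (id, category) pair, with the number of rows in that pair.
rows = conn.execute(
    """
    SELECT id, category, COUNT(*) AS size
    FROM t
    GROUP BY id, category
    ORDER BY id, category
    """
).fetchall()

print(rows)  # [('flinch', 'cat', 2), ('mara', 'cat', 1), ('mara', 'dog', 1)]
```

Grouping by both columns is the key point: each distinct (id, category) combination becomes one row, and COUNT(*) counts the rows that fell into it.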

Did you try this one?
select id, category, count(*)
from mytable
group by id, category

Related

Remove duplicate id with different description in sql

Hi, I have data with duplicated ids, but the description is different:
id name
1 A
1 B
How can I remove the duplicates? Using DISTINCT still returns all the rows.
It's not really clear from your question how you wish to remove them, given that each id has different metadata attached to it.
Do you just want to de-duplicate the id as a single column, or do you wish to merge its metadata together so that only one row per id remains?
The simplest one is:
select distinct id from ...
But it looks like that's not what you meant.
So the second option is to merge the metadata into an array with array_agg (these examples use BigQuery Standard SQL):
select id, array_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This removes the duplicate ids and collects all the names into an array.
If you are okay with a delimited string instead, you can use string_agg:
select id, string_agg(name) as names
from (select 1 as id, 'A' as name union all select 1 as id, 'B' as name)
group by 1
This will give you the names as comma-separated values.
If you want something fancier than this, you can create a struct for your metadata (assuming you have more metadata in your real project):
select id, array_agg(struct(name, info)) as metadata
from (select 1 as id, 'A' as name, 'X' as info union all select 1 as id, 'B' as name, 'Y' as info)
group by 1
This gives you one row per id with an array of (name, info) structs.
All other options will make you lose some data, for example using min(name) or max(name) to keep only one row per id.
If you could clarify your question a bit better, the community can help you more. For now, I see the above options for you.
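SQLite has no arrays or structs, but its group_concat plays the same role as string_agg. Here is a minimal sketch of the "merge names per id" idea using Python's built-in sqlite3 module. SQLite is a stand-in for whatever database the asker uses, and the table name your_table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [(1, "A"), (1, "B")])

# group_concat is SQLite's counterpart of string_agg: one row per id,
# with every name for that id joined by a comma.
rows = conn.execute(
    """
    SELECT id, group_concat(name, ',') AS names
    FROM your_table
    GROUP BY id
    """
).fetchall()

# SQLite does not guarantee the concatenation order, so sort before comparing.
merged = {id_: sorted(names.split(",")) for id_, names in rows}
print(merged)  # {1: ['A', 'B']}
```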
Consider the simple approach below (BigQuery Standard SQL):
select any_value(t).*
from your_table t
group by t.id
If you have an extra column that identifies the order of entries, for example ts (a timestamp), you could use the query below, which keeps the row with the smallest ts for each id:
select any_value(t having min ts).*
from your_table t
group by t.id
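ANY_VALUE(... HAVING MIN ...) is BigQuery-specific. In databases with window functions, the usual substitute is ROW_NUMBER(). The sketch below uses Python's sqlite3 module as a stand-in (ROW_NUMBER() needs SQLite 3.25 or later), and the rows and ts values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ts decides which row counts as "first" for each id.
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-02"),
        (1, "B", "2021-01-01"),
        (2, "C", "2021-03-01"),
    ],
)

# Number the rows of each id by ts and keep only number 1, i.e. the row
# with the smallest ts, which is what ANY_VALUE(t HAVING MIN ts) returns.
rows = conn.execute(
    """
    SELECT id, name, ts
    FROM (
        SELECT id, name, ts,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts) AS rn
        FROM your_table
    ) AS numbered
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, 'B', '2021-01-01'), (2, 'C', '2021-03-01')]
```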

Filter by number of occurrences in a SQL Table

Given the following table where the Name value might be repeated in multiple rows:
How can we determine how many times a Name value exists in the table, and can we filter on names that have a specific number of occurrences?
For instance, how can I filter this table to show only names that appear twice?
You can use GROUP BY and HAVING to show the names that appear exactly twice in the table:
select name, count(*) cnt
from mytable
group by name
having count(*) = 2
Then if you want the overall count of names that appear twice, you can add another level of aggregation:
select count(*) cnt
from (
select name
from mytable
group by name
having count(*) = 2
) t
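Both queries can be tried with Python's built-in sqlite3 module. The original table was not preserved, so the names below are hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
# Hypothetical sample rows: Carol once, Alice and Bob twice, Dave three times.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(n,) for n in ["Alice", "Alice", "Bob", "Bob", "Carol", "Dave", "Dave", "Dave"]],
)

# Names that appear exactly twice, with their counts.
twice = conn.execute(
    """
    SELECT name, COUNT(*) AS cnt
    FROM mytable
    GROUP BY name
    HAVING COUNT(*) = 2
    ORDER BY name
    """
).fetchall()

# How many distinct names appear exactly twice (second level of aggregation).
(num_twice,) = conn.execute(
    """
    SELECT COUNT(*) AS cnt
    FROM (
        SELECT name
        FROM mytable
        GROUP BY name
        HAVING COUNT(*) = 2
    ) AS t
    """
).fetchone()

print(twice)      # [('Alice', 2), ('Bob', 2)]
print(num_twice)  # 2
```

HAVING filters groups after aggregation, which is why it can test COUNT(*) while WHERE cannot.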
It sounds like you're looking for a histogram of the frequency of name counts. Something like this:
with counts_cte(name, cnt) as (
select name, count(*)
from mytable
group by name)
select cnt, count(*) num_names
from counts_cte
group by cnt
order by 2 desc;
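Here is a sketch of the histogram query on hypothetical sample rows, again via Python's sqlite3 module. The extra `, 1` in the ORDER BY only breaks ties so the output order is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
# Hypothetical sample rows: Carol once, Alice and Bob twice, Dave three times.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [(n,) for n in ["Alice", "Alice", "Bob", "Bob", "Carol", "Dave", "Dave", "Dave"]],
)

# First count each name, then count how many names share each count.
histogram = conn.execute(
    """
    WITH counts_cte(name, cnt) AS (
        SELECT name, COUNT(*)
        FROM mytable
        GROUP BY name
    )
    SELECT cnt, COUNT(*) AS num_names
    FROM counts_cte
    GROUP BY cnt
    ORDER BY 2 DESC, 1
    """
).fetchall()

print(histogram)  # [(2, 2), (1, 1), (3, 1)]
```

Each output row reads "cnt occurrences: num_names names". For example, (2, 2) means two names appear exactly twice.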
You need to use a GROUP BY clause to count how many times each name is repeated:
select name, count(*) AS Repeated
from Your_Table_Name
group by name;
If you want to show only the names that are repeated more than once, use the query below:
select name, count(*) AS Repeated
from Your_Table_Name
group by name having count(*) > 1;

How to get grouping of rows in SQL

I have a table like this:
id name
1 washing
1 cooking
1 cleaning
2 washing
2 cooking
3 cleaning
and I would like to have a following grouping
id name count
1 washing,cooking,cleaning 3
2 washing,cooking 2
3 cleaning 1
I have tried grouping by id, but I can only show the count:
SELECT id,
COUNT(name)
FROM WORK
GROUP BY id
But this will only give the count and not the actual combination of names.
I am new to SQL. I know it has to be relational but there must be some way.
Thanks in advance!
In PostgreSQL you can use array_agg:
SELECT id, array_agg(name), COUNT(*)
FROM WORK
GROUP BY id
In MySQL you can use GROUP_CONCAT:
SELECT id, GROUP_CONCAT(name), COUNT(*)
FROM WORK
GROUP BY id
Or, for Redshift, LISTAGG (pass ',' as the delimiter; without it the names are concatenated with no separator):
SELECT id, LISTAGG(name, ','), COUNT(*)
FROM WORK
GROUP BY id
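The same "names plus count per id" result can be sketched with Python's built-in sqlite3 module. SQLite's group_concat stands in for array_agg / GROUP_CONCAT / LISTAGG here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO work VALUES (?, ?)",
    [
        (1, "washing"), (1, "cooking"), (1, "cleaning"),
        (2, "washing"), (2, "cooking"),
        (3, "cleaning"),
    ],
)

# One row per id: the names joined with commas, plus how many there are.
rows = conn.execute(
    """
    SELECT id, group_concat(name, ',') AS names, COUNT(*) AS cnt
    FROM work
    GROUP BY id
    ORDER BY id
    """
).fetchall()

# group_concat's order is unspecified in SQLite, so compare names as sorted lists.
grouped = [(id_, sorted(names.split(",")), cnt) for id_, names, cnt in rows]
print(grouped)
# [(1, ['cleaning', 'cooking', 'washing'], 3), (2, ['cooking', 'washing'], 2), (3, ['cleaning'], 1)]
```

If the order of the names matters, Redshift's LISTAGG accepts WITHIN GROUP (ORDER BY ...) to make it explicit.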

plsql/sql is it possible to limit the rows returned from a join, or do some kind of subquery on a per-row basis?

I am doing a query that finds all cats born between two dates. Each cat has a name, or multiple names.
The initial query is something like
SELECT Id, Color FROM Cat WHERE Cat.BirthDate > dat_min AND Cat.BirthDate < dat_max;
I also have a table called CatName which for each Cat Id, has one or more names that this cat has been given by its different owners. I only want to return the first name that matches the Id in the CatName table, as part of the query. So something like:
SELECT Id, Color, Name FROM Cat JOIN CatName on .....
For a cat that has 5 names, this returns 5 rows. I only want one row, the first one. If I were only retrieving data for one cat, I would just use ROWNUM to limit the result to 1 row, but I am trying to get a list of all cats including their names, so I can't do this.
Can anyone offer some guidance? I guess it doesn't have to be PL/SQL-specific; I imagine the technique will be the same.
There are several methods.
You could use an inline query:
SELECT id, color,
(SELECT name FROM CatName cn WHERE cn.id = c.id AND ROWNUM = 1) Name -- any one matching name, since there is no ORDER BY
FROM cat c
WHERE ...
You could use a join then analytics:
SELECT id, color, Name
FROM (SELECT Id, Color, Name,
row_number() OVER (PARTITION BY id ORDER BY 1) rn -- ORDER BY 1 sorts by a constant, so order by a real column to define "first"
FROM Cat JOIN CatName on .....)
WHERE rn = 1
You can use an aggregate:
SELECT id, color, MAX(name) name
FROM Cat JOIN CatName on .....
GROUP BY id, color
From a performance point of view, assuming that CatName is indexed by CatId:
- if the number of cats returned is small, or you only want the very first few cats among many, solution 1 can be really fast;
- if the dataset returned is large and you want all cats, solutions 2 and 3 can make good use of the efficient HASH JOIN.
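The three approaches can be compared side by side with Python's built-in sqlite3 module. This is a sketch under stated assumptions. SQLite has no ROWNUM, so the inline query uses ORDER BY ... LIMIT 1 instead. The join condition is elided in the question, so the schema is assumed: CatName.CatId, plus a hypothetical Seq column that defines which name is "first". The cat names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Cat (Id INTEGER PRIMARY KEY, Color TEXT, BirthDate TEXT);
    -- Seq is a hypothetical column that defines which name counts as "first".
    CREATE TABLE CatName (CatId INTEGER, Seq INTEGER, Name TEXT);
    CREATE INDEX CatName_CatId ON CatName (CatId);

    INSERT INTO Cat VALUES (1, 'black', '2020-05-01'), (2, 'white', '2021-02-10');
    INSERT INTO CatName VALUES (1, 1, 'Shadow'), (1, 2, 'Tom'), (1, 3, 'Ash'), (2, 1, 'Snow');
    """
)

# 1) Inline (scalar) subquery: one lookup per cat, first name by Seq.
scalar = conn.execute(
    """
    SELECT c.Id, c.Color,
           (SELECT cn.Name FROM CatName cn
            WHERE cn.CatId = c.Id
            ORDER BY cn.Seq
            LIMIT 1) AS Name
    FROM Cat c
    ORDER BY c.Id
    """
).fetchall()

# 2) Join, then keep row number 1 per cat.
analytic = conn.execute(
    """
    SELECT Id, Color, Name
    FROM (
        SELECT c.Id, c.Color, cn.Name,
               ROW_NUMBER() OVER (PARTITION BY c.Id ORDER BY cn.Seq) AS rn
        FROM Cat c JOIN CatName cn ON cn.CatId = c.Id
    ) AS numbered
    WHERE rn = 1
    ORDER BY Id
    """
).fetchall()

# 3) Aggregate: one row per cat, but MAX picks the alphabetically largest name.
aggregate = conn.execute(
    """
    SELECT c.Id, c.Color, MAX(cn.Name) AS Name
    FROM Cat c JOIN CatName cn ON cn.CatId = c.Id
    GROUP BY c.Id, c.Color
    ORDER BY c.Id
    """
).fetchall()

print(scalar)     # [(1, 'black', 'Shadow'), (2, 'white', 'Snow')]
print(analytic)   # [(1, 'black', 'Shadow'), (2, 'white', 'Snow')]
print(aggregate)  # [(1, 'black', 'Tom'), (2, 'white', 'Snow')]
```

The aggregate version still gives exactly one row per cat. However, for cat 1 it returns 'Tom' rather than the first name, so it fits only when any single name is acceptable.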

DB2 Query - eliminate maxvalues

I have the following problem (simplified):
I have a table that contains animals, e.g:
ID Type Birthday
1 Dog 1.1.2011
2 Cat 2.1.2009
3 Horse 5.1.2009
4 Cat 10.6.1999
5 Horse 9.3.2006
I know that all the animals belong to one "family". From each family I now want to see all the offspring, but I do not want to see the entry for the "founder of the family".
So for the simple sample above I just want to see this:
ID Type Birthday
2 Cat 2.1.2009
3 Horse 5.1.2009
So far I haven't been able to find a way of grouping the entries and then removing the first entry from each group. I was only able to find how to remove specific lines.
Is it even possible to solve this problem?
Thank you very much for your help. It is much appreciated.
A simple query (not necessarily efficient) could be:
select
animals.id, animals.type, animals.birthday
from animals
left join
(select type, min(birthday) min_birthday
from animals
group by type) a
on a.type=animals.type and a.min_birthday = animals.birthday
where a.type is null;
For better efficiency, you can use an analytic function:
select id, type, birthday
from(
select
id,
type,
birthday,
row_number() over (partition by type order by birthday) as rnk
from animals
) a
where rnk >=2
For more examples with analytical functions, you can read this article
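The analytic-function query can be tried on the question's sample rows with Python's built-in sqlite3 module. SQLite stands in for DB2, and ROW_NUMBER() needs SQLite 3.25 or later. The birthdays are rewritten as ISO dates, assuming the question uses day.month.year, so that text comparison sorts them chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INTEGER, type TEXT, birthday TEXT)")
# Sample rows from the question; dates assumed to be day.month.year.
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        (1, "Dog", "2011-01-01"),
        (2, "Cat", "2009-01-02"),
        (3, "Horse", "2009-01-05"),
        (4, "Cat", "1999-06-10"),
        (5, "Horse", "2006-03-09"),
    ],
)

# Number each type's animals from oldest to youngest and drop number 1 (the founder).
offspring = conn.execute(
    """
    SELECT id, type, birthday
    FROM (
        SELECT id, type, birthday,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY birthday) AS rnk
        FROM animals
    ) AS a
    WHERE rnk >= 2
    ORDER BY id
    """
).fetchall()

print(offspring)  # [(2, 'Cat', '2009-01-02'), (3, 'Horse', '2009-01-05')]
```

The dog disappears because it is the only (and therefore founding) member of its family, matching the expected output in the question.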
In SQL Server you can do:
select
id, type, birthday
from (
select
id, type, birthday,
row_number() over (partition by type order by birthday asc) r
from
animals
) q
where r > 1
The row_number() function is rumoured to work in DB2 as well, but I don't know under which circumstances/versions.
The exists variant:
select id, type, birthday
from animals a
where exists (select null from animals e
where e.type = a.type and e.birthday < a.birthday)
(Edited, following comments.)
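As a sanity check, the LEFT JOIN and EXISTS variants can be run on the question's sample rows with Python's built-in sqlite3 module. SQLite stands in for DB2, and the dates are rewritten as ISO strings, assuming day.month.year:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INTEGER, type TEXT, birthday TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?)",
    [
        (1, "Dog", "2011-01-01"),
        (2, "Cat", "2009-01-02"),
        (3, "Horse", "2009-01-05"),
        (4, "Cat", "1999-06-10"),
        (5, "Horse", "2006-03-09"),
    ],
)

# Anti-join: keep rows that do NOT match their type's earliest birthday.
via_left_join = conn.execute(
    """
    SELECT animals.id, animals.type, animals.birthday
    FROM animals
    LEFT JOIN (
        SELECT type, MIN(birthday) AS min_birthday
        FROM animals
        GROUP BY type
    ) AS a
      ON a.type = animals.type AND a.min_birthday = animals.birthday
    WHERE a.type IS NULL
    ORDER BY animals.id
    """
).fetchall()

# EXISTS: keep a row if some animal of the same type is strictly older.
via_exists = conn.execute(
    """
    SELECT id, type, birthday
    FROM animals a
    WHERE EXISTS (SELECT NULL FROM animals e
                  WHERE e.type = a.type AND e.birthday < a.birthday)
    ORDER BY id
    """
).fetchall()

print(via_left_join == via_exists)  # True
print(via_exists)  # [(2, 'Cat', '2009-01-02'), (3, 'Horse', '2009-01-05')]
```

The approaches differ only when two animals of the same type share the earliest birthday. ROW_NUMBER() keeps one of the tied founders (an arbitrary one) as "offspring", while the LEFT JOIN and EXISTS versions treat both as founders and drop them.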