Oracle SQL: Counting how often an attribute occurs for a given entry and choosing the attribute with the maximum number of occurrences

I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+------------
I want to make the number unique, and the attribute to be whichever attribute occurred most often for that number, like this (this is the end product I'm interested in):
2.
+-----+-----+
| num | att |
-------------
| 1 | a |
| 2 | b |
+------------
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-----+
| num | att |count|
------------------+
| 1 | a | 2 |
| 1 | b | 1 |
| 2 | a | 1 |
| 2 | b | 2 |
+-----------------+
But I can't think of a way to only select those rows from the above table where the count is the highest (for each number of course).
So basically what I am asking is: given table 3, how do I select only the rows with the highest count for each number? (Of course, an answer describing a way to get from table 1 to table 2 directly also works :) )

You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this brings the most frequent att; if there are ties, the smaller att is retained.
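As a rough check of this approach, here is a minimal sketch that runs the same idea against the sample data using Python's built-in sqlite3 module instead of Oracle. It assumes a SQLite build with window-function support (3.25 or later), and it splits the query into two steps: a `counts` CTE (table 3 from the question) and a `ranked` CTE. The CTE names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (num INTEGER, att TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "b"), (2, "b")],
)

query = """
WITH counts AS (   -- occurrences per (num, att), i.e. table 3
    SELECT num, att, COUNT(*) AS cnt
    FROM mytable
    GROUP BY num, att
),
ranked AS (        -- most frequent att first; ties broken by the smaller att
    SELECT num, att,
           ROW_NUMBER() OVER (PARTITION BY num ORDER BY cnt DESC, att) AS rn
    FROM counts
)
SELECT num, att FROM ranked WHERE rn = 1 ORDER BY num
"""
most_frequent = conn.execute(query).fetchall()
print(most_frequent)  # [(1, 'a'), (2, 'b')]
```

Keeping the count in its own step also answers the narrower question of filtering table 3 once it already exists.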

Oracle has an aggregation function that does this, stats_mode():
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
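To illustrate what "mode per group" means outside the database, here is a small Python sketch over the sample rows from the question. It is only an illustration of the concept, not of how Oracle implements stats_mode(); in particular, how ties are resolved here (first value encountered) is a property of this sketch.

```python
from collections import Counter, defaultdict

rows = [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "b"), (2, "b")]

# Count how often each att occurs for each num.
by_num = defaultdict(Counter)
for num, att in rows:
    by_num[num][att] += 1

# The mode of each group is its most common value.
modes = {num: counter.most_common(1)[0][0] for num, counter in by_num.items()}
print(modes)  # {1: 'a', 2: 'b'}
```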

You can use group by and count as below
select id, col, count(col) as count
from
df_b_sql
group by id, col

Related

Can I generate a map that shows a particular row was in a particular group in SQLite?

Say I have the following data:
+--------+-------+
| Group | Data |
+--------+-------+
| 1 | row 1 |
| 1 | row 2 |
| 1 | row 3 |
| 20 | row 1 |
| 20 | row 3 |
| 10 | row 1 |
| 10 | row A |
| 10 | row 2 |
| 10 | row 3 |
+--------+-------+
Is it possible to draw a map that shows which groups have which rows? Groups may not be contiguous, so they can be placed into a separate table, using the row index as the string index instead. Something like this:
+-------+
| Group |
+-------+
| 1 |
| 20 |
| 10 |
+-------+
+-------+----------------+
| Data | Found in group |
+-------+----------------+
| row 1 | 111 |
| row A | 1 |
| row 2 | 1 1 |
| row 3 | 111 |
+-------+----------------+
Where the first character represents Group 1, the 2nd is Group 20 and the 3rd is Group 10.
Ordering of the Group rows isn't critical so long as I can reference which row goes with which character.
I only ask this because I saw this crazy example in the documentation generating a fractal, but I can't quite get my head around it.
Is this doable?
To find the missing values, first prepare a dataset that contains every possible combination of Data and Grp. You can build that with a CROSS JOIN.
Once you have that dataset, compare it with the actual data.
Assuming the characters should be ordered by the Grp column, you can achieve it with the query below.
SELECT
a.Data,group_concat(case when base.Grp is null then '.' else '1' end,'') as Found_In_Group
,group_concat(b.Grp) as Group_Order
FROM
(SELECT Data FROM yourtable Group By Data)a
CROSS JOIN
(SELECT Grp FROM yourtable Group By Grp Order by Grp)b
LEFT JOIN yourtable base
ON b.Grp=base.Grp
AND a.Data=base.Data
GROUP BY a.Data
Note: I used . instead of a blank to represent a missing group, for better visibility.
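+-------+----------------+-------------+
| Data  | Found_In_Group | Group_Order |
+-------+----------------+-------------+
| row 1 | 111            | 1,10,20     |
| row 2 | 11.            | 1,10,20     |
| row 3 | 111            | 1,10,20     |
| row A | .1.            | 1,10,20     |
+-------+----------------+-------------+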
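The cross-join idea can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: the table is named yourtable with columns Grp and Data as in the query above, and the final string is assembled in Python rather than with group_concat, because SQLite has traditionally documented the order of concatenated elements as arbitrary unless an ORDER BY is given inside the aggregate (newer versions only).

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (Grp INTEGER, Data TEXT)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)", [
    (1, "row 1"), (1, "row 2"), (1, "row 3"),
    (20, "row 1"), (20, "row 3"),
    (10, "row 1"), (10, "row A"), (10, "row 2"), (10, "row 3"),
])

# One row per (Data, Grp) combination, flagged 1 if it exists in the table.
cells = conn.execute("""
    SELECT a.Data, b.Grp, base.Grp IS NOT NULL AS present
    FROM (SELECT DISTINCT Data FROM yourtable) AS a
    CROSS JOIN (SELECT DISTINCT Grp FROM yourtable) AS b
    LEFT JOIN yourtable AS base
           ON base.Grp = b.Grp AND base.Data = a.Data
    ORDER BY a.Data, b.Grp
""").fetchall()

# Build the map string per Data value; characters follow Grp order 1, 10, 20.
presence = {
    data: "".join("1" if present else "." for _, _, present in group)
    for data, group in groupby(cells, key=lambda cell: cell[0])
}
print(presence)
```

Because the rows arrive sorted by Data and then Grp, each character position corresponds to the same group for every Data value.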
SELECT Data, group_concat("Group") AS "Found in group"
FROM yourtable
GROUP BY Data
will give you a CSV list of groups.

SQL set increasing integer where value of column is 1

I have a data set which looks like:
Id INT,
Choice VARCHAR,
Order INT
Id + Choice form the primary key.
Currently a lot of the rows have Order = 1.
What I would like to do is, for each Id, if there are multiple rows with that Id where Order = 1, set them to be 1, 2, 3, 4, etc.
I can't work out the SQL to do this.
Example data:
+----+--------+-------+
| Id | Choice | Order |
+----+--------+-------+
| 4 | hello | 1 |
| 4 | world | 1 |
| 4 | test | 1 |
+----+--------+-------+
Would become:
+----+--------+-------+
| Id | Choice | Order |
+----+--------+-------+
| 4 | hello | 1 |
| 4 | world | 2 |
| 4 | test | 3 |
+----+--------+-------+
We can try using ROW_NUMBER here with a partition by Id. As for the ordering in your Order column, I don't see any logic present for how you numbered things. In the absence of this, I use the Choice column to decide how to order the row numbering.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Choice) rn
FROM yourTable
WHERE [Order] = 1
)
UPDATE cte
SET [Order] = rn;
Note: Please avoid naming your columns (tables, etc.) using reserved SQL keywords like ORDER. You will forever have to put that column name in square brackets, like this: [Order].
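The updatable CTE above is SQL Server syntax. As a hedged sketch of the same numbering logic on another engine, the following uses Python's built-in sqlite3 module (assuming SQLite 3.25+ for ROW_NUMBER): it reads the row numbers first and then applies them in a separate UPDATE, so the numbering is never affected by rows that were already changed. As in the answer, ordering by Choice is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourTable (Id INTEGER, Choice TEXT, "Order" INTEGER, '
    "PRIMARY KEY (Id, Choice))"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(4, "hello", 1), (4, "world", 1), (4, "test", 1)],
)

# Step 1: compute the new numbers for all rows that currently have Order = 1.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Choice) AS rn, Id, Choice
    FROM yourTable
    WHERE "Order" = 1
""").fetchall()

# Step 2: write them back, matching on the primary key (Id, Choice).
conn.executemany(
    'UPDATE yourTable SET "Order" = ? WHERE Id = ? AND Choice = ?', numbered
)

result = conn.execute(
    'SELECT Id, Choice, "Order" FROM yourTable ORDER BY "Order"'
).fetchall()
print(result)  # [(4, 'hello', 1), (4, 'test', 2), (4, 'world', 3)]
```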

SQL : Getting duplicate rows along with other variables

I am working on Teradata SQL. I would like to get the duplicate rows with their count and the other columns as well. I can only find ways to get the count, but not the other columns along with it.
Available input
+---------+----------+----------------------+
| id | name | Date |
+---------+----------+----------------------+
| 1 | abc | 21.03.2015 |
| 1 | def | 22.04.2015 |
| 2 | ajk | 22.03.2015 |
| 3 | ghi | 23.03.2015 |
| 3 | ghi | 23.03.2015 |
Expected output :
+---------+----------+----------------------+
| id | name | count | // Other fields
+---------+----------+----------------------+
| 1 | abc | 2 |
| 1 | def | 2 |
| 2 | ajk | 1 |
| 3 | ghi | 2 |
| 3 | ghi | 2 |
What am I looking for :
I am looking for all duplicate rows, where duplication is decided by ID and to retrieve the duplicate rows as well.
All I have till now is :
SELECT
id, name, other-variables, COUNT(*)
FROM
Table_NAME
GROUP BY
id, name
HAVING
COUNT(*) > 1
This is not showing correct data. Thank you.
You could use a window aggregate function, like this:
SELECT *
FROM (
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
) AS sub
WHERE duplicates > 1
Using a Teradata extension to ISO SQL syntax, you can simplify the above to:
SELECT id, name, other-variables,
COUNT(*) OVER (PARTITION BY id) AS duplicates
FROM users
QUALIFY duplicates > 1
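QUALIFY is not available everywhere, but the subquery form is fairly portable. Here is a minimal sketch of it against the question's sample data, run through Python's built-in sqlite3 module (assuming SQLite 3.25+ for window functions); the table name users follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE users (id INTEGER, name TEXT, "Date" TEXT)')
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "abc", "21.03.2015"),
    (1, "def", "22.04.2015"),
    (2, "ajk", "22.03.2015"),
    (3, "ghi", "23.03.2015"),
    (3, "ghi", "23.03.2015"),
])

# Every row keeps its own columns; the window count says how many rows share its id.
duplicates = conn.execute("""
    SELECT id, name, duplicates
    FROM (
        SELECT id, name, COUNT(*) OVER (PARTITION BY id) AS duplicates
        FROM users
    ) AS sub
    WHERE duplicates > 1
    ORDER BY id, name
""").fetchall()
print(duplicates)  # [(1, 'abc', 2), (1, 'def', 2), (3, 'ghi', 2), (3, 'ghi', 2)]
```

Dropping the WHERE clause would also return the row for id 2 with a count of 1, which matches the expected output shown in the question.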
As an alternative to the accepted and perfectly correct answer, you can use:
SELECT {all your required 'variables' (they are not variables, but attributes)}
, cnt.Count_Dups
FROM Table_NAME TN
INNER JOIN (
SELECT id
, COUNT(1) Count_Dups
FROM Table_NAME
GROUP BY id
HAVING COUNT(1) > 1 -- If you want only duplicates
) cnt
ON cnt.id = TN.id
edit: According to your edit, duplicates are on id only. Edited my query accordingly.
try this,
SELECT
id, COUNT(id)
FROM
Table_NAME
GROUP BY
id
HAVING
COUNT(id) > 1

Select rows appearing after a row with a given ID when sorted by criteria unrelated to the ID

Given the data in the table "people":
+----+-------+
| id | name |
+----+-------+
| 1 | Jane |
| 2 | Joe |
| 4 | John |
| 5 | Alice |
| 6 | Bob |
+----+-------+
And the order:
SELECT * FROM people ORDER BY name
... which would return:
+----+-------+
| id | name |
+----+-------+
| 5 | Alice |
| 6 | Bob |
| 1 | Jane |
| 2 | Joe |
| 4 | John |
+----+-------+
How could one write a query--including the order above--which would return only rows after the one with a given id, e.g., if given an id of 1, it would return:
+----+-------+
| id | name |
+----+-------+
| 2 | Joe |
| 4 | John |
+----+-------+
To be clear, the id is variable and not known beforehand.
An approach using commonly supported SQL would be great, but I'm using PostgreSQL 9.2 and ActiveRecord 3.2 if they have anything additional of use, e.g., OVER() and ROW_NUMBER().
[Edit] I'd previously shown the wrong desired result set, including the row with the given id. But the result set, as described in the question, should only include rows after the given ID.
select *
from people
where
name >= (
select name
from people
where id = 1
)
and id != 1
order by name
So far, the simplest approach I've found for a situation where precision is needed (e.g., no missing or duplicate results across multiple calls with varying values for ID) is to combine window functions and CTEs, as in:
WITH ordered_people AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY name) AS n
FROM people
ORDER BY name
)
SELECT *
FROM ordered_people
WHERE n > (SELECT n FROM ordered_people WHERE id = 1)
ORDER BY name
;
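Here is a small sketch of the CTE approach with the id passed as a bound parameter, run through Python's built-in sqlite3 module rather than PostgreSQL (assuming SQLite 3.25+ for ROW_NUMBER). The helper name rows_after is hypothetical, and the extra id tie-breaker in the window ordering is an assumption added so that people with identical names still get a stable position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Jane"), (2, "Joe"), (4, "John"), (5, "Alice"), (6, "Bob")],
)

def rows_after(conn, given_id):
    """Return the rows that sort after the row with given_id (hypothetical helper)."""
    return conn.execute("""
        WITH ordered_people AS (
            SELECT id, name, ROW_NUMBER() OVER (ORDER BY name, id) AS n
            FROM people
        )
        SELECT id, name
        FROM ordered_people
        WHERE n > (SELECT n FROM ordered_people WHERE id = ?)
        ORDER BY n
    """, (given_id,)).fetchall()

print(rows_after(conn, 1))  # [(2, 'Joe'), (4, 'John')]
```

If the given id does not exist, the scalar subquery yields NULL and the comparison filters out every row, so the result is empty.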

Running total of "matches" using a window function in SQL

I want to create a window function that will count how many times the value of the field in the current row appears in the part of the ordered partition coming before the current row. To make this more concrete, suppose we have a table like so:
| id| fruit | date |
+---+--------+------+
| 1 | apple | 1 |
| 1 | cherry | 2 |
| 1 | apple | 3 |
| 1 | cherry | 4 |
| 2 | orange | 1 |
| 2 | grape | 2 |
| 2 | grape | 3 |
And we want to create a table like so (omitting the date column for clarity):
| id| fruit | prior |
+---+--------+-------+
| 1 | apple | 0 |
| 1 | cherry | 0 |
| 1 | apple | 1 |
| 1 | cherry | 1 |
| 2 | orange | 0 |
| 2 | grape | 0 |
| 2 | grape | 1 |
Note that for id = 1, moving along the ordered partition, the first entry 'apple' doesn't match anything (since the implied set is empty), the next fruit, 'cherry' also doesn't match. Then we get to 'apple' again, which is a match and so on. I'm imagining the SQL looks something like this:
SELECT
id, fruit,
<some kind of INTERSECT?> OVER (PARTITION BY id ORDER by date) AS prior
FROM fruit_table;
But I cannot find anything that looks right. FWIW, I'm using PostgreSQL 8.4.
You could solve that without a window function rather elegantly with a self-left join and a count():
SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM tbl t
LEFT JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP BY t.id, t.day, t.fruit
ORDER BY t.id, t.day
I renamed the date column to day because date is a reserved word in standard SQL and a type name in PostgreSQL.
I corrected a mistake in your sample data. The way you had it, it did not check out, which might confuse people.
If your point is to do it with a window function, this one should work:
SELECT id, fruit, day
,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM tbl
ORDER BY id, day
This works, because, I quote the manual:
If frame_end is omitted it defaults to CURRENT ROW.
You effectively count how many rows had the same (id, fruit) on prior days - including the current row. That's what the - 1 is for.
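To check that the two queries agree on the sample data, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL (assuming SQLite 3.25+ for window functions). The join condition is written with plain AND comparisons and COUNT(t0.id) rather than the row-value and count(t0.*) forms, purely for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, fruit TEXT, day INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "apple", 1), (1, "cherry", 2), (1, "apple", 3), (1, "cherry", 4),
    (2, "orange", 1), (2, "grape", 2), (2, "grape", 3),
])

# Window version: running count within (id, fruit), minus the current row.
window_version = conn.execute("""
    SELECT id, fruit,
           COUNT(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
    FROM tbl
    ORDER BY id, day
""").fetchall()

# Self-join version: count earlier rows with the same id and fruit.
join_version = conn.execute("""
    SELECT t.id, t.fruit, COUNT(t0.id) AS prior
    FROM tbl AS t
    LEFT JOIN tbl AS t0
           ON t0.id = t.id AND t0.fruit = t.fruit AND t0.day < t.day
    GROUP BY t.id, t.day, t.fruit
    ORDER BY t.id, t.day
""").fetchall()

print(window_version == join_version)  # True
```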