How to have 'distinct' in having clause - sql

EDIT: This is an example relation! I need it to work on a bigger relation, so no workarounds!
So I was given a simple task, and at first I didn't see what could possibly be wrong; now I just don't understand why it doesn't work.
Let's say I have a table of people and their friends, and I want to select the ones who have 2 or more friends.
people
-------------------------------
| person | friend  | relation |
|--------|---------|----------|
| ana    | jon     | friend   |
| ana    | jon     | lover    |
| ana    | phillip | friend   |
| ana    | kiki    | friend   |
| mary   | jannet  | friend   |
| mary   | jannet  | lover    |
| peter  | july    | friend   |
I would want to do a
SELECT person FROM people GROUP BY person HAVING count(distinct friend) > 1;
and get
-------
| ana |
-------
But I get a syntax error when using the 'distinct' in the HAVING clause.
I understand that the 'distinct' is a part of the projection clause but
how do I make 'count' only count distinct entries without an additional subquery or something?
EDIT: The best I could come up with is:
SELECT tmp.person FROM (SELECT person, count(distinct friend)
AS numfriends FROM people GROUP BY person) AS tmp
WHERE tmp.numfriends > 1;

From the docs:
http://www-01.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_0162.htm
"The condition in the HAVING clause cannot include a DISTINCT or UNIQUE aggregate expression."
A workaround would be to put the count(distinct ...) in the select list and refer to its alias:
SELECT
person,
count(distinct friend) as f_count
FROM people
GROUP BY person
HAVING f_count > 1;
UPDATE:
I checked the documentation again: the HAVING clause is evaluated before the SELECT list, so the server doesn't yet know about that alias and the query above fails.
To achieve the goal, compute the count in a derived table and filter on it in the outer query:
select
person,
f_count
from(
SELECT
person,
count(distinct friend) as f_count
FROM people
GROUP BY person
)x
where f_count > 1
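
A minimal sketch of the derived-table workaround above, run from Python against an in-memory SQLite database. SQLite is only a stand-in here (an assumption; the question targets a different server, and SQLite itself would accept COUNT(DISTINCT ...) in HAVING), so the point is just the shape of the query and its result on the sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, friend TEXT, relation TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("ana", "jon", "friend"),
        ("ana", "jon", "lover"),
        ("ana", "phillip", "friend"),
        ("ana", "kiki", "friend"),
        ("mary", "jannet", "friend"),
        ("mary", "jannet", "lover"),
        ("peter", "july", "friend"),
    ],
)

# The DISTINCT count is computed in the inner SELECT; the outer WHERE
# filters on its alias, so HAVING never sees a DISTINCT aggregate.
derived_table_rows = conn.execute(
    """
    SELECT person, f_count
    FROM (
        SELECT person, COUNT(DISTINCT friend) AS f_count
        FROM people
        GROUP BY person
    ) AS x
    WHERE f_count > 1
    """
).fetchall()

print(derived_table_rows)
```

Only ana has more than one distinct friend in the sample data, so she is the only row that survives the outer filter.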

You can write it like this, as long as each friend appears only once per person with relation = 'friend':
SELECT person
FROM people
WHERE relation = 'friend'
GROUP BY person
HAVING count(*) > 1;

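Just check this (SQL Server syntax, using a table variable):
DECLARE @t TABLE (person VARCHAR(50), friend VARCHAR(50), relation VARCHAR(50))
INSERT INTO @t VALUES ('ana', 'jon', 'friend')
,('ana', 'jon', 'lover')
,('ana', 'phillip', 'friend')
,('ana', 'kiki', 'friend')
,('mary', 'jannat', 'friend')
,('mary', 'jannat', 'lover')
SELECT DISTINCT person FROM
(
-- COUNT(DISTINCT friend) is needed: a plain COUNT(friend) would also count
-- mary's two 'jannat' rows and return her as well.
SELECT person FROM @t GROUP BY person HAVING COUNT(DISTINCT friend) > 1
) a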

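An alternative that avoids DISTINCT altogether is to use two levels of GROUP BY: the inner one collapses duplicate pairs, and the outer one counts what is left (the table and column names below are not the ones from the question):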
select cust_xref_id
from(
select cust_xref_id,cm11
from temp_nippon_cust
group by cust_xref_id,cm11
) temp
group by cust_xref_id
having count(cm11) > 1
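
The same two-level GROUP BY idea, sketched on the people table from the question. It runs from Python against an in-memory SQLite database, which is an assumption made purely so the example is runnable; the query text itself uses no DISTINCT anywhere.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person TEXT, friend TEXT, relation TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("ana", "jon", "friend"),
        ("ana", "jon", "lover"),
        ("ana", "phillip", "friend"),
        ("ana", "kiki", "friend"),
        ("mary", "jannet", "friend"),
        ("mary", "jannet", "lover"),
        ("peter", "july", "friend"),
    ],
)

# Inner GROUP BY: one row per (person, friend) pair, so duplicates collapse.
# Outer GROUP BY: count the remaining pairs per person.
double_group_rows = conn.execute(
    """
    SELECT person
    FROM (
        SELECT person, friend
        FROM people
        GROUP BY person, friend
    ) AS pairs
    GROUP BY person
    HAVING COUNT(friend) > 1
    """
).fetchall()

print(double_group_rows)
```

Because the inner query already removed the duplicate (ana, jon) and (mary, jannet) rows, the plain COUNT in HAVING behaves like a distinct count.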

Related

Oracle issue in group by

I know this question is asked by many people. But I still couldn't figure out why this is happening. Couldn't understand this logic.
I have a table mytesttable with columns id, company_name and employee_name.
I am trying to get the employee details grouping them with respect to company name. So I used the below query:
select *
from mytesttable
group by company_name;
But I get the below issue:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
*Cause:
*Action:
Error at Line: 28 Column: 2
Now I have tried putting count(1) in the select along with my columns, tried grouping by 2 columns, etc. Still the same issue. Can anyone explain to me how to achieve this?
This is a simple group by and the logic seems right, so I'm wondering why it's not returning a result.
Your query is:
select *
from mytesttable
group by company_name;
The * expands to all the columns. So this becomes:
select company_name, col1, col2, col3, . . . -- your question doesn't specify the column names
from mytesttable
group by company_name;
When you specify the group by, you are specifying that there is one row per company_name in the result set. The other columns are normally filled with aggregation functions, such as MIN(), SUM(), or LISTAGG().
What value should be chosen for col1? In general, SQL does not attempt to answer this question. Instead, it returns a syntax error. This is not Oracle-specific. This is the definition of the language.
What you probably want is:
select company_name, count(*) as num_employees
from mytesttable
group by company_name;
Every column in the select list must either appear in the GROUP BY clause or be wrapped in an aggregate function. You selected all columns but grouped by only one, so the query throws an error. GROUP BY is meant to be used with aggregate functions, so a query like this will work:
select company_name, count(*) from mytesttable group by company_name;
I think you misunderstand what GROUP BY does.
It doesn't put all similar records next to each other; that's ORDER BY. What GROUP BY does is collapse all similar records into the same row, to allow aggregate calculations like SUM(), MIN(), COUNT(), etc.
So, two examples using the same input...
id | company_name | employee_name
----+--------------+---------------
1 | zzz | aaa
2 | xxx | bbb
3 | yyy | ccc
4 | zzz | ddd
5 | xxx | eee
Using ORDER BY...
SELECT * FROM mytesttable ORDER BY company_name, employee_name
id | company_name | employee_name
----+--------------+---------------
2 | xxx | bbb
5 | xxx | eee
3 | yyy | ccc
1 | zzz | aaa
4 | zzz | ddd
Using GROUP BY...
SELECT
company_name,
COUNT(*) number_of_employees,
MAX(id) highest_id_in_company
FROM
mytesttable
GROUP BY
company_name
ORDER BY
company_name
company_name | number_of_employees | highest_id_in_company
--------------+---------------------+-----------------------
xxx | 2 | 5
yyy | 1 | 3
zzz | 2 | 4
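
If the goal is to keep the employee names while still getting one row per company, a string aggregate is one option. The sketch below uses SQLite's GROUP_CONCAT from Python as a stand-in for Oracle's LISTAGG (an assumption made so the example is runnable; the two functions differ in syntax and ordering guarantees), on the same five sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytesttable (id INTEGER, company_name TEXT, employee_name TEXT)"
)
conn.executemany(
    "INSERT INTO mytesttable VALUES (?, ?, ?)",
    [
        (1, "zzz", "aaa"),
        (2, "xxx", "bbb"),
        (3, "yyy", "ccc"),
        (4, "zzz", "ddd"),
        (5, "xxx", "eee"),
    ],
)

# One row per company: every selected column is either the grouping
# column or an aggregate over the rows in that group.
rows = conn.execute(
    """
    SELECT company_name,
           COUNT(*) AS number_of_employees,
           GROUP_CONCAT(employee_name, ', ') AS employees
    FROM mytesttable
    GROUP BY company_name
    ORDER BY company_name
    """
).fetchall()

# GROUP_CONCAT does not promise an order, so sort the names in Python.
company_summary = {
    company: (count, sorted(names.split(", "))) for company, count, names in rows
}
print(company_summary)
```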
If you use a GROUP BY expression, every column in the select list must either be listed in the GROUP BY or be wrapped in an aggregate function such as MAX, MIN, AVG, or COUNT.
For example:
select count(*),company_name,employee_name
from mytesttable
group by company_name,employee_name
order by company_name;

Novice seeking help, Max Aggregate not returning expected results

I'm still very new to MS-SQL. I have a simple table and query that is getting the best of me. I know it will be something fundamental I'm overlooking.
I've changed the field names but the idea is the same.
So the idea is that every time someone signs up they get a RegID, Name, and Team. The names are unique, so for below yes John changed teams. And that's my trouble.
Football Table
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 101 | Bill | Blue |
| 102 | Tom | Green |
| 103 | John | Green |
+------------+----------+---------+
With the query at the bottom using the Max_RegID, I was expecting to get back only one record.
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 103 | John | Green |
+------------+----------+---------+
Instead I get back the rows below, which seem to include the Max_RegID but for each team. What am I doing wrong?
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 103 | John | Green |
+------------+----------+---------+
My Query
SELECT
Max(Football.RegID) AS Max_RegID,
Football.Name,
Football.Team
FROM
Football
GROUP BY
Football.RegID,
Football.Name,
Football.Team
EDIT* Removed the WHERE statement
The reason you're getting the results that you are is because of the way you have your GROUP BY clause structured.
When you're using any aggregate function, MAX(X), SUM(X), COUNT(X), or what have you, you're telling the SQL engine that you want the aggregate value of column X for each unique combination of the columns listed in the GROUP BY clause.
In your query as written, you're grouping by all three of the columns in the table, telling the SQL engine that each tuple is unique. Therefore the query is returning ALL of the values, and you aren't actually getting the MAX of anything at all.
What you actually want in your results is the maximum RegID for each distinct value in the Name column and also the Team that goes along with that (RegID,Name) combination.
To accomplish that you need to find the MAX(RegID) for each Name in an initial data set, and then use that list of RegIDs to add the values for Name and Team in a secondary data set.
Caveat (per comments from @HABO): This is premised on the assumption that RegID is a unique number (an IDENTITY column, a value from a SEQUENCE, or something of that sort). If there are duplicate values, this will fail.
The most straightforward way to accomplish that is with a sub-query. The sub-query below gets your unique RegIDs, then joins to the original table to add the other values.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, gets the list of IDs
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
GROUP BY
f2.Name
) AS sq
ON
sq.Max_RegID = f.RegID;
EDIT: Sorry. I just re-read the question. To get just the single record for the MAX(RegID), just take the GROUP BY out of the sub-query, and you'll just get the current maximum value, which you can use to find the values in the rest of the columns.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, now gets the MAX ID
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
) AS sq
ON
sq.Max_RegID = f.RegID;
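
Both variants of the join can be sketched against the sample rows from the question. The example below runs them from Python on an in-memory SQLite database (an assumption; the question is about SQL Server, but nothing in these two queries is engine-specific) and uses RegID as the column name, as in the queries above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Football (RegID INTEGER PRIMARY KEY, Name TEXT, Team TEXT)")
conn.executemany(
    "INSERT INTO Football VALUES (?, ?, ?)",
    [
        (100, "John", "Red"),
        (101, "Bill", "Blue"),
        (102, "Tom", "Green"),
        (103, "John", "Green"),
    ],
)

# Variant 1: newest registration for every Name.
latest_per_name = conn.execute(
    """
    SELECT f.RegID, f.Name, f.Team
    FROM Football AS f
    JOIN (
        SELECT MAX(f2.RegID) AS Max_RegID
        FROM Football AS f2
        GROUP BY f2.Name
    ) AS sq ON sq.Max_RegID = f.RegID
    ORDER BY f.RegID
    """
).fetchall()

# Variant 2: no GROUP BY in the sub-query, so only the single newest row.
latest_overall = conn.execute(
    """
    SELECT f.RegID, f.Name, f.Team
    FROM Football AS f
    JOIN (
        SELECT MAX(f2.RegID) AS Max_RegID
        FROM Football AS f2
    ) AS sq ON sq.Max_RegID = f.RegID
    """
).fetchall()

print(latest_per_name)
print(latest_overall)
```

John's older Red registration (RegID 100) drops out of both results, because the join only keeps rows whose RegID matches a maximum.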
Use row_number()
select * from
(SELECT
Football.RegID AS Max_RegID,
Football.Name,
Football.Team, row_number() over(partition by name order by Football.RegID desc) as rn
FROM
Football
WHERE
Football.Name = 'John')a
where rn=1
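
A sketch of the row_number() approach without the Name filter, so it returns the newest row for every player. It runs from Python on an in-memory SQLite database, which is an assumption for the sake of a runnable example and needs an SQLite build with window-function support (3.25 or later); the column is called RegID here, as in the asker's query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Football (RegID INTEGER PRIMARY KEY, Name TEXT, Team TEXT)")
conn.executemany(
    "INSERT INTO Football VALUES (?, ?, ?)",
    [
        (100, "John", "Red"),
        (101, "Bill", "Blue"),
        (102, "Tom", "Green"),
        (103, "John", "Green"),
    ],
)

# Number each player's rows from newest to oldest, then keep row 1.
newest_rows = conn.execute(
    """
    SELECT RegID, Name, Team
    FROM (
        SELECT RegID, Name, Team,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY RegID DESC) AS rn
        FROM Football
    ) AS a
    WHERE rn = 1
    ORDER BY RegID
    """
).fetchall()

print(newest_rows)
```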
You can simply edit your query as follows:
SELECT *
FROM
Football f
WHERE
f.Name = 'John' and
f.Max_RegID = (SELECT Max(f2.Max_RegID) FROM Football f2 WHERE f2.Name = 'John'
)
or, on SQL Server, simply use this:
select top 1 * from Football f
where f.Name = 'John'
order by Max_RegID desc
or, on MySQL:
select * from Football f
where f.Name = 'John'
order by Max_RegID desc
Limit 1
You need a self join:
select f1.*
from Football f inner join
Football f1
on f1.name = f.name
where f.Max_RegID = 103;
After revisiting the question, the sample data suggests a subquery instead:
select f.*
from Football f
where name = (select top (1) f1.name
from Football f1
order by f1.Max_RegID desc
);

Oracle Apex Pie Chart SQL Statement

I'm not really much into SQL & Apex, but I need one statement and would really appreciate your help with it.
The syntax of Apex pie charts is this:
SELECT link, label, value
My table looks like this simple sketch:
+------+-----------+---------+
| ID | Company | Item |
+------+-----------+---------+
| 1 | AAA |Some |
| 2 | BB |Stuff |
| 3 | BB |Not |
| 4 | CCCC |Important|
| 5 | AAA |For |
| 6 | DDDDD |Question?|
+------+-----------+---------+
I want to show the percentage of the companies.
Problem: All companies with fewer than 5 items should be combined into one slice, "other". The difficulty for me is combining the "unimportant" companies.
Until now my statement looks like this:
SELECT null link,
company label,
COUNT(ID) value FROM table HAVING COUNT(ID) > 5 GROUP BY company
Here a wonderful diagram-sketch. :D
Thank you for your ideas!
I've not got SQL Developer in front of me but this (or a close variation) should work for you:
WITH company_count
AS (
SELECT CASE
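-- analytic count per company; a plain count(*) here would need a GROUP BY
WHEN count(*) OVER (PARTITION BY company) < 5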
THEN 'Other'
ELSE company
END AS company_name,
id
FROM tablename
),
company_group
AS (
SELECT company_name,
count(id) item_count
FROM company_count
GROUP BY company_name
)
SELECT NULL AS link,
company_name AS label,
item_count AS value
FROM company_group;
Hope it helps!
Okay guys, I found the answer for my use case. It's quite similar to Ollie's answer. Thanks again for the help!
WITH sq1 AS (SELECT company, COUNT (*) AS count FROM
(SELECT CASE WHEN COUNT (*) OVER (Partition By company) > 5 THEN company
ELSE 'other' END AS company, id FROM table) GROUP BY company)
SELECT null link, company label, count value FROM sq1 ORDER BY count desc
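
The bucketing idea can be tried out on the six sample rows. The sketch below runs from Python on an in-memory SQLite database (an assumption; the thread is about Oracle Apex, and window functions need SQLite 3.25 or later). The table name items and the helper pie_rows are hypothetical, and the threshold is a parameter so the small sample can be bucketed with a value of 2 instead of 5.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
# "items" is a hypothetical name; the question only shows a placeholder.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, company TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "AAA", "Some"),
        (2, "BB", "Stuff"),
        (3, "BB", "Not"),
        (4, "CCCC", "Important"),
        (5, "AAA", "For"),
        (6, "DDDDD", "Question?"),
    ],
)


def pie_rows(connection, min_items):
    """Return (link, label, value) rows; companies with fewer than
    min_items rows are merged into a single 'other' slice."""
    return connection.execute(
        """
        SELECT NULL AS link, label, COUNT(*) AS value
        FROM (
            SELECT CASE
                       WHEN COUNT(*) OVER (PARTITION BY company) >= ?
                       THEN company
                       ELSE 'other'
                   END AS label
            FROM items
        ) AS bucketed
        GROUP BY label
        ORDER BY value DESC, label
        """,
        (min_items,),
    ).fetchall()


print(pie_rows(conn, 2))
print(pie_rows(conn, 5))
```

With a threshold of 2, CCCC and DDDDD (one item each) are merged into "other"; with the question's threshold of 5, every company in this small sample ends up there.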

Beginner SQL query with ROW_NUMBER

I'm kind of a beginner with SQL.
Right now I'm trying to write a somewhat complex SELECT, but I'm getting an error that I'm sure is a beginner mistake.
Any help is appreciated.
SELECT ROW_NUMBER() OVER (ORDER BY score) AS rank, userID, facebookID, name, score FROM (
SELECT * FROM Friends AS FR WHERE userID = ?
JOIN
Users WHERE Users.facebookID = FR.facebookFriendID
)
UNION (
SELECT * FROM User WHERE userID = ?
)
Where the 2 ? will be replaced with my user's ID.
The table User contains every user in my db, while the Friends table contains all facebookFriends for a user.
USER TABLE
userID | facebookID | name | score
FRIENDS TABLE
userID | facebookFriendID
Sample data
USER
A | facebookID1 | Alex | 100
B | facebookID2 | Mike | 200
FRIENDS
A | facebookID2
A | facebookID3
B | facebookID1
I'd like this result, since Alex and Mike are friends:
rank | userID | facebookID | name
1 | B | facebookID2 | Mike
2 | A | facebookID1 | Alex
I hope this explanation was clear.
I'm getting this error at the moment:
Error occurred executing query: Incorrect syntax near the keyword 'AS'.
You've got several issues with your query. JOINs come before WHERE clauses, and a JOIN needs an ON clause. Also, when using a UNION, both queries must return the same number of columns.
Give this a try:
SELECT ROW_NUMBER() OVER (ORDER BY score DESC) AS rank, userID, facebookID, name, score -- DESC so the highest score gets rank 1, as in the expected result
FROM (
SELECT *
FROM Users
WHERE UserId = 'A'
UNION
SELECT U.userId, u.facebookId, u.name, u.score
FROM Friends FR
JOIN Users U ON U.facebookID = FR.facebookFriendID
WHERE FR.userID = 'A' ) t
Also, the way you're using ROW_NUMBER, it really will be a row number rather than a rank. If you want rankings (with potential ties), replace ROW_NUMBER with RANK.
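
A runnable sketch of the corrected query on the sample data from the question, using Python and an in-memory SQLite database (an assumption; the error message in the question suggests SQL Server, and window functions need SQLite 3.25 or later). Two choices here are assumptions on my part: the ordering is descending so the highest score comes first, as in the expected result, and the alias is rnk rather than rank to keep clear of the RANK function name.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (userID TEXT, facebookID TEXT, name TEXT, score INTEGER)"
)
conn.execute("CREATE TABLE Friends (userID TEXT, facebookFriendID TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [("A", "facebookID1", "Alex", 100), ("B", "facebookID2", "Mike", 200)],
)
conn.executemany(
    "INSERT INTO Friends VALUES (?, ?)",
    [("A", "facebookID2"), ("A", "facebookID3"), ("B", "facebookID1")],
)

user_id = "A"

# The user's own row UNION the rows of their friends who are also users;
# both branches return the same four columns.
ranked = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY score DESC) AS rnk,
           userID, facebookID, name, score
    FROM (
        SELECT userID, facebookID, name, score
        FROM Users
        WHERE userID = ?
        UNION
        SELECT u.userID, u.facebookID, u.name, u.score
        FROM Friends AS fr
        JOIN Users AS u ON u.facebookID = fr.facebookFriendID
        WHERE fr.userID = ?
    ) AS t
    ORDER BY rnk
    """,
    (user_id, user_id),
).fetchall()

print(ranked)
```

The friend with facebookID3 has no row in Users, so the join drops it and only Mike and Alex are ranked.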

SQL - Find Duplicates, then add to another table with count

There are tons of posts on how to find duplicate rows and remove them, or list out the duplicates. In the masses of responses I've searched through on here, those are the only answers I've found. I figured I would just post my question, since it's been an hour and still no luck.
This is the example data I have
Table Name: Customers
_____________________________
ID | CompanyName
--------------
1 | Joes
2 | Wendys
3 | Kellys
4 | Ricks
5 | Wendys
6 | Kellys
7 | Kellys
I need to be able to find all the duplicates in this table, then put the results into another table that lists what the company name is, and how many duplicates it found.
For example, from the above table I should get a new table that says something like
Table Name: CustomerTotals
_______________________________
ID | CompanyName | Totals
-------------------------------
1 | Joes | 1
2 | Wendys | 2
3 | Kellys | 3
4 | Ricks | 1
-----EDIT: added after 2 responses, ran into another question------
Thanks for the responses! What about the opposite? Say I only want to add items to a new table "UniqueCustomers" from the Customers table that don't exist in the CustomerTotals table?
Try this :
INSERT INTO CustomerTotals
(CompanyName, Totals)
SELECT CompanyName, COUNT(*)
FROM Customers
GROUP BY CompanyName
Use an identity column for the ID field.
To get the duplicates, you can do:
Insert into CustomerTotals (Id, CompanyName, Totals)
Select min(Id), CompanyName, count(*) From Customers
group by CompanyName
There you have the results, keeping the minimum ID for each company name (if you really need the first ID from your original table, as shown in the expected results).
If not, you can use an identity column.
If you only want the duplicates to be inserted into the second table, use this:
INSERT INTO CustomerTotals
(CompanyName, Totals)
SELECT CompanyName, COUNT(*)
FROM Customers
GROUP BY CompanyName
HAVING Count(*) > 1
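
Both variants (all companies with their counts, and only those that occur more than once) can be sketched on the sample rows. The example runs from Python on an in-memory SQLite database, which is an assumption made so it is runnable; an INTEGER PRIMARY KEY column plays the role of the identity column mentioned above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CompanyName TEXT)")
# ID is filled in automatically, like an identity column.
conn.execute(
    "CREATE TABLE CustomerTotals "
    "(ID INTEGER PRIMARY KEY, CompanyName TEXT, Totals INTEGER)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Joes"),
        (2, "Wendys"),
        (3, "Kellys"),
        (4, "Ricks"),
        (5, "Wendys"),
        (6, "Kellys"),
        (7, "Kellys"),
    ],
)

# One row per company name together with how often it occurs.
conn.execute(
    """
    INSERT INTO CustomerTotals (CompanyName, Totals)
    SELECT CompanyName, COUNT(*)
    FROM Customers
    GROUP BY CompanyName
    ORDER BY MIN(ID)
    """
)
totals = dict(conn.execute("SELECT CompanyName, Totals FROM CustomerTotals"))

# Same aggregation, restricted to names that appear more than once.
duplicates_only = dict(
    conn.execute(
        """
        SELECT CompanyName, COUNT(*)
        FROM Customers
        GROUP BY CompanyName
        HAVING COUNT(*) > 1
        """
    )
)

print(totals)
print(duplicates_only)
```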
The above examples are fine. As for count(1) versus count(*): mainstream optimizers treat the two the same, so there is no performance reason to prefer one over the other.
I think below script is simplest...
SELECT CompanyName, COUNT(*) AS Total
INTO #tempTable
FROM Customers
GROUP BY CompanyName
HAVING Count(*) > 1