Select records where column value is unique - sql

I have a table of posts in a forum (mybb_posts, with the username of the poster).
I want all the posts posted by people who only posted once, in other words, all the rows where username is a single occurrence in the username column.
So far I am using this:
SELECT *
FROM mybb_posts
WHERE username IN
(SELECT username
FROM
(SELECT username,
count(*) COUNT
FROM `mybb_posts`
GROUP BY username) tbl1
WHERE COUNT=1)
But the three nested SELECTs look ugly.
Is there a more elegant/efficient/simple way? All the answers I have seen on SO and elsewhere focus on getting the unique ids.
This is for a MySQL database, in case you want to suggest non-standard solutions (but standard ones are preferred).

all the rows where username is a single occurrence in the username column.
This suggests window functions (available in MySQL 8.0 and later):
SELECT p.*
FROM (SELECT p.*, COUNT(*) OVER (PARTITION BY p.username) as cnt
FROM mybb_posts p
) p
WHERE cnt = 1;
As a note: You don't need two nested subqueries for your version. You can use a HAVING clause:
SELECT p.*
FROM mybb_posts p
WHERE p.username IN (SELECT p2.username
FROM mybb_posts p2
GROUP BY p2.username
HAVING COUNT(*) = 1
);
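As a quick check that the window-function query and the HAVING query select the same rows, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The rows are made-up sample data. Only the table and column names come from the question, and post_id is an assumed primary key.

```python
import sqlite3

# Hypothetical sample data; post_id is an assumed primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mybb_posts (post_id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO mybb_posts (post_id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "dave"), (6, "dave")],
)

# Window-function version: attach the per-username count to every row, keep count = 1.
WINDOW_SQL = """
SELECT p.post_id, p.username
FROM (SELECT p.*, COUNT(*) OVER (PARTITION BY p.username) AS cnt
      FROM mybb_posts p) p
WHERE cnt = 1
ORDER BY p.post_id
"""

# HAVING version: find the single-post usernames first, then fetch their rows.
HAVING_SQL = """
SELECT p.post_id, p.username
FROM mybb_posts p
WHERE p.username IN (SELECT p2.username
                     FROM mybb_posts p2
                     GROUP BY p2.username
                     HAVING COUNT(*) = 1)
ORDER BY p.post_id
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
having_rows = conn.execute(HAVING_SQL).fetchall()
print(window_rows)  # [(2, 'bob'), (4, 'carol')]
print(having_rows)  # [(2, 'bob'), (4, 'carol')]
```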

The most portable solution I can think of is NOT EXISTS with a correlated subquery. This works in most databases, including those that do not support window functions (such as MySQL 5.x versions, or MS Access). This should also be a rather efficient option.
For this, you need a primary key in your table. Assuming that it is called post_id, that would be:
select p.*
from mybb_posts p
where not exists (
select 1
from mybb_posts p1
where p1.username = p.username and p1.post_id <> p.post_id
)
For performance, you need an index on (username, post_id).
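A minimal sketch of the NOT EXISTS approach together with the suggested (username, post_id) index, run through Python's sqlite3 module on hypothetical data. The index name is made up, and EXPLAIN QUERY PLAN is SQLite's way of showing whether the correlated lookup uses it (MySQL's equivalent is EXPLAIN).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mybb_posts (post_id INTEGER PRIMARY KEY, username TEXT)")
# The suggested index (hypothetical name), so the subquery can seek on username.
conn.execute("CREATE INDEX idx_posts_username_id ON mybb_posts (username, post_id)")
conn.executemany(
    "INSERT INTO mybb_posts (post_id, username) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol")],
)

# Keep a post only if no *other* post by the same username exists.
NOT_EXISTS_SQL = """
SELECT p.post_id, p.username
FROM mybb_posts p
WHERE NOT EXISTS (
    SELECT 1
    FROM mybb_posts p1
    WHERE p1.username = p.username AND p1.post_id <> p.post_id
)
ORDER BY p.post_id
"""

single_posters = conn.execute(NOT_EXISTS_SQL).fetchall()
print(single_posters)  # [(2, 'bob'), (4, 'carol')]

# Inspect the plan; the exact wording varies between SQLite versions.
plan = conn.execute("EXPLAIN QUERY PLAN " + NOT_EXISTS_SQL).fetchall()
for row in plan:
    print(row)
```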

Related

How to query top record group conditional on the counts and strings in a second table

I call on the SQL Gods of the internet!! I so desperately need your help with this query, my livelihood depends on it. I solved it in Alteryx in like 2 minutes, but I need to write this query in SQL and I am relatively new to the language in terms of complex blending and syntax.
Your help would be so appreciated!! :) xoxox I can't begin to describe
Using SSMS, I need to use two tables, 'searches' and 'events', to query:
the TOP 2 [user]s with the highest count of unique search ids in table 'searches',
on condition that the [user]s in the list have at least 1 eventid in 'events' where [event type] starts with "great".
Here is an example of what needs to happen:
[Image: search and event tables, with the expected end result]
So the only pieces I have so far are below, but boy oh boy, please don't laugh :(
What I was trying to do is:
1. Select a table of unique users with their search counts from the searches table.
2. Inner join the table from step 1 on userid with the table described in step 3.
3. Create a table of unique user ids with counts of events whose [type] starts with "great".
4. Filter the inner-joined table for the top 2 search counts from step 1.
SELECT userid, COUNT() as searchcount
FROM searches
GROUP BY userid
INNER JOIN (SELECT userid, COUNT() as eventcount
FROM events WHERE LEFT(type, 5) = "great" AND eventcount>0 Group by userid)
ON searches.userid=events.userId
Obviously, this doesn't work at all!!! I think my structure is off and my method of filtering for "great" is wrong. Also, I don't know how to add the "top 2" clause to the search table query without affecting the inner join. This code needs to be fairly efficient, so if you have a better, more computationally efficient idea... I love you long time
SELECT TOP (2) userid, COUNT(*) AS searchcount
FROM searches
WHERE userid IN (SELECT userid FROM events WHERE LEFT(type, 5) = 'great')
GROUP BY userid
ORDER BY COUNT(*) DESC
Hope the above query serves your purpose.
I think you need EXISTS and the window function DENSE_RANK, as follows:
Select * from
(Select s.userid, count(*) as searchcount, dense_rank() over (order by count(*) desc) as rn
From searches s
Where exists
(select 1 from events e
Where e.userid = s.userid And LEFT(e.type, 5) = 'great')
Group by s.userid ) t Where rn <= 2
Note that DENSE_RANK returns more than two users when several users tie on the count.
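A runnable sketch of the "top 2 users by distinct searches, filtered by a 'great' event" idea, using Python's sqlite3 module. SQLite has neither TOP nor LEFT(), so LIMIT 2 and substr() stand in for them. The column names searchid, userid, eventid and type, and all the data, are assumptions based on the question's description. COUNT(DISTINCT ...) is used because the question asks for unique search ids.

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE searches (searchid INTEGER, userid INTEGER);
CREATE TABLE events (eventid INTEGER, userid INTEGER, type TEXT);
""")
conn.executemany(
    "INSERT INTO searches (searchid, userid) VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 10), (3, 10),   # user 10: 3 distinct searches
     (4, 20), (5, 20),                     # user 20: 2 distinct searches
     (6, 30), (7, 30), (8, 30), (9, 30),   # user 30: 4, but no "great" event
     (10, 40)],                            # user 40: 1 distinct search
)
conn.executemany(
    "INSERT INTO events (eventid, userid, type) VALUES (?, ?, ?)",
    [(100, 10, "great click"), (101, 20, "greatness"),
     (102, 30, "good"), (103, 40, "great view")],
)

# EXISTS filters users first; LIMIT 2 plays the role of SQL Server's TOP (2).
TOP2_SQL = """
SELECT s.userid, COUNT(DISTINCT s.searchid) AS searchcount
FROM searches s
WHERE EXISTS (SELECT 1
              FROM events e
              WHERE e.userid = s.userid
                AND substr(e.type, 1, 5) = 'great')
GROUP BY s.userid
ORDER BY searchcount DESC
LIMIT 2
"""

top_users = conn.execute(TOP2_SQL).fetchall()
print(top_users)  # [(10, 3), (20, 2)]
```

User 30 has the most searches but is excluded by the EXISTS filter. If ties matter, DENSE_RANK (or TOP ... WITH TIES in SQL Server) can return more than two rows.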

SQL - Querying for names that occur at least twice

So I'm trying to find a way to query a table "people" that has an attribute "name": I would like to find the names that occur at least twice, and the results should be distinct.
I was thinking of creating two alias tables, and joining on name but I can't figure it out.
Here is what I tried:
SELECT DISTINCT name
FROM people AS S1
INNER JOIN people AS S2 USING name
WHERE S2.lastname <> S2.surname
I added the surname condition to remove matches where a row is simply joined to itself (not even sure if this is correct).
Either way, this already fails because the syntax is wrong.
Would appreciate some help! Thanks in advance.
Aggregation is a simple method if you want just the names:
select name
from people
group by name
having count(*) > 1;
If you want the original rows, use window functions:
select p.*
from (select p.*, count(*) over (partition by name) as cnt
from people p
) p
where cnt >= 2;
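Both variants can be tried side by side with Python's sqlite3 module. In this sketch, id is an assumed primary key and the rows are made up; name and lastname come from the question.

```python
import sqlite3

# Hypothetical data; "id" is an assumed primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people (id, name, lastname) VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Ann", "Kim"),
     (4, "Cid", "Orr"), (5, "Bob", "Ray")],
)

# Aggregation: one row per name that occurs at least twice.
names = conn.execute(
    "SELECT name FROM people GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
).fetchall()

# Window function: keep the original rows for those names.
rows = conn.execute("""
SELECT p.id, p.name
FROM (SELECT p.*, COUNT(*) OVER (PARTITION BY name) AS cnt FROM people p) p
WHERE cnt >= 2
ORDER BY p.id
""").fetchall()

print(names)  # [('Ann',), ('Bob',)]
print(rows)   # [(1, 'Ann'), (2, 'Bob'), (3, 'Ann'), (5, 'Bob')]
```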
Simple: use EXISTS() (you only need to select from the people table once, and you don't have to use DISTINCT):
SELECT *
FROM people s1
WHERE EXISTS (SELECT *
FROM people s2
WHERE s2.name = s1.name
AND s2.lastname <> s1.lastname
);
BTW: I assume lastname vs. surname was a typo?
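One caveat with comparing on lastname: two people with the same first name and the same last name never match each other. This sketch, on hypothetical data with an assumed primary key id, compares the lastname condition with a condition on the key.

```python
import sqlite3

# Hypothetical data; "id" is an assumed primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO people (id, name, lastname) VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Ann", "Kim"),
     (4, "Cid", "Orr"), (5, "Bob", "Ray")],
)

# Comparing on lastname misses the two "Bob Ray" rows, since their last names match.
by_lastname = conn.execute("""
SELECT s1.id
FROM people s1
WHERE EXISTS (SELECT *
              FROM people s2
              WHERE s2.name = s1.name
                AND s2.lastname <> s1.lastname)
ORDER BY s1.id
""").fetchall()

# Comparing on the primary key: any *other* row with the same name counts.
by_id = conn.execute("""
SELECT s1.id
FROM people s1
WHERE EXISTS (SELECT *
              FROM people s2
              WHERE s2.name = s1.name
                AND s2.id <> s1.id)
ORDER BY s1.id
""").fetchall()

print(by_lastname)  # [(1,), (3,)]
print(by_id)        # [(1,), (2,), (3,), (5,)]
```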
select p.name, count(*) as cnt
from people p
group by p.name
having count(*) >= 2

Comparing two sum function in where clause

I want to check that the number of likes users received on all their personal pictures is at least twice the number of likes received on the group pictures in which they are tagged.
If a user is not tagged in any group photo but is tagged in a personal picture that has received at least one like, that user should be returned.
My question is:
How can I compare two SUM results,
where one sum is returned by the nested query and compared with the outer query?
Can I set an auxiliary variable to hold the sum value and compare against it?
Thanks to anyone who helps :)
Select distinct UIP.userID
From tblUserInPersonalPic UIP
where sum(UIP.numOfLikes) over (Partition by UIP.userID)*0.5 >
(Select distinct U.userID, sum(P.numOfLikes) over (Partition by U.userID)
From tblgroupPictures P left outer join
tblUserInGroupPic U On P.picNum=U.picNum
group by U.userID,P.numOfLikes,P.picNum)
It's kinda hard to know for sure, and of course I can't test my answer,
but I think you can do it with a couple of left joins, group by and having:
SELECT Personal.UserId
FROM tblUserInPersonalPic Personal
LEFT JOIN tblUserInGroupPic UserInGroup ON Personal.userID = UserInGroup.userID
LEFT JOIN tblgroupPictures GroupPictures ON UserInGroup.picNum = GroupPictures.picNum
GROUP BY Personal.userID
-- COALESCE: users with no group pictures get a NULL sum, which would otherwise filter them out
HAVING COALESCE(SUM(GroupPictures.numOfLikes), 0) * 2 < SUM(Personal.numOfLikes)
Please note: When posting SQL questions, it's always best to provide sample data as DDL + DML (CREATE TABLE + INSERT INTO statements) and the desired results, so that whoever answers can test the answer before posting it.
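One thing worth testing with sample data: joining the personal rows to the group rows before aggregating repeats each personal row once per group tag (and vice versa), which inflates both sums. This sketch uses Python's sqlite3 module. The column names userID, picNum and numOfLikes are inferred from the question's query, and the data is made up to trigger the problem. The second query aggregates each side per user first and then joins, which is one common way to avoid the fan-out.

```python
import sqlite3

# Hypothetical schema inferred from the question; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblUserInPersonalPic (userID INTEGER, picNum INTEGER, numOfLikes INTEGER);
CREATE TABLE tblUserInGroupPic (userID INTEGER, picNum INTEGER);
CREATE TABLE tblgroupPictures (picNum INTEGER, numOfLikes INTEGER);
-- User 1: 30 personal likes, tagged in three group pictures worth 18 likes in total.
INSERT INTO tblUserInPersonalPic VALUES (1, 1, 30);
INSERT INTO tblgroupPictures VALUES (100, 6), (101, 6), (102, 6);
INSERT INTO tblUserInGroupPic VALUES (1, 100), (1, 101), (1, 102);
-- User 2: 5 personal likes and no group tags.
INSERT INTO tblUserInPersonalPic VALUES (2, 2, 5);
""")

# Join first, aggregate later: user 1's 30 personal likes are summed three times (90).
JOINED_SQL = """
SELECT Personal.userID
FROM tblUserInPersonalPic Personal
LEFT JOIN tblUserInGroupPic UserInGroup ON Personal.userID = UserInGroup.userID
LEFT JOIN tblgroupPictures GroupPictures ON UserInGroup.picNum = GroupPictures.picNum
GROUP BY Personal.userID
HAVING COALESCE(SUM(GroupPictures.numOfLikes), 0) * 2 < SUM(Personal.numOfLikes)
ORDER BY Personal.userID
"""

# Aggregate each side per user, then join: the sums stay 30 and 18.
PREAGGREGATED_SQL = """
SELECT pl.userID
FROM (SELECT userID, SUM(numOfLikes) AS personalLikes
      FROM tblUserInPersonalPic
      GROUP BY userID) pl
LEFT JOIN (SELECT u.userID, SUM(g.numOfLikes) AS groupLikes
           FROM tblUserInGroupPic u
           JOIN tblgroupPictures g ON g.picNum = u.picNum
           GROUP BY u.userID) gl ON gl.userID = pl.userID
WHERE pl.personalLikes > 2 * COALESCE(gl.groupLikes, 0)
ORDER BY pl.userID
"""

joined = conn.execute(JOINED_SQL).fetchall()
preaggregated = conn.execute(PREAGGREGATED_SQL).fetchall()
print(joined)         # [(1,), (2,)]  user 1 is wrongly included
print(preaggregated)  # [(2,)]
```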
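Try using two CTEs. Also note that your second query will not work in a comparison, since it returns two columns (DISTINCT does not change that), so I changed it below so that you can get that column as well:
;WITH PersonalLikes AS
(
SELECT userID, SUM(numOfLikes) AS personalLikes
FROM tblUserInPersonalPic
GROUP BY userID
)
, GroupLikes AS
(
SELECT U.userID, SUM(P.numOfLikes) AS groupLikes
FROM tblgroupPictures P
JOIN tblUserInGroupPic U ON P.picNum = U.picNum
GROUP BY U.userID
)
SELECT pl.userID, pl.personalLikes, gl.groupLikes
FROM PersonalLikes pl
LEFT JOIN GroupLikes gl ON gl.userID = pl.userID
-- users without group pictures have no GroupLikes row, so treat their sum as 0
WHERE pl.personalLikes > 2 * COALESCE(gl.groupLikes, 0)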
You can't use window functions in a where clause. Define it in a subquery:
select *
from (
select sum(...) over (...) as Sum1
, OtherColumn
from YourTable
) sub
where Sum1 < (...your subquery...)
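A small sketch of this rule using Python's sqlite3 module: SQLite, like SQL Server, rejects a window function in WHERE, while the same expression computed in a derived table can be filtered on. YourTable and its columns are placeholders, as in the answer, and the data is hypothetical.

```python
import sqlite3

# Placeholder table and hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (userID INTEGER, numOfLikes INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(1, 3), (1, 4), (2, 1)])

# A window function directly in WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT userID FROM YourTable "
        "WHERE SUM(numOfLikes) OVER (PARTITION BY userID) > 5"
    ).fetchall()
    window_in_where_failed = False
except sqlite3.Error as exc:
    window_in_where_failed = True
    print("rejected:", exc)

# Compute the window sum in a subquery, then filter on its alias outside.
rows = conn.execute("""
SELECT DISTINCT userID, Sum1
FROM (SELECT userID, SUM(numOfLikes) OVER (PARTITION BY userID) AS Sum1
      FROM YourTable) sub
WHERE Sum1 > 5
""").fetchall()
print(rows)  # [(1, 7)]
```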

Counting from two tables according to single criteria

I am trying to do something like this:
SELECT COUNT(topic.topic_id) + COUNT(post.post_id)
FROM topic, post WHERE author_id = ?
Both tables have column author_id.
I get a 'column reference "author_id" is ambiguous' error.
How can I tell it that author_id is present in both tables?
While you could, you most probably do not want to join the two tables, since the join multiplies rows and can inflate the counts. Explanation in this related answer:
Two SQL LEFT JOINS produce incorrect result
Two subqueries would be fastest:
SELECT (SELECT COUNT(topic_id) FROM topic WHERE author_id = ?)
+ (SELECT COUNT(post_id) FROM post WHERE author_id = ?) AS result
If topic_id and post_id are defined NOT NULL in their respective tables, you can slightly simplify:
SELECT (SELECT COUNT(*) FROM topic WHERE author_id = ?)
+ (SELECT COUNT(*) FROM post WHERE author_id = ?) AS result
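The difference between the two subqueries and a plain join is easy to see on a tiny data set. In this sketch (Python's sqlite3 module, hypothetical rows), author 7 has 2 topics and 3 posts. The implicit join pairs every topic with every post, so both counts are computed over 2 x 3 = 6 rows.

```python
import sqlite3

# Hypothetical data: author 7 has 2 topics and 3 posts; author 8 has 1 of each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO topic VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO post VALUES (1, 7), (2, 7), (3, 7), (4, 8);
""")

# Two independent scalar subqueries: each table is counted on its own.
SUBQUERY_SQL = """
SELECT (SELECT COUNT(*) FROM topic WHERE author_id = ?)
     + (SELECT COUNT(*) FROM post  WHERE author_id = ?) AS result
"""

# Implicit join of both tables: every topic row pairs with every post row.
JOINED_SQL = """
SELECT COUNT(topic.topic_id) + COUNT(post.post_id)
FROM topic, post
WHERE topic.author_id = ? AND post.author_id = ?
"""

correct = conn.execute(SUBQUERY_SQL, (7, 7)).fetchone()[0]
inflated = conn.execute(JOINED_SQL, (7, 7)).fetchone()[0]
print(correct, inflated)  # 5 12
```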
If at least one of the two author_id columns is unique, a JOIN would work, too, in this case (but it is slower, and I wouldn't use it):
SELECT COUNT(t.topic_id) + COUNT(p.post_id) AS result
FROM topic t
LEFT JOIN post p USING (author_id)
WHERE t.author_id = ?;
If you want to enter the value only once, use a CTE:
WITH x AS (SELECT ? AS author_id) -- enter value here
SELECT (SELECT COUNT(*) FROM topic JOIN x USING (author_id))
+ (SELECT COUNT(*) FROM post JOIN x USING (author_id)) AS result
But be sure to understand how joins work. Read the chapter about Joined Tables in the manual.
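A sketch of the single-parameter CTE form with Python's sqlite3 module, where the `?` placeholder is bound once per execution. The tables and rows are hypothetical.

```python
import sqlite3

# Hypothetical data: author 7 has 2 topics and 3 posts; author 8 has 1 of each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, author_id INTEGER);
INSERT INTO topic VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO post VALUES (1, 7), (2, 7), (3, 7), (4, 8);
""")

# The author id is bound once, in the CTE; both subqueries join against it.
CTE_SQL = """
WITH x AS (SELECT ? AS author_id)
SELECT (SELECT COUNT(*) FROM topic JOIN x USING (author_id))
     + (SELECT COUNT(*) FROM post  JOIN x USING (author_id)) AS result
"""

total_7 = conn.execute(CTE_SQL, (7,)).fetchone()[0]
total_8 = conn.execute(CTE_SQL, (8,)).fetchone()[0]
print(total_7, total_8)  # 5 2
```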

Approach to Selecting top item matching a criteria

EDIT: My apologies, this is for MSSQL 2008.
I have a SQL problem that I run into routinely, and I normally just solve it with a nested query. I'm hoping someone can suggest a more elegant solution.
It often happens that I need to select a result set for a user, conditioned on it being the most recent, or the largest, or whatever.
For example: their complete list of pages created, but I only want the most recent name they applied to each page. The database contains many entries for each page, and only the most recent one is desired.
I've been using a nested select like:
SELECT pg.customName, pg.id
FROM (
select id, max(createdAt) as mostRecent
from pages
where userId = #UserId
GROUP BY id
) as MostRecentPages
JOIN pages pg
ON pg.id = MostRecentPages.id
AND pg.createdAt = MostRecentPages.mostRecent
Is there a better syntax to perform this selection?
Looks like you want
SELECT id, customname
FROM (SELECT id, customname,
row_number() OVER(PARTITION BY id ORDER BY createdat DESC) as pos
FROM pages
WHERE pages.userid = #UserId
) x
WHERE x.pos = 1
(I'm assuming you're using SQL Server, from the #UserId parameter. row_number() works in SQL Server 2005 and later, and to be honest the above should also work in Oracle and PostgreSQL 8.4+.)
This will select all the pages by userid and work out which is the most recent using a sort. An alternative would be something like:
SELECT id, (SELECT TOP 1 customname
FROM pages pages_inner
WHERE pages_inner.id = pages_outer.id
ORDER BY pages_inner.createdat DESC) as customname
FROM (SELECT DISTINCT id FROM pages WHERE pages.userid = #UserId) pages_outer
Which approach is better depends on how many pages rows per id you have compared to pages per userid, I guess.
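Both approaches can be compared on a small, made-up pages table with Python's sqlite3 module. SQLite has no TOP, so LIMIT 1 stands in for it in the correlated version, and ISO-8601 text dates are used because they sort chronologically. The data and the user id 5 are hypothetical.

```python
import sqlite3

# Hypothetical pages data: several rows per page id, one per rename.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER, customName TEXT, createdAt TEXT, userId INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, "Draft", "2024-01-01", 5), (1, "Home", "2024-02-01", 5),
     (2, "About", "2024-01-15", 5), (2, "About us", "2024-03-01", 5),
     (3, "Other", "2024-01-01", 6)],
)

# ROW_NUMBER: number each page's rows newest-first and keep row 1.
ROW_NUMBER_SQL = """
SELECT id, customName
FROM (SELECT id, customName,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY createdAt DESC) AS pos
      FROM pages
      WHERE userId = ?) x
WHERE x.pos = 1
ORDER BY id
"""

# Correlated alternative: one newest-name lookup per distinct page id.
CORRELATED_SQL = """
SELECT pages_outer.id,
       (SELECT pages_inner.customName
        FROM pages pages_inner
        WHERE pages_inner.id = pages_outer.id
        ORDER BY pages_inner.createdAt DESC
        LIMIT 1) AS customName
FROM (SELECT DISTINCT id FROM pages WHERE userId = ?) pages_outer
ORDER BY pages_outer.id
"""

by_row_number = conn.execute(ROW_NUMBER_SQL, (5,)).fetchall()
by_correlated = conn.execute(CORRELATED_SQL, (5,)).fetchall()
print(by_row_number)  # [(1, 'Home'), (2, 'About us')]
print(by_correlated)  # [(1, 'Home'), (2, 'About us')]
```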
I'm not sure about better, but a different syntax you could try is:
SELECT pg.customName, pg.id
FROM pages pg
WHERE userId = #UserId
AND NOT EXISTS
(
SELECT * FROM pages pg2
WHERE pg2.UserId = pg.UserId
AND pg2.id = pg.id
AND pg2.createdAt > pg.createdAt
)
Another alternative would be an OUTER JOIN as in Bill Karwin's answer here How to get all the fields of a row using the SQL MAX function?
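A sketch of the NOT EXISTS query next to an outer-join form of the same "no newer row" idea, in Python's sqlite3 module on hypothetical data. The outer-join query is my reading of the general LEFT JOIN ... IS NULL pattern, not a copy of the linked answer. Both return every row that ties for the latest createdAt.

```python
import sqlite3

# Hypothetical pages data: several rows per page id, one per rename.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER, customName TEXT, createdAt TEXT, userId INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, "Draft", "2024-01-01", 5), (1, "Home", "2024-02-01", 5),
     (2, "About", "2024-01-15", 5), (2, "About us", "2024-03-01", 5),
     (3, "Other", "2024-01-01", 6)],
)

# Keep a row only if no newer row exists for the same page and user.
NOT_EXISTS_SQL = """
SELECT pg.customName, pg.id
FROM pages pg
WHERE pg.userId = ?
  AND NOT EXISTS (SELECT *
                  FROM pages pg2
                  WHERE pg2.userId = pg.userId
                    AND pg2.id = pg.id
                    AND pg2.createdAt > pg.createdAt)
ORDER BY pg.id
"""

# Outer-join form: a row with no matching "newer" row is the latest one.
OUTER_JOIN_SQL = """
SELECT pg.customName, pg.id
FROM pages pg
LEFT JOIN pages newer
       ON newer.userId = pg.userId
      AND newer.id = pg.id
      AND newer.createdAt > pg.createdAt
WHERE pg.userId = ? AND newer.id IS NULL
ORDER BY pg.id
"""

by_not_exists = conn.execute(NOT_EXISTS_SQL, (5,)).fetchall()
by_outer_join = conn.execute(OUTER_JOIN_SQL, (5,)).fetchall()
print(by_not_exists)  # [('Home', 1), ('About us', 2)]
```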
For what database (including version)? What you posted could be MySQL, SQL Server, or Sybase...
Using:
SELECT pg.customName,
pg.id
FROM PAGES pg
JOIN (SELECT t.id,
MAX(t.createdAt) as mostRecent
FROM PAGES t
WHERE t.userid = #UserId
GROUP BY t.id) x ON x.id = pg.id
AND x.mostRecent = pg.createdAt
This is the best approach for a portable query, assuming column references are correct. But there are alternatives for limiting the data set: SQL Server uses TOP, MySQL/PostgreSQL/SQLite use LIMIT, and Oracle uses ROWNUM.
What's best depends on your data & how the respective optimizer sees it, and your needs (portable vs not). Check the explain plan for the respective database to see how efficient the query is.
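A sketch of the group-wise-max join in Python's sqlite3 module, with the derived table grouping by page id and filtering by user, so that the join on x.id resolves. The data is hypothetical, and SQLite's EXPLAIN QUERY PLAN is used as its counterpart to the explain plan mentioned above.

```python
import sqlite3

# Hypothetical pages data: several rows per page id, one per rename.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER, customName TEXT, createdAt TEXT, userId INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, "Draft", "2024-01-01", 5), (1, "Home", "2024-02-01", 5),
     (2, "About", "2024-01-15", 5), (2, "About us", "2024-03-01", 5),
     (3, "Other", "2024-01-01", 6)],
)

# Find each page's latest timestamp, then join back to fetch that row's name.
GROUPWISE_MAX_SQL = """
SELECT pg.customName, pg.id
FROM pages pg
JOIN (SELECT t.id, MAX(t.createdAt) AS mostRecent
      FROM pages t
      WHERE t.userId = ?
      GROUP BY t.id) x ON x.id = pg.id
                      AND x.mostRecent = pg.createdAt
ORDER BY pg.id
"""

latest = conn.execute(GROUPWISE_MAX_SQL, (5,)).fetchall()
print(latest)  # [('Home', 1), ('About us', 2)]

# SQLite's counterpart of an explain plan; the wording varies by version.
plan = conn.execute("EXPLAIN QUERY PLAN " + GROUPWISE_MAX_SQL, (5,)).fetchall()
for row in plan:
    print(row)
```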
Are you using Oracle? Try to see if this query that uses analytic function would work for you. (Don't have access to db right now, so can't test myself.)
SELECT DISTINCT pg.id,
FIRST_VALUE(pg.customName) OVER (PARTITION BY pg.id ORDER BY pg.createdAt DESC) AS customName
FROM pages pg
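The FIRST_VALUE idea can be checked in SQLite too, which supports the function since 3.25. This sketch uses Python's sqlite3 module and adds a hypothetical userId filter so that only one user's pages are returned. DISTINCT collapses the repeated (id, name) pairs that the window function produces for every row of a page.

```python
import sqlite3

# Hypothetical pages data: several rows per page id, one per rename.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER, customName TEXT, createdAt TEXT, userId INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?)",
    [(1, "Draft", "2024-01-01", 5), (1, "Home", "2024-02-01", 5),
     (2, "About", "2024-01-15", 5), (2, "About us", "2024-03-01", 5),
     (3, "Other", "2024-01-01", 6)],
)

# Every row of a page gets that page's newest name; DISTINCT keeps one copy.
FIRST_VALUE_SQL = """
SELECT DISTINCT pg.id,
       FIRST_VALUE(pg.customName)
           OVER (PARTITION BY pg.id ORDER BY pg.createdAt DESC) AS customName
FROM pages pg
WHERE pg.userId = ?
ORDER BY pg.id
"""

newest_names = conn.execute(FIRST_VALUE_SQL, (5,)).fetchall()
print(newest_names)  # [(1, 'Home'), (2, 'About us')]
```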
Assuming SQL Server and your Pages table like so:
CREATE TABLE Pages (
Id int IDENTITY(1, 1) PRIMARY KEY
, CustomName nvarchar(20) NOT NULL
, CreatedAt datetime NOT NULL DEFAULT GETDATE()
, UserId int references Users(Id)
)
I would do the following:
select TOP 1 p.Id as PageId
, p.CustomName
from Pages p
where p.UserId = #UserId
order by p.CreatedAt desc
Or even:
select TOP 1 p.Id as PageId
, p.CustomName
, MAX(p.CreatedAt) DateTimeCreated
from Pages p
where p.UserId = #UserId
group by p.Id
, p.CustomName
order by DateTimeCreated desc
I hope this helps! (If not, please provide further details so that we can be of more help.)
I don't know what your table looks like
Select top 1 pg.createdAt
,pg.customName
,pg.id
from pages pg
where pg.UserId = #UserId
order by pg.createdAt Desc
I need a bit more info on your table(s)