SQL getting data from 2 tables

I've got a tricky question (at least it's tricky for me): I want to sort data by comment count. My first table is called all_comments and has these columns (it has more, but they aren't essential):
comment, target_id
My second table is called our_videos and has these columns (again, there are more that aren't essential):
id, title
I want to get the count of all comments whose target_id matches an id in the second table, and sort that data by comment count. Here is an example of what I want:
TABLE #1:
id target_id
----------------
1 3
2 5
3 5
4 3
5 3
TABLE #2:
id title
-----------
1 "test"
2 "another-test"
3 "testing"
5 "......"
In other words, the row in the second table with id 3 has 3 comments and the row with id 5 has 2 comments. I want to sort the data by this comment count and get a result like this:
RESULT:
id title
----------------
3 "testing"
5 "......"
1 "test"
2 "another-test"
If I missed any important info needed for this question, just ask. Thanks for the help, peace :)

It is a very simple query, and you should definitely look at an SQL tutorial.
A naive variant would be:
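-- count a column, not *: the outer join gives a video without comments one NULL row, which count(*) would count as 1
select videos.id, videos.title, count(comments.target_id) as comment_count
from videos
left outer join
comments
on (videos.id = comments.target_id)
group by videos.id, videos.title
order by comment_count desc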
This version can have performance problems because the grouping also has to include the title column. To speed it up, we usually aggregate the comments first and join the result:
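select videos.id, videos.title, coalesce(q.cnt, 0) as comment_count
from videos
left outer join
(
-- the alias cnt is what the outer query refers to
select target_id, count(*) as cnt
from comments
group by target_id
) as q
on videos.id = q.target_id
-- coalesce gives videos without comments 0 instead of NULL, so they sort last
order by comment_count desc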
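A runnable sketch of the join-and-count idea, using Python's built-in sqlite3 module and the example data from the question. SQLite is an assumption here (the question does not name its RDBMS), and the tables are called videos and comments as in the answer rather than our_videos and all_comments. Counting c.target_id instead of * matters with a LEFT OUTER JOIN: a video without comments still produces one row of NULLs, which COUNT(*) would count as 1. The extra sort on v.id is a hypothetical tie-breaker for equal counts.

```python
import sqlite3

# In-memory database holding the example data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO videos VALUES (1, 'test'), (2, 'another-test'), (3, 'testing'), (5, '......');
    INSERT INTO comments VALUES (1, 3), (2, 5), (3, 5), (4, 3), (5, 3);
""")

# LEFT JOIN keeps videos without comments; COUNT(c.target_id) ignores the
# NULLs produced for them, so they get 0 instead of 1.
rows = conn.execute("""
    SELECT v.id, v.title, COUNT(c.target_id) AS comment_count
    FROM videos AS v
    LEFT OUTER JOIN comments AS c ON v.id = c.target_id
    GROUP BY v.id, v.title
    ORDER BY comment_count DESC, v.id
""").fetchall()

for row in rows:
    print(row)
```

Ids 1 and 2 both have 0 comments, so without some tie-breaker their relative order is not guaranteed.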

select videos.id, videos.title, isnull(cnt, 0) as cnt
from videos
left outer join
(select target_id, count(*) as cnt
from comments
group by target_id) as cnts
on videos.id = cnts.target_Id
order by isnull(cnt, 0) desc, videos.title
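The same pre-aggregation pattern as a runnable sketch, again assuming SQLite through Python's sqlite3 and the question's sample data. SQLite has no two-argument ISNULL, so the portable COALESCE stands in for SQL Server's ISNULL; otherwise the query follows the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO videos VALUES (1, 'test'), (2, 'another-test'), (3, 'testing'), (5, '......');
    INSERT INTO comments VALUES (1, 3), (2, 5), (3, 5), (4, 3), (5, 3);
""")

# Count comments once per target_id, then join that small result to videos.
# COALESCE turns the NULL count of uncommented videos into 0.
rows = conn.execute("""
    SELECT v.id, v.title, COALESCE(cnts.cnt, 0) AS cnt
    FROM videos AS v
    LEFT OUTER JOIN (
        SELECT target_id, COUNT(*) AS cnt
        FROM comments
        GROUP BY target_id
    ) AS cnts ON v.id = cnts.target_id
    ORDER BY COALESCE(cnts.cnt, 0) DESC, v.title
""").fetchall()

print(rows)
```

Because ties are broken by title, 'another-test' sorts before 'test', so ids 2 and 1 come out in the reverse of the question's listing; both orders are valid since they tie at 0.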

Some systems will let you write this even though sorting is not strictly supposed to happen on a column not included in the output. I don't necessarily recommend it, but I might argue it's the most straightforward option.
select id, title from videos
order by (select count(*) from comments where target_id = videos.id) desc, title
If you don't mind having the count in the output, it's a quick change:
select id, title,
(select count(*) from comments where target_id = videos.id) as comment_count
from videos
order by comment_count desc, title
SQL generally has a lot of options.
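Both correlated-subquery variants, sketched with SQLite via Python's sqlite3 and the question's sample data (the database choice is an assumption). SQLite accepts the subquery directly in ORDER BY. Unlike the outer-join version, COUNT(*) is safe inside the correlated subquery, because it counts only matching comment rows and returns 0 when there are none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, target_id INTEGER);
    INSERT INTO videos VALUES (1, 'test'), (2, 'another-test'), (3, 'testing'), (5, '......');
    INSERT INTO comments VALUES (1, 3), (2, 5), (3, 5), (4, 3), (5, 3);
""")

# The count is computed per video but used only for sorting.
sorted_only = conn.execute("""
    SELECT id, title FROM videos
    ORDER BY (SELECT COUNT(*) FROM comments WHERE target_id = videos.id) DESC, title
""").fetchall()

# The same subquery exposed as an output column and sorted by its alias.
with_count = conn.execute("""
    SELECT id, title,
           (SELECT COUNT(*) FROM comments WHERE target_id = videos.id) AS comment_count
    FROM videos
    ORDER BY comment_count DESC, title
""").fetchall()

print(sorted_only)
print(with_count)
```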

Related

Postgres, groupBy and count for table and relations at the same time

I have a table called 'users' that has the following structure:
id (PK) | campaign_id | createdAt
--------------------------------------------
1 | 123 | 2022-07-14T10:30:01.967Z
2 | 1234 | 2022-07-14T10:30:01.967Z
3 | 123 | 2022-07-14T10:30:01.967Z
4 | 123 | 2022-07-14T10:30:01.967Z
At the same time I have a table that tracks clicks per user:
id (PK) | user_id (FK) | createdAt
--------------------------------------------
1 | 1 | 2022-07-14T10:30:01.967Z
2 | 2 | 2022-07-14T10:30:01.967Z
3 | 2 | 2022-07-14T10:30:01.967Z
4 | 2 | 2022-07-14T10:30:01.967Z
Both of these tables contain up to millions of records, so I need the most efficient query to group the data per campaign_id.
The result I am looking for would look like this:
campaign_id | total_users | total_clicks
----------------------------------------
123 | 3 | 1
1234 | 1 | 3
Unfortunately, I have no idea how to achieve this while minding performance. Most important of all, I need to use WHERE or HAVING to limit the query to a certain time range by createdAt.
Note: PostgreSQL is not my forte, nor is SQL, but I'm learning by spending some time on your question. Have a go with an INNER JOIN between two separate SELECT statements:
SELECT * FROM
(
SELECT campaign_id, COUNT (t1."id(PK)") total_users FROM t1 GROUP BY campaign_id
) tbl1
INNER JOIN
(
SELECT campaign_id, COUNT (t2."user_id(FK)") total_clicks FROM t2 INNER JOIN t1 ON t1."id(PK)" = t2."user_id(FK)" GROUP BY campaign_id
) tbl2
USING(campaign_id)
See an online fiddle. I believe this is now also ready for a WHERE clause in both SELECT statements to filter by "createdAt". I'm pretty sure someone else will come up with something better.
Good luck.
Hope this will help you.
select u.campaign_id,
count(distinct u.id) users_count,
count(c.user_id) clicks_count
from
users u left join clicks c on u.id=c.user_id
group by 1;
See the query output here.
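A runnable sketch of the LEFT JOIN approach with the time-range filter the question asks for, using SQLite through Python's sqlite3 in place of PostgreSQL (an assumption; the SQL itself is standard). Table and column names and the rows follow the question; the range boundaries are made up for illustration, and the half-open [start, end) range is a choice, not a requirement. The click filter sits in the ON clause so that users with no clicks in the range still count toward total_users; the user filter sits in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, campaign_id INTEGER, createdAt TEXT);
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), createdAt TEXT);
    INSERT INTO users VALUES
        (1, 123, '2022-07-14T10:30:01.967Z'), (2, 1234, '2022-07-14T10:30:01.967Z'),
        (3, 123, '2022-07-14T10:30:01.967Z'), (4, 123, '2022-07-14T10:30:01.967Z');
    INSERT INTO clicks VALUES
        (1, 1, '2022-07-14T10:30:01.967Z'), (2, 2, '2022-07-14T10:30:01.967Z'),
        (3, 2, '2022-07-14T10:30:01.967Z'), (4, 2, '2022-07-14T10:30:01.967Z');
""")

def campaign_stats(conn, start, end):
    """Users and clicks per campaign, both restricted to [start, end)."""
    # ISO-8601 strings in the same format compare correctly as text.
    # Putting the click filter in WHERE instead of ON would drop users whose
    # clicks all fall outside the range, undercounting total_users.
    return conn.execute("""
        SELECT u.campaign_id,
               COUNT(DISTINCT u.id) AS total_users,
               COUNT(c.id) AS total_clicks
        FROM users AS u
        LEFT JOIN clicks AS c
               ON c.user_id = u.id
              AND c.createdAt >= ? AND c.createdAt < ?
        WHERE u.createdAt >= ? AND u.createdAt < ?
        GROUP BY u.campaign_id
        ORDER BY u.campaign_id
    """, (start, end, start, end)).fetchall()

july_14 = campaign_stats(conn, "2022-07-14T00:00:00Z", "2022-07-15T00:00:00Z")
july_15 = campaign_stats(conn, "2022-07-15T00:00:00Z", "2022-07-16T00:00:00Z")
print(july_14)
```

COUNT(DISTINCT u.id) compensates for the join repeating a user once per click. In PostgreSQL, if the columns were created with quoted mixed-case names, they must be written as "createdAt" in queries. For millions of rows, indexes on clicks(user_id) and on the createdAt columns are a plausible starting point, but the actual plan should be checked with EXPLAIN ANALYZE.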

SQL obtaining items ranked by their count(*)

I have been attempting the following query for a while and am not sure how to approach this issue.
I need to find the bands that cover the second-highest number of music styles, including all tied bands if there is a tie for second. For example, for the table band_style:
Band_id | Style
---------------------
1 Rock
2 Pop
1 Punk
3 Classical
1 Metal
2 Rock
4 Pop
4 Rap
The returned result should be:
Band_id | Num_styles
2 2
4 2
My initial attempt at a solution:
SELECT band_id, COUNT(*) AS num_styles FROM band_style
GROUP BY band_id HAVING COUNT(*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id));
So this gives me the count for all the bands with fewer styles than the maximum. Now I'd like to take ALL rows that have the maximum value of this query. I do not want to use ROWNUM or LIMIT because, from what I've experienced, they don't work well in the case of ties. I am also wondering if there is a way to wrap this in another MAX function, but I don't really see how.
Any help with this issue would be appreciated. I also think this would be useful to know, to see whether it can be applied to the 3rd, 4th highest, etc.
(Using Oracle/SQLPlus)
Assuming this is a large data file and we do not necessarily know what the "second highest count" is.
UPDATE: this almost works; it gets all bands with fewer than the maximum number of styles. But calling MAX doesn't seem to work, as the returned table still has all values of NUM except the maximum.
WITH data AS (
SELECT band_id, COUNT(*) AS NUM FROM band_style GROUP BY band_id HAVING COUNT (*) <
(SELECT MAX(c) FROM
(SELECT COUNT(band_id) AS c
FROM band_style
GROUP BY band_id)))
SELECT data.band_id, data.NUM FROM data
INNER JOIN ( SELECT band_id m, MAX(NUM) n
FROM data GROUP BY band_id
) t
ON t.band_id = data.band_id
AND t.NUM = data.NUM;
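For databases without window functions, the question's own approach can be finished by nesting one more MAX: find the largest count strictly below the overall maximum, then keep every band whose count equals it. The sketch below runs the SQL in SQLite through Python's sqlite3 (an assumption; the question targets Oracle). It uses derived tables rather than a WITH clause and omits AS before table aliases, since Oracle rejects AS there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE band_style (band_id INTEGER, style TEXT)")
conn.executemany("INSERT INTO band_style VALUES (?, ?)", [
    (1, "Rock"), (2, "Pop"), (1, "Punk"), (3, "Classical"),
    (1, "Metal"), (2, "Rock"), (4, "Pop"), (4, "Rap"),
])

# Innermost subquery: the overall maximum style count.
# Middle subquery: the largest count strictly below it (the second-highest).
# Outer query: every band whose count equals that value, so ties are all kept.
SECOND_PLACE_SQL = """
    SELECT band_id, COUNT(*) AS num_styles
    FROM band_style
    GROUP BY band_id
    HAVING COUNT(*) = (
        SELECT MAX(c) FROM (
            SELECT COUNT(*) AS c FROM band_style GROUP BY band_id
        ) per_band
        WHERE c < (
            SELECT MAX(c2) FROM (
                SELECT COUNT(*) AS c2 FROM band_style GROUP BY band_id
            ) per_band2
        )
    )
    ORDER BY band_id
"""

rows = conn.execute(SECOND_PLACE_SQL).fetchall()
print(rows)
```

This handles ties, but each further place (3rd, 4th, ...) needs another nesting level, which is where a ranking window function becomes much more convenient.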
If you have to stick with MySQL (versions before 8.0 have no window functions), this will be much more difficult. But if you can switch to MariaDB or Oracle, this should work:
with data as (
select
band_id, count(*) styles,
dense_rank() over (order by count(*) desc) place
from
table1 group by band_id)
select * from data where place=2
http://sqlfiddle.com/#!4/dc3f6/12
Your friend here is the window function dense_rank.
The output is:
BAND_ID STYLES PLACE
2 2 2
4 2 2
To avoid a misunderstanding: in this data, place 2 happens to coincide with a count of 2 styles. Here is a variant with different data:
http://sqlfiddle.com/#!4/2be32/3
Now the style count differs from the place number:
BAND_ID STYLES PLACE
4 3 2
This illustrates that dense_rank does not need to know the second-highest count value beforehand.
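The dense_rank idea generalizes to any place, which answers the question's follow-up about the 3rd or 4th highest. This sketch runs in SQLite through Python's sqlite3 (an assumption; window functions need SQLite 3.25 or later) and splits the answer's query into two steps: count per band first, then rank the counts. The function name bands_in_place is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE band_style (band_id INTEGER, style TEXT)")
conn.executemany("INSERT INTO band_style VALUES (?, ?)", [
    (1, "Rock"), (2, "Pop"), (1, "Punk"), (3, "Classical"),
    (1, "Metal"), (2, "Rock"), (4, "Pop"), (4, "Rap"),
])

def bands_in_place(conn, place):
    """All bands whose style count ranks `place` (1 = most styles), ties included."""
    # DENSE_RANK gives equal counts the same rank and leaves no gaps,
    # so place N is always the N-th highest distinct count.
    return conn.execute("""
        WITH counts AS (
            SELECT band_id, COUNT(*) AS styles
            FROM band_style
            GROUP BY band_id
        ),
        ranked AS (
            SELECT band_id, styles,
                   DENSE_RANK() OVER (ORDER BY styles DESC) AS place
            FROM counts
        )
        SELECT band_id, styles FROM ranked WHERE place = ? ORDER BY band_id
    """, (place,)).fetchall()

for place in (1, 2, 3):
    print(place, bands_in_place(conn, place))
```

RANK would leave gaps after ties (1, 2, 2, 4), so "third place" could come back empty; DENSE_RANK avoids that, which is why it fits this question.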

SQL - Removing Duplicate without 'hard' coding?

Here's my scenario.
I have a table with 3 columns I want to return within a stored procedure: email, name and id. id must be 3 or 4, and each email must appear only once, as some users have multiple entries.
I have a SELECT statement as follows:
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
OK, fairly simple, but some users have entries with both 3 and 4, so they appear twice. If they appear twice, I want only the entry with id 4 to remain. I'll give another example below, as it's hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy entry with id 3. Is there any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
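A runnable sketch of the GROUP BY / MAX(id) approach with the question's sample rows, assuming SQLite through Python's sqlite3. The table name users is hypothetical (the question's placeholder "table" is a reserved word), and the aggregate is aliased max_id to keep it distinct from the id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "users" is a hypothetical table name standing in for the question's "table".
conn.execute("CREATE TABLE users (email TEXT, name TEXT, id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("jimmy#domain.com", "jimmy", 4), ("brian#domain.com", "brian", 4),
    ("kevin#domain.com", "kevin", 3), ("jimmy#domain.com", "jimmy", 3),
])

# One row per (email, name); MAX(id) keeps 4 over 3 when a user has both.
rows = conn.execute("""
    SELECT email, name, MAX(id) AS max_id
    FROM users
    WHERE id IN (3, 4)
    GROUP BY email, name
    ORDER BY email
""").fetchall()

print(rows)
```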
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email, Name;
Sometimes using Having Count(*) > 1 may be useful to find duplicated records.
select Email, count(*) from table group by Email having count(*) > 1
or
select Email, count(*) from table group by Email having count(*) > 1 and max(id) > 3
The solution provided before with the select MAX(ID) from table sounds good for this case.
This may be an alternative solution.
What RDBMS are you using? This will return only one "Jimmy", using RANK():
SELECT X.email, X.name, X.id
FROM (
SELECT
email, name, id, RANK() OVER (PARTITION BY email ORDER BY id DESC) AS COUNTER
FROM SO_Table
WHERE id IN (3, 4)
) X
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
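The window-function variant as a runnable sketch, assuming SQLite 3.25 or later through Python's sqlite3 and reusing the answer's table name SO_Table. It uses ROW_NUMBER rather than RANK, a deliberate swap: if the same (email, id) pair were stored twice, RANK would give both copies rank 1 and return both, while ROW_NUMBER numbers them 1 and 2 and keeps exactly one row per email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SO_Table (email TEXT, name TEXT, id INTEGER)")
conn.executemany("INSERT INTO SO_Table VALUES (?, ?, ?)", [
    ("jimmy#domain.com", "jimmy", 4), ("brian#domain.com", "brian", 4),
    ("kevin#domain.com", "kevin", 3), ("jimmy#domain.com", "jimmy", 3),
])

# Number each email's rows from the highest id down and keep the first one.
rows = conn.execute("""
    SELECT email, name, id
    FROM (
        SELECT email, name, id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id DESC) AS rn
        FROM SO_Table
        WHERE id IN (3, 4)
    ) AS x
    WHERE rn = 1
    ORDER BY email
""").fetchall()

print(rows)
```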

Select a subgroup of records by one distinct column

Sorry if this has been answered before, but all the related questions didn't quite seem to match my purpose.
I have a table that looks like the following:
ID POSS_PHONE CELL_FLAG
=======================
1 111-111-1111 0
2 222-222-2222 0
2 333-333-3333 1
3 444-444-4444 1
I want to select only distinct ID values for an insert, but I don't care which specific row gets pulled out of the duplicates.
For example, a valid SELECT would return:
1 111-111-1111 0
2 222-222-2222 0
3 444-444-4444 1
Before I had the CELL_FLAG column, I was just using an aggregate function like this:
SELECT ID, MAX(POSS_PHONE)
FROM TableA
GROUP BY ID
But I can't do:
SELECT ID, MAX(POSS_PHONE), MAX(CELL_FLAG)...
because I would lose integrity within the row, correct?
I've seen some similar examples using CTEs, but once again, nothing that quite fit.
So maybe this is solvable by a CTE or some type of self-join subquery? I'm at a block right now, so I can't see any other solutions.
Just get your aggregation in a subquery and join to it:
SELECT a.ID, sub.Poss_Phone, CELL_FLAG
FROM TableA as a
INNER JOIN (SELECT ID, MAX(POSS_PHONE) as [Poss_Phone]
FROM TableA
GROUP BY ID) Sub
ON Sub.ID = a.ID and SUB.Poss_Phone = A.Poss_Phone
This will keep integrity between your non-aggregated fields but still give you the MAX(Poss_Phone) per ID.
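A runnable sketch of the join-back approach with the question's rows, assuming SQLite through Python's sqlite3. The alias max_phone is an illustrative name chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER, POSS_PHONE TEXT, CELL_FLAG INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", [
    (1, "111-111-1111", 0), (2, "222-222-2222", 0),
    (2, "333-333-3333", 1), (3, "444-444-4444", 1),
])

# The subquery picks one phone per ID; joining back on (ID, phone) recovers
# the CELL_FLAG stored in that same row, so the row's values stay together.
rows = conn.execute("""
    SELECT a.ID, a.POSS_PHONE, a.CELL_FLAG
    FROM TableA AS a
    INNER JOIN (
        SELECT ID, MAX(POSS_PHONE) AS max_phone
        FROM TableA
        GROUP BY ID
    ) AS sub ON sub.ID = a.ID AND sub.max_phone = a.POSS_PHONE
    ORDER BY a.ID
""").fetchall()

print(rows)
```

For ID 2 this keeps 333-333-3333 (the larger string) rather than the row shown in the question's example, which is fine since the question does not care which duplicate is kept. If one ID stored the same phone twice, both copies would be returned.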

How can you get a histogram of counts from a join table without using a subquery?

I have a lot of tables that look like this: (id, user_id, object_id). I am often interested in the question "how many users have one object? how many have two? etc." and would like to see the distribution.
The obvious answer to this looks like:
select x.ucount, count(*)
from (select count(*) as ucount from objects_users group by user_id) as x
group by x.ucount
order by x.ucount;
This produces results like:
ucount | count
-------|-------
1 | 15
2 | 17
3 | 23
4 | 104
5 | 76
7 | 12
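The subquery form of the histogram, run in SQLite through Python's sqlite3 (an assumption) against small hypothetical data: user 10 has one object, users 20 and 30 have two each, and user 40 has three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects_users (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER)")
# Hypothetical sample data.
conn.executemany("INSERT INTO objects_users (user_id, object_id) VALUES (?, ?)", [
    (10, 1),
    (20, 1), (20, 2),
    (30, 1), (30, 3),
    (40, 1), (40, 2), (40, 3),
])

# Inner query: objects per user. Outer query: how many users share each count.
histogram = conn.execute("""
    SELECT x.ucount, COUNT(*) AS users
    FROM (SELECT COUNT(*) AS ucount FROM objects_users GROUP BY user_id) AS x
    GROUP BY x.ucount
    ORDER BY x.ucount
""").fetchall()

print(histogram)
```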
Using a subquery here feels inelegant to me and I'd like to figure out how to produce the same result without. Further, if the question you're trying to ask is slightly more complicated it gets messy passing more information out of the subquery. For example, if you want the data further grouped by the user's creation date:
select
x.ucount,
(select cdate from users where id = x.user_id) as cdate,
count(*)
from (
select user_id, count(*) as ucount
from objects_users group by user_id
) as x
group by cdate, x.ucount
order by cdate, x.ucount;
Is there some way to avoid the explosion of subqueries? I suppose in the end my objection is aesthetic, but it makes the queries hard to read and hard to write.
I think a subquery is exactly the appropriate way to do this, regardless of your RDBMS. Why would it be inelegant?
For the second query, just join the users table like this:
SELECT
x.ucount,
u.cdate,
COUNT(*)
FROM (
SELECT
user_id,
COUNT(*) AS ucount
FROM objects_users
GROUP BY user_id
) AS x
LEFT JOIN users AS u
ON x.user_id = u.id
GROUP BY u.cdate, x.ucount
ORDER BY u.cdate, x.ucount
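The answer's join version as a runnable sketch, assuming SQLite through Python's sqlite3 and hypothetical data: users 10 and 20 signed up on 2020-01-01, users 30 and 40 on 2020-01-02, and they own 1, 2, 2 and 3 objects respectively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, cdate TEXT);
    CREATE TABLE objects_users (id INTEGER PRIMARY KEY, user_id INTEGER, object_id INTEGER);
    -- Hypothetical sample data.
    INSERT INTO users VALUES (10, '2020-01-01'), (20, '2020-01-01'), (30, '2020-01-02'), (40, '2020-01-02');
    INSERT INTO objects_users (user_id, object_id) VALUES
        (10, 1), (20, 1), (20, 2), (30, 1), (30, 2), (40, 1), (40, 2), (40, 3);
""")

# Count objects per user once, then join users to pick up cdate
# instead of running a correlated subquery per row.
rows = conn.execute("""
    SELECT x.ucount, u.cdate, COUNT(*) AS users
    FROM (
        SELECT user_id, COUNT(*) AS ucount
        FROM objects_users
        GROUP BY user_id
    ) AS x
    LEFT JOIN users AS u ON x.user_id = u.id
    GROUP BY u.cdate, x.ucount
    ORDER BY u.cdate, x.ucount
""").fetchall()

for row in rows:
    print(row)
```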