SQL Query With Max Value from Child Table

Three pertinent tables: tracks (music tracks), users, and follows.
The follows table is a many-to-many relationship relating users (followers) to users (followees).
I'm looking for this as a final result:
<track_id>, <user_id>, <most popular followee>
The first two columns are simple and come from the relationship between tracks and users. The third is my problem. I can join with the follows table and get all of the followees that each user follows, but how do I get only the followee with the highest number of follows?
Here are the tables with their pertinent columns:
tracks: id, user_id (fk to users.id), song_title
users: id
follows: followee_id (fk to users.id), follower_id (fk to users.id)
Here's some sample data:
TRACKS
1, 1, Some song title
USERS
1
2
3
4
FOLLOWS
2, 1
3, 1
4, 1
3, 4
4, 2
4, 3
DESIRED RESULT
1, 1, 4
For the desired result, the third field is 4 because, as you can see in the FOLLOWS table, user 4 has the highest number of followers.
I and a few great minds around me are still scratching our heads.

So I threw this into LINQPad because I'm better with LINQ.
Tracks
    .Where(t => t.TrackId == 1)
    .Select(t => new {
        TrackId = t.TrackId,
        UserId = t.UserId,
        MostPopularFolloweeId = Followers
            .GroupBy(f => f.FolloweeId)
            .OrderByDescending(g => g.Count())
            .FirstOrDefault()
            .Key
    });
The resulting SQL query was the following (@p0 being the track id):
-- Region Parameters
DECLARE @p0 Int = 1
-- EndRegion
SELECT [t0].[TrackId], [t0].[UserId], (
    SELECT [t3].[FolloweeId]
    FROM (
        SELECT TOP (1) [t2].[FolloweeId]
        FROM (
            SELECT COUNT(*) AS [value], [t1].[FolloweeId]
            FROM [Followers] AS [t1]
            GROUP BY [t1].[FolloweeId]
        ) AS [t2]
        ORDER BY [t2].[value] DESC
    ) AS [t3]
) AS [MostPopularFolloweeId]
FROM [Tracks] AS [t0]
WHERE [t0].[TrackId] = @p0
That outputs the expected response, and should be a start to a cleaner query.
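Hand-translating that generated query back to the question's schema gives something like the following. This is a sketch, not a tested answer: it assumes SQL Server (to match the TOP syntax above), it uses the tracks/follows tables from the question, and, like the LINQ, it treats the "most popular followee" as simply the most-followed user overall.
select t.id as track_id,
       t.user_id,
       mp.followee_id as most_popular_followee
from tracks t
cross join (
    -- the single most-followed user across all of follows
    select top (1) followee_id
    from follows
    group by followee_id
    order by count(*) desc
) as mp
where t.id = 1;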

This sounds like an aggregation query with row_number(). I'm a little confused about how all the joins come together:
select t.*
from (select t.id, f.followee_id, count(*) as cnt,
             row_number() over (partition by t.id order by count(*) desc) as seqnum
      from follows f join
           tracks t
           on f.follower_id = t.user_id
      group by t.id, f.followee_id
     ) t
where seqnum = 1;

Related

SQL - Count how many unique IDs would be w/o a category if "category bucket" was removed

I am using Snowflake for this SQL question if there are any unique functions I can use, please help me out!
I have a data set with unique ids, other attributes that aren't important, and then a list of categories (~22) each unique id could fall into (denoted by a 1 if it's in the category and 0 if not).
I am trying to figure out how to write something where I could see, for each category, whether removing that category would leave any of the unique ids without any category at all, and count how many ids would be left category-less.
Example below: unique id Jshshsv is only in CatAA, but id Hairbdb is in CatY and CatAA. If CatAA was dropped, how many ids would be left with no category?
UniqueID | Sum across Categories | CatX | CatY | CatZ | CatAA
---------|-----------------------|------|------|------|------
Hairbdb  | 2                     | 0    | 1    | 0    | 1
Jshshsv  | 1                     | 0    | 0    | 0    | 1
For some reason I just cannot figure out how to do this in a manageable way in sql with so many category buckets. Any tips or things to try would be appreciated.
So if your data is in the form of pairs, with repeats:
ID|Cat
--|--
Hairbdb|CatY
Hairbdb|CatY
Hairbdb|CatAA
Jshshsv|CatAA
Jshshsv|CatAA
The following SQL can be used to find the categories that are the single match for an ID.
WITH data AS (
SELECT * FROM VALUES
('Hairbdb','CatY'),
('Hairbdb','CatAA'),
('Jshshsv','CatAA')
v(id, cat)
), dist_data AS (
SELECT DISTINCT id, cat FROM data
), cat_counts AS (
SELECT id, count(distinct cat) c_cat
FROM data
GROUP BY 1
HAVING c_cat = 1
)
SELECT a.cat, a.id
FROM dist_data AS a
JOIN cat_counts AS b
ON b.id = a.id;
This works because you first count, per id, how many categories the id is in, and then join the distinct data against the ids that are in only one category, which gives you the id & cat:
CAT   | ID
------|--------
CatAA | Jshshsv
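To go from those single-category ids to the number the question actually asks for ("if CatAA was dropped, how many ids would be left with no category?"), one more aggregation over the same CTEs should do it. This is a sketch building on the answer above, not part of the original answer:
WITH data AS (
    SELECT * FROM VALUES
        ('Hairbdb','CatY'),
        ('Hairbdb','CatAA'),
        ('Jshshsv','CatAA')
        v(id, cat)
), cat_counts AS (
    SELECT id, COUNT(DISTINCT cat) AS c_cat
    FROM data
    GROUP BY 1
    HAVING c_cat = 1
)
-- per category: how many ids have that category as their only one,
-- i.e. how many ids would be left with no category if it were dropped
SELECT d.cat, COUNT(DISTINCT d.id) AS ids_left_without_category
FROM data AS d
JOIN cat_counts AS c
    ON c.id = d.id
GROUP BY 1;
For the sample data this returns CatAA with a count of 1 (only Jshshsv).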
If your data is in a wide format (like how you presented it), you can turn it into that pair form via UNPIVOT like so:
WITH data AS (
SELECT * FROM VALUES
('Hairbdb',0,1,0,1),
('Jshshsv',0,0,0,1)
v(id, catx, caty, catz, cataa )
)
SELECT id, cat from data unpivot(catv for cat in (catx, caty, catz, cataa))
WHERE catv = 1;
giving:
ID      | CAT
--------|------
Hairbdb | CATY
Hairbdb | CATAA
Jshshsv | CATAA
But if it's in your form with duplicates removed you could just use a WHERE clause:
WITH data AS (
SELECT * from values
('Hairbdb', 0, 1, 0, 1),
('Jshshsv', 0, 0, 0, 1)
v(UniqueID, CatX,CatY,CatZ, CatAA)
)
SELECT UniqueID,
CatX+CatY+CatZ+CatAA as "Sum across Categories",
CatX,
CatY,
CatZ,
CatAA
FROM data
WHERE "Sum across Categories" = 1;
So another variation: if you have many rows per id and the category allocation is not the same across the set, you can use COUNT_IF with a greater-than-zero test to turn the data into an "is in this category at all" flag per column, then use a HAVING clause to filter out the ids that are in more than one column.
WITH data AS (
SELECT * FROM VALUES
('Hairbdb',0,1,0,1),
('Hairbdb',1,1,0,1),
('Hairbdb',0,1,1,1),
('Jshshsv',0,0,0,1),
('Jshshsv',0,0,0,1)
v(id, catx, caty, catz, cataa )
)
SELECT id,
COUNT_IF(catx=1)>0 AS catx_a,
COUNT_IF(caty=1)>0 AS caty_a,
COUNT_IF(catz=1)>0 AS catz_a,
COUNT_IF(cataa=1)>0 AS cataa_a
FROM data
GROUP BY 1
HAVING catx_a::int + caty_a::int + catz_a::int + cataa_a::int = 1;
If you are storing the categories in columns (though that is not a good design), you could try this:
SELECT UniqueID , sum(CatX+CatY+CatZ+CatAA) over (partition by UniqueID) as "Sum across Categories",
CatX, CatY, CatZ, CatAA FROM (
SELECT 'Hairbdb' as UniqueID, 0 as CatX, 1 as CatY, 0 as CatZ, 1 as CatAA from dual
UNION ALL
SELECT 'Jshshsv', 0,0,0,1 from dual
);

Get Items with smallest vote count including 0 in postgresql

I am working on a project where I need to get the 2 items with the least amount of votes. I have 2 tables, an item table and a votes table with a foreign key of ItemId.
I have this query:
SELECT id FROM (
SELECT "ItemId" AS id,
count("ItemId") AS total
FROM "Votes"
WHERE "ItemId" IN (
SELECT id FROM "Items"
WHERE date("Items"."createdAt") = date('2015-05-26 18:30:00.565+00')
AND "Items"."region" = 'west'
)
GROUP BY "ItemId" ORDER BY total LIMIT 2
) x;
Which in some respects is fine but it doesn't include the Items with the count being null or 0. Is there a better way to do this?
Thanks. Please let me know if you need more info.
Postgresql: 9.4
something like this should work:
SELECT id,
coalesce((SELECT count(*) FROM "Votes" WHERE "ItemId" = "Items".id), 0) as total
FROM "Items"
WHERE date("Items"."createdAt") = date('2015-05-26 18:30:00.565+00')
AND "Items"."region" = 'west'
ORDER BY total LIMIT 2
If an item has not been voted for, then the "Votes" table will not return anything for it and therefore the main query does not display the item at all.
You need to select from "Items" and then LEFT JOIN to "Votes" grouped by "ItemId" and the count of votes for it. Like this, all the items will be considered, also those for which no votes have been cast. Use the coalesce() function to convert NULLs to 0:
SELECT "Items".id, coalesce(x.total, 0) AS cnt
FROM "Items"
LEFT JOIN (
SELECT "ItemId" AS id, count("ItemId") AS total
FROM "Votes"
GROUP BY "ItemId") x USING (id)
WHERE date("Items"."createdAt") = '2015-05-26'::date
AND "Items"."region" = 'west'
ORDER BY cnt
LIMIT 2;

SQL to Paginate Data Where Pagination Starts at a Given Primary Key

Edit: The original example I used had an int for the primary key, when in fact my primary key is a varchar containing a UUID as a string. I've updated the question below to reflect this.
Caveat: Solution must work on postgres.
Issue: I can easily paginate data when starting from a known page number or index into the list of results to paginate but how can this be done if all I know is the primary key of the row to start from. Example say my table has this data
TABLE: article
======================================
id categories content
--------------------------------------
B7F79F47 local a
6cb80450 local b
563313df local c
9205AE5A local d
E88F7520 national e
5ab669a5 local f
fb047cf6 local g
591c6b50 national h
======================================
Given an article primary key of '9205AE5A' (article.id == '9205AE5A'), and given that the categories column must contain 'local', what SQL can I use to return a result set that includes the articles on either side of this one if it were paginated, i.e. the returned result should contain 3 items (previous, current, next articles):
('563313df','local','c'),('9205AE5A','local','d'),('5ab669a5','local','f')
Here is my example setup:
-- setup test table and some dummy data
create table article (
id varchar(36),
categories varchar(256),
content varchar(256)
);
insert into article values
('B7F79F47', 'local', 'a'),
('6cb80450', 'local', 'b'),
('563313df', 'local', 'c'),
('9205AE5A', 'local', 'd'),
('E88F7520', 'national', 'e'),
('5ab669a5', 'local', 'f'),
('fb047cf6', 'local', 'g'),
('591c6b50', 'national', 'h');
I want to paginate the rows in the article table, but the starting point I have is the 'id' of an article. In order to provide "Previous Article" and "Next Article" links on the rendered page, I also need the articles that would come on either side of the article whose id I know.
On the server side I could run my pagination sql and iterate through each result set to find the index of the given item. See the following inefficient pseudo code / sql to do this:
page = 0;
resultsPerPage = 10;
articleIndex = 0;
do {
    resultSet = select * from article where categories like '%local%' order by content limit resultsPerPage offset (page * resultsPerPage);
    for (result in resultSet) {
        if (result.id == '9205AE5A') {
            // we have found the article's index ('articleIndex') in the paginated list.
            // Now we can do a normal pagination to return the list of 3 items starting at the article prior to the one found
            return select * from article where categories like '%local%' order by content limit 3 offset (articleIndex - 1);
        }
        articleIndex++;
    }
    page++;
} while (resultSet.length > 0);
This is horrendously slow if the given article is way down the paginated list. How can this be done without the ugly while+for loops?
Edit 2: I can get the result using two sql calls
SELECT 'CurrentArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE id='9205AE5A'
ORDER BY content ASC
LIMIT 1 OFFSET 0
From that result returned e.g.
('CurrentArticle', 4, '9205AE5A', 'local', 'd')
I can get the RowNum value (4) and then run the sql again to get RowNum+1 (5) and RowNum-1 (3)
SELECT 'PrevNextArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE RowNum in (3, 5)
ORDER BY content ASC
LIMIT 2 OFFSET 0
with result
('PrevNextArticle', 3, '563313df', 'local', 'c'),
('PrevNextArticle', 5, '5ab669a5', 'local', 'f')
It would be nice to do this in one efficient sql call though.
If the only information about the surrounding articles shown on the page is "Next" and "Previous", there is no need to get their rows in advance. When the user chooses "Previous" or "Next", use these queries (SQL Fiddle):
-- Previous
select *
from article
where categories = 'local' and id < 3
order by id desc
limit 1
;
-- Next
select *
from article
where categories = 'local' and id > 3
order by id
limit 1
;
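Not part of the original answer: since the question was edited so that id is a UUID string and the listing is ordered by content, the same previous/next idea would have to key on content rather than on id. A sketch, assuming content values are unique within the 'local' set:
-- Previous
select a.*
from article a
where a.categories like '%local%'
  and a.content < (select content from article where id = '9205AE5A')
order by a.content desc
limit 1;
-- Next
select a.*
from article a
where a.categories like '%local%'
  and a.content > (select content from article where id = '9205AE5A')
order by a.content
limit 1;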
If it is necessary to get information about the previous and next articles: SQL Fiddle
with ordered as (
select
id, content,
row_number() over(order by content) as rn
from article
where categories = 'local'
), rn as (
select rn
from ordered
where id = '9205AE5A'
)
select
o.id,
o.content,
o.rn - rn.rn as rn
from ordered o cross join rn
where o.rn between rn.rn -1 and rn.rn + 1
order by o.rn
The articles will have rn -1, 0, 1, if they exist.
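For the sample data above, that should give something like:
('563313df', 'c', -1),
('9205AE5A', 'd', 0),
('5ab669a5', 'f', 1)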
Check whether the following query solves your issue. I passed the id in the filter as well, along with the category:
SELECT * FROM
(
select (1 + row_number() OVER(Order BY id ASC)) AS RowNo,* from article where categories like '%local%' and id>=3
UNION
(SELECT 1,* FROM article where categories like '%local%' and id<3 ORDER BY id DESC LIMIT 1)
) AS TEMP
WHERE
RowNo between 1 and (1+10-1)
ORDER BY
RowNo
I think this query will give you the result you want:
(SELECT *, 2 AS ordering from article where categories like '%local%' AND id = 3 LIMIT 1)
UNION
(SELECT *, 1 AS ordering from article where categories like '%local%' AND id < 3 ORDER BY id DESC LIMIT 1 )
UNION
(SELECT *, 3 AS ordering from article where categories like '%local%' AND id > 3 ORDER BY id ASC LIMIT 1 )

UPDATE FROM subquery using the same table in subquery's WHERE

I have 2 integer fields in a table "user": leg_count and leg_length. The first one stores the number of legs of a user and the second one their total length.
Each leg that belongs to a user is stored in a separate table, since a typical internet user can have anywhere from zero to infinity legs:
CREATE TABLE legs (
user_id int not null,
length int not null
);
I want to recalculate the statistics for all users in one query, so I try:
UPDATE users SET
leg_count = subquery.count, leg_length = subquery.length
FROM (
SELECT COUNT(*) as count, SUM(length) as length FROM legs WHERE legs.user_id = users.id
) AS subquery;
and get "subquery in FROM cannot refer to other relations of same query level" error.
So I have to do
UPDATE users SET
leg_count = (SELECT COUNT(*) FROM legs WHERE legs.user_id = users.id),
leg_length = (SELECT SUM(length) FROM legs WHERE legs.user_id = users.id)
which makes the database perform two SELECTs for each row, although the required data could be calculated in one SELECT:
SELECT COUNT(*), SUM(length) FROM legs;
Is it possible to optimize my UPDATE query to use only one SELECT subquery?
I use PostgreSQL, but I believe the solution exists for any SQL dialect.
TIA.
I would do:
WITH stats AS
( SELECT COUNT(*) AS cnt
, SUM(length) AS totlength
, user_id
FROM legs
GROUP BY user_id
)
UPDATE users
SET leg_count = cnt, leg_length = totlength
FROM stats
WHERE stats.user_id = users.id
You could use PostgreSQL's extended update syntax:
update users as u
set leg_count = aggr.cnt
, leg_length = aggr.length
from (
select legs.user_id
, count(*) as cnt
, sum(length) as length
from legs
group by
legs.user_id
) as aggr
where u.id = aggr.user_id
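A side note not raised in the thread: with either query, a user who has no rows in legs is never matched by the join, so their leg_count and leg_length are left as they were rather than being reset to 0. If that matters, a LEFT JOIN variant of the same idea could look like this (a sketch):
update users as u
set leg_count  = coalesce(aggr.cnt, 0),
    leg_length = coalesce(aggr.total, 0)
from users u2
left join (
    -- per-user totals; users with no legs fall out of this subquery
    select user_id, count(*) as cnt, sum(length) as total
    from legs
    group by user_id
) as aggr on aggr.user_id = u2.id
where u.id = u2.id;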

How do I select the most recent entity of its type in a related entity list in T-SQL or Linq to Sql?

I have a table full of Actions. Each Action is done by a certain User at a certain DateTime. So it has 4 fields: Id, UserId, ActionId, and ActionDate.
At first, I was just reporting the top 10 most recent Actions like this:
(from a in db.Action
orderby a.ActionDate descending
select a).Take(10);
That is simple and it works. But the report is less useful than I thought. This is because some user might take 10 actions in a row and hog the whole top 10 list. So I would like to report the single most recent action taken for each of the top 10 most recently active users.
From another question on SO, I have gotten myself most of the way there. It looks like I need the "group" feature. If I do this:
from a in db.Action
orderby a.ActionDate descending
group a by a.UserId into g
select g;
When I run it in LINQPad, I get an IOrderedQueryable<IGrouping<Int32,Action>> result set with one group for each user. However, it is showing ALL the actions taken by each user, and the result set is hierarchical, whereas I would like it to be flat.
So if my Action table looks like this
Id UserId ActionId ActionDate
1 1 1 2010/01/09
2 1 63 2010/01/10
3 2 1 2010/01/03
4 2 7 2010/01/06
5 3 11 2010/01/07
I want the query to return records 2, 5, and 4, in that order. This shows me, for each user, the most recent action taken by that user, and all reported actions are in order, with the most recent at the top. So I would like to see:
Id UserId ActionId ActionDate
2 1 63 2010/01/10
5 3 11 2010/01/07
4 2 7 2010/01/06
EDIT:
I am having a hard time expressing this in T-SQL, as well. This query gets me the users and their last action date:
select
a.UserId,
max(a.ActionDate) as LastAction
from
Action as a
group by
a.UserId
order by
LastAction desc
But how do I access the other information that is attached to the record where the max ActionDate was found?
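Not from the original answers, but one common T-SQL way to carry the rest of the row along with the max ActionDate is ROW_NUMBER() partitioned by user. A sketch against the Action table described above:
select top (10)
       a.Id, a.UserId, a.ActionId, a.ActionDate
from (
    -- number each user's actions, newest first
    select *,
           row_number() over (partition by UserId order by ActionDate desc) as rn
    from Action
) as a
where a.rn = 1               -- keep only each user's most recent action
order by a.ActionDate desc;  -- most recently active users first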
EDIT2: I have been refactoring and Action is now called Read, but everything else is the same. I have adopted Frank's solution and it is as follows:
(from u in db.User
join r in db.Read on u.Id equals r.UserId into allRead
where allRead.Count() > 0
let lastRead = allRead.OrderByDescending(r => r.ReadDate).First()
orderby lastRead.ReadDate descending
select new ReadSummary
{
Id = u.Id,
UserId = u.Id,
UserNameFirstLast = u.NameFirstLast,
ProductId = lastRead.ProductId,
ProductName = lastRead.Product.Name,
SegmentCode = lastRead.SegmentCode,
SectionCode = lastRead.SectionCode,
ReadDate = lastRead.ReadDate
}).Take(10);
This turns into the following:
exec sp_executesql N'SELECT TOP (10) [t12].[Id], [t12].[ExternalId], [t12].[FirstName], [t12].[LastName], [t12].[Email], [t12].[DateCreated], [t12].[DateLastModified], [t12].[DateLastLogin], [t12].[value] AS [ProductId], [t12].[value2] AS [ProductName], [t12].[value3] AS [SegmentCode], [t12].[value4] AS [SectionCode], [t12].[value5] AS [ReadDate2]
FROM (
SELECT [t0].[Id], [t0].[ExternalId], [t0].[FirstName], [t0].[LastName], [t0].[Email], [t0].[DateCreated], [t0].[DateLastModified], [t0].[DateLastLogin], (
SELECT [t2].[ProductId]
FROM (
SELECT TOP (1) [t1].[ProductId]
FROM [dbo].[Read] AS [t1]
WHERE [t0].[Id] = [t1].[UserId]
ORDER BY [t1].[ReadDate] DESC
) AS [t2]
) AS [value], (
SELECT [t5].[Name]
FROM (
SELECT TOP (1) [t3].[ProductId]
FROM [dbo].[Read] AS [t3]
WHERE [t0].[Id] = [t3].[UserId]
ORDER BY [t3].[ReadDate] DESC
) AS [t4]
INNER JOIN [dbo].[Product] AS [t5] ON [t5].[Id] = [t4].[ProductId]
) AS [value2], (
SELECT [t7].[SegmentCode]
FROM (
SELECT TOP (1) [t6].[SegmentCode]
FROM [dbo].[Read] AS [t6]
WHERE [t0].[Id] = [t6].[UserId]
ORDER BY [t6].[ReadDate] DESC
) AS [t7]
) AS [value3], (
SELECT [t9].[SectionCode]
FROM (
SELECT TOP (1) [t8].[SectionCode]
FROM [dbo].[Read] AS [t8]
WHERE [t0].[Id] = [t8].[UserId]
ORDER BY [t8].[ReadDate] DESC
) AS [t9]
) AS [value4], (
SELECT [t11].[ReadDate]
FROM (
SELECT TOP (1) [t10].[ReadDate]
FROM [dbo].[Read] AS [t10]
WHERE [t0].[Id] = [t10].[UserId]
ORDER BY [t10].[ReadDate] DESC
) AS [t11]
) AS [value5]
FROM [dbo].[User] AS [t0]
) AS [t12]
WHERE ((
SELECT COUNT(*)
FROM [dbo].[Read] AS [t13]
WHERE [t12].[Id] = [t13].[UserId]
)) > @p0
ORDER BY (
SELECT [t15].[ReadDate]
FROM (
SELECT TOP (1) [t14].[ReadDate]
FROM [dbo].[Read] AS [t14]
WHERE [t12].[Id] = [t14].[UserId]
ORDER BY [t14].[ReadDate] DESC
) AS [t15]
) DESC',N'@p0 int',@p0=0
If anyone knows something simpler (for the sport of it) I would like to know, but I think this is probably good enough.
Probably have some errors in this, but I think you want to do a join into a collection, then use 'let' to choose a member of that collection:
(
    from u in db.Users
    join a in db.Actions on u.UserID equals a.UserID into allActions
    where allActions.Count() > 0
    let firstAction = allActions.OrderByDescending(a => a.ActionDate).First()
    orderby firstAction.ActionDate descending
    select new { User = u, FirstAction = firstAction }
).Take(10)