Edit: The original example I used had an int for the primary key when in fact my primary key is a var char containing a UUID as a string. I've updated the question below to reflect this.
Caveat: Solution must work on postgres.
Issue: I can easily paginate data when starting from a known page number or index into the list of results, but how can this be done if all I know is the primary key of the row to start from? For example, say my table has this data:
TABLE: article
======================================
id categories content
--------------------------------------
B7F79F47 local a
6cb80450 local b
563313df local c
9205AE5A local d
E88F7520 national e
5ab669a5 local f
fb047cf6 local g
591c6b50 national h
======================================
Given an article primary key of '9205AE5A' (article.id == '9205AE5A'), and given that the categories column must contain 'local', what SQL can I use to return a result set that includes the articles on either side of this one as it would appear when paginated? I.e. the returned result should contain 3 items (previous, current, and next articles):
('563313df','local','c'),('9205AE5A','local','d'),('5ab669a5','local','f')
Here is my example setup:
-- setup test table and some dummy data
create table article (
id varchar(36),
categories varchar(256),
content varchar(256)
);
insert into article values
('B7F79F47', 'local', 'a'),
('6cb80450', 'local', 'b'),
('563313df', 'local', 'c'),
('9205AE5A', 'local', 'd'),
('E88F7520', 'national', 'e'),
('5ab669a5', 'local', 'f'),
('fb047cf6', 'local', 'g'),
('591c6b50', 'national', 'h');
I want to paginate the rows in the article table, but the starting point I have is the 'id' of an article. In order to provide "Previous Article" and "Next Article" links on the rendered page, I also need the articles that come on either side of the article whose id I know.
On the server side I could run my pagination SQL and iterate through each result set to find the index of the given item. See the following inefficient pseudocode to do this:
page = 0;
resultsPerPage = 10;
articleIndex = 0;
do {
resultSet = select * from article where categories like '%local%' order by content limit resultsPerPage offset (page * resultsPerPage);
for (result in resultSet) {
if (result.id == '9205AE5A') {
// we have found the articles index ('articleIndex') in the paginated list.
// Now we can do a normal pagination to return the list of 3 items starting at the article prior to the one found
return select * from article where categories like '%local%' order by content limit 3 offset (articleIndex - 1);
}
articleIndex++;
}
page++;
} while (resultSet.length > 0);
This is horrendously slow if the given article is way down the paginated list. How can this be done without the ugly while+for loops?
Edit 2: I can get the result using two sql calls
SELECT 'CurrentArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE id='9205AE5A'
ORDER BY content ASC
LIMIT 1 OFFSET 0
From that result returned e.g.
('CurrentArticle', 4, '9205AE5A', 'local', 'd')
I can get the RowNum value (4) and then run the sql again to get RowNum+1 (5) and RowNum-1 (3)
SELECT 'PrevNextArticle' AS type,* FROM
(
SELECT (ROW_NUMBER() OVER (ORDER BY content ASC)) AS RowNum,*
FROM article
WHERE categories LIKE '%local%'
ORDER BY content ASC
) AS tagCloudArticles
WHERE RowNum in (3, 5)
ORDER BY content ASC
LIMIT 2 OFFSET 0
with result
('PrevNextArticle', 3, '563313df', 'local', 'c'),
('PrevNextArticle', 5, '5ab669a5', 'local', 'f')
It would be nice to do this in one efficient sql call though.
If the only information about the surrounding articles shown on the page is "Next" and "Previous", there is no need to fetch their rows in advance. When the user chooses "Previous" or "Next", use these queries SQL Fiddle
-- Previous
select *
from article
where categories = 'local'
and content < (select content from article where id = '9205AE5A')
order by content desc
limit 1
;
-- Next
select *
from article
where categories = 'local'
and content > (select content from article where id = '9205AE5A')
order by content
limit 1
;
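As a quick check of the previous/next idea against the question's varchar keys, here is a hedged, runnable sketch; it orders by content and looks up the current article's sort value with a subquery. It runs under SQLite via Python purely for portability; the SQL itself is standard and should behave the same on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id TEXT, categories TEXT, content TEXT);
INSERT INTO article VALUES
('B7F79F47','local','a'), ('6cb80450','local','b'),
('563313df','local','c'), ('9205AE5A','local','d'),
('E88F7520','national','e'), ('5ab669a5','local','f'),
('fb047cf6','local','g'), ('591c6b50','national','h');
""")

# Previous article: the greatest content value below the current article's.
prev_sql = """
SELECT * FROM article
WHERE categories = 'local'
  AND content < (SELECT content FROM article WHERE id = '9205AE5A')
ORDER BY content DESC
LIMIT 1
"""
# Next article: the smallest content value above the current article's.
next_sql = """
SELECT * FROM article
WHERE categories = 'local'
  AND content > (SELECT content FROM article WHERE id = '9205AE5A')
ORDER BY content ASC
LIMIT 1
"""
prev_row = conn.execute(prev_sql).fetchone()
next_row = conn.execute(next_sql).fetchone()
print(prev_row)  # ('563313df', 'local', 'c')
print(next_row)  # ('5ab669a5', 'local', 'f')
```

Because each lookup is a keyset comparison rather than an OFFSET scan, an index on (categories, content) keeps both queries cheap no matter how deep the article sits in the list.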
If it is necessary to get information about the previous and next articles: SQL Fiddle
with ordered as (
select
id, content,
row_number() over(order by content) as rn
from article
where categories = 'local'
), rn as (
select rn
from ordered
where id = '9205AE5A'
)
select
o.id,
o.content,
o.rn - rn.rn as rn
from ordered o cross join rn
where o.rn between rn.rn - 1 and rn.rn + 1
order by o.rn
The articles will have rn values -1, 0, and 1, where they exist.
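The CTE above runs unchanged on SQLite 3.25+, which makes it easy to sanity-check from Python with the question's data (only the literal id is inlined):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id TEXT, categories TEXT, content TEXT);
INSERT INTO article VALUES
('B7F79F47','local','a'), ('6cb80450','local','b'),
('563313df','local','c'), ('9205AE5A','local','d'),
('E88F7520','national','e'), ('5ab669a5','local','f'),
('fb047cf6','local','g'), ('591c6b50','national','h');
""")

# Number the filtered rows once, find the current row's number,
# then keep the rows whose number is within +/- 1 of it.
sql = """
WITH ordered AS (
  SELECT id, content,
         ROW_NUMBER() OVER (ORDER BY content) AS rn
  FROM article
  WHERE categories = 'local'
), rn AS (
  SELECT rn FROM ordered WHERE id = '9205AE5A'
)
SELECT o.id, o.content, o.rn - rn.rn AS rn
FROM ordered o CROSS JOIN rn
WHERE o.rn BETWEEN rn.rn - 1 AND rn.rn + 1
ORDER BY o.rn
"""
rows = conn.execute(sql).fetchall()
print(rows)  # previous, current, next with rn -1, 0, 1
```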
See whether the following query solves your issue. The id is used in the filter along with the category:
SELECT * FROM
(
select (1 + row_number() OVER(Order BY content ASC)) AS RowNo,* from article where categories like '%local%' and content >= (select content from article where id = '9205AE5A')
UNION ALL
(SELECT 1,* FROM article where categories like '%local%' and content < (select content from article where id = '9205AE5A') ORDER BY content DESC LIMIT 1)
) AS TEMP
WHERE
RowNo between 1 and (1+10-1)
ORDER BY
RowNo
I think this query will yield you the result
(SELECT *, 2 AS ordering from article where categories like '%local%' AND id = 3 LIMIT 1)
UNION
(SELECT *, 1 AS ordering from article where categories like '%local%' AND id < 3 ORDER BY id DESC LIMIT 1 )
UNION
(SELECT *, 3 AS ordering from article where categories like '%local%' AND id > 3 ORDER BY id ASC LIMIT 1 )
Related
I am using Snowflake for this SQL question, so if there are any unique functions I can use, please help me out!
I have a data set with unique ids, other attributes that aren't important, and then a list of categories (~22) that each unique id could fall into (denoted by a 1 if it's in the category and 0 if not).
I am trying to figure out how to write a query that checks, for each category, whether removing that category would leave any of the unique ids without any category at all, and that counts how many ids in total would be left category-less.
For example, below, unique id Jshshsv is only in CatAA, while id Hairbdb is in both CatY and CatAA. If CatAA were dropped, how many ids would be left with no category?
UniqueID | Sum across Categories | CatX | CatY | CatZ | CatAA
---------|-----------------------|------|------|------|------
Hairbdb  | 2                     | 0    | 1    | 0    | 1
Jshshsv  | 1                     | 0    | 0    | 0    | 1
For some reason I just cannot figure out how to do this in a manageable way in sql with so many category buckets. Any tips or things to try would be appreciated.
So if your data is in the form of pairs, with repeats:
ID|Cat
--|--
Hairbdb|CatY
Hairbdb|CatY
Hairbdb|CatAA
Jshshsv|CatAA
Jshshsv|CatAA
The following SQL can be used to find the categories that are the single match for an ID.
WITH data AS (
SELECT * FROM VALUES
('Hairbdb','CatY'),
('Hairbdb','CatAA'),
('Jshshsv','CatAA')
v(id, cat)
), dist_data AS (
SELECT DISTINCT id, cat FROM data
), cat_counts AS (
SELECT id, count(distinct cat) c_cat
FROM data
GROUP BY 1
HAVING c_cat = 1
)
SELECT a.cat, a.id
FROM dist_data AS a
JOIN cat_counts AS b
ON b.id = a.id;
This works because you first count, per id, how many categories the id is in; joining the distinct data against the ids that are in only one category then gives you each such id and its category:
CAT   | ID
------|--------
CatAA | Jshshsv
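Apart from the VALUES-with-alias shorthand, the logic above is plain ANSI SQL, so it can be checked outside Snowflake. Here is a sketch in SQLite via Python; the HAVING repeats the aggregate instead of using the `c_cat` alias, since alias references in HAVING are a Snowflake convenience:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id TEXT, cat TEXT);
INSERT INTO data VALUES
('Hairbdb','CatY'), ('Hairbdb','CatY'),
('Hairbdb','CatAA'), ('Jshshsv','CatAA'), ('Jshshsv','CatAA');
""")

# Keep only ids that belong to exactly one distinct category,
# then join back to recover which category that was.
sql = """
WITH dist_data AS (
  SELECT DISTINCT id, cat FROM data
), cat_counts AS (
  SELECT id, COUNT(DISTINCT cat) AS c_cat
  FROM data
  GROUP BY id
  HAVING COUNT(DISTINCT cat) = 1
)
SELECT a.cat, a.id
FROM dist_data a
JOIN cat_counts b ON b.id = a.id
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('CatAA', 'Jshshsv')]
```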
If your data is in a wide format (like how you present it), you can turn it into my form via UNPIVOT like so:
WITH data AS (
SELECT * FROM VALUES
('Hairbdb',0,1,0,1),
('Jshshsv',0,0,0,1)
v(id, catx, caty, catz, cataa )
)
SELECT id, cat from data unpivot(catv for cat in (catx, caty, catz, cataa))
WHERE catv = 1;
giving:
ID      | CAT
--------|------
Hairbdb | CATY
Hairbdb | CATAA
Jshshsv | CATAA
But if it's in your form with duplicates removed you could just use a WHERE clause:
WITH data AS (
SELECT * from values
('Hairbdb', 0, 1, 0, 1),
('Jshshsv', 0, 0, 0, 1)
v(UniqueID, CatX,CatY,CatZ, CatAA)
)
SELECT UniqueID,
CatX+CatY+CatZ+CatAA as "Sum across Categories",
CatX,
CatY,
CatZ,
CatAA
FROM data
WHERE "Sum across Categories" = 1;
So, another variation: if you have many rows per id, and the category allocation is not the same across the set, you can use COUNT_IF with a greater-than-zero test to turn the data into an "in any row" flag per category, then use a HAVING clause to filter out the ids that appear in more than one column:
WITH data AS (
SELECT * FROM VALUES
('Hairbdb',0,1,0,1),
('Hairbdb',1,1,0,1),
('Hairbdb',0,1,1,1),
('Jshshsv',0,0,0,1),
('Jshshsv',0,0,0,1)
v(id, catx, caty, catz, cataa )
)
SELECT id,
COUNT_IF(catx=1)>0 AS catx_a,
COUNT_IF(caty=1)>0 AS caty_a,
COUNT_IF(catz=1)>0 AS catz_a,
COUNT_IF(cataa=1)>0 AS cataa_a
FROM data
GROUP BY 1
HAVING catx_a::int + caty_a::int + catz_a::int + cataa_a::int = 1;
If you are storing the categories in columns (though not a good design), you could try this:
SELECT UniqueID , sum(CatX+CatY+CatZ+CatAA) over (partition by UniqueID) as "Sum across Categories",
CatX, CatY, CatZ, CatAA FROM (
SELECT 'Hairbdb' as UniqueID, 0 as CatX, 1 as CatY, 0 as CatZ, 1 as CatAA from dual
UNION ALL
SELECT 'Jshshsv', 0,0,0,1 from dual
);
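For the original question ("if CatAA was dropped, how many ids would be left with no category?"), none of the queries above produce the per-category count directly. In the wide format it is just the number of rows where that category is 1 and the other categories sum to 0. A hedged sketch (SQLite via Python here; the same SELECT should work over the real table on Snowflake):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id TEXT, catx INT, caty INT, catz INT, cataa INT);
INSERT INTO data VALUES
('Hairbdb', 0, 1, 0, 1),
('Jshshsv', 0, 0, 0, 1);
""")

# For each category, count the ids that are in it and in nothing else,
# i.e. the ids left category-less if that category were removed.
sql = """
SELECT
  SUM(CASE WHEN catx  = 1 AND caty + catz + cataa = 0 THEN 1 ELSE 0 END) AS orphaned_by_catx,
  SUM(CASE WHEN caty  = 1 AND catx + catz + cataa = 0 THEN 1 ELSE 0 END) AS orphaned_by_caty,
  SUM(CASE WHEN catz  = 1 AND catx + caty + cataa = 0 THEN 1 ELSE 0 END) AS orphaned_by_catz,
  SUM(CASE WHEN cataa = 1 AND catx + caty + catz  = 0 THEN 1 ELSE 0 END) AS orphaned_by_cataa
FROM data
"""
row = conn.execute(sql).fetchone()
print(row)  # only dropping CatAA strands an id (Jshshsv)
```

With ~22 categories the CASE list is tedious but mechanical; it can be generated from the column list rather than written by hand.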
Three pertinent tables: tracks (music tracks), users, and follows.
The follows table is a many to many relationship relating users (followers) to users (followees).
I'm looking for this as a final result:
<track_id>, <user_id>, <most popular followee>
The first two columns are simple and result from a relationship between tracks and users. The third is my problem. I can join with the follows table and get all of the followees that each user follows, but how do I get only the followee that has the highest number of follows?
Here are the tables with their pertinent columns:
tracks: id, user_id (fk to users.id), song_title
users: id
follows: followee_id (fk to users.id), follower_id (fk to users.id)
Here's some sample data:
TRACKS
1, 1, Some song title
USERS
1
2
3
4
FOLLOWS
2, 1
3, 1
4, 1
3, 4
4, 2
4, 3
DESIRED RESULT
1, 1, 4
For the desired result, the 3rd field is 4 because, as you can see in the FOLLOWS table, user 4 has the most followers.
I and a few great minds around me are still scratching our heads.
So I threw this into Linqpad because I'm better with Linq.
Tracks
.Where(t => t.TrackId == 1)
.Select(t => new {
TrackId = t.TrackId,
UserId = t.UserId,
MostPopularFolloweeId = Followers
.GroupBy(f => f.FolloweeId)
.OrderByDescending(g => g.Count())
.FirstOrDefault()
.Key
});
The resulting SQL query was the following (@p0 being the track id):
-- Region Parameters
DECLARE @p0 Int = 1
-- EndRegion
SELECT [t0].[TrackId], [t0].[UserId], (
SELECT [t3].[FolloweeId]
FROM (
SELECT TOP (1) [t2].[FolloweeId]
FROM (
SELECT COUNT(*) AS [value], [t1].[FolloweeId]
FROM [Followers] AS [t1]
GROUP BY [t1].[FolloweeId]
) AS [t2]
ORDER BY [t2].[value] DESC
) AS [t3]
) AS [MostPopularFolloweeId]
FROM [Tracks] AS [t0]
WHERE [t0].[TrackId] = @p0
That outputs the expected response, and should be a start to a cleaner query.
This sounds like an aggregation query with row_number(). I'm a little confused on how all the joins come together:
select t.*
from (select t.id, f.followee_id, count(*) as cnt,
row_number() over (partition by t.id order by count(*) desc) as seqnum
from follows f join
tracks t
on f.follower_id = t.user_id
group by t.id, f.followee_id
) t
where seqnum = 1;
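Since the sample data defines a single most-popular followee for the whole follows table, ranking the aggregate once with a window function and attaching the winner to every track reproduces the desired result. A runnable sketch using the question's schema (SQLite via Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks  (id INT, user_id INT, song_title TEXT);
CREATE TABLE follows (followee_id INT, follower_id INT);
INSERT INTO tracks VALUES (1, 1, 'Some song title');
INSERT INTO follows VALUES (2,1),(3,1),(4,1),(3,4),(4,2),(4,3);
""")

# Rank followees by follower count, then attach the top one to every track.
sql = """
WITH ranked AS (
  SELECT followee_id,
         ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rnk
  FROM follows
  GROUP BY followee_id
)
SELECT t.id, t.user_id, r.followee_id AS most_popular_followee
FROM tracks t
CROSS JOIN ranked r
WHERE r.rnk = 1
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [(1, 1, 4)] -- user 4 has 3 followers, the most
```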
I have a submissions table which is essentially a singly linked list. Given the id of a given row, I want to return the entire list that particular row is a part of (in the proper order). For example, in the table below, if I had id 2 I would want to get back rows 1, 2, 3, 4 in that order.
(4,3) -> (3,2) -> (2,1) -> (1,null)
I expect 1, 2, 3, 4 here because 4 is effectively the head of the list that 2 belongs to, and I want to traverse all the way through the list.
http://sqlfiddle.com/#!15/c352e/1
Is there a way to do this using PostgreSQL's RECURSIVE CTE? So far I have the following, but this only gives me the parents, not the descendants:
WITH RECURSIVE "sequence" AS (
SELECT * FROM submissions WHERE "submissions"."id" = 2
UNION ALL SELECT "recursive".* FROM "submissions" "recursive"
INNER JOIN "sequence" ON "recursive"."id" = "sequence"."link_id"
)
SELECT "sequence"."id" FROM "sequence"
This approach uses what you have already come up with.
It adds another block to calculate the rest of the list, and then combines both using a custom reverse ordering.
WITH RECURSIVE pathtobottom AS (
-- Get the path from element to bottom list following next element id that matches current link_id
SELECT 1 i, -- add fake order column to reverse retrieved records
* FROM submissions WHERE submissions.id = 2
UNION ALL
SELECT pathtobottom.i + 1 i, -- add fake order column to reverse retrieved records
recursive.* FROM submissions recursive
INNER JOIN pathtobottom ON recursive.id = pathtobottom.link_id
)
, pathtotop AS (
-- Get the path from element to top list following previous element link_id that matches current id
SELECT 1 i, -- add fake order column to reverse retrieved records
* FROM submissions WHERE submissions.id = 2
UNION ALL
SELECT pathtotop.i + 1 i, -- add fake order column to reverse retrieved records
recursive2.* FROM submissions recursive2
INNER JOIN pathtotop ON recursive2.link_id = pathtotop.id
), pathtotoprev as (
-- Reverse path to top using fake 'i' column
SELECT pathtotop.id FROM pathtotop order by i desc
), pathtobottomrev as (
-- Reverse path to bottom using fake 'i' column
SELECT pathtobottom.id FROM pathtobottom order by i desc
)
-- Elements ordered from bottom to top
SELECT pathtobottomrev.id FROM pathtobottomrev where id != 2 -- remove element to avoid duplicate
UNION ALL
SELECT pathtotop.id FROM pathtotop;
/*
-- Elements ordered from top to bottom
SELECT pathtotoprev.id FROM pathtotoprev
UNION ALL
SELECT pathtobottom.id FROM pathtobottom where id != 2; -- remove element to avoid duplicate
*/
It was yet another quest for my brain. Thanks.
with recursive r as (
select *, array[id] as lst from submissions s where id = 6
union all
select s.*, r.lst || s.id
from
submissions s inner join
r on (s.link_id=r.id or s.id=r.link_id)
where not (array[s.id] <@ r.lst)
)
select * from r;
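SQLite lacks arrays, but the same walk-both-directions idea can be checked there with a depth column in place of the array accumulator; a hedged sketch via Python (the depth also supplies the final ordering, negative toward the tail and positive toward the head):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submissions (id INT, link_id INT);
INSERT INTO submissions VALUES (1, NULL), (2, 1), (3, 2), (4, 3);
""")

# 'down' follows link_id toward the tail; 'up' follows rows whose
# link_id points at the current row, toward the head.
sql = """
WITH RECURSIVE down AS (
  SELECT id, link_id, 0 AS depth FROM submissions WHERE id = 2
  UNION ALL
  SELECT s.id, s.link_id, d.depth - 1
  FROM submissions s JOIN down d ON s.id = d.link_id
), up AS (
  SELECT id, link_id, 0 AS depth FROM submissions WHERE id = 2
  UNION ALL
  SELECT s.id, s.link_id, u.depth + 1
  FROM submissions s JOIN up u ON s.link_id = u.id
)
SELECT id FROM (
  SELECT id, depth FROM down
  UNION
  SELECT id, depth FROM up
) AS t
ORDER BY depth
"""
ids = [r[0] for r in conn.execute(sql)]
print(ids)  # [1, 2, 3, 4]
```

UNION (not UNION ALL) removes the duplicate starting row, which both branches emit at depth 0.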
I have a sql view which contains data from 3 linked entities (Title > Edition > SKU). The data in this view is used to search on ANY field within the 3 entities. For example, if you specify a condition title.regionid = '14' the view returns 4,000 unique rows (1 per SKU), which belong to 765 unique Editions, which belong to 456 unique Titles.
What I need is to enable paging based on Titles using Row_Number(). So
SELECT * FROM myview WHERE title.regionid = '14' AND Row BETWEEN 0 AND 35
The problem is that my Row column needs to count by Title, not by SKU: from a result set of 4,000 rows, if the first title contains 12 editions and 65 SKUs, the row number for all 65 of those rows should be 1, because they belong to the same Title.
I cannot use GroupBy because my view contains 40+ columns all of which can be searched on via the WHERE clause.
Here's the query:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY a.TitleSort ASC) AS Row, a.*
FROM (SELECT * FROM v_AdvancedSearch
WHERE
istitledeleted = 0
--AND ISBN = '1-4157-5842-5'
--AND etc
) AS a
) d
WHERE
Row BETWEEN 0 AND 35
In the first page there are 35 rows which belong to only 4 titles; the Row column counts by row, so it stops there, whereas if it counted by Title I would get 387 rows for page 1. How can I accomplish paging in this situation?
WITH Titles AS
(
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY a.TitleSort ASC) AS Row, a.*
FROM (SELECT DISTINCT TitleSORT, TitleId FROM v_AdvancedSearch
WHERE
istitledeleted = 0
--AND ISBN = '1-4157-5842-5'
--AND PictureFormat = 'Widescreen'
--AND UPC = '0-9736-14381-6-0'
--AND Edition = 'Standard'
) AS a
) d
WHERE
Row BETWEEN 0 AND 35
)
SELECT * FROM v_AdvancedSearch V
INNER JOIN Titles ON Titles.TitleId = V.TitleId
WHERE istitledeleted = 0
--CONDITIONS NEED TO BE REPEATED HERE
--AND ISBN = '1-4157-5842-5'
ORDER BY V.TitleSort ASC
This form works best for me
WITH
[cte] AS (
SELECT
DENSE_RANK ( ) OVER ( ORDER BY [v].[TitleSort], [v].[TitleId] ) AS [ordinal],
[v].[TitleSort],
[v].[TitleId]
--,
--field list,
--etc
FROM [v_AdvancedSearch] AS [v]
WHERE
[v].[istitledeleted] = 0
--AND
--additional conditions AND
--etc
)
SELECT
[v].[ordinal],
[v].[TitleSort],
[v].[TitleId]
--,
--field list,
--etc
FROM [cte] AS [v]
WHERE
[v].[ordinal] >= 0 AND
[v].[ordinal] <= 35
ORDER BY [v].[ordinal];
No need for DISTINCT or GROUP BY
No need to repeat the criteria conditions
Just gotta have that explicit field list
https://learn.microsoft.com/en-us/sql/t-sql/functions/dense-rank-transact-sql
https://www.google.com/search?q=%22sql+server%22+%22select+star%22
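To see why DENSE_RANK pages by title rather than by SKU, here is a small runnable sketch (SQLite via Python; the view and column names mimic the question, and the data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v_AdvancedSearch (TitleSort TEXT, TitleId INT, Sku TEXT, istitledeleted INT);
INSERT INTO v_AdvancedSearch VALUES
('Alpha', 1, 'SKU-1', 0), ('Alpha', 1, 'SKU-2', 0),
('Beta',  2, 'SKU-3', 0),
('Gamma', 3, 'SKU-4', 0), ('Gamma', 3, 'SKU-5', 0), ('Gamma', 3, 'SKU-6', 0);
""")

# DENSE_RANK numbers titles, not rows: every SKU of the same title
# shares one ordinal, so filtering on ordinal pages by title.
sql = """
WITH cte AS (
  SELECT DENSE_RANK() OVER (ORDER BY TitleSort, TitleId) AS ordinal,
         TitleSort, TitleId, Sku
  FROM v_AdvancedSearch
  WHERE istitledeleted = 0
)
SELECT ordinal, TitleId, Sku
FROM cte
WHERE ordinal BETWEEN 1 AND 2
ORDER BY ordinal, Sku
"""
rows = conn.execute(sql).fetchall()
print(rows)  # a "page" of the first two titles, with all their SKUs
```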
Let's say I have 2 tables: blog_posts and categories. Each blog post belongs to only ONE category, so there is basically a foreign key between the 2 tables here.
I would like to retrieve the 2 latest posts from each category. Is it possible to achieve this in a single request?
GROUP BY would group everything and leave me with only one row per category, but I want 2 of them.
It would be easy to perform 1 + N queries (N = number of categories): first retrieve the categories, then retrieve 2 posts from each category.
I believe it would also be quite easy to perform M queries (M = number of posts I want from each category): the first query selects the first post for each category (with a group by), the second query retrieves the second post for each category, and so on.
I'm just wondering if someone has a better solution for this. I don't really mind doing 1+N queries for that, but for curiosity and general SQL knowledge, it would be appreciated!
Thanks in advance to whom can help me with this.
Check out this MySQL article on how to work with the top N things in arbitrarily complex groupings; it's good stuff. You can try this:
SET @counter = 0;
SET @category = '';
SELECT
*
FROM
(
SELECT
@counter := IF(posts.category = @category, @counter + 1, 0) AS counter,
@category := posts.category,
posts.*
FROM
(
SELECT
*
FROM test
ORDER BY category, date DESC
) posts
) posts
HAVING counter < 2
SELECT p.*
FROM (
SELECT id,
COALESCE(
(
SELECT datetime
FROM posts pi
WHERE pi.category = c.id
ORDER BY
pi.category DESC, pi.datetime DESC, pi.id DESC
LIMIT 1, 1
), '1900-01-01') AS post_datetime,
COALESCE(
(
SELECT id
FROM posts pi
WHERE pi.category = c.id
ORDER BY
pi.category DESC, pi.datetime DESC, pi.id DESC
LIMIT 1, 1
), 0) AS post_id
FROM category c
) q
JOIN posts p
ON p.category <= q.id
AND p.category >= q.id
AND p.datetime >= q.post_datetime
AND (p.datetime, p.id) >= (q.post_datetime, q.post_id)
Make an index on posts (category, datetime, id) for this to be fast.
Note the p.category <= q.id AND p.category >= q.id hack: this makes MySQL use "Range checked for each record", which is more index-efficient.
See this article in my blog for a similar problem:
MySQL: emulating ROW_NUMBER with multiple ORDER BY conditions
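On MySQL 8.0+ (and Postgres, SQLite, etc.) the user-variable trick is no longer necessary: ROW_NUMBER() partitioned by category expresses top-2-per-group directly. A hedged sketch with guessed column names (id, category, date), run under SQLite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_posts (id INT, category TEXT, date TEXT);
INSERT INTO blog_posts VALUES
(1, 'tech', '2024-01-01'), (2, 'tech', '2024-01-02'), (3, 'tech', '2024-01-03'),
(4, 'life', '2024-01-01'), (5, 'life', '2024-01-05');
""")

# Number posts within each category, newest first, and keep the top two.
sql = """
SELECT id, category, date FROM (
  SELECT id, category, date,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY date DESC, id DESC) AS rn
  FROM blog_posts
) t
WHERE rn <= 2
ORDER BY category, date DESC
"""
rows = conn.execute(sql).fetchall()
print(rows)  # two latest posts per category
```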