Determine on what page a record is - sql

How can I determine on what page a certain record is?
Let's say I display 5 records per page using queries like these:
SELECT * FROM posts ORDER BY date DESC LIMIT 0,5
SELECT * FROM posts ORDER BY date DESC LIMIT 5,5
SELECT * FROM posts ORDER BY date DESC LIMIT 10,5
Sample data:
id | name | date
-----------------------------------------------------
1 | a | 2013-11-07 08:19 page 1
2 | b | 2013-12-02 12:32
3 | c | 2013-12-14 14:11
4 | d | 2013-12-21 09:26
5 | e | 2013-12-22 18:52 _________
6 | f | 2014-01-04 11:20 page 2
7 | g | 2014-01-07 21:09
8 | h | 2014-01-08 13:39
9 | i | 2014-01-08 16:41
10 | j | 2014-01-09 07:45 _________
11 | k | 2014-01-14 22:05 page 3
12 | l | 2014-01-21 17:21
Someone may edit a record, let's say the one with id = 7, or insert a new record (id = 13). How can I determine which page that record is on? The reason is that I want to display the page that contains the record that has just been edited or added.
OK, I guess I could just display the same page if the record is edited. But the problem is when a record gets added. The list can be ordered by name, and the new record could be placed anywhere :(
Is there some way I could do a query like SELECT offset WHERE id = 13 ORDER BY date LIMIT 5 that returns 10?

For the sake of this example, let's assume that entry 7 has just been added (and that there could be duplicate names) - the first thing you need to do is find how many entries come before that one (based on name), thus:
SELECT COUNT(*)
FROM Posts
WHERE name < 'g'
   OR (name = 'g' AND id < 7)
Here, id is being used as a "tiebreaker" column, to ensure a stable sort. It's also assuming that we know the value of id, too - given that non-key data can be duplicate, you need that sort of functionality.
In any case, this gives us the number of rows preceding this one (6). Integer division by the page size (the LIMIT) then gives us the page:
(int) (6 / 5) = 1
... this is a 0-indexed page (ie, entries 1 - 5 appear on page "0"), which works in our favor here. No adjustment is needed, because the count covers only the preceding rows: entry 5 has 4 rows before it, and 4 / 5 = 0 keeps it on the first page. (If you count with <= so that the row itself is included, subtract 1 before dividing.)
We now have the page index, but we need to turn it into the index of the first entry on that page. Some simple multiplication does that for us:
1 * 5 = 5
This is the number of rows to skip before the page starts, which is the value for OFFSET (OFFSET is 0-based).
We can now write the query:
SELECT id, name, date
FROM Posts
ORDER BY name, id
LIMIT 5 OFFSET 5
(keep in mind that we require id to guarantee a stable sort for the data, if we assume that name could be a duplicate!).
This is two trips to the database. Surprisingly, SQLite allows LIMIT/OFFSET values to be the results of SQL subqueries (keep in mind, not all RDBMSs allow them to even be host variables, meaning they could only be changed with dynamic SQL. Although in at least one case, the db had ROW_NUMBER() to make up for that...). I wasn't able to get rid of the repetition of the subqueries, though. Here pageCount is the 1-based page number and entryCount is the 0-based offset:
SELECT Posts.id, Posts.name, Posts.date, Pages.pageCount
FROM Posts
CROSS JOIN (SELECT (COUNT(*) / 5) + 1 as pageCount
            FROM Posts
            WHERE name < 'g'
               OR (name = 'g' AND id < 7)) Pages
ORDER BY name, id
LIMIT 5 OFFSET (SELECT (COUNT(*) / 5) * 5 as entryCount
                FROM Posts
                WHERE name < 'g'
                   OR (name = 'g' AND id < 7));
(and the working SQL Fiddle example).
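For anyone who would rather do the arithmetic in application code, here is a minimal sketch of the two-trip version using Python's built-in sqlite3 module. The helper name page_offset_for and the in-memory table are made up for illustration, and it assumes the id you pass in actually exists:

```python
import sqlite3

PAGE_SIZE = 5  # matches the LIMIT used in the question


def page_offset_for(conn, post_id, page_size=PAGE_SIZE):
    """Return (zero_based_page, offset) for the page that holds post_id
    when the list is sorted by name with id as the tiebreaker."""
    # Assumes post_id exists; fetchone() would return None otherwise.
    name, = conn.execute(
        "SELECT name FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    # Count the rows that sort strictly before the target row.
    preceding, = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE name < ? OR (name = ? AND id < ?)",
        (name, name, post_id),
    ).fetchone()
    page = preceding // page_size
    return page, page * page_size


# Hypothetical sample data: ids 1..12 named a..l, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO posts (id, name) VALUES (?, ?)",
    [(i + 1, ch) for i, ch in enumerate("abcdefghijkl")],
)

page, offset = page_offset_for(conn, 7)
rows = conn.execute(
    "SELECT id FROM posts ORDER BY name, id LIMIT ? OFFSET ?",
    (PAGE_SIZE, offset),
).fetchall()
print(page, offset, [r[0] for r in rows])  # 1 5 [6, 7, 8, 9, 10]
```

The same helper works for any sort order as long as the counting predicate mirrors the ORDER BY exactly, including the tiebreaker column.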

Find highest (max) date query, and then find highest value from results of previous query

Here is a table called packages:
id packages_sent date sent_order
1 | 10 | 2017-02-11 | 1
2 | 25 | 2017-03-15 | 1
3 | 5 | 2017-04-08 | 1
4 | 20 | 2017-05-21 | 1
5 | 25 | 2017-05-21 | 2
6 | 5 | 2017-06-19 | 1
This table shows the number of packages sent on a given date; if there were multiple packages sent on the same date (as is the case with rows 4 and 5), then the sent_order keeps track of the order in which they were sent.
I am trying to make a query that will return sum(packages_sent) given the following conditions: first, return the row with the max(date) (given some date provided), and second, if there are multiple rows with the same max(date), return the row with the max(sent_order) (the highest sent_order value).
Here is the query I have so far:
SELECT sum(packages_sent)
FROM packages
WHERE date IN
(SELECT max(date)
FROM packages
WHERE date <= '2017-05-29');
This query correctly finds the max date, which is 2017-05-21, but then for the sum it returns 45 because it is adding rows 4 and 5 together.
I want the query to return the max(date), and if there are multiple rows with the same max(date), then return the row with the max(sent_order). Using the example above with the date 2017-05-29, it should only return 25.
I don't see where a sum() comes into play. You seem to only want the last row:
select p.*
from packages p
where p.date <= '2017-05-29'
order by p.date desc, p.sent_order desc
fetch first 1 row only;
If your data is truly ordered ascending as you show it, then it's easier to use the surrogate key ID field.
SELECT packages_sent
FROM packages
WHERE ID =
(SELECT max(ID)
FROM packages
WHERE date <= '2017-05-29');
Since the ID always increases with date and sent order, finding the max of it also finds the max of the other two in one step.
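To check the "last row on or before a date" idea against the sample data, here is a small sketch using Python's sqlite3 module. SQLite has no FETCH FIRST clause, so this uses LIMIT 1 instead; the function name last_sent_on_or_before is just something made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages ("
    "id INTEGER PRIMARY KEY, packages_sent INTEGER, date TEXT, sent_order INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2017-02-11", 1),
        (2, 25, "2017-03-15", 1),
        (3, 5, "2017-04-08", 1),
        (4, 20, "2017-05-21", 1),
        (5, 25, "2017-05-21", 2),
        (6, 5, "2017-06-19", 1),
    ],
)


def last_sent_on_or_before(conn, cutoff):
    """packages_sent of the latest row (by date, then sent_order) on or
    before cutoff, or None when no row qualifies."""
    row = conn.execute(
        """
        SELECT packages_sent
        FROM packages
        WHERE date <= ?
        ORDER BY date DESC, sent_order DESC
        LIMIT 1
        """,
        (cutoff,),
    ).fetchone()
    return row[0] if row else None


print(last_sent_on_or_before(conn, "2017-05-29"))  # 25
```

Sorting on (date, sent_order) does not rely on the ids happening to be in chronological order, which the max(ID) shortcut does.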

SQL: the most effective way to get row number of one element

I have a table of persons:
id | Name | Age
1 | Alex | 18
2 | Peter| 30
3 | Zack | 25
4 | Bim | 30
5 | Ken | 20
And I have the following interval of rows: WHERE ID>1 AND ID<5. I know that in this interval there is a person whose id=3. What is the most efficient (the fastest) way to get its row number in this interval (in my example rownumber=2)? I mean I don't need any other data. I need only one thing - to know row position of person with id=3 in interval WHERE ID>1 AND ID<5.
If possible, I would like a general SQL solution rather than a vendor-specific one. If that is not possible, then I need a solution for PostgreSQL and H2.
The row number would be the number of rows between the first row in the interval and the row you're looking for. For interval ID>1 AND ID<5 and target row ID=3, this is:
select count(*)
from YourTable
where id between 2 and 3
For interval ID>314 AND ID<1592 and target row ID=1000, you'd use:
where id between 315 and 1000
To be sure that there is an element with ID=3, use:
select count(*)
from YourTable
where id between 2 and
(
select id
from YourTable
where id = 3
)
This will return 0 if the row doesn't exist.
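A quick way to try this out is the following sketch with Python's sqlite3 module and the sample table from the question. The function name row_number_in_interval is invented for the example, and it writes the interval as id > lower AND id <= target rather than BETWEEN, which is the same thing for integer ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, "Alex", 18), (2, "Peter", 30), (3, "Zack", 25), (4, "Bim", 30), (5, "Ken", 20)],
)


def row_number_in_interval(conn, lower_exclusive, target_id):
    """1-based position of target_id among rows with id > lower_exclusive,
    or 0 when the target row does not exist."""
    count, = conn.execute(
        """
        SELECT COUNT(*)
        FROM persons
        WHERE id > ?
          AND id <= (SELECT id FROM persons WHERE id = ?)
        """,
        (lower_exclusive, target_id),
    ).fetchone()
    return count


print(row_number_in_interval(conn, 1, 3))   # 2
print(row_number_in_interval(conn, 1, 42))  # 0, no such row
```

The subquery yields NULL for a missing id, the comparison against NULL is never true, and so the count comes out as 0.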

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked on the last query. The database models a reservation system for an opera house. The query is about associating each spectator with the other spectators who attend the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of each spectator and the date of the show they are attending. The query looks like this:
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround (each id has to be cast to a string first). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
This assumes that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> is the "contains" operator for arrays (used in query 1 above) - so we get all spectators that have seen at least the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
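If you want to play with the counting idea behind query 2 without a PostgreSQL server, here is a rough sketch using Python's sqlite3 module. It is not the PostgreSQL query above, just the same logic in plain SQL without arrays. The reservation rows and the function name companions are invented for the example, and it relies on (id_spectator, id_show) being unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "id_spectator INTEGER, id_show INTEGER, UNIQUE (id_spectator, id_show))"
)
# Hypothetical data: spectator 1 saw shows 1 and 14.
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [
        (1, 1), (1, 14),
        (2, 1), (2, 14), (2, 20),  # everything 1 saw, plus one more
        (3, 1), (3, 14),           # exactly what 1 saw
        (9, 1),                    # only one of 1's shows
    ],
)


def companions(conn, prime):
    """Spectators who attended every show that `prime` attended."""
    rows = conn.execute(
        """
        SELECT r.id_spectator
        FROM reservations AS s
        JOIN reservations AS r ON r.id_show = s.id_show
        WHERE s.id_spectator = ?
          AND r.id_spectator <> s.id_spectator
        GROUP BY r.id_spectator
        HAVING COUNT(*) = (SELECT COUNT(*)
                           FROM reservations
                           WHERE id_spectator = ?)
        ORDER BY r.id_spectator
        """,
        (prime, prime),
    ).fetchall()
    return [r[0] for r in rows]


print(companions(conn, 1))  # [2, 3]
```

Only rows for the prime spectator's shows survive the join, so a spectator qualifies exactly when their number of matching rows equals the prime spectator's number of shows.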
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

T-SQL: change an IN query to an AND query

I have a one-to-many relationship table:
ReviewId EffectId
1 | 2
1 | 5
1 | 8
2 | 2
2 | 5
2 | 9
2 | 3
3 | 3
3 | 2
3 | 9
On the site, users select the effects they are interested in, and I fetch all the relevant reviews.
I do this with an IN query.
For example, if the user selects effects 2 and 5, my query is:
select ReviewId from table_name where EffectId in (2,5)
Now I need to get all the reviews that contain both effects:
all reviews that have effect 2 and effect 5.
What is the best query for this?
It is important to me that the query runs as quickly as possible.
For this I can also change the table schema (if needed), e.g. add a cached field that contains all the effects separated by commas, like:
ReviewId cachedEffects
1 | ,2,5,8
2 | ,2,5,9,3,
3 | ,3,2,9
You can do it this way:
select reviewid
from
tbl
where effectid in (2,5)
group by reviewid
having count(distinct effectid) > 1
Demo
count (distinct effectid) is used to ensure that the results contain only those reviewIDs which have multiple records with different values of effectID. The where clause is used to filter out based on your filter condition of having both 2 and 5.
The key thing to note here is that we are grouping by reviewID, and also using the count of distinct effectID values to ensure that only those records which have both 2 and 5 are returned. If we did not do so, the query would return all rows which have effectID equal to either 2 or 5.
For improving performance, you could create an index on reviewID.
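Note that > 1 is enough when exactly two effects are selected; for a longer list you would compare the distinct count with the number of selected effects. Here is a small sketch of that generalisation with Python's sqlite3 module. The table name review_effects and the function reviews_with_all are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_effects (reviewid INTEGER, effectid INTEGER)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO review_effects VALUES (?, ?)",
    [(1, 2), (1, 5), (1, 8),
     (2, 2), (2, 5), (2, 9), (2, 3),
     (3, 3), (3, 2), (3, 9)],
)


def reviews_with_all(conn, effect_ids):
    """Review ids that have every effect in effect_ids."""
    wanted = sorted(set(effect_ids))
    if not wanted:
        return []
    # One "?" placeholder per selected effect; values are bound, not inlined.
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"""
        SELECT reviewid
        FROM review_effects
        WHERE effectid IN ({placeholders})
        GROUP BY reviewid
        HAVING COUNT(DISTINCT effectid) = ?
        ORDER BY reviewid
        """,
        (*wanted, len(wanted)),
    ).fetchall()
    return [r[0] for r in rows]


print(reviews_with_all(conn, [2, 5]))     # [1, 2]
print(reviews_with_all(conn, [2, 9, 3]))  # [2, 3]
```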

Don't understand how queries to retrieve top n records from each group work

I had an issue where I was trying to get the top 'n' records from each group (day) of records in my database. After a bunch of digging I found some great answers and they did in fact solve my problem.
How to select the first N rows of each group?
Get top n records for each group of grouped results
However, my noob-ness is preventing me from understanding exactly WHY these "counting" solutions work. If someone with better SQL knowledge can explain, that would be really great.
EDIT: here's more details
Let's say I had a table described below with this sample data. (To make things simpler, I have a column that kept track of the time of the next upcoming midnight, in order to group 'per day' better).
id | vote_time | time_of_midnight | name | votes_yay | votes_nay
------------------------------------------------------------------------
1 | a | b | Person p | 24 | 36
1 | a | b | Person q | 20 | 10
1 | a | b | Person r | 42 | 22
1 | c | d | Person p | 8 | 10
1 | c | d | Person s | 120 | 63
There can be tens or hundreds of "People" per day (b, d, ...)
id is some other column I needed in order to group by (you can think of it as an election id if that helps)
I'm trying to calculate the top 5 names that had the highest number of votes per day, in descending order. I was able to use the referenced articles to create a query that would give me the following results (on Oracle):
SELECT name, time_of_midnight, votes_yay, votes_nay, (votes_yay+votes_nay) AS total_votes
FROM results a
WHERE id=1 AND (
SELECT COUNT(*)
FROM results b
WHERE b.id=a.id AND b.time_of_midnight=a.time_of_midnight AND (a.votes_yay+a.votes_nay) >= (b.votes_yay+b.votes_nay)) <= 5
ORDER BY time_of_midnight DESC, total_votes DESC;
name | time_of_midnight | votes_yay | votes_nay | total_votes
------------------------------------------------------------------------
Person s | d | 120 | 63 | 183
Person p | d | 8 | 10 | 18
Person r | b | 42 | 22 | 64
Person p | b | 24 | 36 | 60
Person q | b | 20 | 10 | 30
So I'm not really sure
Why this counting method works?
[stupid]: Why don't I need to also include name in the inner query to make sure it doesn't join the data incorrectly?
Let's begin with the fact that your query is actually calculating top 5 names that had the lowest number of votes. To get the top 5 with the highest number, you'll need to change this condition:
(a.votes_yay+a.votes_nay) >= (b.votes_yay+b.votes_nay)
into this:
(a.votes_yay+a.votes_nay) <= (b.votes_yay+b.votes_nay)
or, perhaps, this (which is the same):
(b.votes_yay+b.votes_nay) >= (a.votes_yay+a.votes_nay)
(The latter form would seem to me preferable, but merely because it would be uniform with the other two comparisons which have a b column on the left-hand side and an a column on the right-hand side. That is perfectly irrelevant to the correctness of the logic.)
Logically, what's happening is this. For every row in results, the server will be looking for rows in the same table that match id and time_of_midnight of the given row and have the same or higher number of total votes than that in the given row. It will then count the found rows and check if the result is not greater than 5, i.e. if no more than 5 rows in the same (id, time_of_midnight) group have the same or higher number of votes as in the given row.
For example, if the given row happens to be one with the most votes in its group, the subquery will find only that same row (assuming there are no ties) and so the count will be 1. That is fewer than 5 – therefore, the given row will qualify for output.
If the given row is the second most voted item in its group, the subquery will find that same row and the top-voted item (again, assuming no ties), which gives a count of 2. Again, that satisfies the count <= 5 condition, and so the row will be returned in the output.
In general, if a row is ranked as # N in its group according to the total number of votes, it means there are N rows in that group where the vote number is the same or higher than the number in the given row (we are still assuming there are no ties). So, when you are counting votes in this way, you are effectively calculating the given row's ranking.
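To see the count turning into a rank, here is a small sketch with Python's sqlite3 module. To keep it short, the table is simplified: day stands in for time_of_midnight and votes holds the precomputed votes_yay + votes_nay totals from the question, so these column names are not the ones in the original schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (day TEXT, name TEXT, votes INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("b", "Person p", 60), ("b", "Person q", 30), ("b", "Person r", 64),
        ("d", "Person p", 18), ("d", "Person s", 183),
    ],
)

# For each row, count the rows of the same day with the same or more votes.
# Without ties, that count is the row's rank within its day.
ranked = conn.execute(
    """
    SELECT a.day, a.name,
           (SELECT COUNT(*)
            FROM results b
            WHERE b.day = a.day
              AND b.votes >= a.votes) AS rnk
    FROM results a
    ORDER BY a.day, rnk
    """
).fetchall()

for row in ranked:
    print(row)
# ('b', 'Person r', 1)
# ('b', 'Person p', 2)
# ('b', 'Person q', 3)
# ('d', 'Person s', 1)
# ('d', 'Person p', 2)
```

This also shows why name is not needed in the inner query: the subquery only counts competitors within the same group, it never has to identify the outer row itself.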
Now, if there are ties, you may get fewer results per group using this method. In fact, if a group had 6 or more rows tied at the maximum number of votes, you would get no rows for that group in the output, because the subquery would never return a count value less than 6.
That is because effectively all the top-voted items would be ranked as 6 (or whatever their number would be) rather than as 1. To rank them as 1 instead, you could try the following modification of the same query:
SELECT name, time_of_midnight, votes_yay, votes_nay, (votes_yay+votes_nay) AS total_votes
FROM results a
WHERE id=1 AND (
SELECT COUNT(*) + 1
FROM results b
WHERE b.id=a.id AND b.time_of_midnight=a.time_of_midnight
AND (b.votes_yay+b.votes_nay) > (a.votes_yay+a.votes_nay)) <= 5
ORDER BY time_of_midnight DESC, total_votes DESC;
Now the subquery will be looking only for rows with the higher number of votes than in the given row. The resulting count will be increased by 1 and that will be the given row's ranking (and the value to compare against 5).
So, if the vote totals were e.g. 10, 10, 8, 7 etc., the rankings would be calculated as 1, 1, 3, 4 etc. rather than as 2, 2, 3, 4 etc., as with the original version.
That, of course, means that the output might now have more than 5 rows per group. For instance, if votes were distributed as 10, 9, 8, 8, 8, 8, 6 etc., you would get 10, 9 and all the 8s (because the rankings would be 1, 2, 3, 3, 3, 3, 7...). To return exactly 5 names per group (assuming there are at least 5 of them), you'd probably need to consider a different method altogether.
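One possible way to get exactly N rows per group while staying with the counting approach is to break ties on a second column, so that no two rows share a rank. The sketch below (Python's sqlite3 module) does that with name as the tiebreaker. The simplified results table, the sample votes and the function top_n are invented for the example, and it assumes name is unique within a day. Databases with window functions can do the same thing with ROW_NUMBER():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (day TEXT, name TEXT, votes INTEGER)")
# Hypothetical votes 10, 9, 8, 8, 8, 8, 6 for a single day.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("d1", n, v) for n, v in zip("ABCDEFG", [10, 9, 8, 8, 8, 8, 6])],
)


def top_n(conn, n):
    """At most n rows per day; ties on votes are broken by name."""
    return conn.execute(
        """
        SELECT a.day, a.name, a.votes
        FROM results a
        WHERE (SELECT COUNT(*)
               FROM results b
               WHERE b.day = a.day
                 AND (b.votes > a.votes
                      OR (b.votes = a.votes AND b.name < a.name))) < ?
        ORDER BY a.day, a.votes DESC, a.name
        """,
        (n,),
    ).fetchall()


for row in top_n(conn, 5):
    print(row)
# ('d1', 'A', 10)
# ('d1', 'B', 9)
# ('d1', 'C', 8)
# ('d1', 'D', 8)
# ('d1', 'E', 8)
```

The subquery counts the rows strictly ahead of the current one, so the first five rows see counts 0 to 4 and the fourth row tied at 8 votes ('F') is cut off. Which of the tied rows gets dropped is decided purely by the tiebreaker, so pick one that makes sense for your data.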