Limit results to x groups - sql

I'm developing a system using Trac, and I want to limit the number of "changelog" entries returned. The issue is that Trac collates these entries from multiple tables using a union, and then later combines them into single 'changesets' based on their timestamp. I wish to limit the results to the latest e.g. 3 changesets, but this requires retrieving as many rows as necessary until I've got 3 unique timestamps. Solution needs to work for SQLite/Postgres.
Trac's current SQL
Current SQL Result
Time              User  Field        oldvalue  newvalue  permanent
=======================================================================
1371806593507544  a     owner        b         c         1
1371806593507544  a     comment      2         lipsum    1
1371806593507544  a     description  foo       bar       1
1371806593324529  b     comment      hello     world     1
1371806593125677  c     priority     minor     major     1
1371806592492812  d     comment      x         y         1
Intended SQL Result (Limited to 1 timestamp e.g.)
Time              User  Field        oldvalue  newvalue  permanent
=======================================================================
1371806593507544  a     owner        b         c         1
1371806593507544  a     comment      2         lipsum    1
1371806593507544  a     description  foo       bar       1

As you already pointed out on your own, this cannot be resolved in SQL due to the undetermined number of results. And I think this is not even required.
You can use a slightly modified trac/ticket/templates/ticket.html Genshi template to get what you want. Change
<div id="changelog">
<py:for each="change in changes">
into
<div id="changelog">
<py:for each="change in changes[-3:]">
and place the file into <env>/templates/, then restart your web server. But watch out for changes to ticket.html whenever you upgrade your Trac install: each time, you might need to re-apply this change to the current template of the respective version. IMHO it's still a lot faster and cleaner than patching Trac core code.

If you want just three records (as in the "Data Limit 1" result set), you can use limit:
select *
from t
order by time desc
limit 3
If you want all records for the three most recent time stamps, you can use a join:
select t.*
from t join
     (select distinct time
      from t
      order by time desc
      limit 3
     ) tt
     on tt.time = t.time
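The join approach can be tried out with a small sketch like the one below. It runs the query against an in-memory SQLite database through Python's sqlite3 module. The table name `t` comes from the answer; the column names, the helper `latest_changesets` and the flattened single-table layout are assumptions made for illustration, since Trac's real query is a union over several tables.

```python
import sqlite3

# Hypothetical flattened changelog table "t", mirroring the sample rows in the question.
ROWS = [
    (1371806593507544, "a", "owner", "b", "c", 1),
    (1371806593507544, "a", "comment", "2", "lipsum", 1),
    (1371806593507544, "a", "description", "foo", "bar", 1),
    (1371806593324529, "b", "comment", "hello", "world", 1),
    (1371806593125677, "c", "priority", "minor", "major", 1),
    (1371806592492812, "d", "comment", "x", "y", 1),
]


def latest_changesets(conn, n):
    """Return every row belonging to the n most recent distinct timestamps."""
    query = """
        SELECT t.*
        FROM t
        JOIN (SELECT DISTINCT time FROM t ORDER BY time DESC LIMIT ?) tt
          ON tt.time = t.time
        ORDER BY t.time DESC
    """
    return conn.execute(query, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (time INTEGER, author TEXT, field TEXT, "
    "oldvalue TEXT, newvalue TEXT, permanent INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)

one = latest_changesets(conn, 1)    # all rows of the newest changeset
three = latest_changesets(conn, 3)  # all rows of the three newest changesets
```

With the sample data, `one` holds the three rows that share the newest timestamp, which matches the intended result shown in the question.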

Related

Removing SQL Rows from Query if two rows have an identical ID but differences in the columns

I'm currently stuck on a SQL issue (mainly because I can't find a way to google it, and my SQL skills do not suffice to solve it myself).
I'm working on a system where documents are edited. Once the editing process is finished, users mark the document as solved. In the MSSQL database, the corresponding row is not updated; instead, a new row is inserted. Thus, every document that has been processed has (or at least should have) multiple rows in the DB.
See the following situation:
ID  ID2  AnotherCondition  Steps  Process  Solved
1   1    yes               Three  ATAT     AF
2   2    yes               One    ATAT     FR
2   3    yes               One    ATAT     EG
2   4    yes               One    ATAT     AF
3   5    no                One    ABAT     AF
4   6    yes               One    ATAT     FR
5   7    no                One    AVAT     EG
6   8    yes               Two    SATT     FR
6   9    yes               Two    SATT     EG
6   10   yes               Two    SATT     AF
I need to select the rows which have not been processed yet. A "processed" document has a "FR" in the "Solved" column. Sadly other versions of the document exist in the DB, with other codes in the "Solved" columns.
Now: If there is a row which has "FR" in the "Solved" column I need to remove every row with the same ID from my SELECT statement as well. Is this doable?
To achieve this, I have to remove the rows with the IDs 2, 4 (because the system sadly isn't too reliable, I guess) and 6 from my SELECT statement. Is this possible in general?
What I could do is filter out the duplicates afterwards, in python/js/whatever. But I am curious whether I can "remove" these rows directly in the SQL statement as well.
To rephrase it once more: How can I write a SELECT statement which returns only (in this example) the rows with the IDs 1, 3 and 5?
If you need to delete all rows whose id has no row with "Solved = 'no'", you can use a DELETE statement that spares every "id" value having at least one "Solved = 'no'" row.
DELETE FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'no');
Edit. If you need to use a SELECT statement, you can simply reverse the condition in the subquery:
SELECT *
FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'yes');
I'm not sure I understand your question correctly:
...every document that has been processed has [...] multiple rows in the DB
I need to find out which documents have not been processed yet
So it seems you need to find unique documents with no versions, this could be done using a GROUP BY with a HAVING clause:
SELECT
Id
FROM dbo.TableName
GROUP BY Id
HAVING COUNT(*) = 1
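For the sample table in the question, the NOT IN idea can be sketched as follows: exclude every ID that has at least one row with 'FR' in the Solved column. This is only an illustration run against an in-memory SQLite database through Python's sqlite3 module rather than MSSQL; the table name `tab` is borrowed from the first answer, and the column types are assumptions.

```python
import sqlite3

# Sample rows copied from the table in the question:
# (ID, ID2, AnotherCondition, Steps, Process, Solved)
ROWS = [
    (1, 1, "yes", "Three", "ATAT", "AF"),
    (2, 2, "yes", "One", "ATAT", "FR"),
    (2, 3, "yes", "One", "ATAT", "EG"),
    (2, 4, "yes", "One", "ATAT", "AF"),
    (3, 5, "no", "One", "ABAT", "AF"),
    (4, 6, "yes", "One", "ATAT", "FR"),
    (5, 7, "no", "One", "AVAT", "EG"),
    (6, 8, "yes", "Two", "SATT", "FR"),
    (6, 9, "yes", "Two", "SATT", "EG"),
    (6, 10, "yes", "Two", "SATT", "AF"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (ID INTEGER, ID2 INTEGER, AnotherCondition TEXT, "
    "Steps TEXT, Process TEXT, Solved TEXT)"
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Keep only rows whose ID never appears together with Solved = 'FR'.
unprocessed = conn.execute(
    """
    SELECT *
    FROM tab
    WHERE ID NOT IN (SELECT ID FROM tab WHERE Solved = 'FR')
    ORDER BY ID2
    """
).fetchall()

unprocessed_ids = sorted({row[0] for row in unprocessed})
```

If the ID column can contain NULLs, a NOT EXISTS subquery is usually the safer form, because NOT IN returns no rows once the subquery yields a NULL.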

In Sql Server 2014 ORDER BY clause with OFFSET FETCH NEXT returns weird results

I am currently using SQL Server 2014 Professional, version 12.0.4100. I have a view and I am trying to SELECT 10 rows at a specific offset. My view looks like this:
BeginTime | EndTime | Duration | Name
09:00:00.0000000|16:00:00.0000000| 1 | some_name1
09:00:00.0000000|16:00:00.0000000| 2 | some_name2
09:00:00.0000000|16:00:00.0000000| 3 | some_name3
09:00:00.0000000|16:00:00.0000000| 4 | some_name4
09:00:00.0000000|16:00:00.0000000| 5 | some_name5
09:00:00.0000000|16:00:00.0000000| 6 | some_name6
09:00:00.0000000|16:00:00.0000000| 7 | some_name7
there are 100 rows like these and all have the exact same value in BeginTime and EndTime. Duration is incremented from 1 to 100 in related table. If query is only:
SELECT * FROM View_Name
ResultSet is correct. I can understand it by checking the duration column.
If I fetch only 10 rows starting from offset 0, the result set is correct, and it stays correct for offsets up to 18. When I fetch 10 rows starting from offset 19 or higher, the Duration values in the result set look unrelated, as if Duration were reversed, and the query never returns rows with a Duration greater than 11.
The query that I used to fetch specific rows is as follows:
SELECT * FROM View_Name ORDER BY BeginTime ASC OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
There is also something strange about this situation: if I specify USE master, the problem disappears, but if I specify USE [mydb_name], it appears again. By the way, on my local PC I am using SQL Server 2014 Professional (v12.0.2269), and there the problem does not occur.
PS: I can not use USE master because, I am creating and listing the view dynamically, in Stored Procedures. Any help, answer or comment will be accepted. Thank You!
The documentation explains:
To achieve stable results between query requests using OFFSET and
FETCH, the following conditions must be met:
. . .
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
What happens in your case is that BeginTime is not unique. Databases in general -- and SQL Server in particular -- do not implement stable sorts. A stable sort is one where the rows are in the same order when the keys are the same. This is rather obvious, because tables and result sets represent unordered sets. They have no inherent ordering.
So, you need a unique key to make the sort stable. Given your data, this would seem to be either duration, name, or both:
SELECT *
FROM View_Name
ORDER BY BeginTime ASC, Duration, Name
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
Your ORDER BY should be unique; otherwise you will get nondeterministic results (in your case, BeginTime is not unique, so you are not guaranteed to get the same results every time). Try changing your query to the one below to make the ordering unique:
SELECT * FROM View_Name ORDER BY duration OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
In addition, the result set of your first query (SELECT * FROM View_Name) is not guaranteed to come back in the same order every time unless you add an outer ORDER BY.
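The effect of a unique tie-breaker can be sketched with a small paging helper. The example below is an illustration only: it uses an in-memory SQLite database through Python's sqlite3 module, so the paging clause is SQLite's LIMIT ... OFFSET rather than SQL Server's OFFSET ... FETCH, and the table `view_name` and helper `fetch_page` are hypothetical stand-ins for the view in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the view: 100 rows with identical BeginTime/EndTime.
conn.execute(
    "CREATE TABLE view_name (BeginTime TEXT, EndTime TEXT, Duration INTEGER, Name TEXT)"
)
conn.executemany(
    "INSERT INTO view_name VALUES (?, ?, ?, ?)",
    [("09:00:00", "16:00:00", d, f"some_name{d}") for d in range(1, 101)],
)


def fetch_page(conn, offset, size):
    """Page through the rows using a sort key that is unique per row."""
    query = """
        SELECT Duration
        FROM view_name
        ORDER BY BeginTime ASC, Duration ASC  -- Duration breaks the ties
        LIMIT ? OFFSET ?
    """
    return [row[0] for row in conn.execute(query, (size, offset))]


page = fetch_page(conn, 20, 10)
```

Because the combination (BeginTime, Duration) is unique here, `page` always contains the durations 21 through 30, no matter how the engine chooses to execute the sort.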

Order on basis of multiple columns

Suppose I have table project which has following fields:
number of issue
number of comments
number of followers
created_at
I want to sort project on basis of all the fields, however each field has different precedence. A project with higher issue should be higher up even if it has lesser number of comments or followers.
Assume order of precedence to be:
issue > followers > comments > created_at
I can't use something like:
Select * from Projects ORDER BY Issue, Followers, Comments, Created_at
This would first order by issues and then solve conflicts on basis of followers and so on. For ex: I would want a project with 5 issues and 10 comments to be placed lower than one with 3 issues but 50 comments.
I guess I would need to use some multiplicative factor to scale everything in proportion. However, I can't figure out details.
Assume: 1 issue = 2 followers = 4 comments = 1 week old created at time
Something like this perhaps?
Select * from Projects
ORDER BY Issue * 5 + Followers * 3 + Comments DESC, Created_at
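As a rough sketch of the weighted-score idea, the ratios stated in the question (1 issue = 2 followers = 4 comments) translate into the weights 4, 2 and 1. The example below runs against an in-memory SQLite database through Python's sqlite3 module; the `name` column and the two sample projects are assumptions for illustration, and the age term (1 week of age per issue) is left out, with Created_at used only as a tie-breaker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Projects (name TEXT, Issue INTEGER, Followers INTEGER, "
    "Comments INTEGER, Created_at TEXT)"
)
# Two hypothetical projects taken from the example in the question.
conn.executemany(
    "INSERT INTO Projects VALUES (?, ?, ?, ?, ?)",
    [
        ("five_issues", 5, 0, 10, "2013-01-01"),   # score 5*4 + 0*2 + 10 = 30
        ("three_issues", 3, 0, 50, "2013-01-01"),  # score 3*4 + 0*2 + 50 = 62
    ],
)

# Weights follow "1 issue = 2 followers = 4 comments": 4, 2 and 1.
ranked = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM Projects
        ORDER BY (Issue * 4 + Followers * 2 + Comments) DESC, Created_at DESC
        """
    )
]
```

Here the project with 3 issues and 50 comments ranks above the one with 5 issues and 10 comments, which is the behaviour the question asks for.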

What is the best way to reassign ordinal number of a move operation

I have a column in the sql server called "Ordinal" that is used to indicate the display order of the rows. It starts from 0 and skips 10 for the next row. so we have something like this:
Id Ordinal
1 0
2 20
3 10
It skips 10 because we wanted to be able to move an item in between other items (based on ordinal) without having to reassign the ordinal numbers for the entire table.
As you can imagine, the ordinal numbers will eventually need to be reassigned for a move-in-between operation, either on the surrounding rows or for the entire table, once the unused ordinal numbers between the target items are all used up.
Is there an algorithm I can use to reassign the ordinal numbers effectively for the move operation, taking into consideration the long-term maintainability of the table and minimizing update operations on it?
You can re-number the sequences using a somewhat complicated UPDATE statement:
UPDATE u
SET u.sequence = 10 * (c.num_below-1)
FROM test u
JOIN (
SELECT t.id, count(*) AS num_below
FROM test t
JOIN test tr ON tr.sequence <= t.sequence
GROUP BY t.id
) c ON c.id=u.id
The idea is to count the items whose sequence is less than or equal to that of the current row, subtract one, multiply the result by ten, and assign it as the new sequence.
The content of test before the UPDATE:
ID Sequence
__ ________
1 0
2 20
3 10
4 12
The content of test after the UPDATE:
ID Sequence
__ ________
1 0
2 30
3 10
4 20
Now the sequence numbers are evenly spread again, so you can continue inserting in the middle until you run out of new sequence numbers; then you can re-number again.
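The renumbering step can also be sketched outside SQL Server. The example below does not use the UPDATE ... FROM ... JOIN statement above; instead it reads the ids in sequence order and writes the new values back, which is a plainly different (and simpler) technique chosen so that it runs on an in-memory SQLite database through Python's sqlite3 module. The starting values and the helper `renumber` are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, sequence INTEGER)")
# Assumed starting state: id 4 was squeezed in between 10 and 20.
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, 0), (2, 20), (3, 10), (4, 12)])


def renumber(conn, step=10):
    """Spread the sequence values evenly again, keeping the current order."""
    # Read the ids in display order first, so the updates cannot
    # influence the ranking while it is being computed.
    ids = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY sequence, id")]
    conn.executemany(
        "UPDATE test SET sequence = ? WHERE id = ?",
        [(step * position, row_id) for position, row_id in enumerate(ids)],
    )


renumber(conn)
after = conn.execute("SELECT id, sequence FROM test ORDER BY id").fetchall()
```

After the call, the rows keep their relative order and the sequences are 0, 10, 20, 30 again, leaving room for further inserts in between.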
These won't answer your question directly--I just thought I might suggest some other approaches:
One possibility--don't try to do it by hand. Have your software manage the numbers. If they need re-writing, just save them with new numbers.
A second--use a "Linked List" instead. In each record, store the index of the next record you want displayed, then have your code load that directly into a linked list.
Yet another simple approach. Let's say you're inserting a new record with an ordinal equal x.
First, check whether a row with an ordinal value equal to x already exists. If it does, update all records whose ordinal value is equal to or bigger than x, increasing them by y. Then you are safe to insert the new record.
This way you don't run an update on every insert, and of course you keep the order.
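The shift-then-insert approach described last might look like the sketch below. It is an illustration under assumptions: the table `items`, its columns and the helper `insert_at` are hypothetical names, y defaults to 10, and the code runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the question does not name it.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, ordinal INTEGER)")
conn.executemany("INSERT INTO items (ordinal) VALUES (?)", [(0,), (10,), (20,)])


def insert_at(conn, x, y=10):
    """Insert a new row at ordinal x, shifting later rows only if x is taken."""
    taken = conn.execute("SELECT 1 FROM items WHERE ordinal = ?", (x,)).fetchone()
    if taken:
        # Make room: push every row at or after x up by y.
        conn.execute("UPDATE items SET ordinal = ordinal + ? WHERE ordinal >= ?", (y, x))
    cursor = conn.execute("INSERT INTO items (ordinal) VALUES (?)", (x,))
    return cursor.lastrowid


insert_at(conn, 15)  # slot is free, so no other row is touched
insert_at(conn, 10)  # slot is taken, so rows at 10, 15 and 20 move up by 10

ordinals = [row[0] for row in conn.execute("SELECT ordinal FROM items ORDER BY ordinal")]
ids_in_order = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY ordinal")]
```

The trade-off is that a collision can touch many rows at once, whereas inserts into a free slot cost a single statement.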

DB2 query to get next available number in table

I have a table with few columns and I want to achieve the following functionality using DB2 query.
say for e.g. USR table has User ID column and Option ID column
USER ID OPTION ID
1 1
1 5
1 22
1 100
1 999
I want to write a query and result should be next available number in sequence.
So when the first time query will be executed, it should return me the next
available option ID as 2, so user will enter #2, so DB would have now
USER ID OPTION ID
1 1
1 2
1 5
1 22
1 100
1 999
so now when the query will be executed, it will show me available Option ID as 3.
Can somebody help to get the optimized query to get the correct results?
Please note that I think that exposing option_id to the user is a terrible idea, business requirement or no. Surrogate id's like this are meant to be completely hidden from the end user ('natural' keys, like credit-card numbers, obviously have to be exposed, but still shouldn't be dictated in this manner).
The following should work on any version of DB2:
SELECT a.optionid + :nextIncrement as next_value
FROM Usr as a
LEFT JOIN Usr as b
ON b.userid = a.userid
AND b.optionid = a.optionid + :nextIncrement
WHERE a.userid = :userId
AND b.userid IS NULL
ORDER BY a.optionid ASC
FETCH FIRST 1 ROW ONLY
(statement run against a local table on my iSeries instance, with host variables replaced)
Again, I strongly recommend you not use this, and see about getting the business requirement changed.
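To see how the gap-finding join behaves, here is a small sketch that replays the example from the question. It is not DB2: it uses an in-memory SQLite database through Python's sqlite3 module, so FETCH FIRST 1 ROW ONLY becomes LIMIT 1, and the helper `next_option_id` and the column types are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Usr (userid INTEGER, optionid INTEGER)")
conn.executemany(
    "INSERT INTO Usr VALUES (?, ?)",
    [(1, 1), (1, 5), (1, 22), (1, 100), (1, 999)],
)


def next_option_id(conn, user_id, increment=1):
    """Return the first optionid + increment that is not yet used by the user."""
    query = """
        SELECT a.optionid + :inc AS next_value
        FROM Usr AS a
        LEFT JOIN Usr AS b
               ON b.userid = a.userid
              AND b.optionid = a.optionid + :inc
        WHERE a.userid = :uid
          AND b.userid IS NULL      -- no row exists at optionid + increment
        ORDER BY a.optionid ASC
        LIMIT 1
    """
    row = conn.execute(query, {"inc": increment, "uid": user_id}).fetchone()
    return row[0] if row else None


first = next_option_id(conn, 1)
conn.execute("INSERT INTO Usr VALUES (1, ?)", (first,))
second = next_option_id(conn, 1)
```

With the sample data, the first call finds 2 and, once 2 has been stored, the second call finds 3, as described in the question.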