SQL: get next entry from known ID in a single query

I was wondering how I would get the data for the next row in an SQL database, assuming I know the ID of the current entry and the table is ordered by ID.
Normally, when ordering by ID, one would think that to get the previous/next entry you just need to subtract/add 1 to the variable holding the ID and run the SELECT query with the new ID, but this breaks when there are holes in the table, with IDs like so:
13,14,18,21...
And so on.
One way to do it would be to loop in your programming language, running a query and adding 1 each time until it finds a row, but that could be taxing on the database. Is there a way to find it in a single query?

This seemed like a plausible problem that others might run into as well, so I thought I'd share my solution here!
To solve it, I would write a query WHERE the id is greater (for the next entry) or less (for the previous entry) than the current one, like so:
SELECT *
FROM myTable t
WHERE t.id > 27
ORDER BY t.id
LIMIT 1
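By ordering by id and limiting the result to 1 row, you are guaranteed to get the entry that comes right after 27 (or no row at all if 27 is the highest id). For the previous entry, use t.id < 27 with ORDER BY t.id DESC.
This should also work for orderings by date, as long as the date values are unique.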
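A minimal sketch of the same pattern, runnable with Python's built-in sqlite3 module. The table contents are made up to reproduce the gaps from the question, and the previous-row variant flips both the comparison and the sort direction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, label TEXT)")
# IDs with holes, as in the question: 13, 14, 18, 21
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(13, "a"), (14, "b"), (18, "c"), (21, "d")])

def neighbour_ids(current_id):
    """Return (previous_id, next_id); either is None when no such row exists."""
    # Next: the smallest id greater than the current one.
    nxt = conn.execute(
        "SELECT id FROM myTable WHERE id > ? ORDER BY id LIMIT 1",
        (current_id,)).fetchone()
    # Previous: flip both the comparison and the sort direction.
    prv = conn.execute(
        "SELECT id FROM myTable WHERE id < ? ORDER BY id DESC LIMIT 1",
        (current_id,)).fetchone()
    return (prv[0] if prv else None, nxt[0] if nxt else None)

print(neighbour_ids(14))  # (13, 18)
print(neighbour_ids(21))  # (18, None)
```

With an index on id (here the primary key), each lookup is a single index seek, no matter how large the gap is.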

How about this:
Select MIN(myTable.Id)
FROM myTable
WHERE myTable.Id > 27

Get the next id number with min(). The next id after, say, 21 would be given by this query.
select min(test_id) as next_test_id
from test
where test_id > 21
Join that to the original table to get the row for that id number.
select *
from test
inner join (select min(test_id) as next_test_id
from test
where test_id > 21 ) t2
on test.test_id = t2.next_test_id
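The MIN()-plus-join version can be checked the same way. This sketch assumes a hypothetical test table in an in-memory SQLite database. Note the edge case: when there is no larger id, MIN() returns NULL, the join then matches nothing, and the query returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (test_id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(13, "a"), (14, "b"), (18, "c"), (21, "d"), (30, "e")])

# The derived table t2 finds the next id; the join fetches the full row for it.
NEXT_ROW_SQL = """
SELECT test.*
FROM test
INNER JOIN (SELECT MIN(test_id) AS next_test_id
            FROM test
            WHERE test_id > ?) t2
  ON test.test_id = t2.next_test_id
"""

def next_row(current_id):
    return conn.execute(NEXT_ROW_SQL, (current_id,)).fetchone()

print(next_row(21))  # (30, 'e')
print(next_row(30))  # None: MIN() over no rows is NULL, so the join matches nothing
```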

Related

SELECT DISTINCT returns more rows than expected

I have read many answers here, but so far nothing has helped me. I'm developing a ticket system, where each ticket has many updates.
I have 2 tables: tb_ticket and tb_updates.
I wrote a SELECT with subqueries, and it took a long time (about 25 seconds) to get about 1000 rows. Now I have changed it to use INNER JOINs instead of many SELECTs in subqueries, and it is really fast (70 ms), but now I get duplicate tickets. How can I get only the last row per ticket (ordering by time)?
My current result is:
...
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
The first column is the ticket ID, the second is the update ID... I would like to get only one row per ticket ID, but DISTINCT does not work in this case. Which row? Always the latest one, so in this case the one from 2014-08-26 10:40:21.
UPDATE:
It is a PostgreSQL database. I did not share my current query because it uses only Portuguese names, so I don't think it would help.
SOLUTION:
Used_By_Already had the best solution to my problem.
Without the details of your tables one has to guess the field names, but it seems that tb_updates has many records for a single record in tb_ticket (a many-to-one relationship).
A generic solution to your problem - to get just the "latest" record - is to use a subquery on tb_updates (see alias mx below) and then join that back to tb_updates so that only the record that has the latest date is chosen.
SELECT
t.*
, u.*
FROM tb_ticket t
INNER JOIN tb_updates u
ON t.ticket_id = u.ticket_id
INNER JOIN (
SELECT
ticket_id
, MAX(updated_at) max_updated
FROM tb_updates
GROUP BY
ticket_id
) mx
ON u.ticket_id = mx.ticket_id
AND u.updated_at = mx.max_updated
;
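Here is a small SQLite sketch of the same query, using the two sample rows from the question plus one made-up ticket. The column names (ticket_id, update_id, updated_at, status, company) are guesses, just as in the answer above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_ticket  (ticket_id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE tb_updates (update_id INTEGER PRIMARY KEY, ticket_id INTEGER,
                         updated_at TEXT, status TEXT);
INSERT INTO tb_ticket VALUES (67355, 'COMPANY X'), (67400, 'COMPANY Y');
INSERT INTO tb_updates VALUES
  (69759, 67355, '2014-08-22 09:40:21', 'OPEN'),
  (69771, 67355, '2014-08-26 10:40:21', 'UPDATE'),
  (69800, 67400, '2014-08-27 08:00:00', 'OPEN');
""")

# mx holds the latest timestamp per ticket; joining on it keeps only that update.
LATEST_SQL = """
SELECT t.ticket_id, u.update_id, t.company, u.updated_at, u.status
FROM tb_ticket t
INNER JOIN tb_updates u
   ON t.ticket_id = u.ticket_id
INNER JOIN (SELECT ticket_id, MAX(updated_at) AS max_updated
            FROM tb_updates
            GROUP BY ticket_id) mx
   ON u.ticket_id = mx.ticket_id
  AND u.updated_at = mx.max_updated
ORDER BY t.ticket_id
"""

for row in conn.execute(LATEST_SQL):
    print(row)
```

If two updates of the same ticket share exactly the same updated_at, both rows come back; add a tie-breaker (for example the update id) if that can happen.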
If your DBMS supports ROW_NUMBER(), then using that function can be a very effective alternative, but you haven't told us which DBMS you are using.
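For reference, a sketch of the ROW_NUMBER() alternative, again in SQLite (window functions need SQLite 3.25 or later; PostgreSQL has had them since 8.4). The column names are the same guesses as above, and the update_id tie-breaker in ORDER BY is an addition that guarantees exactly one row per ticket:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_updates (update_id INTEGER PRIMARY KEY, ticket_id INTEGER,
                         updated_at TEXT, status TEXT);
INSERT INTO tb_updates VALUES
  (69759, 67355, '2014-08-22 09:40:21', 'OPEN'),
  (69771, 67355, '2014-08-26 10:40:21', 'UPDATE'),
  (69800, 67400, '2014-08-27 08:00:00', 'OPEN');
""")

# Number each ticket's updates newest-first, then keep only row number 1.
LATEST_SQL = """
SELECT ticket_id, update_id, updated_at, status
FROM (SELECT u.*,
             ROW_NUMBER() OVER (PARTITION BY ticket_id
                                ORDER BY updated_at DESC, update_id DESC) AS rn
      FROM tb_updates u) ranked
WHERE rn = 1
ORDER BY ticket_id
"""

print(conn.execute(LATEST_SQL).fetchall())
```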
By the way, these rows ARE distinct:
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
69759 is different from 69771, and that alone is enough for the 2 rows to be DISTINCT.
The 2 dates differ as well.
DISTINCT is a row operator, which means it considers the entire row, not just the first column, when deciding which rows are unique.
Used_By_Already's solution would work just fine. I'm not sure about the performance, but another solution would be to use CROSS APPLY, though that is limited to only a few DBMSs.
SELECT *
FROM tb_ticket ticket
CROSS APPLY (
SELECT TOP (1) *
FROM tb_updates details
WHERE details.ticketID = ticket.ticketID
ORDER BY updateTime DESC
) updates
You can try something like the following if your updateid is an identity column:
Select ticketid, max(updateid) from tb_updates
group by ticketid
To obtain the last row, end your query with ORDER BY time DESC and use TOP (1) in the select list to return only the first row of the result (note that this gives one row in total, not one row per ticket).
Example:
select TOP (1) .....
from .....
where .....
order by time desc

Identify if at least one row with given condition exists

Employee table has ID and NAME columns. Names can be repeated. I want to find out if there is at least one row with name like 'kaushik%'.
So the query should return true/false or 1/0.
Is it possible to find this using a single query?
If we try something like
select count(1) from employee where name like 'kaushik%'
In this case it does not return true/false.
Also, we are iterating over all the records in the table. Is there a way in plain SQL to stop checking further records as soon as the first record that satisfies the condition is fetched?
Or can such a thing only be handled in a PL/SQL block?
EDIT:
The first approach provided by Justin looks like the correct answer:
SELECT COUNT(*) FROM employee WHERE name like 'kaushik%' AND rownum = 1
Commonly, you'd express this as either
SELECT COUNT(*)
FROM employee
WHERE name like 'kaushik%'
AND rownum = 1
where the rownum = 1 predicate allows Oracle to stop looking as soon as it finds the first matching row or
SELECT 1
FROM dual
WHERE EXISTS( SELECT 1
FROM employee
WHERE name like 'kaushik%' )
where the EXISTS clause allows Oracle to stop looking as soon as it finds the first matching row.
The first approach is a bit more compact but, to my eye, the second approach is a bit more clear since you really are looking to determine whether a particular row exists rather than trying to count something. But the first approach is pretty easy to understand as well.
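SQLite has neither rownum nor dual, so this sketch translates the two approaches: LIMIT 1 stands in for rownum = 1, and SELECT EXISTS (...) needs no FROM clause. The employee rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee (name) VALUES (?)",
                 [("kaushik a",), ("kaushik b",), ("justin",)])

def name_exists(prefix):
    # EXISTS stops at the first matching row and always yields one row: 1 or 0.
    # (SQLite's LIKE is case-insensitive for ASCII by default; Oracle's is not.)
    sql = "SELECT EXISTS (SELECT 1 FROM employee WHERE name LIKE ?)"
    return conn.execute(sql, (prefix + "%",)).fetchone()[0]

def name_exists_limit(prefix):
    # Counterpart of COUNT(*) ... AND rownum = 1: cap the inner query at one row.
    sql = "SELECT COUNT(*) FROM (SELECT 1 FROM employee WHERE name LIKE ? LIMIT 1)"
    return conn.execute(sql, (prefix + "%",)).fetchone()[0]

print(name_exists("kaushik"), name_exists("nobody"))  # 1 0
```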
How about:
select max(case when name like 'kaushik%' then 1 else 0 end)
from employee
Or, what might be more efficient, since a LIKE with a constant prefix can use an index:
select count(x)
from (select 1 as x
from employee
where name like 'kaushik%'
) t
where rownum = 1
Since you require that the SQL query return 1 or 0, you can try the following query:
select count(1) from dual
where exists(SELECT 1
FROM employee
WHERE name like 'kaushik%')
Because the query above uses EXISTS, it will scan the employee table and, as soon as it encounters the first record whose name matches 'kaushik%', return 1 (without scanning the rest of the table). If none of the records match, it will return 0.
select 1
where exists ( select name
from employee
where name like 'kaushik%'
)

Assistance with SQL statement

I'm using SQL Server 2005 and ASP.NET with C#.
I have a Users table with
userId(int),
userGender(tinyint),
userAge(tinyint),
userCity(tinyint)
(simplified version of course)
For the userID I pass to the query, I always need to select two matching users: of the opposite gender, in the age range -5 to +10 years, and from the same city.
The important fact is that it must always be two, so I added a condition: if @@ROWCOUNT < 2, re-select without the age and city filters.
Now the problem is that I sometimes get two result sets back when I run the query, because the first SELECT on the table (the one I check @@ROWCOUNT against) returns its rows too.
Will it be a problem to use the DataReader object to always read from the second result set? Is there any other way to check how many rows were selected without performing a SELECT that returns results?
Can you simplify it by using SELECT TOP 2?
Update: I would perform both selects every time, UNION the results, and then select from them based on an order (using SELECT TOP 2), as the union may have produced more than two rows. It's important that this final select picks the rows in order of importance, i.e. it prefers rows from your first select.
Alternatively, have the reader logic read the next result-set if there is one and leave the SQL alone.
To avoid getting two separate result sets, you can do your first SELECT into a table variable and then do your @@ROWCOUNT check. If it is >= 2, just select from the table variable on its own; otherwise select the contents of the table variable UNION ALLed with the results of the second query.
Edit: There is a slight overhead to using table variables, so you'd need to weigh whether this is cheaper than Adam's suggestion of always performing the UNION, by looking at the execution stats for both approaches with:
SET STATISTICS IO ON
Would something along the following lines be of use?
SELECT TOP (2) *
FROM (SELECT TOP (2) 1 AS prio, M2.*
      FROM my_table M1
      JOIN my_table M2 ON M1.userGender <> M2.userGender
      WHERE M1.userID = @supplied_user_id AND
            M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10 AND
            M1.userCity = M2.userCity
      UNION ALL
      SELECT TOP (2) 2 AS prio, M2.*
      FROM my_table M1
      JOIN my_table M2 ON M1.userGender <> M2.userGender
      WHERE M1.userID = @supplied_user_id AND
            -- exclude rows the first branch already covers
            NOT (M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10 AND
                 M1.userCity = M2.userCity)) AS candidates
ORDER BY prio;
I haven't tried it as I have no SQL Server and there may be dialect issues.
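The prioritised-UNION idea can be tried out in SQLite, which uses LIMIT instead of TOP. This sketch assumes a hypothetical users table with the simplified columns from the question and made-up rows; the second branch excludes rows the first branch already matches, so the same user cannot be returned twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (userId INTEGER PRIMARY KEY, userGender INTEGER,
                                    userAge INTEGER, userCity INTEGER)""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, 0, 30, 1),  # the user we search matches for
    (2, 1, 28, 1),  # opposite gender, in age range, same city: preferred
    (3, 1, 50, 1),  # opposite gender, but too old: fallback
    (4, 1, 33, 2),  # opposite gender, but another city: fallback
    (5, 0, 30, 1),  # same gender: never returned for user 1
])

TWO_MATCHES_SQL = """
SELECT userId FROM (
  SELECT 1 AS prio, M2.userId
  FROM users M1 JOIN users M2 ON M2.userGender <> M1.userGender
  WHERE M1.userId = :uid
    AND M2.userCity = M1.userCity
    AND M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10
  UNION ALL
  SELECT 2 AS prio, M2.userId
  FROM users M1 JOIN users M2 ON M2.userGender <> M1.userGender
  WHERE M1.userId = :uid
    AND NOT (M2.userCity = M1.userCity
             AND M2.userAge BETWEEN M1.userAge - 5 AND M1.userAge + 10)
)
ORDER BY prio, userId
LIMIT 2
"""

def two_matches(uid):
    return [row[0] for row in conn.execute(TWO_MATCHES_SQL, {"uid": uid})]

print(two_matches(1))  # [2, 3]
```

Because everything happens in one statement, the client only ever sees a single result set.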

Efficient SQL to count an occurrence in the latest X rows

For example I have:
create table a (i int);
Assume there are 10k rows.
I want to count 0's in the last 20 rows.
Something like:
select count(*) from (select i from a limit 20) where i = 0;
Is that possible to make it more efficient? Like a single SQL statement or something?
PS. DB is SQLite3 if that matters at all...
UPDATE
PPS. There is no need to group by anything in this instance; assume the table is literally 1 column (plus, presumably, the internal DB row_ID or something). I'm just curious whether this is possible without the nested selects.
You'll need to order by something in order to determine the last 20 rows. When you say last, do you mean by date, by ID, ...?
Something like this should work, where j is whatever column defines the order:
select count(*)
from (
select i
from a
order by j desc
limit 20
) where i = 0;
If you do not remove rows from the table, you may try the following hacky query:
SELECT COUNT(*) as cnt
FROM A
WHERE
ROWID > (SELECT MAX(ROWID)-20 FROM A)
AND i=0;
It operates with ROWIDs only. As the documentation says: Rows are stored in rowid order.
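Both queries can be compared in SQLite with a made-up table of 100 rows in which every third value is 0. Ordering by rowid assumes that "last" means "most recently inserted":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (i INT)")
# 100 rows (rowids 1..100); every third value is 0 (hypothetical data).
conn.executemany("INSERT INTO a (i) VALUES (?)", [(n % 3,) for n in range(100)])

# Order explicitly by rowid so "last 20" means the 20 most recently inserted rows.
ORDERED_SQL = """
SELECT COUNT(*) FROM (SELECT i FROM a ORDER BY rowid DESC LIMIT 20)
WHERE i = 0
"""

# The rowid-range variant; only valid while no rows have been deleted.
ROWID_SQL = """
SELECT COUNT(*) FROM a
WHERE rowid > (SELECT MAX(rowid) - 20 FROM a) AND i = 0
"""

print(conn.execute(ORDERED_SQL).fetchone()[0])  # 7
print(conn.execute(ROWID_SQL).fetchone()[0])    # 7
```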
You need to remember to use ORDER BY when you use LIMIT; otherwise the result is indeterminate. To get the latest rows added, you need to include a column with the insertion date, and then you can order by that. Without this column you cannot guarantee that you will get the latest rows.
To make it efficient you should ensure that there is an index on the column you order by, possibly even a clustered index.
I'm afraid that you need a nested select to be able to count and restrict to last X rows at a time, because something like this
SELECT count(*) FROM a GROUP BY i HAVING i = 0
will count 0's, but in ALL table records, because a LIMIT in this query will basically have no effect.
However, you can optimize by using COUNT(i), as it is faster to COUNT only one field than 2 or more (in this case your table will have 2 fields, i and rowid, which SQLite creates automatically in tables without a primary key).

How should I handle "ranked x out of y" data in PostgreSQL?

I have a table that I would like to be able to present "ranked X out of Y" data for. In particular, I'd like to be able to present that data for an individual row in a relatively efficient way (i.e. without selecting every row in the table). The ranking itself is quite simple, it's a straight ORDER BY on a single column in the table.
Postgres seems to present some unique challenges in this regard; AFAICT it doesn't have a RANK or ROW_NUMBER or equivalent function (at least in 8.3, which I'm stuck on for the moment). The canonical answer in the mailing list archives seems to be to create a temporary sequence and select from it:
test=> create temporary sequence tmp_seq;
CREATE SEQUENCE
test=*> select nextval('tmp_seq') as row_number, col1, col2 from foo;
It seems like this solution still won't help when I want to select just a single row from the table (and I want to select it by PK, not by rank).
I could denormalize and store the rank in a separate column, which makes presenting the data trivial, but just relocates my problem. UPDATE doesn't support ORDER BY, so I'm not sure how I'd construct an UPDATE query to set the ranks (short of selecting every row and running a separate UPDATE for each row, which seems like way too much DB activity to trigger every time the ranks need updating).
Am I missing something obvious? What's the Right Way to do this?
EDIT: Apparently I wasn't clear enough. I'm aware of OFFSET/LIMIT, but I don't see how it helps solve this problem. I'm not trying to select the Xth-ranked item, I'm trying to select an arbitrary item (by its PK, say), and then be able to display to the user something like "ranked 43rd out of 312."
If you want the rank, do something like
SELECT id,num,rank FROM (
SELECT id,num,rank() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4
Or if you actually want the row number, use
SELECT id,num,row_number FROM (
SELECT id,num,row_number() OVER (ORDER BY num) FROM foo
) AS bar WHERE id=4
They'll differ when you have equal values somewhere. There is also dense_rank() if you need that.
This requires PostgreSQL 8.4 or later.
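The same queries run in SQLite 3.25+ as well. This sketch uses a made-up foo table with one tie, and adds COUNT(*) OVER () so that a single query yields both halves of "ranked X out of Y"; the extra id in the row_number() ordering is only there to make it deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, num INTEGER)")
# ids 2 and 3 share num = 20, so rank() and row_number() disagree on id 3.
conn.executemany("INSERT INTO foo VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 20), (4, 30), (5, 40)])

# Window functions are computed over the whole table inside the subquery;
# the outer WHERE then picks the single row by its primary key.
RANK_SQL = """
SELECT id, num, rnk, rn, drnk, total FROM (
  SELECT id, num,
         rank()       OVER (ORDER BY num)     AS rnk,
         row_number() OVER (ORDER BY num, id) AS rn,
         dense_rank() OVER (ORDER BY num)     AS drnk,
         COUNT(*)     OVER ()                 AS total
  FROM foo
) AS bar
WHERE id = ?
"""

def ranked(row_id):
    return conn.execute(RANK_SQL, (row_id,)).fetchone()

print(ranked(4))  # (4, 30, 4, 4, 3, 5): "ranked 4th out of 5"
```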
Isn't it just this:
SELECT *
FROM mytable
ORDER BY
col1
OFFSET X LIMIT 1
Or am I missing something?
Update:
If you want to show the rank, use this:
SELECT q.*, q.vals[1] + 1 AS rank, q.vals[2] AS total
FROM (
SELECT mo.*, (
SELECT ARRAY[SUM(((mi.col1, mi.ctid) < (mo.col1, mo.ctid))::INTEGER), COUNT(*)]
FROM mytable mi
) AS vals
FROM mytable mo
WHERE mo.id = $1 -- $1: id of the row to rank
) q
ROW_NUMBER functionality in PostgreSQL is implemented via LIMIT n OFFSET skip.
Find an overview here.
On the pitfalls of ranking see this SO question.
EDIT: Since you are asking for ROW_NUMBER() instead of simple ranking: row_number() was introduced in PostgreSQL 8.4, so you might consider upgrading. Otherwise this workaround might be helpful.
Previous replies tackle the question "select all rows and get their rank", which is not what you want...
you have a row
you want to know its rank
Just do:
SELECT count(*) FROM table WHERE score > $1
Where $1 is the score of the row you just selected (I suppose you'd like to display it so you might select it...).
Or do:
SELECT a.*, (SELECT count(*) FROM table b WHERE b.score > a.score) AS rank FROM table AS a WHERE pk = ...
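A sketch of the correlated-count approach in SQLite, with a hypothetical scores table (table is a reserved word, so it is renamed here). Adding 1 to the count turns "rows ranked ahead" into a 1-based rank, and tied scores share a rank:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (pk INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 50), (2, 90), (3, 70), (4, 90), (5, 10)])

# Rows with a strictly higher score are ranked ahead; +1 makes the rank 1-based.
RANK_SQL = """
SELECT a.pk, a.score,
       (SELECT COUNT(*) FROM scores b WHERE b.score > a.score) + 1 AS rnk,
       (SELECT COUNT(*) FROM scores) AS total
FROM scores AS a
WHERE a.pk = ?
"""

def rank_of(pk):
    return conn.execute(RANK_SQL, (pk,)).fetchone()

print(rank_of(3))  # (3, 70, 3, 5): ranked 3rd out of 5
```

An index on score lets the inner count use a range scan, but as noted below, a low-ranked row still has to count almost every row.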
However, if you select a row which is ranked last, yes you will need to count all the rows which are ranked before it, so you'll need to scan the whole table, and it will be very slow.
Solution:
SELECT count(*) FROM (SELECT 1 FROM table WHERE score > $1 LIMIT 30) AS t
You'll get a precise ranking for the 30 best scores, and it will be fast.
Who cares about the losers?
OK, if you really do care about the losers, you'll need to make a histogram:
Suppose score can go from 0 to 100, and you have 1000000 losers with score < 80 and 10 winners with score > 80.
You make a histogram of how many rows have a score of X, it's a simple small table with 100 rows. Add a trigger to your main table to update the histogram.
Now if you want to rank a loser who has score X, his rank is sum(histo) over the bins where histo_score > X.
Since your score probably isn't between 0 and 100 but (say) between 0 and 1000000000, you'll need to fudge it a bit: enlarge your histogram bins, for instance, so you only need 100 bins max, or use some log-histogram distribution function.
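A minimal sketch of the histogram idea in SQLite, assuming integer scores from 0 to 100 and a hypothetical bin width of 10. Triggers keep the small histo table in sync, and the estimated rank counts only the rows in strictly higher bins, so rows in the same bin are not distinguished:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (pk INTEGER PRIMARY KEY, score INTEGER NOT NULL);
-- One row per bin: how many scores currently fall into it.
CREATE TABLE histo (bin INTEGER PRIMARY KEY, n INTEGER NOT NULL);

-- Keep histo in sync; score / 10 is integer division, i.e. bins of width 10.
CREATE TRIGGER scores_ins AFTER INSERT ON scores BEGIN
  INSERT OR IGNORE INTO histo (bin, n) VALUES (NEW.score / 10, 0);
  UPDATE histo SET n = n + 1 WHERE bin = NEW.score / 10;
END;
CREATE TRIGGER scores_del AFTER DELETE ON scores BEGIN
  UPDATE histo SET n = n - 1 WHERE bin = OLD.score / 10;
END;
CREATE TRIGGER scores_upd AFTER UPDATE OF score ON scores BEGIN
  UPDATE histo SET n = n - 1 WHERE bin = OLD.score / 10;
  INSERT OR IGNORE INTO histo (bin, n) VALUES (NEW.score / 10, 0);
  UPDATE histo SET n = n + 1 WHERE bin = NEW.score / 10;
END;
""")
conn.executemany("INSERT INTO scores (score) VALUES (?)",
                 [(95,), (91,), (85,), (72,), (72,), (40,), (15,)])

def approx_rank(score):
    """Estimated 1-based rank: rows in strictly higher bins, plus one."""
    ahead = conn.execute("SELECT COALESCE(SUM(n), 0) FROM histo WHERE bin > ?",
                         (score // 10,)).fetchone()[0]
    return ahead + 1

print(approx_rank(72))  # 4, exact here: 95, 91 and 85 are ahead
print(approx_rank(91))  # 1, although 95 is ahead: it sits in the same bin
```

With a bin width of 1 (one bin per possible score, as in the 0 to 100 example) the count is exact; wider bins trade precision for a smaller table.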
By the way, Postgres does this when you ANALYZE the table, so if you set statistics_target to 100 or 1000 on score, ANALYZE, and then run:
EXPLAIN SELECT * FROM table WHERE score > $1
you'll get a nice rowcount estimate.
Who needs exact answers?