How to find unused ID in a column? [duplicate] - sql

Closed 13 years ago.
Possible Duplicate:
SQL query to find Missing sequence numbers
I have a table with a user ID column, and the user can choose which user ID to add to the table. I am wondering if there is a single SQL query that could give me the list of unused user IDs, or even just the smallest unused ID?
For example, I have the following IDs
USER_ID
1
2
3
5
6
7
8
10
I would like to know if there is a way to select 4, or even to select both 4 and 9?

You can try using the "NOT IN" clause:
select
user_id
from table
where
user_id not in (select user_id from another_table)
Like this:
select
u1.user_id + 1 as start
from users as u1
left outer join users as u2 on u1.user_id + 1 = u2.user_id
where
u2.user_id is null
From here.
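A minimal sketch of the self-join idea, run against an in-memory SQLite database through Python's sqlite3 module. The table name users and the sample IDs from the question are assumptions made for illustration; other databases may need small syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 7, 8, 10)])

# A row whose successor (user_id + 1) is missing marks the start of a gap.
gap_starts = [row[0] for row in conn.execute("""
    SELECT u1.user_id + 1 AS start
    FROM users AS u1
    LEFT OUTER JOIN users AS u2 ON u1.user_id + 1 = u2.user_id
    WHERE u2.user_id IS NULL
    ORDER BY start
""")]

smallest_unused = gap_starts[0]
print(gap_starts, smallest_unused)
```

Note that this only reports the first ID of each gap, that the last value is simply max + 1, and that a missing ID 1 is not detected because nothing precedes it.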

It depends on the Database you are using. If you are using Oracle, something like this will work:
Step 1: Find the max value of user_id in your table:
select max(user_id) from tbl_userid
Let this number be m.
Step 2: Find out how many rows all_objects can supply as a source of numbers:
select count(*) from all_objects
Step 3: If that count is at least m, you can use the following query to list your unused user ids. It generates the numbers 1 through m and keeps the ones missing from your table:
select n
from (select rownum as n from all_objects)
where n <= (select max(user_id) from tbl_userid)
and n not in (select user_id from tbl_userid)
If the count returned by step 2 is less than m, you can tweak your query to the following:
select n
from (select rownum as n
from (select 1 from all_objects
UNION ALL
select 1 from all_objects)
)
where n <= (select max(user_id) from tbl_userid)
and n not in (select user_id from tbl_userid)
Repeat the UNION ALL until the number source has at least m rows.
If you are using SQL server, kindly let me know. There is no direct equivalent of ROWNUM pseudocolumn in sql server but there are workarounds using the RANK() function.

Given that SQL is generally a set-based language, the only way I could think to do this would be to create the full set of IDs and outer join your table to it, keeping the IDs with no match. The problem with that is that if your table has a significant number of records, you would have to generate a temporary table containing every ID from 1 through MAX(USER_ID). Given a table with tens or hundreds of millions of records, that could be very slow.
Just out of curiosity, why do you need to know the ID holes? Is there some specific reason, or are you just trying to not "waste" an ID? Given the processing effort to find the holes, I would think it is more efficient to just let them be.
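For what it's worth, here is a rough sketch of the "generate the full set and outer join" approach. It assumes SQLite (through Python's sqlite3) and uses a recursive CTE in place of a temporary table; the users table and its contents are made up to match the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 7, 8, 10)])

# all_ids produces 1..MAX(user_id); the outer join keeps IDs with no match.
unused = [row[0] for row in conn.execute("""
    WITH RECURSIVE all_ids(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM all_ids
        WHERE n < (SELECT MAX(user_id) FROM users)
    )
    SELECT a.n
    FROM all_ids AS a
    LEFT JOIN users AS u ON u.user_id = a.n
    WHERE u.user_id IS NULL
    ORDER BY a.n
""")]
print(unused)
```

The cost concern above still applies: the generated set grows with MAX(user_id), not with the number of holes.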

Here's one way to do it using SQL Server 2005 or later. It may or may not work efficiently for you:
create table T (userid int);
insert into T values
(1),(2),(3),(5),(6),(9),(11);
with Trk as (
select userid,
row_number() over (
order by userid
) as rk
from T
), Truns(start,finish,gp) as (
select -1+min(userid), 1+max(userid),
userid-rk
from Trk
group by userid-rk
), Tregroup as (
select start, finish,
row_number() over (
order by gp
) as rk
from Truns
), Tpre as (
select a.finish, b.start
from Tregroup as a full outer join Tregroup as b
on a.rk + 1 = b.rk
)
select
rtrim(finish) + case when start = finish then '' else + '-' + rtrim(start) end as gap
from Tpre
where finish+start is not null
drop table T;
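The core trick in that query is that userid minus its row number stays constant within a run of consecutive IDs. A simplified sketch of just that step, assuming SQLite 3.25 or later (for window functions) via Python's sqlite3, with the gaps between runs computed in Python rather than with the full outer join used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (userid INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)",
                 [(i,) for i in (1, 2, 3, 5, 6, 9, 11)])

# userid - rk is the same for every member of a consecutive run,
# so grouping by it yields one row per run.
runs = conn.execute("""
    WITH Trk AS (
        SELECT userid, ROW_NUMBER() OVER (ORDER BY userid) AS rk
        FROM T
    )
    SELECT MIN(userid), MAX(userid)
    FROM Trk
    GROUP BY userid - rk
    ORDER BY MIN(userid)
""").fetchall()

# A gap lies between the end of one run and the start of the next.
gaps = [(prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(runs, runs[1:])]
print(runs, gaps)
```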

Short of looping through all the ids (perhaps using binary search tree logic?) I don't have a good answer for you.
I would ask what you want this for? By their nature, ids are essentially meaningless - all they do is identify some data, not describe it, and as such it shouldn't be a problem if you have large gaps in your user ids. (In fact, some people would say that it's even better to have unguessable ids, to avoid users tampering with information to find security holes)

Related

How can I get the total result count, and a given subset ('page' of results) with the same SQL Query with Oracle

I would like to display a table of results. The data is sourced from a SQL query on an Oracle database. I would like to show the results one page (say, 10 records) at a time, minimising the actual data being sent to the front-end.
At the same time, I would like to show the total number of possible results (say, showing 1-10 of 123), and to allow for pagination (say, to calculate that 10 per page, 123 results, therefore 13 pages).
I can get the total number of results with a single count query.
SELECT count(*) AS NUM_RESULTS FROM ... etc.
and I can get the desired subset with another query
SELECT * FROM ... etc. WHERE ? <= ROWNUM AND ROWNUM < ?
But, is there a way to get all the relevant details in one single query?
Update
Actually, the above query using ROWNUM seems to work for 0 - 10, but not for 10 - 20, so how can I do that too?
ROWNUM is a bit tricky to use.
The ROWNUM pseudocolumn always starts with 1 for the first result that actually gets fetched. If you filter for ROWNUM>10, you will never fetch any result and therefore will not get any.
If you want to use it for paging (not that you really should), it requires nested subqueries:
select * from
(select rownum n, x.* from
(select * from mytable order by name) x
)
where n between 3 and 5;
Note that you need another nested subquery to get the order by right; if you put the order by one level higher
select * from
(select rownum n, x.* from mytable x order by name)
where n between 3 and 5;
it will pick 3 random(*) rows and sort them, but that is usually not what you want.
(*) not really random, but probably not what you expect.
See http://use-the-index-luke.com/sql/partial-results/window-functions for more efficient ways to implement pagination.
You can use an inner join on your table and fetch the total number of results in a subquery. An example query is as follows:
SELECT E.emp_name, E.emp_age, E.emp_sal, T.emp_count
FROM EMP E
INNER JOIN (SELECT emp_name, COUNT(*) AS emp_count
FROM EMP GROUP BY emp_name) T
ON E.emp_name = T.emp_name WHERE E.emp_age < 35;
I'm not sure exactly what you're after from the wording of your question, but it seems you want every record whose row number falls between two values, with the total record count in an adjacent field on each record. If so, you can select everything from your table and join a subquery that returns the COUNT value as a single field, using ON 1=1 (i.e. always true) to tack that field onto every record. Example:
SELECT *
FROM table_name LEFT JOIN (SELECT COUNT(*) AS NUM_RESULTS FROM table_name) ON 1=1
WHERE ? <= ROWNUM AND ROWNUM < ?
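One way to get both the page and the total in a single statement is to number the rows in a subquery and carry a windowed count alongside. The sketch below assumes SQLite 3.25 or later through Python's sqlite3 and a made-up table mytable with 123 rows; ROW_NUMBER() and COUNT(*) OVER () are standard analytic functions that Oracle also provides, but this has not been run against Oracle here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [(f"item{i:03d}",) for i in range(1, 124)])

def fetch_page(first, last):
    # The inner query numbers every row and attaches the total count;
    # the outer query filters on the already-assigned number, so a
    # lower bound greater than 1 works.
    return conn.execute("""
        SELECT n, name, num_results
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY name) AS n,
                     name,
                     COUNT(*) OVER () AS num_results
              FROM mytable)
        WHERE n BETWEEN ? AND ?
        ORDER BY n
    """, (first, last)).fetchall()

page = fetch_page(11, 20)
print(page[0], len(page))
```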

SELECT DISTINCT returns more rows than expected

I have read many answers here, but until now nothing could help me. I'm developing a ticket system, where each ticket has many updates.
I have about 2 tables: tb_ticket and tb_updates.
I created a SELECT with subqueries, which took a long time (about 25 seconds) to get about 1000 rows. I have now changed it to use INNER JOIN instead of many SELECTs in subqueries, and it is really fast (70 ms), but now I get duplicate tickets. I would like to know how I can get only the last row per ticket (ordering by time).
My current result is:
...
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
The first column is the ticket ID, the second is the update ID... I would like to get only one row per ticket ID, but DISTINCT does not work in this case. Which row should it be? Always the latest one, so in this case 2014-08-26 10:40:21.
UPDATE:
It is a postgresql database. I did not share my current query because it has only portuguese names, so I think it would not help at all.
SOLUTION:
Used_By_Already had the best solution to my problem.
Without the details of your tables one has to guess the field names, but it seems that tb_updates has many records for a single record in tb_ticket (a many to one relationship).
A generic solution to your problem - to get just the "latest" record - is to use a subquery on tb_updates (see alias mx below) and then join that back to tb_updates so that only the record that has the latest date is chosen.
SELECT
t.*
, u.*
FROM tb_ticket t
INNER JOIN tb_updates u
ON t.ticket_id = u.ticket_id
INNER JOIN (
SELECT
ticket_id
, MAX(updated_at) max_updated
FROM tb_updates
GROUP BY
ticket_id
) mx
ON u.ticket_id = mx.ticket_id
AND u.updated_at = mx.max_updated
;
If you have a dbms that supports ROW_NUMBER() then using that function can be a very effective alternative method, but you haven't informed us which dbms you are using.
by the way:
These rows ARE distinct:
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
69759 is different from 69771, and that is enough for the 2 rows to be DISTINCT
the 2 dates differ as well.
distinct is a row operator, which means it considers the entire row, not just the first column, when deciding which rows are unique.
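To make both points concrete, here is a small sketch using SQLite through Python's sqlite3. The column names (ticket_id, update_id, updated_at, status, company) are guesses, as above, and the two sample rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_ticket (ticket_id INTEGER PRIMARY KEY, company TEXT);
    CREATE TABLE tb_updates (update_id INTEGER PRIMARY KEY,
                             ticket_id INTEGER,
                             updated_at TEXT,
                             status TEXT);
    INSERT INTO tb_ticket VALUES (67355, 'COMPANY X');
    INSERT INTO tb_updates VALUES
        (69759, 67355, '2014-08-22 09:40:21', 'OPEN'),
        (69771, 67355, '2014-08-26 10:40:21', 'UPDATE');
""")

# DISTINCT compares whole rows, so both updates survive.
distinct_rows = conn.execute("""
    SELECT DISTINCT t.ticket_id, u.update_id, u.updated_at, u.status
    FROM tb_ticket t
    INNER JOIN tb_updates u ON t.ticket_id = u.ticket_id
""").fetchall()

# Joining back to the per-ticket MAX(updated_at) keeps only the latest update.
latest_rows = conn.execute("""
    SELECT t.ticket_id, u.update_id, u.updated_at, u.status
    FROM tb_ticket t
    INNER JOIN tb_updates u ON t.ticket_id = u.ticket_id
    INNER JOIN (SELECT ticket_id, MAX(updated_at) AS max_updated
                FROM tb_updates
                GROUP BY ticket_id) mx
        ON u.ticket_id = mx.ticket_id
       AND u.updated_at = mx.max_updated
""").fetchall()
print(len(distinct_rows), latest_rows)
```

If two updates of the same ticket share the exact same timestamp, the join returns both, so a tie-breaker may be needed.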
Used_By_Already's solution would work just fine. I'm not sure on the performance but another solution would be to use cross apply, though that is limited to only a few DBMS's.
SELECT *
FROM tb_ticket ticket
CROSS APPLY (
SELECT top(1) *
FROM tb_updates details
WHERE details.ticketID = ticket.ticketID
ORDER BY updateTime desc
) updates
You can try something like the below if your updateid is an identity column:
Select ticketid, max(updateid) from table
group by ticketid
To obtain last row you have to end your query with order by time desc then use TOP (1) in the select statement to select only the first row in the query result
ex:
select TOP (1) .....
from .....
where .....
order by time desc

using a single query to eliminate N+1 select issue

I want to return the last report of a given range of units. The last report will be identified by its time of creation. Therefore, the result would be a collection of last reports for a given range of units. I do not want to use a bunch of SELECT statements e.g.:
SELECT * FROM reports WHERE unit_id = 9999 ORDER BY time desc LIMIT 1
SELECT * FROM reports WHERE unit_id = 9998 ORDER BY time desc LIMIT 1
...
I initially tried this (but already knew it wouldn't work because it will only return 1 report):
'SELECT reports.* FROM reports INNER JOIN units ON reports.unit_id = units.id WHERE units.account_id IS NOT NULL AND units.account_id = 4 ORDER BY time desc LIMIT 1'
So I am looking for some kind of solution using subqueries or derived tables, but I can't just seem to figure out how to do it properly:
'SELECT reports.* FROM reports
WHERE id IN
(
SELECT id FROM reports
INNER JOIN units ON reports.unit_id = units.id
ORDER BY time desc
LIMIT 1
)
Any solution to do this with subqueries or derived tables?
The simple way to do this in Postgres uses distinct on:
select distinct on (unit_id) r.*
from reports r
order by unit_id, time desc;
This construct is specific to Postgres and databases that use its code base. The expression distinct on (unit_id) says "I want to keep only one row for each unit_id". The row chosen is the first row encountered with that unit_id based on the order by clause.
EDIT:
Your original query would be, assuming that id increases along with the time field:
SELECT r.*
FROM reports r
WHERE id IN (SELECT max(id)
FROM reports
GROUP BY unit_id
);
You might also try this as a not exists:
select r.*
from reports r
where not exists (select 1
from reports r2
where r2.unit_id = r.unit_id and
r2.time > r.time
);
I thought the distinct on would perform well. This last version (and maybe the previous) would really benefit from an index on reports(unit_id, time).
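As a rough illustration of the not exists version together with the suggested index, here is a sketch run on SQLite through Python's sqlite3 (distinct on is Postgres-only, so it is not shown). The sample rows and the index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (id INTEGER PRIMARY KEY, unit_id INTEGER, time TEXT);
    CREATE INDEX reports_unit_time ON reports (unit_id, time);
    INSERT INTO reports (unit_id, time) VALUES
        (9998, '2014-01-01 10:00'),
        (9998, '2014-01-03 10:00'),
        (9999, '2014-01-02 10:00'),
        (9999, '2014-01-01 09:00');
""")

# Keep a report only if no later report exists for the same unit.
last_reports = conn.execute("""
    SELECT r.unit_id, r.time
    FROM reports r
    WHERE NOT EXISTS (SELECT 1
                      FROM reports r2
                      WHERE r2.unit_id = r.unit_id
                        AND r2.time > r.time)
    ORDER BY r.unit_id
""").fetchall()
print(last_reports)
```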

SQL select from data in query where this data is not already in the database?

I want to check my database for records that I already have recorded before making a web service call.
Here is what I imagine the query to look like, I just can't seem to figure out the syntax.
SELECT *
FROM (1,2,3,4) as temp_table
WHERE temp_table.id
LEFT JOIN table ON id IS NULL
Is there a way to do this? What is a query like this called?
I want to pass in a list of id's to mysql and i want it to spit out the id's that are not already in the database?
Use:
SELECT x.id
FROM (SELECT @param_1 AS id
FROM DUAL
UNION ALL
SELECT @param_2
FROM DUAL
UNION ALL
SELECT @param_3
FROM DUAL
UNION ALL
SELECT @param_4
FROM DUAL) x
LEFT JOIN TABLE t ON t.id = x.id
WHERE t.id IS NULL
If you need to support a varying number of parameters, you can either use:
a temporary table to populate & join to
MySQL's Prepared Statements to dynamically construct the UNION ALL statement
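A sketch of the dynamically built UNION ALL variant, assuming SQLite through Python's sqlite3 rather than MySQL. The table name items and the helper missing_ids are hypothetical. Only the number of placeholders is generated; the values themselves are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(2,), (4,)])

def missing_ids(candidates):
    if not candidates:
        return []
    # One "SELECT ? AS id" per candidate, glued together with UNION ALL.
    selects = " UNION ALL ".join(["SELECT ? AS id"] * len(candidates))
    sql = f"""
        SELECT x.id
        FROM ({selects}) AS x
        LEFT JOIN items AS t ON t.id = x.id
        WHERE t.id IS NULL
        ORDER BY x.id
    """
    return [row[0] for row in conn.execute(sql, candidates)]

missing = missing_ids([1, 2, 3, 4])
print(missing)
```

For long lists a temporary table is likely the better fit, since databases cap the number of terms in a compound SELECT.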
To confirm I've understood correctly, you want to pass in a list of numbers and see which of those numbers isn't present in the existing table? In effect:
SELECT Item
FROM IDList I
LEFT JOIN TABLE T ON I.Item=T.ID
WHERE T.ID IS NULL
You look like you're OK with building this query on the fly, in which case you can do this with a numbers / tally table by changing the above into
SELECT Number
FROM (SELECT Number FROM Numbers WHERE Number IN (1,2,3,4)) I
LEFT JOIN TABLE T ON I.Number=T.ID
WHERE T.ID IS NULL
This is relatively prone to SQL Injection attacks though because of the way the query is being built. It'd be better if you could pass in '1,2,3,4' as a string and split it into sections to generate your numbers list to join against in a safer way - for an example of how to do that, see http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows
All of this presumes you've got a numbers / tally table in your database, but they're sufficiently useful in general that I'd strongly recommend you do.
SELECT * FROM table where id NOT IN (1,2,3,4)
I would probably just do:
SELECT id
FROM table
WHERE id IN (1,2,3,4);
And then process the list of results, removing any returned by the query from your list of "records to submit".
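In application code that could look roughly like this. It is a sketch assuming Python with sqlite3 and a hypothetical items table; the function name ids_to_submit is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(2,), (4,)])

def ids_to_submit(candidates):
    if not candidates:
        return []
    # Ask the database which candidates it already knows ...
    placeholders = ", ".join("?" * len(candidates))
    known = {row[0] for row in conn.execute(
        f"SELECT id FROM items WHERE id IN ({placeholders})", candidates)}
    # ... and keep the rest, preserving the caller's order.
    return [c for c in candidates if c not in known]

to_submit = ids_to_submit([1, 2, 3, 4])
print(to_submit)
```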
How about a nested query? This may work. If not, it may get you in the right direction.
SELECT * FROM table WHERE id NOT IN (
SELECT id FROM table WHERE 1
);

SQL Delete low counts

I have a table with this data:
Id Qty
-- ---
A 1
A 2
A 3
B 112
B 125
B 109
But I'm supposed to only have the max values for each id. Max value for A is 3 and for B is 125. How can I isolate (and delete) the other values?
The final table should look like this :
Id Qty
-- ---
A 3
B 125
Running MySQL 4.1
Oh wait. Got a simpler solution :
I'll select all the max values(group by id), export the data, flush the table, reimport only the max values.
CREATE TABLE tabletemp LIKE table;
INSERT INTO tabletemp SELECT id,MAX(qty) FROM table GROUP BY id;
DROP TABLE table;
RENAME TABLE tabletemp TO table;
Thanks to all !
Try this in SQL Server:
delete o from tbl o
left outer join
(Select max(qty) anz , id
from tbl i
group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null
Revision 2 for MySQL... Can anyone check this one?:
delete from tbl
where concat(id,qty) not in
(select concat(id,anz) from (Select max(qty) anz , id
from tbl i
group by i.id) m)
Explanation:
Since I was supposed to not use joins (see the comments about MySQL support for joins in delete/update/insert), I moved the subquery into an IN(a,b,c) clause.
Inside an IN clause I can use a subquery, but that query is only allowed to return one field. So in order to filter out all elements that are not the maximum, I need to concat both fields into a single one that the subquery can return. The query inside the IN therefore returns only the concatenated ID+QTY of the biggest row per id. To compare against it, I also need a concat on the outside, so the data on both sides matches.
Basically the In clause contains:
("A3","B125")
Disclaimer: The above query is "evil!" since it uses a function (concat) on fields to compare against. This will cause any index on those fields to become almost useless. You should never formulate a query that way that is run on a regular basis. I only wanted to try to bend it so it works on mysql.
Example of this "bad construct":
(Get all o from the last 2 weeks)
select ... from orders where orderday + 14 > now()
You should always do:
select ... from orders where orderday > now() - 14
The difference is subtle: version 2 only has to do the math once and is able to use the index, whereas version 1 has to do the math for every single row in the orders table, and you can forget about index usage.
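You can see the effect in a query plan. The sketch below uses SQLite through Python's sqlite3 as a stand-in, with orderday stored as a plain integer day number and now() replaced by a bound parameter. Both are simplifications of mine, and the exact plan text differs between databases and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, orderday INTEGER)")
conn.execute("CREATE INDEX orders_orderday ON orders (orderday)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (100,)).fetchall()
    return " | ".join(row[-1] for row in rows)

# Math on the column: every row has to be visited (a SCAN).
wrapped = plan("SELECT id FROM orders WHERE orderday + 14 > ?")
# Math on the constant: the index can be used for a range SEARCH.
bare = plan("SELECT id FROM orders WHERE orderday > ? - 14")
print(wrapped)
print(bare)
```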
I'd try this:
delete from T
where exists (
select * from T as T2
where T2.Id = T.Id
and T2.Qty > T.Qty
);
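A quick check of that statement against the sample data from the question, as a sketch using SQLite through Python's sqlite3. Other databases may restrict referencing the target table inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Id TEXT, Qty INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [("A", 1), ("A", 2), ("A", 3),
                  ("B", 112), ("B", 125), ("B", 109)])

# Delete every row for which a row with the same Id and a larger Qty exists.
conn.execute("""
    DELETE FROM T
    WHERE EXISTS (SELECT 1
                  FROM T AS T2
                  WHERE T2.Id = T.Id
                    AND T2.Qty > T.Qty)
""")

remaining = conn.execute("SELECT Id, Qty FROM T ORDER BY Id").fetchall()
print(remaining)
```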
For those who might have similar question in the future, this might be supported some day (it is now in SQL Server 2005 and later)
It won't require a join, and it has advantages over the use of a temporary table if the table has dependencies
with Tranked(Id,Qty,rk) as (
select
Id, Qty,
rank() over (
partition by Id
order by Qty desc
)
from T
)
delete from Tranked
where rk > 1;
You'll have to go via another table. Among other things, a single delete statement is quite impossible here in MySQL because you can't delete from a table and use the same table in a subquery.
BEGIN;
create temporary table tmp_del select id,max(qty) as qty from the_tbl group by id;
delete the_tbl from the_tbl,tmp_del where
the_tbl.id=tmp_del.id and the_tbl.qty<tmp_del.qty;
drop table tmp_del;
COMMIT;
MySQL 4.0 and later supports a simple multi-table syntax for DELETE:
DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;
This produces a join of each row with a given id to all other rows with the same id, and deletes only the row with the lesser qty in each pairing. After this is all done, the row with the greatest qty per group of id is left not deleted.
If you only have one row with a given id, it still works because a single row is naturally the one with the greatest value.
FWIW, I just tried my solution using MySQL 5.0.75 on a Macbook Pro 2.40GHz. I inserted 1 million rows of synthetic data, with different numbers of rows per "group":
2 rows per id completes in 26.78 sec.
5 rows per id completes in 43.18 sec.
10 rows per id completes in 1 min 3.77 sec.
100 rows per id completes in 6 min 46.60 sec.
1000 rows per id didn't complete before I terminated it.