Using a single query to eliminate the N+1 select issue - SQL

I want to return the last report for each unit in a given range, where the last report is identified by its time of creation. The result would therefore be a collection of last reports, one per unit. I do not want to use a bunch of SELECT statements, e.g.:
SELECT * FROM reports WHERE unit_id = 9999 ORDER BY time desc LIMIT 1
SELECT * FROM reports WHERE unit_id = 9998 ORDER BY time desc LIMIT 1
...
I initially tried this (but already knew it wouldn't work because it will only return 1 report):
SELECT reports.* FROM reports INNER JOIN units ON reports.unit_id = units.id WHERE units.account_id IS NOT NULL AND units.account_id = 4 ORDER BY time desc LIMIT 1
So I am looking for some kind of solution using subqueries or derived tables, but I just can't seem to figure out how to do it properly:
SELECT reports.* FROM reports
WHERE id IN
(
    SELECT id FROM reports
    INNER JOIN units ON reports.unit_id = units.id
    ORDER BY time desc
    LIMIT 1
)
Any solution to do this with subqueries or derived tables?

The simple way to do this in Postgres uses distinct on:
select distinct on (unit_id) r.*
from reports r
order by unit_id, time desc;
This construct is specific to Postgres and databases that use its code base. The expression distinct on (unit_id) says "I want to keep only one row for each unit_id". The row chosen is the first row encountered with that unit_id based on the order by clause.
EDIT:
Your original query would be, assuming that id increases along with the time field:
SELECT r.*
FROM reports r
WHERE id IN (SELECT max(id)
             FROM reports
             GROUP BY unit_id
            );
You might also try this as a not exists:
select r.*
from reports r
where not exists (select 1
                  from reports r2
                  where r2.unit_id = r.unit_id and
                        r2.time > r.time
                 );
I would expect the distinct on version to perform well. This last version (and probably the previous one as well) would really benefit from an index on reports(unit_id, time).
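For reference, a sketch of such an index in Postgres (the index name is illustrative, not from the answer; time DESC matches the order by unit_id, time desc used by the distinct on query):
-- Illustrative index to speed up per-unit "latest row" lookups.
CREATE INDEX reports_unit_id_time_idx
    ON reports (unit_id, time DESC);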

Related

How can I get the total result count, and a given subset ('page' of results) with the same SQL Query with Oracle

I would like to display a table of results. The data is sourced from a SQL query on an Oracle database. I would like to show the results one page (say, 10 records) at a time, minimising the actual data being sent to the front-end.
At the same time, I would like to show the total number of possible results (say, showing 1-10 of 123), and to allow for pagination (say, to calculate that 10 per page, 123 results, therefore 13 pages).
I can get the total number of results with a single count query.
SELECT count(*) AS NUM_RESULTS FROM ... etc.
and I can get the desired subset with another query
SELECT * FROM ... etc. WHERE ? <= ROWNUM AND ROWNUM < ?
But, is there a way to get all the relevant details in one single query?
Update
Actually, the above query using ROWNUM seems to work for 0 - 10, but not for 10 - 20, so how can I do that too?
ROWNUM is a bit tricky to use.
The ROWNUM pseudocolumn always starts with 1 for the first result that actually gets fetched. If you filter for ROWNUM>10, you will never fetch any result and therefore will not get any.
If you want to use it for paging (not that you really should), it requires nested subqueries:
select *
from (select rownum n, x.*
      from (select * from mytable order by name) x
     )
where n between 3 and 5;
Note that you need another nested subquery to get the order by right; if you put the order by one level higher
select *
from (select rownum n, x.* from mytable x order by name)
where n between 3 and 5;
it will pick 3 random(*) rows and sort them, but that is usually not what you want.
(*) not really random, but probably not what you expect.
See http://use-the-index-luke.com/sql/partial-results/window-functions for more efficient ways to implement pagination.
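Since the original question asks for everything in one single query, here is a sketch of the window-function approach, which Oracle supports (mytable and name are the example names from the query above; the exact form is my assumption, not the linked article's code):
-- Each returned row carries its page position plus the total result count.
select *
from (select x.*,
             row_number() over (order by name) as rn,
             count(*) over () as total_results
      from mytable x
     )
where rn between 11 and 20;  -- e.g. the second page of 10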
You can use an inner join on your table and fetch the total number of results in a subquery. An example query is as follows:
SELECT E.emp_name, E.emp_age, E.emp_sal, T.emp_count
FROM EMP AS E
INNER JOIN (SELECT emp_name, COUNT(*) AS emp_count
            FROM EMP
            GROUP BY emp_name) AS T
    ON E.emp_name = T.emp_name
WHERE E.emp_age < 35;
Not sure exactly what you're after based on your question wording, but it seems like you want to see all records with a row number between two values, and in an adjacent field of each record see the total count of records. If so, you can select everything from your table and join a subquery containing a COUNT value as a field, using ON 1=1 (i.e. always true) to tack that field onto every record. Example:
SELECT *
FROM table_name LEFT JOIN (SELECT COUNT(*) AS NUM_RESULTS FROM table_name) ON 1=1
WHERE ? <= ROWNUM AND ROWNUM < ?

Select last N rows SQL Server 2012

I am currently working with big data. I am importing data into a table at about 200 million records per import. I want to see how many records were loaded by the current import, but currently my script runs through 1 billion records first to finally count the last imported data.
SELECT Datum, COUNT(Datum) AS recCount
FROM PF161DailyAggregates
GROUP BY Datum
That is my current query, which shows the number of rows per date.
I can change the code so that it only shows the current import job, but it will still go through all the other records.
Currently this query takes about an hour. How can I make it fast by only counting the last N rows?
Thanks in advance
This will restrict the result to 100 rows; you can get the last rows by ordering the Datum column descending:
SELECT Datum, COUNT(Datum) AS recCount
FROM PF161DailyAggregates
GROUP BY Datum
order by datum desc
OFFSET 0 ROWS
FETCH NEXT 100 ROWS ONLY;
That's a hard one. I think as long as you want to find the last records AFTER the import, you are required to use some ordering on the Datum column. You can try various tricks there, but as long as this column does not have an index you will be lost, since any ordering then requires a full table scan. So my first suggestion is to create an index on that column; then you can use any technique that restricts your result to the last date, like:
select top 1 Datum, count(Datum)
from PF161DailyAggregates
group by Datum
order by Datum desc
or
select count(*)
from PF161DailyAggregates
where Datum = (select top 1 Datum
               from PF161DailyAggregates
               order by Datum desc)
Another idea would be to think outside the box and make the import job write the number of records per Datum into a separate table each time it runs. That would be much cheaper.
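A minimal sketch of that idea in T-SQL, assuming the import itself is a single statement (the ImportLog table and its columns are illustrative names, not from the original answer):
-- Hypothetical log table for per-import row counts.
CREATE TABLE dbo.ImportLog (
    Datum    date   NOT NULL,
    RecCount bigint NOT NULL
);

DECLARE @loaded bigint;

-- ... the big import into PF161DailyAggregates runs here ...

SET @loaded = @@ROWCOUNT;  -- must run immediately after the import statement

INSERT INTO dbo.ImportLog (Datum, RecCount)
VALUES (CAST(GETDATE() AS date), @loaded);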
The fastest way to find the row count of a single table:
SELECT T.name AS [TABLE NAME],
       I.rows AS [ROWCOUNT]
FROM sys.tables AS T
INNER JOIN sys.sysindexes AS I
    ON T.object_id = I.id
   AND I.indid < 2
WHERE T.name = 'PF161DailyAggregates'
ORDER BY I.rows DESC
Alternatively, you can create an identity column. Before the insert, find the max id (easy and fast); after the insert, capture SCOPE_IDENTITY() in a variable; then subtract the two.
If the table already contains a sequential row-number column, you can use the same technique with FIRST_VALUE in SQL Server 2012.
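A sketch of that subtraction, assuming a hypothetical identity column Id on the table and that the import runs in the same scope (SCOPE_IDENTITY() only sees inserts from the current scope):
-- Id is an assumed identity column, not from the original post.
DECLARE @before bigint = (SELECT ISNULL(MAX(Id), 0) FROM PF161DailyAggregates);

-- ... run the import here, in this same scope ...

DECLARE @after bigint = CAST(SCOPE_IDENTITY() AS bigint);

SELECT @after - @before AS RowsImported;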

SELECT DISTINCT returns more rows than expected

I have read many answers here, but so far nothing has helped me. I'm developing a ticket system, where each ticket has many updates.
I have two tables: tb_ticket and tb_updates.
I created a SELECT with subqueries, which took a long time (about 25 seconds) to get about 1000 rows. I then changed it to an INNER JOIN instead of many SELECTs in subqueries, and it is really fast (70 ms), but now I get duplicate tickets. I would like to know how to get only the last row (ordering by time).
My current result is:
...
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
The first column is the ticket ID, the second is the update ID... I would like to get only one row per ticket ID, but DISTINCT does not work in this case. Which row should it be? Always the latest one, so in this case 2014-08-26 10:40:21.
UPDATE:
It is a PostgreSQL database. I did not share my current query because it uses only Portuguese names, so I think it would not help at all.
SOLUTION:
Used_By_Already had the best solution to my problem.
Without the details of your tables one has to guess the field names, but it seems that tb_updates has many records for a single record in tb_ticket (a many to one relationship).
A generic solution to your problem - to get just the "latest" record - is to use a subquery on tb_updates (see alias mx below) and then join that back to tb_updates so that only the record that has the latest date is chosen.
SELECT
      t.*
    , u.*
FROM tb_ticket t
INNER JOIN tb_updates u
        ON t.ticket_id = u.ticket_id
INNER JOIN (
            SELECT
                  ticket_id
                , MAX(updated_at) AS max_updated
            FROM tb_updates
            GROUP BY
                  ticket_id
           ) mx
        ON u.ticket_id = mx.ticket_id
       AND u.updated_at = mx.max_updated
;
If you have a dbms that supports ROW_NUMBER() then using that function can be a very effective alternative method, but you haven't informed us which dbms you are using.
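Since the update to the question says it is a PostgreSQL database, which does support ROW_NUMBER(), that alternative would look something like this sketch (field names guessed, as in the query above):
-- Rank each ticket's updates newest-first, then keep only rank 1.
SELECT *
FROM (
    SELECT t.*, u.*,
           ROW_NUMBER() OVER (PARTITION BY u.ticket_id
                              ORDER BY u.updated_at DESC) AS rn
    FROM tb_ticket t
    INNER JOIN tb_updates u ON t.ticket_id = u.ticket_id
) ranked
WHERE rn = 1;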
by the way:
These rows ARE distinct:
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
69759 is different from 69771, and that is enough for the 2 rows to be DISTINCT;
there are differences in the 2 dates also.
distinct is a row operator, which means it considers the entire row, not just the first column, when deciding which rows are unique.
Used_By_Already's solution would work just fine. I'm not sure about the performance, but another solution would be to use cross apply, though that is supported by only a few DBMSs.
SELECT *
FROM tb_ticket ticket
CROSS APPLY (
    SELECT TOP (1) *
    FROM tb_updates details
    WHERE details.ticketID = ticket.ticketID
    ORDER BY updateTime DESC
) updates
You can try something like the query below, if your updateid is an identity column:
select ticket_id, max(updateid)
from tb_updates
group by ticket_id
To obtain the last row, end your query with order by time desc and then use TOP (1) in the select statement to return only the first row of the result.
ex:
select TOP (1) .....
from .....
where .....
order by time desc

sql get max based on field

I need to get the ID based on whatever the max Amount is. The query below gives me an error:
select ID from Prog
where Amount = MAX(Amount)
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
The end result is that I need to get just the ID, as I need to pass it to something else that is expecting it.
You need to order by Amount and select 1 record instead...
SELECT ID
FROM Prog
ORDER BY Amount DESC
LIMIT 1;
This takes all the rows in Prog, orders them in descending order by Amount (in other words, the first sorted row has the highest Amount), then limits the query to select only one row (the one with the highest Amount).
Also, subqueries are bad for performance. This code ran on a table with 200k records in half the time of the subquery versions.
Just use a subquery that returns the max value in the where clause:
select ID from Prog
where Amount = (SELECT MAX(Amount) from Prog)
If you're using SQL Server, this should do it:
SELECT TOP 1 ID
FROM Prog
ORDER BY Amount DESC
This should be something like:
select P.ID from Prog P
where P.Amount = (select max(Amount) from Prog)
EDIT:
If you really want only 1 row, you should do:
select max(P.ID) from Prog P
where P.Amount = (select max(Amount) from Prog);
However, if you have multiple rows that match the max amount and you only want one row, you should have some kind of logic behind how you pick that one row, not just rely on this max trick or limit-1-type logic.
Also, I don't write limit 1, because it is not ANSI SQL -- it works in MySQL, but the OP doesn't say which database is in use. Every db is different -- see here: Is there an ANSI SQL alternative to the MYSQL LIMIT keyword? Don't get used to one db's extensions unless you only want to work in one db for the rest of your life.
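For reference, a sketch of the standard SQL:2008 row-limiting clause, which most current databases accept (SQL Server additionally requires an OFFSET clause before FETCH):
-- ANSI equivalent in intent to LIMIT 1 / TOP 1.
SELECT ID
FROM Prog
ORDER BY Amount DESC
FETCH FIRST 1 ROW ONLY;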
select min(ID) from Prog
where Amount in
(
select max(amount)
from prog
)
The min statement ensures that you get only one result.

How to find unused ID in a column? [duplicate]

Possible Duplicate:
SQL query to find Missing sequence numbers
I have a table which has a user ID column; the user can select which user ID to add to the table. I am wondering if there is a single SQL query that could point me to the list of unused user IDs, or even just the smallest unused ID.
For example, I have the following IDs
USER_ID
1
2
3
5
6
7
8
10
I would like to know if there is a way to select 4, or even to select both 4 and 9?
You can try using the "NOT IN" clause:
select user_id
from table
where user_id not in (select user_id from another_table)
Like this:
select u1.user_id + 1 as start
from users as u1
left outer join users as u2 on u1.user_id + 1 = u2.user_id
where u2.user_id is null
From here.
It depends on the Database you are using. If you are using Oracle, something like this will work:
Step 1: Find out max value of userid in your table:
select max(userid) from tbl_userid
let this number be m
Step 2: Find out the max value of rownum in the following query:
select rownum from all_objects
Step 3: If that max value is greater than m, then you can use the following query to list your unused user IDs:
select user_id
from tbl_userid
where user_id NOT IN (select rownum from all_objects)
If the max value returned by step 2 is less than m, you can tweak your query to the following:
select user_id
from tbl_userid
where user_id NOT IN
      (select rownum
       from (select *
             from all_objects
             UNION ALL
             select * from all_objects)
      )
Repeat the UNION ALL until you get max(rownum) >= m
If you are using SQL Server, kindly let me know. There is no direct equivalent of the ROWNUM pseudocolumn in SQL Server, but there are workarounds using the RANK() function.
Given that SQL is generally a set-based language, the only way I could think to do this would be to create the full set of IDs and outer join your table where no IDs matched. The problem with that is that if your table has a significant number of records, you would have to generate a temporary table containing every ID from 1 through MAX(USER_ID). Given a table with tens or hundreds of millions of records, that could be very slow.
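A sketch of that set-based idea, using a standard recursive CTE as the number generator (WITH RECURSIVE is the standard/PostgreSQL spelling; SQL Server omits the RECURSIVE keyword and caps recursion at 100 levels by default; users/user_id follow the names used in the answers above):
-- Generate 1..MAX(user_id), then keep the numbers no row uses.
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM nums
    WHERE n < (SELECT MAX(user_id) FROM users)
)
SELECT n AS unused_id
FROM nums
WHERE n NOT IN (SELECT user_id FROM users)
ORDER BY n;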
Just out of curiosity, why do you need to know the ID holes? Is there some specific reason, or are you just trying to not "waste" an ID? Given the processing effort to find the holes, I would think it is more efficient to just let them be.
Here's one way to do it using SQL Server 2005 or later. It may or may not work efficiently for you:
create table T (userid int);  -- sample setup for the demo

insert into T values
    (1),(2),(3),(5),(6),(9),(11);

with Trk as (
    select userid,
           row_number() over (order by userid) as rk
    from T
), Truns(start, finish, gp) as (
    select -1 + min(userid), 1 + max(userid),
           userid - rk
    from Trk
    group by userid - rk
), Tregroup as (
    select start, finish,
           row_number() over (order by gp) as rk
    from Truns
), Tpre as (
    select a.finish, b.start
    from Tregroup as a
         full outer join Tregroup as b
         on a.rk + 1 = b.rk
)
select
    rtrim(finish) + case when start = finish then '' else '-' + rtrim(start) end as gap
from Tpre
where finish + start is not null;

drop table T;
Short of looping through all the IDs (perhaps using binary-search logic?), I don't have a good answer for you.
I would ask what you need this for. By their nature, IDs are essentially meaningless - all they do is identify some data, not describe it, and as such it shouldn't be a problem if you have large gaps in your user IDs. (In fact, some people would say that it's even better to have unguessable IDs, to avoid users tampering with information to find security holes.)