Oracle Select Max Date on Multiple records - sql

I've got the following SELECT statement, and based on what I've seen here: SQL Select Max Date with Multiple records I've got my example set up the same way. I'm on Oracle 11g. Instead of returning one record for each asset_tag, it's returning multiples. Not as many records as in the source table, but more than (I think) it should be. If I run the inner SELECT statement, it also returns the correct set of records (1 per asset_tag), which really has me stumped.
SELECT
outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM mp_vehicle_asset_profile outside
RIGHT OUTER JOIN
(
SELECT asset_tag, max(last_read_date) as last_read_date
FROM mp_vehicle_asset_profile
group by asset_tag
) inside
ON outside.last_read_date=inside.last_read_date
Any suggestions?

Try with analytical functions:
SELECT outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY asset_tag ORDER BY last_read_date DESC) Corr
FROM mp_vehicle_asset_profile t) outside
WHERE Corr = 1

I think you need to add...
AND outside.asset_tag=inside.asset_tag
...to the criteria in your ON list.
Also, a RIGHT OUTER JOIN is not needed. An INNER JOIN will give the same results (and may be more efficient), since there cannot be combinations of asset_tag and last_read_date in the subquery that do not exist in mp_vehicle_asset_profile.
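To see why the extra rows appear, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The three-row profile table and its values are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical three-row stand-in for mp_vehicle_asset_profile.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (asset_tag TEXT, last_read_date TEXT, mtr_reading INTEGER)"
)
conn.executemany(
    "INSERT INTO profile VALUES (?, ?, ?)",
    [
        ("A1", "2014-01-05", 100),  # older A1 row, same date as B2's latest
        ("A1", "2014-01-10", 150),  # latest reading for A1
        ("B2", "2014-01-05", 300),  # latest reading for B2
    ],
)

# One (asset_tag, max date) pair per asset, as in the question's inner SELECT.
latest = (
    "(SELECT asset_tag, MAX(last_read_date) AS last_read_date "
    "FROM profile GROUP BY asset_tag)"
)

# Join on the date alone: a row matches if its date equals ANY asset's maximum.
date_only = conn.execute(
    f"SELECT o.asset_tag, o.last_read_date FROM profile o JOIN {latest} i "
    "ON o.last_read_date = i.last_read_date ORDER BY 1, 2"
).fetchall()

# Join on asset_tag as well: a row matches only its own asset's maximum.
both_keys = conn.execute(
    f"SELECT o.asset_tag, o.last_read_date FROM profile o JOIN {latest} i "
    "ON o.asset_tag = i.asset_tag AND o.last_read_date = i.last_read_date "
    "ORDER BY 1, 2"
).fetchall()

print(date_only)
print(both_keys)
```

With the date-only condition, the older A1 row slips through because its date happens to equal B2's maximum.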
Even then, the query may return more than one row per asset tag if there are "ties" -- that is, multiple rows with the same last_read_date. In contrast, @Lamak's analytic-based answer will arbitrarily pick exactly one row in this situation.
Your comment suggests that you want to break ties by picking the row with highest mtr_reading for the last_read_date.
You could modify @Lamak's analytic-based answer to do this by changing the ORDER BY in the OVER clause to:
ORDER BY last_read_date DESC, mtr_reading DESC
If there are still ties (that is, multiple rows with the same asset_tag, last_read_date, and mtr_reading), the query will again arbitrarily pick exactly one row.
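A small sketch of that tie-break, run through Python's sqlite3 module as a stand-in for Oracle (this assumes SQLite 3.25 or later for window functions). The table name profile and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (asset_tag TEXT, last_read_date TEXT, mtr_reading INTEGER)"
)
conn.executemany(
    "INSERT INTO profile VALUES (?, ?, ?)",
    [
        ("A1", "2014-01-10", 150),  # tied on date with the next row
        ("A1", "2014-01-10", 175),  # same date, higher reading: should win
        ("A1", "2014-01-05", 900),  # higher reading but older date: should lose
        ("B2", "2014-01-07", 300),
    ],
)

# Rank rows per asset by newest date first, then highest reading, and keep rank 1.
rows = conn.execute(
    """
    SELECT asset_tag, last_read_date, mtr_reading
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY asset_tag
                   ORDER BY last_read_date DESC, mtr_reading DESC
               ) AS corr
        FROM profile p
    )
    WHERE corr = 1
    ORDER BY asset_tag
    """
).fetchall()

print(rows)
```

The date still dominates: the 900 reading is ignored because it belongs to an older date.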
You could modify my aggregate-based answer to break ties using highest mtr_reading as follows:
SELECT
outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM
mp_vehicle_asset_profile outside
INNER JOIN
(
SELECT
asset_tag,
MAX(last_read_date) AS last_read_date,
MAX(mtr_reading) KEEP (DENSE_RANK FIRST ORDER BY last_read_date DESC) AS mtr_reading
FROM
mp_vehicle_asset_profile
GROUP BY
asset_tag
) inside
ON
outside.asset_tag = inside.asset_tag
AND
outside.last_read_date = inside.last_read_date
AND
outside.mtr_reading = inside.mtr_reading
If there are still ties (that is, multiple rows with the same asset_tag, last_read_date, and mtr_reading), the query may again return more than one row.
One other way that the analytic- and aggregate-based answers differ is in their treatment of nulls. If any of asset_tag, last_read_date, or mtr_reading is null, the analytic-based answer will return the related rows, but the aggregate-based one will not (because the equality conditions in the join do not evaluate to TRUE when a null is involved).
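The null difference can be reproduced with a rough sketch in Python's sqlite3 module (assuming SQLite 3.25 or later; Oracle was not used here). The table profile and its two rows are invented, with one row carrying a null asset_tag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (asset_tag TEXT, last_read_date TEXT)")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?)",
    [("A1", "2014-01-10"), (None, "2014-01-08")],  # second row has a NULL asset_tag
)

# Analytic approach: NULL asset_tags form their own partition, so the row survives.
analytic = conn.execute(
    """
    SELECT asset_tag, last_read_date
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY asset_tag ORDER BY last_read_date DESC
               ) AS corr
        FROM profile p
    )
    WHERE corr = 1
    """
).fetchall()

# Aggregate approach: NULL = NULL is not TRUE, so the join drops the row.
aggregate = conn.execute(
    """
    SELECT o.asset_tag, o.last_read_date
    FROM profile o
    INNER JOIN (
        SELECT asset_tag, MAX(last_read_date) AS last_read_date
        FROM profile
        GROUP BY asset_tag
    ) i
      ON o.asset_tag = i.asset_tag
     AND o.last_read_date = i.last_read_date
    """
).fetchall()

print(len(analytic), len(aggregate))
```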

Related

SQL Query for multiple columns with one column distinct

I've spent an inordinate amount of time this morning trying to Google what I thought would be a simple thing. I need to set up an SQL query that selects multiple columns, but only returns one instance if one of the columns (let's call it case_number) returns duplicate rows.
select case_number, name, date_entered from ticket order by date_entered
There are rows in the ticket table that have duplicate case_number, so I want to eliminate those duplicate rows from the results and only show one instance of them. If I use "select distinct case_number, name, date_entered" it applies the distinct operator to all three fields, instead of just the case_number field. I need that logic to apply to only the case_number field and not all three. If I use "group by case_number having count (*)>1" then it returns only the duplicates, which I don't want.
Any ideas on what to do here are appreciated, thank you so much!
You can use ROW_NUMBER(). For example
select *
from (
select t.*,
row_number() over(partition by case_number) as rn
from ticket t
) x
where rn = 1
The query above will pseudo-randomly pick one row for each case_number. If you want better selection criteria, you can add ORDER BY or window frames to the OVER clause.
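As a sketch of the ORDER BY variant, the following uses Python's sqlite3 module (assuming SQLite 3.25 or later, since the question does not name a DBMS). It keeps the most recent row per case_number. The sample rows are made up, and "most recent wins" is an assumed selection rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (case_number INTEGER, name TEXT, date_entered TEXT)")
conn.executemany(
    "INSERT INTO ticket VALUES (?, ?, ?)",
    [
        (1, "alice", "2021-01-01"),
        (1, "bob", "2021-02-01"),  # duplicate case_number, entered later
        (2, "carol", "2021-01-15"),
    ],
)

# Number the rows within each case_number, newest first, and keep number 1.
rows = conn.execute(
    """
    SELECT case_number, name, date_entered
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY case_number ORDER BY date_entered DESC
               ) AS rn
        FROM ticket t
    ) x
    WHERE rn = 1
    ORDER BY case_number
    """
).fetchall()

print(rows)
```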

Get most recent record from Right table with sub query

When I join to the right table I am getting way too many duplicates. I am trying to grab the most recent record from the right table, but no matter what I try, it does not work.
So Far I have tried:
PROC SQL;
CREATE TABLE fs1.sample AS
SELECT A.*,
B.xx1,
max(B.time_s)
FROM lx1.results a left join (Select Distinct C.id, c.per FROM lx2.results c
Where c.id = a.id
and COMPGED(a.txt1, c.txt1,'i') < 100
and c.dt > a.dt
and c.ksv = 37
and datepart(c.lsg) >= '12DEC2020'd ) b
ON a.id = b.id
group by a.id, a.txt1
QUIT;
Unfortunately, I get an error. I also tried using case when exists, but that takes way too long. Essentially I am trying to grab the most recent record from the right table based on time_s. I also want to make sure the record I grab from the right table somewhat matches a.txt1.
Cheers
When you perform a join, you attach all records from the table that match your join conditions.
If the table is indexed appropriately, a subquery could achieve the goal of obtaining the most recent value. However, if the query uses the wrong index, TOP or equivalent functions may return the wrong result.
There are a number of ways to accomplish the task of retrieving the most recent record but they are contingent on a couple of things.
Firstly, you need to be able to identify what the most recent row is, usually by a column called CreatedDate or something similar against the IDs. (You should know what that business logic is, it may be that the table is chronologically entered [as most tables are] and therefore, SubID might be a thing. We're going to assume it is CreatedDate.)
Secondly, you need to rank the rows in terms of the CreatedDate in a descending order so that the newest matching ID is ranked 1.
Finally, you filter your results by 1 to return the newest result, but you could also filter by <= x if you are interested in the top x newest return results per ID.
To use more mathematical language: We are deriving a value from the CreatedDate and ID values and then using that derivative value to sort and filter the data. In this case we are deriving the RowNumber from the CreatedDate in descending order for each ID.
In order to accomplish this, you can use the Windowed Function ROW_NUMBER(),
ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
This windowed function will return a row value for each ID relative to the CreatedDate in descending order, where the newest created date is equal to 1.
You can then put brackets around the whole query to make it into a table so you will be able to filter the results of that Windowed Function.
SELECT id, txt
FROM (SELECT id, txt
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
FROM SourceTable) A
WHERE RankID = 1
This should achieve your goal of returning the "newest result".
Whatever column determines the age of the data relative to the ID (there can be more than one) should be placed within the ORDER BY.
In order to make this query perform faster, you should index your data appropriately, with ID as the first column and CreatedDate DESC as the next column. This means your system will not have to perform a costly sort every time this runs, but whether the index is worth it depends on how often you plan to run this query and how much overhead it adds.
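The rank-then-filter steps, including the "<= x" variant and the suggested index, can be sketched with Python's sqlite3 module (assuming SQLite 3.25 or later; the original question is SAS PROC SQL, which this does not reproduce). SourceTable, CreatedDate and RankID follow the answer's placeholder names. The helper newest, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (id INTEGER, txt TEXT, CreatedDate TEXT)")
# Index with id first and CreatedDate descending, as the answer suggests.
conn.execute("CREATE INDEX idx_source_id_created ON SourceTable (id, CreatedDate DESC)")
conn.executemany(
    "INSERT INTO SourceTable VALUES (?, ?, ?)",
    [
        (1, "a", "2021-01-01"),
        (1, "b", "2021-01-02"),
        (1, "c", "2021-01-03"),
        (2, "d", "2021-01-01"),
    ],
)


def newest(x):
    """Return the x newest (id, txt) pairs for every id (hypothetical helper)."""
    return conn.execute(
        """
        SELECT id, txt
        FROM (SELECT id, txt,
                     ROW_NUMBER() OVER (
                         PARTITION BY id ORDER BY CreatedDate DESC
                     ) AS RankID
              FROM SourceTable) A
        WHERE RankID <= ?
        ORDER BY id, RankID
        """,
        (x,),
    ).fetchall()


print(newest(1))
print(newest(2))
```

An id with fewer than x rows simply returns all the rows it has, as id 2 does for newest(2).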

How to select the row with the lowest value- oracle

I have a table where I save authors and songs, with other columns. The same song can appear multiple times, and it obviously always comes from the same author. I would like to select the author that has the least songs, including the repeated ones, aka the one that is listened to the least.
The final table should show only one author name.
Clearly, one step is to find the count for every author. This can be done with an elementary aggregate query. Then, if you order by count, you can just select the first row, and this solves your problem. One approach is to use ROWNUM in an outer query. This is a very elementary approach, quite efficient, and it works in all versions of Oracle (it doesn't use any advanced features).
select author
from (
select author
from your_table
group by author
order by count(*)
)
where rownum = 1
;
Note that in the subquery we don't need to select the count (since we don't need it in the output). We can still use it in order by in the subquery, which is all we need it for.
The only tricky part here is to remember that you need to order the rows in the subquery, and then apply the ROWNUM filter in the outer query. This is because ORDER BY is the very last thing that is processed in any query - it comes after ROWNUM is assigned to rows in the output. So, moving the WHERE clause into the subquery (and doing everything in a single query, instead of a subquery and an outer query) does not work.
You can use analytical functions as follows:
Select * from
(Select t.*,
Row_number() over (partition by song order by cnt_author) as rn
From
(Select t.*,
Count(*) over (partition by author) as cnt_author
From your_table t) t ) t
Where rn = 1

Oracle subquery in select

I have a table that keeps costs of products. I'd like to get the average cost AND last buying invoice for each product.
My solution was creating a sub-select to get last buying invoice but unfortunately I'm getting
ORA-00904: "B"."CODPROD": invalid identifier
My query is
SELECT (b.cod_aux) product,
-- here goes code to get average cost,
(SELECT round(valorultent, 2)
FROM (SELECT valorultent
FROM pchistest
WHERE codprod = b.codprod
ORDER BY dtultent DESC)
WHERE ROWNUM = 1)
FROM pchistest a, pcembalagem b
WHERE a.codprod = b.codprod
GROUP BY a.codprod, b.cod_aux
ORDER BY b.cod_aux
In short, what I'm doing in the sub-select is ordering in descending order and taking the first row for the given product b.codprod.
Your problem is that you can't use your aliased columns deeper than one sub-query. According to the comments, this was changed in 12C, but I haven't had a chance to try it as the data warehouse that I use is still on 11g.
I would use something like this:
SELECT b.cod_aux AS product
,ROUND (r.valorultent, 2) AS valorultent
FROM pchistest a
JOIN pcembalagem b ON (a.codprod = b.codprod)
JOIN (SELECT valorultent
,codprod
,ROW_NUMBER() OVER (PARTITION BY codprod
ORDER BY dtultent DESC)
AS row_no
FROM pchistest) r ON (r.row_no = 1 AND r.codprod = b.codprod)
GROUP BY a.codprod, b.cod_aux, r.valorultent
ORDER BY b.cod_aux
I avoid sub-queries in SELECT statements. Most of the time, the optimizer wants to run a SELECT for each item in the cursor, OR it does some crazy nested loops. If you do it as a sub-query in the JOIN, Oracle will normally process the rows that you are joining; normally, it is more efficient. Finally, complete your per item functions (in this case, the ROUND) in the final product. This will prevent Oracle from doing it on ALL rows, not just the ones you use. It should do it correctly, but it can get confused on complex queries.
The ROW_NUMBER() OVER (PARTITION BY ..) is where the magic happens. This adds a row number to each group of CODPRODs. This allows you to pluck the top row from each CODPROD, so this allows you to get the newest/oldest/greatest/least/etc from your sub-query. It is also great for filtering duplicates.

SELECT DISTINCT returns more rows than expected

I have read many answers here, but until now nothing could help me. I'm developing a ticket system, where each ticket has many updates.
I have about 2 tables: tb_ticket and tb_updates.
I created a SELECT with subqueries, which took a long time (about 25 seconds) to get about 1000 rows. Now I have changed it to use INNER JOIN instead of many SELECTs in subqueries. It is really fast (70 ms), but now I get duplicate tickets. I would like to know how I can get only the last row (ordering by time).
My current result is:
...
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
The first column is the ticket ID, the second is the update ID... I would like to get only a row per ticket ID, but DISTINCT does not work in this case. Which row should be? Always the latest one, so in this case 2014-08-26 10:40:21.
UPDATE:
It is a postgresql database. I did not share my current query because it has only portuguese names, so I think it would not help at all.
SOLUTION:
Used_By_Already had the best solution to my problem.
Without the details of your tables one has to guess the field names, but it seems that tb_updates has many records for a single record in tb_ticket (a many to one relationship).
A generic solution to your problem - to get just the "latest" record - is to use a subquery on tb_updates (see alias mx below) and then join that back to tb_updates so that only the record that has the latest date is chosen.
SELECT
t.*
, u.*
FROM tb_ticket t
INNER JOIN tb_updates u
ON t.ticket_id = u.ticket_id
INNER JOIN (
SELECT
ticket_id
, MAX(updated_at) max_updated
FROM tb_updates
GROUP BY
ticket_id
) mx
ON u.ticket_id = mx.ticket_id
AND u.updated_at = mx.max_updated
;
If you have a dbms that supports ROW_NUMBER() then using that function can be a very effective alternative method, but you haven't informed us which dbms you are using.
by the way:
These rows ARE distinct:
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
69759 is different to 69771, and that is enough for the 2 rows to be DISTINCT
There are differences in the 2 dates also.
DISTINCT is a row operator, which means it considers the entire row, not just the first column, when deciding which rows are unique.
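A quick way to check this behaviour is the sketch below, using Python's sqlite3 module as a stand-in for PostgreSQL. The table name result and its column names are guesses. The values are the first two fields and the timestamp from the two rows quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (ticket_id INTEGER, update_id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [
        (67355, 69759, "2014-08-22 09:40:21"),
        (67355, 69771, "2014-08-26 10:40:21"),
    ],
)

# DISTINCT compares whole rows: update_id and updated_at differ, so both stay.
whole_rows = conn.execute(
    "SELECT DISTINCT ticket_id, update_id, updated_at FROM result"
).fetchall()

# Only when ticket_id is the sole selected column do the rows collapse to one.
ids_only = conn.execute("SELECT DISTINCT ticket_id FROM result").fetchall()

print(len(whole_rows), ids_only)
```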
Used_By_Already's solution would work just fine. I'm not sure about the performance, but another solution would be to use CROSS APPLY, though that is limited to only a few DBMSs.
SELECT *
FROM tb_ticket ticket
CROSS APPLY (
SELECT top(1) *
FROM tb_updates details
WHERE details.ticketID = ticket.ticketID
ORDER BY updateTime desc
) updates
You can try something like the query below if your updateid is an identity column:
Select ticketed, max(updateid) from table
group by ticketed
To obtain the last row, you have to end your query with ORDER BY time DESC and then use TOP (1) in the SELECT statement to select only the first row in the query result.
ex:
select TOP (1) .....
from .....
where .....
order by time desc