Select the latest record based on certain criteria in PL/SQL - sql

I have a table with real-time scanning data from our employees. As you can see, each box can be scanned multiple times, and an employee can even scan the same box multiple times for one status.
I am trying to pull the latest record for each box, but only when the status of that latest record is "Refused".
As the picture shows, carton 1234 has a record with status "Refused", but that record is not the last one, so I don't need it. Carton 1235 is what I need.
I don’t want to use a window function to rank each record in the table first, because I have a lot of rows in the table, and I think it will be time consuming.
So is there any better way to achieve my goal?

Assuming you don't really need a PL/SQL solution, here is a SQL-only one that avoids window functions:
select *
from mytable
where (carton_id, scantime) in
(
select carton_id, max(scantime)
from mytable
group by carton_id
having max(status) keep (dense_rank last order by scantime) = 'Refused'
);
But I don't think that this is superior to using a window function. So you can just as well try
select *
from
(
select mytable.*, max(scantime) over (partition by carton_id) as max_scantime
from mytable
)
where scantime = max_scantime and status = 'Refused';

Here is one method:
select t.*
from t
where t.status = 'Refused' and
t.scantime = (select max(t2.scantime) from t t2 where t2.carton_id = t.carton_id);
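
The correlated-subquery approach can be tried out on a small data set. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the sample rows and the statuses "Received" and "Delivered" are made up for illustration, and only the table and column names come from the answers above.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the thread.
ROWS = [
    (1234, "Received", "2021-03-01 08:00:00"),
    (1234, "Refused", "2021-03-01 09:00:00"),
    (1234, "Delivered", "2021-03-01 10:00:00"),  # latest scan is not 'Refused'
    (1235, "Received", "2021-03-01 08:30:00"),
    (1235, "Refused", "2021-03-01 11:00:00"),    # latest scan is 'Refused'
]

QUERY = """
select t.carton_id, t.status, t.scantime
from mytable t
where t.status = 'Refused'
  and t.scantime = (select max(t2.scantime)
                    from mytable t2
                    where t2.carton_id = t.carton_id)
"""


def latest_refused(rows):
    """Return the cartons whose most recent scan has status 'Refused'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (carton_id integer, status text, scantime text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = latest_refused(ROWS)
print(result)  # [(1235, 'Refused', '2021-03-01 11:00:00')]
```

On a large table this kind of query usually depends on an index covering (carton_id, scantime); whether it beats the window-function version is something to measure on the real data.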

Related

SQL - Min() on a Daily Query

I am trying to pull some specific information from an access control database.
I have a query providing results spanning several days. For a specific day, I need to get the first record of each person for that specific day. I have totally muddled the entire bit, hence my questions
This is the code used to pull the initial query
Select
Message.TimeStamp_SPM,
Message.FirstName,
Message.LastName,
Message.CardNumber,
Message.MessageDescription,
Message.Description,
Department.Description As Description1
From
Message Inner Join
CardHolder On CardHolder.CardHolderID = Message.CardHolderID Inner Join
Department On CardHolder.DepartmentID = Department.DepartmentID
Where
Message.TimeStamp_SPM > Convert(datetime,'2021-03-02',120) And
Message.TimeStamp_SPM < Convert(datetime,'2021-03-03',120) And
Message.Description Not Like '%Truck%'
From this query I need to obtain the first record of each person for that specific date. Any advice on the most efficient way to obtain the desired result?
Assuming "person" is CardHolderId, include that column in your query. You can then use window functions to get the first record for each CardHolderId:
with cte as (
<your query here with CardHolderId>
)
select cte.*
from (select cte.*,
row_number() over (partition by CardHolderID order by TimeStamp_SPM) as seqnum
from cte
) cte
where seqnum = 1;

Efficiently find last date in a table - Teradata SQL

Say I have a rather large table in a Teradata database, "Sales" that has a daily record for every sale and I want to write a SQL statement that limits this to the latest date only. This will not always be the previous day, for example, if it was a Monday the latest date would be the previous Friday.
I know I can get the results by the following:
SELECT s.*
FROM Sales s
JOIN (
SELECT MAX(SalesDate) as SalesDate
FROM Sales
) sd
ON s.SalesDate = sd.SalesDate
I don't know how the subquery would be processed. Since Sales is a large table, is there a more efficient way to do this, given that there is no other table I could use?
Another (more flexible) way to get the top n utilizes OLAP-functions:
SELECT *
FROM Sales s
QUALIFY
RANK() OVER (ORDER BY SalesDate DESC) = 1
This will return all rows with the max date. If you want only one of them switch to ROW_NUMBER.
That is probably fine, if you have an index on salesdate.
If there is only one row, then I would recommend:
select top 1 s.*
from sales s
order by salesdate desc;
In particular, this should make use of an index on salesdate.
If there is more than one row, use top 1 with ties.

SQL select the first match in an ordered list

Using Microsoft SQL Server Management Studio version 14.0.17213.0
I have a list of events that go in order. I want to select the highest precedent acct_no, complete_date and event.
My problem is if I use
select
account_number, event, max(complete_date) as mx_comp
from
mytable
where
event in ('event1','event2'....)
then I get all my acct_numbers, all the events in the list and the max complete date for that event. But I want acct_no listed with the maximum completed date for any item in the list and the associated event.
Furthermore, it's entirely possible that two events occurred on the same date, so I cannot do
select *
from mytable mt
join
(select account_number, max(complete_date) as complete_date
from mytable
group by account_number) t on mt.account_number = t.account_number
and mt.complete_date = t.complete_date
because if two events occurred on the same day then I still get duplicate results.
I have tried to do a similar thing with
row_number() over (order by account_number) as RowNum
but it did not work, because I still get matches to all the events, not just my highest-precedence event.
It really boils down to needing to return the acct_number, event and complete date associated with the highest-importance match from items in an ordered list.
I am sure it is easy, but despite all my Google and Stack searching I cannot figure it out.
I have recently been thinking that it might be possible with something like coalesce(mylist), because I would be able to put my list in order, but I cannot figure out how to use coalesce in a meaningful way for this problem.
The real solution would be to create a table with precedence numbers or have a most-recent indicator, but I don't have unlimited access to create any tables I want.
Any help or ideas on how to match to an ordered list would be appreciated.
You seem to want:
select t.*
from (select t.*,
row_number() over (partition by account_number order by complete_date desc) as seqnum
from mytable t
where event in ('event1', 'event2', ....)
) t
where seqnum = 1;
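
When two events share the same complete_date, the query above picks one of them arbitrarily. One possible way to break such ties by list precedence, without creating a lookup table, is a CASE expression in the ORDER BY. The following is a minimal sketch in Python with the built-in sqlite3 module (it assumes an SQLite build with window functions, version 3.25 or later); the sample rows, the helper names and the precedence list are hypothetical.

```python
import sqlite3

# Hypothetical precedence list: earlier entries are more important.
EVENT_ORDER = ["event1", "event2", "event3"]

# Hypothetical sample rows: (account_number, event, complete_date).
ROWS = [
    ("A1", "event2", "2021-05-01"),
    ("A1", "event1", "2021-05-01"),  # same date as above; event1 ranks higher
    ("A1", "event3", "2021-04-01"),
    ("B2", "event3", "2021-06-01"),
    ("B2", "event1", "2021-05-15"),
    ("B2", "other", "2021-07-01"),   # not in the list, so it is filtered out
]


def build_query(events):
    """Build the query with one CASE branch and one IN placeholder per event."""
    whens = " ".join(f"when ? then {i}" for i in range(len(events)))
    marks = ", ".join("?" for _ in events)
    return f"""
        select account_number, event, complete_date
        from (select t.*,
                     row_number() over (
                         partition by account_number
                         order by complete_date desc,
                                  case event {whens} end
                     ) as seqnum
              from mytable t
              where event in ({marks})
             ) t
        where seqnum = 1
        order by account_number
    """


def top_event_per_account(rows, events):
    """Latest listed event per account; same-date ties go to list precedence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (account_number text, event text, complete_date text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    # Parameters bind in order of appearance: the CASE branches, then the IN list.
    result = conn.execute(build_query(events), list(events) * 2).fetchall()
    conn.close()
    return result


winners = top_event_per_account(ROWS, EVENT_ORDER)
print(winners)  # [('A1', 'event1', '2021-05-01'), ('B2', 'event3', '2021-06-01')]
```

Here the date still decides first and precedence only breaks ties. If precedence should win outright, the two ORDER BY terms would be swapped.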

How to get the most frequent value SQL

I have a table Orders(id_trip, id_order), table Trip(id_hotel, id_bus, id_type_of_trip) and table Hotel(id_hotel, name).
I would like to get name of the most frequent hotel in table Orders.
SELECT hotel.name from Orders
JOIN Trip
on Orders.id_trip = Trip.id_hotel
JOIN hotel
on trip.id_hotel = hotel.id_hotel
FROM (SELECT hotel.name, rank() over (order by cnt desc) rnk
FROM (SELECT hotel.name, count(*) cnt
FROM Orders
GROUP BY hotel.name))
WHERE rnk = 1;
The "most frequently occurring value" in a distribution is a distinct concept in statistics, with a technical name. It's called the MODE of the distribution. And Oracle has the STATS_MODE() function for it. https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions154.htm
For example, using the EMP table in the standard SCOTT schema, select stats_mode(deptno) from scott.emp will return 30 - the number of the department with the most employees. (30 is the department "name" or number, it is NOT the number of employees in that department!)
In your case:
select stats_mode(h.name) from (the rest of your query)
Note: if two or more hotels are tied for "most frequent", then STATS_MODE() will return one of them (non-deterministic). If you need all the tied values, you will need a different solution - a good example is in the documentation (linked above). This is a documented flaw in Oracle's understanding and implementation of the statistical concept.
Use FIRST for a single result:
SELECT MAX(name) KEEP (DENSE_RANK FIRST ORDER BY cnt DESC)
FROM (
SELECT hotel.name, COUNT(*) cnt
FROM orders
JOIN trip USING (id_trip)
JOIN hotel USING (id_hotel)
GROUP BY hotel.name
) t
Here is one method:
select name
from (select h.name,
row_number() over (order by count(*) desc) as seqnum -- use `rank()` if you want ties
from orders o join
trip t
on o.id_trip = t.id_trip join -- this seems like the right join condition
hotel h
on t.id_hotel = h.id_hotel
group by h.name
) oth
where seqnum = 1;
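
To see how rank() behaves when hotels are tied, here is a minimal sketch in Python with the built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). It splits the counting and the ranking into two nested steps. The hotel names, the ids and the id_trip column on trip are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema and data; trip is assumed to have an id_trip key.
SCHEMA = """
    create table hotel  (id_hotel integer primary key, name text);
    create table trip   (id_trip integer primary key, id_hotel integer);
    create table orders (id_order integer primary key, id_trip integer);
    insert into hotel values (1, 'Alpha'), (2, 'Beta');
    insert into trip values (10, 1), (11, 2), (12, 2);
"""

QUERY = """
select name
from (select name, rank() over (order by cnt desc) as rnk
      from (select h.name as name, count(*) as cnt
            from orders o
            join trip t on o.id_trip = t.id_trip
            join hotel h on t.id_hotel = h.id_hotel
            group by h.name) counted
     ) ranked
where rnk = 1
order by name
"""


def most_frequent_hotels(order_trip_ids):
    """Return every hotel name tied for the highest number of orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into orders (id_trip) values (?)",
        [(trip_id,) for trip_id in order_trip_ids],
    )
    names = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return names


top = most_frequent_hotels([10, 11, 12, 12])  # Beta has 3 orders, Alpha has 1
tied = most_frequent_hotels([10, 11])         # one order each
print(top)   # ['Beta']
print(tied)  # ['Alpha', 'Beta']
```

Replacing rank() with row_number() would return a single, arbitrarily chosen row in the tied case.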
** Getting the most recent statistical mode out of a data sample **
I know it's more than a year, but here's my answer. I came across this question hoping to find a simpler solution than what I know, but alas, nope.
I had a similar situation where I needed to get the mode from a data sample, with the requirement to get the mode of the most recently inserted value if there were multiple modes.
In such a case neither the STATS_MODE nor the LAST aggregate functions would do (as they would tend to return the first mode found, not necessarily the mode with the most recent entries.)
In my case it was easy to use the ROWNUM pseudo-column because the tables in question were performance metric tables that only experienced inserts (not updates)
In this oversimplified example, I'm using ROWNUM - it could easily be changed to a timestamp or sequence field if you have one.
SELECT VALUE
FROM
(SELECT VALUE ,
COUNT( * ) CNT,
MAX( R ) R
FROM
( SELECT VALUE, ROWNUM R FROM FOO
)
GROUP BY VALUE
ORDER BY CNT DESC,
R DESC
)
WHERE
(
ROWNUM < 2
);
That is, get the total count and max ROWNUM for each value (I'm assuming the values are discrete. If they aren't, this ain't gonna work.)
Then sort so that the ones with largest counts come first, and for those with the same count, the one with the largest ROWNUM (indicating most recent insertion in my case).
Then skim off the top row.
Your specific data model should have a way to discern the most recent (or the oldest or whatever) rows inserted in your table, and if there are collisions, then there's not much of a way other than using ROWNUM or getting a random sample of size 1.
If this doesn't work for your specific case, you'll have to create your own custom aggregator.
Now, if you don't care which mode Oracle is going to pick (your business case just requires a mode and that's it), then STATS_MODE will do fine.
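
The count-then-recency ordering described in this answer can be sketched outside Oracle as well. The following minimal example uses Python's built-in sqlite3 module; the seq column is a hypothetical insertion-order column standing in for ROWNUM, and LIMIT 1 plays the role of the outer ROWNUM < 2 filter.

```python
import sqlite3


def most_recent_mode(values):
    """Return the most frequent value; ties go to the value inserted last."""
    conn = sqlite3.connect(":memory:")
    # seq is a hypothetical insertion-order column standing in for ROWNUM.
    conn.execute("create table foo (seq integer primary key, value text)")
    conn.executemany(
        "insert into foo (value) values (?)", [(v,) for v in values]
    )
    row = conn.execute(
        """
        select value
        from (select value, count(*) as cnt, max(seq) as last_seq
              from foo
              group by value)
        order by cnt desc, last_seq desc
        limit 1
        """
    ).fetchone()
    conn.close()
    return row[0] if row else None


# 'a' and 'b' both occur twice; 'b' was inserted more recently.
mode = most_recent_mode(["a", "b", "a", "b", "c"])
print(mode)  # b
```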

SQL Server: I have multiple records per day and I want to return only the first of the day

I have some records that track inquiries by DATETIME. There is a glitch in the system, and sometimes a record will be entered multiple times on the same day. I have a query with a bunch of correlated subqueries attached to these records, but the numbers are off because, when those glitches occur, the affected leads show up multiple times. I need the first entry of the day. I tried fooling around with MIN but couldn't quite get it to work.
I currently have this, I am not sure if I am on the right track though.
SELECT SL.UserID, MIN(SL.Added) OVER (PARTITION BY SL.UserID)
FROM SourceLog AS SL
Here's one approach using row_number():
select *
from (
select *,
row_number() over (partition by userid, cast(added as date) order by added) rn
from sourcelog
) t
where rn = 1
You could use group by along with min to accomplish this.
How to do it depends on how your data is structured. If you assign a unique sequential number to each record as it is created, you could just return the lowest number created per day. Otherwise you would need to return the ID of the record with the earliest DATETIME value per day.
--Assumes sequential IDs
select
min(Id)
from
[YourTable]
group by
--the conversion is used to strip the time value out of the date/time
convert(date, [YourDateTime])
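
To check the row_number() approach against glitch duplicates, here is a minimal sketch in Python with the built-in sqlite3 module (assuming SQLite 3.25 or later for window functions). SQLite's date() takes the place of cast(added as date); the id column and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (id, userid, added).
ROWS = [
    (1, 7, "2021-03-02 09:00:00"),
    (2, 7, "2021-03-02 09:00:05"),  # glitch duplicate on the same day
    (3, 7, "2021-03-03 14:00:00"),
    (4, 8, "2021-03-02 10:30:00"),
]

QUERY = """
select id, userid, added
from (select s.*,
             row_number() over (partition by userid, date(added)
                                order by added, id) as rn
      from sourcelog s) t
where rn = 1
order by id
"""


def first_entry_per_user_day(rows):
    """Return the earliest row for each (userid, calendar day) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sourcelog (id integer, userid integer, added text)")
    conn.executemany("insert into sourcelog values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


firsts = first_entry_per_user_day(ROWS)
print(firsts)
# [(1, 7, '2021-03-02 09:00:00'), (3, 7, '2021-03-03 14:00:00'), (4, 8, '2021-03-02 10:30:00')]
```

Partitioning by both userid and the date keeps one row per user per day, whereas grouping by the date alone would keep only one row per day across all users.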