I apologize if my code is not properly formatted. I am trying to query a table to build a report of the latest background-check date (bgcheckdate) and its status. The table contains several bgcheckdates and statuses for each record, but in my report I only need to see the latest bgcheckdate with its status.
SELECT BG.PEOPLE_ID, MAX(BG.DATE_RUN) AS DATERUN, BG.STATUS
FROM PKS_BGCHECK BG
GROUP BY BG.PEOPLE_ID, BG.status;
When I run the above query, I still see rows with multiple background check dates and statuses for each person.
Whereas when I run without the status, it works fine:
SELECT BG.PEOPLE_ID, MAX(BG.DATE_RUN)
FROM PKS_BGCHECK BG
GROUP BY BG.PEOPLE_ID;
So I'm wondering if anyone can help me query both the date run and the status, with both reflecting the latest date.
The best solution depends on which RDBMS you are using.
Here is one with basic, standard SQL:
SELECT bg.PEOPLE_ID, bg.DATE_RUN, bg.STATUS
FROM (
SELECT PEOPLE_ID, MAX(DATE_RUN) AS MAX_DATERUN
FROM PKS_BGCHECK
GROUP BY PEOPLE_ID
) sub
JOIN PKS_BGCHECK bg ON bg.PEOPLE_ID = sub.PEOPLE_ID
AND bg.DATE_RUN = sub.MAX_DATERUN;
But you can get multiple rows per PEOPLE_ID if there are ties.
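To see both the original problem and this tie caveat, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The sample rows are hypothetical, and DATE_RUN is stored as ISO-8601 text (an assumption) so that MAX() orders it chronologically.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PKS_BGCHECK (PEOPLE_ID INTEGER, DATE_RUN TEXT, STATUS TEXT);
    INSERT INTO PKS_BGCHECK VALUES
        (1, '2020-01-15', 'PENDING'),
        (1, '2021-03-02', 'CLEARED'),
        (2, '2019-07-30', 'FAILED'),
        (2, '2022-05-11', 'CLEARED'),
        (2, '2022-05-11', 'REVIEW');  -- tie on the latest date for person 2
""")

# The query from the question: grouping by STATUS yields one row per
# (PEOPLE_ID, STATUS) pair, not one row per person.
grouped_by_status = conn.execute("""
    SELECT PEOPLE_ID, MAX(DATE_RUN) AS DATERUN, STATUS
    FROM PKS_BGCHECK
    GROUP BY PEOPLE_ID, STATUS
""").fetchall()

# Join-to-max: keep only the rows whose DATE_RUN equals the person's maximum.
latest = conn.execute("""
    SELECT bg.PEOPLE_ID, bg.DATE_RUN, bg.STATUS
    FROM (SELECT PEOPLE_ID, MAX(DATE_RUN) AS MAX_DATERUN
          FROM PKS_BGCHECK
          GROUP BY PEOPLE_ID) sub
    JOIN PKS_BGCHECK bg ON bg.PEOPLE_ID = sub.PEOPLE_ID
                       AND bg.DATE_RUN = sub.MAX_DATERUN
    ORDER BY bg.PEOPLE_ID, bg.STATUS
""").fetchall()

print(len(grouped_by_status))  # 5
for row in latest:
    print(row)  # person 2 appears twice because of the tie
```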
In Oracle, Postgres, SQL Server, MySQL 8.0+ and others (but not MySQL 5.x) you can also use the window function row_number():
WITH cte AS (
SELECT PEOPLE_ID, DATE_RUN, STATUS
, ROW_NUMBER() OVER(PARTITION BY PEOPLE_ID ORDER BY DATE_RUN DESC) AS rn
FROM PKS_BGCHECK
)
SELECT PEOPLE_ID, DATE_RUN, STATUS
FROM cte
WHERE rn = 1;
This guarantees 1 row per PEOPLE_ID. Ties are resolved arbitrarily. Add more expressions to ORDER BY to break ties deterministically.
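A sketch of the row_number() version with a deterministic tie-breaker, again run through Python's sqlite3 module (window functions need SQLite 3.25 or later). Using STATUS as the second sort key is only an illustrative assumption; a unique column such as a primary key is the more natural tie-breaker in a real table.

```python
import sqlite3

# Hypothetical sample data; person 2 has two checks on the same latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PKS_BGCHECK (PEOPLE_ID INTEGER, DATE_RUN TEXT, STATUS TEXT);
    INSERT INTO PKS_BGCHECK VALUES
        (1, '2020-01-15', 'PENDING'),
        (1, '2021-03-02', 'CLEARED'),
        (2, '2022-05-11', 'CLEARED'),
        (2, '2022-05-11', 'REVIEW');
""")

# STATUS as a second ORDER BY key makes ties on DATE_RUN always pick the
# same row (here: the alphabetically first status).
latest = conn.execute("""
    WITH cte AS (
        SELECT PEOPLE_ID, DATE_RUN, STATUS,
               ROW_NUMBER() OVER (PARTITION BY PEOPLE_ID
                                  ORDER BY DATE_RUN DESC, STATUS) AS rn
        FROM PKS_BGCHECK
    )
    SELECT PEOPLE_ID, DATE_RUN, STATUS
    FROM cte
    WHERE rn = 1
    ORDER BY PEOPLE_ID
""").fetchall()

print(latest)
```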
In Postgres, the simplest solution would be with DISTINCT ON.
Details for both in this related answer:
Select first row in each GROUP BY group?
Selecting the latest row in a time-sensitive set is fairly easy and largely platform independent:
SELECT BG.PEOPLE_ID, BG.DATE_RUN, BG.STATUS
FROM PKS_BGCHECK BG
WHERE BG.DATE_RUN =(
SELECT MAX( DATE_RUN )
FROM PKS_BGCHECK
WHERE PEOPLE_ID = BG.PEOPLE_ID
AND DATE_RUN < SYSDATE );
If the PK is (PEOPLE_ID, DATE_RUN), the query will execute about as quickly as any other method. If they don't form the PK (why not???) then use them to form a unique index. But I'm sure you're already doing one or the other.
Btw, you don't really need the AND DATE_RUN < SYSDATE part of the subquery if you don't allow future dates to be entered. Some temporal implementations allow future dates (planned or scheduled events), so I'm used to adding it.
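A sketch of the correlated-subquery version in SQLite, which has no SYSDATE. A hypothetical :as_of parameter stands in for it and is fixed to a constant so the result is repeatable. The future-dated SCHEDULED row shows what the extra AND condition guards against, and the table uses (PEOPLE_ID, DATE_RUN) as its primary key as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PKS_BGCHECK (
        PEOPLE_ID INTEGER,
        DATE_RUN  TEXT,
        STATUS    TEXT,
        PRIMARY KEY (PEOPLE_ID, DATE_RUN)
    );
    INSERT INTO PKS_BGCHECK VALUES
        (1, '2021-03-02', 'CLEARED'),
        (1, '2030-01-01', 'SCHEDULED'),  -- future-dated, planned check
        (2, '2019-07-30', 'FAILED'),
        (2, '2022-05-11', 'CLEARED');
""")

# :as_of stands in for SYSDATE; rows dated on or after it are ignored by the subquery.
latest = conn.execute("""
    SELECT BG.PEOPLE_ID, BG.DATE_RUN, BG.STATUS
    FROM PKS_BGCHECK BG
    WHERE BG.DATE_RUN = (SELECT MAX(DATE_RUN)
                         FROM PKS_BGCHECK
                         WHERE PEOPLE_ID = BG.PEOPLE_ID
                           AND DATE_RUN < :as_of)
    ORDER BY BG.PEOPLE_ID
""", {"as_of": "2024-01-01"}).fetchall()

print(latest)
```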
I am looking to write a SQL statement that will pull EventIDs with two or more instances, but will omit the last instance of these records. This seems crazy, but the purpose of this is to look at events that have multiple updates (the updates are related to the TIME_OF_EVENT column, each time a crew/person updates the event, the time it was updated gets stored here) and see which ones have expired in the middle. I see if they expired in the middle by comparing the TIME_OF_EVENT to the PREV_ERT.
select *
from ert_change_log
where time_of_event > '30-SEP-21 23:59:59'
and source <> 'I'
The SQL above returns the rows I used as a reference: a sample of the table with examples of EventIDs that meet these criteria.
In a perfect world, the query I need would return only EventIDs 210043901 and 210044021, and would omit the latest TIME_OF_EVENT for each of those EventIDs.
If this is confusing I would be glad to offer more explanation or clarification!
Thanks for any input.
You can use the ROW_NUMBER analytic function:
SELECT *
FROM (
SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY time_of_event DESC) AS rn
FROM ert_change_log e
WHERE time_of_event >= DATE '2021-10-01'
AND source <> 'I'
)
WHERE rn > 1;
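A quick check of the rn > 1 filter on hypothetical rows, using Python's sqlite3 module (SQLite 3.25+). Times are stored as ISO-8601 text, so a plain string comparison replaces Oracle's DATE literal, and the source values are made up. The row with source 'I' is removed before numbering, so it does not count as the latest instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ert_change_log (event_id INTEGER, time_of_event TEXT, source TEXT);
    INSERT INTO ert_change_log VALUES
        (210043901, '2021-10-02 08:00', 'C'),
        (210043901, '2021-10-02 09:30', 'C'),
        (210043901, '2021-10-02 11:00', 'C'),
        (210043901, '2021-10-02 12:00', 'I'),  -- excluded by the source filter
        (210044021, '2021-10-03 10:00', 'C'),
        (210044021, '2021-10-03 12:00', 'C'),
        (210050000, '2021-10-04 07:00', 'C');  -- single instance, never returned
""")

# Number each event's updates newest-first, then drop the newest (rn = 1).
older_updates = conn.execute("""
    SELECT event_id, time_of_event
    FROM (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY event_id
                                    ORDER BY time_of_event DESC) AS rn
          FROM ert_change_log e
          WHERE time_of_event >= '2021-10-01'
            AND source <> 'I')
    WHERE rn > 1
    ORDER BY event_id, time_of_event
""").fetchall()

for row in older_updates:
    print(row)
```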
I have a table where I save authors and songs, with other columns. The same song can appear multiple times, and it obviously always comes from the same author. I would like to select the author that has the least songs, including the repeated ones, aka the one that is listened to the least.
The final table should show only one author name.
Clearly, one step is to find the count for every author. This can be done with an elementary aggregate query. Then, if you order by the count, you can just select the first row, which solves your problem. One approach is to use ROWNUM in an outer query. This is a very elementary approach, quite efficient, and it works in all versions of Oracle (it doesn't use any advanced features).
select author
from (
select author
from your_table
group by author
order by count(*)
)
where rownum = 1
;
Note that in the subquery we don't need to select the count (since we don't need it in the output). We can still use it in order by in the subquery, which is all we need it for.
The only tricky part here is to remember that you need to order the rows in the subquery, and then apply the ROWNUM filter in the outer query. This is because ORDER BY is the very last thing that is processed in any query - it comes after ROWNUM is assigned to rows in the output. So, moving the WHERE clause into the subquery (and doing everything in a single query, instead of a subquery and an outer query) does not work.
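For comparison, engines whose row-limiting clause is applied after ORDER BY (LIMIT in SQLite, or FETCH FIRST 1 ROW ONLY in Oracle 12c and later) do not need the outer query. A minimal sketch in SQLite via Python, with hypothetical rows in your_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (author TEXT, song TEXT);
    INSERT INTO your_table VALUES
        ('Adele', 'Hello'), ('Adele', 'Hello'), ('Adele', 'Skyfall'),
        ('Queen', 'Bohemian Rhapsody'), ('Queen', 'Bohemian Rhapsody'),
        ('Sia', 'Chandelier');
""")

# LIMIT is applied after ORDER BY, so aggregate, sort and limit fit in one query.
least_listened = conn.execute("""
    SELECT author
    FROM your_table
    GROUP BY author
    ORDER BY COUNT(*)
    LIMIT 1
""").fetchall()

print(least_listened)  # [('Sia',)]
```

If two authors tie on the lowest count, which one is returned is arbitrary unless you add a second ORDER BY key.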
You can use analytical functions as follows:
Select author from
(Select t.*,
-- no partition: number all rows by their author's total count
Row_number() over (order by cnt_author) as rn
From
(Select t.*,
-- total number of rows (plays) for each author
Count(*) over (partition by author) as cnt_author
From your_table t) t ) t
Where rn = 1
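A sketch that checks the analytic approach in SQLite (3.25+) through Python, on hypothetical data. Numbering the whole set by the per-author count, without a PARTITION BY, leaves exactly one row with rn = 1. For comparison, partitioning the row_number() by song restarts the numbering for every song and therefore returns one row per song rather than a single author.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (author TEXT, song TEXT);
    INSERT INTO your_table VALUES
        ('Adele', 'Hello'), ('Adele', 'Hello'), ('Adele', 'Skyfall'),
        ('Queen', 'Bohemian Rhapsody'), ('Queen', 'Bohemian Rhapsody'),
        ('Sia', 'Chandelier');
""")

# Inner query: attach each author's total row count to every row.
COUNTED = """
    SELECT t.*, COUNT(*) OVER (PARTITION BY author) AS cnt_author
    FROM your_table t
"""

# No PARTITION BY: a single numbering over all rows, so only one row has rn = 1.
least_listened = conn.execute(f"""
    SELECT author
    FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY cnt_author) AS rn
          FROM ({COUNTED}) t) t
    WHERE rn = 1
""").fetchall()

# PARTITION BY song: numbering restarts per song, giving one row per song.
per_song = conn.execute(f"""
    SELECT author, song
    FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY song ORDER BY cnt_author) AS rn
          FROM ({COUNTED}) t) t
    WHERE rn = 1
""").fetchall()

print(least_listened)  # [('Sia',)]
print(len(per_song))   # 4
```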
I'm trying to figure out how to do this via a view, if possible (I'm definitely aware this can be done in-line, via a function, and/or a proc).
There's a view that needs to dedupe a dataset and pick the most recent record. So I'm trying to use either row_number() or a TOP 1 with ORDER BY via a CROSS APPLY, but the problem is that the query can filter on a date, for example:
select x
from view
where date < somedate
and the most recent record needs to be calculated for that filtered dataset. Is there a way to do this in a view? A correlated subquery comes to mind, but try as I may, either I get the full duplicated dataset, or it picks the most recent record in the whole table without the date filter and then applies the date filter after the fact, which isn't the same thing.
Some background per Yogesh:
The table in question contains history of an employee table, where each employee_id can exist multiple times with different date values. There is a primary key on this table which is an employeehistory_id (identity). The goal is to get the most recent record for all employees (1 unique record per employee) where the date < some date. The problem with the windowing is that it needs to almost have the date filter in a subquery within the view (from what I'm seeing). Hopefully that helps clarify the answer.
Currently the view would be something like
SELECT a.*
FROM employeehistory a
join (select employee_id, employeehistory_ID, row_number()OVER(PARTITION
BY employee_id ORDER BY Date DESC) as Ranked
FROM employeehistory) b
on a.employee_id = b.employee_id
and a.employeehistory_ID = b.employeehistory_ID
where b.Ranked = 1
So, as you can see, filtering the view on a date doesn't necessarily propagate to the inner query. I'm asking whether there is a way to keep this working in a view. Once again, I'm aware this can be done as either a function or a proc. Thanks!
We're using SQL Server 2016 Enterprise edition.
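The difference described in the question can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module rather than SQL Server, plus a hypothetical title column and made-up rows. It only illustrates why ranking before the date filter is not the same as ranking after it; it does not by itself make the view accept a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employeehistory (
        employeehistory_ID INTEGER PRIMARY KEY,
        employee_id INTEGER,
        Date TEXT,
        title TEXT  -- hypothetical payload column
    );
    INSERT INTO employeehistory VALUES
        (1, 100, '2020-01-01', 'Analyst'),
        (2, 100, '2022-06-01', 'Manager'),
        (3, 200, '2021-03-01', 'Clerk'),
        (4, 200, '2021-09-01', 'Senior Clerk');

    -- Same shape as the view in the question: ranks over the WHOLE table.
    CREATE VIEW latest_employee AS
    SELECT a.*
    FROM employeehistory a
    JOIN (SELECT employee_id, employeehistory_ID,
                 ROW_NUMBER() OVER (PARTITION BY employee_id
                                    ORDER BY Date DESC) AS Ranked
          FROM employeehistory) b
      ON a.employee_id = b.employee_id
     AND a.employeehistory_ID = b.employeehistory_ID
    WHERE b.Ranked = 1;
""")

cutoff = "2022-01-01"

# Rank first, then filter: employee 100's latest row is after the cutoff,
# so that employee disappears instead of falling back to an older row.
rank_then_filter = conn.execute(
    "SELECT employee_id, Date, title FROM latest_employee "
    "WHERE Date < ? ORDER BY employee_id", (cutoff,)).fetchall()

# Filter first, then rank: the date predicate sits inside the ranking query.
filter_then_rank = conn.execute("""
    SELECT employee_id, Date, title
    FROM (SELECT employee_id, Date, title,
                 ROW_NUMBER() OVER (PARTITION BY employee_id
                                    ORDER BY Date DESC) AS Ranked
          FROM employeehistory
          WHERE Date < ?)
    WHERE Ranked = 1
    ORDER BY employee_id
""", (cutoff,)).fetchall()

print(rank_then_filter)
print(filter_then_rank)
```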
How about:
select top 1 x
from view
where date < somedate
order by date desc
SELECT TOP 1 x
FROM [View]
WHERE
date < somedate
ORDER BY
date DESC
select *
from MyView
where myDate < somedate
order by myDate desc
limit 1;
limit 1 is the same as top 1 but for MySQL. I have a fiddle that shows what I think your expected output should look like for it as well.
http://sqlfiddle.com/#!9/209226/7
Here is my (simplified) problem, very common I guess:
create table sample (client, recordDate, amount)
I want to find the latest record for each client, with its recordDate and amount.
I wrote the code below, which works, but I wonder if there is a better pattern or Oracle tweak to improve the efficiency of such a SELECT. (I am not allowed to modify the structure of the database, so indexes etc. are out of reach for me and out of scope for the question.)
select client, recordDate, Amount
from sample s
inner join (select client, max(recordDate) lastDate
from sample
group by client) t on s.client = t.client and s.recordDate = t.lastDate
The table has half a million records and the select takes 2-4 secs, which is acceptable but I am curious to see if that can be improved.
Thanks
In most cases window functions might perform better (at least they're easier to write):
select client, recordDate, Amount
from
(
select client, recordDate, Amount,
rank() over (partition by client order by recordDate desc) as rn
from sample s
) dt
where rn = 1
Another structure for the query is not exists. This can perform faster under some circumstances:
select client, recordDate, Amount
from sample s
where not exists (select 1
from sample s2
where s2.client = s.client and
s2.recordDate > s.recordDate
);
This would take good advantage of an index on sample(client, recordDate), if one were available.
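A sketch comparing this NOT EXISTS query with the rank() query from the previous answer on hypothetical data, using SQLite 3.25+ via Python. The index is created only to mirror the one mentioned above; the results do not depend on it. Both queries keep every row that ties on a client's latest recordDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (client TEXT, recordDate TEXT, amount REAL);
    INSERT INTO sample VALUES
        ('A', '2023-01-05', 10.0),
        ('A', '2023-02-10', 25.0),
        ('B', '2023-01-20', 7.5),
        ('B', '2023-01-20', 9.0),  -- tie on the latest date for client B
        ('C', '2022-12-31', 40.0);
    CREATE INDEX sample_client_date ON sample (client, recordDate);
""")

# Keep a row when no other row of the same client has a later recordDate.
not_exists = conn.execute("""
    SELECT client, recordDate, amount
    FROM sample s
    WHERE NOT EXISTS (SELECT 1
                      FROM sample s2
                      WHERE s2.client = s.client
                        AND s2.recordDate > s.recordDate)
    ORDER BY client, amount
""").fetchall()

# rank() gives every row tied on the latest date the value 1.
ranked = conn.execute("""
    SELECT client, recordDate, amount
    FROM (SELECT client, recordDate, amount,
                 RANK() OVER (PARTITION BY client ORDER BY recordDate DESC) AS rn
          FROM sample) dt
    WHERE rn = 1
    ORDER BY client, amount
""").fetchall()

print(not_exists == ranked)  # True
```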
And, another thing to try is keep:
select client, max(recordDate),
max(Amount) keep (dense_rank first order by recordDate desc)
from sample s
group by client;
This version assumes only one max record date per client (your original query does not make that assumption).
These queries (plus the one by dnoeth) should all have different query plans and you might get lucky on one of them. The best solution, though, is to have the appropriate index.
I know that there are several posts about how BAD it is to loop in a SQL Server stored procedure, but I haven't quite found what I am trying to do. We are using data connectivity that can be linked internally, directly into Excel.
I have seen some posts where a few people have said they could convert most loops to a standard query. But for the life of me I am having trouble with this one.
I need all custIDs who have orders right before an event of type 38 or 40, but only if there is no other order between the event and the order from the first query.
So there are 3 parts. I first query for all orders (orders table) based on a time frame into a temporary table.
Select odate, custId into temp1 from orders where odate>'5/1/12'
Then I could use the temp table to inner join on the secondary table to get a customer event (LogEvent table) that may have occurred some time in the past prior to the current order.
Select eventdate, temp1.custID into temp2 from LogEvent inner join temp1 on
temp1.custID=LogEvent.custID where EventType in (38,40) and temp1.odate>eventdate
order by eventdate desc
The problem here is that these queries return all rows for each customer from the first query, whereas I only want the latest event for each customer. On the client side I would loop to get only one event instead of all the old ones, but as the whole query has to run inside Excel, I can't really loop client side.
The third step could then use the results from the second query to check whether the event occurred between the most recent order and any previous order. I only want the data where the event precedes the order and no other orders are in between.
Select ordernum, shopcart.custID from shopcart right outer join temp2 on
shopcart.custID=temp2.custID where shopcart.odate >= temp2.eventdate and
ordernum is null
Is there a way to simplify this and make it set-based, so it runs in SQL Server instead of in some kind of loop on the client?
This is a great example of switching to set-based notation.
First, I combined all three of your queries into a single query. In general, a single query lets the query optimizer do what it does best: determine execution paths. It also prevents accidental serialization of queries on a multithreaded/multiprocessor machine.
The key is row_number() for ordering the events so the most recent has a value of 1. You'll see this in the final WHERE clause.
select ordernum, shopcart.custID
from (Select eventdate, temp1.custID,
row_number() over (partition by temp1.CustID order by EventDate desc) as seqnum
from LogEvent inner join
(Select odate, custId
from order
where odate>'5/1/12'
) temp1
on temp1.custID=LogEvent.custID
where EventType in (38,40) and temp1.odate>eventdate
) temp2 left outer join
ShopCart
on shopcart.custID=temp2.custID
where seqnum = 1 and shopcart.odate >= temp2.eventdate and ordernum is null
I kept your naming conventions, even though I think "from order" should generate a syntax error. Even if it doesn't, it is bad practice to name tables and columns with reserved SQL words.
If you are using SQL Server 2005 or later, you can use the ROW_NUMBER function. For example:
;WITH myCTE AS
(
SELECT
eventdate, temp1.custID,
ROW_NUMBER() OVER (PARTITION BY temp1.custID ORDER BY eventdate desc) AS CustomerRanking
FROM LogEvent
JOIN temp1
ON temp1.custID=LogEvent.custID
WHERE EventType IN (38,40) AND temp1.odate>eventdate
)
SELECT * into temp2 from myCTE WHERE CustomerRanking = 1;
This gets you the most recent event for each customer without a loop.
You could also use RANK; however, that produces duplicates for ties, whereas ROW_NUMBER guarantees unique numbers within each partition.
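A small sketch of that difference, using SQLite through Python's sqlite3 module and hypothetical LogEvent rows in which customer 1 has two events on the same latest date. The ranking function name is interpolated into the SQL text only because it comes from the two fixed literals in the code, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LogEvent (custID INTEGER, eventdate TEXT, EventType INTEGER);
    INSERT INTO LogEvent VALUES
        (1, '2012-04-01', 38),
        (1, '2012-04-20', 40),
        (1, '2012-04-20', 38),  -- two events on the latest date for customer 1
        (2, '2012-03-15', 40);
""")

def latest_events(ranking_function):
    """Return the rows ranked 1 per customer using ROW_NUMBER or RANK."""
    # ranking_function is one of the fixed literals below, never user input.
    sql = f"""
        SELECT custID, eventdate, EventType
        FROM (SELECT custID, eventdate, EventType,
                     {ranking_function}() OVER (PARTITION BY custID
                                                ORDER BY eventdate DESC) AS CustomerRanking
              FROM LogEvent
              WHERE EventType IN (38, 40)) ranked
        WHERE CustomerRanking = 1
        ORDER BY custID, EventType
    """
    return conn.execute(sql).fetchall()

by_row_number = latest_events("ROW_NUMBER")  # one row per customer; tie broken arbitrarily
by_rank = latest_events("RANK")              # both tied rows for customer 1

print(len(by_row_number), len(by_rank))  # 2 3
```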