Filter duplicated value in SQL

I'm trying to build a query that returns a list of five jobs for a weekly promotion. The query works and gives the right result, but one factor still needs a filter.
We want to promote different jobs from different companies. The ORDER BY selects the jobs with the highest need for applicants. It can happen that one company has the five most urgent jobs, in which case the query selects five jobs from that one company. I want to add a filter so that the query selects a maximum of two or three jobs per company, but I couldn't work out how.
I've tried various uses of DISTINCT, without result. I suspect the underlying problem has something to do with the grouping on job.id (just a thought), but I can't find a solution.
SELECT
job.id,
company_name,
city,
job.title,
hourly_rate_amount,
created_at,
count(work_intent.id),
number_of_contractors,
(count(work_intent.id)/number_of_contractors) AS applicants,
(3959 * acos(cos(radians(52.370216)) * cos( radians(address.latitude))
* cos(radians(longitude) - radians(4.895168)) + sin(radians(52.370216)) * sin(radians(latitude)))) AS distance
FROM job
INNER JOIN client on job.client_id = client.id
INNER JOIN address on job.address_id = address.id
LEFT JOIN work_intent on job.id = work_intent.job_id
INNER JOIN job_title on job.job_title_id = job_title.id
WHERE job_title.id = ANY
(SELECT job_title.id FROM job_title WHERE job.job_title_id = '28'
or job.job_title_id = '30'
or job.job_title_id = '31'
or job.job_title_id = '32'
)
AND job.status = 'open'
AND convert(job.starts_at, date) = '2019-09-19'
AND hourly_rate_amount > 1500
GROUP BY job.id
HAVING distance < 20
ORDER BY applicants, distance
LIMIT 5
I expect the output would be:
job.id - company_name - applicants
14842 - company_1 - 0
46983 - company_6 - 0
45110 - company_5 - 0
95625 - company_1 - 1
12055 - company_3 - 2

One quite simple solution, which can be applied without essentially modifying the logic of the query, is to wrap the query and use ROW_NUMBER() to rank the records. Then you can filter on the row number to limit the number of records per company. (If this is MySQL, window functions such as ROW_NUMBER() require version 8.0 or later.)
Consider:
SELECT *
FROM (
SELECT
x.*,
row_number() over(partition by company_name order by applicants, distance) rn
FROM (
-- your query, without ORDER BY and LIMIT
) x
) y
WHERE rn <= 3
ORDER BY applicants, distance
LIMIT 5
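To see the per-company cap in action, here is a minimal sketch that runs the same wrapping pattern against a small in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions), the ranked_job table and its rows are assumptions made for the demonstration; ranked_job stands in for the result of the original query without its ORDER BY and LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the inner query's result set.
conn.execute(
    "CREATE TABLE ranked_job "
    "(id INTEGER, company_name TEXT, applicants REAL, distance REAL)"
)
conn.executemany(
    "INSERT INTO ranked_job VALUES (?, ?, ?, ?)",
    [
        (1, "company_1", 0, 1.0),
        (2, "company_1", 0, 2.0),
        (3, "company_1", 0, 3.0),
        (4, "company_1", 0, 4.0),
        (5, "company_1", 0, 5.0),
        (6, "company_2", 1, 1.0),
        (7, "company_3", 2, 1.0),
    ],
)

CAPPED_QUERY = """
SELECT id, company_name
FROM (
    SELECT x.*,
           ROW_NUMBER() OVER (PARTITION BY company_name
                              ORDER BY applicants, distance) AS rn
    FROM ranked_job AS x
) AS y
WHERE rn <= 3              -- at most three jobs per company
ORDER BY applicants, distance
LIMIT 5
"""

top_jobs = conn.execute(CAPPED_QUERY).fetchall()
print(top_jobs)
```

Without the rn filter, all five rows would come from company_1; with it, the two remaining slots go to the next most urgent jobs of other companies.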

Related

SQL Query to retrieve members who didn't make any payment for the past six months

I have tried to create a query to retrieve members who didn't complete a payment in the past six months. I have two tables, one for the members' details and the other for payment management. I have tried the below code but it doesn't work perfectly.
SELECT *
FROM tbl_members e
LEFT OUTER JOIN tbl_paymgt s on e.MemberID=s.memberID
WHERE (((DATEDIFF(NOW(), s.paidDate)<365)
AND (DATEDIFF(NOW(), s.paidDate)>180)))
AND (e.MembershipStatus=1);
This code only retrieves payments completed within the past six months.

Paraphrased: select all customers who are currently members and have had no payments in the last 6 months.
Fixing your query directly gives:
SELECT
*
FROM
tbl_members e
LEFT OUTER JOIN
tbl_paymgt s
ON e.MemberID=s.memberID
AND s.PaidDate > NOW() - INTERVAL 6 MONTH
WHERE
e.membershipStatus = 1
AND s.MemberID IS NULL
But, using NOT EXISTS() is more readable and often less costly to run.
SELECT
*
FROM
tbl_members m
WHERE
MembershipStatus = 1
AND
NOT EXISTS (
SELECT *
FROM tbl_paymgt p
WHERE p.PaidDate > NOW() - INTERVAL 6 MONTH
AND p.MemberID = m.MemberID
);
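As a quick check that the two forms above select the same members, here is a small sketch using Python's sqlite3 module. It is an illustration under assumptions: SQLite stands in for MySQL, so NOW() - INTERVAL 6 MONTH becomes date(:today, '-6 months') with a fixed reference date, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_members (MemberID INTEGER PRIMARY KEY, MembershipStatus INTEGER);
CREATE TABLE tbl_paymgt (memberID INTEGER, paidDate TEXT);
INSERT INTO tbl_members VALUES (1, 1), (2, 1), (3, 1), (4, 0);
INSERT INTO tbl_paymgt VALUES
    (1, '2024-05-20'),  -- active member with a recent payment
    (2, '2023-08-01'),  -- active member with only an old payment
    (4, '2023-01-01');  -- inactive member
""")

params = {"today": "2024-06-30"}  # fixed "now" so the result is reproducible

# Anti-join: the date filter lives in the ON clause, then keep unmatched rows.
left_join_ids = [row[0] for row in conn.execute("""
    SELECT e.MemberID
    FROM tbl_members e
    LEFT JOIN tbl_paymgt s
           ON e.MemberID = s.memberID
          AND s.paidDate > date(:today, '-6 months')
    WHERE e.MembershipStatus = 1
      AND s.memberID IS NULL
    ORDER BY e.MemberID
""", params)]

# Same question phrased with NOT EXISTS.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT m.MemberID
    FROM tbl_members m
    WHERE m.MembershipStatus = 1
      AND NOT EXISTS (
          SELECT 1
          FROM tbl_paymgt p
          WHERE p.paidDate > date(:today, '-6 months')
            AND p.memberID = m.MemberID)
    ORDER BY m.MemberID
""", params)]

print(left_join_ids, not_exists_ids)
```

Member 3 has no payment rows at all and member 2 has only an old one, so both are returned; the original WHERE-based filter would have dropped member 3 because the outer-joined paidDate is NULL.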

I'd recommend using a CTE to capture the members who have made a payment in the past 6 months, and then selecting the members who are not in that CTE's result set.
Something like:
with payers as (
select MemberID
from tbl_paymgt
where (DATEDIFF(NOW(), paidDate) < 180)
)
Select *
from tbl_members
where memberID not in (select memberID from payers)
and (MembershipStatus=1)

Here is an alternative that uses NOT EXISTS to exclude the members who completed a payment in the past six months.
SELECT *
FROM tbl_members m
WHERE MembershipStatus = 1
AND NOT EXISTS (
SELECT 1
FROM tbl_paymgt p
WHERE DATEDIFF(NOW(), p.paidDate) < 180
AND p.MemberID = m.MemberID);

List values with MaxDate

I'm trying to create a query that shows the items with the MAX date, but I don't know how.
Here is the script and its result:
Select
results.severity As "Count_severity",
tasks.name As task,
results.host,
to_timestamp(results.date)::date
From
tasks Inner Join
results On results.task = tasks.id
Where
tasks.name Like '%CORP 0%' And
results.severity >= 7 And
results.qod > 70
I need to show only the rows with the latest date for each task.
Can you help me?

You seem to be using Postgres (as suggested by the use of casting operator ::). If so - and I follow you correctly - you can use distinct on:
select distinct on(t.name)
r.severity, t.name as task, r.host, to_timestamp(r.date::bigint)::date
from tasks t
inner join results r on r.task = t.id
where t.name like '%CORP 0%' and r.severity >= 7 and r.qod > 70
order by t.name, to_timestamp(r.date::bigint)::date desc
This guarantees one row per task; which row is picked is controlled by the order by clause, so the above returns the row with the greatest date (ignoring the time portion). If there are ties, it is undefined which row is returned. You might want to adapt the order by clause to your exact requirement if it differs from what I understood. Note that LIKE is case-sensitive in Postgres, so the pattern keeps the upper-case spelling from your query.
On the other hand, if you want top ties, then use window functions:
select *
from (
select r.severity, t.name as task, r.host, to_timestamp(r.date::bigint)::date,
rank() over(partition by t.name order by to_timestamp(r.date::bigint)::date desc) rn
from tasks t
inner join results r on r.task = t.id
where t.name like '%CORP 0%' and r.severity >= 7 and r.qod > 70
) t
where rn = 1
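The difference between keeping ties and keeping exactly one row per task can be tried out with the sketch below. It is a simplified illustration, not the original schema: it assumes SQLite (through Python's sqlite3 module) instead of Postgres, stores the date as a plain text column named day, omits the qod filter, and uses invented rows. SQLite has no DISTINCT ON, so ROW_NUMBER() plays that role here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE results (task INTEGER, host TEXT, severity INTEGER, day TEXT);
INSERT INTO tasks VALUES (1, 'CORP 01'), (2, 'CORP 02');
INSERT INTO results VALUES
    (1, 'host-a', 8, '2024-03-01'),
    (1, 'host-b', 9, '2024-03-05'),
    (1, 'host-c', 7, '2024-03-05'),
    (2, 'host-d', 8, '2024-02-10');
""")


def latest_rows(ranking):
    """Return the latest rows per task.

    ranking is "RANK()" to keep ties or "ROW_NUMBER()" for one row per task.
    """
    sql = f"""
    SELECT task, host, day
    FROM (
        SELECT t.name AS task, r.host, r.day,
               {ranking} OVER (PARTITION BY t.name ORDER BY r.day DESC) AS rn
        FROM tasks t
        JOIN results r ON r.task = t.id
        WHERE t.name LIKE '%CORP 0%' AND r.severity >= 7
    ) ranked
    WHERE rn = 1
    ORDER BY task, host
    """
    return conn.execute(sql).fetchall()


with_ties = latest_rows("RANK()")
one_per_task = latest_rows("ROW_NUMBER()")
print(with_ties)
print(one_per_task)
```

With RANK(), both hosts that share the latest date for 'CORP 01' are returned; with ROW_NUMBER(), only one of them is, and which one is arbitrary unless the ORDER BY gets a tie-breaker.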

Query (or algorithm?) to find cheapest overlapping flights from two distinct locations to one shared location

Say we have a dataset of 500,000 flights from Los Angeles to 80 cities in Europe and back, and from Saint Petersburg to the same 80 cities in Europe and back. We want to find four flights such that:
the flights are from LA to city X, from city X back to LA, from St P to city X, and from city X back to St P
all four flights fall within a time window of 4 days
the combined price of the four flights is the cheapest possible
city X can be any of the 80 cities; we want the cheapest combination for each of them, giving a list of 80 combinations
The data is stored in BigQuery and I've written an SQL query, but it has 3 joins and I assume that under the hood it can have a complexity of O(n^4), because the query didn't finish in 30 minutes and I had to abort it.
Here's the schema for the table:
See the query below:
select * from (
select in_led.`from` as city,
in_led.price + out_led.price + in_lax.price + out_lax.price as total_price,
out_led.carrier as out_led_carrier,
out_led.departure as out_led_departure,
in_led.departure as in_led_date,
in_led.carrier as in_led_carrier,
out_lax.carrier as out_lax_carrier,
out_lax.departure as out_lax_departure,
in_lax.departure as in_lax_date,
in_lax.carrier as in_lax_carrier,
row_number() over(partition by in_led.`from` order by in_led.price + out_led.price + in_lax.price + out_lax.price) as rn
from skyscanner.quotes as in_led
join skyscanner.quotes as out_led on out_led.`to` = in_led.`from`
join skyscanner.quotes as out_lax on out_lax.`to` = in_led.`from`
join skyscanner.quotes as in_lax on in_lax.`from` = in_led.`from`
where in_led.`to` = "LED"
and out_led.`from` = "LED"
and in_lax.`to` in ("LAX", "LAXA")
and out_lax.`from` in ("LAX", "LAXA")
and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) < 4
and DATE_DIFF(DATE(in_led.departure), DATE(out_led.departure), DAY) > 0
and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) < 4
and DATE_DIFF(DATE(in_lax.departure), DATE(out_lax.departure), DAY) > 0
order by total_price
)
where rn=1
Additional details:
all flights' departure dates fall in a 120 days window
Questions:
Is there a way to optimize this query for better performance?
How to properly classify this problem? The brute force solution is way too slow, but I'm failing to see what type of problem this is. Certainly doesn't look like something for graphs, kinda feels like sorting the table a couple of times by different fields with a stable sort might help, but still seems sub-optimal.

The following is for BigQuery Standard SQL.
"The brute force solution is way too slow, but I'm failing to see what type of problem this is."
So I would like to see solutions other than brute force, if anyone here has ideas.
#standardSQL
WITH temp AS (
SELECT DISTINCT *, UNIX_DATE(DATE(departure)) AS dep FROM `skyscanner.quotes`
), round_trips AS (
SELECT t1.from, t1.to, t2.to AS back, t1.price, t1.departure, t1.dep first_day, t1.carrier, t2.departure AS departure2, t2.dep AS last_day, t2.price AS price2, t2.carrier AS carrier2,
FROM temp t1
JOIN temp t2
ON t1.to = t2.from
AND t1.from = t2.to
AND t2.dep BETWEEN t1.dep + 1 AND t1.dep + 3
WHERE t1.from IN ('LAX', 'LED')
)
SELECT cityX, total_price,
( SELECT COUNT(1)
FROM UNNEST(GENERATE_ARRAY(t1.first_day, t1.last_day)) day
JOIN UNNEST(GENERATE_ARRAY(t2.first_day, t2.last_day)) day
USING(day)
) overlap_days_in_cityX,
(SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
FROM UNNEST([t1])) AS LAX_CityX_LAX,
(SELECT AS STRUCT departure, price, carrier, departure2, price2, carrier2
FROM UNNEST([t2])) AS LED_CityX_LED
FROM (
SELECT AS VALUE ARRAY_AGG(t ORDER BY total_price LIMIT 1)[OFFSET(0)]
FROM (
SELECT t1.to cityX, t1.price + t1.price2 + t2.price + t2.price2 AS total_price, t1, t2
FROM round_trips t1
JOIN round_trips t2
ON t1.to = t2.to
AND t1.from < t2.from
AND t1.departure2 > t2.departure
AND t1.departure < t2.departure2
) t
GROUP BY cityX
)
ORDER BY overlap_days_in_cityX DESC, total_price
with output (just top 10 out of total 60 rows)
Brief explanation:
temp CTE: deduplicates the data and introduces the dep field (the number of days since the epoch) to avoid costly TIMESTAMP functions
round_trips CTE: identifies all round-trip candidates whose return flight departs one to three days after the outbound flight
the inner join then identifies the LAX and LED round trips that overlap
for each cityX, the cheapest combination is taken
the final output adds a calculation of the overlapping days in cityX and trims the output a little while keeping information about all the flights involved
Note: in your data, the duration field is all zeros, so it is not used; if it were populated, it would be easy to add to the logic.
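The structure described above (deduplicate, build round trips, pair overlapping round trips, keep the cheapest per city) can be sketched on a toy dataset. This is a simplified illustration, not the BigQuery query itself: it assumes SQLite through Python's sqlite3 module, a hypothetical quotes table with origin, dest, dep_day and price columns (dep_day being an integer day number), and invented prices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (origin TEXT, dest TEXT, dep_day INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?, ?)",
    [
        ("LAX", "PAR", 10, 400), ("LAX", "PAR", 10, 400),  # duplicate row
        ("LAX", "PAR", 10, 500),                            # pricier alternative
        ("PAR", "LAX", 12, 420),
        ("LED", "PAR", 11, 150), ("PAR", "LED", 13, 160),
        ("LAX", "ROM", 10, 300), ("ROM", "LAX", 13, 310),
        ("LED", "ROM", 20, 100), ("ROM", "LED", 22, 100),  # no overlap with LAX trip
        ("LED", "ROM", 11, 200), ("ROM", "LED", 12, 210),
    ],
)

CHEAPEST_PER_CITY = """
WITH dedup AS (
    SELECT DISTINCT origin, dest, dep_day, price FROM quotes
),
round_trips AS (
    -- outbound flight plus a return that departs 1 to 3 days later
    SELECT o.origin AS home, o.dest AS city,
           o.dep_day AS out_day, b.dep_day AS back_day,
           o.price + b.price AS price
    FROM dedup o
    JOIN dedup b
      ON o.dest = b.origin
     AND o.origin = b.dest
     AND b.dep_day BETWEEN o.dep_day + 1 AND o.dep_day + 3
    WHERE o.origin IN ('LAX', 'LED')
)
SELECT a.city, MIN(a.price + b.price) AS total_price
FROM round_trips a
JOIN round_trips b
  ON a.city = b.city
 AND a.home < b.home           -- pair each LAX trip with LED trips once
 AND a.back_day > b.out_day    -- the two stays overlap
 AND a.out_day < b.back_day
GROUP BY a.city
ORDER BY a.city
"""

cheapest = conn.execute(CHEAPEST_PER_CITY).fetchall()
print(cheapest)
```

Joining two small round-trip sets per city, instead of four copies of the raw table, is what keeps the intermediate result manageable.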

"The query didn't finish in 30 minutes and I had to abort it.
Is there a way to optimize this query for better performance?"
My generic recommendation is to always learn the data, profile it, and clean it before actually coding. In your example, the data you shared has 469,352 rows, full of duplicates. After removing the duplicates, only 14,867 rows remain. I then ran your original query against the cleaned data, and it took only 97 seconds to return a result. Obviously this does not mean the code itself cannot be optimized, but at least it addresses your issue that the "query didn't finish in 30 minutes and I had to abort it".

SQL aggregate functions and sorting

I am still new to SQL and getting my head around the whole sub-query aggregation to display some results and was looking for some advice:
The tables might look something like:
Customer: (custID, name, address)
Account: (accountID, reward_balance)
Shop: (shopID, name, address)
Relational tables:
Holds (custID*, accountID*)
With (accountID*, shopID*)
How can I find the store that has the least reward_balance?
(The customer info is not required at this point)
I tried:
SELECT accountID AS ACCOUNT_ID, shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM Account, Shop, With
WHERE With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID
ORDER BY MIN(reward_balance);
This works in a way that is not intended:
ACCOUNT_ID | SHOP_ID | LOWEST_BALANCE
1 | 1 | 10
2 | 2 | 40
3 | 3 | 100
4 | 4 | 1000
5 | 4 | 5000
As you can see Shop_ID 4 actually has a balance of 6000 (1000+5000) as there are two customers registered with it. I think I need to SUM the lowest balance of the shops based on their balance and display it from low-high.
I have been trying to aggregate the data prior to display but this is where I come unstuck:
SELECT shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM (SELECT accountID, shopID, SUM(reward_balance)
FROM Account, Shop, With
WHERE
With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID;
When I run something like this statement I get an invalid identifier error.
Error at Command Line : 1 Column : 24
Error report -
SQL Error: ORA-00904: "REWARD_BALANCE": invalid identifier
00904. 00000 - "%s: invalid identifier"
So I figured I might have my joining condition incorrect and the aggregate sorting incorrect, and would really appreciate any general advice.
Thanks for the lengthy read!

Approach this problem one step at a time.
We're going to assume (and we should probably check this) that "least reward_balance" refers to the total of all reward_balance values associated with a shop, and that we're not just looking for the shop that has the lowest individual reward balance.
First, get all of the individual reward_balance values for each shop. It looks like the query needs to involve three tables...
SELECT s.shop_id
, a.reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
That will get us the detail rows: every shop along with the individual reward_balance amounts associated with the shop, if there are any. (We're using outer joins for this query because we don't see any guarantee that a shop is going to be related to at least one account. Even if it's true for this use case, it's not always true in the more general case.)
Once we have the individual amounts, the next step is to total them for each shop. We can do that using a GROUP BY clause and a SUM() aggregate.
SELECT s.shop_id
, SUM(a.reward_balance) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
At this point, with MySQL we could add an ORDER BY clause to arrange the rows in ascending order of tot_reward_balance, and add a LIMIT 1 clause if we only want to return a single row. We can also handle the case when tot_reward_balance is NULL, assigning a zero in place of the NULL.
SELECT s.shop_id
, IFNULL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
ORDER BY tot_reward_balance ASC, s.shop_id ASC
LIMIT 1
If there are two (or more) shops with the same least value of tot_reward_balance, this query returns only one of those shops.
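Oracle doesn't have a LIMIT clause like MySQL's, but we can get an equivalent result using an analytic function (MySQL supports these too, from version 8.0). The row number is computed in an inline view and filtered in an outer WHERE clause, because its alias cannot be referenced in a HAVING or WHERE clause at the same query level. We also replace the MySQL IFNULL() function with the Oracle equivalent NVL() function...
SELECT r.shop_id
, r.tot_reward_balance
FROM (
SELECT v.shop_id
, v.tot_reward_balance
, ROW_NUMBER() OVER (ORDER BY v.tot_reward_balance ASC, v.shop_id ASC) AS rn
FROM (
SELECT s.shop_id
, NVL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM shop s
LEFT
JOIN with w -- WITH is a reserved word in Oracle; quote or rename this table
ON w.shop_id = s.shop_id
LEFT
JOIN account a
ON a.account_id = w.account_id
GROUP BY s.shop_id
) v
) r
WHERE r.rn = 1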
Like the MySQL query, this returns at most one row, even when two or more shops have the same "least" total of reward_balance.
If we want to return all of the shops that have the lowest tot_reward_balance, we need to take a slightly different approach.
The best approach to building queries is stepwise refinement; in this case, start by getting all of the individual reward_balance values for each shop. The next step is to aggregate the individual reward_balance values into a total. The final step is to pick out the row(s) with the lowest total reward_balance.
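One possible shape for that "slightly different approach" is to swap ROW_NUMBER() for RANK(), so that every shop tied on the lowest total gets rank 1. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, uses COALESCE() in place of IFNULL()/NVL(), renames the link table to the hypothetical shop_account (WITH being a reserved word), and extends the question's sample balances with two invented shops that have no accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop (shop_id INTEGER PRIMARY KEY);
CREATE TABLE account (account_id INTEGER PRIMARY KEY, reward_balance INTEGER);
CREATE TABLE shop_account (account_id INTEGER, shop_id INTEGER);
INSERT INTO shop VALUES (1), (2), (3), (4), (5), (6);  -- shops 5 and 6 have no accounts
INSERT INTO account VALUES (1, 10), (2, 40), (3, 100), (4, 1000), (5, 5000);
INSERT INTO shop_account VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 4);
""")

# Steps 1 and 2: detail rows via outer joins, then a total per shop.
TOTALS = """
SELECT s.shop_id, COALESCE(SUM(a.reward_balance), 0) AS tot_reward_balance
FROM shop s
LEFT JOIN shop_account w ON w.shop_id = s.shop_id
LEFT JOIN account a ON a.account_id = w.account_id
GROUP BY s.shop_id
"""

# Step 3: rank the totals and keep every shop tied for the lowest one.
LOWEST_WITH_TIES = f"""
SELECT shop_id, tot_reward_balance
FROM (
    SELECT v.shop_id, v.tot_reward_balance,
           RANK() OVER (ORDER BY v.tot_reward_balance) AS rk
    FROM ({TOTALS}) v
) ranked
WHERE rk = 1
ORDER BY shop_id
"""

totals = conn.execute(TOTALS + " ORDER BY s.shop_id").fetchall()
lowest = conn.execute(LOWEST_WITH_TIES).fetchall()
print(totals)
print(lowest)
```

Shop 4 totals 6000 (1000 + 5000), as the question expected, and the two shops without accounts tie on a total of 0, so both are returned.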

In SQL Server, you can try using a CTE:
;with cte_minvalue as
(
select rank() over (order by Sum_Balance) as RowRank,
ShopId,
Sum_Balance
from (SELECT Shop.shopID, SUM(reward_balance) AS Sum_Balance
FROM
With
JOIN Shop ON With.ShopId = Shop.ShopId
JOIN Account ON With.AccountId = Account.AccountId
GROUP BY
Shop.shopID)ShopSum
)
select ShopId, Sum_Balance from cte_minvalue where RowRank = 1

ORA-00904 "invalid identifier" but identifier exists in query

I'm working in a fault-reporting Oracle database, trying to get fault information out of it.
The main table I'm querying is Incident, which includes incident information. Each record in Incident may have any number of records in the WorkOrder table (or none) and each record in WorkOrder may have any number of records in the WorkLog table (or none).
What I am trying to do at this point is, for each record in Incident, find the WorkLog with the minimum value in the field MXRONSITE, and, for that worklog, return the MXRONSITE time and the REPORTDATE from the work order. I accomplished this using a MIN subquery, but it turned out that several worklogs could have the same MXRONSITE time, so I was pulling back more records than I wanted. I tried to create a subsubquery for it, but it now says I have an invalid identifier (ORA-00904) for WOL1.WONUM in the WHERE line, even though that identifier is in use elsewhere.
Any help is appreciated. Note that there is other stuff in the query, but the rest of the query works in isolation, and this bit doesn't work in the full query or on its own.
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM)
ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE (WL1.WORKLOGID IN
(SELECT MIN(WL3.WORKLOGID)
FROM (SELECT MIN(WL3.MXRONSITE), WL3.WORKLOGID
FROM Maximo.Worklog WL3 WHERE WOL1.WONUM = WL3.RECORDKEY))
or WL1.WORKLOGID is null)
To clarify, what I want is:
For each fault in Incident,
the earliest MXRONSITE from the Worklog table (if such a value exists),
For that worklog, information from the associated record from the WorkOrder table.
This is complicated by Incident records having multiple work orders, and work orders having multiple work logs, which may have the same MXRONSITE time.
After some trials, I have found an (almost) working solution:
WITH WLONSITE as (
SELECT
MIN(WLW.MXRONSITE) as "ONSITE",
WLWOW.ORIGRECORDID as "TICKETID",
WLWOW.WONUM as "WONUM"
FROM
MAXIMO.WORKLOG WLW
INNER JOIN
MAXIMO.WORKORDER WLWOW
ON
WLW.RECORDKEY = WLWOW.WONUM
WHERE
WLWOW.ORIGRECORDCLASS = 'INCIDENT'
GROUP BY
WLWOW.ORIGRECORDID, WLWOW.WONUM
)
select
incident.ticketid,
wlonsite.onsite,
wlonsite.wonum
from
maximo.incident
LEFT JOIN WLONSITE
ON WLONSITE.TICKETID = Incident.TICKETID
WHERE
(WLONSITE.ONSITE is null or WLONSITE.ONSITE = (SELECT MIN(WLONSITE.ONSITE) FROM WLONSITE WHERE WLONSITE.TICKETID = Incident.TICKETID AND ROWNUM=1))
AND Incident.AFFECTEDDATE >= TO_DATE ('01/12/2015', 'DD/MM/YYYY')
This however is significantly slower, and also still not quite right, as it turns out a single Incident can have multiple Work Orders with the same ONSITE time (aaargh!).
As requested, here is a sample input, and what I want to get from it (apologies for the formatting). Note that while TICKETID and WONUM are primary keys, they are strings rather than integers. WORKLOGID is an integer.
Incident table:
TICKETID / Description / FieldX
1 / WORD1 / S
2 / WORD2 / P
3 / WORDX /
4 / / Q
Work order table:
WONUM / ORIGRECORDID / REPORTDATE
11 / 1 / 2015-01-01
12 / 2 / 2015-01-01
13 / 2 / 2015-02-04
14 / 3 / 2015-04-05
Worklog table:
WORKLOGID / RECORDKEY / MXRONSITE
101 / 11 / 2015-01-05
102 / 12 / 2015-01-04
103 / 12 /
104 / 12 / 2015-02-05
105 / 13 /
Output:
TICKETID / WONUM / WORKLOGID
1 / 11 / 101
2 / 12 / 102
3 / /
4 / /
(Worklog 101 linked to TICKETID 1, has non-null MXRONSITE, and is from work order 11)
(Worklogs 102-105 linked to TICKETID 2, of which 102 has lowest MXRONSITE, and is work order 12)
(No work logs associated with faults 3 or 4, so work order and worklog fields are null)
Post Christmas attack!
I have found a solution which works:
The method I found was to use multiple WITH queries, as follows:
WITH WLMINL AS (
SELECT
RECORDKEY, MXRONSITE, MIN(WORKLOGID) AS "WORKLOG"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY, MXRONSITE
),
WLMIND AS (
SELECT
RECORDKEY, MIN(MXRONSITE) AS "MXRONSITE"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY
),
WLMIN AS (
SELECT
WLMIND.RECORDKEY AS "WONUM", WLMIND.MXRONSITE AS "ONSITE", WLMINL.WORKLOG AS "WORKLOGID"
FROM
WLMIND
INNER JOIN
WLMINL
ON
WLMIND.RECORDKEY = WLMINL.RECORDKEY AND WLMIND.MXRONSITE = WLMINL.MXRONSITE
)
Thus for each work order finding the first date, then for each work order and date finding the lowest worklogid, then joining the two tables. This is then repeated at a higher level to find the data by incident.
However this method does not work in a reasonable time, so while it may be suitable for smaller databases it's no good for the behemoths I'm working with.

I would do this with the row_number function:
select ticketid, case when worklogid is not null then reportdate end d1, mxronsite d2
from (
select i.ticketid, wo.reportdate, wl.mxronsite, wo.wonum, wl.worklogid,
row_number() over (partition by i.ticketid
order by wl.mxronsite, wo.reportdate) rn
from incident i
left join workorder wo on wo.origrecordid = i.ticketid
and wo.origrecordclass = 'INCIDENT'
left join worklog wl on wl.recordkey = wo.wonum )
where rn = 1 order by ticketid
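The row_number approach can be checked against the sample data from the question. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than Oracle, so the Oracle default of sorting NULLs last is spelled out with "mxronsite IS NULL" as the first sort key; it assumes every sample work order has ORIGRECORDCLASS 'INCIDENT'; it adds worklogid as a tie-breaker; and it returns wonum and worklogid to match the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident (ticketid TEXT PRIMARY KEY);
CREATE TABLE workorder (wonum TEXT PRIMARY KEY, origrecordid TEXT,
                        origrecordclass TEXT, reportdate TEXT);
CREATE TABLE worklog (worklogid INTEGER PRIMARY KEY, recordkey TEXT, mxronsite TEXT);
INSERT INTO incident VALUES ('1'), ('2'), ('3'), ('4');
INSERT INTO workorder VALUES
    ('11', '1', 'INCIDENT', '2015-01-01'),
    ('12', '2', 'INCIDENT', '2015-01-01'),
    ('13', '2', 'INCIDENT', '2015-02-04'),
    ('14', '3', 'INCIDENT', '2015-04-05');
INSERT INTO worklog VALUES
    (101, '11', '2015-01-05'),
    (102, '12', '2015-01-04'),
    (103, '12', NULL),
    (104, '12', '2015-02-05'),
    (105, '13', NULL);
""")

FIRST_ONSITE = """
SELECT ticketid,
       CASE WHEN worklogid IS NOT NULL THEN wonum END AS wonum,
       worklogid
FROM (
    SELECT i.ticketid, wo.wonum, wl.worklogid,
           ROW_NUMBER() OVER (
               PARTITION BY i.ticketid
               -- NULL onsite times last, then earliest time, then lowest id
               ORDER BY wl.mxronsite IS NULL, wl.mxronsite, wl.worklogid
           ) AS rn
    FROM incident i
    LEFT JOIN workorder wo
           ON wo.origrecordid = i.ticketid
          AND wo.origrecordclass = 'INCIDENT'
    LEFT JOIN worklog wl ON wl.recordkey = wo.wonum
) ranked
WHERE rn = 1
ORDER BY ticketid
"""

first_onsite = conn.execute(FIRST_ONSITE).fetchall()
print(first_onsite)
```

Because each incident is ranked in a single pass, ties on the onsite time cannot produce duplicate rows: exactly one row per ticket survives the rn = 1 filter.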

When you nest subqueries in Oracle, the innermost one cannot reference columns that belong to a query block two or more levels above it; in your statement, WOL1 is not accessible in the innermost subquery. (There is also a GROUP BY clause missing, by the way.)
This might work (not exactly sure what output you expect, but try it):
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (
Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM
) ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE WL1.WORKLOGID =
( SELECT MIN(WL3.WORKLOGID)
FROM Maximo.WorkOrder WOL3
LEFT JOIN Maximo.Worklog WL3
ON WL3.RECORDKEY = WOL3.WONUM
WHERE WOL3.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL3.MXRONSITE IS NOT NULL
)
OR WL1.WORKLOGID IS NULL AND NOT EXISTS
( SELECT 1 -- an aggregate without GROUP BY always returns one row, which would make NOT EXISTS always false
FROM Maximo.WorkOrder WOL4
LEFT JOIN Maximo.Worklog WL4
ON WL4.RECORDKEY = WOL4.WONUM
WHERE WOL4.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL4.MXRONSITE IS NOT NULL )

I may not have the details right on what you're trying to do... if you have some sample input and desired output, that would be a big help.
That said, I think an analytic function would help a lot, not only in getting the output but in organizing the code. Here is an example of how the max analytic function in a subquery could be used.
Again, the details on the join may be off -- if you can furnish some sample input and output, I'll bet someone can get to where you're trying to go:
with wo as (
select
wonum, origrecordclass, origrecordid, reportdate,
max (reportdate) over (partition by origrecordid) as max_date
from Maximo.workorder
where origrecordclass = 'INCIDENT'
),
logs as (
select
worklogid, mxronsite, recordkey,
max (mxronsite) over (partition by recordkey) as max_mx
from Maximo.worklog
)
select
i.ticketid,
l.mxronsite as "Date_First_Onsite",
wo.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo on
wo.origrecordid = i.ticketid and
wo.reportdate = wo.max_date
left join logs l on
wo.wonum = l.recordkey and
l.mxronsite = l.max_mx
-- edit --
Based on your sample input and desired output, this appears to give the desired result. It does do somewhat of an explosion in the subquery, but hopefully the efficiency of the analytic functions will dampen that. They are typically much faster, compared to using group by:
with wo_logs as (
select
wo.wonum, wo.origrecordclass, wo.origrecordid, wo.reportdate,
l.worklogid, l.mxronsite, l.recordkey,
max (reportdate) over (partition by origrecordid) as max_date,
min (mxronsite) over (partition by recordkey) as min_mx
from
Maximo.workorder wo
left join Maximo.worklog l on wo.wonum = l.recordkey
where wo.origrecordclass = 'INCIDENT'
)
select
i.ticketid, wl.wonum, wl.worklogid,
wl.mxronsite as "Date_First_Onsite",
wl.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo_logs wl on
i.ticketid = wl.origrecordid and
wl.mxronsite = wl.min_mx
order by 1
