Finding a better way to get the most recent status - sql

I have this way that I am checking the most recent status by grabbing the related detail record with the max(created_on) date.
It works, but it just seems really clumsy, and I am hoping to find a better way.
select rd.stat_code from rec r, rec_detl rd
where r.id = rd.rec_id
and r.id = 13455478
and rd.id in (
select id from rec_detl rd1
where rd1.rec_id = r.id
and rd1.created_on = (
select max(created_on) from rec_detl rd2
where rd2.rec_id = r.id
));
In that example, the rec_detl table has a stat_code column indicating the status. The rec_detl table has a foreign key that points to the rec table, so each row from the rec table corresponds to multiple rec_detl records.
Basically, a new rec_detl is inserted each time the rec record has its status updated.
So my questions are "Is there some better and less clumsy way of achieving the same thing and grabbing the most recent status?" and "other than the fact this way is sort of clumsy and ugly, is there anything wrong with this approach?"

For a single record you may use fetch first 1 rows only addition, avaliable since 12c.
select *
from rec_detl
where rec_id = 13455478
order by created_on desc
fetch first 1 rows only
As long as you do not use any column from rec table, there's no need to join anything.
If you need to have rec columns with rec_detl columns for small amount of rows and with index on rec_detl.rec_id column, this may be adapted to lateral join. Small is because of this query introduces nested loop lookups.
select *
from rec
left join lateral(
select *
from rec_det
where rec.id = rec_det.rec_id
order by created_on desc
fetch first 1 rows only
) q
on 1 = 1
For mass processing it would be better to use keep dense_rank first, because if will use general join (it will be hash for large datasets in general). But it will require to list all the columns explicitly either aggregated or grouped by:
select
rec.id
, max(rec.val) as val
, max(rec_det.status) keep(dense_rank first order by created_on desc) as last_status
from rec
left join rec_det
on rec.id = rec_det.rec_id
group by
rec.id
db<>fiddle here

Related

Limit number of rows fetching in a left outer join in Oracle

I'm going to create a data model in oracle fusion applications. I need to create column End_date based on two tables in the query. So I used two methods.
Using a subquery:
SELECT *
FROM (SELECT projects_A.end_date
FROM projects_A, projects_B
WHERE projects_A.p_id = projects_B.p_id
AND rownum = 1)
Using a LEFT OUTER JOIN:
SELECT projects_A.end_date
FROM projects_A
LEFT JOIN projects_B
ON projects_A.p_id = projects_B.p_id
WHERE rownum = 1
Here when I used a subquery, the query returns the results as expected. But when I use left outer join with WHERE rownum = 1 the result is zero. Without WHERE rownum = 1 it retrieves all the results. But I want only the first result. So how can I do that using left outer join? Thank you.
Looks like you want to bring a non-null end_date value(So, add NULLS LAST), but the sorting order is not determined yet(you might add a DESC to the end of the ORDER BY clause depending on this fact ), and use FETCH clause(the DB version is 12c+ as understood from the comment) with ONLY option to exclude ties as yo want to bring only single row.
So, you can use the following query :
SELECT A.end_date
FROM projects_A A
LEFT JOIN projects_B B
ON A.p_id = B.p_id
ORDER BY end_date NULLS LAST
FETCH FIRST 1 ROW ONLY

Bigquery SQL code to pull earliest contact

I have a copy of our salesforce data in bigquery, I'm trying to join the contact table together with the account table.
I want to return every account in the dataset but I only want the contact that was created first for each account.
I've gone around and around in circles today googling and trying to cobble a query together but all roads either lead to no accounts, a single account or loads of contacts per account (ignoring the earliest requirement).
Here's the latest query. that produces no results. I think I'm nearly there but still struggling. any help would be most appreciated.
SELECT distinct
c.accountid as Acct_id
,a.id as a_Acct_ID
,c.id as Cont_ID
,a.id AS a_CONT_ID
,c.email
,c.createddate
FROM `sfdcaccounttable` a
INNER JOIN `sfdccontacttable` c
ON c.accountid = a.id
INNER JOIN
(SELECT a2.id, c2.accountid, c2.createddate AS MINCREATEDDATE
FROM `sfdccontacttable` c2
INNER JOIN `sfdcaccounttable` a2 ON a2.id = c2.accountid
GROUP BY 1,2,3
ORDER BY c2.createddate asc LIMIT 1) c3
ON c.id = c3.id
ORDER BY a.id asc
LIMIT 10
The solution shared above is very BigQuery specific: it does have some quirks you need to work around like the memory error you got.
I once answered a similar question here that is more portable and easier to maintain.
Essentially you need to create a smaller table(even better to make it a view) with the ID and it's first transaction. It's similar to what you shared by slightly different as you need to group ONLY in the topmost query.
It looks something like this
select
# contact ids that are first time contacts
b.id as cont_id,
b.accountid
from `sfdccontacttable` as b inner join
( select accountid,
min(createddate) as first_tx_time
FROM `sfdccontacttable`
group by 1) as a on (a.accountid = b.accountid and b.createddate = a.first_tx_time)
group by 1, 2
You need to do it this way because otherwise you can end up with multiple IDs per account (if there are any other dimensions associated with it). This way also it is kinda future proof as you can have multiple dimensions added to the underlying tables without affecting the result and also you can use a where clause in the inner query to define a "valid" contact and so on. You can then save that as a view and simply reference it in any subquery or join operation
Setup a view/subquery for client_first or client_last
as:
SELECT * except(_rank) from (
select rank() over (partition by accountid order by createddate ASC) as _rank,
*
FROM `prj.dataset.sfdccontacttable`
) where _rank=1
basically it uses a Window function to number the rows, and return the first row, using ASC that's first client, using DESC that's last client entry.
You can do that same for accounts as well, then you can join two simple, as exactly 1 record will be for each entity.
UPDATE
You could also try using ARRAY_AGG which has less memory footprint.
#standardSQL
SELECT e.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.createddate ASC LIMIT 1
)[OFFSET(0)] e
FROM `dataset.sfdccontacttable` t
GROUP BY t.accountid
)

SQL grouping. How to select row with the highest column value when joined. No CTEs please

I've been banging my head against the wall for something that I think should be simple but just cant get to work.
I'm trying to retrieve the row with the highest multi_flag value when I join table A and table B but I can't seem to get the SQL right because it returns all the rows rather than the one with the highest multi_flag value.
Here are my tables...
Table A
Table B
This is almost my desired result but only if I leave out the value_id row
SELECT CATALOG, VENDOR_CODE, INVLINK, NAME_ID, MAX(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE AS A
INNER JOIN TBLATTRIBUTE_VALUE AS B
ON A.VALUE_ID = B.VALUE_ID
GROUP BY CATALOG, VENDOR_CODE, INVLINK, NAME_ID
ORDER BY CATALOG DESC
This is close to what I want to retreive but not quite notice how it returns unique name_id and the highest multi_flag but I also need the value_id that belongs to such multi_flag / name_id grouping...
If I include the value_id in my SQL statement then it returns all rows and is no longer grouped
Notic ein the results below how it no longer returns the row for the highest multi_flag and how all the different values for name_id (Ex. name_id 1) are also returned
You can choose to use a sub-query, derived table or CTE to solve this problem. Performance will be depending on the amount of data you are querying. To achieve your goal of getting the max multiflag you must first get the max value based on the grouping you want to achieve this you can use a CTE or sub query. The below CTE will give the max multi_flag by value that you can use to get the max multi_flag and then you can use that to join back to your other tables. I have three joins in this example but this can be reduce and as far a performance it may be better to use a subquery but you want know until you get the se the actual execution plans side by side.
;with highest_multi_flag as
(
select value_id, max(multi_flag) AS multiflag
FROM TBLINVENT_ATTRIBUTE
group by value_id
)
select A.CATALOG, a.VENDOR_CODE, a.INVLINK, b.NAME_ID,m.multiflag
from highest_multi_flag m
inner join TBLINVENT_ATTRIBUTE AS A on a.VALUE_ID =b. m.VALUE_ID
INNER JOIN TBLATTRIBUTE_VALUE AS B ON m.VALUE_ID = B.VALUE
You can use Lateral too, its an other solution
SELECT
A.CATALOG, A.VENDOR_CODE, A.INVLINK, B.NAME_ID, M.maxmultiflag
FROM TBLINVENT_ATTRIBUTE AS A
inner join lateral
(
select max(B.multi_flag) as maxmultiflag from TBLINVENT_ATTRIBUTE C
where A.VALUE_ID = C.VALUE_ID
) M on 1=1
INNER JOIN TBLATTRIBUTE_VALUE AS B ON M.maxmultiflag = B.VALUE

Inner Join a Table to Itself

I have a table that uses two identifying columns, let's call them id and userid. ID is unique in every record, and userid is unique to the user but is in many records.
What I need to do is get a record for the User by userid and then join that record to the first record we have for the user. The logic of the query is as follows:
SELECT v1.id, MIN(v2.id) AS entryid, v1.userid
FROM views v1
INNER JOIN views v2
ON v1.userid = v2.userid
I'm hoping that I don't have to join the table to a subquery that handles the min() piece of the code as that seems to be quite slow.
I guess (it's not entirely clear) you want to find for every user, the rows of the table that have minimum id, so one row per user.
In that case, you an use a subquery (a derived table) and join it to the table:
SELECT v.*
FROM views AS v
JOIN
( SELECT userid, MIN(id) AS entryid
FROM views
GROUP BY userid
) AS vm
ON vm.userid = v.userid
AND vm.entryid = v.id ;
The above can also be written using a Common Table Expression (CTE), if you like them:
; WITH vm AS
( SELECT userid, MIN(id) AS entryid
FROM views
GROUP BY userid
)
SELECT v.*
FROM views AS v
JOIN vm
ON vm.userid = v.userid
AND vm.entryid = v.id ;
Both would be quite efficient with an index on (userid, id).
With SQL-Server, you could write this using the ROW_NUMBER() window function:
; WITH viewsRN AS
( SELECT *
, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id) AS rn
FROM views
)
SELECT * --- skipping the "rn" column
FROM viewsRN
WHERE rn = 1 ;
Well, to use the MIN function along with non-aggregate columns, you'd have to group the statement. That's possible with the query you have... (EDIT based on additional info)
SELECT MIN(v2.id) AS entryid, v1.id, v1.userid
FROM views v1
INNER JOIN views v2
ON v1.userid = v2.userid
GROUP BY v1.id, v1.userid
... however if this is just a simple example and you're looking to pull more data with this query, it quickly becomes an unfeasible solution.
What you seem to want is a list of all the user data in this view, with a link on each row leading back to the "first" record that exists for the same user. The above query will get you what you want, but there are much easier ways to determine the first record for each user:
SELECT v1.id, v1.userid
FROM views v1
ORDER BY v1.userid, v1.id
The first record for each unique user is your "entry point". I think I understand why you want to do it the way you specified, and the first query I gave will be reasonably performant, but you'll have to consider whether not having to use the order by clause to get the correct answer is worth it.
edit-1: as pointed out in the comments, this solution also uses a sub-query. However, it does not use aggregate functions, which (depending on the database) might have a huge impact on the performance.
Can achieve without sub-query (see below).
Obviously, an index on views.userid is of tremedous value for the performance.
SELECT v1.*
FROM views v1
WHERE v1.id = (
SELECT TOP 1 v2.id
FROM views v2
WHERE v2.userid = v1.userid
ORDER BY v2.id ASC
)

How can you add 2 joins in a subquery?

I am trying to get information from 3 tables in my database. I am trying to get 4 fields. 'kioskid', 'kioskhours', 'videotime', 'sessiontime'. In order to do this, i am trying a join in a subquery. This is what I have so far:
SELECT k.kioskid, k.hours, v.time, s.time
FROM `nsixty_kiosks` as k
LEFT JOIN (SELECT time
FROM `nsixty_videos`
ORDER BY videoid) as v
ON kioskid = k.kioskid LEFT JOIN
(SELECT kioskid, time
FROM `sessions`
ORDER BY pingid desc LIMIT 1) as s ON s.kioskid = k.kioskid
WHERE hours is NOT NULL
When I run this query, it works but it shows every row instead of just showing the last row of each kiosk id. Which is meant to show based on the line 'ORDER BY pingid desc LIMIT 1'.
Any body have some ideas?
Instead of joining to s, you can use a correlated subquery:
SELECT k.kioskid,
k.hours,
v.time,
( SELECT time
FROM sessions
WHERE sessions.kioskid = k.kioskid
ORDER
BY pingid DESC
LIMIT 1
)
FROM nsixty_kiosks AS k
LEFT
JOIN ( SELECT time
FROM `nsixty_videos`
ORDER BY videoid
) AS v
ON kioskid = k.kioskid
WHERE hours IS NOT NULL
;
N.B. I didn't fix your LEFT JOIN (...) AS v, because I don't understand what it's trying to do, but it too is broken; the ON clause doesn't refer to any of its columns, and there's no point in having an ORDER BY in a subquery unless you also have a LIMIT or whatnot in there.
Well, your join on the 'v' subquery doesn't actually reference the 'v' subquery, nor does the 'v' subquery even contain a kioskid field to JOIN on, so that's undoubtedly part of the problem.
To go much further we'd need to see schema and sample data.