mystery on BigQuery views - sql

Here is my mystery. On the console, when I compute this query, it works perfectly well:
SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy);
Then I save it as a view called dataset.myview, and run:
SELECT * FROM dataset.myview LIMIT 1000
But this raises the following error:
SELECT query which references non constant fields or uses aggregation
functions or has one or more of WHERE, OMIT IF, GROUP BY, ORDER BY
clauses must have FROM clause.
Nevertheless, when I try SELECT * FROM dataset.myview, i.e. without the LIMIT, it works!
And in fact, when I run my full query with the LIMIT at the bottom, it also raises the error:
SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000;
Nevertheless, when I add an internal ORDER BY, it computes well again:
SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id,
1 AS dummy
FROM bq_000010.000010_flux_visites_ds
ORDER BY ds_id) rd
INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000

What happens if you apply an order by to your select on the view? Or do you require random results?
A query with a LIMIT clause may still be non-deterministic if there is no operator in the query that guarantees the ordering of the output result set. This is because BigQuery executes using a large number of parallel workers. The order in which parallel jobs return is not guaranteed.
I'm not sure why the order by here would make a difference. However, it's generally odd to see a limit without any order by, which is why I asked about ordering. A complete SWAG is that perhaps the parallel workers are completing the outer join and limit before the inner select is complete, causing an internal error; by applying an order by, the system is forced to materialize the records before executing the inner join.
But I really have NO CLUE.
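The non-determinism of LIMIT without ORDER BY is easy to sketch outside BigQuery. The toy Python below is not BigQuery's actual scheduler, just a model: it treats each permutation of a tiny row set as one possible arrival order from parallel workers.

```python
import itertools

# Toy "table" of three rows; parallel workers may hand rows back
# to the coordinator in any order.
rows = [3, 1, 2]

def limit_only(arrival):
    # LIMIT 2 with no ORDER BY: take the first two rows that arrive
    return tuple(arrival[:2])

def order_then_limit(arrival):
    # ORDER BY pins the row order before LIMIT is applied
    return tuple(sorted(arrival)[:2])

arrivals = [list(p) for p in itertools.permutations(rows)]
unordered_results = {limit_only(a) for a in arrivals}
ordered_results = {order_then_limit(a) for a in arrivals}

print(len(unordered_results))  # 6: a different answer per arrival order
print(ordered_results)         # {(1, 2)}: one stable answer
```

The same reasoning explains why adding an ORDER BY can change which rows a LIMIT keeps, though not why it changed whether the view parsed at all.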

Related

Fastest way to count from a subquery

I have the following query to return a list of current employees and the number of 'corrections' they have. This is working correctly but is very slow.
I was previously not using a subquery, instead opting for a count (from...) as an aggregate subselect, but I have read that a subquery should be much faster. Changing the code to the below did improve performance, but not anywhere near what I was expecting.
SELECT DISTINCT
tblStaff.StaffID, CorrectionsOut.Count AS CorrectionsAssigned
FROM tblStaff
LEFT JOIN tblMeetings ON tblMeetings.StaffID = tblStaff.StaffID
JOIN tblTasks ON tblTasks.TaskID = tblMeetings.TaskID
--Get Corrections Issued
LEFT JOIN(
SELECT
COUNT(DISTINCT tblMeetings.TaskID) AS Count, tblMeetings.StaffID
FROM tblRegister
JOIN tblMeetings ON tblRegister.MeetingID = tblMeetings.MeetingID
WHERE tblRegister.FDescription IS NOT NULL
AND tblRegister.CorrectionOutDate IS NULL
GROUP BY tblMeetings.StaffID
) AS CorrectionsOut ON CorrectionsOut.StaffID = tblStaff.StaffID
WHERE tblStaff.CurrentEmployee = 1
I need a vendor-neutral solution as we are transitioning from SQL Server to Postgres. Note this is a simplified example of the query; the real one has quite a few counts. My current query time without the counts is less than half a second, but with the counts it is approx 20 seconds, if it runs at all without locking or otherwise failing.
I would get rid of the joins that you are not using which probably makes the SELECT DISTINCT unnecessary as well:
SELECT s.StaffID, co.Count AS CorrectionsAssigned
FROM tblStaff s LEFT JOIN
(SELECT COUNT(DISTINCT m.TaskID) AS Count, m.StaffID
FROM tblRegister r JOIN
tblMeetings m
ON r.MeetingID = m.MeetingID
WHERE r.FDescription IS NOT NULL AND
r.CorrectionOutDate IS NULL
GROUP BY m.StaffID
) co
ON co.StaffID = s.StaffID
WHERE s.CurrentEmployee = 1;
Getting rid of the SELECT DISTINCT and the duplicate rows added by the tasks should help performance.
For additional benefit, you would want to be sure you have indexes on the JOIN keys, and perhaps on the filtering criteria.
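As a sanity check, here is a minimal sketch of the rewritten query running against Python's built-in sqlite3. The table and column names follow the question; the three-row dataset is invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblStaff    (StaffID INTEGER, CurrentEmployee INTEGER);
CREATE TABLE tblMeetings (MeetingID INTEGER, StaffID INTEGER, TaskID INTEGER);
CREATE TABLE tblRegister (MeetingID INTEGER, FDescription TEXT, CorrectionOutDate TEXT);
INSERT INTO tblStaff    VALUES (1, 1), (2, 1), (3, 0);  -- staff 3 is a former employee
INSERT INTO tblMeetings VALUES (10, 1, 100), (11, 1, 101), (12, 2, 100);
INSERT INTO tblRegister VALUES (10, 'd', NULL), (11, 'd', NULL), (12, NULL, NULL);
""")

rows = con.execute("""
SELECT s.StaffID, co.Cnt AS CorrectionsAssigned
FROM tblStaff s LEFT JOIN
     (SELECT m.StaffID, COUNT(DISTINCT m.TaskID) AS Cnt
      FROM tblRegister r
      JOIN tblMeetings m ON r.MeetingID = m.MeetingID
      WHERE r.FDescription IS NOT NULL
        AND r.CorrectionOutDate IS NULL
      GROUP BY m.StaffID) co
     ON co.StaffID = s.StaffID
WHERE s.CurrentEmployee = 1
ORDER BY s.StaffID
""").fetchall()
print(rows)  # [(1, 2), (2, None)] -- staff 2's only meeting fails the filters
```

The key point is that the aggregation runs once over tblRegister/tblMeetings and the result is joined, rather than a count firing per staff row.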

Subquery that accesses main table fields combined with LIMIT clause in Oracle SQL

I got a table Users and a table Tasks. Tasks are ordered by importance and are assigned to a user's task list. Tasks have a status: ready or not ready. Now, I want to list all users with their most important task that is also ready.
The interesting requirement is that the tasks for each user first need to be filtered and sorted, and then the most important one should be selected. This is what I came up with:
SELECT Users.name,
(SELECT *
FROM (SELECT Tasks.description
FROM Tasks
WHERE Tasks.taskListCode = Users.taskListCode AND Tasks.isReady
ORDER BY Tasks.importance DESC)
WHERE rownum = 1
) AS nextTask
FROM Users
However, this results in the error
ORA-00904: "Users"."taskListCode": invalid identifier
I think the reason is that Oracle does not support correlated subqueries with more than one level of depth. However, I need two levels so that I can do the WHERE rownum = 1.
I also tried it without a correlated subquery:
SELECT Users.name, nextTask.description
FROM Users
LEFT JOIN Tasks nextTask ON
nextTask.taskListCode = Users.taskListCode AND
nextTask.importance = MAX(
SELECT tasks.importance
FROM tasks
WHERE tasks.isReady
GROUP BY tasks.id
)
This results in the error
ORA-00934: group function is not allowed here
How would I solve the problem?
One work-around for this uses keep:
SELECT u.name,
(SELECT MAX(t.description) KEEP (DENSE_RANK FIRST ORDER BY T.importance DESC)
FROM Tasks t
WHERE t.taskListCode = u.taskListCode AND t.isReady
) as nextTask
FROM Users u;
Please try with an analytic function:
with tp as (select t.*, row_number() over (partition by taskListCode order by importance desc) r
from tasks t
where isReady = 1 /*or 'Y' or what is positive value here*/)
select u.name, tp.description
from users u left outer join tp
on (u.taskListCode = tp.taskListCode and tp.r = 1);
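This approach can be checked on a toy schema; the sketch below uses Python's sqlite3 (window functions need SQLite 3.25+) with hypothetical sample data, and keeps the tp.r = 1 filter in the join condition so users without any ready task are still listed.

```python
import sqlite3  # window functions need SQLite 3.25+

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (name TEXT, taskListCode INTEGER);
CREATE TABLE tasks (taskListCode INTEGER, description TEXT,
                    importance INTEGER, isReady INTEGER);
INSERT INTO users VALUES ('ann', 1), ('bob', 2), ('cat', 3);
INSERT INTO tasks VALUES (1, 'low', 1, 1), (1, 'high', 9, 1),
                         (1, 'urgent but not ready', 99, 0), (2, 'only', 5, 1);
""")

rows = con.execute("""
WITH tp AS (SELECT t.*, ROW_NUMBER() OVER
              (PARTITION BY taskListCode ORDER BY importance DESC) AS r
            FROM tasks t
            WHERE isReady = 1)
SELECT u.name, tp.description
FROM users u LEFT JOIN tp
  ON u.taskListCode = tp.taskListCode AND tp.r = 1
ORDER BY u.name
""").fetchall()
print(rows)  # [('ann', 'high'), ('bob', 'only'), ('cat', None)]
```

Note how the not-ready task with importance 99 is filtered out before the ranking, exactly as the requirement demands.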
Here is a solution that uses aggregation rather than analytic functions. You may want to run this against the analytic functions solution to see which is faster; in many cases aggregate queries are (slightly) faster, but it depends on your data, on index usage, etc.
This solution is similar to what Gordon tried to do. I don't know why he wrote it using a correlated subquery instead of a straight join (and don't know if it will work - I've never seen the FIRST/LAST function used with correlated subqueries like that).
It may not work exactly right if there can be NULLs in the importance column; in that case you will need to add nulls first after t.importance and before the closing parenthesis. Note: the max(t.description) is needed because there may be ties by "importance" (two tasks with the same, highest importance for a given user). In that case, one task must be chosen. If the ordering by importance is strict (no ties), then the MAX() does nothing, as it selects the MAX over a set of exactly one value, but the compiler doesn't know that beforehand, so it does need the MAX().
select u.name,
max(t.description) keep (dense_rank last order by t.importance) as descr
from users u left outer join tasks t
on u.tasklistcode = t.tasklistcode and t.isready = 'Y'
group by u.name

SQL Performance: SELECT DISTINCT versus GROUP BY

I have been trying to improve query times for an existing Oracle database-driven application that has been running a little sluggish. The application executes several large queries, such as the one below, which can take over an hour to run. Replacing the DISTINCT with a GROUP BY clause in the query below shrank execution time from 100 minutes to 10 seconds. My understanding was that SELECT DISTINCT and GROUP BY operated in pretty much the same way. Why such a huge disparity between execution times? What is the difference in how the query is executed on the back-end? Is there ever a situation where SELECT DISTINCT runs faster?
Note: In the following query, WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A' represents just one of a number of ways that results can be filtered. This example was provided to show the reasoning for joining all of the tables that do not have columns included in the SELECT; it would return about a tenth of all available data.
SQL using DISTINCT:
SELECT DISTINCT
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
) AS CHILD_COUNT
FROM
ITEMS
INNER JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
LEFT OUTER JOIN ITEM_METADATA
ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
LEFT OUTER JOIN JOB_INVENTORY
ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
LEFT OUTER JOIN JOB_TASK_INVENTORY
ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
LEFT OUTER JOIN JOB_TASKS
ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
LEFT OUTER JOIN JOBS
ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
LEFT OUTER JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
LEFT OUTER JOIN TASK_STEP_INFORMATION
ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
ORDER BY
ITEMS.ITEM_CODE
SQL using GROUP BY:
SELECT
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID
) AS CHILD_COUNT
FROM
ITEMS
INNER JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
LEFT OUTER JOIN ITEM_METADATA
ON ITEMS.ITEM_ID = ITEM_METADATA.ITEM_ID
LEFT OUTER JOIN JOB_INVENTORY
ON ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID
LEFT OUTER JOIN JOB_TASK_INVENTORY
ON JOB_INVENTORY.JOB_ITEM_ID = JOB_TASK_INVENTORY.JOB_ITEM_ID
LEFT OUTER JOIN JOB_TASKS
ON JOB_TASK_INVENTORY.TASKID = JOB_TASKS.TASKID
LEFT OUTER JOIN JOBS
ON JOB_TASKS.JOB_ID = JOBS.JOB_ID
LEFT OUTER JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID = TASK_INVENTORY_STEP.JOB_ITEM_ID
LEFT OUTER JOIN TASK_STEP_INFORMATION
ON TASK_INVENTORY_STEP.JOB_ITEM_ID = TASK_STEP_INFORMATION.JOB_ITEM_ID
WHERE
TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
GROUP BY
ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS
ORDER BY
ITEMS.ITEM_CODE
Here is the Oracle query plan for the query using DISTINCT:
Here is the Oracle query plan for the query using GROUP BY:
The performance difference is probably due to the execution of the subquery in the SELECT clause. I am guessing that it is re-executing this query for every row before the distinct. For the group by, it would execute once after the group by.
Try replacing it with a join, instead:
select . . .,
parentcnt
from . . . left outer join
(SELECT PARENT_ITEM_ID, COUNT(PKID) as parentcnt
FROM ITEM_PARENTS
GROUP BY PARENT_ITEM_ID
) p
on items.item_id = p.parent_item_id
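One wrinkle of this rewrite is worth noting: the correlated COUNT returns 0 for items with no parents, while the outer-join version returns NULL unless wrapped in COALESCE. A small sqlite3 sketch (invented tables and data) showing the two forms agree once COALESCE is added:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (item_id INTEGER);
CREATE TABLE item_parents (pkid INTEGER, parent_item_id INTEGER);
INSERT INTO items VALUES (1), (2);                 -- item 2 has no children
INSERT INTO item_parents VALUES (10, 1), (11, 1);
""")

# Scalar correlated subquery: the count runs once per output row.
correlated = con.execute("""
SELECT i.item_id,
       (SELECT COUNT(pkid) FROM item_parents p
        WHERE p.parent_item_id = i.item_id) AS child_count
FROM items i ORDER BY i.item_id
""").fetchall()

# Pre-aggregated derived table joined once; COALESCE maps NULL back to 0.
joined = con.execute("""
SELECT i.item_id, COALESCE(p.cnt, 0) AS child_count
FROM items i LEFT JOIN
     (SELECT parent_item_id, COUNT(pkid) AS cnt
      FROM item_parents GROUP BY parent_item_id) p
     ON i.item_id = p.parent_item_id
ORDER BY i.item_id
""").fetchall()

print(correlated)            # [(1, 2), (2, 0)]
print(correlated == joined)  # True
```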
I'm fairly sure that GROUP BY and DISTINCT have roughly the same execution plan.
The difference here since we have to guess (since we don't have the explain plans) is IMO that the inline subquery gets executed AFTER the GROUP BY but BEFORE the DISTINCT.
So if your query returns 1M rows and gets aggregated to 1k rows:
The GROUP BY query would have run the subquery 1000 times,
Whereas the DISTINCT query would have run the subquery 1000000 times.
The tkprof explain plan would help demonstrate this hypothesis.
While we're discussing this, I think it's important to note that the way the query is written is misleading both to the reader and to the optimizer: you obviously want to find all rows from item/item_transactions that have a TASK_INVENTORY_STEP.STEP_TYPE with a value of "TYPE A".
IMO your query would have a better plan and would be more easily readable if written like this:
SELECT ITEMS.ITEM_ID,
ITEMS.ITEM_CODE,
ITEMS.ITEMTYPE,
ITEM_TRANSACTIONS.STATUS,
(SELECT COUNT(PKID)
FROM ITEM_PARENTS
WHERE PARENT_ITEM_ID = ITEMS.ITEM_ID) AS CHILD_COUNT
FROM ITEMS
JOIN ITEM_TRANSACTIONS
ON ITEMS.ITEM_ID = ITEM_TRANSACTIONS.ITEM_ID
AND ITEM_TRANSACTIONS.FLAG = 1
WHERE EXISTS (SELECT NULL
FROM JOB_INVENTORY
JOIN TASK_INVENTORY_STEP
ON JOB_INVENTORY.JOB_ITEM_ID=TASK_INVENTORY_STEP.JOB_ITEM_ID
WHERE TASK_INVENTORY_STEP.STEP_TYPE = 'TYPE A'
AND ITEMS.ITEM_ID = JOB_INVENTORY.ITEM_ID)
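The duplicate multiplication that DISTINCT was papering over, and how the EXISTS semi-join avoids it, can be shown on a stripped-down sqlite3 model (three toy tables standing in for ITEMS, JOB_INVENTORY and TASK_INVENTORY_STEP; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE items (item_id INTEGER);
CREATE TABLE job_inventory (job_item_id INTEGER, item_id INTEGER);
CREATE TABLE task_inventory_step (job_item_id INTEGER, step_type TEXT);
INSERT INTO items VALUES (1), (2);
-- item 1 has two matching jobs, so a plain join returns it twice
INSERT INTO job_inventory VALUES (10, 1), (11, 1);
INSERT INTO task_inventory_step VALUES (10, 'TYPE A'), (11, 'TYPE A');
""")

joined = con.execute("""
SELECT i.item_id FROM items i
JOIN job_inventory j ON i.item_id = j.item_id
JOIN task_inventory_step t ON j.job_item_id = t.job_item_id
WHERE t.step_type = 'TYPE A'
""").fetchall()

semi = con.execute("""
SELECT i.item_id FROM items i
WHERE EXISTS (SELECT NULL FROM job_inventory j
              JOIN task_inventory_step t ON j.job_item_id = t.job_item_id
              WHERE t.step_type = 'TYPE A' AND j.item_id = i.item_id)
""").fetchall()

print(joined)  # [(1,), (1,)] -- duplicates the joins created
print(semi)    # [(1,)]       -- at most one row per item, no DISTINCT needed
```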
In many cases, a DISTINCT can be a sign that the query is not written properly (because a good query shouldn't return duplicates).
Note also that 4 tables are not used in your original select.
The first thing that should be noted is that the use of DISTINCT indicates a code smell, aka an anti-pattern. It generally means that there is a missing join or an extra join that is generating duplicate data. Looking at your query above, I am guessing that the reason why the group by is faster (without seeing the execution plans) is that the location of the group by reduces the number of records that end up being returned, whereas distinct is blowing out the result set and doing row-by-row comparisons.
Update to approach
Sorry, I should have been more clear. Records are generated when
users perform certain tasks in the system, so there is no schedule. A
user could generate a single record in a day or hundreds per hour. The
important thing is that each time a user runs a search, up-to-date
records must be returned, which makes me doubtful that a materialized
view would work here, especially if the query populating it would take
long to run.
I do believe this is the exact reason to use a materialized view. So the process would work this way. You take the long running query as the piece that builds out your materialized view, since we know the user only cares about "new" data after they perform some arbitrary task in the system. So what you want to do is query against this base materialized view, which can be refreshed constantly on the back-end, the persistence strategy involved should not choke out the materialized view (persisting a few hundred records at a time won't crush anything). What this will allow is Oracle to grab a read lock (note we don't care how many sources read our data, we only care about writers). In the worst case a user will have "stale" data for microseconds, so unless this is a financial trading system on Wall Street or a system for a nuclear reactor, these "blips" should go unnoticed by even the most eagle eyed users.
Code example of how to do this:
create materialized view dept_mv FOR UPDATE as select * from dept;
Now the key to this is that as long as you don't invoke refresh, you won't lose any of the persisted data. It will be up to you to determine when you want to "base line" your materialized view again (midnight, perhaps?).
You should use GROUP BY to apply aggregate operators to each group and DISTINCT if you only need to remove duplicates.
I think the performance is the same.
In your case I think you should use GROUP BY.
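For the narrow claim that DISTINCT and GROUP BY (with no aggregate functions) return the same rows, here is a quick sqlite3 check on toy data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (a INTEGER, b TEXT);
INSERT INTO t VALUES (1, 'x'), (1, 'x'), (2, 'y');
""")

# DISTINCT over the select list vs. GROUP BY over the same columns:
d = con.execute("SELECT DISTINCT a, b FROM t ORDER BY a").fetchall()
g = con.execute("SELECT a, b FROM t GROUP BY a, b ORDER BY a").fetchall()
print(d)       # [(1, 'x'), (2, 'y')]
print(d == g)  # True: same rows either way
```

Same result set; as the answers above discuss, the performance difference in the question comes from when the correlated subquery is evaluated, not from the deduplication itself.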

Is there a way to get different results for the same SQL query if the data stays the same?

I get a different result set for this query intermittently when I run it...sometimes it gives 1363, sometimes 1365 and sometimes 1366 results. The data doesn't change. What could be causing this and is there a way to prevent it? Query looks something like this:
SELECT *
FROM
(
SELECT
RC.UserGroupId,
RC.UserGroup,
RC.ClientId AS CLID,
CASE WHEN T1.MultipleClients = 1 THEN RC.Salutation1 ELSE RC.DisplayName1 END AS szDisplayName,
T1.MultipleClients,
RC.IsPrimaryRecord,
RC.RecordTypeId,
RC.ClientTypeId,
RC.ClientType,
RC.IsDeleted,
RC.IsCompany,
RC.KnownAs,
RC.Salutation1,
RC.FirstName,
RC.Surname,
Relationship,
C.DisplayName Client,
RC.DisplayName RelatedClient,
E.Email,
RC.DisplayName + ' is the ' + R.Relationship + ' of ' + C.DisplayName Description,
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK) ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK) ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK) ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK) ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
INNER JOIN
(
SELECT
E.Email,
CASE WHEN (COUNT(DISTINCT RC.DisplayName) > 1) THEN 1 ELSE 0 END AS MultipleClients
FROM
SSDS.Client.ClientExtended C
INNER JOIN
SSDS.Client.ClientRelationship R WITH (NOLOCK) ON C.ClientId = R.ClientID
INNER JOIN
SSDS.Client.ClientExtended RC WITH (NOLOCK) ON R.RelatedClientId = RC.ClientId
LEFT OUTER JOIN
SSDS.Client.Email E WITH (NOLOCK) ON RC.ClientId = E.ClientId
LEFT OUTER JOIN
SSDS.Client.UserDefinedData UD WITH (NOLOCK) ON C.ClientId = UD.ClientId AND C.UserGroupId = UD.UserGroupId
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
GROUP BY E.Email
) T1 ON E.Email = T1.Email
WHERE
Relationship IN ('z-Group Principle', 'z-Group Member ')
AND E.Email IS NOT NULL
) T
WHERE
sequence_id = 1
AND T.UserGroupId IN (Select * from iCentral.dbo.GetSubUserGroups('471b9cbd-2312-4a8a-bb20-35ea53d30340',0))
AND T.IsDeleted = 0
AND T.RecordTypeId = 1
AND T.ClientTypeId IN
(
'1', --Client
'-1652203805' --NTU
)
AND T.CLID NOT IN
(
SELECT DISTINCT
UDDF.CLID
FROM
SLacsis_SLM.dbo.T_UserDef UD WITH (NOLOCK)
INNER JOIN
SLacsis_SLM.dbo.T_UserDefData UDDF WITH (NOLOCK)
ON UD.UserDef_ID = UDDF.UserDef_ID
INNER JOIN
SLacsis_SLM.dbo.T_Client CLL WITH (NOLOCK)
ON CLL.CLID = UDDF.CLID AND CLL.UserGroup_CLID = UD.UserID
WHERE
UD.UserDef_ID in
(
'F68F31CE-525B-4455-9D50-6DA77C66FEE5',
'A7CECB03-866C-4F1F-9E1A-CEB09474FE47'
)
AND UDDF.Data = 'NO'
)
ORDER BY T.Surname
EDIT:
I have removed all NOLOCK's (including the ones in views and UDFs) and I'm still having the same issue. I get the same results every time for the nested select (T) and if I put the result set of T into a temp table in the beginning of the query and join onto the temp table instead of the nested select then the final result set is the same every time I run the query.
EDIT2:
I have been doing some more reading on ROW_NUMBER()...I'm partitioning by email (of which there are duplicates) and ordering by Relationship (where there are only 1 of 2 relationships). Could this cause the query to be non-deterministic and would there be a way to fix that?
EDIT3:
Here are the actual execution plans if anyone is interested http://www.mediafire.com/?qo5gkh5dftxf0ml. Is it possible to see that it is running as read committed from the execution plan? I've compared the files using WinMerge and the only differences seem to be the counts (ActualRows="").
EDIT4:
This works:
SELECT * FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY B.Email ORDER BY Relationship DESC) AS sequence_id
FROM
(
SELECT DISTINCT
RC.UserGroupId,
...
) B...
EDIT5:
When running the same ROW_NUMBER() query (T in the original question, just selecting RC.DisplayName and ROW_NUMBER) twice in a row, I get a different rank for some people:
Does anyone have a good explanation/example of why or how ROW_NUMBER() over a result set that contains duplicates can rank differently each time it is run and ultimately change the number of results?
EDIT6:
Ok I think this makes sense to me now. This occurs when 2 people have the same email address (e.g. a husband and wife pair) and relationship. I guess in this case their ROW_NUMBER() ranking is arbitrary and can change every time it is run.
Your use of NOLOCK all over means you are doing dirty reads and will see uncommitted data, data that will be rolled back, transient and inconsistent data, etc.
Take these off, try again, report back please.
Edit: some options with NOLOCKS removed
Data is really changing
Some parameter or filter is changing (eg GETDATE)
Some float comparisons running on different cores each time
See this on dba.se https://dba.stackexchange.com/q/4810/630
Embedded NOLOCKs in udfs or views (eg iCentral.dbo.GetSubUserGroups)
...
As I said yesterday in the comments the row numbering for rows with duplicate E.Email, Relationship values will be arbitrary.
To make it deterministic you would need to do PARTITION BY B.Email ORDER BY Relationship DESC, SomeUniqueColumn. Interesting that it changes between runs though, using the same execution plan. I assume this is a consequence of the hash join.
I think your problem is the first row over the partition is not deterministic. I suspect that Email and Relationship is not unique.
ROW_NUMBER() OVER (PARTITION BY E.Email ORDER BY Relationship DESC) AS sequence_id
Later you examine the first row of the partition.
WHERE T.sequence_id = 1
AND T.UserGroupId ...
If that first row is arbitrary then you are going to get an arbitrary where comparison. You need to add to the ORDER BY to include a complete unique key. If there is no unique key then you need to make one or live with arbitrary results. Even on a table with a clustered PK the select row order is not guaranteed unless the entire PK is in the sort clause.
This probably has to do with ordering. You have a sequence_id defined as a row_number ordered by Relationship. You'll always get a sensible order by relationship, but other than that your row_number will be random. So you can get different rows with sequence_id 1 each time. That in turn will affect your where clause, and you can get different numbers of results. To fix this to get a consistent result, add another field to your row_number's order by. Use a primary key to be certain of consistent results.
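A tiny sqlite3 reproduction of the tie and the fix (toy data; note SQLite is single-threaded, so the ambiguous form happens to be stable here, unlike on a parallel engine — the point is the unique tie-breaking column, which pins the ranking on any engine):

```python
import sqlite3  # window functions need SQLite 3.25+

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clients (client_id INTEGER, email TEXT, relationship TEXT);
-- a husband-and-wife pair: same email, same relationship -> a tie
INSERT INTO clients VALUES (1, 'a@x.com', 'z-Group Member'),
                           (2, 'a@x.com', 'z-Group Member');
""")

# Ambiguous: nothing breaks the tie, so which client gets rank 1 is the
# engine's choice and may differ between runs on a parallel engine.
ambiguous = con.execute("""SELECT client_id, ROW_NUMBER() OVER
    (PARTITION BY email ORDER BY relationship DESC) FROM clients""").fetchall()

# Deterministic: a unique key as the final sort term pins the ranking.
det = con.execute("""SELECT client_id, ROW_NUMBER() OVER
    (PARTITION BY email ORDER BY relationship DESC, client_id)
    FROM clients""").fetchall()
print(sorted(det))  # [(1, 1), (2, 2)] on every run
```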
There's a recent KB that addresses problems with ROW_NUMBER() ... see FIX: You receive an incorrect result when you run a query that uses the row_number function in SQL Server 2008 for the details.
However this KB indicates that it's a problem when parallelism is invoked for execution, and looking at your execution plans I can't see this kicking in. But the fact that MS have found a problem with it in one situation makes me a little bit wary - i.e., could the same issue occur for a sufficiently complicated query (and your execution plan does look sufficiently large).
So it may be worth checking your patch levels of SQL Server 2008.
You must use only ORDER BY, without PARTITION BY:
ROW_NUMBER() OVER (ORDER BY Relationship DESC) AS sequence_id

Using analytics with left join and partition by

I have two different queries which produce the same results. I wonder which one is more efficient. In the second one I am using one select clause less, but I am moving the where to the outer select. Which one is executed first: the left join or the where clause?
Using 3 "selects":
select * from
(
select * from
(
select
max(t.PRICE_DATETIME) over (partition by t.PRODUCT_ID) as LATEST_SNAPSHOT,
t.*
from
PRICE_TABLE t
) a
where
a.PRICE_DATETIME = a.LATEST_SNAPSHOT
) r
left join
PRODUCT_TABLE l on (r.PRODUCT_ID = l.PRODUCT_ID and r.PRICE_DATETIME = l.PRICE_DATETIME)
Using 2 selects:
select * from
(
select
max(t.PRICE_DATETIME) over (partition by t.PRODUCT_ID) as LATEST_SNAPSHOT,
t.*
from
PRICE_TABLE t
) r
left join
PRODUCT_TABLE l on (r.PRODUCT_ID = l.PRODUCT_ID and r.PRICE_DATETIME = l.PRICE_DATETIME)
where
r.PRICE_DATETIME = r.LATEST_SNAPSHOT;
ps: I know, I know, "select star" is evil, but I'm writing it this way only here to make it smaller.
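For what it's worth, the two shapes can be checked for row-for-row equality on a toy sqlite3 schema (invented tables and data mirroring PRICE_TABLE / PRODUCT_TABLE). Since the outer WHERE only touches columns from the left side of the left join, it cannot filter away any null-extended rows, so the forms agree:

```python
import sqlite3  # window functions need SQLite 3.25+

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE price_table   (product_id INTEGER, price_datetime TEXT, price REAL);
CREATE TABLE product_table (product_id INTEGER, price_datetime TEXT, label TEXT);
INSERT INTO price_table VALUES (1, '2024-01-01', 9.0), (1, '2024-01-02', 9.5),
                               (2, '2024-01-02', 4.0);
INSERT INTO product_table VALUES (1, '2024-01-02', 'widget'),
                                 (2, '2024-01-02', 'gadget');
""")

# Three selects: filter to the latest snapshot inside, then join.
three = con.execute("""
SELECT * FROM (
  SELECT * FROM (
    SELECT MAX(t.price_datetime) OVER (PARTITION BY t.product_id) AS latest, t.*
    FROM price_table t) a
  WHERE a.price_datetime = a.latest) r
LEFT JOIN product_table l
  ON r.product_id = l.product_id AND r.price_datetime = l.price_datetime
ORDER BY r.product_id""").fetchall()

# Two selects: join first, filter in the outer WHERE.
two = con.execute("""
SELECT * FROM (
  SELECT MAX(t.price_datetime) OVER (PARTITION BY t.product_id) AS latest, t.*
  FROM price_table t) r
LEFT JOIN product_table l
  ON r.product_id = l.product_id AND r.price_datetime = l.price_datetime
WHERE r.price_datetime = r.latest
ORDER BY r.product_id""").fetchall()

print(two == three)  # True
print(len(two))      # 2: one latest row per product
```

Which shape is cheaper still depends on the optimizer, as the answer below discusses.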
"I wonder which one is more efficent"
You can answer this question yourself pretty easily by turning on statistics.
set statistics io on
set statistics time on
-- query goes here
set statistics io off
set statistics time off
Do this for each of your two queries and compare the results. You'll get some useful output about how many reads SQL Server is doing, how many milliseconds each takes to complete, etc.
You can also see the execution plan SQL Server generates viewing the estimated execution plan (ctrl+L or right-click and choose that option) or by enabling "Display Actual Execution Plan" (ctrl+M) and running the queries. That could help answer the question about order of execution; I couldn't tell you off the top of my head.