Using second highest value in an ON clause - sql

I have an existing MSSQL view where I need to include a new join. To get the correct record data, I need to select the entry whose ActivityKey is the second highest (essentially the second most recent revision of the policy).
select
...
from polmem a
left join polMemPremium wpmp on (wpmp.policyNumber=pf.sreference
and wpmp.lPolicyMemberKey=a.lPolicyMemberKey
and wpmp.lPolicyActivityKey = (select Max(wpmp.lPolicyActivityKey) where wpmp.lPolicyActivityKey
NOT IN (SELECT MAX(wpmp.lPolicyActivityKey))))
where
...
But the above results in this error:
An aggregate cannot appear in an ON clause unless it is in a subquery contained in a HAVING clause or select list, and the column being aggregated is an outer reference.
Essentially the error is telling me I need to put the aggregate
(select Max(wpmp.lPolicyActivityKey) where wpmp.lPolicyActivityKey NOT IN (SELECT MAX(wpmp.lPolicyActivityKey)))
in a HAVING clause and then list most, if not all, of the columns in the view's SELECT statement in a GROUP BY. My issue is that this view is used in multiple places, and doing what MSSQL wants would be a massive change to it for the sake of what I thought would be a relatively simple addition. I'm wondering whether I'm approaching this wrong and whether there is a better way to achieve what I want.

Just try something like:
select ...
from .....
..........
cross apply (select
*
,row_number() over (order by wpmp.lPolicyActivityKey desc) as rn
from web_PolicyMemberPremium wpmp
where wpmp.policyNumber=pf.sreference
and wpmp.lPolicyMemberKey=a.lPolicyMemberKey) wpmp
....
where ...
and wpmp.rn = 2
I used CROSS APPLY, which means there must be a matching premium row for the policy; otherwise the outer row is excluded. You could use OUTER APPLY instead and change the WHERE clause to isnull(wpmp.rn, 2) = 2 or similar, but that doesn't make much sense to me.
PS. It would help us a lot (and mostly you) if you formatted the code nicely.
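A minimal sketch of the same "second most recent revision" idea, runnable with Python's built-in sqlite3. SQLite has no CROSS APPLY, so this swaps in a different but equivalent-in-spirit technique: a derived table numbered with ROW_NUMBER() partitioned per policy/member, joined on rn = 2. The tables, data, and the assumption that policyNumber lives on polmem (the question takes it from pf.sreference, which isn't shown) are all hypothetical. Putting rn = 2 in the ON clause keeps members without a second revision, roughly what OUTER APPLY would do.

```python
import sqlite3

# Hypothetical miniature of the question's tables; data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE polmem (lPolicyMemberKey INTEGER PRIMARY KEY, policyNumber TEXT);
CREATE TABLE polMemPremium (
    policyNumber TEXT,
    lPolicyMemberKey INTEGER,
    lPolicyActivityKey INTEGER,
    premium REAL
);
INSERT INTO polmem VALUES (1, 'P100'), (2, 'P200');
INSERT INTO polMemPremium VALUES
    ('P100', 1, 10, 50.0),
    ('P100', 1, 20, 55.0),
    ('P100', 1, 30, 60.0),   -- most recent revision of member 1
    ('P200', 2, 15, 70.0);   -- member 2 has only one revision
""")

# Number each member's revisions newest-first, then LEFT JOIN only rn = 2.
query = """
SELECT a.lPolicyMemberKey, prev.lPolicyActivityKey, prev.premium
FROM polmem a
LEFT JOIN (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY p.policyNumber, p.lPolicyMemberKey
               ORDER BY p.lPolicyActivityKey DESC
           ) AS rn
    FROM polMemPremium p
) prev
  ON prev.policyNumber = a.policyNumber
 AND prev.lPolicyMemberKey = a.lPolicyMemberKey
 AND prev.rn = 2
ORDER BY a.lPolicyMemberKey
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 20, 55.0), (2, None, None)]
```

Because no aggregate appears in the ON clause, the view's existing SELECT list would not need a GROUP BY.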

Related

Filtering on ROW_NUMBER() is changing the results

I implemented an OData service of my own that takes an SQL statement and applies the top/skip filter using ROW_NUMBER(). Most statements tested so far work well, except for a statement involving two levels of LEFT JOIN. For some reason I can't explain, the data returned by the SQL changes when I apply a WHERE clause on the row number column.
For readability (and testing), I removed most of the SQL to keep only the faulty part. Basically, there is a Patients table that may have 0 to N Diagnostics, and each Diagnostic may have 0 to N Treatments:
SELECT RowNumber, PatientID, DiagnosticID, TreatmentID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNumber
, *
FROM PATIENTS
LEFT JOIN DIAGNOSTICS ON DIAGNOSTICS.PatientID = PATIENTS.PatientID
LEFT JOIN TREATMENTS ON TREATMENTS.DiagnosticID = DIAGNOSTICS.DiagnosticID
) AS Wrapper
--WHERE RowNumber BETWEEN 1 AND 10
--If I uncomment the line above, I get 10 rows that differ from the first 10 rows of this query
These are the results I got from the statement above. The result on the left shows the first 10 rows without the WHERE clause, while the one on the right shows the results with the WHERE clause.
For the record, I'm using SQL Server 2008 R2 SP3. My application is in C# but the problem occurs in SQL server too so I don't think .NET is involved in this case.
EDIT
About the ORDER BY (SELECT NULL): I took that code a while ago from a Stack Overflow question. However, ordering by NULL only gives a predictable numbering if the statement itself is sorted; in my case, I forgot to add an ORDER BY clause, which is why I was getting arbitrary ordering.
Let me first ask: why do you expect it to be the same? Or rather, why do you expect it to be anything in particular? You haven't imposed an ordering, so the query optimizer is free to use whatever execution operators are most efficient (according to its cost scheme). When you add the WHERE clause, the plan can change and the natural ordering of the results can be different. This can also happen when adding joins or subqueries, for example.
If you want the results to come back in a specific order, you need to actually use the ORDER BY clause of the ROW_NUMBER() window function, ordering by columns that uniquely identify each row. I'm not sure why you are ordering by SELECT NULL, but I can guarantee that's the problem.
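A small sketch of the fix, using Python's sqlite3 with invented Patients/Diagnostics/Treatments data (table shapes follow the question; the rows are hypothetical). The ROW_NUMBER() ordering lists the key of every joined table, so each row's number no longer depends on the plan, and the outer ORDER BY RowNumber is there because without it even the final result order isn't guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PATIENTS (PatientID INTEGER PRIMARY KEY);
CREATE TABLE DIAGNOSTICS (DiagnosticID INTEGER PRIMARY KEY, PatientID INTEGER);
CREATE TABLE TREATMENTS (TreatmentID INTEGER PRIMARY KEY, DiagnosticID INTEGER);
""")
# Invented data: 6 patients, some with diagnostics, some diagnostics with treatments.
conn.executemany("INSERT INTO PATIENTS VALUES (?)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO DIAGNOSTICS VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 3), (13, 4), (14, 6)])
conn.executemany("INSERT INTO TREATMENTS VALUES (?, ?)",
                 [(100, 10), (101, 10), (102, 12), (103, 14), (104, 14)])

# (PatientID, DiagnosticID, TreatmentID) is unique per joined row, so the
# numbering is deterministic even where the LEFT JOINs produce NULLs.
numbered = """
SELECT RowNumber, PatientID, DiagnosticID, TreatmentID
FROM (
    SELECT ROW_NUMBER() OVER (
               ORDER BY p.PatientID, d.DiagnosticID, t.TreatmentID
           ) AS RowNumber,
           p.PatientID, d.DiagnosticID, t.TreatmentID
    FROM PATIENTS p
    LEFT JOIN DIAGNOSTICS d ON d.PatientID = p.PatientID
    LEFT JOIN TREATMENTS t ON t.DiagnosticID = d.DiagnosticID
) AS Wrapper
"""
all_rows = conn.execute(numbered + " ORDER BY RowNumber").fetchall()
page = conn.execute(
    numbered + " WHERE RowNumber BETWEEN 1 AND 5 ORDER BY RowNumber"
).fetchall()
print(page == all_rows[:5])  # True
```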

LEFT JOIN with ROW NUM DB2

I am using two tables with a one-to-many mapping (in DB2).
I need to fetch 20 records at a time using ROW_NUMBER from two tables using a LEFT JOIN. But because of the one-to-many mapping, the result is not consistent: I might get 20 records, but those records do not contain 20 unique records from the first table.
SELECT
A.*,
B.*,
ROW_NUMBER() OVER (ORDER BY A.COLUMN_1 DESC) as rn
from
table1 A
LEFT JOIN
table2 B ON A.COLUMN_3 = B.COLUMN3
where
rn between 1 and 20
Please suggest some solution.
Sure, this is easy... once you know that you can use subqueries as a table reference:
SELECT <relevant columns from Table1 and Table2>, rn
FROM (SELECT <relevant columns from Table1>,
ROW_NUMBER() OVER (ORDER BY <relevant columns> DESC) AS rn
FROM table1) Table1
LEFT JOIN Table2
ON <relevant equivalent columns>
WHERE rn >= :startOfRange
AND rn < :startOfRange + :numberOfElements
For production code, never do SELECT * - always explicitly list the columns you want (there are several reasons for this).
Prefer an inclusive lower bound (>=) and an exclusive upper bound (<) for (positive) ranges. For everything except integral types, this is required to query the values sanely and cleanly. Use it with integral types too, both for consistency and for ease of querying (note that you don't actually need to know which value you "stop" on). Further, this pattern is considered the standard when dealing with iterated value constructs.
Note that this query currently has two problems:
You need to list sufficient columns for the ORDER BY to return consistent results. This is best done by using a unique value - you probably want something in an index that the optimizer can use.
Every time you run this query, you (usually) have to order the ENTIRE set of results before you can get whatever slice of them you want (especially for anything after the first page). If your dataset is large, look at the answers to this question for some ideas for performance improvements. The optimizer may be able to cache the results for you, but it's not guaranteed (especially on tables that receive many updates).
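A sketch of the answer's shape, run through Python's sqlite3 rather than DB2 (the SQL is standard ROW_NUMBER() usage, but it was not tested on DB2 here). The tables and data are invented; ID is a hypothetical unique column added as a tie-breaker after COLUMN_1 DESC so the numbering is consistent, and the range uses the half-open >= / < form. Each page then contains exactly `size` parents, however many child rows they have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER PRIMARY KEY, COLUMN_1 INTEGER, COLUMN_3 INTEGER);
CREATE TABLE table2 (COLUMN3 INTEGER, DETAIL TEXT);
""")
# Invented data: 5 parents; duplicate COLUMN_1 values force a tie-break.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(1, 50, 7), (2, 50, 8), (3, 40, 9), (4, 30, 7), (5, 20, 10)])
# One-to-many: COLUMN_3 = 7 has three children, 8 has one, others none.
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(7, "a"), (7, "b"), (7, "c"), (8, "d")])

def fetch_page(start, size):
    """Return joined rows for parents numbered in [start, start + size)."""
    sql = """
    SELECT t1.ID, t1.COLUMN_1, t2.DETAIL, t1.rn
    FROM (SELECT ID, COLUMN_1, COLUMN_3,
                 ROW_NUMBER() OVER (ORDER BY COLUMN_1 DESC, ID DESC) AS rn
          FROM table1) t1
    LEFT JOIN table2 t2 ON t1.COLUMN_3 = t2.COLUMN3
    WHERE t1.rn >= :start AND t1.rn < :start + :size
    ORDER BY t1.rn, t2.DETAIL
    """
    return conn.execute(sql, {"start": start, "size": size}).fetchall()

page1 = fetch_page(1, 2)
print(page1)  # [(2, 50, 'd', 1), (1, 50, 'a', 2), (1, 50, 'b', 2), (1, 50, 'c', 2)]
```

Page 1 has four joined rows but exactly two distinct table1 rows, which is the behaviour the question asks for.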

In an EXISTS can my JOIN ON use a value from the original select

I have an order system. Users can be attached to different orders as different types of user. They can download documents associated with an order, but documents are only given to certain types of users on the order. I'm having trouble writing the query that checks a user's permission to view a document and selects the info about the document.
I have the following tables and (applicable) fields:
Docs: DocNo, FileNo
DocAccess: DocNo, UserTypeWithAccess
FileUsers: FileNo, UserType, UserNo
I have the following query:
SELECT Docs.*
FROM Docs
WHERE DocNo = 1000
AND EXISTS (
SELECT * FROM DocAccess
LEFT JOIN FileUsers
ON FileUsers.UserType = DocAccess.UserTypeWithAccess
AND FileUsers.FileNo = Docs.FileNo /* Errors here */
WHERE DocAccess.UserNo = 2000 )
The trouble is that inside the EXISTS subquery, it does not recognize Docs (at Docs.FileNo) as a valid table. If I move the second ON condition to the WHERE clause it works, but I would rather limit the initial join than filter rows out after the fact.
I can get around this in a couple of ways, but this seems like the best one. Is there anything I'm missing here, or is it simply not allowed?
I think this is a limitation of your database engine. In most databases, Docs would be in scope for the entire subquery, including both the WHERE and ON clauses.
However, you do not need to worry about where you put that particular condition. SQL is a declarative language, not a procedural one: its purpose is to describe the output, and the SQL engine, parser, and compiler should choose the optimal execution path. That is not always true, but move the condition to the WHERE clause and don't worry about it. (This holds for inner joins; with a LEFT JOIN, moving a condition on the outer-joined table from ON to WHERE does change the result.)
I am not clear why you need to join with FileUsers at all in your subquery.
What is the purpose and idea of the query (in plain English)?
In any case, if you do need to join with FileUsers, I suggest using an inner join and moving the second filter to the WHERE clause. I don't think you can use it in a JOIN condition in a subquery; at least I've never seen it used this way before. I believe you can only correlate through the WHERE clause.
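A sketch of the permission check in Python's sqlite3, under stated assumptions: the listed DocAccess has no UserNo column, so this filters on FileUsers.UserNo, and it also correlates DocAccess.DocNo to the outer document. Both are guesses about the intent, and the 'Buyer'/'Seller' data is invented. It contrasts an INNER JOIN with a LEFT JOIN whose FileUsers conditions live only in ON: with the LEFT JOIN every DocAccess row survives, so EXISTS is true for any user. SQLite accepts the outer reference d.FileNo in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Docs (DocNo INTEGER, FileNo INTEGER);
CREATE TABLE DocAccess (DocNo INTEGER, UserTypeWithAccess TEXT);
CREATE TABLE FileUsers (FileNo INTEGER, UserType TEXT, UserNo INTEGER);
INSERT INTO Docs VALUES (1000, 1);
INSERT INTO DocAccess VALUES (1000, 'Buyer');
INSERT INTO FileUsers VALUES (1, 'Buyer', 2000), (1, 'Seller', 3000);
""")

# Assumed intent: the user is on the document's file with a user type
# that DocAccess grants for this document.
inner_join_sql = """
SELECT d.DocNo, d.FileNo
FROM Docs d
WHERE d.DocNo = :doc
  AND EXISTS (
      SELECT 1
      FROM DocAccess acc
      INNER JOIN FileUsers usr
        ON usr.UserType = acc.UserTypeWithAccess
       AND usr.FileNo = d.FileNo          -- outer reference inside ON
      WHERE acc.DocNo = d.DocNo
        AND usr.UserNo = :user)
"""

# Same conditions with LEFT JOIN: unmatched DocAccess rows are kept with
# NULLs, so the EXISTS no longer depends on the user at all.
left_join_sql = """
SELECT d.DocNo, d.FileNo
FROM Docs d
WHERE d.DocNo = :doc
  AND EXISTS (
      SELECT 1
      FROM DocAccess acc
      LEFT JOIN FileUsers usr
        ON usr.UserType = acc.UserTypeWithAccess
       AND usr.FileNo = d.FileNo
       AND usr.UserNo = :user
      WHERE acc.DocNo = d.DocNo)
"""

def can_view(sql, doc, user):
    """True if the query returns the document for this user."""
    return conn.execute(sql, {"doc": doc, "user": user}).fetchall() != []

print(can_view(inner_join_sql, 1000, 2000))  # True  (Buyer has access)
print(can_view(inner_join_sql, 1000, 3000))  # False (Seller does not)
print(can_view(left_join_sql, 1000, 3000))   # True  (LEFT JOIN lets it through)
```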
You have to use aliases to get this working:
SELECT
doc.*
FROM
Docs doc
WHERE
doc.DocNo = 1000
AND EXISTS (
SELECT
*
FROM
DocAccess acc
LEFT OUTER JOIN
FileUsers usr
ON
usr.UserType = acc.UserTypeWithAccess
AND usr.FileNo = doc.FileNo
WHERE
acc.UserNo = 2000
)
This also makes it more clear which table each field belongs to (think about using the same table twice or more in the same query with different aliases).
If you would only like to limit the output to one row you can use TOP 1:
SELECT TOP 1
doc.*
FROM
Docs doc
INNER JOIN
FileUsers usr
ON
usr.FileNo = doc.FileNo
INNER JOIN
DocAccess acc
ON
acc.UserTypeWithAccess = usr.UserType
WHERE
doc.DocNo = 1000
AND acc.UserNo = 2000
Of course, the second query works a bit differently from the first one (both JOINs are INNER). Depending on your data model, you might even leave TOP 1 out of that query.

Selecting the very last record of the returned table

I'm from an Access background with a little MySQL, so I'm slightly lost when it comes to SQL.
Here is the query I am using:
Select
tbl_AcerPFSSurveyIVR.NTlogin,
tbl_AcerPFSSurveyIVR.Customer_Firstname,
tbl_AcerPFSSurveyIVR.Customer_Lastname,
tbl_AcerPFSSurveyIVR.CaseId,
tbl_AcerPFSSurveyIVR.ContactNumber,
CRM_TRN_ORDER.ORDER_PRICE,
CRM_TRN_ORDER.ORDER_CREATEDDATE
This returns the proper records, but I only want the very last one. I know I should use something like this:
SELECT TOP 1 *
FROM table_Name
ORDER BY unique_column DESC
Where I get lost is that, if I'm correct, you can only have one SELECT, so how do I integrate the two? Thanks in advance for your help.
What you are wanting is something like:
SELECT TOP(1)
tbl_AcerPFSSurveyIVR.NTlogin,
tbl_AcerPFSSurveyIVR.Customer_Firstname,
tbl_AcerPFSSurveyIVR.Customer_Lastname,
tbl_AcerPFSSurveyIVR.CaseId,
tbl_AcerPFSSurveyIVR.ContactNumber,
CRM_TRN_ORDER.ORDER_PRICE,
CRM_TRN_ORDER.ORDER_CREATEDDATE
FROM
tbl_AcerPFSSurveyIVR
JOIN CRM_TRN_ORDER
ON tbl_AcerPFSSurveyIVR.CustomerId = CRM_TRN_ORDER.CUSTOMERID
ORDER BY
CRM_TRN_ORDER.ORDER_CREATEDDATE DESC
Note: I made up the JOIN clause because I don't know your schema. You should pick real columns that satisfy the join, assuming there is a foreign key relationship of some kind. Without a join condition you would simply be taking a Cartesian product, which is most likely NOT what you want; if it is, you could replace the FROM ... JOIN clauses above with "FROM tbl_AcerPFSSurveyIVR, CRM_TRN_ORDER".
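A runnable sketch of the same idea with Python's sqlite3. SQLite has no TOP, so ORDER BY ... DESC LIMIT 1 plays the role of TOP(1) here. As in the answer, the CustomerId/CUSTOMERID join columns are hypothetical, the rows are invented, and the dates are stored as ISO-8601 text so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_AcerPFSSurveyIVR (CustomerId INTEGER, NTlogin TEXT, CaseId INTEGER);
CREATE TABLE CRM_TRN_ORDER (CUSTOMERID INTEGER, ORDER_PRICE REAL,
                            ORDER_CREATEDDATE TEXT);
INSERT INTO tbl_AcerPFSSurveyIVR VALUES (1, 'jdoe', 501), (2, 'asmith', 502);
INSERT INTO CRM_TRN_ORDER VALUES
    (1, 19.99, '2015-03-01 09:00:00'),
    (2, 49.50, '2015-03-04 16:30:00'),
    (1, 5.00,  '2015-02-11 12:00:00');
""")

# Join first, sort the joined rows newest-first, keep only the first one.
latest = conn.execute("""
SELECT s.NTlogin, s.CaseId, o.ORDER_PRICE, o.ORDER_CREATEDDATE
FROM tbl_AcerPFSSurveyIVR s
JOIN CRM_TRN_ORDER o ON s.CustomerId = o.CUSTOMERID
ORDER BY o.ORDER_CREATEDDATE DESC
LIMIT 1
""").fetchone()
print(latest)  # ('asmith', 502, 49.5, '2015-03-04 16:30:00')
```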
Have you tried :
Select TOP (1)
tbl_AcerPFSSurveyIVR.NTlogin,
tbl_AcerPFSSurveyIVR.Customer_Firstname,
tbl_AcerPFSSurveyIVR.Customer_Lastname,
tbl_AcerPFSSurveyIVR.CaseId,
tbl_AcerPFSSurveyIVR.ContactNumber,
CRM_TRN_ORDER.ORDER_PRICE,
CRM_TRN_ORDER.ORDER_CREATEDDATE
FROM Table_Name ORDER BY unique_Column DESC
This adds TOP 1 to your previous query and sorts the results in descending order at the same time. I assumed that the first SELECT was from table_name.

T-SQL SELECT TOP returns duplicates

I'm using SQL Server 2008 R2.
I'm not sure if I've discovered a strange SQL quirk, or (more likely) something in my code is causing this strange behaviour, particularly as Google has turned up nothing. I have a view called vwResponsible_Office_Address.
SELECT * FROM vwResponsible_Office_Address
..returns 403 rows
This code:
SELECT TOP 1000 * FROM vwResponsible_Office_Address
..returns 409 rows, as it includes 6 duplicates.
However this:
SELECT TOP 1000 * FROM vwResponsible_Office_Address
ORDER BY ID
..returns 403 rows again.
I can post the code for the view if it's relevant, but does it make sense for SELECT TOP to ever work in this way? I understand that SELECT TOP is free to return records in any order but don't understand why the number of records returned should vary.
The view does use CROSS APPLY, which might be affecting the result set somehow?
EDIT: View definition as requested
CREATE VIEW [dbo].[vwResponsible_Office_Address]
AS
SELECT fp.Entity_ID [Reg_Office_Entity_ID],
fp.Entity_Name [Reg_Office_Entity_Name],
addr.Address_ID
FROM [dbo].[Entity_Relationship] er
INNER JOIN [dbo].[Entity] fp
ON er.[Related_Entity_ID] = fp.[Entity_ID]
INNER JOIN [dbo].[Entity_Address] ea
ON ea.[Entity_ID] = fp.[Entity_ID]
CROSS APPLY (
SELECT TOP 1 Address_ID
FROM [dbo].[vwEntity_Address] vea
WHERE [vea].[Entity_ID] = fp.Entity_ID
ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC
) addr
WHERE [Entity_Relationship_Type_ID] = 25 -- fee payment relationship
UNION
SELECT ets.[Entity_ID],
ets.[Entity_Name],
addr.[Address_ID]
FROM dbo.[vwEntity_Entitlement_Status] ets
INNER JOIN dbo.[Entity_Address] ea
ON ea.[Entity_ID] = ets.[Entity_ID]
CROSS APPLY (
SELECT TOP 1 [Address_ID]
FROM [dbo].[vwEntity_Address] vea
WHERE vea.[Entity_ID] = ets.[Entity_ID]
ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC
) addr
WHERE ets.[Entitlement_Type_ID] = 40 -- registered office
AND ets.[Entitlement_Status_ID] = 11 -- active
I would assume that there is some nondeterminism going on, which means that different access methods can return different results.
Looking at the view definition, the only place where that seems likely is if vwEntity_Address has more than one row per Entity_ID.
That would make the TOP 1 Address_ID returned arbitrary in that case, which will affect the result of the UNION operation when it removes duplicates.
This definitely looks extremely suspect:
SELECT TOP 1 [Address_ID]
FROM [dbo].[vwEntity_Address] vea
WHERE vea.[Entity_ID] = ets.[Entity_ID]
ORDER by ea.[Address_Type_ID] ASC, ea.[Address_ID] DESC
You are ordering by values from the outer query in the cross apply. This will have absolutely no effect whatsoever as these will be constant for a particular CROSS APPLY invocation.
Can you try changing it to:
SELECT TOP 1 [Address_ID]
FROM [dbo].[vwEntity_Address] vea
WHERE vea.[Entity_ID] = ets.[Entity_ID]
ORDER by vea.[Address_ID] DESC
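A small sketch of why ordering by the inner alias matters, using Python's sqlite3. SQLite has no CROSS APPLY, so a correlated scalar subquery with LIMIT 1 stands in for TOP 1 (unlike CROSS APPLY, it returns NULL instead of dropping an entity with no address). The tables and data are invented stand-ins for Entity and vwEntity_Address. Because the ORDER BY uses vea's own column, the picked address depends only on vea's rows, provided Address_ID is unique per entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entity (Entity_ID INTEGER PRIMARY KEY, Entity_Name TEXT);
-- Stand-in for vwEntity_Address: entity 1 has several addresses.
CREATE TABLE vwEntity_Address (Entity_ID INTEGER, Address_ID INTEGER);
INSERT INTO Entity VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO vwEntity_Address VALUES (1, 301), (1, 305), (1, 302), (2, 400);
""")

# TOP 1 equivalent ordered by the inner table's column, not the outer query's.
sql = """
SELECT e.Entity_ID, e.Entity_Name,
       (SELECT vea.Address_ID
        FROM vwEntity_Address vea
        WHERE vea.Entity_ID = e.Entity_ID
        ORDER BY vea.Address_ID DESC
        LIMIT 1) AS Address_ID
FROM Entity e
ORDER BY e.Entity_ID
"""
result = conn.execute(sql).fetchall()
print(result)  # [(1, 'Acme', 305), (2, 'Globex', 400)]
```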
I was wondering whether your view included a function until I got to the end, where you say you use CROSS APPLY. I would assume that is your problem; if you're interested in the details, take a look at the various query plans.
EDIT: Expansion of answer
That is, your function is non-deterministic and can either return more than one row per input or return the same row for different inputs. In combination, this means you'll get exactly what you are seeing: duplicate rows under some circumstances. Adding a DISTINCT to your view is the costly way to solve the problem; a better way is to change your function so that any input produces only one output row, and each output row is produced by only one input.
EDIT: I hadn't seen that you've now included your view definition.
Your problem is definitely the cross apply, in particular you are sorting inside the cross apply by values from OUTSIDE of the cross apply, making the top 1 effectively random.