SQL ROW_NUMBER with INNER JOIN - sql

I need to use ROW_NUMBER() in the following Query to return rows 5 to 10 of the result. Can anyone please show me what I need to do? I've been trying to no avail. If anyone can help I'd really appreciate it.
SELECT *
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity

You need to stick it in a table expression to filter on ROW_NUMBER. You won't be able to use * as it will complain about the column name starRating appearing more than once so will need to list out the required columns explicitly. This is better practice anyway.
WITH CTE AS
(
SELECT /*TODO: List column names*/
ROW_NUMBER()
OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity) AS RN
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT /*TODO: List column names*/
FROM CTE
WHERE RN BETWEEN 5 AND 10
ORDER BY RN

You can use a with clause. Please try the following
WITH t AS
(
SELECT villa_data.starRating,
villa_data.capacity,
villa_data.bedrooms,
villa_prices.period,
villa_prices.price,
ROW_NUMBER() OVER (ORDER BY villa_prices.price,
villa_data.bedrooms,
villa_data.capacity ) AS 'RowNumber'
FROM villa_data
INNER JOIN villa_prices
ON villa_prices.starRating = villa_data.starRating
WHERE villa_data.capacity >= 3
AND villa_data.bedrooms >= 1
AND villa_prices.period = 'lowSeason'
)
SELECT *
FROM t
WHERE RowNumber BETWEEN 5 AND 10;

Related

Row_number not showing 1 rank

I am using the below code to get the rank but not able to see the 1st rank. I am getting the results 2 and 3.
Select BPMDate,Tenor,TenorStartDays,TenorEndDays,NominalTransacted,
SumofNominalRate,AverageRate,VWAR,rw,SortOrder
FROM
(
SELECT case_id,created_date BPMDate,tenor Tenor,tenor_start_days TenorStartDays,
tenor_end_days TenorEndDays
,nominal_transacted NominalTransacted,sum_of_nominal_rates SumofNominalRate
,average_rate AverageRate,vmar VWAR
,Case when tenor='O/N' Then 1 when tenor='1W' Then 2 when tenor='1M' Then 3
when tenor='3M' Then 4
when tenor='6M' Then 5 when tenor='1Y' Then 6 Else 7 End SortOrder,
row_number() over ( partition by tenor order by created_date desc ) rw
From table1
Where df_type = 'DF1' and to_date(created_date) >= '2020-10-13' and
to_date(created_date) <= '2020-10-13'
) ard
inner join
(
select case_id,case_status from table2
where case_status='Completed') ad
on ard.case_id=ad.case_id
order by sortorder
The column rw is beginning from 2 instead 1
thanks for the support
I think you are filtering it out when you join to ad , If you just select the sub query I think you will be able to see it.

How to select a single row for each unique ID

SQL novice here learning on the job, still a greenhorn. I have a problem I don't know how to overcome. Using IBM Netezza and Aginity Workbench.
My current output will try to return one row per case number based on when a task was created. It will only keep the row with the newest task. This gets me about 85% of the way there. The issue is that sometimes multiple tasks have a create day of the same day.
I would like to incorporate Task Followup Date to only keep the newest row if there are multiple rows with the same Case Number. I posted an example of what my current code outputs and what i would like it to output.
Current code
SELECT
A.PS_CASE_ID AS Case_Number
,D.CASE_TASK_TYPE_NM AS Task
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT AS Task_Followup_Date
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON (A.CASE_ID = C.CASE_ID)
INNER JOIN VW_CASE_TASK_TYPE D ON (C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID)
INNER JOIN ADMIN.VW_RSN_CTGY B ON (A.RSN_CTGY_ID = B.RSN_CTGY_ID)
WHERE
(A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT')
AND CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01'
AND B.RSN_CTGY_NM = 'Chargeback Initiation'
AND CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE)) from VW_CASE_TASK C2 WHERE C2.CASE_ID = C.CASE_ID)
GROUP BY
A.PS_CASE_ID
,D.CASE_TASK_TYPE_NM
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT
Current output
Desired output
You could use ROW_NUMBER here:
WITH cte AS (
SELECT DISTINCT A.PS_CASE_ID AS Case_Number, D.CASE_TASK_TYPE_NM AS Task,
C.TASK_CRTE_TMS, C.TASK_FLWUP_DT AS Task_Followup_Date,
ROW_NUMBER() OVER (PARTITION BY A.PS_CASE_ID ORDER BY C.TASK_FLWUP_DT DESC) rn
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON A.CASE_ID = C.CASE_ID
INNER JOIN VW_CASE_TASK_TYPE D ON C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID
INNER JOIN ADMIN.VW_RSN_CTGY B ON A.RSN_CTGY_ID = B.RSN_CTGY_ID
WHERE (A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT') AND
CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01' AND
B.RSN_CTGY_NM = 'Chargeback Initiation' AND
CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE))
FROM VW_CASE_TASK C2
WHERE C2.CASE_ID = C.CASE_ID)
)
SELECT
Case_Number,
Task,
TASK_CRTE_TMS,
Task_Followup_Date
FROM cte
WHERE rn = 1;
One method used window functions:
with cte as (
< your query here >
)
select x.*
from (select cte.*,
row_number() over (partition by case_number, Task_Followup_Date
order by TASK_CRTE_TMS asc
) as seqnum
from cte
) x
where seqnum = 1;

SQL ROW_NUMBER OVER syntax

I have this:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured
on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND RowNumber = 1
ORDER BY homepagefeatured.orderby
It fails on "Invalid column name RowNumber"
Not sure how to prefix it to access it?
Thanks
You can't reference the field like that. You can however use a subquery or a common-table-expression:
Here's a subquery:
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
) T
WHERE RowNumber = 1
ORDER BY orderby
Rereading your query, since you aren't partitioning by any fields, the order by at the end is useless (it contradicts the order of the ranking function). You're probably better off using top 1...
Using top:
SELECT top 1 vwmain.vehicleref,vwmain.capid,
vwmain.manufacturer,vwmain.model,vwmain.derivative,
vwmain.isspecial,
vwmain.created,vwmain.updated,vwmain.stocklevel,
vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa,
homepagefeatured.orderby
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
ORDER BY homepagefeatured.orderby
(EDIT: This answer is a nearby pitfall. I leave it for documentation.)
Have a look here: Referring to a Column Alias in a WHERE Clause
This is the same situation.
It is a matter of how the sql query is parsed/compiled internally, so your field alias names are not known at the time the where clause is interpreted. Therefore you might try, in reference to the example above:
SELECT ROW_NUMBER() OVER (ORDER BY vwmain.ch) as RowNumber,
vwmain.vehicleref,vwmain.capid, vwmain.manufacturer,vwmain.model,vwmain.derivative, vwmain.isspecial,vwmain.created,vwmain.updated,vwmain.stocklevel, vwmain.[type],
vwmain.ch,vwmain.co2,vwmain.mpg,vwmain.term,vwmain.milespa
FROM vwMain_LATEST vwmain
INNER JOIN HomepageFeatured on vwMain.vehicleref = homepageFeatured.vehicleref
WHERE homepagefeatured.siteskinid = 1
AND homepagefeatured.Rotator = 1
AND ROW_NUMBER() OVER (ORDER BY vwmain.ch) = 1
ORDER BY homepagefeatured.orderby
Thus you see your expression in the select statement is exactly reused in the where clause.

How do I combine subquery rows into one column in Oracle?

I'm working with FileNet. I'm trying to get the folders that a document may be filed in to appear in one column of the record set delimited with semicolons. This was the layout previously decided on and I am tasked with making Oracle do it. Here's what I have for a query so far:
SELECT d1.F_DOCNUMBER,
d1.F_DOCCLASSNUMBER,
d1.F_ENTRYDATE,
d1.F_ARCHIVEDATE,
d1.F_RETENTBASE,
d1.F_RETENTDISP,
d1.F_RETENTOFFSET,
d1.F_PAGES,
d1.F_DOCTYPE,
d1.F_DOCFORMAT,
d1.A32 AS CERT_NUM,
d1.A35 AS DOC_TYPE,
d1.A36 AS BATCH_KEY,
d1.A37 AS FIELD_REP_CODE,
d1.A38 AS EFFECTIVE_DATE,
d1.A39 AS VOUCH_NUM_HIGH,
d1.A40 AS VOUCH_NUM_LOW,
f1.Folders
FROM doctaba d1
LEFT JOIN (SELECT SUBSTR (SYS_CONNECT_BY_PATH (F_FOLDERNAME , ';'), 2) Folders
FROM (SELECT fc2.F_DOCNUMBER, f2.F_FOLDERNAME, ROW_NUMBER () OVER (ORDER BY f2.F_FOLDERNAME) rn, COUNT (*) OVER () cnt
FROM folder_contents fc2
INNER JOIN folder f2
ON f2.F_FOLDERNUMBER = fc2.F_FOLDERNUMBER
WHERE fc2.F_DOCNUMBER = d1.F_DOCNUMBER)
WHERE rn = cnt
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1) f1
ON d1.F_DOCNUMBER = f1.F_DOCNUMBER
WHERE d1.F_DOCTYPE IS NULL
AND d1.F_DOCNUMBER >= 107777
AND d1.F_DOCNUMBER <= 305791
ORDER BY d1.F_DOCNUMBER;
The problem is that d1.F_DOCNUMBER is being marked as an invalid identifier. I read on some forums that Oracle may not let that column identifier work multiple query levels down. Anyone have some suggestions on how to make this work? Thanks!
EDIT:
Here's my original query that just includes the folder values in rows.
SELECT doctaba.F_DOCNUMBER,
doctaba.F_DOCCLASSNUMBER,
doctaba.F_ENTRYDATE,
doctaba.F_ARCHIVEDATE,
doctaba.F_RETENTBASE,
doctaba.F_RETENTDISP,
doctaba.F_RETENTOFFSET,
doctaba.F_PAGES,
doctaba.F_DOCTYPE,
doctaba.F_DOCFORMAT,
doctaba.A32 AS CERT_NUM,
doctaba.A35 AS DOC_TYPE,
doctaba.A36 AS BATCH_KEY,
doctaba.A37 AS FIELD_REP_CODE,
doctaba.A38 AS EFFECTIVE_DATE,
doctaba.A39 AS VOUCH_NUM_HIGH,
doctaba.A40 AS VOUCH_NUM_LOW,
folder.F_FOLDERNAME
FROM doctaba
LEFT JOIN folder_contents
ON doctaba.F_DOCNUMBER = folder_contents.F_DOCNUMBER
INNER JOIN folder
ON folder.F_FOLDERNUMBER = folder_contents.F_FOLDERNUMBER
WHERE doctaba.F_DOCTYPE IS NULL
AND doctaba.F_DOCNUMBER >= 107777
AND doctaba.F_DOCNUMBER <= 17208174
ORDER BY doctaba.F_DOCNUMBER;
In this case, you are lucky. You are only getting one value from the subquery, so you can just make it a correlated subquery in the select clause:
SELECT . . .
(SELECT SUBSTR(SYS_CONNECT_BY_PATH (F_FOLDERNAME , ';'), 2) as Folders
FROM (SELECT fc2.F_DOCNUMBER, f2.F_FOLDERNAME,
ROW_NUMBER () OVER (ORDER BY f2.F_FOLDERNAME) rn,
COUNT (*) OVER () cnt
FROM folder_contents fc2 INNER JOIN
folder f2
ON f2.F_FOLDERNUMBER = fc2.F_FOLDERNUMBER
WHERE fc2.F_DOCNUMBER = d1.F_DOCNUMBER
)
WHERE rn = cnt
START WITH rn = 1
CONNECT BY rn = PRIOR rn + 1
) as Folders
FROM doctaba d1
WHERE d1.F_DOCTYPE IS NULL AND
d1.F_DOCNUMBER >= 107777 AND
d1.F_DOCNUMBER <= 305791
ORDER BY d1.F_DOCNUMBER;

Fastest way to check if the the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patient where the most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have fixed this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
FROM T_Measurements M WHERE (M.code ='xxxx' AND result= 'xx')
AND datemeasurement =
(SELECT MAX(datemeasurement) FROM T_Measurements
WHERE datemeasurement > '2012-01-04' AND patid = M.patid
GROUP BY patid
GROUP by patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works just fine, but it makes the query incredibly slow because it has to join the outer table on the subquery (if you know what I mean). The query takes 10 seconds without the most recent check, and 3 minutes with the most recent check.
I'm pretty sure this can be done a lot more efficient, so please enlighten me if you will :).
I tried implementing HAVING datemeasurment=MAX(datemeasurement) but that keeps throwing errors at me.
So my approach would be to write a query just getting all the last patient results since 01-04-2012, and then filtering that for your codes and results. So something like
select
count(1)
from
T_Measurements M
inner join (
SELECT PATID, MAX(datemeasurement) as lastMeasuredDate from
T_Measurements M
where datemeasurement > '01-04-2012'
group by patID
) lastMeasurements
on lastMeasurements.lastmeasuredDate = M.datemeasurement
and lastMeasurements.PatID = M.PatID
where
M.Code = 'Xxxx' and M.result = 'XX'
The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
Row_number() enumerates the records for each patient, so the most recent gets a value of 1. The result is just a selection.