SQL Server HAVING vs JOIN - performance issues

I have 2 queries that are almost identical. One executes in just a couple of minutes; the other either times out or takes over 20 minutes. I researched but didn't find a comparison of INNER JOIN vs HAVING online (only WHERE vs INNER JOIN, or WHERE vs HAVING). I would like to understand why one works a lot faster than the other. The 1st query is the one that works OK; the 2nd is the one that does not:
SELECT IW.OrderCategory
, IW.ParkName
, IW.TurbineNumber
, IW.UserStatus
, IW.MaintenancePlan
, Chk_List.Checklist_Description
, Chk_List.DateTimeStamp
, Chk_List.Stamp
, Chk_List.Man
, Chk_List.type
, Chk_List.Joint
, Chk_List.ToolPos
, Chk_List.Tech
FROM ServiceWorkOrders.EDW.SAP_ServiceOrders_IW73 AS IW
INNER JOIN dbo.vwSQL_Checklist_data_w_item_details AS Chk_List ON IW.TurbineNumber = Chk_List.Turbine_Num
GROUP BY IW.OrderCategory
, IW.ParkName
, IW.TurbineNumber
, IW.UserStatus
, IW.MaintenancePlan
, Chk_List.Checklist_Description
, Chk_List.DateTimeStamp
, Chk_List.Stamp
, Chk_List.Man
, Chk_List.type
, Chk_List.Joint
, Chk_List.ToolPos
, Chk_List.Tech
HAVING(IW.ActualStartDate = MAX(Chk_List.DateStamp));
SELECT IW.OrderCategory
, IW.ParkName
, IW.TurbineNumber
, IW.UserStatus
, IW.MaintenancePlan
, Chk_List.Checklist_Description
, Chk_List.DateTimeStamp
, Chk_List.Stamp
, Chk_List.Man
, Chk_List.type
, Chk_List.Joint
, Chk_List.ToolPos
, Chk_List.Tech
FROM ServiceWorkOrders.EDW.SAP_ServiceOrders_IW73 AS IW
INNER JOIN dbo.vwSQL_Checklist_data_w_item_details AS Chk_List ON IW.TurbineNumber = Chk_List.Turbine_Num
AND IW.ActualStartDate = Chk_List.DateStamp
GROUP BY IW.OrderCategory
, IW.ParkName
, IW.TurbineNumber
, IW.UserStatus
, IW.MaintenancePlan
, Chk_List.Checklist_Description
, Chk_List.DateTimeStamp
, Chk_List.Stamp
, Chk_List.Man
, Chk_List.type
, Chk_List.Joint
, Chk_List.ToolPos
, Chk_List.Tech;
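Two things are worth noting when comparing these. First, in both queries the GROUP BY covers every selected column and the SELECT list contains no aggregates, so the grouping acts purely as a DISTINCT. Second, the first query can only evaluate its filter in HAVING, after the join and the grouping, because it compares against MAX(Chk_List.DateStamp); the second filters at the join itself. The real answer to "why" lives in the two execution plans, which are worth comparing side by side. Independently of that, a rewrite sometimes worth trying (a sketch only, assuming the intent is to keep rows where ActualStartDate equals the latest checklist DateStamp for the turbine; not guaranteed to be equivalent to either query above) is to pre-aggregate the view in a derived table so the MAX is computed once per turbine before the join:
SELECT DISTINCT IW.OrderCategory
, IW.ParkName
, IW.TurbineNumber
, IW.UserStatus
, IW.MaintenancePlan
, Chk_List.Checklist_Description
, Chk_List.DateTimeStamp
, Chk_List.Stamp
, Chk_List.Man
, Chk_List.type
, Chk_List.Joint
, Chk_List.ToolPos
, Chk_List.Tech
FROM ServiceWorkOrders.EDW.SAP_ServiceOrders_IW73 AS IW
INNER JOIN (
    -- one row per turbine: its most recent checklist stamp
    SELECT Turbine_Num, MAX(DateStamp) AS MaxDateStamp
    FROM dbo.vwSQL_Checklist_data_w_item_details
    GROUP BY Turbine_Num
) AS Latest
ON Latest.Turbine_Num = IW.TurbineNumber
AND Latest.MaxDateStamp = IW.ActualStartDate
INNER JOIN dbo.vwSQL_Checklist_data_w_item_details AS Chk_List
ON Chk_List.Turbine_Num = IW.TurbineNumber;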

Related

Total customers per reporting date without union

I would like to run a report that shows the total number of customers per reporting date. Here is how I need the data to look:
My original dataset looks like this (please see the query). In order to calculate the number of customers I need to use the start and end dates: if Start_Date > reporting_date and End_Date <= reporting_date, then count it as a customer.
I was able to develop a script, but it gives me the total number of customers for only one reporting date.
select '2022-10-31' reporting_date, count(case when Start_Date>'2022-10-31' and End_Date<='2022-10-31' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
Is there a way to amend the code with a cross join or another workaround to get the total customers per reporting date without doing many unions?
select '2022-10-31' reporting_date, count(case when Start_Date>'2022-10-31' and End_Date<='2022-10-31' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
UNION ALL
select '2022-9-30' reporting_date, count(case when Start_Date>'2022-9-30' and End_Date<='2022-9-30' then Customer_ID end)
from (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','US','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
It is possible to provide date ranges as a separate table/subquery, join to the actual data and perform grouping:
select s.start_d, s.end_d, COUNT(Customer_ID) AS total
FROM (SELECT '2022-10-31'::DATE, '2022-10-31'::DATE
UNION SELECT '2022-09-30', '2022-09-30')
AS s(start_d, end_d)
LEFT JOIN (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','2021-10-31','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date ,End_Date ,Customer_ID)
ON a.Start_Date>s.start_d and a.End_Date<=s.end_d
GROUP BY s.start_d, s.end_d;
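Since the question explicitly asks about a cross join: conditional aggregation over a CROSS JOIN is a close variant of the same idea. A minimal sketch, reusing the sample rows (with the 'US' end date corrected) and casting the strings to DATE so that single-digit months compare correctly:
select s.reporting_date
, count(case when cast(a.Start_Date as date) > cast(s.reporting_date as date)
              and cast(a.End_Date as date) <= cast(s.reporting_date as date)
        then a.Customer_ID end) as total
from (values ('2022-10-31'), ('2022-9-30')) s(reporting_date)
cross join (values ('2022-10-14','2022-8-19','0010Y654012P6KuQAK')
, ('2022-3-15','2022-9-14','0011v65402PoSpVAAV')
, ('2021-1-11','2022-10-11','0010Y654012P6DuQAK')
, ('2022-12-1','2022-5-14','0011v65402u7muLAAQ')
, ('2021-1-30','2022-3-14','0010Y654012P6DuQAK')
, ('2022-10-31','2022-2-14','0010Y654012P6PJQA0')
, ('2021-10-31','2021-10-31','0010Y654012P6PJQA0')
, ('2021-5-31','2022-5-14','0011v65402x8cjqAAA')
, ('2022-6-2','2022-1-13','0010Y654016OqkJQAS')
, ('2022-1-1','2022-11-11','0010Y654016OqIaQAK')
) a(Start_Date, End_Date, Customer_ID)
group by s.reporting_date;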

How to look back 7 days in a Hive query

I've got a SQL query where I need to constantly look back a fixed number of days. This code would run once a week, so I need to look back 7 days. My WHERE clause is currently stationary and looks between two hard-coded dates; however, I need it to look back 7 days all the time.
Here is a snippet of my code:
WITH gps_traces AS(
SELECT
gtrips.trip_id
, to_date(gtrips.trip_date) as trip_date
, gtrips.fleet_id
, vin.vehicle_vin
, gtrips.driver_id
, gtrips.trip_distance_travelled
, gtrips.trip_duration
, to_timestamp(gdata.trip_timestamp, "yyyy-MM-dd'T'HH:mm:ss") as gps_timestamp
, rank() over
(partition by gtrips.trip_id
order by to_timestamp(gdata.trip_timestamp, "yyyy-MM-dd'T'HH:mm:ss") asc)
as timestamp_rank
, gdata.latitude
, gdata.longitude
, gdata.postcode
FROM
cms.gps_trips gtrips
INNER JOIN
cms.gps_data gdata
ON gtrips.trip_id = gdata.trip_id
INNER JOIN
(
SELECT
DISTINCT --why are there duplicates?
devices.vehicle_id
, devices.vehicle_vin
, devices.data_effective_timestamp
FROM
cms.devices devices
INNER JOIN
(
SELECT
vehicle_id
, max(data_effective_timestamp) as data_effective_timestamp
FROM
cms.devices
GROUP BY
vehicle_id
) max_data_effective
ON devices.vehicle_id = max_data_effective.vehicle_id
AND devices.data_effective_timestamp = max_data_effective.data_effective_timestamp
) vin
ON gtrips.vehicle_id = vin.vehicle_id --assumed join key: the ON clause is missing from the snippet
WHERE
to_date(gtrips.trip_date) >= "2020-12-11" --Only keeping this date for now
AND
to_date(gtrips.trip_date) <= "2020-12-17"
AND
gtrips.fleet_id = 10211 --Only keeping one fleet for this example
)
SELECT
gps.trip_id
, gps.trip_date
, gps.fleet_id
, gps.vehicle_vin
, gps.driver_id
, gps.trip_distance_travelled
, gps.trip_duration
, gps.gps_timestamp
, gps.latitude
, gps.longitude
, gps.postcode
, gps1.gps_timestamp as next_timestamp
, gps1.latitude as next_latitude
, gps1.longitude as next_longitude
, ACOS(
SIN(RADIANS(gps.latitude))*SIN(RADIANS(gps1.latitude)) +
COS(RADIANS(gps.latitude))*COS(RADIANS(gps1.latitude))*COS(RADIANS(gps1.longitude) - RADIANS(gps.longitude))
)*3958.76 AS COSINES_DISTANCE
, ASIN(
SQRT(
POWER(SIN((RADIANS(gps.latitude) - RADIANS(gps1.latitude))/2), 2) +
COS(RADIANS(gps.latitude))*COS(RADIANS(gps1.latitude))*
POWER(SIN((RADIANS(gps.longitude) - RADIANS(gps1.longitude))/2), 2)
)
)*3958.76*2 AS HAVERSINE_DISTANCE
, (UNIX_TIMESTAMP(gps1.gps_timestamp) - UNIX_TIMESTAMP(gps.gps_timestamp)) AS GPS_INTERVAL
FROM
gps_traces gps
LEFT JOIN
gps_traces gps1
ON gps.trip_id = gps1.trip_id
AND gps.timestamp_rank = (gps1.timestamp_rank - 1)
ORDER BY
gps.fleet_id
, gps.trip_id
, gps.timestamp_rank
specifically, I'm needing to change this snippet here:
WHERE
to_date(gtrips.trip_date) >= "2020-12-11" --Needs to be rolling 7 days
AND
to_date(gtrips.trip_date) <= "2020-12-17"
I tried converting the date but it's falling over in Hive. Can someone assist with this?
You can use current_date:
WHERE
to_date(gtrips.trip_date) >= date_sub(current_date, 7) --7 days back
AND
to_date(gtrips.trip_date) <= current_date
Or pass the current date as a --hiveconf parameter:
WHERE
to_date(gtrips.trip_date) >= date_sub(to_date('${hiveconf:current_date}'), 7) --7 days back
AND
to_date(gtrips.trip_date) <= to_date('${hiveconf:current_date}')
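For the second variant, the variable would then be supplied on the command line when the script is launched; a sketch (the script name here is hypothetical):
hive --hiveconf current_date=2020-12-18 -f weekly_gps_report.hql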

BigQuery: similar queries, different output

I have 2 standard SQL queries in BigQuery. They are:
Query1:
select sfcase.case_id
, sfuser.user_id
, sfcase_create_date
, sfcase_status
, sfcase_origin
, sfcategory_category1
, sfcategory_category2
, sfcase_priority
, sftime_elapsedmin
, sftime_targetmin
, sfcase_sla_closemin
, if(count(sfcomment.parentid)=0,"0"
,if(count(sfcomment.parentid)=1,"1"
,if(count(sfcomment.parentid)=2,"2"
,"3"))) as comment_response
from(
select id as case_id
, timestamp_add(createddate, interval 7 hour) as sfcase_create_date
, status as sfcase_status
, origin as sfcase_origin
, priority as sfcase_priority
, case when status = 'Closed' then timestamp_diff(timestamp_add(closeddate, interval 7 hour),timestamp_add(createddate, interval 7 hour),minute)
end as sfcase_sla_closemin
, case_category__c
from `some_of_my_dataset.cs_case`
) sfcase
left join(
select upper(x1st_category__c) as sfcategory_category1
, upper(x2nd_category__c) as sfcategory_category2
, id
from `some_of_my_dataset.cs_case_category`
) sfcategory
on sfcategory.id = sfcase.case_category__c
left join(
select parentid as parentid
from `some_of_my_dataset.cs_case_comment`
) sfcomment
on sfcase.case_id = sfcomment.parentid
left join(
select ELAPSEDTIMEINMINS as sftime_elapsedmin
, TARGETRESPONSEINMINS as sftime_targetmin
, caseid
from `some_of_my_dataset.cs_case_milestone`
)sftime
on sfcase.case_id = sftime.caseid
left join(
select id as user_id
, createddate
from `some_of_my_dataset.cs_user`
)sfuser
on date(sfuser.createddate) = date(sfcase.sfcase_create_date)
group by 1
, 2
, 3
, 4
, 5
, 6
, 7
, 8
, 9
, 10
, 11
Query2:
select sfcase.id as case_id
, sfuser.id as user_id
, timestamp_add(sfcase.createddate, interval 7 hour) as sf_create_date
, sfcase.status as sf_status
, sfcase.origin as sf_origin
, upper(sfcategory.x1st_category__c) as sf_category1
, sfcategory.x2nd_category__c as sf_category2
, sfcase.priority as sf_priority
, sftime.ELAPSEDTIMEINMINS as sf_elapsedresponsemin
, sftime.TARGETRESPONSEINMINS as sf_targetresponsemin
, case when sfcase.status = 'Closed' then timestamp_diff(timestamp_add(sfcase.closeddate, interval 7 hour),timestamp_add(sfcase.createddate, interval 7 hour),minute)
end as sla_closemin
, if(count(sfcomment.parentid)=0,"0"
,if(count(sfcomment.parentid)=1,"1"
,if(count(sfcomment.parentid)=2,"2"
,"3"))) as comment_response
from `some_of_my_dataset.cs_case` as sfcase
left join `some_of_my_dataset.cs_case_category` as sfcategory
on sfcategory.id = sfcase.case_category__c
left join `some_of_my_dataset.cs_case_comment` as sfcomment
on sfcase.id = sfcomment.parentid
left join `some_of_my_dataset.cs_case_milestone` as sftime
on sfcase.id = sftime.caseid
left join `some_of_my_dataset.cs_user` as sfuser
on date(sfuser.createddate) = date(sfcase.createddate)
group by 1
, 2
, 3
, 4
, 5
, 6
, 7
, 8
, 9
, 10
, 11
I tried to run them at the same time. Query1 performs faster and returns fewer rows of data, while Query2 takes longer and returns more rows of data. Both Query1 and Query2 have 12 columns.
Why do they return different results?
Which query should I use?
Update: renamed my dataset
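Two differences are visible in the queries as written and could plausibly explain the different outputs (an observation from the queries above, not a tested diagnosis): Query1 joins cs_user on date(sfcase_create_date), which is createddate shifted by 7 hours, while Query2 joins on the unshifted date(sfcase.createddate); and Query1 groups on upper(x2nd_category__c) while Query2 groups on the raw column. A small BigQuery snippet showing how the 7-hour shift can move a timestamp onto the next calendar date:
-- Timestamps in the last 7 hours of a day land on the next date once shifted,
-- so the two join predicates can match different cs_user rows.
SELECT
  DATE(TIMESTAMP '2022-01-01 20:00:00') AS raw_date,      -- 2022-01-01
  DATE(TIMESTAMP_ADD(TIMESTAMP '2022-01-01 20:00:00',
                     INTERVAL 7 HOUR)) AS shifted_date;   -- 2022-01-02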

Why can grouping__id not be filtered with a HAVING statement when I use grouping sets (cube/rollup)?

Hive code as follows:
set mapred.reduce.tasks = 100;
create table order_dimensions_cube as
select
grouping__id as groupid,
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name ,
app_module ,
department ,
sum(COALESCE(complete_sum, 0)) as complete_price
from
data
group by
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name,
app_module ,
department
with cube having grouping__id >= 704;
This turns out to generate no records.
More info:
I checked that I have a lot of records in the table data.
I have tried this SQL without the HAVING statement and a lot of records are generated.
Why does this happen, and how do I solve it if I want to use HAVING to put some constraints on the result?
Thank you.
Since you did not provide actual data, please try the following:
select groupid, count(*) from
(select
grouping__id as groupid,
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name ,
app_module ,
department ,
sum(COALESCE(complete_sum, 0)) as complete_price
from
data
group by
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name,
app_module ,
department
with cube) A
group by groupid
and see how many records there are for each groupid. There could be some issues there.
Also, try changing the outer query to:
select * from
(select
grouping__id as groupid,
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name ,
app_module ,
department ,
sum(COALESCE(complete_sum, 0)) as complete_price
from
data
group by
user_level ,
city_level ,
region_name ,
province_name ,
city_name ,
platform ,
sale_type ,
item_first_cate_name,
app_module ,
department
with cube) A
where groupid >= 704
and see if the problem persists.
This is not a solution, but more of an attempt to understand what is going on.

SQL query help to generate data

Below is the query I created to get certain item numbers, quantities ordered, prices, and other fields from the database. The problem is that sometimes an order doesn't contain 20 item numbers but only 2. Now my question is whether it's possible to fill the empty slots with other item numbers picked at random from the DB. It doesn't need to be correct, because it's just for testing.
So can anybody help?
select
t.*,
-- THE THREE SUMVAT VALUES BELOW ARE VERY IMPORTANT. THEY ARE ONLY CORRECT HOWEVER WHEN THERE ARE NO NULL VALUES INVOLVED IN THE MATH,
-- I.E. WHEN THERE ARE 20 ITEMS/QTYS/PRICES INVOLVED WITH A CERTAIN ORDER_NO
((t.QTY1*t.PRICE1)+(t.QTY2*t.PRICE2)+(t.QTY3*t.PRICE3)+(t.QTY4*t.PRICE4)+(t.QTY5*t.PRICE5)) SUMVAT0, -- example: 5123.45 <- lines 1-5: Q*P
((t.QTY6*t.PRICE6)+(t.QTY7*t.PRICE7)+(t.QTY8*t.PRICE8)+(t.QTY9*t.PRICE9)+(t.QTY10*t.PRICE10)+(t.QTY11*t.PRICE11)+(t.QTY12*t.PRICE12)+(t.QTY13*t.PRICE13)+(t.QTY14*t.PRICE14)+(t.QTY15*t.PRICE15))
SUMVAT6, -- example: 1234.56 <- lines 6-15: Q*P
((t.QTY16*t.PRICE16)+(t.QTY17*t.PRICE17)+(t.QTY18*t.PRICE18)+(t.QTY19*t.PRICE19)+(t.QTY20*t.PRICE20)) SUMVAT19 -- example: 4567.89 <- lines 16-20: Q*P
from (
select
(to_char(p.vdate, 'YYYYMMDD') || to_char(sysdate, 'HH24MISS')) DT,
(to_char(p.vdate, 'YYYY-MM-DD') ||'T' || to_char(sysdate, 'HH24:MI:') || '00') DATETIME,
(to_char(orh.written_date, 'YYYY-MM-DD') ||'T00:00:00') DATETIME2,
orh.supplier FAKE_GLN,
y.*
from (
select
x.order_no ORDNO
, max(decode(r,1 ,x.item,null)) FAKE_GTIN1
, max(decode(r,2 ,x.item,null)) FAKE_GTIN2
, max(decode(r,3 ,x.item,null)) FAKE_GTIN3
, max(decode(r,4 ,x.item,null)) FAKE_GTIN4
, max(decode(r,5 ,x.item,null)) FAKE_GTIN5
, max(decode(r,6 ,x.item,null)) FAKE_GTIN6
, max(decode(r,7 ,x.item,null)) FAKE_GTIN7
, max(decode(r,8 ,x.item,null)) FAKE_GTIN8
, max(decode(r,9 ,x.item,null)) FAKE_GTIN9
, max(decode(r,10,x.item,null)) FAKE_GTIN10
, max(decode(r,11,x.item,null)) FAKE_GTIN11
, max(decode(r,12,x.item,null)) FAKE_GTIN12
, max(decode(r,13,x.item,null)) FAKE_GTIN13
, max(decode(r,14,x.item,null)) FAKE_GTIN14
, max(decode(r,15,x.item,null)) FAKE_GTIN15
, max(decode(r,16,x.item,null)) FAKE_GTIN16
, max(decode(r,17,x.item,null)) FAKE_GTIN17
, max(decode(r,18,x.item,null)) FAKE_GTIN18
, max(decode(r,19,x.item,null)) FAKE_GTIN19
, max(decode(r,20,x.item,null)) FAKE_GTIN20
, max(decode(r,1 ,x.qty_ordered,null)) QTY1
, max(decode(r,2 ,x.qty_ordered,null)) QTY2
, max(decode(r,3 ,x.qty_ordered,null)) QTY3
, max(decode(r,4 ,x.qty_ordered,null)) QTY4
, max(decode(r,5 ,x.qty_ordered,null)) QTY5
, max(decode(r,6 ,x.qty_ordered,null)) QTY6
, max(decode(r,7 ,x.qty_ordered,null)) QTY7
, max(decode(r,8 ,x.qty_ordered,null)) QTY8
, max(decode(r,9 ,x.qty_ordered,null)) QTY9
, max(decode(r,10,x.qty_ordered,null)) QTY10
, max(decode(r,11,x.qty_ordered,null)) QTY11
, max(decode(r,12,x.qty_ordered,null)) QTY12
, max(decode(r,13,x.qty_ordered,null)) QTY13
, max(decode(r,14,x.qty_ordered,null)) QTY14
, max(decode(r,15,x.qty_ordered,null)) QTY15
, max(decode(r,16,x.qty_ordered,null)) QTY16
, max(decode(r,17,x.qty_ordered,null)) QTY17
, max(decode(r,18,x.qty_ordered,null)) QTY18
, max(decode(r,19,x.qty_ordered,null)) QTY19
, max(decode(r,20,x.qty_ordered,null)) QTY20
, max(decode(r,1 ,x.unit_cost,null)) PRICE1
, max(decode(r,2 ,x.unit_cost,null)) PRICE2
, max(decode(r,3 ,x.unit_cost,null)) PRICE3
, max(decode(r,4 ,x.unit_cost,null)) PRICE4
, max(decode(r,5 ,x.unit_cost,null)) PRICE5
, max(decode(r,6 ,x.unit_cost,null)) PRICE6
, max(decode(r,7 ,x.unit_cost,null)) PRICE7
, max(decode(r,8 ,x.unit_cost,null)) PRICE8
, max(decode(r,9 ,x.unit_cost,null)) PRICE9
, max(decode(r,10,x.unit_cost,null)) PRICE10
, max(decode(r,11,x.unit_cost,null)) PRICE11
, max(decode(r,12,x.unit_cost,null)) PRICE12
, max(decode(r,13,x.unit_cost,null)) PRICE13
, max(decode(r,14,x.unit_cost,null)) PRICE14
, max(decode(r,15,x.unit_cost,null)) PRICE15
, max(decode(r,16,x.unit_cost,null)) PRICE16
, max(decode(r,17,x.unit_cost,null)) PRICE17
, max(decode(r,18,x.unit_cost,null)) PRICE18
, max(decode(r,19,x.unit_cost,null)) PRICE19
, max(decode(r,20,x.unit_cost,null)) PRICE20
from (
select
rank() over (partition by oh.order_no order by ol.item asc) r,
oh.supplier,
oh.order_no,
oh.written_date,
ol.item,
ol.qty_ordered,
ol.unit_cost
from
ordhead oh
JOIN ordloc ol ON oh.order_no = ol.order_no
where
-- count(numrows) = 1500
not unit_cost is null
-- and ol.order_no in (6181,6121)
) x
group by x.order_no
) y
JOIN ordhead orh ON orh.order_no = y.ORDNO,
period p
) t
;
Without being able to really test this, you might try something like this. Replace the inline view 'x' with this:
FROM (
WITH q AS (
SELECT LEVEL r, TO_CHAR(TRUNC(dbms_random.value*1000,0)) item
, TRUNC(dbms_random.value*100,0) qty_ordered
, TRUNC(dbms_random.value*10,2) unit_cost
FROM dual CONNECT BY LEVEL <= 20
)
SELECT COALESCE(x1.r, q.r) r, supplier, order_no, written_date
, COALESCE(x1.item, q.item) item
, COALESCE(x1.qty_ordered, q.qty_ordered) qty_ordered
, COALESCE(x1.unit_cost, q.unit_cost) unit_cost
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY oh.order_no ORDER BY ol.item ASC) r
, oh.supplier
, oh.order_no
, oh.written_date
, ol.item
, ol.qty_ordered
, ol.unit_cost
FROM ordhead oh JOIN ordloc ol ON oh.order_no = ol.order_no
WHERE NOT unit_cost IS NULL) x1 RIGHT JOIN q ON x1.r = q.r
) x
GROUP BY x.order_no
The WITH clause will give you a table with 20 rows of random data. Outer join that with your old 'x' data and you will be guaranteed 20 rows of data. You might not need to cast the item as a varchar2, depending on your data. (N.B. I finally found a query where it makes sense to use a RIGHT JOIN. See this SO question.)
I'm not quite sure what you're trying to do with the GROUP BY and MAX stuff. In the future it would be helpful to condense your examples into something others can easily test, a minimal case that gets your point across.
I also incorporated @Kevin's good suggestion to use ROW_NUMBER instead of RANK.
Very difficult to understand...
I think you might be OK if you put a 0 instead of NULL in the price values:
, max(decode(r,18,x.unit_cost,0)) PRICE18
and
, max(decode(r,20,x.qty_ordered,0)) QTY20
Then at least the math should work.
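Along the same lines, the outer SUMVAT math could guard against NULLs directly with NVL; a sketch of just the first sum, under the same schema assumptions as the original query:
( NVL(t.QTY1,0)*NVL(t.PRICE1,0)
+ NVL(t.QTY2,0)*NVL(t.PRICE2,0)
+ NVL(t.QTY3,0)*NVL(t.PRICE3,0)
+ NVL(t.QTY4,0)*NVL(t.PRICE4,0)
+ NVL(t.QTY5,0)*NVL(t.PRICE5,0) ) SUMVAT0 -- lines 1-5: Q*P, NULL-safe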
RANK will not guarantee a sequential count of the items in the groups; there may be gaps when you have several rows with the same value.
For a decent explanation see:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2920665938600
I think you need to use ROW_NUMBER.
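A quick way to see the difference on tied values (a self-contained Oracle example with made-up rows):
SELECT item
, RANK() OVER (ORDER BY item) AS rnk       -- 1, 2, 2, 4: a gap follows the tie
, ROW_NUMBER() OVER (ORDER BY item) AS rn  -- 1, 2, 3, 4: always sequential
FROM (SELECT 'A' item FROM dual UNION ALL
      SELECT 'B' FROM dual UNION ALL
      SELECT 'B' FROM dual UNION ALL
      SELECT 'C' FROM dual);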