Invalid column name using row_number() OVER(PARTITION BY ) - sql

I have two tables TBL_CONTACT and TBL_PHONE that I am trying to join.
The TBL_PHONE table contains duplicate rows for contacts with multiple phone numbers, so I am partitioning these and trying to select the first instance only.
Here is the code I have:
SELECT
(CONTACT.CONTACTID),
(CONTACT.FULLNAME),
(PHONE.contactid),
(PHONE.numberdisplay),
(row_number() OVER(PARTITION BY PHONE.Contactid ORDER BY PHONE.Contactid)) prn
FROM
"ACT2015Demo"."dbo"."TBL_CONTACT" AS CONTACT
INNER JOIN
TBL_PHONE AS PHONE
ON CONTACT.Contactid = PHONE.Contactid
WHERE
CAST(Editdate AS DATE) = CAST(GETDATE() AS DATE)
This gives the following:
CONTACTID FULLNAME CONTACTID NUMBERDISPLAY PRN
1001 Name1 1001 Tel1001 1 1
1001 Name1 1001 Tel1001 2 2
1002 Name2 1002 Tel1002 1 1
1003 Name3 1003 Tel1003 1 1
1003 Name3 1003 Tel1003 2 2
1003 Name3 1003 Tel1003 3 3
I then want to use the PRN column to limit the output to only rows with PRN = 1. I have tried the following, as this has worked for me in the past on less complex joins:
SELECT
(CONTACT.CONTACTID),
(CONTACT.FULLNAME),
(PHONE.contactid),
(PHONE.numberdisplay),
(row_number() OVER(PARTITION BY PHONE.Contactid ORDER BY PHONE.Contactid)) prn
FROM
"ACT2015Demo"."dbo"."TBL_CONTACT" AS CONTACT
INNER JOIN
TBL_PHONE AS PHONE
ON CONTACT.Contactid = PHONE.Contactid AND PRN = 1
WHERE
CAST(Editdate AS DATE) = CAST(GETDATE() AS DATE)
However, this gives me the invalid column name error for PRN? I have also tried using PRN = 1 as part of the WHERE with the same error.
How can I get PRN to work as a column name and limit the output?

Something that confuses many people about SQL is that column aliases defined in the select-list cannot be referenced in the FROM clause or WHERE clause of the same query.
This is confusing because the aliases appear to be defined early in the query (the select-list comes before the FROM clause). But the order of the syntax is not the order of execution: FROM and WHERE are evaluated before the select-list, so the alias does not exist yet when they run.
You can get around this by running your query as a derived table subquery:
SELECT *
FROM (
SELECT
CONTACT.CONTACTID,
CONTACT.FULLNAME,
PHONE.contactid,
PHONE.numberdisplay,
ROW_NUMBER() OVER(PARTITION BY PHONE.Contactid ORDER BY PHONE.Contactid) AS PRN
FROM
"ACT2015Demo"."dbo"."TBL_CONTACT" AS CONTACT
INNER JOIN
TBL_PHONE AS PHONE
ON CONTACT.Contactid = PHONE.Contactid
WHERE
CAST(Editdate AS DATE) = CAST(GETDATE() AS DATE)
) AS DerivedTable
WHERE PRN = 1
PS: MySQL does not support window functions in versions up to 5.7; they were added in MySQL 8.0. I think you may have tagged your question incorrectly.
You use the schema qualifier dbo which makes me think you are using Microsoft SQL Server or Sybase, is this correct?

This error also happens when the row_number() OVER(PARTITION BY ) clause has a composite set of key fields which appear in both tables in the join. The error will not go away no matter how the aliases are arranged or rearranged. To solve this, the identically named key columns from the joined table must be renamed in an inline derived table, as follows.
Change this:
INNER JOIN
TBL_PHONE AS PHONE
ON CONTACT.Contactid = PHONE.Contactid
... some other col_names
etc. ...
to this:
INNER JOIN (
SELECT
Contactid as PK_Contactid,
... some other col_names as PK_col_names,
etc. ...
FROM TBL_PHONE
) AS PHONE
ON CONTACT.Contactid = PHONE.PK_Contactid
... some other col_names
etc. ...
NB: don't prefix any col_names with table aliases in the row_number() OVER(PARTITION BY ) clause at all. Worked for me.

"I have tried the following, as this has worked for me in the past on less complex joins..." The way you tried will never work, even for simple joins. You have to use a derived table or a CTE:
Select
*
from
(
SELECT
(CONTACT.CONTACTID) as concontactid,
(CONTACT.FULLNAME),
(PHONE.contactid),
(PHONE.numberdisplay),
(row_number() OVER(PARTITION BY PHONE.Contactid ORDER BY PHONE.Contactid)) prn
FROM
"ACT2015Demo"."dbo"."TBL_CONTACT" AS CONTACT
INNER JOIN
TBL_PHONE AS PHONE
ON CONTACT.Contactid = PHONE.Contactid
WHERE
CAST(Editdate AS DATE) = CAST(GETDATE() AS DATE)
) as b
where prn=1
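The CTE form mentioned above could look roughly like the following. This is only a sketch run against SQLite through Python's sqlite3 module (the question targets SQL Server, where the WITH syntax is essentially the same). The lowercase table names, the phoneid column used as a deterministic ORDER BY, and the omission of the Editdate filter are all assumptions made for illustration.

```python
import sqlite3

# In-memory stand-ins for TBL_CONTACT and TBL_PHONE.
# phoneid is a hypothetical column, added only so ORDER BY is deterministic.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_contact (contactid INTEGER, fullname TEXT);
CREATE TABLE tbl_phone (phoneid INTEGER, contactid INTEGER, numberdisplay TEXT);
INSERT INTO tbl_contact VALUES (1001, 'Name1'), (1002, 'Name2'), (1003, 'Name3');
INSERT INTO tbl_phone VALUES
  (1, 1001, 'Tel1001 1'), (2, 1001, 'Tel1001 2'),
  (3, 1002, 'Tel1002 1'),
  (4, 1003, 'Tel1003 1'), (5, 1003, 'Tel1003 2'), (6, 1003, 'Tel1003 3');
""")

# The window function is computed inside the CTE; the alias prn only
# becomes a real column name for the outer query, where it can be filtered.
first_phone_sql = """
WITH numbered AS (
    SELECT c.contactid, c.fullname, p.numberdisplay,
           ROW_NUMBER() OVER (PARTITION BY p.contactid ORDER BY p.phoneid) AS prn
    FROM tbl_contact AS c
    INNER JOIN tbl_phone AS p ON c.contactid = p.contactid
)
SELECT contactid, fullname, numberdisplay
FROM numbered
WHERE prn = 1
ORDER BY contactid
"""
first_phones = conn.execute(first_phone_sql).fetchall()
for row in first_phones:
    print(row)
```

Ordering by the same column you partition by (as in the original query) leaves the choice of "first" row up to the engine, which is why the sketch orders by a separate column.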

Related

Query keeps giving me duplicate records. How can I fix this?

I wrote a query which uses 2 temp tables. And then joins them into 1. However, I am seeing duplicate records in the student visit temp table. (Query is below). How could this be modified to remove the duplicate records of the visit temp table?
with clientbridge as (Select *
from (Select visitorid, --Visid
roomnumber,
room_id,
profid,
student_id,
ambc.datekey,
RANK() over(PARTITION BY visitorid,student_id,profid ORDER BY ambc.datekey desc) as rn
from university.course_office_hour_bridge cohd
--where student_id = '9999999-aaaa-6634-bbbb-96fa18a9046e'
)
where rn = 1 --visitorid = '999999999999999999999999999999'---'1111111111111111111111111111111' --and pai.datekey is not null --- 00000000000000000000000000
),
-----------------Data Header Table
studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
select
*
from clientbridge cb inner join studentvisit sv on sv.visid_visitorid = cb.visitorid
I think you may have a better shot by joining the two datasets in the same query where you rank the data; otherwise the rank from the first query is ignored in the results of the second. Perhaps something like:
;with studentvisit as
(SELECT
--Visit key will allow us to track everything they did within that visit.
distinct visid_visitorid,
--calcualted_visitorid,
uniquevisitkey,
--channel, -- says the room they're in. Channel might not be reliable would need to see how that operates
--office_list, -- add 7 to exact
--user_college,
--first_office_hour_name,
--first_question_time_attended,
studentaccountid_5,
profid_officenumber_8,
studentvisitstarttime,
room_id_115,
--date_time,
qqq144, --Course Name
qqq145, -- Course Office Hour Benefit
qqq146, --Course Office Hour ID
datekey
FROM university.office_hour_details ohd
--left_join niversity.course_office_hour_bridge cohd on ohd.visid_visitorid
where DateKey >='2022-10-01' --between '2022-10-01' and '2022-10-27'
and (qqq146 <> '')
)
,clientbridge as (
Select
sv.*,
cohd.visitorid, --Visid
cohd.roomnumber,
cohd.room_id,
cohd.profid,
cohd.student_id,
cohd.datekey as bridge_datekey, -- renamed: sv.* already has a datekey column
RANK() over(PARTITION BY cohd.visitorid,cohd.student_id,cohd.profid ORDER BY cohd.datekey desc) as rn
from university.course_office_hour_bridge cohd
inner join studentvisit sv on sv.visid_visitorid = cohd.visitorid
)
select
*
from clientbridge WHERE rn=1
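One possible source of leftover duplicates, which the question does not confirm, is that RANK() gives every tied row the same rank, so several rows per partition can pass rn = 1. ROW_NUMBER() always numbers them 1, 2, 3. A small sketch of the difference, using a made-up bridge table in SQLite via Python's sqlite3 (table name, columns and rows are all invented for illustration):

```python
import sqlite3

# Hypothetical bridge data: visitor v1 has two rows with the same latest datekey.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bridge (visitorid TEXT, profid TEXT, datekey TEXT);
INSERT INTO bridge VALUES
  ('v1', 'p1', '2022-10-05'),
  ('v1', 'p1', '2022-10-05'),
  ('v1', 'p1', '2022-10-01'),
  ('v2', 'p9', '2022-10-03');
""")

def top_rows(func):
    """Return the rows numbered 1 by the given window function name."""
    sql = f"""
    SELECT visitorid, profid, datekey
    FROM (
        SELECT visitorid, profid, datekey,
               {func}() OVER (PARTITION BY visitorid, profid
                              ORDER BY datekey DESC) AS rn
        FROM bridge
    ) AS ranked
    WHERE rn = 1
    ORDER BY visitorid
    """
    return conn.execute(sql).fetchall()

rank_rows = top_rows("RANK")              # tied rows both get rank 1
row_number_rows = top_rows("ROW_NUMBER")  # exactly one row per partition
print(len(rank_rows), len(row_number_rows))
```

If the ties are genuine duplicates, swapping RANK() for ROW_NUMBER() is usually enough; if they differ in other columns, add a tie-breaker to the ORDER BY so the row that survives is predictable.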

Find Last Purchase Order For Each Part

I need to find the last P.O. for parts purchased from Vendors.
I was trying to come up with a way to do this using a query I found that allowed me to find
the max Creation date for a group of Quotes linked to an Opportunity:
SELECT
t1.[quoteid]
,t1.[OpportunityId]
,t1.[Name]
FROM
[Quote] t1
WHERE
t1.[CreatedOn] = (SELECT MAX(t2.[CreatedOn])
FROM [Quote] t2
WHERE t2.[OpportunityId] = t1.[OpportunityId])
In the case of Purchase Orders, though, I have a header table and a line item table.
So, I need to include info from both:
SELECT
PURCHASE_ORDER.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURCHASE_ORDER,
PURC_ORDER_LINE
WHERE
PURCHASE_ORDER.ID=
PURC_ORDER_LINE.PURC_ORDER_ID
If the ORDER_DATE from the header were available in the PURC_ORDER_LINE table I thought
this could be done like so:
SELECT
PURC_ORDER_LINE.ORDER_DATE
,PURC_ORDER_LINE.PURC_ORDER_ID
,PURC_ORDER_LINE.PART_ID
,PURC_ORDER_LINE.UNIT_PRICE
,PURC_ORDER_LINE.USER_ORDER_QTY
FROM
PURC_ORDER_LINE T1
WHERE T1.ORDER_DATE=(SELECT MAX(T2.ORDER_DATE)
FROM PURC_ORDER_LINE T2
WHERE T2.PURC_ORDER_ID=T1.PURC_ORDER_ID)
But I'm not sure that's correct and, in any case, there are 2 things:
The ORDER_DATE is in the Header table, not in the line table
I need the last P.O. created for each of the Parts (PART_ID)
So:
PART_A and PART_B, as an example, may appear on several P.O.s
Part     Order Date   P.O. #
-------------------------------
PART_A   2020-08-17   PO12345
PART_A   2020-11-21   PO23456
PART_A   2021-07-08   PO29986
PART_B   2019-11-30   PO00861
PART_B   2021-08-30   PO30001
The result set would be (including the other fields from above):
ORDER_DATE   PURC_ORDER_ID   PART_ID   UNIT_PRICE   ORDER_QTY
---------------------------------------------------------------
2021-07-08   PO29986         PART_A    321.00       12
2021-08-30   PO30001         PART_B    426.30       8
I need a query that will give me such a result set.
You can use row-numbering for this. Just place the whole join inside a subquery (derived table), add a row-number, then filter on the outside.
SELECT *
FROM (
SELECT
pol.PART_ID,
po.ORDER_DATE,
pol.PURC_ORDER_ID,
pol.UNIT_PRICE,
pol.USER_ORDER_QTY,
rn = ROW_NUMBER() OVER (PARTITION BY pol.PART_ID ORDER BY po.ORDER_DATE DESC)
FROM PURCHASE_ORDER po
JOIN PURC_ORDER_LINE pol ON po.ID = pol.PURC_ORDER_ID
) po
WHERE po.rn = 1;
Note the use of proper join syntax, as well as table aliases.
You can use a window function:
select * from (
select * , row_number() over (partition by PART_ID order by ORDER_DATE desc) rn
from tablename
) t where t.rn = 1
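To check the row-numbering approach against the sample data from the question, here is a small sketch using SQLite through Python's sqlite3 module. The table layout is cut down to the three columns shown in the sample (prices and quantities are left out), and the alias is written with AS rather than the T-SQL "rn = ..." form; both are simplifications for illustration.

```python
import sqlite3

# Header and line tables reduced to the columns in the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_order (id TEXT PRIMARY KEY, order_date TEXT);
CREATE TABLE purc_order_line (purc_order_id TEXT, part_id TEXT);
INSERT INTO purchase_order VALUES
  ('PO12345', '2020-08-17'), ('PO23456', '2020-11-21'), ('PO29986', '2021-07-08'),
  ('PO00861', '2019-11-30'), ('PO30001', '2021-08-30');
INSERT INTO purc_order_line VALUES
  ('PO12345', 'PART_A'), ('PO23456', 'PART_A'), ('PO29986', 'PART_A'),
  ('PO00861', 'PART_B'), ('PO30001', 'PART_B');
""")

# Number each part's order lines from newest to oldest header date,
# then keep only the newest one per part.
last_po_sql = """
SELECT part_id, order_date, purc_order_id
FROM (
    SELECT pol.part_id, po.order_date, pol.purc_order_id,
           ROW_NUMBER() OVER (PARTITION BY pol.part_id
                              ORDER BY po.order_date DESC) AS rn
    FROM purchase_order AS po
    JOIN purc_order_line AS pol ON po.id = pol.purc_order_id
) AS numbered
WHERE rn = 1
ORDER BY part_id
"""
last_orders = conn.execute(last_po_sql).fetchall()
for row in last_orders:
    print(row)
```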

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve such customer who has placed orders on 2 consecutive days.
In above case Xyz and Abc customers should be returned by query as result.
There are many ways to do this. An EXISTS semi-join followed by DISTINCT or GROUP BY should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELECT 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). That would be short for DISTINCT ROW(customer_name), a row constructor unrelated to DISTINCT, and it is just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type (an anonymous record, since no row type is given). Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
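To try the EXISTS semi-join from this answer on the question's sample data, here is a sketch that runs in SQLite via Python's sqlite3. SQLite has no native date type, so the dates are stored as ISO text (the dd/mm/yy values from the question rewritten as yyyy-mm-dd) and "+ 1" becomes date(..., '+1 day'); those two points are adaptations for SQLite, not part of the Postgres syntax above.

```python
import sqlite3

# Sample rows from the question, with dates converted to ISO yyyy-mm-dd text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_dtls (order_id INTEGER, order_date TEXT, customer_name TEXT);
INSERT INTO order_dtls VALUES
  (1, '2017-09-11', 'Xyz'), (2, '2017-09-15', 'Lmn'), (3, '2017-09-12', 'Xyz'),
  (4, '2017-09-18', 'Abc'), (5, '2017-09-15', 'Xyz'), (6, '2017-09-25', 'Lmn'),
  (7, '2017-09-19', 'Abc');
""")

# A customer qualifies if some order of theirs has another order
# by the same customer dated exactly one day later.
consecutive_sql = """
SELECT DISTINCT o.customer_name
FROM order_dtls AS o
WHERE EXISTS (
    SELECT 1
    FROM order_dtls AS n
    WHERE n.customer_name = o.customer_name
      AND n.order_date = date(o.order_date, '+1 day')
)
ORDER BY o.customer_name
"""
consecutive_customers = [row[0] for row in conn.execute(consecutive_sql)]
print(consecutive_customers)
```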
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would do it is to join the table with itself on the next date and on Customer_name too.
This way you can ensure that the same customer placed orders on 2 consecutive days.
For MySQL:
SELECT DISTINCT t1.Customer_name
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
You should also select the result with the DISTINCT keyword to ensure the same customer is not displayed more than once.
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

Oracle - group by of joined tables

I looked for an answer and found several pieces of advice, but none of them was helpful, so I'm asking now.
I have two tables, one with distributors (columns: distributorid, name) and the second one with delivered products (columns: distributorid, productid, corruptcount, date) - the column corruptcount contains the number of corrupted deliveries. I need to select the first five distributors with the most corrupted deliveries in last two months. I need to select distributorid, name and sum of corruptcount, here is my query:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
GROUP BY del.distributorid
But Oracle returns the error message: "not a GROUP BY expression". And when I edit my query to this:
SELECT del.distributorid, d.name, del.corruptcount-- , SUM(del.corruptcount) AS corrupt
FROM distributor d, delivery del
WHERE d.distributorid = del.distributorid
AND d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid, SUM(corruptcount) AS corrupt
FROM delivery
WHERE storeid = 1
AND "date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
AND ROWNUM <= 5
GROUP BY distributorid
ORDER BY corrupt DESC))
--GROUP BY del.distributorid
It works as expected and returns correct data:
1 IBM 10
2 DELL 0
2 DELL 1
2 DELL 6
3 HP 3
8 ACER 2
9 ASUS 1
I'd like to group this data. Where and why is my query wrong? Can you help please? Thank you very, very much.
I think the problem is just the d.name in the select list; you need to include it in the group by clause as well. Try this:
SELECT del.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor d join
delivery del
on d.distributorid = del.distributorid
WHERE d.distributorid IN
(SELECT distributorid
FROM (SELECT distributorid
FROM delivery
WHERE storeid = 1 AND
"date" BETWEEN ADD_MONTHS(SYSDATE, -2) AND SYSDATE
GROUP BY distributorid
ORDER BY SUM(corruptcount) DESC
)
WHERE ROWNUM <= 5 -- applied after the ordering, so it keeps the top five
)
GROUP BY del.distributorid, d.name;
I also switched the query to using explicit join syntax with an on clause, instead of the outdated implicit join syntax using a condition in the where.
I also moved ROWNUM <= 5 out of the inner query and kept the extra layer of subquery: Oracle assigns ROWNUM before ORDER BY is applied, so the limit has to sit one level above the ordered subquery to return the top five.
EDIT:
"Why does d.name have to be included in the group by?" The easy answer is that SQL requires it because it does not know which value to include from the group. You could instead use min(d.name) in the select, for instance, and there would be no need to change the group by clause.
The real answer is a wee bit more complicated. The ANSI standard does actually permit the query as you wrote it. This is because distributorid is (presumably) declared as a primary key on the distributor table. When you group by a primary key (or unique key), then you can use other columns from the same table just as you did. Although ANSI supports this, most databases do not yet. So, the real reason is that Oracle doesn't support the ANSI standard functionality that would allow your query to work.
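The two ways of satisfying the GROUP BY rule described above (adding d.name to the GROUP BY, or wrapping it in MIN) can be compared on the sample output from the question. This is a sketch in SQLite via Python's sqlite3, not Oracle; the delivery table is reduced to two columns and the storeid/date filters are left out, which are simplifications for illustration. Note that SQLite is more permissive than Oracle about bare columns, so it will not reproduce the original error.

```python
import sqlite3

# Rows taken from the ungrouped output shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributor (distributorid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE delivery (distributorid INTEGER, corruptcount INTEGER);
INSERT INTO distributor VALUES (1, 'IBM'), (2, 'DELL'), (3, 'HP'), (8, 'ACER'), (9, 'ASUS');
INSERT INTO delivery VALUES (1, 10), (2, 0), (2, 1), (2, 6), (3, 3), (8, 2), (9, 1);
""")

# Option 1: every non-aggregated select-list column appears in GROUP BY.
group_by_both_sql = """
SELECT d.distributorid, d.name, SUM(del.corruptcount) AS corrupt
FROM distributor AS d
JOIN delivery AS del ON d.distributorid = del.distributorid
GROUP BY d.distributorid, d.name
ORDER BY corrupt DESC
"""

# Option 2: group by the key only and aggregate the name.
min_name_sql = """
SELECT d.distributorid, MIN(d.name) AS name, SUM(del.corruptcount) AS corrupt
FROM distributor AS d
JOIN delivery AS del ON d.distributorid = del.distributorid
GROUP BY d.distributorid
ORDER BY corrupt DESC
"""

group_by_both = conn.execute(group_by_both_sql).fetchall()
min_name = conn.execute(min_name_sql).fetchall()
print(group_by_both)
```

Both forms return the same rows here because each distributorid maps to exactly one name.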

Variant use of the GROUP BY clause in TSQL

Imagine the following schema and sample data (SQL Server 2008):
OriginatingObject
----------------------------------------------
ID
1
2
3
ValueSet
----------------------------------------------
ID OriginatingObjectID DateStamp
1 1 2009-05-21 10:41:43
2 1 2009-05-22 12:11:51
3 1 2009-05-22 12:13:25
4 2 2009-05-21 10:42:40
5 2 2009-05-20 02:21:34
6 1 2009-05-21 23:41:43
7 3 2009-05-26 14:56:01
Value
----------------------------------------------
ID ValueSetID Value
1 1 28
etc (a set of rows for each related ValueSet)
I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.
I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?
You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.
Please note, simple GROUP-BY can only bring you to the list of OriginatingIDs and Timestamps. In order to pull the relevant ValueSet IDs, the cleanest solution is use a subquery.
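Multiple-column IN with GROUP-BY (probably faster, but row-value IN is not supported by SQL Server; it works in e.g. Oracle, PostgreSQL and MySQL):
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
(V.OriginatingObjectID, V.DateStamp) IN
(
SELECT OriginatingObjectID, Max(DateStamp)
FROM ValueSet
GROUP BY OriginatingObjectID
)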
Correlated Subquery:
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
V.DateStamp =
(
SELECT Max(DateStamp)
FROM ValueSet V2
WHERE V2.OriginatingObjectID = O.ID
)
SELECT OriginatingObjectID, id
FROM (
SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID
ORDER BY DateStamp DESC) as ranking
FROM ValueSet) AS ranked -- SQL Server requires an alias on a derived table
WHERE ranking = 1;
This can be done with a correlated sub-query. No GROUP-BY necessary.
SELECT
vs.ID,
vs.OriginatingObjectID,
vs.DateStamp,
v.Value
FROM
ValueSet vs
INNER JOIN Value v ON v.ValueSetID = vs.ID
WHERE
NOT EXISTS (
SELECT 1
FROM ValueSet
WHERE OriginatingObjectID = vs.OriginatingObjectID
AND DateStamp > vs.DateStamp
)
This works only if there cannot be two equal DateStamps for an OriginatingObjectID in the ValueSet table.
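If equal DateStamps are possible, one way around that limitation is ROW_NUMBER() with a tie-breaker, which always yields exactly one row per OriginatingObjectID. The sketch below runs the question's ValueSet sample data in SQLite via Python's sqlite3; using ID DESC as the tie-breaker is an assumption made here, and any column that makes the order unique would do.

```python
import sqlite3

# ValueSet rows copied from the question; timestamps stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ValueSet (ID INTEGER PRIMARY KEY, OriginatingObjectID INTEGER, DateStamp TEXT);
INSERT INTO ValueSet VALUES
  (1, 1, '2009-05-21 10:41:43'),
  (2, 1, '2009-05-22 12:11:51'),
  (3, 1, '2009-05-22 12:13:25'),
  (4, 2, '2009-05-21 10:42:40'),
  (5, 2, '2009-05-20 02:21:34'),
  (6, 1, '2009-05-21 23:41:43'),
  (7, 3, '2009-05-26 14:56:01');
""")

# Newest DateStamp first; ID DESC only decides between identical DateStamps.
latest_sql = """
SELECT OriginatingObjectID, ID
FROM (
    SELECT ID, OriginatingObjectID,
           ROW_NUMBER() OVER (PARTITION BY OriginatingObjectID
                              ORDER BY DateStamp DESC, ID DESC) AS rn
    FROM ValueSet
) AS numbered
WHERE rn = 1
ORDER BY OriginatingObjectID
"""
latest_ids = conn.execute(latest_sql).fetchall()
print(latest_ids)
```

For object 1 the newest row is ID 3, not the higher ID 6, which matches the question's warning that a higher ID does not imply a more recent record.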