Different way of writing this SQL query with partition - sql

Hi I have the below query in Teradata. I have a row number partition and from that I want rows with rn=1. Teradata doesn't let me use the row number as a filter in the same query. I know that I can put the below into a subquery with a where rn=1 and it gives me what I need. But the below snippet needs to go into a larger query and I want to simplify it if possible.
Is there a different way of doing this so I get a table with 2 columns - one row per customer with the corresponding fc_id for the latest eff_to_dt?
select cust_grp_id, fc_id, row_number() over (partition by cust_grp_id order by eff_to_dt desc) as rn
from table1

Have you considered using the QUALIFY clause in your query?
SELECT cust_grp_id
, fc_id
FROM table1
QUALIFY ROW_NUMBER()
OVER (PARTITION BY cust_grp_id
ORDER BY eff_to_dt desc)
= 1;

Calculate MAX eff_to_dt for each cust_grp_id and then join result to main table.
SELECT T1.cust_grp_id,
T1.fc_id,
T1.eff_to_dt
FROM Table1 AS T1
JOIN
(SELECT cust_grp_id,
MAX(eff_to_dt) AS max_eff_to_dt
FROM Table
GROUP BY cust_grp_id) AS T2 ON T2.cust_grp_id = T1.cust_grp_id
AND T2.max_eff_to_dt = T1.eff_to_dt

You can use a pair of JOINs to accomplish the same thing:
INNER JOIN My_Table T1 ON <some criteria>
LEFT OUTER JOIN My_Table T2 ON <some criteria> AND T2.eff_to_date > T1.eff_to_date
WHERE
T2.my_id IS NULL
You'll need to sort out the specific criteria for your larger query, but this is effectively JOINing all of the rows (T1), but then excluding any where a later row exists. In the WHERE clause you eliminate these by checking for a NULL value in a column that is NOT NULL (in this case I just assumed some ID value). The only way that would happen is if the LEFT OUTER JOIN on T2 failed to find a match - i.e. no rows later than the one that you want exist.
Also, whether or not the JOIN to T1 is LEFT OUTER or INNER is up to your specific requirements.

Related

Get multiple rows using select top 1 and joins

I'm working on northwind database.
I have to select the newest and the oldest order/orders (could be multiple with same date).
Is it possible without using subqueries and top 1, only by joining tables?
Using a CTE:
WITH orders_rnk AS (
SELECT *, RANK() OVER (ORDER BY order_date ASC) AS rnk
FROM orders
)
SELECT * from orders_rn WHERE rnk = 1;
Using a JOIN:
SELECT o.*
FROM orders AS o
LEFT OUTER JOIN orders AS o2
ON o.order_date > o2.order_date
WHERE o2.order_id IS NULL;
Yes you can use TOP 1 but add WITH TIES keyword.
link to MS documentation
Per MS documentation
WITH TIES Returns two or more rows that tie for last place in the
limited results set. You must use this argument with the ORDER BY
clause. WITH TIES might cause more rows to be returned than the value
specified in expression. For example, if expression is set to 5 but
two additional rows match the values of the ORDER BY columns in row 5,
the result set will contain seven rows.
Exmaple
SELECT TOP 1 WITH TIES
FROM tableA A JOIN TableB
ON A.ID=B.ID
ORDER BY A.column1

Joined query producing more results compared to solo query

I am performing the following query which has an inner join against another table.
select count(myTable.name)
from sch2.sample_detail as myTable
inner join sch1.otherTable as otherTable on myTable.name = otherTable.name
where otherTable.is_valid = 1
and myTable.name IS NOT NULL;
This produces a count of 4912304.
The following is a query just on a single table (my table).
SELECT COUNT(myTable.name)
from sch2.sample_detail as myTable
where myTable.name IS NOT NULL;
This produces a count of 2864654.
But how is this possible? Both queries have the clause where myTable.name IS NOT NULL.
Shouldn't the second query produce same results or if not even more cos the second query doesn't have the otherTable.is_valid = 1 clause?
Why does the inner join produces a higher count of result?
Please advice if there is something I should amend in the 1st query, thanks.
Inner, left or cross join can duplicate rows. sch1.otherTable.name is not unique and this causing rows duplication because for each row in left table all corresponding rows from right table are being selected, this is normal join behavior.
To get duplicate names list use this query and decide how to remove duplicated rows: filter or distinct or filter by row_number, etc.
select count(*) cnt,
name
from sch1.otherTable
having count(*)>1
order by cnt desc;
If you need EXISTS (and do not need to select columns from otherTable), use left semi join.
Also subquery with distinct can be used to pre-aggregate name before join and filter:
select count(myTable.name)
from sch2.sample_detail as myTable
LEFT SEMI JOIN (select distinct name from sch1.otherTable otherTable where otherTable.is_valid = 1 ) as otherTable on myTable.name = otherTable.name
where myTable.name IS NOT NULL;

SQL Tables need to be appended, but not like JOINS

I've a scenario as below.
I've two tables, and a Common column/Key between the two.
I need Table2 data to be just appended to Table1 without repetition like JOINs.
If there are more rows in one table, other table rows can be NULL.
As shown in figure, Result Table has NULLs when there is no corresponding row count from Table2.
I tried using Joins, but I'm getting a result of 45 rows. But I should get 9 rows.
Thanks in advance.
Edit: Added my queries
SELECT DISTINCT
APPT.PRSN_ID
,APPT.SCHEDULEDAPPOINTMENTS
,APPT.OVERDUEAPPOINTMENTS
,Visit.ENCOUNTER_CATEGORY
,Visit.ENCOUNTER_TYPE
,Visit.ENCOUNTER_DATE
,Visit.ENCOUNTER_FOLLOWUP_DATE
FROM
APPOINTMENTS APPT
OUTER APPLY dbo.fn_GetVisitsOfAPerson(PROV.PRSN_ID) AS Visit
/***********************************************************/
--IN THE ABOVE QUERY, THE FUNCTION IS DEFINED AS BELOW
CREATE FUNCTION dbo.fn_GetVisitsOfAPerson(#PrsnID AS bigint)
RETURNS TABLE
AS
RETURN
(
SELECT
VISITS.PVISITS_PRSN_KEY
,DATA.HE_Category_Description AS 'ENCOUNTER_CATEGORY'
,DATA.HE_Type_Description AS 'ENCOUNTER_TYPE'
,VISITS.PVISITS_DATE AS 'ENCOUNTER_DATE'
,VISITS.PVISITS_FOLLOWUP_DATE AS 'ENCOUNTER_FOLLOWUP_DATE'
FROM
[HS_PRSN_HEALTH_VISITS] VISITS INNER JOIN
[HS_HealthEncounter_Table] DATA ON
ENCOUNTER.PVISITS_CATEGORY = DATA.HE_Category_Code AND
ENCOUNTER.PVISITS_TYPE = DATA.HE_Type_Code
WHERE
PVISITS_PRSN_KEY = #PrsnID
AND PVISITS_VOID = 0
)
GO
I cannot read your data tables, but I think you just need to introduce a row number.
It would be something like this:
select . . .
from (select t1.*, row_number() over (partition by key order by ??) as seqnum
from table1 t1
) full join
(select t2.*, row_number() over (partition by key order by ??) as seqnum
from table1 t2
) t2
on t1.key = t2.key and t1.seqnum = t2.seqnum;
Based on you wanting the 9 rows present in the Visit table it looks like you need an OUTER JOIN, something like:
SELECT DISTINCT
APPT.PRSN_ID
,APPT.SCHEDULEDAPPOINTMENTS
,APPT.OVERDUEAPPOINTMENTS
,Visit.ENCOUNTER_CATEGORY
,Visit.ENCOUNTER_TYPE
,Visit.ENCOUNTER_DATE
,Visit.ENCOUNTER_FOLLOWUP_DATE
FROM Visit
LEFT OUTER JOIN
APPOINTMENTS APPT
ON Visit.key = APP.key

entry cannot be referenced in this part of the query (subquery) Error

I'm getting the following error on my query:
here is an entry for table "table1", but it cannot be referenced from this part of the query.
This is my query:
SELECT id
FROM property_import_image_results table1
LEFT JOIN (
SELECT created_at
FROM property_import_image_results
WHERE external_url = table1.external_url
ORDER BY created_at DESC NULLS LAST
LIMIT 1
) as table2 ON (pimr.created_at = table2.created_at)
WHERE table2.created_at is NULL
You need a lateral join to be able to reference the outer table in the sub-select for the join.
You are also referencing an alias pimr in the join condition, which isn't available anywhere in the query. So you need to change that to table1 in the join condition.
You should also given the table in the inner query an alias to avoid confusion:
SELECT id
FROM property_import_image_results table1
LEFT JOIN LATERAL (
SELECT p2.created_at
FROM property_import_image_results p2
WHERE p2.external_url = table1.external_url
ORDER BY p2.created_at DESC NULLS LAST
LIMIT 1
) as table2 ON (table1.created_at = table2.created_at)
WHERE table2.created_at is NULL
Edit
This kind of query can also be solved using window functions:
select id
from (
select id,
max(created_at) over (partition by external_url) as max_created
FROM property_import_image_results
) t
where created_at <> max_created;
This might be faster than aggregating and joining as you do. But it's hard to tell. The lateral joins are quite efficient as well. It has the advantage that you can add any column you like to the result because no grouping is required.

How do I limit the number of rows returned by this LEFT JOIN to one?

So I think I've seen a solution to this however they are all very complicated queries. I'm in oracle 11g for reference.
What I have is a simple one to many join which works great however I don't need the many. I just want the left table (the one) to just join any 1 row which meets the join criteria...not many rows.
I need to do this because the query is in a rollup which COUNTS so if I do the normal left join I get 5 rows where I only should be getting 1.
So example data is as follows:
TABLE 1:
-------------
TICKET_ID ASSIGNMENT
5 team1
6 team2
TABLE 2:
-------------
MANAGER_NAME ASSIGNMENT_GROUP USER
joe team1 sally
joe team1 stephen
joe team1 louis
harry team2 ted
harry team2 thelma
what I need to do is join these two tables on ASSIGNMENT=ASSIGNMENT_GROUP but only have 1 row returned.
when I do a left join I get three rows returned beaucse that is the nature of hte left join
If oracle supports row number (partition by) you can create a sub query selecting where row equals 1.
SELECT * FROM table1
LEFT JOIN
(SELECT *
FROM (SELECT *,
ROW_NUMBER()
OVER(PARTITION BY assignmentgroup ORDER BY assignmentgroup) AS Seq
FROM table2) a
WHERE Seq = 1) v
ON assignmet = v.assignmentgroup
You could do something like this.
SELECT t1.ticket_id,
t1.assignment,
t2.manager_name,
t2.user
FROM table1 t1
LEFT OUTER JOIN (SELECT manager_name,
assignment_group,
user,
row_number() over (partition by assignment_group
--order by <<something>>
) rnk
FROM table2) t2
ON ( t1.assignment = t2.assignment_group
AND t2.rnk = 1 )
This partitions the data in table2 by assignment_group and then arbitrarily ranks them to pull one arbitrary row per assignment_group. If you care which row is returned (or if you want to make the row returned deterministic) you could add an ORDER BY clause to the analytic function.
I think what you need is to use GROUP BY on the ASSIGNMENT_GROUP field.
http://www.w3schools.com/sql/sql_groupby.asp
In MySQL you could just GROUP BY ASSIGNMENT and be done. Oracle is more strict and refuses to just choose (in an undefined way) which values of the three rows to choose. That means all returned columns need to be part of GROUP BY or be subject to an aggregat function (COUNT, MIN, MAX...)
You can of course choose to just don't care and use some aggregat function on the returned columns.
select TICKET_ID, ASSIGNMENT, MAX(MANAGER_NAME), MAX(USER)
from T1
left join T2 on T1.ASSIGNMENT=T2.ASSIGNMENT_GROUP
group by TICKET_ID, ASSIGNMENT
If you do that I would seriously doubt that you need the JOIN in the first place.
MySQL could also help with GROUP_CONCAT in the case that you want a string concatenation of group values for a column (humans often like that), but with Oracle that is staggeringly complex.
Using a subquery as already suggested is an option, look here for an example. It also allows you to sort the subquery before selecting the top row.
In Oracle, if you want 1 result, you can use the ROWNUM statement to get the first N values of a query e.g.:
SELECT *
FROM TABLEX
WHERE
ROWNUM = 1 --gets the first value of the result
The problem with this single query is that Oracle never returns the data in the same order. So, you must oder your data before use rownum:
SELECT *
FROM
(SELECT * FROM TABLEX ORDER BY COL1)
WHERE
ROWNUM = 1
For your case, looks like you only need 1 result, so your query should look like:
SELECT *
FROM
TABLE1 T1
LEFT JOIN
(SELECT *
FROM TABLE2 T2 WHERE T1.ASSIGNMENT = T2.ASSIGNMENT_GROUP
AND
ROWNUM = 1) T3 ON T1.ASSIGNMENT = T3.ASSIGNMENT_GROUP
you can use subquery - select top 1