Oracle Sql Duplicate rows when joining new table - sql

I am using oracle sql to join tables. I use the following code:
SELECT
T.TRANSACTION_KEY,
PR.ACCOUNT_KEY,
T.ACCT_CURR_AMOUNT,
T.EXECUTION_LOCAL_DATE_TIME,
TC.DESCRIPTION,
T.OPP_ACCOUNT_NAME,
T.OPP_COUNTRY,
PT.PARTY_TYPE_DESC,
P.PARTY_NAME,
P.CUSTOM_SMALL_STRING_02,
CO.COUNTRY_NAME,
LE.LIST_CD
FROM TRANSACTIONS T
LEFT JOIN TRANSACTION_CODE TC
ON T.TRANSACTION_CODE = TC.ENTITY
LEFT JOIN PARTY_ACCOUNT_RELATION PR
ON T.ACCOUNT = PR.ACCOUNT
LEFT JOIN PARTY P
ON PR.PARTY_KEY = P.PARTY_KEY
LEFT JOIN PARTY_TYPE PT
ON P.PARTY_TYPE = PT.ENTITY
LEFT JOIN COUNTRY CO
ON T.OPP_COUNTRY = CO.ENTITY
LEFT JOIN LISTED_ENTITY LE
ON CO.COUNTRY = LE.ENTITY_KEY
WHERE
PR.PARTY_KEY = '111111111' and T.EXECUTION_LOCAL_DATE_TIME>'2017-01-01';
It works fine until now but I want to join another table which has a column in common(ENTITY_KEY) with PARTY_ACCOUNT_RELATION table (ACCOUNT_KEY) and I want to include some of the new table's columns but when I do that, it becomes dublicated. I am adding the following lines before "where" statment:
LEFT JOIN EVALUATE_RULE ER
ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
Does anyone know where the problem is?

If joining another table into an existing query causes the existing rows to be duplicated, it is because the table being joined in has duplicate values in the columns that are being used as keys for the join
In your case, if you do
SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1
You'll see which entity_keys are duplicated. When these duplicates are joined to the existing data, the existing data has to be doubled up to permit both rows from EVALUATE_RULE with the same ENTITY_KEY to exist in the result set
You must either de-dupe the table, or put other clauses into your ON condition to further restrict the rows coming from EVALUATE_RULE.
For example, after adding EVALUATE_RULE and putting ER.* in your SELECT list, imagine that you can see that the rows from ER are status = 'old' and status = 'current' but you know you only want the current ones.. So put AND er.status = 'current' in your ON clause
Your comment indicates that multiple records differ by some column you don't care about, so this technique will just select only one row:
LEFT JOIN
(SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e) er
ON
er.entity_key = pr.account_key and
er.rown = 1
If you want info on why this works, run that sql in isolation:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e
ORDER BY e.entity_key -- i added this to make it more clear what is going on. You don't need it in your main query
It just assigns a number to each row in the table, the number restarts at 1 every time entity_key changes, so we can then select all those with rown = 1
If it turns out you DO want something specific like "the latest row from evaluate_rule", you can use something like this:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.created_date DESC) as rown FROM evaluate_rule e
Now the latest created_date row will always have rown = 1

So far as I can understain from your description, table EVALUATE_RULE has moro records with ACCOUNT_KEY=ENTITY_KEY.
You can change your query section:
LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
to
LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
If you post structure of EVALUATE_RULE (indicating PK columns) I can change my answer to let you includ EVALUATE_RULE columns in final query.

Related

BigQuery - Joining on multiple conditions using subqueries and OR statements

Is there anyway to join two tables on multiple potential conditions?
I'm currently migrating some code from Postgres to Bigquery where I joined on multiple potential values like:
SELECT
*
FROM
(
SELECT
offer_table.offer_id
,customer_table.customer_name
,customer_table.visit_count
,ROW_NUMBER() OVER (PARTITION BY offer_table.offer_id ORDER BY customer_table.visit_count DESC) AS customer_visit_rank
FROM
offer_table
LEFT JOIN customer_table ON
(
offer_table.customer_id = customer_table.customer_id
OR offer_table.email = customer_table.email
OR offer_table.phone = customer_table.phone
)
) dummy
WHERE
customer_visit_rank = 1
I needed to this because my offer and customer data had inconsistent usage of our id, email, and phone fields but all were valid potential matches. If multiple fields worked (ex: id and email matched), there would be duplicate rows and I'd filter them out based on the row_number column after ranking using the ORDER BY section.
However when I try to join on multiple conditions in BigQuery, I get this error message:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
Has anyone figured out a solution to join on multiple values instead of doing the above?
You can write separate queries, then use COALESCE:
SELECT
*
FROM
(
SELECT
offer_table.offer_id
,COALESCE(c1.customer_name,c2.customer_name,c3.customer_name)
,COALESCE(c1.visit_count,c2.visit_count,c3.visit_count)
,ROW_NUMBER() OVER (PARTITION BY offer_table.offer_id ORDER BY customer_table.visit_count DESC) AS customer_visit_rank
FROM
offer_table
LEFT JOIN customer_table c1
ON offer_table.customer_id = customer_table.customer_id
LEFT JOIN customer_table c2
ON offer_table.email = customer_table.email
LEFT JOIN customer_table c3
ON offer_table.phone = customer_table.phone
)
) AS dummy
WHERE
customer_visit_rank = 1

How to display last updated record for multiple records

I have joined two tables to pull the data I need. I'm having trouble only displaying the most current record from one table. What I'm trying to do is look for the last updated value. I have tried to incorporate max() and row_num but have not had any success.
Here is what I currently have:
select distinct t1.CaId,t1.Enrolled,t1.Plan,t2.Category,t2.updateddate
from table.one(nolock) t1
inner join table.two(nolock) t2 on t1.CaId=t2.CaID
where t1.coverageyear=2016
and right(t1.Plan,2)<>left(t2.Category,2)
order by 5 desc
You can join your main query with a subquery that just grabs the last update date for each ID, like this:
select all_rec.CaId, all_rec.Enrolled, all_rec.[Plan], all_rec.Category, all_rec.updateddate
from
(select distinct t1.CaId,t1.Enrolled,t1.[Plan],t2.Category,t2.updateddate
from [table.one](nolock) t1
inner join [table.two](nolock) t2 on t1.CaId=t2.CaID
where t1.coverageyear=2016
and right(t1.[Plan],2)<>left(t2.Category,2)
) as all_rec
inner join
(SELECT max(updateddate) AS LAST_DATE, CaId
FROM [table.two](nolock)
GROUP BY CaId)
AS GRAB_DATE
on (all_rec.Ca_Id = GRAB_DATE.Ca_Id)
and (all_rec.updateddate = GRAB_DATE.updateddate)
order by 5 desc
I added brackets around your usages of table and Plan because those are SQL reserved words.
If you are trying to get last updated value, then just simply add to your query:
order by t2.updateddate desc
It will show most current record from tables.

SQL Inner Join using Distinct and Order by Desc

table a.
Table b . I have two tables. Table A has over 8000+ records and continues to grow with time.
Table B has only 5 or so records and grows rarely but does grow sometimes.
I want to query Table A's last records where the Id for Table A matches for Table B. The problem is; I am getting all the rows from Table A. I just need the ones where Table A and B match once. These are unique Id's when a new row is inserted into table B and never get repeated.
Any help is most appreciated.
SELECT a.nshift,
a.loeeworkcellid,
b.loeeconfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
ORDER BY a.loeeworkcellid DESC
I am assuming you want to get the only the lastest (as you said) row from the TableA but JOIN giving you all the rows.You can use the Row_Number() to get the rownumber and then apply the join and filter it with the Where clause to select only the first row from the JOIN. So what you can try as below,
;WITH CTE
AS
(
SELECT * , ROW_NUMBER() OVER(PARTITION BY loeeconfigworkcellid ORDER BY loeeworkcellid desc) AS Rn
FROM oeeworkcell
)
SELECT a.nshift,
a.loeeworkcellid,
b.loeecoonfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM CTE a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
WHERE
a.Rn = 1
You need to group by your data and select only the data having the condition with min id.
SELECT a.nshift,
a.loeeworkcellid,
b.loeecoonfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
FROM oeeworkcell a
INNER JOIN dbo.oeeconfigworkcell b
ON a.loeeconfigworkcellid = b.loeeconfigworkcellid
group by
a.nshift,
a.loeeworkcellid,
b.loeecoonfigworkcellid,
b.loeescheduleid,
b.sdescription,
b.sshortname
having a.loeeworkcellid = min(a.loeeworkcellid)

Eliminate duplicate rows from query output

I have a large SELECT query with multiple JOINS and WHERE clauses. Despite specifying DISTINCT (also have tried GROUP BY) - there are duplicate rows returned. I am assuming this is because the query selects several IDs from several tables. At any rate, I would like to know if there is a way to remove duplicate rows from a result set, based on a condition.
I am looking to remove duplicates from results if x.ID appears more than once. The duplicate rows all appear grouped together with the same IDs.
Query:
SELECT e.Employee_ID, ce.CC_ID as CCID, e.Manager_ID, e.First_Name, e.Last_Name,,e.Last_Login,
e.Date_Created AS Date_Created, e.Employee_Password AS Password,e.EmpLogin
ISNULL((SELECT TOP 1 1 FROM Gift g
JOIN Type t ON g.TypeID = t.TypeID AND t.Code = 'Reb'
WHERE g.Manager_ID = e.Manager_ID),0) RebGift,
i.DateCreated as ImportDate
FROM #EmployeeTemp ct
JOIN dbo.Employee c ON ct.Employee_ID = e.Employee_ID
INNER JOIN dbo.Manager p ON e.Manager_ID = m.Manager_ID
LEFT JOIN EmployeeImp i ON e.Employee_ID = i.Employee_ID AND i.Active = 1
INNER JOIN CreditCard_Updates cc ON m.Manager_ID = ce.Manager_ID
LEFT JOIN Manager m2 ON m2.Manager_ID = ce.Modified_By
WHERE ce.CCType ='R' AND m.isT4L = 1
AND CHARINDEX(e.first_name, Selected_Emp) > 0
AND ce.Processed_Flag = #isProcessed
I don't have enough reputation to add a comment, so I'll just try to help you in an answer proper (even though this is more of a comment).
It seems like what you want to do is select distinctly on just one column.
Here are some answers which look like that:
SELECT DISTINCT on one column
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

Select last record out of grouped records

i have this code and i want someone to help me to change it to a grouped query which orders froms below.
SELECT *
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
Left Join photo_comment
On users_pics.u_pic_id = photo_comment.pic_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
what i mean is i have grouped of records which i want to select one per record only from beneath. for example if i have 10 records of the name "John" i want to select the last "John" out of the 10 and then the rest also follows
I'm going to presume that your users table contains a single user, and each user has a single profile, and your photo_comment table can contain multiple comments.
Depending on your RDBMS, you can do this a number of ways. Row_Number can often be a quick way of doing this if you're using a database which supports window functions such as SQL Server or Oracle.
A generic solution to this is to join the table back to itself using the MAX aggregate. This is dependent on having a field to determine which record is the max. Generally speaking, that would be an identity/auto number field or a time stamp field.
Here is the basic concept using photo_comment_id as your determining column:
SELECT *
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
LEFT Join (
SELECT pic_id, MAX(photo_comment_id) max_photo_comment_id
FROM max_photo_comment
GROUP BY pic_id
) max_photo_comment On users_pics.u_pic_id = max_photo_comment.pic_id
LEFT Join photo_comment On
max_photo_comment.pic_id = photo_comment.pic_id AND
max_photo_comment.max_photo_comment_id = photo_comment.photo_comment_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
If your database supports ROW_NUMBER, then you can do this as well (still using the photo_comment_id field):
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY photo_comment.pic_id
ORDER BY photo_comment.photo_comment_id DESC) rn
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
LEFT JOIN photo_comment
ON users_pics.u_pic_id = photo_comment.pic_id
WHERE users_pics.wardrobe = MMColParam
) t
WHERE rn = 1
ORDER BY u_pic_id asc