Eliminate duplicate rows from query output

Eliminate duplicate rows from query output - sql

I have a large SELECT query with multiple JOINS and WHERE clauses. Despite specifying DISTINCT (also have tried GROUP BY) - there are duplicate rows returned. I am assuming this is because the query selects several IDs from several tables. At any rate, I would like to know if there is a way to remove duplicate rows from a result set, based on a condition.
I am looking to remove duplicates from results if x.ID appears more than once. The duplicate rows all appear grouped together with the same IDs.
Query:
SELECT e.Employee_ID, ce.CC_ID as CCID, e.Manager_ID, e.First_Name, e.Last_Name,,e.Last_Login,
e.Date_Created AS Date_Created, e.Employee_Password AS Password,e.EmpLogin
ISNULL((SELECT TOP 1 1 FROM Gift g
JOIN Type t ON g.TypeID = t.TypeID AND t.Code = 'Reb'
WHERE g.Manager_ID = e.Manager_ID),0) RebGift,
i.DateCreated as ImportDate
FROM #EmployeeTemp ct
JOIN dbo.Employee c ON ct.Employee_ID = e.Employee_ID
INNER JOIN dbo.Manager p ON e.Manager_ID = m.Manager_ID
LEFT JOIN EmployeeImp i ON e.Employee_ID = i.Employee_ID AND i.Active = 1
INNER JOIN CreditCard_Updates cc ON m.Manager_ID = ce.Manager_ID
LEFT JOIN Manager m2 ON m2.Manager_ID = ce.Modified_By
WHERE ce.CCType ='R' AND m.isT4L = 1
AND CHARINDEX(e.first_name, Selected_Emp) > 0
AND ce.Processed_Flag = #isProcessed

I don't have enough reputation to add a comment, so I'll just try to help you in an answer proper (even though this is more of a comment).
It seems like what you want to do is select distinctly on just one column.
Here are some answers which look like that:
SELECT DISTINCT on one column
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

Related

PostgreSQL - SELECT DISTINCT, ORDER BY expressions must appear in select list

I'm new to SQL.
I guess I've misunderstood the concept of how to use DISTINCT keyword.
Here's my code:
SELECT DISTINCT(e.id), e.text, e.priority, CAST(e.order_number AS integer), s.name AS source, e.modified_time, e.creation_time, (SELECT string_agg(DISTINCT text, '|') FROM definitions WHERE entry_id = d.entry_id) AS definitions
FROM entries AS e
LEFT JOIN definitions d ON d.entry_id = e.id
INNER JOIN sources s ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.order_number
The error is as follows:
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 6: ORDER BY e.order_number
Just trying to understand what my SELECT statement should look like.

It appears to me that you are trying to distinct on a single column and not on others - which is bound to fail.
For example, select distinct a,b,c from x returns the unique combinations of a,b and c, not unique a but normal b and c

If you want one row per distinct e.id, then you are looking for distinct on. It is very important that the order by be consistent with the distinct on keys:
SELECT DISTINCT ON (e.id), e.id, e.text, e.priority, CAST(e.order_number AS integer),
s.name AS source, e.modified_time, e.creation_time,
(SELECT string_agg(DISTINCT d2.text, '|') FROM definitions d2 WHERE d2.entry_id = d.entry_id) AS definitions
FROM entries e LEFT JOIN
definitions d
ON d.entry_id = e.id INNER JOIN
sources s
ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.id, e.order_number;
Given the subquery, I suspect that there are better ways to write the query. If that is of interest, ask another question, provide sample data, desired results, and a description of the logic.

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
I want to get the record for each employee record from the times table with an event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there's 100 records in the employee table. The problem is peculiar to Access as I can't use the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.

Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)

Basically, your query is running multiple correlated subqueries even a nested one in a WHERE clause. Correlated queries calculate a value separately for each row, corresponding to outer query.
Similar to #LeeMac, simply join all your tables to an aggregate query for the max event_time grouped by employee_id which will run once across all rows. Below times is the baseFROM table joined to the aggregate query, employees, and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id

Fetch rows and count them in sqlserver

I wrote a stored procedure that join three tables to fetch province title from it's table. This is my code:
BEGIN
select TbProvince.title, count(TbProvince.title) as cnt
from TbProvince
where TbProvince.provinceId IN (select TbCustomerUser.provinceId
from TbCustomerUser INNER JOIN
TbDeals
on TbCustomerUser.UserId = TbDeals.sellerUserID
where TbDeals.buyerUserID = 1
)
group by TbProvince.title
end
Description: I have three tables for deals, customers and provinces. I want to retrieve province title and the count of that for customers that were sellers.
The above code have no problem, but only return 1 as a count. The number of customers is more than one.
Can anybody help me solve my problem?

Your query is filtering the rows of TbProvince and then aggregating that table -- and only that table.
Instead, you want to join the tables together to count the customers not the provinces. The query is much simpler to write and read if you use table aliases:
select p.Title, count(*)
from TbCustomerUser cu join
TbDeals d
on cu.UserId = d.sellerUserID join
TbProvince p
on p.provinceId = cu.provinceId
where d.buyerUserID = 1
group by p.Title;

You have to perform the JOIN with customer table. If you use semi join (expressed by IN construct in your case) then you avoid duplicates that are expected in your case.
SELECT TbProvince.title,
COUNT(TbProvince.title) AS cnt
FROM TbProvince
JOIN TbCustomerUser ON TbProvince.provinceId = TbCustomerUser.provinceId
JOIN TbDeals ON TbCustomerUser.UserId = TbDeals.sellerUserID
WHERE TbDeals.buyerUserID = 1
GROUP BY TbProvince.title;

It should be as simple as:
You won't need the subselect. Just join all three tables and you'll receive your desired result.
SELECT TbProvince.title,
count(TbProvince.title) as cnt
FROM TbProvince
INNER JOIN TbCustomerUser
ON TbProvince.provinceId = TbCustomerUser.provinceId
INNER JOIN TbDeals
ON TbCustomerUser.UserId = TbDeals.sellerUserID
AND TbDeals.buyerUserID = 1
GROUP BY TbProvince.title
Why did your solution not work?
You subselect will return a "list" of provinceIDs from TbCustomerUser combinated with TbDeals with your restriction TbDeals.buyerUserID = 1.
The outer select will now return all rows from TbProvince IN this list.
But it's not returning a row for each Customer who had a deal.
That's why you have to JOIN all three tables at once.

Oracle Sql Duplicate rows when joining new table

I am using oracle sql to join tables. I use the following code:
SELECT
T.TRANSACTION_KEY,
PR.ACCOUNT_KEY,
T.ACCT_CURR_AMOUNT,
T.EXECUTION_LOCAL_DATE_TIME,
TC.DESCRIPTION,
T.OPP_ACCOUNT_NAME,
T.OPP_COUNTRY,
PT.PARTY_TYPE_DESC,
P.PARTY_NAME,
P.CUSTOM_SMALL_STRING_02,
CO.COUNTRY_NAME,
LE.LIST_CD
FROM TRANSACTIONS T
LEFT JOIN TRANSACTION_CODE TC
ON T.TRANSACTION_CODE = TC.ENTITY
LEFT JOIN PARTY_ACCOUNT_RELATION PR
ON T.ACCOUNT = PR.ACCOUNT
LEFT JOIN PARTY P
ON PR.PARTY_KEY = P.PARTY_KEY
LEFT JOIN PARTY_TYPE PT
ON P.PARTY_TYPE = PT.ENTITY
LEFT JOIN COUNTRY CO
ON T.OPP_COUNTRY = CO.ENTITY
LEFT JOIN LISTED_ENTITY LE
ON CO.COUNTRY = LE.ENTITY_KEY
WHERE
PR.PARTY_KEY = '111111111' and T.EXECUTION_LOCAL_DATE_TIME>'2017-01-01';
It works fine until now but I want to join another table which has a column in common(ENTITY_KEY) with PARTY_ACCOUNT_RELATION table (ACCOUNT_KEY) and I want to include some of the new table's columns but when I do that, it becomes dublicated. I am adding the following lines before "where" statment:
LEFT JOIN EVALUATE_RULE ER
ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
Does anyone know where the problem is?

If joining another table into an existing query causes the existing rows to be duplicated, it is because the table being joined in has duplicate values in the columns that are being used as keys for the join
In your case, if you do
SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1
You'll see which entity_keys are duplicated. When these duplicates are joined to the existing data, the existing data has to be doubled up to permit both rows from EVALUATE_RULE with the same ENTITY_KEY to exist in the result set
You must either de-dupe the table, or put other clauses into your ON condition to further restrict the rows coming from EVALUATE_RULE.
For example, after adding EVALUATE_RULE and putting ER.* in your SELECT list, imagine that you can see that the rows from ER are status = 'old' and status = 'current' but you know you only want the current ones.. So put AND er.status = 'current' in your ON clause
Your comment indicates that multiple records differ by some column you don't care about, so this technique will just select only one row:
LEFT JOIN
(SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e) er
ON
er.entity_key = pr.account_key and
er.rown = 1
If you want info on why this works, run that sql in isolation:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e
ORDER BY e.entity_key -- i added this to make it more clear what is going on. You don't need it in your main query
It just assigns a number to each row in the table, the number restarts at 1 every time entity_key changes, so we can then select all those with rown = 1
If it turns out you DO want something specific like "the latest row from evaluate_rule", you can use something like this:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.created_date DESC) as rown FROM evaluate_rule e
Now the latest created_date row will always have rown = 1

So far as I can understain from your description, table EVALUATE_RULE has moro records with ACCOUNT_KEY=ENTITY_KEY.
You can change your query section:
LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
to
LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
If you post structure of EVALUATE_RULE (indicating PK columns) I can change my answer to let you includ EVALUATE_RULE columns in final query.

i want to modify this SQL statement to return only distinct rows of a column

select
picks.`fbid`,
picks.`time`,
categories.`name` as cname,
options.`name` as oname,
users.`name`
from
picks
left join categories
on (categories.`id` = picks.`cid`)
left join options
on (options.`id` = picks.oid)
left join users
on (users.fbid = picks.`fbid`)
order by
time desc
that query returns a result that like:
my question is.... I would like to modify the query to select only DISTINCT fbid's. (perhaps the first row only sorted by time)
can someone help with this?

select
p2.fbid,
p2.time,
c.`name` as cname,
o.`name` as oname,
u.`name`
from
( select p1.fbid,
min( p1.time ) FirstTimePerID
from picks p1
group by p1.fbid ) as FirstPerID
JOIN Picks p2
on FirstPerID.fbid = p2.fbid
AND FirstPerID.FirstTimePerID = p2.time
LEFT JOIN Categories c
on p2.cid = c.id
LEFT JOIN Options o
on p2.oid = o.id
LEFT JOIN Users u
on p2.fbid = u.fbid
order by
time desc
I don't know why you originally had LEFT JOINs, as it appears that all picks must be associated with a valid category, option and user... I would then remove the left, and change them to INNER joins instead.
The first inner query grabs for each fbid, the FIRST entry time which will result in a single entity for the FBID. From that, it re-joins to the picks table for the same ID and timeslot... then continues for the rest of the category, options, users join criteria of that single entry.

2 options, you could write a group by clause.
Or you could write a nested query joined back to itself to get pertinent info.
Nested aliased table:
SELECT
n.fBids
FROM
MyTable t
INNER JOIN
(SELECT DISTINCT fBids
FROM MyTable) n
ON n.ID = t.ID
Or group by option
SELECT fBId from MyTable
GROUP BY fBID

select picks.`fbid`, picks.`time`, categories.`name` as cname,
options.`name` as oname, users.`name` from picks left join categories
on (categories.`id` = picks.`cid`) left join options on (options.`id` = picks.oid)
left join users on (users.fbid = picks.`fbid`)
order by time desc GROUP BY picks.`fbid`

select
picks.fbid,
MIN(picks.time) as first_time,
MAX(picks.time) as last_time
from
picks
group by
picks.fbid
order by
MIN(picks.time) desc
However, if you want only distinct fbid's you cannot display cname and other columns at the same time.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Eliminate duplicate rows from query output - sql

Related

PostgreSQL - SELECT DISTINCT, ORDER BY expressions must appear in select list

Slow MS Access Sub Query

Fetch rows and count them in sqlserver

Oracle Sql Duplicate rows when joining new table

i want to modify this SQL statement to return only distinct rows of a column

Categories

Resources