Select a NON-DISTINCT column in a query that returns distinct rows - sql

The following query returns the results that I need, but I have to add the ID of each row so that I can then update it. If I add the ID directly to the SELECT statement, it returns more results than I need: because each ID is unique, DISTINCT sees every row as unique.
SELECT DISTINCT ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
So basically I need to add ucpse.ID to the SELECT statement while keeping DISTINCT values for MemberID, ProductID and UserID.
Any ideas?
Thank you

According to your comment:
If the data has been duplicated 67 times for a given employee with a given product and a given client, I need to keep only one of those records. It's not important which one, so this is why I use DISTINCT to obtain unique combinations of a given employee with a given product and a given client.
You can use MIN() or MAX() with GROUP BY instead of DISTINCT:
SELECT MAX(ucpse.ID) AS ID, ucpse.MemberID, ucpse.ProductID, ucpse.UserID
FROM UserCustomerProductSalaryExceptions as ucpse
WHERE EXISTS (SELECT NULL
FROM UserCustomerProductSalaryExceptions as upcse2
WHERE ucpse.userid = upcse2.userid AND ucpse.MemberID = upcse2.MemberID AND ucpse.ProductID = upcse2.ProductID
GROUP BY upcse2.UserID, upcse2.memberid, upcse2.productid
HAVING COUNT(UserID) >= 2
)
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
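As a rough illustration of the MAX() plus GROUP BY idea, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the KeepID alias are made up for the example, and the correlated EXISTS is simplified to a HAVING on the same grouping, which should pick the same groups when the table is grouped by the same three columns.

```python
import sqlite3

# Hypothetical sample rows: (ID, MemberID, ProductID, UserID).
SAMPLE_ROWS = [
    (1, 10, 100, 7),
    (2, 10, 100, 7),
    (3, 10, 100, 7),
    (4, 11, 100, 7),  # not duplicated
    (5, 12, 200, 8),
    (6, 12, 200, 8),
]


def one_id_per_duplicated_group(rows):
    """Return one row (with its highest ID) per duplicated combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserCustomerProductSalaryExceptions ("
        "ID INTEGER PRIMARY KEY, MemberID INTEGER, "
        "ProductID INTEGER, UserID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)",
        rows,
    )
    result = conn.execute(
        """
        SELECT MAX(ID) AS KeepID, MemberID, ProductID, UserID
        FROM UserCustomerProductSalaryExceptions
        GROUP BY MemberID, ProductID, UserID
        HAVING COUNT(*) >= 2   -- only combinations that occur more than once
        ORDER BY KeepID
        """
    ).fetchall()
    conn.close()
    return result


print(one_id_per_duplicated_group(SAMPLE_ROWS))
```

Each duplicated combination comes back exactly once, together with a single ID that can be used for the later update.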
UPDATE:
From your comments I think the query below is what you need:
DELETE FROM UserCustomerProductSalaryExceptions
WHERE ID NOT IN ( SELECT MAX(ucpse.ID) AS ID
FROM UserCustomerProductSalaryExceptions AS ucpse
GROUP BY ucpse.MemberID, ucpse.ProductID, ucpse.UserID
-- no HAVING filter here: the kept ID of every group must be listed,
-- otherwise rows that have no duplicates would be deleted too
)

If all you want is to delete the duplicates, this will do it:
WITH X AS
(SELECT ID,
ROW_NUMBER() OVER (PARTITION BY MemberID, ProductID, UserID ORDER BY ID) AS DupRowNum
FROM UserCustomerProductSalaryExceptions
)
DELETE X WHERE DupRowNum > 1
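The same ROW_NUMBER() idea can be tried out in SQLite through Python's sqlite3 module. This is only a sketch under a few assumptions: SQLite cannot delete through a CTE the way the T-SQL above does, so the numbered rows are used in an IN subquery instead; the sample rows are invented; and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows: (ID, MemberID, ProductID, UserID).
SAMPLE_ROWS = [
    (1, 10, 100, 7),
    (2, 10, 100, 7),
    (3, 10, 100, 7),
    (4, 11, 100, 7),
    (5, 12, 200, 8),
    (6, 12, 200, 8),
]


def delete_duplicates(rows):
    """Delete all but the lowest-ID row of each combination; return remaining IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserCustomerProductSalaryExceptions ("
        "ID INTEGER PRIMARY KEY, MemberID INTEGER, "
        "ProductID INTEGER, UserID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO UserCustomerProductSalaryExceptions VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.execute(
        """
        DELETE FROM UserCustomerProductSalaryExceptions
        WHERE ID IN (
            SELECT ID
            FROM (
                SELECT ID,
                       ROW_NUMBER() OVER (
                           PARTITION BY MemberID, ProductID, UserID
                           ORDER BY ID
                       ) AS DupRowNum
                FROM UserCustomerProductSalaryExceptions
            ) AS numbered
            WHERE DupRowNum > 1   -- every row after the first in its group
        )
        """
    )
    remaining = [
        row[0]
        for row in conn.execute(
            "SELECT ID FROM UserCustomerProductSalaryExceptions ORDER BY ID"
        )
    ]
    conn.close()
    return remaining


print(delete_duplicates(SAMPLE_ROWS))
```

Because the window orders by ID, the lowest ID in each group survives; ordering by ID DESC would keep the highest instead.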

The ID isn't necessary. Try:
UPDATE uu SET
<your settings here>
FROM UserCustomerProductSalaryExceptions uu
JOIN ( <paste your entire query above here>
) uc ON uc.MemberID=uu.MemberId AND uc.ProductID=uu.ProductId AND uc.UserID=uu.UserId

From the look of your data structure (which I would STRONGLY advise normalizing as soon as possible), it sounds like you should be updating all the records. Each duplicate seems to be important because it contains some information about an employee's relation to a customer or product.
I would probably update all the records. Try this:
UPDATE UCPSE
SET
--Do your updates here
FROM UserCustomerProductSalaryExceptions as ucpse
JOIN
(
SELECT UserID, MemberID, ProductID
FROM UserCustomerProductSalaryExceptions
GROUP BY UserID, MemberID, ProductID
HAVING COUNT(UserID) >= 2
) T
ON ucpse.UserID = T.UserID AND ucpse.MemberID = T.MemberID AND ucpse.ProductID = T.ProductID

Related

Postgresql query: update status of limit number of records based on group size

I have a PostgreSQL table that contains a list of email addresses. The table has three columns: Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, yahoo.com.ca), and Valid (boolean).
Now, I want to group those emails by EmailServer and then set Valid = true on the first 3 records of each large group (count >= 6) while leaving the rest of each group as Valid = false.
The query below does not produce the output I want:
UPDATE public."EmailContacts"
SET "Valid"=true
WHERE "EmailServer" IN (
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
LIMIT 5)
Please help me modify it to get the expected results. Any help would be greatly appreciated!
WITH major_servers AS (
SELECT email_server
FROM email_address
GROUP by email_server
HAVING count(*) >=6
),
enumerated_emails AS (
SELECT email,
email_server,
row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
FROM email_address
WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
FROM enumerated_emails ee
WHERE ee.row_number <= 3);
The first CTE, major_servers, finds the major groups: email servers that have at least 6 emails.
The second CTE, enumerated_emails, uses the window function row_number() to number the emails that belong to those major groups, ordered by email (see the TODO comment; I think you should choose another ORDER BY criterion).
The last statement updates the first 3 rows in each major server group.
Find the sql-fiddle here.
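For a quick local check of the approach above, here is a small sketch that runs the same three steps against an in-memory SQLite database via Python's sqlite3 module rather than PostgreSQL. The example.com-style addresses are invented, valid is stored as 0/1 because SQLite has no boolean type, and window functions require SQLite 3.25 or newer.

```python
import sqlite3


def mark_first_three(emails_by_server):
    """Set valid = 1 on the first 3 emails of every server with >= 6 emails."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE email_address ("
        "email TEXT PRIMARY KEY, email_server TEXT, valid INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO email_address (email, email_server) VALUES (?, ?)",
        [(e, s) for s, emails in emails_by_server.items() for e in emails],
    )
    conn.execute(
        """
        WITH major_servers AS (
            SELECT email_server
            FROM email_address
            GROUP BY email_server
            HAVING COUNT(*) >= 6
        ),
        enumerated_emails AS (
            SELECT email,
                   ROW_NUMBER() OVER (
                       PARTITION BY email_server ORDER BY email
                   ) AS rn
            FROM email_address
            WHERE email_server IN (SELECT email_server FROM major_servers)
        )
        UPDATE email_address
        SET valid = 1
        WHERE email IN (SELECT email FROM enumerated_emails WHERE rn <= 3)
        """
    )
    marked = [
        row[0]
        for row in conn.execute(
            "SELECT email FROM email_address WHERE valid = 1 ORDER BY email"
        )
    ]
    conn.close()
    return marked


# Hypothetical data: one server with 7 addresses, one with only 2.
SAMPLE = {
    "big.example": [f"user{i}@big.example" for i in range(1, 8)],
    "small.example": ["a@small.example", "b@small.example"],
}
print(mark_first_three(SAMPLE))
```

Only the large server gets rows marked, and "first 3" here means the first three in alphabetical order of email, which is exactly the ordering choice the TODO comment warns about.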
You need to get the servers, then order the emails within each one, and then perform the update. Something like this:
WITH DataSourceServers AS
(
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
),DataSourceEmails AS
(
SELECT "Email", "EmailServer", row_number() OVER (PARTITION BY "EmailServer" ORDER BY "Email") AS rn
FROM public."EmailContacts"
WHERE "EmailServer" IN (SELECT "EmailServer" FROM DataSourceServers)
)
UPDATE public."EmailContacts" E
SET "Valid" = true
FROM DataSourceEmails SE
WHERE E."EmailServer" = SE."EmailServer"
AND E."Email" = SE."Email"
AND SE.rn <= 3;

Listing multiple columns in a single row in SQL

(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Hello,
In the above query, I want to get the rows with seqnum=1 and seqnum=2 for each transaction ID.
But if a transaction ID has no second row (seqnum=2), I don't want to get any rows for that transaction ID.
Thanks!!
Something like this
Not 100% sure if this is correct without your table definition, but my understanding is that you want to EXCLUDE records if that record has an entry with seqnum=2. You can't use a WHERE clause alone because that would still return seqnum = 1.
You can use an EXISTS / NOT EXISTS or IN / NOT IN clause like this:
(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
and not exists ( select 1 from AC_POS_TRANSACTION_TRK a where a.id = aptt.id
and a.seqnum = 2)
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Basically, this excludes a record if a matching record exists as specified in the NOT EXISTS subquery.
One option you can try is to add a count of rows per group using the same partitioning criteria and then filter accordingly. I'm not entirely sure about your query without seeing it in context and with sample data: there's no aggregation, so why use GROUP BY?
However, can you try something along these lines:
select * from (
select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,
Row_Number() over(partition by EXTERNAL_TRANSACTION_ID order by ID) as SEQNUM,
Count(*) over(partition by EXTERNAL_TRANSACTION_ID) Qty
from AC_POS_TRANSACTION_TRK
where [RESULT] ='Success'
)x
where SEQNUM in (1,2) and Qty>1
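To see how the per-partition count filters out transactions that have only one successful row, the query can be run against a few made-up rows. This is a sketch using Python's sqlite3 module with an in-memory database (SQLite 3.25 or newer for window functions); the table contents are hypothetical and the column list is trimmed for brevity.

```python
import sqlite3

# Hypothetical rows: (ID, EXTERNAL_TRANSACTION_ID, EXTERNAL_TRANSACTION_TYPE, RESULT).
SAMPLE_ROWS = [
    (1, "T1", "SALE", "Success"),
    (2, "T1", "REFUND", "Success"),
    (3, "T1", "VOID", "Success"),
    (4, "T2", "SALE", "Success"),   # only one row for T2
    (5, "T3", "SALE", "Success"),
    (6, "T3", "REFUND", "Failed"),  # not counted, so T3 has one success
]


def first_two_rows_when_second_exists(rows):
    """Return (ID, transaction id, SEQNUM) for rows 1 and 2 of multi-row transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE AC_POS_TRANSACTION_TRK ("
        "ID INTEGER PRIMARY KEY, EXTERNAL_TRANSACTION_ID TEXT, "
        "EXTERNAL_TRANSACTION_TYPE TEXT, RESULT TEXT)"
    )
    conn.executemany(
        "INSERT INTO AC_POS_TRANSACTION_TRK VALUES (?, ?, ?, ?)", rows
    )
    result = conn.execute(
        """
        SELECT ID, EXTERNAL_TRANSACTION_ID, SEQNUM
        FROM (
            SELECT ID, EXTERNAL_TRANSACTION_ID,
                   ROW_NUMBER() OVER (
                       PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID
                   ) AS SEQNUM,
                   COUNT(*) OVER (
                       PARTITION BY EXTERNAL_TRANSACTION_ID
                   ) AS Qty
            FROM AC_POS_TRANSACTION_TRK
            WHERE RESULT = 'Success'
        ) AS x
        WHERE SEQNUM IN (1, 2) AND Qty > 1
        ORDER BY ID
        """
    ).fetchall()
    conn.close()
    return result


print(first_two_rows_when_second_exists(SAMPLE_ROWS))
```

With this data only T1 qualifies: T2 has a single row, and T3 has a single successful row once the failed one is filtered out.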
This should do the job.
With Qry As (
-- Your original query goes here
)
Select Qry.*
From Qry
Where Exists (
Select *
From Qry Qry1
Where Qry1.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry1.SEQNUM = 1
)
And Exists (
Select *
From Qry Qry2
Where Qry2.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry2.SEQNUM = 2
)
BTW, your original query looks problematic to me. Specifically, I think that instead of being in a GROUP BY, those columns should be in the PARTITION BY of the OVER clause, but without knowing more about the table structures and what you're trying to achieve, I could not say for sure.

Get Exclusive Count in SQL Server

I'm trying to get the count of PlanIds that are exclusive records for a certain location in an m-n table.
Imagine that I have the following table:
Id
PlanId
LocationId
I want to retrieve the count of PlanIds that only have one LocationId associated with it
What I have so far:
SELECT COUNT(PlanId)
FROM PLANLOCATION
WHERE PLANLOCATION = LocationId
Can you guys, help me, please?
Thank you
Here is one method, that uses two levels of aggregation:
SELECT COUNT(*)
FROM (SELECT PlanId
FROM PLANLOCATION
GROUP BY PlanId
HAVING MIN(LocationId) = MAX(LocationId)
) p;
Another method uses NOT EXISTS:
select count(distinct pl.planid)
from planlocation pl
where not exists (select 1
from planlocation pl2
where pl2.planid = pl.planid and
pl2.locationid <> pl.locationid
);
Note that count(distinct) can be just count(*) if planlocation has no duplicates.
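Both methods can be compared side by side on a tiny data set. The sketch below uses Python's sqlite3 module with an in-memory table; the rows are invented, and one (PlanId, LocationId) pair is deliberately repeated to show why count(distinct) matters when duplicates are possible.

```python
import sqlite3

# Hypothetical rows: (Id, PlanId, LocationId).
SAMPLE_ROWS = [
    (1, 1, 10),
    (2, 1, 20),  # plan 1 has two locations
    (3, 2, 10),  # plan 2 has exactly one location
    (4, 3, 30),
    (5, 3, 30),  # plan 3 has one location, stored twice
]


def exclusive_plan_counts(rows):
    """Return (count via two-level aggregation, count via NOT EXISTS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PLANLOCATION ("
        "Id INTEGER PRIMARY KEY, PlanId INTEGER, LocationId INTEGER)"
    )
    conn.executemany("INSERT INTO PLANLOCATION VALUES (?, ?, ?)", rows)

    by_aggregation = conn.execute(
        """
        SELECT COUNT(*)
        FROM (SELECT PlanId
              FROM PLANLOCATION
              GROUP BY PlanId
              HAVING MIN(LocationId) = MAX(LocationId)
             ) AS p
        """
    ).fetchone()[0]

    by_not_exists = conn.execute(
        """
        SELECT COUNT(DISTINCT pl.PlanId)
        FROM PLANLOCATION AS pl
        WHERE NOT EXISTS (SELECT 1
                          FROM PLANLOCATION AS pl2
                          WHERE pl2.PlanId = pl.PlanId
                            AND pl2.LocationId <> pl.LocationId)
        """
    ).fetchone()[0]

    conn.close()
    return by_aggregation, by_not_exists


print(exclusive_plan_counts(SAMPLE_ROWS))
```

Plans 2 and 3 are the exclusive ones here, so both queries should agree; with a plain count(*) in the second query, plan 3 would be counted twice.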

How to get Unique CID and Avatar from this table

I want to get unique CID's from this table. If there are 2 logins I still want to get only 1 row. Here is my code so far:
SELECT distinct [s1FirstName]
,[s1MiddleName]
,[s1LastName]
, [s1CIDNumber]
,Login_Name
FROM [dbSuppHousing].[dbo].[tblSurvey] s
where s.survey_dt>='1/1/1'
AND
s.survey_dt<='1/1/2099'
AND
s1CIDNumber<>''
order by s1CIDNumber
The problem with the above code is that it returns multiple rows when a CID has different Login_Names. I just want to show 1 Login_Name per unique CID.
I believe I need a self join but I cannot figure it out.
with x as
(select row_number() over(partition by s1CIDNumber order by survey_dt) as rn, *
from [dbSuppHousing].[dbo].[tblSurvey])
select x.* --add any other columns from tblusers as needed
from x join tblUsers t
on t.loginname = x.Login_Name
where x.rn = 1
You can use the row_number() function to select only one row per CID. Change the partitioning and ordering conditions if needed.
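Here is a stripped-down sketch of the one-row-per-CID idea, run in SQLite through Python's sqlite3 module. It assumes a reduced tblSurvey with only three of the columns, leaves out the join to tblUsers, and uses invented logins and dates; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical rows: (s1CIDNumber, Login_Name, survey_dt).
SAMPLE_ROWS = [
    ("C1", "alice", "2020-01-01"),
    ("C1", "bob", "2021-01-01"),   # second login for the same CID
    ("C2", "carol", "2020-06-01"),
]


def one_login_per_cid(rows):
    """Return the earliest-surveyed (CID, login) pair for each CID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblSurvey ("
        "s1CIDNumber TEXT, Login_Name TEXT, survey_dt TEXT)"
    )
    conn.executemany("INSERT INTO tblSurvey VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT s1CIDNumber, Login_Name
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY s1CIDNumber ORDER BY survey_dt
                   ) AS rn
            FROM tblSurvey
            WHERE s1CIDNumber <> ''
        ) AS x
        WHERE rn = 1
        ORDER BY s1CIDNumber
        """
    ).fetchall()
    conn.close()
    return result


print(one_login_per_cid(SAMPLE_ROWS))
```

Which login is shown for a CID depends entirely on the ORDER BY inside the window; ordering by survey_dt DESC would show the most recent login instead.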

How do I create a SQL Distinct query and add some additional fields

I have the following query that selects combinations of first and last names and shows me dupes. It works, no problems here.
I want to include three other fields for reference; Id, cUser, and cDate. These additional fields, however, should not be used to determine duplicates as I'd likely not end up with any duplicates.
SELECT * FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X
WHERE COUNT > 1
ORDER BY COUNT DESC
Any suggestions? Thanks!
SELECT *
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY FirstName, LastName) AS cnt
FROM Contacts
WHERE ContactTypeId = 1
) q
WHERE cnt > 1
ORDER BY
cnt DESC
This will return all fields for each of the duplicated records.
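The windowed count can be checked on a few invented contacts. The following is a sketch using Python's sqlite3 module and an in-memory Contacts table with hypothetical values for Id, cUser and cDate; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical rows: (Id, FirstName, LastName, ContactTypeID, cUser, cDate).
SAMPLE_ROWS = [
    (1, "Ann", "Lee", 1, "u1", "2020-01-01"),
    (2, "Ann", "Lee", 1, "u2", "2020-02-01"),  # duplicate name, other user
    (3, "Bob", "Ray", 1, "u1", "2020-03-01"),  # unique name
    (4, "Ann", "Lee", 2, "u3", "2020-04-01"),  # different contact type
]


def duplicated_contacts(rows):
    """Return (Id, FirstName, LastName, cUser, cnt) for every duplicated name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Contacts ("
        "Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, "
        "ContactTypeID INTEGER, cUser TEXT, cDate TEXT)"
    )
    conn.executemany("INSERT INTO Contacts VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT Id, FirstName, LastName, cUser, cnt
        FROM (
            SELECT *,
                   COUNT(*) OVER (PARTITION BY FirstName, LastName) AS cnt
            FROM Contacts
            WHERE ContactTypeID = 1
        ) AS q
        WHERE cnt > 1
        ORDER BY cnt DESC, Id
        """
    ).fetchall()
    conn.close()
    return result


print(duplicated_contacts(SAMPLE_ROWS))
```

Because the count is computed per row rather than per group, Id, cUser and cDate stay available without taking part in the duplicate check.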
If these fields are always the same, then you can include them in the GROUP BY and it will not affect the detection of duplicates.
If they are not, then you must decide which aggregate function to apply to them; for example, MAX() or MIN() would work and would give you some indication of the values associated with the duplicates.
Otherwise, if you want to see all of the records, you can join back to the source:
SELECT X2.* FROM
(SELECT FirstName, LastName, COUNT(*) as "Count"
FROM Contacts
WHERE ContactTypeID = 1
GROUP BY LastName,FirstName
) AS X INNER JOIN Contacts X2 ON X.LastName = X2.LastName AND X.FirstName = X2.FirstName
WHERE COUNT > 1
ORDER BY COUNT DESC