Unable to remove duplicates with subquery - sql

I had a built a query previously that returned zero duplicates until I decided to join in a couple more tables. Now that I've joined them in, I'm unable to create the desired flag due to duplicates being returned. I've attached the scenario below as an example.
I only want one occurrence of each reference number (123456789). I want to create a flag when certain criteria are met. For example, I want to see when reference numbers for a certain account meet "X", but when I join the table I get every instance of that reference number in the joined table.
REFNO BEG END STATUS
123456789 123 456 E
123456789 456 789 E
123456789 789 012 A
I want to see all of the REFNO's based on other parameters set in the query, but I want a flag for anything where END = '012'. I can't left join to the table because I will get all three lines. If I do an inner join then I just get the 012 lines. I Tried the code below in my select statement to only pull when that scenario exists, but I'm getting wacky returns and don't know why. I feel like this should be fairly easy to accomplish, but I can't wrap my head around how to create a flag for just that scenario without getting duplicates or removing results with an inner join.
,(CASE WHEN EXISTS(SELECT 1
FROM QW.ABCD Z
WHERE Z.ABCD = P.ABCD
AND Z.END = '012'
AND Z.TIMESTAMP IS NULL
AND Z.STATUS IN ('A','E'))
THEN 'Y' ELSE 'N'
END)
AS "FLAG"
Please help as I'm not sure what I'm doing wrong to get the flag I want to see.

I am not sure id DB2 allows this combination of UNION and GROUP BY, but what you want is probably something like this:
SELECT 'N' AS FLAG, REF_NO,MIN(BEG),MAX(THEEND) FROM TAB_QW A
WHERE NOT EXIST (SELECT * FROM TAB_QW B WHERE B.REF_NO = A.REF_NO AND B.THEEND = '012')
GROUP BY REF_NO
UNION
SELECT 'Y' AS FLAG, REF_NO,MIN(BEG),MAX(THEEND) FROM TAB_QW C
WHERE EXISTS (SELECT * FROM TAB_QW D WHERE D.REF_NO = C.REF_NO AND D.THEEND = '012')
GROUP BY REF_NO

Related

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This is Joining a few tables (obviously) in order to get a number of records. The problem is that if someone is duplicated on the table with a NULL in one of the date fields, and a date in another field, it pulls both the NULL and the DATE, or pulls multiple NULLS it might pull multiple dates but those are not present right at the moment.
I need the Nulls, they are actual data in this particular case, but if someone has a date and a NULL I need to pull only the newest record, I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
And it returned the same issue, Duplicated EmployeeInformation.ID with different DecShotDate values.
Currently it returns:
ID
Active
DecShotDate
etc. x a bunch
1
-1
date date
whatever goes
2
-1
in these
2
-1
date date
columns
These are being used in a report, that is to determine the total number of employees who fit the criteria of the report. The NULLs in DecShotDate are needed as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution, I could add a column to the CompletedTrainings Table that contains a date or other value, and add that to the HAVING statement. This might be the right solution as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID
Last
First
empID
Active
Termdate
DoH
Title
PT/FT/PD
PI
1
Doe
Jane
982
-1
date
Sr
PD
X
2
Roe
John
278
0
date
date
Jr
PD
X
3
Moe
Larry
1232
-1
date
Sr
FT
X
4
Zoe
Debbie
1424
-1
date
Sr
PT
X
DeclinationReasons Table:
ReasonID
Reason
1
Allergy
2
Already got it
3
Illness
CompletedTrainings Table:
RecordID
Employee
Training
...
DecShotdate
DecShotLocation
DecShotReason
DecExp
1
1
4
date
location
2
text
2
1
4
3
2
4
4
3
4
date
location
3
text
5
3
4
date
location
1
text
6
4
4
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed. Which is to get the information by latest entry into the database.
Previously I had tried to use MAX(), DISTINCT(), etc. but always had a problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on. Until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.

SQL Server: Return results when not duplicate

I just can't seem to get my mind wrapped around this. To simplify as much as possible, let's say I have a table:
Id cid Account
1 4010 Bank Co
2 5323 Webazon
3 3513 Internal
4 3513 PhoneCo
5 5597 Internal
I'm wanting to return all results except for the lines that are Account = 'Internal' where there's also a customer with the same cid. So, in this case, we would return lines 1,2,4, and 5. Line 3 would not be returned, because 'PhoneCo' and 'Internal' share cid 3513. However, line 5 would be returned because there's not another record that shares cid 5597.
I'm going down the road of doing it with a UNION, where the first part is eliminating all 'Internal' records, and the second part is just those I'm interested in, but I may be going about it the wrong way.
Here is one method:
select t.*
from t
where t.account <> 'Internal' or
not exists (select 1
from t t2
where t2.cid = t.cid and t2.account <> 'Internal'
);
That is, select everything all non-internal records. And, select records for internal accounts where there is not a corresponding non-internal account.

SQL list multiple Duplicates

running a SQL query in access that is giving me matches where A = record 1, and B also = record 1 , C= record 2 and D E and F also = record 2.
I want my results to display (only max Value)
B =record 1
F= record 2. ( this is a matching query)
basically i want to eliminate duplicates and select "distinct" does not seem to be working for me.
SELECT
FEED_2.ID AS FEED_2_ID,
FEED_3.field_ID,
FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3 ON FEED_2.[field_ID] = FEED_3.[field_ID]
order by FEED_3.ID
im getting results where feed 2 ID #1,3, and 5 all equal feed 3 - ID #1
i only want feed 2, #5 = feed 3 #1. no Dupes
sorry - hope that helps
It's a shot in the dark but, is something like this you are looking for?
SELECT max(Column_With_ABCDEF), Column_With_record from TABLE_NAME GROUP BY Column_With_record;
If this is not what you are asking for, please do edit your question with your table schema and/or the query you are using so we can help.
---------------- EDIT ----------------
Ok so you can try this:
Select max(FEED_2_ID), field_ID , FEED_3_ID
from (
SELECT FEED_2.ID AS FEED_2_ID, FEED_3.field_ID As field_ID, FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3
ON FEED_2.[field_ID] = FEED_3.[field_ID]
)
GROUP BY FEED_3_ID, field_ID
ORDER BY FEED_3_ID
The main select is going to group the result from the subquery, that way you should not get duplicated values.
Hope this help

Why doesn't this SQL sub query statement work?

The statement produces the following error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
I presume I somehow need to concatenate the field names in the subquery?
SELECT (
SELECT COALESCE(Table_Field, Field) AS Fields
FROM API_Objects_Fields
WHERE Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
)
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
ORDER BY U.Ctime DESC
The sub query specifies the fields to be returned from the table.
DECLARE #Name VARCHAR(1000)
Select #Name =
COALESCE(#Name,'') +Table_Field + ';'
FROM API_Objects_Fields
WHERE Field IN
( 'fullname' ,'confirmed' ,'primary_email' ,'location_short' )
Select #Name As FieldName
#akfkmupiwu need to do like this for above comment
WITH CTE AS
(SELECT (
SELECT DISTINCT TOP 1 COALESCE(Table_Field, Field)
FROM API_Objects_Fields F
WHERE F.UserID = PM.UserID AND F.Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
)AS Fields,
ROW_NUMBER()OVER (PARTITION BY Table_Field ORDER BY FIELD)AS RN
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
ORDER BY U.Ctime DESC
)
Select * from CTE WHERE RN = 1
It is an assumption query basing on your question
What the error is telling you
The problem with your query is exactly what the error says, it brings back more than one result. Since your subquery is in the select portion of the outer query (as opposed to the from or the where), sql is looking for the one value to populate the specific column. Think of it more in terms of filling in an excel spreadsheet. You cannot add two separate values to one cell. Instead, you need the data to go into two separate rows.
On another note, coalesce checks if the first value is null, if it is then it returns the second value. If the first value is not null, that value is returned. It sounds to me that this is not the behavior that you are looking for.
How to fix this
You need to either change your query to pull back different rows for each of the possible values that Fields can be or you need to find a way to specify only one value to return for Fields. Since I am unsure what you are looking for, I am going to demonstrate the first way of solving this.
Data
Your question does not provide any data for API_Objects_Fields, so I am going to make some up. Let's assume the columns in this table are Field_ID, Table_Field, and Field and let's say that your table looks like this:
Field_ID | Table_Field | Field
1 | Alan Turing | fullname
2 | Catherine Zeta Jones | fullname
3 | True | confirmed
4 | MN | location_short
5 | 123-456-7890 | phone_number
As I mentioned before, right now your query would try to pull back the rows where the field is fullname, confirmed, or location_short all. Instead of trying to stuff one column of one row. full of 4 results, let's change your query to bring back 4 rows
The Query
SELECT f.Table_Field, Field
FROM user_basics U
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
INNER JOIN (
SELECT Table_Field, Field
FROM API_Objects_Fields
WHERE Field IN (
'fullname'
,'confirmed'
,'primary_email'
,'location_short'
)
) f
WHERE PM.PodID = 164
ORDER BY U.Ctime DESC
What will happen
This query will now pull back data that looks more like this:
Table_Field | Fields
Alan Turing | fullname
Catherine Zeta Jones | fullname
True | confirmed
MN | location_short
However, I think you will be surprised with the results you actually end up with. Since the query does not connect the data in API_Objects_Fields with any other tables, you would get the values from the results table above over and over again. In fact, you would get the values above for every single row returned by
Select *
From user_basics u
INNER JOIN Pod_Membership PM ON U.UserID = PM.UserID
WHERE PM.PodID = 164
If this query returns 12 results, you would end up with 12 Alan Turings, 12 Catherine Zeta Jones, 12 Trues, and 12 MNs. If this is not the result you are looking for, you will need to add an ON portion to the inner join so the results from f are connected with the other tables.

SQL index match to find duplicate data

I have the following table
Code Name Task
aa jones DC
ab dave DC
aca james IF
aca james DC
ab trevor IF
aa jones IF
ag francis DC
ag francis IF
af derek SF
af derek DC
This is a very big table, above is just a quick example.
So, I would like some help finding the code and name that have completed a IF or SF task and a DC task.
I would like it to show where one person has touched both of these tasks. The hierarchy of the tasks is; it comes in as either a SF or IF then someone will do that, then off the back of that we receive a DC task, and I want the ones where it has been completed by the same person, with the same reference number.
I am able to do this in excel with an INDEX MATCH function, but this takes up a tremendous amount of calculation time due to the size of the table.
One way to approach this is using group by with a having. This is a flexible way of expressing these types of conditions:
select code, name
from table t
group by code, name
having sum(case when task = 'DC' then 1 else 0 end) > 0 and
sum(case when task in ('IF', 'SF') then 1 else 0 end) > 0;
Each condition in the having clause counts the number of rows that meet the particular condition. The first, for instance, counts the rows that match 'DC' and takes only the code, name pairs that have at least one such match.
SELECT code,name FROM YOUR_TABLE_NAME WHERE task = 'DC' AND (task = 'IF' OR task = 'SF') GROUP BY name
try this query
Gordon Linoff's query can be made easier under the hypothesys that IF and SF are synonym and cannot be both present for the same Code-Name couple, as the data provided by the OP suggests
SELECT code, name
FROM table t
GROUP BY code, name
HAVING SUM(CASE WHEN task IN ('IF', 'SF', 'DC') THEN 1 ELSE 0 END) = 2;
select code,name from (select distinct code,name from table1 where task='SF' or task='IF') as temp1 inner join (select distinct code as code2,name as name2 from table1 where task='DC') as temp2 on code=code2,name=name2;
I'm assuming that you have the table in table1. The code constructs two tables temp1 and temp2. temp1 contains those codes and names which have been assigned SF and IF. temp2 contains those codes and names which have been assigned DC. Finally, I join the two tables together to find code-name pairs in both tables. This is faster than in Excel because the database engine probably temporarily indexes the columns being joined on.
Actually, you can do this in Excel. You sort the table by code and name, then enter the following formulas (assuming "Code" is in A1):
D2=if(and(A2=A1,B2=B1,D1),true,or(C2="IF",C2="SF"))
E2=if(and(A2=A1,B2=B1,E1),true,C2="DC")
Select these two cells, and double-click the fill-handle (the little square at the bottom right of the selection). Then, with the two columns selected, copy, and then "Paste Special..." > "Values". Then, filter (Alt-D-F-F) for the rows with values in columns D and E being both true. That is the result you want. Select these rows and copy to a new sheet if desired.
Alternatively, you can follow the SQL "group by" solution given by Gordon, so that you do not need to sort: Create two new columns like the above, but:
D1: "D"
E1: "E"
D2=if(or(C2="IF",C2="SF"),1,0)
E2=if(C2="DC",1,0)
Then, "Insert" > "PivotTable", drag "Code" and "Name" to be row labels. Drag "D" to be under Values, click on it, "Value Field Settings...", and then select "Max". Do the same for "E", and then the rows with 1 in both D and E will be the result you want.